When I'm programming, I work very incrementally, almost intentionally with no grand plan other than my stories. I pay attention to the design, but I'm not working toward a planned design. When the code starts pushing back, I make it better. This feels to me like a course of discovery rather than bringing about some carefully thought out transformation.

Warning

You’ll want to skim this article, I think, rather than work through the details. It’s describing the beginning of refactoring an existing program for improvement. My purpose isn’t to show a beautiful hand-crafted refactoring that goes through neat little steps and changes a pile of junk into a Ferrari. This is a day in the life of a real programmer improving a program that needs it.

Programming a Blog

In the case in hand, we’ll look at my blog program. If I was to have a blog, it seemed to me, I should write it, not just use one that’s out there, so that I could learn something along the way. So Chet and Paul Friedman and I have been building the blog that’s now running on XProgramming, Hot Needle Of Inquiry. We’re using C#.NET, and the underlying page layout follows a Wiki style of markup.

I want to cut to the chase here very quickly, providing background as we need it. Let me know via the contact page if this moves too quickly.

Basic Operation

The blog stores articles in files, each article in a separate file. There are several web pages involved: one to display articles, one to edit them, and a forthcoming one to allow readers to comment. In what I hope is proper C# style, the .ASPX files representing the pages are backed up with a C# “code-behind” file that fills in the fields and runs the buttons. Those files use various “model” objects behind the scenes.

The current model object is Article, which represents an article, the text in the middle pane of the blog page. And article is stored on the server as a text file, using various naming conventions that we’ll talk about later. The text is a simple wiki kind of markup, where enclosing something in double apostrophes leads to italics on the page, and so on. The Article has an ArticleFiler member variable, which knows how to read the text to and from files. And it’s this object, the ArticleFiler, and its usage, that is at the center of the current exercise.

As Chet and I work on the code, we find that we are relying more and more on running the blog locally. Our tests don’t make us comfortable. Two or three times now we have said that we feel that we need some kind of refactoring to make the code easier to understand and easier to test, but while we have made a little progress around the edges, we still have this feeling. And we don’t trust our tests, and some of them are set to Ignore because they have to be edited whenever we run the blog manually. It’s time to improve the situation.

As We Stand ...

Here’s some of the important bits of code as they stand now. The Page.aspx.cs class operates the main blog page:

  public class Page : WikiPage {
    protected Button EditButton;
    protected Image Logo;
    protected Label ArticleContents;
    protected Button Button1;
    protected TextBox Password;
    protected System.Web.UI.WebControls.Label ArticleContent;
    protected HtmlGenericControl title;

    private void Page_Load(object sender, EventArgs e) {
      if (Request.Params["display"] == null)
        Response.Redirect("Page.aspx?display=HotNeedleOfInquiry");
      this.title.InnerText = WikiNameTagger.SpacedWikiWord(this.RequestFileName());
      this.ArticleContents.Text = Article.Html();
      this.ArticleContents.Visible = true;
    }...

All that matters to us at the moment is that we get the text of the Article by sending it the message Html(). But what is “Article”? We find that in the Page superclass, WikiPage:

  public class WikiPage : System.Web.UI.Page {
    protected Article article;

    override protected void OnInit(EventArgs e) {
      base.OnInit(e);
      article = new Article(RequestFileInfo());
    }

    protected Article Article {
      get { return article; }
    }

    protected FileInfo RequestFileInfo() {
      FileInfo info = new FileInfo(RequestPath() + "\\" + RequestFileName());
      return info;
    }

    // TODO this is used a lot. Why do so many people do this?

    protected virtual string RequestFileName() {
      string fileName;
      fileName = Request.Params["display"];
      if (fileName != null) return fileName;
      return("");
    }

    private string RequestPath() {
      string path = Request.MapPath(@"Pages\");
      return path;
    }
  }

(Note, in passing, the TODO comment. There are quite a few uses of this call for the RequestFileName. That seems inappropriate and is one of the reasons we’re on this quest: “It just don’t look right to me …”.

We see now that our property Article is an instance of Article class, created with a FileInfo as its parameter, containing the path to the blog Pages (in the Pages folder) and the name of the file as shown in the display= parameter. The Redirect in the Page_Load of the Page file ensures that if we have no display parm, we use “HotNeedleOfInquiry” as the parameter.

The Article class looks like this, with some deletions that aren’t needed for general understanding:

  public class Article {
    ArticleFiler articleFiler;
    private string pageName;

    public Article(FileInfo file) {
      this.articleFiler = new ArticleFiler(file);
      this.pageName = file.Name;
    }

    public void WriteText(string newText) {
      articleFiler.WriteText(newText);
    }

    public string Html() {
      HtmlCalculator calculator = 
        new HtmlCalculator(pageName, Text, articleFiler.DirectoryName);
      return calculator.Html();
    }

    public virtual string Text {
      get { 
        if (articleFiler.Name == "HotNeedleOfInquiry")
          return TableOfContentsText();
        else
          return ArticleText();
      }
    }

    protected string ArticleText() {
      if (! articleFiler.Exists )
        return "Describe " + pageName + " here.";
      else
        return articleFiler.ArticleText();
    }
  }

We begin to see now how things are going to work. The Article creates an ArticleFiler, and passes the FileInfo “file” to it. Messages calling for the Text, or trying to write the text, are just forwarded on to the ArticleFiler.

A look at the ArticleFiler itself won’t be very enlightening at this point, but I’ll include some of it for your interest:

  public class ArticleFiler {

    const string invisibleExtension = ".invisible";
    const string commentExtension = ".wiki";
    private FileInfo fileInfo;

    public ArticleFiler(string filePath) {
			fileInfo = new FileInfo(filePath);
    }

    public ArticleFiler( FileInfo fileInfo)
      :this (  fileInfo.FullName) {
    }

    public string ArticleText() {
      if ( ! Exists )
        return "Cannot find article text";
      using (StreamReader reader = FileToRead.OpenText()) 
        return reader.ReadToEnd();
    }

    public void WriteText(string text) {
      using(StreamWriter writer = this.CreateText())
        writer.Write(text);
    }

    public bool Exists {
      get { return fileInfo.Exists || InvisibleFile.Exists || CommentFile.Exists; }
    }

    public string DirectoryName {
      get { return fileInfo.DirectoryName; }
    }

    public string Name {
      get { return fileInfo.Name; }
    }

    // TODO not entirely happy about this automatic file finding stuff

    internal FileInfo FileToRead { // testing and local only
      get {
        if ( InvisibleFile.Exists ) 
          return InvisibleFile;
        else if ( fileInfo.Exists )
          return fileInfo;
        else if ( CommentFile.Exists )
          return CommentFile;
        else // none exist
          return fileInfo;
      }
    }

    internal FileInfo FileToWrite { // testing and local only
      get { return FileToRead; }
    }

    private StreamWriter CreateCommentFile() {
      return CommentFile.CreateText();
    }

    private StreamWriter CreateText() {
      return FileToWrite.CreateText();
    }

    private FileInfo InvisibleFile {
      get { return new FileInfo(fileInfo.FullName + invisibleExtension); }
    }

    private FileInfo CommentFile {
      get { return new FileInfo(fileInfo.FullName + commentExtension); }
    }
  }

Reviewing this code now, I see a couple of things that I don’t like. First of all, the two creation methods seem odd. Why do we have two of them, and why does the one that comes in with a FileInfo actually turn that into a string and then call the other one? This kind of thing usually suggest some kind of historical tag end that needs cleaning up.

Second, I notice now that ArticleText() returns a default text if the article doesn’t exist, but it turns out that the corresponding method in Article also returns a default. One or the other of these needs to be deleted.

The Refactorer's Dilemma

What should we do? On the one hand, we are here with a somewhat grand goal of cleaning up the ArticleFiler, as I’ll describe in a moment. On the other hand, we see a few things that are making the code more complex. Should we take the opportunity to clean up these items, which might make our larger task easier or smaller? Or should we do the larger task, in hopes that these small tag ends will be easier to remove, or perhaps even disappear as a matter of course?

What is that grand goal? Well, there are two interrelated issues. First of all, I don’t trust the tests and find myself checking the blog pages manually. Second, some parts of the testing are clearly made more difficult by the tight binding of the Article to the ArticleFiler, and of the Filer to a directory. This is certainly why some of my RSS-related tests are turned off at the moment, and I feel intuitively that it is related to my reluctance to write other tests that I need.

So. Should we simplify before restructuring, or restructure and then simplify if it’s still needed? It’s usually my practice to do the simplest shortest task I can think of. And if I’m careful to make sure there are tests to support the changes, the experience might even help with that concern about the tests.

So let’s go with the simpler problems first. First, the creation methods. We have seen that the Page uses the FileInfo-bearing creation method. Who uses the pathname one? Probably a test. Well, not quite. We couldn’t be so lucky. One usage is indeed a test, in fact a whole raft of tests:

  public class TestWikiFileInfo : Assertion{

    private ArticleFiler Info(string path) {
      return new ArticleFiler(path);
    }

    [Test] public void Hookup() {
      Assert(true);
    }

    [Test] public void DoesntExist() {
      Assert(!Info("../Pages/MyBusinessIsToConfuse").Exists);
    }

    [Test] public void Exists() {
      Assert(Info("../Pages/MyBusinessIsToCreate").Exists);
    }

    [Test] public void InvisibleExists() {
      Assert(Info("../Pages/PasswordRequired").Exists);
    }

    [Test] public void WikiExists() {
      Assert(Info("../Pages/WikiTest").Exists);
    }

    [Test] public void FileToRead() {
      AssertEquals
        ("PasswordRequired.invisible", 
        Info("../Pages/PasswordRequired").FileToRead.Name);
    }

    [Test] public void WikiFileToRead() {
      AssertEquals
        ("WikiTest.wiki",
        Info("../Pages/WikiTest").FileToRead.Name);
    }

    [Test] public void FileToWrite() {
      AssertEquals
        ("PasswordRequired.invisible", 
        Info("../Pages/PasswordRequired").FileToWrite.Name);
    }
  }

All these tests use the string version of the ArticleFiler creation method. It would be easy enough to fix them by just creating the FileInfo in the test. But what about the non-test usage? This one shows up in a class we haven’t seen yet, the WikiNameTagger. In a wiki, a “wiki word” is a word that begins with a capital letter, and has at least one non-adjacent cap inside. “WikiWord” is a wiki word. When someone types a word in that format, there are two possibilities for what should happen. If a page of that name is defined, then the word should be enclosed in anchor tags linked to that page. If no such page exists, the word is left alone, but a question mark is appended after it, and that question mark is linked to a page for defining the word. The WikiNameTagger determines whether the name exists, by determining whether the file exists, and makes the appropriate anchor substitution:

  public class WikiNameTagger {
    private string path;
    public static Regex wikiWord = new Regex("!?(([A-Z][a-z]+){2,})");
    private MatchEvaluator subs;

    public WikiNameTagger(String path) {
      this.path = path;
      subs = new MatchEvaluator(LinkSubstitution);
    }

    private string LinkSubstitution(Match match) {
      string name = match.Groups[1].Value;
      if ( name.StartsWith("!") )
        return name;
      ArticleFiler testArticleFiler = new ArticleFiler(path + "\\" + name);
      if ( testArticleFiler.Exists )
        return ExistingLinkSubstitution(match);
      else
        return MissingLinkSubstitution(match);
    }

    private string ExistingLinkSubstitution(Match match) {
      return "<a href=\"Page.aspx?display=" + match.Groups[1] + "\">"
        + SpacedWikiWord(match.Groups[1].Value) + "</a>";
    }

    private string MissingLinkSubstitution(Match match) {
      return
        match.Groups[1]
          + "<a href=\"Edit.aspx?" + match.Groups[1] + "\">"
          + "?</a>";
    }

    public string TagWikiNames(string line) {
      return wikiWord.Replace(line, subs);
    }

    public static string SpacedWikiWord(string word) {
      StringWriter spacedName = new StringWriter();
      bool spacing = false;
      foreach ( char c in word ) {
        if ( spacing && Char.IsUpper(c) )
          spacedName.Write(" ");
        spacedName.Write(c);
        spacing = true;
      }
      return spacedName.ToString();
    }
  }

In the LinkSubstitution method above, we see our use of ArticleFiler to determine whether the file exists. And the WikiNameTagger itself is created on a path. The only usage of the Tagger is in the Html() method, which we haven’t seen. It turns out that the Html() method is passed the directory name of the pages, and passes that on to the Tagger.

Wow, Just Like Legacy Code!

Look at this situation. In just a few Sessions of work, Chet and I have created some really great legacy code! (To be fair, most of this was done without Chet’s help, which certainly makes it Chet’s fault that it happened.) We have one basic idea, finding out whether files exist, and we’re doing it at least two different ways, sometimes with a FileInfo object, and sometimes by passing string directory and file names about. Now in your legacy application, there might be three or four ways of doing the same thing, and there would probably be many occurrences of each. At least here we have the advantage that there are just a few things to clean up.

This situation is already pretty grubby, and a bit daunting. But I’ve got an idea. Here’s my cunning plan.

We’ll create a new object that is like a FileInfo, in that it knows a directory name. Maybe it will also know a file name: we’ll find out about that as we go. We’ll create that new object in the Page class, and we’ll start passing it in as an additional parameter to all the methods that are currently using the ArticleFiler to find things out. We’ll begin to defer actions to that parameter, instead of to the built-in member variables like ArticleFiler, and to creations inside the objects, until finally, all remnants of ArticleFiler are gone. In the course of this, small things will get more clean as we go along.

I hope. This might not work – it might even make things worse. Let’s find out.

TestWikiFile

Just to get our hands dirty, let’s change the TestWikiFileInfo test not to use the path-creating version of ArticleFiler. To do that, we just have to change the Info() method:

    private ArticleFiler Info(string path) {
      return new ArticleFiler(new FileInfo(path));
    }

Voila! That’s done and the tests still run. Can we snag the other reference to the file version as well? That’s in WikiNameTagger, and we can certainly change it to create a FileInfo as well. But wait! If we do that, we will have duplication, in this case between the tests and the code, but duplication still. We can’t do that without a good reason, because duplication is bad. Let’s back that change out and think again.

    private ArticleFiler Info(string path) {
      return new ArticleFiler(path);
    }

Let’s go the other way. Let’s change all the creators of ArticleFiler to use the path, not the FileInfo. That might be a good step anyway, as a FileInfo is a rather complex and odd object to be using anyway. There’s just one occurrence of creating the ArticleFiler that way, in the Article constructor, which we change to:

    public Article(FileInfo file) {
      this.articleFiler = new ArticleFiler(file.FullName);
      this.pageName = file.Name;
    }

Now we can remove the FileInfo version of the ArticleFiler constructor. Internally, though, the ArticleFiler still uses a FileInfo. Let’s change it to use the path and the file name instead. Notice that most of what ArticleFiler does anyway is to create a bunch of FileInfo objects when it needs them. Let’s just make it create one more. Here it is as it stands:

  public class ArticleFiler {

    const string invisibleExtension = ".invisible";
    const string commentExtension = ".wiki";
    private FileInfo fileInfo;

    public ArticleFiler(string filePath) {
			fileInfo = new FileInfo(filePath);
    }

    public string CommentText() {
      if (! CommentFile.Exists )
        return "----/r/nComment Here";
      using (StreamReader reader = CommentFile.OpenText())
        return reader.ReadToEnd();
    }

    public string ArticleText() {
      if ( ! Exists )
        return "Cannot find article text";
      using (StreamReader reader = FileToRead.OpenText()) 
        return reader.ReadToEnd();
    }

    public void WriteText(string text) {
      using(StreamWriter writer = this.CreateText())
        writer.Write(text);
    }

    public void WriteComment(string text) {
      using(StreamWriter writer = this.CreateCommentFile())
        writer.Write(text);
    }

    public bool Exists {
      get { return fileInfo.Exists || InvisibleFile.Exists || CommentFile.Exists; }
    }

    public string DirectoryName {
      get { return fileInfo.DirectoryName; }
    }

    public string Name {
      get { return fileInfo.Name; }
    }

    // TODO not entirely happy about this automatic file finding stuff

    internal FileInfo FileToRead { // testing and local only
      get {
        if ( InvisibleFile.Exists ) 
          return InvisibleFile;
        else if ( fileInfo.Exists )
          return fileInfo;
        else if ( CommentFile.Exists )
          return CommentFile;
        else // none exist
          return fileInfo;
      }
    }

    internal FileInfo FileToWrite { // testing and local only
      get { return FileToRead; }
    }

    private StreamWriter CreateCommentFile() {
      return CommentFile.CreateText();
    }

    private StreamWriter CreateText() {
      return FileToWrite.CreateText();
    }

    private FileInfo InvisibleFile {
      get { return new FileInfo(fileInfo.FullName + invisibleExtension); }
    }

    private FileInfo CommentFile {
      get { return new FileInfo(fileInfo.FullName + commentExtension); }
    }
  }

We’ll change it to store the path and file name, extracted from the input path, and see what happens.

  public class ArticleFiler {

    const string invisibleExtension = ".invisible";
    const string commentExtension = ".wiki";
    private string path;
    private string articleName;

    public ArticleFiler(string filePath) {
			FileInfo fileInfo = new FileInfo(filePath);
      path = fileInfo.DirectoryName;
      articleName = fileInfo.Name;
    } ...

This, of course, causes lots of other references to fileInfo not to compile, because we have removed that variable. I’ll begin by creating the missing FileInfo dynamically:

    private FileInfo BaseFileInfo {
      get { return new FileInfo(path+articleName); }
    }

And I’ll replace references to fileInfo with calls to that property, for example, changing:

    public bool Exists {
      get { return fileInfo.Exists || InvisibleFile.Exists || CommentFile.Exists; }
    }

to:

    public bool Exists {
      get { return BaseFileInfo.Exists || InvisibleFile.Exists || CommentFile.Exists; }
    }

Doing this blindly should make everything work again. Let’s see … well, modulo some trouble concatenating strings to make file names, it does. With a couple of helper methods added, we get this:

  public class ArticleFiler {

    const string invisibleExtension = ".invisible";
    const string commentExtension = ".wiki";
    private string path;
    private string articleName;

    public ArticleFiler(string filePath) {
			FileInfo fileInfo = new FileInfo(filePath);
      path = fileInfo.DirectoryName;
      articleName = fileInfo.Name;
    }

    private string FileName( string path, string name, string extension ) {
      return path + "/" + name + extension; 
    }

    private string FileName ( string path, string name ) {
      return FileName(path, name, ""); 
    }

    public string CommentText() {
      if (! CommentFile.Exists )
        return "----/r/nComment Here";
      using (StreamReader reader = CommentFile.OpenText())
        return reader.ReadToEnd();
    }

    public string ArticleText() {
      if ( ! Exists )
        return "Cannot find article text";
      using (StreamReader reader = FileToRead.OpenText()) 
        return reader.ReadToEnd();
    }

    public void WriteText(string text) {
      using(StreamWriter writer = this.CreateText())
        writer.Write(text);
    }

    public void WriteComment(string text) {
      using(StreamWriter writer = this.CreateCommentFile())
        writer.Write(text);
    }

    public bool Exists {
      get { return BaseFile.Exists || InvisibleFile.Exists || CommentFile.Exists; }
    }

    public string DirectoryName {
      get { return path; }
    }

    public string Name {
      get { return articleName; }
    }

    // TODO not entirely happy about this automatic file finding stuff

    internal FileInfo FileToRead { // testing and local only
      get {
        if ( InvisibleFile.Exists ) 
          return InvisibleFile;
        else if ( BaseFile.Exists )
          return BaseFile;
        else if ( CommentFile.Exists )
          return CommentFile;
        else // none exist
          return BaseFile;
      }
    }

    internal FileInfo FileToWrite { // testing and local only
      get { return FileToRead; }
    }

    private StreamWriter CreateCommentFile() {
      return CommentFile.CreateText();
    }

    private StreamWriter CreateText() {
      return FileToWrite.CreateText();
    }

    private FileInfo BaseFile {
      get { return new FileInfo(FileName(path, articleName)); }
    }

    private FileInfo InvisibleFile {
      get { return new FileInfo(FileName(path, articleName, invisibleExtension)); }
    }

    private FileInfo CommentFile {
      get { return new FileInfo(FileName(path, articleName, commentExtension)); }
    }
  }

Note especially the constructor:

    public ArticleFiler(string filePath) {
			FileInfo fileInfo = new FileInfo(filePath);
      path = fileInfo.DirectoryName;
      articleName = fileInfo.Name;
    }

This creates a FileInfo from the input path, just to deconstruct the path into path and articleName. Let’s change the constructor to accept those directly, and make everyone else separate them out.

    public ArticleFiler(string path, string articleName) {
      this.path = path;
      this.articleName = articleName;
    }

And fix everyone who sends the old one. That’s just a small matter of editing to change things from this:

    [Test] public void WikiExists() {
      Assert(Info("../Pages/WikiTest").Exists);
    }

to this:

    [Test] public void WikiExists() {
      Assert(Info("../Pages", "WikiTest").Exists);
    }

Tests all run. Where are we now, and why did we come here?

Is This Progress?

We have changed the ArticleFiler so that all it “knows” is a directory name and a file name. It uses FileInfo objects extensively internally, but that’s no surprise, as its job is to look for files and to open them for read and write. Externally, it’s a simpler object. That’s a bit of progress, but frankly it doesn’t satisfy me much. This has taken me a couple of hours and there’s not much improvement.

But there is some. There’s now only one way to create an ArticleFiler, and it’s a much simpler way. And the ArticleFiler is somewhat simpler inside.

In passing, I notice that the ArticleFiler can be seen as having three kinds of function. One, it answers questions about the mapping between article names and files, and two, it opens files, and three, it reads or writes them. We might want to break out these capabilities into two or three classes. But that’s for later.

So I think this is progress. Just one more thing before we sum up. Recall that the Article itself is created using a FileInfo:

    public Article(FileInfo file) {
      this.articleFiler = new ArticleFiler(file.DirectoryName, file.Name);
      this.pageName = file.Name;
    }

This is done from the base pages, of course, in WikiPage.cs:

    override protected void OnInit(EventArgs e) {
      base.OnInit(e);
      article = new Article(RequestFileInfo());
    }

    protected FileInfo RequestFileInfo() {
      FileInfo info = new FileInfo(RequestPath() + "\\" + RequestFileName());
      return info;
    }

There’s no reason for the Article to be created with a FileInfo and then unwrap it. Let’s just pass the path and file name right through. In Article:

    override protected void OnInit(EventArgs e) {
      base.OnInit(e);
      article = new Article(RequestPath(), RequestFileName());
    }

Requiring in Article:

    public Article(string directoryName, string articleName) {
      this.pageName = articleName;
      this.articleFiler = new ArticleFiler(directoryName, articleName);
    }

We look for users of the other Article constructor and change them. This turns up a couple of places in code we haven’t looked at here yet, but nothing worth thinking about. The changes were all straightforward, although they did turn up some other places where FileInfo objects are being passed about. I’ll mark them with TODO for now.

Results So Far

This article represents only a couple of hours of programming. The results are that the program is a bit better, and I’ve got a clearer notion of what I might do next. My plan for next time is to see about passing an ArticleFiler in to the methods in Article that read and write. That will take a little thinking as well as a little doing. Check back soon …