Building a C++ Dependency Analyzer Test-First

Sitting down to write an article is a strange experience for me. I always feel that it would be easier if I just started in the middle and took care of the beginning later, but that doesn’t always work. Since I’m on a roll, I’ll continue. By the way, if this paragraph isn’t in the article when you read it, Ron edited it out. I’m pretty sure he will, since I’ve included his MasterCard number here: XXXX XXXX XXXX XXXX (expires XX/XX). Ahhh, the world has been saved from my rambling intro.

Editor's note: Just minimal edits this time ;->

Before leaping into more tests, I want to explain a few of the decisions I made in the last installment. Test-first design felt pretty alien to me when I first started doing it. There was a decision around every corner, and it was hard to get a sense of whether I was approaching things at the appropriate level. So, what I’ll do is add a few heuristic blurbs as we go along. They aren’t rules; they are just things that have helped my work as I’ve gone forward.

Fulfilling the First Test

In our last episode, I started with a single story. We need an object that tells us how many lines there are in a C++ file. I jump-started it with this test:

	public void testLineCount () {
		CppFile file = new CppFile ("TaskIoDevice.cpp");
		assertEquals (9, file.lineCount ());
	}

The test for lineCount gives us a few options. After all, it just defines the interface of a class; we are left with the work of writing the code. One of the things that I could have done is place the file manipulation and line counting code in the CppFile constructor. At that point, lineCount () would just be a getter method. Instead, I opted to put the code in the lineCount method. Why? Does it matter?

One thing I’ve noticed is that I end up making fewer changes if I start with trivial constructors and make my first method very responsible. If your constructor is responsible for putting your object into a good initial state and doing something else, then it is really doing two separate things. Add another test and maybe the constructor will do three or four; there is no end to the madness. So, my rule of thumb is:

Make your first method responsible. Keep a slim constructor (Heuristic Blurb #1)

For a while, I couldn’t find a compelling argument for this approach. After all, isn’t putting code in a constructor simple? Doesn’t it feel wrong to place a for-loop in a method that looks like a getter?

I think that the responsibility argument is strong. If your constructor is putting your object into a good state and doing other work, it is doing too much. To do anything at all with an object, you have to write a method that handles your first intention. In this case, it is lineCount. Let that method do the work.
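To make the contrast concrete, here is a rough sketch of the two options side by side. The names ConstructorSketch, EagerCppFile, LazyCppFile and countLines are hypothetical, made up for this comparison; they are not the classes from the previous installment.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

class ConstructorSketch {

	// Option 1 (hypothetical): the constructor sets up the object AND counts lines.
	static class EagerCppFile {
		private final int lineCount;

		EagerCppFile (String pathName) {
			// Second responsibility: file work happens during construction
			this.lineCount = countLines (pathName);
		}

		int lineCount () {
			return lineCount;   // nothing more than a getter
		}
	}

	// Option 2 (hypothetical): slim constructor; the first method does the work.
	static class LazyCppFile {
		private final String pathName;

		LazyCppFile (String pathName) {
			this.pathName = pathName;   // only puts the object into a good state
		}

		int lineCount () {
			return countLines (pathName);
		}
	}

	// Shared helper: counts lines, answering 0 if the file cannot be read.
	static int countLines (String pathName) {
		try (LineNumberReader reader = new LineNumberReader (new FileReader (pathName))) {
			while (reader.readLine () != null)
				;
			return reader.getLineNumber ();
		}
		catch (IOException ignored) {
			return 0;
		}
	}
}
```

Both versions answer the same number. The difference is that the second constructor has nothing left to grow when the next test arrives.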

The First Factoring

Once we were able to get the line count, we approached our next story: return the total of the number of lines in a file and all of the files it includes. At that point, I decided to give CppFile a new name: TranslationUnit. The name seemed to fit the class’s responsibilities better. Here is the test we were using:

	public void testTotalLineCount () {
		TranslationUnit unit = new TranslationUnit ("TaskIoDevice.cpp");
		assertEquals (34, unit.totalLineCount ());
	}

This test provokes a bit too much new behavior. The TaskIoDevice.cpp file has a #include directive in it, and the total line count of the file plus everything it includes is thirty-four, but how do we write the code for this incrementally? Unfortunately, this dilemma comes up over and over again in test-first design: you know what you want your object to do, but the next piece is too big. At this point, I use one of two strategies: Grow then Split or Split then Grow.

Grow then Split

People use many different analogies when describing object-oriented development. In some circles, building construction is a favorite. Systems are designed by architects, and then coders translate the plans into code via a process of construction. For me, this isn’t the most heartening analogy. Software is much more pliable than the average building. It offers us many more possibilities.

To me, object-oriented development is closer to a biological process. Objects are like cells. As their volume increases, the ratio of internal substance to interface grows. At a certain point, growth becomes hard and it is easier to split than it is to grow. This is good because a few smaller interfaces can present more opportunities than one large one. For an example, see Martin Fowler’s video store example in his Refactoring book. At the start, a single method contained just about all of the code in a small system. After Martin finished refactoring it, he had many little pieces, but it was easy to see that those objects could accommodate a wide variety of requirements changes. The system was just more versatile because it was aerated with more interfaces. As bizarre as it sounds, that is how I think about objects. I apologize for mentioning it in public. Back from the digression…

In the Grow then Split strategy, you do everything that you can to fulfill your test without creating a new class. Let’s look back at TranslationUnit’s totalLineCount method.

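import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

public class TranslationUnit
{
	public TranslationUnit (String pathName) {
		this.pathName = pathName;
	}

	// Answers the number of lines in the file, or 0 if it cannot be read
	public int lineCount () {
		int lineCount = 0;
		// try-with-resources closes the reader even if reading fails
		try (LineNumberReader reader = new LineNumberReader (
				new FileReader (pathName))) {
			// Read to the end of the file; the reader keeps count of the lines
			while (reader.readLine () != null)
				;
			lineCount = reader.getLineNumber ();
		}
		catch (FileNotFoundException ignored) {
		}
		catch (IOException ignored) {
		}
		return lineCount;
	}

	private String pathName;
}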

If we need to count the lines in all of the included files, we need to figure out whether a line has an include directive. Here is one approach that we could use:

  1. Write a test for a new method on TranslationUnit: containsIncludeDirective (String), and then write the code for it. This method would answer true if the string contains a valid C++ include directive.
  2. Write a test for a new method: includeFileName (String) which returns the parsed file name from the include directive.
  3. Add calls to both methods in lineCount ()

The code would end up looking like this:

	public int lineCount () {
		int lineCount = 0;
		// try-with-resources closes the reader even if reading fails
		try (LineNumberReader reader = new LineNumberReader (
				new FileReader (pathName))) {
			String line;
			while ((line = reader.readLine ()) != null) {
				// Recurse into each included file and add its line count
				if (containsIncludeDirective (line)) {
					TranslationUnit includedUnit
							= new TranslationUnit (includeFileName (line));
					lineCount += includedUnit.lineCount ();
				}
			}
			// Add the lines of this file itself
			lineCount += reader.getLineNumber ();
		}
		catch (FileNotFoundException ignored) {
		}
		catch (IOException ignored) {
		}
		return lineCount;
	}

Pretty ugly, but we can refactor to clean it up. We can extract a method for the body of the while-loop and call it processLine (String). It might make sense to make lineCount an instance variable also.
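A sketch of where that refactoring might land is below. It is only one possible shape. The bodies of containsIncludeDirective and includeFileName are assumptions, since they are not shown here, and the INCLUDE pattern is my own guess that only recognizes the quoted form, #include "file.h", so angle-bracket includes are not followed. Paths are handed to FileReader unchanged, and nothing guards against files that include each other.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslationUnit {
	// Assumption: only quoted includes such as #include "Task.h" are recognized
	private static final Pattern INCLUDE
			= Pattern.compile ("^\\s*#\\s*include\\s*\"([^\"]+)\"");

	private final String pathName;
	private int lineCount;   // now an instance variable, shared with processLine

	TranslationUnit (String pathName) {
		this.pathName = pathName;
	}

	int lineCount () {
		lineCount = 0;
		try (LineNumberReader reader = new LineNumberReader (new FileReader (pathName))) {
			String line;
			while ((line = reader.readLine ()) != null)
				processLine (line);
			// Add the lines of this file itself
			lineCount += reader.getLineNumber ();
		}
		catch (IOException ignored) {
			// Unreadable files contribute nothing beyond what was already counted
		}
		return lineCount;
	}

	// Extracted body of the while-loop: follow an include if the line has one
	private void processLine (String line) {
		if (containsIncludeDirective (line)) {
			TranslationUnit includedUnit = new TranslationUnit (includeFileName (line));
			lineCount += includedUnit.lineCount ();
		}
	}

	// Answers true if the line holds a quoted include directive
	boolean containsIncludeDirective (String line) {
		return INCLUDE.matcher (line).find ();
	}

	// Answers the file name between the quotes, or null if there is no directive
	String includeFileName (String line) {
		Matcher matcher = INCLUDE.matcher (line);
		return matcher.find () ? matcher.group (1) : null;
	}
}
```

Notice that processLine and the two helpers all take a line of text and look at very little else.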

What we are doing is building up the functionality that we want inside the class: growing it. Interestingly, each of the methods we’ve added only refers to a line of text. This is one indication that the class can be divided. This is similar to the smell Martin Fowler calls data clumps, but there is only one item in this clump: the line data. For me, it is enough. I have a general heuristic that I use when I look at my code:

If accesses in a class are disjoint, consider making another class (Heuristic Blurb #2)

Considering isn’t necessarily doing, but this heuristic helps me consider possibilities. In this instance, there is a single piece of data, the line, that is the focus of all the attention of several methods. It is as if the string that contains the line had feature envy; the methods do not care about anything else. So, we’ve found a natural cleavage in the class and we can break it in two if we wish. The fact that I can think up a new name for the class, Line, helps me decide to do it.
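A minimal sketch of what that Line class might look like follows. Only the class name and the two method names come from the discussion above; the method bodies and the INCLUDE pattern are assumptions for illustration, and the pattern only recognizes the quoted form of the directive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the line-related methods moved onto their own class
class Line {
	// Assumption: only quoted includes such as #include "Task.h" are recognized
	private static final Pattern INCLUDE
			= Pattern.compile ("^\\s*#\\s*include\\s*\"([^\"]+)\"");

	private final String text;

	Line (String text) {
		this.text = text;
	}

	// Answers true if this line holds a quoted include directive
	boolean containsIncludeDirective () {
		return INCLUDE.matcher (text).find ();
	}

	// Answers the file name between the quotes, or null if there is no directive
	String includeFileName () {
		Matcher matcher = INCLUDE.matcher (text);
		return matcher.find () ? matcher.group (1) : null;
	}
}
```

With something like this in place, TranslationUnit could wrap each string it reads in a Line and ask it questions, rather than parsing the text itself.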

Split then Grow

Split then Grow is just Grow then Split done sooner. In the case of TranslationUnit, I got an immediate sense that the containsIncludeDirective (String) method was misdirected. It would be easy enough to create a Line class and let it be the focus of responsibility for our line processing.

Which is a better approach: Grow then Split or Split then Grow? The answer is: it depends. Grow then Split is inherently a more conservative approach; it is easier to just do your work in place. On the other hand, you run the risk of creating some pretty obtuse code that you’ll have to refactor in a few minutes. It pays to pay attention to your comfort level as you grow your class. If the next class becomes clear to you quickly, it does not make sense to keep adding responsibilities to the current class: factor out the new class at the next opportunity.

Grow in place until the next class becomes clear to you (Heuristic Blurb #3)

In general, I like to Grow then Split. It allows me to concentrate on the immediate problem. Afterwards, I just have a refactoring step. However, when I see that a method belongs on a different class (one which may not even exist yet), I move towards Split then Grow. It is a calculated gamble. In the worst case, I just have to inline the class later.

Conclusion

My word processor is telling me that I’ve written five pages so far. I guess it is time to close off. This article didn’t go quite the way I had planned. I planned to continue with the dependency analyzer and elaborate the Line class a little more. But, I did feel that I needed to confront some of the issues left over from the first installment. Otherwise, they would have bottled up and exploded some time later in the series. I don’t think that would have been pretty.

Next time, we will march through several tests and add some new capabilities to the dependency analyzer.