Building a C++ Dependency Analyzer Test-First

One of the most disconcerting things about working with C++ is how easy it is to end up with a large code base that compiles at glacial speeds. On the other hand, you can end up with a code base that compiles relatively quickly if you pay attention to dependencies as you develop. Unfortunately, many projects let dependencies get out of hand. Cleaning up the mess often requires considerable insight into the physical structure of the code. Tools can help. In this project, I’ll develop a dependency analysis tool for C++ in Java, one test at a time. The standard disclaimers apply: this code was not pair-programmed or customer-driven. In some sense, it is not quite real, but it suffices to demonstrate some aspects of emergent design.

What is the simplest thing that would be useful for someone trying to assess a large C++ code base? There are many “not so simple things” that would be useful, but being able to determine how many lines are in each file would be a good start. Let’s say that we have a file like this one:

1:
2: #include "TaskIODevice.h"
3:
4:
5: void TaskIODevice::writeOnPort (int port, int length, unsigned char *buffer)
6: {
7: }
8:
9:

We know that the number of lines in the file is nine, but we’d like to have an object to tell us that. Let’s place our expectation in a test in JUnit:

public void testLineCount () {
  CppFile file = new CppFile ("TaskIODevice.cpp");
  assertEquals (9, file.lineCount ());
}

To compile this test, we’ll need a class named CppFile that accepts a file name and provides a lineCount method. In a statically typed language, the compiler is the first test that you run.

public class CppFile
{
  public CppFile (String pathName) {
  }

  public int lineCount () {
     return 0;
  }
}

Now that the compiler passes, we can round out the code to satisfy the test in JUnit.

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

public class CppFile
{
  public CppFile (String pathName) {
     this.pathName = pathName;
  }

  public int lineCount () {
     int lineCount = 0;
     try {
       LineNumberReader reader = new LineNumberReader (
           new FileReader (pathName));

       while (reader.readLine () != null)
         ;
       lineCount = reader.getLineNumber ();
     }
     catch (FileNotFoundException ignored) {
     }
     catch (IOException ignored) {
     }
     return lineCount;
  }

  private String pathName;
}

Java’s LineNumberReader class comes in handy. We can query its objects to find out how many lines have been read. Java purists may be shocked by the fact that the code just swallows the exceptions that can be thrown by LineNumberReader and FileReader, but there is a very good reason for this. I haven’t decided how to handle errors yet. To make sure that I don’t forget, I write, “file not found, io error, cppfile” on an index card and slip its edge under my mouse pad. I also add another test that shows that the lineCount for a non-existent file is zero.

Our CppFile class works well. It gives us line counts for any file, not just C++ source files. Here is a sample invocation:

  CppFile file = new CppFile ("Task.cpp");
  System.out.println (file.lineCount ());

If we wanted to, we could create a little driver that iterates over a directory, creates CppFile objects and prints their line counts.
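One way such a driver might look is sketched below. The names LineCountDriver and report are hypothetical, and CppFile is repeated in condensed form (with try-with-resources added so the reader is closed) only so that the sketch compiles on its own. It looks only at the regular files directly inside one directory and does not recurse.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver, not part of the original project.
class LineCountDriver
{
  // Condensed copy of CppFile so that this sketch is self-contained.
  static class CppFile
  {
    private final String pathName;

    CppFile (String pathName) {
      this.pathName = pathName;
    }

    int lineCount () {
      try (LineNumberReader reader = new LineNumberReader (
               new FileReader (pathName))) {
        while (reader.readLine () != null)
          ;
        return reader.getLineNumber ();
      }
      catch (IOException ignored) {
        return 0; // errors are still swallowed, as in the article
      }
    }
  }

  // Builds one "name: count" entry per regular file in the directory,
  // sorted by path so that the output order is stable.
  static List<String> report (File directory) {
    List<String> lines = new ArrayList<String> ();
    File[] entries = directory.listFiles ();
    if (entries == null)
      return lines; // not a directory, or it could not be read
    Arrays.sort (entries);
    for (File entry : entries) {
      if (entry.isFile ()) {
        CppFile file = new CppFile (entry.getPath ());
        lines.add (entry.getName () + ": " + file.lineCount ());
      }
    }
    return lines;
  }

  public static void main (String[] args) {
    // Assumption: the directory is the first argument, defaulting to ".".
    File directory = new File (args.length > 0 ? args[0] : ".");
    for (String line : report (directory))
      System.out.println (line);
  }
}
```

Keeping the report separate from main means the listing can be checked in a test without capturing console output.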

Knowing the number of lines in a file is useful, but it doesn’t really highlight problems. In C++ programs, the same header file can be included by many different source files. Moreover, if a header includes other headers, it is hard to get a sense of where the compilation bottlenecks are. In bad cases, an innocent-looking source file may transitively include half the headers in the project. So, it would be nice to know the transitive number of lines in a C++ source file: the number of lines in the file and in all files that are included during compilation.

First, we’ll need a file that includes other files. TaskIODevice.cpp includes its header file: TaskIODevice.h. TaskIODevice also has a superclass, so its header includes another file as well. With line counts from a good editor, we arrive at our expected value, 34 lines.

Let’s take a crack at it. Here is a new test:

  public void testTotalLineCount () {
     CppFile file = new CppFile ("TaskIODevice.cpp");
     assertEquals (34, file.totalLineCount ());
  }

One thing that occurs to me immediately is that the CppFile class is more than a view into a C++ source file. It is giving us information about an entire translation unit. Maybe that would be a better name.

After renaming the class everywhere, we have:

  public void testTotalLineCount () {
     TranslationUnit unit = new TranslationUnit ("TaskIODevice.cpp");
     assertEquals (34, unit.totalLineCount ());
  }

So, what is the simplest way to make this test pass? We already have a class that counts lines for a file. Maybe we can use it to count the lines in whatever files we include.

Let’s look at the code again.

public class TranslationUnit
{
  public TranslationUnit (String pathName) {
     this.pathName = pathName;
  }

  public int lineCount () {
     int lineCount = 0;
     try {
       LineNumberReader reader = new LineNumberReader (
           new FileReader (pathName));

       while (reader.readLine () != null)
         ;
       lineCount = reader.getLineNumber ();
     }
     catch (FileNotFoundException ignored) {
     }
     catch (IOException ignored) {
     }
     return lineCount;
  }

  private String pathName;
}

It looks like we’ll need something like the loop in lineCount. To get the total number of lines, we can look at each line and see if it is an include directive. If it is, we can make a translation unit for the included file and ask it for its line count. Hmm… there are too many steps here. How do we know whether a line contains an include directive? Let’s tackle that problem first.

We could write a method on TranslationUnit which tells us whether a string contains an include directive. What would we name it? How about containsIncludeDirective (String line)? Actually, I don’t like that name. It confuses me. Are we asking the translation unit whether it contains an include directive or whether our line contains the include directive? Why don’t we ask the line?

We need to be able to ask a line a question: “Do you contain an include directive?” In a test we could have:

public void testInclusionDirective () {
  Line line = new Line ("#include \"Thrak.h\"");
  assertTrue (line.containsInclude ());
}

If the answer is “yes” (true), we should be able to ask the line for the path of the included file.

       ...
       assertEquals ("Thrak.h", line.inclusionFileName ());

Those core tests call a Line class into existence. We can also add a few more tests to deal with the bad cases: malformed lines which want to contain include directives but don’t.
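To make those expectations concrete, here is a rough sketch of one shape such a class could take. It is not the class that the tests will drive out, only an illustration under a few assumptions: it uses a regular expression, it accepts angle-bracket includes as well as quoted ones, and it returns null from inclusionFileName when there is no well-formed directive. None of those choices comes from the tests above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a Line class; the regex approach is an assumption.
class Line
{
  // Optional whitespace, '#', optional whitespace, "include", then a
  // non-empty name in double quotes or in angle brackets.
  private static final Pattern INCLUDE = Pattern.compile (
      "^\\s*#\\s*include\\s*(?:\"([^\"]+)\"|<([^>]+)>)");

  // Null when the line holds no well-formed include directive.
  private final String includedName;

  Line (String text) {
    Matcher matcher = INCLUDE.matcher (text);
    if (matcher.find ())
      includedName = matcher.group (1) != null
          ? matcher.group (1)
          : matcher.group (2);
    else
      includedName = null;
  }

  boolean containsInclude () {
    return includedName != null;
  }

  String inclusionFileName () {
    return includedName;
  }
}
```

With this sketch, a line such as #include "Thrak.h missing its closing quote, a bare #include, or a directive hidden behind // all answer false to containsInclude, which are the kinds of bad cases the extra tests would pin down.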

In the next part, we’ll bring up the Line class, test by test. Then we’ll complete our story, refactoring as we go. We’ll also start to ask some serious questions about the structure of our little app. We have a TranslationUnit class and a Line class. Is this a good factoring for our two stories? Will we be able to evolve our app when we have our next story? We’ll see.