Metrics. Maybe we should just say no.

Check out this slide from VersionOne:

One of the biggest problems with a tool like VersionOne is that it encourages you to track “metrics”. Do you want Crappy Agile? That’s how you get Crappy Agile, at least far too often.

Surely it’s important to know how you’re doing and try to improve. Yes, sure … except … here are a few ways that most of these will be gamed. I say will, not might. Even if you try not to game these metrics, these effects will inevitably occur. The saying “you’ll get what you measure” is true.

Velocity - 59% of users

Tracking points? Stories will tend to be estimated at higher numbers of points. All you have to do is frown a little bit when the team only does 24 points instead of last Sprint’s 30, and guess what, the points will be higher next time.

Tracking stories? Make stories smaller: you get more.

Tracking stories but not splitting them smaller? Test a bit less, refactor a bit less: you get more.

Iteration or Release Burn Down / Burn Up - around 50%

Iteration and release burn down are important for seeing where you are. When you see a turn coming up, you don’t speed up, you adjust your steering. Same thing with burn down: use it to decide what to do, not to drive your speed up.

Pushing burn down results in smaller stories (which can be a good thing). It results in less testing and less refactoring. These are not good things and they will slow you down and produce more defects. You want defects? That’s how you get defects.

Planned vs Actual Stories - 35%

Pressure me because I thought we’d have 10 stories and we got 8? What will I do? I’m already working as hard and smart as I can. So what can I do? Less work, that’s what. But you’ve defined the stories, so all I can do is less work inside each story. I can make it a bit less robust, I can test it a bit less carefully, I can leave the code in a bit worse order. Do you want those things? No, you do not.

Planned vs Actual Release Dates - 29%

Hello, what are you even talking about? A Fundamental Value of Agile Software Development is Working Software. Scrum demands that you have a tested, integrated, ready-to-ship version, containing all features to date, in hand at the end of every Sprint.

If you have that, you can hit any desired release date. You might hit it with fewer features than you imagined. In fact you will hit it with fewer features than you imagined, because some of those features you imagined aren’t worth being late for. If you order features highest value first, you’ll likely have 80% or 90% of the value in hand well before the release date.

Good Agile teams release before the planned date. Adequate teams release by the date. If you can’t release by the date, something was wrong and you bloody well should have noticed it before the final deadline.

Customer Satisfaction - 21%

Wow, here at number seven, we find customer satisfaction. Brilliant. Of course it’s hard to measure (unless you give them software very frequently). Of course it’s a lagging indicator (unless you give them software frequently). And, of course, it measures the quality of your business decisions, and maybe you’d rather blame the developers than face the quality of your own decisions about what to do.

Nonetheless, this is a decent metric, though hard to measure and hard to manage.

Speeding up now …

Let’s look more rapidly at the remaining items on VersionOne’s list:

Work-in-process is a good thing to look at. If it’s high, then work is waiting and people are likely switching tasks. That’s how you slow things down and inject more defects, so keeping WIP low is a good thing. Use WIP as a check on whether your teams are as cross-functional as they might be.
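
If your tracker can export the board, this check takes only a few lines. Here’s a minimal sketch in Python; the board data and the more-items-than-people threshold are invented for illustration, not taken from any particular tool.

```python
# A board export reduced to the essentials. Data and threshold are
# invented for illustration.
board = [
    {"story": "login page", "state": "in progress", "owner": "dana"},
    {"story": "rate limiter", "state": "in progress", "owner": "dana"},
    {"story": "csv export", "state": "in progress", "owner": "lee"},
    {"story": "audit log", "state": "done", "owner": "lee"},
]

wip = [item for item in board if item["state"] == "in progress"]
people = {item["owner"] for item in board}

print(f"WIP: {len(wip)} items across {len(people)} people")
if len(wip) > len(people):  # more in flight than people: likely task-switching
    print("High WIP: work is waiting; check whether the team is cross-functional")
```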

Defects into production and defects over time are well worth tracking. Good Agile teams release perhaps one-tenth as many defects as they did before going Agile, or even fewer. If defects don’t go down when you go Agile, look into getting your teams a workshop in Agile software practices like TDD and refactoring. If defects drift upward, your teams need to bear down on testing and refactoring. One common cause of lowered quality is pressure to go faster. That’s why measuring things like velocity is almost always negative.
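
Watching the drift is just as simple. A minimal sketch, with invented defect counts; the point is the trend, not the absolute numbers.

```python
# Escaped defects per release, invented counts for illustration.
defects_per_release = {"1.0": 9, "1.1": 6, "1.2": 4, "1.3": 7, "1.4": 11}

releases = list(defects_per_release)
for prev, curr in zip(releases, releases[1:]):
    a, b = defects_per_release[prev], defects_per_release[curr]
    print(f"{prev} -> {curr}: {a} -> {b} ({'drifting up' if b > a else 'ok'})")
```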

Budget vs actual cost is an odd one. A software team’s budget is dominated by salaries over time. The only way you can be over budget in a month, by any significant margin, is by accidentally hiring more people than you planned. This is a budgeting or management error, not a team problem. Unless you’re sending them off to too many meetings in Aruba, this is not likely to be a useful metric.

Defect resolution is also odd. Avoid defects, don’t resolve them. However, if you do have too many defects, while you’re working to beef up testing to bring down defect injection, do look at how long they’re taking to fix. If the time to fix defects lengthens, consider whether perhaps you’re not prioritizing well. Do you want more defects fixed? Create more defect-fixing backlog items.

Wait, what? You thought the Product Owner should just pile on more new stories and the team should magically fix defects in their copious free time? You were mistaken. As Product Owner, your job is to present to the team all the functionality-based backlog items that need to be done. New features, fixes to old ones: those are yours to decide upon.

Estimation accuracy. The only good thing about this item is that it is down at number 13. First of all, you don’t really mean accuracy. If they were getting everything done in less than the time estimated, you’d be happy, even though they were incredibly inaccurate. Therefore, they’ll estimate everything higher and you’ll be happy. Estimation accuracy is a bogus metric.

The best teams, these days, aren’t estimating at the story level at all. They just slice stories small and do them, complete, working, and done.

Business value delivered. Great! Best idea yet. Too bad it’s number 14 on the list. How do you improve your business value delivered? You give the team important, valuable work to do. Since your best idea is probably worth 100 times your worst, doing stories in order from highest value to lowest is the best possible plan. See above on release dates.

Individual hours per iteration / week. Are you even serious??? Do you really think that making people work more hours gets more work done at acceptable quality levels? Get out of here, you’re not even qualified to use the word “Agile” on your resume. No, seriously, just get the hell out.

Cycle time, on the other hand, can be useful. How long does it take from a story being given to the team until it’s done? More than a couple of days? Then it was probably too large or not well-enough understood. How long does it take from specifying some important need until the story gets into the team’s hands? Too long? Then you have a priority issue. Unless the story isn’t really that important. Consider the actual value stream from idea to the customer. How much of it is in your development team? How much of it is elsewhere in your organization? Fix what’s broken.
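
Cycle time is just done-date minus start-date, so it falls out of whatever dates your tracker records. A minimal sketch with invented stories and dates; the couple-of-days threshold comes from the paragraph above.

```python
from datetime import date

# Invented stories: (name, started, done).
stories = [
    ("search box", date(2024, 3, 4), date(2024, 3, 5)),
    ("permissions rework", date(2024, 3, 4), date(2024, 3, 12)),
    ("error page", date(2024, 3, 6), date(2024, 3, 7)),
]

for name, started, done in stories:
    days = (done - started).days
    flag = "  <- too large, or not well enough understood?" if days > 2 else ""
    print(f"{name}: {days} day(s){flag}")
```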

Test pass/fail over time sounds good but I think it’s a red herring. Unless you have a defect-ridden shambling hulk to begin with, the tests should all stay green all the time and the number of tests should increase roughly linearly with the number of backlog items in the system. If you have red tests for a long time, you have a problem. Use this as a retrospective topic, not as a team measurement topic.

Scope change in a release is interesting. It seems to imply that you were supposed to know up front what you were going to build, and somehow you got it wrong. Well, hell yes. Your Product Owner’s job is to deliver highest value backlog items first. It is to get the highest possible return on the work of the team, week in and week out. It is not to pump through some fixed list of stuff, and it is not to predict up front, other than in very general terms, what will be done. If you’re looking at scope change in a release, don’t show up at the next Agile conference unless you want me to rip the Agile badge off your shirt.

Cumulative flow chart might be interesting. I’d suggest graphing both story count, which should go up roughly linearly over time, and value, which should tail off (because you did the high value things first, remember?). If the story count declines, you have material for a retrospective. In no case do you have material for comparing teams to each other. You might have material for comparing product ideas to each other.
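
Here’s roughly what I mean, as a minimal sketch. The per-sprint counts and values are invented; what matters is the shapes: story count accumulating roughly linearly, value flattening because the high-value work came first.

```python
# Per-sprint stories and value, invented. Watch the cumulative shapes.
stories_done = [6, 7, 6, 8, 7]    # roughly flat, so the count grows linearly
value_done = [50, 30, 18, 10, 5]  # tails off: high-value work came first

count_total = value_total = 0
for sprint, (n, v) in enumerate(zip(stories_done, value_done), start=1):
    count_total += n
    value_total += v
    print(f"sprint {sprint}: {count_total} stories done, {value_total} value delivered")
```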

Earned value is popular among some large organizations. It is thought by some to be a useful way to compare the amount of work done with the time and money allocated. I think it probably works pretty well for building a bridge or a road. EVM, however, requires a detailed plan in advance. If you have a detailed plan in advance, it is unlikely that you’re really doing an Agile method, even if you’re using some techniques you think to be “Agile” such as Sprints or even TDD.

EVM, with a fixed staffing budget, would typically show a straight line of value vs time or budget. It would assume linear growth of “value”. Since you’re picking your backlog items highest value first, your EVM should stay ahead of the line for the entire project, if you’re really good at budgeting the costs of features you’ve never built. It is somewhat possible to estimate how long it’ll take to build a bridge or road. (Add your own list of over-budget bridges and roads here.) It is nearly impossible to budget how many features you can do in a new product with a new team. It may be somewhat possible to budget a product that’s very similar to another one you just did.
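
To see why highest-value-first stays ahead of the straight line, here’s a small worked sketch with invented numbers: planned value accrues linearly against a fixed budget, while actual value is front-loaded.

```python
# A fixed budget spent evenly implies a straight planned-value line.
# Highest-value-first delivery front-loads actual value. Invented numbers.
periods = 5
total_value = 100
planned = [total_value * (i + 1) / periods for i in range(periods)]

value_per_period = [40, 25, 18, 10, 7]  # highest value first; sums to 100
actual, running = [], 0
for v in value_per_period:
    running += v
    actual.append(running)

for i, (p, a) in enumerate(zip(planned, actual), start=1):
    print(f"period {i}: planned {p:5.1f}, actual {a:3d}, ahead by {a - p:+.1f}")
```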

You’ll be getting the idea that I don’t care for EVM. I do not. I know one person who I think is pretty good at it. If you would like me to put you in touch with him, let me know. But unless you’re required to use EVM by law, don’t, just don’t.

Customer retention is a fair idea. Very lagging indicator, hard to relate to your decisions, and just about as weak a customer-based indicator as you can think of.

Revenue / sales impact is also very laggy and hard to relate to individual product decisions. Certainly it’s something to watch but I’m not clear at all how it relates to “Agile qua Agile”.

Product utilization can be great. I know a company that tracked clicks on individual features, in an advertising situation. They found that valuable in planning how to serve up ads. And most of us can think of at least one feature in our commonly used apps that we’d like to have left out. So if you track utilization at a fine enough grain, close enough to your decisions, you might learn how to make good decisions.

OK, Ron, do you like ANY metrics?

Comparing one team to another? I don’t like comparison metrics very much, because comparing teams to one another is worse than comparing apples and oranges. At least apples and oranges are both fruit and fairly round. Teams, not so much. I do recall one case of a team comparison that worked well, however:

At Atlas, when they existed, they had many departments all doing a Scrum kind of process. Most of the teams were learning to do small stories and that helped them, because small stories generally focus your attention better, help you select value better, and help you see how you’re doing. One team, however, a sort of systems department, kept doing large stories that took months to do. No amount of advice seemed to change it.

Then Nathan McCoy, who was in charge of all the teams, started including a simple graph in his monthly status reports, showing the number of stories done by each team. The systems manager objected to this, because it made him look bad. Other teams were doing ten or twenty things, and he might get one per month. As Nathan told me the story, the conversation went like this:

Manager: “Nathan, this isn’t fair. You’re just showing the number of stories, not how big they are.”
Nathan: “That’s right.”
Manager: “But that’s not fair!”
Nathan: [silent]
Manager: “All I’d have to do would be to divide up my stories into little bits and release those every month.”
Nathan: [silent, smiling]
Manager: “Oh.”

Soon, the manager was doing small stories, to the benefit of everyone.

That’s how you do a metric!

The point is that comparative metrics will be met, so you need to be very careful when you set them up, because they will be met … somehow.

In-team, however …

When the team chooses metrics to tell themselves how they’re doing, I’m all for it. Here are a few I’ve found useful.

Graph number of tests vs number of stories done. This will give you a quick view into whether the team is trying to go too fast.
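
A minimal sketch of that graph as a per-sprint ratio, with invented numbers; a falling tests-per-story ratio is the warning sign.

```python
# Tests added and stories done per sprint, invented. A falling ratio
# suggests the team may be cutting testing to go faster.
tests_added = [24, 30, 22, 12, 9]
stories_done = [6, 7, 6, 7, 8]

for sprint, (t, s) in enumerate(zip(tests_added, stories_done), start=1):
    print(f"sprint {sprint}: {t}/{s} = {t / s:.1f} tests per story")
```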

Put a small square on the white board. When a code section needs improvement, put a sticky note in that square. This is a quick way for the team to indicate when they’re encountering crufty code. Check this square in each retrospective. Bear down on refactoring when it starts filling up. And really, the “small” in “small square” is important: maybe a foot square, if you use three-inch stickies.

Graph acceptance tests written versus green. I’ve seen this metric indicate that the team had written tests but had not been provided “correct” answers by the Product Owner team. You might find it useful.

Velocity, in terms of stories done, is worth looking at. It will go up and down and you can learn something from thinking about it. However, be very careful: you’re not looking at worker effort or team effort or anything of that kind. But when the curve of progress changes, you are looking at something. Use your retrospective to figure out what it is.

Here’s an older article on Big Visible Charts, to give you some ideas. I might see problems in some of those that I didn’t see when I wrote the article back in 2004, particularly regarding possible misuses.

Overall, be very cautious about trying to measure a team by metrics. Do use metrics to flag things for the team to look at. Very likely you shouldn’t use them to compare teams at all.