Hours Estimation

There has been a discussion over the past week or so on the scrumdevelopment Yahoo(!) list.

Christofer Jennings asked whether people were using hours estimation for tasks, saying that he had often found it to be wasteful, especially when you have a task board showing how things are going. Christofer did say that he has found hours estimation useful for deciding how much work to take on, but he went right back to his concerns, including people talking about the wrong thing (hours rather than the work) or just tuning out.

Some people were in favor of estimation of tasks in hours, and said they generally do it. Some found it useful for new teams, as did Christofer.

Others were much less in favor of tasks, task estimation in hours, task estimation at all, or estimation of any kind. I don’t mean to imply that fame correlates with being right, but by and large the people who were less in favor seemed to be names that I recognized as having been around a long time.

Perhaps they just wanted people to get off their lawn, but I don’t think so. I believe that broad experience tends to reveal things that are likely to be true, and that it’s likely that hours estimation of tasks is less than ideal.

Tell us what you think, Ron!

(Said no one, ever.)

Let’s limit our discussion to the practices of breaking down proposed Sprint Backlog Items into tasks, working in tasks, estimating those tasks in hours, burning down the hours, and tracking planned versus actual hours to see how you did. We’ll proceed last to first.

Tracking planned vs actual

Tracking actual versus planned leads to trouble.

If we have estimated task hours, we can track actuals. If they differ, there’s obviously something wrong. The natural question is “what went wrong with your estimate?” The natural response is to get better at estimating.

Since there are numbers involved, it is very tempting for management, or people with a management bent, to try to manage to the numbers. Focus on getting the numbers to match. Even worse, focus will probably be on being sure actuals are always less than estimated. It’s OK to be early; it’s never OK to be late.

You get what you ask for. What you’ll get is estimates that are large enough that they’ll rarely be exceeded. That will cause the team to take on less work and that will slow you down.
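The arithmetic behind that slowdown can be sketched in a few lines. Everything here is invented for illustration: the 60-hour Sprint capacity, the 6-hour honest estimate, the 50% safety pad, and the helper `tasks_committed` are all assumptions, not figures from any real team.

```python
def tasks_committed(capacity_hours, estimate_per_task):
    """How many tasks fit in the Sprint when we plan by hours estimates."""
    return int(capacity_hours // estimate_per_task)

CAPACITY = 60           # assumed hours available in the Sprint
HONEST = 6              # assumed honest estimate per task, in hours
PADDED = HONEST * 1.5   # assumed pad, added so actuals rarely exceed estimates

honest_plan = tasks_committed(CAPACITY, HONEST)
padded_plan = tasks_committed(CAPACITY, PADDED)

print("tasks taken on with honest estimates:", honest_plan)
print("tasks taken on with padded estimates:", padded_plan)
```

Under these made-up numbers the team plans ten tasks with honest estimates and only six with padded ones, even though nothing about the work itself has changed.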

Tracking actual hours versus estimated tends to lead to trouble. Is this inevitable? Certainly not. Similarly, it is not inevitable that if you play in traffic, you’ll get hit by a truck.

Burning down hours

Burning down hours is an inefficient way to track.

OK, we don’t try to make estimated and actual line up, but we do use hours estimates of work remaining to burn down the tasks in our backlog. If you squint your eyes just right, you can see the Scrum Guide telling you to do something like this.

If you do this, you’ll surely know whether tasks are getting done, and whether one is dragging on and on.

There are easier and more common ways to know this. The most common is a task board, showing all tasks in a few simple states like Not Started, Running, and Done. If something stays in Running a long time, you’ve got a problem. Can’t remember from day to day? Put a red dot on a card in Running, every morning. See lots of red dots? Trouble.
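For teams whose board lives in software rather than on a wall, the red-dot idea might look something like the sketch below. The `Card` class, the `morning_dots` and `trouble` helpers, the card names, and the three-dot threshold are all hypothetical; the point is only that the board needs states and dots, not hours.

```python
from dataclasses import dataclass

@dataclass
class Card:
    name: str
    state: str = "Not Started"   # "Not Started", "Running", or "Done"
    red_dots: int = 0

def morning_dots(board):
    """Every morning, put one red dot on each card still in Running."""
    for card in board:
        if card.state == "Running":
            card.red_dots += 1

def trouble(board, max_dots=3):
    """Names of Running cards that have collected max_dots or more (assumed threshold)."""
    return [c.name for c in board
            if c.state == "Running" and c.red_dots >= max_dots]

board = [
    Card("login form", "Running"),
    Card("export report", "Running"),
    Card("audit log"),
]

for day in range(3):
    morning_dots(board)
    if day == 0:
        board[1].state = "Done"   # "export report" finishes during the first day

print(trouble(board))
```

After three mornings only the card that never left Running shows up as trouble, and nobody had to estimate or re-estimate anything to find it.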

Burning down hours is OK, but inefficient. And having the hours leads to tracking them at Sprint end, which per the previous section, is less than ideal.

(Editor’s Note: When Ron says “less than ideal” he means “bad”. He’s just being nice for some reason we do not comprehend.)

Estimating tasks in hours

Estimating task hours doesn’t pay off.

According to the argument above, we don’t need task hours for any external purpose: they’re not good for managing either day to day or Sprint to Sprint. Do they have some other benefit, such that we should use them and then perhaps throw them away?

If we’re talking about estimating tasks, there are few arguments in favor of estimating for internal purposes: we’ve already had whatever design discussion caused us to know the tasks.

No real payoff for task hours if you’re not going to burn them or track them, and you shouldn’t burn them or track them for the reasons above.

Estimating stories in hours

Estimating stories in hours can be useful. But not tasks.

When it comes to stories, even if we’re not going to burn or track hours, there’s a bit in favor of using them.

Dave Farley, in a Twitter conversation with Kate Oneal that I was copied on, told Kate that his team re-established estimates because they missed the design discussions that resulted, and that they throw the estimates away. Joe Sadowski similarly told her that planning poker creates discussions when there’s not immediate agreement.

I can see how that could happen. I say it’ll take a day, you say four. If I am wise, I’ll be all like “What?” and find out what you’re seeing that I’m not. Quite often we’ll find that one of us has misunderstood the story, or mistaken how hard something is. We’ll learn something from the discussion.

Doing that in a context of estimating can be a quick way of identifying stories where we’re not all on the same page.

I think a quick planning-poker style session (show cards, move on if they’re close enough, discuss if not) could be a quick way to help the right discussions happen without wasting time on things that won’t pay off.
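The "close enough" rule can be as crude as the sketch below. The function name `needs_discussion` and the factor-of-two cutoff are assumptions made up for this example; a team would pick whatever spread it finds worth talking about. Note that the numbers are only compared with each other and then discarded.

```python
def needs_discussion(cards, max_ratio=2.0):
    """True when the highest card is more than max_ratio times the lowest.

    `cards` holds the positive numbers the team members showed.
    The cutoff of 2.0 is an assumed default, not a recommended value.
    """
    return max(cards) > max_ratio * min(cards)

print(needs_discussion([1, 4]))      # one says a day, another says four
print(needs_discussion([2, 3, 3]))   # close enough, move on
```

The first call flags the story for discussion and the second does not, which is all the game is being asked to do here.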

There’s no reason to do this with hours or days, though if you’re comfortable with them, there’s no harm. Just don’t fall into the trap of tracking and justifying the numbers simply because you have them.

I suppose it “ought” to be possible to have the design discussions without the poker, but given that we’re just using the game to decide what to talk about, it seems like a decent practice.

Working in tasks

Working in tasks leads to integration problems and to waste.

So far, we have found little value, even negative value, in estimating tasks. What about dividing up the work into tasks and working that way?

If we figure out the various steps to implement a story, we’ll perforce think about it a lot, do some design, and so on. That will be good. Design is good.

However, if we then actually work that task plan, many things can happen, and most of them are bad.

We might complete each task, put them together, and have the story actually work. It could happen. More often than not, we think the tasks are complete, and when we integrate them, the story does not work. This is “less than ideal”, q.v. It leads to finger-pointing, other finger gestures, and to rushing at the end of the Sprint to figure out what’s wrong.

Working in tasks essentially causes this to happen: whoever is working on each task has no feedback, until all the tasks are done, as to whether they are doing the right thing or not. Working without feedback leads to error.

But let’s look a bit more deeply at what can go wrong when you work on task A1 and I work on task A2, adding up to story A. Each of us can do the wrong thing or the right. We won’t know until the end. We can each do too little work, leaving something out, or too much work, putting something in that isn’t even needed. Or we can get it just right.

What are the odds? The odds are that we’ll have trouble when we integrate and the odds are that there will be code rushed in at the last minute and that there will be other code that need not have been done at all.

Waste, that’s what happens. Working task breakdowns is often ineffective and often inefficient. Don’t be that person. Work in complete stories instead. (vide infra)

Identifying task breakdowns

As a design step, task breakdown can be useful. Just don’t build that way.

When we plan to build something even slightly complicated, it can be quite valuable to have a “quick design session”, where we sketch out what we think we’re going to do. This amounts to a task breakdown, and it can be quite OK to do. There’s great value to doing this in small groups, at least the pair who are going to work on the story, and often it’s helpful to bring in some other people.

I’ve even seen value in the Product Owner sitting in on these sessions, because quite often they’ll hear something that tells them we have a misunderstanding.

It’s fine to write these implementation ideas down as tasks on cards or in a list. But as we’ve seen above, there are few advantages to building that way, and many disadvantages.

Estimates, hours and other: mostly no

So here we are, in my experienced if not wise opinion:

Designing using tasks can have value.
Building with tasks is almost always inferior.
Estimates can trigger useful conversations.
Tracking estimates is always inefficient and often harmful.

Stories: better than tasks

Complete stories work better than tasks. Keep them small. No estimates required.

I promised (vide supra) to talk about working with whole stories, rather than tasks. Here goes.

First of all, stories (or Product Backlog Items, if you prefer) are what the Product Owner wants. The PO doesn’t want tasks: they’re just how we might organize our work. Therefore, when a team focuses on stories, they are better focused on what’s actually needed.

Second, a need or desire to break up the work is often driven by various specialties in the team. I’m the database person, you’re the GUI person, Sam is the tester, and so on. The whole team may be cross-functional, but as individuals we are specialized, and therefore limited. It would be better if we could all work on most anything, perhaps leaving the difficult database issues to me, the tough GUI topics to you, and the most complex testing to Sam. When we work together in pairs, or in a mob, we all learn more. It’s better to have some breadth in our capabilities and it doesn’t get in the way of deep knowledge in our favorite areas.

Working in stories serves the Product Owner better, and it serves the team members better as well. It’s not the only way to go. It’s OK to work in tasks. And it’s likely better to work in stories.

But stories are too big!

I heard that. If your stories are too big, they were too big when you were doing tasks. Either you got them done inside one Sprint, or you didn’t. If you didn’t, then you didn’t really deliver a “done” increment of product, or, as I’ve called it for years, “Running Tested Software”.

If your stories are too big, make them smaller.

Isn’t that a form of estimation?

When someone asked that question in a Twitter conversation with Kate Oneal the other day, she said that if she were Miss Sticky Semanticist she might call that estimation, but that she was thinking of estimates as the numbers that “they” use. “They” meaning management, I assume.

Chet and I recommend that stories be no more than a couple of days to implement. We aren’t interested in whether it’s one or two: we’re interested in whether it’s small.

Neil Killick has pointed out that a good rule of thumb is whether the story can be defined in a single acceptance test. That’s a brilliant idea, because there’s no estimation involved, but there is some good analysis necessary to define the test, which addresses the only value we’ve thought of for estimates inside the team, namely taking a deeper look at the story.

So, no, it’s not really a kind of estimate to say that stories are small. And it is a good thing to do. It keeps our eye on what the Product Owner wants, it helps the team be more cross-functional, it makes it easier to keep boring ideas out of the features. Small stories: the way to go.

Projecting when we’ll be done

I expect that many of you will be agreeing just now, but some will still be concerned that estimates are somehow God’s way of doing things, or at least the VP’s way. Well, my heart goes out to you, but that doesn’t make estimates the only way, or the best way. In my opinion, they are neither. Small stories work better. Acceptance tests work better.

So here are a couple of challenges:

What if all stories were approximately the same size? Then what could we do with story estimates that we couldn’t do with story counts?
What if all stories were one acceptance test? What could we do with story estimates that we couldn’t do with story counts (or, now, acceptance test counts)?
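One way to poke at the first challenge is to run a projection both ways. This is a minimal sketch under stated assumptions: the history of stories finished per Sprint, the 20 stories remaining, the 3 points per story, and the helper `sprints_remaining` are all invented for illustration.

```python
import math

def sprints_remaining(done_per_sprint, remaining):
    """Project Sprints left from average past throughput, rounded up."""
    average = sum(done_per_sprint) / len(done_per_sprint)
    return math.ceil(remaining / average)

SIZE = 3                    # assumed estimate per story, identical for every story
stories_done = [5, 4, 6]    # assumed stories finished in the last three Sprints
stories_left = 20           # assumed stories remaining in the backlog

# Projection from plain story counts.
by_count = sprints_remaining(stories_done, stories_left)

# Projection from estimates: every count is multiplied by the same size.
by_points = sprints_remaining([n * SIZE for n in stories_done],
                              stories_left * SIZE)

print(by_count, by_points)
```

When every story carries the same size, that size multiplies both the history and the remaining work, so it cancels out and both projections give the same number of Sprints. The estimate adds nothing the count did not already provide.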

TL;DR

There are many good reasons not to do task-level estimates or even story-level estimates. There are very many good reasons not to track them, not to justify them, not to try to improve them.

You can do almost everything with story counts or acceptance test counts that you can do with task or story estimates. When you can, it works better in almost every way.

If you know of a situation where this thinking doesn’t hold up, I’d like to hear about it.