Estimation, COSMIC and Other

Some thoughts after another pleasant conversation with Colin Hammond of ScopeMaster.

I had another very interesting and enjoyable chat yesterday, with Colin Hammond, the creator and purveyor of ScopeMaster. I recommend that you have a look at the site: it’s a very interesting and capable product. There’s even a very quick and dirty example of some of my Dungeon stories.

I’m recommending that you look at it. I’m not saying anyone should, or should not, try the product. I rather hope that some people in our community do try it.

I’ll talk below about what I’ve seen and try to give a sense of how and why it works, but first I want to sum up my position on estimation in general and ScopeMaster in particular.

Views as of 06/03/2022 0817 hours

Based on years of software development, including over two decades in the Agile style, I am confident that teams can readily do fairly decent size estimation of stories.

I’ve seen teams do estimation well, using estimated “ideal” time, story points, or simple story counts. These estimates tracked well with actual results. I say teams can do it because I’ve seen teams do it.

I do not recommend story-level estimation of any kind, nor do I plan to change that view in the future. That being said, if you have to estimate, it is possible to do it well.

I am also not recommending against story-level estimation, at least not very strongly. I think that story estimation is easy to use in a harmful fashion, and as such, I lean away from it. I understand and somewhat support the “No Estimates” ideas, and I understand and somewhat support those who say that estimates are often valuable. My personal inclination is not to do them.

Based on brief views of the product, I believe that ScopeMaster does very interesting and potentially valuable analysis of, and reporting on, specifications written in plain English.

I have seen the product used in real time, seen the output that it produces, and seen it change its output dynamically based on changes to the free-form text input one provides. I think that it does what it says on the box. I would love to see what happened if it were used on some real Agile projects, ideally with some real Agile experts involved with the teams.

Why Estimation [Kind of] Works

As someone who started writing software over six decades ago and who wrote some yesterday, I would say that to a surprising degree, a lot of software, if not all software, is pretty much the same. A given software module has some Gozintas and Gozoutas, it accesses some data, it updates some data, and in a lot of software, that’s the end of it.

(You can think of exceptions. So can I. And yet, even inside those exceptions, objects and functions have Gozintas and Gozoutas, they access information and update it …)

When we write specifications for a software feature, we might start with a simple Connextra (ptui!) style story:

As a payroll administrator, I want to access an employee’s record, so that I can update their information.

Now if we have a conversation about that story we’ll ask questions, and take notes, and maybe even write a better story:

As a payroll administrator, I want to access an employee’s record using their employee number, so that I can update their pay rate, shift number, experience status, and/or hours worked.

Face it, we really can estimate some of this stuff.

Can we estimate this story? Sure we can. We have to imagine what we’re working on, but presumably, if this is a real situation, we know that we’re working on the admin GUI pages. So we know that for this page, we want to enter the employee number and then fill in the employee info, according to some list like name, id, and the fields listed above. We want to be able to type values into each of the data fields, and we want a save button and a cancel button.

Unless this is day one of the project, we know our tools and our product and we probably know that there was a very similar form done last week, and this one looks to be about twice as complicated as that one.

If we are using story points, we might say “That form last week was a 3. I’d give this one about a 6.” Or we might say “That form I did last week took me almost two days. This one looks at least twice as hard. I need four days, maybe all week.”

Even if we don’t say things like that out loud, even if we don’t publish our numbers, we very quickly build up a reasonably solid intuition about how big stories are.

As I said above: I know estimation is possible because I’ve seen it done.

Now let’s imagine a computer program that could help us with that work.

Hey Siri, How Long Will This Take?

Earlier this week, I asked Siri:

“Hey, Siri, what’s that thing on the end of a shoelace called?”

And Siri told me it is an “aglet”. Amazing.

Suppose we put some decent “natural language recognition” software to work on some fairly decent user stories. Suppose that the “natural language recognition” could identify the actors, the action and the acted-upon. “admin updates pay rate”. “senior_admin deletes employee record”.

It’s clear that such things are within the scope of today’s language recognition. Siri, Alexa, and Cortana do it every day.
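To make “actor, action, acted-upon” concrete, here is a toy sketch in Python. It is nothing like what Siri or ScopeMaster actually do inside. The word lists `ACTORS` and `ACTIONS` and the function `extract_triple` are hypothetical, invented only for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical, deliberately tiny vocabularies. Real tools use trained
# language models and parsers, not hand-written word lists.
ACTORS = {"admin", "senior_admin"}
ACTIONS = {"creates", "reads", "updates", "deletes"}

def extract_triple(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (actor, action, acted-upon), or None if the sentence doesn't fit."""
    words = re.findall(r"[a-z_]+", sentence.lower())
    for i, word in enumerate(words):
        # An action word directly after a known actor, with something after it.
        if word in ACTIONS and i > 0 and words[i - 1] in ACTORS and i + 1 < len(words):
            return words[i - 1], word, " ".join(words[i + 1:])
    return None

print(extract_triple("admin updates pay rate"))
# ('admin', 'updates', 'pay rate')
print(extract_triple("senior_admin deletes employee record"))
# ('senior_admin', 'deletes', 'employee record')
```

A sentence that doesn’t match the pattern comes back as `None`. A real tool would need to handle that case somehow, rather than pretend to understand the sentence.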

The stories for a given IT application will generally use roughly the same words for the same ideas throughout. They’ll use roughly the same nouns, verbs, and objects. Consider these stories for a moment:

The admin creates a new employee record including name, id, address (street, city, state), phone numbers (up to 5), work location, and pay rate.
The admin can update an employee record including phone and pay rate.

It’s not hard to believe that a tool could be written to observe that the update doesn’t mention the other fields, and issue a report line on the second story.
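Here’s a minimal sketch of that check. It assumes the fields have already been pulled out of each story and normalized. I did that by hand, treating “phone” as “phone numbers”, which is exactly the kind of judgment a real tool would have to make. The names are hypothetical.

```python
# Field lists pulled by hand from the two example stories, with "phone"
# normalized to "phone numbers" (an assumption a tool would have to make).
CREATE_FIELDS = {"name", "id", "address", "phone numbers", "work location", "pay rate"}
UPDATE_FIELDS = {"phone numbers", "pay rate"}

def unmentioned_fields(create_fields: set, update_fields: set) -> set:
    """Fields set when a record is created that the update story never mentions."""
    return create_fields - update_fields

for field in sorted(unmentioned_fields(CREATE_FIELDS, UPDATE_FIELDS)):
    print(f"Update story doesn't mention '{field}'. Intentional?")
```

The report doesn’t say the update story is wrong. It just gives a human something to ask about.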

It’s not hard to believe that a tool could be written to understand enough CRUD so that it would ask whether there was a missing story about deleting an employee record.
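A CRUD completeness check could be as simple as the sketch below. It assumes the stories have already been reduced to (actor, operation, entity) triples. The function `missing_crud` is hypothetical. For the two employee stories above, it would ask about delete, and about read as well.

```python
from collections import defaultdict

CRUD = {"create", "read", "update", "delete"}

def missing_crud(triples):
    """Given (actor, operation, entity) triples, report CRUD operations no story covers."""
    seen = defaultdict(set)
    for _actor, operation, entity in triples:
        seen[entity].add(operation)
    # Keep only the entities that are missing at least one operation.
    return {entity: CRUD - ops for entity, ops in seen.items() if CRUD - ops}

stories = [
    ("admin", "create", "employee record"),
    ("admin", "update", "employee record"),
]
print(missing_crud(stories))
```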

It turns out that ScopeMaster does things like that. I’ve seen it do them instantly on a small set of stories and Colin tells me it takes minutes, not days, to do a thousand stories.

Perhaps the most valuable thing it does is this: if it can’t make sense of a requirement, it flags it in red as something it cannot understand. Oh, look, the story says “adm prs wrg btn see Debbie”.
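I don’t know how ScopeMaster decides that it can’t understand something. Here is one crude way a tool might do it, offered only as an assumption: flag a story red when too few of its words are in a known vocabulary. `KNOWN_WORDS`, `is_unclear`, and the 0.6 threshold are all made up.

```python
import re

# Hypothetical project vocabulary. A real tool would draw on a full
# dictionary plus the terms the project's own stories use.
KNOWN_WORDS = {
    "the", "a", "an", "admin", "creates", "updates", "deletes",
    "employee", "record", "pay", "rate", "phone", "see", "button",
}

def is_unclear(story: str, threshold: float = 0.6) -> bool:
    """Flag a story when fewer than `threshold` of its words are recognized."""
    words = re.findall(r"[a-z]+", story.lower())
    if not words:
        return True  # an empty story is certainly unclear
    known = sum(1 for word in words if word in KNOWN_WORDS)
    return known / len(words) < threshold

stories = [
    "The admin updates an employee record",
    "adm prs wrg btn see Debbie",
]
for story in stories:
    print("RED " if is_unclear(story) else "ok  ", story)
```

Only “see” is recognized in the Debbie story, one word in six, so it goes red.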

Hammond tells me that it’s not uncommon for ScopeMaster to be unable to understand around half of the input specs it’s given. If you’re the kind of outfit that likes written specs, that’s actually a pretty valuable result right there. “Here, Jack, go find out what Debbie wanted and write up the story”.

What About Estimation Though?

ScopeMaster uses COSMIC estimation, a simplified, much easier kind of “function point” estimation. You count entries, exits, reads and writes. These are all reasonably well-defined, for those who care to do COSMIC by hand. But ScopeMaster, using whatever natural language / AI kinds of things it has inside, does it for you.

I don’t find this to be incredible. Spec language is usually pretty regular and there are only so many words that probably mean “read”: get, access, fetch, look at, view … and so on. So ScopeMaster presumably identifies such words in each spec item and adds them up.
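Here’s a toy version of that idea. In COSMIC, each data movement (Entry, Exit, Read, Write) counts as one COSMIC Function Point (CFP). The synonym lists below are my guesses, not ScopeMaster’s vocabulary. Real COSMIC counting is also more careful: it counts movements of data groups per functional process, and one word such as “updates” may hide both a Read and a Write.

```python
import re
from collections import Counter

# Hypothetical synonym lists: a guess at how words might map onto COSMIC's
# four data-movement types. Not ScopeMaster's actual vocabulary.
MOVEMENT_WORDS = {
    "Entry": {"enter", "enters", "input", "inputs", "submit", "submits"},
    "Exit": {"display", "displays", "show", "shows", "report", "reports", "print", "prints"},
    "Read": {"get", "gets", "access", "accesses", "fetch", "fetches", "view", "views", "read", "reads"},
    "Write": {"save", "saves", "update", "updates", "create", "creates", "delete", "deletes", "store", "stores"},
}

def count_movements(requirement: str) -> Counter:
    """Count words in a requirement that suggest each data-movement type."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", requirement.lower()):
        for movement, words in MOVEMENT_WORDS.items():
            if word in words:
                counts[movement] += 1
    return counts

def cosmic_size(requirements) -> int:
    """Each data movement is one CFP, so the size is just the total count."""
    return sum(sum(count_movements(r).values()) for r in requirements)

requirement = ("The admin enters an employee number, views the employee record, "
               "and updates the pay rate; the system displays a confirmation.")
print(count_movements(requirement), cosmic_size([requirement]))
```

That requirement comes out as one Entry, one Read, one Write, and one Exit, for a size of 4.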

So what? So … you now have a COSMIC-style “size” for your specified problem. How long will it take? COSMIC and ScopeMaster don’t say. But you, the organization building things, have information that can be used to suggest, hint, indicate, even guess how long it’ll take to do 500 COSMIC points, because you can run your other projects through it, get their points, and use what you know about the staff size and time required for those other projects to give you an idea about this one.
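Calibration might look like this: divide each past project’s actual effort by its size to get a rate, then scale the new project’s size by a typical rate. The project numbers below are made up purely for illustration. Use your own history, and don’t trust the point estimate any further than your history justifies.

```python
from statistics import median

# Made-up history: (COSMIC size in CFP, actual effort in person-days).
# Replace with your own organization's completed projects.
past_projects = [
    (320, 400),
    (150, 210),
    (610, 690),
]

def days_per_cfp(history):
    """Effort per CFP for each past project."""
    return [effort / size for size, effort in history]

def point_estimate(size_cfp, history):
    """Scale the new project's size by the median historical effort rate."""
    return size_cfp * median(days_per_cfp(history))

print(point_estimate(500, past_projects))  # 625.0 person-days with this made-up data
```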

Size to cost and time are up to you.

Is it perfect? Hell, no. Is it better than no estimate? It sure is if you’re bidding on a project, or if your boss demands to know how many people and how much time you need. You can multiply the COSMIC numbers by whatever constant you want. Apply a range. You’d be a fool not to.
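Applying a range can be as simple as using your fastest and slowest historical rates instead of a single number. Here is a minimal sketch with hypothetical rates. The function name is my own.

```python
def effort_range(size_cfp: float, rates_per_cfp: list) -> tuple:
    """Low and high effort, using the fastest and slowest historical rates."""
    return size_cfp * min(rates_per_cfp), size_cfp * max(rates_per_cfp)

# Made-up rates (person-days per CFP) from three hypothetical past projects.
rates = [1.0, 1.25, 1.5]
low, high = effort_range(500, rates)
print(f"500 CFP: somewhere between {low:.0f} and {high:.0f} person-days")
```

You could widen the range further with whatever contingency factor your experience suggests.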

That Can’t Work!

Someone out there is saying “You can’t just add up the reads and writes from all your stories and magically know how big the product is!!!”.

But don’t forget, we do it ourselves in a matter of minutes. “This one is about twice as big as that one”. We’re using our experience to identify something about those two stories that makes us think the new one is twice the old one. What are we identifying? Probably steps, or fields accessed, or the like.

Where do we fail? Well, we fail when inside that simple story there is some algorithm that we don’t yet know or haven’t estimated. “Calculate the shortest route to deliver packages to these addresses”. Yeah, that’s one function call, sortToMinimizeTravel, but that function … I don’t know how to write it, I don’t know how to represent the map, I know the problem expands massively as the number of targets increases … and I don’t know how to size that story.

ScopeMaster doesn’t know either. It’ll say something like, OK, one entry, one read, one write, one exit, that’s a four, moving right along.
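A few lines of arithmetic show the mismatch. The size count for the routing story stays fixed, while the number of orderings that a naive, brute-force search would have to try grows factorially with the number of stops. Real route planners use smarter algorithms and heuristics. This sketch only shows why counting data movements says nothing about algorithmic difficulty. `DATA_MOVEMENTS` is a hypothetical fixed size.

```python
from math import factorial

def brute_force_routes(stops: int) -> int:
    """Orderings a naive search must consider to visit every stop once."""
    return factorial(stops)

# Hypothetical size for "calculate the shortest route": the same number of
# data movements no matter how many stops there are.
DATA_MOVEMENTS = 4

for stops in (5, 10, 15, 20):
    print(f"{stops:>2} stops: size {DATA_MOVEMENTS} CFP, {brute_force_routes(stops):,} orderings")
```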

ScopeMaster isn’t better than we are. It’s not as good in some ways, but possibly better in others. In particular, it doesn’t get tired. It’s good at tossing out specs that it can’t understand. It’s good at picking up inconsistencies across stories. And it’s amazingly good at drawing interesting pictures and diagrams, if you’re into pictures and diagrams.

Based on what I’ve seen of it, ScopeMaster is interesting, and provides useful information from individual stories and from a batch.

The value may be in story quality.

Most of that information isn’t about estimates. It’s about the quality of your stories. ScopeMaster quickly identifies the ones that are too vague or weird for its limited understanding. Right there, we’d quickly learn to be more clear. That can’t hurt.

It produces sequence diagrams and object diagrams for the requirements. I think it can do a diagram for just one requirement or for a group. I would take those diagrams with more than a grain of salt. They represent what ScopeMaster’s “AI” got out of the requirements. They don’t represent what a good human designer might get. But they are interesting, and looking at them, I often thought either “Yes, that’s roughly what I’d do” or “No, that’s wrong”, which usually meant that the story was wrong.

I had that experience often.

Expert Interaction

As Hammond demonstrated the product to me yesterday, I had a number of experiences with a common aspect. We’d be looking at some requirements, in some app, in some random format. I’d see something odd in one of ScopeMaster’s displays. I might see two words in the cloud that clearly, to me, referred to the same person. In the Dungeon example that he did, I had said “level designer” sometimes and “designer” sometimes, but I meant the same user. I spotted that instantly from the ScopeMaster output.
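I spotted “designer” versus “level designer” by eye. A tool could at least raise the question with a crude heuristic like the one below, which pairs actor names where one is a trailing part of another. The heuristic is my own assumption, not something I know ScopeMaster does, and a human still has to decide whether the two names really mean the same user.

```python
def possible_aliases(actors):
    """Pairs where one actor name is a trailing part of another, such as
    'designer' and 'level designer'. Candidates for a human to review."""
    pairs = []
    for shorter in actors:
        for longer in actors:
            if shorter != longer and longer.endswith(" " + shorter):
                pairs.append((shorter, longer))
    return pairs

print(possible_aliases(["level designer", "designer", "player"]))
# [('designer', 'level designer')]
```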

In another instance, ScopeMaster had drawn a sort of flow chart of how something needed to be tested (one of its tricks is that it can identify missing paths). But I looked at the proposed tests and saw, instantly, that we wouldn’t do that thing that way. The requirement was written poorly and needed to be refined in the light of a better way to do the thing.

Note that it would take a developer / engineer / technical person of some kind to identify that problem. Reading a thousand requirements, they might not notice the issue. Looking at the flow charts, they might notice more easily. I certainly did.

Of course you’d still have to look at a thousand flow charts, but at least you wouldn’t have to draw them as you read the specs.
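For a sense of what “identifying missing paths” can mean, here is a minimal sketch. It enumerates every path through an acyclic flow and lists the ones that no test covers. The flow for “admin updates pay rate” and the single existing test are hypothetical, and I don’t know how ScopeMaster actually does this.

```python
def all_paths(graph, start, end, path=None):
    """Depth-first enumeration of every start-to-end path in an acyclic flow."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        paths.extend(all_paths(graph, nxt, end, path))
    return paths

# Hypothetical flow for "admin updates pay rate".
flow = {
    "enter employee number": ["employee found", "employee not found"],
    "employee found": ["edit pay rate"],
    "employee not found": ["show error"],
    "edit pay rate": ["save", "cancel"],
    "save": ["done"],
    "cancel": ["done"],
    "show error": ["done"],
}

tested = [
    ["enter employee number", "employee found", "edit pay rate", "save", "done"],
]

missing = [p for p in all_paths(flow, "enter employee number", "done") if p not in tested]
for p in missing:
    print(" -> ".join(p))
```

Here the cancel path and the not-found path have no tests, which is just the kind of thing worth a conversation.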

In another instance I noticed that ScopeMaster had parsed the requirement wrongly, concluding that something was a noun rather than a modifier. Something like “create triangular and round objects”, where the author clearly meant “triangular objects and round objects” but ScopeMaster somehow decided that “triangular” was a thing and drew a diagram creating a “triangular”. Again, this popped right out as I looked at the diagram.
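Distributing the noun across both modifiers looks easy with a fixed pattern, as in this hypothetical sketch. It assumes the phrase is “adjective and adjective noun”. A real parser would need part-of-speech information instead.

```python
import re

def expand_coordinated_modifiers(phrase: str) -> list:
    """Rewrite 'A and B nouns' as ['A nouns', 'B nouns'].

    Assumes the phrase is 'adjective and adjective noun'; a real parser
    would use part-of-speech tags rather than this fixed pattern."""
    match = re.fullmatch(r"(\w+) and (\w+) (\w+)", phrase.strip())
    if not match:
        return [phrase]
    first, second, noun = match.groups()
    return [f"{first} {noun}", f"{second} {noun}"]

print(expand_coordinated_modifiers("triangular and round objects"))
# ['triangular objects', 'round objects']
```

The same pattern turns “cats and black dogs” into “cats dogs” and “black dogs”. That’s a reminder that deciding which words are things and which are modifiers is genuinely hard, for a tool and sometimes for us.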

This caused me to imagine a situation.

Suppose we had a ScopeMaster screen up on the wall as we did an iteration planning meeting or backlog refinement session. As our Product Owner started to discuss each story, they showed the ScopeMaster understanding of that story. What might happen?

Story quality would improve.

Well, the first thing that would surely happen is that they would never bring us stories that ScopeMaster flagged in bright red as “what the hell does this even mean”. We’d all learn to express stories that ScopeMaster could understand, which would draw all of us toward using common terms consistently.

Then we might look at the sequence diagram and see whether the right objects were being messaged in the right order. Or we’d look at the object diagram and see whether we were connecting to the right objects with the right kinds of messages.

We might look at the testing diagram and see whether we agree with how the thing would be tested.

And so on. I can tell you this: even looking at the ScopeMaster output for Colin’s misinterpretation of my vaguely described stories, I could see in there some of the “truth” of the stories, and I could see quickly what he’d gotten wrong. I could see how to rephrase the stories so that the output would be more credible and more useful.

Let me pop up and kind of summarize …

Where Does This Leave Me?

I’m quite sure that the program does what it says on the box, taking natural language requirements and producing the reports shown.

I’ve seen it do a decent job on a well-phrased requirement, a poor job on a poorly phrased one, and seen it throw up its hands on a sufficiently bad one. I’ve seen it identify missing cases that one would want to look at.

I think it would be interesting to use it in iteration planning or backlog refinement.

If you’re the kind of organization that has a full written spec for a product, I think it would be interesting to run the entire spec through ScopeMaster, and I think it would be of value to do so.

Would it be uniquely valuable? That, I can’t say. The ScopeMaster pricing pages include estimates of time saved, suggesting that you could save a couple of team hours per story. I’m not equipped to tell you whether that’s true, or where the savings would come from.

I think that using ScopeMaster might make planning and refinement take longer, with corresponding savings in implementation, testing, and avoidable rework. I think that a good team can do without it, and that a relatively inexperienced team might benefit greatly from building a better common understanding.

I’d like to see some teams, especially well-coached teams, give ScopeMaster a real try and report what happens. I’m sure there would be some discoveries and insights, probably some of them quite interesting.

I suspect that the closer a team’s work comes to just cranking out a long series of written stories, the more ScopeMaster might help, and the closer they are to a rapidly interacting room full of product definers, designers, and developers (like a true XP team or a really great Scrum team), the less value they’d get from ScopeMaster.

Cost and Savings?

And I could be wrong. It costs ten quid to put a story into ScopeMaster. With a single developer costing upwards of a hundred or two per hour, finding one important mistake per week could pay for the product many times over.
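The back-of-envelope arithmetic goes like this. The ten-per-story price and the hourly cost come from above. The four hours saved per important mistake is purely my assumption, so plug in your own numbers.

```python
def stories_covered(hours_saved: float, hourly_cost: float, price_per_story: float) -> float:
    """How many stories' fees are covered by the developer time one catch saves."""
    return hours_saved * hourly_cost / price_per_story

# Assumption: catching one important mistake saves four developer hours.
for hourly_cost in (100, 200):
    print(hourly_cost, stories_covered(4, hourly_cost, 10))
```

With those assumptions, one catch covers 40 to 80 stories. Whether that’s “many times over” depends on how many stories you push through in a week.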

I just don’t know. I’m sure that it does what it says on the box, and does it well enough that the output will certainly remind you of your own understanding. I’m sure that it often identifies important issues, and that it often inspires a user to thoughts they might not have had otherwise.

Would it be fun to use, valuable to the team, valuable to the company? I don’t know. Would it be possible to misuse it? Sure, people can misuse anything, but since ScopeMaster’s focus is on providing views of things, and since it sizes only in points, not time, it might be difficult to misuse. That said, I’d rather start using it within an experienced, productive team than have it applied from outside to a pressure-driven team.

TL;DR

ScopeMaster seems to do what it says on the box. Looking at my own project, I saw easily where my stories were vague or had not been understood. Looking at other projects, its output helped me see what the projects were about, and I was quickly able to spot difficulties in those as well.

I’d like to see some well-coached teams try it internally, long enough to get a real sense of what it’s good for, what it’s not so good at, how it helps, and whether in some ways it hurts.

I think it’s rather nifty, an impressive program with some visible value.