Spans

During our FGNO Zoom session, Ken brought up an interesting little problem. Let’s play with it a bit. Gets a bit long, but it’s really pretty light work. Fun effort, meant as such.

During our Friday Geeks Night Out zoom meetings, held each Tuesday without fail, we talk of many things. Our most enjoyable sessions are those where we talk about and work with code, whether it’s something one of us is actually working on, or just something that pops up. Last Tuesday, Ken brought us a little problem that he had lifted from a Michael Feathers video.

The Challenge

The problem, in my own words, well, words that I hope we all share but my own selection and arrangement thereof, is:

Write a function that receives a list of numbers and returns the number of consecutive spans of non-zero numbers in the list.

Please consider

Please consider stopping right now, before anything I say can contaminate your brain, and try solving this problem on your own. Extra credit will be given for each different (and correct) solution after your first. How different? That’s up to you, this is a self-scoring exercise.

No, really

Please. if you’re going to do this, it will be far better for you to generate your own ideas first, because at least one of what we’ll do below may catch your attention and may be hard to get beyond. I’m a little worried about that myself, because I’ve already worked on this, both Tuesday night and since then.

OK, Challenge Accepted

Let’s be sure that we understand the problem by solving a small example by hand:

numbers  [0, 1, 2, 3, 0, 4, 5, 6, 0, 0, 7]
spans     0  1  1  1  1  2  2  2  2  2  3

As I understand the problem, the answer to the numbers above is 3 spans.

One Way

One way that I like to approach things is with TDD and with almost no idea in mind for an algorithm. I like to try to generate the solution bit by bit. Let’s try that first.

I decide to start with an empty list and devise this clever test and implementation:

def spans(numbers):
    return 0

class TestForArticle:
    def test_empty(self):
        numbers = []
        assert spans(numbers) == 0

We are green. Ship it!

What next? I’m going to try to go really incrementally, even if it makes a mess, because, well, I know a very nifty solution that I really like, and I want to avoid it.

We could put in aa list with one or more zeros but that would pass. A test that will break is this one:

    def test_a_non_zero(self):
        numbers = [1]
        assert spans(numbers) == 1

That fails as anticipated. Let’s code something not too bright …

def spans(numbers):
    if not numbers:
        return 0
    return 1

Python has this “nice” feature whereby empty collections are falsy. We use it here in hopes of confusing me later, because I generally think of not x as checking for x being None, or False.

Our tests are green. We need something that can fail that will push us forward. Seems like we have to get something other than a zero or one. Let’s just do two.

    def test_two_seqs(self):
        numbers = [0, 1, 0, 2]
        assert spans(numbers) == 2

This would pass, but it’s just silly:

def spans(numbers):
    if not numbers:
        return 0
    return numbers[-1]

It does pass all the tests, though. I’ll change the test to make that fail and then see if I can do some work.

    def test_two_seqs(self):
        numbers = [0, 1, 0, 379]
        assert spans(numbers) == 2

OK, let’s see. We could go thru the list counting sequences of numbers that aren’t zero. We could tally up a sequence when we encounter a zero … no, that won’t do … zero is first in our current test. We could tally when we previously saw a zero and now see a non-zero. That might work. How about this:

I really had high hopes for this:

def spans(numbers):
    if not numbers:
        return 0
    tally = 0
    in_seq = False
    i = 0
    while i < len(numbers):
        curr = numbers[i]
        if in_seq and curr != 0:
            pass
        elif in_seq and curr == 0:
            tally += 1
            inseq = False
        elif not in_seq and curr == 0:
            pass
        elif not in_seq and curr != 0:
            inseq = True
        i += 1
    return tally

But two (!) tests fail:

    def test_a_non_zero(self):
        numbers = [1]
        assert spans(numbers) == 1

        0 != 1

    def test_two_seqs(self):
        numbers = [0, 1, 0, 379]
        assert spans(numbers) == 2

        0 != 2

This suggests that we never increment tally, so that we never got in_seq and curr == 0.

A lucky moment tells me that I’m setting inseq not in_seq. Fix that and the second test fails with 1 != 2. But I don’t look, because I am stuck on the notion of “never increment tally”. But that’s OK:

I decide to debug this in my head, as we sometimes do. Consider the simple list with just 1 in it.

We start with in_seq False, and curr == 1. That will set in_seq to true … but it will exit the loop. No problem, we’ll check the last item and if it is non-zero, increment tally.

This passes my tests:

def spans(numbers):
    if not numbers:
        return 0
    tally = 0
    in_seq = False
    i = 0
    while i < len(numbers):
        curr = numbers[i]
        if in_seq and curr != 0:
            pass
        elif in_seq and curr == 0:
            tally += 1
            in_seq = False
        elif not in_seq and curr == 0:
            pass
        elif not in_seq and curr != 0:
            in_seq = True
        i += 1
    if numbers[-1]:
        tally += 1
    return tally

So that’s, um interesting. What would it do if we removed the special check at the front? I think it might pass.

No, it blows up because then we can’t check numbers[-1]. Se can combine the checks:

def spans(numbers):
    tally = 0
    in_seq = False
    i = 0
    while i < len(numbers):
        curr = numbers[i]
        if in_seq and curr != 0:
            pass
        elif in_seq and curr == 0:
            tally += 1
            in_seq = False
        elif not in_seq and curr == 0:
            pass
        elif not in_seq and curr != 0:
            in_seq = True
        i += 1
    if numbers and numbers[-1]:
        tally += 1
    return tally

I think we really owe ourselves some additional testing.

    def test_all_zero(self):
        numbers = [0, 0, 0, 0]
        assert spans(numbers) == 0

    def test_no_leading_zero(self):
        numbers = [1, 0, 379]
        assert spans(numbers) == 2

    def test_no_leading_but_trailing_zero(self):
        numbers = [1, 0, 379, 0, 0, 0, -67, 228, 0]
        assert spans(numbers) == 3

These all pass. Now let’s see if we can improve the span code somehow. Two cases are ignored, so I think we can remove them:

def spans(numbers):
    tally = 0
    in_seq = False
    i = 0
    while i < len(numbers):
        curr = numbers[i]
        if in_seq and curr == 0:
            tally += 1
            in_seq = False
        elif not in_seq and curr != 0:
            in_seq = True
        i += 1
    if numbers and numbers[-1]:
        tally += 1
    return tally

Still green. We just increment i every time, so we can use a for loop:

def spans(numbers):
    tally = 0
    in_seq = False
    for curr in numbers:
        if in_seq and curr == 0:
            tally += 1
            in_seq = False
        elif not in_seq and curr != 0:
            in_seq = True
    if numbers and numbers[-1]:
        tally += 1
    return tally

It’d be nice if we could get rid of that special check at the end. If only all sequences ended with at least one zero …

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    in_seq = False
    for curr in our_numbers:
        if in_seq and curr == 0:
            tally += 1
            in_seq = False
        elif not in_seq and curr != 0:
            in_seq = True
    return tally

It was tempting to append a zero to the input list, but that would edit the actual list provided, probably offending our users. So we make a copy, sparing no expense.

Tests are green.

Let’s think about this in a little more depth. When is in_seq true and when is it false? It is true if the preceding number was non-zero, otherwise false. Can we replace it with a “previous” number? If we did, what would the code have to be? Nothing ever changes if the current number and the previous both zero or both non-zero. Let’s try a little something:

def spans(numbers):
    def in_seq(prev):
        return prev != 0

    our_numbers = numbers + [0]
    tally = 0
    prev = our_numbers[0]
    for curr in our_numbers:
        if in_seq(prev) and curr == 0:
            tally += 1
        elif not in_seq(prev) and curr != 0:
            pass
        prev = curr
    return tally

That’s green. And we can remove the empty elif:

def spans(numbers):
    def in_seq(prev):
        return prev != 0

    our_numbers = numbers + [0]
    tally = 0
    prev = our_numbers[0]
    for curr in our_numbers:
        if in_seq(prev) and curr == 0:
            tally += 1
        prev = curr
    return tally

And we can inline the function:

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    prev = our_numbers[0]
    for curr in our_numbers:
        if prev != 0 and curr == 0:
            tally += 1
        prev = curr
    return tally

That’s nearly good, isn’t it?

It would be nice to get rid of that extension of the list (which could be very costly by the way, if the list is huge).

What is this code doing? Well it’s detecting a non-zero followed by a zero. If we think of a sort of trace of a signal, it looks like this:

 ___     ___    non-zero
|   |   |   |
|   |___|   |_ 
    ^       ^   zero

The little ‘^’ characters show that we are tallying on the trailing edge of the waveform. If we were to trigger on the leading edge instead, and we were to init prev to a zero … we could trigger on the leading edge. I think that would work. We change the code:

def spans(numbers):
    tally = 0
    prev = 0
    for curr in numbers:
        if prev == 0 and curr != 0:
            tally += 1
        prev = curr
    return tally

Tests are green. We’ve eliminated our nasty extension of the list. The picture is this:

  ___     ___   non-zero
 |   |   |   |
_|   |___|   |
 ^       ^      zero

The initial zero isn’t really in the list: setting prev to zero acts as though there is a leading zero, so we detect an initial non-zero as a leading edge of the waveform.

Retrospective

I am surprised that some nearly reasonable reasoning got us to this leading edge finder, which is the nicest solution that I know to this problem. I was thinking that I’d have to just pull it out and hand it to you.

Most of the changes we made to our original program were quite reasonable. There was a bit of a dig in the bag of tricks when we appended the zero, but really all we had to do was notice that we really wish the sequence would end with zero … and then make that happen.

Noticing the trailing edge was a bit deeper in the bag as well, and if I hadn’t had the leading edge solution in mind, I’m not sure if I’d have done that reflection. But the code was pretty reasonable by then anyway:

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    prev = our_numbers[0]
    for curr in our_numbers:
        if prev != 0 and curr == 0:
            tally += 1
        prev = curr
    return tally

The transformation from the code above to the leading-edge version was small, but it was certainly not just something that a machine refactoring could have done. I had to figure it out. It was bout three edits, so not what you’d call difficult, but the logic had to be inverted and the …

Idea!

You know what, I think I could have done it in a few simple steps. Let’s revert and see. Changing the prev line doesn’t break anything:

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    prev = 0
    for curr in our_numbers:
        if prev != 0 and curr == 0:
            tally += 1
        prev = curr
    return tally

Invert if condition:

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    prev = 0
    for curr in our_numbers:
        if prev == 0 or curr != 0:
            pass
        else:
            tally += 1
        prev = curr
    return tally

Green Move the tally, changing the if:

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    prev = 0
    for curr in our_numbers:
        if prev == 0 and curr != 0:
            tally += 1
        prev = curr
    return tally

Green. Change the loop to use numbers:

def spans(numbers):
    our_numbers = numbers + [0]
    tally = 0
    prev = 0
    for curr in numbers:
        if prev == 0 and curr != 0:
            tally += 1
        prev = curr
    return tally

Still green. Remove the useless our_numbers:

def spans(numbers):
    tally = 0
    prev = 0
    for curr in numbers:
        if prev == 0 and curr != 0:
            tally += 1
        prev = curr
    return tally

And there we are, three smaller changes, green in between.

I still do not know whether, unprimed, I’d have noticed the trailing edge / leading edge idea, because Tuesday night, Ken showed a very similar solution to this one and I somehow recognized it as an edge detector, so I had the concept near the top of my mind.

With it or without it, we have a pretty decent solution here.

Do you have a nicer one? Maybe something with some lambdas or functional programming / streaming operations?

I do have one that goes a step further, not necessarily a good step. The current scheme detects an edge and tallies it.

We can write a generator that generates a 1 whenever it sees an edge, and sum however many ones we get, like this:

def spans(numbers):
    def edges(numbers):
        prev = 0
        for number in numbers:
            if prev == 0 and number != 0:
                yield 1
            prev = number
        return

    return sum(edges(numbers))

The function edges is a generator and it generates and yields a 1 on every leading edge. It passes all the tests.

How about something a bit more like functional programming. This passes all the tests:

from itertools import pairwise
def spans(numbers):
    pairs = pairwise([0] + numbers)
    edges = map(lambda pair: (1 if pair[0] == 0 and pair[1] != 0 else 0), pairs)
    return sum(edges)

Let’s tart that up a bit.

def spans(numbers):
    def leading_edge(pair):
        prev, curr = pair
        return 1 if prev == 0 and curr != 0 else 0

    return sum(map(leading_edge, pairwise([0] + numbers)))

I inlined the sum and map because when we program using this FP style, we seem to avoid the Explaining Variable Name pattern, preferring instead to inline as much as we can, ensuring that the thing is at least a hard to understand as it was to write. Truth be told, I prefer this:

def spans(numbers):
    def leading_edge(pair):
        prev, curr = pair
        return 1 if prev == 0 and curr != 0 else 0

    pairs = pairwise([0] + numbers)
    edges = map(leading_edge, pairs)
    return sum(edges)

Summary

Added in Post: I noticed on reviewing before publishing that if we look at my hand example, it surely suggests writing a leading edge detector. I didn’t do that because I knew it was my favorite, but it does seem possible that from the hand example we’d jump right to detecting the leading transitions:

numbers  [0, 1, 2, 3, 0, 4, 5, 6, 0, 0, 7]
spans     0  1  1  1  1  2  2  2  2  2  3

Anyway …

What have we learned? Mostly that programming can be fun, I think. But this is an amusing little problem that can be solved in a lot of interesting ways. And, I think, if we start down a bad path, we can get pretty tangled up. Somehow, today, while things did a bit ad-hoc for a while, it was pretty straightforward to get down to a reasonable implementation. Then we did have to have a bit of an improbable insight, but without it the program would have been fine. Some might even find it easier to understand in an earlier form.

Much of our debate last Tuesday’s FGNO was over a particular FP style solution, similar to my final one, and whether we considered it better than the various procedural ones we came up with. The original procedural one that Ken brought us was pretty nasty, with two or three special cases in it, making it pretty hard to understand. In that light, one might lean toward the FP solution … but we felt that it depends on what the team knows and is comfortable with. In a team of juniors, we might avoid the FP solutions, as they are often easier to write than they are to read and we read code a lot more often than we write it.

One of the XP practices is “Coding Standard”, expressing the notion that the team decides its own coding standards. So in this case, the team might decide to get really good with FP type things, and to use them, or it might decide to leave that new-fangled stuff for your effete snobs and to stick to the good old ifs and gotos. Team gets to decide.

What does my team here chez Ron like? Here are the candidates one more time:

Procedural Edge Detector

def spans(numbers):
    tally = 0
    prev = 0
    for curr in numbers:
        if prev == 0 and curr != 0:
            tally += 1
        prev = curr
    return tally

Edge Generator

def spans(numbers):
    def edges(numbers):
        prev = 0
        for number in numbers:
            if prev == 0 and number != 0:
                yield 1
            prev = number
        return

    return sum(edges(numbers))

Light Functional Programming

def spans(numbers):
    def leading_edge(pair):
        prev, curr = pair
        return 1 if prev == 0 and curr != 0 else 0

    pairs = pairwise([0] + numbers)
    edges = map(leading_edge, pairs)
    return sum(edges)

For me, while I might be proud of the latter two and might keep them here at home, I have to say that I think the first entirely procedural version is the one I’d really recommend if asked in public.

What’s your preference? Toot me up, if you care to. See you next time!