OK, I promised to test reading, and Iā€™m gonna do it. I sort of have a plan.

Weā€™re here to test this method:

    def read(self):
        self._read()

        if self._jsonheader_len is None:
            self.process_protoheader()

        if self._jsonheader_len is not None:
            if self.jsonheader is None:
                self.process_jsonheader()

        if self.jsonheader:
            if self.response is None:
                self.process_response()

Last time, I listed a lot of aspects that we might want to test:

  1. Do we correctly accumulate bytes into the receive buffer?
  2. When we consume those bytes, do we correctly remove them?
  3. When we read the two byte ā€œprotocol headerā€, do we get the value sent in?
  4. When we read the protocol header do we actually set the _jsonheader_len variable, so that the next call will occur?
  5. Will the code correctly drop through on a single read that has all the necessary data?
  6. Does the code properly handle all the gaps that might occur due to incomplete reads?
  7. Does process_response correctly decide between json and other?
  8. Does whatever it calls deal properly with an unexpected format?

Now I think Iā€™d argue that at most two of the questions above are really about this specific method. Iā€™ve highlighted them: the one about properly dropping through, and the one about handling gaps.

The basic scheme of this method is that it reads whatever bytes are available and then tries to get the two-byte header, the json header, and the json response. It starts with none of those and if a given one isnā€™t there, it tries to process it and then goes on to see if if can do another and another.

If we knew we had all the bytes we need, this code could be more like this:

    def read(self):
        self._read()
        self.process_protoheader()
        self.process_jsonheader()
        self.process_response()

If this were my code, I think Iā€™d actually try writing it that way, with guard clauses inside those methods causing them to skip out until there is enough data to do their job. It might be better that way. It might not. I could also imagine using some explaining method names, like this:

    def read(self):
        self._read()

        if self.we_need_the_proto_header:
            self.process_protoheader()

        if self.we_have_proto_header_and_no_json_header:
            self.process_jsonheader()

        if self.we_have_json_header_and_no_response:
            self.process_response()

I suspect that python thinkers might almost read the original code and think the thoughts above. And I think that in this case, that transformation doesnā€™t provide enough additional clarity to be worth it. But I would at least consider it, and might try it so as to look at it (as I did above) to help me decide. My Smalltalk mentors would almost certainly advise doing something like that.

Anyway ā€¦

Weā€™re here now and we want to see how we could build up our confidence that this works. I have a rough plan.

I propose to run the actual program and capture a message that is actually sent from the server to the client, and use that as the basis for my tests. I donā€™t think Iā€™ve ever run this app before, though I did run the earlier ones.

The app-server starts with host and port, and the app-client starts with host, port, action, and value, where action pretty much has to be ā€˜searchā€™ and value can be ā€˜morpheusā€™ or ā€˜ringā€™. I choose ā€˜ringā€™.

The client run looks like this:

/Users/ron/PycharmProjects/cs1/.venv/bin/python /Users/ron/PycharmProjects/cs1/app-client.py 127.0.0.1 65432 search ring 
Starting connection to ('127.0.0.1', 65432)
Sending b'\x00g{"byteorder": "little", "content-type": "text/json", "content-encoding": "utf-8", "content-length": 37}{"action": "search", "value": "ring"}' to ('127.0.0.1', 65432)
Received response {'result': 'In the caves beneath the Misty Mountains. šŸ’'} from ('127.0.0.1', 65432)
Got result: In the caves beneath the Misty Mountains. šŸ’
Closing connection to ('127.0.0.1', 65432)

Process finished with exit code 0

And the server said:

/Users/ron/PycharmProjects/cs1/.venv/bin/python /Users/ron/PycharmProjects/cs1/app-server.py 127.0.0.1 65432 
Listening on ('127.0.0.1', 65432)
Accepted connection from ('127.0.0.1', 58194)
Received request {'action': 'search', 'value': 'ring'} from ('127.0.0.1', 58194)
Sending b'\x00g{"byteorder": "little", "content-type": "text/json", "content-encoding": "utf-8", "content-length": 60}{"result": "In the caves beneath the Misty Mountains. \xf0\x9f\x92\x8d"}' to ('127.0.0.1', 58194)
Closing connection to ('127.0.0.1', 58194)

So I figure I should be able to use that Misty Mountains string as my test string. (I still do not understand how they get a header length of 2 out of that.) So I begin to write my test:

    def test_query_response(self):
        answer = (b'\x00g{"byteorder": "little", "content-type": "text/json", '
                  b'"content-encoding": "utf-8", "content-length": 60}'
                  b'{"result": "In the caves beneath the Misty Mountains. '
                  b'\xf0\x9f\x92\x8d"}')

My fond hope is that I can stuff this into the input buffer and run the code and find a useful answer in the output buffer.

I should say ā€œmy fond hope wasā€, because this is _read

    def _read(self):
        try:
            # Should be ready to read
            data = self.sock.recv(4096)
        except BlockingIOError:
            # Resource temporarily unavailable (errno EWOULDBLOCK)
            pass
        else:
            if data:
                self._recv_buffer += data
            else:
                raise RuntimeError("Peer closed.")

The _read method accesses the socket sock. So Iā€™ll have to put my data in there, I guess. Letā€™s make a little test double for the socket:

class FakeSocket:
    def __init__(self, data):
        self.data = data

    def recv(self, size):
        return self.data

My new fond hope is that we can fudge around with various copies of this guy and test what we want to test. The first test weā€™ll provide the whole message and see if we can get an expected response from it.

    def test_query_response(self):
        answer = (b'\x00g{"byteorder": "little", "content-type": "text/json", '
                  b'"content-encoding": "utf-8", "content-length": 60}'
                  b'{"result": "In the caves beneath the Misty Mountains. '
                  b'\xf0\x9f\x92\x8d"}')
        sock = FakeSocket(answer)
        msg = Message(None, sock, None, None)
        msg.read()

This fails and by golly Iā€™ll know the reason why or heads will roll.

Received response {'result': 'In the caves beneath the Misty Mountains. šŸ’'} from None
Got result: In the caves beneath the Misty Mountains. šŸ’
Closing connection to None
Error: selector.unregister() exception for None: AttributeError("'NoneType' object has no attribute 'unregister'")

Oh this is delicious! It worked, and then tried to unregister. I need to pass something in as the selector, with an empty unregister method. Letā€™s just add the method to our fake thing and rename it.

I do that, discover that I also need close and we have this working without exploding. No assert yet.

class FakeSocketAndSelector:
    def __init__(self, data):
        self.data = data

    def recv(self, size):
        return self.data

    def unregister(self, sock):
        pass

    def close(self):
        pass

class TestLibClient:        
    def test_query_response(self):
        answer = (b'\x00g{"byteorder": "little", "content-type": "text/json", '
                  b'"content-encoding": "utf-8", "content-length": 60}'
                  b'{"result": "In the caves beneath the Misty Mountains. '
                  b'\xf0\x9f\x92\x8d"}')
        fake = FakeSocketAndSelector(answer)
        msg = Message(fake, fake, None, None)
        msg.read()

This is nice, and it even prints the response all nicely decoded. I can fetch that same value and check it:

    def test_query_response(self):
        answer = (b'\x00g{"byteorder": "little", "content-type": "text/json", '
                  b'"content-encoding": "utf-8", "content-length": 60}'
                  b'{"result": "In the caves beneath the Misty Mountains. '
                  b'\xf0\x9f\x92\x8d"}')
        fake = FakeSocketAndSelector(answer)
        msg = Message(fake, fake, None, None)
        msg.read()
        assert (msg.response.get('result') ==
                'In the caves beneath the Misty Mountains. šŸ’')

Thatā€™s green. Commit the new test. Now we can proceed to test what happens with partial information. Letā€™s try a simple one, with just half the data and then the other half. I think that I can satisfy my concerns by splitting the data in various ways.

Hm. I was going to create a fake with partial data, call read, the smash in new data and call read again. But now that we have gone to the trouble of writing this fake thing, letā€™s make it a bit smarter. If we were to give it a list of byte strings, and have it tick through them, that might be useful.

Change the fake to expect a list, and change the user code accordingly.

class FakeSocketAndSelector:
    def __init__(self, data_list):
        self.data_list = data_list
        self.index = -1

    def recv(self, size):
        self.index += 1
        return self.data_list[self.index]

    def unregister(self, sock):
        pass

    def close(self):
        pass

Now we can try a new test:

    def test_split_query(self):
        answers = [b'\x00g{"byteorder": "little", "content-type": "text/json", ',
                  b'"content-encoding": "utf-8", "content-length": 60}',
                  b'{"result": "In the caves beneath the Misty Mountains. ',
                  b'\xf0\x9f\x92\x8d"}']
        fake = FakeSocketAndSelector([answers])
        msg = Message(fake, fake, None, None)
        msg.read()
        assert (msg.response.get('result') ==
                'In the caves beneath the Misty Mountains. šŸ’')
Note
I copy-pasted the fake = line and the brackets should not be there. Skip to the next Note to see how easy this process would have been had I not confused myself with that.

I was sure that would work. Unfortunately it fails for some reason, and Iā€™ve burnt too much time trying to figure it out. Hereā€™s the failure:

    def _read(self):
        try:
            # Should be ready to read
            data = self.sock.recv(4096)
        except BlockingIOError:
            # Resource temporarily unavailable (errno EWOULDBLOCK)
            pass
        else:
            if data:
>               self._recv_buffer += data
E               TypeError: cannot concat list to bytes

How did it get a list?

OK, let me see here.

    def test_split_query(self):
        answers = [b'\x00g{"byteorder": "little", "content-type": "text/json", ',
                  b'"content-encoding": "utf-8", "content-length": 60}',
                  b'{"result": "In the caves beneath the Misty Mountains. ',
                  b'\xf0\x9f\x92\x8d"}']
        fake = FakeSocketAndSelector([answers])
        msg = Message(fake, fake, None, None)
        data = fake.recv(4096)
        assert data == b''

This has returned the entire array. And I see why in the test, finally. I left the [] in the setup. Duh. And also, I need to read four times:

    def test_split_query(self):
        answers = [b'\x00g{"byteorder": "little", "content-type": "text/json", ',
                  b'"content-encoding": "utf-8", "content-length": 60}',
                  b'{"result": "In the caves beneath the Misty Mountains. ',
                  b'\xf0\x9f\x92\x8d"}']
        fake = FakeSocketAndSelector(answers)
        msg = Message(fake, fake, None, None)
        msg.read()
        msg.read()
        msg.read()
        msg.read()
        assert (msg.response.get('result') ==
                'In the caves beneath the Misty Mountains. šŸ’')
Note
I think weā€™d have quickly moved to this point, had I not made the mistake of pasting that line. If I had typed it I would almost certainly not have put in the extra brackets.

Passes! Woot! Commit: split read test passing.

OK, we now are more confident about splitting, but letā€™s find some more difficult splits. And letā€™s change how the test works:

    def test_split_query(self):
        answers = [b'\x00g{"byteorder": "little", "content-type": "text/json", ',
                  b'"content-encoding": "utf-8", "content-length": 60}',
                  b'{"result": "In the caves beneath the Misty Mountains. ',
                  b'\xf0\x9f\x92\x8d"}']
        fake = FakeSocketAndSelector(answers)
        msg = Message(fake, fake, None, None)
        for _ in answers:
            msg.read()
        assert (msg.response.get('result') ==
                'In the caves beneath the Misty Mountains. šŸ’')

Now weā€™ll read as many times as there are elements in the array. I expect at least to copy and paste this code and perhaps even to extract it.

I want just one more test for now, splitting right near the beginning. I go wild with splits:

    def test_vicious_split_query(self):
        answers = [b'\x00',
                   b'g',
                   b'{"byteorder": "lit',
                   b'tle", "content-type": "text/json", ',
                   b'"content-encoding": "utf',
                   b'-8", "content-length": 60}',
                   b'{"res',
                   b'ult": "In the caves beneath the Misty Mountains. ',
                   b'\xf0\x9f',
                   b'\x92\x8d"}']
        fake = FakeSocketAndSelector(answers)
        msg = Message(fake, fake, None, None)
        for _ in answers:
            msg.read()
        assert (msg.response.get('result') ==
                'In the caves beneath the Misty Mountains. šŸ’')

That passes. And it satisfies my need to test, because Iā€™ve split up just about everything I can. I suppose we could write a test to feed one byte at a time, which might actually have been better, but honestly Iā€™m satisfied.

Summary

This didnā€™t quite go as I had imagined, mot even counting my confusion over having written [answers] when I should have written answers. I had though Iā€™d have to inject short strings one at a time, but once I had my fake object set up, giving it an array of byte strings came to mind and that was easier, and I think more clear. And that last test seems to me to be about as good as it can get, especially breaking out the first two bytes separately.

An additional nice thing about having this test is that now we could refactor that read code into the different forms I was contemplating, if we wanted to, and we could be quite sure it was still working. But the main thing here is that it really wasnā€™t that hard to test, and had I not copied that line with the brackets, it would have gone very smoothly.

Iā€™ll put skip notes in the article so you can see how smoothly it would have gone, just for fun.

It wasnā€™t that bad even with the silly typo, and itā€™s fascinating how, when it didnā€™t work, I never even considered that the fake was set up wrong. And there were plenty of things I could have done to find it. But I assumed (and you know what happens when we assume) that the problem was something important and subtle, so I was looking at the read code and how it was manipulating the socket and all the stuff over there when I should have been looking right here.

This, by the way, is why it is nearly impossible to predict how long it will take to fix a defect. Unless we already know what it is and what line of code to change, the time to fix a defect is often dominated strongly by the time it takes to find the two characters that need changing.

Maybe ChatGPT or Claude would have done better. Feel free to go read their articles if you can find them. Theyā€™re pretty busy destroying whatā€™s left of the environment, so donā€™t be surprised if theyā€™re even less amusing than I am.

I have mixed feelings about the fake object. It really seemed like the thing to do, and I am always pleased when I can create a simple fake and use it, and always a bit displeased that I needed it. Generally, I think the need to use a fake hints at a problem in the design.

What might be the problem here, you ask? Well, we have a very complicated wrapper for our very simple actual answer about the Misty Mountains. We have a two-byte count giving the length of a multi-element dictionary giving among other things the length of another dictionary containing a key ā€˜resultā€™ and the string that is the only thing we care about:

In the caves beneath the Misty Mountains. šŸ’

If the two-byte header had been the complete message length, we could have a simple ā€œread everythingā€ and an exceedingly simple ā€œoh everything is the exact answer we wantedā€.

But itā€™s a tutorial, and it really is valuable to show how we might do something really complicated. Although I might have gone another way if I were doing the tutorial. But itā€™s a very good tutorial and it can take the credit for a lot of what I now know about using sockets. I continue to find RealPython to be a very fine source of information.

Bottom line: We have tested a very intricate part of this legacy program we inherited from RealPython, and once we got started testing, it wasnā€™t all that difficult.

I think next time weā€™ll stop testing this thing and decide what to do with our robot world. See you then!