FAFO on GitHub

Thinking about simplicity, generality, abstraction, and the FAFO data. A tiny but possibly important insight.

A number of my colleagues have been talking about simplicity lately, mostly feeling that no one values simplicity any more, in fact creating unnecessarily complex approaches to almost everything, including software. The other day, I was surprised to find that in all the Categories and tags on my site, I do not have one for simplicity. That’s particularly interesting—at least to me—because I try to do everything as simply as I can, with rare exceptions.

But I don’t often write directly about simplicity, I guess, which is why I never created the category. Maybe.

I have occasionally, or at least once, mentioned that I’ve seen programmers “generalize” some capability in two very different ways. One of those ways makes the code worse, and one makes it better.

To make the code worse, we create some code that does something. Then we think of something else that code could do, so we add it on. Then we think of yet another thing it could do, and add it on. After a while we have an object or a library that can do seventy-eleven things, with three times seventy-eleven entry points, and a plethora of enums, true-false flags, and optional arguments to allow this thing to do all the things.

I am sure you can name a library or object like that that impacts your life. If you cannot, consider yourself blessed beyond compare.

The other approach to generalization makes the code better. This is the way of “abstraction”, and it is rare, not least because it is quite difficult, and possibly because many of the things we do just don’t yield to abstraction.

Commonly, in programming, abstraction will be described by examples where a complex object is broken down into a number of smaller objects, each responsible for some aspect of the larger object’s behavior. In the recent work I’ve been doing on Asteroids and Space Invaders, we might say that the Collider object is an abstraction of the very specific ways that different objects collide. It handles just one aspect of collision behavior, determining whether or not two things are colliding.

This makes the code better, and smaller, because instead of each object computing collisions on its own, they can all use this one object to do the job, delegating that responsibility to it.

This is a small, but I think legitimate example of useful abstraction. It is typical of good object-oriented programming, where by “good” I probably mean something like “the way Ron thinks things should be done” or even “code that has things like Collider in it”.

There is another even better kind of abstraction, much closer to the mathematical notion of abstraction.

In mathematical abstraction, we often start with some set of concrete problems and solutions, and we remove all the nasty parts about reality, producing a mathematical model that is simpler, more general, and more provably correct than what we had before.

I think the first known example of mathematical abstraction is Euclidean geometry. Examples of right angles and circles are very common discoveries in very old archeological sites. Builders found ways to make reasonably square corners and fairly round circles. Geometry removed a lot of the cruft from the dusty reality of building things, and allowed us to think about a perfectly flat plane with perfectly straight lines and perfectly round circles.

And in that space’s simplicity, it was possible to figure out how to construct a line perpendicular to another line, which could be applied to constructing a wall perpendicular to another wall, or, even better, to constructing an object with a guaranteed perpendicular angle, the first true carpenter’s square.

Now, of course, it never quite goes like that, but the principle remains that plane geometry is a simplification of the real world (and, we now think, a gross simplification of the larger universe). We can solve problems in that simplified space, like drawing a diagram of the house we plan to build. We can resolve man of the issues of building that house, including the measurements of key components. With a bit of effort, we can figure out how many concrete blocks we’ll need and how much tile for the floor.

(And, if we are wise, we’ll bring a bit extra. Sometimes abstractions are like that.)

One of the most significant abstractions in programming is the relational database. The RDB is based on a fairly simple mathematical notion, the “relation”, and a few fairly simple mathematical operations on relations, such as restriction, selection, join, and the like. These were, forgive me, codified by Ted Codd of IBM, and were built into the many relational database systems that were created, culminating, perhaps sadly, in things like SQL.

SQL “generalizes” relational math in the messy way. It adds on syntax and strange operations and APIs and whatnot, but it all holds up, because it has a formal abstract basis.

Is there a point here?

That question was for me, not for you. If I can come up with a point, it’ll be good for both of us.

However, the way I get to a point is often to think around the topic, getting a feel for it, and then from that, drawing out something I consider useful. It’s also how I build programs, letting them get messy and then drawing out bits that can be packaged — abstracted — separately

In the FAFO thing so far, I can sense two abstractions forming. One is the notion of a set of name-value pairs. That feels to me to be very set-theoretic, which means to me that, like relational algebra, we can probably express the things we want to do with tags as simple set-theoretic operations, which we can implement simply and reliably because sets work in a certain way that is useful to us. I haven’t proven this … I just sense it happening.

The second abstraction is different. I have “abstracted” the notion of a key-value database into a simple implementation using a single folder, keys as file names, and values as long strings (documents). By isolating the database idea into that simple structure, I can put off thinking about database details for a while.

(Or is the second abstraction really different? Something just came to me, a tiny insight.)

That thing in the file folder? It’s a set of pairs with a string name and a string value … just exactly like the sets I’ve been using in memory, with small names like “author” and small values like “ron”.

This surprise, and it is a surprise to me, is the sort of thing that happens with abstractions: we begin to see that certain things are alike, and as such we can simplify our thinking, and often, our code.

Now I have a problem

I want to work further with these ideas, if GeePaw Hill doesn’t mind my doing so. I’d like to sort of follow along with his ideas, using them as grist for my programming mill. I am assuming that he won’t mind.

I know that he thinks of the documents he’s dealing with as being tree-like, and that he considers paths through tho tree-like thing, such as the list of particular documents that a cohort of students works through as one of their courses.

I was thinking that I’d work with the Composite Pattern, which lets a group of objects be treated “just like” a single object of that type. It’s pretty much the official way of handling tree-like data.

But now that I’ve recognized that my base documents are just name-value pairs … I’m tempted, nay almost compelled to try a set-based solution.

Well, this is good. Something to try, things to discover.

The only thing that would make it better would be if I could do something that Hill could actually use. That would probably require me to build the thing in Kotlin or something, but that wouldn’t be the end of the world.

Of course, a document server could and should be a separate thing anyway, so it could even be built in Python.

More likely, this is just another in my endless series of exercises to keep myself and my small cadre of readers entertained.

Summary

Just thinking here. But maybe we can do some interesting abstraction. That would be fun.

I can’t wait to see what I do next time!