Update on the Site Technical Project

Spoiler Alert: I started this article thinking that I was going to tell you what I knew about how we’ve built the site generator so far. What really happened is that in writing about it I found some problems and one completely ignorant misapprehension. That’s life in the big city: my articles don’t get sanitized. Read on and laugh with me, not at me.

If you’ve been avidly dropping by looking for wisdom, you may have been in the wrong place. However, you may have noticed some new amenities like the category pages. They’re still works in progress, in that we’re thinking about splitting out fewer categories and more tags and so on. But functionally the site is in pretty good shape. Here are a few of the technical topics that I’d like to document.

Scraper

Scraper is the separate program that pulls material out of xprogramming.com and converts it to articles here. Perhaps I’ll write up some details on it later. It’s viewable on Vaguery’s GitHub if you want to wander around in its weeds. Basically it’s just a Ruby program that scrapes through a WordPress export, converts the WP HTML to Markdown, arranges it in folders, and so on. Not very complex, but lots of details like remapping all the URLs and such.

Scraper is trying to be our highest priority but there are fun little things going on in ronjeffries.com that I wanted to get in place before loading a few hundred files into it.

Categories

Articles have categories. You can see the current list by clicking on “Site Categories” near the top of the site home page (or by clicking the hamburger menu button if you’re looking in from mobile). The categories for an article show up in some YAML at the beginning of the file. This one looks like this:

---
date: "2014-12-16"
title: Update on the Site Technical Project
blurb: A report on what's going on behind the scenes, with a bit of info on how we have built and used plugins and templates.
categories:
- rj.com
---

Jekyll / Liquid Basics

We’re building the site with Jekyll and its included templating language, Liquid, as reported elsewhere. The main flow is that Jekyll scans all the folders for the site “source”. When it finds something with YAML in it, it builds an internal data structure representing that page. The page is basically a hash with various keys. One key is “categories” and the value is an array of strings representing the page’s categories, in this case ["rj.com"]. There’ll be a key for “title” with the value “Update on the Site Technical Project”, and so on.
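To make that concrete, here is a minimal Ruby sketch of the kind of hash I mean. It parses a trimmed copy of this article's front matter with Ruby's standard YAML library. Jekyll's real page object carries a good deal more than this, and the constant names FRONT_MATTER and PAGE are mine, not Jekyll's.

```ruby
require "yaml"

# Front matter from the top of this article (blurb left out for brevity).
FRONT_MATTER = <<~YAML
  date: "2014-12-16"
  title: Update on the Site Technical Project
  categories:
  - rj.com
YAML

# Illustrative only: a plain Ruby hash, not Jekyll's actual page object.
PAGE = YAML.safe_load(FRONT_MATTER)

puts PAGE["title"]              # the title string
puts PAGE["categories"].inspect # an array of category strings
```

The point is only that the YAML keys become hash keys, and a YAML list becomes an array.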

The elements of this hash are available to the Liquid templating language, so that you can format a page using various templates. For example, part of the template for an article, called titled-article, looks like this:

  <header class="post-header">
    <h1 class="post-title">{{ page.title }}</h1>
    <p class="post-meta">{{ page.date | date: "%b %-d, %Y" }}{% if page.author %} • {{ page.author }}{% endif %}</p>
  </header>

Those things in braces with words like page.title refer to the hash for this page, and fetch the actual title and write it into the HTML for the page. There are includes and whatnot so you can organize the template layouts to your liking. If “liking” is a term that can be used with Liquid, which I doubt.
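The date filter in that template takes a strftime-style format string, so you can try the format out in plain Ruby before fighting with it in Liquid. A small sketch, assuming the date arrives as the ISO string from the front matter; the helper name format_post_date is made up for this example:

```ruby
require "date"

# Same format string as the template: short month, unpadded day, four-digit year.
POST_DATE_FORMAT = "%b %-d, %Y"

# Hypothetical helper: parse an ISO date string and format it like the template does.
def format_post_date(iso_date)
  Date.parse(iso_date).strftime(POST_DATE_FORMAT)
end

puts format_post_date("2014-12-16") # prints "Dec 16, 2014"
```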

More Advanced Liquid

For something like a category page, Liquid includes processing syntax as well as the inclusion syntax shown above. Here’s the Liquid code to display one element in a category page:

{% assign sortedcats = p.categories | sort %}
<li>
  <p class="cat-index-title"><a href="{{ p.url }}">{{ p.title }} <span class="cat-index-date">({{ p.date }})</span></a></p>
  <p class="cat-index-blurb">{{ p.blurb }}</p>
  <p class="cat-index-categories">[{% for c in sortedcats %}
    <a href="/categories/{{ c | slugify }}/index.html">
    {{ c }}{% unless forloop.last %}, {% endunless %}</a> {% endfor %}]</p>
</li>

If that looks cryptic, well, it is. And better yet, although the Jekyll/Liquid documentation is pretty, it doesn’t include many examples, so you get to experiment. When something goes wrong, the most common behavior is that nothing happens. Yum.

Anyway let’s look at this top down:

The first line there creates a new array, sortedcats, which will contain the page categories, p.categories, sorted. We’re going to loop over it later. We sort it here because we can’t sort it in the for loop (as far as I can determine).

The next few lines are just plugging various values into the HTML: p.url, p.title, and so on. These are all set up, more or less automatically, by Jekyll.

But notice the paragraph with class cat-index-categories. In there, we display a square bracket, then loop over sortedcats, emitting an a tag, which links to /categories/ and then c | slugify. The c is the current category loop element and slugify converts it, for example, from “Beyond Agile” to “beyond-agile”. I’ll show you the code for that in a moment, because we had to write it. Moving on inside the for loop, the a tag wraps the current category loop element, such as “Beyond Agile”, followed by a comma, unless it’s the last time through the loop. So we get [ Bar, Foo, Mumble ] without a trailing comma at the end.
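If the Liquid is hard to follow, the same logic in plain Ruby may help. This is a sketch of what the loop does, not how Jekyll does it: to_slug is a hypothetical stand-in for the slugify filter, category_links is a name I made up, and the whitespace between links is simplified compared with what the template really emits.

```ruby
# Hypothetical stand-in for the slugify filter: lowercase, hyphens between words.
def to_slug(text)
  text.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Sort the categories, wrap each in a link, and put a comma after all but the last.
def category_links(categories)
  sorted = categories.sort
  links = sorted.each_with_index.map do |c, i|
    separator = i == sorted.length - 1 ? "" : ", " # plays the role of forloop.last
    %(<a href="/categories/#{to_slug(c)}/index.html">#{c}#{separator}</a>)
  end
  "[#{links.join(' ')}]"
end

puts category_links(["Foo", "Mumble", "Bar"])
```

As in the template, the comma lands inside the a tag, so it becomes part of the link text.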

Yes, Virginia, this stuff is cryptic. I list it here because despite how bloody confused you probably feel right now, it may be the best example of how to do this on the entire Internet. Not that it’s good, it’s just that there aren’t many working examples out there.

Summary So Far

So all that code produces some very simple HTML that looks like

[<a href="/categories/beyond-agile/index.html">Beyond Agile, </a>

… and so on. The Liquid code is limited, primarily for safety. Liquid is designed to let non-experts build templates for pages in their on-line stores running on someone else’s shopping site (Shopify, for example). So it is trying to balance power with enough safety to be sure that the customer can make his own page ugly but he can’t write a real script that would bring down Shopify.

The result of this compromise is that it’s rather tricky to write.

Confession, or Announcement: This code was even more cryptic before I started writing this article. As I wrote, I noticed things that were harder than they had to be. I didn’t know any better before, but I’ve learned enough to simplify them a bit. So, thanks for reading, you made the code better.

Slugify

There’s that one nifty bit where we write c | slugify and that will work such that "Beyond Agile" | slugify will return "beyond-agile". We need that, of course, because we don’t want path names with spaces in them. That would be bad. There is a slugify method built into Jekyll somewhere. You could use it if you were Jekyll. But you aren’t.

But wait, don’t answer yet. You could be Jekyll: you could be a plugin. There are several different kinds of plugins in Jekyll and I sort of understand two, filters and generators. There are also tags, among others. I’m not sure what tags do. Anyway slugify is a filter, and the source looks like this:

module SlugifyFilter
  def slugify(input)
    input
  end
end

Liquid::Template.register_filter(SlugifyFilter)

STOP THE PRESSES!!

A glance at that code tells us that it does not slugify anything: it’s not calling the Ruby slugify. Therefore, whatever the slugify above in the Liquid is doing, it’s not calling that code. If it did, it would just return the input.

To verify that, I just removed the above plugin and everything still works.

Warts and All

As long-time readers know, when I write a development article – or book – I write about what really happens, not some sanitized version of what would happen if I were a god. I’m a good programmer, darn good when at my best. But I’m not a god, as the above proves. So I’m leaving these stupid mistakes in, because I make stupid mistakes sometimes. I leave them in because my purpose is to show you how I work, how I learn, and what I really do.

I hope you never make stupid mistakes. But I suspect that you do, and I hope this sort of thing will let you smile ruefully and move on without calling your masculinity, femininity, or humanity into question. We all make mistakes. I hope you make fewer than I do.

Where Were Your Tests???

You’re probably wondering, “Where were your tests?” I certainly am. I have not yet figured out how to get any leverage from unit tests in this situation, because everything you do is plugged deeply into Jekyll and Liquid, so you can’t readily write many tests. I could certainly have tested that slugify method. If I had, I’d have learned that it didn’t work and fixed it so that it did.

And in that universe, when I wrote this article, I’d have assumed my slugify was doing something. It wouldn’t have been, because it seems that it is not called at all. (It’s not impossible that it was called, so long as the real one was called before or after it. I think that’s unlikely, but I don’t know and I’m not going to try to find out now.)
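Here is roughly what that test, and the fix it would have forced, might have looked like. This is a sketch, not Jekyll's built-in slugify: the module name WorkingSlugifyFilter and the exact rules (lowercase, with each run of anything other than letters and digits collapsed to a single hyphen) are my assumptions.

```ruby
# A sketch of a slugify filter that actually does something.
# Assumed rules: lowercase, other character runs become one hyphen, no hyphens at the ends.
module WorkingSlugifyFilter
  def slugify(input)
    input.to_s.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  end
end

# Register with Liquid only when Liquid is loaded, so this file also runs standalone.
Liquid::Template.register_filter(WorkingSlugifyFilter) if defined?(Liquid)

# The check I could have written: mix the module into a plain object and call it.
def check_slugify
  filter = Object.new.extend(WorkingSlugifyFilter)
  actual = filter.slugify("Beyond Agile")
  raise "expected beyond-agile, got #{actual.inspect}" unless actual == "beyond-agile"
  "ok"
end

puts check_slugify
```

Notice that this exercises the method directly. It would pass happily without telling me whether Jekyll ever calls my filter.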

It would be possible to have seen whether it worked by some means. We could put a print statement in it and watch the transcript to see if it comes out. We could put an intentional error in it, making it return ABSOLUTE-CRAP or something, and look at the resulting HTML. I didn’t do that this time, and so I didn’t learn anything until I looked at the code in writing this article.
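A probe along those lines might look like the sketch below. The module name SlugifyProbe is made up. If the message never shows up in the build transcript and ABSOLUTE-CRAP never shows up in the generated links, the custom filter is presumably not the one being used.

```ruby
# Diagnostic stand-in for the filter: loud, and obviously wrong on purpose.
module SlugifyProbe
  def slugify(input)
    # Writes to standard error so it shows up in the build transcript.
    warn "SlugifyProbe#slugify called with #{input.inspect}"
    "ABSOLUTE-CRAP"
  end
end

# Register with Liquid only when Liquid is loaded, so this file also runs standalone.
Liquid::Template.register_filter(SlugifyProbe) if defined?(Liquid)
```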

So, thanks again for pairing with me. You’ve found two important misunderstandings for me. You’re a darn good pair partner.

Enough for Now

I think we’ll stop here. The article is long enough. I didn’t get as much of the arcana of Jekyll and Liquid documented as I had planned, because I was building up to the big finish of slugify, which fizzled. But there’s more going on that needs to be documented for posterity, and we’ll look at that next time.

Thanks for being here, you found some important misunderstandings for me!