Restarting after a long gap. Maybe I'll include some dates.

2017-09-05

Here’s where things stand: we have a process running that copies down files from my Dropbox-based input folder, runs Jekyll to generate the site, and then publishes the categories, the home page, and the article files that correspond to what was found in the inbox. We also moved the main Jekyll folder up into Dropbox and separated out the folder containing the built site, so that it sits in an unshared folder on whatever computer builds the site.

What I realized was that our process could “easily” build and publish articles written not on the iPad but on the Mac, the way I usually work. We’d “just”¹ have to check the dates on all the files in the whole site, select the ones that are newer than our last run, build the site, and then publish the Jekyll output files corresponding to those new ones. That process would pick up edits made almost anywhere.

(One remaining issue would be the historical xprog files, which are kept separately and are not generated unless I explicitly use a different Jekyll config. This saves time, since generating the whole site is slow. This, too, could be automated, but that’s a separate story.)

Quick Design Session

If I were to create a new framework, which I do not plan to do, one of its practices would probably be the Quick Design Session. I don’t believe I’ve said much about it in this “Erors” series, but we do them all the time.

Although I do proceed to code very quickly, because it makes things concrete and because I like to push practices to the limit, we always talk about how we’re going to build our next bit of code, how it fits in, and so on. Often, as a result of those chats, we’ll even write a new test and spike the idea.

Tozier’s not here yet, but I’ve decided to start thinking anyway; he can join in when he gets here. Here’s my plan for what needs to be done:

1. There will be two designated files at the top of the iPad input folder: one requesting a run as of some time, and one saying when the last run was done. Call them, I don’t know, run-request-time and run-complete-time. These files will contain a date-time string, which I think will look like 20170905092803, that is, YYYYMMDDHHMMSS.
2. When our job runs, on a timer, it will check whether the contents of run-complete-time are less than those of run-request-time. If so, it’ll run Jekyll and update run-complete-time.²
3. We’ll spike some code to scan the Jekyll input folder, selecting the names of all files newer than some provided date. Then we’ll at least sketch how to use that list, instead of our current one, to decide what to publish.
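A minimal sketch of the check in the first two items, assuming each stamp file sits at the top of the iPad input folder and holds a single YYYYMMDDHHMMSS string. The helper names (`stamp`, `read_stamp`, `run_requested?`) are hypothetical. Because the format is fixed-width with the most significant digits first, comparing the strings orders them the same way comparing the times would.

```ruby
# Hypothetical helpers for the run-request check; names are illustrative.

# Format a Time as YYYYMMDDHHMMSS. Converting to UTC first keeps the
# string independent of the building machine's time zone.
def stamp(time)
  time.getutc.strftime('%Y%m%d%H%M%S')
end

# Read a stamp file; a missing file counts as "never", which sorts
# before every real stamp.
def read_stamp(path)
  File.exist?(path) ? File.read(path).strip : '00000000000000'
end

# True when a run has been requested since the last completed run.
def run_requested?(inbox)
  requested = read_stamp(File.join(inbox, 'run-request-time'))
  completed = read_stamp(File.join(inbox, 'run-complete-time'))
  requested > completed
end
```

Formatting in UTC is one possible answer to the time-zone question in the second footnote, but it only works if whoever writes run-request-time uses UTC as well.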

2017-09-07

On the 5th we chatted, but time was short and we had other topics, so we didn’t do any work.

“Spec”

Here’s the current design thinking as I have it:

1. The job runs and checks whether run-request-time > run-complete-time; if not, it exits.
2. Update run-complete-time.
3. Unilaterally copy from the iPad input folder into the Jekyll site source folders, into folders with the same names as those found in the input, typically article/something/index.md. Done with a date-preserving copy, this keeps the source files’ current modification dates.
4. Run Jekyll, making a “complete” new site in the local j_site folder.
5. Publish all the files whose Jekyll source is newer than the previous run-complete-time (the value read in step 1, before step 2 overwrote it).
6. Publish the categories and other standard files.
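Here’s one way the six steps might hang together, as a sketch rather than our actual JekyllRunner code. Running Jekyll and FTPing are passed in as lambdas (`build:`, `publish:`), and `run_job` is a hypothetical name. It assumes Ruby 2.5 or later for `Dir.children` and `Dir.glob`’s `base:` option. It compares modification time (mtime) rather than ctime, because a date-preserving copy (`FileUtils.cp_r` with `preserve: true`) keeps mtime, while on macOS and Linux the copy itself resets ctime.

```ruby
require 'fileutils'

STAMP_FORMAT = '%Y%m%d%H%M%S'

# Hypothetical sketch of the six steps; build and publish are stand-ins.
def run_job(inbox, source, build:, publish:)
  request_file  = File.join(inbox, 'run-request-time')
  complete_file = File.join(inbox, 'run-complete-time')
  requested = File.exist?(request_file) ? File.read(request_file).strip : ''
  previous  = File.exist?(complete_file) ? File.read(complete_file).strip : ''

  # 1. Exit unless a run was requested after the last completed one.
  return [] unless requested > previous

  # 2. Record this run right away, but keep `previous`: step 5 needs
  #    the old value, not the one written here.
  File.write(complete_file, Time.now.getutc.strftime(STAMP_FORMAT))

  # 3. Merge each top-level folder of the inbox into the Jekyll source
  #    (the stamp files themselves are skipped). preserve: keeps mtimes.
  Dir.children(inbox).each do |name|
    path = File.join(inbox, name)
    FileUtils.cp_r(path, source, preserve: true) if File.directory?(path)
  end

  # 4. Build the site (for example by shelling out to `jekyll build`).
  build.call

  # 5. Source files modified since the previous run, as relative paths.
  changed = Dir.glob('**/*', base: source).sort.select do |rel|
    full = File.join(source, rel)
    File.file?(full) && File.mtime(full).getutc.strftime(STAMP_FORMAT) > previous
  end

  # 6. Publish them; categories and other standard files would go here too.
  publish.call(changed)
  changed
end
```

The detail that matters is `previous`: it is read before step 2 overwrites run-complete-time, so step 5 compares against the last run rather than this one.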

We note that there are race conditions in the time logic. If we update at the very end, a file saved during the run might get a time earlier than the soon-to-be-set completion time, and so never be published. But if we set completion right away and then explode, we won’t try again. Maybe that’s OK, or at least safer.
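One common way out of that race, offered as an alternative rather than what we decided: read the previous run-complete-time, note when this run *starts*, do the work, and write the start time only if the work finishes. `with_checkpoint` is a hypothetical name.

```ruby
# Hypothetical helper, not what we decided: note when the run starts,
# do the work, and record that start time only if the work succeeds.
def with_checkpoint(complete_file)
  since   = File.exist?(complete_file) ? File.read(complete_file).strip : ''
  started = Time.now.getutc.strftime('%Y%m%d%H%M%S')
  yield since                        # the run selects files changed after `since`
  File.write(complete_file, started) # skipped if the block raised
end
```

A file saved while the run is in progress gets a time at or after `started`, so the next run sees it; a run that raises leaves run-complete-time alone, so the next timer tick tries again. Two caveats: stamps have one-second resolution, so selecting with `>=` rather than `>` avoids missing a file saved in the same second as `started`, at the cost of sometimes publishing it twice; and a run that fails every time will retry on every tick.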

We discuss whether to build some kind of semaphore. I’m like “no”. Maybe later; it’s a good idea, but we don’t have a story for it now.

Today, when T gets here, I think we’ll do some work generating a list of files newer than some date, using the string format I have in mind. Meanwhile I’ll create a new test class.

OK, well, I have this already:

require "minitest/autorun"
require "pathname"

class Test_File_Thing  < Minitest::Test
  def setup
    pnY = Pathname.new('path-Y/a/b/')
    pnY.mkpath
    pnYC = pnY+'c.txt'
    pnYC.write("hello")
    sleep(1) #gotta make it newer
    pnX = Pathname.new('path-X/a/b/')
    pnX.mkpath
    pnXC = pnX+'c.txt'
    pnXC.write("hello")
  end

  def teardown
    pn = Pathname.new('path-X')
    pn.rmtree
    pn = Pathname.new('path-Y')
    pn.rmtree
  end

  def test_newer
    pair = FilePair.new('a/b/c.txt', 'path-X', 'path-Y')
    assert(pair.newer?, "should have been newer")
  end

  def test_second_exists
    ok = FilePair.new('a/b/c.txt', 'path-X', 'path-Y')
    assert(ok.second_exists?, "didn't find c.txt")
    missing = FilePair.new('a/b/nothere.txt', 'path-X', 'path-Y')
    assert(! missing.second_exists?, "eek found nothere.txt")
  end
end

class FilePair
  def initialize(common_path, first_path, second_path)
    @pn1 = Pathname.new(first_path)+common_path
    @pn2 = Pathname.new(second_path)+common_path
  end

  def newer?
    @pn1.ctime > @pn2.ctime
  end

  def second_exists?
    @pn2.exist?
  end
end

So that’s sort of interesting: it compares two files by their actual dates. Not quite what I think we need now. We’ll have a given “run-complete-time” and will be looking for files newer than that. I’ll go ahead and sketch in that change, which I think I’ll do just by taking an actual Time and formatting it.

  def test_string_newer
    pair = FilePair.new('a/b/c.txt', 'path-X', 'path-Y')
    d1 = pair.first_date
    d2 = pair.second_date
    assert(d1>d2, "should have been newer using strings")
    puts "#{pair.first_date} #{d1}"
  end

OK, I don’t like this, but offhand I couldn’t think of anything better. The puts is there so I can visually check whether my formatting works. The code is just this:

# in class FilePair
  # %Y gives a four-digit year, matching the YYYYMMDDHHMMSS stamp format
  def first_date
    @pn1.ctime.strftime('%Y%m%d%H%M%S')
  end

  def second_date
    @pn2.ctime.strftime('%Y%m%d%H%M%S')
  end

This isn’t lovely, but it does what I want. I’m just sketching to learn how the Ruby classes work and to get a sense of what I want. My best guess is that FilePair isn’t a production object at all, and that we’ll build something derived from it.
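A guess at what something derived from FilePair might look like: compare one file against a run-complete-time stamp instead of against a second file. `ChangedSince` is a hypothetical name. Two deliberate differences from FilePair, both assumptions worth checking: it uses `mtime` rather than `ctime`, because on macOS and Linux Ruby’s `ctime` is the inode change time, which also moves when a file is copied, renamed, or has its permissions changed; and it converts to UTC before formatting, so the stamp doesn’t depend on where the building machine happens to be.

```ruby
require 'pathname'

# Hypothetical successor to FilePair: compare a file with a
# run-complete-time stamp (YYYYMMDDHHMMSS) rather than with a second file.
class ChangedSince
  def initialize(stamp)
    @stamp = stamp
  end

  # Format a time the same way the stamp files are written, in UTC.
  def self.stamp_for(time)
    time.getutc.strftime('%Y%m%d%H%M%S')
  end

  # True for regular files modified after the stamp; directories are skipped.
  def include?(path)
    pn = Pathname.new(path)
    pn.file? && self.class.stamp_for(pn.mtime) > @stamp
  end
end
```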

Now we need to do a Quick Design Session on what we’ll really do. T asks whether we’re going to start by scrapping what we have for a list of files. I’m like “what list?”, so I check the tests and code.

In JekyllRunner we have this:

  def files_to_ftp(folder)
    perform_in_folder(folder) do
      Dir.glob('**/*').collect { |f| rename_md_to_html(f) }
    end
  end

This is used in ftp_the_articles. It gets the names of all the files in the iPad input folder and uses them to drive the publication. It seems we could make everything work by changing this one method so that, instead of looking at the iPad input, it scans all the Jekyll input folders and picks up everything “newer”, changing .md to .html as it does now.
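A sketch of that one-method change, assuming the Jekyll source folder and a YYYYMMDDHHMMSS cutoff string are passed in. `files_to_ftp_since` and `changed_after?` are hypothetical names; the real method goes through `perform_in_folder` and `rename_md_to_html`, whose details aren’t shown here, so this version inlines simple stand-ins. It uses `mtime` formatted in UTC, and needs Ruby 2.5 or later for `Dir.glob`’s `base:` option.

```ruby
# Hypothetical replacement for JekyllRunner#files_to_ftp: scan the Jekyll
# source folder instead of the iPad input, keep files modified after
# `since`, and rename .md to .html. It still returns Strings.
def files_to_ftp_since(source_folder, since)
  Dir.glob('**/*', base: source_folder)
     .select { |rel| changed_after?(File.join(source_folder, rel), since) }
     .map { |rel| rel.sub(/\.md\z/, '.html') }
end

# True for regular files whose UTC modification stamp is later than `since`.
def changed_after?(path, since)
  File.file?(path) && File.mtime(path).getutc.strftime('%Y%m%d%H%M%S') > since
end
```

Because it still returns strings, its callers wouldn’t need to change. One thing a real version would probably need to handle: Jekyll doesn’t copy files or folders whose names start with an underscore (such as `_layouts`) into the built site, so changes there have no matching output file to FTP.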

This will probably wreck all our tests, however. We do some exploration in irb and find out how to use Pathname to select by time:

2.2.0 :004 > require 'pathname'
 => true 
2.2.0 :005 > pn = Pathname.glob('**/*')
 => true 
2.2.0 :006 > pn.length
 => 2456 
2.2.0 :014 > sept = Time.new(2017,9,1)
 => 2017-09-01 00:00:00 -0400 
2.2.0 :015 > pn.find_all { |p| p.ctime > sept }
 => [#<Pathname:articles/017-08ff>, #<Pathname:articles/017-08ff/ipad-k-db/index.md>, #<Pathname:articles/017-08ff/ipad-l-dates>, #<Pathname:articles/017-08ff/ipad-l-dates/index.md>, #<Pathname:articles/017-08ff/ramps>, #<Pathname:articles/017-08ff/ramps/index.md>, #<Pathname:draft_folder>, #<Pathname:indexes>, #<Pathname:indexes/archived_index.yaml>, #<Pathname:xprog>, #<Pathname:xprog/articles>, #<Pathname:xprog/articles/practices>, #<Pathname:xprog/blog>] 
2.2.0 :016 > 

This gives us an array of Pathname objects for the folders and files whose change time is later than September 1, 2017. Our existing code, however, uses an array of strings holding the file names from the iPad input folder. We now face a few interesting issues:

- We now want to check the Jekyll input folders, not the iPad input;
- We’d like to switch to using Pathname instead of raw strings;
- Our test is currently built around the wrong application flow.

The morally correct thing to do would be to make the tests match the desired flow. I’m concerned, though, since the tests aren’t date-dependent at all.

I propose that we first change to Pathname. Or T does; I’m not sure. I don’t see anything in the JekyllRunner tests that depends on that, so it seems to me we can refactor JekyllRunner to use Pathname with impunity. Of course everything will break and we’ll be punished, which is an odd kind of impunity. So here goes:

Arrgh. We tried the obvious, changing our Dir.glob to Pathname.glob and following our noses. There are too many string assumptions going on in the code.

This, of course, tells us that we “should” have converted away from strings much sooner. “We talked about that.” “Yeah, but you didn’t do it.” Be that as it may, we are ratholed. We have no real choice but to revert and start over. Possibly our timer should have gone off sooner, but we’re pretty close to the nominal time for discovering you’re in a hole.

We git stash and will start this again anon.

I have an appointment, so we must break for the day. We’ll figure out what to do next Tuesday.

Lesson Learned(?)

This is a lesson I’ve learned a million times. When you write a program that passes strings around, that’s a Very Nasty Code Smell™. Sooner or later you’ll wish you hadn’t, and unwinding your reliance on strings will be difficult, because the damn things are all over.

We chatted a bit before separating for the day. I think we were too aggressive about using Pathname, and that starting top down, while it seems sensible, commits us to too many changes: at first glance, anyway, it seems like we have to fix it all the way down. On the other hand, maybe we just went too far or something.

Anyway, next time I think we’ll try bottom up, if we do the Pathname thing at all. We’ll find a method that accepts a file name as a string, and we’ll fix him to use a Pathname internally. If he calls out, we’ll leave him calling with a string, not a Pathname. Once he’s converted to Pathname throughout (and this should be “easy” since our methods are short), we can change his callers to send him a Pathname, which is, roughly, Pathname.new(string_file_name). Then we’ll change the callers, one at a time, to use Pathname internally. Rinse, repeat. This “should” be straightforward.
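A small illustration of that first bottom-up step, using a hypothetical `html_name_for` method rather than anything from JekyllRunner. Kernel’s `Pathname()` conversion accepts either a String or a Pathname, so the method can work with a Pathname inside while some callers still pass strings and others have already been converted. It still returns a String, so callers that expect one keep working.

```ruby
require 'pathname'

# Hypothetical low-level method, converted first in a bottom-up migration.
def html_name_for(file_name)
  path = Pathname(file_name)         # String or Pathname in, Pathname inside
  path = path.sub_ext('.html') if path.extname == '.md'
  path.to_s                          # String out, for callers not yet converted
end

html_name_for('articles/ramps/index.md')           # => "articles/ramps/index.html"
html_name_for(Pathname('articles/ramps/index.md')) # => "articles/ramps/index.html"
```

Once every caller passes a Pathname, the `to_s` at the end can go, and the callers that want a Pathname back get one.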

We’ll find out.

Top down might work similarly; I’m not sure. I rather prefer top down, because one wants one’s powerful abstractions up near the top and the more machine-like code down below. So we might try that.

However, we’re faced with a moral dilemma. It’s quite easy to use Pathname in one place, get all the folders and files, select them by date as needed, and then return not an array of Pathname but an array of strings. Everything else would just work. We could even do it with our existing Dir.glob() construction. The code isn’t as habitable as it should be, but we’re here now, and our job isn’t to drain the swamp.

You’ve probably been there. You can just bash your new feature in, or you can pause and habitable up the code and then slide the feature in more easily. Bashing is always faster “just this once”. But each new bash takes longer and longer. Cleaning up pays off if there are N more changes coming down. What’s N? We don’t know.

My advice is always the same: keep the code as habitable as you can, because it pays off. As you can see, I face the same dilemmas and temptations that you probably do and they’re just as painful. The good news is, when I put off doing something, and get in trouble later, I can write an article about it and it’s almost as good as doing things right. Almost.

Anyway, see you next time! I’ll publish this one because I want to write another one about TDD.

“Just” is one of the most dangerous words in software development. Whenever I hear it, I make a note that there’s big trouble ahead, and I recommend the same to you. ↩
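The Tozier family asks what will happen if I write an article in Boulder, or Singapore. I don’t know. Maybe use GMT in the files? File times are stored as absolute instants, so comparing Ruby Time objects works the same in any zone; it’s the formatted strings that depend on the local zone unless we convert to UTC before formatting. We’ll see. ↩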