Blog Software

An Eccentric Anomaly: Ed Davies's Blog

This blog is produced using some home-grown software. It's intended specifically for my own use (that's the point: it should do what I want without all sorts of configuration options for things I don't need), so it is very unlikely to be of much direct interest to anybody else. Nevertheless, some design decisions, and perhaps even code modules, might be of interest to somebody rolling their own.

This entry is a brief overview to be supplemented by later discussion of certain aspects.

Principles

For quite a few years I have managed my web site simply by hand-editing the required files. It has mostly worked fine but is a bit painful when updating the layout and doesn't scale to the level of duplication required for a blog with index pages and an Atom feed. However, I'm quite happy with hand-editing the actual content as XHTML.

What I want, therefore, is something which can take the content in a heap of XML files and build the site ready to serve. I don't see any point in having much of a database or anything anywhere in the process: baking (generating all the static files ahead of time) is fine. That was a pretty good thing with my previous hosting provider, who only supported static files; my current provider does support CGI and SQL but still...

Evolution

I've been fiddling around with this problem on and off for a couple of years. The first version built the site by running a handful of Saxon XSLT scripts from Java using a Jena RDF triple store to collect information on each of the pages. It worked but finished up not giving as much flexibility and power as I had hoped for the amount of code involved.

I then did a rewrite in Python which sat almost finished for nearly a year. In the meantime I've been using a fairly baroque set of scripts to store my static web site in Subversion, determine the changed files and FTP them to my old site. The Python code was sort of a plugin for this.

Recently I moved hosting provider and switched from Subversion to Mercurial. Both changes simplified things as the new provider supports the use of rsync for updates, getting rid of the need to work out what needs FTPing, and, less significantly, Mercurial doesn't leave a lot of dot directories deep in the tree where they're a nuisance. I wrote a little Python script to take a copy of the site files, check that all of the files which should be well-formed XML actually are, then either serve the site locally for testing or upload it using rsync.
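The well-formedness check mentioned above can be sketched roughly as follows. This is only an illustration, not the actual script: the function name find_malformed and the set of file suffixes treated as XML are assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: the suffixes of files that are expected to be well-formed XML.
XML_SUFFIXES = {".xml", ".html", ".xhtml"}


def find_malformed(root):
    """Return (path, message) pairs for XML files under root that fail to parse."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in XML_SUFFIXES:
            try:
                # Parsing is enough to establish well-formedness;
                # no validation against a schema or DTD is attempted.
                ET.parse(str(path))
            except ET.ParseError as err:
                problems.append((path, str(err)))
    return problems
```

A caller would typically refuse to serve or upload the copy of the site if the returned list is not empty.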

Very recently I've plugged my older Python code to do blog and static page templating into this. I was only going to make a few little changes but finished up doing sufficient simplification and tidying up (mostly removal of 'good ideas' which probably aren't going to be needed) that it basically became a rewrite.

Workflow

The site files are stored in Mercurial in more-or-less the directory structure they'll have on the web. Old files from my existing site are stored in exactly this way and, apart from being copied around and checked for XML well-formedness as appropriate, are untouched.

The repository actually contains two directories: www for the web site files themselves and bin for the scripts which build the site. Keeping these under the same version control should make it a lot easier to go back and build old versions of the site if required.

I write blog entries and new static pages as XML files which are pretty close to being Atom Entry Documents. There are some slight differences so elements which don't strictly comply with the Atom spec are put in a parallel namespace to avoid any possible confusion.

For example, here's the start of this document:

<?xml version="1.0" ?>

<c:entry 
    xmlns="http://www.w3.org/1999/xhtml" 
    xmlns:a="http://www.w3.org/2005/Atom" 
    xmlns:c="http://names.edavies.me.uk/2010/01/content">
        
    <a:title>Blog Software</a:title>
    <a:category term="self-reference"/>
    <a:published>2011-03-31T18:30:00Z</a:published>
    <c:minor-updated>2011-04-24T14:40:00Z</c:minor-updated>
    
    <c:content>

<div>

<p>
This blog is produced using some home-grown software.
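
To show how the metadata in a source file like this can be read back, here is a minimal sketch using Python's standard ElementTree module. The namespaces and element names come from the excerpt above; the function name read_metadata and the SAMPLE document (a shortened, closed-off version of the excerpt) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map matching the declarations in the excerpt.
NS = {
    "a": "http://www.w3.org/2005/Atom",
    "c": "http://names.edavies.me.uk/2010/01/content",
}

# Illustrative document: the excerpt with a short body and closing tags added.
SAMPLE = """<?xml version="1.0" ?>
<c:entry
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:a="http://www.w3.org/2005/Atom"
    xmlns:c="http://names.edavies.me.uk/2010/01/content">
    <a:title>Blog Software</a:title>
    <a:category term="self-reference"/>
    <a:published>2011-03-31T18:30:00Z</a:published>
    <c:minor-updated>2011-04-24T14:40:00Z</c:minor-updated>
    <c:content><div><p>Body text.</p></div></c:content>
</c:entry>
"""


def read_metadata(xml_text):
    """Pull the title, categories and dates out of an entry document."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("a:title", namespaces=NS),
        "categories": [c.get("term") for c in root.findall("a:category", NS)],
        "published": root.findtext("a:published", namespaces=NS),
        "minor_updated": root.findtext("c:minor-updated", namespaces=NS),
    }
```

Calling read_metadata(SAMPLE) gives a dictionary with the title "Blog Software" and the single category "self-reference".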

I now follow the common pattern of giving all pages names of the form <year>/<month>/<name>/ published as an index.html file in that directory. The corresponding source files have names of the forms:

<year>/<month>/<name>.blog-entry.xml
<year>/<month>/<name>/blog-entry.xml
<year>/<month>/<name>.static-page.xml
<year>/<month>/<name>/static-page.xml

That is, either a file with the page name and a double-barrelled extension, or a file with a fixed name in the named directory. If the page has any associated files (e.g., images) then the directory plus fixed file name form is needed and the other files go straight in the directory.

For example, the source file for this page is called www/2011/03/blog-software/blog-entry.xml.
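The naming rules above amount to a small mapping from source path to published path. A minimal sketch, assuming paths relative to the www directory; the function name published_path and the KINDS tuple are hypothetical and not taken from the real scripts:

```python
from pathlib import PurePosixPath

# The two kinds of source file described above.
KINDS = ("blog-entry", "static-page")


def published_path(source):
    """Map a source path to (kind, index.html path), or None for other files."""
    path = PurePosixPath(source)
    for kind in KINDS:
        if path.name == kind + ".xml":
            # <year>/<month>/<name>/blog-entry.xml form
            return kind, str(path.parent / "index.html")
        suffix = "." + kind + ".xml"
        if path.name.endswith(suffix):
            # <year>/<month>/<name>.blog-entry.xml form
            name = path.name[: -len(suffix)]
            return kind, str(path.parent / name / "index.html")
    # Anything else (images and so on) is not a page source.
    return None
```

Both source forms for a given page name map to the same output file, so published_path("2011/03/blog-software/blog-entry.xml") and published_path("2011/03/blog-software.blog-entry.xml") each give ("blog-entry", "2011/03/blog-software/index.html").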

Publication is done by the update-site.py script. This takes a temporary copy of the www directory tree, builds the static and blog pages and the associated index HTML and feed documents, deletes the input files, checks the lot for XML well-formedness, then either serves the pages locally for testing and proofreading or rsyncs them to the host's web space.

Usage: update-site.py [options]

Options:
  -h, --help   show this help message and exit
  --repo=REPO  Root directory of the repository with the source files in a www
               sub-directory, (/home/edavies/projects/web/blog-software)
  --work=WORK  Work directory in which to build site images, etc,
               (/home/edavies/web-work)
  --serve      Serve the constructed site locally (default)
  --release    Deploy the constructed site to edavies.me.uk
  --check      Just construct the site and do checks
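
For anyone wanting a similar command line, the options listed above could be declared along these lines. This sketch uses argparse and is not the script's real parsing code: the single "mode" destination and the decision to make --serve, --release and --check mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options shown in the usage text."""
    parser = argparse.ArgumentParser(prog="update-site.py")
    parser.add_argument(
        "--repo",
        default="/home/edavies/projects/web/blog-software",
        help="Root directory of the repository with the source files "
             "in a www sub-directory",
    )
    parser.add_argument(
        "--work",
        default="/home/edavies/web-work",
        help="Work directory in which to build site images, etc",
    )
    # Assumption: the three modes exclude one another and share one destination.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--serve", dest="mode", action="store_const", const="serve",
                      help="Serve the constructed site locally (default)")
    mode.add_argument("--release", dest="mode", action="store_const", const="release",
                      help="Deploy the constructed site")
    mode.add_argument("--check", dest="mode", action="store_const", const="check",
                      help="Just construct the site and do checks")
    parser.set_defaults(mode="serve")
    return parser
```

With this arrangement, build_parser().parse_args([]) selects the "serve" mode, and the rest of the script can branch on the single mode value.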

Mostly it's all pretty straightforward Python which isn't very interesting. The templating code which creates the output HTML pages and Atom feed contains a few ideas which could be of use to somebody so I'll try to do a separate post about how that works soon.

The key balance in this code is how specific to my own needs to make it. On the one hand it would be nice if it was more generally applicable but on the other hand the point is to try to make it not too complicated. The compromise taken is to try to set up the structure for something which could be fairly general without producing a lot of code to support it. Here's a typical method in class BlogPage which is a subclass of ContentPage:

    def blog(self):
        """ Return the blog associated with this blog page.
            
            Overrides the default implementation in ContentPage.
            
            We only have one blog for now so determining which blog
            this is an entry for is pretty simple.
        """
        return anEccentricAnomaly

I was a bit concerned that it might all get a bit too slow so I created a pile of dummy pages: 20 for each month in the years 2012 to 2018 inclusive, which comes to 1,680 pages. On my 1.3 GHz, 1.5 GB laptop those took just over 50 seconds to process, or roughly 30 ms per page. I could live with that but maybe I'll get a faster machine in about 2014 (perhaps tweaking the code for a bit more parallelism) or work out how to rebuild only stuff which has changed recently. It is nice, though, that at the moment the site builds and starts serving locally in only a couple of seconds.
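
A set of dummy pages like that can be generated with a few lines of Python. This is a sketch of the idea rather than the code actually used: the file naming (dummy-NN.blog-entry.xml), the template contents and the function names are all invented for illustration.

```python
from pathlib import Path

# Assumption: a minimal entry in the source format described earlier.
ENTRY_TEMPLATE = """<?xml version="1.0" ?>
<c:entry
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:a="http://www.w3.org/2005/Atom"
    xmlns:c="http://names.edavies.me.uk/2010/01/content">
    <a:title>{title}</a:title>
    <a:published>{published}</a:published>
    <c:content><div><p>Dummy content.</p></div></c:content>
</c:entry>
"""


def dummy_pages(first_year=2012, last_year=2018, per_month=20):
    """Yield (relative path, XML text) for each dummy blog entry."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            for n in range(1, per_month + 1):
                path = f"{year}/{month:02d}/dummy-{n:02d}.blog-entry.xml"
                text = ENTRY_TEMPLATE.format(
                    title=f"Dummy {year}-{month:02d} #{n}",
                    published=f"{year}-{month:02d}-01T00:00:00Z",
                )
                yield path, text


def write_dummy_pages(www_root, **kwargs):
    """Write the dummy entries under www_root and return how many were written."""
    count = 0
    for rel_path, text in dummy_pages(**kwargs):
        target = Path(www_root) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        count += 1
    return count
```

With the default arguments the generator covers 7 years of 12 months with 20 entries each.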

mint.py

Creating the directories and outline files for a new blog post is hardly a lot of work but it's a bit tedious and there are a few potential mistakes to make, so I have a mint.py script to mint new URIs in the edavies.me.uk namespace. For the moment it just creates outline static and blog pages but I have ideas for extensions. It uses the same XML templating code as the update-site.py script.

Usage: mint.py [options] name [title/description]

Options:
  -h, --help            show this help message and exit
  --repo=REPO           Root directory of the repository with the source files
                        in a www sub-directory, (/home/edavies/projects/web
                        /blog-software)
  --base=BASE           Initial part of the URI to be minted,
                        (http://edavies.me.uk/)
  --year=YEAR           Year component of URI to be minted, (2011)
  --month=MONTH         Month component of URI to be minted, (03)
  --blog                Create a blog entry (default)
  -s, --static          Create a static page
  -#, --hash            Create a hash URI
  -d, --directory       Create directory for entry
  -n, --noedit          Skip opening the outline in an editor
  -c CATEGORIES, --category=CATEGORIES
                        Specify a category for a blog post
  --self-reference      Category: Commentary on this blog's construction
  --general             Category: Life, the universe and everything
  --rant                Category: The many things that wind me up
  --astronomy           Category: Things in space and in the sky in general

I created the outline of this page using:

bin/mint.py --self-reference blog-software Blog Software
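
The core of minting a URI is just putting the base, year, month and name together, with the year and month defaulting to the current date as the option defaults above suggest. A minimal sketch; the function name mint_uri and its signature are assumptions rather than the script's real interface:

```python
from datetime import date


def mint_uri(name, base="http://edavies.me.uk/", year=None, month=None, today=None):
    """Build a page URI of the form <base><year>/<month>/<name>/."""
    # Fall back to the current date for any component not given explicitly.
    today = today or date.today()
    year = today.year if year is None else int(year)
    month = today.month if month is None else int(month)
    return f"{base}{year:04d}/{month:02d}/{name}/"
```

For example, mint_uri("blog-software", year=2011, month=3) gives http://edavies.me.uk/2011/03/blog-software/, which is the address the source file www/2011/03/blog-software/blog-entry.xml is published under.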