Blog Templating

An Eccentric Anomaly: Ed Davies's Blog

Introduction

The software used to create this blog writes a number of files: the individual HTML pages, the HTML index files and the Atom feed. Though it's not necessary for the HTML to be well-formed XML I do have a strong preference for it to be. Actually, I'd rather just use XHTML but that's a separate rant.

XML is just text and can be treated as such but there are enough traps for the wary, let alone the unwary, that it's really not a good plan; in particular, when doing template substitution on the text form it is quite hard to guarantee that the result will be well-formed. It's much better to handle it as a proper document structure and leave serialization to code which specializes in doing that right. Python has a number of packages for handling XML with varying degrees of power and standardization but I decided to just use the basic xml.dom and xml.dom.minidom for simplicity. The lack of any form of XPath expressions or the like could be a pain in some applications but isn't a problem here. The compromised namespace management is more irritating but I've hived off dealing with that to a separate module, avoiding too much grief while accepting some limitations.

The general scheme is to have template documents stored in the Python code as multiline strings and "compiled" to DOM trees on load. For each output document the Python code recurses down the appropriate template tree producing the output by copying most of the template and performing substitutions for elements in a specific namespace.

The template document therefore has a similar flavour to an XSLT Literal Result Element StyleSheet. The main difference is that the values substituted come from Python data structures rather than from an input XML document.
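As a rough illustration of the scheme described above, here is a minimal sketch using xml.dom.minidom. It is not the blog's actual code: the namespace URI, the `expand` and `generate` names, and the use of a plain dict as the context are all assumptions made for the example, and only a cut-down t:sequence is handled.

```python
from xml.dom import minidom

# Assumed namespace URI for illustration only; the real one is not given here.
TEMPLATE_NS = "http://example.org/ns/template"

# "Compile" the template string to a DOM tree once, at load time.
TEMPLATE = minidom.parseString(
    '<p xmlns:t="%s">Hello, <t:sequence value="name"/>!</p>' % TEMPLATE_NS
)


def expand(node, result_doc, parent, context):
    """Copy node under parent, substituting elements in TEMPLATE_NS."""
    if node.nodeType == node.TEXT_NODE:
        parent.appendChild(result_doc.createTextNode(node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        if node.namespaceURI == TEMPLATE_NS and node.localName == "sequence":
            # Simplified: the value attribute is a plain key into a dict.
            value = context[node.getAttribute("value")]
            parent.appendChild(result_doc.createTextNode(str(value)))
        else:
            copy = result_doc.createElement(node.tagName)
            for name, value in node.attributes.items():
                if not name.startswith("xmlns"):  # drop namespace declarations
                    copy.setAttribute(name, value)
            parent.appendChild(copy)
            for child in node.childNodes:
                expand(child, result_doc, copy, context)


def generate(context):
    """Expand the template against context and serialize the result."""
    result_doc = minidom.Document()
    expand(TEMPLATE.documentElement, result_doc, result_doc, context)
    return result_doc.documentElement.toxml()


print(generate({"name": "Tom & Jerry"}))
```

This prints `<p>Hello, Tom &amp; Jerry!</p>`. The ampersand is escaped by the serializer rather than by the substitution code, which is the point of working on the document structure instead of the text.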

Template Elements

Here's a template fragment which illustrates four of the special template elements. It's the part which generates the blog link subtitle for blog entry pages (An Eccentric Anomaly: Ed Davies's Blog; see the title bar, above). I've changed the whitespace for readability; the original is set so that the whitespace carried through makes the result document reasonably readable, at the cost of the template being a bit of a puzzle:

<t:for_each value="blog?">
    <h2>
        <a>
            <t:attribute name="href">/<t:path generator=".path"/>/</t:attribute>
            <span class="blogTitle">
                <t:sequence value=".title"/>
            </span>
            <span class="blogTitleSeparator">: </span>
            <span class="blogSubtitle">
                <t:sequence value=".subtitle"/>
            </span>
        </a>
    </h2>
</t:for_each>

With the prefix t bound to the appropriate namespace the special elements processed are:

<t:sequence value="..." generator="...">

Substitutes the value(s) specified by the value and generator attributes as described below. If multiple values are specified they are simply concatenated with no intervening whitespace.

<t:for_each value="..." generator="...">

Expands the contents of the for_each element once for each value specified by the value and generator attributes as described below. During each expansion the value is pushed on to the top of the context stack (also described below) and popped back off afterwards.

<t:attribute name="...">

Sets an attribute named by the name attribute on the closest enclosing result element. The attribute value is found by concatenating the node values (text strings stripped of any XML markup) of the children of the t:attribute element. Typically this will be a single t:sequence element. The attribute is put in the default namespace.

<t:path value="..." generator="...">

Makes a list of the string values of the contents of the t:path element and the values specified by the value and generator attributes and joins them together separated by the URI path separator character ('/'). Substitutes that into the result.
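The joining done by t:path can be pictured with a small sketch. The function name `join_path`, its parameters and the example path segments are made up for illustration, and putting the element contents before the attribute values is an assumption based only on the order in which the description lists them.

```python
def join_path(contents=(), value=None, generated=()):
    """Join the string forms of the parts with '/', the URI path separator."""
    parts = [str(part) for part in contents]
    if value is not None:
        parts.append(str(value))
    parts.extend(str(part) for part in generated)
    return "/".join(parts)


# Roughly what the href in the fragment above would come to if the
# (hypothetical) .path generator yielded these segments.
href = "/" + join_path(generated=["blog", "2010", "templating"]) + "/"
print(href)
```

This prints `/blog/2010/templating/`; the leading and trailing slashes come from the literal text around the t:path element in the template, not from the join.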

Expressions

The values of the value and generator attributes of the various template elements are expressions which reference Python values of various sorts.

The basis of this access is a context stack. This is a simple list of Python objects. Any sort of Python object can be used but they're typically normal objects with properties to be accessed by name in the expressions or dicts with string keys to be accessed similarly.

When a template substitution operation is started a root context object is specified. For example, when a page for an individual blog entry is to be produced this is a Python object with properties containing or referencing all the required information about the blog entry. This forms the single initial value on the context stack. Whenever a t:for_each element is encountered the values it references are, in turn, pushed onto the context stack, the body of the t:for_each is expanded and the value is popped back off the stack. t:for_eaches can be nested in the obvious way.
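The push, expand, pop cycle can be sketched as follows. This is only an illustration under simple assumptions: the stack is a plain list, and `for_each` and `expand_body` are hypothetical names standing in for whatever the real code does when it expands the body of a t:for_each.

```python
def for_each(context_stack, values, expand_body):
    """Expand the body once per value, with that value on top of the stack."""
    results = []
    for value in values:
        context_stack.append(value)      # push the current value
        try:
            results.append(expand_body(context_stack))
        finally:
            context_stack.pop()          # always pop it back off
    return results


root = {"title": "root context"}
stack = [root]

# Nested use: the inner body sees root, the outer value and the inner value.
depths = for_each(
    stack, ["a", "b"],
    lambda s: for_each(s, [1, 2], lambda inner: (inner[-2], inner[-1], len(inner))),
)
print(depths)
```

The try/finally is a design choice for the sketch: it keeps the stack balanced even if expanding a body raises, so that the root context is again the only entry once the outermost loop finishes.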

Rather than duplicate some writing, here are the starts of the two main functions responsible for expression evaluation:

def _expr(self, expr):
    """ Evaluate an expression in the context of this generator.
    
        expr ::= simpleExpr ( '|' simpleExpr )*
    
        The first simpleExpr which evaluates to something other than 
        None is the result.
    """
def _simpleExpr(self, expr):
    """ Evaluate a simple expression string in the context of this generator.
    
        simpleExpr ::= contextSpec ( propRef ( '.' propRef )* )?
        
        That is, a (possibly empty) context specification followed 
        by zero or more dot-separated property references.
        
        contextSpec ::= '.'*
        
        An empty context specification indicates use of the context object
        passed to the top-level XMLTemplate.generate function. One dot
        means the current object of the nearest enclosing t:for_each,
        two dots the one surrounding that and so on. When there are 
        n nested t:for_each elements n+1 dots is synonymous with none
        (i.e., accesses the top-level context object). More than n+1 dots 
        is an error.
        
        In principle an empty expression could be a reference to the 
        top-level context object but, since that's not likely to be 
        useful it's taken as an error. Might revisit this if an expression
        like '.wibble?|' (i.e., the wibble property of the innermost context 
        object, if any, otherwise the global context object) was ever
        found to be useful.
        
        propRef ::= propName '?'?
        
        propName ::= <any characters other than '.', '?' or '|'>+
        
        A property reference names a property of a Python object or,
        if it is a dictionary (member of class dict or a derivative)
        then the value for that key. An appended question mark indicates
        that the property is optional, if it is not present then None
        is returned rather than raising an error.
    """

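To make the grammar in those docstrings concrete, here is a minimal sketch of an evaluator for it. It is written from the docstrings alone, not from the real function bodies: the standalone functions `evaluate` and `simple_expr`, the `ExprError` class and the list-based `stack` argument are assumptions for the example, and calling of callable results is left out.

```python
class ExprError(Exception):
    """Raised for malformed expressions or missing required properties."""


def simple_expr(stack, expr):
    """Evaluate contextSpec ( propRef ( '.' propRef )* )? against stack.

    stack[0] is the top-level context object; stack[-1] is the current
    object of the innermost t:for_each.
    """
    if expr == "":
        raise ExprError("empty expression")
    rest = expr.lstrip(".")
    dots = len(expr) - len(rest)
    if dots > len(stack):                 # more than n+1 dots
        raise ExprError("too many dots in %r" % expr)
    obj = stack[0] if dots == 0 else stack[-dots]
    if rest == "":
        return obj
    for ref in rest.split("."):
        optional = ref.endswith("?")
        name = ref[:-1] if optional else ref
        if not name or "?" in name:
            raise ExprError("bad property reference in %r" % expr)
        if isinstance(obj, dict):
            present = name in obj
            value = obj.get(name)
        else:
            present = hasattr(obj, name)
            value = getattr(obj, name, None)
        if not present:
            if optional:
                return None               # assumed: the rest of the chain is skipped
            raise ExprError("no property %r in %r" % (name, expr))
        obj = value
    return obj


def evaluate(stack, expr):
    """expr ::= simpleExpr ( '|' simpleExpr )*; first non-None result wins."""
    for alternative in expr.split("|"):
        value = simple_expr(stack, alternative)
        if value is not None:
            return value
    return None


class Entry:
    title = "Hello"


stack = [{"blog": {"title": "An Eccentric Anomaly"}}, Entry()]
print(evaluate(stack, ".title"))                 # innermost context object
print(evaluate(stack, ".subtitle?|blog.title"))  # falls back to the top level
```

With one t:for_each value on the stack this prints `Hello` and then `An Eccentric Anomaly`; `..blog.title` (n+1 dots) reaches the same top-level object as `blog.title`, and `...title` is rejected.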
Expression Values

Expression evaluation first accesses the appropriate Python object property or dictionary values described above. If the result is callable it is called and the returned value is used. None values (either original or the result of a call) are discarded.

For the generator attribute the resulting value must be an iterable other than a string. It is iterated and the resulting values are used. The value attribute's value is used directly. If, slightly oddly, an element has both a value and generator attribute then the value's value is used first then the generator's values.
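Those rules can be summarized in a short sketch. The names `resolve` and `collect_values` are hypothetical, and the arguments here are the already looked-up objects rather than expression strings. None is discarded for the value itself and for a missing generator; items yielded by the generator are passed through untouched, which is an assumption.

```python
def resolve(raw):
    """Call raw if it is callable; otherwise use it as it is."""
    return raw() if callable(raw) else raw


def collect_values(value=None, generator=None):
    """Return the value (if any) followed by the generator's values."""
    values = []
    single = resolve(value)
    if single is not None:
        values.append(single)
    many = resolve(generator)
    if many is not None:
        if isinstance(many, (str, bytes)) or not hasattr(many, "__iter__"):
            raise TypeError("generator must be an iterable other than a string")
        values.extend(many)
    return values


print(collect_values(value=lambda: "first", generator=["second", "third"]))
```

This prints `['first', 'second', 'third']`, showing the value's value coming before the generator's values when both are given.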

The value obtained is typically transformed into a DOM tree in the result document. If it's a DOM node it's deep copied in. Iterables other than strings are iterated and the resulting values are copied in, recursively. Strings and anything which can be converted to a string (using the unicode function) are converted to an XML text node.

Here's the start of the docstring for the function responsible for that:

Yields deep copies of (possibly virtual) DOM tree(s) as new DOM
nodes in the implementation of the result document.

source can be, in descending order of precedence:

    None:               Yields nothing.

    A DOM node:         Deep copied into the result implementation:
                        see code for odd cases.
    
    A tree walker:      Something implementing domTreeIterator
                        which returns a virtual DOM tree or forest
                        to be copied into the result implementation.
                        
    A string:           Instance of basestring (i.e., str or unicode)
                        which is copied into a text node in the result 
                        implementation.
                        
    An iterable:        Iterated yielding the copies of the elements.
                        
    Anything else:      Anything implementing __unicode__ or __str__
                        which is copied into a text node in the
                        result implementation.
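
The precedence order in that docstring can be sketched with minidom as below. This is an approximation rather than the real function: `copy_into` is a made-up name, the tree walker case is omitted because domTreeIterator is specific to the blog's code, and the sketch is written for Python 3, where str takes the place of basestring and unicode.

```python
from xml.dom import minidom, Node


def copy_into(source, result_doc):
    """Yield nodes owned by result_doc representing source."""
    if source is None:
        return                                       # yields nothing
    if isinstance(source, Node):
        yield result_doc.importNode(source, True)    # deep copy
    elif isinstance(source, str):
        yield result_doc.createTextNode(source)
    elif hasattr(source, "__iter__"):
        for item in source:
            yield from copy_into(item, result_doc)   # recurse into iterables
    else:
        yield result_doc.createTextNode(str(source))


result = minidom.Document()
div = result.createElement("div")
result.appendChild(div)

other = minidom.parseString("<em>hi</em>")
for node in copy_into(["a", None, 42, [other.documentElement, "b"]], result):
    div.appendChild(node)
print(div.toxml())
```

This prints `<div>a42<em>hi</em>b</div>`. The string test has to come before the iterable test, since strings are themselves iterable; that is presumably why the docstring lists the cases in order of precedence.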

One of the odd cases with DOM nodes alluded to above is CDATA sections. If the result is to be interpreted as HTML then these are converted to text nodes (as they're not part of HTML). Where text is to remain as XML (such as the XHTML in the Atom feed) then CDATA sectionness (sectionality?) is preserved.
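
A sketch of that CDATA rule, assuming a hypothetical `as_html` flag saying how the result will be interpreted (the real code's way of knowing this isn't shown here):

```python
from xml.dom import minidom, Node


def copy_character_data(node, result_doc, as_html):
    """Copy a text or CDATA node, flattening CDATA to text for HTML output."""
    if node.nodeType == Node.CDATA_SECTION_NODE and not as_html:
        return result_doc.createCDATASection(node.data)
    return result_doc.createTextNode(node.data)


source = minidom.Document().createCDATASection("a < b")
result = minidom.Document()
print(copy_character_data(source, result, as_html=True).toxml())
print(copy_character_data(source, result, as_html=False).toxml())
```

The first line printed is `a &lt; b` (an ordinary escaped text node) and the second is `<![CDATA[a < b]]>`.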

domTreeIterator is my own function, analogous to Python's __iter__, which returns an iterator over the nodes of a DOM tree. This is particularly handy for "virtual" DOM trees created from parts of existing ones, e.g., the first few paragraphs of a blog entry which are to appear in the blog index HTML document and the Atom feed.
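
The real domTreeIterator isn't shown in this post, so the following is only a guess at the general idea: a generator that walks a DOM tree in document order, plus a helper that picks out the first few paragraphs of an entry. Both `iter_dom` and `first_paragraphs` are hypothetical names, and the helper returns real nodes rather than a virtual tree.

```python
from itertools import islice
from xml.dom import minidom, Node


def iter_dom(node):
    """Yield node and all its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from iter_dom(child)


def first_paragraphs(root, count):
    """Return up to count <p> elements that are direct children of root."""
    paragraphs = (
        child for child in root.childNodes
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == "p"
    )
    return list(islice(paragraphs, count))


entry = minidom.parseString("<div><p>a</p><p>b<em>c</em></p><p>d</p></div>")
print([node.nodeName for node in iter_dom(entry.documentElement)])
print([p.toxml() for p in first_paragraphs(entry.documentElement, 2)])
```

The second line printed is `['<p>a</p>', '<p>b<em>c</em></p>']`, the sort of excerpt that might go into an index page or feed.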

Code

I tried showing a few examples but they seemed to need a bit much context to be interesting so, instead, here are a few key modules which should give a pretty good idea.

The main template substitution code is in xmltemplate.py. This declares the XMLTemplate class which is used in a couple of places including genericpagetemplate.py. The templating code is in a separate package whose __init__.py handles the overall templating operation once the higher level code has dealt with making a working copy of the files and before it deals with deployment (rsync or local serving).

If anybody's interested in looking at the rest of the code just ask and I'll see about putting it on BitBucket or something.