SExpr: Internal Representation

An Eccentric Anomaly: Ed Davies's Blog

SExpr lives in the same space as Peter Norvig's Lispy and Michael Nielsen's Tiddlylisp but the internal representation of S-expressions is just different enough to be worth writing down if only to give context to following blog posts on other aspects of the language.

An sexpr can be any of: null, True, False, a number, a string, a callable (lambda expression, value of a symbol defined by a def expression, a built-in or a context object (to be discussed later)), or, recursively, a cons pair of two other sexprs.

Basically, all values are immutable. The only exception is that you can define a symbol in a context if that symbol has not previously been defined (and if the context hasn't been frozen). There's no equivalent of Scheme's set! or set-car!, etc.

Most of these values have straightforward Python implementations. See sexpr/__init__.py.

Null is represented by the singleton instance of a special class. At first I just used Python's None value but that allowed a couple of errors where I forgot return values from some built-ins by accepting Python's default return value. Also, having a separate class allows a custom __str__ function to return “()” making for more convenient serialization.

True and False are represented directly by the equivalent Python values as they serialize to sensible strings.

Numbers are represented by Python int and float values in the obvious way. The only bit that was non-obvious to me is that Python's True and False values are sort of numbers whereas I don't want booleans to be considered as numbers so explicitly disallow them as being considered so. (Must get round to understanding this aspect of Python in more detail - I suspect I'm not going to like what I find.)

The only strings allowed in the source language, at present, are identifiers but they're defined pretty liberally. They are represented internally using Python strings. Later I could add quoted strings in the source language if I found a need and represent them in the same way, perhaps with an implicit quote wrapper - I'm not sure.

At first I represented cons pairs as simple two-element Python tuples but fairly quickly changed to using instances of a Python class specifically written for the purpose. As with null that makes serialization easier but more importantly makes equality testing a lot quicker as the class can have its own custom hash function so two cons pairs with different hash values can quickly be seen to have different heads, tails or both. IIRC, this changed some code from taking multiple seconds to execute down to near instant.

Lispy and Tiddlylisp don't bother with cons pairs - they just support well-formed (ie, properly null-terminated) lists which are represented directly as Python lists. The problem is that when they take the tail of the list (cdr) they have to make a copy which seems a bit heavy handed. My code, dealing with parts lists, does this a lot. Anyway, cons pairs is proper Lisp ;-).

Callables are basically anything which can be the first item of a list which is evaluated. They are represented by Python functions, an idea I got from Lispy and Tiddlylisp. For a Python function to be considered a callable it must take two parameters:

Somewhat unusually, the remainder of the list is not evaluated prior to being passed to the function. It's up to the callee to do that, using the context object passed as the second parameter, in the common case where that's required. This allows the callee the option of not evaluating some of its parameters, e.g., the first for a let expression to give the name of the symbol to be defined.

Here's the implementation of let in the interpreter:

def destructure(var, val, ctx):
    """ Carry out a simple form of destructuring assignment. """
    if var == null:
        assert val == null
    elif isPair(var):
        destructure(head(var), head(val), ctx)
        destructure(tail(var), tail(val), ctx)
    else:
        ctx[var] = val

def operatorLet(params, ctx):
    """ Bind a symbol to a value (or a list of symbols to the values in
        a list).
    """
    assert listLen(params) == 2
    var = params[0]
    val = seval(params[1], ctx)
    destructure(var, val, ctx)
    return val

The first element of params (the name to be defined) is used as is whereas seval is called for the second element to get the value to be assigned. The checks for the symbol being a string, not already being defined and the context not being frozen are done within the context object's __setitem__ method.

This leaving evaluation to the callee also allows short-circuit evaluation to bypass expressions which would cause nasty side effects (like raising errors or taking a long time). E.g:

def operatorAnd(params, ctx):
    assert listLen(params) >= 2
    for e in listItems(params):
        p = seval(e, ctx)
        assert p in (True, False)
        if p != True:
            return False
    return True

For functions defined in the language (using def or λ) the usual thing is that the wrapper closure which is used to represent the function will do the evaluation of all of the parameters before going on to evaluate the body of the function.

However, there's a notation in the formal parameter list allowing this parameter evaluation to be suppressed. This means that otherwise-ordinary functions can achieve the same sort of effect as Scheme macros without any thought of “compile time” (whatever that means - I don't know, exactly, in this context). Some examples when I get round to discussing context objects and so on.