Saturday 31 December 2016

How not to build your own Lisp

Occasionally one buys a book which is a disappointment. Usually, when I buy a book which is a disappointment, I don't review it, because it isn't nice trashing other people's hard work; and that's especially true when the writer has written as engagingly and sincerely as Daniel Holden has. He's written a book I'd like to like.

But sometimes it's important to explain why a book is a disappointment, what is wrong with it, and what residual merit it still has.

Lisp is the List Processing Language. The clue is in the name. A linked list is a very simple and primitive data structure - essentially a node in a binary directed graph - from which other data structures (including executable programs, in the form of recursive functions) can be built recursively. And it is this inherent recursive nature which enables the other critically interesting point about Lisp: it implements the lambda calculus, Alonzo Church's groundbreaking mathematical formalism which made it possible to reason about the nature and limits of computation.

The problem is that the language Holden is showing you how to write, while it has some of the surface level syntactic structure of Lisp, isn't a list processing language at all. There are no lists. So there are no list cells. So there can be no primitive to construct a cons cell, nor one to take the value of its first pointer, nor one to take the value of its second pointer.

Holden does mention his lack of list cells, in a boxout on page 88; he says

'This naturally leads to an implementation using linked lists, a different data structure to the one we are using. I choose to represent S-Expressions as a variable sized array in this book for the purposes of simplicity, but it is important to be aware that the official definition, and typical implementation are both subtly different'

It's not subtly different. It's crucially different. You can indeed make something that looks like a duck out of papier mache, but it won't walk like a duck and it won't quack like a duck. It's not a duck, and this is not a Lisp.

For example, take a list '(a b c). Let's call that list p: (let ((p '(a b c))). Now take the tail of that list twice: (let ((q (cdr p))(r (cdr p))). Now suppose we test whether q and r (both, remember, being the tail of p) are the same thing:

* (let ((p '(a b c))) (let ((q (cdr p))(r (cdr p)))(eq q r)))

T

Yes, they are.

What about in Holden's language?

No, they're not.

They're both identical copies of the same thing.

We've lost the distinction between what is the same and what looks the same. We can no longer tell the difference between a duck and a papier mache copy of a duck.

Another crucial issue in the design of a Lisp is memory management, something I'm intensely interested in. Holden ignores this, simply delegating it to the C heap; but because instead of constructing homogeneous list cells all of which have the same size, he's constructing variable sized vectors, he will fragment the heap and ultimately cease to be able to allocate more memory even when there is memory available. Of course, in modern machines with very large amounts of memory it's unlikely that anything written in a toy programming language is going to get into this situation, but it's still disappointing.

All of this raises the question who the book is for. Lisp is not a popular language. It's a relatively obscure language, of interest to computer science geeks because of its simplicity and power. A book called 'Build Your Own Lisp' is likely to appeal to computer science geeks, and especially to computer science geeks who want to build their own Lisp. It's not likely to be of interest to beginner programmers, because beginner programmers wont know what Lisp is or why they should be interested in it.

So the people who will buy the book won't benefit from it, and are liable to be annoyed by it; while the people who might benefit from it are unlikely to buy it.

But they'll be missing a trick, because what this book is is a very good, clear, engaging introduction to writing a non-trivial program in C.

No comments:

Creative Commons Licence
The fool on the hill by Simon Brooke is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License