The Fool on the Hill: Post scarcity: Memory, threads and communication

One benefit of getting really annoyed with Daniel Holden's book on how to Build Your Own Lisp is that I have finally started work on building software for my decade-old Post Scarcity Software idea. There are some problems you don't really see until you start to build something.

Almost all previous Lisps have been primarily single threaded; I think all previous Lisps have been single user. Some problems occur with a multi-threaded, multi-user system which don't occur (or at least aren't problematic) on a single-threaded, single-user system.

In this note I use the words 'thread' and 'process' interchangeably; for the purposes of this argument, there is no difference between a thread and a process.

In a multi-threaded system, whenever one writes to an object, one needs to get an exclusive lock on it, in order to prevent any other thread trying to write to it at the same time. In theory, immutable data gets round this problem, since once a piece of memory is allocated it can't be written to. The problem with that is that memory management requires to keep track of the use of objects, and in typical schemes that involves writing all over the place. Also, memory allocation in a common pool must normally require locks.

Post Scarcity Hardware and the locking problem

The Post Scarcity Hardware proposal partly gets around this problem. Main memory is curated by one process running on one processor which does nothing else but curate main memory. It runs no user space programs, but instead only listens for requests for memory objects from 'normal' processors and streams the requested data over the data bus.

Each 'normal' processor has its own local data pool which both caches data requested from main memory and stores work-in-progress not yet returned to main memory. Each 'normal' processor runs only a single thread of user-space program (it runs at least one background process managing bus access). But because there are at least hundreds and possibly millions of 'normal' processors the number of concurrent threads can be very large.

However, we don't yet have special purpose Post Scarcity Hardware, so for now Post Scarcity Software has to run on systems with low numbers - a few tens - of processors, running in a common memory pool.

Commodity hardware, and reference counting

I'm still experimenting with reference counting garbage collection. Reference counting garbage collection requires that every time one makes a link to a cell, the reference count on that cell must be updated, so the cell must be locked by one process in order that that update can be carried out. Having a lock on each cell makes every cell bigger, and consequently is expensive of memory. So reference counting makes the problem of multi-threading more difficult than would mark-and-sweep. Does this mean I should abandon reference counting? I don't think so.

The Page Per Thread model

The nature of immutable data is that a thread can 'see' - hold pointers to - only data written by its parent, grandparent or older ancestor threads. It also can't see new data written by its ancestor threads after it was spawned. This is because an older cell, being immutable, cannot hold a pointer to a newer cell. If new data is created by other, concurrent threads, then 'our' thread cannot except by extraordinary mechanisms see that data, and so it can't make links to that data.

A new cell can of course be created with a link to an older cell, but only an older cell which is visible to its process; which is to say, in the normal run of things, only a cell written by its own process or by an ancestor process. But in typical running, cells most frequently link to cells of similar age - to cells only a little bit older, and so, typically, in the same process.

This suggests an optimisation. We're organising cells into pages anyway, for a number of pragmatic reasons. If, instead of a lock-per-cell model, if we have a lock-per-page model we not only save a lot of memory; we also save a lot of contention. Threads can be assigned a page 'of its own' to work with, and, when that page is exhausted, can be assigned further pages. A separate free list is maintained for each page.

This means no thread will contend with other threads when writing new data into memory. Contention with other threads may occur when updating reference counts to older data either written by a still running parent process or by a common ancestor of another still-running thread. So there still needs to be a mechanism for locking pages, and a process must still verify that it has a lock before it can update a reference count, much less write new data. But since it will mostly be working in its own page, waiting for locks will be relatively infrequent.

Note that the 'page per thread' model works equally for mark-and-sweep as for reference counting. Indeed for mark-and-sweep it has an additional benefit - only one thread has to be halted while mark-and-sweep is done.

Communicating in an immutable environment

The problem

If we stick rigidly to the principle that data is immutable, it is impossible to set up a communication channel between two processes, and it is impossible for a process to access, for example, a newer version of some function written since the process was started. Indeed, strictly immutable data means a process cannot see any data written by its grandparent since its parent was spawned, by its great grandparent since its grandparent was spawned, and so on back to the root process. So we cannot create a communication channel between two processes, since neither can see what the other is writing.

If namespaces are implemented as association lists, then namespaces themselves will form trees mapping exactly onto the process tree. A name in a namespace, as visible to a process, will refer to the value it had when its parent was spawned, or to a value written by its parent before it was itself spawned, and so on back, recursively.

This is deeply undesirable. It means that if the root of the process tree - the ultimate ancestor process, equivalent to the init process on UNIX - is kept alive, and a critical software update is deployed in that ancestor, nevertheless that critical software update will not be available to any other process. Windows, of course, needs a reboot after any system update, but that is exactly the sort of crap that post-scarcity software is supposed to be making obsolete.

As I'm envisaging systems which stay up for very long periods - many years - a reboot to deploy new versions of system functions is not acceptable.

But the problem is worse than that. If Anne and Bill are working concurrently on a new bit of software, Bill cannot see the changes Anne has made and vice versa. If they aren't working in the same building, they can't even open a chat channel to one another.

Mutable namespaces

What I think this means is that some (not necessarily all; indeed, very probably not all) namespaces must be mutable. Of course this means that a namespace must have not only a reader access list but also a writer access list. Anything that is to be communicated between concurrent processes, or whose most recent value must be visible to other processes, must be named in a mutable namespace and, if requested by name, the most recently written value must be returned.

In this way system software updates are immediately available to all processes, and people can work collaboratively on the same things. It also provides a very crude mechanism for communication between processes, but I think as a mechanism for communication this is too crude and I shall propose another in a moment.

Automatic revision control in namespaces

If it's possible to rebind a name in a namespace to point to something new, what happens to its previous value? Is it lost and garbage collected, or is it retained as a 'previous version'? Both possibilities are desirable, but if we're serious about 'post scarcity' then economy of store is not a significant issue. I'd prefer automatic revision control to be the default behaviour, so that all previous values were retained and could be explored; so we can ask not only for the current binding of a name but also for a history of its previous bindings and for any particular binding from that history.

Inter-process communication

I already have the concept of read streams and write streams. A pipe can be implemented as a pair of a write stream and a read stream; if there's a shared mutable namespace visible to two processes, possibly owned by different users, then it's possible for one process to advertise a request to communicate to another, and for the other to accept that request.

Obviously if there's a pipe it's possible to print a datastructure to the write end, and for the read end to create a copy; but I believe that it would be desirable also to be able to pass pointers over a pipe, so that instead of Bill being able to make his own local copy of Anne's datastructure he could just link to it directly.

Summary

Multiple processes means that there's potential contention when writing to memory; a fairly simple segmentation of writable memory should make it much less onerous. Immutable data makes interprocess communication hard; some relaxation of immutability is needed to make it possible.