The Fool on the Hill: The immutable pool: more on optimising memory management for functional languages

Further to yesterday's note on optimising the Java Runtime Environment's memory allocator for the code generated by functional language compilers, I've been reading up on the memory allocator in the OpenJDK Java platform implementation.

First, a note about nomenclature. To my mind the 'Java Virtual Machine' is simply a processor which processes instruction codes - as it were, something in the same category as an ARM 6 or an Intel 80486, except implemented in software. To me it's clear that memory management is not 'part of' that device, it's a low level library expected to be available as part of the runtime environment. However, the writers of the HotSpot VM documentation don't see it that way. To them, the memory allocator is part of the virtual machine, not part of the runtime environment, and as I'm playing with their ball I shall try in what follows to stick to their preferred nomenclature.

The HotSpot memory allocator operates a per-thread generational garbage collector, which depends on

... the second part of the weak generational hypothesis: that there are few references from old objects to young objects.

The generational garbage collector is in fact much more complex precisely because there are some references from an older object to younger objects. The collector uses two pools of memory, a 'young generation' pool and an 'old generation' (or 'tenured') pool (actually it's considerably more subtle than that, but that's enough detail for the present discussion). Churn and froth are expected in the young generation pool, so it is garbage collected regularly. An object which survives more than a short while in the young generation pool is promoted or scavenged into the old generation pool by copying it; and when this is done, it is necessary to scan through the whole old generation pool (as well as the young generation pool) to fix up any pointers which pointed to its old location in the young generation pool so that they now point to its new location in the old generation pool. That scan is inevitably costly: it must visit every single pointer, and compare its value to the value to be changed.

So far so good. But, how come there are any pointers in the old generation pool which point to newly created objects in the young generation pool? There are, because objects in Java are mutable. We can do the equivalent of RPLACA and RPLACD operations - we can destructively overwrite pointers - and in fact the Java imperative object oriented paradigm encourages us to do so.

You can write classes of immutable objects in Java; if you declared each instance variable of a class (and each instance variable of every class from which it inherits) to be final, then every object of that class would be immutable. I've never seen it done. But in pure functional languages, all data items are immutable. You cannot overwrite pointers.

Of course, for many purposes holding state information is pretty vital, or at least it's hard for us, educated as we have been in imperative languages, not to see it as such. So Clojure, for example, while holding to the mantra that data is immutable, has a special reserved area of Software Transactional Memory in which mutable data may be stored, and also handles things like I/O by invoking Java classes expected to be present in the environment which do use mutable data. Nevertheless, programs compiled with the Clojure compiler can be expected to generate a very high proportion of objects with are immutable.

So the question arises, at what density of immutable objects does the idea of having an additional pool for 'old generation immutable' objects become a win? Remember that an older, immutable object cannot hold a pointer to a younger object (whether mutable or not), because the younger object did not exist when the older object was created. So immutable 'old generation' objects do not need to be scanned and 'fixed up' when a surviving 'young generation' object is scavenged into the old generation pool. Given that the scan operation is expensive, it does not seem to me that the density of immutable objects would need to be very high before this was a win.

For clarity, the algorithm for the revised 'scavenger' part of the garbage collector could be as follows:

for each object in the 'young generation' pool
if it has survived long enough to be worthy of scavenging (no change to that part of the algorithm) then
if mutable flag set, then
copy into the 'old generation mutable' pool (i.e. no change to what happens now).
else
copy into the 'old generation immutable' pool
end if
scan remaining objects in the 'young generation' pool and fix up pointers (since objects in the young generation pool, even if immutable, may reference the object just promoted - again, no change to what happens now)
scan all objects in the 'old generation mutable' pool and fix up pointers.
end if
end loop

There's no need to scan anything in the 'old generation immutable' pool, because it cannot hold pointers to the promoted object. For the same reason there's no need to scan the 'old generation immutable' pool during the mark phase of a generational mark-and-sweep operation, which necessarily happens before each group of salvage operations.

So the tradeoff is one 'check flag and branch' operation for each object promoted, against scanning the whole old generation space including potentially many immutable objects. I'd guess that by 10% immutable objects you'd have a performance win, and that for higher immutable object densities you could have an even more important win.

The fly in the ointment is that you need a flag bit on the header of every object. The OpenJDK object header currently contains a (by implication 32 bit) 'mark word', which is used precisely for flags used by the memory allocation system, but I haven't yet found where those flags are documented or whether any are free to be used.

Finally, this optimisation - while both potentially bigger and easier to implement than the cons-space optimisation I suggested yesterday - is independent of it. Both optimisations together potentially offer a still bigger win.

The Fool on the Hill

Saturday, 24 August 2013

The immutable pool: more on optimising memory management for functional languages

No comments: