The Fool on the Hill: Reference counting, and the garbage collection of equal sized objects

Yes, I'm still banging on about ideas provoked by the Wing, White and Singer paper, as presented at Mostly Functional. Brief summary: on Friday I posted an essay on the possible use of cons-space as an optimisation for the JVM memory allocator, for use specifically with functional languages (I'm thinking mainly of Clojure, but it has implications for other things such as Armed Bear Common Lisp, and possibly also Scala). Yesterday, I wrote about maintaining a separate heap for immutable objects, as another optimisation. Today, I'm going to write about reference counting, and where it fits in the mix.

The HotSpot JVM uses a tunable generational garbage collector, as detailed here. Generational garbage collectors are a development of the classic mark-and-sweep garbage collector that was used in early LISP implementations from about 1962.

Mark and Sweep

A mark and sweep garbage collector is triggered when allocatable memory is almost exhausted. When it is triggered, execution of the user program is halted. The garbage collector then

Mark phase:

iterate over every object in the heap, clearing the 'mark bit' in the header of each;
Set the mark bit in the 'root' object;
repeatedly do
for each marked object in the heap,
for each pointer in the object,
set the mark bit in the header of the pointed-to object;
end for each
end for each
until no further objects are marked

Sweep phase:

iterate over objects in the heap again, as follows
for each object,
if there is 'free space' (i.e., objects which were not marked as pointed to in the preceding step) 'below' the object in the heap, then
copy the object as low in free space as is possible
iterate through every object in the whole heap fixing up every pointer which pointed to the object in its old location, to point to its new location.
end if
end for each

Finally, the user program is restarted. Needless to say, all this is expensive in time, and leads to noticable pauses in the running program.

Generational

The 'generational' garbage collector optimises this by observing that in most programs, the majority of objects are short lived, and that therefore it is younger objects which should most aggressively be garbage collected. The heap is divided into (at least two) segments, an 'old generation' segment and a 'young generation' segment. A generation counter is added to the header of each object, initialised to zero when the object is instantiated.

Then, when allocatable memory is almost exhausted, normally only the 'young generation' segment is marked and swept. Each time an object survives garbage collection in the 'young generation' segment, its generation counter is incremented, and when it hits the generation-max value, it is copied from the 'young generation' segment into the 'old generation' segment. However, obviously, when any object is moved, either by the sweep operation in the 'young generation' space or by promotion into the 'old generation' space, the entire heap needs to be checked for pointers which need to be updated.

Finally, if, when doing a promotion, it is found that 'old generation' space is almost exhausted, a full mark and sweep operation is performed on old generation space.

Although this sounds (and is) considerably more complicated than the naive mark and sweep algorithm, the 'new generation' garbage collection operations tend to be relatively quick, leading to less frequent noticeable pauses.

Mark and sweep interacts horribly with paged virtual memory systems but it has to be said that generational isn't a whole lot better here, since there will still be a need repeatedly to access every page on which any part of the heap, no matter how rarely used, is stored.

An aside: the look-aside table

One possible means of avoiding the need to iterate over every pointer in the heap each time an object is moved is to have an indirection table, or look-aside table. This is simply an array of pointers, one pointer for every possible object in the heap. Now 'pointers' within user-program objects are simply indices into the location in the indirection table where the actual pointer is stored. Now, when an object is moved during GC, only the indirection table needs to be updated.

That sounds very appealing and efficient; it clearly saves a very great deal of time. Unfortunately it's inefficient and clumsy in its use of memory. The indirection table once allocated cannot easily be extended, and once entries in the table are exhausted, no more objects can be created even if there it still plenty of free memory left in the heap. Finally, every single object access requires an additional memory lookup. All these things together mean that the look-aside table isn't as much of a win as it seems, and is not often used.

Reference counting

In a reference counting garbage collector, every object header contains a reference counter, initially set to one (since when the object is created, something must point to it). Whenever a new pointer is created to an object, the reference counter on the object is incremented. Whenever something that points to the object is removed from the system, the reference counter on the object is decremented. when the reference counter is decremented to zero, the object is removed from the system, and the reference counters of any objects it pointed to are in their turn decremented.

Let's walk a little over what that means. Typically, reference counting is used in systems which have uniform-sized objects. At system initialisation time, memory is essentially an array of 'empty' objects. Each of these objects is initialised with a header which marks it as being an empty object, with a reference count value of zero. A register or global variable, the 'free list pointer', points to the first of these objects; each object in turn points to the next object.

In use, when a memory object is allocated, it is popped off the front of the free list; when a memory object is deallocated, it is pushed back on the front of the free list. Because all the objects are equal sized, any object can be initialised in the space left by any other, so there's never any need to move things. And if memory allocated to the user program becomes exhausted, provided the operating system can allocate more memory it can be initialised and added to the end of the free list at any point - it does not have to be contiguous with the existing memory allocation.

So, lots and lots of win. There's never any significant pause for garbage collection. Nothing ever has to be moved, so there's no problem with fixing up pointers. Why doesn't every system do it this way?

Because, sadly, in a normal useful user program there's a need for unequal sized objects. Yes, strings can be represented as lists of characters; raster images can represented as lists of lists of bits. But this is hopelessly inefficient. So it's much better to maintain a heap for variable sized data. Of course, the pages of equal sized objects - 'cons space' - can themselves float in the heap, so there's no ideological problem with having variable sized data in a reference counter system.

Typically, for each 'heap space object', you'll have a proxy pointer object in cons space. Other things which reference the heap space object will actually hold a pointer to its proxy, and the proxy will have the usual header fields of a cons-space object including the reference counter. The type field in its header will indicate that it is a proxy for the heap space object, and a pointer in its body will point to the actual heap space object.

Problems

There are still a few problems with this solution, most of which affect long-running programs. The first is, no matter how many bits you allocate to the reference counter, there is a maximum value it can store. What happens when an object has more references to it than its reference counter can store? Well, obviously, it can't be incremented further, because it would wrap around and you'd end with a mess. But, equally, it can't be decremented - because you don't know how many times to decrement. So once an object has reached the maximum reference value, it can never be removed from the system by the normal operation of reference counting memory manager.

More subtly, circular data structures can never be removed from the system even if nothing outside the circle any longer references it, since each element holds a pointer to the next and none can ever be decremented to zero. This, however, can't happen in a pure functional language with immutable data objects, since circular data structures are then impossible to create.

Finally, while the user program will not have to be paused for mark-and-sweep, occasionally the deletion of an object which serves as the root of a deep tree of objects will cause a cascade of further deletions, which may also cause a noticeable pause.

Heap space, of course, will fragment, as heap space of any variable-size-object system always does. But heap space can be compacted using a very infrequent mark-and-sweep, and, in any case, this problem isn't special to reference counting systems.

Conclusion

In summary, especially for systems with very large numbers of equal-sized objects such as is typical of programs written in LISP-like languages, reference counting garbage collectors have always struck me as having many benefits. Adding reference counting to any Java Virtual Machine would be decidedly non-trivial, however, and, in particular, using proxy objects in cons-space to point to heap space objects might (I don't yet know) break compatibility with existing compiled Java programs. Also, while a reference counting system may have fewer noticeable pauses, its overall efficiency is not necessarily better than a generational system. It's more 'worth a try' than 'a certain win'.