Garbage collectors are scary

Florian Weimer

Garbage collectors are expected to be invisible: if programmers (or even users) notice them, something is wrong. Usually, such problems are are performance problems, but there could be correctness issues as well. This makes implementing them a bit scary.

As a hobby, I am working on a virtual instruction set architecture, mainly to better to understand the nature of computing, and eventually, trade-offs in programming language design. The project stalled for several months because I was reluctant to start working on a garbage collector, which I consider necessary to maintain memory safety. I was a bit puzzled by this, but now I think it is because garbage collectors are scary to work with. Not only are they intended to be largely invisible, but they need to compute a highly non-local property, too: Whether an object is live and cannot yet be deallocated depends on the state of the pointers in all the other object on the heap. Production-grade garbage collectors are very complex, too. So it all seemed very daunting.

This article attempts to document what I did to get unstuck. Most of the ideas below focus on making the garbage collector operation more obvious.

What can be done?

The choice of collector matters. A copying semispace collector, such as Cheney's algorithm looks like a great starting point. (See C. J. Cheney, A nonrecursive list compacting algorithm <https://dl.acm.org/doi/10.1145/362790.362798> (1970), or its Wikipedia page, Cheney's algorithm <https://en.wikipedia.org/wiki/Cheney%27s_algorithm> , retrieved 2024-03-07.) Such a collector can be very small and simple: Not counting the code for identifying root object pointers and for scanning objects and stacks, a functional implementation of tge collector is just a few dozen lines of code (even fewer if there is already code for duplicating objects). The collector copies all objects from a fromspace to a tospace part of the heap (hence the name “semispace collector”). This has the important side effect that all object addresses are guaranteed to change during a garbage collection cycle. Consequently, any use of an object pointer that was missed and not processed during collection results in very obvious issues (typically a crash, but bogus data could be the result as well). With a less memory-hungry collector that tries to work in place, perhaps one that never moves any live objects at all, unprocessed pointers may go unnoticed until a more complex test case comes along. With a copying collector, it is pretty much a necessity to get the root pointer, stack, and object scanning code right from the start.

With such a semispace collector, the address range of the previous heap (what used to be the fromspace) can be kept reserved at the kernel level and mapped with the PROT_NONE protection flag, so that accidental deferencing through an outdated pointer that has not been updated in the previous garbage collection cycle is very likely to fault. This works even if pointers are stored externally (probably by accident), in places that are not expected to be scanned by the garbage collector.

When interfaces with separately developed C code, the untracked pointer problem can be mitigated by careful interface design. The C interface contained in the reference implementation of the Lua programming language uses a virtual stack on which object pointers are stored (Section 4.1, The Stack, <https://lua.org/manual/5.4/manual.html#4.1> in Roberto Ierusalimschy et al.: Lua 5.4 Reference Manual (2023)). The object pointers are never exposed. C programmers use stack slot indices to manipulate them instead. Contrast this with the CPython approach, where the actual object pointers are exposed to C extension modules. With a copying collector, restricting access to object pointers to a small part of the virtual machine only seems reasonable, too.

My virtual ISA has unboxed (not heap-allocated) fixed-size integers and floating point values, both as local variables in procedures, and as fields in heap-allocated objects. The assembler performs type checking, so it already knows whether a register contains an object pointer at the start of each instruction. (An instruction may only use a register as an input if all execution paths to it have a sufficiently compatible type, but the register itself does not have a single fixed type during the execution of the procedure.) At run time, the virtual machine interpreter does not need to perform any type checking because apart from register-to-register moves, separate instructions are used for scalar and pointer values. I already had a working stack unwinder (for exception handling), but it was not clear whether the encoding of the stack frame information at relevant instructions (that may trigger garbage collection) in the form of stack maps was working. Although there is no debugger interface yet, I added a very basic stack dumping implementation, which shows the contents of the stack and, for object pointers identified with the help of stack maps, the object header from the heap. The test suite can then use simple regular expressions to verify that the dump has pointers at the expected places for a set of representative test cases.

For the pointer maps for heap-allocated objects, it is possible to do better: compare the pointer values from explicit object access with the results of the object scanning procedure that uses the same way for identifying pointers as the garbage collector. (I'm explicitly not aiming for general run-time reflection support, so this is very much an internal testing interface only.) This approach based on redundant data (comparing the garbage collector results against the pointers known to be there) makes it straightforward to do end-to-end testing for the object scanning code. Similar introspective testing is possible for stack maps if there is a way to cast object pointers to integers, losing the pointer property. (Against this is just for internal testing purposes only because exposing the address of objects in this way encourages direct manipulation of objects, perhaps from separately written C code.) The garbage collector will not update these integers anymore, but the address object of the object changes. The test case knows which registers contain pointers and make sure that the collector updated those registers (assuming that we have a copying collector).

Continuing the theme of using redundant data for consistency checking, it is possible to keep the heap parseable at all times, or at least at safepoints: Starting at the low address of the heap or a segment, it is possible to find all previously allocated objects (whether live or not). For each object, the address of the next object can be determined based on header information found in the current object. This means that object headers must exist, but they are useful in other contexts as well (such as for implementing dynamic dispatch, or checked downcasts from base class pointers to derived class pointers). Heap traversal can be used to verify at certain points (for example, right after garbage collection) that the heap is consistent, without relying on a fresh traversal from the roots. A first phase can build a bitmap containing set bits for all the start addresses of objects in the heap, and a second phase of the verification procedure can then check that every identified pointer is either null, or points to the start of a previously identified heap object. Currently, this verification procedure is very similar to the Cheney-style copying collector (minus the object copying parts), but it can be used unmodified with any other collector that maintains a parseable heap, even if only artificially (by inserting dummy objects to bridge heap gaps).

One more thing allowed me to get better coverage with the limited tests I have so far: triggering a garbage collection at every allocation, or at least much more frequently than usual. (I think I saw this in Hotspot first, which has various GCALot flags in debugging builds.) With such an execution mode, the tests exercise the garbage collection code much more heavily.

Was it worthwhile?

So far, the collector seems to be working well, which should not be surprising given that it is so simple. The collector-specific issues I encountered were associated with the heuristics for heap sizing, and determining the heap limit at which the next garbage collection starts. The address calculations for that (and which memory regions to unmap, given that they will not be needed) turned out to be quite tricky. There were some early integration issues with stack maps because the assembler and virtual machine implementation did not quite agree upon their encoding, but that was already visible in the stack debugging dumps I added for testing.

Note: There is a follow-up article

Revisions

2024-03-07: published
2024-10-20: follow-up article published

Florian Weimer
Home Blog (DE) Blog (EN) RSS Feeds Impressum