Garbage collection (Halloween edition)

Florian Weimer

The thing I worried about since I started to work on my own toy garbage collector has happened: I had my first real garbage collector bug.

My first garbage collector bug

As expected, my first garbage collector was difficult to track down, needing quite a few hours of staring at stack traces and dumps and even instruction execution traces. And it was not obvious at all that it was related to object allocation and garbage collection. All those are among the reasons why I wrote a while back that garbage collectors are scary.

There is how the story unfolded. While working on a support library for byte strings, one of the exception tests hit a curious bug: a 64-bit integer field in an expected exception was not initialized correctly. Instead of the value -1, it contained a number like -4294967264 (0xffffffff00000020 in hexadecimal). This was rather puzzling because single-stepping through the implementation of the alloc instruction showed that the correct value as written:

        // _long values.
        count = fi & 0x1f000;
        if (count) {
          count >>= 12;
          auto p = align_pointer_and_cast<uint64_t>(ptr);
          for (unsigned i = 0; i < count; ++i) {
            FETCH_SOURCE_REG();
            *p = ul3_regs[source_reg].l;
            ++p;
          }
          ptr = (char *) p;
        }

But yet at the end of the allocation, somehow the object did not have the right data. A hardware watchpoint on the memory address (activated using GDB's watch -l command) did not show any unexpected writes to the location, which made this rather mysterious. The issue persists at the -O0 C++ compiler optimization level, so it is clearly not an aliasing violation.

The answer is related to the pointer alignment step. In my virtual instruction set, the alloc instruction is a bit of an outlier in that it is variable-length: it contains a register list for the initial field values. To execute the instruction, the interpreter iterates through the list, guided by type metadata information that provides the width of each field that is stored into the newly allocated object. If a 64-bit field (a _long field) follows other, smaller fields (or comes directly after the 32-bit object header), there may be be a padding gap. This gap is not encoded in the type metadata. Instead it is handled implicitly. Hence the align_pointer_and_cast function call in the code snippet above. (The whole design of the alloc instruction, with a nested bytecode interpreter of a different kind, is a bit dubious in retrospect; it might deserve an article of its own.)

Of course, this goes wrong rather spectacularly if the start address of the newly allocated object is not itself 64-bit aligned. This is exactly what happened. As part of the byte string library, I had added an internal function that performed an uninitialized byte array allocation. (The usual byte array allocation instruction produces as a zero-initialized array.) It turned out that this new function did not round up the next allocation pointer to the next multiple of 8. Instead it left it pointing right after the end of the newly allocated array. As a result, subsequent allocations might place objects at addresses that are misaligned and not multiples of eight bytes.

Apart from the alloc instruction fiasco, this worked surprisingly well. Especially considering that in the current implementation, object pointers need to have their least-significant bit set, otherwise the garbage collector treats them as non-pointers. (This allows the collector to ignore null pointers and fixnum values at the same time.) Misaligned pointers would not survive garbage collection, which leads to the next question.

Where was GC-a-lot-mode when I needed it?

As mentioned in my previous article, it is possible to run garbage collectors in a GC-a-lot mode, where every allocation triggers garbage collection. Why did GC-a-lot mode not catch this issue sooner? In GC-a-lot mode, after a successful garbage collection, the allocation limit (for the pointer bumping allocator) is set so that only the next allocation (the one that triggered garbage collection) succeeds. The allocation after that will again invoke the garbage collector. I did not consider that the initial heap size is not zero. While the heap is quite small, it is still large enough that garbage collection is never triggered for most tests. As a result, GC-a-lot mode never kicks in.

I fixed this by special-casing heap initialization. If GC-a-lot mode is enabled from the start, the initial heap is as small as possible, containing only the relocated version of the initial heap data stored in the program image.

I also added a check at the start of each garbage collection that the next allocation pointer is a multiple of eight. This assertion cannot be compiled out and is active in regular mode, not just GC-a-lot mode. The test suite now disables tests that are incompatible with GC-a-lot-mode automatically (mostly tests that do not complete in any reasonable amount of time due to the increased execution overhead). Part of the problem was that running the entire testsuite in GC-a-lot-mode was not a routine activity, I think.

A whole bag of treats

As usual, if you think you are testing something, but you actually are not, once you start testing it, all kinds of odd issues show up.

This is already quite an impressive list for a subsystem that was believed to be relatively bug-free, at least as far as such core correctness problems are concerned.

And the story goes on

This was just the initial bug haul. The issue that triggered all this (the misaligned next allocation pointer) is quite close to the garbage collector core, but the other defects are mostly related to identifying the right stack map and processing it correctly. The bugs did not quite end there, though. During subsequent library development, two more issues showed up:

How many more bugs remain?

At least one of the important bugs was already present when I wrote the initial article about garbage collector scariness. Against my hopes, GC-a-lot-mode did not find reveal the bug at that time, mainly because the mode as implemented did not work as expected for short tests with little heap usage. This leaves an uneasy feel about what types of bugs might still be lurking in the shadows. I also encountered a couple of type safety bugs in my assembler, but that was not unexpected given the ad-hoc nature of its type checker.

At least there is a silver lining. As a part of debugging the initial slew of bugs, I finally wrote some basic debugging information support. Backtraces for failed test assertions now include procedure names, not just program counter values from the executed program image. This makes programming in this system a bit more tractable, and even fun.

Revisions


Florian Weimer
Home Blog (DE) Blog (EN) RSS Feeds Impressum