Conservative Garbage Collection for GNAT

In this memo, we describe our steps to add support for the Boehm-Demers-Weiser conservative garbage collector to GNAT, the GNU Ada compiler.

Design goals

Allocation Types

There are six different allocation types for object, controlled by two parameters.

Reachability

The first parameter is reachability type and controls how the object can reached from other objects.

Atomicity

The second parameter specifies if the object can contain references to other objects.

Mapping to the Boehm-Demers-Weiser Collector

Note that these two parameters are orthogonal. The table below shows how the allocation type maps to the allocation functions provided by the collector.

ReachabilityNon-Leaf ObjectsLeaf Object
permanentGC_malloc_uncollectableGC_malloc_atomic_uncollectable
initialGC_malloc_ignore_off_pageGC_malloc_atomic_ignore_off_page
fullGC_mallocGC_malloc_atomic

Obviously, an implementation should default to full reachability and non-leaf objects, so that the collector can work correctly in most cases.

Unchecked deallocation

The reachability type affects the behavior of instances of Ada.Unchecked_Deallocation, as indicated in the table below (the atomicity has no such effect).

ReachabilityFinalizationReclamationSafety
permanentimmediateimmediatenone
initialimmediatedelayed (via collector)partial
fullimmediatedelayed (via collector)partial

In this context, finalization refers to the steps executed when the user invokes an instance of Ada.Unchecked_Deallocation for the object.

Permanent reachability is not safe against premature or repeated explicit deallocation. In the first case, dangling references can be created, in the second case, heap corruption might occur. Incorrect memory management of such objects still results in erroneous execution. In contrast, initial and full reachability can never result in such dangling references, because the actual storage reclaimation is left to the collector. However, if finalization of an object is not idempotent, erroneous execution can still occur.

If full reachability is the default reachability type, Ada.Unchecked_Deallocation will only perform finalization for most objects, and defer actual storage reclamation to the next collector run.

Allocation type inference

Note that in most cases, an Ada compiler such as GNAT can infer the required allocation type automatically, by recursively analyzing the involved data structures. However, in some cases, a mechanism for overriding the compiler choice might be helpful. Pragma Controlled can turn off automatic reclamation of objects, but other mechanisms are not yet supported.

Required Changes to GNAT

New allocator functions

We add the following six global access-to-subprogram variables of type access function (Size : Interfaces.C.size_t) returns System.Address:

The variable names encode the allocation type (reachability type and atomicity). These are lowest-level allocation primitives without abort deferal. Client code should call one of the following functions, which defer aborts, check for null sizes, and so on. If garbage collection is turned off, all access variables refer to wrappers of the C standard library routine malloc.

Conceptually, all of them are replacements for __gnat_malloc, and map to the corresponding routines in the Boehm-Demers-Weiser collector.

The compiler determines the reachability type of an object as indicated above, and generates the fitting call, replacing the former calls to __gnat_malloc. Invocations of malloc and __gnat_malloc in the run-time library code have been adjusted, too.

Dealloaction

Two access-to-subprogram variables of type access procedure (Address : System.Address) are provided:

Again, client code should invoke one of these functions instead:

The first routine is used to deallocate objects which have been allocated for permanent reachability (it is GC_free in the Boehm-Demers-Weiser collector case). The second routine explicitly deallocates storage for initial and full reachability. In garbage-collection mode, this routine does nothing, otherwise it iss equivalent to the free routine of the C run-time library.

User-defined finalization

Like in every other programming language, user-defined finalization poses a complicated problem for garbage collection in Ada. The Ada language requires that finalization occurs for all objects before the program terminates, even those which are explicitly deallocated. Therefore, a list is used to keep track of allocated objects, all allocated objects are reachable until they are explicitly deallocated, and the collector cannot perform its duty.

The Boehm-Demers-Weiser collector can invoke finalizers for unreachable objects, but this mechanism is semantically very different from Ada finalization. As a result, the Ada approach (which prevents finalization in the GNAT implementation) has to be used by default. Objects for which pragma Finalize_Storage_Only has been specified do not require such bookkeeping and can be properly collected. A mechanism should be offered to register user-defined finalization procedures for such objects which are run when the collector is about to deallocate the object, but without the stringent Ada guarantees (that is, the finalization procedures are not necessarily executed before termination of the main program).

This change is optional, and its implementation has been postponed. Therefore, objects with controlled components can never be leaf objects, even if pragma Finalize_Storage_Only is specified for them.

Interaction with tasking and POSIX threads

The Boehm-Demers-Weiser collector must be notified of thread creation. Therefore, some POSIX thread library calls have to be redirected. The following routines are affected:

Run-time library calls to these routines are replaced with indirect calls to:

Run-time overhead

Even if the garbage collector is not active and traditional memory management is used, the changes introduce some additional overhead.

This overhead is not relevant in practice because these operations are already rather expensive, so that an additional indirect call or a comparison does not matter.

Open questions

A few questions need further investigation.

Patches

A patch to the GCC source tree is provided below.

Note: This patch is preliminary and not fully tested. After applying it, GCC might not even build.

Revisions


Florian Weimer
Home Blog (DE) Blog (EN) Impressum RSS Feeds