In this memo, we describe our steps to add support for the Boehm-Demers-Weiser conservative garbage collector to GNAT, the GNU Ada compiler.
GNAT should not require the programmer to specify additional information for objects (with exception of pragma Controlled
). Most information should be inferred from type information.
GNAT should always generate code which is ready for garbage collection, by specifying the proper allocation types (see below).
However, the additional run-time overhead should be minimal.
There are six different allocation types for object, controlled by two parameters.
The first parameter is reachability type and controls how the object can reached from other objects.
Permanent reachability means that the object will not be collected, and explicit deallocation is required to reclaim the space occupied by the object. (In other words, it is assumed that the object in question is always reachable.) An allocator for an access type for which pragma Controlled
has been specified should create objects of this reachability type.
Initial reachability indicates that during the lifetime of the object, it is reachable through a reference to data at its beginning (the first 256 bytes), or that no references to the interior of the object are permitted. If the object has direct or indirect components which are aliased, an allocator can use this reachability type for it.
Full reachability means that references to the interior of the object are permitted. This reachability type has to be used for objects with aliased components.
The second parameter specifies if the object can contain references to other objects.
Atomic objects cannot contain references to other objects. Objects in this category are characters, integers, and arraays and records of these types. To avoid confusion with Ada's concept of atomicity, we also call these objects leaf objects.
Non-atomic or non-leaf objects can contain references to other objects. Note that the Ada implementation may automatically generate such references in complex objects, even if they are invisible to the programmer.
Note that these two parameters are orthogonal. The table below shows how the allocation type maps to the allocation functions provided by the collector.
Reachability | Non-Leaf Objects | Leaf Object |
---|---|---|
permanent | GC_malloc_uncollectable | GC_malloc_atomic_uncollectable |
initial | GC_malloc_ignore_off_page | GC_malloc_atomic_ignore_off_page |
full | GC_malloc | GC_malloc_atomic |
Obviously, an implementation should default to full reachability and non-leaf objects, so that the collector can work correctly in most cases.
The reachability type affects the behavior of instances of Ada.Unchecked_Deallocation
, as indicated in the table below (the atomicity has no such effect).
Reachability | Finalization | Reclamation | Safety |
---|---|---|---|
permanent | immediate | immediate | none |
initial | immediate | delayed (via collector) | partial |
full | immediate | delayed (via collector) | partial |
In this context, finalization refers to the steps executed when the user invokes an instance of Ada.Unchecked_Deallocation
for the object.
Permanent reachability is not safe against premature or repeated explicit deallocation. In the first case, dangling references can be created, in the second case, heap corruption might occur. Incorrect memory management of such objects still results in erroneous execution. In contrast, initial and full reachability can never result in such dangling references, because the actual storage reclaimation is left to the collector. However, if finalization of an object is not idempotent, erroneous execution can still occur.
If full reachability is the default reachability type, Ada.Unchecked_Deallocation
will only perform finalization for most objects, and defer actual storage reclamation to the next collector run.
Note that in most cases, an Ada compiler such as GNAT can infer the required allocation type automatically, by recursively analyzing the involved data structures. However, in some cases, a mechanism for overriding the compiler choice might be helpful. Pragma Controlled
can turn off automatic reclamation of objects, but other mechanisms are not yet supported.
We add the following six global access-to-subprogram variables of type access function (Size : Interfaces.C.size_t) returns System.Address
:
__gnat_config_alloc_full
__gnat_config_alloc_initial
__gnat_config_alloc_permanent
__gnat_config_alloc_full_leaf
__gnat_config_alloc_initial_leaf
__gnat_config_alloc_permanent_leaf
The variable names encode the allocation type (reachability type and atomicity). These are lowest-level allocation primitives without abort deferal. Client code should call one of the following functions, which defer aborts, check for null sizes, and so on. If garbage collection is turned off, all access variables refer to wrappers of the C standard library routine malloc
.
void *__gnat_alloc_full(size_t size);
void *__gnat_alloc_initial(size_t size);
void *__gnat_alloc_permanent(size_t size);
void *__gnat_alloc_full_leaf(size_t size);
void *__gnat_alloc_initial_leaf(size_t size);
void *__gnat_alloc_permanent_leaf(size_t size);
Conceptually, all of them are replacements for __gnat_malloc
, and map to the corresponding routines in the Boehm-Demers-Weiser collector.
The compiler determines the reachability type of an object as indicated above, and generates the fitting call, replacing the former calls to __gnat_malloc
. Invocations of malloc
and __gnat_malloc
in the run-time library code have been adjusted, too.
Two access-to-subprogram variables of type access procedure (Address : System.Address)
are provided:
__gnat_config_dealloc_immediate
__gnat_config_dealloc_advisory
Again, client code should invoke one of these functions instead:
void __gnat_dealloc_immediate(void *address);
void __gnat_dealloc_advisory(void *address);
The first routine is used to deallocate objects which have been allocated for permanent reachability (it is GC_free
in the Boehm-Demers-Weiser collector case). The second routine explicitly deallocates storage for initial and full reachability. In garbage-collection mode, this routine does nothing, otherwise it iss equivalent to the free
routine of the C run-time library.
Like in every other programming language, user-defined finalization poses a complicated problem for garbage collection in Ada. The Ada language requires that finalization occurs for all objects before the program terminates, even those which are explicitly deallocated. Therefore, a list is used to keep track of allocated objects, all allocated objects are reachable until they are explicitly deallocated, and the collector cannot perform its duty.
The Boehm-Demers-Weiser collector can invoke finalizers for unreachable objects, but this mechanism is semantically very different from Ada finalization. As a result, the Ada approach (which prevents finalization in the GNAT implementation) has to be used by default. Objects for which pragma Finalize_Storage_Only
has been specified do not require such bookkeeping and can be properly collected. A mechanism should be offered to register user-defined finalization procedures for such objects which are run when the collector is about to deallocate the object, but without the stringent Ada guarantees (that is, the finalization procedures are not necessarily executed before termination of the main program).
This change is optional, and its implementation has been postponed. Therefore, objects with controlled components can never be leaf objects, even if pragma Finalize_Storage_Only
is specified for them.
The Boehm-Demers-Weiser collector must be notified of thread creation. Therefore, some POSIX thread library calls have to be redirected. The following routines are affected:
pthread_create
pthread_sigmask
pthread_join
pthread_detach
Run-time library calls to these routines are replaced with indirect calls to:
__gnat_pthread_create
__gnat_pthread_sigmask
__gnat_pthread_join
__gnat_pthread_detach
Even if the garbage collector is not active and traditional memory management is used, the changes introduce some additional overhead.
Evaluation of an allocator is slightly more expensive because it now involves one more direct and one indirect call. So is deallocation.
Improved support for user-defined finalization requires moving a compile-time check to run time.
Calls to some POSIX thread routines are indirect instead of direct.
This overhead is not relevant in practice because these operations are already rather expensive, so that an additional indirect call or a comparison does not matter.
A few questions need further investigation.
Task objects are very complex. Are they handled correctly by the collector?
Can we mark access-to-protected-subprogram types as containing no pointers? (We do this for access-to-subprogram types because subprogram code is not a collectible resource.)
Are controlled types (or records with controlled components) properly flagged as non-atomic?
How does incremental or parallel collection interact with Ada tasking?
Does it make sense to issue warnings for packed arrays and records, and records with representation clauses if the non-standard layout hides pointers for the garbage collector?
How do we support programmers in the following situation? An object stores a reference to, say, a file descriptor. When the object is deallocated, the file descriptor should be closed. With the current form of finalization support, the programmer has to write two different objects, one for explicit memory management (which closes the descriptor in the Finalize
procedure), and one which registers finalization handler with the collector. (However, in most cases, objects with references to external resources have to be deallocated explicitly because releasing the external resources has an externally visible effect. Even in the described case, closing the descriptor might release a lock or terminate a network connection, and a conservative collector might never perform this operation.)
A patch to the GCC source tree is provided below.
Note: This patch is preliminary and not fully tested. After applying it, GCC might not even build.
2003-06-29: published
2003-09-05: Preliminary patch for GNAT is now available.