Performance issues with Java finalizers

It is surprisingly easy to outpace the Java garbage collector if lots of objects with finalizers are created. The result is spurious out-of-memory errors while the heap is full of objects which could theoretically be reclaimed.

The issue described in this note occurred in a very thin wrapper around SQLite <http://www.sqlite.org/>, a small process-embedded database written in the C language. Like most SQL database wrappers, it has two basic classes: databases and prepared SQL statements. In both cases, the Java wrappers refer to native resources managed through JNI, but this detail is not really relevant here. Both wrappers override the finalize() method. However, application code should not rely on finalization. Instead, it should call the public close() methods, using a try-finally block. Finalization is just used as a fallback to handle obscure failure scenarios which slip through regular exception handling.
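The intended usage pattern can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: SketchStatement and CloseSketch are hypothetical names, and a boolean flag stands in for the native handle.

```java
// Hypothetical stand-in for the statement wrapper. The real class would
// release a native SQLite handle through JNI instead of setting a flag.
class SketchStatement {
    private boolean closed = false;

    // Explicit release; calling it more than once is harmless.
    public synchronized void close() {
        closed = true;
    }

    public synchronized boolean isClosed() {
        return closed;
    }
}

class CloseSketch {
    // Runs the work and closes the statement on every exit path,
    // including the one where the work throws.
    static void useAndClose(SketchStatement st, Runnable work) {
        try {
            work.run();
        } finally {
            st.close();
        }
    }

    public static void main(String[] args) {
        SketchStatement st = new SketchStatement();
        useAndClose(st, () -> System.out.println("working"));
        System.out.println("closed: " + st.isClosed());
    }
}
```

On Java 7 and later, a wrapper which implements AutoCloseable can be used with a try-with-resources statement to the same effect.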

Unfortunately, an innocuous loop like the following one (which creates a new SQL statement, just to immediately discard it again) executes rather slowly and eventually results in an out-of-memory error.

for (int i = 0; i < 10000000; ++i) {
    final Statement st = db.prepare("SELECT 1");
    st.close();
}

Originally, I thought there was a complicated locking interaction (more on that below), but this does not seem to be the case. The explanation appears to be that the finalization queue grows faster than it can be processed by the finalization thread. Finalization has to occur in a separate thread because it is an inherently concurrent activity. If finalizers were executed in the allocating thread (for instance), it is quite likely that a finalizer would enter a synchronized block although the regular (non-finalizer) application code running on that thread already holds a lock on the same object; Java locks are reentrant, so the finalizer would not be kept out. To the application, this looks like a locking implementation error (a lock that fails to exclude another thread of control). So there is a separate finalization thread (or many of them), processing a queue of to-be-finalized objects. The garbage collector populates this queue with objects which have finalizers and have become otherwise unreachable. Performance issues occur if application code creates new objects faster than the finalizer thread can process old objects which have become unreachable. In this case, the finalization queue can grow without bounds, even though there is no apparent leak in the application code.
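The backlog can be made visible with a small experiment. The sketch below is an assumption-laden illustration: FinalizerLag and Tracked are hypothetical names, the finalizer only bumps a counter, and the number reported depends entirely on the virtual machine, its settings, and the heap size. Note that finalize() is deprecated in current JDKs, hence the suppressed warnings.

```java
import java.util.concurrent.atomic.AtomicLong;

class FinalizerLag {
    // Object whose only job is to report that its finalizer has run.
    static final class Tracked {
        private final AtomicLong finalized;

        Tracked(AtomicLong finalized) {
            this.finalized = finalized;
        }

        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() {
            finalized.incrementAndGet();
        }
    }

    // Allocates count short-lived objects and returns how many of them
    // have not been finalized yet when the loop ends.
    static long pendingAfter(int count) {
        AtomicLong finalized = new AtomicLong();
        for (int i = 0; i < count; ++i) {
            new Tracked(finalized);
        }
        return count - finalized.get();
    }

    public static void main(String[] args) {
        System.out.println("not yet finalized: " + pendingAfter(1_000_000));
    }
}
```

Every object counted as not yet finalized is either still waiting for a collection cycle or sitting in the finalization queue; none of them can be reclaimed until its finalizer has run.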

What can be done at the virtual machine level to combat this phenomenon? Throttling allocations so that the finalizer thread can catch up does not work in general due to locking issues (and neither does trying to clear the finalization queue before bailing out with an out-of-memory error). Creating more finalizer threads may help, but only on large machines (and not in the example above, again due to locking issues).

There are some things an application (or library) can do, though. The library knows what happens in the finalizers (provided that subclasses cannot override the finalization mechanism, and preventing that is a very good idea). In the SQLite database example, weak references and a per-database finalization queue could be used. After taking the per-database lock, the wrapper could check if there are statement objects in the queue, and free their native resources. (This approach is very similar to how WeakHashMap is typically implemented.) No separate thread is needed because freeing those resources does not block (which could introduce deadlocks), nor does it acquire any locks already held by the current thread. At least this holds until you start supporting user-defined functions and aggregates. However, in the end, I opted for a far simpler solution: I removed the finalize() method from the statement wrapper class. There is still a safety net, because closing the database object explicitly also frees the native resources of any statements which were leaked. To prevent use-after-free issues (which could crash the virtual machine), the statement wrappers check that the database object has not been closed.
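The weak-reference variant (the one I did not adopt) might look roughly like this. It is a sketch under stated assumptions: QueueDatabase, QueueStatement, and StatementRef are hypothetical names, handles are plain numbers, and a counter stands in for the JNI call which frees a native statement.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

// Hypothetical database wrapper. Handles are plain numbers here; the
// real wrapper would hold native pointers and free them through JNI.
class QueueDatabase {
    // Weak reference that remembers the native handle of its statement.
    static final class StatementRef extends WeakReference<QueueStatement> {
        final long handle;

        StatementRef(QueueStatement st, long handle,
                     ReferenceQueue<QueueStatement> queue) {
            super(st, queue);
            this.handle = handle;
        }
    }

    private final ReferenceQueue<QueueStatement> queue = new ReferenceQueue<>();
    // Keeps the reference objects themselves reachable until released.
    private final Set<StatementRef> live = new HashSet<>();
    private long nextHandle = 1;
    private int freed = 0;

    public synchronized QueueStatement prepare(String sql) {
        drainQueue(); // piggyback cleanup on the lock we already hold
        QueueStatement st = new QueueStatement(this);
        st.ref = new StatementRef(st, nextHandle++, queue);
        live.add(st.ref);
        return st;
    }

    // Frees a handle at most once, whether reached via close() or the queue.
    synchronized void release(StatementRef ref) {
        if (live.remove(ref)) {
            freed++; // stand-in for freeing the native statement
        }
    }

    // Caller must hold the database lock.
    private void drainQueue() {
        Reference<? extends QueueStatement> ref;
        while ((ref = queue.poll()) != null) {
            release((StatementRef) ref);
        }
    }

    public synchronized int freedCount() {
        return freed;
    }
}

class QueueStatement {
    private final QueueDatabase db;
    QueueDatabase.StatementRef ref; // assigned by QueueDatabase.prepare()

    QueueStatement(QueueDatabase db) {
        this.db = db;
    }

    public void close() {
        db.release(ref);
    }
}
```

In this sketch, a leaked statement is only reclaimed on the next call to prepare() after the collector has enqueued its reference, so cleanup happens on the application thread and under the lock it holds anyway.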

Now to the obscure locking issues I alluded to above (which also explain why countermeasures at the virtual machine level are difficult to implement). The wrapper targets the SQLite version in Debian 5.0, which is rather strictly single-threaded. Consequently, all operations on databases and statements are guarded by a lock on the database wrapper object. This includes allocation and deallocation of statement objects. In the original implementation, the finalize() method of statement wrappers acquired a lock on the database to which the statement belongs (no matter if the close() method had been called or not). Now assume that the run-time environment stalls new allocations while the finalization queue keeps growing. In the loop above, it is somewhat likely that the application thread is holding a lock on the database object at that point, which prevents the finalizer thread from making progress. So in the end, stalling allocations would not achieve anything, and would make things worse due to additional overhead. Of course, the finalizer could be improved not to take a lock if the close() method has already been called by application code, but it seems to me that the run-time environment should not rely on such a coding style. (I actually implemented this optimization in the SQLite wrapper, and it turned out that it did not make a significant difference. This led me to the conclusion that my initial assumption was wrong: the locking issue described in this paragraph was not the culprit.)
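For illustration, a finalizer which avoids the database lock after an explicit close() could be sketched like this. LockedDatabase and GuardedStatement are hypothetical names, and the counter merely records how often the lock-protected free path was taken. As noted above, this optimization did not make a significant difference in practice.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical database wrapper whose monitor guards all native calls.
class LockedDatabase {
    private int lockedFrees = 0;

    // Stand-in for freeing a native statement under the database lock.
    synchronized void freeNative(long handle) {
        lockedFrees++;
    }

    synchronized int lockedFreeCount() {
        return lockedFrees;
    }
}

class GuardedStatement {
    private final LockedDatabase db;
    private final long handle;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    GuardedStatement(LockedDatabase db, long handle) {
        this.db = db;
        this.handle = handle;
    }

    public void close() {
        // Only the first caller proceeds to the database lock.
        if (closed.compareAndSet(false, true)) {
            db.freeNative(handle);
        }
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        // After an explicit close(), this returns without touching the lock.
        close();
    }
}
```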

What does this tell us about finalizers? A very short summary is:

Adding a finalize() method to a class means that objects of this class are never short-lived, independently of their usage patterns.

This statement is not just a rule of thumb; it is closer to a fundamental property of finalizers. If many objects with finalizers are created, it is likely that there will be performance issues, even if the underlying native resources are explicitly freed using try-finally blocks. If this turns out to be a problem and a safety net like the one finalizers provide is still needed (for instance, if native resources are involved), a pool-based approach is often practical, and in many cases, there are database-like objects which could serve as pools.
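A pool-based design along the lines of the simpler solution described earlier might look like this. Again, this is only a sketch: PoolDatabase and PoolStatement are hypothetical names, handles are plain numbers, and a counter stands in for freeing native statements. I assume here that the database tracks handles rather than wrapper objects, so that leaked wrappers can still be collected.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical database wrapper acting as the pool for statement handles.
class PoolDatabase {
    private final Set<Long> openHandles = new HashSet<>();
    private long nextHandle = 1;
    private int freed = 0;
    private boolean closed = false;

    public synchronized PoolStatement prepare(String sql) {
        if (closed) {
            throw new IllegalStateException("database is closed");
        }
        long handle = nextHandle++; // stand-in for the native prepare call
        openHandles.add(handle);
        return new PoolStatement(this, handle);
    }

    // Called by PoolStatement.close(); a second call is a no-op.
    synchronized void release(long handle) {
        if (openHandles.remove(handle)) {
            freed++; // stand-in for freeing the native statement
        }
    }

    // Guards against use-after-free: rejects closed databases and statements.
    synchronized void checkUsable(long handle) {
        if (closed || !openHandles.contains(handle)) {
            throw new IllegalStateException("statement or database is closed");
        }
    }

    // Frees every handle that application code forgot to close.
    public synchronized void close() {
        closed = true;
        freed += openHandles.size();
        openHandles.clear();
    }

    public synchronized int freedCount() {
        return freed;
    }
}

// Deliberately has no finalize() method.
class PoolStatement {
    private final PoolDatabase db;
    private final long handle;

    PoolStatement(PoolDatabase db, long handle) {
        this.db = db;
        this.handle = handle;
    }

    public void step() {
        db.checkUsable(handle);
        // the native call would go here
    }

    public void close() {
        db.release(handle);
    }
}
```

The trade-off is that a leaked handle stays allocated until the database itself is closed, which is acceptable as long as leaks only occur in the obscure failure scenarios mentioned at the beginning.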

Note: This issue is mitigated by a new programming interface in OpenJDK 9.

Revisions

2009-05-01: published
2017-11-01: added reference to new article

Florian Weimer