Java 101: Trash talk, Part 1

Java recycles its memory through garbage collection

1 2 3 4 5 Page 2
Page 2 of 5

Garbage collection algorithms

Not only does the JLS not mandate a garbage collector for a JVM, it also does not specify an algorithm that a JVM implementer must follow to implement a garbage collector. A garbage collector must only detect objects no longer needed and reclaim the heap space that those objects occupy. Over the years, computer scientists have devised many garbage collection algorithms; each offers some advantage in terms of memory use or performance.

Note
The Java Language Specification (JLS), which is the official word on the Java language, does not state that a JVM implementation must support a garbage collector. For example, a JVM implementation on a smart card -- a plastic card with an embedded processor -- might require all Java programs to fit within the confines of that card's memory. As a result, the smart card JVM would not need a garbage collector. However, most JVMs need garbage collectors.

Many garbage collection algorithms identify a root set of references and determine an object's reachability from those roots. The root set of references is a reference variable collection always accessible to an executing Java program. Each variable's reference value allows the program to access an object's fields and call the object's methods. Root-set variables include local variables, parameters, and class fields. A garbage collector regards all objects reachable from root-set variables as live objects and cannot collect them as garbage. That includes objects indirectly reachable from root-set variables, meaning the objects are reachable through live objects that are directly reachable from the root-set variables. In contrast, an object that is unreachable by any path from any root-set variable is eligible for garbage collection. For example, in the earlier code fragment, because no root-set variables reference Circle after area() finishes, Circle is no longer reachable and can be garbage collected.

The following sections examine various categories of garbage collection algorithms. The first category -- reference counting algorithms -- distinguishes itself as the only category not requiring a root set of references to detect unreachable objects.

Reference counting algorithms

The earliest garbage collection algorithms used reference counting to distinguish live objects from objects no longer in use. Basically, each object on the heap associates with a reference count. When that object first comes into existence and its reference assigns to a variable, the reference count initializes to one. When that object's reference assigns to any other variable, the reference count increments for each assignment. When a variable containing the object's reference goes out of scope (e.g., a method returns, and the variable was local to that method), the reference count decrements. Once the reference count reaches zero, the object becomes eligible for garbage collection. During garbage collection, the garbage collector decrements the reference count for each object that a garbage-collected object references. If any of those objects' reference counts reach zero, the reference counting-based garbage collector can also simultaneously collect the associated objects.

Reference counting has the advantage over other algorithm categories in that it does not interrupt program execution for lengthy periods. Because a reference counting-based garbage collector can run in short time periods that carefully coordinate with a program's execution, it is ideally suited for those programs that must run in real time. On the flip side, there are two disadvantages to this kind of algorithm:

  1. Reference counting adds some overhead to program execution. That overhead is due to the incrementing and decrementing of a counter each time an object's reference assigns to a new variable or each time an existing variable goes out of scope.
  2. Reference counting cannot detect cycles. For example, imagine a parent object that contains a variable that references a child object; this child object contains a variable that references the parent object. Such objects will never have a zero reference count -- even when no other variable exists that can reach either object. Hence, a reference counting-based garbage collector can never collect such objects.

Tracing algorithms

To overcome the aforementioned problems with reference counting algorithms, computer scientists devised the concepts of a root set of references (discussed above) and tracing algorithms. A tracing-based garbage collector identifies all objects (in a recursive manner) reachable from root-set variables and marks those objects in some fashion, such as setting one or more bits that associate with each object. Following the marking phase, the simplest form of tracing-based collector "sweeps" through all objects and garbage collects those objects that it did not mark. This tracing-based garbage collector is known as a mark-and-sweep garbage collector.

Compacting algorithms

A mark-and-sweep garbage collector should have some means to combat heap fragmentation. One way to accomplish that task is for the mark-and-sweep garbage collector to incorporate a compacting algorithm as part of the mark-and-sweep garbage collector's logic.

A compacting mark-and-sweep garbage collector moves all objects to one side of the heap during the sweep phase. The other end of the heap becomes one contiguous region of free memory. The collector updates all references to all objects it moves so those references identify the same objects in their new locations.

To simplify the object reference updates, some JVM garbage collector implementations that use compacting mark-and-sweep add a level of indirection. That level of indirection manifests itself through handles and a handle table. Under that scenario, an object reference always refers to the same handle entry in the handle table. In turn, the handle entry contains the object's actual reference. When a compacting mark-and-sweep garbage collector moves an object, the collector only modifies the actual object reference in the handle entry. All references to the handle entry remain undisturbed. Although the use of a handle table simplifies heap fragmentation, accessing each object adds the disadvantage of additional overhead, which can slow program execution.

Copying algorithms

To overcome the handles overhead, computer scientists invented copying algorithms. A copying-based garbage collector is a tracing-based garbage collector that combats heap fragmentation as follows: The heap initially divides into an object area and one or more free space areas. A program allocates memory for objects from the object area and leaves the free space areas alone. Once the object area is full, a copying-based garbage collector traces all live objects from root-set variables and copies each live object to a free space area. In the free space area, the collector positions each live object immediately next to another live object so no holes exist between objects. The program now allocates memory for objects from the chosen free space area, which is now known as the object area, and leaves the old object area, now a free space area, alone. The next time the copying-based garbage collector runs, it copies live objects from that object area to a new free space area. This back-and-forth copying cycle between object and free space areas continues each time the copying-based garbage collector runs.

One of the more common copying-based garbage collectors is known as the stop-and-copy garbage collector. That collector divides the heap into an object area and one free space area and stops program execution until it finishes copying all objects from the object area to the free space area.

Generational algorithms

A disadvantage of stop-and-copy garbage collectors is that the collector must copy all live objects, which increases the time a program must wait until it can resume execution. Fortunately, you can reduce the wait time because computer scientists have discovered the following:

  • Most program objects exist for short time periods
  • Most programs create few objects that exist for long time periods -- and copying long-lived objects over and over accounts for most of a copying collector's inefficiency

Generational algorithms overcome the inefficiency of repeatedly copying long-lived objects as follows: They divide the heap into two or more subheaps, where each subheap serves a generation of objects. Garbage collection takes place most frequently on the subheap that represents the youngest generation. Because most program objects exist for short time periods, the program will eventually not reference those objects and the garbage collector will collect them from the youngest subheap. After the generational garbage collector runs a few times, those objects that survived the previous runs move to the next highest generation subheap. Each progressively older generation subheap gets garbage-collected less often. And that saves time.

Note
Sun's HotSpot JVM uses a modified form of a generational collector. That modified collector manages the mature object space through the Train algorithm first proposed by Richard Hudson and Eliot Moss. I leave it as an exercise for you to research that algorithm. Check out Resources to get started.

Adaptive algorithms

Some garbage collection algorithms are better than others for certain heap situations. An adaptive algorithm monitors the current heap situation and selects an appropriate garbage collector on a situation-by-situation basis. Alternatively, an adaptive algorithm might divide the object heap into subheaps and assign a different garbage collector to each subheap. Each garbage collector might run simultaneously with the other collectors.

That's enough theory for now. It's time to start exploring the practicalities of garbage collection from Java's perspective. To begin, we investigate how to run Java's garbage collector.

1 2 3 4 5 Page 2
Page 2 of 5