Newsletter sign-up
View all newsletters

Enterprise Java Newsletter
Stay up to date on the latest tutorials and Java community news posted on JavaWorld

Sponsored Links

Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs

JVM performance optimization, Part 4: C4 garbage collection for low-latency Java applications

Boost Java scalability with concurrently compacting garbage collection

  • Print
  • Feedback

Page 2 of 7

Being concurrent, the C4 algorithm actually changes the premise of modern Java enterprise architecture and deployment models. Just consider what hundreds of GB per JVM instance could do for your Java applications:

  • What does a Java deployment look like if it's able to scale within a single JVM instance rather than across many?
  • What type of objects might be stored in memory that weren't before due to GC constraints?
  • How might distributed clusters -- be they caches, region servers, or some other type of server nodes -- change as a result of larger JVM instances? What happens to traditional node counts, node deaths, and cache misses when the JVM size can increase without negatively impacting application responsiveness?

The three phases of the C4 algorithm

The main design premises of the C4 algorithm are "garbage is good" and "compaction is inevitable." C4's design goal is to be both concurrent and collaborative, thus eliminating the need for stop-the-world collection. The C4 garbage collection algorithm consists of three phases:

  1. Marking -- finding what's live
  2. Relocation -- moving things together to free up larger consecutive space (also known as compaction)
  3. Remapping -- updating references to moved objects

We'll look at each phase in detail.

Marking in C4

The marking phase of the C4 algorithm uses a concurrent marking and reference-tracing approach, which I discussed in detail in Part 3 of this series.

The marking phase is started by GC threads traversing the references from known live objects in thread stacks and registers. These threads continue to trace references until all reachable objects have been found on the heap. In this phase, the C4 algorithm is quite similar to other concurrent markers.

C4's marker starts to differentiate during the concurrent mark phase. If an application thread hits an unmarked object during this phase, it facilitates that object to be queued up for further reference tracing. It also ensures that the object is marked, so that it never has to be traversed again by the garbage collector or another application thread. This saves marking time and eliminates the risk of recursive remarks. (Note that a long recursive mark would potentially force the application to run out of memory before memory could be reclaimed -- a pervasive problem in most garbage collection scenarios.)

Figure 1. Application threads traverse the heap just once during marking

If the C4 algorithm relied on dirty-card tables or other methods of logging reads and writes into already-traversed heap areas, an application's GC threads might have to revisit certain areas for re-marking. In extreme cases a thread could get stuck in an infinite re-marking scenario -- at least infinite enough to cause the application to run out of memory before new memory could be freed. But C4 relies on a self-healing load value barrier (LVB), which enables application threads to immediately see if a reference is already marked. If the reference is not marked, the application thread will add it to the GC queue. Once the reference is in the queue it can't be re-marked. The application thread is free to continue on with its work.

  • Print
  • Feedback

Resources

Earlier articles in the Java performance optimization series:

More about garbage collection: