Realistically real-time

Real-time Java application development using multicore systems


Stack allocation and the Javolution library

A major problem with concurrent collection is that the collector cannot always keep up with the application if too much garbage is generated too quickly. This situation results in the GC's infamous "stop the world" collection. An obvious solution is to limit garbage-generation throughput. Advanced compilers may avoid allocating new objects if those objects quickly become unreachable (basically, if the compiler can determine that an object will soon be a candidate for garbage collection).

For example, during the FFT calculation the compiler could avoid allocating intermediate complex numbers and store their real/imaginary values directly in registers. But this optimization, called "escape analysis," is still a work in progress and is only applicable to simple cases. Another solution would be to declare some classes as ValueType and reference instances of them only by copy. ValueType objects could then be allocated on the stack with no adverse effect on the collector, just as primitive types such as int and double are. (I submitted a request for enhancement for this stack-allocation feature in 2005.)
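To make the allocation pressure concrete, here is a minimal sketch of an immutable complex-number class in the style of the one used by the test program (illustrative code, not the actual rt-test1.java source). Every arithmetic operation returns a new instance, so each FFT butterfly produces several short-lived temporaries that become garbage almost immediately; these are exactly the allocations that escape analysis or a ValueType declaration could, in principle, eliminate.

// Illustrative sketch: an immutable complex number whose operations
// allocate a new heap object on every call.
public final class Complex {
    private final double re, im;

    public Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    public Complex plus(Complex that)  { return new Complex(re + that.re, im + that.im); }
    public Complex minus(Complex that) { return new Complex(re - that.re, im - that.im); }
    public Complex times(Complex that) {
        return new Complex(re * that.re - im * that.im,
                           re * that.im + im * that.re);
    }
}

// A single FFT butterfly then creates several temporaries per iteration
// (w, x, and half are hypothetical names):
//     Complex t = w.times(x[j + half]);   // temporary
//     x[j + half] = x[j].minus(t);        // another temporary
//     x[j] = x[j].plus(t);                // and another
// Repeated over thousands of butterflies per frame, this is a large garbage flow.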

Alternatively, the Javolution library supports custom object-allocation policies. This approach requires the use of static factory methods instead of constructors, because the behavior of the new keyword cannot be changed programmatically.
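A minimal sketch of this factory-method pattern is shown below. It is not the rt-test2.java listing; it assumes the ObjectFactory class from Javolution 5.x's javolution.context package (names may differ in other releases). The point is that valueOf() delegates allocation to the current context, so the same code can allocate from the heap, from a pool, or from a stack without being rewritten.

import javolution.context.ObjectFactory;

// Sketch: a Complex class allocated through a factory instead of a constructor,
// so the allocation policy (heap, pool, stack/scoped memory) is decided by the
// current allocator context rather than hard-coded with the new keyword.
public final class Complex {

    // Creates brand-new instances when the current allocator has none to recycle.
    private static final ObjectFactory<Complex> FACTORY = new ObjectFactory<Complex>() {
        protected Complex create() {
            return new Complex();
        }
    };

    private double re, im; // Mutable so that recycled instances can be reused.

    private Complex() { } // Instances are obtained through the factory only.

    // Static factory method replacing "new Complex(re, im)".
    public static Complex valueOf(double re, double im) {
        Complex c = FACTORY.object(); // Recycled or newly created, per context.
        c.re = re;
        c.im = im;
        return c;
    }

    public Complex times(Complex that) {
        return Complex.valueOf(re * that.re - im * that.im,
                               re * that.im + im * that.re);
    }
}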

In rt-test2.java I have rewritten the Complex class from rt-test1.java to support stack allocation, and I've rewritten the FFT code to use the stack (the actual stack implementation can be scoped memory on a real-time VM, or thread-local object pools on a standard VM). I have also removed the Thread.sleep(5) statement, which was originally placed in the program's main loop to reduce garbage flow and avoid overloading the CMS collector. As you can see in Figure 6, by reducing garbage generation, stack allocation allows the concurrent collector to complete its work before the system runs out of memory.
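For reference, scoping a per-frame FFT in a stack context looks roughly like the following. This is a sketch, assuming Javolution 5.x's StackContext (entered and exited in the same style as the ConcurrentContext shown later in Listing 1); frame and result stand in for the test program's per-frame input and output arrays. Objects allocated through factories inside the context are recycled on exit, so any value needed afterwards must be copied into ordinary heap or primitive storage before exiting.

import javolution.context.StackContext;
...
StackContext.enter(); // Factory allocations now come from a thread-local "stack".
try {
    fft(frame, result); // Intermediate Complex objects are recycled, not garbage.
    // Copy out any values needed later (e.g., into primitive arrays) here;
    // stack-allocated objects are invalid once the context exits.
} finally {
    StackContext.exit(); // Recycles everything allocated since enter().
}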

Figure 6. Javolution's custom stack allocation. Maximum execution time: 76.192644 ms; average execution time: 13.151166 ms.

Compared to the previous graph, the average execution time has been significantly reduced. The worst-case execution time, on the other hand, hasn't improved at all. Why not?

The reason for the discrepancy is that the minor collections now run less frequently, but each one still works over the same young-generation space, so the individual pauses are no shorter. One solution is the option -XX:NewRatio=128 (up from 8, the default for client VMs), which lets us take advantage of the reduced garbage flow to shrink the young generation: with a 256MB heap, it drops from roughly 28MB to about 2MB. The VM options should now look as shown here:

-Xms256M -Xmx256M
-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode
-XX:CompileThreshold=1
-XX:NewRatio=128
Figure 7. Decreased young-generation size. Maximum execution time: 31.635864 ms; average execution time: 16.256594 ms.

Limiting the garbage generated is clearly the most efficient way to reduce GC pauses without overloading the concurrent collector. We have so far reduced the worst-case execution time from 814 ms to 31 ms!

Javolution's concurrent context

It is not unusual in real-time systems for some critical processing to have to be performed as quickly as possible. Increasing thread priority guarantees that the code will be scheduled for execution quickly, but it does not make the execution itself any faster. This problem is exacerbated on the latest highly concurrent processors, such as the Sun Niagara 2 processor, which is capable of running 32 threads concurrently but runs each thread at a relatively low clock speed. Obviously it would be helpful, when something urgent has to be done, for all of the available processors to participate in completing the task quickly.

To this end, the Javolution library provides a specialized execution context called ConcurrentContext. When a thread enters a concurrent context, it may perform concurrent executions by calling the static method ConcurrentContext.execute(Runnable). The logic is then executed by a concurrent thread, or by the current thread itself if no concurrent thread is immediately available. (The number of concurrent threads is typically limited to the number of processors.)

Only after all concurrent executions are completed is the current thread allowed to exit the scope of the concurrent context (internal synchronization). In Listing 1 the FFT calculations are rewritten to execute concurrently:

Listing 1. FFT calculations using ConcurrentContext

import static javolution.context.ConcurrentContext.*;
...
enter(); // Enter a concurrent context.
try {
    for (int i = 0; i < n; i++) {
         final int j = i % K;
         execute(new Runnable() { // Run by a concurrent thread, or by the current thread if none is available.
              public void run() {
                  fft(frames[j], results[j]);
              }
         });
    }
} finally {
    exit(); // Waits for concurrent executions to complete.
}

Concurrent contexts ensure the same behavior whether the execution is performed by the current thread or by a concurrent thread. Any exception raised during concurrent execution is propagated to the current thread.

Concurrent contexts can be used in Web application servers to even out server response times, even when some user actions take longer than others. In such a scenario, lengthy operations are performed in a concurrent context and are authorized to use up to half of the available processors. This is done with a single call:

ConcurrentContext.setConcurrency(
        (Runtime.getRuntime().availableProcessors() / 2) - 1);

The other half of the processors can continue servicing simple queries.

Javolution's concurrent contexts have proven to be very efficient. JScience's matrix multiplications, for example, are accelerated by a factor of 1.99 when concurrency is enabled on a dual-core processor. On one government project running on a Sun Fire T2000 server, the execution of lengthy database operations was accelerated 17 times by using ConcurrentContext and just five lines of code!
