It's a well-known fact that hardware companies are abandoning the race for single-CPU speed and instead are focusing on multicore processors. Despite the fact that many algorithms can be easily parallelized, most client-side Java code is still written for single-CPU systems. In this article Kirill Grouchnikov shows you how to fine-tune a core JDK array-sorting algorithm for improved processing speed of as much as 35%.
The consistent increase in single-core CPU speed that programmers could rely on for years is no longer available. This has been true for the server side for at least a decade, and now it's also the reality for client-side programming. Quite a few tasks easily lend themselves to being parallelized using new features in JDK 6.0, letting client-side applications take advantage of newer multicore hardware. Using a core JDK array sorting algorithm as an example, this article walks through the implementation details, highlighting points that are specific to parallel algorithms. Even a simple implementation results in a significant speed-up -- around 35 % -- on a dual-core machine.
The demise of Moore's Law
Up until a year or two ago, API providers could simply rely on an empirical observation made in 1965 by Gordon Moore, the co-founder of Intel. Moore's Law states that the number of transistors on an integrated circuit at minimum cost doubles every 18 to 24 months. For software developers, this meant that you could write a program, run it on the simplest CPU available at that time, and be pretty sure that the same program would run twice as fast on the simplest CPU available two years later. As long as the operating system was backward compatible, you didn't even need to recompile the program.
However, in the last couple of years, hardware manufacturers have started hitting production barriers that make cramming more computing power into a single chip very costly. The solution most of them have adopted -- first for the server market and now for the consumer (client) market -- is to put multiple cores on the same chip, without increasing each core's processing speed. Translated into the software world, this means that you don't get "free rides" any more. If your program has a simple sequential flow, it no longer can enjoy advances in the underlying hardware. This is true for both the programs that you're writing and the core language libraries.
Collection sorting in the single-core world
Existing Java collection-sorting routines are no exception. They perform no faster on newer multicore machines than on single-core machines. This might seem acceptable on smaller inputs, but the input domains for most real-world problems don't stand still. Moreover, developers and users rightly expect their programs to run faster on newer hardware.
A look at the generic algorithms that operate on collections, and their existing Java implementations, will clarify the problem -- and light the way to a solution.
Most of the generic core algorithms that operate on collections are available in the
Collections classes. Using the APIs exposed by these two classes, you can sort, search, and fill lists and arrays. Because most of the APIs operate on the contents of the entire collection, the running time is proportional to the collection size.
For some methods (such as sorting), the running time is even longer. You can't sort an arbitrary collection in a number of steps that is proportional to the collection's size. (This is known as linear complexity.) If the collection size is N, the best algorithm can sort this collection in a number of steps that is proportional to N*log(N); this lower bound has been theoretically proven. (As a side note, the quadratic complexity of the straightforward bubble sort means that the number of steps is proportional to N*N, which is highly undesirable for large collections.)
What does this mean to users of these core APIs? When you call
Collections.sort, the running time doesn't grow linearly with the collection size. On my machine, calling
Arrays.sort on an array of 200,000 strings takes 490ms on average. An array of 400,000 strings is sorted in 1290ms -- an increase in running time by factor of about 2.6.