The source archive accompanying this article contains the following Java classes:
CoreSort tests the core implementation of Arrays.sort().
ConcurrentSort is a simple parallelized implementation of the same method that takes advantage of dual-core CPUs.
TestConcurrentSort tests the parallelized sort implementation on different inputs.
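The parallelized sort splits the array between the two cores and merges the sorted halves. The following is a minimal sketch of that approach (the class and method names are illustrative, not the article's exact ConcurrentSort source, and it uses current Java syntax):

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DualCoreSort {
    // Sorts the input by sorting each half on its own thread, then merging.
    // Uses a fixed pool of two threads, matching a dual-core machine.
    public static String[] sort(String[] input) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            int mid = input.length / 2;
            String[] left  = Arrays.copyOfRange(input, 0, mid);
            String[] right = Arrays.copyOfRange(input, mid, input.length);

            Future<?> f1 = pool.submit(() -> Arrays.sort(left));
            Future<?> f2 = pool.submit(() -> Arrays.sort(right));
            f1.get();  // wait for both subtasks to finish
            f2.get();

            return merge(left, right);
        } finally {
            pool.shutdown();
        }
    }

    // The "merging" stage: a standard two-way merge of the sorted halves.
    private static String[] merge(String[] a, String[] b) {
        String[] result = new String[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            result[k++] = a[i].compareTo(b[j]) <= 0 ? a[i++] : b[j++];
        }
        while (i < a.length) result[k++] = a[i++];
        while (j < b.length) result[k++] = b[j++];
        return result;
    }
}
```

Note that the merge runs sequentially on a single core, which is one reason the measured speedup falls short of the ideal.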
The following results were taken on a dual-core Intel processor, with each core running at 1.66GHz. The same input array of a million random strings was sorted by the core Arrays.sort() method and by the parallelized version. The average running time of the core implementation was 3350ms, while the average running time of the parallelized version was 2180ms (a 35% improvement).
In an ideal case, the performance improvement should be exactly 50%. However, for a few reasons, you can rarely achieve this. First, creating and running multiple threads incurs some "penalty." Second, most problems involve some kind of a "merging" stage where the results from the subproblems are analyzed to create the result for the original problem. Finally, some problems require communication among the subtasks.
So, is it just that easy? Yes and no. It is very simple to break up problems that are inherently concurrent, which you can do in just a few lines of code using the new JDK 6.0 concurrency APIs. And if your implementation is written to scale to an arbitrary number of available processors, your code can once again enjoy Moore's Law (at least as long as the hardware vendors keep putting more and more identical cores on their chips).
On the other hand, you are now faced with a multitude of aspects that are specific to concurrent programming. First, you need to decide if you split your problem among all processors, some of them, or only a fixed number (as in this article's example). Splitting the work among all processors might result in the best average completion time, but it is more complicated to do, and in some cases (when one core is used heavily by some external process), a specific subtask can take much longer to complete than the others. Splitting the task among a fixed number of cores is simpler but gives you only a constant performance improvement, no matter how many cores are available.
Second, your testing becomes much more complicated. You need to test your code on different configurations, including single-cores, dual-cores, quad-cores, and so on. Some bugs become very hard to reproduce, because the code no longer runs sequentially.
Third, the code itself is more complicated, and not only for the task splitting. You also need to think about system resources; your method might be called from multiple threads, and if you spawn yet more threads for every such request, you can easily bring the whole system to a crawl.
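One common way to keep resource usage bounded (a sketch of a standard pattern, not something from the article) is to share one fixed-size pool across all callers, sized to the machine, so that concurrent requests queue work instead of each spawning fresh threads:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedPool {
    // One application-wide pool, sized to the number of available cores.
    // Daemon threads are used so an idle pool does not keep the JVM alive.
    private static final ExecutorService POOL =
        Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // Submits a batch of subtasks and blocks until all of them complete.
    public static <T> List<Future<T>> submitAll(List<Callable<T>> tasks)
            throws InterruptedException {
        return POOL.invokeAll(tasks);
    }
}
```

With this design, a method that is itself called from many threads adds work to a bounded pool rather than multiplying the thread count with every request.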
The last consideration, but certainly not the least, is that some problems, even if they appear sequential, don't easily lend
themselves to a parallel implementation. Consider, for example, ArrayList.indexOf (which is also used by ArrayList.contains). The implementation is very simple: it scans the entire list from the beginning and returns the index of the first element that matches the passed parameter. The search itself can easily be split among any number of available processors, each one scanning a subrange of the list. But the main difference between this method and Arrays.sort lies in the merging stage. The indexOf method must return the index of the first matching element, so if the subtask that operates on the first list slice finds a match, all the other subtasks are using their processors for no reason.
ArrayList.indexOf, unlike Arrays.sort, doesn't need the full results of all the subtasks. If more than one subtask returns a nonnegative result, the implementation
should take the smallest one. Suppose you split the search into four subtasks, and the third subtask returns a matching index. You can't just stop the remaining subtasks and return it; you must wait until the first two subtasks complete in order to decide what the result is. More generally, when a subtask N returns a matching result, you can stop all the tasks with an index higher than N but must wait for all the tasks with an index lower than N.
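That ordering rule can be sketched as follows (an illustrative implementation, not library code; the class name and the fixed slice count are assumptions). Inspecting the futures in slice order naturally waits for lower-index subtasks, and once one of them reports a match, all higher-index subtasks are cancelled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelIndexOf {
    // Splits the list into `slices` subranges; each subtask scans one range
    // and returns the first local match, or -1 if its range has none.
    public static <T> int indexOf(List<T> list, T target, int slices)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(slices);
        try {
            int size = list.size();
            int chunk = (size + slices - 1) / slices;
            List<Future<Integer>> futures = new ArrayList<>();
            for (int s = 0; s < slices; s++) {
                int from = s * chunk;
                int to = Math.min(from + chunk, size);
                futures.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        if (list.get(i).equals(target)) return i;
                    }
                    return -1;
                }));
            }
            // Check results in slice order: we must wait for every slice
            // before a match, but can cancel every slice after it.
            for (int s = 0; s < slices; s++) {
                int found = futures.get(s).get();
                if (found >= 0) {
                    for (int t = s + 1; t < slices; t++) {
                        futures.get(t).cancel(true);
                    }
                    return found;
                }
            }
            return -1;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the result check proceeds in slice order, the first nonnegative result returned is always the smallest matching index, which is exactly what indexOf requires.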
Multicore machines are the reality for client-side development. Failure to adapt will result in software that does not scale well on modern hardware. By identifying routines that lend themselves to parallelization, you can keep your code enjoying the advances of multicore machines. Doing so, however, means switching your mindset from the sequential model to the concurrent model, which brings not only performance improvements but also its own set of best practices. If you want to achieve competitive performance and write software that scales with the hardware, you need to dip your toes in the concurrency pool.
Read more about Core Java in JavaWorld's Core Java section.