In this article, I will focus on tuning IO performance. Many applications spend much of their runtime on network or file IO, and poorly performing IO code can run many times slower than well-tuned IO code.
Some general concepts appear over and over again when assessing the performance of Java programs. The examples in this article focus specifically on tuning an IO-bound application, but you can apply those principles to many other performance situations.
Perhaps the most important principle is this: Measure early, measure often. You can't effectively manage performance if you don't know the source of your problem. Programmers are notoriously bad at guessing where performance problems lie. Spending days tuning a subsystem that accounts for 1 percent of an application's total runtime simply cannot yield more than a 1 percent improvement in application performance. Instead of guessing, use performance measurement tools -- such as profiling or nonintrusive time-stamped logging -- to identify where your application spends its time and to focus your energy on those hot spots. After you tune, measure again. Not only will measuring help you concentrate your energy where it will count the most, but it will also provide evidence of the success -- or lack thereof -- of your efforts.
In tuning a program for performance, you may want to measure a number of quantities such as total runtime, average memory usage, peak memory usage, throughput, request latency, and object creations. Which factors you should focus on depends on your situation and performance requirements. Several good commercial profiling tools are available that can help you measure many of those quantities, but you don't need to buy an expensive package to collect useful performance data.
In the performance data gathered for this article, I focused entirely on runtime and, to gather my measurements, I used a
class similar to the simple Timer class below. (It can easily be extended to support additional operations such as pause() and restart().) Timer also allows you to gather timing information without cluttering your output with time-stamped logging statements, which can
also distort runtime measurements since the logging statements may create objects and perform IO.
// A simple "stopwatch" class with millisecond resolution
public class Timer {
    private long startTime, endTime;

    // Record the moment timing begins
    public void start() { startTime = System.currentTimeMillis(); }

    // Record the moment timing ends
    public void stop() { endTime = System.currentTimeMillis(); }

    // Elapsed milliseconds between the last start() and stop()
    public long getTime() { return endTime - startTime; }
}
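As a rough illustration of how such a stopwatch might be used, the sketch below wraps a task in start() and stop() calls and reports the elapsed time. The TimerDemo class, the timeIt() helper, and the placeholder workload are hypothetical names introduced only for this example; the nested Timer repeats the class above so the sketch compiles on its own.

```java
class TimerDemo {
    // Copy of the article's stopwatch, nested so this sketch is self-contained.
    static class Timer {
        private long startTime, endTime;
        public void start() { startTime = System.currentTimeMillis(); }
        public void stop() { endTime = System.currentTimeMillis(); }
        public long getTime() { return endTime - startTime; }
    }

    // Hypothetical helper: runs the task once and returns elapsed milliseconds.
    static long timeIt(Runnable task) {
        Timer timer = new Timer();
        timer.start();
        task.run();
        timer.stop();
        return timer.getTime();
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the code you actually want to measure.
        long elapsed = timeIt(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

A single run like this includes VM warm-up effects, so in practice you would probably repeat the measurement several times and compare the results.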
One of the most common causes of Java performance problems is the excessive creation of temporary objects. While newer Java VMs more effectively reduce the performance impact of creating many small objects, the fact remains that object creation is still an expensive operation. Since strings are immutable, the String class is one of the biggest offenders; every operation that appears to modify a String actually creates one or more new objects. That brings us to our second performance principle: Avoid excessive object instantiations.
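To make the String point concrete, here is a minimal sketch contrasting repeated concatenation with a single reusable buffer. The ConcatDemo class and its method names are hypothetical, and the example assumes a VM with StringBuilder (Java 5 or later); on older VMs, StringBuffer plays the same role.

```java
class ConcatDemo {
    // Each += builds a brand-new String, so the loop creates
    // temporary objects on every iteration.
    static String joinWithConcat(String[] words) {
        String result = "";
        for (String word : words) {
            result += word;
            result += ' ';
        }
        return result;
    }

    // One StringBuilder is reused for every append; only the final
    // toString() call creates the resulting String.
    static String joinWithBuilder(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = { "measure", "early", "measure", "often" };
        // Both approaches produce the same text; they differ in how many
        // intermediate objects they allocate along the way.
        System.out.println(joinWithConcat(words).equals(joinWithBuilder(words)));
    }
}
```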
Many applications process large volumes of data, and IO is one of those areas where small differences in implementation can have a large impact on performance. I derived this article's examples from tuning a text-processing application that consumes and analyzes large quantities of text. The amount of time spent reading and processing the input was significant, and the methods employed in tuning the application provide good examples of the performance principles stated above.
One of the principal culprits affecting Java IO performance is character-by-character IO -- calling the InputStream.read() or Reader.read() methods to read a single byte or character at a time. Java inherited that idiom from C programming, in which it is quite common -- and efficient
-- to read a file by repeatedly calling getc(). In C, character IO is fast because the getc() and putc() functions are implemented as macros and provide buffered file access, which means they only take a few machine instructions
to execute. In Java, you encounter a quite different situation. Not only do you incur the cost of one or more method calls
for each character but, more significantly, if you don't use any sort of buffering, you also suffer the cost of a system call
to obtain the character. While a Java program that relies on read() may look and function just like its C counterpart, it will not perform in the same way. Fortunately, Java offers several
easy approaches to achieving higher performance IO.
You can address buffering in one of two ways -- use the standard BufferedReader and BufferedInputStream classes or use the block-read methods to read larger blocks of data at a time. The former offers a quick and easy solution,
and substantially improves performance with little additional coding and opportunity for error. The do-it-yourself technique
offers a somewhat more complicated -- although still not difficult -- remedy, and it is even faster. It also features some
additional advantages, which I will describe later.
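The two approaches described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the article's measurements: the BufferingDemo class, its method names, and the newline-counting task are hypothetical, and the 1,024-byte buffer size is simply a convenient choice.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {
    // Approach 1: wrap the stream in BufferedInputStream and keep calling
    // read(); most calls are served from the wrapper's internal buffer.
    static long countNewlinesBuffered(File file) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            long count = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') count++;
            }
            return count;
        }
    }

    // Approach 2: do the buffering yourself with the block-read method
    // read(byte[]) and examine the bytes directly in the array.
    static long countNewlinesSelfBuffered(File file) throws IOException {
        byte[] buffer = new byte[1024]; // assumed block size
        try (InputStream in = new FileInputStream(file)) {
            long count = 0;
            int length;
            // read(byte[]) returns the number of bytes read, or -1 at end of file.
            while ((length = in.read(buffer)) != -1) {
                for (int i = 0; i < length; i++) {
                    if (buffer[i] == '\n') count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]); // path to any readable file
        System.out.println("buffered:      " + countNewlinesBuffered(file));
        System.out.println("self-buffered: " + countNewlinesSelfBuffered(file));
    }
}
```

Note that the self-buffered loop must use the returned length rather than the array size, because the final block is usually only partly filled.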
To measure the effect of different IO buffering strategies, I wrote six small programs that read several hundred files and examine each character. Table 1 shows the runtimes of those six programs, using five commonly used Linux Java VMs -- the Sun 1.1.7, 1.2.2, and 1.3 Java VMs, and the 1.1.8 and 1.3 Java VMs from IBM.
The programs are:
RawBytes: Read data one byte at a time, using FileInputStream.read()
RawChars: Read data one char at a time, using FileReader.read()
BufferedIS: Wrap FileInputStream with BufferedInputStream and read data one byte at a time with read()
BufferedR: Wrap FileReader with BufferedReader and read data one char at a time with read()
SelfBufferedIS: Read data 1 K at a time, using FileInputStream.read(byte[]), and access the data from the buffer
SelfBufferedR: Read data 1 K at a time, using FileReader.read(char[]), and access the data from the buffer
Table 1. Runtime by program and Java VM
| Program | Sun 1.1.7 | IBM 1.1.8 | Sun 1.2.2 | Sun 1.3 | IBM 1.3 |
| --- | --- | --- | --- | --- | --- |
| RawBytes | 20.6 | 18.0 | 26.1 | 20.70 | 62.70 |
| RawChars | 100.0 | 235.0 | 174.0 | 438.00 | 148.00 |
| BufferedIS | 9.2 | 1.8 | 8.6 | 2.28 | 2.65 |
| BufferedR | 16.7 | 2.4 | 10.0 | 2.84 | 3.10 |
| SelfBufferedIS | 2.1 | 0.4 | 2.0 | 0.61 | 0.53 |
| SelfBufferedR | 8.2 | 0.9 | 2.7 | 1.12 | 1.17 |
We can draw several obvious conclusions from the performance data in Table 1 (measured as total runtime to process several hundred text files after adjusting for Java VM and program startup):