Master Merlin's new I/O classes

Squeeze maximum performance out of nonblocking I/O and memory-mapped buffers with java.nio

With the recent public beta release of J2SE (Java 2 Platform, Standard Edition) 1.4 (code-named Merlin), Sun has once again unleashed scores of new classes, features, and interfaces on unsuspecting Java developers. Because J2SE 1.3 focused only on performance improvements, J2SE 1.4 incorporates two years' worth of feature enhancements. Also, as the first J2SE release defined by the Java Community Process (JCP), Merlin reflects a wider array of interests than previous JDK releases.

A few obvious additions in Merlin have received most of the press so far, including the XML parser, secure sockets extension, and 2D graphics enhancements. This article introduces an exciting new API many have overlooked. The new I/O (input/output) packages finally address Java's long-standing shortcomings in its high-performance, scalable I/O. The new I/O packages -- java.nio.* -- allow Java applications to handle thousands of open connections while delivering scalability and excellent performance. These packages introduce four key abstractions that work together to solve the problems of traditional Java I/O:

  1. A Buffer contains data in a linear sequence for reading or writing. A special buffer provides for memory-mapped file I/O.
  2. A charset maps Unicode character strings to and from byte sequences. (Yes, this is Java's third shot at character conversion.)
  3. Channels -- which can be sockets, files, or pipes -- represent a bidirectional communication pipe.
  4. Selectors multiplex asynchronous I/O operations into one or more threads.

A quick review

Before diving into the new API's gory details, let's review Java I/O's old style. Imagine a basic network daemon. It needs to listen to a ServerSocket, accept incoming connections, and service each connection. Assume for this example that servicing a connection involves reading a request and sending a response. That resembles the way a Web server works. Figure 1 depicts the server's lifecycle. At each heavy black line, the I/O operation blocks -- that is, the operation call won't return until the operation completes.

Figure 1. Blocking points in a typical Java server

Let's take a closer look at each step.

Creating a ServerSocket is easy:

ServerSocket server = new ServerSocket(8001);

Accepting new connections is just as easy, but with a hidden catch:

Socket newConnection = server.accept();

The call to server.accept() blocks until the ServerSocket accepts an incoming network connection. That leaves the calling thread sitting for an indeterminate length of time. If this application has only one thread, it does a great impression of a system hang.

Once the incoming connection has been accepted, the server can read a request from that socket, as shown in the code below. Don't worry about the Request object. It is a fiction invented to keep this example simple.

 InputStream in = newConnection.getInputStream();
InputStreamReader reader = new InputStreamReader(in);
LineNumberReader lnr = new LineNumberReader(reader);
Request request = new Request();
while(!request.isComplete()) {
  String line = lnr.readLine();

This harmless-looking chunk of code features problems. Let's start with blocking. The call to


eventually filters down to call

. There, if data waits in the network buffer, the call immediately returns some data to the caller. If there isn't enough data buffered, then the call to read blocks until enough data is received or the other computer closes the socket. Because


asks for data in chunks (it extends


), it might just sit around waiting to fill a buffer, even though the request is actually complete. The tail end of the request can sit in a buffer that


has not returned.

This code fragment also creates too much garbage, another big problem. LineNumberReader creates a buffer to hold the data it reads from the socket, but it also creates Strings to hold the same data. In fact, internally, it creates a StringBuffer. LineNumberReader reuses its own buffer, which helps a little. Nevertheless, all the Strings quickly become garbage.

Now it's time to send the response. It might look something like this (imagine that the Response object creates its stream by locating and opening a file):

 Response response = request.generateResponse();
OutputStream out = newConnection.getOutputStream();
InputStream in = response.getInputStream();
int ch;
while(-1 != (ch = {

This code suffers from only two problems. Again, the read and write calls block. Writing one character at a time to a socket slows the process, so the stream should be buffered. Of course, if the stream were buffered, then the buffers would create more garbage.

You can see that even this simple example features two problems that won't go away: blocking and garbage.

The old way to break through blocks

The usual approach to dealing with blocking I/O in Java involves threads -- lots and lots of threads. You can simply create a pool of threads waiting to process requests, as shown in Figure 2.

Figure 2. Worker threads to handle requests

Threads allow a server to handle multiple connections, but they still cause trouble. First, threads are not cheap. Each has its own stack and receives some CPU allocation. As a practical matter, a JVM might create dozens or even a few hundred threads, but it should never create thousands of them.

In a deeper sense, you don't need all those threads. They do not efficiently use the CPU. In a request-response server, each thread spends most of its time blocked on some I/O operation. These lazy threads offer an expensive approach to keeping track of each request's state in a state machine. The best solution would multiplex connections and threads so a thread could order some I/O work and go on to something productive, instead of just waiting for the I/O work to complete.

New I/O, new abstractions

Now that we've reviewed the classic approach to Java I/O, let's look at how the new I/O abstractions work together to solve the problems we've seen with the traditional approach.

Along with each of the following sections, I refer to sample code (available in Resources) for an HTTP server that uses all these abstractions. Each section builds on the previous sections, so the final structure might not be obvious from just the buffer discussion.

Buffered to be easier on your stomach

Truly high-performance server applications must obsess about garbage collection. The unattainable ideal server application would handle a request and response without creating any garbage. The more garbage the server creates, the more often it must collect garbage. The more often it collects garbage, the lower its throughput.

Of course, it's impossible to avoid creating garbage altogether; you need to just manage it the best way you know how. That's where buffers come in. Traditional Java I/O wastes objects all over the place (mostly Strings). The new I/O avoids this waste by using Buffers to read and write data. A Buffer is a linear, sequential dataset and holds only one data type according to its class:

java.nio.BufferAbstract base class
java.nio.ByteBufferHolds bytes. Can be direct or nondirect. Can be read from a ReadableByteChannel. Can be written to a WritableByteChannel.
java.nio.MappedByteBufferHolds bytes. Always direct. Contents are a memory-mapped region of a file.
java.nio.CharBufferHolds chars. Cannot be written to a Channel.
java.nio.DoubleBufferHolds doubles. Cannot be written to a Channel.
java.nio.FloatBufferHolds floats. Can be direct or nondirect.
java.nio.IntBufferHolds ints. Can be direct or nondirect.
java.nio.LongBufferHolds longs. Can be direct or nondirect.
java.nio.ShortBufferHolds shorts. Can be direct or nondirect.
Table 1. Buffer classes

You allocate a buffer by calling either allocate(int capacity) or allocateDirect(int capacity) on a concrete subclass. As a special case, you can create a MappedByteBuffer by calling mode, long position, int size).

A direct buffer allocates a contiguous memory block and uses native access methods to read and write its data. When you can arrange it, a direct buffer is the way to go. Nondirect buffers access their data through Java array accessors. Sometimes you must use a nondirect buffer -- when using any of the wrap methods (like ByteBuffer.wrap(byte[])) -- to construct a Buffer on top of a Java array, for example.

When you allocate the Buffer, you fix its capacity; you can't resize these containers. Capacity refers to the number of primitive elements the Buffer can contain. Although you can put multibyte data types (short, int, float, long, and so on) into a ByteBuffer, its capacity is still measured in bytes. The ByteBuffer converts larger data types into byte sequences when you put them into the buffer. (See the next section for a discussion about byte ordering.) Figure 3 shows a brand new ByteBuffer created by the code below. The buffer features a capacity of eight bytes.

 ByteBuffer example = ByteBuffer.allocateDirect(8);
Figure 3. A fresh ByteBuffer

The Buffer's position is the index of the next element that will be written or read. As you can see in Figure 3, position starts at zero for a newly allocated Buffer. As you put data into the Buffer, position climbs toward the limit. Figure 4 shows the same buffer after the calls in the next code fragment add some data.

 example.put( (byte)0xca );
example.putShort( (short)0xfeba );
example.put( (byte)0xbe );
Figure 4. ByteBuffer after a few puts

Another of the buffer's important attributes is its limit. The limit is the first element that should not be read or written. Attempting to put() past the limit causes a BufferOverflowException. Similarly, attempting to get() past the limit causes a BufferUnderflowException. For a new buffer, the limit equals the capacity. There is a trick to using buffers. Between filling the buffer with data and writing it on a Channel, the buffer must flip. Flipping a buffer primes it for a new sequence of operations. If you've been putting data into a buffer, flipping it ensures that it's ready to read the data. More precisely, flipping the buffer sets its limit to the current position and then resets its position to zero. Its capacity does not change. The following code flips the buffer. Figure 5 depicts the effect of flipping the sample buffer.

Figure 5. The flipped ByteBuffer

After the flip, the buffer can be read. In this example, get() returns four bytes before it throws a BufferUnderflowException.

An aside about byte ordering

Any data type larger than a byte must be stored in multiple bytes. A short (16 bits) requires two bytes, while an int (32 bits) requires four bytes. For a variety of historical reasons, different CPU architectures pack these bytes differently. On big-endian architectures, the most significant byte goes in the lowest address, as shown in Figure 6. Big-endian order is often referred to as network order.

Figure 6. Big-endian byte ordering


architectures put the least significant byte first, as in Figure 7.

Figure 7. Little-endian byte ordering

Anyone who programs networks in C or C++ can rant at length about byte-ordering problems. Host byte order, network byte order, big endian, little endian ... they're a pain. If you put a short into a byte array in big-endian ordering and remove it in little-endian ordering, you receive a different number than you put in! (See Figure 8.)

Figure 8. The result of mismatched byte ordering

You might have noticed that the call to example.putShort() illustrated in Figure 4 resulted in 0xFE at Position 1 and 0xBA at Position 2. In other words, the most significant byte went into the lowest numbered slot. Therefore, Figure 4 offers an example of big-endian byte ordering. java.nio.ByteBuffer defaults to big-endian byte ordering on all machines, no matter what the underlying CPU might use. (In fact, Intel microprocessors are little endian.) ByteBuffer uses instances of java.nio.ByteOrder to determine its byte ordering. The static constants ByteOrder.BIG_ENDIAN and ByteOrder.LITTLE_ENDIAN do exactly what you would expect.

Essentially, if you talk to another Java program, leave the byte ordering alone and it will work. If you talk to a well-behaved socket application in any language, you should also leave the byte ordering alone. You fiddle with byte ordering in only two instances: when you talk to a poorly-behaved network application that does not respect network byte ordering, or when you deal with binary data files created on a little-endian machine.

How do buffers help?

So how can buffers improve performance and cut down on garbage? You could create a pool of direct Buffers to avoid allocations during request processing. Or you could create Buffers for common situations and keep them around. The following fragment from our sample HTTP server illustrates the latter approach:

1 2 3 Page 1
Page 1 of 3