Master Merlin's new I/O classes

Squeeze maximum performance out of nonblocking I/O and memory-mapped buffers with java.nio

 class ReadWriteThread extends Thread {
  ...
  private WeakHashMap fileCache = new WeakHashMap();
  private ByteBuffer[] responseBuffers = new ByteBuffer[2];
  ...
  public ReadWriteThread(Selector readSelector, 
                         ConnectionList acceptedConnections, 
                         File dir) 
      throws Exception 
  {
    super("Reader-Writer");
    ...
    responseBuffers[0] = initializeResponseHeader();
    ...
  }
  ...
  protected ByteBuffer initializeResponseHeader() throws Exception {
    // Pre-load a "good" HTTP response as characters.
    CharBuffer chars = CharBuffer.allocate(88);
    chars.put("HTTP/1.1 200 OK\n");
    chars.put("Connection: close\n");
    chars.put("Server: Java New I/O Example\n");
    chars.put("Content-Type: text/html\n");
    chars.put("\n");
    chars.flip();
    // Translate the Unicode characters into ASCII bytes.
    ByteBuffer buffer = ascii.newEncoder().encode(chars);
    ByteBuffer directBuffer = ByteBuffer.allocateDirect(buffer.limit());
    directBuffer.put(buffer);
    return directBuffer;
  }
  ...
}

The above code is an excerpt from the thread that reads requests and sends responses. In the constructor, we set up two ByteBuffers for the responses. The first buffer always contains the HTTP response header. This particular server always sends the same headers and the same response code. To send error responses, the method sendError() (not shown above) creates a similar buffer with an HTTP error response for a particular status code. It saves the error response headers in a WeakHashMap, keyed by the HTTP status code.

The initializeResponseHeader() method actually uses three buffers. It fills a CharBuffer with Strings. The character set encoder turns the Unicode strings into bytes. I will cover character conversion later. Since this header is sent at the beginning of every response from the server, it saves time to create the response once, save it in a buffer, and just send the buffer every time. Notice the call to flip the CharBuffer after we put our data into it: flip() sets the limit to the current position and resets the position to zero, so the encoder reads exactly the characters just written.

The third buffer used in initializeResponseHeader() seems a bit odd. Why convert the characters into a ByteBuffer just to then copy them into another ByteBuffer? The answer: because CharsetEncoder creates a nondirect ByteBuffer. When you write a direct buffer to a channel, it immediately passes to native calls. However, when you pass a nondirect buffer to a channel, the channel provider creates a new, direct buffer and copies the nondirect buffer's contents. That means extra garbage and a data copy. The cost adds up when the buffer with the response header is sent in every HTTP response. Why let the channel provider create a direct buffer on every request if we can do it once and get it over with?
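The encode-then-copy step can be tried in isolation. The following is a minimal sketch, not the article's server code: the class name DirectHeaderSketch and the helper toDirectAscii() are hypothetical, and the sketch calls flip() on the direct buffer so it is ready to write, whereas the excerpt above relies on a later rewind().

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical illustration class; not part of the article's sample server.
class DirectHeaderSketch {

    /** Encodes text as US-ASCII, then copies the result into a direct buffer once. */
    static ByteBuffer toDirectAscii(String text) throws CharacterCodingException {
        // The encoder allocates and returns its own ByteBuffer.
        ByteBuffer encoded =
            Charset.forName("US-ASCII").newEncoder().encode(CharBuffer.wrap(text));

        // One-time copy into a direct buffer sized to the encoded bytes.
        ByteBuffer direct = ByteBuffer.allocateDirect(encoded.remaining());
        direct.put(encoded);
        direct.flip(); // position = 0, limit = number of bytes copied
        return direct;
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer header = toDirectAscii("HTTP/1.1 200 OK\n\n");
        System.out.println("direct=" + header.isDirect()
            + " bytes=" + header.remaining());
    }
}
```

Because the returned buffer is reused for every response, the caller has to reset its position (for example with rewind()) before each write, as sendFile() does later in this article.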

Character encoding

When putting data into ByteBuffers, two related problems crop up: byte ordering and character conversion. ByteBuffer handles byte ordering internally using the ByteOrder class. It does not deal with character conversion, however. In fact, ByteBuffer doesn't even have methods for reading or writing strings. Character conversion is a complicated topic, subject to many international standards, including the Internet Engineering Task Force's requests for comments, the Unicode Standard, and the Internet Assigned Numbers Authority (IANA). However, almost every time you deal with character conversion, you must convert Unicode strings to either ASCII or UTF-8. Fortunately, these are easy cases to handle. ASCII and UTF-8 are examples of character sets.

A character set defines a mapping from Unicode to bytes and back again. Character sets are named according to IANA standards. In Java, a character set is represented by an instance of java.nio.charset.Charset. As with most internationalization classes, you do not construct Charsets directly. Instead, you use the static factory method Charset.forName() to acquire an appropriate instance. Charset.availableCharsets() gives you a map of supported character set names and their Charset instances. The J2SE 1.4 beta includes eight character sets: US-ASCII, ISO-8859-1, ISO-8859-15, UTF-8, UTF-16, UTF-16BE (big endian), UTF-16LE (little endian), and Windows-1252.
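To see the factory method and the charset map in action, here is a small sketch. The class CharsetLookupSketch and its encode() helper are hypothetical names, and the code uses generics and the Charset.encode(String) convenience method, so it assumes a more recent Java release than the 1.4 beta discussed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.SortedMap;

// Hypothetical illustration class; not part of the article's sample server.
class CharsetLookupSketch {

    /** Looks up a charset by its IANA name and returns the encoded bytes of text. */
    static byte[] encode(String charsetName, String text) {
        Charset charset = Charset.forName(charsetName); // factory method, no constructor
        ByteBuffer bytes = charset.encode(text);         // convenience wrapper around an encoder
        byte[] out = new byte[bytes.remaining()];
        bytes.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Map of canonical charset names to Charset instances.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " charsets available; UTF-8 present: "
            + all.containsKey("UTF-8"));

        // The same two characters take a different number of bytes per charset.
        System.out.println("US-ASCII: " + encode("US-ASCII", "Hi").length + " bytes");
        System.out.println("UTF-16BE: " + encode("UTF-16BE", "Hi").length + " bytes");
    }
}
```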

Charset constructs CharsetEncoders and CharsetDecoders to convert character sequences into bytes and back again. Take another look at ReadWriteThread below. The encoder shows up twice for converting an entire CharBuffer into a ByteBuffer. readRequest, on the other hand, uses the decoder on the incoming request.

 class ReadWriteThread extends Thread {
  ...
  private Charset ascii;
  ...
  public ReadWriteThread(Selector readSelector, 
                         ConnectionList acceptedConnections, 
                         File dir) 
      throws Exception 
  {
    super("Reader-Writer");
    ...
    ascii = Charset.forName("US-ASCII");
    responseBuffers[0] = initializeResponseHeader();
    ...
  }
  ...
  protected ByteBuffer initializeResponseHeader() throws Exception {
    ...
    // Translate the Unicode characters into ASCII bytes.
    ByteBuffer buffer = ascii.newEncoder().encode(chars);
    ...
  }
  ...
  protected String readRequest(SelectionKey key) throws Exception {
    SocketChannel incomingChannel = (SocketChannel)key.channel();
    Socket incomingSocket = incomingChannel.socket();
    ...
    int bytesRead = incomingChannel.read(readBuffer);
    readBuffer.flip();
    String result = asciiDecoder.decode(readBuffer).toString();
    readBuffer.clear();
    StringBuffer requestString = (StringBuffer)key.attachment();
    requestString.append(result);
    ...
  }
  ...
  protected void sendError(SocketChannel channel, 
                           RequestException error) throws Exception {
      ...
      // Translate the Unicode characters into ASCII bytes.
      buffer = ascii.newEncoder().encode(chars);
      errorBufferCache.put(error, buffer);
      ...
  }
}
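The encoder and decoder calls in the excerpt can be exercised without a socket. The sketch below is an assumption-laden stand-in: the class AsciiRoundTripSketch and the method roundTrip() are hypothetical, and a plain ByteBuffer copy plays the role of the channel read that fills readBuffer in readRequest().

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration class; not part of the article's sample server.
class AsciiRoundTripSketch {
    private static final Charset ASCII = Charset.forName("US-ASCII");

    /** Encodes text to ASCII bytes, then decodes those bytes back into a String. */
    static String roundTrip(String text) throws CharacterCodingException {
        CharsetEncoder encoder = ASCII.newEncoder();
        CharsetDecoder decoder = ASCII.newDecoder();

        // Characters to bytes; a fresh encoder reports unmappable characters
        // by throwing a CharacterCodingException.
        ByteBuffer wire = encoder.encode(CharBuffer.wrap(text));

        // Stand-in for channel.read(readBuffer): copy the bytes into a read buffer.
        ByteBuffer readBuffer = ByteBuffer.allocate(wire.remaining());
        readBuffer.put(wire);

        readBuffer.flip();                                       // switch from filling to draining
        String result = decoder.decode(readBuffer).toString();   // bytes back to characters
        readBuffer.clear();                                      // ready for the next read
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(roundTrip("GET /index.html HTTP/1.0"));
    }
}
```

A string containing a non-ASCII character, such as an accented letter, makes encode() throw instead of silently substituting a replacement byte.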

Channel the new way

You might notice that none of the existing java.io classes know how to read or write Buffers. In Merlin, Channels read data into Buffers and send data from Buffers. Channels join Streams and Readers as a key I/O construct. A channel might be thought of as a connection to some device, program, or network. At the top level, the java.nio.channels.Channel interface just knows whether it is open or closed. A nifty feature of Channel is that one thread can be blocked on an operation, and another thread can close the channel. When the channel closes, the blocked thread awakens with an exception indicating that the channel closed. There are several Channel classes, as shown in Figure 9.

Figure 9. Channel interface hierarchy
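The close-from-another-thread behavior can be observed with a small experiment. This is a sketch under stated assumptions: the class AsyncCloseSketch is hypothetical, a java.nio.channels.Pipe stands in for a network connection, and the code uses lambdas, so it needs a later Java release than 1.4. The blocked reader normally sees an AsynchronousCloseException; the sketch catches its superclass ClosedChannelException in case the close happens before the read starts.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.Pipe;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; not part of the article's sample server.
class AsyncCloseSketch {

    /** Blocks a thread on a read, closes the channel from another thread, reports the outcome. */
    static String closeWhileBlocked() throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();
        AtomicReference<String> outcome = new AtomicReference<>("still blocked");

        Thread reader = new Thread(() -> {
            try {
                // Nothing is ever written to the pipe, so this read blocks.
                int n = pipe.source().read(ByteBuffer.allocate(16));
                outcome.set("read returned " + n);
            } catch (ClosedChannelException e) {
                // AsynchronousCloseException is a subclass of ClosedChannelException.
                outcome.set(e.getClass().getSimpleName());
            } catch (IOException e) {
                outcome.set("unexpected: " + e);
            }
        });
        reader.setDaemon(true);
        reader.start();

        Thread.sleep(200);        // give the reader time to block
        pipe.source().close();    // close from a different thread
        reader.join(5000);
        pipe.sink().close();
        return outcome.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Blocked reader ended with: " + closeWhileBlocked());
    }
}
```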

Additional interfaces depicted in Figure 9 add methods for reading (java.nio.channels.ReadableByteChannel), writing (java.nio.channels.WritableByteChannel), and scatter/gather operations. A gathering write can write data from several buffers to the channel in one contiguous operation. Conversely, a scattering read can read data from the channel and deposit it into several buffers, filling each one in turn to its limit. Scatter/gather operations have been used for years in high-performance I/O managers in Unix and Windows NT. SCSI controllers also employ scatter/gather to improve overall performance. In Java, the channels quickly pass scatter/gather operations down to the native operating system functions for vectored I/O. Scatter/gather operations also ease protocol or file handling, particularly when you create fixed headers in some buffers and change only one or two variable data buffers.

You can configure channels for blocking or nonblocking operations. When blocking, calls to read, write, or other operations do not return until the operation completes. Large writes over a slow socket can take a long time. In nonblocking mode, a call to write a large buffer over a slow socket would just queue up the data (probably in an operating system buffer, though it could even queue it up in a buffer on the network card) and return immediately. The thread can move on to other tasks while the operating system's I/O manager finishes the job. Similarly, the operating system always buffers incoming data until the application asks for it. When blocking, if the application asks for more data than the operating system has received, the call blocks until more data comes in. In nonblocking mode, the application just gets whatever data is immediately available.

The sample code included with this article uses each of the following three channels at various times:

  • ServerSocketChannel
  • SocketChannel
  • FileChannel
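The gathering write and scattering read described above can be sketched against a temporary file, since FileChannel supports both. The class ScatterGatherSketch and the method writeThenRead() are hypothetical, and the code uses java.nio.file and try-with-resources, so it assumes a Java release well after 1.4.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of the article's sample server.
class ScatterGatherSketch {

    /** Writes header and body with one gathering write, then scatters them back into two buffers. */
    static String[] writeThenRead(String header, String body) throws IOException {
        Path tmp = Files.createTempFile("gather", ".bin");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer[] out = {
                ByteBuffer.wrap(header.getBytes(StandardCharsets.US_ASCII)),
                ByteBuffer.wrap(body.getBytes(StandardCharsets.US_ASCII))
            };
            long total = out[0].remaining() + out[1].remaining();

            // Gathering write: the buffers are drained in array order.
            long written = 0;
            while (written < total) {
                written += ch.write(out);
            }

            // Scattering read: each buffer is filled to its limit in turn.
            ch.position(0);
            ByteBuffer[] in = {
                ByteBuffer.allocate(out[0].capacity()),
                ByteBuffer.allocate(out[1].capacity())
            };
            long read = 0;
            while (read < total) {
                long n = ch.read(in);
                if (n < 0) break; // end of file
                read += n;
            }
            in[0].flip();
            in[1].flip();
            return new String[] {
                StandardCharsets.US_ASCII.decode(in[0]).toString(),
                StandardCharsets.US_ASCII.decode(in[1]).toString()
            };
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] parts = writeThenRead("HTTP/1.1 200 OK\n\n", "<html>hi</html>");
        System.out.println(parts[0].trim() + " | " + parts[1]);
    }
}
```

The fixed-size first buffer is what makes the scattering read useful for protocols: the header lands in one buffer and the variable payload in the other, with no manual slicing.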

ServerSocketChannel

java.nio.channels.ServerSocketChannel plays the same role as java.net.ServerSocket. It creates a listening socket that accepts incoming connections. It cannot read or write. ServerSocketChannel.socket() provides access to the underlying ServerSocket, so you can still set socket options that way. As is the case with all the specific channels, you do not construct ServerSocketChannel instances directly. Instead, use the ServerSocketChannel.open() factory method.

ServerSocketChannel.accept() returns a java.nio.channels.SocketChannel for a newly connected client. (Note: Before Beta 3, accept() returned a java.net.Socket. Now the method returns a SocketChannel, which is less confusing for developers.) If the ServerSocketChannel is in blocking mode, accept() won't return until a connection request arrives. (There is an exception: you can set a socket timeout on the ServerSocket. In that case, accept() eventually throws a java.net.SocketTimeoutException.) If the ServerSocketChannel is in nonblocking mode, accept() always returns immediately with either a SocketChannel or null. In the sample code, AcceptThread constructs a ServerSocketChannel called ssc and binds it to a local TCP port:

 class AcceptThread extends Thread {
  private ServerSocketChannel ssc;
  public AcceptThread(Selector connectSelector, 
                      ConnectionList list, 
                      int port) 
      throws Exception 
  {
    super("Acceptor");
    ...
    ssc = ServerSocketChannel.open();
    ssc.configureBlocking(false);
    InetSocketAddress address = new InetSocketAddress(port);
    ssc.socket().bind(address);
    ...
  }
  ...
}
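The nonblocking behavior of accept() is easy to check on its own. In this sketch the class NonBlockingAcceptSketch is hypothetical, and the channel binds to the loopback address on an ephemeral port (port 0) rather than the configured port the sample server uses.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration class; not part of the article's sample server.
class NonBlockingAcceptSketch {

    /** Returns true if a nonblocking accept() with no pending client yields null. */
    static boolean acceptReturnsNullWhenIdle() throws IOException {
        try (ServerSocketChannel ssc = ServerSocketChannel.open()) {
            ssc.configureBlocking(false);
            // Port 0 asks the operating system for any free port.
            ssc.socket().bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            SocketChannel client = ssc.accept(); // returns at once; no one has connected
            if (client != null) {
                client.close();
                return false;
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("accept() returned null: " + acceptReturnsNullWhenIdle());
    }
}
```

In a real server you would not poll accept() in a loop like this; you would register the channel with a Selector and call accept() only when the selector reports a pending connection.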

SocketChannel

java.nio.channels.SocketChannel is the real workhorse in this application. It encapsulates a java.net.Socket and adds a nonblocking mode and a state machine.

SocketChannels can be created in one of two ways. First, SocketChannel.open() creates a new, unconnected SocketChannel. Second, ServerSocketChannel.accept() returns a SocketChannel that is already open and connected. This code fragment, from AcceptThread, illustrates the second approach to acquiring a SocketChannel:

class AcceptThread extends Thread {
  private ConnectionList acceptedConnections;
  ...
  protected void acceptPendingConnections() throws Exception {
    ...
    for(Iterator i = readyKeys.iterator(); i.hasNext(); ) {
      ...
      ServerSocketChannel readyChannel = (ServerSocketChannel)key.channel();
      SocketChannel incomingChannel = readyChannel.accept();
      acceptedConnections.push(incomingChannel);
    }
  }
}

Like SelectableChannel's other subclasses, SocketChannel can be blocking or nonblocking. If it is blocking, then read and write operations on the SocketChannel behave exactly like blocking reads and writes on a Socket, with one vital exception: these blocking reads and writes can be interrupted if another thread closes the channel.
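Both ways of obtaining a SocketChannel can be seen side by side in a loopback exchange. This is a sketch with hypothetical names (SocketChannelPairSketch, sendOnce()); it uses blocking mode on both ends and an ephemeral port, which is simpler than the selector-driven sample server.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the article's sample server.
class SocketChannelPairSketch {

    /** Sends message from an opened client channel to an accepted server-side channel. */
    static String sendOnce(String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.US_ASCII);
        InetAddress loopback = InetAddress.getLoopbackAddress();

        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.socket().bind(new InetSocketAddress(loopback, 0));
            int port = server.socket().getLocalPort();

            // Way 1: open an unconnected channel, then connect it.
            try (SocketChannel client = SocketChannel.open()) {
                client.connect(new InetSocketAddress(loopback, port));

                // Way 2: accept() hands back an already connected channel.
                try (SocketChannel accepted = server.accept()) {
                    ByteBuffer out = ByteBuffer.wrap(payload);
                    while (out.hasRemaining()) {
                        client.write(out);
                    }

                    // A blocking read may return fewer bytes than requested, so loop.
                    ByteBuffer in = ByteBuffer.allocate(payload.length);
                    while (in.hasRemaining() && accepted.read(in) >= 0) {
                        // keep reading until the buffer is full or the peer closes
                    }
                    in.flip();
                    return StandardCharsets.US_ASCII.decode(in).toString();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Server side received: " + sendOnce("GET / HTTP/1.0"));
    }
}
```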

FileChannel

Unlike SocketChannel and ServerSocketChannel, java.nio.channels.FileChannel does not derive from SelectableChannel. As you will see in the next section, that means that FileChannels cannot be used for nonblocking I/O. Nevertheless, FileChannel has a slew of sophisticated features that were previously reserved for C programmers. FileChannels allow locking of file portions and direct file-to-file transfers that use the operating system's file cache.
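Region locking and channel-to-channel transfer are not used in the excerpts shown here, so the following is only a sketch of what those two features look like. The class FileTransferSketch and the method copy() are hypothetical, and the code relies on java.nio.file and try-with-resources from later Java releases. Whether a lock is advisory or mandatory depends on the operating system.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of the article's sample server.
class FileTransferSketch {

    /** Copies source to target with transferTo() while holding a lock on the target region. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                 StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();

            // Exclusive lock on the region about to be written (at least one byte long).
            try (FileLock lock = out.lock(0, Math.max(size, 1), false)) {
                long position = 0;
                // transferTo() may move fewer bytes than requested, so loop.
                while (position < size) {
                    position += in.transferTo(position, size - position, out);
                }
                return position;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".txt");
        Path target = Files.createTempFile("target", ".txt");
        Files.write(source, "copied without a user-space buffer".getBytes(StandardCharsets.US_ASCII));
        System.out.println(copy(source, target) + " bytes transferred");
        Files.deleteIfExists(source);
        Files.deleteIfExists(target);
    }
}
```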

FileChannel can also map file regions into memory. Memory mapping a file uses the native operating system's memory manager to make a file's contents look like memory locations. For more efficient mapping, the operating system uses its disk paging system. From the application's perspective, the file contents just exist in memory at some range of addresses. When it maps a file region into memory, FileChannel creates a MappedByteBuffer to represent that memory region. MappedByteBuffer is a type of direct byte buffer.

A MappedByteBuffer offers two big advantages. First, reading memory-mapped files is fast. The biggest gains go to sequential access, but random access also speeds up. The operating system can page the file into memory far better than java.io.BufferedInputStream can do its block reads. The second advantage is that using MappedByteBuffers to send files is simple, as shown in the next code fragment, also from ReadWriteThread:

  protected void sendFile(String uri, SocketChannel channel)
      throws RequestException, IOException {
    if(Server.verbose) 
      System.out.println("ReadWriteThread: Sending " + uri);
    Object obj = fileCache.get(uri);
    
    if(obj == null) {
      // Cache miss: map the file into memory and remember the buffer.
      Server.statistics.fileMiss();
      try {
        File f = new File(baseDirectory, uri);
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();
        
        int fileSize = (int)fc.size();
        responseBuffers[1] = fc.map(FileChannel.MapMode.READ_ONLY, 0, fileSize);
        fileCache.put(uri, responseBuffers[1]);
      } catch(FileNotFoundException fnfe) {
        throw RequestException.PAGE_NOT_FOUND;
      }
    } else {
      // Cache hit: reuse the mapped buffer, resetting its position to zero.
      Server.statistics.fileHit();
      responseBuffers[1] = (MappedByteBuffer)obj;
      responseBuffers[1].rewind();
    }
    // Gathering write: the header buffer first, then the file contents.
    responseBuffers[0].rewind();
    channel.write(responseBuffers);
  }
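Separately from the server code, the mapping step can be tried on a temporary file. In this sketch the class MappedReadSketch and its two helpers are hypothetical; it opens the channel with FileChannel.open() from a later Java release instead of going through FileInputStream, and it relies on the documented guarantee that a mapping stays valid after the channel that created it is closed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of the article's sample server.
class MappedReadSketch {

    /** Maps the whole file read-only; the mapping outlives the closed channel. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            return fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        }
    }

    /** Rewinds the buffer and decodes its full contents as ASCII text. */
    static String asAscii(MappedByteBuffer mapped) {
        mapped.rewind(); // reading advances the position, so reset it before each reuse
        return StandardCharsets.US_ASCII.decode(mapped).toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".html");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "<html>hello</html>".getBytes(StandardCharsets.US_ASCII));

        MappedByteBuffer mapped = mapReadOnly(tmp);
        System.out.println("direct=" + mapped.isDirect());
        System.out.println(asAscii(mapped)); // first use
        System.out.println(asAscii(mapped)); // reuse after rewind, like a cache hit
    }
}
```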

The sendFile() method sends a file as an HTTP response. The lines inside the try block create the MappedByteBuffer. The rest of the method caches the memory-mapped file buffers in a WeakHashMap. That way, repeated requests for the same file are blindingly fast, yet when memory tightens, the garbage collector eliminates the cached files. You could keep the buffers in a normal HashMap, but only if you know that the number of files is small (and fixed). Notice that the call to channel.write() actually passes an array of two ByteBuffers
