Use the two "R"s of Java 1.1 -- Readers and Writers

Learn how to use the two new additions to the java.io package -- class Reader and class Writer -- to filter out unwanted e-mail

If you're joining us for the first time, you might want to begin by reading last month's column, which introduced the concept of streams. The stream model presents information as flowing from one point to another, as if it were in a stream or pipe. The model fits many types of real-world information. Whether it is keycodes coming from a computer keyboard, audio data coming from an audio file, or line after line of text coming from a text file, all appear to be streams of information. It is a simple yet powerful model, and forms the basis for the Java I/O classes.

Java 1.1's two types of streams

The Java 1.1 class library provides two different types of streams -- byte-oriented and character-oriented. I'll explain why in a moment, but first I want to give you some more background.

The two types of streams are organized into two separate class hierarchies, one consisting entirely of the byte-oriented stream classes, and the other consisting entirely of character-oriented stream classes. The classes within the two hierarchies are named consistently, except for their suffix. The byte-oriented stream classes end in either InputStream or OutputStream, while the character-oriented stream classes end in either Reader or Writer. The two hierarchies are functionally almost identical, and they contain most of the same subclass specializations (for example, one contains LineNumberInputStream and the other contains LineNumberReader).

Prior to version 1.1, the Java class library provided only byte-oriented streams. This setup reflected the reality imposed by the operating systems on which Java was developed -- they were hopelessly byte-oriented, and Java was initially tailored with that environment in mind. The byte-oriented stream classes, however, were flawed in one area -- support for Java's highly regarded Unicode character encoding. As you will soon learn, these classes provided little or no support for converting between bytes and characters.

Unicode is an international multi-language character encoding standard. Most operating systems we are familiar with use ASCII (American Standard Code for Information Interchange) encoding. ASCII (with a 7-bit, 128-character set) doesn't do a very good job supporting characters from languages similar to ours (Danish, for example), much less the character sets for languages such as Japanese or Thai. By basing its character encoding on Unicode (with a 16-bit, 65,536-character set) instead of ASCII, Java supposedly solved that problem -- or did it?

Things aren't always what they seem -- PrintStream

Let's take a look at the Java 1.0 implementation of PrintStream, a byte-oriented subclass of OutputStream and the class from which the objects System.out and System.err are constructed.

The PrintStream class provides a bevy of methods for printing Java primitive data types and objects to an output stream. These are the two most obvious:

    public void println(char [] rgc);
    public void println(String str);

Here are the implementations of the corresponding print methods (each println method does the same work and then adds a newline):

    /**
     * Prints an array of characters.
     * @param s the array of chars to be printed
     */
    synchronized public void print(char s[]) {
        for (int i = 0 ; i < s.length ; i++) {
            write(s[i]);
        }
    }

    /**
     * Prints a String.
     * @param s the String to be printed
     */
    synchronized public void print(String s) {
        if (s == null) {
            s = "null";
        }
        int len = s.length();
        for (int i = 0 ; i < len ; i++) {
            write(s.charAt(i));
        }
    }

Both methods call write, which, according to the specification, writes a byte (the low eight bits of its integer argument). Specifically, the documentation states that the println(char [] rgc) method prints "the low eight bits of each of the characters in the character array, followed by a newline character, to this print stream's underlying output stream."

Likewise, for the method println(String str), the documentation notes that "if the string argument is null, the string 'null' followed by a newline character is printed to this print stream's underlying output stream. Otherwise, the low eight bits of each of the characters in the string, followed by a newline character, is printed to the underlying output stream."

Wack! These two methods chop each 16-bit Unicode character down to its low eight bits, throwing the high byte away! Since byte-oriented streams were all the class library supplied, proper support for Unicode was cut off well below the knees. An unfortunate decision, I suppose, but one that probably made sense given (as I mentioned earlier) the byte-oriented nature of the operating systems on which Java was developed.
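To see exactly what gets lost, here is a small sketch in modern Java (7 or later, for StandardCharsets). It doesn't call the 1.0 PrintStream. Instead, a hypothetical helper, printLikeJava10, mimics the print(String) loop above by passing each char to OutputStream.write(int), which keeps only the low eight bits of its argument. Characters below U+0100, such as é, happen to survive. The Cyrillic letter Ж (U+0416) comes out as the control byte 0x16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class TruncationDemo {
    // Hypothetical helper mimicking the Java 1.0 PrintStream.print(String)
    // loop: each char goes to OutputStream.write(int), which writes only
    // the low eight bits of its argument.
    static byte[] printLikeJava10(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            out.write(s.charAt(i)); // the high byte of the char is dropped
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "cafe" with e-acute (U+00E9), a space, then Cyrillic Zhe (U+0416)
        String original = "caf\u00e9 \u0416";
        byte[] bytes = printLikeJava10(original);

        // ISO-8859-1 maps each byte back to the char with the same value,
        // so this shows exactly which characters survived.
        String survivor = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(survivor)); // false
        System.out.printf("U+%04X became 0x%02X%n",
                (int) original.charAt(5), bytes[5] & 0xFF); // U+0416 became 0x16
    }
}
```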

So things had to change

The release of Java 1.1 changed everything. In order to attract users in other countries to Java, Java needed to speak their language (or, more precisely, read it and write it). It made little sense to provide Unicode support internally if there was no way to get non-ASCII characters in or out.

So the Java developers at Sun created a new set of character-oriented stream classes called Readers and Writers.
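Here is what that buys you, sketched in modern Java (7 or later). The sketch relies on two bridge classes that this column doesn't otherwise cover: OutputStreamWriter and InputStreamReader. They convert between characters and bytes using a named encoding. UTF-8 is an arbitrary choice, and the class name WriterRoundTrip is hypothetical. The same string that PrintStream mangled makes the round trip intact:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterRoundTrip {
    // Encode the string to bytes through a Writer, then decode it back
    // through a Reader, using the same character encoding both ways.
    static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(s); // close() flushes any characters still in the encoder
        }
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9 \u0416";
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```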

Common character-oriented input stream methods

Let's take a look at the methods common to all character-oriented input streams (or readers). Following each method declaration, I'll list the tasks the method performs.

public int read() throws IOException

  • Reads a single character from the input stream and returns it. The character is returned as an integer between 0 and 65535 (0x0000 to 0xFFFF).
  • Returns -1 if the end of the input stream has been reached.
  • Blocks (or waits) until data is available, if necessary.
  • Throws IOException if an error occurs during the read operation.
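Every char value is a legitimate character, so read() returns an int: this lets -1 signal the end of the stream without colliding with real data. A minimal sketch of the usual loop, using a StringReader as the source and a hypothetical countChars helper:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadOneAtATime {
    // Count the characters in a reader by calling read() until it returns -1.
    static int countChars(Reader r) throws IOException {
        int count = 0;
        int c; // an int, not a char, so -1 can be told apart from real data
        while ((c = r.read()) != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new StringReader("caf\u00e9 \u0416")) {
            System.out.println(countChars(r)); // 6
        }
    }
}
```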

public int read(char [] rgc) throws IOException

  • Reads a sequence of characters from the input stream and places them in the specified array.
  • Returns the number of characters read, which may be fewer than the length of the array. Returns -1 if the end of the input stream has been reached.
  • Blocks (or waits) until data is available, if necessary.
  • Throws IOException if an error occurs during the read operation.
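A single call to read(char []) may fill only part of the array, even before the end of the stream. The return value, not the array length, tells you how many slots hold valid data. This sketch (BufferedCopy and readAll are hypothetical names) copies a reader through a deliberately small buffer:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BufferedCopy {
    // Read through a fixed-size buffer (bufferSize must be positive).
    // Only the first n characters of the buffer are meaningful after each call.
    static String readAll(Reader r, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A 4-character buffer forces several calls for an 11-character input.
        System.out.println(readAll(new StringReader("hello world"), 4)); // hello world
    }
}
```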

public int read(char [] rgc, int nOff, int nLen) throws IOException

  • Reads up to the specified number of characters from the input stream and places them in the specified array, beginning at the specified offset.
  • Returns the number of characters read, which may be fewer than requested. Returns -1 if the end of the input stream has been reached.
  • Blocks (or waits) until data is available, if necessary.
  • Throws IOException if an error occurs during the read operation.
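The offset and length arguments let you keep filling the same array across several calls. That is how you collect exactly nLen characters when one call returns fewer. A sketch with a hypothetical readFully helper (Reader itself has no such method):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadFully {
    // Fill rgc[nOff .. nOff + nLen) by calling read(char[], int, int)
    // repeatedly. Returns the number of characters actually stored, which is
    // less than nLen only if the stream ended first.
    static int readFully(Reader r, char[] rgc, int nOff, int nLen) throws IOException {
        int total = 0;
        while (total < nLen) {
            int n = r.read(rgc, nOff + total, nLen - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        char[] rgc = new char[8];
        int n = readFully(new StringReader("abc"), rgc, 2, 5);
        // Only 3 characters were available; they land at indexes 2, 3, and 4.
        System.out.println(n + " " + new String(rgc, 2, n)); // 3 abc
    }
}
```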

public long skip(long n) throws IOException

  • Skips over the specified number of characters.
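  • Returns the number of characters actually skipped, which may be fewer than requested (for example, 0 if the end of the input stream has already been reached).
  • Blocks (or waits) until some characters are available, if necessary.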
  • Throws IOException if an error occurs during the skip operation.
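Because skip() can stop short, check its return value instead of assuming it skipped everything you asked for. A small sketch using a StringReader. The input text and counts are arbitrary, and SkipDemo is a hypothetical name:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class SkipDemo {
    // Returns the result of each step as one string so it's easy to inspect.
    static String steps(String text) throws IOException {
        try (Reader r = new StringReader(text)) {
            long first = r.skip(9);          // skip "Subject: "
            char next = (char) r.read();     // first character after the skip
            long rest = r.skip(100);         // asks for more than remains
            long atEnd = r.skip(10);         // nothing left to skip
            return first + " " + next + " " + rest + " " + atEnd;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(steps("Subject: hello")); // 9 h 4 0
    }
}
```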

public boolean ready() throws IOException

  • Indicates whether the next call to read() is guaranteed not to block for input.
  • Returns true if so. A false result doesn't necessarily mean that the next read() will block.
  • Throws IOException if an error occurs during the operation.

public void close() throws IOException

  • Closes the input stream and releases any resources (operating system file handles, for example) associated with the input stream.
  • Throws IOException if an error occurs during the operation.

public void mark(int nReadLimit) throws IOException

  • Marks the current position in the input stream. Subsequent calls to reset() will reposition the input stream to this position.
  • Specifies the number of characters that may be read past the mark before the mark is invalidated.
  • Not all input streams support the mark operation; those that don't throw IOException.

public void reset() throws IOException

  • Repositions the input stream to the last marked position.
  • Throws IOException if the stream has not been marked, if the mark has been invalidated, or if the stream doesn't support reset().

public boolean markSupported()

  • Indicates whether or not this input stream supports the mark and reset operations.
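Together, mark() and reset() give a reader lookahead: you can read ahead, then rewind as if you never had. The sketch below assumes a BufferedReader (which supports marking) wrapped around a StringReader. The peek helper and PeekDemo class are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class PeekDemo {
    // Return up to n upcoming characters without consuming them.
    // Requires a reader whose markSupported() returns true.
    static String peek(Reader r, int n) throws IOException {
        if (!r.markSupported()) {
            throw new IOException("mark/reset not supported");
        }
        r.mark(n);                      // allow up to n characters before reset
        char[] rgc = new char[n];
        int got = 0;
        while (got < n) {
            int k = r.read(rgc, got, n - got);
            if (k == -1) break;         // stream ended early
            got += k;
        }
        r.reset();                      // rewind to the marked position
        return new String(rgc, 0, got);
    }

    public static void main(String[] args) throws IOException {
        Reader r = new BufferedReader(new StringReader("FREE MONEY!!!"));
        System.out.println(peek(r, 4));        // FREE
        System.out.println((char) r.read());   // F (nothing was consumed)
    }
}
```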

Common character-oriented output stream methods

Let's take a look at the methods common to all character-oriented output streams (or writers). As in the previous section, I'll list the tasks each method performs after its declaration.

public void write(int c) throws IOException

  • Writes a single character (the low 16 bits of the integer argument) to the output stream.
  • Blocks (or waits) until the data has been accepted, if necessary.
  • Throws IOException if an error occurs during the write operation.
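Contrast this with OutputStream.write(int), which keeps only the low eight bits of its argument. Writer.write(int) keeps the low 16 bits, which is exactly one Java char, and ignores the rest. A sketch using StringWriter, an in-memory Writer. WriteIntDemo is a hypothetical name, and 0x10041 is an arbitrary value chosen to show the upper bits being discarded:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class WriteIntDemo {
    static String writeChars(int... codes) throws IOException {
        Writer w = new StringWriter();
        for (int c : codes) {
            w.write(c); // only the low 16 bits of c are used
        }
        return w.toString();
    }

    public static void main(String[] args) throws IOException {
        String s = writeChars(0x0416, 0x10041);
        // 0x0416 is written whole; 0x10041 loses its upper bits and becomes 'A' (0x0041).
        System.out.println(s.length());               // 2
        System.out.println(s.charAt(1));              // A
        System.out.println(s.charAt(0) == '\u0416');  // true
    }
}
```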

public void write(char [] rgc) throws IOException

  • Writes a sequence of characters to the output stream.
  • Blocks (or waits) until the data has been accepted, if necessary.
  • Throws IOException if an error occurs during the write operation.

public void write(char [] rgc, int nOff, int nLen) throws IOException

  • Writes a sequence of characters of the specified length to the output stream, beginning at the specified offset.
  • Blocks (or waits) until the data has been accepted, if necessary.
  • Throws IOException if an error occurs during the write operation.

public void write(String str) throws IOException

  • Writes a string to the output stream.
  • Blocks (or waits) until the data has been accepted, if necessary.
  • Throws IOException if an error occurs during the write operation.

public void write(String str, int nOff, int nLen) throws IOException

  • Writes a portion of a string of the specified length to the output stream, beginning at the specified offset.
  • Blocks (or waits) until the data has been accepted, if necessary.
  • Throws IOException if an error occurs during the write operation.
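Here are the write() overloads side by side. They write into a StringWriter so the result is easy to inspect. The strings are arbitrary, and WriteOverloads is a hypothetical name:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class WriteOverloads {
    static String demo() throws IOException {
        Writer w = new StringWriter();
        char[] rgc = {'J', 'u', 'n', 'k', '!'};
        w.write(rgc);                  // whole array: "Junk!"
        w.write(' ');                  // single character
        w.write(rgc, 0, 4);            // first four chars of the array: "Junk"
        w.write(" mail");              // whole string
        w.write("xx filter xx", 2, 7); // 7 chars starting at offset 2: " filter"
        return w.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // Junk! Junk mail filter
    }
}
```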

public void flush() throws IOException

  • Flushes the output stream, immediately writing any buffered data.
  • Throws IOException if an error occurs during the operation.
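Buffering is where flush() matters. In this sketch, a BufferedWriter sits in front of a StringWriter, and the text doesn't reach the StringWriter until flush() is called. FlushDemo is a hypothetical name:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class FlushDemo {
    // Returns what the target holds just before and just after flush().
    static String[] beforeAndAfterFlush(String text) throws IOException {
        StringWriter target = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(target)) {
            out.write(text);
            String before = target.toString(); // text is still in the buffer
            out.flush();                       // push buffered characters through
            String after = target.toString();
            return new String[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        String[] s = beforeAndAfterFlush("hello");
        System.out.println("before flush: [" + s[0] + "]"); // before flush: []
        System.out.println("after flush: [" + s[1] + "]");  // after flush: [hello]
    }
}
```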

public void close() throws IOException

  • Closes the output stream and releases any resources (operating system file handles, for example) associated with the output stream.
  • Throws IOException if an error occurs during the operation.
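Closing a buffered writer flushes it first, and a closed writer refuses further output. The sketch below again assumes a BufferedWriter over a StringWriter. The try-with-resources statement (Java 7 and later) calls close() automatically. CloseDemo and its methods are hypothetical names:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class CloseDemo {
    static String writeAndClose(String text) throws IOException {
        StringWriter target = new StringWriter();
        try (Writer out = new BufferedWriter(target)) {
            out.write(text);
        } // close() runs here, flushing the buffer before releasing it
        return target.toString();
    }

    static boolean writeAfterCloseFails() {
        Writer out = new BufferedWriter(new StringWriter());
        try {
            out.close();
            out.write("too late");
            return false;
        } catch (IOException e) {
            return true; // a closed BufferedWriter throws IOException on write
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndClose("bye"));    // bye
        System.out.println(writeAfterCloseFails()); // true
    }
}
```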

New and improved filtering

I've reworked last month's mail filtering program to use the Reader and Writer classes in place of the InputStream and OutputStream classes. Consequently, this code requires version 1.1 (or later) of the Java language.

Let's take a brief look at how the program works.

The centerpiece of the program is the class PatternFilter, which is derived from class FilterReader. Here is its read method:

    public int read() throws IOException {
        // Look ahead in the input stream for a match --
        // method test() should throw an exception if
        // one is found...
        test();
        return in.read();
    }

The read method first tests the input stream for a pattern match and then, if test doesn't throw an exception, returns a single character. The test method must be implemented in a subclass, and StringPatternFilter is just such a subclass. Here is its test method:

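    protected void test() throws IOException {
        // Mark the current position so we can rewind after peeking.
        in.mark(_l);
        // Read as many characters as the pattern is long. A single read()
        // call may return fewer, so keep reading until the buffer is full
        // or the stream ends.
        char [] rgc = new char [_l];
        int n = 0;
        while (n < _l) {
            int r = in.read(rgc, n, _l - n);
            if (r == -1) break;
            n += r;
        }
        // Rewind so the characters are still there for read().
        in.reset();
        String str = new String(rgc, 0, n);
        if (str.equals(_str)) {
            throw new JunkMailException(_str);
        }
    }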

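The test method marks the current position in the stream, reads as many characters as the pattern is long, and then resets the stream to the marked position so that read() can still return those characters. If the characters match the pattern string exactly, test throws an exception; otherwise, it quietly returns. Because it relies on mark and reset, the underlying reader must support marking: a BufferedReader does, but an InputStreamReader on its own does not.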

The class Main builds a list of these filters and connects them to the system input stream System.in.

To accommodate as many of you as possible, I've bundled the code as both a gzipped tar file and a zip file.

Like last month's code, this code doesn't run as an applet, so you'll need access to the Java Development Kit or a similar command-line environment. Once you get the code, unpack it in a convenient location.

Next, from the command line, execute the Java runtime as follows:

 % java Main [keyword] [keyword] ... < [email file]

You may specify any number of keywords on the command line. The program builds a filter for each keyword and chains the filters together. It reads the e-mail from standard input, passes the data through the chain of filters, and writes the result to standard output. If any filter detects its keyword, it raises an exception, which stops the program.
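The downloadable code isn't reproduced here. The following self-contained sketch shows one way the pieces might fit together, and the author's actual classes may differ. Several details are my assumptions: JunkMailException extends IOException, the constructor signatures, the BufferedReader at the bottom of the chain (for mark support), and FilterChainDemo as a hypothetical stand-in for Main. The fields _str and _l reuse the names from the excerpt above. The sketch reads from a StringReader so it runs without input.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Assumption: JunkMailException extends IOException so test() may throw it.
class JunkMailException extends IOException {
    JunkMailException(String pattern) {
        super("junk mail pattern found: " + pattern);
    }
}

// Runs test() before handing out each character.
abstract class PatternFilter extends FilterReader {
    PatternFilter(Reader in) {
        super(in);
    }

    // Subclasses look ahead and throw JunkMailException on a match.
    protected abstract void test() throws IOException;

    @Override
    public int read() throws IOException {
        test();
        return in.read();
    }
}

// Matches one fixed string; _str and _l mirror the names in the excerpt.
class StringPatternFilter extends PatternFilter {
    private final String _str;
    private final int _l;

    StringPatternFilter(Reader in, String str) {
        super(in);
        _str = str;
        _l = str.length();
    }

    @Override
    protected void test() throws IOException {
        in.mark(_l);                    // remember where we are
        char[] rgc = new char[_l];
        int n = 0;
        while (n < _l) {                // one read may return fewer chars
            int r = in.read(rgc, n, _l - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        in.reset();                     // rewind so read() sees the same chars
        if (n == _l && new String(rgc).equals(_str)) {
            throw new JunkMailException(_str);
        }
    }
}

// Hypothetical stand-in for the column's Main class.
class FilterChainDemo {
    // BufferedReader first (it supports mark/reset), then one filter per keyword.
    static Reader chain(Reader source, String... keywords) {
        Reader r = new BufferedReader(source);
        for (String k : keywords) {
            r = new StringPatternFilter(r, k);
        }
        return r;
    }

    // Pull everything through the chain one character at a time.
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(chain(new StringReader("Hi Mom"), "FREE", "$$$"))); // Hi Mom
        try {
            drain(chain(new StringReader("Act now! FREE offer"), "FREE", "$$$"));
        } catch (JunkMailException e) {
            System.out.println(e.getMessage()); // junk mail pattern found: FREE
        }
    }
}
```

To filter real mail the way the column's program does, you could pass new InputStreamReader(System.in) as the source and write the drained characters to standard output. One design caveat: this sketch tests only the characters fetched through read(). A caller that uses the bulk read(char [], int, int) inherited from FilterReader would bypass test().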

Conclusion

Java's stream-based I/O classes are both elegant and powerful, and with the addition of the character-oriented stream classes in Java 1.1, Java is ready to take on the world. Which leads me directly to the topic of next month's column: internationalization -- a big word with big implications for the future of Java.

See you next month.

Todd Sundsted has been writing programs since computers became available in convenient desktop models. Though originally interested in building distributed applications in C++, Todd moved to the Java programming language when it became the obvious choice for that sort of thing. Todd is co-author of the Java Language API SuperBible. In addition to writing, Todd is president of Etcee, which offers Java-centric training, mentoring, and consulting.