Reading textual data: Fun with streams

Find out how to extend and customize the character-stream classes to easily read textual data

Never let it be said that I'm happy to rush out a simple article (or book) just to meet a publisher's deadline. What started out as a basic, inefficient character-stream filter, designed to read a stream of digits and parse it into a number, has gradually ballooned into a small stream library sporting unnecessary features and go-faster stripes. Not only has the evolving character-stream library discarded many genetically-inferior siblings along the way, it's behind schedule! (This brings back memories of that 300-page manuscript I discarded in favor of starting anew. I think I'm beginning to see a destructive pattern emerging...)

I initially set out to design a data-reading character-stream filter class. Analogous to the byte-stream filter DataInputStream, this character-stream filter was intended to provide the capability to read textual data from a character stream (namely the output of a human or the PrintWriter println() method).

Let me now, in retrospect, describe what I actually implemented.

First, I created an UndoReader class. This character-stream filter supports three special methods:

  • checkpoint()
  • commit()
  • rollback()


As you read characters through the stream, you have the option to checkpoint the stream -- that is, to mark the current position and put the stream into a mode in which it stores all data subsequently read through it. After any amount of reading, committing the stream discards the stored data, after which reading proceeds undisturbed. Alternatively, rolling back the stream rewinds it to the position at which you asserted the checkpoint -- just as if you hadn't yet read anything. The stream also supports a couple of related methods, including an undo operation that reverts only some of the reads made since the checkpoint.

Next, I implemented the DataReader class, the data-reading character-stream filter I originally set out to write. This class makes use of the UndoReader class and provides methods to read all the primitive Java types (readInt(), readFloat(), readBoolean(), and so on). What is special about this class is that if you attempt to read a primitive from the stream and the stream data turns out to be incorrect -- if, for example, you attempt to read a Boolean and the next token in the stream is "truthfulness" -- it rolls back, un-reading any characters read during the erroneous operation, and throws an exception. The stream can also read data one line at a time, signaling each time the end of a line is reached (among other wonders).
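
To make the roll-back-on-error behavior concrete, here is a minimal sketch of a readBoolean() that leaves the stream untouched when the next token is not a Boolean. It is not the article's DataReader: the class name RollbackDataReaderSketch, the whitespace-delimited token rule, the use of IOException for the failure, and the inlined replay buffer (standing in for UndoReader) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch; the real DataReader delegates this work to UndoReader.
class RollbackDataReaderSketch {
    private final Reader in;
    // Characters waiting to be re-read after a rollback (unbounded).
    private final StringBuilder replay = new StringBuilder();

    RollbackDataReaderSketch(Reader in) {
        this.in = in;
    }

    // Reads one character, preferring rolled-back data over the underlying stream.
    int read() throws IOException {
        if (replay.length() > 0) {
            char c = replay.charAt(0);
            replay.deleteCharAt(0);
            return c;
        }
        return in.read();
    }

    // Assumption: tokens are delimited by whitespace. On success the trailing
    // delimiter is consumed; on failure nothing is consumed at all.
    boolean readBoolean() throws IOException {
        StringBuilder consumed = new StringBuilder(); // everything read in this attempt
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = read()) != -1) {
            consumed.append((char) c);
            if (Character.isWhitespace((char) c)) {
                if (token.length() > 0) {
                    break; // delimiter ends the token
                }
                // otherwise: leading whitespace, keep skipping
            } else {
                token.append((char) c);
            }
        }
        String text = token.toString();
        if (text.equals("true") || text.equals("false")) {
            return text.equals("true");
        }
        // Roll back: everything read during the failed attempt becomes readable again.
        replay.insert(0, consumed.toString());
        throw new IOException("not a boolean: " + text);
    }

    public static void main(String[] args) throws IOException {
        RollbackDataReaderSketch r =
            new RollbackDataReaderSketch(new StringReader("truthfulness true"));
        try {
            r.readBoolean();
        } catch (IOException e) {
            System.out.println("failed, stream restored");
        }
        // The failed token is still there; skip its 12 characters by hand.
        for (int i = 0; i < "truthfulness".length(); i++) {
            r.read();
        }
        System.out.println(r.readBoolean()); // prints "true"
    }
}
```

The point of the design is that a caller who catches the exception can try a different read method on exactly the same input.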

The classes I developed work only with JDK 1.1 and later. Adapting them to work as InputStreams, usable under JDK 1.0.2, should be quite easy, however.

Justification for UndoReader

In the interest of brevity, I'll spare you an introduction to character streams. Todd Sundsted's November 1997 How-To Java column should serve that purpose quite adequately; if you want an introduction to byte streams, check out Todd's October 1997 column. For further details of the Java stream classes, I refer you to Java Network Programming, Second Edition, which I coauthored with Michael Shoffner and Derek Hamner, and which is due out any day now. (See Resources.)

What I should perhaps explain is my justification for the UndoReader class. At the most abstract level, I want the ability to undo a series of read operations, because otherwise my DataReader class will violate the basic law of propriety: It would be improper for an erroneous attempt at reading an int to end up consuming a Boolean. Furthermore, the behavior of my class wouldn't necessarily be clearly defined in the presence of such an error: the amount of erroneous input consumed would be implementation-dependent, and exposing implementation-dependent details of this nature simply invites abuse.
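
The problem is easy to reproduce with a naive parser. The sketch below (NaiveIntReader and its methods are hypothetical names, not part of the article's library) reads digits until it hits a non-digit; when the input is actually a Boolean, the failed attempt has already eaten the first character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration of the problem, not code from the article.
class NaiveIntReader {
    // Reads characters up to and including the first non-digit.
    // On malformed input, the offending character has already been consumed.
    static int readInt(Reader in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isDigit((char) c)) {
            digits.append((char) c);
        }
        if (digits.length() == 0) {
            throw new NumberFormatException("expected digits");
        }
        return Integer.parseInt(digits.toString());
    }

    // Returns what is left on the stream after a failed readInt().
    static String remainderAfterFailedReadInt(String input) throws IOException {
        Reader in = new StringReader(input);
        try {
            readInt(in);
        } catch (NumberFormatException expected) {
            // fall through and inspect what is left
        }
        StringBuilder rest = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            rest.append((char) c);
        }
        return rest.toString();
    }

    public static void main(String[] args) throws IOException {
        // The 't' of "true" is gone, so a follow-up readBoolean() could not succeed.
        System.out.println(remainderAfterFailedReadInt("true")); // prints "rue"
    }
}
```

How much input is lost depends entirely on how the parser happens to be written, which is exactly the implementation-dependent behavior described above.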

Those readers familiar with the stream classes may then ask about the mark() and reset() methods, or indeed the PushbackReader class: Do these basic features of the stream API not already address my needs? Indeed, use of the mark() and reset() methods does allow a sequence of read operations to be undone, and the PushbackReader unread() methods can be used to the same effect. However, both of these options are bounded. Therefore you must, in each case, declare ahead of time the maximum volume of data you will un-read. In our situation, no such limit exists for textual data: "00...01" is a valid integer, just as "00...0z" is not. I cannot, simply to avoid writing an extra class, presume to impose arbitrary limitations on the data I will process.
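
The bound is easy to see with PushbackReader, whose pushback buffer size is fixed at construction time. In this small sketch (BoundedPushbackDemo and canUnread are hypothetical names), un-reading succeeds only while the number of characters pushed back fits in that buffer.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical demo class; PushbackReader itself is the standard java.io class.
class BoundedPushbackDemo {
    // Reads up to `count` characters, then tries to push them all back.
    // Returns true if the un-read succeeded, false if the pushback buffer overflowed.
    static boolean canUnread(String input, int pushbackSize, int count) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader(input), pushbackSize);
        char[] buf = new char[count];
        int n = in.read(buf, 0, count);
        if (n <= 0) {
            return true; // nothing was read, so there is nothing to push back
        }
        try {
            in.unread(buf, 0, n);
            return true;
        } catch (IOException overflow) {
            // unread() throws IOException when the pushback buffer is too small
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canUnread("00000z", 8, 6)); // prints "true"
        System.out.println(canUnread("00000z", 4, 6)); // prints "false"
    }
}
```

Whatever size you pick, an input with one more leading zero defeats it.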

Thus rationalized, we can now proceed with the code.

Class UndoReader

The UndoReader class is a character-stream filter that provides unbounded checkpoint, commit, and rollback operations and an additional undo facility.

Figure 1. Using methods checkpoint(), commit(), rollback(), and undo()

  • When checkpoint() is called, the stream begins storing all data subsequently read through it in an internal, expanding buffer

  • When commit() is called, the stored contents of the checkpoint buffer are discarded and further reads are no longer stored

  • When rollback() is called, reading reverts to the data stored in the internal buffer; when this is used up, reading proceeds as normal

  • Any number of reads performed since a checkpoint can be undone without a full rollback by partially reverting in the internal buffer

  • It is an error to checkpoint a stream that has already been checkpointed or to commit, roll back, or undo a noncheckpointed stream

  • We must support the case where a checkpoint is placed while we're still reading out of the internal buffer
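
The rules above can be sketched in a few dozen lines. What follows is not the article's UndoReader source but a minimal illustration under stated assumptions: the class name UndoReaderSketch, the undo(int) signature, the StringBuilder buffer, and the choice of IllegalStateException for misuse are all mine. The inherited skip(), ready(), and mark() methods are left alone here and would need overriding in a real implementation.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch of the checkpoint/commit/rollback/undo rules.
class UndoReaderSketch extends FilterReader {
    private final StringBuilder buffer = new StringBuilder(); // stored, replayable characters
    private int pos = 0;                  // index of the next character to replay from buffer
    private boolean checkpointed = false;

    UndoReaderSketch(Reader in) {
        super(in);
    }

    void checkpoint() {
        if (checkpointed) {
            throw new IllegalStateException("already checkpointed");
        }
        // Drop data already replayed; keep anything still waiting to be re-read,
        // which covers a checkpoint placed while reading out of the buffer.
        buffer.delete(0, pos);
        pos = 0;
        checkpointed = true;
    }

    void commit() {
        requireCheckpoint();
        buffer.delete(0, pos); // discard what was read since the checkpoint
        pos = 0;
        checkpointed = false;
    }

    void rollback() {
        requireCheckpoint();
        pos = 0;               // replay everything read since the checkpoint
        checkpointed = false;
    }

    // Un-reads the last `count` characters read since the checkpoint.
    void undo(int count) {
        requireCheckpoint();
        if (count < 0 || count > pos) {
            throw new IllegalArgumentException("cannot undo " + count + " reads");
        }
        pos -= count;
    }

    private void requireCheckpoint() {
        if (!checkpointed) {
            throw new IllegalStateException("no checkpoint in effect");
        }
    }

    @Override
    public int read() throws IOException {
        if (pos < buffer.length()) {      // replaying stored data
            char c = buffer.charAt(pos++);
            if (!checkpointed && pos == buffer.length()) {
                buffer.setLength(0);      // replay finished; back to normal reading
                pos = 0;
            }
            return c;
        }
        int c = in.read();
        if (c != -1 && checkpointed) {    // store data read under a checkpoint
            buffer.append((char) c);
            pos++;
        }
        return c;
    }

    // Simplification: loops over read(), so it waits until len characters or end of stream.
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    public static void main(String[] args) throws IOException {
        UndoReaderSketch r = new UndoReaderSketch(new StringReader("12x"));
        r.checkpoint();
        r.read();
        r.read();
        r.read();                            // consumed '1', '2', 'x'
        r.rollback();
        System.out.println((char) r.read()); // prints "1"
    }
}
```

Because the buffer grows on demand, there is no read-ahead limit to declare, which is the property mark()/reset() and PushbackReader lack.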


The class definition

We'll start by looking at our class definition:

Resources
  • Download the complete source code for this article as a zip file http://www.javaworld.com/jw-04-1999/step/jw-04-step.zip
  • "Use the two 'R's of Java 1.1 -- Readers and Writers" by Todd Sundsted (JavaWorld November 1997) http://www.javaworld.com/jw-11-1997/jw-11-howto.html
  • "Waging war on electronic junk mail" by Todd Sundsted (JavaWorld October 1997) http://rwanda.wpi.com/javaworld/jw-10-1997/jw-10-howto.html
  • Merlin Hughes, Michael Shoffner, and Derek Hamner's Java Network Programming, Second Edition covers the stream classes in detail http://www.manning.com/Hughes/007.html
  • The Java Developer Connection TechTips often cover I/O-related issues http://developer.java.sun.com/developer/javaInDepth/TechTips/
  • The JDK 1.1 documentation includes coverage of the character streams http://java.sun.com/products/jdk/1.1/docs/guide/io/
  • Read Merlin's previous Java Step by Step columns http://www.javaworld.com/topicalindex/jw-ti-step.html