Waging war on electronic junk mail

Put Java on the front line in the war against electronic junk mail

Sound familiar? These are but a few of the numerous (and often offensive) unsolicited e-mails I received this past week, and which inspired me to write this column.

For those of you lucky enough to have avoided the electronic junk mail epidemic, let me tell you, it's a real problem. And this month, we're going to tackle it head-on -- with Java.

Just as in columns past, we'll begin with a quick look at the problem and discuss its solution. Then I'll introduce you to the parts of the Java class library that we'll use to implement the solution. Finally, we'll work through the solution.

Staking out the enemy

There's no escaping the reality of electronic junk mail, so let's take a moment to think about how we can minimize its intrusion into our lives.

The best, most efficient solution would simply be to stop people from sending us unwanted electronic mail. Unfortunately, something called the First Amendment (at least here in the U.S.) prevents us from taking this approach, so we must consider another angle. We must focus on getting rid of junk electronic mail before we ever set eyes on it. The question is how?

One reasonably effective method involves examining a piece of electronic mail and deciding whether to keep it or reject it based on its content. This is, after all, what we do when we read a piece of electronic mail.

Consider how we go about filtering electronic mail now. We scan a piece of mail -- character by character and line by line -- looking for words we recognize. If the mail contains the word "Java," we keep it; if it contains the phrase "Make Money Fast," we send it to the bit-bucket.

But why go through the trouble? Let's see if we can make a computer program suffer this task for us.
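
As a first rough sketch of what such a program might do, the following reads a message one byte at a time and looks for the two phrases mentioned above. The class name, the verdict strings, and the rule that "Java" is checked before the junk phrase are all assumptions made for illustration. It also assumes plain ASCII mail, so each byte stands for one character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the class name, phrases, and verdict strings are illustrative.
class KeywordScan {
    // Reads the message byte by byte, lowercases it, and looks for known phrases.
    // Assumes ASCII input, so every byte maps to exactly one character.
    static String classify(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            text.append(Character.toLowerCase((char) b));
        }
        String s = text.toString();
        if (s.contains("java")) return "KEEP";               // assumption: "Java" wins
        if (s.contains("make money fast")) return "REJECT";
        return "UNDECIDED";                                   // no rule matched
    }

    public static void main(String[] args) throws IOException {
        byte[] mail = "MAKE MONEY FAST!!!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(new ByteArrayInputStream(mail))); // prints REJECT
    }
}
```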

Tactical assessment

I'm going to take a step back from the problem at hand and look at the classes in the I/O package of the Java class library. I/O stands for Input/Output, which represents the information that goes into and comes out of a program, and the parts of a program that handle the information.

The Java class library input and output classes are based on a very simple, but very powerful model -- the stream.

The stream model, which is shown in the following figure, presents information as flowing from one point to another, as if it were in a stream or pipe. From a vantage point at any position along the flow, an observer sees pieces of information pass by, a piece at a time, in sequence.

A stream passes information from one point to another

The model fits many types of real-world information. Whether it is keycodes coming from a computer keyboard, audio data coming from an audio file, or line after line of text coming from a text file, all appear to be streams of information.

An important tool for working on streams is the filter. Filters take information arriving at their upstream side, filter or process it in some way, and send it out their downstream side. The figure below shows how a filter works.

A filter interrupts the flow of information for processing

The key to the stream model's power is the ability to chain together very simple individual filters to create more powerful compound filters, as shown in the following figure.

A cascade of filters
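
To make the idea of chaining concrete, here is a minimal sketch that stacks two filters on an in-memory source. BufferedInputStream, FilterInputStream, and ByteArrayInputStream are standard java.io classes; LowercaseInputStream and FilterChainDemo are hypothetical names invented for this example, and the filter only handles ASCII letters.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical filter: passes bytes through, converting ASCII 'A'-'Z' to lowercase.
class LowercaseInputStream extends FilterInputStream {
    LowercaseInputStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        int b = super.read();
        return (b >= 'A' && b <= 'Z') ? b + ('a' - 'A') : b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (buf[i] >= 'A' && buf[i] <= 'Z') buf[i] += 'a' - 'A';
        }
        return n;
    }
}

class FilterChainDemo {
    // Drains the stream into a string, assuming ASCII content.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[8];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(
                "Make Money FAST".getBytes(StandardCharsets.US_ASCII));
        // source -> buffering filter -> lowercasing filter -> program
        InputStream chain = new LowercaseInputStream(new BufferedInputStream(source));
        System.out.println(readAll(chain)); // prints make money fast
    }
}
```

Each filter knows only about the stream directly upstream of it, which is what lets simple filters be combined freely.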

The Java class library breaks streams up into two types -- input and output. Such a distinction is not necessary in theory, but is useful in practice.

Input streams generally have as their ultimate source some device or file, and are involved in taking data from that source and bringing it into the domain of the program. The input stream is often filtered in the process.

Output streams generally have as their ultimate destination some device or file, and are involved in taking data from the domain of the program and sending it to that destination. The output stream is often filtered in the process.

We will use the stream classes of the Java class library in the solution to our electronic junk mail problem for two reasons:

  1. It's easy to think of electronic mail as flowing line by line and word by word into our computer.

  2. We want to examine the mail, line by line and word by word, as it arrives at our computer to see if it matches any of the patterns we specify.



Our arsenal -- The stream classes in detail

The Java 1.1 specification describes two nearly identical sets of input and output stream classes. One set is byte oriented, the other is character oriented. The byte-oriented stream classes were present, with only minor differences, in Java 1.0.2. The character-oriented stream classes are entirely new with the 1.1 spec.

This month we'll look at the byte-oriented stream classes. We'll do this for two reasons. First, this will allow those of you who do not yet have access to Java 1.1 to make use of this material. Second, it will allow me to point out some problem areas with the Java 1.0.2 class libraries that were fixed in Java 1.1.

Recall that streams can be divided into two broad categories: input streams and output streams. In Java, all byte-oriented input stream classes are subclasses of the abstract class InputStream. Class InputStream defines the basic suite of methods an input stream class must provide. Likewise, all byte-oriented output stream classes are subclasses of the abstract class OutputStream. Class OutputStream defines the basic suite of methods an output stream class must provide.

Common input stream methods

Let's take a look at the methods common to all input streams. Following each method declaration, I'll list the tasks the method performs.

public int read() throws IOException

  • Reads a single byte from the input stream and returns it.

  • Returns -1 if the end of the input stream has been reached.

  • Blocks (or waits) until data is available, if necessary.

  • Throws IOException if an error occurs during the read operation.
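
A minimal sketch of the usual single-byte read loop follows. The class and method names are hypothetical. Note that the result of read() is stored in an int rather than a byte, so that the end-of-stream value -1 can be told apart from the data byte 0xFF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; only InputStream.read() comes from the class library.
class ReadLoop {
    // Counts newline bytes by calling read() until it returns -1.
    static int countLines(InputStream in) throws IOException {
        int lines = 0;
        int b; // int, not byte, so -1 is distinguishable from a data byte
        while ((b = in.read()) != -1) {
            if (b == '\n') lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countLines(new ByteArrayInputStream(data))); // prints 2
    }
}
```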



public int read(byte [] rgb) throws IOException

  • Reads a sequence of bytes from the input stream and places them in the specified array.

  • Returns the number of bytes read.

  • Returns -1 if the end of the input stream has been reached.

  • Blocks (or waits) until data is available, if necessary.

  • Throws IOException if an error occurs during the read operation.



public int read(byte [] rgb, int nOff, int nLen) throws IOException

  • Reads a sequence of bytes of the specified length from the input stream and places them in the specified array at the specified offset.

  • Returns the number of bytes read.

  • Returns -1 if the end of the input stream has been reached.

  • Blocks (or waits) until data is available, if necessary.

  • Throws IOException if an error occurs during the read operation.
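
Because the array forms of read() report how many bytes they actually read, and that count can be smaller than the number requested, callers that need a full buffer generally loop. The following is a sketch under that assumption; ReadFully and fill are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; only InputStream.read(byte[], int, int) is a library call.
class ReadFully {
    // Tries to fill buf completely. Returns the number of bytes stored, which is
    // less than buf.length only if the stream ended first.
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // Ask for the remaining space, starting at the first unfilled slot.
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) break; // end of stream
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] { 1, 2, 3, 4, 5 });
        byte[] buf = new byte[8];
        System.out.println(fill(in, buf)); // prints 5
    }
}
```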



public long skip(long n) throws IOException

  • Skips over the specified number of bytes.

  • Returns the number of bytes actually skipped.

  • May skip fewer bytes than requested (possibly none), for example if the end of the input stream is reached first.

  • Throws IOException if an error occurs during the skip operation.



public int available() throws IOException

  • Returns the number of bytes that can be read from the input stream without the read operation blocking.

  • Throws IOException if an error occurs during the operation.
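
Here is a small sketch of skip() and available() working together on an in-memory stream. SkipDemo and skipFully are hypothetical names. The loop is there because skip() may skip fewer bytes than requested; when it makes no progress, the sketch falls back to read() to find out whether the stream has ended.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper built on InputStream.skip(), read(), and available().
class SkipDemo {
    // Skips up to n bytes and returns how many were actually skipped.
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // No progress: consume one byte with read() to detect end of stream.
                if (in.read() == -1) break;
                skipped = 1;
            }
            remaining -= skipped;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "HEADERbody".getBytes(StandardCharsets.US_ASCII));
        System.out.println(in.available());   // prints 10
        System.out.println(skipFully(in, 6)); // prints 6
        System.out.println(in.available());   // prints 4
    }
}
```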



public void close() throws IOException

  • Closes the input stream and releases any resources (operating system file handles, for example) associated with the input stream.

  • Throws IOException if an error occurs during the operation.



public void mark(int nReadLimit)

  • Marks the current position in the input stream. Subsequent calls to reset() will reposition the input stream to this position.

  • Specifies the number of bytes that may be read past the mark before the mark is invalidated.



public void reset() throws IOException

  • Repositions the input stream to the last marked position.

  • Throws IOException if the stream has not been marked, or if the mark has been invalidated.



public boolean markSupported()

  • Indicates whether or not this input stream supports the mark and reset operations.
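
The three mark-related methods are typically used together to look ahead in a stream without consuming it. The sketch below assumes a stream that supports marking, such as a BufferedInputStream; PeekDemo and peek are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper built on markSupported(), mark(), and reset().
class PeekDemo {
    // Returns up to n upcoming bytes and leaves the stream positioned where it was.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported by this stream");
        }
        in.mark(n); // allow up to n bytes to be read before the mark is invalidated
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r == -1) break;
            total += r;
        }
        in.reset(); // rewind to the marked position
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(
                "Subject: hi".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(peek(in, 8), StandardCharsets.US_ASCII)); // prints Subject:
        System.out.println((char) in.read()); // prints S
    }
}
```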



Common output stream methods

Let's take a look at the methods common to all output streams. As in the previous section, I'll list the tasks each method performs following its declaration.

public void write(int b) throws IOException

  • Writes a single byte to the output stream.

  • Blocks (or waits) until the data is actually written.

  • Throws IOException if an error occurs during the write operation.



public void write(byte [] rgb) throws IOException

  • Writes a sequence of bytes to the output stream.

  • Blocks (or waits) until the data is actually written.

  • Throws IOException if an error occurs during the write operation.



public void write(byte [] rgb, int nOff, int nLen) throws IOException

  • Writes a sequence of bytes of the specified length to the output stream, beginning at the specified offset.

  • Blocks (or waits) until the data is actually written.
  • Throws IOException if an error occurs during the write operation.



public void flush() throws IOException

  • Flushes the output stream, immediately writing any buffered data.

  • Throws IOException if an error occurs during the operation.



public void close() throws IOException

  • Closes the output stream and releases any resources (operating system file handles, for example) associated with the output stream.

  • Throws IOException if an error occurs during the operation.
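
The output methods can be exercised with a short sketch like the one below. It wraps an in-memory ByteArrayOutputStream in a BufferedOutputStream so that the effect of flush() is visible. WriteDemo and writeAndFlush are hypothetical names, and the sketch relies on the message being much smaller than the buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the stream classes and methods are from java.io.
class WriteDemo {
    // Writes "Hello, Java" through a buffered stream and returns
    // { bytes in the sink before flush(), bytes in the sink after flush() }.
    static int[] writeAndFlush(ByteArrayOutputStream sink) throws IOException {
        OutputStream out = new BufferedOutputStream(sink);
        byte[] msg = "Hello, Java".getBytes(StandardCharsets.US_ASCII);
        out.write(msg, 0, 5);      // "Hello"
        out.write(',');            // a single byte
        out.write(msg, 6, 5);      // " Java"
        int before = sink.size();  // still held in the buffer
        out.flush();               // push buffered bytes downstream
        int after = sink.size();
        out.close();               // release the stream
        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int[] sizes = writeAndFlush(sink);
        System.out.println(sizes[0] + " " + sizes[1]); // prints 0 11
        System.out.println(sink.toString("US-ASCII")); // prints Hello, Java
    }
}
```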



Our plan of attack

The code this month comes in three different flavors. Here's why.

Byte to char conversions in Java 1.0.2 were fundamentally flawed (making the language's Unicode support essentially useless). To support internationalization, the flaws were fixed in Java 1.1. The result is two almost identical APIs that differ only in the methods they provide for converting bytes to chars.
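
The text above does not say exactly what the flaws were. One common way byte-to-char conversion goes wrong is treating every byte as one character, and the sketch below assumes that is the kind of problem meant. It contrasts that naive approach with Java 1.1's character-oriented InputStreamReader, using the two-byte UTF-8 encoding of the letter é. ByteToChar and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical comparison; InputStreamReader is the Java 1.1 character-oriented class.
class ByteToChar {
    // Naive conversion: assumes one byte is one character.
    static String naive(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Encoding-aware conversion: decodes the bytes as UTF-8.
    static String decoded(byte[] bytes) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8");
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] eAcute = { (byte) 0xC3, (byte) 0xA9 }; // UTF-8 encoding of U+00E9
        System.out.println(naive(eAcute).length());   // prints 2
        System.out.println(decoded(eAcute).length()); // prints 1
    }
}
```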

Resources
  • The InputStream class for Java 1.0.2 http://java.sun.com/products/jdk/1.0.2/api/java.io.InputStream.html
  • The OutputStream class for Java 1.0.2 http://java.sun.com/products/jdk/1.0.2/api/java.io.OutputStream.html
  • The InputStream class for Java 1.1 http://www.javasoft.com/products/jdk/1.1/docs/api/java.io.InputStream.html
  • The OutputStream class for Java 1.1 http://www.javasoft.com/products/jdk/1.1/docs/api/java.io.OutputStream.html
  • I/O Enhancements for Java 1.1 http://www.javasoft.com/products/jdk/1.1/docs/guide/io/index.html
  • Previous How-To Java articles