How to build an interpreter in Java, Part 2: The structure

The trick to assembling the foundation classes for a simple interpreter

In last month's column I discussed the idea of building an interpreter in Java and introduced my version of BASIC, named Cocoa. In this month's column we will jump right into the source code for the Cocoa interpreter. However, as the emphasis here is on building interpreters in Java, more attention will be paid to the structure than to the actual code. As always, the full source code for the column is available in the Resources section below.

The Program class

All the constituent classes of the interpreter are collected into a single Java package. For this interpreter, that package is named basic. (Here is Cocoa's programming documentation.) This name contravenes Sun's suggested package-naming convention for no good reason, but then again Sun's naming scheme, as described in the Java Language Specification, isn't particularly well motivated either. :-) There is, however, a good reason to put the classes together in a single package: it lets the interpreter's classes call each other's package-private methods without exposing those methods to the interpreter's clients.
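To make the visibility argument concrete, here is a minimal sketch of two classes that share a package and therefore a package-private method. The class names Variable and Interpreter and their methods are hypothetical, not taken from Cocoa's source. Because everything sits in one file, the default package stands in for package basic.

```java
// Hypothetical sketch: both classes share one package (the default package
// here, standing in for "basic"), so each can use the other's
// package-private members. Names are illustrative, not Cocoa's.
class Variable {
    private double value;
    private boolean initialized;

    // No access modifier means package-private: visible to Interpreter
    // below, but not to client code compiled in a different package.
    void assign(double newValue) {
        value = newValue;
        initialized = true;
    }

    // Package-private read access for other interpreter classes.
    double value() {
        return value;
    }

    // Public: part of the API that clients are allowed to see.
    public boolean isInitialized() {
        return initialized;
    }
}

class Interpreter {
    // A class in the same package may call the package-private assign().
    static Variable makeVariable(double initialValue) {
        Variable v = new Variable();
        v.assign(initialValue);
        return v;
    }
}
```

A client class in another package could call isInitialized, but a call to assign would be rejected at compile time.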

As I discussed last month, the primary class for the interpreter is a public class with a factory method to create executable programs. I chose to implement this load method in a class named basic.Program. (For the rest of the article we'll assume the class names are all preceded by the "basic" package name without expressly calling it out.) Program exports a static method whose signature is:

public static Program load(InputStream source, PrintStream out)
        throws IOException, BASICSyntaxError { ... }


This method reads the source text from an InputStream and parses it into a form that can be interpreted. It returns a Program instance that represents the parsed program. The other important public method in Program is run, whose signature is as follows:

public void run(InputStream in, OutputStream out) throws BASICRuntimeError {
        PrintStream pout;
        // ... body omitted
}


Unlike load, the run method is not static: it runs a particular Program instance, using the input and output streams passed to it for character I/O.

As you can see, both of these methods throw exceptions. The first, load, throws an IOException if an I/O error occurs while reading the source code from the input stream. It can also throw a BASICSyntaxError exception if the source code it is reading fails to parse. The second method, run, throws BASICRuntimeError when an error occurs during the execution of the program, for example when the program divides by zero or reads from an uninitialized variable.
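The article does not show how the two exception classes are declared. Assuming they are ordinary checked exceptions (subclasses of Exception), which fits their appearance in the throws clauses, a minimal sketch could look like the following. The ErrorChecks helper, its methods, and the use of double arithmetic are all hypothetical; they only illustrate the two runtime situations just mentioned.

```java
import java.util.Map;

// Assumption: both error types are checked exceptions, so the compiler
// forces callers of load() and run() to handle or declare them.
class BASICSyntaxError extends Exception {
    BASICSyntaxError(String message) {
        super(message);
    }
}

class BASICRuntimeError extends Exception {
    BASICRuntimeError(String message) {
        super(message);
    }
}

// Hypothetical helpers showing where a runtime error would be raised.
class ErrorChecks {
    // Java double division by zero yields Infinity or NaN rather than
    // failing, so an interpreter has to check explicitly.
    static double divide(double a, double b) throws BASICRuntimeError {
        if (b == 0.0) {
            throw new BASICRuntimeError("Division by zero");
        }
        return a / b;
    }

    // Reading a variable that was never assigned is also a runtime error.
    static double read(Map<String, Double> vars, String name) throws BASICRuntimeError {
        Double v = vars.get(name);
        if (v == null) {
            throw new BASICRuntimeError("Uninitialized variable: " + name);
        }
        return v;
    }
}
```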

These two methods, load and run, define the two halves of any interpreter, loading and executing.
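The split between the two halves can be illustrated with a toy sketch whose "language" has a single statement form, PRINT followed by text. The class name TwoPhase and this one-statement grammar are inventions for the example, not Cocoa's. The point is that the static load parses once and returns an object, while the instance method run executes that parsed form against whatever output stream the caller supplies. For brevity the sketch ignores lines it does not recognize (where Cocoa would throw BASICSyntaxError), and its run takes only an output stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy interpreter: load() parses, run() executes.
class TwoPhase {
    // The parsed form: the text of each PRINT statement.
    private final List<String> messages = new ArrayList<>();

    // Private constructor: instances come only from load().
    private TwoPhase() { }

    // Loading half: turn source text into an internal representation.
    static TwoPhase load(InputStream source) throws IOException {
        TwoPhase program = new TwoPhase();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("PRINT ")) {
                program.messages.add(line.substring("PRINT ".length()));
            }
            // Other lines are silently ignored in this toy.
        }
        return program;
    }

    // Executing half: walk the parsed form; no re-parsing happens here.
    void run(OutputStream out) {
        // Wrap the caller's stream for println; flush but do not close it,
        // because the caller owns the underlying stream.
        PrintStream pout = new PrintStream(out);
        for (String m : messages) {
            pout.println(m);
        }
        pout.flush();
    }
}
```

Because the parsed form lives in the returned object, the same program can be run several times, on different streams, without being parsed again.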

Getting the program loaded

The load method in the Program class begins as follows:

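public static Program load(InputStream source, PrintStream out)
        throws IOException, BASICSyntaxError {
    DataInputStream dis = null;
    char data[] = new char[256];          // line buffer shared with the tokenizer
    LexicalTokenizer lt = new LexicalTokenizer(data);
    String lineData;
    Statement s;
    Token t;
    Program result = new Program();

    // Reuse the caller's stream if it is already a DataInputStream;
    // otherwise wrap it, with buffering, exactly once.
    if (! (source instanceof DataInputStream))
        dis = new DataInputStream(new BufferedInputStream(source));
    else
        dis = (DataInputStream) source;   // cast required: source is declared as InputStream
    // ... parsing continues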


In the code above, the variables for the parsing section of the interpreter are initialized. These include a lexical analyzer (in the form of the LexicalTokenizer class) and a DataInputStream. A simple optimization checks whether the caller already passed in a DataInputStream; if so, that stream is used directly rather than being wrapped again in a new DataInputStream and BufferedInputStream. The data array is the character buffer that the lexical analyzer uses while reading in the source code. Because its size is fixed, it limits each input line to about 256 characters.
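The stream check and the fixed-size buffer can be isolated in a small helper class. StreamSetup, asDataInput, and fill are hypothetical names that do not appear in the article, and rejecting an overlong line with an exception is an assumed policy, not a description of what LexicalTokenizer actually does.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.InputStream;

// Hypothetical helpers mirroring the two details of load() discussed above.
class StreamSetup {
    // Reuse the caller's stream if it is already a DataInputStream;
    // otherwise add buffering and a DataInputStream wrapper once.
    static DataInputStream asDataInput(InputStream source) {
        if (source instanceof DataInputStream) {
            return (DataInputStream) source;
        }
        return new DataInputStream(new BufferedInputStream(source));
    }

    // Copy one source line into a fixed-size buffer such as char[256].
    // Lines that do not fit are rejected (an assumed policy).
    static int fill(char[] buffer, String line) {
        if (line.length() > buffer.length) {
            throw new IllegalArgumentException(
                    "Line exceeds " + buffer.length + " characters");
        }
        line.getChars(0, line.length(), buffer, 0);
        return line.length();
    }
}
```

Returning the same object for an existing DataInputStream avoids stacking redundant wrapper layers, which is the whole of the optimization.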


Resources