Build an interpreter in Java -- Implement the execution engine

Here's how to take the interpreter classes and run with them

1 2 3 Page 2
Page 2 of 3

Other aspects: Strings and arrays

Two other parts of the BASIC language are implemented by the COCOA interpreter: strings and arrays. Let's look at the implementation of strings first.

To implement strings as variables, the Expression class was modified to include the notion of "string" expressions. This modification took the form of two additions: isString and stringValue. The source for these two new methods is shown below.

        String stringValue(Program pgm) throws BASICRuntimeError {
        throw new BASICRuntimeError("No String representation for this.");
    boolean isString() { return false; }

Clearly, it isn't too useful to a BASIC program to get the string value of a base expression (which is always either numeric or boolean expression). You might conclude from the lack of utility that these methods then did not belong in Expression and belonged in a subclass of Expression instead. However, by putting these two methods in the base class, all Expression objects can be tested to see if, in fact, they are strings.

Another design approach is to return the numeric values as strings using a StringBuffer object to generate a value. So, for example, the same code could be rewritten as:

        String stringValue(Program pgm) throws BASICRuntimeError {
    StringBuffer sb = new StringBuffer();
    return sb.toString();

And if the code above is used, you can eliminate the use of isString because every expression can return a string value. Further, you can modify the value method to try to return a number if the expression evaluates to a string by running it through the valueOf method of java.lang.Double. In many languages such as Perl, TCL, and REXX, this sort of amorphous typing is used to great advantage. Both approaches are valid, and you should make your choice based on the design of your interpreter. In BASIC, the interpreter needs to return an error when a string is assigned to a numeric variable, so I chose the first approach (returning an error).

As for arrays, there are different ways in which you can design your language to interpret them. C uses the square brackets around array elements to distinguish the array's index references from function references that have parentheses around their arguments. However, the language designers for BASIC chose to use parentheses for both functions and arrays so when the text NAME(V1, V2) is seen by the parser, it could be either a function call or an array reference.

The lexical analyzer discriminates between tokens that are followed by parentheses by first assuming they are functions and testing for that. Then it goes on to see if they are keywords or variables. It is this decision that prevents your program from defining a variable named "SIN." Any variable whose name matched a function name would be returned by the lexical analyzer as a function token instead. The second trick the lexical analyzer uses is to check to see if the variable name is immediately followed by `('. If it is, the analyzer assumes it is an array reference. By parsing this in the lexical analyzer, we eliminate the string `MYARRAY ( 2 )' from being interpreted as a valid array (note the space between the variable name and the open parenthesis).

The final trick to implementing arrays is in the Variable class. This class is used for an instance of a variable, and as I discussed in last month's column, it is a subclass of Token. However, it also has some machinery to support arrays and that is what I'll show below:

class Variable extends Token {
    // Legal variable sub types
    final static int NUMBER = 0;
    final static int STRING  = 1;
    final static int NUMBER_ARRAY = 2;
    final static int STRING_ARRAY = 4;
    String name;
    int subType;
     * If the variable is in the symbol table these values are
     * initialized.
    int ndx[];  // array indices.
    int mult[]; // array multipliers
    double nArrayValues[];
    String sArrayValues[];

The above code shows the instance variables associated with a variable, as in the ConstantExpression class. One has to make a choice about the number of classes to be used versus the complexity of a class. One design choice might be to build a Variable class that holds only scalar variables and then add an ArrayVariable subclass to deal with the intricacies of arrays. I chose to combine them, turning scalar variables essentially into arrays of length 1.

If you read the above code, you will see array indices and multipliers. These are here because multidimensional arrays in BASIC are implemented using a single linear Java array. The linear index into the Java array is computed manually using the elements of the multiplier array. The indices used in the BASIC program are checked for validity by comparing them to the maximum legal index in the indices' ndx array.

For example, a BASIC array with three dimensions of 10, 10, and 8, would have the values 10, 10, and 8 stored in ndx. This allows the expression evaluator to test for an "index out of bounds" condition by comparing the number used in the BASIC program to the maximum legal number that is now stored in ndx. The multiplier array in our example would contain the values 1, 10, and 100. These constants represent the numbers one uses to map from a multidimensional array index specification into a linear array index specification. The actual equation is:

Java Index = Index1 + Index2 * Max Size of Index1 + Index3 * (MaxSize of Index1 * MaxSizeIndex 2)

The next Java array in the Variable class is shown below.

    Expression expns[];

The expns array is used to deal with arrays that are written as "A(10*B, i)." In that case, the indices are actually expressions rather than constants, so the reference has to contain pointers to those expressions that are evaluated at run time. Finally there is this fairly ugly-looking piece of code that calculates the index depending on what was passed in the program. This private method is shown below.

        private int computeIndex(int ii[]) throws BASICRuntimeError {
        int offset = 0;
        if ((ndx == null) || (ii.length != ndx.length))
            throw new BASICRuntimeError("Wrong number of indices.");
        for (int i = 0; i < ndx.length; i++) {
            if ((ii[i] < 1) || (ii[i] > ndx[i]))
                throw new BASICRuntimeError("Index out of range.");
            offset = offset +  (ii[i]-1) * mult[i];
        return offset;

Looking at the code above, you'll note that the code first checks to see that the correct number of indices were used when referencing the array, and then that each index was within the legal range for that index. If an error is detected, an exception is thrown to the interpreter. The methods numValue and stringValue return a value from the variable as a number or a string respectively. These two methods are shown below.

        double numValue(int ii[]) throws BASICRuntimeError {
        return nArrayValues[computeIndex(ii)];
    String stringValue(int ii[]) throws BASICRuntimeError {
        if (subType == NUMBER_ARRAY)
            return ""+nArrayValues[computeIndex(ii)];
        return sArrayValues[computeIndex(ii)];

There are additional methods for setting the value of a variable that are not shown here.

By hiding much of the complexity of how each piece is implemented, when it finally comes time to execute the BASIC program, the Java code is quite straightforward.

Running the code

The code to interpret the BASIC statements and execute them is contained in the


method of the


class. The code for this method is shown below, and I'll step through it to point out the interesting parts.

 1    public void run(InputStream in, OutputStream out) throws BASICRuntimeError {
 2        PrintStream pout;
 3        Enumeration e = stmts.elements();
 4        stmtStack = new Stack();    // assume no stacked statements ...
 5        dataStore = new Vector();   // ...  and no data to be read.
 6        dataPtr = 0;
 7        Statement s;
 9        vars = new RedBlackTree();
11        // if the program isn't yet valid.
12        if (! e.hasMoreElements())
13            return;
15        if (out instanceof PrintStream) {
16            pout = (PrintStream) out;
17        } else {
18            pout = new PrintStream(out);
19        }

The above code shows that the run method takes an InputStream and an OutputStream for use as the "console" for the executing program. In line 3, the enumeration object e is set to the set of statements from the collection named stmts. For this collection I used a variation on a binary search tree called a "red-black" tree. (For further information on binary search trees, see my previous column on building generic collections.) Following that, two additional collections are created -- one using a Stack and one using a Vector. The stack is used like the stack in any computer, but the vector is used expressly for the DATA statements in the BASIC program. The final collection is another red-black tree that holds the references for the variables defined by the BASIC program. This tree is the symbol table that is used by the program while it is executing.

Following the initialization, the input and output streams are set up, and then if e is not null, we start by collecting any data that has been declared. That is done as shown in the following code.

        /* First we load all of the data statements */
        while (e.hasMoreElements()) {
            s = (Statement) e.nextElement();
            if (s.keyword == Statement.DATA) {
                s.execute(this, in, pout);

The above loop simply looks at all of the statements, and any DATA statements it finds are then executed. The execution of each DATA statement inserts the values declared by that statement into the dataStore vector. Next we execute the program proper, which is done using this next piece of code:

        e = stmts.elements();
        s = (Statement) e.nextElement();
        do {
            int yyy;
            /* While running we skip Data statements. */
            try {
                yyy = in.available();
            } catch (IOException ez) {
                yyy = 0;
            if (yyy != 0) {
                pout.println("Stopped at :"+s);
            if (s.keyword != Statement.DATA) {
                if (traceState) {
                    s.trace(this, (traceFile != null) ? traceFile : pout);
                s = s.execute(this, in, pout);
            } else
                s = nextStatement(s);
        } while (s != null);

As you can see in the code above, the first step is to reinitialize e. The next step is to fetch the first statement into the variable s and then to enter the execution loop. There is some code to check for pending input on the input stream to allow the progress of the program to be interrupted by typing at the program, and then the loop checks to see if the statement to execute would be a DATA statement. If it is, the loop skips the statement as it was already executed. The rather convoluted technique of executing all data statements first is required because BASIC allows the DATA statements that satisfy a READ statement to appear anywhere in the source code. Finally, if tracing is enabled, a trace record is printed and the very uninpressive statement s = s.execute(this, in, pout); is invoked. The beauty is that all the effort of encapsulating the base concepts into easy-to-understand classes makes the final code trivial. If it isn't trivial then perhaps you have a clue that there might be another way to split your design.

Wrapping up and further thoughts

The interpreter was designed so that it could run as a thread, thus there can be several COCOA interpreter threads running simultaneously in your program space at the same time. Further, with the use of function expansion we can provide a means whereby those threads can interact with each other. There was a program for the Apple II and later for the PC and Unix called C-robots that was a system of interacting "robotic" entities that were programmed using a simple BASIC derivative language. The game provided me and others with many hours of entertainment but was also an excellent way to introduce the basic principles of computation to younger students (who mistakenly believed they were just playing and not learning). Java based interpreter subsystems are much more powerful than their pre-Java counterparts because they are instantly available on any Java platform. COCOA ran on Unix systems and Macintoshes the same day I got working on a Windows 95 based PC. While Java gets beaten up by incompatibilities in the thread or window toolkit implementations, what is often overlooked is this: A lot of code "just works."

1 2 3 Page 2
Page 2 of 3