Build an interpreter in Java -- Implement the execution engine

Here's how to take the interpreter classes and run with them

For those of you just joining us, in my "Java In Depth" column over the past couple months, I've been discussing how one might go about building an interpreter in Java. In the first interpreter column, we covered some of the desirable attributes of an interpreter; in the second column we discussed both parsing and the layout of a class package for implementing the interpreter. In this column we'll look at running the interpreter, and the support classes necessary to accomplish that. Finally, I'll wrap up the series here with a discussion of how an interpreter can be hooked into other Java classes, thus enhancing their abilities.

Reviewing the relevant bits

Let me start by sketching out what we've covered so far and point out those parts of the design that will become more important as we discuss execution mode. For a more detailed description of these classes, refer to my previous columns or to the source code links that are in the

Resources section

below.

There are three foundation classes in the implementation of the interpreter, Program, Statement, and Expression. The following shows how the three are related:

Program

This Program class glues together the parsing and execution components of the parser. This class defines two principal methods, load and run. The load method reads statements from an input stream and parses them into a collection of statements, the run method iterates over the collection and executes each of the statements. The Program class also provides a collection of variables for the program to use as well as a stack for storing data.

Statement

The Statement class contains a single parsed statement. This class is actually subclassed into a specific type of statement (PRINT, GOTO, IF, and so on) but all statements contain the method execute which is called to execute the statement in the context of a Program class instance.

Expression

The Expression class contains the parse tree of an expression. During execution, the value method is used to evaluate the expression and return its value. Like Statement, the Expression class is primarily designed to be subclassed by specific types of expressions.

All of these classes work together to form the basis of an interpreter. The Program class simultaneously encapsulates the parsing operation and execution operation, whereas the Statement and Expression classes encapsulate the actual computational concepts of the language we've implemented. For these three articles on building interpreters, the example language has been BASIC.

Facilities for computation

There are two executable classes in the interpreter,

Statement

and

Expression

. First let's take a look at

Expression

.

The instances of Expression are created by the method expression in the class ParseExpression. The ParseExpression class implements the expression parser in this interpreter. This class is a peer of the ParseStatement class, which uses the statement method to parse BASIC statements. Instances of Expression have an internal type that identifies what operator the instance represents, and two methods, value and stringValue, that return the computed value of the expression. Additionally, when an Expression instance is created, it is nominally given two parameters representing the left and right side of the expression's operation. Shown in source form, the first part of the Expression is as follows:

class Expression {
    Expression arg1, arg2;
    int oper;
    final static int OP_ADD  = 1;   // Addition '+'
    final static int OP_SUB  = 2;   // Subtraction '-'
    final static int OP_MUL  = 3;   // Multiplication '*'
    final static int OP_DIV  = 4;   // Division '/'
    [... etc for all of the operators ...]
    final static int OP_BNOT = 19;  // Boolean negation '.NOT.'
    final static int OP_NEG  = 20;  // Unary minus

As you can see in the code above, there are the instance variables, an operator type named oper, and two halves of the operation in arg1 and arg2, and then some constants to define the various types. Also, there are two constructors that are used by the expression parser; these create a new expression from two expressions. Their source is shown below:

protected Expression(int op, Expression a, Expression b) throws BASICSyntaxError {
        arg1 = a;
        arg2 = b;
        oper = op;
        /*
         * If the operator is a boolean, both arguments must be boolean.
         */
        if (op > OP_GE) {
            if ( (! (arg1 instanceof BooleanExpression)) ||
                 (! (arg2 instanceof BooleanExpression)) )
                throw new BASICSyntaxError(typeError);
        } else {
            if ((arg1 instanceof BooleanExpression) || (arg2 instanceof BooleanExpression))
                throw new BASICSyntaxError(typeError);
        }
    }
    protected Expression(int op, Expression a) throws BASICSyntaxError {
        arg2 = a;
        oper = op;
        if ((oper == OP_BNOT) && (! (arg2 instanceof BooleanExpression)))
            throw new BASICSyntaxError(typeError);
    }

The first constructor builds an arbitrary expression object, and the second one builds a "unary" expression object -- such as unary minus. One thing to note is that if there is only one argument, arg2 is used to store its value.

The method that is used the most frequently in the Expression class is value, which is defined as follows:

        double value(Program pgm) throws BASICRuntimeError {
        switch (oper) {
            case OP_ADD :
                return arg1.value(pgm) + arg2.value(pgm);
            case OP_SUB :
                return arg1.value(pgm) - arg2.value(pgm);
    ... etc for all of the other operator types. ...

You can see that each expression object represents a tuple consisting of an operator and one or two arguments. The fun part of designing the expression execution engine in this way is that when you construct this set of expression tuples based on the Expression object, you can compute the value of the expression by simply invoking the value method. The value method recursively invokes the value method of the two arguments that compose this expression, applies the operation to them, and returns the result. This design was used so that expressions would be easy to understand.

To keep the class structure clean, all computational units -- from constants to trigonometric functions -- are subclasses of Expression. This idea, stolen shamelessly from Lisp, completely encapsulates the notion of "causing" an evaluation to occur, from the actual implementation of "how" that evaluation occurs. To demonstrate how this principle is applied, we need only look at some of the specialized subclasses of Expression.

Constants in my version of BASIC, which I named COCOA are represented by the class ConstantExpression, which subclasses Expression and simply stores the numeric value in a member value. The source code to ConstantExpression is shown conceptually below. I say "conceptually" because I did choose to bundle what would have been StringConstantExpression and NumericConstantExpression into a single class. So the real class includes a constructor for creating a constant with a string argument and for returning its value as a string. The following code shows how the ConstantExpression class handles numeric constants.

class ConstantExpression extends Expression {
    private double v;
    ConstantExpression(double a) {
        super();
        v = a;
    }
    double value(Program pgm) throws BASICRuntimeError {
        return v;
    }
}

The code shown above replaces the more complicated constructors of Expression with a simple store of an instance variable; the value method is simply replaced with a return of the stored value.

It is true that you could code the Expression class to accept in its constructors a constant that would save you a class. However, one advantage of designing Expression the way I have is that the code in Expression remains maximally generic. I find this style of coding helps me eliminate the complexity of special cases and thus when I am "done" with the Expression code, I can move on to other aspects of expressions without revisiting the base class again and again. The advantage becomes clearer when we delve into another subclass of Expression named FunctionExpression.

In the class FunctionExpression, there were two design requirements I felt should be met to keep the interpreter flexible. The first was to implement the standard BASIC functions; the other was to encapsulate the parsing of the functions arguments into the same class that implemented these functions. The second requirement, parsing, was motivated by a desire to make this BASIC extensible by creating additional function libraries that could be passed to the parser as subclasses of FunctionExpression. Further, those passed-in classes could be used by the parser to increase the number of functions available to the user's program.

The FunctionExpression class is only moderately more complicated than a ConstantExpression and is shown in condensed form below:

 1 class FunctionExpression extends Expression {
 2     [ ... parsing stuff removed ...]
 3    double value(Program p) throws BASICRuntimeError {
 4        try {
 5            switch (oper) {
 6                case RND :
 7                    if (r == null)
 8                        r = p.getRandom();
 9                    return (r.nextDouble() * arg2.value(p));
10                case INT :
11                    return Math.floor(arg2.value(p));
12                case SIN :
13                    return Math.sin(arg2.value(p));
14    [ ... and so on for all of the functions implemented ... ]
15                default :
16                    throw new BASICRuntimeError("Unknown or non-numeric function.");
17            }
18        } catch (Exception e) {
19            if (e instanceof BASICRuntimeError)
20                throw (BASICRuntimeError) e;
21            else
22                throw new BASICRuntimeError("Arithmetic Exception.");
23        }
24    }

The above source shows how the value method is implemented. The oper variable is reused to hold the function identity, and the Expression objects referenced by arg1 and arg2 are used as arguments to the functions themselves. Finally, there is a large switch statement that dispatches the request. One interesting aspect is that the value method does catch the potential arithmetic exceptions and converts them into instances of BASICRuntimeError. The parsing code in FunctionExpression is shown below, again condensed to save space. (Remember, all of the source code is available using links in the Resources section.)

 1   static FunctionExpression parse(int ty, LexicalTokenizer lt) throws BASICSyntaxError {
 2       FunctionExpression result;
 3       Expression a;
 4       Expression b;
 5       Expression se;
 6       Token t;
 7
 8       t = lt.nextToken();
 9       if (! t.isSymbol('(')) {
10           if (ty == RND) {
11               lt.unGetToken();
12               return new FunctionExpression(ty, new ConstantExpression(1));
13           } else if (ty == FRE) {
14               lt.unGetToken();
15               return new FunctionExpression(ty, new ConstantExpression(0));
16           }
17           throw new BASICSyntaxError("Missing argument for function.");
18       }
19       switch (ty) {
20           case RND:
21           case INT:
22           case SIN:
23           case COS:
24           case TAN:
25           case ATN:
26           case SQR:
27           case ABS:
28           case CHR:
29           case VAL:
30           case STR:
31           case SPC:
32           case TAB:
33           case LOG:
34               a = ParseExpression.expression(lt);
35               if (a instanceof BooleanExpression) {
36                   throw new BASICSyntaxError(functions[ty].toUpperCase()+" function cannot accept boolean expression.");
37               }
38               if ((ty == VAL) && (! a.isString()))
39                   throw new BASICSyntaxError(functions[ty].toUpperCase()+" requires a string valued argument.");
40               result = new FunctionExpression(ty, a);
41               break;
       [ ... other cases omitted for brevity ... ]
42           default:
43               throw new BASICSyntaxError("Unknown function on input.");
44
45       }
46       t = lt.nextToken();
47       if (! t.isSymbol(')')) {
48           throw new BASICSyntaxError("Missing closing parenthesis for function.");
49       }
50       return result;
51   }

Note that this code takes advantage of the fact that the expression parser in ParseStatement has already figured out that it is looking at an expression and has passed the identity of the expression in as parameter ty. This parser then need only locate the opening parenthesis and the closing parenthesis that contain the argument(s). But look carefully: In lines #9 through #18, the parser allows some functions to have no arguments (in this case RND and FRE). This demonstrates the flexibility afforded by having the function subparser built into this class, rather than forcing all functions to conform to some predefined template. Given a function type in the parameter ty, the switch statement selects a branch that can parse the arguments required for that function, be they strings, numbers, other expressions, and so on.

1 2 3 Page 1
Page 1 of 3