Newsletter sign-up
View all newsletters

Sign up for our technology specific newsletters.

Enterprise Java
Email Address:

How to build an interpreter in Java, Part 1: The BASICs

For complex applications requiring a scripting language, Java can be used to implement the interpreter, adding scripting abilities to any Java app

  • Digg
  • Reddit
  • SlashDot
  • Stumble
  • del.icio.us
  • Technorati
  • dzone

Page 3 of 4

The third component is the execution environment. As we'll see, the requirements for this component are fairly simple, but the implementation has a couple of interesting twists.

A very quick BASIC tour

For those of you who may never have heard of BASIC, I'll give you a brief glimpse of the language, so you can understand the parsing and execution challenges ahead. For more information on BASIC, I highly recommend the resources at the end of this column.

BASIC stands for Beginners All-purpose Symbolic Instructional Code, and it was developed at Dartmouth University to teach computation concepts to undergraduate students. Since its development, BASIC has evolved into a variety of dialects. The simplest of these dialects are used as control languages for industrial process controllers; the most complex dialects are structured languages that incorporate some aspects of object-oriented programming. For my project, I chose a dialect known as BASIC-80 that was popular on the CP/M operating system in the late seventies. This dialect is only moderately more complex than the simplest dialects.

Statement syntax

All statement lines are of the form

<Line> <Keyword> <Parameters> [ : <Statement> [ : ... ] ]

where "Line" is a statement line number, "Keyword" is a BASIC statement keyword, and "Parameters" are a set of parameters associated with that keyword.

The line number has two purposes: It serves as a label for statements that control execution flow, such as a goto statement, and it serves as a sorting tag for statements inserted into the program. As a sorting tag, the line number facilitates a line editing environment in which editing and command processing are mixed in a single interactive session. By the way, this was required when all you had was a teletype. :-)

While not very elegant, line numbers do give the interpreter environment the ability to update the program one statement at a time. This ability stems from the fact that a statement is a single parsed entity and can be linked in a data structure with line numbers. Without line numbers, often it is necessary to re-parse the entire program when a line changes.

The keyword identifies the BASIC statement. In the example, our interpreter will support a slightly extended set of BASIC keywords, including goto, gosub, return, print, if, end, data, restore, read, on, rem, for, next, let, input, stop, dim, randomize, tron, and troff. Obviously, we won't go over all of these in this article, but there will be some documentation online in my next month's "Java In Depth" for you to explore.

Each keyword has a set of legal keyword parameters that can follow it. For example, the goto keyword must be followed by a line number, the if statement must be followed by a conditional expression as well as the keyword then -- and so on. The parameters are specific to each keyword. I'll cover a couple of these parameter lists in detail a bit later.

Expressions and operators

Often, a parameter specified in a statement is an expression. The version of BASIC I'm using here supports all of the standard mathematical operations, logical operations, exponentiation, and a simple function library. The most important component of the expression grammar is the ability to call functions. The expressions themselves are fairly standard and similar to the ones parsed by the example in my previous StreamTokenizer column.

Variables and data types

Part of the reason BASIC is such a simple language is because it has only two data types: numbers and strings. Some scripting languages, such as REXX and PERL, don't even make this distinction between data types until they are used. But with BASIC, a simple syntax is used to identify data types.

Variable names in this version of BASIC are strings of letters and numbers that always start with a letter. Variables are not case-sensitive. Thus A, B, FOO, and FOO2 are all valid variable names. Furthermore, in BASIC, the variable FOOBAR is equivalent to FooBar. To identify strings, a dollar sign ($) is appended to the variable name; thus, the variable FOO$ is a variable containing a string.

Finally, this version of the language supports arrays using the dim keyword and a variable syntax of the form NAME(index1, index2, ...) for up to four indices.

Program structure

Programs in BASIC start by default at the lowest numbered line and continue until there are either no more lines to process or the stop or end keywords are executed. A very simple BASIC program is shown below:

100 REM This is probably the canonical BASIC example
110 REM Program. Note that REM statements are ignored.
120 PRINT "This is a test program."
130 PRINT "Summing the values between 1 and 100"
140 LET total = 0
150 FOR I = 1 TO 100
160     LET total = total + i
170 NEXT I
180 PRINT "The total of all digits between 1 and 100 is " total
190 END


The line numbers above indicate the lexical order of the statements. When they are run, lines 120 and 130 print messages to the output, line 140 initializes a variable, and the loop in lines 150 through 170 update the value of that variable. Finally, the results are printed out. As you can see, BASIC is a very simple programming language and therefore an ideal candidate for teaching computation concepts.

Organizing the approach

Typical of scripting languages, BASIC involves a program composed of many statements that run in a particular environment. The design challenge, then, is to construct the objects to implement such a system in a useful way.

When I looked at the problem, a straightforward data structure fairly leaped out at me. That structure is as follows:

The public interface to the scripting language shall consist of

  • A factory method that takes source code as input and returns an object representing the program.
  • An environment that provides the framework in which the program executes, including "I/O" devices for text input and text output.
  • A standard way of modifying that object, perhaps in the form of an interface, that allows the program and the environment to be combined to achieve useful results.


Internally, the structure of the interpreter was a bit more complicated. The question was how to go about factoring the two facets of the scripting language, parsing and execution? Three groups of classes resulted -- one for parsing, one for the structural framework of representing parsed and executable programs, and one that formed the base environment class for execution.

  • Digg
  • Reddit
  • SlashDot
  • Stumble
  • del.icio.us
  • Technorati
  • dzone
Comment
Login
Forgot your account info?
Add comment
Anonymous comments subject to approval. Register here for member benefits.
Have a JavaWorld account? Log in here. Register now for a free account.
Resources