Jul 24, 2006 1:00 AM PT

Javalution

Play around with Snobol and Infiqs

Since Java's inception, the language's usefulness has increased through Sun Microsystems' introduction of new language features—ranging from inner classes to generics, annotations, covariant return types, and more. But Sun isn't the only one to extend the Java language: various third-party products have made Java more useful by introducing new language features and translating extended Java source code to Sun-standard Java.

Many programming languages pre-date Java, resulting in an enormous amount of legacy source code. Because migrating all of this code to Java is costly, other third-party products have been developed to compile legacy source code to classfiles and to interpret legacy source code. These products increase Java's usefulness by increasing Java's software base; they also extend the useful life of legacy source code.

Third-party products that extend Java or migrate legacy source code to Java (resulting in software that is part Java and part non-Java) contribute to Java's evolution—or Javalution. In this installment of Java Fun and Games, I introduce two such products: the Infiqs macro expander and the Snobol3 language interpreter.

Along with their practical uses, both products make Java programming more enjoyable and interesting. Use Infiqs to specify big decimal-based numeric expressions with simple operators, instead of writing these expressions as lengthy, hard-to-read, and error-prone sequences of constructor and method calls. Use Snobol3 to merge Snobol3 source code with Java, resulting in interesting part-Java/part-Snobol3 hybrids.

Note
Unlike previous Java Fun and Games installments, which were written from the perspective of J2SE 1.4, this installment requires Java SE 5.0 (Java SE is Sun's new name for J2SE).

Infiqs macro expander

Harold Kaplan's Infiqs—an intentional misspelling of the word infix—is a macro expander that lets programmers use infix operators to perform various arithmetic operations on instances of java.math.BigDecimal. This product is distributed in the infiqs.zip distribution file, requires Java SE 5.0, and is freeware. (See Resources for a link to Kaplan's Website, where you can download infiqs.zip.)

The infiqs.zip distribution file contains the Infiqs application's Infiqs.class and Inf.class classfiles. It also contains this application's Infiqs.java source file, making it possible to modify Infiqs. Several example source files with .infiqs file extensions and an HTML documentation file round out the distribution.

Infiqs can save you time and frustration in situations where you specify complex mathematical expressions that involve BigDecimal objects. Rather than build lengthy sequences of BigDecimal object-creation expressions and method calls (which can become tedious to write and are error-prone), you specify short macros that begin with the dec keyword. Consider the following code fragment:

 BigDecimal x;
dec x = "1" + "2" ;

After specifying BigDecimal variable x, the code fragment employs an Infiqs macro to create two BigDecimal objects containing 1 and 2, to add the second object's 2 to the first object's 1, and to assign the first object's reference to x. (Notice the mandatory space character after dec and before the also mandatory semicolon.) Macro expansion yields the code fragment below:

 BigDecimal x;
x=new BigDecimal("1",mc).add(new BigDecimal("2",mc),mc);

What's with mc? Java SE 5.0 introduced java.math.MathContext to support the concept of precision (number of decimal digits) for big decimals. Infiqs requires that a MathContext object be created and assigned to MathContext variable mc before specifying macros. Behind the scenes, Infiqs passes mc to BigDecimal constructors and method calls.
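To see the role of mc in isolation, here is a minimal, self-contained sketch of the expanded fragment. The class name MathContextDemo, the helper method names, and the precision values are assumptions chosen for illustration; they are not part of Infiqs.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class MathContextDemo {
    // Hand-written equivalent of the macro  dec x = "1" + "2" ;
    static BigDecimal onePlusTwo(MathContext mc) {
        return new BigDecimal("1", mc).add(new BigDecimal("2", mc), mc);
    }

    // Shows how the precision stored in mc limits the digits of a result.
    static BigDecimal oneThird(MathContext mc) {
        return new BigDecimal("1", mc).divide(new BigDecimal("3", mc), mc);
    }

    public static void main(String[] args) {
        MathContext mc = new MathContext(40);             // 40 digits: an assumption
        System.out.println(onePlusTwo(mc));               // prints 3
        System.out.println(oneThird(new MathContext(5))); // prints 0.33333
    }
}
```

Without a MathContext, the division in oneThird() would throw an ArithmeticException because 1/3 has no exact decimal representation, which is one reason a precision-carrying object is convenient to pass around.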

Cube roots

To demonstrate the usefulness of Infiqs, Kaplan presents an example that obtains a number's cube root via Newton-Raphson, a numerical analysis algorithm that repeatedly evaluates the expression $x_{i+1} = x_i - f(x_i)/f'(x_i)$. Here $x_{i+1}$ is the latest approximation of a root, $x_i$ is the previous approximation, $f(x)$ is the function whose root is being sought, and $f'(x)$ is the first derivative (slope) of $f(x)$.

Whereas Kaplan focuses on finding the cube root of 17, I've generalized the example to locate any number's cube root. My example expresses the Newton-Raphson expression $x_{i+1} = x_i - (x_i^3 - n)/(3x_i^2)$ as the Infiqs macro dec x = x - ( x ^ 3 - n ) / ( "3" * x ^ 2 ) ;. Listing 1's source code puts this macro into the context of an application that finds the cube root of an arbitrary n.
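The step from the general Newton-Raphson formula to the cube root expression can be spelled out as follows. A cube root of n is a root of the function f(x) = x^3 - n, so substituting that function and its derivative into the general formula gives:

```latex
\begin{aligned}
f(x) &= x^3 - n, \qquad f'(x) = 3x^2 \\
x_{i+1} &= x_i - \frac{f(x_i)}{f'(x_i)} = x_i - \frac{x_i^3 - n}{3x_i^2}
\end{aligned}
```

As a quick check, taking n = 11.5 and a starting value of 1 gives 1 - (1 - 11.5)/3 = 4.5 after the first iteration.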

Listing 1. CubeRoot.infiqs

 

// CubeRoot.infiqs

import java.math.*;

public class CubeRoot
{
   // Maximum number of iterations performed by Newton-Raphson loop.

   final static int MAXITER = 15;

   public static void main (String [] args)
   {
      // Validate number of command line arguments.

      if (args.length != 1)
      {
          System.err.println ("usage : java CubeRoot n");
          System.err.println ("example: java CubeRoot 19");
          System.err.println (" Find the cube root of 19");
          return;
      }

      // Specify a math context with 40 digits of precision.

      MathContext mc = new MathContext (40);

      // Specify the number whose cube root is being sought.

      BigDecimal n = new BigDecimal (args [0], mc);

      // Specify the starting value in the search for the cube root.

      BigDecimal x;
      dec x = "1" ;

      // Search for the cube root via the Newton-Raphson loop. Output each
      // successive iteration's value.

      for (int i = 0; i < MAXITER; i++)
      {
           dec x = x - ( x ^ 3 - n ) / ( "3" * x ^ 2 ) ;
           System.out.println (x);
      }
   }
}

By convention, Infiqs source files are assigned .infiqs file extensions. To convert an Infiqs source file to an equivalent Java source file, run the Infiqs application, identifying the Infiqs source file as the standard input source. You should also specify a filename with a .java file extension as the standard output destination (unless you want to examine macro expansion on the screen):

 java Infiqs <CubeRoot.infiqs >CubeRoot.java

Infiqs reads from the standard input source one line at a time. Lines not beginning with dec are passed unchanged to the standard output destination. If a line begins with dec and is terminated with a semicolon character, Infiqs evaluates the macro and copies equivalent Java source code to the standard output destination. Listing 2 presents CubeRoot's equivalent Java source code.

Listing 2. CubeRoot.java

 

// CubeRoot.infiqs

import java.math.*;

public class CubeRoot
{
   // Maximum number of iterations performed by Newton-Raphson loop.

   final static int MAXITER = 15;

   public static void main (String [] args)
   {
      // Validate number of command line arguments.

      if (args.length != 1)
      {
          System.err.println ("usage : java CubeRoot n");
          System.err.println ("example: java CubeRoot 19");
          System.err.println (" Find the cube root of 19");
          return;
      }

      // Specify a math context with 40 digits of precision.

      MathContext mc = new MathContext (40);

      // Specify the number whose cube root is being sought.

      BigDecimal n = new BigDecimal (args [0], mc);

      // Specify the starting value in the search for the cube root.

      BigDecimal x;
x=new BigDecimal("1",mc);

      // Search for the cube root via the Newton-Raphson loop. Output each
      // successive iteration's value.

      for (int i = 0; i < MAXITER; i++)
      {
x=x.subtract(x.pow(3,mc).subtract(n,mc).divide(new BigDecimal("3",mc).multiply(x.pow(2,mc),mc),mc),mc);
           System.out.println (x);
      }
   }
}

After comparing Listings 1 and 2, I think you'll agree with me that the dec x = x - ( x ^ 3 - n ) / ( "3" * x ^ 2 ) ; macro is much easier to understand than the longer x=x.subtract(x.pow(3,mc).subtract(n,mc).divide(new BigDecimal("3",mc).multiply(x.pow(2,mc),mc),mc),mc); expression.

Note
Whenever you introduce a new big decimal into an Infiqs macro, such as 3 in the "3" * x ^ 2 portion of the previous macro, you must surround the big decimal with double quotes, which tells Infiqs to create a new BigDecimal object.

Compile CubeRoot.java and run the resulting application. For example, invoke java CubeRoot 11.5 to discover how Newton-Raphson finds 11.5's cube root. In response, you should notice the following output, which indicates (at least to 40 digits of precision) that 2.257178717737000689722531351332293570729 is the cube root of 11.5:

 4.5
3.189300411522633744855967078189300411523
2.503065206710545285332351274464042799496
2.280542237899026579351609436198669505057
2.257417253127543076236357536600910449407
2.257178742941525167756167000357339173250
2.257178717737000971165927797364245467219
2.257178717737000689722531351332328663380
2.257178717737000689722531351332293570729
2.257178717737000689722531351332293570729
2.257178717737000689722531351332293570729
2.257178717737000689722531351332293570729
2.257178717737000689722531351332293570729
2.257178717737000689722531351332293570729
2.257178717737000689722531351332293570729
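One way to convince yourself that the final value really is a cube root is to cube it and compare the result with n. The sketch below wraps the same Newton-Raphson step in a method and reports the residual; the class name CubeRootCheck and the cubeRoot() and residual() method names are hypothetical and do not appear in the Infiqs distribution.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class CubeRootCheck {
    // Same Newton-Raphson step as the expanded macro, wrapped in a method.
    static BigDecimal cubeRoot(BigDecimal n, MathContext mc, int iterations) {
        BigDecimal three = new BigDecimal("3", mc);
        BigDecimal x = new BigDecimal("1", mc);   // starting value
        for (int i = 0; i < iterations; i++) {
            x = x.subtract(
                    x.pow(3, mc).subtract(n, mc)
                     .divide(three.multiply(x.pow(2, mc), mc), mc),
                    mc);
        }
        return x;
    }

    // Distance between x cubed and n; a good root leaves a tiny residual.
    static BigDecimal residual(BigDecimal x, BigDecimal n, MathContext mc) {
        return x.pow(3, mc).subtract(n, mc).abs();
    }

    public static void main(String[] args) {
        MathContext mc = new MathContext(40);
        BigDecimal n = new BigDecimal("11.5", mc);
        BigDecimal x = cubeRoot(n, mc, 15);
        System.out.println("root     = " + x);
        System.out.println("residual = " + residual(x, n, mc));
    }
}
```

The residual should be many orders of magnitude smaller than n, limited only by the 40 digits of precision carried by the math context.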

Additional examples

The Infiqs distribution file includes several examples that enhance BigDecimal with trigonometric and other capabilities. For instance, SinD.infiqs introduces a static BigDecimal sin(BigDecimal x, MathContext mc) method for calculating a big decimal's sine value. These examples use Infiqs macros to simplify their operations.
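To give a feel for what a method with that signature might involve once the macros have been expanded, here is a rough sketch of a Taylor-series sine written directly against BigDecimal. This is not Kaplan's SinD code, whose algorithm I am not reproducing here; the class name SineSketch, the stopping threshold, and the 200-term cap are all assumptions, and the sketch expects a math context with nonzero precision and a reasonably small argument.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class SineSketch {
    // Taylor series: sin(x) = x - x^3/3! + x^5/5! - ...
    // No argument reduction is attempted, so keep |x| small.
    static BigDecimal sin(BigDecimal x, MathContext mc) {
        BigDecimal xSquared = x.multiply(x, mc);
        BigDecimal term = x;   // current series term, starting with x
        BigDecimal sum = x;
        // Stop once a term is too small to affect the requested precision.
        BigDecimal tiny = BigDecimal.ONE.movePointLeft(mc.getPrecision() + 2);
        for (int k = 1; k <= 200 && term.abs().compareTo(tiny) > 0; k++) {
            // Each term is the previous one times -x^2 / ((2k)(2k+1)).
            BigDecimal divisor = BigDecimal.valueOf((2L * k) * (2L * k + 1));
            term = term.multiply(xSquared, mc).divide(divisor, mc).negate();
            sum = sum.add(term, mc);
        }
        return sum;
    }

    public static void main(String[] args) {
        MathContext mc = new MathContext(30);
        System.out.println(sin(new BigDecimal("0.5"), mc));
    }
}
```

Every arithmetic step in the loop is a chained method call that takes mc, which is exactly the kind of code an Infiqs macro would let you write as a short infix expression instead.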

Snobol3 language interpreter

From 1962 to 1967, AT&T Bell Laboratories' David J. Farber, Ralph E. Griswold, and Ivan P. Polonsky developed Snobol (String Oriented Symbolic Language), a text-manipulation language with a pattern-matching algorithm that is considered more powerful (in many respects) than regular expressions (strings whose patterns describe sets of strings).

The first version of Snobol did not come with built-in functions, a limitation that Snobol2 addressed. The third version, Snobol3, supported user-defined functions. Snobol4, the last major implementation of this language, came with improved patterns, numeric data types, arrays, structures, and tables. Snobol4 also influenced its descendant, the Icon language.

In late 2005, Dennis Heimbigner of the University of Colorado's Software Engineering Research Laboratory released a Java implementation of Snobol3. This product is distributed in the s3-1.0.jar distribution file, requires Java SE 5.0, and is released under the BSD license. (See Resources for a link to Heimbigner's Website, where you can download s3-1.0.jar.)

The s3-1.0.jar distribution file contains the Java source code to a Snobol3 language interpreter, examples, documentation, and more. Because this is a source-only distribution, you will need to build the interpreter's classfiles. The easiest (and recommended) way to accomplish this task is to use Apache's Ant build tool. I used version 1.6.5.

Before running Apache Ant, I copied s3-1.0.jar to c:\snobol3 on my Microsoft Windows platform. I then invoked jar xvf s3-1.0.jar to extract all files from this jar file. I changed to the automatically created c:\snobol3\s3-1.0 directory, where I ran Ant against the extracted build.xml file. After a successful build, I discovered s3.jar in s3-1.0.

Greetings from Snobol3

The s3.jar file contains the Snobol3 interpreter's classfiles. Before running this application, you need a Snobol3 program's source code to interpret. Because it's traditional to demonstrate an unfamiliar language with a short program that outputs a Hello, World message, look at Listing 3, which presents the appropriate source code.

Listing 3. hw.sno

 

* hw.sno
* Traditional "Hello, World" program in Snobol3.

      SYSPOT = 'Hello, World'

Listing 3 shows hw.sno's contents. (Although file extensions are not required, Snobol source files are usually given the extension .sno.) Lines with asterisks (*) in Column 1 are considered to be comments; the interpreter ignores them. Blank lines are also ignored. The line containing SYSPOT = 'Hello, World' outputs Hello, World to the standard output device.

To interpret Snobol3 source code, specify java -jar s3.jar <options> filename. The <options> is a whitespace-delimited list of command line options (I identify some command line options later). filename identifies the source file to interpret. To interpret hw.sno, for example, invoke java -jar s3.jar hw.sno. You'll see the following output:

 Hello, World

Language tour

Unlike many other programming languages (including Snobol4), Snobol3 has only a single data type—the string. Because the Snobol3 language interpreter is written in Java, strings may contain any Unicode character, although the -escapes command line option must be specified to include Unicode-based escape sequences (like '\u00ff'), in addition to standard escape sequences (such as '\n').

Strings may appear literally in source code by placing them between single-quote characters. Double-quote characters may be used as well, but the -dquotes command line option must be specified. Keep in mind that a literal string's beginning and ending quotes must be the same (either single or double). If no characters appear between the quotes, the string is considered to be empty.

Strings are stored in variables, which must exist before they are accessed. Each variable's name consists of alphanumeric characters and periods. Unlike many other languages, a variable name can begin with a digit; it can begin with a period as well. Because Snobol3 is a case-sensitive language, two variables with the same name but different cases (x versus X) are considered distinct.

Strings are assigned to variables via the varname = string syntax, where = is Snobol3's assignment operator. The string concatenation operator appends a string, expressed literally or via a variable name, to another literal string or variable name; two strings are concatenated into a single string by placing whitespace characters between them:

      name = 'Jeff'
      Name = 'Java'
      SYSPOT = 'Hello, ' name '. How are you?'
      SYSPOT = 'Hello, ' Name '. How are you?'

Although Snobol3 has no numeric data types, this language specifies a few operators that perform arithmetic on literal strings and variables that contain integers. These operators include addition (+), subtraction (-), multiplication (*), division (/), and negation (-). Arithmetic operators have the usual precedence; parentheses can be used to change precedence:

 

      i = '100'
      SYSPOT = i + '8'
      SYSPOT = i - '8'
      SYSPOT = i * '8'
      SYSPOT = i / '8'
      SYSPOT = '-8'
      SYSPOT = (i + '3') * '2'
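For readers more at home in Java, the idea of performing arithmetic on strings that contain integers can be modeled roughly as shown below. This is only an illustration of the semantics, not the interpreter's actual code; the StringArithmetic class and its apply() method are hypothetical, and I am assuming that division truncates toward zero as Java's integer division does.

```java
class StringArithmetic {
    // Parse both operands as integers, apply the operator, return a string.
    static String apply(String a, char op, String b) {
        long x = Long.parseLong(a);
        long y = Long.parseLong(b);
        switch (op) {
            case '+': return Long.toString(x + y);
            case '-': return Long.toString(x - y);
            case '*': return Long.toString(x * y);
            case '/': return Long.toString(x / y); // assumed: truncating division
            default:  throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        String i = "100";
        System.out.println(apply(i, '+', "8"));                  // prints 108
        System.out.println(apply(i, '-', "8"));                  // prints 92
        System.out.println(apply(i, '*', "8"));                  // prints 800
        System.out.println(apply(i, '/', "8"));                  // prints 12
        System.out.println(apply(apply(i, '+', "3"), '*', "2")); // prints 206
    }
}
```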

A Snobol3 program consists of statements—one statement per line. Each statement is specified according to the [label] statement body/branch syntax. If the label field is present, it must begin in Column 1. If the branch field is present, it must begin with a forward slash. Even the statement body field is optional.

A label is a sequence of letters and digits that identifies the destination of a branch operation. By convention, I capitalize labels. A branch is a go-to that transfers execution to another part of a Snobol3 program. (Unfortunately, Snobol3 doesn't offer any other control-flow mechanisms.) There are three kinds of branches:

  • Branch on failure: The /F(label) or /f(label) syntax transfers execution to label if the statement body evaluates to failure.
  • Branch on success: The /S(label) or /s(label) syntax transfers execution to label if the statement body evaluates to success.
  • Unconditional branch: Occasionally, you'll want to transfer execution without regard for a statement body's success/failure. The /(label) syntax unconditionally transfers execution to label.

Success and failure are important to Snobol3's branch on failure and branch on success. They are signals that indicate whether a statement body succeeded or failed. For example, an attempt to read from the standard input device when there is no more input results in failure. It's helpful to think of success as a Boolean true value and failure as a Boolean false value, although they are never stored in variables.

Snobol3 associates input and output operations with the predefined SYSPOT and SYSPPT variables. When a string is assigned to SYSPOT, the string is printed to the standard output device on its own line. Similarly, each time SYSPPT is accessed, a line of text (less the trailing new line) is read from the standard input device. SYSPOT and SYSPPT must be uppercase.

Heimbigner has extended Snobol3 with predefined stdin, stdout, and stderr variables, which must be lowercase. The stdin variable is an alias for SYSPPT. Similarly, stdout aliases SYSPOT. (There is no variable for stderr to alias.) Listing 4 demonstrates this alias usage in a program that copies standard input to standard output.

Listing 4. copy.sno

 

* copy.sno
* Copy standard input to standard output.

      n = '0'
COPY  stdout = SYSPPT /F(DONE)
      n = n + '1' /(COPY)
DONE  stderr = n ' lines were copied.'

Listing 4 tracks the number of lines copied from standard input to standard output. As long as SYSPPT returns a line, the line is tallied and an unconditional branch takes execution back to the COPY-labeled statement. But once SYSPPT indicates no more lines (a failure), the branch on failure transfers execution to the DONE-labeled statement.
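If the label-and-branch style is unfamiliar, it may help to see the same control flow written as a Java loop. In this sketch, the end of input reported by hasNextLine() plays the role of SYSPPT signaling failure, and the loop itself replaces the /(COPY) branch. The CopyLines class and its copy() method are hypothetical names used only for this comparison.

```java
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.Scanner;

class CopyLines {
    // Copies every line from in to out and returns the number of lines copied.
    static int copy(Reader in, PrintWriter out) {
        Scanner scanner = new Scanner(in);
        int n = 0;
        // hasNextLine() returning false corresponds to the branch on failure.
        while (scanner.hasNextLine()) {
            out.println(scanner.nextLine());
            n++;
        }
        out.flush();
        return n;
    }

    public static void main(String[] args) {
        int n = copy(new InputStreamReader(System.in), new PrintWriter(System.out));
        System.err.println(n + " lines were copied.");
    }
}
```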

Snobol3 comes with several built-in functions. For example, EQUALS(s, t) compares two strings for equality, in the Java s.equals(t) sense, and signals success if both strings are equal. Similarly, UNEQL(s, t) compares two strings for inequality, in the Java !s.equals(t) sense; success is signaled if both strings are not equal. Listing 5 demonstrates EQUALS(s, t).

Listing 5. cmp.sno

 

* cmp.sno
* Compare text entered via the standard input device against a password.

         stdout = 'Enter your password'
         word = stdin
         EQUALS(word, 'password') /S(SUCCESS)
         stdout = 'Password not valid' /(FINISH)
SUCCESS
         stdout = 'Password successfully entered'
FINISH

Although passwords shouldn't be specified in source code, Listing 5 nicely shows off EQUALS(s, t), which signals success and enables the branch to SUCCESS if the entered and hard-coded passwords are equal. Don't place whitespace characters between EQUALS (or any other built-in function name) and its opening ( character; doing so causes an error. Also, most function names can be specified in lowercase.

In addition to using built-in functions, you can define your own functions with Snobol3's built-in DEFINE(s,t,u...) function: s names the function and lists its parameters, t specifies the label that identifies the function's first statement, and u lists variables to be given local scope within the function. Variables outside of functions have global scope.

Invoking a user-defined function causes its arguments to be evaluated and assigned to corresponding parameters, which are treated as variables with local scope; listed local variables are initialized to empty strings. The function returns a value by assigning this value to a variable with the same name as the function: this pseudo-variable has local scope. Listing 6 demonstrates function definition, invocation, and return.

Listing 6. fact.sno

 

* fact.sno
* Compute a list of factorials from 0 through 10 inclusive.

       define('fact(n)', 'FACT') /(MAIN)
FACT   fact = .le(n, '1') '1' /S(RETURN)
       fact = n * fact(n - '1') /(RETURN)

MAIN   i = '0'
MAIN1  stdout = i ' ' fact(i)
       i = i + '1'
       .LE(i, '10') /S(MAIN1)

Listing 6 uses define('fact(n)', 'FACT') /(MAIN) to define a recursive factorial function. The /(MAIN) unconditional branch is specified to transfer execution around the function body; otherwise the interpreter outputs an error. The first statement in the function body is identified by the FACT label.

The labeled fact = .le(n, '1') '1' /S(RETURN) statement determines whether or not recursion continues by comparing n with 1. If n is less than or equal to 1, the built-in .LE(i, j) function signals success, 1 is assigned to the pseudo-variable fact, and /S(RETURN) returns execution from the function.

If .LE(i, j) signals failure, fact = n * fact(n - '1') /(RETURN) executes. The /(RETURN) unconditional branch is necessary to ensure that recursion takes place: assignment to a pseudo-variable requires a return from a recursive call. To see the results of this recursion, examine the following output:

 0 1
1 1
2 2
3 6
4 24
5 120
6 720
7 5040
8 40320
9 362880
10 3628800
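For comparison, the same recursion looks like this in Java. The Factorial class is a hypothetical name; the early return corresponds to the /S(RETURN) branch taken when n is less than or equal to 1, and the loop in main() stands in for the MAIN1 statements.

```java
import java.math.BigInteger;

class Factorial {
    // Recursive factorial; returns 1 when n is less than or equal to 1.
    static BigInteger fact(int n) {
        if (n <= 1) {
            return BigInteger.ONE;
        }
        return BigInteger.valueOf(n).multiply(fact(n - 1));
    }

    public static void main(String[] args) {
        // Print each value from 0 through 10 followed by its factorial.
        for (int i = 0; i <= 10; i++) {
            System.out.println(i + " " + fact(i));
        }
    }
}
```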

Snobol3's greatest strength lies in its support for pattern matching, the process of examining a subject string for a certain combination of characters. The subject is the first statement element after the (optional) label. This element is then followed by one or more whitespace characters, and the pattern—I show this syntactically below:

 [label] subject pattern

The pattern match succeeds if the pattern is found in the subject; otherwise the match fails. To determine what statement to execute following a success or a failure, a branch field follows the pattern. Although you can specify both success and failure branches, as shown syntactically below, I haven't found a good reason to do this:

 [label] subject pattern /S(label1) F(label2)

Pattern matching is based on pattern elements, special character sequences that perform different kinds of pattern matches. Three examples are StringMatch (match entire pattern), Arb (match arbitrary characters), and Len (match a fixed-length string). Listing 7 demonstrates these pattern elements; it also shows you how to replace the portion of a subject that matches a pattern.

Listing 7. pm.sno

 

* pm.sno
* Pattern matching.

         subject = 'The quick brown fox jumped over the lazy ox.'
         stdout = 'Subject: The quick brown fox jumped over the lazy ox.'
         stdout = 'Enter a pattern'
         pattern = stdin
         stdout = ''

* StringMatch: Match entire pattern

         subject pattern /F(NOMATCH)
         stdout = pattern ' found in ' subject /(NEXT)
NOMATCH  stdout = pattern ' not found in ' subject
NEXT
         x = '?'
         text = 'Mountain'

* Arb: Match arbitrary characters

         text 'o' *x* 'a'
         stdout = x

* Len: Match fixed-length string

         text *x/'4'*
         stdout = x

* Len with replacement

         text *x/'4'* = 'Foun'
         stdout = 'x = ' x ' text = ' text

Listing 7's text 'o' *x* 'a' statement demonstrates an Arb match in the text subject variable. This match returns, in variable x, all characters between o and a. Statement text *x/'4'* demonstrates a Len match: the first four characters are returned in x. Statement text *x/'4'* = 'Foun' replaces these characters with Foun.
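If you already know java.util.regex, the three pattern elements can be approximated with regular expressions, which may make their behavior easier to picture. The sketch below is only an analogy and says nothing about how the interpreter matches patterns internally; the PatternAnalogues class and its method names are hypothetical, and I am assuming that Arb prefers the shortest run of characters, which the reluctant .*? quantifier mimics.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternAnalogues {
    // StringMatch analogue: is the pattern text found anywhere in the subject?
    static boolean stringMatch(String subject, String pattern) {
        return subject.contains(pattern);
    }

    // Arb analogue: the shortest run of characters between 'o' and 'a'.
    static String arbBetween(String text) {
        Matcher m = Pattern.compile("o(.*?)a").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Len analogue: the first run of exactly four characters.
    static String firstFour(String text) {
        Matcher m = Pattern.compile(".{4}").matcher(text);
        return m.find() ? m.group() : null;
    }

    // Len with replacement: swap the first four characters for new text.
    static String replaceFirstFour(String text, String replacement) {
        return Pattern.compile(".{4}").matcher(text)
                      .replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "Mountain";
        System.out.println(arbBetween(text));               // prints unt
        System.out.println(firstFour(text));                // prints Moun
        System.out.println(replaceFirstFour(text, "Foun")); // prints Fountain
    }
}
```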

Options fix

The s3-1.0.jar distribution file's s3-1.0\doc directory includes a reference manual: refman.html. This manual offers detailed information about the Snobol3 interpreter. One section lists and describes various command line options that you can pass to the interpreter. Examples include -debug, -stacktrace, -lint, and -dquotes.

Unfortunately, not everything in this manual is correct. For example, if you specify -dquotes (which lets you quote strings within double quotes), the interpreter spits out an error message referring to -dquotes as an unknown option. The same is true of the -escapes option, which allows strings to include standard escape sequences. Fortunately, you can fix these problems.

To fix the -dquotes problem, first change to the s3-1.0\src directory, which contains the Snobol3.java source file. Load this file into a text editor. Near the top of the source code, you'll find a static String [] formals array specification. Append "dquotes?" to the array, close the editor, move up one directory level, and invoke Ant on build.xml.

After Ant announces a successful build, create a simple test.sno file containing this line: stdout = "double quotes supported". (Make sure stdout does not begin in Column 1.) Then invoke java -jar s3.jar -dquotes test.sno. If all goes well, you should see the message double quotes supported instead of an error message.

From Snobol3 to Snobol4

This completes my coverage of Snobol3. To learn more about this language, you'll need to download various source files from the Internet and interpret them with the Snobol3 language interpreter. Unfortunately, many of these files require Snobol4. Because Heimbigner hasn't released a Java-based Snobol4 interpreter, you'll have to implement this interpreter yourself. The following two resources can help you with this task:

  • JPattern: In late 2005, Heimbigner released JPattern, a Java-based product that offers Snobol4-style pattern matching.
  • "A Snobol4 Tutorial": This tutorial will help you learn those Snobol4 features you need to implement. (See Resources for links to these Websites.)

Heimbigner's Snobol3 reference manual is helpful for adding built-in functions to Snobol3. It helped me introduce two Snobol4 features to the Snobol3 language interpreter: the DATE() function (which returns the current date/time) and the &TRIM keyword, implemented as a TRIM(s) function (which removes leading/trailing spaces from string s). Doing so required several changes to s3-1.0\src\Primitive.java:

  1. I placed an import java.util.Date; statement in the imports section, which I needed to access the Date class.
  2. I next appended the following function definition method calls to the static void definePrimitives() method:

     

    fcnDef("date",(p=new $Date()));
    fcnDef("DATE",p);
    fcnDef("trim",(p=new $Trim()));
    fcnDef("TRIM",p);

    The fcnDef() method calls store function definitions in a java.util.HashMap called functions, located in the Snobol3 class. In keeping with Snobol3's policy of letting you call a function by specifying its name entirely in uppercase or in lowercase, I stored the same function definition with both versions of the function's name.

  3. I lastly introduced the following $Date and $Trim classes at the end of the source file:

     

    class $Date extends Primitive
    {
        public $Date() {super();}
        public int nargs() {return 0;}
        public void execute(VM vm, PrimFunction fcn) throws Failure
        {
            vm.cc = false;
            setReturn (vm, new Date ().toString ());
        }
    }

    class $Trim extends Primitive
    {
        public $Trim() {super();}
        public int nargs() {return 1;}
        public void execute(VM vm, PrimFunction fcn) throws Failure
        {
            String s = (String)vm.pop();
            s = s.trim ();
            vm.cc = false;
            setReturn (vm, s);
        }
    }

    The Primitive class's public int nargs() method returns the number of arguments that can be passed to a function. Because many Snobol3 built-in functions require two arguments, this method was overridden in the $Date and $Trim subclasses. Also, some of the Snobol3 language interpreter's VM-related methods and fields were accessed.

After completing the necessary additions to Primitive.java, I used Ant to rebuild s3.jar. Following a successful build, I created a simple source file to test the new DATE() and TRIM(s) functions—and Snobol3's built-in SIZE(s) function, which returns a string's length. This source file's content, followed by the appropriate output, appears below:

 

      stdout = DATE()
      str = '   ABC   '
      stdout = 'str = [' str '], size = ' size(str)
      str = TRIM(str)
      stdout = 'str = [' str '], size = ' size(str)

Wed Jul 12 16:00:39 CDT 2006
str = [   ABC   ], size = 9
str = [ABC], size = 3

Conclusion

Most of the previous Java Fun and Games installments have presented homework for you to accomplish; this installment is no different. To reinforce this article's content, I've prepared the following questions and exercises, which will help you become even better acquainted with the Infiqs macro expander and the Snobol3 language interpreter:

  • What is wrong with the dec x = "1" + 2 ; macro? Why is this wrong?
  • Infiqs uses ^ for its exponentiation operator. What other exponentiation syntax is supported?
  • Snobol3 includes the built-in CALL(s) function for dynamically invoking a function identified by string s. Use this function to invoke Listing 6's fact(n) function, with 6 as the value of n.
  • Extend the Snobol3 language interpreter with built-in LCASE(s) and UCASE(s) functions. Each function has a single string parameter (s) and returns a string. Can you write these functions as user-defined functions (assume that the interpreter hasn't been enhanced)?

I downloaded the Infiqs macro expander and the Snobol3 language interpreter from the Programming Languages for the Java Virtual Machine Website. This Website provides links to more than 200 third-party products that evolve Java in various ways. After you finish playing with Infiqs and Snobol3, I recommend that you check out these other Javalution products.

Jeff Friesen is a freelance software developer and educator specializing in C, C++, and Java technology.

Learn more about this topic