Introduction to scripting in Java, Part 1

Learn what makes a scripting language like Ruby shine and why Groovy's suddenly so groovy, in this two-part excerpt from the forthcoming Scripting in Java: Languages, Frameworks, and Patterns (Addison Wesley Professional, August 2007).

Page 2 of 4

Definition of a scripting language

There are many definitions of the term scripting language, and no single definition fits every language generally regarded as a scripting language. Some people categorize languages by their purpose and others by their features and the concepts they introduce. In this chapter, we discuss all the characteristics defining a scripting language. In Chapter 2, we categorize scripting languages based on their role in the development process.

Compilers vs. interpreters

Strictly speaking, an interpreter is a computer program that executes other high-level programs line by line. Languages executed only by interpreters are called interpreted languages.

To better understand the differences between compilers and interpreters, let's take a brief look at compiler architecture (see Figure 1.1).

As you can see in Figure 1.1, translating source code to machine code involves several steps:

  1. First, the source code (which is in textual form) is read character by character. The scanner groups individual characters into valid lexical units (such as identifiers, reserved words, and so on), called tokens.

  2. The tokens are passed to the parser, which checks that the correct language syntax is being used in the program. In this step, the program is converted to its parse tree representation.

  3. Semantic analysis performs type checking. Type checking validates that all variables, functions, and so on, in the source program have been used consistently with their definitions. The result of this phase is intermediate representation (IR) code.

  4. Next, the optimizer (optionally) tries to produce equivalent but more efficient IR code.

  5. In the final step, the code generator creates target machine code from the optimized IR code. The generated machine code is written as an object file.
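To make step 1 more concrete, here is a minimal sketch of a scanner in Java. It is only an illustration: the class name `MiniScanner`, the three token kinds, and the sample input are assumptions made for this example, and a real scanner would also handle reserved words, string literals, comments, and error reporting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal scanner: groups characters into tokens (step 1 above). */
final class MiniScanner {
    enum Kind { IDENTIFIER, NUMBER, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    /** Reads the source character by character and returns the tokens found. */
    static List<Token> scan(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, source.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, source.substring(start, i)));
            } else {
                // Any other single character is treated as a symbol in this sketch.
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [IDENTIFIER(total), SYMBOL(=), IDENTIFIER(price), SYMBOL(+), NUMBER(42)]
        System.out.println(scan("total = price + 42"));
    }
}
```

The parser in step 2 would consume this token list rather than raw characters, which is why the two phases are usually kept separate.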

Figure 1.1. Compiler architecture

To create one executable file, a linking phase is necessary. The linker takes several object files and libraries, resolves all external references and creates one executable object file. When such a compiled program is executed, it has complete control of its execution.

Unlike compilers, interpreters handle programs as data that can be manipulated in any suitable way (see Figure 1.2).

Figure 1.2. Interpreter architecture

As you can see in Figure 1.2, the interpreter, not the user program, controls program execution. Thus, we can say the user program is passive in this case. So, to run an interpreted program on a host, both the source code and a suitable interpreter must be available. The presence of the program source (script) is the reason why some developers associate interpreted languages with scripting languages. In the same manner, compiled languages are usually associated with system-programming languages.

Interpreters usually support two modes of operation. In the first mode, the script file (with the source code) is passed to the interpreter. This is the most common way of distributing scripted programs. In the second, the interpreter is run in interactive mode. This mode enables the developer to enter program statements line by line, seeing the result of the execution after every statement. The source code is not saved to a file. This mode is important for initial system debugging, as we see later in the book.
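The two modes can be sketched with a toy interpreter written in Java. Everything here is an assumption made for illustration: the class `MiniInterpreter`, and a made-up mini-language whose only statement is `name = term + term ...`, where a term is an integer literal or a previously assigned variable. The point is that the same `execute` method serves both modes, and only the source of the lines differs.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

/** Hypothetical toy interpreter for statements such as "x = 1 + 2". */
final class MiniInterpreter {
    private final Map<String, Integer> variables = new HashMap<>();

    /** Executes one statement and returns the value that was assigned. */
    int execute(String line) {
        String[] sides = line.split("=", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected 'name = expression': " + line);
        }
        int value = 0;
        for (String term : sides[1].split("\\+")) {
            value += evaluate(term.trim());
        }
        variables.put(sides[0].trim(), value);
        return value;
    }

    private int evaluate(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("missing term");
        }
        if (Character.isDigit(term.charAt(0))) {
            return Integer.parseInt(term);
        }
        Integer value = variables.get(term);
        if (value == null) {
            throw new IllegalArgumentException("undefined variable: " + term);
        }
        return value;
    }

    /** Feeds the interpreter line by line; echo=true mimics interactive mode. */
    void run(Scanner input, boolean echo) {
        while (input.hasNextLine()) {
            String line = input.nextLine();
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            int result = execute(line);
            if (echo) {
                System.out.println("=> " + result);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        MiniInterpreter interpreter = new MiniInterpreter();
        if (args.length > 0) {
            // Mode 1: a script file is passed to the interpreter.
            try (Scanner script = new Scanner(Paths.get(args[0]))) {
                interpreter.run(script, false);
            }
            System.out.println(interpreter.variables);
        } else {
            // Mode 2: interactive; each statement's result is shown immediately.
            interpreter.run(new Scanner(System.in), true);
        }
    }
}
```

Run without arguments, the sketch reads statements from standard input and prints each result; run with a file name, it executes the whole script and prints the final variables.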

In the following sections, I provide more details on the strengths and weaknesses of using compilers and interpreters. For now, here are some clear trade-offs between the two approaches that are important for our further discussion:

  • Compiled programs usually run faster than interpreted ones, because no high-level code analysis is done at runtime.

  • An interpreter allows a user program to be modified as it runs, which makes interactive debugging possible. In general, interpreted programs are easier to debug because most interpreters point directly to errors in the source code.

  • Interpreters introduce a certain level of machine independence because no specific machine code is generated.

  • Most important from a scripting point of view, as we see in a moment, interpreters allow the type of a variable to change dynamically. Because the user program is reexamined constantly during execution, variables do not need to have fixed types. This is harder to accomplish with compilers because semantic analysis is done at compile time.
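The last point can be illustrated with a small Java sketch of how an interpreter might store variables. The class `DynamicEnvironment` and its methods are hypothetical names chosen for this example. Values are held as plain `Object` references, so a variable name is not bound to a type, and the type check is performed each time an operation runs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical variable store of an interpreter: names map to values of any type. */
final class DynamicEnvironment {
    private final Map<String, Object> variables = new HashMap<>();

    /** Assignment simply rebinds the name; the previous type does not matter. */
    void assign(String name, Object value) {
        variables.put(name, value);
    }

    /** The type check happens here, at runtime, every time the operation executes. */
    Object add(String left, String right) {
        Object a = variables.get(left);
        Object b = variables.get(right);
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b; // numeric addition
        }
        if (a instanceof String && b instanceof String) {
            return (String) a + (String) b; // string concatenation
        }
        throw new IllegalStateException("cannot add " + describe(a) + " and " + describe(b));
    }

    private static String describe(Object value) {
        return value == null ? "undefined" : value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        DynamicEnvironment env = new DynamicEnvironment();
        env.assign("x", 2);
        env.assign("y", 3);
        System.out.println(env.add("x", "y")); // prints 5

        // The same names now hold strings; no declaration has to change.
        env.assign("x", "two");
        env.assign("y", "three");
        System.out.println(env.add("x", "y")); // prints twothree
    }
}
```

A compiler for a statically typed language would instead decide during semantic analysis which of the two additions applies, which is why the variable's type must be fixed before the program runs.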

From this list, we can conclude that interpreters are better suited for the development process, and compiled programs are better suited for production use. Because of this, some languages offer both an interpreter and a compiler. This means you can reap all the benefits of interpreters in the development phase and then compile a final version of the program for a specific platform to gain better performance.

Many of today's interpreted languages are not interpreted purely. Rather, they use a hybrid compiler-interpreter approach, as shown in Figure 1.3.

Figure 1.3. Hybrid compiler-interpreter architecture

In this model, the source code is first compiled to some intermediate code (such as Java bytecode), which is then interpreted. This intermediate code is usually designed to be very compact (it has been compressed and optimized). It is also not tied to any specific machine. Instead, it is designed for some kind of virtual machine, which can be implemented in software. Basically, the virtual machine represents some kind of processor, and the intermediate code (bytecode) can be seen as the machine language for this processor.

This hybrid approach is a compromise between pure interpreted and compiled languages, due to the following characteristics:

  • Because the bytecode is optimized and compact, interpreting overhead is minimized compared with purely interpreted languages.

  • The platform independence of purely interpreted languages is preserved because the intermediate code can be executed on any host with a suitable virtual machine.
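The hybrid model can be sketched in a few lines of Java. This is a toy under stated assumptions, not a description of the Java virtual machine: the class `MiniVm`, its three opcodes, and the choice of postfix notation as the source language are all invented for the example. The `compile` method plays the role of the compiler and produces a compact, machine-independent instruction array, and the `run` method plays the role of the virtual machine that interprets it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stack-based virtual machine with a three-instruction bytecode. */
final class MiniVm {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;
    static final int MUL = 2;

    /** "Compiler": translates a postfix expression such as "2 3 4 * +" to bytecode. */
    static int[] compile(String postfix) {
        List<Integer> code = new ArrayList<>();
        for (String word : postfix.trim().split("\\s+")) {
            switch (word) {
                case "+":
                    code.add(ADD);
                    break;
                case "*":
                    code.add(MUL);
                    break;
                default:
                    code.add(PUSH);
                    code.add(Integer.parseInt(word));
            }
        }
        return code.stream().mapToInt(Integer::intValue).toArray();
    }

    /** "Virtual machine": interprets the bytecode using an operand stack. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter
        while (pc < code.length) {
            switch (code[pc++]) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalStateException("unknown opcode at " + (pc - 1));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        int[] bytecode = compile("2 3 4 * +"); // 2 + (3 * 4)
        System.out.println(run(bytecode)); // prints 14
    }
}
```

Because the interpreter loop only has to dispatch on small integers, it does none of the scanning or parsing that a pure interpreter repeats at runtime, and the same `int[]` could be run by any host that implements these three opcodes.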

Lately, just-in-time compiler technology has been introduced, which allows the virtual machine to compile bytecode to machine-specific code at runtime to gain performance similar to compiled languages. I mention this technology throughout the book, where applicable.

Source code in production

As some people have pointed out, you should use a scripting language to write user-readable and modifiable programs that perform simple operations and control the execution of other programs. In this scenario, source code should be available in the production system at runtime, so programs are delivered not as object code but as plain text files (scripts) in their original source form. From our previous discussion of interpreters, it is clear that this holds true for purely interpreted languages. Because scripting languages are interpreted, the rule applies to them as well.

Because some scripting languages use a hybrid compilation-interpretation strategy, however, it is also possible to deliver the program in intermediate bytecode form. Bytecode improves execution speed because no compilation step is required at startup. The usual approach is to deliver the necessary libraries as bytecode but not the program itself. This way, execution speed is improved, and the program source is still readable in production. Some hybrid compiler-interpreter languages cache the bytecode for a script in a file on its first execution. On every following execution, if the source hasn't changed, the interpreter uses the cached bytecode, reducing the startup time of the script.
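The caching strategy described above might look roughly like the following Java sketch. All names here are assumptions made for illustration: the class `CompiledScriptCache`, the `.cache` file suffix, and especially the `compile` method, which is only a stand-in that upper-cases the source instead of producing real bytecode. The sketch also assumes that comparing file modification times is a good enough test for "the source hasn't changed"; a real implementation might store a checksum or version stamp instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical on-disk cache of compiled scripts, keyed by file timestamps. */
final class CompiledScriptCache {
    static int compileCount = 0; // counts how often the "compiler" actually ran

    /** Stand-in for a real compiler: the "bytecode" is just the upper-cased source. */
    static byte[] compile(String source) {
        compileCount++;
        return source.toUpperCase().getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the compiled form of a script, reusing the cache file when it is current. */
    static byte[] load(Path script) throws IOException {
        Path cache = script.resolveSibling(script.getFileName() + ".cache");
        boolean cacheIsCurrent = Files.exists(cache)
                && Files.getLastModifiedTime(cache)
                        .compareTo(Files.getLastModifiedTime(script)) >= 0;
        if (cacheIsCurrent) {
            return Files.readAllBytes(cache); // source unchanged: skip compilation
        }
        String source = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
        byte[] compiled = compile(source);
        Files.write(cache, compiled); // remember the result for the next run
        return compiled;
    }

    /** Loads the same temporary script twice and returns how many compilations ran. */
    static int demo() {
        try {
            int before = compileCount;
            Path dir = Files.createTempDirectory("scripts");
            Path script = dir.resolve("hello.script");
            Files.write(script, "print hello".getBytes(StandardCharsets.UTF_8));
            load(script); // first execution: compiles and writes hello.script.cache
            load(script); // second execution: the cached form is reused
            return compileCount - before;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compilations: " + demo()); // prints compilations: 1
    }
}
```

Note that the script itself stays in place as plain text next to its cache file, so the source remains readable and modifiable in production, and editing it simply invalidates the cache.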

As such, the presence of source code in the production environment is one of the characteristics of scripting languages, although you can omit it for performance reasons or if you want to keep your source code secret.
