Introduction to scripting in Java, Part 1

Learn what makes a scripting language like Ruby shine and why Groovy's suddenly so groovy, in this two-part excerpt from the forthcoming Scripting in Java: Languages, Frameworks, and Patterns (Addison Wesley Professional, August 2007).


Typing strategies

Before I start a discussion on typing strategies implemented in different programming languages, I have to explain what types are.

There is no simple way to explain what typing is because its definition depends on the context in which it is used. Also, a whole branch of mathematics is dedicated to this issue. It is called type theory, and its proponents have the following saying, which emphasizes their attitude toward the importance of this topic:

Design the type system correctly, and the language will design itself.

To put it simply, types are metadata that describe the data stored in some variable. Types specify what values can be stored in certain variables, as well as the operations that can be performed on them.

Type constraints determine how we can handle and operate on a variable. For example, what happens when you add the value of one variable to that of another depends on whether the variables are integers, floats, Booleans or strings. A programming language's type system could classify the value hello as a string and the value 7 as a number. Whether you can mix strings with numbers in this language depends on the language's type policy.
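As a small illustration of this point, here is a sketch in Python (the variable names are only illustrative) showing that the same + operator behaves differently depending on the types of its operands:

```python
# The same operator means different things for different types.
int_sum = 7 + 3              # integer addition
float_sum = 7.5 + 3          # the integer is promoted to a float first
text = "hello" + " world"    # for strings, + means concatenation
bool_sum = True + True       # in Python, bool is a subtype of int

print(int_sum, float_sum, text, bool_sum)  # prints: 10 10.5 hello world 2
```

Other languages make different choices for each of these cases, which is exactly what the language's type policy decides.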

Some types are native (or primitive), meaning they are built into the language. The usual representatives of this type category are Booleans, integers, floats, characters and even strings in some languages. These types have no visible internal structure.

Other types are composite, and are constructed of primitive types. In this category, we have structures and various so-called container types, such as lists, maps and sets. In some languages, string is defined as a list of characters, so it can be categorized as a composite type.

In object-oriented languages, developers can create their own types, known as classes. This type category is called user-defined types. The big difference between structures and classes is that with a class you define not just the structure of your complex data, but also its behavior and the operations you can perform on it. Every class is therefore a distinct type in its own right, whereas structures (in C, for example) are treated as a single kind of type.
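The contrast between a plain structure and a class can be sketched in Python. This is only an approximation: Python has no C-style struct, so a namedtuple stands in for one here, and the names PointRecord and Point are made up for the example.

```python
from collections import namedtuple

# Structure-like type: named fields and nothing else (a stand-in for a C struct).
PointRecord = namedtuple("PointRecord", ["x", "y"])


class Point:
    """A user-defined type that bundles data with the operations on it."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to_origin(self):
        # Behavior lives next to the data it operates on.
        return (self.x ** 2 + self.y ** 2) ** 0.5


record = PointRecord(3, 4)
point = Point(3, 4)
print(record.x, record.y)          # only field access is available
print(point.distance_to_origin())  # prints: 5.0
```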

Type systems provide the following major benefits:

  • Safety — Type systems are designed to catch the majority of type-misuse mistakes made by developers. In other words, types make it practically impossible to code some operations that cannot be valid in a certain context.

  • Optimization — As I already mentioned, languages that employ static typing result in programs with better-optimized machine code. That is because early type checks provide useful information to the compiler, making it easier to allocate optimized space in memory for a certain variable. For example, there is a great difference in memory usage when you are dealing with a Boolean variable vs. a variable containing some random text.

  • Abstraction — Types allow developers to make better abstractions in their code, enabling them to think about programs at a higher level of abstraction, not bothering with low-level implementation of those types. The most obvious example of this is in the way developers deal with strings. It is more useful to think of a string as a text value rather than as a byte array.

  • Modularity — Types allow developers to create APIs for the subsystems used to build applications. Typing localizes the definitions required for interoperability of subsystems and prevents inconsistencies when those subsystems communicate.

  • Documentation — Use of types in languages can improve the overall documentation of the code. For example, a declaration that some method's arguments are of a specific type documents how that method can be used. The same is true for return values of methods and variables.
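The documentation benefit in particular is easy to sketch. The following Python example uses optional type annotations. The function name repeat is hypothetical, and note that Python itself does not enforce these annotations at runtime; they mainly serve readers and external tools.

```python
def repeat(text: str, times: int) -> str:
    """Return text repeated the given number of times."""
    return text * times


# The signature alone tells a caller what to pass and what comes back.
print(repeat("ab", 3))         # prints: ababab
print(repeat.__annotations__)  # the declared types are available for inspection
```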

Now that we know the basic concepts of types and typing systems, we can discuss the type strategies implemented in various languages. We also discuss how the choice of implemented typing system defines languages as either scripting (dynamic) or static.

Dynamic typing

The type-checking process verifies that the constraints introduced by types are being respected. System-programming languages traditionally perform type checking at compile time. This is referred to as static typing.

Scripting languages favor another approach to typing. With this approach, type checking is done at runtime. One obvious consequence of runtime checking is that all errors caused by inappropriate use of a type are triggered at runtime. Consider the following example:

x = 7
y = "hello world"
z = x + y

This code snippet defines an integer variable, x, and a string variable, y, and then tries to assign to the z variable the sum of the x and y values. If the language has not defined a + operator for these two types, different things happen depending on whether the language is statically or dynamically typed. If the language were statically typed, this problem would be discovered at compile time, so the developer would be notified of it and forced to fix it before even being able to run the program. If the language were dynamically typed, the program would be executable, but when it tried to execute this problematic line, a runtime error would be triggered.
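Python is one dynamically typed language that defines no + operator for an integer and a string, so it can be used to sketch the runtime behavior described above. The try/except wrapper is added here only so the example finishes cleanly.

```python
x = 7
y = "hello world"

# The program starts and runs normally up to the problematic line.
print("running...")

try:
    z = x + y            # no + is defined for int and str
except TypeError as error:
    z = None             # the misuse is only detected when this line executes
    print("runtime type error:", error)
```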

Dynamic typing usually allows a variable to change type during program execution. For example, the following code would generate a compile-time error in most statically typed programming languages:

x = 7
x = "Hello world"

On the other hand, this code would be legal in a purely dynamically typed language, simply because no type is being misused here.

Dynamic typing is usually implemented by tagging each value with its type. For example, in our previous code snippet, the value of variable x after the first line would be internally represented as a pair (7, number). After the second line, the value would be internally represented as a pair ("Hello world", string). When an operation is executed on the variable, the type tag is checked and a runtime error is triggered if misuse is discovered. Because no misuse is detected in the previous example, the code snippet runs without raising any errors.
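The tag can be observed directly in a dynamically typed language. In this Python sketch, the built-in type() function reports the type that travels with the current value of x; the names first_tag and second_tag are illustrative only.

```python
x = 7
first_tag = type(x).__name__    # the value 7 carries the tag "int"

x = "Hello world"
second_tag = type(x).__name__   # the new value carries the tag "str"

print(first_tag, second_tag)    # prints: int str
```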

I comprehensively discuss the pros and cons of these approaches later in this chapter, but for now, it is important to note a key benefit of dynamic typing from the developer's point of view. Programs written in dynamically typed languages tend to be shorter than equivalent solutions written in statically typed languages. This is an implication of the fact that developers have more freedom in terms of expressing their ideas when they are not constrained by a strict type system.

Weak typing

There is yet another categorization of programming-language typing strategy. Some languages raise an error when a programmer tries to execute an operation on variables whose types are not suitable for that operation (type misuse). These languages are called strongly typed languages. On the other hand, weakly typed languages implicitly cast (convert) a variable to a suitable type before the operation takes place.

To clarify this, let's return to our first example of adding a number and a string variable. In a strongly typed environment, which most system-programming languages employ, this operation results in an error (a compile-time error in a statically typed language) if no operator is defined for these types. In a weakly typed language, the integer value would usually be converted to its string representation ("7" in this case) and concatenated with the other string value (supposing that the + operator represents string concatenation in this case). The result would be a z variable with the value "7hello world" and the string type.
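Python is strongly typed, so it will not perform this conversion on its own, but the behavior of a weakly typed + operator can be imitated. The helper weak_add below is hypothetical and exists only to make the implicit conversion visible.

```python
def weak_add(left, right):
    """Hypothetical helper that mimics a weakly typed + operator."""
    if isinstance(left, str) or isinstance(right, str):
        # Implicit conversion: coerce both operands to strings, then concatenate.
        return str(left) + str(right)
    return left + right


x = 7
y = "hello world"
z = weak_add(x, y)
print(z)                # prints: 7hello world
print(weak_add(7, 3))   # prints: 10 (no conversion needed)
```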

Most scripting languages tend to be dynamic and weakly typed, but not all of them use these policies. For example, Python, a popular scripting language, employs dynamic typing, but it is strongly typed. We discuss in more detail the strengths and weaknesses of these typing approaches, and how they can fit into the overall system architecture, later in this chapter and in Chapter 2.

Data structures

To complete common programming tasks, developers usually need a variety of complex data structures. Language mechanisms that make these structures easy to handle directly affect developers' efficiency.

Scripting languages generally provide more powerful and flexible built-in data types than traditional system-programming languages. It is natural to see data structures such as lists, sets, maps, and so on, as native data types in such languages.

Of course, it is possible to implement an arbitrary data structure in any language, but the point is that these data structures are embedded natively in the language syntax, making them easier to learn and use. Also, without this standard implementation, novice developers are often tempted to create their own solution, which is usually not robust enough for production use.

As an example, let's look at Python, a popular dynamic language with lists and maps (also called dictionaries) as native language types. You can use these structures with other language constructs, such as a for loop. Look at the following example of defining and iterating a simple list:

names = ["Mike", "Joe", "Bruce"]   # avoid shadowing the built-in name list
for item in names:
    print(item)

As you can see, the Python code used in this example to define a list is short and natural. But more important is the for loop, which is designed to naturally traverse this kind of data. Both of these features make for a comfortable programming environment and thus save some time for developers.
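The same convenience applies to the map (dictionary) type mentioned earlier. The following sketch uses made-up ages purely as sample data:

```python
# A dictionary literal; the ages are invented sample values.
ages = {"Mike": 31, "Joe": 27, "Bruce": 45}

# The for loop traverses key/value pairs just as naturally as list items.
for name, age in ages.items():
    print(name, age)

# Built-in functions work directly on the structure.
oldest = max(ages, key=ages.get)
print(oldest)  # prints: Bruce
```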

Java developers may argue that Java collections provide the same capability, but prior to J2SE 1.5, the equivalent Java code would look like this:

String[] arr = new String[]{"Mike", "Joe", "Bruce"};
List list = Arrays.asList(arr);
for (Iterator it = list.iterator(); it.hasNext(); ) {
      System.out.println(it.next());
}

Even for this simple example, the Java code is almost twice as long as the equivalent Python code, and harder to read. In J2SE 1.5, Java got some features that brought it closer to these scripting concepts. With the more flexible for loop, you could rewrite the preceding example as follows:

String[] arr = new String[]{"Mike", "Joe", "Bruce"};
List<String> list = Arrays.asList(arr);  // typed list, so each item is a String
for (String item : list) {
    System.out.println(item);
}

With this in mind, we can conclude that data structures are an important part of programming, and that native language support for commonly used structures can therefore improve developers' productivity. Many scripting languages come with flexible, built-in data structures, which is one of the reasons why they are often categorized as "human-oriented."

Code as data

The code and data in compiled system programming languages are two distinct concepts. Scripting languages, however, attempt to make them more similar. As I said earlier, programs (code) in scripting languages are kept in plain text form. Language interpreters naturally treat them as ordinary strings.

Evaluation

It is not unusual for the commands (built-in functions) in scripting languages to evaluate a string (data) as a language expression (code). For example, in Python, you can use the eval() function for this purpose:

x = 9
print(eval("x + 7"))  # eval() accepts an expression, not a print statement

This code prints 16 on execution: the string is evaluated as regular Python code, and the name x inside it resolves to the variable defined in the surrounding program.

More important is the fact that scripted programs can generate new programs and execute them "on the fly". Look at the following Python example:

with open("temp.py", "w") as temp:
    temp.write("print(x + 7)")
x = 9
with open("temp.py") as temp:
    exec(temp.read())

In this example, we created a file called temp.py and wrote a Python statement into it. At the end of the snippet, the built-in exec() function executed the contents of the file (Python 2 also offered execfile() for this purpose), at which point 16 was displayed on the console.

This concept is natural to interpreted languages because the interpreter is already running on the given host executing the current script. Evaluation of the script generated at runtime is not different from evaluation of other regular programs. On the other hand, for compiled languages this could be a challenging task. That is because a compile/link phase is introduced during conversion of the source code to the executable program. With interpreted languages, the interpreter must be present in the production environment, and with compiled languages, the compiler (and linker) is usually not part of the production environment.
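To make the idea of a program generating and running another program more concrete, here is a minimal Python sketch that skips the temporary file and keeps the generated source in a string. The names make_adder_source and add_amount are hypothetical.

```python
def make_adder_source(amount):
    """Hypothetical generator: returns Python source code as plain text."""
    return "def add_amount(value):\n    return value + {}\n".format(amount)


source = make_adder_source(7)   # the new program is just data at this point

namespace = {}
exec(source, namespace)         # the running interpreter turns the data into code

result = namespace["add_amount"](9)
print(result)  # prints: 16
```

Executing the generated code in its own dictionary, rather than in the caller's global scope, is a design choice that keeps the generated names from leaking into the rest of the program.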

Closures

Scripting languages also introduce a mechanism for passing blocks of code as method arguments. This mechanism is called a closure. A good way to demonstrate closures is to use methods to select items in a list that meet certain criteria.

Imagine a list of integer values. We want to select only those values greater than some threshold value. In Ruby, a scripting language that supports closures, we can write something like this:

orig = [4, 12, 25, 7]   # sample values for illustration
threshold = 10
new_list = orig.select { |item| item > threshold }

The select() method of the collection object accepts a closure, defined between the {}, as an argument. If parameters must be passed, they can be defined between the ||. In this example, the select() method iterates over the collection, passing each item to the closure (as an item parameter) and returning a collection of items for which the closure returned true.

Another thing worth noting in this example is that closures can refer to variables visible in the scope in which the closure is created. That's why we could use the threshold variable, defined outside the closure, inside it.
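For comparison with the other examples in this article, roughly the same selection can be sketched in Python, where a lambda plays the role of the Ruby block. The sample values in orig are made up.

```python
orig = [4, 12, 25, 7]   # sample values for illustration
threshold = 10

# The lambda refers to threshold, a variable from the enclosing scope.
new_list = list(filter(lambda item: item > threshold, orig))
print(new_list)  # prints: [12, 25]
```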

Closures in scripting languages are not different from any other data type, meaning methods can accept them as parameters and return them as results.
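Both directions, accepting a closure as a parameter and returning one as a result, can be shown in a short Python sketch. The functions greater_than and select are hypothetical names chosen for this example.

```python
def greater_than(threshold):
    """Return a closure that remembers the threshold it was created with."""
    def check(item):
        return item > threshold
    return check                 # a closure returned as a result


def select(items, predicate):
    """Accept a closure as a parameter and apply it to each item."""
    return [item for item in items if predicate(item)]


above_ten = greater_than(10)     # stored in a variable like any other value
print(select([4, 12, 25, 7], above_ten))          # prints: [12, 25]
print(select([4, 12, 25, 7], greater_than(20)))   # prints: [25]
```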
