Clojure: Challenge your Java assumptions

Let this functional language for the JVM redefine your approach to software design

Clojure is a dynamic functional language for the JVM, recently released in version 1.0. Clojure offers a new set of programming techniques for robust code and rapid development. In particular, it has new solutions for multicore computing. Whether you make the shift to Clojure or stick to Java, learning about this new language will challenge your assumptions about the best way to design software.

Clojure is a new language for the JVM. Like Groovy, Jython, and JRuby, it offers dynamism, conciseness, and seamless interoperability with Java.

Clojure is a dialect of Lisp, recently released in version 1.0. Developers often dismiss Lisp as impractical, perhaps because of its distinctive syntax, its ascetic simplicity, or the academic uses it's often applied to. Clojure is set to break that curse. Rich Hickey designed the language to make it easy and practical to take on the same sorts of problems you handle with Java, more robustly and with less code.

Any new programming language, no matter how good, has to find its killer app to break through into widespread use. Clojure's killer app is parallel programming for multicore CPUs, which are now the major route to increased processing power. With its immutable datatypes, lockless concurrency, and simple abstractions, Clojure makes multithreading simpler and more robust than in Java.

I'll describe some of Clojure's distinctive features and show how applying lessons from the language can make your Java code more elegant and less buggy. I hope that you come away wanting to learn more.

Code as data

Let's start with the simple function in Listing 1, which calculates the area of a circle.

Listing 1. A simple Clojure function

(defn circle-area [r]
  (* Math/PI r r))

Clojure code looks quite different from Java code, for a simple reason. In Clojure, code is data; it is built from exactly the same lists and vectors as any other data structure. The consistent homoiconic syntax makes it easier for both programmers and programs to understand and manipulate the code.

So, the function definition in Listing 1 is nothing but a list (marked with parentheses), holding a vector (marked with square brackets) and another list. The list syntax in the first line defines the function. After the function name, a vector holds the parameters. And in the last line, the list syntax invokes the function of multiplication, applied to three operands.

Clojure's minimalist syntax becomes quite readable with even a little practice. Support for Clojure in all major development environments -- including NetBeans, IntelliJ, and Eclipse, as well as vi and Emacs -- makes it even easier to handle. Figure 1 shows an example from VimClojure, where paired parentheses are matched by color. (The function extracts lower-case characters from a string: (get-lower "AbCd") is "bd".)

Figure 1. Clojure support in Vim

In fact, because of its lightweight syntax, a Clojure program is often simpler than the equivalent Java. The Java getLower() function in Listing 2, for example, has twice as many brackets and four times as much code as the Clojure function.

Listing 2. A Java function -- more complicated than the Clojure equivalent

public static String getLower(String s) {
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < s.length(); i++) {
    char ch = s.charAt(i);
    if (Character.isLowerCase(ch)) {
      sb.append(ch);
    }
  }
  return sb.toString();
}

In Java, like other languages, code is transformed into an abstract syntax tree as part of compilation. Java allows access to this structure's classes, fields, and methods using reflection, but the access is read-only, and there is no access to the method implementations. Clojure macros, in contrast, can freely manipulate the syntax tree, letting you implement functionality that ordinary code can't handle. With macros, you can transform code elements while conditioning, wrapping, and deferring evaluation.

Here's a familiar and painful example. To process data from a reader or stream in Java, you must jump through some hoops, as shown in Listing 3.

Listing 3. Simple functionality, complicated in Java

BufferedReader rdr = null; 
try { 
    rdr = new BufferedReader(new FileReader(fileName)); 
    //core processing logic goes here 
} finally { 
    if (rdr != null) { 
       try { 
          rdr.close(); 
       } catch (IOException e) { } 
    } 
}

Most of the code in Listing 3 is boilerplate, wrapping around the part of the design that varies, namely the data processing. Java often, as here, fails to capture the constant features in reusable code. The essence of good software engineering is knowing what is likely to change and what is not, and Clojure makes it easy to create reusable new code structures with macros, such as the one in Listing 4 (from the core library).

Listing 4. A Clojure macro

(defmacro with-open [bindings & body]
  `(let ~bindings
     (try
       ~@body
       (finally
         (.close ~(first bindings))))))

(with-open [rdr (java.io.BufferedReader. (java.io.FileReader. "a.txt"))]
  (println (.readLine rdr)))

The macro receives a vector, bindings, which holds a symbol and a reader or stream. In the second line, the symbol is bound to the reader. The macro also receives the body, the code that's wrapped in the try statement. In the last two lines, the macro is executed, calling println as its "data processing" logic.

A macro rearranges the code after it is parsed but before it is executed; the body itself runs only when the expanded code is executed, just as a function body runs only when the function is called. Together with Clojure's dynamic typing and unchecked exceptions, this macro makes the Clojure code not only more reusable, but also more readable.

Clojure macros have some resemblance to C macros: Both rearrange code in a preprocessing step, before execution. But whereas the C preprocessor works on the code as a string, Clojure reuses the expressiveness of the programming language itself to manipulate the code as a data structure.
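For comparison, here is a rough sketch of how far Java can go without macros: the boilerplate of Listing 3 is captured once, and the varying body is passed in as an object. The names WithReader, ReaderBody, and withOpen are hypothetical, and the lambda syntax assumes Java 8 or later. Unlike the macro in Listing 4, this works only because the body fits a predefined interface; it cannot introduce a new binding form or rearrange the caller's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of any library: the "execute around" idiom.
final class WithReader {

    // The part of the design that varies: what to do with the open reader.
    interface ReaderBody<T> {
        T apply(BufferedReader rdr) throws IOException;
    }

    // The part that stays constant: wrap the source, run the body, always close.
    static <T> T withOpen(Reader source, ReaderBody<T> body) throws IOException {
        BufferedReader rdr = new BufferedReader(source);
        try {
            return body.apply(rdr);
        } finally {
            rdr.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the FileReader of Listing 3.
        String first = withOpen(new StringReader("line one\nline two"),
                                rdr -> rdr.readLine());
        System.out.println(first); // prints "line one"
    }
}
```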

The last two lines of Listing 4 illustrate the ease of Clojure/Java interoperability: java.io.BufferedReader. -- with the dot at the end -- is a constructor invocation, and .readLine calls the method. In Listing 1, Math/PI accesses a static field. Java can similarly call Clojure code. Clojure can also inherit from Java classes, and vice versa.

Concurrency with pure functional programming

Java has built-in support for multithreading. But concurrency in Java is still difficult. Without locks in the right places, race conditions corrupt data; with locks in the wrong places, starvation and deadlock slow or stop threads. In practice, most developers write single-threaded applications or let application servers manage the threads. But when a single application must decompose a problem into parallel lines of execution and coordinate a solution, there's little choice but to write explicitly multithreaded code.

Deadlock-inducing anti-patterns

Read about three concurrency anti-patterns sure to lead to deadlock in Obi Ezechukwu's High Performance Java blog.

Multicore chips make this more urgent. With single-core CPUs, multithreading was mostly used to allow one task to run while others blocked on I/O. But with today's CPUs, true concurrency is needed to keep multiple cores running at their fullest. With Clojure's pure functional programming and its special multithreading constructs, writing threadsafe code becomes much easier.

By default, Clojure functions are pure functions: they receive parameters and pass back a return value, with no changes to any accessible state. Different state requires a new object. For example, we can define a map (marked with curly braces), then add a key to the map with assoc:

(let [m {:roses "red", :violets "blue"}]  
  (assoc m :sugar "sweet"))

The result is a new map with the additional key: {:sugar "sweet", :violets "blue", :roses "red"}.

The original map remains unchanged.

This would seem to make for inefficient copying every time a change is needed, but in fact it enables a nice optimization: immutable objects, like the two maps above, can share parts of their underlying structure, with no risk of changes to one structure affecting the other one inadvertently.
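To see structure sharing in Java terms, consider this minimal sketch of an immutable linked list. It is an illustration only: Clojure's own maps and vectors use more elaborate tree structures, and the class PList and its methods are hypothetical names. The point is that "adding" an element builds one new node that points at the old list, which is safe precisely because the old list can never change.

```java
// A minimal persistent (immutable) singly linked list, for illustration only.
final class PList<T> {
    final T head;         // first element (null in the empty list)
    final PList<T> tail;  // rest of the list: shared, never copied
    final int size;

    private PList(T head, PList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <T> PList<T> empty() {
        return new PList<T>(null, null, 0);
    }

    // Returns a new list whose tail IS the existing list.
    PList<T> cons(T item) {
        return new PList<T>(item, this, size + 1);
    }

    public static void main(String[] args) {
        PList<String> base = PList.<String>empty().cons("violets").cons("roses");
        PList<String> more = base.cons("sugar");
        System.out.println(base.size);         // prints 2
        System.out.println(more.size);         // prints 3
        System.out.println(more.tail == base); // prints true
    }
}
```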

Pure functions are easy for programmers to understand. Because there are no side effects, everything relevant appears in the functions' arguments and return value, simplifying debugging and testing.

Pure functions are also easy for Clojure itself to reason about, which makes them safe to optimize. Pure function calls can run in parallel, without regard for order; they can run on separate cores without the risk of interfering with one another's results. They can be rerun safely when a transaction fails, and their execution can be deferred until the results are needed. Their results can be memoized -- cached for lookup on subsequent invocations.

It really works. Clojure does these and other optimizations safely, and with little effort on your part.
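Memoization is the easiest of these optimizations to try in Java. The sketch below is one possible approach, assuming Java 8 or later; Memo and memoize are hypothetical names. It is safe only if the wrapped function is pure, non-recursive, and never takes or returns null. The counter exists solely to show that the second call is served from the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class Memo {

    // Wraps a pure function with a cache keyed by its argument.
    static <A, R> Function<A, R> memoize(Function<A, R> f) {
        Map<A, R> cache = new ConcurrentHashMap<>();
        return arg -> cache.computeIfAbsent(arg, f);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger(); // counts real computations
        Function<Double, Double> circleArea = memoize(r -> {
            calls.incrementAndGet();
            return Math.PI * r * r;
        });
        circleArea.apply(2.0);
        circleArea.apply(2.0);           // same argument: looked up, not recomputed
        System.out.println(calls.get()); // prints 1
    }
}
```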

In your Java, using immutability and side-effect-free functions will make it easier for you to optimize and avoid bugs. Where possible, declare classes and all their fields final, setting their value in the constructor. You can also add partial safety to mutable objects with wrappers like Collections.unmodifiableCollection().
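A minimal sketch of that advice follows. The Bookshelf class is a hypothetical example: the class and its fields are final, the constructor copies its input before wrapping it with Collections.unmodifiableCollection(), and "changing" the shelf returns a new object.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable class: final class, final fields, no setters.
final class Bookshelf {
    private final String owner;
    private final Collection<String> books;

    Bookshelf(String owner, Collection<String> books) {
        this.owner = owner;
        // Copy first, then wrap, so later changes to the caller's
        // collection cannot leak into this object.
        this.books = Collections.unmodifiableCollection(new ArrayList<String>(books));
    }

    String owner() { return owner; }

    Collection<String> books() { return books; }

    // Different state requires a new object; this one is left untouched.
    Bookshelf shelve(String book) {
        List<String> copy = new ArrayList<String>(books);
        copy.add(book);
        return new Bookshelf(owner, copy);
    }

    public static void main(String[] args) {
        Bookshelf empty = new Bookshelf("reader", new ArrayList<String>());
        Bookshelf one = empty.shelve("Hamlet");
        System.out.println(empty.books().size()); // prints 0
        System.out.println(one.books().size());   // prints 1
        try {
            one.books().add("Macbeth");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view"); // prints "read-only view"
        }
    }
}
```

Note that the wrapper alone gives only partial safety, as the text says: without the defensive copy, whoever holds the original list could still change it behind the wrapper.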

Strings are the best-known immutable Java objects. Because they never change, the JVM can intern them and cache their hashcodes to minimize the creation of new objects. Such optimizations are rare in Java; in Clojure they are quite ordinary.

Thread-safe state

Not everything can be immutable. Mutable state is essential for any input or output, whether to the disk, network, or GUI. Managing the mutable state is all the more difficult when multiple threads are involved. Clojure provides special constructs to support these cases safely.

Typical thread-safe data structures in Java are locked with synchronized. This blocks threads, slows execution, and poses the risk of a deadlock.

A Clojure Ref is a lockless alternative, using an innovative concurrency model called software transactional memory. As in optimistic database transactions, multiple threads can simultaneously try to update the same variables without blocking; in cases of conflict among multiple writes, one thread rolls back and retries the function.

Listing 5 defines a Ref that wraps a set, marked with #{}, holding the books in our bookshelf. Any thread can safely shelve or unshelve a book, adding (conj) or removing (disj) the item by making the Ref refer to a new set. All changes to the referenced value are done in a transaction with dosync (which is unrelated to the Java synchronized keyword).

Listing 5. Defining a Ref

(def bookshelf (ref #{}))
(defn shelve [book]
  (dosync (alter bookshelf conj book)))
(defn unshelve [book]
  (dosync (alter bookshelf disj book)))

You can retrieve the value, using @bookshelf, without a transaction.

The result is a simple threadsafe in-memory transactional database. The complexities of locking are hidden. No thread need wait for another, and each thread sees a consistent view of the data.
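The optimistic retry idea can be imitated in Java for a single value. The sketch below is an analogy, not software transactional memory: it uses an AtomicReference with a compare-and-set loop, and it cannot coordinate changes across several references the way a dosync transaction can. The class AtomicBookshelf and its alter method are hypothetical names chosen to echo Listing 5.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class AtomicBookshelf {
    // The reference always points at an immutable snapshot of the set.
    private final AtomicReference<Set<String>> shelf =
        new AtomicReference<Set<String>>(Collections.<String>emptySet());

    // Compute a new set from the old one; if another thread changed the
    // reference in the meantime, discard the attempt and retry.
    private void alter(UnaryOperator<Set<String>> f) {
        while (true) {
            Set<String> before = shelf.get();
            Set<String> after = Collections.unmodifiableSet(f.apply(before));
            if (shelf.compareAndSet(before, after)) {
                return;
            }
        }
    }

    void shelve(String book) {
        alter(old -> { Set<String> s = new HashSet<String>(old); s.add(book); return s; });
    }

    void unshelve(String book) {
        alter(old -> { Set<String> s = new HashSet<String>(old); s.remove(book); return s; });
    }

    // Reading needs no lock: callers get a consistent snapshot.
    Set<String> books() { return shelf.get(); }

    public static void main(String[] args) throws InterruptedException {
        final AtomicBookshelf shelf = new AtomicBookshelf();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) shelf.shelve("book-" + id + "-" + i);
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println(shelf.books().size()); // prints 400
    }
}
```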

A Clojure Agent runs a function asynchronously, using a separate thread from a thread pool. You can then retrieve its value when it's done running. For example, the code below maintains a log -- a sequence of strings:

(def log (agent []))
(send log conj "2009-03-28 10:34 Shelved Hamlet")

This code first sets up the agent, wrapping an empty vector, then adds records by sending the conj function to the agent, along with the item to add. Though conj returns quickly, if we had sent the agent a long-running function, it would be valuable to let the agent update without blocking the calling thread.

Some of the same concurrency designs are also useful in Java multithreading. In addition to keeping mutability to a minimum, it's a good idea to control the points of cross-thread shared mutable state carefully. When possible, avoid building your designs from low-level primitives like synchronized and wait(), and use higher-level abstractions such as those in the java.util.concurrent package. (Where needed, the Java multithreading constructs are also available in Clojure.)
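As one illustration of the higher-level style, the sketch below splits a sum across a thread pool with ExecutorService, Callable, and Future. The class ParallelSum and the method sumTo are hypothetical names, and the lambda syntax assumes Java 8 or later. Each task works only on its own local variables, so there is no shared mutable state, no synchronized block, and no wait().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSum {

    // Sums the integers 1..n by handing independent chunks to a thread pool.
    static long sumTo(int n, int chunks) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            int step = (n + chunks - 1) / chunks; // chunk size, rounded up
            for (int start = 1; start <= n; start += step) {
                final int lo = start;
                final int hi = Math.min(n, start + step - 1);
                tasks.add(() -> {
                    long s = 0;
                    for (int i = lo; i <= hi; i++) s += i;
                    return s;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has finished.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumTo(1000, 4)); // prints 500500
    }
}
```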

A Clojure Var provides a way to rebind a variable temporarily within a thread. It passes information long-distance like a global variable. It does this safely, because the value is only visible in a single thread, and only in the dynamic scope of the binding's runtime invocation.

A Java thread-local variable is similar: It passes state "long distance," skipping levels of the call stack, while avoiding the cross-thread risks of static fields. However, it continues to live throughout the thread and is not restricted to a well-defined dynamic scope, as a Var binding is.
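The difference shows up when you try to imitate a Var binding with a ThreadLocal. In this sketch, RequestContext and withRequest are hypothetical names and the lambdas assume Java 8 or later. The dynamic scope has to be built by hand: set the value, run the body, and restore the previous value in a finally block. A Var binding does this bookkeeping for you.

```java
import java.util.function.Supplier;

final class RequestContext {
    // Per-thread slot for the "current request"; other threads see their own value.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<String>();

    static String current() { return CURRENT.get(); }

    // Stands in for code deep in the call stack: it reads the request
    // without receiving it as a parameter.
    static String describe() {
        return "handling " + CURRENT.get();
    }

    // Hand-rolled dynamic scope: set, run the body, then restore.
    static <T> T withRequest(String request, Supplier<T> body) {
        String previous = CURRENT.get();
        CURRENT.set(request);
        try {
            return body.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    public static void main(String[] args) {
        String result = withRequest("GET /books", () -> describe());
        System.out.println(result);    // prints "handling GET /books"
        System.out.println(current()); // prints "null"
    }
}
```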

For example, the Webjure Web framework handles HTTP requests with a binding of the relevant HTTP *request* and *response* objects. All request-handling code has access to these objects, and there is no need to pass them down the stack as an argument to every function. Other threads do not see this value, and each HTTP request receives its own objects. Even within the thread, the new value is visible only in the scope of the binding -- that is, during the handling of this single request, as shown in Listing 6.

Listing 6. Var bindings in Webjure

(binding [*request* request *response* response]
  (binding [*matched-handler* (find-handler (request-path *request*))]
    ((*matched-handler* :handler)))) ; This invokes the request-handler

Type hints

Clojure compiles on the fly, producing bytecode that runs as fast as any Java. However, slower reflective invocations are necessary when the compiler can't infer the types of parameters, which, of course, happens often in a dynamically typed language.

Here, we set Clojure to warn if reflection has to be used, then define the function:

(set! *warn-on-reflection* true)  
(defn year [cal]
  (.get cal java.util.Calendar/YEAR)) 
Reflection warning, line: 3 - call to get can't be resolved.

We can, however, inform the compiler of the type with #^java.util.Calendar. This "metadata" (extra information orthogonal to the main purpose of the object) lets the compiler avoid reflection and generate fast bytecode on the fly:

(defn year [#^java.util.Calendar cal]
  (.get cal java.util.Calendar/YEAR))
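In Java terms, the two versions of year correspond roughly to the two methods in this sketch. It is an analogy rather than a description of the bytecode Clojure actually emits, and YearLookup is a hypothetical name. Without a type, the method must be looked up by name at run time; with a type, the call is resolved at compile time.

```java
import java.lang.reflect.Method;
import java.util.Calendar;
import java.util.GregorianCalendar;

final class YearLookup {

    // Typed parameter: the compiler resolves Calendar.get directly.
    static int yearDirect(Calendar cal) {
        return cal.get(Calendar.YEAR);
    }

    // Untyped parameter: the method is found by name on the runtime class,
    // then invoked reflectively, which is slower.
    static int yearReflective(Object cal) throws Exception {
        Method get = cal.getClass().getMethod("get", int.class);
        return (Integer) get.invoke(cal, Calendar.YEAR);
    }

    public static void main(String[] args) throws Exception {
        Calendar cal = new GregorianCalendar(2009, Calendar.MARCH, 28);
        System.out.println(yearDirect(cal));     // prints 2009
        System.out.println(yearReflective(cal)); // prints 2009
    }
}
```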

In Java, annotations likewise let you add extra information to source code. Annotations, however, are less powerful than Clojure metadata. They can be set only at development time, are limited to simple types like strings, do not directly affect behavior, and require static type definitions of their own. Thus, they are rarely used in practice except by framework developers.

By the way, though just-in-time compilation is convenient, you can also compile Clojure at development time, as you do for Java. In this way, Clojure becomes just another Java library -- a good way to bypass the usual resistance to new languages, and to Lisp in particular, from managers and customers.

Clojure: Challenging your Java assumptions

Clojure has clean, minimal syntax -- the beauty of Lisp transformed for the modern world. It lets you safely optimize and parallelize your code, and extend the language with macros. The JVM provides a well-supported infrastructure and access to a tremendous array of libraries. And even as you continue coding in Java, your new knowledge will highlight limitations you have always had to work around -- even if you never noticed them before.

Acknowledgments

My thanks to Dimitry Gashinsky, Stuart Halloway, Raoul Duke, Christophe Grand, Boris Melamed, Stuart Sierra, Robert Bauer, Chas Emerick, Laurent Petit, and Paul Stadig for their comments. Any mistakes that remain are mine.

Joshua Fox has worked with Java since 1996. His experience with dynamic languages in the JVM runs from JavaScript-controlled VoIP telephony to an implementation of a Python-based ontological expression language. Two of his articles on JRuby appear in JavaWorld.