Iterating over collections in Java 8

Java 8's declarative iterators should be more readable, less error prone, and easier to parallelize -- so what's the catch?

The Java platform includes a variety of ways to iterate over a collection of objects, including new options based on features introduced in Java 8. In this article John Moore revisits the Iterator design pattern with attention to the difference between active and passive iterators. Learn how Java 8's forEach() method and features of the Stream API can help you fine-tune and parallelize the behavior of Java iterators, then conclude with some performance benchmarks that might surprise you.

Anytime you have a collection of things you will need some mechanism to systematically step though the items in that collection. As an everyday example, consider the television remote control, which lets us iterate over various television channels. Similarly, in the programming world, we need a mechanism to systematically step through a collection of software objects. The mechanism used for this purpose is known by various names, including index (for iterating over an array), cursor (for iterating over the results of a database query), enumeration (in early versions of Java), and iterator (in more recent versions of Java).

I first learned to program in an early version of FORTRAN, where the only data structuring capability was an array. I quickly learned how to iterate over an array using an index and a DO-loop. From there it was only a short mental leap to the idea of using a common index into multiple arrays to simulate an array of records. Most programming languages have features similar to arrays, and they support straightforward looping over arrays. But modern programming languages also support more complex data structures such as lists, sets, maps, and trees, where the capabilities are made available via public methods but the internal details are hidden in private parts of the class. Programmers need to be able to traverse the elements of these data structures without exposing their internal structure, which is the purpose of iterators.

The Iterator pattern

An iterator is a mechanism that permits all elements of a collection to be accessed sequentially, with some operation being performed on each element. In essence, an iterator provides a means of "looping" over an encapsulated collection of objects. Examples of using iterators include

  • Visit each file in a directory (aka folder) and display its name.
  • Visit each node in a graph and determine whether it is reachable from a given node.
  • Visit each customer in a queue (for instance, simulating a line in a bank) and find out how long he or she has been waiting.
  • Visit each node in a compiler's abstract syntax tree (which is produced by the parser) and perform semantic checking or code generation. (You could also use the Visitor pattern in this context.)

Certain principles hold for the use of iterators: In general, you should be able to have multiple traversals in progress at the same time; that is, an iterator should allow for the concept of nested looping. An iterator should also be nondestructive in the sense that the act of iteration should not, by itself, change the collection. Of course the operation being performed on the elements in a collection could possibly change some of the elements. It might also be possible for an iterator to support removing an element from a collection or inserting a new element at a particular point in the collection, but such changes should be explicit within the program and not a byproduct of the iteration. In some cases, you will also need to have iterators with different traversal methods; for instance, preorder and postorder traversal of a tree, or depth-first and breadth-first traversal of a graph.

Active iterators versus passive iterators

There are two general approaches to implementing an iterator depending on who controls the iteration. For an active iterator (also known as explicit iterator or external iterator), the client controls the iteration in the sense that the client creates the iterator, tells it when to advance to the next element, tests to see if every element has been visited, and so on. This approach is common in languages like C++, and it is the approach that receives the most attention in the GoF book. Although iterators in Java have taken different forms, using an active iterator was essentially the only viable option prior to Java 8.

For a passive iterator (also known as an implicit iterator, internal iterator, or callback iterator), the iterator itself controls the iteration. The client essentially says to the iterator, "perform this operation on the elements in the collection." This approach is common in languages like LISP that provide anonymous functions or closures. With the release of Java 8, this approach to iteration is now a reasonable alternative for Java programmers.

To illustrate the various approaches to iteration in Java, I need an example of a collection and something that needs to be done with its elements. For the initial part of this article I'll use a collection of strings representing names of things. For each name in the collection, I will simply print its value to standard output. These basic ideas are easily extended to collections of more complicated objects (such as employees), and where the processing for each object is a little more involved (like giving each highly rated employee a 4.5 percent raise).

Iteration in the early versions of Java — Enumerations

In Java 1.0 and 1.1, the two primary collection classes were Vector and Hashtable, and the Iterator design pattern was implemented in a class called Enumeration. In retrospect this was a bad name for the class. Do not confuse the class Enumeration with the concept of enum types, which didn't appear until Java 5. Today both Vector and Hashtable are generic classes, but back then generics were not part of the Java language. The code to process a vector of strings using Enumeration would look something like Listing 1.

Listing 1. Using an Enumeration to iterate over a Vector of strings

Vector names = new Vector();

// ... add some names to the collection

Enumeration e = names.elements();
while (e.hasMoreElements())
    String name = (String) e.nextElement();

Iteration in the middle versions of Java — Iterator

Java 1.2 introduced the collection classes that we all know and love, and the Iterator design pattern was implemented in a class appropriately named Iterator. Because we didn't yet have generics in Java 1.2, casting an object returned from an Iterator was still necessary. For Java versions 1.2 through 1.4, iterating over a list of strings might resemble Listing 2.

Listing 2. Using an Iterator to iterate over a List of strings.

List names = new LinkedList();

// ... add some names to the collection

Iterator i = names.iterator();
while (i.hasNext())
    String name = (String);

Iteration after Java 5 — generics and the enhanced for-loop

Java 5 gave us generics, the interface Iterable, and the enhanced for-loop. The enhanced for-loop is one of my all-time-favorite small additions to Java. The creation of the iterator and calls to its hasNext() and next() methods are not expressed explicitly in the code, but they still take place behind the scenes. Thus, even though the code is more compact, we are still using an active iterator. Using Java 5, our example would look something like what you see in Listing 3.

Listing 3. Using generics and the enhanced for-loop over a List of strings.

List<String> names = new LinkedList<String>();

// ... add some names to the collection

for (String name : names)

Java 7 gave us the diamond operator, which reduces the verbosity of generics. Gone were the days of having to repeat the type used to instantiate the generic class after invoking the new operator! In Java 7 we could simplify the first line in Listing 3 above to the following:

List<String> names = new LinkedList<>();

Iteration in Java 8 — the forEach() method

Before delving into Java 8 iteration features, let's reflect on what's wrong with the code shown in the previous listings — which is, well, nothing really. There are millions of lines of Java code in currently deployed applications that use active iterators similar to those shown in my listings. Java 8 simply provides additional capabilities and new ways of performing iteration. For some scenarios, the new ways can be better.

The major new features in Java 8 center on lambda expressions, along with related features such as streams, method references, and functional interfaces. These new features in Java 8 allow us to seriously consider using passive iterators instead of the more conventional active iterators. In particular, the Iterable interface provides a passive iterator in the form of a default method called forEach().

A default method, another new feature in Java 8, is a method in an interface with a default implementation. In this case, the forEach() method is actually implemented using an active iterator in a manner similar to what you saw in Listing 3.

Collection classes that implement Iterable (for example, all list and set classes) now have a forEach() method. This method takes a single parameter that is a functional interface. Therefore the actual parameter passed to the forEach() method is a candidate for a lambda expression. Using the features of Java 8, our running example would evolve to the form shown in Listing 4.

Listing 4. Iteration in Java 8 using the forEach() method

List<String> names = new LinkedList<>();

// ... add some names to the collection

names.forEach(name -> System.out.println(name));
1 2 Page 1
Page 1 of 2