Iterating over collections in Java 8

java snazzy

Java 8's new declarative iterators should be more readable, less error prone, and easier to parallelize -- so what's the catch?

The Java platform includes a variety of ways to iterate over a collection of objects, including new options based on features introduced in Java 8. In this article John Moore revisits the Iterator design pattern with attention to the difference between active and passive iterators. Learn how Java 8's forEach() method and features of the Stream API can help you fine-tune and parallelize the behavior of Java iterators, then conclude with some performance benchmarks that might surprise you.

Anytime you have a collection of things you will need some mechanism to systematically step though the items in that collection. As an everyday example, consider the television remote control, which lets us iterate over various television channels. Similarly, in the programming world, we need a mechanism to systematically step through a collection of software objects. The mechanism used for this purpose is known by various names, including index (for iterating over an array), cursor (for iterating over the results of a database query), enumeration (in early versions of Java), and iterator (in more recent versions of Java).

I first learned to program in an early version of FORTRAN, where the only data structuring capability was an array. I quickly learned how to iterate over an array using an index and a DO-loop. From there it was only a short mental leap to the idea of using a common index into multiple arrays to simulate an array of records. Most programming languages have features similar to arrays, and they support straightforward looping over arrays. But modern programming languages also support more complex data structures such as lists, sets, maps, and trees, where the capabilities are made available via public methods but the internal details are hidden in private parts of the class. Programmers need to be able to traverse the elements of these data structures without exposing their internal structure, which is the purpose of iterators.

The Iterator pattern

An iterator is a mechanism that permits all elements of a collection to be accessed sequentially, with some operation being performed on each element. In essence, an iterator provides a means of "looping" over an encapsulated collection of objects. Examples of using iterators include

  • Visit each file in a directory (aka folder) and display its name.
  • Visit each node in a graph and determine whether it is reachable from a given node.
  • Visit each customer in a queue (for instance, simulating a line in a bank) and find out how long he or she has been waiting.
  • Visit each node in a compiler's abstract syntax tree (which is produced by the parser) and perform semantic checking or code generation. (You could also use the Visitor pattern in this context.)

Certain principles hold for the use of iterators: In general, you should be able to have multiple traversals in progress at the same time; that is, an iterator should allow for the concept of nested looping. An iterator should also be nondestructive in the sense that the act of iteration should not, by itself, change the collection. Of course the operation being performed on the elements in a collection could possibly change some of the elements. It might also be possible for an iterator to support removing an element from a collection or inserting a new element at a particular point in the collection, but such changes should be explicit within the program and not a byproduct of the iteration. In some cases, you will also need to have iterators with different traversal methods; for instance, preorder and postorder traversal of a tree, or depth-first and breadth-first traversal of a graph.

Active iterators versus passive iterators

There are two general approaches to implementing an iterator depending on who controls the iteration. For an active iterator (also known as explicit iterator or external iterator), the client controls the iteration in the sense that the client creates the iterator, tells it when to advance to the next element, tests to see if every element has been visited, and so on. This approach is common in languages like C++, and it is the approach that receives the most attention in the GoF book. Although iterators in Java have taken different forms, using an active iterator was essentially the only viable option prior to Java 8.

For a passive iterator (also known as an implicit iterator, internal iterator, or callback iterator), the iterator itself controls the iteration. The client essentially says to the iterator, "perform this operation on the elements in the collection." This approach is common in languages like LISP that provide anonymous functions or closures. With the release of Java 8, this approach to iteration is now a reasonable alternative for Java programmers.

To illustrate the various approaches to iteration in Java, I need an example of a collection and something that needs to be done with its elements. For the initial part of this article I'll use a collection of strings representing names of things. For each name in the collection, I will simply print its value to standard output. These basic ideas are easily extended to collections of more complicated objects (such as employees), and where the processing for each object is a little more involved (like giving each highly rated employee a 4.5 percent raise).

Iteration in the early versions of Java — Enumerations

In Java 1.0 and 1.1, the two primary collection classes were Vector and Hashtable, and the Iterator design pattern was implemented in a class called Enumeration. In retrospect this was a bad name for the class. Do not confuse the class Enumeration with the concept of enum types, which didn't appear until Java 5. Today both Vector and Hashtable are generic classes, but back then generics were not part of the Java language. The code to process a vector of strings using Enumeration would look something like Listing 1.

Listing 1. Using an Enumeration to iterate over a Vector of strings


Vector names = new Vector();

// ... add some names to the collection

Enumeration e = names.elements();
while (e.hasMoreElements())
  {
    String name = (String) e.nextElement();
    System.out.println(name);
  }

Iteration in the middle versions of Java — Iterator

Java 1.2 introduced the collection classes that we all know and love, and the Iterator design pattern was implemented in a class appropriately named Iterator. Because we didn't yet have generics in Java 1.2, casting an object returned from an Iterator was still necessary. For Java versions 1.2 through 1.4, iterating over a list of strings might resemble Listing 2.

Listing 2. Using an Iterator to iterate over a List of strings.


List names = new LinkedList();

// ... add some names to the collection

Iterator i = names.iterator();
while (i.hasNext())
  {
    String name = (String) i.next();
    System.out.println(name);
  }

Iteration after Java 5 — generics and the enhanced for-loop

Java 5 gave us generics, the interface Iterable, and the enhanced for-loop. The enhanced for-loop is one of my all-time-favorite small additions to Java. The creation of the iterator and calls to its hasNext() and next() methods are not expressed explicitly in the code, but they still take place behind the scenes. Thus, even though the code is more compact, we are still using an active iterator. Using Java 5, our example would look something like what you see in Listing 3.

Listing 3. Using generics and the enhanced for-loop over a List of strings.


List<String> names = new LinkedList<String>();

// ... add some names to the collection

for (String name : names)
    System.out.println(name);

Java 7 gave us the diamond operator, which reduces the verbosity of generics. Gone were the days of having to repeat the type used to instantiate the generic class after invoking the new operator! In Java 7 we could simplify the first line in Listing 3 above to the following:


List<String> names = new LinkedList<>();

Iteration in Java 8 — the forEach() method

Before delving into Java 8 iteration features, let's reflect on what's wrong with the code shown in the previous listings — which is, well, nothing really. There are millions of lines of Java code in currently deployed applications that use active iterators similar to those shown in my listings. Java 8 simply provides additional capabilities and new ways of performing iteration. For some scenarios, the new ways can be better.

The major new features in Java 8 center on lambda expressions, along with related features such as streams, method references, and functional interfaces. These new features in Java 8 allow us to seriously consider using passive iterators instead of the more conventional active iterators. In particular, the Iterable interface provides a passive iterator in the form of a default method called forEach().

A default method, another new feature in Java 8, is a method in an interface with a default implementation. In this case, the forEach() method is actually implemented using an active iterator in a manner similar to what you saw in Listing 3.

Collection classes that implement Iterable (for example, all list and set classes) now have a forEach() method. This method takes a single parameter that is a functional interface. Therefore the actual parameter passed to the forEach() method is a candidate for a lambda expression. Using the features of Java 8, our running example would evolve to the form shown in Listing 4.

Listing 4. Iteration in Java 8 using the forEach() method


List<String> names = new LinkedList<>();

// ... add some names to the collection

names.forEach(name -> System.out.println(name));

Note the difference between the passive iterator in Listing 4 and the active iterator in the previous three listings. In the first three listings, the loop structure controls the iteration, and during each pass through the loop, an object is retrieved from the list and then printed. In Listing 4, there is no explicit loop. We simply tell the forEach() method what to do with the objects in the list — in this case we simply print the object. Control of the iteration resides within the forEach() method.

Iteration in Java 8 — streams

Now let's consider doing something slightly more involved than simply printing the names in our list. Suppose, for example, that we want to count the number of names that begin with the letter A. We could implement the more complicated logic as part of the lambda expression, or we could use Java 8's new Stream API. Let's take the latter approach.

Collections and arrays can be used to generate streams — but beware: Streams are not collections that store elements! Rather, think of a stream as a mechanism for carrying a sequence of values from a source through a pipeline of operations. A stream pipeline consists of the stream source, intermediate operations that transform the stream and produce a new stream, and a terminal operation that either produces a result or calls the forEach() method. In general, stream operations are lazy, in the sense that they are performed only when the result is needed. An example should help clarify these concepts.

As illustrated in Listing 5, the list names is used to create the stream source for the operation pipeline. Then a filter is applied to transform the source stream into a stream consisting only of those names that start with the letter A. The filter() method has a single parameter of type Predicate, which is simply a boolean-valued function. A lambda expression is the perfect way to express the actual parameter. Finally, the stream's count() method is applied as the terminal operation that yields the result.

Listing 5. Using a stream pipeline of operations


List<String> names = new LinkedList<>();

// ... add some names to the collection

long count = names.stream()
                  .filter(name -> name.startsWith("A"))
                  .count();

Why Java 8 streams are important

To understand why streams matter, compare the code in Listing 5 with the code in Listing 6 below, which implements the same functionality using an active iterator.

Listing 6. Using active iterators in Java 7


List<String> names = new LinkedList<>();

// ... add some names to the collection

long count = 0;
for (String name : names)
  {
    if (name.startsWith("A"))
        ++count;
  }

You might still wonder why anyone would prefer the code in Listing 5 over Listing 6, which probably looks more familiar to most Java developers. It can be argued that the declarative approach in Listing 5 is actually more readable and less error prone, at least once you understand the basic concepts of streams and stream operations. In particular, depending on the context, the logic in Listing 6 might not be thread-safe, while Listing 5 is thread safe. It is also easier to parallelize the operations shown in Listing 5.

Parallel streams

Collection classes do not only have a stream() method, which returns a sequential stream, but they also have a parallelStream() method, which returns a parallel stream. Parallel streams have the potential to improve performance by allowing pipeline operations to be executed concurrently in separate Java threads; but note that the order in which collection elements are processed can change. When I used a parallel stream to simply print the names in a list, the order that they were printed in was not the order that they appeared in, in the list. Parallel streams are implemented using the Fork/Join framework that was added to Java 7. Parallelizing the operations in a stream pipeline is usually as simple as replacing the call to stream() in Listing 5 with a call to parallelStream().

Benchmarking performance: Active vs passive iterators

While there are several advantages to using the passive iterators now available in Java 8, I thought that it would be interesting to test whether or not they provide a performance improvement. In order to compare the performance of active iterators, streams, and parallel streams, I created a simple, non-scientific benchmark that allowed me to test all three iteration approaches with several different collection classes.

Similar to Listings 5 and 6 above, the benchmark code counts the number of names in a collection that begin with the letter A. Listing 7 shows the actual benchmark methods that I used to test the three approaches.

Listing 7. Benchmarking code


private static long testActiveIterator(Collection<String> names)
  {
    long count = 0;
    for (String name : names)
      {
        if (name.startsWith("A"))
            ++count;
      }

    return count;
  }

private static long testSequential(Collection<String> names)
  {
    return names.stream()
                .filter(name -> name.startsWith("A"))
                .count();
  }

private static long testParallel(Collection<String> names)
  {
    return names.parallelStream()
                .filter(name -> name.startsWith("A"))
                .count();
  }

On my computer, which has a dual-core Pentium processor, 4G RAM, the 64-bit version of Windows 7, and a 64-bit version of Java 8, I timed the calls to each of these three methods using five different collection classes (LinkedList, ArrayList, HashSet, LinkedHashSet, and TreeSet and two different collection sizes (1,000,000 and 2,000,000). For each collection class and size, I ran the benchmark several times and summed the results, which are summarized in Tables 1 and 2.

In some cases, the benchmark results were not what I expected. More extensive benchmarking should be performed using different computer configurations in a production or simulated production scenarios before drawing any definitive conclusions. But based on this simple benchmark and the results shown, the following statements appear to be true:

  • For the linked data structures LinkedList and LinkedHashSet, active iterators and streams appear to have similar performance, and both outperformed parallel streams on this benchmark.
  • For ArrayList and TreeSet, active iterators and streams appear to have similar performance, but parallel streams outperformed both on this benchmark.
  • For HashSet, parallel streams outperformed active iterators, and active iterators outperformed streams on this benchmark.

In conclusion

New features in Java 8 support a more declarative approach to iteration, in that we specify the result we want rather than how to compute it. Benefits of the new approach are that it can be more readable, less error prone, and more easily parallelized. Simple benchmarks suggest that the parallel versions are not necessarily faster for every collection class, however.

About the author

John I. Moore, Jr., Professor of Mathematics and Computer Science at The Citadel, has a wide range of experience in both industry and academia, with specific expertise in the areas of object-oriented technology, software engineering, and applied mathematics. For more than three decades he has designed and developed software using relational databases and several high-order languages, and he has worked extensively in Java since version 1.1. In addition, he has developed and taught numerous academic courses and industrial seminars on advanced topics in computer science.

Learn more about this topic

Join the discussion
Be the first to comment on this article. Our Commenting Policies