Iterating over collections in Java 8

Java 8's declarative iterators should be more readable, less error prone, and easier to parallelize -- so what's the catch?

1 2 Page 2
Page 2 of 2

Note the difference between the passive iterator in Listing 4 and the active iterator in the previous three listings. In the first three listings, the loop structure controls the iteration, and during each pass through the loop, an object is retrieved from the list and then printed. In Listing 4, there is no explicit loop. We simply tell the forEach() method what to do with the objects in the list — in this case we simply print the object. Control of the iteration resides within the forEach() method.

Iteration in Java 8 — streams

Now let's consider doing something slightly more involved than simply printing the names in our list. Suppose, for example, that we want to count the number of names that begin with the letter A. We could implement the more complicated logic as part of the lambda expression, or we could use Java 8's new Stream API. Let's take the latter approach.

Collections and arrays can be used to generate streams — but beware: Streams are not collections that store elements! Rather, think of a stream as a mechanism for carrying a sequence of values from a source through a pipeline of operations. A stream pipeline consists of the stream source, intermediate operations that transform the stream and produce a new stream, and a terminal operation that either produces a result or calls the forEach() method. In general, stream operations are lazy, in the sense that they are performed only when the result is needed. An example should help clarify these concepts.

As illustrated in Listing 5, the list names is used to create the stream source for the operation pipeline. Then a filter is applied to transform the source stream into a stream consisting only of those names that start with the letter A. The filter() method has a single parameter of type Predicate, which is simply a boolean-valued function. A lambda expression is the perfect way to express the actual parameter. Finally, the stream's count() method is applied as the terminal operation that yields the result.

Listing 5. Using a stream pipeline of operations


List<String> names = new LinkedList<>();

// ... add some names to the collection

long count = names.stream()
                  .filter(name -> name.startsWith("A"))
                  .count();

Why Java 8 streams are important

To understand why streams matter, compare the code in Listing 5 with the code in Listing 6 below, which implements the same functionality using an active iterator.

Listing 6. Using active iterators in Java 7


List<String> names = new LinkedList<>();

// ... add some names to the collection

long count = 0;
for (String name : names)
  {
    if (name.startsWith("A"))
        ++count;
  }

You might still wonder why anyone would prefer the code in Listing 5 over Listing 6, which probably looks more familiar to most Java developers. It can be argued that the declarative approach in Listing 5 is actually more readable and less error prone, at least once you understand the basic concepts of streams and stream operations. In particular, depending on the context, the logic in Listing 6 might not be thread-safe, while Listing 5 is thread safe. It is also easier to parallelize the operations shown in Listing 5.

Parallel streams

Collection classes do not only have a stream() method, which returns a sequential stream, but they also have a parallelStream() method, which returns a parallel stream. Parallel streams have the potential to improve performance by allowing pipeline operations to be executed concurrently in separate Java threads; but note that the order in which collection elements are processed can change. When I used a parallel stream to simply print the names in a list, the order that they were printed in was not the order that they appeared in, in the list. Parallel streams are implemented using the Fork/Join framework that was added to Java 7. Parallelizing the operations in a stream pipeline is usually as simple as replacing the call to stream() in Listing 5 with a call to parallelStream().

Benchmarking performance: Active vs passive iterators

While there are several advantages to using the passive iterators now available in Java 8, I thought that it would be interesting to test whether or not they provide a performance improvement. In order to compare the performance of active iterators, streams, and parallel streams, I created a simple, non-scientific benchmark that allowed me to test all three iteration approaches with several different collection classes.

Similar to Listings 5 and 6 above, the benchmark code counts the number of names in a collection that begin with the letter A. Listing 7 shows the actual benchmark methods that I used to test the three approaches.

Listing 7. Benchmarking code


private static long testActiveIterator(Collection<String> names)
  {
    long count = 0;
    for (String name : names)
      {
        if (name.startsWith("A"))
            ++count;
      }

    return count;
  }

private static long testSequential(Collection<String> names)
  {
    return names.stream()
                .filter(name -> name.startsWith("A"))
                .count();
  }

private static long testParallel(Collection<String> names)
  {
    return names.parallelStream()
                .filter(name -> name.startsWith("A"))
                .count();
  }

On my computer, which has a dual-core Pentium processor, 4G RAM, the 64-bit version of Windows 7, and a 64-bit version of Java 8, I timed the calls to each of these three methods using five different collection classes (LinkedList, ArrayList, HashSet, LinkedHashSet, and TreeSet and two different collection sizes (1,000,000 and 2,000,000). For each collection class and size, I ran the benchmark several times and summed the results, which are summarized in Tables 1 and 2.

In some cases, the benchmark results were not what I expected. More extensive benchmarking should be performed using different computer configurations in a production or simulated production scenarios before drawing any definitive conclusions. But based on this simple benchmark and the results shown, the following statements appear to be true:

  • For the linked data structures LinkedList and LinkedHashSet, active iterators and streams appear to have similar performance, and both outperformed parallel streams on this benchmark.
  • For ArrayList and TreeSet, active iterators and streams appear to have similar performance, but parallel streams outperformed both on this benchmark.
  • For HashSet, parallel streams outperformed active iterators, and active iterators outperformed streams on this benchmark.

Java 8 summary

New features in Java 8 support a more declarative approach to iteration, in that we specify the result we want rather than how to compute it. Benefits of the new approach are that it can be more readable, less error prone, and more easily parallelized. Simple benchmarks suggest that the parallel versions are not necessarily faster for every collection class, however.

About the author

John I. Moore, Jr., Professor of Mathematics and Computer Science at The Citadel, has a wide range of experience in both industry and academia, with specific expertise in the areas of object-oriented technology, software engineering, and applied mathematics. For more than three decades he has designed and developed software using relational databases and several high-order languages, and he has worked extensively in Java since version 1.1. In addition, he has developed and taught numerous academic courses and industrial seminars on advanced topics in computer science.

Learn more about this topic

1 2 Page 2
Page 2 of 2