Open source Java projects: Akka
Building distributed systems for concurrent and scalable Java applications
The actor model is a message-passing paradigm that resolves some of the major challenges of writing concurrent, scalable code
for today's distributed systems. In this installment of Open source Java projects, Steven Haines introduces Akka, a JVM-based toolkit and runtime that implements the actor model. Get started with a simple
program that demonstrates how an Akka message passing system is wired together, then build a more complex program that uses
concurrent processes to compute prime numbers.
If you have experience with traditional concurrency programming, then you might appreciate the actor model, which is a design pattern for writing concurrent and scalable code that runs on distributed systems. Briefly, here's how
the actor model works:
- Rather than invoking an object directly, you construct a message and send it to the object (called an actor) by way of an actor reference.
- The actor reference stores the message in a mailbox.
- When a thread becomes available, the engine running the actor delivers that message to its destination object.
- When the actor completes its task, it sends a message back to the originating object, which is also considered an actor.
You might suspect that the actor model is more of an event-driven or message-passing architecture than a strict concurrency solution, and you would be right. But Akka is a different story: an actor model implementation that enables developers to accomplish impressively high concurrency at
a very low overhead.
Re-thinking concurrency with Akka (and Scala)
Actors provide a simple and high-level abstraction for concurrency and parallelism. They support asynchronous, non-blocking,
and high-performance event-driven programming, and they are lightweight processes. (Akka's founding company, Typesafe, claims up to 2.7 million actors per gigabyte of RAM.) Akka and other message-passing frameworks offer a workaround to the challenges of multithreaded programming (see the sidebar "What's wrong with multithreaded programming?"),
while also meeting some of the emergent needs of enterprise programming:
- Fault tolerance: Supervisor hierarchies support a "let-it-crash" semantic and can run across multiple JVMs in a truly fault tolerant deployment.
Akka is excellent for highly fault-tolerant systems that self-heal and never stop processing.
- Location transparency: Akka is designed to run in a distributed environment using a pure message-passing, asynchronous strategy.
- Transactors: Combine actors with software transaction memory (STM) to form transactional actors, which allow for atomic message flows
and automatic retry and rollback functionality.
Since the actor model is relatively new to most Java developers, I'll explain how it works first, then we'll look at how it's
implemented in Akka. Finally, we'll try out the Akka toolkit in a program that computes prime numbers.
What's wrong with multithreaded programming?
Multithreaded programming basically means running multiple copies of your application code in their own threads and then synchronizing access to any
shared objects. While it's a complex issue, multithreaded programming has three major fault lines:
- Shared objects: Whenever multiple threads access shared objects there is always the danger that one thread will modify the data upon which
another thread is operating underneath it. Typically, developers solve this issue by encapsulating the dependent functionality
in a synchronized method or synchronized block of code. Numerous threads may attempt to enter that code block, but only one thread will get through; the others will wait until
it completes. This approach protects your data, but it also creates a point in your code where operations occur serially.
- Deadlock: Because we need to synchronize access to code that operates on shared resources, deadlock sometimes occurs. In code synchronization (as described above), the first thread that enters a synchronized block obtains
the lock, which is owned by the object on which the operation is synchronized. Until that lock is released, no other thread
is permitted to enter that code block. If thread 1 obtains the lock to synchronized block 1, and thread 2 obtains the lock
to synchronized block 2, but it happens that thread 1 needs access to synchronized block 2 and thread 2 needs access to synchronized
block 1 then the two threads will never complete and are said to be deadlocked.
- Scalability: Managing multiple threads in a single JVM is challenging enough, but when you need to scale the application across multiple JVMs the problem increases by an order of magnitude. Typically, running concurrent code across multiple JVMs involves storing
shared state in a database and then relying on the database to manage concurrent access to that data.
Akka and the actor model
Akka is an open source toolkit and runtime that runs on the JVM. It's written in Scala (a language often touted for concurrency) but you can use Java code (or Scala) to call all of its libraries and features.
The principle design pattern that Akka implements is the actor model, as shown in Figure 1.
Figure 1. Actor model concurrency in Akka (click to enlarge)