Open source Java projects: Storm

Parallel realtime computation for unbounded data streams

Storm is a big data framework that is similar to Hadoop but fine-tuned to handle unbounded data streams. In this installment of Open source Java projects, learn how Storm builds on the lessons and success of Hadoop to process massive amounts of data in realtime, then dive into Storm's API with a small demonstration app.

When I wrote my recent Open source Java projects introduction to GitHub, I noted that Storm, a project created by developer Nathan Marz, was among GitHub's most-watched Java repositories. Curious, I decided to learn more about Storm and why it was causing a stir in the GitHub Java developer community.

What I found is that Storm is a big data processing system similar to Hadoop in its basic technology architecture, but tuned for a different set of use cases. Whereas Hadoop targets batch processing, Storm is an always-active service that receives and processes unbounded streams of data. Like Hadoop, Storm is a distributed system that offers massive scalability for applications that process big data. Unlike Hadoop, it processes that data as it arrives, in realtime.
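To make the batch-versus-stream distinction concrete, here is a minimal plain-Java sketch. It does not use Storm's API; the StreamSketch class, the END_OF_DEMO sentinel, and the running-total computation are hypothetical, chosen only to show a processor that stays active and publishes an updated result after every item rather than after a complete, finite input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongConsumer;

// Hypothetical sketch (not Storm's API): a consumer that stays active, takes each
// value from a queue as soon as it is available, and publishes an updated result
// after every value instead of waiting for a complete, finite input.
class StreamSketch {
    // Sentinel so this demo can stop; a truly unbounded stream has no end marker.
    static final long END_OF_DEMO = Long.MIN_VALUE;

    static void process(BlockingQueue<Long> queue, LongConsumer onUpdate) {
        long total = 0;
        while (true) {
            long value;
            try {
                value = queue.take(); // blocks until the next value arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop
                return;
            }
            if (value == END_OF_DEMO) {
                return;
            }
            total += value;
            onUpdate.accept(total); // the result is current after every value
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        Thread source = new Thread(() -> {
            for (long i = 1; i <= 5; i++) {
                queue.add(i); // in a real system, values trickle in over time
            }
            queue.add(END_OF_DEMO);
        });
        source.start();
        List<Long> updates = new ArrayList<>();
        process(queue, t -> updates.add(t));
        source.join();
        System.out.println(updates); // [1, 3, 6, 10, 15]
    }
}
```

A batch job would compute only the final 15 once all input existed; the streaming version has a usable answer after each arrival, which is the property Storm scales out across a cluster.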

Open source licensing

Storm is a free and open source project. It is hosted on GitHub and available under the Eclipse Public License for use in both open source and proprietary software.

In this installment of the Open source Java projects series I introduce Storm. We'll start with an overview of Storm's architecture and use case scenarios. Then I'll walk through a demonstration of setting up a Storm development environment and building a simple application whose goal is to process prime numbers in realtime. You'll learn a bit about Storm and get your hands into its code, and you'll also get a little taste of its speed and versatility. If Storm is applicable to your application needs, you'll be ready for next steps after reading this article.
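As a rough idea of the per-number logic a realtime prime-number app needs, here is a hypothetical plain-Java sketch. PrimeFilter, isPrime, and onNumber are illustrative names; this is not the article's demo code and not part of Storm's API. It simply checks each incoming number and forwards only the primes downstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch of per-item prime filtering; not the article's demo code.
class PrimeFilter {
    // Trial division by odd divisors up to sqrt(n); adequate for small demo values.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Called once per incoming number; passes only primes to the next step.
    static void onNumber(long n, LongConsumer downstream) {
        if (isPrime(n)) {
            downstream.accept(n);
        }
    }

    public static void main(String[] args) {
        List<Long> primes = new ArrayList<>();
        for (long n = 1; n <= 20; n++) { // stand-in for an endless source of numbers
            onNumber(n, p -> primes.add(p));
        }
        System.out.println(primes); // [2, 3, 5, 7, 11, 13, 17, 19]
    }
}
```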

What is Storm?

Storm is a free and open source distributed realtime computation system that can be used with any programming language. It is written primarily in Clojure and supports Java by default. Storm is especially well suited to the following use cases:

  • Realtime analytics
  • Online machine learning
  • Continuous computation
  • Distributed RPC
  • ETL (extract, transform, load)

Perusing the Storm user group discussion forum, I found that Storm has been used in some very interesting real-world scenarios. For instance, given a large database for an online auction site, Storm was used to identify the top trending items in any category, in realtime. It has also been connected to the Twitter firehose and used to identify posting trends. Other potential uses for Storm include the following:

  • Watch live traffic to a website in order to understand user behavior and feed a recommendation engine.
  • Execute search operations in parallel (i.e., use parallel threads to search through different areas of a data set).
  • Compute video analytics, perform tagging, build tracks, and so forth.
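The parallel-search idea in the list above comes down to partitioning a data set and scanning each partition concurrently. Below is a minimal single-machine sketch of that pattern; ParallelSearch and findMatches are hypothetical names, and a local thread pool stands in for the cluster-wide parallelism Storm would provide. It is not Storm code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: split a data set into partitions, search each partition on
// its own thread, then merge the matching indexes in partition order.
class ParallelSearch {
    static List<Integer> findMatches(List<String> data, String needle, int partitions)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            int chunk = (data.size() + partitions - 1) / partitions; // ceiling division
            for (int start = 0; start < data.size(); start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.size());
                // Each task scans only its own slice of the data.
                futures.add(pool.submit(() -> {
                    List<Integer> hits = new ArrayList<>();
                    for (int i = from; i < to; i++) {
                        if (data.get(i).contains(needle)) hits.add(i);
                    }
                    return hits;
                }));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                all.addAll(f.get()); // wait for each partition and merge its results
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> data = Arrays.asList("storm", "hadoop", "stormy", "spout", "brainstorm");
        System.out.println(findMatches(data, "storm", 2)); // [0, 2, 4]
    }
}
```

Because partitions are merged in submission order, the output is deterministic even though the scans run concurrently.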

See the Storm user group discussion forum for more real-world and speculative use cases for Storm.
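The auction-site scenario described above (top trending items, in realtime) is essentially a rolling count followed by a top-N ranking. The following sketch shows one simple single-machine way to do that; the TrendingItems class, its count-based window, and the alphabetical tie-breaking are assumptions for illustration, not how the forum posters or Storm itself implement it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: count each item among the most recent `windowSize` events
// and report the top N. The window is measured in events to keep things short.
class TrendingItems {
    private final int windowSize;
    private final Deque<String> window = new ArrayDeque<>();
    private final Map<String, Integer> counts = new HashMap<>();

    TrendingItems(int windowSize) {
        this.windowSize = windowSize;
    }

    // Called once per incoming event, e.g. a bid on an auction item.
    void onEvent(String item) {
        window.addLast(item);
        counts.merge(item, 1, Integer::sum);
        if (window.size() > windowSize) {
            String expired = window.removeFirst();
            // Decrement the expired item; returning null removes the key at zero.
            counts.computeIfPresent(expired, (k, c) -> c == 1 ? null : c - 1);
        }
    }

    // Top n items by count, highest first; ties broken alphabetically.
    List<String> top(int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TrendingItems trending = new TrendingItems(4);
        for (String bid : new String[] {"a", "b", "a", "c", "b", "b"}) {
            trending.onEvent(bid);
        }
        System.out.println(trending.top(2)); // [b, a]
    }
}
```

A count-based window keeps the example short; time-based windows (for example, the last few minutes) are the more natural fit for "trending." In a Storm topology, the counting would likely be spread across parallel tasks, with a final step merging their partial rankings.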

Resources
  • Download the source code for this article's demonstration app.
