Hands on: Build a Storm analytics solution

Storm lets you create real-time analytics for every conceivable need. Here's a tasty example using Twitter data and source code hosted on GitHub

Two weeks ago, InfoWorld examined the two most popular real-time processing frameworks, Apache Storm and Apache Spark. Now we're going to take a much deeper look at Storm and walk through a basic Storm deployment for consuming Twitter messages and performing analytics on the Twitter stream.

To this end, we'll extract important keywords from individual tweets and calculate rolling metrics related to how actively a given keyword is being discussed. Plus, we'll do some lightweight sentiment analysis to determine the tenor of the discussion on a given topic. We'll also look at how Storm and XMPP combine nicely for extracting important "moment in time" events from a stream and for sending those events out as alerts.

All about Storm

Storm is an open source, distributed, stream-processing platform, designed to make it easy to build massively scalable systems for performing real-time computations on continuous streams of data.

People sometimes refer to Storm as the Hadoop of real-time processing, but it's important to note that Storm has no particular dependency on the MapReduce programming model. You may, if your needs so dictate, code a Storm solution to use a MapReduce model, but nothing about Storm requires it. In fact, Storm bears a slight resemblance to pre-Hadoop distributed computing systems like MPI in terms of the flexibility you have in designing your application.

This hands-on feature is part of the InfoWorld Insider library. Sign in or create an account to Build your own Storm analytics solution using Twitter and GitHub.

This story, "Hands on: Build a Storm analytics solution" was originally published by InfoWorld.