Two weeks ago, InfoWorld examined the two most popular real-time processing frameworks, Apache Storm and Apache Spark. Now we're going to take a much deeper look at Storm and walk through a basic Storm deployment for consuming Twitter messages and performing analytics on the Twitter stream.
To this end, we'll extract important keywords from individual tweets and calculate rolling metrics that show how actively a given keyword is being discussed. We'll also do some lightweight sentiment analysis to gauge the tenor of the discussion on a given topic. Finally, we'll look at how Storm and XMPP work together: Storm extracts important "moment in time" events from the stream, and XMPP delivers those events as alerts.
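To make "rolling metrics" concrete, here is a minimal sketch of one common approach: a slot-based sliding-window counter written in plain Java with no Storm dependency. It assumes a bolt would call `increment` for each keyword it extracts and call `advance` on a fixed schedule (for example, when a tick tuple arrives), so the window always covers the last `numSlots` intervals. The class name `RollingKeywordCounter` and its methods are hypothetical illustrations, not part of Storm's API and not necessarily how this article's solution computes its metrics.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical sliding-window counter: tracks keyword mentions across the
 * last numSlots time buckets. advance() moves to the next bucket and clears
 * the oldest one, so counts "roll" forward in time.
 */
public class RollingKeywordCounter {
    private final int numSlots;
    // Per-keyword array of counts, one entry per time bucket.
    private final Map<String, long[]> slotCounts = new HashMap<>();
    // Index of the bucket currently receiving new mentions.
    private int head = 0;

    public RollingKeywordCounter(int numSlots) {
        if (numSlots < 1) {
            throw new IllegalArgumentException("numSlots must be >= 1");
        }
        this.numSlots = numSlots;
    }

    /** Records one mention of the keyword in the current bucket. */
    public void increment(String keyword) {
        slotCounts.computeIfAbsent(keyword, k -> new long[numSlots])[head]++;
    }

    /** Returns total mentions of the keyword across every bucket in the window. */
    public long count(String keyword) {
        long[] slots = slotCounts.get(keyword);
        return slots == null ? 0 : sumOf(slots);
    }

    /** Moves the window forward one bucket, discarding the oldest bucket's counts. */
    public void advance() {
        head = (head + 1) % numSlots;
        Iterator<long[]> it = slotCounts.values().iterator();
        while (it.hasNext()) {
            long[] slots = it.next();
            slots[head] = 0; // the new head slot held the oldest counts
            if (sumOf(slots) == 0) {
                it.remove(); // drop keywords that have fallen out of the window
            }
        }
    }

    private static long sumOf(long[] slots) {
        long total = 0;
        for (long n : slots) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        RollingKeywordCounter counter = new RollingKeywordCounter(3);
        counter.increment("storm");
        counter.increment("storm");
        counter.advance();
        counter.increment("spark");
        System.out.println(counter.count("storm")); // 2
        counter.advance();
        counter.advance(); // the bucket holding both "storm" mentions is now cleared
        System.out.println(counter.count("storm")); // 0
        System.out.println(counter.count("spark")); // 1
    }
}
```

Keeping counts in fixed buckets means each `advance` costs time proportional to the number of tracked keywords rather than the number of individual mentions, which is why this pattern suits high-volume streams such as Twitter's.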
All about Storm
Storm is an open source, distributed, stream-processing platform, designed to make it easy to build massively scalable systems for performing real-time computations on continuous streams of data.
People sometimes refer to Storm as the Hadoop of real-time processing, but it's important to note that Storm has no particular dependency on the MapReduce programming model. You can structure a Storm solution around MapReduce if your needs call for it, but nothing about Storm requires it. In fact, in terms of the flexibility you have in designing your application, Storm bears a slight resemblance to pre-Hadoop distributed computing systems such as MPI.
This story, "Hands on: Build a Storm analytics solution," was originally published by InfoWorld.