Operational systems manage our finances, shopping, devices, and much more. Adding real-time analytics to these systems enables them to instantly respond to changing conditions and provide immediate, targeted feedback. This use of analytics is known as operational intelligence, and the need for it is growing fast.
For example, financial trading applications must rapidly respond to fluctuating market conditions as market data flows through trading systems. E-commerce systems must reconcile orders with inventory changes on a second-by-second basis, and they need to quickly respond to shopping behavior to offer personalized recommendations. Smart grid-monitoring systems need to continuously analyze telemetry from many sources to anticipate and respond to unexpected changes in power grids.
In all of these examples, live, fast-changing data sets churn in active, ongoing operations. The advantages of responding to this live data in real time -- to present shoppers with promotions based on the contents of their shopping carts, for example -- are both compelling and within reach. The combination of in-memory computing and data-parallel analysis, running on a cluster of commodity servers, allows systems to continuously track and analyze live data, extract important patterns, and generate immediate feedback that steers the system’s behavior. This technology can be found within a category of software called in-memory data grids (IMDGs), which have been evolving over the last decade to help manage operational systems.
What are in-memory data grids?
IMDGs store data in memory and distribute it across a cluster of commodity servers (or virtual servers in a cloud environment). Using an object-oriented data storage model, IMDGs provide APIs for reading and updating data objects with very low latency, typically in less than a millisecond, depending on the size of the object. This enables operational systems to use IMDGs for storing, accessing, and updating fast-changing, “live” data that track the system’s state, while maintaining quick access times even as the storage workload grows.
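To make the storage model concrete, here is a minimal single-process sketch of a key-addressed object store whose objects are spread across several "servers" by hashing the key. The class `ToyGrid`, its methods, and the server and key names are hypothetical and are not the API of any IMDG product; a real grid would hold each partition on a separate machine and reach it over the network.

```python
import hashlib


class ToyGrid:
    """Toy stand-in for an IMDG client (hypothetical API, single process)."""

    def __init__(self, server_names):
        self.server_names = list(server_names)
        # One in-memory dict per "server"; a real grid keeps these on separate machines.
        self.partitions = {name: {} for name in self.server_names}

    def owner(self, key):
        # A stable hash, so the same key always maps to the same server.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.server_names)
        return self.server_names[index]

    def put(self, key, obj):
        self.partitions[self.owner(key)][key] = obj

    def get(self, key):
        return self.partitions[self.owner(key)].get(key)

    def update(self, key, change):
        """Read-modify-write: apply `change` to the stored object and store the result."""
        store = self.partitions[self.owner(key)]
        store[key] = change(store[key])
        return store[key]


# Usage: store a "live" portfolio object, then update it as a trade arrives.
grid = ToyGrid(["server-a", "server-b", "server-c"])
grid.put("portfolio:42", {"owner": "alice", "positions": {"ACME": 100}})
grid.update(
    "portfolio:42",
    lambda p: {**p, "positions": {**p["positions"], "ACME": 150}},
)
print(grid.get("portfolio:42"))
```

Hashing modulo the server count keeps the sketch short, but it would remap most keys whenever a server is added or removed, so treat it as an illustration of key-based distribution only.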
IMDGs have “elastic” storage in the sense that you can grow or shrink both storage capacity and throughput simply by adding or removing servers. In addition, they replicate in-memory data across servers so that it remains continuously available. Servers can fail and recover -- or otherwise be added to or removed from the cluster -- without disrupting operations.
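The following sketch illustrates how a replicated store can survive the loss of a server and absorb a new one. It uses rendezvous hashing to choose which servers hold each object; that technique is chosen here for brevity and the article does not say how any particular IMDG places data. The class `ReplicatedGrid` and all names in it are hypothetical.

```python
import hashlib


def ranked_servers(key, servers):
    """Rendezvous hashing: rank servers for a key by a per-(server, key) hash score."""
    def score(server):
        return hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return sorted(servers, key=score, reverse=True)


class ReplicatedGrid:
    """Toy grid that keeps `copies` replicas of every object (hypothetical API)."""

    def __init__(self, servers, copies=2):
        self.copies = copies
        self.stores = {name: {} for name in servers}

    def _owners(self, key):
        return ranked_servers(key, list(self.stores))[: self.copies]

    def put(self, key, obj):
        for server in self._owners(key):
            self.stores[server][key] = obj

    def get(self, key):
        for server in self._owners(key):
            if key in self.stores[server]:
                return self.stores[server][key]
        return None

    def _all_objects(self):
        return {k: v for store in self.stores.values() for k, v in store.items()}

    def remove_server(self, name):
        """Simulate a failure or scale-in: re-replicate from the surviving copies."""
        self.stores.pop(name)
        for key, obj in self._all_objects().items():
            self.put(key, obj)

    def add_server(self, name):
        """Scale out: redistribute so the new server takes its share of objects."""
        objects = self._all_objects()
        self.stores[name] = {}
        for store in self.stores.values():
            store.clear()
        for key, obj in objects.items():
            self.put(key, obj)


# Usage: lose one server, then add another; every object stays readable.
grid = ReplicatedGrid(["a", "b", "c"])
for i in range(100):
    grid.put(f"k{i}", i)
grid.remove_server("b")
after_failure = [grid.get(f"k{i}") for i in range(100)]
grid.add_server("d")
after_growth = [grid.get(f"k{i}") for i in range(100)]
print(after_failure == after_growth == list(range(100)))
```

With two copies per object the sketch tolerates the loss of a single server at a time. It redistributes everything in one step, whereas a production grid would presumably move data incrementally while continuing to serve requests.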
Perhaps most important, IMDGs can take advantage of the cluster’s computing power to perform data-parallel computations on stored data. Because the data and the computing power reside on the same servers, the analysis avoids moving data across the network, so IMDGs can deliver results quickly (often in less than a second) with minimal overhead. This makes IMDGs well suited to quickly analyzing the state of an operational system and providing immediate feedback.
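A data-parallel computation of this kind can be sketched as two steps: each server summarizes the objects it already holds, and only the small partial results are merged. The example below simulates the servers with a thread pool in one process; the partition contents, symbols, and function names are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each dict is the set of portfolio objects held by one server.
partitions = [
    {"p1": {"ACME": 100, "INIT": 50}, "p2": {"ACME": 25}},
    {"p3": {"INIT": 10}},
    {"p4": {"ACME": 5, "ZED": 40}},
]


def analyze_partition(local_objects):
    """Runs where the data lives: reduce one server's objects to a small partial result."""
    partial = Counter()
    for positions in local_objects.values():
        partial.update(positions)  # adds share counts per symbol
    return partial


def merge(partials):
    """Combine the per-server partial results into one cluster-wide answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# One worker per partition stands in for one server analyzing its own data.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    totals = merge(pool.map(analyze_partition, partitions))

print(dict(totals))
```

Only the per-symbol totals leave each partition, never the portfolio objects themselves, which is the point of keeping data and computation together.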
Modeling operational systems for operational intelligence
Operational systems usually comprise a large population of highly dynamic entities, such as stock portfolios in a financial trading system, online shoppers browsing an e-commerce website, or viewers controlling set-top boxes in a cable TV network. These entities create a stream of events that must be correlated, enriched with offline data (for example, customer preferences or history), and analyzed to discover patterns and trends.
If this analysis is completed in real time, feedback can be provided to the operational system to enhance its functionality and improve its effectiveness. For example, stock trades can be triggered to capture market fluctuations, shoppers can be offered relevant, personalized recommendations, and cable TV viewers can be alerted to special promotions based on their viewing preferences and current selection.
Popular approaches to implementing real-time analytics focus on analyzing incoming streams of data and reacting to the data within those streams. Examples include complex event processing used in financial services and stream processing using Apache Storm, a parallel platform originally designed to analyze Twitter streams.
However, focusing on event processing does not provide a complete framework for modeling the behavior of real-world entities, which, in addition to event streams, have both history and context that must be taken into account. Using an in-memory model of the real-world entities managed by an operational system, the IMDG can correlate incoming events and enrich them with offline information to maintain a comprehensive context that can be subjected to real-time analysis. The output of this analysis then can be fed directly back to the system to add value to its operations. It also can be provided to personnel monitoring the system.
Using IMDGs to implement operational intelligence
IMDGs provide the technology needed to implement an in-memory model of the active entities within an operational system. The model continuously tracks incoming events from these entities, enriches them with relevant historical information, and structures a parallel analysis of aggregate behavior. It takes advantage of the IMDG’s object-oriented storage model to organize the in-memory data representing each entity.
Because the IMDG is both elastic and highly available, it can handle highly variable workloads and run within a mission-critical operational system. The IMDG’s data-parallel computation engine enables it to quickly analyze state changes within the model and provide immediate feedback to the system, while capturing aggregate trends emerging across all entities.
Consider the analysis of clickstream data from a population of online shoppers to provide personalized recommendations. An IMDG can maintain an in-memory model of individual shoppers that is continuously updated by this clickstream data. Using an object-oriented approach, the IMDG represents each shopper by a memory-based object, which holds a dynamic collection of time-ordered clickstream events as well as preferences and historical shopping patterns (obtained from secondary storage).
This object-oriented view enables incoming events to be easily correlated, and it provides the basis for continuous, data-parallel analysis of shopping activity, both to generate immediate, personalized recommendations for individual shoppers and to look for emerging trends across all shoppers (such as identifying the most popular items or evaluating the response to a sale).
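As a rough sketch of the shopper model described above, the code below keeps one object per shopper with a time-ordered list of clickstream events plus stored preferences, then runs a per-shopper analysis and an aggregate one. The class and field names, the tiny catalog, and the recommendation rule are all assumptions made for illustration; the article does not specify any of them.

```python
import bisect
from collections import Counter
from dataclasses import dataclass, field
from typing import NamedTuple


class Click(NamedTuple):
    timestamp: float  # first field, so tuples sort by time
    action: str       # "view" or "add_to_cart"
    item: str


@dataclass
class Shopper:
    shopper_id: str
    preferred_categories: set = field(default_factory=set)  # e.g. loaded from secondary storage
    clicks: list = field(default_factory=list)               # kept in time order

    def record(self, click):
        # Events may arrive out of order; insort keeps the list sorted by timestamp.
        bisect.insort(self.clicks, click)

    def cart(self):
        return {c.item for c in self.clicks if c.action == "add_to_cart"}


# Hypothetical catalog: item -> category.
CATALOG = {"tent": "camping", "stove": "camping", "lantern": "camping", "novel": "books"}


def recommend(shopper):
    """Per-shopper analysis: items not in the cart, from categories of interest."""
    cart = shopper.cart()
    interest = shopper.preferred_categories | {CATALOG[i] for i in cart if i in CATALOG}
    return sorted(i for i, cat in CATALOG.items() if cat in interest and i not in cart)


def most_viewed(shoppers):
    """Aggregate analysis: the item viewed most often across all shoppers."""
    views = Counter(c.item for s in shoppers for c in s.clicks if c.action == "view")
    return views.most_common(1)[0][0] if views else None


alice = Shopper("alice", {"books"})
alice.record(Click(20.0, "add_to_cart", "tent"))
alice.record(Click(10.0, "view", "tent"))  # arrives late, stored in time order
bob = Shopper("bob")
bob.record(Click(5.0, "view", "tent"))
bob.record(Click(6.0, "view", "stove"))

print(recommend(alice))
print(most_viewed([alice, bob]))
```

In a grid, `recommend` would run against one shopper object as its events arrive, while `most_viewed` is the kind of computation that would be split across servers and merged.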
An IMDG also provides a natural software architecture for tracking events flowing from cable TV set-top boxes as viewers turn their TV sets on and off and switch channels. The in-memory model correlates incoming events for each set-top box, cleanses them according to rules to remove uninteresting events (such as random channel switching), and enriches them with programming information and known viewer information (history, characteristics, and preferences).
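The correlate, cleanse, and enrich steps for set-top box events might look like the sketch below. The 30-second dwell threshold used to filter out channel surfing, the program guide, the viewer records, and the event tuple layout are assumptions for illustration only.

```python
from collections import defaultdict

MIN_DWELL_SECONDS = 30  # assumed rule: shorter stays count as channel surfing

# Hypothetical reference data used for enrichment.
PROGRAM_GUIDE = {7: "Evening News", 12: "Cooking Show", 30: "Football"}
VIEWER_INFO = {"box-1": {"plan": "basic", "likes": ["sports"]}}


def correlate(events):
    """Group raw (box_id, timestamp, channel) events by set-top box, in time order."""
    by_box = defaultdict(list)
    for box_id, ts, channel in sorted(events, key=lambda e: e[1]):
        by_box[box_id].append((ts, channel))
    return by_box


def cleanse(box_events, now):
    """Keep only channel selections the viewer stayed on for MIN_DWELL_SECONDS or more."""
    kept = []
    for i, (ts, channel) in enumerate(box_events):
        # Dwell time runs until the next channel change, or until `now` for the latest one.
        next_ts = box_events[i + 1][0] if i + 1 < len(box_events) else now
        dwell = next_ts - ts
        if dwell >= MIN_DWELL_SECONDS:
            kept.append((ts, channel, dwell))
    return kept


def enrich(box_id, cleansed):
    """Attach programming and viewer information to each remaining event."""
    viewer = VIEWER_INFO.get(box_id, {})
    return [
        {
            "box": box_id,
            "start": ts,
            "seconds": dwell,
            "program": PROGRAM_GUIDE.get(channel, "unknown"),
            "viewer": viewer,
        }
        for ts, channel, dwell in cleansed
    ]


raw = [("box-1", 600, 12), ("box-1", 0, 7), ("box-1", 605, 30)]
by_box = correlate(raw)
records = enrich("box-1", cleanse(by_box["box-1"], now=1000))
for record in records:
    print(record["program"], record["seconds"])
```

The five-second stop on channel 12 is dropped as surfing, leaving two enriched viewing records for the box that a later recommendation step could work from.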
This integrated set of data for each viewer provides the basis for making upsell recommendations on programs or packages of interest, and the analysis can be performed in parallel across all active viewers. To give you an idea of the power of an IMDG, a recent simulation of 10 million live cable TV set-top boxes showed that an in-memory model (about 80 GB of data, including replicas) could correlate and enrich 25,000 events per second and complete a parallel analysis of all 10 million set-top boxes every 10 seconds on a cluster of 12 commodity servers hosted on Amazon EC2.
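The quoted simulation figures can be turned into per-box and per-server numbers with some simple arithmetic. The calculation below assumes decimal gigabytes and an even spread of data and events across the 12 servers; neither assumption is stated in the article, so the results are rough estimates.

```python
# Figures quoted in the text above.
SET_TOP_BOXES = 10_000_000
MODEL_BYTES = 80 * 10**9          # "about 80GB" including replicas, assuming decimal GB
EVENTS_PER_SECOND = 25_000
ANALYSIS_PERIOD_SECONDS = 10
SERVERS = 12

# Whole-cluster figures.
bytes_per_box = MODEL_BYTES // SET_TOP_BOXES
boxes_analyzed_per_second = SET_TOP_BOXES // ANALYSIS_PERIOD_SECONDS
events_per_analysis_pass = EVENTS_PER_SECOND * ANALYSIS_PERIOD_SECONDS

# Per-server figures, assuming an even distribution across servers.
gb_per_server = MODEL_BYTES / SERVERS / 10**9
events_per_second_per_server = EVENTS_PER_SECOND / SERVERS
boxes_per_second_per_server = boxes_analyzed_per_second / SERVERS

print(f"memory per set-top box (incl. replicas): {bytes_per_box} bytes")
print(f"boxes analyzed per second (cluster):     {boxes_analyzed_per_second}")
print(f"events arriving per analysis pass:       {events_per_analysis_pass}")
print(f"data held per server:                    {gb_per_server:.2f} GB")
print(f"events per second per server:            {events_per_second_per_server:.0f}")
print(f"boxes analyzed per second per server:    {boxes_per_second_per_server:.0f}")
```

Under these assumptions each set-top box occupies about 8 KB of memory including replicas, and the cluster analyzes about one million boxes per second, or roughly 83,000 per server.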
Bringing real-time intelligence to operational systems places huge demands on a computation engine. The engine must be able to ingest a high volume of incoming events, correlate and enrich that data, then quickly analyze it. Feedback must be provided while the opportunity exists to take advantage of the analysis, usually within milliseconds or seconds.
With elastic, memory-based storage and data-parallel computation specifically designed for use in operational systems, IMDGs provide a highly effective platform for implementing operational intelligence. Their ability to host an in-memory model of a real-world system, track changes as they occur, and analyze the model in real time both simplifies this task and delivers the performance that’s needed. The benefits of operational intelligence are only beginning to be realized. The opportunities are mind-boggling.
Dr. William L. Bain is founder of ScaleOut Software (@ScaleOut_Inc). He earned a Ph.D. (1978) in electrical engineering/parallel computing from Rice University and has worked at Bell Labs research, Intel, and Microsoft. At Valence Research, one of three startup companies he founded prior to joining Microsoft, he developed a distributed Web load-balancing solution that was acquired by Microsoft and is now called Network Load Balancing within the Windows Server operating system.
This story, "In-memory computing brings real-time intelligence to operational systems," was originally published by InfoWorld.