Monday, November 10, 2014

Real time processing frameworks - S4 and Storm

S4, Storm – When, What and How to choose.

Real-time processing means processing, transforming and analyzing data on the fly, as it is generated or received. It differs from batch processing, where data is first stored as tables, files or blocks and the stored data is then processed in chunks, in a distributed, parallel fashion. In real-time processing, data is processed as individual records or small groups of records, depending on how fast the data arrives or is generated. Numerous open-source frameworks are available for performing real-time operations on streaming data, for example S4, Storm and Spark Streaming. In this blog post, I will focus on S4 and Storm.
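To make the contrast concrete, here is a minimal Java sketch, not tied to S4 or Storm, that computes an average in both styles. The class name `RunningAverage` and its methods are made up for illustration. The streaming path updates a small piece of state as each record arrives, while the batch path assumes the whole data set is already stored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not S4 or Storm API.
final class RunningAverage {
    private long count;
    private double sum;

    // Streaming style: update the state as soon as a record arrives.
    void onRecord(double value) {
        count++;
        sum += value;
    }

    // The result is available at any time, based on the records seen so far.
    double current() {
        return count == 0 ? 0.0 : sum / count;
    }

    // Batch style: the data is already stored and is processed in one pass.
    static double batchAverage(List<Double> stored) {
        double total = 0.0;
        for (double v : stored) {
            total += v;
        }
        return stored.isEmpty() ? 0.0 : total / stored.size();
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        for (double v : new double[] {10.0, 20.0, 30.0}) {
            avg.onRecord(v);
            System.out.println("after " + v + ": " + avg.current());
        }
        System.out.println("batch: " + batchAverage(Arrays.asList(10.0, 20.0, 30.0)));
    }
}
```

Both paths end with the same answer; the difference is that the streaming version has a usable answer after every record.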

S4 is a general-purpose, scalable, distributed platform for processing event streams. S4 is developed in Java. In S4, a processing element (PE) is the smallest component responsible for performing operations on a subset of the data, or on a partition of the entire data, depending on the design. Applications (called Apps in the S4 world) are built as a graph of processing elements. S4 spawns a PE instance for each unique key value in the data, depending on the design of the App. Adapters are S4 applications that convert external streams into streams of S4 events. PEs communicate asynchronously by sending events on streams. Events are dispatched to nodes according to their key.
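The keyed behaviour described above can be sketched in plain Java. This is a simulation under my own assumptions, not the S4 API: `KeyedDispatchSketch`, `CounterPE`, `nodeFor` and `dispatch` are hypothetical names, and hashing the key modulo the node count is just one simple way to make the same key always land on the same node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of keyed dispatch; these names are NOT the S4 API.
final class KeyedDispatchSketch {

    // Stand-in for a processing element: one instance per unique key.
    static final class CounterPE {
        final String key;
        int count;

        CounterPE(String key) {
            this.key = key;
        }

        void onEvent() {
            count++;
        }
    }

    private final Map<String, CounterPE> instances = new HashMap<>();
    private final int nodeCount;

    KeyedDispatchSketch(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Assumption: pick the node by hashing the key, so equal keys share a node.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Route an event to the PE for its key, creating the PE on first use.
    CounterPE dispatch(String key) {
        CounterPE pe = instances.computeIfAbsent(key, CounterPE::new);
        pe.onEvent();
        return pe;
    }

    int instanceCount() {
        return instances.size();
    }

    public static void main(String[] args) {
        KeyedDispatchSketch app = new KeyedDispatchSketch(4);
        for (String word : new String[] {"storm", "s4", "storm"}) {
            CounterPE pe = app.dispatch(word);
            System.out.println(word + " -> node " + app.nodeFor(word) + ", count " + pe.count);
        }
        System.out.println("PE instances: " + app.instanceCount());
    }
}
```

Three events with two distinct keys yield two PE instances, which is the "one PE per unique key" idea in miniature.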

Storm is a scalable, fault-tolerant platform for processing event streams. In my previous blog post, Comparing Apache Storm and Trident, I explained more about this framework. Storm is developed partly in Java and partly in Clojure. In Storm, a bolt is responsible for performing operations on a subset of the data. The user controls how streams of tuples are routed to the appropriate bolts, based on the requirement.
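Storm exposes this routing control through stream groupings, such as shuffle grouping and fields grouping. The sketch below only simulates the two ideas in plain Java and does not use the Storm API: `GroupingSketch` and its methods are hypothetical, and round-robin is used as a simple stand-in for Storm's randomized shuffle.

```java
// Hypothetical simulation of tuple routing; this is NOT the Storm API.
final class GroupingSketch {
    private final int taskCount;
    private int next; // round-robin cursor, a stand-in for shuffle grouping

    GroupingSketch(int taskCount) {
        this.taskCount = taskCount;
    }

    // Fields-style grouping: the same field value always reaches the same bolt task.
    int fieldsGrouping(String fieldValue) {
        return Math.floorMod(fieldValue.hashCode(), taskCount);
    }

    // Shuffle-style grouping: spread tuples evenly, regardless of their content.
    int shuffleGrouping() {
        int task = next;
        next = (next + 1) % taskCount;
        return task;
    }

    public static void main(String[] args) {
        GroupingSketch routing = new GroupingSketch(3);
        for (String word : new String[] {"apple", "pear", "apple"}) {
            System.out.println(word
                    + ": fields -> task " + routing.fieldsGrouping(word)
                    + ", shuffle -> task " + routing.shuffleGrouping());
        }
    }
}
```

A fields-style grouping suits per-key state such as word counts, while a shuffle-style grouping suits stateless work where only an even load matters.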


