Tuesday, December 10, 2013

Understanding Storm: Low-Latency Processing

Storm is an open source system for low-latency distributed computation on a shared-nothing architecture. In this post, we present a simple but useful job that maintains the set of unique users from different geographical regions. In the next post, we will show how to deploy this job to Amazon EC2. These posts give only excerpts of the source; for the full source and demo data, see www.github.com/KasperMadsen/SimpleStormJob.
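To make the goal of the job concrete before looking at any Storm code, here is a minimal sketch in plain Java of the state such a job might maintain. It is not taken from the SimpleStormJob repository and uses no Storm API. The class name `UniqueUsersByRegion`, the string-typed region and user identifiers, and the choice to keep one set per region are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Storm or of the
// SimpleStormJob repository. It keeps one set of user IDs per region.
class UniqueUsersByRegion {
    private final Map<String, Set<String>> usersByRegion = new HashMap<>();

    // Records one (region, userId) event.
    // Returns true if this user has not been seen in this region before.
    boolean record(String region, String userId) {
        return usersByRegion
                .computeIfAbsent(region, r -> new HashSet<>())
                .add(userId);
    }

    // Number of distinct users seen so far in the given region (0 if none).
    int uniqueCount(String region) {
        return usersByRegion
                .getOrDefault(region, Collections.emptySet())
                .size();
    }
}
```

In a distributed setting, each worker would presumably hold a structure like this for the regions routed to it, which is why events for the same region need to reach the same worker. This sketch keeps everything in memory in a single process and ignores fault tolerance.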

Before starting, I would like to point out the difference between Hadoop and Storm. Both systems scale very well and can thus be used to process large amounts of data. One of the main differences is that Storm is optimized for low-latency processing, while Hadoop is batch-based and is not. Providing low latency comes at a cost compared to batch processing, which makes direct performance comparisons unfair: the two systems solve different problems. As a rule of thumb, use Storm when low latency matters; otherwise, you are probably better off using Hadoop.

