Thursday, January 9, 2014

Spark: Low latency, massively parallel processing framework

While Hadoop fits most batch processing workloads well and is the primary choice for big data processing today, it is not optimized for other types of workloads because of the following limitations:

  • Lack of iteration support 
  • High latency due to persisting intermediate data onto disk 
Nevertheless, the Map/Reduce processing paradigm is a proven mechanism for dealing with large-scale data. In addition, many pieces of Hadoop's infrastructure, such as HDFS and HBase, have matured over time.

In this blog post, we'll look at a different architecture called Spark, which builds on the strengths of Hadoop and improves on a number of its weaknesses, providing a more efficient batch processing framework with much lower latency. Spark has generated a lot of excitement in the big data community and represents a very promising parallel execution stack for big data analytics.

