Wednesday, February 19, 2014

Hazelcast Introduces MapReduce API

Typical scenarios for the Hazelcast MapReduce API are distributed computations where the EntryProcessor is not a good fit: either you want to transform data or you want to use multiple data sources. It is also a good fit for long-running operations, since all of the current systems work directly on partition threads, so you do not have to do explicit locking for data changes. In one of the next versions I will add continuous map reduce support so you can have a fully streaming analysis running. The best example for this is always Twitter, which processes tweets in real time to collect information like retweets, favorites and a lot of other statistics. This is also useful for risk management and analysis.

The biggest differences from Hadoop are the in-memory and the real-time processing. In Hadoop you have different phases, where every phase is executed one after the other, whereas in Hazelcast you get full performance due to the internal concurrent design, where mapping and reducing run in parallel on all nodes. The phases themselves are pretty similar to what you find in Hadoop, so you have mapping (and combining), shuffling (partitioning to the nodes) and reducing phases, but they are not as clearly separated as in Hadoop.
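The phases named above can be sketched as a small word count in plain Java. This is only an illustration of the general technique, not the Hazelcast MapReduce API: the class name `MapReducePhasesSketch`, its methods, and the idea of modelling each node as a list of text lines are all assumptions made for this example. It also runs the phases one after the other in a single thread, whereas the interview describes Hazelcast running mapping and reducing in parallel on all nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the map/combine, shuffle and reduce phases.
// It does not use or mirror any Hazelcast classes.
final class MapReducePhasesSketch {

    // Map + combine: each node emits (word, 1) pairs and pre-aggregates
    // them locally, so only one partial count per word leaves the node.
    static Map<String, Integer> mapAndCombine(List<String> lines) {
        Map<String, Integer> local = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    local.merge(word, 1, Integer::sum);
                }
            }
        }
        return local;
    }

    // Shuffle: route every key to exactly one reducer bucket by hash,
    // collecting the partial counts that arrive from different nodes.
    static List<Map<String, List<Integer>>> shuffle(
            List<Map<String, Integer>> combined, int reducers) {
        List<Map<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            buckets.add(new HashMap<>());
        }
        for (Map<String, Integer> partial : combined) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                int target = Math.floorMod(e.getKey().hashCode(), reducers);
                buckets.get(target)
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        return buckets;
    }

    // Reduce: sum the partial counts for each key.
    static Map<String, Integer> reduce(List<Map<String, List<Integer>>> buckets) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> bucket : buckets) {
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                int sum = 0;
                for (int partial : e.getValue()) {
                    sum += partial;
                }
                result.put(e.getKey(), sum);
            }
        }
        return result;
    }

    // Runs all phases over the per-node inputs with two reducer buckets.
    static Map<String, Integer> wordCount(List<List<String>> nodes) {
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (List<String> nodeLines : nodes) {
            combined.add(mapAndCombine(nodeLines));
        }
        return reduce(shuffle(combined, 2));
    }

    public static void main(String[] args) {
        List<List<String>> nodes = Arrays.asList(
                Arrays.asList("hazelcast map reduce", "map"),
                Arrays.asList("reduce map"));
        // prints {hazelcast=1, map=3, reduce=2}
        System.out.println(wordCount(nodes));
    }
}
```

Because the combine step runs on each node before the shuffle, the first node in the usage example sends a single `map=2` record instead of two `map=1` records, which is the kind of saving a combiner is meant to give on large inputs.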

A comparison with MongoDB is hard since I have never used their MapReduce API. It seems to lack Combiners, which are very helpful for huge datasets, but as I said, I am not familiar with their implementation.

Read the complete interview here


All Tech News IN © 2011 & Main Blogger .