Friday, March 13, 2015

Using Kafka and YARN for Stream Analytics on Hadoop

Real-time stream processing is growing in importance, as businesses need to be able to react faster to events as they that occur. Data that is valuable now may be worthless a few hours later. Use cases include sentiment analysis, monitoring and anomaly detection.

With cheap and infinitely scalable storage and compute infrastructure, more and more data flows into the Hadoop cluster. For the first time, the opportunity is ripe to fully leverage that infrastructure and bring real-time processing as close to the data in HDFS as possible, yet isolated from other workloads. This need has been a driver for Hadoop native streaming platforms and a key reason why other streaming solutions, like Storm, fall short.

This post motivates critical infrastructure pieces to build mission critical real-time streaming applications on Hadoop, specifically needed for end-to-end fault tolerance for the processing platform.

Read more here

Leave a Reply

All Tech News IN © 2011 & Main Blogger .