Wednesday, January 14, 2015

Apache Samza: LinkedIn’s Stream Processing engine

LinkedIn began processing “big data” on Apache Hadoop six years ago. As time passed, we recognized that some of our use cases couldn’t be implemented in Hadoop due to the large turn-around time that batch processing needed. We wanted our results to be calculated incrementally and available immediately at any time.

Around the same time, LinkedIn developed Apache Kafka, which is a low-latency distributed messaging system. Unlike Hadoop, which is optimized for throughput, Kafka is optimized for low-latency messaging. We built a processing system on top of Kafka to allow us to react to the messages- to join, filter, and count the messages. The new processing system, Apache Samza, solved our batch processing latency problem and has allowed us to process data in near real-time.

Read more here

Leave a Reply

All Tech News IN © 2011 & Main Blogger .