Wednesday, October 30, 2013

How We Selected Apache Kafka on our Path to Real-time Data Ingestion


About a year ago, RichRelevance was looking for a way to enhance our existing approach to collecting clickstream data and to move that data reliably, scalably, and in real time from our front-end data centers to our back-end cloud-based platform. As the global leader in omni-channel personalization, we serve more than 160 global clients in over 40 countries, and sit on petabytes of customer data. What made this effort especially challenging was that our infrastructure is globally distributed.

We used several key requirements to guide our search, including:

  • An overall design principle that favored streaming over a batch approach 
  • Low operational complexity while being reliable and scalable 
  • A pub-sub model that incorporates both push and pull mechanisms 
  • A system that fit very well with our existing use cases (involving pacing of ad campaigns, omni-channel retail use cases, etc.) 
  • The ability to pull data at different rates, so the same stream can feed analytics, operational model building, or BI, rather than forcing everything through a single ETL pipeline inside the platform. (Being able to pull this streaming data for any use case, and to integrate it with other streaming computation frameworks, was critical.)
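The last two requirements map onto Kafka's core design: consumers pull from an append-only log and each tracks its own offset, so independent readers can consume the same stream at different rates. Here is a toy sketch of that idea (not Kafka's actual API; the class and variable names are illustrative):

```python
class Log:
    """Append-only event log, analogous to a single Kafka partition."""
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read(self, offset, max_events):
        """Return up to max_events starting at offset (pull, not push)."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Tracks its own offset; the log never pushes data to it."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_events=10):
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch


log = Log()
for i in range(5):
    log.append(f"click-{i}")

# Two consumers of the same stream, pulling at their own pace:
analytics = Consumer(log)  # fast reader, e.g. real-time analytics
modeling = Consumer(log)   # slow reader, e.g. periodic model builds

print(analytics.poll(5))   # ['click-0', 'click-1', 'click-2', 'click-3', 'click-4']
print(modeling.poll(2))    # ['click-0', 'click-1']
print(modeling.poll(2))    # ['click-2', 'click-3']
```

Because the broker only stores the log and consumers own their read position, a slow BI consumer never blocks a fast analytics consumer, which is exactly the decoupling we were after.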
