Wednesday, August 12, 2015

How Apache Spark, Scala, and Functional Programming Made Hard Problems Easy at Barclays

At Barclays, our team recently built an application called Insights Engine that executes an arbitrary number N of near-arbitrary SQL-like queries in a way that scales with increasing N. The queries were non-trivial: each constituted 200-300 lines of SQL, and as Apache Hive scripts they ran over a large dataset for hours. Yet for our use case we needed to execute 50 queries in less than an hour.

While we could have achieved huge speed-ups by moving from Hive to Impala, we determined that SQL was a poor fit for this application. Our solution instead was to design a flexible, super-scalable, and highly optimized aggregation engine built in Scala and Apache Spark, with some help from functional programming. This post discusses the problem and presents the Barclays solution, in particular showing how applying functional programming in Apache Spark was key to the outcome. The techniques discussed in this post will be useful for writing custom machine-learning algorithms or building complex applications, especially when flexibility, abstraction, robustness, stability, and speed of delivery are important.
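To make the idea of a functional aggregation engine concrete, here is a minimal sketch (not the actual Insights Engine code; all names such as `Txn`, `Query`, and `runAll` are illustrative assumptions) of how queries can become first-class Scala functions that Spark evaluates together in a single pass over the data, rather than N separate SQL scripts:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical record type standing in for a row of the dataset.
case class Txn(customerId: String, category: String, amount: Double)

// A "query" is just a function from one record to zero or more
// (key, value) pairs; the key names the query and its grouping column.
type Query = Txn => Seq[((String, String), Double)]

// Two example queries, defined as ordinary Scala values.
val totalByCategory: Query =
  txn => Seq((("totalByCategory", txn.category), txn.amount))

val countByCustomer: Query =
  txn => Seq((("countByCustomer", txn.customerId), 1.0))

// Because queries are values, running N of them is one flatMap followed
// by one shuffle, instead of N full scans of the dataset.
def runAll(data: RDD[Txn], queries: Seq[Query]): RDD[((String, String), Double)] =
  data.flatMap(txn => queries.flatMap(q => q(txn)))
      .reduceByKey(_ + _)
```

The design point is that composition happens in the host language: adding a fifty-first query means appending one more function to a `Seq`, with no change to how the job is scheduled or how many times the data is read.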


