Tuesday, September 16, 2014

GraphX: Graph Processing in a Distributed Dataflow Framework

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper, we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
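To make the claim about "a few basic dataflow operators" concrete, here is a minimal sketch of one PageRank-style update written only in terms of join, map, and group-by over plain Python collections. It is not the GraphX API and not the paper's implementation. The toy graph, the function names `out_degrees` and `pagerank_step`, and the damping value of 0.85 are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph, stored as two flat collections the way a
# dataflow system would hold them (these values are illustrative only).
vertices = {1: 1.0, 2: 1.0, 3: 1.0}        # vertex id -> current rank
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]   # (source id, destination id)


def out_degrees(edges):
    """Group-by on the source id, counting outgoing edges per vertex."""
    degrees = defaultdict(int)
    for src, _dst in edges:
        degrees[src] += 1
    return dict(degrees)


def pagerank_step(vertices, edges, damping=0.85):
    """One PageRank-style iteration built from join, map, and group-by."""
    degrees = out_degrees(edges)

    # Join each edge with its source vertex's rank and out-degree,
    # then map the joined row to a (destination, contribution) message.
    messages = [(dst, vertices[src] / degrees[src]) for src, dst in edges]

    # Group-by destination id and sum the incoming contributions.
    incoming = defaultdict(float)
    for dst, contribution in messages:
        incoming[dst] += contribution

    # Map over the vertex collection to apply the (unnormalized) rank update.
    return {
        vid: (1.0 - damping) + damping * incoming.get(vid, 0.0)
        for vid in vertices
    }


ranks = pagerank_step(vertices, edges)
```

In a distributed setting the two collections would be partitioned across machines, and the join between edges and vertex state is the step that the abstract says GraphX optimizes.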


