Wednesday, September 4, 2013

Streaming MapReduce with Summingbird

Summingbird is a library that lets you write streaming MapReduce programs that look like native Scala or Java collection transformations and execute them on several well-known distributed platforms, such as Storm (real-time stream processing) and Scalding (batch jobs on Hadoop).

For example, a word-counting aggregation in pure Scala might look like this:

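import scala.collection.mutable.{Map => MutableMap}

// toWords (not shown) is assumed to split a sentence into words.
def wordCount(source: Iterable[String], store: MutableMap[String, Long]): Unit =
  source.flatMap { sentence =>
    toWords(sentence).map(_ -> 1L) // one (word, 1L) pair per word
  }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }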
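
The pure-Scala snippet relies on a `toWords` helper that the post never defines. A minimal self-contained sketch, assuming a hypothetical `toWords` that lowercases a sentence and splits it on anything other than the letters a-z (the wrapper object `PureWordCount` is also just for illustration), could look like this:

```scala
import scala.collection.mutable

object PureWordCount {
  // Hypothetical tokenizer (the post does not define toWords): lowercase the
  // sentence, split on anything other than a-z, and drop empty tokens.
  def toWords(sentence: String): Seq[String] =
    sentence.toLowerCase.split("[^a-z]+").toSeq.filter(_.nonEmpty)

  // Turn each sentence into (word, 1L) pairs, then add every pair into the
  // mutable store, starting from 0 for words not seen before.
  def wordCount(source: Iterable[String], store: mutable.Map[String, Long]): Unit =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }
}
```

For instance, calling `PureWordCount.wordCount(Seq("the cat", "The dog"), store)` on an empty store leaves it holding `the -> 2`, `cat -> 1` and `dog -> 1`.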

In Summingbird, counting words looks like this:

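import com.twitter.summingbird._

// Generic over any Summingbird platform P (for example Storm or Scalding).
// toWords is assumed to split a sentence into words.
def wordCount[P <: Platform[P]]
  (source: Producer[P, String], store: P#Store[String, Long]) =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.sumByKey(store) // sum the 1L values per word into the platform's store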
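
The point of the Summingbird version is that `wordCount` is written once against an abstract platform `P`, and the platform decides how to execute it. The sketch below is a toy, hypothetical imitation of that idea in plain Scala, not Summingbird's API: `ToyPlatform`, `OneAtATime`, `Batched`, `ToyWordCount` and `toWords` are all invented for illustration. Both toy platforms produce the same counts because summing `Long`s gives the same result regardless of how the pairs are grouped before they are merged into the store.

```scala
import scala.collection.mutable

// Toy, hypothetical model of "write once, run on several platforms".
// None of these names come from Summingbird; they only mimic the shape of
// a flatMap followed by a keyed sum into a store.
trait ToyPlatform {
  type Flow[T]
  def fromSeq[T](items: Seq[T]): Flow[T]
  def flatMap[T, U](s: Flow[T])(f: T => Iterable[U]): Flow[U]
  def sumByKey[K](s: Flow[(K, Long)], store: mutable.Map[K, Long]): Unit
}

// Handles elements one at a time, adding each pair straight into the store.
object OneAtATime extends ToyPlatform {
  type Flow[T] = Seq[T]
  def fromSeq[T](items: Seq[T]): Flow[T] = items
  def flatMap[T, U](s: Flow[T])(f: T => Iterable[U]): Flow[U] = s.flatMap(f)
  def sumByKey[K](s: Flow[(K, Long)], store: mutable.Map[K, Long]): Unit =
    s.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }
}

// Splits the input into fixed-size batches, pre-sums each batch per key,
// then merges those partial sums into the store.
final class Batched(batchSize: Int) extends ToyPlatform {
  require(batchSize > 0, "batchSize must be positive")
  type Flow[T] = Seq[Seq[T]]
  def fromSeq[T](items: Seq[T]): Flow[T] = items.grouped(batchSize).toSeq
  def flatMap[T, U](s: Flow[T])(f: T => Iterable[U]): Flow[U] = s.map(_.flatMap(f))
  def sumByKey[K](s: Flow[(K, Long)], store: mutable.Map[K, Long]): Unit =
    s.foreach { batch =>
      val partial = batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
      partial.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }
    }
}

object ToyWordCount {
  // Hypothetical tokenizer: lowercase, split on anything other than a-z.
  def toWords(sentence: String): Seq[String] =
    sentence.toLowerCase.split("[^a-z]+").toSeq.filter(_.nonEmpty)

  // Written once against the abstract toy platform p.
  def wordCount(p: ToyPlatform)(source: p.Flow[String], store: mutable.Map[String, Long]): Unit =
    p.sumByKey(p.flatMap(source)(s => toWords(s).map(_ -> 1L)), store)
}
```

For example, running `ToyWordCount.wordCount` over `Seq("the cat", "The dog, the end", "a cat")` with `OneAtATime` and with `new Batched(2)` fills both stores with `the -> 3`, `cat -> 2`, `dog -> 1`, `end -> 1` and `a -> 1`.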
