Monday, October 21, 2013

Crawling the Web with Cassandra and Nutch


So, you want to harvest a massive amount of data from the internet? What better storage mechanism than Cassandra? This is easy to do with Nutch.

People often use HBase behind Nutch. This works, but it may not be ideal if you are (or want to be) a Cassandra shop. Fortunately, Nutch 2.x uses the Gora abstraction layer to access its data store, and Gora supports Cassandra. So with a few configuration tweaks, you can use Nutch to harvest content directly into Cassandra.
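To give a sense of what those tweaks look like, here is a minimal sketch of the storage setting in conf/nutch-site.xml. It assumes a Nutch 2.x release built with the gora-cassandra module on the classpath; the property and class names are the ones I recall from the Nutch 2.x defaults, so check them against the nutch-default.xml that ships with your version.

```xml
<!-- conf/nutch-site.xml: a sketch, assuming Nutch 2.x with gora-cassandra available -->
<configuration>
  <property>
    <!-- Tells Nutch which Gora data store implementation to use -->
    <name>storage.data.store.class</name>
    <value>org.apache.gora.cassandra.store.CassandraStore</value>
  </property>
</configuration>
```

You will most likely also need to point Gora at the same store in conf/gora.properties (for example, by setting gora.datastore.default to the same class name) and enable the gora-cassandra dependency in the build before compiling. The exact steps vary between releases.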

