Tuesday, March 4, 2014

A GitHub River for Elasticsearch

We’ve started using Elasticsearch for a few of our projects. It’s a great tool for storing and, especially, querying giant text datasets. In the words of its creators, “Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine”. It’s fast, easy to use, and useful in many situations.

One of the things Elasticsearch does very well is listen to a large stream of data and index it. This is easily done with what’s called a river: a way to set up a continuous flow of data into your Elasticsearch datastore. Quoting the creators once more, “A river is a pluggable service running within Elasticsearch cluster pulling data (or being pushed with data) that is then indexed into the cluster.”

A river is more convenient than the classical approach of indexing data manually because, once it is configured, new data is pulled in and indexed automatically. This reduces complexity and also helps build a real-time system.
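To make "once configured" a bit more concrete, here is a minimal sketch of how a river was typically registered: by writing a `_meta` document into the special `_river` index. The river type `"github"` and the settings keys (`owner`, `repository`, `interval`) are assumptions for illustration and may not match the actual plugin's options; the river name, owner, and repository are hypothetical. The function only builds the request, it does not send it.

```python
import json
import urllib.request


def build_river_request(es_url, river_name, owner, repository, interval_seconds=3600):
    """Build (but do not send) the PUT request that registers a river."""
    # A river is registered by putting a _meta document under _river/<name>.
    url = f"{es_url.rstrip('/')}/_river/{river_name}/_meta"

    # ASSUMPTION: the "github" type and these setting names are illustrative,
    # not confirmed against the plugin's documentation.
    body = {
        "type": "github",
        "github": {
            "owner": owner,
            "repository": repository,
            "interval": interval_seconds,  # assumed polling interval, in seconds
        },
    }

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical river name, owner, and repository.
    req = build_river_request(
        "http://localhost:9200", "my_github_river", "octocat", "hello-world"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To register the river against a running cluster, one would send it:
    # urllib.request.urlopen(req)
```

After that single request there is nothing more to do by hand: the river keeps running inside the cluster and indexes new data as it arrives.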

