Tuesday, August 27, 2013

Upton: a framework for easy web-scraping

Upton is a framework for easy web-scraping with a useful debug mode that doesn't hammer your target's servers. It does the repetitive parts of writing scrapers, so you only have to write the unique parts for each site.

With Upton, you can scrape complex sites to a CSV in just one line of code:

Upton::Scraper.new("http://website.com/list_of_stories.html", "a#article-link", :css).
     scrape_to_csv("output.csv", &Upton::Utils.list("#comments li a.commenter-name", :css))

Just specify a URL to a list of links (or simply a list of links), an XPath expression or CSS selector for the links, and a block describing what to do with the content of the pages you've scraped. Upton comes with some pre-written blocks (Procs, technically) for scraping simple lists and tables, like the list function above.
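To make the "pre-written block" idea concrete, here is a minimal sketch of the underlying Ruby pattern: a helper method returns a Proc, and the `&` operator hands that Proc to a method as its block. This is not Upton's implementation. `MiniScraper` and `list_items` are hypothetical names invented for illustration, the "pages" are in-memory strings rather than fetched URLs, and a regex stands in for real CSS or XPath selection.

```ruby
# Hypothetical stand-in for a scraper: the "pages" are in-memory HTML
# strings rather than fetched URLs, so the sketch runs without a network.
class MiniScraper
  def initialize(pages)
    @pages = pages
  end

  # Calls the block once per page and collects the rows it returns.
  def scrape(&blk)
    @pages.flat_map { |html| blk.call(html) }
  end
end

# Hypothetical helper that returns a reusable Proc, in the spirit of a
# pre-written block. A regex stands in for real HTML parsing (assumption:
# the markup is simple and the tags are not nested).
def list_items(tag)
  escaped = Regexp.escape(tag)
  pattern = %r{<#{escaped}\b[^>]*>(.*?)</#{escaped}>}m
  proc { |html| html.scan(pattern).map { |(text)| [text.strip] } }
end

pages = [
  "<ul><li>alice</li><li>bob</li></ul>",
  "<ul><li> carol </li></ul>"
]

# The & operator converts the returned Proc into the method's block.
rows = MiniScraper.new(pages).scrape(&list_items("li"))
```

Each row is a one-element array, which is the shape a CSV writer would expect for a single-column file. Writing your own block for an unusual page follows the same pattern: return one array per output row.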

More details here

