PML example: RDFizing ScraperWiki's CSV

From Inference Web

(draft example; see iw:PML Primer for an introduction to PML.)

by: Tim Lebo

Introduction

ScraperWiki (http://data-gov.tw.rpi.edu/wiki/ScraperWiki) lets users write Python code that scrapes specific web pages for data, and then provides the result in CSV format. This CSV data can then be converted to RDF with a tool such as csv2rdf4lod (http://data-gov.tw.rpi.edu/wiki/Csv2rdf4lod).
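To make the CSV-to-RDF step concrete, here is a minimal sketch in Python that turns each CSV row into N-Triples statements. It is not how csv2rdf4lod works internally; it only illustrates the general idea of one subject per row and one predicate per column. The base URI, the column names, and the sample row are all assumptions made up for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI; a real conversion would use the dataset's own namespace.
BASE = "http://example.org/dataset/oil-wells/"


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, base=BASE):
    """Naive conversion: one subject per row, one predicate per column header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}row/{row_number}>"
        for column, value in row.items():
            # Skip overflow cells (column is None) and empty or missing values.
            if column is None or value is None or value == "":
                continue
            local_name = quote(column.strip().replace(" ", "_"))
            predicate = f"<{base}vocab/{local_name}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


# Made-up sample data standing in for a scraper's CSV output.
sample = "well_name,operator\nA-1,Example Oil\n"
print(csv_to_ntriples(sample))
```

Every value is emitted as a plain string literal here; typing values, linking to existing vocabularies, and minting stable row identifiers are the parts a dedicated converter handles.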


An example

The lineage of the oil well data:

The process to document

The interesting thing about this process is that the immediate source may appear less trustworthy ("who's scraperwiki.com?"), yet process documentation would show that the data originated from the UK government. The RDF version would also be hosted at logd.tw.rpi.edu, so there are three levels of indirection.

Another important aspect is that the UK government may want to know which applications are using their data. Process documentation could be queried to find everything derived from their source.
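The kind of query described here can be sketched with a plain Python dictionary standing in for the process documentation. This is not PML vocabulary or an Inference Web API; it only shows the idea of following "derived from" links forward from a source. All node names are hypothetical labels, and "example-app" is an invented consumer added for illustration.

```python
from collections import deque

# Hypothetical lineage records: each product maps to the sources it was derived from.
DERIVED_FROM = {
    "scraperwiki-csv": ["uk-gov-web-page"],
    "logd-rdf": ["scraperwiki-csv"],
    "example-app": ["logd-rdf"],  # invented downstream application
}


def derived_from(source, edges=DERIVED_FROM):
    """Return everything derived, directly or indirectly, from `source`."""
    # Invert the records so we can walk from a source to its products.
    products = {}
    for product, sources in edges.items():
        for s in sources:
            products.setdefault(s, []).append(product)

    found = []
    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first walk down the derivation chain
        current = queue.popleft()
        for product in products.get(current, []):
            if product not in seen:
                seen.add(product)
                found.append(product)
                queue.append(product)
    return found


print(derived_from("uk-gov-web-page"))
```

Asked about the government web page, the walk reaches the scraped CSV, the RDF version, and the application in turn, which is the answer the original publisher would be looking for.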

The process documentation

http://scraperwiki.com/scrapers/show/uk-offshore-oil-wells/history/ shows when the scraper code was modified and executed.

Another example

http://scraperwiki.com/scrapers/show/annual-average-daily-traffic-flows/data/
