PML example: downloading a file

From Inference Web

(draft example; see iw:PML Primer for an introduction to PML.)

by: Tim Lebo

Introduction

Although data may come from an authoritative source, web interfaces conveying the data do not implicitly inherit the trust placed in that source, because some degree of processing has been performed between the source and the interface. This middle layer must be transparent and available for inspection if users are to trust the web application's message. The first step in providing the process documentation is reporting that we obtained the source data from the authoritative source.

(a newer example is at https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-pcurl.sh)

An example

As an example, the White House offers a list of its visitors, which RPI's data-gov team used to create demonstrations showing:

But besides RPI's claim that they used the White House's data, how can one know for sure? What if an observer wants to inspect biases that the application developer may have imposed? What if an observer sees a mistake -- was it caused by RPI or the White House?

A trace from the source data to the final visual provides a start to addressing these kinds of questions.

In this example, we look at the first step in the trace: downloading the data file from the White House. Subsequent steps include converting the data file to RDF, hosting it as a dump file and in a SPARQL endpoint, the demo application's query for a certain subset of the data, and the way the application populates a variety of visual constructs.

The process to document

How do we capture the provenance of downloading a file? For example,

curl -O http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Key-1209.txt
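As a rough illustration of what "capturing the provenance" of such a download might involve, the sketch below writes a plain-text sidecar file next to the downloaded copy. It is not PML and not the author's method: `record_download`, the `.prov.txt` suffix, and the field names are hypothetical, and it assumes a system that provides `sha256sum` (GNU coreutils).

```shell
#!/bin/sh
# Minimal sketch: note where a downloaded file came from, when the note
# was written, and a checksum of the bytes that were received.
# record_download is a hypothetical helper, not part of PML or curl.
record_download() {
  url="$1"    # the URL the file was requested from
  file="$2"   # the local copy written by curl
  {
    printf 'source_url: %s\n' "$url"
    printf 'local_file: %s\n' "$file"
    # Timestamp of this record; call right after the download finishes.
    printf 'recorded_at: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # Assumes GNU sha256sum is installed.
    printf 'sha256: %s\n' "$(sha256sum "$file" | cut -d' ' -f1)"
  } > "$file.prov.txt"
}

# Usage (requires network access), shown as comments only:
#   url=http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Key-1209.txt
#   curl -O "$url" && record_download "$url" WhiteHouse-WAVES-Key-1209.txt
```

The checksum lets a later observer confirm that the file being inspected is the same one that was retrieved. A PML description would carry the same kind of facts in a structured, linkable form.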

The process documentation

The following figure shows the full PML representation (using Turtle syntax) and a visual abbreviation of the same.

PDF version
