IW Meeting 2012-11-29

Meeting Information



  • Tim (cannot join until 3:30)

Meeting Preparation

Around the room

* Add a section for yourself 2 hours before the meeting.
* Mark any discussion point that you would like to raise during the meeting with DURING MEETING.
* Otherwise, assume that others will read the rest before the meeting.
* Also, please be considerate and read others' discussion points before the meeting starts.


  • PROV-O and friends are now finalized for CR publication. Awaiting the Director's call next week before actually publishing them.
  • Deployed plunk on provenanceweb VM and loaded some of LOGD's csv2rdf4lod PML (see listing at http://aquarius.tw.rpi.edu/projects/provenanceweb/namedGraphs)
    • Need to find a way to load only a sample of the verbose proofs (e.g. TPTP).
  • Gathered ~50 file format URIs and XML/CSV descriptions from UK National Archives project PRONOM
    • e.g. http://www.nationalarchives.gov.uk/pronom/fmt/111
    • A collection of file formats will be a useful dataset in provenanceweb.org (where is UTEP's collection?)
    • DROID tool identifies formats of files in a directory structure (asserts the PRONOM format ids).
    • Applied DROID tool to 4 data-gov dataset directories (120k files, 250GB) and created csv2rdf4lod eparams to transform to [some rather nice] RDF.
  • Applied Jim's WWW sio:has-attribute [ sio:references ] pattern to the DROID tool output.
    • Sketched out the design for how to record the provenance of DROID's claim that a file has a certain format.
    • Explored the idea that prov:Attribute is a subclass of prov:Agent. It's now Jim-two-thumbs-approved.
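
A minimal sketch of how a PRONOM format id reported by DROID could be turned into a format URI for the RDF. The base URI is taken from the fmt/111 example above; the helper name puid_to_uri is hypothetical, and extending the same pattern to the x-fmt namespace is an assumption. This is not the csv2rdf4lod eparams themselves.

```python
# Base taken from the example URI http://www.nationalarchives.gov.uk/pronom/fmt/111
PRONOM_BASE = "http://www.nationalarchives.gov.uk/pronom/"


def puid_to_uri(puid: str) -> str:
    """Turn a PRONOM id such as 'fmt/111' into a format URI (hypothetical helper)."""
    namespace, _, number = puid.strip().partition("/")
    # Assumption: ids use the 'fmt' or 'x-fmt' namespace followed by digits.
    if namespace not in ("fmt", "x-fmt") or not number.isdigit():
        raise ValueError(f"not a PRONOM identifier: {puid!r}")
    return f"{PRONOM_BASE}{namespace}/{number}"


if __name__ == "__main__":
    print(puid_to_uri("fmt/111"))
```

Rejecting anything that does not look like a PRONOM id keeps malformed DROID output from becoming a bogus URI in the loaded graphs.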

RETURN TO: submitting implementation reports to PROV-WG.

Up next:

  • Reviving http://healthdata.tw.rpi.edu/ + provenanceweb work.
    • Yes from Deborah: would like to push this more and, when it makes sense, publish a writeup on healthdata as well as provenanceweb.
    • Implementing lodspeakr views of the underlying loaded data.
    • Expose frequency of occurrence and usage patterns.
    • Implement "ontology+data" views; tie in https://github.com/jimmccusker/twc-healthdata/tree/master/ontology
    • Review PROV-O use in DataFAQs (since it's almost a year old).


  • Finished work on aggregation, identity and provenance paper, submitted to WWW: https://www.dropbox.com/s/0piwni2cp6w3fdc/aggident.pdf
  • Includes a mapping of SIO (Semanticscience Integrated Ontology) into PROV.
  • Preparing AGU poster; need a volunteer to pick it up at Kinko's in SF next week.
  • Rest of week on community science proposal.
  • Back on healthdata after that.
  • PML3 as it can be squeezed in.


  • Preparing for AGU (poster on SPCDIS work)
  • Research:
    • Approach for encoding/composing process plans
    • Goal: capture details on data usage, which can be established through plans based on ontology classification. Plans will capture the granularity of the data being used, expressed through a datacube encoding. The indicator classification in the FUSE ontology is being used as a starting point.
    • Planning to submit to either IJCAI or AAAI.
      • TODO: send draft title and abstract before next week.


  • FUSE code cleanup
  • Preparation for next phase.


The efforts above are intended for use by the NASA-funded ELSEWeb project: http://earthdata.nasa.gov/node/2858



  • Community science proposal
  • IARPA follow-up
  • AGU prep
  • US Park Service
  • DataONE
  • Return to:
    • Would like to get back to PML3 and healthdata

Outstanding Items

Date: 15 November 2012
