IW Meeting 2012-10-11

Meeting Information



  • Tim
  • James (regrets)
  • Deborah (Chicago)
  • Jim
  • Cynthia
  • Paulo (who is using titanpad from PNNL!)
  • Patrick (titanpad only)

Meeting Preparation

Around the room

  • Add a section for yourself 2 hours before meeting.
  • Mark any discussion point that you would like to raise during meeting (with DURING MEETING).
  • Otherwise, assume that others will read the rest before meeting.
  • Also, please be considerate and read others' discussion points before the meeting starts.



Deborah: swap order, based on FUSE inspiration? What inputs for the explanation stuff, that would prioritize it?

RETURN TO: bring the class in? They're using the healthdata.


  • FUSE support (visualization testing)


  • FUSE delivery
    • FUSE ontology extension
    • FUSE explanation service integration with BAE's
    • FUSE indicator metrics, features, rationale and relevance functionalities.


Titanpad still hasn't solved her issues. :-(

      Titanpad is now working for Paulo (from PNNL) -- please do not change! ;-)

  • Carole Goble's keynote at IEEE eScience on reproducibility is great.
    • Worth checking out, and also worth looking at the references.
    • It may be worth reviewing it and thinking about the reproducibility slant on our work. I noted:
      • pdiff from Woodman et al. - compares provenance traces to diagnose divergence - we probably want to cite it in any follow-on to our proof combination
      • Stodden in AMP 2011 - documents barriers to describing data
  • Gave invited talk at Big Data in Rochester. I called it Data Semantics - might be worth returning to in discussion (RETURN TO)
  • class projects this term are: SemantEco extensions; health care data challenge extension / hospital angle; First Responder extension; political contribution and corruption

TODO: review her talk.

RETURN TO: data semantics slant

RETURN TO: healthdata , provenance requirements and class.


  • Mostly all healthdata
  • Starting work on aggregation/data cube/specializationOf theory paper for ESWC(?)
    • Includes provenance of aggregations
  • Might set up an "ontopedia": information about ontologies, with open editing/reviews; could apply to provenance ontologies
  • Helping out on FUSE where asked, but I keep seeing tasks disappear in front of me.


  • VisKo to be exposed to ESIP community. Nice opportunity to expose PML3 through VisKo although we are not going to talk much about provenance during our initial meetings with the ESIP community.
  • Can Tim get some VisKo PML to toss into provenanceweb? It could steer the focus for PML 3 terms to start writing up.
    • Gives us a good sense of coverage.
    • Paulo: you can turn it on with provenance.
    • TODO: Paulo to give Tim one example and teach him to fish so that he can get more.
    • TODO: Tim to shove it into provenanceweb
  • Tim realized the other day that we should replace PLUNK's manual Turtle with CKAN... (it would fix the pain of the "where is your PML 2" problem that we're discussing right now).
  • Is VisKo doing any reasoning beyond file formats?

NASA EO - image of the day. Gets 250k hits per day. Give them provenance via VisKo?


How they deal with colors, how they deal with labels.

TODO: Tim to look at PML 2, look to re-encode to PML 3.

TODO: Tim to sit with Nick and Paulo to discuss how to encode PML 3.



  • getting my hair cut this weekend, really looking forward to that
  • might set up weekly chiropractic appointments ... I need it
  • nothing to report in terms of provenance.
    • OPeNDAP software provenance, still need to write it up
      • OPeNDAP is responsible for reading in a datafile and "translating" it into a DAP object
      • Server-side functions can be run against the data to do additional transformations
      • can return the data in a different format
      • keeping track of version information of the software, what transformations, translations, or anything else that is done to the data
    • MetPetDB provenance writeup
      • they receive the data and someone takes that data and puts together a set of spreadsheets
      • the information is run through an upload script that puts it all in the database
      • no additional metadata is used in the upload (descriptions, comments, etc...)
      • also wondering if the method by which subsampling is done might be of importance; will ask the community (they get a big rock, and to run tests on it they break it up into smaller subsamples)
    • I'd LOVE to finish up SPCDIS. I know the project is over, but I still want to pursue the project and get it working and integrated into the data pipeline at HAO
    • Starting to work on CEDAR data provenance for VSTO
  • thinking of taking a cooking class, what do you all think?

Thanks, Patrick. No problem!

Outstanding Items

  • discussion of plans for PML 3 and how to accelerate
  • students dump their provenance?
  • students answer survey about their provenance use?
  • Jim is using the hospital data (converter, how to use, what challenges are, creating requirements).
  • Patrice is using the SemantAqua project
  • Evan is a contact on the political one (collecting more data)
  • John/Raymond/Deborah on the NIST First responders

Paulo: Information Extraction. Mississippi. NIMD. SourceUsage: a document that is open, but might need a query into the document, might want a part of it. Is in English, another format, escape characters. Whole infrastructure for info extraction.

FUSE is not using UIMA, but is doing info extraction.

Tim: students could toss Tim their prov, and he'll load it, and they can craft some LODSPeaKr models/views.

  • TODO: Tim and Jim.
