2011 30 June UTEP visit to RPI minutes

From Inference Web




Deborah, Jiao, Jim McCusker, Ping, Yangfan Li, Allen Renear, James Michaelis, Patrick West, Nick Del Rio, Hugo Porras, Stephan, Cynthia, Joanne


http://inference-web.org/wiki/2011_30_June_UTEP_visit_to_RPI
http://inference-web.org/w/images/8/8c/2011-06.30-lebo-csv2rdf4lod-pml.pdf

Action Items

  • TODO: Deborah to remind Paul to write the summary supporting the proposal.
  • TODO: Paulo and Deborah to pursue LTER connections via the SONET effort.

Objectives for day

1) Coordinate RPI/UTEP efforts
2) prov-wg

Projects using provenance

Deborah listed efforts and whom they support (these should be listed at http://inference-web.org/wiki/Projects):

  • SESF
  • FUSE
  • SPCDIS (with NCAR)
  • Semantic Sea Ice (with the National Snow and Ice Data Center)
  • PopSciGrid (with NIH/NCI)
  • to be described later: SWAMP/CSIRO, NASA efforts, possibly WHOI

Paulo listed efforts and whom they support.

Allen (mostly interested in theoretical problems): Illinois

  • co-lead (with Dave Dubin) of the Data Concepts working group (DCDC) of the Illinois component (PI Carole Palmer) of the DataNet Data Conservancy project (prime: JHU).
  • also coordinating with another Illinois component of the Data Conservancy: the Data Practices Group (led by Carole Palmer), which empirically studies scientists' behavior around data sets.
  • PI on an IMLS project supporting humanities data curation, with two provenance projects getting under way: (i) literary databases and workflows and (ii) music databases and workflows.

PML's lessons learned

* come back to this after today's discussion

Useful vs. usable: best to give the constructs and how they are used. Benefits of PML: it is useful (this is our focus).

  • crossing derivation with the abstract vs. concrete distinction.
  • Proof Theory (and its relaxed version): the mechanics of transforming things.
    • TODO: look to proof theory for what is and is NOT essential (e.g. it does not have Time). - Paulo
  • Theory of Provenance: Proof Theory (which exists) + Theory about Assertions (which does not exist).
  • (notion of) source usage - particularly motivated by applications that integrate with text analytics components such as UIMA
  • proofs that explain themselves.
  • "validation", but it highlights the fact that PML has problems outside of its ontology.
  • Integrity Constraints being applied to provenance aspects of PML.
  • We've been relaxing OWL constraints of the ontology the whole time.
  • Abstraction

TODO: The 4-part effort (Paulo, Jim, Allen, Tim)

TODO: Tim to email a pointer to PLUNK

TODO: Cynthia/Stephan/UTEP/Hugo to respond to Tim's email about PML pointers

TODO: Tim to work on analyzing the collection

TODO: Examples of the benefits of Proof Theory: Paulo to write a first draft, Deborah to discuss the draft.

Disadvantages of PML: not usable.

Recap of why PML is awesome by Paulo

* Suggested Upper Merged Ontology (SUMO) 
* one thing that we do but others may not: scalability
* usability issues: too hard to use
* potential candidates for slants on "awesome issues" (paraphrasing Paulo):
- scalability concerns (justification on scalability of PML?)
- unified view - arguably a point of maturity
- sweet spot on representational level
- moving from usefulness to usability
- proof theoretic foundation

Analogy between provenance and previous work on ontologies: e.g. SUMO had 100 people over 10 years, with the mission of capturing all domain-independent concepts that everybody would agree on. Europe did better with DOLCE (developed by 2 or 3 people): DOLCE is more consistent and has less tension, giving better support for tools because it has no inconsistencies. We need concrete steps to move forward: start with a little understanding before using knowledge. What we are dealing with:

  • scalability (they do not have the instances that are creating new problems)
  • have gone through more cycles of development
  • moving from usefulness to usability (first need to show value, then need to make it usable)

what we need to improve:

  • documentation

Jim McCusker:

  • OPM was result of common denominator (they were starting with other work)
  • Tooling for PML is not there. (tools don't work for PML that a third party creates)
  • OPM has more design for usability


  • ask about definition for usability
  • usability impacted by the design of the technical solutions


  • usability in terms of the cost of adopting new technology


  • pure compliant PML
  • we should evaluate why PML is useful and make it usable
  • initially, OPM is easier to use; if you go on with OPM, you might have issues
  • PML's expressiveness, the notion of sourceUsage
  • keep proof theory hidden

Jim McCusker:

  • proof theory is grounded on logic
  • the world is not logical
  • proof theory is too obscure to be on the surface
  • RDF has theoretical foundations: graph theory


  • what part of proof theory do we ground PML on?
  • surface terms that connect to the foundations

prov-wg addressing new concepts:

  • assertions (which require an agent)

Review of prov-xg's Provenance Vocabulary Mappings: http://inference-web.org/wiki/Review_of_prov-xg%27s_Provenance_Vocabulary_Mappings

OPM diagram of the ontology: http://inference-web.org/w/images/b/b2/Opm.owl.rdf.manual.graffle.pdf

Jim being a conclusion of his parents. Jim: find a way for people to use the "proof theory" when they don't care about it (they're just trying to talk about the world).

TWO ORTHOGONAL ASPECTS: OPM handles derivation chains; PML-J handles derivation chains. FRBR handles concepts and how we transfer/obtain them; pml:Information and pmlj:NodeSet handle concepts and how to concretely transfer them. Continuum from concrete to abstract ( conclusions at the frbr:Item level - Jim from his parents. conclusions at the frbr:Expression level. derivat

PML in Action

Tim's scenario

http://inference-web.org/w/images/8/8c/2011-06.30-lebo-csv2rdf4lod-pml.pdf
One aspect to address later is PML <-> SPARQL (came up in one of Paulo's comments, in particular about OPeNDAP).
TODO: Alan and Tim will talk more about the converter.
Paulo: what is the connection between an RDF molecule and a named graph? Li: they are two different concepts.
1 concluded triple with 712 triples of provenance. Deborah: why so many triples of provenance? Tim: retrieving the data files (pcurl), the enhancement parameters, the invocation of the converter (metadata about the converter), and pvload to the triple store.
Nick met a similar problem with issue 1 (Roles) when he created a visualization pipeline.
SPARQL 1.1 has support for lists (csc: I asked Franz about this and the answer is AllegroGraph will have it by Fall).
TODO: Tim to pull tables from CI-Server, enhance them, and post the RDF version and its provenance back to CI-Server.

Greg's response regarding 1.1

Q: When would normal people be able to start adopting SPARQL 1.1?
A: I'd say whenever your software supports it. Obviously this is a challenge if you need your queries to be portable, but many of the big implementations already support big chunks of 1.1.
Q: Also, could you describe/point to how 1.1 handles lists? We're facing retarded PML...
A: Property paths can give you all the elements of a list, but not in order. Lists continue to be a total nightmare to deal with properly, and I'm afraid 1.1 doesn't really make much progress in that respect.
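Greg's point about lists can be made concrete. An RDF collection is a chain of nodes linked by rdf:first/rdf:rest, and a SPARQL 1.1 property path such as `rdf:rest*/rdf:first` reaches every member, but SPARQL does not promise the solutions come back in list order. A minimal sketch, assuming triples held as plain Python tuples (the `_:l1`... and `ex:node...` names are hypothetical), shows that recovering the order means walking the rest chain explicitly on the client side:

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# A SPARQL 1.1 property path that reaches every member, in no guaranteed order.
UNORDERED_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?item WHERE { ?list rdf:rest*/rdf:first ?item }
"""


def list_items(triples, head):
    """Walk an RDF collection starting at `head`, returning members in list order."""
    # Assumes a well-formed list: one rdf:first and one rdf:rest per list node.
    index = {(s, p): o for (s, p, o) in triples}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cyclic rdf:rest chain at {node}")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


# Hypothetical three-element list, stored as an unordered set of triples.
triples = {
    ("_:l2", RDF_FIRST, "ex:nodeB"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l3", RDF_REST, RDF_NIL),
    ("_:l1", RDF_FIRST, "ex:nodeA"),
    ("_:l3", RDF_FIRST, "ex:nodeC"),
    ("_:l2", RDF_REST, "_:l3"),
}

print(list_items(triples, "_:l1"))  # ['ex:nodeA', 'ex:nodeB', 'ex:nodeC']
```

The ordering information lives only in the rest chain, which is why a set of property-path solutions loses it.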

CSIRO - Stephan, presented by Patrick

  • usage: flow forecasting, to decide whether farmers are allowed to pull water from the river
  • they have sensors from different organizations, can't require the organizations to support provenance
  • data collected from other agencies, can be downloaded to CSIRO's servers
  • using sensors, determine water level
  • issue: users want to visualize the trace
    • domain dependent provenance visualization
    • farmers might come and look at the provenance; they want to see how a decision is made. We can't visualize domain-specific things: about the sensors, the scientists, the calibration of the sensors, the processing steps, who wrote the processing steps

these are not necessarily provenance, but domain-specific information

    • need to have configurable view of the provenance that lets view show domain information.
    • compare and combine tools like "probe it"
    • Jim Myers - they just list antecedents in one long ordered list.
    • collapsible portions of the provenance graph.
  • Nick: Provenance listeners associate themselves with wdo:Methods in a Semantic Abstract Workflow (SAW). ONLY the portions described in the SAW are shown.
  • Paulo: in ProbeIt, the context is: question, answers, workflow, graph, explanation, one step at a time, in combination with scientific workflows/devices (MRI)
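The configurable, collapsible view discussed above (show only the steps a domain cares about, as with the SAW-described portions in Nick's approach) can be sketched as a graph projection. This is a hypothetical illustration, not how ProbeIt or the SAW listeners work: it assumes a trace is a dict from step id to (method, antecedent step ids), and all step and method names are made up. Hidden steps are looked through, so each shown step links to the nearest shown steps upstream.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace format: step id -> (method name, ids of antecedent steps).
Trace = Dict[str, Tuple[str, List[str]]]


def project_view(trace: Trace, shown_methods: Set[str]) -> Dict[str, List[str]]:
    """Keep only steps whose method is in `shown_methods`; each kept step's
    antecedents are replaced by the nearest kept steps upstream."""

    def visible_sources(step: str, seen: Set[str]) -> Set[str]:
        found: Set[str] = set()
        for ante in trace[step][1]:
            if ante in seen:  # also guards against loops in the trace
                continue
            seen.add(ante)
            if trace[ante][0] in shown_methods:
                found.add(ante)
            else:
                # Hidden step: look through it to its own antecedents.
                found |= visible_sources(ante, seen)
        return found

    return {
        step: sorted(visible_sources(step, set()))
        for step, (method, _) in trace.items()
        if method in shown_methods
    }


# Hypothetical river-forecast trace; unit conversion is not domain-relevant.
trace: Trace = {
    "read": ("sensor_reading", []),
    "calib": ("calibrate", ["read"]),
    "units": ("convert_units", ["calib"]),
    "fcst": ("forecast", ["units"]),
}

view = project_view(trace, {"sensor_reading", "calibrate", "forecast"})
print(view)  # {'read': [], 'calib': ['read'], 'fcst': ['calib']}
```

Choosing `shown_methods` per audience (farmers vs. scientists) is one way to get different domain views from the same underlying provenance.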

To discuss: how to put things together, and what "ProbeIt" doesn't do

  • Deborah: a specification of composite NodeSets
  • Tim: you can model in PML with high granularity or in abstract level
  • Deborah: we have worked on "rewrite"
  • Tim: how to align the coarse-grained provenance with the fine-grained one?
  • Paulo: isExplanationOf to connect them

TODO: CSIRO has a list of provenance queries.
TODO: Composite NodeSet

  • abstracting a big ugly proof to create a new, smaller proof.
  • how does the "parallel" from the big ugly associate to the new smaller proof (how is it ENCODED in PML?; how does one RECOGNIZE that a big proof is being abstracted?)
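One hypothetical way to keep the "parallel" between the big proof and the smaller one is to collapse a group of nodes into a single composite node and record which nodes it stands for, much like the isExplanationOf link Paulo mentioned. This sketch is not PML's encoding: it assumes a proof is a dict from node id to antecedent ids, and the node names and the returned link record are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # hypothetical: node id -> antecedent node ids


def collapse(graph: Graph, group: Set[str], composite: str) -> Tuple[Graph, Dict[str, List[str]]]:
    """Replace the nodes in `group` with one `composite` node.

    Returns the smaller graph plus a record linking the composite node back to
    the nodes it abstracts (a stand-in for a link such as isExplanationOf)."""
    small: Graph = {}
    inputs: Set[str] = set()
    for node, antecedents in graph.items():
        if node in group:
            # Antecedents from outside the group become inputs of the composite.
            inputs.update(a for a in antecedents if a not in group)
        else:
            # Anything that consumed a group member now consumes the composite.
            small[node] = sorted({composite if a in group else a for a in antecedents})
    small[composite] = sorted(inputs)
    return small, {composite: sorted(group)}


# Hypothetical "big ugly" proof.
big: Graph = {
    "raw": [],
    "clean": ["raw"],
    "merge": ["clean", "ref"],
    "ref": [],
    "fit": ["merge"],
    "plot": ["fit"],
}

small, link = collapse(big, {"clean", "merge", "fit"}, "analysis")
print(small)  # {'raw': [], 'ref': [], 'plot': ['analysis'], 'analysis': ['raw', 'ref']}
print(link)   # {'analysis': ['clean', 'fit', 'merge']}
```

Keeping the link record separate from the smaller proof is what lets a tool recognize that the small proof is an abstraction of the big one.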

Jim Mc:

  • giving tools PML coming from an endpoint
  • giving a single document to a tool.

Q: can we corroborate feeds from nearby sensors (some you trust, some you don't; some have provenance, some don't)? Users want to consider different models of the data.
TODO: Cynthia, Paulo, Patrick, Stephan to look at proof combination for CSIRO


TODO: Tim to try out the "from document" capability in the IW Browser and report back to the group.
Ping's document: the encoding is not typed as PML; hasConclusion is not necessarily PML, it can be from someone else.
TODO: Cynthia to list the issues she sees with Tim's PML in the PML Validator list.

TODO: we need a version-controlled repository that collects the "goods" and the "bads" (e.g. Tim's ## and Ping's untyped Information).
TODO: Tim to resend the testing vocabularies: http://inference-web.org/wiki/PML_Validation_Service_Requirements#Reuse_existing_RDF_vocabularies_to_describe_test_results

NASA Goddard's use

Nick Del Rio: http://giovanni.gsfc.nasa.gov/aerostat/
Co-location algorithm: satellites sampling the same area at ALMOST the same time. NASA has its own provenance XML, but no schema and no use cases. Nick will walk the XML into PML. The NASA PI wants provenance next to the chart, so we need to determine what provenance is important. Trying to keep everything lightweight JavaScript. They also want provenance of the visualization process. The only use case: if the user finds an interesting image, he can copy and paste the image into some journal. NetCDF has its own ontology. What the provenance doesn't tell: the dimensions used to generate/select the image? VisKo: http://trust.utep.edu/visko/
Take away: Deborah, Stephan, Patrick and Tim (maybe) to hear what they care about in the provenance, i.e. the provenance they think they want.
Paulo: VisKo is for service composition, to simplify their life.
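To make "sampling the same area at ALMOST the same time" concrete, here is a brute-force co-location sketch. It is not the AeroStat/Giovanni algorithm: the `Obs` record, the satellite labels "A"/"B", and the 50 km / 30 minute tolerances are all hypothetical. It pairs observations whose great-circle (haversine) distance and time difference both fall within tolerance; the chosen tolerances are exactly the kind of parameter that provenance next to the chart would need to record.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0


@dataclass
class Obs:
    sat: str      # hypothetical satellite label
    lat: float    # degrees
    lon: float    # degrees
    time: datetime


def distance_km(a: Obs, b: Obs) -> float:
    """Great-circle (haversine) distance between two observations."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def colocate(xs: List[Obs], ys: List[Obs], max_km: float = 50.0,
             max_dt: timedelta = timedelta(minutes=30)) -> List[Tuple[Obs, Obs]]:
    """Pair observations that are close in both space and time (brute force)."""
    return [
        (x, y)
        for x in xs
        for y in ys
        if abs(x.time - y.time) <= max_dt and distance_km(x, y) <= max_km
    ]


# Hypothetical observations from two satellites.
t0 = datetime(2011, 6, 30, 12, 0)
sat_a = [Obs("A", 40.0, -74.0, t0), Obs("A", 10.0, 10.0, t0)]
sat_b = [Obs("B", 40.1, -74.1, t0 + timedelta(minutes=20)),
         Obs("B", 40.0, -74.0, t0 + timedelta(hours=3))]

pairs = colocate(sat_a, sat_b)
print(len(pairs))  # 1
```

A real system would index by space and time instead of comparing every pair, but the matching criterion is the same.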

PML Validator - Jiao

Jiao
http://inference-web.org/wiki/PML_Validation_Service_Requirements
http://tw.rpi.edu/web/project/SWaMP/WorkingGroup/PML_Validator
Validation has different types: Error, Warning, etc.
Paulo is concerned about the rule: NodeSet should assert isConsequentOf.
Stephan: the validator requirements embody how tools handle PML, not just the PML ontology.
Deborah: sometimes projects have specific requirements, e.g. naming conventions.
Stephan: let users add constraints. Jiao can model constraints using OWL constraints.
Paulo: concerns:

  • strange to have loops in NodeSets, but we need to consider distinctions among WHO derived them (tweak?: NodeSet loops ONLY within the scope of a particular person)
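The proposed tweak (look for NodeSet loops only within the scope of one person) and Stephan's suggestion to report a loop as something interesting rather than an error can be sketched together. This is a hypothetical illustration, not the PML validator: it assumes derivations arrive as (derived, antecedent, agent) tuples with made-up names, groups them per agent, and runs a depth-first cycle search on each agent's subgraph, emitting a Warning.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical edge: (derived NodeSet, antecedent NodeSet, agent who asserted it)
Edge = Tuple[str, str, str]


def find_cycle(graph: Dict[str, List[str]]) -> List[str]:
    """Return one cycle as a list of nodes, or [] if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = defaultdict(int)  # unseen nodes default to WHITE
    stack: List[str] = []

    def dfs(node: str) -> List[str]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GREY:  # back edge: the cycle is on the stack
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return []

    for start in list(graph):
        if color[start] == WHITE:
            found = dfs(start)
            if found:
                return found
    return []


def loop_warnings(edges: List[Edge]) -> List[str]:
    """Check for derivation loops separately within each agent's assertions."""
    by_agent: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for derived, antecedent, agent in edges:
        by_agent[agent][derived].append(antecedent)
    warnings = []
    for agent in sorted(by_agent):
        cycle = find_cycle(by_agent[agent])
        if cycle:
            warnings.append(f"Warning: {agent} derives in a loop: {' -> '.join(cycle)}")
    return warnings


edges = [
    ("A", "B", "alice"), ("B", "A", "bob"),    # loop only across agents: not flagged
    ("X", "Y", "carol"), ("Y", "X", "carol"),  # loop within one agent: flagged
]
print(loop_warnings(edges))  # ['Warning: carol derives in a loop: X -> Y -> X']
```

A self-loop such as Allen's "A derives A" from a single agent would be reported as `A -> A`.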

Allen: A derives A, so a loop isn't a proof-theoretic issue, it's an epistemic issue.
Paulo: but it's a way to assert an axiom.
Jim Myers: same with time ordering from different sources.
Paulo: a strength of PML is to capture knowledge and discuss inconsistencies.
Stephan: the loop may not become an error, but the validator will let the user know something interesting.
Jim: open world assumption.
Create an interface for implementing tests: https://scm.escience.rpi.edu/trac/ticket/502
Cynthia: do we want to validate the range of a property? There are many issues related to the values of properties.
Jitin checks the syntax and inconsistencies of a new small file against an entire collection of files. Jiao's validator checks ICs (applying the closed world assumption) for only a single file (not against the entire collection).
Li: on one hand, we have proof theory, the syntax and semantics; on the other hand, there are tools consuming PML, and users may like to set constraints when using these tools. Jiao's tool can help users understand why the data is not loaded and visualized.
Manifestation has Item (Item a file on disk) download file derived from
TODO: Tim to give pointers of PML to everyone. Everyone to check that the PML they care about is referenced; in particular, Cynthia, Stephan, and UTEP should look carefully.
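The "interface for implementing tests" idea, combined with severities (Error, Warning) and user-added constraints, might look like the following sketch. It is hypothetical and not the design of ticket 502 or Jiao's validator: documents are sets of plain string triples, the `ValidationTest` and `Finding` types are invented, and each check applies a closed-world reading within a single file, as Jiao's validator does. The built-in rule is the "NodeSet should assert isConsequentOf" rule Paulo raised, reported only as a Warning; the user-added rule is a made-up project naming convention of the kind Deborah described.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Finding:
    severity: str  # "Error" or "Warning"
    message: str


@dataclass
class ValidationTest:
    name: str
    severity: str
    check: Callable[[Set[Triple]], List[str]]  # returns offending subjects


def nodesets(doc: Set[Triple]) -> Set[str]:
    return {s for (s, p, o) in doc if p == "rdf:type" and o == "NodeSet"}


def missing_consequent(doc: Set[Triple]) -> List[str]:
    # Closed world within this one file: an absent triple counts as missing.
    asserted = {s for (s, p, _) in doc if p == "isConsequentOf"}
    return sorted(nodesets(doc) - asserted)


def project_naming(prefix: str) -> ValidationTest:
    """Hypothetical user-added constraint: NodeSets must use a project prefix."""
    def check(doc: Set[Triple]) -> List[str]:
        return sorted(n for n in nodesets(doc) if not n.startswith(prefix))
    return ValidationTest(f"NodeSet not named under {prefix}", "Error", check)


def validate(doc: Set[Triple], tests: List[ValidationTest]) -> List[Finding]:
    findings = []
    for test in tests:
        for subject in test.check(doc):
            findings.append(Finding(test.severity, f"{test.name} ({subject})"))
    return findings


BUILT_IN = [ValidationTest("NodeSet should assert isConsequentOf", "Warning", missing_consequent)]

# Hypothetical single-file document.
doc = {
    ("ex:ns1", "rdf:type", "NodeSet"),
    ("ex:ns1", "isConsequentOf", "ex:step1"),
    ("other:ns2", "rdf:type", "NodeSet"),
}

findings = validate(doc, BUILT_IN + [project_naming("ex:")])
for f in findings:
    print(f.severity, f.message)
```

Checking a new file against a whole collection, as Jitin's approach does, would mean passing the union of all documents to the same tests.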
