PML 3.0

From Inference Web


The Provenance Markup Language (PML 3.0) is being designed as a "drop-in" extension to W3C's PROV-O.

This extension will include best-of-breed concepts from PML 2.0 that will be reshaped to suit the PROV-O paradigm.




Everybody in this project should know how these are being used (and should use them).

  • Mapping PML 2.0 to PML 3.0 is Tim's Google spreadsheet that gives an overview of how PML 2.0 terms will be categorized. (more info)
    • Save The (PML 2.0) Whales wiki page that Cynthia and Paulo should use to argue to get a PML 2.0 concept into PML 3.0.
      • After acceptable resolution on this page (via discussion), Tim will update his spreadsheet.



PROV-WG operates on 2-week cycles for document reviews. So if we announce for comment on day 1, reviewers have one week to respond and editors have one week to incorporate feedback into the document.

  • August 16: PML 3.0 currently has Member Submission status.
  • September 13: Announce for comment W3C-style HTML document about "Low hanging fruit" PML 3.0 mappings.
  • November 01: PROV-WG votes to approve all Notes.
  • November 02: Consider a PML API for Java
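As a rough sketch of the two-week review cycle described above, the reviewer and editor deadlines can be computed from the announcement date. The year used here is an assumption, since the milestone list gives only month and day, and `review_cycle` is a hypothetical helper name.

```python
from datetime import date, timedelta


def review_cycle(announce_day):
    """Split a two-week cycle into a reviewer week and an editor week."""
    return {
        "announce": announce_day,
        # Reviewers have one week to respond.
        "reviews_due": announce_day + timedelta(weeks=1),
        # Editors have the following week to incorporate feedback.
        "edits_due": announce_day + timedelta(weeks=2),
    }


# Assumed year; only "September 13" comes from the milestone list.
cycle = review_cycle(date(2012, 9, 13))
print(cycle["reviews_due"].isoformat())
print(cycle["edits_due"].isoformat())
```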

Comparing PML 2.0 to PROV-O

The W3C Provenance Working Group has completed its design for a data model of provenance interchange. The group's data model reflects the consensus of many researchers and practitioners, each of whom has leveraged their experience and knowledge to contribute to the final design. The result provides a simple, core model that many systems can use to interoperate despite any differences in their native provenance needs, perspectives, or implementation.

This milestone from the W3C provides a good opportunity to reflect upon existing provenance models and interlingua. If we take stock of their benefits and shortcomings in the variety of their uses throughout their lifetime, we can look forward to how the existing models can contribute to the community that adopts W3C's recommendation.

The Proof Markup Language has provided a provenance interlingua for more than a decade, and its most recent revision (PML 2.0) is about half that age. It has been applied in many domains, where it has enabled a range of functionality. PML 2.0 has been used to enable transparency in Linking Open Government Data aggregation, it enabled @@CIRO, it did in @@MONALOA, it @@NLP, @@justification.

This section provides an overview of the results of the individual analyses performed.

Things that PML 2.0 can offer PROV

  • Elaborate prov:Entity using the subclasses of pmlp:Source.
  • Elaborate prov:Plan using the subclasses of pmlp:InferenceRule (and incorporating Language).
  • pmlp:isMemberOf domain prov:Agent; range prov:Organization; owl:inverseOf foaf:member .

e.g. SPARQL bindings:

[ a pmlj:Mapping; 
  pmlj:mapFrom "NodeSet"; 
  pmlj:mapTo "" ;
] .

prov:qualifiedQuotation [ 
  a prov:Quotation, pml3:QuotationOfDocumentFragment;
  prov:entity <> ;
  pmlp:hasToCol   75 ;
  pmlp:hasToRow   63 ;
  pmlp:hasFromCol 1 ;
  pmlp:hasFromRow 62 ; # pmlp:DocumentFragmentByRowCol, pmlp:DocumentFragment
] .
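To make the row/column quotation above concrete, here is a minimal sketch of how a fragment could be cut out of a text document. It assumes that rows and columns are 1-based and inclusive (as in a text editor), which the mapping does not state. The function name and the sample document are hypothetical.

```python
def fragment_by_row_col(text, from_row, from_col, to_row, to_col):
    """Return the span from (from_row, from_col) through (to_row, to_col).

    Assumes 1-based, inclusive rows and columns.
    """
    lines = text.splitlines()
    selected = lines[from_row - 1:to_row]
    if not selected:
        return ""
    if len(selected) == 1:
        return selected[0][from_col - 1:to_col]
    selected[0] = selected[0][from_col - 1:]   # trim the start of the first row
    selected[-1] = selected[-1][:to_col]       # trim the end of the last row
    return "\n".join(selected)


# Hypothetical document: 70 numbered lines, each padded to 80 characters.
document = "\n".join(f"{n:02d}".ljust(80, ".") for n in range(1, 71))

# Same values as pmlp:hasFromRow 62, hasFromCol 1, hasToRow 63, hasToCol 75.
quote = fragment_by_row_col(document, from_row=62, from_col=1, to_row=63, to_col=75)
print(quote)
```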

Provenance as Plans

When I am sitting at home and I want to eat spaghetti -- but my kitchen cupboards are empty, I need to make a plan to get a plate of hot food on my dining table.

When I make a plan, I think of an end state that reflects my goal. In that end state, I have a plate of hot food sitting on my dining room table. The food is steaming, the noodles are wet, the sauce is red, and there are slivers of green peppers sprinkled throughout.

As I think about that plate of mouth-watering food, I realize that I probably have a half-empty box of noodles on my counter, along with a cutting board and dirty knife next to a pile of green pepper seeds and stems. As I scan the rest of my kitchen, I realize that there is a cloudy pool of water in a pot on my stove, and in the trash can there is a sales receipt from the grocery store with tomorrow's date on it.

If I ask myself how the mess in the kitchen came to be, I would remember back to the sound of jingling keys, the creaking of my front door opening, and the rustling of two plastic grocery bags full of an unopened box of noodles, some whole green peppers, a jar of tomato sauce, and a sales receipt with tomorrow's date on it.

If I ask myself how I came to have those two rustling plastic bags, I would remember back to the beeps and chatter at the grocery store just twenty minutes before. After I watch the teenager ring me up and print me a receipt with tomorrow's date on it, I walk out to my car and drive off.

An illustration of what I just described, with a sketch of how PROV-O can be used to intuitively describe plans (even -- especially -- when it is phrased in the past tense):

While I'm thinking about these situations, I could start jotting them down:

:when_I_have_a_plate_of_hot_spaghetti {

      a dining:Food;
      dcterms:hasPart :noodle_1, :noodle_2;         # And probably a few more...
      science:temperature_in_F "82.3"^^xsd:decimal; # At this moment, at least...

      science:quantity_in_grams "110"^^xsd:decimal; # Before I dig in, anyway...
      science:temperature_in_F "92.6"^^xsd:decimal; # At this moment, at least...
      dcterms:hasPart :tomato_sludge, :pepper_sliver_1, :pepper_sliver_2; # And probably a few more...

   :tomato_sludge   a dining:Food, <> .
   :pepper_sliver_1 a dining:Food, <> .
   :pepper_sliver_2 a dining:Food, <> .

      a dining:Supply, dining:Opened;
      prov:atLocation :kitchen-counter;

      a dining:Supply, dining:Opened;
      prov:atLocation :kitchen-counter;

   :cutting_board
      a dining:Dirty, dining:CuttingBoard;
      science:height_in_inches "18.4"^^xsd:decimal;
      science:width_in_inches  "24.8"^^xsd:decimal;

      a dining:Waste;
      prov:atLocation :cutting_board;

} a prov:Bundle, prov:Plan; prov:generatedAtTime "Tuesday 12 noon";
  prov:wasDerivedFrom :me_wanting_spaghetti;

:me_wanting_spaghetti {
   :plate
      a dining:Utensil;
      prov:atLocation :dining_room_table;

      a dining:Food;
      prov:atLocation :plate;
      dcterms:hasPart :noodles, :sauce;
} a prov:Bundle, :Goal;
  prov:generatedAtTime "Monday 12 noon";
  prov:wasAttributedTo :me;
  pml:hadOwner :me;

:me
   a prov:Agent;
   foaf:name "Tim";


:when_I_am_walking_through_the_door_with_groceries {
      a dining:Supply, dining:Opened;
      prov:wasDerivedFrom :purchased_box_of_noodles;

   :receipt
      a dining:Receipt;
      prov:atLocation :trash_can;
   :sauce # This is mentioned above in :when_I_have_a_plate_of_hot_spaghetti
      a dining:Food;
      prov:wasDerivedFrom :open_jar_of_sauce;
      a dining:Supply;
} a prov:Bundle, prov:Plan; prov:generatedAtTime "Tuesday 1pm";
  prov:wasDerivedFrom :when_I_have_a_plate_of_hot_spaghetti;


:when_I_am_at_the_grocery_store {
   :bag_1
      a dining:Utensil;

   :bag_2
      a dining:Utensil;

   :receipt # mentioned above in :when_I_am_walking_through_the_door_with_groceries 
            # and :when_I_have_a_plate_of_hot_spaghetti
      a dining:Receipt;
      prov:atLocation :bag_1;

   :purchased_box_of_noodles # mentioned above in :when_I_am_walking_through_the_door_with_groceries
      a dining:Closed, dining:Supply;
      prov:atLocation :bag_1;

   :purchased_jar_of_sauce   # mentioned above in :when_I_am_walking_through_the_door_with_groceries
      a dining:Closed, dining:Supply;
      prov:atLocation :bag_2;

} a prov:Bundle, prov:Plan; prov:generatedAtTime "Tuesday 2pm";
  prov:wasDerivedFrom :when_I_am_walking_through_the_door_with_groceries;
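The bundles above are linked only by prov:wasDerivedFrom, so the plan can be read back out by following those links. The sketch below is a plain-Python illustration that copies the bundle names from the listing and uses no RDF library; `trace_to_goal` is a hypothetical helper. Starting from the last bundle that was thought of, it lists the steps in the order they would be carried out and ends at the goal.

```python
# Bundle-level prov:wasDerivedFrom links, copied from the sketch above.
WAS_DERIVED_FROM = {
    ":when_I_am_at_the_grocery_store":
        ":when_I_am_walking_through_the_door_with_groceries",
    ":when_I_am_walking_through_the_door_with_groceries":
        ":when_I_have_a_plate_of_hot_spaghetti",
    ":when_I_have_a_plate_of_hot_spaghetti":
        ":me_wanting_spaghetti",
}


def trace_to_goal(bundle):
    """Follow prov:wasDerivedFrom from a bundle back to the goal it serves."""
    path = [bundle]
    while path[-1] in WAS_DERIVED_FROM:
        parent = WAS_DERIVED_FROM[path[-1]]
        if parent in path:
            raise ValueError("derivation cycle at " + parent)
        path.append(parent)
    return path


path = trace_to_goal(":when_I_am_at_the_grocery_store")
steps_to_carry_out = path[:-1]  # everything except the goal itself
goal = path[-1]

for step in steps_to_carry_out:
    print(step)
print("goal:", goal)
```

Because each bundle was derived from the one imagined just before it, walking "toward the past" of the planning yields the steps in the order they would actually happen.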

Things in PML 2.0 that we should leave behind (because they are captured in PROV)

  • pmlp:Agent (prov:Agent)
    • pmlp:Person (prov:Person)
    • pmlp:Software (prov:SoftwareAgent when running, and prov:Plan when not)
    • pmlp:Organization (prov:Organization)
  • pmlp:Source (prov:Entity -- removing the imposed contextualization will be refreshing. pmlp:Source should be pmlp:PotentialSource (which is silly), and the range of prov:hadSource should be prov:Source, and prov:Source's subclasses should be taken away from it.)
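The replacements listed above could be applied mechanically, as in the following sketch. The table copies the pairs from the list. The `running` flag is an assumption introduced here to capture the note that pmlp:Software maps to prov:SoftwareAgent when running and to prov:Plan otherwise, and `prov_class_for` is a hypothetical helper name.

```python
# Direct class replacements taken from the list above.
PML2_TO_PROV = {
    "pmlp:Agent": "prov:Agent",
    "pmlp:Person": "prov:Person",
    "pmlp:Organization": "prov:Organization",
    "pmlp:Source": "prov:Entity",
}


def prov_class_for(pml_class, running=None):
    """Return the PROV class that replaces a PML 2.0 class.

    pmlp:Software has no single replacement: the caller must say whether
    the software is running (an agent) or not (a plan).
    """
    if pml_class == "pmlp:Software":
        if running is None:
            raise ValueError("pmlp:Software needs running=True or running=False")
        return "prov:SoftwareAgent" if running else "prov:Plan"
    return PML2_TO_PROV[pml_class]


print(prov_class_for("pmlp:Person"))
print(prov_class_for("pmlp:Software", running=True))
print(prov_class_for("pmlp:Software", running=False))
```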

Things in PROV that PML 2.0 does not have

  • prov:Involvement - the description between an Activity/InferenceStep and how it relates to an antecedent or conclusion.
  • Multiple prov:wasGeneratedBy|s (aka pmlj:hasConclusion|s)
  • hadPlan mapping highlights the collision of Activity and Involvement; PML doesn't have Involvement.
  • Dictionary is a type of "pmlp:Source" (which should be called prov:Entity).

We're using the property <> in the PML 2.0 mappings of PROV where PML 2.0 does not have a reasonable analogue.

Things that are candidates for PML 3.0

  • Note that this will include constructs that are in PML 2.0 but not in PROV.
  • This will also include constructs that are new to both PML and PROV but are driven by current work.
  • One example is rationale, which USGS requested as a result of our SemantAQUA work.


  • PML 2.0 seems to have an imposed two-layer specialization hierarchy, where the NodeSet is a prov:specializationOf the Information's use. (We've said this before, but now that PROV has established specializationOf, it's worth restating).
  • Entity is much better than Information (b/c it's more general).
  • Note that the parentheses in the PML mappings are TERRIBLE to work with b/c they reduce to rdf:Lists.
  • Note that bnodes [] in the PML mappings are a sign of proliferation.
  • The "third-party" annotation approach of PML is dissonant w.r.t. PROV's "uniform flow" from present to past (both in unqualified and qualified relations). The "direction of the triples" consistently "points to the past".
  • Odd that pmlp:Website exists but not a Webpage.
  • figure 7 (9B) page 17 shows ordering between two uses. This did not make it to PROV.

Possible PML 3.0 Modules

    • Proposed Modules:
      • PML-base
      • PML-proof (PROV-EN?)
      • PML-IR (PML for Information Resources)
      • PML-BIO (PML for biomedical provenance)

Information Extraction

2) support for information extraction and natural language processing (UIMA stuff)

* the main point here is that PML3 would have special treatment for entities that are “sentences in natural language”

From (1) and (2), we see that PML3 has support for entities that are “sentences”

pml3:Sentence as a subclass of prov:Entity (and would be related to prov:Quotation)

(do NOT have pml3:EnglishSentence; instead use hasLanguage)

E.g.: X is prov:Entity; X pml3:hasLanguage “English”

pml3:LogicalSentence as a subclass of pml3:Sentence

E.g.: Y is pml3:LogicalSentence; Y hasLanguage RDF

Y prov:wasDerivedFrom X (since we parsed the string into a grammar tree)

(plus English stuff that is not a pml3:Sentence)

For information extraction itself:

  • getting a span of text from a narrative
  • getting an html fragment from a web page that requires some input information to return the expected result
  • Grabbing the 4th row of a csv
  • query results
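As an illustration of one of the cases above ("grabbing the 4th row of a csv"), here is a minimal sketch that extracts the row and records where it came from as plain (subject, predicate, object) tuples. The identifiers :row_4 and :table and the sample data are hypothetical. Counting rows from 1 over physical records is an assumption. Reusing pmlp:hasFromRow and pmlp:hasToRow for this purpose is only a suggestion.

```python
import csv
import io


def extract_row(csv_text, row_number):
    """Return the row_number-th record, counting from 1."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[row_number - 1]


# Hypothetical source table with no header row.
source = "a,1\nb,2\nc,3\nd,4\ne,5\n"
row = extract_row(source, 4)

# Provenance of the extracted row, written as simple tuples.
provenance = [
    (":row_4", "rdf:type", "prov:Entity"),
    (":row_4", "prov:wasQuotedFrom", ":table"),
    (":row_4", "pmlp:hasFromRow", 4),
    (":row_4", "pmlp:hasToRow", 4),
]

print(row)
for triple in provenance:
    print(triple)
```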


Natural Language Processing is a special form of Information Extraction (and not one that most of us do regularly).

Question and Answer

(mentioned by Paulo)

Proof Verification

1) support for formal proofs (TPTP stuff)

* support for automated reasoning
* support for natural deduction-like proofs
* full support for entities that are “logical sentences”

Logical sentences may have variables, bindings, etc.

Formal proofs come with the notion of proof verification

1.1) Proof verification:

What is a valid statement?

What is a true derivation? And this again comes from valid axioms (ground assertions in a proof). It also comes from the notion of “sound” and “unsound” plans. Sound plans preserve truth. Unsound plans don’t.

We will NOT verify PROV. Instead, we will provide some extensions of PROV that -- if used -- allow anyone to verify PML-3 proofs.

The extensions that we need to add:

?? pml3:used ??

 a prov:Activity, pml3:ModusPonens;  # A successful completion of applying pml3:ModusPonens
 prov:used :a,
           :b;
 prov:generated :c;
 prov:wasAssociatedWith :your-favorite-reasoner-running; # Software did it.

  a prov:Activity;  # Paulo’s desire assertions.
  pml3:used :a,    # pml3:used means that a and b are the only entities
                    :b;    # required to generate c, and that c was derived from a and b ONLY.
  prov:generated :c;

# part of pml-3-proof (not low-hanging-fruit)
pml3:wasDerivedFrom subpropertyOf [ pml3:wasGeneratedBy o pml3:used ] .

pml3:ModusPonens
   rdfs:comment "This is referenced via a prov:Activity multi-typing and the same Activity's Association's prov:hadPlan";
   a owl:Class, prov:Plan;
   owl:equivalentClass [
     pml3:used exactly 2 prov:Entity and
     pml3:generated exactly 1 pml3:ModusPonensConclusion ] .
pml3:ModusPonensConclusion owl:equivalentClass [prov:Entity and pml3:wasDerivedFrom exactly 2 prov:Entity].

:c prov:wasGeneratedBy :act . # This is inferred from the inverse of prov:generated .

:a a prov:Entity .
:b a prov:Entity .

:c
   a prov:Entity;
   prov:wasDerivedFrom :a,
                       :b; # How do we know that c was derived only from a and b?
                           # If we bring the plan in, and the plan specifies only a and b as prov:used.

?? pml3:wasDerivedFrom ??

?? pml3:wasGeneratedBy ??
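A rough sketch of the check that these extensions are meant to enable, written over plain tuples rather than an RDF store. It makes several assumptions that are not settled above: the activity is named :act (as in the inferred triple), prov:generated is used for the conclusion, and the property chain is read in the computable direction (prov:wasGeneratedBy followed by pml3:used implies pml3:wasDerivedFrom). It checks only the cardinalities named in the class definition, not the logical form of the sentences.

```python
TRIPLES = {
    (":act", "rdf:type", "prov:Activity"),
    (":act", "rdf:type", "pml3:ModusPonens"),
    (":act", "pml3:used", ":a"),
    (":act", "pml3:used", ":b"),
    (":act", "prov:generated", ":c"),
}


def objects(triples, subject, predicate):
    """All objects of (subject, predicate, ?) in the triple set."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def infer(triples):
    """Add inverse and chained triples to a copy of the input set."""
    closed = set(triples)
    # prov:wasGeneratedBy is the inverse of prov:generated.
    for s, p, o in triples:
        if p == "prov:generated":
            closed.add((o, "prov:wasGeneratedBy", s))
    # Chain: wasGeneratedBy followed by pml3:used gives pml3:wasDerivedFrom.
    for entity, p, activity in list(closed):
        if p == "prov:wasGeneratedBy":
            for premise in objects(closed, activity, "pml3:used"):
                closed.add((entity, "pml3:wasDerivedFrom", premise))
    return closed


def is_valid_modus_ponens(triples, activity):
    """Exactly two premises, one conclusion, derived from those premises only."""
    closed = infer(triples)
    premises = objects(closed, activity, "pml3:used")
    conclusions = objects(closed, activity, "prov:generated")
    if len(premises) != 2 or len(conclusions) != 1:
        return False
    (conclusion,) = conclusions
    return objects(closed, conclusion, "pml3:wasDerivedFrom") == premises


print(is_valid_modus_ponens(TRIPLES, ":act"))
```

An activity that only asserts prov:used (without pml3:used) would fail this check, which matches the intent that pml3:used names the complete set of premises.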

Task assignments



  • examples analysis (PML 2.0's partitioning based on PROV examples)
  • frequency analysis of PML 2.0
  • TODO: 72 TODOs from 102 terms (Tim and Jim)
  • TODO: Tim histogram of PROV vs. histogram of PML-P - gives us strength of correspondence. ( 72 activities vs. 72 infsteps)
  • TODO: Tim histogram of PML pure examples vs. PML-ified PROV histogram
  • TODO: Tim and Jim do the obvious mappings (Document subclass Entity) directly into owl on github
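The histogram comparisons in the TODOs above could start from something as small as the sketch below. It counts how often each class is instantiated and lines the counts up for candidate correspondences. The instance data is hypothetical, and the two candidate pairs are only those already suggested elsewhere on this page (InferenceStep with Activity, Source with Entity).

```python
from collections import Counter


def type_histogram(triples):
    """Count how often each class appears as the object of rdf:type."""
    return Counter(o for _, p, o in triples if p == "rdf:type")


# Hypothetical instance data; real counts would come from the example collections.
pml_triples = [
    (":step1", "rdf:type", "pmlj:InferenceStep"),
    (":step2", "rdf:type", "pmlj:InferenceStep"),
    (":doc1", "rdf:type", "pmlp:Source"),
]
prov_triples = [
    (":act1", "rdf:type", "prov:Activity"),
    (":act2", "rdf:type", "prov:Activity"),
    (":e1", "rdf:type", "prov:Entity"),
]

# Candidate correspondences to compare side by side.
CANDIDATES = [
    ("pmlj:InferenceStep", "prov:Activity"),
    ("pmlp:Source", "prov:Entity"),
]

pml_counts = type_histogram(pml_triples)
prov_counts = type_histogram(prov_triples)
comparison = [
    (pml, pml_counts[pml], prov, prov_counts[prov]) for pml, prov in CANDIDATES
]

for pml, n_pml, prov, n_prov in comparison:
    print(f"{pml}: {n_pml}  vs.  {prov}: {n_prov}")
```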

follow on (after histograms): enumerate benefits from personal retrospective, based on observing the histogram comparisons.

Create PROV-O-like figures for PML 3.0

  • put figure into PML 3.0 html

Dust off plunk

resourceForm and literalForm annotation properties (e.g. dcterms:description pml3:resourceForm pml3:hadDescription and prov:value pml3:resourceForm prov:specializationOf)

Tim instance analysis object histogram for hasLanguage and hasFormat. [1]

  • span should be from DEFINITIONS \union INSTANCE data (to include those defined but not found)


ADOPTED hasDataCollectionEndDateTime (currently a subprop)

New for PML 3.0 web page.

  • consolidate to just SURF, access rdflib graph from that (check Jim's code)

Source timrdf/docuspeakr from alvaro/docuspeakr.

Review plunk's processing flow, md5 of data files, track from source, categorize sources (project, producing organization, application type)

  • situate .sd_name in directories
  • fire up virtuoso on localhost
  • version control analysis for distribution
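For the md5 step mentioned above, a minimal standard-library sketch follows. It is not taken from plunk; the function names are hypothetical, and the temporary directory with a single sample file only stands in for a real data directory.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file under root (by relative path) to its md5 hex digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): md5_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


# Stand-in for a directory of data files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.ttl").write_bytes(b":a a :B .\n")
    checksums = checksum_tree(tmp)

print(checksums)
```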

Add d3.js to PML 3.0 HTML - query for usage information from endpoint.

Add new egs collection in prov-wg hg

  • apply the "create file for every term" and commit
  • include from PML 3.0 page


Check out Olaf's PROV-O mapping.

Add "publisher" to the base ont. as association, agent, and restrictions.


Examples of how he uses Query.
