Proof Markup Language (PML) Primer

Working Draft 04 October 2007

This version:
          https://inference-web.org/2007/primer/ 
Previous version:
          http://iw.stanford.edu/2005/wd-pml-primer/ 

Editors:
          Deborah L. McGuinness (Stanford University, Rensselaer Polytechnic Institute)
          Paulo Pinheiro da Silva (Stanford University, The University of Texas at El Paso)
          Li Ding (Stanford University, Rensselaer Polytechnic Institute)

Copyright ©2007 Inference Web Group.


Abstract
This document provides a brief introduction to the Proof Markup Language (PML), an interlingua for representing and sharing explanations generated by various intelligent systems such as hybrid web-based question answering systems, text analytic components, theorem provers, task processors, web services, rule engines, and machine learning components. The interlingua is split into three modules (provenance, justification, and trust relations) to reduce maintenance and reuse costs. We introduce the ontologies and the use of the Java API through examples drawn from a question answering and information-extraction scenario. We identify four types of justifications, namely unproved goals, direct assertions, assumptions, and derived conclusions, and show how PML supports each of them.

Contents

Introduction

More user decision processes are relying, in part or in whole, on results obtained from automated intelligent systems. When users obtain results from intelligent systems, they need to decide when and how to act on them. As part of that decision process, users increasingly request explanations of how the results were obtained and what the results depended on; this demand is known as transparency. There are many enabling factors to explanation, including representation of explanation metadata, presentation of explanations to human users as well as machine consumers, data management of explanation metadata access, and computation over explanation metadata such as abstraction, or propagation and aggregation of trust. We have identified three dominant aspects of explanation representation, namely provenance, justifications, and trust relations, and we designed our interlingua accordingly by splitting it into three ontology modules. This modular design can support applications that desire only provenance representation (e.g., information and sources), potentially with later expansion to the representation of information manipulation steps (e.g., logic justifications and function calls), and eventually expansion to the trust representation of information sources.

Using PML to Explain a Hybrid Intelligent Scenario

In order to illustrate the representational power of PML, we introduce a simple scenario that uses an intelligent agent to answer a user's question and to incrementally explain how the answer was determined using various assertion and derivation techniques on information coming from different sources.

The Scenario: Question, Query and Answer

Consider a simple scenario where John consults an online tour guide agent Agent X about the specialty of a local restaurant called Tonys. The agent is equipped with the JTP hybrid reasoner to support question answering and an information extractor EX to support extraction of external knowledge from Web pages. The conversation between John and Agent X is essentially composed of two sentences in English:
                John: What type of food is Tonys' specialty? 
                X: Tonys' specialty is Shellfish. 
For internal representation of sentences, Agent X uses the KIF language. Therefore, the sentences above are internally represented as follows:
                Question:       (type TonysSpecialty ?x) 
                Answer:         (type TonysSpecialty ShellFish) 
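Since the agent's internal sentences are KIF s-expressions, their nested structure can be recovered with a few lines of code. The sketch below is illustrative only (it ignores KIF strings, comments, and quoting) and is not part of PML or of Agent X:

```python
# Minimal s-expression reader for KIF-style sentences (illustrative only;
# a real KIF parser must also handle strings, comments, and quoting).
def parse_kif(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            expr, pos = [], pos + 1
            while tokens[pos] != ")":
                sub, pos = read(pos)
                expr.append(sub)
            return expr, pos + 1
        return tokens[pos], pos + 1

    expr, _ = read(0)
    return expr

query = parse_kif("(type TonysSpecialty ?x)")
answer = parse_kif("(type TonysSpecialty ShellFish)")

# A variable is any token starting with '?'; aligning the query with the
# answer (both flat here) yields the binding ?x -> ShellFish.
binding = {q: a for q, a in zip(query, answer)
           if isinstance(q, str) and q.startswith("?")}
```
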

John initially asks simple (essentially look-up) questions, but later he may want to ask how the answer was derived, what knowledge was used to support the derivation, and related questions such as what wines are suggested as a pairing with the restaurant's specialty, what wines are available locally, and so on.

The agent answers questions using its embedded inference engine over its knowledge (potentially hand-entered or generated by information extraction processes) and has the option of providing explanations about its question answering processes. We show portions of those processes in the following sections.

Provenance: Encoding Information, Sources and Other Provenance Elements

A provenance element represents an information unit describing the origin of a PML concept. For example, a Language represents the language in which a character string is written, and an Inference Engine represents the engine used to produce the justification for a conclusion.

In order to generate explanations, Agent X needs to first encode the questions and answers and make them referenceable. Using PML-P, users can capture the content of information as a string using the property hasRawString, and optionally annotate the processing and presentation instructions using PML-P properties such as hasLanguage, hasFormat, and hasPrettyNameMappingList. The example below shows that the content of information is encoded in the KIF language.

  <pmlp:Information>
    <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
    <pmlp:hasLanguage rdf:resource="#KIF"/>
  </pmlp:Information>
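Applications without access to the PML Java API can emit such instances with ordinary XML tooling. The sketch below builds an Information instance like the one above using Python's standard library; the pmlp namespace URI and the KIF registry URL are assumptions, so substitute the identifiers declared by the actual PML-P ontology:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs -- replace with those declared by the PML-P ontology.
PMLP = "https://inference-web.org/2.0/pml-provenance.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

ET.register_namespace("pmlp", PMLP)
ET.register_namespace("rdf", RDF)

# Build <pmlp:Information> with hasRawString and hasLanguage, as in the
# example above. The KIF registry URL is also an assumption.
info = ET.Element(f"{{{PMLP}}}Information")
raw = ET.SubElement(info, f"{{{PMLP}}}hasRawString",
                    {f"{{{RDF}}}datatype": XSD + "string"})
raw.text = "(type TonysSpecialty SHELLFISH)"
ET.SubElement(info, f"{{{PMLP}}}hasLanguage",
              {f"{{{RDF}}}resource":
               "https://inference-web.org/registry/LG/KIF.owl#KIF"})

xml_text = ET.tostring(info, encoding="unicode")
```
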

The Language KIF is encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlp:Language rdf:about="#KIF">
    <pmlp:hasName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Knowledge Interchange Format (KIF)</pmlp:hasName>
    <pmlp:hasURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://logic.stanford.edu/kif/kif.html</pmlp:hasURL>
  </pmlp:Language>

Besides directly embedding the content within the instance of Information, PML-P allows users to reference information whose content is stored elsewhere. The external content can be referenced by the value of hasURL, i.e., the information content can be obtained from an online document. The example below shows that the background knowledge content was actually obtained from an online document specified by hasURL, and that the content is written in KIF language.

  <pmlp:Information>
    <pmlp:hasURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://iw.stanford.edu/ksl/registry/storage/documents/tonys_fact.kif</pmlp:hasURL>
    <pmlp:hasLanguage rdf:resource="#KIF"/>
  </pmlp:Information>

In addition to annotating information, Agent X needs to annotate the sources it used. The following example defines an instance of Document (browse it in IWBrowser, sample program code using PML Java API) whose content is the information seen in the above example. Building on this representation, the Inference Web Registry provides a public repository where users can pre-register metadata about sources, increasing metadata reuse.

  <pmlp:Document>
    <pmlp:hasURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://iw.stanford.edu/ksl/registry/storage/documents/tonys_fact.kif</pmlp:hasURL>
  </pmlp:Document>

PML-P supports references at the document level or at finer grained levels such as spans of text. The example below uses the DocumentFragmentByOffset concept to annotate the location of a span of text in the original document. With this representation, an application can highlight the corresponding span of text in a raw source document.

  <pmlp:DocumentFragmentByOffset>
    <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
    <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
  </pmlp:DocumentFragmentByOffset>
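To illustrate how an application would act on such a fragment, the sketch below recovers the span of text named by a from/to byte-offset pair; the document content used here is a stand-in, not the actual tonys_fact.kif file:

```python
# Sketch: given raw document content and a DocumentFragmentByOffset-style
# byte range, recover the span of text an application would highlight.
def fragment(content: bytes, from_offset: int, to_offset: int) -> str:
    # PML offsets are positions into the raw source document.
    return content[from_offset:to_offset].decode("utf-8")

# Stand-in document: 58 bytes of padding, then the asserted sentence.
assertion = b"(type TonysSpecialty SHELLFISH)"
doc = b"X" * 58 + assertion + b" trailing text"

span = fragment(doc, 58, 58 + len(assertion))
```
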

Also, PML includes the notion of SourceUsage, which indicates how information is associated with a given source. The encoding below shows how PML is used to represent date information.

  <pmlp:SourceUsage>
    <pmlp:hasSource>
      <pmlp:DocumentFragmentByOffset>
        <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
        <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
      </pmlp:DocumentFragmentByOffset>
    </pmlp:hasSource>
    <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
  </pmlp:SourceUsage>

Justification: Explaining Computation

Based on our past experience, we have identified four important types of justifications leading to a conclusion. PML-J offers a set of primitives that supports all of them. Using the common scenario, we illustrate an example of each of these justifications.

TYPE I - the conclusion is an unproved conclusion or goal

In this case, no justification is available, and no InferenceStep is associated with the NodeSet as shown in the following example:

'(type TonysSpecialty SHELLFISH)' is either unproved or a goal

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
  </pmlj:NodeSet>
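Because a Type I conclusion is characterized purely structurally, by the absence of any InferenceStep under its NodeSet, a consumer can detect it with a simple check. The element names and namespace URIs below are assumptions modeled on the examples in this primer:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, modeled on the examples in this primer.
PMLJ = "https://inference-web.org/2.0/pml-justification.owl#"
PMLP = "https://inference-web.org/2.0/pml-provenance.owl#"

nodeset_xml = f"""
<pmlj:NodeSet xmlns:pmlj="{PMLJ}" xmlns:pmlp="{PMLP}">
  <pmlj:hasConclusion>
    <pmlp:Information>
      <pmlp:hasRawString>(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
    </pmlp:Information>
  </pmlj:hasConclusion>
</pmlj:NodeSet>
"""

def is_unproved(nodeset: ET.Element) -> bool:
    # No InferenceStep anywhere below the NodeSet means the conclusion
    # is unproved or a goal (Type I).
    return nodeset.find(f".//{{{PMLJ}}}InferenceStep") is None

root = ET.fromstring(nodeset_xml)
result = is_unproved(root)
```
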

TYPE II - the conclusion is an assumption

In this case, the conclusion is directly assumed by an agent to be a true statement, as shown in the following example:

'(type TonysSpecialty SHELLFISH)' has been directly assumed by the JTP inference engine

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Assumption.owl#Assumption"/>
        <pmlj:hasInferenceEngine rdf:resource="https://inference-web.org/registry/IE/JTP.owl#JTP"/>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

TYPE III - the conclusion is a direct assertion

The conclusion can be directly asserted by the inference engine without using any antecedent information. A direct assertion usually allows agents to specify details of SourceUsage, as shown in the following example:

'(type TonysSpecialty SHELLFISH)' has been directly asserted in Stanford's Tonys Specialty Example as a span of text between byte offset 58 and byte offset 91 as of 10:30 on 2005-10-17

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource>
              <pmlp:DocumentFragmentByOffset>
                <pmlp:hasDocument rdf:resource="https://inference-web.org/registry/PUB/STE.owl#STE"/>
                <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
                <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
              </pmlp:DocumentFragmentByOffset>
            </pmlp:hasSource>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

TYPE IV - the conclusion is derived from a list of antecedents by applying a certain computation (e.g. inference rule)

This representation can be used to encode various types of computation steps. The type of the computation can be annotated using the hasInferenceRule property, the computation's inputs using hasAntecedents, and the performing agent using the hasInferenceEngine property. For the example below, we assume that there are two direct assertions with their unique content presented as follows:

Premise 1:

'(subClassOf CRAB SHELLFISH)' has been directly asserted in Stanford's Tony's Specialty Ontology as a span of text between byte offset 56 and byte offset 82 as of 10:30 on 2005-10-17

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(subClassOf CRAB SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource>
              <pmlp:DocumentFragmentByOffset>
                <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">56</pmlp:hasFromOffset>
                <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">82</pmlp:hasToOffset>
              </pmlp:DocumentFragmentByOffset>
            </pmlp:hasSource>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>


Premise 2:

'(or (not(subClassOf CRAB ?x)) (type TonysSpecialty ?x))' has been directly asserted in Deborah as of 10:30 on 2005-10-17

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(or (not(subClassOf CRAB ?x)) (type TonysSpecialty ?x))</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource rdf:resource="#Deborah"/>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

Agent X uses JTP to derive a justification as presented below:

'(type TonysSpecialty SHELLFISH)' is derived from the application of General Modus Ponens rule on the two premises linked by
https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns1.owl#ns1 and
https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns2.owl#ns2, and
we also record that the inference step binds the variable ?x to SHELLFISH.

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty ?x)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/GMP.owl#GMP"/>
        <pmlj:hasVariableMapping>
          <pmlj:Mapping>
            <pmlj:mapFrom rdf:datatype="http://www.w3.org/2001/XMLSchema#string">?x</pmlj:mapFrom>
            <pmlj:mapTo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">SHELLFISH</pmlj:mapTo>
          </pmlj:Mapping>
        </pmlj:hasVariableMapping>
        <pmlj:hasAntecedents rdf:parseType="Collection">
          <pmlj:NodeSet rdf:about="https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns1.owl#ns1"/>
          <pmlj:NodeSet rdf:about="https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns2.owl#ns2"/>
        </pmlj:hasAntecedents>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

Note that antecedents are maintained in an ordered list, both to ensure consistent presentation of the antecedents during user interaction and to support inference engines that rely on the order of input data.

The NodeSet concept additionally supports integrating multiple justifications for the same conclusion; therefore, one conclusion can have more than one justification, as shown in the following example:

'(type TonysSpecialty SHELLFISH)' is derived from the application of General Modus Ponens rule or, alternatively, from a direct assertion.

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource>
              <pmlp:DocumentFragmentByOffset>
                <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
                <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
              </pmlp:DocumentFragmentByOffset>
            </pmlp:hasSource>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/GMP.owl#GMP"/>
        <pmlj:hasVariableMapping>
          <pmlj:Mapping>
            <pmlj:mapFrom rdf:datatype="http://www.w3.org/2001/XMLSchema#string">?x</pmlj:mapFrom>
            <pmlj:mapTo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">SHELLFISH</pmlj:mapTo>
          </pmlj:Mapping>
        </pmlj:hasVariableMapping>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>
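A consumer can enumerate the alternative justifications of a conclusion by collecting the InferenceSteps beneath its NodeSet; the index order is the order in which an explainer would present them. The element names and namespace URI below are assumptions modeled on the examples in this primer:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, modeled on the examples in this primer.
PMLJ = "https://inference-web.org/2.0/pml-justification.owl#"

# One NodeSet with two alternative InferenceSteps for the same conclusion.
nodeset_xml = f"""
<pmlj:NodeSet xmlns:pmlj="{PMLJ}">
  <pmlj:isConsequentOf>
    <pmlj:InferenceStep><pmlj:hasIndex>0</pmlj:hasIndex></pmlj:InferenceStep>
  </pmlj:isConsequentOf>
  <pmlj:isConsequentOf>
    <pmlj:InferenceStep><pmlj:hasIndex>1</pmlj:hasIndex></pmlj:InferenceStep>
  </pmlj:isConsequentOf>
</pmlj:NodeSet>
"""

root = ET.fromstring(nodeset_xml)
steps = root.findall(f".//{{{PMLJ}}}InferenceStep")
indexes = sorted(int(s.findtext(f"{{{PMLJ}}}hasIndex")) for s in steps)
```
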

Encoding Proofs

Encoding the Question

An instance of Question records the user's original question, which in this scenario is an English sentence.

"What type of food is Tony's specialty?"

The question is represented in PML as below: (sample program code using PML Java API)

  <pmlj:Question>
    <pmlj:hasContent>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">What type of food is Tony's specialty?</pmlp:hasRawString>
      </pmlp:Information>
    </pmlj:hasContent>
  </pmlj:Question>

Encoding the Query/Answer

An instance of Query records the translated internal representation of the user's question, which in this scenario is a KIF query.

(type TonysSpecialty ?x)

The query is represented in PML as below: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:Query>
    <pmlj:hasContent>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty ?x)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasContent>
  </pmlj:Query>

For this query, we also need to record the following information, which captures how the query has been answered.

Linking NodeSets to Query and Answer

  <pmlj:Query>
    ...
    <pmlj:hasAnswer rdf:resource="..."/>
    ...
  </pmlj:Query>


  <pmlj:NodeSet>
    ...
    <pmlj:isAnswerFor rdf:resource="..."/>
    ...
  </pmlj:NodeSet>

Trust: Sharing Trust in Conclusions

In some settings, such as collaborative social networks, users may be interested either in reputation as calculated over populations or in trust relations as stated and stored by users. PML-T provides basic ontology primitives for asserting simple trust statements such as "Agent A believes information #B" and "Agent C trusts Agent D". It does not preclude users from extending PML-T to represent their own trust models. The example below indicates that the agent (identified by #X) believes a piece of information (identified by #info), and that the belief statement is associated with a value of 0.84. This example captures only the essential elements of a trust relation; the precise semantics of the relation and its value are left unspecified. Many users may choose to extend PML-T by adding properties that explicitly encode the specific semantics and context of their trust models.


  <pmlt:Belief>
    <pmlt:hasBelievingAgent rdf:resource="#X"/>
    <pmlt:hasBelievedInformation rdf:resource="#info"/>
    <pmlt:hasBeliefValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.84</pmlt:hasBeliefValue>
  </pmlt:Belief>

This simple representation enables a minimal common vocabulary for sharing trust knowledge. Some users may choose to provide an infrastructure for propagating trust information, calculating trust values, and/or building more complex trust models.
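As one illustration of such an infrastructure, the sketch below combines belief values along a chain of agents by multiplication. This is only one possible propagation scheme, chosen for simplicity; PML-T itself records the individual statements and mandates no particular model:

```python
# Sketch of one possible propagation scheme over PML-T-style values:
# multiply belief values along a chain of agents. This is an assumption
# about the trust model, not something mandated by PML-T.
def chain_trust(values):
    result = 1.0
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("trust values are assumed to lie in [0, 1]")
        result *= v
    return result

# Example: John trusts Agent X at 0.9, and X believes #info at 0.84.
combined = chain_trust([0.9, 0.84])
```
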

References

All references are available from the Inference Web publication page: https://inference-web.org/publications.html
