Proof Markup Language (PML) Primer

Working Draft 04 October 2007

This version:
          https://inference-web.org/2007/primer/ 
Previous version:
          http://iw.stanford.edu/2005/wd-pml-primer/ 

Editors:
          Deborah L. McGuinness (Stanford University, Rensselaer Polytechnic Institute)
          Paulo Pinheiro da Silva (Stanford University, The University of Texas at El Paso)
          Li Ding (Stanford University, Rensselaer Polytechnic Institute)

Copyright ©2007 Inference Web Group.


Abstract
This document provides a brief introduction to the Proof Markup Language (PML), an interlingua for representing and sharing explanations generated by various intelligent systems such as hybrid web-based question answering systems, text analytic components, theorem provers, task processors, web services, rule engines, and machine learning components. The interlingua is split into three modules (provenance, justification, and trust relations) to reduce maintenance and reuse costs. We introduce the ontologies and the use of the Java API through examples drawn from a question answering and information-extraction scenario. We identify four types of justifications, namely unproved goals, direct assertions, assumptions, and derived conclusions, and show how PML supports each of them.

Contents

Introduction

More user decision processes are relying, in part or in whole, on results obtained from automated intelligent systems. When users obtain results from intelligent systems, they need to decide when and how to act on them. As part of that decision process, users increasingly request explanations of how the results were obtained and what the results depended on; this demand is known as transparency. There are many enabling factors to explanation, including representation of explanation metadata, presentation of explanations to human users as well as machine consumers, data management of explanation metadata access, and computation over explanation metadata such as abstraction, or propagation and aggregation of trust. We have identified three dominant aspects of explanation representation, namely provenance, justifications, and trust relations, and we designed our interlingua accordingly by splitting it into three ontology modules. This modular design can support applications that desire only provenance representation (e.g., information and sources), potentially with later expansion to the representation of information manipulation steps (e.g., logic justifications and function calls), and eventually expansion to the trust representation of information sources.

Using PML to Explain a Hybrid Intelligent Scenario

In order to illustrate the representational power of PML, we introduce a simple scenario that uses an intelligent agent to answer a user's question and to incrementally explain how the answer was determined using various assertion and derivation techniques on information coming from different sources.

The Scenario: Question, Query and Answer

Consider a simple scenario where John consults an online tour guide agent Agent X about the specialty of a local restaurant called Tonys. The agent is equipped with the JTP hybrid reasoner to support question answering and an information extractor EX to support extraction of external knowledge from Web pages. The conversation between John and Agent X is essentially composed of two sentences in English:
                John: What type of food is Tonys' specialty? 
                X: Tonys' specialty is Shellfish. 
For internal representation of sentences, Agent X uses the KIF language. Therefore, the sentences above are internally represented as follows:
                Question:       (type TonysSpecialty ?x) 
                Answer:         (type TonysSpecialty ShellFish) 
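Since the agent's internal sentences are KIF s-expressions, their nested structure can be recovered with a few lines of code. The sketch below is illustrative only (it ignores KIF strings, comments, and quoting) and is not part of PML or of Agent X:

```python
# Minimal s-expression reader for KIF-style sentences (illustrative only;
# a real KIF parser must also handle strings, comments, and quoting).
def parse_kif(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            expr, pos = [], pos + 1
            while tokens[pos] != ")":
                sub, pos = read(pos)
                expr.append(sub)
            return expr, pos + 1
        return tokens[pos], pos + 1

    expr, _ = read(0)
    return expr

query = parse_kif("(type TonysSpecialty ?x)")
answer = parse_kif("(type TonysSpecialty ShellFish)")

# A variable is any token starting with '?'; aligning the query with the
# answer (both flat here) yields the binding ?x -> ShellFish.
binding = {q: a for q, a in zip(query, answer)
           if isinstance(q, str) and q.startswith("?")}
```
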

John initially asks simple (essentially look-up) questions, but later he may want to ask how the answer was derived, what knowledge was used to support the derivation, and related questions such as what wines are suggested as a pairing with the restaurant's specialty, what wines are available locally, and so on.

The agent answers questions using its embedded inference engine over its knowledge (potentially hand-entered or generated by information extraction processes) and has the option of providing explanations about its question answering processes. We show portions of those processes in the following sections.

Provenance: Encoding Information, Sources and Other Provenance Elements

A provenance element represents an information unit describing the origin of a PML concept. For example, a Language represents the language in which a character string is written, and an Inference Engine represents the engine used to produce the justification for a conclusion.

In order to generate explanations, Agent X needs to first encode the questions and answers and make them referenceable. Using PML-P, users can capture the content of information as a string using the property hasRawString, and optionally annotate the processing and presentation instructions using PML-P properties such as hasLanguage, hasFormat, and hasPrettyNameMappingList. The example below shows that the content of information is encoded in the KIF language.

  <pmlp:Information>
    <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
    <pmlp:hasLanguage rdf:resource="#KIF"/>
  </pmlp:Information>
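Applications without access to the PML Java API can emit such instances with ordinary XML tooling. The sketch below builds an Information instance like the one above using Python's standard library; the pmlp namespace URI and the KIF registry URL are assumptions, so substitute the identifiers declared by the actual PML-P ontology:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs -- replace with those declared by the PML-P ontology.
PMLP = "https://inference-web.org/2.0/pml-provenance.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

ET.register_namespace("pmlp", PMLP)
ET.register_namespace("rdf", RDF)

# Build <pmlp:Information> with hasRawString and hasLanguage, as in the
# example above. The KIF registry URL is also an assumption.
info = ET.Element(f"{{{PMLP}}}Information")
raw = ET.SubElement(info, f"{{{PMLP}}}hasRawString",
                    {f"{{{RDF}}}datatype": XSD + "string"})
raw.text = "(type TonysSpecialty SHELLFISH)"
ET.SubElement(info, f"{{{PMLP}}}hasLanguage",
              {f"{{{RDF}}}resource":
               "https://inference-web.org/registry/LG/KIF.owl#KIF"})

xml_text = ET.tostring(info, encoding="unicode")
```
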

The Language KIF is encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlp:Language rdf:about="#KIF">
    <pmlp:hasName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Knowledge Interchange Format (KIF)</pmlp:hasName>
    <pmlp:hasURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://logic.stanford.edu/kif/kif.html</pmlp:hasURL>
  </pmlp:Language>

Besides directly embedding the content within the instance of Information, PML-P allows users to reference information whose content is stored elsewhere. The external content can be referenced by the value of hasURL, i.e., the information content can be obtained from an online document. The example below shows that the background knowledge content was actually obtained from an online document specified by hasURL, and that the content is written in KIF language.

  <pmlp:Information>
    <pmlp:hasURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://iw.stanford.edu/ksl/registry/storage/documents/tonys_fact.kif</pmlp:hasURL>
    <pmlp:hasLanguage rdf:resource="#KIF"/>
  </pmlp:Information>

In addition to annotating information, Agent X needs to annotate the sources it used. The following example defines an instance of Document (browse it in IWBrowser, sample program code using PML Java API) whose content is the information seen in the above example. Building on this representation, the Inference Web Registry provides a public repository where users can pre-register metadata about sources, increasing metadata reuse.

  <pmlp:Document>
    <pmlp:hasURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://iw.stanford.edu/ksl/registry/storage/documents/tonys_fact.kif</pmlp:hasURL>
  </pmlp:Document>

PML-P supports references at the document level or at finer grained levels such as spans of text. The example below uses the DocumentFragmentByOffset concept to annotate the location of a span of text in the original document. With this representation, an application can highlight the corresponding span of text in a raw source document.

  <pmlp:DocumentFragmentByOffset>
    <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
    <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
  </pmlp:DocumentFragmentByOffset>
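To illustrate how an application would act on such a fragment, the sketch below recovers the span of text named by a from/to byte-offset pair; the document content used here is a stand-in, not the actual tonys_fact.kif file:

```python
# Sketch: given raw document content and a DocumentFragmentByOffset-style
# byte range, recover the span of text an application would highlight.
def fragment(content: bytes, from_offset: int, to_offset: int) -> str:
    # PML offsets are positions into the raw source document.
    return content[from_offset:to_offset].decode("utf-8")

# Stand-in document: 58 bytes of padding, then the asserted sentence.
assertion = b"(type TonysSpecialty SHELLFISH)"
doc = b"X" * 58 + assertion + b" trailing text"

span = fragment(doc, 58, 58 + len(assertion))
```
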

Also, PML includes the notion of SourceUsage, which indicates how information is associated with a given source. The encoding below shows how PML is used to represent date information.

  <pmlp:SourceUsage>
    <pmlp:hasSource>
      <pmlp:DocumentFragmentByOffset>
        <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
        <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
      </pmlp:DocumentFragmentByOffset>
    </pmlp:hasSource>
    <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
  </pmlp:SourceUsage>

Justification: Explaining Computation

Based on our past experience, we have identified four important types of justifications leading to a conclusion. PML-J offers a set of primitives that supports all of them. Using the common scenario, we illustrate an example of each of these justifications.

TYPE I - the conclusion is an unproved conclusion or goal

In this case, no justification is available, and no InferenceStep is associated with the NodeSet as shown in the following example:

'(type TonysSpecialty SHELLFISH)' is either unproved or a goal

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
  </pmlj:NodeSet>
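Because a Type I conclusion is characterized purely structurally, by the absence of any InferenceStep under its NodeSet, a consumer can detect it with a simple check. The element names and namespace URIs below are assumptions modeled on the examples in this primer:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, modeled on the examples in this primer.
PMLJ = "https://inference-web.org/2.0/pml-justification.owl#"
PMLP = "https://inference-web.org/2.0/pml-provenance.owl#"

nodeset_xml = f"""
<pmlj:NodeSet xmlns:pmlj="{PMLJ}" xmlns:pmlp="{PMLP}">
  <pmlj:hasConclusion>
    <pmlp:Information>
      <pmlp:hasRawString>(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
    </pmlp:Information>
  </pmlj:hasConclusion>
</pmlj:NodeSet>
"""

def is_unproved(nodeset: ET.Element) -> bool:
    # No InferenceStep anywhere below the NodeSet means the conclusion
    # is unproved or a goal (Type I).
    return nodeset.find(f".//{{{PMLJ}}}InferenceStep") is None

root = ET.fromstring(nodeset_xml)
result = is_unproved(root)
```
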

TYPE II - the conclusion is an assumption

In this case, the conclusion is directly assumed by an agent to be a true statement, as shown in the following example:

'(type TonysSpecialty SHELLFISH)' has been directly assumed by the JTP inference engine

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Assumption.owl#Assumption"/>
        <pmlj:hasInferenceEngine rdf:resource="https://inference-web.org/registry/IE/JTP.owl#JTP"/>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

TYPE III - the conclusion is a direct assertion

The conclusion can be directly asserted by the inference engine without using any antecedent information. A direct assertion usually allows agents to specify details of SourceUsage, as shown in the following example:

'(type TonysSpecialty SHELLFISH)' has been directly asserted in Stanford's Tonys Specialty Example as a span of text between byte offset 58 and byte offset 91 as of 10:30 on 2005-10-17

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource>
              <pmlp:DocumentFragmentByOffset>
                <pmlp:hasDocument rdf:resource="https://inference-web.org/registry/PUB/STE.owl#STE"/>
                <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
                <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
              </pmlp:DocumentFragmentByOffset>
            </pmlp:hasSource>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

TYPE IV - the conclusion is derived from a list of antecedents by applying a certain computation (e.g. inference rule)

This representation can be used to encode various types of computation steps. The type of the computation can be annotated using the hasInferenceRule property, the computation's inputs using hasAntecedents, and the performing agent using the hasInferenceEngine property. For the example below, we assume that there are two direct assertions with their unique content presented as follows:

Premise 1:

'(subClassOf CRAB SHELLFISH)' has been directly asserted in Stanford's Tony's Specialty Ontology as a span of text between byte offset 56 and byte offset 82 as of 10:30 on 2005-10-17

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(subClassOf CRAB SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource>
              <pmlp:DocumentFragmentByOffset>
                <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">56</pmlp:hasFromOffset>
                <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">82</pmlp:hasToOffset>
              </pmlp:DocumentFragmentByOffset>
            </pmlp:hasSource>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>


Premise 2:

'(or (not(subClassOf CRAB ?x)) (type TonysSpecialty ?x))' has been directly asserted in Deborah as of 10:30 on 2005-10-17

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(or (not(subClassOf CRAB ?x)) (type TonysSpecialty ?x))</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource rdf:resource="#Deborah"/>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

Agent X uses JTP to derive a justification as presented below:

'(type TonysSpecialty SHELLFISH)' is derived from the application of General Modus Ponens rule on the two premises linked by
https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns1.owl#ns1 and
https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns2.owl#ns2, and
we also record that the inference step binds the variable ?x to SHELLFISH.

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty ?x)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/GMP.owl#GMP"/>
        <pmlj:hasVariableMapping>
          <pmlj:Mapping>
            <pmlj:mapFrom rdf:datatype="http://www.w3.org/2001/XMLSchema#string">?x</pmlj:mapFrom>
            <pmlj:mapTo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">SHELLFISH</pmlj:mapTo>
          </pmlj:Mapping>
        </pmlj:hasVariableMapping>
        <pmlj:hasAntecedents rdf:parseType="Collection">
          <pmlj:NodeSet rdf:about="https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns1.owl#ns1"/>
          <pmlj:NodeSet rdf:about="https://inference-web.org/2007/primer/examples/proofs/tonys/answer_2/ns2.owl#ns2"/>
        </pmlj:hasAntecedents>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>

Note that antecedents are maintained in an ordered list, both to ensure consistent presentation of the antecedents during user interaction and to support inference engines that rely on the order of input data.

The NodeSet concept additionally supports integrating multiple justifications for the same conclusion; therefore, one conclusion can have more than one justification, as shown in the following example:

'(type TonysSpecialty SHELLFISH)' is derived from the application of General Modus Ponens rule or, alternatively, from a direct assertion.

The justification can be encoded in the following PML: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:NodeSet>
    <pmlj:hasConclusion>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasConclusion>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/Told.owl#Told"/>
        <pmlj:hasSourceUsage>
          <pmlp:SourceUsage>
            <pmlp:hasSource>
              <pmlp:DocumentFragmentByOffset>
                <pmlp:hasFromOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">58</pmlp:hasFromOffset>
                <pmlp:hasToOffset rdf:datatype="http://www.w3.org/2001/XMLSchema#int">91</pmlp:hasToOffset>
              </pmlp:DocumentFragmentByOffset>
            </pmlp:hasSource>
            <pmlp:hasUsageDateTime rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
          </pmlp:SourceUsage>
        </pmlj:hasSourceUsage>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
    <pmlj:isConsequentOf>
      <pmlj:InferenceStep>
        <pmlj:hasIndex rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</pmlj:hasIndex>
        <pmlj:hasInferenceRule rdf:resource="https://inference-web.org/registry/DPR/GMP.owl#GMP"/>
        <pmlj:hasVariableMapping>
          <pmlj:Mapping>
            <pmlj:mapFrom rdf:datatype="http://www.w3.org/2001/XMLSchema#string">?x</pmlj:mapFrom>
            <pmlj:mapTo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">SHELLFISH</pmlj:mapTo>
          </pmlj:Mapping>
        </pmlj:hasVariableMapping>
      </pmlj:InferenceStep>
    </pmlj:isConsequentOf>
  </pmlj:NodeSet>
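A consumer can enumerate the alternative justifications of a conclusion by collecting the InferenceSteps beneath its NodeSet; the index order is the order in which an explainer would present them. The element names and namespace URI below are assumptions modeled on the examples in this primer:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, modeled on the examples in this primer.
PMLJ = "https://inference-web.org/2.0/pml-justification.owl#"

# One NodeSet with two alternative InferenceSteps for the same conclusion.
nodeset_xml = f"""
<pmlj:NodeSet xmlns:pmlj="{PMLJ}">
  <pmlj:isConsequentOf>
    <pmlj:InferenceStep><pmlj:hasIndex>0</pmlj:hasIndex></pmlj:InferenceStep>
  </pmlj:isConsequentOf>
  <pmlj:isConsequentOf>
    <pmlj:InferenceStep><pmlj:hasIndex>1</pmlj:hasIndex></pmlj:InferenceStep>
  </pmlj:isConsequentOf>
</pmlj:NodeSet>
"""

root = ET.fromstring(nodeset_xml)
steps = root.findall(f".//{{{PMLJ}}}InferenceStep")
indexes = sorted(int(s.findtext(f"{{{PMLJ}}}hasIndex")) for s in steps)
```
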

Encoding Proofs

Encoding the Question

An instance of Question records the user's original question, which in this scenario is an English sentence.

"What type of food is Tony's specialty?"

The question is represented in PML as below: (sample program code using PML Java API)

  <pmlj:Question>
    <pmlj:hasContent>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">What type of food is Tony's specialty?</pmlp:hasRawString>
      </pmlp:Information>
    </pmlj:hasContent>
  </pmlj:Question>

Encoding the Query/Answer

An instance of Query records the translated internal representation of the user's question, which in this scenario is a KIF query.

(type TonysSpecialty ?x)

The query is represented in PML as below: (browse it in IWBrowser) (sample program code using PML Java API)

  <pmlj:Query>
    <pmlj:hasContent>
      <pmlp:Information>
        <pmlp:hasRawString rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(type TonysSpecialty ?x)</pmlp:hasRawString>
        <pmlp:hasLanguage rdf:resource="#KIF"/>
      </pmlp:Information>
    </pmlj:hasContent>
  </pmlj:Query>

For this query, we also need to record the following information, which captures how the query has been answered.

Linking NodeSets to Query and Answer

  <pmlj:Query>
    ...
    <pmlj:hasAnswer rdf:resource="..."/>
    ...
  </pmlj:Query>


  <pmlj:NodeSet>
    ...
    <pmlj:isAnswerFor rdf:resource="..."/>
    ...
  </pmlj:NodeSet>

Trust: Sharing Trust in Conclusions

In some settings, such as collaborative social networks, users may be interested either in reputation as calculated over populations or in trust relations as stated and stored by users. PML-T provides basic ontology primitives for asserting simple trust statements such as "Agent A believes information #B" and "Agent C trusts Agent D". It does not preclude users from extending PML-T to represent their own trust models. The example below indicates that the agent (identified by #X) believes a piece of information (identified by #info), and that the belief statement is associated with a value of 0.84. This example captures only the essential elements of a trust relation; the precise semantics of the relation and its value are left unspecified. Many users may choose to extend PML-T by adding properties that explicitly encode the specific semantics and context of their trust models.


  <pmlt:Belief>
    <pmlt:hasBelievingAgent rdf:resource="#X"/>
    <pmlt:hasBelievedInformation rdf:resource="#info"/>
    <pmlt:hasBeliefValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.84</pmlt:hasBeliefValue>
  </pmlt:Belief>

This simple representation enables a minimal common vocabulary for sharing trust knowledge. Some users may choose to provide an infrastructure for propagating trust information, calculating trust values, and/or building more complex trust models.
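As one illustration of such an infrastructure, the sketch below combines belief values along a chain of agents by multiplication. This is only one possible propagation scheme, chosen for simplicity; PML-T itself records the individual statements and mandates no particular model:

```python
# Sketch of one possible propagation scheme over PML-T-style values:
# multiply belief values along a chain of agents. This is an assumption
# about the trust model, not something mandated by PML-T.
def chain_trust(values):
    result = 1.0
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("trust values are assumed to lie in [0, 1]")
        result *= v
    return result

# Example: John trusts Agent X at 0.9, and X believes #info at 0.84.
combined = chain_trust([0.9, 0.84])
```
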

References

All references are available from the Inference Web publication page: https://inference-web.org/publications.html
