IW Meeting 2009-09-10

From Inference Web

(paulo) IWSearch requirements

Using CI-Miner here at UTEP, I have code that tracks a workflow execution through data annotators that capture provenance by generating PML-J. The generated PML annotation may also contain PML-P Inference Engine instances that are hard-coded along with the annotators, because the tool that generates the data annotators may create its own PML-P instances instead of reusing PML-P instances published elsewhere (we may want to change this later). In the SPCDIS context, for example, I see the need to annotate the PML-J with at least the following PML-P instances:

  • Inference Rule: we need PML-P describing the algorithm that a given application uses to order a set of values in a given dataset;
  • Source: we need PML-P about “CHIP” to annotate the source usage of direct assertions produced by CHIP, i.e., to record that CHIP was the instrument that captured the image of interest;
  • Format: we need PML-P about “FIT” to annotate the node sets that conclude FIT images.

To search for the PML-P instances above, we know the name (or a partial name) and the type of each instance. For example, we know that “CHIP” is a sensor, “FIT” is a format, etc.
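A minimal sketch of this basic lookup, assuming a hypothetical, simplified in-memory view of PML-P instances (the `PmlpInstance` class, its fields, the `search_by_name_and_type` function and the example.org URIs are all made up for illustration; this is not the PML schema or an existing IWSearch API). It matches a case-insensitive name fragment and an exact type:

```python
from dataclasses import dataclass, field


@dataclass
class PmlpInstance:
    """Hypothetical, simplified view of a PML-P instance (not the real PML schema)."""
    uri: str
    name: str
    pml_type: str  # e.g. "Sensor", "Format", "InferenceRule"
    attributes: dict = field(default_factory=dict)


def search_by_name_and_type(instances, name_fragment, pml_type):
    """Return instances of pml_type whose name contains name_fragment (case-insensitive)."""
    fragment = name_fragment.lower()
    return [
        inst for inst in instances
        if inst.pml_type == pml_type and fragment in inst.name.lower()
    ]


# Illustrative catalog; the URIs are placeholders.
catalog = [
    PmlpInstance("http://example.org/pmlp/CHIP", "CHIP", "Sensor"),
    PmlpInstance("http://example.org/pmlp/FIT", "FIT", "Format"),
    PmlpInstance("http://example.org/pmlp/CHIP-raw", "CHIP raw", "Format"),
]

hits = search_by_name_and_type(catalog, "chip", "Sensor")
print([h.uri for h in hits])  # ['http://example.org/pmlp/CHIP']
```

The type filter is what keeps the “CHIP raw” format entry out of a search for the CHIP sensor, even though both names contain “CHIP”.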

It appears that we are talking about the same thing, but I have the impression that we need a different search approach for formats than for the other kinds of PML-P instances. The total number of format instances appears to be orders of magnitude smaller than the number of other PML-P instances, such as sensors or organizations. On top of this, format is a key enabler for using scientific data: if you don’t know the format, you may not know how to use the data. So we may need a different strategy for formats.

I believe it makes sense for IW to have a kind of “reference registry” for formats (which could be our own IWBase). This means that people can:

  • reuse formats from the registry;
  • create their own formats; or
  • create their own format instance (e.g., because they need the information available locally) with a sameAs relation connecting it to an entry in the format registry.

IWSearch therefore needs a more systematic way of searching for formats, while still being able to search for any PML-P format instance available on the web. The search could take a parameter specifying the domain in which to search. If that domain is the registry’s domain, the search should retrieve a single entry, whereas a search in another domain could return multiple instances. This would make annotating scientific data much easier.
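A minimal sketch of the registry-aware format search described above, assuming that the “domain” parameter is the host name of an instance URI, that the registry holds one canonical entry per format name, and that `domain=None` means “search everywhere”. `REGISTRY_DOMAIN`, `FormatInstance`, `search_formats`, `resolve_to_registry` and all URIs are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host of the IWBase format registry (placeholder, not a real address).
REGISTRY_DOMAIN = "iwbase.example.org"


@dataclass
class FormatInstance:
    uri: str
    name: str
    same_as: Optional[str] = None  # URI of the registry entry this instance is sameAs


def search_formats(instances, name, domain=None):
    """Find Format instances by name (case-insensitive), optionally restricted to one host.

    In the registry domain at most one canonical entry is returned;
    any other domain (or None = everywhere) may yield several instances.
    """
    matches = [
        f for f in instances
        if (domain is None or urlparse(f.uri).hostname == domain)
        and f.name.lower() == name.lower()
    ]
    if domain == REGISTRY_DOMAIN:
        return matches[:1]  # assumption: one registry entry per format
    return matches


def resolve_to_registry(fmt, instances):
    """Follow a local instance's sameAs link to its registry entry, if one is known."""
    if fmt.same_as is None:
        return fmt
    for candidate in instances:
        if candidate.uri == fmt.same_as:
            return candidate
    return fmt


formats = [
    FormatInstance("http://iwbase.example.org/registry/FIT", "FIT"),
    FormatInstance("http://lab-a.example.edu/pml/FIT", "FIT",
                   same_as="http://iwbase.example.org/registry/FIT"),
    FormatInstance("http://lab-b.example.edu/pml/FIT", "FIT"),
]

print([f.uri for f in search_formats(formats, "fit", REGISTRY_DOMAIN)])
print(len(search_formats(formats, "FIT")))  # 3: every FIT instance on the "web"
local = search_formats(formats, "FIT", "lab-a.example.edu")[0]
print(resolve_to_registry(local, formats).uri)
```

The sameAs link is what lets a locally created format (option three in the list) still be traced back to the single registry entry, while a format with no such link (lab-b here) simply resolves to itself.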

Now let’s think about CHIP. The problem here is deciding which CHIP instance to use when a search returns more than one result. Different techniques could help here. For instance, could we specify a scientific domain? That way we could narrow the search down to something like ‘“CHIP” and “astronomy”’. If that is possible, we could then apply something like PageRank to the remaining results and eventually get the instance we are looking for. If the device is a small sensor deployed in thousands or millions of units, however, we may need to search not only by name but also by something like a serial number. For example, could IWSearch look for a PML-P sensor matching ‘“RFID” and “SENSORID=1234567890”’? As you can see, I am not giving you a straightforward list of IWSearch requests; I am showing that we will need very specialized search capabilities if we are going to generate and reuse tons of PML-P instances.
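A minimal sketch of the disambiguation steps above: filter by name, then by an optional scientific-domain keyword, then by exact attribute values such as a serial number, and finally order the survivors by a score. The `rank` field is only a precomputed stand-in for a PageRank-like score; the sketch does not compute PageRank. `SensorInstance`, `search_sensors`, the keyword tags, the `SENSORID` attribute and the URIs are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class SensorInstance:
    uri: str
    name: str
    keywords: set = field(default_factory=set)      # scientific-domain tags, e.g. {"astronomy"}
    attributes: dict = field(default_factory=dict)  # e.g. {"SENSORID": "1234567890"}
    rank: float = 0.0  # stand-in for a precomputed link-based score (not computed here)


def search_sensors(instances, name, domain_keyword=None, **required_attrs):
    """Filter by name, optional domain keyword and exact attribute values,
    then return the matches ordered by descending rank."""
    hits = []
    for inst in instances:
        if inst.name.lower() != name.lower():
            continue
        if domain_keyword is not None and domain_keyword not in inst.keywords:
            continue
        if any(inst.attributes.get(k) != v for k, v in required_attrs.items()):
            continue
        hits.append(inst)
    return sorted(hits, key=lambda i: i.rank, reverse=True)


sensors = [
    SensorInstance("http://a.example.org/CHIP", "CHIP", {"astronomy"}, rank=0.8),
    SensorInstance("http://b.example.org/CHIP", "CHIP", {"astronomy"}, rank=0.3),
    SensorInstance("http://c.example.org/CHIP", "CHIP", {"chemistry"}, rank=0.9),
    SensorInstance("http://d.example.org/tag1", "RFID", attributes={"SENSORID": "1234567890"}),
    SensorInstance("http://d.example.org/tag2", "RFID", attributes={"SENSORID": "0987654321"}),
]

# '"CHIP" and "astronomy"': two candidates, best-ranked first.
print([s.uri for s in search_sensors(sensors, "CHIP", "astronomy")])
# '"RFID" and "SENSORID=1234567890"': the serial number pins down a single tag.
print([s.uri for s in search_sensors(sensors, "RFID", SENSORID="1234567890")])
```

Passing attribute constraints as keyword arguments keeps the query open-ended, so other identifying properties could be matched the same way without changing the function.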

