OPeNDAP Example

From Inference Web

OPeNDAP and Provenance Use Case

Background

If we inspect the collection of data models behind OPeNDAP, we see that most of them can be characterized as arrays. Moreover, if we observe how these arrays are processed in scientific pipelines, we see that they are often merged, filtered and aggregated. If we treat these arrays as relations, then in relational algebra terms the pipelines perform unions (merging), projections and selections (filtering), and aggregations. The big question I have is how we can use provenance to annotate these arrays.
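To make the relational reading concrete, here is a minimal sketch in plain Python. The `Reading` record layout, the helper names and the sample values are all assumptions made for illustration; none of them come from OPeNDAP or NetCDF.

```python
from collections import namedtuple

# Hypothetical record layout: one row per temperature reading.
Reading = namedtuple("Reading", ["time", "lat", "lon", "temp"])


def union(a, b):
    """Merge two relations, dropping exact duplicates and keeping order."""
    seen = set()
    out = []
    for row in list(a) + list(b):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def select(rel, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rel if predicate(row)]


def project(rel, fields):
    """Projection: keep only the named fields of each row."""
    return [tuple(getattr(row, f) for f in fields) for row in rel]


def aggregate_mean(rel, field):
    """Aggregation: mean of one field over a non-empty relation."""
    values = [getattr(row, field) for row in rel]
    return sum(values) / len(values)


# Made-up sample data for two arrays at the same location.
a = [Reading(0, 42.7, -73.7, 20.0), Reading(10, 42.7, -73.7, 22.0)]
b = [Reading(5, 42.7, -73.7, 21.0)]

merged = union(a, b)                                # merging
warm = select(merged, lambda r: r.temp >= 21.0)     # filtering rows
temps = project(warm, ["time", "temp"])             # filtering columns
mean_temp = aggregate_mean(merged, "temp")          # aggregation
```

Note that nothing in these rows records which sensor produced a value, which is exactly the gap the provenance question is about.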

Problem specification

Let us say that we have two sensors installed at the same location (same latitude and longitude), each reading the temperature at random times. Each sensor therefore generates one array of temperature readings, and the provenance of each array indicates that every value in it was measured by a single sensor. This leaves us with two options: annotating each value of the array, or annotating the entire array. For the second option, I see the possibility of using NetCDF attributes to annotate the arrays (or variables, as NetCDF calls them). For the first option, I see the possibility of adding a new dimension called provenance and recording the sensor information there. I am just not sure how to do that.

Things get more complicated when we start processing these data. For instance, if we perform a union in which some values come from the first array and others from the second, I do not see any option other than adding this new dimension. However, we do need systematic support for this, and I would like to know whether Stephan and Patrick have ever discussed it.
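The two annotation options, and what happens to them under a union, can be sketched as follows. This is only an illustration under stated assumptions: `Variable` is a hypothetical stand-in for a NetCDF variable, the attribute name `source` and the sensor names are made up, and per-value provenance is modeled as a parallel list rather than as a true extra dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Variable:
    """Hypothetical stand-in for a NetCDF variable (not the netCDF API)."""
    times: List[float]
    values: List[float]
    # Array-level annotation, similar in spirit to NetCDF attributes.
    attrs: Dict[str, str] = field(default_factory=dict)
    # Value-level annotation: one source label per value, or None.
    source: Optional[List[str]] = None


def per_value_sources(var):
    """Expand an array-level annotation into one label per value."""
    if var.source is not None:
        return list(var.source)
    return [var.attrs["source"]] * len(var.values)


def union_by_time(a, b):
    """Merge two variables by time, keeping a source label for every value."""
    rows = sorted(
        zip(a.times + b.times,
            a.values + b.values,
            per_value_sources(a) + per_value_sources(b)),
        key=lambda row: row[0],
    )
    return Variable(
        times=[r[0] for r in rows],
        values=[r[1] for r in rows],
        attrs={},  # no single array-level source applies any more
        source=[r[2] for r in rows],
    )


# Made-up readings from two sensors at the same location.
sensor_a = Variable([0, 10], [20.0, 22.0], {"source": "sensor_A"})
sensor_b = Variable([5], [21.0], {"source": "sensor_B"})
merged = union_by_time(sensor_a, sensor_b)
```

Before the union, one attribute per array is enough. After it, the merged array needs one label per value, which matches the observation that the union seems to force value-level annotation.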

The questions above are not necessarily about OPeNDAP itself but about how information is handled by OPeNDAP-compatible formats. A more specific OPeNDAP question concerns some of the operators supported by an OPeNDAP server. For instance, it may not be uncommon for a dataset produced by a projection and a sequence of selections (these are the basic operations supported by OPeNDAP) to contain arrays of temperatures that all come from a single sensor. In this case, would it make sense for the server to be smart enough to replace value annotations with array annotations? (Please note that compression is a major issue in this community.) More interestingly, OPeNDAP is capable of performing aggregation (e.g., computing the average temperature of a given day). In this case, how can we document and propagate the provenance of this temperature?
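One possible answer to both questions can be sketched in plain Python. This is not an existing OPeNDAP feature: the function names, the `source` labels and the shape of the provenance record are assumptions made for illustration only.

```python
def compact(sources):
    """Replace per-value labels with one array-level label when possible.

    Returns (array_attrs, per_value_sources). If every value has the same
    source, the per-value list is dropped and a single attribute is kept.
    """
    if len(set(sources)) == 1:
        return {"source": sources[0]}, None
    return {}, list(sources)


def mean_with_provenance(values, sources):
    """Average the values and record which sources contributed."""
    if not values:
        raise ValueError("cannot aggregate an empty array")
    mean = sum(values) / len(values)
    provenance = {
        "derived_by": "mean",            # hypothetical operation label
        "sources": sorted(set(sources)),
        "n_inputs": len(values),
    }
    return mean, provenance


# A selection that happened to keep readings from one sensor only.
attrs_one, labels_one = compact(["sensor_A", "sensor_A"])

# A selection that kept readings from both sensors.
attrs_two, labels_two = compact(["sensor_A", "sensor_B"])

# Daily average over made-up readings from both sensors.
daily_mean, daily_prov = mean_with_provenance(
    [20.0, 21.0, 22.0],
    ["sensor_A", "sensor_B", "sensor_A"],
)
```

In this sketch the aggregated value carries the set of contributing sensors plus the name of the operation. Whether that level of detail is sufficient, or whether the individual input values should also be traceable, remains an open design question.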
