Semantic Water Quality Portal

From Inference Web


This page focuses on how the Semantic Water Quality Portal uses provenance.


Demo

Overview

The Semantic Water Quality Portal connects water data from multiple sources and lets a broad class of users query it conveniently, so that they can identify polluted water sources and polluting facilities in a geographic area of their choice. It also incorporates multiple regulations, both federal and state, so that users can review water data against a variety of regulations. This demonstration gives a brief introduction to what the portal can do and highlights the provenance aspects of the effort.

A sample static demonstration follows. The live demonstration can be accessed at [TWC Semantic Water Quality Portal], and the Tetherless World project page for this effort is available at [SemantAQUA].

Static Demo: Assume one is interested in water data for a particular zip code. The screenshot shows the results of entering the zip code 02888 with the facets selected as shown. The user clicks “Go!”, and the portal visualizes the results on a Google map, using different icons to distinguish clean from polluted water sources and facilities. To open the pop-up window, click a polluted water source.

Switch regulation

With the “Regulation” box, the user can select the regulation to apply, e.g. by checking “CA Regulation”. Under the CA Regulation, the portal discovers more polluted water sites and polluting facilities.
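Switching regulations can be pictured as re-evaluating the same measurements against a different table of limits. The sketch below is only an illustration of that idea, not the portal's implementation (the portal encodes regulations as OWL2 ontologies, as described under Technical Design). The limit values, site names, and the function name polluted_sites are all made up for the example.

```python
# Hypothetical limits per regulation, in mg/L. These are NOT real regulatory values.
LIMITS = {
    "EPA": {"arsenic": 0.010, "nitrate": 10.0},
    "CA": {"arsenic": 0.010, "nitrate": 5.0},
}

# Hypothetical measurements: (site, characteristic, value in mg/L).
MEASUREMENTS = [
    ("site_A", "nitrate", 7.5),
    ("site_B", "nitrate", 12.0),
    ("site_C", "arsenic", 0.004),
]


def polluted_sites(regulation, measurements=MEASUREMENTS):
    """Return the sorted sites with at least one measurement over the limit."""
    limits = LIMITS[regulation]
    return sorted({
        site
        for site, characteristic, value in measurements
        if characteristic in limits and value > limits[characteristic]
    })


print(polluted_sites("EPA"))
print(polluted_sites("CA"))
```

With these made-up numbers, the stricter hypothetical CA nitrate limit flags site_A in addition to site_B, which mirrors how a different regulation can reveal more polluted sites for the same data.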

Retrieval by characteristic

Unselect “No Filter”, click “select” in the next row, and select one or more characteristics from the pop-up window. With only phosphorus_total_as_P selected, the portal detects fewer polluting facilities and no polluted water sources: every polluting facility displayed is one that releases more than the permitted amount of phosphorus_total_as_P.

Pop up window for pollution details

The user can access more details about a site by clicking on its icon. The information provided in the pop-up window includes the contaminant names, the measurement values, the limit values, and the measurement times.

Provenance of water data

Click the “?” near the “measurement value” in the pop-up window for pollution details. The provenance provided in the pop-up window includes the PML file, the RDF file, and the original source file.

Provenance of water regulations

Click the “?” near the “limit value” in the pop-up window for pollution details. The provenance provided in the pop-up window includes the PML file, the RDF file, and the original source file.

Trend Visualization

Click “Visualize Characteristics” at the bottom of the pop-up window for pollution facts, then select the permit for the facility (one facility can have multiple permits), the characteristic, and the test type. The page displays a trend graph of the water quality over time with violations highlighted. Move the mouse near a data point to see its measurement time and value.
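The data behind such a trend graph can be thought of as a time-ordered series in which each point carries a violation flag. The following is a minimal sketch under that assumption; the function name trend_points, the readings, and the limit of 1.0 are hypothetical and not taken from the portal.

```python
from datetime import date


def trend_points(measurements, limit):
    """Sort (date, value) pairs by time and flag values above the limit."""
    return [
        {"time": when, "value": value, "violation": value > limit}
        for when, value in sorted(measurements)
    ]


# Hypothetical readings for one permit, characteristic, and test type.
readings = [
    (date(2010, 3, 1), 1.4),
    (date(2010, 1, 1), 0.6),
    (date(2010, 2, 1), 0.9),
]
points = trend_points(readings, limit=1.0)

for point in points:
    marker = "VIOLATION" if point["violation"] else "ok"
    print(point["time"].isoformat(), point["value"], marker)
```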

Data Sources

Data Type | Data Source
Water Quality Data | [EPA Enforcement & Compliance History Online (ECHO) Database]
Water Quality Data | [USGS National Water Information System (NWIS) Water-Quality Web Services]
Water Quality Regulation | [EPA (National Water Regulation)]
Water Quality Regulation | [California Code of Regulations]
Water Quality Regulation | [Massachusetts Department of Environmental Protection]
Water Quality Regulation | [New York Department of Health]
Water Quality Regulation | [State of Rhode Island Department of Environmental Management]

Technical Design

Provenance capture

Water data provenance capture

Integration Stage | Provenance | Script
Retrieval | source URL, modification time, inference engine, inference rule, involved actor | purl.sh
Adjust | antecedent data, modification time, inference engine, inference rule, involved actor | punzip.sh, justify.sh
Convert | antecedent data, invocation time, inference engine, interpretation rule | convert*.sh (conversion trigger)
Publish | dump file URL, publish time, involved actor | publish.sh
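Because the Adjust and Convert stages each record their antecedent data, the captured provenance forms a chain from the published dump back to the original retrieval. The sketch below models that chain in a simplified way. The StageRecord class, the trace function, and the artifact file names are hypothetical; only the stage and script names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageRecord:
    stage: str    # integration stage, as named in the table above
    script: str   # script that captures the provenance for this stage
    output: str   # hypothetical name of the artifact the stage produced
    antecedent: Optional["StageRecord"] = None  # record of the previous stage


def trace(record):
    """Return the stage names from the original retrieval up to this record."""
    stages = []
    while record is not None:
        stages.append(record.stage)
        record = record.antecedent
    return list(reversed(stages))


retrieval = StageRecord("Retrieval", "purl.sh", "download.zip")
adjust = StageRecord("Adjust", "punzip.sh, justify.sh", "download.csv", retrieval)
convert = StageRecord("Convert", "convert*.sh", "data.ttl", adjust)
publish = StageRecord("Publish", "publish.sh", "dump.ttl", convert)

print(trace(publish))
```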

Water regulation provenance capture

The water quality regulations are converted to OWL2 ontologies with our ad-hoc regulation converter. The regulation provenance data are captured manually. We plan to automate the regulation provenance capture with our regulation converter.

Provenance usage

Data source widget

Input | URL of the SPARQL endpoint, an (optional) list of its named graphs, and the URL of the SimpleNamedGraphSourceGraph instance
Output | SimpleNamedGraphSourceGraph instance filled with simple descriptions of the source organizations responsible for the data
Processing | Walks the large provenance graph of each named graph and abstracts it into one triple per source: <data_1> dct:source <source_1>

An example instance, SWQPSimpleNamedGraphSourceGraph, is shown below:

#apps do not hard code this graph
graph <SWQPSimpleNamedGraphSourceGraph> { 
   <SWQPSimpleNamedGraphSourceGraph> rdf:type :SimpleNamedGraphSourceGraph .
   <rhode_island_data_1> dct:source <http://health.tw.rpi.edu/source/epa-gov> .
   <rhode_island_data_2> dct:source <http://health.tw.rpi.edu/source/epa-gov> .
   # 2 sources is OK
   <rhode_island_data_3> dct:source <http://health.tw.rpi.edu/source/usgs-gov> .
   <rhode_island_data_3> dct:source <http://health.tw.rpi.edu/source/epa-gov> . 
}
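The abstraction step that produces a graph like the one above can be sketched as a traversal: starting from each named graph, follow the derivation links back to the original downloads and emit one dct:source triple per source organization reached. This is a minimal sketch rather than the widget's actual code. The derived_from and source_of mappings, the file names, and the function name abstract_sources are assumptions made for illustration.

```python
from collections import deque

DCT_SOURCE = "http://purl.org/dc/terms/source"


def abstract_sources(derived_from, source_of, named_graphs):
    """Collapse each named graph's provenance into (graph, dct:source, org) triples.

    derived_from maps a node to the nodes it was derived from.
    source_of maps an original download to its source organization URI.
    """
    triples = set()
    for graph in named_graphs:
        seen = {graph}
        queue = deque([graph])
        while queue:  # breadth-first walk back through the antecedents
            node = queue.popleft()
            if node in source_of:
                triples.add((graph, DCT_SOURCE, source_of[node]))
            for parent in derived_from.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
    return sorted(triples)


# Hypothetical provenance for a graph that merges USGS and EPA downloads.
derived_from = {
    "rhode_island_data_3": ["merged.ttl"],
    "merged.ttl": ["usgs.csv", "epa.csv"],
}
source_of = {
    "usgs.csv": "http://health.tw.rpi.edu/source/usgs-gov",
    "epa.csv": "http://health.tw.rpi.edu/source/epa-gov",
}
result = abstract_sources(derived_from, source_of, ["rhode_island_data_3"])
for triple in result:
    print(triple)
```

A graph with two upstream sources yields two triples, matching the rhode_island_data_3 entries in the example instance.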

Usage:

  • Presentation of the data sources on the interface
  • Source based data retrieval

Advantage:

  • Web applications only need to know the OWL class :SimpleNamedGraphSourceGraph. In any endpoint, they can find the named graphs that are typed to this class.
  • Lightweight provenance: it is recorded at the holistic level of named graphs. When you care about the specific triples returned, you can still ask where THOSE SPECIFIC triples came from.
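Discovering the source graph by its class, rather than by a hard-coded name, could look like the query built below. This is a sketch under assumptions: the document does not give the namespace behind the ":" prefix, so the class URI is passed in and the example.org value is a placeholder. The function name discovery_query is likewise hypothetical.

```python
def discovery_query(class_uri):
    """Build a SPARQL query that finds source graphs by their class."""
    return f"""PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?sourceGraph ?data ?source
WHERE {{
  GRAPH ?sourceGraph {{
    ?sourceGraph a <{class_uri}> .
    ?data dct:source ?source .
  }}
}}"""


# Placeholder namespace; substitute the real URI of :SimpleNamedGraphSourceGraph.
query = discovery_query("http://example.org/ns#SimpleNamedGraphSourceGraph")
print(query)
```

The resulting string can be sent to any SPARQL endpoint; each result row pairs a data graph with one of its source organizations, which is enough for both presenting the sources and source-based retrieval.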

Disadvantage of this simplification:

  • All organizations identified share ALL of the responsibility for ALL of the data in the named graph. Only the more granular provenance knows which triple came from where.

Provenance Visualization

Some users may not trust the responses provided by the water portal unless they have the option to examine how those responses were obtained. The following provenance visualization example shows the process trace of the USGS data when the input zip code is in Bristol County, RI. Because the visualization of the full provenance trace turned out not to be readable for users, we are working on supporting both full and abstracted provenance traces.

Technical details

For more discussion about technical details, please refer to Semantic Water Quality Portal - technical details.

Data details

For more discussion about data details, please refer to Semantic Water Quality Portal - data details.

Publication

Presentation

Related Work

Source Code Repository

Source Code: http://code.google.com/p/swqp/
