IW Meeting 2012-11-01



Meeting Information



  • Tim
  • Jim
  • Deborah
  • James
  • Paulo
  • Patrice

Regrets: Cynthia (hope she feels better soon)

Meeting Preparation

Around the room

* Add a section for yourself 2 hours before meeting.
* Mark any discussion point that you would like to raise during meeting (with DURING MEETING). 
* Otherwise, assume that others will read the rest before meeting. 
* Also, please be considerate and read others' discussion points before the meeting starts.


Last week:


  • Need to knock out some PROV-O issues before the F2F4 in Boston
  • Need to develop the PROV Tutorial at ISWC - PRIORITY
  • First steps on WebSci paper.


Last week:

  • FUSE documentation
  • Working on aggregation/provenance/identity paper

This week:

  • Working on aggregation/provenance/identity paper
  • Working on thesis proposal
  • SATBI/SWIM talk (priority, talk is on 11/12)
  • Community Science proposal
  • Contribution to FUSE future directions and/or towards the explaining-emergence paper


Last week:

  • Finished FUSE drilldown module for phase 1 submission. (what can we learn and generalize from this work?)
    • We should list potential shortcomings that analysts may have run into when using the first version of the explanation module, as well as cases where we had to take one-off strategies (e.g., ad-hoc SPARQL generation for generating graphs).
    • One issue with graphs we had: they were a one-size-fits-all solution. Limited ability to get back data drilldown variations for individual indicators.
    • Testing + debugging on individual indicators with RPI + BAE.


  • Resuming FUSE datacube encoding design, based on BAE's latest feature+indicator graphs (awaiting response from BAE on reactivating needed endpoint).
  • Prep work for FUSE Site Visit + ISWC.
    • Assessment of explanation limitations on current drilldown module, and how a generalized approach could follow using datacube encodings.
  • Contribution to FUSE future directions and/or towards the explaining-emergence paper


TODO: get better soon.


  • PNNL broadening participation on PML3 and VisKo development
    • Development
      • PML3 out of VisKo: Eric Stephan's participation
      • Tim: "can't" == I think relates to suiting PROV-O's paradigm?
    • Use Cases
      • computational material modeling (PNNL, LANL, NERSC/LBNL)
        • Erin Barker's presentation?


Provenance use case from semanteco:

  • "AVI" data source aggregates from another source, how to model that?

Tim: having the actual example written up would be helpful.

Tim: Any chance I can get you to write down the source-provides-an-aggregate use case? Tim: I think it's interesting and very doable, I just need the details.


@prefix prov: <http://www.w3.org/ns/prov#> .

[] prov:value "i can type - this is the first time in possibly 6 months!" ;
   prov:wasAttributedTo <http://tw.rpi.edu/instances/Deborah_L_McGuinness> .

  • FUSE swamped us last week. Now we need to draft slides to show at BAE. These will include an explanation scenario (mostly viewgraphs from Cynthia and James) as well as research contributions and research plans, including planned papers.
  • Community science proposal: discussions of next steps for a provenance environment of the future
  • Would like to enable more progress on PML 3
  • Next week: Deborah will be in Alaska at an SSIII meeting
  • The following week is during ISWC

Outstanding Items

James: incentives for experts to help non-experts.

Tim: expertise currency, Expertise Points?

  • I earn 50 EPs for 3 hours of walking a freshman through ls, cd, pwd unix commands.
  • Then can spend them for 1x10^-100000000000000000000 microsecond of Paulo's time.
  • Function of expertise ratings? Grad student to grad student or freshman to freshman is 1:1.
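One way to read "function of expertise ratings" is as an exchange rate between two people's time. The sketch below is only an illustration of that reading: the numeric ratings and the ratio formula are assumptions made here, not something agreed in the meeting.

```python
def exchange_rate(provider_rating: float, seeker_rating: float) -> float:
    """Hours of the seeker's own help needed to buy one hour of the provider's.

    Both ratings are hypothetical positive scores (higher = more expert).
    Equal ratings give 1:1, matching the grad-to-grad and
    freshman-to-freshman case in the notes.
    """
    if provider_rating <= 0 or seeker_rating <= 0:
        raise ValueError("ratings must be positive")
    return provider_rating / seeker_rating


# Peers trade time one for one.
print(exchange_rate(2.0, 2.0))
# A seeker rated 2 pays five of their own hours for one hour from a provider rated 10.
print(exchange_rate(10.0, 2.0))
```

A plain ratio is the simplest function with the 1:1 property; any monotone function of the two ratings that equals 1 when they match would fit the notes equally well.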

If we have the provenance for how I made the RDF data from EPA's HTML table, I can get credit for all future results that prov:used my product. If we have the provenance for who is using csv2rdf4lod to convert data, I could get some EPs just for breathing (I "enabled" them to make their product). OpenLink would get mad EPs from TW's use of Virtuoso on every project that we do.
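A rough sketch of how credit could flow back along prov:used links follows. Everything specific in it is an assumption for illustration: the `used` and `attributed_to` dictionaries stand in for a real provenance store, activities are collapsed so that products point directly at what they used, the agent names are made up, and the halving of EPs per hop is an arbitrary choice.

```python
from collections import defaultdict, deque


def award_credit(new_product, used, attributed_to, base_eps=8.0, decay=0.5):
    """Return {agent: EPs} earned by upstream contributors of new_product.

    used:          product -> list of products it prov:used (hypothetical data)
    attributed_to: product -> agent it prov:wasAttributedTo (hypothetical data)
    Direct inputs earn base_eps; each further hop upstream is scaled by decay.
    """
    credit = defaultdict(float)
    seen = {new_product}
    queue = deque([(new_product, 0)])
    while queue:
        product, depth = queue.popleft()
        if depth > 0:  # the author of the new product itself earns nothing here
            agent = attributed_to.get(product)
            if agent is not None:
                credit[agent] += base_eps * decay ** (depth - 1)
        for upstream in used.get(product, []):
            if upstream not in seen:  # credit each upstream product once
                seen.add(upstream)
                queue.append((upstream, depth + 1))
    return dict(credit)


used = {
    "water_report": ["epa_rdf"],
    "epa_rdf": ["epa_html_table", "csv2rdf4lod"],
}
attributed_to = {
    "water_report": "analyst",
    "epa_rdf": "rdf_author",
    "epa_html_table": "EPA",
    "csv2rdf4lod": "tool_author",
}
print(award_credit("water_report", used, attributed_to))
```

In this toy graph the RDF author is credited for the report that used the RDF, and both EPA and the tool author are credited one hop further back, which is the "EPs just for breathing" effect.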

Pie Trust (http://www.pietrust.com/) offers a Whuffie implementation. From their site: "Our system allows your users to evaluate each other, and produces a reputation score for each user via our simple API."

More about Whuffie: http://en.wikipedia.org/wiki/Whuffie

TODO: Jim to write up a couple sentences about how this can be used in community science. Tracking Whuffie with provenance?

Questions for us: what kind of infrastructure and provenance or other declarative modeling do we need to facilitate this? How do we make the communities transparent, nimble, and agile enough to evolve? And how do we evaluate?

Getting EPs for pointing help seekers to the eventual help providers.

Currency system based on EPs: if one user helps another with a particular task, the helper could earn a certain number of EPs, which could then be spent on promoting one of their own requests above requests made by others. Sounds interesting.

In this situation, assume that a particular circle of people is looking at questions being asked by a community. Here, we may have an ordering of questions asked by the community, which could be promoted based on EPs spent. Similar idea to Google (users buying ad space or page rankings with real $).

Could help in cases where 1000s of requests are being made to a small group.
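The promoted ordering of questions could be sketched as a priority queue keyed on EPs spent. The class name, the starting balances, and the rule that ties go to the earlier request are all assumptions for illustration.

```python
import heapq
import itertools


class RequestQueue:
    """Help requests ordered by EPs spent, then by arrival order."""

    def __init__(self, balances):
        self.balances = dict(balances)     # user -> EPs available (hypothetical)
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: earlier requests first

    def ask(self, user, question, eps_bid=0):
        """Queue a question, spending eps_bid EPs to promote it."""
        if eps_bid < 0:
            raise ValueError("bid must be non-negative")
        if eps_bid > self.balances.get(user, 0):
            raise ValueError("insufficient EPs")
        self.balances[user] = self.balances.get(user, 0) - eps_bid
        # heapq is a min-heap, so negate the bid to pop the highest bid first.
        heapq.heappush(self._heap, (-eps_bid, next(self._counter), user, question))

    def next_request(self):
        """Return (user, question, eps_bid) for the top-ranked request."""
        neg_bid, _, user, question = heapq.heappop(self._heap)
        return user, question, -neg_bid


queue = RequestQueue({"ann": 10, "bob": 0})
queue.ask("bob", "How do I use ls?")
queue.ask("ann", "Why does my SPARQL query time out?", eps_bid=5)
print(queue.next_request())  # ann's promoted question comes first
print(queue.next_request())  # then bob's unpromoted one
```

Unpromoted requests still get answered in arrival order, so spending EPs only changes position in the queue rather than gating access.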

Wasn't there a talk at IPAW that tracked cell changes in Excel?

OntoWiki is an RDF graph editor that acts like a wiki.

Example: Jim claims he generated 90% and James claims he generated 40%; then let people refute others' claims (rebuttal).

Provide the percentage factor along with the author list: an N x N matrix of how much everybody thinks everybody else contributed, which converges to a 1 x N vector.
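As a minimal sketch of collapsing the N x N matrix into a 1 x N vector, the code below normalizes each rater's row and then averages each column. Plain averaging is an assumption made here; the notes do not say how the convergence (or the rebuttal round) would actually work. The example rows extend the 90% and 40% self-claims with made-up complements.

```python
def consensus_shares(ratings):
    """Collapse an N x N ratings matrix into N contribution shares.

    ratings[i][j] is how much rater i thinks person j contributed, in any
    non-negative unit. Each row is normalized to sum to 1, then each
    column is averaged across raters.
    """
    n = len(ratings)
    if any(len(row) != n for row in ratings):
        raise ValueError("ratings must be a square matrix")
    normalized = []
    for row in ratings:
        total = sum(row)
        if total <= 0:
            raise ValueError("each rater must assign some contribution")
        normalized.append([value / total for value in row])
    return [sum(normalized[i][j] for i in range(n)) / n for j in range(n)]


# Row 0: Jim's view (Jim 90, James 10). Row 1: James's view (Jim 60, James 40).
shares = consensus_shares([[90, 10], [60, 40]])
print(shares)  # one share per author; the shares sum to 1
```

Normalizing rows first keeps a rater who hands out large numbers from dominating the result.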

What are the major challenges for provenance for community science, and if we had a breakthrough there, what would it enable?

Jim McC, challenge: statistics is hard, as is formulating hypotheses. We need tools that help people formulate, pose, and then test hypotheses, and do so in a statistically correct way.

If we can define aggregate sets without having to directly encode them in languages like OWL, that could help.

From our Big Data talk slides a couple of weeks ago:

“Big picture” provenance

  • How to accumulate the right provenance traces (for a particular situation)? what is "right" -- that is the challenge :-)
    • Although provenance traces could be isolated linear paths, traces are more interesting when traces relate the same entities (e.g. the same file is reused, or the same result is generated by two activities).
    • but once traces overlap, then you end up with a multitude of paths across the traces.
  • How to present abstractions of traces that have many detailed steps (for a particular situation)?
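The point about overlapping traces producing a multitude of paths can be made concrete with a small count. This sketch assumes a trace is just a set of directed edges forming an acyclic graph; the node names are placeholders.

```python
from functools import lru_cache


def count_paths(edges):
    """Count distinct source-to-sink paths in an acyclic provenance graph.

    edges is an iterable of (from_node, to_node) pairs. Sources are nodes
    with no incoming edge; sinks are nodes with no outgoing edge.
    """
    successors = {}
    nodes = set()
    has_incoming = set()
    for src, dst in set(edges):  # ignore duplicate edges
        successors.setdefault(src, []).append(dst)
        nodes.update((src, dst))
        has_incoming.add(dst)

    @lru_cache(maxsize=None)
    def paths_from(node):
        nexts = successors.get(node, [])
        if not nexts:
            return 1  # a sink ends exactly one path
        return sum(paths_from(nxt) for nxt in nexts)

    return sum(paths_from(source) for source in nodes - has_incoming)


# Two isolated linear traces: a -> b -> c and d -> e -> f.
isolated = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f")]
# Two traces of the same length that share the entity b.
overlapping = [("a", "b"), ("b", "c"), ("d", "b"), ("b", "e")]
print(count_paths(isolated), count_paths(overlapping))
```

With the same number of edges, sharing a single entity doubles the number of end-to-end paths, and each additional shared entity multiplies it again.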

“Insight” provenance - which of the many bits of information that we have is actually the stuff that would give someone insight?

  • How to partition a body of provenance into trace threads?
  • How to recognize interesting trace intersects?
  • How to enable others to record provenance without requiring tailored captures?


After enabling provenance, we’re continually faced with addressing “so what”? Although we’ve been able to show that we can use our own provenance in the cases that we’ve designed, how do we step back and generically approach bodies of provenance that we might be less familiar with, or for which we have different uses than originally intended?

“Big picture” provenance is about trying to step away from the particular use case that exposes a specific answer, so that we can get a more complete sense of what we have. This is very much an “overview” task.

Beyond looking at “what we have”, we can press into finding coherence within the provenance that one has collected. Can one “tell a story” by finding a trace among the hairball of provenance? After proposing those cohesive threads, how do they overlap, and which overlaps are interesting to which users?

Lastly, provenance is still a niche. How to enable others to do it more and to do it well?

Paulo: find incentives that work within the current culture.

Repeater and repeated both get EPs.

What kinds of rigor might help compensate for challenges around affordable reproducibility?

Scary: does rigor make up for lack of reproducibility? Paulo introduces the notion: is much of big science pseudoscience, since it is often not reproducible?

Questions (from Jim McC) about when rules apply and when they do not, in the mode of following rules instead of writing them. Interesting analogy to master's and PhD: mastering a craft vs. mastering it and then having the ability to change or set the direction of the craft.

This argues that we need a mentoring capability in place to continue some of the structure. From Deborah: are there other ways to work? Can we prop people up without having them turn into a master before they can make contributions? From Jim: apprentices and journeymen can do good work, if they get guidance from masters.

Date: 1 November 2012