
Use Cases

Rationale for new properties, as defined here: Vocabulary

Automatic Indexing

OAI-PMH

The following use case deals with the representation of OAI-PMH data in the DC-PROV model, in particular with the representation of the provenance-related information that may or may not be part of an OAI-PMH dataset.

The provenance features of OAI-PMH are briefly described here:

The following example illustrates an origin description in OAI-PMH. In bold, we already indicate the possible mapping to DC vocabulary.

A straightforward approach seems to be to implicitly create another description set for the original description. In this case, according to its definition, we can use dc:source ("A related resource from which the described resource is derived.").

The identifier, according to OAI-PMH, is an identifier for the record, not for the described resource! That means that we use it as the URI of the description set. The contents of the description sets are totally arbitrary, i.e. we are not concerned with their representation in RDF. Interestingly, with this approach the provenance chain stays intact as long as every party provides information in this way, i.e. there is a quite natural fit between the OAI-PMH model and the DC-PROV model.
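The chaining idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample XML and all identifiers are made up, the harvested record's own identifier is passed in by hand, and triples are represented as plain tuples rather than with an RDF library. The element names and namespace follow the OAI-PMH provenance container.

```python
import xml.etree.ElementTree as ET

PROV_NS = "{http://www.openarchives.org/OAI/2.0/provenance}"

# Hypothetical provenance container; all identifiers are invented for illustration.
SAMPLE = """
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
  <originDescription harvestDate="2011-05-27T10:00:00Z" altered="false">
    <baseURL>http://repo-b.example.org/oai</baseURL>
    <identifier>oai:repo-b.example.org:42</identifier>
    <datestamp>2011-05-20</datestamp>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    <originDescription harvestDate="2011-05-21T09:00:00Z" altered="false">
      <baseURL>http://repo-a.example.org/oai</baseURL>
      <identifier>oai:repo-a.example.org:7</identifier>
      <datestamp>2011-05-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </originDescription>
</provenance>
"""


def source_chain(record_id, provenance_xml):
    """Return (description set, 'dc:source', origin description set) tuples.

    Each record identifier is used as the URI of a description set, and each
    nested originDescription adds one more dc:source link to the chain.
    """
    root = ET.fromstring(provenance_xml)
    triples = []
    current = record_id
    origin = root.find(PROV_NS + "originDescription")
    while origin is not None:
        # find() only looks at direct children, so nested identifiers are not picked up here.
        origin_id = origin.find(PROV_NS + "identifier").text.strip()
        triples.append((current, "dc:source", origin_id))
        current = origin_id
        origin = origin.find(PROV_NS + "originDescription")
    return triples


# "oai:repo-c.example.org:99" stands for the record as exposed by the harvesting repository.
chain = source_chain("oai:repo-c.example.org:99", SAMPLE)
for triple in chain:
    print(triple)
```

Because every hop only needs the record identifier of the previous party, the chain can be extended by each harvester without knowing anything about the contents of the description sets.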

The following graph illustrates the implementation in DC-PROV:

Additional Notes:

  1. The dotted owl:sameAs relationship is something that should hold, but it is not part of our crosswalk. A crosswalk for the actual metadata encapsulated in OAI-PMH would probably contain such a reference to the original identifier, if it is not reused anyway.
  2. Regarding the example dataset, the information about baseUrl and metadataNamespace is not represented in DC-PROV. If it were needed, it would be assigned to the description set. In linked data settings this is not necessary, as the information can be derived from the contents of the description sets.

OAI-ORE

OAI-ORE aims to provide a standard, machine-readable way of describing the constituents or the boundary of an aggregation. Whereas OAI-PMH is metadata-centric, OAI-ORE is resource-centric!

Further information about OAI-ORE is available here.

from: http://www.openarchives.org/ore/1.0/primer#Nutshell

Use Case / Crosswalk from OAI-ORE to our Domain Model

The figure above shows an RDF Graph expressed by a Resource Map that includes metadata properties about Resource Map and Aggregation. Note that aspects of the graph already described are grayed-out to emphasize the concepts introduced by the figure.

from: http://www.openarchives.org/ore/1.0/datamodel#Metadata_about_the_ReM

The following graph illustrates the implementation in DC-PROV:

The original ReM-1 contains everything, i.e. every triple (except the triple ReM-1 - ore:describes - A-1). In our model we separated ReM-1 into two sets: an Annotation Set and a Description Set.

A-1 is an aggregation of 'something' with creator 'Y'. The whole aggregation is contained in the Description Set (a 'graph for itself'). A-1 is not a metadata resource!

ReM-1 (Resource Map) was created by 'X' and essentially is the Description Set in our Domain Model. The Resource Map can contain metadata about itself and about the aggregation it describes. The Annotation Set consists of parts of the Resource Map (4 triples in our example).

In our model the content of the Description Set does not make sense, it does not include any additional information. ore:describes does not make sense in our model either.
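One possible reading of this split can be sketched in Python. The triples below are placeholders loosely following the ReM-1 / A-1 figure (they are not the four annotation triples of the actual example), and the rule "statements whose subject is the Resource Map go to the Annotation Set" is an assumption made for illustration.

```python
# Hypothetical Resource Map content as (subject, predicate, object) tuples.
RESOURCE_MAP = [
    ("ReM-1", "ore:describes", "A-1"),
    ("ReM-1", "dcterms:creator", "X"),
    ("ReM-1", "dcterms:modified", "2008-10-03"),
    ("A-1", "dcterms:creator", "Y"),
    ("A-1", "ore:aggregates", "AR-1"),
    ("A-1", "ore:aggregates", "AR-2"),
]


def split_resource_map(triples, rem_uri):
    """Split a Resource Map into (annotation_set, description_set)."""
    annotation_set, description_set = [], []
    for subject, predicate, obj in triples:
        if subject == rem_uri and predicate == "ore:describes":
            # The ReM-to-Aggregation link is not carried over into either set.
            continue
        if subject == rem_uri:
            # Statements about the Resource Map itself: metadata about metadata.
            annotation_set.append((subject, predicate, obj))
        else:
            # Statements about the Aggregation and its constituents.
            description_set.append((subject, predicate, obj))
    return annotation_set, description_set


annotation_set, description_set = split_resource_map(RESOURCE_MAP, "ReM-1")
print("Annotation Set:", annotation_set)
print("Description Set:", description_set)
```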

Old Use Case

The following example illustrates provenance as it is used in OAI-ORE. As one can see, OAI-ORE already uses some DC vocabulary.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ore="http://www.openarchives.org/ore/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- About the Aggregation for the ArXiv document -->
  <rdf:Description rdf:about="http://arxiv.org/aggregation/astro-ph/0601007">
    <!-- The Resource is an ORE Aggregation -->
    <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
    <!-- The Aggregation aggregates ... -->
    <ore:aggregates rdf:resource="http://arxiv.org/abs/astro-ph/0601007"/>
    <ore:aggregates rdf:resource="http://arxiv.org/ps/astro-ph/0601007"/>
    <ore:aggregates rdf:resource="http://arxiv.org/pdf/astro-ph/0601007"/>
    <!-- Metadata about the Aggregation: title and authors -->
    <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
    <dcterms:creator rdf:parseType="Resource">
      <foaf:name>Hui Li</foaf:name>
      <foaf:mbox rdf:resource="mailto:lihui@somewhere.cn"/>
    </dcterms:creator>
    <dcterms:creator rdf:parseType="Resource">
      <foaf:name>Zong-Kuan Guo</foaf:name>
    </dcterms:creator>
    <dcterms:creator rdf:parseType="Resource">
      <foaf:name>Yuan-Zhong Zhang</foaf:name>
    </dcterms:creator>
  </rdf:Description>
</rdf:RDF>

Pubby Example

The modelling below uses TriG syntax and follows the discussion of the last teleconference:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/>.
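@prefix dcprov: <http://namespaceNotYetKnown/> .
@prefix opmo: <http://openprovenance.org/model/opmo#> .
@prefix prv: <http://purl.org/net/provenance/ns#> .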
@prefix example: <http://example.org/data/> .
#default graph
	{ 
		#this naming would allow to automatically refer to the description set from the resource
		<http://example.org/data/guideIdentifier/prov> a dcprov:DescriptionSet .  
		<http://example.org/data/AnnotationSet/annSet1> a dcprov:AnnotationSet .
		<http://example.org/data/AnnotationSet/annSet2> a dcprov:AnnotationSet .
		<http://example.org/data/AnnotationSet/annSet1> dcprov:describes <http://example.org/data/guideIdentifier/prov> .
		<http://example.org/data/AnnotationSet/annSet2> dcprov:describes <http://example.org/data/AnnotationSet/annSet1> .
	}
#DescriptionSet1
<http://example.org/data/guideIdentifier/prov> 
	{ 
		example:guideIdentifier dc:date "2011-05-27"^^xsd:date.
		example:guideIdentifier dc:creator example:Paco_Nadal.
		example:WasGeneratedBy1123344 opmo:cause example:guideIdentifier .	  	  
	}
#AnnotationSet1
<http://example.org/data/AnnotationSet/annSet1> 
	{ 
		<http://example.org/data/guideIdentifier/prov> dc:date "2011-05-28"^^xsd:date.
		<http://example.org/data/guideIdentifier/prov> dc:creator example:Prisa_Digital.
		<http://example.org/data/guideIdentifier/prov> dc:publisher example:UPM.
	}
#AnnotationSet2
<http://example.org/data/AnnotationSet/annSet2> 
	{ 
		<http://example.org/data/AnnotationSet/annSet1> prv:dataCreation example:DataCreation1.
	}
  1. The graph URI is made from the resource's URI, not via a direct assertion
  2. The necessity of an additional graph per resource → scalability problems?

Once the problem is solved, another interesting question is how to access the metadata provenance of a resource, given its URI. Should I see the graph relationship when accessing example:guideIdentifier? How do I know that a resource has provenance without asking for it explicitly? (How do I know that a triple belongs to a graph?)
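As a rough sketch of one possible answer, the naming convention from the TriG example (the description set graph is the resource URI plus "/prov") can be combined with the dcprov:describes statements of the default graph. The function names are hypothetical, the dcprov namespace is still undecided, and the code assumes that each graph is described by at most one annotation set.

```python
BASE = "http://example.org/data/"

# dcprov:describes statements from the default graph of the TriG example.
DEFAULT_GRAPH = [
    (BASE + "AnnotationSet/annSet1", "dcprov:describes", BASE + "guideIdentifier/prov"),
    (BASE + "AnnotationSet/annSet2", "dcprov:describes", BASE + "AnnotationSet/annSet1"),
]


def description_set_uri(resource_uri):
    """Assumed naming convention: the description set graph is <resource>/prov."""
    return resource_uri.rstrip("/") + "/prov"


def provenance_chain(resource_uri, default_graph):
    """Return the description set of a resource followed by its annotation sets."""
    # Map each described graph to the set describing it (one describer per graph assumed).
    described_by = {o: s for s, p, o in default_graph if p == "dcprov:describes"}
    chain = [description_set_uri(resource_uri)]
    # Follow dcprov:describes upwards; stop at the top level or if a cycle shows up.
    while chain[-1] in described_by and described_by[chain[-1]] not in chain:
        chain.append(described_by[chain[-1]])
    return chain


chain = provenance_chain(BASE + "guideIdentifier", DEFAULT_GRAPH)
for graph_uri in chain:
    print(graph_uri)
```

With such a convention a client would not need an explicit link from the resource to its provenance, at the cost of relying on URI construction rather than on an asserted triple.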

OPMV

Use case inspired by the old use case, but not based on real data.

We start from Named Graph 1, which contains all the provenance statements about the RDF graph of school 1. We assume that Agent 123 was the one who created the metadata, while Agent 124 was the one who validated the process. The process started at one date and ended at another date, and as a result we obtain NamedGraph1 (generated at one specific hour).

Now, after some days, Agent 123 adds/modifies the previous set of statements in NamedGraph 1, resulting in another artifact (NamedGraph 2), derived from the previous one. Using OPMV, the representation of the example is as in this figure:

Original nodes (from the OPM specification):

Original edges:

Extended nodes (to model the example):

Extended edges (to model the example):

Using our domain model, the representation is as in the next figure. It is similar to OPMV, but much simpler:

Advantages/disadvantages of each model:

How to Translate the source model into our model

Old use case (expanding an example from the OPMV guide; too simple)

There are some real-world use cases in the OPMV Guide (although it is still somewhat of a draft). The available use cases are:

We are going to focus on the first example, expanding it to treat the metadata provenance from 2 perspectives: OPMV and our domain model.

Edubase (a register of all educational establishments in England and Wales) starts publishing its data straight from its database, one page per school. The RDF for a school is generated on demand from the database by some code that formats the result of a query on the database as RDF/XML. The generated provenance graph is described below (extracted from the OPMV Guide):

  eg:school1
  rdf:type <http://www.w3.org/2004/03/trix/rdfg-1/Graph> ;
  rdf:type opmv:Artifact, prv:DataItem ;
  opmv:wasDerivedFrom _:queryResult ;
  opmv:wasGeneratedBy [
      rdf:type opmv:Process ;         
      opmv:used _:queryResult ;
      opmv:wasPerformedBy _:netcode ;    ### sub-property of opmv:wasControlledBy
      opmv:wasControlledBy <http://www.jenitennison.com/#me>       
  ]
  .
  
  _:queryResult rdf:type opmv:Artifact ;  
      opmv:wasGeneratedBy [
          rdf:type opmv:Process ;         
          opmv:used <http://example.edu/edubase> ;
          opmv:used _:query ;
      ] 
  .
  
  _:netcode rdf:type opmv:Agent ;   
      rdfs:label ".NET code that formats the result of a SQL query on the database as RDF/XML" ;
  .
  
  <http://example.edu/edubase> rdf:type opmv:Artifact, opmvTypes:SQLDatabase ;
     rdfs:label "Edubase: the database about schools and education." .
  
  _:query rdf:type opmv:Artifact, prvTypes:SQLQuery ;
     rdfs:comment "select * from schools where ***" .

In RDF we can see how the provenance from the RDF graph of the school (eg:school1) has been modeled. It has been derived from a query result, and generated by a process controlled by <http://www.jenitennison.com/#me>.
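The same reading can be reproduced mechanically. The following Python sketch transcribes the relevant statements of the listing as plain tuples (the blank-node labels _:process1 and _:process2 are made up to name the two anonymous processes) and walks them to find the responsible agents and the upstream artifacts. The helper names are hypothetical and no RDF library is involved.

```python
# Statements from the Edubase listing as (subject, predicate, object) tuples.
TRIPLES = [
    ("eg:school1", "opmv:wasDerivedFrom", "_:queryResult"),
    ("eg:school1", "opmv:wasGeneratedBy", "_:process1"),
    ("_:process1", "opmv:used", "_:queryResult"),
    ("_:process1", "opmv:wasPerformedBy", "_:netcode"),
    ("_:process1", "opmv:wasControlledBy", "<http://www.jenitennison.com/#me>"),
    ("_:queryResult", "opmv:wasGeneratedBy", "_:process2"),
    ("_:process2", "opmv:used", "<http://example.edu/edubase>"),
    ("_:process2", "opmv:used", "_:query"),
]


def objects(subject, predicate):
    """All objects of the statements matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def responsible_agents(artifact):
    """Agents controlling (or performing) the processes that generated the artifact."""
    agents = set()
    for process in objects(artifact, "opmv:wasGeneratedBy"):
        agents.update(objects(process, "opmv:wasControlledBy"))
        # opmv:wasPerformedBy is treated as a sub-property of opmv:wasControlledBy.
        agents.update(objects(process, "opmv:wasPerformedBy"))
    return agents


def upstream_artifacts(artifact):
    """Everything the artifact was derived from, directly or indirectly."""
    found, stack = set(), [artifact]
    while stack:
        current = stack.pop()
        parents = objects(current, "opmv:wasDerivedFrom")
        for process in objects(current, "opmv:wasGeneratedBy"):
            parents.extend(objects(process, "opmv:used"))
        for parent in parents:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(responsible_agents("eg:school1")))
print(sorted(upstream_artifacts("eg:school1")))
```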

Expanding the use case example

Now let's assume we want to assert who the creator of the provenance information is (this can be taken as a reference for other properties too).

In OPMV, it would be modeled as in the figure below:

The example detailed in the previous code corresponds to named graph 1. Artifacts are represented as ovals, processes as boxes and agents as hexagons.

The meta-level is represented in named graph 2. We had to extend some of OPMV's properties (in this example, we extended opmv:wasGeneratedBy by creating the subproperty opmv:metadataGeneratedBy). We did not reuse opmv:wasGeneratedBy, to avoid confusion with the normal provenance. The named graph is necessary for asserting further levels of metadata provenance. (The named graphs are not declared in the RDF code; however, according to the OPMV specification, this is how it would be done. In OPMO it would be modeled slightly differently, using opmo:account.)

With our domain model, it would be modeled as in the figure below:

In this case, instead of named graphs we would use the description set to group the previous provenance statements, and the annotation set to describe the meta-level. Each of the statements in the annotation set would also be dcprov:annotations.