This page contains the description and documentation of the domain model that is proposed by the DC-Provenance task group. It is not yet finalized and currently regarded as work in progress. See the revision history of this page for earlier drafts. The group members agreed on the basic model that is presented here on January 26, 2011.
The main part of the current model is the definition of the classes and their relations, as illustrated in the following UML class diagram:
The proposed model extends the Dublin Core Abstract Model. Particularly, we use the following classes:
- Description Set (from DCAM terminology): A set of one or more → Descriptions, each of which describes a single resource.
- Description (from DCAM terminology): One or more → Statements about one, and only one, resource.
- Statement (from DCAM terminology): An instantiation of a property-value pair made up of a property URI (a URI that identifies a property) and a value surrogate.
- Annotation: One or more → Statements about one → Description Set. Subclass of → Description.
- Annotation Set: A set of one or more → Annotations. Subclass of → Description Set.
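The class relations listed above can be sketched in code. The following minimal Python sketch is an illustration only and not part of the DC-PROV model itself; the attribute names (`property_uri`, `value_surrogate`, `resource`, `identifier`) are assumptions chosen to mirror the definitions, and the sample data borrows the Mona Lisa example used further down this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """A property-value pair: a property URI plus a value surrogate."""
    property_uri: str
    value_surrogate: str


@dataclass
class Description:
    """One or more Statements about one, and only one, resource."""
    resource: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    """A set of one or more Descriptions."""
    identifier: str
    descriptions: List[Description] = field(default_factory=list)


@dataclass
class Annotation(Description):
    """A Description whose described resource is a Description Set."""


@dataclass
class AnnotationSet(DescriptionSet):
    """A Description Set whose members are Annotations."""


# Usage: a piece of metadata and one provenance annotation about it.
ml_desc = DescriptionSet(
    identifier="http://example.org/data/ML-Desc",
    descriptions=[Description(
        resource=":MonaLisa",
        statements=[Statement("dct:creator", ":LeonardoDaVinci")],
    )],
)
ml_anno = AnnotationSet(
    identifier="http://example.org/data/ML-Anno",
    descriptions=[Annotation(
        resource=ml_desc.identifier,  # the Annotation is about the Description Set
        statements=[Statement("dct:creator", ":BnF")],
    )],
)
```

Because `Annotation` and `AnnotationSet` are plain subclasses, any code written against `Description` or `DescriptionSet` accepts them unchanged.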
The DC-PROV domain model is going to form the basis of an application profile for metadata provenance. The main purpose of the UML diagram is to illustrate
- how the new Annotation entity (i.e., the entity comprising metadata provenance information) relates to the existing entities of the Dublin Core Abstract Model (DCAM), and
- how an Annotation is associated with the metadata it provides information about.
At this point, the domain model does not attempt to describe the makeup of an Annotation Set in the specific context of metadata provenance, i.e., it does not yet provide an element vocabulary needed to put together a concrete metadata provenance Annotation Set. At the moment, it only provides the generic scaffolding to accommodate such an element vocabulary.
What is a metadata provenance annotation?
As stated in the UML diagram, Annotations and Annotation Sets are primarily specializations of their DCAM counterparts, i.e., subclasses in the RDF model. Just as a Description Set is an aggregation of Descriptions (each a set of statements about a single resource), an Annotation Set is an aggregation of Annotations; we only assume a different cardinality for this relationship, the motivation for which is explained below.
This means that every Annotation Set is also a Description Set in the sense of the DCAM, and can be treated as such. So why not just stick with the DCAM entities?
Subclasses were derived in the first place because the main purpose of an Annotation is to provide information about a Description Set, i.e., about metadata, which is a specific kind of resource.
Also, the Annotations created in the context of the DC-Provenance Application Profile try to provide not just any kind of information about metadata, but a specific kind: information about the provenance of the described metadata.
What does a metadata provenance annotation describe?
Annotations are associated only with Description Sets, each of which contains one or more Descriptions. The relationship between Annotations and Description Sets (the “role” of Annotations in UML terms) is generically stated as being descriptive. The concrete mechanism or relationship employed here will depend on the metadata or resource description model used in a specific metadata application or use case (e.g., RDF).
The “describes” relationship in the diagram must not be confused with properties in RDF. In an RDF implementation, the “describes” relationship would just be the fact that the Description Set is used as a subject in the triples that form the Annotations.
The cardinality of 1 on the side of the Description Set indicates that an Annotation must be related to exactly one Description Set. The same Annotation cannot be associated with more than one Description Set, for two reasons: first, to comply with the DCAM definition of Description (“Statements about one, and only one, resource”), from which Annotation is derived; and second, to simplify expressions of the domain model in metadata frameworks like RDF, where one Annotation about two different Description Sets would result in two completely different triples.
Annotations are aggregated in Annotation Sets, just as Descriptions are generally aggregated in Description Sets. The main difference between these lies, once more, in cardinality. Whereas the association of a Description with a Description Set is optional, this does not hold for the association between an Annotation and an Annotation Set. An Annotation has to be part of at least one Annotation Set; conversely, every Annotation Set aggregates at least one Annotation.
The rationale for this cardinality constraint is mainly to enable basic discoverability of Annotations. Since (1) a variety of relationships are used for annotating Description Sets, and (2) not all entities associated in that manner with a Description Set are necessarily related to metadata provenance, the Annotation Set provides a general means of retrieving metadata provenance information.
In addition, this constraint ensures that Annotations can be further annotated by associating higher-level Annotations with a lower-level Annotation Set.
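The cardinality constraints described above can be checked mechanically. The sketch below is illustrative only: representing the model as plain dictionaries of identifiers, the function name `check_cardinalities`, and the identifiers `a1`, `a2` and `Other-Desc` are all assumptions made for this example.

```python
from typing import Dict, List, Set


def check_cardinalities(
    describes: Dict[str, Set[str]],
    annotation_sets: Dict[str, Set[str]],
) -> List[str]:
    """Return a list of constraint violations (empty if the data is valid).

    describes:       annotation id -> ids of the Description Sets it describes
    annotation_sets: annotation set id -> ids of the Annotations it aggregates
    """
    problems = []

    # An Annotation describes exactly one Description Set.
    for anno, targets in sorted(describes.items()):
        if len(targets) != 1:
            problems.append(f"{anno} describes {len(targets)} Description Sets")

    # Every Annotation Set aggregates at least one Annotation.
    for aset, members in sorted(annotation_sets.items()):
        if not members:
            problems.append(f"{aset} is empty")

    # Every Annotation is part of at least one Annotation Set.
    aggregated = set().union(*annotation_sets.values())
    for anno in sorted(describes):
        if anno not in aggregated:
            problems.append(f"{anno} is not in any Annotation Set")

    return problems


# Usage: one valid configuration and one that breaks two constraints.
valid = check_cardinalities(
    {"a1": {"ML-Desc"}},
    {"ML-Anno": {"a1"}},
)
invalid = check_cardinalities(
    {"a1": {"ML-Desc", "Other-Desc"}, "a2": {"ML-Desc"}},
    {"ML-Anno": {"a1"}},
)
```

Here `valid` is an empty list, while `invalid` reports that `a1` describes two Description Sets and that `a2` is not part of any Annotation Set.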
The first row reiterates the entities of the DCAM. The second row of the domain model contains the new classes required by the metadata provenance application profile, which are themselves specializations of their corresponding DCAM counterparts.
What is the third level?
Because an Annotation Set is a Description Set, an Annotation Set can itself be annotated by means of a further Annotation Set, i.e. we can capture provenance information for Annotation Sets as well. The model is able to handle an arbitrary number of meta-levels.
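The stacking of meta-levels can be sketched as a chain of “describes” links. This is a simplified illustration: it assumes that each Annotation Set describes a single set, and the function name `meta_level` and the identifier `ML-Anno-Prov` are hypothetical.

```python
from typing import Dict


def meta_level(set_id: str, describes: Dict[str, str]) -> int:
    """Return how many "describes" hops separate set_id from plain metadata.

    describes maps an Annotation Set id to the id of the set it describes.
    A set that is not a key of the mapping is treated as an ordinary
    Description Set (level 0).
    """
    level = 0
    seen = {set_id}
    while set_id in describes:
        set_id = describes[set_id]
        if set_id in seen:
            raise ValueError("cycle in 'describes' chain")
        seen.add(set_id)
        level += 1
    return level


describes = {
    "ML-Anno": "ML-Desc",       # provenance of the metadata
    "ML-Anno-Prov": "ML-Anno",  # provenance of the provenance (hypothetical)
}

levels = {name: meta_level(name, describes)
          for name in ("ML-Desc", "ML-Anno", "ML-Anno-Prov")}
```

Nothing in the function limits the length of the chain, which mirrors the model's ability to handle an arbitrary number of meta-levels.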
Vocabulary for the Annotations
A distinction has to be made between the vocabulary introduced by the Domain Model (dcprov: namespace) and the vocabulary used to create the actual annotations. For the latter, the common Dublin Core vocabulary (dcterms) is used to state provenance information such as creator, creation date, sources, and contributors. For more information, see the Use Cases.
The model does allow arbitrary vocabularies to be used in annotations, and mixing Dublin Core with other vocabularies is perfectly acceptable, as is common practice in application profiles.
To be useful, the abstract domain model has to be expressed in terms of the elements offered by a specific data model. The following illustrates one way to annotate RDF (meta)data with provenance annotations.
# Named graph: http://example.org/data/ML-Desc
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .

:MonaLisa dct:format dctype:StillImage ;
    dct:creator :LeonardoDaVinci .

# Named graph: http://example.org/data/ML-Anno
@prefix dct: <http://purl.org/dc/terms/> .

<http://example.org/data/ML-Desc> rdf:type dcam:DescriptionSet ;
    dct:creator :BnF .

<http://example.org/data/ML-Anno> rdf:type dcprov:AnnotationSet .
These triples describe two separate RDF graphs.
The following table shows how some of the RDF resources map to their corresponding UML classes of the domain model.
|RDF statement|UML class|
|:MonaLisa dct:creator :LeonardoDaVinci .|Description|
|<ML-Desc> dct:creator :BnF .|Annotation|
Our example consists of two statements about the resource :MonaLisa, one about authorship of the resource, the other about its format. The graph <ML-Desc> containing these statements forms a Description Set. Annotations about this metadata are contained in a second graph, <ML-Anno>, forming an Annotation Set.
Statements that are part of this graph are considered annotations, i.e., statements about the provenance of the metadata of the original resource :MonaLisa, not the resource itself. The statement <ML-Desc> dct:creator :BnF . would mean that the Bibliothèque nationale de France created the description of the :MonaLisa (i.e., the metadata) contained in the graph <ML-Desc> as opposed to the creation of the :MonaLisa itself.
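To make the separation between the two graphs concrete, the example can be mimicked with plain tuples instead of an RDF library. This is only a sketch: the quad layout `(graph, subject, predicate, object)` and the helper `provenance_of` are assumptions for illustration, and prefixed names are kept as plain strings rather than expanded to full URIs.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)

ML_DESC = "http://example.org/data/ML-Desc"
ML_ANNO = "http://example.org/data/ML-Anno"

QUADS: List[Quad] = [
    # Description Set: statements about the resource itself.
    (ML_DESC, ":MonaLisa", "dct:format", "dctype:StillImage"),
    (ML_DESC, ":MonaLisa", "dct:creator", ":LeonardoDaVinci"),
    # Annotation Set: statements about the Description Set.
    (ML_ANNO, ML_DESC, "rdf:type", "dcam:DescriptionSet"),
    (ML_ANNO, ML_DESC, "dct:creator", ":BnF"),
    (ML_ANNO, ML_ANNO, "rdf:type", "dcprov:AnnotationSet"),
]


def provenance_of(
    described_graph: str,
    annotation_graph: str,
    quads: List[Quad],
) -> List[Tuple[str, str]]:
    """Return the (predicate, object) pairs that the annotation graph
    states about the described graph, i.e. triples whose subject is the
    described graph's name."""
    return [
        (p, o)
        for g, s, p, o in quads
        if g == annotation_graph and s == described_graph
    ]


annotations = provenance_of(ML_DESC, ML_ANNO, QUADS)
creators_of_metadata = [o for p, o in annotations if p == "dct:creator"]
creators_of_resource = [
    o for g, s, p, o in QUADS
    if g == ML_DESC and s == ":MonaLisa" and p == "dct:creator"
]
```

The same property, `dct:creator`, yields `:BnF` for the metadata and `:LeonardoDaVinci` for the resource; only the subject and the containing graph distinguish the two.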
Issues and further Ideas
- Is a superclass of Description Set necessary? There are domain/range problems in OWL; could these be circumvented by property chain inclusion?