LOD Mapping 201107

This page builds upon Converting the Open Data from the hbz to BIBO. We explain the mapping process from hbz MAB2 to our lobid.org data model, which uses BIBO, Dublin Core and other ontologies.

We have taken into account the considerations made in RDF representation of series and multi volumes.


See the current list of vocabularies.

There are two namespaces in addition to the ones mentioned in Converting the Open Data from the hbz to BIBO:

Furthermore, we now use the prefix dcterms instead of dc for the namespace <http://purl.org/dc/terms/>.

Mapping of fields

Note: for convenience, when we speak of a "field" we only give the field ID; for example, 037b_a stands for <rdfmab:field/037b_a>.

We have mapped the fields from the record-centric RDF/ISO2709 format to a resource-centric BIBO description as follows. Note that the original field names used below may contain wildcards as used in regular expressions: "." for any single character and the quantifier "?".
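To make the pattern notation concrete, here is a minimal Python sketch. It treats a field pattern from this page as a regular expression that must match the whole field ID. The field IDs in the example are hypothetical, and it is an assumption that lobid.org evaluates the patterns exactly this way.

```python
import re
from typing import List


def matching_fields(pattern: str, field_ids: List[str]) -> List[str]:
    """Return the field IDs that fully match a field pattern from this page.

    "." stands for any single character; "?" makes the preceding
    character optional, exactly as in a regular expression.
    """
    regex = re.compile(pattern)
    return [field_id for field_id in field_ids if regex.fullmatch(field_id)]


# Hypothetical field IDs, for illustration only.
fields = ["331_a", "331__a", "3311a", "333_a", "037b_a"]
print(matching_fields("331..?a", fields))  # ['331_a', '331__a', '3311a']
```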

The URI of the resource to be described is derived from the identifier of the record, found in 001_.?.a. We also decided to mint URIs for ZDB-IDs. These ZDB-IDs come mainly from 026_.?a (the value has to start with "ZDB"; the field also holds other values). Note that some ZDB-IDs for lobid.org were generated via post-processing and are not contained in the original open data dump. Read more at HTTP URIs with ZDB-IDs.
The title of the resource, combining main title and other title information. There is only one dcterms:title. The following fields are used in order (if available): 310..?a, 331..?a, 333_.?a, as well as the further fields given in the mapping. (See also isbd:P1004 and isbd:P1006 below.)
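One possible reading of "used in order" is a first-available fallback. Under that reading, the listed fields are tried in sequence and the first non-empty value is kept, which guarantees a single dcterms:title. The sketch assumes that the record has already been reduced to one value per field pattern. The helper name pick_title and the sample values are hypothetical, and the further fields from the mapping are left out.

```python
from typing import Dict, Optional

# Title field patterns in the order listed above (further fields omitted).
TITLE_FIELDS = ["310..?a", "331..?a", "333_.?a"]


def pick_title(values_by_pattern: Dict[str, str]) -> Optional[str]:
    """Return the value of the first available title field, or None."""
    for pattern in TITLE_FIELDS:
        value = values_by_pattern.get(pattern, "").strip()
        if value:
            return value
    return None


# Hypothetical record: 310 is missing, so the 331 value is used.
record = {"331..?a": "Main title", "333_.?a": "Another title"}
print(pick_title(record))  # Main title
```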
The language of the resource, found in 037b.a.
dcterms:issued The year the resource was issued, sanity checked using <rdfmab:field/425..a>.
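The page does not spell out what the sanity check does. As an illustration only, this sketch assumes that the check extracts a four-digit year from the raw 425 value and rejects implausible years. The bounds (1450 up to the current year) and the function name are assumptions.

```python
import re
from datetime import date
from typing import Optional


def issued_year(raw: str, earliest: int = 1450, latest: Optional[int] = None) -> Optional[str]:
    """Return a plausible four-digit year found in a raw 425 value, else None.

    The bounds are illustrative assumptions, not lobid.org's actual rules.
    """
    if latest is None:
        latest = date.today().year
    match = re.search(r"\b(\d{4})\b", raw)
    if match is None:
        return None
    year = int(match.group(1))
    return match.group(1) if earliest <= year <= latest else None


print(issued_year("[ca. 1998]"))  # 1998
print(issued_year("19XX"))        # None
```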
Subject-Links. These are derived from several fields:
  • 9.._.?9 fields contain identifiers from the subject authority file of the German National Library (DNB), which have been available as Linked Data since April 2010.
  • 700b.a contains DDC notations. To link to the Linked Data version of the classification, these notations are truncated to the first three levels. If the full classification were available, we would be very happy to link to deeper levels.
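Here is a minimal sketch of the DDC truncation. It assumes that "the first three levels" correspond to the first three digits of the notation (main class, division, section) and that the value in 700b.a starts with those digits. The function name is hypothetical, and building the actual link URI is left out.

```python
import re
from typing import Optional


def ddc_first_three_levels(notation: str) -> Optional[str]:
    """Keep only the first three digits of a DDC notation (main class,
    division, section), e.g. "943.0855" -> "943"; None if there are not
    three leading digits."""
    match = re.match(r"\s*(\d{3})", notation)
    return match.group(1) if match else None


print(ddc_first_three_levels("943.0855"))  # 943
print(ddc_first_three_levels("020"))       # 020
```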
The ISSN of the resource, found in 542..?a. The ISSN is deliberately provided as a string, not a URI, since it is the string itself that is the identifier, not some resource identified by <urn:ISSN:ISSN>. This conforms to the range defined in BIBO.
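The following sketch illustrates the difference between a literal object and a URI object. It serializes the ISSN as a plain string literal in N-Triples syntax, using the BIBO namespace http://purl.org/ontology/bibo/. The subject URI and the ISSN are example values, and the helper is hypothetical, not part of the lobid.org code.

```python
BIBO_ISSN = "http://purl.org/ontology/bibo/issn"


def literal_triple(subject: str, predicate: str, value: str) -> str:
    """Serialize one triple whose object is a plain string literal (N-Triples)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'<{subject}> <{predicate}> "{escaped}" .'


# The ISSN stays a literal; no <urn:ISSN:...> resource is minted.
print(literal_triple("http://example.org/resource/1", BIBO_ISSN, "1234-5679"))
```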
The extent of the resource, usually the number of pages, as found in 43[3457].?.?.?.
The type of a resource is derived from several fields, which can result in multiple types for the same resource. The mapping has become a whole lot more elaborate since the start of lobid.org and will be subject to further analysis. Have a look at the actual mapping fields.
The volume number of the resource: field 090_..? holds the sortable form; in addition, the descriptive form in field 089_?.?a is used.
Fortunately, the original data already includes many links from subordinate to superordinate records which can be used to link the corresponding resources:
  • 010_?1?a? contains the record ID of a direct superordinate record
  • 453.?.?.? contains the record ID of the first series title
  • 599..?a contains the record ID of the record describing the journal in which this resource is published.
dcterms:creator The 1...19 fields contain authority numbers of the authors of the resource. We decided to no longer use bibo:authorList because dcterms:creator is simpler (no blank nodes are needed), so it can be handled well by generic Linked Data browsers such as Pubby. Note that there are basically two types of authority numbers in the data: those maintained by the DNB (which are available as Linked Data) and local hbz numbers, which are not available as Linked Data. In the first case, the resulting link leads to the Linked Data service of the DNB; in the latter case, the link unfortunately leads nowhere. Note also: towards the end of the year, the hbz PND will be merged into the DNB's GND.
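The following sketch contrasts the two modelling options for the same two authors. The first uses plain dcterms:creator triples. The second uses an ordered rdf:List attached via bibo:authorList, which needs one blank node per list entry. The resource and author URIs are placeholders, and serializing to N-Triples strings is only for illustration.

```python
from typing import List

DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
BIBO_AUTHOR_LIST = "http://purl.org/ontology/bibo/authorList"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def creator_triples(resource: str, authors: List[str]) -> List[str]:
    """One dcterms:creator triple per author; no blank nodes needed."""
    return [f"<{resource}> <{DCTERMS_CREATOR}> <{author}> ." for author in authors]


def author_list_triples(resource: str, authors: List[str]) -> List[str]:
    """The same authors as an ordered rdf:List attached via bibo:authorList."""
    if not authors:
        return []
    nodes = [f"_:list{i}" for i in range(len(authors))]  # one blank node per entry
    triples = [f"<{resource}> <{BIBO_AUTHOR_LIST}> {nodes[0]} ."]
    for i, author in enumerate(authors):
        rest = nodes[i + 1] if i + 1 < len(nodes) else f"<{RDF}nil>"
        triples.append(f"{nodes[i]} <{RDF}first> <{author}> .")
        triples.append(f"{nodes[i]} <{RDF}rest> {rest} .")
    return triples


authors = ["http://example.org/person/a", "http://example.org/person/b"]
print(len(creator_triples("http://example.org/resource/1", authors)))      # 2
print(len(author_list_triples("http://example.org/resource/1", authors)))  # 5
```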
There are a lot of fields; look at the mapping: {publisher_name} is the name of the publisher and {publisher_place} the place of the publisher. To conform to the range of the dcterms:publisher predicate as defined in the DCMI Metadata Terms, we have introduced blank nodes for the publishers, typed as foaf:Organization. The place of the publisher is attached as another blank node via geo:location. That blank node is typed geo:SpatialThing and has the name of the place attached via geonames:name, since we lack a mapping of the place names to GeoNames identifiers. We are aware that this seems overly complicated, but we are trying to identify and properly model the entities referenced in the original data, even if that results in blank nodes in the first run. As soon as an authority file for publishers is available, we will try to link to it. We might even examine the resulting blank nodes to see whether the information is clean enough to form the basis of such a file.
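Here is a sketch of the blank-node structure described above, emitting the publisher triples as N-Triples. The page does not name the predicate for the publisher name, so foaf:name is an assumption here. The other terms (dcterms:publisher, foaf:Organization, geo:location, geo:SpatialThing, geonames:name) are the ones named above. The blank-node labels, the example name and the example place are hypothetical.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_PUBLISHER = "http://purl.org/dc/terms/publisher"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # assumed predicate for the name
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GN_NAME = "http://www.geonames.org/ontology#name"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def publisher_triples(resource: str, name: str, place: str, n: int = 0) -> List[str]:
    """Publisher and its place as two linked blank nodes (labels must be unique per record)."""
    org, loc = f"_:publisher{n}", f"_:place{n}"
    return [
        f"<{resource}> <{DCTERMS_PUBLISHER}> {org} .",
        f"{org} <{RDF_TYPE}> <{FOAF_ORGANIZATION}> .",
        f"{org} <{FOAF_NAME}> {literal(name)} .",
        f"{org} <{GEO}location> {loc} .",
        f"{loc} <{RDF_TYPE}> <{GEO}SpatialThing> .",
        f"{loc} <{GN_NAME}> {literal(place)} .",
    ]


for line in publisher_triples("http://example.org/resource/1", "Example Verlag", "Berlin"):
    print(line)
```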
In the current state of the raw data, holdings information is only implicitly available. Since the records are segmented into packages by institution, we know that an institution is the frbr:owner of at least one frbr:Item of the described frbr:Manifestation. Since we currently have no call-number information, those items are once again modelled as blank nodes.
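Here is a minimal sketch of the implicit holdings as N-Triples. The page does not say which property connects the frbr:Manifestation to its frbr:Item. This sketch assumes frbr:exemplar from the FRBR core vocabulary (http://purl.org/vocab/frbr/core#), and the institution URIs are placeholders.

```python
from typing import List

FRBR = "http://purl.org/vocab/frbr/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def holding_triples(manifestation: str, institutions: List[str]) -> List[str]:
    """One blank-node frbr:Item per institution whose package contains the record."""
    triples = [f"<{manifestation}> <{RDF_TYPE}> <{FRBR}Manifestation> ."]
    for i, institution in enumerate(institutions):
        item = f"_:item{i}"  # blank node: no call number to build a URI from
        triples += [
            f"<{manifestation}> <{FRBR}exemplar> {item} .",  # assumed linking property
            f"{item} <{RDF_TYPE}> <{FRBR}Item> .",
            f"{item} <{FRBR}owner> <{institution}> .",
        ]
    return triples


print(len(holding_triples("http://example.org/resource/1",
                          ["http://example.org/org/a", "http://example.org/org/b"])))  # 7
```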

The following predicates are totally new:

dcterms:format and dcterms:medium Fields 050 and 652_a. The MAB2 values are mapped according to the mapping (look under format). There is still work to do here: for now we do not have a (known) controlled vocabulary, and at the moment the values of the two properties are somewhat mixed up.
owl:sameAs For now, owl:sameAs links are created if field 026_.?a exists. If the value is a ZDB-ID, a second owl:sameAs is created by appending this ID to http://lobid.org/resource/. Have a look at HTTP URIs with ZDB-IDs.
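This sketch covers the ZDB case. Values from 026_.?a that start with "ZDB" are appended to http://lobid.org/resource/ and linked via owl:sameAs. It assumes that the value is appended unchanged apart from surrounding whitespace. The function name and the example IDs are hypothetical.

```python
from typing import Iterable, List

LOBID_RESOURCE = "http://lobid.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def zdb_same_as(resource: str, values_026: Iterable[str]) -> List[str]:
    """owl:sameAs triples for those 026_.?a values that are ZDB-IDs."""
    triples = []
    for value in values_026:
        value = value.strip()
        if value.startswith("ZDB"):  # 026 also holds other identifiers; skip those
            triples.append(f"<{resource}> <{OWL_SAME_AS}> <{LOBID_RESOURCE}{value}> .")
    return triples


print(zdb_same_as("http://example.org/resource/1", ["ZDB123456-7", "OTHER-ID"]))
```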
wdrs:describedby A URL to the record's view in the local hbz OPAC is generated from field 001_.?a.
dcterms:source There are a lot of fields; look at the mapping under source. Mostly these are just literals, so we cannot provide a link via dcterms:source or anything similar.
isbd:P1004 The main title of the resource. It is used in parallel to dcterms:title, which combines main title and other title information. The following fields are used in order (if available): 310..?a, 331..?a, 333_.?a and the further fields given in the mapping.
The subtitle or any other remainder of the title of the resource. There can be several dcterms:alternative values, taken from the fields given in the mapping.
bibo:isbn10 and bibo:isbn13
The ISBN-10 and ISBN-13 of the resource, found in 540...?. The previously used bibo:isbn has been given up in favour of these more specific predicates. The ISBN is deliberately provided as a string, not a URI, since it is the string itself that is the identifier, not some resource identified by <urn:ISBN:ISBN>. This conforms to the range defined in BIBO.
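The choice between bibo:isbn10 and bibo:isbn13 can be sketched as a length check on the normalized ISBN, with the value itself kept as a string literal. The sketch assumes that qualifiers such as binding information have already been stripped from the 540 value, and it does not verify check digits. The function name is hypothetical.

```python
from typing import Optional

BIBO = "http://purl.org/ontology/bibo/"


def isbn_predicate(raw: str) -> Optional[str]:
    """Pick bibo:isbn10 or bibo:isbn13 for a cleaned ISBN string, else None.

    Hyphens and spaces are ignored; an ISBN-10 may end in "X".
    """
    compact = raw.replace("-", "").replace(" ", "").upper()
    if len(compact) == 10 and compact[:9].isdigit() and (compact[9].isdigit() or compact[9] == "X"):
        return BIBO + "isbn10"
    if len(compact) == 13 and compact.isdigit():
        return BIBO + "isbn13"
    return None


print(isbn_predicate("3-16-148410-0"))      # http://purl.org/ontology/bibo/isbn10
print(isbn_predicate("978-3-16-148410-0"))  # http://purl.org/ontology/bibo/isbn13
```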
dcterms:abstract This is the property we use in the rare but happy cases where the description_abstract fields (see the mapping) exist.
bibo:doi If field 552b.a exists, its value is mapped to bibo:doi.
bibo:oclcnum That's the field 25o[12]a.
bibo:edition These are the fields 400 1a, 403 1[an] and 510 1a.
dcterms:source That's the field 021 1a. The ID points to the internal ID of the original source from which the resource is derived. From this ID, a link to the lobid resource is assembled.
dcterms:hasFormat All resources that are linked to via dcterms:source will be enhanced with this predicate, linking back to the other resource, so that there is a reciprocal relation. As this information is sadly missing in the underlying datasets, this triple will be produced after the complete data transformation (sigh) (not yet implemented).
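The reciprocal dcterms:hasFormat triples are meant to be produced after the complete transformation, so they can be derived in one post-processing pass over the output. The sketch below works on (subject, predicate, object) tuples of URIs. It illustrates the idea only and is not the lobid.org step, which is not yet implemented.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
DCTERMS = "http://purl.org/dc/terms/"


def reciprocal_has_format(triples: List[Triple]) -> List[Triple]:
    """For each (A, dcterms:source, B) derive (B, dcterms:hasFormat, A), without duplicates."""
    derived = [(o, DCTERMS + "hasFormat", s) for s, p, o in triples if p == DCTERMS + "source"]
    return list(dict.fromkeys(derived))  # drop duplicates, keep first-seen order


data = [
    ("http://example.org/resource/copy", DCTERMS + "source", "http://example.org/resource/original"),
    ("http://example.org/resource/copy", DCTERMS + "title", "A title"),
]
print(reciprocal_has_format(data))
```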
dcterms:hasPart That's the field 529z*9a. It links to a supplement of the resource.

The resulting model 
