Today, we roll out a major change to the data on bibliographic resources we get from the hbz union catalog. We have put a lot of work into this over the past six months. For one thing, the lobid data now complies with the DINI KIM recommendations on publishing title data as RDF (pdf, German).
At the same time, we have switched our entire data transformation workflow from a local tool to Metafacture, a free software tool that is developed on GitHub by the German National Library (DNB) with support from the hbz and others.
Here are two major changes the new transformation brings with it.
We have added links to scanned tables of contents (ToCs) for more than one million documents, e.g.:
The tables of contents reside in a DigiTool repository. When you resolve the URL in a browser, you are directed to a viewer presenting a PDF of the ToC. Using curl, you are redirected to an (unfortunately unstructured) OCR text of the ToC, e.g.:
(See also GitHub issue #307.)
We now include URNs for resources that have one assigned. In the data, we express this in two ways: with the property lv:urn, which holds the URN as a literal, and with a direct link to the resource using the property umbel:isLike. Example:
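As a rough illustration of this two-fold pattern (not the actual lobid transformation, which runs in Metafacture), the following Python sketch emits both statements as N-Triples. The lv namespace is taken from the hbzID example in this post; the umbel namespace URI, the resolver base URL, the resource ID and the URN are all assumptions or placeholders chosen for the sketch.

```python
# LV namespace as used elsewhere in this post (see the hbzID triple).
LV = "http://purl.org/lobid/lv#"
# Assumption: namespace URI for the umbel prefix.
UMBEL = "http://umbel.org/umbel#"
# Assumption: a resolver base that turns a URN into a dereferenceable URL.
RESOLVER = "http://nbn-resolving.de/"


def urn_triples(resource_uri: str, urn: str) -> list[str]:
    """Return two N-Triples lines: the URN as a literal and as a link."""
    literal = f'<{resource_uri}> <{LV}urn> "{urn}" .'
    link = f"<{resource_uri}> <{UMBEL}isLike> <{RESOLVER}{urn}> ."
    return [literal, link]


# Hypothetical resource URI and URN, for illustration only.
for line in urn_triples(
    "http://lobid.org/resource/EXAMPLE", "urn:nbn:de:example-0001"
):
    print(line)
```

Keeping the literal alongside the link means a consumer can match on the URN string without having to parse it out of a resolver URL.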
The hbz union catalog contains the data of two "Landesbibliographien" (bibliographies of German federal states): those of North Rhine-Westphalia (NWBib) and Rhineland-Palatinate (RPB). These are very interesting datasets that focus on the literature about a specific region. As we are currently working on a project to build a web site for one of these bibliographies (NWBib) on top of the lobid API, we needed to get the bibliography-specific information into the RDF. There are three special points of interest:
- Identifying resources that are part of a bibliography. We did this by linking each resource to a bibliography using dct:isPartOf, e.g.:
- Enabling bibliography-only search. Search can be restricted to NWBib or RPB with a 'set' parameter. For example, if you are interested in economy books about (parts of) North Rhine-Westphalia: http://lobid.org/resource?name=economy&set=NWBib
- Including the bibliography-specific classification in the data. Each of the bibliographies uses a custom classification system for subject indexing. We converted these systems to SKOS (RPB SKOS classification, NWBib classifications in SKOS) and used URIs, instead of notation numbers, to link bibliographic resources with the SKOS concepts. Thus, we get a rich list of subjects for such a resource, at times including GND subjects, DDC as well as the respective bibliography's classification. For the example resource we get eight subject links:
We have implemented this analogously for the RPB data.
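The first two points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint and the 'name' and 'set' parameters come from the example URL in the list, while the helper names, the placeholder resource and the bibliography URI are hypothetical and not part of the lobid API.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example search URL in this post.
BASE = "http://lobid.org/resource"
# Dublin Core Terms property behind the dct:isPartOf shorthand.
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def bibliography_search_url(name: str, bibliography: str) -> str:
    """Build a search URL restricted to one bibliography via 'set'."""
    query = urlencode({"name": name, "set": bibliography})
    return f"{BASE}?{query}"


def is_part_of_triple(resource_uri: str, bibliography_uri: str) -> str:
    """Return an N-Triples line linking a resource to its bibliography."""
    return f"<{resource_uri}> <{DCT_IS_PART_OF}> <{bibliography_uri}> ."


print(bibliography_search_url("economy", "NWBib"))
# Hypothetical URIs: the real bibliography URI is not given in this post.
print(is_part_of_triple(
    "http://lobid.org/resource/EXAMPLE",
    "http://example.org/bibliography/nwbib",
))
```

Passing "RPB" instead of "NWBib" as the second argument would give the corresponding query for the Rhineland-Palatinate bibliography.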
We also made many minor improvements, among others:
- Identifying audio(-visual) material in the RDF (e.g. http://lobid.org/resource/TT000039498/about).
- Marking honorees as such using MARC relators in RDF, e.g. <http://lobid.org/resource/HT013582860> <http://id.loc.gov/vocabulary/relators/hnr> <http://d-nb.info/gnd/124231764> .
- Adding the hbz ID, which is also part of a resource URI, to the RDF, e.g. <http://lobid.org/resource/HT002189125> <http://purl.org/lobid/lv#hbzID> "HT002189125" .
- Some maps in the hbz union catalog have geo data indicating the region that is represented by the map. We added this data to the RDF, e.g.: