We have recently updated the organisations data in lobid.org using the Culturegraph Metafacture software (see the morph-mapping) and - in order to represent even more data from the German ISIL registry - published some new controlled vocabularies in RDF. To make them more appealing to re-users, we chose to switch to a sustainable URI namespace at purl.org for the lobid vocabularies, which are maintained on GitHub.
So what did we actually add to the organisation descriptions?
- Added information about the organisation type (using the libtype vocabulary).
- Added information about the stock size and the type of funding organisation (using the newly published stocksize and fundertype vocabularies).
- Added opening hours information.
- Added subject headings.
Some of these changes were already present in the triple store data but weren't reflected on the lobid.org frontend. Now you can get all this data in HTML and RDFa via your web browser or - using content negotiation - in other RDF formats.
We've automated the updating process so that from now on the organisations data will be updated on a monthly basis.
If you want to make use of the new data by querying subject headings - say: "Give me all institutions which have 'Karten' (German for 'maps') as a subject heading" - that would translate into SPARQL as follows:
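A query along these lines might look like the following sketch. Note that the predicate (dct:subject) and the use of a string filter are assumptions made for illustration; check the actual lobid data model for the exact property used:

```sparql
# Sketch only: dct:subject is an assumed predicate for subject headings.
# Matching on a literal forces the store to compare strings, which is
# exactly the kind of operation triple stores are not optimized for.
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?org WHERE {
  ?org dct:subject ?subject .
  FILTER (str(?subject) = "Karten")
}
```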
You will be disappointed: this simple query (over a small dataset of only 350k triples) took 10 minutes on the first, uncached run and does not return the full result because it hits a complexity limit. The problem is not SPARQL per se, but that you are dealing with literals, for which a triple store is (understandably) not optimized. This leads directly to another desideratum:
Subject headings should not be literals, but URIs. That's already the case in the lobid data describing bibliographic resources, but not in the organisation descriptions.
URIs as subject headings have other positive side effects. Using e.g. the Dewey Decimal Classification, you get direct access to translations of each class into many languages. You get a hierarchy of classes and, most importantly, an unambiguous identifier from a controlled vocabulary rather than a plain word that could have different meanings in different contexts.
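With URIs in place, the query from above could match on an identifier instead of a string. The following is only a sketch: the Dewey class URI for maps (912 at dewey.info) and the dct:subject predicate are illustrative assumptions, not the actual lobid modelling:

```sparql
# Sketch only: assumes subjects were Dewey class URIs, not literals.
# Dewey class 912 (maps) at dewey.info is used as a hypothetical example.
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?org ?label WHERE {
  # Exact match on a URI - an indexed lookup instead of a string scan.
  ?org dct:subject <http://dewey.info/class/912/> .
  # The class itself carries multilingual labels that can be pulled in:
  OPTIONAL { <http://dewey.info/class/912/> skos:prefLabel ?label . }
}
```

The URI match is a straight index lookup, and the multilingual labels and class hierarchy come for free from the vocabulary itself.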
Thus, the transformation of these literals into URIs is a TODO.
Of course, an API based on a search engine would also be fast and would bring some extra benefits, e.g. auto-suggestions. We are working on that!