Sharing context - publishing application profiles with JSON-LD

Since 2010, a growing number of library service centers and libraries in Germany have been publishing their catalog data as linked open data (see, e.g., this list on the Data Hub). For the RDF modeling, and the related questions of which RDF properties from which vocabularies to use and how, the data publishers mostly oriented themselves towards earlier LOD publication projects. As a result, the LOD publications don't differ in their broader approach. For example, all of them agree on using the Bibliographic Ontology (Bibo) and Dublin Core as base vocabularies, supplemented by properties from other vocabularies such as the RDA elements. Nonetheless, the datasets differ slightly in which RDF properties they use and how they apply them. Working with and combining different datasets would be easier, though, if there were agreed-upon best practices for representing library catalog data in RDF.

To promote such best practices by publishing a recommendation for the RDF representation of bibliographic records, the "Titeldaten" group within the DINI-KIM working group (KIM = Competence Centre Interoperable Metadata) was established in January 2012. The group currently consists of representatives from most German-speaking library service centers, from the German and Swiss National Libraries, and from other institutions. It will soon publish the first stable version of its recommendations (in German). These currently focus on bibliographic descriptions of textual resources and thus leave out descriptions of audio(-visual) media etc.

Application profiles

At the hbz, we promote the re-use of existing vocabularies instead of creating a new one for every application. For our LOD service lobid, we only create new properties or SKOS vocabularies if we can't find anything suitable in an existing vocabulary that looks serious and is still maintained. But as we have seen, re-using vocabularies doesn't by itself guarantee interoperability on the linked data web. Even if two projects select the same RDF properties for publishing their data, they might use these properties quite differently. For example, one application might use dcterms:alternative for what librarians call a uniform title, while another might use it for title information that accompanies the main title. That is why it makes sense to document vocabulary usage in the form of application profiles. In essence, the goal of the "Titeldaten" group is nothing other than creating an application profile (recently also called a "community profile") for publishing library catalogs as linked data.

The concept of an application profile has its origin in the Dublin Core community. In a Dublin Core glossary published in 2001, "application profile" is explained as follows:

A set of metadata elements, policies, and guidelines defined for a particular application. The elements may be from one or more element sets, thus allowing a given application to meet its functional requirements by using metadata from several element sets including locally defined sets. For example, a given application might choose a subset of the Dublin Core that meets its needs, or may include elements from the Dublin Core, another element set, and several locally defined elements, all combined in a single schema. An Application profile is not complete without documentation that defines the policies and best practices appropriate to the application.

So an application profile is an element set that draws together elements from other element sets. "Element set" is Dublin Core language for what the linked data community often calls a "vocabulary". (Element sets aren't necessarily encoded in RDF, though.) In linked data terms, then, an application profile is a selection of RDF properties from different vocabularies. But, as the last sentence of the quote indicates, it is more than that: an important part of any application profile is "documentation that defines the policies and best practices appropriate to the application".

Sharing an application profile

We like the concept of an application profile, and we think it should play an important role in a linked data world where vocabularies for different domains are published all over the web and can be used by anybody to expose their linked data. We believe that the LOD community would benefit if documenting and sharing application profiles became more common practice. But how should this be done properly?

Regarding the choice of language, the approach of the "Titeldaten" group probably isn't ideal, as we stuck to German for discussing and publishing the profile. Since the recommendations are aimed at the German-speaking community, this may nonetheless make sense. For documentation we chose a wiki, which is probably fine for anybody who wants to understand the application profile and use it for their own LOD publication. In line with other DINI recommendations, the "official" text will also be published as a PDF. However, what we at hbz would like to have is a simple overview of the properties used in an application profile, possibly with some additional information such as whether URIs or strings should be used in object position. Ideally, we would publish this simple list in a standard machine-readable format so that applications could even use it directly. This would also make it possible for people to fork the application profile on GitHub and extend it, while the differences between the two profiles would remain easy to see. That is where JSON-LD comes into play...
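To illustrate the fork-and-compare idea, here is a minimal sketch that compares two context documents term by term. Both contexts and the diff_profiles helper are hypothetical, made up for this example. A plain line-based diff on GitHub would show similar information; a term-level comparison has the advantage of not being affected by reordering or whitespace changes.

```python
import json

# Two hypothetical context documents: a base profile and a fork that
# extends it. Term names and mappings are illustrative only.
base = {
    "@context": {
        "title": {"@id": "http://purl.org/dc/elements/1.1/title"},
        "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
    }
}
fork = {
    "@context": {
        "title": {"@id": "http://purl.org/dc/elements/1.1/title"},
        "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
        "subject": {"@id": "http://purl.org/dc/terms/subject", "@type": "@id"},
    }
}


def diff_profiles(old, new):
    """Compare the term definitions of two context documents."""
    a, b = old["@context"], new["@context"]
    added = sorted(set(b) - set(a))
    removed = sorted(set(a) - set(b))
    # A term counts as changed if its definition differs in any key.
    changed = sorted(t for t in set(a) & set(b) if a[t] != b[t])
    return {"added": added, "removed": removed, "changed": changed}


print(json.dumps(diff_profiles(base, fork)))
# prints {"added": ["subject"], "removed": [], "changed": []}
```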

JSON-LD and @context

What excited me about JSON-LD was that it brings something new to the linked data world: external JSON-LD context documents. Before I go there, first some explanation of JSON and JSON-LD. JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is based on a subset of JavaScript's object literal syntax (the way objects are written in JavaScript) (source). Over the last couple of years, JSON has increasingly replaced XML as the standard format for data interchange on the web, and most web APIs today serve JSON.

JSON-LD is a way of encoding RDF statements (triples) in JSON. Thus, JSON-LD could be a big step forward for the linked data community as it makes it quite easy for web developers to understand the virtues of linked data. Here's an example JSON-LD document:
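A minimal sketch of such a document, assembled as a Python dict and serialized with the standard json module. The resource URI, title and creator URI are hypothetical placeholders, not real lobid data; the property URIs are the actual DC terms URIs for title and creator.

```python
import json

# A JSON-LD document that uses full property URIs (no @context yet).
# The resource URI, title and creator URI are hypothetical placeholders.
doc = {
    "@id": "http://example.org/resource/123",
    "http://purl.org/dc/terms/title": "An Example Title",
    # Wrapping the value in {"@id": ...} marks it as a URI, not a string.
    "http://purl.org/dc/terms/creator": {"@id": "http://example.org/person/456"},
}

print(json.dumps(doc, indent=2))
```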

In the first line, '@id' indicates which entity this JSON-LD document is about. In other places, the '@id' keyword is used to make clear that a value is a URI (i.e. the object of an RDF statement). The second and third lines make statements about the resource using the elements 'title' and 'creator' from the DC terms vocabulary. This looks straightforward and easy to understand if you already have a linked data and/or JSON background. To shorten the descriptive part of the document and make it easier to read, JSON-LD introduces the @context, a syntactic mechanism that maps short JSON terms to property URIs.
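The mapping can be sketched in Python as well. The document below carries an inline @context, and expand_simple (a hypothetical helper, not part of any library) replaces each short term with its full URI, wrapping the value as {"@id": ...} when the term is declared with "@type": "@id". It only covers this simple case. Real JSON-LD expansion, as performed by a conforming processor such as PyLD, also normalizes values into arrays and @value objects and handles many more keywords.

```python
import json

# Document with an inline @context mapping short terms to property URIs.
# The resource and creator URIs are hypothetical placeholders.
doc = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        # "@type": "@id" declares that values of "creator" are URIs, not strings
        "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
    },
    "@id": "http://example.org/resource/123",
    "title": "An Example Title",
    "creator": "http://example.org/person/456",
}


def expand_simple(document):
    """Rewrite short terms as full URIs, following the inline @context.

    Handles only string term definitions and dicts with "@id" and an
    optional "@type": "@id"; everything else in JSON-LD is ignored.
    """
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key.startswith("@"):
            expanded[key] = value  # keywords such as @id are kept as-is
            continue
        definition = context.get(key, key)
        if isinstance(definition, str):
            expanded[definition] = value
        else:
            if definition.get("@type") == "@id":
                value = {"@id": value}  # object position holds a URI
            expanded[definition["@id"]] = value
    return expanded


print(json.dumps(expand_simple(doc), indent=2))
```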

So the context document maps JSON terms (here: "title" and "creator") to property URIs and, as seen with dcterms:creator, uses the '@type' keyword to declare when the object position is to be interpreted as a URI. What makes a @context interesting for application profiles is that it can be published at a different location than the descriptive part of the JSON document and merely be referenced from it, e.g.:
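A sketch of the referencing mechanism, assuming a hypothetical context URL: the descriptive document carries only that URL in its @context, and resolve_context (also hypothetical) obtains the term definitions through a loader function. A dict stands in for the HTTP fetch so that the example runs offline.

```python
# Hypothetical URL standing in for a published context document.
CONTEXT_URL = "http://example.org/contexts/profile.jsonld"

# In practice this would be fetched over HTTP; here a dict stands in for
# the published file so that the sketch runs offline.
PUBLISHED = {
    CONTEXT_URL: {
        "@context": {
            "title": "http://purl.org/dc/terms/title",
            "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
        }
    }
}

# The descriptive document only references the context by URL.
doc = {
    "@context": CONTEXT_URL,
    "@id": "http://example.org/resource/123",
    "title": "An Example Title",
    "creator": "http://example.org/person/456",
}


def resolve_context(document, loader):
    """Return the term definitions for a document's @context.

    Accepts an inline dict or a single URL string; arrays of contexts,
    which JSON-LD also allows, are not handled in this sketch.
    """
    ctx = document.get("@context", {})
    if isinstance(ctx, str):
        ctx = loader(ctx)["@context"]
    return ctx


terms = resolve_context(doc, lambda url: PUBLISHED[url])
print(sorted(terms))  # prints ['creator', 'title']
```

Keeping the loader as a parameter means the dict lookup can later be swapped for a real HTTP request without touching the rest of the code.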

In this example I link to a version of an external JSON-LD context file for the DINI-KIM "Titeldaten" recommendations which, among other things, contains the mapping of the terms "title" and "creator" to the corresponding DC elements.

Example: Putting an application profile into @context

I have already pointed to a version of the DINI-KIM "Titeldaten" recommendations as JSON-LD context document. Let's look at an excerpt.
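A sketch of what such an excerpt might look like, written as a Python dict that mirrors the JSON. The term names (title, otherTitleInformation, uniformTitle, creator, creatorName, contributor, contributorName) are hypothetical, the RDA element namespace is assumed to be http://rdvocab.info/Elements/, and the published context may well use prefixes and different names. The object_kind helper (also hypothetical) reads off, for each term, what the profile expects in object position.

```python
# Namespaces; the RDA element namespace is an assumption for this sketch.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDA = "http://rdvocab.info/Elements/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical term names modelled on the recommendations described here.
context = {
    "@context": {
        "title": {"@id": DC + "title", "@type": XSD + "string"},
        "otherTitleInformation": {"@id": RDA + "otherTitleInformation", "@type": XSD + "string"},
        "uniformTitle": {"@id": DCTERMS + "alternative"},
        # URIs of authority records vs. names as plain strings
        "creator": {"@id": DCTERMS + "creator", "@type": "@id"},
        "creatorName": {"@id": DC + "creator"},
        "contributor": {"@id": DCTERMS + "contributor", "@type": "@id"},
        "contributorName": {"@id": DC + "contributor"},
    }
}


def object_kind(term_definition):
    """Describe what the profile expects in object position."""
    value_type = term_definition.get("@type")
    if value_type == "@id":
        return "URI"
    if value_type is not None:
        return "literal of type " + value_type
    return "plain literal"


for term, definition in context["@context"].items():
    print(f"{term:22} {definition['@id']:50} {object_kind(definition)}")
```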

This part of the context document captures some core information of the recommendations in a clear and concise form: that title and other title information should be represented with dc:title and rda:otherTitleInformation, and that both must have an xsd:string in object position; that the uniform title is specified with dcterms:alternative; and that different properties are used for the URI of a creator or contributor in an authority file (dcterms:creator/dcterms:contributor) and for the creator's/contributor's name (dc:creator/dc:contributor). I think this provides a good starting point for anyone who wants to get familiar with an application profile quickly, and it is comprehensible even for people who don't know any German.

As usual, this approach also has some shortcomings. One of them is that you can't specify a list of terms that are intended to be used as values of the listed properties. For example, the "Titeldaten" recommendations use rdf:type and dcterms:medium together with a list of different owl:Class and skos:Concept resources to indicate media type/carrier format. (Notably, this is a temporary solution, as currently no sensible solution for indicating carrier and media type exists.) There is no way to express this in a JSON-LD context document, so this part of the recommendations cannot be found in the @context document.
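The limitation can be made concrete. The JSON-LD 1.0 syntax restricts an expanded term definition to the keys @id, @reverse, @type, @language and @container (later versions add a few more keywords, but none for value lists). The sketch below flags any other key; the "allowedValues" key and the carrier URIs in it are hypothetical, made up to show what one might want to write.

```python
# Keys that JSON-LD 1.0 permits in an expanded term definition.
ALLOWED_KEYS = {"@id", "@reverse", "@type", "@language", "@container"}

# Hypothetical attempt to record permitted values for dcterms:medium
# inside the context; "allowedValues" is NOT a JSON-LD keyword.
context = {
    "medium": {
        "@id": "http://purl.org/dc/terms/medium",
        "@type": "@id",
        "allowedValues": [
            "http://example.org/carrier/print",
            "http://example.org/carrier/online",
        ],
    }
}


def unsupported_keys(ctx):
    """Map each term to the keys in its definition that JSON-LD does not define."""
    problems = {}
    for term, definition in ctx.items():
        if isinstance(definition, dict):
            extra = sorted(set(definition) - ALLOWED_KEYS)
            if extra:
                problems[term] = extra
    return problems


print(unsupported_keys(context))  # prints {'medium': ['allowedValues']}
```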

Using @context for specifying property labels?

One might want to put more information than this into a JSON-LD context document; for example, I would like to be able to use it for specifying property labels. Linked data might be actionable data for machines, but in the end you always want to present that data to humans, and that is when you need human-readable labels for the properties you use. Often, you don't want to use the label declared with rdfs:label in the respective vocabulary, especially if you need a German label. And even if the vocabulary provides a label in the language you need, you might want to choose a different one for a specific application (e.g. you can present something like "title proper" to librarians but not to people who aren't familiar with cataloging rules).

However, it might make a lot of sense to specify property labels in an application profile so that, e.g., users of online services from different libraries aren't confused by differing terms. So I tried to provide this kind of information in a @context document, but unfortunately the result is not valid JSON-LD (see the invalid document here). As said above, a context document is only a syntactic mechanism; it isn't intended to contain the kind of semantic information that you can express in a vocabulary defined with RDFS. Markus Lanthaler and Niklas Lindström helped me better understand JSON-LD and its restrictions and proposed some options for working around the problem by including the property label information elsewhere in the document. I must say that I am still a beginner with JSON-LD and don't know which option makes the most sense for us. We will explore these options in more depth when we come to replacing our current setup for the lobid.org frontend (we are working on replacing the Fresnel implementation Phresnel with a solution based on the lobid API, which is currently under development).
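One way to keep labels machine-readable without breaking the context, sketched here as an assumption and not necessarily one of the options proposed by Markus Lanthaler and Niklas Lindström, is to describe the properties themselves as resources in an @graph and attach language-tagged rdfs:label values. The German and English labels and the label_for helper are hypothetical.

```python
# Property descriptions carrying display labels as ordinary RDF data.
# The labels are hypothetical examples chosen for an application.
labels_doc = {
    "@context": {
        "label": {"@id": "http://www.w3.org/2000/01/rdf-schema#label"},
    },
    "@graph": [
        {
            "@id": "http://purl.org/dc/elements/1.1/title",
            "label": [
                {"@value": "Titel", "@language": "de"},
                {"@value": "Title", "@language": "en"},
            ],
        },
        {
            "@id": "http://purl.org/dc/terms/alternative",
            "label": [
                {"@value": "Einheitssachtitel", "@language": "de"},
                {"@value": "Uniform title", "@language": "en"},
            ],
        },
    ],
}


def label_for(property_uri, language, doc=labels_doc):
    """Return the label of a property in the given language, or the URI itself."""
    for node in doc["@graph"]:
        if node["@id"] == property_uri:
            for label in node.get("label", []):
                if label.get("@language") == language:
                    return label["@value"]
    return property_uri  # fall back to the URI if no label is found


print(label_for("http://purl.org/dc/terms/alternative", "de"))  # prints Einheitssachtitel
```

Because the labels are plain RDF statements about the properties, they stay valid JSON-LD and can be published and forked alongside the context document.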

Conclusion

It is neither difficult nor time-consuming to encode the core content of an application profile as a JSON-LD context document. As such a document is a useful addition to the human-readable documentation of an application profile, it may well be worth the effort to publish the core information of a profile in this form.

Read the follow-up post on providing application profiles with OAI-ORE


Tags: json-ld, applicationprofiles