

Preparing the Ground for the German Digital Library

by Kai Stalmann, Marion Borowski and Sven Becker

Providing users with information and knowledge about cultural heritage objects has been a core business of libraries, museums, archives, and other institutions for centuries. Access to that information, however, has always been limited to those who literally step through the portals of these elevated places of acquired knowledge. Digitizing objects and publishing them on the web brings knowledge to a much broader audience. It is even conceivable that cultural heritage might once again play a significant role in society, provided it is easy enough to explore, openly accessible and applicable to a range of purposes.

The DDB as a linked data project: see the woods and see the trees...

With the use of the right key terms, a search engine can offer the user a veritable treasure trove of knowledge. The “treasure”, however, can vary depending on what the individual is searching for, ranging from an object located somewhere on the globe to an image of or deeper knowledge about a concept or object.

Which technologies might be useful in enabling the public to access their cultural heritage in a variety of ways? The approach taken for implementing the Deutsche Digitale Bibliothek (DDB) at Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS in conjunction with the Federal Government Representative for Culture and Media (BKM) relies on a modular, scalable and lightweight architecture, a clear-cut API based on open standards, a flexible model, and the use of given datasets and smart algorithms.

The DDB may hold binaries like images, scanned book pages, digitized 3D objects and video material, thus allowing users to work directly with these assets. This opens up a range of uses, including opportunities for artistically inclined users who wish to work with the raw material and use it for their own creative productions.

Finding items turns out to be amazingly easy on the “Cortex platform” which underlies the DDB. The state-of-the-art search engine supports keyword search and faceted search, features usually found on commercial search interfaces. Facets provide a means to drill down iteratively, and thus allow for easy management of even the largest result sets. What makes a difference here is the semantic richness of the facets, which depends on both the quality of the model and the quality of the mapping of the input data to that model. The DDB uses a harmonized model based mainly on CIDOC CRM and the new Europeana Data Model (EDM). These models were designed explicitly with a vision of a network of knowledge in mind.
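The iterative drill-down that facets make possible can be illustrated with a minimal in-memory sketch. It is not the Cortex implementation: the records, the field names (`type`, `place`, `century`) and both helper functions are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical records; titles, field names and values are invented.
RECORDS = [
    {"title": "Portrait of a Lady", "type": "painting", "place": "Berlin", "century": "19th"},
    {"title": "City Map of Cologne", "type": "map", "place": "Cologne", "century": "18th"},
    {"title": "Harbour Scene", "type": "painting", "place": "Hamburg", "century": "19th"},
    {"title": "Letters of a Merchant", "type": "manuscript", "place": "Hamburg", "century": "18th"},
    {"title": "View of the Spree", "type": "painting", "place": "Berlin", "century": "18th"},
]


def facet_counts(records, fields):
    """Count how often each value occurs for every requested facet field."""
    return {field: Counter(r[field] for r in records) for field in fields}


def drill_down(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(field) == value for field, value in selected.items())]


# First step: restrict the result set to paintings.
paintings = drill_down(RECORDS, type="painting")
# Second step: inspect the remaining facet values before narrowing further.
place_counts = facet_counts(paintings, ["place"])["place"]
# Third step: narrow the already reduced set by two more facets.
narrowed = drill_down(paintings, place="Berlin", century="19th")
```

Each step shrinks the result set and recomputes the counts for the remaining facets, which is what keeps very large result sets manageable. How meaningful those facets are still depends on how well the input data was mapped to the model.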

Creating a network of knowledge is at the heart of the DDB project. Over the years, millions of things, events, actors, places and time spans will become entangled by different kinds of relationships between these entities. Thus the DDB contributes to the semantic web. It also depends on the semantic web: vocabularies and ontologies are needed for recognizing relationships and identifiers. A two-way relationship exists between users and the DDB; users profit from the DDB but are also the most powerful resource for enabling the network of knowledge to come into existence. The unique selling point of the DDB is not only the original content, but also what given persistent identifiers, ‘artificial intelligence’ and ‘the crowd’ may reveal as hidden links between the objects of cultural heritage.

Analyzing single objects (like museum pieces or dossiers in an archive) may still require travelling to the place where these objects reside whenever high quality digital representations cannot be made accessible online due to technical or legal restrictions. But exploring relationships between entities that are part of the knowledge model encourages a different kind of exploration that is not restricted by physical borders. The DDB will therefore support different levels of queries. Besides keyword search combined with semantically rich facets, more advanced queries based on SPARQL will allow for data mining of the cultural heritage. Data mining in general unveils correlations that are extremely difficult to detect without such technologies.
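The kind of relationship query mentioned above can be sketched without a triple store. The example below is a simplified stand-in for a SPARQL basic graph pattern, not the DDB query interface: all identifiers, predicate names and helper functions are assumptions made up for illustration.

```python
# Hypothetical (subject, predicate, object) triples; all identifiers are invented.
TRIPLES = [
    ("obj:letter-17", "createdBy", "actor:merchant-a"),
    ("obj:portrait-3", "depicts", "actor:merchant-a"),
    ("obj:portrait-3", "createdBy", "actor:painter-b"),
    ("obj:map-9", "createdIn", "place:cologne"),
    ("actor:painter-b", "bornIn", "place:cologne"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def objects_linked_to(triples, entity):
    """Find all objects that point to `entity` through any predicate."""
    return sorted({s for s, _, o in match(triples, o=entity) if s.startswith("obj:")})


def shared_entities(triples, first, second):
    """Find entities that both subjects point to, i.e. a link between them."""
    targets_first = {o for _, _, o in match(triples, s=first)}
    targets_second = {o for _, _, o in match(triples, s=second)}
    return sorted(targets_first & targets_second)


linked = objects_linked_to(TRIPLES, "actor:merchant-a")
common = shared_entities(TRIPLES, "obj:letter-17", "obj:portrait-3")
```

Here a letter and a portrait that may be held by different institutions turn out to be connected through the same actor. A SPARQL endpoint would express the same idea declaratively and over far larger graphs.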

The DDB consists of a middleware (Cortex) for managing access, search, and the ingestion of objects. The ingest service binds objects to identifiers, resolves resources already in place, and links objects to internal and external resources. The ingest process is the tricky part since cultural heritage metadata files differ tremendously in semantic richness, format (DC, EAD, LIDO, MARC, MODS, to name a few), and size (from less than 1 KB to over 100 MB for one item). Despite standardization efforts, a single metadata format may well still be used in many different ways. In our approach, the incoming data is prepared for ingest by a separate tool called ASC or Augmented SIP Creator (SIP stands for Submission Information Package, a term borrowed from the OAIS Reference Model). The Cortex middleware, a set of metadata mappings, and the ASC are currently being built at Fraunhofer IAIS (a first release was accepted in April 2011). The middleware makes use of various widely accepted technologies (like Spring, REST, JSON, XML). As backends Cortex uses a triple store, Solr as a search index, and a storage cloud. The architecture was explicitly designed for reuse in similar projects. Since the underlying model and the mappings can be adapted to other needs, this system could be applied to any domain in which digital objects form a knowledge network based on either given or automatically extracted metadata.
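The mapping of heterogeneous input formats to one harmonized model, as described for the ingest step, can be pictured with the following sketch. It does not reflect the actual DDB mappings or the ASC: the flat record layout, the target field names (`actor`, `time_span`, `place`) and the function `to_harmonized` are assumptions for illustration only, and real metadata records are nested XML rather than flat dictionaries.

```python
# Hypothetical, heavily simplified mappings from source fields to a harmonized
# model. The real DDB mappings are not given in the article.
MAPPINGS = {
    "dc": {"title": "title", "creator": "actor", "date": "time_span", "coverage": "place"},
    "lido": {"titleSet": "title", "eventActor": "actor",
             "eventDate": "time_span", "eventPlace": "place"},
}


def to_harmonized(record, source_format):
    """Map one flat source record to the harmonized model.

    Returns the mapped record and the list of source fields for which
    no mapping is known, so that they can be reviewed instead of lost.
    """
    mapping = MAPPINGS[source_format]
    harmonized = {}
    unmapped = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            harmonized[target] = value
    return harmonized, unmapped


dc_record = {"title": "Harbour Scene", "creator": "Unknown", "rights": "restricted"}
lido_record = {"titleSet": "Harbour Scene", "eventActor": "Unknown"}

dc_result, dc_unmapped = to_harmonized(dc_record, "dc")
lido_result, lido_unmapped = to_harmonized(lido_record, "lido")
```

Both source records end up with the same target fields, which is what allows a single search index and a single set of facets to cover content from very different providers. Reporting unmapped fields separately is one possible way to cope with formats being used in many different ways.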

The DDB is a national project that will be steered by a “Kompetenznetzwerk” (competence network) DDB consisting of members of many institutions that contribute content to the DDB. But the DDB may also be embraced by an ecosystem that has yet to be brought to life. A first exciting glimpse of an ecosystem around the DDB was presented in Berlin in April 2011. For this purpose, a set of distributed services was mashed up to extend the core DDB functionality. We anticipate that the DDB could stimulate activities around cultural heritage objects. Beyond ‘looking for something’ this might include ‘exploring the betweens’, and ultimately also providing functionality that does something new to both objects and betweens.

DDB project website:
Fraunhofer IAIS Netmedia:
Documents on the Cortex platform (currently only in German):

Please contact:
Kai Stalmann, Marion Borowski, Sven Becker
Fraunhofer IAIS, Germany
E-mail: {kai.stalmann, marion.borowski, sven.becker}
