In Codice Ratio:
Scalable Transcription of Vatican Registers

by Donatella Firmani, Paolo Merialdo (Roma Tre University) and Marco Maiorino (Vatican Secret Archives)

In Codice Ratio is an end-to-end workflow for the automatic transcription of the Vatican Registers, a corpus of more than 18,000 pages held in the Vatican Secret Archives. The workflow comprises a character recognition phase, featuring a deep convolutional neural network, and a transcription phase proper, which uses language statistics. The results produced so far are of high quality and require limited human effort.
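
As a sketch of how the two phases can interact, the toy Python below combines invented per-character recognition scores (standing in for CNN softmax outputs) with invented character-bigram statistics, choosing the jointly most probable reading by dynamic programming. None of the scores, bigrams or the weighting reflect the project's actual models.

```python
import math

def best_transcription(candidates, bigram_logp, lm_weight=1.0):
    """Combine per-character recognition scores (e.g., CNN softmax outputs)
    with character-bigram language statistics via dynamic programming.
    `candidates` is a list of {char: log_prob} dicts, one per glyph position."""
    # Each path state is the last character chosen; keep the best-scoring
    # (score, text) pair per state.
    paths = {c: (lp, c) for c, lp in candidates[0].items()}
    for position in candidates[1:]:
        new_paths = {}
        for c, lp in position.items():
            new_paths[c] = max(
                (score + lp + lm_weight * bigram_logp.get((prev, c), -10.0),
                 text + c)
                for prev, (score, text) in paths.items()
            )
        paths = new_paths
    return max(paths.values())[1]

# Toy example: the recogniser slightly prefers 'qvi', but the (invented)
# Latin bigram statistics pull the transcription towards 'qui'.
candidates = [
    {"q": math.log(0.9), "g": math.log(0.1)},
    {"v": math.log(0.55), "u": math.log(0.45)},
    {"i": math.log(0.9), "l": math.log(0.1)},
]
bigrams = {("q", "u"): math.log(0.9), ("q", "v"): math.log(0.05),
           ("u", "i"): math.log(0.6), ("v", "i"): math.log(0.3)}
print(best_transcription(candidates, bigrams))  # prints "qui"
```

The same decomposition — a per-glyph classifier whose hypotheses are rescored by a language model — is what allows a relatively simple character recogniser to yield usable transcriptions.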

by Achille Felicetti (Università degli Studi di Firenze)

Natural language processing and machine learning technologies have acquired considerable importance, especially in disciplines where the main information is contained in free-text documents rather than in relational databases. TEXTCROWD is an advanced cloud-based tool, developed within the framework of the EOSCpilot project, for processing textual archaeological reports. The tool has been extended to browse large online knowledge repositories and to train itself on demand, producing semantic metadata ready to be integrated with information from other domains in an advanced machine learning scenario.

by Matthijs Brouwer (Meertens Institute, KNAW)

To make huge amounts of textual resources and associated metadata searchable, the Lucene-based Apache Solr index provides a well-proven and scalable solution. In the humanities, textual data is often enriched with structural annotations such as part-of-speech tags, named entities, sentences or paragraphs. To include and use these annotations in search conditions and results, we developed a plugin that extends the existing Solr functionality with search and analysis options on these layers.
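
Conceptually, searching across such annotation layers means matching conditions on several token attributes at once. The self-contained sketch below (not the plugin's actual API) matches a corpus-query-style pattern over tokens that carry a part-of-speech layer:

```python
# Conceptual sketch only: tokens carry both a text layer and a
# part-of-speech layer, and a query constrains either or both,
# in the spirit of corpus query languages such as CQL.
def find_matches(tokens, pattern):
    """tokens: list of (word, pos) pairs; pattern: list of constraint dicts,
    each optionally fixing 'word' and/or 'pos'. Returns match start offsets."""
    hits = []
    for i in range(len(tokens) - len(pattern) + 1):
        if all((c.get("word") is None or tokens[i + j][0] == c["word"]) and
               (c.get("pos") is None or tokens[i + j][1] == c["pos"])
               for j, c in enumerate(pattern)):
            hits.append(i)
    return hits

sentence = [("the", "DET"), ("old", "ADJ"), ("manuscript", "NOUN"),
            ("was", "AUX"), ("restored", "VERB")]
# Analogue of a CQL query like: [word="the"] [pos="ADJ"] [pos="NOUN"]
pattern = [{"word": "the"}, {"pos": "ADJ"}, {"pos": "NOUN"}]
print(find_matches(sentence, pattern))  # prints [0]
```

A production index of course evaluates such patterns against inverted index structures rather than by linear scanning, which is what makes the approach scale.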

by Joachim Köhler (Fraunhofer IAIS), Nikolaus P. Himmelmann (Universität zu Köln) and Almut Leh (FernUniversität in Hagen)

In the project KA3 (Cologne Centre for Analysis and Archiving of Audio-Visual Data), advanced speech technologies are developed and provided to enhance the process of indexing and analysis of speech recordings from the oral history domain and the language sciences. These technologies will be provided as a central service to support researchers in the digital humanities to exploit spoken content.

by José Devezas and Sérgio Nunes (INESC TEC and FEUP)

In an information society, people expect to find answers to their questions quickly and with little effort. Sometimes these answers are locked within textual documents, which often require manual analysis after being retrieved from the web using search engines. At FEUP InfoLab, we are researching graph-based models to index combined data (text and knowledge), with the goal of improving entity-oriented search effectiveness.
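
A minimal illustration of the idea, with invented names and weights rather than the actual models under study: an index that reaches documents both through their terms and through a knowledge layer linking the entities they mention.

```python
from collections import defaultdict

# Toy combined text-and-knowledge index (all names and weights invented):
# documents are indexed by terms and by mentioned entities, and a knowledge
# layer links related entities, so a query can also reach documents one hop
# away in the graph.
class CombinedIndex:
    def __init__(self):
        self.term_to_docs = defaultdict(set)
        self.entity_to_docs = defaultdict(set)
        self.related = defaultdict(set)   # entity -> related entities

    def add_doc(self, doc_id, terms, entities):
        for t in terms:
            self.term_to_docs[t].add(doc_id)
        for e in entities:
            self.entity_to_docs[e].add(doc_id)

    def link(self, e1, e2):
        self.related[e1].add(e2)
        self.related[e2].add(e1)

    def search(self, terms, entities):
        scores = defaultdict(float)
        for t in terms:
            for d in self.term_to_docs[t]:
                scores[d] += 1.0
        for e in entities:
            for d in self.entity_to_docs[e]:
                scores[d] += 1.0
            for r in self.related[e]:       # one hop through the knowledge layer
                for d in self.entity_to_docs[r]:
                    scores[d] += 0.5
        return sorted(scores, key=scores.get, reverse=True)

idx = CombinedIndex()
idx.add_doc("d1", ["cathedral", "fire"], ["London"])
idx.add_doc("d2", ["cathedral"], ["St_Pauls"])
idx.link("London", "St_Pauls")
# 'd1' (direct entity match) ranks above 'd2' (reached via the knowledge link)
print(idx.search(["cathedral"], ["London"]))
```

The point of combining the two layers is that a query about an entity can surface documents that never mention the query terms directly, which is where entity-oriented search gains over pure text retrieval.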

by Mihály Héder (MTA-SZTAKI)

Engineering research literature tends to have fewer citations per document than other areas of science. Natural language processing, argumentation structure analysis, and pattern recognition in graphs can help to explain this by providing an understanding of the scientific impact mechanisms between scientists, engineers and society as a whole.

by Maria Daskalaki and Lida Charami (ICS-FORTH)

In order to integrate new digital technologies with traditional research in the humanities, and to enable collaboration across the various scientific fields, we have developed a coherent overarching thesaurus with a small number of highly expressive, consistent upper-level concepts. These can be used as the starting point for harmonising the numerous discipline- and even project-specific terminologies into a coherent and effective thesaurus federation.

by George Bruseker, Martin Doerr and Chryssoula Bekiari (ICS-FORTH)

In the era of big data, digital humanities faces the ongoing challenge of formulating a long-term and complete strategy for creating and managing interoperable and accessible datasets to support its research aims. Semantics and formal ontologies, properly understood and implemented, provide a powerful potential solution to this problem. A long-term research programme is contributing to a complete methodology and toolset for managing the semantic data lifecycle.

by Claire Clivaz, Sara Schulthess and Anastasia Chasapi (SIB)

HumaReC, a Swiss National Science Foundation project, aims to test a new model of digital research partnership, from original document source to publisher. Its object of study is a trilingual, 12th-century New Testament manuscript, which HumaReC uses to test continuous data publishing.

by Sheena Bassett (PIN Scrl), Sara Di Giorgio (MIBACT-ICCU) and Paola Ronzino (PIN Scrl)

Understanding how data has been created and under which conditions it can be reused is a significant step towards the realisation of open science.
PARTHENOS [L1] is a Horizon 2020 project funded by the European Commission that aims at strengthening the cohesion of research in the broad sector of linguistic studies, cultural heritage, history, archaeology and related fields through a thematic cluster of European research infrastructures. PARTHENOS is building a cross-disciplinary virtual environment to enable researchers in the humanities to have access to data, tools and services based on common policies, guidelines and standards. The project is built around two European Research Infrastructure Consortia (ERICs) from the humanities and arts sector: DARIAH [L2] (research based on digital humanities) and CLARIN [L3] (research based on language data), along with ARIADNE [L4] (digital archaeological research infrastructure), EHRI [L5] (European Holocaust research infrastructure), CENDARI [L6] (digital research infrastructure for historical research), CHARISMA [L7] and IPERION-CH [L8] (EU projects on heritage science) and involves all the relevant integrating activities projects.

by Jacco van Ossenbruggen (CWI)

Research in the humanities typically involves studying specific and potentially subjective interpretations of (historical) sources, whilst the computational tools and techniques used to support such questions aim at providing generic and objective methods to process large volumes of data. We claim that the success of a digital humanities project depends on the extent to which it succeeds in matching the specific with the generic, and the subjective with the objective. Trust in the outcome of a digital humanities study should be informed by a proper understanding of this match, but that involves a non-trivial fit-for-use assessment.

by Jennifer Edmond (Trinity College Dublin), Frank Fischer (National Research University Higher School of Economics, Moscow), Michael Mertens (DARIAH EU) and Laurent Romary (Inria)

As it begins its second decade of development, the Digital Research Infrastructure for the Arts and Humanities (DARIAH) continues to forge an innovative approach to improving support for and the vibrancy of humanities research in Europe.

by Alessia Bardi and Luca Frosini (ISTI-CNR)

Research infrastructures (RIs) are “facilities, resources and services used by the science community to conduct research and foster innovation” [1]. Researchers’ needs for digital services led to the realisation of e-Infrastructures, i.e., RIs offering digital technologies for data management, computing and networking. Relevant examples are high-speed connectivity infrastructures (e.g., GÉANT), grid computing infrastructures (e.g., the European Grid Infrastructure, EGI), scholarly communication infrastructures (e.g., OpenAIRE) and data e-infrastructures (e.g., D4Science).

by Muhammad Hanif and Anna Tonazzini (ISTI-CNR)

Ancient archival manuscripts constitute a primary carrier of information about our history and the process of civilisation. In the recent past, they have been the object of intensive digitisation campaigns aimed at their preservation, accessibility and analysis. At ISTI-CNR, the diverse information contained in multispectral, multisensor and multiview digital acquisitions of these documents has been exploited to develop several dedicated image processing algorithms. The aim of these algorithms is to enhance the quality and reveal the obscured contents of the manuscripts, while preserving their best original appearance according to the concept of “virtual restoration”. Following this research line, within an ERCIM “Alain Bensoussan” Fellowship, we are now studying sparse image representation and dictionary learning methods to restore the natural appearance of ancient manuscripts affected by spurious patterns due to various ageing degradations.

by John N. Wall (NC State University), John Schofield (St Paul’s Cathedral, London), David Hill (NC State University) and Yun Jing (NC State University)

The Virtual St Paul’s Cathedral Project [L1], now at the half-way point in its development, is beginning to show results. Attached are images of our draft model of St Paul’s Cathedral and the buildings in the cathedral’s churchyard as they stood in the early 1620s, before everything seen here was destroyed by the Great Fire of London in 1666. These images are based on a combination of contemporary images of the cathedral and its surrounding buildings, surveys of these buildings made after the Great Fire, and images of appropriate buildings from this period that survive in modern-day England.

by Jean-Baptiste Barreau (CNRS/CReAAH UMR 6566), Ronan Gaugne (Université de Rennes 1/IRISA-Inria) and Valérie Gouranton (INSA Rennes/ IRISA-Inria)

A point cloud is the basic raw data obtained when digitising cultural heritage sites or monuments with laser scanning or photogrammetry. These data constitute a rich and faithful record, provided that adequate tools are available to exploit them. Current analyses and visualisations on a PC require software skills and can create ambiguities regarding the architectural dimensions. We propose a toolbox to explore and manipulate such data in an immersive environment, and to dynamically generate 2D cutting planes usable for cultural heritage documentation and reporting.
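
The geometry behind a 2D cutting plane can be illustrated with a small sketch (pure Python, invented data, not the toolbox's implementation): points within a given thickness of a plane are selected and projected onto an orthonormal basis of that plane, yielding 2D section coordinates.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def cutting_plane(points, origin, normal, thickness=0.05):
    """Select the points lying within `thickness` of the plane defined by
    `origin` and unit `normal`, and project them to 2D plane coordinates."""
    # Build an orthonormal basis (u, v) spanning the plane.
    a = (1.0, 0.0, 0.0) if abs(normal[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = cross(normal, a)
    norm_u = math.sqrt(sum(c * c for c in u))
    u = tuple(c / norm_u for c in u)
    v = cross(normal, u)
    section = []
    for p in points:
        d = tuple(p[i] - origin[i] for i in range(3))
        dist = sum(d[i] * normal[i] for i in range(3))  # signed distance to plane
        if abs(dist) <= thickness:
            section.append((sum(d[i] * u[i] for i in range(3)),
                            sum(d[i] * v[i] for i in range(3))))
    return section

# Horizontal cut at z = 1.0 through a toy "wall" of points:
# only the two points near that height survive the cut.
cloud = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (2.0, 0.0, 1.02), (2.0, 0.0, 3.0)]
print(cutting_plane(cloud, origin=(0.0, 0.0, 1.0), normal=(0.0, 0.0, 1.0)))
```

In the immersive environment, the user positions such a plane interactively; the resulting 2D sections are what feed documentation and reporting.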

by Pierre Alliez (Inria), François Forge (Reciproque), Livio de Luca (CNRS MAP), Marc Pierrot-Deseilligny (IGN) and Marius Preda (Telecom SudParis)

One of the limitations of the 3D digitisation process is that it typically requires highly specialised skills and yields heterogeneous results depending on proprietary software solutions and trial-and-error practices. The main objective of Culture 3D Cloud [L1], a collaborative project funded within the framework of the French “Investissements d’Avenir” programme, is to overcome this limitation, providing the cultural community with a novel image-based modelling service for 3D digitisation of cultural artefacts. This will be achieved by leveraging the widespread expert knowledge of digital photography in the cultural arena to enable cultural heritage practitioners to perform routine 3D digitisation via photo-modelling. Cloud computing was chosen for its capability to offer high computing resources at reasonable cost, scalable storage via continuously growing virtual containers, multi-support diffusion via remote rendering and efficient deployment of releases.

by Théophane Nicolas (Inrap/UMR 8215 Trajectoires), Ronan Gaugne (Université de Rennes 1/IRISA-Inria), Valérie Gouranton (INSA de Rennes/IRISA-Inria) and Jean-Baptiste Barreau (CNRS/CReAAH UMR 6566)

Traditionally, accessing the interior of an artefact or an archaeological material is a destructive activity. We propose an alternative non-destructive technique, based on a combination of medical imaging and advanced transparent 3D printing.

by András Micsik, Tamás Felker and Balázs Nász (MTA SZTAKI)

The COURAGE project is exploring the methods of cultural opposition in the socialist era (c. 1950–1990). We are building a database of historical collections, persons, groups, events and sample collection items, using a fully linked data solution with the data stored in an RDF triple store. The registry will be used to create virtual and real exhibitions and learning material, and will also serve as a basis for further narratives and digital humanities (DH) research.
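
Querying such a registry comes down to matching triple patterns. The toy store below illustrates the principle in pure Python; the URIs are invented placeholders, not the COURAGE vocabulary.

```python
# Minimal sketch of triple-pattern matching as done by an RDF triple store
# (all URIs below are invented placeholders, not the project's vocabulary).
class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """None acts as a wildcard, like a variable in a SPARQL pattern."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("ex:collection42", "rdf:type", "ex:Collection")
store.add("ex:collection42", "ex:country", "ex:Hungary")
store.add("ex:person7", "ex:founded", "ex:collection42")
# Analogue of SPARQL: SELECT ?s WHERE { ?s ex:founded ex:collection42 }
founders = [s for s, _, _ in store.match(p="ex:founded", o="ex:collection42")]
print(founders)  # prints ['ex:person7']
```

A real triple store answers the same patterns with SPARQL over indexed triples; the linked data approach means the same collections, persons and events can be recombined for exhibitions and research without restructuring the database.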

by Thomas Tamisier, Irene Gironacci and Roderick McCall (Luxembourg Institute of Science and Technology)

The Locale project proposes a vision of location-aware digital storytelling empowered by a combination of technologies including data mining, information visualisation and augmented reality. The approach is tested through pilot contributors who share their experiences, stories and testimonies of Luxembourg since the end of World War II.

by Kathrin Koebel, Doris Agotai, Stefan Arisona (FHNW) and Matthias Oberli (SIK-ISEA)

Virtual reality (VR) reconstruction offers a new interactive way to explore the archives of the Swiss Pavilion at the “Biennale di Venezia” Art Exhibition.
The Swiss pavilion at the “Biennale di Venezia” offers a platform for national artists to exhibit their work. This well-known white cube showcases the changes in contemporary Swiss art from the early 1950s to the present day. The aim of “Biennale 4D” is to make the archives of the past biennial art exhibitions more comprehensible by creating an interactive, explorative environment using innovative virtual reality (VR) technology. “Biennale 4D” poses multiple challenges, including the visualisation of historic content and its documentation, dealing with the heterogeneity and incompleteness of the archives, interaction design and interaction mapping in VR space, the integration of metadata, and realising a virtual reality experience for the public space with current VR technology.

by Matteo Abrate, Angelica Lo Duca and Andrea Marchetti (IIT-CNR)

The “Clavius on the Web” project [L1] is an initiative involving the National Research Council in Pisa and the Historical Archives of the Pontifical University in Rome. The project aims to create a web platform for the input, analysis and visualisation of digital objects, i.e., letters sent to Christopher Clavius, a famous scientist of the 16th and 17th centuries.
In the field of digital humanities, cultural assets can be valued and preserved at different levels, and whether or not an object is considered a knowledge resource depends on its peculiarity and richness. Within the Clavius on the Web project we mainly consider two kinds of knowledge resources: contextual resources associated with digitised documents, and manual annotations of cultural assets. We implemented dedicated software for each of these resource types: the Web Metadata Editor for contextual resources, and the Knowledge Atlas to support manual annotation.
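
As an illustration of what a manual annotation might look like as data, here is a minimal record loosely modelled on the W3C Web Annotation data model; the identifiers, letter URL and annotation text are invented, not taken from the project.

```python
import json

# A minimal manual-annotation record, loosely following the shape of the
# W3C Web Annotation data model; all identifiers and values are invented.
annotation = {
    "id": "urn:example:anno1",
    "type": "Annotation",
    "motivation": "commenting",
    "body": {"type": "TextualBody",
             "value": "Mentions the 1582 calendar reform."},
    "target": {"source": "https://example.org/clavius/letters/234",
               "selector": {"type": "TextQuoteSelector",
                            "exact": "anno Domini 1582"}},
}
print(json.dumps(annotation, indent=2))
```

Keeping annotations as structured records of this kind, rather than free-form notes, is what allows them to be linked to the contextual resources and visualised on the platform.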

by Stefano Chessa and Michele Girolami (ISTI-CNR)

The Wireless Network Laboratory at ISTI-CNR studies how to exploit the mobility and sociality of people in a mobile social network in order to advertise and discover the services provided by devices, a problem commonly referred to as service discovery. An efficient strategy for advertising new or existing services to other devices is proposed, alongside a strategy for discovering a specific service.
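
A toy simulation of the underlying idea (the exchange rule and all names are simplifications, not the proposed protocol): service adverts spread opportunistically whenever two devices meet, so the contact patterns of their owners determine how far an advert travels.

```python
# Toy sketch of opportunistic service advertisement: on each encounter,
# two devices merge the sets of service adverts they carry, so adverts
# propagate along the social contact graph of their owners.
def simulate(contacts, adverts):
    """contacts: ordered list of (device_a, device_b) encounters;
    adverts: device -> set of service names it initially offers."""
    known = {d: set(s) for d, s in adverts.items()}
    for a, b in contacts:
        known.setdefault(a, set())
        known.setdefault(b, set())
        merged = known[a] | known[b]
        known[a], known[b] = set(merged), set(merged)
    return known

adverts = {"phoneA": {"printer"}, "phoneB": set(), "phoneC": set()}
contacts = [("phoneA", "phoneB"), ("phoneB", "phoneC")]
# phoneC never met phoneA, yet learns of the service via phoneB
print(simulate(contacts, adverts)["phoneC"])  # prints {'printer'}
```

Real strategies of this kind must also decide *which* adverts to keep and forward, since device storage and encounter time are limited; the sketch ignores those constraints.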

Next issue: January 2018
Special theme:
Quantum Computing