by Martin Doerr (ICS-FORTH), Pavlos Fafalios (ICS-FORTH) and Apostolos Delis (IMS-FORTH)

The European project SeaLiT has developed a set of innovative tools for supporting maritime historians in digitising, curating and exploring archival sources of maritime history. The tools are the result of the interdisciplinary work between maritime historians of the Institute of Mediterranean Studies of FORTH and researchers and data engineers of the Centre for Cultural Informatics of the Institute of Computer Science of FORTH.

A vast area of research in historical science concerns the analysis of historical archival sources in order to describe, examine and question a sequence of past events and to investigate patterns of cause and effect. This kind of research requires a data management approach that supports historians in all activities of their research process, from digitising the (usually handwritten) archival sources to curating the transcribed data and performing quantitative analysis and exploration. However, current practice relies almost exclusively on spreadsheets or simple relational databases to organise the data and perform quantitative analysis. This practice causes several problems: the transcribed data depend heavily on the initial research hypothesis and are therefore usually of little use for other research; the details from which the registered relations are inferred are not represented; and it is difficult to revisit the original sources of transcribed facts for verification, correction or improvement.

SeaLiT [L1] is a European (ERC) maritime history project working in this context. It explores the transition from sail to steam navigation and its effects on seafaring populations in the Mediterranean and the Black Sea between the 1850s and the 1920s. Historians in the project are investigating a range of areas, including the maritime labour market, the evolving relations among ship-owners, captains, crews and local societies, and the development of new business strategies, trade routes and navigation patterns during the transitional period from sail to steam. The archival sources studied range from handwritten ship logbooks, crew lists, payrolls and student registers to civil registers, business records, account books and consulate reports, gathered from different authorities and written in different languages, including Spanish, Italian, French, Russian and Greek.

The information management challenge faced by SeaLiT is to catalogue these unique historical sources faithfully and use them as primary sources for research, while integrating the data into a common form on which historical analysis can be carried out, and research questions answered, efficiently.

To this end, we have developed FAST CAT [L2], a collaborative system for assistive data transcription and curation in digital humanities and similar forms of empirical research. In FAST CAT, data from different information sources can be transcribed as “records” belonging to specific “templates”, where a template represents the structure of a single data source. A record organises the data and the metadata in tabular form (similar to spreadsheets), offering functionalities such as nested tables and the selection of terms from vocabularies (Figure 1). The cells in a table can accept values of different types: entity (the value is the name or an attribute of an entity, e.g., a person or location), vocabulary term (the value is a term from a controlled vocabulary), literal (the value is a literal, e.g., free text, a number or a date), or nested table (the value is another table).
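The record structure described above can be illustrated with a minimal Python sketch. It is not FAST CAT's actual implementation: all class and field names (`Entity`, `VocabularyTerm`, `LiteralValue`, `Table`, `Record`) are hypothetical, and the sample values are invented placeholders. The sketch only shows how a cell might hold one of the four value types, including a nested table.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Entity:
    """Cell value naming an entity (hypothetical model)."""
    kind: str  # e.g. "person" or "location"
    name: str


@dataclass(frozen=True)
class VocabularyTerm:
    """Cell value taken from a controlled vocabulary."""
    vocabulary: str
    term: str


@dataclass(frozen=True)
class LiteralValue:
    """Cell value holding free text, a number or a date string."""
    value: Union[str, int, float]


@dataclass
class Table:
    """A table whose cells may themselves be tables (nesting)."""
    columns: List[str]
    rows: List[List["Cell"]] = field(default_factory=list)

    def add_row(self, cells: List["Cell"]) -> None:
        if len(cells) != len(self.columns):
            raise ValueError("row length must match the number of columns")
        self.rows.append(list(cells))


# A cell accepts exactly one of the four value types named in the text.
Cell = Union[Entity, VocabularyTerm, LiteralValue, Table]


@dataclass
class Record:
    """A record follows the structure of one template (one kind of source)."""
    template: str
    table: Table


# Invented placeholder values, for illustration only.
crew = Table(columns=["name", "rank"])
crew.add_row([Entity("person", "Example Sailor"),
              VocabularyTerm("ranks", "boatswain")])

record = Record(template="crew list",
                table=Table(columns=["ship", "year", "crew"]))
record.table.add_row([Entity("ship", "Example Ship"),
                      LiteralValue(1875),
                      crew])  # the third cell is a nested table
```

Keeping the four value types as separate classes, rather than storing every cell as plain text, is what would let a later curation step treat entities and vocabulary terms differently from literals.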

Figure 1: An example of a record in FAST CAT.

The curation of the transcribed data can be performed through FAST CAT TEAM, a special environment of FAST CAT that allows the collaborative management of entities and vocabularies (Figure 2). This involves activities such as i) correcting entity names or attributes, ii) adding missing entity information or enriching entities with additional data (e.g., adding coordinates to locations to enable map visualisations), iii) resolving differing assumptions about entity identity through instance matching, and iv) maintaining vocabularies of terms for certain types of transcribed data and enabling the creation of term hierarchies.

Figure 2: Data curation in FAST CAT TEAM.

FAST CAT is innovative in its support for features such as nested tabular structures for data entry, embedded instance matching and vocabulary maintenance processes, and provenance-aware data curation that does not alter the data as transcribed from the original sources. In addition, it is configurable, which means that it can easily be used for digitising and curating other data sources beyond the area of maritime history. To our knowledge, these important features are not currently supported by existing solutions.
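The idea of curation that leaves the transcribed data untouched can be sketched in a few lines of Python. This is an illustrative assumption about how such a separation might look, not a description of how FAST CAT stores its data: the names `Transcription` and `CuratedEntity`, the record identifiers and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import Dict, List


@dataclass(frozen=True)
class Transcription:
    """A value exactly as transcribed from a source (immutable)."""
    record_id: str  # hypothetical identifier of the originating record
    value: str


@dataclass
class CuratedEntity:
    """Curation layer: groups transcribed mentions under one identity."""
    preferred_name: str
    mentions: List[Transcription] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)

    def link(self, mention: Transcription) -> None:
        """Instance matching: declare that a mention refers to this entity."""
        self.mentions.append(mention)

    def source_records(self) -> List[str]:
        """Records to revisit when verifying this entity."""
        return [m.record_id for m in self.mentions]


# Invented placeholder values, for illustration only.
first = Transcription("record-1", "Exmple Harbor")
second = Transcription("record-2", "Example Harbour")

harbour = CuratedEntity(preferred_name="Example Harbour")
harbour.link(first)
harbour.link(second)
harbour.attributes["coordinates"] = "0.0, 0.0"  # enrichment added during curation

try:
    first.value = "Example Harbour"  # the transcription itself cannot be overwritten
except FrozenInstanceError:
    pass
```

In this sketch, corrections and enrichments live only in the curation layer, while each mention keeps its original spelling and a pointer back to the record it came from.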

The transcribed and curated data can then be exploited by external applications, such as the Ship Voyages map application [L3]. Ship Voyages visualises, on an interactive map, the curated data of a set of transcribed ship logbooks of the nineteenth and twentieth centuries (Figure 3). So far, data from fourteen ships are visualised: nine Greek and five Spanish. The user can inspect the routes of particular ships, get more information about a selected ship location, such as date and time, weather conditions, ship course and related events (such as a change of course), or visit the corresponding FAST CAT record to obtain additional contextual information as transcribed from the original sources.

Figure 3: The Ship Voyages map application.

We are currently studying additional methods for exploring and visualising the transcribed and curated data, focusing on how to help historians express complex information needs through intuitive, user-friendly interfaces.

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the European Research Council (ERC) grant agreement No 714437.
Collaborators: Georgios Samaritakis, Kostas Petrakis, Korina Doerr, Athina Kritsotaki (ICS-FORTH)


Please contact:
Pavlos Fafalios, Centre for Cultural Informatics, ICS-FORTH, Greece
+30 2810 391619