Reinforcing Open Science in Biodiversity through Semantic Knowledge Graphs

by Yannis Marketakis, Eleni Tsouloucha, Athina Kritsotaki, and Yannis Tzitzikas (FORTH-ICS)

Transforming FishBase into a semantic knowledge graph shows how legacy biodiversity databases can become FAIR, interoperable infrastructures that enable reuse and integration in Open Science.

Open Science is one of the cornerstones of European research policy, promoting transparency, accessibility, reuse, and collaboration across scientific domains. While many scientific communities have made progress in opening data, experience shows that availability alone does not guarantee meaningful integration, reuse, or long-term sustainability. Addressing this challenge requires data infrastructures that go beyond simple access: they must support semantic integration, machine-readable interfaces, and evolution over time, so that data can be combined, queried, and reused across domains.

Biodiversity research provides a compelling example of these challenges. FishBase, one of the most widely used global databases on fish species, has long played a crucial role in supporting research, education, and policy. Its contents are openly accessible and continuously curated, making it an important resource for the biodiversity community. However, like many other legacy databases, FishBase was originally designed as a standalone information system. As a result, integrating its data with other research resources, performing complex cross-domain queries, or reusing its content in new contexts often requires manual effort.

This experience highlights a broader Open Science challenge: how to evolve well-established, widely used open databases into FAIR-by-design data infrastructures. In what follows, FishBase serves as a concrete example of how semantic knowledge graphs, open APIs, and evolution workflows can reinforce Open Science by enabling interoperability, reuse, and long-term sustainability of biodiversity data, as demonstrated in recent Open Science efforts around biodiversity knowledge graphs [1]. Figure 1 provides an overview of this transformation and its role in enabling integration with external Open Science infrastructures.

Figure 1: The transformation workflow of FishBase from a relational database into a FAIR-by-design semantic knowledge graph, supported by evolution workflows and open APIs enabling interoperable access and integration with external Open Science infrastructures.

Applying FAIR data principles in practice requires more than making data available online. Data must be described in a structured and semantically explicit manner, accessed through standardized interfaces, and supported by mechanisms that ensure consistency and sustainability as the data evolve over time. This also directly supports reproducibility and transparency, since datasets, structures, and relationships can be inspected, queried, and recombined in a verifiable, machine-interpretable manner. FAIR data management is therefore not an afterthought but a design choice that must be embedded in the data infrastructure and its governance policy.

The experience with FishBase clearly illustrates these requirements. Although FishBase has long been an open and continuously curated resource, its original design as a conventional database limits cross-domain integration and large-scale reuse. Integrating its data with external biodiversity resources, answering complex queries that require combining different pieces of information, or supporting data reuse typically requires manual effort.

The transformation of FishBase into a semantic knowledge graph, carried out within the SemantyFish initiative [1] [L1], addresses these limitations by providing a FAIR-by-design representation of biodiversity data. Species, biological characteristics, ecology, population dynamics, life cycle and history, distribution, and other resources are modelled as semantically defined entities connected through explicit relationships. This makes the data machine-interpretable and easier to integrate across domains. Ontologies play a central role here, offering shared conceptual models that ensure semantic consistency and interoperability with other biodiversity infrastructures.
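To make "semantically defined entities connected through explicit relationships" concrete, the following minimal Python sketch turns a row of a relational table into subject-predicate-object triples. It is an illustration only, not the SemantyFish mapping: the namespace `EX`, the column names, the property names, and the helpers `COLUMN_TO_PROPERTY` and `row_to_triples` are all hypothetical, and the sample row is illustrative rather than taken from FishBase. Only the `rdf:type` IRI is the standard W3C one.

```python
from typing import NamedTuple

# Hypothetical namespace, for illustration only (not the SemantyFish ontology).
EX = "http://example.org/fish/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard rdf:type IRI

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical mapping from relational column names to ontology properties.
COLUMN_TO_PROPERTY = {
    "scientific_name": EX + "hasScientificName",
    "family": EX + "belongsToFamily",
    "habitat": EX + "livesIn",
}

# Columns whose values become linkable entities rather than plain literals.
ENTITY_COLUMNS = {"family", "habitat"}

def row_to_triples(row: dict) -> list[Triple]:
    """Map one table row to triples; the primary key becomes the species IRI."""
    subject = f"{EX}species/{row['species_id']}"
    triples = [Triple(subject, RDF_TYPE, EX + "Species")]
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value is None:
            continue  # a missing value produces no triple instead of an empty literal
        if column in ENTITY_COLUMNS:
            # A shared IRI means every row with the same family or habitat
            # points at the same node, which is what makes the data linkable.
            value = f"{EX}{column}/{value}"
        triples.append(Triple(subject, prop, value))
    return triples

if __name__ == "__main__":
    row = {"species_id": 1, "scientific_name": "Gadus morhua",
           "family": "Gadidae", "habitat": "marine"}
    for t in row_to_triples(row):
        print(t)
```

The design choice worth noting is the split between literals (a scientific name) and entities (a family or habitat): only the latter can be traversed and linked to external resources.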

Access and reuse are further enhanced through open APIs, which expose the semantic knowledge graph via standardized, programmatic interfaces. This layer enables both human-oriented applications and automated services and workflows to access up-to-date data in a structured, reusable form, without relying on static exports or other ad hoc solutions.
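Programmatic access to a knowledge graph is commonly offered through a SPARQL endpoint following the W3C SPARQL 1.1 Protocol. The article does not say whether the SemantyFish APIs take exactly this form, so the following Python sketch is an assumption: it builds a standard query-via-GET request and flattens a response in the standard SPARQL JSON results format. The endpoint URL and the function names `build_sparql_request` and `parse_bindings` are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint; not the actual SemantyFish service address.
ENDPOINT = "https://example.org/sparql"

def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (without sending) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask the server for the standard SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one {variable: value} dict per row."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Variables left unbound (e.g. by OPTIONAL) are simply absent from a binding.
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows

if __name__ == "__main__":
    req = build_sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
    # Against a real endpoint, the request would be sent with:
    # with urllib.request.urlopen(req) as resp:
    #     print(parse_bindings(resp.read().decode("utf-8")))
```

Because the query travels as a plain HTTP request and the answer comes back as JSON, the same interface serves a web front end, a notebook, or an automated pipeline without any static export.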

Finally, evolution workflows keep the semantic representation aligned with updates to the original FishBase database, without requiring changes to FishBase’s existing data management or update policies. As new data are added or existing data are revised, changes are systematically propagated to the knowledge graph, preserving accuracy and trustworthiness over time. Together, semantic modelling, knowledge graph construction, open APIs, and evolution workflows show how a widely used biodiversity database can be transformed into a FAIR, sustainable infrastructure that supports Open Science.
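The article does not detail how changes are detected and propagated. One simple approach, assumed here purely for illustration, is to compare successive snapshots of the triples generated from the database and render the difference as a SPARQL 1.1 Update request using `DELETE DATA` and `INSERT DATA`. The function names are hypothetical, and `term` uses a deliberately crude heuristic (strings starting with `http://` or `https://` are treated as IRIs, everything else as plain literals) that a real pipeline would replace with proper RDF term typing.

```python
Triple = tuple[str, str, str]

def compute_delta(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (additions, deletions) that turn snapshot `old` into snapshot `new`."""
    return new - old, old - new

def apply_delta(graph: set[Triple], additions: set[Triple], deletions: set[Triple]) -> set[Triple]:
    """Remove deletions, then add additions; the input set is left unchanged."""
    return (graph - deletions) | additions

def term(value: str) -> str:
    """Crude heuristic for this sketch: http(s) strings are IRIs, all else plain literals."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Escape backslashes first, then quotes and newlines, for a valid string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_sparql_update(additions: set[Triple], deletions: set[Triple]) -> str:
    """Render the delta as one SPARQL 1.1 Update request, deletions first."""
    def block(triples: set[Triple]) -> str:
        return " ".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples))
    return f"DELETE DATA {{ {block(deletions)} }} ;\nINSERT DATA {{ {block(additions)} }}"
```

With this scheme a revised value, such as a corrected habitat, appears as exactly one deletion plus one addition, so unchanged triples are never rewritten and the source database's own update process is left untouched.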

This transformation enables forms of sharing and reuse that were previously difficult to achieve. The explicit representation of entities and relationships allows researchers to formulate complex queries that traverse species, ecological traits, habitats, and references, supporting biodiversity research and analyses that go beyond predefined database views. At the same time, machine-interpretable semantic resources make it possible to programmatically combine FishBase data with external datasets, facilitating automated workflows and interoperability, as demonstrated in cross-domain knowledge graph applications for sustainable aquafood communication [2].
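A query that traverses species, traits, habitats, and references corresponds to a SPARQL basic graph pattern such as `SELECT ?sp ?ref WHERE { ?sp ex:livesIn ex:reef . ?sp ex:hasTrait ex:herbivore . ?sp ex:citedIn ?ref }`. To show what evaluating such a pattern involves, the following toy Python sketch performs an in-memory nested-loop join over triples. The `ex:` identifiers and property names are hypothetical, and a production triple store would use indexes and query optimisation rather than scanning every triple for each pattern.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # a variable
            bound = result.get(p_term)
            if bound is None:
                result[p_term] = t_term
            elif bound != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return result

def query(graph: set[Triple], patterns: list[Triple]) -> list[Binding]:
    """Join the patterns left to right (nested-loop join) and return all solutions."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

if __name__ == "__main__":
    graph = {
        ("ex:sp1", "ex:livesIn", "ex:reef"),
        ("ex:sp1", "ex:hasTrait", "ex:herbivore"),
        ("ex:sp1", "ex:citedIn", "ex:ref42"),
        ("ex:sp2", "ex:livesIn", "ex:reef"),
        ("ex:sp2", "ex:hasTrait", "ex:carnivore"),
    }
    patterns = [
        ("?sp", "ex:livesIn", "ex:reef"),
        ("?sp", "ex:hasTrait", "ex:herbivore"),
        ("?sp", "ex:citedIn", "?ref"),
    ]
    print(query(graph, patterns))  # → [{'?sp': 'ex:sp1', '?ref': 'ex:ref42'}]
```

The shared variable `?sp` is what links the three patterns: each one narrows the candidate species, which is exactly the kind of multi-hop question that a fixed, predefined database view cannot express.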

The experience with FishBase illustrates how rethinking data infrastructures can strengthen Open Science practices. Designing existing scientific data resources for interoperability, reuse, and evolution proved essential for enabling meaningful data sharing without disrupting established practices. Looking ahead, such approaches can provide sustainable foundations for Open Science more broadly, including cross-domain research.

Link:
[L1] https://semantyfish.github.io/

References:
[1] “Connecting fish data for open and sustainable science”, EuroFish Magazine, Issue 6, 2025. https://eurofish.dk/connecting-fish-data-for-open-and-sustainable-science/
[2] Y. Marketakis et al.: “Construction, Access and Application of a Knowledge Base for Sustainable Aquafood Communication”, 14th International Joint Conference on Knowledge Graphs, Heraklion, Greece, 2025.

Please contact:
Yannis Marketakis
FORTH-ICS, Greece