Advancing Open Science: Federated Infrastructures and Trustworthy Ecosystems

by the guest editors Leonardo Candela (CNR-ISTI) and Roberto Di Cosmo (Inria and University Paris Cité)

Open Science is a broad and evolving movement. The UNESCO framework describes it as an inclusive approach that aims to make scientific knowledge openly available, accessible, and reusable for everyone, and to open the processes of knowledge creation and evaluation to stakeholders beyond the traditional research community [1]. Today, Open Science is no longer merely a normative ideal: it has become an operational requirement embedded in national strategies, funding conditions, and research assessment reforms across Europe and beyond.

Yet the very breadth of Open Science is also its greatest challenge. The movement rests on several distinct pillars, each with its own history, infrastructure landscape, and degree of maturity — and each exposed to the same structural risk: fragmentation.

The three pillars — and the fragmentation trap
The oldest pillar is open access to publications. Decades of effort have produced undeniable progress, but also a cautionary tale. Because coordination came late, the landscape is now highly fragmented: OpenDOAR counts over 6,000 open access repositories worldwide, each requiring its own infrastructure, archival, backup, and metadata curation. Content is duplicated, metadata is inconsistent, and the cost of maintaining this patchwork is borne many times over. The recent move to fund, via national grants, the EU-originated Open Research Europe journal illustrates how difficult it is to retrofit coherence onto an ecosystem that grew without a shared architectural plan.

The second pillar, open research data, has benefited from the lessons of publications and from the early adoption of the FAIR principles. Yet a similar proliferation of platforms and curation challenges is already visible, with a very long tail of research data that struggles to find a sustainable home. National initiatives such as Recherche Data Gouv in France and the PLATICA project in Spain point toward a promising model: shared, mutualized infrastructures that host curated research data as a public good, rather than leaving each institution to build and maintain its own silo.

The third pillar — research software — has long pre-existed the others, since software has been at the heart of scientific computation for decades. Yet it was recognized as a pillar of Open Science only very recently. The French Second National Plan for Open Science (2021) was the first national strategy to dedicate a full chapter to software, establishing measures for archiving, referencing, and citing source code, creating a national research software award, and providing explicit support for Software Heritage as a key infrastructure [2]. Spain is now actively building on this momentum, as evidenced by the discussions at the recent second national days on Open Science held in Aranjuez in March 2026.

For software, there is a unique opportunity to avoid the fragmentation that has plagued publications and data. Software Heritage was designed from the outset as a universal, open, non-profit archive for all software source code. It already preserves over 28 billion source files from more than 430 million projects, collected across over 5,000 code hosting and distribution platforms worldwide, and assigns each archived artifact an intrinsic, cryptographically strong identifier (the SWHID, now standardized as ISO/IEC 18670). This provides a single, shared layer for archiving, referencing, describing, and citing software: a foundation that Open Science policy can build on directly, without the need to reconcile thousands of independent local repositories after the fact.
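To make the notion of an intrinsic identifier concrete, here is a minimal sketch of how the SWHID of a single file's content can be derived from the bytes alone, with no registry or central authority involved. It assumes the content-object scheme, in which the identifier reuses the hash Git computes for a blob; the function name `content_swhid` is hypothetical and chosen only for illustration. Directories, revisions, and other object types are serialized differently and are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (object type 'cnt').

    Illustrative helper, not part of any official library.
    """
    # Same scheme as a Git blob: the header "blob <length>\0"
    # is prepended to the raw bytes before hashing with SHA-1.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The identifier depends only on the bytes, so anyone can
    # recompute it and check that a cited file is unaltered.
    print(content_swhid(b""))
```

Because the identifier is computed from the content itself, two copies of the same file held on different platforms resolve to the same SWHID, which is what lets a single archive deduplicate and reference code collected from thousands of sources.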

Federating from the top: promise and friction
Alongside bottom-up infrastructure efforts, Europe has invested heavily in top-down coordination through the European Open Science Cloud (EOSC), which aims to federate existing services into an interoperable, cross-border research environment. Several contributions to this issue illustrate both the promise and the complexity of this endeavour.

Yet federation by decree is hard. Even in countries with active EOSC engagement, surveys show that a majority of researchers still store data primarily on personal computers, and awareness of federated infrastructure remains low. The gap between policy ambition and daily research practice is real, and bridging it requires not just technical platforms but sustained investment in skills, incentives, and institutional culture change.

A map of the current landscape
The contributions collected in this special theme offer a cross-section of the current European effort, organised into five thematic clusters.

A first cluster addresses research assessment and scholarly representation. The OpenAIRE Graph (Manghi) provides a community-governed scholarly knowledge graph treating datasets and software as first-class outputs, offering an open alternative to proprietary research intelligence. MyResearchFolio (Amodeo and Xenou) builds on this to support richer researcher profiles aligned with responsible assessment principles, while BibTexViz (Horcas) demonstrates visual analytics for open bibliographic data. The EOSC Open Science Observatory (Szybisty) combines indicators, national narratives, and AI-assisted analysis to monitor Open Science progress across Europe.

A second cluster explores the transition from FAIR data to AI-ready workflows. Contributions show how shared industrial datasets can feed collaborative knowledge pipelines (Gorissen and Brauner), how compute-to-data architectures enable scalable analysis on research infrastructures (Brus et al.), and how modular, open-source research software frameworks can support advanced biomedical analytics (Segura-Ortiz et al.).

A third cluster highlights semantic foundations and knowledge graph infrastructures as critical enablers of interoperability, through the transformation of legacy databases into FAIR-by-design knowledge graphs (Marketakis et al.) and the evolution of the EOSC Interoperability Framework toward machine-actionable, composable service templates (Bardi et al.).

A fourth cluster addresses the governance, skills, sovereignty, and ethical foundations without which technical infrastructure cannot function. Contributions cover human-centred threat modelling (Onofri and Corti), structured co-creation in data spaces (Stampfl and Palkovits-Rauter), Open Science education beyond purely technical skills (Flicker et al.), the Czech national experience with FAIR adoption (Dvořák et al.), the tension between Creative Commons licences and AI training (Spichtinger), and privacy-enhancing technologies for secure cross-border data sharing (Jimenez-Bejarano et al.).

The fifth and final cluster presents operational experiences with federated science gateways, including the EOSC EU Node (Brunschweiger et al.), the Innovation Sandbox (Drago and Fiore), the Data Commons (Fernández and Fava), the ENVRI-Hub for environmental research (Drago et al.), the D4Science virtual research environments (Assante et al.), and the DAVE conversational AI assistant for navigating complex research workflows (Dell'Amico et al.).

Looking ahead
Taken together, these contributions make clear that the next phase of Open Science will be defined not just by openness, but by trustworthiness and integration. Several priorities stand out.

First, avoiding fragmentation must become a conscious design principle, not an afterthought. For each pillar of Open Science — publications, data, and software — the question is whether we build shared, mutualized infrastructure from the start or spend decades trying to harmonize a patchwork.

Second, research assessment must formally recognize the full range of research outputs — datasets, software, workflows — alongside publications, moving away from proprietary metrics toward transparent, community-governed research intelligence.

Third, the intersection of open licensing and AI training remains legally ambiguous. As AI models increasingly consume open research data and code, robust opt-in/opt-out mechanisms and legal clarity are urgently needed.

Finally, long-term financial sustainability for community-governed infrastructure remains an open problem. Short-term project funding cannot secure the digital commons on which European research increasingly depends.

If the first phase of Open Science was about making research outputs accessible, the present phase is about making research ecosystems interoperable, intelligent, and trustworthy. The contributions in this issue offer both concrete experiences and forward-looking perspectives on how Europe is working to make that vision a reality.

References:
[1] UNESCO (2022) An introduction to the UNESCO Recommendation on Open Science. https://doi.org/10.54677/XOIR1696
[2] Second French Plan for Open Science: Generalising Open Science in France, 2021-2024. https://www.ouvrirlascience.fr/second-national-plan-for-open-science/

Please contact:
Leonardo Candela, CNR-ISTI, Italy

Roberto Di Cosmo, Inria and University Paris Cité, France