by Magdalena Brus (EGI Foundation), Ville Tenhunen (EGI Foundation), Gergely Sipos (EGI Foundation), on behalf of the RI-SCALE consortium
European Research Infrastructures generate vast amounts of valuable scientific data, yet reusing these data remains difficult in practice due to the growing gap between data availability and data usability. As datasets reach terabyte and petabyte scale, downloading, storing, and analysing them locally becomes impractical, while configuring suitable computational and artificial intelligence (AI) environments presents significant barriers. This article presents a platform-based approach that brings data, computation, and AI tools and applications together, enabling more effective, transparent, and scalable Open Science.
Research Infrastructures (RIs) play a central role in the European research ecosystem. They produce high-quality, curated datasets that underpin scientific progress in domains such as climate science, atmospheric physics, biomedicine, and imaging. Advances in sensors, cameras, and digital instrumentation have dramatically increased both the volume and complexity of these data, turning many RIs into large-scale data producers.
In parallel, Open Science has become a strategic priority in Europe. Policies promoting open access, FAIR data principles, and reproducible research have significantly improved the availability of research data. However, open availability alone does not guarantee effective reuse. As datasets grow to terabyte and petabyte scale, researchers increasingly face practical barriers related to computing capacity, software environments, and data handling expertise. Downloading large datasets to local machines is often infeasible, while configuring suitable analysis or AI environments remains a major obstacle for many users.
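To give a sense of scale, the following back-of-the-envelope sketch estimates how long a download would take at a given sustained network speed. The link speeds and the `efficiency` factor are illustrative assumptions, not measurements from any Research Infrastructure, and the function name `transfer_days` is hypothetical.

```python
# Rough transfer-time estimate; all figures are illustrative assumptions.
TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def transfer_days(size_bytes: float, link_bits_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link of the given speed.

    efficiency is the assumed fraction of nominal bandwidth actually
    sustained (1.0 means the full nominal rate).
    """
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for label, size in (("1 TB", TB), ("1 PB", PB)):
        days = transfer_days(size, 1e9)  # assumed 1 Gbit/s sustained
        print(f"{label} at 1 Gbit/s: {days:.2f} days")
```

Under these assumptions a single terabyte moves in a little over two hours, whereas a petabyte needs roughly three months of uninterrupted transfer, before local storage and processing are even considered.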
This growing gap between open data and usable data limits the societal and scientific value of RI data holdings. While some RIs offer tailored analysis services or cloud-based environments, these solutions are often domain-specific, difficult to scale, or costly to maintain. A more general, interoperable approach is needed to support Open Science across disciplines and infrastructures.
Recent technological developments provide new opportunities to address this challenge. Cloud and high-performance computing infrastructures, containerisation technologies, interactive notebooks, and federated identity management now make it possible to analyse data where they are hosted rather than moving data to users. At the same time, AI has become a key driver of scientific discovery, enabling new forms of data analysis, pattern detection, and automation across many research domains.
Against this backdrop, the EU-funded RI-SCALE project introduces the concept of Data Exploitation Platforms as a new way to enable scalable reuse of RI data in Open Science. A Data Exploitation Platform (Figure 1) extends an RI’s data holdings with co-provisioned computational services, allowing users to analyse data and develop AI applications directly on scalable computing infrastructures. Rather than replacing existing repositories, these platforms complement them by transforming static data collections into active environments for exploration and reuse.

Figure 1: Conceptual overview of a Data Exploitation Platform connecting Research Infrastructure data holdings with scalable computing resources and AI environments through trusted access and interoperable services.
Data Exploitation Platforms are designed as open, modular systems built on open-source software and open standards. They integrate three main functional elements. First, data lifecycle management services enable trusted replication, caching, and orchestration of datasets from RI repositories onto suitable compute resources. This reduces data movement, improves performance, and supports transparent provenance tracking. Second, computational environments provide ready-to-use tools such as interactive notebooks, data analytics frameworks, and AI toolkits, lowering technical barriers for users. Third, trust and access management mechanisms support federated authentication and authorisation, enabling both open and controlled access to data and computing resources in line with legal, ethical, and policy requirements.
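How two of these elements interact can be illustrated with a minimal sketch. It is not the RI-SCALE implementation: the class names `Dataset`, `ProvenanceRecord` and `Platform`, the method names, and the dataset identifiers are all hypothetical, and the computational environments (the second element) are left out. The sketch only shows an access check preceding data staging, a dataset being replicated once and then served from cache, and each step being recorded as provenance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    identifier: str           # hypothetical ID in an RI repository
    size_bytes: int
    open_access: bool = True  # False means controlled access


@dataclass
class ProvenanceRecord:
    dataset_id: str
    action: str               # "replicated" or "cache-hit"
    site: str


@dataclass
class Platform:
    site: str                 # hypothetical name of the compute site
    authorised_users: set[str] = field(default_factory=set)
    cache: dict[str, Dataset] = field(default_factory=dict)
    provenance: list[ProvenanceRecord] = field(default_factory=list)

    def may_access(self, user: str, dataset: Dataset) -> bool:
        # Trust and access management: open data for everyone,
        # controlled data only for authorised users.
        return dataset.open_access or user in self.authorised_users

    def stage(self, user: str, dataset: Dataset) -> Dataset:
        # Data lifecycle management: replicate once, then reuse the copy.
        if not self.may_access(user, dataset):
            raise PermissionError(
                f"{user} may not access {dataset.identifier}")
        cached = dataset.identifier in self.cache
        action = "cache-hit" if cached else "replicated"
        self.cache.setdefault(dataset.identifier, dataset)
        self.provenance.append(
            ProvenanceRecord(dataset.identifier, action, self.site))
        return self.cache[dataset.identifier]


if __name__ == "__main__":
    platform = Platform(site="compute-site-a", authorised_users={"alice"})
    climate = Dataset("climate-run-001", 5 * 10**12)
    platform.stage("bob", climate)    # first request replicates the data
    platform.stage("alice", climate)  # second request reuses the cache
    print([record.action for record in platform.provenance])
```

In a real deployment the access decision would come from a federated identity provider rather than an in-memory set, and staging would move actual data. The point of the sketch is only the ordering: authorisation first, then replication or cache reuse, with provenance recorded either way.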
By embedding AI frameworks and reusable models directly within the data environment, these platforms support AI-driven Open Science. Researchers can reuse existing models, adapt community-developed solutions, or train new models on large RI datasets without needing to manage complex infrastructure. At the same time, RI operators can apply AI techniques to improve data quality, enhance metadata, and support FAIRification processes. Importantly, provenance information for both data and models can be preserved, supporting transparency and reproducibility.
The Data Exploitation Platform approach is being validated across several European Research Infrastructures in environmental and health sciences. The validation covers domains such as climate modelling, atmospheric research, biobanking, and bioimaging, where data volumes and reuse potential are particularly high. The platforms are also designed to interoperate with emerging European Data Spaces, enabling enrichment of RI data with external sources and supporting cross-domain research while respecting data protection and security constraints.
Beyond technical capabilities, sustainable Open Science depends on skills, governance, and long-term operational models. For this reason, the platform concept is complemented by competence centres and training activities that support researchers, RI staff, and external stakeholders in adopting scalable data and AI solutions. These activities promote knowledge sharing, responsible use of computational resources, and awareness of energy efficiency and environmental impact.
Looking ahead, Data Exploitation Platforms offer a concrete pathway toward more effective Open Science infrastructures in Europe. By bringing computation to data and integrating AI capabilities into trusted, interoperable environments, they help unlock the full value of publicly funded research data. In doing so, they support a transition from data availability to data usability, accelerating discovery and innovation for the benefit of science and society.
Links:
[L1] RI-SCALE project website: https://www.riscale.eu/
[L2] RI-SCALE use cases: https://www.riscale.eu/use-cases
Please contact:
Magdalena Brus, EGI Foundation

