Supporting Open Science in Virtual Research Environments: The DAVE Experience

by Andrea Dell’Amico, Alfredo Oliviero, Giancarlo Panichi, Biagio Peccerillo (CNR-ISTI) and Marco Procaccini (CNR-IGG)

DAVE, a conversational assistant integrated into D4Science Virtual Research Environments, simplifies access to services and supports Open Science workflows through natural language interaction.

Virtual Research Environments (VREs) provide integrated access to data, services, and collaboration tools for open and data-intensive research [3].

Building on the D4Science infrastructure, which supports over 230 VREs and around 28,000 users worldwide, our work on conversational agents has led to the development of DAVE (D4Science Assistant for Virtual Research Environments), a system designed to assist researchers directly within their workflows. The D4Science infrastructure and its VREs are described in detail in the article by Assante et al. in this issue.

The development of DAVE followed an iterative, user-centred approach guided by four key requirements:

(a) flexibility and extensibility: DAVE adopts a highly modular architecture whose components can be selected, replaced, and extended as needed, so that it can adapt to an evolving infrastructure;
(b) context-awareness: it relies on a rich and extensible knowledge base that supports heterogeneous research domains and aligns with open science ecosystems;
(c) openness and explainability: the system is transparent about both its behaviour and the knowledge sources behind its responses, which supports trust, reproducibility, and responsible reuse in line with open science values;
(d) security and trustworthiness: privacy and data protection meet the standards of the underlying infrastructure, while open science practices that balance accessibility with the safeguarding of sensitive research data remain supported.

The path to DAVE involved three successive prototypes [1]. The first, Janet (early 2023), was a modular agent built as a pipeline of fine-tuned components; it proved limited in flexibility and robustness, and costly to develop and maintain. The second, the D4Science AI Agent (late 2024), adopted the Cheshire Cat framework and moved to a single-agent, multi-tool model. Although this improved modularity, the architecture proved less suitable for VREs, where diverse services and communities require numerous specialised capabilities: concentrating all functionality in one agent led to complex prompt engineering, which increased operational costs and diluted contextual relevance.

DAVE takes a markedly different approach: a highly modular multi-agent, multi-tool system built on the Google Agent Development Kit (ADK) framework. The architecture, shown in Figure 1, centres on a VRE Assistant, a central orchestrator that interprets user requests, plans the sequence of actions, and delegates specific tasks to specialised sub-agents. These sub-agents are tightly integrated with the D4Science services available within VREs:

The Workspace Agent allows users to browse, discover, and summarise scientific documents stored in shared folders;
The Catalogue Agent facilitates the discovery and exploitation of research artefacts (datasets, software, publications);
The Social Agent summarises community activities and interactions;
The CCP Agent interfaces with the D4Science Cloud Computing Platform, managing the execution of analytics and ensuring the repeatability and reproducibility of research methods.

Figure 1: DAVE general architecture, designed as a system of specialised agents coordinated by a VRE Assistant.
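
The delegation pattern described above can be illustrated with a minimal, framework-independent sketch. It does not use the Google ADK API and is not DAVE's implementation: the class names (`SubAgent`, `VREAssistant`), the keyword lists, and the stub handlers are all hypothetical, and simple keyword matching stands in for the language-model step that interprets a request and plans the actions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubAgent:
    """Hypothetical specialised agent: a name, trigger keywords, and a handler."""
    name: str
    keywords: List[str]
    handle: Callable[[str], str]


class VREAssistant:
    """Toy orchestrator that selects sub-agents for a request and delegates to them."""

    def __init__(self, sub_agents: List[SubAgent]) -> None:
        self.sub_agents = sub_agents

    def plan(self, request: str) -> List[SubAgent]:
        # Assumption: keyword matching replaces the LLM-based interpretation step.
        text = request.lower()
        return [a for a in self.sub_agents if any(k in text for k in a.keywords)]

    def run(self, request: str) -> Dict[str, str]:
        # Delegate to every selected sub-agent and collect the answers by agent name.
        return {a.name: a.handle(request) for a in self.plan(request)}


def build_assistant() -> VREAssistant:
    # Handlers are stubs; real agents would call the corresponding D4Science services.
    return VREAssistant([
        SubAgent("workspace", ["folder", "document"], lambda r: "workspace: stub result"),
        SubAgent("catalogue", ["dataset", "software", "publication"], lambda r: "catalogue: stub result"),
        SubAgent("social", ["post", "discussion"], lambda r: "social: stub result"),
        SubAgent("ccp", ["execute", "method"], lambda r: "ccp: stub result"),
    ])
```

In this sketch, a request mentioning both datasets and the execution of a method is routed to the catalogue and CCP stubs, while a request that matches no keyword is delegated to no one. Keeping each capability behind its own agent is what allows a new service to be added without touching the others.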

DAVE simplifies the application of FAIR principles by supporting the discovery and reuse of research artefacts across the scholarly lifecycle. Through natural language interaction, researchers can explore the VRE Catalogue, which aggregates datasets, software, and publications, and obtain coherent overviews without navigating complex metadata or multiple interfaces. DAVE also strengthens the collaborative and methodological dimensions of research. By interfacing with the VRE social networking services, it provides contextual summaries of community activities and interactions, linking data with the ongoing scientific discussion. At the same time, it supports transparency and reproducibility by guiding researchers in preparing, executing, and documenting analytical methods within the Cloud Computing Platform, making algorithms more accessible, executable, and reusable.
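
One way to picture what documenting a method execution for repeatability might involve is a record of the method, its version, and its inputs, together with a deterministic fingerprint of that record. The sketch below is purely illustrative: `ExecutionRecord`, its fields, and the fingerprinting scheme are assumptions made for this example, not the Cloud Computing Platform's data model or API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical description of one method execution (not the CCP data model)."""
    method_id: str
    method_version: str
    inputs: Dict[str, Any]

    def fingerprint(self) -> str:
        # Serialise with sorted keys so that the same method, version, and inputs
        # always produce the same digest, whatever the order of the input keys.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Two runs described by records with equal fingerprints used the same method version and the same inputs, which is a precondition for comparing their outputs when checking repeatability.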

In conclusion, DAVE demonstrates how conversational agents can advance open science by making VREs more accessible and transparent. By supporting the discovery, sharing, and reuse of research artefacts and by enabling reproducible methods, DAVE embeds FAIR principles directly into researchers’ workflows, fostering open, trustworthy, and collaborative science.

Links:
[L1] D4Science website: www.d4science.org
[L2] DAVE Live Demonstration: https://services.d4science.org/web/collab/

References:
[1] M. Assante et al.: “Deploying Conversational Agents in Virtual Research Environments: Approaches and Lessons Learned”, SN Computer Science, 2026, in press.
[2] M. Assante et al.: “Enacting open science by D4Science”, Future Generation Computer Systems, Volume 101, 2019, Pages 555–563. https://doi.org/10.1016/j.future.2019.05.063
[3] L. Candela et al.: “Virtual research environments: an overview and a research agenda”, Data Science Journal, Volume 12, 2013, Pages GRDI75–GRDI81. https://doi.org/10.2481/dsj.GRDI-013

Please contact:
Biagio Peccerillo
CNR-ISTI, Italy