by Catherine Jones, Brian Matthews and Antony Wilson
Data publication and sharing are becoming accepted parts of the data ecosystem to support research, and this is becoming recognised in the field of ‘facilities science’. We define facilities science as that undertaken at large-scale scientific facilities, in particular neutron and synchrotron x-ray sources, although similar characteristics can also apply to large telescopes, particle physics institutes, environmental monitoring centres and satellite observation platforms. In facilities science, a centrally managed set of specialized and high value scientific instruments is made accessible to users to run experiments which require the particular characteristics of those instruments.
The institutional nature of the facilities, with the provision of support infrastructure and staff, has allowed the facilities to support their user communities by systematically providing data acquisition, management, cataloguing and access. This has been successful to date; however, as the expectations of facilities users and funders develop, this approach has its limitations in the support of validation and reuse, and thus we propose to evolve the focus of the support provided.
A research project produces many outputs during its lifespan; some are formally published, some relate to the administration of the project and some relate to the stages in the process. Changes in culture are encouraging the re-use of existing data, which means that data should be kept, and remain discoverable and usable, for the long term. For example, a scientist wishing to reuse data may have discovered it through a journal article; but to be able to reuse this data they will also need to understand the analysis done to produce the data behind the publication. This reuse may happen years after the original experiment was undertaken; to make it possible, the data object and its context must be preserved from the start.
We propose that instead of focussing on traditional artefacts such as data or publications as the unit of dissemination, we elevate the notion of experiment or ‘investigation’ to a first class object of discourse, which can be managed, published and cited in its own right. An investigation is an aggregation of the artefacts and supporting metadata surrounding a particular experiment on a facility. By providing this aggregate ‘research object’, we can provide information at the right level to support validation and reuse, by capturing the context for a given digital object and also preserving that context over the long term for effective preservation.
In the SCAPE project, STFC has built on the notion of a Research Object, which enables the aggregation of information about research artefacts. These are usually represented as a Linked Data graph; thus RDF is used as the underlying model and representation, with a URI used to uniquely identify artefacts, and OAI-ORE used as an aggregation container, with standard vocabularies for provenance, citation and facilities investigations. Within the SCAPE project, the focus of the research lifecycle is the experiment undertaken at the ISIS Neutron Spallation Facility. By following the lifecycle of a successful beam line application, we can collect all the artefacts and objects related to it, with their appropriate relationships. As this is strongly related to allocation of the resources of the facility, this is a highly appropriate intellectual unit for the facility; the facility wants to record and evaluate the scientific results arising from the allocation of its scarce resources.
In this process, we provide the data in the context in which it has been collected, and thus we are using the data provenance in the broad sense: who has undertaken the experiment and why, and how the data has subsequently been processed to provide derived data products and presentations.
Figure 1: Schematic of an Investigation Research Object.
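To make the aggregation idea concrete, the following is a minimal sketch in Python (standard library only) of how an investigation might be expressed as a small set of RDF triples. It is an illustration under stated assumptions, not the SCAPE or STFC implementation: the `example.org` URIs, the artefact names and the helper functions `build_investigation` and `to_ntriples` are hypothetical, and the choice of the W3C PROV term `wasDerivedFrom` for provenance is an assumption, since the text does not name the vocabularies used. Only the OAI-ORE terms `Aggregation` and `aggregates` correspond directly to the container mentioned above.

```python
# Namespaces of published vocabularies.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
PROV = "http://www.w3.org/ns/prov#"  # assumption: PROV chosen for illustration

# Hypothetical base URI; not a real facility identifier scheme.
BASE = "http://example.org/investigation/1234/"


def build_investigation(base, artefacts, derivations):
    """Return a set of (subject, predicate, object) URI triples.

    One ORE aggregation groups every artefact of the investigation;
    each (derived, source) pair adds a provenance link between artefacts.
    """
    aggregation = base + "aggregation"
    triples = {(aggregation, RDF + "type", ORE + "Aggregation")}
    for name in artefacts:
        triples.add((aggregation, ORE + "aggregates", base + name))
    for derived, source in derivations:
        triples.add((base + derived, PROV + "wasDerivedFrom", base + source))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


# Hypothetical artefacts gathered over the lifecycle of one experiment.
ARTEFACTS = ["proposal", "raw-data", "derived-data", "publication"]
DERIVATIONS = [("derived-data", "raw-data"), ("publication", "derived-data")]

graph = build_investigation(BASE, ARTEFACTS, DERIVATIONS)
print(to_ntriples(graph))
```

Because every artefact is identified by a URI and attached to a single aggregation, the investigation as a whole can be referenced through one identifier, while the provenance links record how each derived product relates to the data it came from.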
Within SCAPE, STFC has explored how to construct and maintain Investigation Research Objects as part of the Facilities scientific lifecycle, using them to prototype a notion of a ‘Data Journal’, so that Experiments can be published with their full context and supporting artefacts. These can then be used to form an archival information package for preservation of the experiment in context. Further work combines the Open Annotation Model with an ontology of experimental techniques to provide indexing of the investigations.
Future work will consider how to bring this notion into practice, particularly in support of publication data within the article preparation process, automatically combining data from different sources and relating these to the original experiment.
 V. Bunakov, C. Jones, B. Matthews: “Towards the Interoperable Data Environment for Facilities Science”, in Collaborative Knowledge in Scientific Research Networks, ed. P. Diviacco et al., 127-153 (2015), doi:10.4018/978-1-4666-6567-5.ch007
 B. Matthews, V. Bunakov, C. Jones, S. Crompton: “Investigations as research objects within facilities science”, in 1st Workshop on Linking and Contextualizing Publications and Datasets, Malta, 26 Sep 2013.
Brian Matthews, STFC, UK