by Robert Viseur and Nicolas Devos
The term ‘open data’ refers to “information that has been made technically and legally available for reuse”. Open data is currently a hot topic for a number of reasons, namely: the scientific community is moving towards reproducible research and sharing of experimental data; the enthusiasm, especially within the scientific community, for the semantic web and linked data; the publication of datasets in the public sector (e.g., geographical information); and the emergence of online communities (e.g., OpenStreetMap). The open data movement engages the public sector, as well as business and academia. The motivation for opening data, however, varies among interest groups.
The publication of scientific data is the logical continuation of open access initiatives. Such initiatives, e.g., the Budapest Open Access Initiative, aim to provide open access to scientific publications, i.e., access without financial, legal or technical barriers. Three criteria must be met for data to be considered open: (i) access to publications should be free; (ii) the data should not be protected by digital rights management (digital handcuffs) or scrambled; and (iii) access policies should be clearly communicated and allow the copying, distribution and modification of data.
Murray-Rust notes that the lack of a consistent definition of the terms ‘open’ and ‘open access’ may cause confusion. The term ‘open’ should imply that the scientific data belongs to the scientific community and is made freely available. Access to scientific publications and their related data is valuable both for validating the research and for reusing the data in new studies that the original authors may or may not have foreseen. Data sharing is more common in some scientific disciplines, such as the biosciences, where data is published and consolidated into databases. Nevertheless, some publishers aggressively defend their copyright and oppose new scientific practices that publicly disseminate research results together with their source code, data structures, experimental design, parameters, documentation and figures.
Figure 1: An extract from Insight illustrating the automatic testing and contribution process.
The journal “Insight” shows how open access, open source and open data could change scientific practice. “Insight” is a peer-reviewed online publication associated with the Insight Segmentation and Registration Toolkit (ITK), an open source software tool for image analysis supported by the company Kitware. As usual, each article presents its scientific results, but it is also accompanied by the source code and the data in order to make the research reproducible (‘reproducible research’). The technical infrastructure automates source code compilation and testing (Figure 1). Several authors have developed the idea of ‘executable papers’, based on Linked Data and cloud infrastructure.
Governments can also foster open access to data, for instance by imposing it on research units. The UK Department for International Development did so in 2012, when it opened access to development research data in order to stimulate innovation.
Several issues will have to be addressed before open access to scientific data can spread. Few peer-reviewed journals support open data or have the technical infrastructure needed to automate code compilation, execution and testing. Moreover, high-ranking journals benefit from their advantageous market position and have no incentive to develop new publication methods. The mobilization of the entire scientific community, its commitment to playing the openness game, and its support for high-quality innovative journals will therefore be fundamental to the success of these new collaborative practices.
Insight Segmentation and Registration Toolkit: http://www.itk.org
Open Knowledge Foundation: http://www.okfn.org
Panton Principles for Open scientific data: http://www.pantonprinciples.org
J. Jomier, A. Bailly, M. Le Gall, et al.: “An open-source digital archiving system for medical and scientific research”, Open Repositories, 2010, vol. 7.
T. Kauppinen, G.M. de Espindola: “Linked open science: communicating, sharing and evaluating data, methods and results for executable papers”, Procedia Computer Science, 2011, vol. 4, pp. 726-731.
V. Stodden: “The legal framework for reproducible scientific research: Licensing and copyright”, Computing in Science & Engineering, 2009, vol. 11, no. 1, pp. 35-40.