by Laurent Romary (Inria)
Despite some concerns within the scientific community, open access has the potential to offer a wealth of information for the strategic management of research.
Recent years have seen an increase in initiatives intended to foster open access, i.e. the free dissemination and re-use of scientific publications, at institutional, national and European levels. The European Union is taking a strong stance on the issue, making open access mandatory in its H2020 program and funding the OpenAIRE initiative to coordinate publication repositories all over the continent. Several countries have also taken important legislative initiatives, with the “Zweitveröffentlichung” principle in Germany and its equivalent in the recent “Loi pour une République Numérique” in France. Moreover, several institutions, such as Inria (FR), CentraleSupélec (FR) and the University of Liège (BE), have set up a “deposit mandate”, whereby annual reporting is entirely based upon the scholarly publications that are made public in designated open publication repositories (e.g., HAL in France).
Whilst this clearly represents major progress for the fast and free dissemination of scientific knowledge, there are concerns within the scientific community that their activity and results may be scrutinised in an inappropriate way, and in particular that researchers’ assessments may come to be based upon numerical impact measures computed from the material in publication repositories. Scientists also worry that material put online at an early stage may be of poorer quality, and that they risk being plagiarised, even though open access is, on the contrary, the best defence against plagiarism. It is vital that we begin to balance these concerns by acquiring an understanding of the ways in which science and scientists may benefit from an ambitious open access policy, and that we communicate these benefits to the scientific community. For instance, thorough coverage of the publication corpus of an institution, or of an entire country, could provide a means to analyse co-publication patterns, and in doing so could help identify collaboration strengths to be fostered or weaknesses to be improved. Text and data mining techniques can also provide mechanisms to identify expertise in a given research landscape or emerging topics within a given field. At the end of the day, open scientific information could be used in a whole range of ways that we have yet to invent and that we must handle ethically, in compliance with the DORA principles, for instance.
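To make the co-publication idea concrete, here is a minimal sketch in Python, assuming that a repository can export one record per publication listing the institutions of its authors. The record layout, the institution names and the function `co_publication_counts` are all hypothetical and chosen for illustration only; a real analysis would depend on curated affiliation data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: one entry per publication, with the institutions
# of its authors. Names and layout are illustrative assumptions.
publications = [
    {"title": "Paper 1", "institutions": ["Institute A", "Institute B"]},
    {"title": "Paper 2", "institutions": ["Institute A", "Institute B", "Institute C"]},
    {"title": "Paper 3", "institutions": ["Institute A"]},
]


def co_publication_counts(records):
    """Count how many publications each pair of institutions shares."""
    counts = Counter()
    for record in records:
        # Deduplicate and sort so that each pair has one canonical form.
        unique = sorted(set(record["institutions"]))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


pair_counts = co_publication_counts(publications)
for (first, second), shared in pair_counts.most_common():
    print(f"{first} + {second}: {shared} shared publication(s)")
```

Pairs with high counts point to established collaborations, while pairs that are absent or rare may indicate links worth developing. The counts are only as reliable as the affiliation information behind them.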
Right now we have even more pressing challenges, though, including: a) the possible fragmentation of the publication corpus and b) the actual quality of the available information. The first issue concerns our capacity to form a coherent picture of the publication landscape, which could be built upon a network of interoperable publication repositories where content, as well as access or download information, can be used, re-used and above all mined under open licences. Relying on private third parties (under the “gold open access model”) carries with it the risk that we may be left with diminished re-use possibilities for our own publications, which may end up spread across various publishers’ servers.
The second issue is even more central when considering publication information as the basis for the assessment and design of scientific policies. We need to ensure that the documentation associated with each publication, comprising author identification, affiliations and precise publication information, is recorded and curated in a way that guarantees a trusted background for further analyses. For instance, it is important to maintain proper (open) authorities for research institutions upon which affiliation information can be based. In this domain we need to determine how much we can rely on third-party initiatives such as ORCID, where publishers may have too strong a voice.
If all the conditions are met for a trusted information corpus related to scientific publication, this could conceivably form the basis for an open, transparent process for research assessment exercises. In fact, this is exactly what the French assessment institution Hcéres is proposing, with their announcement in early 2018 that future assessment campaigns would rely entirely on the French national publication repository HAL to disseminate a) reference publications, as selected by the research organisations under assessment, as the sole source of information for reviewers and b) the self-assessment reports themselves, so that this information is made available to the wider research community and the public.
In conclusion, if we are serious about the issue of open access to publications - and probably to research data too - we could imagine setting up a visibility index that would relate the proportion of openly available and re-usable content in a public repository to the actual research output of an individual, team, laboratory or institution. While not necessarily useful for “assessing” scientific outputs, such an index could encourage scholars to take an interest in the wider dissemination of their own research.
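One possible reading of such a visibility index is a simple ratio: the number of outputs that are both deposited in a public repository and re-usable, divided by the total number of outputs. The Python sketch below assumes this reading. The field names `in_public_repository` and `reusable_licence`, the function `visibility_index` and the sample records are hypothetical; how "re-usable" and "actual research output" should be defined is left open here.

```python
def visibility_index(outputs):
    """Return the share of outputs that are openly available and re-usable.

    Each output is a dict with two boolean fields (assumed for this sketch):
    'in_public_repository' and 'reusable_licence'.
    Returns None when there is no recorded output, since the ratio is undefined.
    """
    if not outputs:
        return None
    open_count = sum(
        1
        for output in outputs
        if output["in_public_repository"] and output["reusable_licence"]
    )
    return open_count / len(outputs)


# Hypothetical output list for one team; values are illustrative only.
team_outputs = [
    {"in_public_repository": True, "reusable_licence": True},
    {"in_public_repository": True, "reusable_licence": False},
    {"in_public_repository": False, "reusable_licence": False},
    {"in_public_repository": True, "reusable_licence": True},
]

index = visibility_index(team_outputs)
print(f"Visibility index: {index:.2f}")
```

The same function applies unchanged to an individual, a laboratory or an institution, as only the list of outputs differs. The denominator is the hard part in practice, because it requires a complete record of what was actually published.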
At the end of the day, if open access policies are to be used for any form of assessment, the goal of that assessment should be to assist the researchers themselves, providing them with appropriate data and tools to reflect on their own research trends, practices and, possibly, impact. This implies, in addition to the factors that we have touched on above, that openness also applies to research data, expressed in formats compliant with international standards, and to tools, developed and distributed according to open source principles.
S. Harnad: “Open access scientometrics and the UK Research Assessment Exercise”, Scientometrics, 79(1), 147-156, 2008.
L. Romary, C. Armbruster: “Beyond institutional repositories”, International Journal of Digital Library Systems, 1(1), 44-61, 2010. 〈hal-00399881〉
Laurent Romary, Inria, France