This section was edited in cooperation with Informatics Europe. Guest editors: Hélène Kirchner (Inria) and Fabrizio Sebastiani (ISTI-CNR)
Research evaluation, as applied to individual researchers, university departments, or research centres, plays a crucial role in recognising and supporting research that can lead to advances in knowledge and benefits to society.
Evaluation can have a tremendously positive effect in improving research quality and productivity. At the same time, following wrong criteria or practices in research evaluation can have seriously negative long-term effects, potentially on several generations of researchers. These negative effects range from demotivating researchers who, despite performing good-quality research, get evaluated negatively, to wrongly promoting unworthy research endeavours at the expense of worthy ones.
To achieve the positive effects of research evaluation, its specific goals must be clearly formulated upfront, and it must be performed in a transparent way that is aligned with these goals. The evaluation should follow established principles and practical criteria, known and shared by evaluators, researchers, and the broader scientific community.
Research evaluation should take into account the specificities of Informatics as a new science and technology, and its rapid and pervasive evolution. Informatics Europe recently published a report on this topic entitled “Informatics Research Evaluation”. The report focuses on research evaluation performed to assess individual researchers, typically for promotion or hiring. Its recommendations and conclusions are outlined in the first article of this section.
The Informatics Europe report also raises several questions on the publication culture and its evolution, on the importance of artifacts, and on impact assessment for this scientific domain. Several contributions in this special section are devoted to exploring these questions.
How to evaluate the quality and impact of publications?
Dino Mandrioli (Politecnico di Milano) addresses the conferences vs. journals controversy. He explains why focusing on conference publications, rather than on journal articles, may be harmful for research quality and evaluation in informatics, and he gives serious arguments in favour of journals. He argues that conferences should go back to their original, authentic goal, i.e., circulation and discussion of ideas; journals, in his view, then remain the best and natural (although not the only) medium for publication of research results, and research evaluations should treat them accordingly. The recent trend towards the tight coupling of conferences and journals, with conference papers appearing in a journal, or conferences incorporating journal-first papers into their program, could reconcile the divergent views.
Stefano Mizzaro (University of Udine) argues in favour of an alternative to traditional peer review that relies on crowdsourcing, i.e., that exploits data and information from fellow researchers who read, spread, comment on, and cite papers. The idea is to compute a quality index of papers based on the post-publication comments they receive; in turn, this allows the computation of quality indexes for researchers as authors and researchers as readers, via algorithms in which these indexes mutually reinforce each other. Although it is unclear whether crowdsourced peer review will improve the quality of the scientific literature, proposals such as this warrant further study. The development of public archives and the concept of overlay publication might offer an opportunity to experiment with this idea.
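As a rough illustration of how such indexes can reinforce each other, and not of the specific algorithm proposed in the article, consider the following minimal Python sketch. Everything in it is an assumption made for the example: the function name `reinforce`, the representation of comments as numeric scores in [0, 1], the update rules, and the number of iterations. A paper's quality is the mean of the scores it received, weighted by the reliability of the readers who gave them; a reader's reliability grows as their scores agree with the current paper qualities; an author's index is simply derived at the end as the mean quality of their papers.

```python
from collections import defaultdict


def reinforce(ratings, authors_of, iterations=20):
    """Toy mutual-reinforcement loop (illustrative assumption, not the article's method).

    ratings: list of (reader, paper, score) with score in [0, 1].
    authors_of: dict mapping each paper to a list of its authors.
    Returns (paper_quality, author_quality, reader_weight).
    """
    readers = {reader for reader, _, _ in ratings}
    reader_weight = {reader: 1.0 for reader in readers}  # all readers start equal
    paper_quality = {}

    for _ in range(iterations):
        # Paper quality: mean of received scores, weighted by reader reliability.
        weighted_sum = defaultdict(float)
        weight_total = defaultdict(float)
        for reader, paper, score in ratings:
            weighted_sum[paper] += reader_weight[reader] * score
            weight_total[paper] += reader_weight[reader]
        paper_quality = {
            paper: weighted_sum[paper] / weight_total[paper]
            if weight_total[paper] > 0 else 0.0
            for paper in weighted_sum
        }

        # Reader reliability: 1 minus the mean absolute gap between the
        # reader's scores and the current paper qualities.
        disagreement = defaultdict(float)
        count = defaultdict(int)
        for reader, paper, score in ratings:
            disagreement[reader] += abs(score - paper_quality[paper])
            count[reader] += 1
        reader_weight = {r: 1.0 - disagreement[r] / count[r] for r in readers}

    # Author index: mean quality of the author's rated papers.
    qualities_by_author = defaultdict(list)
    for paper, names in authors_of.items():
        if paper in paper_quality:
            for name in names:
                qualities_by_author[name].append(paper_quality[paper])
    author_quality = {
        name: sum(values) / len(values)
        for name, values in qualities_by_author.items()
    }
    return paper_quality, author_quality, reader_weight
```

In this sketch, a reader whose scores systematically diverge from those of other readers ends up with a lower weight, and therefore has less influence on the paper indexes in later iterations. A real system would also have to deal with sparse data, strategic behaviour, and free-text comments, none of which is modelled here.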
How to evaluate software, artifacts and outreach?
Alain Girault and Laura Grigori (Inria) address the growing importance of software in academic research and, as a consequence, the necessity of taking software development activities into account when evaluating researchers. They support this idea by describing the software evaluation procedure at Inria that allows researchers to characterise their own software along different criteria, in the context of recruitment or team evaluation. Software evaluation goes together with the ability to reproduce experimental results described in a publication, and opens the way to a new publication model which, in addition to open access, also includes open data and open software, simultaneously released.
How to take into account open science criteria?
Laurent Romary (Inria) addresses the link between open access and research assessment and argues that, despite some fears, open access offers the potential of a wealth of information for the strategic management of research, typically for identifying experts or emerging topics in a given field. He points out challenges related to the possible fragmentation of the publication corpus and to the actual quality of the available information, but argues that if all the proper conditions are fulfilled for a trusted information corpus of scientific publications, we can actually foresee the basis for an open, transparent process for research assessment. This opens a wide variety of usages that we have to invent and deal with in an ethical way. He suggests, for instance, a visibility index related to the dissemination effort of a researcher or a research institution. He concludes with a reminder that open access to research data and tools ultimately exists to help researchers themselves.
How to measure scholarly impact?
Giovanni Abramo (IASI-CNR) revisits the concept of scholarly impact, i.e., the footprint that a research paper leaves on scientific research performed after its publication. He remarks that, assuming that the number of citations obtained by a paper is a good proxy of its scholarly impact, this impact can be measured only at the end of the “life cycle” of the paper, i.e., after it has ceased to influence current research (and has thus ceased to be cited). This is too late for the practical needs of those who have to evaluate research (e.g., that of a researcher being considered for promotion or hiring); there is therefore a need for “early indicators of impact”, i.e., measures capable of predicting whether and how much a paper is going to be cited in the future. Giovanni Abramo recalls current efforts, essentially based on (a) using as features the early citation counts plus the impact factor of the journal in which the paper was published, and (b) using late citation counts as benchmarks for the accuracy of these predictors, but stresses in conclusion that more research is needed in this area.
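To make the structure of such a predictor concrete, here is a minimal Python sketch under stated assumptions. It takes the two features named above (early citation count and journal impact factor) and fits a plain linear model to late citation counts, which then serve as the benchmark for prediction error. The choice of a linear model, of least squares, and of mean absolute error, as well as the function names `fit_least_squares`, `predict`, and `mean_absolute_error`, are assumptions made for illustration; the article does not prescribe them.

```python
def fit_least_squares(rows, targets):
    """Fit targets ~ b0 + b1*x1 + b2*x2 + ... by solving the normal equations.

    rows: list of feature tuples, e.g. (early_citations, impact_factor).
    targets: list of late citation counts, one per row.
    Returns the coefficients [b0, b1, b2, ...].
    """
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    # Augmented matrix of the normal equations (X^T X) beta = X^T y.
    system = [
        [sum(x[i] * x[j] for x in design) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        if abs(system[col][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit the model")
        for r in range(k):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][k] / system[i][i] for i in range(k)]


def predict(beta, row):
    """Predicted late citation count for one paper."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def mean_absolute_error(beta, rows, targets):
    """Accuracy benchmark: average gap between predicted and actual late counts."""
    return sum(abs(predict(beta, r) - y) for r, y in zip(rows, targets)) / len(rows)


if __name__ == "__main__":
    # Made-up toy numbers, for illustration only: (early citations, impact factor).
    features = [(1, 2.0), (4, 1.0), (2, 3.5), (10, 0.5), (6, 2.5)]
    late_citations = [9, 17, 14, 33, 24]
    beta = fit_least_squares(features, late_citations)
    print("coefficients:", beta)
    print("mean absolute error:", mean_absolute_error(beta, features, late_citations))
```

In practice the error would be measured on papers not used for fitting, and only for papers old enough that their late citation counts are known; the sketch evaluates on its own training data purely to keep the example short.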
Many more questions on research evaluation could be raised and good practices exchanged, but we should be aware that each of us, as soon as we are involved in any evaluation process, has the opportunity to encourage and promote research evaluation criteria based primarily on quality and impact.
Hélène Kirchner, Inria, France
Fabrizio Sebastiani, ISTI-CNR, Italy