Taking Software Development Activities into Account when Evaluating Researchers

by Alain Girault and Laura Grigori (Inria)

Software is becoming increasingly important in academic research, and consequently software development activities should be taken into account when evaluating researchers, whether individually or as teams. In 2011, a dedicated working group of the Inria Evaluation Committee addressed the issue of software evaluation and proposed criteria that allow researchers to characterise their own software. In the context of recruitment or team evaluation, this self-assessment exercise allows the evaluators to estimate the importance of software development in the activities of the authors, to measure the impact of the software within its target community, and to understand its current state and its future evolution.

In computer science and applied mathematics, software is crucial for several reasons:

It can be used as a proof of concept to prototype an original idea.
It can be distributed to an entire community, thus serving as a dissemination tool (quite often there can even be some competition between teams worldwide).
It can be used as an experimentation platform (e.g., the Grid5000 management software).
It can be delivered to industry and allow a faster transfer of academic results to industry.

Competitions between international teams address key challenges and aim at establishing new records with respect to a given criterion (e.g., size of the solved problems, speed or accuracy of the results). This is the case in several communities within computer science, including model checking, computer vision, high-performance computing, and 3D meshing. In this context, it is mandatory to devise and implement new algorithms and to make the corresponding software publicly available. This allows the new algorithms to be tested by competitors.

A concrete proof of the growing importance of software is that several journals and conferences accept “software artifacts” in support of a published paper, and these artifacts are formally tested and evaluated. A recent trend focuses on the ability to reproduce the experimental section included in an article. This is an important step towards reproducible research: given the same data and the same software, one should obtain the same results. Some journals already provide a “replication certificate” for papers whose results have been reproduced. We are convinced that this is an adequate step, and we support a publication model that, in addition to open access, also includes open data and open software, released simultaneously.

There are two different evaluation processes at Inria (in general, not only for software):

The evaluation of the Inria teams that takes place every four years.
The evaluation of the Inria researchers when they apply for various promotions or recruitments (in particular when they are hired as junior researchers and later when they apply for a promotion as senior researchers or higher ranks).

Taking software into account when evaluating a researcher or a research team at Inria has been common practice for over 15 years. Reflections on this topic started in the early 2000s within the Evaluation Committee of Inria, and a working group was created in 2011 under the guidance of Gérard Berry. This group produced a document that allows researchers and teams to evaluate their own software development [1], and recommended the use of several qualitative and quantitative criteria to assess a piece of software. This self-assessment process has been used at Inria for seven years and has proved to work satisfactorily. The criteria are then used by the evaluators to estimate the importance of software development in the activities of the authors of the software, to measure the impact of the software in a specific community, and to understand its current state and its future evolution. More precisely, the criteria are:

1. Audience: ranging from personal use to massive use.
2. Software originality: ranging from implementing known ideas to fully original ideas.
3. Software maturity: ranging from software used for demonstration purposes only to high-assurance software certified by an evaluation agency (e.g., DO-178 in civil avionics).
4. Evolution and maintenance: ranging from no particular plan to a fully organised support including, e.g., a user group or a forum where users can ask questions.
5. Software distribution and licensing: ranging from internal use within the authors’ team to external packaging and distribution (e.g., as Linux, Matlab, or R packages or toolboxes).

For each criterion, the software authors choose, among the four or five available levels spanning the extremes listed above, the one that best characterises their software. These levels are not used to rank the software: the higher levels are not necessarily the best. Indeed, the evaluators value not only the size or the maturity of the software, but also the originality of the implemented ideas and algorithms.

In addition, the authors state their own contribution along the different axes that matter for the production of software: design and architecture, coding and debugging, maintenance and support, and finally project management. On each of these axes, an individual’s contribution can range from occasional contribution to being (one of) the principal contributor(s). This second set of criteria is essential for highlighting the respective contributions of the different researchers to a particular piece of software. As mentioned above, higher levels are not necessarily the best. To give a concrete example, the same software can have a main architect, who provided the core algorithms that form the basis of the software, and a main developer, who coded the software and decided the architecture and the test and debug plan. Both roles are key to the success of the software.
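To make the shape of such a self-assessment concrete, here is a minimal sketch in Python of how one record could be represented. It is only an illustration and not the format used in [1]: the class name, the field names, and the encoding of levels as integers from 1 to 5 are all assumptions, since the text only says that each criterion has four or five levels.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five criteria and the four contribution axes named in the text.
CRITERIA = (
    "audience",
    "originality",
    "maturity",
    "evolution_and_maintenance",
    "distribution_and_licensing",
)
CONTRIBUTION_AXES = (
    "design_and_architecture",
    "coding_and_debugging",
    "maintenance_and_support",
    "project_management",
)

# Assumption: levels are encoded as integers 1..MAX_LEVEL. The text does not
# name the levels, so this encoding is purely illustrative.
MAX_LEVEL = 5


def _check(levels: Dict[str, int], allowed, require_all: bool) -> None:
    """Reject unknown keys, missing keys (if required) and out-of-range levels."""
    unknown = set(levels) - set(allowed)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if require_all:
        missing = set(allowed) - set(levels)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    for key, level in levels.items():
        if not 1 <= level <= MAX_LEVEL:
            raise ValueError(f"{key}: level {level} outside 1..{MAX_LEVEL}")


@dataclass
class SelfAssessment:
    software: str
    criteria: Dict[str, int]  # one level per criterion
    contributions: Dict[str, int] = field(default_factory=dict)  # author's own axes

    def __post_init__(self) -> None:
        _check(self.criteria, CRITERIA, require_all=True)
        # An author may have contributed on only some of the axes.
        _check(self.contributions, CONTRIBUTION_AXES, require_all=False)

    def main_roles(self) -> List[str]:
        """Axes on which the author reports the highest contribution level."""
        if not self.contributions:
            return []
        top = max(self.contributions.values())
        return sorted(a for a, lvl in self.contributions.items() if lvl == top)


# Hypothetical example: the software name and all levels are made up.
sheet = SelfAssessment(
    software="example-solver",
    criteria={
        "audience": 3,
        "originality": 4,
        "maturity": 2,
        "evolution_and_maintenance": 2,
        "distribution_and_licensing": 3,
    },
    contributions={"design_and_architecture": 4, "coding_and_debugging": 2},
)
print(sheet.main_roles())
```

The sketch deliberately computes no aggregate score, in keeping with the remark that the levels are not meant to rank software and that higher levels are not necessarily better.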

In an additional succinct section, the researchers can add any information they consider relevant, such as the application targeted by the software, the user community (research, education, industry), its impact, a short description of the differences with respect to other state-of-the-art software with the same functionality, and so on. This section should also include the programming language, the number of lines of code, and a URL from which the software can be downloaded. The URL is very important so that the evaluators can download and test the software themselves. The criteria listed in this report are self-assessment criteria, so being able to download and test the software is essential to validate the self-assessment performed by the software authors. One trend that we are seeing at the moment is that source files are stored in version control systems such as Git, so the evaluators can precisely track the actual contribution of each author to a particular piece of software.
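As an illustration of the last point, the sketch below counts commits per author in a local Git clone by parsing the output of `git shortlog -s -n`. It assumes that `git` is installed and that the repository has already been downloaded; the function names are ours, and the author names in the example are hypothetical. Commit counts are only a crude proxy for contribution and would complement, not replace, an actual reading of the code.

```python
import subprocess
from typing import Dict


def parse_shortlog(text: str) -> Dict[str, int]:
    """Parse `git shortlog -s -n` output: one "<count><whitespace><author>" per line."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        count, author = parts
        counts[author] = counts.get(author, 0) + int(count)
    return counts


def commits_per_author(repo_path: str) -> Dict[str, int]:
    """Run git in the given clone and return the number of commits per author."""
    result = subprocess.run(
        # HEAD is given explicitly: without a revision, shortlog reads the
        # log from standard input when it is not attached to a terminal.
        ["git", "-C", repo_path, "shortlog", "-s", "-n", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_shortlog(result.stdout)


# Hypothetical sample in the format printed by `git shortlog -s -n`.
sample = "    12\tAlice Martin\n     3\tBob Lee\n"
print(parse_shortlog(sample))
```

Calling `commits_per_author` on the path of a downloaded clone would return the same kind of dictionary for a real repository.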

In conclusion, we would like to emphasise that software development plays a crucial role at Inria: it constitutes a key part of the activity of our research teams, in both applied mathematics and computer science. As such, being able to evaluate the software produced by the teams and the researchers is of paramount importance. The criteria for self-assessment reported in [1] have been used for the past seven years and have proved to work very well.

Reference:
[1] Inria Evaluation Committee, “Criteria for Software Self-Assessment”, 2011.

Please contact:
Alain Girault and Laura Grigori, Inria
