DIRECT: the First Prototype of the PROMISE Evaluation Infrastructure for Information Retrieval Experimental Evaluation

by Nicola Ferro

PROMISE is a network of excellence focused on the experimental evaluation of multilingual and multimedia information access systems. One of its key contributions is to develop and provide an open evaluation infrastructure, which brings automation to the evaluation process, managing, curating, and providing access to the scientific data produced during the evaluation activities.


Experimental evaluation is a key activity for driving and supporting the development of multilingual and multimedia information access systems. It is an essential part of the scientific process: by using shared data sets and evaluation scenarios, systems can be compared, performance can be better understood, and progress can be pursued and demonstrated.

Large-scale evaluation initiatives, such as the Text REtrieval Conference (TREC) in the United States, the Cross-Language Evaluation Forum (CLEF) in Europe, and the NII-NACSIS Test Collection for IR Systems (NTCIR) in Asia, contribute significantly to advancements in research and industrial innovation in the information retrieval sector, and to the building of strong research communities. A study conducted by NIST reports that “for every $1 that NIST and its partners invested in TREC, at least $3.35 to $5.07 in benefits accrued to IR researchers. The internal rate of return (IRR) was estimated to be over 250% for extrapolated benefits and over 130% for unextrapolated benefits”.

Large-scale evaluation campaigns produce a huge amount of extremely valuable scientific data, which provides the foundation for subsequent scientific production and system development and constitutes an essential reference for the literature in the field. This data is also economically valuable, due to the considerable effort devoted to its production: the NIST study estimates the overall investment in TREC at about 30 million dollars.

Nevertheless, little attention has been paid over the years to modelling, managing, curating and accessing the scientific data produced by evaluation initiatives, despite the fact that the importance of scientific data in general has been highlighted by many institutional organizations, such as the European Commission, the US National Science Board, and the Australian Working Group on Data for Science.

Objectives
Our goal is to deliver a unified infrastructure and environment for data, knowledge, tools, methodologies, and the user community in order to advance the experimental evaluation of complex multimedia and multilingual information systems. The evaluation infrastructure will:

manage and provide access to the scientific data produced during evaluation activities
support the organization of evaluation campaigns
increase the automation of the evaluation process
provide component-based evaluation
foster the usage and understanding of the scientific data.

A user-centered design approach will be adopted, involving the different stakeholders (e.g. scientists, evaluation campaign organizers, system developers, and students) in the development of the infrastructure.

DIRECT: the First Prototype
The outcome of this effort is the Distributed Information Retrieval Evaluation Campaign Tool (DIRECT), which

introduces a conceptual model of the information space of an evaluation campaign
provides metadata describing the scientific data managed, to enable sharing and re-use
adopts a unique identification mechanism allowing explicit citation of and easy access to the scientific data
manages all the aspects of an evaluation campaign, and provides tools for statistical analyses and reporting of results.
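
To make the first three points more concrete, the following is a minimal sketch of how experiments in a campaign might be modelled, described by metadata, and given a citable identifier. It is purely illustrative and is not DIRECT's actual data model or API: the class names, the field names, and the identifier format are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Experiment:
    """One submitted run in a campaign (hypothetical structure)."""
    identifier: str                  # unique, citable identifier (format is an assumption)
    track: str                       # track the run was submitted to
    participant: str                 # group that submitted the run
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive metadata


@dataclass
class Campaign:
    """A tiny in-memory stand-in for the information space of a campaign."""
    name: str
    year: int
    experiments: Dict[str, Experiment] = field(default_factory=dict)

    def register(self, track: str, participant: str, **metadata: str) -> Experiment:
        """Store a new experiment and assign it a unique identifier."""
        sequence = len(self.experiments) + 1
        # Hypothetical identifier scheme: campaign-year/track/sequence number.
        identifier = f"{self.name.lower()}-{self.year}/{track}/{sequence:04d}"
        experiment = Experiment(identifier, track, participant, dict(metadata))
        self.experiments[identifier] = experiment
        return experiment

    def resolve(self, identifier: str) -> Experiment:
        """Look up an experiment by the identifier used to cite it."""
        return self.experiments[identifier]


if __name__ == "__main__":
    campaign = Campaign("CLEF", 2009)
    run = campaign.register("adhoc", "example-group", language="en")
    print(run.identifier)
    print(campaign.resolve(run.identifier).metadata)
```

The point of the sketch is that an identifier assigned once at registration can later be quoted in a paper and resolved back to the same experiment and its metadata.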

DIRECT has been developed and tested in the course of the annual CLEF evaluation campaigns since 2005. It now manages and provides online access to much of the data produced over ten years of CLEF. It also aims to improve the way researchers and system developers interact with the experimental results. We are now investigating the adoption of innovative devices, such as the iPad, which can allow for natural and easy interaction with the experimental results and scientific data in real time.

Next Steps
PROMISE is a three-year project that began in September 2010. It will issue annual releases of the evaluation infrastructure with new functionalities, such as data annotation and visual analytics techniques. In order to achieve a better representation, interaction, and understanding of experimental results, we are investigating how best to exploit human-computer interaction and the principles of visual analytics. This will be the topic of the PROMISE Winter School "Information Retrieval meets Information Visualization", which will be held in January 2012 in Zinal, Switzerland.

The information retrieval area is now beginning to explore and exploit the scientific data produced during evaluation studies by making use of methods typical of the database and knowledge management areas. The Data infrastructurEs for Supporting Information Retrieval Evaluation (DESIRE 2011) workshop, co-located with CIKM 2011, the 20th ACM Conference on Information and Knowledge Management, in October 2011 in Glasgow, UK, aims to bring together experts from the three communities in order to discuss the challenges involved. The intention of the organizers is to produce a roadmap and a set of initial best practices guiding the development of evaluation infrastructures to manage experimental data.

Links:
PROMISE: http://www.promise-noe.eu/
DIRECT: http://direct.dei.unipd.it/
DEMO: http://www.youtube.com/watch?v=fDsXDCUPkiM
CLEF 2011: http://www.clef2011.org/
CLEF: http://www.clef-campaign.org/
DESIRE 2011 Workshop: http://www.promise-noe.eu/events/desire-2011/
PROMISE Winter School 2012: http://www.promise-noe.eu/events/winter-school-2012/
TREC Economic Impact Study: http://trec.nist.gov/pubs/2010.economic.impact.pdf

Please contact:
Nicola Ferro, University of Padua, Italy