by Carol Peters
The results of the seventh campaign of the Cross-Language Evaluation Forum were presented at a two-and-a-half-day workshop held in Alicante, Spain, on 20-22 September, immediately following the tenth European Conference on Digital Libraries. The workshop was attended by over 130 researchers and system developers from academia and industry.
The main objectives of the Cross-Language Evaluation Forum (CLEF) are to stimulate the development of mono- and multilingual information retrieval systems for European languages and to contribute to building a research community in the multidisciplinary area of multilingual information access (MLIA). These objectives are realised through the organisation of annual evaluation campaigns and workshops. The scope of CLEF has gradually expanded over the years. While in the early years the main interest was in textual document retrieval, the focus has since diversified to include different kinds of text retrieval across languages and retrieval on different kinds of media (ie not just plain text but also collections containing images and speech). In addition, attention is given to issues regarding system usability and user satisfaction, with tasks designed to measure the effectiveness of interactive systems.
Evaluation Tracks
In CLEF 2006 eight tracks were offered to evaluate the performance of systems for:
- mono-, bi- and multilingual document retrieval on news collections (Ad-hoc)
- mono- and cross-language retrieval of structured scientific data (Domain-specific)
- interactive cross-language retrieval (iCLEF)
- multiple language question answering (QA@CLEF)
- cross-language retrieval on image collections (ImageCLEF)
- cross-language speech retrieval (CL-SR)
- multilingual web retrieval (WebCLEF)
- cross-language geographic retrieval (GeoCLEF).
Test Suites
Most of the tracks adopt a corpus-based automatic scoring method to assess system performance. The test collections consist of sets of statements representing information needs (topics) and collections of documents (corpora). System performance is evaluated by judging the documents retrieved in response to each topic with respect to their relevance (relevance assessments) and computing recall and precision measures.
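For reference, the two basic measures are the standard ones used in retrieval evaluation (the campaigns also report derived measures computed over sets of topics). Given the set R of documents judged relevant to a topic and the set A of documents retrieved by a system:

Precision = |R ∩ A| / |A|  (the fraction of retrieved documents that are relevant)
Recall = |R ∩ A| / |R|  (the fraction of relevant documents that are retrieved)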
The following document collections were used in CLEF 2006:
- CLEF multilingual comparable corpus of more than 2 million news documents in 12 European languages
- CLEF domain-specific corpora: English/German and Russian social science databases
- MALACH collection of spontaneous speech in English and Czech, derived from the Shoah archives
- EuroGOV, approximately 3.5 million webpages crawled from European governmental sites.
The ImageCLEF track used collections for both general photographic and medical image retrieval:
- IAPR TC-12 photo database; LTU photo collection for image annotation;
- ImageCLEFmed radiological database; IRMA collection for automatic image annotation.
Participation
Participation was up again this year with 90 groups submitting results for one or more of the different tracks: 60 from Europe, 14 from North America, 10 from Asia, 4 from South America and 2 from Australia.
Workshop
The campaign culminated in the workshop held in Alicante, 20-22 September. The workshop was divided between plenary track overviews and parallel, poster and breakout sessions. In her opening talk, Carol Peters, the CLEF Coordinator, stressed the need for more technology transfer activities. She commented that, although many advances had been made in the multilingual information access research field, there were still few real-world operational cross-language systems. In her opinion, CLEF should be paying more attention to issues that directly regard the user and the needs of the application communities, rather than focusing mainly on system performance in terms of precision and recall.
In fact, one of the most interesting activities this year was the real-time question answering exercise, organised on-site by Fernando Llopis and Elisa Noguera, University of Alicante (see figure). Here the aim was to examine the ability of question answering systems to respond within a time constraint.
The need for more technology transfer was taken up again in two talks in the final session. Martin Braschler, University of Applied Sciences Winterthur, Switzerland, gave an insightful talk on "What MLIA Applications can learn from Evaluation Campaigns", while Fredric Gey, U.C. Berkeley, USA, summarised some of the main conclusions of the MLIA workshop at SIGIR 2006 in Seattle, where much of the discussion concentrated on the problems involved in building and marketing commercial MLIA systems. There was also an invited talk by Noriko Kando, National Institute of Informatics, Tokyo, Japan, on new evaluation activities at the NTCIR evaluation initiative for Asian languages.
The presentations given at the CLEF Workshops and detailed reports on the experiments of CLEF 2006 and previous years can be found on the CLEF website at http://www.clef-campaign.org/. The preliminary agenda for CLEF 2007 will be available from mid-November. CLEF is an activity of the DELOS Network of Excellence for Digital Libraries.
Link:
http://www.clef-campaign.org
Please contact:
Carol Peters, ISTI-CNR, Italy, CLEF Coordinator
Tel: +39 050 3152897
E-mail: carol.peters@isti.cnr.it