Cross-Language Evaluation Forum

The objective of the Cross-Language Evaluation Forum (CLEF) is to promote research in the field of multilingual system development. This is done through the organisation of annual evaluation campaigns offering tasks designed to test different aspects of mono- and cross-language information retrieval (IR) systems. The intention is to encourage experimentation with all kinds of multilingual information access, from the development of systems for monolingual retrieval operating on many languages to the implementation of complete multilingual multimedia search services. The aim is to stimulate the development of next-generation multilingual IR systems.

This year, 100 groups, mainly but not exclusively from academia, participated in the campaign. Most of the groups were from Europe, but there was also a good contingent from North America and Asia, plus a few participants from South America and Africa.

CLEF 2008 Tracks
CLEF 2008 offered seven tracks designed to evaluate the performance of systems for:

multilingual textual document retrieval (Ad Hoc)
mono- and cross-language information retrieval on structured scientific data (Domain-Specific)
interactive cross-language retrieval (iCLEF)
multiple language question answering (QA@CLEF)
cross-language retrieval in image collections (ImageCLEF)
multilingual retrieval of Web documents (WebCLEF)
cross-language geographical information retrieval (GeoCLEF)

Two new tracks were offered as pilot tasks:

cross-language video retrieval (VideoCLEF)
multilingual information filtering (INFILE@CLEF)

In addition, MorphoChallenge 2008, an activity of the EU Network of Excellence Pascal, was organized in collaboration with CLEF.

Test Collections
Most of the tracks adopt a corpus-based automatic scoring method for the assessment of system performance. The test collections consist of a set of statements representing information needs, known as topics (queries), and a collection of documents (corpus). System performance is evaluated by judging the documents retrieved in response to each topic for their relevance (relevance assessment) and then computing recall and precision measures.
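To make the recall and precision measures concrete, here is a minimal Python sketch. It assumes that the relevance assessments for each topic are available as a set of document IDs judged relevant, and that a system run is a list of retrieved document IDs per topic (each ID appearing at most once). The function names and data layout are hypothetical and only illustrate the set-based definitions; this is not the official CLEF scoring procedure.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single topic (hypothetical helper).

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / documents judged relevant
    """
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_run(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Mean precision and recall over all judged topics (hypothetical helper).

    `qrels` maps each topic ID to its relevant document IDs; a topic with no
    retrieved documents in `run` scores 0 on both measures.
    """
    scores = [precision_recall(run.get(topic, []), relevant)
              for topic, relevant in qrels.items()]
    if not scores:
        return 0.0, 0.0
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r


if __name__ == "__main__":
    # Toy data: two topics with invented document IDs.
    qrels = {"T1": {"d1", "d3", "d7"}, "T2": {"d6"}}
    run = {"T1": ["d1", "d2", "d3", "d4"], "T2": ["d5", "d6"]}
    print(evaluate_run(run, qrels))
```

For topic T1, two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents are found (recall about 0.67). Averaging per topic, as done here, gives each topic equal weight regardless of how many relevant documents it has.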

A number of document collections were used to build the test collections for CLEF 2008:

CLEF multilingual corpus of more than 3 million news documents in 14 European languages
Hamshahri Persian newspaper corpus
Library catalog records belonging to The European Library and derived from the archives of the British Library, the Austrian National Library and the Bibliothèque Nationale de France
English/German and Russian social science data
The ImageCLEF track used collections for both general photographic and medical image retrieval:
- IAPR TC-12 photo database; INEX Wikipedia image collection
- ARRS Goldminer database of radiographs; IRMA collection for medical image annotation
Dutch and English documentary television programs provided by Sound & Vision, The Netherlands
Agence France-Presse (AFP) comparable newswire stories in Arabic, French and English

Diverse sets of topics or queries were prepared in many languages according to the needs of the various tracks. The result, at the end of the campaign, is a number of valuable and reusable test collections.

Workshop
The Workshop plays an important role by providing the opportunity for all the groups that participated in the evaluation campaign to get together to compare approaches and exchange ideas. This year it was held in Aarhus, Denmark, and was attended by 150 researchers and system developers. The schedule was divided between plenary track overviews and parallel, poster and breakout sessions. There were several invited talks. Noriko Kando, National Institute of Informatics, Tokyo, reported on the activities of NTCIR-7 (NTCIR is an evaluation initiative focussed on testing IR systems for Asian languages), while John Tait of the Information Retrieval Facility, Vienna, presented a proposal for an Intellectual Property track in CLEF 2009, which would focus on cross-language retrieval of patent documents.

The presentations given at the CLEF Workshops and detailed reports on the experiments of CLEF 2008 and previous years can be found on the CLEF website. The preliminary agenda for CLEF 2009 will be available from mid-November.

CLEF and TrebleCLEF
CLEF 2008 was organized under the auspices of TrebleCLEF, a Coordination Action of the Seventh Framework Programme. Over the years, CLEF has done much to promote the development of multilingual IR systems. However, the focus has been on building and testing research prototypes rather than on developing fully operational systems. TrebleCLEF is building on and extending the results achieved by CLEF. The objective is to support the development and consolidation of expertise in the multidisciplinary research area of multilingual information access (MLIA) and to disseminate this expertise within the relevant application communities.

TrebleCLEF thus has three main goals:

to promote high standards of evaluation in MLIA systems using three approaches: test collections; user evaluation; and log file analysis
to sustain an evaluation community by providing high quality access to past evaluation results
to disseminate know-how, tools, resources and best practice guidelines, enabling digital library creators to make content and knowledge accessible, usable and exploitable over time, across media and across language boundaries.

The aim is to help developers of applications that need multilingual search solutions to identify the most appropriate technology. For this purpose, a series of best practice workshops is being organised:

Workshop on Best Practices for the Development of Multilingual Information Access Systems, Segovia, Spain, June 2008
Workshop on Best Practices for System Developers: Bringing Multilingual Information Access to Operational Systems, Winterthur, Switzerland, October 2008
Workshop on Best Practices in Query Log Analysis, Spring 2009.

A Summer School on Multilingual Information Access is also being organised for June 2009 in Pisa. The focus of the Summer School will be on "How to build effective MLIA systems and How to evaluate them".

More information on the activities of TrebleCLEF can be found on the TrebleCLEF website.

Links:
CLEF: http://www.clef-campaign.org
NTCIR: http://research.nii.ac.jp/ntcir/
TrebleCLEF: http://www.trebleclef.eu
IRF: http://www.ir-facility.org/

Please contact:
Carol Peters
ISTI-CNR, Italy
E-mail: carol.peters@isti.cnr.it