AXES - Finding Video Clips Using Speech and Image Recognition

by Peggy van der Kreeft, Kay Macquarrie and Martijn Kleppe

Searching for clips or segments of videos in large archives can be a daunting task. In which clip was a person mentioned and where in the clip is he or she shown? Even after you locate the correct video, you still need to watch the entire video to find that one segment containing the person that you are looking for. The novel technologies being developed by AXES make finding what you are looking for in large archives and libraries significantly easier.

The aim of the AXES (Access to Audio-Visual Archives) project is to develop tools that provide various types of users with new engaging ways to interact with audiovisual libraries, helping them discover, browse, navigate, search and enrich archives.

Robin Aly of the University of Twente explains visual search during user trials.

The technologies used in the project involve multimodal analysis of people, places, objects and events. This includes recognition and identification at both a general and a specific level. For example, the system can find categories of people (athlete) or individuals (Angela Merkel); location categories (countryside) or specific places (Grand Place, Brussels); object categories (car) or specific objects (a logo or flag); and event categories (mountain climbing) or particular events (a speech by a politician). Visual analysis (including image similarity), audio analysis (speech-to-text) and text analysis (metadata, OCR) across various languages are seamlessly integrated, together with advanced linking technologies.

Each component of the AXES system focuses on a particular aspect of the audiovisual analysis. For instance, the on-the-fly specific person retrieval [1] component provides a method for finding shots of a particular person in a large dataset without relying on metadata. The method first pre-processes the video corpus so that it is searchable for any person. Then, given a text query naming the individual, it learns a discriminative classifier for that person from face images downloaded from Google image search. The classifier produces a ranking of the faces in the corpus, allowing retrieval of the person of interest.
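The two stages of this retrieval method (pre-computed face descriptors, then a classifier learned at query time) can be illustrated with a minimal sketch. It is not the AXES implementation: the two-dimensional descriptors, the face identifiers and the function names are hypothetical, the example faces are hard-coded rather than downloaded from an image search, and a small logistic-regression classifier is swapped in as a stand-in for whatever discriminative classifier the component actually uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_classifier(
    positives: List[Vector],
    negatives: List[Vector],
    epochs: int = 200,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression classifier with plain stochastic gradient descent."""
    weights = [0.0] * len(positives[0])
    bias = 0.0
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, label in samples:
            prediction = 1.0 / (1.0 + math.exp(-(dot(weights, x) + bias)))
            error = prediction - label
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def rank_faces(
    corpus: Dict[str, Vector], weights: Vector, bias: float
) -> List[Tuple[str, float]]:
    """Score every pre-computed face descriptor and sort from best to worst match."""
    scored = [(face_id, dot(weights, desc) + bias) for face_id, desc in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Stage 1 (offline, assumed): face descriptors extracted once from the video corpus.
CORPUS_FACES: Dict[str, Vector] = {
    "shot12/face0": [0.9, 0.1],
    "shot12/face1": [0.2, 0.8],
    "shot40/face0": [0.8, 0.3],
    "shot77/face0": [0.1, 0.9],
}

# Stage 2 (query time, assumed): descriptors of faces found by an image search
# for the queried name, plus a fixed pool of faces of other people.
QUERY_FACES: List[Vector] = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.1]]
OTHER_FACES: List[Vector] = [[0.0, 1.0], [0.1, 0.8], [0.2, 0.9]]

weights, bias = train_linear_classifier(QUERY_FACES, OTHER_FACES)
ranking = rank_faces(CORPUS_FACES, weights, bias)
for face_id, score in ranking:
    print(f"{face_id}\t{score:.3f}")
```

The design point the sketch tries to capture is that the expensive step, describing every face in the corpus, happens once and independently of any query, so only the small classifier has to be trained when a user types a name.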

The AXES video search engine is designed to allow easy integration of existing and novel search methods, to be adaptable for innovative user interaction models, and to provide a test bed for trialing these methods. The search engines and components have been demonstrated and extensively tested within the TRECVid international benchmarking initiative during the course of the project [2].

The user accesses the system through one of the user interfaces, specifically geared towards different end-user groups. Three distinct user interfaces are being developed, one for each of the targeted user groups and based upon several user studies [3]. Specifically, we target (1) media professionals and archivists, working with archive systems and metadata on a daily basis; (2) researchers, including students, academics and (investigative) journalists looking for resource material; and (3) home users, the general public in search of information. The integrated system, built upon the open-source WebLab platform, is accessible as web services and all components work in the background without expert knowledge required from the users; the user interface steers the end user through the advanced and powerful system in a user-friendly and intuitive way.

The first prototype, AXES PRO, was developed during the first year and has been tested by media professionals. The second prototype, AXES RESEARCH, is currently in its final stages of development, with preliminary testing ongoing. Full user testing by the researcher group is scheduled for autumn 2013. The third prototype, AXES HOME, is planned to be deployed in 2014, with subsequent user testing. The underlying technologies being used by all three systems are largely the same, but the user options and interface differ based on carefully considered user requirements, resulting in an optimized user experience geared to each user group.

AXES opens up a whole new way of experiencing digital libraries and archives, reaching out to the end user and making the vast and rich amount of existing audiovisual content accessible. Visit the AXES website (http://www.axes-project.eu) for more details, video demonstrations, and related publications.

About AXES
The four-year research project AXES started in January 2011 and is co-funded within the EU FP7 framework. Thirteen partners are collaborating on this integrated project. ERCIM, the administrative coordinator, and KU Leuven (Belgium), the technical coordinator, ensure a consistent and collaborative operation. Cassidian (France), Technicolor (France), Inria (France), Fraunhofer IAIS (Germany), KU Leuven, Dublin City University (Ireland), University of Oxford (UK) and University of Twente (the Netherlands) develop and integrate the technical components. Finally, user partners BBC (UK), NISV (the Netherlands), Erasmus University Rotterdam (the Netherlands) and Deutsche Welle (Germany) describe user requirements, provide audiovisual content and perform user testing. The partners’ complementary skill sets allow in-depth and wide coverage of technologies, resulting in a powerful, innovative set of audiovisual search and analysis tools.

Link:
http://www.axes-project.eu

References:
[1] O. M. Parkhi et al.: “On-the-fly Specific Person Retrieval”, in: International Workshop on Image Analysis for Multimedia Interactive Services, 2012.
[2] R. Aly et al.: “AXES at TRECVID 2012: KIS, INS, and MED”, in: TRECVid 2012, 26-28 Nov 2012, Gaithersburg, Maryland, USA.
[3] M. Kemman et al.: “Dutch Journalism in the Digital Age”, Icono 14, 11(2), 2013, pp. 163–181. doi:10.7195/ri14.v11i2.596.

Please contact:
Peggy van der Kreeft
Deutsche Welle, Germany
Tel: +32 495 664099

Kay Macquarrie, Deutsche Welle, Germany
Tel: +49 30 4646 5656

Martijn Kleppe, Erasmus Universiteit Rotterdam, the Netherlands
Tel: +31 10 408 2646
