AI Multilingual Search Platform for EU Audiovisual Media Archives

by Pilar Orero, Chiara Gunella, and Sarah McDonagh (Universitat Autònoma de Barcelona)

Cultural and linguistic diversity in Europe is both a jewel and a barrier to communication. Broadcasters’ archives are part of the EU’s cultural heritage, yet they remain inaccessible for many reasons. The MOSAIC project aims to develop tools for multilingual translation, automatic subtitling, and AI-driven content adaptation. MOSAIC seeks to empower broadcasters and media producers by making their content available to a wider audience, enhancing cultural exchange and unity across Europe.

European broadcasters, news agencies, and corporations are known for the production, distribution, and consumption of high-quality media content and information, and for the richness of their content archives, which contain a wealth of cultural, political, historical, and artistic material in various formats, including film/video, television, radio, and digital media. Nevertheless, digital platforms outside Europe already exert control over the media landscape, leading to a fragmentation of accessible knowledge within the European Union. Europe’s richness remains underused, with a serious risk that valuable resources and archival material will be lost. In this scenario, the adoption of artificial intelligence (AI) technologies by organisations, and thus the creation of media content using advanced techniques, is essential not only for the cultural sovereignty of Europe but also for the preservation and presentation of heritage, education, and entertainment across Europe. Creating a more unified European media identity, based on a fragmented but common cultural heritage, is a major challenge. On the one hand, both public and private broadcasters have to ensure that national regulatory authorities are independent of political or commercial influence, so that media freedom and pluralism are safeguarded. On the other hand, they share the constant need to adapt to new technologies in order to stay relevant, to make the European media market competitive worldwide, and to make European media content archives an unparalleled source of cultural richness.

Artificial Intelligence (AI), Natural Language Processing (NLP), Natural Language Understanding (NLU), Language Technologies (LTs), and Speech Technologies (STs) have the potential to enable multilingualism technologically. However, according to the META-NET White Paper Series “Europe’s Languages in the Digital Age” [1], published in 2012, European languages suffer from an extreme imbalance in technological support: English is well supported through technologies, tools, and datasets, whereas languages such as Maltese, Estonian or Icelandic still have very poor support.

The goal is to enable multilingualism technologically, since “the EU and its institutions have a duty to enhance, promote and uphold linguistic diversity in Europe” (European Parliament 2018) [2]. Today, Generative AI (GenAI) can create all kinds of media, including texts, sounds, videos and 3D content. Its use by news organisations and broadcasters is still in its infancy. It has the potential to reshape the concept of creation and to affect operations and business models across the media and cultural sectors, although it also carries the danger of fake news being created at scale. The impact on the media and the cultural and creative industries (CCI) is expected to be significant, but strong research and innovation support is needed to fully benefit from these new opportunities [L1].

The GenAI market is expected to grow substantially in the next few years. For example, a recent report from Sopra Steria projects that it could grow from around 8 billion USD in 2023 to more than 100 billion USD by 2028. At the same time, AI can also generate intelligent media services that adapt to user needs, including environmental factors and personal capabilities, thereby providing accessible multi-language, multimodal media services as a prerequisite to enjoying any XR media content. However, the full potential of AI still needs to be exploited within ethical and legal boundaries: ethical guidelines and regulations should be developed to address the issues such models raise, including sustainability, safety, intellectual property, bias, explainability, and trustworthiness.

Figure 1: MOSAIC scenarios.

In terms of AI for Media Access Services, NEM SRIA 2024 [L1] identified several major challenges: i) streamlining the circulation of audiovisual (or video) programmes through machine translation, while humans focus on the quality of the work; ii) encouraging synergies and convergence between subtitling and the development of multilingualism or the integration of foreigners (e.g., migrants); iii) developing AI tools for automatic translation from speech to subtitles, from text to Sign Language, and from Sign Language to text; iv) developing AI tools for robust automatic translation of subtitles across multiple languages. In this context, MOSAIC [L2] aims to develop a prototype European AI-enhanced platform, serving as a central, scalable hub for broadcasters and for news creators, distributors and consumers. The platform, as can be seen in Figure 1, will leverage knowledge repositories through a sophisticated, multilingual and multimodal AI-based integrated system that links to producers and harnesses the abundance and richness of cultural heritage, media and news repositories, while providing a source of monetisation from accumulated knowledge repositories.
MOSAIC 2024-2026 is co-funded by the DIGITAL EUROPE program under grant no. 479833.

Links:
[L1] https://nem-initiative.org/wp-content/uploads/2024/05/nem_sria_2024-v1-0.pdf?x55852
[L2] https://mosaic-media.eu/

References:
[1] G. Rehm and H. Uszkoreit, “Language Technology Support for Polish,” in The Polish Language in the Digital Age, Springer, 2012, pp. 52-67.
[2] G. Rehm and A. Way (eds.), “European Language Equality,” Springer, 2023. https://doi.org/10.1007/978-3-031-28819-7

Please contact:
Pilar Orero
Universitat Autònoma de Barcelona, Spain