Research and Innovation

MR-DIS: A Scalable Instance Selection Algorithm using MapReduce on Spark

by Álvar Arnaiz-González, Alejandro González-Rogel and Carlos López-Nozal (University of Burgos)

Efficient methods are required to process increasingly massive data sets. Most pre-processing techniques (e.g., feature selection, prototype selection) and learning processes (e.g., classification, regression, clustering) are not suitable for dealing with huge data sets, and many problems emerge as the volume of information grows. Here, parallelisation can help. Recently, many parallelisation techniques have been developed to simplify the tedious and difficult task of scheduling and planning parallel executions. One method that benefits from them is the instance selection algorithm ‘Democratic Instance Selection’, implemented here with the successful MapReduce paradigm. The main strength of this algorithm is its complexity, which is linear in the number of examples, i.e., O(n).
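The voting idea behind democratic instance selection can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the authors' Spark implementation: the function names, the fixed vote threshold, the random partitioning and the simple condensation rule used inside each partition are all hypothetical choices made for the example. Each round splits the data into disjoint partitions, a "map" step lets every partition vote for the instances it considers redundant, and a "reduce" step sums the votes per instance. If the partition size is kept fixed as the data grows, the work per round grows roughly linearly with the number of instances.

```python
import random
from collections import Counter

# An instance is a tuple (id, features, label). All names here are
# hypothetical and chosen only for this illustration.

def nearest_label(x, kept):
    """Label of the kept instance closest to x (squared Euclidean distance)."""
    closest = min(kept, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[1])))
    return closest[2]

def select_to_remove(partition):
    """Assumed base selector: a condensation-style rule that marks an
    instance as redundant when the instances kept so far already
    classify it correctly with 1-NN."""
    kept, removed = [], []
    for inst in partition:
        idx, x, y = inst
        if kept and nearest_label(x, kept) == y:
            removed.append(idx)
        else:
            kept.append(inst)
    return removed

def map_phase(data, n_partitions, rng):
    """Map: randomly split the data into disjoint partitions and emit
    an (id, 1) pair for every instance a partition votes to remove."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    for i in range(n_partitions):
        for idx in select_to_remove(shuffled[i::n_partitions]):
            yield idx, 1

def reduce_phase(pairs):
    """Reduce: sum the removal votes per instance id."""
    votes = Counter()
    for idx, vote in pairs:
        votes[idx] += vote
    return votes

def democratic_selection(data, rounds=5, n_partitions=4, threshold=3, seed=0):
    """Run several voting rounds and keep the instances whose removal
    votes stay below the (assumed, fixed) threshold."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(rounds):
        votes += reduce_phase(map_phase(data, n_partitions, rng))
    return [inst for inst in data if votes[inst[0]] < threshold]

if __name__ == "__main__":
    # Two well-separated synthetic classes on a line.
    data = [(i, (float(i % 10), 0.0), 0) for i in range(50)]
    data += [(50 + i, (100.0 + i % 10, 0.0), 1) for i in range(50)]
    selected = democratic_selection(data)
    print(len(data), "instances reduced to", len(selected))
```

In a real deployment the map and reduce steps would run on separate workers, and the threshold would normally be tuned rather than fixed; both are simplified here to keep the sketch self-contained.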

Towards Efficient Numerical Optimal Transport

by Bruno Levy (Inria)

Recent research results at the junction between mathematics and computer science raise the possibility of new computational tools with many possible applications in data analysis, computer graphics and computational physics.

Subsampling Enables Fast Factorisation of Huge Matrices into Sparse Signals

by Arthur Mensch, Julien Mairal, Bertrand Thirion and Gaël Varoquaux (Inria)

A new algorithm developed at Inria gives large speedups when factorising matrices that are huge in both directions, a growing need in signal processing. Indeed, larger datasets are acquired every day, with more samples of richer signals. Matrix factorisation is a central tool for capturing interpretable latent representations in such very large datasets. But decomposing terabytes of data comes at a hefty cost.

Deep Learning Applied to Semantic Content Extraction of Document Images

by Victoria Ruiz, Jorge Sueiras, Ángel Sánchez and José F. Vélez (Rey Juan Carlos University)

The ATRECSIDE research project is investigating applications of deep learning models to automatic handwriting recognition problems, such as unconstrained extraction of text from document images, handwritten text recognition, and summarisation and prediction of texts.

Forensics using Internal Database Structures

by Peter Kieseberg, Edgar Weippl (SBA Research) and Sebastian Schrittwieser (JRC TARGET, St. Poelten University of Applied Sciences)

The information stored in databases is typically structured in the form of ‘B+-Trees’ in order to boost the performance of querying for information. Less well known is the fact that B+-Trees also store information on their actual construction, which permits the detection of manipulation attempts.

Development Tool Interoperability Specifications and Standardization

by Erwin Schoitsch (AIT) and Jürgen Niehaus (SafeTRANS)

Standardisation plays an important role in large industry-driven European research projects that aim to put research results into practice. These projects are run by EC PPP (Public-Private Partnership) organisations such as ECSEL JU (Electronic Components and Systems for European Leadership, Joint Undertaking) and its predecessor ARTEMIS. In recent years, several projects in the area of safety-critical embedded systems have addressed interoperability specifications (IOS) for development tools, both to reduce costs and errors in the development of critical cyber-physical systems (CPS) and to allow easier integration of tools from different suppliers. CP-SETIS, a Horizon 2020 project, will harmonise these efforts and create a sustainable structure to further develop and maintain the landscape of standards, specifications and guidelines that make up the IOS.

Enabling Research and Innovation Beyond Continental Borders: A Case for Satellite Events in Evaluation Campaigns

by Frank Hopfgartner (University of Glasgow, UK), Cathal Gurrin (Dublin City University, Ireland), and Hideo Joho (University of Tsukuba, Japan)

A satellite session of the NTCIR (Evaluation of Information Access Technologies) conference was held in Glasgow as an experiment, allowing participants to present their work either in Europe or in Asia. This experiment, designed to foster research and innovation across continental borders, was a great success.