Data Analytics and AI Testing Facility Based on Research and Technology Infrastructure

by Samuel Renault and Patrik Hitzelberger (Luxembourg Institute of Science and Technology)

Opening a high-performance data analytics and AI research and technology infrastructure to small companies will support digitisation and co-development of Industry 4.0 projects. A test-before-invest approach is supported by the local digital innovation hub.

In Luxembourg, the national AI strategy defines its first two focus areas as a living lab for applied AI and enabling access to data [L1]. To support this, LIST has invested in a hybrid high-performance data analytics and AI research and technology infrastructure, with the support of the European Regional Development Fund.

The project started from a collection of data analytics and AI use cases identified within LIST’s own project portfolio. These use cases were clustered to identify the main AI and analytics domains to be covered. The clusters and the use cases were then used to describe the requirements for a high-performance data analytics and AI infrastructure to be used in research and innovation projects with industrial partners.

A call for tenders helped to select the technology providers that supported the implementation of the infrastructure. The tender focused on a hybrid infrastructure combining on-premises clusters with proprietary and open-source software stacks and cloud services.

The two-year implementation project led to the deployment of an on-premises infrastructure composed of three main interconnected computing clusters. The first computing cluster supports IBM's data analytics software stack (IBM Cloud Pak for Data), with modules that allow users with limited skills in data analytics and AI to perform the data ingestion, cleaning and preparation steps before building their AI models with the help of assistance modules. This gives users the support they need to start their AI and data analytics journey. The second computing cluster supports an equivalent open-source software stack from Apache (Hadoop, Spark, Hive and NiFi, among others), which is meant for technically advanced users. This second stack offers finer control over all the data pipeline steps (ingestion, storage, cleaning, preparation, analysis, model development, monitoring, deployment) and greater deployment flexibility.

A third cluster dedicated to data visualisation and graphical computing was set up together with a large visualisation facility: a 7 m by 2 m multitouch visualisation wall. This allows large-scale or high-dimensional data sets to be visualised and analysed at a glance.

After its initial deployment in early 2022, the infrastructure was updated with containerisation capabilities: the open-source (Hadoop) cluster was split to accommodate a small Docker cluster. Figure 1 shows an overview of the current infrastructure and the visualisation facility in use.

Figure 1: A use case of the visualisation facility: visualisation wall in the back and a tangible table in the front (left). Overview of the infrastructure’s clusters and subscriptions (right).

Since its launch, this infrastructure has been used in nationally and European-funded projects in various fields: energy grid analytics and optimisation for a digital twin project, space image quality improvement [1], industrial process improvement [2] and product quality measurement, among others. From the beginning of 2023, the infrastructure is offered as a testing facility in two specific cases.

The first is support for a Testing and Experimentation Facilities (TEF) project on smart cities and electromobility. Here, the infrastructure will be used to jointly develop pilot cases in electromobility with private companies. Companies will be able to access data sets and to reuse and process them on the infrastructure in order to provide innovative mobility and charging solutions to cities and communities (e.g. the optimal location of charging stations, both physically and logically in the energy grid). This will also allow further development of interoperability standards such as OASC’s minimal interoperability mechanisms [L2].
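To make the charging-station example concrete, the following is a minimal illustrative sketch of one simple way such a siting question could be posed: a greedy heuristic that picks k sites from a list of candidates so that the total distance from demand points to their nearest station is small. It is not the method used in the TEF project. The function names, the coordinates and the purely geometric distance are all assumptions made for illustration, and the logical placement in the energy grid is not modelled.

```python
from math import hypot


def total_distance(demand, sites):
    """Sum of each demand point's distance to its nearest chosen site."""
    return sum(
        min(hypot(dx - sx, dy - sy) for sx, sy in sites)
        for dx, dy in demand
    )


def greedy_station_sites(demand, candidates, k):
    """Greedily pick up to k candidate sites.

    At each step, add the candidate that gives the lowest total distance
    when combined with the sites already chosen. This is a heuristic and
    does not guarantee an optimal result.
    """
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = min(remaining, key=lambda c: total_distance(demand, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: (x, y) positions in kilometres.
demand = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
candidates = [(0, 0), (5, 5), (10, 10)]

sites = greedy_station_sites(demand, candidates, k=2)
print(sites)
```

On this toy data the heuristic first selects the site next to the dense group of demand points and then the site serving the isolated point. A realistic study would replace straight-line distance with travel times and add grid capacity constraints.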

The second case is within Luxembourg’s digital innovation hub, where the technology infrastructure will be opened to industrial and manufacturing SMEs as a test-before-invest facility, with guidance and support from the infrastructure’s engineers and researchers. It will help companies to determine whether the technology is worthwhile for their specific business purpose, to tailor their artificial intelligence and data analytics projects, and to make informed investment decisions for their Industry 4.0 projects.

With numerous current developments both in data processing capabilities (e.g. AI models) and in data gathering, structuring and sharing (e.g. open data, data lakes), dedicated data infrastructures will be a key enabler for innovation in private companies. Such infrastructures, like ours in Luxembourg, provide features for data lineage, cleaning, description and access control, which are key to developing and testing ideas before scaling up to the market. Where questions of sovereignty and locality of data, processing and expertise arise, technology infrastructures offered by research organisations should be taken into consideration.

Links:
[L1] https://gouvernement.lu/dam-assets/fr/publications/rapport-etude-analyse/minist-digitalisation/Artificial-Intelligence-a-strategic-vision-for-Luxembourg.pdf
[L2] https://oascities.org/minimal-interoperability-mechanisms/

References:
[1] O. Parisot, et al., “Improving accessibility for deep sky observation,” ERCIM News, vol. 130, Jun. 2022.
[2] U. Iffat, E. Roseren, and M. Laib, “Dealing with high dimensional sequence data in manufacturing,” Procedia CIRP, vol. 104, pp. 1298–1303, Nov. 2021, doi: 10.1016/j.procir.2021.11.218.

Please contact:
Samuel Renault, Luxembourg Institute of Science and Technology, Luxembourg
