by Lisa Veiber, Salah Ghamizi (University of Luxembourg) and Jean-Sébastien Sottet (LIST)
Many statistical and machine learning (ML) models have been developed to provide forecasts for the COVID-19 crisis. Acquiring high-quality data within a short timeframe is a challenge for anyone who wants to build an ML algorithm to support forecasts about the pandemic. We propose a hybrid approach that takes into consideration factors from human knowledge in order to reinforce or correct data-driven ML predictions.
At the peak of the COVID-19 crisis, governments took drastic measures to stop the pandemic, forcing the population into a total lockdown, which brought the economy to a standstill. While social distancing has disrupted social life, the economic impact of this outbreak has been severe and unprecedented. During the resurgence in COVID-19 positive cases, countries are too economically impacted to reinstate a strict lockdown as they did for the first wave and are instead investigating different actions to limit the pandemic without crushing the economy.
Our research is thus driven by the need to provide decision-makers in Luxembourg with an appropriate tool to identify which measures help control the spread of the virus with minimal economic impact. In contrast to current approaches, which take a rather holistic view of the crisis, we have been focussing on the impact on particular sectors; specifically, the impact of COVID-19 on the hospitality sector.
We have developed a machine learning (ML) driven approach, intended to function as an instrumental backup to the economic recovery strategy and ensure granular mitigation of the pandemic’s effects. Our approach is complemented by human-centric modelling of the impacted ecosystem, including social, economic and health aspects. This model-based approach aims to compensate for the potential lack of data, fine-tune the ML results and provide better user control. Ultimately, we aim to deliver a decision-making tool that helps find the right balance between health protection and economic recovery.
The machine learning approach
The proposed ML approach embeds state-of-the-art simulation and prediction techniques to provide predictions about economics and health, based on scarce data.
We use Bayesian likelihood (Bettencourt and Ribeiro) to estimate the reproduction number Rt (a measure of how fast the disease is spreading), then we implement a deep neural network classifier that uses mobility data, as in Ghamizi et al., to predict the Rt of each economic sector. These processes are shown by the red indices 1 and 2 in Figure 1. We relied on per-sector case counts only where we had enough data for Bayesian estimation; the other sectors were estimated using a generic country-wide Rt. The neural network uses smoothed samples of 30 days with a total of 76 features.
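To make the Bayesian estimation step more concrete, the following is a minimal sketch in the spirit of Bettencourt and Ribeiro, not the implementation used in our tool. It assumes that the new cases on a given day are Poisson distributed with mean k(t-1)·exp(γ(Rt − 1)), that Rt is constant over the window, and that the prior over Rt is flat. The 7-day serial interval, the grid of candidate values and the function names are assumptions chosen for illustration.

```python
import math

GAMMA = 1 / 7.0  # assumption: inverse of a 7-day serial interval


def poisson_log_pmf(k, lam):
    """Log-probability of observing k events under a Poisson(lam) law."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def estimate_rt(cases, gamma=GAMMA, r_max=6.0, step=0.01):
    """Return (posterior mode of Rt, grid, posterior) for daily new cases.

    Simplification: Rt is treated as constant over the whole window and
    the prior over the grid is flat, so the mode is the maximum-likelihood
    value on the grid.
    """
    grid = [i * step for i in range(int(round(r_max / step)) + 1)]
    log_post = [0.0] * len(grid)  # flat prior in log space
    for prev, cur in zip(cases, cases[1:]):
        if prev <= 0:
            continue  # no information when the previous day has no cases
        for i, r in enumerate(grid):
            expected = prev * math.exp(gamma * (r - 1))
            log_post[i] += poisson_log_pmf(cur, expected)
    # Normalise in a numerically stable way.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(range(len(grid)), key=posterior.__getitem__)
    return grid[best], grid, posterior


# Synthetic daily counts growing by about 15% per day (illustrative only).
growing_cases = [100, 115, 133, 153, 177, 204, 235]
rt_mode, _, _ = estimate_rt(growing_cases)
print(f"Estimated Rt: {rt_mode:.2f}")
```

In practice the estimate would be recomputed on a sliding window so that Rt can vary over time, and a sector with too few cases would fall back to the country-wide value, as described above.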
Once the Rt has been predicted by our ML classifier, it is injected into a SEIR-HCD compartmental model, defined by a set of ordinary differential equations, to predict the number of cases, deaths, hospitalised individuals and critically ill people for each economic sector (see Figure 1, index 3). Once these elements are known, economic metrics can be computed.
For the economic modelling, only a few data points were available for the target output, which meant that only shallow ML techniques could be used without risk of overfitting. After training and validation, a linear regression was used to link the simulated epidemiological output and the parameters of the different measures to the economic impact. This is shown in Figure 1, red index 4.
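As an illustration of how a predicted Rt can drive a compartmental model, the sketch below integrates one common formulation of SEIR-HCD (Susceptible, Exposed, Infectious, Recovered, Hospitalised, Critical, Dead) with a simple forward Euler scheme. The durations and fractions are placeholder assumptions, not calibrated values, and the exact equations used in our tool may differ.

```python
def simulate_seir_hcd(rt, days, exposed0=1e-4, dt=0.1,
                      t_inc=5.0, t_inf=3.0, t_hosp=4.0, t_crit=14.0,
                      mild=0.8, critical=0.1, fatal=0.3):
    """Integrate SEIR-HCD with forward Euler; compartments are population fractions.

    All parameter values are illustrative assumptions:
    t_inc / t_inf / t_hosp / t_crit are mean durations in days,
    mild is the share of infections that recover without hospital care,
    critical is the share of hospitalised patients who become critical,
    fatal is the share of critical patients who die.
    """
    s, e, i, r, h, c, d = 1.0 - exposed0, exposed0, 0.0, 0.0, 0.0, 0.0, 0.0
    for _ in range(int(round(days / dt))):
        new_exposed = rt / t_inf * i * s
        ds = -new_exposed
        de = new_exposed - e / t_inc
        di = e / t_inc - i / t_inf
        dr = mild * i / t_inf + (1 - critical) * h / t_hosp
        dh = (1 - mild) * i / t_inf + (1 - fatal) * c / t_crit - h / t_hosp
        dc = critical * h / t_hosp - c / t_crit
        dd = fatal * c / t_crit
        s += dt * ds
        e += dt * de
        i += dt * di
        r += dt * dr
        h += dt * dh
        c += dt * dc
        d += dt * dd
    return {"S": s, "E": e, "I": i, "R": r, "H": h, "C": c, "D": d}


# Compare two hypothetical sectors with different predicted Rt values.
for label, rt in [("high-contact sector", 2.5), ("low-contact sector", 0.8)]:
    final = simulate_seir_hcd(rt, days=200)
    print(label, {name: round(value, 5) for name, value in final.items()})
```

The flows between compartments sum to zero, so the total population fraction stays at 1, which is a convenient sanity check on any variant of the equations.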
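The regression step can be pictured with a small ordinary least squares fit. The sketch below uses only the Python standard library and solves the normal equations directly; the two features (a simulated epidemiological output and a measure parameter), the target values and the function names are hypothetical and carry no real data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_linear(features, targets):
    """Return [intercept, coef_1, ...] minimising the squared error."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], feature_row))


def r_squared(coefs, features, targets):
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(coefs, f)) ** 2 for f, y in zip(features, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1 - ss_res / ss_tot


# Hypothetical rows: (simulated hospitalisation level, closure level).
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]
targets = [2.0, 5.0, 0.5, 3.5, 7.25, 0.5]  # synthetic economic indicator
coefs = fit_linear(features, targets)
print("coefficients:", [round(c, 3) for c in coefs])
print("R^2:", round(r_squared(coefs, features, targets), 3))
```

With so few data points, a model with a handful of coefficients is about as much as can be fitted safely, which is the reason given above for preferring shallow techniques.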
Figure 1: Overall process and in/outputs of our simulation tool.
Human factors and the decision-making process
We designed a user-centred tool to help decision-makers: inputs of the simulation are mapped onto real policies that authorities in Luxembourg and throughout Europe can use to mitigate the impacts of COVID-19. We believe that collecting human knowledge about contextual information, which is not always reflected in the training data, is necessary to make models more precise. Thus, we have enhanced our original ML with human-designed models (e.g., rules). These models affect the input measures and potentially the learning, or correct and/or mitigate the result of the learned model. This is represented in Figure 1, red index 5, where we build mitigation rules for ML and user input. These rules map the user input measures onto the ML inputs.
We identified some possible measures to be used as input, such as: the closure of borders to neighbouring countries (Belgium, France and Germany); the restriction of economic activity; school restrictions; and restrictions on public and private gatherings. Having data on the actual number of cases and deaths, we were able to infer the reproduction rate Rt of the pandemic and build a model to link this Rt to actual measures like border closures and school restrictions. Our model achieved an R² of 82% on a random split for Luxembourg.
We have also designed a model for different economic dependencies between the sectors. For instance, essential sectors such as energy production cannot be completely shut down since many industries, and more importantly hospitals, depend on them.
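One way to picture such a dependency model is as a rule that adjusts the activity levels requested by the user before they reach the simulation. The sketch below is a hypothetical illustration only: the sector names, the dependency graph and the rule itself (a supplier must stay at least as active as every sector that relies on it) are assumptions, not the model used in our tool.

```python
# Hypothetical dependency graph: each sector lists the sectors it relies on.
DEPENDS_ON = {
    "hospitals": ["energy"],
    "industry": ["energy"],
    "hospitality": ["energy", "industry"],
    "energy": [],
}


def enforce_dependencies(requested, depends_on=DEPENDS_ON):
    """Raise each supplier's activity to the level of the sectors relying on it.

    `requested` maps a sector name to an activity level between
    0.0 (fully closed) and 1.0 (fully open). The input is not modified.
    """
    activity = dict(requested)
    changed = True
    while changed:  # repeat until no rule fires, so chains of dependencies settle
        changed = False
        for sector, suppliers in depends_on.items():
            for supplier in suppliers:
                if activity[supplier] < activity[sector]:
                    activity[supplier] = activity[sector]
                    changed = True
    return activity


# A user asks to shut down energy production while keeping hospitals open.
requested = {"hospitals": 1.0, "industry": 0.6, "hospitality": 0.2, "energy": 0.0}
print(enforce_dependencies(requested))
```

Because activity levels can only increase and are bounded by the highest requested level, the loop always terminates.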
We also tuned the output of the ML approach by introducing a contamination factor between the sectors, which corrects the predicted Rt of a given sector using a mitigation factor. We will be able to fully validate, and potentially retrieve, those inter-sector contamination factors once we have access to more data, which unfortunately were not available at the time we developed this model.
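The exact form of this correction is not detailed here, so the following is only one possible reading, given as a sketch: the Rt of a sector is blended with a weighted average of the Rt of the sectors that can contaminate it, and a mitigation factor controls the strength of the blend. The weights, sector names and the default mitigation value are hypothetical.

```python
def correct_rt(sector_rt, contamination, mitigation=0.2):
    """Blend each sector's Rt with the Rt of the sectors that contaminate it.

    sector_rt: maps sector name to the Rt predicted by the ML model.
    contamination: maps a sector to {other_sector: weight}; hypothetical
        weights expressing how strongly other_sector contaminates it.
    mitigation: share of the corrected Rt taken from the other sectors
        (0.0 leaves the ML prediction untouched).
    """
    corrected = {}
    for sector, rt in sector_rt.items():
        weights = contamination.get(sector, {})
        total = sum(weights.values())
        if total == 0:
            corrected[sector] = rt  # no known contamination: keep the prediction
            continue
        external = sum(w * sector_rt[other] for other, w in weights.items()) / total
        corrected[sector] = (1 - mitigation) * rt + mitigation * external
    return corrected


predicted = {"hospitality": 1.5, "retail": 0.9, "energy": 0.8}
links = {"hospitality": {"retail": 1.0}}  # assumption: retail contaminates hospitality
print(correct_rt(predicted, links, mitigation=0.5))
```

Keeping the factors in an explicit table like this would make it straightforward to replace the hand-set values with fitted ones once enough data are available.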
A decision-making tool
Our decision-making tool, embedding human-designed models and ML, represents a major leap in simulation engines, which, in the near future, will help decision-makers to compare the impact of different policies on health and the economy. As it uses simulation and ML prediction as a tool for decision-making, it still requires fine-tuning to correctly map government policy (e.g., closures of non-essential shops) onto the inputs of our prediction tool: the impact of some measures on the pandemic has not been scientifically evaluated because data are lacking or not trustworthy. As the pandemic evolves, further study is required to examine how well the modelled measures correlate with reality and with the specificities of each economic sector, as well as to make this model interoperable with other models developed during the crisis.
L. M. A. Bettencourt, R. M. Ribeiro: “Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases”, PLoS ONE 3(5): e2185, 2008. https://doi.org/10.1371/journal.pone.0002185
S. Ghamizi, et al.: “Data-driven Simulation and Optimization for Covid-19 Exit Strategies”, in Proc. of KDD’20, ACM, New York, 3434–3442, 2020. https://doi.org/10.1145/3394486.3412863
Luxembourg Institute of Science and Technology (LIST), Luxembourg