by Gianluigi Folino, Massimo Guarascio (ICAR-CNR), Francesco Chiaravalloti (IRPI-CNR) and Salvatore Gabriele (IRPI-CNR)
Accurate rainfall estimates are critical for areas presenting high hydrological risks. We have devised a general machine learning framework based on a deep learning architecture, which also integrates information derived from remote sensing measurements, such as weather radars and satellites. Experiments conducted on real data from a region in southern Italy, provided by the Department of Civil Protection (DCP), show significant improvements compared to current state-of-the-art methods.
Accurate rainfall estimates are important in a range of fields, including meteorology, geology and agronomy, allowing researchers to model hydrological and other environmental processes. The spatial variability of rainfall greatly affects the local hydrological processes, and the prediction accuracy of rainfall-runoff simulations is strongly determined by the precision of rainfall estimates [1]. Therefore, an accurate retrieval of the spatio-temporal rainfall patterns is crucial for flood hazard protection, river basin management, erosion modelling and other applications for hydrological impact modelling.
To this end, rainfall sensors, called “rain gauges”, are commonly used by meteorologists and hydrologists to obtain direct, quantitative, and reliable measurements of rainfall intensity at individual point sites. Spatial interpolation methods can use rain gauge data to estimate the precipitation field over a broader area. However, even dense rain gauge networks may be too sparse to reconstruct the rainfall field and can fail to capture heavy convective events.
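As a minimal sketch of what spatial interpolation from point measurements looks like, the snippet below uses inverse distance weighting (IDW). IDW is chosen here only because it is short; it is not the method evaluated in this article, and the gauge coordinates, rainfall values and the `power` parameter are illustrative assumptions.

```python
import math


def idw_estimate(gauges, x, y, power=2.0):
    """Estimate rainfall at (x, y) from (gx, gy, value) gauge readings.

    Each gauge contributes with weight 1 / distance**power, so nearby
    gauges dominate the estimate.
    """
    if not gauges:
        raise ValueError("at least one gauge reading is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for gx, gy, value in gauges:
        dist = math.hypot(x - gx, y - gy)
        if dist == 0.0:
            return value  # the query point sits exactly on a gauge
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical readings: (x_km, y_km, rainfall_mm_per_h)
gauges = [(0.0, 0.0, 2.0), (10.0, 0.0, 6.0), (0.0, 10.0, 4.0)]
midpoint = idw_estimate(gauges, 5.0, 0.0)
```

The sketch also shows the limitation discussed above: a convective cell that falls between gauges leaves no trace in any reading, so no weighting scheme can recover it from gauge data alone.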
Recent research has investigated the possibility of integrating data from heterogeneous sources to improve the accuracy of rainfall estimation models. However, traditional interpolation methods (e.g., ordinary kriging) only handle a single source at a time. Kriging with external drift (KED) was introduced to overcome this, but its high computational cost makes it difficult to use in a real-time setting [2].
In this scenario, deep learning-based architectures have the potential to efficiently extract accurate rainfall estimation models by combining raw low-level data recorded by heterogeneous data sources. Indeed, we exploit the capacity of deep neural networks (DNNs) to work in a hierarchical way: i.e., several layers of non-linear processing units are stacked into a hierarchical scheme and each subsequent layer generates a feature set with a higher level of abstraction than the previous one. Therefore, deep learning-based approaches are well suited to analysing raw data provided in different formats and from different types of sources. Moreover, deep architectures can be used within infrastructures for big data storage and analysis (e.g., Hadoop and Spark) and can exploit GPUs to parallelise the computation and to reduce the learning times.
A joint collaboration between ICAR-CNR and IRPI-CNR aimed to design a framework [3] based on three main macro-components (discussed in further detail below): (i) information retrieval, (ii) data analytics and (iii) evaluation, making it possible to integrate information extracted from many data sources.
The information retrieval macro-module is designed to extract and integrate data from different sources. Specifically, a “data source connector” is used to establish the connection with a specific data source. The information extracted from each connector is provided as input for the “data wrapper” module that combines these data into a single view suitable for the analysis. Finally, the raw data is stored in the “knowledge base” (KB), which is used for data exchange among the framework modules.
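The flow from connectors through the wrapper to the knowledge base can be sketched as follows. This is a hedged illustration only: the class names, the `(timestamp, cell_id)` join key and the sample values are assumptions made for the example, not the framework's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical join key: (timestamp, cell_id)


@dataclass
class DataSourceConnector:
    """Connects to one source and returns its readings keyed by Key."""
    name: str
    fetch: Callable[[], Dict[Key, float]]


def wrap(connectors: List[DataSourceConnector]) -> Dict[Key, Dict[str, float]]:
    """Data wrapper: merge all sources into one record per key."""
    view: Dict[Key, Dict[str, float]] = {}
    for connector in connectors:
        for key, value in connector.fetch().items():
            view.setdefault(key, {})[connector.name] = value
    return view


@dataclass
class KnowledgeBase:
    """In-memory stand-in for the store shared by the framework modules."""
    records: Dict[Key, Dict[str, float]] = field(default_factory=dict)

    def store(self, view: Dict[Key, Dict[str, float]]) -> None:
        self.records.update(view)


# Hypothetical sample data for a single timestamp and two grid cells.
t = "2019-01-01T00:00"
connectors = [
    DataSourceConnector("gauge", lambda: {(t, "c1"): 3.2}),
    DataSourceConnector("radar", lambda: {(t, "c1"): 28.0, (t, "c2"): 12.5}),
]
kb = KnowledgeBase()
kb.store(wrap(connectors))
```

Keeping one connector per source means a new source can be added without touching the wrapper, which only needs the shared key to align readings.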
Rain gauge data are provided by the Calabrian DCP, weather radar data are delivered by the Italian DCP, and Meteosat Second Generation (MSG) data are acquired and decoded by a DVB receiver station in High Rate Information Transmission (HRIT) format.
The data analytics module is devoted to training the rainfall estimation model (REM) from the raw data stored in the KB. It includes three sub-modules designed to handle the whole knowledge discovery flow: data preprocessing, data sampling and model building. Some transformation and filtering operations have to be performed before the data can be provided as input to the learning algorithm. The data preprocessing module applies the data cleaning methods needed to handle the different data issues: missing values, outliers, and noisy data.
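A minimal sketch of one possible cleaning step is given below. It treats `None` as a missing reading, treats values outside an assumed plausible range as outliers, and replaces both with the median of the remaining readings. The range limits and the median imputation are assumptions chosen for illustration; the article does not specify which cleaning methods the module applies.

```python
from statistics import median


def preprocess(values, lower=0.0, upper=300.0):
    """Replace missing (None) and out-of-range readings with the median.

    `lower` and `upper` are hypothetical plausibility limits in mm/h.
    """
    def is_valid(v):
        return v is not None and lower <= v <= upper

    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid readings to impute from")
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]


# Hypothetical raw series with a gap, a negative value and a spike.
raw = [1.0, None, 3.0, -5.0, 999.0, 2.0]
cleaned = preprocess(raw)
```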
Figure 1: left, DNN Architecture; right, building block of the DNN model.
A classification model based on DNN architectures is used to estimate the rainfall. Specifically, a feed-forward fully-connected neural network including dropout and batch normalisation layers, shown in Figure 1 (left), is employed to provide more accurate predictions for heavy rainfall events. The building block of our architecture (Figure 1, right) includes three base components: (i) a fully-connected dense layer using a rectified linear unit (ReLU) activation function for each node in the layer, (ii) a batch-normalisation layer to improve the stability and performance of the current dense layer, and (iii) a dropout layer to reduce the risk of overfitting. Moreover, we devised a suitable weighted loss to tackle the class imbalance problem caused by the rarity of (dangerous) heavy rainfall events.
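The building block and the idea of a class-weighted loss can be sketched in plain Python as follows. This is an illustrative forward pass under stated assumptions, not the authors' implementation: the batch normalisation omits the learnable scale and shift, the loss is a generic weighted cross-entropy, and the layer weights, class weights and sample values are hypothetical.

```python
import math
import random


def dense_relu(batch, weights, biases):
    """Fully connected layer with ReLU; weights holds one vector per unit."""
    return [
        [max(0.0, sum(x * w for x, w in zip(row, unit)) + b)
         for unit, b in zip(weights, biases)]
        for row in batch
    ]


def batch_norm(batch, eps=1e-5):
    """Normalise each feature to zero mean, unit variance over the batch."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(c) / n for c in columns]
    variances = [sum((v - m) ** 2 for v in c) / n
                 for c, m in zip(columns, means)]
    return [
        [(v - m) / math.sqrt(var + eps)
         for v, m, var in zip(row, means, variances)]
        for row in batch
    ]


def dropout(batch, rate, rng, training=True):
    """Inverted dropout: zero units with probability rate, rescale the rest."""
    if not training or rate == 0.0:
        return batch
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in row]
            for row in batch]


def building_block(batch, weights, biases, rate, rng, training=True):
    """Dense + ReLU, then batch normalisation, then dropout."""
    hidden = batch_norm(dense_relu(batch, weights, biases))
    return dropout(hidden, rate, rng, training)


def weighted_cross_entropy(probs, labels, class_weights):
    """Mean negative log-likelihood, scaled per sample by its class weight."""
    total = sum(-class_weights[y] * math.log(p[y])
                for p, y in zip(probs, labels))
    return total / len(labels)


# Hypothetical inputs: two samples, two features, two hidden units.
batch = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0, 0.0], [0.0, -1.0]]
biases = [0.0, 0.0]
out = building_block(batch, weights, biases, rate=0.5,
                     rng=random.Random(0), training=False)

# Rare class 1 (e.g. heavy rain) gets a larger, hypothetical weight.
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
loss = weighted_cross_entropy(probs, labels, class_weights=[1.0, 5.0])
plain = weighted_cross_entropy(probs, labels, class_weights=[1.0, 1.0])
```

Giving the rare class a larger weight makes each misclassified heavy-rainfall sample cost more during training, which pushes the model towards detecting such events at the price of more false alarms.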
To evaluate the solution, we applied it to a challenging real-world scenario: data from Calabria, a peninsular region in Italy (see Figure 2), provided by the Italian Department of Civil Protection. It represents an effective test case for a number of reasons: despite having more than 700 km of coastline, it is one of the most mountainous regions in Italy, and therefore exhibits strong climatic variability. Furthermore, the interaction between the complex orography, the heat flux from the Mediterranean Sea and the high reliefs near the warm sea supports the convective instability of the region. In addition, floods and landslides are quite frequent here.
Figure 2: Map of Calabria. The small circles represent the rain gauges, and the large dashed circles the radar ranges.
Figure 3: An example of the areal rainfall field estimation for Calabria (left, using our method, right using KED).
Experimental results show significant improvements in comparison with KED (see Figure 3) and with other machine-learning techniques. Although it generates more false alarms, our method detects more rainfall events, in particular for the last two classes, which represent exceptional and/or extreme rainfall events.
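The trade-off between detected events and false alarms can be quantified per class. The sketch below computes the probability of detection (POD) and the false alarm ratio (FAR), two scores commonly used in forecast verification. Whether the evaluation module uses exactly these scores is an assumption, and the labels are invented for the example.

```python
def detection_scores(y_true, y_pred, target_class):
    """Return (POD, FAR) for one class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    hits = sum(1 for t, p in pairs if t == target_class and p == target_class)
    misses = sum(1 for t, p in pairs if t == target_class and p != target_class)
    false_alarms = sum(1 for t, p in pairs
                       if t != target_class and p == target_class)
    # POD: share of real events of this class that were detected.
    pod = hits / (hits + misses) if hits + misses else 0.0
    # FAR: share of predictions of this class that were wrong.
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0
    return pod, far


# Hypothetical labels; class 2 stands for the most intense rainfall class.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 1]
pod, far = detection_scores(y_true, y_pred, target_class=2)
```

A method tuned to catch rare extreme events will typically raise POD for those classes while also raising FAR, which matches the behaviour described above.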
Our method is being implemented within RAMSES (RAilway Meteorological SEcurity System), a pilot CNR project recently co-funded by RFI SpA that aims to mitigate geo-hydrological risk along the railway [L1].
[1] S. Gabriele, F. Chiaravalloti, A. Procopio: “Radar–rain-gauge rainfall estimation for hydrological applications in small catchments”, Advances in Geosciences, 44, 61-66, 2017.
[2] N. Nanding, M. A. Rico-Ramirez, D. Han: “Comparison of different radar-raingauge rainfall merging techniques”, Journal of Hydroinformatics, 17(3), 422-445, 2015.
[3] G. Folino, et al.: “A Deep Learning based architecture for rainfall estimation integrating heterogeneous data sources”, IJCNN 2019, pp 1-8, 2019.