Open access

Modeling the Geospatial Evolution of COVID-19 using Spatio-temporal Convolutional Sequence-to-sequence Neural Networks

Authors:

Arlindo Oliveira

ACM Transactions on Spatial Algorithms and Systems, Volume 8, Issue 4

Article No.: 28, Pages 1 - 19

https://doi.org/10.1145/3550272

Published: 01 December 2022

Abstract

Europe was hit hard by the COVID-19 pandemic and Portugal was severely affected, having suffered three waves in the first twelve months. Approximately between January 19th and February 5th 2021 Portugal was the country in the world with the largest incidence rate, with 14-day incidence rates per 100,000 inhabitants in excess of 1,000. Despite its importance, accurate prediction of the geospatial evolution of COVID-19 remains a challenge, since existing analytical methods fail to capture the complex dynamics that result from the contagion within a region and the spreading of the infection from infected neighboring regions.

We use a previously developed methodology and official municipality level data from the Portuguese Directorate-General for Health (DGS), relative to the first twelve months of the pandemic, to compute an estimate of the incidence rate in each location of mainland Portugal. The resulting sequence of incidence rate maps was then used as a gold standard to test the effectiveness of different approaches in the prediction of the spatial-temporal evolution of the incidence rate. Four different methods were tested: a simple cell level autoregressive moving average (ARMA) model, a cell level vector autoregressive (VAR) model, a municipality-by-municipality compartmental SIRD model followed by direct block sequential simulation, and a new convolutional sequence-to-sequence neural network model based on the STConvS2S architecture. We conclude that the modified convolutional sequence-to-sequence neural network is the best performing method in this task, when compared with the ARMA, VAR, and SIRD models, as well as with the baseline ConvLSTM model.

1 Introduction

Modeling the spatio-temporal dynamics of the evolution of the COVID-19 pandemic is important for a variety of reasons, which include health service planning, adoption of containment measures, and allocation of resources. In fact, knowing the incidence rate in a given area enables public health officials and individuals to adopt the best strategies to contain the disease and reduce the risk of infection.

Regrettably, it is difficult to predict the incidence rate in the near future at any given time and place not only because statistics are usually collected at an aggregate level at the country, province, municipality, or city level but also because there are no widely known methods that are reliable enough to make this prediction.

However, since contagion depends on individual and social behavior and only happens when people are in close proximity, incidence rate maps remain an excellent tool to plan and act in order to contain an epidemic that propagates mainly through person-to-person contact. Previous work by some of the authors of this article proposed an incidence rate spatial model for COVID-19, based on a geostatistical framework and using the confirmed positive tested cases reported by the Portuguese Directorate-General for Health (DGS, its acronym in Portuguese) [1]. These incidence rate maps are a very good approximation to the real incidence rate since they are derived from official data and require only mild assumptions about the way epidemics spread through a territory.

In this work, we propose a new method to perform spatio-temporal prediction of future incidence rates of COVID-19, and assess its effectiveness with real-world data obtained during the first year of the pandemic, in Portugal. The proposed method, a convolutional sequence-to-sequence neural network model based on the STConvS2S architecture, performs better than the tested alternatives: a simple cell level autoregressive moving average (ARMA) model, a cell level vector autoregressive (VAR) model, and a municipality level compartmental SIRD model followed by the same geostatistical simulation method used to generate the reference data. The experiments showed that the convolutional sequence-to-sequence neural network provides the best results in terms of prediction ability. Furthermore, the predictions performed by this method degrade more gracefully as the prediction date moves forward to a more distant future, showing that the ability to take into account spatio-temporal information is important when making predictions about the evolution of pandemics.

After a brief review of related work, in Section 2, we describe the method used to derive the incidence rate maps and the characteristics of the reference dataset in Section 3. To test the accuracy of the modeling algorithms in the incidence rate data, we implemented and evaluated four different methods, described in Section 4. The results obtained with these models are reported in Section 5. Section 6 presents the conclusions gathered from this work and proposes a few directions for future work.

2 Related Work

In this section, we provide a brief overview of related work on geospatial modeling of disease risks, with a focus on the methods that take into account the continuous nature of the territories.

2.1 Geospatial Modeling of Disease Risk

Data concerning diseases is commonly available by administrative regions. For many infectious diseases, statistics on the number of cases by countries, municipalities, or other administrative regions are widely available. Thus, the most common approach to map and visualize the risk of disease is the use of choropleth maps, where those pre-defined spatial regions are colored in accordance with recorded incidence rates. In the wake of the COVID-19 pandemic, many such analyses have been performed and made available [2, 3, 4]. However, such maps suffer from the intrinsic limitation caused by the arbitrary boundaries imposed by the pre-defined administrative regions considered and exhibit undesirable discontinuities at these boundaries. Geostatistics provides the framework to model spatial variability and to account for data with varying spatial support, delivering continuous maps of the risk of disease and overcoming the limitations of choropleth maps. Additionally, geostatistical risk maps can integrate data uncertainty resulting from population size, which is very important in this case, given that the population density varies widely across the country.

A number of methods that estimate the spatial risk of diseases from known datapoints have been proposed [5], including distance-based approaches and kernel density estimation (KDE) methods. These and other methodologies may be used to convert point data to continuous risk maps. However, these methods have been designed to model static or slow-varying distributions, and cannot be trivially adapted to dynamic modeling of diseases. Combined with time-series prediction, KDE has had some success in visualizing disease evolution [6] but depends critically on the accuracy of the time-series prediction method.

Direct estimation of the geospatial risk of COVID-19 in Switzerland using high-resolution spatio-temporal data analysis [7], based on the modified space–time density-based spatial clustering of applications with noise (MST-DBSCAN) algorithm [8], has been proposed. However, health data with high spatio-temporal resolution are rarely available, due to limited information and other limiting factors, such as personal data protection policies.

Our geospatial risk model, described in Section 3, is based on geostatistical block sequential simulation [9], assuming a Poisson model for rare diseases as proposed by Waller and Gotway [10] to estimate the local distribution functions [11]. Given the actual municipality-level data, with varying support, this method makes it possible to integrate this uncertainty into the spatial model to generate the incidence rate maps and to assess their spatial uncertainty in any region in the country.

The block sequential simulation we use assumes that the disease propagation is isotropic, at the resolution involved in the simulation. In practice, this may not be true at some scales, since streets, roads, city organization, and other characteristics of built space may privilege some specific directions of transmission. However, this assumption makes the problem tractable and our experiments with mobility information (provided by a cellphone operator and tested as additional inputs) did not contribute to improving the precision of the models, contrary to our initial expectations. The method also assumes that the incidence rate collected at the municipality level is a good proxy for the incidence rate in the municipality since it is used in the kriging process. This is an assumption of the direct block sequential simulation (Block-DSS) method that cannot be avoided. However, we weigh the location of the centroid by the population density of each municipality, in order to decrease the possible negative effects of this assumption.

2.2 Compartmental Models

Compartmental models have been extensively used in epidemiological studies. The first proposals to group populations in different compartments and to use differential equations to model the transitions between compartments, deriving the dynamics of the processes, date back more than a century [12, 13]. Compartmental models use continuous differential equations to model the evolution of an epidemic in a population of individuals, where each individual is assigned a state (a compartment) that determines its stage in the epidemic process [13, 14]. Since then, many articles have been published on the use of compartmental models to predict and simulate the dynamics of epidemics. In the last year alone, hundreds of articles have been published on the application of compartmental models to study the dynamics of the COVID-19 pandemic in many countries and locations, such as China, Italy, and India [15, 16, 17]. A complete description of the many applications of the models, as well as the variants proposed, is outside the scope of this work.

Our own application of compartmental models to predict the dynamics of the pandemic in Portugal is based on the Susceptible-Infected-Recovered-Dead (SIRD) model, which uses four different compartments. Despite the model’s simplicity, many assumptions and approximations are required to make it fit real-world data, including the consideration of time-varying parameters, adjustments for small-sample, high-variance, data, and estimation of unknown values in the observed data series. We used, with adaptations described in Section 4.3, the methods proposed in recent work that used data from the COVID-19 pandemic in New York City, Madrid, and Stockholm, among other cities, together with various states, countries, and regions to estimate a SIRD model [18].

2.3 Autoregressive Moving Average and Vector Autoregressive Models

ARMA models have been widely used in the modeling of time series and they can be used to predict the evolution of a univariate stationary stochastic process using two polynomials, one for the autoregression and a second one for the moving average. Proposed by Peter Whittle [19] in 1951, the ARMA model has been extensively used in time series analysis and represents a solid baseline against which other, more sophisticated, models can be tested.

The ARMA model suffers from a strong limitation since it computes the evolution of the incidence rate at each cell not taking into account the values of any other cells. To circumvent this limitation we also tested a VAR model [20], which computes the incidence rate at each cell using all the values from the other cells. Vector autoregression is commonly used to capture the relationship between different time-varying quantities, as it is a generalization of the (univariate) autoregressive model that does not suffer from the same limitations.

2.4 Spatio-temporal Modeling using Neural Networks

Recurrent neural network (RNN) architectures, based on Long Short-Term Memory (LSTM) units [21] or gated recurrent units (GRU) [22] have been extensively employed in forecasting tasks involving time-series data. However, these models consider the input data as independent sequences of vectors and do not explore the spatial context available in Spatio-temporal data.

To address this limitation, spatial convolution operations were combined with LSTMs, thereby exploiting the abilities of convolutional neural networks (CNNs) and RNNs to effectively model spatial and temporal information, respectively. In the Convolutional LSTM (ConvLSTM) approach [23], all input data structures are 3D tensors, with the first dimension corresponding to either the number of measurements or the number of feature maps and the last two dimensions representing the spatial dimensions (i.e., width and height). By replacing the product operation in the original LSTM with the convolution operation, the future states of a certain cell depend on the inputs and past states of itself and its local neighbors.

Many different architectures have been proposed using ConvLSTM building blocks, and applied to different problems, such as weather prediction from satellite image sequences [24], air quality forecasting [25], and video frame prediction [26].

Inspired by the aforementioned ConvLSTM architecture, Wang et al. [27] proposed another recurrent model, PredRNN, which captures spatial and temporal features in a unified memory pool. In PredRNN, the states of an adapted LSTM cell can travel both vertically between layers and horizontally across states. The authors introduced a new cell state in each LSTM unit, which flows in a zig-zag direction, first upwards across layers and then forwards over time. This extension to standard LSTM units, called Spatio-Temporal Long Short-Term Memory (ST-LSTM), allows the simultaneous flow of the new spatio-temporal memory and of the standard temporal memory present in traditional ConvLSTMs. The authors defined PredRNN as a multi-layer architecture employing ST-LSTMs, having tasked the model with predicting 10 future observations from three distinct datasets, given the previous 10 observations. Results showed that PredRNN utilizing the ST-LSTM architecture significantly outperformed every other baseline, including previous state-of-the-art models. In general, the proposed model was able to more accurately predict the trajectories and digits, particularly in sequences that displayed overlapping sections, when compared to the baseline models.

In the architecture that served as the basis for this work, STConvS2S [28], the authors used a standard encoder-decoder structure comprised of multiple stacked ConvLSTM layers, with each decoder layer initializing its hidden states from the output of the corresponding encoder layer. The final predictions of the model are given by the concatenation of the hidden states from the decoder network followed by a \(1 \times 1\) convolution. This method was adapted to this problem by introducing a number of improvements, including local weights and learnable inputs, described in Section 4.4.

3 Computing the Incidence Rate

3.1 Computation of Incidence Rate using Block-DSS

We use the incidence rate, the number of confirmed positive tests in a region, in a period of 14 consecutive days, divided by the population of the region, as a proxy for assessing the risk of infection. In particular, we consider the incidence rate per municipality \(z_\alpha (t)\), defined by

\begin{equation} z_\alpha (t)=\frac{c_\alpha (t)}{n_\alpha } , \end{equation}

(1)

where \(c_\alpha (t)\) is the number of confirmed positive tests in the 14 days preceding a given day t, at each municipality, \(\alpha\), and \(n_\alpha\) is the respective population size. We will use \(z_u\) to denote the incidence rate at a specific location u (or cell) in the country. In what follows, we may drop the explicit dependence on t to simplify the notation, when not required.

The daily numbers of new cases by municipality are reported regularly by the Portuguese DGS, and they have been converted to incidence rates by dividing by the population count of each municipality, as given by official statistics, and scaled to 100,000 inhabitants (thus corresponding to the number of new cases in the most recent 14 days per 100,000 inhabitants). This will be the scale used in the maps and reports in this article.
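As an illustration of Equation (1) and of the scaling described above, the following minimal sketch computes the 14-day incidence rate per 100,000 inhabitants from daily case counts. The function name, the array layout, and the choice of including day t in the 14-day window are assumptions made for this example, not part of the published pipeline.

```python
import numpy as np

def incidence_rate_14d(daily_cases, population, per=100_000):
    """14-day incidence rate per `per` inhabitants (illustrative helper).

    daily_cases: array of shape (days, municipalities) with new confirmed cases.
    population:  array of shape (municipalities,) with resident counts.
    """
    daily_cases = np.asarray(daily_cases, dtype=float)
    population = np.asarray(population, dtype=float)
    # c_alpha(t): cases accumulated over the 14 days ending on day t
    # (assumption: the window includes day t; the first 13 days of the
    # series use the shorter window that is available).
    cumulative = np.cumsum(daily_cases, axis=0)
    window_sum = cumulative.copy()
    window_sum[14:] = cumulative[14:] - cumulative[:-14]
    # z_alpha(t) = c_alpha(t) / n_alpha, rescaled to cases per 100,000.
    return window_sum / population * per

# Hypothetical data: one case per day in two municipalities.
rates = incidence_rate_14d(np.ones((20, 2)), [1_000, 2_000])
```

With one case per day, the rate stabilizes once a full 14-day window is available, and the municipality with twice the population shows half the rate.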

Since not all cases are detected, this incidence rate may underestimate the real incidence rate. However, it is reasonable to assume that the ratio between the actual number of newly infected persons and the number of detected cases is approximately equal in the different regions of the country, enabling us to use the incidence rate computed in this way as a proxy for the real incidence rate. In practice, we may be underestimating the incidence rate by a significant factor, since it is known [29] that a relevant fraction of the COVID-19 cases are asymptomatic and remain undetected. This has an impact on the value of the parameters of the dynamic models, since important characteristics of the process, such as the group immunity threshold, depend on the actual rates of infection and recovery. In practice, we believe this underestimation does not have a strong impact on the accuracy of the models (other than a systematic underestimation), since all model parameters are estimated from actual data and updated as time goes by in order to reflect the dynamics of the pandemic. Therefore, the only significant impact of the fact that not all cases are detected is the underestimation of the incidence rate, by a factor that can be considered approximately constant, if one assumes that there are no significant differences in testing strategy from region to region. This assumption can be justified by the relative homogeneity of the health system in a country like Portugal.

We consider the country divided into a regular grid of square cells, which, in this work, have a dimension of 2 km by 2 km (see Figure 1 for an example of the discretization used).

Fig. 1.

The methodology can be trivially adapted to a different grid. We compute the incidence rate maps by estimating \(z_u\) for each cell in the country, using geostatistical stochastic simulation, namely block-DSS [30], which is based on a stationary spatial covariance model (i.e., stationary variogram model). This modeling technique allows the daily update of COVID-19 incidence rate maps at high-resolution together with the associated spatial uncertainty. The geostatistical incidence rate maps are continuous (up to the scale of the discretization) and their spatial distribution follows a given spatial pattern as revealed by a variogram model inferred from the data. The incidence rate maps do not show any sharp discontinuities and are not influenced by administrative boundaries, such as municipality borders. Since incidence rates refer to different population sizes, these must be weighted when calculating the experimental variogram such that municipalities with large populations have greater weighting.

Below, we briefly describe this geostatistical simulation method. A full description of the method can be found in previous work by the authors [1]. We assume that the geometric centroid of each municipality \(\alpha\) has coordinates \((x_\alpha ,y_\alpha)\). Data are available for the 278 municipalities in mainland Portugal, as illustrated in Figure 2.

Fig. 2.

For clarity, in this section, we will drop the explicit dependency on t. The daily incidence rates \(z_\alpha\) known for each municipality are assigned to the corresponding centroid (weighted by population density) and used as experimental data in the geostatistical simulation. This is a reasonable approximation given the relatively small area of each municipality when compared with the area of the country.

Within a geostatistical framework, \(z_\alpha\) is interpreted as a realization of a random variable \(Z_\alpha\) corresponding to the true, and unknown, incidence rate in municipality \(\alpha\). The expected value \(E[Z_\alpha ]\) is an approximation of the incidence rate. The variance of \(Z_\alpha\) depends on the population size of each municipality. For example, a given municipality with a small population size \(n_\alpha\) (i.e., a small denominator) will have high variance and consequently higher uncertainty in the incidence rate estimate. We use the Poisson kriging model [11] to define the risk variance

\begin{equation} Var[Z_\alpha ]=\sigma _R^2+\frac{E[Z_\alpha ]}{n_\alpha }. \end{equation}

(2)
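To make the effect of the population size in Equation (2) concrete, the short sketch below evaluates the risk variance for two municipalities with the same expected rate. All numeric values and the function name are hypothetical and chosen only for illustration.

```python
import numpy as np

def risk_variance(expected_rate, population, sigma_r2):
    """Var[Z_alpha] = sigma_R^2 + E[Z_alpha] / n_alpha, as in Equation (2)."""
    expected_rate = np.asarray(expected_rate, dtype=float)
    population = np.asarray(population, dtype=float)
    return sigma_r2 + expected_rate / population

# Hypothetical values: same expected rate, very different population sizes.
variances = risk_variance(expected_rate=[0.01, 0.01],
                          population=[2_000, 500_000],
                          sigma_r2=1e-6)
```

The first (small) municipality receives the larger variance, which is the behavior the Poisson kriging model uses to down-weight uncertain observations.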

Block-DSS [30] is a stochastic sequential simulation method based on the Poisson kriging model and an adequate modeling technique to deal with data with varying spatial support (i.e., municipalities with varying size and shape) and to make spatial predictions with a change of support, e.g., from the area (block) to point support. In this case, the scale related to the map cells can be referred to as point support, because it denotes a small area when compared with the municipality areas. The following sequence of steps summarizes the block-DSS algorithm [30]:

(1)

Generate a random path that visits each cell, \({u}\), of the simulation grid;

(2)

at each location along the random path, \({u}\), search the conditioning data within a given neighborhood dependent on the variogram model. The conditioning data comprises the closest experimental data, previously simulated values and block data;

(3)

calculate the local covariance values considering spatial covariance matrices between block-to-block, point-to-block, and point-to-point. The point data represents the incidence rates assigned at the centroid of each municipality and the block data are defined as the spatial linear average of point values. These matrices are built to solve the block kriging system and obtain the local mean and kriging variance estimate at location \({u}\) [30];

(4)

draw a value, \(z_u\), from the global probability distribution function centered at the local mean and bounded by the local variance obtained in (3);

(5)

add the simulated value to the dataset and repeat steps (2) to (5) until all grid cells are simulated for one realization.
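The sequence of steps above can be sketched with a deliberately simplified, point-support sequential simulation. This is not the Block-DSS algorithm of [30]: the sketch assumes no block data, an exponential covariance model with a hypothetical range, simple kriging with a stationary mean, a fixed number of nearest neighbors as the search neighborhood, and a Gaussian draw in step (4) instead of sampling from the global distribution. Under these assumptions, simulated values may be negative, which a real incidence model would have to prevent.

```python
import numpy as np

def exp_cov(h, sill, corr_range):
    """Exponential covariance model (an assumed variogram model)."""
    return sill * np.exp(-3.0 * h / corr_range)

def sequential_simulation(grid_xy, data_xy, data_z,
                          corr_range=10.0, n_neighbors=8, seed=0):
    """Toy sequential simulation that mirrors steps (1) to (5)."""
    rng = np.random.default_rng(seed)
    grid_xy = np.asarray(grid_xy, dtype=float)
    known_xy = [np.asarray(p, dtype=float) for p in data_xy]
    known_z = [float(v) for v in data_z]
    mean = float(np.mean(known_z))
    sill = float(np.var(known_z)) or 1.0
    simulated = np.empty(len(grid_xy))
    # Step (1): random path visiting every cell once.
    for idx in rng.permutation(len(grid_xy)):
        u = grid_xy[idx]
        # Step (2): closest conditioning data (experimental data plus
        # previously simulated values).
        pts = np.array(known_xy)
        vals = np.array(known_z)
        dist = np.linalg.norm(pts - u, axis=1)
        near = np.argsort(dist)[:n_neighbors]
        sel = pts[near]
        # Step (3): solve the kriging system for local mean and variance.
        pair_dist = np.linalg.norm(sel[:, None, :] - sel[None, :, :], axis=2)
        cov_matrix = exp_cov(pair_dist, sill, corr_range)
        cov_matrix += 1e-9 * sill * np.eye(len(near))  # numerical jitter
        cov_vector = exp_cov(dist[near], sill, corr_range)
        weights = np.linalg.solve(cov_matrix, cov_vector)
        local_mean = mean + weights @ (vals[near] - mean)
        local_var = max(sill - weights @ cov_vector, 0.0)
        # Step (4): draw a value around the local mean (Gaussian assumption).
        value = rng.normal(local_mean, np.sqrt(local_var))
        # Step (5): add the simulated value to the conditioning data.
        known_xy.append(u)
        known_z.append(value)
        simulated[idx] = value
    return simulated

# Hypothetical 5 x 5 grid conditioned on three "municipality centroids".
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])
centroids = [(0.5, 0.5), (3.5, 1.5), (2.5, 3.5)]
rates = [100.0, 400.0, 250.0]
realization = sequential_simulation(grid, centroids, rates, seed=1)
```

Running the function with different seeds yields different realizations, whose spread per cell plays the role of the uncertainty estimate described in the text.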

3.2 Dataset Generation and Characterization

Block-DSS generates alternative incidence rate maps, designated realizations, at each run, as the random path changes and consequently the conditioning data when simulating a location \({u}\). A given set of realizations provides the value of the uncertainty of incidence rate distribution in a given cell. We have simulated daily sets of one hundred realizations of incidence rates for mainland Portugal.

We computed the incidence values for each cell in the territory, for each day in the period under consideration. The final dataset, used to assess the performance of the models, was generated from data in 278 municipalities, for the period between March 1st, 2020 and February 28th, 2021. The territory was discretized into 40,608 square cells (in a \(288 \times 141\) grid), each cell corresponding to a surface of 2 km by 2 km. The resulting dataset consists of information (incidence and uncertainty estimates) for each cell during the first 365 days of the pandemic. The first 184 days (March 1st until August 31st) were used as the training set, to train or define the parameters of the models, and the following 181 days were used to test the prediction ability of the models.

To illustrate the data, Figure 3 depicts an example of the computation of the incidence rates, for a specific day in the period.

Fig. 3.

Since the method also derives confidence intervals for the incidence rate in each cell, we also make these available (see example in Figure 4).

Fig. 4.

The value of the incidence rate at each cell represents a good approximation, given existing data limitations, to the real incidence rate at that location. Both the incidence risk maps and the confidence intervals are made available as supplemental data.

4 Modeling Methods

To model the spatio-temporal spread of the COVID-19 epidemic in Portugal, we implemented four models:

—

An autoregressive moving average (ARMA) model at the cell level;

—

a vector-autoregressive model at the cell level that takes into account the previous values of all other cells;

—

a municipality level compartmental SIRD model coupled with block-DSS;

—

a spatio-temporal convolutional sequence-to-sequence neural network.

All models produce predictions at the cell level. These predictions are then compared with the gold standard obtained using the method described in Section 3.

4.1 Autoregressive Moving Average Model

The baseline ARMA(p, q) model is a simple, cell-level, autoregressive-moving-average model [19]. In this model, the incidence rate for each cell u is computed using

\begin{equation} z_u(t) = k_u + \epsilon _u(t)+\sum _{i=1}^p \varphi _u(i) z_u(t-i)+\sum _{i=1}^q \theta _u(i) \epsilon _u(t-i), \end{equation}

(3)

where \(k_u\), \(\varphi _u(i)\), and \(\theta _u(i)\) are calculated, for each cell u, using linear regression, in order to minimize the observed errors, \(\epsilon _u(t)\).

We used the ARIMA package in Python to compute an independent ARMA model for each cell u. The resulting model was used to predict the temporal evolution of the incidence rate \(z_u(t)\) for each cell u at time t.
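To show how Equation (3) turns into a multi-step prediction for one cell, the sketch below applies given ARMA coefficients recursively, replacing future errors by their expected value of zero. It is a plain NumPy illustration and not the package used in the experiments; the function name and the convention of setting the first in-sample errors to zero are assumptions.

```python
import numpy as np

def arma_forecast(z, k, phi, theta, steps):
    """Forecast `steps` >= 1 values of one cell's series with Equation (3).

    z: observed series z_u(t), with at least max(p, q) values;
    k: constant k_u; phi: AR coefficients phi_u(1..p);
    theta: MA coefficients theta_u(1..q).
    """
    z = list(np.asarray(z, dtype=float))
    p, q = len(phi), len(theta)
    eps = [0.0] * len(z)
    # Recover in-sample errors recursively; the first max(p, q) errors
    # are set to zero (assumption).
    for t in range(max(p, q), len(z)):
        ar = sum(phi[i] * z[t - 1 - i] for i in range(p))
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(q))
        eps[t] = z[t] - (k + ar + ma)
    # Forecast: future errors are replaced by their expected value, zero.
    for _ in range(steps):
        t = len(z)
        ar = sum(phi[i] * z[t - 1 - i] for i in range(p))
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(q))
        z.append(k + ar + ma)
        eps.append(0.0)
    return np.array(z[-steps:])

# Hypothetical AR(1) cell: z(t) = 1 + 0.5 z(t-1), starting from z = 4.
ar_forecast = arma_forecast([4.0], k=1.0, phi=[0.5], theta=[], steps=3)
```

In this AR(1) example the forecast decays geometrically toward the fixed point of the recursion, which illustrates why purely autoregressive cell-level predictions flatten out over longer horizons.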

4.2 Vector Autoregressive Model

We used a VAR model [20], which computes the incidence rate at each cell using all the values from the other cells, in accordance with

\begin{equation} z_u(t) = k_u + \epsilon _u(t) + \sum _{i=1}^p \sum _{v=1}^N A_{u,v} (i) z_v(t-i), \end{equation}

(4)

where \(z_u\) denotes the incidence rate and each \(A(i)\), \(i = 1, \ldots , p\), is an \(N \times N\) time-invariant matrix, calculated, together with \(k_u\), using linear regression, in order to minimize the observed errors, \(\epsilon _u(t)\). We used the VAR package in Python to compute a VAR model for the data and a prediction of \(z_u(t)\), for each cell u at time t.
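The linear regression behind Equation (4) can be illustrated with an ordinary least-squares fit in NumPy. This is a sketch for a small number of cells, not the package used in the experiments: with N = 40,608 cells each lag matrix has N × N entries, so a dense fit like this one would not be practical at full resolution. Function names and the test system are hypothetical.

```python
import numpy as np

def fit_var(Z, p):
    """Least-squares fit of Equation (4). Z has shape (T, N): T days, N cells.

    Returns k with shape (N,) and A with shape (p, N, N); A[i - 1] is A(i).
    """
    Z = np.asarray(Z, dtype=float)
    T, N = Z.shape
    # Regressors for day t: a constant followed by z(t-1), ..., z(t-p).
    X = np.hstack([np.ones((T - p, 1))] +
                  [Z[p - i:T - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, Z[p:], rcond=None)
    k = coef[0]
    # coef rows are indexed by (lag, source cell v), columns by target cell u.
    A = coef[1:].reshape(p, N, N).transpose(0, 2, 1)
    return k, A

def forecast_var(Z, k, A, steps):
    """Iterate Equation (4) forward with the error term set to zero."""
    history = [row for row in np.asarray(Z, dtype=float)]
    for _ in range(steps):
        nxt = k + sum(A[i] @ history[-1 - i] for i in range(len(A)))
        history.append(nxt)
    return np.array(history[-steps:])

# Hypothetical noise-free two-cell system used to check the fit.
A_true = np.array([[0.9, -0.3], [0.3, 0.9]])
k_true = np.array([1.0, 2.0])
series = [np.array([5.0, -3.0])]
for _ in range(29):
    series.append(k_true + A_true @ series[-1])
series = np.array(series)
k_hat, A_hat = fit_var(series, p=1)
prediction = forecast_var(series, k_hat, A_hat, steps=2)
```

The off-diagonal entries of each lag matrix are what allow the prediction for one cell to use the past values of the other cells, which the cell-level ARMA model cannot do.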

4.3 Compartmental Models Coupled with Block-DSS

4.3.1 SIRD.

In the SIRD compartmental model used at the municipality level, a given person can be in one of four different states with respect to the disease: susceptible (S), infected (I), recovered (R), and dead (D). Under this model, the dynamics of the epidemic are then modeled using the following system of differential equations

\begin{equation} \frac{d S}{d t} = - \frac{\beta I S}{N}, \end{equation}

(5)

\begin{equation} \frac{d I}{d t} = \frac{\beta I S}{N}-\gamma I-\delta I, \end{equation}

(6)

\begin{equation} \frac{d R}{d t} = \gamma I, \end{equation}

(7)

\begin{equation} \frac{d D}{d t} = \delta I, \end{equation}

(8)

where \(\beta\), \(\gamma\), and \(\delta\) are the rates of infection, recovery, and mortality, respectively.
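As a minimal sketch of how Equations (5)–(8) can be solved forward in time, the code below uses forward-Euler steps of one day with constant rates. The integration scheme, the step size, the convention N = S + I + R + D, and all numeric values are assumptions made for this example.

```python
import numpy as np

def sird_step(state, beta, gamma, delta, dt=1.0):
    """One forward-Euler step of Equations (5)-(8). state = (S, I, R, D)."""
    S, I, R, D = state
    N = S + I + R + D  # assumption: N is the total of the four compartments
    new_infections = beta * I * S / N
    dS = -new_infections
    dI = new_infections - gamma * I - delta * I
    dR = gamma * I
    dD = delta * I
    return np.array([S + dt * dS, I + dt * dI, R + dt * dR, D + dt * dD])

def simulate_sird(state0, beta, gamma, delta, days):
    """Daily trajectory of shape (days + 1, 4) starting from state0."""
    trajectory = [np.asarray(state0, dtype=float)]
    for _ in range(days):
        trajectory.append(sird_step(trajectory[-1], beta, gamma, delta))
    return np.array(trajectory)

# Hypothetical municipality with 10,000 inhabitants and 10 active cases.
trajectory = simulate_sird((9_990.0, 10.0, 0.0, 0.0),
                           beta=0.3, gamma=0.1, delta=0.01, days=50)
```

Because the four derivatives sum to zero, the total S + I + R + D stays constant along the trajectory, which is a convenient sanity check for any implementation.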

We computed a separate SIRD model for each municipality, \(\alpha\), with independent parameters \(\beta _\alpha\), \(\gamma _\alpha\), and \(\delta _\alpha\), for a total of 278 municipalities. Furthermore, as other authors have done [18], we removed the assumption of the basic SIRD model that the rates \(\beta\), \(\gamma\), and \(\delta\) are constant and depend only on the nature of the epidemic and the fixed spreading dynamics. Since, in practice, the behavior of an epidemic is sensitive to changes in human behavior and to other environmental variables, we used a generalized version of the SIRD model that uses time-varying rates, which are calculated from known pandemic data, at each municipality \(\alpha\), resulting in a series of values for each parameter:

\begin{equation} \beta _\alpha (t)=\frac{i_\alpha (t) }{I_\alpha (t)} , \end{equation}

(9)

\begin{equation} \gamma _\alpha (t)=\frac{r_\alpha (t)}{I_\alpha (t)} , \end{equation}

(10)

\begin{equation} \delta _\alpha (t)=\frac{d_\alpha (t)}{I_\alpha (t)} , \end{equation}

(11)

where \(i_\alpha (t)\) represents the new confirmed cases at time t, \(r_\alpha (t) = R_\alpha (t)-R_\alpha (t-1)\) the daily recovered, \(d_\alpha (t) = D_\alpha (t)-D_\alpha (t-1)\) the daily deaths, and \(I_\alpha (t)\) the total active cases at time t, given by

\begin{equation} I_\alpha (t_f)=\sum _{t=t_0}^{t_f} \left(i_\alpha (t)-[r_\alpha (t)+d_\alpha (t)]\right). \end{equation}

(12)

To avoid instability and to reflect the fact that the \(\beta\), \(\gamma\), and \(\delta\) rates vary slowly, once the parameters are calculated for the entire historical data, kernel smoothing is applied to reduce noise. We use the Nadaraya-Watson kernel estimator with a bandwidth of 10.
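The computation of the time-varying rates in Equations (9)–(11) and their smoothing can be sketched as follows. The Gaussian kernel, the interpretation of the bandwidth as a number of days, and the convention of returning a rate of zero on days without active cases are assumptions of this example; the text does not specify them.

```python
import numpy as np

def time_varying_rates(new_cases, recovered, deaths, active):
    """Equations (9)-(11): daily counts divided by the active cases I(t).

    Days without active cases yield a rate of 0 (assumption).
    """
    active = np.asarray(active, dtype=float)
    has_cases = active > 0
    safe_active = np.where(has_cases, active, 1.0)

    def rate(numerator):
        values = np.asarray(numerator, dtype=float) / safe_active
        return np.where(has_cases, values, 0.0)

    return rate(new_cases), rate(recovered), rate(deaths)

def nadaraya_watson(series, bandwidth=10.0):
    """Nadaraya-Watson smoother over the time index (Gaussian kernel assumed)."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    # weights[a, b]: influence of day b on the smoothed value at day a.
    weights = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (weights @ series) / weights.sum(axis=1)

# Hypothetical noisy infection-rate series around 0.2.
rng = np.random.default_rng(0)
noisy_beta = 0.2 + 0.05 * rng.standard_normal(200)
smooth_beta = nadaraya_watson(noisy_beta, bandwidth=10.0)
```

Each smoothed value is a weighted average of its neighbors in time, so a constant series is left unchanged while day-to-day noise is attenuated.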

4.3.2 Forecasting.

Each Portuguese municipality \(\alpha\) is treated as an independent region, modeled by its own set of SIRD equations, which can be used to forecast the evolution of the cases. In order to predict the future number of infections for each municipality, we solve the SIRD model forward in time using the last day of historical data as the initial conditions for S, I, \(R,\) and D. We obtain the values for \(\beta _\alpha (t)\), \(\gamma _\alpha (t)\), and \(\delta _\alpha (t)\) to use during the forecast period based on the historical smoothed parameter series. This can be done in three ways:

(1)

Using their last known value;

(2)

using an average of the last n days;

(3)

estimating via extrapolation.
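The three options above can be sketched with a single helper that projects a smoothed parameter series over the forecast horizon. The function name, the default window of n = 7 days, and the use of a clipped linear trend as the extrapolation method are assumptions made for this example.

```python
import numpy as np

def project_parameter(series, horizon, method="last", n=7):
    """Project a smoothed rate series `horizon` days into the future."""
    series = np.asarray(series, dtype=float)
    tail = series[-n:]
    if method == "last":
        # Option (1): keep the last known value.
        return np.full(horizon, series[-1])
    if method == "mean":
        # Option (2): average of the last n days.
        return np.full(horizon, tail.mean())
    if method == "linear":
        # Option (3): linear trend fitted to the last n days (assumed form
        # of extrapolation), clipped at zero because rates are non-negative.
        x = np.arange(len(tail))
        slope, intercept = np.polyfit(x, tail, 1)
        future = np.arange(len(tail), len(tail) + horizon)
        return np.clip(intercept + slope * future, 0.0, None)
    raise ValueError(f"unknown method: {method}")

# Hypothetical smoothed beta series with a steady upward trend.
beta_history = [0.1, 0.2, 0.3, 0.4]
beta_last = project_parameter(beta_history, horizon=2, method="last")
beta_mean = project_parameter(beta_history, horizon=2, method="mean", n=2)
beta_trend = project_parameter(beta_history, horizon=2, method="linear", n=4)
```

The first two options hold the rate constant during the forecast, while extrapolation lets it continue to drift, which matters more as the prediction horizon grows.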

4.3.3 Pseudo-count Component.

Equations (9)–(11) compute all parameters relative to the number of active cases. This can lead to severe fluctuations of parameter values when the population of the municipality is small since the variance of the parameters will be high. We mitigate this by introducing a pseudo-count component that forces parameter values to consider not only regional but national data as well—therefore avoiding unfounded radical shifts.

Let \(\beta _\alpha (t)\), \(\gamma _\alpha (t)\), and \(\delta _\alpha (t)\) be the parameters for municipality \(\alpha\) at time t, and \(\beta _P(t)\), \(\gamma _P(t)\), \(\delta _P(t)\) the values for the rates calculated on the national level. We can now calculate \(\beta ^{*}_\alpha (t)\) as

\begin{equation} \beta ^{*}_\alpha (t) = \frac{K \beta _P(t) + n_\alpha \beta _\alpha (t)}{K+n_\alpha }, \end{equation}

(13)

where K is the pseudo-count. If \(K=0\), then the national parameters are ignored. As K increases, so does the influence of the national parameters, although this influence becomes smaller as the region's population size grows. The values for \(\gamma _\alpha ^{*}(t)\) and \(\delta _\alpha ^{*}(t)\) are computed in the same way. \(\beta _\alpha ^{*}(t)\), \(\gamma _\alpha ^{*}(t)\), and \(\delta _\alpha ^{*}(t)\) are then used in Equations (5)–(8) to model the evolution of the relevant variables, instead of using the rates computed directly from the historical time series.
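A worked instance of Equation (13) shows how the pseudo-count pulls the rate of a small municipality toward the national value more strongly than that of a large one. The function name and all numeric values are hypothetical.

```python
def pseudo_count_rate(regional, national, population, K):
    """Equation (13): population-weighted blend of regional and national rates."""
    return (K * national + population * regional) / (K + population)

# Hypothetical rates: regional beta = 0.4, national beta = 0.2, K = 10,000.
small_town = pseudo_count_rate(0.4, 0.2, population=1_000, K=10_000)
large_city = pseudo_count_rate(0.4, 0.2, population=1_000_000, K=10_000)
```

With these values the small municipality ends up close to the national rate, while the large one remains close to its own regional estimate.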

4.3.4 Approximating the Number of Recoveries and Deaths.

Privacy laws create a challenge in obtaining the number of deaths for each region daily. Furthermore, it is difficult to determine the exact recovery date for every single patient accurately. As a way of overcoming these limitations, we approximate the values for each region’s number of recoveries and deaths based on its population size, the mean recovery period, and the national mortality ratio. We used a mean recovery period of 14 days, which led to the best fit of the models to the real data.

First, we calculate the national mortality rate for each day t

\begin{equation} \mu (t) = \frac{d_{P}(t)}{I_{P}(t-w)}, \end{equation}

(14)

where t is the current day, w is the expected recovery time, \(d_{P}(t)\) is the number of deaths in the country on day t, and \(I_{P}(t - w)\) is the total number of infected people in the country at time \(t-w\).

Then, for each municipality \(\alpha\) and for each day t, we estimate the number of recoveries and deaths, as follows:

\begin{equation} d_\alpha (t) = \mu (t) I_\alpha (t-w), \end{equation}

(15)

\begin{equation} r_\alpha (t) = I_\alpha (t) - d_\alpha (t), \end{equation}

(16)

where \(r_\alpha (t)\), \(I_\alpha (t)\), \(d_\alpha (t)\) are the number of recoveries, total active cases and deaths at time t, respectively.
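Equations (14)–(16) can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function name and array layout are hypothetical, days with no lagged national infections are given a rate of NaN, and the first w days are left undefined because no lagged value exists for them.

```python
import numpy as np


def estimate_deaths_recoveries(infected_national, deaths_national, infected_local, w=14):
    """Approximate municipality deaths and recoveries from Equations (14)-(16).

    All inputs are 1-D daily series of equal length T, with T > w.
    """
    i_p = np.asarray(infected_national, dtype=float)
    d_p = np.asarray(deaths_national, dtype=float)
    i_a = np.asarray(infected_local, dtype=float)
    t_len = len(i_p)
    if not 0 < w < t_len:
        raise ValueError("w must satisfy 0 < w < number of days")

    # Equation (14): mu(t) = d_P(t) / I_P(t - w), defined for t >= w.
    mu = np.full(t_len, np.nan)
    lagged = i_p[:-w]
    mu[w:] = np.divide(d_p[w:], lagged, out=np.full(t_len - w, np.nan), where=lagged > 0)

    # Equation (15): d_alpha(t) = mu(t) * I_alpha(t - w).
    d_local = np.full(t_len, np.nan)
    d_local[w:] = mu[w:] * i_a[:-w]

    # Equation (16), implemented as printed: r_alpha(t) = I_alpha(t) - d_alpha(t).
    r_local = i_a - d_local
    return mu, d_local, r_local


# Toy example with a 2-day lag (the article uses w = 14).
mu, d_local, r_local = estimate_deaths_recoveries(
    [100, 100, 100, 100], [0, 0, 2, 4], [10, 20, 30, 40], w=2
)
```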

4.3.5 Computing the Geographical Incidence Rate using the SIRD Model.

By solving the SIRD equations forward in time, with the estimated parameters, smoothed using pseudo-counts, we obtain the values of \(I_\alpha (t)\) for each municipality \(\alpha\) and each time t. From \(I_\alpha (t)\) we can trivially compute the number of new cases, for each day, which makes it possible to compute the incidence rate \(z_\alpha (t)\) for each time t. This data, generated by the SIRD model, is then used instead of the real data, as input to the block-DSS algorithm, using the procedure described in Section 3 to compute the incidence rate in each cell u of the territory at time t, \(z_u(t)\).
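As an illustration of the last step, the sketch below computes a municipality incidence rate from a series of daily new cases, following the article's convention of reporting the number of new cases in the last 14 days per 100,000 inhabitants. How the new cases are derived from \(I_\alpha (t)\) is not reproduced here; the function name and inputs are hypothetical.

```python
import numpy as np


def incidence_rate_14d(new_cases, population, window=14):
    """Rolling sum of daily new cases over `window` days, per 100,000 inhabitants.

    Days without a full window of history are returned as NaN.
    """
    new_cases = np.asarray(new_cases, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(new_cases)))
    rolling = np.full(len(new_cases), np.nan)
    # Sum over days t-window+1 .. t, inclusive of day t.
    rolling[window - 1:] = csum[window:] - csum[:-window]
    return rolling * 100_000 / population


# Toy example: one new case per day in a municipality of 100,000 inhabitants.
z_alpha = incidence_rate_14d(np.ones(20), population=100_000)
```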

4.4 Spatio-temporal Convolutional Sequence-to-sequence Neural Networks

This approach uses recently proposed techniques for Spatio-temporal modeling using neural networks, resulting in a model derived from the STConvS2S architecture [28], coupled with new normalization-activation layers and learnable parameters intrinsic to each location.

The STConvS2S architecture is a sequence-to-sequence model comprised exclusively of convolutional layers, which are suited to capture spatial features by design (see Figure 5). Convolutions are performed with factorized 3D kernels \(K = t \times d \times d\), where t is the size of the temporal kernel and d is the size of the spatial kernel.

Fig. 5.

The first component is a temporal block that extracts temporal features through the temporal kernels (i.e., \(t \times 1 \times 1\)), while the second component is a spatial block that receives the output of the previous block in order to extract spatial features through the spatial kernels (i.e., \(1\, \times \, d\, \times \, d\)). Both the temporal and spatial blocks are comprised of successive convolutional blocks with normalization-activation operations, with the temporal block using causal convolutions to maintain temporal coherency during prediction (i.e., predictions for a time-step t make no use of future information from time-steps \(t + 1\) onward).
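The causal behavior of the temporal block can be illustrated with a small NumPy sketch of a single \(t \times 1 \times 1\) filter. This is not the article's implementation: it handles one channel and one filter, and it assumes causality is obtained by zero-padding the past side of the sequence, which is one common way to build causal convolutions.

```python
import numpy as np


def causal_temporal_conv(x, kernel):
    """Apply a t x 1 x 1 filter along time so that step i only sees steps <= i.

    x: array of shape (T, H, W); kernel: array of shape (t,).
    """
    t = len(kernel)
    # Pad t-1 zero frames before the sequence, none after it.
    padded = np.concatenate([np.zeros((t - 1,) + x.shape[1:]), x], axis=0)
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        window = padded[i:i + t]  # original time-steps i-t+1 .. i
        out[i] = np.tensordot(kernel, window, axes=(0, 0))
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3, 3))
kernel = np.array([0.2, 0.3, 0.5])
y = causal_temporal_conv(x, kernel)

# Changing only time-steps 4 and 5 must leave outputs 0..3 untouched.
x_future = x.copy()
x_future[4:] += 1.0
y_future = causal_temporal_conv(x_future, kernel)
```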

The feature maps generated throughout the architecture are padded so as to maintain the initial dimensions, and the number of intermediate feature maps is doubled in each layer, with the last convolutional layer in both the spatial and temporal blocks reducing the number back to the initial amount.

Instead of the traditional batch normalization and activation function operations following each convolution, we apply an extension of the normalization-activation layers recently proposed [31]. The chosen operation is an adapted version of the EvoNormB0 layer, which was the best performing batch-based version reported, and explicitly models Spatio-temporal scenarios, considering sequences of two-dimensional inputs when calculating both the batch and instance variance (i.e., the values associated with each time-step in the input sequence are considered separately when computing the variance). Let \(v_1\), \(\theta\), and \(\psi\) denote learnable parameter vectors, and let \(s_{b,d,w,h}^2\) and \(s_{d,w,h}^2\) represent the variance of a mini-batch and the variance of a single instance, respectively. This extension, denoted here as B03D, is defined as follows:

\begin{equation} \mathrm{B03D} = \dfrac{x}{\mathrm{max}\left(\sqrt {s_{b,d,w,h}^2\left(x\right) + \epsilon }, v_1 \cdot x + \sqrt {s_{d,w,h}^2\left(x\right) + \epsilon }\right)}\cdot \theta + \psi . \end{equation}

(17)
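A forward pass of Equation (17) can be sketched in NumPy as shown below. The tensor layout (batch, channel, depth, height, width) and the reduction axes are assumptions read off the subscripts of \(s_{b,d,w,h}\) and \(s_{d,w,h}\); the article's exact handling of individual time-steps may differ, and in a trained model `v1`, `theta`, and `psi` would be learned per channel rather than fixed.

```python
import numpy as np


def b03d(x, v1, theta, psi, eps=1e-5):
    """Forward pass of Equation (17) for x of shape (B, C, D, H, W).

    v1, theta, psi have shape (1, C, 1, 1, 1), one value per channel.
    """
    # Assumed reduction axes: batch variance over (b, d, h, w),
    # instance variance over (d, h, w), both kept per channel.
    batch_var = x.var(axis=(0, 2, 3, 4), keepdims=True)
    inst_var = x.var(axis=(2, 3, 4), keepdims=True)
    denom = np.maximum(np.sqrt(batch_var + eps), v1 * x + np.sqrt(inst_var + eps))
    return x / denom * theta + psi


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 2, 3, 4, 4))
param_shape = (1, 2, 1, 1, 1)
out = b03d(x, v1=np.zeros(param_shape), theta=np.ones(param_shape), psi=np.zeros(param_shape))
```

With \(v_1 = 0\), \(\theta = 1\), \(\psi = 0\) and a batch of one instance, the two variances coincide and the layer reduces to dividing each channel by its standard deviation.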

Another technique used in our model was the extension of the input data with learnable local features and weights, intrinsic to each spatial location, as originally proposed for the area of wind forecasting, where changes in location imply changes in the model parameters [32]. This technique seeks to improve Spatio-temporal forecasting scenarios by combining both (i) global location-invariant features, and (ii) location-specific features.

Typical CNNs are mostly translation invariant. Translating an input x and convolving with a filter k will yield the same result as translating the feature map resulting from a convolution between x and k. Therefore, standard CNNs treat each spatial location equally and thus learn global patterns. This is often insufficient in many Spatio-temporal forecasting scenarios, where the behavior in specific locations should be guided by their own intrinsic local features.
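The property described above, that translating and then convolving gives the same result as convolving and then translating (often called translation equivariance), can be checked numerically. The sketch below uses a circular (wrap-around) correlation so that the identity holds exactly; with zero padding it holds only away from the borders. The helper name is hypothetical.

```python
import numpy as np


def circular_correlate(x, k):
    """2-D cross-correlation with wrap-around borders: out[a, b] = sum k[i, j] * x[a+i, b+j]."""
    out = np.zeros(x.shape, dtype=float)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * np.roll(x, shift=(-i, -j), axis=(0, 1))
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
k = rng.normal(size=(3, 3))
shift = (2, 1)

translate_then_filter = circular_correlate(np.roll(x, shift, axis=(0, 1)), k)
filter_then_translate = np.roll(circular_correlate(x, k), shift, axis=(0, 1))
```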

To allow the learning of such features, we used two complementary techniques, Learnable Inputs (LI) and Local Weights (LW). The LIs correspond to a set of n trainable parameters with the same spatial dimensions as the original input, concatenated with the input before processing with a convolutional layer. These parameters allow the network to learn local features regarding every spatial location, which can complement the learned global patterns or be ignored by the convolutional kernels in situations where they are not beneficial. The LWs further reinforce the individualized learning of different spatial locations, through a locally-connected layer (i.e., a convolutional layer with a different filter at each input region, allowing different spatial locations to be weighed according to their relevance) of weights over the input, resulting in m trainable weights with the same spatial dimensions as the input. Similar to LIs, these weights are afterward concatenated with the original input. In our experiments, both the LIs and LWs are concatenated with the inputs on the channel dimension (sharing the LIs across the different input time-steps and using different LWs in each time-step), prior to every convolutional layer. LIs are implemented with a locally connected layer that receives as input a constant unitary tensor with the same spatial dimensions as the original input, and the LWs are implemented with a separate locally connected layer that receives the original tensor as input.
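The way LIs and LWs extend the input can be sketched with plain NumPy arrays. This is a shape-level illustration only: it assumes a single-channel input, the element-wise LW variant (filter size \(1 \times 1 \times 1\)), and ordinary arrays standing in for what would be trainable parameters produced by locally connected layers. All names are hypothetical.

```python
import numpy as np


def extend_with_local_features(x, learnable_inputs, local_weights):
    """Concatenate LIs and LW-weighted copies of the input on the channel axis.

    x:                (1, T, H, W) single-channel input sequence
    learnable_inputs: (n, H, W)    shared across all time-steps
    local_weights:    (m, T, H, W) one weight per cell and time-step
    """
    _, t, h, w = x.shape
    n = learnable_inputs.shape[0]
    # The same n maps are repeated at every time-step.
    li = np.broadcast_to(learnable_inputs[:, None, :, :], (n, t, h, w))
    # Element-wise weighing of the input, one result per weight map.
    lw = local_weights * x
    return np.concatenate([x, li, lw], axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 7, 4, 5))
li_params = rng.normal(size=(2, 4, 5))   # n = 2
lw_params = np.ones((2, 7, 4, 5))        # m = 2
extended = extend_with_local_features(x, li_params, lw_params)
```

The resulting tensor has \(1 + n + m\) channels and would be passed to the next convolutional layer in place of the original input.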

5 Experimental Comparison of Modeling Accuracy

We performed an empirical comparison of the four methods described in Section 4 by using them to predict the incidence rate 7, 10, and 14 days ahead (corresponding to time-step \(t+T\), with \(T=7\), \(T=10\), and \(T=14\)), using only past data up to time-step t. The methods were used to predict this variable from September 7th, 2020, until February 28th, 2021. Each model was initially trained with data from the first six months of the pandemic (March 1st, 2020–August 31st, 2020) and, as the prediction moved forward, was allowed to use existing data up to T days before the day being predicted. During the periods used for training and testing (twelve months) Portugal faced three major waves of the infection. The prediction period included the two largest waves, which reached incidence rates never seen during the training period. Figure 6 shows the 14-day incidence rates, per 100,000 inhabitants, for the whole country, during the first year of the pandemic.

Fig. 6.

5.1 Model Training and Parameterization

The ARMA and VAR baseline models used as input data the entire past sequence for each cell up to time t. They were then used to predict the T next days sequentially. We considered only the last value in the output sequence, corresponding to the time-step \(t+T\), and the process was repeated by sliding the window one day into the future and concatenating the ground-truth value corresponding to the time-step \(t+1\) with the existing data.

For the SIRD compartmental models, no training is necessary. As explained in Section 4.3.5, the time-varying parameters for each municipality were determined and used to solve the equations forward in time. For each parameter, we calculate its value for each day in the historical dataset and then use extrapolation to project its values forward T days. By solving the equations for each municipality \(\alpha\) we get a prediction of the future number of new cases for that municipality, \(z_{\alpha }\). We then applied the block-DSS algorithm to determine the incidence rate for each cell in the territory, as described in Section 4.3.5. The application of the block-DSS algorithm used the same spatial covariance matrix as the one used to build the gold standard dataset.

For the STConvS2S architecture, the model received as input a sequence of the previous T contiguous days, and generated a sequence corresponding to the next T days. The final prediction is thus given by considering only the last time-step in the output sequence. This process was repeated by sliding the input window one day into the future, as was done for the ARMA and VAR models. Furthermore, in order to incrementally update the parameters as new data becomes available, we apply a simple online learning procedure, fine-tuning the model with each new input sequence for 5 epochs, after the prediction was made. This enables the model to quickly adapt to sudden changes.
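The sliding-window evaluation described above can be sketched as follows. The predictor is a hypothetical placeholder (a persistence forecast that repeats the last observed map), not the STConvS2S model, and the online fine-tuning step is omitted; the sketch only shows which days are used as input and which output step is kept.

```python
import numpy as np


def persistence_predictor(window):
    """Placeholder model: repeat the last observed map for every output step."""
    return np.repeat(window[-1:], len(window), axis=0)


def rolling_forecast(series, horizon, predict_fn, first_target):
    """Predict day `target` from the `horizon` days ending `horizon` days earlier.

    series: array of shape (days, H, W). Returns {target_day: predicted map}.
    """
    if first_target < 2 * horizon - 1:
        raise ValueError("not enough history before first_target")
    predictions = {}
    for target in range(first_target, len(series)):
        t = target - horizon                          # last day whose data may be used
        window = series[t - horizon + 1 : t + 1]      # previous `horizon` contiguous days
        output_sequence = predict_fn(window)          # `horizon` predicted days
        predictions[target] = output_sequence[-1]     # keep only the step t + horizon
    return predictions


series = np.arange(30, dtype=float).reshape(30, 1, 1)
preds = rolling_forecast(series, horizon=7, predict_fn=persistence_predictor, first_target=14)
```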

Regarding hyperparameter choices, for the ARMA model, we consider \(p = 7\) and \(q = 1\). For the VAR model, we used \(p=4\). For the SIRD model, we used the pseudo-count parameters \(K=10,\!000\) in the prediction seven days ahead and \(K=100,\!000\) in the prediction 10 and 14 days ahead. These values were the ones that yielded the best results. As in the original proposal, the STConvS2S architecture used 3 convolutional layers for both the temporal and spatial block, plus the final convolution to reduce the channel dimensionality back to one. The initial number of convolutional filters was set to 32, and each filter was of dimensionality 5 \(\times\) 5. In order to select the best Learnable Inputs and Local Weights configuration, we varied the filter size k of the LWs between 1 \(\times\) 1 \(\times\) T, 2 \(\times\) 2 \(\times\) T, and 1 \(\times\) 1 \(\times\) 1 (i.e., in this last case, considering a direct element-wise weighing unique to each spatial location and time-step in the input sequence), where T corresponds to the number of time-steps in the input sequence. Furthermore, we varied the number of LIs and LWs \(n \in \lbrace 1, 2, 3\rbrace\). The parameters were finally set as k = 1 \(\times\) 1 \(\times\) 1 and \(n = 2\). Training relied on the AdaMod optimizer using a learning rate of \(10^{-3}\) and batch size of five. Furthermore, for the STConvS2S model, in order to fit in the GPU memory and to avoid losing information over individual regions, the same architecture was trained and tested on three different subsets of the input region, corresponding to the upper third, middle third, and lower third of Portugal. The final prediction results were then concatenated, resulting in a prediction encompassing all of mainland Portugal.

5.2 Experimental Results

The predictions made by each method were compared with the reference incidence rate values, obtained as described in Section 3. We computed two commonly used figures of merit in order to compare the predictions of the models with the reference data, the RMSE, and the sMAPE.

For a given day t, the root mean square error (RMSE) between the predicted value and the reference value, averaged over all cells, is given by

\begin{equation} RMSE = \sqrt {\frac{\sum _{u} (z_u(t) - \widehat{z_u}(t))^2}{N}}, \end{equation}

(18)

where \(z_u(t)\) is, as before, the incidence rate in cell u at time t, \(\widehat{z_u}(t)\) is the value predicted by the model, and N is the number of cells.

Since absolute values of the incidence rate vary greatly by location, it is also relevant to compute the second figure of merit, the symmetric mean absolute percentage error (sMAPE):

\begin{equation} sMAPE = \frac{1}{N} \sum _u \frac{2 |z_u(t) - \widehat{z_u}(t)|}{z_u(t)+\widehat{z_u}(t)}. \end{equation}

(19)
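Both figures of merit, Equations (18) and (19), can be computed for a single day with a few lines of NumPy. The function names are hypothetical, and the handling of cells where \(z_u(t)+\widehat{z_u}(t)=0\) (they contribute zero to the sMAPE) is an assumption, since the article does not state how such cells are treated.

```python
import numpy as np


def rmse(z, z_hat):
    """Root mean square error over all cells, Equation (18)."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    return float(np.sqrt(np.mean((z - z_hat) ** 2)))


def smape(z, z_hat):
    """Symmetric mean absolute percentage error over all cells, Equation (19)."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    denom = z + z_hat
    # Assumption: cells with a zero denominator contribute zero error.
    terms = np.divide(2.0 * np.abs(z - z_hat), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return float(terms.mean())


# Toy example with two cells.
reference = np.array([100.0, 200.0])
predicted = np.array([110.0, 180.0])
day_rmse = rmse(reference, predicted)
day_smape = smape(reference, predicted)
```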

Figures 7 and 8 show the evolution of these two figures of merit for the four methods used (day 0 corresponds to September 7th, 2020). The superior performance of the STConvS2S method is clear, when compared with both the baseline ARMA and VAR models, and the SIRD model. As always, the incidence rate is reported as the number of new cases in the last 14 days per 100,000 inhabitants, and the value of the RMSE uses the same scale.

Fig. 7.

Fig. 8.

Table 1 shows the values for the RMSE and the sMAPE of the predictions, averaged over all days in the period September 7th, 2020 to February 28th, 2021. The table includes the four methods studied and also a reference baseline, the ConvLSTM [23]. This table shows the clear superiority of the neural network-based Spatio-temporal convolution methods over the alternatives and, in particular, the superiority of the STConvS2S algorithm, not only in the quality of the predictions but in the stability of these forecasts as the prediction horizon is moved forward.

Table 1.

	7 days ahead (\(\mu \pm \sigma\))		10 days ahead (\(\mu \pm \sigma\))		14 days ahead (\(\mu \pm \sigma\))	
Method	RMSE	sMAPE	RMSE	sMAPE	RMSE	sMAPE
ARMA	\(210.7 \pm 152.6\)	\(0.653 \pm 0.345\)	\(419.4 \pm 668.2\)	\(0.817 \pm 0.365\)	\(3,\!798.3 \pm 18,\!534.6\)	\(0.995 \pm 0.363\)
VAR	\(162.6 \pm 139.7\)	\(0.519 \pm 0.281\)	\(230.9 \pm 210.3\)	\(0.652 \pm 0.305\)	\(317.6 \pm 307.0\)	\(0.799 \pm 0.328\)
SIRD	\(204.0 \pm 156.0\)	\(0.566 \pm 0.251\)	\(195.0 \pm 152.6\)	\(0.575 \pm 0.297\)	\(438.6 \pm 337.4\)	\(0.883 \pm 0.343\)
ConvLSTM	\(117.3 \pm 97.6\)	\(0.370 \pm 0.226\)	\(155.6 \pm 120.7\)	\(0.452 \pm 0.214\)	\(198.0 \pm 151.9\)	\(0.527 \pm 0.244\)
STConvS2S	\(89.4 \pm 67.1\)	\(0.342 \pm 0.252\)	\(87.5 \pm 69.9\)	\(0.331 \pm 0.250\)	\(98.6 \pm 72.3\)	\(0.393 \pm 0.296\)

Table 1. Average and Standard Deviation of RMSE and sMAPE, Computed for All Cells and All Days in the Period September 8th, 2020–February 28th, 2021, for Prediction 7, 10, and 14 Days Ahead

Figures 9 and 10 illustrate the quality of the predictions for the seventeenth week of the predicted period, the week after Christmas (December 28th, 2020–January 3rd, 2021). This was a particularly relevant week since it corresponded to a sharp inflection of the trend and the start of the third wave. The predictions were made with the data available until December 21st, 2020, for the first day in the week. The window was then adjusted by one day for each successive day. The results clearly show the superior predictive ability of the STConvS2S model which, in this particular week, only exhibited significant error in a sparsely populated region in the south of Portugal, Alentejo, while the other models exhibited a much more significant error in different parts of the country. Other weeks exhibit similar behavior, although there is significant variation in the model errors over time, as shown in Figures 7 and 8.

Fig. 9.

Fig. 10.

5.3 Analysis and Discussion

The results obtained in these tests have shown conclusively the superior predictive ability of the modified STConvS2S method, when compared with the other methods described in Section 4. We attribute this superiority to the ability of this model to take into consideration the incidence rate at neighboring cells, when predicting the evolution of that rate in a given cell. Although, conceptually, the VAR model could also use this information, it has no access to the position of each cell and, even though it beats the basic ARMA method, it does not reach the level of precision attained by the STConvS2S model. STConvS2S also performs significantly better than the baseline Spatio-temporal neural network ConvLSTM. We conjecture that the use of local weights and learnable inputs, coupled with the modified normalization-activation layers, made the STConvS2S more able to adapt to dynamics that change with the characteristics of the territory and, consequently, more precise.

Furthermore, the predictions of the STConvS2S and the ConvLSTM degrade more slowly as the prediction horizon is moved forward, while the predictions of the other methods, namely ARMA and SIRD, degrade rapidly as we ask the models to make predictions further into the future. These results also show that precise predictions for more than two weeks ahead, in this specific problem, are difficult to make, possibly because the dynamics of the phenomenon change in ways that are hard to learn from past data. Still, the Spatio-temporal neural networks (ConvLSTM and STConvS2S) are clearly the ones that degrade most gracefully, although at the cost of significant use of computational resources that, ultimately, make long-term predictions too error prone and too expensive.

The SIRD model suffers from the same limitations as the ARMA model, in that it cannot use information from the neighboring municipalities to infer the dynamics of the pandemic. In principle, this model should have been the one best tuned to the reference data, since it uses exactly the same method to infer cell level incidence rates from municipality level rates as the approach used to obtain the reference data: the Block-DSS algorithm. We attribute the lower precision of this method to two factors: the inherent limitations of the time-varying SIRD models when making predictions far into the future, due to the changing pandemic dynamics; and the inability of the SIRD model to take into consideration the geographical relations between municipalities. We conjecture that a modified SIRD model that uses information from neighboring municipalities may significantly improve these results.

The adherence between the predictions and the reference data obtained with the STConvS2S method depends on a set of assumptions, analyzed in Section 2.1, that underlie the kriging method used to generate the reference data from the official infection numbers. The models also assume that the number of detected cases is a reasonable and stable proxy for the actual number of infections in a region. We believe that the reference data, which we make publicly available as part of additional material, is relevant for the study of the evolution of the pandemic in Portugal and accurately reflects the underlying incidence rate (and, implicitly, infection risk). However, the methodology used to generate the reference data may be more or less precise in different geographies, where phenomena that violate one or more of the assumptions analyzed in Section 2.1 may occur more frequently.

6 Conclusions and Future Work

We computed an approximation to the Spatio-temporal incidence rate of COVID-19 in mainland Portugal, using a methodology based on the official municipality level information made available by the Portuguese DGS, post-processed by block-DSS. This dataset is made available with the publication of this article, and was used as a gold standard for the Spatio-temporal evolution of the COVID-19 incidence rates during the first year, in this specific geography.

We then used these data as a gold standard to test the behavior of four predictive models: an autoregressive-moving-average model, a vector autoregressive model, a model based on the combination of a SIRD compartmental model with block-DSS, and a model based on the STConvS2S methodology. We also included, in the comparison, the ConvLSTM baseline. We concluded that the modified STConvS2S neural network performed significantly better than the alternatives considered, and should therefore be considered the state-of-the-art for this problem. We invite researchers interested in applying Spatio-temporal prediction methods to use these data, and to compare the predictions with the ones obtained by the four methods reported in this study.

All the data used in this article, a significant set of additional results including the day-by-day predictions and maps of incidence rate, as well as the code for the SIRD and the STConvS2S models, are available at http://covid.vps.tecnico.ulisboa.pt/models. The baseline incidence rate data and the predictions of this model have been integrated in the interactive information site available at http://covid.vps.tecnico.ulisboa.pt.

Acknowledgments

The authors would like to thank André Peralta Santos and the Portuguese Directorate-General for Health, for making available the municipality level epidemiological data.

References

[1] Leonardo Azevedo, Maria João Pereira, Manuel C. Ribeiro, and Amílcar Soares. 2020. Geostatistical COVID-19 infection risk maps for Portugal. International Journal of Health Geographics 19, 1 (2020), 1–8.
