EpiLPS: a fast and flexible Bayesian tool for approximate real-time estimation of the time-varying reproduction number ================ Oswaldo Gressani

EpiLPS (Gressani et al. 2022), is
the acronym for **Epi**demiological modeling (tool) with
**L**aplacian-**P**-**S**plines.
It proposes a new Bayesian methodology for estimating the instantaneous
reproduction number \(R_t\), i.e. the
average number of secondary cases generated by an infectious agent at
time \(t\)
(White et al., 2020) ; a key metric for assessing the
transmission dynamics of an infectious disease and a useful reference
for guiding interventions of governmental institutions in a public
health crisis. The EpiLPS project builds upon two strong pillars in the
statistical literature, namely Bayesian P-splines and Laplace
approximations, to deliver a fast and flexible methodology for inference
on \(R_t\). EpiLPS requires two
(external) inputs: (1) a time series of incidence counts and (2) a
serial or generation interval distribution.

The underlying model for smoothing incidence counts is based on the
Negative Binomial distribution to account for possible overdispersion in
the data. EpiLPS has a *two-phase engine* for estimating \(R_t\). First, Bayesian P-splines are used
to smooth the epidemic curve and to compute estimates of the mean
incidence count of the susceptible population at each day \(t\) of the outbreak. Second, in the
philosophy of Fraser (2007), the
renewal equation is used as a bridge to link the estimated mean
incidence and the reproduction number. As such, the second phase
delivers a closed-form expression of \(R_t\) as a function of the B-spline
coefficients and the serial interval distribution.

Another key strength of EpiLPS is that two different strategies can
be used to estimate \(R_t\). The first
approach called LPSMAP is completely sampling-free and fully relies on
Laplace approximations and *maximum a posteriori* computation of
model hyperparameters for estimation. Routines for Laplace
approximations and B-splines evaluations are typically the ones that are
computationally most intensive and are therefore coded in C++ and
integrated in R via the Rcpp
package, making them time irrelevant. The second approach is called
LPSMALA (Laplacian-P-splines with a Metropolis-adjusted Langevin
algorithm) and is fully stochastic. It samples the posterior of the
model by using Langevin diffusions in a Metropolis-within-Gibbs
framework. Of course, LPSMAP has a computational advantage over LPSMALA.
Thanks to the lightning fast implementation of Laplace approximations,
LPSMAP typically delivers estimate of \(R_t\) in a fraction of a second. The chart
below, summarizes the skeleton and mechanisms behind EpiLPS for
estimating \(R_t\).

As the EpiLPS package includes C++ code, Windows users will need to install Rtools to include the required compilers for a smooth experience. Rtools is free and can be downloaded from https://cran.r-project.org/bin/windows/Rtools/. To install the Github version of EpiLPS (with devtools) type the following lines in the R console:

```
install.packages("devtools")
::install_github("oswaldogressani/EpiLPS") devtools
```

The package can then be loaded as follows:

`library("EpiLPS")`

The EpiLPS package structure is fairly simple. It has three main routines and an S3 method for plots:

`epilps()`

The main routine for model fit.`plot.epilps()`

S3 method to plot an object of class`epilps`

.`episim()`

A routine to simulate epidemic data.`perfcheck()`

Checks the performance of`epilps()`

via simulations.

To simulate data with `episim()`

, a serial interval
distribution and a pattern for the true reproduction number curve has to
be specified. Six patterns are available for the moment. The data
generating process is based on Poisson counts or negative binomial
counts and the epidemic renewal equation for establishing the link
between the mean number of infections and the reproduction number. The
default duration of the simulated outbreak is 50 days but other choices
are possible. The code below simulates an epidemic according to pattern
4 and gives summarizing plots by setting the option
`plotsim = TRUE`

:

```
<- c(0.12, 0.28, 0.30, 0.25, 0.05) # Specify a serial interval distribution
si <- episim(serial_interval = si, Rpattern = 4, plotsim = TRUE) simepi
```

The simulated incidence count data can be accessed by typing:

`$y simepi`

```
## [1] 10 4 2 5 10 10 15 13 32 31 44 68 69 102 104 130 151 148 201
## [20] 175 196 197 198 172 183 169 121 129 96 93 73 58 53 41 29 26 9 10
## [39] 11 9 4 3 1 3 1 0 2 0 0 0
```

The `epilps()`

routine can be used to fit the epidemic
data. By default, the LPSMAP approach is used with 30 B-splines in the
interval \([1;50]\) and a second order
penalty. The `plot()`

routine on the
`epifit_LPSMAP`

object can be used to plot the estimated
reproduction number.

`<- epilps(incidence = simepi$y, serial_interval = si, tictoc = TRUE) epifit_LPSMAP `

```
## Inference method chosen: LPSMAP.
## CI for LPSMAP computed via lognormal posterior approx. of Rt.Total number of days: 50.
## Mean Rt discarding first 7 days: 0.951.
## Mean 95% CI of Rt discarding first 7 days: (0.816,1.149)
## Elapsed real time (wall clock time): 0.16 seconds.
```

`plot(epifit_LPSMAP)`

Several options can be specified in the `plot()`

routine.
For instance, graphical parameters such as `themetype`

and
`rtcol`

can be used to control the theme and color of the
fitted \(R_t\) curve. In addition, the
option `overlayEpiestim`

can be set to `TRUE`

to
overlay the estimated \(R_t\) curve
with the EpiEstim package of Cori et al.,
(2013) .

`plot(epifit_LPSMAP, themetype = "light", rtcol = "steelblue", overlayEpiestim = TRUE)`

The numerical values of the estimated \(R(t)\) at days \(t=8,\dots,14\) obtained with LPSMAP and the associated \(95\%\) credible interval can be obtained by typing:

`::kable(epifit_LPSMAP$epifit[8:14,2:4]) knitr`

R_estim | R95CI_low | R95CI_up | |
---|---|---|---|

8 | 2.075027 | 1.683394 | 2.557772 |

9 | 2.206379 | 1.838787 | 2.647456 |

10 | 2.261637 | 1.923711 | 2.658926 |

11 | 2.241810 | 1.952498 | 2.573991 |

12 | 2.158756 | 1.908214 | 2.442194 |

13 | 2.031592 | 1.823449 | 2.263495 |

14 | 1.887389 | 1.709116 | 2.084258 |

A smooth estimate of the epidemic curve can be obtained with the code
below. The option `epicol`

controls the color of the curve
and `incibars`

can be set to *TRUE* or *FALSE*
to show or not the bar plot of the incidence counts.

`plot(epifit_LPSMAP, plotout = "epicurve", themetype = "light", epicol = "orange", incibars = TRUE)`

To illustrate EpiLPS on real data, we work with the Covid19 R
Interface Data Hub https://covid19datahub.io/. Four countries are
considered (Luxembourg, Italy, Canada and Japan) and the reproduction
number is estimated with LPSMAP over the period April 2020 - October
2021 with a uniform serial interval over 5 days. For Japan, option
`overlayEpiestim`

is *TRUE* to compare the EpiLPS and
EpiEstim fits.

```
library("COVID19")
# Uniform serial interval over 5 days
<- c(0.2, 0.2, 0.2, 0.2, 0.2)
si
# Luxembourg
<- covid19(country = "LUX", level = 1, verbose = FALSE) Luxembourg
```

```
## Warning in require_bit64_if_needed(ans): Some columns are type 'integer64'
## but package bit64 is not installed. Those columns will print as strange
## looking floating point data. There is no need to reload the data. Simply
## install.packages('bit64') to obtain the integer64 print method and print the
## data again.
```

```
<- Luxembourg$date[75:649]
dateLUX <- Luxembourg$hosp[75:649]
inciLUX
# Italy
<- covid19(country = "ITA", level = 1, verbose = FALSE) Italy
```

```
## Warning in require_bit64_if_needed(ans): Some columns are type 'integer64'
## but package bit64 is not installed. Those columns will print as strange
## looking floating point data. There is no need to reload the data. Simply
## install.packages('bit64') to obtain the integer64 print method and print the
## data again.
```

```
<- Italy$date[42:616]
dateITA <- Italy$hosp[42:616]
inciITA
# Canada
<- covid19(country = "CAN", level = 1, verbose = FALSE) Canada
```

```
## Warning in require_bit64_if_needed(ans): Some columns are type 'integer64'
## but package bit64 is not installed. Those columns will print as strange
## looking floating point data. There is no need to reload the data. Simply
## install.packages('bit64') to obtain the integer64 print method and print the
## data again.
```

```
<- Canada$date[71:645]
dateCAN <- Canada$hosp[71:645]
inciCAN
# Japan
<- covid19(country = "JPN", level = 1, verbose = FALSE) Japan
```

```
## Warning in require_bit64_if_needed(ans): Some columns are type 'integer64'
## but package bit64 is not installed. Those columns will print as strange
## looking floating point data. There is no need to reload the data. Simply
## install.packages('bit64') to obtain the integer64 print method and print the
## data again.
```

```
<- Japan$date[75:649]
dateJPN <- Japan$hosp[75:649]
inciJPN
# Fit with EpiLPS
<- epilps(incidence = inciLUX, serial_interval = si, verbose = FALSE)
epiLUX <- epilps(incidence = inciITA, serial_interval = si, verbose = FALSE)
epiITA <- epilps(incidence = inciCAN, serial_interval = si, verbose = FALSE) epiCAN
```

```
## Warning in epilps(incidence = inciCAN, serial_interval = si, verbose =
## FALSE): Count data contains 3.65% of NA values that have been replaced (see
## documentation for details). A 'too high' percentage of NA values can bias the
## analysis, so care should be taken in interpreting the results in this case.
```

```
<- epilps(incidence = inciJPN, serial_interval = si, verbose = FALSE)
epiJPN
::grid.arrange(
gridExtraplot(epiLUX, dates = dateLUX, datelab = "3m", rtcol = "steelblue",
Rtitle = "Estimated R Luxembourg"),
plot(epiITA, dates = dateITA, datelab = "3m", rtcol = "chartreuse4",
Rtitle = "Estimated R Italy"),
plot(epiCAN, dates = dateCAN, datelab = "3m", rtcol = "brown2",
Rtitle = "Estimated R Canada"),
plot(epiJPN, dates = dateJPN, datelab = "3m", rtcol = "darkorchid1",
overlayEpiestim = TRUE, Rtitle = "Estimated R Japan"),
nrow = 2, ncol = 2)
```

To check the (statistical) performance of EpiLPS, the
`perfcheck()`

routine can be used to simulate epidemic
outbreaks under four different scenarios. Each scenario has a different
\(R_t\) curve to be compared with the
estimated trajectories fitted by EpiLPS. For comparative reasons, the
trajectories of EpiEstim (with a weekly sliding window) are also shown.
The code below simulates 25 epidemic outbreaks with a data generating
process following scenario 3 and a given serial interval distribution. A
seed can also be specified for reproducibility.

```
<- perfcheck(S = 20, method = "LPSMALA",
simexample serial_interval = c(0.2, 0.4, 0.2, 0.1, 0.1),
scenario = 3, ci_level = 0.95, seed = 1234,
dist = "negbin", overdisp = 1000,
themetype = "gray")
```

```
## Comparing LPSMALA vs EpiEstim in S=20 replications (epidemic T=50 days)
## Mean Bias on days 8-50:
## -- EpiLPS mean Bias: 0.00117
## -- EpiEstim mean Bias: -0.01485
## Mean MSE on days 8-50:
## -- EpiLPS mean MSE: 0.01053
## -- EpiEstim mean MSE: 0.07618
## Mean credible interval coverage on days 8-50 (nominal level: 95 %):
## -- EpiLPS mean coverage: 94.76744
## -- EpiEstim mean coverage: 30.11628
## -- EpiLPS mean CI width: 0.39
## -- EpiEstim mean CI width: 0.3
## Incidence of cases generated from a negative binomial distribution.
## Overdispersion parameter value: 1000.
## EpiEstim window size: 7 day(s).
## Average values computed over period t=8 to T=50.
## Penalty order for P-splines: 2.
## Parameters for the Gamma prior on dispersion parameter: a=10 b=10.
```

This is version 1.0.7 (2023-01-17) - “Thunderlight”.

This project is funded by the European Union’s Research and Innovation Action under the H2020 work programme, EpiPose (grant number 101003688).

EpiLPS: a fast and flexible Bayesian tool for estimation of the time-varying reproduction number. Copyright (C) 2021-2023 Oswaldo Gressani.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Gressani, O., Wallinga, J., Althaus, C. L., Hens, N. and Faes, C.
(2022). EpiLPS: A fast and flexible Bayesian tool for estimation of the
time-varying reproduction number. *PLoS Comput Biol*
**18**(10): e1010618. https://doi.org/10.1371/journal.pcbi.1010618

White, L. F., Moser, C. B., Thompson, R. N., & Pagano, M. (2021).
Statistical estimation of the reproductive number from case notification
data. *American Journal of Epidemiology*,
**190**(4), 611-620.

Fraser C (2007) Estimating Individual and Household Reproduction
Numbers in an Emerging Epidemic. *PLoS ONE*
**2**(8): e758. https://doi.org/10.1371/journal.pone.0000758

Cori, A., Ferguson, N.M., Fraser, C., Cauchemez, S. (2013) A new
framework and software to estimate time-varying reproduction numbers
during epidemics, *American Journal of Epidemiology*,
**178**(9), 1505–1512. https://doi.org/10.1093/aje/kwt133