Fit a Stan SEIR model — fit

This function fits a Stan SEIR model to one or more sets of COVID-19 case data. See project_seir() for making predictions/projections.

fit_seir(
  daily_cases,
  obs_model = c("NB2", "Poisson"),
  forecast_days = 0,
  time_increment = 0.25,
  samp_frac_fixed = NULL,
  samp_frac_type = c("fixed", "estimated", "rw", "segmented"),
  samp_frac_seg = NULL,
  f_seg = c(0, rep(1, nrow(daily_cases) + forecast_days - 1)),
  days_back = 45,
  R0_prior = c(log(2.6), 0.2),
  phi_prior = 1,
  f_prior = c(0.4, 0.2),
  e_prior = c(0.8, 0.05),
  samp_frac_prior = c(0.4, 0.2),
  start_decline_prior = c(log(15), 0.05),
  end_decline_prior = c(log(22), 0.05),
  use_ramp = TRUE,
  rw_sigma = 0.1,
  seed = 42,
  chains = 4,
  iter = 2000,
  N_pop = 5100000,
  pars = c(D = 5, k1 = 1/5, k2 = 1, q = 0.05, ud = 0.1, ur = 0.02, f0 = 1),
  i0_prior = c(log(8), 1),
  state_0 = c(E1_frac = 0.4, E2_frac = 0.1, I_frac = 0.5, Q_num = 0, R_num = 0,
    E1d_frac = 0.4, E2d_frac = 0.1, Id_frac = 0.5, Qd_num = 0, Rd_num = 0),
  save_state_predictions = FALSE,
  delay_scale = 9.85,
  delay_shape = 1.73,
  ode_control = c(1e-07, 1e-06, 1e+06),
  fit_type = c("NUTS", "VB", "optimizing"),
  init = c("prior_random", "optimizing"),
  init_list = NULL,
  X = NULL,
  ...
)

Arguments

daily_cases	Either a vector of daily new cases if fitting to a single data type or a matrix of case data if fitting to multiple data types. Each data type should be in its own column. Can have NA values (will be ignored in likelihood). A vector will be turned into a one column matrix.
obs_model	Type of observation model.
forecast_days	Number of days into the future to forecast. The model will run faster with fewer forecasted days. It is recommended to set this to 0 here and use `project_seir()` for projections.
time_increment	Time increment for ODEs and Weibull delay-model integration in units of days. Larger numbers will run faster, possibly at the expense of accuracy.
samp_frac_fixed	A vector (or matrix) of sampled fractions. Should be of dimensions: `nrow(daily_cases) + forecast_days` (rows) by `ncol(daily_cases` (columns). A vector will be turned into a one column matrix.
samp_frac_type	How to treat the sample fraction. Fixed, estimated, a constrained random walk, or segmented. Only applies to the first data type column. The other data types must always have a fixed sample fraction currently.
samp_frac_seg	A vector of sample fraction segment indexes of length `daily_cases` + `forecast_days`. Should start at 1. Applies if `samp_frac_type = "segmented"`.
f_seg	A vector of segment indexes of length `daily_cases` + `forecast_days`. The segment index values should start at 0 to represent the fixed "no social distancing" value `f0` from the `pars` argument. Doesn't come into play until after `end_decline` day.
days_back	Number of days to go back for the Weibull case-delay integration. Should be sufficiently large that the results do not change.
R0_prior	Lognormal log mean and SD for the R0 prior.
phi_prior	SD of `1/sqrt(phi) ~ Normal(0, SD)` prior, where NB2(mu, phi) and `Var(Y) = mu + mu^2 / phi`. See the Stan Prior Choice Recommendations. If of length 2 will be treated as lognormal prior on phi.
f_prior	Beta mean and SD for the `f` parameters. If S segments are used, should be a Sx2 matrix. If multiple `f` segments are used but only one mean and SD are specified, they will be repeated as needed.
e_prior	Beta mean and SD for the `e` (derived) parameter. `e` represents the fraction of people who are social distancing.
samp_frac_prior	`samp_frac` prior if `samp_frac_type` is "estimated" or "rw" or "segmented". In the case of the random walk, this specifies the initial state prior. The two values correspond to the mean and SD of a Beta distribution. Only applies to first time series.
start_decline_prior	Lognormal log mean and SD for the parameter representing the day that social distancing starts ramping in (`start_decline`).
end_decline_prior	Lognormal log mean and SD for the parameter representing the day that the social distancing ramp finishes being ramped in (`end_decline`).
use_ramp	Logical. Use the ramp?
rw_sigma	The fixed standard deviation on the optional `samp_frac` random walk.
seed	MCMC seed for `rstan::stan()`.
chains	Number of MCMC chains for `rstan::stan()`.
iter	MCMC iterations per chain for `rstan::stan()`.
N_pop	Number of people in population.
pars	A named numeric vector of fixed parameter values.
i0_prior	Infected people infected at initial point in time. Lognormal log mean and SD for the parameter. Note that the default is set up for BC. You may need to use a lower prior for other regions.
state_0	Initial state: a named numeric vector.
save_state_predictions	Include the state predictions? `y_hat` Will make the resulting model object much larger.
delay_scale	Weibull scale parameter for the delay in reporting. If there are multiple datatypes then this should be a vector of the same length as the data types.
delay_shape	Weibull shape parameter for the delay in reporting. Same format as for `delay_scale`.
ode_control	Control options for the Stan ODE solver. First is relative difference, then absolute difference, and then maximum iterations. These can likely be left as is.
fit_type	Stan sampling/fitting algorithm to use. NUTS = `rstan::sampling()` or `rstan::stan()`; VB = `rstan::vb()`; optimizing = `rstan::optimizing()`.
init	Initialization type. Draw randomly from the prior or try to use the MAP estimate? Can be overridden by the `init_list` argument.
init_list	An optional named list foreign initialization. Note that the `f_s` argument needs to be an array of appropriate length. See the example below.
X	An optional model matrix applied additively to log expected cases.
...	Other arguments to pass to `rstan::sampling()` / `rstan::stan()` / `rstan::vb()` / `rstan::optimizing()`.

Value

A named list object

Examples

# Example daily case data:
cases <- c(
  0, 0, 1, 3, 1, 8, 0, 6, 5, 0, 7, 7, 18, 9, 22, 38, 53, 45, 40,
  77, 76, 48, 67, 78, 42, 66, 67, 92, 16, 70, 43, 53, 55, 53, 29,
  26, 37, 25, 45, 34, 40, 35
)

# Example assumed sampling fractions of positive cases:
s1 <- c(rep(0.1, 13), rep(0.2, length(cases) - 13))

# To use parallel processing with multiple chains:
# options(mc.cores = parallel::detectCores() / 2)

# Only using 200 iterations and MAP (optimizing) fitting for example speed:
m <- fit_seir(
  cases,
  iter = 200,
  fit_type = "optimizing",
  samp_frac_fixed = s1
)
#> Finding the MAP estimate.
print(m)
#> MAP estimate:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          4.05          3.38         14.69         21.39          0.18 
#>        phi[1]             e 
#>          6.66          0.82 
#> Mean in constrained space of MVN samples:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          4.73          3.40         14.77         21.38          0.19 
#>        phi[1]             e 
#>          7.31          0.83 
#> SD in constrained space of MVN samples:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          3.08          0.27          0.71          1.00          0.09 
#>        phi[1]             e 
#>          2.51          0.04 
names(m)
#>  [1] "fit"                  "post"                 "phi_prior"           
#>  [4] "R0_prior"             "f_prior"              "obs_model"           
#>  [7] "e_prior_trans"        "start_decline_prior"  "end_decline_prior"   
#> [10] "samp_frac_fixed"      "state_0"              "daily_cases"         
#> [13] "days"                 "time"                 "last_day_obs"        
#> [16] "pars"                 "f2_prior_beta_shape1" "f2_prior_beta_shape2"
#> [19] "stan_data"            "days_back"            "opt"                 
#> [22] "fit_type"            
names(m$post) # from rstan::extract(m$fit)
#> [1] "i0"            "R0"            "start_decline" "end_decline"  
#> [5] "phi"           "ur"            "e"             "f_s"          

# Use tidybayes if you'd like:
# post_tidy <- tidybayes::spread_draws(m$fit, c(y_rep, mu)[day, data_type])

# Add hospitalization data and estimate 2 sample-fraction blocks
# for the reported cases:
samp_frac_seg <- c(rep(1, 13), rep(2, length(cases) - 13))

s2 <- rep(0.07, 42) # Assuming 7\% of positive individuals are hospitalized
hosp <- c(
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 11, 9, 4,
  7, 19, 23, 13, 11, 3, 13, 21, 14, 17, 29, 21, 19, 19, 10, 6,
  11, 8, 11, 8, 7, 6
)

# Only using 200 iterations and MAP (optimizing) fitting for example speed:
m2 <- fit_seir(
  daily_cases = cbind(cases, hosp),
  iter = 200,
  samp_frac_type = "segmented", # `samp_frac_type` affects the first data type only
  samp_frac_seg = samp_frac_seg,
  samp_frac_fixed = cbind(s1, s2), # s1 is ignored and could be anything
  delay_scale = c(9.8, 11.2),
  delay_shape = c(1.7, 1.9),
  fit_type = "optimizing"
)
#> Finding the MAP estimate.
#> Warning: first element used of 'length.out' argument
print(m2)
#> Warning: first element used of 'length.out' argument
#> MAP estimate:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          0.71          3.85         14.91         21.59          0.15 
#>        phi[1]             e 
#>          6.12          0.83 
#> Mean in constrained space of MVN samples:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          0.89          3.85         14.88         21.54          0.17 
#>        phi[1]             e 
#>          6.61          0.82 
#> SD in constrained space of MVN samples:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          0.65          0.31          0.74          1.16          0.09 
#>        phi[1]             e 
#>          2.22          0.05 

# Estimate a second f_s segment:
f_seg <- c(rep(0, 14), rep(1, 20), rep(2, length(cases) - 20 - 14))
f_prior <- matrix(c(0.4,0.6, rep(0.2,2)), ncol=2, nrow=2) # nrow corresponds to num f_s segments

# Only using 200 iterations and MAP (optimizing) fitting for example speed:
m3 <- fit_seir(
  cases,
  iter = 200,
  f_seg = f_seg,
  f_prior = f_prior,
  samp_frac_fixed = s1,
  fit_type = "optimizing"
)
#> Finding the MAP estimate.
print(m3)
#> MAP estimate:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          3.96          3.40         14.69         21.39          0.17 
#>        f_s[2]        phi[1]             e 
#>          0.71          6.88          0.82 
#> Mean in constrained space of MVN samples:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          4.63          3.39         14.71         21.35          0.18 
#>        f_s[2]        phi[1]             e 
#>          0.66          7.27          0.82 
#> SD in constrained space of MVN samples:
#>            i0            R0 start_decline   end_decline        f_s[1] 
#>          2.71          0.24          0.68          0.97          0.09 
#>        f_s[2]        phi[1]             e 
#>          0.21          2.64          0.05 

# Choose initial values as a named list:
m <- fit_seir(cases,
  fit_type = "optimizing",
  samp_frac_fixed = s1,
  init_list = list(
    R0 = 3, f_s = array(0.2), # note that `f_s` is an array of appropriate length
    i0 = 5, ur = 0.025, start_decline = 15, end_decline = 22)
)
#> Finding the MAP estimate.