This function fits a Stan SEIR model to one or more sets of COVID-19 case data. See project_seir() for making predictions/projections.

fit_seir(
  daily_cases,
  obs_model = c("NB2", "Poisson"),
  forecast_days = 0,
  time_increment = 0.25,
  samp_frac_fixed = NULL,
  samp_frac_type = c("fixed", "estimated", "rw", "segmented"),
  samp_frac_seg = NULL,
  f_seg = c(0, rep(1, nrow(daily_cases) + forecast_days - 1)),
  days_back = 45,
  R0_prior = c(log(2.6), 0.2),
  phi_prior = 1,
  f_prior = c(0.4, 0.2),
  e_prior = c(0.8, 0.05),
  samp_frac_prior = c(0.4, 0.2),
  start_decline_prior = c(log(15), 0.05),
  end_decline_prior = c(log(22), 0.05),
  use_ramp = TRUE,
  rw_sigma = 0.1,
  seed = 42,
  chains = 4,
  iter = 2000,
  N_pop = 5100000,
  pars = c(D = 5, k1 = 1/5, k2 = 1, q = 0.05, ud = 0.1, ur = 0.02, f0 = 1),
  i0_prior = c(log(8), 1),
  state_0 = c(E1_frac = 0.4, E2_frac = 0.1, I_frac = 0.5, Q_num = 0, R_num = 0,
    E1d_frac = 0.4, E2d_frac = 0.1, Id_frac = 0.5, Qd_num = 0, Rd_num = 0),
  save_state_predictions = FALSE,
  delay_scale = 9.85,
  delay_shape = 1.73,
  ode_control = c(1e-07, 1e-06, 1e+06),
  fit_type = c("NUTS", "VB", "optimizing"),
  init = c("prior_random", "optimizing"),
  init_list = NULL,
  X = NULL,
  ...
)

Arguments

daily_cases

Either a vector of daily new cases if fitting to a single data type or a matrix of case data if fitting to multiple data types. Each data type should be in its own column. Can have NA values (will be ignored in likelihood). A vector will be turned into a one column matrix.

obs_model

Type of observation model.

forecast_days

Number of days into the future to forecast. The model will run faster with fewer forecasted days. It is recommended to set this to 0 here and use project_seir() for projections.

time_increment

Time increment for ODEs and Weibull delay-model integration in units of days. Larger numbers will run faster, possibly at the expense of accuracy.

samp_frac_fixed

A vector (or matrix) of sampled fractions. Should be of dimensions: nrow(daily_cases) + forecast_days (rows) by ncol(daily_cases (columns). A vector will be turned into a one column matrix.

samp_frac_type

How to treat the sample fraction. Fixed, estimated, a constrained random walk, or segmented. Only applies to the first data type column. The other data types must always have a fixed sample fraction currently.

samp_frac_seg

A vector of sample fraction segment indexes of length daily_cases + forecast_days. Should start at 1. Applies if samp_frac_type = "segmented".

f_seg

A vector of segment indexes of length daily_cases + forecast_days. The segment index values should start at 0 to represent the fixed "no social distancing" value f0 from the pars argument. Doesn't come into play until after end_decline day.

days_back

Number of days to go back for the Weibull case-delay integration. Should be sufficiently large that the results do not change.

R0_prior

Lognormal log mean and SD for the R0 prior.

phi_prior

SD of 1/sqrt(phi) ~ Normal(0, SD) prior, where NB2(mu, phi) and Var(Y) = mu + mu^2 / phi. See the Stan Prior Choice Recommendations. If of length 2 will be treated as lognormal prior on phi.

f_prior

Beta mean and SD for the f parameters. If S segments are used, should be a Sx2 matrix. If multiple f segments are used but only one mean and SD are specified, they will be repeated as needed.

e_prior

Beta mean and SD for the e (derived) parameter. e represents the fraction of people who are social distancing.

samp_frac_prior

samp_frac prior if samp_frac_type is "estimated" or "rw" or "segmented". In the case of the random walk, this specifies the initial state prior. The two values correspond to the mean and SD of a Beta distribution. Only applies to first time series.

start_decline_prior

Lognormal log mean and SD for the parameter representing the day that social distancing starts ramping in (start_decline).

end_decline_prior

Lognormal log mean and SD for the parameter representing the day that the social distancing ramp finishes being ramped in (end_decline).

use_ramp

Logical. Use the ramp?

rw_sigma

The fixed standard deviation on the optional samp_frac random walk.

seed

MCMC seed for rstan::stan().

chains

Number of MCMC chains for rstan::stan().

iter

MCMC iterations per chain for rstan::stan().

N_pop

Number of people in population.

pars

A named numeric vector of fixed parameter values.

i0_prior

Infected people infected at initial point in time. Lognormal log mean and SD for the parameter. Note that the default is set up for BC. You may need to use a lower prior for other regions.

state_0

Initial state: a named numeric vector.

save_state_predictions

Include the state predictions? y_hat Will make the resulting model object much larger.

delay_scale

Weibull scale parameter for the delay in reporting. If there are multiple datatypes then this should be a vector of the same length as the data types.

delay_shape

Weibull shape parameter for the delay in reporting. Same format as for delay_scale.

ode_control

Control options for the Stan ODE solver. First is relative difference, then absolute difference, and then maximum iterations. These can likely be left as is.

fit_type

Stan sampling/fitting algorithm to use. NUTS = rstan::sampling() or rstan::stan(); VB = rstan::vb(); optimizing = rstan::optimizing().

init

Initialization type. Draw randomly from the prior or try to use the MAP estimate? Can be overridden by the init_list argument.

init_list

An optional named list foreign initialization. Note that the f_s argument needs to be an array of appropriate length. See the example below.

X

An optional model matrix applied additively to log expected cases.

...

Other arguments to pass to rstan::sampling() / rstan::stan() / rstan::vb() / rstan::optimizing().

Value

A named list object

Examples

# Example daily case data: cases <- c( 0, 0, 1, 3, 1, 8, 0, 6, 5, 0, 7, 7, 18, 9, 22, 38, 53, 45, 40, 77, 76, 48, 67, 78, 42, 66, 67, 92, 16, 70, 43, 53, 55, 53, 29, 26, 37, 25, 45, 34, 40, 35 ) # Example assumed sampling fractions of positive cases: s1 <- c(rep(0.1, 13), rep(0.2, length(cases) - 13)) # To use parallel processing with multiple chains: # options(mc.cores = parallel::detectCores() / 2) # Only using 200 iterations and MAP (optimizing) fitting for example speed: m <- fit_seir( cases, iter = 200, fit_type = "optimizing", samp_frac_fixed = s1 )
#> Finding the MAP estimate.
print(m)
#> MAP estimate: #> i0 R0 start_decline end_decline f_s[1] #> 4.05 3.38 14.69 21.39 0.18 #> phi[1] e #> 6.66 0.82 #> Mean in constrained space of MVN samples: #> i0 R0 start_decline end_decline f_s[1] #> 4.73 3.40 14.77 21.38 0.19 #> phi[1] e #> 7.31 0.83 #> SD in constrained space of MVN samples: #> i0 R0 start_decline end_decline f_s[1] #> 3.08 0.27 0.71 1.00 0.09 #> phi[1] e #> 2.51 0.04
names(m)
#> [1] "fit" "post" "phi_prior" #> [4] "R0_prior" "f_prior" "obs_model" #> [7] "e_prior_trans" "start_decline_prior" "end_decline_prior" #> [10] "samp_frac_fixed" "state_0" "daily_cases" #> [13] "days" "time" "last_day_obs" #> [16] "pars" "f2_prior_beta_shape1" "f2_prior_beta_shape2" #> [19] "stan_data" "days_back" "opt" #> [22] "fit_type"
names(m$post) # from rstan::extract(m$fit)
#> [1] "i0" "R0" "start_decline" "end_decline" #> [5] "phi" "ur" "e" "f_s"
# Use tidybayes if you'd like: # post_tidy <- tidybayes::spread_draws(m$fit, c(y_rep, mu)[day, data_type]) # Add hospitalization data and estimate 2 sample-fraction blocks # for the reported cases: samp_frac_seg <- c(rep(1, 13), rep(2, length(cases) - 13)) s2 <- rep(0.07, 42) # Assuming 7\% of positive individuals are hospitalized hosp <- c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 11, 9, 4, 7, 19, 23, 13, 11, 3, 13, 21, 14, 17, 29, 21, 19, 19, 10, 6, 11, 8, 11, 8, 7, 6 ) # Only using 200 iterations and MAP (optimizing) fitting for example speed: m2 <- fit_seir( daily_cases = cbind(cases, hosp), iter = 200, samp_frac_type = "segmented", # `samp_frac_type` affects the first data type only samp_frac_seg = samp_frac_seg, samp_frac_fixed = cbind(s1, s2), # s1 is ignored and could be anything delay_scale = c(9.8, 11.2), delay_shape = c(1.7, 1.9), fit_type = "optimizing" )
#> Finding the MAP estimate.
#> Warning: first element used of 'length.out' argument
print(m2)
#> Warning: first element used of 'length.out' argument
#> MAP estimate: #> i0 R0 start_decline end_decline f_s[1] #> 0.71 3.85 14.91 21.59 0.15 #> phi[1] e #> 6.12 0.83 #> Mean in constrained space of MVN samples: #> i0 R0 start_decline end_decline f_s[1] #> 0.89 3.85 14.88 21.54 0.17 #> phi[1] e #> 6.61 0.82 #> SD in constrained space of MVN samples: #> i0 R0 start_decline end_decline f_s[1] #> 0.65 0.31 0.74 1.16 0.09 #> phi[1] e #> 2.22 0.05
# Estimate a second f_s segment: f_seg <- c(rep(0, 14), rep(1, 20), rep(2, length(cases) - 20 - 14)) f_prior <- matrix(c(0.4,0.6, rep(0.2,2)), ncol=2, nrow=2) # nrow corresponds to num f_s segments # Only using 200 iterations and MAP (optimizing) fitting for example speed: m3 <- fit_seir( cases, iter = 200, f_seg = f_seg, f_prior = f_prior, samp_frac_fixed = s1, fit_type = "optimizing" )
#> Finding the MAP estimate.
print(m3)
#> MAP estimate: #> i0 R0 start_decline end_decline f_s[1] #> 3.96 3.40 14.69 21.39 0.17 #> f_s[2] phi[1] e #> 0.71 6.88 0.82 #> Mean in constrained space of MVN samples: #> i0 R0 start_decline end_decline f_s[1] #> 4.63 3.39 14.71 21.35 0.18 #> f_s[2] phi[1] e #> 0.66 7.27 0.82 #> SD in constrained space of MVN samples: #> i0 R0 start_decline end_decline f_s[1] #> 2.71 0.24 0.68 0.97 0.09 #> f_s[2] phi[1] e #> 0.21 2.64 0.05
# Choose initial values as a named list: m <- fit_seir(cases, fit_type = "optimizing", samp_frac_fixed = s1, init_list = list( R0 = 3, f_s = array(0.2), # note that `f_s` is an array of appropriate length i0 = 5, ur = 0.025, start_decline = 15, end_decline = 22) )
#> Finding the MAP estimate.