This vignette discusses some recent updates in EpiModel
on working with attributes and summary statistics within the stochastic
network model class. This material is oriented towards custom models
with extension modules and functions. Further details are provided in
the “New Network Models with EpiModel” section of the EpiModel tutorials.
In network simulations with netsim, we store all data in
the Main List Object (referred to as dat). In this
vignette we will discuss with three types of data on
dat:
Attributes are characteristics of the nodes (e.g., persons)
in the model at the current time-step. By default, every node has a
unique_id and an active attribute
used to track each node, as well as three attributes used in the
epidemic modules: status for disease status,
entrTime for the time the node entered the population, and
exitTime for the time the node left the population. Other
attributes can be added of any name and value, like age,
but must be stored as a scalar values.
We work with attribute vectors that are all of the same size
of the number of nodes in the model. In the attribute vectors,
a node is identified by its Positional ID or
posit_id (i.e., the position in vector). The default
attribute unique_id created for each node gives a
unique identification number allowing us to refer to nodes no longer in
the model. Deaths or other forms of exit from the network disrupt the
positional ID but do not impact the unique ID.
The EpiModel::get_attr function will extract the vector
of a given attribute. In its simplest form, we can pull a
complete attribute vector from dat like so:
active <- EpiModel::get_attr(dat, "active")
The above call will pull from dat the
active attribute for all nodes.
active is a vector of size current number of
nodes. With values being either 1 or 0 depending on whether a node
is active or not.
Trying to extract an attribute that does not exist will
cause EpiModel::get_attr to throw an error.
In custom extension modules, we usually want to modify some of the attributes. Below is a minimal module that increments the age of all the nodes by 1.
aging_module <- function(dat, at) {
# Extract current attribute
age <- EpiModel::get_attr(dat, "age")
# Aging process
new_age <- age + 1
# Output updated attributes
dat <- EpiModel::set_attr(dat, "age", new_age)
return(dat)
}
Let’s break down this very simple yet perfectly valid module.
age attribute vector as we did in the
previous section.new_age incrementing all ages by
one.dat object with the
EpiModel::set_attr function.dat object.We can see that EpiModel::set_attr takes as
arguments:
dat object to update.new_age).When using EpiModel::set_attr, there are several things
to note:
dat object, it merely
returns a modified version of it to be assigned back to
dat. (Nicely, this does not cause performance issues due to
the way R handles shallow copies since version 3.1)new_age size is not equal to the number of nodes in
the network, the function will throw an error.The above example describes the recommended way to work with attribute:
dat object with the revised local
vectors.dat object at the end of the function.These functions have other arguments that are described in the
documentation: see
help("net-accessor", package = "EpiModel") for further
details.
The Attributes described above refer to the state of each node in the network at the current time-step. However, it is sometimes useful to track of what happened to nodes over time. Because saving the full history of the attributes would consume too much memory and is rarely necessary for full-scale research models, EpiModel offers a way to record specific attribute for specific nodes at different time-steps. These may be useful for diagnostic purposes (e.g., to see that a dynamic process is coded correctly) or for tracking a limited set of individual-level outcomes for further analysis.
The Attribute History is an efficient collection of recorded
attributes at different time-steps. EpiModel has functions to record
these elements and to access them in a convenient manner once the
netsim simulation is complete.
The following is a module that records the viral load of infected nodes every 10 time steps. We assume that this module is part of a model that defines and updates two attributes over time:
status with possible values being “infected” or
“susceptible”.viral_load as a continuous number for “infected” nodes
and NA otherwise.viral_load_logger_module <- function(dat, at) {
# Run every 10 time steps
if (at %% 10 == 0) {
# Attributes
status <- EpiModel::get_attr(dat, "status")
viral_load <- EpiModel::get_attr(dat, "viral_load")
infected <- which(status == "infected")
dat <- EpiModel::record_attr_history(
dat, at,
"viral_load",
infected,
viral_load[infected]
)
}
# Output
return(dat)
}
The steps in the code are as follows:
infected the posit_ids of the
infected nodes.viral_load of infected nodes at
time at under the label “viral_load”.dat object.EpiModel::record_attr_history takes five arguments:
dat object.at, the current time-step).posit_ids referring to the nodes of
interest (here infected).viral_load[infected])Note that EpiModel::record_attr_history requires a set
of posit_ids. Internally, the function will convert them to
unique_ids so the Attribute History will not be
affected by nodes entering or leaving the population over time.
When recording some attribute histories, one must ensure that we
record as many values as there are posit_ids. Otherwise,
the function will throw an error. It however possible to use only one
value for that attribute even if we record a value for multiple nodes.
This last situation actually uses less RAM. In any case, we would want
to record attribute histories sparingly as the storage can be large,
especially for open population models with many nodes that run over many
time steps.
The Attribute History is meant to be accessed once the
netsim simulation is complete. At that point, we can use
the EpiModel::get_attr_history function to access the
histories that we have recorded, like so:
sim <- netsim(est, param, init, control)
attr_history <- EpiModel::get_attr_history(sim)
The attr_history object is a list of
data.frames. One for each attribute history that was
recorded (we use this flexible list structure because each history may
be of different dimension). Assume that we were running two simulations,
using the module defined above and another one (not shown) recording
when a node switched from infected to recovered.
get_attr_history(sim)
# $viral_load
# sim step attribute uids values
# 1 1 10 viral_load 1001 2000
# 2 1 10 viral_load 1002 1878
# 3 1 20 viral_load 1001 1500
# 4 1 20 viral_load 1002 300
# 5 2 10 viral_load 401 2500
# 6 2 10 viral_load 402 1378
# 7 2 20 viral_load 401 1200
# 8 2 20 viral_load 402 100
# ...
#
# $status
# sim step attribute uids values
# 1 1 22 status 1001 infected
# 2 1 64 status 1002 infected
# 3 1 110 status 1001 recovered
# 4 1 220 status 1002 recovered
# 5 2 7 status 401 infected
# 6 2 15 status 402 infected
# 7 2 20 status 401 recovered
# 8 2 120 status 402 recovered
# ...
We would get a named list of two data.frame’s:
value column being the viral
loads.value column being whether the node
became “infected” or “recovered” at the given time-step.The next type of data stored in dat is called an
Epidemic Tracker. This refers to some summary information about
the population at a specific time step. This information is
created and stored for each time step; therefore
epidemic trackers are historical summary statistics.
One example of an Epidemic Trackers is num,
present in any model, which stores the size of the population at each
time step.
Inside a module, Epidemic Trackers are accessed and modified
with the functions EpiModel::get_epi and
EpiModel::set_epi. Below is an updated new version of our
aging_module above with the addition of epidemic
trackers.
aging_track_module <- function(dat, at) {
# Attributes
age <- EpiModel::get_attr(dat, "age")
# Aging process
new_age <- age + 1
# Calculate summary statistics
mean_age <- mean(new_age)
prev_mean_age <- EpiModel::get_epi(dat, at - 1, "mean_age")
age_change <- mean_age - prev_mean_age
# Update nodal attributes
dat <- EpiModel::set_attr(dat, "age", new_age)
# Update epidemic trackers
dat <- set_epi(dat, "mean_age", at, mean_age)
dat <- set_epi(dat, "age_change", at, age_change)
return(dat)
}
In this new module, in addition to incrementing the age of each node
by one, we also record two values as Epidemic Trackers: the
mean age of the population and the change in mean age compared to the
previous step (we could have calculated the latter after the simulation
was complete with mutate_epi, but here we do it on the fly
to demonstrate get_epi).
We extract the mean age at the previous time step using
EpiModel::get_epi and set the second argument as
at - 1. After all the calculations are complete, we store
mean_age and age_change in dat
using EpiModel::set_epi.
EpiModel::set_attr, dat is
not modified directly and need to be assigned back to itself. Also, the
value we store must be a scalar.Epidemic Trackers are usually the main quantities of
interest after a simulation has completed. We access them simply by
calling as.data.frame on a netsim object or by
using the plot or summary functions within by EpiModel. Note also that
you can perform derived summary statistic calculations after a
netsim call is complete with
EpiModel::mutate_epi.
It can be useful to create small Epidemic Trackers outside
of the modules and use them only when we need them. EpiModel will
automatically run the custom trackers passed to the
.tracker.list argument of control.net.
We call a tracker function a function that
takes dat as arguments and outputs a scalar value. Every
tracker function is run by EpiModel::netsim at
each time step.
epi_s_num <- function(dat) {
needed_attributes <- c("status")
output <- with(get_attr_list(dat, needed_attributes), {
sum(status == "s", na.rm = TRUE)
})
return(output)
}
The epi_s_num function defined above is a tracker
function. It calculates at each time step the number of
susceptible nodes in the network.
The next example is a tracker function that calculates the proportion of the population infected at each time step. Let’s look what each element does:
epi_prop_infected <- function(dat) {
# we need two attributes for our calculation: `status` and `active`
needed_attributes <- c("status", "active")
# we use `with` to simplify code
output <- with(EpiModel::get_attr_list(dat, needed_attributes), {
pop <- active == 1 # we only look at active nodes
cond <- status == "i" # which are infected
# how many are `infected` among the `active`
sum(cond & pop, na.rm = TRUE) / sum(pop, na.rm = TRUE)
})
return(output)
}
We recommend that you write your tracker functions using this format:
needed_attributes variable containing a vector
of attribute names: in this example: “status” and “active”.with and
EpiModel::get_attr_list(dat, needed_attributes) to work in
an environment with only the objects you need. This helps clarify what
the tracker does and simplify any debugging. We save the result of this
call into output.with expression is what will
be stored in output. This must be a scalar.return(output), what was calculated inside the
with construct.This functionality is enabled by passing the tracker
functions as a named list in control.net:
.tracker.list. Let’s make a simple SI epidemic model with
some added trackers:
# Create the `tracker.list` list
some.trackers <- list(
prop_infected = epi_prop_infected,
s_num = epi_s_num
)
control <- EpiModel::control.net(
type = "SI",
nsims = 1,
nsteps = 50,
verbose = FALSE,
.tracker.list = some.trackers
)
param <- EpiModel::param.net(
inf.prob = 0.3,
act.rate = 0.1
)
nw <- network_initialize(n = 50)
nw <- set_vertex_attribute(nw, "race", rbinom(50, 1, 0.5))
est <- EpiModel::netest(
nw,
formation = ~edges,
target.stats = 25,
coef.diss = dissolution_coefs(~offset(edges), 10, 0),
verbose = FALSE
)
#> Starting maximum pseudolikelihood estimation (MPLE):
#> Evaluating the predictor and response matrix.
#> Maximizing the pseudolikelihood.
#> Finished MPLE.
#> Stopping at the initial estimate.
init <- EpiModel::init.net(i.num = 10)
sim <- EpiModel::netsim(est, param, init, control)
d <- as.data.frame(sim)
knitr::kable(tail(d, n = 15))
| sim | time | s.num | i.num | num | si.flow | prop_infected | s_num | |
|---|---|---|---|---|---|---|---|---|
| 36 | 1 | 36 | 25 | 25 | 50 | 1 | 0.50 | 25 |
| 37 | 1 | 37 | 25 | 25 | 50 | 0 | 0.50 | 25 |
| 38 | 1 | 38 | 23 | 27 | 50 | 2 | 0.54 | 23 |
| 39 | 1 | 39 | 22 | 28 | 50 | 1 | 0.56 | 22 |
| 40 | 1 | 40 | 22 | 28 | 50 | 0 | 0.56 | 22 |
| 41 | 1 | 41 | 21 | 29 | 50 | 1 | 0.58 | 21 |
| 42 | 1 | 42 | 20 | 30 | 50 | 1 | 0.60 | 20 |
| 43 | 1 | 43 | 20 | 30 | 50 | 0 | 0.60 | 20 |
| 44 | 1 | 44 | 20 | 30 | 50 | 0 | 0.60 | 20 |
| 45 | 1 | 45 | 19 | 31 | 50 | 1 | 0.62 | 19 |
| 46 | 1 | 46 | 19 | 31 | 50 | 0 | 0.62 | 19 |
| 47 | 1 | 47 | 19 | 31 | 50 | 0 | 0.62 | 19 |
| 48 | 1 | 48 | 19 | 31 | 50 | 0 | 0.62 | 19 |
| 49 | 1 | 49 | 18 | 32 | 50 | 1 | 0.64 | 18 |
| 50 | 1 | 50 | 18 | 32 | 50 | 0 | 0.64 | 18 |
Each function must be named in the .tracker.list list.
The name given there will be used to identify the tracker
function in the epi list and will be the name of the
corresponding column of the data.frame produced by
as.data.frame(sim) where sim is a
netsim object.
Note: in the some.trackers list, we put
epi_prop_infected and epi_s_num without
parentheses at the end. This is because we store the
function and not the result of calling the function.
Important: The trackers are Always run at the end of a simulation step. The value outputted reflect the state of the simulation after all the modules performed their tasks.
When working with complex research-level models, we sometimes want to
inspect the state of an object without stopping the simulation. The
function EpiModel::record_raw_object allows the user to
save any object during the simulation.
introspect_module <- function(dat, at) {
# Attributes
age <- get_attr(dat, "age")
if (mean(age, na.rm = TRUE) > 50) {
obj <- data.frame(
age = age,
status = EpiModel::get_attr(dat, "status")
)
dat <- EpiModel::record_raw_object(dat, at, "old pop", obj)
}
return(dat)
}
In this module, we look at the age of the population and if the mean
age is more than 50, we create a data.frame called
obj containing the age and status
attribute vectors and store it in a Raw Record.
This Raw Record can be accessed in the final
netsim object for debugging purposes.
sim <- netsim(est, param, init, control)
sim[["raw.records"]][[1]]