The MRTAnalysis
package now supports analysis of
mediated causal excursion effect of time-varying treatments,
time-varying mediators, and a continuous distal outcome
in micro-randomized trials (MRTs), using the function
mcee()
.
Distal outcomes are measured once at the end of the study (e.g., weight
loss, cognitive score), in contrast to proximal
outcomes which are repeatedly measured after each treatment
decision point. Time-varying mediators can often be the proximal
outcomes in MRTs.
Micro-randomized trials (MRTs) are experimental
designs to evaluate and optimize quickly-adapting interventions, where
each participant is randomized among treatment options many times
throughout the study. This vignette shows how to use the
mcee
(stands for “mediated causal excursion effect”)
function to assess how treatment effects on a distal outcome are
mediated through intermediate variables (e.g., the activity
prompt -> near-term activity -> long-term cardiovascular health
pathway).
This package implements estimation of the natural direct excursion effect (NDEE) and the natural indirect excursion effect (NIEE) in MRTs and related longitudinal designs:
The modeling of the direct and indirect effects is allowed to vary
with the decision point index, specified through the
argument time_varying_effect_form
. For example, one may
model NIEE and NDEE as constants over time (using ~1
) or as
quadratic functions of the decision point (using
~ dp + I(dp^2)
). The parameter that parameterizes NIEE is
denoted by \(\beta\) and the one for
NDEE is \(\alpha\). At present, the
software does not support mediation effects conditional on
baseline or other time-varying covariates. For full details of the
estimands, see Qian (2025).
For applied users: if analyzing an MRT, you typically know the
randomization probability, and need to provide a
control_formula_with_mediator
describing covariates and
mediators to include in nuisance models. By default, all nuisance
regressions are estimated with GLMs, but you can substitute other
machine learning methods.
For statistically inclined readers: the estimator is multiply robust and generalizes the semiparametric estimators of Tchetgen Tchetgen & Shpitser (2012) to longitudinal settings. It’s a two-stage estimation process, where the first stage fits five nuisance functions and the second stage estimates the parameterized NIEE and NDEE. The five nuisance functions are (see Qian (2025) for the precise definition):
The package provides three entry points depending on how much control you want:
mcee
: streamlined interface for MRTs
with known randomization probabilities. Recommended default for most MRT
analyses.mcee_general
: more flexible, with
explicit configuration of nuisance models. Useful for observational
studies or MRTs when experimenting with different learners.mcee_userfit_nuisance
: accepts
user-supplied nuisance predictions (from external ML fits).The remainder of this vignette will first focus on mcee
for a quick introduction, then show how to use the more flexible
wrappers.
You can install the MRTAnalysis package from CRAN:
# install.packages("MRTAnalysis")
We illustrate the most basic usage of mcee
on a small
simulated dataset. Note that the dataset is in long
format, one row per subject-by-decision point. The distal
outcome is repeated on each row as a constant within subject.
set.seed(123)
# Simulate a toy dataset
n <- 20
T_val <- 5
id <- rep(1:n, each = T_val)
dp <- rep(1:T_val, times = n)
A <- rbinom(n * T_val, 1, 0.5)
M <- rbinom(n * T_val, 1, plogis(-0.2 + 0.3 * A + 0.1 * dp))
Y <- ave(0.5 * A + 0.7 * M + 0.2 * dp + rnorm(n * T_val), id) # constant within id
dat <- data.frame(id, dp, A, M, Y)
# Minimal mcee call
fit <- mcee(
data = dat,
id = "id", dp = "dp",
outcome = "Y", treatment = "A", mediator = "M",
time_varying_effect_form = ~1, # constant-over-time NDEE and NIEE
control_formula_with_mediator = ~ dp + M, # nuisance adjustment
control_reg_method = "glm", # default method
rand_prob = 0.5, # known randomization prob
verbose = FALSE
)
summary(fit)
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", rand_prob = 0.5, time_varying_effect_form = ~1,
#> control_formula_with_mediator = ~dp + M, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 18
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.1704 -0.0824 0.4231 0.1203 1.4158 18 0.17
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.02591 -0.00193 0.05374 0.01325 1.95537 18 0.066 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The summary
output provides estimates of Natural Direct
Excursion Effect (NDEE; \(\alpha\)) and
Natural Indirect Excursion Effect (NIEE; \(\beta\)), with standard errors, 95%
confidence intervals, and p-values. The only row in the output
Natural Direct Excursion Effect (alpha)
and
Natural Indirect Excursion Effect (beta)
is named
Intercept
, indicating that these effects are modeled as
constants over time (like intercepts in the effect models).
In particular, the
estimated NDEE is 0.17 (95% CI: -0.08, 0.42; p-value: 0.17) and the
estimated NIEE is 0.025 (95% CI: -0.002, 0.054; p-value: 0.06). The
confidence intervals and p-values are based on t-quantiles.
This package ships a small example dataset,
data_time_varying_mediator_distal_outcome
. It is in
long format, one row per subject-by-decision point.
ID
: subject identifier (use the actual column name
present in your data)t_val
: decision point index (must be strictly
increasing within each subject)I
: availability (binary in {0,1}
); if
omitted, the analysis assumes all rows are available (which is not the
case for this data example)p_A
: randomization probability, i.e., probability of
\(A_{it} = 1\) at a given decision
point conditional on the history informationA
: binary treatment, coded in {0,1}
M
: mediator (continuous or binary are both
allowed)Y
: distal outcome — constant within
each subject (the same value repeated across the subject’s rows)X
, A_prev
, M_prev
, etc.)data(data_time_varying_mediator_distal_outcome)
dat <- data_time_varying_mediator_distal_outcome
dplyr::glimpse(dat)
#> Rows: 500
#> Columns: 16
#> $ id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, …
#> $ dp <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1…
#> $ X_prev <dbl> -0.5605, 0.0800, 2.2500, 1.4600, 0.0000, -1.7400, 0.4700, 0.050…
#> $ I_prev <int> 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, …
#> $ A_prev <int> 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, …
#> $ M_prev <dbl> 0.00, 0.13, -1.10, 0.79, 0.98, 2.18, 0.23, 1.62, 2.38, 1.09, 0.…
#> $ X <dbl> 0.08, 2.25, 1.46, 0.00, -1.74, 0.47, 0.05, 1.46, 1.96, -0.28, -…
#> $ I <int> 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, …
#> $ A <int> 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, …
#> $ M <dbl> 0.13, -1.10, 0.79, 0.98, 2.18, 0.23, 1.62, 2.38, 1.09, 0.15, 0.…
#> $ mu_X <dbl> -0.17, 0.05, 0.45, 0.60, 0.20, 0.11, 0.19, 0.54, 0.91, 0.81, -0…
#> $ p_I <dbl> 0.82, 0.89, 0.91, 0.78, 0.66, 0.67, 0.81, 0.76, 0.80, 0.75, 0.8…
#> $ p_A <dbl> 0.39, 0.45, 0.43, 0.47, 0.47, 0.68, 0.62, 0.76, 0.71, 0.64, 0.3…
#> $ mu_M <dbl> -0.66, -0.38, -0.64, -0.09, 0.64, 1.24, 1.29, 1.76, 1.24, 0.97,…
#> $ mu_Y <dbl> 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 1.4…
#> $ Y <dbl> 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 0.6…
dplyr::count(dat, id, name = "Ti") |>
dplyr::summarise(mean_T = mean(Ti), min_T = min(Ti), max_T = max(Ti))
#> mean_T min_T max_T
#> 1 10 10 10
# Delete some decision points for certain individuals to mimic scenarios
# where not all individuals have the same number of decision points.
dat <- dat[!((dat$id == 1 & dat$dp == 10) | (dat$id == 2 & dat$dp %in% c(9, 10))), ]
dplyr::count(dat, id, name = "Ti") |>
dplyr::summarise(mean_T = mean(Ti), min_T = min(Ti), max_T = max(Ti))
#> mean_T min_T max_T
#> 1 9.94 8 10
mcee
We focus first on the streamlined MRT workflow with known randomization probability. Throughout, we’ll use the included dataset.
set.seed(1)
fit1 <- mcee(
data = dat,
id = "id",
dp = "dp",
outcome = "Y",
treatment = "A",
mediator = "M",
availability = "I",
rand_prob = "p_A",
time_varying_effect_form = ~1, # NDEE and NIEE are constant over time
control_formula_with_mediator = ~ dp + M + X, # covariate adjustment
control_reg_method = "glm", # nuisance learners for q, eta, mu, nu
verbose = FALSE
)
summary(fit1)
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.180 -0.021 0.381 0.100 1.801 48 0.078 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.0421 -0.0334 0.1176 0.0376 1.1200 48 0.27
time_varying_effect_form
: defines the basis f(t) used
to model alpha (direct) and beta (indirect).control_formula_with_mediator
: adjustment set for
nuisance models; include the mediator here (the wrapper
will automatically remove it from the nuisance function models that
should not include it). This can include s()
terms if
control_reg_method = "gam"
.dp
)fit2 <- mcee(
data = dat,
id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
availability = "I", rand_prob = "p_A",
time_varying_effect_form = ~dp, # NDEE, NIEE vary linearly in time
control_formula_with_mediator = ~ dp + M + X,
control_reg_method = "glm",
verbose = FALSE
)
summary(fit2) # rows now labeled (Intercept) and dp
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.75643 0.17719 1.33568 0.28777 2.62863 46 0.012 *
#> dp -0.10528 -0.20272 -0.00784 0.04841 -2.17487 46 0.035 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.07360 -0.11968 0.26688 0.09602 0.76653 46 0.45
#> dp -0.00576 -0.03888 0.02736 0.01645 -0.35011 46 0.73
Interpretation: the rows for dp
report how the natural
direct/indirect excursion effects change per unit increase in decision
point.
Tip. If your dataset already contains a basis of time (e.g., spline columns), you can reference them in
time_varying_effect_form
. The package will warn when variables other thandp
appear there, but this is allowed for precomputed time bases.
You can switch the learner used for fitting the nuisance functions
via control_reg_method
:
"gam"
"rf"
"ranger"
"sl"
# Example: GAM (generalized additive model)
set.seed(2)
fit3 <- mcee(
data = dat,
id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
availability = "I", rand_prob = "p_A",
time_varying_effect_form = ~dp,
control_formula_with_mediator = ~ s(dp) + s(M) + s(X), # spline formula for mgcv::gam
control_reg_method = "gam",
verbose = FALSE
)
summary(fit3)
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp,
#> control_formula_with_mediator = ~s(dp) + s(M) + s(X), control_reg_method = "gam",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.7679 0.1921 1.3437 0.2861 2.6845 46 0.01 *
#> dp -0.1165 -0.2137 -0.0194 0.0483 -2.4145 46 0.02 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) -0.0305 -0.2225 0.1615 0.0954 -0.3196 46 0.75
#> dp -0.0034 -0.0352 0.0284 0.0158 -0.2150 46 0.83
If interested in estimating direct and indirect effects of
intervention at a specific decision point (or a few decision points),
you can use the specific_dp_only
argument. For example, to
estimate effects at the first two decision points (1 and 2) only, you
can either:
specific_dp_only = c(1, 2)
in mcee
,
orweight_per_row
that puts nonzero mass only at
those rows (the wrappers normalize weights within subject).fit4 <- mcee(
data = dat,
id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
availability = "I", rand_prob = "p_A",
time_varying_effect_form = ~1,
control_formula_with_mediator = ~ dp + M + X,
control_reg_method = "glm",
specific_dp_only = c(1, 2),
verbose = FALSE
)
summary(fit4)
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> specific_dp_only = c(1, 2), verbose = FALSE)
#>
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.712 0.119 1.305 0.295 2.415 48 0.02 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.0478 -0.2022 0.2979 0.1244 0.3847 48 0.7
The effect estimates are now an average over decision points 1 and 2.
In the next section of the vignette, we’ll show how to use the more
flexible wrappers, mcee_general
and
mcee_userfit_nuisance
.
Every wrapper in this package returns an object of class
mcee_fit
. This object is a list with several
components.
$mcee_fit
alpha_hat
, beta_hat
: estimated
coefficients for the natural direct excursion effect
(\(\alpha\)) and natural
indirect excursion effect (\(\beta\)).alpha_se
, beta_se
: standard errors.varcov
: estimated covariance matrix for both \(\alpha\) and \(\beta\) coefficients (ordered as \((\alpha^T, \beta^T)^T\)).alpha_varcov
, beta_varcov
: estimated
covariance matrices for \(\alpha\) and
\(\beta\) separately.fit1$mcee_fit
#> $alpha_hat
#> (Intercept)
#> 0.18
#>
#> $alpha_se
#> (Intercept)
#> 0.09999
#>
#> $beta_hat
#> (Intercept)
#> 0.04206
#>
#> $beta_se
#> (Intercept)
#> 0.03756
#>
#> $varcov
#> alpha_(Intercept) beta_(Intercept)
#> alpha_(Intercept) 9.998e-03 -4.016e-05
#> beta_(Intercept) -4.016e-05 1.411e-03
#>
#> $alpha_varcov
#> (Intercept)
#> (Intercept) 0.009998
#>
#> $beta_varcov
#> (Intercept)
#> (Intercept) 0.001411
The lincomb_joint
, lincomb_alpha
, and
lincomb_beta
arguments to summary()
lets you
compute linear combinations of the estimated coefficients. For
example:
fit1
)# difference between direct and indirect excursion effects
summary(fit1, lincomb_joint = c(1, -1))
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.180 -0.021 0.381 0.100 1.801 48 0.078 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.0421 -0.0334 0.1176 0.0376 1.1200 48 0.27
#>
#> Joint linear combinations (L * (alpha, beta)):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x theta 0.1380 -0.0775 0.3535 0.1072 1.2873 48 0.2
This requests the linear combination \(\alpha - \beta\), with standard error and t-statistic computed from the covariance matrix.
fit2
)Suppose you fit a model with a linear time basis, e.g.:
fit2 <- mcee(
data = dat,
id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
availability = "I", rand_prob = "p_A",
time_varying_effect_form = ~dp,
control_formula_with_mediator = ~ dp + M + X,
control_reg_method = "glm",
verbose = FALSE
)
To get the effect at decision point 10 (the last decision point), you need the intercept plus 9 times the slope:
summary(fit2, lincomb_alpha = c(1, 9), lincomb_beta = c(1, 9))
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.75643 0.17719 1.33568 0.28777 2.62863 46 0.012 *
#> dp -0.10528 -0.20272 -0.00784 0.04841 -2.17487 46 0.035 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.07360 -0.11968 0.26688 0.09602 0.76653 46 0.45
#> dp -0.00576 -0.03888 0.02736 0.01645 -0.35011 46 0.73
#>
#> Linear combinations of alpha (L * alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x alpha -0.191 -0.581 0.199 0.194 -0.986 46 0.33
#>
#> Linear combinations of beta (L * beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x beta 0.0218 -0.1202 0.1637 0.0705 0.3086 46 0.76
If you prefer a joint contrast (e.g., alpha(t=10) − beta(t=10)),
stack the two contrasts into a single row over
c(alpha, beta)
:
L_joint_t10 <- matrix(c(
1, 9, # alpha part
-1, -9 # beta part
), nrow = 1)
summary(fit2, lincomb_joint = L_joint_t10)
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.75643 0.17719 1.33568 0.28777 2.62863 46 0.012 *
#> dp -0.10528 -0.20272 -0.00784 0.04841 -2.17487 46 0.035 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.07360 -0.11968 0.26688 0.09602 0.76653 46 0.45
#> dp -0.00576 -0.03888 0.02736 0.01645 -0.35011 46 0.73
#>
#> Joint linear combinations (L * (alpha, beta)):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x theta1 -0.213 -0.651 0.225 0.218 -0.978 46 0.33
For diagnostics, the returned object also includes:
$nuisance_models
: the actual model objects used for
fitting each nuisance parameter (p
, q
,
eta
, mu
, nu
).$nuisance_fitted
: the fitted values for each nuisance
function (p1
, q1
, eta1
,
eta0
, mu1
, mu0
, nu1
,
nu0
).One can use summary(..., show_nuisance = TRUE)
to get a
compact summary of the nuisance fits.
summary(fit1, show_nuisance = TRUE)
#>
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A",
#> mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1,
#> control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm",
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.180 -0.021 0.381 0.100 1.801 48 0.078 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.0421 -0.0334 0.1176 0.0376 1.1200 48 0.27
#>
#> Stage-1 nuisance fits:
#> [p] known constant: multiple values (e.g., 0.39, 0.45, 0.43, ...)
#> [q] glm: ~ dp + M + X
#> [eta1] glm: ~ dp + X
#> [eta0] glm: ~ dp + X
#> [mu1] glm: ~ dp + M + X
#> [mu0] glm: ~ dp + M + X
#> [nu1] glm: ~ dp + X
#> [nu0] glm: ~ dp + X
#> Note: For full details, inspect `$nuisance_models` directly.
head(fit1$nuisance_fitted$mu1)
#> [1] 3.277 3.655 3.757 3.083 2.595 2.775
Tip. Because
varcov
is the joint covariance ofc(alpha, beta)
, you can build any custom Wald test by forming your own contrast matrixL
and usingL %*% c(alpha, beta)
with varianceL %*% varcov %*% t(L)
. This is equivalent to usinglincomb_joint = L
insummary()
.
mcee_general
and
mcee_userfit_nuisance
While mcee
covers the common MRT use case (known
randomization, one control formula for all nuisance functions), two
additional wrappers provide more flexibility in controlling how the
nuisance functions are fitted:
mcee_general
This interface lets you configure each nuisance function separately.
known
: supply a fixed vector/scalar (commonly for
treatment mechanism p
in MRT).formula
: regression formula, which can include
s()
terms if using method = "gam"
.method
: learner (glm
, gam
,
lm
, rf
, ranger
,
sl
).family
: optional; defaults inferred from nuisance type
(binomial
for p
and q
;
gaussian
for eta
, mu
,
nu
).Helper functions can be used to build these objects:
mcee_config_maker
, mcee_config_known
,
mcee_config_glm
, mcee_config_gam
,
mcee_config_lm
, mcee_config_rf
,
mcee_config_ranger
, mcee_config_sl
,
mcee_config_sl_user
.
# Families (binomial vs Gaussian) are chosen automatically when omitted; for `p` and `q` the default is binomial, for the outcome regressions it is Gaussian.
cfg <- list(
p = mcee_config_known("p", dat$p_A), # known randomization prob in MRT
q = mcee_config_glm("q", ~ dp + X + M),
eta = mcee_config_glm("eta", ~ dp + X),
mu = mcee_config_glm("mu", ~ dp + X + M),
nu = mcee_config_glm("nu", ~ dp + X)
)
fit_gen <- mcee_general(
data = dat,
id = "id", dp = "dp", outcome = "Y",
treatment = "A", mediator = "M",
availability = "I",
time_varying_effect_form = ~dp,
config_p = cfg$p, config_q = cfg$q,
config_eta = cfg$eta, config_mu = cfg$mu, config_nu = cfg$nu,
verbose = FALSE
)
summary(fit_gen)
#>
#> Call:
#> mcee_general(data = dat, id = "id", dp = "dp", outcome = "Y",
#> treatment = "A", mediator = "M", availability = "I", time_varying_effect_form = ~dp,
#> config_p = cfg$p, config_q = cfg$q, config_eta = cfg$eta,
#> config_mu = cfg$mu, config_nu = cfg$nu, verbose = FALSE)
#>
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.75643 0.17719 1.33568 0.28777 2.62863 46 0.012 *
#> dp -0.10528 -0.20272 -0.00784 0.04841 -2.17487 46 0.035 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.07360 -0.11968 0.26688 0.09602 0.76653 46 0.45
#> dp -0.00576 -0.03888 0.02736 0.01645 -0.35011 46 0.73
Tip. This interface is particularly useful for observational studies where the treatment mechanism
p
is unknown and must be estimated. You can setconfig_p
to a regression formula and method of your choice.
mcee_userfit_nuisance
Sometimes you fit nuisance models externally, perhaps with custom ML workflows. In that case, you can bypass Stage‑1 nuisance model fitting and supply predictions directly.
p1
, q1
,
eta1
, eta0
, mu1
,
mu0
, nu1
, nu0
, each aligned with
the rows of your data.
p1
: \(P(A_t = I_t |
H_t)\)q1
: \(P(A_t = I_t | H_t,
M_t)\)eta1
: \(E(Y | H_t, A_t =
I_t)\)eta0
: \(E(Y | H_t, A_t =
0)\)mu1
: \(E(Y | H_t, A_t = I_t,
M_t)\)mu0
: \(E(Y | H_t, A_t = 0,
M_t)\)nu1
: \(E[E(Y | H_t, A_t =
I_t, M_t) | H_t, A_t = 0]\)nu0
: \(E[E(Y | H_t, A_t = 0,
M_t) | H_t, A_t = I_t]\)p1 = 1
and q1 = 1
at I == 0
.# Fit nuisance regressions manually: p, q, eta, mu, nu
p1_hat <- dat$p_A # known randomization prob in MRT
p1_hat[dat$I == 0] <- 1 # manually set this to avoid wrapper message
q1_hat <- predict(glm(A ~ dp + X + M, family = binomial(), data = dat[dat$I == 1, ]),
type = "response", newdata = dat
)
q1_hat[dat$I == 0] <- 1 # manually set this to avoid wrapper message
eta1_hat <- predict(lm(Y ~ dp + X, data = dat[dat$A == dat$I, ]), newdata = dat)
eta0_hat <- predict(lm(Y ~ dp + X, data = dat[dat$A == 0, ]), newdata = dat)
mu1_hat <- predict(lm(Y ~ dp + X + M, data = dat[dat$A == dat$I, ]), newdata = dat)
mu0_hat <- predict(lm(Y ~ dp + X + M, data = dat[dat$A == 0, ]), newdata = dat)
nu1_hat <- predict(lm(mu1 ~ dp + X, data = cbind(dat, mu1 = mu1_hat)[dat$A == 0, ]), newdata = dat)
nu0_hat <- predict(lm(mu0 ~ dp + X, data = cbind(dat, mu0 = mu0_hat)[dat$A == dat$I, ]), newdata = dat)
fit_usr <- mcee_userfit_nuisance(
data = dat,
id = "id", dp = "dp", outcome = "Y",
treatment = "A", mediator = "M",
availability = "I",
time_varying_effect_form = ~dp,
p1 = p1_hat,
q1 = q1_hat,
eta1 = eta1_hat, eta0 = eta0_hat,
mu1 = mu1_hat, mu0 = mu0_hat,
nu1 = nu1_hat, nu0 = nu0_hat,
verbose = FALSE
)
summary(fit_usr)
#>
#> Call:
#> mcee_userfit_nuisance(data = dat, id = "id", dp = "dp", outcome = "Y",
#> treatment = "A", mediator = "M", availability = "I", time_varying_effect_form = ~dp,
#> p1 = p1_hat, q1 = q1_hat, eta1 = eta1_hat, eta0 = eta0_hat,
#> mu1 = mu1_hat, mu0 = mu0_hat, nu1 = nu1_hat, nu0 = nu0_hat,
#> verbose = FALSE)
#>
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#>
#> Natural Direct Excursion Effect (alpha):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.75643 0.17719 1.33568 0.28777 2.62863 46 0.012 *
#> dp -0.10528 -0.20272 -0.00784 0.04841 -2.17487 46 0.035 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Natural Indirect Excursion Effect (beta):
#> Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept) 0.07360 -0.11968 0.26688 0.09602 0.76653 46 0.45
#> dp -0.00576 -0.03888 0.02736 0.01645 -0.35011 46 0.73
mcee
.p1
as a known constant
vector.Different wrappers will produce the exact same result if the same
nuisance models are used. Here we compare fit2
(from
mcee
) and fit_gen
(from
mcee_general
) which use the same GLM nuisance models and
the same control variables.
fit2$mcee_fit[c("alpha_hat", "beta_hat", "alpha_se", "beta_se")]
#> $alpha_hat
#> (Intercept) dp
#> 0.7564 -0.1053
#>
#> $beta_hat
#> (Intercept) dp
#> 0.073602 -0.005761
#>
#> $alpha_se
#> (Intercept) dp
#> 0.28777 0.04841
#>
#> $beta_se
#> (Intercept) dp
#> 0.09602 0.01645
fit_gen$mcee_fit[c("alpha_hat", "beta_hat", "alpha_se", "beta_se")]
#> $alpha_hat
#> (Intercept) dp
#> 0.7564 -0.1053
#>
#> $beta_hat
#> (Intercept) dp
#> 0.073602 -0.005761
#>
#> $alpha_se
#> (Intercept) dp
#> 0.28777 0.04841
#>
#> $beta_se
#> (Intercept) dp
#> 0.09602 0.01645
all.equal(fit2$mcee_fit, fit_gen$mcee_fit, tolerance = 1e-6)
#> [1] TRUE
all.equal(fit2$mcee_fit, fit_usr$mcee_fit, tolerance = 1e-6)
#> [1] TRUE
mcee
. For analyzing MRTs
with known randomization probabilities, this is the simplest
choice.dp
in time_varying_effect_form
(or simply set
it to ~1
). Do not include baseline or time-varying
covariates here. If you’ve precomputed spline or polynomial bases of
dp
, referencing those columns is allowed.control_formula_with_mediator
(or in
config_mu
/config_q
for
mcee_general
).dp
. Outcomes must be
constant within subjects. No missing values are allowed.The mcee
package provides a multiply robust estimator of
natural direct and indirect excursion effects in
micro-randomized trials and observational longitudinal studies with
time-varying treatments, time-varying mediators and a distal outcome.
Three wrapper functions serve different needs:
mcee
: the streamlined MRT workflow with known
randomization.mcee_general
: full control over nuisance model
specifications.mcee_userfit_nuisance
: for injecting externally fitted
nuisance parameters.Together, these tools allow both applied researchers and methodologists to estimate mediated excursion effects with transparent assumptions and flexible modeling options.
This appendix documents how the core calculation happens in
.mcee_core_rows()
, which is a detailed derivation of Eq.
(12) and the asymptotic variance of Theorem 4 in Qian (2025).
The proposed estimating function is
\[ \psi(\gamma; \zeta) := \sum_{t=1}^T \omega(t) \begin{bmatrix} \{ \phi_t^{10}(p_t, q_t, \mu_t, \nu_t) - \phi_t^{00}(p_t, \eta_t) - f(t)^\top \alpha \} f(t) \\ \{ \phi_t^{11}(p_t, \eta_t) - \phi_t^{10}(p_t, q_t, \mu_t, \nu_t) - f(t)^\top \beta \} f(t) \end{bmatrix}. \]
The estimating equation (EE), omitting the nuisance parameters and allowing different \(T_i\) for each subject \(i\), is
\[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} \{ \phi_{it}^{10} - \phi_{it}^{00} - f(t)^\top \alpha \} f(t) \\ \{ \phi_{it}^{11} - \phi_{it}^{10} - f(t)^\top \beta \} f(t) \end{bmatrix} = 0. \]
The estimators are computed by
\[ \hat\alpha = \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) f(t)f(t)^\top \right]^{-1} \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \{ \phi_{it}^{10} - \phi_{it}^{00} \} f(t) \right], \]
\[ \hat\beta = \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) f(t)f(t)^\top \right]^{-1} \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \{ \phi_{it}^{11} - \phi_{it}^{10} \} f(t) \right]. \]
The asymptotic variance of \((\alpha^\top, \beta^\top)^\top\) can be estimated by
\[ \text{Var}(\hat\gamma) \approx \text{Bread}^{-1} \, \text{Meat} \, (\text{Bread}^{-1})^\top, \]
where
\[ \text{Bread} = \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} f(t)f(t)^\top & 0 \\ 0 & f(t)f(t)^\top \end{bmatrix}, \]
\[ \text{Meat} = \frac{1}{n}\sum_{i=1}^n \left( \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} \{ \phi_{it}^{10} - \phi_{it}^{00} - f(t)^\top \alpha \} f(t) \\ \{ \phi_{it}^{11} - \phi_{it}^{10} - f(t)^\top \beta \} f(t) \end{bmatrix} \right)^{\otimes 2}. \]