The MRTAnalysis package now supports analysis of mediated causal excursion effects in micro-randomized trials (MRTs) with time-varying treatments, time-varying mediators, and a continuous distal outcome, via the function mcee().
A distal outcome is measured once at the end of the study (e.g., weight loss, cognitive score), in contrast to proximal outcomes, which are measured repeatedly after each treatment decision point. In MRTs, the time-varying mediators are often the proximal outcomes.

1. Introduction

Micro-randomized trials (MRTs) are experimental designs for evaluating and optimizing quickly adapting interventions, in which each participant is randomized among treatment options many times throughout the study. This vignette shows how to use the mcee function (short for “mediated causal excursion effect”) to assess how treatment effects on a distal outcome are mediated through intermediate variables (e.g., the activity prompt -> near-term activity -> long-term cardiovascular health pathway).

This package implements estimation of the natural direct excursion effect (NDEE) and the natural indirect excursion effect (NIEE) in MRTs and related longitudinal designs.

The direct and indirect effects may be modeled as varying with the decision point index, specified through the argument time_varying_effect_form. For example, one may model the NDEE and NIEE as constant over time (using ~1) or as quadratic functions of the decision point (using ~ dp + I(dp^2)). The parameter vector for the NDEE is denoted \(\alpha\) and that for the NIEE is \(\beta\). At present, the software does not support mediation effects conditional on baseline or other time-varying covariates. For full details on the estimands, see Qian (2025).

Estimation framework

For applied users: if analyzing an MRT, you typically know the randomization probability, and need to provide a control_formula_with_mediator describing covariates and mediators to include in nuisance models. By default, all nuisance regressions are estimated with GLMs, but you can substitute other machine learning methods.

For statistically inclined readers: the estimator is multiply robust and generalizes the semiparametric estimators of Tchetgen Tchetgen & Shpitser (2012) to longitudinal settings. Estimation proceeds in two stages: the first stage fits five nuisance functions, and the second stage estimates the parameterized NDEE and NIEE. The five nuisance functions are (see Qian (2025) for the precise definitions):

  • \(p\): treatment assignment mechanism (in MRTs this is known and should be specified as such — see later).
  • \(q\): treatment assignment further conditional on the mediator.
  • \(\eta\): outcome regression.
  • \(\mu\): outcome regression further conditional on the mediator.
  • \(\nu\): a “cross-world” outcome regression.

Three user-facing wrappers

The package provides three entry points depending on how much control you want:

  • mcee: streamlined interface for MRTs with known randomization probabilities. Recommended default for most MRT analyses.
  • mcee_general: more flexible, with explicit configuration of nuisance models. Useful for observational studies or MRTs when experimenting with different learners.
  • mcee_userfit_nuisance: accepts user-supplied nuisance predictions (from external ML fits).

The remainder of this vignette will first focus on mcee for a quick introduction, then show how to use the more flexible wrappers.


2. Installation

You can install the MRTAnalysis package from CRAN:

# install.packages("MRTAnalysis")

3. Quick Start Example

We illustrate the most basic usage of mcee on a small simulated dataset. Note that the dataset is in long format, one row per subject-by-decision point. The distal outcome is repeated on each row as a constant within subject.

set.seed(123)

# Simulate a toy dataset
n <- 20
T_val <- 5
id <- rep(1:n, each = T_val)
dp <- rep(1:T_val, times = n)
A <- rbinom(n * T_val, 1, 0.5)
M <- rbinom(n * T_val, 1, plogis(-0.2 + 0.3 * A + 0.1 * dp))
Y <- ave(0.5 * A + 0.7 * M + 0.2 * dp + rnorm(n * T_val), id) # constant within id

dat <- data.frame(id, dp, A, M, Y)

# Minimal mcee call
fit <- mcee(
    data = dat,
    id = "id", dp = "dp",
    outcome = "Y", treatment = "A", mediator = "M",
    time_varying_effect_form = ~1, # constant-over-time NDEE and NIEE
    control_formula_with_mediator = ~ dp + M, # nuisance adjustment
    control_reg_method = "glm", # default method
    rand_prob = 0.5, # known randomization prob
    verbose = FALSE
)

summary(fit)
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", rand_prob = 0.5, time_varying_effect_form = ~1, 
#>     control_formula_with_mediator = ~dp + M, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 18
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept)   0.1704 -0.0824  0.4231     0.1203  1.4158 18     0.17
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)  
#> (Intercept)  0.02591 -0.00193  0.05374    0.01325  1.95537 18    0.066 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The summary output provides estimates of the Natural Direct Excursion Effect (NDEE; \(\alpha\)) and the Natural Indirect Excursion Effect (NIEE; \(\beta\)), with standard errors, 95% confidence intervals, and p-values. Under both “Natural Direct Excursion Effect (alpha)” and “Natural Indirect Excursion Effect (beta)”, the only row is named (Intercept), indicating that these effects are modeled as constant over time (like intercepts in the effect models). In particular, the estimated NDEE is 0.17 (95% CI: -0.08, 0.42; p-value: 0.17) and the estimated NIEE is 0.026 (95% CI: -0.002, 0.054; p-value: 0.066). The confidence intervals and p-values are based on t-quantiles.
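
The numeric estimates can also be extracted programmatically from the returned object via its $mcee_fit component (documented in Section 6):

fit$mcee_fit$alpha_hat # NDEE coefficient estimate
fit$mcee_fit$beta_hat # NIEE coefficient estimate
fit$mcee_fit$alpha_se # and the corresponding standard errors
fit$mcee_fit$beta_se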

4. A More Complex Data Example

This package ships a small example dataset, data_time_varying_mediator_distal_outcome. It is in long format, one row per subject-by-decision point.

Columns (Variables)

  • id: subject identifier (in your own data, use whatever column name is actually present)
  • dp: decision point index (must be strictly increasing within each subject)
  • I: availability indicator (binary, in {0,1}); if the availability argument is omitted, the analysis assumes all rows are available (which is not the case for this example dataset)
  • p_A: randomization probability, i.e., probability of \(A_{it} = 1\) at a given decision point conditional on the history information
  • A: binary treatment, coded in {0,1}
  • M: mediator (continuous or binary are both allowed)
  • Y: distal outcome — constant within each subject (the same value repeated across the subject’s rows)
  • Additional time-varying covariates for adjustment (e.g., X, A_prev, M_prev, etc.)

Peek at the included dataset

data(data_time_varying_mediator_distal_outcome)

dat <- data_time_varying_mediator_distal_outcome

dplyr::glimpse(dat)
#> Rows: 500
#> Columns: 16
#> $ id     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, …
#> $ dp     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1…
#> $ X_prev <dbl> -0.5605, 0.0800, 2.2500, 1.4600, 0.0000, -1.7400, 0.4700, 0.050…
#> $ I_prev <int> 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, …
#> $ A_prev <int> 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, …
#> $ M_prev <dbl> 0.00, 0.13, -1.10, 0.79, 0.98, 2.18, 0.23, 1.62, 2.38, 1.09, 0.…
#> $ X      <dbl> 0.08, 2.25, 1.46, 0.00, -1.74, 0.47, 0.05, 1.46, 1.96, -0.28, -…
#> $ I      <int> 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, …
#> $ A      <int> 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, …
#> $ M      <dbl> 0.13, -1.10, 0.79, 0.98, 2.18, 0.23, 1.62, 2.38, 1.09, 0.15, 0.…
#> $ mu_X   <dbl> -0.17, 0.05, 0.45, 0.60, 0.20, 0.11, 0.19, 0.54, 0.91, 0.81, -0…
#> $ p_I    <dbl> 0.82, 0.89, 0.91, 0.78, 0.66, 0.67, 0.81, 0.76, 0.80, 0.75, 0.8…
#> $ p_A    <dbl> 0.39, 0.45, 0.43, 0.47, 0.47, 0.68, 0.62, 0.76, 0.71, 0.64, 0.3…
#> $ mu_M   <dbl> -0.66, -0.38, -0.64, -0.09, 0.64, 1.24, 1.29, 1.76, 1.24, 0.97,…
#> $ mu_Y   <dbl> 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 1.4…
#> $ Y      <dbl> 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 4.06, 0.6…

dplyr::count(dat, id, name = "Ti") |>
    dplyr::summarise(mean_T = mean(Ti), min_T = min(Ti), max_T = max(Ti))
#>   mean_T min_T max_T
#> 1     10    10    10

# Delete some decision points for certain individuals to mimic scenarios
# where not all individuals have the same number of decision points.
dat <- dat[!((dat$id == 1 & dat$dp == 10) | (dat$id == 2 & dat$dp %in% c(9, 10))), ]
dplyr::count(dat, id, name = "Ti") |>
    dplyr::summarise(mean_T = mean(Ti), min_T = min(Ti), max_T = max(Ti))
#>   mean_T min_T max_T
#> 1   9.94     8    10

5. Detailed Walkthrough of mcee

We focus first on the streamlined MRT workflow with known randomization probability. Throughout, we’ll use the included dataset.

5.1 Basic usage (GLM; constant effects)

set.seed(1)
fit1 <- mcee(
    data = dat,
    id = "id",
    dp = "dp",
    outcome = "Y",
    treatment = "A",
    mediator = "M",
    availability = "I",
    rand_prob = "p_A",
    time_varying_effect_form = ~1, # NDEE and NIEE are constant over time
    control_formula_with_mediator = ~ dp + M + X, # covariate adjustment
    control_reg_method = "glm", # nuisance learners for q, eta, mu, nu
    verbose = FALSE
)
summary(fit1)
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)  
#> (Intercept)    0.180  -0.021   0.381      0.100   1.801 48    0.078 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept)   0.0421 -0.0334  0.1176     0.0376  1.1200 48     0.27
  • time_varying_effect_form: defines the basis f(t) used to model alpha (direct) and beta (indirect); a quadratic-in-time example is sketched after this list.
  • control_formula_with_mediator: adjustment set for the nuisance models; include the mediator here (the wrapper automatically removes it from the nuisance models that should not condition on it). This can include s() terms if control_reg_method = "gam".
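
For example, a quadratic time trend for both effects (mentioned in the Introduction) would be specified as follows. This is a sketch and is not run here; all other arguments are as in fit1.

fit_quad <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~ dp + I(dp^2), # quadratic-in-time NDEE and NIEE
    control_formula_with_mediator = ~ dp + M + X,
    control_reg_method = "glm",
    verbose = FALSE
)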

5.2 Time-varying effects (linear in dp)

fit2 <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~dp, # NDEE, NIEE vary linearly in time
    control_formula_with_mediator = ~ dp + M + X,
    control_reg_method = "glm",
    verbose = FALSE
)
summary(fit2) # rows now labeled (Intercept) and dp
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)  
#> (Intercept)  0.75643  0.17719  1.33568    0.28777  2.62863 46    0.012 *
#> dp          -0.10528 -0.20272 -0.00784    0.04841 -2.17487 46    0.035 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)
#> (Intercept)  0.07360 -0.11968  0.26688    0.09602  0.76653 46     0.45
#> dp          -0.00576 -0.03888  0.02736    0.01645 -0.35011 46     0.73

Interpretation: the rows for dp report how the natural direct/indirect excursion effects change per unit increase in decision point.

Tip. If your dataset already contains a basis of time (e.g., spline columns), you can reference them in time_varying_effect_form. The package will warn when variables other than dp appear there, but this is allowed for precomputed time bases.
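
For instance, a precomputed natural-spline basis of dp could be referenced as below (a sketch, not run; the column names dp_ns1 and dp_ns2 are illustrative, and the warning mentioned above is expected):

# Precompute a 2-df natural-spline basis of the decision point index
ns_basis <- splines::ns(dat$dp, df = 2)
dat$dp_ns1 <- ns_basis[, 1]
dat$dp_ns2 <- ns_basis[, 2]

fit_spline <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~ dp_ns1 + dp_ns2, # precomputed time basis
    control_formula_with_mediator = ~ dp + M + X,
    control_reg_method = "glm",
    verbose = FALSE
)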

5.3 Other learners for nuisance fitting

You can switch the learner used for fitting the nuisance functions via control_reg_method:

  • Generalized Additive Models: "gam"
  • Random forest (randomForest): "rf"
  • Ranger (fast RF): "ranger"
  • SuperLearner: "sl"
# Example: GAM (generalized additive model)
set.seed(2)
fit3 <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~dp,
    control_formula_with_mediator = ~ s(dp) + s(M) + s(X), # spline formula for mgcv::gam
    control_reg_method = "gam",
    verbose = FALSE
)
summary(fit3)
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, 
#>     control_formula_with_mediator = ~s(dp) + s(M) + s(X), control_reg_method = "gam", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)  
#> (Intercept)   0.7679  0.1921  1.3437     0.2861  2.6845 46     0.01 *
#> dp           -0.1165 -0.2137 -0.0194     0.0483 -2.4145 46     0.02 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept)  -0.0305 -0.2225  0.1615     0.0954 -0.3196 46     0.75
#> dp           -0.0034 -0.0352  0.0284     0.0158 -0.2150 46     0.83
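
A tree-based learner can be swapped in the same way. The sketch below (not run here) uses "ranger"; results will differ from the GLM and GAM fits above.

# Example: random forest via ranger for the nuisance regressions
set.seed(3)
fit_rf <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~dp,
    control_formula_with_mediator = ~ dp + M + X, # variables passed to the learner
    control_reg_method = "ranger",
    verbose = FALSE
)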

5.4 Estimating effects at specific decision points

If you are interested in the direct and indirect effects at a specific decision point (or a few decision points), use the specific_dp_only argument. For example, to estimate the effects at the first two decision points (1 and 2) only, you can either:

  • supply specific_dp_only = c(1, 2) in mcee, or
  • provide weight_per_row with nonzero mass only at those rows (the wrappers normalize weights within subject); see the sketch at the end of this subsection.
fit4 <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~1,
    control_formula_with_mediator = ~ dp + M + X,
    control_reg_method = "glm",
    specific_dp_only = c(1, 2),
    verbose = FALSE
)
summary(fit4)
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     specific_dp_only = c(1, 2), verbose = FALSE)
#> 
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)  
#> (Intercept)    0.712   0.119   1.305      0.295   2.415 48     0.02 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept)   0.0478 -0.2022  0.2979     0.1244  0.3847 48      0.7

The effect estimates are now an average over decision points 1 and 2.
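
As an alternative to specific_dp_only, the weight_per_row route mentioned above would look like the sketch below (not run; this assumes weight_per_row is supplied as a numeric vector aligned with the rows of data).

# Nonzero weight only at decision points 1 and 2;
# the wrapper normalizes the weights within each subject.
w <- as.numeric(dat$dp %in% c(1, 2))
fit4b <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~1,
    control_formula_with_mediator = ~ dp + M + X,
    control_reg_method = "glm",
    weight_per_row = w,
    verbose = FALSE
)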

In the next section of the vignette, we’ll show how to use the more flexible wrappers, mcee_general and mcee_userfit_nuisance.

6. Inspecting the Returned Object and Testing Linear Combinations of Coefficients

Every wrapper in this package returns an object of class mcee_fit. This object is a list with several components.

6.1 Estimates: $mcee_fit

  • alpha_hat, beta_hat: estimated coefficients for the natural direct excursion effect (\(\alpha\)) and natural indirect excursion effect (\(\beta\)).
  • alpha_se, beta_se: standard errors.
  • varcov: estimated covariance matrix for both \(\alpha\) and \(\beta\) coefficients (ordered as \((\alpha^T, \beta^T)^T\)).
  • alpha_varcov, beta_varcov: estimated covariance matrices for \(\alpha\) and \(\beta\) separately.
fit1$mcee_fit
#> $alpha_hat
#> (Intercept) 
#>        0.18 
#> 
#> $alpha_se
#> (Intercept) 
#>     0.09999 
#> 
#> $beta_hat
#> (Intercept) 
#>     0.04206 
#> 
#> $beta_se
#> (Intercept) 
#>     0.03756 
#> 
#> $varcov
#>                   alpha_(Intercept) beta_(Intercept)
#> alpha_(Intercept)         9.998e-03       -4.016e-05
#> beta_(Intercept)         -4.016e-05        1.411e-03
#> 
#> $alpha_varcov
#>             (Intercept)
#> (Intercept)    0.009998
#> 
#> $beta_varcov
#>             (Intercept)
#> (Intercept)    0.001411

6.2 Linear combinations

The lincomb_joint, lincomb_alpha, and lincomb_beta arguments to summary() let you compute linear combinations of the estimated coefficients. For example:

Example 1: Difference \(\alpha - \beta\) for constant‑effect model (fit1)

# difference between direct and indirect excursion effects
summary(fit1, lincomb_joint = c(1, -1))
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)  
#> (Intercept)    0.180  -0.021   0.381      0.100   1.801 48    0.078 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept)   0.0421 -0.0334  0.1176     0.0376  1.1200 48     0.27
#> 
#> Joint linear combinations (L * (alpha, beta)):
#>           Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x theta   0.1380 -0.0775  0.3535     0.1072  1.2873 48      0.2

This requests the linear combination \(\alpha - \beta\), with standard error and t-statistic computed from the covariance matrix.

Example 2: Effects at a specific decision point for the linear-trend model (fit2)

Suppose you fit a model with a linear time basis, e.g.:

fit2 <- mcee(
    data = dat,
    id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M",
    availability = "I", rand_prob = "p_A",
    time_varying_effect_form = ~dp,
    control_formula_with_mediator = ~ dp + M + X,
    control_reg_method = "glm",
    verbose = FALSE
)

To get the effect at a specific decision point, add the corresponding multiple of the slope to the intercept. The contrast c(1, 9) below gives the effect at decision point 9 (intercept plus 9 times the slope); for the last decision point, dp = 10, you would use c(1, 10):

summary(fit2, lincomb_alpha = c(1, 9), lincomb_beta = c(1, 9))
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)  
#> (Intercept)  0.75643  0.17719  1.33568    0.28777  2.62863 46    0.012 *
#> dp          -0.10528 -0.20272 -0.00784    0.04841 -2.17487 46    0.035 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)
#> (Intercept)  0.07360 -0.11968  0.26688    0.09602  0.76653 46     0.45
#> dp          -0.00576 -0.03888  0.02736    0.01645 -0.35011 46     0.73
#> 
#> Linear combinations of alpha (L * alpha):
#>           Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x alpha   -0.191  -0.581   0.199      0.194  -0.986 46     0.33
#> 
#> Linear combinations of beta (L * beta):
#>          Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x beta   0.0218 -0.1202  0.1637     0.0705  0.3086 46     0.76

If you prefer a joint contrast (e.g., alpha(t=9) − beta(t=9)), stack the two contrasts into a single row over c(alpha, beta):

L_joint_t10 <- matrix(c(
    1, 9, # alpha part
    -1, -9 # beta part
), nrow = 1)
summary(fit2, lincomb_joint = L_joint_t10)
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)  
#> (Intercept)  0.75643  0.17719  1.33568    0.28777  2.62863 46    0.012 *
#> dp          -0.10528 -0.20272 -0.00784    0.04841 -2.17487 46    0.035 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)
#> (Intercept)  0.07360 -0.11968  0.26688    0.09602  0.76653 46     0.45
#> dp          -0.00576 -0.03888  0.02736    0.01645 -0.35011 46     0.73
#> 
#> Joint linear combinations (L * (alpha, beta)):
#>            Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> L x theta1   -0.213  -0.651   0.225      0.218  -0.978 46     0.33

6.3 Inspecting nuisance models

For diagnostics, the returned object also includes:

  • $nuisance_models: the actual model objects used for fitting each nuisance parameter (p, q, eta, mu, nu).
  • $nuisance_fitted: the fitted values for each nuisance function (p1, q1, eta1, eta0, mu1, mu0, nu1, nu0).

One can use summary(..., show_nuisance = TRUE) to get a compact summary of the nuisance fits.

summary(fit1, show_nuisance = TRUE)
#> 
#> Call:
#> mcee(data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", 
#>     mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1, 
#>     control_formula_with_mediator = ~dp + M + X, control_reg_method = "glm", 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 48
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)  
#> (Intercept)    0.180  -0.021   0.381      0.100   1.801 48    0.078 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate 95% LCL 95% UCL Std. Error t value df Pr(>|t|)
#> (Intercept)   0.0421 -0.0334  0.1176     0.0376  1.1200 48     0.27
#> 
#> Stage-1 nuisance fits:
#>   [p] known constant: multiple values (e.g., 0.39, 0.45, 0.43, ...)
#>   [q] glm: ~ dp + M + X
#>   [eta1] glm: ~ dp + X
#>   [eta0] glm: ~ dp + X
#>   [mu1] glm: ~ dp + M + X
#>   [mu0] glm: ~ dp + M + X
#>   [nu1] glm: ~ dp + X
#>   [nu0] glm: ~ dp + X
#>   Note: For full details, inspect `$nuisance_models` directly.

head(fit1$nuisance_fitted$mu1)
#> [1] 3.277 3.655 3.757 3.083 2.595 2.775

Tip. Because varcov is the joint covariance of c(alpha, beta), you can build any custom Wald test by forming your own contrast matrix L and using L %*% c(alpha, beta) with variance L %*% varcov %*% t(L). This is equivalent to using lincomb_joint = L in summary().
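
For instance, the joint contrast \(\alpha - \beta\) from Example 1 can be reproduced by hand from the stored components:

# Reproduce the "L x theta" row from Example 1 using the joint covariance
# matrix (ordered as c(alpha, beta)) stored in $mcee_fit.
theta <- c(fit1$mcee_fit$alpha_hat, fit1$mcee_fit$beta_hat)
L <- matrix(c(1, -1), nrow = 1)
est <- drop(L %*% theta)
se <- sqrt(drop(L %*% fit1$mcee_fit$varcov %*% t(L)))
c(estimate = est, std_error = se, t_value = est / se)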

7. Other Wrappers: mcee_general and mcee_userfit_nuisance

While mcee covers the common MRT use case (known randomization probability, one control formula for all nuisance functions), two additional wrappers provide more flexibility in controlling how the nuisance functions are fitted.

7.1 mcee_general

This interface lets you configure each nuisance function separately.

  • Configuration objects specify:
    • known: supply a fixed vector/scalar (commonly for treatment mechanism p in MRT).
    • formula: regression formula, which can include s() terms if using method = "gam".
    • method: learner (glm, gam, lm, rf, ranger, sl).
    • family: optional; defaults inferred from nuisance type (binomial for p and q; gaussian for eta, mu, nu).

Helper functions can be used to build these objects: mcee_config_maker, mcee_config_known, mcee_config_glm, mcee_config_gam, mcee_config_lm, mcee_config_rf, mcee_config_ranger, mcee_config_sl, mcee_config_sl_user.

Example: flexible nuisance configs

# Families (binomial vs Gaussian) are chosen automatically when omitted; for `p` and `q` the default is binomial, for the outcome regressions it is Gaussian.

cfg <- list(
    p   = mcee_config_known("p", dat$p_A), # known randomization prob in MRT
    q   = mcee_config_glm("q", ~ dp + X + M),
    eta = mcee_config_glm("eta", ~ dp + X),
    mu  = mcee_config_glm("mu", ~ dp + X + M),
    nu  = mcee_config_glm("nu", ~ dp + X)
)

fit_gen <- mcee_general(
    data = dat,
    id = "id", dp = "dp", outcome = "Y",
    treatment = "A", mediator = "M",
    availability = "I",
    time_varying_effect_form = ~dp,
    config_p = cfg$p, config_q = cfg$q,
    config_eta = cfg$eta, config_mu = cfg$mu, config_nu = cfg$nu,
    verbose = FALSE
)
summary(fit_gen)
#> 
#> Call:
#> mcee_general(data = dat, id = "id", dp = "dp", outcome = "Y", 
#>     treatment = "A", mediator = "M", availability = "I", time_varying_effect_form = ~dp, 
#>     config_p = cfg$p, config_q = cfg$q, config_eta = cfg$eta, 
#>     config_mu = cfg$mu, config_nu = cfg$nu, verbose = FALSE)
#> 
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)  
#> (Intercept)  0.75643  0.17719  1.33568    0.28777  2.62863 46    0.012 *
#> dp          -0.10528 -0.20272 -0.00784    0.04841 -2.17487 46    0.035 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)
#> (Intercept)  0.07360 -0.11968  0.26688    0.09602  0.76653 46     0.45
#> dp          -0.00576 -0.03888  0.02736    0.01645 -0.35011 46     0.73

Tip. This interface is particularly useful for observational studies where the treatment mechanism p is unknown and must be estimated. You can set config_p to a regression formula and method of your choice.
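
For example (a sketch, not run): replace the known p with a logistic-regression fit, reusing the other configs from above.

# Estimate the treatment mechanism p instead of supplying it as known
cfg_p_est <- mcee_config_glm("p", ~ dp + X) # binomial family is the default for p

fit_obs <- mcee_general(
    data = dat,
    id = "id", dp = "dp", outcome = "Y",
    treatment = "A", mediator = "M",
    availability = "I",
    time_varying_effect_form = ~dp,
    config_p = cfg_p_est, config_q = cfg$q,
    config_eta = cfg$eta, config_mu = cfg$mu, config_nu = cfg$nu,
    verbose = FALSE
)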

7.2 mcee_userfit_nuisance

Sometimes you fit nuisance models externally, perhaps with custom ML workflows. In that case, you can bypass Stage‑1 nuisance model fitting and supply predictions directly.

  • Required inputs: vectors p1, q1, eta1, eta0, mu1, mu0, nu1, nu0, each aligned with the rows of your data.
    • p1: \(P(A_t = I_t | H_t)\)
    • q1: \(P(A_t = I_t | H_t, M_t)\)
    • eta1: \(E(Y | H_t, A_t = I_t)\)
    • eta0: \(E(Y | H_t, A_t = 0)\)
    • mu1: \(E(Y | H_t, A_t = I_t, M_t)\)
    • mu0: \(E(Y | H_t, A_t = 0, M_t)\)
    • nu1: \(E[E(Y | H_t, A_t = I_t, M_t) | H_t, A_t = 0]\)
    • nu0: \(E[E(Y | H_t, A_t = 0, M_t) | H_t, A_t = I_t]\)
  • The wrapper will enforce p1 = 1 and q1 = 1 at I == 0.

Example: manual nuisance fits with GLM

# Fit nuisance regressions manually: p, q, eta, mu, nu
p1_hat <- dat$p_A # known randomization prob in MRT
p1_hat[dat$I == 0] <- 1 # manually set this to avoid wrapper message
q1_hat <- predict(glm(A ~ dp + X + M, family = binomial(), data = dat[dat$I == 1, ]),
    type = "response", newdata = dat
)
q1_hat[dat$I == 0] <- 1 # manually set this to avoid wrapper message
eta1_hat <- predict(lm(Y ~ dp + X, data = dat[dat$A == dat$I, ]), newdata = dat)
eta0_hat <- predict(lm(Y ~ dp + X, data = dat[dat$A == 0, ]), newdata = dat)
mu1_hat <- predict(lm(Y ~ dp + X + M, data = dat[dat$A == dat$I, ]), newdata = dat)
mu0_hat <- predict(lm(Y ~ dp + X + M, data = dat[dat$A == 0, ]), newdata = dat)
nu1_hat <- predict(lm(mu1 ~ dp + X, data = cbind(dat, mu1 = mu1_hat)[dat$A == 0, ]), newdata = dat)
nu0_hat <- predict(lm(mu0 ~ dp + X, data = cbind(dat, mu0 = mu0_hat)[dat$A == dat$I, ]), newdata = dat)

fit_usr <- mcee_userfit_nuisance(
    data = dat,
    id = "id", dp = "dp", outcome = "Y",
    treatment = "A", mediator = "M",
    availability = "I",
    time_varying_effect_form = ~dp,
    p1 = p1_hat,
    q1 = q1_hat,
    eta1 = eta1_hat, eta0 = eta0_hat,
    mu1 = mu1_hat, mu0 = mu0_hat,
    nu1 = nu1_hat, nu0 = nu0_hat,
    verbose = FALSE
)
summary(fit_usr)
#> 
#> Call:
#> mcee_userfit_nuisance(data = dat, id = "id", dp = "dp", outcome = "Y", 
#>     treatment = "A", mediator = "M", availability = "I", time_varying_effect_form = ~dp, 
#>     p1 = p1_hat, q1 = q1_hat, eta1 = eta1_hat, eta0 = eta0_hat, 
#>     mu1 = mu1_hat, mu0 = mu0_hat, nu1 = nu1_hat, nu0 = nu0_hat, 
#>     verbose = FALSE)
#> 
#> Inference: small-sample t; df = 46
#> Confidence level: 95%
#> 
#> Natural Direct Excursion Effect (alpha):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)  
#> (Intercept)  0.75643  0.17719  1.33568    0.28777  2.62863 46    0.012 *
#> dp          -0.10528 -0.20272 -0.00784    0.04841 -2.17487 46    0.035 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Natural Indirect Excursion Effect (beta):
#>             Estimate  95% LCL  95% UCL Std. Error  t value df Pr(>|t|)
#> (Intercept)  0.07360 -0.11968  0.26688    0.09602  0.76653 46     0.45
#> dp          -0.00576 -0.03888  0.02736    0.01645 -0.35011 46     0.73
  • Particularly useful when applying custom ML pipelines not wrapped by mcee.
  • For MRTs, you can still supply p1 as a known constant vector.

7.3 Comparing results

Different wrappers produce exactly the same results when the same nuisance models are used. Here we compare fit2 (from mcee), fit_gen (from mcee_general), and fit_usr (from mcee_userfit_nuisance), which all use the same GLM nuisance models and the same control variables.

fit2$mcee_fit[c("alpha_hat", "beta_hat", "alpha_se", "beta_se")]
#> $alpha_hat
#> (Intercept)          dp 
#>      0.7564     -0.1053 
#> 
#> $beta_hat
#> (Intercept)          dp 
#>    0.073602   -0.005761 
#> 
#> $alpha_se
#> (Intercept)          dp 
#>     0.28777     0.04841 
#> 
#> $beta_se
#> (Intercept)          dp 
#>     0.09602     0.01645
fit_gen$mcee_fit[c("alpha_hat", "beta_hat", "alpha_se", "beta_se")]
#> $alpha_hat
#> (Intercept)          dp 
#>      0.7564     -0.1053 
#> 
#> $beta_hat
#> (Intercept)          dp 
#>    0.073602   -0.005761 
#> 
#> $alpha_se
#> (Intercept)          dp 
#>     0.28777     0.04841 
#> 
#> $beta_se
#> (Intercept)          dp 
#>     0.09602     0.01645

all.equal(fit2$mcee_fit, fit_gen$mcee_fit, tolerance = 1e-6)
#> [1] TRUE
all.equal(fit2$mcee_fit, fit_usr$mcee_fit, tolerance = 1e-6)
#> [1] TRUE

8. Best Practices


9. Conclusion

The MRTAnalysis package provides a multiply robust estimator of the natural direct and indirect excursion effects in micro-randomized trials and observational longitudinal studies with time-varying treatments, time-varying mediators, and a distal outcome. Three wrapper functions serve different needs:

  • mcee: streamlined analysis of MRTs with known randomization probabilities.
  • mcee_general: separate configuration of the formula and learner for each nuisance function.
  • mcee_userfit_nuisance: user-supplied nuisance predictions from external model fits.

Together, these tools allow both applied researchers and methodologists to estimate mediated excursion effects with transparent assumptions and flexible modeling options.

Appendix: Estimating Equations and Asymptotic Variance

This appendix documents the core calculation carried out in .mcee_core_rows(), spelling out Eq. (12) and the asymptotic variance from Theorem 4 of Qian (2025).

The proposed estimating function is

\[ \psi(\gamma; \zeta) := \sum_{t=1}^T \omega(t) \begin{bmatrix} \{ \phi_t^{10}(p_t, q_t, \mu_t, \nu_t) - \phi_t^{00}(p_t, \eta_t) - f(t)^\top \alpha \} f(t) \\ \{ \phi_t^{11}(p_t, \eta_t) - \phi_t^{10}(p_t, q_t, \mu_t, \nu_t) - f(t)^\top \beta \} f(t) \end{bmatrix}. \]

The estimating equation (EE), omitting the nuisance parameters and allowing different \(T_i\) for each subject \(i\), is

\[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} \{ \phi_{it}^{10} - \phi_{it}^{00} - f(t)^\top \alpha \} f(t) \\ \{ \phi_{it}^{11} - \phi_{it}^{10} - f(t)^\top \beta \} f(t) \end{bmatrix} = 0. \]

The estimators are computed by

\[ \hat\alpha = \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) f(t)f(t)^\top \right]^{-1} \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \{ \phi_{it}^{10} - \phi_{it}^{00} \} f(t) \right], \]

\[ \hat\beta = \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) f(t)f(t)^\top \right]^{-1} \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \{ \phi_{it}^{11} - \phi_{it}^{10} \} f(t) \right]. \]
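
To make the closed-form expressions concrete, here is a small numerical sketch of the \(\hat\alpha\) formula. The quantities phi10, phi00, w, and Fmat are hypothetical placeholders (random numbers and a toy basis), not objects produced elsewhere in this vignette.

# Toy illustration of the closed-form alpha-hat: 6 subjects, 5 decision points each
set.seed(4)
n_subj <- 6
Fmat <- cbind(1, dp = rep(1:5, n_subj)) # f(t) = (1, dp) for each row
w <- rep(1 / 5, nrow(Fmat)) # omega(i, t), normalized within subject
phi10 <- rnorm(nrow(Fmat)) # placeholder for phi_it^{10}
phi00 <- rnorm(nrow(Fmat)) # placeholder for phi_it^{00}
bread <- crossprod(Fmat, w * Fmat) / n_subj # (1/n) sum_i sum_t omega f(t) f(t)^T
rhs <- crossprod(Fmat, w * (phi10 - phi00)) / n_subj # (1/n) sum_i sum_t omega (phi10 - phi00) f(t)
alpha_hat <- solve(bread, rhs)
alpha_hat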

The asymptotic variance of \((\alpha^\top, \beta^\top)^\top\) can be estimated by

\[ \text{Var}(\hat\gamma) \approx \text{Bread}^{-1} \, \text{Meat} \, (\text{Bread}^{-1})^\top, \]

where

\[ \text{Bread} = \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} f(t)f(t)^\top & 0 \\ 0 & f(t)f(t)^\top \end{bmatrix}, \]

\[ \text{Meat} = \frac{1}{n}\sum_{i=1}^n \left( \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} \{ \phi_{it}^{10} - \phi_{it}^{00} - f(t)^\top \alpha \} f(t) \\ \{ \phi_{it}^{11} - \phi_{it}^{10} - f(t)^\top \beta \} f(t) \end{bmatrix} \right)^{\otimes 2}. \]

Reference

Qian, T. (2025). Dynamic causal mediation analysis for intensive longitudinal data. arXiv preprint arXiv:2506.20027.