In the context of this package, “marginal means” refer to the values obtained by this three step process:
For example, consider a model with a numeric, a factor, and a logical predictor:
library(marginaleffects)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)Using the predictions function, we set the hp variable at its mean and compute predictions for all combinations for am and cyl:
predictions(mod, variables = c("am", "cyl"))
#>   rowid     type predicted std.error conf.low conf.high       hp    am cyl
#> 1     1 response  21.03914  1.213043 18.55019  23.52810 146.6875  TRUE   6
#> 2     2 response  16.88129  1.272938 14.26944  19.49314 146.6875 FALSE   6
#> 3     3 response  24.96372  1.176830 22.54907  27.37838 146.6875  TRUE   4
#> 4     4 response  20.80587  1.756564 17.20169  24.41004 146.6875 FALSE   4
#> 5     5 response  21.43031  1.826126 17.68341  25.17721 146.6875  TRUE   8
#> 6     6 response  17.27245  1.116885 14.98079  19.56411 146.6875 FALSE   8For illustration purposes, it is useful to reshape the above results:
| cyl | TRUE | FALSE | Marginal mean of cyl | 
|---|---|---|---|
| 6 | 21.0 | 16.9 | 19.0 | 
| 4 | 25.0 | 20.8 | 22.9 | 
| 8 | 21.4 | 17.3 | 19.4 | 
| Marginal means of am | 22.5 | 18.3 | 
The marginal means of am and cyl are obtained by taking the mean of the adjusted predictions across cells. The marginalmeans function gives us the same results easily:
marginalmeans(mod)
#>   term value marginalmean std.error
#> 1  cyl     4     22.88479 1.3566479
#> 2  cyl     6     18.96022 1.0729360
#> 3  cyl     8     19.35138 1.3770817
#> 4   am FALSE     18.31987 0.7853925
#> 5   am  TRUE     22.47772 0.8343346The same results can be obtained using the very powerful emmeans package:
library(emmeans)
emmeans(mod, specs = "cyl")
#>  cyl emmean   SE df lower.CL upper.CL
#>  4     22.9 1.36 27     20.1     25.7
#>  6     19.0 1.07 27     16.8     21.2
#>  8     19.4 1.38 27     16.5     22.2
#> 
#> Results are averaged over the levels of: am 
#> Confidence level used: 0.95
emmeans(mod, specs = "am")
#>  am    emmean    SE df lower.CL upper.CL
#>  FALSE   18.3 0.785 27     16.7     19.9
#>   TRUE   22.5 0.834 27     20.8     24.2
#> 
#> Results are averaged over the levels of: cyl 
#> Confidence level used: 0.95The summary, tidy, and glance functions are also available to summarize and manipulate the results:
mm <- marginalmeans(mod)
tidy(mm)
#>   term value estimate std.error statistic p.value conf.low conf.high
#> 1  cyl     4 22.88479 1.3566479  16.86863       0 20.22581  25.54378
#> 2  cyl     6 18.96022 1.0729360  17.67134       0 16.85730  21.06313
#> 3  cyl     8 19.35138 1.3770817  14.05246       0 16.65235  22.05041
#> 4   am FALSE 18.31987 0.7853925  23.32575       0 16.78053  19.85921
#> 5   am  TRUE 22.47772 0.8343346  26.94090       0 20.84246  24.11299
glance(mm)
#>   r.squared adj.r.squared   sigma statistic      p.value df    logLik      AIC
#> 1  0.824875     0.7989306 2.70253   31.7939 7.400614e-10  4 -74.50167 161.0033
#>        BIC deviance df.residual nobs       F    rmse
#> 1 169.7978  197.199          27   32 31.7939 2.70253
summary(mm)
#> Estimated marginal means 
#>   Term Value  Mean Std. Error z value   Pr(>|z|) 2.5 % 97.5 %
#> 1  cyl     4 22.88     1.3566   16.87 < 2.22e-16 20.23  25.54
#> 2  cyl     6 18.96     1.0729   17.67 < 2.22e-16 16.86  21.06
#> 3  cyl     8 19.35     1.3771   14.05 < 2.22e-16 16.65  22.05
#> 4   am FALSE 18.32     0.7854   23.33 < 2.22e-16 16.78  19.86
#> 5   am  TRUE 22.48     0.8343   26.94 < 2.22e-16 20.84  24.11
#> 
#> Model type:  lm 
#> Prediction type:  responseThanks to those tidiers, we can also present the results in the style of a regression table using the modelsummary package:
library("modelsummary")
modelsummary(mm,
             title = "Estimated Marginal Means",
             estimate = "{estimate} ({std.error}){stars}",
             statistic = NULL,
             group = term + value ~ model)| Model 1 | ||
|---|---|---|
| cyl | 4 | 22.885 (1.357)*** | 
| 6 | 18.960 (1.073)*** | |
| 8 | 19.351 (1.377)*** | |
| am | FALSE | 18.320 (0.785)*** | 
| TRUE | 22.478 (0.834)*** | |
| Num.Obs. | 32 | |
| R2 | 0.825 | |
| R2 Adj. | 0.799 | |
| AIC | 161.0 | |
| BIC | 169.8 | |
| Log.Lik. | −74.502 | |
| F | 31.794 | |
| RMSE | 2.70 | 
This example requires version 0.2.0 of the marginaleffects package.
To begin, we generate data and estimate a large model:
library(nnet)
library(marginaleffects)
set.seed(1839)
n <- 1200
x <- factor(sample(letters[1:3], n, TRUE))
y <- vector(length = n)
y[x == "a"] <- sample(letters[4:6], sum(x == "a"), TRUE)
y[x == "b"] <- sample(letters[4:6], sum(x == "b"), TRUE, c(1 / 4, 2 / 4, 1 / 4))
y[x == "c"] <- sample(letters[4:6], sum(x == "c"), TRUE, c(1 / 5, 3 / 5, 2 / 5))
dat <- data.frame(x = x, y = factor(y))
tmp <- as.data.frame(replicate(20, factor(sample(letters[7:9], n, TRUE))))
dat <- cbind(dat, tmp)
void <- capture.output({
    mod <- multinom(y ~ ., dat)
})Try to compute marginal means, but realize that your grid won’t fit in memory:
marginalmeans(mod, type = "probs")
#> Error in typical(..., model = model, newdata = newdata, FUN.character = FUN.character, : You are trying to create a prediction grid with more than 1 billion rows, which is likely to exceed the memory and computational power available on your local machine. Presumably this is because you are considering many variables with many levels. All of the functions in the `marginaleffects` package include arguments to specify a restricted list of variables over which to create a prediction grid.Use the variables and variables_grid arguments to compute marginal means over a more reasonably sized grid: