This vignette is meant for those who wish to contribute to {gtsummary}, or for users who wish to understand the inner workings of a {gtsummary} object so they can more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package’s functions and tutorials on advanced use.
Every {gtsummary} table has a few characteristics common among all tables created with the package. Here, we review those characteristics, and provide instructions on how to construct a {gtsummary} object.
```r
library(gtsummary)
library(dplyr) # provides select(), used below

tbl_regression_ex <-
  lm(age ~ grade + marker, trial) %>%
  tbl_regression() %>%
  bold_p(t = 0.5)

tbl_summary_ex <-
  trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt)
```

Every {gtsummary} object is a list comprising, at minimum, these elements:

- .$table_body
- .$table_styling

The .$table_body object is the data frame that will ultimately be printed as the output. The table must include the columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.
```r
tbl_summary_ex$table_body
#> # A tibble: 8 × 7
#>   variable var_type    var_label      row_type label          stat_1      stat_2
#>   <chr>    <chr>       <chr>          <chr>    <chr>          <chr>       <chr> 
#> 1 age      continuous  Age            label    Age            46 (37, 59) 48 (3…
#> 2 age      continuous  Age            missing  Unknown        7           4     
#> 3 grade    categorical Grade          label    Grade          <NA>        <NA>  
#> 4 grade    categorical Grade          level    I              35 (36%)    33 (3…
#> 5 grade    categorical Grade          level    II             32 (33%)    36 (3…
#> 6 grade    categorical Grade          level    III            31 (32%)    33 (3…
#> 7 response dichotomous Tumor Response label    Tumor Response 28 (29%)    33 (3…
#> 8 response dichotomous Tumor Response missing  Unknown        3           4
```

The .$table_styling object is a list of data frames containing information about how .$table_body is printed, formatted, and styled.
The list contains the following data frames: header, footnote, footnote_abbrev, fmt_fun, text_format, fmt_missing, and cols_merge. It also contains the following objects: source_note, caption, and horizontal_line_above.
header
The header table has one row per column found in .$table_body and the following columns. It contains styling information that applies to an entire column or to the column headers.
| Column | Description | 
|---|---|
| column | Column name from  | 
| hide | Logical indicating whether the column is hidden in the output. This column is also scoped in  | 
| align | Specifies the alignment/justification of the column, e.g. 'center' or 'left' | 
| label | Label that will be displayed (if column is displayed in output) | 
| interpret_label | the {gt} function that is used to interpret the column label,  | 
| spanning_header | Includes text printed above columns as spanning headers. | 
| interpret_spanning_header | the {gt} function that is used to interpret the column spanning headers,  | 
| modify_stat_{*} | any column beginning with  | 
| modify_selector_{*} | any column beginning with  | 
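As a rough mental model, and not the package's actual code, you can treat the header table as an ordinary data frame. The printed columns are the rows with hide == FALSE, and their column labels come from label. The toy header below is invented, and visible_columns() is a hypothetical helper.

```r
# Toy stand-in for .$table_styling$header (values invented for illustration)
header <- data.frame(
  column = c("variable", "row_type", "label", "estimate", "p.value"),
  hide   = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  align  = c("center", "center", "left", "center", "center"),
  label  = c("variable", "row_type", "**Characteristic**", "**Beta**", "**p-value**"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: labels of the columns that will be printed,
# named by column and kept in header order
visible_columns <- function(header) {
  shown <- header[!header$hide, , drop = FALSE]
  stats::setNames(shown$label, shown$column)
}

visible_columns(header)
```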
footnote & footnote_abbrev
Each {gtsummary} table may contain a single footnote per header and per cell within the table. Footnotes and footnote abbreviations are handled separately. Updates to a footnote are appended to the bottom of the tibble. A footnote of NA_character_ deletes an existing footnote.
| Column | Description | 
|---|---|
| column | Column name from  | 
| rows | expression selecting rows in  | 
| footnote | string containing footnote to add to column/row | 
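Updates are appended rather than edited in place, so the effective footnote for a column is its most recent row. The sketch below models that rule with a toy data frame. It ignores the rows column and treats every footnote as a column-level note. resolve_footnotes() is a hypothetical helper, not a {gtsummary} function.

```r
# Toy footnote table: later rows are newer updates
footnote <- data.frame(
  column   = c("estimate", "p.value", "estimate", "p.value"),
  footnote = c("Old note", "Wald test", "Linear regression", NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the most recent footnote per column, then drop
# columns whose most recent footnote is NA (i.e., the footnote was deleted)
resolve_footnotes <- function(footnote) {
  latest <- footnote[!duplicated(footnote$column, fromLast = TRUE), , drop = FALSE]
  latest <- latest[!is.na(latest$footnote), , drop = FALSE]
  stats::setNames(latest$footnote, latest$column)
}

resolve_footnotes(footnote)
```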
fmt_fun
Numeric columns/rows are styled with the functions stored in
fmt_fun. Updates/changes to styling functions are appended
to the bottom of the tibble.
| Column | Description | 
|---|---|
| column | Column name from  | 
| rows | expression selecting rows in  | 
| fmt_fun | list of formatting/styling functions | 
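The sketch below shows how a list column of functions can drive formatting. It assumes a simplified table in which each fmt_fun row applies to every non-missing value of its column. The numbers and the apply_fmt_fun() helper are hypothetical. Updates are appended to the bottom, so a later row for the same column wins.

```r
table_body <- data.frame(
  label    = c("Grade", "I", "II", "Marker Level (ng/mL)"),
  estimate = c(NA, NA, -0.5678, 0.0123),
  p.value  = c(NA, NA, 0.0312, 0.0004),
  stringsAsFactors = FALSE
)

# Toy fmt_fun table with a list column of functions
fmt_fun <- data.frame(column = c("estimate", "p.value", "estimate"), stringsAsFactors = FALSE)
fmt_fun$fmt_fun <- list(
  function(x) sprintf("%.1f", x),
  function(x) ifelse(x < 0.001, "<0.001", sprintf("%.3f", x)),
  function(x) sprintf("%.2f", x) # appended later, so it replaces the first estimate formatter
)

# Hypothetical helper: apply the most recent formatter for each column;
# NA cells stay NA (they are handled separately, see fmt_missing)
apply_fmt_fun <- function(table_body, fmt_fun) {
  latest <- fmt_fun[!duplicated(fmt_fun$column, fromLast = TRUE), , drop = FALSE]
  for (i in seq_len(nrow(latest))) {
    col <- latest$column[i]
    f <- latest$fmt_fun[[i]]
    x <- table_body[[col]]
    out <- rep(NA_character_, length(x))
    out[!is.na(x)] <- f(x[!is.na(x)])
    table_body[[col]] <- out
  }
  table_body
}

apply_fmt_fun(table_body, fmt_fun)
```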
text_format
Bold, italic, and indentation styling for columns/rows is stored in text_format. Updates/changes to text formatting are appended to the bottom of the tibble.
| Column | Description | 
|---|---|
| column | Column name from  | 
| rows | expression selecting rows in  | 
| format_type | one of  | 
| undo_text_format | logical indicating whether the formatting indicated should be undone/removed. | 
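One toy reading of text_format is to walk the instructions from top to bottom. Each row switches a format on when undo_text_format is FALSE and off when it is TRUE. The sketch below tracks this per column only and ignores rows. active_formats() is hypothetical.

```r
text_format <- data.frame(
  column           = c("label", "p.value", "p.value", "label"),
  format_type      = c("indent", "bold", "italic", "indent"),
  undo_text_format = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns, per column, the formats still active
# after all instructions have been applied in order
active_formats <- function(text_format) {
  state <- list()
  for (i in seq_len(nrow(text_format))) {
    col <- text_format$column[i]
    fmt <- text_format$format_type[i]
    current <- if (is.null(state[[col]])) character(0) else state[[col]]
    state[[col]] <- if (text_format$undo_text_format[i]) setdiff(current, fmt) else union(current, fmt)
  }
  state
}

active_formats(text_format)
```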
fmt_missing
By default, all NA values are shown as blanks. Missing values in the columns/rows listed in fmt_missing are instead replaced with the string in symbol. For example, reference rows in tbl_regression() are shown with an em-dash. Updates/changes to missing-value symbols are appended to the bottom of the tibble.
| Column | Description | 
|---|---|
| column | Column name from  | 
| rows | expression selecting rows in  | 
| symbol | string to replace missing values with, e.g. an em-dash | 
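The next sketch replaces NA with the stored symbol only in the selected rows and leaves other NA cells blank. In the package, rows holds an expression (a quosure in the tbl_regression() example output). Here a plain R function stands in for it. The data and the apply_fmt_missing() helper are invented for illustration.

```r
table_body <- data.frame(
  label         = c("Grade", "I", "II", "III"),
  row_type      = c("label", "level", "level", "level"),
  reference_row = c(NA, TRUE, FALSE, FALSE),
  estimate      = c(NA, NA, "1.2", "-0.4"),
  stringsAsFactors = FALSE
)

# Toy fmt_missing instruction; `rows` is a function instead of a quosure
fmt_missing <- list(
  column = "estimate",
  rows   = function(data) data$reference_row %in% TRUE,
  symbol = "\u2014" # em-dash
)

# Hypothetical helper: fill NA cells in the selected rows with the symbol
apply_fmt_missing <- function(table_body, instr) {
  x <- table_body[[instr$column]]
  target <- instr$rows(table_body) & is.na(x)
  x[target] <- instr$symbol
  table_body[[instr$column]] <- x
  table_body
}

res <- apply_fmt_missing(table_body, fmt_missing)
res$estimate
```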
cols_merge
This object is experimental and may change in the future. This tibble gives instructions for merging columns into a single column. The implementation in as_gt() will be updated after gt::cols_merge() gains a rows= argument.
| Column | Description | 
|---|---|
| column | Column name from  | 
| rows | expression selecting rows in  | 
| pattern | glue pattern directing how to combine/merge columns. The merged columns will replace the column indicated in 'column'. | 
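A glue pattern such as "{conf.low}, {conf.high}" names the columns to combine. The sketch below fills each {name} placeholder using base R only, without {glue}. If any input is NA, the merged cell is left NA so that fmt_missing could still apply. merge_by_pattern() and its data are hypothetical, and the package itself may handle missing inputs differently.

```r
table_body <- data.frame(
  conf.low  = c("0.12", NA, "-1.05"),
  conf.high = c("0.88", NA, "0.31"),
  stringsAsFactors = FALSE
)
pattern <- "{conf.low}, {conf.high}"

# Hypothetical helper: substitute every {name} with the matching column value
merge_by_pattern <- function(data, pattern) {
  fields <- regmatches(pattern, gregexpr("\\{[^}]+\\}", pattern))[[1]]
  out <- rep(pattern, nrow(data))
  for (f in fields) {
    col <- substr(f, 2, nchar(f) - 1) # strip the braces to get the column name
    out <- mapply(
      function(s, v) if (is.na(s) || is.na(v)) NA_character_ else sub(f, v, s, fixed = TRUE),
      out, data[[col]],
      USE.NAMES = FALSE
    )
  }
  out
}

merge_by_pattern(table_body, pattern)
```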
source_note
String that is made into the table source note. The attribute "text_interpret" is either "md" or "html".
caption
String that is made into the table caption. The attribute "text_interpret" is either "md" or "html".
horizontal_line_above
Expression identifying the row above which a horizontal line is placed in the table.
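The sketch below shows how a stored expression can identify a row by evaluating a quoted call against a toy table_body. A plain quoted call stands in for whatever expression object the package stores, and the data are invented.

```r
table_body <- data.frame(
  variable = c("grade", "grade", "grade", "marker"),
  row_type = c("label", "level", "level", "label"),
  label    = c("Grade", "I", "II", "Marker Level (ng/mL)"),
  stringsAsFactors = FALSE
)

# Stand-in for the stored expression: evaluated with table_body's columns in scope
horizontal_line_above <- quote(variable == "marker" & row_type == "label")

which(eval(horizontal_line_above, envir = table_body))
```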
Example from tbl_regression():

```r
tbl_regression_ex$table_styling
#> $header
#> # A tibble: 24 × 9
#>    column             hide  align  inter…¹ label inter…² spann…³ modif…⁴ modif…⁵
#>    <chr>              <lgl> <chr>  <chr>   <chr> <chr>   <chr>     <int>   <int>
#>  1 variable           TRUE  center gt::md  vari… gt::md  <NA>        179     179
#>  2 var_label          TRUE  center gt::md  var_… gt::md  <NA>        179     179
#>  3 var_type           TRUE  center gt::md  var_… gt::md  <NA>        179     179
#>  4 reference_row      TRUE  center gt::md  refe… gt::md  <NA>        179     179
#>  5 row_type           TRUE  center gt::md  row_… gt::md  <NA>        179     179
#>  6 header_row         TRUE  center gt::md  head… gt::md  <NA>        179     179
#>  7 N_obs              TRUE  center gt::md  N_obs gt::md  <NA>        179     179
#>  8 N                  TRUE  center gt::md  **N** gt::md  <NA>        179     179
#>  9 coefficients_type  TRUE  center gt::md  coef… gt::md  <NA>        179     179
#> 10 coefficients_label TRUE  center gt::md  coef… gt::md  <NA>        179     179
#> # … with 14 more rows, and abbreviated variable names ¹interpret_label,
#> #   ²interpret_spanning_header, ³spanning_header, ⁴modify_stat_N,
#> #   ⁵modify_stat_n
#> 
#> $footnote
#> # A tibble: 0 × 4
#> # … with 4 variables: column <chr>, rows <list>, text_interpret <chr>,
#> #   footnote <chr>
#> 
#> $footnote_abbrev
#> # A tibble: 2 × 4
#>   column    rows      text_interpret footnote                
#>   <chr>     <list>    <chr>          <chr>                   
#> 1 ci        <quosure> gt::md         CI = Confidence Interval
#> 2 std.error <quosure> gt::md         SE = Standard Error     
#> 
#> $text_format
#> # A tibble: 2 × 4
#>   column  rows       format_type undo_text_format
#>   <chr>   <list>     <chr>       <lgl>           
#> 1 label   <language> indent      FALSE           
#> 2 p.value <quosure>  bold        FALSE           
#> 
#> $fmt_missing
#> # A tibble: 4 × 3
#>   column    rows      symbol
#>   <chr>     <list>    <chr> 
#> 1 estimate  <quosure> —     
#> 2 ci        <quosure> —     
#> 3 std.error <quosure> —     
#> 4 statistic <quosure> —     
#> 
#> $fmt_fun
#> # A tibble: 10 × 3
#>    column      rows      fmt_fun   
#>    <chr>       <list>    <list>    
#>  1 estimate    <quosure> <fn>      
#>  2 N           <quosure> <fn>      
#>  3 N_obs       <quosure> <fn>      
#>  4 n_obs       <quosure> <fn>      
#>  5 conf.low    <quosure> <fn>      
#>  6 conf.high   <quosure> <fn>      
#>  7 p.value     <quosure> <fn>      
#>  8 std.error   <quosure> <prrr_fn_>
#>  9 statistic   <quosure> <prrr_fn_>
#> 10 var_nlevels <quosure> <prrr_fn_>
#> 
#> $cols_merge
#> # A tibble: 0 × 3
#> # … with 3 variables: column <chr>, rows <list>, pattern <chr>
```

When constructing a {gtsummary} object, the author will begin with
the .$table_body object. Recall that the .$table_body data frame must include the columns "label", "row_type", and "variable". Of these columns, only "label" is printed with the final results. The "row_type" column typically controls whether the label column is indented. The "variable" column is often used by the inline_text() family of functions and when merging {gtsummary} tables with tbl_merge().
```r
tbl_regression_ex %>%
  purrr::pluck("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 × 3
#>   variable row_type label               
#>   <chr>    <chr>    <chr>               
#> 1 grade    label    Grade               
#> 2 grade    level    I                   
#> 3 grade    level    II                  
#> 4 grade    level    III                 
#> 5 marker   label    Marker Level (ng/mL)
```

The other columns in .$table_body are created by the table author and are typically printed in the output. Formatting and printing instructions for these columns are stored in .$table_styling.
A few {gtsummary} functions, some internal and some exported, assist in constructing and modifying the .$table_body and .$table_styling objects:

- .create_gtsummary_object(table_body): after creating a table_body, pass it to this internal function. It creates and returns the skeleton of a gtsummary object, including the full table_styling list of tables.
- .update_table_styling(): after columns are added to or removed from table_body, run this internal function to add or remove the corresponding styling instructions in .$table_styling. By default, each new column is hidden.
- modify_table_styling(): this exported function modifies the printing instructions for a single column or a group of columns.
- modify_table_body(): this exported function helps users make changes to .$table_body. It runs .update_table_styling() internally so the printing instructions stay consistent with the table body.
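The toy version below makes the bookkeeping of .update_table_styling() concrete, using a two-column header. Columns that disappeared from table_body are dropped, and new columns are added with hide = TRUE. sync_header() is hypothetical and does not reproduce the package's internals; for instance, the real header has many more columns.

```r
header <- data.frame(
  column = c("variable", "row_type", "label", "estimate"),
  hide   = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# table_body after "estimate" was removed and "p.value" was added
table_body <- data.frame(
  variable = "marker", row_type = "label", label = "Marker Level (ng/mL)", p.value = 0.02,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for surviving columns, add hidden rows for new ones
sync_header <- function(header, table_body) {
  cols <- names(table_body)
  kept <- header[header$column %in% cols, , drop = FALSE]
  new_cols <- setdiff(cols, header$column)
  added <- data.frame(column = new_cols, hide = rep(TRUE, length(new_cols)), stringsAsFactors = FALSE)
  out <- rbind(kept, added)
  out <- out[match(cols, out$column), , drop = FALSE] # order rows like table_body's columns
  rownames(out) <- NULL
  out
}

sync_header(header, table_body)
```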
All {gtsummary} objects are printed with print.gtsummary(). By default, before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its input and uses the information in .$table_styling to construct a list of {gt} calls that will be executed on .$table_body. After the {gtsummary} object is converted to {gt}, it is printed like any other {gt} object.
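Base R can illustrate the idea of building a list of calls and then executing them. Each quoted expression below stands in for one {gt} call, and the calls run in order inside an environment that holds the table. The data, the two steps, and their names are assumptions for illustration; as_gt() itself produces {gt} calls.

```r
table_body <- data.frame(
  variable = "marker", row_type = "label", label = "Marker Level (ng/mL)",
  estimate = 0.4321, stringsAsFactors = FALSE
)
header <- data.frame(
  column = c("variable", "row_type", "label", "estimate"),
  hide   = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# A list of unevaluated calls, built from the styling information
calls <- list(
  fmt_estimate = quote(tb$estimate <- sprintf("%.2f", tb$estimate)),
  hide_columns = quote(tb <- tb[, !names(tb) %in% hidden, drop = FALSE])
)

# Execute the calls in order against the table
env <- new.env()
env$tb <- table_body
env$hidden <- header$column[header$hide]
for (cl in calls) eval(cl, envir = env)

env$tb
```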
In some cases, the package instead prints with other engines, such as flextable (as_flex_table()), huxtable (as_hux_table()), kableExtra (as_kable_extra()), and kable (as_kable()). The default print engine is set with the theme element "pkgwide-str:print_engine".
While the actual print function is slightly more involved, it is basically this:

```r
print.gtsummary <- function(x, ...) {
  # look up the engine in the active theme, convert, then print the result
  get_theme_element("pkgwide-str:print_engine") %>%
    switch(
      "gt" = as_gt(x),
      "flextable" = as_flex_table(x),
      "huxtable" = as_hux_table(x),
      "kable_extra" = as_kable_extra(x),
      "kable" = as_kable(x)
    ) %>%
    print()
  invisible(x)
}
```

.$meta_data$df_stats tibble
Some {gtsummary} tables contain an internal object called .$meta_data containing a list column called "df_stats". The column is a list of tibbles, each containing the summary statistics presented in the final gtsummary table. While the statistics contained in each "df_stats" tibble can vary within a single gtsummary object, all the tibbles have a few common characteristics. Each tibble contains the following columns:
| Column | Description | 
|---|---|
|  | String of the variable name | 
|  | String matching the variable's values in  | 
|  | The column name the statistics appear under in  | 
|  | This column appears if and only if the variable being summarized has multiple levels. The column is equal to the variable's levels. | 
|  | Primarily, the tibble stores the summary statistics for each variable. For example, when the mean is requested in  | 
The statistics columns each have an attribute called
"fmt_fun" containing the formatting function that will be
applied before the statistic is placed in .$table_body.
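Below is a toy df_stats data frame whose statistic column carries a fmt_fun attribute. The column names and values are invented. Only the attribute mechanism mirrors the description above.

```r
df_stats <- data.frame(
  variable = "age",
  mean     = c(47.0108, 47.4494), # invented statistics
  stringsAsFactors = FALSE
)

# Attach the formatting function to the statistic column as an attribute
attr(df_stats$mean, "fmt_fun") <- function(x) sprintf("%.1f", x)

# Retrieve the attribute and apply it before placing the values in a table
fmt <- attr(df_stats$mean, "fmt_fun")
fmt(df_stats$mean)
```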