Create satisficing tables in R the formula way.
The objective of tablespan
is to provide a “good enough”
approach to creating tables by leveraging R’s formulas.
The following shows an example, where we define a relatively complex table header setup with a single formula. The details of the syntax will be explained below.
library(dplyr)
library(tablespan)
data("mtcars")
<- mtcars |>
summarized_table group_by(cyl, vs) |>
summarise(N = n(),
mean_hp = mean(hp),
sd_hp = sd(hp),
mean_wt = mean(wt),
sd_wt = sd(wt))
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
#> argument.
<- tablespan(data = summarized_table,
tbl formula = Cylinder:cyl + Engine:vs ~
+
N `Horse Power` = Mean:mean_hp + SD:sd_hp) +
(`Weight` = Mean:mean_wt + SD:sd_wt),
(title = "Motor Trend Car Road Tests",
subtitle = "A table created with tablespan",
footnote = "Data from the infamous mtcars data set.")
tbl#> Motor Trend Car Road Tests
#> A table created with tablespan
#>
#> | | Horse Power Weight |
#> | Cylinder Engine | N Mean SD Mean SD |
#> | -------- ------ - -- ----------- ----- ------ ---- |
#> | 4 0 | 1 91 2.14 |
#> | 4 1 | 10 81.8 21.87 2.3 0.6 |
#> | 6 0 | 3 131.67 37.53 2.76 0.13 |
#> | ... ... | ... ... ... ... ... |
#> Data from the infamous mtcars data set.
tablespan
builds on the awesome packages openxlsx
and
gt
, which allows
tables created with tablespan
to be exported to the
following formats:
To install tablespan
from CRAN use:
install.packages("tablespan")
The development version of tablespan
can be installed
from GitHub with:
library(remotes)
::install_github("jhorzek/tablespan") remotes
R has a large set of great packages that allow you to create and
export tables that look exactly like you envisioned. However, sometimes
you may just need a good-enough table that is easy to create and share
with others. This is where tablespan
can be of help.
Let’s assume that we want to share the following table:
library(dplyr)
data("mtcars")
<- mtcars |>
summarized_table group_by(cyl, vs) |>
summarise(N = n(),
mean_hp = mean(hp),
sd_hp = sd(hp),
mean_wt = mean(wt),
sd_wt = sd(wt))
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
#> argument.
print(summarized_table)
#> # A tibble: 5 × 7
#> # Groups: cyl [3]
#> cyl vs N mean_hp sd_hp mean_wt sd_wt
#> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 4 0 1 91 NA 2.14 NA
#> 2 4 1 10 81.8 21.9 2.30 0.598
#> 3 6 0 3 132. 37.5 2.76 0.128
#> 4 6 1 4 115. 9.18 3.39 0.116
#> 5 8 0 14 209. 51.0 4.00 0.759
We don’t want to share the table as is - the variable names are all a bit technical and the table could need some spanners summarizing columns. So, we want to share a table that looks something like this:
| | Horse Power | Weight |
| Cylinder | Engine | Mean | SD | Mean | SD |
| -------- | ------ | ----- | --- | ---- | -- |
| | |
tablespan
allows us to create this table with a single
formula.
In tablespan
, the table headers are defined with a
formula. For example, cyl ~ mean_hp + sd_hp
defines a table
with cyl
as the row names and mean_hp
and
sd_hp
as columns:
library(tablespan)
tablespan(data = summarized_table,
formula = cyl ~ mean_hp + sd_hp)
#>
#> | cyl | mean_hp sd_hp |
#> | --- - ------- ----- |
#> | 4 | 91 |
#> | 4 | 81.8 21.87 |
#> | 6 | 131.67 37.53 |
#> | ... | ... ... |
Note that the row names (cyl
) are in a separate block to
the left.
Spanners are defined using braces and spanner names. For example, the
following defines a spanner for mean_hp
and
sd_hp
with the name Horsepower
:
cyl ~ (Horsepower = mean_hp + sd_hp)
:
tablespan(data = summarized_table,
formula = cyl ~ (Horsepower = mean_hp + sd_hp))
#>
#> | | Horsepower |
#> | cyl | mean_hp sd_hp |
#> | --- - ---------- ----- |
#> | 4 | 91 |
#> | 4 | 81.8 21.87 |
#> | 6 | 131.67 37.53 |
#> | ... | ... ... |
Spanners can also be nested:
tablespan(data = summarized_table,
formula = cyl ~ (Horsepower = (Mean = mean_hp) + (SD = sd_hp)))
#>
#> | | Horsepower |
#> | | Mean SD |
#> | cyl | mean_hp sd_hp |
#> | --- - ---------- ----- |
#> | 4 | 91 |
#> | 4 | 81.8 21.87 |
#> | 6 | 131.67 37.53 |
#> | ... | ... ... |
Variable names in an R data.frame
are often very
technical (e.g., mean_hp
and sd_hp
). When
sharing the table, we may want to replace those names. In the example
above, we may want to replace mean_hp
and
sd_hp
with “Mean” and “SD”. In tablespan
renaming variables is achieved with new_name:old_name
. For
example, cyl ~ (Horsepower = Mean:mean_hp + SD:sd_hp)
renames mean_hp
to Mean
and sd_hp
to SD
:
tablespan(data = summarized_table,
formula = cyl ~ (Horsepower = Mean:mean_hp + SD:sd_hp))
#>
#> | | Horsepower |
#> | cyl | Mean SD |
#> | --- - ---------- ----- |
#> | 4 | 91 |
#> | 4 | 81.8 21.87 |
#> | 6 | 131.67 37.53 |
#> | ... | ... ... |
The combination of row names, spanners, and renaming of variables allows creating the full table:
library(dplyr)
library(tablespan)
data("mtcars")
<- mtcars |>
summarized_table group_by(cyl, vs) |>
summarise(N = n(),
mean_hp = mean(hp),
sd_hp = sd(hp),
mean_wt = mean(wt),
sd_wt = sd(wt))
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
#> argument.
<- tablespan(data = summarized_table,
tbl formula = Cylinder:cyl + Engine:vs ~
+
N `Horse Power` = Mean:mean_hp + SD:sd_hp) +
(`Weight` = Mean:mean_wt + SD:sd_wt),
(title = "Motor Trend Car Road Tests",
subtitle = "A table created with tablespan",
footnote = "Data from the infamous mtcars data set.")
tbl#> Motor Trend Car Road Tests
#> A table created with tablespan
#>
#> | | Horse Power Weight |
#> | Cylinder Engine | N Mean SD Mean SD |
#> | -------- ------ - -- ----------- ----- ------ ---- |
#> | 4 0 | 1 91 2.14 |
#> | 4 1 | 10 81.8 21.87 2.3 0.6 |
#> | 6 0 | 3 131.67 37.53 2.76 0.13 |
#> | ... ... | ... ... ... ... ... |
#> Data from the infamous mtcars data set.
Using 1
on the left hand side of the formula creates a
table without row names. For example,
1 ~ (Horsepower = Mean:mean_hp + SD:sd_hp)
defines:
tablespan(data = summarized_table,
formula = 1 ~ (Horsepower = Mean:mean_hp + SD:sd_hp))
#>
#> | Horsepower |
#> | Mean SD |
#> | ---------- ----- |
#> | 91 |
#> | 81.8 21.87 |
#> | 131.67 37.53 |
#> | ... ... |
Tables created with tablespan
can now be translated to
xlsx tables with openxlsx
using
the as_excel
function:
# as_excel creates an openxlsx workbook
<- as_excel(tbl = tbl)
wb
# Save the workbook as an xlsx file:
# openxlsx::saveWorkbook(wb,
# file = "cars.xlsx",
# overwrite = TRUE)
Tables created with tablespan
can also be exported to
gt
which allows saving as HTML, LaTeX, or RTF file. To this
end, we simply have to call as_gt
on our table:
# Translate to gt:
<- as_gt(tbl = tbl)
gt_tbl gt_tbl
tablespan
allows adding styles to tables that are
automatically exported to gt
and openxlsx
. The
workflow is heavily inspired by gt
.
All functions used to style tablespan
tables start with
style_
:
style_title
: Adapt the style of the titlestyle_subtitle
: Adapt the style of the subtitlestyle_header
: Adapt the style of the headerstyle_header_cells
: Stlye the header cells in openxlsx.
Used to create the borders around header cells (only relevant for
openxlsx exports)style_column
: Add styling to the body of the tablestyle_footnote
: Adapt the style of the footnotesstyle_vline
: Adapt the style of the vertical lines in
the table. Only relevant for openxlsxstyle_hline
: Adapt the style of the horizontal lines in
the table. Only relevant for openxlsxUse the style_title
and style_subtitle
options to style title and subtitle:
|>
tbl style_title(background_color = "#000000",
text_color = "#ffffff",
bold = TRUE,
italic = TRUE) |>
style_subtitle(italic = TRUE) |>
as_gt()
All of the styles applied in the following would also be exported to
.xlsx files when using as_excel
and saving the workbook
with openxlsx
.
To adapt the header, use style_header
:
|>
tbl style_header(background_color = "#000000",
text_color = "#ffffff",
bold = TRUE,
italic = TRUE) |>
as_gt()
The body is adapted with style_column
. This function
also allows styling only specific cells in the body or styling multiple
columns at once. Columns are selected with tidy expressions. Please note
that you will have to reference the columns with their respective names
in the raw data set, not the names shown in the spanner:
|>
tbl style_column(
columns = starts_with("mean_"),
rows = 2:3,
background_color = "#000000",
text_color = "#ffffff",
bold = TRUE,
italic = TRUE) |>
as_gt()
|>
tbl style_footnote(
background_color = "#000000",
text_color = "#ffffff",
bold = TRUE,
italic = TRUE) |>
as_gt()
The current interface only exports a small amount of the styles
available in gt
and openxlsx
. However, you can
also provide custom styles:
|>
tbl style_column(
columns = dplyr::where(is.double),
# custom style for the gt table export:
gt_style = gt::cell_text(decorate = "underline"),
# custom style for the excel table export:
openxlsx_style = openxlsx::createStyle(textDecoration = "underline")
|>
) as_gt()
In addition to applying specific styles to the table, you can also
adapt the number formatting. The current setup is slightly more
complicated because unifying all of the possible styles supported by
gt
and openxlsx
is challenging.
|>
tbl format_column(
columns = dplyr::where(is.double),
rows = 2:3,
# For great tables, we need a function that
# takes in the table, columns, and rows and then
# applies the style
format_gt = function(x, columns, rows, ...) {
::fmt_number(x, columns = columns, rows = rows, decimals = 4)
gt
},# For openxlsx, we have to provide the style that will be passed
# to numFmt in openxlsx::createStyle
format_openxlsx = "0.0000"
|>
) as_gt()
The gt
package provides a wide range of functions to
adapt the style of the table created with as_gt
. For
instance, opt_stylize
adds a pre-defined style to the
entire table:
|>
gt_tbl ::opt_stylize(style = 6,
gtcolor = 'gray')
When adapting the gt
object, there is an important
detail to keep in mind: To ensure that each table spanner has a unique
ID, tablespan
will create IDs that differ from the text
shown in the spanner. To demonstrate this, Let’s assume that we want to
add a spanner above Horse Power
and
Weight
:
|>
gt_tbl ::tab_spanner(label = "New Spanner",
gtspanners = c("Horse Power", "Weight"))
#> Error in `gt::tab_spanner()`:
#> ! One or more spanner ID(s) supplied in `spanners` (Horse Power and
#> Weight), for the new spanner with the ID `New Spanner` doesn't belong to any
#> existing spanners.
This will throw an error because the spanner IDs are different from
the spanner labels. To get the spanner IDs, use
gt::tab_info()
:
|>
gt_tbl ::tab_info() gt
The IDs for the spanners can be found at the very bottom. To add
another spanner above Horse Power
and Weight
,
we have to use these IDs:
|>
gt_tbl ::tab_spanner(label = "New Spanner",
gtspanners = c("__BASE_LEVEL__Horse Power",
"__BASE_LEVEL__Weight"))