matsindf_apply() is a versatile function
that enables analysis of data frames by applying a function (FUN) in
several helpful ways. The function is called matsindf_apply()
because it can apply FUN to a
matsindf data frame, a data frame that contains matrices as
individual entries. (A matsindf data frame
can be created by calling collapse_to_matrices(), as
demonstrated below.)
But matsindf_apply() can apply FUN across
much more: data frames of single numbers, lists of matrices, lists of
single numbers, and individual numbers. This vignette demonstrates
matsindf_apply(), starting with simple examples and
proceeding to sophisticated analyses.
The basis of all analyses conducted with
matsindf_apply() is a function (FUN) to be
applied across data. FUN must return a named list of
variables as its result. Here is an example function that both adds and
subtracts its arguments, a and b, and returns a
list containing its results, c and d.
library(matsbyname)  # sum_byname(), difference_byname()
library(matsindf)    # matsindf_apply()
example_fun <- function(a, b){
  return(list(c = sum_byname(a, b), d = difference_byname(a, b)))
}
Similar to lapply() and its siblings, additional
argument(s) to matsindf_apply() include the data over which
FUN is to be applied. These arguments can, in the first
instance, be supplied as named arguments to the ...
argument of matsindf_apply(). The ...
arguments to matsindf_apply() are passed to
FUN according to their names. In this case, the output of
matsindf_apply() is the named list returned by
FUN.
matsindf_apply(FUN = example_fun, a = 2, b = 1)
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
Passing an additional argument (z = 2) causes the
familiar unused argument error, because
example_fun does not have a z argument.
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
  error = function(e){e}
)
#> <simpleError in FUN(...): unused argument (z = 2)>
Failing to pass a needed argument (b = 1) causes the
familiar argument X is missing error, because
example_fun requires a value for b.
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2),
  error = function(e){e}
)
#> <simpleError in sum_byname(a, b): argument "b" is missing, with no default>
(If example_fun tolerated a missing argument, no such
error would be created.)
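As a rough illustration of what "tolerating a missing argument" can mean, the sketch below gives b a default value. The name tolerant_fun is hypothetical, and base-R arithmetic stands in for sum_byname() and difference_byname() so that the snippet runs without any packages. It calls the function directly; how matsindf_apply() treats default arguments is not shown here.

```r
# Hypothetical variant of example_fun: b has a default value, so it may be omitted.
# Base-R + and - are used as stand-ins for sum_byname() and difference_byname().
tolerant_fun <- function(a, b = 0) {
  list(c = a + b, d = a - b)
}

# Called directly (outside matsindf_apply()), omitting b raises no error.
res <- tolerant_fun(a = 2)
```

An alternative with the same effect would be to test `missing(b)` inside the function body and substitute a value there.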
Alternatively, arguments to FUN can be given in a named
list to the first argument to matsindf_apply()
(.dat). When a value is assigned to .dat, the
return value from matsindf_apply() contains all named
variables in .dat (in this case both a and
b) in addition to the results provided by FUN
(in this case both c and d).
matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
Extra variables are tolerated in .dat, because
.dat is considered to be a store of data from which
variables can be drawn as needed.
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $z
#> [1] 42
#>
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
In contrast, named arguments to ... are specified by the
user, so including an extra variable is considered an error, as shown
above.
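The difference between a data store and explicit arguments can be sketched in base R. This is only an analogy under simplifying assumptions, not a description of how matsindf_apply() is implemented: the stand-in example_fun uses plain arithmetic, and the variable names (store, needed, from_store, explicit) are made up for the illustration.

```r
# Base-R stand-in for example_fun (plain arithmetic instead of *_byname functions).
example_fun <- function(a, b) list(c = a + b, d = a - b)
store <- list(a = 2, b = 1, z = 42)

# Store semantics: draw only the items named in the signature of the function.
needed <- store[names(formals(example_fun))]
from_store <- do.call(example_fun, needed)

# Explicit semantics: every item is passed, so the extra z triggers an error.
explicit <- tryCatch(
  do.call(example_fun, store),
  error = function(e) conditionMessage(e)
)
```

Here from_store holds the sum and difference of a and b, while explicit captures the unused argument message.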
If a named argument is supplied by both .dat and
..., the argument in ... takes precedence,
overriding the argument in .dat.
matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
#> $a
#> [1] 10
#>
#> $b
#> [1] 1
#>
#> $c
#> [1] 11
#>
#> $d
#> [1] 9
When both .dat and ... are supplied,
... can contain named strings that are
interpreted as mappings from item names in .dat to
arguments in the signature of FUN. In the example below,
a = "z" indicates that argument a to
FUN should be supplied by item z in
.dat.
matsindf_apply(list(a = 2, b = 1, z = 42),
               FUN = example_fun, a = "z")
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $z
#> [1] 42
#>
#> $c
#> [1] 43
#>
#> $d
#> [1] 41
If a name appears in both .dat and the output
of FUN, a name collision occurs in the output of
matsindf_apply(), and a warning is issued.
tryCatch(
  matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
  warning = function(w){w}
)
#> <simpleWarning in matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun): name collision in matsindf_apply: c>
If FUN accepts mixed argument types (for example, a string and a number),
the arguments to matsindf_apply() can be wrapped in list() so that
the call succeeds.
example_fun_with_string <- function(str_a, b) {
  a <- as.numeric(str_a)
  list(added = matsbyname::sum_byname(a, b),
       subtracted = matsbyname::difference_byname(a, b))
}
# Fails, because of mixed argument types.
tryCatch(
  matsindf_apply(FUN = example_fun_with_string, str_a = "1", b = 2),
  error = function(e) {
    print(e)
  }
)
#> <simpleError in FUN(...): argument "str_a" is missing, with no default>
# All of the following work,
# because arguments are wrapped in list() or
# supplied in .dat.
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = list(2))
#> added subtracted
#> 1 3 -1
matsindf_apply(FUN = example_fun_with_string,
               str_a = list("1", "3"),
               b = list(2, 4))
#> added subtracted
#> 1 3 -1
#> 2 7 -1
matsindf_apply(.dat = list(str_a = list("1"), b = list(2)), FUN = example_fun_with_string)
#> $str_a
#> $str_a[[1]]
#> [1] "1"
#>
#>
#> $b
#> $b[[1]]
#> [1] 2
#>
#>
#> $added
#> $added[[1]]
#> [1] 3
#>
#>
#> $subtracted
#> $subtracted[[1]]
#> [1] -1
matsindf_apply(.dat = list(m = list("1"), n = list(2)), FUN = example_fun_with_string,
               str_a = "m", b = "n")
#> $m
#> $m[[1]]
#> [1] "1"
#>
#>
#> $n
#> $n[[1]]
#> [1] 2
#>
#>
#> $added
#> $added[[1]]
#> [1] 3
#>
#>
#> $subtracted
#> $subtracted[[1]]
#> [1] -1
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)),
               FUN = example_fun_with_string)
#> str_a b added subtracted
#> 1 1 2 3 -1
#> 2 3 4 7 -1
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)),
               FUN = example_fun_with_string,
               str_a = "str_a", b = "b")
#> str_a b added subtracted
#> 1 1 2 3 -1
#> 2 3 4 7 -1
matsindf_apply(.dat = data.frame(m = c("1", "3"), n = c(2, 4)),
               FUN = example_fun_with_string,
               str_a = "m", b = "n")
#> m n added subtracted
#> 1 1 2 3 -1
#> 2 3 4 7 -1
.dat can be a list (as shown in several examples above),
but it can also be a data frame.
df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
#> a b c d
#> 1 2 1 3 1
#> 2 3 2 5 1
#> 3 4 3 7 1
Furthermore, matsindf_apply() works with a
matsindf data frame, a data frame in which individual entries
are matrices. To demonstrate use of
matsindf_apply() with a matsindf data frame, we’ll construct a
simple matsindf data frame (midf) using
functions in this package.
library(dplyr)  # %>%, mutate(), case_when(), group_by(), rename()
library(tidyr)  # spread()
# Create a tidy data frame containing data for matrices
tidy <- data.frame(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                   matnames = c(rep("U", 8), rep("V", 8)),
                   matvals = c(1:4, 11:14, 21:24, 31:34),
                   rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2),
                                rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                   colnames = c(rep(c("i1", "i2"), 4),
                                rep(c("p1", "p2"), 4))) %>%
  mutate(
    rowtypes = case_when(
      matnames == "U" ~ "product",
      matnames == "V" ~ "industry",
      TRUE ~ NA_character_
    ),
    coltypes = case_when(
      matnames == "U" ~ "industry",
      matnames == "V" ~ "product",
      TRUE ~ NA_character_
    )
  )
tidy
#> Year matnames matvals rownames colnames rowtypes coltypes
#> 1 2017 U 1 p1 i1 product industry
#> 2 2017 U 2 p1 i2 product industry
#> 3 2017 U 3 p2 i1 product industry
#> 4 2017 U 4 p2 i2 product industry
#> 5 2018 U 11 p1 i1 product industry
#> 6 2018 U 12 p1 i2 product industry
#> 7 2018 U 13 p2 i1 product industry
#> 8 2018 U 14 p2 i2 product industry
#> 9 2017 V 21 i1 p1 industry product
#> 10 2017 V 22 i1 p2 industry product
#> 11 2017 V 23 i2 p1 industry product
#> 12 2017 V 24 i2 p2 industry product
#> 13 2018 V 31 i1 p1 industry product
#> 14 2018 V 32 i1 p2 industry product
#> 15 2018 V 33 i2 p1 industry product
#> 16 2018 V 34 i2 p2 industry product
# Convert to a matsindf data frame
midf <- tidy %>%
  group_by(Year, matnames) %>%
  collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") %>%
  spread(key = "matnames", value = "matvals")
# Take a look at the midf data frame and some of the matrices it contains.
midf
#> Year U V
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34
midf$U[[1]]
#> i1 i2
#> p1 1 2
#> p2 3 4
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
midf$V[[1]]
#> p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "industry"
#> attr(,"coltype")
#> [1] "product"
With midf in hand, we can demonstrate use of tidyverse-style
functional programming to perform matrix algebra within a data frame.
The functions of the matsbyname package (such as
difference_byname() below) can be used for this
purpose.
result <- midf %>%
  mutate(
    W = difference_byname(transpose_byname(V), U)
  )
result
#> Year U V W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
result$W[[1]]
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
result$W[[2]]
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
This way of performing matrix calculations works equally well within
a 2-row matsindf data frame (as shown above) or a 1000-row
matsindf data frame.
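To make the scaling claim concrete, here is a minimal sketch that builds a 1000-row data frame of matrices and applies the same mutate() call. The generator code and the names n, U_list, V_list, big, and big_result are hypothetical. The matrices carry row and column names but no row or column types, on the assumption that the matsbyname functions accept untyped matrices.

```r
library(dplyr)
library(matsbyname)

n <- 1000
# Hypothetical data: 2x2 matrices with the same row and column names as U and V in midf.
U_list <- lapply(seq_len(n), function(i) {
  matrix(i + 0:3, nrow = 2, byrow = TRUE,
         dimnames = list(c("p1", "p2"), c("i1", "i2")))
})
V_list <- lapply(seq_len(n), function(i) {
  matrix(i + 20:23, nrow = 2, byrow = TRUE,
         dimnames = list(c("i1", "i2"), c("p1", "p2")))
})
big <- tibble(id = seq_len(n), U = U_list, V = V_list)

# The same mutate() call as for midf, now evaluated over 1000 rows.
big_result <- big %>%
  mutate(W = difference_byname(transpose_byname(V), U))
```

Because every V matrix is offset from its U matrix by the same amounts as in midf, each element of big_result$W should match the W matrices shown for midf.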
Users can write their own functions using
matsindf_apply(). A flexible calc_W function
can be written as follows.
calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W"){
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat){
    # When we get here, U_mat and V_mat will be single matrices or single numbers,
    # not a column in a data frame or an item in a list.
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- difference_byname(transpose_byname(V_mat), U_mat)
    # Return a named list.
    list(W_mat) %>% magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}
This style of writing matsindf_apply() functions is
incredibly versatile, leveraging the capabilities of both the
matsindf and matsbyname packages. (Indeed, the
Recca package uses matsindf_apply() heavily
and is built upon the functions in the matsindf and
matsbyname packages.)
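The same pattern carries over to other calculations. As a sketch only, the hypothetical function calc_total below follows the structure of calc_W but sums its inputs with sum_byname(). The names calc_total, total_func, U_val, V_val, and total are invented for this illustration and are not part of any package.

```r
library(matsbyname)
library(matsindf)

# Hypothetical wrapper with the same structure as calc_W.
calc_total <- function(.DF = NULL, U = "U", V = "V", total = "total") {
  # Inner function: receives single matrices or single numbers.
  total_func <- function(U_val, V_val) {
    # Return a named list; the name comes from the total argument.
    stats::setNames(list(sum_byname(U_val, V_val)), total)
  }
  # Outer body: delegate the iteration over rows to matsindf_apply().
  matsindf_apply(.DF, FUN = total_func, U_val = U, V_val = V)
}

# Usage on a data frame of single numbers.
res <- calc_total(data.frame(U = c(1, 2), V = c(3, 4)))
```

The string defaults for U and V act as column-name mappings, so the wrapper needs no further arguments when the incoming columns already have those names.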
Functions written like calc_W can operate in ways
similar to matsindf_apply() itself. To demonstrate, we’ll
use calc_W in all the ways that
matsindf_apply() can be used, going in the reverse order to
our demonstration of the capabilities of matsindf_apply()
above.
calc_W can be used as a specialized mutate
function that operates on matsindf data frames.
midf %>% calc_W()
#> Year U V W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
The added column can be given a different name from the default
(“W”) using the W argument.
midf %>% calc_W(W = "W_prime")
#> Year U V W_prime
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
As with matsindf_apply(), column names in
midf can be mapped to the arguments of calc_W
by passing the column names as strings.
midf %>%
  rename(X = U, Y = V) %>%
  calc_W(U = "X", V = "Y")
#> Year X Y W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
calc_W can operate on lists of single matrices, too.
This approach works because the default values for the U
and V arguments to calc_W are “U” and “V”,
respectively. The input list members (in this case
midf$U[[1]] and midf$V[[1]]) are returned with
the output.
calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
#> $U
#> i1 i2
#> p1 1 2
#> p2 3 4
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
#>
#> $V
#> p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "industry"
#> attr(,"coltype")
#> [1] "product"
#>
#> $W
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
It may be clearer to name the arguments as required by the
calc_W function without wrapping in a list first, as shown
below. But in this approach, the input matrices are not returned with
the output.
calc_W(U = midf$U[[1]], V = midf$V[[1]])
#> $W
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
calc_W can operate on data frames containing single
numbers.
data.frame(U = c(1, 2), V = c(3, 4)) %>% calc_W()
#> U V W
#> 1 1 3 2
#> 2 2 4 2
Finally, calc_W can be applied to single numbers, and
the result is a single number.
calc_W(U = 2, V = 3)
#> $W
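#> [1] 1
This vignette demonstrated use of the versatile
matsindf_apply() function. Inputs to
matsindf_apply() can be single numbers, lists of single numbers or
matrices, data frames of single numbers, or matsindf data frames.
matsindf_apply() can be used for programming, and
functions constructed as demonstrated above share characteristics with
matsindf_apply(): they can serve as specialized dplyr::mutate()
operators, and they accept the same kinds of inputs.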