For the purpose of this tutorial, we are going to use the example
module bio/seq. The module implements some very basic
mechanisms for dealing with DNA sequences (= character strings
consisting of the letters A, C, G
and T).
First, we load the module:
box::use(./bio/seq)
The function box::use accepts a list of
unquoted, qualified module names. Each of these module
names will load a single module and make it available to the caller in
some form. In the code above, we’ve loaded a single module,
bio/seq. bio serves as a parent
module that may group several submodules. Since the module name
inside box::use starts with ./, the module
location is resolved locally, i.e. relative to the path of the
currently running code.
In the above, seq is the module’s proper name.
bio/seq is its fully qualified name. And
./bio/seq is its use declaration.
To see the effect of this use declaration, let’s inspect
our workspace:
ls()
## [1] "seq"
seq
## <module: ./bio/seq>
We have used the module’s fully qualified name to load it. But, as
shown by ls, loading the module this way only introduces a
single new name into the current scope, the module itself, identified by
its proper (non-qualified) name.
To see which names a module exports, we use ls again,
this time on the module itself:
ls(seq)
## [1] "is_valid" "revcomp" "seq" "table"
It appears that seq exports 4 different names. To access
exported names, we use the $ operator:
seq$is_valid allows us to use the first function in the
list of exported names. We can also display the interactive help for
individual names using the box::help function, e.g.:
box::help(seq$revcomp)
Now let’s actually use the module. The seq
function inside the bio/seq module constructs a set of
(optionally named) biological sequences:
s = seq$seq(
gene1 = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC',
gene2 = 'CATAGCAACTGACATCACAGCG'
)
seq$is_valid(s)
## [1] TRUE
s
## 2 DNA sequences:
## >gene1
## GATTACAGATCAGCTCAGCACCTAGCA...
## >gene2
## CATAGCAACTGACATCACAGCG
Note how we automatically get pretty-printed (FASTA) output
because the print method (which gets called implicitly
here) is specialised for the 'bio/seq' S3 class in the
bio/seq module (prefixing S3 classes inside modules with
the full module name is a convention to avoid name clashes of S3
classes):
getS3method('print', 'bio/seq')
## function (x) {
## box::use(stringr[str_trunc])
##
## if (is.null(names(x))) names(x) = paste('seq', seq_along(x))
##
## cat(
## sprintf('%d DNA sequence%s:\n', length(x), if (length(x) == 1L) '' else 's'),
## sprintf(' >%s\n %s\n', names(x), str_trunc(x, 30L)),
## sep = ''
## )
## invisible(x)
## }
## <environment: 0x7fa5beb20630>
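The naming convention and the dispatch it relies on can be reproduced in plain R. The following is a minimal sketch, not the module's actual code: the class name 'demo/seq' and the constructor new_demo_seq are hypothetical, and base R's registerS3method stands in for whatever registration the module performs.

```r
# Hypothetical constructor: tags a character vector with a module-style
# S3 class name ('<module path>/<name>').
new_demo_seq = function (...) {
    structure(c(...), class = 'demo/seq')
}

# The class name contains a slash, so a method named `print.demo/seq` could
# only be assigned using backticks. Registering the method explicitly
# sidesteps that.
registerS3method('print', 'demo/seq', function (x, ...) {
    cat(sprintf('%d sequence(s)\n', length(x)))
    cat(sprintf('>%s\n%s\n', names(x), unclass(x)), sep = '')
    invisible(x)
})

s_demo = new_demo_seq(a = 'GATTACA', b = 'CAT')
print(s_demo) # dispatches to the method registered above
```

Because the class name carries the module path, a different module could define its own 'seq' class without the two print methods colliding.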
The source code for print.bio/seq contains an
interesting use declaration. It showcases an alternative
way of invoking box::use, which we’ll explore now.
Let’s have a look at alternative ways of using modules.
To start, let’s unload the bio/seq module …
box::unload(seq)
… and load it again, via a different route:
options(box.path = getwd())
box::use(bio/seq[revcomp, is_valid])
After unloading the already loaded module,
options(box.path = …) sets the module search path: this is
where box::use searches for modules. If more than one path
is given, box::use searches them all until a module of
matching name is found. This works analogously to how
.libPaths operates on R packages.
The box::use directive can now use bio/seq
instead of ./bio/seq as the module name: rather than a
relative name we specify a global name. In this example we set
the search path to the current working directory but in normal usage it
would be a global library location, e.g. (following the XDG
base directory specification) ~/.local/R/modules on
Linux.
Note that non-local module names must be
fully qualified, nested modules: box::use(foo/bar) works,
box::use(bar) does not (instead, it is assumed that
bar refers to a package)!
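To make the search-path lookup concrete, here is a small sketch that builds a throwaway module library in a temporary directory. It assumes the ‘box’ package is installed; the directory names lib_a and lib_b and the module demo/greet are invented for illustration.

```r
# Assumes the 'box' package is installed. All directory and module names
# below are hypothetical.
lib_a = file.path(tempdir(), 'search_demo', 'lib_a')
lib_b = file.path(tempdir(), 'search_demo', 'lib_b')
dir.create(lib_a, recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(lib_b, 'demo'), recursive = TRUE, showWarnings = FALSE)

# Only the second search path contains the module 'demo/greet'.
writeLines(c(
    "#' @export",
    "hello = function (who) paste0('Hello, ', who, '!')"
), file.path(lib_b, 'demo', 'greet.r'))

old_opts = options(box.path = c(lib_a, lib_b))

# A global (non-relative), fully qualified module name: the search paths are
# tried in order until a matching module is found.
box::use(demo/greet)
greeting = greet$hello('world')
print(greeting)

options(old_opts) # restore the previous search path
```

The module is found in lib_b even though lib_a is searched first, since lib_a contains no module of that name.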
In the declaration above we use [revcomp, is_valid] to specify
that the revcomp and is_valid names from the bio/seq module
should be attached in the calling environment. The […] part
is an attach specification: a comma-separated list of names
inside the brackets specifies which names to attach. The special
symbol ... specifies that all exported names
should be attached. This has an effect similar to conventional package
loading via library (or attaching an
environment): all the attached names are now available for direct use
without necessitating the seq$ qualifier:
is_valid(s)
## [1] TRUE
revcomp(s)
## 2 DNA sequences:
## >gene1
## GTTGCTGATAGTGCTAGGTGCTGAGCT...
## >gene2
## CGCTGTGATGTCAGTTGCTATG
However, unlike the attach function, module attachment
happens in the current, local scope only.
Since the above code was executed in the global environment, there’s no distinction between local and global scope:
search()
## [1] ".GlobalEnv" "mod:bio/seq" "mod:./source_file"
## [4] "devtools_shims" "package:testthat" "package:stats"
## [7] "package:graphics" "package:grDevices" "package:utils"
## [10] "package:datasets" "renv:shims" "package:methods"
## [13] "Autoloads" "package:base"
Note the second item, which reads “mod:bio/seq”. But
let’s now undo that, to attach (and use) the module locally instead:
detach()
seq_table = function (s) {
box::use(./bio/seq[...])
table(s)
}
seq_table(s)
## $gene1
## A C G T
## 13 12 6 7
##
## $gene2
## A C G T
## 8 7 4 3
Unlike above, we are now attaching all exported names
instead of specifying individual names. The subsequent line of code uses
the seq$table function rather than base::table
(which would have a different output). And note that the
seq module’s table function is not
attached outside the local scope:
search()
## [1] ".GlobalEnv" "mod:./source_file" "devtools_shims"
## [4] "package:testthat" "package:stats" "package:graphics"
## [7] "package:grDevices" "package:utils" "package:datasets"
## [10] "renv:shims" "package:methods" "Autoloads"
## [13] "package:base"
table(s)
## s
## CATAGCAACTGACATCACAGCG GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC
## 1 1
This is very powerful, as it isolates separate scopes more
effectively than the attach function. What is more, modules
which are used and attached inside another module remain inside
that module and are not visible outside the module by default.
Nevertheless, the normal, recommended usage of a module is without an attach specification, as this makes it clearer which names are being referred to.
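The local nature of attachment can be checked with a self-contained sketch. It assumes the ‘box’ package is installed; the module demo/voice and its functions shout and whisper are hypothetical and are written to a temporary directory first.

```r
# Assumes the 'box' package is installed. Module and function names are
# hypothetical.
mod_root = file.path(tempdir(), 'attach_demo')
dir.create(file.path(mod_root, 'demo'), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
    "#' @export",
    "shout = function (x) toupper(x)",
    "",
    "#' @export",
    "whisper = function (x) tolower(x)"
), file.path(mod_root, 'demo', 'voice.r'))

old_opts = options(box.path = mod_root)

loud = function (x) {
    # Attach a single name; it is visible only inside this function.
    box::use(demo/voice[shout])
    shout(x)
}

quiet = function (x) {
    # Attach all exported names, again only locally.
    box::use(demo/voice[...])
    whisper(x)
}

loud_result = loud('Cat')
quiet_result = quiet('Cat')

# Neither attached name is visible from the calling (global) scope.
leaked = exists('shout') || exists('whisper')
print(c(loud_result, quiet_result))
print(leaked)

options(old_opts)
```

Both functions use the attached names without a module qualifier, yet nothing is added to the search path of the surrounding code.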
The module bio/seq, which we have used in the previous
section, is implemented in the file bio/seq.r. The file seq.r
is, by and large, a normal R source file, which happens to live in a
directory named bio.
In fact, there are only three things worth mentioning:
Documentation. Functions in the module file can be documented
using ‘roxygen2’ syntax.
It works the same as for packages. The ‘box’ package parses the
documentation and makes it available via box::help.
Displaying module help requires that ‘roxygen2’ is
installed.
Export declarations. Similar to packages, modules explicitly need
to declare which names they export; they do this using the annotation
comment #' @export in front of the name. Again, this
works similarly to ‘roxygen2’ (but does not require having that
package installed).
The module exports S3
functions. The ‘box’ package takes care to register such functions
automatically, but this only works for user generics that are
defined inside the same module. When overriding “known generics” (such
as print), we need to register the methods manually via
box::register_S3_method (this is necessary since these functions
are inherently ambiguous and there is no automatic way of finding
them).
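The three points above can be seen together in a tiny module file. This is a sketch under stated assumptions, not the bio/seq source: it assumes the ‘box’ package is installed, and the module demo/temp, its class 'demo/temp' and the functions celsius, fmt and format_temp are all invented for illustration.

```r
# Assumes the 'box' package is installed. The module and every name in it
# are hypothetical.
mod_root = file.path(tempdir(), 'export_demo')
dir.create(file.path(mod_root, 'demo'), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
    "#' Wrap a number as a temperature",
    "#' @param x numeric vector of degrees Celsius",
    "#' @export",
    "celsius = function (x) structure(x, class = 'demo/temp')",
    "",
    "# No export annotation: this helper stays private to the module.",
    "fmt = function (x) paste(unclass(x), 'degC')",
    "",
    "# 'format' is a known generic from base R, so the method is registered",
    "# manually.",
    "format_temp = function (x, ...) fmt(x)",
    "box::register_S3_method('format', 'demo/temp', format_temp)"
), file.path(mod_root, 'demo', 'temp.r'))

old_opts = options(box.path = mod_root)
box::use(demo/temp)

exported = ls(temp)                     # only annotated names are exported
formatted = format(temp$celsius(21.5))  # dispatches to the registered method
print(exported)
print(formatted)

options(old_opts)
```

Only celsius appears in the module's exports, while format still finds the method because it was registered when the module was loaded.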
Modules can also form nested hierarchies. In fact, here is the
implementation of bio (in bio/__init__.r: since
bio is a directory rather than a file, the module
implementation resides in the nested file __init__.r):
#' @export
box::use(./seq)
The submodule is specified as ./seq rather than
seq: the explicitly provided relative path prevents lookup
in the import search path (that we set via
options(box.path = …)); instead, only the current directory
(that is, the directory containing the bio module) is
considered.
We can now use the bio module:
options(box.path = NULL) # Reset search path
box::use(./bio)
ls(bio)
## [1] "seq"
ls(bio$seq)
## [1] "is_valid" "revcomp" "seq" "table"
bio$seq$revcomp('CAT')
## 1 DNA sequence:
## >seq 1
## ATG
We could also have implemented bio as follows:
#' @export
box::use(./seq[...])
This would have made all of seq’s definitions
immediately available in bio, without having to always
write seq$…. This is sometimes useful, but should be
employed with care: being explicit about namespaces generally increases
code robustness and readability.
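A nested hierarchy of this kind can be built from scratch in a few lines. The sketch below assumes the ‘box’ package is installed; the modules geo/shapes and geo/shapes/circle are hypothetical and live in a temporary directory.

```r
# Assumes the 'box' package is installed. All module names are hypothetical.
mod_root = file.path(tempdir(), 'nested_demo')
shapes_dir = file.path(mod_root, 'geo', 'shapes')
dir.create(shapes_dir, recursive = TRUE, showWarnings = FALSE)

# Submodule: geo/shapes/circle.r
writeLines(c(
    "#' @export",
    "area = function (r) pi * r^2"
), file.path(shapes_dir, 'circle.r'))

# Parent module: 'shapes' is a directory, so its implementation lives in
# __init__.r. It re-exports the submodule under its own name.
writeLines(c(
    "#' @export",
    "box::use(./circle)"
), file.path(shapes_dir, '__init__.r'))

old_opts = options(box.path = mod_root)
box::use(geo/shapes)

parent_exports = ls(shapes)
unit_area = shapes$circle$area(1)
print(parent_exports)
print(unit_area)

options(old_opts)
```

Writing box::use(./circle[...]) in __init__.r instead would expose area directly as shapes$area, at the cost of a flatter and less explicit namespace.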
Modules define functions and values. To execute code when a module is
loaded, put it inside a function with the name .on_load.
This function is similar to the hook for the .onLoad
package namespace event.
This function is executed the first time the module is loaded in an R
session. Subsequent calls to box::use for that module,
regardless of whether they occur in a different scope, will refer to the
already loaded, cached module, and will not reload the
module.
We can illustrate this by loading a module which has side-effects,
info.
.on_load = function (ns) {
message(
'Loading module "', box::name(), '"\n',
'Module path: "', basename(box::file()), '"'
)
}
box::export() # Mark as a ‘box’ module.
Let’s use it:
box::use(./info)
## Loading module "info"
## Module path: "vignettes"
We have imported the module, and get the diagnostic messages. Let’s re-use the module:
box::use(./info)
… no messages are displayed. However, we can explicitly reload a module. This clears the cache, and loads the module again. This can be useful during development and debugging:
box::reload(info)
## Loading module "info"
## Module path: "vignettes"
And this displays the messages again. The reload
function is a shortcut for unload followed by
use (using the exact same arguments as used on the
original use call).
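The caching behaviour can also be observed without reading console messages, by letting the hook count its own invocations. This sketch assumes the ‘box’ package is installed; the module demo/hook and the option name demo.hook.loads are made up for the example.

```r
# Assumes the 'box' package is installed. The module name and the option
# 'demo.hook.loads' are hypothetical.
mod_root = file.path(tempdir(), 'hook_demo')
dir.create(file.path(mod_root, 'demo'), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
    ".on_load = function (ns) {",
    "    # Count how often the module has actually been loaded.",
    "    options(demo.hook.loads = getOption('demo.hook.loads', 0L) + 1L)",
    "}",
    "",
    "box::export() # Mark as a 'box' module without exports."
), file.path(mod_root, 'demo', 'hook.r'))

old_opts = options(box.path = mod_root, demo.hook.loads = 0L)

box::use(demo/hook)
after_first_use = getOption('demo.hook.loads')

box::use(demo/hook) # served from the cache: the hook does not run again
after_second_use = getOption('demo.hook.loads')

box::reload(hook)   # clears the cache and loads the module again
after_reload = getOption('demo.hook.loads')

print(c(after_first_use, after_second_use, after_reload))

options(old_opts)
```

The counter only advances on the first box::use call and on the explicit reload, which matches the message output shown above.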
This info module also showcases two important helper
functions:
box::name returns the name of the module with which
it was loaded. This is especially handy because, when called outside of
a module, box::name() returns NULL. This allows
testing whether a piece of code was loaded as a module, or invoked
directly (e.g. via Rscript on the command line).
box::file is similar to system.file: it
returns the full path to any file within the directory where a module is
stored. This is useful when distributing data files with modules, which
are loaded from within the module. When invoked without arguments,
box::file returns the full path to the directory containing
the module source file.
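Both helpers can be tried out side by side. The following sketch assumes the ‘box’ package is installed; the module demo/whoami and its functions module_name and module_dir are hypothetical.

```r
# Assumes the 'box' package is installed. Module and function names are
# hypothetical.
mod_root = file.path(tempdir(), 'name_demo')
dir.create(file.path(mod_root, 'demo'), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
    "#' @export",
    "module_name = function () box::name()",
    "",
    "#' @export",
    "module_dir = function () basename(box::file())"
), file.path(mod_root, 'demo', 'whoami.r'))

old_opts = options(box.path = mod_root)
box::use(demo/whoami)

outside = box::name()          # called outside of any module
inside = whoami$module_name()  # called from code inside the module
dir_name = whoami$module_dir() # last component of the module's directory

# Typical use: run a script's entry point only when not loaded as a module.
if (is.null(outside)) message('running as a script, not as a module')

print(inside)
print(dir_name)

options(old_opts)
```

Only the base name of the directory is compared here because the full path returned by box::file depends on where the temporary directory happens to live.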