| Type: | Package |
| Title: | A Companion to the Multi-CAST Collection |
| Version: | 2.0.0 |
| URL: | https://multicast.aspra.uni-bamberg.de/ |
| Description: | Provides a basic interface for accessing annotation data from the Multi-CAST collection, a database of spoken natural language texts edited by Geoffrey Haig and Stefan Schnell. The collection draws from a diverse set of languages and has been annotated across multiple levels. Annotation data is downloaded on request from the servers of the University of Bamberg. See the Multi-CAST website https://multicast.aspra.uni-bamberg.de/ for more information and a list of related publications. |
| License: | CC BY 4.0 |
| Encoding: | UTF-8 |
| LazyData: | true |
| Depends: | R (≥ 3.0.0), |
| RoxygenNote: | 7.1.1 |
| NeedsCompilation: | no |
| Packaged: | 2021-02-22 00:06:07 UTC; Nils |
| Author: | Nils Norman Schiborr [aut, cre] |
| Maintainer: | Nils Norman Schiborr <nils-norman.schiborr@uni-bamberg.de> |
| Repository: | CRAN |
| Date/Publication: | 2021-02-22 19:20:02 UTC |
Count clauses in a multicastR table
Description
mc_clauses counts the number of clause units (bounded by the
<##> or <#> GRAID annotation symbols) in a multicastR table.
Usage
mc_clauses(x, bytext = FALSE, printToConsole = FALSE)
Arguments
x |
A |
bytext |
Logical. If |
printToConsole |
Logical. If |
Value
A data.frame with the corpus, text (if
bytext is TRUE), the number of valid clause units in each
corpus (nClause), the total number of clause units (nAll),
the number of clause units not analyzed (nNC), and the percentage
of the total that the latter make up (pNC).
See Also
multicast, mc_index,
mc_metadata, mc_referents
Examples
## Not run:
# count clause units in the most recent version
# of the Multi-CAST data, by corpus
n <- mc_clauses(multicast())
# count by text instead
m <- mc_clauses(multicast(), bytext = TRUE)
# number of clause units in the whole collection
sum(n$nClauses)
## End(Not run)
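To make the returned columns concrete, here is a minimal sketch of a per-corpus count on a hand-made toy table. It is not the package's implementation. It assumes that clause boundaries show up as graid values beginning with "#", and it uses a hypothetical "#nc" marker for units that were not analyzed; the corpus names and annotation values are invented, and nValid stands in for the count of valid clause units.

```r
# Toy table; corpus names and annotation values are invented.
toy <- data.frame(
  corpus = c("A", "A", "A", "A", "B", "B", "B"),
  graid  = c("##", "np", "#nc", "#", "##", "pro", "#"),
  stringsAsFactors = FALSE
)

# Assumption: a clause unit starts at every value beginning with "#".
is_boundary <- grepl("^#", toy$graid)
# Assumption: "#nc" (hypothetical marker) flags a unit that was not analyzed.
is_nc <- toy$graid == "#nc"

counts <- data.frame(
  corpus = sort(unique(toy$corpus)),
  nAll   = as.vector(tapply(is_boundary, toy$corpus, sum)),
  nNC    = as.vector(tapply(is_nc, toy$corpus, sum)),
  stringsAsFactors = FALSE
)
counts$nValid <- counts$nAll - counts$nNC        # analyzed clause units
counts$pNC    <- 100 * counts$nNC / counts$nAll  # share not analyzed, in percent
print(counts)
```

The percentage is computed against the total number of clause units, which matches the description of pNC above.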
Access the Multi-CAST version index
Description
mc_index downloads a tabular index of the versions of the Multi-CAST
corpus data from the servers of the University of Bamberg. The value in the
leftmost version column may be passed to the multicast
function to access earlier versions of the annotations.
Usage
mc_index()
Value
A data.frame with five columns:
[, 1] | version | Version key. Used for the vkey argument of other functions in this package. |
[, 2] | date | Publication date in YYYY-MM-DD format. |
[, 3] | corpora | Number of corpora (languages). |
[, 4] | texts | Number of texts. |
[, 5] | size | Total file size in kilobytes. |
See Also
multicast, mc_metadata,
mc_referents, mc_clauses
Examples
## Not run:
# retrieve version index
mc_index()
## End(Not run)
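The version key is what ties an analysis to a fixed state of the data. The following sketch shows one way to pick a key from a table shaped like the index described above. The table is a hand-made stand-in with invented rows, so the code runs without a network connection; the cutoff date is likewise only an example.

```r
# Toy stand-in for the table returned by mc_index(); all values are invented.
index <- data.frame(
  version = c(2101, 1905, 1606),
  date    = c("2021-01-15", "2019-05-20", "2016-06-10"),
  corpora = c(15, 12, 8),
  texts   = c(60, 45, 30),
  size    = c(9000, 7000, 4000),
  stringsAsFactors = FALSE
)

# Pick the most recent version published on or before a cutoff date.
cutoff   <- as.Date("2020-01-01")
eligible <- index[as.Date(index$date) <= cutoff, ]
vkey     <- eligible$version[which.max(as.Date(eligible$date))]
print(vkey)
```

A key selected this way can then be passed as the vkey argument of multicast, mc_metadata, or mc_referents.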
Access the Multi-CAST metadata
Description
mc_metadata downloads a table with metadata on the texts and speakers
in the Multi-CAST collection from the servers of the University of Bamberg.
Usage
mc_metadata(vkey = NULL)
Arguments
vkey |
A four-digit number specifying the requested version of the
metadata. Must be one of the version keys listed in the first column of
|
Value
A data.frame containing metadata on the Multi-CAST
collection. The table has the following eight columns:
[, 1] | corpus | The name of the corpus. |
[, 2] | text | The title of the text. |
[, 3] | type | The text type, either TN 'traditional narrative', AN 'autobiographical narrative', or SN 'stimulus-based narrative'. |
[, 4] | recorded | The year (YYYY) the text was recorded. |
[, 5] | speaker | The identifier for the speaker. |
[, 6] | gender | The speaker's gender. |
[, 7] | age | The speaker's age at the time of recording. Approximate values are prefixed with a c. |
[, 8] | born | The speaker's birth year (YYYY). Approximate values are prefixed with a c. |
See Also
multicast, mc_index,
mc_referents, mc_clauses
Examples
## Not run:
# retrieve the most recent version of the Multi-CAST metadata
mc_metadata()
# retrieve the version of the metadata published in January 2021
mc_metadata(2101)
# join the metadata to a table with annotation values
mc <- multicast()
merge(mc, mc_metadata(),
      by = c("corpus", "text"))
## End(Not run)
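Because approximate ages and birth years carry a c prefix, these columns cannot be converted to numbers directly. The sketch below shows one way to separate the flag from the value. The input values are invented, and the exact spelling of the prefix (with or without a period or space) is an assumption, so the pattern strips a leading "c" together with any non-digit characters that follow it.

```r
# Invented age values; the exact spelling of the "c" prefix is an assumption.
age <- c("34", "c50", "c. 62", NA)

# Flag approximate values, then strip the prefix and convert to integers.
approx  <- !is.na(age) & grepl("^c", age)
age_num <- as.integer(sub("^c[^0-9]*", "", age))

print(data.frame(age, approx, age_num))
```

Keeping the flag in its own column lets later analyses include or exclude approximate values explicitly.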
Access the Multi-CAST list of referents
Description
mc_referents downloads a tabular list of all discourse referents
occurring in those texts in the Multi-CAST collection that have been annotated
with the RefIND scheme (Schiborr et al. 2018). The data are downloaded from
the servers of the University of Bamberg.
Usage
mc_referents(vkey = NULL)
Arguments
vkey |
A four-digit number specifying the requested version of the list
of referents. Must be one of the version keys listed in the first column of
|
Value
A data.frame containing a list of referents for all
texts with RefIND annotations in the Multi-CAST collection. The table has
the following eight columns:
[, 1] | corpus | The name of the corpus. |
[, 2] | text | The title of the text. |
[, 3] | refind | The four-digit referent index, unique to each referent in a text. |
[, 4] | label | The label used for the referent. |
[, 5] | description | A short description of the referent. |
[, 6] | class | The semantic class of the referent. Legend: hum = human, anm = animate, inm = inanimate, bdp = body part, mss = mass, loc = location, tme = time, abs = abstract. |
[, 7] | relations | Relations of the referent to other referents. Legend: < = set member of (partial co-reference), > = includes (split antecedence), M = part-whole. |
[, 8] | notes | Annotators' notes on the referent and its properties. |
See Also
multicast, mc_index,
mc_metadata, mc_clauses
Examples
## Not run:
# retrieve the most recent version of the Multi-CAST list of referents
mc_referents()
# retrieve the list of referents published in January 2021
mc_referents(2101)
# join the list of referents to a table with annotation values
mc <- multicast()
merge(mc, mc_referents(),
      by = c("corpus", "text", "refind"),
      all.x = TRUE)
## End(Not run)
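The all.x = TRUE argument in the join matters because not every annotation row refers to a discourse referent. The following sketch illustrates the difference on two hand-made tables. All values are invented, and representing a missing referent index as an empty string is an assumption about the data.

```r
# Toy annotation rows; the verb has no referent index (assumed to be "").
mc <- data.frame(
  corpus = "A", text = "t1",
  gword  = c("she", "saw", "him"),
  refind = c("0001", "", "0002"),
  stringsAsFactors = FALSE
)
# Toy list of referents; labels and classes are invented.
refs <- data.frame(
  corpus = "A", text = "t1",
  refind = c("0001", "0002"),
  label  = c("woman", "man"),
  class  = c("hum", "hum"),
  stringsAsFactors = FALSE
)

# Left join: every annotation row is kept, unmatched rows get NA.
joined  <- merge(mc, refs, by = c("corpus", "text", "refind"), all.x = TRUE)
# Inner join (the default): rows without a matching referent are lost.
dropped <- merge(mc, refs, by = c("corpus", "text", "refind"))

print(nrow(joined))
print(nrow(dropped))
```

Note that merge sorts its result by the key columns, so the original token order has to be restored separately if it matters.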
Access Multi-CAST annotation data
Description
multicast downloads corpus data from the Multi-CAST collection (Haig &
Schnell 2015) from the servers of the University of Bamberg. As the
Multi-CAST collection is continuously evolving through the addition of
further data sets and the revision of older annotations, the multicast
function takes an optional argument vkey to select earlier versions of
the annotation data, ensuring scientific accountability and the
reproducibility of results.
Usage
multicast(vkey = NULL)
Arguments
vkey |
A four-digit number specifying the requested version of the
annotation data. Must be one of the version keys listed in the first column of
|
Value
A data.frame with eleven columns:
[, 1] | corpus | The name of the corpus. |
[, 2] | text | The name of the text. |
[, 3] | uid | The utterance identifier. Uniquely identifies an utterance within a text. |
[, 4] | gword | Grammatical words. The tokenized utterances in the object language. |
[, 5] | gloss | Morphological glosses following the Leipzig Glossing Rules. |
[, 6] | graid | Annotations with the GRAID scheme (Haig & Schnell 2014). |
[, 7] | gform | The form symbol of a GRAID gloss. |
[, 8] | ganim | The person-animacy symbol of a GRAID gloss. |
[, 9] | gfunc | The function symbol of a GRAID gloss. |
[, 10] | refind | Referent tracking using the RefIND scheme (Schiborr et al. 2018). |
[, 11] | isnref | Annotations of the information status of newly introduced referents. |
Licensing
The Multi-CAST annotation data accessed by this package are published under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence (https://creativecommons.org/licenses/by-sa/4.0/). Please refer to the Multi-CAST website for information on how to give proper credit to its contributors.
Citing Multi-CAST
Data from the Multi-CAST collection should be cited as:
Haig, Geoffrey & Schnell, Stefan (eds.). 2015. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts. (https://multicast.aspra.uni-bamberg.de/) (Accessed date.)
If for some reason you need to cite this package specifically, please refer to
citation("multicastR").
References
Haig, Geoffrey & Schnell, Stefan. 2014. Annotations using GRAID (Grammatical Relations and Animacy in Discourse): Introduction and guidelines for annotators. Version 7.0. (https://multicast.aspra.uni-bamberg.de/#annotations)
Schiborr, Nils N. & Schnell, Stefan & Thiele, Hanna. 2018. RefIND – Referent Indexing in Natural-language Discourse: Annotation guidelines. Version 1.1. (https://multicast.aspra.uni-bamberg.de/#annotations)
See Also
mc_index, mc_metadata,
mc_referents, mc_clauses
Examples
## Not run:
# retrieve and print the most recent version of the
# Multi-CAST annotations
multicast()
# retrieve the version of the annotation data published
# in January 2021
multicast(2101)
## End(Not run)
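As an illustration of what the separate gform and gfunc columns make possible, the sketch below cross-tabulates referential form against syntactic function on a small hand-made table. The rows are invented, and the symbols used ("np", "pro", "0" for form; "s", "a", "p" for function) are illustrative assumptions about typical GRAID values rather than an exhaustive inventory.

```r
# Toy rows mimicking a few columns of the multicast() table; values are illustrative.
mc <- data.frame(
  corpus = c("A", "A", "A", "B", "B", "B"),
  gform  = c("np", "pro", "0", "np", "np", "pro"),
  gfunc  = c("s", "a", "s", "p", "s", "a"),
  stringsAsFactors = FALSE
)

# Cross-tabulate referential form against syntactic function.
tab <- table(form = mc$gform, func = mc$gfunc)
print(tab)

# Share of "s" arguments that are realized as full noun phrases ("np").
subj <- mc[mc$gfunc == "s", ]
p_np <- mean(subj$gform == "np")
print(p_np)
```

The same calls work unchanged on a table returned by multicast, since only the column names are relied upon.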
multicastR: A companion to the Multi-CAST collection.
Description
The multicastR package provides a basic interface for accessing the
annotated corpus data in the Multi-CAST collection (edited by Geoffrey Haig
and Stefan Schnell), a database of spoken natural language texts that draws
from a diverse set of languages.
The corpus data are downloaded on command from the servers of the University
of Bamberg via the multicast function. Details on the
Multi-CAST project and a list of publications can be found online at
https://multicast.aspra.uni-bamberg.de/.
Licensing
The Multi-CAST annotation data accessed by this package are published under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence (https://creativecommons.org/licenses/by-sa/4.0/). Please refer to the Multi-CAST website for information on how to give proper credit to its contributors.
Citing Multi-CAST
Data from the Multi-CAST collection should be cited as:
Haig, Geoffrey & Schnell, Stefan (eds.). 2015. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts. (https://multicast.aspra.uni-bamberg.de/) (Accessed date.)
If for some reason you need to cite this package
specifically, please refer to citation("multicastR").