| Title: | A Naive IPA Tokeniser | 
| Version: | 0.1.0 | 
| Date: | 2025-02-23 | 
| Description: | Provides functions to parse International Phonetic Alphabet (IPA) transcriptions into individual phones (tokenisation) based on default IPA symbols and optional user-specified multi-character phones. The tokenised transcriptions can be used to obtain counts of phones or to search for words matching phonetic patterns. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.3.2 | 
| Imports: | cli, dplyr, lifecycle, magrittr, stringi, stringr, tibble, Unicode | 
| Depends: | R (≥ 2.10) | 
| Suggests: | rmarkdown, knitr, tidyverse | 
| VignetteBuilder: | knitr | 
| URL: | https://github.com/stefanocoretta/phonetisr, https://stefanocoretta.github.io/phonetisr/ | 
| NeedsCompilation: | no | 
| Packaged: | 2025-02-25 13:39:25 UTC; ste | 
| Author: | Stefano Coretta  | 
| Maintainer: | Stefano Coretta <stefano.coretta@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-02-26 13:10:02 UTC | 
phonetisr: A Naive IPA Tokeniser
Description
Provides functions to parse International Phonetic Alphabet (IPA) transcriptions into individual phones (tokenisation) based on default IPA symbols and optional user-specified multi-character phones. The tokenised transcriptions can be used to obtain counts of phones or to search for words matching phonetic patterns.
Author(s)
Maintainer: Stefano Coretta <stefano.coretta@gmail.com>
See Also
Useful links:
- https://github.com/stefanocoretta/phonetisr
- https://stefanocoretta.github.io/phonetisr/
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs | 
 A value or the magrittr placeholder.  | 
rhs | 
 A function call using the magrittr semantics.  | 
Value
The result of calling rhs(lhs).
Add features to list of phones
Description
This function counts occurrences of phones and includes basic phonetic features.
Usage
featurise(phlist)
Arguments
phlist | 
 A list of phones or the output of phonetise().  | 
Value
A tibble.
Examples
ipa <- c("ada", "buba", "kiki", "sa\u0283a")
ip_ph <- phonetise(ipa)
featurise(ip_ph)
Get non-IPA characters
Description
Given a vector of characters, it returns those which are not part of the IPA.
Usage
get_no_ipa(chars)
Arguments
chars | 
 A vector of characters.  | 
Value
A vector.
Examples
get_no_ipa(c("a", "\u0283", ">"))
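A possible use of get_no_ipa() is to screen a set of transcriptions for stray characters before tokenising them. The sketch below assumes the package is installed; the word list is made up for illustration, and the transcriptions are split into single characters with base R.

```r
library(phonetisr)

# Made-up transcriptions; the last one contains a stray ">".
words <- c("ada", "sa\u0283a", "ki>ki")

# Split every word into single characters and keep the unique ones.
chars <- unique(unlist(strsplit(words, "")))

# Characters that are not part of the IPA.
non_ipa <- get_no_ipa(chars)
non_ipa
```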
List of IPA symbols
Description
List of IPA symbols
Usage
ipa_symbols
Format
A data frame with 143 rows and 12 variables:
- IPA
 IPA symbol.
- unicode
 Unicode code.
- uni_name
 Unicode name.
- ipa_name
 IPA name.
- phon_type
 The phonetic type of the symbol.
- type
 General character type (consonant, vowel, diacritic).
- height_ipa
 Vowel openness.
- height
 Vowel height.
- backness
 Vowel backness.
- rounding
 Vowel rounding.
- voicing
 Consonant voicing.
- place
 Consonant place of articulation.
- manner
 Consonant manner of articulation.
- lateral
 Is the consonant lateral?
- sonorant
 Is the phone sonorant?
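As a rough illustration of how the table might be explored, the following sketch tabulates the general character types and extracts the vowels. It assumes the package is installed and that the columns are available under the names listed above.

```r
library(phonetisr)

# Number of symbols of each general type (consonant, vowel, diacritic).
table(ipa_symbols$type)

# Keep only the vowels, together with the columns that describe them.
vowels <- subset(
  ipa_symbols,
  type == "vowel",
  select = c(IPA, height, backness, rounding)
)
head(vowels)
```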
Klingon Swadesh list
Description
The Swadesh list in Klingon.
Usage
kl_swadesh
Format
A data frame with 195 rows and 4 variables:
- id
 Swadesh list item number.
- gloss
 English gloss.
- translit
 Klingon transliteration.
- ipa
 IPA transcription.
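The package description mentions obtaining counts of phones from tokenised transcriptions. One way this could be done with the Klingon Swadesh list is sketched below, assuming the package is installed and that the ipa column is a character vector. phonetise() is called with its default settings, and the counting is done with base R rather than with featurise().

```r
library(phonetisr)

# Tokenise the IPA column of the bundled Klingon Swadesh list.
kl_tokens <- phonetise(kl_swadesh$ipa)

# Flatten the list of phones and tabulate them, most frequent first.
phone_counts <- sort(table(unlist(kl_tokens)), decreasing = TRUE)
head(phone_counts)
```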
Search phones
Description
Given a vector of phonetised strings, find phones.
Usage
ph_search(phlist, phonex)
Arguments
phlist | 
 The output of phonetise().  | 
phonex | 
 A phonetic expression. Supported shorthands include C, V and # (as used in the examples); regular expressions can also be used.  | 
Value
A list.
Examples
ipa <- c("p\u02B0a\u0303k\u02B0", "t\u02B0um\u0325", "\u025Bk\u02B0\u026F", "pun")
ph <- c("p\u02B0", "t\u02B0", "k\u02B0", "a\u0303", "m\u0325")
ipa_ph <- phonetise(ipa, multi = ph)
ph_search(ipa_ph, "#CV")
# partial matches are also returned
ph_search(ipa_ph, "p")
# use regular expressions
ph_search(ipa_ph, "p\u02B0?V")
Tokenise IPA strings
Description
phonetise() tokenises strings of IPA symbols (like phonetic transcriptions
of words) into individual "phones". The output is a list.
Usage
phonetise(
  strings,
  multi = NULL,
  regex = NULL,
  split = TRUE,
  sep = " ",
  sanitise = TRUE,
  ignore_stress = TRUE,
  ignore_tone = TRUE,
  diacritics = FALSE,
  affricates = FALSE,
  v_sequences = FALSE,
  prenasalised = FALSE,
  all_multi = FALSE,
  sanitize = sanitise
)
phonetize(
  strings,
  multi = NULL,
  regex = NULL,
  split = TRUE,
  sep = " ",
  sanitise = TRUE,
  ignore_stress = TRUE,
  ignore_tone = TRUE,
  diacritics = FALSE,
  affricates = FALSE,
  v_sequences = FALSE,
  prenasalised = FALSE,
  all_multi = FALSE,
  sanitize = sanitise
)
Arguments
strings | 
 A character vector of words transcribed in IPA.  | 
multi | 
 A character vector of one or more multi-character phones as strings.  | 
regex | 
 A string with a regular expression to match several multi-character phones.  | 
split | 
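 If set to TRUE (the default), the tokenised strings are split into phones. If set to FALSE, the strings are not split and the phones are separated by sep.  | 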
sep | 
 A character to be used as the separator of the phones if split = FALSE (default is " ").  | 
sanitise | 
 Whether to remove all non-IPA characters (TRUE by default).  | 
ignore_stress | 
 If TRUE (the default), stress marks are ignored.  | 
ignore_tone | 
 If TRUE (the default), tone marks are ignored.  | 
diacritics | 
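 If set to TRUE, a character and its following diacritics are tokenised as a single phone (FALSE by default).  | 
affricates | 
 If set to TRUE, affricates are tokenised as single phones (FALSE by default).  | 
v_sequences | 
 If set to TRUE, vowel sequences are tokenised as single phones (FALSE by default).  | 
prenasalised | 
 If set to TRUE, prenasalised consonants are tokenised as single phones (FALSE by default).  | 
all_multi | 
 If set to TRUE, diacritics, affricates, v_sequences and prenasalised are all set to TRUE (FALSE by default).  | 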
sanitize | 
 Alias of sanitise.  | 
Value
A list of phonetised strings.
Examples
# using unicode escapes for CRAN policy
ipa <- c("p\u02B0a\u0303k\u02B0", "t\u02B0um\u0325", "\u025Bk\u02B0\u026F")
ph <- c("p\u02B0", "t\u02B0", "k\u02B0", "a\u0303", "m\u0325")
phonetise(ipa, multi = ph)
ph_2 <- ph[4:5]
# Match any character followed by <\u02B0> with ".\u02B0".
phonetise(ipa, multi = ph_2, regex = ".\u02B0")
# Same result.
phonetise(ipa, regex = ".(\u0303|\u0325|\u02B0)")
# Don't split strings and use "." as separator
phonetise(ipa, multi = ph, split = FALSE, sep = ".")
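The general idea of tokenising with user-specified multi-character phones can be sketched in base R. This is not the package's implementation, only an illustration that assumes multi-character phones are matched before single characters. naive_tokenise() is a hypothetical helper and does no sanitising or handling of stress and tone.

```r
# Hypothetical helper, not part of phonetisr: a base-R sketch of naive
# tokenisation in which multi-character phones are matched first.
naive_tokenise <- function(strings, multi = character()) {
  # Try longer multi-character phones before shorter ones.
  multi <- multi[order(nchar(multi), decreasing = TRUE)]
  # Quote each phone literally with \Q...\E, then fall back to any
  # single character.
  alternatives <- c(sprintf("\\Q%s\\E", multi), ".")
  pattern <- paste(alternatives, collapse = "|")
  regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
}

ipa <- c("p\u02B0a\u0303k\u02B0", "t\u02B0um\u0325")
ph <- c("p\u02B0", "t\u02B0", "k\u02B0", "a\u0303", "m\u0325")

tokens <- naive_tokenise(ipa, multi = ph)
lengths(tokens)  # 3 phones in each word
```

Without the multi argument the helper falls back to one token per character, so the aspiration and diacritic marks would be split from their base symbols.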