phylotypr


Overview

phylotypr is a package for classification-based analysis of DNA sequences. It primarily implements the naive Bayesian classifier from the Ribosomal Database Project (RDP). Although you can classify any type of sequence (assuming you have the proper database), this algorithm is mainly used to classify 16S rRNA gene sequences.

Installation

You can install the development version of phylotypr from GitHub with:

# install.packages("devtools")
devtools::install_github("mothur/phylotypr")

You can also get the official release version from CRAN:

install.packages("phylotypr")

Usage

See the Getting Started article for an example of how to build the database and classify individual and multiple sequences.

Reference databases

The {phylotypr} package ships with version 9 of the RDP training data. This set is small and old (2010) compared with the latest versions. You are encouraged to install newer versions of the RDP, greengenes, and SILVA databases from the {phylotyprrefdata} package on GitHub. Note that the installation takes about 20 minutes. If it sits at “moving datasets to lazyload DB” for a long time, this is normal :)

devtools::install_github("mothur/phylotyprrefdata")
library(phylotyprrefdata)

The following will list the references that are available in {phylotyprrefdata}:

data(package = "phylotyprrefdata")
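If you would rather have the listing as a data frame than as a printed page, the sketch below wraps the same base R call. The helper name `list_package_datasets` is hypothetical and is not part of {phylotypr} or {phylotyprrefdata}; it relies only on the `results` matrix that `utils::data()` returns, and the usage lines run only if {phylotyprrefdata} is already installed.

```r
# Hypothetical helper (not part of phylotypr or phylotyprrefdata):
# return the names and titles of the datasets an installed package provides.
list_package_datasets <- function(pkg) {
  # data(package = ...) returns a "packageIQR" object; its `results`
  # matrix has the columns Package, LibPath, Item, and Title
  listing <- utils::data(package = pkg)$results
  data.frame(
    name = unname(listing[, "Item"]),
    title = unname(listing[, "Title"]),
    stringsAsFactors = FALSE
  )
}

# Only query phylotyprrefdata if it is installed
if (requireNamespace("phylotyprrefdata", quietly = TRUE)) {
  refs <- list_package_datasets("phylotyprrefdata")
  print(refs$name)
}
```

The data frame form makes it easy to filter the available references by name, for example with `grepl()` on `refs$name`.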

More information about {phylotypr}

You can learn more about the underlying algorithm in the paper that originally described it, published in Applied and Environmental Microbiology. If you want to learn how this package was created, check out the mothur YouTube channel, where a playlist shows every step.