| Type: | Package | 
| Title: | Leave One Out Kernel Density Estimates for Outlier Detection | 
| Version: | 0.1.4 | 
| Maintainer: | Sevvandi Kandanaarachchi <sevvandik@gmail.com> | 
| Description: | Outlier detection using leave-one-out kernel density estimates and extreme value theory. The bandwidth for kernel density estimates is computed using persistent homology, a technique in topological data analysis. Using peak-over-threshold method, a generalized Pareto distribution is fitted to the log of leave-one-out kde values to identify outliers. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.1 | 
| Imports: | TDAstats, evd, RANN, ggplot2, tidyr | 
| Suggests: | knitr, rmarkdown | 
| URL: | https://sevvandi.github.io/lookout/ | 
| NeedsCompilation: | no | 
| Packaged: | 2022-10-13 23:23:55 UTC; kan092 | 
| Author: | Sevvandi Kandanaarachchi | 
| Repository: | CRAN | 
| Date/Publication: | 2022-10-14 00:10:02 UTC | 
lookout: Leave One Out Kernel Density Estimates for Outlier Detection
Description
Outlier detection using leave-one-out kernel density estimates and extreme value theory. The bandwidth for kernel density estimates is computed using persistent homology, a technique in topological data analysis. Using peak-over-threshold method, a generalized Pareto distribution is fitted to the log of leave-one-out kde values to identify outliers.
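To make the leave-one-out idea concrete, the following base R sketch computes an ordinary Gaussian kernel density estimate and its leave-one-out counterpart at each data point. It is an illustration under stated assumptions, not the package's implementation: the helper name loo_kde_sketch and the fixed bandwidth h = 0.5 are invented for this example, whereas lookout selects the bandwidth using persistent homology and then fits a generalized Pareto distribution, a step that is omitted here.

```r
# Illustrative sketch only (hypothetical helper, not part of lookout).
# Gaussian-kernel KDE and leave-one-out KDE evaluated at the data points.
# The bandwidth `h` is hand-picked; lookout chooses it via persistent homology.
loo_kde_sketch <- function(X, h) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  D2 <- as.matrix(dist(X))^2                            # squared pairwise distances
  K <- exp(-D2 / (2 * h^2)) / ((2 * pi)^(d / 2) * h^d)  # Gaussian kernel values
  kde <- rowSums(K) / n                                 # includes each point's own kernel
  diag(K) <- 0                                          # drop the self-contribution
  lookde <- rowSums(K) / (n - 1)                        # leave-one-out estimate
  list(kde = kde, lookde = lookde)
}

set.seed(1)
X <- rbind(
  cbind(rnorm(200), rnorm(200)),  # main cluster
  c(10, 10)                       # one isolated point (row 201)
)
est <- loo_kde_sketch(X, h = 0.5)

# Row with the lowest leave-one-out density
unname(which.min(est$lookde))  # 201
```

The ordinary estimate at a point always includes that point's own kernel contribution, so it cannot fall below K(0)/n even for an isolated point. The leave-one-out estimate removes that contribution, which is why isolated points receive much lower density values.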
Author(s)
Maintainer: Sevvandi Kandanaarachchi sevvandik@gmail.com (ORCID)
Authors:
Rob Hyndman rob.hyndman@monash.edu (ORCID)
Other contributors:
Chris Fraley fraley@u.washington.edu [contributor]
See Also
Useful links:
- https://sevvandi.github.io/lookout/
Plots outliers identified by lookout algorithm.
Description
Scatterplot of two columns from the data set with outliers highlighted.
Usage
## S3 method for class 'lookoutliers'
autoplot(object, columns = 1:2, ...)
Arguments
| object | The output of the function 'lookout'. |
| columns | Which columns of the original data to plot (specified as either numbers or strings). |
| ... | Other arguments; currently ignored. |
Value
A ggplot object.
Examples
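library(lookout)
set.seed(1)  # for reproducibility
# 500 standard-normal points plus a tight cluster of 5 points near (10, 10)
X <- rbind(
  data.frame(x = rnorm(500),
             y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)
lo <- lookout(X)
# Scatterplot of the first two columns with outliers highlighted
autoplot(lo)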
Plots outlier persistence for a range of significance levels.
Description
This function plots outlier persistence for a range of significance levels using the algorithm lookout, an outlier detection method that uses leave-one-out kernel density estimates and generalized Pareto distributions to find outliers.
Usage
## S3 method for class 'persistingoutliers'
autoplot(object, alpha = object$alpha, ...)
Arguments
| object | The output of the function 'persisting_outliers'. |
| alpha | The significance levels to plot. |
| ... | Other arguments; currently ignored. |
Value
A ggplot object.
Examples
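library(lookout)
set.seed(1)  # for reproducibility
# 500 standard-normal points plus a tight cluster of 5 points near (10, 10)
X <- rbind(
  data.frame(
    x = rnorm(500),
    y = rnorm(500)
  ),
  data.frame(
    x = rnorm(5, mean = 10, sd = 0.2),
    y = rnorm(5, mean = 10, sd = 0.2)
  )
)
plot(X, pch = 19)
# Compute outlier persistence on the raw (non-normalized) data
outliers <- persisting_outliers(X, unitize = FALSE)
autoplot(outliers)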
Identifies outliers using the algorithm lookout.
Description
This function identifies outliers using the algorithm lookout, an outlier detection method that uses leave-one-out kernel density estimates and generalized Pareto distributions to find outliers.
Usage
lookout(X, alpha = 0.05, unitize = TRUE, bw = NULL, gpd = NULL, fast = TRUE)
Arguments
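| X | The input data in a data frame, matrix or tibble format. |
| alpha | The level of significance. Default is 0.05. |
| unitize | An option to normalize the data. Default is TRUE. |
| bw | Bandwidth parameter. Default is NULL, in which case the bandwidth is selected using persistent homology. |
| gpd | Generalized Pareto distribution parameters. If 'NULL' (the default), these are estimated from the data. |
| fast | If set to TRUE (the default), makes the computation faster by sub-setting the data for the bandwidth calculation. |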
Value
A list with the following components:
| outliers | The set of outliers. |
| outlier_probability | The GPD probability of the data. |
| outlier_scores | The outlier scores of the data. |
| bandwidth | The bandwidth selected using persistent homology. |
| kde | The kernel density estimate values. |
| lookde | The leave-one-out kde values. |
| gpd | The fitted GPD parameters. |
Examples
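library(lookout)
set.seed(1)  # for reproducibility
# 500 standard-normal points plus a tight cluster of 5 points near (10, 10)
X <- rbind(
  data.frame(x = rnorm(500),
             y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)
lo <- lookout(X)
lo            # print the result
autoplot(lo)  # plot the data with outliers highlighted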
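The components listed under Value can be read directly from the returned list. The sketch below assumes the lookout package is installed; the variable names lo and lo_fixed are arbitrary. It also passes a previously selected bandwidth back in through the bw argument, which is one plausible way to skip the bandwidth search on a second run.

```r
library(lookout)

set.seed(123)
X <- rbind(
  data.frame(x = rnorm(500), y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)

# Run lookout with a looser significance level than the default 0.05
lo <- lookout(X, alpha = 0.1)

names(lo)        # components of the returned list
lo$outliers      # the set of outliers
lo$bandwidth     # bandwidth selected using persistent homology
lo$gpd           # fitted GPD parameters

# Reuse the selected bandwidth instead of recomputing it
lo_fixed <- lookout(X, alpha = 0.1, bw = lo$bandwidth)
lo_fixed$outliers
```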
Identifies outliers in univariate time series using the algorithm lookout.
Description
This is the time series implementation of lookout.
Usage
lookout_ts(x, alpha = 0.05)
Arguments
| x | The input univariate time series. |
| alpha | The level of significance. Default is 0.05. |
Value
A lookout object.
See Also
lookout
Examples
library(lookout)
set.seed(1)
# Simulate an ARIMA(1,1,0) series and inject a spike at position 50
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.8), n = 200)
x[50] <- x[50] + 10
plot(x)
lo <- lookout_ts(x)
lo
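The alpha argument controls how readily observations are flagged. As a rough sketch, assuming the lookout package is installed, the same series can be run at a stricter and a looser significance level and the printed results compared. The names lo_strict and lo_loose are arbitrary, and no particular set of flagged points is implied.

```r
library(lookout)

set.seed(1)
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.8), n = 200)
x[50] <- x[50] + 10  # inject a spike at position 50

# Stricter and looser significance levels on the same series
lo_strict <- lookout_ts(x, alpha = 0.01)
lo_loose  <- lookout_ts(x, alpha = 0.1)

lo_strict
lo_loose
```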
Computes outlier persistence for a range of significance values.
Description
This function computes outlier persistence for a range of significance values, using the algorithm lookout, an outlier detection method that uses leave-one-out kernel density estimates and generalized Pareto distributions to find outliers.
Usage
persisting_outliers(
  X,
  alpha = seq(0.01, 0.1, by = 0.01),
  st_qq = 0.9,
  unitize = TRUE,
  num_steps = 20
)
Arguments
| X | The input data in a matrix, data.frame, or tibble format. All columns should be numeric. |
| alpha | Grid of significance levels. |
| st_qq | The starting quantile for the death radii sequence. This is used to compute the starting bandwidth value. |
| unitize | An option to normalize the data. Default is TRUE. |
| num_steps | The length of the bandwidth sequence. |
Value
A list with the following components:
| out | A 3D array of |
| bw | The set of bandwidth values. |
| gpdparas | The GPD parameters used. |
| lookoutbw | The bandwidth chosen by the algorithm lookout. |
Examples
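library(lookout)
set.seed(1)  # for reproducibility
# 500 standard-normal points plus a tight cluster of 5 points near (10, 10)
X <- rbind(
  data.frame(x = rnorm(500),
             y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)
plot(X, pch = 19)
# Compute outlier persistence on the raw (non-normalized) data
outliers <- persisting_outliers(X, unitize = FALSE)
outliers
autoplot(outliers)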
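The grid arguments can also be set explicitly. The following sketch, which assumes the lookout package is installed, uses a shorter grid of significance levels and fewer bandwidth steps than the defaults, then inspects the returned components named under Value. The variable names po and p are arbitrary.

```r
library(lookout)

set.seed(123)
X <- rbind(
  data.frame(x = rnorm(500), y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)

# Coarser grids than the defaults: 3 significance levels, 10 bandwidth steps
po <- persisting_outliers(
  X,
  alpha = c(0.01, 0.05, 0.1),
  st_qq = 0.9,
  unitize = FALSE,
  num_steps = 10
)

po$bw         # the set of bandwidth values
po$lookoutbw  # the bandwidth chosen by lookout
dim(po$out)   # dimensions of the 3D array

# Build the persistence plot; print(p) to display it
p <- autoplot(po)
```

A coarser grid runs faster and can be enough for a first look; the default grid of ten significance levels and twenty bandwidth steps gives a finer picture.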
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- ggplot2