| Type: | Package | 
| Title: | Nonlinear Conditional Independence Tests | 
| Version: | 0.1.5 | 
| Date: | 2019-11-11 | 
| Author: | Christina Heinze-Deml <heinzedeml@stat.math.ethz.ch>, Jonas Peters <jonas.peters@math.ku.dk>, Asbjoern Marco Sinius Munk <fgp998@alumni.ku.dk> | 
| Depends: | R (≥ 3.1.0) | 
| Maintainer: | Christina Heinze-Deml <heinzedeml@stat.math.ethz.ch> | 
| Description: | Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <doi:10.48550/arXiv.1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <doi:10.48550/arXiv.1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <doi:10.48550/arXiv.1706.08576>). | 
| License: | GPL-2 | GPL-3 [expanded from: GPL] | 
| LazyData: | TRUE | 
| Imports: | methods, randomForest, quantregForest, lawstat, RPtests, caTools, mgcv, MASS, kernlab, pracma, mize | 
| URL: | https://github.com/christinaheinze/nonlinearICP-and-CondIndTests | 
| BugReports: | https://github.com/christinaheinze/nonlinearICP-and-CondIndTests/issues | 
| RoxygenNote: | 6.1.1 | 
| Suggests: | testthat | 
| NeedsCompilation: | no | 
| Packaged: | 2019-11-11 16:14:39 UTC; heinzec | 
| Repository: | CRAN | 
| Date/Publication: | 2019-11-12 06:50:21 UTC | 
Wrapper function for conditional independence tests.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
CondIndTest(Y, E, X, method = "KCI", alpha = 0.05,
  parsMethod = list(), verbose = FALSE)
Arguments
Y | 
 An n-dimensional vector or a matrix or dataframe with n rows and p columns.  | 
E | 
 An n-dimensional vector or a matrix or dataframe with n rows and p columns.  | 
X | 
 An n-dimensional vector or a matrix or dataframe with n rows and p columns.  | 
method | 
 The conditional indepdence test to use, can be one of
  | 
alpha | 
 Significance level. Defaults to 0.05.  | 
parsMethod | 
 Named list to pass options to   | 
verbose | 
 If   | 
Value
A list with the p-value of the test (pvalue) and possibly additional
entries, depending on the output of the chosen conditional independence test in method.
References
Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576 and the corresponding reference for the conditional independence test.
Examples
# Example 1
set.seed(1)
n <- 100
Z <- rnorm(n)
X <- 4 + 2 * Z + rnorm(n)
Y <- 3 * X^2 + Z + rnorm(n)
test1 <- CondIndTest(X,Y,Z, method = "KCI")
cat("These data come from a distribution, for which X and Y are NOT
cond. ind. given Z.")
cat(paste("The p-value of the test is: ", test1$pvalue))
# Example 2
set.seed(1)
Z <- rnorm(n)
X <- 4 + 2 * Z + rnorm(n)
Y <- 3 + Z + rnorm(n)
test2 <- CondIndTest(X,Y,Z, method = "KCI")
cat("The data come from a distribution, for which X and Y are cond.
ind. given Z.")
cat(paste("The p-value of the test is: ", test2$pvalue))
Invariant conditional quantile prediction.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantConditionalQuantilePrediction(Y, E, X, alpha = 0.05,
  verbose = FALSE, test = fishersTestExceedance,
  mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL,
  quantiles = c(0.1, 0.5, 0.9), returnModel = FALSE)
Arguments
Y | 
 An n-dimensional vector.  | 
E | 
 An n-dimensional vector. If   | 
X | 
 A matrix or dataframe with n rows and p columns.  | 
alpha | 
 Significance level. Defaults to 0.05.  | 
verbose | 
 If   | 
test | 
 Unconditional independence test that tests whether exceedence is
independent of E. Defaults to   | 
mtry | 
 Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to   | 
ntree | 
 Random forest parameter: Number of trees to grow. Defaults to 100.  | 
nodesize | 
 Random forest parameter: Minimum size of terminal nodes. Defaults to 5.  | 
maxnodes | 
 Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.  | 
quantiles | 
 Quantiles for which to test independence between exceedence and E.
Defaults to   | 
returnModel | 
 If   | 
Value
A list with the following entries:
-  
pvalueThe p-value for the null hypothesis that Y and E are independent given X. -  
modelThe fitted quantile regression forest model ifreturnModel = TRUE. 
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantConditionalQuantilePrediction(Y, as.factor(E), X)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantConditionalQuantilePrediction(Y, as.factor(E), X)
Invariant environment prediction.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantEnvironmentPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = list(Y = E, SplitRatio = 0.8),
  test = propTestTargetE, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE,
  returnModel = FALSE)
Arguments
Y | 
 An n-dimensional vector.  | 
E | 
 An n-dimensional vector. If   | 
X | 
 A matrix or dataframe with n rows and p columns.  | 
alpha | 
 Significance level. Defaults to 0.05.  | 
verbose | 
 If   | 
trainTestSplitFunc | 
 Function to split sample. Defaults to stratified sampling
using   | 
argsTrainTestSplitFunc | 
 Arguments for sampling splitting function.  | 
test | 
 Unconditional independence test that tests whether the out-of-sample
prediction accuracy is the same when using X only vs. X and Y as predictors for E.
Defaults to   | 
mtry | 
 Random forest parameter: Number of variables randomly sampled as
candidates at each split.  Defaults to   | 
ntree | 
 Random forest parameter: Number of trees to grow. Defaults to 100.  | 
nodesize | 
 Random forest parameter: Minimum size of terminal nodes. Defaults to 5.  | 
maxnodes | 
 Random forest parameter: Maximum number of terminal nodes trees in the forest can have.
Defaults to   | 
permute | 
 Random forest parameter: If   | 
returnModel | 
 If   | 
Value
A list with the following entries:
-  
pvalueThe p-value for the null hypothesis that Y and E are independent given X. -  
modelThe fitted models ifreturnModel = TRUE. 
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantEnvironmentPrediction(Y, as.factor(E), X)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantEnvironmentPrediction(Y, as.factor(E), X)
# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantEnvironmentPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantEnvironmentPrediction(Y, X, E, test = wilcoxTestTargetY)
Invariant residual distribution test.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantResidualDistributionTest(Y, E, X, alpha = 0.05,
  verbose = FALSE, fitWithGam = TRUE,
  test = leveneAndWilcoxResidualDistributions, colNameNoSmooth = NULL,
  mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL,
  returnModel = FALSE)
Arguments
Y | 
 An n-dimensional vector.  | 
E | 
 An n-dimensional vector. E needs to be a factor.  | 
X | 
 A matrix or dataframe with n rows and p columns.  | 
alpha | 
 Significance level. Defaults to 0.05.  | 
verbose | 
 If   | 
fitWithGam | 
 If   | 
test | 
 Unconditional independence test that tests whether residual distribution is
invariant across different levels of E. Defaults to   | 
colNameNoSmooth | 
 Gam parameter: Name of variables that should enter linearly into the model.
Defaults to   | 
mtry | 
 Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to   | 
ntree | 
 Random forest parameter: Number of trees to grow. Defaults to 100.  | 
nodesize | 
 Random forest parameter: Minimum size of terminal nodes. Defaults to 5.  | 
maxnodes | 
 Random forest parameter: Maximum number of terminal nodes trees in the forest can have.
Defaults to   | 
returnModel | 
 If   | 
Value
A list with the following entries:
-  
pvalueThe p-value for the null hypothesis that Y and E are independent given X. -  
modelThe fitted model ifreturnModel = TRUE. 
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantResidualDistributionTest(Y, as.factor(E), X)
InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantResidualDistributionTest(Y, as.factor(E), X)
InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)
Invariant target prediction.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantTargetPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  fitWithGam = TRUE, trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = NULL, test = fTestTargetY,
  colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE,
  returnModel = FALSE)
Arguments
Y | 
 An n-dimensional vector.  | 
E | 
 An n-dimensional vector or an nxq dimensional matrix or dataframe.  | 
X | 
 A matrix or dataframe with n rows and p columns.  | 
alpha | 
 Significance level. Defaults to 0.05.  | 
verbose | 
 If   | 
fitWithGam | 
 If   | 
trainTestSplitFunc | 
 Function to split sample. Defaults to stratified sampling
using   | 
argsTrainTestSplitFunc | 
 Arguments for sampling splitting function.  | 
test | 
 Unconditional independence test that tests whether the out-of-sample
prediction accuracy is the same when using X only vs. X and E as predictors for Y.
Defaults to   | 
colNameNoSmooth | 
 Gam parameter: Name of variables that should enter linearly into the model.
Defaults to   | 
mtry | 
 Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to   | 
ntree | 
 Random forest parameter: Number of trees to grow. Defaults to 100.  | 
nodesize | 
 Random forest parameter: Minimum size of terminal nodes. Defaults to 5.  | 
maxnodes | 
 Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.  | 
permute | 
 Random forest parameter: If   | 
returnModel | 
 If   | 
Value
A list with the following entries:
-  
pvalueThe p-value for the null hypothesis that Y and E are independent given X. -  
modelThe fitted models ifreturnModel = TRUE. 
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)
# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, E, X)
InvariantTargetPrediction(Y, X, E)
InvariantTargetPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantTargetPrediction(Y, X, E, test = wilcoxTestTargetY)
Kernel conditional independence test.
Description
Tests the null hypothesis that Y and E are independent given X. The distribution of the test statistic under the null hypothesis equals an infinite weighted sum of chi squared variables. This distribution can either be approximated by a gamma distribution or by a Monte Carlo approach. This version includes an implementation of choosing the hyperparameters by Gaussian Process regression.
Usage
KCI(Y, E, X, width = 0, alpha = 0.05, unbiased = FALSE,
  gammaApprox = TRUE, GP = TRUE, nRepBs = 5000, lambda = 0.001,
  thresh = 1e-05, numEig = NROW(Y), verbose = FALSE)
Arguments
Y | 
 A vector of length n or a matrix or dataframe with n rows and p columns.  | 
E | 
 A vector of length n or a matrix or dataframe with n rows and p columns.  | 
X | 
 A matrix or dataframe with n rows and p columns.  | 
width | 
 Kernel width; if it is set to zero, the width is chosen automatically (default: 0).  | 
alpha | 
 Significance level (default: 0.05).  | 
unbiased | 
 A boolean variable that indicates whether a bias correction should be applied (default: FALSE).  | 
gammaApprox | 
 A boolean variable that indicates whether the null distribution is approximated by a Gamma distribution. If it is FALSE, a Monte Carlo approach is used (default: TRUE).  | 
GP | 
 Flag whether to use Gaussian Process regression to choose the hyperparameters  | 
nRepBs | 
 Number of draws for the Monte Carlo approach (default: 500).  | 
lambda | 
 Regularization parameter (default: 1e-03).  | 
thresh | 
 Threshold for eigenvalues. Whenever eigenvalues are computed, they are set to zero if they are smaller than thresh times the maximum eigenvalue (default: 1e-05).  | 
numEig | 
 Number of eigenvalues computed (only relevant for computing the distribution under the hypothesis of conditional independence) (default: length(Y)).  | 
verbose | 
 If   | 
Value
A list with the following entries:
-  
testStatisticthe statistic Tr(K_(ddot(Y)|X) * K_(E|X)) -  
criticalValuethe critical point at the p-value equal to alpha; obtained by a Monte Carlo approach ifgammaApprox = FALSE, otherwise obtained by Gamma approximation. -  
pvalueThe p-value for the null hypothesis that Y and E are independent given X. It is obtained by a Monte Carlo approach ifgammaApprox = FALSE, otherwise obtained by Gamma approximation. 
Examples
# Example 1
n <- 100
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
KCI(Y, E, X)
KCI(Y, X, E)
Residual prediction test.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
ResidualPredictionTest(Y, E, X, alpha = 0.05, verbose = FALSE,
  degree = 4, basis = c("nystrom", "nystrom_poly", "fourier",
  "polynomial", "provided")[1], resid_type = "OLS", XBasis = NULL,
  noiseMat = NULL, getnoiseFct = function(n, ...) {     rnorm(n) },
  argsGetNoiseFct = NULL, nSim = 100, funcOfRes = function(x) {    
  abs(x) }, useX = TRUE, returnXBasis = FALSE,
  nSub = ceiling(NROW(X)/4), ntree = 100, nodesize = 5,
  maxnodes = NULL)
Arguments
Y | 
 An n-dimensional vector.  | 
E | 
 An n-dimensional vector or an nxq dimensional matrix or dataframe.  | 
X | 
 A matrix or dataframe with n rows and p columns.  | 
alpha | 
 Significance level. Defaults to 0.05.  | 
verbose | 
 If   | 
degree | 
 Degree of polynomial to use if    | 
basis | 
 Can be one of    | 
resid_type | 
 Can be    | 
XBasis | 
 Basis if    | 
noiseMat | 
 Matrix with simulated noise. Defaults to NULL in which case the simulation is performed inside the function.  | 
getnoiseFct | 
 Function to use to generate the noise matrix. Defaults to   | 
argsGetNoiseFct | 
 Arguments for   | 
nSim | 
 Number of simulations to use. Defaults to 100.  | 
funcOfRes | 
 Function of residuals to use in addition to predicting the
conditional mean. Defaults to   | 
useX | 
 Set to   | 
returnXBasis | 
 Set to   | 
nSub | 
 Number of random features to use if    | 
ntree | 
 Random forest parameter: Number of trees to grow. Defaults to 500.  | 
nodesize | 
 Random forest parameter: Minimum size of terminal nodes. Defaults to 5.  | 
maxnodes | 
 Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.  | 
Value
A list with the following entries:
-  
pvalueThe p-value for the null hypothesis that Y and E are independent given X. -  
XBasisBasis expansion ifreturnXBasiswas set toTRUE. -  
fctBasisExpansionFunction used to create basis expansion if basis is not"provided". 
Examples
# Example 1
n <- 100
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
ResidualPredictionTest(Y, as.factor(E), X)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
ResidualPredictionTest(Y, as.factor(E), X)
# not run:
# # Example 3
# E <- rnorm(n)
# X <- 4 + 2 * E + rnorm(n)
# Y <- 3 * (X)^2 + rnorm(n)
# ResidualPredictionTest(Y, E, X)
# ResidualPredictionTest(Y, X, E)
F-test for a nested model comparison.
Description
Used as a subroutine in InvariantTargetPrediction to test
whether out-of-sample prediction performance is better when using X and E as predictors for Y,
compared to using X only.
Usage
fTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
Arguments
Y | 
 An n-dimensional vector.  | 
predictedOnlyX | 
 Predictions for Y based on predictors in X only.  | 
predictedXE | 
 Predictions for Y based on predictors in X and E.  | 
verbose | 
 Set to   | 
... | 
 The dimensions of X (df) and E (dimE) need to be passed via the ... argument to allow for coherent interface of fTestTargetY and wilcoxTestTargetY.  | 
Value
A list with the p-value for the test.
Fishers test to test whether the exceedance of the conditional quantiles is independent of the categorical variable E.
Description
Used as a subroutine in InvariantConditionalQuantilePrediction
to test whether the exceedance of the conditional quantiles
is independent of the categorical variable E.
Usage
fishersTestExceedance(Y, predicted, E, verbose)
Arguments
Y | 
 An n-dimensional vector.  | 
predicted | 
 A matrix with n rows. The columns contain predictions for different conditional quantiles of Y|X.  | 
E | 
 An n-dimensional vector.   | 
verbose | 
 Set to   | 
Value
A list with the p-value for the test.
Kolmogorov-Smirnov test to compare residual distributions
Description
Used as a subroutine in InvariantResidualDistributionTest
to test whether residual distribution remains invariant across different levels
of E.
Usage
ksResidualDistributions(Y, predicted, E, verbose)
Arguments
Y | 
 An n-dimensional vector.  | 
predicted | 
 An n-dimensional vector of predictions for Y.  | 
E | 
 An n-dimensional vector.   | 
verbose | 
 Set to   | 
Value
A list with the p-value for the test.
Levene and wilcoxon test to compare first and second moments of residual distributions
Description
Used as a subroutine in InvariantResidualDistributionTest
to test whether residual distribution remains invariant across different levels
of E.
Usage
leveneAndWilcoxResidualDistributions(Y, predicted, E, verbose)
Arguments
Y | 
 An n-dimensional vector.  | 
predicted | 
 An n-dimensional vector of predictions for Y.  | 
E | 
 An n-dimensional vector.   | 
verbose | 
 Set to   | 
Value
A list with the p-value for the test.
Proportion test to compare two misclassification rates.
Description
Used as a subroutine in InvariantEnvironmentPrediction to test
whether out-of-sample performance is better when using X and Y as predictors for E,
compared to using X only.
Usage
propTestTargetE(E, predictedOnlyX, predictedXY, verbose)
Arguments
E | 
 An n-dimensional vector.  | 
predictedOnlyX | 
 Predictions for E based on predictors in X only.  | 
predictedXY | 
 Predictions for E based on predictors in X and Y.  | 
verbose | 
 Set to   | 
Value
A list with the p-value for the test.
Wilcoxon test to compare two mean squared error rates.
Description
Used as a subroutine in InvariantTargetPrediction to test
whether out-of-sample performance is better when using X and E as predictors for Y,
compared to using X only.
Usage
wilcoxTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
Arguments
Y | 
 An n-dimensional vector.  | 
predictedOnlyX | 
 Predictions for Y based on predictors in X only.  | 
predictedXE | 
 Predictions for Y based on predictors in X and E.  | 
verbose | 
 Set to   | 
... | 
 Argument to allow for coherent interface of fTestTargetY and wilcoxTestTargetY.  | 
Value
A list with the p-value for the test.