| Type: | Package | 
| Title: | Assumption-Lean and Data-Adaptive Post-Prediction Inference | 
| Version: | 1.0.0 | 
| Maintainer: | Jiacheng Miao <jiacheng.miao@wisc.edu> | 
| Description: | Implementation of assumption-lean and data-adaptive post-prediction inference (POPInf), for valid and efficient statistical inference based on data predicted by machine learning. See Miao, Miao, Wu, Zhao, and Lu (2023) <doi:10.48550/arXiv.2311.14220>. | 
| URL: | https://arxiv.org/abs/2311.14220, https://github.com/qlu-lab/POPInf | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | randomForest, MASS | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| NeedsCompilation: | no | 
| Packaged: | 2024-02-19 18:38:56 UTC; jiacheng | 
| Author: | Jiacheng Miao  | 
| Repository: | CRAN | 
| Date/Publication: | 2024-02-20 20:40:12 UTC | 
Calculation of the matrix A based on a single dataset
Description
A function for calculating the matrix A from a single dataset
Usage
A(X, Y, quant = NA, theta, method)
Arguments
X | 
 Array or DataFrame containing covariates  | 
Y | 
 Array or DataFrame of outcomes  | 
quant | 
 quantile for quantile estimation  | 
theta | 
 parameter theta  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
matrix A based on a single dataset
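To make the role of A concrete, here is a minimal sketch for the "ols" case only. It assumes that A denotes the derivative of the averaged estimating equation with respect to theta, which for least squares is X'X / n. The helper name A_ols_sketch is hypothetical and is not the package's A().

```r
# Hypothetical helper (not the package's A()): for OLS, the derivative of the
# averaged estimating equation X'(X theta - Y) / n with respect to theta
# is X'X / n, which does not depend on Y or theta.
A_ols_sketch <- function(X) {
  X <- as.matrix(X)
  crossprod(X) / nrow(X)
}

set.seed(1)
X <- cbind(1, rnorm(200), rnorm(200))   # intercept plus two covariates
A_hat <- A_ols_sketch(X)
dim(A_hat)           # one row and one column per coefficient
isSymmetric(A_hat)   # symmetric by construction
```

The inverse of such a matrix is what arguments like A_lab_inv and A_unlab_inv would carry under this reading.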
Variance-covariance matrix of the estimation equation
Description
Sigma_cal function for variance-covariance matrix of the estimation equation
Usage
Sigma_cal(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  A_lab_inv,
  A_unlab_inv,
  method
)
Arguments
X_lab | 
 Array or DataFrame containing observed covariates in labeled data.  | 
X_unlab | 
 Array or DataFrame containing observed or predicted covariates in unlabeled data.  | 
Y_lab | 
 Array or DataFrame of observed outcomes in labeled data.  | 
Yhat_lab | 
 Array or DataFrame of predicted outcomes in labeled data.  | 
Yhat_unlab | 
 Array or DataFrame of predicted outcomes in unlabeled data.  | 
w | 
 Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).  | 
theta | 
 parameter theta  | 
quant | 
 quantile for quantile estimation  | 
A_lab_inv | 
 Inverse of matrix A using labeled data  | 
A_unlab_inv | 
 Inverse of matrix A using unlabeled data  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
variance-covariance matrix of the estimation equation
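As a simplified illustration of how the inverse of A and the covariance of the estimating equation combine, the sketch below computes a sandwich variance for OLS from labeled data alone. It deliberately omits the unlabeled-data and weight terms that Sigma_cal takes as arguments, and the function name sandwich_ols_sketch is hypothetical.

```r
# Labeled-data-only sandwich sketch for OLS; names are hypothetical and the
# POP-Inf terms involving Yhat and w are intentionally left out.
sandwich_ols_sketch <- function(X, Y) {
  X <- as.matrix(X)
  n <- nrow(X)
  theta <- solve(crossprod(X), crossprod(X, Y))   # OLS estimate
  resid <- as.vector(Y - X %*% theta)
  psi_mat <- X * resid                 # n x d, row i is psi for observation i
  Sigma <- crossprod(psi_mat) / n      # covariance of the estimating equation
  A_inv <- solve(crossprod(X) / n)     # inverse of the matrix A
  A_inv %*% Sigma %*% A_inv / n        # variance-covariance of the estimate
}

set.seed(2)
X <- cbind(1, rnorm(300))
Y <- X %*% c(0.5, 2) + rnorm(300)
V <- sandwich_ols_sketch(X, Y)
sqrt(diag(V))   # heteroskedasticity-robust standard errors
```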
Initial estimation
Description
est_ini function for initial estimation
Usage
est_ini(X, Y, quant = NA, method)
Arguments
X | 
 Array or DataFrame containing covariates  | 
Y | 
 Array or DataFrame of outcomes  | 
quant | 
 quantile for quantile estimation  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
initial estimator
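One plausible way to obtain such a starting value is to fit the classical estimator on the supplied data. The sketch below does this with base R and stats only; est_ini_sketch is a hypothetical stand-in, and the package's est_ini may compute its starting value differently.

```r
# Hypothetical stand-in for an initial estimator, using base R and stats only.
est_ini_sketch <- function(X = NULL, Y, quant = NA, method) {
  Y <- as.vector(Y)
  if (!is.null(X)) X <- as.matrix(X)
  switch(method,
    mean     = mean(Y),
    quantile = unname(quantile(Y, probs = quant)),
    ols      = unname(coef(lm(Y ~ X - 1))),
    logistic = unname(coef(glm(Y ~ X - 1, family = binomial))),
    poisson  = unname(coef(glm(Y ~ X - 1, family = poisson))),
    stop("unknown method: ", method)
  )
}

set.seed(3)
X <- cbind(1, rnorm(100))
Y <- as.vector(X %*% c(1, -0.5) + rnorm(100))
est_ini_sketch(Y = Y, method = "mean")
est_ini_sketch(Y = Y, quant = 0.75, method = "quantile")
ols_start <- est_ini_sketch(X, Y, method = "ols")
ols_start
```

The formula Y ~ X - 1 assumes that X already carries any intercept column, so no second intercept is added.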
Hessians of the link function
Description
link_Hessian function for Hessians of the link function
Usage
link_Hessian(t, method)
Arguments
t | 
 Numeric value or vector at which the link function is evaluated  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
Hessians of the link function
Gradient of the link function
Description
link_grad function for gradient of the link function
Usage
link_grad(t, method)
Arguments
t | 
 Numeric value or vector at which the link function is evaluated  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
gradient of the link function
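For the generalized-linear-model methods, the gradient and Hessian of the link function have simple closed forms. The sketch below assumes the usual choices (t^2 / 2 for "ols", log(1 + exp(t)) for "logistic", exp(t) for "poisson"). The two function names are hypothetical re-implementations for illustration, and the package's own link_grad and link_Hessian may differ in detail.

```r
# Hypothetical re-implementations for illustration only.
link_grad_sketch <- function(t, method) {
  switch(method,
    ols      = t,                    # derivative of t^2 / 2
    logistic = 1 / (1 + exp(-t)),    # derivative of log(1 + exp(t))
    poisson  = exp(t),               # derivative of exp(t)
    stop("unsupported method: ", method)
  )
}

link_hessian_sketch <- function(t, method) {
  switch(method,
    ols      = rep(1, length(t)),
    logistic = exp(-t) / (1 + exp(-t))^2,   # sigmoid(t) * (1 - sigmoid(t))
    poisson  = exp(t),
    stop("unsupported method: ", method)
  )
}

# Finite-difference check that the Hessian is the derivative of the gradient.
t0 <- c(-1, 0, 0.5)
h <- 1e-6
num <- (link_grad_sketch(t0 + h, "logistic") -
        link_grad_sketch(t0 - h, "logistic")) / (2 * h)
max(abs(num - link_hessian_sketch(t0, "logistic")))   # close to zero
```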
Sample expectation of psi
Description
mean_psi function for sample expectation of psi
Usage
mean_psi(X, Y, theta, quant = NA, method)
Arguments
X | 
 Array or DataFrame containing covariates  | 
Y | 
 Array or DataFrame of outcomes  | 
theta | 
 parameter theta  | 
quant | 
 quantile for quantile estimation  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
sample expectation of psi
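A useful sanity check is that the sample expectation of psi vanishes at the classical estimate. The sketch below shows this for least squares, assuming psi for observation i is x_i * (x_i' theta - y_i). The sign convention and the name mean_psi_ols_sketch are assumptions, not taken from the package.

```r
# Hypothetical sketch: sample mean of the OLS estimating function.
mean_psi_ols_sketch <- function(X, Y, theta) {
  X <- as.matrix(X)
  resid <- as.vector(X %*% theta - Y)
  colMeans(X * resid)      # average of x_i * (x_i' theta - y_i)
}

set.seed(4)
X <- cbind(1, rnorm(150))
Y <- as.vector(X %*% c(2, 1) + rnorm(150))
theta_hat <- as.vector(solve(crossprod(X), crossprod(X, Y)))
at_solution <- mean_psi_ols_sketch(X, Y, theta_hat)   # numerically zero
at_origin   <- mean_psi_ols_sketch(X, Y, c(0, 0))     # clearly nonzero
at_solution
at_origin
```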
Sample expectation of POP-Inf psi
Description
mean_psi_pop function for sample expectation of POP-Inf psi
Usage
mean_psi_pop(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method
)
Arguments
X_lab | 
 Array or DataFrame containing observed covariates in labeled data.  | 
X_unlab | 
 Array or DataFrame containing observed or predicted covariates in unlabeled data.  | 
Y_lab | 
 Array or DataFrame of observed outcomes in labeled data.  | 
Yhat_lab | 
 Array or DataFrame of predicted outcomes in labeled data.  | 
Yhat_unlab | 
 Array or DataFrame of predicted outcomes in unlabeled data.  | 
w | 
 Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).  | 
theta | 
 parameter theta  | 
quant | 
 quantile for quantile estimation  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
sample expectation of POP-Inf psi
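The arguments suggest that the POP-Inf estimating equation combines three averages: psi on the labeled outcomes, plus a weighted difference between psi on the unlabeled predictions and psi on the labeled predictions. The sketch below writes this out for method = "mean" with a scalar weight. The sign convention psi(y, theta) = theta - y and the function name are assumptions made for illustration.

```r
# Hypothetical sketch for method = "mean"; psi(y, theta) = theta - y is an
# assumed sign convention.
mean_psi_pop_mean_sketch <- function(Y_lab, Yhat_lab, Yhat_unlab, w, theta) {
  psi_bar <- function(y) mean(theta - y)
  psi_bar(Y_lab) + w * (psi_bar(Yhat_unlab) - psi_bar(Yhat_lab))
}

set.seed(5)
Y_lab      <- rnorm(100, mean = 1)
Yhat_lab   <- Y_lab + rnorm(100, sd = 0.5)
Yhat_unlab <- rnorm(1000, mean = 1, sd = sqrt(1.25))
w <- 0.7

# For the mean, the root in theta has a closed form:
theta_hat <- mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab))
mean_psi_pop_mean_sketch(Y_lab, Yhat_lab, Yhat_unlab, w, theta_hat)  # about 0
```

With w = 0 the expression reduces to the labeled-data-only estimating equation, so the weight controls how much the predictions contribute.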
Gradient descent for obtaining estimator
Description
optim_est function: gradient descent for obtaining the estimator
Usage
optim_est(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method,
  step_size = 0.1,
  max_iterations = 500,
  convergence_threshold = 1e-06
)
Arguments
X_lab | 
 Array or DataFrame containing observed covariates in labeled data.  | 
X_unlab | 
 Array or DataFrame containing observed or predicted covariates in unlabeled data.  | 
Y_lab | 
 Array or DataFrame of observed outcomes in labeled data.  | 
Yhat_lab | 
 Array or DataFrame of predicted outcomes in labeled data.  | 
Yhat_unlab | 
 Array or DataFrame of predicted outcomes in unlabeled data.  | 
w | 
 Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).  | 
theta | 
 parameter theta  | 
quant | 
 quantile for quantile estimation  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
step_size | 
 step size for gradient descent  | 
max_iterations | 
 maximum number of iterations for gradient descent  | 
convergence_threshold | 
 convergence threshold for gradient descent  | 
Value
estimator
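To show how step_size, max_iterations, and convergence_threshold interact, here is a generic gradient-descent loop on the least-squares loss. It is a sketch under the assumption that iteration stops once the update is smaller than the threshold. The function gd_ols_sketch is hypothetical and uses labeled data only, unlike optim_est.

```r
# Hypothetical sketch, not the package's optim_est(): plain gradient descent
# on the least-squares loss, with the same three tuning arguments.
gd_ols_sketch <- function(X, Y, theta,
                          step_size = 0.1,
                          max_iterations = 500,
                          convergence_threshold = 1e-06) {
  X <- as.matrix(X)
  n <- nrow(X)
  for (iter in seq_len(max_iterations)) {
    grad <- as.vector(crossprod(X, X %*% theta - Y)) / n  # mean of psi
    theta_new <- theta - step_size * grad
    converged <- sqrt(sum((theta_new - theta)^2)) < convergence_threshold
    theta <- theta_new
    if (converged) break   # update is small enough, stop early
  }
  theta
}

set.seed(6)
X <- cbind(1, rnorm(200))
Y <- as.vector(X %*% c(1, 2) + rnorm(200))
theta_gd <- gd_ols_sketch(X, Y, theta = c(0, 0))
theta_gd
coef(lm(Y ~ X - 1))   # reference solution for comparison
```

A smaller convergence_threshold gives a closer match to the reference solution at the cost of more iterations.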
Gradient descent for obtaining the weight vector
Description
optim_weights function: gradient descent for obtaining the weight vector
Usage
optim_weights(
  j,
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method
)
Arguments
j | 
 j-th coordinate of the weight vector  | 
X_lab | 
 Array or DataFrame containing observed covariates in labeled data.  | 
X_unlab | 
 Array or DataFrame containing observed or predicted covariates in unlabeled data.  | 
Y_lab | 
 Array or DataFrame of observed outcomes in labeled data.  | 
Yhat_lab | 
 Array or DataFrame of predicted outcomes in labeled data.  | 
Yhat_unlab | 
 Array or DataFrame of predicted outcomes in unlabeled data.  | 
w | 
 Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).  | 
theta | 
 parameter theta  | 
quant | 
 quantile for quantile estimation  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
weights
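The idea behind choosing a weight can be illustrated in the scalar "mean" case, where the variance of the estimator is a quadratic in w and can be minimized directly. The sketch below swaps gradient descent for stats::optimize and compares the result with the closed-form minimizer. Restricting w to [0, 1], pooling the predictions to estimate their variance, and all function names are assumptions of this sketch.

```r
# Hypothetical sketch for method = "mean" (scalar weight): estimated variance
# of mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab)) as a function of w.
var_pop_mean_sketch <- function(w, Y_lab, Yhat_lab, Yhat_unlab) {
  n <- length(Y_lab)
  N <- length(Yhat_unlab)
  var_f <- var(c(Yhat_lab, Yhat_unlab))   # pooled variance of predictions
  var(Y_lab) / n + w^2 * var_f * (1 / n + 1 / N) -
    2 * w * cov(Y_lab, Yhat_lab) / n
}

set.seed(7)
Y_lab      <- rnorm(100)
Yhat_lab   <- Y_lab + rnorm(100, sd = 0.5)
Yhat_unlab <- rnorm(1000, sd = sqrt(1.25))

# Numerical minimizer over [0, 1] (an assumed range for the weight).
w_num <- optimize(var_pop_mean_sketch, interval = c(0, 1), tol = 1e-8,
                  Y_lab = Y_lab, Yhat_lab = Yhat_lab,
                  Yhat_unlab = Yhat_unlab)$minimum

# Closed-form minimizer of the same quadratic.
n <- length(Y_lab); N <- length(Yhat_unlab)
w_closed <- cov(Y_lab, Yhat_lab) /
  (var(c(Yhat_lab, Yhat_unlab)) * (1 + n / N))
c(numerical = w_num, closed_form = w_closed)
```

More accurate predictions push the weight toward 1, while uninformative predictions push it toward 0, which recovers the labeled-data-only estimator.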
POP-Inf M-Estimation
Description
pop_M function conducts post-prediction M-Estimation.
Usage
pop_M(
  X_lab = NA,
  X_unlab = NA,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  alpha = 0.05,
  weights = NA,
  max_iterations = 100,
  convergence_threshold = 0.05,
  quant = NA,
  intercept = FALSE,
  focal_index = NA,
  method
)
Arguments
X_lab | 
 Array or DataFrame containing observed covariates in labeled data.  | 
X_unlab | 
 Array or DataFrame containing observed or predicted covariates in unlabeled data.  | 
Y_lab | 
 Array or DataFrame of observed outcomes in labeled data.  | 
Yhat_lab | 
 Array or DataFrame of predicted outcomes in labeled data.  | 
Yhat_unlab | 
 Array or DataFrame of predicted outcomes in unlabeled data.  | 
alpha | 
 Specifies the confidence level as 1 - alpha for confidence intervals.  | 
weights | 
 Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).  | 
max_iterations | 
 Sets the maximum number of iterations for the optimization process to derive weights.  | 
convergence_threshold | 
 Sets the convergence threshold for the optimization process to derive weights.  | 
quant | 
 quantile for quantile estimation  | 
intercept | 
 Boolean indicating whether the input covariate data already contain an intercept column (TRUE if they do).  | 
focal_index | 
 Identifies the focal index for variance reduction.  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
A summary table presenting point estimates, standard errors, confidence intervals (1 - alpha), p-values, and weights.
Examples
data <- sim_data()
X_lab <- data$X_lab
X_unlab <- data$X_unlab
Y_lab <- data$Y_lab
Yhat_lab <- data$Yhat_lab
Yhat_unlab <- data$Yhat_unlab
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, method = "mean")
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, quant = 0.75, method = "quantile")
pop_M(X_lab = X_lab, X_unlab = X_unlab,
      Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, method = "ols")
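The confidence intervals and p-values in the summary table can be read as Wald-type quantities. The sketch below shows how such a table could be assembled from point estimates and standard errors, assuming a normal approximation. The helper wald_summary_sketch and its column names are hypothetical and need not match the output of pop_M.

```r
# Hypothetical helper: build a Wald-type summary from estimates and
# standard errors, assuming asymptotic normality.
wald_summary_sketch <- function(estimate, std_error, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  data.frame(
    Estimate  = estimate,
    Std.Error = std_error,
    Lower.CI  = estimate - z_crit * std_error,
    Upper.CI  = estimate + z_crit * std_error,
    P.value   = 2 * pnorm(-abs(estimate / std_error))
  )
}

# Illustrative inputs, not results from any real analysis.
out <- wald_summary_sketch(estimate = c(0.8, -0.1), std_error = c(0.2, 0.15))
out
```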
Estimating equation
Description
psi function for the estimating equation
Usage
psi(X, Y, theta, quant = NA, method)
Arguments
X | 
 Array or DataFrame containing covariates  | 
Y | 
 Array or DataFrame of outcomes  | 
theta | 
 parameter theta  | 
quant | 
 quantile for quantile estimation  | 
method | 
 indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".  | 
Value
estimating equation
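For quantile estimation, a standard choice of estimating function is the indicator that an observation falls at or below theta, minus the target quantile level. The sketch below uses that form as an assumption. psi_quantile_sketch is a hypothetical name, and the package's psi may use a different sign or scaling.

```r
# Hypothetical sketch of a quantile estimating function:
# psi(y, theta) = 1{y <= theta} - quant.
psi_quantile_sketch <- function(Y, theta, quant) {
  as.numeric(Y <= theta) - quant
}

set.seed(8)
Y <- rnorm(1000)
theta_hat <- unname(quantile(Y, probs = 0.75))
psi_mean <- mean(psi_quantile_sketch(Y, theta_hat, quant = 0.75))
psi_mean   # near zero at the sample quantile
```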
Simulate the data for testing the functions
Description
sim_data function for simulating the data used to test the functions
Usage
sim_data(r = 0.9, binary = FALSE)
Arguments
r | 
 imputation correlation  | 
binary | 
 simulate binary outcome or not  | 
Value
simulated data
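To show what such a simulated dataset might look like, the sketch below generates labeled and unlabeled samples in which the predictions correlate with the outcome at roughly r. Reading "imputation correlation" as the correlation between Y and Yhat is an assumption, and sim_data_sketch, its sample sizes, and its data-generating model are hypothetical, not the package's sim_data.

```r
# Hypothetical generator, not the package's sim_data(): outcomes and
# predictions with correlation close to r.
sim_data_sketch <- function(n = 500, N = 5000, r = 0.9) {
  total <- n + N
  X <- rnorm(total)
  Y <- as.vector(scale(0.5 * X + rnorm(total)))    # standardized outcome
  Yhat <- r * Y + sqrt(1 - r^2) * rnorm(total)     # cor(Y, Yhat) is about r
  lab <- seq_len(n)                                # first n rows are labeled
  list(
    X_lab = X[lab], X_unlab = X[-lab],
    Y_lab = Y[lab],
    Yhat_lab = Yhat[lab], Yhat_unlab = Yhat[-lab]
  )
}

set.seed(9)
dat <- sim_data_sketch()
cor(dat$Y_lab, dat$Yhat_lab)   # close to r
str(dat)
```

The returned list uses the same element names as the pop_M arguments, so its pieces can be passed along directly.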