| Type: | Package | 
| Title: | Bayesian Sparse Estimation of a Covariance Matrix | 
| Version: | 1.0.3 | 
| Date: | 2025-08-14 | 
| Maintainer: | Kyeongwon Lee <kwlee1718@gmail.com> | 
| Author: | Kwangmin Lee [aut], Kyeongwon Lee [aut, cre], Kyoungjae Lee [aut], Seongil Jo [aut], Jaeyong Lee [ctb] | 
| Depends: | R (≥ 4.2) | 
| Description: | Bayesian estimations of a covariance matrix for multivariate normal data. Assumes that the covariance matrix is sparse or band matrix and positive-definite. Methods implemented include the beta-mixture shrinkage prior (Lee et al. (2022) <doi:10.1016/j.jmva.2022.105067>), screened beta-mixture prior (Lee et al. (2024) <doi:10.1214/24-BA1495>), and post-processed posteriors for banded and sparse covariances (Lee et al. (2023) <doi:10.1214/22-BA1333>; Lee and Lee (2023) <doi:10.1016/j.jeconom.2023.105475>). This software has been developed using funding supported by Basic Science Research Program through the National Research Foundation of Korea ('NRF') funded by the Ministry of Education ('RS-2023-00211979', 'NRF-2022R1A5A7033499', 'NRF-2020R1A4A1018207' and 'NRF-2020R1C1C1A01013338'). | 
| Imports: | GIGrvg, coda, progress, BayesFactor, MASS, mvnfast, matrixcalc, matrixStats, purrr, dplyr, RSpectra, Matrix, plyr, CholWishart, magrittr, future, furrr, ks, ggplot2, ggmcmc, caret, FinCovRegularization, mvtnorm, stats, patchwork, reshape2, future.apply | 
| Suggests: | hdbinseg, POET, tidyquant, tidyr, timetk, quantmod | 
| License: | GPL-2 | 
| LazyLoad: | yes | 
| URL: | https://github.com/statjs/bspcov | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-08-14 02:52:08 UTC; runner | 
| LazyData: | true | 
| LazyDataCompression: | xz | 
| Repository: | CRAN | 
| Date/Publication: | 2025-08-18 19:20:30 UTC | 
SP500 dataset
Description
The S&P 500 dataset from State Street Global Advisors with the collection period from Jan 2013 to Nov 2023.
Format
list
Source
State Street Global Advisors.
Examples
data("SP500")
names(SP500)
Bayesian Estimation of a Banded Covariance Matrix
Description
Provides a post-processed posterior for Bayesian inference of a banded covariance matrix.
Usage
bandPPP(X, k, eps, prior = list(), nsample = 2000)
Arguments
X | 
 a n   | 
k | 
 a scalar value (natural number) specifying the bandwidth of covariance matrix.  | 
eps | 
 a small positive number decreasing to   | 
prior | 
 a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
  | 
nsample | 
 a scalar value giving the number of the post-processed posterior samples.  | 
Details
Lee, Lee, and Lee (2023+) proposed a two-step procedure generating samples from the post-processed posterior for Bayesian inference of a banded covariance matrix:
Initial posterior computing step: Generate random samples from the following initial posterior obtained by using the inverse-Wishart prior
IW_p(B_0, \nu_0)\Sigma \mid X_1, \ldots, X_n \sim IW_p(B_0 + nS_n, \nu_0 + n),where
S_n = n^{-1}\sum_{i=1}^{n}X_iX_i^\top.Post-processing step: Post-process the samples generated from the initial samples
\Sigma_{(i)} := \left\{\begin{array}{ll}B_{k}(\Sigma^{(i)}) + \left[\epsilon_n - \lambda_{\min}\{B_{k}(\Sigma^{(i)})\}\right]I_p, & \mbox{ if } \lambda_{\min}\{B_{k}(\Sigma^{(i)})\} < \epsilon_n, \\ B_{k}(\Sigma^{(i)}), & \mbox{ otherwise }, \end{array}\right.
where \Sigma^{(1)}, \ldots, \Sigma^{(N)} are the initial posterior samples,
\epsilon_n is a small positive number decreasing to 0 as n \rightarrow \infty,
and B_k(B) denotes the k-band operation given as
B_{k}(B) = \{b_{ij}I(|i - j| \le k)\} \mbox{ for any } B = (b_{ij}) \in R^{p \times p}.
For more details, see Lee, Lee and Lee (2023+).
Value
Sigma | 
 a nsample   | 
p | 
 dimension of covariance matrix.  | 
Author(s)
Kwangmin Lee
References
Lee, K., Lee, K., and Lee, J. (2023+), "Post-processes posteriors for banded covariances", Bayesian Analysis, DOI: 10.1214/22-BA1333.
See Also
cv.bandPPP estimate
Examples
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X,2,0.01,nsample=100)
Bayesian Sparse Covariance Estimation
Description
Provides a Bayesian sparse and positive definite estimate of a covariance matrix via the beta-mixture shrinkage prior.
Usage
bmspcov(
  X,
  Sigma,
  prior = list(),
  nsample = list(),
  nchain = 1,
  seed = NULL,
  do.parallel = FALSE,
  show_progress = TRUE
)
Arguments
X | 
 a n   | 
Sigma | 
 an initial guess for Sigma.  | 
prior | 
 a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
  | 
nsample | 
 a list giving the MCMC parameters.
The list includes the following integers (with default values in parentheses):
  | 
nchain | 
 number of MCMC chains to run. Default is 1.  | 
seed | 
 random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1.  | 
do.parallel | 
 logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured.  | 
show_progress | 
 logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel.  | 
Details
Lee, Jo and Lee (2022) proposed the beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix.
The beta-mixture shrinkage prior for \Sigma = (\sigma_{jk}) is defined as
 \pi(\Sigma) = \frac{\pi^{u}(\Sigma)I(\Sigma \in C_p)}{\pi^{u}(\Sigma \in C_p)}, ~ C_p = \{\mbox{ all } p \times p \mbox{ positive definite matrices }\},
where \pi^{u}(\cdot) is the unconstrained prior given by
\pi^{u}(\sigma_{jk} \mid \rho_{jk}) = N\left(\sigma_{jk} \mid 0, \frac{\rho_{jk}}{1 - \rho_{jk}}\tau_1^2\right)
\pi^{u}(\rho_{jk}) = Beta(\rho_{jk} \mid a, b), ~ \rho_{jk} = 1 - 1/(1 + \phi_{jk})
\pi^{u}(\sigma_{jj}) = Exp(\sigma_{jj} \mid \lambda).
For more details, see Lee, Jo and Lee (2022).
Value
Sigma | 
 a nmc   | 
Phi | 
 a nmc   | 
p | 
 dimension of covariance matrix.  | 
mcmctime | 
 elapsed time for MCMC sampling. For multiple chains, this becomes a list.  | 
nchain | 
 number of chains used.  | 
burnin | 
 number of burn-in samples discarded.  | 
nmc | 
 number of MCMC samples retained for analysis.  | 
Author(s)
Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee
References
Lee, K., Jo, S., and Lee, J. (2022), "The beta-mixture shrinkage prior for sparse covariances with near-minimax posterior convergence rate", Journal of Multivariate Analysis, 192, 105067.
See Also
sbmspcov
Examples
set.seed(1)
n <- 20
p <- 5
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X))))
post.est.m <- bspcov::estimate(fout)
sqrt(mean((post.est.m - True.Sigma)^2))
sqrt(mean((cov(X) - True.Sigma)^2))
# Multiple chains example with parallel processing:
# fout_multi <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X))), 
#                               nchain = 4, do.parallel = TRUE)
# post.est.multi <- bspcov::estimate(fout_multi)
# When do.parallel = TRUE, the function automatically sets up 
# a multisession plan with nchain workers for parallel execution.
colon dataset
Description
The colon cancer dataset, which includes gene expression values from 22 colon tumor tissues and 40 non-tumor tissues.
Format
data.frame
Source
http://genomics-pubs.princeton.edu/oncology/affydata/.
Examples
data("colon")
head(colon)
CV for Bayesian Estimation of a Banded Covariance Matrix
Description
Performs leave-one-out cross-validation (LOOCV) to calculate the predictive log-likelihood for a post-processed posterior of a banded covariance matrix and selects the optimal parameters.
Usage
cv.bandPPP(X, kvec, epsvec, prior = list(), nsample = 2000, ncores = 2)
Arguments
X | 
 a n   | 
kvec | 
 a vector of natural numbers specifying the bandwidth of covariance matrix.  | 
epsvec | 
 a vector of small positive numbers decreasing to   | 
prior | 
 a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
  | 
nsample | 
 a scalar value giving the number of the post-processed posterior samples.  | 
ncores | 
 a scalar value giving the number of CPU cores.  | 
Details
The predictive log-likelihood for each k and \epsilon_n is estimated as follows:
 \sum_{i=1}^n \log S^{-1} \sum_{s=1}^S p(X_i \mid B_k^{(\epsilon_n)}(\Sigma_{i,s})),
 
where X_i is the ith observation, \Sigma_{i,s} is the sth posterior sample based on (X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n), and B_k^{(\epsilon_n)} represents the banding post-processing function.
For more details, see (3) in Lee, Lee and Lee (2023+).
Value
elpd | 
 a M   | 
Author(s)
Kwangmin Lee
References
Lee, K., Lee, K., and Lee, J. (2023+), "Post-processes posteriors for banded covariances", Bayesian Analysis, DOI: 10.1214/22-BA1333.
Gelman, A., Hwang, J., and Vehtari, A. (2014). "Understanding predictive information criteria for Bayesian models." Statistics and computing, 24(6), 997-1016.
See Also
bandPPP
Examples
Sigma0 <- diag(1,50)
X <- mvtnorm::rmvnorm(25,sigma = Sigma0)
kvec <- 1:2
epsvec <- c(0.01,0.05)
res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4)
plot(res)
CV for Bayesian Estimation of a Sparse Covariance Matrix
Description
Performs cross-validation to estimate spectral norm error for a post-processed posterior of a sparse covariance matrix.
Usage
cv.thresPPP(
  X,
  thresvec,
  epsvec,
  prior = NULL,
  thresfun = "hard",
  nsample = 2000,
  ncores = 2
)
Arguments
X | 
 a n   | 
thresvec | 
 a vector of real numbers specifying the parameter of the threshold function.  | 
epsvec | 
 a vector of small positive numbers decreasing to   | 
prior | 
 a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
  | 
thresfun | 
 a string to specify the type of threshold function.   | 
nsample | 
 a scalar value giving the number of the post-processed posterior samples.  | 
ncores | 
 a scalar value giving the number of CPU cores.  | 
Details
Given a set of train data and validation data, the spectral norm error for each \gamma and \epsilon_n is estimated as follows:
 ||\hat{\Sigma}(\gamma,\epsilon_n)^{(train)} - S^{(val)}||_2
 
where \hat{\Sigma}(\gamma,\epsilon_n)^{(train)} is the estimate for the covariance based on the train data and S^{(val)} is the sample covariance matrix based on the validation data.
The spectral norm error is estimated by the 10-fold cross-validation.
For more details, see the first paragraph on page 9 in Lee and Lee (2023).
Value
CVdf | 
 a M   | 
Author(s)
Kwangmin Lee
References
Lee, K. and Lee, J. (2023), "Post-processes posteriors for sparse covariances", Journal of Econometrics, 236(3), 105475.
See Also
thresPPP
Examples
Sigma0 <- diag(1,50)
X <- mvtnorm::rmvnorm(25,sigma = Sigma0)
thresvec <- c(0.01,0.1)
epsvec <- c(0.01,0.1)
res <- bspcov::cv.thresPPP(X,thresvec,epsvec,nsample=100)
plot(res)
Point-estimate of posterior distribution
Description
Compute the point estimate (mean) to describe posterior distribution. For multiple chains, combines all chains to compute a more robust estimate.
Usage
estimate(object, ...)
## S3 method for class 'bspcov'
estimate(object, ...)
Arguments
object | 
 an object from bandPPP, bmspcov, sbmspcov, and thresPPP.  | 
... | 
 additional arguments for estimate.  | 
Value
Sigma | 
 the point estimate (mean) of covariance matrix. For multiple chains, uses combined samples from all chains.  | 
Author(s)
Seongil Jo and Kyeongwon Lee
See Also
plot.postmean.bspcov
Examples
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X,2,0.01,nsample=100)
est <- bspcov::estimate(res)
Plot Diagnostics of Posterior Samples and Cross-Validation
Description
Provides a trace plot of posterior samples and a plot of a learning curve for cross-validation. For multiple chains, each chain is displayed in a different color for convergence assessment.
Usage
## S3 method for class 'bspcov'
plot(x, ..., cols, rows)
Arguments
x | 
 an object from bmspcov, sbmspcov, cv.bandPPP, and cv.thresPPP.  | 
... | 
 additional arguments for ggplot2.  | 
cols | 
 a scalar or a vector including specific column indices for the trace plot.  | 
rows | 
 a scalar or a vector including specific row indices greater than or equal to columns indices for the trace plot.  | 
Value
plot | 
 a plot for diagnostics of posterior samples x. For multiple chains, returns colored trace plots with each chain in a different color.  | 
Author(s)
Seongil Jo and Kyeongwon Lee
Examples
set.seed(1)
n <- 100
p <- 20
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
plot(fout, cols = c(1, 3, 4), rows = c(1, 3, 4))
plot(fout, cols = 1, rows = 1:3)
# Cross-Validation for Banded Structure
Sigma0 <- diag(1,50)
X <- mvtnorm::rmvnorm(25,sigma = Sigma0)
kvec <- 1:2
epsvec <- c(0.01,0.05)
res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4)
plot(res)
Plot method for postmean.bspcov objects
Description
Create heatmap visualization for posterior mean estimate of sparse covariance matrix. Provides flexible visualization options with customizable aesthetics and labeling.
Usage
## S3 method for class 'postmean.bspcov'
plot(
  x,
  title = NULL,
  color_limits = NULL,
  color_low = "black",
  color_high = "white",
  base_size = 14,
  legend_position = "bottom",
  x_label = "Variable",
  y_label = "Variable",
  show_values = FALSE,
  ...
)
Arguments
x | 
 an object of class   | 
title | 
 character string for plot title. If NULL, auto-generated title is used.  | 
color_limits | 
 optional vector of length 2 specifying color scale limits. If NULL, computed from data.  | 
color_low | 
 color for low values in heatmap. Default is "black".  | 
color_high | 
 color for high values in heatmap. Default is "white".  | 
base_size | 
 base font size for plot theme. Default is 14.  | 
legend_position | 
 position of legend. Default is "bottom".  | 
x_label | 
 label for x-axis. Default is "Variable".  | 
y_label | 
 label for y-axis. Default is "Variable".  | 
show_values | 
 logical indicating whether to display values on tiles. Default is FALSE.  | 
... | 
 additional arguments passed to plotting functions.  | 
Value
A ggplot object showing heatmap visualization of the posterior mean covariance matrix.
Author(s)
Seongil Jo, Kyeongwon Lee
See Also
estimate, plot.bspcov, plot.quantile.bspcov
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100)
est <- bspcov::estimate(res)
# Basic plot
plot(est)
# Plot with custom color scheme
plot(est, color_low = "blue", color_high = "red")
Plot method for quantile.bspcov objects
Description
Create visualization plots for posterior quantiles of covariance matrices. Produces heatmap visualizations showing uncertainty across different quantile levels.
Usage
## S3 method for class 'quantile.bspcov'
plot(
  x,
  type = "heatmap",
  titles = NULL,
  ncol = 3,
  color_limits = NULL,
  color_low = "black",
  color_high = "white",
  base_size = 14,
  legend_position = "bottom",
  x_label = "Variable",
  y_label = "Variable",
  width = NULL,
  height = 6,
  ...
)
Arguments
x | 
 an object of class   | 
type | 
 character string specifying plot type. Options are "heatmap" (default), "uncertainty", or "comparison".  | 
titles | 
 character vector of titles for each quantile plot. If NULL, auto-generated titles are used.  | 
ncol | 
 number of columns in the plot layout. Default is 3.  | 
color_limits | 
 optional vector of length 2 specifying color scale limits. If NULL, computed from data.  | 
color_low | 
 color for low values in heatmap. Default is "black".  | 
color_high | 
 color for high values in heatmap. Default is "white".  | 
base_size | 
 base font size for plot theme. Default is 14.  | 
legend_position | 
 position of legend. Default is "bottom".  | 
x_label | 
 label for x-axis. Default is "Variable".  | 
y_label | 
 label for y-axis. Default is "Variable".  | 
width | 
 plot width when saving. Default is calculated based on number of quantiles.  | 
height | 
 plot height when saving. Default is 6.  | 
... | 
 additional arguments passed to plotting functions.  | 
Value
A ggplot object (single quantile) or patchwork object (multiple quantiles) showing heatmap visualizations.
Author(s)
Kyeongwon Lee
See Also
quantile, plot.bspcov, plot.postmean.bspcov
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100)
quant <- quantile(res)
# Plot uncertainty visualization
plot(quant)
# Plot with custom titles and labels
plot(quant, titles = c("Lower Bound", "Median", "Upper Bound"),
     x_label = "Variable", y_label = "Variable")
Preprocess SP500 data
Description
The proc_SP500 function preprocesses the SP500 stock data by calculating monthly
returns for selected sectors and generating idiosyncratic errors.
Usage
proc_SP500(SP500data, sectors)
Arguments
SP500data | 
 A data frame containing SP500 stock data with columns including: 
  | 
sectors | 
 A character vector specifying the sectors to include in the analysis.  | 
Details
Calculates monthly returns for each stock in the specified sectors
Estimates the number of factors via
hdbinseg::get.factor.model(ic="ah")Uses
POET::POET()to extract factor loadings/factors and form idiosyncratic errors
Value
A list with components:
Uhat | 
 A matrix of idiosyncratic errors.  | 
Khat | 
 Estimated number of factors.  | 
factorparthat | 
 Estimated factor returns.  | 
sectornames | 
 Sector for each column of   | 
Examples
data("SP500")
set.seed(1234)
sectors <- c("Energy", "Financials", "Materials",
   "Real Estate", "Utilities", "Information Technology")
Uhat <- proc_SP500(SP500, sectors)$Uhat
PPPres <- thresPPP(Uhat, eps = 0, thres = list(value = 0.0020, fun = "hard"), nsample = 100)
postmean <- estimate(PPPres)
diag(postmean) <- 0 # hide color for diagonal
plot(postmean)
Preprocess Colon Gene Expression Data
Description
The proc_colon function preprocesses colon gene expression data by:
Log transforming the raw counts.
Performing two-sample t-tests for each gene between normal and tumor samples.
Selecting the top 50 genes by absolute t-statistic.
Returning the filtered expression matrix and sample indices/groups.
Usage
proc_colon(colon, tissues)
Arguments
colon | 
 A numeric matrix of raw colon gene expression values (genes × samples). Rows are genes; columns are samples.  | 
tissues | 
 A numeric vector indicating tissue type per sample: positive for normal, negative for tumor.  | 
Value
A list with components:
- X
 A numeric matrix (samples x 50 genes) of selected, log‐transformed expression values.
- normal_idx
 Integer indices of normal‐tissue columns in the original data.
- tumor_idx
 Integer indices of tumor‐tissue columns in the original data.
- group
 Integer vector of length
ncol(colon), with 1 = normal, 2 = tumor.
Examples
data("colon")
data("tissues")
set.seed(1234)
colon_data <- proc_colon(colon, tissues)
X <- colon_data$X
foo <- bmspcov(X, Sigma = cov(X))
sigmah <- estimate(foo)
Quantiles of posterior distribution
Description
Compute quantiles to describe posterior distribution. For multiple chains, combines all chains to compute more robust quantiles.
Usage
## S3 method for class 'bspcov'
quantile(x, probs = c(0.025, 0.5, 0.975), ...)
Arguments
x | 
 an object from bandPPP, bmspcov, sbmspcov, and thresPPP.  | 
probs | 
 numeric vector of probabilities with values in [0,1]. Default is c(0.025, 0.5, 0.975).  | 
... | 
 additional arguments for quantile.  | 
Value
quantiles | 
 a list containing quantile matrices for each probability level. For multiple chains, uses combined samples from all chains.  | 
Author(s)
Kyeongwon Lee
See Also
estimate, plot.postmean.bspcov
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X, 2, 0.01, nsample=100)
quant <- quantile(res)
# Get 95% credible intervals
quant <- quantile(res, probs = c(0.025, 0.975))
Save quantile plot to file
Description
Convenience function to save quantile plots with appropriate dimensions.
Usage
save_quantile_plot(x, filename, width = NULL, height = 6, ...)
Arguments
x | 
 an object of class   | 
filename | 
 filename to save the plot.  | 
width | 
 plot width. If NULL, calculated based on number of quantiles.  | 
height | 
 plot height. Default is 6.  | 
... | 
 additional arguments passed to   | 
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100)
quant <- quantile(res)
Bayesian Sparse Covariance Estimation using Sure Screening
Description
Provides a Bayesian sparse and positive definite estimate of a covariance matrix via screened beta-mixture prior.
Usage
sbmspcov(
  X,
  Sigma,
  cutoff = list(),
  prior = list(),
  nsample = list(),
  nchain = 1,
  seed = NULL,
  do.parallel = FALSE,
  show_progress = TRUE
)
Arguments
X | 
 a n   | 
Sigma | 
 an initial guess for Sigma.  | 
cutoff | 
 a list giving the information for the threshold.
The list includes the following parameters (with default values in parentheses):
  | 
prior | 
 a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
  | 
nsample | 
 a list giving the MCMC parameters.
The list includes the following integers (with default values in parentheses):
  | 
nchain | 
 number of MCMC chains to run. Default is 1.  | 
seed | 
 random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1.  | 
do.parallel | 
 logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured.  | 
show_progress | 
 logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel.  | 
Details
Lee, Jo, Lee, and Lee (2023+) proposed the screened beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix.
The screened beta-mixture shrinkage prior for \Sigma = (\sigma_{jk}) is defined as
 \pi(\Sigma) = \frac{\pi^{u}(\Sigma)I(\Sigma \in C_p)}{\pi^{u}(\Sigma \in C_p)}, ~ C_p = \{\mbox{ all } p \times p \mbox{ positive definite matrices }\},
where \pi^{u}(\cdot) is the unconstrained prior given by
\pi^{u}(\sigma_{jk} \mid \psi_{jk}) = N\left(\sigma_{jk} \mid 0, \frac{\psi_{jk}}{1 - \psi_{jk}}\tau_1^2\right), ~ \psi_{jk} = 1 - 1/(1 + \phi_{jk})
\pi^{u}(\psi_{jk}) = Beta(\psi_{jk} \mid a, b) ~ \mbox{for } (j, k) \in S_r(\hat{R})
\pi^{u}(\sigma_{jj}) = Exp(\sigma_{jj} \mid \lambda),
where S_r(\hat{R}) = \{(j,k) : 1 < j < k \le p, |\hat{\rho}_{jk}| > r\}, \hat{\rho}_{jk} are sample correlations, and r is a threshold given by user.
For more details, see Lee, Jo, Lee and Lee (2022+).
Value
Sigma | 
 a nmc   | 
p | 
 dimension of covariance matrix.  | 
Phi | 
 a nmc   | 
INDzero | 
 a list including indices of off-diagonal elements screened by sure screening. For multiple chains, this becomes a list of lists.  | 
cutoff | 
 the cutoff value specified by FNR-approach. For multiple chains, this becomes a list.  | 
mcmctime | 
 elapsed time for MCMC sampling. For multiple chains, this becomes a list.  | 
nchain | 
 number of chains used.  | 
burnin | 
 number of burn-in samples discarded.  | 
nmc | 
 number of MCMC samples retained for analysis.  | 
Author(s)
Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee
References
Lee, K., Jo, S., Lee, K., and Lee, J. (2023+), "Scalable and optimal Bayesian inference for sparse covariance matrices via screened beta-mixture prior", arXiv:2206.12773.
See Also
bmspcov
Examples
set.seed(1)
n <- 20
p <- 5
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
post.est.m <- bspcov::estimate(fout)
sqrt(mean((post.est.m - True.Sigma)^2))
sqrt(mean((cov(X) - True.Sigma)^2))
# Multiple chains example with parallel processing:
# fout_multi <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))), 
#                                nchain = 4, seed = 123, do.parallel = TRUE)
# post.est.multi <- bspcov::estimate(fout_multi)
# summary(fout_multi, cols = 1, rows = 1:3)  # Shows MCMC diagnostics
# plot(fout_multi, cols = 1, rows = 1:3)     # Shows colored trace plots
# When do.parallel = TRUE, the function automatically sets up 
# a multisession plan with nchain workers for parallel execution.
Summary of Posterior Distribution
Description
Provides the summary statistics for posterior samples of covariance matrix.
Usage
## S3 method for class 'bspcov'
summary(object, cols, rows, quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975), ...)
Arguments
object | 
 an object from bandPPP, bmspcov, sbmspcov, and thresPPP.  | 
cols | 
 a scalar or a vector including specific column indices.  | 
rows | 
 a scalar or a vector including specific row indices greater than or equal to columns indices.  | 
quantiles | 
 a numeric vector of quantiles to compute. Default is c(0.025, 0.25, 0.5, 0.75, 0.975).  | 
... | 
 additional arguments for the summary function.  | 
Value
summary | 
 a table of summary statistics including empirical mean, standard deviation, and quantiles for posterior samples. For multiple chains, also includes effective sample size (n_eff) and R-hat diagnostics.  | 
Note
If both cols and rows are vectors, they must have the same length.
Author(s)
Seongil Jo and Kyeongwon Lee
Examples
# Example with simulated data
set.seed(1)
n <- 20
p <- 5
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
summary(fout, cols = c(1, 3, 4), rows = c(1, 3, 4))
summary(fout, cols = 1, rows = 1:p)
# Custom quantiles:
summary(fout, cols = 1, rows = 1:3, quantiles = c(0.05, 0.5, 0.95))
Bayesian Estimation of a Sparse Covariance Matrix
Description
Provides a post-processed posterior (PPP) for Bayesian inference of a sparse covariance matrix.
Usage
thresPPP(X, eps, thres = list(), prior = list(), nsample = 2000)
Arguments
X | 
 a n   | 
eps | 
 a small positive number decreasing to   | 
thres | 
 a list giving the information for thresholding PPP procedure.
The list includes the following parameters (with default values in parentheses):
  | 
prior | 
 a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
  | 
nsample | 
 a scalar value giving the number of the post-processed posterior samples.  | 
Details
Lee and Lee (2023) proposed a two-step procedure generating samples from the post-processed posterior for Bayesian inference of a sparse covariance matrix:
Initial posterior computing step: Generate random samples from the following initial posterior obtained by using the inverse-Wishart prior
IW_p(B_0, \nu_0)\Sigma \mid X_1, \ldots, X_n \sim IW_p(B_0 + nS_n, \nu_0 + n),where
S_n = n^{-1}\sum_{i=1}^{n}X_iX_i^\top.Post-processing step: Post-process the samples generated from the initial samples
\Sigma_{(i)} := \left\{\begin{array}{ll}H_{\gamma_n}(\Sigma^{(i)}) + \left[\epsilon_n - \lambda_{\min}\{H_{\gamma_n}(\Sigma^{(i)})\}\right]I_p, & \mbox{ if } \lambda_{\min}\{H_{\gamma_n}(\Sigma^{(i)})\} < \epsilon_n, \\ H_{\gamma_n}(\Sigma^{(i)}), & \mbox{ otherwise }, \end{array}\right.
where \Sigma^{(1)}, \ldots, \Sigma^{(N)} are the initial posterior samples,
\epsilon_n is a positive real number, and H_{\gamma_n}(\Sigma) denotes the generalized threshodling operator given as
(H_{\gamma_n}(\Sigma))_{ij} = \left\{\begin{array}{ll}\sigma_{ij}, & \mbox{ if } i = j, \\
h_{\gamma_n}(\sigma_{ij}), & \mbox{ if } i \neq j, \end{array}\right.
where \sigma_{ij} is the (i,j) element of \Sigma and h_{\gamma_n}(\cdot) is a generalized thresholding function.
For more details, see Lee and Lee (2023).
Value
Sigma | 
 a nsample   | 
p | 
 dimension of covariance matrix.  | 
Author(s)
Kwangmin Lee
References
Lee, K. and Lee, J. (2023), "Post-processes posteriors for sparse covariances", Journal of Econometrics.
See Also
cv.thresPPP
Examples
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100)
est <- bspcov::estimate(res)
tissues dataset
Description
The tissues data of colon cancer dataset, which includes gene expression values from 22 colon tumor tissues and 40 non-tumor tissues.
Format
numeric
Source
http://genomics-pubs.princeton.edu/oncology/affydata/.
Examples
data("tissues")
head(tissues)