Title: | Gauss - Dantzig Selector: Aggregation over Random Models |
Version: | 0.1.1 |
Description: | The method aims to identify important factors in screening experiments by aggregation over random models as studied in Singh and Stufken (2022) <doi:10.48550/arXiv.2205.13497>. This package provides functions to run the Gauss-Dantzig selector on screening experiments when interactions may be affecting the response. Currently, all functions require each factor to be at two levels coded as +1 and -1. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.0 |
Depends: | R (≥ 2.10) |
LazyData: | true |
Imports: | lpSolve |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
URL: | https://github.com/agrakhi/GDSARM |
BugReports: | https://github.com/agrakhi/GDSARM/issues |
Author: | Rakhi Singh |
Maintainer: | Rakhi Singh <agrakhi@gmail.com> |
Packaged: | 2022-07-13 20:04:06 UTC; R_SINGH5 |
Repository: | CRAN |
Date/Publication: | 2022-07-13 20:20:02 UTC |
Gauss-Dantzig Selector - Aggregation over Random Models (GDS-ARM)
Description
The GDS-ARM procedure consists of three steps. First, it runs
the Gauss Dantzig Selector (GDS) nrep
times, each time
with a different set of nint
randomly selected two-factor interactions.
All m
main effects are included in each GDS run. Second, the best
ntop
models are identified with the smallest BIC. Effects that
appear in at least pkeep x ntop
of the ntop
models are then passed on to the third stage. In the third stage, stepwise
regression is used. With
n
being the number of runs, the stepwise regression starts
with at most n-3
selected effects from the previous step. The
remaining effects from the previous step as well as all main effects are
given a chance to enter into the model using the forward-backward stepwise
regression. The function also has the option of using the modified GDS-ARM.
The modified version incorporates effect heredity in two steps, first, for
each model found by GDS, we ignore active interactions when at least one of
the main effects is not active (for weak heredity) or when both main effects are not
active (for strong heredity); and second, we do the same for the model found
after the stepwise stage of GDS-ARM.
Usage
GDSARM(
delta.n = 10,
nint,
nrep,
ntop,
pkeep,
design,
Y,
cri.penter = 0.01,
cri.premove = 0.05,
opt.heredity = c("none"),
seedvalue = 1234
)
Arguments
delta.n |
a positive integer suggesting the number of delta values
to be tried. |
nint |
a positive integer representing the number of randomly
chosen interactions. The suggested value to use is the ceiling of 20%
of the total number of interactions, that is, for |
nrep |
a positive integer representing the number of times GDS should
be run. The suggested value is |
ntop |
a positive integer representing the number of top models
to be selected among the |
pkeep |
a number between 0 and 1 representing the proportion of |
design |
a |
Y |
a vector of |
cri.penter |
the p-value cutoff for the most significant effect to enter into the stepwise regression model. The suggested value is 0.01. |
cri.premove |
the p-value cutoff for the least significant effect to exit from the stepwise regression model. The suggested value is 0.05. |
opt.heredity |
a string with either |
seedvalue |
a seed value that will fix the set of interactions being selected. The default value is seed to 1234. |
Value
A list returning the selected effects as well as the corresponding important factors.
Source
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Dopico-Garc\' ia, M.S., Valentao, P., Guerra, L., Andrade, P. B., and Seabra, R. M. (2007). Experimental design for extraction and quantification of phenolic compounds and organic acids in white "Vinho Verde" grapes Analytica Chimica Acta, 583(1): 15–22.
Hamada, M. and Wu, C. F. J. (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24 (3), 130–137.
Hunter, G. B., Hodi, F. S. and Eagar, T. W. (1982). High cycle fatigue of weld repaired cast Ti-6AI-4V. Metallurgical Transactions A 13 (9), 1589–1594.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
Singh, R. and Stufken, J. (2022). Factor selection in screening experiments by aggregation over random models, 1–31. doi: 10.48550/arXiv.2205.13497
See Also
Examples
data(dataHamadaWu)
X = dataHamadaWu[,-8]
Y = dataHamadaWu[,8]
delta.n = 10
n = dim(X)[1]
m = dim(X)[2]
nint = ceiling(0.2*choose(m,2))
nrep = choose(m,2)
ntop = max(20, nint*nrep/(2*choose(m,2)))
pkeep = 0.25
cri.penter = 0.01
cri.premove = 0.05
design = X
# GDS-ARM with default values
GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove)
# GDS-ARM with default values but with weak heredity
opt.heredity="weak"
GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove, opt.heredity)
data(dataCompoundExt)
X = dataCompoundExt[,-9]
Y = dataCompoundExt[,9]
delta.n = 10
n = dim(X)[1]
m = dim(X)[2]
nint = ceiling(0.2*choose(m,2))
nrep = choose(m,2)
ntop = max(20, nint*nrep/(2*choose(m,2)))
pkeep = 0.25
cri.penter = 0.01
cri.premove = 0.05
design = X
# GDS-ARM on compound extraction
GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove)
# GDS-ARM on compound extraction with strong heredity
opt.heredity = "strong"
GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove, opt.heredity)
Gauss-Dantzig Selector
Description
This function runs the Gauss-Dantzig selector on the given columns.
We have two options: either (a) GDS(m) on the m
main
effects, and (b) GDS(m+2fi) on the m
main effects and the corresponding two-factor interactions.
For a given delta
, DS minimizes the L_1-norm (sum of absolute values)
of beta
subject to the constraint that max(|t(X)(y-X * beta)|)
<= delta
.
The GDS is run for multiple values of delta
. We use kmeans and BIC to select a best model.
Usage
GDS_givencols(delta.n = 10, design, Y, which.cols = c("main2fi"))
Arguments
delta.n |
a positive integer suggesting the number of delta values
to be tried. |
design |
a |
Y |
a vector of |
which.cols |
a string with either |
Value
A list returning the selected effects as well as the corresponding important factors.
Source
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Dopico-Garc\' ia, M.S., Valentao, P., Guerra, L., Andrade, P. B., and Seabra, R. M. (2007). Experimental design for extraction and quantification of phenolic compounds and organic acids in white "Vinho Verde" grapes Analytica Chimica Acta, 583(1): 15–22.
Hamada, M. and Wu, C. F. J. (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24 (3), 130–137.
Hunter, G. B., Hodi, F. S. and Eagar, T. W. (1982). High cycle fatigue of weld repaired cast Ti-6AI-4V. Metallurgical Transactions A 13 (9), 1589–1594.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
Singh, R. and Stufken, J. (2022). Factor selection in screening experiments by aggregation over random models, 1–31. doi: 10.48550/arXiv.2205.13497
See Also
Examples
data(dataHamadaWu)
X = dataHamadaWu[,-8]
Y = dataHamadaWu[,8]
delta.n = 10
# GDS on main effects
GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main")
# GDS on main effects and two-factor interactions
GDS_givencols(delta.n, design = X, Y=Y)
data(dataCompoundExt)
X = dataCompoundExt[,-9]
Y = dataCompoundExt[,9]
delta.n = 10
# GDS on main effects
GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main")
# GDS on main effects and two-factor interactions
GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main2fi")
Step III: Stepwise on the consolidated output from different GDS runs
Description
Runs the stepwise regression on the output received from
top models of the consolidated output of different GDS runs. With
n
being the number of runs, the stepwise regression starts
with at most (n-3)
selected effects from the previous step. The
remaining effects from the previous step as well as all main effects are
given a chance to enter into the model using the forward-backward stepwise
regression.
Usage
StepIII_stepwise(
xstart,
xremain,
Xmain,
Xint,
Y,
cri.penter = 0.01,
cri.premove = 0.05,
opt.heredity = "none"
)
Arguments
xstart |
a vector with effects' names corresponding to the starting model. |
xremain |
a vector with effects' names corresponding to the remaining main effects and other effects that needs to be explored with stepwise regression. |
Xmain |
a |
Xint |
a matrix of |
Y |
a vector of |
cri.penter |
the p-value cutoff for the most significant effect to enter into the stepwise regression model |
cri.premove |
the p-value cutoff for the least significant effect to exit from the stepwise regression model |
opt.heredity |
a string with either |
Value
A list returning the selected effects as well as the corresponding important factors.
Source
Singh, R. and Stufken, J. (2022). Factor selection in screening experiments by aggregation over random models, 1–31. doi: 10.48550/arXiv.2205.13497
Step I: Multiple GDS runs with random interactions
Description
Runs the Gauss Dantzig Selector (GDS) multiple times, each time
with a different set of randomly selected two-factor interactions.
All m
main effects are included in each GDS run. For each set of
randomly selected interactions, the best GDS output is chosen among
delta.n
values of delta
. We use kmeans with 2
clusters and BIC to select such best model.
Usage
StepI_chooseints(
delta.n = 10,
nint,
nrep,
Xmain,
Xint,
Y,
opt.heredity = c("none")
)
Arguments
delta.n |
a positive integer suggesting the number of delta values
to be tried. |
nint |
a positive integer representing the number of randomly
chosen interactions. The suggested value to use is the ceiling of 20%
of the total number of interactions, that is, for |
nrep |
a positive integer representing the number of times GDS should
be run. The suggested value is |
Xmain |
a |
Xint |
a matrix of |
Y |
a vector of |
opt.heredity |
a string with either |
Value
A list containing the (a) matrix of the output of each GDS run with each row representing the selected effects from the corresponding GDS run, (b) a vector with the corresponding BIC values of each model.
Source
Singh, R. and Stufken, J. (2022). Factor selection in screening experiments by aggregation over random models, 1–31. doi: 10.48550/arXiv.2205.13497
Dantzig selector with an option to make profile plot
Description
The Dantzig selector (DS) finds a solution for the model parameters
of a linear model, beta
using linear programming. For a given delta
,
DS minimizes the L_1-norm (sum of absolute values) of beta
subject to the constraint
that max(|t(X)(y-X * beta)|) <= delta
.
Usage
dantzig.delta(X, y, delta, plot = FALSE)
Arguments
X |
a design matrix. |
y |
a vector of responses. |
delta |
a vector with the values of |
plot |
a boolean value of either TRUE or FALSE with TRUE indicating that the profile plot should be drawn. |
Value
A matrix of the estimated values of beta
with each
row corresponding to a particular value of delta
.
Source
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
See Also
Examples
data(dataHamadaWu)
X = dataHamadaWu[,-8]
Y = dataHamadaWu[,8]
#scale and center X and y
scaleX = base::scale(X, center= TRUE, scale = TRUE)
scaleY = base::scale(Y, center= TRUE, scale = FALSE)
maxDelta = max(abs(t(scaleX)%*%matrix(scaleY, ncol=1)))
# Dantzig Selector on 4 equally spaced delta values between 0 and maxDelta
dantzig.delta(scaleX, scaleY, delta = seq(0,maxDelta,length.out=4))
Dantzig selector using the lpsolve package
Description
The Dantzig selector (DS) finds a solution for the model parameters
of a linear model, beta
using linear programming. For a given delta
,
DS minimizes the L_1-norm (sum of absolute values) of beta
subject to the constraint
that max(|t(X)(y-X * beta)|)
<= delta
.
Usage
dantzigS(X, y, delta, scale.X = 1)
Arguments
X |
a design matrix. |
y |
a vector of responses. |
delta |
the specific value of |
scale.X |
a number by which each column of |
Value
A list containing the (a) opt
(Value of objective function
at optimum), (b) status
(Numeric indicator: 0 = success,
2 = no feasible solution), (c) beta
(the estimated values
of beta
), (d) delta
Source
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
Compound Extraction experiment of Dopico-Garc\' ia et al. (2007)
Description
An analytical experiment conducted by Dopico-Garc\' ia et al. (2007) to characterize the chemical composition of white grapes simultaneously determining the most important phenolic compounds and organic acids for the grapes. This example has been further studied in Phoa et al. (2009b) for one phenolic compound, kaempferol-3-Orutinoside + isorhamnetin-3-O glucoside, which is also what we studied. It is accepted for these data that fitting a main-effects model suggests that V3 (Factor C), V4 (Factor D), and interaction V1:V4 (A:D) are active effects.
Usage
data(dataCompoundExt)
Format
A data frame with 12 rows and 9 columns:
- V1
Factor A
- V2
Factor B
- V3
Factor C
- V4
Factor D
- V5
Factor E
- V6
Factor F
- V7
Factor G
- V8
Factor H
- V9
Response
Source
Dopico-Garc\' ia, M.S., Valentao, P., Guerra, L., Andrade, P. B., and Seabra, R. M. (2007). Experimental design for extraction and quantification of phenolic compounds and organic acids in white "Vinho Verde" grapes Analytica Chimica Acta, 583(1): 15–22.
Phoa, F. K., Wong, W. K., and Xu, H (2009b). The need of considering the interactions in the analysis of screening designs. Journal of Chemometrics: A Journal of the Chemometrics Society, 23(10): 545–553.
Examples
data(dataCompoundExt)
X = dataCompoundExt[,-9]
Y= dataCompoundExt[,9]
Cast fatigue experiment of Hunter et al. (1982)
Description
A cast fatigue experiment with 12 runs and 7 factors was originally studied by Hunter et al. (1982), and was later revisited by Hamada and Wu (1992) and Phoa et al. (2009), among others. It is widely accepted for these data that V6 (F) and interaction V6:V7 (F:G) are active effects, with interaction of V1:V5 (A:E) possibly being active as well.
Usage
data(dataHamadaWu)
Format
A data frame with 12 rows and 8 columns:
- V1
Factor A
- V2
Factor B
- V3
Factor C
- V4
Factor D
- V5
Factor E
- V6
Factor F
- V7
Factor G
- V8
Response
Source
Hamada, M. and C. F. J. Wu (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24 (3), 130–137.
Hunter, G. B., F. S. Hodi, and T. W. Eagar (1982). High cycle fatigue of weld repaired cast Ti-6AI-4V. Metallurgical Transactions A 13 (9), 1589–1594.
Phoa, F. K., Y. H. Pan, and H. Xu (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
Examples
data(dataHamadaWu)
X = dataHamadaWu[,-8]
Y= dataHamadaWu[,8]