The goal of the svyROC package is to plot weighted estimates of the ROC curves and to obtain weighted estimates of the AUC.
The following functions are available:
wse, wsp: estimate the sensitivity and specificity parameters for a specific cut-off point considering sampling weights.
wroc: estimate the ROC curve considering sampling weights.
wauc: estimate the AUC considering sampling weights.
corrected.wauc: correct the optimism of the weighted estimate of the AUC by means of replicate weights.
wocp: calculate optimal cut-off points for individual classification considering sampling weights.
wroc.plot: plot the ROC curve.
The methodology proposed for the above-mentioned functions can be found in the following references:
Iparragirre, A., Barrio, I., Aramendi, J. and Arostegui, I. (2022). Estimation of cut-off points under complex-sampling design data. SORT-Statistics and Operations Research Transactions 46(1), 137–158.
Iparragirre, A., Barrio, I. and Arostegui, I. (2023). Estimation of the ROC curve and the area under it with complex survey data. Stat 12(1), e635.
Iparragirre, A. and Barrio, I. (2024). Optimism Correction of the AUC with Complex Survey Data. In: Einbeck, J., Maeng, H., Ogundimu, E., Perrakis, K. (eds) Developments in Statistical Modelling. IWSM 2024. Contributions to Statistics. Springer, Cham.
To install the package from CRAN:
install.packages("svyROC")
To install the most recent version of the package from GitHub, run the following code:
devtools::install_github("aiparragirre/svyROC")
We need information on three elements for each unit in the sample in order to estimate the ROC curve (wroc() function) and the AUC (wauc() function):
response.var: the dichotomous response variable.
phat.var: the predicted probabilities of event.
weights.var: the sampling weights.
We can put these three vectors in a data frame, or save them separately in three different vectors. The data set example_data_wroc is provided as an example in the package. We also need to define the tags for events and non-events.
library(svyROC)
data(example_data_wroc)
mycurve <- wroc(response.var = "y", phat.var = "phat", weights.var = "weights",
                data = example_data_wroc,
                tag.event = 1, tag.nonevent = 0)
# Or equivalently
mycurve <- wroc(response.var = example_data_wroc$y,
                phat.var = example_data_wroc$phat,
                weights.var = example_data_wroc$weights,
                tag.event = 1, tag.nonevent = 0)
Similarly, we can run the following code to estimate the AUC:
auc.obj <- wauc(response.var = "y",
                phat.var = "phat",
                weights.var = "weights",
                tag.event = 1,
                tag.nonevent = 0,
                data = example_data_wroc)
# Or equivalently
auc.obj <- wauc(response.var = example_data_wroc$y,
                phat.var = example_data_wroc$phat,
                weights.var = example_data_wroc$weights,
                tag.event = 1, tag.nonevent = 0)
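To make the idea of a weighted AUC concrete, here is a minimal base R sketch that does not use svyROC. It assumes the usual pairwise form of the estimator: every (non-event, event) pair is weighted by the product of the two sampling weights, and ties in the predicted probabilities count one half. The helper name weighted_auc and the toy vectors are hypothetical, and the implementation inside wauc() may differ in details such as tie handling.

```r
# Hypothetical helper for illustration only; not part of svyROC.
weighted_auc <- function(y, phat, w, tag.event = 1, tag.nonevent = 0) {
  ev <- y == tag.event
  ne <- y == tag.nonevent
  # Rows index non-events, columns index events.
  less   <- outer(phat[ne], phat[ev], "<")   # non-event scored below event
  tie    <- outer(phat[ne], phat[ev], "==")  # tied scores count one half
  pair_w <- outer(w[ne], w[ev])              # product of sampling weights
  sum(pair_w * (less + 0.5 * tie)) / (sum(w[ne]) * sum(w[ev]))
}

# Toy data (made up for this sketch)
y    <- c(1, 1, 1, 0, 0, 0)
phat <- c(0.9, 0.6, 0.3, 0.7, 0.4, 0.2)
w    <- c(10, 20, 30, 5, 15, 40)

auc <- weighted_auc(y, phat, w)
```

With all weights equal, this reduces to the usual unweighted (Mann-Whitney) estimate of the AUC.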
We can correct the optimism of the weighted estimate of the AUC by
means of replicate weights, as proposed in Iparragirre and Barrio
(2024), by means of the corrected.wauc()
function. For this
purpose, we additionally need information on the covariates and the
sampling design. Here is an example of the usage of this function:
data(example_variables_wroc)
mydesign <- survey::svydesign(ids = ~cluster, strata = ~strata,
                              weights = ~weights, nest = TRUE,
                              data = example_variables_wroc)
m <- survey::svyglm(y ~ x1 + x2 + x3 + x4 + x5 + x6, design = mydesign,
                    family = quasibinomial())
phat <- predict(m, newdata = example_variables_wroc, type = "response")
myaucw <- wauc(response.var = example_variables_wroc$y, phat.var = phat,
               weights.var = example_variables_wroc$weights)
# Correction of the AUCw:
set.seed(1)
cor <- corrected.wauc(data = example_variables_wroc,
                      formula = y ~ x1 + x2 + x3 + x4 + x5 + x6,
                      tag.event = 1, tag.nonevent = 0,
                      weights.var = "weights", strata.var = "strata", cluster.var = "cluster",
                      method = "dCV", dCV.method = "pooling", k = 10, R = 20)
# Or equivalently:
set.seed(1)
cor <- corrected.wauc(design = mydesign,
                      formula = y ~ x1 + x2 + x3 + x4 + x5 + x6,
                      tag.event = 1, tag.nonevent = 0,
                      method = "dCV", dCV.method = "pooling", k = 10, R = 20)
We can also estimate the sensitivity (wse()) and specificity (wsp()) parameters for a specific cut-off point considering sampling weights. For this purpose, we need to indicate the cut-off point we want to use in the function by means of the argument cutoff.value:
# Specificity ----------------------------------------------------------
sp.obj <- wsp(response.var = "y",
              phat.var = "phat",
              weights.var = "weights",
              tag.nonevent = 0,
              cutoff.value = 0.5,
              data = example_data_wroc)
# Or equivalently
sp.obj <- wsp(response.var = example_data_wroc$y,
              phat.var = example_data_wroc$phat,
              weights.var = example_data_wroc$weights,
              tag.nonevent = 0,
              cutoff.value = 0.5)
# Sensitivity ----------------------------------------------------------
se.obj <- wse(response.var = "y",
              phat.var = "phat",
              weights.var = "weights",
              tag.event = 1,
              cutoff.value = 0.5,
              data = example_data_wroc)
# Or equivalently
se.obj <- wse(response.var = example_data_wroc$y,
              phat.var = example_data_wroc$phat,
              weights.var = example_data_wroc$weights,
              tag.event = 1,
              cutoff.value = 0.5)
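As a rough illustration of what these two estimates measure, the sketch below computes a weighted sensitivity and specificity by hand in base R. It assumes that a unit is classified as an event when its predicted probability is greater than or equal to the cut-off; whether wse() and wsp() use >= or > at the boundary is not stated here, so treat that as an assumption. The helper names and the toy data are hypothetical.

```r
# Hypothetical helpers for illustration only; not part of svyROC.
# Assumption: a unit is classified as an event when phat >= cutoff.
weighted_se <- function(y, phat, w, cutoff, tag.event = 1) {
  ev <- y == tag.event
  # Weighted share of events that are correctly classified as events
  sum(w[ev & phat >= cutoff]) / sum(w[ev])
}

weighted_sp <- function(y, phat, w, cutoff, tag.nonevent = 0) {
  ne <- y == tag.nonevent
  # Weighted share of non-events that are correctly classified as non-events
  sum(w[ne & phat < cutoff]) / sum(w[ne])
}

# Toy data (made up for this sketch)
y    <- c(1, 1, 1, 0, 0, 0)
phat <- c(0.9, 0.6, 0.3, 0.7, 0.4, 0.2)
w    <- c(10, 20, 30, 5, 15, 40)

se <- weighted_se(y, phat, w, cutoff = 0.5)  # (10 + 20) / 60
sp <- weighted_sp(y, phat, w, cutoff = 0.5)  # (15 + 40) / 60
```

Each sampled unit contributes its sampling weight rather than a count of one, so the estimates refer to the finite population instead of the sample.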
Finally, use the function wocp() to obtain optimal cut-off points for individual classification as proposed in Iparragirre et al. (2022). Some functions of the package OptimalCutpoints have been modified so that they take sampling weights into account:
Lopez-Raton, M., Rodriguez-Alvarez, M.X, Cadarso-Suarez, C. and Gude-Sampedro, F. (2014). OptimalCutpoints: An R Package for Selecting Optimal Cutpoints in Diagnostic Tests. Journal of Statistical Software 61(8), 1–36.
One of the methods proposed in the paper needs to be selected when running the function by means of the argument method: Youden, MaxProdSpSe, ROC01 or MaxEfficiency.
myocp <- wocp(response.var = "y",
              phat.var = "phat", weights.var = "weights",
              tag.event = 1,
              tag.nonevent = 0,
              method = "Youden",
              data = example_data_wroc)
# Or equivalently
myocp <- wocp(example_data_wroc$y,
              example_data_wroc$phat,
              example_data_wroc$weights,
              tag.event = 1,
              tag.nonevent = 0,
              method = "Youden")
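For intuition about the Youden criterion, the following base R sketch searches the observed predicted probabilities for the cut-off that maximises J = Se + Sp - 1, with both terms computed from the sampling weights. This is only a sketch under stated assumptions: the helper name youden_cutoff is hypothetical, the candidate grid and the >= classification rule are choices made for this example, and wocp() may search and break ties differently.

```r
# Hypothetical helper for illustration only; not part of svyROC.
youden_cutoff <- function(y, phat, w, tag.event = 1, tag.nonevent = 0) {
  ev <- y == tag.event
  ne <- y == tag.nonevent
  cutoffs <- sort(unique(phat))  # candidate cut-offs: observed probabilities
  j <- vapply(cutoffs, function(cc) {
    se <- sum(w[ev & phat >= cc]) / sum(w[ev])  # weighted sensitivity
    sp <- sum(w[ne & phat < cc]) / sum(w[ne])   # weighted specificity
    se + sp - 1                                 # Youden index
  }, numeric(1))
  best <- which.max(j)  # first maximum if several cut-offs tie
  list(cutoff = cutoffs[best], youden = j[best])
}

# Toy data (made up for this sketch)
y    <- c(1, 1, 1, 0, 0, 0)
phat <- c(0.9, 0.6, 0.3, 0.7, 0.4, 0.2)
w    <- c(10, 20, 30, 5, 15, 40)

res <- youden_cutoff(y, phat, w)
```

The other criteria listed above replace the quantity maximised or minimised while keeping the same weighted sensitivity and specificity as inputs.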
If you want to draw the optimal cut-off point in the ROC curve, then use the function wroc.plot() and indicate the method by means of the argument cutoff.method in the function wroc() as follows:
mycurve <- wroc(response.var = "y",
                phat.var = "phat",
                weights.var = "weights",
                data = example_data_wroc,
                tag.event = 1,
                tag.nonevent = 0,
                cutoff.method = "Youden")
wroc.plot(x = mycurve,
          print.auc = TRUE,
          print.cutoff = TRUE)