For the most part, this document will present the functionalities of the function surveysd::calc.stError() which generates point estimates and standard errors for user-supplied estimation functions.
In order to use a dataset with calc.stError(), several weight columns have to be present. Each weight column corresponds to a bootstrap sample. In the following examples, we will use the data from demo.eusilc() and attach the bootstrap weights using draw.bootstrap() and recalib(). Please refer to the documentation of those functions for more detail.
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[, onePerson := nrow(.SD) == 1, by = .(year, hid)]
## print part of the dataset
dat_boot_calib[1:5, .(year, povertyRisk, eqIncome, onePerson, pWeight, w1, w2, w3, w4, w5)]| year | povertyRisk | eqIncome | onePerson | pWeight | w1 | w2 | w3 | w4 | w5 |
|---|---|---|---|---|---|---|---|---|---|
| 2010 | FALSE | 16090.69 | FALSE | 504.5696 | 0.4477743 | 1007.5651 | 0.4466992 | 0.4489062 | 1019.721 |
| 2010 | FALSE | 16090.69 | FALSE | 504.5696 | 0.4477743 | 1007.5651 | 0.4466992 | 0.4489062 | 1019.721 |
| 2010 | FALSE | 16090.69 | FALSE | 504.5696 | 0.4477743 | 1007.5651 | 0.4466992 | 0.4489062 | 1019.721 |
| 2011 | FALSE | 16090.69 | FALSE | 504.5696 | 0.4237016 | 968.2711 | 0.4468813 | 0.4654705 | 1005.100 |
| 2011 | FALSE | 16090.69 | FALSE | 504.5696 | 0.4237016 | 968.2711 | 0.4468813 | 0.4654705 | 1005.100 |
The parameters fun and var in calc.stError() define the estimator to be used in the error analysis. There are two built-in estimator functions weightedSum() and weightedRatio() which can be used as follows.
povertyRate <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio)
totalIncome <- calc.stError(dat_boot_calib, var = "eqIncome", fun = weightedSum)Those functions calculate the ratio of persons at risk of poverty (in percent) and the total income. By default, the results are calculated separately for each reference period.
| year | n | N | val_povertyRisk | stE_povertyRisk |
|---|---|---|---|---|
| 2010 | 14827 | 8182222 | 14.44422 | 0.5838199 |
| 2011 | 14827 | 8182222 | 14.77393 | 0.6771848 |
| 2012 | 14827 | 8182222 | 15.04515 | 0.4907275 |
| 2013 | 14827 | 8182222 | 14.89013 | 0.4756386 |
| 2014 | 14827 | 8182222 | 15.14556 | 0.5195149 |
| 2015 | 14827 | 8182222 | 15.53640 | 0.4808834 |
| 2016 | 14827 | 8182222 | 15.08315 | 0.3566648 |
| 2017 | 14827 | 8182222 | 15.42019 | 0.5592589 |
| year | n | N | val_eqIncome | stE_eqIncome |
|---|---|---|---|---|
| 2010 | 14827 | 8182222 | 162750998071 | 960851034 |
| 2011 | 14827 | 8182222 | 161926931417 | 901915958 |
| 2012 | 14827 | 8182222 | 162576509628 | 1198230549 |
| 2013 | 14827 | 8182222 | 163199507862 | 1474893389 |
| 2014 | 14827 | 8182222 | 163986275009 | 1528829971 |
| 2015 | 14827 | 8182222 | 163416275447 | 1338217239 |
| 2016 | 14827 | 8182222 | 162706205137 | 1207314432 |
| 2017 | 14827 | 8182222 | 164314959107 | 1322744977 |
Columns that use the val_ prefix denote the point estimate belonging to the “main weight” of the dataset, which is pWeight in case of the dataset used here.
Columns with the stE_ prefix denote standard errors calculated with bootstrap replicates. The replicates result in using w1, w2, …, w10 instead of pWeight when applying the estimator.
n denotes the number of observations for the year and N denotes the total weight of those persons.
In order to define a custom estimator function to be used in fun, the function needs to have two arguments like the example below.
## define custom estimator
myWeightedSum <- function(x, w) {
sum(x*w)
}
## check if results are equal to the one using `suveysd::weightedSum()`
totalIncome2 <- calc.stError(dat_boot_calib, var = "eqIncome", fun = myWeightedSum)
all.equal(totalIncome$Estimates, totalIncome2$Estimates)## [1] TRUE
The parameters x and w can be assumed to be vectors with equal length with w being numeric and x being the column defined in the var argument. It will be called once for each period (in this case year) and for each weight column (in this case pWeight, w1, w2, …, w10).
In case an estimator should be applied to several columns of the dataset, var can be set to a vector containing all necessary columns.
multipleRates <- calc.stError(dat_boot_calib, var = c("povertyRisk", "onePerson"), fun = weightedRatio)
multipleRates$Estimates| year | n | N | val_povertyRisk | stE_povertyRisk | val_onePerson | stE_onePerson |
|---|---|---|---|---|---|---|
| 2010 | 14827 | 8182222 | 14.44422 | 0.5838199 | 14.85737 | 0.2572088 |
| 2011 | 14827 | 8182222 | 14.77393 | 0.6771848 | 14.85737 | 0.2778282 |
| 2012 | 14827 | 8182222 | 15.04515 | 0.4907275 | 14.85737 | 0.2909612 |
| 2013 | 14827 | 8182222 | 14.89013 | 0.4756386 | 14.85737 | 0.3458258 |
| 2014 | 14827 | 8182222 | 15.14556 | 0.5195149 | 14.85737 | 0.4394058 |
| 2015 | 14827 | 8182222 | 15.53640 | 0.4808834 | 14.85737 | 0.3810024 |
| 2016 | 14827 | 8182222 | 15.08315 | 0.3566648 | 14.85737 | 0.3105411 |
| 2017 | 14827 | 8182222 | 15.42019 | 0.5592589 | 14.85737 | 0.3030293 |
Here we see the relative number of persons at risk of poverty and the relative number of one-person households.
The groups argument can be used to calculate estimators for different subsets of the data. This argument can take the grouping variable as a string that refers to a column name (usually a factor) in dat. If set, all estimators are not only split by the reference period but also by the grouping variable. For simplicity, only one reference period of the above data is used.
dat2 <- subset(dat_boot_calib, year == 2010)
for (att in c("period", "weights", "b.rep"))
attr(dat2, att) <- attr(dat_boot_calib, att)To calculate the ratio of persons at risk of poverty for each federal state of Austria, group = "region" can be used.
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, group = "region")
povertyRates$Estimates| year | n | N | region | val_povertyRisk | stE_povertyRisk |
|---|---|---|---|---|---|
| 2010 | 549 | 260564 | Burgenland | 19.53984 | 1.8042764 |
| 2010 | 733 | 377355 | Vorarlberg | 16.53731 | 3.2896891 |
| 2010 | 924 | 535451 | Salzburg | 13.78734 | 2.2585866 |
| 2010 | 1078 | 563648 | Carinthia | 13.08627 | 1.7469304 |
| 2010 | 1317 | 701899 | Tyrol | 15.30819 | 1.3943715 |
| 2010 | 2295 | 1167045 | Styria | 14.37464 | 1.3500431 |
| 2010 | 2322 | 1598931 | Vienna | 17.23468 | 1.0428496 |
| 2010 | 2804 | 1555709 | Lower Austria | 13.84362 | 1.7034375 |
| 2010 | 2805 | 1421620 | Upper Austria | 10.88977 | 0.8709790 |
| 2010 | 14827 | 8182222 | NA | 14.44422 | 0.5838199 |
The last row with region = NA denotes the aggregate over all regions. Note that the columns N and n now show the weighted and unweighted number of persons in each region.
In case more than one grouping variable is used, there are several options of calling calc.stError() depending on whether combinations of grouping levels should be regarded or not. We will consider the variables gender and region as our grouping variables and show three options on how calc.stError() can be called.
Calculate the point estimate and standard error for each region and each gender. The number of rows in the output is therefore
\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12.\]
The last row is again the estimate for the whole period.
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = c("gender", "region"))
povertyRates$Estimates| year | n | N | gender | region | val_povertyRisk | stE_povertyRisk |
|---|---|---|---|---|---|---|
| 2010 | 549 | 260564 | NA | Burgenland | 19.53984 | 1.8042764 |
| 2010 | 733 | 377355 | NA | Vorarlberg | 16.53731 | 3.2896891 |
| 2010 | 924 | 535451 | NA | Salzburg | 13.78734 | 2.2585866 |
| 2010 | 1078 | 563648 | NA | Carinthia | 13.08627 | 1.7469304 |
| 2010 | 1317 | 701899 | NA | Tyrol | 15.30819 | 1.3943715 |
| 2010 | 2295 | 1167045 | NA | Styria | 14.37464 | 1.3500431 |
| 2010 | 2322 | 1598931 | NA | Vienna | 17.23468 | 1.0428496 |
| 2010 | 2804 | 1555709 | NA | Lower Austria | 13.84362 | 1.7034375 |
| 2010 | 2805 | 1421620 | NA | Upper Austria | 10.88977 | 0.8709790 |
| 2010 | 7267 | 3979572 | male | NA | 12.02660 | 0.6303407 |
| 2010 | 7560 | 4202650 | female | NA | 16.73351 | 0.6300745 |
| 2010 | 14827 | 8182222 | NA | NA | 14.44422 | 0.5838199 |
state and genderSplit the data by all combinations of the two grouping variables. This will result in a larger output-table of the size
\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + 1) = 1\cdot(9\cdot2 + 1)= 19.\]
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = list(c("gender", "region")))
povertyRates$Estimates| year | n | N | gender | region | val_povertyRisk | stE_povertyRisk |
|---|---|---|---|---|---|---|
| 2010 | 261 | 122741.8 | male | Burgenland | 17.414524 | 2.2332286 |
| 2010 | 288 | 137822.2 | female | Burgenland | 21.432598 | 2.0899871 |
| 2010 | 359 | 182732.9 | male | Vorarlberg | 12.973259 | 3.1093204 |
| 2010 | 374 | 194622.1 | female | Vorarlberg | 19.883637 | 3.7576712 |
| 2010 | 440 | 253143.7 | male | Salzburg | 9.156964 | 1.9028612 |
| 2010 | 484 | 282307.3 | female | Salzburg | 17.939382 | 2.5374889 |
| 2010 | 517 | 268581.4 | male | Carinthia | 10.552148 | 2.0614209 |
| 2010 | 561 | 295066.6 | female | Carinthia | 15.392924 | 1.9756282 |
| 2010 | 650 | 339566.5 | male | Tyrol | 12.857542 | 1.0371712 |
| 2010 | 667 | 362332.5 | female | Tyrol | 17.604861 | 2.2145631 |
| 2010 | 1128 | 571011.7 | male | Styria | 11.671247 | 1.5111686 |
| 2010 | 1132 | 774405.4 | male | Vienna | 15.590616 | 1.3348673 |
| 2010 | 1167 | 596033.3 | female | Styria | 16.964539 | 1.3758548 |
| 2010 | 1190 | 824525.6 | female | Vienna | 18.778813 | 0.9304295 |
| 2010 | 1363 | 684272.5 | male | Upper Austria | 9.074690 | 1.1711956 |
| 2010 | 1387 | 772593.2 | female | Lower Austria | 16.372949 | 1.8366058 |
| 2010 | 1417 | 783115.8 | male | Lower Austria | 11.348283 | 1.6437512 |
| 2010 | 1442 | 737347.5 | female | Upper Austria | 12.574205 | 0.7710579 |
| 2010 | 14827 | 8182222.0 | NA | NA | 14.444218 | 0.5838199 |
In this case, the estimates and standard errors are calculated for
The number of rows in the output is therefore
\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9\cdot2 + 9 + 2 + 1) = 30.\]
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = list("gender", "region", c("gender", "region")))
povertyRates$Estimates| year | n | N | gender | region | val_povertyRisk | stE_povertyRisk |
|---|---|---|---|---|---|---|
| 2010 | 261 | 122741.8 | male | Burgenland | 17.414524 | 2.2332286 |
| 2010 | 288 | 137822.2 | female | Burgenland | 21.432598 | 2.0899871 |
| 2010 | 359 | 182732.9 | male | Vorarlberg | 12.973259 | 3.1093204 |
| 2010 | 374 | 194622.1 | female | Vorarlberg | 19.883637 | 3.7576712 |
| 2010 | 440 | 253143.7 | male | Salzburg | 9.156964 | 1.9028612 |
| 2010 | 484 | 282307.3 | female | Salzburg | 17.939382 | 2.5374889 |
| 2010 | 517 | 268581.4 | male | Carinthia | 10.552148 | 2.0614209 |
| 2010 | 549 | 260564.0 | NA | Burgenland | 19.539836 | 1.8042764 |
| 2010 | 561 | 295066.6 | female | Carinthia | 15.392924 | 1.9756282 |
| 2010 | 650 | 339566.5 | male | Tyrol | 12.857542 | 1.0371712 |
| 2010 | 667 | 362332.5 | female | Tyrol | 17.604861 | 2.2145631 |
| 2010 | 733 | 377355.0 | NA | Vorarlberg | 16.537310 | 3.2896891 |
| 2010 | 924 | 535451.0 | NA | Salzburg | 13.787343 | 2.2585866 |
| 2010 | 1078 | 563648.0 | NA | Carinthia | 13.086268 | 1.7469304 |
| 2010 | 1128 | 571011.7 | male | Styria | 11.671247 | 1.5111686 |
| 2010 | 1132 | 774405.4 | male | Vienna | 15.590616 | 1.3348673 |
| 2010 | 1167 | 596033.3 | female | Styria | 16.964539 | 1.3758548 |
| 2010 | 1190 | 824525.6 | female | Vienna | 18.778813 | 0.9304295 |
| 2010 | 1317 | 701899.0 | NA | Tyrol | 15.308191 | 1.3943715 |
| 2010 | 1363 | 684272.5 | male | Upper Austria | 9.074690 | 1.1711956 |
| 2010 | 1387 | 772593.2 | female | Lower Austria | 16.372949 | 1.8366058 |
| 2010 | 1417 | 783115.8 | male | Lower Austria | 11.348283 | 1.6437512 |
| 2010 | 1442 | 737347.5 | female | Upper Austria | 12.574205 | 0.7710579 |
| 2010 | 2295 | 1167045.0 | NA | Styria | 14.374637 | 1.3500431 |
| 2010 | 2322 | 1598931.0 | NA | Vienna | 17.234683 | 1.0428496 |
| 2010 | 2804 | 1555709.0 | NA | Lower Austria | 13.843623 | 1.7034375 |
| 2010 | 2805 | 1421620.0 | NA | Upper Austria | 10.889773 | 0.8709790 |
| 2010 | 7267 | 3979571.7 | male | NA | 12.026600 | 0.6303407 |
| 2010 | 7560 | 4202650.3 | female | NA | 16.733508 | 0.6300745 |
| 2010 | 14827 | 8182222.0 | NA | NA | 14.444218 | 0.5838199 |