The scf R package provides a structured, reproducible,
and pedagogically-conscious toolkit for analyzing the U.S. Federal
Reserve’s Survey of Consumer Finances (SCF), one of the
highest-quality data sources for information on U.S. households’ balance
sheets and income statements.
It wraps replicate-weighted, multiply-imputed SCF data into through a
custom data object (scf_mi_survey) with which users can
implement custom easy-to-use functions for generating proper population
estimates for descriptive statistics, hypothesis testing, regression
modeling, and high-quality visualizations.
scf_download(): Downloads and preprocesses SCF
microdata, including all five implicates and 999 replicate weights.scf_load(): Loads .rds files into
structured scf_mi_survey objects ready for analysis.scf_update(): Adds or transforms variables across
implicates.scf_subset(): Subsets the data consistently across all
implicates.scf_freq(): Weighted frequency tables for categorical
variables.scf_xtab(): Cross-tabulations by row, column, or cell
percentages.scf_mean(), scf_median(),
scf_percentile(): Computes groupwise or overall statistics
using Rubin’s Rules or a commensurate methodology.scf_corr(): Weighted Pearson correlations.scf_ttest(): One-sample and two-sample t-tests for
continuous variables.scf_prop_test(): One-sample and two-sample proportion
tests for binary variables.scf_MIcombine(): Combines estimates across imputations
using Rubin’s Rules (internal to most functions).scf_ols(): Linear regression with pooled estimates and
implicate diagnostics.scf_glm(): Generalized linear models (e.g., logistic,
Poisson).scf_logit(): Wrapper for logistic regression with
optional odds ratio output.All model functions (scf_ols, scf_glm, scf_logit) return objects of
class scf_model_result, with methods for
coef(), vcov(), predict(),
AIC(), residuals(), and
summary().
scf_plot_dist(): Kernel density plots for visualizing
and comparing distributions by group.scf_plot_dbar(): Bar plots of categorical variable
distributions.scf_plot_bbar(): Stacked bar plots for two categorical
variables.scf_plot_cbar(): Bar plots for continuous variable
summaries by group.scf_plot_smooth(): Smoothed line plots for continuous
distributions.scf_plot_hist(): Weighted histograms of continuous
variables.scf_plot_hex(): Weighted hexbin plots for bivariate
continuous data.print(), summary(): Custom methods for
clean, interpretable output in analysis and teaching.Install the latest version of the package through CRAN:
install.packages("scf")The package requires R ≥ 3.6 and the following packages:
survey (for replicate-weighted designs)ggplot2 (for plotting)httr, haven (for downloading and reading
SCF data)mitools, stats, utils,
methods, and others (loaded automatically)Use install.packages() to install any missing
dependencies manually if needed.
``` r, eval = F # Download SCF data for 2022: scf_download(2022)
scf2022 <- scf_load(2022)
```r, include = F
# This document will use mock data for CRAN compliance
# use the above method to download and load data in your analysis instead of:
scf2022 <- readRDS(system.file("extdata", "mock_scf2022.rds", package = "scf"))
# NOTE: Mock data for demonstration only.
# Use `scf_download()` and `scf_load()` for full SCF datasets.
# Frequency of education categories
scf_freq(scf2022, ~edcl)
# Median household net worth
scf_median(scf2022, ~networth)
# 90th percentile of income
scf_percentile(scf2022, ~income, q = 0.9)
# Histogram of net worth distribution
scf_plot_hist(scf2022, ~networth)
# Smoothed density plot of income
scf_plot_smooth(scf2022, ~income)# Cross-tabulation of education and homeownership
scf_xtab(scf2022, ~edcl, ~own)
# Stacked bar chart: homeownership by education
scf_plot_bbar(scf2022, ~edcl, ~own)
# Weighted bar chart: mean net worth by education
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")
# Grouped median income by race
scf_median(scf2022, ~income, by = ~racecl)
# Correlation between income and net worth
scf_corr(scf2022, ~income, ~networth)
# Hexbin plot: income vs. net worth
scf_plot_hex(scf2022, ~income, ~networth)# One-sample proportion test: Is more than 10% of households rich?
scf_prop_test(scf2022, ~I(networth > 1e6), p = 0.10, alternative = "greater")
# Two-sample proportion test: Are women less likely to be rich?
scf_prop_test(scf2022, ~I(networth > 1e6), ~factor(hhsex, labels = c("Male", "Female")), alternative = "less")
# One-sample t-test: Is mean income different from $75,000?
scf_ttest(scf2022, ~income, mu = 75000)
# Two-sample t-test: Are older households wealthier?
scf_ttest(scf2022, ~networth, ~I(age > 50), alternative = "greater")# Linear regression: Predict net worth from income and education
scf_ols(scf2022, networth ~ income + factor(edcl))
# Generalized linear model: Predict borrowing with logistic regression
scf_glm(scf2022, hborrff ~ income + age + factor(edcl), family = binomial())
# Logit wrapper: Predict probability of owning stocks
scf_logit(scf2022, ~I(owns_stocks == 1) ~ age + income + factor(edcl))
# Bar chart of a single categorical variable
scf_plot_dbar(scf2022, ~edcl)
# Stacked bar chart comparing education by race
scf_plot_bbar(scf2022, ~edcl, ~racecl, scale = "percent", percent_by = "row")
# Smoothed line plot of net worth distribution
scf_plot_smooth(scf2022, ~networth, xlim = c(0, 2e6), method = "loess")
# Histogram of income distribution
scf_plot_hist(scf2022, ~income, bins = 40, xlim = c(0, 300000))
# Bar chart of mean net worth by education level
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")
# Hexbin plot: net worth vs. income
scf_plot_hex(scf2022, ~income, ~networth, bins = 60)# Create new variables across all implicates
scf2022 <- scf_update(scf2022,
rich = networth > 1e6,
senior = age >= 65,
log_income = log(income + 1)
)
# Subset to working-age households with positive net worth
scf_sub <- scf_subset(scf2022, age >= 25 & age < 65 & networth > 0)
# Extract implicate-level estimates from a frequency table
freq <- scf_freq(scf_sub, ~own)
scf_implicates(freq, long = TRUE)For detailed examples, function documentation, and usage guides, consult the package vignettes and reference manual.
This package includes a small mock dataset
(mock_scf2022.rds) for testing purposes.
It includes only 75 rows and select variables. It is structurally
valid,
but not suitable for analytical use or inference. Mock
data objects carry a “mock” = TRUE attribute and may trigger warnings in
functions to discourage interpretive use.
If you use scf in published work, please cite it as:
Joseph N. Cohen (2025). scf: Tools for Analyzing the Survey of Consumer Finances. R package. ver. 1.0.5. https://github.com/jncohen/scf
Use citation("scf") in R for formatted references.
Joseph N. Cohen
Department of Sociology & Program in Data Analytics
Queens College, City University of New York
joseph.cohen@qc.cuny.edu
https://jncohen.commons.gc.cuny.edu