Bayesian Modal Regression Analysis of 2003 United States Crime Data

library(GUD)

Introduction

The general unimodal distribution (GUD) family is essentially a family of two-component mixture distributions. The probability density function (pdf) of a member of the GUD family is \[ f\left(y \mid w, \theta, \boldsymbol{\xi}_1, \boldsymbol{\xi}_2\right)=w f_1\left(y \mid \theta, \boldsymbol{\xi}_1\right)+(1-w) f_2\left(y \mid \theta, \boldsymbol{\xi}_2\right), \] where \(w \in [0,1]\) is the weight parameter, \(\theta \in (-\infty, +\infty)\) is the mode as a location parameter, \(\boldsymbol{\xi}_1\) consists of the parameters other than the location parameter in \(f_1\left(\cdot \mid \theta, \boldsymbol{\xi}_1\right)\), and \(\boldsymbol{\xi}_2\) is defined similarly for \(f_2\left(\cdot \mid \theta, \boldsymbol{\xi}_2\right)\). Besides unimodality, all members of the GUD family share three features:

  1. The pdfs \(f_1\left(\cdot \mid \theta, \boldsymbol{\xi}_1\right)\) and \(f_2\left(\cdot \mid \theta, \boldsymbol{\xi}_2\right)\) are unimodal at \(\theta\).
  2. The pdfs \(f_1\left(\cdot \mid \theta, \boldsymbol{\xi}_1\right)\) and \(f_2\left(\cdot \mid \theta, \boldsymbol{\xi}_2\right)\) are left-skewed and right-skewed, respectively.
  3. The mixture pdf \(f\left(\cdot \mid w, \theta, \boldsymbol{\xi}_1, \boldsymbol{\xi}_2\right)\) defined above is continuous on its domain, as illustrated by the sketch below.
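
To make this construction concrete, here is a minimal sketch (not part of the GUD package; the gamma shapes and scales are illustrative assumptions) that builds such a two-component mixture from reflected and shifted gamma components, each unimodal at \(\theta\):

gud_pdf <- function(y, w, theta,
                    shape1 = 3, scale1 = 1, shape2 = 3, scale2 = 1) {
  # f1: left-skewed component, a reflected gamma with its mode shifted to theta
  f1 <- dgamma(theta - y + (shape1 - 1) * scale1, shape = shape1, scale = scale1)
  # f2: right-skewed component, a gamma with its mode shifted to theta
  f2 <- dgamma(y - theta + (shape2 - 1) * scale2, shape = shape2, scale = scale2)
  # the mixture is continuous and unimodal at theta for any w in [0, 1]
  w * f1 + (1 - w) * f2
}
# the mode stays at theta = 0 regardless of the weight w
curve(gud_pdf(x, w = 0.3, theta = 0), from = -6, to = 6, ylab = "density")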

More details of the GUD family can be found in Liu, Q., Huang, X., & Bai, R. (2022).

Bayesian Modal Regression Analysis of 2003 United States Crime Data

In this section, we demonstrate how to use the GUD package to analyze the 2003 United States crime data, as in Section 2 of Liu, Q., Huang, X., & Bai, R. (2022).

In “The Art and Science of Learning from Data, 5th edition” by Alan Agresti, Christine A. Franklin, and Bernhard Klingenberg, an interesting example about the 2003 United States crime data is presented to demonstrate the influence of outliers on the classical linear regression model. This compelling example partially motivates the construction of Bayesian modal regression based on the GUD family. The dataset contains the murder rate, college education percentage, poverty percentage, and metropolitan rate for the 50 states of the United States and the District of Columbia (D.C.) in 2003. The murder rate is defined as the annual number of murders per \(100{,}000\) people in the population, the poverty percentage is the percentage of residents with income below the poverty level, and the metropolitan rate is the percentage of the population living in metropolitan areas. For exploratory data analysis, we present the conditional scatter plot matrix below.

# load the crime data from the GUD package
df1 <- crime
# conditional scatter plot matrix of the U.S. crime data;
# columns 6, 4, 9, 3 pick out the four variables analyzed below
if (require(lattice)) {
  lattice::splom(~df1[c(6, 4, 9, 3)],
                 main = NULL,
                 panel = function(x, y, ...) {
                   panel.splom(x, y, ...)
                 })
}
#> Loading required package: lattice

In the conditional scatter plot matrix, we notice an outlier, Washington, D.C., which stands out from the common pattern of the other states.
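
One quick way to confirm this in code (a small sketch; it relies on the fact, visible in the plot, that D.C. has the highest murder rate in the data) is to inspect the row with the maximum response:

# the row with the largest murder rate is the outlier, Washington, D.C.
df1[which.max(df1$`murder rate`), ]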

Next, we demonstrate how to fit the Bayesian modal regression model based on the TPSC distribution to the 2003 United States crime data.

# work with a plain data frame for model fitting
df1 <- as.data.frame(df1)
# fit the Bayesian modal regression model with the TPSC distribution,
# using 2 MCMC chains of 2000 iterations each (half warmup, per the output below)
TPSC_model <- modal_regression(`murder rate` ~ college + poverty + metropolitan,
                               data = df1,
                               model = "TPSC",
                               chains = 2,
                               iter = 2000)
#> 
#> SAMPLING FOR MODEL 'TPSC' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 6.6e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 1.057 seconds (Warm-up)
#> Chain 1:                0.66 seconds (Sampling)
#> Chain 1:                1.717 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'TPSC' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 1.3e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 1.115 seconds (Warm-up)
#> Chain 2:                0.783 seconds (Sampling)
#> Chain 2:                1.898 seconds (Total)
#> Chain 2:

Summary of Bayesian Analysis

One can summarize the Bayesian analysis using the summary function.

print(summary(TPSC_model), n = 7)
#> # A tibble: 113 × 10
#>   variable        mean  median     sd    mad        q5     q95  rhat ess_bulk
#>   <chr>          <dbl>   <dbl>  <dbl>  <dbl>     <dbl>   <dbl> <dbl>    <dbl>
#> 1 w             0.283   0.283  0.119  0.125   0.0895    0.479  1.00      407.
#> 2 delta         1.85    1.72   0.635  0.547   1.09      3.08   1.00      736.
#> 3 sigma         1.18    1.17   0.255  0.262   0.775     1.60   1.00      779.
#> 4 (Intercept)   1.05    1.26   2.69   2.65   -3.70      5.07   1.00      704.
#> 5 college      -0.196  -0.201  0.0827 0.0836 -0.322    -0.0506 0.999     701.
#> 6 poverty       0.248   0.258  0.140  0.141  -0.000294  0.460  1.01      627.
#> 7 metropolitan  0.0634  0.0622 0.0150 0.0143  0.0407    0.0905 1.00      656.
#> # ℹ 106 more rows
#> # ℹ 1 more variable: ess_tail <dbl>
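
To focus on the regression coefficients alone, one can subset the posterior draws before summarizing them. Below is a small sketch using the posterior package (also relied upon further down); the variable names are those printed above.

# summarize only the intercept and the three regression coefficients
posterior::summarise_draws(
  posterior::subset_draws(TPSC_model,
                          variable = c("(Intercept)", "college",
                                       "poverty", "metropolitan"))
)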

One can draw trace plots of the MCMC chains using the bayesplot::mcmc_trace function.

if (require(bayesplot)) {
  bayesplot::mcmc_trace(TPSC_model, pars = c("(Intercept)",
                                             "college", 
                                             "poverty", 
                                             "metropolitan"))
}
#> Loading required package: bayesplot
#> Warning: package 'bayesplot' was built under R version 4.3.1
#> This is bayesplot version 1.11.1
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
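
Other bayesplot diagnostics accept the fitted object in the same way. For example, the sketch below uses rank histograms as a complementary check of chain mixing, shown for two of the coefficients:

# rank histograms: well-mixed chains yield approximately uniform ranks
bayesplot::mcmc_rank_hist(TPSC_model, pars = c("college", "poverty"))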

The summary of the posterior predictive distribution can be obtained using the following command. Here, ystar[1] represents the posterior prediction for the first observation in the dataset, and so on.

summary(posterior::subset_draws(TPSC_model, variable = "ystar"))
#> # A tibble: 51 × 10
#>    variable    mean median       sd   mad     q5   q95  rhat ess_bulk ess_tail
#>    <chr>      <dbl>  <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#>  1 ystar[1]    7.17   6.22     7.40  2.05  3.20  14.0  1.00     1857.    1834.
#>  2 ystar[2]    2.37   1.28    10.4   2.09 -1.68   9.20 1.00     1919.    1888.
#>  3 ystar[3]    6.36   6.03    23.4   1.87  3.09  13.1  1.00     1922.    1873.
#>  4 ystar[4]    6.13   5.56     5.77  2.22  2.18  12.6  1.00     1714.    1780.
#>  5 ystar[5]    7.07   6.25    40.8   2.07  3.19  13.8  0.999    1893.    1792.
#>  6 ystar[6]   21.6    2.65   806.    2.04 -0.356  9.84 1.00     1869.    1813.
#>  7 ystar[7]    6.77   3.76    72.8   1.87  0.864 11.6  1.00     1347.    1600.
#>  8 ystar[8]    5.87   4.87     7.00  1.94  2.10  12.3  1.00     2015.    1881.
#>  9 ystar[9]  242.     5.41 10536.    2.67  1.35  13.3  1.00     1479.    1555.
#> 10 ystar[10]   7.94   6.41    14.5   1.87  3.72  13.8  1.00     1844.    1823.
#> # ℹ 41 more rows
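
As a quick visual check of these predictions, one can plot the posterior predictive medians against the observed murder rates. This is a sketch assuming, consistent with the subset_draws call above, that the fitted object can be coerced by posterior::as_draws_matrix.

# posterior predictive median for each observation
ystar_draws <- posterior::as_draws_matrix(
  posterior::subset_draws(TPSC_model, variable = "ystar")
)
pred_median <- apply(ystar_draws, 2, median)
# observed vs. predicted, with a 45-degree reference line
plot(df1$`murder rate`, pred_median,
     xlab = "observed murder rate",
     ylab = "posterior predictive median")
abline(0, 1)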

Further comparisons among mean, median, and modal regression can be found in Section 2 of Liu, Q., Huang, X., & Bai, R. (2022) and Section 6 of Liu, Q., Huang, X., & Zhou, H. (2024).
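
For a hands-on version of that contrast, one can fit the classical (mean) linear regression to the same data with base R and compare the coefficient estimates with the modal regression summary above; this is only a sketch of the comparison, not part of the GUD workflow.

# classical linear regression on the same data, for comparison
lm_fit <- lm(`murder rate` ~ college + poverty + metropolitan, data = df1)
coef(lm_fit)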