The Bivariate Geometric Conditionals Distribution (BGCD)

Introduction

This vignette introduces the Bivariate Geometric Conditionals Distribution (BGCD), defined via conditional specifications, as proposed by Ghosh, Marques, and Chakraborty (2023). The BCD package provides functions to evaluate the joint and cumulative distributions, perform random sampling, and estimate parameters via maximum likelihood.

Joint Probability: dgeomBCD()

The joint probability mass function (p.m.f.) of the BGCD is given by:

\[ P(X = x, Y = y) = K q_1^x q_2^y q_3^{xy}, \]

where \(K\) is a normalizing constant ensuring the probabilities sum to 1, \(x, y = 0, 1, 2, \ldots\), and \(0 < q_1, q_2 < 1\), \(0 < q_3 \leq 1\).

Note that \(q_3 < 1\) indicates negative correlation between \(X\) and \(Y\), while \(q_3 = 1\) indicates independence between \(X\) and \(Y\).

Example

dgeomBCD(x = 1, y = 2, q1 = 0.5, q2 = 0.6, q3 = 0.8)
#> [1] 0.02739216
dgeomBCD(x = 0, y = 4, q1 = 0.5, q2 = 0.6, q3 = 0.8)
#> [1] 0.03081618
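
As a cross-check on the formula above, the following minimal sketch approximates \(K\) by summing the unnormalized weights over a truncated grid. It assumes the support is \(x, y = 0, 1, 2, \ldots\) and that truncating at `upper` loses negligible mass. The names `bgcd_pmf_sketch` and `upper` are hypothetical and not part of the BCD package, and this is not necessarily how dgeomBCD() is implemented.

```r
# Illustrative only: approximate the BGCD p.m.f. by brute-force normalisation.
bgcd_pmf_sketch <- function(x, y, q1, q2, q3, upper = 500) {
  grid <- 0:upper
  # Unnormalised weights q1^i * q2^j * q3^(i * j) on the truncated grid
  w <- outer(grid, grid, function(i, j) q1^i * q2^j * q3^(i * j))
  K <- 1 / sum(w)  # approximate normalising constant
  K * q1^x * q2^y * q3^(x * y)
}

bgcd_pmf_sketch(1, 2, q1 = 0.5, q2 = 0.6, q3 = 0.8)
bgcd_pmf_sketch(0, 4, q1 = 0.5, q2 = 0.6, q3 = 0.8)

# With q3 = 1 the weights factorise, so P(X = 0, Y = 0) = (1 - q1) * (1 - q2)
bgcd_pmf_sketch(0, 0, q1 = 0.5, q2 = 0.6, q3 = 1)
```

Under these assumptions, the first two calls should agree closely with the dgeomBCD() results shown above.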

Cumulative Distribution: pgeomBCD()

The function pgeomBCD() computes the joint cumulative distribution function:

\[ P(X \leq x, Y \leq y) \]

Example

pgeomBCD(x = 1, y = 2, q1 = 0.5, q2 = 0.6, q3 = 0.8)
#> [1] 0.669396
pgeomBCD(x = 0, y = 0, q1 = 0.4, q2 = 0.3, q3 = 0.9)
#> [1] 0.4306375
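
The cumulative probability can be read as a finite double sum of the p.m.f. over \(0, \ldots, x\) and \(0, \ldots, y\). The sketch below illustrates that reading under the assumption that the support starts at 0 and that a truncated grid approximates \(1/K\) well. The name `bgcd_cdf_sketch` is hypothetical, and the code is not taken from the package.

```r
# Illustrative only: joint CDF as a finite sum of normalised weights.
bgcd_cdf_sketch <- function(x, y, q1, q2, q3, upper = 500) {
  weight <- function(i, j) q1^i * q2^j * q3^(i * j)
  total <- sum(outer(0:upper, 0:upper, weight))  # approximates 1 / K
  sum(outer(0:x, 0:y, weight)) / total
}

bgcd_cdf_sketch(1, 2, q1 = 0.5, q2 = 0.6, q3 = 0.8)
bgcd_cdf_sketch(0, 0, q1 = 0.4, q2 = 0.3, q3 = 0.9)
```

The second call reduces to \(K\) itself, since only the cell \((0, 0)\) contributes.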

Random Sampling: rgeomBCD()

Generate samples from the BGCD using:

rgeomBCD(n, q1, q2, q3)

Example

set.seed(123)
samples <- rgeomBCD(n = 100, q1 = 0.5, q2 = 0.5, q3 = 0.1)
head(samples)
#>   X Y
#> 1 0 5
#> 2 0 0
#> 3 4 0
#> 4 0 0
#> 5 1 0
#> 6 3 0
cor(samples$X, samples$Y)  # Should be negative
#> [1] -0.3334655
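
The p.m.f. implies geometric conditionals: \(P(X = x \mid Y = y) \propto (q_1 q_3^{y})^{x}\), and symmetrically for \(Y\) given \(X\). One generic way to exploit this is Gibbs sampling, sketched below. This illustrates the technique only and does not describe how rgeomBCD() works. The names `rbgcd_gibbs_sketch`, `burn_in`, and `thin` are hypothetical.

```r
# Illustrative Gibbs sampler: alternate draws from the two geometric conditionals.
rbgcd_gibbs_sketch <- function(n, q1, q2, q3, burn_in = 100, thin = 5) {
  out <- matrix(0, nrow = n, ncol = 2, dimnames = list(NULL, c("X", "Y")))
  x <- 0
  y <- 0
  kept <- 0
  for (i in seq_len(burn_in + n * thin)) {
    # rgeom() counts failures before the first success, so its support is 0, 1, 2, ...
    x <- rgeom(1, prob = 1 - q1 * q3^y)
    y <- rgeom(1, prob = 1 - q2 * q3^x)
    # Discard the burn-in, then keep every `thin`-th state
    if (i > burn_in && (i - burn_in) %% thin == 0) {
      kept <- kept + 1
      out[kept, ] <- c(x, y)
    }
  }
  as.data.frame(out)
}

set.seed(123)
draws <- rbgcd_gibbs_sketch(n = 1000, q1 = 0.5, q2 = 0.5, q3 = 0.1)
cor(draws$X, draws$Y)
```

Thinning reduces, but does not remove, the dependence between successive draws, so the output is only approximately an independent sample.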

Maximum Likelihood Estimation: MLEgeomBCD()

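MLEgeomBCD() estimates \(q_1\), \(q_2\), and \(q_3\) from a data frame of paired counts. The returned list also reports the maximized log-likelihood, AIC, BIC, and a convergence code.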
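
For a sample \((x_1, y_1), \ldots, (x_n, y_n)\), the p.m.f. above implies the log-likelihood

\[ \ell(q_1, q_2, q_3) = n \log K + \log q_1 \sum_{i=1}^{n} x_i + \log q_2 \sum_{i=1}^{n} y_i + \log q_3 \sum_{i=1}^{n} x_i y_i, \]

where \(K\) depends on \((q_1, q_2, q_3)\) through an infinite sum. The maximization is therefore presumably numerical; that is an assumption about MLEgeomBCD(), not something this vignette states.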

Example

samples <- rgeomBCD(n = 50, q1 = 0.2, q2 = 0.2, q3 = 0.5)
result <- MLEgeomBCD(samples)
print(result)
#> $q1
#> [1] 0.270921
#> 
#> $q2
#> [1] 0.2251644
#> 
#> $q3
#> [1] 0.3809298
#> 
#> $logLik
#> [1] -65.99833
#> 
#> $AIC
#> [1] 137.9967
#> 
#> $BIC
#> [1] 143.7327
#> 
#> $convergence
#> [1] 0

For better estimation accuracy and stability, consider increasing the sample size (for example, n = 1000):

samples <- rgeomBCD(n = 1000, q1 = 0.2, q2 = 0.2, q3 = 0.5)
result <- MLEgeomBCD(samples)
print(result)
#> $q1
#> [1] 0.21731
#> 
#> $q2
#> [1] 0.2079197
#> 
#> $q3
#> [1] 0.3958311
#> 
#> $logLik
#> [1] -1186.314
#> 
#> $AIC
#> [1] 2378.629
#> 
#> $BIC
#> [1] 2393.352
#> 
#> $convergence
#> [1] 0
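
The reported AIC and BIC appear consistent with the standard definitions \(2k - 2\ell\) and \(k \log n - 2\ell\) with \(k = 3\) parameters. The short check below recomputes them from the printed log-likelihoods. The variable names are illustrative, and small differences in the last digit are expected because the printed log-likelihoods are rounded.

```r
# Values copied from the two printed fits above
log_lik <- c(n50 = -65.99833, n1000 = -1186.314)
n_obs   <- c(n50 = 50, n1000 = 1000)
k <- 3  # number of parameters: q1, q2, q3

aic <- 2 * k - 2 * log_lik
bic <- k * log(n_obs) - 2 * log_lik
round(cbind(AIC = aic, BIC = bic), 3)
```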

Real Data Example

The dataset abortflights records the number of aborted flights by 109 aircraft during two consecutive periods. The counts are cross-tabulated below by the number of aborted flights in each period.

data(abortflights)
head(abortflights)
#>   X Y
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
table(abortflights$X, abortflights$Y)
#>    
#>      0  1  2  3  4
#>   0 34 20  4  6  4
#>   1 17  7  0  0  0
#>   2  6  4  1  0  0
#>   3  0  4  0  0  0
#>   5  2  0  0  0  0
fit <- MLEgeomBCD(abortflights)
FTtest(abortflights, "BGCD", params = fit, num_params = 3)
#> $observed
#>    0  1 2 3 4
#> 0 34 20 4 6 4
#> 1 17  7 0 0 0
#> 2  6  4 1 0 0
#> 3  0  4 0 0 0
#> 4  0  0 0 0 0
#> 5  2  0 0 0 0
#> 
#> $expected
#>            0           1          2           3            4
#> 0 36.7576770 16.82940105 7.70529487 3.527848008 1.6152154829
#> 1 15.6134533  5.92019337 2.24477499 0.851157125 0.3227354426
#> 2  6.6320819  2.08258686 0.65396780 0.205357047 0.0644856163
#> 3  2.8170905  0.73260581 0.19051971 0.049546101 0.0128848405
#> 4  1.1966075  0.25771375 0.05550390 0.011953893 0.0025745139
#> 5  0.5082795  0.09065773 0.01616989 0.002884093 0.0005144124
#> 
#> $test
#> $test$statistic
#> [1] 49.12544
#> 
#> $test$df
#> [1] 26
#> 
#> $test$p_value
#> [1] 0.00399207
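
To make the steps behind this output more concrete, the sketch below rebuilds the data from the cross-tabulation printed above, maximizes the BGCD log-likelihood with optim(), and recomputes the expected counts and the p-value. It rests on several assumptions: the support starts at 0, the expected counts are \(n\) times the fitted p.m.f., the degrees of freedom are cells minus parameters minus 1, and the p-value is an upper-tail chi-squared probability. All function and variable names are hypothetical, and the code is not the package's implementation.

```r
# Rebuild the 109 (X, Y) pairs from the cross-tabulation printed above
x_vals <- c(0, 1, 2, 3, 5)
y_vals <- 0:4
counts <- matrix(c(34, 20, 4, 6, 4,
                   17,  7, 0, 0, 0,
                    6,  4, 1, 0, 0,
                    0,  4, 0, 0, 0,
                    2,  0, 0, 0, 0), nrow = 5, byrow = TRUE)
x <- rep(x_vals[as.vector(row(counts))], as.vector(counts))
y <- rep(y_vals[as.vector(col(counts))], as.vector(counts))
n <- length(x)

# Normalising constant: the sum over y is geometric, leaving a sum over x.
# Truncating at `upper` is an assumption that is poor when q1 is close to 1.
norm_const <- function(q, upper = 2000) {
  k <- 0:upper
  1 / sum(q[1]^k / (1 - q[2] * q[3]^k))
}

# Negative log-likelihood on the logit scale to keep each q inside (0, 1)
neg_log_lik <- function(theta) {
  q <- plogis(theta)
  -(n * log(norm_const(q)) + sum(x) * log(q[1]) +
      sum(y) * log(q[2]) + sum(x * y) * log(q[3]))
}

opt <- optim(c(0, 0, 0), neg_log_lik)  # Nelder-Mead by default
q_hat <- plogis(opt$par)

# Expected counts n * P(X = i, Y = j) on the 6 x 5 grid used in the output
expected <- n * norm_const(q_hat) *
  outer(0:5, 0:4, function(i, j) q_hat[1]^i * q_hat[2]^j * q_hat[3]^(i * j))

# Upper-tail chi-squared probability for the printed statistic
df <- 6 * 5 - 3 - 1
p_value <- pchisq(49.12544, df = df, lower.tail = FALSE)
```

If these assumptions hold, `expected[1, 1]` should be close to the first entry of `$expected` and `p_value` close to the printed `$test$p_value`.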

Reference: Ghosh, I., Marques, F., & Chakraborty, S. (2023). A bivariate geometric distribution via conditional specification: properties and applications. Communications in Statistics - Simulation and Computation, 52(12), 5925–5945.