In this vignette, we demonstrate the FORD algorithm from the paper "A New Measure Of Dependence: Integrated R2". FORD is a forward stepwise variable selection algorithm based on the integrated R2 dependence measure, referred to below as irdc. It is designed for variable ranking in both linear and nonlinear multivariate regression settings.
FORD closely follows the structure of FOCI (from "A Simple Measure Of Conditional Dependence"), but replaces the core dependence measure with irdc.
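Let $Y$ be the response variable and $\mathbf{X} = (X_1, \ldots, X_p)$ the predictor variables. Given $n$ i.i.d. samples of $(Y, \mathbf{X})$, and writing $\nu_n$ for the sample irdc, FORD proceeds as follows:
Select $j_1 = \arg\max_{1 \le j \le p} \nu_n(Y, X_j)$.
If $\nu_n(Y, X_{j_1}) \le 0$, return $\emptyset$.
Iteratively add the feature that gives the maximum increase in irdc: $j_{k+1} = \arg\max_{j \notin \{j_1, \ldots, j_k\}} \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_j))$.
Stop when the irdc does not increase anymore: take the first $k$ such that $\nu_n(Y, (X_{j_1}, \ldots, X_{j_{k+1}})) \le \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}))$ and return $\{j_1, \ldots, j_k\}$.
If no such $k$ exists, select all variables.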
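The stepwise procedure above can be sketched in a few lines of R. This is only an illustration of the greedy loop, not the package's implementation: `forward_select()` and `adj_r2()` are hypothetical helpers introduced here, and adjusted R2 from a linear fit is used as a stand-in for irdc so that the example runs with base R alone.

```r
# Hypothetical helper: greedy forward selection with a pluggable measure.
# `measure(y, x_sub)` must return a single number (larger = more dependence).
forward_select <- function(y, x, measure) {
  selected <- integer(0)
  scores <- numeric(0)
  remaining <- seq_len(ncol(x))
  best_so_far <- 0  # the first step requires a strictly positive measure
  while (length(remaining) > 0) {
    # Score each remaining candidate when added to the current set
    cand <- vapply(remaining, function(j) {
      measure(y, x[, c(selected, j), drop = FALSE])
    }, numeric(1))
    best <- which.max(cand)
    # Stop as soon as the measure no longer increases
    if (cand[best] <= best_so_far) break
    selected <- c(selected, remaining[best])
    scores <- c(scores, cand[best])
    best_so_far <- cand[best]
    remaining <- remaining[-best]
  }
  # If the loop never breaks, all variables are selected
  list(selected = selected, scores = scores)
}

# Stand-in measure (NOT irdc): adjusted R^2 of a linear regression
adj_r2 <- function(y, x_sub) summary(lm(y ~ x_sub))$adj.r.squared

set.seed(1)
x <- matrix(rnorm(500 * 6), ncol = 6)
y <- 2 * x[, 1] - x[, 3] + rnorm(500, sd = 0.5)
res <- forward_select(y, x, adj_r2)
res$selected  # columns in the order they were added
res$scores    # value of the measure after each addition
```

Passing the measure as a function keeps the loop independent of the particular dependence coefficient. A linear stand-in such as adjusted R2 cannot detect the nonlinear signals that irdc targets, so it is useful only for showing the control flow.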
Here, $Y$ depends only on the first four features of $\mathbf{X}$, in a nonlinear way.
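library(FOCI)  # provides foci()
library(FORD)  # provides ford(); package name assumed to match the algorithm name
set.seed(42)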
n <- 2000
p <- 100
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("X", seq_len(p))
Y <- X[, 1] * X[, 2] + sin(X[, 1] * X[, 3]) + X[, 4]^2
result_foci_1 <- foci(Y, X, numCores = 1)
result_foci_1
#> $selectedVar
#>    index  names
#>    <num> <char>
#> 1:     4     X4
#> 2:     1     X1
#> 3:     2     X2
#> 4:     3     X3
#>
#> $stepT
#> [1] 0.3356423 0.4027284 0.6226254 0.7619649
#>
#> attr(,"class")
#> [1] "foci"
result_ford_1 <- ford(Y, X, numCores = 1)
result_ford_1
#> $selectedVar
#>    index  names
#>    <num> <char>
#> 1:     4     X4
#> 2:     1     X1
#> 3:     2     X2
#> 4:     3     X3
#>
#> $step_nu
#> [1] 0.3198165 0.4026348 0.6324854 0.7668089
#>
#> attr(,"class")
#> [1] "ford"
We can force both FOCI and FORD to select a fixed number of variables, instead of relying on the automatic stopping rule, by setting num_features and passing stop = FALSE.
result_foci_2 <- foci(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_foci_2
#> $selectedVar
#>    index  names
#>    <num> <char>
#> 1:     4     X4
#> 2:     1     X1
#> 3:     2     X2
#> 4:     3     X3
#> 5:    66    X66
#>
#> $stepT
#> [1] 0.3356423 0.4027284 0.6226254 0.7619649 0.6900384
#>
#> attr(,"class")
#> [1] "foci"
result_ford_2 <- ford(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_ford_2
#> $selectedVar
#>    index  names
#>    <num> <char>
#> 1:     4     X4
#> 2:     1     X1
#> 3:     2     X2
#> 4:     3     X3
#> 5:    31    X31
#>
#> $step_nu
#> [1] 0.3198165 0.4026348 0.6324854 0.7668089 0.6988827
#>
#> attr(,"class")
#> [1] "ford"
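The forced five-variable run also shows where the automatic rule would have stopped. The short sketch below uses the `$step_nu` values copied by hand from the output above (rather than read from the result object) and looks for the first step at which the measure fails to increase.

```r
# Values copied by hand from the `$step_nu` output printed above
step_nu <- c(0.3198165, 0.4026348, 0.6324854, 0.7668089, 0.6988827)

# Gain contributed by each added variable (the first gain is relative to 0)
gain <- diff(c(0, step_nu))

# First step with no increase; NA would mean the measure never dropped
first_drop <- which(gain <= 0)[1]

# Number of variables kept by a "stop when it no longer increases" rule
n_keep <- first_drop - 1
n_keep
#> [1] 4
```

The fifth variable lowers the measure from about 0.767 to about 0.699, which matches the four variables returned earlier under the default stopping rule.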
FORD provides an interpretable, irdc-based alternative to FOCI for variable selection in regression tasks. It offers a principled forward selection framework that can detect complex nonlinear relationships and can also be used to select a feature subset of fixed size.
For further theoretical details, see our paper:
Azadkia and Roudaki (2025), A New Measure Of Dependence: Integrated R2