BayesChange Tutorial

Here we provide a brief tutorial of the BayesChange package. The BayesChange package contains two main functions: one that performs change points detection on time series and survival functions and one that perform clustering of time series and survival functions with common change points. Here we briefly show how to implement these.

Detecting change points

The function detect_cp provide a method for detecting change points, it is based on the work Martínez and Mena (2014) and on Corradin et al. (2025).

Depending on the structure of the data, detect_cp might perform change points detection on univariate time series or multivariate time series. For example we can create a vector of 100 observations where the first 50 observations are sampled from a normal distribution with mean 0 and variance 0.1 and the other 50 observations still from a normal distribution with mean 0 but variance 0.25.

data_uni <- as.numeric(c(rnorm(50,0,0.1), rnorm(50,1,0.25)))

Now we can run the function detect_cp, as arguments of the function we need to specify the number of iterations, the number of burn-in steps and a list with the the autoregressive coefficient phi for the likelihood of the data, the parameters a, b, c for the priors and the probability q of performing a split at each step. Since we deal with time series we need also to specify kernel = "ts".

out <- detect_cp(data = data_uni,                             
                 n_iterations = 1000, n_burnin = 100,  
                 params = list(q = 0.25, phi = 0.1, a = 1, b = 1, c = 0.1), kernel = "ts")
#> Completed:   100/1000 - in 0.009 sec
#> Completed:   200/1000 - in 0.02 sec
#> Completed:   300/1000 - in 0.029 sec
#> Completed:   400/1000 - in 0.037 sec
#> Completed:   500/1000 - in 0.046 sec
#> Completed:   600/1000 - in 0.054 sec
#> Completed:   700/1000 - in 0.063 sec
#> Completed:   800/1000 - in 0.086 sec
#> Completed:   900/1000 - in 0.096 sec
#> Completed:   1000/1000 - in 0.109 sec

With the methods print and summary we can get information about the algorithm.

print(out)
#> DetectCpObj object
#> Type: change points detection on univariate time series

summary(out)
#> DetectCpObj object
#> Detecting change points on an univariate time series:
#>  Number of burn-in iterations: 100 
#>  Number of MCMC iterations: 900 
#>  Computational time: 0.11 seconds

In order to get a point estimate of the change points we can use the method posterior_estimate that uses the method salso by David B. Dahl and Müller (2022) to get the final latent order and then detect the change points.

table(posterior_estimate(out, loss = "binder"))
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.
#> 
#>  1  2 
#> 50 50

The package also provides a method for plotting the change points.

plot(out, loss = "binder")
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.

If we define instead a matrix of data, detect_cp automatically performs a multivariate change points detection method.

data_multi <- matrix(NA, nrow = 3, ncol = 100)

data_multi[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_multi[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_multi[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

Arguments k_0, nu_0, phi_0, m_0, par_theta_c, par_theta_d and prior_var_gamma correspond to the parameters of the prior distributions for the multivariate likelihood.

out <- detect_cp(data = data_multi, n_iterations = 1000, n_burnin = 100,
                 q = 0.25, params = list(k_0 = 0.25, nu_0 = 4, phi_0 = diag(1,3,3), 
                      m_0 = rep(0,3), par_theta_c = 2, par_theta_d = 0.2, 
                      prior_var_gamma = 0.1), kernel = "ts")
#> Completed:   100/1000 - in 0.016 sec
#> Completed:   200/1000 - in 0.032 sec
#> Completed:   300/1000 - in 0.046 sec
#> Completed:   400/1000 - in 0.063 sec
#> Completed:   500/1000 - in 0.076 sec
#> Completed:   600/1000 - in 0.089 sec
#> Completed:   700/1000 - in 0.101 sec
#> Completed:   800/1000 - in 0.113 sec
#> Completed:   900/1000 - in 0.125 sec
#> Completed:   1000/1000 - in 0.136 sec

table(posterior_estimate(out, loss = "binder"))
#> 
#>  1  2 
#> 50 50

plot(out, loss = "binder", plot_freq = TRUE)

Function detect_cp can also be used to detect change points on survival functions. We define a matrix of one row and a vector with the infection rates.

data_mat <- matrix(NA, nrow = 100, ncol = 1)

betas <- c(rep(0.45, 25),rep(0.14,75))

With function sim_epi_data we simulate a set of infection times.

inf_times <- sim_epi_data(10000, 10, 100, betas, 1/8)

inf_times_vec <- rep(0,100)
names(inf_times_vec) <- as.character(1:100)

for(j in 1:100){
 if(as.character(j) %in% names(table(floor(inf_times)))){
   inf_times_vec[j] = table(floor(inf_times))[which(names(table(floor(inf_times))) == j)]
 }
}

data_mat[,1] <- inf_times_vec

To run detect_cp on epidemiological data we need to set kernel = "epi". Moreover, besides the usual parameters, we need to set the number of Monte Carlo replications M for the approximation of the integrated likelihood and the recovery rate xi. a0 and b0 are optional and correspond to the parameters of the gamma distribution for the integration of the likelihood.

out <- detect_cp(data = data_mat, n_iterations = 200, n_burnin = 50,
                 params = list(xi = 1/8, a0 = 40, b0 = 10, M = 1000), kernel = "epi")
#> Completed:   20/200 - in 1.257 sec
#> Completed:   40/200 - in 2.489 sec
#> Completed:   60/200 - in 3.729 sec
#> Completed:   80/200 - in 4.88 sec
#> Completed:   100/200 - in 6.057 sec
#> Completed:   120/200 - in 7.217 sec
#> Completed:   140/200 - in 8.404 sec
#> Completed:   160/200 - in 9.596 sec
#> Completed:   180/200 - in 10.754 sec
#> Completed:   200/200 - in 11.942 sec

print(out)
#> DetectCpObj object
#> Type: change points detection on an epidemic diffusion
table(posterior_estimate(out, loss = "binder"))
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.
#> 
#>  1  2 
#> 29 71

Also here, with function plot we can plot the survival function and the position of the change points.

plot(out)
#> Warning in salso::salso(mcmc_chain, loss = "VI", maxNClusters = maxNClusters, :
#> The number of clusters equals the default maximum possible number of clusters.

Clustering time dependent data with common change points

BayesChange contains another function, clust_cp, that cluster respectively univariate and multivariate time series and survival functions with common change points. Details about this methods can be found in Corradin et al. (2024).

In clust_cp the argument kernel must be specified, if data are time series then kernel = "ts" must be set. Then the algorithm automatically detects if data are univariate or multivariate.

If time series are univariate we need to set a matrix where each row is a time series.

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

Arguments that need to be specified in clust_cp are the number of iterations n_iterations, the number of elements in the normalisation constant B, the split-and-merge step L performed when a new partition is proposed and a list with the parameters of the algorithm, the likelihood and the priors..

out <- clust_cp(data = data_mat, n_iterations = 1000, n_burnin = 100, 
                kernel = "ts",
                params = list(B = 1000, L = 1, gamma = 0.5))
#> Normalization constant - completed:  100/1000 - in 0.004 sec
#> Normalization constant - completed:  200/1000 - in 0.008 sec
#> Normalization constant - completed:  300/1000 - in 0.012 sec
#> Normalization constant - completed:  400/1000 - in 0.016 sec
#> Normalization constant - completed:  500/1000 - in 0.02 sec
#> Normalization constant - completed:  600/1000 - in 0.024 sec
#> Normalization constant - completed:  700/1000 - in 0.028 sec
#> Normalization constant - completed:  800/1000 - in 0.032 sec
#> Normalization constant - completed:  900/1000 - in 0.036 sec
#> Normalization constant - completed:  1000/1000 - in 0.039 sec
#> 
#> ------ MAIN LOOP ------
#> 
#> Completed:   100/1000 - in 0.053 sec
#> Completed:   200/1000 - in 0.108 sec
#> Completed:   300/1000 - in 0.159 sec
#> Completed:   400/1000 - in 0.212 sec
#> Completed:   500/1000 - in 0.263 sec
#> Completed:   600/1000 - in 0.311 sec
#> Completed:   700/1000 - in 0.366 sec
#> Completed:   800/1000 - in 0.415 sec
#> Completed:   900/1000 - in 0.467 sec
#> Completed:   1000/1000 - in 0.517 sec

posterior_estimate(out, loss = "binder")
#> [1] 1 2 3 4 4

Method plot for clustering univariate time series represents the data colored according to the assigned cluster.

plot(out, loss = "binder")

If time series are multivariate, data must be an array, where each element is a multivariate time series represented by a matrix. Each row of the matrix is a component of the time series.

data_array <- array(data = NA, dim = c(3,100,5))

data_array[1,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[2,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[3,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

data_array[1,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[2,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[3,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))

data_array[1,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[2,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[3,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_array, n_iterations = 1000, n_burnin = 100, 
                kernel = "ts", params = list(gamma = 0.1, k_0 = 0.25, nu_0 = 5,  phi_0 = diag(0.1,3,3), m_0 = rep(0,3)))
#> Normalization constant - completed:  100/1000 - in 0.007 sec
#> Normalization constant - completed:  200/1000 - in 0.015 sec
#> Normalization constant - completed:  300/1000 - in 0.022 sec
#> Normalization constant - completed:  400/1000 - in 0.029 sec
#> Normalization constant - completed:  500/1000 - in 0.037 sec
#> Normalization constant - completed:  600/1000 - in 0.044 sec
#> Normalization constant - completed:  700/1000 - in 0.052 sec
#> Normalization constant - completed:  800/1000 - in 0.059 sec
#> Normalization constant - completed:  900/1000 - in 0.066 sec
#> Normalization constant - completed:  1000/1000 - in 0.074 sec
#> 
#> ------ MAIN LOOP ------
#> 
#> Completed:   100/1000 - in 0.07 sec
#> Completed:   200/1000 - in 0.142 sec
#> Completed:   300/1000 - in 0.213 sec
#> Completed:   400/1000 - in 0.293 sec
#> Completed:   500/1000 - in 0.366 sec
#> Completed:   600/1000 - in 0.44 sec
#> Completed:   700/1000 - in 0.517 sec
#> Completed:   800/1000 - in 0.592 sec
#> Completed:   900/1000 - in 0.667 sec
#> Completed:   1000/1000 - in 0.741 sec

posterior_estimate(out, loss = "binder")
#> [1] 1 2 2 3 3

plot(out, loss = "binder")

Finally, if we set kernel = "epi", clust_cp cluster survival functions with common change points. Also here details can be found in Corradin et al. (2024).

Data are a matrix where each row is the number of infected at each time. Inside this package is included the function sim_epi_data that simulates infection times.

data_mat <- matrix(NA, nrow = 5, ncol = 50)

betas <- list(c(rep(0.45, 25),rep(0.14,25)),
               c(rep(0.55, 25),rep(0.11,25)),
               c(rep(0.50, 25),rep(0.12,25)),
               c(rep(0.52, 10),rep(0.15,40)),
               c(rep(0.53, 10),rep(0.13,40)))

  inf_times <- list()

  for(i in 1:5){

    inf_times[[i]] <- sim_epi_data(S0 = 10000, I0 = 10, max_time = 50, beta_vec = betas[[i]], xi_0 = 1/8)

    vec <- rep(0,50)
    names(vec) <- as.character(1:50)

    for(j in 1:50){
      if(as.character(j) %in% names(table(floor(inf_times[[i]])))){
      vec[j] = table(floor(inf_times[[i]]))[which(names(table(floor(inf_times[[i]]))) == j)]
      }
    }
    data_mat[i,] <- vec
  }

out <- clust_cp(data = data_mat, n_iterations = 100, n_burnin = 10, 
                kernel = "epi", 
                list(M = 100, B = 1000, L = 1, q = 0.1, gamma = 1/8))
#> Normalization constant - completed:  100/1000 - in 0.516 sec
#> Normalization constant - completed:  200/1000 - in 1.035 sec
#> Normalization constant - completed:  300/1000 - in 1.554 sec
#> Normalization constant - completed:  400/1000 - in 2.071 sec
#> Normalization constant - completed:  500/1000 - in 2.597 sec
#> Normalization constant - completed:  600/1000 - in 3.134 sec
#> Normalization constant - completed:  700/1000 - in 3.659 sec
#> Normalization constant - completed:  800/1000 - in 4.188 sec
#> Normalization constant - completed:  900/1000 - in 4.726 sec
#> Normalization constant - completed:  1000/1000 - in 5.258 sec
#> 
#> ------ MAIN LOOP ------
#> 
#> Completed:   10/100 - in 0.519 sec
#> Completed:   20/100 - in 1.095 sec
#> Completed:   30/100 - in 1.666 sec
#> Completed:   40/100 - in 2.267 sec
#> Completed:   50/100 - in 2.866 sec
#> Completed:   60/100 - in 3.499 sec
#> Completed:   70/100 - in 4.134 sec
#> Completed:   80/100 - in 4.883 sec
#> Completed:   90/100 - in 5.639 sec
#> Completed:   100/100 - in 6.5 sec

posterior_estimate(out, loss = "binder")
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.
#> [1] 1 2 3 1 4

plot(out, loss = "binder")
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.

Corradin, Riccardo, Luca Danese, Wasiur R. KhudaBukhsh, and Andrea Ongaro. 2024. “Model-Based Clustering of Time-Dependent Observations with Common Structural Changes.” https://arxiv.org/abs/2410.09552.

———. 2025. “Model-Based Clustering of Time-Dependent Observations with Common Structural Changes.” Statistics and Computing 36 (1): 7. https://doi.org/10.1007/s11222-025-10756-x.

David B. Dahl, Devin J. Johnson, and Peter Müller. 2022. “Search Algorithms and Loss Functions for Bayesian Clustering.” Journal of Computational and Graphical Statistics 31 (4): 1189–1201. https://doi.org/10.1080/10618600.2022.2069779.

Martínez, Asael Fabian, and Ramsés H. Mena. 2014. “On a Nonparametric Change Point Detection Model in Markovian Regimes.” Bayesian Analysis 9 (4): 823–58. https://doi.org/10.1214/14-BA878.