The segtest package offers a suite of tools for testing
segregation distortion in F1 polyploid populations across diverse
meiotic models. These methods support autopolyploids (full polysomic
inheritance), allopolyploids (full disomic inheritance), and segmental
allopolyploids (partial preferential pairing). Double reduction is
optionally modeled fully in tetraploids and partially (at simplex loci
only) in higher ploidies. A user-specified maximum proportion of
outliers allows the method to accommodate moderate double reduction at
non-simplex loci. Offspring genotypes may be known or modeled using
genotype likelihoods to account for genotype uncertainty. Parent data
are optional, and the two parents may have different (even) ploidies.
Details of the methods may be found in Gerard et al. (2025a) and
Gerard et al. (2025b).
Additional functions include those that generate gamete and genotype frequencies under different models of meiosis, functions that simulate genotype (log) likelihoods, and “competing” tests for segregation distortion.
The main functions are:
seg_multi(): Run the likelihood ratio test for segregation distortion in parallel at many loci.
multidog_to_g(): Format the genotyping output from updog::multidog() to be compatible with the input of seg_multi().
seg_lrt(): Test for segregation distortion for any even ploidy.
gamfreq(): Gamete frequencies.
gf_freq(): Genotype frequencies of an F1 population of polyploids.
drbounds(): Upper bounds on the double reduction rate(s) based on two different extreme models of meiosis.
simgl(): Simulate genotype log-likelihoods given a vector of genotype counts.
gamfreq() will generate gamete frequencies under different models of meiosis. gf_freq() will generate genotype frequencies under the same models by convolving the output from gamfreq(). We focus on gamfreq(), as gf_freq() uses the same models but applied separately to each parent.
For autopolyploids, specify type = "polysomic" and, optionally, the amount of double reduction via alpha. alpha is a vector of length floor(ploidy / 4) where element i is the probability that a gamete has i pairs of alleles that are identical by double reduction. The upper bounds for alpha can be found via drbounds().
E.g., for a parental octoploid with genotype 4 with no and moderate levels of double reduction:
drbounds(ploidy = 8) ## DR bounds
#> [1] 0.38571429 0.02142857
gamfreq(g = 4, ploidy = 8, type = "polysomic") ## no DR
#> [1] 0.01428571 0.22857143 0.51428571 0.22857143 0.01428571
gamfreq(g = 4, ploidy = 8, alpha = c(0.1, 0.01), type = "polysomic") ## Some DR
#> [1] 0.022 0.232 0.492 0.232 0.022
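As a sanity check on the no-double-reduction case, here is a minimal base R sketch. It assumes the standard polysomic model, in which a gamete draws ploidy / 2 of the parent's chromosome copies without replacement, so the gamete dosage is hypergeometric. This is an illustration of that model, not a description of how gamfreq() is implemented.

```r
# Assumption: polysomic inheritance with no double reduction, so the number of
# alternative alleles in a gamete is hypergeometric.
ploidy <- 8  # parental ploidy
g <- 4       # parental dosage

# Draw ploidy / 2 chromosome copies from g alternative and (ploidy - g) reference copies.
p <- stats::dhyper(x = 0:(ploidy / 2), m = g, n = ploidy - g, k = ploidy / 2)
p
#> [1] 0.01428571 0.22857143 0.51428571 0.22857143 0.01428571
```

This agrees with the "no DR" output of gamfreq() shown above.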
For allopolyploids, the possible gamete frequencies can be found in seg. E.g., for a parental octoploid with genotype 4, the possible gamete frequencies are
seg[seg$ploidy == 8 & seg$g == 4 & seg$mode %in% c("disomic", "both"), "p"]
#> [[1]]
#> [1] 0.0625 0.2500 0.3750 0.2500 0.0625
#>
#> [[2]]
#> [1] 0.00 0.25 0.50 0.25 0.00
#>
#> [[3]]
#> [1] 0 0 1 0 0
Note that you also need to filter for the mode to be either "disomic" or "both" (both disomic and polysomic). The number of possible allopolyploid distributions is returned by n_pp_mix().
You can select one of these distributions by setting gamma to an indicator vector (here of length 3, with a single 1). E.g.
gamfreq(g = 4, ploidy = 8, gamma = c(1, 0, 0), type = "mix")
#> [1] 0.0625 0.2500 0.3750 0.2500 0.0625
gamfreq(g = 4, ploidy = 8, gamma = c(0, 1, 0), type = "mix")
#> [1] 0.00 0.25 0.50 0.25 0.00
gamfreq(g = 4, ploidy = 8, gamma = c(0, 0, 1), type = "mix")
#> [1] 0 0 1 0 0
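To see where these three distributions come from, here is a minimal base R sketch. It assumes strict bivalent pairing: an octoploid has four homologous pairs, each pair carries 0, 1, or 2 copies of the alternative allele, and a gamete receives one chromosome from each pair. The helper names (pair_dist, convolve_pmf, disomic_gamfreq) are hypothetical and not part of segtest.

```r
# Hypothetical helper: gamete contribution (0 or 1 copies) from one homologous
# pair that carries 0, 1, or 2 copies of the alternative allele.
pair_dist <- function(copies) {
  switch(as.character(copies),
         "0" = c(1, 0),
         "1" = c(0.5, 0.5),
         "2" = c(0, 1))
}

# Hypothetical helper: convolve two probability vectors on 0, 1, 2, ...
convolve_pmf <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

# Gamete dosage distribution given the number of copies on each pair.
disomic_gamfreq <- function(pairs) {
  Reduce(convolve_pmf, lapply(pairs, pair_dist))
}

disomic_gamfreq(c(1, 1, 1, 1))  # one copy on every pair
#> [1] 0.0625 0.2500 0.3750 0.2500 0.0625
disomic_gamfreq(c(2, 1, 1, 0))  # one pair with two copies, two pairs with one
#> [1] 0.00 0.25 0.50 0.25 0.00
disomic_gamfreq(c(2, 2, 0, 0))  # two pairs with two copies each
#> [1] 0 0 1 0 0
```

The three ways of spreading a dosage of 4 across four pairs reproduce the three distributions above.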
Segmental allopolyploids are mixtures of the possible allopolyploid segregation distributions. E.g., an equal mixture of the three for an octoploid with genotype 4 is
gamfreq(g = 4, ploidy = 8, gamma = c(1, 1, 1)/3, type = "mix")
#> [1] 0.02083333 0.16666667 0.62500000 0.16666667 0.02083333
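The mixture can be checked by hand in base R. This sketch copies the three component distributions from the output above and takes their gamma-weighted average; the variable names are illustrative only.

```r
# The three allopolyploid (disomic) gamete distributions for ploidy 8, genotype 4,
# copied from the output above (one distribution per row).
comps <- rbind(
  c(0.0625, 0.25, 0.375, 0.25, 0.0625),
  c(0,      0.25, 0.5,   0.25, 0),
  c(0,      0,    1,     0,    0)
)

# Mixing proportions (must be non-negative and sum to one).
gamma <- c(1, 1, 1) / 3

# Weighted average of the rows.
mix <- drop(gamma %*% comps)
mix
#> [1] 0.02083333 0.16666667 0.62500000 0.16666667 0.02083333
```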
At simplex loci, there is only one possible allopolyploid segregation distribution:
n_pp_mix(g = 1, ploidy = 8)
#> [1] 1
gamfreq(g = 1, ploidy = 8, gamma = 1, type = "mix")
#> [1] 0.5 0.5 0.0 0.0 0.0
n_pp_mix(g = 7, ploidy = 8)
#> [1] 1
gamfreq(g = 7, ploidy = 8, gamma = 1, type = "mix")
#> [1] 0.0 0.0 0.0 0.5 0.5
You can account for double reduction at these loci by including beta, whose upper bound can be found via beta_bounds().
Let’s suppose we have some genotype frequencies we want to simulate individual data from:
gf <- gf_freq(
p1_g = 2,
p1_ploidy = 6,
p1_gamma = c(0.7, 0.3),
p1_type = "mix",
p2_g = 4,
p2_ploidy = 6,
p2_gamma = c(0.5, 0.5),
p2_type = "mix")
plot(gf, type = "h", xlab = "Genotype", ylab = "Frequency")
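The convolution step that turns two parents' gamete frequencies into offspring genotype frequencies can be sketched in base R. To stay self-contained, this example uses hypergeometric (polysomic, no double reduction) gamete frequencies as stand-ins for the mixture models above, so the numbers will not match gf. The helper convolve_pmf and the variable names are hypothetical.

```r
# Hypothetical helper: convolve two probability vectors on 0, 1, 2, ...
convolve_pmf <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

# Stand-in gamete frequencies for two hexaploid parents with dosages 2 and 4
# (assumption: polysomic inheritance, no double reduction).
p1_gam <- stats::dhyper(x = 0:3, m = 2, n = 4, k = 3)
p2_gam <- stats::dhyper(x = 0:3, m = 4, n = 2, k = 3)

# Offspring dosage is the sum of the two gamete dosages, so its distribution
# is the convolution of the two gamete distributions.
gf_demo <- convolve_pmf(p1_gam, p2_gam)
round(gf_demo, 2)
#> [1] 0.00 0.04 0.24 0.44 0.24 0.04 0.00
```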
To simulate genotype counts, just use rmultinom() from the stats package. Let’s simulate data from 10 individuals.
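A minimal sketch of that step is below. To keep it self-contained, it uses a made-up frequency vector gf_demo; in the workflow here you would presumably pass prob = gf and store the result in x, but that exact call is an assumption. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Made-up genotype frequencies for dosages 0, 1, ..., 6 (stand-in for gf).
gf_demo <- c(0, 0.04, 0.24, 0.44, 0.24, 0.04, 0)

# One multinomial draw of 10 individuals; c() drops the matrix dimensions so
# the result is a plain vector of counts, one per dosage.
x_demo <- c(stats::rmultinom(n = 1, size = 10, prob = gf_demo))
x_demo
```

Element k + 1 of the result counts the individuals with dosage k.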
To simulate genotype (log) likelihoods, insert these genotype counts into simgl().
gl <- simgl(nvec = x)
gl
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] -2.476575 -1.178405 -2.395080 -4.380673 -7.348080 -12.283402 -26.730748
#> [2,] -10.226929 -2.738521 -1.514758 -1.614191 -2.750519 -5.504672 -16.089299
#> [3,] -13.071998 -3.968154 -1.970660 -1.450388 -1.970660 -3.968154 -13.071998
#> [4,] -16.089299 -5.504672 -2.750519 -1.614191 -1.514758 -2.738521 -10.226929
#> [5,] -19.319654 -7.362628 -3.867459 -2.118237 -1.394945 -1.826320 -7.525849
#> [6,] -16.089299 -5.504672 -2.750519 -1.614191 -1.514758 -2.738521 -10.226929
#> [7,] -19.319654 -7.362628 -3.867459 -2.118237 -1.394945 -1.826320 -7.525849
#> [8,] -16.089299 -5.504672 -2.750519 -1.614191 -1.514758 -2.738521 -10.226929
#> [9,] -22.827562 -9.587669 -5.365836 -3.006407 -1.654578 -1.273137 -4.947896
#> [10,] -13.071998 -3.968154 -1.970660 -1.450388 -1.970660 -3.968154 -13.071998
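Each row above holds one individual's log-likelihoods for dosages 0 through 6. For intuition about what such a row represents, here is a rough sketch that computes genotype log-likelihoods from read counts under a plain binomial model with a sequencing error rate. This is a deliberately simplified stand-in and not the model simgl() uses; gl_binom and its arguments are hypothetical.

```r
# Hypothetical helper (not part of segtest): natural-log genotype likelihoods
# for one individual under a simple binomial read-count model.
gl_binom <- function(refcount, depth, ploidy, seq_error = 0.01) {
  k <- 0:ploidy
  # Probability that a read shows the counted allele, given dosage k,
  # allowing for reads that are miscalled at rate seq_error.
  prob_ref <- (k / ploidy) * (1 - seq_error) + (1 - k / ploidy) * seq_error
  stats::dbinom(x = refcount, size = depth, prob = prob_ref, log = TRUE)
}

# 5 of 10 reads carry the counted allele in a hexaploid.
gl_one <- gl_binom(refcount = 5, depth = 10, ploidy = 6)
which.max(gl_one) - 1  # most likely dosage
#> [1] 3
```

Note that dbinom(..., log = TRUE) returns natural logs.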
You can test for segregation distortion using seg_lrt(). E.g., let’s test for it using the data (both known genotypes and genotype likelihoods) we simulated from the previous section:
## With known genotypes
sout1 <- seg_lrt(x = x, p1_ploidy = 6, p2_ploidy = 6, p1 = 2, p2 = 4)
sout1$p_value
#> [1] 0.5820762
## With genotype likelihoods
sout2 <- seg_lrt(x = gl, p1_ploidy = 6, p2_ploidy = 6, p1 = 2, p2 = 4)
sout2$p_value
#> [1] 0.5860578
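For intuition about the likelihood ratio idea with known genotypes, here is a much simpler sketch: an LRT of observed genotype counts against a single fixed null distribution, using a chi-squared approximation. This is not what seg_lrt() does (it searches over a family of null models, handles outliers, and approximates the degrees of freedom differently); simple_lrt and the example data are hypothetical.

```r
# Hypothetical helper (not seg_lrt()): LRT of counts x against one fixed
# null genotype distribution q.
simple_lrt <- function(x, q) {
  n <- sum(x)
  df <- sum(q > 0) - 1  # free parameters under the unrestricted alternative
  # A genotype that is impossible under the null but observed rejects outright.
  if (any(x > 0 & q == 0)) {
    return(list(statistic = Inf, df = df, p_value = 0))
  }
  pos <- x > 0  # terms with x = 0 contribute nothing
  stat <- 2 * sum(x[pos] * log(x[pos] / (n * q[pos])))
  list(statistic = stat, df = df,
       p_value = stats::pchisq(stat, df = df, lower.tail = FALSE))
}

# Made-up null frequencies and counts for dosages 0, ..., 6.
q_demo <- c(0, 0.04, 0.24, 0.44, 0.24, 0.04, 0)
x_obs <- c(0, 1, 2, 4, 3, 0, 0)
res <- simple_lrt(x = x_obs, q = q_demo)
res$p_value
```

With only 10 individuals the chi-squared approximation is rough, so treat the resulting p-value as illustrative.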
My recommendation is to always use the genotype log-likelihoods. But seg_lrt() allows for known genotypes, if that situation works best for you.
The default (model = "seg") is to assume your organism is a segmental allopolyploid, and to account for possible double reduction at simplex loci. But you should absolutely use other models if you have more information on your organism:
"seg": General segmental allopolyploid with possible double reduction at simplex loci.
"allo_pp": Segmental allopolyploid with complete bivalent pairing (no double reduction).
"allo": Pure allopolyploid (disomic inheritance).
"auto": Pure autopolyploid with complete bivalent pairing (no double reduction).
"auto_dr": Pure autopolyploid with possible multivalent pairing (some double reduction).
"auto_allo": Same null hypothesis as in polymapR.
We allow for some non-valid genotypes via the ob argument. This is the upper bound on the proportion of outliers. By default, this is set to 0.03. You can set this to 0 (or set outlier = FALSE) if you want any outliers to indicate segregation distortion.
Make sure that the log-likelihoods are base \(e\). If they are base 10, you’ll get the wrong \(p\)-value:
gl10 <- gl / log(10)
seg_lrt(x = gl10, p1_ploidy = 6, p2_ploidy = 6, p1 = 2, p2 = 4)$p_value
#> [1] 0.9141411
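If your genotyping software reports base-10 log-likelihoods, convert them before testing. A minimal base R sketch with made-up values (the variable names are illustrative):

```r
# Made-up base-10 log-likelihoods for one individual.
gl10_demo <- c(-1.2, -0.4, -2.5)

# log_e(L) = log_10(L) * log_e(10), so multiply by log(10) to get natural logs.
gl_e_demo <- gl10_demo * log(10)

# Check: both versions give the same likelihoods on the original scale.
all.equal(10^gl10_demo, exp(gl_e_demo))
#> [1] TRUE
```

The same multiplication works elementwise on a whole matrix of log-likelihoods.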
Don’t mess with the technical arguments (ntry, opt, optg, df_tol). These have to do with the optimization and how to approximate the degrees of freedom of the test. The one possible exception is ntry, which you could increase if you are seeing weird results. But then let me know, because I haven’t seen any bad behavior with ntry = 3 (the default).
Gerard D, Thakkar M, & Ferrão LFV (2025a). “Tests for segregation distortion in tetraploid F1 populations.” Theoretical and Applied Genetics, 138(30), p. 1–13. doi:10.1007/s00122-025-04816-z.
Gerard D, Ambrosano GB, Pereira GdS, & Garcia AAF (2025b). “Tests for segregation distortion in higher ploidy F1 populations.” bioRxiv, p. 1–20. bioRxiv:2025.06.23.661114.