| Title: | Assessing Predisposition Between Phenotypes using Polygenic Scores | 
| Version: | 1.0.0 | 
| Description: | Using polygenic scores (PGS, or PRS/GRS for binary outcomes), this package lets you investigate shared predisposition between different conditions, run fast association analyses, and export plots and views of the PGS distribution as 'ggplot2' objects. | 
| Depends: | R (≥ 3.5.0) | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| Imports: | ggplot2, stats, utils, MASS, nnet, parallel, ivreg | 
| LazyData: | true | 
| Suggests: | testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-15 14:26:52 UTC; vincentp | 
| Author: | Vincent Pascat  | 
| Maintainer: | Vincent Pascat <vincent.pascat@univ-lille.fr> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-15 14:40:02 UTC | 
Association of a PGS distribution with a Phenotype
Description
assoc() takes a distribution of PGS, a Phenotype and optional Confounders.
Returns a data frame showing the association of the PGS with the Phenotype.
Usage
assoc(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  scale = TRUE,
  covar_col = NA,
  verbose = TRUE,
  log = ""
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
phenotype_col | 
 a character specifying the Phenotype column name  | 
scale | 
 a boolean specifying if scaling of PGS should be done before testing  | 
covar_col | 
 a character vector specifying the covariate column names (optional)  | 
verbose | 
 a boolean (TRUE by default) to write in the console/log messages.  | 
log | 
 a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink.  | 
Value
return a data frame showing the association of the PGS on the Phenotype with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and the most suitable way to analyse it, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: lists all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression; Otherwise, it is the OR of logistic regression
SE: standard error of the Beta coefficient (if Phenotype_type is Continuous)
lower_CI: lower confidence interval of the related Effect (Beta or OR)
upper_CI: upper confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
Examples
results <- assoc(
  df = comorbidData,
  prs_col = "ldl_PGS",
  phenotype_col = "log_ldl",
  scale = TRUE,
  covar_col = c("age", "sex", "gen_array")
)
print(results)
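To make the output columns more concrete, here is a minimal base-R sketch of the kind of model the 'Linear regression' case describes. It is not the package's internal code: the data are simulated, and the names toy, pgs, pheno and toy_result are hypothetical.

```r
# Simulated data; every column name below is hypothetical.
set.seed(1)
n <- 500
toy <- data.frame(
  pgs = rnorm(n, mean = 10, sd = 3),
  age = round(runif(n, 30, 70))
)

# Scale the PGS so the effect is expressed per standard deviation of the score.
toy$pgs_scaled <- as.numeric(scale(toy$pgs))
toy$pheno <- 0.3 * toy$pgs_scaled + 0.01 * toy$age + rnorm(n)

# Linear regression of the phenotype on the scaled PGS, adjusted for age.
fit <- lm(pheno ~ pgs_scaled + age, data = toy)
coefs <- summary(fit)$coefficients
ci <- confint(fit, "pgs_scaled", level = 0.95)

# Collect the estimate in a one-row data frame that mimics the documented columns.
toy_result <- data.frame(
  PGS = "pgs",
  Phenotype = "pheno",
  N = nobs(fit),
  Effect = coefs["pgs_scaled", "Estimate"],
  SE = coefs["pgs_scaled", "Std. Error"],
  lower_CI = ci[1, 1],
  upper_CI = ci[1, 2],
  P_value = coefs["pgs_scaled", "Pr(>|t|)"]
)
print(toy_result)
```

For a Cases/Controls phenotype the analogous sketch would use glm() with family = binomial and exponentiate the coefficient to obtain an OR.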
Multiple PGS Associations Plot
Description
assocplot() takes a data frame of associations from assoc().
Returns a plot of these associations (a ggplot2 object or a list of ggplot2 objects)
Usage
assocplot(score_table = NULL, axis = "vertical", pval = FALSE)
Arguments
score_table | 
 a dataframe with association results with at least the following columns: 
  | 
axis | 
 a character,   | 
pval | 
 a parameter specifying information on how to display P-value 
  | 
Value
return either:
a ggplot object representing the association results.
a list of two ggplot objects, accessible by $continuous_phenotype and $discrete_phenotype, if there are both Continuous Phenotypes and Discrete Phenotypes (i.e. "Categorical" or "Cases/Controls")
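A possible usage sketch, relying only on the signatures documented in this manual. It assumes the package is installed and that a table returned by multiphenassoc() is a valid score_table. Because the return value can be either a single plot or a list, the code handles both cases.

```r
# Runs only when comorbidPGS is installed; otherwise the block is skipped.
if (requireNamespace("comorbidPGS", quietly = TRUE)) {
  library(comorbidPGS)

  # One Cases/Controls and one Continuous phenotype for the same PGS.
  score_table <- multiphenassoc(
    df = comorbidData,
    prs_col = "t2d_PGS",
    phenotype_col = c("t2d", "log_ldl"),
    covar_col = c("age", "sex"),
    verbose = FALSE
  )

  plots <- assocplot(score_table = score_table, axis = "vertical", pval = FALSE)

  if (inherits(plots, "ggplot")) {
    # Single ggplot object: only one family of phenotypes was present.
    print(plots)
  } else {
    # List of two ggplot objects, one per family of phenotypes.
    print(plots$continuous_phenotype)
    print(plots$discrete_phenotype)
  }
}
```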
Centiles Plot from a PGS Association
Description
centileplot() takes a distribution of PGS, a Phenotype and optional Confounders.
Returns a plot (ggplot2 object) with centiles (or deciles if there are not enough individuals)
of PGS on the x-axis and Prevalence/Median/Mean of the Phenotype on the y-axis
Usage
centileplot(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  decile = FALSE,
  continuous_metric = NA
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
phenotype_col | 
 a character specifying the Phenotype column name  | 
decile | 
 a boolean specifying if centiles or deciles should be used  | 
continuous_metric | 
 an optional character specifying what metric to
use for continuous Phenotype, only three options:   | 
Value
Returns the resulting figure as a ggplot2 object
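The binning behind such a plot can be sketched in base R as follows. This is an illustration of the idea on simulated data, not the function's actual implementation; pgs, case and n_bins are hypothetical names.

```r
# Simulated score and a binary phenotype whose risk increases with the score.
set.seed(2)
n <- 5000
pgs <- rnorm(n)
case <- rbinom(n, 1, plogis(-2 + 0.5 * pgs))

# 100 bins gives centiles; set n_bins to 10 for deciles.
n_bins <- 100
breaks <- quantile(pgs, probs = seq(0, 1, length.out = n_bins + 1))

# Assign each individual to a bin (integer codes 1..n_bins).
bin <- cut(pgs, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Prevalence of the phenotype within each bin.
prevalence <- tapply(case, bin, mean)
print(head(prevalence))
print(tail(prevalence))
```

For a continuous phenotype, the same tapply() call with median or mean in place of the prevalence gives the other y-axis metrics named in the description.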
Mock dataset for comorbidPGS package
Description
A dataset with sets of PGSs, Phenotypes and Covariates to demo the comorbidPGS package
Usage
comorbidData
Format
A data frame with 10,000 rows (individuals) and 16 columns:
- ID
 Individual's identifier, characters
- sex
 Sex of the individuals, binary numeric values
- age
 Age of the individuals, numeric value
- gen_array
 The genotypic array used for those individuals, factor values
- ethnicity
 The ethnicity of individuals, can be also used as Categorical Phenotype, factor values
- brc_PGS, t2d_PGS, ldl_PGS
 Three distributions of PGS for Breast Cancer, Type 2 Diabetes and low-density lipoprotein (LDL) respectively; numeric values
- brc, t2d, hypertension
 Three Cases/Controls Phenotypes, representing Breast Cancer, Type 2 Diabetes and Hypertension respectively; binary values
- ldl, bmi, sbp
 Three Continuous Phenotypes, representing low-density lipoprotein, body-mass index, and systolic blood pressure respectively; numeric values
- log_ldl
 A continuous Phenotype, based on log(ldl) to have a normal distribution; numeric values
- sbp_cat
 An Ordered Categorical Phenotype, with 3 possible outcomes: low, normal or high systolic blood pressure; factor values
Source
https://github.com/VP-biostat/comorbidPGS
Deciles BoxPlot from a PGS Association with a Continuous Phenotype
Description
decileboxplot() takes a distribution of PGS and a Continuous Phenotype.
Returns a plot with deciles of PGS on the x-axis and a boxplot of the Phenotype on the y-axis
Usage
decileboxplot(df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype")
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
phenotype_col | 
 a character specifying the Continuous Phenotype column name  | 
Value
return a ggplot object (ggplot2)
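As a rough illustration of what a decile boxplot summarises, the following base-R sketch splits a simulated score into deciles and computes the boxplot statistics per decile. It does not use the package, and pgs, pheno and decile are hypothetical names.

```r
# Simulated score and a continuous phenotype correlated with it.
set.seed(3)
n <- 2000
pgs <- rnorm(n)
pheno <- 0.4 * pgs + rnorm(n)

# Split the score into ten equally populated groups.
decile <- cut(
  pgs,
  breaks = quantile(pgs, probs = seq(0, 1, by = 0.1)),
  include.lowest = TRUE,
  labels = 1:10
)

# Compute boxplot statistics per decile without drawing anything.
bp <- boxplot(pheno ~ decile, plot = FALSE)

# Row 3 of bp$stats holds the median of each decile.
print(round(bp$stats[3, ], 2))
```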
Density Plot from a PGS Association
Description
densityplot() takes a distribution of PGS, a Phenotype and optional Confounders.
Returns a plot with the density of PGS on the x-axis, by Categories of the Phenotype
Usage
densityplot(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  scale = TRUE,
  threshold = NA
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
phenotype_col | 
 a character specifying the Phenotype column name  | 
scale | 
 a boolean specifying if scaling of PGS should be done before plotting  | 
threshold | 
 an optional numeric specifying, for a Continuous Phenotype, the Threshold used to classify individuals as Cases/Controls, as follows: 
  | 
Value
return a ggplot object (ggplot2)
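The quantity being plotted can be sketched in base R: one kernel density estimate of the scaled PGS per phenotype category. The data are simulated and the names pgs, case and dens are hypothetical; the package itself returns a ggplot2 object rather than these raw estimates.

```r
# Simulated score and a Cases/Controls phenotype.
set.seed(4)
n <- 3000
pgs <- rnorm(n, mean = 5, sd = 2)

# Scaling centres the score at 0 with unit standard deviation.
pgs_scaled <- as.numeric(scale(pgs))
case <- rbinom(n, 1, plogis(-1 + 0.6 * pgs_scaled))

# One kernel density estimate per category (controls = "0", cases = "1").
dens <- lapply(split(pgs_scaled, case), density)

# Compare where each group's score distribution is centred.
print(tapply(pgs_scaled, case, mean))
```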
Mendelian Randomization Two-Stage Least Square (2SLS) method with external PGS
Description
mr_2sls() takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype).
Returns a data frame with the result of the Mendelian Randomization 2SLS method using the PGS
Usage
mr_2sls(
  df = NULL,
  prs_col = "SCORESUM",
  exposure_col = NA,
  outcome_col = NA,
  scale = TRUE,
  verbose = TRUE,
  log = ""
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
exposure_col | 
 a character specifying the Exposure (Phenotype) column name  | 
outcome_col | 
 a character specifying the Outcome (Phenotype) column name  | 
scale | 
 a boolean specifying if scaling of PGS should be done before testing  | 
verbose | 
 a boolean (TRUE by default) to write in the console/log messages.  | 
log | 
 a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink.  | 
Value
return a data frame with the Mendelian Randomization association result using 2SLS method with the following columns:
PGS: the name of the PGS used
Exposure: the name of Phenotype used as Exposure
Outcome: the name of Phenotype used as Outcome
Method: the MR method used (here 2SLS)
N_cases: if Phenotype_type is Cases/Controls, the number of cases
N_controls: if Phenotype_type is Cases/Controls, the number of controls
N: the number of individuals/samples
MR_estimate: the MR estimate (beta) using the 2SLS method
SE: the associated standard error (second order)
F_stat: the F-statistic of the Exposure ~ PGS association
Examples
result <- mr_2sls(
  df = comorbidData,
  prs_col = "ldl_PGS",
  exposure_col = "log_ldl",
  outcome_col = "bmi",
  scale = TRUE
)
print(result)
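For readers unfamiliar with 2SLS, the two stages can be written out by hand in base R. This is a sketch on simulated data with the PGS as the single instrument, not the package's implementation; pgs, u, exposure and outcome are hypothetical variables.

```r
# Simulated data with an unmeasured confounder u that biases the naive estimate.
set.seed(5)
n <- 2000
pgs <- rnorm(n)
u <- rnorm(n)
exposure <- 0.5 * pgs + u + rnorm(n)
outcome <- 0.3 * exposure + u + rnorm(n)

# Stage 1: regress the exposure on the instrument (the PGS).
stage1 <- lm(exposure ~ pgs)
f_stat <- unname(summary(stage1)$fstatistic["value"])

# Stage 2: regress the outcome on the exposure predicted by stage 1.
exposure_hat <- fitted(stage1)
stage2 <- lm(outcome ~ exposure_hat)
tsls_estimate <- unname(coef(stage2)["exposure_hat"])

# Naive regression for comparison; it absorbs the confounding through u.
naive_estimate <- unname(coef(lm(outcome ~ exposure))["exposure"])

print(c(F_stat = f_stat, tsls = tsls_estimate, naive = naive_estimate))
```

The standard error reported by the second lm() call is not a valid 2SLS standard error, because it ignores the uncertainty of the first stage. Dedicated instrumental-variable routines correct for this.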
Mendelian Randomization ratio method with external PGS
Description
mr_ratio() takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype).
Returns a data frame with the result of the Mendelian Randomization ratio method using the PGS
Usage
mr_ratio(
  df = NULL,
  prs_col = "SCORESUM",
  exposure_col = NA,
  outcome_col = NA,
  scale = TRUE,
  verbose = TRUE,
  log = ""
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
exposure_col | 
 a character specifying the Exposure (Phenotype) column name  | 
outcome_col | 
 a character specifying the Outcome (Phenotype) column name  | 
scale | 
 a boolean specifying if scaling of PGS should be done before testing  | 
verbose | 
 a boolean (TRUE by default) to write in the console/log messages.  | 
log | 
 a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink.  | 
Value
return a data frame with the Mendelian Randomization association result using the ratio method with the following columns:
PGS: the name of the PGS used
Exposure: the name of Phenotype used as Exposure
Outcome: the name of Phenotype used as Outcome
Method: the MR method used (here Ratio)
N_cases: if Phenotype_type is Cases/Controls, the number of cases
N_controls: if Phenotype_type is Cases/Controls, the number of controls
N: the number of individuals/samples
MR_estimate: the MR estimate (beta) using the ratio method
SE: the associated standard error (second order)
F_stat: the F-statistic of the Exposure ~ PGS association
Examples
result <- mr_ratio(
  df = comorbidData,
  prs_col = "ldl_PGS",
  exposure_col = "log_ldl",
  outcome_col = "bmi",
  scale = TRUE
)
print(result)
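The ratio method can also be sketched by hand. The code below is an assumption-laden illustration on simulated data: it computes the ratio of the two PGS coefficients and a second-order delta-method standard error, which is one common reading of "second order" and may differ from the formula the package uses. All variable names are hypothetical.

```r
# Simulated data: the PGS affects the outcome only through the exposure.
set.seed(6)
n <- 2000
pgs <- rnorm(n)
u <- rnorm(n)
exposure <- 0.5 * pgs + u + rnorm(n)
outcome <- 0.3 * exposure + u + rnorm(n)

# PGS -> exposure and PGS -> outcome regressions.
fit_x <- lm(exposure ~ pgs)
fit_y <- lm(outcome ~ pgs)
bx <- summary(fit_x)$coefficients["pgs", "Estimate"]
sx <- summary(fit_x)$coefficients["pgs", "Std. Error"]
by <- summary(fit_y)$coefficients["pgs", "Estimate"]
sy <- summary(fit_y)$coefficients["pgs", "Std. Error"]

# Ratio estimate: effect on the outcome divided by effect on the exposure.
ratio_estimate <- by / bx

# First-order SE ignores the uncertainty in bx; the second-order term adds it.
se_first <- sy / abs(bx)
se_second <- sqrt(sy^2 / bx^2 + by^2 * sx^2 / bx^4)

print(c(estimate = ratio_estimate, se_first = se_first, se_second = se_second))
```

With a single instrument and no covariates, this ratio coincides with the point estimate obtained from two-stage least squares.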
Multiple PGS Associations from a Data Frame
Description
multiassoc() takes a data frame with distribution(s) of PGS and Phenotype(s),
and a table of associations to make from this data frame.
Returns a data frame showing the association results
Usage
multiassoc(
  df = NULL,
  assoc_table = NULL,
  scale = TRUE,
  covar_col = NA,
  verbose = TRUE,
  log = "",
  parallel = FALSE,
  num_cores = NA
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
assoc_table | 
 a dataframe or matrix specifying the associations to make from df, with 2 columns: PGS and Phenotype (in this order)  | 
scale | 
 a boolean specifying if scaling of PGS should be done before testing  | 
covar_col | 
 a character vector specifying the covariate column names (optional)  | 
verbose | 
 a boolean (TRUE by default) to write in the console/log messages.  | 
log | 
 a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. If parallel = TRUE, the log will be incomplete  | 
parallel | 
 a boolean, if TRUE,   | 
num_cores | 
 an integer, if parallel = TRUE (default),   | 
Value
return a data frame showing the association of the PGS(s) on the Phenotype(s) with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and the most suitable way to analyse it, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: lists all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression, OR of logistic regression otherwise
SE: standard error of the related Effect (Beta or OR)
lower_CI: lower confidence interval of the related Effect (Beta or OR)
upper_CI: upper confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
Examples
assoc_table <- expand.grid(
  c("t2d_PGS", "ldl_PGS"),
  c("ethnicity","brc","t2d","log_ldl","sbp_cat")
)
results <- multiassoc(
  df = comorbidData,
  assoc_table = assoc_table,
  covar_col = c("age", "sex", "gen_array"),
  parallel = FALSE,
  verbose = FALSE
)
print(results)
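The assoc_table does not have to be a full grid. The base-R sketch below contrasts a grid built with expand.grid() with a hand-picked list of pairs. Naming the columns PGS and Phenotype and storing them as character strings are choices made here for readability, not documented requirements. Only the column order (PGS first, Phenotype second) comes from the argument description.

```r
# Full grid: every PGS tested against every phenotype (2 x 5 = 10 rows).
grid_table <- expand.grid(
  PGS = c("t2d_PGS", "ldl_PGS"),
  Phenotype = c("ethnicity", "brc", "t2d", "log_ldl", "sbp_cat"),
  stringsAsFactors = FALSE
)

# Hand-picked pairs: each PGS tested only against its matching phenotype.
pair_table <- data.frame(
  PGS = c("brc_PGS", "t2d_PGS", "ldl_PGS"),
  Phenotype = c("brc", "t2d", "log_ldl"),
  stringsAsFactors = FALSE
)

print(dim(grid_table))
print(pair_table)
```

Either table could then be passed as assoc_table to multiassoc() in place of the grid used in the example.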
Multiple PGS Associations from different Phenotypes
Description
multiphenassoc() takes a distribution of PGS, multiple Phenotypes and optional Confounders.
Returns a data frame showing the association results
Usage
multiphenassoc(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  scale = TRUE,
  covar_col = NA,
  verbose = TRUE,
  log = ""
)
Arguments
df | 
 a dataframe with individuals on each row, and at least the following columns: 
  | 
prs_col | 
 a character specifying the PGS column name  | 
phenotype_col | 
 a character vector specifying the Phenotype column names  | 
scale | 
 a boolean specifying if scaling of PGS should be done before testing  | 
covar_col | 
 a character vector specifying the covariate column names (optional)  | 
verbose | 
 a boolean (TRUE by default) to write in the console/log messages.  | 
log | 
 a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink.  | 
Value
return a data frame showing the association of the PGS on the Phenotypes with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and the most suitable way to analyse it, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: lists all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression; Otherwise, it is the OR of logistic regression
SE: standard error of the Beta coefficient (if Phenotype_type is Continuous)
lower_CI: lower confidence interval of the related Effect (Beta or OR)
upper_CI: upper confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
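A possible call, built only from the signature above and the columns of comorbidData described in this manual. It assumes the package is installed; the choice of PGS, phenotypes and covariates is arbitrary and purely illustrative.

```r
# Runs only when comorbidPGS is installed; otherwise the block is skipped.
if (requireNamespace("comorbidPGS", quietly = TRUE)) {
  library(comorbidPGS)

  # One PGS tested against a Cases/Controls, a Continuous and an
  # Ordered Categorical phenotype in a single call.
  phen_results <- multiphenassoc(
    df = comorbidData,
    prs_col = "t2d_PGS",
    phenotype_col = c("t2d", "bmi", "sbp_cat"),
    scale = TRUE,
    covar_col = c("age", "sex", "gen_array"),
    verbose = FALSE
  )
  print(phen_results)
}
```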