Create a cohort

Once we have the conceptSet, we can generate a cohort. Two functions are available for this:

generateConceptCohortSet(): to generate a cohort for a given list of concepts (they do not have to be drugs). This function is exported from the CDMConnector package.
generateDrugUtilisationCohortSet(): to generate a cohort of drug use.

Use generateConceptCohortSet() to create a cohort

Let’s try to use generateConceptCohortSet() to get the asthma cohort using the conceptSet_code created before. We could also use conceptSet_json_1 or conceptSet_json_2 to obtain the same result.

cdm <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_1",
  overwrite = TRUE
)
cdm$asthma_1
#> # Source:   table<main.asthma_1> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1         78 1969-04-05        1971-07-21     
#>  2                    1        128 2021-07-27        2021-08-05     
#>  3                    1        174 2014-09-13        2018-06-09     
#>  4                    1        123 1986-09-05        1990-02-02     
#>  5                    1         47 1994-08-15        2002-04-02     
#>  6                    1        137 2008-06-22        2019-11-21     
#>  7                    1        181 2013-07-28        2016-01-02     
#>  8                    1        152 2020-09-17        2021-07-21     
#>  9                    1        169 1996-01-01        2015-08-07     
#> 10                    1         33 2017-12-30        2019-01-28     
#> # ℹ more rows

The count of the cohort can be assessed using cohortCount() from CDMConnector.

cohortCount(cdm$asthma_1)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1            100             100
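To make the two count columns concrete, the base R sketch below counts rows and distinct subjects in a small made-up table. The toy_cohort data frame and its values are invented for illustration, and this is not a description of how cohortCount() is implemented.

```r
# Toy cohort table (hypothetical data, not taken from the mock cdm)
toy_cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L),
  subject_id = c(10L, 10L, 11L, 12L)
)

# number_records counts rows; number_subjects counts distinct people
number_records <- nrow(toy_cohort)
number_subjects <- length(unique(toy_cohort$subject_id))

print(c(number_records = number_records, number_subjects = number_subjects))
```

A cohort in which a subject contributes several records therefore shows number_records larger than number_subjects.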

Cohort attrition can be assessed using attrition() from CDMConnector.

attrition(cdm$asthma_1)
#> # A tibble: 1 × 7
#>   cohort_definition_id number_records number_subjects reason_id reason          
#>                  <int>          <int>           <int>     <int> <chr>           
#> 1                    1            100             100         1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>

end parameter

You can use the end parameter to set how the cohort end date is defined. By default, end = "observation_period_end_date", but it can also be set to "event_end_date" or to a numeric scalar. See an example below:

cdm <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_2",
  end = "event_end_date",
  overwrite = TRUE
)
cdm$asthma_2
#> # Source:   table<main.asthma_2> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1         69 2020-09-02        2020-11-07     
#>  2                    1         99 1989-03-11        1989-11-22     
#>  3                    1          6 2021-11-24        2022-01-17     
#>  4                    1        117 1991-11-16        1993-05-08     
#>  5                    1         85 1998-12-22        2005-06-03     
#>  6                    1         96 2019-08-18        2020-06-03     
#>  7                    1         74 2019-04-13        2019-10-20     
#>  8                    1        124 2005-10-19        2007-01-29     
#>  9                    1        143 2017-12-15        2018-01-08     
#> 10                    1         95 2007-11-28        2008-10-24     
#> # ℹ more rows
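The base R sketch below shows, for one invented event, which end date each kind of end value might produce. The dates are hypothetical. It also assumes that a numeric scalar means a fixed number of days after the event start date, so check the CDMConnector documentation for the exact rule.

```r
# Hypothetical event and observation period for one person
event_start <- as.Date("2020-09-02")
event_end <- as.Date("2020-11-07")
observation_period_end <- as.Date("2021-12-31")

# Assumed interpretation of the three kinds of `end` value
end_observation <- observation_period_end # end = "observation_period_end_date"
end_event <- event_end                    # end = "event_end_date"
end_fixed <- event_start + 30             # end = 30 (assumed: days after start)

print(c(end_observation, end_event, end_fixed))
```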

requiredObservation parameter

The requiredObservation parameter is a numeric vector of length 2 that defines the number of days of observation required before and after the index date for an event to be included in the cohort. The default value is c(0,0). Let’s check how asthma_3 differs from asthma_1, which uses the same end setting, when this parameter is changed.

cdm <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_3",
  end = "observation_period_end_date",
  requiredObservation = c(10, 10),
  overwrite = TRUE
)
cdm$asthma_3
#> # Source:   table<main.asthma_3> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1        191 2018-02-06        2018-05-19     
#>  2                    1        199 1996-10-11        2014-04-23     
#>  3                    1        108 2008-08-26        2013-05-19     
#>  4                    1        174 2014-09-13        2018-06-09     
#>  5                    1         15 1998-07-28        2010-06-14     
#>  6                    1         32 2007-10-29        2011-08-04     
#>  7                    1        158 2020-09-03        2020-10-10     
#>  8                    1         50 2021-03-13        2022-09-02     
#>  9                    1        139 1988-03-24        2001-02-11     
#> 10                    1        150 1980-07-07        1995-03-30     
#> # ℹ more rows

cohortCount(cdm$asthma_3)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1             94              94

attrition(cdm$asthma_3)
#> # A tibble: 1 × 7
#>   cohort_definition_id number_records number_subjects reason_id reason          
#>                  <int>          <int>           <int>     <int> <chr>           
#> 1                    1             94              94         1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
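The check behind requiredObservation can be sketched in base R as follows. The helper has_required_observation() and the dates are hypothetical. Whether the package counts the boundary days exactly this way is an assumption.

```r
# Hypothetical helper: TRUE when an index date has enough observation
# time before and after it (days counted as simple date differences)
has_required_observation <- function(index, obs_start, obs_end,
                                     requiredObservation = c(0, 0)) {
  prior_days <- as.numeric(index - obs_start)
  post_days <- as.numeric(obs_end - index)
  prior_days >= requiredObservation[1] & post_days >= requiredObservation[2]
}

# One observation period and three candidate index dates (made-up data)
obs_start <- as.Date("2020-01-01")
obs_end <- as.Date("2020-06-05")
index <- as.Date(c("2020-01-05", "2020-03-01", "2020-06-01"))

keep_default <- has_required_observation(index, obs_start, obs_end)
keep_10_10 <- has_required_observation(index, obs_start, obs_end, c(10, 10))
print(keep_10_10)
```

With c(10, 10), the first date fails on prior observation and the last one on post observation, which is the kind of exclusion that lowers the record count.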

Use generateDrugUtilisationCohortSet() to generate a cohort

Now let’s try the function generateDrugUtilisationCohortSet() to get the drug cohort for the ingredient simvastatin. See an example below:

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_1",
  conceptSet = conceptSet_ingredient
)
cdm$simvastin_1
#> # Source:   table<main.simvastin_1> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1          1 2021-08-10        2021-11-22     
#>  2                    1         78 1969-01-22        1970-08-13     
#>  3                    1         10 2009-05-07        2014-05-19     
#>  4                    1         18 2016-12-18        2018-08-13     
#>  5                    1         49 2021-08-25        2022-01-07     
#>  6                    1         93 1987-02-01        1992-05-21     
#>  7                    1        131 2022-05-26        2022-08-04     
#>  8                    1        111 2016-09-17        2016-10-21     
#>  9                    1         25 1992-06-19        2009-01-18     
#> 10                    1        170 2017-12-15        2018-01-25     
#> # ℹ more rows

cohortCount(cdm$simvastin_1)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1            109              99

attrition(cdm$simvastin_1)
#> # A tibble: 1 × 7
#>   cohort_definition_id number_records number_subjects reason_id reason          
#>                  <int>          <int>           <int>     <int> <chr>           
#> 1                    1            109              99         1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>

imputeDuration and durationRange parameters

The parameter durationRange specifies the range within which the duration must fall, where duration will be calculated as:

duration = cohort_end_date - cohort_start_date + 1

The default value is c(1, Inf). This parameter must be a numeric vector of length two, with no NAs, and the first value must be less than or equal to the second. Duration values outside durationRange will be imputed using imputeDuration, which can be set to none (default), median, mean, mode or an integer (count).

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_2",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf) # default is c(1, Inf)
)

attrition(cdm$simvastin_2)
#> # A tibble: 1 × 7
#>   cohort_definition_id number_records number_subjects reason_id reason          
#>                  <int>          <int>           <int>     <int> <chr>           
#> 1                    1            109              99         1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
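As a worked example of the duration formula and the range check, the base R sketch below uses three invented exposures. The exposures data frame is hypothetical, and the code only flags out-of-range durations. It does not reproduce the package's imputation.

```r
# Three made-up exposures; the last one has an end date before its start
exposures <- data.frame(
  cohort_start_date = as.Date(c("2020-01-01", "2020-03-01", "2020-06-10")),
  cohort_end_date = as.Date(c("2020-01-30", "2020-03-01", "2020-06-05"))
)

# duration = cohort_end_date - cohort_start_date + 1
exposures$duration <- as.integer(
  exposures$cohort_end_date - exposures$cohort_start_date
) + 1L

# Flag durations that fall inside the allowed range
durationRange <- c(1, Inf)
exposures$in_range <- exposures$duration >= durationRange[1] &
  exposures$duration <= durationRange[2]

print(exposures)
```

A same-day exposure has a duration of 1 and stays in range under the default, while the third row would be a candidate for imputation.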

gapEra parameter

The gapEra parameter defines the maximum number of days between two consecutive drug exposures for them to be considered part of the same era. Now let’s change it from 0 to a larger number to see what happens.

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_3",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30 # default is 0
)

attrition(cdm$simvastin_3) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 2 × 4
#>   number_records reason                       excluded_records excluded_subjects
#>            <int> <chr>                                   <int>             <int>
#> 1            109 Initial qualifying events                   0                 0
#> 2            107 join exposures separated by…                2                 0

From the simvastin_3 cohort attrition, we can see that joining eras results in fewer records than in the simvastin_2 cohort, because exposures separated by gaps of less than 30 days are joined.
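The idea of joining exposures into eras can be sketched in base R for a single person. The function collapse_eras() is hypothetical and not part of DrugUtilisation. It assumes that the gap is the number of exposure-free days between two exposures and that exposures are joined when that gap is at most gapEra. The package may treat the boundary differently.

```r
# Join the exposures of one person into eras (illustrative convention only)
collapse_eras <- function(start, end, gapEra = 0) {
  stopifnot(length(start) >= 1, length(start) == length(end))
  ord <- order(start)
  start <- as.numeric(start[ord])
  end <- as.numeric(end[ord])
  # Latest end date seen so far, so nested exposures do not split an era
  run_end <- cummax(end)
  n <- length(start)
  # Exposure-free days between each exposure and everything before it
  gap <- start[-1] - run_end[-n] - 1
  # A new era starts whenever the gap exceeds gapEra
  era_id <- cumsum(c(TRUE, gap > gapEra))
  data.frame(
    era_start = as.Date(as.numeric(tapply(start, era_id, min)),
                        origin = "1970-01-01"),
    era_end = as.Date(as.numeric(tapply(end, era_id, max)),
                      origin = "1970-01-01")
  )
}

# Made-up exposures with gaps of 10 and 82 exposure-free days
start <- as.Date(c("2020-01-01", "2020-02-10", "2020-06-01"))
end <- as.Date(c("2020-01-30", "2020-03-10", "2020-06-30"))

eras_gap0 <- collapse_eras(start, end, gapEra = 0)
eras_gap30 <- collapse_eras(start, end, gapEra = 30)
print(eras_gap30)
```

With gapEra = 0 the three exposures stay separate. With gapEra = 30 the first two merge into one era, so the record count drops by one.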

priorUseWashout parameter

The priorUseWashout parameter specifies the number of prior days without exposure (often termed ‘washout’) that are required. By default, it is set to NULL, meaning no washout period is required. Increasing this value can reduce the number of records, although in the example below no records are excluded.

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_4",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30
)

attrition(cdm$simvastin_4) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 3 × 4
#>   number_records reason                       excluded_records excluded_subjects
#>            <int> <chr>                                   <int>             <int>
#> 1            109 Initial qualifying events                   0                 0
#> 2            107 join exposures separated by…                2                 0
#> 3            107 require prior use washout o…                0                 0
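A washout requirement can be sketched in base R for the eras of a single person. The helper apply_washout() and the eras are hypothetical. The sketch assumes the washout is measured as exposure-free days since the end of the previous era, which may differ from the package's exact rule.

```r
# Hypothetical helper: keep eras preceded by enough exposure-free days
apply_washout <- function(eras, priorUseWashout) {
  eras <- eras[order(eras$era_start), , drop = FALSE]
  # End date of the previous era (NA for the first era)
  previous_end <- c(as.Date(NA), head(eras$era_end, -1))
  days_unexposed <- as.numeric(eras$era_start - previous_end) - 1
  # The first era has no prior exposure, so it always qualifies
  keep <- is.na(days_unexposed) | days_unexposed >= priorUseWashout
  eras[keep, , drop = FALSE]
}

# Made-up eras with 82 and 14 exposure-free days before eras 2 and 3
eras <- data.frame(
  era_start = as.Date(c("2020-01-01", "2020-06-01", "2020-07-15")),
  era_end = as.Date(c("2020-03-10", "2020-06-30", "2020-08-01"))
)

washout_30 <- apply_washout(eras, 30)
washout_inf <- apply_washout(eras, Inf)
print(washout_30)
```

Under these assumptions, a washout of 30 days drops only the third era, and an infinite washout keeps only the first era.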

priorObservation parameter

The parameter priorObservation defines the minimum number of days of prior observation necessary for drug eras to be taken into account. If set to NULL, the drug eras are not required to fall within the observation_period.

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_5",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30
)

attrition(cdm$simvastin_5) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 4 × 4
#>   number_records reason                       excluded_records excluded_subjects
#>            <int> <chr>                                   <int>             <int>
#> 1            109 Initial qualifying events                   0                 0
#> 2            107 join exposures separated by…                2                 0
#> 3            107 require prior use washout o…                0                 0
#> 4             99 require at least 30 prior o…                8                 8

cohortDateRange parameter

The cohortDateRange parameter defines the range for the cohort_start_date and cohort_end_date.

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_6",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30,
  cohortDateRange = as.Date(c("2010-01-01", "2011-01-01"))
)

attrition(cdm$simvastin_6) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 6 × 4
#>   number_records reason                       excluded_records excluded_subjects
#>            <int> <chr>                                   <int>             <int>
#> 1            109 Initial qualifying events                   0                 0
#> 2            107 join exposures separated by…                2                 0
#> 3            107 require prior use washout o…                0                 0
#> 4             99 require at least 30 prior o…                8                 8
#> 5             66 restrict cohort_start_date …               33                30
#> 6             13 restrict cohort_end_date on…               53                48

limit parameter

The limit parameter accepts two options: all (default) and first. If we set it to first, we will only obtain the first record that fulfills all the criteria. Compare the attrition of simvastin_7 with that of the simvastin_6 cohort: a final step restricting to the first record is added. In this example it excludes no records (13 remain), since no subject has more than one record left after the earlier restrictions.

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_7",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30,
  cohortDateRange = as.Date(c("2010-01-01", "2011-01-01")),
  limit = "First"
)

attrition(cdm$simvastin_7) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#>   number_records reason                       excluded_records excluded_subjects
#>            <int> <chr>                                   <int>             <int>
#> 1            109 Initial qualifying events                   0                 0
#> 2            107 join exposures separated by…                2                 0
#> 3            107 require prior use washout o…                0                 0
#> 4             99 require at least 30 prior o…                8                 8
#> 5             66 restrict cohort_start_date …               33                30
#> 6             13 restrict cohort_end_date on…               53                48
#> 7             13 restric to first record                     0                 0
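Restricting to the first record can be sketched in base R as keeping the earliest row per subject. The toy_cohort data frame is invented for illustration, and this is not the package's own code.

```r
# Made-up cohort where each subject has two records
toy_cohort <- data.frame(
  subject_id = c(2L, 1L, 1L, 2L),
  cohort_start_date = as.Date(
    c("2010-05-01", "2010-03-01", "2010-01-15", "2010-02-01")
  )
)

# Sort by subject and start date, then keep the first row of each subject
ordered_cohort <- toy_cohort[
  order(toy_cohort$subject_id, toy_cohort$cohort_start_date),
]
first_records <- ordered_cohort[!duplicated(ordered_cohort$subject_id), ]

print(first_records)
```

When every subject already has a single record, as in the attrition above, this step leaves the cohort unchanged.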

If we just want to get the first-ever era, we can also use this parameter, here combined with priorUseWashout = Inf. To achieve that, try the following setting:

cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_8",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 0,
  priorUseWashout = Inf,
  priorObservation = 0,
  cohortDateRange = as.Date(c(NA, NA)),
  limit = "First"
)

Constructing concept sets and generating various cohorts are the initial steps in conducting a drug utilisation study. For further guidance on obtaining more information from these cohorts, such as their characteristics, please refer to the other vignettes.

Use DrugUtilisation to create a cohort

Marti Catala, Mike Du, Yuchen Guo, Kim Lopez-Guell, Edward Burn, Xintong Li

2024-06-03
