In this vignette we will introduce different ways to create a drug utilisation cohort using DrugUtilisation.
To create a cdm_reference object you will need to use the package CDMConnector. For this vignette we will use mock data contained in the DrugUtilisation package.
library(DrugUtilisation)
library(CodelistGenerator)
library(dplyr)
library(CDMConnector)
cdm <- mockDrugUtilisation(numberIndividual = 200)
The first thing that we need is a concept list. The concept list can be obtained in different ways:
To create a cohort with a concept list from a .json file, use codesFromConceptSet() from the CodelistGenerator package. Let’s see an example:
conceptSet_json <- codesFromConceptSet(here::here("inst/Concept"), cdm)
conceptSet_json
#> $asthma
#> [1] 317009
The concept list can also be created manually:
# get concept using code directly
conceptSet_code <- list(asthma = 317009)
conceptSet_code
#> $asthma
#> [1] 317009
If there is a certain ingredient of interest, the codes can also be obtained with getDrugIngredientCodes() from the CodelistGenerator package.
# get concept by ingredient
conceptSet_ingredient <- getDrugIngredientCodes(cdm, name = "simvastatin")
conceptSet_ingredient
#> $simvastatin
#> [1] 1539403 1539462 1539463
We can also obtain the codes for an ATC group by using getATCCodes() from the CodelistGenerator package.
# get concept from ATC codes
conceptSet_ATC <- getATCCodes(cdm,
  level = "ATC 1st",
  name = "ALIMENTARY TRACT AND METABOLISM"
)
conceptSet_ATC
#> $alimentary_tract_and_metabolism
#> [1] 35897399
Once we have the conceptSet, we can proceed to generate a cohort. There are two functions in this package to do that:
generateConceptCohortSet(): to generate a cohort for a certain list of concepts (they do not have to be drugs). This function is exported from the CDMConnector package.
generateDrugUtilisationCohortSet(): to generate a cohort of drug use.
Let’s try to use generateConceptCohortSet() to get the asthma cohort using the conceptSet_code created before. We could also use conceptSet_json, which contains the same asthma concept, to obtain the same result.
cdm <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_1",
  overwrite = TRUE
)
cdm$asthma_1
#> # Source: table<main.asthma_1> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 78 1969-04-05 1971-07-21
#> 2 1 128 2021-07-27 2021-08-05
#> 3 1 174 2014-09-13 2018-06-09
#> 4 1 123 1986-09-05 1990-02-02
#> 5 1 47 1994-08-15 2002-04-02
#> 6 1 137 2008-06-22 2019-11-21
#> 7 1 181 2013-07-28 2016-01-02
#> 8 1 152 2020-09-17 2021-07-21
#> 9 1 169 1996-01-01 2015-08-07
#> 10 1 33 2017-12-30 2019-01-28
#> # ℹ more rows
The count of the cohort can be assessed using cohortCount() from CDMConnector.
cohortCount(cdm$asthma_1)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 100 100
Cohort attrition can be assessed using attrition() from CDMConnector.
attrition(cdm$asthma_1)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 100 100 1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
You can use the end parameter to set how the cohort end date will be defined. By default, end = "observation_period_end_date", but it can also be set to "event_end_date" or to a numeric scalar (a fixed number of days after the event start date). See an example below:
cdm <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_2",
  end = "event_end_date",
  overwrite = TRUE
)
cdm$asthma_2
#> # Source: table<main.asthma_2> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 69 2020-09-02 2020-11-07
#> 2 1 99 1989-03-11 1989-11-22
#> 3 1 6 2021-11-24 2022-01-17
#> 4 1 117 1991-11-16 1993-05-08
#> 5 1 85 1998-12-22 2005-06-03
#> 6 1 96 2019-08-18 2020-06-03
#> 7 1 74 2019-04-13 2019-10-20
#> 8 1 124 2005-10-19 2007-01-29
#> 9 1 143 2017-12-15 2018-01-08
#> 10 1 95 2007-11-28 2008-10-24
#> # ℹ more rows
The requiredObservation parameter is a numeric vector of length 2 that defines the number of days of observation time required before and after the index date for an event to be included in the cohort. The default value is c(0, 0). Let’s see how changing this parameter affects the cohort: asthma_3 below has 94 records, compared with 100 in asthma_1, which uses the same end setting.
cdm <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_3",
  end = "observation_period_end_date",
  requiredObservation = c(10, 10),
  overwrite = TRUE
)
cdm$asthma_3
#> # Source: table<main.asthma_3> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 191 2018-02-06 2018-05-19
#> 2 1 199 1996-10-11 2014-04-23
#> 3 1 108 2008-08-26 2013-05-19
#> 4 1 174 2014-09-13 2018-06-09
#> 5 1 15 1998-07-28 2010-06-14
#> 6 1 32 2007-10-29 2011-08-04
#> 7 1 158 2020-09-03 2020-10-10
#> 8 1 50 2021-03-13 2022-09-02
#> 9 1 139 1988-03-24 2001-02-11
#> 10 1 150 1980-07-07 1995-03-30
#> # ℹ more rows
cohortCount(cdm$asthma_3)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 94 94
attrition(cdm$asthma_3)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 94 94 1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
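The idea behind requiredObservation can be illustrated with a small base R sketch. It does not call CDMConnector; the data frame events, its column names, and the dates are hypothetical, and the inclusive comparison is an assumption rather than the package's documented behaviour.

```r
# Hypothetical events with the observation period they fall in
events <- data.frame(
  index_date = as.Date(c("2020-01-05", "2020-06-01")),
  obs_start  = as.Date(c("2020-01-01", "2019-01-01")),
  obs_end    = as.Date(c("2020-12-31", "2020-12-31"))
)
required <- c(10, 10) # days required before and after the index date

# Observation time available on each side of the index date
prior_days  <- as.integer(events$index_date - events$obs_start)
future_days <- as.integer(events$obs_end - events$index_date)

# Assumption: an event qualifies when both requirements are met (inclusive)
events$qualifies <- prior_days >= required[1] & future_days >= required[2]
events
```

In this sketch the first event has only 4 days of prior observation, so it would not qualify under c(10, 10).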
Now let’s try the function generateDrugUtilisationCohortSet() to get the drug cohort for the ingredient simvastatin. See an example below:
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_1",
  conceptSet = conceptSet_ingredient
)
cdm$simvastin_1
#> # Source: table<main.simvastin_1> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 1 2021-08-10 2021-11-22
#> 2 1 78 1969-01-22 1970-08-13
#> 3 1 10 2009-05-07 2014-05-19
#> 4 1 18 2016-12-18 2018-08-13
#> 5 1 49 2021-08-25 2022-01-07
#> 6 1 93 1987-02-01 1992-05-21
#> 7 1 131 2022-05-26 2022-08-04
#> 8 1 111 2016-09-17 2016-10-21
#> 9 1 25 1992-06-19 2009-01-18
#> 10 1 170 2017-12-15 2018-01-25
#> # ℹ more rows
cohortCount(cdm$simvastin_1)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 109 99
attrition(cdm$simvastin_1)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 109 99 1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
The durationRange parameter specifies the range within which the duration must fall, where duration is calculated as:
duration = cohort_end_date - cohort_start_date + 1
The default value is c(1, Inf). Note that this parameter must be a numeric vector of length two, with no NAs and with the first value less than or equal to the second one. Duration values outside of durationRange will be imputed using imputeDuration. imputeDuration can be set to none (default), median, mean, mode, or an integer (count).
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_2",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf) # default is c(1, Inf)
)
attrition(cdm$simvastin_2)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 109 99 1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
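The duration formula and the range check can be illustrated with a small base R sketch. It does not call DrugUtilisation; the data frame exposures, the variable duration_range, and the dates are hypothetical and chosen only for illustration.

```r
# Hypothetical exposure records (not taken from the mock cdm)
exposures <- data.frame(
  start = as.Date(c("2020-01-01", "2020-03-10", "2020-05-01")),
  end   = as.Date(c("2020-01-30", "2020-03-10", "2020-04-28"))
)

# duration = end - start + 1, so a same-day record lasts 1 day
exposures$duration <- as.integer(exposures$end - exposures$start) + 1L

# Flag durations that fall inside a range such as the default c(1, Inf)
duration_range <- c(1, Inf)
exposures$in_range <- exposures$duration >= duration_range[1] &
  exposures$duration <= duration_range[2]
exposures
```

The third record ends before it starts, so its duration is negative and falls outside the range; that is the kind of value imputeDuration is meant to handle.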
The gapEra parameter defines the maximum number of days between two consecutive drug exposures for them to be considered part of the same era. Now let’s change it from 0 to a larger number to see what happens.
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_3",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30 # default is 0
)
attrition(cdm$simvastin_3) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 2 × 4
#> number_records reason excluded_records excluded_subjects
#> <int> <chr> <int> <int>
#> 1 109 Initial qualifying events 0 0
#> 2 107 join exposures separated by… 2 0
From the simvastin_3 cohort attrition, we can see that joining eras resulted in fewer records than in the simvastin_2 cohort, as exposures separated by gaps of less than 30 days are joined.
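To make the joining of exposures concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: the vectors start and end are hypothetical exposures for a single person, they are assumed to be sorted and non-nested, and how the gap is counted and whether a gap of exactly gap_era days is joined are assumptions.

```r
# Hypothetical exposures for one person, sorted by start date
start <- as.Date(c("2020-01-01", "2020-02-10", "2020-06-01"))
end   <- as.Date(c("2020-01-31", "2020-03-10", "2020-06-30"))
gap_era <- 30 # assumed inclusive threshold, in days

# Days between the end of one exposure and the start of the next
gap <- as.integer(start[-1] - end[-length(end)])

# A new era starts whenever the gap exceeds the threshold
era_id <- cumsum(c(TRUE, gap > gap_era))

# First start and last end within each era (valid for sorted, non-nested input)
eras <- data.frame(
  era_start = start[!duplicated(era_id)],
  era_end   = end[!duplicated(era_id, fromLast = TRUE)]
)
eras
```

Here the first two exposures are 10 days apart and collapse into one era, while the third starts 83 days later and forms its own era, so three records become two.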
The priorUseWashout parameter specifies the number of prior days without exposure (often termed ‘washout’) that are required. By default, it is set to NULL, meaning no washout period is necessary. Increasing this value can reduce the number of records, although in this example no records are excluded by the washout.
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_4",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30
)
attrition(cdm$simvastin_4) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 3 × 4
#> number_records reason excluded_records excluded_subjects
#> <int> <chr> <int> <int>
#> 1 109 Initial qualifying events 0 0
#> 2 107 join exposures separated by… 2 0
#> 3 107 require prior use washout o… 0 0
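A washout requirement can be sketched in base R as follows. This is only an illustration of the concept: the era dates are hypothetical, and treating the comparison as strict (more than washout days since the previous era) is an assumption, not something stated by the package.

```r
# Hypothetical eras for one person, sorted by start date
era_start <- as.Date(c("2020-01-01", "2020-03-25", "2020-09-01"))
era_end   <- as.Date(c("2020-03-10", "2020-04-30", "2020-09-30"))
washout <- 30

# Days since the previous era ended; the first era has no prior use
days_since_prior <- c(
  Inf,
  as.numeric(era_start[-1] - era_end[-length(era_end)])
)

# Assumption: an era is kept only if it follows enough exposure-free days
keep <- days_since_prior > washout
data.frame(era_start, era_end, days_since_prior, keep)
```

The second era starts 15 days after the first one ends, so it would be dropped under a 30-day washout.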
The priorObservation parameter defines the minimum number of days of prior observation necessary for drug eras to be taken into account. If set to NULL, the drug eras are not required to fall within the observation_period.
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_5",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30
)
attrition(cdm$simvastin_5) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 4 × 4
#> number_records reason excluded_records excluded_subjects
#> <int> <chr> <int> <int>
#> 1 109 Initial qualifying events 0 0
#> 2 107 join exposures separated by… 2 0
#> 3 107 require prior use washout o… 0 0
#> 4 99 require at least 30 prior o… 8 8
The cohortDateRange parameter defines the range for the cohort_start_date and cohort_end_date.
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_6",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30,
  cohortDateRange = as.Date(c("2010-01-01", "2011-01-01"))
)
attrition(cdm$simvastin_6) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 6 × 4
#> number_records reason excluded_records excluded_subjects
#> <int> <chr> <int> <int>
#> 1 109 Initial qualifying events 0 0
#> 2 107 join exposures separated by… 2 0
#> 3 107 require prior use washout o… 0 0
#> 4 99 require at least 30 prior o… 8 8
#> 5 66 restrict cohort_start_date … 33 30
#> 6 13 restrict cohort_end_date on… 53 48
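The attrition above shows records being excluded in two steps, one on cohort_start_date and one on cohort_end_date. A rough base R sketch of such a filter is given here; the data frame records and its dates are hypothetical, and the exact rule (drop a record unless it starts and ends inside the range) is an assumption based on the attrition output, not on the package documentation.

```r
# Hypothetical cohort records
records <- data.frame(
  cohort_start_date = as.Date(c("2009-11-01", "2010-03-01", "2010-10-01")),
  cohort_end_date   = as.Date(c("2010-02-01", "2010-06-30", "2011-03-15"))
)
date_range <- as.Date(c("2010-01-01", "2011-01-01"))

# Assumption: a record is kept only if it starts and ends inside the range
starts_in_range <- records$cohort_start_date >= date_range[1]
ends_in_range   <- records$cohort_end_date <= date_range[2]
kept <- records[starts_in_range & ends_in_range, ]
kept
```

Only the second record survives: the first starts before the range and the third ends after it.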
The limit input allows the options all (default) and first. If we set it to first, we will only obtain the first record that fulfills all the criteria. Observe how it impacts the attrition of simvastin_7 in comparison to the simvastin_6 cohort: a step restricting to the first record is added, although here it excludes no records, since each subject remaining after the date restrictions has a single record.
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_7",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30,
  cohortDateRange = as.Date(c("2010-01-01", "2011-01-01")),
  limit = "First"
)
attrition(cdm$simvastin_7) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <int> <chr> <int> <int>
#> 1 109 Initial qualifying events 0 0
#> 2 107 join exposures separated by… 2 0
#> 3 107 require prior use washout o… 0 0
#> 4 99 require at least 30 prior o… 8 8
#> 5 66 restrict cohort_start_date … 33 30
#> 6 13 restrict cohort_end_date on… 53 48
#> 7 13 restric to first record 0 0
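Keeping only the first record per subject can be illustrated with a short base R sketch. The data frame records is hypothetical and the code only mirrors the idea of a first-record limit; it is not how the package applies it.

```r
# Hypothetical cohort records, some subjects with more than one entry
records <- data.frame(
  subject_id = c(1, 1, 2, 3, 3),
  cohort_start_date = as.Date(c(
    "2010-05-01", "2010-02-01", "2010-03-15", "2010-08-01", "2010-09-01"
  ))
)

# Sort by subject and start date, then keep the earliest row per subject
records <- records[order(records$subject_id, records$cohort_start_date), ]
first_only <- records[!duplicated(records$subject_id), ]
first_only
```

Five records across three subjects reduce to three, one per subject, each with the earliest cohort_start_date.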
If we only want to get the first-ever era, we can also use this parameter. To achieve that, try the following settings:
cdm <- generateDrugUtilisationCohortSet(cdm,
  name = "simvastin_8",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 0,
  priorUseWashout = Inf,
  priorObservation = 0,
  cohortDateRange = as.Date(c(NA, NA)),
  limit = "First"
)
Constructing concept sets and generating various cohorts are the initial steps in conducting a drug utilisation study. For further guidance on obtaining more information, such as characteristics, from these cohorts, please refer to the other vignettes.