2 Introduction

In this vignette, we explore the OmopSketch functions that report how often specific concepts occur in the data. Two functions cover this: summariseConceptCounts() and plotConceptCounts(). The former creates a summarised result with the counts for each concept, and the latter visualises those counts in a plot.

2.1 Create a mock cdm

Let’s see these functions in action. To start, we load the required packages and create a mock cdm using the Eunomia database.

library(dplyr)
library(CDMConnector)
library(DBI)
library(duckdb)
library(OmopSketch)
library(CodelistGenerator)

# Connect to Eunomia database
con <- DBI::dbConnect(duckdb::duckdb(), CDMConnector::eunomia_dir())
cdm <- CDMConnector::cdmFromCon(
  con = con, cdmSchema = "main", writeSchema = "main"
)

cdm 
#> 
#> ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise concept counts

First, let’s generate lists of codes for the concepts acetaminophen and sinusitis using the CodelistGenerator package.

acetaminophen <- getCandidateCodes(
  cdm = cdm,
  keywords = "acetaminophen",
  domains = "Drug",
  includeDescendants = TRUE
) |>
  dplyr::pull("concept_id")
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 7 candidate concepts identified
#> 
#> Time taken: 0 minutes and 0 seconds

sinusitis <- getCandidateCodes(
  cdm = cdm,
  keywords = "sinusitis",
  domains = "Condition",
  includeDescendants = TRUE
) |>
  dplyr::pull("concept_id")
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 4 candidate concepts identified
#> 
#> Time taken: 0 minutes and 0 seconds

Now we want to explore the occurrence of these concepts within the database. For that, we can use summariseConceptCounts() from OmopSketch:

summariseConceptCounts(cdm,
                       conceptId = list("acetaminophen" = acetaminophen,                          
                                        "sinusitis" = sinusitis)) |>   
  select(group_level, variable_name, variable_level, estimate_name, estimate_value) |>   
  glimpse() 
#> ℹ Getting use of codes from acetaminophen
#> ℹ Getting use of codes from sinusitis
#> Rows: 24
#> Columns: 5
#> $ group_level    <chr> "acetaminophen", "acetaminophen", "acetaminophen", "ace…
#> $ variable_name  <chr> "overall", "Acetaminophen 325 MG Oral Tablet", "Acetami…
#> $ variable_level <chr> NA, "1127433", "1127078", "40229134", "40231925", "1913…
#> $ estimate_name  <chr> "record_count", "record_count", "record_count", "record…
#> $ estimate_value <chr> "14205", "9365", "2158", "1993", "306", "71", "312", "2…
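
Note that estimate_value is stored as a character column in the output above, so it needs converting before sorting or doing arithmetic. The following is a minimal base R sketch of that step. The data frame and its values are hypothetical and only mimic the shape of the selected columns; they are not taken from Eunomia.

```r
# Hypothetical rows shaped like the selected columns above; values are made up.
counts <- data.frame(
  group_level    = c("drug_a", "drug_a", "drug_a"),
  variable_name  = c("overall", "Concept X", "Concept Y"),
  estimate_name  = "record_count",
  estimate_value = c("1200", "950", "250"),
  stringsAsFactors = FALSE
)

# Sorting the character column orders the values as text, not as numbers.
sort(counts$estimate_value)

# Convert to integer before ordering or doing arithmetic.
counts$estimate_value <- as.integer(counts$estimate_value)

# Share of the overall record count contributed by each concept.
overall <- counts$estimate_value[counts$variable_name == "overall"]
concept_rows <- counts[counts$variable_name != "overall", ]
share <- concept_rows$estimate_value / overall
```

The same conversion could be done with dplyr::mutate() inside the pipe; base R is used here only to keep the sketch free of dependencies.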

By default, the function reports both the number of records (estimate_name == "record_count") and the number of people (estimate_name == "person_count") for each concept_id, which is equivalent to setting countBy = c("record", "person"):

summariseConceptCounts(cdm, 
                       conceptId = list("acetaminophen" = acetaminophen, 
                                        "sinusitis" = sinusitis), 
                       countBy = c("record","person")) |>
  select(group_level, variable_name, estimate_name) |>
  distinct() |>
  arrange(group_level, variable_name)
#> ℹ Getting use of codes from acetaminophen
#> ℹ Getting use of codes from sinusitis
#> # A tibble: 24 × 3
#>    group_level   variable_name                                     estimate_name
#>    <chr>         <chr>                                             <chr>        
#>  1 acetaminophen Acetaminophen 160 MG Oral Tablet                  record_count 
#>  2 acetaminophen Acetaminophen 160 MG Oral Tablet                  person_count 
#>  3 acetaminophen Acetaminophen 21.7 MG/ML / Dextromethorphan Hydr… record_count 
#>  4 acetaminophen Acetaminophen 21.7 MG/ML / Dextromethorphan Hydr… person_count 
#>  5 acetaminophen Acetaminophen 325 MG / Hydrocodone Bitartrate 7.… record_count 
#>  6 acetaminophen Acetaminophen 325 MG / Hydrocodone Bitartrate 7.… person_count 
#>  7 acetaminophen Acetaminophen 325 MG / Oxycodone Hydrochloride 5… record_count 
#>  8 acetaminophen Acetaminophen 325 MG / Oxycodone Hydrochloride 5… person_count 
#>  9 acetaminophen Acetaminophen 325 MG Oral Tablet                  record_count 
#> 10 acetaminophen Acetaminophen 325 MG Oral Tablet                  person_count 
#> # ℹ 14 more rows
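
To make the difference between the two estimates concrete, here is a small base R sketch on a hypothetical records table (the person and concept ids are made up). It counts rows per concept for the record count and distinct people per concept for the person count. It illustrates the idea only and is not how OmopSketch computes the counts internally.

```r
# Hypothetical records: one row per record, values made up.
records <- data.frame(
  person_id  = c(1, 1, 2, 3, 3, 3),
  concept_id = c(10, 10, 10, 20, 20, 10)
)

# record_count: number of rows per concept.
record_count <- table(records$concept_id)

# person_count: number of distinct people per concept.
person_count <- tapply(
  records$person_id,
  records$concept_id,
  function(x) length(unique(x))
)
```

A person with several records of the same concept adds to the record count each time but to the person count only once, so the person count can never exceed the record count.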

However, we can specify which one is of interest using the countBy argument:

summariseConceptCounts(cdm, 
                       conceptId = list("acetaminophen" = acetaminophen,
                                        "sinusitis" = sinusitis),
                       countBy = "record") |>
  select(group_level, variable_name, estimate_name) |>
  distinct() |>
  arrange(group_level, variable_name) 
#> ℹ Getting use of codes from acetaminophen
#> ℹ Getting use of codes from sinusitis
#> # A tibble: 12 × 3
#>    group_level   variable_name                                     estimate_name
#>    <chr>         <chr>                                             <chr>        
#>  1 acetaminophen Acetaminophen 160 MG Oral Tablet                  record_count 
#>  2 acetaminophen Acetaminophen 21.7 MG/ML / Dextromethorphan Hydr… record_count 
#>  3 acetaminophen Acetaminophen 325 MG / Hydrocodone Bitartrate 7.… record_count 
#>  4 acetaminophen Acetaminophen 325 MG / Oxycodone Hydrochloride 5… record_count 
#>  5 acetaminophen Acetaminophen 325 MG Oral Tablet                  record_count 
#>  6 acetaminophen Acetaminophen 750 MG / Hydrocodone Bitartrate 7.… record_count 
#>  7 acetaminophen overall                                           record_count 
#>  8 sinusitis     Acute bacterial sinusitis                         record_count 
#>  9 sinusitis     Chronic sinusitis                                 record_count 
#> 10 sinusitis     Sinusitis                                         record_count 
#> 11 sinusitis     Viral sinusitis                                   record_count 
#> 12 sinusitis     overall                                           record_count

We can further stratify the counts by year, sex, or age group using the year, sex, and ageGroup arguments:

summariseConceptCounts(cdm,
                       conceptId = list("acetaminophen" = acetaminophen,
                                        "sinusitis" = sinusitis),
                       countBy = "person",
                       year = TRUE,
                       sex = TRUE,
                       ageGroup = list("<=50" = c(0,50), ">50" = c(51,Inf))) |>
  select(group_level, strata_level, variable_name, estimate_name) |>
  glimpse()
#> ℹ Getting use of codes from acetaminophen
#> ℹ Getting use of codes from sinusitis
#> Rows: 1,173
#> Columns: 4
#> $ group_level   <chr> "acetaminophen", "acetaminophen", "acetaminophen", "acet…
#> $ strata_level  <chr> "overall", "overall", "overall", "overall", "overall", "…
#> $ variable_name <chr> "overall", "Acetaminophen 325 MG Oral Tablet", "Acetamin…
#> $ estimate_name <chr> "person_count", "person_count", "person_count", "person_…
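
The ageGroup argument used above is a named list of lower and upper bounds. The base R sketch below shows how such a list could map individual ages to group labels. The helper assign_age_group() is hypothetical and not part of OmopSketch, and treating both bounds as inclusive is an assumption of this sketch.

```r
age_group <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Hypothetical helper: assign each age to the first group whose bounds
# contain it. Bounds are treated as inclusive (an assumption of this sketch).
assign_age_group <- function(age, groups) {
  vapply(age, function(a) {
    inside <- vapply(groups, function(b) a >= b[1] && a <= b[2], logical(1))
    hit <- names(groups)[inside]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

assign_age_group(c(0, 50, 51, 90), age_group)
```

Under this reading, the two groups do not overlap and together cover every integer age from 0 upwards.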

3.1 Visualise the results

Finally, we can visualise the concept counts using plotConceptCounts().

summariseConceptCounts(cdm, 
                       conceptId = list("sinusitis" = sinusitis), 
                       countBy = "person") |> 
  plotConceptCounts()
#> ℹ Getting use of codes from sinusitis
#> ! The following column type were changed:
#> • variable_name: from integer to character

Note that only one of person counts or record counts can be plotted at a time. If both are included in the summarised result, filter it to a single estimate before plotting:

summariseConceptCounts(cdm, 
                       conceptId = list("sinusitis" = sinusitis),
                       countBy = c("person","record")) |>
  filter(estimate_name == "person_count") |>
  plotConceptCounts()
#> ℹ Getting use of codes from sinusitis
#> ! The following column type were changed:
#> • variable_name: from integer to character

Additionally, if the results were stratified by year, sex, or age group, we can use the facet and colour arguments to distinguish the strata in the plot. To identify which variables are available for colouring or faceting, we can use tidyColumns() from the visOmopResults package.

summariseConceptCounts(cdm, 
                       conceptId = list("sinusitis" = sinusitis),
                       countBy = c("person"),
                       sex = TRUE, 
                       ageGroup = list("<=50" = c(0,50), ">50" = c(51, Inf))) |>
  visOmopResults::tidyColumns()
#> ℹ Getting use of codes from sinusitis
#>  [1] "cdm_name"            "codelist_name"       "sex"                
#>  [4] "age_group"           "variable_name"       "variable_level"     
#>  [7] "person_count"        "source_concept_name" "source_concept_id"  
#> [10] "domain_id"           "result_type"         "package_name"       
#> [13] "package_version"

summariseConceptCounts(cdm, 
                       conceptId = list("sinusitis" = sinusitis),
                       countBy = c("person"),
                       sex = TRUE, 
                       ageGroup = list("<=50" = c(0,50), ">50" = c(51, Inf)))|>
  plotConceptCounts(facet = "sex", colour = "age_group")
#> ℹ Getting use of codes from sinusitis
#> ! The following column type were changed:
#> • variable_name: from integer to character