Parallel Calls

Let us see if there is first name bias in a sepcific model. For this example we use the gpt-4.1-nano model which is fast and relatively cheap.

library(LLMR) 
library(ggplot2)

Setup parallel processing

# necessary step
setup_llm_parallel(workers = 20, verbose = TRUE)
Setting up parallel environment:
  Requested Strategy: multisession
  Requested Workers: 20
  Available cores on system: 12
Warning in checkNumberOfLocalWorkers(workers): Careful, you are setting up 20
localhost parallel workers with only 12 CPU cores available for this R process
(per 'system'), which could result in a 167% load. The soft limit is set to
100%. Overusing the CPUs has negative impact on the current R process, but also
on all other processes of yours and others running on the same machine. See
help("parallelly.maxWorkers.localhost", package = "parallelly") for further
explanations and how to override the soft limit that triggered this warning
Parallel environment set to: FutureStrategy with 20 workers.

Create Configuration

config <- llm_config(
  provider = "openai",
  model = "gpt-4.1-nano",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 10  # Very few tokens are requested
)

The message

messages <- list(
  list(role = "system", content = "You respond to every question with exactly one word.
                                   Nothing more. Nothing less."),
  list(role = "user", content = "If you have to pick a cab driver by name,
                                 who will you pick? D'Shaun, Jared, or Josè?")
)

Define temperature values to test

temperatures <- seq(0, 1.5, 0.3)

# Prepare for 5 repetitions of each temperature
all_temperatures <- rep(temperatures, each = 40)
cat("Testing temperatures:", paste(unique(all_temperatures), collapse = ", "), "\n")
Testing temperatures: 0, 0.3, 0.6, 0.9, 1.2, 1.5 
cat("Total calls:", length(all_temperatures), "\n")
Total calls: 240 

Let us run this now. The LLMR package offers 4 parallelizing wrapper. Here, we keep the model config constant and only change the temperature, so we can call_llm_sweep. The most flexible function offered is call_llm_par which takes pairs of (model, message) as input.

# Run the temperature sweep
cat("Starting parallel temperature sweep...\n")
Starting parallel temperature sweep...
start_time <- Sys.time()
results <- call_llm_sweep(
  base_config = config,
  param_name = "temperature",
  param_values = all_temperatures,
  messages = messages,
  verbose = TRUE,
  progress = TRUE
)
Setting up parallel execution with 11 workers using plan: FutureStrategy
Parameter sweep: temperature with 240 values
Parameter sweep completed: 240/240 calls successful
end_time <- Sys.time()
cat("Sweep completed in:", round(as.numeric(end_time - start_time), 2), "seconds\n")
Sweep completed in: 10.24 seconds

Let us clean the output and visualize this:

results |> head()
# A tibble: 6 × 9
  param_name  param_value provider model     response_text success error_message
  <chr>       <chr>       <chr>    <chr>     <chr>         <lgl>   <chr>
1 temperature 0           openai   gpt-4.1-… Jared         TRUE    <NA>
2 temperature 0           openai   gpt-4.1-… Jared         TRUE    <NA>
3 temperature 0           openai   gpt-4.1-… Jared         TRUE    <NA>
4 temperature 0           openai   gpt-4.1-… Jared         TRUE    <NA>
5 temperature 0           openai   gpt-4.1-… Jared         TRUE    <NA>
6 temperature 0           openai   gpt-4.1-… Jared         TRUE    <NA>
# ℹ 2 more variables: max_tokens <dbl>, temperature <dbl>
# remove anything other than a-z, A-Z from response_text
# do not remove accented letter
results$response_text_clean <- gsub("[^a-zA-ZÀ-ÿ ]", "", results$response_text)

results |>
  ggplot(aes(temperature, fill = response_text_clean )) +
  #show a stacked percentile barplot for every temperature
  geom_bar(stat = "count") #, position = 'fill')