Type: Package
Title: Building and Managing Local Databases from 'Google Earth Engine'
Version: 1.0.2
Description: Simplifies the creation, management, and updating of local databases using data extracted from 'Google Earth Engine' ('GEE'). It integrates with 'GEE' to store, aggregate, and process spatio-temporal data, leveraging 'SQLite' for efficient, serverless storage. The 'geeLite' package provides utilities for data transformation and supports real-time monitoring and analysis of geospatial features, making it suitable for researchers and practitioners in geospatial science. For details, see Kurbucz and Andrée (2025) "Building and Managing Local Databases from Google Earth Engine with the geeLite R Package" https://hdl.handle.net/10986/43165.
License: MPL-2.0
Encoding: UTF-8
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Imports: rnaturalearthdata, rnaturalearth, googledrive, data.table, reticulate, rstudioapi, geojsonio, lubridate, jsonlite, magrittr, progress, reshape2, tidyrgee, RSQLite, stringr, crayon, dplyr, h3jsr, knitr, utils, purrr, stats, tidyr, rgee, cli, sf
Suggests: testthat (>= 3.0.0), rmarkdown, leaflet, withr
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-07-19 00:22:45 UTC; Marcell
Author: Marcell T. Kurbucz [aut, cre], Bo Pieter Johannes Andrée [aut]
Maintainer: Marcell T. Kurbucz <m.kurbucz@ucl.ac.uk>
Repository: CRAN
Date/Publication: 2025-07-21 09:51:25 UTC
Aggregate Data by Frequency
Description
Aggregates data from a wide-format data frame according to a specified frequency and applies aggregation and post-processing functions.
Usage
aggr_by_freq(
table,
freq,
prep_fun,
aggr_funs,
postp_funs,
variable_name,
preprocess_body
)
Arguments
table |
[mandatory] (data.frame) A wide-format data frame. |
freq |
[mandatory] (character) Specifies the frequency to aggregate the data. |
prep_fun |
[mandatory] (function) Function used for pre-processing. |
aggr_funs |
[mandatory] (function or list) Aggregation function(s). |
postp_funs |
[mandatory] (function or list) Post-processing function(s). |
variable_name |
[mandatory] (character) Name of the current variable. |
preprocess_body |
[mandatory] (character) Body of the pre-processing function (prep_fun), deparsed as a character string. |
Value
A data frame in wide format with aggregated values.
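As an illustrative sketch of the kind of operation this performs — not the package's internal implementation — monthly means over a wide table can be computed in base R (all names below are hypothetical):

```r
# Hypothetical wide table: one row per bin, one column per observed date
wide <- data.frame(id = c("a", "b"),
                   "2020-01-01" = c(1, 2),
                   "2020-01-15" = c(3, 4),
                   "2020-02-01" = c(5, 6),
                   check.names = FALSE)
dates  <- as.Date(names(wide)[-1])
months <- format(dates, "%Y-%m")
# Average all date columns that fall into the same month
agg <- sapply(unique(months), function(m) {
  rowMeans(wide[-1][months == m], na.rm = TRUE)
})
agg <- data.frame(id = wide$id, agg, check.names = FALSE)
```

After aggregation, agg holds one column per month ("2020-01", "2020-02") instead of one per date.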
Perform a Single Drive Export for Multiple Geometry Chunks
Description
Exports multiple geometry chunks to Google Drive in a single batch task. The function processes spatial data using Google Earth Engine (GEE) and exports results in CSV format.
Usage
batch_drive_export(
sf_list,
imgs,
stat_fun,
band,
stat,
scale,
folder = ".geelite_tmp_drive",
user = NULL,
description = "geelite_export",
verbose = FALSE
)
Arguments
sf_list |
[mandatory] (list) A list of sf data.frames representing geometry chunks to be processed. |
imgs |
[mandatory] (ee$ImageCollection) The Earth Engine image collection to extract statistics from. |
stat_fun |
[mandatory] (ee$Reducer) The reducer function to apply to extract statistics. |
band |
[mandatory] (character) The band name from the image collection (e.g., "NDVI"). |
stat |
[mandatory] (character) The statistical function name (e.g., "mean"). |
scale |
[mandatory] (numeric) The spatial resolution in meters for 'reduceRegions'. |
folder |
[optional] (character) Name of the Google Drive folder where the export will be stored. Default is ".geelite_tmp_drive". |
user |
[optional] (character) If multiple rgee user profiles exist, specify the user profile directory. |
description |
[optional] (character) A custom description for the export task. Default is "geelite_export". |
verbose |
[optional] (logical) If TRUE, displays progress messages (default: FALSE). |
Value
(data.frame) A data frame containing extracted statistics with columns id, band, zonal_stat, and date-based values.
Check Google Earth Engine Connection
Description
Returns TRUE if the user is authenticated with GEE via 'rgee', without triggering interactive prompts. Useful in non-interactive contexts such as CRAN checks. Prints a message and returns FALSE if not.
Usage
check_rgee_ready()
Value
A logical value: TRUE if authenticated with GEE, FALSE otherwise (invisibly).
Clean Contents or Entire Google Drive Folders by Name
Description
Searches for all Google Drive folders with the specified name and optionally removes their contents and/or the folders themselves. Useful for cleaning up scratch or export folders used by Earth Engine batch processes.
Usage
clean_drive_folders_by_name(
folder_name,
delete_folders = FALSE,
verbose = TRUE
)
Arguments
folder_name |
[mandatory] (character) Name of the folder(s) to search for in Google Drive. |
delete_folders |
[optional] (logical) If TRUE, deletes the matching folders themselves in addition to their contents (default: FALSE). |
verbose |
[optional] (logical) If TRUE, displays messages (default: TRUE). |
Compare Lists and Highlight Differences
Description
Compares two lists and marks new values with '+' and removed values with '-'.
Usage
compare_lists(list_1, list_2)
Arguments
list_1 |
[mandatory] (list) First list to compare. |
list_2 |
[mandatory] (list) Second list to compare. |
Value
A list showing added and removed values marked with '+' and '-'.
Compare Vectors and Highlight Differences
Description
Compares two vectors and indicates added ('+') and removed ('-') values.
Usage
compare_vectors(vector_1, vector_2)
Arguments
vector_1 |
[mandatory] (character or integer) First vector to compare. |
vector_2 |
[mandatory] (character or integer) Second vector to compare. |
Value
A vector showing added and removed values marked with '+' and '-'.
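To illustrate the marking convention described above, here is a small hypothetical re-implementation (not the package's internal code) of the '+'/'-' diffing behavior:

```r
# Mark values added in `new` with '+' and values removed from `old` with '-'
mark_diff <- function(old, new) {
  c(intersect(old, new),
    paste0("+", setdiff(new, old)),
    paste0("-", setdiff(old, new)))
}
mark_diff(c("NDVI", "EVI"), c("NDVI", "LST"))
# "NDVI" is kept, "LST" is new ("+LST"), "EVI" was removed ("-EVI")
```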
Collect and Process Grid Statistics
Description
This function retrieves and processes grid statistics from Google Earth
Engine (GEE) based on the specified session task. The collected data is
stored in SQLite format (data/geelite.db), along with supplementary files such as CLI files (cli/...), the state file (state/state.json), and the log file (log/log.txt).
Usage
compile_db(task, grid, mode, verbose)
Arguments
task |
[mandatory] (list) Session task that specifies the parameters for data collection. |
grid |
[mandatory] (sf) Simple features object containing the geometries of the regions of interest. |
mode |
[optional] (character) Mode of data extraction. Currently supports "local" and "drive". |
verbose |
[mandatory] (logical) Display messages and progress status. |
Create or Open the Database Connection
Description
Tries to connect to an SQLite database using dbConnect(). If the initial connection fails, it retries up to max_retries times, waiting wait_time seconds between each attempt. If the connection cannot be established after the maximum number of retries, the function stops with an error.
Usage
db_connect(db_path, max_retries = 3, wait_time = 5)
Arguments
db_path |
[mandatory] (character) A string specifying the file path to the SQLite database. |
max_retries |
[mandatory] (integer) The maximum number of retries if the connection fails (default: 3). |
wait_time |
[mandatory] (numeric) The number of seconds to wait between retries (default: 5). |
Value
A database connection object if the connection is successful.
Internal Dummy Function for Declared Imports
Description
Ensures CRAN recognizes packages listed in 'Imports:' that are indirectly required but not explicitly used. Never called at runtime and has no side effects.
Usage
dummy_use_for_cran()
Value
NULL (invisible)
Expand Data to Daily Frequency
Description
Expands the input data frame to a daily frequency, filling in any missing dates within the observed range.
Usage
expand_to_daily(df_long, prep_fun)
Arguments
df_long |
[mandatory] (data.frame) A long-format data frame with at least the columns date and value. |
prep_fun |
[mandatory] (function) Function used for pre-processing. |
Value
A data frame with daily dates and a preprocessed value column.
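The expansion can be sketched in base R as follows (an illustrative approximation assuming date/value columns, not the internal code):

```r
# Hypothetical long-format input with gaps between observed dates
df_long <- data.frame(date  = as.Date(c("2020-01-01", "2020-01-04")),
                      value = c(10, 40))
# Build the full daily sequence and left-join the observations onto it
all_days <- data.frame(date = seq(min(df_long$date), max(df_long$date),
                                  by = "day"))
daily <- merge(all_days, df_long, by = "date", all.x = TRUE)
# Missing days (Jan 2 and Jan 3) now appear with NA values,
# ready for a pre-processing function such as linear interpolation
```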
Extract Large-Scale Statistics in Drive Mode with Fewer Tasks
Description
Batches multiple geometry chunks into fewer ee_table_to_drive tasks, reducing overhead and leveraging Google Earth Engine's parallel processing.
Usage
extract_drive_stats(
sf_chunks,
imgs,
band,
stat,
stat_fun,
scale,
folder = ".geelite_tmp_drive",
user = NULL,
pb,
pb_step
)
Arguments
sf_chunks |
[mandatory] (list) A list of sf data frames representing geometry chunks. |
imgs |
[mandatory] (ee$ImageCollection) The Earth Engine image collection to extract statistics from. |
band |
[mandatory] (character) The band name (e.g., "NDVI"). |
stat |
[mandatory] (character) The statistical function to apply (e.g., "mean"). |
stat_fun |
[mandatory] (ee$Reducer) The Earth Engine reducer function. |
scale |
[mandatory] (numeric) The spatial resolution in meters for reduceRegions. |
folder |
[optional] (character) Name of the Google Drive folder where exports will be stored. Defaults to ".geelite_tmp_drive". |
user |
[optional] (character) GEE user profile name, if applicable. |
pb |
[mandatory] (progress_bar) A progress bar instance from the progress package. |
pb_step |
[mandatory] (numeric) The step size for updating the progress bar. |
Value
(data.frame) A merged data frame containing extracted statistics from all Drive exports.
Fetch ISO 3166-1 Country Codes
Description
Retrieves country-level regions using ISO 3166-1 alpha-2 codes.
Usage
fetch_country_regions()
Value
A data frame with country names, ISO 3166-1 codes, and admin level 0.
Fetch ISO 3166 Country and Subdivision Codes
Description
Returns a data frame containing ISO 3166-1 country codes and ISO 3166-2 subdivision codes for the specified administrative level.
Usage
fetch_regions(admin_lvl = 0)
Arguments
admin_lvl |
[optional] (integer) Administrative level to retrieve: 0 for countries (ISO 3166-1) or 1 for first-level subdivisions (ISO 3166-2). Default is 0. |
Value
A data frame containing region names, ISO 3166-2 codes, and the corresponding administrative levels.
Examples
# Example: Fetch ISO 3166-1 country codes
## Not run:
fetch_regions()
## End(Not run)
Fetch ISO 3166-2 Subdivision Codes
Description
Retrieves first-level administrative subdivisions (e.g., states, provinces) using ISO 3166-2 codes.
Usage
fetch_state_regions()
Value
A data frame with subdivision names, ISO 3166-2 codes, and admin level 1.
Fetch Variable Information from an SQLite Database
Description
Displays information on the available variables in the SQLite database (data/geelite.db).
Usage
fetch_vars(
path,
format = c("data.frame", "markdown", "latex", "html", "pipe", "simple", "rst")
)
Arguments
path |
[mandatory] (character) Path to the root directory of the generated database. |
format |
[optional] (character) Output format. Possible values are "data.frame", "markdown", "latex", "html", "pipe", "simple", and "rst" (default: "data.frame"). |
Value
Returns the variable information in the selected format. If format = "data.frame", a data.frame is returned. For other formats, the output is printed in the specified format and NULL is returned invisibly.
Examples
# Example: Printing the available variables
## Not run:
fetch_vars(path = "path/to/db")
## End(Not run)
Install and Configure a Conda Environment for 'rgee'
Description
Sets up a Conda environment with all required Python and R dependencies for using the rgee package, including a specific version of the earthengine-api. If Conda is not available, the user will be prompted to install Miniconda. The created environment is automatically registered for use with rgee.
Usage
gee_install(conda = "rgee", python_version = "3.10", force_recreate = FALSE)
Arguments
conda |
[optional] (character) Name of the Conda environment to create or use. Defaults to "rgee". |
python_version |
[optional] (character) Python version to use when creating the Conda environment. Defaults to "3.10". |
force_recreate |
[optional] (logical) If TRUE, removes and recreates the Conda environment even if it already exists (default: FALSE). |
Value
Invisibly returns the name of the Conda environment used or created.
Note
Even after installation, users must manually accept the Conda Terms of Service (ToS) using the 'conda tos accept' command before package installation can proceed. Clear instructions will be provided if ToS acceptance is needed.
Examples
# Example: Creating a Conda environment with 'rgee' dependencies
## Not run:
gee_install()
## End(Not run)
Print Google Earth Engine and Python Environment Information
Description
Prints information about the Google Earth Engine (GEE) and Python environment.
Usage
gee_message(user)
Arguments
user |
[mandatory] (character) Specifies the Google account directory for which information is displayed. |
Define Output Messages
Description
Defines output messages based on whether the database is new or updated.
Usage
gen_messages(database_new)
Arguments
database_new |
[mandatory] (logical) A logical value indicating whether the database is new. |
Value
A list of output messages.
Create Batches from an sf Object
Description
Divides an sf object (grid) into a list of chunks, either based on a specified number of batches (batch_num) or a maximum chunk size (batch_size).
Usage
get_batch(grid, batch_size = NULL, batch_num = NULL)
Arguments
grid |
[mandatory] (sf) The sf object to be split into chunks. |
batch_size |
[optional] (integer) Maximum rows per chunk. Must be set if batch_num is NULL. |
batch_num |
[optional] (integer) Total number of chunks to create. Must be set if batch_size is NULL. |
Value
(list) A list of sf objects (chunks).
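The chunking logic can be illustrated on plain row indices (a simplified sketch; the package splits sf rows analogously):

```r
# Split n row indices into chunks of at most batch_size rows
chunk_rows <- function(n, batch_size) {
  split(seq_len(n), ceiling(seq_len(n) / batch_size))
}
lengths(chunk_rows(10, 4))  # chunk sizes: 4, 4, 2
```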
Produce Batches for Build/Update Mixed Cases
Description
Divides the grid into one or two lists of chunked sf objects, depending on the data-collection case (1,2,3).
Usage
get_batches(cases, grid, batch_size)
Arguments
cases |
[mandatory] (integer) 1=All build, 2=All update, 3=Mixed. |
grid |
[mandatory] (sf) The sf object (grid) containing column 'add' to distinguish existing vs. new rows. |
batch_size |
[mandatory] (integer) Maximum number of rows per chunk. |
Value
(list) A list of two elements, b1 and b2, each a list of sf subsets (chunks). b2 might be NULL if not needed.
Get H3 Bins for Shapes
Description
Generates H3 bins for the provided shapes at the specified resolution.
Usage
get_bins(shapes, resol)
Arguments
shapes |
[mandatory] (sf) A simple features object containing geometries used for generating H3 bins. |
resol |
[mandatory] (integer) An integer specifying the resolution of the H3 grid. |
Value
A data frame containing the H3 bins with columns for region ISO 3166 codes, bin IDs, and geometry.
Determine the Cases of Data Collection Requests
Description
Determines the cases of data collection requests based on the markers of 'datasets', 'bands', and 'stats'.
Usage
get_cases(database_new, dataset_new, band_new, stats_new, regions_new)
Arguments
database_new |
[mandatory] (logical) A logical value indicating whether the database is new. |
dataset_new |
[mandatory] (logical) A logical value indicating whether the dataset is new. |
band_new |
[mandatory] (logical) A logical value indicating whether the band is new. |
stats_new |
[mandatory] (logical) A logical vector indicating which statistics are new. |
regions_new |
[mandatory] (logical) A logical vector indicating which regions are new. |
Value
An integer indicating the processing case:
- 1: All build
- 2: All update
- 3: Mixed
Print the Configuration File
Description
Reads and prints the configuration file from the database's root directory in a human-readable format.
Usage
get_config(path)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. |
Value
A character string representing the formatted JSON content of the configuration file.
Examples
# Example: Printing the configuration file
## Not run:
get_config(path = "path/to/db")
## End(Not run)
Obtain H3 Hexagonal Grid
Description
Retrieves or creates the grid for the task based on the specified regions and resolution.
Usage
get_grid(task)
Arguments
task |
[mandatory] (list) Session task that specifies the parameters for data collection. |
Value
A simple features (sf) object containing grid data.
Retrieve Images and Related Information
Description
Retrieves images and related information from Google Earth Engine (GEE) based on the specified session task.
Usage
get_images(task, mode, cases, dataset, band, regions_new, latest_date)
Arguments
task |
[mandatory] (list) Session task specifying parameters for data collection. |
mode |
[mandatory] (character) Mode of data extraction. Currently supports "local" and "drive". |
cases |
[mandatory] (integer) Type of data collection request (1 = all build, 2 = all update, 3 = mixed). |
dataset |
[mandatory] (character) Name of the GEE dataset. |
band |
[mandatory] (character) Name of the band. |
regions_new |
[mandatory] (logical) A logical vector indicating which regions are new. |
latest_date |
[mandatory] (date) The most recent data available in the related SQLite table. Set to NULL if the table does not yet exist. |
Value
List containing retrieved images and related information:
- $build: Images for the building procedure
- $update: Images for the updating procedure
- $batch_size: Batch size
- $skip_band: TRUE if 'band' is up to date and can be skipped
- $skip_update: TRUE if 'band' is up to date but cannot be skipped
Print JSON File
Description
Reads and prints a specified JSON file from the provided root directory in a human-readable format.
Usage
get_json(path, file_path)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. |
file_path |
[mandatory] (character) The relative path to the JSON file from the root directory. |
Value
A character string representing the formatted JSON content of the specified file.
Get Reducers
Description
Initializes a list of reducers for grid statistics calculation.
Usage
get_reducers()
Value
A list of available reducers.
Get Shapes for Specified Regions
Description
Retrieves the shapes of specified regions, which can be at the country or state level.
Usage
get_shapes(regions)
Arguments
regions |
[mandatory] (character) A vector containing ISO 3166-2 region codes. Country codes are two characters long, while state codes contain additional characters. |
Value
A simple features (sf) object containing the shapes of the specified regions.
Print the State File
Description
Reads and prints the state file from the database's root directory in a human-readable format.
Usage
get_state(path)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. |
Value
A character string representing the formatted JSON content of the state file.
Examples
# Example: Printing the state file
## Not run:
get_state(path = "path/to/db")
## End(Not run)
Generate Session Task
Description
Generates a session task based on the configuration and state files.
Usage
get_task()
Value
A list representing the session task.
Initialize Post-Processing Folder and Files
Description
Creates a postp folder at the specified path and adds two empty files: structure.json and functions.R.
Usage
init_postp(path, verbose = TRUE)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. |
verbose |
[optional] (logical) Display messages (default: TRUE). |
Details
The structure.json file is initialized with a default JSON structure: "default": null. This file is intended for mapping variables to post-processing functions. The functions.R file is created with a placeholder comment indicating where to define the R functions for post-processing. If the postp folder already exists, an error will be thrown to prevent overwriting existing files.
Value
No return value, called for side effects.
Examples
# Example: Initialize post-processing files in the database directory
## Not run:
init_postp("path/to/db")
## End(Not run)
Simple Linear Interpolation
Description
Replaces NA values with linear interpolation.
Usage
linear_interp(x)
Arguments
x |
[mandatory] (numeric) A numeric vector possibly containing NA values. |
Value
A numeric vector of the same length as x, with NA values replaced by linear interpolation.
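A minimal sketch of this behavior using stats::approx() (assumed to be equivalent in spirit; not necessarily the package's exact implementation):

```r
# Replace interior NAs by linear interpolation; rule = 2 carries the
# nearest observed value to any leading/trailing NAs
interp_sketch <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) < 2) return(x)
  stats::approx(obs, x[obs], xout = seq_along(x), rule = 2)$y
}
interp_sketch(c(1, NA, 3, NA, NA, 6))  # 1 2 3 4 5 6
```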
Load External Post-Processing Functions
Description
Loads post-processing functions and their configuration from an external folder named postp, located in the root directory of the database. The folder must contain two files: structure.json (which defines the post-processing configuration) and functions.R (which contains the R function definitions to be used for post-processing). The function checks for these files, loads the JSON configuration, and sources the R script. If the required files are missing, it stops execution and notifies the user with instructions on how to set up the files correctly.
Usage
load_external_postp(path)
Arguments
path |
[mandatory] (character) The path to the root directory where the database is located. |
Value
Returns a list of post-processing functions loaded from the structure.json file. The functions defined in functions.R are sourced and made available in the returned environment.
Note
The postp folder must contain two files: structure.json and functions.R. The structure.json file contains mappings of variables to the post-processing functions, while functions.R contains the actual function definitions that will be used for post-processing.
Extract Statistics Locally for a Single Geometry Chunk
Description
Computes statistical summaries for a given spatial feature (sf_chunk) from an Earth Engine ee$ImageCollection over a specified date range. This function extracts values for a specific band and applies a chosen reducer.
Usage
local_chunk_extract(sf_chunk, imgs, dates, band, stat, stat_fun, scale)
Arguments
sf_chunk |
[mandatory] (sf) An sf data frame containing geometry. |
imgs |
[mandatory] (ee$ImageCollection) The Earth Engine image collection to extract statistics from. |
dates |
[mandatory] (character) A vector of date strings corresponding to images in the collection. |
band |
[mandatory] (character) The name of the band to extract. |
stat |
[mandatory] (character) The statistical function to apply (e.g., "mean"). |
stat_fun |
[mandatory] (ee$Reducer) The Earth Engine reducer function. |
scale |
[mandatory] (numeric) The spatial resolution in meters for reduce operations. |
Value
(data.frame) A data frame containing extracted statistics with columns id, band, zonal_stat, and date-based values.
Modify Configuration File
Description
Modifies the configuration file located in the specified root directory of the generated database (config/config.json) by updating values corresponding to the specified keys.
Usage
modify_config(path, keys, new_values, verbose = TRUE)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. |
keys |
[mandatory] (list) A list specifying the path to the values in the configuration file that need updating. Each path should correspond to a specific element in the configuration. |
new_values |
[mandatory] (list) A list of new values to replace the original values at the locations specified by 'keys'. The length of new_values must match the length of keys. |
verbose |
[optional] (logical) If TRUE, displays messages (default: TRUE). |
Value
No return value, called for side effects.
Examples
# Example: Modifying the configuration file
## Not run:
modify_config(
path = "path/to/db",
keys = list("limit", c("source", "MODIS/061/MOD13A2", "NDVI")),
new_values = list(1000, "mean")
)
## End(Not run)
Output Message
Description
Outputs a message if verbose mode is TRUE.
Usage
output_message(message, verbose)
Arguments
message |
[mandatory] (list) The message to display. |
verbose |
[mandatory] (logical) A flag indicating whether to display the message. |
Display geeLite Package Version
Description
Displays the version of the geeLite package with formatted headers.
Usage
print_version(verbose)
Arguments
verbose |
[mandatory] (logical) If TRUE, prints the package version. |
Process a Single Source File
Description
Processes an individual source file by updating the file with the specified
'path' and writing the updated file to the cli/ directory of the database.
Usage
process_single_file(src_file_path, path)
Arguments
src_file_path |
[mandatory] (character) The path of the source file to process. |
path |
[mandatory] (character) The path to the root directory of the generated database. |
Process Source Files
Description
Processes multiple source files by iterating through them.
Usage
process_source_files(src_files_path, path)
Arguments
src_files_path |
[mandatory] (character) A vector of source file paths. |
path |
[mandatory] (character) The path to the root directory of the generated database. |
Process Marked Vector
Description
Generates a list categorizing items based on their marks: items to be added ('+'), items to be dropped ('-'), items to be used (unmarked or marked with '+'), and indices of '+' items within the used category.
Usage
process_vector(vector)
Arguments
vector |
[mandatory] (character) A character vector containing elements marked with '+' and '-' prefixes. |
Value
A list with the following components:
- $add: Items marked with '+'
- $drop: Items marked with '-'
- $use: Items that are unmarked or marked with '+'
- $use_add: TRUE for items marked with '+' within the $use category
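A hypothetical sketch of this categorization (illustrative, not the package's internal code):

```r
# Categorize items by their '+'/'-' marks
split_marked <- function(v) {
  keep <- !startsWith(v, "-")
  list(add     = sub("^\\+", "", v[startsWith(v, "+")]),
       drop    = sub("^-", "", v[startsWith(v, "-")]),
       use     = sub("^\\+", "", v[keep]),
       use_add = startsWith(v[keep], "+"))
}
split_marked(c("mean", "+sd", "-min"))
```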
Reading, Aggregating, and Processing the SQLite Database
Description
Reads, aggregates, and processes the SQLite database (data/geelite.db).
Usage
read_db(
path,
variables = "all",
freq = c("month", "day", "week", "bimonth", "quarter", "season", "halfyear", "year"),
prep_fun = NULL,
aggr_funs = function(x) mean(x, na.rm = TRUE),
postp_funs = NULL
)
Arguments
path |
[mandatory] (character) Path to the root directory of the generated database. |
variables |
[optional] (character or integer) Names or IDs of the variables to be read. Use "all" to read all variables (default: "all"). |
freq |
[optional] (character) The frequency for data aggregation. Options include "day", "week", "month", "bimonth", "quarter", "season", "halfyear", and "year" (default: "month"). |
prep_fun |
[optional] (function or NULL) Function used for pre-processing before aggregation (default: NULL). |
aggr_funs |
[optional] (function or list) A function or a list of functions for aggregating data to the specified frequency (default: function(x) mean(x, na.rm = TRUE)). |
postp_funs |
[optional] (function or list) A function or list of functions applied to the time series data of a single bin after aggregation. Users can directly refer to variable names or IDs. The default is NULL. |
Value
A list where the first element (grid) is a simple features (sf) object, and subsequent elements are data frame objects corresponding to the variables.
Examples
# Example: Reading variables by IDs
## Not run:
db_list <- read_db(path = "path/to/db",
variables = c(1, 3))
## End(Not run)
Read Grid from Database
Description
Reads the H3 grid from the specified SQLite database (data/geelite.db).
Usage
read_grid()
Value
A simple features (sf) object containing the grid data.
Read Variables from Database
Description
Reads the specified variables from the SQLite database.
Usage
read_variables(path, variables, freq, prep_fun, aggr_funs, postp_funs)
Arguments
path |
[mandatory] (character) Path to the root directory of the generated database. |
variables |
[mandatory] (character) A vector of variable names to read. |
freq |
[mandatory] (character) Specifies the frequency to aggregate the data. |
prep_fun |
[mandatory] (function) Function used for pre-processing. |
aggr_funs |
[mandatory] (function or list) Aggregation function(s). |
postp_funs |
[mandatory] (function or list) Post-processing function(s). |
Value
A list of variables read from the database.
Remove Tables from the Database
Description
Removes tables from the database if their corresponding dataset is initially marked for deletion ('-').
Usage
remove_tables(tables_drop)
Arguments
tables_drop |
[mandatory] (character) A character vector of tables to be deleted. |
Build and Update the Grid Statistics Database
Description
Collects and stores grid statistics from Google Earth Engine (GEE) data in SQLite format (data/geelite.db), initializes CLI files (cli/...), and initializes or updates the state (state/state.json) and log (log/log.txt) files.
Usage
run_geelite(
path,
conda = "rgee",
user = NULL,
rebuild = FALSE,
mode = "local",
verbose = TRUE
)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. This must be a writable, non-temporary directory. Avoid using the home directory (~), the current working directory, or the package directory. |
conda |
[optional] (character) Name of the virtual Conda environment used by the rgee package (default: "rgee"). |
user |
[optional] (character) Specifies the Google account directory within the rgee credentials directory; useful when multiple accounts are registered (default: NULL). |
rebuild |
[optional] (logical) If TRUE, rebuilds the database from scratch (default: FALSE). |
mode |
[optional] (character) Mode of data extraction. Currently supports "local" (default) and "drive". |
verbose |
[optional] (logical) Display computation status and messages (default: TRUE). |
Value
Invisibly returns NULL, called for side effects.
Examples
# Example: Build a Grid Statistics Database
## Not run:
run_geelite(path = tempdir())
## End(Not run)
Initialize CLI Files
Description
Creates R scripts to enable the main functions to be called through the
Command Line Interface (CLI). These scripts are stored in the cli/ directory of the generated database.
Usage
set_cli(path, verbose = TRUE)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. This must be a writable, non-temporary directory. Avoid using the home directory (~), the current working directory, or the package directory. |
verbose |
[optional] (logical) Whether to display messages (default: TRUE). |
Value
No return value, called for side effects.
Examples
## Not run:
set_cli(path = tempdir())
## End(Not run)
Initialize the Configuration File
Description
Creates a configuration file in the specified directory of the generated database (config/config.json). If the specified directory does not exist but its parent directory does, it will be created.
Usage
set_config(
path,
regions,
source,
start = "2020-01-01",
resol,
scale = NULL,
limit = 10000,
verbose = TRUE
)
Arguments
path |
[mandatory] (character) The path to the root directory of the generated database. This must be a writable, non-temporary directory. Avoid using the home directory (~), the current working directory, or the package directory. |
regions |
[mandatory] (character) ISO 3166-1 alpha-2 country codes or ISO 3166-2 subdivision codes. |
source |
[mandatory] (list) Description of Google Earth Engine (GEE) datasets of interest (the complete data catalog of GEE is accessible at: https://developers.google.com/earth-engine/datasets/catalog). It is a nested list with three levels: dataset names, band names, and the statistics to compute for each band. |
start |
[optional] (date) First date of the data collection (default: "2020-01-01"). |
resol |
[mandatory] (integer) Resolution of the H3 bin. |
scale |
[optional] (integer) Specifies the nominal resolution (in meters) for image processing. If left as NULL, the native resolution of the dataset is used. |
limit |
[optional] (integer) Upper limit on the number of features processed per extraction request (default: 10000). |
verbose |
[optional] (logical) Display messages (default: TRUE). |
Value
No return value, called for side effects.
Examples
## Not run:
set_config(path = tempdir(),
regions = c("SO", "YM"),
source = list(
"MODIS/061/MOD13A1" = list(
"NDVI" = c("mean", "sd")
)
),
resol = 3)
## End(Not run)
Set Dependencies
Description
Authenticates the Google Earth Engine (GEE) account and activates the specified Conda environment.
Usage
set_depend(conda = "rgee", user = NULL, drive = TRUE, verbose = TRUE)
Arguments
conda |
[optional] (character) Name of the virtual Conda environment used by the rgee package (default: "rgee"). |
user |
[optional] (character) Specifies the Google account directory within the rgee credentials directory (default: NULL). |
drive |
[optional] (logical) If TRUE, Google Drive access is also authenticated (default: TRUE). |
verbose |
[optional] (logical) Display messages (default: TRUE). |
Generate Necessary Directories
Description
Generates "data", "log", "cli", and "state" subdirectories at the specified path.
Usage
set_dirs(rebuild)
Arguments
rebuild |
[optional] (logical) If TRUE, existing subdirectories are removed and recreated. |
Set Progress Bar
Description
Initializes a progress bar if 'verbose' is TRUE.
Usage
set_progress_bar(verbose)
Arguments
verbose |
[mandatory] (logical) If TRUE, a progress bar is initialized. |
Value
A progress bar (environment) if 'verbose' is TRUE, or NULL if FALSE.
Source an R Script with Notifications About Functions Loaded
Description
Sources an R script into a dedicated environment and lists the functions that have been loaded.
Usage
source_with_notification(file)
Arguments
file |
[mandatory] (character) A character string specifying the path to the R script to be sourced. |
Value
An environment containing the functions loaded from the sourced file.
Update Grid Statistics
Description
Updates existing grid statistics with newly calculated statistics.
Usage
update_grid_stats(grid_stat, batch_stat)
Arguments
grid_stat |
[optional] (data.frame) Existing data frame of grid statistics to append the newly calculated statistics to. |
batch_stat |
[mandatory] (data.frame) New data frame of grid statistics to append to the existing statistics. |
Value
(data.frame) A combined data frame containing the updated grid statistics, with missing columns filled as NA.
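The column-filling behavior can be sketched in base R (dplyr::bind_rows() provides the same semantics; this is an illustration, not the internal code):

```r
# Append new rows, creating any missing date columns as NA on either side
bind_fill <- function(a, b) {
  for (col in setdiff(names(b), names(a))) a[[col]] <- NA
  for (col in setdiff(names(a), names(b))) b[[col]] <- NA
  rbind(a, b[names(a)])
}
old <- data.frame(id = 1, "2020-01-01" = 5, check.names = FALSE)
new <- data.frame(id = 2, "2020-02-01" = 7, check.names = FALSE)
bind_fill(old, new)
```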
Validate Parameters
Description
Validates multiple parameters.
Usage
validate_params(params)
Arguments
params |
[mandatory] (list) A list of parameters to be validated. |
Details
The following validations are performed:
- 'admin_lvl': Ensures it is NULL, 0, or 1.
- 'conda': Verifies that it is an available Conda environment.
- 'file_path': Constructs a file path and checks whether the file exists.
- 'keys': Ensures it is a non-empty list with valid entries.
- 'limit': Ensures it is a positive numeric value.
- 'mode': Ensures it is 'local' or 'drive'.
- 'new_values': Ensures it is a list of the same length as 'keys'.
- 'user': Verifies that it is NULL or a character value.
- 'path': Verifies that the directory exists.
- 'rebuild': Verifies that it is a logical value.
- 'regions': Ensures the first two characters are letters.
- 'start': Ensures it is a valid date.
- 'verbose': Verifies that it is a logical value.
Value
Returns NULL invisibly if all validations pass.
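An illustrative parameter list covering several of the checks listed in Details (the values are examples only, not required settings):
## Not run:
params <- list(
  admin_lvl = 0,                # NULL, 0, or 1
  mode      = "local",          # 'local' or 'drive'
  regions   = c("US", "US-CA"), # first two characters must be letters
  start     = "2020-01-01",     # a valid date
  rebuild   = FALSE,            # logical
  verbose   = TRUE              # logical
)
validate_params(params)
## End(Not run)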
Validate Source Parameter
Description
Checks the validity of the 'source' parameter.
Usage
validate_source_param(source)
Arguments
source |
[mandatory] (list) A list containing datasets, each with its associated bands and statistics. The structure should follow a nested format where each dataset is a named list, each band within a dataset is also a named list, and each statistic within a band is a non-empty character string. |
Value
Returns TRUE if the 'source' parameter is valid. Throws an error if the parameter is invalid.
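A valid 'source' list following the nested dataset/band/statistic structure described above:
## Not run:
source <- list(
  "MODIS/061/MOD13A1" = list(  # dataset: a named list
    "NDVI" = c("mean", "sd")   # band: non-empty character statistics
  )
)
validate_source_param(source)
## End(Not run)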
Validate and Process Parameters for Variable Selection and Data Processing
Description
Validates and processes input parameters related to variable selection and data processing in the read_db function. It ensures that the variables, frequency, and functions provided are valid, correctly formatted, and compatible with the available data.
Usage
validate_variables_param(
variables,
variables_all,
prep_fun,
aggr_funs,
postp_funs
)
Arguments
variables |
[mandatory] (character or integer) Variable IDs or names to be processed. |
variables_all |
[mandatory] (data.frame) A data frame containing all available variables. |
prep_fun |
[mandatory] (function) A function used for pre-processing. |
aggr_funs |
[mandatory] (function or list) Aggregation function(s). |
postp_funs |
[mandatory] (function or list) Post-processing function(s). |
Value
A character vector of variable names to process.
Write Grid to Database
Description
Writes the H3 grid to the specified SQLite database (data/geelite.db).
Usage
write_grid(grid)
Arguments
grid |
[mandatory] (sf) Simple features object containing the grid data to be written into the database. |
Write Grid Statistics to Database
Description
Writes grid statistics to the SQLite database.
Usage
write_grid_stats(database_new, dataset_new, dataset, db_table, grid_stats)
Arguments
database_new |
[mandatory] (logical) A logical value indicating whether the database is new. |
dataset_new |
[mandatory] (logical) A logical value indicating whether the dataset is new. |
dataset |
[mandatory] (character) Name of the dataset to initialize or update in the SQLite database. |
db_table |
[mandatory] (data.frame) The table to be updated or retrieved from the SQLite database. |
grid_stats |
[mandatory] (list) List containing grid statistics separately for (re)building and updating procedures. |
Write Log File
Description
Writes the log file to the specified directory within the generated database (log/log.txt).
Usage
write_log_file(database_new)
Arguments
database_new |
[mandatory] (logical) A logical value indicating whether the database is new. |
Write State File
Description
Writes the state file to the specified directory within the generated database (state/state.json).
Usage
write_state_file(task, regions, source_for_state)
Arguments
task |
[mandatory] (list) Session task specifying parameters for data collection. |
regions |
[mandatory] (character) A vector containing ISO 3166-2 region codes. Country codes are two characters long, while state codes contain additional characters. |
source_for_state |
[mandatory] (list) A list containing information regarding the collected data. |
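The 'regions' format can be illustrated as follows (example codes only):
## Not run:
# ISO 3166-2 codes: two characters for a country ("US"),
# additional characters for a state ("US-CA")
regions <- c("US", "US-CA")
## End(Not run)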