Get a searchable catalog of Peru open data with smart loading

Downloads and returns a catalog of datasets and resources from Peru's open data portal. The catalog is loaded in chunks to work around API limitations, and target_size determines whether a partial or complete catalog is fetched.
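The chunked loading described above can be pictured with a minimal sketch. It is not the package's actual implementation: fetch_page() is a hypothetical stand-in that fabricates IDs instead of calling the portal, and chunk_size is an assumed parameter. Only target_size and the total of 3954 come from this page.

```r
# Hypothetical stand-in for one API page request (no network access).
# It returns sequential IDs only so the sketch can run offline.
fetch_page <- function(offset, limit, total = 3954) {
  if (offset >= total) return(data.frame(id = integer(0)))
  data.frame(id = seq(offset + 1, min(offset + limit, total)))
}

# Request fixed-size chunks until target_size is reached or the source runs dry.
# chunk_size is an assumption; the real chunk size is not documented here.
load_in_chunks <- function(target_size = NULL, chunk_size = 500, fetch = fetch_page) {
  chunks <- list()
  offset <- 0
  repeat {
    limit <- chunk_size
    if (!is.null(target_size)) {
      limit <- min(chunk_size, target_size - offset)
      if (limit <= 0) break            # target reached
    }
    page <- fetch(offset, limit)
    if (nrow(page) == 0) break         # nothing left to fetch
    chunks[[length(chunks) + 1]] <- page
    offset <- offset + nrow(page)
  }
  if (length(chunks) == 0) return(data.frame(id = integer(0)))
  do.call(rbind, chunks)
}

partial <- load_in_chunks(target_size = 1500)  # stops once 1500 rows are collected
full    <- load_in_chunks()                    # NULL target: keep going until empty
```

Keeping each request small means a failure costs at most one chunk of work, which is one plausible reason to prefer this pattern when a single large request may time out.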

Usage

po_catalog(
  refresh = FALSE,
  verbose = TRUE,
  target_size = NULL,
  extend_existing = FALSE
)

Arguments

refresh: Logical. Force refresh of cached catalog (default FALSE)
verbose: Logical. Show progress messages (default TRUE)
target_size: Integer. Number of datasets to fetch (default NULL, which attempts to fetch all 3954)
extend_existing: Logical. Add more data to existing catalog (default FALSE)
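As a rough illustration of what extending an existing catalog could involve, the sketch below appends a new batch to an existing table and drops repeated datasets. The helper extend_datasets() and the dataset_id key column are hypothetical; how po_catalog() merges data internally is not documented on this page.

```r
# Hypothetical helper: append newly fetched datasets to an existing table,
# keeping the first copy of any dataset ID that appears in both.
extend_datasets <- function(existing, new_batch, key = "dataset_id") {
  combined <- rbind(existing, new_batch)
  combined[!duplicated(combined[[key]]), , drop = FALSE]
}

# Made-up rows for illustration only.
existing <- data.frame(
  dataset_id = c("a", "b"),
  title = c("A", "B"),
  stringsAsFactors = FALSE
)
new_batch <- data.frame(
  dataset_id = c("b", "c"),
  title = c("B (refetched)", "C"),
  stringsAsFactors = FALSE
)

extended <- extend_datasets(existing, new_batch)
```

In this sketch the cached row wins when an ID repeats. Preferring the refetched row instead would only require reversing the order of the rbind() arguments.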

Value

A list containing:

datasets: Tibble with all datasets and summary information
resources: Tibble with all resources linked to parent datasets
summary: List with catalog statistics and metadata
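To make the shape of the return value concrete, here is a small mock catalog built by hand with base R data frames rather than tibbles. The columns format, size_mb and organization appear in the examples on this page. The dataset_id and resource_id keys, the title column, the summary fields and all row values are assumptions made up for illustration.

```r
# Mock catalog with the three documented components (values are invented).
catalog <- list(
  datasets = data.frame(
    dataset_id = c("d1", "d2"),
    title = c("Hospital admissions", "Census tables"),
    organization = c("MINSA", "INEI"),
    stringsAsFactors = FALSE
  ),
  resources = data.frame(
    resource_id = c("r1", "r2", "r3"),
    dataset_id = c("d1", "d1", "d2"),   # assumed link to the parent dataset
    format = c("CSV", "PDF", "CSV"),
    size_mb = c(12.5, 3, 80),
    stringsAsFactors = FALSE
  ),
  summary = list(n_datasets = 2, n_resources = 3)
)

# Select small CSV resources, then attach the parent dataset's metadata.
small_csv <- subset(catalog$resources, format == "CSV" & size_mb < 50)
with_parent <- merge(small_csv, catalog$datasets, by = "dataset_id")
```

Splitting datasets and resources into two tables joined by a key avoids repeating dataset metadata on every resource row.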

Examples

if (FALSE) { # \dontrun{
library(dplyr)  # provides %>% and filter() used in the examples below

# Get a working subset (1000-1500 datasets)
catalog <- po_catalog(target_size = 1500)

# Get more coverage progressively
catalog <- po_catalog(target_size = 2500)

# Try to get everything (may time out)
catalog <- po_catalog()

# Extend existing catalog with more data
more_catalog <- po_catalog(target_size = 3000, extend_existing = TRUE)

# Find all CSV files under 50MB
csv_files <- catalog$resources %>%
  filter(format == "CSV", size_mb < 50)

# Find datasets by organization
minsa_data <- catalog$datasets %>%
  filter(grepl("MINSA", organization))
} # }