
Downloads and returns a catalog of datasets and resources from Peru's open data portal. The catalog is fetched in chunks to work around API limits, so you can request a partial catalog (by setting target_size) or attempt to download the complete one.
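The sketch below shows one way a chunked download loop can work in plain R. It is not po_catalog()'s actual implementation. `fetch_in_chunks()`, `fetch_page()` and `chunk_size` are hypothetical names, and a stub function stands in for the portal's API. The loop requests fixed-size pages, shrinks the last request so it does not overshoot `target_size`, and keeps whatever it has already fetched if a request fails.

```r
# Hypothetical sketch of chunked loading; NOT the package's real code.
# `fetch_page(offset, limit)` is an assumed helper that returns a
# data.frame of at most `limit` rows (zero rows once data runs out).
fetch_in_chunks <- function(fetch_page, target_size = NULL, chunk_size = 100) {
  chunks <- list()
  n_fetched <- 0
  repeat {
    # Shrink the final request so we never fetch more than target_size
    limit <- chunk_size
    if (!is.null(target_size)) {
      limit <- min(chunk_size, target_size - n_fetched)
      if (limit <= 0) break
    }
    # On error (e.g. a timeout), stop and keep the partial result
    page <- tryCatch(fetch_page(n_fetched, limit), error = function(e) NULL)
    if (is.null(page) || nrow(page) == 0) break
    chunks[[length(chunks) + 1]] <- page
    n_fetched <- n_fetched + nrow(page)
    if (nrow(page) < limit) break  # a short page means no more data
  }
  if (length(chunks) == 0) return(data.frame())
  do.call(rbind, chunks)
}

# Stub API serving 250 fake dataset ids, for illustration only
fake_api <- function(offset, limit) {
  n_total <- 250
  if (offset >= n_total) return(data.frame(id = integer(0)))
  data.frame(id = (offset + 1):min(offset + limit, n_total))
}

partial <- fetch_in_chunks(fake_api, target_size = 150)  # 150 rows
full <- fetch_in_chunks(fake_api)                        # all 250 rows
```

Returning partial results on failure is an assumed design choice here; it mirrors the idea that a partial catalog is still useful when a full download times out.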

Usage

po_catalog(
  refresh = FALSE,
  verbose = TRUE,
  target_size = NULL,
  extend_existing = FALSE
)

Arguments

refresh

Logical. Force a refresh of the cached catalog (default FALSE)

verbose

Logical. Show progress messages (default TRUE)

target_size

Integer. Number of datasets to fetch. The default, NULL, attempts to fetch all 3954 datasets.

extend_existing

Logical. If TRUE, add more data to the existing catalog (default FALSE)
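How extend_existing merges new rows into the existing catalog is not described on this page. The sketch below assumes that each dataset has a unique `id` column and that newly fetched rows may overlap with rows already in the catalog; `extend_catalog()` and the `id`/`title` columns are hypothetical, not part of the package.

```r
# Hypothetical sketch of extending a catalog; NOT the package's real code.
# Assumes every dataset row carries a unique `id` column.
extend_catalog <- function(existing, new_rows) {
  combined <- rbind(existing, new_rows)
  # Drop rows whose id is already present, keeping the first occurrence
  combined[!duplicated(combined$id), , drop = FALSE]
}

old <- data.frame(id = 1:3, title = c("a", "b", "c"))
new <- data.frame(id = 3:5, title = c("c", "d", "e"))
extended <- extend_catalog(old, new)  # ids 1 to 5, no duplicate of id 3
```

De-duplicating on an identifier keeps repeated calls with a larger target_size from inflating the catalog with copies of datasets it already holds.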

Value

A list containing:

datasets

Tibble with all datasets and summary information

resources

Tibble with all resources, each linked to its parent dataset

summary

List with catalog statistics and metadata
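Because resources are linked to their parent datasets, the two tibbles can be joined to filter resources by dataset attributes such as organization. The toy list below only imitates the shape of the returned value so the code runs offline: plain data.frames stand in for tibbles, and the column names `id`, `dataset_id` and `title`, the `summary` field, and all values are assumptions (only `organization`, `format` and `size_mb` appear in the examples on this page).

```r
# Toy stand-in for the list returned by po_catalog(); column names and
# values are assumed for illustration, not taken from the real portal.
catalog <- list(
  datasets = data.frame(
    id = c("d1", "d2"),
    title = c("Dataset A", "Dataset B"),
    organization = c("MINSA", "OTHER_ORG")
  ),
  resources = data.frame(
    dataset_id = c("d1", "d1", "d2"),
    format = c("CSV", "XLSX", "CSV"),
    size_mb = c(12.5, 3.1, 80)
  ),
  summary = list(n_datasets = 2)  # assumed field name
)

# Attach each resource to its parent dataset's metadata (base R join)
resources_full <- merge(
  catalog$resources, catalog$datasets,
  by.x = "dataset_id", by.y = "id"
)

# CSV files under 50 MB published by MINSA
small_minsa_csv <- subset(
  resources_full,
  format == "CSV" & size_mb < 50 & organization == "MINSA"
)
```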

Examples

if (FALSE) { # \dontrun{
# Get a working subset (1000-1500 datasets)
catalog <- po_catalog(target_size = 1500)

# Get more coverage progressively
catalog <- po_catalog(target_size = 2500)

# Try to get everything (may time out)
catalog <- po_catalog()

# Extend existing catalog with more data
more_catalog <- po_catalog(target_size = 3000, extend_existing = TRUE)

# The filtering examples below need dplyr for %>% and filter()
library(dplyr)

# Find all CSV files under 50 MB
csv_files <- catalog$resources %>%
  filter(format == "CSV", size_mb < 50)

# Find datasets by organization
minsa_data <- catalog$datasets %>%
  filter(grepl("MINSA", organization))
} # }