
peruopen

peruopen provides a comprehensive R interface to Peru’s CKAN data portal (datosabiertos.gob.pe), enabling users to search, discover, and download open government datasets directly into R.

Features

  • 🔍 Smart Search: Search across 3,954+ datasets with flexible filtering by organization, format, and content
  • 📊 Data Catalog: Browse the complete catalog, with metadata for all datasets and 10,000+ resources
  • Fast Downloads: Intelligent resource selection and caching for efficient data access
  • 🎨 Beautiful Output: Colored console output for easy data exploration
  • 🔄 Robust API: Built-in retry logic and error handling for reliable data access
  • 📖 Rich Metadata: Access complete dataset information including organizations, formats, and descriptions
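The retry behaviour mentioned above can be pictured with a small base R sketch. This is not peruopen's actual implementation: `with_retry()`, its arguments, and the exponential backoff schedule are hypothetical and only illustrate the general pattern.

```r
# Hypothetical helper, NOT part of peruopen: call fn() up to max_tries times,
# waiting longer after each failure (exponential backoff).
with_retry <- function(fn, max_tries = 3, base_delay = 0.5) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    if (attempt < max_tries) {
      # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(last_error))
}

# Usage: a stand-in for a network call that fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}
result <- with_retry(flaky, max_tries = 5, base_delay = 0.01)
```

Wrapping the call in a function keeps the retry logic independent of what is being retried, so the same helper could wrap a search request or a file download.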

Installation

You can install the development version of peruopen from GitHub with:

# install.packages("devtools")
devtools::install_github("horaciochacon/peruopen")

Quick Start

library(peruopen)

# Get the complete data catalog (3,954 datasets, 10,000+ resources)
catalog <- po_catalog()

# Search for COVID-related data
covid_data <- po_search("covid")
covid_data

# Explore data by topic
health_data <- po_explore("salud")
health_data

# Download a specific dataset
malaria_data <- po_get("malaria-2024")

# Find all CSV files about dengue
dengue_csv <- po_search("dengue", formats = "CSV")

Main Functions

Function       Purpose
po_catalog()   Get the complete data catalog with datasets and resources
po_search()    Search datasets by keywords, tags, formats, and organizations
po_explore()   Interactive exploration with summaries by different dimensions
po_get()       Smart data download with automatic format selection
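To give a feel for what "automatic format selection" could mean, here is a minimal sketch that picks one format from those a dataset offers, using a fixed preference order. The function `pick_format()` and the order CSV, XLSX, XLS, JSON are assumptions made for illustration; the ranking po_get() actually uses may differ.

```r
# Hypothetical helper, NOT part of peruopen: choose the most preferred
# format among those available. The preference order is an assumption.
pick_format <- function(available,
                        preferred = c("CSV", "XLSX", "XLS", "JSON")) {
  available_upper <- toupper(available)
  # Position in `available` of each preferred format (NA if not offered)
  hits <- match(preferred, available_upper)
  hits <- hits[!is.na(hits)]
  if (length(hits) == 0) {
    return(NA_character_)  # none of the preferred formats is offered
  }
  available[hits[1]]
}

best <- pick_format(c("pdf", "xlsx", "csv"))
```

Matching is case-insensitive, and the value returned is the format exactly as the dataset spelled it.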

Examples

Search and Filter Data

# Search across all fields
climate_data <- po_search("cambio climático")

# Filter by organization
minsa_data <- po_search(organizations = "MINSA")

# Find specific file formats
excel_files <- po_search(formats = c("XLSX", "XLS"))

# Search only in tags
health_tags <- po_search("salud", search_tags_only = TRUE)

Data Exploration

# Explore all available data
overview <- po_explore()

# Focus on specific topic
education <- po_explore("educación")

# See what organizations publish data
overview$by_organization
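A per-organization summary such as the one above is, in essence, a count of datasets grouped by publisher. The sketch below builds one in base R from a toy table; the data are made up and the column names `name`, `organization`, and `n_datasets` are assumptions, not the structure po_explore() is documented to return.

```r
# Toy stand-in for a dataset listing; values and column names are invented.
datasets <- data.frame(
  name = c("dengue-2023", "malaria-2024", "presupuesto-2024", "matricula-2023"),
  organization = c("MINSA", "MINSA", "MEF", "MINEDU"),
  stringsAsFactors = FALSE
)

# Count datasets per organization
counts <- table(datasets$organization)
by_organization <- data.frame(
  organization = names(counts),
  n_datasets = as.integer(counts),
  stringsAsFactors = FALSE
)

# Largest publishers first, ties broken alphabetically
by_organization <- by_organization[
  order(-by_organization$n_datasets, by_organization$organization),
]
by_organization
```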

Smart Data Downloads

# Download best available resource from dataset
data <- po_get("covid-vaccinations")

# Get specific format
csv_data <- po_get("dataset-name", format = "CSV")

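# Download multiple resources (%>% and filter() come from dplyr;
# catalog is the result of po_catalog(), as in Quick Start)
library(dplyr)
resources <- catalog$resources %>%
  filter(format == "CSV", size_mb < 50)
all_data <- po_get(resources)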

# Save files locally
po_get("dataset-name", save_to = "data/peru/")
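The multi-resource example above filters `catalog$resources` on its `format` and `size_mb` columns. For readers who prefer not to load dplyr, the same selection can be sketched in base R. The table below is toy data standing in for the real catalog, and the `resource_name` column is an assumption.

```r
# Toy stand-in for catalog$resources; values are invented for illustration.
resources <- data.frame(
  resource_name = c("casos.csv", "casos.xlsx", "reporte.pdf", "historico.csv"),
  format = c("CSV", "XLSX", "PDF", "CSV"),
  size_mb = c(12.5, 8.1, 2.3, 310.0),
  stringsAsFactors = FALSE
)

# Keep only CSV resources smaller than 50 MB
small_csv <- subset(resources, format == "CSV" & size_mb < 50)
small_csv
```

Filtering on size before downloading avoids pulling very large files by accident.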

Data Coverage

The Peru open data portal contains:

  • 3,954 datasets from 283+ government organizations
  • 10,000+ resources in multiple formats (CSV, Excel, JSON, PDF, etc.)
  • ~1.7 TB of total data across all resources
  • Regular updates from ministries, regional governments, and agencies

Peru’s Open Data Ecosystem

This package connects to Peru’s official open data portal, which provides access to datasets from:

  • MINSA (Ministry of Health) - Health statistics, epidemiological data
  • MEF (Ministry of Economy) - Budget execution, economic indicators
  • MINEDU (Ministry of Education) - Educational statistics
  • INEI (National Statistics Institute) - Census, demographic data
  • Regional Governments - Local statistics and programs
  • And 278+ more organizations
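For orientation, CKAN portals usually expose an Action API whose `package_search` endpoint accepts `q` and `rows` parameters. The sketch below only builds such a URL and sends no request. It assumes that the portal serves the standard path `/api/3/action/` over HTTPS at datosabiertos.gob.pe, which is not confirmed here. `build_search_url()` is a hypothetical helper, not a peruopen function.

```r
# Hypothetical helper, NOT part of peruopen. Assumes the standard CKAN
# Action API path and an https base URL; both are unverified assumptions.
build_search_url <- function(query, rows = 10,
                             base_url = "https://datosabiertos.gob.pe") {
  params <- c(q = query, rows = as.character(rows))
  # Percent-encode each value so spaces and reserved characters are safe
  encoded <- paste0(
    names(params), "=",
    vapply(params, utils::URLencode, character(1), reserved = TRUE)
  )
  paste0(base_url, "/api/3/action/package_search?",
         paste(encoded, collapse = "&"))
}

url <- build_search_url("salud", rows = 5)
```

In normal use the package functions hide these details, so this is only useful for understanding or debugging what a search request might look like.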

Getting Help

Code of Conduct

Please note that the peruopen project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

MIT © Horacio Chacón-Torrico