
Mediscraper

Open observatory of medicine prices across Peru — reverse-engineered from DIGEMID, normalized, and served to anyone.

Timeline: May 2026 to present
Status: Flagship · Live demo in progress
Type: Open data · Data engineering · Product
18,334 products in the master catalog
21,319 pharmacies compared
12.4M+ prices indexed

An independent observatory over DIGEMID’s public drug-price data. The official portal is an Angular SPA with no public API, so I reverse-engineered its internal endpoints and built the full path from raw scraping to a normalized PostgreSQL database, a REST API, and a React frontend where anyone can compare a medicine’s price across pharmacies nationwide.

Highlights

  • Reverse engineering that pays for itself. Mapped the entire searchable catalog with a ~36-request prefix sweep of the autocomplete endpoint, and found that a single price query returns every pharmacy in the country, which cuts the scraping cost by orders of magnitude.
  • Five idempotent, resumable pipelines. Geography → catalog → ~18k-product master from a daily Excel → prices → pharmacy enrichment, with budgeted runs, bulk ON CONFLICT upserts, and a single-lane pacer with jitter and cooldowns to stay polite with the source.
  • Change-data-capture historization. Price history rows are written only when a price actually changes, and catalog snapshots append only when a SHA-256 fingerprint of the record differs — keeping history small and meaningful. Fuzzy search via PostgreSQL pg_trgm trigram indexes.
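
The change-only history described in the last highlight can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `History` class, its method names, and the in-memory dictionaries standing in for PostgreSQL tables are all hypothetical, and canonical JSON is only one assumed way of producing a stable input for the SHA-256 fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass, field
from decimal import Decimal


def fingerprint(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not affect the hash.

    Assumption: the record holds only JSON-serializable values.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class History:
    """Hypothetical in-memory stand-in for the price and snapshot history tables."""

    last_price: dict = field(default_factory=dict)        # (product, pharmacy) -> price
    price_rows: list = field(default_factory=list)        # appended only on change
    last_fingerprint: dict = field(default_factory=dict)  # product -> sha256 hex
    snapshots: list = field(default_factory=list)         # appended only on change

    def record_price(self, product_id: int, pharmacy_id: int,
                     price: Decimal, seen_at: str) -> bool:
        """Append a history row only if the price differs from the last one seen."""
        key = (product_id, pharmacy_id)
        if self.last_price.get(key) == price:
            return False  # unchanged: nothing is written
        self.last_price[key] = price
        self.price_rows.append((product_id, pharmacy_id, price, seen_at))
        return True

    def record_snapshot(self, product_id: int, record: dict, seen_at: str) -> bool:
        """Append a catalog snapshot only if the record's fingerprint changed."""
        fp = fingerprint(record)
        if self.last_fingerprint.get(product_id) == fp:
            return False  # same content: skip the snapshot
        self.last_fingerprint[product_id] = fp
        self.snapshots.append((product_id, fp, record, seen_at))
        return True
```

Calling `record_price` repeatedly with the same price for a product and pharmacy writes one row; a new row appears only when the value changes. In a real database the same check would more likely live in the upsert statement or a comparison against the latest stored row, which this sketch does not attempt to reproduce.
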
Built with
Python · FastAPI · PostgreSQL 16 · SQLAlchemy · httpx · APScheduler · React 18 · TypeScript · Docker