# Spain Church Importer (horariosmisas.com): Design

## Overview

Import ~10,000 Spanish churches with mass schedules from horariosmisas.com. Static WordPress site with fully permissive robots.txt and sitemaps. No Playwright needed: simple HTTP + HTML parsing.

## Data Source

- **Site:** https://horariosmisas.com
- **Coverage:** 18,000+ churches claimed, ~10,000 in sitemaps across 52 Spanish provinces
- **Data:** Church name, address, phone, website, mass schedules (summer/winter seasonal variants)
- **No coordinates:** addresses only. Forward geocoding via Nominatim as a separate pass.
- **robots.txt:** Fully permissive (`User-agent: * / Disallow:`)
- **Sitemaps:** 20 post sitemaps + 7 category sitemaps

## Architecture

### Two-Pass Approach

**Pass 1: Import.** Fetch all churches from sitemaps, parse HTML, match against existing Spanish OSM churches, upsert with mass schedules. Unmatched churches created with address but no coordinates.

**Pass 2: Geocode.** Forward-geocode unmatched churches via Nominatim public API (`address → lat/lng`). 1 req/sec rate limit.

### Schema Change

Add `horariosMisasId String? @unique` to Church model (same pattern as `philmassId`, `massSchedulesPhId`). Update church matcher and all existing importers.

### URL Structure

- Sitemap index: `/sitemap_index.xml` → 20 post sitemaps
- Church pages: `/{province}/{city}/{church-slug}/`
- Non-church posts (filtered out): `/misas-diarias/`, `/santos-del-dia/`, `/oraciones/`, etc.

### HTML Parsing

- **Name:** `
📌 Street, PostalCode City (Province)
`
- **Phone:** `Teléfono: ...`
- **Website:** `Página Web: ...`
- **Schedule:** `