# Spain Church Importer (horariosmisas.com): Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Import ~10,000 Spanish churches with mass schedules from horariosmisas.com, with optional Nominatim forward geocoding for unmatched churches.

**Architecture:** Sitemap-driven importer. Fetch 20 post sitemaps for church URLs, parse static WordPress HTML for names/addresses/schedule tables, match against existing Spanish OSM churches, upsert with mass schedules. Separate geocoding pass via Nominatim public API.

**Tech Stack:** TypeScript, Prisma, HTML parsing (regex, no Playwright), Nominatim geocoding API.

---

## Task 1: Add `horariosMisasId` to Prisma Schema

**Files:**
- Modify: `prisma/schema.prisma`

**Step 1: Add field and index**

After the `philmassId` line (around line 38), add:

```prisma
horariosMisasId String? @unique @map("horarios_misas_id") // horariosmisas.com URL slug
```

And add an index in the `@@index` block (around line 78):

```prisma
@@index([horariosMisasId])
```

**Step 2: Push schema to NAS database**

```bash
npx prisma db push --accept-data-loss
```

Expected: `Your database is now in sync with your Prisma schema.`

**Step 3: Regenerate Prisma client**

```bash
npx prisma generate
```

**Step 4: Push schema to Neon production**

```bash
npx prisma db push --url "$(grep DATABASE_URL .env.production | sed 's/DATABASE_URL="//' | sed 's/"$//')" --accept-data-loss
```

**Step 5: Commit**

```bash
git add prisma/schema.prisma
git commit -m "feat: add horariosMisasId to Church model for Spain import"
```

---

## Task 2: Extend Church Matcher and Existing Importers

**Files:**
- Modify: `src/lib/church-matcher.ts`
- Modify: `scripts/import-osm-churches.ts`
- Modify: `scripts/import-gcatholic.ts`
- Modify: `scripts/import-baidu-churches.ts`
- Modify: `scripts/import-osm-region.ts`
- Modify: `scripts/import-orarimesse.ts`
- Modify: `scripts/import-mass-schedules-ph.ts`
- Modify: `scripts/import-philmass.ts`

### Step 1: Update church-matcher.ts

In the `ExistingChurch` interface (line ~11-26), add after `philmassId`:

```typescript
horariosMisasId: string | null;
```

In the `ChurchCandidate` type (line ~113-122), add after `philmassId`:

```typescript
horariosMisasId?: string;
```

In `findDuplicateChurch()`, add a new pass after the fifth pass (philmassId match, line ~169-175) and before the proximity+name pass:

```typescript
// Sixth pass: exact horariosMisasId match
if (candidate.horariosMisasId) {
  const horariosMisasMatch = existingChurches.find(
    (church) => church.horariosMisasId === candidate.horariosMisasId
  );
  if (horariosMisasMatch) return horariosMisasMatch;
}
```

Update the comment on the proximity pass to say "Seventh pass".

### Step 2: Update all existing importers

In every importer that queries churches with a `select` clause containing `philmassId: true`, add:

```typescript
horariosMisasId: true,
```

In every importer that creates/pushes churches with `philmassId: null`, add:

```typescript
horariosMisasId: null,
```

**Files to update:** `import-osm-churches.ts`, `import-gcatholic.ts`, `import-baidu-churches.ts`, `import-osm-region.ts`, `import-orarimesse.ts`, `import-mass-schedules-ph.ts`, `import-philmass.ts`

### Step 3: Verify build

```bash
npx tsc --noEmit
```

Expected: No errors.

### Step 4: Commit

```bash
git add src/lib/church-matcher.ts scripts/import-*.ts
git commit -m "feat: add horariosMisasId to church matcher and all importers"
```

---

## Task 3: Create `import-horariosmisas.ts`

**Files:**
- Create: `scripts/import-horariosmisas.ts`

### Architecture

This importer follows the exact same structure as `scripts/import-mass-schedules-ph.ts`. Key differences:

- **Sitemap:** Fetches 20 post sitemaps from the sitemap index (not a single sitemap)
- **URL filtering:** Church URLs have 3 path segments (`/{province}/{city}/{slug}/`). Non-church URLs (blog posts, daily readings) are filtered out.
- **Schedule parsing:** Two seasonal tables (summer/winter). Import the seasonally appropriate one based on the current month.
- **Day names:** Spanish (`Lunes`, `Martes`, etc.) with range support (`Lunes a Viernes`)
- **Times:** 24-hour `HH:MMh` format (e.g., `08:00h`, `20:30h`)
- **No coordinates:** Churches are created with `latitude: 0, longitude: 0` and geocoded separately
- **Geocoding:** Optional `--geocode` flag uses the Nominatim public API (1 req/sec)

### Constants

```typescript
const SITE_BASE = 'https://horariosmisas.com';
const SITEMAP_INDEX_URL = `${SITE_BASE}/sitemap_index.xml`;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
const REQUEST_DELAY_MS = 1500;
const NOMINATIM_DELAY_MS = 1100;
const NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search';
```

### Spanish Day Mapping

```typescript
const DAY_MAP: Record<string, number[]> = {
  'domingos y festivos': [0],
  'domingos': [0],
  'domingo': [0],
  'lunes': [1],
  'martes': [2],
  'miércoles': [3],
  'miercoles': [3],
  'jueves': [4],
  'viernes': [5],
  'sábado': [6],
  'sabado': [6],
  'sábados': [6],
  'sabados': [6],
};
```

### Sitemap Fetching

1. Fetch sitemap index → extract `post-sitemap*.xml` URLs
2. Fetch each post sitemap → extract URLs with exactly 3 path segments
3. Filter out non-church URLs (patterns: `/misas-diarias/`, `/santos-del-dia/`, `/oraciones/`, `/noticias/`, `/blog/`, `/contacto/`, `/aviso-legal/`, `/politica-de-privacidad/`, `/politica-de-cookies/`)
4. Deduplicate by slug

### HTML Parsing

**Church name:** `Church Name (City)` → strip `(City)` suffix

**Address:** `📌 Calle Goya, 26 28001 Madrid (Madrid)` → extract street, postal code (5-digit `\b\d{5}\b`), city (text after postal code), strip `(Province)` suffix

**Phone:** `Teléfono: number`

**Website:** `Página Web: ...`

**Schedule tables:** Find `<table>` elements with DÍA/HORARIO headers. Split by seasonal headings (☀️ verano / ⛄ invierno). Pick the seasonally appropriate section (Oct-May = winter, Jun-Sep = summer). Parse `<td>` cells: first cell = day name(s), second cell = times. Times in `HH:MMh` format are extracted via the regex `(\d{1,2}):(\d{2})\s*h?`.

### Day Range Resolution

Support ranges like `Lunes a Viernes` → [1,2,3,4,5] and compound entries like `Lunes, Miércoles y Viernes` → [1,3,5].

### Geocoding (--geocode / --geocode-only)

Query Nominatim with `{address}, Spain` → fall back to `{postalCode} {city}, Spain` → fall back to `{city}, Spain`. Use the `countrycodes=es` parameter. Max 1 req/sec.

### Matching Strategy

1. `horariosMisasId` exact match (primary, for re-imports)
2. Name + proximity against existing Spanish OSM churches (secondary)
3. Unmatched: create new church with `latitude: 0, longitude: 0`, country=ES

### CLI

```
--all                  Import all churches from sitemaps
--province <province>  Import only churches from this province
--dry-run              No database writes
--geocode              After import, geocode unmatched churches
--geocode-only         Only geocode (skip import)
--resume-from <N>      Skip first N churches
--job-id <id>          Background job tracking
```

### Mass Schedule Language

Set `language: 'Spanish'` on all created mass schedules.

### Step 1: Create the file

Use `scripts/import-mass-schedules-ph.ts` as the structural template. Implement all functions described above.
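As a starting point for the parsing helpers, the day-range, time, and season rules described in this task could be sketched as follows. This is a minimal sketch, not the required implementation: the names `DAY_INDEX`, `normalizeLabel`, `lookupDay`, `resolveDays`, `extractTimes`, and `seasonForMonth` are hypothetical, and it assumes day labels use precomposed accented characters.

```typescript
// Illustrative helpers only: every name below is hypothetical, not taken from the codebase.

// Accent-free Spanish day names mapped to JS day numbers (0 = Sunday).
const DAY_INDEX: Record<string, number> = {
  domingo: 0, domingos: 0,
  lunes: 1,
  martes: 2,
  miercoles: 3,
  jueves: 4,
  viernes: 5,
  sabado: 6, sabados: 6,
};

// Lowercase, strip acute accents, and trim so "Miércoles" matches "miercoles".
function normalizeLabel(text: string): string {
  return text
    .toLowerCase()
    .replace(/á/g, 'a')
    .replace(/é/g, 'e')
    .replace(/í/g, 'i')
    .replace(/ó/g, 'o')
    .replace(/ú/g, 'u')
    .trim();
}

// Own-property lookup, so words like "constructor" never resolve to a day.
function lookupDay(name: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(DAY_INDEX, name) ? DAY_INDEX[name] : undefined;
}

// Resolve a first-cell label into day numbers.
function resolveDays(label: string): number[] {
  const text = normalizeLabel(label);
  const range = /^([a-z]+)\s+a\s+([a-z]+)$/.exec(text);
  if (range) {
    const start = lookupDay(range[1]);
    const end = lookupDay(range[2]);
    if (start !== undefined && end !== undefined) {
      const span: number[] = [];
      // Walk forward with wraparound so "Sábado a Domingo" yields [6, 0].
      for (let d = start; ; d = (d + 1) % 7) {
        span.push(d);
        if (d === end) break;
      }
      return span;
    }
  }
  // Compound entry: split on commas and " y ", skipping unknown words such as "festivos".
  const days: number[] = [];
  for (const part of text.split(/\s*,\s*|\s+y\s+/)) {
    const day = lookupDay(part);
    if (day !== undefined && days.indexOf(day) === -1) days.push(day);
  }
  return days;
}

// Extract zero-padded "HH:MM" strings from second-cell text such as "08:00h, 20:30h".
function extractTimes(cell: string): string[] {
  const pattern = /(\d{1,2}):(\d{2})\s*h?/g;
  const times: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(cell)) !== null) {
    const hour = Number(m[1]);
    const minute = Number(m[2]);
    if (hour > 23 || minute > 59) continue; // not a valid clock time
    times.push((hour < 10 ? '0' : '') + hour + ':' + m[2]);
  }
  return times;
}

// Oct-May = winter, Jun-Sep = summer; month is 0-based, as returned by Date.getMonth().
function seasonForMonth(month: number): 'summer' | 'winter' {
  return month >= 5 && month <= 8 ? 'summer' : 'winter';
}

console.log(resolveDays('Lunes a Viernes'));            // [ 1, 2, 3, 4, 5 ]
console.log(resolveDays('Lunes, Miércoles y Viernes')); // [ 1, 3, 5 ]
console.log(extractTimes('08:00h, 20:30h'));            // [ '08:00', '20:30' ]
```

Normalizing accents before lookup keeps the table small compared with listing accented and unaccented variants separately; whether the real importer should reuse `DAY_MAP` directly instead is left open here.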
### Step 2: Verify build

```bash
npx tsc --noEmit
```

### Step 3: Dry-run test

```bash
npx tsx scripts/import-horariosmisas.ts --province navarra --dry-run
```

### Step 4: Commit

```bash
git add scripts/import-horariosmisas.ts
git commit -m "feat: add horariosmisas.com Spain church importer"
```

---

## Task 4: Add to Scheduler Pipeline and npm Scripts

**Files:**
- Modify: `scripts/scheduler.ts`
- Modify: `package.json`

### Step 1: Add to PIPELINE_GROUPS

In `scripts/scheduler.ts`, in the `imports` group (line ~40-51), add after the `philmass-import` entry:

```typescript
{ name: 'horariosmisas-import', type: 'horariosmisas-import', config: {} },
```

### Step 2: Add getJobCommand case

In the `getJobCommand` function (around line 182), before the `default:` case, add:

```typescript
case 'horariosmisas-import': {
  const args = ['tsx', 'scripts/import-horariosmisas.ts', '--all', '--geocode'];
  if (config?.province) args.push('--province', String(config.province));
  if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
  return { command: 'npx', args };
}
```

### Step 3: Add npm scripts

In `package.json`, add after the `"import:philmass"` line:

```json
"import:horariosmisas": "tsx scripts/import-horariosmisas.ts",
```

### Step 4: Verify build

```bash
npx tsc --noEmit
```

### Step 5: Commit

```bash
git add scripts/scheduler.ts package.json
git commit -m "feat: add horariosmisas import to scheduler pipeline"
```

---

## Verification

1. **Dry run on single province**: `npx tsx scripts/import-horariosmisas.ts --province navarra --dry-run`
   - Verify: church names parsed correctly, schedules extracted, matches found
2. **Dry run on Madrid**: `npx tsx scripts/import-horariosmisas.ts --province madrid --dry-run`
   - Verify: larger province, summer/winter schedule selection, address parsing
3. **Single province real import**: `npx tsx scripts/import-horariosmisas.ts --province navarra`
   - Verify: churches created/updated, mass schedules in database
4. **Geocode test**: `npx tsx scripts/import-horariosmisas.ts --geocode-only --dry-run`
   - Verify: finds churches needing geocoding, Nominatim returns coordinates
5. **Full import**: `npx tsx scripts/import-horariosmisas.ts --all --geocode`

## Runtime Estimate

- Sitemap fetch: 20 sitemaps x 1.5s = ~30s
- Import: ~10,000 churches x 1.5s = ~4.2 hours
- Geocode: depends on unmatched count x 1.1s
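
These figures come from multiplying request counts by the fixed delays. The arithmetic can be sketched as below, assuming the delays dominate and ignoring network latency and parse time. `estimateSeconds` is a hypothetical helper, and the unmatched count of 1,000 is a placeholder for illustration, not a measurement.

```typescript
// Hypothetical helper: lower-bound duration of a rate-limited pass, in seconds.
function estimateSeconds(requests: number, delayMs: number): number {
  return (requests * delayMs) / 1000;
}

// Delays as given in the plan's constants (REQUEST_DELAY_MS, NOMINATIM_DELAY_MS).
const requestDelayMs = 1500;
const nominatimDelayMs = 1100;

const sitemapSeconds = estimateSeconds(20, requestDelayMs);           // 30 seconds
const importHours = estimateSeconds(10000, requestDelayMs) / 3600;    // about 4.17 hours

// Placeholder: the real unmatched count is only known after an import run.
const assumedUnmatched = 1000;
const geocodeMinutes = estimateSeconds(assumedUnmatched, nominatimDelayMs) / 60; // about 18.3 minutes

console.log({ sitemapSeconds, importHours, geocodeMinutes });
```

Because the geocode pass scales linearly with the unmatched count, substituting the actual count from a dry run gives a usable estimate before committing to the full import.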