Spain Church Importer (horariosmisas.com) — Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Import ~10,000 Spanish churches with mass schedules from horariosmisas.com, with optional Nominatim forward geocoding for unmatched churches.
Architecture: Sitemap-driven importer. Fetch 20 post sitemaps for church URLs, parse static WordPress HTML for names/addresses/schedule tables, match against existing Spanish OSM churches, upsert with mass schedules. Separate geocoding pass via Nominatim public API.
Tech Stack: TypeScript, Prisma, HTML parsing (regex — no Playwright), Nominatim geocoding API.
Task 1: Add horariosMisasId to Prisma Schema
Files:
- Modify: prisma/schema.prisma
Step 1: Add field and index
After the philmassId line (around line 38), add:
horariosMisasId String? @unique @map("horarios_misas_id") // horariosmisas.com URL slug
And add an index in the @@index block (around line 78):
@@index([horariosMisasId])
Step 2: Push schema to NAS database
npx prisma db push --accept-data-loss
Expected: Your database is now in sync with your Prisma schema.
Step 3: Regenerate Prisma client
npx prisma generate
Step 4: Push schema to Neon production
npx prisma db push --url "$(grep DATABASE_URL .env.production | sed 's/DATABASE_URL="//' | sed 's/"$//')" --accept-data-loss
Step 5: Commit
git add prisma/schema.prisma
git commit -m "feat: add horariosMisasId to Church model for Spain import"
Task 2: Extend Church Matcher and Existing Importers
Files:
- Modify: src/lib/church-matcher.ts
- Modify: scripts/import-osm-churches.ts
- Modify: scripts/import-gcatholic.ts
- Modify: scripts/import-baidu-churches.ts
- Modify: scripts/import-osm-region.ts
- Modify: scripts/import-orarimesse.ts
- Modify: scripts/import-mass-schedules-ph.ts
- Modify: scripts/import-philmass.ts
Step 1: Update church-matcher.ts
In ExistingChurch interface (line ~11-26), add after philmassId:
horariosMisasId: string | null;
In ChurchCandidate type (line ~113-122), add after philmassId:
horariosMisasId?: string;
In findDuplicateChurch(), add a new pass after the fifth pass (philmassId match, line ~169-175) and before the proximity+name pass:
// Sixth pass: exact horariosMisasId match
if (candidate.horariosMisasId) {
  const horariosMisasMatch = existingChurches.find(
    (church) => church.horariosMisasId === candidate.horariosMisasId
  );
  if (horariosMisasMatch) return horariosMisasMatch;
}
Update the comment on the proximity pass to say "Seventh pass".
Step 2: Update all existing importers
In every importer that queries churches with a select clause containing philmassId: true, add:
horariosMisasId: true,
In every importer that creates/pushes churches with philmassId: null, add:
horariosMisasId: null,
Files to update: import-osm-churches.ts, import-gcatholic.ts, import-baidu-churches.ts, import-osm-region.ts, import-orarimesse.ts, import-mass-schedules-ph.ts, import-philmass.ts
Step 3: Verify build
npx tsc --noEmit
Expected: No errors.
Step 4: Commit
git add src/lib/church-matcher.ts scripts/import-*.ts
git commit -m "feat: add horariosMisasId to church matcher and all importers"
Task 3: Create import-horariosmisas.ts
Files:
- Create: scripts/import-horariosmisas.ts
Architecture
This importer follows the exact same structure as scripts/import-mass-schedules-ph.ts. Key differences:
- Sitemap: Fetches 20 post sitemaps from the sitemap index (not a single sitemap)
- URL filtering: Church URLs have 3 path segments (/{province}/{city}/{slug}/). Non-church URLs (blog posts, daily readings) are filtered out.
- Schedule parsing: Two seasonal tables (summer/winter). Import the seasonally appropriate one based on the current month.
- Day names: Spanish (Lunes, Martes, etc.) with range support (Lunes a Viernes)
- Times: 24-hour HH:MMh format (e.g., 08:00h, 20:30h)
- No coordinates: Churches are created with latitude: 0, longitude: 0 and geocoded separately
- Geocoding: Optional --geocode flag uses the Nominatim public API (1 req/sec)
Constants
const SITE_BASE = 'https://horariosmisas.com';
const SITEMAP_INDEX_URL = `${SITE_BASE}/sitemap_index.xml`;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
const REQUEST_DELAY_MS = 1500;
const NOMINATIM_DELAY_MS = 1100;
const NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search';
Spanish Day Mapping
const DAY_MAP: Record<string, number[]> = {
  'domingos y festivos': [0],
  'domingos': [0],
  'domingo': [0],
  'lunes': [1],
  'martes': [2],
  'miércoles': [3],
  'miercoles': [3],
  'jueves': [4],
  'viernes': [5],
  'sábado': [6],
  'sabado': [6],
  'sábados': [6],
  'sabados': [6],
};
Sitemap Fetching
- Fetch sitemap index → extract post-sitemap*.xml URLs
- Fetch each post sitemap → extract URLs with exactly 3 path segments
- Filter out non-church URLs (patterns: /misas-diarias/, /santos-del-dia/, /oraciones/, /noticias/, /blog/, /contacto/, /aviso-legal/, /politica-de-privacidad/, /politica-de-cookies/)
- Deduplicate by slug
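The filtering steps above can be sketched as pure functions over already-downloaded sitemap XML. This is a minimal sketch, not the final implementation: the function and type names are hypothetical, and it assumes post sitemaps are named post-sitemap.xml, post-sitemap2.xml, and so on, with absolute URLs in their <loc> elements.

```typescript
// Exclusion patterns come from the plan; all function and type names are hypothetical.
const EXCLUDED_PATTERNS: string[] = [
  '/misas-diarias/', '/santos-del-dia/', '/oraciones/', '/noticias/', '/blog/',
  '/contacto/', '/aviso-legal/', '/politica-de-privacidad/', '/politica-de-cookies/',
];

interface ChurchUrl {
  url: string;
  province: string;
  city: string;
  slug: string;
}

/** Collect every <loc> value from a sitemap or sitemap-index XML string. */
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) locs.push(m[1]);
  return locs;
}

/** From the sitemap index, keep only post-sitemap*.xml entries (numeric suffix assumed). */
function extractPostSitemapUrls(indexXml: string): string[] {
  return extractLocs(indexXml).filter((loc) => /\/post-sitemap\d*\.xml$/.test(loc));
}

/** From the post sitemaps, keep 3-segment church URLs, deduplicated by slug. */
function extractChurchUrls(sitemapXmls: string[]): ChurchUrl[] {
  const bySlug = new Map<string, ChurchUrl>();
  for (const xml of sitemapXmls) {
    for (const url of extractLocs(xml)) {
      if (EXCLUDED_PATTERNS.some((p) => url.indexOf(p) !== -1)) continue;
      // Drop scheme and host, then any query string or fragment
      const path = url.replace(/^https?:\/\/[^/]+/, '').split(/[?#]/)[0];
      const segments = path.split('/').filter((s) => s.length > 0);
      if (segments.length !== 3) continue;
      const [province, city, slug] = segments;
      if (!bySlug.has(slug)) bySlug.set(slug, { url, province, city, slug });
    }
  }
  return Array.from(bySlug.values());
}
```

Keeping the network fetch separate from these functions lets the filtering be checked with --dry-run or in a unit test without touching the site.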
HTML Parsing
Church name: <h1>Church Name (City)</h1> → strip (City) suffix
Address: 📌 <strong>Calle Goya, 26 28001 Madrid (Madrid)</strong> → extract street, postal code (5-digit \b\d{5}\b), city (text after postal code), strip (Province) suffix
Phone: <strong>Teléfono:</strong> <a href="tel:...">number</a>
Website: <strong>Página Web:</strong> <a href="url">...</a>
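The name and address rules above can be sketched as follows. This is a minimal sketch that assumes the text has already been pulled out of the h1 and strong elements; the function names and the ParsedAddress shape are hypothetical.

```typescript
// Hypothetical result shape; field names are not taken from the existing codebase.
interface ParsedAddress {
  street: string;
  postalCode: string | null;
  city: string | null;
}

/** "Church Name (City)" -> "Church Name": strip a trailing parenthesised suffix. */
function parseChurchName(h1Text: string): string {
  return h1Text.replace(/\s*\([^()]*\)\s*$/, '').trim();
}

/** Split "street postalCode city (Province)" on the 5-digit postal code. */
function parseAddress(raw: string): ParsedAddress {
  const text = raw.replace(/\s+/g, ' ').trim();
  // Strip the trailing "(Province)" suffix
  const withoutProvince = text.replace(/\s*\([^()]*\)\s*$/, '');
  const match = /\b(\d{5})\b/.exec(withoutProvince);
  if (!match) {
    return { street: withoutProvince, postalCode: null, city: null };
  }
  // Everything before the postal code is the street, everything after is the city
  const street = withoutProvince.slice(0, match.index).replace(/[,\s]+$/, '');
  const city = withoutProvince.slice(match.index + match[1].length).trim();
  return { street, postalCode: match[1], city: city.length > 0 ? city : null };
}
```

For the sample address in the plan, parseAddress('Calle Goya, 26 28001 Madrid (Madrid)') yields street 'Calle Goya, 26', postal code '28001', and city 'Madrid'.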
Schedule tables: Find <table> elements with DÍA/HORARIO headers. Split by seasonal headings (☀️ verano / ⛄ invierno). Pick seasonally appropriate section (Oct-May = winter, Jun-Sep = summer). Parse <td> cells: first cell = day name(s), second cell = times. Times in HH:MMh format extracted via regex (\d{1,2}):(\d{2})\s*h?.
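A minimal sketch of the season selection, row parsing, and time extraction described above. The function names are hypothetical, and the row regex assumes each schedule row holds exactly two td cells in order (day, then times); header rows built from th cells are skipped.

```typescript
type Season = 'summer' | 'winter';

/** month is 1-12. June through September is summer, October through May is winter. */
function seasonForMonth(month: number): Season {
  return month >= 6 && month <= 9 ? 'summer' : 'winter';
}

/** Extract times such as "08:00h" or "20:30 h" and return them as zero-padded "HH:MM". */
function extractTimes(cell: string): string[] {
  const times: string[] = [];
  const re = /(\d{1,2}):(\d{2})\s*h?/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(cell)) !== null) {
    const hour = Number(m[1]);
    const minute = Number(m[2]);
    if (hour > 23 || minute > 59) continue; // ignore impossible clock values
    times.push((hour < 10 ? '0' : '') + hour + ':' + m[2]);
  }
  return times;
}

interface ScheduleRow {
  dayLabel: string;
  times: string[];
}

/** Parse <tr><td>day</td><td>times</td></tr> rows from one seasonal table. */
function parseScheduleRows(tableHtml: string): ScheduleRow[] {
  const rows: ScheduleRow[] = [];
  const re = /<tr[^>]*>\s*<td[^>]*>([\s\S]*?)<\/td>\s*<td[^>]*>([\s\S]*?)<\/td>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(tableHtml)) !== null) {
    // Remove nested tags such as <strong> and collapse whitespace
    const dayLabel = m[1].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
    const times = extractTimes(m[2]);
    if (dayLabel.length > 0 && times.length > 0) rows.push({ dayLabel, times });
  }
  return rows;
}
```

The season would typically be chosen with seasonForMonth(new Date().getMonth() + 1), since getMonth() is zero-based.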
Day Range Resolution
Support ranges like Lunes a Viernes → [1,2,3,4,5] and compound entries like Lunes, Miércoles y Viernes → [1,3,5].
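One way to resolve ranges and compound entries is sketched below. It is a minimal sketch under stated assumptions: it strips accents up front instead of listing accented and unaccented keys separately as DAY_MAP does, and the names DAY_INDEX, normalizeDay, and resolveDays are hypothetical. Unknown tokens such as "festivos" are ignored, so "Domingos y festivos" resolves to Sunday only.

```typescript
// Accent-free day names mapped to JavaScript day numbers (0 = Sunday).
const DAY_INDEX: Record<string, number> = {
  domingo: 0, domingos: 0,
  lunes: 1, martes: 2, miercoles: 3, jueves: 4, viernes: 5,
  sabado: 6, sabados: 6,
};

/** Lowercase and strip combining accents so "Miércoles" matches "miercoles". */
function normalizeDay(text: string): string {
  return text.toLowerCase().normalize('NFD').replace(/[\u0300-\u036f]/g, '').trim();
}

/** Resolve a day label to sorted day numbers, e.g. "Lunes a Viernes" -> [1,2,3,4,5]. */
function resolveDays(label: string): number[] {
  const days = new Set<number>();
  // Compound entries are separated by commas or the conjunction "y"
  const parts = normalizeDay(label).split(/\s*,\s*|\s+y\s+/);
  for (const rawPart of parts) {
    const part = rawPart.replace(/^de\s+/, '').trim();
    if (part.length === 0) continue;
    const range = /^(\S+)\s+a\s+(\S+)$/.exec(part);
    if (range) {
      const start = DAY_INDEX[range[1]];
      const end = DAY_INDEX[range[2]];
      if (start === undefined || end === undefined) continue;
      // Walk forward with wrap-around so "Sábado a Lunes" yields 6, 0, 1
      for (let d = start; ; d = (d + 1) % 7) {
        days.add(d);
        if (d === end) break;
      }
    } else if (DAY_INDEX[part] !== undefined) {
      days.add(DAY_INDEX[part]);
    }
  }
  return Array.from(days).sort((a, b) => a - b);
}
```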
Geocoding (--geocode / --geocode-only)
Query Nominatim with: {address}, Spain → fallback to {postalCode} {city}, Spain → fallback to {city}, Spain. Use countrycodes=es parameter. Max 1 req/sec.
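The fallback chain above can be sketched as a query builder plus a URL builder. This is a sketch, not the final implementation: the function names and the GeocodeInput shape are hypothetical, and format=jsonv2 and limit=1 are choices made here rather than requirements from the plan (q, format, countrycodes, and limit are standard Nominatim search parameters).

```typescript
const NOMINATIM_SEARCH_URL = 'https://nominatim.openstreetmap.org/search';

// Hypothetical input shape; any field may be missing for a given church.
interface GeocodeInput {
  address?: string | null;
  postalCode?: string | null;
  city?: string | null;
}

/** Build queries from most to least specific; callers stop at the first hit. */
function buildGeocodeQueries(input: GeocodeInput): string[] {
  const queries: string[] = [];
  if (input.address) queries.push(`${input.address}, Spain`);
  if (input.postalCode && input.city) queries.push(`${input.postalCode} ${input.city}, Spain`);
  if (input.city) queries.push(`${input.city}, Spain`);
  return queries;
}

/** Build a Nominatim search URL restricted to Spain. */
function buildNominatimUrl(query: string): string {
  const params = [
    `q=${encodeURIComponent(query)}`,
    'format=jsonv2',
    'countrycodes=es',
    'limit=1',
  ];
  return `${NOMINATIM_SEARCH_URL}?${params.join('&')}`;
}
```

The caller would request each URL in order with the USER_AGENT header, wait NOMINATIM_DELAY_MS between requests to stay under 1 req/sec, and stop at the first non-empty result. Nominatim returns lat and lon as strings, so they need converting to numbers before being stored.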
Matching Strategy
- horariosMisasId exact match (primary, for re-imports)
- Name + proximity against existing Spanish OSM churches (secondary)
- Unmatched: create new church with latitude: 0, longitude: 0, country=ES
CLI
--all                Import all churches from sitemaps
--province <name>    Import only churches from this province
--dry-run            No database writes
--geocode            After import, geocode unmatched churches
--geocode-only       Only geocode (skip import)
--resume-from <n>    Skip first N churches
--job-id <uuid>      Background job tracking
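A hand-rolled parser for these flags might look like the sketch below. The CliOptions shape and function names are hypothetical, and the existing importers may parse arguments differently, in which case their convention should win.

```typescript
// Hypothetical options shape mirroring the flags listed above.
interface CliOptions {
  all: boolean;
  province: string | null;
  dryRun: boolean;
  geocode: boolean;
  geocodeOnly: boolean;
  resumeFrom: number;
  jobId: string | null;
}

/** Return the value following a flag, or throw if it is missing. */
function requireValue(argv: string[], index: number, flag: string): string {
  const value = argv[index];
  if (value === undefined || value.indexOf('--') === 0) {
    throw new Error(`Missing value for ${flag}`);
  }
  return value;
}

function parseCliArgs(argv: string[]): CliOptions {
  const opts: CliOptions = {
    all: false, province: null, dryRun: false,
    geocode: false, geocodeOnly: false, resumeFrom: 0, jobId: null,
  };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case '--all': opts.all = true; break;
      case '--dry-run': opts.dryRun = true; break;
      case '--geocode': opts.geocode = true; break;
      case '--geocode-only': opts.geocodeOnly = true; break;
      case '--province': opts.province = requireValue(argv, ++i, flag); break;
      case '--job-id': opts.jobId = requireValue(argv, ++i, flag); break;
      case '--resume-from': {
        const n = Number(requireValue(argv, ++i, flag));
        if (!Number.isInteger(n) || n < 0) throw new Error('--resume-from expects a non-negative integer');
        opts.resumeFrom = n;
        break;
      }
      default:
        throw new Error(`Unknown argument: ${flag}`);
    }
  }
  return opts;
}
```

In the script this would be called as parseCliArgs(process.argv.slice(2)). Rejecting unknown flags keeps a mistyped --dryrun from silently turning into a real import.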
Mass Schedule Language
Set language: 'Spanish' on all created mass schedules.
Step 1: Create the file
Use scripts/import-mass-schedules-ph.ts as the structural template. Implement all functions described above.
Step 2: Verify build
npx tsc --noEmit
Step 3: Dry-run test
npx tsx scripts/import-horariosmisas.ts --province navarra --dry-run
Step 4: Commit
git add scripts/import-horariosmisas.ts
git commit -m "feat: add horariosmisas.com Spain church importer"
Task 4: Add to Scheduler Pipeline and npm Scripts
Files:
- Modify: scripts/scheduler.ts
- Modify: package.json
Step 1: Add to PIPELINE_GROUPS
In scripts/scheduler.ts, in the imports group (line ~40-51), add after the philmass-import entry:
{ name: 'horariosmisas-import', type: 'horariosmisas-import', config: {} },
Step 2: Add getJobCommand case
In the getJobCommand function (around line 182), before the default: case, add:
case 'horariosmisas-import': {
  const args = ['tsx', 'scripts/import-horariosmisas.ts', '--all', '--geocode'];
  if (config?.province) args.push('--province', String(config.province));
  if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
  return { command: 'npx', args };
}
Step 3: Add npm scripts
In package.json, add after the "import:philmass" line:
"import:horariosmisas": "tsx scripts/import-horariosmisas.ts",
Step 4: Verify build
npx tsc --noEmit
Step 5: Commit
git add scripts/scheduler.ts package.json
git commit -m "feat: add horariosmisas import to scheduler pipeline"
Verification
- Dry run on single province: npx tsx scripts/import-horariosmisas.ts --province navarra --dry-run
  - Verify: church names parsed correctly, schedules extracted, matches found
- Dry run on Madrid: npx tsx scripts/import-horariosmisas.ts --province madrid --dry-run
  - Verify: larger province, summer/winter schedule selection, address parsing
- Single province real import: npx tsx scripts/import-horariosmisas.ts --province navarra
  - Verify: churches created/updated, mass schedules in database
- Geocode test: npx tsx scripts/import-horariosmisas.ts --geocode-only --dry-run
  - Verify: finds churches needing geocoding, Nominatim returns coordinates
- Full import: npx tsx scripts/import-horariosmisas.ts --all --geocode
Runtime Estimate
- Sitemap fetch: 20 sitemaps x 1.5s = ~30s
- Import: ~10,000 churches x 1.5s = ~4.2 hours
- Geocode: depends on unmatched count x 1.1s
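The arithmetic behind these estimates is sketched below, using the delay constants from the plan. The unmatched count of 3,000 is purely illustrative, not a measured figure, and the results count only the enforced delays, so they are lower bounds that exclude request latency and database time.

```typescript
/** Total seconds spent waiting when `count` requests are each followed by `delayMs`. */
function estimateSeconds(count: number, delayMs: number): number {
  return (count * delayMs) / 1000;
}

const sitemapSeconds = estimateSeconds(20, 1500);        // 30 seconds
const importHours = estimateSeconds(10000, 1500) / 3600; // about 4.17 hours

// Hypothetical example: if 3,000 churches remained unmatched after import
const geocodeMinutes = estimateSeconds(3000, 1100) / 60; // 55 minutes
```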