# DiscoverMass.com Importer Implementation Plan

> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Import 20,284 US Catholic churches with mass/confession/adoration schedules from discovermass.com into the NearestMass database.

**Architecture:** Enumerate 11 WordPress sitemaps → fetch each church page at 10s intervals (respecting Crawl-delay) → parse server-rendered HTML for name/address/coordinates/schedules → match against existing US churches via church-matcher → upsert with full schedule data.

**Tech Stack:** TypeScript/tsx, Prisma 7 + PrismaPg adapter, pg Pool, Node.js `fetch`, regex HTML parsing (no DOM library needed; the HTML is server-rendered and predictable).

---

## Chunk 1: Schema + church-matcher

### Task 1: Add discovermassId to schema

**Files:**
- Modify: `prisma/schema.prisma`

The schema lives in this repo but migrations run in BethelGuide. After editing schema.prisma here, run `npx prisma generate` to regenerate the Prisma client. Do NOT run `prisma migrate`.

- [ ] **Step 1: Find the right place in schema.prisma**

Open `prisma/schema.prisma`. Find the block of source ID fields, which look like:

```prisma
  gottesdienstzeitenId String? @unique @map("gottesdienstzeiten_id")
```

This is inside the `model Church { ... }` block, after `kerknetId` and before `claimed`.

- [ ] **Step 2: Add discovermassId field**

After `gottesdienstzeitenId`:

```prisma
  discovermassId String? @unique @map("discovermass_id")
```

Also find the `@@index` block near the bottom of the Church model (it groups all the index definitions). Add:

```prisma
  @@index([discovermassId])
```

- [ ] **Step 3: Regenerate Prisma client**

```bash
cd /home/albert/Documents/ScraperControl
npx prisma generate
```

Expected output: `✔ Generated Prisma Client` (no errors).
This does NOT touch the database; it only updates the TypeScript client.

- [ ] **Step 4: Apply migration to database**

The schema source of truth is BethelGuide, so the migration normally runs there and is synced back. Because both repos use the same dev server, first check the current state of the table:

```bash
# Check if discovermass_id column already exists (it shouldn't yet)
psql postgresql://postgres:postgres@192.168.0.145:5434/nearestmass -c "\d churches" | grep discovermass
```

If the column doesn't exist, apply it directly:

```bash
psql postgresql://postgres:postgres@192.168.0.145:5434/nearestmass -c "
ALTER TABLE churches ADD COLUMN IF NOT EXISTS discovermass_id VARCHAR UNIQUE;
CREATE INDEX IF NOT EXISTS churches_discovermass_id_idx ON churches(discovermass_id);
"
```

Expected output: `ALTER TABLE` and `CREATE INDEX`. (psql releases before version 15 print only the result of the last statement in a `-c` string, so `CREATE INDEX` alone is also fine.)

- [ ] **Step 5: Verify column exists**

```bash
psql postgresql://postgres:postgres@192.168.0.145:5434/nearestmass -c "\d churches" | grep discovermass
```

Expected output includes a line like: `discovermass_id | character varying | ...`

- [ ] **Step 6: Commit**

```bash
cd /home/albert/Documents/ScraperControl
git add prisma/schema.prisma
git commit -m "feat: add discovermassId field to Church schema"
```

---

### Task 2: Update church-matcher

**Files:**
- Modify: `src/lib/church-matcher.ts`

The `ExistingChurch` interface (line ~11) lists all source IDs. The `ChurchCandidate` type (line ~122) lists optional source IDs for the candidate. The `findDuplicateChurch` function runs sequential passes that check each source ID before falling back to proximity+name.

- [ ] **Step 1: Add discovermassId to ExistingChurch interface**

Find the `export interface ExistingChurch {` block. After the `gottesdienstzeitenId` line, add:

```typescript
  discovermassId: string | null;
```

- [ ] **Step 2: Add discovermassId to ChurchCandidate type**

Find `export type ChurchCandidate = {`.
After `gottesdienstzeitenId?: string;`, add:

```typescript
  discovermassId?: string;
```

- [ ] **Step 3: Add discovermassId matching pass in findDuplicateChurch**

Find the `findDuplicateChurch` function. It has a series of passes like:

```typescript
if (candidate.gottesdienstzeitenId) {
  const match = existingChurches.find(c => c.gottesdienstzeitenId === candidate.gottesdienstzeitenId);
  if (match) return match;
}

// Proximity + name similarity
```

Add a new pass BEFORE the proximity+name pass (after gottesdienstzeitenId):

```typescript
if (candidate.discovermassId) {
  const match = existingChurches.find(c => c.discovermassId === candidate.discovermassId);
  if (match) return match;
}
```

- [ ] **Step 4: Update all callers that construct ExistingChurch objects**

Search for places that build ExistingChurch objects (the in-memory push after creating a new church). Each importer has a block like:

```typescript
existingChurches.push({
  id: newChurch.id,
  ...
  gottesdienstzeitenId: null,
  ...
});
```

Run:

```bash
grep -rn "gottesdienstzeitenId: null" scripts/
```

For each file found: add `discovermassId: null,` after `gottesdienstzeitenId: null,`. These are the in-memory dedup arrays; they need the new field or TypeScript will complain.

Also update the `loadExistingChurches` select queries if any importer has one (check with `grep -rn "gottesdienstzeitenId: true" scripts/`).

- [ ] **Step 5: Verify TypeScript compiles**

```bash
cd /home/albert/Documents/ScraperControl
npx tsc --noEmit
```

Expected: no errors. Fix any type errors (they'll be missing `discovermassId` fields).
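
Before committing, the ID pass from Step 3 can be sanity-checked in isolation. The following is a minimal sketch, not the real `findDuplicateChurch`: the `ExistingChurchLite` and `ChurchCandidateLite` types, the `findByKnownIds` helper, and the sample ID strings are all hypothetical stand-ins, and the proximity+name fallback is left out.

```typescript
// Hypothetical, reduced versions of the church-matcher types:
// only the two source-ID fields this sketch needs.
interface ExistingChurchLite {
  id: string;
  gottesdienstzeitenId: string | null;
  discovermassId: string | null;
}

type ChurchCandidateLite = {
  gottesdienstzeitenId?: string;
  discovermassId?: string;
};

// Sequential ID passes in the order described in Step 3.
// The proximity + name fallback is omitted, so "no ID match" returns null.
function findByKnownIds(
  candidate: ChurchCandidateLite,
  existingChurches: ExistingChurchLite[],
): ExistingChurchLite | null {
  if (candidate.gottesdienstzeitenId) {
    const match = existingChurches.find(
      c => c.gottesdienstzeitenId === candidate.gottesdienstzeitenId,
    );
    if (match) return match;
  }

  if (candidate.discovermassId) {
    const match = existingChurches.find(
      c => c.discovermassId === candidate.discovermassId,
    );
    if (match) return match;
  }

  return null;
}

// Sample data: the ID strings are made up for illustration.
const existing: ExistingChurchLite[] = [
  { id: 'a', gottesdienstzeitenId: null, discovermassId: 'sample-id-1' },
  { id: 'b', gottesdienstzeitenId: null, discovermassId: null },
];

const hit = findByKnownIds({ discovermassId: 'sample-id-1' }, existing);
const miss = findByKnownIds({ discovermassId: 'sample-id-2' }, existing);

console.log(hit?.id, miss); // prints: a null
```

Entries pushed with `discovermassId: null` (as in Step 4) can never match in this pass, because the pass only runs when the candidate carries a non-empty ID and a string never strictly equals `null`.
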
- [ ] **Step 6: Commit**

```bash
# Stage church-matcher AND all importer scripts that were updated in Step 4
git add src/lib/church-matcher.ts
git add scripts/
git commit -m "feat: add discovermassId to church-matcher ExistingChurch and ChurchCandidate"
```

---

## Chunk 2: import-discovermass.ts - utilities and parsing

### Task 3: Create file skeleton + utilities

**Files:**
- Create: `scripts/import-discovermass.ts`

- [ ] **Step 1: Create the file with header, imports, constants, types**

Create `scripts/import-discovermass.ts` with this content:

```typescript
#!/usr/bin/env tsx
/**
 * Import Catholic churches and mass schedules from discovermass.com (USA)
 *
 * discovermass.com is a US Catholic church directory with 20,284 churches.
 * Data includes name, address, phone, website, coordinates, mass times,
 * confessions, and adoration schedules.
 *
 * robots.txt specifies Crawl-delay: 10; this importer follows that rule.
 *
 * Usage:
 *   npx tsx scripts/import-discovermass.ts --all
 *   npx tsx scripts/import-discovermass.ts --all --dry-run
 *   npx tsx scripts/import-discovermass.ts --all --resume-from 5000
 *   npx tsx scripts/import-discovermass.ts --all --job-id {uuid}
 */

import dotenv from 'dotenv';
import path from 'path';

// .env.local is loaded first; dotenv does not override variables that are
// already set, so its values take precedence over .env.
dotenv.config({ path: path.resolve(process.cwd(), '.env.local') });
dotenv.config({ path: path.resolve(process.cwd(), '.env') });

import { Pool } from 'pg';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';

const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:postgres@localhost:5432/nearestmass';
// Mask the password before logging the connection string
console.log(`Connecting to database: ${dbUrl.replace(/:[^:@]+@/, ':***@')}`);

const pool = new Pool({
  connectionString: dbUrl,
  ssl: dbUrl.includes('neon') ? { rejectUnauthorized: false } : undefined,
});
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });

import { findDuplicateChurch } from '../src/lib/church-matcher';
import type { ExistingChurch } from '../src/lib/church-matcher';

// ─── Constants ───────────────────────────────────────────────────────────────

const SITE_BASE = 'https://discovermass.com';
const SITEMAP_COUNT = 11;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
const REQUEST_DELAY_MS = 10_000; // Crawl-delay: 10 from robots.txt

// ─── Types ───────────────────────────────────────────────────────────────────

interface ParsedChurch {
  name: string;
  address: string | null;
  city: string | null;
  state: string | null;
  zip: string | null;
  phone: string | null;
  website: string | null;
  lat: number;
  lng: number;
}

interface ParsedMass {
  dayOfWeek: number; // 0=Sun, 1=Mon, ..., 6=Sat
  time: string; // HH:MM 24-hour
  language: string;
  notes?: string;
}

interface ParsedConf {
  dayOfWeek: number;
  startTime: string; // HH:MM 24-hour
  endTime: string; // HH:MM 24-hour
  notes?: string;
}

interface ParsedAdoration {
  dayOfWeek: number;
  startTime: string; // HH:MM 24-hour
  endTime: string; // HH:MM 24-hour
  notes?: string;
}

interface ImportStats {
  total: number;
  created: number;
  updated: number;
  skipped: number;
  errors: number;
  massSchedulesCreated: number;
  confessionSchedulesCreated: number;
  adorationSchedulesCreated: number;
}

interface CLIArgs {
  all: boolean;
  dryRun: boolean;
  resumeFrom?: number;
  jobId?: string;
}
```

- [ ] **Step 2: Add day mappings and time utilities**

Append to the file:

```typescript
// ─── Day Mappings ─────────────────────────────────────────────────────────────
// Full day names used in mass schedule