# Brazil + Spain Importers Implementation Plan
> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add two new church importers — horariodemissa.com.br (8,895 Brazilian churches + 28,523 mass times) and misas.org (17,919 Spanish churches with coordinates).
**Architecture:** Chunk 1 (shared prerequisites) must complete first. Tasks 3–5 (Brazil) and Tasks 6–7 (Spain) are independent and can run in parallel as subagents. All scripts follow the established importer pattern: fetch → regex parse → church-matcher dedup → prisma upsert.
**Tech Stack:** TypeScript, tsx, native `fetch`, regex HTML parsing (matchAll), Prisma + pg, church-matcher
**Spec:** `docs/superpowers/specs/2026-03-10-brazil-spain-importers-design.md`
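The fetch → regex parse → church-matcher dedup → prisma upsert pattern can be pictured as one loop per page, with each stage injected. This is a minimal sketch under stated assumptions, not code from the existing importers. `importPage`, `PipelineDeps`, `RawRecord` and `StoredChurch` are hypothetical names, and the real scripts call `fetch`, `findDuplicateChurch` and Prisma directly.

```typescript
// Hypothetical, trimmed record shapes; the real importers use richer Prisma models.
interface RawRecord {
  sourceId: string; // site-specific ID, e.g. a horariodemissa church key
  name: string;
}
interface StoredChurch {
  id: string;
  sourceId: string | null;
  name: string;
}
// Each stage of the pattern is injected so the control flow can run on stubs.
interface PipelineDeps {
  fetchPage: (url: string) => Promise<string | null>; // fetch (null on failure)
  parse: (html: string) => RawRecord[]; // regex parse
  findDuplicate: (rec: RawRecord, existing: StoredChurch[]) => StoredChurch | null; // dedup
  upsert: (rec: RawRecord, match: StoredChurch | null) => Promise<StoredChurch>; // upsert
}
interface PageResult {
  created: number;
  updated: number;
}
// Runs one page through the pipeline. Newly created churches are appended to
// `existing` so later records in the same run can dedupe against them.
async function importPage(
  url: string,
  existing: StoredChurch[],
  deps: PipelineDeps,
): Promise<PageResult> {
  const result: PageResult = { created: 0, updated: 0 };
  const html = await deps.fetchPage(url);
  if (!html) return result; // fetch failed: skip this page, keep going
  for (const rec of deps.parse(html)) {
    const match = deps.findDuplicate(rec, existing);
    const saved = await deps.upsert(rec, match);
    if (match) {
      result.updated++;
    } else {
      result.created++;
      existing.push(saved);
    }
  }
  return result;
}
```

Injecting the stages keeps the per-page control flow testable with in-memory stubs before any importer is pointed at a database.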
---
## Chunk 1: Shared Prerequisites (schema + church-matcher)
### Task 1: Schema additions
**Files:**
- Modify: `prisma/schema.prisma`
- [ ] **Step 1: Add two new ID fields to the Church model**
In `prisma/schema.prisma`, find the block of importer ID fields (near `gottesdienstzeitenId`) and add after it:
```prisma
horarioDemissaId String? @unique @map("horario_demissa_id")
misasOrgId String? @unique @map("misas_org_id")
```
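Then add two indexes in the `@@index` block at the bottom of the Church model. (`@unique` already creates a unique index in PostgreSQL, so these `@@index` entries are redundant; keep them only if the existing importer ID fields follow the same convention.)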
```prisma
@@index([horarioDemissaId])
@@index([misasOrgId])
```
- [ ] **Step 2: Regenerate Prisma client**
```bash
npx prisma generate
```
Expected: `✔ Generated Prisma Client` with no errors.
- [ ] **Step 3: Verify the fields exist in generated types**
```bash
grep -n "horarioDemissaId\|misasOrgId" node_modules/.prisma/client/index.d.ts | head -10
```
Expected: both fields appear in the type definitions.
- [ ] **Step 4: Commit**
```bash
git add prisma/schema.prisma
git commit -m "feat: add horarioDemissaId and misasOrgId fields to Church schema"
```
---
### Task 2: church-matcher updates
**Files:**
- Modify: `src/lib/church-matcher.ts`
- [ ] **Step 1: Add new fields to ExistingChurch interface**
In `src/lib/church-matcher.ts`, find `ExistingChurch` interface and add after `gottesdienstzeitenId`:
```typescript
horarioDemissaId: string | null;
misasOrgId: string | null;
```
- [ ] **Step 2: Add new fields to ChurchCandidate type**
Find `ChurchCandidate` type and add after `gottesdienstzeitenId?`:
```typescript
horarioDemissaId?: string;
misasOrgId?: string;
```
- [ ] **Step 3: Add two new exact-match passes in findDuplicateChurch**
After the thirteenth pass (`gottesdienstzeitenId`) and before the proximity pass, add:
```typescript
// Fourteenth pass: exact horarioDemissaId match
if (candidate.horarioDemissaId) {
const match = existingChurches.find(
(church) => church.horarioDemissaId === candidate.horarioDemissaId
);
if (match) return match;
}
// Fifteenth pass: exact misasOrgId match
if (candidate.misasOrgId) {
const match = existingChurches.find(
(church) => church.misasOrgId === candidate.misasOrgId
);
if (match) return match;
}
```
- [ ] **Step 4: Verify TypeScript compiles**
```bash
npx tsc --noEmit
```
Expected: no errors.
- [ ] **Step 5: Commit**
```bash
git add src/lib/church-matcher.ts
git commit -m "feat: add horarioDemissaId and misasOrgId to church-matcher"
```
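The two new passes behave like the existing exact-ID passes: they are skipped when the candidate lacks the ID, otherwise they return the first church with an equal value. The sketch below restates only that ordering in table-driven form so it can be checked without a database. `ExistingChurchLite`, `CandidateLite` and `findByExactId` are hypothetical, trimmed stand-ins, not the real `findDuplicateChurch`.

```typescript
// Trimmed, hypothetical stand-ins for ExistingChurch / ChurchCandidate.
interface ExistingChurchLite {
  id: string;
  gottesdienstzeitenId: string | null;
  horarioDemissaId: string | null;
  misasOrgId: string | null;
}
interface CandidateLite {
  gottesdienstzeitenId?: string;
  horarioDemissaId?: string;
  misasOrgId?: string;
}
type IdField = "gottesdienstzeitenId" | "horarioDemissaId" | "misasOrgId";
// Pass order: thirteenth, fourteenth, fifteenth.
const EXACT_ID_PASSES: IdField[] = ["gottesdienstzeitenId", "horarioDemissaId", "misasOrgId"];

function findByExactId(
  candidate: CandidateLite,
  existing: ExistingChurchLite[],
): ExistingChurchLite | null {
  for (const field of EXACT_ID_PASSES) {
    const value = candidate[field];
    if (!value) continue; // pass is skipped when the candidate lacks this ID
    const match = existing.find((church) => church[field] === value);
    if (match) return match;
  }
  return null; // the real matcher would fall through to the proximity pass here
}
```

Because each pass only fires when the candidate carries that ID, a Brazil candidate (which sets only `horarioDemissaId`) can never match on `misasOrgId`, and vice versa.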
---
## Chunk 2: Brazil Importer (import-horariodemissa.ts)
> Depends on Chunk 1. Can run in parallel with Chunk 3.
### Task 3: Boilerplate + sitemap enumeration
**Files:**
- Create: `scripts/import-horariodemissa.ts`
- [ ] **Step 1: Create script with boilerplate + types + sitemap parsing**
Create `scripts/import-horariodemissa.ts`:
```typescript
#!/usr/bin/env tsx
/**
 * Import Catholic churches and mass schedules from horariodemissa.com.br (Brazil)
 *
 * horariodemissa.com.br has 8,895 churches across all 26 Brazilian states + DF,
 * with 28,523 mass times. All data is server-rendered: one HTTP request per city
 * page returns all churches + schedules for that city.
 *
 * City pages have a split structure:
 * - Address/phone: embedded in JS h.push() strings (sidebar/map data)
 * - Schedules: in server-rendered .result divs with table rows
 * Both sets are linked by the same church key (e.g. "dvey2").
 *
 * Import strategy:
 * 1. Fetch sitemap.xml → deduplicate to pt-only city URLs (~3,552 cities)
 * 2. For each city: fetch page → parse address/phone from JS + schedules from DOM
 * 3. Join by church key, match against existing BR churches, upsert
 * 4. Optional --geocode flag for Nominatim pass after import
 *
 * Usage:
 *   npx tsx scripts/import-horariodemissa.ts --all
 *   npx tsx scripts/import-horariodemissa.ts --all --dry-run
 *   npx tsx scripts/import-horariodemissa.ts --state SP
 *   npx tsx scripts/import-horariodemissa.ts --all --resume-from 500
 *   npx tsx scripts/import-horariodemissa.ts --all --geocode
 *   npx tsx scripts/import-horariodemissa.ts --geocode-only
 *   npx tsx scripts/import-horariodemissa.ts --all --job-id {uuid}
 */
import dotenv from 'dotenv';
import path from 'path';
dotenv.config({ path: path.resolve(process.cwd(), '.env.local') });
dotenv.config({ path: path.resolve(process.cwd(), '.env') });
import { Pool } from 'pg';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';
const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:postgres@localhost:5432/nearestmass';
console.log(`Connecting to database: ${dbUrl.replace(/:[^:@]+@/, ':***@')}`);
const pool = new Pool({
  connectionString: dbUrl,
  ssl: dbUrl.includes('neon') ? { rejectUnauthorized: false } : undefined,
});
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });
import { findDuplicateChurch } from '../src/lib/church-matcher';
import type { ExistingChurch } from '../src/lib/church-matcher';
// ─── Constants ───────────────────────────────────────────────────────────────
const SITE_BASE = 'https://horariodemissa.com.br';
const SITEMAP_URL = `${SITE_BASE}/sitemap.xml`;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
const REQUEST_DELAY_MS = 1500;
const NOMINATIM_DELAY_MS = 1100;
const NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search';
// ─── Types ───────────────────────────────────────────────────────────────────
interface CityUrl {
  state: string; // e.g. "SP"
  city: string;  // e.g. "São Paulo"
  url: string;   // full fetch URL
}
interface ParsedSchedule {
  dayOfWeek: number; // 0=Sun, 1=Mon, ..., 6=Sat
  time: string;      // "HH:MM"
  notes: string | null;
}
interface ParsedConfession {
  dayOfWeek: number;
  startTime: string;
  endTime: string;
  notes: string | null;
}
interface ParsedChurch {
  key: string; // e.g. "dvey2" (used as horarioDemissaId)
  name: string;
  address: string | null;
  phone: string | null;
  city: string;
  state: string;
  massSchedules: ParsedSchedule[];
  confessionSchedules: ParsedConfession[];
}
interface CLIArgs {
  all: boolean;
  state?: string;
  dryRun: boolean;
  geocode: boolean;
  geocodeOnly: boolean;
  resumeFrom?: number;
  jobId?: string;
}
interface ImportStats {
  citiesProcessed: number;
  churchesFound: number;
  churchesCreated: number;
  churchesUpdated: number;
  massSchedulesCreated: number;
  geocoded: number;
  geocodeFailed: number;
  errors: number;
}
// ─── Brazilian Day Name Mapping ───────────────────────────────────────────────
const DAY_MAP: Record<string, number> = {
  'domingo': 0,
  'segunda-feira': 1, 'segunda': 1,
  'terça-feira': 2, 'terca-feira': 2, 'terça': 2,
  'quarta-feira': 3, 'quarta': 3,
  'quinta-feira': 4, 'quinta': 4,
  'sexta-feira': 5, 'sexta': 5,
  'sábado': 6, 'sabado': 6,
};
const SPECIAL_DAY_MAP: Record<string, { dayOfWeek: number; notes: string }> = {
  'primeiro domingo': { dayOfWeek: 0, notes: 'Primeiro Domingo' },
  'segundo domingo': { dayOfWeek: 0, notes: 'Segundo Domingo' },
  'terceiro domingo': { dayOfWeek: 0, notes: 'Terceiro Domingo' },
  'quarto domingo': { dayOfWeek: 0, notes: 'Quarto Domingo' },
  'primeiro sábado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
  'primeiro sabado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
  'segundo sábado': { dayOfWeek: 6, notes: 'Segundo Sábado' },
  'segundo sabado': { dayOfWeek: 6, notes: 'Segundo Sábado' },
};
// ─── HTTP Client ──────────────────────────────────────────────────────────────
let requestCount = 0;
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
async function fetchPage(url: string, delayMs: number = REQUEST_DELAY_MS): Promise<string | null> {
  // Throttle: wait before every request except the first
  if (requestCount > 0) await delay(delayMs);
  requestCount++;
  try {
    const response = await fetch(url, {
      headers: {
        'User-Agent': USER_AGENT,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'pt-BR,pt;q=0.9',
      },
    });
    if (!response.ok) {
      console.error(`  HTTP ${response.status} for ${url}`);
      return null;
    }
    return await response.text();
  } catch (error) {
    console.error(`  Fetch error for ${url}: ${error instanceof Error ? error.message : error}`);
    return null;
  }
}
// ─── Sitemap Parser ───────────────────────────────────────────────────────────
export function parseCityUrlsFromSitemap(sitemapXml: string, filterState?: string): CityUrl[] {
  const seen = new Set<string>();
  const cities: CityUrl[] = [];
  for (const match of sitemapXml.matchAll(/<loc>([^<]+)<\/loc>/g)) {
    // Sitemap XML escapes & as &amp;; decode it before reading query params
    const rawUrl = match[1].replace(/&amp;/g, '&');
    // Only pt-language city search pages
    if (!rawUrl.includes('opcoes=cidade_opcoes') || rawUrl.includes('hl=en')) continue;
    const ufMatch = rawUrl.match(/[?&]uf=([A-Z]+)/);
    const cidadeMatch = rawUrl.match(/[?&]cidade=([^&]+)/);
    if (!ufMatch || !cidadeMatch) continue;
    const state = ufMatch[1];
    const city = decodeURIComponent(cidadeMatch[1].replace(/\+/g, ' '));
    if (filterState && state !== filterState.toUpperCase()) continue;
    // Deduplicate on state + city
    const key = `${state}:${city}`;
    if (seen.has(key)) continue;
    seen.add(key);
    cities.push({ state, city, url: rawUrl });
  }
  cities.sort((a, b) => a.state.localeCompare(b.state) || a.city.localeCompare(b.city));
  return cities;
}
async function fetchCityUrls(filterState?: string): Promise<CityUrl[]> {
  console.log(`Fetching sitemap: ${SITEMAP_URL}`);
  const xml = await fetchPage(SITEMAP_URL);
  if (!xml) throw new Error('Failed to fetch sitemap');
  const cities = parseCityUrlsFromSitemap(xml, filterState);
  console.log(`Found ${cities.length} unique cities${filterState ? ` in ${filterState}` : ''}`);
  return cities;
}
```
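`DAY_MAP` and `SPECIAL_DAY_MAP` are plain lookup tables; the parser that consumes them belongs to Task 4. One plausible way to use them is sketched below, assuming day headings are normalized by trimming, lowercasing and collapsing whitespace before an exact lookup. `resolveDayLabel` and `ResolvedDay` are hypothetical names, and the special-day map is an abridged copy.

```typescript
// Copy of DAY_MAP from Step 1.
const DAY_MAP: Record<string, number> = {
  'domingo': 0,
  'segunda-feira': 1, 'segunda': 1,
  'terça-feira': 2, 'terca-feira': 2, 'terça': 2,
  'quarta-feira': 3, 'quarta': 3,
  'quinta-feira': 4, 'quinta': 4,
  'sexta-feira': 5, 'sexta': 5,
  'sábado': 6, 'sabado': 6,
};
// Abridged copy of SPECIAL_DAY_MAP from Step 1.
const SPECIAL_DAY_MAP: Record<string, { dayOfWeek: number; notes: string }> = {
  'primeiro domingo': { dayOfWeek: 0, notes: 'Primeiro Domingo' },
  'primeiro sábado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
  'primeiro sabado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
};
interface ResolvedDay {
  dayOfWeek: number;
  notes: string | null;
}
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: turn a raw day heading into a day index plus optional note.
function resolveDayLabel(raw: string): ResolvedDay | null {
  const label = raw.trim().toLowerCase().replace(/\s+/g, ' ');
  // Special (ordinal) days first, so their note is kept
  if (hasOwn(SPECIAL_DAY_MAP, label)) {
    const special = SPECIAL_DAY_MAP[label];
    return { dayOfWeek: special.dayOfWeek, notes: special.notes };
  }
  if (hasOwn(DAY_MAP, label)) return { dayOfWeek: DAY_MAP[label], notes: null };
  return null; // unknown label: caller should log and skip
}
```

The own-property checks keep inherited keys such as `constructor` from resolving to a bogus value, and consulting the special-day table first means a later switch to prefix matching would not let the plain `domingo` entry swallow `primeiro domingo`.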
- [ ] **Step 2: Verify sitemap parsing works**
```bash
npx tsx -e "
import dotenv from 'dotenv';
dotenv.config({ path: '.env' });
const { parseCityUrlsFromSitemap } = await import('./scripts/import-horariodemissa.ts');
const xml = await fetch('https://horariodemissa.com.br/sitemap.xml').then(r => r.text());
const cities = parseCityUrlsFromSitemap(xml);
console.log('Total cities:', cities.length);
console.log('Sample:', JSON.stringify(cities.slice(0, 3), null, 2));
const states = [...new Set(cities.map(c => c.state))].sort();
console.log('States:', states.join(', '));
"
```
Expected: ~3,500 cities, states include SP, RJ, MG, RS, BA, DF, etc.
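For an offline check that does not hit the live sitemap, the corrected parser can be exercised against a small fixture. This sketch inlines a copy of `parseCityUrlsFromSitemap` so it has no project dependencies. The fixture is hypothetical: its URL shapes are inferred from the parser's filters (`opcoes=cidade_opcoes`, `uf=`, `cidade=`, `hl=en`), not copied from horariodemissa.com.br.

```typescript
interface CityUrl {
  state: string;
  city: string;
  url: string;
}
// Inlined copy of the corrected parser from Step 1.
function parseCityUrlsFromSitemap(sitemapXml: string, filterState?: string): CityUrl[] {
  const seen = new Set<string>();
  const cities: CityUrl[] = [];
  for (const match of sitemapXml.matchAll(/<loc>([^<]+)<\/loc>/g)) {
    const rawUrl = match[1].replace(/&amp;/g, '&');
    if (!rawUrl.includes('opcoes=cidade_opcoes') || rawUrl.includes('hl=en')) continue;
    const ufMatch = rawUrl.match(/[?&]uf=([A-Z]+)/);
    const cidadeMatch = rawUrl.match(/[?&]cidade=([^&]+)/);
    if (!ufMatch || !cidadeMatch) continue;
    const state = ufMatch[1];
    const city = decodeURIComponent(cidadeMatch[1].replace(/\+/g, ' '));
    if (filterState && state !== filterState.toUpperCase()) continue;
    const key = `${state}:${city}`;
    if (seen.has(key)) continue;
    seen.add(key);
    cities.push({ state, city, url: rawUrl });
  }
  cities.sort((a, b) => a.state.localeCompare(b.state) || a.city.localeCompare(b.city));
  return cities;
}
// Hypothetical fixture: pt page, its en twin, a duplicate, another city, a non-city page.
const FIXTURE = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://horariodemissa.com.br/?opcoes=cidade_opcoes&amp;uf=SP&amp;cidade=S%C3%A3o+Paulo</loc></url>
  <url><loc>https://horariodemissa.com.br/?opcoes=cidade_opcoes&amp;uf=SP&amp;cidade=S%C3%A3o+Paulo&amp;hl=en</loc></url>
  <url><loc>https://horariodemissa.com.br/?opcoes=cidade_opcoes&amp;uf=SP&amp;cidade=S%C3%A3o+Paulo</loc></url>
  <url><loc>https://horariodemissa.com.br/?opcoes=cidade_opcoes&amp;uf=RJ&amp;cidade=Rio+de+Janeiro</loc></url>
  <url><loc>https://horariodemissa.com.br/sobre</loc></url>
</urlset>`;
const cities = parseCityUrlsFromSitemap(FIXTURE);
```

The fixture covers the three filters the parser relies on: the `hl=en` twin and the duplicate entry are dropped, and the non-city page never matches `opcoes=cidade_opcoes`.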
- [ ] **Step 3: Commit**
```bash
git add scripts/import-horariodemissa.ts
git commit -m "feat: horariodemissa importer scaffold + sitemap enumeration"
```
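Step 3 of the header comment joins the two per-page sources by church key. The actual parsing lands in Task 4; the sketch below only illustrates the join, assuming each source has already been reduced to a `Map` keyed by church key. `SidebarInfo`, `ScheduleBlock`, `JoinedChurch` and `joinByKey` are hypothetical names, and schedules are kept as raw strings for brevity.

```typescript
// Source A (JS h.push() strings): address/phone per church key. Hypothetical shape.
interface SidebarInfo {
  name: string;
  address: string | null;
  phone: string | null;
}
// Source B (.result divs): schedules per church key. Hypothetical shape.
interface ScheduleBlock {
  name: string;
  massTimes: string[];
}
interface JoinedChurch {
  key: string;
  name: string;
  address: string | null;
  phone: string | null;
  massTimes: string[];
}
// Union of keys: a church present in only one source is still emitted.
function joinByKey(
  sidebar: Map<string, SidebarInfo>,
  schedules: Map<string, ScheduleBlock>,
): JoinedChurch[] {
  const keys = new Set<string>(Array.from(sidebar.keys()).concat(Array.from(schedules.keys())));
  const out: JoinedChurch[] = [];
  keys.forEach((key) => {
    const a = sidebar.get(key);
    const b = schedules.get(key);
    out.push({
      key,
      name: b?.name ?? a?.name ?? key, // assumption: prefer the schedule block's name
      address: a?.address ?? null,
      phone: a?.phone ?? null,
      massTimes: b?.massTimes ?? [],
    });
  });
  return out;
}
```

Taking the union rather than the intersection is a design choice: a church listed in the sidebar without schedules still exists and can be matched and upserted, while one with schedules but no sidebar entry simply lacks an address until geocoding.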
---
### Task 4: HTML parsing
**Files:**
- Modify: `scripts/import-horariodemissa.ts`
- [ ] **Step 1: Understand the dual-source page structure**
Each city page contains two data sources per church, joined by the same key (e.g. `dvey2`):
**Source A** — JS `h.push()` strings embedded in `