Two new importers: - horariodemissa.com.br: 8,895 Brazilian churches + 28,523 mass times - misas.org: 17,919 Spanish churches with coordinates Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brazil + Spain Importers Implementation Plan
For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents are available) or superpowers:executing-plans to implement this plan. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Add two new church importers — horariodemissa.com.br (8,895 Brazilian churches + 28,523 mass times) and misas.org (17,919 Spanish churches with coordinates).
Architecture: Chunk 1 (shared prerequisites) must complete first. Tasks 3–5 (Brazil) and Tasks 6–7 (Spain) are independent and can run in parallel as subagents. All scripts follow the established importer pattern: fetch → regex parse → church-matcher dedup → prisma upsert.
Tech Stack: TypeScript, tsx, native fetch, regex HTML parsing (matchAll), Prisma + pg, church-matcher
Spec: docs/superpowers/specs/2026-03-10-brazil-spain-importers-design.md
Chunk 1: Shared Prerequisites (schema + church-matcher)
Task 1: Schema additions
Files:
- Modify: prisma/schema.prisma
- Step 1: Add two new ID fields to the Church model
In prisma/schema.prisma, find the block of importer ID fields (near gottesdienstzeitenId) and add after it:
horarioDemissaId String? @unique @map("horario_demissa_id")
misasOrgId String? @unique @map("misas_org_id")
Then add two indexes in the @@index block at the bottom of the Church model:
@@index([horarioDemissaId])
@@index([misasOrgId])
- Step 2: Regenerate Prisma client
npx prisma generate
Expected: ✔ Generated Prisma Client with no errors.
- Step 3: Verify the fields exist in generated types
grep -n "horarioDemissaId\|misasOrgId" node_modules/.prisma/client/index.d.ts | head -10
Expected: both fields appear in the type definitions.
- Step 4: Commit
git add prisma/schema.prisma
git commit -m "feat: add horarioDemissaId and misasOrgId fields to Church schema"
Task 2: church-matcher updates
Files:
- Modify: src/lib/church-matcher.ts
- Step 1: Add new fields to ExistingChurch interface
In src/lib/church-matcher.ts, find ExistingChurch interface and add after gottesdienstzeitenId:
horarioDemissaId: string | null;
misasOrgId: string | null;
- Step 2: Add new fields to ChurchCandidate type
Find ChurchCandidate type and add after gottesdienstzeitenId?:
horarioDemissaId?: string;
misasOrgId?: string;
- Step 3: Add two new exact-match passes in findDuplicateChurch
After the thirteenth pass (gottesdienstzeitenId) and before the proximity pass, add:
// Fourteenth pass: exact horarioDemissaId match
if (candidate.horarioDemissaId) {
const match = existingChurches.find(
(church) => church.horarioDemissaId === candidate.horarioDemissaId
);
if (match) return match;
}
// Fifteenth pass: exact misasOrgId match
if (candidate.misasOrgId) {
const match = existingChurches.find(
(church) => church.misasOrgId === candidate.misasOrgId
);
if (match) return match;
}
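The two new passes behave like the earlier exact-ID passes: a candidate that carries an importer ID matches only a church with the same ID, and a candidate without one falls through to the later passes. A minimal, self-contained sketch of that behaviour follows; `MiniChurch`, `MiniCandidate`, and `findByExactId` are hypothetical stand-ins for illustration and are not part of church-matcher.

```typescript
// Hypothetical, trimmed-down stand-ins for ExistingChurch / ChurchCandidate.
interface MiniChurch {
  id: string;
  horarioDemissaId: string | null;
  misasOrgId: string | null;
}

interface MiniCandidate {
  horarioDemissaId?: string;
  misasOrgId?: string;
}

type IdField = 'horarioDemissaId' | 'misasOrgId';

// Returns the first church whose ID field equals the candidate's, or null.
// A candidate without that ID never matches, so missing IDs cannot collide.
function findByExactId(
  candidate: MiniCandidate,
  existing: MiniChurch[],
  field: IdField
): MiniChurch | null {
  const wanted = candidate[field];
  if (!wanted) return null;
  return existing.find((church) => church[field] === wanted) ?? null;
}

const existingSample: MiniChurch[] = [
  { id: 'a', horarioDemissaId: 'dvey2', misasOrgId: null },
  { id: 'b', horarioDemissaId: null, misasOrgId: '1234' },
];

const brazilMatch = findByExactId({ horarioDemissaId: 'dvey2' }, existingSample, 'horarioDemissaId');
const spainMatch = findByExactId({ misasOrgId: '1234' }, existingSample, 'misasOrgId');
const noMatch = findByExactId({}, existingSample, 'misasOrgId');
```

The guard on a missing ID matters because `null === undefined` is false but two `null` values compare equal; skipping the pass keeps ID-less candidates from matching ID-less churches.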
- Step 4: Verify TypeScript compiles
npx tsc --noEmit
Expected: no errors.
- Step 5: Commit
git add src/lib/church-matcher.ts
git commit -m "feat: add horarioDemissaId and misasOrgId to church-matcher"
Chunk 2: Brazil Importer (import-horariodemissa.ts)
Depends on Chunk 1. Can run in parallel with Chunk 3.
Task 3: Boilerplate + sitemap enumeration
Files:
- Create: scripts/import-horariodemissa.ts
- Step 1: Create script with boilerplate + types + sitemap parsing
Create scripts/import-horariodemissa.ts:
#!/usr/bin/env tsx
/**
* Import Catholic churches and mass schedules from horariodemissa.com.br (Brazil)
*
* horariodemissa.com.br has 8,895 churches across all 26 Brazilian states + DF,
* with 28,523 mass times. All data is server-rendered — one HTTP request per city
* page returns all churches + schedules for that city.
*
* City pages have a split structure:
* - Address/phone: embedded in JS h.push() strings (sidebar/map data)
* - Schedules: in server-rendered .result divs with <table> rows
* Both sets are linked by the same church key (e.g. "dvey2").
*
* Import strategy:
* 1. Fetch sitemap.xml → deduplicate to pt-only city URLs (~3,552 cities)
* 2. For each city: fetch page → parse address/phone from JS + schedules from DOM
* 3. Join by church key, match against existing BR churches, upsert
* 4. Optional --geocode flag for Nominatim pass after import
*
* Usage:
* npx tsx scripts/import-horariodemissa.ts --all
* npx tsx scripts/import-horariodemissa.ts --all --dry-run
* npx tsx scripts/import-horariodemissa.ts --state SP
* npx tsx scripts/import-horariodemissa.ts --all --resume-from 500
* npx tsx scripts/import-horariodemissa.ts --all --geocode
* npx tsx scripts/import-horariodemissa.ts --geocode-only
* npx tsx scripts/import-horariodemissa.ts --all --job-id {uuid}
*/
import dotenv from 'dotenv';
import path from 'path';
dotenv.config({ path: path.resolve(process.cwd(), '.env.local') });
dotenv.config({ path: path.resolve(process.cwd(), '.env') });
import { Pool } from 'pg';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';
const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:postgres@localhost:5432/nearestmass';
console.log(`Connecting to database: ${dbUrl.replace(/:[^:@]+@/, ':***@')}`);
const pool = new Pool({
connectionString: dbUrl,
ssl: dbUrl.includes('neon') ? { rejectUnauthorized: false } : undefined,
});
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });
import { findDuplicateChurch } from '../src/lib/church-matcher';
import type { ExistingChurch } from '../src/lib/church-matcher';
// ─── Constants ───────────────────────────────────────────────────────────────
const SITE_BASE = 'https://horariodemissa.com.br';
const SITEMAP_URL = `${SITE_BASE}/sitemap.xml`;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
const REQUEST_DELAY_MS = 1500;
const NOMINATIM_DELAY_MS = 1100;
const NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search';
// ─── Types ───────────────────────────────────────────────────────────────────
interface CityUrl {
state: string; // e.g. "SP"
city: string; // e.g. "São Paulo"
url: string; // full fetch URL
}
interface ParsedSchedule {
dayOfWeek: number; // 0=Sun, 1=Mon, ..., 6=Sat
time: string; // "HH:MM"
notes: string | null;
}
interface ParsedConfession {
dayOfWeek: number;
startTime: string;
endTime: string;
notes: string | null;
}
interface ParsedChurch {
key: string; // e.g. "dvey2" (used as horarioDemissaId)
name: string;
address: string | null;
phone: string | null;
city: string;
state: string;
massSchedules: ParsedSchedule[];
confessionSchedules: ParsedConfession[];
}
interface CLIArgs {
all: boolean;
state?: string;
dryRun: boolean;
geocode: boolean;
geocodeOnly: boolean;
resumeFrom?: number;
jobId?: string;
}
interface ImportStats {
citiesProcessed: number;
churchesFound: number;
churchesCreated: number;
churchesUpdated: number;
massSchedulesCreated: number;
geocoded: number;
geocodeFailed: number;
errors: number;
}
// ─── Brazilian Day Name Mapping ───────────────────────────────────────────────
const DAY_MAP: Record<string, number> = {
'domingo': 0,
'segunda-feira': 1, 'segunda': 1,
'terça-feira': 2, 'terca-feira': 2, 'terça': 2,
'quarta-feira': 3, 'quarta': 3,
'quinta-feira': 4, 'quinta': 4,
'sexta-feira': 5, 'sexta': 5,
'sábado': 6, 'sabado': 6,
};
const SPECIAL_DAY_MAP: Record<string, { dayOfWeek: number; notes: string }> = {
'primeiro domingo': { dayOfWeek: 0, notes: 'Primeiro Domingo' },
'segundo domingo': { dayOfWeek: 0, notes: 'Segundo Domingo' },
'terceiro domingo': { dayOfWeek: 0, notes: 'Terceiro Domingo' },
'quarto domingo': { dayOfWeek: 0, notes: 'Quarto Domingo' },
'primeiro sábado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
'primeiro sabado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
'segundo sábado': { dayOfWeek: 6, notes: 'Segundo Sábado' },
'segundo sabado': { dayOfWeek: 6, notes: 'Segundo Sábado' },
};
// ─── HTTP Client ──────────────────────────────────────────────────────────────
let requestCount = 0;
function delay(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function fetchPage(url: string, delayMs: number = REQUEST_DELAY_MS): Promise<string | null> {
if (requestCount > 0) await delay(delayMs);
requestCount++;
try {
const response = await fetch(url, {
headers: {
'User-Agent': USER_AGENT,
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'pt-BR,pt;q=0.9',
},
});
if (!response.ok) {
console.error(` HTTP ${response.status} for ${url}`);
return null;
}
return await response.text();
} catch (error) {
console.error(` Fetch error for ${url}: ${error instanceof Error ? error.message : error}`);
return null;
}
}
// ─── Sitemap Parser ───────────────────────────────────────────────────────────
export function parseCityUrlsFromSitemap(sitemapXml: string, filterState?: string): CityUrl[] {
const seen = new Set<string>();
const cities: CityUrl[] = [];
for (const match of sitemapXml.matchAll(/<loc>([^<]+)<\/loc>/g)) {
const rawUrl = match[1].replace(/&amp;/g, '&'); // sitemap XML escapes "&" as "&amp;"
// Only pt-language city search pages
if (!rawUrl.includes('opcoes=cidade_opcoes') || rawUrl.includes('hl=en')) continue;
const ufMatch = rawUrl.match(/[?&]uf=([A-Z]+)/);
const cidadeMatch = rawUrl.match(/[?&]cidade=([^&]+)/);
if (!ufMatch || !cidadeMatch) continue;
const state = ufMatch[1];
const city = decodeURIComponent(cidadeMatch[1].replace(/\+/g, ' '));
if (filterState && state !== filterState.toUpperCase()) continue;
const key = `${state}:${city}`;
if (seen.has(key)) continue;
seen.add(key);
cities.push({ state, city, url: rawUrl });
}
cities.sort((a, b) => a.state.localeCompare(b.state) || a.city.localeCompare(b.city));
return cities;
}
async function fetchCityUrls(filterState?: string): Promise<CityUrl[]> {
console.log(`Fetching sitemap: ${SITEMAP_URL}`);
const xml = await fetchPage(SITEMAP_URL);
if (!xml) throw new Error('Failed to fetch sitemap');
const cities = parseCityUrlsFromSitemap(xml, filterState);
console.log(`Found ${cities.length} unique cities${filterState ? ` in ${filterState}` : ''}`);
return cities;
}
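The sitemap filter can also be checked offline before hitting the live site. The sketch below reimplements only the URL-filtering part on a hand-written fragment; the sample XML and the `extractCities` helper are illustrative assumptions, not copied from the live sitemap or from the importer.

```typescript
// Hand-written sitemap fragment (illustrative; the real sitemap may differ).
const sampleSitemap = `
<urlset>
  <url><loc>https://horariodemissa.com.br/search.php?uf=SP&amp;cidade=S%C3%A3o+Paulo&amp;bairro=&amp;opcoes=cidade_opcoes&amp;hl=pt</loc></url>
  <url><loc>https://horariodemissa.com.br/search.php?uf=SP&amp;cidade=S%C3%A3o+Paulo&amp;bairro=&amp;opcoes=cidade_opcoes&amp;hl=en</loc></url>
  <url><loc>https://horariodemissa.com.br/igreja.php?k=dvey2</loc></url>
</urlset>`;

interface SampleCity {
  state: string;
  city: string;
}

function extractCities(xml: string): SampleCity[] {
  const cities: SampleCity[] = [];
  for (const match of xml.matchAll(/<loc>([^<]+)<\/loc>/g)) {
    // XML escapes "&" as "&amp;", so decode before reading query parameters.
    const url = match[1].replace(/&amp;/g, '&');
    // Keep Portuguese city-search pages only; skip the English duplicates.
    if (!url.includes('opcoes=cidade_opcoes') || url.includes('hl=en')) continue;
    const uf = url.match(/[?&]uf=([A-Z]+)/);
    const cidade = url.match(/[?&]cidade=([^&]+)/);
    if (!uf || !cidade) continue;
    // "+" encodes a space in query strings; decodeURIComponent handles %XX only.
    cities.push({ state: uf[1], city: decodeURIComponent(cidade[1].replace(/\+/g, ' ')) });
  }
  return cities;
}

const sampleCities = extractCities(sampleSitemap);
```

With this fragment, only the `hl=pt` search URL survives the filter, yielding a single entry for São Paulo in SP.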
- Step 2: Verify sitemap parsing works
npx tsx -e "
import dotenv from 'dotenv';
dotenv.config({ path: '.env' });
const { parseCityUrlsFromSitemap } = await import('./scripts/import-horariodemissa.ts');
const xml = await fetch('https://horariodemissa.com.br/sitemap.xml').then(r => r.text());
const cities = parseCityUrlsFromSitemap(xml);
console.log('Total cities:', cities.length);
console.log('Sample:', JSON.stringify(cities.slice(0, 3), null, 2));
const states = [...new Set(cities.map(c => c.state))].sort();
console.log('States:', states.join(', '));
"
Expected: ~3,500 cities, states include SP, RJ, MG, RS, BA, DF, etc.
- Step 3: Commit
git add scripts/import-horariodemissa.ts
git commit -m "feat: horariodemissa importer scaffold + sitemap enumeration"
Task 4: HTML parsing
Files:
- Modify: scripts/import-horariodemissa.ts
- Step 1: Understand the dual-source page structure
Each city page contains two data sources per church, joined by the same key (e.g. dvey2):
Source A — JS h.push() strings embedded in <script> (sidebar/map):
h.push('<p><strong><a href="igreja.php?k=dvey2">NAME</a></strong><br/>Rua X, 123</p><p><strong>Telefone:</strong> (11) 1234-5678</p>');
Contains: key, name, address, phone.
Source B — Server-rendered .result divs:
<div class="result">
<a href="igreja.php?k=dvey2" class="result_title">NAME</a>
<p class="blockleft"><table>
<tr><td style="...">Domingo:</td><td>07:30, 10:30</td></tr>
</table></p>
</div>
Contains: key + schedule tables (first = masses, optional second = confessions).
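To make the join concrete, here is a minimal sketch that pulls the church key out of both sources and confirms they refer to the same church. The two fragments are modelled on the structures described above; `extractKey` and the variable names are hypothetical.

```typescript
// Sample fragments modelled on Source A (script string) and Source B (result div).
const sourceA =
  "h.push('<p><strong><a href=\"igreja.php?k=dvey2\">NAME</a></strong><br/>Rua X, 123</p>" +
  "<p><strong>Telefone:</strong> (11) 1234-5678</p>');";
const sourceB =
  '<div class="result"><a href="igreja.php?k=dvey2" class="result_title">NAME</a>' +
  '<p class="blockleft"><table><tr><td>Domingo:</td><td>07:30, 10:30</td></tr></table></p></div>';

// The same pattern extracts the church key from either source.
const KEY_PATTERN = /igreja\.php\?k=([a-zA-Z0-9]+)/;

function extractKey(fragment: string): string | null {
  const match = fragment.match(KEY_PATTERN);
  return match ? match[1] : null;
}

const keyFromScript = extractKey(sourceA);
const keyFromResult = extractKey(sourceB);

// The address is the text between <br/> and the closing </p> in Source A.
const addressMatch = sourceA.match(/<br\/>([^<]+)<\/p>/);
const sampleAddress = addressMatch ? addressMatch[1].trim() : null;

// Both sources describe the same church when their keys are equal.
const sameChurch = keyFromScript !== null && keyFromScript === keyFromResult;
```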
- Step 2: Add parseDayLabel, parseTimeCells, parseMassTable, parseConfessionTable
// ─── HTML Parsers ─────────────────────────────────────────────────────────────
export function parseDayLabel(label: string): { dayOfWeek: number; notes: string | null } | null {
const normalized = label.toLowerCase().replace(/:$/, '').trim();
if (SPECIAL_DAY_MAP[normalized]) {
const s = SPECIAL_DAY_MAP[normalized];
return { dayOfWeek: s.dayOfWeek, notes: s.notes };
}
if (DAY_MAP[normalized] !== undefined) {
return { dayOfWeek: DAY_MAP[normalized], notes: null };
}
return null;
}
export function parseTimeCells(timesText: string): Array<{ time: string; notes: string | null }> {
const results: Array<{ time: string; notes: string | null }> = [];
// Split by comma but not inside parentheses
const parts = timesText.split(/,(?![^(]*\))/);
for (const part of parts) {
const trimmed = part.trim();
if (!trimmed) continue;
const timeMatch = trimmed.match(/\b(\d{1,2}:\d{2})\b/);
if (!timeMatch) continue;
const [h, m] = timeMatch[1].split(':');
const time = `${h.padStart(2, '0')}:${m}`;
const notesMatch = trimmed.match(/\(([^)]+)\)/);
results.push({ time, notes: notesMatch ? notesMatch[1].trim() : null });
}
return results;
}
export function parseMassTable(tableHtml: string): ParsedSchedule[] {
const schedules: ParsedSchedule[] = [];
for (const rowMatch of tableHtml.matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi)) {
const tds = [...rowMatch[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
.map(m => m[1].replace(/<[^>]+>/g, '').trim());
if (tds.length < 2) continue;
const dayResult = parseDayLabel(tds[0]);
if (!dayResult) continue;
for (const { time, notes } of parseTimeCells(tds[1])) {
schedules.push({
dayOfWeek: dayResult.dayOfWeek,
time,
notes: [dayResult.notes, notes].filter(Boolean).join('; ') || null,
});
}
}
return schedules;
}
export function parseConfessionTable(tableHtml: string): ParsedConfession[] {
const confessions: ParsedConfession[] = [];
for (const rowMatch of tableHtml.matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi)) {
const tds = [...rowMatch[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
.map(m => m[1].replace(/<[^>]+>/g, '').trim());
if (tds.length < 2) continue;
const dayResult = parseDayLabel(tds[0]);
if (!dayResult) continue;
// "09:00 às 11:00" or "09:00 a 11:00"
const rangeMatch = tds[1].match(/(\d{1,2}:\d{2})\s+(?:às|a)\s+(\d{1,2}:\d{2})/i);
if (!rangeMatch) continue;
const pad = (t: string) => { const [hh, mm] = t.split(':'); return `${hh.padStart(2,'0')}:${mm}`; };
confessions.push({
dayOfWeek: dayResult.dayOfWeek,
startTime: pad(rangeMatch[1]),
endTime: pad(rangeMatch[2]),
notes: dayResult.notes,
});
}
return confessions;
}
/**
* Parse a full city page HTML into church records.
* Joins h.push() JS data (name/address/phone) with .result DOM (schedules) by church key.
*/
export function parseCityPage(html: string, city: string, state: string): ParsedChurch[] {
// Parse Source A: h.push() JS strings → name, address, phone
const jsData = new Map<string, { name: string; address: string | null; phone: string | null }>();
for (const pushMatch of html.matchAll(/h\.push\('([\s\S]*?)'\);/g)) {
const content = pushMatch[1].replace(/\\'/g, "'");
const keyMatch = content.match(/igreja\.php\?k=([a-zA-Z0-9]+)/);
if (!keyMatch) continue;
const nameMatch = content.match(/igreja\.php\?k=[^"]+">([^<]+)<\/a>/);
const addrMatch = content.match(/<br\/>([^<]+)<\/p>/);
const phoneMatch = content.match(/Telefone:<\/strong>\s*([^<]+)/);
jsData.set(keyMatch[1], {
name: nameMatch ? nameMatch[1].trim() : '',
address: addrMatch ? addrMatch[1].trim() || null : null,
phone: phoneMatch ? phoneMatch[1].trim() || null : null,
});
}
// Parse Source B: .result divs → schedules
// Use split() rather than a lookahead regex — lookahead with $ drops the last result div
const scheduleData = new Map<string, { massSchedules: ParsedSchedule[]; confessionSchedules: ParsedConfession[] }>();
const resultParts = html.split('<div class="result">');
for (let i = 1; i < resultParts.length; i++) {
const resultHtml = resultParts[i];
const keyMatch = resultHtml.match(/href="igreja\.php\?k=([a-zA-Z0-9]+)"/);
if (!keyMatch) continue;
const tables = [...resultHtml.matchAll(/<table>([\s\S]*?)<\/table>/g)].map(m => m[1]);
scheduleData.set(keyMatch[1], {
massSchedules: tables[0] ? parseMassTable(tables[0]) : [],
confessionSchedules: tables[1] ? parseConfessionTable(tables[1]) : [],
});
}
// Join both sources by church key; keys found only in scheduleData have no name and are skipped below
const allKeys = new Set([...jsData.keys(), ...scheduleData.keys()]);
const churches: ParsedChurch[] = [];
for (const key of allKeys) {
const js = jsData.get(key);
const sched = scheduleData.get(key);
if (!js?.name) continue;
churches.push({
key,
name: js.name,
address: js.address,
phone: js.phone,
city,
state,
massSchedules: sched?.massSchedules ?? [],
confessionSchedules: sched?.confessionSchedules ?? [],
});
}
return churches;
}
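The comma-splitting rule used by parseTimeCells is easy to get wrong, so it is worth checking on a cell whose note itself contains a comma. The sketch below is a self-contained copy of the same approach under a hypothetical name (`splitTimeCells`); the sample cell text is invented for illustration.

```typescript
interface TimeCell {
  time: string;
  notes: string | null;
}

// Same approach as parseTimeCells: split on commas that are not inside
// parentheses, then read one HH:MM time and an optional "(note)" per part.
function splitTimeCells(text: string): TimeCell[] {
  const cells: TimeCell[] = [];
  for (const part of text.split(/,(?![^(]*\))/)) {
    const timeMatch = part.match(/\b(\d{1,2}:\d{2})\b/);
    if (!timeMatch) continue;
    const [hours, minutes] = timeMatch[1].split(':');
    const noteMatch = part.match(/\(([^)]+)\)/);
    cells.push({
      time: `${hours.padStart(2, '0')}:${minutes}`, // "7:30" becomes "07:30"
      notes: noteMatch ? noteMatch[1].trim() : null,
    });
  }
  return cells;
}

const timeCells = splitTimeCells('7:30, 10:30 (crianças, jovens), 19:00');
```

The negative lookahead rejects a comma when a closing parenthesis follows it with no opening parenthesis in between, which is why the comma inside the note does not split the second entry.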
- Step 3: Verify parsing against a live city page
npx tsx -e "
import dotenv from 'dotenv';
dotenv.config({ path: '.env' });
const { parseCityPage } = await import('./scripts/import-horariodemissa.ts');
const url = 'https://horariodemissa.com.br/search.php?uf=SP&cidade=S%C3%A3o+Paulo&bairro=&opcoes=cidade_opcoes&submit=12345678&hl=pt';
const html = await fetch(url, { headers: { 'User-Agent': 'NearestMass-Importer/1.0' } }).then(r => r.text());
const churches = parseCityPage(html, 'São Paulo', 'SP');
console.log('Churches found:', churches.length);
console.log('With schedules:', churches.filter(c => c.massSchedules.length > 0).length);
console.log('Sample:', JSON.stringify(churches[0], null, 2));
"
Expected: 20+ churches found, majority with mass schedules, first entry shows name/address/phone/schedules.
- Step 4: Commit
git add scripts/import-horariodemissa.ts
git commit -m "feat: horariodemissa HTML parser (day mapping, schedule tables, dual-source join)"
Task 5: DB upsert + main()
Files:
- Modify: scripts/import-horariodemissa.ts
- Step 1: Add geocode helper
// ─── Geocoding ────────────────────────────────────────────────────────────────
async function geocodeAddress(address: string, city: string, state: string): Promise<{ lat: number; lng: number } | null> {
const query = [address, city, state, 'Brasil'].filter(Boolean).join(', ');
const url = `${NOMINATIM_URL}?q=${encodeURIComponent(query)}&format=json&limit=1&countrycodes=br`;
await delay(NOMINATIM_DELAY_MS);
try {
const response = await fetch(url, {
headers: { 'User-Agent': USER_AGENT, 'Accept': 'application/json' },
});
if (!response.ok) return null;
const results = await response.json() as Array<{ lat: string; lon: string }>;
if (!results.length) return null;
return { lat: parseFloat(results[0].lat), lng: parseFloat(results[0].lon) };
} catch {
return null;
}
}
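The query string can be inspected without sending a request to Nominatim. This sketch builds the same search URL with `URLSearchParams` instead of the manual `encodeURIComponent` call used above (a swapped-in technique, chosen so the round trip is easy to verify); `buildGeocodeUrl` is a hypothetical helper name.

```typescript
const NOMINATIM_SEARCH = 'https://nominatim.openstreetmap.org/search';

// Builds the search URL without sending a request. Empty parts are dropped so
// a church with no street address still produces a "city, state, Brasil" query.
function buildGeocodeUrl(address: string | null, city: string, state: string): string {
  const query = [address, city, state, 'Brasil'].filter(Boolean).join(', ');
  const params = new URLSearchParams({
    q: query,
    format: 'json',
    limit: '1',
    countrycodes: 'br', // restrict results to Brazil
  });
  return `${NOMINATIM_SEARCH}?${params.toString()}`;
}

const geocodeUrl = buildGeocodeUrl('Rua X, 123', 'São Paulo', 'SP');
// Decoding the URL again should give back the original free-text query.
const roundTrippedQuery = new URL(geocodeUrl).searchParams.get('q');
const cityOnlyQuery = new URL(buildGeocodeUrl(null, 'Boa Vista', 'RR')).searchParams.get('q');
```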
- Step 2: Add upsertChurch function
Note: latitude/longitude are non-nullable in the schema. Use 0 as the sentinel for "no coordinates yet" (geocode pass will fill these in). The source field must be set explicitly — the schema default is "masstimes" which would corrupt source-based queries.
// ─── DB Upsert ────────────────────────────────────────────────────────────────
async function upsertChurch(
parsed: ParsedChurch,
existingChurches: ExistingChurch[],
args: CLIArgs,
stats: ImportStats
): Promise<void> {
const candidate = { name: parsed.name, lat: 0, lng: 0, horarioDemissaId: parsed.key };
const existing = findDuplicateChurch(candidate, existingChurches);
if (args.dryRun) {
console.log(` [dry-run] ${existing ? 'UPDATE' : 'CREATE'} ${parsed.name} (${parsed.key})`);
if (existing) stats.churchesUpdated++; else stats.churchesCreated++;
return;
}
try {
let churchId: string;
await prisma.$transaction(async (tx) => {
const church = await tx.church.upsert({
where: { horarioDemissaId: parsed.key },
create: {
horarioDemissaId: parsed.key,
name: parsed.name,
address: parsed.address,
city: parsed.city,
state: parsed.state,
country: 'BR',
phone: parsed.phone,
source: 'horario-demissa', // must set explicitly — schema default is "masstimes"
latitude: 0, // sentinel for "no coordinates"; geocode pass fills this in
longitude: 0,
lastScrapedAt: new Date(),
scrapeStrategy: 'horario-demissa',
},
update: {
name: parsed.name,
address: parsed.address ?? undefined,
city: parsed.city,
state: parsed.state,
phone: parsed.phone ?? undefined,
lastScrapedAt: new Date(),
},
});
churchId = church.id;
await tx.massSchedule.deleteMany({ where: { churchId: church.id } });
if (parsed.massSchedules.length > 0) {
// Deduplicate by day+time before inserting
const seen = new Set<string>();
const deduped = parsed.massSchedules.filter((s) => {
const k = `${s.dayOfWeek}:${s.time}`;
return seen.has(k) ? false : (seen.add(k), true);
});
await tx.massSchedule.createMany({
data: deduped.map((s) => ({
churchId: church.id,
dayOfWeek: s.dayOfWeek,
time: s.time,
notes: s.notes,
})),
});
stats.massSchedulesCreated += deduped.length;
}
await tx.confessionSchedule.deleteMany({ where: { churchId: church.id } });
if (parsed.confessionSchedules.length > 0) {
await tx.confessionSchedule.createMany({
data: parsed.confessionSchedules.map((c) => ({
churchId: church.id,
dayOfWeek: c.dayOfWeek,
startTime: c.startTime,
endTime: c.endTime,
notes: c.notes,
})),
});
}
});
if (existing) {
stats.churchesUpdated++;
} else {
stats.churchesCreated++;
// Use real DB UUID (churchId!) not the source key string
existingChurches.push({
id: churchId!, name: parsed.name, latitude: 0, longitude: 0,
osmId: null, baiduId: null, masstimesId: null, orarimesseId: null,
massSchedulesPhId: null, philmassId: null, horariosMisasId: null,
mszeInfoId: null, weekdayMassesId: null, messesInfoId: null,
bohosluzbyId: null, miserendId: null, kerknetId: null,
gottesdienstzeitenId: null, horarioDemissaId: parsed.key, misasOrgId: null,
source: 'horario-demissa', website: null, phone: parsed.phone,
address: parsed.address, country: 'BR',
});
}
} catch (error) {
console.error(` Error upserting ${parsed.name}: ${error instanceof Error ? error.message : error}`);
stats.errors++;
}
}
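The day+time deduplication inside the transaction keeps the first entry for each slot and discards later ones, including their notes. A minimal standalone sketch of that rule, with hypothetical names (`Slot`, `dedupeSlots`) and invented sample data:

```typescript
interface Slot {
  dayOfWeek: number;
  time: string;
  notes: string | null;
}

// Keeps the first entry for each day+time pair and drops later repeats.
function dedupeSlots(slots: Slot[]): Slot[] {
  const seen = new Set<string>();
  return slots.filter((slot) => {
    const key = `${slot.dayOfWeek}:${slot.time}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const dedupedSlots = dedupeSlots([
  { dayOfWeek: 0, time: '07:30', notes: null },
  { dayOfWeek: 0, time: '07:30', notes: 'Primeiro Domingo' }, // same slot, dropped
  { dayOfWeek: 6, time: '07:30', notes: null }, // different day, kept
]);
```

One consequence worth knowing: a "Primeiro Domingo" row at the same time as a regular Sunday row collapses into whichever appears first, so the note on the later row is lost.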
- Step 3: Add geocodeOnly pass
Note: latitude is non-nullable (Float in schema), so { latitude: null } will never match. Use { latitude: 0 } — that is the sentinel value set on creation for address-only churches.
async function runGeocodeOnly(stats: ImportStats): Promise<void> {
console.log('\nGeocoding Brazilian churches without coordinates...');
const churches = await prisma.church.findMany({
where: { horarioDemissaId: { not: null }, latitude: 0, address: { not: null } },
select: { id: true, name: true, address: true, city: true, state: true },
});
console.log(`Found ${churches.length} churches to geocode`);
for (const church of churches) {
const coords = await geocodeAddress(church.address!, church.city ?? '', church.state ?? '');
if (coords) {
await prisma.church.update({ where: { id: church.id }, data: { latitude: coords.lat, longitude: coords.lng } });
stats.geocoded++;
console.log(` Geocoded: ${church.name} → ${coords.lat}, ${coords.lng}`);
} else {
stats.geocodeFailed++;
}
}
}
- Step 4: Add CLI arg parser + main()
// ─── CLI + Main ───────────────────────────────────────────────────────────────
function parseArgs(): CLIArgs {
const argv = process.argv.slice(2);
const idx = (flag: string) => argv.indexOf(flag);
return {
all: argv.includes('--all'),
state: idx('--state') >= 0 ? argv[idx('--state') + 1] : undefined,
dryRun: argv.includes('--dry-run'),
geocode: argv.includes('--geocode'),
geocodeOnly: argv.includes('--geocode-only'),
resumeFrom: idx('--resume-from') >= 0 ? parseInt(argv[idx('--resume-from') + 1], 10) : undefined,
jobId: idx('--job-id') >= 0 ? argv[idx('--job-id') + 1] : undefined,
};
}
async function main(): Promise<void> {
const args = parseArgs();
const stats: ImportStats = {
citiesProcessed: 0, churchesFound: 0, churchesCreated: 0,
churchesUpdated: 0, massSchedulesCreated: 0,
geocoded: 0, geocodeFailed: 0, errors: 0,
};
console.log('\n' + '='.repeat(70));
console.log('HORARIO DE MISSA (BRAZIL) IMPORTER');
console.log('='.repeat(70));
console.log(`Mode: ${args.geocodeOnly ? 'geocode-only' : args.dryRun ? 'dry-run' : 'import'}`);
if (args.state) console.log(`State filter: ${args.state}`);
if (args.resumeFrom) console.log(`Resume from: ${args.resumeFrom}`);
console.log(`Time: ${new Date().toISOString()}\n`);
try {
if (args.geocodeOnly) {
await runGeocodeOnly(stats);
} else if (args.all || args.state) {
console.log('Loading existing BR churches...');
const existingChurches = await prisma.church.findMany({
where: { country: 'BR' },
select: {
id: true, name: true, latitude: true, longitude: true,
osmId: true, baiduId: true, masstimesId: true, orarimesseId: true,
massSchedulesPhId: true, philmassId: true, horariosMisasId: true,
mszeInfoId: true, weekdayMassesId: true, messesInfoId: true,
bohosluzbyId: true, miserendId: true, kerknetId: true,
gottesdienstzeitenId: true, horarioDemissaId: true, misasOrgId: true,
source: true, website: true, phone: true, address: true, country: true,
},
}) as ExistingChurch[];
console.log(`Loaded ${existingChurches.length} existing BR churches\n`);
const cities = await fetchCityUrls(args.state);
const startIndex = args.resumeFrom ?? 0;
for (let i = startIndex; i < cities.length; i++) {
const { state, city, url } = cities[i];
console.log(`[${i + 1}/${cities.length}] ${state} / ${city}`);
const html = await fetchPage(url);
if (!html) { stats.errors++; continue; }
const churches = parseCityPage(html, city, state);
stats.churchesFound += churches.length;
stats.citiesProcessed++;
console.log(` ${churches.length} churches`);
for (const church of churches) {
await upsertChurch(church, existingChurches, args, stats);
}
if (args.geocode && !args.dryRun) {
for (const church of churches) {
if (!church.address) continue;
const dbChurch = await prisma.church.findUnique({
where: { horarioDemissaId: church.key },
select: { id: true, latitude: true },
});
// latitude === 0 is the sentinel for "no real coordinates yet"
if (dbChurch && dbChurch.latitude === 0) {
const coords = await geocodeAddress(church.address, church.city, church.state);
if (coords) {
await prisma.church.update({ where: { id: dbChurch.id }, data: { latitude: coords.lat, longitude: coords.lng } });
stats.geocoded++;
} else {
stats.geocodeFailed++;
}
}
}
}
}
} else {
console.error('Usage: --all | --state XX | --geocode-only');
process.exit(1);
}
} finally {
await prisma.$disconnect();
await pool.end();
}
console.log('\n' + '='.repeat(70));
console.log('SUMMARY');
console.log('='.repeat(70));
console.log(`Cities processed: ${stats.citiesProcessed}`);
console.log(`Churches found: ${stats.churchesFound}`);
console.log(` Created: ${stats.churchesCreated}`);
console.log(` Updated: ${stats.churchesUpdated}`);
console.log(` Errors: ${stats.errors}`);
console.log(`Mass schedules: ${stats.massSchedulesCreated}`);
if (args.geocode || args.geocodeOnly) {
console.log(`Geocoded: ${stats.geocoded} / Failed: ${stats.geocodeFailed}`);
}
console.log('='.repeat(70) + '\n');
}
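main().catch((error) => {
  console.error(error);
  process.exit(1); // report failure to the caller instead of exiting with code 0
});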
- Step 5: Test dry-run on small state
npx tsx scripts/import-horariodemissa.ts --state DF --dry-run
Expected: Lists churches from Distrito Federal (Brasília) without DB writes.
- Step 6: Test real import on smallest state (Roraima)
npx tsx scripts/import-horariodemissa.ts --state RR
Then verify:
npx tsx -e "
import dotenv from 'dotenv'; dotenv.config({ path: '.env' });
import { prisma } from './src/lib/db.ts';
const count = await prisma.church.count({ where: { country: 'BR' } });
const sched = await prisma.massSchedule.count({ where: { church: { country: 'BR' } } });
console.log('BR churches:', count, '| Mass schedules:', sched);
await prisma.\$disconnect();
"
Expected: Some churches from Roraima with mass schedules in DB.
- Step 7: Commit
git add scripts/import-horariodemissa.ts
git commit -m "feat: complete horariodemissa importer (Brazil, 8895 churches + 28523 mass times)"
Chunk 3: Spain Importer (import-misas.ts)
Depends on Chunk 1. Can run in parallel with Chunk 2.
Task 6: API pagination + boilerplate
Files:
- Create: scripts/import-misas.ts
- Step 1: Create script with boilerplate + API pagination
Create scripts/import-misas.ts:
#!/usr/bin/env tsx
/**
* Import Catholic churches from misas.org (Spain)
*
* misas.org lists 17,919 Spanish parishes with name, address, coordinates,
* and province via a public JSON REST API. Mass schedules are auth-gated
* (401 on detail endpoint), so this importer creates/updates church records
* only — no schedule data.
*
* The listing API accepts offset-based pagination. We use Madrid as the center
* with a large radius (999999m) to cover all of Spain in a single stream.
*
* Import strategy:
* 1. Paginate GET /api/parishsearch?country=es&pos=[...]&offset=N&limit=500
* 2. For each parish: id, name, addr, loc (city), prov (province), zip, lat, long
* 3. Match against existing ES churches by misasOrgId or proximity+name
* 4. Upsert church record (no mass schedules)
*
* Usage:
* npx tsx scripts/import-misas.ts --all
* npx tsx scripts/import-misas.ts --all --dry-run
* npx tsx scripts/import-misas.ts --all --resume-from 5000
* npx tsx scripts/import-misas.ts --all --job-id {uuid}
*/
import dotenv from 'dotenv';
import path from 'path';
dotenv.config({ path: path.resolve(process.cwd(), '.env.local') });
dotenv.config({ path: path.resolve(process.cwd(), '.env') });
import { Pool } from 'pg';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';
const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:postgres@localhost:5432/nearestmass';
console.log(`Connecting to database: ${dbUrl.replace(/:[^:@]+@/, ':***@')}`);
const pool = new Pool({
connectionString: dbUrl,
ssl: dbUrl.includes('neon') ? { rejectUnauthorized: false } : undefined,
});
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });
import { findDuplicateChurch } from '../src/lib/church-matcher';
import type { ExistingChurch } from '../src/lib/church-matcher';
// ─── Constants ───────────────────────────────────────────────────────────────
const API_BASE = 'https://misas.org/api/parishsearch';
// Madrid coordinates, large radius covers all of Spain
const SPAIN_POS = encodeURIComponent('[-3.7038,40.4168,999999]');
const PAGE_SIZE = 500;
const REQUEST_DELAY_MS = 500;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
// ─── Types ───────────────────────────────────────────────────────────────────
interface MisasParish {
id: number;
name: string;
uri: string;
addr: string;
loc: string; // city
prov: string; // province
zip: string;
lat: string;
long: string;
}
interface MisasApiResponse {
count: number;
pars: MisasParish[];
}
interface CLIArgs {
all: boolean;
dryRun: boolean;
resumeFrom?: number;
jobId?: string;
}
interface ImportStats {
total: number;
created: number;
updated: number;
errors: number;
}
// ─── HTTP Client ──────────────────────────────────────────────────────────────
let requestCount = 0;
function delay(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function fetchParishes(offset: number): Promise<MisasApiResponse | null> {
if (requestCount > 0) await delay(REQUEST_DELAY_MS);
requestCount++;
const url = `${API_BASE}?country=es&pos=${SPAIN_POS}&offset=${offset}&limit=${PAGE_SIZE}`;
try {
const response = await fetch(url, {
headers: {
'User-Agent': USER_AGENT,
'Accept': 'application/json',
'Referer': 'https://misas.org/',
},
});
if (!response.ok) {
console.error(` HTTP ${response.status} at offset ${offset}`);
return null;
}
return await response.json() as MisasApiResponse;
} catch (error) {
console.error(` Fetch error at offset ${offset}: ${error instanceof Error ? error.message : error}`);
return null;
}
}
// ─── Pagination ───────────────────────────────────────────────────────────────
export async function* paginateParishes(startOffset: number = 0): AsyncGenerator<MisasParish> {
let offset = startOffset;
let totalKnown = Infinity;
while (offset < totalKnown) {
console.log(` Fetching offset ${offset}${totalKnown < Infinity ? `/${totalKnown}` : ''}...`);
const data = await fetchParishes(offset);
if (!data || !data.pars || data.pars.length === 0) break;
totalKnown = data.count;
for (const parish of data.pars) {
yield parish;
}
offset += data.pars.length;
}
}
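The loop above ends when the offset reaches the count reported by the API, or when a page comes back empty or fails. A minimal sketch of the same termination logic with an injected page fetcher, so it can be exercised without calling misas.org. `Page`, `FetchPage`, `paginate`, `fakeFetch`, and `collect` are hypothetical names used only for illustration, not part of the importer.

```typescript
// Hypothetical, simplified shapes: the real importer uses MisasApiResponse / MisasParish.
interface Page<T> {
  count: number; // total number of records reported by the API
  pars: T[];     // records contained in this page
}

type FetchPage<T> = (offset: number) => Promise<Page<T> | null>;

// Same loop structure as paginateParishes(), but the fetcher is a parameter.
async function* paginate<T>(fetchPage: FetchPage<T>, startOffset = 0): AsyncGenerator<T> {
  let offset = startOffset;
  let totalKnown = Infinity;
  while (offset < totalKnown) {
    const page = await fetchPage(offset);
    // A failed request (null) or an empty page ends the iteration.
    if (!page || page.pars.length === 0) break;
    totalKnown = page.count;
    yield* page.pars;
    offset += page.pars.length;
  }
}

// In-memory stand-in for the API: 5 records served 2 per page.
const fakeRecords = [1, 2, 3, 4, 5];
const fakeFetch: FetchPage<number> = async (offset) => ({
  count: fakeRecords.length,
  pars: fakeRecords.slice(offset, offset + 2),
});

// Drains an async generator into an array.
async function collect<T>(source: AsyncGenerator<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of source) out.push(item);
  return out;
}
```

Because a null page ends the iteration without raising, a transient HTTP error mid-run looks the same as normal completion. The last "Fetching offset N" line in the log is the value to pass to --resume-from.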
- Step 2: Verify API returns expected data
npx tsx -e "
import dotenv from 'dotenv'; dotenv.config({ path: '.env' });
const { paginateParishes } = await import('./scripts/import-misas.ts');
let count = 0;
for await (const p of paginateParishes()) {
if (count === 0) console.log('First parish:', JSON.stringify(p, null, 2));
count++;
if (count >= 5) break;
}
console.log('Fetched:', count, 'from first batch');
"
Expected: Parish objects with id, name, lat, long, addr, loc, prov fields.
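To check the expected shape programmatically rather than by eye, a small type guard can be run over the first few parishes. This is a sketch that assumes the field types declared in MisasParish (numeric id, string coordinates). `isMisasParish` and `STRING_FIELDS` are hypothetical names, not part of the importer.

```typescript
// Mirrors the MisasParish interface from Task 6 (field types as declared there).
interface MisasParish {
  id: number;
  name: string;
  uri: string;
  addr: string;
  loc: string;
  prov: string;
  zip: string;
  lat: string;
  long: string;
}

const STRING_FIELDS = ['name', 'uri', 'addr', 'loc', 'prov', 'zip', 'lat', 'long'] as const;

// Hypothetical helper: true only if every declared field is present with the declared type.
function isMisasParish(value: unknown): value is MisasParish {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  if (typeof record.id !== 'number') return false;
  return STRING_FIELDS.every((field) => typeof record[field] === 'string');
}
```

If the guard rejects live data, the interface is the thing to adjust: the API may return some fields as numbers or omit them, which this sketch cannot know in advance.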
- Step 3: Commit
git add scripts/import-misas.ts
git commit -m "feat: misas.org importer scaffold + API pagination"
Task 7: DB upsert + main()
Files:
- Modify: scripts/import-misas.ts
- Step 1: Add upsertParish + main()
Note: latitude and longitude are non-nullable Float columns, so use 0 as a sentinel when coordinates are missing. Set source explicitly to 'misas-org', because the schema default is "masstimes".
// ─── DB Upsert ────────────────────────────────────────────────────────────────
async function upsertParish(
parish: MisasParish,
existingChurches: ExistingChurch[],
args: CLIArgs,
stats: ImportStats
): Promise<void> {
const lat = parseFloat(parish.lat);
const lng = parseFloat(parish.long);
const misasOrgId = String(parish.id);
const resolvedLat = isNaN(lat) ? 0 : lat;
const resolvedLng = isNaN(lng) ? 0 : lng;
const candidate = {
name: parish.name,
lat: resolvedLat,
lng: resolvedLng,
misasOrgId,
};
const existing = findDuplicateChurch(candidate, existingChurches);
if (args.dryRun) {
console.log(` [dry-run] ${existing ? 'UPDATE' : 'CREATE'} ${parish.name} (${misasOrgId})`);
stats.total++;
if (existing) stats.updated++; else stats.created++;
return;
}
try {
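const church = await prisma.church.upsert({
// Target the matched row when the matcher found one, so a proximity match is updated
// instead of duplicated; otherwise key on misasOrgId.
// Assumes findDuplicateChurch() returns the matching ExistingChurch (with id) or null.
where: existing ? { id: existing.id } : { misasOrgId },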
create: {
misasOrgId,
name: parish.name,
address: parish.addr || null,
city: parish.loc || null,
state: parish.prov || null,
zip: parish.zip || null,
country: 'ES',
source: 'misas-org', // must set explicitly — schema default is "masstimes"
latitude: resolvedLat, // 0 = no real coordinates; misas.org provides coords for most
longitude: resolvedLng,
lastScrapedAt: new Date(),
scrapeStrategy: 'misas-org',
},
update: {
name: parish.name,
address: parish.addr || undefined,
city: parish.loc || undefined,
state: parish.prov || undefined,
zip: parish.zip || undefined,
// Only update coords when both are real values (don't overwrite good data with the 0 sentinel)
...(resolvedLat !== 0 && resolvedLng !== 0 && { latitude: resolvedLat, longitude: resolvedLng }),
misasOrgId, // stamp ID even if matched by proximity
lastScrapedAt: new Date(),
},
});
if (existing) {
stats.updated++;
} else {
stats.created++;
existingChurches.push({
id: church.id, name: parish.name,
latitude: resolvedLat, longitude: resolvedLng,
osmId: null, baiduId: null, masstimesId: null, orarimesseId: null,
massSchedulesPhId: null, philmassId: null, horariosMisasId: null,
mszeInfoId: null, weekdayMassesId: null, messesInfoId: null,
bohosluzbyId: null, miserendId: null, kerknetId: null,
gottesdienstzeitenId: null, horarioDemissaId: null, misasOrgId,
source: 'misas-org', website: null, phone: null,
address: parish.addr || null, country: 'ES',
});
}
stats.total++;
} catch (error) {
console.error(` Error upserting ${parish.name}: ${error instanceof Error ? error.message : error}`);
stats.errors++;
stats.total++; // count errors in total so progress log fires correctly
}
}
// ─── CLI + Main ───────────────────────────────────────────────────────────────
// Note: --job-id is accepted for scheduler compatibility but BackgroundJob status
// tracking is not wired up in this importer (acceptable for v1 — add later if needed).
function parseArgs(): CLIArgs {
const argv = process.argv.slice(2);
const idx = (flag: string) => argv.indexOf(flag);
return {
all: argv.includes('--all'),
dryRun: argv.includes('--dry-run'),
resumeFrom: idx('--resume-from') >= 0 ? parseInt(argv[idx('--resume-from') + 1], 10) : undefined,
jobId: idx('--job-id') >= 0 ? argv[idx('--job-id') + 1] : undefined,
};
}
async function main(): Promise<void> {
const args = parseArgs();
const stats: ImportStats = { total: 0, created: 0, updated: 0, errors: 0 };
console.log('\n' + '='.repeat(70));
console.log('MISAS.ORG (SPAIN) IMPORTER');
console.log('='.repeat(70));
console.log(`Mode: ${args.dryRun ? 'dry-run' : 'import'}`);
if (args.resumeFrom) console.log(`Resume from offset: ${args.resumeFrom}`);
console.log(`Time: ${new Date().toISOString()}\n`);
if (!args.all) {
console.error('Usage: --all [--dry-run] [--resume-from N]');
process.exit(1);
}
try {
console.log('Loading existing ES churches...');
const existingChurches = await prisma.church.findMany({
where: { country: 'ES' },
select: {
id: true, name: true, latitude: true, longitude: true,
osmId: true, baiduId: true, masstimesId: true, orarimesseId: true,
massSchedulesPhId: true, philmassId: true, horariosMisasId: true,
mszeInfoId: true, weekdayMassesId: true, messesInfoId: true,
bohosluzbyId: true, miserendId: true, kerknetId: true,
gottesdienstzeitenId: true, horarioDemissaId: true, misasOrgId: true,
source: true, website: true, phone: true, address: true, country: true,
},
}) as ExistingChurch[];
console.log(`Loaded ${existingChurches.length} existing ES churches\n`);
for await (const parish of paginateParishes(args.resumeFrom ?? 0)) {
await upsertParish(parish, existingChurches, args, stats);
if (stats.total % 500 === 0) {
console.log(` Progress: ${stats.total} processed (${stats.created} created, ${stats.updated} updated)`);
}
}
} finally {
await prisma.$disconnect();
await pool.end();
}
console.log('\n' + '='.repeat(70));
console.log('SUMMARY');
console.log('='.repeat(70));
console.log(`Total processed: ${stats.total}`);
console.log(` Created: ${stats.created}`);
console.log(` Updated: ${stats.updated}`);
console.log(` Errors: ${stats.errors}`);
console.log('='.repeat(70) + '\n');
}
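main().catch((error) => {
console.error(error);
process.exit(1); // exit non-zero so a failed run is not reported as success
});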
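The coordinate handling in upsertParish() treats 0 as "no coordinates". A sketch of an alternative that keeps the missing case explicit until the values are written and also rejects out-of-range input. `Coords`, `parseCoords`, `coordsForCreate`, and `coordsForUpdate` are hypothetical helpers, and the range check is an addition that the plan itself does not call for.

```typescript
interface Coords {
  lat: number;
  lng: number;
}

// Hypothetical helper: returns null unless both values parse to finite, in-range numbers.
function parseCoords(rawLat: string, rawLng: string): Coords | null {
  const lat = Number.parseFloat(rawLat);
  const lng = Number.parseFloat(rawLng);
  if (!Number.isFinite(lat) || !Number.isFinite(lng)) return null;
  if (Math.abs(lat) > 90 || Math.abs(lng) > 180) return null;
  return { lat, lng };
}

// Non-nullable Float columns still need a value on create, so fall back to the 0 sentinel.
function coordsForCreate(coords: Coords | null): { latitude: number; longitude: number } {
  return { latitude: coords?.lat ?? 0, longitude: coords?.lng ?? 0 };
}

// On update, omit both fields when there is nothing trustworthy to write.
function coordsForUpdate(coords: Coords | null): { latitude?: number; longitude?: number } {
  return coords ? { latitude: coords.lat, longitude: coords.lng } : {};
}
```

Spreading `coordsForCreate(...)` into the create payload and `coordsForUpdate(...)` into the update payload would keep the sentinel logic in one place, at the cost of two extra helpers.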
- Step 2: Test dry-run end-to-end
npx tsx scripts/import-misas.ts --all --dry-run 2>&1 | tail -20
Expected: Processes all 17,919 parishes, shows Total processed: 17919 with created/updated split.
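If --resume-from is passed without a numeric value, parseInt() in parseArgs() yields NaN, the pagination loop condition is false on the first check, and the run finishes with a total of 0 and no error. A sketch of stricter flag parsing that fails loudly instead. `parseResumeFrom` is a hypothetical helper, not part of the importer.

```typescript
// Hypothetical helper: returns undefined when the flag is absent, throws on a bad value.
function parseResumeFrom(argv: string[]): number | undefined {
  const flagIndex = argv.indexOf('--resume-from');
  if (flagIndex < 0) return undefined;
  const raw: string | undefined = argv[flagIndex + 1];
  // Accept only a plain non-negative integer such as "2500".
  if (raw === undefined || !/^\d+$/.test(raw)) {
    throw new Error(`--resume-from expects a non-negative integer, got: ${raw ?? '(nothing)'}`);
  }
  return Number.parseInt(raw, 10);
}
```

With this in place, a dry-run that reports 0 processed parishes points at the API rather than at a mistyped flag.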
- Step 3: Commit
git add scripts/import-misas.ts
git commit -m "feat: complete misas.org importer (Spain, 17919 churches with coordinates)"
Chunk 4: Integration
Task 8: package.json + scheduler
Files:
- Modify: package.json
- Modify: scripts/scheduler.ts
- Step 1: Add npm scripts
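In the package.json "scripts" block, add the following after the "import:masstimes-api" entry (make sure the preceding entry ends with a comma so the JSON stays valid):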
"import:horariodemissa": "tsx scripts/import-horariodemissa.ts",
"import:misas": "tsx scripts/import-misas.ts"
- Step 2: Add getJobCommand cases in scheduler.ts
In scripts/scheduler.ts, add before default: in getJobCommand():
case 'horariodemissa-import': {
const args = ['tsx', 'scripts/import-horariodemissa.ts', '--all'];
if (config?.state) args.push('--state', String(config.state));
if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
if (config?.geocode) args.push('--geocode');
return { command: 'npx', args };
}
case 'misas-import': {
const args = ['tsx', 'scripts/import-misas.ts', '--all'];
if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
return { command: 'npx', args };
}
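Both cases above follow one pattern: start from a fixed argument list and append a flag for each config key that is set. A self-contained sketch of that pattern for the misas.org job. `JobConfig`, `JobCommand`, and `buildMisasArgs` are hypothetical names, and the real getJobCommand() signature in scheduler.ts may differ.

```typescript
// Hypothetical config shape; scheduler.ts may type this differently.
interface JobConfig {
  resumeFrom?: number;
}

interface JobCommand {
  command: string;
  args: string[];
}

function buildMisasArgs(config?: JobConfig): JobCommand {
  const args = ['tsx', 'scripts/import-misas.ts', '--all'];
  // A resumeFrom of 0 is falsy and is skipped, which is equivalent to starting from the beginning.
  if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
  return { command: 'npx', args };
}
```

Keeping the argument construction in a pure function like this makes the mapping from job config to CLI flags checkable without spawning a process.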
- Step 3: Add to PIPELINE_GROUPS imports sequence
In PIPELINE_GROUPS[0].phases, add after the masstimes-api-import entry:
{ name: 'horariodemissa-import', type: 'horariodemissa-import', config: {} },
{ name: 'misas-import', type: 'misas-import', config: {} },
- Step 4: Verify TypeScript
npx tsc --noEmit
Expected: no errors.
- Step 5: Smoke test both npm scripts
npm run import:horariodemissa -- --state DF --dry-run 2>&1 | tail -10
npm run import:misas -- --all --dry-run 2>&1 | tail -10
- Step 6: Commit
git add package.json scripts/scheduler.ts
git commit -m "feat: add horariodemissa and misas.org to npm scripts and scheduler pipeline"
Final Verification
- Import a small Brazilian state (DF) to confirm the pipeline end-to-end
npx tsx scripts/import-horariodemissa.ts --state DF
npx tsx -e "
import dotenv from 'dotenv'; dotenv.config({ path: '.env' });
import { prisma } from './src/lib/db.ts';
const churches = await prisma.church.count({ where: { country: 'BR' } });
const schedules = await prisma.massSchedule.count({ where: { church: { country: 'BR' } } });
console.log('BR churches:', churches, '| Mass schedules:', schedules);
await prisma.\$disconnect();
"
Expected: Distrito Federal churches in DB with mass schedules.
- Dry-run a full pass of the Spain importer
npx tsx scripts/import-misas.ts --all --dry-run 2>&1 | grep -E "SUMMARY|Total|Created|Updated" | tail -10
Expected: ~17,919 total, mix of created vs updated depending on existing ES church overlap.