# Brazil + Spain Importers Implementation Plan

> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add two new church importers: horariodemissa.com.br (8,895 Brazilian churches + 28,523 mass times) and misas.org (17,919 Spanish churches with coordinates).

**Architecture:** Chunk 1 (shared prerequisites) must complete first. Tasks 3–5 (Brazil) and Tasks 6–7 (Spain) are independent and can run in parallel as subagents. All scripts follow the established importer pattern: fetch → regex parse → church-matcher dedup → prisma upsert.

**Tech Stack:** TypeScript, tsx, native `fetch`, regex HTML parsing (matchAll), Prisma + pg, church-matcher

**Spec:** `docs/superpowers/specs/2026-03-10-brazil-spain-importers-design.md`

---

## Chunk 1: Shared Prerequisites (schema + church-matcher)

### Task 1: Schema additions

**Files:**
- Modify: `prisma/schema.prisma`

- [ ] **Step 1: Add two new ID fields to the Church model**

In `prisma/schema.prisma`, find the block of importer ID fields (near `gottesdienstzeitenId`) and add after it:

```prisma
  horarioDemissaId String? @unique @map("horario_demissa_id")
  misasOrgId       String? @unique @map("misas_org_id")
```

Then add two indexes in the `@@index` block at the bottom of the Church model:

```prisma
  @@index([horarioDemissaId])
  @@index([misasOrgId])
```

- [ ] **Step 2: Regenerate Prisma client**

```bash
npx prisma generate
```

Expected: `✔ Generated Prisma Client` with no errors.

- [ ] **Step 3: Verify the fields exist in generated types**

```bash
grep -n "horarioDemissaId\|misasOrgId" node_modules/.prisma/client/index.d.ts | head -10
```

Expected: both fields appear in the type definitions.

- [ ] **Step 4: Commit**

```bash
git add prisma/schema.prisma
git commit -m "feat: add horarioDemissaId and misasOrgId fields to Church schema"
```

---

### Task 2: church-matcher updates

**Files:**
- Modify: `src/lib/church-matcher.ts`

- [ ] **Step 1: Add new fields to ExistingChurch interface**

In `src/lib/church-matcher.ts`, find the `ExistingChurch` interface and add after `gottesdienstzeitenId`:

```typescript
  horarioDemissaId: string | null;
  misasOrgId: string | null;
```

- [ ] **Step 2: Add new fields to ChurchCandidate type**

Find the `ChurchCandidate` type and add after `gottesdienstzeitenId?`:

```typescript
  horarioDemissaId?: string;
  misasOrgId?: string;
```

- [ ] **Step 3: Add two new exact-match passes in findDuplicateChurch**

After the Thirteenth pass (gottesdienstzeitenId), add before the proximity pass:

```typescript
  // Fourteenth pass: exact horarioDemissaId match
  if (candidate.horarioDemissaId) {
    const match = existingChurches.find(
      (church) => church.horarioDemissaId === candidate.horarioDemissaId
    );
    if (match) return match;
  }

  // Fifteenth pass: exact misasOrgId match
  if (candidate.misasOrgId) {
    const match = existingChurches.find(
      (church) => church.misasOrgId === candidate.misasOrgId
    );
    if (match) return match;
  }
```
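
Both passes share one shape: skip when the candidate has no ID for that source, otherwise return the first existing row carrying the same ID. A minimal sketch of that shape, assuming trimmed-down stand-ins for the real types (`ExistingChurchSketch`, `ChurchCandidateSketch`, and `findBySourceId` are hypothetical names, not part of `church-matcher.ts`):

```typescript
// Hypothetical, trimmed-down shapes: the real ExistingChurch and ChurchCandidate
// in src/lib/church-matcher.ts carry many more fields.
interface ExistingChurchSketch {
  id: string;
  horarioDemissaId: string | null;
  misasOrgId: string | null;
}

interface ChurchCandidateSketch {
  horarioDemissaId?: string;
  misasOrgId?: string;
}

type SourceIdField = 'horarioDemissaId' | 'misasOrgId';

// Order mirrors the plan: horarioDemissaId is checked before misasOrgId.
const SOURCE_ID_FIELDS: SourceIdField[] = ['horarioDemissaId', 'misasOrgId'];

function findBySourceId(
  candidate: ChurchCandidateSketch,
  existingChurches: ExistingChurchSketch[],
): ExistingChurchSketch | null {
  for (const field of SOURCE_ID_FIELDS) {
    const wanted = candidate[field];
    if (!wanted) continue; // candidate has no ID for this source

    const match = existingChurches.find((church) => church[field] === wanted);
    if (match) return match;
  }
  return null; // no exact-ID hit; the caller would fall through to later passes
}
```

Because these are exact-ID lookups, a hit returns before any distance comparison is attempted, which is consistent with the plan placing them ahead of the proximity pass.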

- [ ] **Step 4: Verify TypeScript compiles**

```bash
npx tsc --noEmit
```

Expected: no errors.

- [ ] **Step 5: Commit**

```bash
git add src/lib/church-matcher.ts
git commit -m "feat: add horarioDemissaId and misasOrgId to church-matcher"
```

---

## Chunk 2: Brazil Importer (import-horariodemissa.ts)

> Depends on Chunk 1. Can run in parallel with Chunk 3.

### Task 3: Boilerplate + sitemap enumeration

**Files:**
- Create: `scripts/import-horariodemissa.ts`

- [ ] **Step 1: Create script with boilerplate + types + sitemap parsing**

Create `scripts/import-horariodemissa.ts`:

```typescript
#!/usr/bin/env tsx
/**
 * Import Catholic churches and mass schedules from horariodemissa.com.br (Brazil)
 *
 * horariodemissa.com.br has 8,895 churches across all 26 Brazilian states + DF,
 * with 28,523 mass times. All data is server-rendered: one HTTP request per city
 * page returns all churches + schedules for that city.
 *
 * City pages have a split structure:
 *   - Address/phone: embedded in JS h.push() strings (sidebar/map data)
 *   - Schedules: in server-rendered .result divs with <table> rows
 * Both sets are linked by the same church key (e.g. "dvey2").
 *
 * Import strategy:
 *   1. Fetch sitemap.xml → deduplicate to pt-only city URLs (~3,552 cities)
 *   2. For each city: fetch page → parse address/phone from JS + schedules from DOM
 *   3. Join by church key, match against existing BR churches, upsert
 *   4. Optional --geocode flag for Nominatim pass after import
 *
 * Usage:
 *   npx tsx scripts/import-horariodemissa.ts --all
 *   npx tsx scripts/import-horariodemissa.ts --all --dry-run
 *   npx tsx scripts/import-horariodemissa.ts --state SP
 *   npx tsx scripts/import-horariodemissa.ts --all --resume-from 500
 *   npx tsx scripts/import-horariodemissa.ts --all --geocode
 *   npx tsx scripts/import-horariodemissa.ts --geocode-only
 *   npx tsx scripts/import-horariodemissa.ts --all --job-id {uuid}
 */

import dotenv from 'dotenv';
import path from 'path';

dotenv.config({ path: path.resolve(process.cwd(), '.env.local') });
dotenv.config({ path: path.resolve(process.cwd(), '.env') });

import { Pool } from 'pg';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';

const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:postgres@localhost:5432/nearestmass';
console.log(`Connecting to database: ${dbUrl.replace(/:[^:@]+@/, ':***@')}`);
const pool = new Pool({
  connectionString: dbUrl,
  ssl: dbUrl.includes('neon') ? { rejectUnauthorized: false } : undefined,
});
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });

import { findDuplicateChurch } from '../src/lib/church-matcher';
import type { ExistingChurch } from '../src/lib/church-matcher';

// ─── Constants ───────────────────────────────────────────────────────────────

const SITE_BASE = 'https://horariodemissa.com.br';
const SITEMAP_URL = `${SITE_BASE}/sitemap.xml`;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';
const REQUEST_DELAY_MS = 1500;
const NOMINATIM_DELAY_MS = 1100;
const NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search';

// ─── Types ───────────────────────────────────────────────────────────────────

interface CityUrl {
  state: string; // e.g. "SP"
  city: string; // e.g. "São Paulo"
  url: string; // full fetch URL
}

interface ParsedSchedule {
  dayOfWeek: number; // 0=Sun, 1=Mon, ..., 6=Sat
  time: string; // "HH:MM"
  notes: string | null;
}

interface ParsedConfession {
  dayOfWeek: number;
  startTime: string;
  endTime: string;
  notes: string | null;
}

interface ParsedChurch {
  key: string; // e.g. "dvey2" (used as horarioDemissaId)
  name: string;
  address: string | null;
  phone: string | null;
  city: string;
  state: string;
  massSchedules: ParsedSchedule[];
  confessionSchedules: ParsedConfession[];
}

interface CLIArgs {
  all: boolean;
  state?: string;
  dryRun: boolean;
  geocode: boolean;
  geocodeOnly: boolean;
  resumeFrom?: number;
  jobId?: string;
}

interface ImportStats {
  citiesProcessed: number;
  churchesFound: number;
  churchesCreated: number;
  churchesUpdated: number;
  massSchedulesCreated: number;
  geocoded: number;
  geocodeFailed: number;
  errors: number;
}

// ─── Brazilian Day Name Mapping ───────────────────────────────────────────────

// Keys are lowercase; unaccented spellings are listed alongside the accented ones.
const DAY_MAP: Record<string, number> = {
  'domingo': 0,
  'segunda-feira': 1, 'segunda': 1,
  'terça-feira': 2, 'terca-feira': 2, 'terça': 2, 'terca': 2,
  'quarta-feira': 3, 'quarta': 3,
  'quinta-feira': 4, 'quinta': 4,
  'sexta-feira': 5, 'sexta': 5,
  'sábado': 6, 'sabado': 6,
};

const SPECIAL_DAY_MAP: Record<string, { dayOfWeek: number; notes: string }> = {
  'primeiro domingo': { dayOfWeek: 0, notes: 'Primeiro Domingo' },
  'segundo domingo': { dayOfWeek: 0, notes: 'Segundo Domingo' },
  'terceiro domingo': { dayOfWeek: 0, notes: 'Terceiro Domingo' },
  'quarto domingo': { dayOfWeek: 0, notes: 'Quarto Domingo' },
  'primeiro sábado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
  'primeiro sabado': { dayOfWeek: 6, notes: 'Primeiro Sábado' },
  'segundo sábado': { dayOfWeek: 6, notes: 'Segundo Sábado' },
  'segundo sabado': { dayOfWeek: 6, notes: 'Segundo Sábado' },
};

// ─── HTTP Client ──────────────────────────────────────────────────────────────

let requestCount = 0;

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function fetchPage(url: string, delayMs: number = REQUEST_DELAY_MS): Promise<string | null> {
  // Throttle every request after the first one
  if (requestCount > 0) await delay(delayMs);
  requestCount++;

  try {
    const response = await fetch(url, {
      headers: {
        'User-Agent': USER_AGENT,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'pt-BR,pt;q=0.9',
      },
    });

    if (!response.ok) {
      console.error(` HTTP ${response.status} for ${url}`);
      return null;
    }

    return await response.text();
  } catch (error) {
    console.error(` Fetch error for ${url}: ${error instanceof Error ? error.message : error}`);
    return null;
  }
}

// ─── Sitemap Parser ───────────────────────────────────────────────────────────

export function parseCityUrlsFromSitemap(sitemapXml: string, filterState?: string): CityUrl[] {
  const seen = new Set<string>();
  const cities: CityUrl[] = [];

  for (const match of sitemapXml.matchAll(/<loc>([^<]+)<\/loc>/g)) {
    // XML escapes "&" as "&amp;"; undo that so the URL can be fetched as-is
    const rawUrl = match[1].replace(/&amp;/g, '&');

    // Only pt-language city search pages
    if (!rawUrl.includes('opcoes=cidade_opcoes') || rawUrl.includes('hl=en')) continue;

    const ufMatch = rawUrl.match(/[?&]uf=([A-Z]+)/);
    const cidadeMatch = rawUrl.match(/[?&]cidade=([^&]+)/);
    if (!ufMatch || !cidadeMatch) continue;

    const state = ufMatch[1];
    const city = decodeURIComponent(cidadeMatch[1].replace(/\+/g, ' '));

    if (filterState && state !== filterState.toUpperCase()) continue;

    // Deduplicate on state + city so each city is fetched once
    const key = `${state}:${city}`;
    if (seen.has(key)) continue;
    seen.add(key);

    cities.push({ state, city, url: rawUrl });
  }

  cities.sort((a, b) => a.state.localeCompare(b.state) || a.city.localeCompare(b.city));
  return cities;
}

async function fetchCityUrls(filterState?: string): Promise<CityUrl[]> {
  console.log(`Fetching sitemap: ${SITEMAP_URL}`);
  const xml = await fetchPage(SITEMAP_URL);
  if (!xml) throw new Error('Failed to fetch sitemap');

  const cities = parseCityUrlsFromSitemap(xml, filterState);
  console.log(`Found ${cities.length} unique cities${filterState ? ` in ${filterState}` : ''}`);
  return cities;
}
```

- [ ] **Step 2: Verify sitemap parsing works**

```bash
npx tsx -e "
import dotenv from 'dotenv';
dotenv.config({ path: '.env' });
const { parseCityUrlsFromSitemap } = await import('./scripts/import-horariodemissa.ts');
const xml = await fetch('https://horariodemissa.com.br/sitemap.xml').then(r => r.text());
const cities = parseCityUrlsFromSitemap(xml);
console.log('Total cities:', cities.length);
console.log('Sample:', JSON.stringify(cities.slice(0, 3), null, 2));
const states = [...new Set(cities.map(c => c.state))].sort();
console.log('States:', states.join(', '));
"
```

Expected: ~3,500 cities, states include SP, RJ, MG, RS, BA, DF, etc.
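
The decoding of a single sitemap entry can also be checked offline, without hitting the site. The sketch below is an alternative formulation that uses the WHATWG `URL` API instead of regexes; it is not the importer's own code. `parseCityLoc` and `SAMPLE_LOC` are hypothetical names, and the sample value assumes sitemap entries look like the city search URL used elsewhere in this plan, with `&` escaped as `&amp;`.

```typescript
// Assumed shape of one sitemap <loc> value (mirrors the city URL used elsewhere in
// this plan). XML escapes "&" as "&amp;", so the raw text is not yet a usable URL.
const SAMPLE_LOC =
  'https://horariodemissa.com.br/search.php?uf=SP&amp;cidade=S%C3%A3o+Paulo' +
  '&amp;bairro=&amp;opcoes=cidade_opcoes&amp;submit=12345678&amp;hl=pt';

interface CityLoc {
  state: string;
  city: string;
  url: string;
}

// Hypothetical helper: decode one <loc> value into state, city, and fetchable URL.
function parseCityLoc(loc: string): CityLoc | null {
  const url = loc.replace(/&amp;/g, '&');

  let params: URLSearchParams;
  try {
    params = new URL(url).searchParams;
  } catch {
    return null; // not an absolute URL
  }

  // Keep only Portuguese-language city search pages.
  if (params.get('opcoes') !== 'cidade_opcoes' || params.get('hl') === 'en') return null;

  const state = params.get('uf');
  const city = params.get('cidade'); // already percent-decoded, "+" turned into " "
  if (!state || !city) return null;

  return { state, city, url };
}

console.log(parseCityLoc(SAMPLE_LOC));
```

If the `&amp;` unescaping step is skipped, the query parameters after the first one are named `amp;cidade`, `amp;opcoes`, and so on, so the lookup for `opcoes` fails and the entry is dropped. That makes this a quick way to confirm the entity handling before running the full sitemap fetch.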

- [ ] **Step 3: Commit**

```bash
git add scripts/import-horariodemissa.ts
git commit -m "feat: horariodemissa importer scaffold + sitemap enumeration"
```

---

### Task 4: HTML parsing

**Files:**
- Modify: `scripts/import-horariodemissa.ts`

- [ ] **Step 1: Understand the dual-source page structure**

Each city page contains two data sources per church, joined by the same key (e.g. `dvey2`):

**Source A:** JS `h.push()` strings embedded in `<script>` (sidebar/map):
```
h.push('<p><strong><a href="igreja.php?k=dvey2">NAME</a></strong><br/>Rua X, 123</p><p><strong>Telefone:</strong> (11) 1234-5678</p>');
```
Contains: key, name, address, phone.

**Source B:** Server-rendered `.result` divs:
```html
<div class="result">
  <a href="igreja.php?k=dvey2" class="result_title">NAME</a>
  <p class="blockleft"><table>
    <tr><td style="...">Domingo:</td><td>07:30, 10:30</td></tr>
  </table></p>
</div>
```
Contains: key + schedule tables (first = masses, optional second = confessions).
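
Since the two sources are joined purely on the church key, a quick diagnostic is to list keys that appear in only one of them. The sketch below does that on a made-up page fragment; `diffSourceKeys`, `keysFromJs`, `keysFromDom`, and the key `abc12` are hypothetical and exist only for illustration.

```typescript
// Made-up page fragment in the two shapes described above; "abc12" is a hypothetical key.
const SAMPLE_PAGE = [
  '<script>',
  'h.push(\'<p><strong><a href="igreja.php?k=dvey2">NAME</a></strong><br/>Rua X, 123</p>\');',
  'h.push(\'<p><strong><a href="igreja.php?k=abc12">OTHER</a></strong><br/>Rua Y, 9</p>\');',
  '</script>',
  '<div class="result">',
  '  <a href="igreja.php?k=dvey2" class="result_title">NAME</a>',
  '</div>',
].join('\n');

const KEY_PATTERN = /igreja\.php\?k=([a-zA-Z0-9]+)/;

// Keys found in Source A: one per h.push('...') call.
function keysFromJs(html: string): string[] {
  const keys: string[] = [];
  const pushPattern = /h\.push\('([\s\S]*?)'\);/g;
  let push: RegExpExecArray | null;
  while ((push = pushPattern.exec(html)) !== null) {
    const key = KEY_PATTERN.exec(push[1]);
    if (key) keys.push(key[1]);
  }
  return keys;
}

// Keys found in Source B: the first church link after each result marker.
function keysFromDom(html: string): string[] {
  const keys: string[] = [];
  const parts = html.split('<div class="result">').slice(1);
  for (const part of parts) {
    const key = KEY_PATTERN.exec(part);
    if (key) keys.push(key[1]);
  }
  return keys;
}

// Keys present in one source but missing from the other.
function diffSourceKeys(html: string): { onlyInJs: string[]; onlyInDom: string[] } {
  const js = keysFromJs(html);
  const dom = keysFromDom(html);
  return {
    onlyInJs: js.filter((k) => dom.indexOf(k) === -1),
    onlyInDom: dom.filter((k) => js.indexOf(k) === -1),
  };
}

console.log(diffSourceKeys(SAMPLE_PAGE));
```

On the sample, `abc12` shows up only in the JS data. A church like that would carry name and address but no schedules, so a large `onlyInJs` or `onlyInDom` list on a real page is a hint that one of the two regexes has stopped matching the markup.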

- [ ] **Step 2: Add parseDayLabel, parseTimeCells, parseMassTable, parseConfessionTable**

```typescript
// ─── HTML Parsers ─────────────────────────────────────────────────────────────

export function parseDayLabel(label: string): { dayOfWeek: number; notes: string | null } | null {
  // Trim first so a trailing colon is still removed when the label has trailing whitespace
  const normalized = label.toLowerCase().trim().replace(/:$/, '').trim();

  if (SPECIAL_DAY_MAP[normalized]) {
    const s = SPECIAL_DAY_MAP[normalized];
    return { dayOfWeek: s.dayOfWeek, notes: s.notes };
  }

  if (DAY_MAP[normalized] !== undefined) {
    return { dayOfWeek: DAY_MAP[normalized], notes: null };
  }

  return null;
}

export function parseTimeCells(timesText: string): Array<{ time: string; notes: string | null }> {
  const results: Array<{ time: string; notes: string | null }> = [];

  // Split by comma but not inside parentheses
  const parts = timesText.split(/,(?![^(]*\))/);

  for (const part of parts) {
    const trimmed = part.trim();
    if (!trimmed) continue;

    const timeMatch = trimmed.match(/\b(\d{1,2}:\d{2})\b/);
    if (!timeMatch) continue;

    // Normalize "7:30" to "07:30"
    const [h, m] = timeMatch[1].split(':');
    const time = `${h.padStart(2, '0')}:${m}`;

    const notesMatch = trimmed.match(/\(([^)]+)\)/);
    results.push({ time, notes: notesMatch ? notesMatch[1].trim() : null });
  }

  return results;
}

export function parseMassTable(tableHtml: string): ParsedSchedule[] {
  const schedules: ParsedSchedule[] = [];

  for (const rowMatch of tableHtml.matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi)) {
    // Cell text with any nested tags stripped
    const tds = [...rowMatch[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
      .map(m => m[1].replace(/<[^>]+>/g, '').trim());

    if (tds.length < 2) continue;

    const dayResult = parseDayLabel(tds[0]);
    if (!dayResult) continue;

    for (const { time, notes } of parseTimeCells(tds[1])) {
      schedules.push({
        dayOfWeek: dayResult.dayOfWeek,
        time,
        notes: [dayResult.notes, notes].filter(Boolean).join('; ') || null,
      });
    }
  }

  return schedules;
}

export function parseConfessionTable(tableHtml: string): ParsedConfession[] {
  const confessions: ParsedConfession[] = [];

  for (const rowMatch of tableHtml.matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi)) {
    const tds = [...rowMatch[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
      .map(m => m[1].replace(/<[^>]+>/g, '').trim());

    if (tds.length < 2) continue;

    const dayResult = parseDayLabel(tds[0]);
    if (!dayResult) continue;

    // "09:00 às 11:00" or "09:00 a 11:00"
    const rangeMatch = tds[1].match(/(\d{1,2}:\d{2})\s+(?:às|a)\s+(\d{1,2}:\d{2})/i);
    if (!rangeMatch) continue;

    const pad = (t: string) => { const [hh, mm] = t.split(':'); return `${hh.padStart(2, '0')}:${mm}`; };
    confessions.push({
      dayOfWeek: dayResult.dayOfWeek,
      startTime: pad(rangeMatch[1]),
      endTime: pad(rangeMatch[2]),
      notes: dayResult.notes,
    });
  }

  return confessions;
}

/**
 * Parse a full city page HTML into church records.
 * Joins h.push() JS data (name/address/phone) with .result DOM (schedules) by church key.
 */
export function parseCityPage(html: string, city: string, state: string): ParsedChurch[] {
  // Parse Source A: h.push() JS strings → name, address, phone
  const jsData = new Map<string, { name: string; address: string | null; phone: string | null }>();

  for (const pushMatch of html.matchAll(/h\.push\('([\s\S]*?)'\);/g)) {
    const content = pushMatch[1].replace(/\\'/g, "'");

    const keyMatch = content.match(/igreja\.php\?k=([a-zA-Z0-9]+)/);
    if (!keyMatch) continue;

    const nameMatch = content.match(/igreja\.php\?k=[^"]+">([^<]+)<\/a>/);
    const addrMatch = content.match(/<br\/>([^<]+)<\/p>/);
    const phoneMatch = content.match(/Telefone:<\/strong>\s*([^<]+)/);

    jsData.set(keyMatch[1], {
      name: nameMatch ? nameMatch[1].trim() : '',
      address: addrMatch ? addrMatch[1].trim() || null : null,
      phone: phoneMatch ? phoneMatch[1].trim() || null : null,
    });
  }

  // Parse Source B: .result divs → schedules
  // Use split() rather than a lookahead regex: a lookahead with $ drops the last result div
  const scheduleData = new Map<string, { massSchedules: ParsedSchedule[]; confessionSchedules: ParsedConfession[] }>();

  const resultParts = html.split('<div class="result">');
  for (let i = 1; i < resultParts.length; i++) {
    const resultHtml = resultParts[i];

    const keyMatch = resultHtml.match(/href="igreja\.php\?k=([a-zA-Z0-9]+)"/);
    if (!keyMatch) continue;

    // First table = masses, optional second table = confessions
    const tables = [...resultHtml.matchAll(/<table>([\s\S]*?)<\/table>/g)].map(m => m[1]);
    scheduleData.set(keyMatch[1], {
      massSchedules: tables[0] ? parseMassTable(tables[0]) : [],
      confessionSchedules: tables[1] ? parseConfessionTable(tables[1]) : [],
    });
  }

  // Join both sources by church key: every church in jsData gets its schedules from scheduleData.
  // Keys that appear only in scheduleData have no name and are skipped.
  const allKeys = new Set([...jsData.keys(), ...scheduleData.keys()]);
  const churches: ParsedChurch[] = [];

  for (const key of allKeys) {
    const js = jsData.get(key);
    const sched = scheduleData.get(key);
    if (!js?.name) continue;

    churches.push({
      key,
      name: js.name,
      address: js.address,
      phone: js.phone,
      city,
      state,
      massSchedules: sched?.massSchedules ?? [],
      confessionSchedules: sched?.confessionSchedules ?? [],
    });
  }

  return churches;
}
```
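
The comma split in `parseTimeCells` is the least obvious line in the parser, so it may help to see it in isolation. The sketch below reuses the same regex on a made-up cell; `splitTimeCell` and `SAMPLE_CELL` are hypothetical names, and real pages may word their notes differently.

```typescript
// Split on commas that are not followed by a ")" before the next "(",
// i.e. commas that sit outside a parenthesised note.
const COMMA_OUTSIDE_PARENS = /,(?![^(]*\))/;

function splitTimeCell(cell: string): string[] {
  return cell
    .split(COMMA_OUTSIDE_PARENS)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Hypothetical cell text with a comma inside the note.
const SAMPLE_CELL = '07:30, 10:30 (crianças, 1º domingo), 19:00';

console.log(splitTimeCell(SAMPLE_CELL));
// → [ '07:30', '10:30 (crianças, 1º domingo)', '19:00' ]
```

The lookahead only inspects the text up to the next `(`, so it assumes parentheses are balanced and not nested. A stray `)` later in the cell would suppress the splits before it, which is worth keeping in mind if some churches turn up with a single merged time entry.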

- [ ] **Step 3: Verify parsing against a live city page**

```bash
npx tsx -e "
import dotenv from 'dotenv';
dotenv.config({ path: '.env' });
const { parseCityPage } = await import('./scripts/import-horariodemissa.ts');
const url = 'https://horariodemissa.com.br/search.php?uf=SP&cidade=S%C3%A3o+Paulo&bairro=&opcoes=cidade_opcoes&submit=12345678&hl=pt';
const html = await fetch(url, { headers: { 'User-Agent': 'NearestMass-Importer/1.0' } }).then(r => r.text());
const churches = parseCityPage(html, 'São Paulo', 'SP');
console.log('Churches found:', churches.length);
console.log('With schedules:', churches.filter(c => c.massSchedules.length > 0).length);
console.log('Sample:', JSON.stringify(churches[0], null, 2));
"
```

Expected: 20+ churches found, the majority with mass schedules; the first entry shows name, address, phone, and schedules.

- [ ] **Step 4: Commit**

```bash
git add scripts/import-horariodemissa.ts
git commit -m "feat: horariodemissa HTML parser (day mapping, schedule tables, dual-source join)"
```

---

### Task 5: DB upsert + main()

**Files:**
- Modify: `scripts/import-horariodemissa.ts`

- [ ] **Step 1: Add geocode helper**

```typescript
// ─── Geocoding ────────────────────────────────────────────────────────────────

async function geocodeAddress(address: string, city: string, state: string): Promise<{ lat: number; lng: number } | null> {
  const query = [address, city, state, 'Brasil'].filter(Boolean).join(', ');
  const url = `${NOMINATIM_URL}?q=${encodeURIComponent(query)}&format=json&limit=1&countrycodes=br`;
  // Wait before every call to stay under the Nominatim rate limit
  await delay(NOMINATIM_DELAY_MS);

  try {
    const response = await fetch(url, {
      headers: { 'User-Agent': USER_AGENT, 'Accept': 'application/json' },
    });
    if (!response.ok) return null;
    const results = await response.json() as Array<{ lat: string; lon: string }>;
    if (!results.length) return null;
    return { lat: parseFloat(results[0].lat), lng: parseFloat(results[0].lon) };
  } catch {
    return null;
  }
}
```
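
For checking the geocoding query without calling Nominatim, the URL construction can be pulled out on its own. The sketch below is an alternative formulation using `URLSearchParams` with the same four query parameters as `geocodeAddress`; `buildGeocodeUrl` and `NOMINATIM_SEARCH` are hypothetical names and the address is made up.

```typescript
const NOMINATIM_SEARCH = 'https://nominatim.openstreetmap.org/search';

// Hypothetical helper: same query parameters as geocodeAddress, built with URLSearchParams.
function buildGeocodeUrl(address: string | null, city: string, state: string): string {
  // Drop empty parts so a missing address does not leave a stray ", " in the query.
  const query = [address, city, state, 'Brasil']
    .filter((part): part is string => !!part)
    .join(', ');

  const params = new URLSearchParams({
    q: query,
    format: 'json',
    limit: '1',
    countrycodes: 'br',
  });
  return `${NOMINATIM_SEARCH}?${params.toString()}`;
}

console.log(buildGeocodeUrl('Rua X, 123', 'São Paulo', 'SP'));
```

Printing the URL for a few real addresses during a `--dry-run` is a cheap way to spot malformed queries (empty city, duplicated state) before spending rate-limited requests on them.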

- [ ] **Step 2: Add upsertChurch function**

Note: `latitude`/`longitude` are non-nullable in the schema. Use `0` as the sentinel for "no coordinates yet" (the geocode pass will fill these in). The `source` field must be set explicitly: the schema default is `"masstimes"`, which would corrupt source-based queries.

```typescript
// ─── DB Upsert ────────────────────────────────────────────────────────────────

async function upsertChurch(
  parsed: ParsedChurch,
  existingChurches: ExistingChurch[],
  args: CLIArgs,
  stats: ImportStats
): Promise<void> {
  const candidate = { name: parsed.name, lat: 0, lng: 0, horarioDemissaId: parsed.key };
  const existing = findDuplicateChurch(candidate, existingChurches);

  if (args.dryRun) {
    console.log(` [dry-run] ${existing ? 'UPDATE' : 'CREATE'} ${parsed.name} (${parsed.key})`);
    if (existing) stats.churchesUpdated++; else stats.churchesCreated++;
    return;
  }

  try {
    let churchId: string;

    await prisma.$transaction(async (tx) => {
      const church = await tx.church.upsert({
        where: { horarioDemissaId: parsed.key },
        create: {
          horarioDemissaId: parsed.key,
          name: parsed.name,
          address: parsed.address,
          city: parsed.city,
          state: parsed.state,
          country: 'BR',
          phone: parsed.phone,
          source: 'horario-demissa', // must set explicitly; schema default is "masstimes"
          latitude: 0, // sentinel for "no coordinates"; geocode pass fills this in
          longitude: 0,
          lastScrapedAt: new Date(),
          scrapeStrategy: 'horario-demissa',
        },
        update: {
          name: parsed.name,
          address: parsed.address ?? undefined,
          city: parsed.city,
          state: parsed.state,
          phone: parsed.phone ?? undefined,
          lastScrapedAt: new Date(),
        },
      });
      churchId = church.id;

      // Replace schedules wholesale: delete the old rows, then insert the freshly parsed ones
      await tx.massSchedule.deleteMany({ where: { churchId: church.id } });

      if (parsed.massSchedules.length > 0) {
        // Deduplicate by day+time before inserting
        const seen = new Set<string>();
        const deduped = parsed.massSchedules.filter((s) => {
          const k = `${s.dayOfWeek}:${s.time}`;
          return seen.has(k) ? false : (seen.add(k), true);
        });

        await tx.massSchedule.createMany({
          data: deduped.map((s) => ({
            churchId: church.id,
            dayOfWeek: s.dayOfWeek,
            time: s.time,
            notes: s.notes,
          })),
        });
        stats.massSchedulesCreated += deduped.length;
      }

      await tx.confessionSchedule.deleteMany({ where: { churchId: church.id } });
      if (parsed.confessionSchedules.length > 0) {
        await tx.confessionSchedule.createMany({
          data: parsed.confessionSchedules.map((c) => ({
            churchId: church.id,
            dayOfWeek: c.dayOfWeek,
            startTime: c.startTime,
            endTime: c.endTime,
            notes: c.notes,
          })),
        });
      }
    });

    if (existing) {
      stats.churchesUpdated++;
    } else {
      stats.churchesCreated++;
      // Use real DB UUID (churchId!) not the source key string
      existingChurches.push({
        id: churchId!, name: parsed.name, latitude: 0, longitude: 0,
        osmId: null, baiduId: null, masstimesId: null, orarimesseId: null,
        massSchedulesPhId: null, philmassId: null, horariosMisasId: null,
        mszeInfoId: null, weekdayMassesId: null, messesInfoId: null,
        bohosluzbyId: null, miserendId: null, kerknetId: null,
        gottesdienstzeitenId: null, horarioDemissaId: parsed.key, misasOrgId: null,
        source: 'horario-demissa', website: null, phone: parsed.phone,
        address: parsed.address, country: 'BR',
      });
    }
  } catch (error) {
    console.error(` Error upserting ${parsed.name}: ${error instanceof Error ? error.message : error}`);
    stats.errors++;
  }
}
```

- [ ] **Step 3: Add geocodeOnly pass**

Note: `latitude` is non-nullable (`Float` in schema), so `{ latitude: null }` will never match. Use `{ latitude: 0 }`, the sentinel value set on creation for address-only churches.

```typescript
async function runGeocodeOnly(stats: ImportStats): Promise<void> {
  console.log('\nGeocoding Brazilian churches without coordinates...');
  const churches = await prisma.church.findMany({
    where: { horarioDemissaId: { not: null }, latitude: 0, address: { not: null } },
    select: { id: true, name: true, address: true, city: true, state: true },
  });
  console.log(`Found ${churches.length} churches to geocode`);

  for (const church of churches) {
    const coords = await geocodeAddress(church.address!, church.city ?? '', church.state ?? '');
    if (coords) {
      await prisma.church.update({ where: { id: church.id }, data: { latitude: coords.lat, longitude: coords.lng } });
      stats.geocoded++;
      console.log(` Geocoded: ${church.name} → ${coords.lat}, ${coords.lng}`);
    } else {
      stats.geocodeFailed++;
    }
  }
}
```
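
One caveat with the `0` sentinel: latitude 0, longitude 0 is also a real point (in the Gulf of Guinea), so any distance-based logic that sees two un-geocoded rows would measure them as being in the same place. Whether the proximity pass in `church-matcher` is affected depends on code not shown in this plan. As a hedge, a guard along these lines could be applied before distance comparisons; `Coordinates`, `isUngeocoded`, and `withRealCoordinates` are hypothetical names.

```typescript
interface Coordinates {
  latitude: number;
  longitude: number;
}

// Hypothetical helper: true when a row still carries the (0, 0) placeholder.
function isUngeocoded(church: Coordinates): boolean {
  return church.latitude === 0 && church.longitude === 0;
}

// Hypothetical helper: keep only rows that are safe to use in distance checks.
function withRealCoordinates<T extends Coordinates>(churches: T[]): T[] {
  return churches.filter((church) => !isUngeocoded(church));
}
```

The check tests both fields, so a church that genuinely sits on the equator or the prime meridian (one coordinate equal to 0) is still treated as geocoded.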

- [ ] **Step 4: Add CLI arg parser + main()**

```typescript
// ─── CLI + Main ───────────────────────────────────────────────────────────────

function parseArgs(): CLIArgs {
  const argv = process.argv.slice(2);
  const idx = (flag: string) => argv.indexOf(flag);
  return {
    all: argv.includes('--all'),
    state: idx('--state') >= 0 ? argv[idx('--state') + 1] : undefined,
    dryRun: argv.includes('--dry-run'),
    geocode: argv.includes('--geocode'),
    geocodeOnly: argv.includes('--geocode-only'),
    resumeFrom: idx('--resume-from') >= 0 ? parseInt(argv[idx('--resume-from') + 1], 10) : undefined,
    jobId: idx('--job-id') >= 0 ? argv[idx('--job-id') + 1] : undefined,
  };
}

async function main(): Promise<void> {
  const args = parseArgs();
  const stats: ImportStats = {
    citiesProcessed: 0, churchesFound: 0, churchesCreated: 0,
    churchesUpdated: 0, massSchedulesCreated: 0,
    geocoded: 0, geocodeFailed: 0, errors: 0,
  };

  console.log('\n' + '='.repeat(70));
  console.log('HORARIO DE MISSA (BRAZIL) IMPORTER');
  console.log('='.repeat(70));
  console.log(`Mode: ${args.geocodeOnly ? 'geocode-only' : args.dryRun ? 'dry-run' : 'import'}`);
  if (args.state) console.log(`State filter: ${args.state}`);
  if (args.resumeFrom) console.log(`Resume from: ${args.resumeFrom}`);
  console.log(`Time: ${new Date().toISOString()}\n`);

  try {
    if (args.geocodeOnly) {
      await runGeocodeOnly(stats);
    } else if (args.all || args.state) {
      console.log('Loading existing BR churches...');
      const existingChurches = await prisma.church.findMany({
        where: { country: 'BR' },
        select: {
          id: true, name: true, latitude: true, longitude: true,
          osmId: true, baiduId: true, masstimesId: true, orarimesseId: true,
          massSchedulesPhId: true, philmassId: true, horariosMisasId: true,
          mszeInfoId: true, weekdayMassesId: true, messesInfoId: true,
          bohosluzbyId: true, miserendId: true, kerknetId: true,
          gottesdienstzeitenId: true, horarioDemissaId: true, misasOrgId: true,
          source: true, website: true, phone: true, address: true, country: true,
        },
      }) as ExistingChurch[];
      console.log(`Loaded ${existingChurches.length} existing BR churches\n`);

      const cities = await fetchCityUrls(args.state);
      // --resume-from is a 0-based index into the sorted city list
      const startIndex = args.resumeFrom ?? 0;

      for (let i = startIndex; i < cities.length; i++) {
        const { state, city, url } = cities[i];
        console.log(`[${i + 1}/${cities.length}] ${state} / ${city}`);

        const html = await fetchPage(url);
        if (!html) { stats.errors++; continue; }

        const churches = parseCityPage(html, city, state);
        stats.churchesFound += churches.length;
        stats.citiesProcessed++;
        console.log(` ${churches.length} churches`);
|
|||
|
|
|
|||
|
|
for (const church of churches) {
|
|||
|
|
await upsertChurch(church, existingChurches, args, stats);
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
if (args.geocode && !args.dryRun) {
|
|||
|
|
for (const church of churches) {
|
|||
|
|
if (!church.address) continue;
|
|||
|
|
const dbChurch = await prisma.church.findUnique({
|
|||
|
|
where: { horarioDemissaId: church.key },
|
|||
|
|
select: { id: true, latitude: true },
|
|||
|
|
});
|
|||
|
|
// latitude === 0 is the sentinel for "no real coordinates yet"
|
|||
|
|
if (dbChurch && dbChurch.latitude === 0) {
|
|||
|
|
const coords = await geocodeAddress(church.address, church.city, church.state);
|
|||
|
|
if (coords) {
|
|||
|
|
await prisma.church.update({ where: { id: dbChurch.id }, data: { latitude: coords.lat, longitude: coords.lng } });
|
|||
|
|
stats.geocoded++;
|
|||
|
|
} else {
|
|||
|
|
stats.geocodeFailed++;
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
} else {
|
|||
|
|
console.error('Usage: --all | --state XX | --geocode-only');
|
|||
|
|
process.exit(1);
|
|||
|
|
}
|
|||
|
|
} finally {
|
|||
|
|
await prisma.$disconnect();
|
|||
|
|
await pool.end();
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
console.log('\n' + '='.repeat(70));
|
|||
|
|
console.log('SUMMARY');
|
|||
|
|
console.log('='.repeat(70));
|
|||
|
|
console.log(`Cities processed: ${stats.citiesProcessed}`);
|
|||
|
|
console.log(`Churches found: ${stats.churchesFound}`);
|
|||
|
|
console.log(` Created: ${stats.churchesCreated}`);
|
|||
|
|
console.log(` Updated: ${stats.churchesUpdated}`);
|
|||
|
|
console.log(` Errors: ${stats.errors}`);
|
|||
|
|
console.log(`Mass schedules: ${stats.massSchedulesCreated}`);
|
|||
|
|
if (args.geocode || args.geocodeOnly) {
|
|||
|
|
console.log(`Geocoded: ${stats.geocoded} / Failed: ${stats.geocodeFailed}`);
|
|||
|
|
}
|
|||
|
|
console.log('='.repeat(70) + '\n');
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
main().catch(console.error);
|
|||
|
|
```
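`parseArgs` reads `process.argv` directly and hands whatever follows `--resume-from` to `parseInt`, so a non-numeric value becomes `NaN` and a flag that directly follows `--state` would be taken as the state code. A minimal sketch of a stricter, testable variant follows; `parseFlags` and `flagValue` are hypothetical names, and the `CLIArgs` interface is restated here only so the block is self-contained.

```typescript
// Restated from the importer so this sketch compiles on its own.
interface CLIArgs {
  all: boolean;
  state?: string;
  dryRun: boolean;
  geocode: boolean;
  geocodeOnly: boolean;
  resumeFrom?: number;
  jobId?: string;
}

// Return the token after `flag`, or undefined when the flag is absent
// or is immediately followed by another flag (e.g. `--state --dry-run`).
function flagValue(argv: string[], flag: string): string | undefined {
  const i = argv.indexOf(flag);
  if (i < 0) return undefined;
  const value = argv[i + 1];
  return value !== undefined && !value.startsWith('--') ? value : undefined;
}

// Pure function of argv, so it can be unit-tested without touching process.argv.
function parseFlags(argv: string[]): CLIArgs {
  const resumeRaw = flagValue(argv, '--resume-from');
  const resumeFrom = resumeRaw !== undefined ? Number.parseInt(resumeRaw, 10) : undefined;
  if (resumeFrom !== undefined && (Number.isNaN(resumeFrom) || resumeFrom < 0)) {
    throw new Error(`--resume-from expects a non-negative integer, got "${resumeRaw}"`);
  }
  return {
    all: argv.includes('--all'),
    state: flagValue(argv, '--state'),
    dryRun: argv.includes('--dry-run'),
    geocode: argv.includes('--geocode'),
    geocodeOnly: argv.includes('--geocode-only'),
    resumeFrom,
    jobId: flagValue(argv, '--job-id'),
  };
}
```

If adopted, the script would call `parseFlags(process.argv.slice(2))` in place of `parseArgs()`.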

- [ ] **Step 5: Test dry-run on small state**

```bash
npx tsx scripts/import-horariodemissa.ts --state DF --dry-run
```

Expected: Lists churches from Distrito Federal (Brasília) without DB writes.

- [ ] **Step 6: Test real import on smallest state (Roraima)**

```bash
npx tsx scripts/import-horariodemissa.ts --state RR
```

Then verify:
```bash
npx tsx -e "
import 'dotenv/config'; // side-effect import: loads .env before ./src/lib/db.ts is evaluated
import { prisma } from './src/lib/db.ts';
const count = await prisma.church.count({ where: { country: 'BR' } });
const sched = await prisma.massSchedule.count({ where: { church: { country: 'BR' } } });
console.log('BR churches:', count, '| Mass schedules:', sched);
await prisma.\$disconnect();
"
```

Expected: Some churches from Roraima with mass schedules in DB.

- [ ] **Step 7: Commit**

```bash
git add scripts/import-horariodemissa.ts
git commit -m "feat: complete horariodemissa importer (Brazil, 8895 churches + 28523 mass times)"
```

---

## Chunk 3: Spain Importer (import-misas.ts)

> Depends on Chunk 1. Can run in parallel with Chunk 2.

### Task 6: API pagination + boilerplate

**Files:**
- Create: `scripts/import-misas.ts`

- [ ] **Step 1: Create script with boilerplate + API pagination**

Create `scripts/import-misas.ts`:

```typescript
#!/usr/bin/env tsx
/**
 * Import Catholic churches from misas.org (Spain)
 *
 * misas.org lists 17,919 Spanish parishes with name, address, coordinates,
 * and province via a public JSON REST API. Mass schedules are auth-gated
 * (401 on detail endpoint), so this importer creates/updates church records
 * only, with no schedule data.
 *
 * The listing API accepts offset-based pagination. We use Madrid as the center
 * with a large radius (999999m) to cover all of Spain in a single stream.
 *
 * Import strategy:
 *   1. Paginate GET /api/parishsearch?country=es&pos=[...]&offset=N&limit=500
 *   2. For each parish: id, name, addr, loc (city), prov (province), zip, lat, long
 *   3. Match against existing ES churches by misasOrgId or proximity+name
 *   4. Upsert church record (no mass schedules)
 *
 * Usage:
 *   npx tsx scripts/import-misas.ts --all
 *   npx tsx scripts/import-misas.ts --all --dry-run
 *   npx tsx scripts/import-misas.ts --all --resume-from 5000
 *   npx tsx scripts/import-misas.ts --all --job-id {uuid}
 */

import dotenv from 'dotenv';
import path from 'path';

dotenv.config({ path: path.resolve(process.cwd(), '.env.local') });
dotenv.config({ path: path.resolve(process.cwd(), '.env') });

import { Pool } from 'pg';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';

const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:postgres@localhost:5432/nearestmass';
console.log(`Connecting to database: ${dbUrl.replace(/:[^:@]+@/, ':***@')}`);
const pool = new Pool({
  connectionString: dbUrl,
  ssl: dbUrl.includes('neon') ? { rejectUnauthorized: false } : undefined,
});
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });

import { findDuplicateChurch } from '../src/lib/church-matcher';
import type { ExistingChurch } from '../src/lib/church-matcher';

// ─── Constants ───────────────────────────────────────────────────────────────

const API_BASE = 'https://misas.org/api/parishsearch';
// Madrid coordinates, large radius covers all of Spain
const SPAIN_POS = encodeURIComponent('[-3.7038,40.4168,999999]');
const PAGE_SIZE = 500;
const REQUEST_DELAY_MS = 500;
const USER_AGENT = 'NearestMass-Importer/1.0 (parish data aggregator; contact: privacy@nearestmass.com)';

// ─── Types ───────────────────────────────────────────────────────────────────

interface MisasParish {
  id: number;
  name: string;
  uri: string;
  addr: string;
  loc: string; // city
  prov: string; // province
  zip: string;
  lat: string;
  long: string;
}

interface MisasApiResponse {
  count: number;
  pars: MisasParish[];
}

interface CLIArgs {
  all: boolean;
  dryRun: boolean;
  resumeFrom?: number;
  jobId?: string;
}

interface ImportStats {
  total: number;
  created: number;
  updated: number;
  errors: number;
}

// ─── HTTP Client ──────────────────────────────────────────────────────────────

let requestCount = 0;

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function fetchParishes(offset: number): Promise<MisasApiResponse | null> {
  if (requestCount > 0) await delay(REQUEST_DELAY_MS);
  requestCount++;

  const url = `${API_BASE}?country=es&pos=${SPAIN_POS}&offset=${offset}&limit=${PAGE_SIZE}`;

  try {
    const response = await fetch(url, {
      headers: {
        'User-Agent': USER_AGENT,
        'Accept': 'application/json',
        'Referer': 'https://misas.org/',
      },
    });

    if (!response.ok) {
      console.error(` HTTP ${response.status} at offset ${offset}`);
      return null;
    }

    return await response.json() as MisasApiResponse;
  } catch (error) {
    console.error(` Fetch error at offset ${offset}: ${error instanceof Error ? error.message : error}`);
    return null;
  }
}

// ─── Pagination ───────────────────────────────────────────────────────────────

export async function* paginateParishes(startOffset: number = 0): AsyncGenerator<MisasParish> {
  let offset = startOffset;
  let totalKnown = Infinity;

  while (offset < totalKnown) {
    console.log(` Fetching offset ${offset}${totalKnown < Infinity ? `/${totalKnown}` : ''}...`);
    const data = await fetchParishes(offset);

    if (!data || !data.pars || data.pars.length === 0) break;

    totalKnown = data.count;
    for (const parish of data.pars) {
      yield parish;
    }

    offset += data.pars.length;
  }
}
```
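The offset loop in `paginateParishes` can be checked offline by swapping the HTTP call for an injected page fetcher. The sketch below is a generic restatement under that assumption: `paginate`, `FetchPage`, `collect`, and the in-memory `fakeFetch` are hypothetical names, and the `{ count, pars }` page shape simply mirrors `MisasApiResponse`.

```typescript
// Page shape mirrors MisasApiResponse: a total count plus one slice of records.
interface Page<T> {
  count: number;
  pars: T[];
}

// Hypothetical injected fetcher; returns null on failure, like fetchParishes.
type FetchPage<T> = (offset: number) => Promise<Page<T> | null>;

async function* paginate<T>(fetchPage: FetchPage<T>, startOffset = 0): AsyncGenerator<T> {
  let offset = startOffset;
  let total = Infinity; // unknown until the first page reports `count`

  while (offset < total) {
    const page = await fetchPage(offset);
    // Stop on a failed request or an empty page rather than looping forever.
    if (!page || page.pars.length === 0) break;

    total = page.count;
    for (const item of page.pars) {
      yield item;
    }
    // Advance by what was actually returned, not by the requested limit.
    offset += page.pars.length;
  }
}

// In-memory stand-in for the API: seven records served three at a time.
const records = [1, 2, 3, 4, 5, 6, 7];
const fakeFetch: FetchPage<number> = async (offset) => ({
  count: records.length,
  pars: records.slice(offset, offset + 3),
});

// Drain an async generator into an array.
async function collect<T>(gen: AsyncGenerator<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of gen) out.push(item);
  return out;
}
```

With this stub, `collect(paginate(fakeFetch, 5))` resolves to the last two records, which is the behaviour `--resume-from` relies on. Resuming by offset is only reliable if the API returns parishes in a stable order between runs; the plan does not state whether it does.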

- [ ] **Step 2: Verify API returns expected data**

```bash
npx tsx -e "
import dotenv from 'dotenv'; dotenv.config({ path: '.env' });
const { paginateParishes } = await import('./scripts/import-misas.ts');
let count = 0;
for await (const p of paginateParishes()) {
  if (count === 0) console.log('First parish:', JSON.stringify(p, null, 2));
  count++;
  if (count >= 5) break;
}
console.log('Fetched:', count, 'from first batch');
"
```

Expected: Parish objects with id, name, lat, long, addr, loc, prov fields.
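The same expectation can be written as a runtime check instead of being eyeballed in the console. This is a minimal sketch that follows the `MisasParish` interface literally; `isMisasParish` is a hypothetical helper, the sample values are invented for illustration, and whether the live API ever omits or nulls a field is not stated in this plan.

```typescript
// Restated from the importer so the sketch is self-contained.
interface MisasParish {
  id: number;
  name: string;
  uri: string;
  addr: string;
  loc: string;
  prov: string;
  zip: string;
  lat: string;
  long: string;
}

const STRING_FIELDS = ['name', 'uri', 'addr', 'loc', 'prov', 'zip', 'lat', 'long'] as const;

// Type guard: true only when every field has the type the interface declares.
function isMisasParish(value: unknown): value is MisasParish {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.id === 'number' &&
    STRING_FIELDS.every((field) => typeof record[field] === 'string')
  );
}

// Invented sample record, for illustration only.
const sampleParish: unknown = {
  id: 1,
  name: 'Parroquia de ejemplo',
  uri: 'parroquia-de-ejemplo',
  addr: 'Calle Mayor 1',
  loc: 'Madrid',
  prov: 'Madrid',
  zip: '28001',
  lat: '40.4168',
  long: '-3.7038',
};
```

A guard like this could sit in the verification snippet to flag the first record that deviates from the assumed shape.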

- [ ] **Step 3: Commit**

```bash
git add scripts/import-misas.ts
git commit -m "feat: misas.org importer scaffold + API pagination"
```

---

### Task 7: DB upsert + main()

**Files:**
- Modify: `scripts/import-misas.ts`

- [ ] **Step 1: Add upsertParish + main()**

Note: `latitude`/`longitude` are `Float` (non-nullable), so use `0` as the sentinel when coordinates are missing. Set `source` explicitly to `'misas-org'` because the schema default is `"masstimes"`.

```typescript
// ─── DB Upsert ────────────────────────────────────────────────────────────────

async function upsertParish(
  parish: MisasParish,
  existingChurches: ExistingChurch[],
  args: CLIArgs,
  stats: ImportStats
): Promise<void> {
  const lat = parseFloat(parish.lat);
  const lng = parseFloat(parish.long);
  const misasOrgId = String(parish.id);
  const resolvedLat = isNaN(lat) ? 0 : lat;
  const resolvedLng = isNaN(lng) ? 0 : lng;

  const candidate = {
    name: parish.name,
    lat: resolvedLat,
    lng: resolvedLng,
    misasOrgId,
  };

  const existing = findDuplicateChurch(candidate, existingChurches);

  if (args.dryRun) {
    console.log(` [dry-run] ${existing ? 'UPDATE' : 'CREATE'} ${parish.name} (${misasOrgId})`);
    stats.total++;
    if (existing) stats.updated++; else stats.created++;
    return;
  }

  try {
    // When the matcher found a row (by ID or by proximity + name), target it by primary
    // key so the update branch runs and stamps misasOrgId. Keying on misasOrgId alone
    // would create a duplicate row for proximity matches.
    // Assumption: findDuplicateChurch returns the matched ExistingChurch (with `id`) or a falsy value.
    const church = await prisma.church.upsert({
      where: existing ? { id: existing.id } : { misasOrgId },
      create: {
        misasOrgId,
        name: parish.name,
        address: parish.addr || null,
        city: parish.loc || null,
        state: parish.prov || null,
        zip: parish.zip || null,
        country: 'ES',
        source: 'misas-org', // must set explicitly; schema default is "masstimes"
        latitude: resolvedLat, // 0 = no real coordinates; misas.org provides coords for most
        longitude: resolvedLng,
        lastScrapedAt: new Date(),
        scrapeStrategy: 'misas-org',
      },
      update: {
        name: parish.name,
        address: parish.addr || undefined,
        city: parish.loc || undefined,
        state: parish.prov || undefined,
        zip: parish.zip || undefined,
        // Only update coords if we have real values (don't overwrite good data with 0)
        ...(resolvedLat !== 0 && { latitude: resolvedLat, longitude: resolvedLng }),
        misasOrgId, // stamp ID even if matched by proximity
        lastScrapedAt: new Date(),
      },
    });

    if (existing) {
      stats.updated++;
    } else {
      stats.created++;
      existingChurches.push({
        id: church.id, name: parish.name,
        latitude: resolvedLat, longitude: resolvedLng,
        osmId: null, baiduId: null, masstimesId: null, orarimesseId: null,
        massSchedulesPhId: null, philmassId: null, horariosMisasId: null,
        mszeInfoId: null, weekdayMassesId: null, messesInfoId: null,
        bohosluzbyId: null, miserendId: null, kerknetId: null,
        gottesdienstzeitenId: null, horarioDemissaId: null, misasOrgId,
        source: 'misas-org', website: null, phone: null,
        address: parish.addr || null, country: 'ES',
      });
    }
    stats.total++;
  } catch (error) {
    console.error(` Error upserting ${parish.name}: ${error instanceof Error ? error.message : error}`);
    stats.errors++;
    stats.total++; // count errors in total so progress log fires correctly
  }
}

// ─── CLI + Main ───────────────────────────────────────────────────────────────

// Note: --job-id is accepted for scheduler compatibility but BackgroundJob status
// tracking is not wired up in this importer (acceptable for v1; add later if needed).
function parseArgs(): CLIArgs {
  const argv = process.argv.slice(2);
  const idx = (flag: string) => argv.indexOf(flag);
  return {
    all: argv.includes('--all'),
    dryRun: argv.includes('--dry-run'),
    resumeFrom: idx('--resume-from') >= 0 ? parseInt(argv[idx('--resume-from') + 1], 10) : undefined,
    jobId: idx('--job-id') >= 0 ? argv[idx('--job-id') + 1] : undefined,
  };
}

async function main(): Promise<void> {
  const args = parseArgs();
  const stats: ImportStats = { total: 0, created: 0, updated: 0, errors: 0 };

  console.log('\n' + '='.repeat(70));
  console.log('MISAS.ORG (SPAIN) IMPORTER');
  console.log('='.repeat(70));
  console.log(`Mode: ${args.dryRun ? 'dry-run' : 'import'}`);
  if (args.resumeFrom) console.log(`Resume from offset: ${args.resumeFrom}`);
  console.log(`Time: ${new Date().toISOString()}\n`);

  if (!args.all) {
    console.error('Usage: --all [--dry-run] [--resume-from N]');
    process.exit(1);
  }

  try {
    console.log('Loading existing ES churches...');
    const existingChurches = await prisma.church.findMany({
      where: { country: 'ES' },
      select: {
        id: true, name: true, latitude: true, longitude: true,
        osmId: true, baiduId: true, masstimesId: true, orarimesseId: true,
        massSchedulesPhId: true, philmassId: true, horariosMisasId: true,
        mszeInfoId: true, weekdayMassesId: true, messesInfoId: true,
        bohosluzbyId: true, miserendId: true, kerknetId: true,
        gottesdienstzeitenId: true, horarioDemissaId: true, misasOrgId: true,
        source: true, website: true, phone: true, address: true, country: true,
      },
    }) as ExistingChurch[];
    console.log(`Loaded ${existingChurches.length} existing ES churches\n`);

    for await (const parish of paginateParishes(args.resumeFrom ?? 0)) {
      await upsertParish(parish, existingChurches, args, stats);

      if (stats.total % 500 === 0) {
        console.log(` Progress: ${stats.total} processed (${stats.created} created, ${stats.updated} updated)`);
      }
    }
  } finally {
    await prisma.$disconnect();
    await pool.end();
  }

  console.log('\n' + '='.repeat(70));
  console.log('SUMMARY');
  console.log('='.repeat(70));
  console.log(`Total processed: ${stats.total}`);
  console.log(` Created: ${stats.created}`);
  console.log(` Updated: ${stats.updated}`);
  console.log(` Errors: ${stats.errors}`);
  console.log('='.repeat(70) + '\n');
}

main().catch(console.error);
```
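The `0` sentinel for missing coordinates can be isolated in a small pure function. The sketch below is a variation rather than the importer's exact logic: `resolveCoords` and `ResolvedCoords` are hypothetical names, and unlike `upsertParish` it treats latitude and longitude as a pair, rejecting the whole pair if either value is unparseable or out of range.

```typescript
interface ResolvedCoords {
  latitude: number;
  longitude: number;
  hasCoords: boolean; // false means the 0/0 sentinel was substituted
}

// Parse the API's string coordinates; fall back to the 0/0 sentinel when
// either value is missing, non-numeric, out of range, or the pair is exactly 0/0.
function resolveCoords(lat: string, long: string): ResolvedCoords {
  const latitude = Number.parseFloat(lat);
  const longitude = Number.parseFloat(long);

  const valid =
    Number.isFinite(latitude) &&
    Number.isFinite(longitude) &&
    Math.abs(latitude) <= 90 &&
    Math.abs(longitude) <= 180 &&
    !(latitude === 0 && longitude === 0);

  return valid
    ? { latitude, longitude, hasCoords: true }
    : { latitude: 0, longitude: 0, hasCoords: false };
}
```

With a helper like this, the `update` branch could spread coordinates only when `hasCoords` is true, which also covers a parish whose latitude parses but whose longitude does not.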

- [ ] **Step 2: Test dry-run end-to-end**

```bash
npx tsx scripts/import-misas.ts --all --dry-run 2>&1 | tail -20
```

Expected: Processes all 17,919 parishes, shows `Total processed: 17919` with created/updated split.

- [ ] **Step 3: Commit**

```bash
git add scripts/import-misas.ts
git commit -m "feat: complete misas.org importer (Spain, 17919 churches with coordinates)"
```

---

## Chunk 4: Integration

### Task 8: package.json + scheduler

**Files:**
- Modify: `package.json`
- Modify: `scripts/scheduler.ts`

- [ ] **Step 1: Add npm scripts**

In `package.json` `"scripts"` block, add after `"import:masstimes-api"`:

```json
"import:horariodemissa": "tsx scripts/import-horariodemissa.ts",
"import:misas": "tsx scripts/import-misas.ts"
```

- [ ] **Step 2: Add getJobCommand cases in scheduler.ts**

In `scripts/scheduler.ts`, add before `default:` in `getJobCommand()`:

```typescript
    case 'horariodemissa-import': {
      const args = ['tsx', 'scripts/import-horariodemissa.ts', '--all'];
      if (config?.state) args.push('--state', String(config.state));
      if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
      if (config?.geocode) args.push('--geocode');
      return { command: 'npx', args };
    }
    case 'misas-import': {
      const args = ['tsx', 'scripts/import-misas.ts', '--all'];
      if (config?.resumeFrom) args.push('--resume-from', String(config.resumeFrom));
      return { command: 'npx', args };
    }
```

- [ ] **Step 3: Add to PIPELINE_GROUPS imports sequence**

In `PIPELINE_GROUPS[0].phases`, add after the `masstimes-api-import` entry:

```typescript
{ name: 'horariodemissa-import', type: 'horariodemissa-import', config: {} },
{ name: 'misas-import', type: 'misas-import', config: {} },
```

- [ ] **Step 4: Verify TypeScript**

```bash
npx tsc --noEmit
```

Expected: no errors.

- [ ] **Step 5: Smoke test both npm scripts**

```bash
npm run import:horariodemissa -- --state DF --dry-run 2>&1 | tail -10
npm run import:misas -- --all --dry-run 2>&1 | tail -10
```

- [ ] **Step 6: Commit**

```bash
git add package.json scripts/scheduler.ts
git commit -m "feat: add horariodemissa and misas.org to npm scripts and scheduler pipeline"
```

---

## Final Verification

- [ ] **Import small state from Brazil to confirm end-to-end**

```bash
npx tsx scripts/import-horariodemissa.ts --state DF
```

```bash
npx tsx -e "
import 'dotenv/config'; // side-effect import: loads .env before ./src/lib/db.ts is evaluated
import { prisma } from './src/lib/db.ts';
const churches = await prisma.church.count({ where: { country: 'BR' } });
const schedules = await prisma.massSchedule.count({ where: { church: { country: 'BR' } } });
console.log('BR churches:', churches, '| Mass schedules:', schedules);
await prisma.\$disconnect();
"
```

Expected: Distrito Federal churches in DB with mass schedules.

- [ ] **Dry-run Spain importer full pass**

```bash
npx tsx scripts/import-misas.ts --all --dry-run 2>&1 | grep -E "SUMMARY|Total|Created|Updated" | tail -10
```

Expected: ~17,919 total, mix of created vs updated depending on existing ES church overlap.