Two new importers:
- horariodemissa.com.br: 8,895 Brazilian churches + 28,523 mass times
- misas.org: 17,919 Spanish churches with coordinates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5.4 KiB
Design: Brazil (horariodemissa.com.br) + Spain (misas.org) Importers
Overview
Two parallel importers targeting the highest-value uncovered regions:
- Brazil — zero current coverage, 8,895 churches + 28,523 mass times
- Spain supplement — 17,919 churches with coordinates (fills gaps vs horariosmisas.com's ~10,000)
Importer 1: import-horariodemissa.ts (Brazil)
Source
- Site: https://horariodemissa.com.br
- Coverage: All 26 Brazilian states + DF
- Data: 8,895 churches, 28,523 mass times (server-rendered, no auth needed)
- robots.txt: disallows only /404.php; otherwise fully permissive
Enumeration Strategy
Fetch https://horariodemissa.com.br/sitemap.xml → extract unique city URLs filtered to hl=pt only (~3,552 unique cities). URL pattern:
https://horariodemissa.com.br/search.php?uf={STATE}&cidade={CITY}&bairro=&opcoes=cidade_opcoes&submit=12345678&hl=pt
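The enumeration step above could be sketched roughly as follows. This is a minimal illustration, not the importer's actual code: `extractCityUrls` is a hypothetical helper, and it assumes the sitemap lists each city page in a `<loc>` element with XML-escaped ampersands and that a city is identified by its `uf` + `cidade` pair.

```typescript
// Hypothetical helper: pull unique Portuguese-language city URLs out of sitemap.xml text.
// Assumption: each entry is a <loc>...</loc> element with "&" escaped as "&amp;".
function extractCityUrls(sitemapXml: string): string[] {
  const seen = new Set<string>();
  const urls: string[] = [];

  for (const match of sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)) {
    const raw = match[1].replace(/&amp;/g, "&");

    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      continue; // skip malformed entries
    }

    // Keep only city search pages in Portuguese.
    if (!url.pathname.endsWith("/search.php")) continue;
    if (url.searchParams.get("hl") !== "pt") continue;

    const uf = url.searchParams.get("uf");
    const cidade = url.searchParams.get("cidade");
    if (!uf || !cidade) continue;

    // De-duplicate on state + city (assumed to identify a city page).
    const key = `${uf}|${cidade}`;
    if (seen.has(key)) continue;
    seen.add(key);
    urls.push(url.toString());
  }
  return urls;
}
```

Keying on `uf|cidade` rather than the full URL means that entries differing only in other query parameters collapse to one city.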
HTML Parsing
Each city page contains .result divs (server-rendered). Per church:
- Key: href of .result_title link → igreja.php?k=XXXXX (alphanumeric, used as horarioDemissaId)
- Name: .result_title link text
- Address: text node after the <br/> in the first <p> within .result
- Phone: <p> containing Telefone:
- Mass schedule: first <table>, with rows of the form <td style="text-align:right;font-weight:bold;">DAY:</td><td>TIMES</td>
- Confession schedule: second <table> (same structure, times as ranges HH:MM às HH:MM)
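As a rough sketch of the schedule-table parsing described above, the rows can be pulled out with a pattern that mirrors the documented cell markup. The helper names are hypothetical, and the regex approach assumes the times cell contains plain text only; a real importer would more likely use an HTML parser.

```typescript
interface ScheduleRow {
  day: string;   // day label without the trailing colon, e.g. "Domingo"
  times: string; // raw cell text, e.g. "07:00, 09:00"
}

// Mirrors the documented row shape: a bold, right-aligned day cell followed by a times cell.
// Assumption: the times cell holds plain text with no nested tags.
const ROW_PATTERN =
  /<td style="text-align:right;font-weight:bold;">\s*([^<:]+):\s*<\/td>\s*<td>([^<]*)<\/td>/g;

function extractScheduleRows(tableHtml: string): ScheduleRow[] {
  const rows: ScheduleRow[] = [];
  for (const m of tableHtml.matchAll(ROW_PATTERN)) {
    rows.push({ day: m[1].trim(), times: m[2].trim() });
  }
  return rows;
}

// Confession cells use ranges such as "15:00 às 17:00".
function parseConfessionRanges(times: string): { start: string; end: string }[] {
  const ranges: { start: string; end: string }[] = [];
  for (const m of times.matchAll(/(\d{2}:\d{2})\s*às\s*(\d{2}:\d{2})/g)) {
    ranges.push({ start: m[1], end: m[2] });
  }
  return ranges;
}
```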
Day Name Mapping
| Portuguese | dayOfWeek |
|---|---|
| Domingo | 0 (Sunday) |
| Segunda-feira | 1 (Monday) |
| Terça-feira | 2 (Tuesday) |
| Quarta-feira | 3 (Wednesday) |
| Quinta-feira | 4 (Thursday) |
| Sexta-feira | 5 (Friday) |
| Sábado | 6 (Saturday) |
| Primeiro Sábado | 6, notes="Primeiro Sábado" |
| Segundo Domingo | 0, notes="Segundo Domingo" |
Time format: HH:MM (24h, already in the correct format). Multiple times are comma-separated.
Notes in parentheses, e.g. (Forma Extraordinária do Rito Romano), are stripped from the time text and stored as massType or notes.
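The day mapping and time format above might be implemented along these lines. This is only a sketch: the function names are hypothetical, qualified labels such as "Primeiro Sábado" are resolved by checking which base day the label ends with, and a parenthetical note is assumed to follow the time it qualifies.

```typescript
interface ParsedMass {
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday
  time: string;      // "HH:MM", 24h
  notes?: string;
}

const BASE_DAYS = new Map<string, number>([
  ["domingo", 0],
  ["segunda-feira", 1],
  ["terça-feira", 2],
  ["quarta-feira", 3],
  ["quinta-feira", 4],
  ["sexta-feira", 5],
  ["sábado", 6],
]);

function parseDayLabel(label: string): { dayOfWeek: number; notes?: string } | null {
  const clean = label.normalize("NFC").trim().replace(/:$/, "");
  const lower = clean.toLowerCase();
  const exact = BASE_DAYS.get(lower);
  if (exact !== undefined) return { dayOfWeek: exact };
  // Qualified labels ("Primeiro Sábado", "Segundo Domingo"): keep the full label as a note.
  for (const [name, dayOfWeek] of BASE_DAYS) {
    if (lower.endsWith(name)) return { dayOfWeek, notes: clean };
  }
  return null;
}

// Assumption: a parenthetical note directly follows the time it applies to.
function parseTimes(cell: string): { time: string; note: string | undefined }[] {
  const out: { time: string; note: string | undefined }[] = [];
  for (const m of cell.matchAll(/(\d{2}:\d{2})\s*(?:\(([^)]*)\))?/g)) {
    out.push({ time: m[1], note: m[2]?.trim() });
  }
  return out;
}

function parseScheduleRow(dayLabel: string, cell: string): ParsedMass[] {
  const day = parseDayLabel(dayLabel);
  if (!day) return [];
  return parseTimes(cell).map(({ time, note }) => {
    const notes = [day.notes, note].filter((n): n is string => Boolean(n)).join("; ");
    return { dayOfWeek: day.dayOfWeek, time, ...(notes ? { notes } : {}) };
  });
}
```

Unrecognized day labels yield an empty list here, so the caller can count them as skipped rather than guessing a weekday.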
Matching Strategy
- horarioDemissaId exact match (for re-runs)
- Name + proximity (200m) against existing BR churches (some may exist from OSM)
- Unmatched: create new church, country=BR, no coordinates
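The matching cascade above can be illustrated with a small standalone sketch. It is not the project's findDuplicateChurch from church-matcher, whose signature is not shown here; `matchChurch`, `ChurchRef`, and the exact-equality name comparison are assumptions made for illustration. Since the Brazilian source carries no coordinates, the proximity step in this sketch only applies once a church has been geocoded.

```typescript
interface ChurchRef {
  id: string;
  name: string;
  horarioDemissaId?: string | null;
  lat?: number | null;
  lng?: number | null;
}

interface IncomingChurch {
  horarioDemissaId: string;
  name: string;
  lat?: number | null;
  lng?: number | null;
}

// Great-circle distance in metres (haversine formula, mean Earth radius).
function haversineMeters(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const R = 6371000;
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Strip accents, case, and punctuation so "Paróquia São José" equals "paroquia sao jose".
function normalizeName(name: string): string {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

function matchChurch(
  incoming: IncomingChurch,
  existing: ChurchRef[],
  radiusMeters = 200,
): ChurchRef | null {
  // Step 1: exact source ID (makes re-runs idempotent).
  const byId = existing.find((c) => c.horarioDemissaId === incoming.horarioDemissaId);
  if (byId) return byId;

  // Step 2: same normalized name within the radius; needs coordinates on both sides.
  if (incoming.lat == null || incoming.lng == null) return null;
  const name = normalizeName(incoming.name);
  for (const c of existing) {
    if (c.lat == null || c.lng == null) continue;
    if (normalizeName(c.name) !== name) continue;
    if (haversineMeters(incoming.lat, incoming.lng, c.lat, c.lng) <= radiusMeters) return c;
  }

  // Step 3: no match, so the caller creates a new church.
  return null;
}
```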
Schema Addition
horarioDemissaId String? @unique @map("horario_demissa_id")
@@index([horarioDemissaId])
CLI
npx tsx scripts/import-horariodemissa.ts --all
npx tsx scripts/import-horariodemissa.ts --all --dry-run
npx tsx scripts/import-horariodemissa.ts --state SP
npx tsx scripts/import-horariodemissa.ts --all --resume-from 500
npx tsx scripts/import-horariodemissa.ts --all --geocode # Nominatim pass
npx tsx scripts/import-horariodemissa.ts --geocode-only
npx tsx scripts/import-horariodemissa.ts --all --job-id {uuid}
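The flags above could be parsed with a small hand-rolled function such as the one below. This is a sketch only: `parseImportArgs` and the `ImportOptions` shape are hypothetical, and what `--resume-from N` counts (presumably an index into the city list) is left to the script.

```typescript
interface ImportOptions {
  all: boolean;
  dryRun: boolean;
  geocode: boolean;
  geocodeOnly: boolean;
  resumeFrom: number;
  state?: string;
  jobId?: string;
}

function parseImportArgs(argv: string[]): ImportOptions {
  const opts: ImportOptions = {
    all: false,
    dryRun: false,
    geocode: false,
    geocodeOnly: false,
    resumeFrom: 0,
  };

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    // Value-taking flags consume the next token.
    const next = (): string => {
      const value = argv[++i];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`Missing value for ${arg}`);
      }
      return value;
    };

    switch (arg) {
      case "--all":
        opts.all = true;
        break;
      case "--dry-run":
        opts.dryRun = true;
        break;
      case "--geocode":
        opts.geocode = true;
        break;
      case "--geocode-only":
        opts.geocodeOnly = true;
        break;
      case "--state":
        opts.state = next().toUpperCase();
        break;
      case "--resume-from": {
        const n = Number.parseInt(next(), 10);
        if (!Number.isInteger(n) || n < 0) {
          throw new Error("--resume-from expects a non-negative integer");
        }
        opts.resumeFrom = n;
        break;
      }
      case "--job-id":
        opts.jobId = next();
        break;
      default:
        throw new Error(`Unknown flag: ${arg}`);
    }
  }
  return opts;
}
```

In a script this would typically be called as `parseImportArgs(process.argv.slice(2))`. Rejecting unknown flags means a typo such as `--dryrun` fails loudly instead of starting a real import.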
Rate Limiting
- City pages: 1.5s between requests (~3,552 × 1.5s ≈ 1.5 hours)
- Geocode (optional): 1.1s between Nominatim requests
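A fixed delay between sequential requests, as described above, might look like the following sketch. The helper names are hypothetical; the estimate function simply reproduces the 3,552 × 1.5s arithmetic and ignores request latency, so actual runs would take somewhat longer.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task` for each item one at a time, pausing `delayMs` between consecutive calls.
async function forEachThrottled<T>(
  items: T[],
  delayMs: number,
  task: (item: T, index: number) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs);
    await task(items[i], i);
  }
}

// Time spent waiting, in seconds, using the document's requests × delay estimate.
function estimateDelaySeconds(requests: number, delayMs: number): number {
  return (requests * delayMs) / 1000;
}
```

For the city pages, `estimateDelaySeconds(3552, 1500)` gives 5,328 seconds, or about 1.5 hours. The 1.1s geocoding delay stays just under the one-request-per-second ceiling in the public Nominatim usage policy.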
Importer 2: import-misas.ts (Spain)
Source
- Site: https://misas.org
- Coverage: Spain only (despite claiming LatAm — API returns 0 for MX/AR/CO)
- Data: 17,919 churches with coordinates, name, address, province, zip
- No mass schedules: detail API returns 401 — church directory only
API
GET https://misas.org/api/parishsearch?country=es&pos=[-3.7038,40.4168,999999]&offset=0&limit=500
Response:
{
  "count": 17919,
  "pars": [
    {
      "id": 16604,
      "name": "Parròquia de Sant Lliser",
      "uri": "parroquia-de-sant-lliser-alos-disil",
      "addr": "Carrer Bonabe, 4",
      "loc": "Alòs d'Isil",
      "prov": "Lérida",
      "zip": "25586",
      "lat": "42.701074",
      "long": "1.100028"
    }
  ]
}
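Because the response carries lat and long as strings, each record needs a small normalization step before it can be matched or stored. The sketch below types the response as shown above; `NormalizedChurch` and its field names are hypothetical, and the ID is converted to a string because misasOrgId is declared as String? in the schema.

```typescript
interface MisasParish {
  id: number;
  name: string;
  uri: string;
  addr: string;
  loc: string;
  prov: string;
  zip: string;
  lat: string;  // decimal degrees, as a string
  long: string; // decimal degrees, as a string
}

interface MisasSearchResponse {
  count: number;
  pars: MisasParish[];
}

// Hypothetical internal shape; field names are illustrative only.
interface NormalizedChurch {
  misasOrgId: string;
  name: string;
  address: string;
  city: string;
  province: string;
  postalCode: string;
  latitude: number;
  longitude: number;
  country: "ES";
}

function normalizeParish(p: MisasParish): NormalizedChurch | null {
  const latitude = Number.parseFloat(p.lat);
  const longitude = Number.parseFloat(p.long);
  // Reject records whose coordinates are missing or out of range.
  if (!Number.isFinite(latitude) || !Number.isFinite(longitude)) return null;
  if (Math.abs(latitude) > 90 || Math.abs(longitude) > 180) return null;

  return {
    misasOrgId: String(p.id),
    name: p.name.trim(),
    address: p.addr.trim(),
    city: p.loc.trim(),
    province: p.prov.trim(),
    postalCode: p.zip.trim(),
    latitude,
    longitude,
    country: "ES",
  };
}

function normalizeResponse(res: MisasSearchResponse): NormalizedChurch[] {
  return res.pars
    .map(normalizeParish)
    .filter((c): c is NormalizedChurch => c !== null);
}
```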
Enumeration Strategy
Paginate with offset in steps of 500 until all 17,919 churches are fetched (~36 requests). Center pos on Madrid (longitude, latitude) and pass 999999 as its third element, the radius, to cover all of Spain.
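The pagination loop can be sketched as below. The function names are hypothetical, and the page fetcher is injected as a parameter so the loop does not depend on any particular HTTP client; the total is taken from the `count` field of each response rather than hard-coded.

```typescript
const SEARCH_BASE = "https://misas.org/api/parishsearch";
// Madrid centre as [longitude, latitude, radius], copied from the request shown above.
const MADRID_POS = "[-3.7038,40.4168,999999]";

function buildSearchUrl(offset: number, limit = 500): string {
  return `${SEARCH_BASE}?country=es&pos=${MADRID_POS}&offset=${offset}&limit=${limit}`;
}

// Offsets needed to cover `total` records in pages of `limit`.
function pageOffsets(total: number, limit = 500): number[] {
  const offsets: number[] = [];
  for (let offset = 0; offset < total; offset += limit) offsets.push(offset);
  return offsets;
}

type PageFetcher = (url: string) => Promise<{ count: number; pars: unknown[] }>;

// Fetches every page sequentially, waiting `delayMs` between requests.
async function fetchAllParishes(
  fetchPage: PageFetcher,
  limit = 500,
  delayMs = 500,
): Promise<unknown[]> {
  const all: unknown[] = [];
  let total = Infinity; // replaced by the server-reported count after the first page
  for (let offset = 0; offset < total; offset += limit) {
    if (offset > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
    const page = await fetchPage(buildSearchUrl(offset, limit));
    total = page.count;
    if (page.pars.length === 0) break; // defensive stop if the API returns an empty page
    all.push(...page.pars);
  }
  return all;
}
```

With 17,919 records and a limit of 500, `pageOffsets` yields 36 offsets (0 through 17,500), matching the request estimate above.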
Matching Strategy
- misasOrgId exact match (for re-runs)
- Name + proximity (200m) against existing ES churches
- Unmatched: create new church with coordinates, country=ES
No mass schedules written — church record only.
Schema Addition
misasOrgId String? @unique @map("misas_org_id")
@@index([misasOrgId])
CLI
npx tsx scripts/import-misas.ts --all
npx tsx scripts/import-misas.ts --all --dry-run
npx tsx scripts/import-misas.ts --all --resume-from 5000
npx tsx scripts/import-misas.ts --all --job-id {uuid}
Rate Limiting
- API pagination: 500ms between requests (~36 calls, minimal impact)
Shared Implementation Patterns
Both scripts follow the standard importer pattern:
// DB setup
dotenv.config(...)
const pool = new Pool({ connectionString: DATABASE_URL })
const prisma = new PrismaClient({ adapter: new PrismaPg(pool) })
// church-matcher integration (the ExistingChurch interface gets the new ID fields added)
import { findDuplicateChurch } from '../src/lib/church-matcher'
// Standard flags: --all, --dry-run, --resume-from N, --job-id UUID
// Stats output: { total, created, updated, skipped, errors }
Both added to:
- package.json scripts
- Scheduler pipeline (sequential imports group)
- church-matcher.ts ExistingChurch interface
Estimated Scale
| | Brazil | Spain |
|---|---|---|
| Churches | 8,895 (all new) | 17,919 (~7,000 new vs horariosmisas) |
| Mass times | 28,523 | 0 (no schedule access) |
| Runtime | ~1.5h | ~5 min |
| Coordinates | No (address only) | Yes |