weekdaymasses.org.uk Global Importer
Context
weekdaymasses.org.uk is a UK-based Catholic directory covering ~3,500-4,000 churches globally with mass schedules, coordinates, addresses, and phone numbers. It covers GB, Ireland, and 49+ international countries (India, Sri Lanka, South Korea, Japan, and more). Each area's data is served on a single HTML page, so no pagination or API is needed.
Data Source
Three area pages cover the entire site:
| Page | URL | Est. Churches |
|---|---|---|
| GB | /en/area/gb/churches | ~3,000+ |
| Ireland | /en/area/ireland/churches | ~300+ |
| Outside GB | /en/area/outside-gb/churches | ~152+ |
Individual country/region pages (e.g. /en/area/india/churches) are subsets of these three.
Data per church
- Name: h3 heading, format "Church Name (Location)"
- Address: plain text after mass times, with postal/zip code
- Coordinates: in map link query params `lat=XX.XXXX&lon=YY.YYYY&church_id=NNNNN`
- Mass times: format `Day: HH.MMam/pm(Language), HH.MMam/pm(Language)`
- Phone: `Tel: +XX XXXX XXXXXX`
- Website: occasional links
- church_id: unique numeric identifier in map links
Mass time format
Sunday: 6.30am(Tamil), 8.30am(Tamil), 5.30pm(English)
Mon Tue Wed Thu Fri: 6.30am(Tamil)
Saturday: 6.30am(Tamil), 5.30pm(English)
Day labels: Sunday, Mon, Tue, Wed, Thu, Fri, Saturday, or combinations like Mon Tue Wed Thu Fri. Holy Day entries also appear.
Time format: H.MMam/pm — needs conversion to 24h HH:MM.
Language in parentheses maps to our language field on mass_schedules.
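The conversion and line parsing described above can be sketched as follows. This is a minimal sketch, not the final implementation: the names `to24h`, `parseScheduleLine`, `MassTime`, and `ScheduleLine` are hypothetical, and it assumes a "Holy Day" label is kept as a single day entry.

```typescript
// Convert a site-style time such as "6.30am" or "5.30pm" to 24h "HH:MM".
// Returns null when the text does not match the H.MMam/pm pattern.
function to24h(raw: string): string | null {
  const m = /^(\d{1,2})\.(\d{2})\s*(am|pm)$/i.exec(raw.trim());
  if (!m) return null;
  const hour12 = parseInt(m[1], 10);
  const minute = parseInt(m[2], 10);
  if (hour12 < 1 || hour12 > 12 || minute > 59) return null;
  const isPm = m[3].toLowerCase() === "pm";
  // 12am is midnight (00) and 12pm is noon (12).
  const hour24 = (hour12 % 12) + (isPm ? 12 : 0);
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return pad(hour24) + ":" + pad(minute);
}

// Hypothetical shapes for one parsed schedule line.
interface MassTime {
  time: string; // 24h "HH:MM"
  language: string | null; // text from the parentheses, if present
}

interface ScheduleLine {
  days: string[];
  masses: MassTime[];
}

// Parse one line such as "Mon Tue Wed Thu Fri: 6.30am(Tamil)".
function parseScheduleLine(line: string): ScheduleLine | null {
  const idx = line.indexOf(":");
  if (idx < 0) return null;
  const label = line.slice(0, idx).trim();
  // Assumption: "Holy Day" stays one label; other labels split on whitespace.
  const days = /^holy day/i.test(label)
    ? [label]
    : label.split(/\s+/).filter((d) => d.length > 0);
  const masses: MassTime[] = [];
  for (const part of line.slice(idx + 1).split(",")) {
    const m = /^\s*([\d.]+\s*(?:am|pm))\s*(?:\(([^)]*)\))?\s*$/i.exec(part);
    if (!m) continue; // skip fragments that are not a time
    const time = to24h(m[1]);
    if (time) masses.push({ time, language: m[2] ? m[2].trim() : null });
  }
  return days.length > 0 && masses.length > 0 ? { days, masses } : null;
}
```

Entries that fail to parse are skipped rather than guessed, so malformed times never reach mass_schedules.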
Country detection
The address is the last line of each church entry. Country can be detected by:
- GB: UK postal code pattern (e.g. `SW1A 1AA`)
- Ireland: Irish Eircode (e.g. `D01 F5P2`) or "Ireland" in address
- India: 6-digit postal code (e.g. `600088`)
- Others: country name at end of address, or fallback to the area page being scraped
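The detection rules above could look roughly like this. It is a sketch under several assumptions: the function name `detectCountry`, the ISO 3166-1 alpha-2 return values, and the small `COUNTRY_NAMES` table are illustrative, and the postcode patterns are simplified rather than full validators.

```typescript
// Simplified UK postcode, e.g. "SW1A 1AA" (not a full validator).
const UK_POSTCODE = /\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b/i;
// Simplified Eircode: 3-character routing key plus 4-character unique id, e.g. "D01 F5P2".
const EIRCODE = /\b(?:[AC-FHKNPRTV-Y]\d{2}|D6W)\s?[0-9AC-FHKNPRTV-Y]{4}\b/i;
// 6-digit postal code with a non-zero first digit, e.g. "600088".
const SIX_DIGIT_CODE = /\b[1-9]\d{5}\b/;

// Hypothetical lookup for trailing country names; extend as needed.
const COUNTRY_NAMES: { [name: string]: string } = {
  "india": "IN",
  "sri lanka": "LK",
  "south korea": "KR",
  "japan": "JP",
};

// Returns a country code, or the fallback (derived from the area page) when nothing matches.
function detectCountry(address: string, fallback: string | null = null): string | null {
  const text = address.trim();
  if (UK_POSTCODE.test(text)) return "GB";
  const mentionsIreland = /\bireland\b/i.test(text) && !/\bnorthern ireland\b/i.test(text);
  if (EIRCODE.test(text) || mentionsIreland) return "IE";
  if (SIX_DIGIT_CODE.test(text)) return "IN";
  // Country name at the end of the address, ignoring trailing punctuation.
  const lower = text.toLowerCase().replace(/[\s.,]+$/, "");
  for (const name of Object.keys(COUNTRY_NAMES)) {
    if (lower.endsWith(name)) return COUNTRY_NAMES[name];
  }
  return fallback;
}
```

The 6-digit rule is a heuristic: other countries (Singapore, for example) also use 6-digit postal codes, so the area page being scraped is worth cross-checking before trusting an "IN" result.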
Design
Schema
Add to Church model in both BethelGuide and ScraperControl:
weekdayMassesId String? @unique @map("weekday_masses_id")
@@index([weekdayMassesId])
Script: scripts/import-weekdaymasses.ts
Single script that:
- Fetches area pages (default: all 3; filterable with `--area gb|ireland|outside-gb|india|...`)
- Parses HTML into structured church entries
- Converts mass times from `H.MMam/pm` to `HH:MM` 24h format
- Detects country from address patterns
- Matches against existing churches by `weekdayMassesId` (exact) then proximity+name
- Upserts churches and replaces mass schedules
HTML parsing strategy
Each church is a block between consecutive h3 headings. Within each block:
- h3 content = church name
- Lines with day labels + times = mass schedule
- Map link = coordinates + church_id
- Last text block before next h3 = address
- `Tel:` prefix = phone
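A dependency-free sketch of this block-splitting strategy is shown below. The exact markup of the live pages is not reproduced here, so the `<br>`-separated layout, the `ParsedChurch` shape, and the function names are assumptions; a real implementation might prefer a proper HTML parser.

```typescript
// Hypothetical shape for one parsed church block.
interface ParsedChurch {
  name: string;
  churchId: string | null;
  lat: number | null;
  lon: number | null;
  phone: string | null;
  scheduleLines: string[];
  address: string | null;
}

// Strip tags and decode the two entities this sketch cares about.
function textOf(html: string): string {
  return html.replace(/<[^>]*>/g, "").replace(/&nbsp;/g, " ").replace(/&amp;/g, "&").trim();
}

// A schedule line starts with day labels, a colon, then a time like "6.30am".
const SCHEDULE_LINE = /^[A-Za-z ]+:\s*\d{1,2}\.\d{2}\s*(?:am|pm)/i;

function parseChurchBlocks(html: string): ParsedChurch[] {
  const churches: ParsedChurch[] = [];
  // Split before every <h3>, so each piece runs up to the next heading.
  const blocks = html.split(/(?=<h3[\s>])/i);
  for (const block of blocks) {
    const h3 = /^<h3[^>]*>([\s\S]*?)<\/h3>/i.exec(block);
    if (!h3) continue; // content before the first heading
    const body = block.slice(h3[0].length);

    // Map link query params; the separator may be "&" or the tail of "&amp;".
    const lat = /[?&;]lat=(-?\d+(?:\.\d+)?)/.exec(body);
    const lon = /[?&;]lon=(-?\d+(?:\.\d+)?)/.exec(body);
    const id = /[?&;]church_id=(\d+)/.exec(body);

    // Drop links (map, website), then break the rest into text lines.
    const lines = body
      .replace(/<a\b[\s\S]*?<\/a>/gi, "")
      .split(/<br\s*\/?>|<\/p>|<\/div>|\n/i)
      .map(textOf)
      .filter((line) => line.length > 0);

    let phone: string | null = null;
    const scheduleLines: string[] = [];
    const other: string[] = [];
    for (const line of lines) {
      if (/^Tel:/i.test(line)) phone = line.replace(/^Tel:\s*/i, "");
      else if (SCHEDULE_LINE.test(line)) scheduleLines.push(line);
      else other.push(line);
    }

    churches.push({
      name: textOf(h3[1]),
      churchId: id ? id[1] : null,
      lat: lat ? parseFloat(lat[1]) : null,
      lon: lon ? parseFloat(lon[1]) : null,
      phone,
      scheduleLines,
      // The last remaining text line is treated as the address.
      address: other.length > 0 ? other[other.length - 1] : null,
    });
  }
  return churches;
}
```

Classifying lines by pattern rather than by position keeps the address rule ("last text block before the next h3") working even when the phone or website line is missing.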
CLI flags
- `--all`: import all 3 area pages
- `--area <name>`: import specific area (gb, ireland, outside-gb, india, sri-lanka, etc.)
- `--dry-run`: no database writes
- `--resume-from <n>`: skip first N churches
- `--job-id <uuid>`: background job tracking
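One possible way to turn these flags into an options object is sketched below. `ImportOptions`, `ALL_AREAS`, and `parseFlags` are hypothetical names, and allowing `--area` to be repeated is an assumption rather than something the design specifies.

```typescript
// Hypothetical options object for the importer.
interface ImportOptions {
  areas: string[]; // area slugs to fetch
  dryRun: boolean; // true = no database writes
  resumeFrom: number; // number of churches to skip
  jobId: string | null; // background job tracking id
}

const ALL_AREAS = ["gb", "ireland", "outside-gb"];

function parseFlags(argv: string[]): ImportOptions {
  const opts: ImportOptions = { areas: [], dryRun: false, resumeFrom: 0, jobId: null };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    // Consume the value that follows a flag such as --area.
    const next = (): string => {
      const value = argv[++i];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`${flag} requires a value`);
      }
      return value;
    };
    switch (flag) {
      case "--all":
        opts.areas = ALL_AREAS.slice();
        break;
      case "--area":
        opts.areas.push(next()); // assumption: the flag may be repeated
        break;
      case "--dry-run":
        opts.dryRun = true;
        break;
      case "--resume-from": {
        const n = Number(next());
        if (!Number.isInteger(n) || n < 0) {
          throw new Error("--resume-from expects a non-negative integer");
        }
        opts.resumeFrom = n;
        break;
      }
      case "--job-id":
        opts.jobId = next();
        break;
      default:
        throw new Error(`Unknown flag: ${flag}`);
    }
  }
  // With no area selected, default to all 3 area pages.
  if (opts.areas.length === 0) opts.areas = ALL_AREAS.slice();
  return opts;
}
```

In the script this would typically be called as `parseFlags(process.argv.slice(2))`.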
Church matcher integration
Add weekdayMassesId to ExistingChurch, ChurchCandidate, and a new match pass in findDuplicateChurch().
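The two-pass match (exact `weekdayMassesId`, then proximity+name) might be sketched like this. The field names on `ExistingChurch` and `ChurchCandidate`, the 200 m radius, and the name normalisation are assumptions for illustration; the existing `findDuplicateChurch()` signature may differ.

```typescript
interface ExistingChurch {
  id: string;
  name: string;
  latitude: number;
  longitude: number;
  weekdayMassesId: string | null;
}

interface ChurchCandidate {
  name: string;
  latitude: number;
  longitude: number;
  weekdayMassesId: string | null;
}

// Great-circle distance in metres (haversine formula).
function haversineMeters(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371000 * Math.asin(Math.sqrt(a));
}

// Lowercase, drop a trailing "(Location)" suffix, collapse punctuation to spaces.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/\s*\([^)]*\)\s*$/, "")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

function namesMatch(a: string, b: string): boolean {
  const x = normalizeName(a);
  const y = normalizeName(b);
  if (x.length === 0 || y.length === 0) return false;
  return x === y || x.indexOf(y) >= 0 || y.indexOf(x) >= 0;
}

function findDuplicateChurch(
  candidate: ChurchCandidate,
  existing: ExistingChurch[],
  maxDistanceMeters: number = 200 // assumed radius
): ExistingChurch | null {
  // Pass 1: exact match on weekdayMassesId.
  if (candidate.weekdayMassesId !== null) {
    for (const church of existing) {
      if (church.weekdayMassesId === candidate.weekdayMassesId) return church;
    }
  }
  // Pass 2: nearest church within the radius whose name also matches.
  let best: ExistingChurch | null = null;
  let bestDistance = maxDistanceMeters;
  for (const church of existing) {
    const d = haversineMeters(
      candidate.latitude, candidate.longitude, church.latitude, church.longitude
    );
    if (d <= bestDistance && namesMatch(candidate.name, church.name)) {
      best = church;
      bestDistance = d;
    }
  }
  return best;
}
```

Running the ID pass first keeps re-imports stable: once a church has been linked, later runs match it even if its name or coordinates change on the source site.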
Scheduler integration
Add weekdaymasses-import to the sequential imports group in the pipeline, with getJobCommand() case and npm script.
Scope
- ~3,500-4,000 churches with mass schedules
- Most GB/Ireland churches already in DB from OSM (will match and add schedules)
- India/Sri Lanka/international churches partially in DB from OSM/gcatholic
- Value: mass schedule data for thousands of churches that currently have none