# weekdaymasses.org.uk Global Importer

## Context

weekdaymasses.org.uk is a UK-based Catholic directory covering ~3,500-4,000 churches globally, with mass schedules, coordinates, addresses, and phone numbers. It covers GB, Ireland, and 49+ international countries (India, Sri Lanka, South Korea, Japan, and more). All data for an area is served on a single HTML page, so no pagination or API is needed.

## Data Source

Three area pages cover the entire site:

| Page | URL | Est. Churches |
|------|-----|---------------|
| GB | `/en/area/gb/churches` | ~3,000+ |
| Ireland | `/en/area/ireland/churches` | ~300+ |
| Outside GB | `/en/area/outside-gb/churches` | ~152+ |

Individual country/region pages (e.g. `/en/area/india/churches`) are subsets of these three.

### Data per church

- **Name**: h3 heading, format "Church Name (Location)"
- **Address**: plain text after mass times, with postal/zip code
- **Coordinates**: in map link query params `lat=XX.XXXX&lon=YY.YYYY&church_id=NNNNN`
- **Mass times**: format `Day: H.MMam/pm(Language), H.MMam/pm(Language)`
- **Phone**: `Tel: +XX XXXX XXXXXX`
- **Website**: occasional links
- **church_id**: unique numeric identifier in map links

### Mass time format

```
Sunday: 6.30am(Tamil), 8.30am(Tamil), 5.30pm(English)
Mon Tue Wed Thu Fri: 6.30am(Tamil)
Saturday: 6.30am(Tamil), 5.30pm(English)
```

Day labels: `Sunday`, `Mon`, `Tue`, `Wed`, `Thu`, `Fri`, `Saturday`, or combinations like `Mon Tue Wed Thu Fri`. There are also `Holy Day` entries.

Time format: `H.MMam/pm`, which needs conversion to 24h `HH:MM`.

The language in parentheses maps to our `language` field on mass_schedules.

### Country detection

The address is the last line of each church entry. Country can be detected by:

- GB: UK postal code pattern (e.g. `SW1A 1AA`)
- Ireland: Irish Eircode (e.g. `D01 F5P2`) or "Ireland" in address
- India: 6-digit postal code (e.g. `600088`)
- Others: country name at end of address, or fallback to the area page being scraped

## Design

### Schema

Add to the Church model in both BethelGuide and ScraperControl:

```prisma
weekdayMassesId String? @unique @map("weekday_masses_id")

@@index([weekdayMassesId])
```

### Script: `scripts/import-weekdaymasses.ts`

Single script that:

1. Fetches area pages (default: all 3; filterable with `--area gb|ireland|outside-gb|india|...`)
2. Parses HTML into structured church entries
3. Converts mass times from `H.MMam/pm` to `HH:MM` 24h format
4. Detects country from address patterns
5. Matches against existing churches by `weekdayMassesId` (exact), then by proximity + name
6. Upserts churches and replaces mass schedules

### HTML parsing strategy

Each church is a block between consecutive h3 headings. Within each block:

- h3 content = church name
- Lines with day labels + times = mass schedule
- Map link = coordinates + church_id
- Last text block before next h3 = address
- `Tel:` prefix = phone

### CLI flags

- `--all`: import all 3 area pages
- `--area`: import a specific area (gb, ireland, outside-gb, india, sri-lanka, etc.)
- `--dry-run`: no database writes
- `--resume-from`: skip the first N churches
- `--job-id`: background job tracking

### Church matcher integration

Add `weekdayMassesId` to `ExistingChurch` and `ChurchCandidate`, and add a new match pass in `findDuplicateChurch()`.

### Scheduler integration

Add `weekdaymasses-import` to the sequential imports group in the pipeline, with a `getJobCommand()` case and an npm script.

## Scope

- ~3,500-4,000 churches with mass schedules
- Most GB/Ireland churches are already in the DB from OSM (they will match and gain schedules)
- India/Sri Lanka/international churches are partially in the DB from OSM/gcatholic
- Value: mass schedule data for thousands of churches that currently have none
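
## Example sketch: mass times and country detection

The conversion and detection rules described under "Mass time format" and "Country detection" can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `scripts/import-weekdaymasses.ts`: the names `to24h`, `parseMassLine`, `detectCountry`, and the `MassTime` shape are hypothetical, the returned country codes (`GB`, `IE`, `IN`) are an assumed convention, and the regular expressions are simplified approximations of the real postcode formats. The "country name at end of address" rule is left to the caller's fallback value.

```typescript
// Hypothetical helpers for illustration only; names and shapes are assumptions.

interface MassTime {
  day: string;             // day label as it appears on the site, e.g. "Mon"
  time: string;            // 24h "HH:MM"
  language: string | null; // text inside the parentheses, if any
}

/** Convert a site-style time such as "6.30am" to 24h "HH:MM"; null if unparseable. */
function to24h(raw: string): string | null {
  const m = /^(\d{1,2})\.(\d{2})\s*(am|pm)$/i.exec(raw.trim());
  if (!m) return null;
  let hour = Number(m[1]);
  const minutes = m[2] ?? "";
  if (hour < 1 || hour > 12 || Number(minutes) > 59) return null;
  const isPm = (m[3] ?? "").toLowerCase() === "pm";
  if (hour === 12) hour = isPm ? 12 : 0; // 12.xxam is just after midnight
  else if (isPm) hour += 12;
  return `${hour < 10 ? "0" : ""}${hour}:${minutes}`;
}

/** Parse one schedule line, e.g. "Mon Tue: 6.30am(Tamil), 5.30pm(English)". */
function parseMassLine(line: string): MassTime[] {
  const colon = line.indexOf(":");
  if (colon < 0) return [];
  const label = line.slice(0, colon).trim();
  // "Holy Day" is a single label; other labels are space-separated day names.
  const days = /^holy day$/i.test(label) ? [label] : label.split(/\s+/);
  const result: MassTime[] = [];
  for (const day of days) {
    // Assumes language names never contain commas.
    for (const entry of line.slice(colon + 1).split(",")) {
      const m = /^([^()\s]+)\s*(?:\(([^)]*)\))?$/.exec(entry.trim());
      const time = m ? to24h(m[1] ?? "") : null;
      if (m && time) result.push({ day, time, language: m[2] ?? null });
    }
  }
  return result;
}

/** Guess a country code from an address, falling back to the scraped area's country. */
function detectCountry(address: string, fallback: string): string {
  // The UK check runs first so that Northern Ireland postcodes are not treated as Irish.
  if (/\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b/.test(address)) return "GB";
  if (/\b[A-Z]\d{2}\s?[A-Z\d]{4}\b/.test(address) || /\bIreland\b/i.test(address)) return "IE";
  // Other countries also use 6-digit codes, so this rule is only a heuristic.
  if (/\b\d{6}\b/.test(address)) return "IN";
  return fallback;
}
```

Keeping the day labels exactly as they appear on the site (rather than expanding `Mon` to `Monday`) avoids guessing at how mass_schedules stores days; a real importer would map them to whatever that table expects.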