- Wire enrich-with-forward-geocode.ts as scheduler job type
- Add geocode-enrichment pipeline group (500/cycle, post-imports)
- Harden transfer script: skip churches at (0,0) coordinates
- Rewrite dedup-mass-schedules.ts with raw SQL to avoid Prisma 7 stack overflow
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reset local main to gitea/master (new source of truth) and restore
local-only files: web scrapers, admin dashboard, ChromaDB integration,
debug scripts, and utility libraries that are not tracked in Gitea.
Gitea master adds: discovermass, buscarmisas-network, hk-parishes,
bohosluzby, kerknet, gottesdienstzeiten, miserend importers,
ClaimRequest model, forward geocoding, heartbeat healthcheck.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds extractStreetAddress() to strip institution name prefixes
("Canossa School (H.K.) 8 Hoi Chak Street" → "8 Hoi Chak Street").
Also cleans Kln./R.E./Lantau Island suffixes. Falls back to the
street-only query if the full address returns no result, marking
results with [FOUND (fallback)] in output.
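
A minimal sketch of the prefix stripping and fallback described above. Only
extractStreetAddress() and the [FOUND (fallback)] label come from this commit;
the regexes, the Lookup type, geocodeWithFallback(), and the plain [FOUND]
label are assumptions made for illustration.

```typescript
// Trailing district markers named in the commit; the exact pattern is an assumption.
const DISTRICT_SUFFIX = /[,\s]+(?:Kln\.?|R\.E\.?|Lantau Island)\s*$/i;

// Assumes the street part starts at the first house number followed by a word.
function extractStreetAddress(address: string): string {
  const withoutSuffix = address.trim().replace(DISTRICT_SUFFIX, "");
  const match = withoutSuffix.match(/\b\d+[A-Za-z]?(?:-\d+[A-Za-z]?)?\s+[A-Za-z].*$/);
  return (match ? match[0] : withoutSuffix).trim();
}

type Coords = { lat: number; lng: number };
// Hypothetical geocoder signature: resolves to null when nothing is found.
type Lookup = (query: string) => Promise<Coords | null>;

async function geocodeWithFallback(
  address: string,
  lookup: Lookup
): Promise<{ coords: Coords; label: string } | null> {
  // First try the address exactly as stored.
  const full = await lookup(address);
  if (full) return { coords: full, label: "[FOUND]" };

  // Retry with the street-only form, but only if it differs from the input.
  const street = extractStreetAddress(address);
  if (street !== address) {
    const partial = await lookup(street);
    if (partial) return { coords: partial, label: "[FOUND (fallback)]" };
  }
  return null;
}
```

Trying the full address first keeps the more specific match when it exists;
the street-only query is only a second attempt.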
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enriches churches with lat/lng=0 using Nominatim search API.
Cleans trailing city/country suffixes from addresses before querying.
Maps HK/MO to 'cn' countrycodes (OSM treats them as part of China).
After this runs, enrich-with-reverse-geocode fills city/state fields.
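
A rough sketch of how the query could be assembled, assuming the public
Nominatim /search endpoint with its q, format, limit, and countrycodes
parameters. The function names and the suffix list passed in are hypothetical;
the HK/MO to 'cn' mapping is the one described in this commit.

```typescript
// Per the commit: Hong Kong and Macau are queried with the 'cn' country code.
const COUNTRYCODE_OVERRIDES: Record<string, string> = { HK: "cn", MO: "cn" };

function toNominatimCountryCode(iso2: string): string {
  const upper = iso2.toUpperCase();
  return COUNTRYCODE_OVERRIDES[upper] ?? upper.toLowerCase();
}

// Hypothetical cleaner: repeatedly drops known trailing city/country names.
function stripTrailingSuffixes(address: string, suffixes: string[]): string {
  let cleaned = address.trim();
  let changed = true;
  while (changed) {
    changed = false;
    for (const suffix of suffixes) {
      if (suffix !== "" && cleaned.toLowerCase().endsWith(suffix.toLowerCase())) {
        // Remove the suffix plus any separator left dangling before it.
        cleaned = cleaned.slice(0, cleaned.length - suffix.length).replace(/[,\s]+$/, "");
        changed = true;
      }
    }
  }
  return cleaned;
}

function buildSearchUrl(address: string, iso2: string): string {
  const q = encodeURIComponent(address);
  const cc = toNominatimCountryCode(iso2);
  return `https://nominatim.openstreetmap.org/search?q=${q}&format=jsonv2&limit=1&countrycodes=${cc}`;
}
```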
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
upsertChurch() handles matched churches (replace schedules atomically
via $transaction, update contact fields if null) and new churches
(create with source='diocese-hk', lat/lng=0 for later geocoding).
main() wires up CLI args, file reading, matching loop, and summary.
Guards main() call with ESM import.meta.url check to prevent execution
on import during tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces max(|A|,|B|) denominator with |A∪B| = |A|+|B|-|A∩B|,
which is the correct Jaccard formula and avoids inflating similarity
when both name sets have significant unique words.
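
To illustrate the difference, a small sketch that computes both scores on word
sets. The function names and the sample words are made up for this example.

```typescript
function intersectionSize(a: Set<string>, b: Set<string>): number {
  let count = 0;
  a.forEach((word) => {
    if (b.has(word)) count += 1;
  });
  return count;
}

// Previous behaviour: shared words divided by the size of the larger set.
function overlapByMax(a: Set<string>, b: Set<string>): number {
  const largest = Math.max(a.size, b.size);
  return largest === 0 ? 0 : intersectionSize(a, b) / largest;
}

// Jaccard: shared words divided by the size of the union.
function jaccard(a: Set<string>, b: Set<string>): number {
  const shared = intersectionSize(a, b);
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

const a = new Set(["holy", "cross", "north", "point"]);
const b = new Set(["holy", "cross", "tai", "po"]);

// Two shared words, two unique words on each side:
// overlapByMax gives 2/4 = 0.5, jaccard gives 2/6 (about 0.33).
const byMax = overlapByMax(a, b);
const byUnion = jaccard(a, b);
```

With the old denominator the unique words on the smaller side are ignored, so
the two names above score 0.5 even though half of each name differs.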
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
normalizeName strips noise words (church/parish/chapel/etc), accents,
and punctuation for robust name comparison. findMatch uses word-overlap
Jaccard score (threshold 0.4) with address-prefix fallback for Chinese-
named churches where English name overlap may be low.
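
A sketch of the matching idea, under several assumptions: the noise-word list
only contains the three words named above, normalizeName returns a word list,
and the address fallback compares a fixed-length lowercase prefix. The
Candidate type, ADDRESS_PREFIX_LENGTH, and isLikelySameChurch() are
hypothetical; only the 0.4 threshold comes from this commit.

```typescript
const NOISE_WORDS = new Set(["church", "parish", "chapel"]);
const MATCH_THRESHOLD = 0.4;
const ADDRESS_PREFIX_LENGTH = 12; // assumed value, not from the commit

function normalizeName(name: string): string[] {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/['\u2019]/g, "") // drop apostrophes so "jude's" becomes "judes"
    .replace(/[^a-z0-9\s]/g, " ") // remaining punctuation becomes a separator
    .split(/\s+/)
    .filter((word) => word.length > 0 && !NOISE_WORDS.has(word));
}

function nameScore(a: string, b: string): number {
  const wordsA = new Set(normalizeName(a));
  const wordsB = new Set(normalizeName(b));
  let shared = 0;
  wordsA.forEach((word) => {
    if (wordsB.has(word)) shared += 1;
  });
  const union = wordsA.size + wordsB.size - shared;
  return union === 0 ? 0 : shared / union;
}

type Candidate = { name: string; address: string };

function isLikelySameChurch(a: Candidate, b: Candidate): boolean {
  if (nameScore(a.name, b.name) >= MATCH_THRESHOLD) return true;
  // Fallback for names with little or no English overlap: compare address prefixes.
  const prefixA = a.address.trim().toLowerCase().slice(0, ADDRESS_PREFIX_LENGTH);
  const prefixB = b.address.trim().toLowerCase().slice(0, ADDRESS_PREFIX_LENGTH);
  return prefixA.length === ADDRESS_PREFIX_LENGTH && prefixA === prefixB;
}
```

Requiring a full-length prefix avoids matching two very short or empty
addresses against each other.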
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parseEntry composes extractNames, extractFields, parseScheduleLine,
and parseWeekdayLine into a single ParsedEntry. Routes schedule
lines by section header (Sunday/Anticipated/Weekday) and skips
Special Masses and Eucharist Adoration sections.
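
The routing step might look roughly like the sketch below. The header
spellings, the rule that a header line contains no digits, and the function
names are assumptions; the commit only names the sections.

```typescript
type Section = "sunday" | "anticipated" | "weekday" | "skip";

// Returns the section a header line opens, or null for a non-header line.
function sectionFor(line: string): Section | null {
  const text = line.trim().toLowerCase();
  // Assumption: schedule lines carry times, so a line with digits is not a header.
  if (/\d/.test(text)) return null;
  if (/^anticipated/.test(text)) return "anticipated";
  if (/^sunday/.test(text)) return "sunday";
  if (/^weekday/.test(text)) return "weekday";
  if (/^special mass/.test(text) || /^eucharist/.test(text)) return "skip";
  return null;
}

function routeScheduleLines(lines: string[]) {
  const routed = {
    sunday: [] as string[],
    anticipated: [] as string[],
    weekday: [] as string[],
  };
  let current: Section | null = null;
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue;
    const header = sectionFor(line);
    if (header !== null) {
      current = header; // a header switches the active section
      continue;
    }
    // Lines before any header, or inside a skipped section, are dropped.
    if (current === null || current === "skip") continue;
    routed[current].push(line);
  }
  return routed;
}
```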
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements splitEntries, extractNames, extractFields, normalizeTime,
parseScheduleLine, and parseWeekdayLine with 26 passing unit tests.
Handles full-width parentheses, language tags, conditional schedule
notes, day ranges, and comma-separated day/time lists.
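
A sketch of three of these helpers, with assumed input and output formats: day
specs like "Mon-Fri" or "Mon, Wed, Fri" expand to day indexes (0 = Sunday), and
12-hour times like "7:15 a.m." become 24-hour "HH:MM". normalizeParens() and
expandDays() are hypothetical names; only normalizeTime appears in this commit,
and its real output format may differ.

```typescript
const DAY_KEYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"];

// Convert full-width parentheses to their ASCII forms.
function normalizeParens(text: string): string {
  return text.replace(/\uFF08/g, "(").replace(/\uFF09/g, ")");
}

function dayIndex(name: string): number {
  return DAY_KEYS.indexOf(name.trim().toLowerCase().slice(0, 3));
}

// Expands comma-separated days and ranges; ranges may wrap past Saturday.
function expandDays(spec: string): number[] {
  const result: number[] = [];
  for (const part of spec.split(",")) {
    const [startRaw, endRaw] = part.trim().split(/\s*[-\u2013]\s*|\s+to\s+/);
    const start = dayIndex(startRaw);
    const end = endRaw === undefined ? start : dayIndex(endRaw);
    if (start < 0 || end < 0) continue; // unknown day name: skip this part
    for (let d = start; ; d = (d + 1) % 7) {
      if (result.indexOf(d) === -1) result.push(d);
      if (d === end) break;
    }
  }
  return result;
}

// Returns "HH:MM" in 24-hour form, or null if the input is not recognised.
function normalizeTime(raw: string): string | null {
  const m = raw.trim().toLowerCase().match(/^(\d{1,2})[:.](\d{2})\s*(a\.?m\.?|p\.?m\.?)?$/);
  if (!m) return null;
  let hour = parseInt(m[1], 10);
  const minute = m[2];
  const meridiem = m[3] ? m[3].replace(/\./g, "") : "";
  if (meridiem === "pm" && hour < 12) hour += 12;
  if (meridiem === "am" && hour === 12) hour = 0;
  if (hour > 23 || parseInt(minute, 10) > 59) return null;
  return `${hour < 10 ? "0" : ""}${hour}:${minute}`;
}
```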
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements entry splitter, name extractor, field extractor, time normalizer,
schedule line parser, and weekday day-prefix parser. All 26 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Imports, types, and Prisma client init
- ParsedSchedule and ParsedEntry types for parsing parish data
- ExistingChurch interface for matching
- ImportStats interface for tracking progress
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove discovermassId/buscarmisasNetworkId from findDuplicateChurch match
  passes (importers now do their own pre-check dedup); restore as optional
  fields on ExistingChurch to keep type/runtime in sync
- Add HK bounding box to COUNTRY_BOUNDING_BOXES; fix silent 0-result
  fallback when country query returns empty from mirror server
- discovermass importer: add --limit flag and skip-already-imported
  pre-check using importedSlugs set
- Import scripts: remove discovermassId from ExistingChurch select/stubs
  (field not needed in shared matcher context)
- Schema: reorder discovermassId/kerknetId/gottesdienstzeitenId fields
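
A sketch of how the --limit flag and the importedSlugs pre-check could
combine. It assumes the flag is passed as "--limit <n>" and that the limit is
applied after already-imported slugs are skipped; parseLimit() and
selectPending() are hypothetical helpers, not the importer's actual functions.

```typescript
// Reads "--limit <n>" from an argv-style array; null means no limit.
function parseLimit(argv: string[]): number | null {
  const index = argv.indexOf("--limit");
  if (index === -1) return null;
  const value = Number(argv[index + 1]);
  return Number.isInteger(value) && value > 0 ? value : null;
}

// Drops slugs that were imported before, then applies the optional limit.
function selectPending(
  slugs: string[],
  importedSlugs: Set<string>,
  limit: number | null
): string[] {
  const pending = slugs.filter((slug) => !importedSlugs.has(slug));
  return limit === null ? pending : pending.slice(0, limit);
}
```

Filtering before limiting means a rerun with the same limit makes progress on
new slugs instead of spending the budget on ones already imported.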
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Imports 20,284 US Catholic churches from discovermass.com including mass,
confession, and adoration schedules. Respects robots.txt Crawl-delay: 10.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add discovermassId field to ExistingChurch interface and ChurchCandidate type,
insert a dedicated matching pass in findDuplicateChurch, and update all 15 importer
push blocks plus 16 loadExistingChurches select queries to include the new field.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>