ScraperControl

Author	SHA1	Message	Date
albertfj114	3ebbc3732f	feat: add name normalizer and church matcher for HK import normalizeName strips noise words (church/parish/chapel/etc), accents, and punctuation for robust name comparison. findMatch uses word-overlap Jaccard score (threshold 0.4) with address-prefix fallback for Chinese-named churches where English name overlap may be low. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 16:23:58 -04:00
albertfj114	eedb442e78	feat: add full entry parser for HK parishes parseEntry composes extractNames, extractFields, parseScheduleLine, and parseWeekdayLine into a single ParsedEntry. Routes schedule lines by section header (Sunday/Anticipated/Weekday) and skips Special Masses and Eucharist Adoration sections. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 16:18:05 -04:00
albertfj114	38274174a9	feat: add HK parish import parser functions (Tasks 2-6) Implements splitEntries, extractNames, extractFields, normalizeTime, parseScheduleLine, and parseWeekdayLine with 26 passing unit tests. Handles full-width parentheses, language tags, conditional schedule notes, day ranges, and comma-separated day/time lists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 16:15:04 -04:00
albertfj114	328d146201	feat: add HK parish parser functions (Tasks 2-6) with tests Implements entry splitter, name extractor, field extractor, time normalizer, schedule line parser, and weekday day-prefix parser. All 26 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 16:06:26 -04:00
albertfj114	9aea12f4b0	feat: add HK parish import script skeleton - Imports, types, and Prisma client init - ParsedSchedule and ParsedEntry types for parsing parish data - ExistingChurch interface for matching - ImportStats interface for tracking progress Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 15:59:51 -04:00

5 Commits
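
Commit 3ebbc3732f describes name normalization followed by word-overlap Jaccard matching (threshold 0.4) with an address-prefix fallback. The following is a minimal sketch of that idea, not the code from the commit. The noise-word list beyond church/parish/chapel, the fields of `ExistingChurch`, the 15-character prefix length, and the sample data are all assumptions made for illustration.

```typescript
// Assumed noise-word list: the commit names church/parish/chapel and "etc".
const NOISE_WORDS = new Set(["church", "parish", "chapel"]);

// Assumed shape: the log only says an ExistingChurch interface exists.
interface ExistingChurch {
  id: string;
  name: string;
  address: string;
}

// Lowercase, strip accents and punctuation, drop noise words.
// Note: this sketch also drops non-Latin characters, so a purely
// Chinese name normalizes to an empty string.
function normalizeName(name: string): string {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // remove combining accent marks
    .toLowerCase()
    .replace(/['\u2019]/g, "")       // drop apostrophes: "joseph's" -> "josephs"
    .replace(/[^a-z0-9\s]/g, " ")    // other punctuation becomes a space
    .split(/\s+/)
    .filter((w) => w.length > 0 && !NOISE_WORDS.has(w))
    .join(" ");
}

// Jaccard score over word sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const setA = new Set(a.split(" ").filter((w) => w.length > 0));
  const setB = new Set(b.split(" ").filter((w) => w.length > 0));
  if (setA.size === 0 && setB.size === 0) return 0;
  let intersection = 0;
  setA.forEach((w) => {
    if (setB.has(w)) intersection++;
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Best name match at or above the threshold, else address-prefix fallback.
function findMatch(
  name: string,
  address: string,
  churches: ExistingChurch[],
  threshold = 0.4,
): ExistingChurch | null {
  const target = normalizeName(name);
  let best: ExistingChurch | null = null;
  let bestScore = 0;
  for (const church of churches) {
    const score = jaccard(target, normalizeName(church.name));
    if (score > bestScore) {
      best = church;
      bestScore = score;
    }
  }
  if (best !== null && bestScore >= threshold) return best;

  // Fallback for names with little English overlap (assumed 15-char prefix).
  const prefix = address.trim().toLowerCase().slice(0, 15);
  if (prefix.length < 15) return null;
  for (const church of churches) {
    if (church.address.trim().toLowerCase().startsWith(prefix)) return church;
  }
  return null;
}

// Hypothetical sample data, not taken from the repository.
const sampleChurches: ExistingChurch[] = [
  { id: "1", name: "St. Joseph's Church", address: "1 Example Road, Central" },
  { id: "2", name: "Holy Cross Church", address: "2 Sample Street, Sai Wan Ho" },
];
```

With this sample data, `findMatch("St Joseph's Chapel", "", sampleChurches)` matches on name alone, while an entry with a Chinese-only name and the address "2 Sample Street, Sai Wan Ho, HK" falls through to the address-prefix check. A short prefix keeps the fallback tolerant of trailing differences such as a district or country suffix, at the cost of possible false positives for neighbouring addresses.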