Commit Graph

35 Commits

Author SHA1 Message Date
albertfj114
8075072c24 fix: use true Jaccard similarity in wordOverlap (intersection/union)
Replaces max(|A|,|B|) denominator with |A∪B| = |A|+|B|-intersection,
which is the correct Jaccard formula and avoids inflating similarity
when both name sets have significant unique words.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:25:24 -04:00
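A minimal TypeScript sketch of the corrected score. It assumes `wordOverlap` receives two already-normalized names and compares whitespace-separated word sets. That signature is an assumption, not taken from the repository.

```typescript
// Hypothetical sketch of wordOverlap: true Jaccard similarity on word sets.
// Assumes both inputs are already normalized (lowercased, noise words removed).
function wordOverlap(a: string, b: string): number {
  const setA = new Set(a.split(/\s+/).filter(Boolean));
  const setB = new Set(b.split(/\s+/).filter(Boolean));

  let intersection = 0;
  for (const w of setA) {
    if (setB.has(w)) intersection++;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 0 : intersection / union;
}
```

With this denominator, `wordOverlap("holy spirit", "holy family")` is 1/3 (about 0.33), which falls below the 0.4 threshold used by findMatch. The old max(|A|,|B|) denominator gave 1/2 and would have accepted the pair.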
albertfj114
3ebbc3732f feat: add name normalizer and church matcher for HK import
normalizeName strips noise words (church/parish/chapel/etc), accents,
and punctuation for robust name comparison. findMatch uses word-overlap
Jaccard score (threshold 0.4) with address-prefix fallback for Chinese-
named churches where English name overlap may be low.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:23:58 -04:00
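The normalize-then-match flow could look roughly like the sketch below. Several details are assumptions for illustration: the noise-word list (the commit only says church/parish/chapel/etc), the punctuation set, the 12-character address prefix, and the `Candidate` shape.

```typescript
// Hypothetical sketch of normalizeName + findMatch; lists, prefix length and types are assumed.
const NOISE_WORDS = new Set(["church", "parish", "chapel", "cathedral"]);

function normalizeName(name: string): string {
  return name
    .normalize("NFD")                              // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, "")               // drop the combining accents
    .toLowerCase()
    .replace(/[.,;:!?'"’“”()（）\[\]\-\/&]/g, " ") // punctuation (incl. full-width parens) -> space
    .split(/\s+/)
    .filter((w) => w.length > 0 && !NOISE_WORDS.has(w))
    .join(" ");
}

// Jaccard similarity on whitespace-separated word sets.
function jaccard(a: string, b: string): number {
  const setA = new Set(a.split(" ").filter(Boolean));
  const setB = new Set(b.split(" ").filter(Boolean));
  let inter = 0;
  for (const w of setA) if (setB.has(w)) inter++;
  const union = setA.size + setB.size - inter;
  return union === 0 ? 0 : inter / union;
}

interface Candidate {
  name: string;
  address: string | null;
}

function findMatch<T extends Candidate>(
  entry: Candidate,
  existing: T[],
  threshold = 0.4,
  prefixLen = 12,
): T | null {
  const target = normalizeName(entry.name);
  let best: T | null = null;
  let bestScore = 0;
  for (const c of existing) {
    const score = jaccard(target, normalizeName(c.name));
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  if (best !== null && bestScore >= threshold) return best;

  // Fallback for Chinese-named entries that share few words with English names:
  // accept an existing church whose address starts with the same prefix.
  if (!entry.address) return null;
  const prefix = entry.address.toLowerCase().slice(0, prefixLen);
  return existing.find((c) => c.address?.toLowerCase().startsWith(prefix)) ?? null;
}
```

Scoring every name before consulting addresses means the address fallback only applies when no name clears the threshold. An address coincidence therefore never overrides a strong name match.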
albertfj114
eedb442e78 feat: add full entry parser for HK parishes
parseEntry composes extractNames, extractFields, parseScheduleLine,
and parseWeekdayLine into a single ParsedEntry. Routes schedule
lines by section header (Sunday/Anticipated/Weekday) and skips
Special Masses and Eucharist Adoration sections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:18:05 -04:00
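The section-routing step inside parseEntry might be sketched as follows. The sketch assumes headers are lines ending in a colon (for example `Sunday Masses:`). The header spellings and the `Bucket` type are hypothetical.

```typescript
// Hypothetical sketch of routing schedule lines by section header.
type Bucket = "sunday" | "anticipated" | "weekday";
type Section = Bucket | "skip";

function sectionForHeader(line: string): Section | null {
  const h = line.trim().toLowerCase();
  if (!h.endsWith(":")) return null; // assumed: only header lines end with a colon
  if (h.startsWith("special mass")) return "skip";
  if (h.includes("adoration")) return "skip"; // e.g. "Eucharist Adoration:"
  if (h.startsWith("anticipated")) return "anticipated";
  if (h.startsWith("sunday")) return "sunday";
  if (h.startsWith("weekday")) return "weekday";
  return "skip"; // unrecognized sections are ignored
}

function routeScheduleLines(lines: string[]): Record<Bucket, string[]> {
  const out: Record<Bucket, string[]> = { sunday: [], anticipated: [], weekday: [] };
  let current: Section | null = null;
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue;
    const header = sectionForHeader(line);
    if (header !== null) {
      current = header;
      continue;
    }
    if (current === null || current === "skip") continue; // outside a wanted section
    out[current].push(line); // would then go to parseScheduleLine / parseWeekdayLine
  }
  return out;
}
```

Defaulting unknown headers to `"skip"` keeps lines from unrecognized sections out of the Mass-time buckets.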
albertfj114
38274174a9 feat: add HK parish import parser functions (Tasks 2-6)
Implements splitEntries, extractNames, extractFields, normalizeTime,
parseScheduleLine, and parseWeekdayLine with 26 passing unit tests.
Handles full-width parentheses, language tags, conditional schedule
notes, day ranges, and comma-separated day/time lists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:15:04 -04:00
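The time normalizer and the day-range handling could be sketched as shown below. The sketch assumes inputs like `7:30pm`, `18:00`, `Mon-Fri`, and `Mon, Wed, Fri`, and it numbers days with Sunday as 0. These formats and the numbering are assumptions, and full-width punctuation handling is omitted.

```typescript
// Hypothetical sketch of normalizeTime plus day-range expansion for weekday lines.
const DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"];

// "7:30pm" -> "19:30", "12am" -> "00:00", "18:00" -> "18:00"; invalid input -> null
function normalizeTime(raw: string): string | null {
  const m = raw.trim().toLowerCase().match(/^(\d{1,2})(?::(\d{2}))?\s*(am|pm)?$/);
  if (!m) return null;
  let hour = Number(m[1]);
  const minute = m[2] ? Number(m[2]) : 0;
  if (hour > 23 || minute > 59) return null;
  if (m[3]) {
    if (hour < 1 || hour > 12) return null;
    if (m[3] === "am" && hour === 12) hour = 0;   // 12am is midnight
    if (m[3] === "pm" && hour !== 12) hour += 12; // 1pm -> 13
  }
  return `${String(hour).padStart(2, "0")}:${String(minute).padStart(2, "0")}`;
}

function dayIndex(token: string): number {
  return DAYS.indexOf(token.trim().toLowerCase().slice(0, 3));
}

// "Mon-Fri" -> [1..5], "Mon, Wed, Fri" -> [1, 3, 5]; ranges may wrap past Saturday.
function expandDays(spec: string): number[] {
  const days: number[] = [];
  for (const part of spec.split(",")) {
    const [startTok, endTok] = part.split("-");
    const start = dayIndex(startTok);
    const end = endTok !== undefined ? dayIndex(endTok) : start;
    if (start < 0 || end < 0) continue; // unknown day names are ignored
    for (let d = start; ; d = (d + 1) % 7) {
      days.push(d);
      if (d === end) break;
    }
  }
  return days;
}
```

For example, `expandDays("Sat-Mon")` wraps past the end of the week and returns `[6, 0, 1]`.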
albertfj114
328d146201 feat: add HK parish parser functions (Tasks 2-6) with tests
Implements entry splitter, name extractor, field extractor, time normalizer,
schedule line parser, and weekday day-prefix parser. All 26 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:06:26 -04:00
albertfj114
9aea12f4b0 feat: add HK parish import script skeleton
- Imports, types, and Prisma client init
- ParsedSchedule and ParsedEntry types for parsing parish data
- ExistingChurch interface for matching
- ImportStats interface for tracking progress

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 15:59:51 -04:00
albertfj114
033f805965 fix: clean up church-matcher types and add HK OSM bounding box
- Remove discovermassId/buscarmisasNetworkId from findDuplicateChurch match
  passes (importers now do their own pre-check dedup); restore as optional
  fields on ExistingChurch to keep type/runtime in sync
- Add HK bounding box to COUNTRY_BOUNDING_BOXES; fix silent 0-result
  fallback when country query returns empty from mirror server
- discovermass importer: add --limit flag and skip-already-imported
  pre-check using importedSlugs set
- Import scripts: remove discovermassId from ExistingChurch select/stubs
  (field not needed in shared matcher context)
- Schema: reorder discovermassId/kerknetId/gottesdienstzeitenId fields

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:20:45 -04:00
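The `--limit` flag and the importedSlugs pre-check could combine as in this sketch. The `--limit N` argument format and the function names are assumptions. In practice the `importedSlugs` set would presumably be filled from source IDs already stored in the database.

```typescript
// Hypothetical sketch of --limit parsing and the skip-already-imported pre-check.
function parseLimit(argv: string[]): number | null {
  const i = argv.indexOf("--limit");
  if (i === -1) return null;
  const n = Number.parseInt(argv[i + 1] ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : null; // missing or invalid value -> no limit
}

// Filter out known slugs first, then apply the limit, so "--limit 50"
// means 50 not-yet-imported churches.
function selectWork(slugs: string[], importedSlugs: Set<string>, limit: number | null): string[] {
  const pending = slugs.filter((slug) => !importedSlugs.has(slug));
  return limit === null ? pending : pending.slice(0, limit);
}
```

Applying the limit after the pre-check is one reasonable ordering. It makes repeated limited runs walk forward through the crawl instead of re-selecting the same already-imported pages.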
albertfj114
3bd4d2e2f9 fix: write heartbeat file at startup to avoid cold-start unhealthy window
2026-03-28 10:05:21 -04:00
albertfj114
73d8e8990c fix: include freesearch-enrichment in deploy build step
2026-03-28 10:03:11 -04:00
albertfj114
3cb780a692 fix: replace pgrep healthcheck with heartbeat file check
2026-03-28 08:51:58 -04:00
albertfj114
8f7c4d1698 fix: write heartbeat file for Docker healthcheck
2026-03-28 08:50:19 -04:00
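The heartbeat approach in the commits above can be sketched as two functions: one that the scheduler calls, and one that a healthcheck calls. The file location and the five-minute staleness window are assumptions, since the log does not give them. A heartbeat file shows that the loop is still making progress, which a process-existence check like `pgrep` cannot show.

```typescript
import { statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical path and staleness window.
export const HEARTBEAT_PATH = join(tmpdir(), "scheduler-heartbeat");
const MAX_AGE_MS = 5 * 60 * 1000;

// Scheduler side: call once at startup (so a fresh container is not reported
// unhealthy for lack of a file) and again on every loop iteration.
export function writeHeartbeat(path: string = HEARTBEAT_PATH): void {
  writeFileSync(path, new Date().toISOString());
}

// Healthcheck side: healthy only if the file exists and was modified recently.
export function isHealthy(
  path: string = HEARTBEAT_PATH,
  now: number = Date.now(),
  maxAgeMs: number = MAX_AGE_MS,
): boolean {
  try {
    return now - statSync(path).mtimeMs <= maxAgeMs;
  } catch {
    return false; // missing file counts as unhealthy
  }
}
```

A Docker `HEALTHCHECK CMD` could run a small script that exits with status 1 when `isHealthy()` returns false.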
albertfj114
857eaedbcf fix: wait for FreeSearch on startup instead of exiting; clean stale jobs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:46:03 -04:00
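Waiting for FreeSearch instead of exiting amounts to polling a readiness probe with backoff. This sketch assumes a caller-supplied `probe` (for example, an HTTP request to a FreeSearch health endpoint). The probe and the delay values are hypothetical.

```typescript
// Hypothetical sketch: poll a readiness probe with capped exponential backoff.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Resolves with the number of failed probes before the service became ready.
export async function waitForService(
  probe: () => Promise<boolean>,
  { baseMs = 1000, maxMs = 30_000, maxAttempts = Infinity } = {},
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      if (await probe()) return attempt;
    } catch {
      // errors such as "connection refused" just mean "not ready yet"
    }
    await sleep(backoffDelay(attempt, baseMs, maxMs));
  }
  throw new Error(`service not ready after ${maxAttempts} attempts`);
}
```

The cap keeps the wait bounded to one probe every 30 seconds once the dependency has been down for a while.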
albertfj114
93d8a9080a docs: add freesearch stability implementation plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:40:40 -04:00
albertfj114
da4aa61860 docs: add freesearch stability & scheduler healthcheck design spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:38:17 -04:00
albertfj114
9593e08983 feat: add buscarmisas-network to package.json and scheduler pipeline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 23:49:39 -04:00
albertfj114
2b37c2d5f2 feat: add buscarmisas-network importer — CLI + main loop
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 23:47:41 -04:00
albertfj114
dde083c32e feat: add buscarmisas-network importer — DB helpers and church processing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 23:45:34 -04:00
albertfj114
5c7bc4cfed feat: add buscarmisas-network importer — sitemap discovery
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 23:43:19 -04:00
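Sitemap discovery for a WordPress-based network could be sketched as a breadth-first walk from a sitemap index down to page URLs. The log does not say which sitemap file the importer reads. Here `fetchText` is injected, so HTTP handling and crawl delay stay with the caller. The root URL and function names are hypothetical.

```typescript
// Hypothetical sketch of sitemap discovery. A regex keeps the example dependency-free;
// an XML parser would be more robust.
export function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    locs.push(m[1].replace(/&amp;/g, "&")); // decode &amp; (other entities omitted here)
  }
  return locs;
}

export function isSitemapIndex(xml: string): boolean {
  return /<sitemapindex[\s>]/.test(xml);
}

// Index files contribute child sitemaps; urlset files contribute page URLs.
export async function discoverUrls(
  rootUrl: string,
  fetchText: (url: string) => Promise<string>,
): Promise<string[]> {
  const seen = new Set<string>();
  const pages: string[] = [];
  const queue: string[] = [rootUrl];
  while (queue.length > 0) {
    const url = queue.shift()!;
    if (seen.has(url)) continue;
    seen.add(url);
    const xml = await fetchText(url);
    if (isSitemapIndex(xml)) queue.push(...extractLocs(xml));
    else pages.push(...extractLocs(xml));
  }
  return pages;
}
```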
albertfj114
08dc9e76ba feat: add buscarmisas-network importer — parsing functions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 19:09:24 -04:00
albertfj114
d4a8d173ce feat: add buscarmisasNetworkId (and discovermassId) to church-matcher interfaces and ID-match passes
2026-03-19 19:06:14 -04:00
albertfj114
db0be8671e feat: add buscarmisasNetworkId (and discovermassId) to church-matcher interfaces and ID-match passes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 19:03:51 -04:00
albertfj114
6ca891f517 chore: sync schema — add kerknetId, gottesdienstzeitenId, discovermassId, buscarmisasNetworkId
2026-03-19 19:02:09 -04:00
albertfj114
f1a0d458e4 docs: add BuscarMisas network importer implementation plan
2026-03-17 12:21:36 -04:00
albertfj114
c4ce474944 docs: finalize BuscarMisas network importer spec (all review issues resolved)
2026-03-16 23:40:53 -04:00
albertfj114
b93c7808a4 docs: add design spec for BuscarMisas network importer (BR/MX/AR/CO/CL)
2026-03-16 23:35:10 -04:00
albertfj114
ef01616ad8 docs: add design spec for buscarmisas network importer
Covers 7-country Latin America + UK + Switzerland mass times
network (horariosmissa.com.br and sister sites), all sharing
identical WordPress structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 00:16:37 -04:00
albertfj114
9e5e2a2b53 feat: add discovermass-import to scheduler pipeline and package.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 07:06:55 -04:00
albertfj114
53ddc51f64 refactor: remove --test-parse debug block from discovermass importer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 07:06:14 -04:00
albertfj114
2046bbe289 feat: add import-discovermass.ts — USA church importer with 10s crawl delay
Imports 20,284 US Catholic churches from discovermass.com including mass,
confession, and adoration schedules. Respects robots.txt Crawl-delay: 10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 07:02:58 -04:00
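Honoring `Crawl-delay: 10` means reading the value from robots.txt and spacing requests at least that far apart. The sketch below reads only the `User-agent: *` group, which is a simplification, and falls back to 10 seconds. The function and class names are hypothetical.

```typescript
// Hypothetical sketch: simplified Crawl-delay lookup plus a request throttle.
export function parseCrawlDelay(robotsTxt: string, fallbackSeconds = 10): number {
  let inStarGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") inStarGroup = value === "*";
    else if (key === "crawl-delay" && inStarGroup) {
      const n = Number(value);
      if (Number.isFinite(n) && n >= 0) return n;
    }
  }
  return fallbackSeconds;
}

// Reserves the next request slot and returns how many ms to wait before sending it.
export class Throttle {
  private last = -Infinity;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(intervalMs: number, now: () => number = () => Date.now()) {
    this.intervalMs = intervalMs;
    this.now = now;
  }

  reserve(): number {
    const t = this.now();
    const start = Math.max(t, this.last + this.intervalMs);
    this.last = start;
    return start - t;
  }
}
```

Suppose each church needs one page request. At 10 seconds per request, 20,284 churches take at least 202,840 s, or roughly 56 hours. This is why a long-running scheduler deployment suits this importer.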
albertfj114
857f1f3b3a fix: update findDuplicateChurch JSDoc to reflect all 15 matching passes
2026-03-11 06:55:37 -04:00
albertfj114
a046928ed0 feat: add discovermassId to church-matcher ExistingChurch and ChurchCandidate
Add discovermassId field to ExistingChurch interface and ChurchCandidate type,
insert a dedicated matching pass in findDuplicateChurch, and update all 15 importer
push blocks plus 16 loadExistingChurches select queries to include the new field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 06:52:05 -04:00
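A dedicated source-ID pass can be sketched as an exact lookup that runs before any fuzzy pass. The types are trimmed to the fields the pass needs and are assumptions. Note that 033f805965 later removed this pass from findDuplicateChurch in favor of importer-side pre-checks.

```typescript
// Hypothetical sketch of an exact discovermassId pass; types are trimmed and assumed.
interface ExistingChurch {
  id: string;
  name: string;
  discovermassId?: string | null;
}

interface ChurchCandidate {
  name: string;
  discovermassId?: string | null;
}

// Built once per import run so each candidate lookup is O(1).
function buildIdIndex(existing: ExistingChurch[]): Map<string, ExistingChurch> {
  const index = new Map<string, ExistingChurch>();
  for (const c of existing) {
    if (c.discovermassId) index.set(c.discovermassId, c);
  }
  return index;
}

// Returns the already-stored church with the same source ID, or null so that
// the caller can continue with its name/location passes.
function matchBySourceId(
  candidate: ChurchCandidate,
  index: Map<string, ExistingChurch>,
): ExistingChurch | null {
  if (!candidate.discovermassId) return null;
  return index.get(candidate.discovermassId) ?? null;
}
```

With tens of thousands of existing rows, a Map index avoids a linear scan of `existing` for every candidate.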
albertfj114
2706708c51 feat: add discovermassId field to Church schema
Add discovermassId as a unique, optional field to the Church model
to support importing churches from discovermass.com (20,284 US churches).

The field follows the same pattern as other source ID fields:
- String type, optional, unique
- Maps to 'discovermass_id' database column
- Includes corresponding database index

Generated Prisma client successfully with 'npx prisma generate'
2026-03-11 06:41:58 -04:00
albertfj114
6e9ada7fdf fix: harden discovermass plan against coord validation and regex slowdown
- Validate lat/lng from daddr= (bounds check + isFinite) before storing
- Cap HTML to 100KB before regex matching to prevent backtracking on large pages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 22:34:51 -04:00
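The two hardening steps could be sketched as follows. The sketch assumes coordinates appear as `daddr=LAT,LNG` in a maps link, possibly with the comma URL-encoded as `%2C`. The cap counts UTF-16 code units rather than bytes, so it only approximates 100 KB. Rejecting `0,0` as a placeholder is an extra assumption.

```typescript
// Hypothetical sketch: cap page size before regex work, then validate daddr= coordinates.
const MAX_HTML_CHARS = 100 * 1024;

function capHtml(html: string, maxChars = MAX_HTML_CHARS): string {
  return html.length > maxChars ? html.slice(0, maxChars) : html;
}

function extractDaddrCoords(html: string): { lat: number; lng: number } | null {
  const m = capHtml(html).match(/daddr=(-?\d+(?:\.\d+)?)(?:,|%2C)\s*(-?\d+(?:\.\d+)?)/i);
  if (!m) return null;
  const lat = Number(m[1]);
  const lng = Number(m[2]);
  if (!Number.isFinite(lat) || !Number.isFinite(lng)) return null;
  if (lat < -90 || lat > 90 || lng < -180 || lng > 180) return null; // bounds check
  if (lat === 0 && lng === 0) return null; // assumed placeholder value, treat as missing
  return { lat, lng };
}
```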
albertfj114
bbef80a782 docs: add discovermass.com importer spec and implementation plan
20,284 US churches with mass/confession/adoration schedules.
10s crawl delay (robots.txt), Docker deployment via scheduler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 21:49:51 -04:00
albertfj114
0e468bcb94 docs: add Brazil + Spain importers design spec and implementation plan
Two new importers:
- horariodemissa.com.br: 8,895 Brazilian churches + 28,523 mass times
- misas.org: 17,919 Spanish churches with coordinates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 19:50:54 -04:00