FreeSearch Stability & Scheduler Healthcheck Fix
Date: 2026-03-28
Status: Approved
Scope: scripts/enrich-with-freesearch.ts, scripts/scheduler.ts, docker-compose.yml
Problem Summary
Three related infrastructure reliability issues identified during health check:
- FreeSearch crash loop: the freesearch-enrichment container restarts every ~60s because the startup health check calls process.exit(1) when the FreeSearch API is unreachable. The circuit breaker (which handles mid-run outages) lives inside runContinuous() and is never reached.
- Stale running jobs: each container restart creates a new freesearch-enrichment DB job without cleaning up the previous running one. Two jobs from Mar 22 and Mar 26 are permanently stuck as running.
- Scheduler healthcheck failing: node:20-bookworm-slim does not include procps/pgrep. The healthcheck command pgrep -f scheduler.ts exits 1 silently, so the scheduler shows as unhealthy despite working correctly.
Fix 1: FreeSearch Startup Resilience
Change
Replace the process.exit(1) startup health check in main() with a waitForFreeSearch() function.
Behavior
- Polls GET /api/health with exponential backoff: 30s → 60s → 120s → 240s → cap at 300s (5 min)
- Waits indefinitely: the container stays alive until FreeSearch comes back
- Logs each attempt: "FreeSearch not reachable, retrying in 120s..."
- Logs recovery: "FreeSearch is back, continuing..."
- Proceeds to job setup and runContinuous() once the health check passes
Stale job cleanup (same function)
Before creating a new DB job in main(), run a cleanup:
await prisma.backgroundJob.updateMany({
  where: { type: 'freesearch-enrichment', status: 'running' },
  data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
});
This fixes the two existing stuck jobs and prevents the pattern from recurring on future restarts.
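To illustrate why the ordering matters (clean up first, then create), here is a small in-memory model of the job table. It is only a sketch: InMemoryJobs and startJob are hypothetical stand-ins so the example runs without a database, and the real code would use the Prisma call shown above.

```typescript
// Hypothetical in-memory stand-in for the backgroundJob table (not Prisma).
type JobStatus = "running" | "failed" | "completed";

interface BackgroundJob {
  id: number;
  type: string;
  status: JobStatus;
  error?: string;
  completedAt?: Date;
}

class InMemoryJobs {
  private jobs: BackgroundJob[] = [];
  private nextId = 1;

  // Same effect as the updateMany call: every running job of this type
  // becomes failed. Returns how many rows were changed.
  markStaleAsFailed(type: string, now: Date): number {
    let count = 0;
    for (const job of this.jobs) {
      if (job.type === type && job.status === "running") {
        job.status = "failed";
        job.error = "Container restarted";
        job.completedAt = now;
        count += 1;
      }
    }
    return count;
  }

  createRunning(type: string): BackgroundJob {
    const job: BackgroundJob = { id: this.nextId++, type, status: "running" };
    this.jobs.push(job);
    return job;
  }

  countByStatus(type: string, status: JobStatus): number {
    return this.jobs.filter((j) => j.type === type && j.status === status).length;
  }
}

// Startup sequence: fail leftovers from a previous container, then start fresh.
function startJob(store: InMemoryJobs, type: string, now: Date = new Date()): BackgroundJob {
  store.markStaleAsFailed(type, now);
  return store.createRunning(type);
}
```

With this ordering, calling startJob once per simulated container start leaves exactly one running job however many restarts occurred. Reversing the two steps would mark the freshly created job as failed too.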
Files changed
scripts/enrich-with-freesearch.ts: ~25 lines
Fix 2: Scheduler Healthcheck
Change
Replace pgrep-based healthcheck with a heartbeat file approach.
In scheduler.ts: Add writeHeartbeat() call inside the existing hourly cron handler. Writes current ISO timestamp to /app/logs/scheduler.heartbeat.
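A minimal sketch of what writeHeartbeat() might look like, using only Node's built-in fs and path modules. The function name and the /app/logs/scheduler.heartbeat path come from this plan. The optional parameters, the temp-file-then-rename step, and the isHeartbeatFresh helper are assumptions added for illustration.

```typescript
import { mkdirSync, renameSync, statSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

const HEARTBEAT_PATH = "/app/logs/scheduler.heartbeat";

// Writes the current ISO timestamp to the heartbeat file and returns it.
// Writing to a temp file and renaming it is an assumption, intended to keep
// a concurrent reader from seeing a half-written file.
function writeHeartbeat(path: string = HEARTBEAT_PATH, now: Date = new Date()): string {
  const stamp = now.toISOString();
  mkdirSync(dirname(path), { recursive: true });
  const tmpPath = `${path}.tmp`;
  writeFileSync(tmpPath, stamp + "\n");
  renameSync(tmpPath, path);
  return stamp;
}

// Hypothetical helper approximating a modification-time check such as
// `find <file> -mmin -<maxAgeMinutes>`: true if the file exists and was
// modified less than maxAgeMinutes ago.
function isHeartbeatFresh(
  path: string = HEARTBEAT_PATH,
  maxAgeMinutes: number = 120,
  nowMs: number = Date.now(),
): boolean {
  try {
    const ageMs = nowMs - statSync(path).mtimeMs;
    return ageMs < maxAgeMinutes * 60_000;
  } catch {
    return false; // a missing file counts as stale
  }
}
```

A freshness check based on modification time looks at when the file was last written rather than at the timestamp inside it, so the ISO string in the file mainly helps a person inspecting it.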
In docker-compose.yml: Replace healthcheck:
# Before
test: ["CMD-SHELL", "pgrep -f scheduler.ts || exit 1"]
interval: 60s
timeout: 10s
retries: 3
start_period: 30s
# After
test: ["CMD-SHELL", "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . || exit 1"]
interval: 90s
timeout: 10s
retries: 3
start_period: 90s
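The ./logs volume is already mounted. start_period: 90s is meant to avoid false alarms before the first heartbeat is written. Because the heartbeat is written from the hourly cron handler, the first write can come up to an hour after startup, so this holds only if a recent heartbeat file already exists in ./logs or a heartbeat is also written at startup.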
Files changed
scripts/scheduler.ts: ~5 lines
docker-compose.yml: 4 lines
Fix 3: Deploy
bash scripts/deploy-local.sh
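docker compose -f /opt/docker/scraper-control/docker-compose.yml restart freesearch-enrichment scheduler
Note: docker compose restart does not pick up changes made to docker-compose.yml. Unless scripts/deploy-local.sh already recreates the containers, the new scheduler healthcheck only takes effect once the service is recreated, for example with docker compose -f /opt/docker/scraper-control/docker-compose.yml up -d scheduler.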
Success Criteria
- freesearch-enrichment container stays running even when FreeSearch is down, and resumes enrichment when it comes back
- No new stale running freesearch-enrichment jobs after container restarts
- scheduler container shows as healthy in docker ps
- No behavioral changes to enrichment logic itself