# FreeSearch Stability & Scheduler Healthcheck Fix

**Date:** 2026-03-28
**Status:** Approved
**Scope:** `scripts/enrich-with-freesearch.ts`, `scripts/scheduler.ts`, `docker-compose.yml`

---

## Problem Summary

Three related infrastructure reliability issues identified during health check:

1. **FreeSearch crash loop**: `freesearch-enrichment` container restarts every ~60s because startup health check calls `process.exit(1)` when FreeSearch API is unreachable. The circuit breaker (which handles mid-run outages) lives inside `runContinuous()` and is never reached.
2. **Stale running jobs**: Each container restart creates a new `freesearch-enrichment` DB job without cleaning up the previous `running` one. Two jobs from Mar 22 and Mar 26 are permanently stuck as `running`.
3. **Scheduler healthcheck failing**: `node:20-bookworm-slim` does not include `procps`/`pgrep`. The healthcheck command `pgrep -f scheduler.ts` exits 1 silently → scheduler shows as `unhealthy` despite working correctly.

---

## Fix 1: FreeSearch Startup Resilience

### Change

Replace the `process.exit(1)` startup health check in `main()` with a `waitForFreeSearch()` function.

### Behavior

- Polls `GET /api/health` with exponential backoff: 30s → 60s → 120s → 240s → cap at 300s (5 min)
- Waits indefinitely: container stays alive until FreeSearch comes back
- Logs each attempt: `"FreeSearch not reachable, retrying in 120s..."`
- Logs recovery: `"FreeSearch is back, continuing..."`
- Proceeds to job setup and `runContinuous()` once health check passes

### Stale job cleanup (same function)

Before creating a new DB job in `main()`, run a cleanup:

```typescript
await prisma.backgroundJob.updateMany({
  where: { type: 'freesearch-enrichment', status: 'running' },
  data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
});
```

This fixes the two existing stuck jobs and prevents the pattern from recurring on future restarts.

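The `waitForFreeSearch()` polling behavior described above could be sketched roughly as follows. This is an illustration under assumptions, not the approved implementation: the helper names `backoffDelayMs` and `httpHealthProbe`, the injected `probe` and `sleep` parameters, the base URL argument, and the 10s request timeout are all hypothetical and do not come from the existing script.

```typescript
// Sketch only: helper names, injected dependencies, and the 10s request
// timeout are assumptions, not taken from scripts/enrich-with-freesearch.ts.

const INITIAL_DELAY_MS = 30_000; // first retry after 30s
const MAX_DELAY_MS = 300_000; // cap at 300s (5 min)

/** Delay before retry `attempt` (0-based): 30s, 60s, 120s, 240s, then 300s. */
function backoffDelayMs(attempt: number): number {
  // Clamp the exponent so the product stays small during very long outages.
  const exponent = Math.min(attempt, 10);
  return Math.min(INITIAL_DELAY_MS * 2 ** exponent, MAX_DELAY_MS);
}

type HealthProbe = () => Promise<boolean>;
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Probe for GET <baseUrl>/api/health; network errors and timeouts count as "down". */
function httpHealthProbe(baseUrl: string): HealthProbe {
  return async () => {
    try {
      const res = await fetch(`${baseUrl}/api/health`, {
        signal: AbortSignal.timeout(10_000),
      });
      return res.ok;
    } catch {
      return false;
    }
  };
}

/**
 * Resolves once the probe succeeds. It retries indefinitely and never calls
 * process.exit, so the container stays alive while FreeSearch is down.
 * Returns the number of failed attempts before the first success.
 */
async function waitForFreeSearch(
  probe: HealthProbe,
  sleep: Sleep = defaultSleep,
): Promise<number> {
  let failures = 0;
  while (!(await probe())) {
    const delayMs = backoffDelayMs(failures);
    console.log(`FreeSearch not reachable, retrying in ${delayMs / 1000}s...`);
    await sleep(delayMs);
    failures += 1;
  }
  if (failures > 0) {
    console.log("FreeSearch is back, continuing...");
  }
  return failures;
}
```

In `main()`, a call such as `await waitForFreeSearch(httpHealthProbe(baseUrl))` would sit where the `process.exit(1)` check is today, ahead of the stale job cleanup and `runContinuous()`. Passing `probe` and `sleep` as parameters is a design choice made here so the backoff schedule can be exercised in tests without real network calls or real waiting.
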
### Files changed

- `scripts/enrich-with-freesearch.ts`: ~25 lines

---

## Fix 2: Scheduler Healthcheck

### Change

Replace `pgrep`-based healthcheck with a heartbeat file approach.

**In `scheduler.ts`:** Add `writeHeartbeat()` call inside the existing hourly cron handler. Writes current ISO timestamp to `/app/logs/scheduler.heartbeat`.

**In `docker-compose.yml`:** Replace healthcheck:

```yaml
# Before
test: ["CMD-SHELL", "pgrep -f scheduler.ts || exit 1"]
interval: 60s
timeout: 10s
retries: 3
start_period: 30s

# After
test: ["CMD-SHELL", "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . || exit 1"]
interval: 90s
timeout: 10s
retries: 3
start_period: 90s
```

The `./logs` volume is already mounted. `start_period: 90s` avoids false alarms before the first cron tick.

### Files changed

- `scripts/scheduler.ts`: ~5 lines
- `docker-compose.yml`: 4 lines

---

## Fix 3: Deploy

```bash
bash scripts/deploy-local.sh
docker compose -f /opt/docker/scraper-control/docker-compose.yml restart freesearch-enrichment scheduler
```

---

## Success Criteria

- `freesearch-enrichment` container stays running even when FreeSearch is down, resumes enrichment when it comes back
- No new stale `running` freesearch-enrichment jobs after container restarts
- `scheduler` container shows as `healthy` in `docker ps`
- No behavioral changes to enrichment logic itself
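
For reference, the `writeHeartbeat()` helper from Fix 2 might look something like the sketch below. Only the function name and the file path come from this document; the optional parameters, the `mkdirSync` call, and the `heartbeatAgeMinutes` helper (which mirrors what `find -mmin -120` checks) are assumptions added for illustration.

```typescript
import { mkdirSync, statSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Path taken from the Fix 2 description above.
const HEARTBEAT_PATH = "/app/logs/scheduler.heartbeat";

/**
 * Writes the current ISO timestamp to the heartbeat file and returns it.
 * The optional parameters are an assumption made here to keep the helper testable.
 */
function writeHeartbeat(
  path: string = HEARTBEAT_PATH,
  now: Date = new Date(),
): string {
  const stamp = now.toISOString();
  // Create the parent directory if it is missing (no-op when it exists).
  mkdirSync(dirname(path), { recursive: true });
  // Overwriting the file refreshes its mtime, which is what the healthcheck reads.
  writeFileSync(path, `${stamp}\n`);
  return stamp;
}

/**
 * Hypothetical helper: minutes since the heartbeat file was last modified.
 * The compose healthcheck passes while this stays below 120.
 */
function heartbeatAgeMinutes(path: string = HEARTBEAT_PATH): number {
  return (Date.now() - statSync(path).mtimeMs) / 60_000;
}
```

Inside the existing hourly cron handler this would be a single `writeHeartbeat()` call. Writing one heartbeat at process startup as well may help keep the check passing before the first hourly tick; that is a suggestion beyond the approved change.
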