# FreeSearch Stability & Scheduler Healthcheck Fix

**Date:** 2026-03-28
**Status:** Approved
**Scope:** `scripts/enrich-with-freesearch.ts`, `scripts/scheduler.ts`, `docker-compose.yml`

---

## Problem Summary

Three related infrastructure reliability issues were identified during a health check:

1. **FreeSearch crash loop:** The `freesearch-enrichment` container restarts every ~60s because its startup health check calls `process.exit(1)` when the FreeSearch API is unreachable. The circuit breaker, which handles mid-run outages, lives inside `runContinuous()` and is never reached.

2. **Stale running jobs:** Each container restart creates a new `freesearch-enrichment` DB job without cleaning up the previous `running` one. Two jobs, from Mar 22 and Mar 26, are permanently stuck as `running`.

3. **Scheduler healthcheck failing:** `node:20-bookworm-slim` does not include `procps`, so `pgrep` is not available. The healthcheck command `pgrep -f scheduler.ts || exit 1` therefore always exits 1, and the scheduler shows as `unhealthy` despite working correctly.

---

## Fix 1: FreeSearch Startup Resilience

### Change

Replace the `process.exit(1)` startup health check in `main()` with a `waitForFreeSearch()` function.

### Behavior

- Polls `GET /api/health` with exponential backoff: 30s → 60s → 120s → 240s, then capped at 300s (5 min)
- Waits indefinitely: the container stays alive until FreeSearch comes back
- Logs each failed attempt, e.g. `"FreeSearch not reachable, retrying in 120s..."`
- Logs recovery: `"FreeSearch is back, continuing..."`
- Proceeds to job setup and `runContinuous()` once the health check passes
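The behavior listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `backoffDelayMs()`, `isFreeSearchHealthy()`, the injectable `probe` and `wait` parameters, and the `baseUrl` argument are hypothetical names introduced for the example. Only `waitForFreeSearch()`, `GET /api/health`, the delay schedule, and the log messages come from this spec.

```typescript
// Sketch only. Names other than waitForFreeSearch() are hypothetical.
const INITIAL_DELAY_MS = 30_000; // first retry after 30s
const MAX_DELAY_MS = 300_000; // cap at 5 min

/** Delay before retry `attempt` (0-based): 30s, 60s, 120s, 240s, then 300s. */
function backoffDelayMs(attempt: number): number {
  return Math.min(INITIAL_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Hypothetical probe: true when GET <baseUrl>/api/health returns a 2xx status. */
async function isFreeSearchHealthy(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/health`);
    return res.ok;
  } catch {
    return false; // network errors count as "not reachable"
  }
}

/** Resolves once the probe succeeds; retries forever and never exits the process. */
async function waitForFreeSearch(
  probe: () => Promise<boolean>,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<void> {
  let attempt = 0;
  while (!(await probe())) {
    const delayMs = backoffDelayMs(attempt++);
    console.log(`FreeSearch not reachable, retrying in ${delayMs / 1000}s...`);
    await wait(delayMs);
  }
  // Only announce recovery if at least one attempt failed.
  if (attempt > 0) console.log('FreeSearch is back, continuing...');
}
```

In `main()`, a call such as `await waitForFreeSearch(() => isFreeSearchHealthy(baseUrl))` would take the place of the `process.exit(1)` check, where `baseUrl` stands for however the script already locates the FreeSearch API. Passing `probe` and `wait` as parameters keeps the retry loop testable without real network calls or timers.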

### Stale job cleanup (also in `main()`)

Before creating a new DB job in `main()`, mark any leftover `running` jobs as failed:

```typescript
// Mark jobs left in `running` by a previous container run as failed.
await prisma.backgroundJob.updateMany({
  where: { type: 'freesearch-enrichment', status: 'running' },
  data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
});
```

This fixes the two existing stuck jobs and prevents the pattern from recurring on future restarts.

### Files changed

- `scripts/enrich-with-freesearch.ts`: ~25 lines

---

## Fix 2: Scheduler Healthcheck

### Change

Replace the `pgrep`-based healthcheck with a heartbeat file.

**In `scheduler.ts`:** Add a `writeHeartbeat()` call inside the existing hourly cron handler. It writes the current ISO timestamp to `/app/logs/scheduler.heartbeat`.

**In `docker-compose.yml`:** Replace the healthcheck:

```yaml
# Before
test: ["CMD-SHELL", "pgrep -f scheduler.ts || exit 1"]
interval: 60s
timeout: 10s
retries: 3
start_period: 30s

# After
test: ["CMD-SHELL", "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . || exit 1"]
interval: 90s
timeout: 10s
retries: 3
start_period: 90s
```

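The `./logs` volume is already mounted, and the 120-minute window is twice the hourly heartbeat interval, so a single missed tick does not flip the container to `unhealthy`. `start_period: 90s` only suppresses failures while the process starts; it does not cover the wait for the first hourly tick. On a fresh deployment with no recent heartbeat file in `./logs`, the container will report `unhealthy` until that tick unless `writeHeartbeat()` is also called once at startup.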
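A possible shape for the heartbeat helper is sketched below. It is an illustration only: the optional `path` parameter and the `isHeartbeatFresh()` function are hypothetical additions, included so the sketch can be tried outside the container. `isHeartbeatFresh()` roughly mirrors what the `find ... -mmin -120` healthcheck tests; it is not part of the spec.

```typescript
import { mkdirSync, statSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

// Path taken from the spec; the parameter below exists only for local experiments.
const HEARTBEAT_PATH = '/app/logs/scheduler.heartbeat';

/** Writes the current ISO timestamp. Rewriting the file also bumps its mtime, which is what `find -mmin` inspects. */
function writeHeartbeat(path: string = HEARTBEAT_PATH): void {
  try {
    mkdirSync(dirname(path), { recursive: true });
    writeFileSync(path, new Date().toISOString() + '\n');
  } catch (err) {
    // A failed heartbeat should not crash the scheduler; the healthcheck will surface it.
    console.error('Failed to write scheduler heartbeat:', err);
  }
}

/** Hypothetical helper: true if the file was modified less than `maxAgeMinutes` ago. */
function isHeartbeatFresh(path: string, maxAgeMinutes: number): boolean {
  try {
    return Date.now() - statSync(path).mtimeMs < maxAgeMinutes * 60_000;
  } catch {
    return false; // a missing file counts as stale
  }
}
```

The healthcheck only looks at the modification time, so the ISO timestamp in the file body is there for humans reading the logs directory rather than for Docker.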

### Files changed

- `scripts/scheduler.ts`: ~5 lines
- `docker-compose.yml`: 4 lines

---

## Fix 3: Deploy

```bash
bash scripts/deploy-local.sh
# Recreate rather than restart: `restart` does not apply docker-compose.yml changes such as the new healthcheck.
docker compose -f /opt/docker/scraper-control/docker-compose.yml up -d --force-recreate freesearch-enrichment scheduler
```

---

## Success Criteria

- The `freesearch-enrichment` container stays running even when FreeSearch is down and resumes enrichment when it comes back
- No new stale `running` `freesearch-enrichment` jobs after container restarts
- The `scheduler` container shows as `healthy` in `docker ps`
- No behavioral changes to the enrichment logic itself