ScraperControl/docs/superpowers/specs/2026-03-28-freesearch-stability-design.md

# FreeSearch Stability & Scheduler Healthcheck Fix

**Date:** 2026-03-28
**Status:** Approved
**Scope:** `scripts/enrich-with-freesearch.ts`, `scripts/scheduler.ts`, `docker-compose.yml`

---

## Problem Summary

Three related infrastructure reliability issues were identified during a health check:

1. **FreeSearch crash loop:** The `freesearch-enrichment` container restarts every ~60s because the startup health check calls `process.exit(1)` when the FreeSearch API is unreachable. The circuit breaker (which handles mid-run outages) lives inside `runContinuous()` and is never reached.

2. **Stale running jobs:** Each container restart creates a new `freesearch-enrichment` DB job without cleaning up the previous `running` one. Two jobs, from Mar 22 and Mar 26, are permanently stuck as `running`.

3. **Scheduler healthcheck failing:** `node:20-bookworm-slim` does not include `procps`, the package that provides `pgrep`. The healthcheck command `pgrep -f scheduler.ts || exit 1` therefore always exits 1, so the scheduler shows as `unhealthy` despite working correctly.

---

## Fix 1: FreeSearch Startup Resilience

### Change

Replace the `process.exit(1)` startup health check in `main()` with a `waitForFreeSearch()` function.

### Behavior

- Polls `GET /api/health` with exponential backoff: 30s → 60s → 120s → 240s → cap at 300s (5 min)
- Waits indefinitely: the container stays alive until FreeSearch comes back
- Logs each attempt: `"FreeSearch not reachable, retrying in 120s..."`
- Logs recovery: `"FreeSearch is back, continuing..."`
- Proceeds to job setup and `runContinuous()` once the health check passes
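The behavior above can be sketched as follows. This is a minimal illustration rather than the actual implementation: only the `waitForFreeSearch()` name, the `/api/health` path, the 30s start, the doubling, the 300s cap, and the log messages come from this spec. The helper names `backoffDelayMs`, `isFreeSearchHealthy`, and `sleep`, the injectable `check`/`wait` parameters, and the 10s request timeout are assumptions, and the sketch relies on the global `fetch` available in Node 18+.

```typescript
// Sketch only: helper names and the 10s request timeout are assumptions.
const INITIAL_DELAY_MS = 30_000; // first retry after 30s
const MAX_DELAY_MS = 300_000;    // cap at 5 minutes

/** Delay before retry number `attempt` (0-based): 30s, 60s, 120s, 240s, then 300s. */
function backoffDelayMs(attempt: number): number {
  return Math.min(INITIAL_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Returns true when GET <baseUrl>/api/health answers with a 2xx status. */
async function isFreeSearchHealthy(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/health`, {
      signal: AbortSignal.timeout(10_000),
    });
    return res.ok;
  } catch {
    return false; // network error or timeout counts as unreachable
  }
}

/** Blocks until `check` passes; never exits the process. */
async function waitForFreeSearch(
  check: () => Promise<boolean>,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<void> {
  let attempt = 0;
  while (!(await check())) {
    const delayMs = backoffDelayMs(attempt);
    console.log(`FreeSearch not reachable, retrying in ${delayMs / 1000}s...`);
    await wait(delayMs);
    attempt += 1;
  }
  if (attempt > 0) console.log('FreeSearch is back, continuing...');
}
```

In `main()` this might be called as `await waitForFreeSearch(() => isFreeSearchHealthy(baseUrl))`, where `baseUrl` stands for however the script already resolves the FreeSearch address. Passing `check` and `wait` as parameters is a design choice that lets the backoff be exercised without real network calls or timers.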

### Stale job cleanup (also in `main()`)

Before creating a new DB job in `main()`, run a cleanup:

```typescript
await prisma.backgroundJob.updateMany({
  where: { type: 'freesearch-enrichment', status: 'running' },
  data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
});
```

This fixes the two existing stuck jobs and prevents the pattern from recurring on future restarts.
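To show how the pieces of Fix 1 fit together, here is a hedged sketch of the startup order. The spec requires only that cleanup runs before the new job is created and that job setup follows a passing health check; placing the cleanup after the wait is an assumption, and cleaning up first would be equally compatible. `StartupDeps`, `startEnrichment`, `failStaleJobs`, and `createJob` are hypothetical names, with `failStaleJobs` standing in for a wrapper around the `updateMany` call above that returns its `count`.

```typescript
// Sketch only: dependencies are injected so the ordering is visible and testable.
interface StartupDeps {
  waitForFreeSearch: () => Promise<void>;
  failStaleJobs: () => Promise<number>; // hypothetical wrapper; returns rows updated
  createJob: () => Promise<string>;     // hypothetical; returns the new job id
  runContinuous: (jobId: string) => Promise<void>;
}

async function startEnrichment(deps: StartupDeps): Promise<void> {
  // 1. Block until FreeSearch answers instead of exiting the process.
  await deps.waitForFreeSearch();

  // 2. Mark leftover `running` jobs from earlier container lifetimes as failed.
  const cleaned = await deps.failStaleJobs();
  if (cleaned > 0) console.log(`Marked ${cleaned} stale job(s) as failed`);

  // 3. Only now create this run's job and enter the existing loop.
  const jobId = await deps.createJob();
  await deps.runContinuous(jobId);
}
```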

### Files changed

- `scripts/enrich-with-freesearch.ts`: ~25 lines

---

## Fix 2: Scheduler Healthcheck

### Change

Replace `pgrep`-based healthcheck with a heartbeat file approach.

**In `scheduler.ts`:** Add a `writeHeartbeat()` call inside the existing hourly cron handler. It writes the current ISO timestamp to `/app/logs/scheduler.heartbeat`.
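A minimal sketch of the heartbeat write, assuming synchronous `node:fs` calls are acceptable inside the cron handler. Only the `writeHeartbeat()` name and the file path come from this spec. `isHeartbeatFresh` is a hypothetical helper that is not part of the planned change; it is included to show roughly what a freshness check on the file's modification time amounts to.

```typescript
import { mkdirSync, statSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

const HEARTBEAT_PATH = '/app/logs/scheduler.heartbeat';

/** Writes the current ISO timestamp; rewriting the file also refreshes its mtime. */
function writeHeartbeat(path: string = HEARTBEAT_PATH): void {
  mkdirSync(dirname(path), { recursive: true }); // no-op if the directory exists
  writeFileSync(path, new Date().toISOString() + '\n');
}

/** Hypothetical helper: true if the file was modified within `maxAgeMinutes`. */
function isHeartbeatFresh(
  path: string,
  maxAgeMinutes = 120,
  now: number = Date.now(),
): boolean {
  try {
    return now - statSync(path).mtimeMs < maxAgeMinutes * 60_000;
  } catch {
    return false; // a missing file counts as stale
  }
}
```

Calling `writeHeartbeat()` once at startup as well as in the hourly handler would make the file exist before the first cron tick.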

**In `docker-compose.yml`:** Replace the healthcheck:

```yaml
# Before
test: ["CMD-SHELL", "pgrep -f scheduler.ts || exit 1"]
interval: 60s
timeout: 10s
retries: 3
start_period: 30s

# After
test: ["CMD-SHELL", "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . || exit 1"]
interval: 90s
timeout: 10s
retries: 3
start_period: 90s
```

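The `./logs` volume is already mounted. `start_period: 90s` covers container startup, but the hourly handler does not write a heartbeat until its first tick. On a fresh deploy with no existing heartbeat file, the check can therefore report `unhealthy` for up to an hour unless `writeHeartbeat()` is also called once at startup.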

### Files changed

- `scripts/scheduler.ts`: ~5 lines
- `docker-compose.yml`: 4 lines

---

## Fix 3: Deploy

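```bash
# Deploy the updated scripts and compose file.
bash scripts/deploy-local.sh
# Note: `restart` does not apply compose-file changes; the new healthcheck needs `up -d` unless deploy-local.sh already recreates the containers.
docker compose -f /opt/docker/scraper-control/docker-compose.yml restart freesearch-enrichment scheduler
```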

---

## Success Criteria

- `freesearch-enrichment` container stays running even when FreeSearch is down and resumes enrichment when it comes back
- No new stale `running` freesearch-enrichment jobs after container restarts
- `scheduler` container shows as `healthy` in `docker ps`
- No behavioral changes to enrichment logic itself