ScraperControl/docs/superpowers/plans/2026-03-28-freesearch-stability.md
albertfj114 93d8a9080a docs: add freesearch stability implementation plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:40:40 -04:00

FreeSearch Stability & Scheduler Healthcheck Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make the freesearch-enrichment container stay alive when FreeSearch is down, clean up stale running jobs on restart, and fix the scheduler's perpetually-failing Docker healthcheck.

Architecture: Three targeted edits across two scripts and docker-compose. enrich-with-freesearch.ts gets a waitForFreeSearch() startup loop and a stale-job cleanup before job creation. scheduler.ts writes a heartbeat file on each hourly cron tick. docker-compose.yml swaps the pgrep healthcheck for a file-age check on that heartbeat file.

Tech Stack: TypeScript/tsx, Prisma, Docker Compose, node-cron, bash (healthcheck command)


Files

  • Modify: scripts/enrich-with-freesearch.ts:872-880 — add waitForFreeSearch() function
  • Modify: scripts/enrich-with-freesearch.ts:1272-1296 — replace startup exit with wait call + stale job cleanup
  • Modify: scripts/scheduler.ts:747-758 — write heartbeat file in hourly cron
  • Modify: docker-compose.yml:275-280 — replace scheduler healthcheck

Task 1: Add waitForFreeSearch() to the enrichment script

Files:

  • Modify: scripts/enrich-with-freesearch.ts

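The existing healthCheck() function (line 872) returns a boolean. We add waitForFreeSearch() directly below it: a loop that calls healthCheck() and, on failure, sleeps with exponential backoff (starting at 30 seconds and doubling up to a 5-minute cap) until the check succeeds or the process starts shutting down.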

  • Step 1: Add waitForFreeSearch() after healthCheck()

In scripts/enrich-with-freesearch.ts, find this block (around line 872):

async function healthCheck(): Promise<boolean> {
  try {
    const resp = await axios.get(`${FREESEARCH_URL}/api/health`, { timeout: 5000 });
    return resp.status === 200;
  } catch {
    return false;
  }
}

Add the following function immediately after it:

async function waitForFreeSearch(): Promise<void> {
  let backoffMs = 30_000;
  const maxBackoffMs = 300_000; // 5 minutes
  let attempt = 0;

  while (!shuttingDown) {
    attempt++;
    const healthy = await healthCheck();
    if (healthy) {
      if (attempt > 1) log('FreeSearch is back. Continuing...');
      return;
    }
    const waitSec = Math.round(backoffMs / 1000);
    logError(`FreeSearch not reachable at ${FREESEARCH_URL} (attempt ${attempt}). Retrying in ${waitSec}s...`);
    await sleep(backoffMs);
    backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
  }
}
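To make the retry timing concrete, the sketch below tabulates the delays that the doubling-with-cap rule in waitForFreeSearch() produces. The helper name backoffSchedule is hypothetical and is not part of enrich-with-freesearch.ts; it only replays the arithmetic.

```typescript
// Hypothetical helper, not part of enrich-with-freesearch.ts: lists the delay
// waitForFreeSearch() would sleep after each failed health check.
function backoffSchedule(initialMs: number, maxMs: number, failures: number): number[] {
  const delays: number[] = [];
  let backoffMs = initialMs;
  for (let i = 0; i < failures; i++) {
    delays.push(backoffMs);
    backoffMs = Math.min(backoffMs * 2, maxMs); // same doubling-with-cap rule as the loop
  }
  return delays;
}

// Delays in seconds for the first six failed attempts.
const delaysSec = backoffSchedule(30_000, 300_000, 6).map((ms) => ms / 1000);
console.log(delaysSec.join(', ')); // 30, 60, 120, 240, 300, 300
```

After the fourth failure the delay stays at the 5-minute cap, so a long FreeSearch outage costs one health request every five minutes.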
  • Step 2: Replace the startup health check block in main()

Find this block in main() (around line 1272):

  // Health check
  log('Checking FreeSearch health...');
  const healthy = await healthCheck();
  if (!healthy) {
    logError(`FreeSearch not reachable at ${FREESEARCH_URL}`);
    logError('Make sure FreeSearch is running and accessible.');
    process.exit(1);
  }
  log('FreeSearch health check: OK');

Replace with:

  // Wait for FreeSearch to be reachable (indefinite retry with backoff)
  log('Waiting for FreeSearch to be reachable...');
  await waitForFreeSearch();
  if (shuttingDown) return;
  log('FreeSearch health check: OK');
  • Step 3: Add stale job cleanup before job creation

Find this block in main() (around line 1291):

  // Job tracking
  let jobId = await createOrResumeJob(args);
  if (!jobId) {
    jobId = await createNewJob({ countryCode, limit, continuous, dryRun, reSearch });
  }
  log(`Job ID: ${jobId}`);

Replace with:

  // Job tracking — clean up any running jobs left by a previous container restart
  await prisma.backgroundJob.updateMany({
    where: { type: 'freesearch-enrichment', status: 'running' },
    data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
  });

  let jobId = await createOrResumeJob(args);
  if (!jobId) {
    jobId = await createNewJob({ countryCode, limit, continuous, dryRun, reSearch });
  }
  log(`Job ID: ${jobId}`);
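The effect of the updateMany call can be illustrated with an in-memory stand-in. This is only a sketch: the JobRow type, the failStaleJobs helper, and the 'other-job' type are assumptions made for illustration, not the real Prisma model or data.

```typescript
// In-memory stand-in for the backgroundJob table. Field names follow the
// updateMany call in this step; the type itself is an assumption.
interface JobRow {
  id: number;
  type: string;
  status: 'running' | 'completed' | 'failed';
  error?: string;
  completedAt?: Date;
}

// Mirrors updateMany({ where: { type, status: 'running' }, data: { ... } })
// and returns the number of rows changed.
function failStaleJobs(jobs: JobRow[], type: string, now: Date): number {
  let count = 0;
  for (const job of jobs) {
    if (job.type === type && job.status === 'running') {
      job.status = 'failed';
      job.error = 'Container restarted';
      job.completedAt = now;
      count++;
    }
  }
  return count;
}

const jobs: JobRow[] = [
  { id: 1, type: 'freesearch-enrichment', status: 'running' },
  { id: 2, type: 'freesearch-enrichment', status: 'completed' },
  { id: 3, type: 'other-job', status: 'running' }, // hypothetical unrelated job
];
const changed = failStaleJobs(jobs, 'freesearch-enrichment', new Date());
console.log(changed, jobs.map((j) => j.status).join(',')); // 1 failed,completed,running
```

Because the filter matches every running job of this type, it is worth confirming before implementing that createOrResumeJob() does not rely on finding a running row, and that only one freesearch-enrichment instance runs at a time.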
  • Step 4: Verify the script compiles
cd /home/albert/Documents/ScraperControl
npx tsc --noEmit

Expected: no errors (or only pre-existing errors unrelated to this change).

  • Step 5: Commit
git add scripts/enrich-with-freesearch.ts
git commit -m "fix: wait for FreeSearch on startup instead of exiting; clean stale jobs"

Task 2: Write heartbeat file in scheduler

Files:

  • Modify: scripts/scheduler.ts

The scheduler already has an hourly cron that logs a heartbeat message (lines 747-758). We add a single fs.writeFileSync call inside it to write the timestamp to /app/logs/scheduler.heartbeat. The logs/ directory is already created by ensureLogsDir() at startup.

  • Step 1: Add heartbeat file write inside the hourly cron

Find this block in scripts/scheduler.ts (around line 747):

  // Heartbeat every hour — logs cycle state
  cron.schedule('0 * * * *', () => {
    const currentGroup = cycleState.currentGroupIndex < PIPELINE_GROUPS.length
      ? PIPELINE_GROUPS[cycleState.currentGroupIndex].name
      : 'none';
    const jobs = runningJobs.size > 0
      ? `Running: ${[...runningJobs.keys()].join(', ')}`
      : 'No jobs running';
    const state = cycleState.waitingForCooldown
      ? 'cooldown'
      : `group ${cycleState.currentGroupIndex + 1}/${PIPELINE_GROUPS.length} (${currentGroup})`;
    log(`Heartbeat: Cycle ${cycleState.cycleNumber + 1}, ${state}. ${jobs}`);
  }, { timezone: 'UTC' });
  log('Registered cron job: heartbeat (hourly)');

Replace with:

  // Heartbeat every hour — logs cycle state and writes heartbeat file for Docker healthcheck
  cron.schedule('0 * * * *', () => {
    const currentGroup = cycleState.currentGroupIndex < PIPELINE_GROUPS.length
      ? PIPELINE_GROUPS[cycleState.currentGroupIndex].name
      : 'none';
    const jobs = runningJobs.size > 0
      ? `Running: ${[...runningJobs.keys()].join(', ')}`
      : 'No jobs running';
    const state = cycleState.waitingForCooldown
      ? 'cooldown'
      : `group ${cycleState.currentGroupIndex + 1}/${PIPELINE_GROUPS.length} (${currentGroup})`;
    log(`Heartbeat: Cycle ${cycleState.cycleNumber + 1}, ${state}. ${jobs}`);
    fs.writeFileSync(path.join(LOGS_DIR, 'scheduler.heartbeat'), new Date().toISOString());
  }, { timezone: 'UTC' });
  log('Registered cron job: heartbeat (hourly)');

fs and path are already imported in scheduler.ts. LOGS_DIR is already defined as '/app/logs'.
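fs.writeFileSync throws if the write fails (for example on a full disk). Whether such an exception inside the cron callback would affect the scheduler depends on node-cron's error handling, which this plan has not verified. A defensive variant might look like the sketch below. The writeHeartbeat helper and the injected writeFile parameter are hypothetical; in scheduler.ts the argument would be fs.writeFileSync.

```typescript
// Hypothetical helper: `writeFile` is injected so the sketch needs no imports.
// In scheduler.ts the argument would be fs.writeFileSync.
type WriteFile = (file: string, data: string) => void;

function writeHeartbeat(writeFile: WriteFile, file: string, now: Date): boolean {
  try {
    writeFile(file, now.toISOString());
    return true;
  } catch (err) {
    // A failed write should surface through the healthcheck, not as an uncaught exception.
    console.error(`Heartbeat write failed: ${String(err)}`);
    return false;
  }
}

// Usage with an in-memory writer standing in for the filesystem.
const written = new Map<string, string>();
const ok = writeHeartbeat(
  (file, data) => { written.set(file, data); },
  '/app/logs/scheduler.heartbeat',
  new Date('2026-03-28T14:00:00.000Z'),
);
console.log(ok, written.get('/app/logs/scheduler.heartbeat'));
// true 2026-03-28T14:00:00.000Z
```

Calling such a helper once at scheduler startup, in addition to the hourly tick, would let the healthcheck pass without waiting for the first cron run. That is a possible extension, not part of this plan.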

  • Step 2: Verify the script compiles
cd /home/albert/Documents/ScraperControl
npx tsc --noEmit

Expected: no errors.

  • Step 3: Commit
git add scripts/scheduler.ts
git commit -m "fix: write heartbeat file for Docker healthcheck"

Task 3: Fix scheduler healthcheck in docker-compose.yml

Files:

  • Modify: docker-compose.yml

  • Step 1: Replace the scheduler healthcheck

Find this block in docker-compose.yml (around line 275):

    healthcheck:
      test: ["CMD-SHELL", "pgrep -f scheduler.ts || exit 1"]
      interval: 60s
      timeout: 10s
      retries: 3
      start_period: 30s

Replace with:

    healthcheck:
      test: ["CMD-SHELL", "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . || exit 1"]
      interval: 90s
      timeout: 10s
      retries: 3
      start_period: 90s

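The find ... -mmin -120 check passes if the file exists and was modified within the last 120 minutes (2 hours), so with an hourly heartbeat it tolerates roughly one missed tick. Note that start_period: 90s only covers container boot. It is not long enough to reach the first hourly cron tick, which can be up to 60 minutes away, so after a fresh start Docker reports the scheduler as starting and then unhealthy until the first heartbeat is written, unless a recent heartbeat file from a previous run is still present in /app/logs.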
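For readers less familiar with find, the age test can be approximated in TypeScript as follows. The function name isHeartbeatFresh is hypothetical, and find's exact rounding at the 120-minute boundary may differ, so treat this as an illustration of the rule rather than a drop-in replacement.

```typescript
// Approximate equivalent of `find FILE -mmin -120`: true when the file was
// modified less than maxAgeMinutes ago. Boundary rounding in find may differ.
function isHeartbeatFresh(mtimeMs: number, nowMs: number, maxAgeMinutes = 120): boolean {
  const ageMinutes = (nowMs - mtimeMs) / 60_000;
  return ageMinutes < maxAgeMinutes;
}

const now = Date.UTC(2026, 2, 28, 16, 0, 0); // 2026-03-28T16:00:00Z (month index 2 = March)
console.log(isHeartbeatFresh(Date.UTC(2026, 2, 28, 15, 0, 0), now)); // true: 60 minutes old
console.log(isHeartbeatFresh(Date.UTC(2026, 2, 28, 13, 30, 0), now)); // false: 150 minutes old
```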

  • Step 2: Commit
git add docker-compose.yml
git commit -m "fix: replace pgrep healthcheck with heartbeat file check"

Task 4: Deploy and verify

  • Step 1: Sync dev directory to Docker deployment
cd /home/albert/Documents/ScraperControl
bash scripts/deploy-local.sh

Expected: rsync output showing the three changed files transferred to /opt/docker/scraper-control/.

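  • Step 2: Recreate the two affected containers

A plain docker compose restart does not pick up changes to docker-compose.yml, so recreate the containers to apply the new healthcheck:

docker compose -f /opt/docker/scraper-control/docker-compose.yml up -d --force-recreate freesearch-enrichment scheduler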
  • Step 3: Verify freesearch-enrichment is stable
docker logs scraper-control-freesearch-enrichment-1 --tail 30 -f

Expected: logs showing "Waiting for FreeSearch to be reachable..." with retry messages if FreeSearch is still down, OR "FreeSearch health check: OK" and normal enrichment if FreeSearch is up. Container should NOT exit. Wait 2 minutes to confirm no restart.

  • Step 4: Confirm stale jobs were cleaned up
docker exec scraper-control-db-1 psql -U postgres -d nearestmass \
  -c "SELECT type, status, started_at, completed_at, error FROM background_jobs WHERE type = 'freesearch-enrichment' ORDER BY started_at DESC LIMIT 5;"

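Expected: the two previously stuck running jobs from Mar 22 and Mar 26 now show status = 'failed' with error = 'Container restarted'. The cleanup runs only after waitForFreeSearch() returns, so if FreeSearch is still down these rows stay in running until it becomes reachable.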

  • Step 5: Verify scheduler heartbeat file is written

The heartbeat file is new, so it will not exist until the first hourly cron tick (minute 0 of the next UTC hour, at most 60 minutes away). After that tick, run:

docker exec scraper-control-scheduler-1 cat /app/logs/scheduler.heartbeat

Expected: an ISO timestamp, e.g. 2026-03-28T14:00:00.000Z
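If the file content needs to be checked programmatically, a small parser is enough. The sketch below uses a hypothetical helper, heartbeatAgeMinutes, that returns the age of the recorded timestamp in minutes. The Docker healthcheck itself looks only at the file's modification time, not its content.

```typescript
// Hypothetical check: parse the heartbeat file's content and return its age
// in minutes, or null if the content is not a valid timestamp.
function heartbeatAgeMinutes(content: string, now: Date): number | null {
  const writtenAt = Date.parse(content.trim());
  if (Number.isNaN(writtenAt)) return null;
  return (now.getTime() - writtenAt) / 60_000;
}

const age = heartbeatAgeMinutes('2026-03-28T14:00:00.000Z\n', new Date('2026-03-28T14:25:00.000Z'));
console.log(age); // 25
console.log(heartbeatAgeMinutes('not a timestamp', new Date())); // null
```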

  • Step 6: Verify scheduler becomes healthy
docker ps --format "table {{.Names}}\t{{.Status}}" | grep scheduler

Expected: scraper-control-scheduler-1 Up X hours (healthy) — but only after the first heartbeat fires AND Docker's start_period (90s) passes. If the next cron tick hasn't happened yet, status will remain starting or unhealthy until it does.
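How long the wait can be follows from the '0 * * * *' expression: the next tick is at minute 0 of the next hour. A minimal sketch of that arithmetic, using a hypothetical helper named msUntilNextHourlyTick:

```typescript
// Milliseconds until the next minute-0 boundary, which is when the
// '0 * * * *' cron expression fires next. Hour boundaries are the same
// instant in UTC and in Unix epoch time, so a plain modulo is enough.
function msUntilNextHourlyTick(now: Date): number {
  const hourMs = 3_600_000;
  return hourMs - (now.getTime() % hourMs);
}

const startedAt = new Date('2026-03-28T14:00:30.000Z');
console.log(msUntilNextHourlyTick(startedAt) / 60_000); // 59.5
```

A container started 30 seconds past the hour therefore waits 59.5 minutes for its first heartbeat, far longer than the 90-second start_period.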

To force an immediate test without waiting for the cron:

docker exec scraper-control-scheduler-1 bash -c \
  "date -u +%Y-%m-%dT%H:%M:%S.000Z > /app/logs/scheduler.heartbeat && echo 'written'"
docker exec scraper-control-scheduler-1 \
  find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . && echo "PASS" || echo "FAIL"

Expected: written then PASS.