FreeSearch Stability & Scheduler Healthcheck Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Make the freesearch-enrichment container stay alive when FreeSearch is down, clean up stale running jobs on restart, and fix the scheduler's perpetually failing Docker healthcheck.
Architecture: Targeted edits to two scripts and docker-compose.yml. enrich-with-freesearch.ts gets a waitForFreeSearch() startup loop and a stale-job cleanup before job creation. scheduler.ts writes a heartbeat file on each hourly cron tick. docker-compose.yml swaps the pgrep healthcheck for a file-age check on that heartbeat file.
Tech Stack: TypeScript/tsx, Prisma, Docker Compose, node-cron, POSIX sh (healthcheck command; CMD-SHELL runs it with /bin/sh -c)
Files
- Modify: scripts/enrich-with-freesearch.ts:872-880 (add the waitForFreeSearch() function)
- Modify: scripts/enrich-with-freesearch.ts:1272-1296 (replace the startup exit with a wait call plus stale-job cleanup)
- Modify: scripts/scheduler.ts:747-758 (write the heartbeat file in the hourly cron)
- Modify: docker-compose.yml:275-280 (replace the scheduler healthcheck)
Task 1: Add waitForFreeSearch() to the enrichment script
Files:
- Modify: scripts/enrich-with-freesearch.ts
The existing healthCheck() function (line 872) returns a boolean. We add waitForFreeSearch() directly below it: a loop that calls healthCheck() and sleeps with exponential backoff (starting at 30 seconds, doubling up to a 5-minute cap) until FreeSearch responds or shutdown is requested.
- Step 1: Add waitForFreeSearch() after healthCheck()
In scripts/enrich-with-freesearch.ts, find this block (around line 872):
async function healthCheck(): Promise<boolean> {
  try {
    const resp = await axios.get(`${FREESEARCH_URL}/api/health`, { timeout: 5000 });
    return resp.status === 200;
  } catch {
    return false;
  }
}
Add the following function immediately after it:
async function waitForFreeSearch(): Promise<void> {
  // Poll FreeSearch until it answers; give up early only if a shutdown was requested.
  let backoffMs = 30_000;
  const maxBackoffMs = 300_000; // 5 minutes
  let attempt = 0;
  while (!shuttingDown) {
    attempt++;
    const healthy = await healthCheck();
    if (healthy) {
      if (attempt > 1) log('FreeSearch is back. Continuing...');
      return;
    }
    const waitSec = Math.round(backoffMs / 1000);
    logError(`FreeSearch not reachable at ${FREESEARCH_URL} (attempt ${attempt}). Retrying in ${waitSec}s...`);
    await sleep(backoffMs);
    // Exponential backoff: double the delay, capped at maxBackoffMs.
    backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
  }
}
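For a sense of how quickly the retries back off, the loop's delay logic can be pulled out into a pure function. backoffDelays() is a hypothetical helper for illustration only; the plan does not add it to the script, and its defaults simply copy the 30-second initial delay and 5-minute cap above.

```typescript
// Hypothetical helper: reproduces the delay sequence used by waitForFreeSearch().
// Start at initialMs, double after each failed attempt, never exceed maxMs.
function backoffDelays(failedAttempts: number, initialMs = 30_000, maxMs = 300_000): number[] {
  const delays: number[] = [];
  let backoffMs = initialMs;
  for (let i = 0; i < failedAttempts; i++) {
    delays.push(backoffMs);
    backoffMs = Math.min(backoffMs * 2, maxMs);
  }
  return delays;
}

console.log(backoffDelays(6).map((ms) => `${ms / 1000}s`).join(', '));
// prints: 30s, 60s, 120s, 240s, 300s, 300s
```

The first four retries wait 30, 60, 120 and 240 seconds (7.5 minutes in total); every retry after that waits the 5-minute cap.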
- Step 2: Replace the startup health check block in main()
Find this block in main() (around line 1272):
// Health check
log('Checking FreeSearch health...');
const healthy = await healthCheck();
if (!healthy) {
  logError(`FreeSearch not reachable at ${FREESEARCH_URL}`);
  logError('Make sure FreeSearch is running and accessible.');
  process.exit(1);
}
log('FreeSearch health check: OK');
Replace with:
// Wait for FreeSearch to be reachable (indefinite retry with backoff)
log('Waiting for FreeSearch to be reachable...');
await waitForFreeSearch();
if (shuttingDown) return;
log('FreeSearch health check: OK');
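Both waitForFreeSearch() and the `if (shuttingDown) return;` guard assume the script already sets a shuttingDown flag from its signal handlers. The loop only checks that flag between sleeps, so once the backoff reaches its 5-minute cap, a docker stop (10-second default grace period) would escalate to SIGKILL before the loop notices. A minimal sketch of a sleep that wakes early on shutdown, using the standard AbortController; interruptibleSleep(), shutdownController and requestShutdown() are hypothetical names, not part of the existing script.

```typescript
// Resolves after `ms` milliseconds, or immediately once `signal` is aborted,
// so a long backoff never delays shutdown.
function interruptibleSleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal.aborted) {
      resolve();
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer); // stop the pending timer so nothing keeps the process alive
      resolve();
    };
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

// Hypothetical wiring: the signal handler flips the flag and wakes any sleeper.
const shutdownController = new AbortController();
let shuttingDown = false;

function requestShutdown(): void {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  shutdownController.abort();
}

process.on('SIGTERM', requestShutdown);
process.on('SIGINT', requestShutdown);

// Inside waitForFreeSearch(), the plain sleep would then become:
//   await interruptibleSleep(backoffMs, shutdownController.signal);
```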
- Step 3: Add stale job cleanup before job creation
Find this block in main() (around line 1291):
// Job tracking
let jobId = await createOrResumeJob(args);
if (!jobId) {
  jobId = await createNewJob({ countryCode, limit, continuous, dryRun, reSearch });
}
log(`Job ID: ${jobId}`);
Replace with:
// Job tracking: clean up any running jobs left by a previous container restart
await prisma.backgroundJob.updateMany({
  where: { type: 'freesearch-enrichment', status: 'running' },
  data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
});
let jobId = await createOrResumeJob(args);
if (!jobId) {
  jobId = await createNewJob({ countryCode, limit, continuous, dryRun, reSearch });
}
log(`Job ID: ${jobId}`);
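Prisma's updateMany() resolves to an object with a count field, so the cleanup can report how many jobs it reset, which makes Step 4 of Task 4 easier to confirm from the container logs. The sketch below is self-contained: JobRow, BackgroundJobDelegate, failStaleJobs() and inMemoryJobs() are hypothetical names, with the in-memory store standing in for the database. In the script itself, the equivalent change is to destructure count from the updateMany() call in Step 3 and log it. Note that the filter matches every running freesearch-enrichment job, so it assumes only one enrichment process runs at a time; a manual run started alongside the container would mark the container's live job as failed.

```typescript
// Hypothetical row shape and a narrow interface for the part of prisma.backgroundJob used here.
interface JobRow { type: string; status: string; error?: string; completedAt?: Date }

interface BackgroundJobDelegate {
  updateMany(args: {
    where: { type: string; status: string };
    data: { status: string; error: string; completedAt: Date };
  }): Promise<{ count: number }>;
}

// Marks every running freesearch-enrichment job as failed and reports how many were touched.
async function failStaleJobs(jobs: BackgroundJobDelegate, log: (msg: string) => void): Promise<number> {
  const { count } = await jobs.updateMany({
    where: { type: 'freesearch-enrichment', status: 'running' },
    data: { status: 'failed', error: 'Container restarted', completedAt: new Date() },
  });
  if (count > 0) log(`Marked ${count} stale running job(s) as failed`);
  return count;
}

// In-memory stand-in so the sketch runs without a database.
function inMemoryJobs(rows: JobRow[]): BackgroundJobDelegate {
  return {
    async updateMany({ where, data }) {
      const matches = rows.filter((r) => r.type === where.type && r.status === where.status);
      for (const r of matches) Object.assign(r, data);
      return { count: matches.length };
    },
  };
}

const demoRows: JobRow[] = [
  { type: 'freesearch-enrichment', status: 'running' },
  { type: 'freesearch-enrichment', status: 'running' },
  { type: 'freesearch-enrichment', status: 'completed' },
  { type: 'other-job', status: 'running' },
];
void failStaleJobs(inMemoryJobs(demoRows), console.log); // logs: Marked 2 stale running job(s) as failed
```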
- Step 4: Verify the script compiles
cd /home/albert/Documents/ScraperControl
npx tsc --noEmit
Expected: no errors (or only pre-existing errors unrelated to this change).
- Step 5: Commit
git add scripts/enrich-with-freesearch.ts
git commit -m "fix: wait for FreeSearch on startup instead of exiting; clean stale jobs"
Task 2: Write heartbeat file in scheduler
Files:
- Modify: scripts/scheduler.ts
The scheduler already has an hourly cron that logs a heartbeat message (lines 747-758). We add a single fs.writeFileSync call inside it to write the timestamp to /app/logs/scheduler.heartbeat. The logs/ directory is already created by ensureLogsDir() at startup.
- Step 1: Add heartbeat file write inside the hourly cron
Find this block in scripts/scheduler.ts (around line 747):
// Heartbeat every hour — logs cycle state
cron.schedule('0 * * * *', () => {
  const currentGroup = cycleState.currentGroupIndex < PIPELINE_GROUPS.length
    ? PIPELINE_GROUPS[cycleState.currentGroupIndex].name
    : 'none';
  const jobs = runningJobs.size > 0
    ? `Running: ${[...runningJobs.keys()].join(', ')}`
    : 'No jobs running';
  const state = cycleState.waitingForCooldown
    ? 'cooldown'
    : `group ${cycleState.currentGroupIndex + 1}/${PIPELINE_GROUPS.length} (${currentGroup})`;
  log(`Heartbeat: Cycle ${cycleState.cycleNumber + 1}, ${state}. ${jobs}`);
}, { timezone: 'UTC' });
log('Registered cron job: heartbeat (hourly)');
Replace with:
// Heartbeat every hour: logs cycle state and writes heartbeat file for Docker healthcheck
cron.schedule('0 * * * *', () => {
  const currentGroup = cycleState.currentGroupIndex < PIPELINE_GROUPS.length
    ? PIPELINE_GROUPS[cycleState.currentGroupIndex].name
    : 'none';
  const jobs = runningJobs.size > 0
    ? `Running: ${[...runningJobs.keys()].join(', ')}`
    : 'No jobs running';
  const state = cycleState.waitingForCooldown
    ? 'cooldown'
    : `group ${cycleState.currentGroupIndex + 1}/${PIPELINE_GROUPS.length} (${currentGroup})`;
  log(`Heartbeat: Cycle ${cycleState.cycleNumber + 1}, ${state}. ${jobs}`);
  fs.writeFileSync(path.join(LOGS_DIR, 'scheduler.heartbeat'), new Date().toISOString());
}, { timezone: 'UTC' });
log('Registered cron job: heartbeat (hourly)');
fs and path are already imported in scheduler.ts. LOGS_DIR is already defined as '/app/logs'.
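As written, the hourly cron is the only place the heartbeat file is written, so right after a container (re)start the file can be missing for up to an hour. A minimal sketch of a helper that could be called once at startup and again from the cron callback; writeHeartbeat() is a hypothetical name, and console.error stands in for the scheduler's own log helpers. It also catches write errors, on the assumption that a failed write (for example on a full disk) should be logged rather than thrown out of the cron callback.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Writes the current UTC time to <logsDir>/scheduler.heartbeat.
// Returns false (and logs) instead of throwing if the write fails.
function writeHeartbeat(logsDir: string, now: Date = new Date()): boolean {
  try {
    fs.writeFileSync(path.join(logsDir, 'scheduler.heartbeat'), now.toISOString());
    return true;
  } catch (err) {
    console.error(`Failed to write heartbeat file: ${(err as Error).message}`);
    return false;
  }
}

// Intended wiring (LOGS_DIR, ensureLogsDir() and cron come from scheduler.ts):
//   writeHeartbeat(LOGS_DIR); // once at startup, after ensureLogsDir()
//   cron.schedule('0 * * * *', () => { /* ...existing logging... */ writeHeartbeat(LOGS_DIR); },
//     { timezone: 'UTC' });
```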
- Step 2: Verify the script compiles
cd /home/albert/Documents/ScraperControl
npx tsc --noEmit
Expected: no errors (or only pre-existing errors unrelated to this change).
- Step 3: Commit
git add scripts/scheduler.ts
git commit -m "fix: write heartbeat file for Docker healthcheck"
Task 3: Fix scheduler healthcheck in docker-compose.yml
Files:
- Modify: docker-compose.yml
- Step 1: Replace the scheduler healthcheck
Find this block in docker-compose.yml (around line 275):
healthcheck:
  test: ["CMD-SHELL", "pgrep -f scheduler.ts || exit 1"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 30s
Replace with:
healthcheck:
  test: ["CMD-SHELL", "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . || exit 1"]
  interval: 90s
  timeout: 10s
  retries: 3
  start_period: 90s
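The find ... -mmin -120 check passes only if the file exists and was modified within the last 120 minutes (2 hours), which tolerates one missed hourly tick. start_period: 90s only keeps failures during the first 90 seconds from counting; it does not cover the wait for the first hourly tick, which can come up to 60 minutes after the container starts. Until the file exists, checks fail, and after three consecutive failures Docker reports the container as unhealthy. Writing the heartbeat once at startup, in addition to the hourly cron, closes that gap.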
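The healthcheck's shell one-liner is easy to misread, so here is roughly the same freshness rule written out in TypeScript, for example for a unit test or a local diagnostic script. isHeartbeatFresh() is a hypothetical helper, not part of the plan; it treats a missing file as unhealthy (find prints nothing, so grep -q fails) and compares the file's modification time against a 120-minute window. Edge handling of fractional minutes may differ slightly from find's own -mmin rounding.

```typescript
import * as fs from 'node:fs';

// Approximates `find <file> -mmin -<maxAgeMinutes>`: true only if the file exists
// and was modified less than maxAgeMinutes ago.
function isHeartbeatFresh(file: string, maxAgeMinutes = 120, now: number = Date.now()): boolean {
  let mtimeMs: number;
  try {
    mtimeMs = fs.statSync(file).mtimeMs;
  } catch {
    return false; // missing or unreadable file counts as unhealthy
  }
  return now - mtimeMs < maxAgeMinutes * 60_000;
}
```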
- Step 2: Commit
git add docker-compose.yml
git commit -m "fix: replace pgrep healthcheck with heartbeat file check"
Task 4: Deploy and verify
- Step 1: Sync dev directory to Docker deployment
cd /home/albert/Documents/ScraperControl
bash scripts/deploy-local.sh
Expected: rsync output showing the three changed files transferred to /opt/docker/scraper-control/.
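- Step 2: Recreate the two affected containers
docker compose restart keeps the existing container configuration, so it would not apply the new healthcheck from docker-compose.yml. Recreate the containers instead:
docker compose -f /opt/docker/scraper-control/docker-compose.yml up -d --force-recreate freesearch-enrichment scheduler
If the scripts are baked into the image built from this compose file rather than mounted into the container, add --build so the new code is picked up.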
- Step 3: Verify freesearch-enrichment is stable
docker logs scraper-control-freesearch-enrichment-1 --tail 30 -f
Expected: logs showing "Waiting for FreeSearch to be reachable..." with retry messages if FreeSearch is still down, OR "FreeSearch health check: OK" and normal enrichment if FreeSearch is up. Container should NOT exit. Wait 2 minutes to confirm no restart.
- Step 4: Confirm stale jobs were cleaned up
docker exec scraper-control-db-1 psql -U postgres -d nearestmass \
-c "SELECT type, status, started_at, completed_at, error FROM background_jobs WHERE type = 'freesearch-enrichment' ORDER BY started_at DESC LIMIT 5;"
Expected: the two previously stuck running jobs from Mar 22 and Mar 26 now show status = 'failed' with error = 'Container restarted'.
- Step 5: Verify scheduler heartbeat file is written
The file does not exist yet, because this change introduces it. Wait for the next hourly cron tick (at most 60 minutes), then check:
docker exec scraper-control-scheduler-1 cat /app/logs/scheduler.heartbeat
Expected: an ISO timestamp, e.g. 2026-03-28T14:00:00.000Z
- Step 6: Verify scheduler becomes healthy
docker ps --format "table {{.Names}}\t{{.Status}}" | grep scheduler
Expected: scraper-control-scheduler-1 Up X hours (healthy), but only after the first heartbeat has been written and Docker's 90-second start_period has passed. If the next cron tick has not happened yet, the status will show starting or unhealthy until it does.
To force an immediate test without waiting for the cron:
docker exec scraper-control-scheduler-1 sh -c \
  "date -u +%Y-%m-%dT%H:%M:%S.000Z > /app/logs/scheduler.heartbeat && echo 'written'"
docker exec scraper-control-scheduler-1 sh -c \
  "find /app/logs/scheduler.heartbeat -mmin -120 2>/dev/null | grep -q . && echo PASS || echo FAIL"
Expected: written then PASS.