CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This repository sets up a FastFlowLM proxy server that runs LLMs on an AMD NPU (Neural Processing Unit). The proxy starts the model on the first request and stops it after an idle timeout to free RAM.

Architecture

flm-proxy.js — Node.js HTTP proxy (port 8000) that sits in front of FastFlowLM (port 8001). It lazily spawns flm.exe, polls until the model is ready, proxies all requests, and kills the process after 5 minutes of inactivity. Exposes /status and /stop control endpoints.
FastFlowLM/flm.exe — Pre-built binary that serves an OpenAI-compatible API (/v1/models, /v1/chat/completions, etc.) using NPU-accelerated models. This is not source code; do not modify it.
flm-service-install.js / flm-service-uninstall.js — Install/uninstall the proxy as a Windows service via node-windows.
daemon/ — Windows service wrapper files generated by node-windows (exe, logs, config).
flm-start.bat / flm-stop.bat — Simple batch scripts to run FLM directly (bypassing the proxy).
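
The lazy-start and idle-stop behaviour described for flm-proxy.js can be sketched roughly as follows. This is a minimal illustration, not the code in flm-proxy.js: createLifecycle, touch, and stopNow are hypothetical names, and the start and stop callbacks stand in for spawning and killing flm.exe.

```javascript
// Hypothetical sketch: start a backend on first use and stop it after
// idleMs without traffic. `start` and `stop` are placeholders for
// spawning and killing flm.exe; the timer functions are injectable
// so the logic can be exercised without waiting in real time.
function createLifecycle({ start, stop, idleMs, setTimer = setTimeout, clearTimer = clearTimeout }) {
  let running = false;
  let timer = null;

  // Stop the backend immediately (what a /stop endpoint or the idle timer would trigger).
  function stopNow() {
    if (timer !== null) {
      clearTimer(timer);
      timer = null;
    }
    if (running) {
      running = false;
      stop();
    }
  }

  // Call once per incoming request, before proxying it.
  function touch() {
    if (!running) {
      running = true;
      start();
    }
    // Every request pushes the idle deadline back by idleMs.
    if (timer !== null) clearTimer(timer);
    timer = setTimer(stopNow, idleMs);
  }

  return { touch, stopNow, isRunning: () => running };
}
```

In a real proxy, touch() would also need to wait until the model answers before forwarding the request; that polling step is left out here to keep the sketch short.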

Commands

# Run the proxy (foreground)
node flm-proxy.js

# Install as Windows service
node flm-service-install.js

# Uninstall Windows service
node flm-service-uninstall.js

# Install dependencies
npm install

# Check service logs
cat ~/daemon/flmvisionproxy.out.log
cat ~/daemon/flmvisionproxy.err.log

Key Configuration (in flm-proxy.js)

MODEL — currently qwen2.5vl-it:3b (Qwen2.5 Vision-Language 3B)
PROXY_PORT — 8000 (external-facing)
FLM_PORT — 8001 (internal FLM server)
IDLE_TIMEOUT_MS — 5 minutes
HOST — 0.0.0.0 (listens on all interfaces)
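
As an illustration of how a client might use these values, the sketch below builds an OpenAI-style chat completion request against the proxy port. The request and response shapes follow the OpenAI chat completions format; whether FLM accepts images as data URLs in this form is an assumption, and buildChatRequest and ask are hypothetical helper names. It relies on the global fetch available in Node.js 18 and later.

```javascript
// Values copied from the configuration list above.
const PROXY_PORT = 8000;
const MODEL = 'qwen2.5vl-it:3b';

// Build an OpenAI-style chat completion request for the proxy.
// imageBase64 is optional; when present it is attached as a JPEG data URL
// (assumed format, not confirmed for FLM).
function buildChatRequest(prompt, imageBase64) {
  const content = imageBase64
    ? [
        { type: 'text', text: prompt },
        { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${imageBase64}` } },
      ]
    : prompt;
  return {
    url: `http://localhost:${PROXY_PORT}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: MODEL, messages: [{ role: 'user', content }] }),
    },
  };
}

// Send the request. The first call may be slow while the proxy starts the model.
async function ask(prompt, imageBase64) {
  const { url, options } = buildChatRequest(prompt, imageBase64);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Proxy returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Clients should talk to PROXY_PORT rather than FLM_PORT, since only the proxy starts the model on demand and tracks idle time.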

Available Models

See FastFlowLM/model_list.json for the full catalog. Model identifiers use the format family:size (e.g., qwen3:4b, llama3.2:3b). Vision models have "vlm": true. Thinking models have "think": true.
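
To list the vision or thinking models programmatically, one option is to walk the parsed catalog and collect every entry carrying the relevant flag. The sketch below deliberately assumes nothing about the layout of model_list.json beyond the "vlm" and "think" flags named above; collectFlagged is a hypothetical helper, and the sample object is made up for illustration and is not the real catalog.

```javascript
// Walk any parsed JSON value and collect objects whose `flag` property is true.
// Each hit records the key path that led to it, joined with "/".
function collectFlagged(node, flag, path = [], out = []) {
  if (Array.isArray(node)) {
    node.forEach((child, i) => collectFlagged(child, flag, path.concat(i), out));
  } else if (node !== null && typeof node === 'object') {
    if (node[flag] === true) out.push({ path: path.join('/'), entry: node });
    for (const [key, child] of Object.entries(node)) {
      collectFlagged(child, flag, path.concat(key), out);
    }
  }
  return out;
}

// Made-up sample, NOT the real model_list.json layout.
const sample = {
  models: {
    qwen3: { '4b': { think: true } },
    gemma3: { '4b': { vlm: true } },
  },
};

const visionPaths = collectFlagged(sample, 'vlm').map((hit) => hit.path);
// visionPaths is ['models/gemma3/4b']
```

For the real file, the parsed contents of FastFlowLM/model_list.json would be passed in place of sample.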

Services

All services are TypeScript/Express apps with the same build pattern:

cd <ServiceDir>
npm install        # install deps
npm run build      # tsc → dist/
npm start          # node dist/server.js
npm run dev        # tsx watch (hot-reload)

# Windows service management
node service-install.js
node service-uninstall.js

ImageModerationService (port 8100)

Checks uploaded images for NSFW/explicit content using the local vision LLM. When an image is flagged unsafe, fires callbacks to the upload service (to replace the image) and to Parochia (to flag the user).

Endpoints: POST /moderate (multipart: file, context, imagePath, userId, siteId), GET /health
Vision model: gemma3:4b via FLM proxy at localhost:8000
Callbacks: Configurable in .env — upload service replace URL + Parochia moderation callback
Source: src/moderate.ts (moderation logic), src/server.ts (Express app)
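
A caller of POST /moderate might assemble the multipart body along these lines. This is a sketch under assumptions: the field names come from the endpoint description above, but the value types (sent here as strings) and the response format are not documented in this file, so the response is returned as raw text. buildModerationForm and moderate are hypothetical names; FormData, Blob, and fetch are the globals available in Node.js 18 and later.

```javascript
const MODERATION_URL = 'http://localhost:8100/moderate';

// Build the multipart form with the five documented fields.
// `bytes` is the image content as a Uint8Array or Buffer.
function buildModerationForm({ bytes, filename, context, imagePath, userId, siteId }) {
  const form = new FormData();
  form.append('file', new Blob([bytes]), filename);
  form.append('context', context);
  form.append('imagePath', imagePath);
  form.append('userId', String(userId)); // value types are an assumption
  form.append('siteId', String(siteId));
  return form;
}

// Submit an image for moderation and return the raw response body.
async function moderate(image) {
  const res = await fetch(MODERATION_URL, { method: 'POST', body: buildModerationForm(image) });
  if (!res.ok) throw new Error(`Moderation service returned HTTP ${res.status}`);
  return res.text(); // response format is not documented here, so it is left unparsed
}
```

The Content-Type header is intentionally not set by hand: fetch generates the multipart boundary itself when the body is a FormData instance.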

VisionScannerService (port 8002)

Scans shelf/pantry photos to extract product information and prices using the vision LLM. Uses ChromaDB for embeddings storage and Ollama for embedding generation. Supports image tiling for high-res photos.

Endpoints: POST /scan/shelf (multipart: image, store_name), POST /scan/pantry (multipart: image), GET /health
Vision model: qwen2.5vl-it:3b via FLM proxy at localhost:8000
External deps: Ollama (192.168.0.15:11434, nomic-embed-text), ChromaDB (192.168.0.15:8000), optional Gemini API
Source: src/vision.ts (LLM calls), src/tiling.ts (image tiling), src/shelf.ts / src/pantry.ts (scan logic), src/embeddings.ts + src/chroma.ts (vector storage), src/matching.ts (product matching), src/parsing.ts (response parsing), src/gemini.ts (Gemini fallback), src/config.ts
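
Image tiling for high-resolution photos is commonly done by cutting the image into fixed-size tiles that overlap slightly, so that a product sitting on a tile boundary is less likely to be cut in half. The sketch below shows one such layout calculation. It is not taken from src/tiling.ts; computeTiles and its parameters (tileSize, overlap) are hypothetical.

```javascript
// Split a width x height image into tiles of at most tileSize pixels per side.
// Neighbouring tiles share at least `overlap` pixels. Returns crop rectangles only;
// the actual cropping would be done by an image library.
function computeTiles(width, height, tileSize, overlap) {
  if (overlap < 0 || overlap >= tileSize) {
    throw new RangeError('overlap must be in [0, tileSize)');
  }
  // Start offsets along one axis.
  const starts = (length) => {
    if (length <= tileSize) return [0]; // the whole axis fits in one tile
    const step = tileSize - overlap;
    const out = [];
    for (let pos = 0; pos + tileSize < length; pos += step) out.push(pos);
    out.push(length - tileSize); // last tile sits flush with the far edge
    return out;
  };
  const tiles = [];
  for (const y of starts(height)) {
    for (const x of starts(width)) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, width),
        height: Math.min(tileSize, height),
      });
    }
  }
  return tiles;
}
```

Each tile would then be sent to the vision model separately, and the per-tile results merged, with duplicates from overlapping regions removed.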

Environment

Windows 11, AMD NPU hardware
Node.js with node-windows dependency
FLM binary path: C:\Users\sshuser\FastFlowLM\flm.exe
All paths are hardcoded to C:\Users\sshuser\
