# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is a FastFlowLM proxy server setup that runs LLM models on an AMD NPU (Neural Processing Unit). The proxy auto-starts the model on first request and stops it after idle timeout to free RAM.

## Architecture

- **`flm-proxy.js`**: Node.js HTTP proxy (port 8000) that sits in front of FastFlowLM (port 8001). It lazily spawns `flm.exe`, polls until the model is ready, proxies all requests, and kills the process after 5 minutes of inactivity. Exposes `/status` and `/stop` control endpoints.
- **`FastFlowLM/flm.exe`**: Pre-built binary that serves OpenAI-compatible API (`/v1/models`, `/v1/chat/completions`, etc.) using NPU-accelerated models. Not source code; do not modify.
- **`flm-service-install.js` / `flm-service-uninstall.js`**: Install/uninstall the proxy as a Windows service via `node-windows`.
- **`daemon/`**: Windows service wrapper files generated by `node-windows` (exe, logs, config).
- **`flm-start.bat` / `flm-stop.bat`**: Simple batch scripts to run FLM directly (bypassing the proxy).

## Commands

```bash
# Run the proxy (foreground)
node flm-proxy.js

# Install as Windows service
node flm-service-install.js

# Uninstall Windows service
node flm-service-uninstall.js

# Install dependencies
npm install

# Check service logs
cat ~/daemon/flmvisionproxy.out.log
cat ~/daemon/flmvisionproxy.err.log
```

## Key Configuration (in flm-proxy.js)

- `MODEL`: currently `qwen2.5vl-it:3b` (Qwen2.5 Vision-Language 3B)
- `PROXY_PORT`: 8000 (external-facing)
- `FLM_PORT`: 8001 (internal FLM server)
- `IDLE_TIMEOUT_MS`: 5 minutes
- `HOST`: `0.0.0.0` (listens on all interfaces)

## Available Models

See `FastFlowLM/model_list.json` for the full catalog. Model identifiers use the format `family:size` (e.g., `qwen3:4b`, `llama3.2:3b`). Vision models have `"vlm": true`. Thinking models have `"think": true`.
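
To make the `family:size` identifier format and the `vlm` / `think` flags concrete, here is a minimal sketch. The helper names (`parseModelId`, `visionModels`) and the flat array shape of `sampleCatalog` are assumptions for illustration only; the actual layout of `FastFlowLM/model_list.json` is not shown here and may differ, and the flags on the sample entries are illustrative rather than copied from the catalog.

```javascript
// Split a model identifier of the form `family:size` into its two parts.
// Hypothetical helper; not part of flm-proxy.js.
function parseModelId(id) {
  const idx = id.lastIndexOf(':');
  if (idx <= 0 || idx === id.length - 1) {
    throw new Error(`Expected "family:size", got "${id}"`);
  }
  return { family: id.slice(0, idx), size: id.slice(idx + 1) };
}

// ASSUMPTION: a flat array of entries with an `id` field plus optional flags.
// The real model_list.json may be structured differently.
const sampleCatalog = [
  { id: 'qwen2.5vl-it:3b', vlm: true },
  { id: 'qwen3:4b', think: true },
  { id: 'llama3.2:3b' },
];

// Return the identifiers of entries flagged as vision models.
function visionModels(catalog) {
  return catalog.filter((m) => m.vlm === true).map((m) => m.id);
}

console.log(parseModelId('qwen2.5vl-it:3b'));
console.log(visionModels(sampleCatalog));
```

Splitting on the last colon keeps family names that contain dots or hyphens (such as `qwen2.5vl-it`) intact.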
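
The lazy-start and idle-stop behaviour described under Architecture can be sketched roughly as follows. This is not the code in `flm-proxy.js`; `LazyBackend`, `ensureRunning`, and `shutdown` are hypothetical names, and the `start` / `stop` callbacks stand in for spawning `flm.exe`, polling until it is ready, and killing the process.

```javascript
// Minimal sketch of a lazily started backend with an idle timeout.
// ASSUMPTION: `start` resolves once the backend is ready; `stop` tears it down.
class LazyBackend {
  constructor({ start, stop, idleMs }) {
    this.start = start;
    this.stop = stop;
    this.idleMs = idleMs;
    this.ready = null; // promise while starting or running, null when stopped
    this.timer = null;
  }

  // Call once per incoming request, before proxying it.
  async ensureRunning() {
    if (!this.ready) {
      // Concurrent callers share the same startup promise.
      this.ready = Promise.resolve().then(() => this.start());
    }
    try {
      await this.ready;
    } catch (err) {
      this.ready = null; // allow a retry on the next request
      throw err;
    }
    this.resetIdleTimer();
  }

  // Every request pushes the shutdown further into the future.
  resetIdleTimer() {
    clearTimeout(this.timer);
    this.timer = setTimeout(() => this.shutdown(), this.idleMs);
  }

  // Also what a manual stop (such as a `/stop` endpoint) could call.
  shutdown() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.ready) {
      this.ready = null;
      this.stop();
    }
  }
}
```

With `idleMs` set to `5 * 60 * 1000`, this matches the 5-minute `IDLE_TIMEOUT_MS` listed under Key Configuration. Sharing one startup promise is a design choice in this sketch so that a burst of first requests starts the backend only once.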
## Services

All services are TypeScript/Express apps with the same build pattern:

```bash
cd <service-dir>
npm install          # install deps
npm run build        # tsc → dist/
npm start            # node dist/server.js
npm run dev          # tsx watch (hot-reload)

# Windows service management
node service-install.js
node service-uninstall.js
```

### ImageModerationService (port 8100)

Checks uploaded images for NSFW/explicit content using the local vision LLM. When an image is flagged unsafe, fires callbacks to the upload service (to replace the image) and to Parochia (to flag the user).

- **Endpoints:** `POST /moderate` (multipart: `file`, `context`, `imagePath`, `userId`, `siteId`), `GET /health`
- **Vision model:** `gemma3:4b` via FLM proxy at `localhost:8000`
- **Callbacks:** Configurable in `.env`: upload service replace URL and Parochia moderation callback
- **Source:** `src/moderate.ts` (moderation logic), `src/server.ts` (Express app)

### VisionScannerService (port 8002)

Scans shelf/pantry photos to extract product information and prices using the vision LLM. Uses ChromaDB for embeddings storage and Ollama for embedding generation. Supports image tiling for high-res photos.

- **Endpoints:** `POST /scan/shelf` (multipart: `image`, `store_name`), `POST /scan/pantry` (multipart: `image`), `GET /health`
- **Vision model:** `qwen2.5vl-it:3b` via FLM proxy at `localhost:8000`
- **External deps:** Ollama (`192.168.0.15:11434`, `nomic-embed-text`), ChromaDB (`192.168.0.15:8000`), optional Gemini API
- **Source:** `src/vision.ts` (LLM calls), `src/tiling.ts` (image tiling), `src/shelf.ts` / `src/pantry.ts` (scan logic), `src/embeddings.ts` + `src/chroma.ts` (vector storage), `src/matching.ts` (product matching), `src/parsing.ts` (response parsing), `src/gemini.ts` (Gemini fallback), `src/config.ts`

## Environment

- Windows 11, AMD NPU hardware
- Node.js with `node-windows` dependency
- FLM binary path: `C:\Users\sshuser\FastFlowLM\flm.exe`
- All paths are hardcoded to `C:\Users\sshuser\`
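
As a quick way to check that this environment is up, the sketch below probes the proxy's `/status` endpoint and each service's `GET /health` endpoint. The ports come from this file; the `localhost` host for the two services, the `ENDPOINTS` / `probe` / `probeAll` names, and the 3-second timeout are assumptions. Response bodies are not documented here, so only the HTTP status is reported. It relies on the global `fetch` and `AbortSignal.timeout` available in Node.js 18 and later.

```javascript
// Ports are taken from this file; `localhost` for the two services is an assumption.
const ENDPOINTS = [
  { name: 'flm-proxy', url: 'http://localhost:8000/status' },
  { name: 'ImageModerationService', url: 'http://localhost:8100/health' },
  { name: 'VisionScannerService', url: 'http://localhost:8002/health' },
];

// Probe one endpoint and report only whether it answered and with which status.
async function probe({ name, url }, timeoutMs = 3000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { name, url, ok: res.ok, status: res.status };
  } catch (err) {
    // Connection refused, timeout, DNS failure, and so on.
    return { name, url, ok: false, error: String(err) };
  }
}

// Probe all endpoints in parallel.
async function probeAll(endpoints = ENDPOINTS) {
  return Promise.all(endpoints.map((e) => probe(e)));
}

// Usage: probeAll().then((results) => console.table(results));
```

Whether a `/status` request counts as activity for the proxy's idle timer is not stated in this file, so treat that as unknown when polling frequently.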