CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Overview
This repository sets up a FastFlowLM proxy server that runs LLMs on an AMD NPU (Neural Processing Unit). The proxy starts the model on the first request and stops it after an idle timeout to free RAM.
Architecture
- flm-proxy.js: Node.js HTTP proxy (port 8000) that sits in front of FastFlowLM (port 8001). It lazily spawns flm.exe, polls until the model is ready, proxies all requests, and kills the process after 5 minutes of inactivity. Exposes /status and /stop control endpoints.
- FastFlowLM/flm.exe: Pre-built binary that serves an OpenAI-compatible API (/v1/models, /v1/chat/completions, etc.) using NPU-accelerated models. Not source code; do not modify.
- flm-service-install.js / flm-service-uninstall.js: Install/uninstall the proxy as a Windows service via node-windows.
- daemon/: Windows service wrapper files generated by node-windows (exe, logs, config).
- flm-start.bat / flm-stop.bat: Simple batch scripts to run FLM directly (bypassing the proxy).
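The lazy-start and idle-stop behaviour described for flm-proxy.js can be sketched roughly as follows. This is a minimal illustration, not the code in flm-proxy.js: the class name LazyBackend and the injected startFn/stopFn callbacks are hypothetical stand-ins for spawning and killing flm.exe, and readiness polling is left out.

```javascript
// Sketch of a lazy-start / idle-stop lifecycle.
// ASSUMPTION: LazyBackend, startFn and stopFn are hypothetical names;
// startFn/stopFn stand in for spawning and killing flm.exe.
class LazyBackend {
  constructor({ startFn, stopFn, idleTimeoutMs }) {
    this.startFn = startFn;
    this.stopFn = stopFn;
    this.idleTimeoutMs = idleTimeoutMs;
    this.running = false;
    this.lastRequestAt = 0;
  }

  // Call on every proxied request: start on demand, then record activity.
  onRequest(now = Date.now()) {
    if (!this.running) {
      this.startFn();
      this.running = true;
    }
    this.lastRequestAt = now;
  }

  // Call periodically (for example from setInterval).
  // Stops the backend once it has been idle for idleTimeoutMs or longer.
  // Returns true if this call stopped the backend.
  checkIdle(now = Date.now()) {
    if (this.running && now - this.lastRequestAt >= this.idleTimeoutMs) {
      this.stopFn();
      this.running = false;
      return true;
    }
    return false;
  }
}
```

Passing the clock value in as an argument keeps the idle logic testable without waiting five real minutes.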
Commands
# Run the proxy (foreground)
node flm-proxy.js
# Install as Windows service
node flm-service-install.js
# Uninstall Windows service
node flm-service-uninstall.js
# Install dependencies
npm install
# Check service logs
cat ~/daemon/flmvisionproxy.out.log
cat ~/daemon/flmvisionproxy.err.log
Key Configuration (in flm-proxy.js)
- MODEL: currently qwen2.5vl-it:3b (Qwen2.5 Vision-Language 3B)
- PROXY_PORT: 8000 (external-facing)
- FLM_PORT: 8001 (internal FLM server)
- IDLE_TIMEOUT_MS: 5 minutes
- HOST: 0.0.0.0 (listens on all interfaces)
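For orientation, the settings above might look something like this as constants. The values are the ones listed; the declaration style, the upstreamUrl helper, and the use of 127.0.0.1 for the internal FLM server are assumptions and may not match flm-proxy.js.

```javascript
// Values taken from the settings listed above; layout is an assumption.
const MODEL = "qwen2.5vl-it:3b";
const PROXY_PORT = 8000; // external-facing
const FLM_PORT = 8001; // internal FLM server
const IDLE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes expressed in milliseconds
const HOST = "0.0.0.0"; // listen on all interfaces

// HYPOTHETICAL helper: the internal URL a proxied path would be forwarded to.
// ASSUMPTION: FLM runs on the same machine and is reached via 127.0.0.1.
function upstreamUrl(path) {
  return `http://127.0.0.1:${FLM_PORT}${path}`;
}
```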
Available Models
See FastFlowLM/model_list.json for the full catalog. Model identifiers use the format family:size (e.g., qwen3:4b, llama3.2:3b). Vision models have "vlm": true. Thinking models have "think": true.
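A small sketch of working with these identifiers and flags. The function names are hypothetical, and filterModels assumes the catalog has already been flattened into an array of entries shaped like { id, vlm, think }; the actual layout of model_list.json may differ, so check the file before relying on this.

```javascript
// Split a "family:size" identifier into its two parts.
// Throws if either part is missing.
function parseModelId(id) {
  const idx = id.lastIndexOf(":");
  if (idx <= 0 || idx === id.length - 1) {
    throw new Error(`Expected family:size, got "${id}"`);
  }
  return { family: id.slice(0, idx), size: id.slice(idx + 1) };
}

// ASSUMPTION: entries is a flat array of { id, vlm?, think? } objects.
// Pass vlm and/or think as booleans to filter; omit a flag to ignore it.
function filterModels(entries, { vlm, think } = {}) {
  return entries.filter(
    (entry) =>
      (vlm === undefined || Boolean(entry.vlm) === vlm) &&
      (think === undefined || Boolean(entry.think) === think)
  );
}
```

For example, filterModels(entries, { vlm: true }) would keep only the entries flagged as vision models.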
Services
All services are TypeScript/Express apps with the same build pattern:
cd <ServiceDir>
npm install # install deps
npm run build # tsc → dist/
npm start # node dist/server.js
npm run dev # tsx watch (hot-reload)
# Windows service management
node service-install.js
node service-uninstall.js
ImageModerationService (port 8100)
Checks uploaded images for NSFW/explicit content using the local vision LLM. When an image is flagged unsafe, fires callbacks to the upload service (to replace the image) and to Parochia (to flag the user).
- Endpoints: POST /moderate (multipart: file, context, imagePath, userId, siteId), GET /health
- Vision model: gemma3:4b via FLM proxy at localhost:8000
- Callbacks: Configurable in .env (upload service replace URL + Parochia moderation callback)
- Source: src/moderate.ts (moderation logic), src/server.ts (Express app)
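A client call to POST /moderate could look roughly like the sketch below. The multipart field names come from the endpoint description; the helper names, the base URL default, and the assumption that the service answers with JSON are illustrative only. It relies on the global fetch, FormData, and Blob available in Node.js 18 and later.

```javascript
// Build the multipart body for POST /moderate.
// Field names are from the endpoint list; the function itself is hypothetical.
function buildModerateForm({ fileBytes, filename, context, imagePath, userId, siteId }) {
  const form = new FormData();
  form.append("file", new Blob([fileBytes]), filename);
  form.append("context", context);
  form.append("imagePath", imagePath);
  form.append("userId", String(userId));
  form.append("siteId", String(siteId));
  return form;
}

// ASSUMPTION: the service is reachable on localhost:8100 and replies with JSON.
// The response shape is not documented here, so it is returned as-is.
async function moderateImage(fields, baseUrl = "http://localhost:8100") {
  const res = await fetch(`${baseUrl}/moderate`, {
    method: "POST",
    body: buildModerateForm(fields),
  });
  if (!res.ok) {
    throw new Error(`Moderation request failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

fetch sets the multipart Content-Type header and boundary automatically when the body is a FormData instance, so no manual header is needed.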
VisionScannerService (port 8002)
Scans shelf/pantry photos to extract product information and prices using the vision LLM. Uses ChromaDB for embeddings storage and Ollama for embedding generation. Supports image tiling for high-res photos.
- Endpoints: POST /scan/shelf (multipart: image, store_name), POST /scan/pantry (multipart: image), GET /health
- Vision model: qwen2.5vl-it:3b via FLM proxy at localhost:8000
- External deps: Ollama (192.168.0.15:11434, nomic-embed-text), ChromaDB (192.168.0.15:8000), optional Gemini API
- Source: src/vision.ts (LLM calls), src/tiling.ts (image tiling), src/shelf.ts / src/pantry.ts (scan logic), src/embeddings.ts + src/chroma.ts (vector storage), src/matching.ts (product matching), src/parsing.ts (response parsing), src/gemini.ts (Gemini fallback), src/config.ts
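To illustrate what image tiling for high-res photos generally involves, here is one common approach: cover the image with fixed-size, overlapping tiles and shift the last tile in each row and column so it ends flush with the image edge. This is a generic sketch, not the algorithm in src/tiling.ts; computeTiles, tileSize, and overlap are hypothetical names and values.

```javascript
// Compute tile rectangles that cover a width x height image.
// ASSUMPTION: tileSize and overlap defaults are illustrative, not project values.
function computeTiles(width, height, tileSize = 1024, overlap = 128) {
  if (overlap >= tileSize) {
    throw new Error("overlap must be smaller than tileSize");
  }
  const step = tileSize - overlap;

  // Start offsets along one axis. Images no larger than one tile get a
  // single offset; otherwise the last tile is aligned to the far edge.
  const starts = (length) => {
    if (length <= tileSize) return [0];
    const offsets = [];
    for (let pos = 0; pos + tileSize < length; pos += step) {
      offsets.push(pos);
    }
    offsets.push(length - tileSize);
    return offsets;
  };

  const tiles = [];
  for (const y of starts(height)) {
    for (const x of starts(width)) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, width),
        height: Math.min(tileSize, height),
      });
    }
  }
  return tiles;
}
```

The overlap is there so that a product label cut by one tile boundary is likely to appear whole in a neighbouring tile, at the cost of possibly detecting the same item twice.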
Environment
- Windows 11, AMD NPU hardware
- Node.js with node-windows dependency
- FLM binary path: C:\Users\sshuser\FastFlowLM\flm.exe
- All paths are hardcoded to C:\Users\sshuser\