CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This repository sets up a proxy server in front of FastFlowLM (FLM), which runs LLMs on an AMD NPU (Neural Processing Unit). The proxy starts the model automatically on the first request and stops it after an idle timeout to free RAM.

Architecture

  • flm-proxy.js — Node.js HTTP proxy (port 8000) that sits in front of FastFlowLM (port 8001). It lazily spawns flm.exe on the first request, polls until the model is ready, proxies all requests, and kills the process after 5 minutes of inactivity. It also exposes /status and /stop control endpoints.
  • FastFlowLM/flm.exe — Pre-built binary that serves an OpenAI-compatible API (/v1/models, /v1/chat/completions, etc.) using NPU-accelerated models. It is not source code; do not modify it.
  • flm-service-install.js / flm-service-uninstall.js — Install/uninstall the proxy as a Windows service via node-windows.
  • daemon/ — Windows service wrapper files generated by node-windows (exe, logs, config).
  • flm-start.bat / flm-stop.bat — Simple batch scripts to run FLM directly (bypassing the proxy).
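The lazy-start and idle-stop behavior described for flm-proxy.js boils down to a small state machine. The sketch below is illustrative only and is not taken from flm-proxy.js: the class IdleLifecycle and its methods are hypothetical, the start and stop callbacks stand in for spawning and killing flm.exe, and time is passed in explicitly so the logic can be exercised without real timers or a real process.

```typescript
// Hypothetical sketch of lazy start / idle stop; not the actual flm-proxy.js code.
type State = "stopped" | "running";

class IdleLifecycle {
  private state: State = "stopped";
  private lastActivity = 0;
  private readonly idleTimeoutMs: number;
  private readonly start: () => void;
  private readonly stop: () => void;

  constructor(idleTimeoutMs: number, start: () => void, stop: () => void) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.start = start; // assumed: spawns flm.exe
    this.stop = stop;   // assumed: kills the flm.exe process
  }

  // Call on every incoming request: start the backend if it is stopped,
  // then record the time of this activity.
  onRequest(now: number): void {
    if (this.state === "stopped") {
      this.start();
      this.state = "running";
    }
    this.lastActivity = now;
  }

  // Call periodically: stop the backend once it has been idle for at
  // least idleTimeoutMs milliseconds.
  checkIdle(now: number): void {
    if (this.state === "running" && now - this.lastActivity >= this.idleTimeoutMs) {
      this.stop();
      this.state = "stopped";
    }
  }

  isRunning(): boolean {
    return this.state === "running";
  }
}

// Possible wiring to real timers (spawnFlm and killFlm are hypothetical):
// const lc = new IdleLifecycle(5 * 60 * 1000, spawnFlm, killFlm);
// setInterval(() => lc.checkIdle(Date.now()), 10_000);
```

A real proxy would also have to hold incoming requests until FLM reports ready, and avoid stopping FLM while a long or streaming response is still in flight; this sketch ignores both.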

Commands

# Run the proxy (foreground)
node flm-proxy.js

# Install as Windows service
node flm-service-install.js

# Uninstall Windows service
node flm-service-uninstall.js

# Install dependencies
npm install

# Check service logs
cat ~/daemon/flmvisionproxy.out.log
cat ~/daemon/flmvisionproxy.err.log

Key Configuration (in flm-proxy.js)

  • MODEL — currently qwen2.5vl-it:3b (Qwen2.5 Vision-Language 3B)
  • PROXY_PORT — 8000 (external-facing)
  • FLM_PORT — 8001 (internal FLM server)
  • IDLE_TIMEOUT_MS — 5 minutes
  • HOST — 0.0.0.0 (listens on all interfaces)
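For reference, the settings above can be written as a typed object. This is a hypothetical TypeScript rendering: the ProxyConfig interface and the flmUrl helper are illustrative names, and reaching FLM via 127.0.0.1 is an assumption rather than something the proxy is documented to do.

```typescript
// Illustrative typed view of the settings in flm-proxy.js.
// ProxyConfig and flmUrl are hypothetical; the values mirror the documented defaults.
interface ProxyConfig {
  model: string;         // FLM model identifier, family:size
  proxyPort: number;     // external-facing proxy port
  flmPort: number;       // internal FastFlowLM port
  idleTimeoutMs: number; // stop FLM after this much inactivity
  host: string;          // proxy bind address
}

const config: ProxyConfig = {
  model: "qwen2.5vl-it:3b",
  proxyPort: 8000,
  flmPort: 8001,
  idleTimeoutMs: 5 * 60 * 1000, // 5 minutes
  host: "0.0.0.0",              // all interfaces
};

// Upstream URL for a proxied request. Assumes FLM is reached over loopback,
// since only the proxy needs to talk to it.
function flmUrl(cfg: ProxyConfig, pathAndQuery: string): string {
  return `http://127.0.0.1:${cfg.flmPort}${pathAndQuery}`;
}
```

Keeping FLM on its own internal port is what lets the proxy keep listening on PROXY_PORT while flm.exe is stopped.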

Available Models

See FastFlowLM/model_list.json for the full catalog. Model identifiers use the format family:size (e.g., qwen3:4b, llama3.2:3b). Vision models have "vlm": true. Thinking models have "think": true.
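A minimal sketch of working with these identifiers and flags, assuming the catalog has already been flattened into a list of entries with an id and optional vlm/think booleans. That ModelEntry shape and the helper names are assumptions; the real layout of FastFlowLM/model_list.json may differ and need an adapter.

```typescript
// Hypothetical entry shape; the real model_list.json layout may differ.
interface ModelEntry {
  id: string;      // "family:size", e.g. "qwen3:4b"
  vlm?: boolean;   // true for vision-language models
  think?: boolean; // true for thinking models
}

// Split "family:size" at the first colon, rejecting malformed ids.
function parseModelId(id: string): { family: string; size: string } {
  const i = id.indexOf(":");
  if (i <= 0 || i === id.length - 1) {
    throw new Error(`expected family:size, got "${id}"`);
  }
  return { family: id.slice(0, i), size: id.slice(i + 1) };
}

// Ids of entries flagged as vision-language models.
function visionModels(entries: ModelEntry[]): string[] {
  return entries.filter((e) => e.vlm === true).map((e) => e.id);
}

// Ids of entries flagged as thinking models.
function thinkingModels(entries: ModelEntry[]): string[] {
  return entries.filter((e) => e.think === true).map((e) => e.id);
}
```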

Services

All services are TypeScript/Express apps with the same build pattern:

cd <ServiceDir>
npm install        # install deps
npm run build      # tsc → dist/
npm start          # node dist/server.js
npm run dev        # tsx watch (hot-reload)

# Windows service management
node service-install.js
node service-uninstall.js

ImageModerationService (port 8100)

Checks uploaded images for NSFW/explicit content using the local vision LLM. When an image is flagged as unsafe, the service fires callbacks to the upload service (to replace the image) and to Parochia (to flag the user).

  • Endpoints: POST /moderate (multipart: file, context, imagePath, userId, siteId), GET /health
  • Vision model: gemma3:4b via FLM proxy at localhost:8000
  • Callbacks: configured in .env (the upload service's replace URL and the Parochia moderation callback URL)
  • Source: src/moderate.ts (moderation logic), src/server.ts (Express app)
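The moderation call can be sketched as building an OpenAI-style chat request for the proxy and parsing a yes/no verdict from the reply. This is a hedged sketch, not src/moderate.ts: it assumes FLM accepts image_url content parts with base64 data URLs, and the prompt wording, the JSON verdict format, and the function names are all hypothetical.

```typescript
// Hypothetical sketch of the vision-model call behind POST /moderate.
// Assumes FLM accepts OpenAI-style image_url parts carrying base64 data URLs.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  temperature: number;
  messages: { role: "user"; content: ContentPart[] }[];
}

type Verdict = "safe" | "unsafe" | "unknown";

const MODERATION_PROMPT =
  "Does this image contain nudity or sexually explicit content? " +
  'Reply with JSON only: {"safe": true} or {"safe": false}.';

// Build a /v1/chat/completions body to send to the proxy at localhost:8000.
function buildModerationRequest(imageBase64: string, mimeType: string): ChatRequest {
  return {
    model: "gemma3:4b",
    temperature: 0, // assumed to be honored; favors a stable yes/no answer
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: MODERATION_PROMPT },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// Parse the first flat JSON object in the reply; anything that is not a
// clear {"safe": true|false} becomes "unknown".
function parseVerdict(reply: string): Verdict {
  const match = reply.match(/\{[^{}]*\}/);
  if (!match) return "unknown";
  try {
    const parsed = JSON.parse(match[0]) as { safe?: unknown };
    if (parsed.safe === true) return "safe";
    if (parsed.safe === false) return "unsafe";
    return "unknown";
  } catch {
    return "unknown";
  }
}
```

On an "unsafe" verdict the service would then call the two callback URLs from .env. Returning "unknown" rather than defaulting to "unsafe" is a deliberate choice here, since a garbled model reply should not by itself flag a user.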

VisionScannerService (port 8002)

Scans shelf/pantry photos to extract product information and prices using the vision LLM. Uses ChromaDB for embedding storage and Ollama for embedding generation. Supports image tiling for high-res photos.

  • Endpoints: POST /scan/shelf (multipart: image, store_name), POST /scan/pantry (multipart: image), GET /health
  • Vision model: qwen2.5vl-it:3b via FLM proxy at localhost:8000
  • External deps: Ollama (192.168.0.15:11434, nomic-embed-text), ChromaDB (192.168.0.15:8000), optional Gemini API
  • Source: src/vision.ts (LLM calls), src/tiling.ts (image tiling), src/shelf.ts / src/pantry.ts (scan logic), src/embeddings.ts + src/chroma.ts (vector storage), src/matching.ts (product matching), src/parsing.ts (response parsing), src/gemini.ts (Gemini fallback), src/config.ts
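Tiling can be illustrated as covering a large photo with overlapping fixed-size windows, so that a product cut by one tile edge has a better chance of appearing whole in a neighboring tile. This is a generic sketch, not necessarily the strategy in src/tiling.ts; computeTiles, the tile size, and the overlap value are hypothetical.

```typescript
// Hypothetical tiling helper; not taken from src/tiling.ts.
interface Tile {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Start offsets along one axis. Uses the fewest tiles that keep at least
// `overlap` px of overlap between neighbors (for integer inputs), spaced
// evenly so the last tile ends exactly at the image edge.
function axisStarts(length: number, tile: number, overlap: number): number[] {
  if (length <= tile) return [0]; // one tile covers the whole axis
  const step = tile - overlap; // largest allowed stride
  const count = Math.ceil((length - tile) / step) + 1;
  const starts: number[] = [];
  for (let i = 0; i < count; i++) {
    starts.push(Math.round((i * (length - tile)) / (count - 1)));
  }
  return starts;
}

// Cover a width x height image with tiles of at most tileSize px per side.
function computeTiles(width: number, height: number, tileSize: number, overlap: number): Tile[] {
  if (tileSize <= 0 || overlap < 0 || overlap >= tileSize) {
    throw new Error("expected tileSize > 0 and 0 <= overlap < tileSize");
  }
  const tiles: Tile[] = [];
  for (const y of axisStarts(height, tileSize, overlap)) {
    for (const x of axisStarts(width, tileSize, overlap)) {
      tiles.push({ x, y, width: Math.min(tileSize, width), height: Math.min(tileSize, height) });
    }
  }
  return tiles;
}
```

Each rectangle could then be cropped (for example with sharp's extract({ left, top, width, height })) and sent to the vision model on its own. Spacing the tiles evenly, rather than stepping by a fixed amount and clamping the last one, avoids a final tile that almost duplicates its neighbor.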

Environment

  • Windows 11, AMD NPU hardware
  • Node.js with node-windows dependency
  • FLM binary path: C:\Users\sshuser\FastFlowLM\flm.exe
  • All paths are hardcoded under C:\Users\sshuser\