# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is a FastFlowLM proxy server setup that runs LLMs on an AMD NPU (Neural Processing Unit). The proxy auto-starts the model on the first request and stops it after an idle timeout to free RAM.
## Architecture
- **`flm-proxy.js`** — Node.js HTTP proxy (port 8000) that sits in front of FastFlowLM (port 8001). It lazily spawns `flm.exe`, polls until the model is ready, proxies all requests, and kills the process after 5 minutes of inactivity. Exposes `/status` and `/stop` control endpoints.
- **`FastFlowLM/flm.exe`** — Pre-built binary that serves an OpenAI-compatible API (`/v1/models`, `/v1/chat/completions`, etc.) using NPU-accelerated models. Not source code; do not modify.
- **`flm-service-install.js` / `flm-service-uninstall.js`** — Install/uninstall the proxy as a Windows service via `node-windows`.
- **`daemon/`** — Windows service wrapper files generated by `node-windows` (exe, logs, config).
- **`flm-start.bat` / `flm-stop.bat`** — Simple batch scripts to run FLM directly (bypassing the proxy).
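
The lazy-start and idle-stop behaviour described for `flm-proxy.js` can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class name `IdleManagedProcess` and the `start`/`stop` callbacks are hypothetical stand-ins for spawning and killing `flm.exe`, and the readiness polling step is left out.

```javascript
// Sketch of a lazy-start / idle-stop lifecycle.
// ASSUMPTION: `start` and `stop` are hypothetical callbacks that would
// spawn and kill flm.exe; the real proxy may be structured differently.
class IdleManagedProcess {
  constructor({ start, stop, idleTimeoutMs }) {
    this.start = start;
    this.stop = stop;
    this.idleTimeoutMs = idleTimeoutMs;
    this.running = false;
    this.timer = null;
  }

  // Call on every incoming request: start on demand, then reset the idle timer.
  touch() {
    if (!this.running) {
      this.start();
      this.running = true;
    }
    this.resetTimer();
  }

  resetTimer() {
    clearTimeout(this.timer);
    this.timer = setTimeout(() => this.shutdown(), this.idleTimeoutMs);
    // Do not keep the Node.js event loop alive just for the idle timer.
    this.timer.unref();
  }

  // Invoked by the idle timer, or directly (for example from a /stop handler).
  shutdown() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.running) {
      this.stop();
      this.running = false;
    }
  }
}
```

Resetting the timer on each request means the process is only stopped after a full idle period with no traffic, which matches the "5 minutes of inactivity" behaviour described above.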
## Commands
```bash
# Run the proxy (foreground)
node flm-proxy.js
# Install as Windows service
node flm-service-install.js
# Uninstall Windows service
node flm-service-uninstall.js
# Install dependencies
npm install
# Check service logs
cat ~/daemon/flmvisionproxy.out.log
cat ~/daemon/flmvisionproxy.err.log
```
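
Once the proxy is running, a client could send a vision request through it along the lines below. This is a sketch under assumptions: the message layout follows the OpenAI chat completions format, and whether FLM accepts `image_url` content parts in exactly this form is not confirmed here. The function names `buildVisionRequest` and `askVisionModel` are hypothetical. Node.js 18+ is assumed for the global `fetch`.

```javascript
// Build an OpenAI-style chat completion body with one image attached.
// ASSUMPTION: FLM accepts "image_url" content parts with a base64 data URI.
function buildVisionRequest(model, prompt, imageBase64, mimeType = "image/jpeg") {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          {
            type: "image_url",
            image_url: { url: `data:${mimeType};base64,${imageBase64}` },
          },
        ],
      },
    ],
  };
}

// Send the request through the proxy on port 8000.
async function askVisionModel(body, baseUrl = "http://localhost:8000") {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Proxy returned HTTP ${res.status}`);
  }
  return res.json();
}
```

Because the proxy starts the model lazily, the first call after an idle period can be expected to take noticeably longer than later ones.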
## Key Configuration (in flm-proxy.js)
- `MODEL` — currently `qwen2.5vl-it:3b` (Qwen2.5 Vision-Language 3B)
- `PROXY_PORT` — 8000 (external-facing)
- `FLM_PORT` — 8001 (internal FLM server)
- `IDLE_TIMEOUT_MS` — 5 minutes
- `HOST` — `0.0.0.0` (listens on all interfaces)
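
For orientation, the settings above might be collected roughly like this. The values come from the list; how `flm-proxy.js` actually declares them, the `CONFIG` object, the `upstreamUrl` helper, and the use of `127.0.0.1` for the internal FLM server are all assumptions made for illustration.

```javascript
// Values taken from the list above.
// ASSUMPTION: the real file may declare these as separate constants.
const CONFIG = Object.freeze({
  MODEL: "qwen2.5vl-it:3b",
  PROXY_PORT: 8000, // external-facing
  FLM_PORT: 8001, // internal FLM server
  IDLE_TIMEOUT_MS: 5 * 60 * 1000, // 5 minutes in milliseconds
  HOST: "0.0.0.0", // listen on all interfaces
});

// Hypothetical helper: where the proxy would forward a given request path.
// ASSUMPTION: FLM is reached over the loopback address.
function upstreamUrl(path, config = CONFIG) {
  return `http://127.0.0.1:${config.FLM_PORT}${path}`;
}
```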
## Available Models
See `FastFlowLM/model_list.json` for the full catalog. Model identifiers use the format `family:size` (e.g., `qwen3:4b`, `llama3.2:3b`). Vision models have `"vlm": true`. Thinking models have `"think": true`.
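
A small helper can make the `family:size` convention and the capability flags easier to work with. The sketch below is illustrative only: `parseModelId` and `filterModels` are hypothetical names, and `filterModels` assumes a flat array of entries shaped like `{ id, vlm, think }`, which may not match the actual layout of `model_list.json`.

```javascript
// Split a "family:size" identifier at the last colon, e.g. "qwen3:4b".
function parseModelId(id) {
  const idx = id.lastIndexOf(":");
  if (idx <= 0 || idx === id.length - 1) {
    throw new Error(`Expected "family:size", got "${id}"`);
  }
  return { family: id.slice(0, idx), size: id.slice(idx + 1) };
}

// Return the ids of all entries where the given flag ("vlm" or "think") is true.
// ASSUMPTION: entries are flat objects like { id, vlm, think }; adapt this
// to the real structure of model_list.json before relying on it.
function filterModels(entries, flag) {
  return entries.filter((entry) => entry[flag] === true).map((entry) => entry.id);
}
```

Splitting at the last colon keeps family names that contain dots or hyphens, such as `qwen2.5vl-it`, intact.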
## Services
All services are TypeScript/Express apps with the same build pattern:
```bash
cd <ServiceDir>
npm install # install deps
npm run build # tsc → dist/
npm start # node dist/server.js
npm run dev # tsx watch (hot-reload)
# Windows service management
node service-install.js
node service-uninstall.js
```
### ImageModerationService (port 8100)
Checks uploaded images for NSFW/explicit content using the local vision LLM. When an image is flagged unsafe, fires callbacks to the upload service (to replace the image) and to Parochia (to flag the user).
- **Endpoints:** `POST /moderate` (multipart: `file`, `context`, `imagePath`, `userId`, `siteId`), `GET /health`
- **Vision model:** `gemma3:4b` via FLM proxy at `localhost:8000`
- **Callbacks:** Configurable in `.env` — upload service replace URL + Parochia moderation callback
- **Source:** `src/moderate.ts` (moderation logic), `src/server.ts` (Express app)
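
A client call to `POST /moderate` might look like the sketch below. The field names come from the endpoint description above; which fields are required, their expected value formats, and the response format are not documented here, so treat those as assumptions. The function names `buildModerationForm` and `moderateImage` are hypothetical. Node.js 18+ is assumed for the global `FormData`, `Blob`, and `fetch`.

```javascript
// Build the multipart body for POST /moderate.
// ASSUMPTION: all five fields are sent as shown; the service may treat
// some of them as optional.
function buildModerationForm({ imageBytes, filename, context, imagePath, userId, siteId }) {
  const form = new FormData();
  form.append("file", new Blob([imageBytes]), filename);
  form.append("context", context);
  form.append("imagePath", imagePath);
  form.append("userId", String(userId));
  form.append("siteId", String(siteId));
  return form;
}

// Post the form to the moderation service on port 8100.
async function moderateImage(fields, baseUrl = "http://localhost:8100") {
  const res = await fetch(`${baseUrl}/moderate`, {
    method: "POST",
    body: buildModerationForm(fields),
  });
  if (!res.ok) {
    throw new Error(`Moderation service returned HTTP ${res.status}`);
  }
  // The response format is not documented here, so return the raw text.
  return res.text();
}
```

Passing a `FormData` body lets `fetch` set the multipart `Content-Type` header, including the boundary, on its own.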
### VisionScannerService (port 8002)
Scans shelf/pantry photos to extract product information and prices using the vision LLM. Uses ChromaDB for embeddings storage and Ollama for embedding generation. Supports image tiling for high-res photos.
- **Endpoints:** `POST /scan/shelf` (multipart: `image`, `store_name`), `POST /scan/pantry` (multipart: `image`), `GET /health`
- **Vision model:** `qwen2.5vl-it:3b` via FLM proxy at `localhost:8000`
- **External deps:** Ollama (`192.168.0.15:11434`, `nomic-embed-text`), ChromaDB (`192.168.0.15:8000`), optional Gemini API
- **Source:** `src/vision.ts` (LLM calls), `src/tiling.ts` (image tiling), `src/shelf.ts` / `src/pantry.ts` (scan logic), `src/embeddings.ts` + `src/chroma.ts` (vector storage), `src/matching.ts` (product matching), `src/parsing.ts` (response parsing), `src/gemini.ts` (Gemini fallback), `src/config.ts`
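
To illustrate what image tiling for high-res photos can involve, here is one possible way to compute tile rectangles. This is a generic sketch and not the algorithm in `src/tiling.ts`: the function `computeTiles`, its parameters, and the choice of square, edge-clamped tiles are all assumptions.

```javascript
// Hypothetical tiling: cover a width x height image with square tiles of
// `tileSize` pixels, where neighbouring tiles overlap by `overlap` pixels.
function computeTiles(width, height, tileSize, overlap = 0) {
  if (overlap < 0 || overlap >= tileSize) {
    throw new Error("overlap must be >= 0 and smaller than tileSize");
  }
  const step = tileSize - overlap;

  // Start offsets along one axis; the last tile is clamped to the image edge.
  const starts = (length) => {
    const out = [];
    for (let pos = 0; ; pos += step) {
      const start = Math.max(0, Math.min(pos, length - tileSize));
      out.push(start);
      if (start + tileSize >= length) break;
    }
    return out;
  };

  const tiles = [];
  for (const y of starts(height)) {
    for (const x of starts(width)) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, width),
        height: Math.min(tileSize, height),
      });
    }
  }
  return tiles;
}
```

For a 250x100 image with `tileSize` 100 and no overlap, this yields three tiles at x offsets 0, 100, and 150; the last tile is shifted back so it ends at the image edge instead of running past it.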
## Environment
- Windows 11, AMD NPU hardware
- Node.js with `node-windows` dependency
- FLM binary path: `C:\Users\sshuser\FastFlowLM\flm.exe`
- All paths are hardcoded to `C:\Users\sshuser\`