Add README with project documentation

2026-03-29 22:04:41 -04:00
parent a5dcb56f7d
commit 57ed2f5505
1 changed file with 63 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,63 @@
 # FLM Proxy
 Node.js HTTP proxy that sits in front of [FastFlowLM](https://github.com/amd/FastFlowLM) to serve LLM inference on an AMD NPU. The proxy lazily starts the model on the first request and automatically stops it after an idle timeout to free RAM.
 ## How It Works
 - External clients hit the proxy on **port 8000**
 - On the first request, the proxy spawns `flm.exe`, which serves an OpenAI-compatible API on port 8001
 - All subsequent requests are proxied through to FLM
 - After 5 minutes of inactivity, the model process is killed to reclaim memory
 - The next request cold-starts the model again, which takes roughly 10-15 seconds
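
The lifecycle above can be sketched as a small state holder. This is a minimal illustration, not the code in `flm-proxy.js`: the `LazyBackend` class and its `start`/`stop` callbacks are hypothetical names, and a real `start` would spawn `flm.exe` and wait until port 8001 accepts connections before any request is forwarded.

```javascript
// Hypothetical sketch of lazy start plus idle shutdown; not taken from flm-proxy.js.
class LazyBackend {
  constructor({
    start, // assumed callback that launches the model process
    stop, // assumed callback that kills it
    idleTimeoutMs = 300000, // 5 minutes, matching the README default
    setTimer = (fn, ms) => setTimeout(fn, ms),
    clearTimer = (id) => clearTimeout(id),
  }) {
    this.start = start;
    this.stop = stop;
    this.idleTimeoutMs = idleTimeoutMs;
    this.setTimer = setTimer;
    this.clearTimer = clearTimer;
    this.running = false;
    this.timer = null;
  }

  // Call once per incoming request, before forwarding it.
  touch() {
    if (!this.running) {
      this.start();
      this.running = true;
    }
    this.resetIdleTimer();
  }

  // Restart the idle countdown from zero.
  resetIdleTimer() {
    if (this.timer !== null) this.clearTimer(this.timer);
    this.timer = this.setTimer(() => this.shutdown(), this.idleTimeoutMs);
  }

  // Stop the model and cancel any pending countdown.
  shutdown() {
    if (this.timer !== null) {
      this.clearTimer(this.timer);
      this.timer = null;
    }
    if (this.running) {
      this.stop();
      this.running = false;
    }
  }
}
```

Because every request resets the countdown, the model only stops after a full idle period with no traffic. The timer functions are injectable so the logic can be exercised without waiting five minutes.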
 ## Configuration
 | Setting | Default | Description |
 |---------|---------|-------------|
 | `MODEL` | `qwen2.5vl-it:3b` | Model to serve (see `FastFlowLM/model_list.json`) |
 | `PROXY_PORT` | `8000` | External-facing port |
 | `FLM_PORT` | `8001` | Internal FLM server port |
 | `IDLE_TIMEOUT_MS` | `300000` (5 min) | Idle time before stopping the model |
 | `HOST` | `0.0.0.0` | Listen address |
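
As an illustration of how the defaults in the table above might be applied, here is a minimal sketch. It assumes each setting can be overridden by an environment variable of the same name, which this README does not confirm; the settings may instead be constants in `flm-proxy.js`. The `loadConfig` function is a hypothetical name.

```javascript
// Hypothetical loader: assumes each setting can be overridden by an
// environment variable of the same name (not confirmed by the README).
function loadConfig(env = process.env) {
  // Parse a numeric setting, falling back to the default when unset or invalid.
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    model: env.MODEL || "qwen2.5vl-it:3b",
    proxyPort: toInt(env.PROXY_PORT, 8000),
    flmPort: toInt(env.FLM_PORT, 8001),
    idleTimeoutMs: toInt(env.IDLE_TIMEOUT_MS, 300000),
    host: env.HOST || "0.0.0.0",
  };
}
```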
 ## Endpoints
 | Endpoint | Description |
 |----------|-------------|
 | `/v1/chat/completions` | OpenAI-compatible chat (proxied to FLM) |
 | `/v1/models` | List available models (proxied to FLM) |
 | `/status` | Proxy status: whether the model is ready or starting, and its PID |
 | `/stop` | Manually stop the model and free RAM |
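
A client can talk to the chat endpoint like any OpenAI-compatible server. The sketch below is illustrative only: it assumes the proxy runs locally on its default port and that a non-streaming response follows the usual OpenAI shape. `buildChatRequest` and `chat` are hypothetical helper names.

```javascript
// Hypothetical client helper; the base URL assumes the proxy runs locally on its default port.
const BASE_URL = "http://localhost:8000";

// Build the URL and fetch options for an OpenAI-style chat completion request.
function buildChatRequest(prompt, model = "qwen2.5vl-it:3b") {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Send the request with the global fetch available in Node.js 18 and later.
// The first call after an idle period may take longer while the model starts.
async function chat(prompt) {
  const { url, options } = buildChatRequest(prompt);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Proxy returned HTTP ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}
```

With the proxy running, `chat("Hello").then(console.log)` would print the model's reply.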
 ## Usage
 ```bash
 # Install dependencies
 npm install
 # Run in foreground
 node flm-proxy.js
 # Install as a Windows service
 node flm-service-install.js
 # Uninstall Windows service
 node flm-service-uninstall.js
 ```
 ## Service Logs
 When running as a Windows service, logs are written to:
 - `~/daemon/flmvisionproxy.out.log`
 - `~/daemon/flmvisionproxy.err.log`
 ## Environment
 - **OS / hardware:** Windows 11 on a machine with an AMD NPU
 - **Runtime:** Node.js
 - **FLM binary:** `C:\Users\sshuser\FastFlowLM\flm.exe`
 - **Dependencies:** `node-windows` (for service install)
 ## Available Models
 See `FastFlowLM/model_list.json` for the full catalog. Model identifiers use the format `family:size` (e.g., `qwen3:4b`, `llama3.2:3b`). Vision models are marked with `"vlm": true`, and thinking models with `"think": true`.
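
To make the `family:size` format concrete, here is a small sketch that splits an identifier into its two parts. `parseModelId` is a hypothetical helper, not part of the proxy, and it makes no assumption about the layout of `model_list.json`.

```javascript
// Hypothetical helper: split a "family:size" identifier into its two parts.
function parseModelId(id) {
  const index = id.lastIndexOf(":");
  // Reject identifiers with no colon, an empty family, or an empty size.
  if (index <= 0 || index === id.length - 1) {
    throw new Error(`Expected "family:size", got "${id}"`);
  }
  return { family: id.slice(0, index), size: id.slice(index + 1) };
}
```

For the default model, `parseModelId("qwen2.5vl-it:3b")` yields the family `qwen2.5vl-it` and the size `3b`.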