Add README with project documentation
63
README.md
Normal file
@@ -0,0 +1,63 @@
# FLM Proxy

A Node.js HTTP proxy that sits in front of [FastFlowLM](https://github.com/amd/FastFlowLM) to serve LLM inference on an AMD NPU. The proxy lazily starts the model on the first request and automatically stops it after an idle timeout to free RAM.

## How It Works

- External clients hit the proxy on **port 8000**
- On the first request, the proxy spawns `flm.exe`, which serves an OpenAI-compatible API on port 8001
- All subsequent requests are proxied through to FLM
- After 5 minutes of inactivity (the default `IDLE_TIMEOUT_MS`), the model process is killed to reclaim memory
- The next request cold-starts the model again, which takes roughly 10-15 seconds
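
The lifecycle above can be sketched roughly as follows. This is a minimal illustration, not the code in `flm-proxy.js`: the class name `IdleManagedProcess` and the `startModel` / `stopModel` callbacks are hypothetical, and the sketch does not wait for the model to become ready after a cold start.

```javascript
// Sketch of lazy start plus idle shutdown.
// `startModel` and `stopModel` are hypothetical callbacks; in a real proxy
// they would spawn and kill the flm.exe process.
class IdleManagedProcess {
  constructor({ startModel, stopModel, idleTimeoutMs = 300000 }) {
    this.startModel = startModel;
    this.stopModel = stopModel;
    this.idleTimeoutMs = idleTimeoutMs;
    this.running = false;
    this.idleTimer = null;
  }

  // Call on every incoming request: start on demand, then reset the idle clock.
  touch() {
    if (!this.running) {
      this.startModel();
      this.running = true;
    }
    clearTimeout(this.idleTimer);
    this.idleTimer = setTimeout(() => this.stop(), this.idleTimeoutMs);
  }

  // Stop the backend (if running) and cancel any pending idle timer.
  stop() {
    clearTimeout(this.idleTimer);
    this.idleTimer = null;
    if (this.running) {
      this.stopModel();
      this.running = false;
    }
  }
}

// Usage with stand-in callbacks that only record what happened.
const events = [];
const model = new IdleManagedProcess({
  startModel: () => events.push("start"),
  stopModel: () => events.push("stop"),
});
model.touch(); // first request: cold start
model.touch(); // later request: only resets the idle timer
model.stop();  // what the idle timer or a manual stop would do
console.log(events); // [ 'start', 'stop' ]
```

Resetting a single timer on every request keeps the bookkeeping small: the model is stopped only when a full timeout passes with no traffic.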

## Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| `MODEL` | `qwen2.5vl-it:3b` | Model to serve (see `FastFlowLM/model_list.json`) |
| `PROXY_PORT` | `8000` | External-facing port |
| `FLM_PORT` | `8001` | Internal FLM server port |
| `IDLE_TIMEOUT_MS` | `300000` (5 min) | Idle time before stopping the model |
| `HOST` | `0.0.0.0` | Listen address |
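
One way these settings could be resolved is sketched below. The defaults are copied from the table; everything else is an assumption. In particular, this README does not say whether the settings are read from environment variables or edited as constants in `flm-proxy.js`, and `loadConfig` is a hypothetical helper.

```javascript
// Defaults copied from the configuration table.
const DEFAULTS = {
  MODEL: "qwen2.5vl-it:3b",
  PROXY_PORT: 8000,
  FLM_PORT: 8001,
  IDLE_TIMEOUT_MS: 300000,
  HOST: "0.0.0.0",
};

// Hypothetical helper: overlay string overrides (for example from an
// environment-like object) on the defaults. Whether the real proxy reads
// overrides this way is an assumption.
function loadConfig(env = {}) {
  const config = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS)) {
    const raw = env[key];
    if (raw === undefined || raw === "") continue;
    if (typeof DEFAULTS[key] === "number") {
      // Ports and timeouts must be positive integers.
      const n = Number(raw);
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error(`${key} must be a positive integer, got "${raw}"`);
      }
      config[key] = n;
    } else {
      config[key] = raw;
    }
  }
  return config;
}
```

Under that assumption, calling `loadConfig(process.env)` at startup would let `IDLE_TIMEOUT_MS=60000` shorten the idle window to one minute without editing the script.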

## Endpoints

| Endpoint | Description |
|----------|-------------|
| `/v1/chat/completions` | OpenAI-compatible chat (proxied to FLM) |
| `/v1/models` | List available models (proxied to FLM) |
| `/status` | Proxy status: whether the model is ready or starting, and its PID |
| `/stop` | Manually stop the model and free RAM |
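
A client call to the chat endpoint might look like the sketch below. It assumes the proxy is reachable at `http://localhost:8000`, that FLM accepts the standard OpenAI chat request body, and that the Node.js runtime provides a global `fetch` (Node 18 or later). `buildChatRequest` and `chat` are hypothetical helper names.

```javascript
// Build the URL and fetch options for an OpenAI-style chat completion.
// The base URL and model are assumptions taken from the documented defaults.
function buildChatRequest(prompt, { baseUrl = "http://localhost:8000", model = "qwen2.5vl-it:3b" } = {}) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Send the request through the proxy and return the assistant's reply text.
// Assumes an OpenAI-compatible response shape (choices[0].message.content).
async function chat(prompt, options) {
  const { url, init } = buildChatRequest(prompt, options);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Proxy returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The first call after an idle period includes the cold start, so clients should allow a generous timeout for it.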

## Usage

```bash
# Install dependencies
npm install

# Run in foreground
node flm-proxy.js

# Install as a Windows service
node flm-service-install.js

# Uninstall Windows service
node flm-service-uninstall.js
```

## Service Logs

When running as a Windows service, logs are written to:
- `~/daemon/flmvisionproxy.out.log`
- `~/daemon/flmvisionproxy.err.log`

## Environment

- **OS:** Windows 11, AMD NPU hardware
- **Runtime:** Node.js
- **FLM binary:** `C:\Users\sshuser\FastFlowLM\flm.exe`
- **Dependencies:** `node-windows` (for service install)

## Available Models

See `FastFlowLM/model_list.json` for the full catalog. Model identifiers use the format `family:size` (e.g., `qwen3:4b`, `llama3.2:3b`). Vision models have `"vlm": true`; thinking models have `"think": true`.
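
As a small illustration of the `family:size` format, an identifier can be split at its colon. `parseModelId` is a hypothetical helper, not part of the proxy or of FastFlowLM, and it makes no assumption about how `model_list.json` is laid out.

```javascript
// Split a model identifier of the form `family:size` into its two parts.
// Throws if the colon is missing or either side is empty.
function parseModelId(id) {
  const idx = id.indexOf(":");
  if (idx <= 0 || idx === id.length - 1) {
    throw new Error(`Expected "family:size", got "${id}"`);
  }
  return { family: id.slice(0, idx), size: id.slice(idx + 1) };
}

console.log(parseModelId("qwen3:4b")); // { family: 'qwen3', size: '4b' }
```

Validating the identifier before starting the model gives a clearer error than a failed cold start.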