Add README with project documentation
63
README.md
Normal file
@@ -0,0 +1,63 @@
# FLM Proxy

Node.js HTTP proxy that sits in front of [FastFlowLM](https://github.com/amd/FastFlowLM) to serve LLM inference on an AMD NPU. The proxy lazily starts the model on the first request and automatically stops it after an idle timeout to free RAM.

## How It Works

- External clients hit the proxy on **port 8000**
- On the first request, the proxy spawns `flm.exe`, which serves an OpenAI-compatible API on port 8001
- All subsequent requests are proxied through to FLM
- After 5 minutes of inactivity, the model process is killed to reclaim memory
- The next request cold-starts the model again (~10-15 seconds)
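
The lazy-start and idle-stop cycle above can be sketched roughly as follows. This is a minimal illustration, not the code in `flm-proxy.js`: `createLazyBackend`, `onRequest`, and the injected `start`/`stop` callbacks are hypothetical names, and the real proxy would spawn and kill `flm.exe` inside those callbacks.

```javascript
// Hypothetical helper, not taken from flm-proxy.js.
// Starts the backend on demand and stops it after a quiet period.
function createLazyBackend({
  start,                      // assumed callback: launches the model process
  stop,                       // assumed callback: kills the model process
  idleTimeoutMs = 300000,     // 5 minutes, matching the README default
  setTimer = setTimeout,      // injectable so the logic can be tested
  clearTimer = clearTimeout,
}) {
  let running = false;
  let timer = null;

  function armIdleTimer() {
    if (timer !== null) clearTimer(timer);
    timer = setTimer(() => {
      timer = null;
      if (running) {
        stop();
        running = false;
      }
    }, idleTimeoutMs);
  }

  return {
    // Call once per incoming request, before proxying it.
    onRequest() {
      if (!running) {
        start();              // cold start on the first request or after an idle stop
        running = true;
      }
      armIdleTimer();         // every request pushes the idle deadline back
    },
    isRunning: () => running,
  };
}
```

Resetting a single timer on every request keeps the bookkeeping simple: the model is stopped only when a full timeout elapses with no traffic at all.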

## Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| `MODEL` | `qwen2.5vl-it:3b` | Model to serve (see `FastFlowLM/model_list.json`) |
| `PROXY_PORT` | `8000` | External-facing port |
| `FLM_PORT` | `8001` | Internal FLM server port |
| `IDLE_TIMEOUT_MS` | `300000` (5 min) | Idle time before stopping the model |
| `HOST` | `0.0.0.0` | Listen address |
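
The README does not say where these settings are defined. As an illustration only, assuming they could be overridden through environment variables of the same name, a loader with the defaults from the table might look like this (`loadConfig` is a hypothetical helper):

```javascript
// Hypothetical loader: assumes each setting may be supplied as an
// environment variable of the same name. The README does not confirm this.
function loadConfig(env = {}) {
  // Parse an integer setting, falling back to the default when unset or invalid.
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    model: env.MODEL || 'qwen2.5vl-it:3b',
    proxyPort: toInt(env.PROXY_PORT, 8000),
    flmPort: toInt(env.FLM_PORT, 8001),
    idleTimeoutMs: toInt(env.IDLE_TIMEOUT_MS, 300000),
    host: env.HOST || '0.0.0.0',
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`.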

## Endpoints

| Endpoint | Description |
|----------|-------------|
| `/v1/chat/completions` | OpenAI-compatible chat (proxied to FLM) |
| `/v1/models` | List available models (proxied to FLM) |
| `/status` | Proxy status: model ready, starting, PID |
| `/stop` | Manually stop the model and free RAM |
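
A client request to the chat endpoint could be built along these lines. This is a sketch: `buildChatRequest` is a hypothetical helper, the base URL assumes the proxy runs locally on the default port, and the request body follows the usual OpenAI chat format, which FLM is assumed to accept as-is.

```javascript
// Hypothetical helper: builds the URL and fetch options for an
// OpenAI-style chat completion request sent through the proxy.
function buildChatRequest(
  prompt,
  { baseUrl = 'http://localhost:8000', model = 'qwen2.5vl-it:3b' } = {}
) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage (needs Node.js 18+ for the built-in fetch, and a running proxy):
//   const { url, options } = buildChatRequest('Hello');
//   const res = await fetch(url, options);
//   const data = await res.json();
//   console.log(data.choices[0].message.content); // assumes an OpenAI-shaped response
```

The first call after an idle period will take longer, since the proxy has to cold-start the model before forwarding the request.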

## Usage

```bash
# Install dependencies
npm install

# Run in foreground
node flm-proxy.js

# Install as a Windows service
node flm-service-install.js

# Uninstall Windows service
node flm-service-uninstall.js
```

## Service Logs

When running as a Windows service, logs are written to:
- `~/daemon/flmvisionproxy.out.log`
- `~/daemon/flmvisionproxy.err.log`

## Environment

- **OS:** Windows 11, AMD NPU hardware
- **Runtime:** Node.js
- **FLM binary:** `C:\Users\sshuser\FastFlowLM\flm.exe`
- **Dependencies:** `node-windows` (for service install)

## Available Models

See `FastFlowLM/model_list.json` for the full catalog. Model identifiers use the format `family:size` (e.g., `qwen3:4b`, `llama3.2:3b`). Vision models are marked with `"vlm": true`, and thinking models with `"think": true`.
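
To illustrate the `family:size` format, a small parser might split an identifier like this. `parseModelId` is a hypothetical helper and not part of the proxy or FastFlowLM; it only assumes that the size follows the last colon.

```javascript
// Hypothetical helper: splits a "family:size" model identifier.
function parseModelId(id) {
  const idx = id.lastIndexOf(':');
  // Reject identifiers with no colon, an empty family, or an empty size.
  if (idx <= 0 || idx === id.length - 1) {
    throw new Error(`Expected "family:size", got "${id}"`);
  }
  return { family: id.slice(0, idx), size: id.slice(idx + 1) };
}
```

For example, `parseModelId('qwen2.5vl-it:3b')` yields the family `qwen2.5vl-it` and the size `3b`.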