diff --git a/README.md b/README.md
new file mode 100644
index 0000000..b9cb9d2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,63 @@
+# FLM Proxy
+
+Node.js HTTP proxy that sits in front of [FastFlowLM](https://github.com/amd/FastFlowLM) to serve LLM inference on an AMD NPU. The proxy lazily starts the model on the first request and automatically stops it after an idle timeout to free RAM.
+
+## How It Works
+
+- External clients hit the proxy on **port 8000**
+- On the first request, the proxy spawns `flm.exe`, which serves an OpenAI-compatible API on port 8001
+- All subsequent requests are proxied through to FLM
+- After 5 minutes of inactivity (the default `IDLE_TIMEOUT_MS`), the model process is killed to reclaim memory
+- The next request will cold-start the model again (~10-15 seconds)
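+
+The lifecycle above can be sketched roughly as follows. This is a minimal illustration and not the actual `flm-proxy.js` code: the `LazyProcess` class and its `start`/`stop` callbacks are hypothetical names, and in the real proxy `start` would presumably spawn `flm.exe` and wait for port 8001 to accept connections.
+
+```javascript
+// Hypothetical sketch of lazy start plus idle shutdown (names are assumptions).
+class LazyProcess {
+  constructor({ start, stop, idleTimeoutMs }) {
+    this.start = start;               // async function that launches the backend
+    this.stop = stop;                 // function that kills the backend
+    this.idleTimeoutMs = idleTimeoutMs;
+    this.starting = null;             // pending start, shared by concurrent requests
+    this.running = false;
+    this.idleTimer = null;
+  }
+
+  // Call before proxying each request.
+  async ensureRunning() {
+    if (!this.running) {
+      if (!this.starting) {
+        this.starting = Promise.resolve(this.start())
+          .then(() => { this.running = true; })
+          .finally(() => { this.starting = null; });
+      }
+      await this.starting;            // rejects if the backend failed to start
+    }
+    this.touch();
+  }
+
+  // Restart the idle countdown.
+  touch() {
+    clearTimeout(this.idleTimer);
+    this.idleTimer = setTimeout(() => this.shutdown(), this.idleTimeoutMs);
+  }
+
+  // Stop the backend (idle timeout or manual stop).
+  shutdown() {
+    clearTimeout(this.idleTimer);
+    this.idleTimer = null;
+    if (this.running) {
+      this.stop();
+      this.running = false;
+    }
+  }
+}
+```
+
+Sharing one pending start promise means that several requests arriving during a cold start trigger only a single launch.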
+
+## Configuration
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `MODEL` | `qwen2.5vl-it:3b` | Model to serve (see `FastFlowLM/model_list.json`) |
+| `PROXY_PORT` | `8000` | External-facing port |
+| `FLM_PORT` | `8001` | Internal FLM server port |
+| `IDLE_TIMEOUT_MS` | `300000` (5 min) | Idle time before stopping the model |
+| `HOST` | `0.0.0.0` | Listen address |
+
+## Endpoints
+
+| Endpoint | Description |
+|----------|-------------|
+| `/v1/chat/completions` | OpenAI-compatible chat (proxied to FLM) |
+| `/v1/models` | List available models (proxied to FLM) |
+| `/status` | Proxy status: whether the model is ready or starting, and its PID |
+| `/stop` | Manually stop the model and free RAM |
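+
+As an illustration, a chat request through the proxy might look like the sketch below. It assumes Node.js 18+ (for the global `fetch`), a proxy reachable at `http://localhost:8000`, and the standard OpenAI chat completions request and response shape; `buildChatRequest` and `ask` are hypothetical helper names, not part of this repository.
+
+```javascript
+// Hypothetical helper: builds the URL and fetch options for a chat request.
+function buildChatRequest(prompt, { baseUrl = 'http://localhost:8000', model = 'qwen2.5vl-it:3b' } = {}) {
+  return {
+    url: `${baseUrl}/v1/chat/completions`,
+    options: {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify({
+        model,
+        messages: [{ role: 'user', content: prompt }],
+      }),
+    },
+  };
+}
+
+// Sends the prompt and returns the reply text.
+// Needs a running proxy; the first call may be slower during a cold start.
+async function ask(prompt) {
+  const { url, options } = buildChatRequest(prompt);
+  const res = await fetch(url, options);
+  if (!res.ok) throw new Error(`Proxy returned HTTP ${res.status}`);
+  const data = await res.json();
+  // Assumes the OpenAI-style response layout.
+  return data.choices[0].message.content;
+}
+```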
+
+## Usage
+
+```bash
+# Install dependencies
+npm install
+
+# Run in foreground
+node flm-proxy.js
+
+# Install as a Windows service
+node flm-service-install.js
+
+# Uninstall Windows service
+node flm-service-uninstall.js
+```
+
+## Service Logs
+
+When running as a Windows service, logs are written to:
+- `~/daemon/flmvisionproxy.out.log`
+- `~/daemon/flmvisionproxy.err.log`
+
+## Environment
+
+- **OS:** Windows 11 on AMD NPU hardware
+- **Runtime:** Node.js
+- **FLM binary:** `C:\Users\sshuser\FastFlowLM\flm.exe`
+- **Dependencies:** `node-windows` (for service install)
+
+## Available Models
+
+See `FastFlowLM/model_list.json` for the full catalog. Model identifiers use the format `family:size` (e.g., `qwen3:4b`, `llama3.2:3b`). Vision models are marked with `"vlm": true`, and thinking models with `"think": true`.
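+
+A small sketch of splitting an identifier in the `family:size` format is shown below. `parseModelId` is a hypothetical helper for illustration only; it does not read `model_list.json`, whose exact layout is not described here.
+
+```javascript
+// Hypothetical helper: splits "family:size" at the last colon.
+function parseModelId(id) {
+  const idx = id.lastIndexOf(':');
+  // Reject identifiers with no colon, an empty family, or an empty size.
+  if (idx <= 0 || idx === id.length - 1) {
+    throw new Error(`Expected "family:size", got "${id}"`);
+  }
+  return { family: id.slice(0, idx), size: id.slice(idx + 1) };
+}
+
+const parsed = parseModelId('llama3.2:3b');
+console.log(parsed.family, parsed.size);
+```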