diff --git a/README.md b/README.md
new file mode 100644
index 0000000..4e1c692
--- /dev/null
+++ b/README.md
@@ -0,0 +1,82 @@
+# Vision Scanner Service
+
+TypeScript/Express service that scans shelf and pantry photos to extract product information and prices using a local vision LLM running on an AMD NPU. Uses ChromaDB for vector embeddings storage and Ollama for embedding generation. Supports image tiling for high-resolution photos.
+
+## How It Works
+
+1. A photo of a store shelf or pantry is uploaded
+2. The image is tiled into smaller sections for better accuracy on high-res photos
+3. Each tile is sent to the vision LLM (qwen2.5vl-it:3b via FLM proxy) for product extraction
+4. Extracted products are matched against existing entries using vector embeddings (ChromaDB + Ollama)
+5. Optionally enriched via Gemini API as a fallback
+
+## Endpoints
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/scan/shelf` | Scan a store shelf photo (multipart: `image`, `store_name`) |
+| `POST` | `/scan/pantry` | Scan a pantry photo (multipart: `image`) |
+| `POST` | `/enrich/product` | Extract detailed product info from a single product image |
+| `GET` | `/health` | Health check (reports status of vision model, Ollama, ChromaDB) |
+
+## Configuration
+
+All configuration is via environment variables (`.env` file):
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `PORT` | `8002` | Service port |
+| `VISION_AI_URL` | `http://localhost:8000/v1/chat/completions` | Vision LLM endpoint |
+| `VISION_AI_MODEL` | `qwen2.5vl-it:3b` | Vision model to use |
+| `VISION_AI_TIMEOUT` | `120000` | Timeout for vision LLM calls (ms) |
+| `OLLAMA_HOST` | `http://192.168.0.15:11434` | Ollama server for embeddings |
+| `OLLAMA_EMBED_MODEL` | `nomic-embed-text` | Embedding model |
+| `CHROMA_HOST` | `http://192.168.0.15:8000` | ChromaDB server |
+| `GEMINI_API_KEY` | — | Optional Gemini API key for fallback |
+| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model for fallback |
+| `MAX_CONCURRENT_TILES` | `4` | Max parallel tile processing |
+| `UPLOAD_DIR` | `uploads` | Temporary upload directory |
+
+## Usage
+
+```bash
+npm install        # Install dependencies
+npm run build      # Compile TypeScript → dist/
+npm start          # Run the service
+npm run dev        # Development mode with hot-reload
+
+# Windows service
+node service-install.js
+node service-uninstall.js
+```
+
+## Project Structure
+
+```
+src/
+  server.ts      — Express app, routes
+  config.ts      — Configuration from environment
+  vision.ts      — Vision LLM API calls
+  tiling.ts      — Image tiling for high-res photos
+  shelf.ts       — Shelf scanning logic
+  pantry.ts      — Pantry scanning logic
+  enrich.ts      — Product info enrichment
+  parsing.ts     — LLM response parsing
+  embeddings.ts  — Ollama embedding generation
+  chroma.ts      — ChromaDB vector storage
+  matching.ts    — Product matching via embeddings
+  gemini.ts      — Gemini API fallback
+```
+
+## External Dependencies
+
+- **FLM Proxy** (localhost:8000) — Vision LLM inference on AMD NPU
+- **Ollama** (192.168.0.15:11434) — Embedding generation with `nomic-embed-text`
+- **ChromaDB** (192.168.0.15:8000) — Vector database for product embeddings
+- **Gemini API** (optional) — Fallback for product enrichment
+
+## Environment
+
+- **OS:** Windows 11, AMD NPU hardware
+- **Runtime:** Node.js + TypeScript
+- **Vision LLM:** qwen2.5vl-it:3b served by FLM proxy on localhost:8000
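
The README's Configuration table lists environment variables and their defaults, but the diff does not include `src/config.ts` itself. As a rough illustration only, a loader for those variables might look like the sketch below. The `ScannerConfig` interface, the `loadConfig`, `strFromEnv` and `intFromEnv` functions, and the field names are all hypothetical; only the variable names and default values are taken from the table.

```typescript
// Hypothetical sketch: not the actual src/config.ts from this change.
// Variable names and defaults mirror the README's Configuration table;
// every type, function, and field name here is an assumption.
interface ScannerConfig {
  port: number;
  visionAiUrl: string;
  visionAiModel: string;
  visionAiTimeoutMs: number;
  ollamaHost: string;
  ollamaEmbedModel: string;
  chromaHost: string;
  geminiApiKey?: string; // optional: Gemini fallback is disabled when absent
  geminiModel: string;
  maxConcurrentTiles: number;
  uploadDir: string;
}

type Env = Record<string, string | undefined>;

// Return the variable's value, or the fallback when it is unset or blank.
function strFromEnv(env: Env, name: string, fallback: string): string {
  const raw = env[name];
  return raw === undefined || raw.trim() === "" ? fallback : raw;
}

// Return the variable as a positive integer, or the fallback when it is
// unset, blank, or not a positive integer.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): ScannerConfig {
  const geminiKey = env["GEMINI_API_KEY"];
  return {
    port: intFromEnv(env, "PORT", 8002),
    visionAiUrl: strFromEnv(env, "VISION_AI_URL", "http://localhost:8000/v1/chat/completions"),
    visionAiModel: strFromEnv(env, "VISION_AI_MODEL", "qwen2.5vl-it:3b"),
    visionAiTimeoutMs: intFromEnv(env, "VISION_AI_TIMEOUT", 120000),
    ollamaHost: strFromEnv(env, "OLLAMA_HOST", "http://192.168.0.15:11434"),
    ollamaEmbedModel: strFromEnv(env, "OLLAMA_EMBED_MODEL", "nomic-embed-text"),
    chromaHost: strFromEnv(env, "CHROMA_HOST", "http://192.168.0.15:8000"),
    geminiApiKey: geminiKey !== undefined && geminiKey.trim() !== "" ? geminiKey : undefined,
    geminiModel: strFromEnv(env, "GEMINI_MODEL", "gemini-2.5-flash"),
    maxConcurrentTiles: intFromEnv(env, "MAX_CONCURRENT_TILES", 4),
    uploadDir: strFromEnv(env, "UPLOAD_DIR", "uploads"),
  };
}
```

In a Node.js service this would typically be called once at startup as `loadConfig(process.env)`. Taking the environment as a parameter rather than reading `process.env` directly is a design choice made here so the defaults can be checked without touching the real environment.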