# Vision Scanner Service

TypeScript/Express service that scans shelf and pantry photos to extract product information and prices using a local vision LLM running on an AMD NPU. It uses ChromaDB for vector embedding storage and Ollama for embedding generation, and supports image tiling for high-resolution photos.

## How It Works

1. A photo of a store shelf or pantry is uploaded.
2. The image is tiled into smaller sections for better accuracy on high-res photos.
3. Each tile is sent to the vision LLM (qwen2.5vl-it:3b via FLM proxy) for product extraction.
4. Extracted products are matched against existing entries using vector embeddings (ChromaDB + Ollama).
5. Products are optionally enriched via the Gemini API as a fallback.

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/scan/shelf` | Scan a store shelf photo (multipart: `image`, `store_name`) |
| `POST` | `/scan/pantry` | Scan a pantry photo (multipart: `image`) |
| `POST` | `/enrich/product` | Extract detailed product info from a single product image |
| `GET` | `/health` | Health check (reports status of vision model, Ollama, ChromaDB) |

## Configuration

All configuration is via environment variables (`.env` file):

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `8002` | Service port |
| `VISION_AI_URL` | `http://localhost:8000/v1/chat/completions` | Vision LLM endpoint |
| `VISION_AI_MODEL` | `qwen2.5vl-it:3b` | Vision model to use |
| `VISION_AI_TIMEOUT` | `120000` | Timeout for vision LLM calls (ms) |
| `OLLAMA_HOST` | `http://192.168.0.15:11434` | Ollama server for embeddings |
| `OLLAMA_EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `CHROMA_HOST` | `http://192.168.0.15:8000` | ChromaDB server |
| `GEMINI_API_KEY` | — | Optional Gemini API key for fallback |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model for fallback |
| `MAX_CONCURRENT_TILES` | `4` | Max parallel tile processing |
| `UPLOAD_DIR` | `uploads` | Temporary upload directory |

## Usage

```bash
npm install          # Install dependencies
npm run build        # Compile TypeScript → dist/
npm start            # Run the service
npm run dev          # Development mode with hot-reload

# Windows service
node service-install.js
node service-uninstall.js
```

## Project Structure

```
src/
  server.ts      — Express app, routes
  config.ts      — Configuration from environment
  vision.ts      — Vision LLM API calls
  tiling.ts      — Image tiling for high-res photos
  shelf.ts       — Shelf scanning logic
  pantry.ts      — Pantry scanning logic
  enrich.ts      — Product info enrichment
  parsing.ts     — LLM response parsing
  embeddings.ts  — Ollama embedding generation
  chroma.ts      — ChromaDB vector storage
  matching.ts    — Product matching via embeddings
  gemini.ts      — Gemini API fallback
```

## External Dependencies

- **FLM Proxy** (localhost:8000) — Vision LLM inference on AMD NPU
- **Ollama** (192.168.0.15:11434) — Embedding generation with `nomic-embed-text`
- **ChromaDB** (192.168.0.15:8000) — Vector database for product embeddings
- **Gemini API** (optional) — Fallback for product enrichment

## Environment

- **OS:** Windows 11, AMD NPU hardware
- **Runtime:** Node.js + TypeScript
- **Vision LLM:** qwen2.5vl-it:3b served by FLM proxy on localhost:8000
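
As an illustration of steps 2 and 3 in How It Works (tiling a photo, then processing tiles with at most `MAX_CONCURRENT_TILES` in flight), here is a minimal sketch. It is not the code in `tiling.ts` or `shelf.ts`: the names `Tile`, `computeTiles`, and `mapWithLimit`, as well as the tile size and overlap parameters, are assumptions made for the example, and the real service may tile and schedule differently.

```typescript
// Hypothetical sketch: these names and parameters are illustrative assumptions,
// not taken from the service's source.

interface Tile {
  x: number;      // left edge in pixels
  y: number;      // top edge in pixels
  width: number;  // tile width (edge tiles may be narrower)
  height: number; // tile height (edge tiles may be shorter)
}

/** Cover an image with tiles of at most tileSize px, overlapping by `overlap` px. */
function computeTiles(
  imageWidth: number,
  imageHeight: number,
  tileSize: number,
  overlap: number,
): Tile[] {
  if (tileSize <= 0 || overlap < 0 || overlap >= tileSize) {
    throw new RangeError("tileSize must be > 0 and overlap must be in [0, tileSize)");
  }
  const step = tileSize - overlap;
  const tiles: Tile[] = [];
  for (let y = 0; y < imageHeight; y += step) {
    for (let x = 0; x < imageWidth; x += step) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, imageWidth - x),
        height: Math.min(tileSize, imageHeight - y),
      });
      if (x + tileSize >= imageWidth) break; // this tile reached the right edge
    }
    if (y + tileSize >= imageHeight) break; // this row reached the bottom edge
  }
  return tiles;
}

/** Run `worker` over `items`, keeping at most `limit` calls in flight; results keep input order. */
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all runners

  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

A caller could combine the two as `mapWithLimit(computeTiles(w, h, 1024, 64), 4, scanTile)`, where `scanTile` is a hypothetical function that crops the tile and posts it to `VISION_AI_URL`, and `4` mirrors the default for `MAX_CONCURRENT_TILES`. Overlapping tiles mean a product on a tile boundary can be reported twice, so results would likely need de-duplication before matching.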