Vision Scanner Service
A TypeScript/Express service that scans shelf and pantry photos to extract product information and prices using a local vision LLM running on an AMD NPU. It stores vector embeddings in ChromaDB, generates them with Ollama, and tiles high-resolution photos before sending them to the model.
How It Works
- A photo of a store shelf or pantry is uploaded
- The image is tiled into smaller sections for better accuracy on high-res photos
- Each tile is sent to the vision LLM (qwen2.5vl-it:3b via FLM proxy) for product extraction
- Extracted products are matched against existing entries using vector embeddings (ChromaDB + Ollama)
- Extracted products are optionally enriched via the Gemini API as a fallback
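The tiling step can be pictured as computing a grid of overlapping rectangles that together cover the whole photo. The following is a minimal sketch only: the `Tile` type, the function names, the 1024 px tile size, and the 128 px overlap are all assumptions for illustration and are not taken from `tiling.ts`.

```typescript
// Hypothetical types and defaults; the real tiling.ts may work differently.
interface Tile {
  left: number;
  top: number;
  width: number;
  height: number;
}

/** Start offsets along one axis so tiles of `size` cover `length`, sharing at least `overlap` px. */
function axisOffsets(length: number, size: number, overlap: number): number[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  if (length <= size) return [0]; // the image fits in a single tile on this axis
  const step = size - overlap;
  const offsets: number[] = [];
  for (let pos = 0; pos + size < length; pos += step) {
    offsets.push(pos);
  }
  // Pin the last tile to the far edge so no pixels are left uncovered.
  offsets.push(length - size);
  return offsets;
}

/** Compute the tile rectangles for an image, row by row. */
function computeTiles(
  imageWidth: number,
  imageHeight: number,
  tileSize = 1024, // assumed default
  overlap = 128, // assumed default
): Tile[] {
  const xs = axisOffsets(imageWidth, tileSize, overlap);
  const ys = axisOffsets(imageHeight, tileSize, overlap);
  const tiles: Tile[] = [];
  for (const top of ys) {
    for (const left of xs) {
      tiles.push({
        left,
        top,
        width: Math.min(tileSize, imageWidth - left),
        height: Math.min(tileSize, imageHeight - top),
      });
    }
  }
  return tiles;
}
```

Overlap between neighbouring tiles is one way to reduce the chance that a product straddling a tile boundary is missed; duplicates produced by the overlap would then need to be merged after extraction.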
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /scan/shelf | Scan a store shelf photo (multipart: image, store_name) |
| POST | /scan/pantry | Scan a pantry photo (multipart: image) |
| POST | /enrich/product | Extract detailed product info from a single product image |
| GET | /health | Health check (reports status of vision model, Ollama, ChromaDB) |
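As an illustration of the multipart fields listed for `/scan/shelf`, a client call might look like the sketch below. It assumes the service runs on localhost with the default port 8002 and that the response body is JSON; the response shape is not documented here, so it is typed as `unknown`. The helper names and the `shelf.jpg` filename are hypothetical.

```typescript
// Assumed base URL: localhost with the default PORT (8002).
const BASE_URL = "http://localhost:8002";

/** Build the multipart body with the two documented fields: image and store_name. */
function buildShelfScanForm(image: Blob, storeName: string, filename = "shelf.jpg"): FormData {
  const form = new FormData();
  form.append("image", image, filename);
  form.append("store_name", storeName);
  return form;
}

/** POST a shelf photo to the service. The JSON response shape is an assumption. */
async function scanShelf(image: Blob, storeName: string): Promise<unknown> {
  const response = await fetch(`${BASE_URL}/scan/shelf`, {
    method: "POST",
    body: buildShelfScanForm(image, storeName), // fetch sets the multipart boundary header
  });
  if (!response.ok) {
    throw new Error(`shelf scan failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

The global `fetch`, `FormData`, and `Blob` used here are available in Node.js 18 and later, so no extra HTTP client is needed for a quick test.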
Configuration
All configuration is via environment variables (.env file):
| Variable | Default | Description |
|---|---|---|
| PORT | 8002 | Service port |
| VISION_AI_URL | http://localhost:8000/v1/chat/completions | Vision LLM endpoint |
| VISION_AI_MODEL | qwen2.5vl-it:3b | Vision model to use |
| VISION_AI_TIMEOUT | 120000 | Timeout for vision LLM calls (ms) |
| OLLAMA_HOST | http://192.168.0.15:11434 | Ollama server for embeddings |
| OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model |
| CHROMA_HOST | http://192.168.0.15:8000 | ChromaDB server |
| GEMINI_API_KEY | (none) | Optional Gemini API key for fallback |
| GEMINI_MODEL | gemini-2.5-flash | Gemini model for fallback |
| MAX_CONCURRENT_TILES | 4 | Max parallel tile processing |
| UPLOAD_DIR | uploads | Temporary upload directory |
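To show how a numeric setting such as `MAX_CONCURRENT_TILES` could be read and applied, here is a minimal sketch. Both helpers (`intFromEnv`, `mapWithLimit`) are hypothetical names, not functions from `config.ts`, and the worker-pool approach is only one possible way to cap parallel tile requests.

```typescript
/** Parse a positive integer from an env-style string, falling back to a default. */
function intFromEnv(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

/** Run `worker` over `items` with at most `limit` calls in flight; results keep input order. */
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all runners

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  // Start up to `limit` runners; each pulls items until none remain.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Example wiring (assumes a Node.js environment):
//   const limit = intFromEnv(process.env.MAX_CONCURRENT_TILES, 4);
//   const perTile = await mapWithLimit(tiles, limit, (tile) => extractProducts(tile));
// where `tiles` and `extractProducts` are placeholders for the caller's own data and function.
```

A small cap is a sensible default when all tiles are served by a single local model, since sending more requests than the backend can process at once mostly adds queueing time.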
Usage
npm install # Install dependencies
npm run build # Compile TypeScript → dist/
npm start # Run the service
npm run dev # Development mode with hot-reload
# Windows service
node service-install.js
node service-uninstall.js
Project Structure
src/
server.ts — Express app, routes
config.ts — Configuration from environment
vision.ts — Vision LLM API calls
tiling.ts — Image tiling for high-res photos
shelf.ts — Shelf scanning logic
pantry.ts — Pantry scanning logic
enrich.ts — Product info enrichment
parsing.ts — LLM response parsing
embeddings.ts — Ollama embedding generation
chroma.ts — ChromaDB vector storage
matching.ts — Product matching via embeddings
gemini.ts — Gemini API fallback
External Dependencies
- FLM Proxy (localhost:8000): Vision LLM inference on AMD NPU
- Ollama (192.168.0.15:11434): Embedding generation with nomic-embed-text
- ChromaDB (192.168.0.15:8000): Vector database for product embeddings
- Gemini API (optional): Fallback for product enrichment
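The embedding-based matching that relies on Ollama and ChromaDB can be sketched roughly as follows. This is an illustration under stated assumptions: the in-memory `bestMatch` comparison stands in for the nearest-neighbour query that ChromaDB would perform, the 0.85 threshold is arbitrary, and `KnownProduct`, `embed`, and `bestMatch` are hypothetical names. The `embed` helper uses Ollama's `/api/embeddings` endpoint with the host and model defaults from the configuration table.

```typescript
/** Cosine similarity of two equal-length vectors; returns 0 if either is all zeros. */
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new RangeError("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] as number;
    const y = b[i] as number;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical record for a product that already has a stored embedding.
interface KnownProduct {
  id: string;
  embedding: number[];
}

/** Return the most similar known product, or null if none reaches the (assumed) threshold. */
function bestMatch(
  query: readonly number[],
  candidates: readonly KnownProduct[],
  threshold = 0.85,
): { id: string; score: number } | null {
  let best: { id: string; score: number } | null = null;
  for (const candidate of candidates) {
    const score = cosineSimilarity(query, candidate.embedding);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { id: candidate.id, score };
    }
  }
  return best;
}

/** Request an embedding from Ollama; host and model defaults come from the config table. */
async function embed(
  text: string,
  host = "http://192.168.0.15:11434",
  model = "nomic-embed-text",
): Promise<number[]> {
  const response = await fetch(`${host}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: text }),
  });
  if (!response.ok) throw new Error(`embedding request failed: HTTP ${response.status}`);
  const data = (await response.json()) as { embedding: number[] };
  return data.embedding;
}
```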
Environment
- OS: Windows 11, AMD NPU hardware
- Runtime: Node.js + TypeScript
- Vision LLM: qwen2.5vl-it:3b served by FLM proxy on localhost:8000
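Because the vision endpoint path is `/v1/chat/completions`, a request for one tile plausibly follows the OpenAI-style chat format with an inline image. The sketch below rests on that assumption: whether the FLM proxy accepts `image_url` data URIs is not confirmed by this document, and `buildVisionRequest`, `callVision`, and the JPEG MIME type are hypothetical choices. The URL, model, and timeout defaults are the ones from the configuration table.

```typescript
// Assumed OpenAI-style message parts for a text prompt plus one image.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

/** Build a chat-completions body for one base64-encoded tile (JPEG assumed). */
function buildVisionRequest(
  tileBase64: string,
  prompt: string,
  model = "qwen2.5vl-it:3b",
): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${tileBase64}` } },
        ],
      },
    ],
  };
}

/** Send the request and return the model's text reply, aborting after the timeout. */
async function callVision(
  body: VisionRequest,
  url = "http://localhost:8000/v1/chat/completions",
  timeoutMs = 120000,
): Promise<string> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!response.ok) throw new Error(`vision request failed: HTTP ${response.status}`);
  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0]?.message.content ?? "";
}
```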