Vision Scanner Service

A TypeScript/Express service that scans store-shelf and pantry photos and extracts product information and prices with a local vision LLM running on an AMD NPU. It generates embeddings with Ollama, stores them in ChromaDB, and tiles high-resolution photos before analysis.

How It Works

1. A photo of a store shelf or pantry is uploaded.
2. The image is split into smaller tiles so the model can resolve detail in high-resolution photos.
3. Each tile is sent to the vision LLM (`qwen2.5vl-it:3b` via the FLM proxy) for product extraction.
4. Extracted products are matched against existing entries using vector embeddings (ChromaDB + Ollama).
5. Optionally, products are enriched through the Gemini API as a fallback.
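
The README does not say how `tiling.ts` sizes its tiles or how tiles are scheduled. As a minimal sketch under assumptions, the code below splits an image into overlapping square tiles (a hypothetical 1024 px tile with at least 64 px of overlap, so a product cut by one tile boundary has a better chance of appearing whole in a neighbor) and processes them through a bounded worker pool of the kind a setting like `MAX_CONCURRENT_TILES` implies. `computeTiles`, `axisOffsets`, and `mapWithConcurrency` are illustrative names, not the service's API.

```typescript
// Hypothetical sketch: tile size, overlap, and names are assumptions, not tiling.ts itself.
interface Tile {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Evenly spaced start offsets along one axis: the first tile starts at 0, the
// last ends exactly at `length`, and neighbors overlap by at least `overlap` px.
function axisOffsets(length: number, tileSize: number, overlap: number): number[] {
  if (overlap < 0 || overlap >= tileSize) {
    throw new Error("overlap must be in [0, tileSize)");
  }
  if (length <= tileSize) return [0];
  // Smallest tile count whose combined coverage (with `overlap`) spans the axis.
  const count = Math.ceil((length - overlap) / (tileSize - overlap));
  const step = (length - tileSize) / (count - 1);
  return Array.from({ length: count }, (_, i) => Math.round(i * step));
}

// Row-major grid of tiles covering the whole image.
function computeTiles(width: number, height: number, tileSize = 1024, overlap = 64): Tile[] {
  const tiles: Tile[] = [];
  for (const top of axisOffsets(height, tileSize, overlap)) {
    for (const left of axisOffsets(width, tileSize, overlap)) {
      tiles.push({ left, top, width: Math.min(tileSize, width), height: Math.min(tileSize, height) });
    }
  }
  return tiles;
}

// Runs `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index; `next++` runs synchronously,
  // so two runners can never claim the same index.
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `computeTiles(2000, 1500)` yields a 3-by-2 grid of six tiles; passing them to `mapWithConcurrency(tiles, 4, ...)` keeps at most four vision requests in flight at once, which bounds load on the NPU-backed proxy.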

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/scan/shelf` | Scan a store-shelf photo (multipart fields: `image`, `store_name`) |
| `POST` | `/scan/pantry` | Scan a pantry photo (multipart field: `image`) |
| `POST` | `/enrich/product` | Extract detailed product info from a single product image |
| `GET` | `/health` | Health check (reports the status of the vision model, Ollama, and ChromaDB) |
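
As a hedged client-side sketch, the helper below builds the multipart request for `POST /scan/shelf` using the field names `image` and `store_name` from the endpoint table. It assumes Node.js 18 or later (global `fetch`, `FormData`, and `Blob`), the default port `8002`, and a JSON response body; `buildShelfScanRequest` and `scanShelf` are hypothetical names, and the response schema is not documented here.

```typescript
// Hypothetical client helper; assumes Node.js 18+ globals and the default PORT 8002.
interface ScanRequest {
  url: string;
  init: RequestInit;
}

// Builds (without sending) the multipart request for POST /scan/shelf.
function buildShelfScanRequest(
  image: Blob,
  filename: string,
  storeName: string,
  baseUrl = "http://localhost:8002",
): ScanRequest {
  const form = new FormData();
  form.append("image", image, filename); // multipart field `image`
  form.append("store_name", storeName); // multipart field `store_name`
  // No explicit Content-Type: fetch sets multipart/form-data with a boundary for FormData bodies.
  return { url: `${baseUrl}/scan/shelf`, init: { method: "POST", body: form } };
}

// Sends the request; assumes the service answers with JSON and fails on non-2xx status.
async function scanShelf(image: Blob, filename: string, storeName: string): Promise<unknown> {
  const { url, init } = buildShelfScanRequest(image, filename, storeName);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Scan failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```

Keeping request construction separate from sending makes the multipart layout easy to check without a running service.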

Configuration

All configuration is read from environment variables, typically set in a `.env` file:

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `8002` | Service port |
| `VISION_AI_URL` | `http://localhost:8000/v1/chat/completions` | Vision LLM endpoint |
| `VISION_AI_MODEL` | `qwen2.5vl-it:3b` | Vision model to use |
| `VISION_AI_TIMEOUT` | `120000` | Timeout for vision LLM calls (ms) |
| `OLLAMA_HOST` | `http://192.168.0.15:11434` | Ollama server for embeddings |
| `OLLAMA_EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `CHROMA_HOST` | `http://192.168.0.15:8000` | ChromaDB server |
| `GEMINI_API_KEY` | (unset) | Optional Gemini API key for the fallback |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model for the fallback |
| `MAX_CONCURRENT_TILES` | `4` | Maximum number of tiles processed in parallel |
| `UPLOAD_DIR` | `uploads` | Temporary upload directory |
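
A minimal sketch of how a module like `config.ts` might turn these variables into a typed object with the documented defaults. This is an assumption, not the module's actual code: it takes the environment as a parameter (for example `process.env`, after the `.env` file has been loaded by whatever mechanism the service uses), treats blank values as unset, and rejects non-numeric values for the numeric settings.

```typescript
// Hypothetical typed view of the documented environment variables.
interface ScannerConfig {
  port: number;
  visionAiUrl: string;
  visionAiModel: string;
  visionAiTimeoutMs: number;
  ollamaHost: string;
  ollamaEmbedModel: string;
  chromaHost: string;
  geminiApiKey: string | undefined; // assumed: no key means no Gemini fallback
  geminiModel: string;
  maxConcurrentTiles: number;
  uploadDir: string;
}

type Env = Record<string, string | undefined>;

// Returns the variable's value, or `fallback` when it is unset or blank.
function str(env: Env, name: string, fallback: string): string {
  const value = env[name]?.trim();
  return value ? value : fallback;
}

// Parses a positive integer, or returns `fallback` when the variable is unset or blank.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name]?.trim();
  if (!raw) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): ScannerConfig {
  return {
    port: positiveInt(env, "PORT", 8002),
    visionAiUrl: str(env, "VISION_AI_URL", "http://localhost:8000/v1/chat/completions"),
    visionAiModel: str(env, "VISION_AI_MODEL", "qwen2.5vl-it:3b"),
    visionAiTimeoutMs: positiveInt(env, "VISION_AI_TIMEOUT", 120000),
    ollamaHost: str(env, "OLLAMA_HOST", "http://192.168.0.15:11434"),
    ollamaEmbedModel: str(env, "OLLAMA_EMBED_MODEL", "nomic-embed-text"),
    chromaHost: str(env, "CHROMA_HOST", "http://192.168.0.15:8000"),
    geminiApiKey: env["GEMINI_API_KEY"]?.trim() || undefined,
    geminiModel: str(env, "GEMINI_MODEL", "gemini-2.5-flash"),
    maxConcurrentTiles: positiveInt(env, "MAX_CONCURRENT_TILES", 4),
    uploadDir: str(env, "UPLOAD_DIR", "uploads"),
  };
}
```

Validating at startup (e.g. `loadConfig(process.env)`) surfaces a mistyped `MAX_CONCURRENT_TILES` immediately instead of at the first scan.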

Usage

```shell
npm install        # Install dependencies
npm run build      # Compile TypeScript → dist/
npm start          # Run the service
npm run dev        # Development mode with hot-reload

# Windows service
node service-install.js     # Install as a Windows service
node service-uninstall.js   # Remove the Windows service
```

Project Structure

```text
src/
  server.ts      — Express app, routes
  config.ts      — Configuration from environment
  vision.ts      — Vision LLM API calls
  tiling.ts      — Image tiling for high-res photos
  shelf.ts       — Shelf scanning logic
  pantry.ts      — Pantry scanning logic
  enrich.ts      — Product info enrichment
  parsing.ts     — LLM response parsing
  embeddings.ts  — Ollama embedding generation
  chroma.ts      — ChromaDB vector storage
  matching.ts    — Product matching via embeddings
  gemini.ts      — Gemini API fallback
```
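
Small vision models can wrap their JSON answer in a Markdown code fence or surround it with prose, so an LLM response parser has to locate the payload before calling `JSON.parse`. The sketch below shows one tolerant approach; the `ExtractedProduct` shape (`name`, optional `price`) and the dot-decimal price cleanup are assumptions, since the README does not document what `parsing.ts` actually extracts.

```typescript
// Hypothetical product shape; the real fields parsed by parsing.ts are not documented.
interface ExtractedProduct {
  name: string;
  price?: number;
}

const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence in this snippet

// Pulls the JSON out of a model reply: prefers a fenced block, otherwise takes the
// span from the first "[" or "{" to the last matching closing bracket.
function extractJsonPayload(reply: string): unknown {
  const fenced = new RegExp(`${FENCE}(?:json)?\\s*([\\s\\S]*?)${FENCE}`, "i").exec(reply);
  const candidate = fenced ? fenced[1] : reply;
  const start = candidate.search(/[[{]/);
  if (start === -1) throw new Error("No JSON found in model reply");
  const closer = candidate[start] === "[" ? "]" : "}";
  const end = candidate.lastIndexOf(closer);
  if (end < start) throw new Error("Unterminated JSON in model reply");
  return JSON.parse(candidate.slice(start, end + 1));
}

// Accepts numbers or strings like "$3.49"; assumes "." as the decimal separator.
function toPrice(value: unknown): number | undefined {
  if (typeof value === "number") return Number.isFinite(value) ? value : undefined;
  if (typeof value === "string") {
    const parsed = Number.parseFloat(value.replace(/[^0-9.]/g, ""));
    return Number.isFinite(parsed) ? parsed : undefined;
  }
  return undefined;
}

// Keeps only entries with a non-empty string `name`; anything else is dropped.
function parseProducts(reply: string): ExtractedProduct[] {
  const payload = extractJsonPayload(reply);
  const items: unknown[] = Array.isArray(payload) ? payload : [payload];
  const products: ExtractedProduct[] = [];
  for (const item of items) {
    if (typeof item !== "object" || item === null) continue;
    const record = item as Record<string, unknown>;
    const name = record["name"];
    if (typeof name !== "string" || name.trim() === "") continue;
    const product: ExtractedProduct = { name: name.trim() };
    const price = toPrice(record["price"]);
    if (price !== undefined) product.price = price;
    products.push(product);
  }
  return products;
}
```

Dropping malformed entries rather than failing the whole tile keeps one bad item from discarding every other product the model found.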

External Dependencies

- FLM Proxy (`localhost:8000`): vision LLM inference on the AMD NPU
- Ollama (`192.168.0.15:11434`): embedding generation with `nomic-embed-text`
- ChromaDB (`192.168.0.15:8000`): vector database for product embeddings
- Gemini API (optional): fallback for product enrichment
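
Ollama and ChromaDB together implement the matching step: an extracted product (for example its name) is embedded with `nomic-embed-text`, and the resulting vector is compared with stored product embeddings. In the service, that nearest-neighbor search is delegated to ChromaDB; the in-memory sketch below only illustrates the idea, assuming cosine similarity and a hypothetical acceptance threshold of 0.85. Which distance metric the ChromaDB collection uses, and what threshold `matching.ts` applies, are not stated in this README.

```typescript
// Hypothetical catalog entry; in the service, the vectors live in ChromaDB.
interface CatalogEntry {
  id: string;
  name: string;
  embedding: number[];
}

interface Match {
  entry: CatalogEntry;
  score: number;
}

// Cosine similarity in [-1, 1]; returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Highest-scoring entry at or above `threshold` (hypothetical 0.85), or null if none qualifies.
function bestMatch(
  query: readonly number[],
  catalog: readonly CatalogEntry[],
  threshold = 0.85,
): Match | null {
  let best: Match | null = null;
  for (const entry of catalog) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { entry, score };
    }
  }
  return best;
}
```

Rejecting candidates below the threshold lets a genuinely new product be stored as a new entry instead of being merged into its closest, but unrelated, neighbor.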

Environment

- OS: Windows 11 on hardware with an AMD NPU
- Runtime: Node.js + TypeScript
- Vision LLM: `qwen2.5vl-it:3b`, served by the FLM proxy on `localhost:8000`
