Add README with project documentation
README.md
# Vision Scanner Service

A TypeScript/Express service that scans shelf and pantry photos and extracts product information and prices using a local vision LLM running on an AMD NPU. It uses ChromaDB for vector embedding storage and Ollama for embedding generation, and it tiles high-resolution photos before inference.

## How It Works

1. A photo of a store shelf or pantry is uploaded
2. The image is tiled into smaller sections for better accuracy on high-res photos
3. Each tile is sent to the vision LLM (qwen2.5vl-it:3b via FLM proxy) for product extraction
4. Extracted products are matched against existing entries using vector embeddings (ChromaDB + Ollama)
5. Results are optionally enriched via the Gemini API as a fallback
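
The tiling step (step 2) can be pictured with a small sketch. This is not the code in `tiling.ts`; the `Tile` shape, the function names, and the default tile size and overlap (1024 and 128 pixels) are assumptions chosen for illustration. It only computes the rectangles to crop, so that neighbouring tiles overlap and a product sitting on a tile border still appears whole in at least one tile.

```typescript
// Hypothetical tile rectangle; not taken from tiling.ts.
interface Tile {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Offsets along one axis so that windows of `size` cover `length`,
// with neighbouring windows overlapping by at least `overlap` pixels.
function axisOffsets(length: number, size: number, overlap: number): number[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("need size > 0 and 0 <= overlap < size");
  }
  if (length <= size) return [0]; // the image fits in a single window
  const step = size - overlap;
  const offsets: number[] = [];
  for (let pos = 0; pos + size < length; pos += step) {
    offsets.push(pos);
  }
  offsets.push(length - size); // last window sits flush with the far edge
  return offsets;
}

// Returns the crop rectangles for an image, row by row.
function computeTiles(
  imageWidth: number,
  imageHeight: number,
  tileSize = 1024, // assumed default
  overlap = 128, // assumed default
): Tile[] {
  const xs = axisOffsets(imageWidth, tileSize, overlap);
  const ys = axisOffsets(imageHeight, tileSize, overlap);
  const tiles: Tile[] = [];
  for (const y of ys) {
    for (const x of xs) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, imageWidth - x),
        height: Math.min(tileSize, imageHeight - y),
      });
    }
  }
  return tiles;
}
```

An image smaller than the tile size yields a single tile covering the whole image, so small photos pass through unchanged.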

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/scan/shelf` | Scan a store shelf photo (multipart: `image`, `store_name`) |
| `POST` | `/scan/pantry` | Scan a pantry photo (multipart: `image`) |
| `POST` | `/enrich/product` | Extract detailed product info from a single product image |
| `GET` | `/health` | Health check (reports status of vision model, Ollama, ChromaDB) |
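
As a rough client-side sketch of calling `/scan/shelf`: the field names `image` and `store_name` come from the table above, while the function names, the `shelf.jpg` file name, and the base URL (which assumes the default `PORT` of `8002` on localhost) are illustrative assumptions. The response format is not described here, so the sketch returns the raw body text. It relies on the global `fetch`, `FormData`, and `Blob` available in Node.js 18+.

```typescript
// Builds the multipart body for POST /scan/shelf.
// `image` and `store_name` are the documented field names;
// the file name "shelf.jpg" is a placeholder assumption.
function buildShelfScanForm(image: Blob, storeName: string): FormData {
  const form = new FormData();
  form.append("image", image, "shelf.jpg");
  form.append("store_name", storeName);
  return form;
}

// Sends the form to a running service and returns the raw response body.
// The base URL assumes the default PORT (8002) on the local machine.
async function scanShelf(
  image: Blob,
  storeName: string,
  baseUrl = "http://localhost:8002",
): Promise<string> {
  const response = await fetch(`${baseUrl}/scan/shelf`, {
    method: "POST",
    body: buildShelfScanForm(image, storeName),
  });
  if (!response.ok) {
    throw new Error(`scan failed with HTTP ${response.status}`);
  }
  return response.text(); // response shape is not documented here
}
```

For `/scan/pantry` the same approach should apply with the `store_name` field omitted.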

## Configuration

All configuration is via environment variables (`.env` file):

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `8002` | Service port |
| `VISION_AI_URL` | `http://localhost:8000/v1/chat/completions` | Vision LLM endpoint |
| `VISION_AI_MODEL` | `qwen2.5vl-it:3b` | Vision model to use |
| `VISION_AI_TIMEOUT` | `120000` | Timeout for vision LLM calls (ms) |
| `OLLAMA_HOST` | `http://192.168.0.15:11434` | Ollama server for embeddings |
| `OLLAMA_EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `CHROMA_HOST` | `http://192.168.0.15:8000` | ChromaDB server |
| `GEMINI_API_KEY` | — | Optional Gemini API key for fallback |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model for fallback |
| `MAX_CONCURRENT_TILES` | `4` | Max parallel tile processing |
| `UPLOAD_DIR` | `uploads` | Temporary upload directory |
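
One way a loader for these variables might look is sketched below. It is not the contents of `config.ts`; the `ScannerConfig` interface, the helper names, and the validation rules are assumptions, and only the variable names and defaults are taken from the table. The environment is passed in as a plain object so the function can be exercised without touching the real process environment.

```typescript
// Hypothetical config shape; field names are illustrative.
interface ScannerConfig {
  port: number;
  visionAiUrl: string;
  visionAiModel: string;
  visionAiTimeoutMs: number;
  ollamaHost: string;
  ollamaEmbedModel: string;
  chromaHost: string;
  geminiApiKey: string | undefined; // optional, no default
  geminiModel: string;
  maxConcurrentTiles: number;
  uploadDir: string;
}

type Env = Record<string, string | undefined>;

// Reads a positive integer, falling back to the documented default when unset.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Reads a string, falling back to the documented default when unset or empty.
function strFromEnv(env: Env, name: string, fallback: string): string {
  const raw = env[name];
  return raw === undefined || raw === "" ? fallback : raw;
}

function loadConfig(env: Env): ScannerConfig {
  return {
    port: intFromEnv(env, "PORT", 8002),
    visionAiUrl: strFromEnv(env, "VISION_AI_URL", "http://localhost:8000/v1/chat/completions"),
    visionAiModel: strFromEnv(env, "VISION_AI_MODEL", "qwen2.5vl-it:3b"),
    visionAiTimeoutMs: intFromEnv(env, "VISION_AI_TIMEOUT", 120000),
    ollamaHost: strFromEnv(env, "OLLAMA_HOST", "http://192.168.0.15:11434"),
    ollamaEmbedModel: strFromEnv(env, "OLLAMA_EMBED_MODEL", "nomic-embed-text"),
    chromaHost: strFromEnv(env, "CHROMA_HOST", "http://192.168.0.15:8000"),
    geminiApiKey: env["GEMINI_API_KEY"] || undefined,
    geminiModel: strFromEnv(env, "GEMINI_MODEL", "gemini-2.5-flash"),
    maxConcurrentTiles: intFromEnv(env, "MAX_CONCURRENT_TILES", 4),
    uploadDir: strFromEnv(env, "UPLOAD_DIR", "uploads"),
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, after the `.env` file has been loaded by whatever mechanism the project uses.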

## Usage

```bash
npm install      # Install dependencies
npm run build    # Compile TypeScript → dist/
npm start        # Run the service
npm run dev      # Development mode with hot-reload

# Windows service
node service-install.js
node service-uninstall.js
```

## Project Structure

```
src/
  server.ts      — Express app, routes
  config.ts      — Configuration from environment
  vision.ts      — Vision LLM API calls
  tiling.ts      — Image tiling for high-res photos
  shelf.ts       — Shelf scanning logic
  pantry.ts      — Pantry scanning logic
  enrich.ts      — Product info enrichment
  parsing.ts     — LLM response parsing
  embeddings.ts  — Ollama embedding generation
  chroma.ts      — ChromaDB vector storage
  matching.ts    — Product matching via embeddings
  gemini.ts      — Gemini API fallback
```
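
To illustrate the idea behind embedding-based matching (`matching.ts`), here is a minimal in-memory sketch using cosine similarity. In the real service the nearest-neighbour lookup is presumably delegated to ChromaDB; the `Candidate` type, the function names, and the `0.8` threshold below are hypothetical and chosen only to show the principle.

```typescript
// Hypothetical stored product with its embedding vector.
interface Candidate {
  id: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors, in [-1, 1].
// Returns 0 when either vector has zero length (no direction to compare).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the most similar candidate, or null if none reaches the threshold.
// The 0.8 default is an illustrative assumption, not a tuned value.
function bestMatch(
  query: number[],
  candidates: Candidate[],
  threshold = 0.8,
): Candidate | null {
  let best: Candidate | null = null;
  let bestScore = threshold;
  for (const candidate of candidates) {
    const score = cosineSimilarity(query, candidate.embedding);
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

A `null` result would correspond to treating the extracted product as a new entry rather than a match for an existing one.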

## External Dependencies

- **FLM Proxy** (localhost:8000) — Vision LLM inference on AMD NPU
- **Ollama** (192.168.0.15:11434) — Embedding generation with `nomic-embed-text`
- **ChromaDB** (192.168.0.15:8000) — Vector database for product embeddings
- **Gemini API** (optional) — Fallback for product enrichment
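
The default `VISION_AI_URL` ends in `/v1/chat/completions`, which suggests (but does not confirm) that the FLM proxy accepts OpenAI-style chat completion requests. Under that assumption, a request body for one tile might be built as follows. The type names, the function name, and the prompt text are hypothetical, and the actual payload used by `vision.ts` may differ.

```typescript
// OpenAI-style content parts; assumed to be accepted by the FLM proxy.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Builds a chat completion body carrying one JPEG tile as a base64 data URL.
// The prompt wording is a placeholder, not the service's real prompt.
function buildVisionRequest(
  tileJpegBase64: string,
  model = "qwen2.5vl-it:3b",
  prompt = "List every product visible in this image with its price.",
): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${tileJpegBase64}` },
          },
        ],
      },
    ],
  };
}
```

The resulting object would be serialized with `JSON.stringify` and sent as the body of a `POST` to `VISION_AI_URL`, with the request aborted after `VISION_AI_TIMEOUT` milliseconds.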

## Environment

- **OS:** Windows 11, AMD NPU hardware
- **Runtime:** Node.js + TypeScript
- **Vision LLM:** qwen2.5vl-it:3b served by FLM proxy on localhost:8000