Initial commit: Jersey detection test suite
Test scripts and utilities for evaluating vision-language models on jersey number detection using llama.cpp server.
docs/LLAMA_SWAP_SETUP.md · new file · 237 lines
# llama-swap Setup Guide for Jersey Detection Testing

This guide explains how to use [llama-swap](https://github.com/mostlygeek/llama-swap) to automatically switch between different vision language models when testing jersey detection.

## What is llama-swap?

llama-swap is a model-swapping proxy that sits between your application and llama.cpp servers. It automatically loads and unloads models based on the `model` parameter in API requests, allowing you to test multiple models without manually restarting servers.
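
Because swapping keys off the `model` field of ordinary API requests, any OpenAI-compatible client can trigger a load. A minimal `curl` sketch (assuming llama-swap is listening on `localhost:8080` as configured below; the text-only prompt is just a placeholder):

```bash
# The "model" field selects which entry from llama-swap-config.yaml gets loaded
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-vl-7b",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```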
## Installation

### Docker (Recommended)

```bash
# Pull the CUDA image (or cpu, vulkan, intel depending on your hardware)
docker pull ghcr.io/mostlygeek/llama-swap:cuda
```

### Homebrew (macOS/Linux)

```bash
brew tap mostlygeek/llama-swap
brew install llama-swap
```

### Pre-built Binaries

Download from the [releases page](https://github.com/mostlygeek/llama-swap/releases).

## Configuration

A configuration file `llama-swap-config.yaml` is provided with 8 pre-configured vision models:

### Small Models (1-4B parameters)
- `lfm2-vl-1.6b` - LiquidAI LFM2-VL 1.6B (F16)
- `gemma-3-4b` - Gemma 3 4B Instruct (F16)
- `kimi-vl-3b` - Kimi VL A3B Thinking (F16)

### Medium Models (7-12B parameters)
- `qwen2.5-vl-7b` - Qwen2.5-VL 7B Instruct (F16)
- `gemma-3-12b` - Gemma 3 12B Instruct (F16)

### Large Models (24-27B parameters)
- `mistral-small-24b-q8` - Mistral Small 3.2 24B (Q8_K_XL)
- `mistral-small-24b-q4` - Mistral Small 3.2 24B (Q4_K_XL)
- `gemma-3-27b` - Gemma 3 27B Instruct (Q8_0)

## Starting llama-swap

### Using Docker

```bash
docker run -it --rm --runtime nvidia -p 8080:8080 \
  -v $(pwd)/llama-swap-config.yaml:/app/config.yaml \
  -v /path/to/hf/cache:/root/.cache/huggingface \
  ghcr.io/mostlygeek/llama-swap:cuda
```

### Using Binary

```bash
llama-swap --config llama-swap-config.yaml --listen localhost:8080
```

## Testing with Jersey Detection Script

Once llama-swap is running, you can test different models by specifying the `--model-tag` parameter:

### Test a Single Model

```bash
# Test Qwen2.5-VL 7B with resizing
python test_jersey_detection.py ./images jersey_prompt.txt \
  --model-tag "qwen2.5-vl-7b" \
  --resize 1024
```

### Test Multiple Models Sequentially

```bash
# Test small models
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "lfm2-vl-1.6b" --resize 1024
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "gemma-3-4b" --resize 1024
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "kimi-vl-3b" --resize 1024

# Test medium models
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "qwen2.5-vl-7b" --resize 1024
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "gemma-3-12b" --resize 1024

# Test large models
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "mistral-small-24b-q4" --resize 1024
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "gemma-3-27b" --resize 1024
```
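
The same sweep can also be written as a loop over tags, using exactly the commands above:

```bash
# Sweep a list of model tags with identical settings
for tag in lfm2-vl-1.6b gemma-3-4b kimi-vl-3b qwen2.5-vl-7b gemma-3-12b \
           mistral-small-24b-q4 gemma-3-27b; do
  python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "$tag" --resize 1024
done
```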
### Automated Testing Scripts

Two bash scripts are provided for automated testing:

#### 1. Full Test Suite (`test_all_models.sh`)

Tests **all models** defined in `llama-swap-config.yaml`:

```bash
# Basic usage (uses defaults)
./test_all_models.sh ./test_images

# Customize configuration with environment variables
RESIZE=2048 ./test_all_models.sh ./test_images
OUTPUT_FILE=custom_results.jsonl ./test_all_models.sh ./test_images
PROMPT_FILE=custom_prompt.txt ./test_all_models.sh ./test_images

# Disable resizing
RESIZE= ./test_all_models.sh ./test_images
```

**Features:**
- Automatically extracts all model tags from the YAML config (see the sketch after this list)
- Color-coded output with progress tracking
- Prompts for confirmation before starting tests
- Shows a summary with success/failure counts
- Asks whether to continue if a model fails
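
For reference, the tag extraction can be approximated with standard tools. The following is an illustrative sketch, not necessarily how `test_all_models.sh` implements it; it assumes each model entry is a key indented two spaces under a top-level `models:` section:

```bash
# Illustrative only: list model tags from llama-swap-config.yaml,
# assuming entries are two-space-indented keys under `models:`
sed -n '/^models:/,/^[^ ]/p' llama-swap-config.yaml \
  | grep -E '^  [A-Za-z0-9._-]+:' \
  | sed -E 's/^  ([^:]+):.*/\1/'
```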
**Default Configuration:**
- Images: `./test_images`
- Prompt: `jersey_prompt_with_confidence.txt`
- Resize: `1024` (pixels)
- Output: `jersey_detection_results.jsonl`

#### 2. Quick Test (`test_quick.sh`)

Tests a **small subset** of models for rapid iteration:

```bash
# Test default selection (small, medium, large)
./test_quick.sh ./test_images

# Test custom models
MODELS="lfm2-vl-1.6b qwen2.5-vl-7b" ./test_quick.sh ./test_images

# Customize settings
RESIZE=512 MODELS="gemma-3-4b" ./test_quick.sh ./test_images
```

**Default Models:**
- `lfm2-vl-1.6b` (Small - 1.6B)
- `qwen2.5-vl-7b` (Medium - 7B)
- `mistral-small-24b-q4` (Large - 24B Q4)

**Use Cases:**
- Quick validation after prompt changes
- Testing configuration adjustments
- Rapid prototyping before a full test run

## Analyzing Results

After testing multiple models, use the analysis script to compare performance:

```bash
python analyze_jersey_results.py
```

This will show:
- A comparison table of all models tested
- Performance charts with hallucination rates
- Best performers by speed and accuracy
- Confidence distribution (if applicable)

## Model Swapping Behavior

llama-swap will:
1. **Automatically load** the requested model when you specify `--model-tag`
2. **Automatically unload** the previous model when a request names a different one
3. **Keep the current model loaded** when consecutive requests use the same tag

You can monitor loading and unloading in the web UI at `http://localhost:8080/ui`. The sketch after this list shows the swapping behavior in practice.
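
For example, using commands from earlier in this guide:

```bash
# Same tag twice: the second run reuses the already-loaded model
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "gemma-3-4b" --resize 1024
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "gemma-3-4b" --resize 1024

# Different tag: gemma-3-4b is unloaded and qwen2.5-vl-7b is loaded automatically
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "qwen2.5-vl-7b" --resize 1024
```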
## Optional: Model Auto-Unloading

To automatically unload models after 5 minutes of inactivity, uncomment this line in `llama-swap-config.yaml`:

```yaml
ttl: 300
```

## Optional: Preload Model on Startup

To preload a specific model when llama-swap starts, uncomment and modify this section:

```yaml
hooks:
  onStartup:
    - loadModel: qwen2.5-vl-7b
```

## Customizing Models

To add or modify models, edit `llama-swap-config.yaml`:

```yaml
models:
  my-custom-model:
    name: "My Custom Model Description"
    cmd: llama-server --no-mmap -ngl 999 -fa on --host 0.0.0.0 --port ${PORT} -hf user/model-name:quantization
```

Then test with:

```bash
python test_jersey_detection.py ./images jersey_prompt.txt --model-tag "my-custom-model"
```

## Troubleshooting

### Model not loading
- Check llama-swap logs at `http://localhost:8080/log` or via `curl http://localhost:8080/log/stream`
- Verify the model name in the config matches the `--model-tag` parameter (a quick check is sketched below)
- Ensure sufficient GPU memory for the model
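
Since llama-swap serves an OpenAI-compatible API, listing the configured model IDs via the standard models endpoint should work (assuming the default `localhost:8080` address used above):

```bash
# List the model IDs llama-swap knows about; --model-tag must match one exactly
curl -s http://localhost:8080/v1/models
```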
### Connection refused
- Verify llama-swap is running: `curl http://localhost:8080/health`
- Check that the server URL matches; the default is `http://192.168.1.126:8080` (from `scan.ini`)

### Slow model switching
- The first load downloads the model from Hugging Face, which can be slow
- Subsequent loads are faster (cached locally)
- Use quantized models (Q4, Q8) for faster loading and lower memory usage

## Web UI

llama-swap includes a web interface for monitoring:
- **Dashboard**: `http://localhost:8080/ui` - View loaded models and logs
- **Activity**: See recent API requests
- **Logs**: Real-time log monitoring

## References

- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
- [llama-swap Documentation](https://github.com/mostlygeek/llama-swap/tree/main/docs)
- [llama.cpp Documentation](https://github.com/ggerganov/llama.cpp)