Add hallucination detection, prompt files, and llama-swap sections to README

2026-01-20 13:42:39 -07:00
parent 8706edcd13
commit 825f3c19a9


@@ -85,6 +85,60 @@ The `jersey_detection_results.jsonl` file contains results from 6 test runs:
See `docs/JERSEY_DETECTION_MODEL_ANALYSIS.md` for detailed analysis.

## Hallucination Detection
Vision-language models can sometimes "hallucinate" by returning example jersey numbers from the prompt instead of actual detections from the image. To combat this, the detection code filters out known example numbers.
**Filtered numbers:** `101`, `102`, `103`, `142`, `199`
These numbers were deliberately chosen as examples in the prompt because real jersey numbers are typically 0-99. Any detection returning these numbers is flagged as a hallucination and excluded from results.
The hallucination filter is implemented in both:
- `scan_utils/jersey_detection.py` - Core detection class
- `test_jersey_detection.py` - Test runner
Test results track hallucination statistics including:
- Total hallucinated detections filtered
- Hallucination rate percentage
- Per-image hallucination counts
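As a minimal sketch of the idea (the function and data shapes here are illustrative, not the actual `scan_utils/jersey_detection.py` API), the filter reduces to a set-membership check:

```python
# Example jersey numbers deliberately used in the prompt; real jerseys are
# 0-99, so any of these in the model output is treated as a hallucination.
HALLUCINATION_NUMBERS = {101, 102, 103, 142, 199}

def filter_hallucinations(detections):
    """Split raw detections into (kept, hallucinated) lists.

    `detections` is assumed to be a list of dicts with a "number" key,
    e.g. [{"number": 23}, {"number": 101}].
    """
    kept = [d for d in detections if d["number"] not in HALLUCINATION_NUMBERS]
    hallucinated = [d for d in detections if d["number"] in HALLUCINATION_NUMBERS]
    return kept, hallucinated
```

The per-run statistics (hallucination rate, per-image counts) then follow directly from the lengths of the two returned lists.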

## Prompt Files
Two prompt templates are provided for jersey detection:

| File | Description |
|------|-------------|
| [jersey_prompt.txt](jersey_prompt.txt) | Basic prompt for jersey detection without confidence scores |
| [jersey_prompt_with_confidence.txt](jersey_prompt_with_confidence.txt) | Enhanced prompt with confidence scoring (0-100 scale) |
The confidence prompt includes scoring guidelines:
- **90-100**: Extremely clear and unambiguous
- **70-89**: Clear but minor occlusion/angle issues
- **50-69**: Partially visible or somewhat unclear
- **30-49**: Difficult to read but visible
- **0-29**: Very uncertain, barely visible
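The bands above can be mapped to labels when post-processing model output; this helper is illustrative only, not part of the repository:

```python
def confidence_band(score: int) -> str:
    """Map a 0-100 confidence score to the prompt's guideline bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"confidence must be 0-100, got {score}")
    if score >= 90:
        return "extremely clear"
    if score >= 70:
        return "clear, minor occlusion/angle issues"
    if score >= 50:
        return "partially visible"
    if score >= 30:
        return "difficult to read"
    return "very uncertain"
```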

## Llama-swap Configuration
This project supports [llama-swap](https://github.com/mostlygeek/llama-swap) for automatic model switching during batch testing.
**Configuration file:** [llama-swap-config.yaml](llama-swap-config.yaml)
The config includes 8 pre-configured vision-language models:

| Model Tag | Parameters | Quantization |
|-----------|------------|--------------|
| lfm2-vl-1.6b | 1.6B | F16 |
| gemma-3-4b | 4B | F16 |
| kimi-vl-3b | 3B | F16 |
| qwen2.5-vl-7b | 7B | F16 |
| gemma-3-12b | 12B | F16 |
| mistral-small-24b-q8 | 24B | Q8_K_XL |
| mistral-small-24b-q4 | 24B | Q4_K_XL |
| gemma-3-27b | 27B | Q8_0 |
See [docs/LLAMA_SWAP_SETUP.md](docs/LLAMA_SWAP_SETUP.md) for server setup instructions.
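For orientation, a llama-swap model entry generally follows the shape below. This is a hedged sketch: the model paths, flags, and `ttl` value are placeholders, not the repository's actual `llama-swap-config.yaml`.

```yaml
models:
  "qwen2.5-vl-7b":
    # llama-swap substitutes ${PORT} with the port it assigns the model
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen2.5-VL-7B-Instruct-F16.gguf
    ttl: 300  # seconds of idle time before the model is unloaded
```

Requests to llama-swap's OpenAI-compatible endpoint then select a model by passing its tag (e.g. `"model": "qwen2.5-vl-7b"`), and the proxy loads and unloads backends as needed during batch testing.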

## Key Findings
1. **Top Recommendation**: qwen2.5-vl-7b (72.9% F1 score)