Add hallucination detection, prompt files, and llama-swap sections to README
README.md
@@ -85,6 +85,60 @@ The `jersey_detection_results.jsonl` file contains results from 6 test runs:
See `docs/JERSEY_DETECTION_MODEL_ANALYSIS.md` for detailed analysis.

## Hallucination Detection
Vision-language models can sometimes "hallucinate" by returning example jersey numbers from the prompt instead of actual detections from the image. To combat this, the detection code filters out known example numbers.

**Filtered numbers:** `101`, `102`, `103`, `142`, `199`

These numbers were deliberately chosen as examples in the prompt because real jersey numbers are typically 0-99. Any detection returning these numbers is flagged as a hallucination and excluded from results.
The hallucination filter is implemented in both:

- `scan_utils/jersey_detection.py` - Core detection class
- `test_jersey_detection.py` - Test runner
Test results track hallucination statistics including:

- Total hallucinated detections filtered
- Hallucination rate percentage
- Per-image hallucination counts
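The filtering step can be sketched in a few lines. This is an illustration, not the actual API of `scan_utils/jersey_detection.py` — the function name and detection format are assumptions; only the filtered numbers come from the README.

```python
# Example numbers baked into the prompt; real jerseys are typically 0-99.
HALLUCINATION_NUMBERS = {"101", "102", "103", "142", "199"}

def filter_hallucinations(detections):
    """Split raw detections into kept results and filtered hallucinations.

    `detections` is a list of dicts with a "number" key,
    e.g. [{"number": "23"}, {"number": "142"}].
    """
    kept, hallucinated = [], []
    for det in detections:
        if str(det.get("number")) in HALLUCINATION_NUMBERS:
            hallucinated.append(det)  # known prompt example -> flag it
        else:
            kept.append(det)
    return kept, hallucinated

# One real detection, one hallucinated example number:
kept, bad = filter_hallucinations([{"number": "23"}, {"number": "142"}])
rate = 100.0 * len(bad) / (len(kept) + len(bad))  # hallucination rate: 50.0
```

The same counters (filtered total, rate, per-image counts) fall out of the `kept`/`hallucinated` split when accumulated across a test run.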
## Prompt Files
Two prompt templates are provided for jersey detection:

| File | Description |
|------|-------------|
| [jersey_prompt.txt](jersey_prompt.txt) | Basic prompt for jersey detection without confidence scores |
| [jersey_prompt_with_confidence.txt](jersey_prompt_with_confidence.txt) | Enhanced prompt with confidence scoring (0-100 scale) |
The confidence prompt includes scoring guidelines:

- **90-100**: Extremely clear and unambiguous
- **70-89**: Clear but minor occlusion/angle issues
- **50-69**: Partially visible or somewhat unclear
- **30-49**: Difficult to read but visible
- **0-29**: Very uncertain, barely visible
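The bands above map cleanly to threshold checks. The helper below is a hypothetical illustration of consuming the 0-100 scores, not code from this repository:

```python
def confidence_band(score: int) -> str:
    """Return the scoring-guideline band for a 0-100 confidence score."""
    if not 0 <= score <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if score >= 90:
        return "extremely clear"
    if score >= 70:
        return "clear, minor occlusion/angle"
    if score >= 50:
        return "partially visible"
    if score >= 30:
        return "difficult to read"
    return "very uncertain"

print(confidence_band(85))  # clear, minor occlusion/angle
```

A consumer might, for example, discard detections below 50 and manually review the 30-49 band.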
## Llama-swap Configuration
This project supports [llama-swap](https://github.com/mostlygeek/llama-swap) for automatic model switching during batch testing.
**Configuration file:** [llama-swap-config.yaml](llama-swap-config.yaml)
The config includes 8 pre-configured vision-language models:

| Model Tag | Parameters | Quantization |
|-----------|------------|--------------|
| lfm2-vl-1.6b | 1.6B | F16 |
| gemma-3-4b | 4B | F16 |
| kimi-vl-3b | 3B | F16 |
| qwen2.5-vl-7b | 7B | F16 |
| gemma-3-12b | 12B | F16 |
| mistral-small-24b-q8 | 24B | Q8_K_XL |
| mistral-small-24b-q4 | 24B | Q4_K_XL |
| gemma-3-27b | 27B | Q8_0 |
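For orientation, a single entry in a llama-swap config generally follows the `models`/`cmd` shape below. This is a rough sketch from llama-swap's general schema, not the project's actual `llama-swap-config.yaml` — the model path, flags, and `ttl` value are placeholders:

```yaml
models:
  "qwen2.5-vl-7b":
    cmd: >
      llama-server --port ${PORT}
      -m /path/to/qwen2.5-vl-7b-f16.gguf
    ttl: 300  # optionally unload the model after 5 minutes idle
```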
See [docs/LLAMA_SWAP_SETUP.md](docs/LLAMA_SWAP_SETUP.md) for server setup instructions.
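With llama-swap in front, a batch run switches models simply by naming a tag in an OpenAI-compatible request; the proxy loads the matching model on demand. A minimal sketch (the endpoint URL is an assumption — use your server's host/port):

```python
def build_chat_request(model_tag: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; llama-swap swaps to `model_tag`."""
    return {
        "model": model_tag,  # e.g. any tag from the table above
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("qwen2.5-vl-7b", "List all visible jersey numbers.")
# POST this as JSON to e.g. http://localhost:8080/v1/chat/completions
```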
## Key Findings

1. **Top Recommendation**: qwen2.5-vl-7b (72.9% F1 score)