Add Burnley averaged embeddings test results to README

DINOv2 with margin-based matching on barnfield/vertu logos:
43.8% precision, 19.2% recall, 26.7% F1.
Rick McEwen
2026-03-31 11:59:02 -06:00
parent 5ce6265a90
commit 8b67b50d19


@@ -2,6 +2,33 @@
A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Burnley Test: Averaged Embeddings with DINOv2

A targeted test using `DetectLogosEmbeddings` to detect two specific logos (barnfield and vertu) in 516 Burnley match images. Reference embeddings are averaged across all images in each reference directory, and matching uses margin-based comparison (margin=0.05).

**Test command:**

```bash
uv run python test_burnley_detection.py -e dinov2 -t 0.7 --margin 0.05 --output-file results_average_embeddings.txt
```
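
The averaged-embedding, margin-based matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `test_burnley_detection.py`: it assumes each reference directory's embeddings are L2-normalized, averaged, and renormalized; that similarity is cosine similarity; and that "margin-based comparison" means the best-scoring logo is accepted only if it reaches the threshold (0.70) and beats the runner-up by at least the margin (0.05). Under that reading, the margin rejects crops that look roughly equally like both logos, trading recall for precision. The helper names (`average_reference`, `match_logo`) and the toy 3-D vectors standing in for DINOv2 embeddings are hypothetical.

```python
from __future__ import annotations

import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def average_reference(embeddings: np.ndarray) -> np.ndarray:
    """Collapse an (N, D) stack of reference embeddings into one unit vector (assumed scheme)."""
    return l2_normalize(l2_normalize(embeddings).mean(axis=0))


def match_logo(
    query: np.ndarray,
    references: dict[str, np.ndarray],
    threshold: float = 0.70,
    margin: float = 0.05,
) -> str | None:
    """Return the best-matching logo label, or None if the match is rejected.

    Assumed semantics: accept only if the best cosine similarity is >= threshold
    AND exceeds the second-best similarity by at least `margin`.
    """
    q = l2_normalize(query)
    # Cosine similarity against each averaged (unit-length) reference, best first.
    scores = sorted(
        ((float(q @ ref), label) for label, ref in references.items()),
        reverse=True,
    )
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else -1.0
    if best_score >= threshold and best_score - runner_up >= margin:
        return best_label
    return None


# Toy 3-D "embeddings" standing in for DINOv2 vectors (hypothetical data).
refs = {
    "barnfield": average_reference(np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])),
    "vertu": average_reference(np.array([[0.0, 1.0, 0.0]])),
}
print(match_logo(np.array([1.0, 0.05, 0.0]), refs))  # clear match: barnfield
print(match_logo(np.array([1.0, 1.0, 0.0]), refs))   # passes 0.70, but leads vertu by < 0.05: None
print(match_logo(np.array([0.0, 0.0, 1.0]), refs))   # below threshold: None
```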
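
**Results (DINOv2, threshold 0.70, margin 0.05):**

| Metric | Value |
|--------|-------|
| True Positives | 28 |
| False Positives | 36 |
| False Negatives | 125 |
| Total Expected | 146 |
| **Precision** | **43.8%** |
| **Recall** | **19.2%** |
| **F1 Score** | **26.7%** |

Precision is TP / (TP + FP) = 28/64, and recall is TP / Total Expected = 28/146. Note that TP + FN sums to 153 rather than 146, so recall computed as TP / (TP + FN) would be 18.3%.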
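
The percentages in the table can be reproduced from the raw counts with the standard precision, recall, and F1 definitions. This is a minimal sketch; `detection_metrics` is a hypothetical helper, and it uses Total Expected as the recall denominator because that is what reproduces the reported 19.2%.

```python
def detection_metrics(tp: int, fp: int, total_expected: int) -> dict:
    """Precision, recall, and F1 from raw counts (recall uses total_expected)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / total_expected if total_expected else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


m = detection_metrics(tp=28, fp=36, total_expected=146)
# The table reports 43.8% precision, 19.2% recall, 26.7% F1.
print({name: f"{value:.1%}" for name, value in m.items()})
```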

Ground truth is derived from filename prefixes: `vertu_` (vertu logo), `barnfield_` (barnfield logo), `barnfield+vertu_` (both logos). Images without these prefixes are treated as negatives.
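
The filename-prefix labeling can be sketched as below. `expected_logos` and `count_expected` are hypothetical helpers, and counting one expected instance per logo per image is an assumption: this section does not say exactly how the Total Expected figure of 146 is tallied.

```python
from __future__ import annotations

from pathlib import Path

# Prefix -> logos expected in the image. The prefixes are mutually exclusive:
# "barnfield+vertu_" does not start with "barnfield_".
PREFIX_LABELS = (
    ("barnfield+vertu_", frozenset({"barnfield", "vertu"})),
    ("barnfield_", frozenset({"barnfield"})),
    ("vertu_", frozenset({"vertu"})),
)


def expected_logos(image_path: str) -> frozenset[str]:
    """Return the logos expected in an image, based only on its filename prefix."""
    name = Path(image_path).name
    for prefix, labels in PREFIX_LABELS:
        if name.startswith(prefix):
            return labels
    return frozenset()  # no recognized prefix: negative image


def count_expected(paths: list[str]) -> int:
    """Assumed tally: one expected instance per logo per image."""
    return sum(len(expected_logos(p)) for p in paths)


print(sorted(expected_logos("images/barnfield+vertu_007.jpg")))
```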

Low recall suggests that many logos either go undetected by DETR or are rejected by the 0.70 similarity threshold or the 0.05 margin. Precision is also low: 36 of the 64 matches are false positives, so the averaged DINOv2 embeddings appear to produce many spurious matches, whether on negative images or by confusing the two logos. Further tuning of the threshold, margin, and embedding model (e.g., CLIP or SigLIP) may improve results.

---

## Recommended Settings

Based on extensive testing with the LogoDet-3K dataset, these are the optimal settings: