Add Burnley averaged embeddings test results to README

DINOv2 with margin-based matching on barnfield/vertu logos: 43.8% precision, 19.2% recall, 26.7% F1.
2026-03-31 11:59:02 -06:00
parent 5ce6265a90
commit 8b67b50d19
1 changed files with 27 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -2,6 +2,33 @@

 A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

+## Burnley Test: Averaged Embeddings with DINOv2
+
+This targeted test uses `DetectLogosEmbeddings` to detect two specific logos (barnfield and vertu) in 516 Burnley match images. Reference embeddings are averaged across all images in each reference directory, and matching uses a margin-based comparison (margin=0.05).
+
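The averaging and margin comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in `DetectLogosEmbeddings`: the function names are hypothetical, and it assumes that "margin-based" means the best match must reach the similarity threshold and beat the runner-up by at least the margin.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embedding(embeddings):
    """Mean of the L2-normalized embeddings, re-normalized to unit length."""
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def match_with_margin(query, references, threshold=0.7, margin=0.05):
    """Return the best label, or None if the match is weak or ambiguous.

    Assumption: a match is accepted only when the best cosine similarity
    is >= threshold AND exceeds the second-best similarity by >= margin.
    """
    q = l2_normalize(query)
    scores = sorted(
        ((sum(a * b for a, b in zip(q, ref)), label) for label, ref in references.items()),
        reverse=True,
    )
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else float("-inf")
    if best_score >= threshold and best_score - runner_up >= margin:
        return best_label
    return None


# Toy 2-D vectors stand in for real DINOv2 embeddings.
references = {
    "barnfield": average_embedding([[1.0, 0.0], [0.9, 0.1]]),
    "vertu": average_embedding([[0.0, 1.0], [0.1, 0.9]]),
}
```

Under this assumption, a detection that is equally similar to both references is rejected even when both similarities clear the threshold, which trades recall for fewer cross-logo confusions.
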
+**Test command:**
+```bash
+uv run python test_burnley_detection.py -e dinov2 -t 0.7 --margin 0.05 --output-file results_average_embeddings.txt
+```
+
+**Results (DINOv2, threshold 0.70, margin 0.05):**
+
+| Metric | Value |
+|--------|-------|
+| True Positives | 28 |
+| False Positives | 36 |
+| False Negatives | 125 |
+| Total Expected | 146 |
+| **Precision** | **43.8%** |
+| **Recall** | **19.2%** |
+| **F1 Score** | **26.7%** |
+
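The percentages in the table can be reproduced from the raw counts. The sketch below is inferred from the table's arithmetic rather than taken from `test_burnley_detection.py`: it assumes precision is TP / (TP + FP) and recall is TP / Total Expected, and the function name is hypothetical.

```python
def detection_metrics(true_positives, false_positives, total_expected):
    """Return (precision, recall, f1) as fractions in [0, 1].

    Assumption: recall is measured against the total number of expected
    logos, not against TP + FN.
    """
    detected = true_positives + false_positives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / total_expected if total_expected else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = detection_metrics(
    true_positives=28, false_positives=36, total_expected=146
)
print(f"precision={precision:.1%} recall={recall:.1%} f1={f1:.1%}")
```

The False Negatives count (125) is not used here. TP + FN gives 153 rather than 146, so recall computed as TP / (TP + FN) would be about 18.3% instead of the reported 19.2%.
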
+Ground truth is derived from filename prefixes: `vertu_` (vertu logo), `barnfield_` (barnfield logo), `barnfield+vertu_` (both logos). Images without these prefixes are treated as negatives.
+
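The prefix rule above could be expressed as a small helper. This is an illustrative sketch only: `expected_logos` and the example filenames are hypothetical and are not taken from the test script.

```python
from pathlib import Path


def expected_logos(filename):
    """Map an image filename to the set of logos it is expected to contain."""
    name = Path(filename).name
    # Check the combined prefix first, since it also starts with "barnfield".
    if name.startswith("barnfield+vertu_"):
        return {"barnfield", "vertu"}
    if name.startswith("barnfield_"):
        return {"barnfield"}
    if name.startswith("vertu_"):
        return {"vertu"}
    # Any other filename is treated as a negative (no logo expected).
    return set()
```

An image whose name starts with `barnfield+vertu_` contributes two expected logos, so the total number of expected logos can exceed the number of positive images.
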
+Recall of 19.2% suggests that many logos are either missed by DETR or score below the 0.70 similarity threshold. Precision of 43.8% indicates that DINOv2 averaged embeddings struggle to discriminate between the two logos in this domain. Further tuning of the threshold, the margin, and the embedding model (e.g. CLIP or SigLIP) may improve results.
+
+---
+
 ## Recommended Settings

 Based on extensive testing with the LogoDet-3K dataset, these are the optimal settings: