Document margin behavior and update model comparison script

- Add section explaining how margin works differently in multi-ref vs margin-only matching, with examples showing why margin-only fails when using multiple references per logo
- Update run_model_comparison.sh to use optimal threshold (0.70) and margin (0.05) based on test results
- Add DINOv2 Large model test to comparison script
- Add threshold optimization test analysis to results document
2026-01-02 14:42:53 -05:00
parent 48d9145810
commit 2c41549ae0
3 changed files with 179 additions and 3 deletions
--- a/logo_detection_test_methodology.md
+++ b/logo_detection_test_methodology.md
@@ -224,6 +224,51 @@ This ensures confident matches and reduces ambiguous classifications.
 - Margin required: 0.05
 - Result: **No match** (0.82 - 0.79 = 0.03 < 0.05)

+#### Margin in Multi-Ref vs Margin-Only Matching
+
+The margin parameter applies to both `margin` and `multi-ref` methods, but operates at different levels:
+
+| Method | What Margin Compares |
+|--------|---------------------|
+| `margin` | Best **reference embedding** vs second-best **reference embedding** |
+| `multi-ref` | Best **logo's aggregated score** vs second-best **logo's aggregated score** |
+
+This distinction is critical when using multiple references per logo.
+
+#### The Problem with Margin-Only and Multiple References
+
+In margin-only matching, all individual reference embeddings compete against each other—including references from the **same logo**. This causes legitimate matches to be rejected.
+
+**Example showing the problem:**
+
+Suppose Nike has 3 references and Adidas has 3 references. A detected region produces:
+
+| Reference | Similarity |
+|-----------|------------|
+| Nike_ref1 | 0.92 |
+| Nike_ref2 | 0.91 |
+| Nike_ref3 | 0.85 |
+| Adidas_ref1 | 0.78 |
+| Adidas_ref2 | 0.75 |
+| Adidas_ref3 | 0.72 |
+
+**With margin-only matching (margin=0.05):**
+- Best reference: Nike_ref1 (0.92)
+- Second-best reference: Nike_ref2 (0.91) ← Same logo!
+- Margin check: 0.92 - 0.91 = 0.01 < 0.05 → **Rejected**
+
+The match is rejected even though this is clearly a Nike logo! Nike's own references compete against each other and fail the margin test.
+
+**With multi-ref matching (margin=0.05):**
+- First, aggregate scores per logo:
+  - Nike: max(0.92, 0.91, 0.85) = 0.92
+  - Adidas: max(0.78, 0.75, 0.72) = 0.78
+- Best logo: Nike (0.92)
+- Second-best logo: Adidas (0.78)
+- Margin check: 0.92 - 0.78 = 0.14 >= 0.05 → **Accepted**
+
+This is why margin-only matching produces very low recall when using multiple references per logo—it was designed for single-reference scenarios.
+
 ---

 ### 6. Embedding Caching