Document margin behavior and update model comparison script

- Add section explaining how margin works differently in multi-ref vs
  margin-only matching, with examples showing why margin-only fails
  when using multiple references per logo
- Update run_model_comparison.sh to use optimal threshold (0.70) and
  margin (0.05) based on test results
- Add DINOv2 Large model test to comparison script
- Add threshold optimization test analysis to results document
Rick McEwen
2026-01-02 14:42:53 -05:00
parent 48d9145810
commit 2c41549ae0
3 changed files with 179 additions and 3 deletions

@ -224,6 +224,51 @@ This ensures confident matches and reduces ambiguous classifications.
- Margin required: 0.05
- Result: **No match** (0.82 - 0.79 = 0.03 < 0.05)

#### Margin in Multi-Ref vs Margin-Only Matching

The margin parameter applies to both `margin` and `multi-ref` methods, but operates at different levels:

| Method | What Margin Compares |
|--------|---------------------|
| `margin` | Best **reference embedding** vs second-best **reference embedding** |
| `multi-ref` | Best **logo's aggregated score** vs second-best **logo's aggregated score** |

This distinction is critical when using multiple references per logo.

#### The Problem with Margin-Only and Multiple References

In margin-only matching, all individual reference embeddings compete against each other, including references from the **same logo**. This causes legitimate matches to be rejected.

**Example showing the problem:**

Suppose Nike has 3 references and Adidas has 3 references. A detected region produces:

| Reference | Similarity |
|-----------|------------|
| Nike_ref1 | 0.92 |
| Nike_ref2 | 0.91 |
| Nike_ref3 | 0.85 |
| Adidas_ref1 | 0.78 |
| Adidas_ref2 | 0.75 |
| Adidas_ref3 | 0.72 |

**With margin-only matching (margin=0.05):**

- Best reference: Nike_ref1 (0.92)
- Second-best reference: Nike_ref2 (0.91) ← Same logo!
- Margin check: 0.92 - 0.91 = 0.01 < 0.05 → **Rejected**

The match is rejected even though this is clearly a Nike logo! Nike's own references compete against each other and fail the margin test.
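
The rejection can be reproduced with a minimal sketch. The function name `margin_only_match` and the flat `scores` dictionary are hypothetical, chosen only for illustration; the project's actual matcher may be structured differently.

```python
# Hypothetical similarity scores, copied from the example table.
scores = {
    "Nike_ref1": 0.92,
    "Nike_ref2": 0.91,
    "Nike_ref3": 0.85,
    "Adidas_ref1": 0.78,
    "Adidas_ref2": 0.75,
    "Adidas_ref3": 0.72,
}


def margin_only_match(scores, margin=0.05):
    """Illustrative margin-only check over individual references.

    Returns the best reference name if it beats the runner-up reference
    by at least `margin`, otherwise None. Logo identity is ignored, so
    references of the same logo compete with each other.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    # With a single reference there is no runner-up to compare against.
    second_score = ranked[1][1] if len(ranked) > 1 else float("-inf")
    return best_name if best_score - second_score >= margin else None


result = margin_only_match(scores)  # None: Nike_ref1 vs Nike_ref2 fails the margin
```

With only one reference per logo (for example `{"Nike_ref1": 0.92, "Adidas_ref1": 0.78}`), the same function accepts the match, which is consistent with margin-only matching being suited to single-reference setups.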

**With multi-ref matching (margin=0.05):**

- First, aggregate scores per logo:
  - Nike: max(0.92, 0.91, 0.85) = 0.92
  - Adidas: max(0.78, 0.75, 0.72) = 0.78
- Best logo: Nike (0.92)
- Second-best logo: Adidas (0.78)
- Margin check: 0.92 - 0.78 = 0.14 >= 0.05 → **Accepted**

This is why margin-only matching produces very low recall when using multiple references per logo: it was designed for single-reference scenarios.
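
The logo-level margin check can be sketched as follows. This is an illustration under stated assumptions: the name `multi_ref_match`, the `scores_by_logo` layout, and the use of `max` as the aggregation (taken from the worked example) are not confirmed to match the project's implementation.

```python
# Hypothetical per-logo similarity lists, copied from the example table.
scores_by_logo = {
    "Nike": [0.92, 0.91, 0.85],
    "Adidas": [0.78, 0.75, 0.72],
}


def multi_ref_match(scores_by_logo, margin=0.05):
    """Illustrative multi-ref check: aggregate per logo, then apply margin.

    Assumes max aggregation, as in the worked example. Returns the best
    logo name if its aggregated score beats the second-best logo's
    aggregated score by at least `margin`, otherwise None.
    """
    # Collapse each logo's references into a single score first, so that
    # references of the same logo no longer compete with each other.
    aggregated = {logo: max(sims) for logo, sims in scores_by_logo.items()}
    ranked = sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
    best_logo, best_score = ranked[0]
    # With a single logo there is no competing logo to compare against.
    second_score = ranked[1][1] if len(ranked) > 1 else float("-inf")
    return best_logo if best_score - second_score >= margin else None


match = multi_ref_match(scores_by_logo)  # "Nike": 0.92 vs 0.78 clears the margin
```

The margin still rejects genuinely ambiguous cases: two different logos scoring 0.82 and 0.79 differ by 0.03, which is below 0.05, so no match is returned.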
---
### 6. Embedding Caching