Document margin behavior and update model comparison script
- Add section explaining how margin works differently in multi-ref vs margin-only matching, with examples showing why margin-only fails when using multiple references per logo
- Update run_model_comparison.sh to use optimal threshold (0.70) and margin (0.05) based on test results
- Add DINOv2 Large model test to comparison script
- Add threshold optimization test analysis to results document
@@ -224,6 +224,51 @@ This ensures confident matches and reduces ambiguous classifications.
- Margin required: 0.05
- Result: **No match** (0.82 - 0.79 = 0.03 < 0.05)

#### Margin in Multi-Ref vs Margin-Only Matching

The margin parameter applies to both `margin` and `multi-ref` methods, but operates at different levels:

| Method | What Margin Compares |
|--------|---------------------|
| `margin` | Best **reference embedding** vs second-best **reference embedding** |
| `multi-ref` | Best **logo's aggregated score** vs second-best **logo's aggregated score** |

This distinction is critical when using multiple references per logo.

#### The Problem with Margin-Only and Multiple References

In margin-only matching, every individual reference embedding competes against every other one, including references from the **same logo**. When a logo's references score close to one another, a correct match fails the margin test and is rejected.

**Example showing the problem:**

Suppose Nike and Adidas each have 3 references, and a detected region produces the following similarities:

| Reference | Similarity |
|-----------|------------|
| Nike_ref1 | 0.92 |
| Nike_ref2 | 0.91 |
| Nike_ref3 | 0.85 |
| Adidas_ref1 | 0.78 |
| Adidas_ref2 | 0.75 |
| Adidas_ref3 | 0.72 |

**With margin-only matching (margin=0.05):**
- Best reference: Nike_ref1 (0.92)
- Second-best reference: Nike_ref2 (0.91) ← Same logo!
- Margin check: 0.92 - 0.91 = 0.01 < 0.05 → **Rejected**

The match is rejected even though every Nike reference (lowest: 0.85) scores above every Adidas reference (highest: 0.78). Nike's own references compete against each other and fail the margin test.
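
The rejection can be reproduced with a minimal Python sketch of the margin-only check. It assumes each reference's similarity to the detected region has already been computed and is passed in as a `(logo_name, similarity)` pair; `margin_only_match` and that input shape are hypothetical, not the project's API, and the sketch applies only the margin test, not any similarity threshold.

```python
# Hypothetical sketch of margin-only matching (not the project's actual code).
# Input: a flat list of (logo_name, similarity) pairs, one per reference embedding.


def margin_only_match(scored_refs, margin):
    """Return the best logo name, or None if the top two references are too close.

    Every reference competes individually, so two references of the same logo
    can occupy the best and second-best positions.
    """
    if not scored_refs:
        return None
    ranked = sorted(scored_refs, key=lambda ref: ref[1], reverse=True)
    best_logo, best_score = ranked[0]
    if len(ranked) == 1:
        # Assumption: a lone reference has no competitor, so the margin test passes.
        return best_logo
    _, second_score = ranked[1]  # may belong to the same logo as the best reference
    return best_logo if best_score - second_score >= margin else None


# Similarities from the example table above.
refs = [
    ("Nike", 0.92), ("Nike", 0.91), ("Nike", 0.85),
    ("Adidas", 0.78), ("Adidas", 0.75), ("Adidas", 0.72),
]
print(margin_only_match(refs, margin=0.05))  # None, because 0.92 - 0.91 < 0.05
```

Because the references are ranked as one flat list, Nike_ref2 becomes the runner-up; nothing in the comparison knows that it belongs to the same logo as Nike_ref1.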

**With multi-ref matching (margin=0.05):**
- First, aggregate scores per logo:
  - Nike: max(0.92, 0.91, 0.85) = 0.92
  - Adidas: max(0.78, 0.75, 0.72) = 0.78
- Best logo: Nike (0.92)
- Second-best logo: Adidas (0.78)
- Margin check: 0.92 - 0.78 = 0.14 >= 0.05 → **Accepted**
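
For contrast, here is a minimal Python sketch of multi-ref matching with the max aggregation used in this example. It assumes the input is a list of `(logo_name, similarity)` pairs, one per reference; `aggregate_max` and `multi_ref_match` are illustrative names rather than the project's API, and only the margin test is shown.

```python
# Hypothetical sketch of multi-ref matching with max aggregation (not the project's actual code).
# Input: a flat list of (logo_name, similarity) pairs, one per reference embedding.


def aggregate_max(scored_refs):
    """Collapse each logo's references to its highest similarity."""
    per_logo = {}
    for logo, score in scored_refs:
        per_logo[logo] = max(score, per_logo.get(logo, score))
    return per_logo


def multi_ref_match(scored_refs, margin):
    """Return the best logo, or None if the top two *logos* are too close."""
    per_logo = aggregate_max(scored_refs)
    if not per_logo:
        return None
    ranked = sorted(per_logo.items(), key=lambda item: item[1], reverse=True)
    best_logo, best_score = ranked[0]
    if len(ranked) == 1:
        # Assumption: with a single candidate logo there is no competitor.
        return best_logo
    _, second_score = ranked[1]  # always a different logo than best_logo
    return best_logo if best_score - second_score >= margin else None


# Similarities from the example table above.
refs = [
    ("Nike", 0.92), ("Nike", 0.91), ("Nike", 0.85),
    ("Adidas", 0.78), ("Adidas", 0.75), ("Adidas", 0.72),
]
print(aggregate_max(refs))                 # {'Nike': 0.92, 'Adidas': 0.78}
print(multi_ref_match(refs, margin=0.05))  # Nike, because 0.92 - 0.78 >= 0.05
```

Nike's three close references collapse into one score before any comparison, so the margin is measured against Adidas (0.78) instead of against Nike_ref2 (0.91).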

This is why margin-only matching produces very low recall when using multiple references per logo: it was designed for single-reference scenarios, where the second-best reference always belongs to a different logo.

---

### 6. Embedding Caching