Document mean vs max similarity aggregation in multi-ref matching

- Add detailed explanation of mean vs max aggregation methods
- Include concrete example with Nike logo and 5 reference images
- Add decision table for when to use each approach
- Show how min_matching_refs works independently of aggregation
This commit is contained in:
Rick McEwen
2026-01-02 12:17:13 -05:00
parent 94db5bd40b
commit 2d19ed91d7


@ -128,7 +128,77 @@ IoU (Intersection over Union) = Area of Overlap / Area of Union
**Configuration**:
- `refs_per_logo`: Number of reference images (default: 3)
- `min_matching_refs`: Minimum references that must match
- `use_max_similarity`: Use max instead of mean aggregation (default: False)
#### Mean vs Max Similarity Aggregation
When comparing a detected region against multiple reference images for the same logo, we need to combine the individual similarity scores into a single aggregate score. The two options are:
**Mean Similarity** (default, `--use-max-similarity` NOT set):
- Calculates the average similarity across ALL reference images
- More conservative: requires consistent matching across references
- Better at rejecting false positives where only one reference happens to match
**Max Similarity** (`--use-max-similarity` flag):
- Takes the HIGHEST similarity score from any single reference
- More lenient: only needs one good match to succeed
- Better recall when logos have high variability (one reference might be a perfect match)
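The two modes can be sketched in a few lines of Python (the function name and signature here are illustrative, not the project's actual API):

```python
def aggregate_similarity(scores, use_max=False):
    """Combine per-reference similarity scores into one aggregate score.

    use_max=False (the default): mean aggregation -- conservative,
    rewards consistent matching across all references.
    use_max=True: max aggregation -- lenient, one strong reference
    match is enough.
    """
    if not scores:
        raise ValueError("need at least one reference score")
    return max(scores) if use_max else sum(scores) / len(scores)
```

Note that max aggregation is monotone in the single best reference, while mean aggregation is sensitive to every score, which is exactly why it penalizes outlier references.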
#### Detailed Example
Suppose we have 5 reference images for the Nike logo, and a detected region produces these similarity scores:
| Reference | Similarity |
|-----------|------------|
| nike_ref1.png | 0.92 |
| nike_ref2.png | 0.78 |
| nike_ref3.png | 0.85 |
| nike_ref4.png | 0.71 |
| nike_ref5.png | 0.88 |
**With Mean Aggregation:**
```
Score = (0.92 + 0.78 + 0.85 + 0.71 + 0.88) / 5 = 0.828
```
The score reflects the overall consistency of the match. If one reference is an outlier (like nike_ref4 at 0.71), it pulls the average down.
**With Max Aggregation:**
```
Score = max(0.92, 0.78, 0.85, 0.71, 0.88) = 0.92
```
The score reflects the best possible match. The lower-scoring references don't affect the result.
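Both results above can be reproduced directly with plain Python, using the scores from the table:

```python
scores = [0.92, 0.78, 0.85, 0.71, 0.88]  # per-reference similarities from the table

mean_score = sum(scores) / len(scores)
max_score = max(scores)

print(round(mean_score, 3))  # 0.828
print(max_score)             # 0.92
```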
#### When to Use Each
| Scenario | Recommended | Why |
|----------|-------------|-----|
| Logos with consistent appearance | Mean | Penalizes partial matches that only hit one variant |
| Logos with high variability (different colors, orientations) | Max | One reference matching well is sufficient evidence |
| High false positive rate | Mean | More conservative scoring reduces false matches |
| High false negative rate | Max | More lenient scoring catches more true matches |
| Reference images are all similar | Either | Results will be similar |
| Reference images show different logo variants | Max | Each variant should be allowed to match independently |
#### Combined Example with min_matching_refs
The `min_matching_refs` parameter works independently of the aggregation method. It counts how many references exceed the threshold, regardless of which aggregation is used for the final score.
**Example with threshold=0.80, min_matching_refs=2:**
| Reference | Similarity | Above Threshold? |
|-----------|------------|------------------|
| nike_ref1.png | 0.92 | Yes |
| nike_ref2.png | 0.78 | No |
| nike_ref3.png | 0.85 | Yes |
| nike_ref4.png | 0.71 | No |
| nike_ref5.png | 0.88 | Yes |
- References above threshold: 3 (nike_ref1, nike_ref3, nike_ref5)
- min_matching_refs requirement: 2 ✓ (3 >= 2, so we proceed)
- Mean score: 0.828
- Max score: 0.92
If only one reference were above the threshold, the match would be rejected regardless of the aggregate score.
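Putting the gate and the aggregation together, the logic described above might look like this sketch (the function name, signature, and `None`-for-rejected convention are illustrative assumptions, not the project's actual API):

```python
def score_detection(scores, threshold=0.80, min_matching_refs=2, use_max=False):
    """Apply the min_matching_refs gate, then aggregate.

    The gate counts references above the threshold independently of
    which aggregation mode produces the final score.
    """
    matching = sum(1 for s in scores if s >= threshold)
    if matching < min_matching_refs:
        return None  # rejected, regardless of the mean/max aggregate
    return max(scores) if use_max else sum(scores) / len(scores)

nike_scores = [0.92, 0.78, 0.85, 0.71, 0.88]
print(score_detection(nike_scores))                # ~0.828 (3 refs pass the gate)
print(score_detection(nike_scores, use_max=True))  # 0.92
```

With these scores, swapping mean for max changes the aggregate but never the gate decision: three references clear the 0.80 threshold either way.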
---