diff --git a/logo_detection_test_methodology.md b/logo_detection_test_methodology.md
index 7dd3263..190a52d 100644
--- a/logo_detection_test_methodology.md
+++ b/logo_detection_test_methodology.md
@@ -128,7 +128,77 @@ IoU (Intersection over Union) = Area of Overlap / Area of Union
 **Configuration**:
 - `refs_per_logo`: Number of reference images (default: 3)
 - `min_matching_refs`: Minimum references that must match
-- `use_mean_similarity`: Use mean vs max aggregation
+- `use_max_similarity`: Use max instead of mean aggregation (default: False)
+
+#### Mean vs Max Similarity Aggregation
+
+When comparing a detected region against multiple reference images for the same logo, we need to combine the individual similarity scores into a single aggregate score. The two options are:
+
+**Mean Similarity** (default, `--use-max-similarity` NOT set):
+- Calculates the average similarity across ALL reference images
+- More conservative: requires consistent matching across references
+- Better at rejecting false positives where only one reference happens to match
+
+**Max Similarity** (`--use-max-similarity` flag):
+- Takes the HIGHEST similarity score from any single reference
+- More lenient: only needs one good match to succeed
+- Better recall when a logo varies widely in appearance (a single reference may closely match the observed variant even if the others do not)
+
+#### Detailed Example
+
+Suppose we have 5 reference images for the Nike logo (i.e. `refs_per_logo=5` rather than the default of 3), and a detected region produces these similarity scores:
+
+| Reference | Similarity |
+|-----------|------------|
+| nike_ref1.png | 0.92 |
+| nike_ref2.png | 0.78 |
+| nike_ref3.png | 0.85 |
+| nike_ref4.png | 0.71 |
+| nike_ref5.png | 0.88 |
+
+**With Mean Aggregation:**
+```
+Score = (0.92 + 0.78 + 0.85 + 0.71 + 0.88) / 5 = 4.14 / 5 = 0.828
+```
+The score reflects the overall consistency of the match: an outlier reference (like nike_ref4 at 0.71) pulls the average down.
+
+**With Max Aggregation:**
+```
+Score = max(0.92, 0.78, 0.85, 0.71, 0.88) = 0.92
+```
+The score reflects the best single match. The lower-scoring references don't affect the result.
+
+#### When to Use Each
+
+| Scenario | Recommended | Why |
+|----------|-------------|-----|
+| Logos with consistent appearance | Mean | Penalizes regions that match only one reference well |
+| Logos with high variability (different colors, orientations) | Max | One reference matching well is sufficient evidence |
+| High false positive rate | Mean | More conservative scoring reduces false matches |
+| High false negative rate | Max | More lenient scoring catches more true matches |
+| Reference images are all similar | Either | Mean and max scores converge when per-reference scores are close |
+| Reference images show different logo variants | Max | Each variant should be allowed to match independently |
+
+#### Combined Example with min_matching_refs
+
+The `min_matching_refs` parameter works independently of the aggregation method. It counts how many references exceed the threshold, regardless of which aggregation is used for the final score.
+
+**Example with threshold=0.80, min_matching_refs=2:**
+
+| Reference | Similarity | Above Threshold? |
+|-----------|------------|------------------|
+| nike_ref1.png | 0.92 | Yes |
+| nike_ref2.png | 0.78 | No |
+| nike_ref3.png | 0.85 | Yes |
+| nike_ref4.png | 0.71 | No |
+| nike_ref5.png | 0.88 | Yes |
+
+- References above threshold: 3 (nike_ref1, nike_ref3, nike_ref5)
+- min_matching_refs requirement: 2 ✓ (3 >= 2, so this check passes)
+- Mean score: 0.828
+- Max score: 0.92
+
+If only one reference were above the threshold, the match would be rejected regardless of the aggregated score.
 
 ---
 
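The scoring rules in this change can be summarized in a short Python sketch. It assumes the per-reference similarities have already been computed as floats; `score_candidate` and `MatchResult` are hypothetical names, not the project's actual API. "Above threshold" is taken as strictly greater than, following the wording "exceed threshold" (the real implementation may use `>=`). The sketch stops at reporting the `min_matching_refs` gate and the aggregated score; how the final score is used afterwards is not specified in the change, so it is not modeled here.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence


@dataclass
class MatchResult:
    """Hypothetical result container for one detected region vs. one logo."""

    refs_above_threshold: int  # references whose similarity exceeds the threshold
    passes_min_refs: bool  # True if refs_above_threshold >= min_matching_refs
    score: float  # aggregated similarity: mean by default, max if requested


def score_candidate(
    similarities: Sequence[float],
    threshold: float,
    min_matching_refs: int,
    use_max_similarity: bool = False,
) -> MatchResult:
    """Combine per-reference similarity scores for a single logo (sketch)."""
    if not similarities:
        raise ValueError("at least one reference similarity is required")

    # Gate: count references strictly above the threshold (assumption: '>'
    # rather than '>='). This count does not depend on the aggregation mode.
    above = sum(1 for s in similarities if s > threshold)

    # Aggregate: max keeps only the best reference; mean averages all of them.
    score = max(similarities) if use_max_similarity else fmean(similarities)

    return MatchResult(
        refs_above_threshold=above,
        passes_min_refs=above >= min_matching_refs,
        score=score,
    )


if __name__ == "__main__":
    nike = [0.92, 0.78, 0.85, 0.71, 0.88]  # scores from the example tables
    mean_result = score_candidate(nike, threshold=0.80, min_matching_refs=2)
    max_result = score_candidate(
        nike, threshold=0.80, min_matching_refs=2, use_max_similarity=True
    )
    print(mean_result.refs_above_threshold, round(mean_result.score, 3))  # 3 0.828
    print(max_result.passes_min_refs, max_result.score)  # True 0.92
```

Keeping the count and the aggregated score as separate fields mirrors the point that `min_matching_refs` is independent of the aggregation method: toggling `use_max_similarity` changes only `score`, never `passes_min_refs`.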