diff --git a/logo_detection_test_methodology.md b/logo_detection_test_methodology.md
index 7dd3263..190a52d 100644
--- a/logo_detection_test_methodology.md
+++ b/logo_detection_test_methodology.md
@@ -128,7 +128,77 @@ IoU (Intersection over Union) = Area of Overlap / Area of Union
 **Configuration**:
 - `refs_per_logo`: Number of reference images (default: 3)
 - `min_matching_refs`: Minimum references that must match
-- `use_mean_similarity`: Use mean vs max aggregation
+- `use_max_similarity`: Use max instead of mean aggregation (default: False)
+
+#### Mean vs Max Similarity Aggregation
+
+When comparing a detected region against multiple reference images for the same logo, we need to combine the individual similarity scores into a single aggregate score. The two options are:
+
+**Mean Similarity** (default, `--use-max-similarity` NOT set):
+- Calculates the average similarity across ALL reference images
+- More conservative: requires consistent matching across references
+- Better at rejecting false positives where only one reference happens to match
+
+**Max Similarity** (`--use-max-similarity` flag):
+- Takes the HIGHEST similarity score from any single reference
+- More lenient: only needs one good match to succeed
+- Better recall when logos vary widely in appearance, since a single closely matching reference is enough
+
+#### Detailed Example
+
+Suppose we have 5 reference images for the Nike logo, and a detected region produces these similarity scores:
+
+| Reference | Similarity |
+|-----------|------------|
+| nike_ref1.png | 0.92 |
+| nike_ref2.png | 0.78 |
+| nike_ref3.png | 0.85 |
+| nike_ref4.png | 0.71 |
+| nike_ref5.png | 0.88 |
+
+**With Mean Aggregation:**
+```
+Score = (0.92 + 0.78 + 0.85 + 0.71 + 0.88) / 5 = 0.828
+```
+The score reflects how consistently the region matches across all references. A low-scoring reference (like nike_ref4 at 0.71) pulls the average down.
+
+**With Max Aggregation:**
+```
+Score = max(0.92, 0.78, 0.85, 0.71, 0.88) = 0.92
+```
+The score reflects only the best single match. The lower-scoring references don't affect the result.
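+
+The two calculations above can be sketched in Python. This is a minimal illustration rather than the project's implementation: the `aggregate_similarity` helper and its signature are hypothetical, and only the `use_max_similarity` name comes from the configuration described earlier.
+
+```python
+from statistics import mean
+
+# Similarity scores from the table above, keyed by reference image.
+scores = {
+    "nike_ref1.png": 0.92,
+    "nike_ref2.png": 0.78,
+    "nike_ref3.png": 0.85,
+    "nike_ref4.png": 0.71,
+    "nike_ref5.png": 0.88,
+}
+
+
+def aggregate_similarity(similarities, use_max_similarity=False):
+    """Combine per-reference similarities into one score (hypothetical helper)."""
+    values = list(similarities)
+    if not values:
+        raise ValueError("at least one similarity score is required")
+    # Max keeps only the best reference; mean weighs every reference equally.
+    return max(values) if use_max_similarity else mean(values)
+
+
+mean_score = aggregate_similarity(scores.values())
+max_score = aggregate_similarity(scores.values(), use_max_similarity=True)
+print(f"mean={mean_score:.3f} max={max_score:.2f}")
+```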
+
+#### When to Use Each
+
+| Scenario | Recommended | Why |
+|----------|-------------|-----|
+| Logos with consistent appearance | Mean | Penalizes partial matches that score well against only one reference |
+| Logos with high variability (different colors, orientations) | Max | One reference matching well is sufficient evidence |
+| High false positive rate | Mean | More conservative scoring reduces false matches |
+| High false negative rate | Max | More lenient scoring catches more true matches |
+| Reference images are all similar | Either | Per-reference scores cluster together, so mean and max are close |
+| Reference images show different logo variants | Max | Each variant should be allowed to match independently |
+
+#### Combined Example with min_matching_refs
+
+The `min_matching_refs` check works independently of the aggregation method. It requires a minimum number of references to individually exceed the threshold, regardless of which aggregation is used for the final score.
+
+**Example with threshold=0.80, min_matching_refs=2:**
+
+| Reference | Similarity | Above Threshold? |
+|-----------|------------|------------------|
+| nike_ref1.png | 0.92 | Yes |
+| nike_ref2.png | 0.78 | No |
+| nike_ref3.png | 0.85 | Yes |
+| nike_ref4.png | 0.71 | No |
+| nike_ref5.png | 0.88 | Yes |
+
+- References above threshold: 3 (nike_ref1, nike_ref3, nike_ref5)
+- min_matching_refs requirement: 2 ✓ (3 >= 2, so we proceed)
+- Mean score: 0.828
+- Max score: 0.92
+
+If only one reference were above the threshold, the match would be rejected regardless of the aggregated score.
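+
+The count check and the aggregation from this example can be sketched together as follows. The `evaluate_match` helper and its return shape are hypothetical, and the strict `>` comparison is an assumption; the actual code may use `>=`. How the aggregated score feeds into the final decision is not shown here.
+
+```python
+def evaluate_match(similarities, threshold, min_matching_refs,
+                   use_max_similarity=False):
+    """Apply the reference-count check, then aggregate (hypothetical helper)."""
+    values = list(similarities)
+    if not values:
+        raise ValueError("at least one similarity score is required")
+    # Count references whose individual similarity exceeds the threshold.
+    # Strict ">" is an assumption; the real comparison may be ">=".
+    matching_refs = sum(1 for s in values if s > threshold)
+    # The aggregate is computed independently of the count check.
+    score = max(values) if use_max_similarity else sum(values) / len(values)
+    return {
+        "matching_refs": matching_refs,
+        "meets_min_matching_refs": matching_refs >= min_matching_refs,
+        "score": score,
+    }
+
+
+nike_scores = [0.92, 0.78, 0.85, 0.71, 0.88]
+result = evaluate_match(nike_scores, threshold=0.80, min_matching_refs=2)
+print(result["matching_refs"], result["meets_min_matching_refs"],
+      round(result["score"], 3))
+```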
 
 ---