# Logo Detection Test Methodology

This document describes how the logo detection test framework works and the techniques implemented to improve detection accuracy.

## Overview

The system uses a two-stage pipeline:

1. **DETR** (DEtection TRansformer) - Detects potential logo regions in images
2. **CLIP** (Contrastive Language-Image Pre-training) - Extracts feature embeddings for matching

## Test Framework (`test_logo_detection.py`)

### Test Flow

1. **Sample Reference Logos**: Randomly select N logos from the database, with multiple reference images per logo
2. **Compute Reference Embeddings**: Generate CLIP embeddings for all reference logo images
3. **Build Test Set**: For each sampled logo, select:
   - Positive samples: Images known to contain the logo
   - Negative samples: Images known NOT to contain the logo
4. **Run Detection**: Process each test image through DETR to find logo regions
5. **Match Against References**: Compare detected regions against reference embeddings using the selected matching method (margin-based by default)
6. **Calculate Metrics**: Compute precision, recall, and F1 score

### Configurable Parameters

#### General Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--num-logos` | 10 | Number of reference logos to sample |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `--threshold` | 0.7 | CLIP similarity threshold for matching |
| `--detr-threshold` | 0.5 | DETR detection confidence threshold |
| `--seed` | None | Random seed for reproducibility |

#### Matching Method Selection

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--matching-method` | margin | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Required margin between best and second-best match (applies to `margin` and `multi-ref`) |

#### Multi-Ref Method Parameters (when `--matching-method multi-ref`)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--min-matching-refs` | 1 | Minimum references that must match above threshold |
| `--use-max-similarity` | False | Use max similarity instead of mean across references |

#### Cache Control

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--no-cache` | False | Disable embedding cache |
| `--clear-cache` | False | Clear cache before running |

### Metrics

- **True Positives**: Detected logo correctly matches expected logo
- **False Positives**: Detected logo matches wrong logo or image has no logo
- **False Negatives**: Expected logo not detected/matched
- **Precision**: TP / (TP + FP) - Fraction of detections that were correct
- **Recall**: TP / Total Expected - Fraction of expected logos that were found
- **F1 Score**: Harmonic mean of precision and recall

---

## Accuracy Improvement Techniques

### 1. Non-Maximum Suppression (NMS)

**Location**: `logo_detection_detr.py:214-268`

**Problem**: DETR may produce multiple overlapping bounding boxes for the same logo.

**Solution**: NMS removes redundant detections by:

1. Sorting detections by confidence score (descending)
2. Keeping the highest-scoring box
3. Removing any remaining boxes whose IoU with the kept box exceeds the threshold (default 0.5)
4. Repeating with the next-highest remaining box until no boxes remain

```
IoU (Intersection over Union) = Area of Overlap / Area of Union
```

**Configuration**: `nms_iou_threshold` parameter (default: 0.5)

---

### 2. Minimum Box Size Filtering

**Location**: `logo_detection_detr.py:187-191`

**Problem**: Very small detections are often noise or partial logo fragments.

**Solution**: Filter out detections where width OR height is below a minimum threshold.

**Configuration**: `min_box_size` parameter (default: 20 pixels)

---

### 3. Confidence Threshold Filtering

**Location**: `logo_detection_detr.py:177-179`

**Problem**: Low-confidence DETR detections are unreliable.

**Solution**: Only keep detections with confidence score >= threshold.

**Configuration**: `detr_threshold` parameter (default: 0.5)

---

### 4. Multiple Reference Images Per Logo

**Location**: `logo_detection_detr.py:397-457` (`find_best_match_multi_ref`)

**Problem**: A single reference image may not capture all variations of a logo (different angles, lighting, scales).

**Solution**: Use multiple reference images per logo and aggregate their similarity scores:

- Calculate similarity to each reference embedding
- Count how many references match above threshold
- Use mean or max similarity as the aggregate score
- Require a minimum number of references to match

**Configuration**:

- `refs_per_logo`: Number of reference images (default: 3)
- `min_matching_refs`: Minimum references that must match (default: 1)
- `use_max_similarity`: Use max instead of mean aggregation (default: False)

#### Mean vs Max Similarity Aggregation

When comparing a detected region against multiple reference images for the same logo, the individual similarity scores must be combined into a single aggregate score. The two options are:

**Mean Similarity** (default, `--use-max-similarity` NOT set):

- Calculates the average similarity across ALL reference images
- More conservative: requires consistent matching across references
- Better at rejecting false positives where only one reference happens to match

**Max Similarity** (`--use-max-similarity` flag):

- Takes the HIGHEST similarity score from any single reference
- More lenient: only needs one good match to succeed
- Better recall when logos have high variability (one reference might be a perfect match)

#### Detailed Example

Suppose we have 5 reference images for the Nike logo, and a detected region produces these similarity scores:

| Reference | Similarity |
|-----------|------------|
| nike_ref1.png | 0.92 |
| nike_ref2.png | 0.78 |
| nike_ref3.png | 0.85 |
| nike_ref4.png | 0.71 |
| nike_ref5.png | 0.88 |

**With Mean Aggregation:**

```
Score = (0.92 + 0.78 + 0.85 + 0.71 + 0.88) / 5 = 0.828
```

The score reflects the overall consistency of the match. If one reference is an outlier (like nike_ref4 at 0.71), it pulls the average down.

**With Max Aggregation:**

```
Score = max(0.92, 0.78, 0.85, 0.71, 0.88) = 0.92
```

The score reflects the best possible match. The lower-scoring references don't affect the result.

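
The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `find_best_match_multi_ref`: the function name `aggregate_similarity` and its signature are hypothetical, and whether the real implementation also compares the aggregate score against the threshold is not stated here, so the sketch leaves that out.

```python
from statistics import mean


def aggregate_similarity(similarities, threshold=0.7, min_matching_refs=1,
                         use_max_similarity=False):
    """Hypothetical helper: aggregate score for one logo, or None if too few refs match."""
    # Count references whose similarity clears the threshold.
    matching = sum(1 for s in similarities if s >= threshold)
    if matching < min_matching_refs:
        return None
    # Max is the lenient option; mean is the conservative default.
    return max(similarities) if use_max_similarity else mean(similarities)


# Similarity scores from the Nike example above.
nike_scores = [0.92, 0.78, 0.85, 0.71, 0.88]
mean_score = aggregate_similarity(nike_scores)
max_score = aggregate_similarity(nike_scores, use_max_similarity=True)
print(f"mean={mean_score:.3f} max={max_score:.2f}")  # mean=0.828 max=0.92
```

Returning `None` for "not enough matching references" keeps the rejection case distinct from a low but valid score.
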
#### When to Use Each

| Scenario | Recommended | Why |
|----------|-------------|-----|
| Logos with consistent appearance | Mean | Penalizes partial matches that only hit one variant |
| Logos with high variability (different colors, orientations) | Max | One reference matching well is sufficient evidence |
| High false positive rate | Mean | More conservative scoring reduces false matches |
| High false negative rate | Max | More lenient scoring catches more true matches |
| Reference images are all similar | Either | Results will be similar |
| Reference images show different logo variants | Max | Each variant should be allowed to match independently |

#### Combined Example with `min_matching_refs`

The `min_matching_refs` parameter works independently of the aggregation method. It counts how many references exceed the threshold, regardless of which aggregation is used for the final score.

**Example with threshold=0.80, min_matching_refs=2:**

| Reference | Similarity | Above Threshold? |
|-----------|------------|------------------|
| nike_ref1.png | 0.92 | Yes |
| nike_ref2.png | 0.78 | No |
| nike_ref3.png | 0.85 | Yes |
| nike_ref4.png | 0.71 | No |
| nike_ref5.png | 0.88 | Yes |

- References above threshold: 3 (nike_ref1, nike_ref3, nike_ref5)
- min_matching_refs requirement: 2 ✓ (3 >= 2, so we proceed)
- Mean score: 0.828
- Max score: 0.92

If only one reference were above the threshold, the match would be rejected regardless of the aggregated score.

---

### 5. Margin-Based Matching

**Location**: `logo_detection_detr.py:459-505` (`find_best_match_with_margin`)

**Problem**: When multiple logos have similar embeddings, the best match may not be significantly better than the alternatives, leading to false positives.

**Solution**: Require the best match to exceed the second-best match by a minimum margin:

```
Match only if: best_similarity - second_best_similarity >= margin
```

This accepts only matches that clearly beat the runner-up and rejects ambiguous classifications.

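
The margin rule can be sketched as a small Python function. This is an illustration under stated assumptions rather than the body of `find_best_match_with_margin`: the name `match_with_margin`, the dictionary input, the made-up scores, and the choice to accept a lone candidate are all hypothetical, and the separate `--threshold` check is omitted to keep the focus on the margin.

```python
def match_with_margin(scores, margin=0.05):
    """Hypothetical helper: best label if it beats the runner-up by `margin`, else None."""
    if not scores:
        return None
    # Rank candidates from most to least similar.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_label, best_score = ranked[0]
    # Assumption: with a single candidate there is no runner-up to compare against.
    if len(ranked) == 1:
        return best_label
    second_score = ranked[1][1]
    return best_label if best_score - second_score >= margin else None


# Made-up similarity scores for illustration only.
confident = {"logo_a": 0.90, "logo_b": 0.70}
ambiguous = {"logo_a": 0.84, "logo_b": 0.82}
print(match_with_margin(confident))  # logo_a
print(match_with_margin(ambiguous))  # None
```
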
**Configuration**: `--margin` parameter (default: 0.05)

**Example**:

- Best match: Logo A with similarity 0.82
- Second best: Logo B with similarity 0.79
- Margin required: 0.05
- Result: **No match** (0.82 - 0.79 = 0.03 < 0.05)

#### Margin in Multi-Ref vs Margin-Only Matching

The margin parameter applies to both the `margin` and `multi-ref` methods, but operates at different levels:

| Method | What Margin Compares |
|--------|---------------------|
| `margin` | Best **reference embedding** vs second-best **reference embedding** |
| `multi-ref` | Best **logo's aggregated score** vs second-best **logo's aggregated score** |

This distinction is critical when using multiple references per logo.

#### The Problem with Margin-Only and Multiple References

In margin-only matching, all individual reference embeddings compete against each other, including references from the **same logo**. This causes legitimate matches to be rejected.

**Example showing the problem:**

Suppose Nike has 3 references and Adidas has 3 references. A detected region produces:

| Reference | Similarity |
|-----------|------------|
| Nike_ref1 | 0.92 |
| Nike_ref2 | 0.91 |
| Nike_ref3 | 0.85 |
| Adidas_ref1 | 0.78 |
| Adidas_ref2 | 0.75 |
| Adidas_ref3 | 0.72 |

**With margin-only matching (margin=0.05):**

- Best reference: Nike_ref1 (0.92)
- Second-best reference: Nike_ref2 (0.91) ← Same logo!
- Margin check: 0.92 - 0.91 = 0.01 < 0.05 → **Rejected**

The match is rejected even though this is clearly a Nike logo. Nike's own references compete against each other and fail the margin test.

**With multi-ref matching (margin=0.05):**

- First, aggregate scores per logo (max aggregation shown here; mean aggregation gives 0.893 vs 0.75 and the same outcome):
  - Nike: max(0.92, 0.91, 0.85) = 0.92
  - Adidas: max(0.78, 0.75, 0.72) = 0.78
- Best logo: Nike (0.92)
- Second-best logo: Adidas (0.78)
- Margin check: 0.92 - 0.78 = 0.14 >= 0.05 → **Accepted**

This is why margin-only matching produces very low recall when using multiple references per logo: it was designed for single-reference scenarios.

---

### 6. Embedding Caching

**Location**: `test_logo_detection.py:49-82` (`EmbeddingCache` class)

**Problem**: Computing CLIP embeddings is computationally expensive. Without a cache, re-running tests reprocesses the same images.

**Solution**: Cache embeddings to disk using pickle:

- Reference embeddings keyed by `ref:{filename}`
- Detection results keyed by `det:{filename}`
- Cache persists between runs (`.embedding_cache.pkl`)

**Configuration**:

- `--no-cache`: Disable caching entirely
- `--clear-cache`: Clear cache before running

---

### 7. Normalized Embeddings for Cosine Similarity

**Location**: `logo_detection_detr.py:334-335`

**Problem**: Raw CLIP embeddings have varying magnitudes, which can affect similarity calculations.

**Solution**: L2-normalize all embeddings before comparison:

```python
import torch.nn.functional as F
features = F.normalize(features, dim=-1)  # L2-normalize along the embedding dimension
```

With unit-length embeddings, the dot product of two embeddings equals their cosine similarity, so scores fall in the range [-1, 1].

---

## Matching Methods Summary

| Method | Test Script Option | Key Feature |
|--------|-------------------|-------------|
| `find_all_matches` | `--matching-method simple` | Returns ALL logos above threshold (baseline, most permissive) |
| `find_best_match_with_margin` | `--matching-method margin` | Requires margin over second-best match |
| `find_best_match_multi_ref` | `--matching-method multi-ref` | Aggregates scores across reference images |

The test script supports `simple`, `margin`, and `multi-ref` matching methods via the `--matching-method` parameter.

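
To make the differences between the three methods concrete, the sketch below runs simplified stand-ins for them on the Nike/Adidas similarities from section 5. The functions `simple_match`, `margin_match`, and `multi_ref_match` are hypothetical and are not the real `find_all_matches`, `find_best_match_with_margin`, or `find_best_match_multi_ref`; in particular, how the real `simple` method treats several references of one logo is an assumption here, and the threshold check is left out of the two margin variants.

```python
from collections import defaultdict

# Similarities from the Nike/Adidas example in section 5: (logo, reference) -> score.
ref_scores = {
    ("Nike", "ref1"): 0.92, ("Nike", "ref2"): 0.91, ("Nike", "ref3"): 0.85,
    ("Adidas", "ref1"): 0.78, ("Adidas", "ref2"): 0.75, ("Adidas", "ref3"): 0.72,
}


def top_two_gap(scores):
    """Return (best key, best score minus second-best score); needs >= 2 entries."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked[0][1] - ranked[1][1]


def simple_match(scores, threshold):
    """Assumed behavior: every logo with a reference at or above the threshold."""
    return sorted({logo for (logo, _), s in scores.items() if s >= threshold})


def margin_match(scores, margin):
    """Margin applied across individual references, same-logo ones included."""
    (logo, _), gap = top_two_gap(scores)
    return logo if gap >= margin else None


def multi_ref_match(scores, margin, aggregate=max):
    """Margin applied across per-logo aggregated scores."""
    per_logo = defaultdict(list)
    for (logo, _), s in scores.items():
        per_logo[logo].append(s)
    logo, gap = top_two_gap({k: aggregate(v) for k, v in per_logo.items()})
    return logo if gap >= margin else None


print(simple_match(ref_scores, threshold=0.70))   # ['Adidas', 'Nike']
print(margin_match(ref_scores, margin=0.05))      # None
print(multi_ref_match(ref_scores, margin=0.05))   # Nike
```

The only difference between the two margin variants is the set of scores being ranked, which is what lets Nike's own references knock each other out in the reference-level version.
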
---

## Detection Pipeline Summary

```
              Input Image
                   │
                   ▼
┌─────────────────────────────────────┐
│  DETR Object Detection              │
│  - Identifies potential logo regions│
│  - Returns bounding boxes + scores  │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Confidence Filtering               │
│  - Remove detections < threshold    │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Size Filtering                     │
│  - Remove boxes < min_box_size      │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  CLIP Embedding Extraction          │
│  - Crop each detected region        │
│  - Generate normalized embedding    │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Non-Maximum Suppression            │
│  - Remove overlapping detections    │
│  - Keep highest confidence boxes    │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Matching (selectable method)       │
│  ┌─────────┬─────────┬────────────┐ │
│  │ simple  │ margin  │ multi-ref  │ │
│  ├─────────┼─────────┼────────────┤ │
│  │ All     │ Require │ Aggregate  │ │
│  │ matches │ margin  │ across     │ │
│  │ above   │ over    │ refs       │ │
│  │ thresh  │ 2nd best│ (mean/max) │ │
│  └─────────┴─────────┴────────────┘ │
└─────────────────────────────────────┘
                   │
                   ▼
          Matched Logo Labels
```

---

## Tuning Recommendations

### For Simple Matching (`--matching-method simple`)

| Goal | Adjustments |
|------|-------------|
| **Reduce false positives** | Increase `--threshold` (only tuning option for simple method) |
| **Reduce false negatives** | Decrease `--threshold` |

Note: Simple matching is primarily used as a baseline. For production use, consider `margin` or `multi-ref`.

### For Margin-Based Matching (`--matching-method margin`)

| Goal | Adjustments |
|------|-------------|
| **Reduce false positives** | Increase `--threshold`, increase `--margin` |
| **Reduce false negatives** | Decrease `--threshold`, decrease `--margin` |

### For Multi-Ref Matching (`--matching-method multi-ref`)

| Goal | Adjustments |
|------|-------------|
| **Reduce false positives** | Increase `--threshold`, increase `--margin`, increase `--min-matching-refs`, use mean similarity |
| **Reduce false negatives** | Decrease `--threshold`, decrease `--margin`, decrease `--min-matching-refs`, use `--use-max-similarity` |

### General Tuning

| Goal | Adjustments |
|------|-------------|
| **Faster processing** | Decrease `--refs-per-logo`, use caching |
| **More robust detection** | Increase `--refs-per-logo`, decrease `--detr-threshold` |
| **Higher precision** | Increase `--detr-threshold`, use margin method with high margin |
| **Higher recall** | Decrease `--detr-threshold`, use multi-ref with low `--min-matching-refs` |

---

## Example Usage

```bash
# Simple matching (baseline - all matches above threshold)
python test_logo_detection.py -n 20 --matching-method simple --threshold 0.70

# Default margin-based matching
python test_logo_detection.py -n 20 --threshold 0.75 --margin 0.05

# Multi-ref matching with margin (recommended for reducing false positives)
python test_logo_detection.py -n 20 --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2 --threshold 0.70 --margin 0.05

# Multi-ref matching with max similarity (more lenient)
python test_logo_detection.py -n 20 --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 1 --use-max-similarity --margin 0.03

# Reproducible test with seed
python test_logo_detection.py -n 50 --seed 42 --clear-cache
```
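
When comparing runs like the ones above, it can help to recompute the scores by hand from the raw counts. The sketch below follows the definitions in the Metrics section; the function name `detection_metrics` and the counts passed to it are hypothetical and chosen only for illustration, and returning 0.0 for an empty denominator is an assumption.

```python
def detection_metrics(true_positives, false_positives, total_expected):
    """Hypothetical helper: precision, recall, and F1 from raw counts."""
    detections = true_positives + false_positives
    # Precision: TP / (TP + FP); assumed to be 0.0 when nothing was detected.
    precision = true_positives / detections if detections else 0.0
    # Recall: TP / Total Expected; assumed to be 0.0 when nothing was expected.
    recall = true_positives / total_expected if total_expected else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up counts for illustration, not results from an actual run.
p, r, f1 = detection_metrics(true_positives=40, false_positives=10, total_expected=50)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.80 recall=0.80 f1=0.80
```
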