The multi-ref matching method was missing a margin check against other logos, causing excessive false positives. This fix adds:
- A margin parameter to find_best_match_multi_ref() that requires the best logo's score to exceed the second-best by a minimum margin
- Test script now passes --margin to both matching methods
- Updated documentation to reflect that the margin applies to both methods
Also adds run_comparison_tests.sh to run all three matching methods and compare results.
Logo Detection Test Methodology
This document describes how the logo detection test framework works and the various techniques implemented to improve detection accuracy.
Overview
The system uses a two-stage pipeline:
- DETR (DEtection TRansformer) - Detects potential logo regions in images
- CLIP (Contrastive Language-Image Pre-training) - Extracts feature embeddings for matching
Test Framework (test_logo_detection.py)
Test Flow
- Sample Reference Logos: Randomly select N logos from the database, with multiple reference images per logo
- Compute Reference Embeddings: Generate CLIP embeddings for all reference logo images
- Build Test Set: For each sampled logo, select:
  - Positive samples: Images known to contain the logo
  - Negative samples: Images known NOT to contain the logo
- Run Detection: Process each test image through DETR to find logo regions
- Match Against References: Compare detected regions against reference embeddings using the selected matching method (margin or multi-ref)
- Calculate Metrics: Compute precision, recall, and F1 score
Configurable Parameters
General Parameters
| Parameter | Default | Description |
|---|---|---|
| --num-logos | 10 | Number of reference logos to sample |
| --refs-per-logo | 3 | Reference images per logo |
| --positive-samples | 5 | Positive test images per logo |
| --negative-samples | 20 | Negative test images per logo |
| --threshold | 0.7 | CLIP similarity threshold for matching |
| --detr-threshold | 0.5 | DETR detection confidence threshold |
| --seed | None | Random seed for reproducibility |
Matching Method Selection
| Parameter | Default | Description |
|---|---|---|
| --matching-method | margin | Matching method: margin or multi-ref |
| --margin | 0.05 | Required margin between best and second-best match (applies to both methods) |
Multi-Ref Method Parameters (when --matching-method multi-ref)
| Parameter | Default | Description |
|---|---|---|
| --min-matching-refs | 1 | Minimum references that must match above threshold |
| --use-max-similarity | False | Use max similarity instead of mean across references |
Cache Control
| Parameter | Default | Description |
|---|---|---|
| --no-cache | False | Disable embedding cache |
| --clear-cache | False | Clear cache before running |
Metrics
- True Positives: Detected logo correctly matches expected logo
- False Positives: A detection is matched to the wrong logo, or to any logo in an image that contains none
- False Negatives: Expected logo not detected/matched
- Precision: TP / (TP + FP) - How many detections were correct
- Recall: TP / Total Expected - How many logos were found
- F1 Score: Harmonic mean of precision and recall
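The metric definitions above can be sketched as a small helper. This is an illustrative sketch, not the test script's own code: the function name `compute_metrics` and its arguments are assumptions, and the zero-division guards are a choice made here.

```python
def compute_metrics(true_positives: int, false_positives: int, total_expected: int) -> dict:
    """Compute precision, recall, and F1 from raw counts (hypothetical helper)."""
    detected = true_positives + false_positives
    # Precision: share of detections that were correct.
    precision = true_positives / detected if detected else 0.0
    # Recall: share of expected logos that were found.
    recall = true_positives / total_expected if total_expected else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


metrics = compute_metrics(true_positives=8, false_positives=2, total_expected=10)
print(metrics)
```

With 8 true positives, 2 false positives, and 10 expected logos, precision and recall are both 0.8, so F1 is also approximately 0.8.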
Accuracy Improvement Techniques
1. Non-Maximum Suppression (NMS)
Location: logo_detection_detr.py:214-268
Problem: DETR may produce multiple overlapping bounding boxes for the same logo.
Solution: NMS removes redundant detections by:
- Sorting detections by confidence score (descending)
- Keeping the highest-scoring box
- Removing any remaining boxes with IoU > threshold (default 0.5)
- Repeating until no boxes remain
IoU (Intersection over Union) = Area of Overlap / Area of Union
Configuration: nms_iou_threshold parameter (default: 0.5)
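As a rough illustration of the steps above, greedy NMS can be written in a few lines of plain Python. The function names `iou` and `nms` and the `(x1, y1, x2, y2)` box layout are assumptions for this sketch; the code in logo_detection_detr.py may be organized differently.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 110, 110), (15, 15, 115, 115), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the first two boxes overlap with an IoU of about 0.82, so the lower-scoring one is suppressed and the distant third box survives.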
2. Minimum Box Size Filtering
Location: logo_detection_detr.py:187-191
Problem: Very small detections are often noise or partial logo fragments.
Solution: Filter out detections where width OR height is below a minimum threshold.
Configuration: min_box_size parameter (default: 20 pixels)
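A minimal sketch of the size filter, assuming boxes are `(x1, y1, x2, y2)` tuples in pixels. The helper name `filter_small_boxes` is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def filter_small_boxes(boxes: Sequence[Box], min_box_size: float = 20) -> List[Box]:
    """Keep only boxes whose width AND height both reach min_box_size."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        width, height = x2 - x1, y2 - y1
        # A box is dropped if width OR height is below the minimum.
        if width >= min_box_size and height >= min_box_size:
            kept.append((x1, y1, x2, y2))
    return kept


candidates = [(0, 0, 50, 50), (0, 0, 10, 50), (0, 0, 50, 19), (5, 5, 25, 25)]
large_enough = filter_small_boxes(candidates)
print(large_enough)
```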
3. Confidence Threshold Filtering
Location: logo_detection_detr.py:177-179
Problem: Low-confidence DETR detections are unreliable.
Solution: Only keep detections with confidence score >= threshold.
Configuration: detr_threshold parameter (default: 0.5)
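The confidence filter amounts to a single comparison per detection. The sketch below assumes each detection is a dict with a `score` key; that structure and the name `filter_by_confidence` are illustrative, not taken from the source file.

```python
from typing import Dict, List


def filter_by_confidence(detections: List[Dict], detr_threshold: float = 0.5) -> List[Dict]:
    """Keep detections whose confidence score is >= detr_threshold."""
    return [d for d in detections if d["score"] >= detr_threshold]


detections = [
    {"box": (0, 0, 50, 50), "score": 0.92},
    {"box": (60, 60, 90, 90), "score": 0.50},  # exactly at threshold: kept
    {"box": (5, 5, 30, 30), "score": 0.31},    # below threshold: dropped
]
confident = filter_by_confidence(detections)
print([d["score"] for d in confident])
```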
4. Multiple Reference Images Per Logo
Location: logo_detection_detr.py:397-457 (find_best_match_multi_ref)
Problem: A single reference image may not capture all variations of a logo (different angles, lighting, scales).
Solution: Use multiple reference images per logo and aggregate their similarity scores:
- Calculate similarity to each reference embedding
- Count how many references match above threshold
- Use mean or max similarity as the aggregate score
- Require a minimum number of references to match
Configuration:
- refs_per_logo: Number of reference images (default: 3)
- min_matching_refs: Minimum references that must match
- use_mean_similarity: Use mean vs max aggregation
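The aggregation described above, combined with the margin check that applies to both methods, might look like the following. This is a sketch under assumptions, not the body of find_best_match_multi_ref: the function name `match_multi_ref`, its signature, and the choice to compare the best logo's aggregate score against every other logo's aggregate score are all made up for illustration. Embeddings are assumed to be L2-normalized, so the dot product is the cosine similarity.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def match_multi_ref(
    query: Vector,
    refs_by_logo: Dict[str, List[Vector]],
    threshold: float = 0.7,
    min_matching_refs: int = 1,
    use_max: bool = False,
    margin: float = 0.05,
) -> Optional[Tuple[str, float]]:
    """Return (label, aggregate score) for the best logo, or None if no confident match."""
    ranked = []
    for label, refs in refs_by_logo.items():
        sims = [dot(query, ref) for ref in refs]
        if not sims:
            continue
        matching = sum(1 for s in sims if s >= threshold)  # refs above threshold
        aggregate = max(sims) if use_max else mean(sims)   # max or mean aggregation
        ranked.append((aggregate, matching, label))
    if not ranked:
        return None
    ranked.sort(key=lambda item: item[0], reverse=True)
    best_score, best_matching, best_label = ranked[0]
    if best_matching < min_matching_refs:
        return None  # too few references agree
    if len(ranked) > 1 and best_score - ranked[1][0] < margin:
        return None  # best logo is not clearly ahead of the runner-up
    return best_label, best_score


query = (1.0, 0.0)
refs = {
    "A": [(1.0, 0.0), (0.8, 0.6)],  # similarities 1.0 and 0.8
    "B": [(0.6, 0.8), (0.0, 1.0)],  # similarities 0.6 and 0.0
}
result = match_multi_ref(query, refs)
print(result)
```

In this toy case logo A has a mean similarity of about 0.9 against 0.3 for logo B, so A is returned. Raising `min_matching_refs` to 3, or adding a competitor with the same references as A, makes the function return None.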
5. Margin-Based Matching
Location: logo_detection_detr.py:459-505 (find_best_match_with_margin)
Problem: When multiple logos have similar embeddings, the best match may not be significantly better than alternatives, leading to false positives.
Solution: Require the best match to exceed the second-best match by a minimum margin:
Match only if: best_similarity - second_best_similarity >= margin
This ensures confident matches and reduces ambiguous classifications.
Configuration: --margin parameter (default: 0.05)
Example:
- Best match: Logo A with similarity 0.82
- Second best: Logo B with similarity 0.79
- Margin required: 0.05
- Result: No match (0.82 - 0.79 = 0.03 < 0.05)
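The worked example above can be reproduced with a short sketch. The function name `match_with_margin` and the idea of passing one similarity per logo as a dict are assumptions for illustration, as is applying the similarity threshold before the margin check; find_best_match_with_margin itself may differ.

```python
from typing import Dict, Optional


def match_with_margin(
    similarities: Dict[str, float],
    threshold: float = 0.7,
    margin: float = 0.05,
) -> Optional[str]:
    """Return the best label only if it clears the threshold and the margin."""
    if not similarities:
        return None
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    if best_score < threshold:
        return None  # best candidate is not similar enough
    if len(ranked) > 1 and best_score - ranked[1][1] < margin:
        return None  # ambiguous: runner-up is too close
    return best_label


ambiguous = match_with_margin({"Logo A": 0.82, "Logo B": 0.79})
clear = match_with_margin({"Logo A": 0.82, "Logo B": 0.70})
print(ambiguous, clear)
```

The first call rejects the match because the gap of 0.03 is below the 0.05 margin; the second accepts Logo A because the gap is 0.12.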
6. Embedding Caching
Location: test_logo_detection.py:49-82 (EmbeddingCache class)
Problem: Computing CLIP embeddings is computationally expensive. Re-running tests would reprocess the same images.
Solution: Cache embeddings to disk using pickle:
- Reference embeddings keyed by ref:{filename}
- Detection results keyed by det:{filename}
- Cache persists between runs (.embedding_cache.pkl)
Configuration:
- --no-cache: Disable caching entirely
- --clear-cache: Clear cache before running
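A pickle-backed cache along these lines can be sketched as follows. The class name `SimpleEmbeddingCache`, its methods, and the example filename are hypothetical and do not reproduce the EmbeddingCache class in test_logo_detection.py; only the key prefixes and cache filename come from the description above.

```python
import os
import pickle
import tempfile


class SimpleEmbeddingCache:
    """Hypothetical dict-like cache persisted to disk with pickle."""

    def __init__(self, path: str, enabled: bool = True):
        self.path = path
        self.enabled = enabled  # False mimics --no-cache
        self.data = {}
        if enabled and os.path.exists(path):
            with open(path, "rb") as f:
                self.data = pickle.load(f)

    def get(self, key: str):
        return self.data.get(key) if self.enabled else None

    def put(self, key: str, value) -> None:
        if self.enabled:
            self.data[key] = value

    def save(self) -> None:
        if self.enabled:
            with open(self.path, "wb") as f:
                pickle.dump(self.data, f)

    def clear(self) -> None:
        """Mimics --clear-cache: forget entries and remove the file."""
        self.data = {}
        if os.path.exists(self.path):
            os.remove(self.path)


cache_path = os.path.join(tempfile.mkdtemp(), ".embedding_cache.pkl")
cache = SimpleEmbeddingCache(cache_path)
cache.put("ref:logo_a.png", [0.1, 0.2, 0.3])  # placeholder embedding
cache.save()

reloaded = SimpleEmbeddingCache(cache_path)   # simulates a second run
print(reloaded.get("ref:logo_a.png"))
```

Because pickle can execute arbitrary code when loading, a cache file like this should only be read if it was written by a trusted run.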
7. Normalized Embeddings for Cosine Similarity
Location: logo_detection_detr.py:334-335
Problem: Raw CLIP embeddings have varying magnitudes, which can affect similarity calculations.
Solution: L2-normalize all embeddings before comparison:
features = F.normalize(features, dim=-1)
With unit-length embeddings, the dot product of two vectors equals their cosine similarity, so scores fall in the range [-1, 1] regardless of the original magnitudes.
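To show the effect without requiring PyTorch, here is the same idea for a single vector in plain Python. The helpers `l2_normalize` and `cosine_similarity` are illustrative names, not functions from the codebase, and the zero-vector handling is a choice made for this sketch.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the L2-normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


a = [3.0, 4.0]
same_direction = cosine_similarity(a, [30.0, 40.0])  # 10x larger magnitude
opposite = cosine_similarity(a, [-3.0, -4.0])
orthogonal = cosine_similarity(a, [-4.0, 3.0])
print(same_direction, opposite, orthogonal)
```

Vectors pointing the same way score close to 1 even when their magnitudes differ by a factor of ten, opposite vectors score close to -1, and orthogonal vectors score close to 0.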
Matching Methods Summary
| Method | Test Script Option | Key Feature |
|---|---|---|
| find_best_match | N/A (library only) | Returns highest similarity above threshold |
| find_best_match_with_margin | --matching-method margin | Requires margin over second-best match |
| find_best_match_multi_ref | --matching-method multi-ref | Aggregates scores across reference images |
The test script supports both margin and multi-ref matching methods via the --matching-method parameter.
Detection Pipeline Summary
Input Image
│
▼
┌─────────────────────────────────────┐
│ DETR Object Detection │
│ - Identifies potential logo regions│
│ - Returns bounding boxes + scores │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Confidence Filtering │
│ - Remove detections < threshold │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Size Filtering │
│ - Remove boxes < min_box_size │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ CLIP Embedding Extraction │
│ - Crop each detected region │
│ - Generate normalized embedding │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Non-Maximum Suppression │
│ - Remove overlapping detections │
│ - Keep highest confidence boxes │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Matching (selectable method) │
│ ┌───────────────┬────────────────┐ │
│ │ margin │ multi-ref │ │
│ ├───────────────┼────────────────┤ │
│ │ Require margin│ Aggregate │ │
│ │ over 2nd best │ across refs │ │
│ │ match │ (mean or max) │ │
│ └───────────────┴────────────────┘ │
└─────────────────────────────────────┘
│
▼
Matched Logo Labels
Tuning Recommendations
For Margin-Based Matching (--matching-method margin)
| Goal | Adjustments |
|---|---|
| Reduce false positives | Increase --threshold, increase --margin |
| Reduce false negatives | Decrease --threshold, decrease --margin |
For Multi-Ref Matching (--matching-method multi-ref)
| Goal | Adjustments |
|---|---|
| Reduce false positives | Increase --threshold, increase --margin, increase --min-matching-refs, use mean similarity |
| Reduce false negatives | Decrease --threshold, decrease --margin, decrease --min-matching-refs, use --use-max-similarity |
General Tuning
| Goal | Adjustments |
|---|---|
| Faster processing | Decrease --refs-per-logo, use caching |
| More robust detection | Increase --refs-per-logo, decrease --detr-threshold |
| Higher precision | Increase --detr-threshold, use margin method with high margin |
| Higher recall | Decrease --detr-threshold, use multi-ref with low --min-matching-refs |
Example Usage
# Default margin-based matching
python test_logo_detection.py -n 20 --threshold 0.75 --margin 0.05

# Multi-ref matching with margin (recommended for reducing false positives)
python test_logo_detection.py -n 20 --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2 --threshold 0.70 --margin 0.05

# Multi-ref matching with max similarity (more lenient)
python test_logo_detection.py -n 20 --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 1 --use-max-similarity --margin 0.03

# Reproducible test with seed
python test_logo_detection.py -n 50 --seed 42 --clear-cache