Initial commit: Logo detection test framework

Add DETR+CLIP based logo detection library and test framework:
- DetectLogosDETR class for logo detection and matching
- Test script with margin-based and multi-ref matching methods
- Data preparation script for test database
- Documentation for API usage and test methodology
2025-12-31 10:42:36 -05:00
commit ddccf653d2
14 changed files with 3457 additions and 0 deletions
--- a/logo_detection_test_methodology.md
+++ b/logo_detection_test_methodology.md
@@ -0,0 +1,308 @@
+# Logo Detection Test Methodology
+
+This document describes how the logo detection test framework works and the various techniques implemented to improve detection accuracy.
+
+## Overview
+
+The system uses a two-stage pipeline:
+1. **DETR** (DEtection TRansformer) - Detects potential logo regions in images
+2. **CLIP** (Contrastive Language-Image Pre-training) - Extracts feature embeddings for matching
+
+## Test Framework (`test_logo_detection.py`)
+
+### Test Flow
+
+1. **Sample Reference Logos**: Randomly select N logos from the database, with multiple reference images per logo
+2. **Compute Reference Embeddings**: Generate CLIP embeddings for all reference logo images
+3. **Build Test Set**: For each sampled logo, select:
+   - Positive samples: Images known to contain the logo
+   - Negative samples: Images known NOT to contain the logo
+4. **Run Detection**: Process each test image through DETR to find logo regions
+5. **Match Against References**: Compare detected regions against reference embeddings using the selected matching method (`margin` by default, or `multi-ref`)
+6. **Calculate Metrics**: Compute precision, recall, and F1 score
+
+### Configurable Parameters
+
+#### General Parameters
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--num-logos` | 10 | Number of reference logos to sample |
+| `--refs-per-logo` | 3 | Reference images per logo |
+| `--positive-samples` | 5 | Positive test images per logo |
+| `--negative-samples` | 20 | Negative test images per logo |
+| `--threshold` | 0.7 | CLIP similarity threshold for matching |
+| `--detr-threshold` | 0.5 | DETR detection confidence threshold |
+| `--seed` | None | Random seed for reproducibility |
+
+#### Matching Method Selection
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--matching-method` | margin | Matching method: `margin` or `multi-ref` |
+
+#### Margin Method Parameters (when `--matching-method margin`)
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--margin` | 0.05 | Required margin between best and second-best match |
+
+#### Multi-Ref Method Parameters (when `--matching-method multi-ref`)
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--min-matching-refs` | 1 | Minimum references that must match above threshold |
+| `--use-max-similarity` | False | Use max similarity instead of mean across references |
+
+#### Cache Control
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--no-cache` | False | Disable embedding cache |
+| `--clear-cache` | False | Clear cache before running |
+
+### Metrics
+
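+- **True Positives (TP)**: A detected logo is matched to the expected logo
+- **False Positives (FP)**: A detection is matched to the wrong logo, or a logo is reported in a negative image that does not contain it
+- **False Negatives (FN)**: An expected logo is not detected or not matched
+- **Precision**: TP / (TP + FP), the fraction of reported matches that were correct
+- **Recall**: TP / Total Expected, where Total Expected = TP + FN; the fraction of expected logos that were found
+- **F1 Score**: Harmonic mean of precision and recall, 2 × Precision × Recall / (Precision + Recall)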
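+
+The sketch below shows how these metrics combine, assuming the TP, FP, and FN counts have already been tallied. `compute_metrics` is a hypothetical helper, not the test script's actual function, and returning 0.0 when a denominator is zero is an assumed convention.
+
+```python
+def compute_metrics(tp: int, fp: int, fn: int) -> dict:
+    """Precision, recall, and F1 from raw counts (hypothetical helper)."""
+    # Precision: share of reported matches that were correct
+    precision = tp / (tp + fp) if (tp + fp) else 0.0
+    # Recall: share of expected logos that were found (TP + FN = total expected)
+    recall = tp / (tp + fn) if (tp + fn) else 0.0
+    # F1: harmonic mean of precision and recall
+    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
+    return {"precision": precision, "recall": recall, "f1": f1}
+```
+
+For example, 8 true positives, 2 false positives, and 2 false negatives give a precision, recall, and F1 of 0.8 each.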
+
+---
+
+## Accuracy Improvement Techniques
+
+### 1. Non-Maximum Suppression (NMS)
+
+**Location**: `logo_detection_detr.py:214-268`
+
+**Problem**: DETR may produce multiple overlapping bounding boxes for the same logo.
+
+**Solution**: NMS removes redundant detections by:
+1. Sorting detections by confidence score (descending)
+2. Keeping the highest-scoring box
+3. Removing any remaining boxes whose IoU with the kept box exceeds the threshold (default 0.5)
+4. Repeating until no boxes remain
+
+```
+IoU (Intersection over Union) = Area of Overlap / Area of Union
+```
+
+**Configuration**: `nms_iou_threshold` parameter (default: 0.5)
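+
+A minimal pure-Python sketch of the greedy NMS steps above, assuming boxes are `(x_min, y_min, x_max, y_max)` pixel tuples. The names `iou` and `nms` are illustrative and do not reproduce the code at `logo_detection_detr.py:214-268`.
+
+```python
+from typing import List, Tuple
+
+Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
+
+
+def iou(a: Box, b: Box) -> float:
+    """Intersection over Union of two axis-aligned boxes."""
+    # Overlap rectangle (width/height clamp to 0 when boxes don't intersect)
+    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
+    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
+    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
+    area_a = (a[2] - a[0]) * (a[3] - a[1])
+    area_b = (b[2] - b[0]) * (b[3] - b[1])
+    union = area_a + area_b - inter
+    return inter / union if union > 0 else 0.0
+
+
+def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
+    """Greedy NMS; returns indices of kept boxes, highest score first."""
+    # 1. Sort candidate indices by confidence, descending
+    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
+    keep: List[int] = []
+    while order:
+        # 2. Keep the highest-scoring remaining box
+        best = order.pop(0)
+        keep.append(best)
+        # 3. Drop remaining boxes that overlap it by more than the threshold
+        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
+    # 4. The loop ends when no candidates remain
+    return keep
+```
+
+In a PyTorch pipeline, `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the same greedy suppression on tensors and returns the kept indices sorted by decreasing score.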
+
+---
+
+### 2. Minimum Box Size Filtering
+
+**Location**: `logo_detection_detr.py:187-191`
+
+**Problem**: Very small detections are often noise or partial logo fragments.
+
+**Solution**: Filter out detections where width OR height is below a minimum threshold.
+
+**Configuration**: `min_box_size` parameter (default: 20 pixels)
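+
+A sketch of the size filter, assuming each detection is a dict with a hypothetical `"box"` key holding `(x_min, y_min, x_max, y_max)` in pixels; the function name is illustrative only.
+
+```python
+from typing import Dict, List
+
+
+def filter_small_boxes(detections: List[Dict], min_box_size: float = 20) -> List[Dict]:
+    """Drop detections whose width OR height is below min_box_size pixels."""
+    kept = []
+    for det in detections:
+        x_min, y_min, x_max, y_max = det["box"]
+        width, height = x_max - x_min, y_max - y_min
+        # Keep only boxes that are at least min_box_size in both dimensions
+        if width >= min_box_size and height >= min_box_size:
+            kept.append(det)
+    return kept
+```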
+
+---
+
+### 3. Confidence Threshold Filtering
+
+**Location**: `logo_detection_detr.py:177-179`
+
+**Problem**: Low-confidence DETR detections are unreliable.
+
+**Solution**: Only keep detections with confidence score >= threshold.
+
+**Configuration**: `detr_threshold` parameter (default: 0.5)
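+
+A sketch of the confidence cut, assuming each detection is a dict with a hypothetical `"score"` key holding the DETR confidence:
+
+```python
+from typing import Dict, List
+
+
+def filter_by_confidence(detections: List[Dict], detr_threshold: float = 0.5) -> List[Dict]:
+    """Keep detections whose DETR confidence score is >= detr_threshold."""
+    return [det for det in detections if det["score"] >= detr_threshold]
+```
+
+If DETR is run through Hugging Face `transformers` (an assumption; this document does not name the backend), `DetrImageProcessor.post_process_object_detection` accepts a `threshold` argument that applies the same cut during post-processing.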
+
+---
+
+### 4. Multiple Reference Images Per Logo
+
+**Location**: `logo_detection_detr.py:397-457` (`find_best_match_multi_ref`)
+
+**Problem**: A single reference image may not capture all variations of a logo (different angles, lighting, scales).
+
+**Solution**: Use multiple reference images per logo and aggregate their similarity scores:
+- Calculate similarity to each reference embedding
+- Count how many references match above threshold
+- Use mean or max similarity as the aggregate score
+- Require a minimum number of references to match
+
+**Configuration**:
+- `refs_per_logo`: Number of reference images (default: 3)
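+- `min_matching_refs`: Minimum number of references that must match above the threshold
+- `use_mean_similarity`: Aggregate with mean (True) or max (False) similarity; the test script's `--use-max-similarity` flag selects max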
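+
+The sketch below aggregates similarities per logo under stated assumptions: embeddings are already L2-normalized NumPy arrays, references are grouped by logo label, and the aggregate score must also clear `threshold` (the source does not say whether it does). The function name and signature are hypothetical, not the library's `find_best_match_multi_ref`.
+
+```python
+from typing import Dict, Optional, Tuple
+
+import numpy as np
+
+
+def find_best_match_multi_ref_sketch(
+    query: np.ndarray,
+    references: Dict[str, np.ndarray],
+    threshold: float = 0.7,
+    min_matching_refs: int = 1,
+    use_mean_similarity: bool = True,
+) -> Optional[Tuple[str, float]]:
+    """Return (logo_label, aggregate_score) for the best logo, or None.
+
+    query: L2-normalized embedding, shape (dim,)
+    references: label -> L2-normalized reference embeddings, shape (n_refs, dim)
+    """
+    best: Optional[Tuple[str, float]] = None
+    for label, refs in references.items():
+        # Cosine similarity to each reference (dot product of unit vectors)
+        sims = refs @ query
+        # Count how many individual references clear the threshold
+        n_matching = int(np.sum(sims >= threshold))
+        if n_matching < min_matching_refs:
+            continue
+        # Aggregate across references: mean (stricter) or max (more lenient)
+        score = float(sims.mean() if use_mean_similarity else sims.max())
+        # Assumption: the aggregate score must also clear the threshold
+        if score < threshold:
+            continue
+        if best is None or score > best[1]:
+            best = (label, score)
+    return best
+```
+
+With `use_mean_similarity=False`, a single strong reference is enough to lift a logo's score, which is why max aggregation is the more lenient option.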
+
+---
+
+### 5. Margin-Based Matching
+
+**Location**: `logo_detection_detr.py:459-505` (`find_best_match_with_margin`)
+
+**Problem**: When multiple logos have similar embeddings, the best match may not be significantly better than alternatives, leading to false positives.
+
+**Solution**: Require the best match to exceed the second-best match by a minimum margin:
+
+```
+Match only if: best_similarity - second_best_similarity >= margin
+```
+
+This rejects ambiguous matches, trading some recall for fewer false positives.
+
+**Configuration**: `--margin` parameter (default: 0.05)
+
+**Example**:
+- Best match: Logo A with similarity 0.82
+- Second best: Logo B with similarity 0.79
+- Margin required: 0.05
+- Result: **No match** (0.82 - 0.79 = 0.03 < 0.05)
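+
+A sketch of margin-based acceptance, assuming one similarity score per candidate logo has already been computed for a detection and that the absolute `threshold` check applies as well (the tuning advice pairs `--threshold` with `--margin`). The function name is hypothetical, and treating a lone candidate as unambiguous is an assumed convention.
+
+```python
+from typing import Dict, Optional, Tuple
+
+
+def find_best_match_with_margin_sketch(
+    similarities: Dict[str, float],
+    threshold: float = 0.7,
+    margin: float = 0.05,
+) -> Optional[Tuple[str, float]]:
+    """Return (label, similarity) only if the best match is clear; else None.
+
+    similarities: logo label -> similarity between one detection and that logo
+    """
+    if not similarities:
+        return None
+    # Rank candidate logos by similarity, best first
+    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
+    best_label, best_sim = ranked[0]
+    # The best match must clear the absolute similarity threshold
+    if best_sim < threshold:
+        return None
+    # Ambiguous if the best is not far enough ahead of the second-best
+    if len(ranked) > 1 and best_sim - ranked[1][1] < margin:
+        return None
+    return best_label, best_sim
+```
+
+Applied to the example above, `{"Logo A": 0.82, "Logo B": 0.79}` returns `None`, because the 0.03 gap falls short of the 0.05 margin.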
+
+---
+
+### 6. Embedding Caching
+
+**Location**: `test_logo_detection.py:49-82` (`EmbeddingCache` class)
+
+**Problem**: Computing CLIP embeddings is computationally expensive. Re-running tests would reprocess the same images.
+
+**Solution**: Cache embeddings to disk using pickle:
+- Reference embeddings keyed by `ref:{filename}`
+- Detection results keyed by `det:{filename}`
+- Cache persists between runs (`.embedding_cache.pkl`)
+
+**Configuration**:
+- `--no-cache`: Disable caching entirely
+- `--clear-cache`: Clear cache before running
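+
+A minimal sketch of a pickle-backed cache in the spirit of `EmbeddingCache`. The class name, method names, and `enabled` switch are hypothetical and do not reproduce the test script's interface; keys would follow the `ref:{filename}` / `det:{filename}` scheme described above.
+
+```python
+import os
+import pickle
+from typing import Any, Dict
+
+
+class EmbeddingCacheSketch:
+    """Pickle-backed key/value cache (hypothetical, not the script's class)."""
+
+    def __init__(self, path: str = ".embedding_cache.pkl", enabled: bool = True):
+        self.path = path
+        self.enabled = enabled
+        self._data: Dict[str, Any] = {}
+        # Load entries saved by earlier runs so results persist between runs
+        if enabled and os.path.exists(path):
+            with open(path, "rb") as f:
+                self._data = pickle.load(f)
+
+    def get(self, key: str) -> Any:
+        # Returns None on a miss or when caching is disabled
+        return self._data.get(key) if self.enabled else None
+
+    def set(self, key: str, value: Any) -> None:
+        if self.enabled:
+            self._data[key] = value
+
+    def clear(self) -> None:
+        # Forget all entries, in memory and on disk
+        self._data = {}
+        if os.path.exists(self.path):
+            os.remove(self.path)
+
+    def save(self) -> None:
+        # Write all entries back to disk
+        if self.enabled:
+            with open(self.path, "wb") as f:
+                pickle.dump(self._data, f)
+```
+
+Because `pickle.load` can execute arbitrary code, the cache file should only be loaded from a trusted location.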
+
+---
+
+### 7. Normalized Embeddings for Cosine Similarity
+
+**Location**: `logo_detection_detr.py:334-335`
+
+**Problem**: Raw CLIP embeddings have varying magnitudes, so a plain dot product between them mixes vector length with direction.
+
+**Solution**: L2-normalize all embeddings before comparison:
+
+```python
+import torch.nn.functional as F
+features = F.normalize(features, dim=-1)  # scale each embedding to unit L2 norm along the last dim
+```
+
+After normalization, the dot product of two embeddings equals their cosine similarity, so scores fall in the range [-1, 1] and depend only on direction.
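+
+A NumPy sketch of why normalization matters; `l2_normalize` is a hypothetical helper that mirrors the `max(norm, eps)` clamping used by `torch.nn.functional.normalize`.
+
+```python
+import numpy as np
+
+
+def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
+    """Scale each vector (last axis) to unit L2 norm, like F.normalize(x, dim=-1)."""
+    norms = np.linalg.norm(x, axis=-1, keepdims=True)
+    # Clamp tiny norms to eps to avoid division by zero
+    return x / np.maximum(norms, eps)
+
+
+a = np.array([3.0, 4.0])
+b = np.array([6.0, 8.0])  # same direction as a, twice the magnitude
+raw_dot = float(a @ b)  # 50.0: grows with vector length
+cos = float(l2_normalize(a) @ l2_normalize(b))  # ~1.0: depends only on direction
+```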
+
+---
+
+## Matching Methods Summary
+
+| Method | Test Script Option | Key Feature |
+|--------|-------------------|-------------|
+| `find_best_match` | N/A (library only) | Returns highest similarity above threshold |
+| `find_best_match_with_margin` | `--matching-method margin` | Requires margin over second-best match |
+| `find_best_match_multi_ref` | `--matching-method multi-ref` | Aggregates scores across reference images |
+
+The test script supports both `margin` and `multi-ref` matching methods via the `--matching-method` parameter.
+
+---
+
+## Detection Pipeline Summary
+
+```
+Input Image
+    │
+    ▼
+┌─────────────────────────────────────┐
+│  DETR Object Detection              │
+│  - Identifies potential logo regions│
+│  - Returns bounding boxes + scores  │
+└─────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────┐
+│  Confidence Filtering               │
+│  - Remove detections < threshold    │
+└─────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────┐
+│  Size Filtering                     │
+│  - Remove boxes < min_box_size      │
+└─────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────┐
+│  CLIP Embedding Extraction          │
+│  - Crop each detected region        │
+│  - Generate normalized embedding    │
+└─────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────┐
+│  Non-Maximum Suppression            │
+│  - Remove overlapping detections    │
+│  - Keep highest confidence boxes    │
+└─────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────┐
+│  Matching (selectable method)       │
+│  ┌───────────────┬────────────────┐ │
+│  │ margin        │ multi-ref      │ │
+│  ├───────────────┼────────────────┤ │
+│  │ Require margin│ Aggregate      │ │
+│  │ over 2nd best │ across refs    │ │
+│  │ match         │ (mean or max)  │ │
+│  └───────────────┴────────────────┘ │
+└─────────────────────────────────────┘
+    │
+    ▼
+Matched Logo Labels
+```
+
+---
+
+## Tuning Recommendations
+
+### For Margin-Based Matching (`--matching-method margin`)
+
+| Goal | Adjustments |
+|------|-------------|
+| **Reduce false positives** | Increase `--threshold`, increase `--margin` |
+| **Reduce false negatives** | Decrease `--threshold`, decrease `--margin` |
+
+### For Multi-Ref Matching (`--matching-method multi-ref`)
+
+| Goal | Adjustments |
+|------|-------------|
+| **Reduce false positives** | Increase `--threshold`, increase `--min-matching-refs`, use mean similarity |
+| **Reduce false negatives** | Decrease `--threshold`, decrease `--min-matching-refs`, use `--use-max-similarity` |
+
+### General Tuning
+
+| Goal | Adjustments |
+|------|-------------|
+| **Faster processing** | Decrease `--refs-per-logo`; keep caching enabled (avoid `--no-cache`) |
+| **More robust detection** | Increase `--refs-per-logo`, decrease `--detr-threshold` |
+| **Higher precision** | Increase `--detr-threshold`, use margin method with high margin |
+| **Higher recall** | Decrease `--detr-threshold`, use multi-ref with low `--min-matching-refs` |
+
+---
+
+## Example Usage
+
+```bash
+# Default margin-based matching
+python test_logo_detection.py -n 20 --threshold 0.75 --margin 0.05
+
+# Multi-ref matching with mean similarity
+python test_logo_detection.py -n 20 --matching-method multi-ref \
+    --refs-per-logo 5 --min-matching-refs 2 --threshold 0.70
+
+# Multi-ref matching with max similarity (more lenient)
+python test_logo_detection.py -n 20 --matching-method multi-ref \
+    --refs-per-logo 5 --min-matching-refs 1 --use-max-similarity
+
+# Reproducible test with seed
+python test_logo_detection.py -n 50 --seed 42 --clear-cache
+```