Initial commit: Logo detection test framework
Add DETR+CLIP based logo detection library and test framework:

- `DetectLogosDETR` class for logo detection and matching
- Test script with margin-based and multi-ref matching methods
- Data preparation script for test database
- Documentation for API usage and test methodology
# DetectLogosDETR Class Usage Guide

## Overview

The `DetectLogosDETR` class provides logo detection using:
- **DETR** (DEtection TRansformer) for initial logo region detection
- **CLIP** (Contrastive Language-Image Pre-training) for feature embeddings and matching

## Key Features

### 1. **Constructor** - Initializes models with CUDA support

```python
from scan_utils.logo_detection_detr import DetectLogosDETR

detector = DetectLogosDETR(logger, detr_threshold=0.5)
```

- Automatically detects and uses CUDA if available
- Loads DETR for logo region detection
- Loads CLIP for feature embeddings
- `detr_threshold`: Confidence threshold for DETR detections (0-1, default: 0.5)

### 2. **Main Detection Methods**

#### `detect(image)` - Detect logos and return embeddings

```python
detections = detector.detect(opencv_image)
# Returns: [{'box': {...}, 'score': 0.95, 'embedding': tensor, 'label': 'logo'}, ...]
```

Returns a list of dictionaries, each containing:
- `box`: Dictionary with `xmin`, `ymin`, `xmax`, `ymax` (pixel coordinates)
- `score`: DETR confidence score (float 0-1)
- `embedding`: CLIP feature embedding (torch.Tensor)
- `label`: DETR predicted label (string)

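Since `detect()` returns plain dictionaries, downstream filtering needs no model access; a small illustration on fabricated sample output (the `embedding` tensors are omitted for brevity):

```python
# Fabricated sample shaped like detector.detect() output (real entries also
# carry a torch.Tensor under 'embedding')
detections = [
    {'box': {'xmin': 10, 'ymin': 20, 'xmax': 110, 'ymax': 90},
     'score': 0.95, 'label': 'logo'},
    {'box': {'xmin': 200, 'ymin': 40, 'xmax': 260, 'ymax': 80},
     'score': 0.52, 'label': 'logo'},
]

# Keep only confident detections and compute their box areas in pixels
confident = [d for d in detections if d['score'] >= 0.9]
areas = [(d['box']['xmax'] - d['box']['xmin']) *
         (d['box']['ymax'] - d['box']['ymin']) for d in confident]
print(len(confident), areas)  # 1 [7000]
```
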
#### `get_embedding(image)` - Get embedding for reference logos

```python
embedding = detector.get_embedding(reference_logo_image)
# For caching reference logo embeddings
```

- Takes OpenCV image (BGR format)
- Returns normalized CLIP embedding (torch.Tensor, shape: [1, 512])
- Used to compute embeddings for reference logos that will be cached

#### `compare_embeddings(emb1, emb2)` - Compute cosine similarity

```python
similarity = detector.compare_embeddings(detected_emb, reference_emb)
# Returns: float (typically 0-1, higher = more similar)
```

- Compares two CLIP embeddings
- Returns cosine similarity score (float, range: -1 to 1, typically 0 to 1)

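The comparison is plain cosine similarity; a minimal, torch-free sketch of the same arithmetic (the toy 2-D vectors stand in for 512-D CLIP embeddings):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).  CLIP embeddings are
    # typically L2-normalized, in which case this reduces to the dot product.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```
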
### 3. **Convenience Methods**

#### `find_best_match()` - Find best matching reference logo

```python
match = detector.find_best_match(
    detected_embedding,
    reference_embeddings,
    similarity_threshold=0.7
)
# Returns: (label, similarity) or None
```

**Parameters:**
- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: List of (label, embedding) tuples for reference logos
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.7)

**Returns:**
- Tuple of (label, similarity) for best match, or None if no match above threshold

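The selection logic amounts to an argmax over similarities with a threshold gate; a plausible plain-Python sketch (the `_cos` helper and toy 2-D vectors are illustrative, not the class implementation):

```python
import math

def _cos(a, b):
    # Cosine similarity between two vectors (illustrative helper)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_best_match(detected_embedding, reference_embeddings, similarity_threshold=0.7):
    # Pick the reference with the highest similarity; None if nothing
    # clears the threshold
    best_label, best_sim = None, -1.0
    for label, ref_emb in reference_embeddings:
        sim = _cos(detected_embedding, ref_emb)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim >= similarity_threshold:
        return (best_label, best_sim)
    return None

refs = [("Nike", [1.0, 0.0]), ("Adidas", [0.6, 0.8])]
print(find_best_match([0.9, 0.1], refs))   # matches "Nike"
print(find_best_match([-1.0, 0.0], refs))  # None: nothing clears 0.7
```
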
#### `detect_and_match()` - One-step detection and matching

```python
matches = detector.detect_and_match(
    image,
    reference_embeddings,
    similarity_threshold=0.7
)
```

Convenience method that combines detection and matching in one step.

**Returns:**
- List of matched detections, each containing:
  - `box`: Bounding box coordinates
  - `detr_score`: DETR confidence score
  - `clip_similarity`: CLIP similarity score
  - `label`: Matched reference logo label

### 4. **Advanced Matching Methods**

These methods provide improved accuracy over basic matching.

#### `find_best_match_with_margin()` - Margin-based matching

Requires the best match to exceed the second-best by a minimum margin, reducing false positives from ambiguous matches.

```python
match = detector.find_best_match_with_margin(
    detected_embedding,
    reference_embeddings,  # List of (label, embedding) tuples
    similarity_threshold=0.85,
    margin=0.05
)
# Returns: (label, similarity) or None
```

**Parameters:**
- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: List of (label, embedding) tuples for reference logos
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.85)
- `margin`: Required difference between best and second-best match (default: 0.05)

**Returns:**
- Tuple of (label, similarity) for best match, or None if:
  - No match above threshold, OR
  - Best match doesn't exceed second-best by the required margin

**Example:**
```python
# Best match: Logo A (0.82), Second best: Logo B (0.79)
# With margin=0.05: No match returned (0.82 - 0.79 = 0.03 < 0.05)
# This prevents false positives when multiple logos look similar
```

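The margin rule can be sketched as: rank all references by similarity, then accept the winner only if it clears both the absolute threshold and the gap to the runner-up. A hedged plain-Python sketch (`_cos` and the toy vectors are illustrative, not the class implementation):

```python
import math

def _cos(a, b):
    # Cosine similarity between two vectors (illustrative helper)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_best_match_with_margin(detected, refs, similarity_threshold=0.85, margin=0.05):
    # Rank references by similarity, best first
    scored = sorted(
        ((label, _cos(detected, emb)) for label, emb in refs),
        key=lambda t: t[1],
        reverse=True,
    )
    if not scored or scored[0][1] < similarity_threshold:
        return None
    if len(scored) > 1 and scored[0][1] - scored[1][1] < margin:
        return None  # ambiguous: runner-up is too close to the winner
    return scored[0]

refs = [("A", [1.0, 0.0]), ("B", [0.0, 1.0])]
print(find_best_match_with_margin([1.0, 0.05], refs))  # clear winner: "A"
```
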
#### `find_best_match_multi_ref()` - Multi-reference matching

Uses multiple reference images per logo for more robust matching, aggregating similarity scores across references.

```python
match = detector.find_best_match_multi_ref(
    detected_embedding,
    reference_embeddings,  # Dict: logo_name -> list of embeddings
    similarity_threshold=0.85,
    min_matching_refs=1,
    use_mean_similarity=True
)
# Returns: (label, similarity, num_matching_refs) or None
```

**Parameters:**
- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: Dict mapping logo name to list of embeddings
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.85)
- `min_matching_refs`: Minimum number of references that must match above threshold (default: 1)
- `use_mean_similarity`: If True, use mean similarity; if False, use max (default: True)

**Returns:**
- Tuple of (label, similarity, num_matching_refs) for best match, or None if no match meets criteria

**Example:**
```python
# Build multi-ref embeddings dict
multi_ref_embeddings = {
    "Nike": [embedding1, embedding2, embedding3],
    "Adidas": [embedding4, embedding5],
}

match = detector.find_best_match_multi_ref(
    detected_embedding,
    multi_ref_embeddings,
    similarity_threshold=0.80,
    min_matching_refs=2,      # At least 2 refs must match
    use_mean_similarity=True  # Average across all refs
)

if match:
    label, avg_similarity, num_refs_matched = match
    print(f"Matched {label} with {avg_similarity:.3f} ({num_refs_matched} refs matched)")
```

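A plausible sketch of the aggregation: score each logo across all of its reference embeddings, require `min_matching_refs` to clear the threshold, and aggregate over the matching references. Whether the real method averages over matching or all references is an assumption here, and `_cos` plus the toy 2-D vectors are illustrative only:

```python
import math

def _cos(a, b):
    # Cosine similarity between two vectors (illustrative helper)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_best_match_multi_ref(detected, refs_by_logo, similarity_threshold=0.85,
                              min_matching_refs=1, use_mean_similarity=True):
    best = None
    for label, embeddings in refs_by_logo.items():
        sims = [_cos(detected, e) for e in embeddings]
        matching = [s for s in sims if s >= similarity_threshold]
        if len(matching) < min_matching_refs:
            continue  # not enough references agreed
        score = sum(matching) / len(matching) if use_mean_similarity else max(matching)
        if best is None or score > best[1]:
            best = (label, score, len(matching))
    return best

refs = {"Nike": [[1.0, 0.0], [0.98, 0.2]], "Adidas": [[0.0, 1.0]]}
print(find_best_match_multi_ref([1.0, 0.0], refs, min_matching_refs=2))  # Nike, 2 refs
```
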
## Usage Pattern (Similar to Face Recognition)

The class is designed to work with the caching pattern in scan.py:

```python
from scan_utils.logo_detection_detr import DetectLogosDETR

# Initialize detector
detector = DetectLogosDETR(logger, detr_threshold=0.5)

# 1. Get embeddings for detected logos (cached per image)
detections = detector.detect(target_image)

# 2. Get/cache reference logo embeddings
reference_embeddings = []
for logo_name, logo_file in reference_logos:
    # Check cache first (kvstore)
    logo_key = make_image_key("logo_reference", logo_file)
    embedding = kv.get_torch(logo_key)

    if embedding is None:
        # Load and compute embedding
        logo_img = image_processor.load_image_safely(logo_file)
        embedding = detector.get_embedding(logo_img)

        # Cache for future use
        kv.put_torch(logo_key, embedding)

    reference_embeddings.append((logo_name, embedding))

# 3. Match detections against references
matched_logos = []
for detection in detections:
    match = detector.find_best_match(
        detection['embedding'],
        reference_embeddings,
        similarity_threshold=0.7
    )

    if match:
        label, similarity = match
        matched_logos.append({
            'label': label,
            'box': detection['box'],
            'detr_score': detection['score'],
            'clip_similarity': similarity
        })
        # Logo identified!
```

## Caching Strategy

This follows the same caching pattern as facial recognition:

1. **Target Image Embeddings**: Cache DETR detections and CLIP embeddings per image
   - Key: `make_image_key("logo_detection", image_path)`
   - Avoids re-running DETR on the same image

2. **Reference Logo Embeddings**: Cache CLIP embeddings for reference logos
   - Key: `make_image_key("logo_reference", logo_path)`
   - Computed once and reused across all image scans

3. **Benefits**:
   - DETR only runs once per target image
   - CLIP only runs once per reference logo
   - Subsequent scans only perform embedding comparisons (very fast)

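`make_image_key` is defined elsewhere in the codebase; one plausible shape, shown here purely as a hypothetical sketch, keys on file contents so that an edited image automatically invalidates its cached embedding:

```python
import hashlib

def make_image_key(prefix: str, image_path: str) -> str:
    # Hypothetical sketch -- the real helper may differ.  Hashing file
    # contents means the key (and thus the cache entry) changes whenever
    # the image file does.
    h = hashlib.sha256()
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return f"{prefix}:{h.hexdigest()}"
```
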
## Integration Example

```python
import json

# Assumes kv, make_image_key, image_processor, resize_if_needed_opt and
# logger are available in scope, as in scan.py

def detect_logos_with_caching(
    detector,
    img_file,
    reference_logos,
    max_size=1920
):
    # Load and resize image
    im_in = image_processor.load_image_safely(img_file)
    img = resize_if_needed_opt(im_in, max_size)

    # Check cache for detections
    detection_key = make_image_key("logo_detection", img_file)
    cached_data = kv.get(detection_key)

    if cached_data:
        # Use cached detections
        detections = json.loads(cached_data)
        logger.debug("Logo detections loaded from cache")
    else:
        # Run detection and cache results
        detections = detector.detect(img)
        # NOTE: 'embedding' values are torch tensors, which plain JSON cannot
        # serialize; in practice store them with a torch-aware method
        # (e.g. kv.put_torch) and JSON-encode only boxes/scores/labels
        kv.put(detection_key, json.dumps(detections))

    # Load reference embeddings (with caching)
    reference_embeddings = []
    for logo_name, logo_path in reference_logos:
        ref_key = make_image_key("logo_reference", logo_path)
        embedding = kv.get_torch(ref_key)

        if embedding is None:
            logo_img = image_processor.load_image_safely(logo_path)
            embedding = detector.get_embedding(logo_img)
            kv.put_torch(ref_key, embedding)

        reference_embeddings.append((logo_name, embedding))

    # Match the (possibly cached) detections against references
    results = []
    for detection in detections:
        match = detector.find_best_match(
            detection['embedding'],
            reference_embeddings,
            similarity_threshold=0.7
        )
        if match:
            label, similarity = match
            results.append({
                'label': label,
                'box': detection['box'],
                'detr_score': detection['score'],
                'clip_similarity': similarity
            })
    return results
```

## Performance Considerations

- **First Run**: Slower (DETR + CLIP inference)
- **Cached Runs**: Much faster (only embedding comparisons)
- **GPU Acceleration**: Automatically uses CUDA if available
- **Memory**: Models loaded once and reused across all images