# DetectLogosDETR Class Usage Guide

## Overview

The `DetectLogosDETR` class provides logo detection using:

- **DETR** (DEtection TRansformer) for initial logo region detection
- **CLIP** (Contrastive Language-Image Pre-training) for feature embeddings and matching

## Key Features

### 1. **Constructor** - Initializes models with CUDA support

```python
from scan_utils.logo_detection_detr import DetectLogosDETR

detector = DetectLogosDETR(logger, detr_threshold=0.5)
```

- Automatically detects and uses CUDA if available
- Loads DETR for logo region detection
- Loads CLIP for feature embeddings
- `detr_threshold`: Confidence threshold for DETR detections (0-1, default: 0.5)

### 2. **Main Detection Methods**

#### `detect(image)` - Detect logos and return embeddings

```python
detections = detector.detect(opencv_image)
# Returns: [{'box': {...}, 'score': 0.95, 'embedding': tensor, 'label': 'logo'}, ...]
```

Returns a list of dictionaries, each containing:

- `box`: Dictionary with `xmin`, `ymin`, `xmax`, `ymax` (pixel coordinates)
- `score`: DETR confidence score (float, 0-1)
- `embedding`: CLIP feature embedding (`torch.Tensor`)
- `label`: DETR predicted label (string)

#### `get_embedding(image)` - Get embedding for reference logos

```python
embedding = detector.get_embedding(reference_logo_image)
# For caching reference logo embeddings
```

- Takes an OpenCV image (BGR format)
- Returns a normalized CLIP embedding (`torch.Tensor`, shape: `[1, 512]`)
- Used to compute embeddings for reference logos that will be cached

#### `compare_embeddings(emb1, emb2)` - Compute cosine similarity

```python
similarity = detector.compare_embeddings(detected_emb, reference_emb)
# Returns: float (higher = more similar)
```

- Compares two CLIP embeddings
- Returns a cosine similarity score (float, range: -1 to 1, typically 0 to 1)

### 3. **Convenience Methods**

#### `find_best_match()` - Find best matching reference logo

```python
match = detector.find_best_match(
    detected_embedding,
    reference_embeddings,
    similarity_threshold=0.7
)
# Returns: (label, similarity) or None
```

**Parameters:**

- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: List of (label, embedding) tuples for reference logos
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.7)

**Returns:**

- Tuple of (label, similarity) for the best match, or None if no match is above the threshold

#### `detect_and_match()` - One-step detection and matching

```python
matches = detector.detect_and_match(
    image,
    reference_embeddings,
    similarity_threshold=0.7
)
```

Convenience method that combines detection and matching in one step, as sketched below.

**Returns:**

- List of matched detections, each containing:
  - `box`: Bounding box coordinates
  - `detr_score`: DETR confidence score
  - `clip_similarity`: CLIP similarity score
  - `label`: Matched reference logo label
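The exact internals aren't shown here, but based on the documented API, `detect_and_match()` plausibly composes the two lower-level calls along these lines (a sketch under that assumption, not the actual implementation; `detect_and_match_sketch` is a hypothetical name):

```python
def detect_and_match_sketch(detector, image, reference_embeddings,
                            similarity_threshold=0.7):
    # Hypothetical composition of detect() + find_best_match();
    # the real method may differ in details.
    matches = []
    for detection in detector.detect(image):
        match = detector.find_best_match(
            detection['embedding'],
            reference_embeddings,
            similarity_threshold=similarity_threshold,
        )
        if match:
            label, similarity = match
            matches.append({
                'box': detection['box'],
                'detr_score': detection['score'],
                'clip_similarity': similarity,
                'label': label,
            })
    return matches
```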
### 4. **Advanced Matching Methods**

These methods provide improved accuracy over basic matching.

#### `find_best_match_with_margin()` - Margin-based matching

Requires the best match to exceed the second-best by a minimum margin, reducing false positives from ambiguous matches.

```python
match = detector.find_best_match_with_margin(
    detected_embedding,
    reference_embeddings,  # List of (label, embedding) tuples
    similarity_threshold=0.85,
    margin=0.05
)
# Returns: (label, similarity) or None
```

**Parameters:**

- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: List of (label, embedding) tuples for reference logos
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.85)
- `margin`: Required difference between best and second-best match (default: 0.05)

**Returns:**

- Tuple of (label, similarity) for the best match, or None if:
  - No match is above the threshold, OR
  - The best match doesn't exceed the second-best by the required margin

**Example:**

```python
# Best match: Logo A (0.82), second best: Logo B (0.79)
# With margin=0.05: no match returned (0.82 - 0.79 = 0.03 < 0.05)
# This prevents false positives when multiple logos look similar
```

#### `find_best_match_multi_ref()` - Multi-reference matching

Uses multiple reference images per logo for more robust matching, aggregating similarity scores across references.

```python
match = detector.find_best_match_multi_ref(
    detected_embedding,
    reference_embeddings,  # Dict: logo_name -> list of embeddings
    similarity_threshold=0.85,
    min_matching_refs=1,
    use_mean_similarity=True,
    margin=0.05
)
# Returns: (label, similarity, num_matching_refs) or None
```

**Parameters:**

- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: Dict mapping logo name to a list of embeddings
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.85)
- `min_matching_refs`: Minimum number of references that must match above the threshold (default: 1)
- `use_mean_similarity`: If True, use mean similarity; if False, use max (default: True)
- `margin`: Required difference between best and second-best logo scores (default: 0.0)

**Returns:**

- Tuple of (label, similarity, num_matching_refs) for the best match, or None if:
  - No logo meets the `min_matching_refs` requirement, OR
  - The best score is below the threshold, OR
  - The best score doesn't exceed the second-best by the required margin

**Example:**

```python
# Build multi-ref embeddings dict
multi_ref_embeddings = {
    "Nike": [embedding1, embedding2, embedding3],
    "Adidas": [embedding4, embedding5],
}

match = detector.find_best_match_multi_ref(
    detected_embedding,
    multi_ref_embeddings,
    similarity_threshold=0.80,
    min_matching_refs=2,       # At least 2 refs must match
    use_mean_similarity=True,  # Average across all refs
    margin=0.05                # Require 0.05 margin over second-best logo
)

if match:
    label, avg_similarity, num_refs_matched = match
    print(f"Matched {label} with {avg_similarity:.3f} ({num_refs_matched} refs matched)")
```
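To make the aggregation options concrete, here is a minimal sketch of the per-logo scoring that `min_matching_refs` and `use_mean_similarity` describe (assumed logic built on the documented `compare_embeddings`; `score_logo_refs` is a hypothetical helper, not part of the class):

```python
def score_logo_refs(detector, detected_embedding, ref_embeddings,
                    similarity_threshold=0.85, use_mean_similarity=True):
    # Compare the detection against every reference image of one logo
    sims = [detector.compare_embeddings(detected_embedding, ref)
            for ref in ref_embeddings]
    # Count refs that individually clear the threshold (min_matching_refs)
    num_matching = sum(1 for s in sims if s >= similarity_threshold)
    # Aggregate: mean is harder to satisfy with one lucky match; max is lenient
    score = sum(sims) / len(sims) if use_mean_similarity else max(sims)
    return score, num_matching
```

With `use_mean_similarity=True`, a single near-duplicate reference can't carry the match on its own; every reference image weighs in.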
## Usage Pattern (Similar to Face Recognition)

The class is designed to work with the caching pattern in `scan.py`:

```python
from scan_utils.logo_detection_detr import DetectLogosDETR

# Initialize detector
detector = DetectLogosDETR(logger, detr_threshold=0.5)

# 1. Get embeddings for detected logos (cached per image)
detections = detector.detect(target_image)

# 2. Get/cache reference logo embeddings
reference_embeddings = []
for logo_name, logo_file in reference_logos:  # (name, path) pairs
    # Check cache first (kvstore)
    logo_key = make_image_key("logo_reference", logo_file)
    embedding = kv.get_torch(logo_key)

    if embedding is None:
        # Load and compute embedding
        logo_img = image_processor.load_image_safely(logo_file)
        embedding = detector.get_embedding(logo_img)
        # Cache for future use
        kv.put_torch(logo_key, embedding)

    reference_embeddings.append((logo_name, embedding))

# 3. Match detections against references
matched_logos = []
for detection in detections:
    match = detector.find_best_match(
        detection['embedding'],
        reference_embeddings,
        similarity_threshold=0.7
    )
    if match:
        label, similarity = match
        matched_logos.append({
            'label': label,
            'box': detection['box'],
            'detr_score': detection['score'],
            'clip_similarity': similarity
        })
        # Logo identified!
```

## Caching Strategy

This follows the same caching pattern as facial recognition:

1. **Target Image Embeddings**: Cache DETR detections and CLIP embeddings per image
   - Key: `make_image_key("logo_detection", image_path)`
   - Avoids re-running DETR on the same image
2. **Reference Logo Embeddings**: Cache CLIP embeddings for reference logos
   - Key: `make_image_key("logo_reference", logo_path)`
   - Computed once and reused across all image scans
3. **Benefits**:
   - DETR only runs once per target image
   - CLIP only runs once per reference logo
   - Subsequent scans only perform embedding comparisons (very fast)

## Integration Example

```python
def detect_logos_with_caching(
    detector, img_file, reference_logos, max_size=1920
):
    # Load and resize image
    im_in = image_processor.load_image_safely(img_file)
    img = resize_if_needed_opt(im_in, max_size)

    # Check cache for detections
    detection_key = make_image_key("logo_detection", img_file)
    cached_data = kv.get(detection_key)

    if cached_data:
        # Use cached detections
        detections = json.loads(cached_data)
        logger.debug("Logo detections loaded from cache")
    else:
        # Run detection and cache results. Embeddings are torch tensors
        # and not JSON-serializable, so cache only the JSON-safe fields.
        detections = detector.detect(img)
        serializable = [
            {k: v for k, v in d.items() if k != 'embedding'}
            for d in detections
        ]
        kv.put(detection_key, json.dumps(serializable))

    # Load reference embeddings (with caching)
    reference_embeddings = []
    for logo_name, logo_path in reference_logos:
        ref_key = make_image_key("logo_reference", logo_path)
        embedding = kv.get_torch(ref_key)

        if embedding is None:
            logo_img = image_processor.load_image_safely(logo_path)
            embedding = detector.get_embedding(logo_img)
            kv.put_torch(ref_key, embedding)

        reference_embeddings.append((logo_name, embedding))

    # Match and return results
    return detector.detect_and_match(
        img,
        reference_embeddings,
        similarity_threshold=0.7
    )
```

## Performance Considerations

- **First Run**: Slower (DETR + CLIP inference)
- **Cached Runs**: Much faster (only embedding comparisons)
- **GPU Acceleration**: Automatically uses CUDA if available
- **Memory**: Models are loaded once and reused across all images
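Because `get_embedding` returns normalized embeddings, cosine similarity reduces to a dot product, so a cached run can score one detection against every reference in a single matrix multiply. A hypothetical sketch of that fast path (the `batch_similarities` helper and its batching are assumptions, not part of the class):

```python
import torch

def batch_similarities(detected_embedding, reference_embeddings):
    """Score one [1, 512] detection embedding against all cached references."""
    labels = [label for label, _ in reference_embeddings]
    # Stack the cached [1, 512] reference embeddings into an [N, 512] matrix
    ref_matrix = torch.cat([emb for _, emb in reference_embeddings], dim=0)
    # For normalized embeddings, cosine similarity == dot product
    sims = (detected_embedding @ ref_matrix.T).squeeze(0)  # shape: [N]
    return list(zip(labels, sims.tolist()))
```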