DetectLogosDETR Class Usage Guide
Overview
The DetectLogosDETR class provides logo detection using:
- DETR (DEtection TRansformer) for initial logo region detection
- CLIP (Contrastive Language-Image Pre-training) for feature embeddings and matching
Key Features
1. Constructor - Initializes models with CUDA support
from scan_utils.logo_detection_detr import DetectLogosDETR
detector = DetectLogosDETR(logger, detr_threshold=0.5)
- Automatically detects and uses CUDA if available
- Loads DETR for logo region detection
- Loads CLIP for feature embeddings
detr_threshold: Confidence threshold for DETR detections (0-1, default: 0.5)
2. Main Detection Methods
detect(image) - Detect logos and return embeddings
detections = detector.detect(opencv_image)
# Returns: [{'box': {...}, 'score': 0.95, 'embedding': tensor, 'label': 'logo'}, ...]
Returns a list of dictionaries, each containing:
- box: Dictionary with xmin, ymin, xmax, ymax (pixel coordinates)
- score: DETR confidence score (float, 0-1)
- embedding: CLIP feature embedding (torch.Tensor)
- label: DETR predicted label (string)
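For illustration, a detection list of this shape can be filtered and sorted with plain Python. The dictionaries below are mocked stand-ins for real detect() output (real entries carry torch.Tensor embeddings):

```python
# Mocked detect() output; embeddings omitted for the sake of the example
detections = [
    {'box': {'xmin': 10, 'ymin': 20, 'xmax': 110, 'ymax': 90},
     'score': 0.95, 'embedding': None, 'label': 'logo'},
    {'box': {'xmin': 200, 'ymin': 40, 'xmax': 260, 'ymax': 80},
     'score': 0.42, 'embedding': None, 'label': 'logo'},
]

# Keep only confident detections, highest score first
confident = sorted(
    (d for d in detections if d['score'] >= 0.5),
    key=lambda d: d['score'],
    reverse=True,
)

for d in confident:
    b = d['box']
    width = b['xmax'] - b['xmin']
    height = b['ymax'] - b['ymin']
    print(f"{d['label']}: score={d['score']:.2f}, {width}x{height}px")
```

The 0.5 cutoff here mirrors the constructor's default detr_threshold; detections below it are typically noise.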
get_embedding(image) - Get embedding for reference logos
embedding = detector.get_embedding(reference_logo_image)
# For caching reference logo embeddings
- Takes OpenCV image (BGR format)
- Returns normalized CLIP embedding (torch.Tensor, shape: [1, 512])
- Used to compute embeddings for reference logos that will be cached
compare_embeddings(emb1, emb2) - Compute cosine similarity
similarity = detector.compare_embeddings(detected_emb, reference_emb)
# Returns: float cosine similarity (-1 to 1; higher = more similar)
- Compares two CLIP embeddings
- Returns cosine similarity score (float, range: -1 to 1, typically 0 to 1)
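Cosine similarity itself is just the dot product of the two vectors divided by the product of their norms. A minimal pure-Python sketch of the math (not the class's actual implementation, which operates on torch tensors):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> 1.0; orthogonal -> 0.0; opposite -> -1.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Since get_embedding() returns normalized embeddings, both norms are 1 and the similarity reduces to a plain dot product.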
3. Convenience Methods
find_best_match() - Find best matching reference logo
match = detector.find_best_match(
detected_embedding,
reference_embeddings,
similarity_threshold=0.7
)
# Returns: (label, similarity) or None
Parameters:
- detected_embedding: CLIP embedding from the detected logo region
- reference_embeddings: List of (label, embedding) tuples for reference logos
- similarity_threshold: Minimum similarity to consider a match (0-1, default: 0.7)
Returns:
- Tuple of (label, similarity) for best match, or None if no match above threshold
detect_and_match() - One-step detection and matching
matches = detector.detect_and_match(
image,
reference_embeddings,
similarity_threshold=0.7
)
Convenience method that combines detection and matching in one step.
Returns:
- List of matched detections, each containing:
- box: Bounding box coordinates
- detr_score: DETR confidence score
- clip_similarity: CLIP similarity score
- label: Matched reference logo label
4. Advanced Matching Methods
These methods provide improved accuracy over basic matching.
find_best_match_with_margin() - Margin-based matching
Requires the best match to exceed the second-best by a minimum margin, reducing false positives from ambiguous matches.
match = detector.find_best_match_with_margin(
detected_embedding,
reference_embeddings, # List of (label, embedding) tuples
similarity_threshold=0.85,
margin=0.05
)
# Returns: (label, similarity) or None
Parameters:
- detected_embedding: CLIP embedding from the detected logo region
- reference_embeddings: List of (label, embedding) tuples for reference logos
- similarity_threshold: Minimum similarity to consider a match (0-1, default: 0.85)
- margin: Required difference between the best and second-best match (default: 0.05)
Returns:
- Tuple of (label, similarity) for best match, or None if:
- No match above threshold, OR
- Best match doesn't exceed second-best by the required margin
Example:
# Best match: Logo A (0.82), Second best: Logo B (0.79)
# With margin=0.05: No match returned (0.82 - 0.79 = 0.03 < 0.05)
# This prevents false positives when multiple logos look similar
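The margin rule itself is simple to sketch in plain Python. This is an illustrative reimplementation over precomputed (label, similarity) pairs, not the class's actual code:

```python
def best_match_with_margin(scores, similarity_threshold=0.85, margin=0.05):
    """scores: list of (label, similarity) pairs for one detection."""
    if not scores:
        return None
    ranked = sorted(scores, key=lambda s: s[1], reverse=True)
    best_label, best_sim = ranked[0]
    if best_sim < similarity_threshold:
        return None  # not similar enough to any reference
    if len(ranked) > 1 and best_sim - ranked[1][1] < margin:
        return None  # ambiguous: the runner-up is too close
    return best_label, best_sim

# The ambiguous case from the example above: 0.82 - 0.79 = 0.03 < 0.05
print(best_match_with_margin([("LogoA", 0.82), ("LogoB", 0.79)],
                             similarity_threshold=0.7))  # None
```

With a clear winner (say 0.90 vs 0.79) the same call returns the best pair, since the gap exceeds the margin.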
find_best_match_multi_ref() - Multi-reference matching
Uses multiple reference images per logo for more robust matching, aggregating similarity scores across references.
match = detector.find_best_match_multi_ref(
detected_embedding,
reference_embeddings, # Dict: logo_name -> list of embeddings
similarity_threshold=0.85,
min_matching_refs=1,
use_mean_similarity=True
)
# Returns: (label, similarity, num_matching_refs) or None
Parameters:
- detected_embedding: CLIP embedding from the detected logo region
- reference_embeddings: Dict mapping logo name to a list of embeddings
- similarity_threshold: Minimum similarity to consider a match (0-1, default: 0.85)
- min_matching_refs: Minimum number of references that must match above the threshold (default: 1)
- use_mean_similarity: If True, use the mean similarity; if False, use the max (default: True)
Returns:
- Tuple of (label, similarity, num_matching_refs) for best match, or None if no match meets criteria
Example:
# Build multi-ref embeddings dict
multi_ref_embeddings = {
"Nike": [embedding1, embedding2, embedding3],
"Adidas": [embedding4, embedding5],
}
match = detector.find_best_match_multi_ref(
detected_embedding,
multi_ref_embeddings,
similarity_threshold=0.80,
min_matching_refs=2, # At least 2 refs must match
use_mean_similarity=True # Average across all refs
)
if match:
label, avg_similarity, num_refs_matched = match
print(f"Matched {label} with {avg_similarity:.3f} ({num_refs_matched} refs matched)")
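The aggregation logic can be sketched in plain Python over precomputed per-reference similarity scores. This is one plausible reading of the parameters (mean taken over all references, count of references above the threshold), not the class's actual code:

```python
def best_match_multi_ref(sims_by_logo, similarity_threshold=0.85,
                         min_matching_refs=1, use_mean_similarity=True):
    """sims_by_logo: dict mapping logo name -> list of similarity
    scores, one per reference image of that logo."""
    best = None
    for label, sims in sims_by_logo.items():
        # How many individual references clear the threshold?
        matching = [s for s in sims if s >= similarity_threshold]
        if len(matching) < min_matching_refs:
            continue
        score = sum(sims) / len(sims) if use_mean_similarity else max(sims)
        if best is None or score > best[1]:
            best = (label, score, len(matching))
    return best

sims = {"Nike": [0.90, 0.88, 0.86], "Adidas": [0.90, 0.50]}
print(best_match_multi_ref(sims, similarity_threshold=0.85,
                           min_matching_refs=2))
```

Here "Adidas" is rejected despite one strong score, because only a single reference clears the threshold.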
Usage Pattern (Similar to Face Recognition)
The class is designed to work with the caching pattern in scan.py:
from scan_utils.logo_detection_detr import DetectLogosDETR

# logger, kv, make_image_key, image_processor, target_image and
# reference_logos all come from the surrounding scan.py context

# Initialize detector
detector = DetectLogosDETR(logger, detr_threshold=0.5)

# 1. Get embeddings for detected logos (cached per image)
detections = detector.detect(target_image)

# 2. Get/cache reference logo embeddings
reference_embeddings = []
for logo_name, logo_file in reference_logos:
    # Check cache first (kvstore)
    logo_key = make_image_key("logo_reference", logo_file)
    embedding = kv.get_torch(logo_key)
    if embedding is None:
        # Load and compute embedding
        logo_img = image_processor.load_image_safely(logo_file)
        embedding = detector.get_embedding(logo_img)
        # Cache for future use
        kv.put_torch(logo_key, embedding)
    reference_embeddings.append((logo_name, embedding))

# 3. Match detections against references
matched_logos = []
for detection in detections:
    match = detector.find_best_match(
        detection['embedding'],
        reference_embeddings,
        similarity_threshold=0.7
    )
    if match:
        label, similarity = match
        matched_logos.append({
            'label': label,
            'box': detection['box'],
            'detr_score': detection['score'],
            'clip_similarity': similarity
        })
        # Logo identified!
Caching Strategy
This follows the same caching pattern as facial recognition:
- Target Image Embeddings: Cache DETR detections and CLIP embeddings per image
  - Key: make_image_key("logo_detection", image_path)
  - Avoids re-running DETR on the same image
- Reference Logo Embeddings: Cache CLIP embeddings for reference logos
  - Key: make_image_key("logo_reference", logo_path)
  - Computed once and reused across all image scans
Benefits:
- DETR only runs once per target image
- CLIP only runs once per reference logo
- Subsequent scans only perform embedding comparisons (very fast)
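A minimal sketch of what such a keyed cache might look like. Both the key helper and the store below are hypothetical stand-ins (the real make_image_key and kvstore in scan.py may differ); the point is only the namespaced-key, compute-once pattern:

```python
import hashlib

def make_image_key_sketch(namespace, path):
    # Hypothetical stand-in for make_image_key:
    # namespace prefix + short hash of the file path
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()[:16]
    return f"{namespace}:{digest}"

class InMemoryKV:
    """Toy stand-in for the kvstore used in scan.py."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

kv = InMemoryKV()
key = make_image_key_sketch("logo_reference", "logos/acme.png")
if kv.get(key) is None:
    # Expensive CLIP inference would happen here, exactly once
    kv.put(key, "embedding-placeholder")
print(kv.get(key))  # served from the cache on every later scan
```

Because the key is derived from the namespace and path, the same reference logo always maps to the same entry across scans.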
Integration Example
import json
import torch

# kv, logger, make_image_key, image_processor and resize_if_needed_opt
# come from the surrounding scan.py context

def detect_logos_with_caching(
    detector,
    img_file,
    reference_logos,
    max_size=1920
):
    # Load and resize image
    im_in = image_processor.load_image_safely(img_file)
    img = resize_if_needed_opt(im_in, max_size)

    # Check cache for detections
    detection_key = make_image_key("logo_detection", img_file)
    cached_data = kv.get(detection_key)
    if cached_data:
        # Use cached detections; embeddings were stored as lists,
        # so restore them to tensors
        detections = json.loads(cached_data)
        for det in detections:
            det['embedding'] = torch.tensor(det['embedding'])
        logger.debug("Logo detections loaded from cache")
    else:
        # Run detection and cache results (torch.Tensor is not
        # JSON-serializable, so convert embeddings to lists first)
        detections = detector.detect(img)
        serializable = [
            {**det, 'embedding': det['embedding'].tolist()}
            for det in detections
        ]
        kv.put(detection_key, json.dumps(serializable))

    # Load reference embeddings (with caching)
    reference_embeddings = []
    for logo_name, logo_path in reference_logos:
        ref_key = make_image_key("logo_reference", logo_path)
        embedding = kv.get_torch(ref_key)
        if embedding is None:
            logo_img = image_processor.load_image_safely(logo_path)
            embedding = detector.get_embedding(logo_img)
            kv.put_torch(ref_key, embedding)
        reference_embeddings.append((logo_name, embedding))

    # Match the (possibly cached) detections against the references,
    # so DETR is never re-run on a cache hit
    matched_logos = []
    for detection in detections:
        match = detector.find_best_match(
            detection['embedding'],
            reference_embeddings,
            similarity_threshold=0.7
        )
        if match:
            label, similarity = match
            matched_logos.append({
                'label': label,
                'box': detection['box'],
                'detr_score': detection['score'],
                'clip_similarity': similarity
            })
    return matched_logos
Performance Considerations
- First Run: Slower (DETR + CLIP inference)
- Cached Runs: Much faster (only embedding comparisons)
- GPU Acceleration: Automatically uses CUDA if available
- Memory: Models loaded once and reused across all images