DetectLogosDETR Class Usage Guide
Overview
The DetectLogosDETR class provides logo detection using:
- DETR (DEtection TRansformer) for initial logo region detection
- CLIP (Contrastive Language-Image Pre-training) for feature embeddings and matching
Key Features
1. Constructor - Initializes models with CUDA support
from scan_utils.logo_detection_detr import DetectLogosDETR
detector = DetectLogosDETR(logger, detr_threshold=0.5)
- Automatically detects and uses CUDA if available
- Loads DETR for logo region detection
- Loads CLIP for feature embeddings
detr_threshold: Confidence threshold for DETR detections (0-1, default: 0.5)
2. Main Detection Methods
detect(image) - Detect logos and return embeddings
detections = detector.detect(opencv_image)
# Returns: [{'box': {...}, 'score': 0.95, 'embedding': tensor, 'label': 'logo'}, ...]
Returns a list of dictionaries, each containing:
- box: Dictionary with xmin, ymin, xmax, ymax (pixel coordinates)
- score: DETR confidence score (float, 0-1)
- embedding: CLIP feature embedding (torch.Tensor)
- label: DETR predicted label (string)
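For illustration, a detection list of this shape can be filtered and sorted with plain Python. The dictionaries below are mocked stand-ins for real detect() output (real entries carry torch.Tensor embeddings):

```python
# Mocked detect() output; embeddings omitted for the sake of the example
detections = [
    {'box': {'xmin': 10, 'ymin': 20, 'xmax': 110, 'ymax': 90},
     'score': 0.95, 'embedding': None, 'label': 'logo'},
    {'box': {'xmin': 200, 'ymin': 40, 'xmax': 260, 'ymax': 80},
     'score': 0.42, 'embedding': None, 'label': 'logo'},
]

# Keep only confident detections, highest score first
confident = sorted(
    (d for d in detections if d['score'] >= 0.5),
    key=lambda d: d['score'],
    reverse=True,
)

for d in confident:
    b = d['box']
    width = b['xmax'] - b['xmin']
    height = b['ymax'] - b['ymin']
    print(f"{d['label']}: score={d['score']:.2f}, {width}x{height}px")
```

The 0.5 cutoff here mirrors the constructor's default detr_threshold; detections below it are typically noise.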
get_embedding(image) - Get embedding for reference logos
embedding = detector.get_embedding(reference_logo_image)
# For caching reference logo embeddings
- Takes OpenCV image (BGR format)
- Returns normalized CLIP embedding (torch.Tensor, shape: [1, 512])
- Used to compute embeddings for reference logos that will be cached
compare_embeddings(emb1, emb2) - Compute cosine similarity
similarity = detector.compare_embeddings(detected_emb, reference_emb)
# Returns: float cosine similarity (-1 to 1; higher = more similar)
- Compares two CLIP embeddings
- Returns cosine similarity score (float, range: -1 to 1, typically 0 to 1)
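Cosine similarity itself is just the dot product of the two vectors divided by the product of their norms. A minimal pure-Python sketch of the math (not the class's actual implementation, which operates on torch tensors):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> 1.0; orthogonal -> 0.0; opposite -> -1.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Since get_embedding() returns normalized embeddings, both norms are 1 and the similarity reduces to a plain dot product.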
3. Convenience Methods
find_best_match() - Find best matching reference logo
match = detector.find_best_match(
detected_embedding,
reference_embeddings,
similarity_threshold=0.7
)
# Returns: (label, similarity) or None
Parameters:
- detected_embedding: CLIP embedding from the detected logo region
- reference_embeddings: List of (label, embedding) tuples for reference logos
- similarity_threshold: Minimum similarity to consider a match (0-1, default: 0.7)
Returns:
- Tuple of (label, similarity) for best match, or None if no match above threshold
detect_and_match() - One-step detection and matching
matches = detector.detect_and_match(
image,
reference_embeddings,
similarity_threshold=0.7
)
Convenience method that combines detection and matching in one step.
Returns:
- List of matched detections, each containing:
- box: Bounding box coordinates
- detr_score: DETR confidence score
- clip_similarity: CLIP similarity score
- label: Matched reference logo label
4. Advanced Matching Methods
These methods provide improved accuracy over basic matching.
find_best_match_with_margin() - Margin-based matching
Requires the best match to exceed the second-best by a minimum margin, reducing false positives from ambiguous matches.
match = detector.find_best_match_with_margin(
detected_embedding,
reference_embeddings, # List of (label, embedding) tuples
similarity_threshold=0.85,
margin=0.05
)
# Returns: (label, similarity) or None
Parameters:
- detected_embedding: CLIP embedding from the detected logo region
- reference_embeddings: List of (label, embedding) tuples for reference logos
- similarity_threshold: Minimum similarity to consider a match (0-1, default: 0.85)
- margin: Required difference between the best and second-best match (default: 0.05)
Returns:
- Tuple of (label, similarity) for best match, or None if:
- No match above threshold, OR
- Best match doesn't exceed second-best by the required margin
Example:
# Best match: Logo A (0.82), Second best: Logo B (0.79)
# With margin=0.05: No match returned (0.82 - 0.79 = 0.03 < 0.05)
# This prevents false positives when multiple logos look similar
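The margin rule itself is simple to sketch in plain Python. This is an illustrative reimplementation over precomputed (label, similarity) pairs, not the class's actual code:

```python
def best_match_with_margin(scores, similarity_threshold=0.85, margin=0.05):
    """scores: list of (label, similarity) pairs for one detection."""
    if not scores:
        return None
    ranked = sorted(scores, key=lambda s: s[1], reverse=True)
    best_label, best_sim = ranked[0]
    if best_sim < similarity_threshold:
        return None  # not similar enough to any reference
    if len(ranked) > 1 and best_sim - ranked[1][1] < margin:
        return None  # ambiguous: the runner-up is too close
    return best_label, best_sim

# The ambiguous case from the example above: 0.82 - 0.79 = 0.03 < 0.05
print(best_match_with_margin([("LogoA", 0.82), ("LogoB", 0.79)],
                             similarity_threshold=0.7))  # None
```

With a clear winner (say 0.90 vs 0.79) the same call returns the best pair, since the gap exceeds the margin.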
find_best_match_multi_ref() - Multi-reference matching
Uses multiple reference images per logo for more robust matching, aggregating similarity scores across references.
match = detector.find_best_match_multi_ref(
detected_embedding,
reference_embeddings, # Dict: logo_name -> list of embeddings
similarity_threshold=0.85,
min_matching_refs=1,
use_mean_similarity=True
)
# Returns: (label, similarity, num_matching_refs) or None
Parameters:
- detected_embedding: CLIP embedding from the detected logo region
- reference_embeddings: Dict mapping logo name to a list of embeddings
- similarity_threshold: Minimum similarity to consider a match (0-1, default: 0.85)
- min_matching_refs: Minimum number of references that must match above the threshold (default: 1)
- use_mean_similarity: If True, use the mean similarity; if False, use the max (default: True)
Returns:
- Tuple of (label, similarity, num_matching_refs) for best match, or None if no match meets criteria
Example:
# Build multi-ref embeddings dict
multi_ref_embeddings = {
"Nike": [embedding1, embedding2, embedding3],
"Adidas": [embedding4, embedding5],
}
match = detector.find_best_match_multi_ref(
detected_embedding,
multi_ref_embeddings,
similarity_threshold=0.80,
min_matching_refs=2, # At least 2 refs must match
use_mean_similarity=True # Average across all refs
)
if match:
label, avg_similarity, num_refs_matched = match
print(f"Matched {label} with {avg_similarity:.3f} ({num_refs_matched} refs matched)")
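The aggregation logic can be sketched in plain Python over precomputed per-reference similarity scores. This is one plausible reading of the parameters (mean taken over all references, count of references above the threshold), not the class's actual code:

```python
def best_match_multi_ref(sims_by_logo, similarity_threshold=0.85,
                         min_matching_refs=1, use_mean_similarity=True):
    """sims_by_logo: dict mapping logo name -> list of similarity
    scores, one per reference image of that logo."""
    best = None
    for label, sims in sims_by_logo.items():
        # How many individual references clear the threshold?
        matching = [s for s in sims if s >= similarity_threshold]
        if len(matching) < min_matching_refs:
            continue
        score = sum(sims) / len(sims) if use_mean_similarity else max(sims)
        if best is None or score > best[1]:
            best = (label, score, len(matching))
    return best

sims = {"Nike": [0.90, 0.88, 0.86], "Adidas": [0.90, 0.50]}
print(best_match_multi_ref(sims, similarity_threshold=0.85,
                           min_matching_refs=2))
```

Here "Adidas" is rejected despite one strong score, because only a single reference clears the threshold.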
Usage Pattern (Similar to Face Recognition)
The class is designed to work with the caching pattern in scan.py:
from scan_utils.logo_detection_detr import DetectLogosDETR

# logger, kv, make_image_key, image_processor, target_image and
# reference_logos all come from the surrounding scan.py context

# Initialize detector
detector = DetectLogosDETR(logger, detr_threshold=0.5)

# 1. Get embeddings for detected logos (cached per image)
detections = detector.detect(target_image)

# 2. Get/cache reference logo embeddings
reference_embeddings = []
for logo_name, logo_file in reference_logos:
    # Check cache first (kvstore)
    logo_key = make_image_key("logo_reference", logo_file)
    embedding = kv.get_torch(logo_key)
    if embedding is None:
        # Load and compute embedding
        logo_img = image_processor.load_image_safely(logo_file)
        embedding = detector.get_embedding(logo_img)
        # Cache for future use
        kv.put_torch(logo_key, embedding)
    reference_embeddings.append((logo_name, embedding))

# 3. Match detections against references
matched_logos = []
for detection in detections:
    match = detector.find_best_match(
        detection['embedding'],
        reference_embeddings,
        similarity_threshold=0.7
    )
    if match:
        label, similarity = match
        matched_logos.append({
            'label': label,
            'box': detection['box'],
            'detr_score': detection['score'],
            'clip_similarity': similarity
        })
        # Logo identified!
Caching Strategy
This follows the same caching pattern as facial recognition:
- Target Image Embeddings: Cache DETR detections and CLIP embeddings per image
  - Key: make_image_key("logo_detection", image_path)
  - Avoids re-running DETR on the same image
- Reference Logo Embeddings: Cache CLIP embeddings for reference logos
  - Key: make_image_key("logo_reference", logo_path)
  - Computed once and reused across all image scans
Benefits:
- DETR only runs once per target image
- CLIP only runs once per reference logo
- Subsequent scans only perform embedding comparisons (very fast)
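A minimal sketch of what such a keyed cache might look like. Both the key helper and the store below are hypothetical stand-ins (the real make_image_key and kvstore in scan.py may differ); the point is only the namespaced-key, compute-once pattern:

```python
import hashlib

def make_image_key_sketch(namespace, path):
    # Hypothetical stand-in for make_image_key:
    # namespace prefix + short hash of the file path
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()[:16]
    return f"{namespace}:{digest}"

class InMemoryKV:
    """Toy stand-in for the kvstore used in scan.py."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

kv = InMemoryKV()
key = make_image_key_sketch("logo_reference", "logos/acme.png")
if kv.get(key) is None:
    # Expensive CLIP inference would happen here, exactly once
    kv.put(key, "embedding-placeholder")
print(kv.get(key))  # served from the cache on every later scan
```

Because the key is derived from the namespace and path, the same reference logo always maps to the same entry across scans.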
Integration Example
import json
import torch

# kv, logger, make_image_key, image_processor and resize_if_needed_opt
# come from the surrounding scan.py context

def detect_logos_with_caching(
    detector,
    img_file,
    reference_logos,
    max_size=1920
):
    # Load and resize image
    im_in = image_processor.load_image_safely(img_file)
    img = resize_if_needed_opt(im_in, max_size)

    # Check cache for detections
    detection_key = make_image_key("logo_detection", img_file)
    cached_data = kv.get(detection_key)
    if cached_data:
        # Use cached detections; embeddings were stored as lists,
        # so restore them to tensors
        detections = json.loads(cached_data)
        for det in detections:
            det['embedding'] = torch.tensor(det['embedding'])
        logger.debug("Logo detections loaded from cache")
    else:
        # Run detection and cache results (torch.Tensor is not
        # JSON-serializable, so convert embeddings to lists first)
        detections = detector.detect(img)
        serializable = [
            {**det, 'embedding': det['embedding'].tolist()}
            for det in detections
        ]
        kv.put(detection_key, json.dumps(serializable))

    # Load reference embeddings (with caching)
    reference_embeddings = []
    for logo_name, logo_path in reference_logos:
        ref_key = make_image_key("logo_reference", logo_path)
        embedding = kv.get_torch(ref_key)
        if embedding is None:
            logo_img = image_processor.load_image_safely(logo_path)
            embedding = detector.get_embedding(logo_img)
            kv.put_torch(ref_key, embedding)
        reference_embeddings.append((logo_name, embedding))

    # Match the (possibly cached) detections against the references,
    # so DETR is never re-run on a cache hit
    matched_logos = []
    for detection in detections:
        match = detector.find_best_match(
            detection['embedding'],
            reference_embeddings,
            similarity_threshold=0.7
        )
        if match:
            label, similarity = match
            matched_logos.append({
                'label': label,
                'box': detection['box'],
                'detr_score': detection['score'],
                'clip_similarity': similarity
            })
    return matched_logos
Performance Considerations
- First Run: Slower (DETR + CLIP inference)
- Cached Runs: Much faster (only embedding comparisons)
- GPU Acceleration: Automatically uses CUDA if available
- Memory: Models loaded once and reused across all images