# DetectLogosDETR Class Usage Guide
## Overview

The `DetectLogosDETR` class provides logo detection using:

- **DETR** (DEtection TRansformer) for initial logo region detection
- **CLIP** (Contrastive Language-Image Pre-training) for feature embeddings and matching

## Key Features
### 1. **Constructor** - Initializes models with CUDA support

```python
from scan_utils.logo_detection_detr import DetectLogosDETR

detector = DetectLogosDETR(logger, detr_threshold=0.5)
```

- Automatically detects and uses CUDA if available
- Loads DETR for logo region detection
- Loads CLIP for feature embeddings
- `detr_threshold`: Confidence threshold for DETR detections (0-1, default: 0.5)
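The CUDA auto-detection presumably follows the standard PyTorch idiom; a minimal sketch (illustrative only, not the actual constructor code):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA when available, otherwise fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```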
### 2. **Main Detection Methods**

#### `detect(image)` - Detect logos and return embeddings

```python
detections = detector.detect(opencv_image)
# Returns: [{'box': {...}, 'score': 0.95, 'embedding': tensor, 'label': 'logo'}, ...]
```

Returns a list of dictionaries, each containing:

- `box`: Dictionary with `xmin`, `ymin`, `xmax`, `ymax` (pixel coordinates)
- `score`: DETR confidence score (float, 0-1)
- `embedding`: CLIP feature embedding (torch.Tensor)
- `label`: DETR predicted label (string)
#### `get_embedding(image)` - Get embedding for reference logos

```python
embedding = detector.get_embedding(reference_logo_image)
# For caching reference logo embeddings
```

- Takes an OpenCV image (BGR format)
- Returns a normalized CLIP embedding (torch.Tensor, shape: [1, 512])
- Used to compute embeddings for reference logos that will be cached
#### `compare_embeddings(emb1, emb2)` - Compute cosine similarity

```python
similarity = detector.compare_embeddings(detected_emb, reference_emb)
# Returns: float (higher = more similar)
```

- Compares two CLIP embeddings
- Returns cosine similarity score (float, range: -1 to 1, typically 0 to 1)
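A minimal sketch of what `compare_embeddings` likely computes, assuming the embeddings are [1, 512] tensors (the defensive re-normalization is an assumption; `get_embedding` is documented to return normalized vectors already):

```python
import torch

def cosine_similarity(emb1: torch.Tensor, emb2: torch.Tensor) -> float:
    """Cosine similarity between two [1, D] embedding tensors."""
    # Normalize defensively in case the inputs are not unit-length
    emb1 = emb1 / emb1.norm(dim=-1, keepdim=True)
    emb2 = emb2 / emb2.norm(dim=-1, keepdim=True)
    # Dot product of unit vectors = cosine of the angle between them
    return (emb1 @ emb2.T).item()
```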
### 3. **Convenience Methods**

#### `find_best_match()` - Find best matching reference logo

```python
match = detector.find_best_match(
    detected_embedding,
    reference_embeddings,
    similarity_threshold=0.7
)
# Returns: (label, similarity) or None
```

**Parameters:**

- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: List of (label, embedding) tuples for reference logos
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.7)

**Returns:**

- Tuple of (label, similarity) for the best match, or None if no match is above the threshold
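The selection logic described above can be sketched as follows (a hypothetical reimplementation for illustration; `compare` stands in for `compare_embeddings`, and the real method may differ):

```python
def find_best_match(detected_embedding, reference_embeddings,
                    similarity_threshold=0.7, compare=None):
    """Return (label, similarity) of the closest reference, or None."""
    best = None  # (label, similarity)
    for label, ref_emb in reference_embeddings:
        sim = compare(detected_embedding, ref_emb)
        if best is None or sim > best[1]:
            best = (label, sim)
    # Reject weak matches below the similarity threshold
    if best is None or best[1] < similarity_threshold:
        return None
    return best
```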
#### `detect_and_match()` - One-step detection and matching

```python
matches = detector.detect_and_match(
    image,
    reference_embeddings,
    similarity_threshold=0.7
)
```

Convenience method that combines detection and matching in one step.

**Returns:**

- List of matched detections, each containing:
  - `box`: Bounding box coordinates
  - `detr_score`: DETR confidence score
  - `clip_similarity`: CLIP similarity score
  - `label`: Matched reference logo label
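Conceptually, `detect_and_match` likely just chains `detect()` with per-detection matching; a hypothetical sketch (the free-function form and argument order are illustrative):

```python
def detect_and_match(detector, image, reference_embeddings, similarity_threshold=0.7):
    """Run detection, then match each detection against the references."""
    results = []
    for det in detector.detect(image):
        match = detector.find_best_match(
            det['embedding'], reference_embeddings, similarity_threshold
        )
        if match:
            label, similarity = match
            results.append({
                'box': det['box'],
                'detr_score': det['score'],
                'clip_similarity': similarity,
                'label': label,
            })
    return results
```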
### 4. **Advanced Matching Methods**

These methods provide improved accuracy over basic matching.

#### `find_best_match_with_margin()` - Margin-based matching

Requires the best match to exceed the second-best by a minimum margin, reducing false positives from ambiguous matches.
```python
match = detector.find_best_match_with_margin(
    detected_embedding,
    reference_embeddings,  # List of (label, embedding) tuples
    similarity_threshold=0.85,
    margin=0.05
)
# Returns: (label, similarity) or None
```

**Parameters:**

- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: List of (label, embedding) tuples for reference logos
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.85)
- `margin`: Required difference between best and second-best match (default: 0.05)

**Returns:**

- Tuple of (label, similarity) for the best match, or None if:
  - No match is above the threshold, OR
  - The best match doesn't exceed the second-best by the required margin

**Example:**

```python
# Best match: Logo A (0.82), Second best: Logo B (0.79)
# With margin=0.05: No match returned (0.82 - 0.79 = 0.03 < 0.05)
# This prevents false positives when multiple logos look similar
```
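The margin rule can be sketched as a thin filter over ranked similarity scores (hypothetical code; `apply_margin` is an illustrative name, not part of the class):

```python
def apply_margin(scores, similarity_threshold=0.85, margin=0.05):
    """scores: list of (label, similarity). Return the best (label, sim) or None."""
    if not scores:
        return None
    ranked = sorted(scores, key=lambda s: s[1], reverse=True)
    best = ranked[0]
    # Threshold check: the best match must itself be strong enough
    if best[1] < similarity_threshold:
        return None
    # Margin check: the best must beat the runner-up by at least `margin`
    if len(ranked) > 1 and best[1] - ranked[1][1] < margin:
        return None
    return best
```

With the example above (0.82 vs 0.79 and `margin=0.05`), the 0.03 gap fails the margin check and no match is returned.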
#### `find_best_match_multi_ref()` - Multi-reference matching

Uses multiple reference images per logo for more robust matching, aggregating similarity scores across references.

```python
match = detector.find_best_match_multi_ref(
    detected_embedding,
    reference_embeddings,  # Dict: logo_name -> list of embeddings
    similarity_threshold=0.85,
    min_matching_refs=1,
    use_mean_similarity=True,
    margin=0.05
)
# Returns: (label, similarity, num_matching_refs) or None
```

**Parameters:**

- `detected_embedding`: CLIP embedding from detected logo region
- `reference_embeddings`: Dict mapping logo name to list of embeddings
- `similarity_threshold`: Minimum similarity to consider a match (0-1, default: 0.85)
- `min_matching_refs`: Minimum number of references that must match above threshold (default: 1)
- `use_mean_similarity`: If True, use mean similarity across references; if False, use max (default: True)
- `margin`: Required difference between best and second-best logo scores (default: 0.0)

**Returns:**

- Tuple of (label, similarity, num_matching_refs) for the best match, or None if:
  - No logo meets the `min_matching_refs` requirement, OR
  - The best score is below the threshold, OR
  - The best score doesn't exceed the second-best by the required margin
**Example:**

```python
# Build multi-ref embeddings dict
multi_ref_embeddings = {
    "Nike": [embedding1, embedding2, embedding3],
    "Adidas": [embedding4, embedding5],
}

match = detector.find_best_match_multi_ref(
    detected_embedding,
    multi_ref_embeddings,
    similarity_threshold=0.80,
    min_matching_refs=2,       # At least 2 refs must match
    use_mean_similarity=True,  # Average across all refs
    margin=0.05                # Require 0.05 margin over second-best logo
)

if match:
    label, avg_similarity, num_refs_matched = match
    print(f"Matched {label} with {avg_similarity:.3f} ({num_refs_matched} refs matched)")
```
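The aggregation described above can be sketched as follows (a hypothetical reimplementation for illustration; `compare` stands in for `compare_embeddings`, and the real method may order its checks differently):

```python
def multi_ref_score(detected_embedding, reference_embeddings, compare,
                    similarity_threshold=0.85, min_matching_refs=1,
                    use_mean_similarity=True, margin=0.0):
    """reference_embeddings: dict logo_name -> list of embeddings.
    Returns (label, score, num_matching_refs) or None."""
    candidates = []
    for name, refs in reference_embeddings.items():
        sims = [compare(detected_embedding, r) for r in refs]
        num_matching = sum(s >= similarity_threshold for s in sims)
        # Require enough individual references to agree
        if num_matching < min_matching_refs:
            continue
        score = sum(sims) / len(sims) if use_mean_similarity else max(sims)
        candidates.append((name, score, num_matching))
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[1], reverse=True)
    best = candidates[0]
    if best[1] < similarity_threshold:
        return None
    # Margin check against the second-best *logo* (not second-best reference)
    if len(candidates) > 1 and best[1] - candidates[1][1] < margin:
        return None
    return best
```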
## Usage Pattern (Similar to Face Recognition)

The class is designed to work with the caching pattern in scan.py:

```python
from scan_utils.logo_detection_detr import DetectLogosDETR

# Initialize detector
detector = DetectLogosDETR(logger, detr_threshold=0.5)

# 1. Get embeddings for detected logos (cached per image)
detections = detector.detect(target_image)

# 2. Get/cache reference logo embeddings
reference_embeddings = []
for logo_name, logo_file in reference_logos:
    # Check cache first (kvstore)
    logo_key = make_image_key("logo_reference", logo_file)
    embedding = kv.get_torch(logo_key)

    if embedding is None:
        # Load and compute embedding
        logo_img = image_processor.load_image_safely(logo_file)
        embedding = detector.get_embedding(logo_img)

        # Cache for future use
        kv.put_torch(logo_key, embedding)

    reference_embeddings.append((logo_name, embedding))

# 3. Match detections against references
matched_logos = []
for detection in detections:
    match = detector.find_best_match(
        detection['embedding'],
        reference_embeddings,
        similarity_threshold=0.7
    )

    if match:
        # Logo identified!
        label, similarity = match
        matched_logos.append({
            'label': label,
            'box': detection['box'],
            'detr_score': detection['score'],
            'clip_similarity': similarity
        })
```
## Caching Strategy

This follows the same caching pattern as facial recognition:

1. **Target Image Embeddings**: Cache DETR detections and CLIP embeddings per image
   - Key: `make_image_key("logo_detection", image_path)`
   - Avoids re-running DETR on the same image

2. **Reference Logo Embeddings**: Cache CLIP embeddings for reference logos
   - Key: `make_image_key("logo_reference", logo_path)`
   - Computed once and reused across all image scans

3. **Benefits**:
   - DETR only runs once per target image
   - CLIP only runs once per reference logo
   - Subsequent scans only perform embedding comparisons (very fast)
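`make_image_key` is assumed to derive a stable cache key from a namespace plus a file path; one plausible sketch (hypothetical — the real helper may hash file contents or modification time instead, which would also invalidate the cache when the image changes):

```python
import hashlib
import os

def make_image_key(namespace: str, image_path: str) -> str:
    """Stable cache key: namespace prefix + short hash of the absolute path."""
    digest = hashlib.sha256(os.path.abspath(image_path).encode("utf-8")).hexdigest()
    return f"{namespace}:{digest[:16]}"
```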
## Integration Example

```python
def detect_logos_with_caching(
    detector,
    img_file,
    reference_logos,
    max_size=1920
):
    # Load and resize image
    im_in = image_processor.load_image_safely(img_file)
    img = resize_if_needed_opt(im_in, max_size)

    # Check cache for detections
    detection_key = make_image_key("logo_detection", img_file)
    cached_data = kv.get(detection_key)

    if cached_data:
        # Use cached detections
        detections = json.loads(cached_data)
        logger.debug("Logo detections loaded from cache")
    else:
        # Run detection and cache results
        # (note: tensor embeddings are not JSON-serializable and must be
        # converted, e.g. via .tolist(), before caching)
        detections = detector.detect(img)
        kv.put(detection_key, json.dumps(detections))

    # Load reference embeddings (with caching)
    reference_embeddings = []
    for logo_name, logo_path in reference_logos:
        ref_key = make_image_key("logo_reference", logo_path)
        embedding = kv.get_torch(ref_key)

        if embedding is None:
            logo_img = image_processor.load_image_safely(logo_path)
            embedding = detector.get_embedding(logo_img)
            kv.put_torch(ref_key, embedding)

        reference_embeddings.append((logo_name, embedding))

    # Match and return results
    return detector.detect_and_match(
        img,
        reference_embeddings,
        similarity_threshold=0.7
    )
```
## Performance Considerations

- **First Run**: Slower (DETR + CLIP inference)
- **Cached Runs**: Much faster (only embedding comparisons)
- **GPU Acceleration**: Automatically uses CUDA if available
- **Memory**: Models loaded once and reused across all images