# Logo Detection Test Framework
A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.
## Recommended Settings
Based on extensive testing with the LogoDet-3K dataset, these are the optimal settings:
| Parameter | Recommended Value | Notes |
|---|---|---|
| Matching Method | `multi-ref` | Best balance of precision and recall |
| Similarity Aggregation | max (default) | Max outperforms mean aggregation |
| Embedding Model | `openai/clip-vit-large-patch14` | Significantly outperforms DINOv2 |
| CLIP Threshold | 0.70 | Good precision/recall balance |
| DETR Threshold | 0.50 | Default detection confidence |
| Margin | 0.05 | Reduces false positives |
| Refs per Logo | 7-10 | More references = better accuracy |
| Preprocessing | default | Best precision; letterbox/stretch hurt precision |
Example command with recommended settings:

```bash
uv run python test_logo_detection.py \
    --matching-method multi-ref \
    --refs-per-logo 10 \
    --threshold 0.70 \
    --margin 0.05 \
    --use-max-similarity
```
## Performance Benchmarks
With recommended settings (multi-ref max, threshold 0.70, margin 0.05):
| Refs/Logo | Precision | Recall | F1 Score |
|---|---|---|---|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | 51.0% | 79.9% | 62.3% |
| 10 | 50.2% | 81.6% | 62.1% |
Key findings:
- More reference images per logo consistently improve recall
- 7+ refs provide the best precision/recall balance
- Diminishing returns set in beyond 10 refs
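The aggregation step behind these numbers can be sketched in a few lines. This is an illustrative reimplementation, not the framework's actual code; the `match_logo` helper and its signature are hypothetical. For each candidate logo, the crop's similarity to every reference embedding is reduced with `max` (or `mean`), and the best-scoring logo is accepted only if it clears the threshold:

```python
import numpy as np

def match_logo(crop_emb, ref_embs_by_logo, threshold=0.70, use_max=True):
    """Pick the best-matching logo for one detected crop.

    crop_emb: (d,) L2-normalized embedding of the detected region.
    ref_embs_by_logo: dict mapping logo name -> (n_refs, d) normalized embeddings.
    """
    best_logo, best_score = None, -1.0
    for logo, refs in ref_embs_by_logo.items():
        sims = refs @ crop_emb  # cosine similarity against each reference
        score = sims.max() if use_max else sims.mean()
        if score > best_score:
            best_logo, best_score = logo, score
    # Accept only if the aggregated similarity clears the threshold
    return (best_logo, best_score) if best_score >= threshold else (None, best_score)
```

With `use_max=True`, a single close reference is enough to match, which is why adding references mostly helps recall.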
## Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|---|---|---|---|---|
| `simple` | 1.3% | 203%* | 2.5% | Not recommended (too many FPs) |
| `margin` | 69.8% | 16.3% | 26.4% | High precision, low recall |
| `multi-ref` (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| `multi-ref` (max) | 51.8% | 75.3% | 61.4% | Best overall |
*Simple method returns all matches above threshold, causing many duplicates.
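The margin method's behavior can be illustrated with a small sketch (the `margin_match` helper is hypothetical, not the framework's code): the top-scoring logo is accepted only if it both clears the threshold and beats the runner-up by the configured margin, which is what trades recall for precision.

```python
def margin_match(scores, threshold=0.70, margin=0.05):
    """Accept the top-scoring logo only if it beats the runner-up by `margin`.

    scores: dict of logo name -> aggregated similarity.
    Returns the accepted logo name, or None if no logo qualifies.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_logo, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    if best >= threshold and (best - second) >= margin:
        return best_logo
    return None
```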
## Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|---|---|---|---|---|
| `openai/clip-vit-large-patch14` | 49.1% | 77.0% | 59.9% | Recommended |
| `facebook/dinov2-small` | 22.4% | 42.8% | 29.5% | Not recommended |
| `facebook/dinov2-large` | 32.2% | 28.5% | 30.2% | Not recommended |
CLIP significantly outperforms DINOv2 for logo matching tasks.
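For reference, extracting a normalized CLIP embedding with the Hugging Face `transformers` API looks roughly like this. The `embed` helper is hypothetical (the framework's actual code may differ), and the smaller base model is used here to keep the sketch light:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # smaller CLIP variant for illustration
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def embed(image: Image.Image) -> torch.Tensor:
    """Return an L2-normalized CLIP image embedding for one crop."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)
```

Normalizing the embeddings lets matching reduce to a dot product (cosine similarity).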
## Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|---|---|---|---|---|
| `default` | 50.2% | 81.6% | 62.1% | Recommended; best precision |
| `letterbox` | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| `stretch` | 34.5% | 113%* | 52.9% | Not recommended |
*Recall >100% indicates multiple detections per expected logo.
Recommendation: Use default preprocessing. While letterbox shows marginally higher F1, it has significantly worse precision (more false positives).
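For clarity, one plausible implementation of the letterbox mode (the framework's own implementation may differ): the crop is resized with its aspect ratio preserved, then padded to a square canvas rather than stretched.

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 224, fill=(0, 0, 0)) -> Image.Image:
    """Resize preserving aspect ratio, then pad to a square canvas."""
    scale = size / max(img.size)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    # Center the resized image; the remainder stays as padding
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```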
## Overview
This project provides tools to:
- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics
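The evaluation metrics are computed in the standard way from true-positive, false-positive, and false-negative counts; a minimal helper (hypothetical name, for illustration):

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Note that when a logo is detected more than once per expected instance, true positives can exceed expected positives, which is how recall values above 100% arise in the tables above.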
## Architecture
The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares them against reference logos
## Installation

Requires Python 3.12+. Uses `uv` for package management.
```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```
## Usage

### Prepare Test Data
The test framework requires the LogoDet-3K dataset. Download it and place it in the project directory:
```
logo_test/
├── LogoDet-3K/          # Dataset directory (required)
│   ├── Clothes/         # Category directories
│   │   ├── Adidas/      # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```
The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.
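Pascal VOC annotations are plain XML, with one `<object>` element per logo instance; a minimal parser (the `parse_voc_boxes` helper is illustrative, not part of the framework) looks like this:

```python
import xml.etree.ElementTree as ET

def parse_voc_boxes(xml_text: str) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Extract (logo name, (xmin, ymin, xmax, ymax)) pairs from a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes
```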
Then run the preparation script:
```bash
uv run python prepare_test_data.py
```
This script:

- Scans `LogoDet-3K/` for images and XML annotation files
- Extracts cropped logo regions using bounding box data and saves them to `reference_logos/`
- Copies full images to `test_images/`
- Creates the `test_data_mapping.db` SQLite database with ground truth mappings
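The ground-truth database can then be queried at test time. The schema below is a hypothetical illustration; the actual columns in `test_data_mapping.db` may differ:

```python
import sqlite3

def build_mapping_db(path, rows):
    """Create a ground-truth table mapping test images to expected logos.

    NOTE: hypothetical schema for illustration only.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ground_truth (
               image_path TEXT,
               logo_name  TEXT,
               ref_path   TEXT
           )"""
    )
    conn.executemany("INSERT INTO ground_truth VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def expected_logos(conn, image_path):
    """Return the logo names expected to appear in a given test image."""
    cur = conn.execute(
        "SELECT logo_name FROM ground_truth WHERE image_path = ?", (image_path,)
    )
    return [r[0] for r in cur.fetchall()]
```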
### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42
```
### Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | Similarity threshold for matching |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `-e, --embedding-model` | `openai/clip-vit-large-patch14` | Embedding model (CLIP or DINOv2) |
| `--matching-method` | `margin` | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Margin over second-best match (margin/multi-ref) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--min-matching-refs` | 1 | Min refs that must match (multi-ref only) |
| `--use-max-similarity` | False | Use max instead of mean similarity (multi-ref only) |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `-s, --seed` | None | Random seed for reproducibility |
| `--output-file` | None | Append results summary to file (clean output) |
| `--clear-cache` | False | Clear embedding cache before running |
Matching methods:

- `simple` - Returns all logos above threshold (not recommended; too many false positives)
- `margin` - Requires a margin over the second-best match (high precision, low recall)
- `multi-ref` - Recommended. Aggregates scores across multiple reference images per logo

See `--help` for all options.
### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh
```
| Script | Purpose | Output File |
|---|---|---|
| `run_comparison_tests.sh` | Compare matching methods | `test_results/comparison_*.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `test_results/threshold_*.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `test_results/model_comparison_results.txt` |
| `run_refs_per_logo_test.sh` | Test refs-per-logo values | `test_results/refs_per_logo_analysis.txt` |
| `run_preprocess_test.sh` | Compare preprocessing modes | `test_results/preprocessing_comparison.txt` |
## Project Structure

```
logo_test/
├── logo_detection_detr.py               # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py               # Test script for accuracy evaluation
├── prepare_test_data.py                 # Script to prepare test database
├── run_comparison_tests.sh              # Compare all matching methods
├── run_threshold_tests.sh               # Test threshold/margin combinations
├── run_model_comparison.sh              # Compare CLIP vs DINOv2 models
├── test_data_mapping.db                 # SQLite database with ground truth
├── reference_logos/                     # Reference logo images (not in git)
├── test_images/                         # Test images (not in git)
├── LogoDet-3K/                          # Source dataset (not in git)
├── logo_detection_detr_usage.md         # API usage guide
├── logo_detection_test_methodology.md   # Test methodology documentation
└── test_results_analysis.md             # Analysis of test results
```
## Accuracy Improvement Techniques
The framework implements several techniques to improve detection accuracy:
- Non-Maximum Suppression (NMS) - Removes overlapping duplicate detections
- Minimum Box Size Filtering - Filters out noise from tiny detections
- Confidence Threshold Filtering - Removes low-confidence detections
- Multiple Reference Images - Uses multiple refs per logo for robust matching
- Margin-Based Matching - Requires confidence margin over second-best match
- Multi-Ref Matching - Aggregates similarity scores across references
- Embedding Caching - Caches embeddings to avoid recomputation
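Of these, NMS is the most self-contained to illustrate. A standard greedy implementation (the `iou` and `nms` helpers are illustrative, not the framework's code) keeps the highest-scoring box in each cluster of overlapping detections:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep this box only if it doesn't overlap a higher-scoring kept box
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```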
## Models

### Detection Model

- DETR: `Pravallika6/detr-finetuned-logo-detection_v2`

### Embedding Models (selectable via `-e/--embedding-model`)
| Model | Type | Description |
|---|---|---|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |
Models are automatically downloaded from Hugging Face on first run and cached in `~/.cache/huggingface/`.

**Note:** When switching between embedding models, pass `--clear-cache` so cached embeddings are recomputed with the new model.
## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the `DetectLogosDETR` class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of the test framework and tuning
## License
MIT