# Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.
## Overview
This project provides tools to:
- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics
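The evaluation metrics follow the standard definitions over true-positive, false-positive, and false-negative counts. A minimal sketch (the helper name `precision_recall_f1` is hypothetical, not the project's API):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```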
## Architecture

The system uses a two-stage pipeline:

1. **DETR** - identifies potential logo regions (bounding boxes) in images
2. **CLIP** - extracts feature embeddings for each detected region and compares them against reference logos
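The second stage reduces to a cosine-similarity lookup over reference embeddings. The sketch below is illustrative only; the function name `best_match` and the NumPy array layout are assumptions, not the project's actual API:

```python
import numpy as np

def best_match(region_emb: np.ndarray, ref_embs: np.ndarray, threshold: float = 0.7):
    """Return (index, similarity) of the best-matching reference logo, or None.

    region_emb: (D,) embedding of one detected region.
    ref_embs:   (N, D) embeddings of the reference logos.
    """
    # Normalize so the dot product equals cosine similarity.
    region = region_emb / np.linalg.norm(region_emb)
    refs = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    sims = refs @ region
    best = int(np.argmax(sims))
    return (best, float(sims[best])) if sims[best] >= threshold else None
```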
## Installation

Requires Python 3.12+. Uses `uv` for package management.

```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```
## Usage

### Prepare Test Data
The test framework requires the LogoDet-3K dataset. Download it and place it in the project directory:
```
logo_test/
├── LogoDet-3K/          # Dataset directory (required)
│   ├── Clothes/         # Category directories
│   │   ├── Adidas/      # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```
The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.
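Pascal VOC annotations store each logo as an `<object>` element with a `<bndbox>`. A minimal standard-library parser sketch (the helper name `parse_voc_boxes` is an assumption, not the project's code):

```python
import xml.etree.ElementTree as ET

def parse_voc_boxes(xml_text: str):
    """Extract (name, xmin, ymin, xmax, ymax) tuples from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")),
            int(bb.findtext("ymin")),
            int(bb.findtext("xmax")),
            int(bb.findtext("ymax")),
        ))
    return boxes
```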
Then run the preparation script:
```bash
uv run python prepare_test_data.py
```
This script:

- Scans `LogoDet-3K/` for images and XML annotation files
- Extracts cropped logo regions using bounding box data → saves to `reference_logos/`
- Copies full images → saves to `test_images/`
- Creates a `test_data_mapping.db` SQLite database with ground truth mappings
### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42
```
### Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `-n, --num-logos` | `10` | Number of reference logos to sample |
| `-t, --threshold` | `0.7` | Similarity threshold for matching |
| `-d, --detr-threshold` | `0.5` | DETR detection confidence threshold |
| `-e, --embedding-model` | `openai/clip-vit-large-patch14` | Embedding model (CLIP or DINOv2) |
| `--matching-method` | `margin` | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | `0.05` | Margin over second-best match (`margin`/`multi-ref`) |
| `--refs-per-logo` | `3` | Reference images per logo |
| `--min-matching-refs` | `1` | Min refs that must match (`multi-ref` only) |
| `--use-max-similarity` | `False` | Use max instead of mean similarity (`multi-ref` only) |
| `--positive-samples` | `5` | Positive test images per logo |
| `--negative-samples` | `20` | Negative test images per logo |
| `-s, --seed` | `None` | Random seed for reproducibility |
| `--output-file` | `None` | Append results summary to file (clean output) |
| `--clear-cache` | `False` | Clear embedding cache before running |
**Matching methods:**

- `simple` - returns all logos above the threshold (baseline, most permissive)
- `margin` - requires a margin over the second-best match (reduces false positives)
- `multi-ref` - aggregates scores across multiple reference images per logo

See `--help` for all options.
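The margin method's rejection of ambiguous matches can be sketched as follows. This is an illustrative sketch, not the project's implementation; `sims` is assumed to map each candidate logo name to its best similarity score:

```python
def match_with_margin(sims: dict[str, float], threshold: float = 0.7, margin: float = 0.05):
    """Accept the best logo only if it clears the threshold AND beats
    the second-best candidate by at least `margin`."""
    ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return None  # nothing similar enough
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return None  # ambiguous: runner-up too close
    return ranked[0][0]
```

Compared to `simple`, this trades a little recall for fewer false positives when two brands score similarly.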
### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh
```
| Script | Purpose | Output File |
|---|---|---|
| `run_comparison_tests.sh` | Compare all matching methods | `comparison_results.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `threshold_test_results.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `model_comparison_results.txt` |
## Project Structure

```
logo_test/
├── logo_detection_detr.py               # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py               # Test script for accuracy evaluation
├── prepare_test_data.py                 # Script to prepare test database
├── run_comparison_tests.sh              # Compare all matching methods
├── run_threshold_tests.sh               # Test threshold/margin combinations
├── run_model_comparison.sh              # Compare CLIP vs DINOv2 models
├── test_data_mapping.db                 # SQLite database with ground truth
├── reference_logos/                     # Reference logo images (not in git)
├── test_images/                         # Test images (not in git)
├── LogoDet-3K/                          # Source dataset (not in git)
├── logo_detection_detr_usage.md         # API usage guide
├── logo_detection_test_methodology.md   # Test methodology documentation
└── test_results_analysis.md             # Analysis of test results
```
## Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

- **Non-Maximum Suppression (NMS)** - removes overlapping duplicate detections
- **Minimum Box Size Filtering** - filters out noise from tiny detections
- **Confidence Threshold Filtering** - removes low-confidence detections
- **Multiple Reference Images** - uses multiple refs per logo for robust matching
- **Margin-Based Matching** - requires a confidence margin over the second-best match
- **Multi-Ref Matching** - aggregates similarity scores across references
- **Embedding Caching** - caches embeddings to avoid recomputation
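Greedy NMS, the first technique above, keeps the highest-scoring box and discards any remaining box that overlaps it too much. A self-contained sketch (illustrative, not the project's implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, best scores first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```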
## Models

### Detection Model

- **DETR**: `Pravallika6/detr-finetuned-logo-detection_v2`
### Embedding Models (selectable via `-e/--embedding-model`)

| Model | Type | Description |
|---|---|---|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |
Models are automatically downloaded from Hugging Face on first run and cached in `~/.cache/huggingface/`.

**Note:** When switching between embedding models, use `--clear-cache` to ensure embeddings are recomputed with the new model.
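One way to avoid stale embeddings entirely is to key the cache on both the image content and the model name, so different models never share entries. The project's actual cache layout is not documented here, so this is only an illustrative sketch:

```python
import hashlib
from pathlib import Path

def cache_key(image_path: str, model_name: str) -> str:
    """Cache filename derived from image bytes + model name.

    Switching models changes the key, so old embeddings are never reused.
    """
    digest = hashlib.sha256(Path(image_path).read_bytes()).hexdigest()[:16]
    safe_model = model_name.replace("/", "_")  # model IDs contain slashes
    return f"{safe_model}_{digest}.npy"
```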
## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - how to use the `DetectLogosDETR` class
- [Test Methodology](logo_detection_test_methodology.md) - detailed explanation of the test framework and tuning
## License

MIT