# Logo Detection Test Framework
A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.
## Recommended Settings
Based on extensive testing with the LogoDet-3K dataset, these are the optimal settings:
| Parameter | Recommended Value | Notes |
|---|---|---|
| Matching Method | `multi-ref` | Best balance of precision and recall |
| Similarity Aggregation | max (default) | Max outperforms mean aggregation |
| Embedding Model | `openai/clip-vit-large-patch14` | Significantly outperforms DINOv2 |
| CLIP Threshold | 0.70 | Good precision/recall balance |
| DETR Threshold | 0.50 | Default detection confidence |
| Margin | 0.05 | Reduces false positives |
| Refs per Logo | 7-10 | More references = better accuracy |
| Preprocessing | default | Best precision; letterbox/stretch hurt precision |
Example command with recommended settings:

```bash
uv run python test_logo_detection.py \
    --matching-method multi-ref \
    --refs-per-logo 10 \
    --threshold 0.70 \
    --margin 0.05 \
    --use-max-similarity
```
## Performance Benchmarks
With recommended settings (multi-ref max, threshold 0.70, margin 0.05):
| Refs/Logo | Precision | Recall | F1 Score |
|---|---|---|---|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | 51.0% | 79.9% | 62.3% |
| 10 | 50.2% | 81.6% | 62.1% |
Key findings:
- More reference images per logo consistently improve recall
- 7+ refs provide the best precision/recall balance
- Diminishing returns set in beyond 10 refs
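The aggregation step behind these numbers can be sketched in a few lines. This is an illustrative reimplementation, not the framework's actual code; the `match_logo` helper and its signature are hypothetical. For each candidate logo, the crop's similarity to every reference embedding is reduced with `max` (or `mean`), and the best-scoring logo is accepted only if it clears the threshold:

```python
import numpy as np

def match_logo(crop_emb, ref_embs_by_logo, threshold=0.70, use_max=True):
    """Pick the best-matching logo for one detected crop.

    crop_emb: (d,) L2-normalized embedding of the detected region.
    ref_embs_by_logo: dict mapping logo name -> (n_refs, d) normalized embeddings.
    """
    best_logo, best_score = None, -1.0
    for logo, refs in ref_embs_by_logo.items():
        sims = refs @ crop_emb  # cosine similarity against each reference
        score = sims.max() if use_max else sims.mean()
        if score > best_score:
            best_logo, best_score = logo, score
    # Accept only if the aggregated similarity clears the threshold
    return (best_logo, best_score) if best_score >= threshold else (None, best_score)
```

With `use_max=True`, a single close reference is enough to match, which is why adding references mostly helps recall.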
## Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|---|---|---|---|---|
| `simple` | 1.3% | 203%* | 2.5% | Not recommended (too many FPs) |
| `margin` | 69.8% | 16.3% | 26.4% | High precision, low recall |
| `multi-ref` (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| `multi-ref` (max) | 51.8% | 75.3% | 61.4% | Best overall |
*Simple method returns all matches above threshold, causing many duplicates.
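The margin method's behavior can be illustrated with a small sketch (the `margin_match` helper is hypothetical, not the framework's code): the top-scoring logo is accepted only if it both clears the threshold and beats the runner-up by the configured margin, which is what trades recall for precision.

```python
def margin_match(scores, threshold=0.70, margin=0.05):
    """Accept the top-scoring logo only if it beats the runner-up by `margin`.

    scores: dict of logo name -> aggregated similarity.
    Returns the accepted logo name, or None if no logo qualifies.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_logo, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    if best >= threshold and (best - second) >= margin:
        return best_logo
    return None
```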
## Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|---|---|---|---|---|
| `openai/clip-vit-large-patch14` | 49.1% | 77.0% | 59.9% | Recommended |
| `facebook/dinov2-small` | 22.4% | 42.8% | 29.5% | Not recommended |
| `facebook/dinov2-large` | 32.2% | 28.5% | 30.2% | Not recommended |
CLIP significantly outperforms DINOv2 for logo matching tasks.
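For reference, extracting a normalized CLIP embedding with the Hugging Face `transformers` API looks roughly like this. The `embed` helper is hypothetical (the framework's actual code may differ), and the smaller base model is used here to keep the sketch light:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # smaller CLIP variant for illustration
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def embed(image: Image.Image) -> torch.Tensor:
    """Return an L2-normalized CLIP image embedding for one crop."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)
```

Normalizing the embeddings lets matching reduce to a dot product (cosine similarity).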
## Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|---|---|---|---|---|
| `default` | 50.2% | 81.6% | 62.1% | Recommended; best precision |
| `letterbox` | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| `stretch` | 34.5% | 113%* | 52.9% | Not recommended |
*Recall >100% indicates multiple detections per expected logo.
Recommendation: Use default preprocessing. While letterbox shows marginally higher F1, it has significantly worse precision (more false positives).
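For clarity, one plausible implementation of the letterbox mode (the framework's own implementation may differ): the crop is resized with its aspect ratio preserved, then padded to a square canvas rather than stretched.

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 224, fill=(0, 0, 0)) -> Image.Image:
    """Resize preserving aspect ratio, then pad to a square canvas."""
    scale = size / max(img.size)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    # Center the resized image; the remainder stays as padding
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```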
## Overview
This project provides tools to:
- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics
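The evaluation metrics are computed in the standard way from true-positive, false-positive, and false-negative counts; a minimal helper (hypothetical name, for illustration):

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Note that when a logo is detected more than once per expected instance, true positives can exceed expected positives, which is how recall values above 100% arise in the tables above.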
## Architecture
The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares them against reference logos
## Installation

Requires Python 3.12+. Uses `uv` for package management.
```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```
## Usage

### Prepare Test Data
The test framework requires the LogoDet-3K dataset. Download it and place it in the project directory:
```
logo_test/
├── LogoDet-3K/          # Dataset directory (required)
│   ├── Clothes/         # Category directories
│   │   ├── Adidas/      # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```
The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.
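Pascal VOC annotations are plain XML, with one `<object>` element per logo instance; a minimal parser (the `parse_voc_boxes` helper is illustrative, not part of the framework) looks like this:

```python
import xml.etree.ElementTree as ET

def parse_voc_boxes(xml_text: str) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Extract (logo name, (xmin, ymin, xmax, ymax)) pairs from a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes
```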
Then run the preparation script:
```bash
uv run python prepare_test_data.py
```
This script:

- Scans `LogoDet-3K/` for images and XML annotation files
- Extracts cropped logo regions using bounding box data and saves them to `reference_logos/`
- Copies full images to `test_images/`
- Creates the `test_data_mapping.db` SQLite database with ground truth mappings
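The ground-truth database can then be queried at test time. The schema below is a hypothetical illustration; the actual columns in `test_data_mapping.db` may differ:

```python
import sqlite3

def build_mapping_db(path, rows):
    """Create a ground-truth table mapping test images to expected logos.

    NOTE: hypothetical schema for illustration only.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ground_truth (
               image_path TEXT,
               logo_name  TEXT,
               ref_path   TEXT
           )"""
    )
    conn.executemany("INSERT INTO ground_truth VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def expected_logos(conn, image_path):
    """Return the logo names expected to appear in a given test image."""
    cur = conn.execute(
        "SELECT logo_name FROM ground_truth WHERE image_path = ?", (image_path,)
    )
    return [r[0] for r in cur.fetchall()]
```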
### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42
```
### Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | Similarity threshold for matching |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `-e, --embedding-model` | `openai/clip-vit-large-patch14` | Embedding model (CLIP or DINOv2) |
| `--matching-method` | `margin` | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Margin over second-best match (margin/multi-ref) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--min-matching-refs` | 1 | Min refs that must match (multi-ref only) |
| `--use-max-similarity` | False | Use max instead of mean similarity (multi-ref only) |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `-s, --seed` | None | Random seed for reproducibility |
| `--output-file` | None | Append results summary to file (clean output) |
| `--clear-cache` | False | Clear embedding cache before running |
Matching methods:

- `simple` - Returns all logos above threshold (not recommended; too many false positives)
- `margin` - Requires a margin over the second-best match (high precision, low recall)
- `multi-ref` - Recommended. Aggregates scores across multiple reference images per logo

See `--help` for all options.
### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh
```
| Script | Purpose | Output File |
|---|---|---|
| `run_comparison_tests.sh` | Compare matching methods | `test_results/comparison_*.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `test_results/threshold_*.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `test_results/model_comparison_results.txt` |
| `run_refs_per_logo_test.sh` | Test refs-per-logo values | `test_results/refs_per_logo_analysis.txt` |
| `run_preprocess_test.sh` | Compare preprocessing modes | `test_results/preprocessing_comparison.txt` |
## Project Structure

```
logo_test/
├── logo_detection_detr.py               # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py               # Test script for accuracy evaluation
├── prepare_test_data.py                 # Script to prepare test database
├── run_comparison_tests.sh              # Compare all matching methods
├── run_threshold_tests.sh               # Test threshold/margin combinations
├── run_model_comparison.sh              # Compare CLIP vs DINOv2 models
├── test_data_mapping.db                 # SQLite database with ground truth
├── reference_logos/                     # Reference logo images (not in git)
├── test_images/                         # Test images (not in git)
├── LogoDet-3K/                          # Source dataset (not in git)
├── logo_detection_detr_usage.md         # API usage guide
├── logo_detection_test_methodology.md   # Test methodology documentation
└── test_results_analysis.md             # Analysis of test results
```
## Accuracy Improvement Techniques
The framework implements several techniques to improve detection accuracy:
- Non-Maximum Suppression (NMS) - Removes overlapping duplicate detections
- Minimum Box Size Filtering - Filters out noise from tiny detections
- Confidence Threshold Filtering - Removes low-confidence detections
- Multiple Reference Images - Uses multiple refs per logo for robust matching
- Margin-Based Matching - Requires confidence margin over second-best match
- Multi-Ref Matching - Aggregates similarity scores across references
- Embedding Caching - Caches embeddings to avoid recomputation
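Of these, NMS is the most self-contained to illustrate. A standard greedy implementation (the `iou` and `nms` helpers are illustrative, not the framework's code) keeps the highest-scoring box in each cluster of overlapping detections:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep this box only if it doesn't overlap a higher-scoring kept box
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```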
## Models

### Detection Model

- DETR: `Pravallika6/detr-finetuned-logo-detection_v2`

### Embedding Models (selectable via `-e/--embedding-model`)
| Model | Type | Description |
|---|---|---|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |
Models are automatically downloaded from Hugging Face on first run and cached in `~/.cache/huggingface/`.

**Note:** When switching between embedding models, pass `--clear-cache` so cached embeddings are recomputed with the new model.
## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the `DetectLogosDETR` class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of the test framework and tuning
## License
MIT