Rick McEwen 91d1c9cd59 Update README with recommended settings and test results
Add comprehensive recommendations section based on LogoDet-3K testing:
- Optimal parameter settings table (multi-ref, max aggregation, CLIP model)
- Performance benchmarks for refs-per-logo (1-10 refs)
- Matching method comparison (simple vs margin vs multi-ref)
- Embedding model comparison (CLIP vs DINOv2)
- Preprocessing mode comparison (default vs letterbox vs stretch)
2026-01-08 12:55:13 -05:00

Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

Based on testing with the LogoDet-3K dataset, the following settings are recommended:

| Parameter | Recommended Value | Notes |
|---|---|---|
| Matching Method | multi-ref | Best balance of precision and recall |
| Similarity Aggregation | max (default) | Max outperforms mean aggregation |
| Embedding Model | openai/clip-vit-large-patch14 | Significantly outperforms DINOv2 |
| CLIP Threshold | 0.70 | Good precision/recall balance |
| DETR Threshold | 0.50 | Default detection confidence |
| Margin | 0.05 | Reduces false positives |
| Refs per Logo | 7-10 | More references = better accuracy |
| Preprocessing | default | Best precision; letterbox/stretch hurt precision |

Example command with recommended settings:

uv run python test_logo_detection.py \
    --matching-method multi-ref \
    --refs-per-logo 10 \
    --threshold 0.70 \
    --margin 0.05 \
    --use-max-similarity

Performance Benchmarks

With recommended settings (multi-ref max, threshold 0.70, margin 0.05):

| Refs/Logo | Precision | Recall | F1 Score |
|---|---|---|---|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | 51.0% | 79.9% | 62.3% |
| 10 | 50.2% | 81.6% | 62.1% |

Key findings:

  • More reference images per logo consistently improve recall
  • 7+ refs provide the best precision/recall balance
  • Returns diminish past 7 refs: F1 is 62.3% at 7 refs and 62.1% at 10
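
The F1 column is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it from the rounded percentages in the table above. The helper name `f1_score` is illustrative and not part of the framework.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (refs per logo, precision %, recall %) rows copied from the table above
rows = [
    (1, 45.8, 65.9),
    (3, 40.5, 72.4),
    (5, 47.2, 72.6),
    (7, 51.0, 79.9),
    (10, 50.2, 81.6),
]

for refs, precision, recall in rows:
    print(f"{refs:>2} refs: F1 = {f1_score(precision, recall):.1f}%")
```

Because the inputs are already rounded, the recomputed values can differ from the table in the last digit.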

Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|---|---|---|---|---|
| simple | 1.3% | 203%* | 2.5% | Not recommended (too many FPs) |
| margin | 69.8% | 16.3% | 26.4% | High precision, low recall |
| multi-ref (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| multi-ref (max) | 51.8% | 75.3% | 61.4% | Best overall |

*The simple method returns every match above the threshold, so duplicate detections of the same logo push recall above 100%.
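
To see how recall can exceed 100%, the sketch below assumes that every correct detection counts as a true positive, even when several detections hit the same expected logo. That counting rule and the numbers are assumptions for illustration; the README does not spell out how the test script tallies matches.

```python
def precision_recall(true_positives: int, false_positives: int, expected: int):
    """Return (precision, recall) in percent.

    Assumes correct detections are not de-duplicated, so one expected logo
    can contribute more than one true positive.
    """
    detections = true_positives + false_positives
    precision = 100.0 * true_positives / detections if detections else 0.0
    recall = 100.0 * true_positives / expected if expected else 0.0
    return precision, recall


# Hypothetical counts: 50 expected logos, 60 correct detections (some logos
# matched more than once) and 40 incorrect detections.
precision, recall = precision_recall(true_positives=60, false_positives=40, expected=50)
print(f"precision={precision:.1f}% recall={recall:.1f}%")
```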

Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|---|---|---|---|---|
| openai/clip-vit-large-patch14 | 49.1% | 77.0% | 59.9% | Recommended |
| facebook/dinov2-small | 22.4% | 42.8% | 29.5% | Not recommended |
| facebook/dinov2-large | 32.2% | 28.5% | 30.2% | Not recommended |

CLIP significantly outperforms DINOv2 for logo matching tasks.

Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|---|---|---|---|---|
| default | 50.2% | 81.6% | 62.1% | Recommended - best precision |
| letterbox | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| stretch | 34.5% | 113%* | 52.9% | Not recommended |

*Recall >100% indicates multiple detections per expected logo.

Recommendation: Use default preprocessing. While letterbox shows marginally higher F1, it has significantly worse precision (more false positives).
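
The README does not define the preprocessing modes, so the sketch below uses the common meanings of the two non-default ones: stretch resizes a crop straight to a square and ignores aspect ratio, while letterbox scales the crop to fit and pads the remainder. The function names and the 224-pixel target are assumptions for illustration, not the framework's code.

```python
def stretch_size(width: int, height: int, target: int = 224):
    """Stretch: resize straight to target x target, ignoring aspect ratio."""
    return target, target


def letterbox_geometry(width: int, height: int, target: int = 224):
    """Letterbox: scale to fit inside target x target, then pad the short side.

    Returns (new_width, new_height, pad_x, pad_y), where pad_x and pad_y are
    the padding on the left and top; the remainder goes right and bottom.
    """
    scale = target / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    pad_x = (target - new_width) // 2
    pad_y = (target - new_height) // 2
    return new_width, new_height, pad_x, pad_y


# A wide 400x100 logo crop: stretching turns 4:1 into 1:1,
# letterboxing keeps the shape and pads above and below.
print(stretch_size(400, 100))
print(letterbox_geometry(400, 100))
```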


Overview

This project provides tools to:

  • Detect logos in images using a fine-tuned DETR model
  • Match detected logos against reference images using CLIP embeddings
  • Evaluate detection accuracy with precision, recall, and F1 metrics

Architecture

The system uses a two-stage pipeline:

  1. DETR - Identifies potential logo regions (bounding boxes) in images
  2. CLIP - Extracts feature embeddings for each detected region and compares against reference logos
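
The two stages can be sketched as a small skeleton. The detector and embedder here are stand-in functions so the example runs without downloading models, and `run_pipeline` and cosine similarity as the comparison metric are assumptions for illustration rather than the `DetectLogosDETR` API.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_pipeline(image, detect, embed, references):
    """Stage 1: detect boxes. Stage 2: embed each box and score it against references.

    Returns a list of (box, best_logo_name, similarity) tuples.
    """
    results = []
    for box in detect(image):
        embedding = embed(image, box)
        name, score = max(
            ((ref_name, cosine_similarity(embedding, ref)) for ref_name, ref in references.items()),
            key=lambda item: item[1],
        )
        results.append((box, name, score))
    return results


# Stand-ins for the DETR detector and the CLIP embedder (hypothetical).
def fake_detect(image):
    return [(10, 10, 50, 50)]  # one box as (x_min, y_min, x_max, y_max)


def fake_embed(image, box):
    return [0.9, 0.1, 0.0]  # toy 3-dimensional embedding


references = {"nike": [1.0, 0.0, 0.0], "adidas": [0.0, 1.0, 0.0]}
results = run_pipeline(None, fake_detect, fake_embed, references)
print(results)
```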

Installation

Requires Python 3.12+. Uses uv for package management.

# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt

Usage

Prepare Test Data

The test framework requires the LogoDet-3K dataset. Download it and place it in the project directory:

logo_test/
├── LogoDet-3K/           # Dataset directory (required)
│   ├── Clothes/          # Category directories
│   │   ├── Adidas/       # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...

The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.
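
As an illustration of the annotation format, the sketch below reads bounding boxes from a Pascal VOC style XML string using only the standard library. The sample annotation and the `parse_voc_boxes` helper are made up for this example and are not taken from `prepare_test_data.py`.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout
SAMPLE_ANNOTATION = """
<annotation>
  <filename>1.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>Adidas</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>210</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_boxes(xml_text: str):
    """Return (label, xmin, ymin, xmax, ymax) for every <object> in the annotation."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name", default="")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects without a bounding box
        coords = [
            int(float(bndbox.findtext(tag, default="0")))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ]
        boxes.append((label, *coords))
    return boxes


print(parse_voc_boxes(SAMPLE_ANNOTATION))
```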

Then run the preparation script:

uv run python prepare_test_data.py

This script:

  1. Scans LogoDet-3K/ for images and XML annotation files
  2. Extracts cropped logo regions using bounding box data → saves to reference_logos/
  3. Copies full images → saves to test_images/
  4. Creates test_data_mapping.db SQLite database with ground truth mappings
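
A ground truth mapping of this kind could be stored as follows. The table and column names are hypothetical, chosen only to show the idea; the actual schema of test_data_mapping.db is not documented here.

```python
import sqlite3

# In-memory database for the sketch; the real script writes test_data_mapping.db.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE test_images (
        id INTEGER PRIMARY KEY,
        path TEXT NOT NULL
    );
    CREATE TABLE ground_truth (
        test_image_id INTEGER REFERENCES test_images(id),
        brand TEXT NOT NULL
    );
    """
)

# Record that one (hypothetical) test image contains an Adidas logo.
conn.execute("INSERT INTO test_images (id, path) VALUES (1, 'test_images/adidas_001.jpg')")
conn.execute("INSERT INTO ground_truth (test_image_id, brand) VALUES (1, 'Adidas')")
conn.commit()

# Look up which brands are expected in each test image.
rows = conn.execute(
    "SELECT t.path, g.brand FROM test_images t "
    "JOIN ground_truth g ON g.test_image_id = t.id"
).fetchall()
print(rows)
```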

Run Detection Tests

# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| -n, --num-logos | 10 | Number of reference logos to sample |
| -t, --threshold | 0.7 | Similarity threshold for matching |
| -d, --detr-threshold | 0.5 | DETR detection confidence threshold |
| -e, --embedding-model | openai/clip-vit-large-patch14 | Embedding model (CLIP or DINOv2) |
| --matching-method | margin | Matching method: simple, margin, or multi-ref |
| --margin | 0.05 | Margin over second-best match (margin/multi-ref) |
| --refs-per-logo | 3 | Reference images per logo |
| --min-matching-refs | 1 | Min refs that must match (multi-ref only) |
| --use-max-similarity | False | Use max instead of mean similarity (multi-ref only) |
| --positive-samples | 5 | Positive test images per logo |
| --negative-samples | 20 | Negative test images per logo |
| -s, --seed | None | Random seed for reproducibility |
| --output-file | None | Append results summary to file (clean output) |
| --clear-cache | False | Clear embedding cache before running |
Matching Methods:

  • simple - Returns all logos above threshold (not recommended - too many false positives)
  • margin - Requires margin over second-best match (high precision, low recall)
  • multi-ref - Recommended. Aggregates scores across multiple reference images per logo

See --help for all options.
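
The difference between the three matching methods described above can be sketched on precomputed similarity scores. This is a simplified illustration under stated assumptions, not the framework's implementation: the function names are hypothetical, the scores are invented, and `--min-matching-refs` is left out.

```python
from statistics import mean
from typing import Optional


def match_simple(scores: dict, threshold: float) -> list:
    """simple: every logo whose score clears the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


def match_margin(scores: dict, threshold: float, margin: float) -> Optional[str]:
    """margin: the best logo, only if it beats the runner-up by at least `margin`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return None
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    return ranked[0][0] if ranked[0][1] - runner_up >= margin else None


def match_multi_ref(ref_scores: dict, threshold: float, margin: float,
                    use_max: bool = True) -> Optional[str]:
    """multi-ref: aggregate each logo's per-reference scores, then apply the margin rule."""
    aggregate = max if use_max else mean
    aggregated = {name: aggregate(values) for name, values in ref_scores.items()}
    return match_margin(aggregated, threshold, margin)


# Invented similarities of one detected region against three refs per logo.
ref_scores = {"nike": [0.82, 0.64, 0.58], "puma": [0.71, 0.69, 0.66]}
first_ref_only = {name: values[0] for name, values in ref_scores.items()}

print(match_simple(first_ref_only, 0.70))
print(match_multi_ref(ref_scores, 0.70, 0.05, use_max=True))
print(match_multi_ref(ref_scores, 0.70, 0.05, use_max=False))
```

In this toy case the simple method reports both logos, max aggregation picks one, and mean aggregation rejects the region because a single strong reference is averaged down below the threshold.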

Run Comparison Tests

# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh

| Script | Purpose | Output File |
|---|---|---|
| run_comparison_tests.sh | Compare matching methods | test_results/comparison_*.txt |
| run_threshold_tests.sh | Test threshold/margin combinations | test_results/threshold_*.txt |
| run_model_comparison.sh | Compare CLIP vs DINOv2 models | test_results/model_comparison_results.txt |
| run_refs_per_logo_test.sh | Test refs-per-logo values | test_results/refs_per_logo_analysis.txt |
| run_preprocess_test.sh | Compare preprocessing modes | test_results/preprocessing_comparison.txt |

Project Structure

logo_test/
├── logo_detection_detr.py      # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py      # Test script for accuracy evaluation
├── prepare_test_data.py        # Script to prepare test database
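├── run_comparison_tests.sh     # Compare all matching methods
├── run_threshold_tests.sh      # Test threshold/margin combinations
├── run_model_comparison.sh     # Compare CLIP vs DINOv2 models
├── run_refs_per_logo_test.sh   # Test different refs-per-logo values
├── run_preprocess_test.sh      # Compare preprocessing modes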
├── test_data_mapping.db        # SQLite database with ground truth
├── reference_logos/            # Reference logo images (not in git)
├── test_images/                # Test images (not in git)
├── LogoDet-3K/                 # Source dataset (not in git)
├── logo_detection_detr_usage.md        # API usage guide
├── logo_detection_test_methodology.md  # Test methodology documentation
└── test_results_analysis.md    # Analysis of test results

Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

  1. Non-Maximum Suppression (NMS) - Removes overlapping duplicate detections
  2. Minimum Box Size Filtering - Filters out noise from tiny detections
  3. Confidence Threshold Filtering - Removes low-confidence detections
  4. Multiple Reference Images - Uses multiple refs per logo for robust matching
  5. Margin-Based Matching - Requires confidence margin over second-best match
  6. Multi-Ref Matching - Aggregates similarity scores across references
  7. Embedding Caching - Caches embeddings to avoid recomputation
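
The first three techniques can be illustrated with a short, dependency-free sketch: confidence filtering, minimum box size filtering, and greedy NMS. The function names, the minimum size, and the IoU threshold are illustrative assumptions; only the 0.5 confidence default comes from the parameter table.

```python
def iou(a, b) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.5, min_size=10.0, iou_threshold=0.5):
    """Apply confidence filtering, minimum box size filtering, then greedy NMS.

    `detections` is a list of (box, score) pairs.
    """
    candidates = [
        (box, score)
        for box, score in detections
        if score >= score_threshold
        and (box[2] - box[0]) >= min_size
        and (box[3] - box[1]) >= min_size
    ]
    # Greedy NMS: visit boxes from highest to lowest score and keep a box
    # only if it does not overlap too much with any box already kept.
    candidates.sort(key=lambda item: item[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((10, 10, 110, 110), 0.95),    # strong detection
    ((12, 12, 112, 112), 0.80),    # near-duplicate of the first box
    ((200, 200, 205, 205), 0.90),  # too small
    ((300, 300, 400, 400), 0.30),  # low confidence
]
print(filter_detections(detections))
```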

Models

Detection Model

  • DETR: Pravallika6/detr-finetuned-logo-detection_v2

Embedding Models (selectable via -e/--embedding-model)

| Model | Type | Description |
|---|---|---|
| openai/clip-vit-large-patch14 | CLIP | Default. General-purpose vision-language model |
| openai/clip-vit-base-patch32 | CLIP | Smaller, faster CLIP variant |
| facebook/dinov2-small | DINOv2 | Self-supervised, good for visual similarity |
| facebook/dinov2-base | DINOv2 | Larger DINOv2 variant |
| facebook/dinov2-large | DINOv2 | Largest DINOv2 variant |

Models are automatically downloaded from HuggingFace on first run and cached in ~/.cache/huggingface/.

Note: When switching between embedding models, use --clear-cache to ensure embeddings are recomputed with the new model.

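Documentation

  • logo_detection_detr_usage.md - API usage guide
  • logo_detection_test_methodology.md - Test methodology documentation
  • test_results_analysis.md - Analysis of test results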

License

MIT
