Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

Overview

This project provides tools to:

  • Detect logos in images using a fine-tuned DETR model
  • Match detected logos against reference images using CLIP embeddings
  • Evaluate detection accuracy with precision, recall, and F1 metrics

Architecture

The system uses a two-stage pipeline:

  1. DETR - Identifies potential logo regions (bounding boxes) in images
  2. CLIP - Extracts feature embeddings for each detected region and compares against reference logos
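The glue between the two stages is a similarity search: each detected region's embedding is compared against the reference-logo embeddings, and the closest reference above a threshold wins. A minimal sketch of that matching step (embeddings are stubbed as plain lists here for illustration; in the real pipeline they come from CLIP):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_region(region_emb, reference_embs, threshold=0.7):
    """Return the best-matching reference logo above threshold, or None.

    reference_embs: mapping of logo name -> embedding vector.
    These would be CLIP embeddings in practice; placeholder vectors here.
    """
    best_name, best_sim = None, threshold
    for name, emb in reference_embs.items():
        sim = cosine_similarity(region_emb, emb)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

refs = {"nike": [1.0, 0.0, 0.1], "adidas": [0.0, 1.0, 0.1]}
print(match_region([0.9, 0.1, 0.1], refs))  # nike
```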

Installation

Requires Python 3.12+. Uses uv for package management.

# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt

Usage

Prepare Test Data

The test framework requires the LogoDet-3K dataset. Download it and place it in the project directory:

logo_test/
├── LogoDet-3K/           # Dataset directory (required)
│   ├── Clothes/          # Category directories
│   │   ├── Adidas/       # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...

The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.
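Pascal VOC annotations store one `<object>` element per logo, each with a `<bndbox>` of pixel coordinates. A minimal parser for that layout (field names follow the VOC convention; the project's actual preparation script may handle more fields):

```python
import xml.etree.ElementTree as ET

def parse_voc_boxes(xml_text):
    """Extract (label, xmin, ymin, xmax, ymax) tuples from a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes

sample = """<annotation>
  <object><name>Nike</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""
print(parse_voc_boxes(sample))  # [('Nike', 10, 20, 110, 80)]
```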

Then run the preparation script:

uv run python prepare_test_data.py

This script:

  1. Scans LogoDet-3K/ for images and XML annotation files
  2. Extracts cropped logo regions using bounding box data → saves to reference_logos/
  3. Copies full images → saves to test_images/
  4. Creates test_data_mapping.db SQLite database with ground truth mappings
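The ground-truth database can be as simple as one table mapping each test image to the logos it actually contains. A sketch of what such a schema might look like (table and column names are illustrative, not necessarily what prepare_test_data.py creates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real script writes test_data_mapping.db
conn.execute("""
    CREATE TABLE ground_truth (
        test_image TEXT NOT NULL,
        logo_name  TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO ground_truth VALUES (?, ?)",
    [("img_001.jpg", "Nike"), ("img_002.jpg", "Adidas")],
)
rows = conn.execute(
    "SELECT logo_name FROM ground_truth WHERE test_image = ?", ("img_001.jpg",)
).fetchall()
print(rows)  # [('Nike',)]
```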

Run Detection Tests

# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42

Key Parameters

Parameter               Default                         Description
-n, --num-logos         10                              Number of reference logos to sample
-t, --threshold         0.7                             Similarity threshold for matching
-d, --detr-threshold    0.5                             DETR detection confidence threshold
-e, --embedding-model   openai/clip-vit-large-patch14   Embedding model (CLIP or DINOv2)
--matching-method       margin                          Matching method: simple, margin, or multi-ref
--margin                0.05                            Margin over second-best match (margin/multi-ref)
--refs-per-logo         3                               Reference images per logo
--min-matching-refs     1                               Min refs that must match (multi-ref only)
--use-max-similarity    False                           Use max instead of mean similarity (multi-ref only)
--positive-samples      5                               Positive test images per logo
--negative-samples      20                              Negative test images per logo
-s, --seed              None                            Random seed for reproducibility
--output-file           None                            Append results summary to file (clean output)
--clear-cache           False                           Clear embedding cache before running

Matching Methods:

  • simple - Returns all logos above threshold (baseline, most permissive)
  • margin - Requires margin over second-best match (reduces false positives)
  • multi-ref - Aggregates scores across multiple reference images per logo
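The default margin method can be illustrated in a few lines: a logo only counts as a match if it clears the similarity threshold and beats the runner-up by the configured margin. A sketch of the idea (not the project's actual implementation):

```python
def margin_match(similarities, threshold=0.7, margin=0.05):
    """Return the matched logo name, or None.

    similarities: mapping of logo name -> best similarity score.
    Accept the top logo only if it is above `threshold` AND at least
    `margin` higher than the second-best logo.
    """
    if not similarities:
        return None
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score < threshold:
        return None
    if len(ranked) > 1 and best_score - ranked[1][1] < margin:
        return None  # ambiguous: runner-up is too close
    return best_name

print(margin_match({"nike": 0.82, "adidas": 0.80}))  # None (margin too small)
print(margin_match({"nike": 0.82, "adidas": 0.70}))  # nike
```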

See --help for all options.

Run Comparison Tests

# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

Script                      Purpose                              Output File
run_comparison_tests.sh     Compare all 4 matching methods       comparison_results.txt
run_threshold_tests.sh      Test threshold/margin combinations   threshold_test_results.txt
run_model_comparison.sh     Compare CLIP vs DINOv2 models        model_comparison_results.txt

Project Structure

logo_test/
├── logo_detection_detr.py      # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py      # Test script for accuracy evaluation
├── prepare_test_data.py        # Script to prepare test database
├── run_comparison_tests.sh     # Compare all matching methods
├── run_threshold_tests.sh      # Test threshold/margin combinations
├── run_model_comparison.sh     # Compare CLIP vs DINOv2 models
├── test_data_mapping.db        # SQLite database with ground truth
├── reference_logos/            # Reference logo images (not in git)
├── test_images/                # Test images (not in git)
├── LogoDet-3K/                 # Source dataset (not in git)
├── logo_detection_detr_usage.md        # API usage guide
├── logo_detection_test_methodology.md  # Test methodology documentation
└── test_results_analysis.md    # Analysis of test results

Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

  1. Non-Maximum Suppression (NMS) - Removes overlapping duplicate detections
  2. Minimum Box Size Filtering - Filters out noise from tiny detections
  3. Confidence Threshold Filtering - Removes low-confidence detections
  4. Multiple Reference Images - Uses multiple refs per logo for robust matching
  5. Margin-Based Matching - Requires confidence margin over second-best match
  6. Multi-Ref Matching - Aggregates similarity scores across references
  7. Embedding Caching - Caches embeddings to avoid recomputation
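Non-maximum suppression (item 1) keeps only the highest-confidence box among heavily overlapping detections. A self-contained sketch using IoU (intersection over union); the box format and the 0.5 IoU cutoff are illustrative:

```python
def iou(a, b):
    """IoU of two boxes in (xmin, ymin, xmax, ymax) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes in descending score order, drop overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - second box overlaps the first
```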

Models

Detection Model

  • DETR: Pravallika6/detr-finetuned-logo-detection_v2

Embedding Models (selectable via -e/--embedding-model)

Model                           Type     Description
openai/clip-vit-large-patch14   CLIP     Default. General-purpose vision-language model
openai/clip-vit-base-patch32    CLIP     Smaller, faster CLIP variant
facebook/dinov2-small           DINOv2   Self-supervised, good for visual similarity
facebook/dinov2-base            DINOv2   Larger DINOv2 variant
facebook/dinov2-large           DINOv2   Largest DINOv2 variant

Models are automatically downloaded from HuggingFace on first run and cached in ~/.cache/huggingface/.

Note: When switching between embedding models, use --clear-cache to ensure embeddings are recomputed with the new model.

Documentation

  • logo_detection_detr_usage.md - API usage guide
  • logo_detection_test_methodology.md - Test methodology documentation
  • test_results_analysis.md - Analysis of test results

License

MIT
