
Rick McEwen 91d1c9cd59 Update README with recommended settings and test results

Add comprehensive recommendations section based on LogoDet-3K testing:
- Optimal parameter settings table (multi-ref, max aggregation, CLIP model)
- Performance benchmarks for refs-per-logo (1-10 refs)
- Matching method comparison (simple vs margin vs multi-ref)
- Embedding model comparison (CLIP vs DINOv2)
- Preprocessing mode comparison (default vs letterbox vs stretch)

2026-01-08 12:55:13 -05:00

configs

Add RTX 4090 config with image-level splits

2026-01-06 14:23:13 -05:00

test_results

Add comprehensive model comparison analysis

2026-01-07 12:44:15 -05:00

training

Add image-level split support for CLIP fine-tuning

2026-01-05 15:10:45 -05:00

.gitignore

Remove extraneous file from repository, keep local only

2025-12-31 17:53:06 -05:00

.python-version

Initial commit: Logo detection test framework

2025-12-31 10:42:36 -05:00

analyze_similarity_distribution.sh

2026-01-05 13:39:20 -05:00

CLIP_FINETUNING.md

Document threshold tuning for fine-tuned CLIP model

2026-01-05 14:09:38 -05:00

CLOUD_TRAINING.md

Add CLIP fine-tuning pipeline for logo recognition

2026-01-04 13:45:25 -05:00

compare_finetuned_vs_baseline.sh

Add script to compare fine-tuned vs baseline CLIP

2026-01-05 11:43:47 -05:00

export_model.py

Add CLIP fine-tuning pipeline for logo recognition

2026-01-04 13:45:25 -05:00

find_optimal_threshold.sh

Add threshold optimization script

2026-01-05 14:20:27 -05:00

logo_detection_detr_usage.md

Add simple matching method as baseline for comparison tests

2025-12-31 17:36:18 -05:00

logo_detection_detr.py

Remove hybrid text+CLIP matching approach

2026-01-08 12:48:39 -05:00

logo_detection_test_methodology.md

Document margin behavior and update model comparison script

2026-01-02 14:42:53 -05:00

main.py

Initial commit: Logo detection test framework

2025-12-31 10:42:36 -05:00

prepare_test_data.py

Use script directory as base path for portability

2026-01-06 16:00:09 -05:00

pyproject.toml

Add CLIP fine-tuning pipeline for logo recognition

2026-01-04 13:45:25 -05:00

README.md

Update README with recommended settings and test results

2026-01-08 12:55:13 -05:00

requirements-training.txt

Remove opencv-python from requirements (already installed)

2026-01-06 15:23:31 -05:00

requirements.txt

Initial commit: Logo detection test framework

2025-12-31 10:42:36 -05:00

run_comparison_tests.sh

Add --output-file option for clean results output

2025-12-31 17:42:52 -05:00

run_model_comparison.sh

Document margin behavior and update model comparison script

2026-01-02 14:42:53 -05:00

run_preprocess_test.sh

Add hybrid text+CLIP matching and image preprocessing

2026-01-07 15:09:09 -05:00

run_refs_per_logo_test.sh

Add script to test optimal refs per logo for baseline CLIP

2026-01-07 12:52:16 -05:00

run_threshold_tests_image_split.sh

Add threshold test script for image-split model

2026-01-07 10:14:21 -05:00

run_threshold_tests.sh

Add embedding model selection and comparison test scripts

2026-01-02 12:05:27 -05:00

test_cuda_support.py

Initial commit: Logo detection test framework

2025-12-31 10:42:36 -05:00

test_logo_detection.py

Remove hybrid text+CLIP matching approach

2026-01-08 12:48:39 -05:00

train_clip_logo.py

Add image-level split support for CLIP fine-tuning

2026-01-05 15:10:45 -05:00

uv.lock

Initial commit: Logo detection test framework

2025-12-31 10:42:36 -05:00

README.md

Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

Recommended Settings

Based on testing with the LogoDet-3K dataset, these settings gave the best results:

Parameter	Recommended Value	Notes
Matching Method	`multi-ref`	Best balance of precision and recall
Similarity Aggregation	`max` (via `--use-max-similarity`)	Higher recall than mean at the same precision
Embedding Model	`openai/clip-vit-large-patch14`	Significantly outperforms DINOv2
CLIP Threshold	`0.70`	Good precision/recall balance
DETR Threshold	`0.50`	Default detection confidence
Margin	`0.05`	Reduces false positives
Refs per Logo	`7-10`	More references improve recall
Preprocessing	`default`	Best precision; letterbox/stretch hurt precision

Example command with recommended settings:

uv run python test_logo_detection.py \
    --matching-method multi-ref \
    --refs-per-logo 10 \
    --threshold 0.70 \
    --margin 0.05 \
    --use-max-similarity

Performance Benchmarks

With recommended settings (multi-ref max, threshold 0.70, margin 0.05):

Refs/Logo	Precision	Recall	F1 Score
1	45.8%	65.9%	54.0%
3	40.5%	72.4%	51.9%
5	47.2%	72.6%	57.2%
7	51.0%	79.9%	62.3%
10	50.2%	81.6%	62.1%

Key findings:

More reference images per logo consistently improve recall
7+ refs provide the best precision/recall balance
Diminishing returns past 7 refs: F1 is essentially flat from 7 to 10 (62.3% vs 62.1%)
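The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check (a standalone snippet, not part of the framework), recomputing F1 from the rounded percentages in the table reproduces the reported values to within rounding, for example about 62.2% for 10 refs versus the reported 62.1%:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions (0.502 = 50.2%)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for each refs-per-logo value, copied from the table above.
BENCHMARKS = {
    1: (0.458, 0.659),
    3: (0.405, 0.724),
    5: (0.472, 0.726),
    7: (0.510, 0.799),
    10: (0.502, 0.816),
}

for refs, (p, r) in BENCHMARKS.items():
    print(f"{refs:>2} refs: F1 = {f1_score(p, r):.1%}")
```

Going from 7 to 10 refs adds only 1.7 points of recall while precision dips slightly, so the harmonic mean barely moves.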

Matching Method Comparison

Method	Precision	Recall	F1	Use Case
`simple`	1.3%	203%*	2.5%	Not recommended (too many FPs)
`margin`	69.8%	16.3%	26.4%	High precision, low recall
`multi-ref` (mean)	51.8%	63.1%	56.9%	Balanced
`multi-ref` (max)	51.8%	75.3%	61.4%	Best overall

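*The simple method returns every logo above the threshold, so one detected region can produce several matches; these duplicates push recall above 100% and precision down to 1.3%.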

Embedding Model Comparison

Model	Precision	Recall	F1	Recommendation
`openai/clip-vit-large-patch14`	49.1%	77.0%	59.9%	Recommended
`facebook/dinov2-small`	22.4%	42.8%	29.5%	Not recommended
`facebook/dinov2-large`	32.2%	28.5%	30.2%	Not recommended

CLIP significantly outperforms DINOv2 for logo matching tasks.

Preprocessing Mode Comparison

Mode	Precision	Recall	F1	Notes
`default`	50.2%	81.6%	62.1%	Recommended - best precision
`letterbox`	42.4%	119%*	62.6%	Higher recall but worse precision
`stretch`	34.5%	113%*	52.9%	Not recommended

*Recall >100% indicates multiple detections per expected logo.

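Recommendation: Use default preprocessing. Although letterbox shows a marginally higher F1, its precision is significantly worse (more false positives), and its F1 is propped up by recall above 100%, which reflects duplicate detections rather than additional logos found.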
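Recall can only exceed 100% if several matches against the same expected logo are all counted as hits. The sketch below assumes that counting rule (it illustrates the footnote and is not the framework's metric code; the image and logo names are made up) and contrasts it with a deduplicated recall that credits each expected logo once:

```python
def raw_recall(matches: list[tuple[str, str]], expected: set[tuple[str, str]]) -> float:
    """Counts every correct match, including repeats, so it can exceed 1.0."""
    hits = sum(1 for m in matches if m in expected)
    return hits / len(expected)


def dedup_recall(matches: list[tuple[str, str]], expected: set[tuple[str, str]]) -> float:
    """Counts each expected (image, logo) pair at most once, so it never exceeds 1.0."""
    return len(set(matches) & expected) / len(expected)


# Two test images each contain one Nike logo.
expected = {("img1.jpg", "Nike"), ("img2.jpg", "Nike")}
# Image 1 yields three overlapping Nike hits; image 2 is misidentified.
matches = [("img1.jpg", "Nike")] * 3 + [("img2.jpg", "Adidas")]

print(raw_recall(matches, expected))    # 1.5
print(dedup_recall(matches, expected))  # 0.5
```

Under this assumption, a recall of 119% can coexist with missed logos, which is why the letterbox and stretch F1 scores are not directly comparable with the default row.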

Overview

This project provides tools to:

Detect logos in images using a fine-tuned DETR model
Match detected logos against reference images using CLIP embeddings
Evaluate detection accuracy with precision, recall, and F1 metrics

Architecture

The system uses a two-stage pipeline:

DETR - Identifies potential logo regions (bounding boxes) in images
CLIP - Extracts feature embeddings for each detected region and compares against reference logos
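The control flow of the two stages can be outlined as below. This is a hypothetical sketch, not the DetectLogosDETR API (see logo_detection_detr_usage.md for that): `detect` stands in for DETR and returns boxes with confidence scores, and `embed` stands in for CLIP applied to a cropped region. Both are passed in as plain callables so the flow runs without downloading any models.

```python
import math
from typing import Any, Callable, Sequence

Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, the usual score for comparing CLIP embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_and_match(
    image: Any,
    detect: Callable[[Any], list[tuple[Box, float]]],  # stand-in for DETR
    embed: Callable[[Any, Box], Sequence[float]],      # stand-in for CLIP on a crop
    references: dict[str, Sequence[float]],            # logo name -> reference embedding
    detr_threshold: float = 0.5,
    threshold: float = 0.70,
) -> list[tuple[Box, str, float]]:
    matches: list[tuple[Box, str, float]] = []
    if not references:
        return matches
    for box, det_score in detect(image):
        # Stage 1: keep only regions DETR is confident contain a logo.
        if det_score < detr_threshold:
            continue
        # Stage 2: embed the cropped region and find the most similar reference.
        region = embed(image, box)
        logo, sim = max(
            ((name, cosine(region, ref)) for name, ref in references.items()),
            key=lambda pair: pair[1],
        )
        if sim >= threshold:
            matches.append((box, logo, sim))
    return matches


# Toy 2-D "embeddings" so the example runs offline.
refs = {"Nike": [1.0, 0.0], "Adidas": [0.0, 1.0]}
fake_vectors = {
    (0, 0, 10, 10): [0.9, 0.1],   # clearly Nike
    (5, 5, 20, 20): [0.0, 1.0],   # low DETR score, dropped in stage 1
    (0, 0, 4, 4): [0.3, 0.7],     # Adidas
    (1, 1, 2, 2): [1.0, -2.0],    # no reference above 0.70, dropped in stage 2
}


def fake_detect(image):
    return [((0, 0, 10, 10), 0.9), ((5, 5, 20, 20), 0.3),
            ((0, 0, 4, 4), 0.8), ((1, 1, 2, 2), 0.95)]


def fake_embed(image, box):
    return fake_vectors[box]


for box, logo, sim in detect_and_match(None, fake_detect, fake_embed, refs):
    print(box, logo, round(sim, 3))
```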

Installation

Requires Python 3.12+. Uses uv for package management.

# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt

Usage

Prepare Test Data

The test framework requires the LogoDet-3K dataset. Download it and place it in the project directory:

logo_test/
├── LogoDet-3K/           # Dataset directory (required)
│   ├── Clothes/          # Category directories
│   │   ├── Adidas/       # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...

The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.

Then run the preparation script:

uv run python prepare_test_data.py

This script:

Scans LogoDet-3K/ for images and XML annotation files
Extracts cropped logo regions using bounding box data → saves to reference_logos/
Copies full images → saves to test_images/
Creates test_data_mapping.db SQLite database with ground truth mappings
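The annotation-parsing step can be sketched with the standard library alone. This assumes the usual Pascal VOC layout (one `<object>` element per logo, with `<name>` and `<bndbox>` children). The `ground_truth` table is a hypothetical schema, not the actual layout of test_data_mapping.db. Cropping each box out of its image would then be a Pillow `Image.crop((xmin, ymin, xmax, ymax))` call.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_voc(xml_text: str) -> list[tuple[str, int, int, int, int]]:
    """Return (logo_name, xmin, ymin, xmax, ymax) for each <object> in a VOC file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # VOC coordinates are sometimes written as floats, e.g. "10.0".
        coords = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


def store_ground_truth(conn: sqlite3.Connection, image_path: str, boxes) -> None:
    """Insert one row per annotated logo (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ground_truth ("
        "image_path TEXT, logo_name TEXT, "
        "xmin INTEGER, ymin INTEGER, xmax INTEGER, ymax INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ground_truth VALUES (?, ?, ?, ?, ?, ?)",
        [(image_path, *box) for box in boxes],
    )
    conn.commit()


SAMPLE = """<annotation><filename>1.jpg</filename>
<object><name>Nike</name>
<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>
</object></annotation>"""

conn = sqlite3.connect(":memory:")
store_ground_truth(conn, "test_images/1.jpg", parse_voc(SAMPLE))
print(conn.execute("SELECT logo_name, xmin, xmax FROM ground_truth").fetchall())
```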

Run Detection Tests

# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42

Key Parameters

Parameter	Default	Description
`-n, --num-logos`	10	Number of reference logos to sample
`-t, --threshold`	0.7	Similarity threshold for matching
`-d, --detr-threshold`	0.5	DETR detection confidence threshold
`-e, --embedding-model`	openai/clip-vit-large-patch14	Embedding model (CLIP or DINOv2)
`--matching-method`	margin	Matching method: `simple`, `margin`, or `multi-ref`
`--margin`	0.05	Margin over second-best match (margin/multi-ref)
`--refs-per-logo`	3	Reference images per logo
`--min-matching-refs`	1	Min refs that must match (multi-ref only)
`--use-max-similarity`	False	Use max instead of mean similarity (multi-ref only)
`--positive-samples`	5	Positive test images per logo
`--negative-samples`	20	Negative test images per logo
`-s, --seed`	None	Random seed for reproducibility
`--output-file`	None	Append results summary to file (clean output)
`--clear-cache`	False	Clear embedding cache before running

Matching Methods:

simple - Returns all logos above threshold (not recommended - too many false positives)
margin - Requires margin over second-best match (high precision, low recall)
multi-ref - Recommended. Aggregates scores across multiple reference images per logo

See --help for all options.
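The three methods differ mainly in how they turn per-reference similarity scores into a decision. The sketch below is one plausible reading of the descriptions above, not the code in logo_detection_detr.py. In particular, how the margin and `--min-matching-refs` checks combine is an assumption. `sims` is a hypothetical input that maps each logo to the similarities between one detected region and each of that logo's reference images.

```python
from statistics import mean
from typing import Optional

# Hypothetical input: for ONE detected region, each logo maps to its similarities
# against that logo's reference images (3 values with --refs-per-logo 3).
Sims = dict[str, list[float]]


def simple_match(sims: Sims, threshold: float) -> list[str]:
    """Every logo with any reference above threshold; can return several logos."""
    return [logo for logo, s in sims.items() if max(s) >= threshold]


def _best_with_margin(scores: dict[str, float], margin: float) -> Optional[str]:
    """Top-scoring logo, only if it beats the runner-up by at least `margin`."""
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_logo, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    return best_logo if best - runner_up >= margin else None


def margin_match(sims: Sims, threshold: float, margin: float) -> list[str]:
    """Single best logo (scored by its best reference), gated by threshold and margin."""
    scores = {logo: max(s) for logo, s in sims.items()}
    best = _best_with_margin(scores, margin)
    return [best] if best is not None and scores[best] >= threshold else []


def multi_ref_match(sims: Sims, threshold: float, margin: float,
                    min_matching_refs: int = 1, use_max: bool = False) -> list[str]:
    """Aggregate each logo's references (mean, or max with use_max), apply the margin,
    then require at least `min_matching_refs` individual references above threshold."""
    aggregate = max if use_max else mean
    scores = {logo: aggregate(s) for logo, s in sims.items()}
    best = _best_with_margin(scores, margin)
    if best is None:
        return []
    n_matching = sum(1 for x in sims[best] if x >= threshold)
    return [best] if n_matching >= min_matching_refs else []


sims = {"Nike": [0.82, 0.74, 0.61], "Puma": [0.71, 0.55, 0.50], "Adidas": [0.40, 0.42, 0.38]}
print(simple_match(sims, 0.70))                                # two logos for one region
print(margin_match(sims, 0.70, 0.05))
print(multi_ref_match(sims, 0.70, 0.05, min_matching_refs=2))
```

The example shows why `simple` over-reports: a single region clears the threshold for both Nike and Puma, while the margin-based methods commit to at most one logo.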

Run Comparison Tests

# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh

Script	Purpose	Output File
`run_comparison_tests.sh`	Compare matching methods	`test_results/comparison_*.txt`
`run_threshold_tests.sh`	Test threshold/margin combinations	`test_results/threshold_*.txt`
`run_model_comparison.sh`	Compare CLIP vs DINOv2 models	`test_results/model_comparison_results.txt`
`run_refs_per_logo_test.sh`	Test refs-per-logo values	`test_results/refs_per_logo_analysis.txt`
`run_preprocess_test.sh`	Compare preprocessing modes	`test_results/preprocessing_comparison.txt`
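The comparison scripts are loops around test_logo_detection.py. The Python sketch below illustrates the idea for a refs-per-logo sweep. It is a hypothetical equivalent, not the contents of run_refs_per_logo_test.sh, and the seed value is an assumption. It uses only flags from the Key Parameters table and defaults to a dry run that just prints the commands.

```python
import shlex
import subprocess

# Refs values mirror the benchmark table; the seed is an illustrative choice.
REFS_VALUES = (1, 3, 5, 7, 10)


def refs_sweep_commands(refs_values=REFS_VALUES, seed=42,
                        output_file="test_results/refs_per_logo_analysis.txt"):
    """Build one test_logo_detection.py invocation per refs-per-logo value."""
    commands = []
    for refs in refs_values:
        commands.append([
            "uv", "run", "python", "test_logo_detection.py",
            "--matching-method", "multi-ref", "--use-max-similarity",
            "--threshold", "0.70", "--margin", "0.05",
            "--refs-per-logo", str(refs),
            "--seed", str(seed),           # same seed for every run
            "--output-file", output_file,  # each run appends its summary
        ])
    return commands


def run_sweep(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_sweep(refs_sweep_commands())  # dry run: prints the five commands
```

Passing the same `--seed` to every run is meant to keep the sampled logos comparable across runs, and because `--output-file` appends, all runs accumulate in one results file.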

Project Structure

logo_test/
├── logo_detection_detr.py      # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py      # Test script for accuracy evaluation
├── prepare_test_data.py        # Script to prepare test database
├── run_comparison_tests.sh     # Compare all matching methods
├── run_threshold_tests.sh      # Test threshold/margin combinations
├── run_model_comparison.sh     # Compare CLIP vs DINOv2 models
├── test_data_mapping.db        # SQLite database with ground truth
├── reference_logos/            # Reference logo images (not in git)
├── test_images/                # Test images (not in git)
├── LogoDet-3K/                 # Source dataset (not in git)
├── logo_detection_detr_usage.md        # API usage guide
├── logo_detection_test_methodology.md  # Test methodology documentation
└── test_results_analysis.md    # Analysis of test results

Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

Non-Maximum Suppression (NMS) - Removes overlapping duplicate detections
Minimum Box Size Filtering - Filters out noise from tiny detections
Confidence Threshold Filtering - Removes low-confidence detections
Multiple Reference Images - Uses multiple refs per logo for robust matching
Margin-Based Matching - Requires confidence margin over second-best match
Multi-Ref Matching - Aggregates similarity scores across references
Embedding Caching - Caches embeddings to avoid recomputation
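The first three techniques are standard post-processing steps on DETR's raw boxes. The following minimal sketch applies them in the order a pipeline would typically use. The `min_size` and `iou_threshold` defaults are illustrative assumptions, not the framework's settings; only the 0.5 confidence default matches the documented `--detr-threshold`.

```python
Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.5, min_size=10, iou_threshold=0.5):
    """Confidence filter, then minimum box size filter, then greedy NMS."""
    candidates = [
        (box, score) for box, score in detections
        if score >= conf_threshold
        and box[2] - box[0] >= min_size and box[3] - box[1] >= min_size
    ]
    # Greedy NMS: visit boxes from most to least confident and drop any box that
    # overlaps an already-kept box by iou_threshold or more.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 100, 100), 0.90),
    ((5, 5, 105, 105), 0.80),      # near-duplicate of the first box
    ((200, 200, 205, 205), 0.95),  # 5x5 px: too small
    ((300, 300, 400, 400), 0.30),  # below the confidence threshold
    ((500, 500, 600, 600), 0.60),
]
print(filter_detections(detections))
```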

Models

Detection Model

DETR: Pravallika6/detr-finetuned-logo-detection_v2

Embedding Models (selectable via `-e/--embedding-model`)

Model	Type	Description
`openai/clip-vit-large-patch14`	CLIP	Default. General-purpose vision-language model
`openai/clip-vit-base-patch32`	CLIP	Smaller, faster CLIP variant
`facebook/dinov2-small`	DINOv2	Self-supervised, good for visual similarity
`facebook/dinov2-base`	DINOv2	Larger DINOv2 variant
`facebook/dinov2-large`	DINOv2	Largest DINOv2 variant

Models are automatically downloaded from Hugging Face on first run and cached in ~/.cache/huggingface/.

Note: When switching between embedding models, use --clear-cache to ensure embeddings are recomputed with the new model.

Documentation

API Usage Guide - How to use the DetectLogosDETR class
Test Methodology - Detailed explanation of test framework and tuning

License

MIT
