# Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Recommended Settings

Based on extensive testing with the LogoDet-3K dataset, these settings performed best:

| Parameter | Recommended Value | Notes |
|-----------|-------------------|-------|
| **Matching Method** | `multi-ref` | Best balance of precision and recall |
| **Similarity Aggregation** | `max` (`--use-max-similarity`) | Max outperforms mean aggregation |
| **Embedding Model** | `openai/clip-vit-large-patch14` | Significantly outperforms DINOv2 |
| **CLIP Threshold** | `0.70` | Good precision/recall balance |
| **DETR Threshold** | `0.50` | Default detection confidence |
| **Margin** | `0.05` | Reduces false positives |
| **Refs per Logo** | `7-10` | More references improve accuracy |
| **Preprocessing** | `default` | Best precision; letterbox and stretch reduce precision |

**Example command with recommended settings:**

```bash
uv run python test_logo_detection.py \
  --matching-method multi-ref \
  --refs-per-logo 10 \
  --threshold 0.70 \
  --margin 0.05 \
  --use-max-similarity
```

### Performance Benchmarks

With recommended settings (multi-ref with max aggregation, threshold 0.70, margin 0.05):

| Refs/Logo | Precision | Recall | F1 Score |
|-----------|-----------|--------|----------|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | **51.0%** | **79.9%** | **62.3%** |
| 10 | 50.2% | 81.6% | 62.1% |

**Key findings:**

- More reference images per logo consistently improve recall (65.9% with 1 ref, 81.6% with 10 refs)
- 7 or more refs give the best precision/recall balance
- Diminishing returns: F1 is essentially flat between 7 and 10 refs (62.3% vs. 62.1%)

### Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|--------|-----------|--------|-----|----------|
| `simple` | 1.3% | 203%* | 2.5% | Not recommended (too many FPs) |
| `margin` | 69.8% | 16.3% | 26.4% | High precision, low recall |
| `multi-ref` (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| `multi-ref` (max) | **51.8%** | **75.3%** | **61.4%** | **Best overall** |

*Recall exceeds 100% for `simple` because it returns every logo above the threshold, producing many duplicate matches.

### Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|-------|-----------|--------|-----|----------------|
| `openai/clip-vit-large-patch14` | **49.1%** | **77.0%** | **59.9%** | **Recommended** |
| `facebook/dinov2-small` | 22.4% | 42.8% | 29.5% | Not recommended |
| `facebook/dinov2-large` | 32.2% | 28.5% | 30.2% | Not recommended |

In these tests CLIP clearly outperforms DINOv2 for logo matching, with roughly double the F1 score (59.9% vs. 29.5-30.2%).

### Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|------|-----------|--------|-----|-------|
| `default` | **50.2%** | 81.6% | 62.1% | **Recommended** - best precision |
| `letterbox` | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| `stretch` | 34.5% | 113%* | 52.9% | Not recommended |

*Recall >100% indicates multiple detections per expected logo.

**Recommendation:** Use `default` preprocessing. Letterbox shows a marginally higher F1 (62.6% vs. 62.1%), but its precision is much worse (42.4% vs. 50.2%), which means more false positives.

---

## Overview

This project provides tools to:

- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics

## Architecture

The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares them against reference logos

## Installation

Requires Python 3.12+. Uses [uv](https://github.com/astral-sh/uv) for package management.

```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```

## Usage

### Prepare Test Data

The test framework requires the **LogoDet-3K** dataset.
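
LogoDet-3K pairs each image with a Pascal VOC XML annotation, and the bounding boxes in those files are what get cropped into reference logos. As an illustration, here is a minimal standard-library sketch of reading the boxes from one annotation. It assumes the standard VOC layout (`<object>` elements containing a `<name>` and a `<bndbox>` with `xmin`/`ymin`/`xmax`/`ymax`); that every LogoDet-3K file follows exactly this layout is an assumption, and `parse_voc_boxes` is a hypothetical helper, not a function from `prepare_test_data.py`.

```python
import xml.etree.ElementTree as ET


def parse_voc_boxes(xml_text):
    """Hypothetical helper: return [(label, (xmin, ymin, xmax, ymax)), ...]
    for every object in a Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        # findtext returns None if <name> is missing, "" if it is empty.
        label = (obj.findtext("name") or "").strip()
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # objects without a box cannot be cropped
        # Coordinates are sometimes stored as "12.0", so parse via float first.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes
```

To read an annotation from disk, pass `Path(xml_path).read_text()`. Each returned box is a `(left, upper, right, lower)` tuple, which is the form Pillow's `Image.crop()` accepts.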
Download the dataset and place it in the project directory:

```
logo_test/
├── LogoDet-3K/            # Dataset directory (required)
│   ├── Clothes/           # Category directories
│   │   ├── Adidas/        # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```

Each image should have a corresponding Pascal VOC XML annotation file that defines its logo bounding boxes.

Then run the preparation script:

```bash
uv run python prepare_test_data.py
```

This script:

1. Scans `LogoDet-3K/` for images and XML annotation files
2. Extracts cropped logo regions using the bounding box data → saves to `reference_logos/`
3. Copies full images → saves to `test_images/`
4. Creates `test_data_mapping.db`, a SQLite database with ground truth mappings

### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
  --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42
```

### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | Similarity threshold for matching |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `-e, --embedding-model` | openai/clip-vit-large-patch14 | Embedding model (CLIP or DINOv2) |
| `--matching-method` | margin | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Margin over second-best match (margin/multi-ref) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--min-matching-refs` | 1 | Min refs that must match (multi-ref only) |
| `--use-max-similarity` | False | Use max instead of mean similarity (multi-ref only) |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `-s, --seed` | None | Random seed for reproducibility |
| `--output-file` | None | Append a clean results summary to this file |
| `--clear-cache` | False | Clear embedding cache before running |

**Matching Methods:**

- `simple` - Returns all logos above threshold (not recommended - too many false positives)
- `margin` - Requires a margin over the second-best match (high precision, low recall)
- `multi-ref` - **Recommended.** Aggregates scores across multiple reference images per logo

See `--help` for all options.

### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh

# Compare preprocessing modes
./run_preprocess_test.sh
```

| Script | Purpose | Output File |
|--------|---------|-------------|
| `run_comparison_tests.sh` | Compare matching methods | `test_results/comparison_*.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `test_results/threshold_*.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `test_results/model_comparison_results.txt` |
| `run_refs_per_logo_test.sh` | Test refs-per-logo values | `test_results/refs_per_logo_analysis.txt` |
| `run_preprocess_test.sh` | Compare preprocessing modes | `test_results/preprocessing_comparison.txt` |

## Project Structure

```
logo_test/
├── logo_detection_detr.py              # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py              # Test script for accuracy evaluation
├── prepare_test_data.py                # Script to prepare test database
├── run_comparison_tests.sh             # Compare all matching methods
├── run_threshold_tests.sh              # Test threshold/margin combinations
├── run_model_comparison.sh             # Compare CLIP vs DINOv2 models
├── run_refs_per_logo_test.sh           # Test refs-per-logo values
├── run_preprocess_test.sh              # Compare preprocessing modes
├── test_data_mapping.db                # SQLite database with ground truth
├── reference_logos/                    # Reference logo images (not in git)
├── test_images/                        # Test images (not in git)
├── test_results/                       # Output of the comparison scripts
├── LogoDet-3K/                         # Source dataset (not in git)
├── logo_detection_detr_usage.md        # API usage guide
├── logo_detection_test_methodology.md  # Test methodology documentation
└── test_results_analysis.md            # Analysis of test results
```

## Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

1. **Non-Maximum Suppression (NMS)** - Removes overlapping duplicate detections
2. **Minimum Box Size Filtering** - Filters out noise from tiny detections
3. **Confidence Threshold Filtering** - Removes low-confidence detections
4. **Multiple Reference Images** - Uses multiple refs per logo for robust matching
5. **Margin-Based Matching** - Requires a similarity margin over the second-best match
6. **Multi-Ref Matching** - Aggregates similarity scores across references
7. **Embedding Caching** - Caches embeddings to avoid recomputation

## Models

### Detection Model

- **DETR**: `Pravallika6/detr-finetuned-logo-detection_v2`

### Embedding Models (selectable via `-e/--embedding-model`)

| Model | Type | Description |
|-------|------|-------------|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |

Models are automatically downloaded from HuggingFace on first run and cached in `~/.cache/huggingface/`.

**Note**: When switching between embedding models, use `--clear-cache` so that cached embeddings are recomputed with the new model.
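
The `multi-ref` matching described above (Matching Methods, and techniques 4-6 under Accuracy Improvement Techniques) can be illustrated with a small pure-Python sketch. It scores one detected region's embedding against every reference embedding, aggregates per logo with max (or mean), and then applies the similarity threshold, a minimum number of matching references, and the margin over the second-best logo. This is not the `DetectLogosDETR` implementation: the function names `cosine` and `match_multi_ref`, the use of cosine similarity, and the order of the checks are assumptions made for illustration, and the toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_multi_ref(detection, references, threshold=0.70, margin=0.05,
                    min_matching_refs=1, use_max=True):
    """Hypothetical multi-ref matcher: return the best logo name, or None.

    detection:  embedding of one detected region
    references: {logo_name: [ref_embedding, ...]}
    """
    scores = {}
    for name, refs in references.items():
        if not refs:
            continue
        sims = [cosine(detection, ref) for ref in refs]
        # Per-logo aggregation: max over the references, or their mean.
        agg = max(sims) if use_max else sum(sims) / len(sims)
        # Number of individual references that clear the threshold.
        hits = sum(1 for s in sims if s >= threshold)
        scores[name] = (agg, hits)

    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1][0], reverse=True)
    best_name, (best_score, best_hits) = ranked[0]
    second_score = ranked[1][1][0] if len(ranked) > 1 else float("-inf")

    if best_score < threshold or best_hits < min_matching_refs:
        return None
    # Margin check: the winner must beat the runner-up logo by at least `margin`.
    if best_score - second_score < margin:
        return None
    return best_name


refs = {
    "nike": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "adidas": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
print(match_multi_ref([0.95, 0.05, 0.0], refs))  # nike
print(match_multi_ref([1.0, 1.0, 0.0], refs))    # None: tie fails the margin
```

With max aggregation a single close reference is enough for a logo to win, which fits the higher recall of `multi-ref` (max) over `multi-ref` (mean) in the comparison above; the margin check is what rejects a detection that looks equally like two logos.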
## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the DetectLogosDETR class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of the test framework and tuning

## License

MIT