# Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Overview

This project provides tools to:

- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics

## Architecture

The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares them against reference logos

## Installation

Requires Python 3.12+. Uses [uv](https://github.com/astral-sh/uv) for package management.

```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```

## Usage

### Prepare Test Data

The test framework requires the **LogoDet-3K** dataset. Download it and place it in the project directory:

```
logo_test/
├── LogoDet-3K/          # Dataset directory (required)
│   ├── Clothes/         # Category directories
│   │   ├── Adidas/      # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```

The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.

Then run the preparation script:

```bash
uv run python prepare_test_data.py
```

This script:

1. Scans `LogoDet-3K/` for images and XML annotation files
2. Extracts cropped logo regions using bounding box data → saves to `reference_logos/`
3. Copies full images → saves to `test_images/`
4. Creates `test_data_mapping.db`, a SQLite database with ground truth mappings

### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and a custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use the multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with a fixed seed
uv run python test_logo_detection.py -n 50 --seed 42
```

### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | Similarity threshold for matching |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `-e, --embedding-model` | `openai/clip-vit-large-patch14` | Embedding model (CLIP or DINOv2) |
| `--matching-method` | margin | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Margin over second-best match (margin/multi-ref) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--min-matching-refs` | 1 | Minimum refs that must match (multi-ref only) |
| `--use-max-similarity` | False | Use max instead of mean similarity (multi-ref only) |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `-s, --seed` | None | Random seed for reproducibility |
| `--output-file` | None | Append results summary to file (clean output) |
| `--clear-cache` | False | Clear embedding cache before running |

**Matching Methods** (illustrated in the sketch below):

- `simple` - Returns all logos above the threshold (baseline, most permissive)
- `margin` - Requires a margin over the second-best match (reduces false positives)
- `multi-ref` - Aggregates scores across multiple reference images per logo

See `--help` for all options.
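As a rough illustration of how the three strategies differ, here is a minimal, hypothetical sketch that applies them to per-reference cosine-similarity scores. The function and parameter names are illustrative only and are not taken from `test_logo_detection.py`; in particular, the assumption that `margin` and `multi-ref` still apply the base threshold is mine.

```python
# Hypothetical sketch of the simple / margin / multi-ref matching strategies.
# `scores_by_logo` maps each candidate logo name to a list of cosine
# similarities, one per reference image of that logo.
from statistics import mean


def match_logo(scores_by_logo, method="margin", threshold=0.7, margin=0.05,
               min_matching_refs=1, use_max_similarity=False):
    # Collapse each logo's per-reference scores to a single aggregated score.
    if use_max_similarity:
        aggregated = {logo: max(s) for logo, s in scores_by_logo.items()}
    else:
        aggregated = {logo: mean(s) for logo, s in scores_by_logo.items()}

    if method == "simple":
        # Baseline: every logo whose aggregated score clears the threshold.
        return [logo for logo, s in aggregated.items() if s >= threshold]

    ranked = sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
    best_logo, best_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0

    if method == "margin":
        # Accept the best logo only if it clears the threshold AND beats the
        # runner-up by at least `margin`.
        if best_score >= threshold and best_score - second_score >= margin:
            return [best_logo]
        return []

    if method == "multi-ref":
        # As above, but additionally require that enough individual
        # reference images matched on their own.
        matching_refs = sum(1 for s in scores_by_logo[best_logo] if s >= threshold)
        if (best_score >= threshold
                and best_score - second_score >= margin
                and matching_refs >= min_matching_refs):
            return [best_logo]
        return []

    raise ValueError(f"unknown method: {method}")


# Example: Nike clearly wins and two of its references clear the threshold.
scores = {"Nike": [0.82, 0.79, 0.75], "Adidas": [0.61, 0.58, 0.55]}
print(match_logo(scores, method="multi-ref", min_matching_refs=2))  # ['Nike']
```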
### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh
```

| Script | Purpose | Output File |
|--------|---------|-------------|
| `run_comparison_tests.sh` | Compare all matching methods | `comparison_results.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `threshold_test_results.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `model_comparison_results.txt` |

## Project Structure

```
logo_test/
├── logo_detection_detr.py               # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py               # Test script for accuracy evaluation
├── prepare_test_data.py                 # Script to prepare the test database
├── run_comparison_tests.sh              # Compare all matching methods
├── run_threshold_tests.sh               # Test threshold/margin combinations
├── run_model_comparison.sh              # Compare CLIP vs DINOv2 models
├── test_data_mapping.db                 # SQLite database with ground truth
├── reference_logos/                     # Reference logo images (not in git)
├── test_images/                         # Test images (not in git)
├── LogoDet-3K/                          # Source dataset (not in git)
├── logo_detection_detr_usage.md         # API usage guide
├── logo_detection_test_methodology.md   # Test methodology documentation
└── test_results_analysis.md             # Analysis of test results
```

## Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

1. **Non-Maximum Suppression (NMS)** - Removes overlapping duplicate detections
2. **Minimum Box Size Filtering** - Filters out noise from tiny detections
3. **Confidence Threshold Filtering** - Removes low-confidence detections
4. **Multiple Reference Images** - Uses multiple refs per logo for robust matching
5. **Margin-Based Matching** - Requires a confidence margin over the second-best match
6. **Multi-Ref Matching** - Aggregates similarity scores across references
7. **Embedding Caching** - Caches embeddings to avoid recomputation

## Models

### Detection Model

- **DETR**: `Pravallika6/detr-finetuned-logo-detection_v2`

### Embedding Models (selectable via `-e/--embedding-model`)

| Model | Type | Description |
|-------|------|-------------|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |

Models are automatically downloaded from HuggingFace on first run and cached in `~/.cache/huggingface/`. A rough sketch of how the detection and embedding models fit together appears in the appendix at the end of this README.

**Note**: When switching between embedding models, use `--clear-cache` to ensure embeddings are recomputed with the new model.

## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the DetectLogosDETR class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of the test framework and tuning

## License

MIT
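## Appendix: Two-Stage Pipeline Sketch

The sketch below shows how the detection and embedding models listed above can be combined into the two-stage pipeline described in the Architecture section, using standard HuggingFace `transformers` APIs. It is a minimal, hypothetical example, not the actual `DetectLogosDETR` implementation (see the [API Usage Guide](logo_detection_detr_usage.md) for that); loading the checkpoints directly with the Auto classes is an assumption.

```python
# Minimal sketch: DETR proposes logo regions, CLIP embeds each region so it
# can later be compared against reference-logo embeddings.
import torch
from PIL import Image
from transformers import (AutoImageProcessor, AutoModelForObjectDetection,
                          CLIPModel, CLIPProcessor)

DETR_ID = "Pravallika6/detr-finetuned-logo-detection_v2"
CLIP_ID = "openai/clip-vit-large-patch14"

detr_processor = AutoImageProcessor.from_pretrained(DETR_ID)
detr_model = AutoModelForObjectDetection.from_pretrained(DETR_ID)
clip_processor = CLIPProcessor.from_pretrained(CLIP_ID)
clip_model = CLIPModel.from_pretrained(CLIP_ID)


def detect_and_embed(image_path, detr_threshold=0.5):
    """Return (bounding box, CLIP embedding) pairs for candidate logo regions."""
    image = Image.open(image_path).convert("RGB")

    # Stage 1: DETR proposes candidate logo regions.
    inputs = detr_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = detr_model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    detections = detr_processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=detr_threshold)[0]

    # Stage 2: CLIP embeds each cropped region for similarity matching.
    results = []
    for box in detections["boxes"]:
        crop = image.crop([round(v) for v in box.tolist()])
        clip_inputs = clip_processor(images=crop, return_tensors="pt")
        with torch.no_grad():
            embedding = clip_model.get_image_features(**clip_inputs)
        embedding = embedding / embedding.norm(dim=-1, keepdim=True)  # unit-normalize
        results.append((box.tolist(), embedding.squeeze(0)))
    return results
```

Matching then reduces to computing cosine similarity between each region embedding and the cached reference-logo embeddings, and applying one of the matching methods described under Key Parameters.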