# Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Recommended Settings

Based on testing with the LogoDet-3K dataset, the following settings performed best:

| Parameter | Recommended Value | Notes |
|-----------|-------------------|-------|
| **Matching Method** | `multi-ref` | Best balance of precision and recall |
| **Similarity Aggregation** | `max` | Max outperforms mean aggregation; enabled with `--use-max-similarity` |
| **Embedding Model** | `openai/clip-vit-large-patch14` | Significantly outperforms DINOv2 |
| **CLIP Threshold** | `0.70` | Good precision/recall balance |
| **DETR Threshold** | `0.50` | Default detection confidence |
| **Margin** | `0.05` | Reduces false positives |
| **Refs per Logo** | `7-10` | Recall improves with more references; F1 peaks around 7 |
| **Preprocessing** | `default` | Best precision; letterbox/stretch hurt precision |

**Example command with recommended settings:**
```bash
uv run python test_logo_detection.py \
  --matching-method multi-ref \
  --refs-per-logo 10 \
  --threshold 0.70 \
  --margin 0.05 \
  --use-max-similarity
```

### Performance Benchmarks

With recommended settings (multi-ref max, threshold 0.70, margin 0.05):

| Refs/Logo | Precision | Recall | F1 Score |
|-----------|-----------|--------|----------|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | **51.0%** | **79.9%** | **62.3%** |
| 10 | 50.2% | 81.6% | 62.1% |

**Key findings:**
- More reference images per logo consistently improve recall (65.9% with 1 ref, 81.6% with 10)
- 7+ refs provide the best precision/recall balance
- Returns diminish beyond 7 refs: going from 7 to 10 adds 1.7 points of recall while F1 stays essentially flat

### Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|--------|-----------|--------|-----|----------|
| `simple` | 1.3% | 203%* | 2.5% | Not recommended (too many FPs) |
| `margin` | 69.8% | 16.3% | 26.4% | High precision, low recall |
| `multi-ref` (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| `multi-ref` (max) | **51.8%** | **75.3%** | **61.4%** | **Best overall** |

*The `simple` method returns every match above the threshold, so duplicate detections of the same logo push recall above 100%.

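The starred recall value follows from how the metrics are counted. The sketch below assumes that recall is the number of correct detections divided by the number of expected logos, with no deduplication; this is an assumption consistent with the footnote, not something confirmed from the test script. The function name `detection_metrics` and the counts are hypothetical.

```python
def detection_metrics(true_positives: int, false_positives: int, expected: int) -> dict:
    """Precision, recall, and F1 from raw counts.

    Assumption: true_positives counts every correct detection, including
    repeated detections of the same logo, so recall can exceed 1.0.
    """
    detected = true_positives + false_positives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 50 expected logos, 60 correct detections (some logos
# matched more than once), and 40 incorrect detections.
metrics = detection_metrics(true_positives=60, false_positives=40, expected=50)
print(metrics)
```

The F1 column in the table above is consistent with this harmonic-mean formula. For example, the `margin` row gives 2 × 69.8 × 16.3 / (69.8 + 16.3) ≈ 26.4%.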
### Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|-------|-----------|--------|-----|----------------|
| `openai/clip-vit-large-patch14` | **49.1%** | **77.0%** | **59.9%** | **Recommended** |
| `facebook/dinov2-small` | 22.4% | 42.8% | 29.5% | Not recommended |
| `facebook/dinov2-large` | 32.2% | 28.5% | 30.2% | Not recommended |

CLIP significantly outperforms DINOv2 for logo matching: its F1 score (59.9%) is roughly double that of either DINOv2 variant tested (about 30%).

### Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|------|-----------|--------|-----|-------|
| `default` | **50.2%** | 81.6% | 62.1% | **Recommended** - best precision |
| `letterbox` | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| `stretch` | 34.5% | 113%* | 52.9% | Not recommended |

*Recall >100% indicates multiple detections per expected logo.

**Recommendation:** Use `default` preprocessing. Letterbox has a marginally higher F1 (62.6% vs. 62.1%), but its precision is 7.8 points lower (42.4% vs. 50.2%), which means more false positives.

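For readers unfamiliar with the terms, the sketch below shows the geometry usually meant by `stretch` and `letterbox`. It is illustrative only: the 224-pixel target size, the function names, and the centered padding are assumptions, and the framework's actual preprocessing (including what `default` does) is not shown here.

```python
def stretch_size(width: int, height: int, target: int = 224) -> tuple:
    """Stretch: resize to target x target, ignoring the original aspect ratio."""
    return (target, target)


def letterbox_geometry(width: int, height: int, target: int = 224) -> dict:
    """Letterbox: scale to fit inside target x target, then pad the remainder."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    return {
        "resized": (new_w, new_h),
        # Split the padding evenly; any odd pixel goes to the right/bottom edge.
        "pad_left": pad_x // 2,
        "pad_right": pad_x - pad_x // 2,
        "pad_top": pad_y // 2,
        "pad_bottom": pad_y - pad_y // 2,
    }


# A wide 400x100 logo crop keeps its 4:1 shape under letterboxing,
# while stretching would distort it into a square.
geometry = letterbox_geometry(400, 100)
print(geometry)
print(stretch_size(400, 100))
```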

---

## Overview

This project provides tools to:
- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics

## Architecture

The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares against reference logos

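The data flow of the two stages can be sketched independently of any particular model. In the sketch below the detector and embedder are passed in as plain callables, and the stand-ins used in the example are hypothetical; none of this is the `DetectLogosDETR` API. It only shows boxes coming out of stage one, one embedding per box, and a cosine-similarity comparison against reference embeddings.

```python
import math
from typing import Callable, Sequence

Box = tuple  # (xmin, ymin, xmax, ymax)
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_pipeline(
    image: object,
    detect: Callable[[object], list],
    embed: Callable[[object, Box], Vector],
    references: dict,
) -> list:
    """Stage 1: detect boxes. Stage 2: embed each box and pick the closest reference."""
    results = []
    for box in detect(image):
        embedding = embed(image, box)
        name, score = max(
            ((n, cosine_similarity(embedding, ref)) for n, ref in references.items()),
            key=lambda pair: pair[1],
        )
        results.append((box, name, score))
    return results


# Stand-in models (hypothetical): one fixed box and a fixed 2-D embedding.
def fake_detect(image):
    return [(10, 10, 50, 50)]


def fake_embed(image, box):
    return [1.0, 0.0]


refs = {"brand_a": [0.9, 0.1], "brand_b": [0.0, 1.0]}
matches = run_pipeline("image.jpg", fake_detect, fake_embed, refs)
print(matches)
```

In the real framework a similarity threshold and a matching method are applied on top of the raw best score; this sketch stops at the nearest reference.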
## Installation

Requires Python 3.12+. Uses [uv](https://github.com/astral-sh/uv) for package management.

```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```

## Usage

### Prepare Test Data

The test framework requires the **LogoDet-3K** dataset. Download it and place it in the project directory:

```
logo_test/
├── LogoDet-3K/              # Dataset directory (required)
│   ├── Clothes/             # Category directories
│   │   ├── Adidas/          # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```

The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.

Then run the preparation script:

```bash
uv run python prepare_test_data.py
```

This script:
1. Scans `LogoDet-3K/` for images and XML annotation files
2. Extracts cropped logo regions using bounding box data → saves to `reference_logos/`
3. Copies full images → saves to `test_images/`
4. Creates `test_data_mapping.db` SQLite database with ground truth mappings

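To illustrate the bounding box data used in step 2, here is a minimal sketch that reads boxes from a Pascal VOC annotation using only the standard library. The function name `parse_voc_boxes` and the sample annotation are hypothetical; this is not the code in `prepare_test_data.py`.

```python
import xml.etree.ElementTree as ET


def parse_voc_boxes(xml_text: str) -> list:
    """Return (label, (xmin, ymin, xmax, ymax)) for each <object> in a VOC annotation."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name", default="")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # Skip objects that carry no bounding box.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes


# Hand-written annotation for illustration (not taken from the dataset).
sample = """
<annotation>
  <filename>1.jpg</filename>
  <object>
    <name>Adidas</name>
    <bndbox><xmin>12</xmin><ymin>30</ymin><xmax>140</xmax><ymax>96</ymax></bndbox>
  </object>
</annotation>
"""
boxes = parse_voc_boxes(sample)
print(boxes)
```

If Pillow is used for image handling (an assumption; the source does not say), each tuple can be passed directly to `Image.crop`, which takes a `(left, upper, right, lower)` box.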
### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
  --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42
```

### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | Similarity threshold for matching |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `-e, --embedding-model` | openai/clip-vit-large-patch14 | Embedding model (CLIP or DINOv2) |
| `--matching-method` | margin | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Margin over second-best match (margin/multi-ref) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--min-matching-refs` | 1 | Min refs that must match (multi-ref only) |
| `--use-max-similarity` | False | Use max instead of mean similarity (multi-ref only) |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `-s, --seed` | None | Random seed for reproducibility |
| `--output-file` | None | Append results summary to file (clean output) |
| `--clear-cache` | False | Clear embedding cache before running |

**Matching Methods:**
- `simple` - Returns all logos above threshold (not recommended - too many false positives)
- `margin` - Requires margin over second-best match (high precision, low recall)
- `multi-ref` - **Recommended.** Aggregates scores across multiple reference images per logo

See `--help` for all options.

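The three matching methods described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the framework's implementation: the function names and similarity values are hypothetical, and the `--min-matching-refs` check is left out.

```python
from typing import Optional


def match_simple(scores: dict, threshold: float) -> list:
    """simple: return every logo whose score clears the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


def match_margin(scores: dict, threshold: float, margin: float) -> Optional[str]:
    """margin: return the best logo only if it also beats the runner-up by `margin`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return None
    return ranked[0][0]


def aggregate_refs(ref_scores: dict, use_max: bool) -> dict:
    """multi-ref: collapse per-reference scores into one score per logo."""
    return {
        name: (max(values) if use_max else sum(values) / len(values))
        for name, values in ref_scores.items()
        if values
    }


# Hypothetical similarities of one detected region to three references per logo.
ref_scores = {"nike": [0.82, 0.55, 0.60], "adidas": [0.74, 0.72, 0.73]}

by_max = aggregate_refs(ref_scores, use_max=True)
by_mean = aggregate_refs(ref_scores, use_max=False)

simple_result = match_simple(by_max, threshold=0.70)
max_result = match_margin(by_max, threshold=0.70, margin=0.05)
mean_result = match_margin(by_mean, threshold=0.70, margin=0.05)
print(simple_result, max_result, mean_result)
```

With these made-up numbers, `simple` reports both logos, max aggregation picks `nike` on the strength of its single best reference, and mean aggregation picks `adidas` because its references score more evenly. Which behavior is preferable depends on the data; the benchmarks in this README favor max.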
### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh

# Compare preprocessing modes
./run_preprocess_test.sh
```

| Script | Purpose | Output File |
|--------|---------|-------------|
| `run_comparison_tests.sh` | Compare matching methods | `test_results/comparison_*.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `test_results/threshold_*.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `test_results/model_comparison_results.txt` |
| `run_refs_per_logo_test.sh` | Test refs-per-logo values | `test_results/refs_per_logo_analysis.txt` |
| `run_preprocess_test.sh` | Compare preprocessing modes | `test_results/preprocessing_comparison.txt` |

## Project Structure

```
logo_test/
├── logo_detection_detr.py              # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py              # Test script for accuracy evaluation
├── prepare_test_data.py                # Script to prepare test database
├── run_comparison_tests.sh             # Compare all matching methods
├── run_threshold_tests.sh              # Test threshold/margin combinations
├── run_model_comparison.sh             # Compare CLIP vs DINOv2 models
├── run_refs_per_logo_test.sh           # Test refs-per-logo values
├── run_preprocess_test.sh              # Compare preprocessing modes
├── test_data_mapping.db                # SQLite database with ground truth
├── reference_logos/                    # Reference logo images (not in git)
├── test_images/                        # Test images (not in git)
├── LogoDet-3K/                         # Source dataset (not in git)
├── logo_detection_detr_usage.md        # API usage guide
├── logo_detection_test_methodology.md  # Test methodology documentation
└── test_results_analysis.md            # Analysis of test results
```

## Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

1. **Non-Maximum Suppression (NMS)** - Removes overlapping duplicate detections
2. **Minimum Box Size Filtering** - Filters out noise from tiny detections
3. **Confidence Threshold Filtering** - Removes low-confidence detections
4. **Multiple Reference Images** - Uses multiple refs per logo for robust matching
5. **Margin-Based Matching** - Requires confidence margin over second-best match
6. **Multi-Ref Matching** - Aggregates similarity scores across references
7. **Embedding Caching** - Caches embeddings to avoid recomputation

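As an illustration of the first technique, the sketch below implements the standard greedy form of non-maximum suppression in plain Python. The IoU threshold of 0.5 and the function names are assumptions; the framework's own NMS settings may differ.

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections: list, iou_threshold: float = 0.5) -> list:
    """Greedy NMS over (box, score) pairs.

    Visit boxes from highest to lowest score and keep a box only if it does
    not overlap an already kept box by more than iou_threshold.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((10, 10, 110, 110), 0.95),
    ((12, 12, 112, 112), 0.80),  # near-duplicate of the first box
    ((200, 200, 260, 260), 0.70),
]
kept = non_max_suppression(detections)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.92, so it is dropped, while the third box is kept because it does not overlap anything.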
## Models

### Detection Model
- **DETR**: `Pravallika6/detr-finetuned-logo-detection_v2`

### Embedding Models (selectable via `-e/--embedding-model`)

| Model | Type | Description |
|-------|------|-------------|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |

Models are automatically downloaded from HuggingFace on first run and cached in `~/.cache/huggingface/`.

**Note**: When switching between embedding models, use `--clear-cache` to ensure embeddings are recomputed with the new model.

## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the DetectLogosDETR class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of test framework and tuning

## License

MIT