Initial commit: Logo detection test framework
Add DETR+CLIP based logo detection library and test framework:

- DetectLogosDETR class for logo detection and matching
- Test script with margin-based and multi-ref matching methods
- Data preparation script for test database
- Documentation for API usage and test methodology
README.md
# Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Overview

This project provides tools to:

- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics
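
The three metrics named above follow the standard definitions. The sketch below is only an illustration of those formulas; the function name `precision_recall_f1` and its count-based inputs are assumptions and are not taken from `test_logo_detection.py`.

```python
def precision_recall_f1(
    true_positives: int, false_positives: int, false_negatives: int
) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw match counts.

    Hypothetical helper: the real test script may count and aggregate
    matches differently.
    """
    predicted = true_positives + false_positives  # everything the system reported
    actual = true_positives + false_negatives     # everything in the ground truth

    # Guard against division by zero when nothing was predicted or expected.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0

    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

For example, 8 correct matches with 2 false positives and 4 missed logos give a precision of 0.8 and a recall of about 0.67.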

## Architecture

The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares against reference logos
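
The two stages can be sketched with the models abstracted away as plain callables. This is a minimal illustration, not the `DetectLogosDETR` implementation: `run_pipeline`, `detect_regions`, `embed_region`, and the use of cosine similarity are all assumptions made for the example.

```python
import math
from typing import Any, Callable, Sequence

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_pipeline(
    image: Any,
    detect_regions: Callable[[Any], list[Box]],          # stage 1 (DETR stand-in)
    embed_region: Callable[[Any, Box], Sequence[float]],  # stage 2 (CLIP stand-in)
    reference_embeddings: dict[str, Sequence[float]],
    threshold: float = 0.7,
) -> list[tuple[Box, str, float]]:
    """Return (box, logo_name, similarity) for every region that matches."""
    matches = []
    for box in detect_regions(image):
        embedding = embed_region(image, box)
        best_name, best_score = None, -1.0
        # Compare the region against every reference logo, keep the best.
        for name, reference in reference_embeddings.items():
            score = cosine_similarity(embedding, reference)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            matches.append((box, best_name, best_score))
    return matches
```

Keeping the detector and embedder behind callables makes the matching logic testable without downloading either model.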

## Installation

Requires Python 3.12+. Uses [uv](https://github.com/astral-sh/uv) for package management.

```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```

## Usage

### Prepare Test Data

First, prepare the test database with logo mappings:

```bash
uv run python prepare_test_data.py
```

This creates `test_data_mapping.db` with ground truth mappings between test images and logos.

### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with seed
uv run python test_logo_detection.py -n 50 --seed 42
```

### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | CLIP similarity threshold |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `--matching-method` | margin | Matching method: `margin` or `multi-ref` |
| `--margin` | 0.05 | Required similarity margin over the second-best match (margin method) |
| `--min-matching-refs` | 1 | Minimum number of reference images that must match (multi-ref method) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `-s, --seed` | None | Random seed for reproducibility |

See `--help` for all options.
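
For readers who want to see the table as code, the flags and defaults could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the table alone: the `build_parser` name, the argument types, and the help strings are assumptions, and the actual parser in `test_logo_detection.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the Key Parameters table."""
    parser = argparse.ArgumentParser(description="Evaluate logo detection accuracy")
    parser.add_argument("-n", "--num-logos", type=int, default=10,
                        help="Number of reference logos to sample")
    parser.add_argument("-t", "--threshold", type=float, default=0.7,
                        help="CLIP similarity threshold")
    parser.add_argument("-d", "--detr-threshold", type=float, default=0.5,
                        help="DETR detection confidence threshold")
    parser.add_argument("--matching-method", choices=["margin", "multi-ref"],
                        default="margin", help="Matching method")
    parser.add_argument("--margin", type=float, default=0.05,
                        help="Margin over second-best match (margin method)")
    parser.add_argument("--min-matching-refs", type=int, default=1,
                        help="Min refs that must match (multi-ref method)")
    parser.add_argument("--refs-per-logo", type=int, default=3,
                        help="Reference images per logo")
    parser.add_argument("-s", "--seed", type=int, default=None,
                        help="Random seed for reproducibility")
    return parser


if __name__ == "__main__":
    # argparse converts dashes to underscores, e.g. args.num_logos.
    args = build_parser().parse_args()
    print(args)
```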

## Project Structure

```
logo_test/
├── logo_detection_detr.py              # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py              # Test script for accuracy evaluation
├── prepare_test_data.py                # Script to prepare test database
├── test_data_mapping.db                # SQLite database with ground truth
├── reference_logos/                    # Reference logo images (not in git)
├── test_images/                        # Test images (not in git)
├── logo_detection_detr_usage.md        # API usage guide
└── logo_detection_test_methodology.md  # Test methodology documentation
```

## Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

1. **Non-Maximum Suppression (NMS)** - Removes overlapping duplicate detections
2. **Minimum Box Size Filtering** - Filters out noise from tiny detections
3. **Confidence Threshold Filtering** - Removes low-confidence detections
4. **Multiple Reference Images** - Uses multiple reference images per logo for robust matching
5. **Margin-Based Matching** - Requires the best match to beat the second-best match by a similarity margin
6. **Multi-Ref Matching** - Aggregates similarity scores across references
7. **Embedding Caching** - Caches embeddings to avoid recomputation

## Models

The framework uses:

- **DETR**: `Pravallika6/detr-finetuned-logo-detection_v2`
- **CLIP**: `openai/clip-vit-large-patch14`

Models are automatically downloaded from HuggingFace on first run and cached in `~/.cache/huggingface/`.

## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the DetectLogosDETR class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of test framework and tuning

## License

MIT