Update README with recommended settings and test results
Add comprehensive recommendations section based on LogoDet-3K testing:
- Optimal parameter settings table (multi-ref, max aggregation, CLIP model)
- Performance benchmarks for refs-per-logo (1-10 refs)
- Matching method comparison (simple vs margin vs multi-ref)
- Embedding model comparison (CLIP vs DINOv2)
- Preprocessing mode comparison (default vs letterbox vs stretch)
README.md
@@ -2,6 +2,83 @@

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Recommended Settings

Based on extensive testing with the LogoDet-3K dataset, these are the optimal settings:

| Parameter | Recommended Value | Notes |
|-----------|-------------------|-------|
| **Matching Method** | `multi-ref` | Best balance of precision and recall |
| **Similarity Aggregation** | `max` (default) | Max outperforms mean aggregation |
| **Embedding Model** | `openai/clip-vit-large-patch14` | Significantly outperforms DINOv2 |
| **CLIP Threshold** | `0.70` | Good precision/recall balance |
| **DETR Threshold** | `0.50` | Default detection confidence |
| **Margin** | `0.05` | Reduces false positives |
| **Refs per Logo** | `7-10` | More references = better accuracy |
| **Preprocessing** | `default` | Best precision; letterbox/stretch hurt precision |

**Example command with recommended settings:**

```bash
uv run python test_logo_detection.py \
  --matching-method multi-ref \
  --refs-per-logo 10 \
  --threshold 0.70 \
  --margin 0.05 \
  --use-max-similarity
```

### Performance Benchmarks

With recommended settings (multi-ref max, threshold 0.70, margin 0.05):

| Refs/Logo | Precision | Recall | F1 Score |
|-----------|-----------|--------|----------|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | **51.0%** | **79.9%** | **62.3%** |
| 10 | 50.2% | 81.6% | 62.1% |

**Key findings:**
- Recall rises with every increase in reference images per logo (65.9% at 1 ref, 81.6% at 10)
- 7 or more refs give the best precision/recall balance
- Returns diminish by 10 refs: going from 7 to 10 adds 1.7 points of recall while precision and F1 dip slightly
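
The F1 column is the harmonic mean of precision and recall. As a quick sanity check, the minimal sketch below recomputes the 7-refs row; the helper name `f1_score` is chosen here for illustration and is not taken from `test_logo_detection.py`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the 7-refs row of the benchmark table.
f1_7_refs = f1_score(51.0, 79.9)
print(f"{f1_7_refs:.1f}%")  # 62.3%
```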

### Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|--------|-----------|--------|-----|----------|
| `simple` | 1.3% | 203%* | 2.5% | Not recommended (too many false positives) |
| `margin` | 69.8% | 16.3% | 26.4% | High precision, low recall |
| `multi-ref` (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| `multi-ref` (max) | **51.8%** | **75.3%** | **61.4%** | **Best overall** |

*The `simple` method returns every match above the threshold, so duplicate matches push its recall above 100%.
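
To make the differences between the strategies concrete, here is a rough sketch built only from the one-line descriptions in this README. The function names, the score layout, the logo names, and the way the margin is combined with multi-ref aggregation are all assumptions for illustration; the actual logic in `test_logo_detection.py` may differ.

```python
from statistics import mean

# Hypothetical similarity scores between ONE detected region and each
# reference image, grouped by logo name (made-up values).
scores = {
    "acme": [0.82, 0.66, 0.59],
    "acmee": [0.72, 0.60, 0.55],
    "globex": [0.41, 0.38, 0.35],
}


def simple_match(scores, threshold):
    """Return every logo whose best reference clears the threshold."""
    return sorted(name for name, refs in scores.items() if max(refs) >= threshold)


def multi_ref_match(scores, threshold, margin, aggregate=max):
    """Aggregate each logo's reference scores, then accept the top logo only
    if it clears the threshold and beats the runner-up by `margin`."""
    ranked = sorted(
        ((aggregate(refs), name) for name, refs in scores.items()), reverse=True
    )
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else float("-inf")
    if best_score >= threshold and best_score - runner_up >= margin:
        return best_name
    return None


print(simple_match(scores, 0.70))                            # ['acme', 'acmee']
print(multi_ref_match(scores, 0.70, 0.05))                   # acme
print(multi_ref_match(scores, 0.70, 0.05, aggregate=mean))   # None
```

In this toy case the simple strategy reports two logos for a single region, while mean aggregation drops a match that max aggregation keeps, which is in line with the lower recall shown for `multi-ref` (mean) in the table.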

### Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|-------|-----------|--------|-----|----------------|
| `openai/clip-vit-large-patch14` | **49.1%** | **77.0%** | **59.9%** | **Recommended** |
| `facebook/dinov2-small` | 22.4% | 42.8% | 29.5% | Not recommended |
| `facebook/dinov2-large` | 32.2% | 28.5% | 30.2% | Not recommended |

CLIP roughly doubles the F1 of both DINOv2 variants on this logo matching task (59.9% vs. 29.5% and 30.2%).

### Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|------|-----------|--------|-----|-------|
| `default` | **50.2%** | 81.6% | 62.1% | **Recommended** - best precision |
| `letterbox` | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| `stretch` | 34.5% | 113%* | 52.9% | Not recommended |

*Recall >100% indicates multiple detections per expected logo.
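
For readers unfamiliar with the terms: letterbox and stretch usually denote two ways of fitting a non-square crop into a square model input. The sketch below shows only that geometry. The 224-pixel target and the function names are assumptions, and this README does not describe what the `default` mode does, so it is not modeled here.

```python
def stretch_size(width, height, target=224):
    """Stretch: resize straight to target x target, ignoring aspect ratio."""
    return target, target


def letterbox_layout(width, height, target=224):
    """Letterbox: scale to fit inside target x target, then pad the short side.

    Returns (new_width, new_height, pad_x, pad_y); padding is per side.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y


# A wide 400x100 logo crop:
print(stretch_size(400, 100))      # (224, 224): the 4:1 shape is squashed to 1:1
print(letterbox_layout(400, 100))  # (224, 56, 0, 84): shape kept, padded top and bottom
```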

**Recommendation:** Use `default` preprocessing. Letterbox shows a marginally higher F1 (62.6% vs. 62.1%), but its precision is markedly worse (42.4% vs. 50.2%, meaning more false positives), and its recall above 100% reflects duplicate detections.

---

## Overview

This project provides tools to:
@@ -97,9 +174,9 @@ uv run python test_logo_detection.py -n 50 --seed 42
| `--clear-cache` | False | Clear embedding cache before running |

**Matching Methods:**
-- `simple` - Returns all logos above threshold (baseline, most permissive)
-- `margin` - Requires margin over second-best match (reduces false positives)
-- `multi-ref` - Aggregates scores across multiple reference images per logo
+- `simple` - Returns all logos above threshold (not recommended - too many false positives)
+- `margin` - Requires margin over second-best match (high precision, low recall)
+- `multi-ref` - **Recommended.** Aggregates scores across multiple reference images per logo

See `--help` for all options.

@@ -114,13 +191,18 @@ See `--help` for all options.

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh
+
+# Test different refs-per-logo values
+./run_refs_per_logo_test.sh
```

| Script | Purpose | Output File |
|--------|---------|-------------|
-| `run_comparison_tests.sh` | Compare all 4 matching methods | `comparison_results.txt` |
-| `run_threshold_tests.sh` | Test threshold/margin combinations | `threshold_test_results.txt` |
-| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `model_comparison_results.txt` |
+| `run_comparison_tests.sh` | Compare matching methods | `test_results/comparison_*.txt` |
+| `run_threshold_tests.sh` | Test threshold/margin combinations | `test_results/threshold_*.txt` |
+| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `test_results/model_comparison_results.txt` |
+| `run_refs_per_logo_test.sh` | Test refs-per-logo values | `test_results/refs_per_logo_analysis.txt` |
+| `run_preprocess_test.sh` | Compare preprocessing modes | `test_results/preprocessing_comparison.txt` |

## Project Structure
