# Logo Recognition Model Analysis

**Date:** January 7, 2026
**Purpose:** Determine the best model and similarity threshold for recognizing logos, including logos not currently in the test set.
## Executive Summary
| Model | Best Threshold | F1 Score | Precision | Recall | Recommended Use |
|---|---|---|---|---|---|
| Image-Split Fine-tuned | 0.70-0.75 | 67-68% | 66-80% | 59-68% | Known logos (in reference set) |
| Baseline CLIP | 0.70 | 57-60% | 48-49% | 72-77% | Unknown logos (never seen before) |
| Logo-Split Fine-tuned | 0.76 | 56% | 49% | 64% | Not recommended |
| DINOv2 (small/large) | - | 29-30% | 22-32% | 28-43% | Not suitable |
**Winner:** Image-Split Fine-tuned Model at threshold 0.70-0.75
## Detailed Model Comparison

### 1. Baseline CLIP (openai/clip-vit-large-patch14)
The pre-trained CLIP model without any fine-tuning.
Threshold Performance:
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.70 | 47.9% | 71.8% | 57.5% |
| 0.80 | 33.0% | 63.1% | 43.4% |
| 0.85 | 26.9% | 43.4% | 33.2% |
| 0.90 | 54.9% | 22.8% | 32.2% |
Similarity Distribution:
- True Positive mean: 0.854 (range: 0.75-0.95)
- False Positive mean: 0.846 (range: 0.75-0.95)
- Problem: TP and FP distributions almost completely overlap
Suggested optimal threshold: 0.756 (predicted F1 = 67.1%)
Strengths:
- Good recall at low thresholds
- Works on completely unseen logos
- No training required
Weaknesses:
- Poor separation between correct and incorrect matches
- High false positive rate
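The matching rule behind these numbers can be sketched as follows. This is a simplified illustration with toy vectors standing in for CLIP image embeddings (not the project's actual pipeline): a candidate counts as a match when its cosine similarity to a reference embedding clears the threshold.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(query_emb: np.ndarray, ref_emb: np.ndarray, threshold: float = 0.70) -> bool:
    """Declare a logo match when similarity clears the threshold."""
    return cosine_similarity(query_emb, ref_emb) >= threshold

# Toy vectors standing in for CLIP image embeddings
ref = np.array([1.0, 0.0, 0.0])
close = np.array([0.9, 0.1, 0.0])  # nearly the same direction -> match at t=0.70
far = np.array([0.0, 1.0, 0.0])    # orthogonal -> no match
print(is_match(close, ref), is_match(far, ref))  # True False
```

The overlap problem above means that for baseline CLIP, both true and false matches tend to land in the same 0.75-0.95 similarity band, so no single threshold separates them cleanly.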
### 2. Fine-tuned CLIP (Logo-Level Splits)
Trained with contrastive learning, tested on completely unseen logo brands.
Threshold Performance:
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.70 | 25.9% | 67.1% | 37.4% |
| 0.76 | 49.1% | 64.3% | 55.7% |
| 0.82 | 75.7% | 41.4% | 53.5% |
| 0.86 | 88.6% | 28.1% | 42.7% |
Similarity Distribution:
- True Positive mean: 0.853
- False Positive mean: 0.787 (better separation than baseline)
- Missed logos mean: 0.711 (only 43.7% above 0.75)
Suggested optimal threshold: 0.82 (predicted F1 = 71.9%)
Strengths:
- Better TP/FP separation than baseline
- Very high precision at high thresholds (88.6% at t=0.86)
Weaknesses:
- Does not generalize well to unseen logo brands
- Many correct logos score below threshold (56% of missed logos below 0.75)
- Worse than baseline at threshold 0.70
### 3. Fine-tuned CLIP (Image-Level Splits) ⭐ BEST
Trained with contrastive learning, all logo brands seen during training (different images held out for testing).
Threshold Performance:
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.65 | 56.9% | 75.9% | 65.0% |
| 0.70 | 66.3% | 68.3% | 67.3% |
| 0.75 | 79.9% | 59.3% | 68.1% |
| 0.80 | 83.7% | 52.8% | 64.8% |
| 0.85 | 92.4% | 42.8% | 58.5% |
| 0.90 | 98.9% | 24.7% | 39.5% |
Similarity Distribution:
- True Positive mean: 0.866 (higher than other models)
- False Positive mean: 0.807
- TP-FP gap: 0.059 (best separation)
- At t=0.75: 92 TP vs only 38 FP (excellent ratio)
Suggested optimal threshold: 0.755 (predicted F1 = 85.6%)
Strengths:
- Best overall F1 score (68.1% at t=0.75)
- Best precision at any threshold (79.9-98.9%)
- Excellent TP/FP ratio
- Highest true positive similarity scores
Weaknesses:
- Requires logos to be in the reference set during training
- May not generalize to completely novel logos
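The recommended configurations pass `--matching-method multi-ref --use-max-similarity`, i.e. each brand contributes several reference embeddings and a query's score for a brand is its maximum similarity over those references. A minimal sketch of that strategy, with made-up brand names and toy L2-normalized vectors in place of real CLIP embeddings:

```python
import numpy as np

def l2(v):
    """L2-normalize a vector so dot products equal cosine similarities."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def best_brand_match(query, brand_refs, threshold=0.75):
    """Return (brand, score) for the best-scoring brand, or (None, score)
    if nothing clears the threshold. brand_refs maps brand -> list of
    L2-normalized reference embeddings; query must be normalized too."""
    best_brand, best_score = None, -1.0
    for brand, refs in brand_refs.items():
        # Max similarity over this brand's references (--use-max-similarity)
        score = max(float(np.dot(query, r)) for r in refs)
        if score > best_score:
            best_brand, best_score = brand, score
    if best_score < threshold:
        return None, best_score
    return best_brand, best_score

# Hypothetical reference set: two brands, multiple reference images each
refs = {
    "acme": [l2([1, 0, 0]), l2([0.9, 0.1, 0])],
    "globex": [l2([0, 1, 0])],
}
print(best_brand_match(l2([0.95, 0.05, 0]), refs))
```

Taking the max over references makes the match robust to per-image variation (lighting, crop, resolution) as long as any one reference image resembles the query.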
### 4. DINOv2 Models
Tested for comparison but significantly underperformed.
| Model | Precision | Recall | F1 |
|---|---|---|---|
| DINOv2-small | 22.4% | 42.8% | 29.5% |
| DINOv2-large | 32.2% | 28.5% | 30.2% |
Not recommended for logo recognition tasks.
## Recommendations

### For Logo Recognition of Known Logos (logos in your reference set)

**Use:** Image-Split Fine-tuned Model
```bash
# Recommended configuration
python test_logo_detection.py \
  -e models/logo_detection/clip_finetuned_image_split \
  -t 0.70 \
  --matching-method multi-ref \
  --use-max-similarity
```
| Use Case | Threshold | Expected Performance |
|---|---|---|
| Balanced (recommended) | 0.70 | 66% precision, 68% recall, 67% F1 |
| High precision | 0.75 | 80% precision, 59% recall, 68% F1 |
| Very high precision | 0.80 | 84% precision, 53% recall, 65% F1 |
| Maximum precision | 0.85+ | 92%+ precision, <43% recall |
### For Logo Recognition of Unknown Logos (completely novel brands)

**Use:** Baseline CLIP (the fine-tuned models don't generalize well)

```bash
# Recommended configuration
python test_logo_detection.py \
  -e openai/clip-vit-large-patch14 \
  -t 0.70 \
  --matching-method multi-ref \
  --use-max-similarity
```
Expected: ~48% precision, ~72% recall, ~58% F1
## Key Findings

### 1. Image-Level Splits Dramatically Improve Performance
The image-split fine-tuned model outperforms all others because:
- It learns brand-specific features during training
- Test images are different but from same brands
- Better represents real-world use where you have reference images for logos you want to detect
### 2. Logo-Level Splits Test True Generalization (but results are poor)
The logo-split model tests whether fine-tuning helps with completely unseen logos:
- Result: It doesn't help much (56% F1 vs 58% baseline)
- Contrastive learning doesn't transfer well to novel brands
- Use baseline CLIP for novel logo detection
### 3. Threshold Sweet Spot Is 0.70-0.75
For all models, the optimal F1 occurs around threshold 0.70-0.75:
- Lower thresholds: Too many false positives
- Higher thresholds: Misses too many correct logos
- At 0.90+: Precision is high but recall drops below 25%
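This tradeoff can be reproduced with a simple threshold sweep. The score distributions below are synthetic Gaussians loosely mimicking the image-split model's reported TP/FP means (0.866 vs 0.807), not the actual evaluation data, so the printed numbers are illustrative only:

```python
import numpy as np

def f1_at(threshold, pos_scores, neg_scores):
    """Precision, recall, F1 when declaring a match above `threshold`."""
    tp = int(np.sum(pos_scores >= threshold))
    fp = int(np.sum(neg_scores >= threshold))
    fn = int(np.sum(pos_scores < threshold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

rng = np.random.default_rng(0)
# Synthetic score distributions: true pairs score higher on average
pos = rng.normal(0.866, 0.05, 1000)  # similarities of true logo pairs
neg = rng.normal(0.807, 0.05, 1000)  # similarities of non-matching pairs
for t in np.arange(0.65, 0.95, 0.05):
    p, r, f1 = f1_at(t, pos, neg)
    print(f"t={t:.2f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

With overlapping score distributions like these, precision rises and recall falls monotonically as the threshold increases, and F1 peaks between the two means, which is why the 0.70-0.75 band works across models.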
### 4. Precision-Recall Tradeoff
| Priority | Threshold | Tradeoff |
|---|---|---|
| Recall | 0.65-0.70 | More matches, more false positives |
| Balanced | 0.70-0.75 | Best F1 score |
| Precision | 0.75-0.80 | Fewer false positives, misses some matches |
| High Precision | 0.85+ | Very few false positives, misses many matches |
## Conclusion
For production use with known logos:
- Use Image-Split Fine-tuned Model at threshold 0.70-0.75
- Expected F1: 67-68%, Precision: 66-80%
For discovering unknown logos:
- Use Baseline CLIP at threshold 0.70
- Expected F1: ~58%, Precision: ~48%
Image-split fine-tuning provides a significant improvement (+8-10 F1 points) over the baseline for known logos, but does not help with completely novel brands. For a production system, ensure all target logos are included in the training/reference set.