# Logo Recognition Model Analysis

**Date:** January 7, 2026
**Purpose:** Determine the best model and similarity threshold for logo recognition, including logos not currently in the test set.

---

## Executive Summary

| Model | Best Threshold | F1 Score | Precision | Recall | Recommended Use |
|-------|---------------|----------|-----------|--------|-----------------|
| **Image-Split Fine-tuned** | 0.70-0.75 | **67-68%** | 66-80% | 59-68% | Known logos (in reference set) |
| Baseline CLIP | 0.70 | 57-60% | 48-49% | 72-77% | Unknown logos (never seen before) |
| Logo-Split Fine-tuned | 0.76 | 56% | 49% | 64% | Not recommended |
| DINOv2 (small/large) | - | 29-30% | 22-32% | 28-43% | Not suitable |

**Winner: Image-Split Fine-tuned Model** at threshold **0.70-0.75**

---

## Detailed Model Comparison

### 1. Baseline CLIP (openai/clip-vit-large-patch14)

The pre-trained CLIP model without any fine-tuning.

**Threshold Performance:**

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|-----|
| 0.70 | 47.9% | 71.8% | 57.5% |
| 0.80 | 33.0% | 63.1% | 43.4% |
| 0.85 | 26.9% | 43.4% | 33.2% |
| 0.90 | 54.9% | 22.8% | 32.2% |

**Similarity Distribution:**
- True Positive mean: 0.854 (range: 0.75-0.95)
- False Positive mean: 0.846 (range: 0.75-0.95)
- **Problem:** TP and FP distributions almost completely overlap

**Suggested optimal threshold:** 0.756 (predicted F1 = 67.1%)

**Strengths:**
- Good recall at low thresholds
- Works on completely unseen logos
- No training required

**Weaknesses:**
- Poor separation between correct and incorrect matches
- High false positive rate

---

### 2. Fine-tuned CLIP (Logo-Level Splits)

Trained with contrastive learning, tested on completely unseen logo brands.
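Both fine-tuned variants use a contrastive objective on matched (crop, reference) pairs. As an illustration of the idea only, not the project's actual training code, a minimal symmetric InfoNCE-style loss over a batch of embeddings might look like this (plain NumPy; function name and temperature value are assumptions):

```python
import numpy as np

def info_nce_loss(img_emb, ref_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    img_emb, ref_emb: (N, D) arrays where row i of each forms a positive
    pair; every other row in the batch serves as an in-batch negative.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)

    logits = img @ ref.T / temperature   # (N, N) similarity matrix
    labels = np.arange(len(img))         # positives sit on the diagonal

    def cross_entropy(logits, labels):
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # Symmetric: image->reference and reference->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Training with this kind of loss pulls embeddings of the same brand together and pushes different brands apart, which is why the fine-tuned models show a larger TP/FP similarity gap than the baseline.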
**Threshold Performance:**

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|-----|
| 0.70 | 25.9% | 67.1% | 37.4% |
| 0.76 | **49.1%** | 64.3% | **55.7%** |
| 0.82 | 75.7% | 41.4% | 53.5% |
| 0.86 | 88.6% | 28.1% | 42.7% |

**Similarity Distribution:**
- True Positive mean: 0.853
- False Positive mean: 0.787 (better separation than baseline)
- Missed logos mean: 0.711 (only 43.7% above 0.75)

**Suggested optimal threshold:** 0.82 (predicted F1 = 71.9%)

**Strengths:**
- Better TP/FP separation than baseline
- Very high precision at high thresholds (88.6% at t=0.86)

**Weaknesses:**
- Does not generalize well to unseen logo brands
- Many correct logos score below threshold (56% of missed logos below 0.75)
- Worse than baseline at threshold 0.70

---

### 3. Fine-tuned CLIP (Image-Level Splits) ⭐ BEST

Trained with contrastive learning, with all logo brands seen during training (different images held out for testing).

**Threshold Performance:**

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|-----|
| 0.65 | 56.9% | **75.9%** | 65.0% |
| 0.70 | 66.3% | 68.3% | **67.3%** |
| 0.75 | **79.9%** | 59.3% | **68.1%** |
| 0.80 | 83.7% | 52.8% | 64.8% |
| 0.85 | 92.4% | 42.8% | 58.5% |
| 0.90 | 98.9% | 24.7% | 39.5% |

**Similarity Distribution:**
- True Positive mean: 0.866 (higher than the other models)
- False Positive mean: 0.807
- TP-FP gap: 0.059 (best separation)
- At t=0.75: 92 TP vs. only 38 FP (excellent ratio)

**Suggested optimal threshold:** 0.755 (predicted F1 = 85.6%)

**Strengths:**
- Best overall F1 score (68.1% at t=0.75)
- Best precision at any threshold (79.9-98.9%)
- Excellent TP/FP ratio
- Highest true positive similarity scores

**Weaknesses:**
- Requires logos to be in the reference set during training
- May not generalize to completely novel logos

---

### 4. DINOv2 Models

Tested for comparison but significantly underperformed.
| Model | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| DINOv2-small | 22.4% | 42.8% | 29.5% |
| DINOv2-large | 32.2% | 28.5% | 30.2% |

**Not recommended** for logo recognition tasks.

---

## Recommendations

### For Logo Recognition of Known Logos (logos in your reference set)

**Use: Image-Split Fine-tuned Model**

```bash
# Recommended configuration
python test_logo_detection.py \
  -e models/logo_detection/clip_finetuned_image_split \
  -t 0.70 \
  --matching-method multi-ref \
  --use-max-similarity
```

| Use Case | Threshold | Expected Performance |
|----------|-----------|---------------------|
| Balanced (recommended) | 0.70 | 66% precision, 68% recall, 67% F1 |
| High precision | 0.75 | 80% precision, 59% recall, 68% F1 |
| Very high precision | 0.80 | 84% precision, 53% recall, 65% F1 |
| Maximum precision | 0.85+ | 92%+ precision, <43% recall |

### For Logo Recognition of Unknown Logos (completely novel brands)

**Use: Baseline CLIP** (the fine-tuned models don't generalize well)

```bash
# Recommended configuration
python test_logo_detection.py \
  -e openai/clip-vit-large-patch14 \
  -t 0.70 \
  --matching-method multi-ref \
  --use-max-similarity
```

Expected: ~48% precision, ~72% recall, ~58% F1

---

## Key Findings

### 1. Image-Level Splits Dramatically Improve Performance

The image-split fine-tuned model outperforms all others because:
- It learns brand-specific features during training
- Test images are different but come from the same brands
- This better represents real-world use, where you have reference images for the logos you want to detect

### 2. Logo-Level Splits Test True Generalization (but results are poor)

The logo-split model tests whether fine-tuning helps with completely unseen logos:
- Result: it doesn't help much (56% F1 vs. 58% baseline)
- Contrastive learning doesn't transfer well to novel brands
- Use baseline CLIP for novel logo detection

### 3. Threshold Sweet Spot is 0.70-0.75

For all models, the optimal F1 occurs around threshold 0.70-0.75:
- Lower thresholds: too many false positives
- Higher thresholds: miss too many correct logos
- At 0.90+: precision is high but recall drops below 25%

### 4. Precision-Recall Tradeoff

| Priority | Threshold | Tradeoff |
|----------|-----------|----------|
| Recall | 0.65-0.70 | More matches, more false positives |
| Balanced | 0.70-0.75 | Best F1 score |
| Precision | 0.75-0.80 | Fewer false positives, misses some matches |
| High Precision | 0.85+ | Very few false positives, misses many matches |

---

## Conclusion

**For production use with known logos:**
- Use the **Image-Split Fine-tuned Model** at **threshold 0.70-0.75**
- Expected F1: 67-68%, precision: 66-80%

**For discovering unknown logos:**
- Use **Baseline CLIP** at **threshold 0.70**
- Expected F1: ~58%, precision: ~48%

The image-split fine-tuning provides significant improvements (+8-10% F1) over baseline for known logos, but it does not help with completely novel brands. For a production system, ensure all target logos are included in the training/reference set.
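The per-threshold precision/recall/F1 numbers reported throughout can be regenerated from raw match scores with a simple sweep. The sketch below is illustrative, not the actual evaluation script; the function name and the way false negatives are counted (correct matches rejected by the threshold) are assumptions:

```python
import numpy as np

def sweep_thresholds(similarities, is_correct, thresholds):
    """Compute precision/recall/F1 at each threshold (illustrative sketch).

    similarities: max cosine similarity of each detection to the reference set
    is_correct:   whether that best match is the right brand
    A detection only counts if its similarity clears the threshold;
    correct matches that fall below it are counted as false negatives.
    """
    similarities = np.asarray(similarities)
    is_correct = np.asarray(is_correct, dtype=bool)
    total_positives = is_correct.sum()  # all ground-truth logos

    results = []
    for t in thresholds:
        accepted = similarities >= t
        tp = int((accepted & is_correct).sum())
        fp = int((accepted & ~is_correct).sum())
        fn = int(total_positives) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((t, precision, recall, f1))
    return results
```

Sweeping a fine grid of thresholds this way is how a "suggested optimal threshold" like 0.755 can be located for a given score distribution.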