
Logo Recognition Model Analysis

Date: January 7, 2026
Purpose: Determine the best model and threshold for logo recognition of logos not currently in the test set.


Executive Summary

| Model | Best Threshold | F1 Score | Precision | Recall | Recommended Use |
|-------|----------------|----------|-----------|--------|-----------------|
| Image-Split Fine-tuned | 0.70-0.75 | 67-68% | 66-80% | 59-68% | Known logos (in reference set) |
| Baseline CLIP | 0.70 | 57-60% | 48-49% | 72-77% | Unknown logos (never seen before) |
| Logo-Split Fine-tuned | 0.76 | 56% | 49% | 64% | Not recommended |
| DINOv2 (small/large) | - | 29-30% | 22-32% | 28-43% | Not suitable |

Winner: Image-Split Fine-tuned Model at threshold 0.70-0.75


Detailed Model Comparison

1. Baseline CLIP (openai/clip-vit-large-patch14)

The pre-trained CLIP model without any fine-tuning.

Threshold Performance:

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|----|
| 0.70 | 47.9% | 71.8% | 57.5% |
| 0.80 | 33.0% | 63.1% | 43.4% |
| 0.85 | 26.9% | 43.4% | 33.2% |
| 0.90 | 54.9% | 22.8% | 32.2% |

Similarity Distribution:

  • True Positive mean: 0.854 (range: 0.75-0.95)
  • False Positive mean: 0.846 (range: 0.75-0.95)
  • Problem: TP and FP distributions almost completely overlap

Suggested optimal threshold: 0.756 (predicted F1 = 67.1%)
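A "suggested optimal threshold" like this is typically obtained by sweeping candidate thresholds over scored matches and keeping the F1 maximizer. A minimal sketch of that sweep, using made-up toy scores rather than the real evaluation data:

```python
import numpy as np

def best_f1_threshold(scores, labels, thresholds):
    """Sweep candidate thresholds and keep the one maximizing F1.
    scores: similarity of each candidate match; labels: 1 if the match is correct."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    best_f1, best_t = 0.0, None
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t

# Toy similarities: correct matches (label 1) tend to score higher
f1, t = best_f1_threshold(
    scores=[0.92, 0.88, 0.85, 0.80, 0.84, 0.79, 0.76, 0.70],
    labels=[1, 1, 1, 1, 0, 0, 0, 0],
    thresholds=[0.70, 0.75, 0.80, 0.85, 0.90],
)
```

Because TP and FP distributions overlap heavily for the baseline model, the F1 curve here is flat near its peak, which is why small threshold changes swing precision and recall sharply.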

Strengths:

  • Good recall at low thresholds
  • Works on completely unseen logos
  • No training required

Weaknesses:

  • Poor separation between correct and incorrect matches
  • High false positive rate

2. Fine-tuned CLIP (Logo-Level Splits)

Trained with contrastive learning, tested on completely unseen logo brands.

Threshold Performance:

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|----|
| 0.70 | 25.9% | 67.1% | 37.4% |
| 0.76 | 49.1% | 64.3% | 55.7% |
| 0.82 | 75.7% | 41.4% | 53.5% |
| 0.86 | 88.6% | 28.1% | 42.7% |

Similarity Distribution:

  • True Positive mean: 0.853
  • False Positive mean: 0.787 (better separation than baseline)
  • Missed logos mean: 0.711 (only 43.7% above 0.75)

Suggested optimal threshold: 0.82 (predicted F1 = 71.9%)

Strengths:

  • Better TP/FP separation than baseline
  • Very high precision at high thresholds (88.6% at t=0.86)

Weaknesses:

  • Does not generalize well to unseen logo brands
  • Many correct logos score below threshold (56% of missed logos below 0.75)
  • Worse than baseline at threshold 0.70
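Both fine-tuned variants are described only as "trained with contrastive learning". The snippet below is a generic InfoNCE-style sketch over a batch of matched embedding pairs, shown purely to illustrate the objective; the actual training code, batch construction, and hyperparameters are assumptions:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Contrastive (InfoNCE) loss over a batch of matched embedding pairs.
    anchors[i] and positives[i] embed the same logo; every other pair in the
    batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # matched pair should win its row

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(emb, emb + 0.01 * rng.normal(size=(8, 16)))
loss_random = info_nce_loss(emb, rng.normal(size=(8, 16)))
# matched pairs give a much lower loss than random pairings
```

The objective pulls same-brand embeddings together and pushes different-brand embeddings apart, which explains the improved TP/FP separation; but as the logo-split results show, that separation does not automatically transfer to brands absent from training.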

3. Fine-tuned CLIP (Image-Level Splits) BEST

Trained with contrastive learning, all logo brands seen during training (different images held out for testing).

Threshold Performance:

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|----|
| 0.65 | 56.9% | 75.9% | 65.0% |
| 0.70 | 66.3% | 68.3% | 67.3% |
| 0.75 | 79.9% | 59.3% | 68.1% |
| 0.80 | 83.7% | 52.8% | 64.8% |
| 0.85 | 92.4% | 42.8% | 58.5% |
| 0.90 | 98.9% | 24.7% | 39.5% |

Similarity Distribution:

  • True Positive mean: 0.866 (higher than other models)
  • False Positive mean: 0.807
  • TP-FP gap: 0.059 (best separation)
  • At t=0.75: 92 TP vs only 38 FP (excellent ratio)

Suggested optimal threshold: 0.755 (predicted F1 = 85.6%)

Strengths:

  • Best overall F1 score (68.1% at t=0.75)
  • Best precision at any threshold (79.9-98.9%)
  • Excellent TP/FP ratio
  • Highest true positive similarity scores

Weaknesses:

  • Requires logos to be in the reference set during training
  • May not generalize to completely novel logos

4. DINOv2 Models

Tested for comparison but significantly underperformed.

| Model | Precision | Recall | F1 |
|-------|-----------|--------|----|
| DINOv2-small | 22.4% | 42.8% | 29.5% |
| DINOv2-large | 32.2% | 28.5% | 30.2% |

Not recommended for logo recognition tasks.


Recommendations

For Logo Recognition of Known Logos (logos in your reference set)

Use: Image-Split Fine-tuned Model

```shell
# Recommended configuration
python test_logo_detection.py \
    -e models/logo_detection/clip_finetuned_image_split \
    -t 0.70 \
    --matching-method multi-ref \
    --use-max-similarity
```

| Use Case | Threshold | Expected Performance |
|----------|-----------|----------------------|
| Balanced (recommended) | 0.70 | 66% precision, 68% recall, 67% F1 |
| High precision | 0.75 | 80% precision, 59% recall, 68% F1 |
| Very high precision | 0.80 | 84% precision, 53% recall, 65% F1 |
| Maximum precision | 0.85+ | 92%+ precision, <43% recall |

For Logo Recognition of Unknown Logos (completely novel brands)

Use: Baseline CLIP (the fine-tuned models don't generalize well)

```shell
# Recommended configuration
python test_logo_detection.py \
    -e openai/clip-vit-large-patch14 \
    -t 0.70 \
    --matching-method multi-ref \
    --use-max-similarity
```

Expected: ~48% precision, ~72% recall, ~58% F1
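The `--matching-method multi-ref --use-max-similarity` flags suggest a decision rule like the one below: a query crop is assigned to the brand whose best-scoring reference image exceeds the threshold. This is a hypothetical sketch of that rule, not the actual `test_logo_detection.py` implementation:

```python
import numpy as np

def match_logo(query, references, threshold=0.70):
    """Multi-reference matching with max-similarity: compare the query
    embedding against every reference embedding per brand, keep each brand's
    best score, and accept the top brand only if it clears the threshold.
    query: embedding vector; references: {brand: [embedding, ...]}."""
    q = query / np.linalg.norm(query)
    best_brand, best_sim = None, -1.0
    for brand, embs in references.items():
        embs = np.asarray(embs, dtype=float)
        embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        sim = float(np.max(embs @ q))  # max over that brand's references
        if sim > best_sim:
            best_brand, best_sim = brand, sim
    return (best_brand, best_sim) if best_sim >= threshold else (None, best_sim)

# Toy 3-d "embeddings" with hypothetical brand names (real CLIP embeddings are 768-d)
refs = {"acme": [[1, 0, 0], [0.9, 0.1, 0]], "globex": [[0, 1, 0]]}
brand, sim = match_logo(np.array([0.95, 0.05, 0.0]), refs)
```

Taking the max over several reference images per brand is what makes the method tolerant of viewpoint and style variation: only the single closest reference has to clear the threshold.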


Key Findings

1. Image-Level Splits Dramatically Improve Performance

The image-split fine-tuned model outperforms all others because:

  • It learns brand-specific features during training
  • Test images are new, but come from the same brands seen in training
  • Better represents real-world use where you have reference images for logos you want to detect

2. Logo-Level Splits Test True Generalization (but results are poor)

The logo-split model tests whether fine-tuning helps with completely unseen logos:

  • Result: It doesn't help much (56% F1 vs 58% baseline)
  • Contrastive learning doesn't transfer well to novel brands
  • Use baseline CLIP for novel logo detection

3. Threshold Sweet Spot is 0.70-0.75

For the CLIP-based models, the optimal F1 occurs around threshold 0.70-0.75:

  • Lower thresholds: Too many false positives
  • Higher thresholds: Misses too many correct logos
  • At 0.90+: Precision is high but recall drops below 25%

4. Precision-Recall Tradeoff

| Priority | Threshold | Tradeoff |
|----------|-----------|----------|
| Recall | 0.65-0.70 | More matches, more false positives |
| Balanced | 0.70-0.75 | Best F1 score |
| Precision | 0.75-0.80 | Fewer false positives, misses some matches |
| High Precision | 0.85+ | Very few false positives, misses many matches |
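This tradeoff falls directly out of the precision and recall definitions: raising the threshold can only remove predictions, so false positives shrink (precision rises) while true positives are also lost (recall falls). A toy demonstration with made-up scores, not the measured distributions:

```python
import numpy as np

# Toy similarity scores: true matches (label 1) sit higher on average,
# but the two distributions overlap, as in the TP/FP means reported above
scores = np.array([0.90, 0.86, 0.83, 0.78, 0.84, 0.76, 0.72, 0.68])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])

def precision_recall(threshold):
    """Precision and recall of thresholded matching on the toy scores."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn)
    return p, r

low = precision_recall(0.70)   # permissive: high recall, more false positives
high = precision_recall(0.85)  # strict: high precision, misses some matches
```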

Conclusion

For production use with known logos:

  • Use Image-Split Fine-tuned Model at threshold 0.70-0.75
  • Expected F1: 67-68%, Precision: 66-80%

For discovering unknown logos:

  • Use Baseline CLIP at threshold 0.70
  • Expected F1: ~58%, Precision: ~48%

The image-split fine-tuning provides a significant improvement (+8-10 F1 points) over baseline for known logos, but does not help with completely novel brands. For a production system, ensure all target logos are included in the training/reference set.