============================================================

Test Parameters:
  Logos: 50, Seed: 42, Threshold: 0.7
  Method: multi-ref, Refs/logo: 3, Margin: 0.05
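The run header names a multi-reference method with a similarity threshold and a margin, but the log does not define how they interact. One plausible reading is sketched below, and it is purely an assumption. Each logo has 3 reference embeddings, and a query is scored against a logo by its best-matching reference. The top-scoring logo is accepted only if its cosine similarity is at least the threshold (0.7) and it beats the runner-up logo by at least the margin (0.05). The names `cosine` and `match_logo` are hypothetical. In practice the embeddings would come from the CLIP image encoder, not from the toy 2-D vectors used here.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_logo(
    query: Vector,
    refs: Dict[str, List[Vector]],  # hypothetical: logo name -> reference embeddings
    threshold: float = 0.7,
    margin: float = 0.05,
) -> Optional[str]:
    """Hypothetical multi-ref matcher.

    Scores each logo by its best reference, then accepts the top logo only if
    it clears `threshold` and leads the runner-up by at least `margin`.
    Returns None when no logo qualifies (counted as "no match").
    """
    # Best similarity over each logo's references (assumes non-empty lists).
    scores = {name: max(cosine(query, r) for r in vecs) for name, vecs in refs.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_name, best_score = ranked[0]
    # With a single logo there is no runner-up, so the margin test always passes.
    runner_up = ranked[1][1] if len(ranked) > 1 else -1.0
    if best_score >= threshold and best_score - runner_up >= margin:
        return best_name
    return None


if __name__ == "__main__":
    refs = {"acme": [[1.0, 0.0], [0.9, 0.1]], "globex": [[0.0, 1.0]]}
    print(match_logo([1.0, 0.0], refs))  # acme
```

Under this reading, raising the threshold or the margin trades recall for precision. Weak or ambiguous queries are rejected outright instead of being counted as wrong matches.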

BASELINE (openai/clip-vit-large-patch14):
  True Positives (correct matches):  101
  False Positives (wrong matches):   104
  False Negatives (missed logos):    156
  Precision: 0.4927 (49.3%)
  Recall:    0.4056 (40.6%)
  F1 Score:  0.4449 (44.5%)

FINE-TUNED (models/logo_detection/clip_finetuned):
  True Positives (correct matches):  164
  False Positives (wrong matches):   414
  False Negatives (missed logos):    115
  Precision: 0.2837 (28.4%)
  Recall:    0.6586 (65.9%)
  F1 Score:  0.3966 (39.7%)
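The summary rates use the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). Recomputing them reproduces the printed precision from the counts in both runs. It also reproduces the printed F1 from the printed precision and recall. Recall behaves differently. TP / (TP + FN) from the printed counts gives 0.3930 (baseline) and 0.5878 (fine-tuned), whereas both printed recall values equal TP / 249. The evaluation script therefore appears to use a fixed ground-truth count as its recall denominator rather than TP + FN. The sketch below uses the textbook definitions and makes no assumption about that script. The function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted matches that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth logos that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Counts and rates copied from the BASELINE block.
    print(round(precision(101, 104), 4))  # 0.4927
    print(round(recall(101, 156), 4))     # 0.393 (the log prints 0.4056)
    print(round(f1(0.4927, 0.4056), 4))   # 0.4449
```

Because F1 is a harmonic mean, the fine-tuned model's precision drop (49.3% to 28.4%) outweighs its recall gain, which is why its F1 ends up lower than the baseline's.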

------------------------------------------------------------
F1 SCORE COMPARISON:
  Baseline:    44.5%
  Fine-tuned:  39.7%
------------------------------------------------------------

Full results saved to: comparison_results/
