Embedding Model Comparison Tests
=================================
Date: Fri Jan 2 12:47:03 PM MST 2026

Common Parameters:
  Matching method:    multi-ref (max)
  Reference logos:    20
  Refs per logo:      10
  Positive samples:   20
  Negative samples:   100
  Min matching refs:  3
  Threshold:          0.70
  Margin:             0.05
  Seed:               42

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 13:05:17

Configuration:
  Embedding model:             openai/clip-vit-large-patch14
  Reference logos:             20
  Refs per logo:               10
  Total reference embeddings:  189
  Positive samples/logo:       20
  Negative samples/logo:       100
  Test images processed:       2355
  Similarity threshold:        0.7
  DETR threshold:              0.5
  Random seed:                 42

Results:
  True Positives:   284
  False Positives:  295
  False Negatives:  124
  Total Expected:   369

Scores:
  Precision:  0.4905 (49.1%)
  Recall:     0.7696 (77.0%)
  F1 Score:   0.5992 (59.9%)

======================================================================
TEST: MULTI-REF MATCHING
Model: facebook/dinov2-small
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 13:19:01

Configuration:
  Embedding model:             facebook/dinov2-small
  Reference logos:             20
  Refs per logo:               10
  Total reference embeddings:  189
  Positive samples/logo:       20
  Negative samples/logo:       100
  Test images processed:       2358
  Similarity threshold:        0.7
  DETR threshold:              0.5
  Random seed:                 42

Results:
  True Positives:   158
  False Positives:  546
  False Negatives:  234
  Total Expected:   369

Scores:
  Precision:  0.2244 (22.4%)
  Recall:     0.4282 (42.8%)
  F1 Score:   0.2945 (29.5%)

======================================================================
TEST: MULTI-REF MATCHING
Model: facebook/dinov2-large
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 13:39:33

Configuration:
  Embedding model:             facebook/dinov2-large
  Reference logos:             20
  Refs per logo:               10
  Total reference embeddings:  189
  Positive samples/logo:       20
  Negative samples/logo:       100
  Test images processed:       2355
  Similarity threshold:        0.7
  DETR threshold:              0.5
  Random seed:                 42

Results:
  True Positives:   105
  False Positives:  221
  False Negatives:  277
  Total Expected:   369

Scores:
  Precision:  0.3221 (32.2%)
  Recall:     0.2846 (28.5%)
  F1 Score:   0.3022 (30.2%)
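
The reported scores can be reproduced from the raw counts. In all three tests the logged recall matches True Positives / Total Expected (369), not TP / (TP + FN), so the sketch below assumes that definition. It is a minimal check, not the original test harness; the names Counts, scores, and RESULTS are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw counts copied from one test's Results block (hypothetical container)."""
    true_positives: int
    false_positives: int
    total_expected: int


def scores(c: Counts) -> tuple[float, float, float]:
    """Return (precision, recall, f1).

    Assumption: recall is TP / total_expected, which is what the logged
    values are consistent with.
    """
    predicted = c.true_positives + c.false_positives
    precision = c.true_positives / predicted if predicted else 0.0
    recall = c.true_positives / c.total_expected if c.total_expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken verbatim from the three Results blocks in this log.
RESULTS = {
    "openai/clip-vit-large-patch14": Counts(284, 295, 369),
    "facebook/dinov2-small": Counts(158, 546, 369),
    "facebook/dinov2-large": Counts(105, 221, 369),
}

for model, counts in RESULTS.items():
    p, r, f1 = scores(counts)
    print(f"{model}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Under this definition the recomputed values agree with the logged Precision, Recall, and F1 Score lines to four decimal places for each model.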