Compare commits: 55abb1217c ... master (16 commits)

README.md (121 changed lines)
@@ -2,6 +2,110 @@

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Burnley Test: Averaged Embeddings with DINOv2

A targeted test using `DetectLogosEmbeddings` to detect two specific logos (barnfield and vertu) in 516 Burnley match images. Reference embeddings are averaged across all images in each reference directory, and matching uses margin-based comparison (margin=0.05).

**Test command:**

```bash
uv run python test_burnley_detection.py -e dinov2 -t 0.7 --margin 0.05 --output-file results_average_embeddings.txt
```
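The test script resolves its data relative to its own location; based on the paths in `test_burnley_detection.py` further down in this diff, the expected layout next to the script is (top-level directory name illustrative):

```
logo_test/
├── test_burnley_detection.py
├── burnley_test_images/            # 516 test images; ground truth encoded in filename prefixes
├── barnfield_reference_images/     # reference images of the barnfield logo
├── vertu_reference_images/         # reference images of the vertu logo
└── .burnley_embedding_cache.pkl    # created automatically unless --no-cache is passed
```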
**Results (DINOv2, threshold 0.70, margin 0.05):**

| Metric | Value |
|--------|-------|
| True Positives | 28 |
| False Positives | 36 |
| False Negatives | 125 |
| Total Expected | 146 |
| **Precision** | **43.8%** |
| **Recall** | **19.2%** |
| **F1 Score** | **26.7%** |

Ground truth is derived from filename prefixes: `vertu_` (vertu logo), `barnfield_` (barnfield logo), `barnfield+vertu_` (both logos). Images without these prefixes are treated as negatives.

The low recall suggests that many logos are either missed by DETR or fall below the similarity threshold, and the modest precision indicates that averaged DINOv2 embeddings struggle to discriminate between the two logos in this domain. Further tuning of the thresholds, the margin, and the embedding model (e.g. CLIP or SigLIP) may improve results.
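For clarity, the decision rule being tuned here is roughly the following (a minimal sketch with illustrative names; the actual logic lives in `test_burnley_detection.py` and `logo_detection_embeddings.py` later in this diff):

```python
def match_detection(sims: dict[str, float], threshold: float = 0.70, margin: float = 0.05):
    """Assign a detected crop to a logo only if its best cosine similarity clears
    the threshold AND beats the second-best reference by at least `margin`."""
    ranked = sorted(sims.items(), key=lambda kv: -kv[1])
    best_name, best_sim = ranked[0]
    if best_sim < threshold:
        return None  # below the similarity threshold: no match
    if len(ranked) > 1 and best_sim - ranked[1][1] < margin:
        return None  # too close to the runner-up: ambiguous, no match
    return best_name

# Example: a crop scoring 0.74 against vertu and 0.71 against barnfield is rejected,
# because the 0.03 gap is smaller than the 0.05 margin.
```

Raising the threshold or the margin trades recall for precision, which is why both are exposed as command-line flags.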
---

## Recommended Settings

Based on extensive testing with the LogoDet-3K dataset, the following settings performed best:

| Parameter | Recommended Value | Notes |
|-----------|-------------------|-------|
| **Matching Method** | `multi-ref` | Best balance of precision and recall |
| **Similarity Aggregation** | `max` (default) | Max outperforms mean aggregation |
| **Embedding Model** | `openai/clip-vit-large-patch14` | Significantly outperforms DINOv2 |
| **CLIP Threshold** | `0.70` | Good precision/recall balance |
| **DETR Threshold** | `0.50` | Default detection confidence |
| **Margin** | `0.05` | Reduces false positives |
| **Refs per Logo** | `7-10` | More references = better accuracy |
| **Preprocessing** | `default` | Best precision; letterbox/stretch hurt precision |

**Example command with recommended settings:**

```bash
uv run python test_logo_detection.py \
    --matching-method multi-ref \
    --refs-per-logo 10 \
    --threshold 0.70 \
    --margin 0.05 \
    --use-max-similarity
```
### Performance Benchmarks

With recommended settings (multi-ref max, threshold 0.70, margin 0.05):

| Refs/Logo | Precision | Recall | F1 Score |
|-----------|-----------|--------|----------|
| 1 | 45.8% | 65.9% | 54.0% |
| 3 | 40.5% | 72.4% | 51.9% |
| 5 | 47.2% | 72.6% | 57.2% |
| 7 | **51.0%** | **79.9%** | **62.3%** |
| 10 | 50.2% | 81.6% | 62.1% |

**Key findings:**

- More reference images per logo consistently improve recall
- 7+ refs provide the best precision/recall balance
- Diminishing returns beyond 10 refs

### Matching Method Comparison

| Method | Precision | Recall | F1 | Use Case |
|--------|-----------|--------|-----|----------|
| `simple` | 1.3% | 203%* | 2.5% | Not recommended (too many FPs) |
| `margin` | 69.8% | 16.3% | 26.4% | High precision, low recall |
| `multi-ref` (mean) | 51.8% | 63.1% | 56.9% | Balanced |
| `multi-ref` (max) | **51.8%** | **75.3%** | **61.4%** | **Best overall** |

*The simple method returns all matches above the threshold, causing many duplicates.
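As a rough illustration of the mean vs. max variants of `multi-ref` (a sketch, not the project's actual code; `ref_sims` is assumed to hold one detection's similarity to each reference image of a single logo):

```python
import statistics

def aggregate(ref_sims: list[float], use_max: bool = True) -> float:
    """Collapse per-reference similarities into one score for a logo.

    max rewards a strong match against any single reference (higher recall);
    mean requires the crop to resemble the references on average (stricter).
    """
    return max(ref_sims) if use_max else statistics.fmean(ref_sims)

scores = [0.82, 0.55, 0.58, 0.61]        # strong match to one reference only
print(aggregate(scores, use_max=True))   # 0.82 -> passes a 0.70 threshold
print(aggregate(scores, use_max=False))  # 0.64 -> rejected at 0.70
```

This is consistent with the comparison above: max aggregation lifts recall noticeably while leaving precision essentially unchanged.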
### Embedding Model Comparison

| Model | Precision | Recall | F1 | Recommendation |
|-------|-----------|--------|-----|----------------|
| `openai/clip-vit-large-patch14` | **49.1%** | **77.0%** | **59.9%** | **Recommended** |
| `facebook/dinov2-small` | 22.4% | 42.8% | 29.5% | Not recommended |
| `facebook/dinov2-large` | 32.2% | 28.5% | 30.2% | Not recommended |

CLIP significantly outperforms DINOv2 for logo matching tasks.

### Preprocessing Mode Comparison

| Mode | Precision | Recall | F1 | Notes |
|------|-----------|--------|-----|-------|
| `default` | **50.2%** | 81.6% | 62.1% | **Recommended** - best precision |
| `letterbox` | 42.4% | 119%* | 62.6% | Higher recall but worse precision |
| `stretch` | 34.5% | 113%* | 52.9% | Not recommended |

*Recall >100% indicates multiple detections per expected logo.

**Recommendation:** Use `default` preprocessing. While letterbox shows marginally higher F1, it has significantly worse precision (more false positives).
---

## Overview

This project provides tools to:

@@ -97,9 +201,9 @@ uv run python test_logo_detection.py -n 50 --seed 42

| `--clear-cache` | False | Clear embedding cache before running |

**Matching Methods:**

- `simple` - Returns all logos above threshold (baseline, most permissive)
- `margin` - Requires margin over second-best match (reduces false positives)
- `multi-ref` - Aggregates scores across multiple reference images per logo
- `simple` - Returns all logos above threshold (not recommended - too many false positives)
- `margin` - Requires margin over second-best match (high precision, low recall)
- `multi-ref` - **Recommended.** Aggregates scores across multiple reference images per logo

See `--help` for all options.

@@ -114,13 +218,18 @@ See `--help` for all options.

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh

# Test different refs-per-logo values
./run_refs_per_logo_test.sh
```

| Script | Purpose | Output File |
|--------|---------|-------------|
| `run_comparison_tests.sh` | Compare all 4 matching methods | `comparison_results.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `threshold_test_results.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `model_comparison_results.txt` |
| `run_comparison_tests.sh` | Compare matching methods | `test_results/comparison_*.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `test_results/threshold_*.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `test_results/model_comparison_results.txt` |
| `run_refs_per_logo_test.sh` | Test refs-per-logo values | `test_results/refs_per_logo_analysis.txt` |
| `run_preprocess_test.sh` | Compare preprocessing modes | `test_results/preprocessing_comparison.txt` |

## Project Structure

@@ -49,6 +49,7 @@ class DetectLogosDETR:
|
||||
detr_threshold: float = 0.5,
|
||||
min_box_size: int = 20,
|
||||
nms_iou_threshold: float = 0.5,
|
||||
preprocess_mode: str = "default",
|
||||
):
|
||||
"""
|
||||
Initialize DETR and embedding models.
|
||||
@@ -64,12 +65,17 @@ class DetectLogosDETR:
|
||||
detr_threshold: Confidence threshold for DETR detections (0-1)
|
||||
min_box_size: Minimum width/height in pixels for detected boxes (filters noise)
|
||||
nms_iou_threshold: IoU threshold for Non-Maximum Suppression
|
||||
preprocess_mode: Image preprocessing mode for CLIP:
|
||||
- "default": Use CLIP's default (resize shortest edge + center crop)
|
||||
- "letterbox": Pad to square with black bars, preserving aspect ratio
|
||||
- "stretch": Stretch to square (distorts aspect ratio)
|
||||
"""
|
||||
self.logger = logger
|
||||
self.detr_threshold = detr_threshold
|
||||
self.min_box_size = min_box_size
|
||||
self.nms_iou_threshold = nms_iou_threshold
|
||||
self.embedding_model_name = embedding_model
|
||||
self.preprocess_mode = preprocess_mode
|
||||
|
||||
# Set device
|
||||
self.device_str = "cuda:0" if torch.cuda.is_available() else "cpu"
|
||||
@@ -116,6 +122,8 @@ class DetectLogosDETR:
|
||||
self.embedding_model = AutoModel.from_pretrained(embedding_model_path).to(self.device)
|
||||
self.embedding_processor = AutoImageProcessor.from_pretrained(embedding_model_path)
|
||||
|
||||
if self.preprocess_mode != "default":
|
||||
self.logger.info(f"Image preprocessing mode: {self.preprocess_mode}")
|
||||
self.logger.info("DetectLogosDETR initialization complete")
|
||||
|
||||
def _detect_model_type(self, model_name: str) -> str:
|
||||
@@ -402,6 +410,46 @@ class DetectLogosDETR:
|
||||
|
||||
return self._get_embedding_pil(pil_image)
|
||||
|
||||
def _preprocess_image(self, pil_image: Image.Image, target_size: int = 224) -> Image.Image:
|
||||
"""
|
||||
Preprocess image based on the configured preprocessing mode.
|
||||
|
||||
Args:
|
||||
pil_image: PIL Image (RGB format)
|
||||
target_size: Target size for the square output (default 224 for CLIP)
|
||||
|
||||
Returns:
|
||||
Preprocessed PIL Image
|
||||
"""
|
||||
if self.preprocess_mode == "default":
|
||||
# Let the processor handle it (resize shortest edge + center crop)
|
||||
return pil_image
|
||||
|
||||
width, height = pil_image.size
|
||||
|
||||
if self.preprocess_mode == "letterbox":
|
||||
# Pad to square with black bars, preserving aspect ratio
|
||||
max_dim = max(width, height)
|
||||
|
||||
# Create a black square canvas
|
||||
new_image = Image.new("RGB", (max_dim, max_dim), (0, 0, 0))
|
||||
|
||||
# Paste the original image centered
|
||||
paste_x = (max_dim - width) // 2
|
||||
paste_y = (max_dim - height) // 2
|
||||
new_image.paste(pil_image, (paste_x, paste_y))
|
||||
|
||||
# Resize to target size
|
||||
return new_image.resize((target_size, target_size), Image.LANCZOS)
|
||||
|
||||
elif self.preprocess_mode == "stretch":
|
||||
# Stretch to square (distorts aspect ratio)
|
||||
return pil_image.resize((target_size, target_size), Image.LANCZOS)
|
||||
|
||||
else:
|
||||
# Unknown mode, return original
|
||||
return pil_image
|
||||
|
||||
def _get_embedding_pil(self, pil_image: Image.Image) -> torch.Tensor:
|
||||
"""
|
||||
Internal method to get embedding from PIL image.
|
||||
@@ -414,6 +462,10 @@ class DetectLogosDETR:
|
||||
Returns:
|
||||
Normalized feature embedding (torch.Tensor)
|
||||
"""
|
||||
# Apply preprocessing if configured
|
||||
if self.preprocess_mode != "default":
|
||||
pil_image = self._preprocess_image(pil_image)
|
||||
|
||||
# Process image through the embedding model
|
||||
inputs = self.embedding_processor(images=pil_image, return_tensors="pt").to(self.device)
|
||||
|
||||
|
||||
logo_detection_embeddings.py (new file, 364 lines)
@@ -0,0 +1,364 @@
|
||||
"""
|
||||
Logo detection using DETR for object detection and selectable embedding models for feature matching.
|
||||
|
||||
This module provides a class for detecting logos in images using:
|
||||
1. DETR (DEtection TRansformer) for initial logo region detection
|
||||
2. Selectable embedding model (CLIP, DINOv2, or SigLIP) for feature extraction and matching
|
||||
|
||||
Key features:
|
||||
- Multiple reference images per logo entry, averaged into a single embedding
|
||||
- Cache-aware: averaged embeddings are only recalculated when the filenames list changes
|
||||
- Supports local model directories with fallback to HuggingFace
|
||||
"""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
from PIL import Image
|
||||
from transformers import (
|
||||
AutoImageProcessor,
|
||||
AutoModel,
|
||||
AutoProcessor,
|
||||
CLIPModel,
|
||||
CLIPProcessor,
|
||||
Dinov2Model,
|
||||
pipeline,
|
||||
)
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
|
||||
class DetectLogosEmbeddings:
|
||||
"""
|
||||
Logo detection class using DETR and a selectable embedding model.
|
||||
|
||||
This class detects logos in images by:
|
||||
1. Using DETR to find potential logo regions (bounding boxes)
|
||||
2. Extracting embeddings for each detected region using the selected model
|
||||
3. Comparing embeddings with averaged reference logo embeddings for identification
|
||||
|
||||
Supported embedding models:
|
||||
- clip: openai/clip-vit-large-patch14
|
||||
- dinov2: facebook/dinov2-base (recommended for visual similarity)
|
||||
- siglip: google/siglip-base-patch16-224
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
logger,
|
||||
detr_model: str = "Pravallika6/detr-finetuned-logo-detection_v2",
|
||||
embedding_model_type: str = "dinov2",
|
||||
detr_threshold: float = 0.5,
|
||||
):
|
||||
"""
|
||||
Initialize DETR and embedding models.
|
||||
|
||||
Args:
|
||||
logger: Logger instance for logging
|
||||
detr_model: HuggingFace model name or local path for DETR object detection
|
||||
embedding_model_type: One of "clip", "dinov2", or "siglip"
|
||||
detr_threshold: Confidence threshold for DETR detections (0-1)
|
||||
"""
|
||||
self.logger = logger
|
||||
self.detr_threshold = detr_threshold
|
||||
self.embedding_model_type = embedding_model_type
|
||||
|
||||
# Set device
|
||||
self.device_str = "cuda:0" if torch.cuda.is_available() else "cpu"
|
||||
self.device_index = 0 if torch.cuda.is_available() else -1
|
||||
self.device = torch.device(self.device_str)
|
||||
|
||||
self.logger.info(
|
||||
f"Initializing DetectLogosEmbeddings on device: {self.device_str}, "
|
||||
f"embedding model: {embedding_model_type}"
|
||||
)
|
||||
|
||||
# --- DETR model ---
|
||||
default_detr_dir = os.environ.get(
|
||||
"LOGO_DETR_MODEL_DIR", "models/logo_detection/detr"
|
||||
)
|
||||
detr_model_path = self._resolve_model_path(detr_model, default_detr_dir, "DETR")
|
||||
|
||||
self.logger.info(f"Loading DETR model: {detr_model_path}")
|
||||
self.detr_pipe = pipeline(
|
||||
task="object-detection",
|
||||
model=detr_model_path,
|
||||
device=self.device_index,
|
||||
use_fast=True,
|
||||
)
|
||||
|
||||
# --- Embedding model ---
|
||||
self._load_embedding_model(embedding_model_type)
|
||||
|
||||
self.logger.info("DetectLogosEmbeddings initialization complete")
|
||||
|
||||
def _load_embedding_model(self, model_type: str) -> None:
|
||||
"""
|
||||
Load the selected embedding model.
|
||||
|
||||
Args:
|
||||
model_type: One of "clip", "dinov2", or "siglip"
|
||||
"""
|
||||
default_embedding_dir = os.environ.get(
|
||||
"LOGO_EMBEDDING_MODEL_DIR", f"models/logo_detection/{model_type}"
|
||||
)
|
||||
|
||||
if model_type == "clip":
|
||||
model_name = "openai/clip-vit-large-patch14"
|
||||
model_path = self._resolve_model_path(
|
||||
model_name, default_embedding_dir, "CLIP"
|
||||
)
|
||||
self.logger.info(f"Loading CLIP model: {model_path}")
|
||||
self._clip_model = CLIPModel.from_pretrained(model_path).to(self.device)
|
||||
self._clip_processor = CLIPProcessor.from_pretrained(model_path)
|
||||
self._clip_model.eval()
|
||||
|
||||
def embed_fn(pil_image):
|
||||
inputs = self._clip_processor(
|
||||
images=pil_image, return_tensors="pt"
|
||||
).to(self.device)
|
||||
with torch.no_grad():
|
||||
features = self._clip_model.get_image_features(**inputs)
|
||||
return F.normalize(features, dim=-1)
|
||||
|
||||
elif model_type == "dinov2":
|
||||
model_name = "facebook/dinov2-base"
|
||||
model_path = self._resolve_model_path(
|
||||
model_name, default_embedding_dir, "DINOv2"
|
||||
)
|
||||
self.logger.info(f"Loading DINOv2 model: {model_path}")
|
||||
self._dinov2_model = Dinov2Model.from_pretrained(model_path).to(self.device)
|
||||
self._dinov2_processor = AutoImageProcessor.from_pretrained(model_path)
|
||||
self._dinov2_model.eval()
|
||||
|
||||
def embed_fn(pil_image):
|
||||
inputs = self._dinov2_processor(
|
||||
images=pil_image, return_tensors="pt"
|
||||
).to(self.device)
|
||||
with torch.no_grad():
|
||||
outputs = self._dinov2_model(**inputs)
|
||||
# Use CLS token embedding
|
||||
features = outputs.last_hidden_state[:, 0, :]
|
||||
return F.normalize(features, dim=-1)
|
||||
|
||||
elif model_type == "siglip":
|
||||
model_name = "google/siglip-base-patch16-224"
|
||||
model_path = self._resolve_model_path(
|
||||
model_name, default_embedding_dir, "SigLIP"
|
||||
)
|
||||
self.logger.info(f"Loading SigLIP model: {model_path}")
|
||||
self._siglip_model = AutoModel.from_pretrained(model_path).to(self.device)
|
||||
self._siglip_processor = AutoProcessor.from_pretrained(model_path)
|
||||
self._siglip_model.eval()
|
||||
|
||||
def embed_fn(pil_image):
|
||||
inputs = self._siglip_processor(
|
||||
images=pil_image, return_tensors="pt"
|
||||
).to(self.device)
|
||||
with torch.no_grad():
|
||||
features = self._siglip_model.get_image_features(**inputs)
|
||||
return F.normalize(features, dim=-1)
|
||||
|
||||
else:
|
||||
raise ValueError(
|
||||
f"Unknown embedding model type: {model_type}. "
|
||||
f"Use 'clip', 'dinov2', or 'siglip'"
|
||||
)
|
||||
|
||||
self._embed_fn = embed_fn
|
||||
|
||||
def _resolve_model_path(
|
||||
self, model_name_or_path: str, default_local_dir: str, model_type: str
|
||||
) -> str:
|
||||
"""
|
||||
Resolve model path, checking for local models before using HuggingFace.
|
||||
|
||||
Args:
|
||||
model_name_or_path: HuggingFace model name or absolute path
|
||||
default_local_dir: Default local directory to check
|
||||
model_type: Type of model (for logging)
|
||||
|
||||
Returns:
|
||||
Resolved model path (local path or HuggingFace model name)
|
||||
"""
|
||||
# If it's an absolute path, use it directly
|
||||
if os.path.isabs(model_name_or_path):
|
||||
if os.path.exists(model_name_or_path):
|
||||
self.logger.info(
|
||||
f"{model_type} model: Using local model at {model_name_or_path}"
|
||||
)
|
||||
return model_name_or_path
|
||||
else:
|
||||
self.logger.warning(
|
||||
f"{model_type} model: Local path {model_name_or_path} does not exist, "
|
||||
f"falling back to HuggingFace"
|
||||
)
|
||||
return model_name_or_path
|
||||
|
||||
# Check if default local directory exists
|
||||
if os.path.exists(default_local_dir):
|
||||
config_file = os.path.join(default_local_dir, "config.json")
|
||||
if os.path.exists(config_file):
|
||||
abs_path = os.path.abspath(default_local_dir)
|
||||
self.logger.info(
|
||||
f"{model_type} model: Found local model at {abs_path}"
|
||||
)
|
||||
return abs_path
|
||||
else:
|
||||
self.logger.warning(
|
||||
f"{model_type} model: Local directory {default_local_dir} exists but "
|
||||
f"is not a valid model (missing config.json)"
|
||||
)
|
||||
|
||||
# Use HuggingFace model name
|
||||
self.logger.info(
|
||||
f"{model_type} model: No local model found, will download from HuggingFace: "
|
||||
f"{model_name_or_path}"
|
||||
)
|
||||
return model_name_or_path
|
||||
|
||||
def detect(self, image: np.ndarray) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Detect logos in an image and return bounding boxes with embeddings.
|
||||
|
||||
Args:
|
||||
image: OpenCV image (BGR format, numpy array)
|
||||
|
||||
Returns:
|
||||
List of dictionaries, each containing:
|
||||
- 'box': dict with 'xmin', 'ymin', 'xmax', 'ymax' (pixel coordinates)
|
||||
- 'score': DETR confidence score (float 0-1)
|
||||
- 'embedding': Feature embedding (torch.Tensor)
|
||||
- 'label': DETR predicted label (string)
|
||||
"""
|
||||
# Convert OpenCV BGR to RGB PIL Image
|
||||
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
|
||||
pil_image = Image.fromarray(image_rgb)
|
||||
|
||||
# Run DETR detection
|
||||
predictions = self.detr_pipe(pil_image)
|
||||
|
||||
# Filter by threshold and add embeddings
|
||||
detections = []
|
||||
for pred in predictions:
|
||||
score = pred.get("score", 0.0)
|
||||
if score < self.detr_threshold:
|
||||
continue
|
||||
|
||||
box = pred.get("box", {})
|
||||
xmin = box.get("xmin", 0)
|
||||
ymin = box.get("ymin", 0)
|
||||
xmax = box.get("xmax", 0)
|
||||
ymax = box.get("ymax", 0)
|
||||
|
||||
# Extract bounding box region
|
||||
bbox_crop = pil_image.crop((xmin, ymin, xmax, ymax))
|
||||
|
||||
# Get embedding for this region
|
||||
embedding = self._embed_fn(bbox_crop)
|
||||
|
||||
detections.append(
|
||||
{
|
||||
"box": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax},
|
||||
"score": score,
|
||||
"embedding": embedding,
|
||||
"label": pred.get("label", "logo"),
|
||||
}
|
||||
)
|
||||
|
||||
self.logger.debug(
|
||||
f"Detected {len(detections)} logos (threshold: {self.detr_threshold})"
|
||||
)
|
||||
return detections
|
||||
|
||||
def get_embedding(self, image: np.ndarray) -> torch.Tensor:
|
||||
"""
|
||||
Get embedding for a single reference logo image.
|
||||
|
||||
Args:
|
||||
image: OpenCV image (BGR format, numpy array)
|
||||
|
||||
Returns:
|
||||
Normalized feature embedding (torch.Tensor)
|
||||
"""
|
||||
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
|
||||
pil_image = Image.fromarray(image_rgb)
|
||||
return self._embed_fn(pil_image)
|
||||
|
||||
def get_averaged_embedding(self, images: List[np.ndarray]) -> Optional[torch.Tensor]:
|
||||
"""
|
||||
Compute averaged embedding from multiple reference logo images.
|
||||
|
||||
Follows the averaging pattern from db_embeddings.py:
|
||||
1. Compute embedding for each image
|
||||
2. Stack and average across all images
|
||||
3. Re-normalize the averaged embedding
|
||||
|
||||
Args:
|
||||
images: List of OpenCV images (BGR format, numpy arrays)
|
||||
|
||||
Returns:
|
||||
Normalized averaged embedding (torch.Tensor, shape [1, D]),
|
||||
or None if no valid embeddings could be computed
|
||||
"""
|
||||
embeddings = []
|
||||
for img in images:
|
||||
try:
|
||||
emb = self.get_embedding(img)
|
||||
embeddings.append(emb)
|
||||
except Exception as e:
|
||||
self.logger.warning(f"Failed to compute embedding for reference image: {e}")
|
||||
|
||||
if not embeddings:
|
||||
return None
|
||||
|
||||
# Stack: (N, D), average: (1, D), re-normalize
|
||||
stacked = torch.cat(embeddings, dim=0)
|
||||
avg_emb = stacked.mean(dim=0, keepdim=True)
|
||||
avg_emb = F.normalize(avg_emb, dim=-1)
|
||||
|
||||
self.logger.debug(
|
||||
f"Computed averaged embedding from {len(embeddings)} reference image(s)"
|
||||
)
|
||||
return avg_emb
|
||||
|
||||
def compare_embeddings(
|
||||
self, embedding1: torch.Tensor, embedding2: torch.Tensor
|
||||
) -> float:
|
||||
"""
|
||||
Compute cosine similarity between two embeddings.
|
||||
|
||||
Args:
|
||||
embedding1: First embedding (torch.Tensor)
|
||||
embedding2: Second embedding (torch.Tensor)
|
||||
|
||||
Returns:
|
||||
Cosine similarity score (float, range: -1 to 1, typically 0 to 1)
|
||||
"""
|
||||
# Ensure tensors are on the same device
|
||||
if embedding1.device != embedding2.device:
|
||||
embedding2 = embedding2.to(embedding1.device)
|
||||
|
||||
similarity = F.cosine_similarity(embedding1, embedding2, dim=-1)
|
||||
return similarity.item()
|
||||
|
||||
@staticmethod
|
||||
def make_filenames_hash(filenames: List[str]) -> str:
|
||||
"""
|
||||
Compute a deterministic hash of a filenames list.
|
||||
|
||||
Used for cache invalidation — if the filenames list changes,
|
||||
the hash changes, triggering re-computation of averaged embeddings.
|
||||
|
||||
Args:
|
||||
filenames: List of filename strings
|
||||
|
||||
Returns:
|
||||
16-character hex hash string
|
||||
"""
|
||||
canonical = json.dumps(sorted(filenames))
|
||||
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
|
||||
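A usage sketch of the cache-gating described in the module docstring (the `cache` dict and helper below are assumptions for illustration; only `make_filenames_hash` and `get_averaged_embedding` come from the class above):

```python
from pathlib import Path

import cv2

def averaged_ref_embedding(detector, cache: dict, logo_name: str, ref_dir: Path):
    """Recompute the averaged reference embedding only when the list of
    reference filenames changes; otherwise reuse the cached tensor."""
    filenames = sorted(str(p) for p in ref_dir.glob("*.jpg"))
    key = f"avg:{logo_name}:{detector.make_filenames_hash(filenames)}"
    if key not in cache:
        images = [img for f in filenames if (img := cv2.imread(f)) is not None]
        cache[key] = detector.get_averaged_embedding(images)
    return cache[key]
```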
@@ -113,11 +113,14 @@ def get_or_create_logo_name(cursor: sqlite3.Cursor, name: str) -> int:
|
||||
|
||||
|
||||
def main():
|
||||
# Paths
|
||||
dataset_dir = Path("/data/dev.python/logo_test/LogoDet-3K")
|
||||
reference_dir = Path("/data/dev.python/logo_test/reference_logos")
|
||||
test_images_dir = Path("/data/dev.python/logo_test/test_images")
|
||||
db_path = Path("/data/dev.python/logo_test/test_data_mapping.db")
|
||||
# Use script directory as base path for portability
|
||||
base_dir = Path(__file__).parent.resolve()
|
||||
|
||||
# Paths relative to script location
|
||||
dataset_dir = base_dir / "LogoDet-3K"
|
||||
reference_dir = base_dir / "reference_logos"
|
||||
test_images_dir = base_dir / "test_images"
|
||||
db_path = base_dir / "test_data_mapping.db"
|
||||
|
||||
# Ensure output directories exist
|
||||
reference_dir.mkdir(exist_ok=True)
|
||||
|
||||
requirements-training.txt (new file, 23 lines)
@@ -0,0 +1,23 @@
|
||||
# Requirements for CLIP logo fine-tuning on RTX 4090
|
||||
#
|
||||
# Only includes packages not already installed on the training server.
|
||||
# Does NOT upgrade existing packages (torch, torchvision, numpy, pillow,
|
||||
# pyyaml, opencv-python) which are already installed and compatible.
|
||||
#
|
||||
# Usage:
|
||||
# pip install -r requirements-training.txt
|
||||
|
||||
# CLIP models and tokenizers
|
||||
transformers>=4.36.0
|
||||
|
||||
# LoRA fine-tuning
|
||||
peft>=0.7.0
|
||||
|
||||
# Progress bars
|
||||
tqdm>=4.66.0
|
||||
|
||||
# HuggingFace Hub for model downloads
|
||||
huggingface-hub>=0.19.0
|
||||
|
||||
# Accelerate for efficient training (optional but recommended)
|
||||
accelerate>=0.25.0
|
||||
results_average_embeddings.txt (new file, 52 lines)
@@ -0,0 +1,52 @@
|
||||
======================================================================
|
||||
BURNLEY LOGO DETECTION TEST
|
||||
Model: dinov2
|
||||
Method: Margin-based (margin=0.05)
|
||||
======================================================================
|
||||
Date: 2026-03-31 11:45:03
|
||||
|
||||
Configuration:
|
||||
Embedding model: dinov2
|
||||
Similarity threshold: 0.7
|
||||
DETR threshold: 0.5
|
||||
Matching margin: 0.05
|
||||
Test images processed: 516
|
||||
Reference logos: barnfield, vertu
|
||||
|
||||
Results:
|
||||
True Positives: 28
|
||||
False Positives: 36
|
||||
False Negatives: 125
|
||||
Total Expected: 146
|
||||
|
||||
Scores:
|
||||
Precision: 0.4375 (43.8%)
|
||||
Recall: 0.1918 (19.2%)
|
||||
F1 Score: 0.2667 (26.7%)
|
||||
|
||||
======================================================================
|
||||
BURNLEY LOGO DETECTION TEST
|
||||
Model: dinov2
|
||||
Method: Margin-based (margin=0.05)
|
||||
======================================================================
|
||||
Date: 2026-03-31 12:29:32
|
||||
|
||||
Configuration:
|
||||
Embedding model: dinov2
|
||||
Similarity threshold: 0.7
|
||||
DETR threshold: 0.5
|
||||
Matching margin: 0.05
|
||||
Test images processed: 516
|
||||
Reference logos: barnfield, vertu
|
||||
|
||||
Results:
|
||||
True Positives: 28
|
||||
False Positives: 36
|
||||
False Negatives: 125
|
||||
Total Expected: 146
|
||||
|
||||
Scores:
|
||||
Precision: 0.4375 (43.8%)
|
||||
Recall: 0.1918 (19.2%)
|
||||
F1 Score: 0.2667 (26.7%)
|
||||
|
||||
run_preprocess_test.sh (new executable file, 149 lines)
@@ -0,0 +1,149 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# Test different image preprocessing modes to determine if they improve
|
||||
# CLIP embedding accuracy for logo matching.
|
||||
#
|
||||
# Preprocessing modes tested:
|
||||
# - default: CLIP's default (resize shortest edge + center crop)
|
||||
# - letterbox: Pad to square with black bars, preserving aspect ratio
|
||||
# - stretch: Stretch to square (distorts aspect ratio)
|
||||
#
|
||||
# Usage:
|
||||
# ./run_preprocess_test.sh
|
||||
#
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
OUTPUT_FILE="${SCRIPT_DIR}/test_results/preprocessing_comparison.txt"
|
||||
|
||||
# Model - baseline CLIP (testing preprocessing effect on standard model)
|
||||
MODEL="openai/clip-vit-large-patch14"
|
||||
|
||||
# Fixed parameters (same as refs_per_logo test for comparability)
|
||||
NUM_LOGOS=20
|
||||
REFS_PER_LOGO=10
|
||||
POSITIVE_SAMPLES=20
|
||||
NEGATIVE_SAMPLES=100
|
||||
MIN_MATCHING_REFS=1
|
||||
THRESHOLD=0.70
|
||||
MARGIN=0.05
|
||||
SEED=42
|
||||
|
||||
# Preprocessing modes to test
|
||||
MODES="default letterbox stretch"
|
||||
|
||||
# Create output directory if needed
|
||||
mkdir -p "${SCRIPT_DIR}/test_results"
|
||||
|
||||
# Clear output file and write header
|
||||
cat > "$OUTPUT_FILE" << EOF
|
||||
Image Preprocessing Comparison Test
|
||||
====================================
|
||||
Date: $(date)
|
||||
|
||||
Model: ${MODEL}
|
||||
Method: multi-ref (max)
|
||||
|
||||
Fixed Parameters:
|
||||
Number of logo brands: ${NUM_LOGOS}
|
||||
Refs per logo: ${REFS_PER_LOGO}
|
||||
Similarity threshold: ${THRESHOLD}
|
||||
Margin: ${MARGIN}
|
||||
Min matching refs: ${MIN_MATCHING_REFS}
|
||||
Positive samples/logo: ${POSITIVE_SAMPLES}
|
||||
Negative samples/logo: ${NEGATIVE_SAMPLES}
|
||||
Seed: ${SEED}
|
||||
|
||||
Testing preprocessing modes: ${MODES}
|
||||
|
||||
EOF
|
||||
|
||||
echo "Image Preprocessing Comparison Test"
|
||||
echo "===================================="
|
||||
echo "Model: ${MODEL}"
|
||||
echo "Testing preprocessing modes: ${MODES}"
|
||||
echo ""
|
||||
|
||||
# Results table header
|
||||
echo "Results Summary:" >> "$OUTPUT_FILE"
|
||||
echo "----------------" >> "$OUTPUT_FILE"
|
||||
printf "%-12s %8s %8s %8s %8s %8s %8s\n" "Mode" "TP" "FP" "FN" "Prec" "Recall" "F1" >> "$OUTPUT_FILE"
|
||||
echo "------------------------------------------------------------------------" >> "$OUTPUT_FILE"
|
||||
|
||||
# Track best result
|
||||
BEST_F1=0
|
||||
BEST_MODE="default"
|
||||
|
||||
for MODE in ${MODES}; do
|
||||
echo "=== Testing preprocess_mode=${MODE} ==="
|
||||
|
||||
# Clear cache to ensure fresh embeddings with new preprocessing
|
||||
rm -f "${SCRIPT_DIR}/.embedding_cache.pkl"
|
||||
|
||||
# Run test and capture output
|
||||
OUTPUT=$(uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold $THRESHOLD \
|
||||
--margin $MARGIN \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL" \
|
||||
--preprocess-mode "$MODE" \
|
||||
--no-cache \
|
||||
2>&1)
|
||||
|
||||
# Extract metrics
|
||||
TP=$(echo "${OUTPUT}" | grep "True Positives" | grep -oE "[0-9]+" | head -1)
|
||||
FP=$(echo "${OUTPUT}" | grep "False Positives" | grep -oE "[0-9]+" | head -1)
|
||||
FN=$(echo "${OUTPUT}" | grep "False Negatives" | grep -oE "[0-9]+" | head -1)
|
||||
PREC=$(echo "${OUTPUT}" | grep "Precision:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
|
||||
RECALL=$(echo "${OUTPUT}" | grep "Recall:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
|
||||
F1=$(echo "${OUTPUT}" | grep "F1 Score:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
|
||||
|
||||
# Print to console
|
||||
echo " TP: ${TP}, FP: ${FP}, FN: ${FN}"
|
||||
echo " Precision: ${PREC}, Recall: ${RECALL}, F1: ${F1}"
|
||||
echo ""
|
||||
|
||||
# Add to results table
|
||||
printf "%-12s %8s %8s %8s %8s %8s %8s\n" "${MODE}" "${TP}" "${FP}" "${FN}" "${PREC}" "${RECALL}" "${F1}" >> "$OUTPUT_FILE"
|
||||
|
||||
# Track best F1
|
||||
F1_NUM=$(echo "${F1}" | tr -d '%')
|
||||
if [ -n "$F1_NUM" ]; then
|
||||
BETTER=$(echo "${F1_NUM} > ${BEST_F1}" | bc -l 2>/dev/null || echo "0")
|
||||
if [ "$BETTER" = "1" ]; then
|
||||
BEST_F1="${F1_NUM}"
|
||||
BEST_MODE="${MODE}"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Also append full output for this test
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "======================================================================" >> "$OUTPUT_FILE"
|
||||
echo "DETAILED RESULTS: preprocess_mode=${MODE}" >> "$OUTPUT_FILE"
|
||||
echo "======================================================================" >> "$OUTPUT_FILE"
|
||||
echo "${OUTPUT}" | grep -A 50 "Configuration:" | head -30 >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
done
|
||||
|
||||
# Summary
|
||||
echo "------------------------------------------------------------------------" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "BEST PREPROCESSING MODE: ${BEST_MODE} (F1 = ${BEST_F1}%)" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "Notes:" >> "$OUTPUT_FILE"
|
||||
echo " - default: CLIP's standard preprocessing (resize shortest edge + center crop)" >> "$OUTPUT_FILE"
|
||||
echo " - letterbox: Pads image to square with black bars, preserving aspect ratio" >> "$OUTPUT_FILE"
|
||||
echo " - stretch: Resizes image to square, distorting aspect ratio" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
|
||||
echo "======================================="
|
||||
echo "BEST: preprocess_mode=${BEST_MODE} (F1 = ${BEST_F1}%)"
|
||||
echo "======================================="
|
||||
echo ""
|
||||
echo "Results saved to: $OUTPUT_FILE"
|
||||
run_refs_per_logo_test.sh (new executable file, 132 lines)
@@ -0,0 +1,132 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# Test different numbers of reference logos per brand to find optimal setting.
|
||||
# Uses baseline CLIP with multi-ref (max) matching method.
|
||||
#
|
||||
# Usage:
|
||||
# ./run_refs_per_logo_test.sh
|
||||
#
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
OUTPUT_FILE="${SCRIPT_DIR}/test_results/refs_per_logo_analysis.txt"
|
||||
|
||||
# Model - baseline CLIP (best for unknown logos)
|
||||
MODEL="openai/clip-vit-large-patch14"
|
||||
|
||||
# Fixed parameters
|
||||
NUM_LOGOS=20
|
||||
POSITIVE_SAMPLES=20
|
||||
NEGATIVE_SAMPLES=100
|
||||
MIN_MATCHING_REFS=1
|
||||
THRESHOLD=0.70
|
||||
MARGIN=0.05
|
||||
SEED=42
|
||||
|
||||
# Refs per logo values to test
|
||||
REFS_TO_TEST="1 2 3 5 7 10 15 20"
|
||||
|
||||
# Create output directory if needed
|
||||
mkdir -p "${SCRIPT_DIR}/test_results"
|
||||
|
||||
# Clear output file and write header
|
||||
cat > "$OUTPUT_FILE" << EOF
|
||||
Reference Logos Per Brand Optimization
|
||||
======================================
|
||||
Date: $(date)
|
||||
|
||||
Model: ${MODEL}
|
||||
Method: multi-ref (max)
|
||||
|
||||
Fixed Parameters:
|
||||
Number of logo brands: ${NUM_LOGOS}
|
||||
Similarity threshold: ${THRESHOLD}
|
||||
Margin: ${MARGIN}
|
||||
Min matching refs: ${MIN_MATCHING_REFS}
|
||||
Positive samples/logo: ${POSITIVE_SAMPLES}
|
||||
Negative samples/logo: ${NEGATIVE_SAMPLES}
|
||||
Seed: ${SEED}
|
||||
|
||||
Testing refs per logo: ${REFS_TO_TEST}
|
||||
|
||||
EOF
|
||||
|
||||
echo "Reference Logos Per Brand Optimization"
|
||||
echo "======================================="
|
||||
echo "Model: ${MODEL}"
|
||||
echo "Testing refs per logo: ${REFS_TO_TEST}"
|
||||
echo ""
|
||||
|
||||
# Results table header
|
||||
echo "Results Summary:" >> "$OUTPUT_FILE"
|
||||
echo "----------------" >> "$OUTPUT_FILE"
|
||||
printf "%-12s %8s %8s %8s %8s %8s %8s\n" "Refs/Logo" "TP" "FP" "FN" "Prec" "Recall" "F1" >> "$OUTPUT_FILE"
|
||||
echo "------------------------------------------------------------------------" >> "$OUTPUT_FILE"
|
||||
|
||||
# Track best result
|
||||
BEST_F1=0
|
||||
BEST_REFS=0
|
||||
|
||||
for REFS in ${REFS_TO_TEST}; do
|
||||
echo "=== Testing refs_per_logo=${REFS} ==="
|
||||
|
||||
# Run test and capture output
|
||||
OUTPUT=$(uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold $THRESHOLD \
|
||||
--margin $MARGIN \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL" \
|
||||
2>&1)
|
||||
|
||||
# Extract metrics
|
||||
TP=$(echo "${OUTPUT}" | grep "True Positives" | grep -oE "[0-9]+" | head -1)
|
||||
FP=$(echo "${OUTPUT}" | grep "False Positives" | grep -oE "[0-9]+" | head -1)
|
||||
FN=$(echo "${OUTPUT}" | grep "False Negatives" | grep -oE "[0-9]+" | head -1)
|
||||
PREC=$(echo "${OUTPUT}" | grep "Precision:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
|
||||
RECALL=$(echo "${OUTPUT}" | grep "Recall:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
|
||||
F1=$(echo "${OUTPUT}" | grep "F1 Score:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
|
||||
|
||||
# Print to console
|
||||
echo " TP: ${TP}, FP: ${FP}, FN: ${FN}"
|
||||
echo " Precision: ${PREC}, Recall: ${RECALL}, F1: ${F1}"
|
||||
echo ""
|
||||
|
||||
# Add to results table
|
||||
printf "%-12s %8s %8s %8s %8s %8s %8s\n" "${REFS}" "${TP}" "${FP}" "${FN}" "${PREC}" "${RECALL}" "${F1}" >> "$OUTPUT_FILE"
|
||||
|
||||
# Track best F1
|
||||
F1_NUM=$(echo "${F1}" | tr -d '%')
|
||||
if [ -n "$F1_NUM" ]; then
|
||||
BETTER=$(echo "${F1_NUM} > ${BEST_F1}" | bc -l 2>/dev/null || echo "0")
|
||||
if [ "$BETTER" = "1" ]; then
|
||||
BEST_F1="${F1_NUM}"
|
||||
BEST_REFS="${REFS}"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Also append full output for this test
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "======================================================================" >> "$OUTPUT_FILE"
|
||||
echo "DETAILED RESULTS: refs_per_logo=${REFS}" >> "$OUTPUT_FILE"
|
||||
echo "======================================================================" >> "$OUTPUT_FILE"
|
||||
echo "${OUTPUT}" | grep -A 50 "Configuration:" | head -30 >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
done
|
||||
|
||||
# Summary
|
||||
echo "------------------------------------------------------------------------" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "OPTIMAL SETTING: refs_per_logo=${BEST_REFS} (F1 = ${BEST_F1}%)" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
|
||||
echo "======================================="
|
||||
echo "OPTIMAL: refs_per_logo=${BEST_REFS} (F1 = ${BEST_F1}%)"
|
||||
echo "======================================="
|
||||
echo ""
|
||||
echo "Results saved to: $OUTPUT_FILE"
|
||||
run_threshold_tests_image_split.sh (new executable file, 181 lines)
@@ -0,0 +1,181 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# Run logo detection tests with the image-split fine-tuned model.
|
||||
# Tests various threshold and margin settings to find optimal parameters.
|
||||
#
|
||||
# Usage:
|
||||
# ./run_threshold_tests_image_split.sh
|
||||
#
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
OUTPUT_FILE="${SCRIPT_DIR}/threshold_test_results_image_split.txt"
|
||||
|
||||
# Model path
|
||||
MODEL_PATH="models/logo_detection/clip_finetuned_image_split"
|
||||
|
||||
# Common parameters
|
||||
NUM_LOGOS=20
|
||||
REFS_PER_LOGO=10
|
||||
POSITIVE_SAMPLES=20
|
||||
NEGATIVE_SAMPLES=100
|
||||
MIN_MATCHING_REFS=3
|
||||
SEED=42
|
||||
|
||||
# Check if model exists
|
||||
if [ ! -d "${SCRIPT_DIR}/${MODEL_PATH}" ]; then
|
||||
echo "Error: Image-split model not found at ${SCRIPT_DIR}/${MODEL_PATH}"
|
||||
echo "Train the model first with: python train_clip_logo.py --config configs/cloud_rtx4090_image_split.yaml"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Clear output file and write header
|
||||
echo "Threshold Optimization Tests - Image-Split Model" > "$OUTPUT_FILE"
|
||||
echo "=================================================" >> "$OUTPUT_FILE"
|
||||
echo "Date: $(date)" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "Model: ${MODEL_PATH}" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
echo "Common Parameters:" >> "$OUTPUT_FILE"
|
||||
echo " Matching method: multi-ref (max)" >> "$OUTPUT_FILE"
|
||||
echo " Reference logos: $NUM_LOGOS" >> "$OUTPUT_FILE"
|
||||
echo " Refs per logo: $REFS_PER_LOGO" >> "$OUTPUT_FILE"
|
||||
echo " Positive samples: $POSITIVE_SAMPLES" >> "$OUTPUT_FILE"
|
||||
echo " Negative samples: $NEGATIVE_SAMPLES" >> "$OUTPUT_FILE"
|
||||
echo " Min matching refs: $MIN_MATCHING_REFS" >> "$OUTPUT_FILE"
|
||||
echo " Seed: $SEED" >> "$OUTPUT_FILE"
|
||||
echo "" >> "$OUTPUT_FILE"
|
||||
|
||||
echo "Running threshold optimization tests for image-split model..."
|
||||
echo " Model: ${MODEL_PATH}"
|
||||
echo " Matching method: multi-ref (max)"
|
||||
echo " Reference logos: $NUM_LOGOS"
|
||||
echo " Refs per logo: $REFS_PER_LOGO"
|
||||
echo " Seed: $SEED"
|
||||
echo ""
|
||||
|
||||
# Test 1: Lower threshold (image-split model may have different distribution)
|
||||
echo "=== Test 1: threshold=0.65, margin=0.05 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.65 \
|
||||
--margin 0.05 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 2: Default threshold
|
||||
echo "=== Test 2: threshold=0.70, margin=0.05 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.70 \
|
||||
--margin 0.05 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 3: threshold=0.75
|
||||
echo "=== Test 3: threshold=0.75, margin=0.05 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.75 \
|
||||
--margin 0.05 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 4: threshold=0.80
|
||||
echo "=== Test 4: threshold=0.80, margin=0.05 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.80 \
|
||||
--margin 0.05 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 5: threshold=0.80 with larger margin
|
||||
echo "=== Test 5: threshold=0.80, margin=0.10 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.80 \
|
||||
--margin 0.10 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 6: threshold=0.85
|
||||
echo "=== Test 6: threshold=0.85, margin=0.10 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.85 \
|
||||
--margin 0.10 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 7: threshold=0.90
|
||||
echo "=== Test 7: threshold=0.90, margin=0.10 ==="
|
||||
uv run python "$SCRIPT_DIR/test_logo_detection.py" \
|
||||
--num-logos $NUM_LOGOS \
|
||||
--refs-per-logo $REFS_PER_LOGO \
|
||||
--positive-samples $POSITIVE_SAMPLES \
|
||||
--negative-samples $NEGATIVE_SAMPLES \
|
||||
--matching-method multi-ref \
|
||||
--min-matching-refs $MIN_MATCHING_REFS \
|
||||
--use-max-similarity \
|
||||
--threshold 0.90 \
|
||||
--margin 0.10 \
|
||||
--seed $SEED \
|
||||
--embedding-model "$MODEL_PATH" \
|
||||
--output-file "$OUTPUT_FILE"
|
||||
|
||||
echo ""
|
||||
echo "Results saved to: $OUTPUT_FILE"
|
||||
test_burnley_detection.py (new file, 521 lines)
@@ -0,0 +1,521 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Test script for logo detection accuracy on Burnley test images.
|
||||
|
||||
Uses DetectLogosEmbeddings from logo_detection_embeddings.py to detect
|
||||
barnfield and vertu logos. Ground truth is determined by filename prefix:
|
||||
- "vertu_" → contains vertu logo
|
||||
- "barnfield_" → contains barnfield logo
|
||||
- "barnfield+vertu_" → contains both logos
|
||||
- anything else → no target logos
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import logging
|
||||
import pickle
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Set, Tuple
|
||||
|
||||
import cv2
|
||||
import torch
|
||||
from tqdm import tqdm
|
||||
|
||||
from logo_detection_embeddings import DetectLogosEmbeddings
|
||||
|
||||
|
||||
def setup_logging(verbose: bool = False) -> logging.Logger:
|
||||
"""Configure logging."""
|
||||
level = logging.DEBUG if verbose else logging.INFO
|
||||
logging.basicConfig(
|
||||
level=level,
|
||||
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||
datefmt="%H:%M:%S",
|
||||
)
|
||||
return logging.getLogger(__name__)
|
||||
|
||||
|
||||
def load_image(image_path: Path) -> Optional[cv2.Mat]:
|
||||
"""Load an image using OpenCV."""
|
||||
img = cv2.imread(str(image_path))
|
||||
if img is None:
|
||||
return None
|
||||
return img
|
||||
|
||||
|
||||
class EmbeddingCache:
|
||||
"""Simple file-based cache for embeddings."""
|
||||
|
||||
def __init__(self, cache_path: Path):
|
||||
self.cache_path = cache_path
|
||||
self.cache: Dict[str, Any] = {}
|
||||
self._load()
|
||||
|
||||
def _load(self):
|
||||
if self.cache_path.exists():
|
||||
try:
|
||||
with open(self.cache_path, "rb") as f:
|
||||
self.cache = pickle.load(f)
|
||||
except Exception:
|
||||
self.cache = {}
|
||||
|
||||
def save(self):
|
||||
self.cache_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(self.cache_path, "wb") as f:
|
||||
pickle.dump(self.cache, f)
|
||||
|
||||
def get(self, key: str):
|
||||
return self.cache.get(key)
|
||||
|
||||
def put(self, key: str, value):
|
||||
if isinstance(value, torch.Tensor):
|
||||
self.cache[key] = value.cpu()
|
||||
else:
|
||||
self.cache[key] = value
|
||||
|
||||
def __len__(self):
|
||||
return len(self.cache)
|
||||
|
||||
|
||||
def get_expected_logos(filename: str) -> Set[str]:
|
||||
"""Determine expected logos from filename prefix."""
|
||||
name = filename.lower()
|
||||
if name.startswith("barnfield+vertu_"):
|
||||
return {"barnfield", "vertu"}
|
||||
elif name.startswith("barnfield_"):
|
||||
return {"barnfield"}
|
||||
elif name.startswith("vertu_"):
|
||||
return {"vertu"}
|
||||
return set()
|
||||
|
||||
|
||||
def load_reference_images(ref_dir: Path, logger: logging.Logger) -> List[cv2.Mat]:
|
||||
"""Load all images from a reference directory."""
|
||||
images = []
|
||||
for path in sorted(ref_dir.iterdir()):
|
||||
if path.suffix.lower() in (".jpg", ".jpeg", ".png", ".bmp"):
|
||||
img = load_image(path)
|
||||
if img is not None:
|
||||
images.append(img)
|
||||
else:
|
||||
logger.warning(f"Failed to load reference image: {path}")
|
||||
return images
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Test logo detection on Burnley test images using DetectLogosEmbeddings"
|
||||
)
|
||||
parser.add_argument(
|
||||
"-t", "--threshold",
|
||||
type=float,
|
||||
default=0.7,
|
||||
help="Similarity threshold for matching (default: 0.7)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-d", "--detr-threshold",
|
||||
type=float,
|
||||
default=0.5,
|
||||
help="DETR detection confidence threshold (default: 0.5)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-e", "--embedding-model",
|
||||
type=str,
|
||||
choices=["clip", "dinov2", "siglip"],
|
||||
default="dinov2",
|
||||
help="Embedding model type (default: dinov2)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--margin",
|
||||
type=float,
|
||||
default=0.05,
|
||||
help="Required margin between best and second-best match (default: 0.05)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-v", "--verbose",
|
||||
action="store_true",
|
||||
help="Enable verbose logging",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--similarity-details",
|
||||
action="store_true",
|
||||
help="Output detailed similarity scores for each detection",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--no-cache",
|
||||
action="store_true",
|
||||
help="Disable embedding cache",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--clear-cache",
|
||||
action="store_true",
|
||||
help="Clear embedding cache before running",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output-file",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Append results summary to this file",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
logger = setup_logging(args.verbose)
|
||||
|
||||
# Paths
|
||||
base_dir = Path(__file__).resolve().parent
|
||||
test_images_dir = base_dir / "burnley_test_images"
|
||||
barnfield_ref_dir = base_dir / "barnfield_reference_images"
|
||||
vertu_ref_dir = base_dir / "vertu_reference_images"
|
||||
cache_path = base_dir / ".burnley_embedding_cache.pkl"
|
||||
|
||||
# Verify directories exist
|
||||
for d, name in [(test_images_dir, "Test images"), (barnfield_ref_dir, "Barnfield refs"), (vertu_ref_dir, "Vertu refs")]:
|
||||
if not d.exists():
|
||||
logger.error(f"{name} directory not found: {d}")
|
||||
sys.exit(1)
|
||||
|
||||
# Handle cache
|
||||
if args.clear_cache and cache_path.exists():
|
||||
cache_path.unlink()
|
||||
logger.info("Cleared embedding cache")
|
||||
|
||||
cache = EmbeddingCache(cache_path) if not args.no_cache else None
|
||||
if cache:
|
||||
logger.info(f"Loaded {len(cache)} cached embeddings")
|
||||
|
||||
# Initialize detector
|
||||
logger.info(f"Initializing detector with embedding model: {args.embedding_model}")
|
||||
detector = DetectLogosEmbeddings(
|
||||
logger=logger,
|
||||
detr_threshold=args.detr_threshold,
|
||||
embedding_model_type=args.embedding_model,
|
||||
)
|
||||
|
||||
# Compute averaged reference embeddings
|
||||
logger.info("Computing reference embeddings...")
|
||||
|
||||
reference_embeddings: Dict[str, torch.Tensor] = {}
|
||||
for logo_name, ref_dir in [("barnfield", barnfield_ref_dir), ("vertu", vertu_ref_dir)]:
|
||||
cache_key = f"avg_ref:{logo_name}:{args.embedding_model}"
|
||||
cached = cache.get(cache_key) if cache else None
|
||||
|
||||
if cached is not None:
|
||||
reference_embeddings[logo_name] = cached
|
||||
logger.info(f"Loaded cached averaged embedding for {logo_name}")
|
||||
else:
|
||||
ref_images = load_reference_images(ref_dir, logger)
|
||||
logger.info(f"Computing averaged embedding for {logo_name} from {len(ref_images)} images")
|
||||
avg_emb = detector.get_averaged_embedding(ref_images)
|
||||
if avg_emb is None:
|
||||
logger.error(f"Failed to compute embedding for {logo_name}")
|
||||
sys.exit(1)
|
||||
reference_embeddings[logo_name] = avg_emb
|
||||
if cache:
|
||||
cache.put(cache_key, avg_emb)
|
||||
|
||||
# Collect test images
|
||||
test_files = sorted([
|
||||
f.name for f in test_images_dir.iterdir()
|
||||
if f.suffix.lower() in (".jpg", ".jpeg", ".png", ".bmp")
|
||||
])
|
||||
logger.info(f"Found {len(test_files)} test images")
|
||||
|
||||
# Metrics
|
||||
true_positives = 0
|
||||
false_positives = 0
|
||||
false_negatives = 0
|
||||
total_expected = 0
|
||||
results = []
|
||||
|
||||
similarity_details = {
|
||||
"true_positive_sims": [],
|
||||
"false_positive_sims": [],
|
||||
"missed_best_sims": [],
|
||||
"detection_details": [],
|
||||
}
|
||||
|
||||
# Process test images
|
||||
for test_filename in tqdm(test_files, desc="Testing"):
|
||||
test_path = test_images_dir / test_filename
|
||||
        expected_logos = get_expected_logos(test_filename)
        total_expected += len(expected_logos)

        # Check cache for detections
        det_cache_key = f"det:{test_filename}:{args.embedding_model}"
        cached_detections = cache.get(det_cache_key) if cache else None

        if cached_detections is not None:
            detections = cached_detections
        else:
            test_img = load_image(test_path)
            if test_img is None:
                logger.warning(f"Failed to load test image: {test_path}")
                continue
            detections = detector.detect(test_img)
            if cache:
                cache.put(det_cache_key, detections)

        # Match each detection against reference embeddings with margin
        matched_logos: Set[str] = set()
        for det_idx, detection in enumerate(detections):
            # Compute similarity to each reference logo
            sims: Dict[str, float] = {}
            for logo_name, ref_emb in reference_embeddings.items():
                sims[logo_name] = detector.compare_embeddings(
                    detection["embedding"], ref_emb
                )

            sorted_sims = sorted(sims.items(), key=lambda x: -x[1])

            if args.similarity_details:
                similarity_details["detection_details"].append({
                    "image": test_filename,
                    "detection_idx": det_idx,
                    "expected_logos": list(expected_logos),
                    "similarities": sorted_sims,
                    "detr_score": detection.get("score", 0),
                })

            # Best match with margin check
            if not sorted_sims:
                continue

            best_name, best_sim = sorted_sims[0]
            if best_sim < args.threshold:
                continue

            # Check margin over second best
            if len(sorted_sims) > 1:
                second_sim = sorted_sims[1][1]
                if best_sim - second_sim < args.margin:
                    continue

            matched_logos.add(best_name)
            is_correct = best_name in expected_logos

            if is_correct:
                true_positives += 1
                if args.similarity_details:
                    similarity_details["true_positive_sims"].append(best_sim)
            else:
                false_positives += 1
                if args.similarity_details:
                    similarity_details["false_positive_sims"].append(best_sim)

            results.append({
                "test_image": test_filename,
                "matched_logo": best_name,
                "similarity": best_sim,
                "correct": is_correct,
            })

        # Count missed detections
        missed = expected_logos - matched_logos
        false_negatives += len(missed)

        for missed_logo in missed:
            if args.similarity_details and detections:
                best_sim_for_missed = 0
                ref_emb = reference_embeddings[missed_logo]
                for detection in detections:
                    sim = detector.compare_embeddings(detection["embedding"], ref_emb)
                    best_sim_for_missed = max(best_sim_for_missed, sim)
                similarity_details["missed_best_sims"].append(best_sim_for_missed)

            results.append({
                "test_image": test_filename,
                "matched_logo": None,
                "expected_logo": missed_logo,
                "similarity": None,
                "correct": False,
            })

    # Save cache
    if cache:
        cache.save()
        logger.info(f"Saved {len(cache)} embeddings to cache")

    # Calculate metrics
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / total_expected if total_expected > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    # Print results
    print("\n" + "=" * 60)
    print("BURNLEY LOGO DETECTION TEST RESULTS")
    print("=" * 60)
    print(f"\nConfiguration:")
    print(f"  Embedding model: {args.embedding_model}")
    print(f"  Similarity threshold: {args.threshold}")
    print(f"  DETR confidence threshold: {args.detr_threshold}")
    print(f"  Matching margin: {args.margin}")
    print(f"  Test images processed: {len(test_files)}")
    print(f"  Reference logos: barnfield, vertu")

    print(f"\nMetrics:")
    print(f"  True Positives (correct matches): {true_positives}")
    print(f"  False Positives (wrong matches): {false_positives}")
    print(f"  False Negatives (missed logos): {false_negatives}")
    print(f"  Total expected matches: {total_expected}")

    print(f"\nScores:")
    print(f"  Precision: {precision:.4f} ({precision*100:.1f}%)")
    print(f"  Recall: {recall:.4f} ({recall*100:.1f}%)")
    print(f"  F1 Score: {f1:.4f} ({f1*100:.1f}%)")

    # Show false positive examples
    false_positive_examples = [r for r in results if r.get("matched_logo") and not r["correct"]]
    if false_positive_examples:
        print(f"\nExample False Positives (first 5):")
        for r in false_positive_examples[:5]:
            print(f"  - Image: {r['test_image']}")
            print(f"    Matched: {r['matched_logo']} (similarity: {r['similarity']:.3f})")

    # Show false negative examples
    false_negative_examples = [r for r in results if r.get("expected_logo")]
    if false_negative_examples:
        print(f"\nExample False Negatives (first 5):")
        for r in false_negative_examples[:5]:
            print(f"  - Image: {r['test_image']}")
            print(f"    Expected: {r['expected_logo']}")

    print("=" * 60)

    # Print similarity details if requested
    if args.similarity_details:
        print_similarity_details(similarity_details, args.threshold)

    # Write results to file if requested
    if args.output_file:
        write_results_to_file(
            output_path=Path(args.output_file),
            args=args,
            num_test_images=len(test_files),
            true_positives=true_positives,
            false_positives=false_positives,
            false_negatives=false_negatives,
            total_expected=total_expected,
            precision=precision,
            recall=recall,
            f1=f1,
        )
        print(f"\nResults appended to: {args.output_file}")


def print_similarity_details(details: dict, threshold: float):
    """Print detailed similarity distribution analysis."""
    import statistics

    print("\n" + "=" * 60)
    print("SIMILARITY DISTRIBUTION ANALYSIS")
    print("=" * 60)

    def compute_stats(values, name):
        if not values:
            print(f"\n{name}: No data")
            return
        print(f"\n{name} (n={len(values)}):")
        print(f"  Min: {min(values):.4f}")
        print(f"  Max: {max(values):.4f}")
        print(f"  Mean: {statistics.mean(values):.4f}")
        if len(values) > 1:
            print(f"  StdDev: {statistics.stdev(values):.4f}")
        print(f"  Median: {statistics.median(values):.4f}")

        above = sum(1 for v in values if v >= threshold)
        below = sum(1 for v in values if v < threshold)
        print(f"  Above threshold ({threshold}): {above} ({100*above/len(values):.1f}%)")
        print(f"  Below threshold ({threshold}): {below} ({100*below/len(values):.1f}%)")

    compute_stats(details["true_positive_sims"], "TRUE POSITIVE similarities")
    compute_stats(details["false_positive_sims"], "FALSE POSITIVE similarities")
    compute_stats(details["missed_best_sims"], "MISSED LOGO best similarities")

    # Overlap analysis
    tp_sims = details["true_positive_sims"]
    fp_sims = details["false_positive_sims"]
    if tp_sims and fp_sims:
        print("\n" + "-" * 40)
        print("OVERLAP ANALYSIS:")
        tp_min, tp_max = min(tp_sims), max(tp_sims)
        fp_min, fp_max = min(fp_sims), max(fp_sims)
        print(f"  True Positives range: [{tp_min:.4f}, {tp_max:.4f}]")
        print(f"  False Positives range: [{fp_min:.4f}, {fp_max:.4f}]")

        overlap_min = max(tp_min, fp_min)
        overlap_max = min(tp_max, fp_max)
        if overlap_min < overlap_max:
            print(f"  OVERLAP REGION: [{overlap_min:.4f}, {overlap_max:.4f}]")
        else:
            print("  NO OVERLAP - distributions are separable!")

    # Sample detection details
    det_details = details["detection_details"]
    if det_details:
        print("\n" + "-" * 40)
        print(f"SAMPLE DETECTION DETAILS (first 20 of {len(det_details)}):")
        for i, det in enumerate(det_details[:20]):
            expected = det["expected_logos"]
            sims = det["similarities"]
            print(f"\n  [{i+1}] Image: {det['image']}")
            print(f"      Expected: {expected if expected else '(none)'}")
            print(f"      DETR score: {det['detr_score']:.3f}")
            print(f"      Similarities:")
            for logo, sim in sims:
                marker = " <-- CORRECT" if logo in expected else ""
                print(f"        {sim:.4f} {logo}{marker}")

    print("\n" + "=" * 60)


def write_results_to_file(
    output_path: Path,
    args,
    num_test_images: int,
    true_positives: int,
    false_positives: int,
    false_negatives: int,
    total_expected: int,
    precision: float,
    recall: float,
    f1: float,
):
    """Write results summary to file."""
    from datetime import datetime

    lines = [
        "=" * 70,
        "BURNLEY LOGO DETECTION TEST",
        f"Model: {args.embedding_model}",
        f"Method: Margin-based (margin={args.margin})",
        "=" * 70,
        f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
        "",
        "Configuration:",
        f"  Embedding model: {args.embedding_model}",
        f"  Similarity threshold: {args.threshold}",
        f"  DETR threshold: {args.detr_threshold}",
        f"  Matching margin: {args.margin}",
        f"  Test images processed: {num_test_images}",
        f"  Reference logos: barnfield, vertu",
        "",
        "Results:",
        f"  True Positives:  {true_positives:>6}",
        f"  False Positives: {false_positives:>6}",
        f"  False Negatives: {false_negatives:>6}",
        f"  Total Expected:  {total_expected:>6}",
        "",
        "Scores:",
        f"  Precision: {precision:.4f} ({precision*100:.1f}%)",
        f"  Recall: {recall:.4f} ({recall*100:.1f}%)",
        f"  F1 Score: {f1:.4f} ({f1*100:.1f}%)",
        "",
        "",
    ]

    with open(output_path, "a") as f:
        f.write("\n".join(lines))


if __name__ == "__main__":
    main()
@@ -18,7 +18,7 @@ import random
 import sqlite3
 import sys
 from pathlib import Path
-from typing import Dict, List, Optional, Set, Tuple
+from typing import Any, Dict, List, Optional, Set, Tuple

 import cv2
 import torch
@@ -286,6 +286,14 @@ def main():
         default=None,
         help="Append results summary to this file (no progress output, just results)",
     )
+    parser.add_argument(
+        "--preprocess-mode",
+        type=str,
+        choices=["default", "letterbox", "stretch"],
+        default="default",
+        help="Image preprocessing mode for CLIP: 'default' (resize+center crop), "
+        "'letterbox' (pad to square with black bars), 'stretch' (distort to square)",
+    )

     args = parser.parse_args()
     logger = setup_logging(args.verbose)
@@ -315,10 +323,13 @@ def main():

     # Initialize detector
     logger.info(f"Initializing logo detector with embedding model: {args.embedding_model}")
+    if args.preprocess_mode != "default":
+        logger.info(f"Using preprocessing mode: {args.preprocess_mode}")
     detector = DetectLogosDETR(
         logger=logger,
         detr_threshold=args.detr_threshold,
         embedding_model=args.embedding_model,
+        preprocess_mode=args.preprocess_mode,
     )

     # Load ground truth (both mappings)
@@ -354,6 +365,7 @@ def main():
         cache_key = f"ref:{ref_filename}"
         embedding = cache.get(cache_key) if cache else None

+        # Load image if needed for embedding
         if embedding is None:
             img = load_image(ref_path)
             if img is None:
@@ -442,17 +454,19 @@ def main():
         cache_key = f"det:{test_filename}"
         cached_detections = cache.get(cache_key) if cache else None

+        test_img = None
         if cached_detections is not None:
             # Cached detections contain serialized box data and embeddings
             detections = cached_detections
         else:
             # Load and detect
-            img = load_image(test_path)
-            if img is None:
-                logger.warning(f"Failed to load test image: {test_path}")
-                continue
+            if test_img is None:
+                test_img = load_image(test_path)
+                if test_img is None:
+                    logger.warning(f"Failed to load test image: {test_path}")
+                    continue

-            detections = detector.detect(img)
+            detections = detector.detect(test_img)

             # Cache the detections
             if cache:
@@ -549,7 +563,7 @@ def main():
                     "correct": is_correct,
                 })

-            else:  # multi-ref
+            elif args.matching_method == "multi-ref":
                 # Multi-ref matching: aggregates scores across reference images
                 match_result = detector.find_best_match_multi_ref(
                     detection["embedding"],
@@ -625,6 +639,7 @@ def main():
     print(f"  Test images processed: {len(test_images)}")
     print(f"  CLIP similarity threshold: {args.threshold}")
     print(f"  DETR confidence threshold: {args.detr_threshold}")
+    print(f"  Preprocess mode: {args.preprocess_mode}")
     print(f"  Matching method: {args.matching_method}")
     if args.matching_method in ("margin", "multi-ref"):
         print(f"  Matching margin: {args.margin}")
@@ -832,6 +847,7 @@ def write_results_to_file(
         "",
         "Configuration:",
         f"  Embedding model: {args.embedding_model}",
+        f"  Preprocess mode: {args.preprocess_mode}",
         f"  Reference logos: {num_logos}",
         f"  Refs per logo: {args.refs_per_logo}",
         f"  Total reference embeddings:{total_refs}",
test_results/FINAL_MODEL_ANALYSIS.md (new file, 216 lines)
@@ -0,0 +1,216 @@
# Logo Recognition Model Analysis

**Date:** January 7, 2026
**Purpose:** Determine the best model and threshold for logo recognition of logos not currently in the test set.

---

## Executive Summary

| Model | Best Threshold | F1 Score | Precision | Recall | Recommended Use |
|-------|---------------|----------|-----------|--------|-----------------|
| **Image-Split Fine-tuned** | 0.70-0.75 | **67-68%** | 66-80% | 59-68% | Known logos (in reference set) |
| Baseline CLIP | 0.70 | 57-60% | 48-49% | 72-77% | Unknown logos (never seen before) |
| Logo-Split Fine-tuned | 0.76 | 56% | 49% | 64% | Not recommended |
| DINOv2 (small/large) | - | 29-30% | 22-32% | 28-43% | Not suitable |

**Winner: Image-Split Fine-tuned Model** at threshold **0.70-0.75**

---

## Detailed Model Comparison

### 1. Baseline CLIP (openai/clip-vit-large-patch14)

The pre-trained CLIP model without any fine-tuning.

**Threshold Performance:**

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|-----|
| 0.70 | 47.9% | 71.8% | 57.5% |
| 0.80 | 33.0% | 63.1% | 43.4% |
| 0.85 | 26.9% | 43.4% | 33.2% |
| 0.90 | 54.9% | 22.8% | 32.2% |

**Similarity Distribution:**
- True Positive mean: 0.854 (range: 0.75-0.95)
- False Positive mean: 0.846 (range: 0.75-0.95)
- **Problem:** TP and FP distributions almost completely overlap

**Suggested optimal threshold:** 0.756 (predicted F1 = 67.1%)
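
The analysis code that produces this number is not shown here, so the following is only a plausible sketch of how a "suggested optimal threshold" could be estimated from the recorded similarity values; `tp_sims`, `fp_sims`, and `total_expected` are illustrative names, not the repository's actual API.

```python
def suggest_threshold(tp_sims, fp_sims, total_expected):
    """Pick the threshold that maximizes predicted F1, given the similarity
    values recorded for true-positive and false-positive matches."""
    best = (None, -1.0)
    for t_i in range(500, 991, 2):                # sweep 0.500 .. 0.990
        t = t_i / 1000
        tp = sum(1 for s in tp_sims if s >= t)    # correct matches that survive
        fp = sum(1 for s in fp_sims if s >= t)    # wrong matches that survive
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / total_expected if total_expected > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best  # e.g. roughly (0.756, 0.671) for the baseline run above
```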

**Strengths:**
- Good recall at low thresholds
- Works on completely unseen logos
- No training required

**Weaknesses:**
- Poor separation between correct and incorrect matches
- High false positive rate

---

### 2. Fine-tuned CLIP (Logo-Level Splits)

Trained with contrastive learning, tested on completely unseen logo brands.

**Threshold Performance:**

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|-----|
| 0.70 | 25.9% | 67.1% | 37.4% |
| 0.76 | **49.1%** | 64.3% | **55.7%** |
| 0.82 | 75.7% | 41.4% | 53.5% |
| 0.86 | 88.6% | 28.1% | 42.7% |

**Similarity Distribution:**
- True Positive mean: 0.853
- False Positive mean: 0.787 (better separation than baseline)
- Missed logos mean: 0.711 (only 43.7% above 0.75)

**Suggested optimal threshold:** 0.82 (predicted F1 = 71.9%)

**Strengths:**
- Better TP/FP separation than baseline
- Very high precision at high thresholds (88.6% at t=0.86)

**Weaknesses:**
- Does not generalize well to unseen logo brands
- Many correct logos score below threshold (56% of missed logos below 0.75)
- Worse than baseline at threshold 0.70

---

### 3. Fine-tuned CLIP (Image-Level Splits) ⭐ BEST

Trained with contrastive learning, all logo brands seen during training (different images held out for testing).

**Threshold Performance:**

| Threshold | Precision | Recall | F1 |
|-----------|-----------|--------|-----|
| 0.65 | 56.9% | **75.9%** | 65.0% |
| 0.70 | 66.3% | 68.3% | **67.3%** |
| 0.75 | **79.9%** | 59.3% | **68.1%** |
| 0.80 | 83.7% | 52.8% | 64.8% |
| 0.85 | 92.4% | 42.8% | 58.5% |
| 0.90 | 98.9% | 24.7% | 39.5% |

**Similarity Distribution:**
- True Positive mean: 0.866 (higher than other models)
- False Positive mean: 0.807
- TP-FP gap: 0.059 (best separation)
- At t=0.75: 92 TP vs only 38 FP (excellent ratio)

**Suggested optimal threshold:** 0.755 (predicted F1 = 85.6%)

**Strengths:**
- Best overall F1 score (68.1% at t=0.75)
- Best precision at any threshold (79.9-98.9%)
- Excellent TP/FP ratio
- Highest true positive similarity scores

**Weaknesses:**
- Requires logos to be in the reference set during training
- May not generalize to completely novel logos

---

### 4. DINOv2 Models

Tested for comparison but significantly underperformed.

| Model | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| DINOv2-small | 22.4% | 42.8% | 29.5% |
| DINOv2-large | 32.2% | 28.5% | 30.2% |

**Not recommended** for logo recognition tasks.

---

## Recommendations

### For Logo Recognition of Known Logos (logos in your reference set)

**Use: Image-Split Fine-tuned Model**

```bash
# Recommended configuration
python test_logo_detection.py \
  -e models/logo_detection/clip_finetuned_image_split \
  -t 0.70 \
  --matching-method multi-ref \
  --use-max-similarity
```

| Use Case | Threshold | Expected Performance |
|----------|-----------|---------------------|
| Balanced (recommended) | 0.70 | 66% precision, 68% recall, 67% F1 |
| High precision | 0.75 | 80% precision, 59% recall, 68% F1 |
| Very high precision | 0.80 | 84% precision, 53% recall, 65% F1 |
| Maximum precision | 0.85+ | 92%+ precision, <43% recall |

### For Logo Recognition of Unknown Logos (completely novel brands)

**Use: Baseline CLIP** (the fine-tuned models don't generalize well)

```bash
# Recommended configuration
python test_logo_detection.py \
  -e openai/clip-vit-large-patch14 \
  -t 0.70 \
  --matching-method multi-ref \
  --use-max-similarity
```

Expected: ~48% precision, ~72% recall, ~58% F1

---

## Key Findings

### 1. Image-Level Splits Dramatically Improve Performance

The image-split fine-tuned model outperforms all others because:
- It learns brand-specific features during training
- Test images are different but come from the same brands
- This better represents real-world use, where you have reference images for the logos you want to detect

### 2. Logo-Level Splits Test True Generalization (but results are poor)

The logo-split model tests whether fine-tuning helps with completely unseen logos:
- Result: It doesn't help much (56% F1 vs 58% baseline)
- Contrastive learning doesn't transfer well to novel brands
- Use baseline CLIP for novel logo detection

### 3. Threshold Sweet Spot is 0.70-0.75

For all models, the optimal F1 occurs around threshold 0.70-0.75:
- Lower thresholds: Too many false positives
- Higher thresholds: Misses too many correct logos
- At 0.90+: Precision is high but recall drops below 25%

### 4. Precision-Recall Tradeoff

| Priority | Threshold | Tradeoff |
|----------|-----------|----------|
| Recall | 0.65-0.70 | More matches, more false positives |
| Balanced | 0.70-0.75 | Best F1 score |
| Precision | 0.75-0.80 | Fewer false positives, misses some matches |
| High Precision | 0.85+ | Very few false positives, misses many matches |

---

## Conclusion

**For production use with known logos:**
- Use **Image-Split Fine-tuned Model** at **threshold 0.70-0.75**
- Expected F1: 67-68%, Precision: 66-80%

**For discovering unknown logos:**
- Use **Baseline CLIP** at **threshold 0.70**
- Expected F1: ~58%, Precision: ~48%

The image-split fine-tuning provides significant improvements (+8-10% F1) over baseline for known logos, but does not help with completely novel brands. For a production system, ensure all target logos are included in the training/reference set.
test_results/comparison_results/baseline_20260105_100740.txt (new file, 87 lines)
File diff suppressed because one or more lines are too long
@@ -0,0 +1,29 @@
============================================================

Test Parameters:
  Logos: 50, Seed: 42, Threshold: 0.7
  Method: multi-ref, Refs/logo: 3, Margin: 0.05

BASELINE (openai/clip-vit-large-patch14):
  True Positives (correct matches): 101
  False Positives (wrong matches): 104
  False Negatives (missed logos): 156
  Precision: 0.4927 (49.3%)
  Recall: 0.4056 (40.6%)
  F1 Score: 0.4449 (44.5%)

FINE-TUNED (models/logo_detection/clip_finetuned):
  True Positives (correct matches): 164
  False Positives (wrong matches): 414
  False Negatives (missed logos): 115
  Precision: 0.2837 (28.4%)
  Recall: 0.6586 (65.9%)
  F1 Score: 0.3966 (39.7%)

------------------------------------------------------------
F1 SCORE COMPARISON:
  Baseline: 44.5%
  Fine-tuned: 39.7%
------------------------------------------------------------

Full results saved to: comparison_results/
File diff suppressed because one or more lines are too long
test_results/comparison_results_clip_defaults_all_methods.txt (new file, 124 lines)
@@ -0,0 +1,124 @@
Logo Detection Comparison Tests
================================
Date: Wed Dec 31 03:43:45 PM MST 2025

Common Parameters:
  Reference logos: 20
  Refs per logo: 10
  Positive samples: 20
  Negative samples: 100
  Min matching refs: 3
  Seed: 42

======================================================================
TEST: SIMPLE MATCHING
Method: Simple (all matches above threshold)
======================================================================
Date: 2025-12-31 16:02:25

Configuration:
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2355
  CLIP threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 751
  False Positives: 58221
  False Negatives: 9
  Total Expected: 369

Scores:
  Precision: 0.0127 (1.3%)
  Recall: 2.0352 (203.5%)
  F1 Score: 0.0253 (2.5%)

======================================================================
TEST: MARGIN MATCHING
Method: Margin-based (margin=0.05)
======================================================================
Date: 2025-12-31 16:20:42

Configuration:
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2361
  CLIP threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 60
  False Positives: 26
  False Negatives: 310
  Total Expected: 369

Scores:
  Precision: 0.6977 (69.8%)
  Recall: 0.1626 (16.3%)
  F1 Score: 0.2637 (26.4%)

======================================================================
TEST: MULTI-REF MATCHING
Method: Multi-ref (mean, min_refs=3, margin=0.05)
======================================================================
Date: 2025-12-31 16:38:59

Configuration:
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2352
  CLIP threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 233
  False Positives: 217
  False Negatives: 170
  Total Expected: 369

Scores:
  Precision: 0.5178 (51.8%)
  Recall: 0.6314 (63.1%)
  F1 Score: 0.5690 (56.9%)

======================================================================
TEST: MULTI-REF MATCHING
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2025-12-31 16:56:49

Configuration:
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2350
  CLIP threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 278
  False Positives: 259
  False Negatives: 136
  Total Expected: 369

Scores:
  Precision: 0.5177 (51.8%)
  Recall: 0.7534 (75.3%)
  F1 Score: 0.6137 (61.4%)
test_results/model_comparison_results.txt (new file, 105 lines)
@@ -0,0 +1,105 @@
Embedding Model Comparison Tests
=================================
Date: Fri Jan 2 12:47:03 PM MST 2026

Common Parameters:
  Matching method: multi-ref (max)
  Reference logos: 20
  Refs per logo: 10
  Positive samples: 20
  Negative samples: 100
  Min matching refs: 3
  Threshold: 0.70
  Margin: 0.05
  Seed: 42

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 13:05:17

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2355
  Similarity threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 284
  False Positives: 295
  False Negatives: 124
  Total Expected: 369

Scores:
  Precision: 0.4905 (49.1%)
  Recall: 0.7696 (77.0%)
  F1 Score: 0.5992 (59.9%)

======================================================================
TEST: MULTI-REF MATCHING
Model: facebook/dinov2-small
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 13:19:01

Configuration:
  Embedding model: facebook/dinov2-small
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2358
  Similarity threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 158
  False Positives: 546
  False Negatives: 234
  Total Expected: 369

Scores:
  Precision: 0.2244 (22.4%)
  Recall: 0.4282 (42.8%)
  F1 Score: 0.2945 (29.5%)

======================================================================
TEST: MULTI-REF MATCHING
Model: facebook/dinov2-large
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 13:39:33

Configuration:
  Embedding model: facebook/dinov2-large
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2355
  Similarity threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 105
  False Positives: 221
  False Negatives: 277
  Total Expected: 369

Scores:
  Precision: 0.3221 (32.2%)
  Recall: 0.2846 (28.5%)
  F1 Score: 0.3022 (30.2%)
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -346,6 +346,131 @@ DINOv2 Small produces over 3x as many false positives as true positives, making

---

## Summary and Recommendations

This section synthesizes findings from all test runs to provide actionable recommendations for logo detection configuration and future improvements.

### Best Configuration

Based on all tests conducted, the optimal configuration is:

| Parameter | Recommended Value | Rationale |
|-----------|-------------------|-----------|
| **Embedding Model** | `openai/clip-vit-large-patch14` | 2x better F1 than DINOv2 alternatives |
| **Matching Method** | `multi-ref` with max similarity | Best F1 (59.9%) and recall (77.0%) |
| **Similarity Threshold** | 0.70 | Lower thresholds outperform higher ones |
| **Margin** | 0.05 | Minimal impact; keep low to avoid rejecting valid matches |
| **Min Matching Refs** | 3 | Provides better discrimination than threshold alone |
| **Refs Per Logo** | 10 | More references improve robustness |
| **DETR Threshold** | 0.50 | Standard detection confidence |
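
To make the recommended matching scheme concrete, here is a minimal sketch of multi-ref matching with max aggregation, a min-matching-refs requirement, and a margin check. It is illustrative only, assuming the parameters above: `reference_embeddings`, `cosine_similarity`, and the return format are hypothetical names, not the repository's actual API, and the exact semantics of `min_matching_refs` may differ in the real implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Assumed helper; the real detector exposes its own comparison method.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_multi_ref(det_emb, reference_embeddings, threshold=0.70,
                    margin=0.05, min_matching_refs=3):
    """reference_embeddings: dict mapping logo name -> list of reference embeddings."""
    scores = {}
    for logo, refs in reference_embeddings.items():
        sims = [cosine_similarity(det_emb, r) for r in refs]
        if sum(1 for s in sims if s >= threshold) < min_matching_refs:
            continue  # not enough references agree with this detection
        scores[logo] = max(sims)  # max aggregation across this logo's references

    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    best_logo, best_sim = ranked[0]
    if best_sim < threshold:
        return None
    # margin check against the best competing logo
    if len(ranked) > 1 and best_sim - ranked[1][1] < margin:
        return None
    return best_logo, best_sim
```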

### Performance Expectations

With the recommended configuration:

| Metric | Expected Value | Interpretation |
|--------|----------------|----------------|
| Precision | ~49% | About half of detections are correct |
| Recall | ~77% | Finds most logos present in images |
| F1 Score | ~60% | Moderate overall accuracy |
| FP:TP Ratio | ~1:1 | Approximately equal true and false positives |

**Important**: These results indicate the system is suitable for applications that can tolerate a high false positive rate, such as:
- Initial screening with human review
- Flagging content for further analysis
- Low-stakes logo presence detection

The system is **not suitable** for high-precision applications without additional filtering or verification steps.

### Key Insights from Testing

#### What Works

1. **Multi-ref matching with max aggregation** consistently outperforms other methods
2. **Multiple references per logo** (10) provide robustness against logo variations
3. **min_matching_refs=3** is more effective at discrimination than threshold tuning
4. **CLIP embeddings** significantly outperform self-supervised alternatives (DINOv2)

#### What Doesn't Work

1. **Raising the similarity threshold** paradoxically increases false positives in the 0.70-0.85 range
2. **Margin-only matching** fails with multiple references (same-logo refs compete)
3. **DINOv2 models** produce 2-3x worse results than CLIP
4. **Simple threshold-based matching** produces an unacceptable 78:1 FP:TP ratio

#### Limitations

1. **~50% precision ceiling**: Even the best configuration produces nearly as many false positives as true positives
2. **No clean threshold separation**: CLIP's embedding space doesn't provide clear decision boundaries for logos
3. **General-purpose models**: Neither CLIP nor DINOv2 is optimized for fine-grained logo discrimination
4. **Pipeline dependencies**: Results depend heavily on DETR detection quality

### Recommendations for Future Improvements

#### Short-Term Improvements

| Improvement | Expected Impact | Effort |
|-------------|-----------------|--------|
| **Post-processing filters** | Reduce FP by 20-30% | Low |
| Add color histogram matching | Filter matches with wrong colors | |
| Add aspect ratio validation | Reject shape mismatches | |
| Add text detection | Filter if expected text is missing | |
| **Reference curation** | Improve TP by 10-20% | Low |
| Remove low-quality references | Reduce noise in ref embeddings | |
| Ensure diverse logo variants | Improve coverage | |
| **Ensemble scoring** | Improve F1 by 10-15% | Medium |
| Combine CLIP + color features | Multi-signal confidence | |
| Weighted voting across refs | More robust aggregation | |
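
As a concrete illustration of the post-processing filters suggested above, the sketch below rejects a candidate match whose crop has a clearly different aspect ratio or color distribution from the reference image. It is a hypothetical example using OpenCV (already imported by the test scripts); the function name and thresholds are illustrative assumptions, not part of the current framework.

```python
import cv2

def passes_post_filters(det_crop_bgr, ref_img_bgr,
                        max_aspect_ratio_diff=0.5, min_hist_corr=0.3):
    """Cheap sanity checks on a candidate match (illustrative thresholds)."""
    # Aspect ratio validation: reject obvious shape mismatches.
    dh, dw = det_crop_bgr.shape[:2]
    rh, rw = ref_img_bgr.shape[:2]
    if abs((dw / dh) - (rw / rh)) > max_aspect_ratio_diff:
        return False

    # Color histogram matching: compare hue/saturation distributions.
    det_hsv = cv2.cvtColor(det_crop_bgr, cv2.COLOR_BGR2HSV)
    ref_hsv = cv2.cvtColor(ref_img_bgr, cv2.COLOR_BGR2HSV)
    det_hist = cv2.calcHist([det_hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    ref_hist = cv2.calcHist([ref_hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    cv2.normalize(det_hist, det_hist)
    cv2.normalize(ref_hist, ref_hist)
    corr = cv2.compareHist(det_hist, ref_hist, cv2.HISTCMP_CORREL)
    return corr >= min_hist_corr
```

A filter like this would run after the embedding match and before counting a detection as a positive, trading a little recall for fewer false positives.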

#### Medium-Term Improvements

| Improvement | Expected Impact | Effort |
|-------------|-----------------|--------|
| **Fine-tune CLIP on logos** | Improve F1 by 20-40% | Medium |
| Contrastive training on logo pairs | Better embedding separation | |
| Use LogoDet-3K for training data | Domain-specific features | |
| **Alternative detection models** | Improve detection quality | Medium |
| Test YOLOv8 for logo detection | Faster, potentially more accurate | |
| Train custom detector on logo data | Better region proposals | |
| **Learned similarity metric** | Improve precision by 30-50% | Medium |
| Train siamese network for logo matching | Replace cosine similarity | |
| Learn logo-specific distance function | Better discrimination | |
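
For the learned similarity metric, a hedged sketch of what a small siamese-style scoring head over frozen embeddings could look like (PyTorch is already a dependency of the test scripts; the architecture, names, and dimensions here are illustrative assumptions, not an existing module in this repo):

```python
import torch
import torch.nn as nn

class LogoMatchScorer(nn.Module):
    """Scores a (detection, reference) embedding pair instead of raw cosine similarity."""

    def __init__(self, emb_dim: int = 768, hidden: int = 256):
        # emb_dim=768 matches CLIP ViT-L/14 image embeddings; adjust for other backbones.
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, det_emb: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
        # |a - b| and a * b form a common pair representation for siamese scorers.
        pair = torch.cat([torch.abs(det_emb - ref_emb), det_emb * ref_emb], dim=-1)
        return torch.sigmoid(self.mlp(pair)).squeeze(-1)  # match probability in [0, 1]

# Training would use BCE loss on same-logo (label 1) vs different-logo (label 0) pairs,
# e.g. sampled from LogoDet-3K, with the embedding backbone kept frozen.
```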

#### Long-Term Improvements

| Improvement | Expected Impact | Effort |
|-------------|-----------------|--------|
| **End-to-end logo recognition model** | F1 > 85% | High |
| Single model for detection + recognition | Eliminate pipeline errors | |
| Train on large-scale logo dataset | Comprehensive coverage | |
| **Logo-specific foundation model** | F1 > 90% | High |
| Pre-train on millions of logo images | Domain expertise | |
| Fine-tune for specific brand sets | Production-ready accuracy | |

### Decision Framework

Use this framework to choose between precision and recall:

| Use Case | Priority | Recommended Adjustments |
|----------|----------|------------------------|
| **Content moderation** | High recall | Use defaults; accept FPs for human review |
| **Brand monitoring** | Balanced | Use defaults; filter obvious FPs |
| **Automated licensing** | High precision | Use threshold=0.90; accept low recall |
| **Search/discovery** | High recall | Lower threshold to 0.65; more refs |

### Conclusion

The current DETR + CLIP pipeline with multi-ref matching achieves moderate accuracy (~60% F1) that is suitable for screening applications but falls short of production requirements for automated decision-making. The fundamental limitation is that general-purpose vision models lack the fine-grained discrimination needed for logo recognition.

**To achieve production-quality accuracy (>85% F1), the system requires:**
1. A logo-specific embedding model (fine-tuned or trained from scratch)
2. Additional visual features beyond CLIP embeddings
3. Potentially an end-to-end architecture designed for logo recognition

The test framework established here provides the foundation for evaluating these future improvements systematically.

---

## Test Run: [Next Test Name]

*Results pending...*
@@ -0,0 +1,20 @@
============================================================
THRESHOLD OPTIMIZATION RESULTS
Model: finetuned (models/logo_detection/clip_finetuned)
============================================================

Threshold    TP    FP    FN    Prec    Recall    F1
--------------------------------------------------------------------
0.70        167   477   120   25.9%   67.1%   37.4%
0.72        158   339   116   31.8%   63.5%   42.4%
0.74        150   252   123   37.3%   60.2%   46.1%
0.76        160   166   119   49.1%   64.3%   55.7%
0.78        120   102   147   54.1%   48.2%   51.0%
0.80        110    73   151   60.1%   44.2%   50.9%
0.82        103    33   159   75.7%   41.4%   53.5%
0.84         74    18   180   80.4%   29.7%   43.4%
0.86         70     9   187   88.6%   28.1%   42.7%
--------------------------------------------------------------------

BEST THRESHOLD: 0.76 (F1 = 55.7%)
test_results/threshold_analysis/threshold_test_results.txt (new file, 193 lines)
@@ -0,0 +1,193 @@
Threshold Optimization Tests
=============================
Date: Fri Jan 2 10:11:34 AM MST 2026

Common Parameters:
  Matching method: multi-ref (max)
  Reference logos: 20
  Refs per logo: 10
  Positive samples: 20
  Negative samples: 100
  Min matching refs: 3
  Seed: 42

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 10:29:26

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2358
  Similarity threshold: 0.7
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 265
  False Positives: 288
  False Negatives: 141
  Total Expected: 369

Scores:
  Precision: 0.4792 (47.9%)
  Recall: 0.7182 (71.8%)
  F1 Score: 0.5748 (57.5%)

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.05)
======================================================================
Date: 2026-01-02 10:47:35

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2348
  Similarity threshold: 0.8
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 233
  False Positives: 472
  False Negatives: 165
  Total Expected: 369

Scores:
  Precision: 0.3305 (33.0%)
  Recall: 0.6314 (63.1%)
  F1 Score: 0.4339 (43.4%)

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.1)
======================================================================
Date: 2026-01-02 11:05:34

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2357
  Similarity threshold: 0.8
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 187
  False Positives: 375
  False Negatives: 208
  Total Expected: 369

Scores:
  Precision: 0.3327 (33.3%)
  Recall: 0.5068 (50.7%)
  F1 Score: 0.4017 (40.2%)

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.1)
======================================================================
Date: 2026-01-02 11:23:33

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2356
  Similarity threshold: 0.85
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 160
  False Positives: 434
  False Negatives: 223
  Total Expected: 369

Scores:
  Precision: 0.2694 (26.9%)
  Recall: 0.4336 (43.4%)
  F1 Score: 0.3323 (33.2%)

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.15)
======================================================================
Date: 2026-01-02 11:41:47

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2359
  Similarity threshold: 0.85
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 163
  False Positives: 410
  False Negatives: 220
  Total Expected: 369

Scores:
  Precision: 0.2845 (28.4%)
  Recall: 0.4417 (44.2%)
  F1 Score: 0.3461 (34.6%)

======================================================================
TEST: MULTI-REF MATCHING
Model: openai/clip-vit-large-patch14
Method: Multi-ref (max, min_refs=3, margin=0.15)
======================================================================
Date: 2026-01-02 12:00:00

Configuration:
  Embedding model: openai/clip-vit-large-patch14
  Reference logos: 20
  Refs per logo: 10
  Total reference embeddings:189
  Positive samples/logo: 20
  Negative samples/logo: 100
  Test images processed: 2363
  Similarity threshold: 0.9
  DETR threshold: 0.5
  Random seed: 42

Results:
  True Positives: 84
  False Positives: 69
  False Negatives: 288
  Total Expected: 369

Scores:
  Precision: 0.5490 (54.9%)
  Recall: 0.2276 (22.8%)
  F1 Score: 0.3218 (32.2%)