Compare commits

...

8 Commits

SHA1 Message Date
55abb1217c Add RTX 4090 config with image-level splits 2026-01-06 14:23:13 -05:00
14a1bda3fa Add image-level split support for CLIP fine-tuning
Image-level splits allow the model to see some images from each logo
brand during training, unlike logo-level splits where test brands are
completely unseen. This is less rigorous but more representative of
real-world use.

Changes:
- Add configs/image_level_splits.yaml with gentler training settings:
  - split_level: "image" for image-level splits
  - temperature: 0.15 (softer contrastive learning)
  - learning_rate: 5e-6 (slower learning)
  - max_epochs: 30 (more epochs)

- Update training/dataset.py:
  - Add split_level parameter to LogoDataset
  - Implement _split_images() for image-level splitting
  - Update LogoContrastiveDataset to use split-specific image mappings

- Update training/config.py:
  - Add split_level field to TrainingConfig

- Update train_clip_logo.py:
  - Pass split_level to create_dataloaders

Usage:
  uv run python train_clip_logo.py --config configs/image_level_splits.yaml
2026-01-05 15:10:45 -05:00
32bfefc022 Add threshold optimization script
- Test range of thresholds to find optimal F1
- Support both baseline and fine-tuned models
- Option for max vs mean similarity aggregation
- Output results table with TP/FP/FN/precision/recall/F1
2026-01-05 14:20:27 -05:00
f74d4b6981 Document threshold tuning for fine-tuned CLIP model
- Add threshold selection section with similarity distribution analysis
- Document that fine-tuned model needs threshold 0.82 (vs baseline 0.75)
- Add table comparing baseline vs fine-tuned distributions
- Update test commands to include correct thresholds
- Reference analyze_similarity_distribution.sh for threshold optimization
2026-01-05 14:09:38 -05:00
6685af72d9 Add similarity distribution analysis for debugging embedding quality
- Add --similarity-details flag to test_logo_detection.py
- Track true positive, false positive, and missed detection similarities
- Compute distribution statistics (min, max, mean, stddev, percentiles)
- Analyze overlap between TP and FP distributions
- Suggest optimal threshold based on data
- Show per-detection breakdown with top-5 matches

- Create analyze_similarity_distribution.sh wrapper script
  - Supports baseline, finetuned, or both models
  - Saves output to similarity_analysis/ directory
2026-01-05 13:39:20 -05:00
1bf9985def Fix double LoRA application when loading fine-tuned model
The from_pretrained method was applying LoRA twice:
1. In the constructor via lora_r parameter
2. When loading with PeftModel.from_pretrained()

Now creates model with lora_r=0 and loads LoRA weights separately.

Note: Warning about "missing adapter keys" for layers 0-11 is expected
since those layers are frozen and don't have LoRA adapters.
2026-01-05 11:50:10 -05:00
e5482a2d9e Add script to compare fine-tuned vs baseline CLIP 2026-01-05 11:43:47 -05:00
99e5781c91 Fix trainer to use separation as sole criterion for best model
Previously the trainer saved a new "best" model if either separation
OR loss improved, with loss checked as a fallback. This caused
confusing behavior where models with lower separation could overwrite
better models.

Now only separation (gap between positive and negative similarity) is
used to determine the best model, which is the key metric for
contrastive learning quality.
2026-01-05 11:01:14 -05:00
12 changed files with 1037 additions and 57 deletions

View File

@@ -114,9 +114,12 @@ min_delta: 0.001
### Test Fine-Tuned Model
**Important**: The fine-tuned model requires a higher threshold (0.82) than baseline (0.75).
```bash
uv run python test_logo_detection.py -n 50 \
-e models/logo_detection/clip_finetuned \
-t 0.82 \
--matching-method multi-ref \
--seed 42
```
@@ -124,26 +127,58 @@ uv run python test_logo_detection.py -n 50 \
### Compare with Baseline
```bash
# Baseline CLIP
# Baseline CLIP (threshold 0.75)
uv run python test_logo_detection.py -n 50 \
-e openai/clip-vit-large-patch14 \
-t 0.75 \
--matching-method multi-ref \
--seed 42
# Fine-tuned model
# Fine-tuned model (threshold 0.82)
uv run python test_logo_detection.py -n 50 \
-e models/logo_detection/clip_finetuned \
-t 0.82 \
--matching-method multi-ref \
--seed 42
```
### Threshold Selection
The fine-tuned model requires a **higher similarity threshold** than baseline CLIP. This is because contrastive learning successfully pushed non-matching logo similarities much lower, changing the score distribution.
#### Similarity Distribution Analysis
| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| Wrong logos mean similarity | 0.66 | **0.44** |
| Wrong logos above 0.75 | 23.2% | **0.6%** |
| Correct logos mean similarity | 0.75 | 0.64 |
| Optimal threshold | 0.756 | **0.819** |
| F1 at optimal threshold | 67.1% | **71.9%** |
**Key insight**: Fine-tuning dramatically reduced similarities to wrong logos (mean from 0.66 to 0.44), so at threshold 0.75 the model already rejects far more non-matches than baseline. The few wrong-logo scores that do survive, however, cluster just above 0.75, which is why the optimal threshold moves up to roughly 0.82.
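For intuition, the sweep behind these numbers reduces to the following (a minimal sketch with a hypothetical helper name; the repository's actual implementations are the `--similarity-details` output of `test_logo_detection.py` and `find_optimal_threshold.sh`):
```python
# Sketch: choose the threshold that maximizes F1, given similarity scores
# observed for correct matches (tp_sims) and wrong matches (fp_sims).
def suggest_threshold(tp_sims: list[float], fp_sims: list[float]) -> tuple[float, float]:
    best_thresh, best_f1 = 0.0, 0.0
    for thresh in sorted(tp_sims + fp_sims):  # candidate cutoffs = observed scores
        tp = sum(s >= thresh for s in tp_sims)  # correct matches kept
        fp = sum(s >= thresh for s in fp_sims)  # wrong matches kept
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / len(tp_sims) if tp_sims else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_thresh, best_f1 = thresh, f1
    return best_thresh, best_f1
```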
#### Analyze Similarity Distribution
To find the optimal threshold for your model:
```bash
# Run detailed similarity analysis
./analyze_similarity_distribution.sh --model finetuned
# Or analyze both models
./analyze_similarity_distribution.sh --model both
```
This outputs distribution statistics and suggests an optimal threshold based on the data.
### Expected Metrics
| Metric | Baseline CLIP | Target (Fine-tuned) |
|--------|---------------|---------------------|
| Precision | ~49% | >70% |
| Recall | ~77% | >75% |
| F1 Score | ~60% | >72% |
| Metric | Baseline (t=0.75) | Fine-tuned (t=0.82) |
|--------|-------------------|---------------------|
| Precision | ~49% | >65% |
| Recall | ~77% | >70% |
| F1 Score | ~60% | >70% |
Training metrics to monitor:
- Mean positive similarity: target > 0.85

View File

@@ -0,0 +1,141 @@
#!/bin/bash
#
# Analyze similarity distribution for baseline and fine-tuned models.
#
# This script runs the test with --similarity-details to output detailed
# statistics about how the models score matches vs non-matches.
#
# Usage:
# ./analyze_similarity_distribution.sh
# ./analyze_similarity_distribution.sh --model finetuned
# ./analyze_similarity_distribution.sh --model baseline
#
set -e
# Default parameters
NUM_LOGOS="${NUM_LOGOS:-50}"
SEED="${SEED:-42}"
THRESHOLD="${THRESHOLD:-0.75}"
REFS_PER_LOGO="${REFS_PER_LOGO:-3}"
MARGIN="${MARGIN:-0.05}"
MODEL="${MODEL:-both}"
# Model paths
BASELINE_MODEL="openai/clip-vit-large-patch14"
FINETUNED_MODEL="models/logo_detection/clip_finetuned"
# Output directory
OUTPUT_DIR="similarity_analysis"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-n|--num-logos)
NUM_LOGOS="$2"
shift 2
;;
-s|--seed)
SEED="$2"
shift 2
;;
-t|--threshold)
THRESHOLD="$2"
shift 2
;;
--model)
MODEL="$2"
shift 2
;;
--finetuned-path)
FINETUNED_MODEL="$2"
shift 2
;;
-h|--help)
echo "Usage: $0 [OPTIONS]"
echo ""
echo "Options:"
echo " -n, --num-logos NUM Number of logos to test (default: 50)"
echo " -s, --seed SEED Random seed (default: 42)"
echo " -t, --threshold VAL Similarity threshold (default: 0.75)"
echo " --model MODEL Which model: 'baseline', 'finetuned', or 'both' (default: both)"
echo " --finetuned-path PATH Path to fine-tuned model"
echo " -h, --help Show this help message"
exit 0
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
# Create output directory
mkdir -p "${OUTPUT_DIR}"
echo "============================================================"
echo "SIMILARITY DISTRIBUTION ANALYSIS"
echo "============================================================"
echo ""
echo "Parameters:"
echo " Number of logos: ${NUM_LOGOS}"
echo " Random seed: ${SEED}"
echo " Threshold: ${THRESHOLD}"
echo " Refs per logo: ${REFS_PER_LOGO}"
echo " Margin: ${MARGIN}"
echo " Model: ${MODEL}"
echo ""
# Common test arguments
TEST_ARGS=(
-n "${NUM_LOGOS}"
-s "${SEED}"
-t "${THRESHOLD}"
--refs-per-logo "${REFS_PER_LOGO}"
--margin "${MARGIN}"
--matching-method multi-ref
--similarity-details
--clear-cache
)
run_analysis() {
local model_name="$1"
local model_path="$2"
local output_file="${OUTPUT_DIR}/${model_name}_similarity_${TIMESTAMP}.txt"
echo "============================================================"
echo "Analyzing: ${model_name}"
echo "Model: ${model_path}"
echo "Output: ${output_file}"
echo "============================================================"
echo ""
uv run python test_logo_detection.py \
"${TEST_ARGS[@]}" \
-e "${model_path}" \
2>&1 | tee "${output_file}"
echo ""
echo "Results saved to: ${output_file}"
echo ""
}
# Run analysis based on model selection
if [[ "${MODEL}" == "baseline" ]] || [[ "${MODEL}" == "both" ]]; then
run_analysis "baseline" "${BASELINE_MODEL}"
fi
if [[ "${MODEL}" == "finetuned" ]] || [[ "${MODEL}" == "both" ]]; then
if [ ! -d "${FINETUNED_MODEL}" ]; then
echo "Warning: Fine-tuned model not found at ${FINETUNED_MODEL}"
echo "Skipping fine-tuned model analysis."
else
run_analysis "finetuned" "${FINETUNED_MODEL}"
fi
fi
echo "============================================================"
echo "Analysis complete!"
echo "Results saved to: ${OUTPUT_DIR}/"
echo "============================================================"

compare_finetuned_vs_baseline.sh Executable file
View File

@@ -0,0 +1,191 @@
#!/bin/bash
#
# Compare fine-tuned CLIP model against baseline CLIP for logo recognition.
#
# This script runs the same test suite on both models and outputs results
# for easy comparison.
#
# Usage:
# ./compare_finetuned_vs_baseline.sh
# ./compare_finetuned_vs_baseline.sh --num-logos 100
#
set -e
# Default parameters
NUM_LOGOS="${NUM_LOGOS:-50}"
SEED="${SEED:-42}"
THRESHOLD="${THRESHOLD:-0.7}"
DETR_THRESHOLD="${DETR_THRESHOLD:-0.5}"
REFS_PER_LOGO="${REFS_PER_LOGO:-3}"
MARGIN="${MARGIN:-0.05}"
POSITIVE_SAMPLES="${POSITIVE_SAMPLES:-5}"
NEGATIVE_SAMPLES="${NEGATIVE_SAMPLES:-20}"
# Model paths
BASELINE_MODEL="openai/clip-vit-large-patch14"
FINETUNED_MODEL="models/logo_detection/clip_finetuned"
# Output files
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_DIR="comparison_results"
BASELINE_OUTPUT="${OUTPUT_DIR}/baseline_${TIMESTAMP}.txt"
FINETUNED_OUTPUT="${OUTPUT_DIR}/finetuned_${TIMESTAMP}.txt"
SUMMARY_OUTPUT="${OUTPUT_DIR}/comparison_summary_${TIMESTAMP}.txt"
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-n|--num-logos)
NUM_LOGOS="$2"
shift 2
;;
-s|--seed)
SEED="$2"
shift 2
;;
-t|--threshold)
THRESHOLD="$2"
shift 2
;;
--refs-per-logo)
REFS_PER_LOGO="$2"
shift 2
;;
--margin)
MARGIN="$2"
shift 2
;;
--finetuned-model)
FINETUNED_MODEL="$2"
shift 2
;;
-h|--help)
echo "Usage: $0 [OPTIONS]"
echo ""
echo "Options:"
echo " -n, --num-logos NUM Number of logos to test (default: 50)"
echo " -s, --seed SEED Random seed for reproducibility (default: 42)"
echo " -t, --threshold VAL Similarity threshold (default: 0.7)"
echo " --refs-per-logo NUM Reference images per logo (default: 3)"
echo " --margin VAL Margin for matching (default: 0.05)"
echo " --finetuned-model PATH Path to fine-tuned model"
echo " -h, --help Show this help message"
exit 0
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
# Create output directory
mkdir -p "${OUTPUT_DIR}"
# Check if fine-tuned model exists
if [ ! -d "${FINETUNED_MODEL}" ]; then
echo "Error: Fine-tuned model not found at ${FINETUNED_MODEL}"
echo "Please train the model first using: uv run python train_clip_logo.py --config configs/jetson_orin.yaml"
exit 1
fi
echo "============================================================"
echo "CLIP Logo Recognition: Fine-tuned vs Baseline Comparison"
echo "============================================================"
echo ""
echo "Parameters:"
echo " Number of logos: ${NUM_LOGOS}"
echo " Random seed: ${SEED}"
echo " Threshold: ${THRESHOLD}"
echo " DETR threshold: ${DETR_THRESHOLD}"
echo " Refs per logo: ${REFS_PER_LOGO}"
echo " Margin: ${MARGIN}"
echo " Positive samples: ${POSITIVE_SAMPLES}"
echo " Negative samples: ${NEGATIVE_SAMPLES}"
echo ""
echo "Models:"
echo " Baseline: ${BASELINE_MODEL}"
echo " Fine-tuned: ${FINETUNED_MODEL}"
echo ""
echo "Output:"
echo " Baseline results: ${BASELINE_OUTPUT}"
echo " Fine-tuned results: ${FINETUNED_OUTPUT}"
echo " Summary: ${SUMMARY_OUTPUT}"
echo ""
# Common test arguments
TEST_ARGS=(
-n "${NUM_LOGOS}"
-s "${SEED}"
-t "${THRESHOLD}"
-d "${DETR_THRESHOLD}"
--refs-per-logo "${REFS_PER_LOGO}"
--margin "${MARGIN}"
--positive-samples "${POSITIVE_SAMPLES}"
--negative-samples "${NEGATIVE_SAMPLES}"
--matching-method multi-ref
--clear-cache
)
# Run baseline test
echo "============================================================"
echo "Testing BASELINE model: ${BASELINE_MODEL}"
echo "============================================================"
echo ""
uv run python test_logo_detection.py \
"${TEST_ARGS[@]}" \
-e "${BASELINE_MODEL}" \
2>&1 | tee "${BASELINE_OUTPUT}"
echo ""
echo "Baseline results saved to: ${BASELINE_OUTPUT}"
echo ""
# Run fine-tuned test
echo "============================================================"
echo "Testing FINE-TUNED model: ${FINETUNED_MODEL}"
echo "============================================================"
echo ""
uv run python test_logo_detection.py \
"${TEST_ARGS[@]}" \
-e "${FINETUNED_MODEL}" \
2>&1 | tee "${FINETUNED_OUTPUT}"
echo ""
echo "Fine-tuned results saved to: ${FINETUNED_OUTPUT}"
echo ""
# Extract and compare key metrics
echo "============================================================"
echo "COMPARISON SUMMARY"
echo "============================================================" | tee "${SUMMARY_OUTPUT}"
echo "" | tee -a "${SUMMARY_OUTPUT}"
echo "Test Parameters:" | tee -a "${SUMMARY_OUTPUT}"
echo " Logos: ${NUM_LOGOS}, Seed: ${SEED}, Threshold: ${THRESHOLD}" | tee -a "${SUMMARY_OUTPUT}"
echo " Method: multi-ref, Refs/logo: ${REFS_PER_LOGO}, Margin: ${MARGIN}" | tee -a "${SUMMARY_OUTPUT}"
echo "" | tee -a "${SUMMARY_OUTPUT}"
echo "BASELINE (${BASELINE_MODEL}):" | tee -a "${SUMMARY_OUTPUT}"
grep -E "(Precision|Recall|F1 Score|True Positives|False Positives|False Negatives)" "${BASELINE_OUTPUT}" | head -6 | tee -a "${SUMMARY_OUTPUT}"
echo "" | tee -a "${SUMMARY_OUTPUT}"
echo "FINE-TUNED (${FINETUNED_MODEL}):" | tee -a "${SUMMARY_OUTPUT}"
grep -E "(Precision|Recall|F1 Score|True Positives|False Positives|False Negatives)" "${FINETUNED_OUTPUT}" | head -6 | tee -a "${SUMMARY_OUTPUT}"
echo "" | tee -a "${SUMMARY_OUTPUT}"
# Extract F1 scores for quick comparison
BASELINE_F1=$(grep "F1 Score" "${BASELINE_OUTPUT}" | head -1 | grep -oE "[0-9]+\.[0-9]+%" | head -1 || echo "N/A")
FINETUNED_F1=$(grep "F1 Score" "${FINETUNED_OUTPUT}" | head -1 | grep -oE "[0-9]+\.[0-9]+%" | head -1 || echo "N/A")
echo "------------------------------------------------------------" | tee -a "${SUMMARY_OUTPUT}"
echo "F1 SCORE COMPARISON:" | tee -a "${SUMMARY_OUTPUT}"
echo " Baseline: ${BASELINE_F1}" | tee -a "${SUMMARY_OUTPUT}"
echo " Fine-tuned: ${FINETUNED_F1}" | tee -a "${SUMMARY_OUTPUT}"
echo "------------------------------------------------------------" | tee -a "${SUMMARY_OUTPUT}"
echo "" | tee -a "${SUMMARY_OUTPUT}"
echo "Full results saved to: ${OUTPUT_DIR}/" | tee -a "${SUMMARY_OUTPUT}"
echo ""
echo "Done!"

View File

@@ -0,0 +1,70 @@
# Training configuration for RTX 4090 (24GB VRAM) with IMAGE-LEVEL splits
#
# Combines RTX 4090 hardware optimizations with image-level splitting and
# gentler contrastive learning for better generalization.
#
# Usage:
# python train_clip_logo.py --config configs/cloud_rtx4090_image_split.yaml
#
# Estimated training time: 5-7 hours (more epochs than logo-level)
# Estimated cost on RunPod: ~$4
# Base model
base_model: "openai/clip-vit-large-patch14"
# Dataset paths
dataset_dir: "LogoDet-3K"
reference_dir: "reference_logos"
db_path: "test_data_mapping.db"
# Data split configuration - IMAGE LEVEL
# Each logo brand will have images in all splits, allowing the model
# to see some examples of each brand during training.
split_level: "image"
train_split: 0.7
val_split: 0.15
test_split: 0.15
# Larger batches for faster training on 24GB VRAM
batch_size: 32
logos_per_batch: 32
samples_per_logo: 4
gradient_accumulation_steps: 4 # Effective batch = 128
num_workers: 8
# Model architecture
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
freeze_layers: 12
use_gradient_checkpointing: true
# Training - GENTLER settings for better generalization
learning_rate: 5.0e-6 # Reduced from 1e-5
weight_decay: 0.01
warmup_steps: 500
max_epochs: 30 # More epochs with slower learning
mixed_precision: true
# Loss - HIGHER temperature for softer contrastive learning
temperature: 0.15 # Increased from 0.07
loss_type: "infonce"
triplet_margin: 0.2 # Reduced from 0.3
# Early stopping - more patience with gentler learning
patience: 7
min_delta: 0.001
# Output - separate directory for image-split model
checkpoint_dir: "checkpoints_image_split"
output_dir: "models/logo_detection/clip_finetuned_image_split"
save_every_n_epochs: 2 # Save frequently for cloud
# Logging
log_every_n_steps: 10
eval_every_n_epochs: 1
seed: 42
use_hard_negatives: false
use_augmentation: true
augmentation_strength: "medium"

View File

@@ -0,0 +1,78 @@
# Training configuration with IMAGE-LEVEL splits
#
# Unlike logo-level splits where test logos are completely unseen brands,
# image-level splits allow the model to see some images from each brand
# during training. This is less rigorous but more representative of
# real-world use where you have reference images for logos you want to detect.
#
# Also uses gentler contrastive learning settings to prevent over-separation.
#
# Usage:
# uv run python train_clip_logo.py --config configs/image_level_splits.yaml
# Base model
base_model: "openai/clip-vit-large-patch14"
# Dataset paths (relative to project root)
dataset_dir: "LogoDet-3K"
reference_dir: "reference_logos"
db_path: "test_data_mapping.db"
# Data split configuration
# split_level: "image" means images are split, not logo brands
# This allows test set to contain images from brands seen during training
split_level: "image"
train_split: 0.7
val_split: 0.15
test_split: 0.15
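# Illustrative arithmetic (mirrors _split_images in training/dataset.py):
# a brand with 10 images gets int(10 * 0.7) = 7 train images,
# max(1, int(10 * 0.15)) = 1 val image, and the remaining 2 for test;
# single-image brands go entirely to train.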
# Batch construction
batch_size: 16
logos_per_batch: 32
samples_per_logo: 4
gradient_accumulation_steps: 8
num_workers: 4
# Model architecture - same as before
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
freeze_layers: 12
use_gradient_checkpointing: true
# Training hyperparameters - GENTLER settings
learning_rate: 5.0e-6 # Reduced from 1e-5
weight_decay: 0.01
warmup_steps: 500
max_epochs: 30 # More epochs with slower learning
mixed_precision: true
# Loss function - HIGHER temperature for softer contrastive learning
temperature: 0.15 # Increased from 0.07
loss_type: "infonce"
triplet_margin: 0.2 # Reduced from 0.3
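# For reference: InfoNCE divides pairwise similarities by the temperature
# before the softmax, so raising it (0.07 -> 0.15) flattens the target
# distribution and penalizes hard negatives less aggressively.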
# Early stopping
patience: 7 # More patience with gentler learning
min_delta: 0.001
# Checkpoints and output
checkpoint_dir: "checkpoints_image_split"
output_dir: "models/logo_detection/clip_finetuned_image_split"
save_every_n_epochs: 5
# Logging
log_every_n_steps: 10
eval_every_n_epochs: 1
# Reproducibility
seed: 42
# Hard negative mining
use_hard_negatives: false
hard_negative_start_epoch: 10
hard_negatives_per_logo: 10
# Data augmentation
use_augmentation: true
augmentation_strength: "medium"

find_optimal_threshold.sh Executable file
View File

@@ -0,0 +1,168 @@
#!/bin/bash
#
# Find optimal similarity threshold for logo detection.
#
# Tests a range of thresholds and outputs precision/recall/F1 for each.
#
# Usage:
# ./find_optimal_threshold.sh
# ./find_optimal_threshold.sh --model finetuned
# ./find_optimal_threshold.sh --model baseline
# ./find_optimal_threshold.sh --thresholds "0.70 0.75 0.80 0.85"
#
set -e
# Default parameters
NUM_LOGOS="${NUM_LOGOS:-50}"
SEED="${SEED:-42}"
REFS_PER_LOGO="${REFS_PER_LOGO:-3}"
MARGIN="${MARGIN:-0.05}"
MODEL="${MODEL:-finetuned}"
USE_MAX_SIM="${USE_MAX_SIM:-false}"
# Default thresholds to test
THRESHOLDS="${THRESHOLDS:-0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84 0.86}"
# Model paths
BASELINE_MODEL="openai/clip-vit-large-patch14"
FINETUNED_MODEL="models/logo_detection/clip_finetuned"
# Output
OUTPUT_DIR="threshold_analysis"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-n|--num-logos)
NUM_LOGOS="$2"
shift 2
;;
-s|--seed)
SEED="$2"
shift 2
;;
--model)
MODEL="$2"
shift 2
;;
--thresholds)
THRESHOLDS="$2"
shift 2
;;
--finetuned-path)
FINETUNED_MODEL="$2"
shift 2
;;
--use-max-similarity)
USE_MAX_SIM="true"
shift
;;
-h|--help)
echo "Usage: $0 [OPTIONS]"
echo ""
echo "Options:"
echo " -n, --num-logos NUM Number of logos to test (default: 50)"
echo " -s, --seed SEED Random seed (default: 42)"
echo " --model MODEL Which model: 'baseline' or 'finetuned' (default: finetuned)"
echo " --thresholds \"T1 T2 ...\" Space-separated thresholds to test"
echo " --finetuned-path PATH Path to fine-tuned model"
echo " --use-max-similarity Use max instead of mean for multi-ref aggregation"
echo " -h, --help Show this help message"
exit 0
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
# Select model path
if [[ "${MODEL}" == "baseline" ]]; then
MODEL_PATH="${BASELINE_MODEL}"
else
MODEL_PATH="${FINETUNED_MODEL}"
fi
# Check if fine-tuned model exists
if [[ "${MODEL}" == "finetuned" ]] && [ ! -d "${FINETUNED_MODEL}" ]; then
echo "Error: Fine-tuned model not found at ${FINETUNED_MODEL}"
exit 1
fi
# Create output directory
mkdir -p "${OUTPUT_DIR}"
OUTPUT_FILE="${OUTPUT_DIR}/${MODEL}_thresholds_${TIMESTAMP}.txt"
echo "============================================================"
echo "THRESHOLD OPTIMIZATION"
echo "============================================================"
echo ""
echo "Model: ${MODEL} (${MODEL_PATH})"
echo "Thresholds: ${THRESHOLDS}"
echo "Logos: ${NUM_LOGOS}"
echo "Seed: ${SEED}"
echo "Max sim: ${USE_MAX_SIM}"
echo "Output: ${OUTPUT_FILE}"
echo ""
# Header for results
echo "============================================================" | tee "${OUTPUT_FILE}"
echo "THRESHOLD OPTIMIZATION RESULTS" | tee -a "${OUTPUT_FILE}"
echo "Model: ${MODEL} (${MODEL_PATH})" | tee -a "${OUTPUT_FILE}"
echo "============================================================" | tee -a "${OUTPUT_FILE}"
echo "" | tee -a "${OUTPUT_FILE}"
printf "%-10s %8s %8s %8s %8s %8s %8s\n" "Threshold" "TP" "FP" "FN" "Prec" "Recall" "F1" | tee -a "${OUTPUT_FILE}"
echo "--------------------------------------------------------------------" | tee -a "${OUTPUT_FILE}"
# Track best F1
BEST_F1=0
BEST_THRESHOLD=""
# Build extra args
EXTRA_ARGS=""
if [[ "${USE_MAX_SIM}" == "true" ]]; then
EXTRA_ARGS="--use-max-similarity"
fi
# Test each threshold
for THRESHOLD in ${THRESHOLDS}; do
# Run test and capture output
OUTPUT=$(uv run python test_logo_detection.py \
-n "${NUM_LOGOS}" \
-s "${SEED}" \
-t "${THRESHOLD}" \
--refs-per-logo "${REFS_PER_LOGO}" \
--margin "${MARGIN}" \
--matching-method multi-ref \
-e "${MODEL_PATH}" \
${EXTRA_ARGS} \
2>/dev/null)
# Extract metrics
TP=$(echo "${OUTPUT}" | grep "True Positives" | grep -oE "[0-9]+" | head -1)
FP=$(echo "${OUTPUT}" | grep "False Positives" | grep -oE "[0-9]+" | head -1)
FN=$(echo "${OUTPUT}" | grep "False Negatives" | grep -oE "[0-9]+" | head -1)
PREC=$(echo "${OUTPUT}" | grep "Precision:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
RECALL=$(echo "${OUTPUT}" | grep "Recall:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
F1=$(echo "${OUTPUT}" | grep "F1 Score:" | grep -oE "[0-9]+\.[0-9]+%" | head -1)
# Print row
printf "%-10s %8s %8s %8s %8s %8s %8s\n" "${THRESHOLD}" "${TP}" "${FP}" "${FN}" "${PREC}" "${RECALL}" "${F1}" | tee -a "${OUTPUT_FILE}"
# Track best F1 (skip rows where no F1 could be parsed)
F1_NUM=$(echo "${F1}" | tr -d '%')
BEST_NUM=$(echo "${BEST_F1}" | tr -d '%')
if [[ -n "${F1_NUM}" ]] && (( $(echo "${F1_NUM} > ${BEST_NUM}" | bc -l) )); then
BEST_F1="${F1}"
BEST_THRESHOLD="${THRESHOLD}"
fi
done
echo "--------------------------------------------------------------------" | tee -a "${OUTPUT_FILE}"
echo "" | tee -a "${OUTPUT_FILE}"
echo "BEST THRESHOLD: ${BEST_THRESHOLD} (F1 = ${BEST_F1})" | tee -a "${OUTPUT_FILE}"
echo "" | tee -a "${OUTPUT_FILE}"
echo "Results saved to: ${OUTPUT_FILE}"

View File

@@ -265,6 +265,11 @@ def main():
action="store_true",
help="Enable verbose logging",
)
parser.add_argument(
"--similarity-details",
action="store_true",
help="Output detailed similarity scores for each detection (for analyzing score distributions)",
)
parser.add_argument(
"--no-cache",
action="store_true",
@@ -411,6 +416,16 @@ def main():
# Detailed results for analysis
results = []
# Similarity distribution tracking (for --similarity-details)
similarity_details = {
"true_positive_sims": [], # Similarities for correct matches
"false_positive_sims": [], # Similarities for wrong matches
"missed_best_sims": [], # Best similarity for logos that should have matched but didn't
"all_positive_sims": [], # All similarities between detected regions and correct logos
"all_negative_sims": [], # All similarities between detected regions and wrong logos
"detection_details": [], # Per-detection breakdown
}
# Process test images
for test_filename in tqdm(test_images, desc="Testing"):
test_path = test_images_dir / test_filename
@@ -445,7 +460,38 @@ def main():
# Match detections against references using selected method
matched_logos: Set[str] = set()
for detection in detections:
for det_idx, detection in enumerate(detections):
# Compute similarities to all reference logos for detailed analysis
if args.similarity_details:
all_sims = {}
for logo_name, ref_emb_list in multi_ref_embeddings.items():
sims = []
for ref_emb in ref_emb_list:
sim = detector.compare_embeddings(detection["embedding"], ref_emb)
sims.append(sim)
# Use mean or max based on setting
if args.use_max_similarity:
all_sims[logo_name] = max(sims) if sims else 0
else:
all_sims[logo_name] = sum(sims) / len(sims) if sims else 0
# Track positive vs negative similarities
for sim in sims:
if logo_name in expected_logos:
similarity_details["all_positive_sims"].append(sim)
else:
similarity_details["all_negative_sims"].append(sim)
# Store detection details
sorted_sims = sorted(all_sims.items(), key=lambda x: -x[1])
similarity_details["detection_details"].append({
"image": test_filename,
"detection_idx": det_idx,
"expected_logos": list(expected_logos),
"top_5_matches": sorted_sims[:5],
"detr_score": detection.get("score", 0),
})
if args.matching_method == "simple":
# Simple matching: return ALL logos above threshold
all_matches = detector.find_all_matches(
@@ -457,16 +503,21 @@
matched_logos.add(label)
# Check if this is a correct match
if label in expected_logos:
is_correct = label in expected_logos
if is_correct:
true_positives += 1
if args.similarity_details:
similarity_details["true_positive_sims"].append(similarity)
else:
false_positives += 1
if args.similarity_details:
similarity_details["false_positive_sims"].append(similarity)
results.append({
"test_image": test_filename,
"matched_logo": label,
"similarity": similarity,
"correct": label in expected_logos,
"correct": is_correct,
})
elif args.matching_method == "margin":
@@ -481,16 +532,21 @@
label, similarity = match_result
matched_logos.add(label)
if label in expected_logos:
is_correct = label in expected_logos
if is_correct:
true_positives += 1
if args.similarity_details:
similarity_details["true_positive_sims"].append(similarity)
else:
false_positives += 1
if args.similarity_details:
similarity_details["false_positive_sims"].append(similarity)
results.append({
"test_image": test_filename,
"matched_logo": label,
"similarity": similarity,
"correct": label in expected_logos,
"correct": is_correct,
})
else: # multi-ref
@@ -507,16 +563,21 @@
label, similarity, num_matching = match_result
matched_logos.add(label)
if label in expected_logos:
is_correct = label in expected_logos
if is_correct:
true_positives += 1
if args.similarity_details:
similarity_details["true_positive_sims"].append(similarity)
else:
false_positives += 1
if args.similarity_details:
similarity_details["false_positive_sims"].append(similarity)
results.append({
"test_image": test_filename,
"matched_logo": label,
"similarity": similarity,
"correct": label in expected_logos,
"correct": is_correct,
})
# Count missed detections (false negatives)
@@ -524,6 +585,15 @@
false_negatives += len(missed)
for missed_logo in missed:
# Track best similarity for missed logos (if we have detections)
if args.similarity_details and detections:
best_sim_for_missed = 0
for detection in detections:
for ref_emb in multi_ref_embeddings.get(missed_logo, []):
sim = detector.compare_embeddings(detection["embedding"], ref_emb)
best_sim_for_missed = max(best_sim_for_missed, sim)
similarity_details["missed_best_sims"].append(best_sim_for_missed)
results.append({
"test_image": test_filename,
"matched_logo": None,
@@ -593,6 +663,10 @@
print("=" * 60)
# Print similarity distribution details if requested
if args.similarity_details:
print_similarity_details(similarity_details, args.threshold)
# Write results to file if requested
if args.output_file:
write_results_to_file(
@@ -612,6 +686,116 @@
print(f"\nResults appended to: {args.output_file}")
def print_similarity_details(details: dict, threshold: float):
    """Print detailed similarity distribution analysis."""
    import statistics

    print("\n" + "=" * 60)
    print("SIMILARITY DISTRIBUTION ANALYSIS")
    print("=" * 60)

    # Helper to compute stats
    def compute_stats(values, name):
        if not values:
            print(f"\n{name}: No data")
            return
        print(f"\n{name} (n={len(values)}):")
        print(f" Min: {min(values):.4f}")
        print(f" Max: {max(values):.4f}")
        print(f" Mean: {statistics.mean(values):.4f}")
        if len(values) > 1:
            print(f" StdDev: {statistics.stdev(values):.4f}")
        print(f" Median: {statistics.median(values):.4f}")
        # Percentiles
        sorted_vals = sorted(values)
        n = len(sorted_vals)
        p10 = sorted_vals[int(n * 0.10)] if n > 10 else sorted_vals[0]
        p25 = sorted_vals[int(n * 0.25)] if n > 4 else sorted_vals[0]
        p75 = sorted_vals[int(n * 0.75)] if n > 4 else sorted_vals[-1]
        p90 = sorted_vals[int(n * 0.90)] if n > 10 else sorted_vals[-1]
        print(f" P10: {p10:.4f}")
        print(f" P25: {p25:.4f}")
        print(f" P75: {p75:.4f}")
        print(f" P90: {p90:.4f}")
        # Count above/below threshold
        above = sum(1 for v in values if v >= threshold)
        below = sum(1 for v in values if v < threshold)
        print(f" Above threshold ({threshold}): {above} ({100*above/len(values):.1f}%)")
        print(f" Below threshold ({threshold}): {below} ({100*below/len(values):.1f}%)")

    # Print distribution stats
    compute_stats(details["true_positive_sims"], "TRUE POSITIVE similarities (correct matches)")
    compute_stats(details["false_positive_sims"], "FALSE POSITIVE similarities (wrong matches)")
    compute_stats(details["missed_best_sims"], "MISSED LOGO best similarities (false negatives)")
    compute_stats(details["all_positive_sims"], "ALL similarities to CORRECT logos (per-ref)")
    compute_stats(details["all_negative_sims"], "ALL similarities to WRONG logos (per-ref)")

    # Overlap analysis
    tp_sims = details["true_positive_sims"]
    fp_sims = details["false_positive_sims"]
    if tp_sims and fp_sims:
        print("\n" + "-" * 40)
        print("OVERLAP ANALYSIS:")
        tp_min, tp_max = min(tp_sims), max(tp_sims)
        fp_min, fp_max = min(fp_sims), max(fp_sims)
        print(f" True Positives range: [{tp_min:.4f}, {tp_max:.4f}]")
        print(f" False Positives range: [{fp_min:.4f}, {fp_max:.4f}]")
        # Check overlap
        overlap_min = max(tp_min, fp_min)
        overlap_max = min(tp_max, fp_max)
        if overlap_min < overlap_max:
            print(f" OVERLAP REGION: [{overlap_min:.4f}, {overlap_max:.4f}]")
            tp_in_overlap = sum(1 for v in tp_sims if overlap_min <= v <= overlap_max)
            fp_in_overlap = sum(1 for v in fp_sims if overlap_min <= v <= overlap_max)
            print(f" TPs in overlap: {tp_in_overlap} ({100*tp_in_overlap/len(tp_sims):.1f}%)")
            print(f" FPs in overlap: {fp_in_overlap} ({100*fp_in_overlap/len(fp_sims):.1f}%)")
        else:
            print(" NO OVERLAP - distributions are separable!")

        # Suggest optimal threshold
        all_points = [(s, "tp") for s in tp_sims] + [(s, "fp") for s in fp_sims]
        all_points.sort()
        best_thresh = threshold
        best_f1 = 0
        total_tp = len(tp_sims)
        total_fp = len(fp_sims)
        for thresh in [p[0] for p in all_points]:
            # At this threshold:
            tp_above = sum(1 for s in tp_sims if s >= thresh)
            fp_above = sum(1 for s in fp_sims if s >= thresh)
            prec = tp_above / (tp_above + fp_above) if (tp_above + fp_above) > 0 else 0
            rec = tp_above / total_tp if total_tp > 0 else 0
            f1 = 2 * prec * rec / (prec + rec) if (prec + rec) > 0 else 0
            if f1 > best_f1:
                best_f1 = f1
                best_thresh = thresh
        print(f"\n SUGGESTED OPTIMAL THRESHOLD: {best_thresh:.4f}")
        print(f" (would give F1 = {best_f1:.4f} on this data)")

    # Print sample detection details
    det_details = details["detection_details"]
    if det_details:
        print("\n" + "-" * 40)
        print(f"SAMPLE DETECTION DETAILS (first 20 of {len(det_details)}):")
        for i, det in enumerate(det_details[:20]):
            expected = det["expected_logos"]
            top5 = det["top_5_matches"]
            print(f"\n [{i+1}] Image: {det['image']}")
            print(f" Expected: {expected if expected else '(none)'}")
            print(f" DETR score: {det['detr_score']:.3f}")
            print(f" Top 5 matches:")
            for logo, sim in top5:
                marker = " <-- CORRECT" if logo in expected else ""
                print(f" {sim:.4f} {logo}{marker}")

    print("\n" + "=" * 60)

def write_results_to_file(
    output_path: Path,
    args,

View File

@@ -256,6 +256,7 @@ def main():
test_split=config.test_split,
seed=config.seed,
augmentation_strength=config.augmentation_strength,
split_level=getattr(config, 'split_level', 'logo'),
)
# Create trainer

View File

@@ -20,7 +20,8 @@ class TrainingConfig:
reference_dir: str = "reference_logos"
db_path: str = "test_data_mapping.db"
# Data split ratios
# Data split configuration
split_level: str = "logo" # "logo" for brand-level, "image" for image-level
train_split: float = 0.7
val_split: float = 0.15
test_split: float = 0.15

View File

@@ -84,7 +84,7 @@ class LogoDataset:
"""
Manages logo data from the SQLite database.
Handles loading logo-to-image mappings and splitting by logo brand.
Handles loading logo-to-image mappings and splitting by logo brand or image.
"""
def __init__(
@@ -95,19 +95,57 @@
val_split: float = 0.15,
test_split: float = 0.15,
seed: int = 42,
split_level: str = "logo",
):
"""
Initialize the logo dataset.
Args:
db_path: Path to SQLite database
reference_dir: Directory containing reference logo images
train_split: Fraction for training
val_split: Fraction for validation
test_split: Fraction for testing
seed: Random seed for reproducibility
split_level: "logo" for brand-level splits (test on unseen brands),
"image" for image-level splits (test on unseen images
from seen brands)
"""
self.db_path = Path(db_path)
self.reference_dir = Path(reference_dir)
self.seed = seed
self.split_level = split_level
# Load logo-to-images mapping from database
self.logo_to_images = self._load_logo_mappings()
self.all_logos = list(self.logo_to_images.keys())
# Create logo-level splits
if split_level == "logo":
# Logo-level splits: test logos are completely unseen brands
self.train_logos, self.val_logos, self.test_logos = self._split_logos(
train_split, val_split, test_split
)
# For logo-level splits, each split has its own logos
self.train_logo_to_images = {
l: self.logo_to_images[l] for l in self.train_logos
}
self.val_logo_to_images = {
l: self.logo_to_images[l] for l in self.val_logos
}
self.test_logo_to_images = {
l: self.logo_to_images[l] for l in self.test_logos
}
else:
# Image-level splits: all logos present in all splits, different images
(
self.train_logo_to_images,
self.val_logo_to_images,
self.test_logo_to_images,
) = self._split_images(train_split, val_split, test_split)
# All logos are in all splits
self.train_logos = list(self.train_logo_to_images.keys())
self.val_logos = list(self.val_logo_to_images.keys())
self.test_logos = list(self.test_logo_to_images.keys())
def _load_logo_mappings(self) -> Dict[str, List[Path]]:
"""Load logo name to image paths mapping from database."""
@@ -151,21 +189,74 @@
return train_logos, val_logos, test_logos
def get_split_info(self) -> Dict[str, int]:
def _split_images(
self,
train_split: float,
val_split: float,
test_split: float,
) -> Tuple[Dict[str, List[Path]], Dict[str, List[Path]], Dict[str, List[Path]]]:
"""
Split images within each logo brand for train/val/test.
Each logo brand will have images in all splits, allowing the model
to see some examples of each brand during training.
"""
random.seed(self.seed)
train_logo_to_images: Dict[str, List[Path]] = {}
val_logo_to_images: Dict[str, List[Path]] = {}
test_logo_to_images: Dict[str, List[Path]] = {}
for logo, images in self.logo_to_images.items():
# Shuffle images for this logo
shuffled_images = images.copy()
random.shuffle(shuffled_images)
n = len(shuffled_images)
if n == 1:
# Only one image: put in train only
train_logo_to_images[logo] = shuffled_images
continue
elif n == 2:
# Two images: one train, one val
train_logo_to_images[logo] = [shuffled_images[0]]
val_logo_to_images[logo] = [shuffled_images[1]]
continue
# Normal split for 3+ images
train_end = max(1, int(n * train_split))
val_end = train_end + max(1, int(n * val_split))
train_images = shuffled_images[:train_end]
val_images = shuffled_images[train_end:val_end]
test_images = shuffled_images[val_end:]
# Ensure at least one image in train
if train_images:
train_logo_to_images[logo] = train_images
if val_images:
val_logo_to_images[logo] = val_images
if test_images:
test_logo_to_images[logo] = test_images
return train_logo_to_images, val_logo_to_images, test_logo_to_images
def get_split_info(self) -> Dict[str, any]:
"""Return information about the splits."""
return {
"split_level": self.split_level,
"total_logos": len(self.all_logos),
"train_logos": len(self.train_logos),
"val_logos": len(self.val_logos),
"test_logos": len(self.test_logos),
"train_images": sum(
len(self.logo_to_images[l]) for l in self.train_logos
len(imgs) for imgs in self.train_logo_to_images.values()
),
"val_images": sum(
len(self.logo_to_images[l]) for l in self.val_logos
len(imgs) for imgs in self.val_logo_to_images.values()
),
"test_images": sum(
len(self.logo_to_images[l]) for l in self.test_logos
len(imgs) for imgs in self.test_logo_to_images.values()
),
}
@@ -205,29 +296,33 @@ class LogoContrastiveDataset(Dataset):
self.transform = transform
self.batches_per_epoch = batches_per_epoch
# Get logos for this split
# Get logos and their images for this split
# This respects both logo-level and image-level splits
if split == "train":
self.logos = logo_data.train_logos
self.logo_to_images = logo_data.train_logo_to_images
elif split == "val":
self.logos = logo_data.val_logos
self.logo_to_images = logo_data.val_logo_to_images
else:
self.logos = logo_data.test_logos
self.logo_to_images = logo_data.test_logo_to_images
# Filter logos with enough samples
# Filter logos with enough samples for this split
self.valid_logos = [
logo for logo in self.logos
if len(logo_data.logo_to_images[logo]) >= samples_per_logo
if logo in self.logo_to_images and len(self.logo_to_images[logo]) >= samples_per_logo
]
# For logos with fewer samples, we'll use with replacement
self.logos_needing_replacement = [
logo for logo in self.logos
if len(logo_data.logo_to_images[logo]) < samples_per_logo
if logo in self.logo_to_images and len(self.logo_to_images[logo]) < samples_per_logo
]
# Create label mapping
# Create label mapping (use all logos from the full dataset for consistent labels)
self.logo_to_label = {
logo: idx for idx, logo in enumerate(self.logos)
logo: idx for idx, logo in enumerate(logo_data.all_logos)
}
def __len__(self) -> int:
@@ -244,12 +339,13 @@
images = []
labels = []
# Sample K logos for this batch
k = min(self.logos_per_batch, len(self.logos))
batch_logos = random.sample(self.logos, k)
# Sample K logos for this batch (only from logos that have images in this split)
available_logos = [l for l in self.logos if l in self.logo_to_images]
k = min(self.logos_per_batch, len(available_logos))
batch_logos = random.sample(available_logos, k)
for logo in batch_logos:
logo_images = self.logo_data.logo_to_images[logo]
logo_images = self.logo_to_images[logo]
# Sample M images for this logo
if len(logo_images) >= self.samples_per_logo:
@@ -353,6 +449,7 @@ def create_dataloaders(
seed: int = 42,
augmentation_strength: str = "medium",
batches_per_epoch: int = 1000,
split_level: str = "logo",
) -> Tuple[DataLoader, DataLoader, Optional[DataLoader]]:
"""
Create train, validation, and optionally test dataloaders.
@@ -370,6 +467,7 @@
seed: Random seed
augmentation_strength: "light", "medium", or "strong"
batches_per_epoch: Number of batches per training epoch
split_level: "logo" for brand-level splits, "image" for image-level splits
Returns:
Tuple of (train_loader, val_loader, test_loader)
@@ -382,11 +480,13 @@
val_split=val_split,
test_split=test_split,
seed=seed,
split_level=split_level,
)
# Print split info
split_info = logo_data.get_split_info()
print(f"Dataset loaded:")
print(f" Split level: {split_info['split_level']}")
print(f" Total logos: {split_info['total_logos']}")
print(f" Train: {split_info['train_logos']} logos, {split_info['train_images']} images")
print(f" Val: {split_info['val_logos']} logos, {split_info['val_images']} images")

View File

@@ -250,33 +250,49 @@ class LogoFineTunedCLIP(nn.Module):
# Load base CLIP model
clip_model = CLIPModel.from_pretrained(base_model)
# Create model instance
# Check if we need to load LoRA weights
if config.get("peft_applied", False) and PEFT_AVAILABLE:
# Create model WITHOUT LoRA (lora_r=0) - we'll load LoRA weights separately
model = cls(
vision_model=clip_model.vision_model,
lora_r=config.get("lora_r", 0),
lora_r=0, # Don't apply LoRA in constructor
lora_alpha=config.get("lora_alpha", 1),
freeze_layers=config.get("freeze_layers", 12),
add_projection_head=config.get("add_projection_head", True),
use_gradient_checkpointing=False, # Not needed for inference
use_gradient_checkpointing=False,
)
# Load weights
if config.get("peft_applied", False) and PEFT_AVAILABLE:
# Load LoRA weights
# Load LoRA weights from checkpoint
lora_path = model_path / "vision_lora"
if lora_path.exists():
model.vision_model = PeftModel.from_pretrained(
model.vision_model, lora_path
)
model.peft_applied = True
model.lora_r = config.get("lora_r", 16)
# Load projection head
proj_path = model_path / "projection_head.bin"
if proj_path.exists():
model.projection.load_state_dict(torch.load(proj_path))
model.projection.load_state_dict(
torch.load(proj_path, map_location="cpu")
)
else:
# Load full model state
# No LoRA - create model and load full state
model = cls(
vision_model=clip_model.vision_model,
lora_r=0,
lora_alpha=config.get("lora_alpha", 1),
freeze_layers=config.get("freeze_layers", 12),
add_projection_head=config.get("add_projection_head", True),
use_gradient_checkpointing=False,
)
weights_path = model_path / "pytorch_model.bin"
if weights_path.exists():
model.load_state_dict(torch.load(weights_path))
model.load_state_dict(
torch.load(weights_path, map_location="cpu")
)
if device is not None:
model = model.to(device)

View File

@@ -169,16 +169,11 @@ class Trainer:
"val_neg_sim": val_metrics["mean_neg_sim"],
})
# Checkpointing based on separation (primary) or loss (secondary)
improved = False
# Checkpointing based on separation (gap between pos and neg similarity)
# This is the key metric for contrastive learning quality
if val_metrics["separation"] > self.best_val_separation + self.config.min_delta:
self.best_val_separation = val_metrics["separation"]
improved = True
elif val_metrics["loss"] < self.best_val_loss - self.config.min_delta:
self.best_val_loss = val_metrics["loss"]
improved = True
if improved:
self.best_val_loss = val_metrics["loss"] # Track for reference
self.patience_counter = 0
self._save_checkpoint("best.pt")
self.logger.info("New best model saved!")
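For reference, the separation criterion reduces to a gap between mean similarities; a minimal sketch (the key names mirror the `val_metrics` dict above; the surrounding evaluation loop is an assumption):
```python
# Sketch: "separation" = mean positive-pair similarity minus mean
# negative-pair similarity on the validation set; higher is better.
def separation_metrics(pos_sims: list[float], neg_sims: list[float]) -> dict:
    mean_pos = sum(pos_sims) / len(pos_sims)
    mean_neg = sum(neg_sims) / len(neg_sims)
    return {
        "mean_pos_sim": mean_pos,
        "mean_neg_sim": mean_neg,
        "separation": mean_pos - mean_neg,
    }
```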