logo_test/CLIP_FINETUNING.md
Rick McEwen f74d4b6981 Document threshold tuning for fine-tuned CLIP model
- Add threshold selection section with similarity distribution analysis
- Document that fine-tuned model needs threshold 0.82 (vs baseline 0.75)
- Add table comparing baseline vs fine-tuned distributions
- Update test commands to include correct thresholds
- Reference analyze_similarity_distribution.sh for threshold optimization
2026-01-05 14:09:38 -05:00

# CLIP Fine-Tuning for Logo Recognition
This document describes the CLIP fine-tuning pipeline for improving logo embedding similarity using the LogoDet-3K dataset.
## Overview
The fine-tuning approach uses **contrastive learning** with **LoRA** (Low-Rank Adaptation) to train CLIP's vision encoder for better logo similarity matching while maintaining compatibility with the existing `DetectLogosDETR` class.
**Goal**: Improve F1 from ~60% to >72% on logo matching tasks.
## Files Created
### Training Module (`training/`)
| File | Description |
|------|-------------|
| `__init__.py` | Module exports |
| `config.py` | `TrainingConfig` dataclass with all hyperparameters |
| `dataset.py` | `LogoContrastiveDataset` with logo-level splits and augmentations |
| `model.py` | `LogoFineTunedCLIP` wrapper with LoRA support |
| `losses.py` | `InfoNCELoss`, `TripletLoss`, `SupConLoss`, `CombinedLoss` |
| `trainer.py` | Training loop with mixed precision, checkpointing, early stopping |
| `evaluation.py` | `EmbeddingEvaluator` for validation metrics |
### Scripts
| File | Description |
|------|-------------|
| `train_clip_logo.py` | Main training entry point |
| `export_model.py` | Export trained models to HuggingFace-compatible format |
### Configuration
| File | Description |
|------|-------------|
| `configs/jetson_orin.yaml` | Training config optimized for Jetson AGX Orin |
## Prerequisites
1. **Install dependencies**:
```bash
uv sync
```
2. **Prepare test data** (if not already done):
```bash
uv run python prepare_test_data.py
```
This creates:
- `reference_logos/` - Cropped logo images organized by category/brand
- `test_images/` - Full images for testing
- `test_data_mapping.db` - SQLite database with mappings
## Training
### Basic Training
```bash
uv run python train_clip_logo.py --config configs/jetson_orin.yaml
```
### Training with Overrides
```bash
uv run python train_clip_logo.py --config configs/jetson_orin.yaml \
    --learning-rate 5e-6 \
    --max-epochs 30 \
    --batch-size 8
```
### Resume from Checkpoint
```bash
uv run python train_clip_logo.py --config configs/jetson_orin.yaml \
    --resume checkpoints/epoch_10.pt
```
### Training Output
- Checkpoints saved to `checkpoints/`
- Best model saved as `checkpoints/best.pt`
- Final model exported to `models/logo_detection/clip_finetuned/`
## Configuration Options
Key parameters in `configs/jetson_orin.yaml`:
```yaml
# Model
base_model: "openai/clip-vit-large-patch14"
lora_r: 16 # LoRA rank (0 to disable)
lora_alpha: 32 # LoRA scaling factor
freeze_layers: 12 # Freeze first N transformer layers
# Batch construction
batch_size: 16
logos_per_batch: 32 # Different logos per batch
samples_per_logo: 4 # Samples per logo (creates positive pairs)
gradient_accumulation_steps: 8 # Effective batch = 128
# Training
learning_rate: 1.0e-5
max_epochs: 20
mixed_precision: true
temperature: 0.07 # InfoNCE temperature
# Early stopping
patience: 5
min_delta: 0.001
```
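The comment `Effective batch = 128` follows from `batch_size × gradient_accumulation_steps`. The sketch below checks that arithmetic and the usual LoRA scaling factor `lora_alpha / lora_r`. `TrainingConfigSketch` is a hypothetical stand-in covering only a few of the fields above; the real `TrainingConfig` in `training/config.py` may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfigSketch:
    """Hypothetical subset of the fields shown in configs/jetson_orin.yaml."""

    batch_size: int = 16
    gradient_accumulation_steps: int = 8
    lora_r: int = 16
    lora_alpha: int = 32

    @property
    def effective_batch_size(self) -> int:
        # Number of samples that contribute to one optimizer step.
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scale(self) -> float:
        # Conventional LoRA scaling alpha / r; 0.0 when LoRA is disabled (r == 0).
        return self.lora_alpha / self.lora_r if self.lora_r > 0 else 0.0


cfg = TrainingConfigSketch()
print(cfg.effective_batch_size)  # 128
print(cfg.lora_scale)            # 2.0
```

Halving `batch_size` to 8 while doubling `gradient_accumulation_steps` to 16 keeps the effective batch at 128, which is why that pairing is suggested for out-of-memory errors.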
## Evaluation
### Test Fine-Tuned Model
**Important**: The fine-tuned model requires a higher threshold (0.82) than baseline (0.75).
```bash
uv run python test_logo_detection.py -n 50 \
    -e models/logo_detection/clip_finetuned \
    -t 0.82 \
    --matching-method multi-ref \
    --seed 42
```
### Compare with Baseline
```bash
# Baseline CLIP (threshold 0.75)
uv run python test_logo_detection.py -n 50 \
    -e openai/clip-vit-large-patch14 \
    -t 0.75 \
    --matching-method multi-ref \
    --seed 42

# Fine-tuned model (threshold 0.82)
uv run python test_logo_detection.py -n 50 \
    -e models/logo_detection/clip_finetuned \
    -t 0.82 \
    --matching-method multi-ref \
    --seed 42
```
### Threshold Selection
The fine-tuned model requires a **higher similarity threshold** than baseline CLIP. Contrastive training pushed similarities between non-matching logos much lower, which changes the overall score distribution, so a threshold tuned for baseline CLIP is no longer optimal.
#### Similarity Distribution Analysis
| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| Wrong logos mean similarity | 0.66 | **0.44** |
| Wrong logos above 0.75 | 23.2% | **0.6%** |
| Correct logos mean similarity | 0.75 | 0.64 |
| Optimal threshold | 0.756 | **0.819** |
| F1 at optimal threshold | 67.1% | **71.9%** |
**Key insight**: The fine-tuned model dramatically reduced similarities to wrong logos (mean from 0.66 to 0.44). At threshold 0.75 it already rejects far more non-matches than baseline (0.6% of wrong logos score above 0.75, versus 23.2%). The F1-optimal threshold is nevertheless higher, at 0.819, which filters out the remaining false positives that score just above 0.75.
#### Analyze Similarity Distribution
To find the optimal threshold for your model:
```bash
# Run detailed similarity analysis
./analyze_similarity_distribution.sh --model finetuned
# Or analyze both models
./analyze_similarity_distribution.sh --model both
```
This outputs distribution statistics and suggests an optimal threshold based on the data.
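The idea behind an F1-optimal threshold can be sketched as a simple grid sweep over candidate thresholds. This is an illustrative assumption about the approach, not the actual logic of `analyze_similarity_distribution.sh`; the function names and the toy scores below are made up.

```python
def f1_at_threshold(pos, neg, threshold):
    """F1 when every similarity >= threshold is accepted as a match."""
    tp = sum(s >= threshold for s in pos)   # correct logos accepted
    fp = sum(s >= threshold for s in neg)   # wrong logos accepted
    fn = len(pos) - tp                      # correct logos rejected
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(pos, neg, lo=0.50, hi=0.95, step=0.01):
    """Return (threshold, f1) with the highest F1 on a fixed grid."""
    n_steps = int(round((hi - lo) / step))
    candidates = [round(lo + i * step, 4) for i in range(n_steps + 1)]
    scored = ((t, f1_at_threshold(pos, neg, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


# Toy similarity scores (fabricated for illustration only).
correct_sims = [0.90, 0.86, 0.84, 0.70]
wrong_sims = [0.80, 0.60, 0.45, 0.40]

threshold, f1 = best_threshold(correct_sims, wrong_sims)
print(f"best threshold={threshold:.2f}, F1={f1:.3f}")
```

On real data, `pos` would hold similarities between detections and their correct reference logos, and `neg` similarities to wrong logos.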
### Expected Metrics
| Metric | Baseline (t=0.75) | Fine-tuned (t=0.82) |
|--------|-------------------|---------------------|
| Precision | ~49% | >65% |
| Recall | ~77% | >70% |
| F1 Score | ~60% | >70% |
Training metrics to monitor:
- Mean positive similarity: target > 0.85
- Mean negative similarity: target < 0.50
- Embedding separation: target > 0.35
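A minimal sketch of how these three numbers relate, assuming "embedding separation" means the gap between mean positive and mean negative similarity (an assumption; it is consistent with the targets, since 0.85 − 0.50 = 0.35, but `EmbeddingEvaluator` may define it differently). The function name is hypothetical.

```python
from statistics import mean


def separation_metrics(positive_sims, negative_sims):
    """Summarize same-logo vs. different-logo cosine similarities."""
    pos_mean = mean(positive_sims)
    neg_mean = mean(negative_sims)
    return {
        "mean_positive": pos_mean,
        "mean_negative": neg_mean,
        # Assumed definition: gap between the two means.
        "separation": pos_mean - neg_mean,
    }


# Toy values for illustration only.
metrics = separation_metrics([0.90, 0.80], [0.50, 0.30])
print(metrics)
```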
## Export Model
To export a checkpoint to HuggingFace format:
```bash
uv run python export_model.py \
    --checkpoint checkpoints/best.pt \
    --output models/logo_detection/clip_finetuned
```
With LoRA weight merging (reduces inference overhead):
```bash
uv run python export_model.py \
    --checkpoint checkpoints/best.pt \
    --output models/logo_detection/clip_finetuned \
    --merge-lora
```
## Using Fine-Tuned Model with DetectLogosDETR
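The fine-tuned model works as a drop-in replacement: pass its path as `embedding_model`, and use the higher similarity threshold (0.82, see Threshold Selection) when matching: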
```python
import logging

from logo_detection_detr import DetectLogosDETR

logger = logging.getLogger(__name__)

# Use fine-tuned model
detector = DetectLogosDETR(
    logger=logger,
    embedding_model="models/logo_detection/clip_finetuned",
)

# Or use baseline for comparison
detector_baseline = DetectLogosDETR(
    logger=logger,
    embedding_model="openai/clip-vit-large-patch14",
)
```
## Architecture Details
### Training Approach
1. **Contrastive Learning**: Uses InfoNCE loss to maximize similarity between embeddings of the same logo while minimizing similarity to different logos.
2. **LoRA (Low-Rank Adaptation)**: Adds small trainable low-rank matrices to attention layers instead of fine-tuning all weights. This is memory-efficient and helps limit catastrophic forgetting, since the original weights stay frozen.
3. **Layer Freezing**: Freezes the first 12 of 24 transformer layers to preserve CLIP's low-level visual features while adapting high-level semantics.
4. **Logo-Level Splits**: Splits data by logo brand (not by image) to test generalization to unseen logos.
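The contrastive objective in step 1 can be sketched without any framework. The version below is a dependency-free illustration of a multi-positive InfoNCE loss, assuming every other sample of the same brand in the batch counts as a positive; the actual `InfoNCELoss` in `training/losses.py` presumably operates on PyTorch tensors and may differ in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(embeddings, labels, temperature=0.07):
    """Mean of -log softmax(sim / T) over all (anchor, positive) pairs."""
    n = len(embeddings)
    losses = []
    for i in range(n):
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            k: cosine(embeddings[i], embeddings[k]) / temperature
            for k in range(n)
            if k != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits.values()))
        for k, z in logits.items():
            if labels[k] == labels[i]:
                losses.append(log_denom - z)
    if not losses:
        raise ValueError("batch contains no positive pairs")
    return sum(losses) / len(losses)


# Two brands, two samples each (toy 2-D embeddings).
embs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
loss_clustered = info_nce(embs, labels=[0, 0, 1, 1])
loss_scrambled = info_nce(embs, labels=[0, 1, 0, 1])
print(loss_clustered < loss_scrambled)  # True
```

The loss is near zero when same-brand embeddings already cluster together and grows when positives are far apart, which is the signal that pulls matching logos closer during training.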
### Batch Construction
Each batch contains:
- K different logo brands (default: 32)
- M samples per brand (default: 4)
- Total samples: K × M = 128
This ensures positive pairs (same logo) exist within each batch for contrastive learning.
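A minimal sketch of K × M sampling, assuming the data is available as a mapping from brand name to image paths. The function name and data layout are hypothetical; the real batching lives in `LogoContrastiveDataset` and may work differently.

```python
import random


def sample_km_batch(images_by_brand, k=32, m=4, rng=None):
    """Pick k brands, then m images per brand; returns (path, brand) pairs."""
    if rng is None:
        rng = random.Random()
    # Only brands with at least m images can supply m distinct samples.
    eligible = [b for b, imgs in images_by_brand.items() if len(imgs) >= m]
    if len(eligible) < k:
        raise ValueError(f"need {k} brands with >= {m} images, found {len(eligible)}")
    batch = []
    for brand in rng.sample(eligible, k):
        for path in rng.sample(images_by_brand[brand], m):
            batch.append((path, brand))
    return batch


# Toy dataset: 40 brands with 6 images each (paths are placeholders).
toy = {f"brand_{i}": [f"brand_{i}/img_{j}.jpg" for j in range(6)] for i in range(40)}
batch = sample_km_batch(toy, k=32, m=4, rng=random.Random(42))
print(len(batch))  # 128
```

Because every brand in the batch contributes M samples, each anchor has M − 1 positives and (K − 1) × M negatives to contrast against.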
### Data Augmentation
Medium-strength augmentations:
- Random horizontal flip
- Random rotation (±15°)
- Color jitter (brightness, contrast, saturation)
- Random affine transforms
- Random grayscale (10% of images)
## Troubleshooting
### Out of Memory
Reduce batch size and increase gradient accumulation:
```bash
uv run python train_clip_logo.py --config configs/jetson_orin.yaml \
    --batch-size 8 \
    --gradient-accumulation-steps 16
```
### Slow Training
Ensure mixed precision is enabled:
```bash
uv run python train_clip_logo.py --config configs/jetson_orin.yaml
# mixed_precision: true is default in jetson_orin.yaml
```
### No Improvement
Try adjusting:
- Lower learning rate: `--learning-rate 5e-6`
- Higher temperature: `--temperature 0.1`
- Different loss: edit config to use `loss_type: "combined"`
### Import Error for Fine-Tuned Model
Ensure the `training/` module is in your Python path:
```bash
export PYTHONPATH="${PYTHONPATH}:/data/dev.python/logo_test"
```
## Dependencies Added
The following were added to `pyproject.toml`:
```toml
dependencies = [
    # ...existing entries omitted...
    "peft>=0.7.0",          # LoRA support
    "pyyaml>=6.0",          # Config file parsing
    "torchvision>=0.20.0",  # Image transforms
]
```