Add CLIP fine-tuning pipeline for logo recognition

Implement contrastive learning with LoRA to fine-tune CLIP's vision
encoder on LogoDet-3K dataset for improved logo embedding similarity.
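As background, the LoRA update used here can be sketched as a trainable low-rank delta on top of a frozen linear layer. This is a minimal illustration of the technique, not the actual implementation in training/model.py (which uses the peft library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter is trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B is zero-initialized, so training starts from the base model's output
        return self.base(x) + (x @ self.A.T) @ self.B.T * self.scale
```

Because B starts at zero, wrapping a layer leaves its outputs unchanged until training begins, which is what makes LoRA safe to apply to a pretrained encoder.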

New training module (training/):
- config.py: TrainingConfig dataclass with all hyperparameters
- dataset.py: LogoContrastiveDataset with logo-level splits
- model.py: LogoFineTunedCLIP wrapper with LoRA support
- losses.py: InfoNCE, TripletLoss, SupConLoss implementations
- trainer.py: Training loop with mixed precision and checkpointing
- evaluation.py: EmbeddingEvaluator for validation metrics
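The contrastive objective in losses.py can be sketched as a symmetric InfoNCE over batched anchor/positive embedding pairs. A minimal version follows; the function name, signature, and temperature default are assumptions, not the module's actual API:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: the i-th positive is the only match for the
    i-th anchor; every other row in the batch serves as a negative."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # cosine similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric: average the anchor->positive and positive->anchor directions
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```

Pulling negatives from the rest of the batch is what lets contrastive training scale with batch size, which is why the per-GPU configs below differ mainly in batch size.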

New scripts:
- train_clip_logo.py: Main training entry point
- export_model.py: Export to HuggingFace-compatible format

Configurations:
- configs/jetson_orin.yaml: Optimized for Jetson Orin AGX
- configs/cloud_rtx4090.yaml: Optimized for 24GB cloud GPUs
- configs/cloud_a100.yaml: Optimized for 80GB cloud GPUs
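The per-hardware YAML files are expected to map onto the TrainingConfig dataclass. A hedged sketch of that pattern follows; the field names here are illustrative, not the actual hyperparameter set in training/config.py:

```python
from dataclasses import dataclass, fields

@dataclass
class TrainingConfig:
    # Illustrative subset; the real dataclass carries the full
    # hyperparameter set for training and evaluation.
    batch_size: int = 32
    learning_rate: float = 1e-4
    lora_rank: int = 8
    mixed_precision: bool = True

def config_from_dict(raw: dict) -> TrainingConfig:
    # Drop unknown YAML keys so a hardware profile can carry extra
    # metadata without breaking dataclass construction.
    known = {f.name for f in fields(TrainingConfig)}
    return TrainingConfig(**{k: v for k, v in raw.items() if k in known})
```

With this shape, each hardware profile only overrides the fields it cares about (e.g. a smaller batch size on Jetson) and inherits the dataclass defaults for the rest.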

Documentation:
- CLIP_FINETUNING.md: Training guide and usage instructions
- CLOUD_TRAINING.md: Cloud GPU recommendations and cost estimates

Modified:
- logo_detection_detr.py: Add fine-tuned model loading support
- pyproject.toml: Add peft, pyyaml, torchvision dependencies
Author: Rick McEwen
Date:   2026-01-04 13:45:25 -05:00
Commit: 44e8b6ae7d (parent 1551360028)

16 changed files with 3334 additions and 12 deletions

training/__init__.py (new file, 24 lines)
@@ -0,0 +1,24 @@
"""
CLIP fine-tuning module for logo recognition.
This module provides tools for fine-tuning CLIP's vision encoder using
contrastive learning on the LogoDet-3K dataset.
"""
from .config import TrainingConfig
from .dataset import LogoContrastiveDataset, create_dataloaders
from .model import LogoFineTunedCLIP
from .losses import InfoNCELoss, TripletLoss
from .trainer import Trainer
from .evaluation import EmbeddingEvaluator
__all__ = [
"TrainingConfig",
"LogoContrastiveDataset",
"create_dataloaders",
"LogoFineTunedCLIP",
"InfoNCELoss",
"TripletLoss",
"Trainer",
"EmbeddingEvaluator",
]