Image-level splits let the model see some images of every logo brand
during training, unlike logo-level splits, where test brands are
completely unseen. This is a less rigorous test of generalization to
new brands, but more representative of real-world use.
Changes:
- Add configs/image_level_splits.yaml with gentler training settings:
  - split_level: "image" for image-level splits
  - temperature: 0.15 (softer contrastive learning)
  - learning_rate: 5e-6 (slower learning)
  - max_epochs: 30 (more epochs)
- Update training/dataset.py:
  - Add split_level parameter to LogoDataset
  - Implement _split_images() for image-level splitting
  - Update LogoContrastiveDataset to use split-specific image mappings
- Update training/config.py:
  - Add split_level field to TrainingConfig
- Update train_clip_logo.py:
  - Pass split_level to create_dataloaders
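As a rough illustration of the difference between the two split levels, the sketch below contrasts them on a mapping from brand name to image paths. It is not the repository's `_split_images()`; the function names, the 10% validation/test fractions, and the seed are assumptions made for the example.

```python
import random


def split_by_logo(images_by_brand, val_frac=0.1, test_frac=0.1, seed=0):
    """Logo-level split: each brand lands in exactly one of train/val/test."""
    brands = sorted(images_by_brand)
    random.Random(seed).shuffle(brands)
    n_test = int(len(brands) * test_frac)
    n_val = int(len(brands) * val_frac)
    groups = {
        "test": brands[:n_test],
        "val": brands[n_test:n_test + n_val],
        "train": brands[n_test + n_val:],
    }
    return {
        name: {b: list(images_by_brand[b]) for b in members}
        for name, members in groups.items()
    }


def split_by_image(images_by_brand, val_frac=0.1, test_frac=0.1, seed=0):
    """Image-level split: every brand contributes images to each split."""
    rng = random.Random(seed)
    splits = {"train": {}, "val": {}, "test": {}}
    for brand in sorted(images_by_brand):
        images = sorted(images_by_brand[brand])
        rng.shuffle(images)
        # Hold out at least one image per brand for val and for test.
        # Brands with fewer than three images would leave train empty,
        # so a real implementation would need a rule for that case.
        n_test = max(1, int(len(images) * test_frac))
        n_val = max(1, int(len(images) * val_frac))
        splits["test"][brand] = images[:n_test]
        splits["val"][brand] = images[n_test:n_test + n_val]
        splits["train"][brand] = images[n_test + n_val:]
    return splits
```

With a logo-level split the brand sets of train and test are disjoint; with an image-level split the brand sets are identical and only the images differ, which is why test brands are no longer unseen.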
Usage:
  uv run python train_clip_logo.py --config configs/image_level_splits.yaml
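For orientation, here is a minimal sketch of how the four settings from configs/image_level_splits.yaml might be validated once loaded. The class `SketchTrainingConfig` and the helper `config_from_mapping` are hypothetical stand-ins, not the real `TrainingConfig`; the value "logo" for the other split level is also an assumption, and the real dataclass holds more fields than shown.

```python
from dataclasses import dataclass, fields

# "image" comes from the config above; "logo" is an assumed name for
# the logo-level alternative.
VALID_SPLIT_LEVELS = ("logo", "image")


@dataclass
class SketchTrainingConfig:
    """Hypothetical subset of the training hyperparameters."""
    split_level: str
    temperature: float
    learning_rate: float
    max_epochs: int

    def __post_init__(self):
        if self.split_level not in VALID_SPLIT_LEVELS:
            raise ValueError(f"unknown split_level: {self.split_level!r}")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")


def config_from_mapping(raw):
    """Build a config from a dict (e.g. parsed YAML), rejecting unknown keys."""
    known = {f.name for f in fields(SketchTrainingConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return SketchTrainingConfig(**raw)


# Values taken from the settings listed in this commit.
image_level = config_from_mapping({
    "split_level": "image",
    "temperature": 0.15,
    "learning_rate": 5e-6,
    "max_epochs": 30,
})
```

Rejecting unknown keys up front is a design choice for the sketch: a misspelled key in a YAML file then fails loudly instead of silently falling back to another value.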
Implement contrastive learning with LoRA to fine-tune CLIP's vision
encoder on the LogoDet-3K dataset for improved logo embedding similarity.
New training module (training/):
- config.py: TrainingConfig dataclass with all hyperparameters
- dataset.py: LogoContrastiveDataset with logo-level splits
- model.py: LogoFineTunedCLIP wrapper with LoRA support
- losses.py: InfoNCE, TripletLoss, SupConLoss implementations
- trainer.py: Training loop with mixed precision and checkpointing
- evaluation.py: EmbeddingEvaluator for validation metrics
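To make the role of the temperature setting concrete, the following is a small pure-Python sketch of the InfoNCE loss for a single anchor. It is not the code in losses.py, which presumably operates on batched tensors; the function name `info_nce` and the similarity values are made up for illustration, and 0.07 is used only as a comparison point for 0.15.

```python
import math


def info_nce(similarities, positive_index, temperature):
    """InfoNCE loss for one anchor.

    similarities: cosine similarities between the anchor and each candidate.
    positive_index: position of the matching (same-logo) candidate.
    """
    logits = [s / temperature for s in similarities]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-softmax probability of the positive candidate.
    return log_denominator - logits[positive_index]


# Illustrative similarities: index 0 is the positive, index 1 a hard negative.
sims = [0.8, 0.6, 0.1]
sharp = info_nce(sims, positive_index=0, temperature=0.07)
soft = info_nce(sims, positive_index=0, temperature=0.15)
```

Dividing by a larger temperature shrinks the gaps between logits, which flattens the softmax over candidates; this is the sense in which a higher temperature gives softer contrastive learning.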
New scripts:
- train_clip_logo.py: Main training entry point
- export_model.py: Export to HuggingFace-compatible format
Configurations:
- configs/jetson_orin.yaml: Optimized for Jetson AGX Orin
- configs/cloud_rtx4090.yaml: Optimized for 24GB cloud GPUs
- configs/cloud_a100.yaml: Optimized for 80GB cloud GPUs
Documentation:
- CLIP_FINETUNING.md: Training guide and usage instructions
- CLOUD_TRAINING.md: Cloud GPU recommendations and cost estimates
Modified:
- logo_detection_detr.py: Add fine-tuned model loading support
- pyproject.toml: Add peft, pyyaml, torchvision dependencies