Implement contrastive learning with LoRA to fine-tune CLIP's vision encoder on LogoDet-3K dataset for improved logo embedding similarity. New training module (training/): - config.py: TrainingConfig dataclass with all hyperparameters - dataset.py: LogoContrastiveDataset with logo-level splits - model.py: LogoFineTunedCLIP wrapper with LoRA support - losses.py: InfoNCE, TripletLoss, SupConLoss implementations - trainer.py: Training loop with mixed precision and checkpointing - evaluation.py: EmbeddingEvaluator for validation metrics New scripts: - train_clip_logo.py: Main training entry point - export_model.py: Export to HuggingFace-compatible format Configurations: - configs/jetson_orin.yaml: Optimized for Jetson Orin AGX - configs/cloud_rtx4090.yaml: Optimized for 24GB cloud GPUs - configs/cloud_a100.yaml: Optimized for 80GB cloud GPUs Documentation: - CLIP_FINETUNING.md: Training guide and usage instructions - CLOUD_TRAINING.md: Cloud GPU recommendations and cost estimates Modified: - logo_detection_detr.py: Add fine-tuned model loading support - pyproject.toml: Add peft, pyyaml, torchvision dependencies
19 lines
387 B
TOML
19 lines
387 B
TOML
[project]
|
|
name = "logo-test"
|
|
version = "0.1.0"
|
|
description = "Add your description here"
|
|
readme = "README.md"
|
|
requires-python = ">=3.12"
|
|
dependencies = [
|
|
"numpy>=2.2.6",
|
|
"opencv-python>=4.12.0.88",
|
|
"pillow>=12.0.0",
|
|
"torch>=2.9.1",
|
|
"tqdm>=4.67.1",
|
|
"transformers>=4.57.3",
|
|
"typing>=3.10.0.0",
|
|
"peft>=0.7.0",
|
|
"pyyaml>=6.0",
|
|
"torchvision>=0.20.0",
|
|
]
|