Files

Rick McEwen 44e8b6ae7d Add CLIP fine-tuning pipeline for logo recognition

Implement contrastive learning with LoRA to fine-tune CLIP's vision
encoder on LogoDet-3K dataset for improved logo embedding similarity.

New training module (training/):
- config.py: TrainingConfig dataclass with all hyperparameters
- dataset.py: LogoContrastiveDataset with logo-level splits
- model.py: LogoFineTunedCLIP wrapper with LoRA support
- losses.py: InfoNCE, TripletLoss, SupConLoss implementations
- trainer.py: Training loop with mixed precision and checkpointing
- evaluation.py: EmbeddingEvaluator for validation metrics

New scripts:
- train_clip_logo.py: Main training entry point
- export_model.py: Export to HuggingFace-compatible format

Configurations:
- configs/jetson_orin.yaml: Optimized for Jetson Orin AGX
- configs/cloud_rtx4090.yaml: Optimized for 24GB cloud GPUs
- configs/cloud_a100.yaml: Optimized for 80GB cloud GPUs

Documentation:
- CLIP_FINETUNING.md: Training guide and usage instructions
- CLOUD_TRAINING.md: Cloud GPU recommendations and cost estimates

Modified:
- logo_detection_detr.py: Add fine-tuned model loading support
- pyproject.toml: Add peft, pyyaml, torchvision dependencies

2026-01-04 13:45:25 -05:00

6.4 KiB

Raw Blame History

Cloud GPU Training for CLIP Fine-Tuning

This document provides guidance on using cloud GPU instances (e.g., RunPod) for faster CLIP fine-tuning compared to local training on Jetson Orin AGX.

Training Time Comparison

Local training on Jetson Orin AGX takes approximately 24 hours. Cloud GPUs offer significantly faster training:

GPU	VRAM	Est. Training Time	Hourly Rate	Est. Total Cost
RTX 4090	24GB	4-6 hours	$0.59/hr	$2.40-$3.50
RTX 3090	24GB	5-7 hours	$0.39/hr	$2.00-$2.75
A100 80GB	80GB	2-3 hours	$1.99/hr	$4.00-$6.00
L40S	48GB	3-4 hours	$0.89/hr	$2.70-$3.60
H100 80GB	80GB	1.5-2 hours	$1.99/hr	$3.00-$4.00

Prices from RunPod Community Cloud as of January 2025. Rates may vary.

Recommendations

Best Value: RTX 4090 ($0.59/hr)

24GB VRAM is sufficient for ViT-L/14 with LoRA
Good balance of speed and cost
Widely available on Community Cloud
Total cost: ~$3 for complete training

Best Speed: H100 80GB ($1.99/hr)

Fastest training (1.5-2 hours)
80GB VRAM allows larger batch sizes
Can increase batch_size to 32+ and reduce gradient_accumulation_steps
Total cost: ~$3-4

Budget Option: RTX 3090 ($0.39/hr)

Cheapest hourly rate
24GB VRAM works fine
Slightly slower than 4090
Total cost: ~$2-3

Cloud-Optimized Configurations

RTX 4090 / RTX 3090 (24GB VRAM)

Create configs/cloud_rtx4090.yaml:

# Optimized for 24GB VRAM cloud GPUs
base_model: "openai/clip-vit-large-patch14"

# Dataset paths
dataset_dir: "LogoDet-3K"
reference_dir: "reference_logos"
db_path: "test_data_mapping.db"

# Data splits
train_split: 0.7
val_split: 0.15
test_split: 0.15

# Larger batches for faster training
batch_size: 32
logos_per_batch: 32
samples_per_logo: 4
gradient_accumulation_steps: 4  # Effective batch = 128
num_workers: 8

# Model architecture
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
freeze_layers: 12
use_gradient_checkpointing: true

# Training
learning_rate: 1.0e-5
weight_decay: 0.01
warmup_steps: 500
max_epochs: 20
mixed_precision: true

# Loss
temperature: 0.07
loss_type: "infonce"

# Early stopping
patience: 5
min_delta: 0.001

# Output
checkpoint_dir: "checkpoints"
output_dir: "models/logo_detection/clip_finetuned"
save_every_n_epochs: 5

# Logging
log_every_n_steps: 10
eval_every_n_epochs: 1

seed: 42
use_augmentation: true
augmentation_strength: "medium"

A100 / H100 (80GB VRAM)

Create configs/cloud_a100.yaml:

# Optimized for 80GB VRAM cloud GPUs (A100, H100)
base_model: "openai/clip-vit-large-patch14"

# Dataset paths
dataset_dir: "LogoDet-3K"
reference_dir: "reference_logos"
db_path: "test_data_mapping.db"

# Data splits
train_split: 0.7
val_split: 0.15
test_split: 0.15

# Maximum batch sizes for 80GB VRAM
batch_size: 64
logos_per_batch: 32
samples_per_logo: 4
gradient_accumulation_steps: 2  # Effective batch = 128
num_workers: 8

# Model architecture (can disable gradient checkpointing with 80GB)
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
freeze_layers: 12
use_gradient_checkpointing: false  # Not needed with 80GB

# Training
learning_rate: 1.0e-5
weight_decay: 0.01
warmup_steps: 500
max_epochs: 20
mixed_precision: true

# Loss
temperature: 0.07
loss_type: "infonce"

# Early stopping
patience: 5
min_delta: 0.001

# Output
checkpoint_dir: "checkpoints"
output_dir: "models/logo_detection/clip_finetuned"
save_every_n_epochs: 5

# Logging
log_every_n_steps: 10
eval_every_n_epochs: 1

seed: 42
use_augmentation: true
augmentation_strength: "medium"

RunPod Quick Start

1. Create a Pod

Go to RunPod
Select GPU (RTX 4090 recommended)
Choose PyTorch template (CUDA 12.x)
Set volume size: 50GB (for dataset + models)

2. Setup Environment

# Connect via SSH or web terminal

# Install dependencies
pip install peft pyyaml torchvision transformers tqdm pillow

# Clone your repository (or upload files)
git clone <your-repo-url>
cd logo_test

# Or use runpodctl to sync files
# runpodctl send logo_test/

3. Prepare Data

If data isn't already prepared:

# This creates reference_logos/ and test_data_mapping.db
python prepare_test_data.py

4. Run Training

# For RTX 4090
python train_clip_logo.py --config configs/cloud_rtx4090.yaml

# For A100/H100
python train_clip_logo.py --config configs/cloud_a100.yaml

# Or with command-line overrides
python train_clip_logo.py --config configs/jetson_orin.yaml \
    --batch-size 32 \
    --gradient-accumulation-steps 4 \
    --num-workers 8

5. Download Results

# Export the trained model
python export_model.py \
    --checkpoint checkpoints/best.pt \
    --output models/logo_detection/clip_finetuned

# Download to local machine
# Option 1: Use runpodctl
runpodctl receive models/logo_detection/clip_finetuned

# Option 2: SCP
scp -r root@<pod-ip>:/workspace/logo_test/models/logo_detection/clip_finetuned ./

# Option 3: Compress and download via web
tar -czvf clip_finetuned.tar.gz models/logo_detection/clip_finetuned

Cost Optimization Tips

Use Spot/Interruptible Instances

Community Cloud GPUs are already cheaper
Some providers offer spot pricing for additional savings
Save checkpoints frequently (save_every_n_epochs: 2)

Minimize Storage Costs

RunPod charges $0.10/GB/month for container disk
Use network volumes only if needed
Delete pods when training completes

Monitor Training

Watch for early convergence (may finish before 20 epochs)
Early stopping will save time/cost if no improvement

Batch Training Runs

Test configuration locally first (1-2 epochs)
Run full training on cloud only when config is validated

Cost Comparison Summary

Option	Time	Cost	Best For
Jetson Orin (local)	~24 hrs	Free*	No cloud dependency
RTX 3090 (RunPod)	~6 hrs	~$2.50	Lowest cost
RTX 4090 (RunPod)	~5 hrs	~$3.00	Best value
L40S (RunPod)	~3.5 hrs	~$3.00	Good balance
A100 80GB (RunPod)	~2.5 hrs	~$5.00	Large batches
H100 80GB (RunPod)	~1.5 hrs	~$3.50	Fastest

*Local training has electricity cost but no cloud fees.

6.4 KiB Raw Blame History