# Cloud GPU Training for CLIP Fine-Tuning

This document provides guidance on using cloud GPU instances (e.g., RunPod) to fine-tune CLIP faster than local training on a Jetson AGX Orin.

## Training Time Comparison

Local training on the Jetson AGX Orin takes approximately 24 hours. Cloud GPUs are significantly faster. The estimated total cost is the hourly rate multiplied by the estimated training time:

| GPU | VRAM | Est. Training Time | Hourly Rate | Est. Total Cost |
|-----|------|--------------------|-------------|-----------------|
| **RTX 4090** | 24GB | 4-6 hours | $0.59/hr | **$2.40-$3.50** |
| **RTX 3090** | 24GB | 5-7 hours | $0.39/hr | **$2.00-$2.75** |
| **A100 80GB** | 80GB | 2-3 hours | $1.99/hr | **$4.00-$6.00** |
| **L40S** | 48GB | 3-4 hours | $0.89/hr | **$2.70-$3.60** |
| **H100 80GB** | 80GB | 1.5-2 hours | $1.99/hr | **$3.00-$4.00** |

*Prices from RunPod Community Cloud as of January 2025. Rates may vary.*

## Recommendations

### Best Value: RTX 4090 ($0.59/hr)

- 24GB VRAM is sufficient for ViT-L/14 with LoRA
- Good balance of speed and cost
- Widely available on Community Cloud
- **Total cost: ~$3 for complete training**

### Best Speed: H100 80GB ($1.99/hr)

- Fastest training (1.5-2 hours)
- 80GB VRAM allows larger batch sizes
- Can increase `batch_size` to 64 and reduce `gradient_accumulation_steps` to 2, as in the 80GB configuration below
- **Total cost: ~$3-4**

### Budget Option: RTX 3090 ($0.39/hr)

- Cheapest hourly rate
- 24GB VRAM is sufficient, as with the RTX 4090
- Slightly slower than the RTX 4090
- **Total cost: ~$2-3**

## Cloud-Optimized Configurations

### RTX 4090 / RTX 3090 (24GB VRAM)

Create `configs/cloud_rtx4090.yaml`:

```yaml
# Optimized for 24GB VRAM cloud GPUs
base_model: "openai/clip-vit-large-patch14"

# Dataset paths
dataset_dir: "LogoDet-3K"
reference_dir: "reference_logos"
db_path: "test_data_mapping.db"

# Data splits
train_split: 0.7
val_split: 0.15
test_split: 0.15

# Larger batches for faster training
batch_size: 32
logos_per_batch: 32
samples_per_logo: 4
gradient_accumulation_steps: 4  # Effective batch = 128
num_workers: 8

# Model architecture
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
freeze_layers: 12
use_gradient_checkpointing: true

# Training
learning_rate: 1.0e-5
weight_decay: 0.01
warmup_steps: 500
max_epochs: 20
mixed_precision: true

# Loss
temperature: 0.07
loss_type: "infonce"

# Early stopping
patience: 5
min_delta: 0.001

# Output
checkpoint_dir: "checkpoints"
output_dir: "models/logo_detection/clip_finetuned"
save_every_n_epochs: 5

# Logging
log_every_n_steps: 10
eval_every_n_epochs: 1

seed: 42
use_augmentation: true
augmentation_strength: "medium"
```

### A100 / H100 (80GB VRAM)

Create `configs/cloud_a100.yaml`:

```yaml
# Optimized for 80GB VRAM cloud GPUs (A100, H100)
base_model: "openai/clip-vit-large-patch14"

# Dataset paths
dataset_dir: "LogoDet-3K"
reference_dir: "reference_logos"
db_path: "test_data_mapping.db"

# Data splits
train_split: 0.7
val_split: 0.15
test_split: 0.15

# Maximum batch sizes for 80GB VRAM
batch_size: 64
logos_per_batch: 32
samples_per_logo: 4
gradient_accumulation_steps: 2  # Effective batch = 128
num_workers: 8

# Model architecture (can disable gradient checkpointing with 80GB)
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
freeze_layers: 12
use_gradient_checkpointing: false  # Not needed with 80GB

# Training
learning_rate: 1.0e-5
weight_decay: 0.01
warmup_steps: 500
max_epochs: 20
mixed_precision: true

# Loss
temperature: 0.07
loss_type: "infonce"

# Early stopping
patience: 5
min_delta: 0.001

# Output
checkpoint_dir: "checkpoints"
output_dir: "models/logo_detection/clip_finetuned"
save_every_n_epochs: 5

# Logging
log_every_n_steps: 10
eval_every_n_epochs: 1

seed: 42
use_augmentation: true
augmentation_strength: "medium"
```

## RunPod Quick Start

### 1. Create a Pod

1. Go to [RunPod](https://www.runpod.io/)
2. Select a GPU (RTX 4090 recommended)
3. Choose a PyTorch template (CUDA 12.x)
4. Set the volume size to 50GB (for dataset + models)

### 2. Setup Environment

```bash
# Connect via SSH or web terminal

# Install dependencies
pip install peft pyyaml torchvision transformers tqdm pillow

# Clone your repository (or upload files)
git clone <your-repo-url>
cd logo_test

# Or use runpodctl to sync files
# runpodctl send logo_test/
```

### 3. Prepare Data

If the data isn't already prepared:

```bash
# This creates reference_logos/ and test_data_mapping.db
python prepare_test_data.py
```

### 4. Run Training

```bash
# For RTX 4090
python train_clip_logo.py --config configs/cloud_rtx4090.yaml

# For A100/H100
python train_clip_logo.py --config configs/cloud_a100.yaml

# Or with command-line overrides
python train_clip_logo.py --config configs/jetson_orin.yaml \
    --batch-size 32 \
    --gradient-accumulation-steps 4 \
    --num-workers 8
```

### 5. Download Results

```bash
# Export the trained model (on the pod)
python export_model.py \
    --checkpoint checkpoints/best.pt \
    --output models/logo_detection/clip_finetuned

# Download to local machine
# Option 1: Use runpodctl
# On the pod, `send` prints a one-time code:
runpodctl send models/logo_detection/clip_finetuned
# On the local machine, pass that code to `receive`:
# runpodctl receive <code>

# Option 2: SCP (run from the local machine)
scp -r root@<pod-ip>:/workspace/logo_test/models/logo_detection/clip_finetuned ./

# Option 3: Compress (on the pod) and download via web
tar -czvf clip_finetuned.tar.gz models/logo_detection/clip_finetuned
```

## Cost Optimization Tips

### Use Spot/Interruptible Instances

- Community Cloud GPUs are already cheaper than Secure Cloud
- Some providers offer spot pricing for additional savings
- Save checkpoints frequently (e.g., `save_every_n_epochs: 2` instead of 5) so that an interrupted pod loses little work

### Minimize Storage Costs

- RunPod charges $0.10/GB/month for container disk
- Use network volumes only if needed
- Delete pods when training completes

### Monitor Training

- Watch for early convergence (training may finish before 20 epochs)
- Early stopping (`patience: 5`) saves time and cost when validation stops improving

### Batch Training Runs

- Test the configuration locally first (1-2 epochs)
- Run full training on the cloud only once the config is validated

## Cost Comparison Summary

| Option | Time | Cost | Best For |
|--------|------|------|----------|
| Jetson AGX Orin (local) | ~24 hrs | Free\* | No cloud dependency |
| RTX 3090 (RunPod) | ~6 hrs | ~$2.50 | Lowest cost |
| RTX 4090 (RunPod) | ~5 hrs | ~$3.00 | Best value |
| L40S (RunPod) | ~3.5 hrs | ~$3.00 | Good balance |
| A100 80GB (RunPod) | ~2.5 hrs | ~$5.00 | Large batches |
| H100 80GB (RunPod) | ~1.75 hrs | ~$3.50 | Fastest |

\*Local training has an electricity cost but no cloud fees.

## References

- [RunPod Pricing](https://www.runpod.io/pricing)
- [RunPod RTX 4090](https://www.runpod.io/gpu-models/rtx-4090)
- [RunPod Documentation](https://docs.runpod.io/)
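
## Appendix: Checking the Cost Estimates

The cost figures in this document are the hourly rate multiplied by the estimated training time, and the "effective batch" comments in the configs are `batch_size` multiplied by `gradient_accumulation_steps`. The following is a minimal sketch for re-deriving both when rates or time estimates change. The rates and hour ranges are copied from the Training Time Comparison table; the names `GPU_OPTIONS`, `estimate_cost`, and `effective_batch_size` are hypothetical helpers for illustration and are not part of the training scripts.

```python
# Hourly rate ($/hr) and estimated training-time range (hours), copied from
# the Training Time Comparison table. Update these when prices change.
GPU_OPTIONS = {
    "RTX 4090": (0.59, 4.0, 6.0),
    "RTX 3090": (0.39, 5.0, 7.0),
    "A100 80GB": (1.99, 2.0, 3.0),
    "L40S": (0.89, 3.0, 4.0),
    "H100 80GB": (1.99, 1.5, 2.0),
}


def estimate_cost(hourly_rate, min_hours, max_hours):
    """Return the (low, high) rental cost in dollars, rounded to cents."""
    return (round(hourly_rate * min_hours, 2), round(hourly_rate * max_hours, 2))


def effective_batch_size(batch_size, gradient_accumulation_steps):
    """Number of samples contributing to each optimizer step."""
    return batch_size * gradient_accumulation_steps


if __name__ == "__main__":
    for name, (rate, min_h, max_h) in GPU_OPTIONS.items():
        low, high = estimate_cost(rate, min_h, max_h)
        print(f"{name:<10} ${low:.2f} - ${high:.2f}")
    # Values taken from the two cloud configs in this document.
    print("24GB config effective batch:", effective_batch_size(32, 4))
    print("80GB config effective batch:", effective_batch_size(64, 2))
```

The sketch covers GPU rental time only; storage charges and time spent on setup or data preparation are not included.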