The hybrid approach combined OCR text recognition with CLIP embeddings
to improve logo matching accuracy. After extensive testing, the approach
was abandoned because:
1. OCR quality on small logo crops is unreliable
2. Text filtering rejected correct matches as often as wrong ones
3. Best hybrid result (57.1% precision) was similar to baseline (55.1%)
4. Recall dropped significantly (52.6% vs 59.6%)
5. Added complexity (EasyOCR dependency, extra parameters) wasn't justified
Removed:
- Hybrid matching methods from DetectLogosDETR class
- Text extraction and similarity methods
- Hybrid test scripts and text_recognition.py module
- Hybrid-related CLI arguments from test_logo_detection.py
The baseline multi-ref matching with 0.70 threshold remains the
recommended approach for logo detection.
Hybrid matching combines text recognition with CLIP similarity:
- If reference logo has text and detection matches: lower CLIP threshold
- If reference has text but detection doesn't match: higher threshold
- If reference has no text: standard threshold
Image preprocessing adds letterbox/stretch modes for CLIP input to
preserve aspect ratio instead of center cropping.
New files:
- run_hybrid_test.sh: Test hybrid matching configurations
- run_preprocess_test.sh: Compare preprocessing modes
Changes to logo_detection_detr.py:
- Add preprocess_mode parameter (default/letterbox/stretch)
- Add set_text_detector() for hybrid matching
- Add extract_text() using EasyOCR
- Add compute_text_similarity() with fuzzy matching
- Add find_best_match_hybrid() with tiered thresholds
Changes to test_logo_detection.py:
- Add --matching-method hybrid option
- Add --preprocess-mode option
- Add hybrid threshold arguments
Implement contrastive learning with LoRA to fine-tune CLIP's vision
encoder on LogoDet-3K dataset for improved logo embedding similarity.
New training module (training/):
- config.py: TrainingConfig dataclass with all hyperparameters
- dataset.py: LogoContrastiveDataset with logo-level splits
- model.py: LogoFineTunedCLIP wrapper with LoRA support
- losses.py: InfoNCE, TripletLoss, SupConLoss implementations
- trainer.py: Training loop with mixed precision and checkpointing
- evaluation.py: EmbeddingEvaluator for validation metrics
New scripts:
- train_clip_logo.py: Main training entry point
- export_model.py: Export to HuggingFace-compatible format
Configurations:
- configs/jetson_orin.yaml: Optimized for Jetson Orin AGX
- configs/cloud_rtx4090.yaml: Optimized for 24GB cloud GPUs
- configs/cloud_a100.yaml: Optimized for 80GB cloud GPUs
Documentation:
- CLIP_FINETUNING.md: Training guide and usage instructions
- CLOUD_TRAINING.md: Cloud GPU recommendations and cost estimates
Modified:
- logo_detection_detr.py: Add fine-tuned model loading support
- pyproject.toml: Add peft, pyyaml, torchvision dependencies
- Update DetectLogosDETR to support both CLIP and DINOv2 models
- Rename clip_model parameter to embedding_model
- Add model type detection for different embedding extraction
- DINOv2 uses CLS token, CLIP uses get_image_features()
- Add -e/--embedding-model argument to test_logo_detection.py
- Include model name in file output header
- Add run_threshold_tests.sh for testing various threshold/margin values
- Add run_model_comparison.sh for comparing CLIP vs DINOv2 models
- Add find_all_matches() method to DetectLogosDETR that returns all
logos above similarity threshold without any rejection logic
- Add --matching-method simple option to test script
- Update run_comparison_tests.sh to include simple matching as Test 1
- Update documentation to describe simple matching method
The multi-ref matching method was missing a margin check against other
logos, causing excessive false positives. This fix adds:
- margin parameter to find_best_match_multi_ref() that requires the
best logo's score to exceed the second-best by a minimum margin
- Test script now passes --margin to both matching methods
- Updated documentation to reflect margin applies to both methods
Also adds run_comparison_tests.sh to run all three matching methods
and compare results.
Add DETR+CLIP based logo detection library and test framework:
- DetectLogosDETR class for logo detection and matching
- Test script with margin-based and multi-ref matching methods
- Data preparation script for test database
- Documentation for API usage and test methodology