Document threshold tuning for fine-tuned CLIP model
- Add threshold selection section with similarity distribution analysis
- Document that fine-tuned model needs threshold 0.82 (vs baseline 0.75)
- Add table comparing baseline vs fine-tuned distributions
- Update test commands to include correct thresholds
- Reference analyze_similarity_distribution.sh for threshold optimization
@@ -114,9 +114,12 @@ min_delta: 0.001

### Test Fine-Tuned Model

**Important**: The fine-tuned model requires a higher threshold (0.82) than baseline (0.75).

```bash
uv run python test_logo_detection.py -n 50 \
    -e models/logo_detection/clip_finetuned \
    -t 0.82 \
    --matching-method multi-ref \
    --seed 42
```

@@ -124,26 +127,58 @@ uv run python test_logo_detection.py -n 50 \

### Compare with Baseline

```bash
# Baseline CLIP (threshold 0.75)
uv run python test_logo_detection.py -n 50 \
    -e openai/clip-vit-large-patch14 \
    -t 0.75 \
    --matching-method multi-ref \
    --seed 42

# Fine-tuned model (threshold 0.82)
uv run python test_logo_detection.py -n 50 \
    -e models/logo_detection/clip_finetuned \
    -t 0.82 \
    --matching-method multi-ref \
    --seed 42
```

### Threshold Selection

The fine-tuned model requires a **higher similarity threshold** than baseline CLIP. Contrastive fine-tuning pushed similarities for non-matching logos much lower, which changed the score distribution and moved the optimal threshold.

#### Similarity Distribution Analysis

| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| Wrong logos mean similarity | 0.66 | **0.44** |
| Wrong logos above 0.75 | 23.2% | **0.6%** |
| Correct logos mean similarity | 0.75 | 0.64 |
| Optimal threshold | 0.756 | **0.819** |
| F1 at optimal threshold | 67.1% | **71.9%** |

**Key insight**: The fine-tuned model dramatically reduced similarities to wrong logos (mean 0.66 to 0.44). At threshold 0.75 it therefore rejects far more non-matches than baseline, but it still needs a higher threshold to avoid false positives from scores that bunch up just above 0.75. The 0.82 used in the test commands is the optimal threshold of 0.819 rounded to two decimals.
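
The distribution metrics in the table can be computed from raw similarity scores roughly as follows. This is a minimal sketch in plain Python, not the project's analysis code: the `summarize` helper and the toy score lists are hypothetical, and the reported numbers come from the repository's own tooling.

```python
from statistics import mean


def summarize(correct_scores, wrong_scores, threshold=0.75):
    """Summarize similarity scores for matching and non-matching logo pairs.

    Hypothetical helper for illustration only; both arguments are plain
    lists of similarity scores.
    """
    # Count non-matching pairs that would still pass the threshold.
    above = sum(1 for s in wrong_scores if s > threshold)
    return {
        "wrong_mean": mean(wrong_scores),
        "wrong_above_threshold": above / len(wrong_scores),
        "correct_mean": mean(correct_scores),
    }


# Toy scores invented for the example (not measurements from either model).
correct = [0.90, 0.80, 0.70, 0.60]
wrong = [0.80, 0.50, 0.40, 0.30]

stats = summarize(correct, wrong, threshold=0.75)
print(stats)
```

Comparing `wrong_above_threshold` between two models at the same threshold is the kind of check behind the "Wrong logos above 0.75" row.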

#### Analyze Similarity Distribution

To find the optimal threshold for your model:

```bash
# Run detailed similarity analysis
./analyze_similarity_distribution.sh --model finetuned

# Or analyze both models
./analyze_similarity_distribution.sh --model both
```

This outputs distribution statistics and suggests an optimal threshold based on the data.
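
One common way to suggest a threshold from such data is to sweep candidate values and keep the one with the highest F1. The sketch below illustrates that idea under stated assumptions; it is not the contents of `analyze_similarity_distribution.sh`, and `f1_at` and `best_threshold` are hypothetical names. A score counts as a predicted match when it is at or above the threshold.

```python
def f1_at(threshold, correct_scores, wrong_scores):
    """F1 when every score >= threshold is accepted as a match."""
    tp = sum(1 for s in correct_scores if s >= threshold)
    fp = sum(1 for s in wrong_scores if s >= threshold)
    fn = len(correct_scores) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(correct_scores, wrong_scores, steps=100):
    """Sweep thresholds 0.00, 0.01, ..., 1.00 and return (threshold, f1).

    Ties resolve to the lowest threshold, because max() keeps the first
    maximum it encounters.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        ((t, f1_at(t, correct_scores, wrong_scores)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy scores invented for the example (not measurements from either model).
correct = [0.90, 0.85, 0.80, 0.60]
wrong = [0.78, 0.50, 0.40, 0.30]

threshold, f1 = best_threshold(correct, wrong)
print(f"suggested threshold: {threshold:.2f}, F1: {f1:.3f}")
```

As a sanity check on the formula, the baseline's roughly 49% precision and 77% recall give 2 × 0.49 × 0.77 / (0.49 + 0.77) ≈ 0.60, in line with the F1 listed under Expected Metrics.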

### Expected Metrics

| Metric | Baseline (t=0.75) | Fine-tuned (t=0.82) |
|--------|-------------------|---------------------|
| Precision | ~49% | >65% |
| Recall | ~77% | >70% |
| F1 Score | ~60% | >70% |

Before this commit, the columns were labeled "Baseline CLIP" and "Target (Fine-tuned)", and the fine-tuned targets were >70% precision, >75% recall, and >72% F1.

Training metrics to monitor:
- Mean positive similarity: target > 0.85