# Logo Detection Test Framework

A testing framework for evaluating logo detection accuracy using DETR (DEtection TRansformer) and CLIP (Contrastive Language-Image Pre-training) models.

## Overview

This project provides tools to:

- Detect logos in images using a fine-tuned DETR model
- Match detected logos against reference images using CLIP embeddings
- Evaluate detection accuracy with precision, recall, and F1 metrics

## Architecture

The system uses a two-stage pipeline:

1. **DETR** - Identifies potential logo regions (bounding boxes) in images
2. **CLIP** - Extracts feature embeddings for each detected region and compares them against reference logos

## Installation

Requires Python 3.12+. Uses [uv](https://github.com/astral-sh/uv) for package management.

```bash
# Install dependencies
uv sync

# Or using pip
pip install -r requirements.txt
```

## Usage

### Prepare Test Data

The test framework requires the **LogoDet-3K** dataset. Download it and place it in the project directory:

```
logo_test/
├── LogoDet-3K/          # Dataset directory (required)
│   ├── Clothes/         # Category directories
│   │   ├── Adidas/      # Brand directories with images + XML annotations
│   │   ├── Nike/
│   │   └── ...
│   ├── Electronic/
│   ├── Food/
│   └── ...
```

The dataset should contain images with corresponding Pascal VOC format XML annotation files that define logo bounding boxes.

Then run the preparation script:

```bash
uv run python prepare_test_data.py
```

This script:

1. Scans `LogoDet-3K/` for images and XML annotation files
2. Extracts cropped logo regions using bounding box data → saves to `reference_logos/`
3. Copies full images → saves to `test_images/`
4. Creates `test_data_mapping.db`, a SQLite database with ground truth mappings

### Run Detection Tests

```bash
# Basic test with default settings (margin-based matching)
uv run python test_logo_detection.py

# Test with more logos and a custom threshold
uv run python test_logo_detection.py -n 20 --threshold 0.75

# Use the multi-ref matching method
uv run python test_logo_detection.py --matching-method multi-ref \
    --refs-per-logo 5 --min-matching-refs 2

# Reproducible test with a fixed seed
uv run python test_logo_detection.py -n 50 --seed 42
```

### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-n, --num-logos` | 10 | Number of reference logos to sample |
| `-t, --threshold` | 0.7 | Similarity threshold for matching |
| `-d, --detr-threshold` | 0.5 | DETR detection confidence threshold |
| `-e, --embedding-model` | `openai/clip-vit-large-patch14` | Embedding model (CLIP or DINOv2) |
| `--matching-method` | margin | Matching method: `simple`, `margin`, or `multi-ref` |
| `--margin` | 0.05 | Margin over second-best match (margin/multi-ref) |
| `--refs-per-logo` | 3 | Reference images per logo |
| `--min-matching-refs` | 1 | Minimum refs that must match (multi-ref only) |
| `--use-max-similarity` | False | Use max instead of mean similarity (multi-ref only) |
| `--positive-samples` | 5 | Positive test images per logo |
| `--negative-samples` | 20 | Negative test images per logo |
| `-s, --seed` | None | Random seed for reproducibility |
| `--output-file` | None | Append results summary to file (clean output) |
| `--clear-cache` | False | Clear embedding cache before running |

**Matching Methods** (illustrated in the sketch below):

- `simple` - Returns all logos above the threshold (baseline, most permissive)
- `margin` - Requires a margin over the second-best match (reduces false positives)
- `multi-ref` - Aggregates scores across multiple reference images per logo

See `--help` for all options.
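As a rough illustration of how the three strategies differ, here is a minimal, hypothetical sketch that applies them to per-reference cosine-similarity scores. The function and parameter names are illustrative only and are not taken from `test_logo_detection.py`; in particular, the assumption that `margin` and `multi-ref` still apply the base threshold is mine.

```python
# Hypothetical sketch of the simple / margin / multi-ref matching strategies.
# `scores_by_logo` maps each candidate logo name to a list of cosine
# similarities, one per reference image of that logo.
from statistics import mean


def match_logo(scores_by_logo, method="margin", threshold=0.7, margin=0.05,
               min_matching_refs=1, use_max_similarity=False):
    # Collapse each logo's per-reference scores to a single aggregated score.
    if use_max_similarity:
        aggregated = {logo: max(s) for logo, s in scores_by_logo.items()}
    else:
        aggregated = {logo: mean(s) for logo, s in scores_by_logo.items()}

    if method == "simple":
        # Baseline: every logo whose aggregated score clears the threshold.
        return [logo for logo, s in aggregated.items() if s >= threshold]

    ranked = sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
    best_logo, best_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0

    if method == "margin":
        # Accept the best logo only if it clears the threshold AND beats the
        # runner-up by at least `margin`.
        if best_score >= threshold and best_score - second_score >= margin:
            return [best_logo]
        return []

    if method == "multi-ref":
        # As above, but additionally require that enough individual
        # reference images matched on their own.
        matching_refs = sum(1 for s in scores_by_logo[best_logo] if s >= threshold)
        if (best_score >= threshold
                and best_score - second_score >= margin
                and matching_refs >= min_matching_refs):
            return [best_logo]
        return []

    raise ValueError(f"unknown method: {method}")


# Example: Nike clearly wins and two of its references clear the threshold.
scores = {"Nike": [0.82, 0.79, 0.75], "Adidas": [0.61, 0.58, 0.55]}
print(match_logo(scores, method="multi-ref", min_matching_refs=2))  # ['Nike']
```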
### Run Comparison Tests

```bash
# Compare all matching methods
./run_comparison_tests.sh

# Test various threshold/margin combinations
./run_threshold_tests.sh

# Compare embedding models (CLIP vs DINOv2)
./run_model_comparison.sh
```

| Script | Purpose | Output File |
|--------|---------|-------------|
| `run_comparison_tests.sh` | Compare all matching methods | `comparison_results.txt` |
| `run_threshold_tests.sh` | Test threshold/margin combinations | `threshold_test_results.txt` |
| `run_model_comparison.sh` | Compare CLIP vs DINOv2 models | `model_comparison_results.txt` |

## Project Structure

```
logo_test/
├── logo_detection_detr.py               # Core detection library (DetectLogosDETR class)
├── test_logo_detection.py               # Test script for accuracy evaluation
├── prepare_test_data.py                 # Script to prepare the test database
├── run_comparison_tests.sh              # Compare all matching methods
├── run_threshold_tests.sh               # Test threshold/margin combinations
├── run_model_comparison.sh              # Compare CLIP vs DINOv2 models
├── test_data_mapping.db                 # SQLite database with ground truth
├── reference_logos/                     # Reference logo images (not in git)
├── test_images/                         # Test images (not in git)
├── LogoDet-3K/                          # Source dataset (not in git)
├── logo_detection_detr_usage.md         # API usage guide
├── logo_detection_test_methodology.md   # Test methodology documentation
└── test_results_analysis.md             # Analysis of test results
```

## Accuracy Improvement Techniques

The framework implements several techniques to improve detection accuracy:

1. **Non-Maximum Suppression (NMS)** - Removes overlapping duplicate detections
2. **Minimum Box Size Filtering** - Filters out noise from tiny detections
3. **Confidence Threshold Filtering** - Removes low-confidence detections
4. **Multiple Reference Images** - Uses multiple refs per logo for robust matching
5. **Margin-Based Matching** - Requires a confidence margin over the second-best match
6. **Multi-Ref Matching** - Aggregates similarity scores across references
7. **Embedding Caching** - Caches embeddings to avoid recomputation

## Models

### Detection Model

- **DETR**: `Pravallika6/detr-finetuned-logo-detection_v2`

### Embedding Models (selectable via `-e/--embedding-model`)

| Model | Type | Description |
|-------|------|-------------|
| `openai/clip-vit-large-patch14` | CLIP | Default. General-purpose vision-language model |
| `openai/clip-vit-base-patch32` | CLIP | Smaller, faster CLIP variant |
| `facebook/dinov2-small` | DINOv2 | Self-supervised, good for visual similarity |
| `facebook/dinov2-base` | DINOv2 | Larger DINOv2 variant |
| `facebook/dinov2-large` | DINOv2 | Largest DINOv2 variant |

Models are automatically downloaded from HuggingFace on first run and cached in `~/.cache/huggingface/`. A rough sketch of how the detection and embedding models fit together appears in the appendix at the end of this README.

**Note**: When switching between embedding models, use `--clear-cache` to ensure embeddings are recomputed with the new model.

## Documentation

- [API Usage Guide](logo_detection_detr_usage.md) - How to use the DetectLogosDETR class
- [Test Methodology](logo_detection_test_methodology.md) - Detailed explanation of the test framework and tuning

## License

MIT
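## Appendix: Two-Stage Pipeline Sketch

The sketch below shows how the detection and embedding models listed above can be combined into the two-stage pipeline described in the Architecture section, using standard HuggingFace `transformers` APIs. It is a minimal, hypothetical example, not the actual `DetectLogosDETR` implementation (see the [API Usage Guide](logo_detection_detr_usage.md) for that); loading the checkpoints directly with the Auto classes is an assumption.

```python
# Minimal sketch: DETR proposes logo regions, CLIP embeds each region so it
# can later be compared against reference-logo embeddings.
import torch
from PIL import Image
from transformers import (AutoImageProcessor, AutoModelForObjectDetection,
                          CLIPModel, CLIPProcessor)

DETR_ID = "Pravallika6/detr-finetuned-logo-detection_v2"
CLIP_ID = "openai/clip-vit-large-patch14"

detr_processor = AutoImageProcessor.from_pretrained(DETR_ID)
detr_model = AutoModelForObjectDetection.from_pretrained(DETR_ID)
clip_processor = CLIPProcessor.from_pretrained(CLIP_ID)
clip_model = CLIPModel.from_pretrained(CLIP_ID)


def detect_and_embed(image_path, detr_threshold=0.5):
    """Return (bounding box, CLIP embedding) pairs for candidate logo regions."""
    image = Image.open(image_path).convert("RGB")

    # Stage 1: DETR proposes candidate logo regions.
    inputs = detr_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = detr_model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    detections = detr_processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=detr_threshold)[0]

    # Stage 2: CLIP embeds each cropped region for similarity matching.
    results = []
    for box in detections["boxes"]:
        crop = image.crop([round(v) for v in box.tolist()])
        clip_inputs = clip_processor(images=crop, return_tensors="pt")
        with torch.no_grad():
            embedding = clip_model.get_image_features(**clip_inputs)
        embedding = embedding / embedding.norm(dim=-1, keepdim=True)  # unit-normalize
        results.append((box.tolist(), embedding.squeeze(0)))
    return results
```

Matching then reduces to computing cosine similarity between each region embedding and the cached reference-logo embeddings, and applying one of the matching methods described under Key Parameters.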