Add accuracy test framework, prompts, results, and analysis reports
Includes accuracy test scripts for Qwen (local) and Gemini (cloud API), three prompt variants (original, capstone, constrained), test results from all runs, and two analysis reports with an HTML presentation version.
1
.python-version
Normal file
@@ -0,0 +1 @@
3.12
170
accuracy_analysis_report.md
Normal file
@@ -0,0 +1,170 @@
# Jersey Color Detection Accuracy Analysis

## Test Configuration

- **Models tested:** Gemini 3 Flash Preview (cloud API), Qwen3-VL-8B (local, via llama.cpp)
- **Prompts tested:** `jersey_prompt.txt` (original), `jersey_prompt_capstone.txt` (capstone)
- **Test images:** 161 annotated basketball jersey images
- **Ground truth colors:** 202 (excluding white)
- **Images resized** to max 768px wide before submission
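
The resize step in the last bullet can be sketched as a pure size calculation. This is only an illustration: the function name `fit_to_max_width` is hypothetical, and it assumes that images already at or below 768px wide are left untouched and that the aspect ratio is preserved. The test scripts may do this differently.

```python
def fit_to_max_width(width: int, height: int, max_width: int = 768) -> tuple[int, int]:
    """Return (width, height) scaled down so that width <= max_width.

    Assumptions (not confirmed by the report): the aspect ratio is kept
    and images that are already narrow enough are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width <= max_width:
        return width, height
    scale = max_width / width
    # Guard against a zero-pixel height for extremely wide images.
    return max_width, max(1, round(height * scale))


print(fit_to_max_width(1536, 1024))  # (768, 512)
print(fit_to_max_width(640, 480))    # (640, 480)
```

The resulting size would then be handed to whatever imaging library performs the actual resampling.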

---

## Summary Comparison

| Metric | Gemini + Original | Gemini + Capstone | Qwen + Original | Qwen + Capstone |
|----------------------------|:-----------------:|:-----------------:|:----------------:|:---------------:|
| **Recall (exact)** | 64.4% | 60.9% | 64.4% | 65.8% |
| **Recall (exact+similar)** | **81.2%** | 78.2% | 77.2% | 77.7% |
| **Recall (missed)** | 18.8% | 21.8% | 22.8% | 22.3% |
| **Precision (exact)** | 74.7% | 70.7% | 70.7% | 73.9% |
| **Precision (exact+sim.)** | **93.7%** | 90.2% | 84.8% | 87.2% |
| **Extra/wrong** | **6.3%** | 9.8% | 15.2% | 12.8% |
| PASS images | **124** | 118 | 117 | 119 |
| PARTIAL images | 19 | 21 | 18 | 19 |
| FAIL images | **18** | 22 | 26 | 23 |
| Avg time per image | 13.3s | 11.7s | 9.5s | 8.9s |

### Key Takeaways

1. **Gemini + original prompt is the best combination** on nearly every metric: highest exact+similar recall (81.2%), highest exact+similar precision (93.7%), fewest failures (18), and fewest extra/wrong colors (6.3%). The one exception is exact recall, where Qwen + capstone is marginally ahead (65.8% vs. 64.4%).

2. **Exact recall is remarkably stable** across all four runs (60.9%–65.8%), suggesting ~35% of ground truth colors are inherently difficult for current VLMs regardless of model or prompt.

3. **Gemini produces far fewer hallucinated colors** than Qwen. Gemini's extra/wrong rate is 6.3%–9.8% vs. Qwen's 12.8%–15.2%. When Gemini reports a color, it is almost always the correct color or one in the same family.

4. **The capstone prompt did not improve results** for either model. For Gemini it degraded both recall and precision. For Qwen the difference was negligible.

5. **Qwen takes roughly 24–29% less time per image** (8.9–9.5s vs. 11.7–13.3s) but at the cost of lower accuracy and more false positives.
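
The percentages in the summary table can be related to raw counts with a small sketch. The definitions below are an assumption inferred from the table itself, where "exact+similar" recall and "missed" sum to 100%, as do "exact+sim." precision and "extra/wrong". The names `RunCounts` and `summarize` are hypothetical, and the counts are illustrative rather than taken from the result files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCounts:
    ground_truth: int  # annotated colors in the test set
    exact: int         # reported colors that match a ground truth term exactly
    similar: int       # reported colors in the same family as a ground truth term
    extra: int         # reported colors that match nothing (extra/wrong)

    @property
    def reported(self) -> int:
        # Assumption: every reported color falls into exactly one bucket.
        return self.exact + self.similar + self.extra


def summarize(c: RunCounts) -> dict[str, float]:
    found = c.exact + c.similar
    return {
        "recall_exact": c.exact / c.ground_truth,
        "recall_exact_similar": found / c.ground_truth,
        "missed": 1 - found / c.ground_truth,
        "precision_exact": c.exact / c.reported,
        "precision_exact_similar": found / c.reported,
        "extra_wrong": c.extra / c.reported,
    }


# Illustrative counts only, not the actual per-run tallies.
counts = RunCounts(ground_truth=202, exact=130, similar=34, extra=11)
for name, value in summarize(counts).items():
    print(f"{name}: {value:.1%}")
```

Keeping recall and precision on separate denominators (ground truth colors vs. reported colors) is what lets a model score well on one and poorly on the other.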

---

## Color-Level Analysis

### Most Problematic Ground Truth Colors

Colors most frequently missed across all four test runs:

| Color | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap | Total Misses | Common Confusion |
|-----------------|:-----------:|:----------:|:---------:|:--------:|:------------:|---------------------|
| **gray** | 7 | 6 | 7 | 9 | 29 | Often returned as "grey" (similar match) or missed entirely |
| **maroon** | 5 | 9 | 8 | 7 | 29 | Frequently confused with "red" |
| **black** | 7 | 7 | 6 | 6 | 26 | Often not detected at all |
| **light blue** | 2 | 2 | 8 | 5 | 17 | Returned as "blue" (Qwen especially) |
| **green** | 3 | 4 | 3 | 4 | 14 | Sometimes returned as "black" |
| **blue** | 3 | 3 | 3 | 2 | 11 | Sometimes not detected at all |
| **dark brown** | 0 | 1 | 4 | 4 | 9 | Returned as "black" or "brown" |
| **brown** | 1 | 1 | 3 | 3 | 8 | Returned as "black" or "orange" |
| **teal** | 2 | 2 | 2 | 2 | 8 | Confused with "green" or "blue" |
| **gold/yellow** | 2 | 2 | 1 | 1 | 6 | Occasionally missed entirely |

### Most Common Extra/Wrong Colors Reported

| Extra Color | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap | Notes |
|--------------|:-----------:|:----------:|:---------:|:--------:|-------|
| **red** | 3 | 7 | 7 | 6 | Typically a misread of maroon |
| **black** | 2 | 4 | 7 | 7 | Misread of dark brown/green/gray |
| **blue** | 3 | 2 | 10 | 6 | Misread of light blue or teal |
| **green** | 1 | 1 | 1 | 1 | Misread of teal |
| **orange** | 1 | 1 | 1 | 1 | Misread of brown |

### Similar-Match Confusion Patterns

These are cases where the VLM returned a color in the same family but not the exact ground truth term:

| Expected | Returned As | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap |
|------------------|----------------|:-----------:|:----------:|:---------:|:--------:|
| gray | grey | 9 | 10 | — | — |
| navy blue | blue | 7 | 6 | 8 | 8 |
| dark blue | blue | 5 | 6 | 10 | 9 |
| dark brown | brown | 5 | 5 | 2 | 2 |
| gold | yellow | 3 | 2 | 5 | 3 |
| dark blue | navy blue/navy | 4 | 4 | — | 1 |

**Observations:**
- **gray/grey** is purely a spelling variant: Gemini consistently uses the British spelling. Qwen uses "gray", so this never triggers for Qwen.
- **navy blue → blue** and **dark blue → blue** are the most common simplifications. Both models tend to drop shade qualifiers.
- **dark brown → brown** follows the same pattern of dropping the shade qualifier.
- **gold → yellow** is a genuine color perception difference where models see yellow-dominant gold jerseys.

---

## Persistently Failed Images

These 11 images failed across **all four** test runs, representing the hardest cases:

| Image | GT Colors | Typical VLM Response | Failure Pattern |
|-------|-----------|---------------------|-----------------|
| 016 - maroon.jpg | maroon | (none) or red | Maroon not recognized |
| 029 - maroon_white.jpg | maroon | red | Maroon → red confusion |
| 034 - light blue.jpg | light blue | blue | Shade qualifier dropped |
| 046 - green.jpg | green | black | Dark green misread as black |
| 053 - black_white.jpg | black | (not detected) | Black jerseys missed |
| 057 - gold or yellow.jpg | gold\|yellow | (not detected) | Gold/yellow missed |
| 132 - brown_white.jpg | brown | orange | Brown → orange confusion |
| 134 - teal_white.jpg | teal | blue or green | Teal not in model vocabulary |
| 138 - maroon.jpg | maroon | red | Maroon → red confusion |
| 150 - green_gray.jpg | green, gray | black | Both colors misread |
| 160 - blue_white.jpg | blue | (not detected) | Blue not detected |

### Root Cause Categories

1. **Maroon blindness (3 images):** Both models consistently classify maroon as red. This is the single largest systematic error.

2. **Dark color confusion (3 images):** Dark green, brown, and black are frequently confused with each other, especially in low-contrast or shadowed images.

3. **Shade qualifier loss (2 images):** "Light blue" and "teal" are simplified to "blue" or "green"; models use a coarser color vocabulary than the ground truth.

4. **Non-detection (3 images):** Some jerseys are simply not detected at all, likely due to occlusion, unusual angles, or low image quality.

---

## Model-Specific Observations

### Gemini 3 Flash
- **Strengths:** Highest precision (93.7%), very few hallucinated colors, good at similar-family matching. Never produced gibberish color names.
- **Weaknesses:** Consistently uses British "grey" instead of "gray". Slower than local model.
- **Prompt sensitivity:** The capstone prompt slightly hurt performance (81.2% → 78.2% recall), suggesting the original simpler prompt works better.

### Qwen3-VL-8B
- **Strengths:** Faster inference (8.9s avg). Slightly higher exact match rate with capstone prompt (65.8%).
- **Weaknesses:** Much higher false positive rate (12.8–15.2% extra/wrong). Struggles significantly with "light blue" (8 misses with original prompt). Produced one gibberish color ("redolas"). Over-reports "blue" and "black".
- **Prompt sensitivity:** Minimal difference between prompts. Capstone prompt slightly reduced errors.

---

## Recommendations

1. **Normalize "grey" → "gray"** in post-processing to eliminate the most common similar-match gap for Gemini.

2. **Add "maroon" to the prompt** as an explicit color option or example, since both models struggle to distinguish it from red without guidance.

3. **Consider a constrained color vocabulary** in the prompt (e.g., "Choose from: red, blue, green, yellow, orange, purple, black, gray, brown, maroon, teal, light blue, navy blue, gold, pink") to reduce vocabulary mismatch and shade-qualifier drift.

4. **Post-processing color mapping** could recover many similar-match cases automatically: navy→navy blue, grey→gray, dark blue→navy blue, etc.

5. **The original `jersey_prompt.txt` is the better prompt**: the capstone prompt's additional constraints did not improve accuracy for either model.
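
Recommendations 1 and 4 could be sketched as a small lookup applied to each model response before scoring. The function names and the lowercase/whitespace cleanup are assumptions for illustration. The three mappings are the ones named in recommendation 4. Note that the ground truth appears to use both "dark blue" and "navy blue", so the `dark blue` entry may need to be dropped or reversed for this test set.

```python
# Hypothetical alias table: keys are model outputs, values are canonical terms.
CANONICAL = {
    "grey": "gray",
    "navy": "navy blue",
    "dark blue": "navy blue",  # from recommendation 4; revisit if GT uses "dark blue"
}


def normalize_color(name: str) -> str:
    """Lowercase, collapse whitespace, then map known aliases to canonical terms."""
    key = " ".join(name.lower().split())
    return CANONICAL.get(key, key)


def normalize_colors(names: list[str]) -> list[str]:
    """Normalize a response, dropping duplicates while keeping first-seen order."""
    seen: dict[str, None] = {}
    for name in names:
        seen.setdefault(normalize_color(name), None)
    return list(seen)


print(normalize_colors(["Grey", "navy", "Navy  Blue", "red"]))
# ['gray', 'navy blue', 'red']
```

Doing this after the model call, rather than in the prompt, keeps the fix independent of which model or prompt variant produced the response.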

---

## Appendix: Color Similarity Families

The following color families were used for "similar match" scoring. Two colors count as a similar match if they appear in the same family:

| Family | Member Colors |
|------------|-------------------------------------------------------|
| blue | blue, dark blue, navy blue, navy, royal blue |
| light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
| red | red, scarlet, crimson |
| dark_red | maroon, burgundy, dark red, wine |
| green | green, dark green, forest green, kelly green |
| yellow | yellow, gold, golden |
| orange | orange, burnt orange |
| brown | brown, dark brown |
| purple | purple, violet |
| gray | gray, grey, silver, charcoal |
| black | black |
| teal | teal, turquoise, cyan, aqua |
| pink | pink, magenta, hot pink, rose |

**Note:** Colors in *different* families are never counted as similar, even if perceptually close (e.g., maroon and red are in separate families; brown and orange are in separate families). This is intentional: the similar-match metric captures vocabulary variation within the same color concept, not genuine color misidentification.
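
The family rule above can be sketched as a three-way classifier. The family table is copied from the appendix; the function name `classify`, the return labels, and the lowercase/whitespace cleanup are assumptions, and the real scoring script may handle multi-color images and alternatives such as `gold|yellow` differently.

```python
FAMILIES = {
    "blue": ["blue", "dark blue", "navy blue", "navy", "royal blue"],
    "light_blue": ["light blue", "sky blue", "baby blue", "carolina blue", "powder blue"],
    "red": ["red", "scarlet", "crimson"],
    "dark_red": ["maroon", "burgundy", "dark red", "wine"],
    "green": ["green", "dark green", "forest green", "kelly green"],
    "yellow": ["yellow", "gold", "golden"],
    "orange": ["orange", "burnt orange"],
    "brown": ["brown", "dark brown"],
    "purple": ["purple", "violet"],
    "gray": ["gray", "grey", "silver", "charcoal"],
    "black": ["black"],
    "teal": ["teal", "turquoise", "cyan", "aqua"],
    "pink": ["pink", "magenta", "hot pink", "rose"],
}

# Reverse index: color term -> family name.
FAMILY_OF = {color: family for family, members in FAMILIES.items() for color in members}


def classify(expected: str, returned: str) -> str:
    """Return 'exact', 'similar' (same family), or 'miss' for one color pair."""
    e = " ".join(expected.lower().split())
    r = " ".join(returned.lower().split())
    if e == r:
        return "exact"
    family = FAMILY_OF.get(e)
    if family is not None and family == FAMILY_OF.get(r):
        return "similar"
    return "miss"


print(classify("gray", "grey"))       # similar
print(classify("navy blue", "blue"))  # similar
print(classify("maroon", "red"))      # miss
```

Because maroon sits in `dark_red` and red in `red`, the maroon/red confusion is scored as a miss rather than a similar match, as the note describes.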
760
accuracy_analysis_report_round2.html
Normal file
@@ -0,0 +1,760 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Jersey Color Detection Accuracy — Round 2 Analysis</title>
<style>
:root {
  --green: #16a34a;
  --green-bg: #dcfce7;
  --red: #dc2626;
  --red-bg: #fee2e2;
  --blue: #2563eb;
  --blue-bg: #dbeafe;
  --amber: #d97706;
  --amber-bg: #fef3c7;
  --gray-50: #f9fafb;
  --gray-100: #f3f4f6;
  --gray-200: #e5e7eb;
  --gray-300: #d1d5db;
  --gray-600: #4b5563;
  --gray-700: #374151;
  --gray-800: #1f2937;
  --gray-900: #111827;
}

* { box-sizing: border-box; margin: 0; padding: 0; }

body {
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
  line-height: 1.6;
  color: var(--gray-800);
  max-width: 1100px;
  margin: 0 auto;
  padding: 2rem;
  background: #fff;
}

h1 {
  font-size: 2rem;
  color: var(--gray-900);
  border-bottom: 3px solid var(--blue);
  padding-bottom: 0.5rem;
  margin-bottom: 0.5rem;
}

h2 {
  font-size: 1.5rem;
  color: var(--blue);
  margin-top: 2.5rem;
  margin-bottom: 1rem;
  border-bottom: 2px solid var(--gray-200);
  padding-bottom: 0.3rem;
}

h3 {
  font-size: 1.15rem;
  color: var(--gray-700);
  margin-top: 1.5rem;
  margin-bottom: 0.5rem;
}

.meta {
  color: var(--gray-600);
  font-size: 0.95rem;
  margin-bottom: 1.5rem;
}
.meta strong { color: var(--gray-800); }

p { margin-bottom: 0.75rem; }

hr {
  border: none;
  border-top: 1px solid var(--gray-200);
  margin: 2rem 0;
}

/* Tables */
table {
  width: 100%;
  border-collapse: collapse;
  margin: 1rem 0 1.5rem;
  font-size: 0.9rem;
}

th, td {
  padding: 0.5rem 0.75rem;
  text-align: center;
  border: 1px solid var(--gray-200);
}

th {
  background: var(--gray-800);
  color: #fff;
  font-weight: 600;
}

td:first-child, th:first-child {
  text-align: left;
  font-weight: 600;
}

tr:nth-child(even) { background: var(--gray-50); }
tr:hover { background: var(--gray-100); }

/* Highlight classes */
.best {
  background: var(--green-bg) !important;
  color: var(--green);
  font-weight: 700;
}

.worst {
  background: var(--red-bg) !important;
  color: var(--red);
  font-weight: 600;
}

.improved {
  background: var(--blue-bg) !important;
  color: var(--blue);
  font-weight: 600;
}

.warning {
  background: var(--amber-bg) !important;
  color: var(--amber);
  font-weight: 600;
}

/* Callout boxes */
.callout {
  border-left: 4px solid;
  padding: 1rem 1.25rem;
  margin: 1rem 0;
  border-radius: 0 6px 6px 0;
}

.callout-green {
  border-color: var(--green);
  background: var(--green-bg);
}

.callout-red {
  border-color: var(--red);
  background: var(--red-bg);
}

.callout-blue {
  border-color: var(--blue);
  background: var(--blue-bg);
}

.callout-amber {
  border-color: var(--amber);
  background: var(--amber-bg);
}

.callout strong { display: block; margin-bottom: 0.25rem; }

/* Model comparison cards */
.model-cards {
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 1.5rem;
  margin: 1rem 0;
}

.model-card {
  border: 2px solid var(--gray-200);
  border-radius: 8px;
  padding: 1.25rem;
}

.model-card h3 {
  margin-top: 0;
  padding-bottom: 0.4rem;
  border-bottom: 2px solid;
}

.model-card.gemini h3 { border-color: var(--blue); color: var(--blue); }
.model-card.qwen h3 { border-color: var(--green); color: var(--green); }

.model-card ul {
  list-style: none;
  padding: 0;
  margin: 0.5rem 0 0;
}

.model-card li {
  padding: 0.3rem 0;
  font-size: 0.9rem;
}

.model-card li strong { color: var(--gray-700); }

/* Recommendation list */
ol.recs {
  counter-reset: rec;
  list-style: none;
  padding: 0;
}

ol.recs li {
  counter-increment: rec;
  padding: 0.75rem 1rem 0.75rem 3.25rem;
  margin-bottom: 0.5rem;
  border-radius: 6px;
  background: var(--gray-50);
  border: 1px solid var(--gray-200);
  position: relative;
}

ol.recs li::before {
  content: counter(rec);
  position: absolute;
  left: 0.75rem;
  top: 0.75rem;
  width: 1.75rem;
  height: 1.75rem;
  background: var(--blue);
  color: #fff;
  border-radius: 50%;
  text-align: center;
  line-height: 1.75rem;
  font-weight: 700;
  font-size: 0.85rem;
}

/* Code / prompt block */
pre {
  background: var(--gray-900);
  color: #e5e7eb;
  padding: 1.25rem;
  border-radius: 8px;
  overflow-x: auto;
  font-size: 0.85rem;
  line-height: 1.5;
  margin: 1rem 0;
}

code {
  font-family: "SF Mono", "Fira Code", "Fira Mono", Menlo, Consolas, monospace;
  background: var(--gray-100);
  padding: 0.15rem 0.35rem;
  border-radius: 3px;
  font-size: 0.88em;
}

pre code {
  background: none;
  padding: 0;
}

/* Color swatch in similarity table */
.swatch {
  display: inline-block;
  width: 14px;
  height: 14px;
  border-radius: 3px;
  margin-right: 6px;
  vertical-align: middle;
  border: 1px solid var(--gray-300);
}

/* Badge */
.badge {
  display: inline-block;
  padding: 0.15rem 0.5rem;
  border-radius: 4px;
  font-size: 0.8rem;
  font-weight: 700;
  text-transform: uppercase;
  letter-spacing: 0.03em;
}
.badge-pass { background: var(--green-bg); color: var(--green); }
.badge-partial { background: var(--amber-bg); color: var(--amber); }
.badge-fail { background: var(--red-bg); color: var(--red); }

/* Print styles */
@media print {
  body { padding: 0; font-size: 11pt; }
  .callout, .model-card { break-inside: avoid; }
  h2 { break-after: avoid; }
}
</style>
</head>
<body>

<h1>Jersey Color Detection Accuracy — Round 2 Analysis</h1>
<div class="meta">
  <strong>Date:</strong> March 3, 2026<br>
  <strong>Models:</strong> Gemini 3 Flash Preview, Qwen3-VL-8B (local via llama.cpp)<br>
  <strong>Prompts:</strong> jersey_prompt.txt (original), jersey_prompt_capstone.txt (capstone), jersey_prompt_constrained.txt (constrained)<br>
  <strong>Test set:</strong> 161 annotated images, 202 ground truth colors (excluding white)
</div>

<hr>

<h2>Summary Comparison</h2>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Qwen Original</th>
      <th>Qwen Capstone</th>
      <th>Qwen Constrained</th>
      <th>Gemini Original</th>
      <th>Gemini Capstone</th>
      <th>Gemini Constrained</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Recall (exact)</td>
      <td>65.3%</td>
      <td>66.3%</td>
      <td class="best">71.8%</td>
      <td>62.4%</td>
      <td class="worst">60.9%</td>
      <td class="improved">67.8%</td>
    </tr>
    <tr>
      <td>Recall (exact+similar)</td>
      <td>78.2%</td>
      <td>78.2%</td>
      <td class="best">82.7%</td>
      <td>79.7%</td>
      <td class="worst">78.2%</td>
      <td class="improved">81.7%</td>
    </tr>
    <tr>
      <td>Missed</td>
      <td>21.8%</td>
      <td>21.8%</td>
      <td class="best">17.3%</td>
      <td>20.3%</td>
      <td class="worst">21.8%</td>
      <td class="improved">18.3%</td>
    </tr>
    <tr>
      <td>Precision (exact)</td>
      <td>71.7%</td>
      <td>74.0%</td>
      <td class="improved">78.4%</td>
      <td>72.0%</td>
      <td class="worst">69.5%</td>
      <td class="best">78.7%</td>
    </tr>
    <tr>
      <td>Precision (exact+sim.)</td>
      <td>85.9%</td>
      <td>87.3%</td>
      <td class="improved">90.3%</td>
      <td>91.4%</td>
      <td class="worst">88.7%</td>
      <td class="best">94.3%</td>
    </tr>
    <tr>
      <td>Extra/wrong</td>
      <td>14.1%</td>
      <td>12.7%</td>
      <td class="improved">9.7%</td>
      <td>8.6%</td>
      <td class="worst">11.3%</td>
      <td class="best">5.7%</td>
    </tr>
    <tr>
      <td><span class="badge badge-pass">PASS</span></td>
      <td>118</td>
      <td>120</td>
      <td class="best">127</td>
      <td>120</td>
      <td>117</td>
      <td class="improved">124</td>
    </tr>
    <tr>
      <td><span class="badge badge-partial">PARTIAL</span></td>
      <td>19</td>
      <td>19</td>
      <td class="best">15</td>
      <td>20</td>
      <td class="worst">22</td>
      <td>19</td>
    </tr>
    <tr>
      <td><span class="badge badge-fail">FAIL</span></td>
      <td>24</td>
      <td>22</td>
      <td>19</td>
      <td>21</td>
      <td>22</td>
      <td class="best">18</td>
    </tr>
    <tr>
      <td>Total time</td>
      <td>1557s</td>
      <td>1437s</td>
      <td>1596s</td>
      <td class="best">253s</td>
      <td>260s</td>
      <td>344s</td>
    </tr>
  </tbody>
</table>

<hr>

<h2>Key Findings</h2>

<h3>1. The constrained prompt is the best prompt for both models</h3>

<p>The constrained vocabulary prompt delivered the strongest results across the board:</p>

<div class="callout callout-green">
  <strong>Qwen + Constrained</strong>
  Achieved the highest exact+similar recall of any combination at <strong>82.7%</strong> (167/202 found), up from 78.2% with both other prompts. It also posted the most PASS images of any run (<strong>127</strong>, up from 118/120) and the fewest FAIL images of any Qwen run (<strong>19</strong>, down from 24/22).
</div>

<div class="callout callout-blue">
  <strong>Gemini + Constrained</strong>
  Achieved the highest exact+similar precision of any combination at <strong>94.3%</strong> (164/174 correct), with only <strong>5.7% extra/wrong</strong> colors, the lowest error rate across all six runs. It also had the fewest FAIL images of the six runs, at <strong>18</strong>.
</div>

<h3>2. Exact match rates jumped significantly</h3>

<p>The constrained prompt's biggest impact was converting similar matches into exact matches by forcing models to use the ground truth vocabulary:</p>

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Exact Match (Original)</th>
      <th>Exact Match (Constrained)</th>
      <th>Improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Qwen</td>
      <td>65.3% (132)</td>
      <td class="best">71.8% (145)</td>
      <td class="improved">+6.5 pp</td>
    </tr>
    <tr>
      <td>Gemini</td>
      <td>62.4% (126)</td>
      <td class="best">67.8% (137)</td>
      <td class="improved">+5.4 pp</td>
    </tr>
  </tbody>
</table>

<p>This came partly from eliminating vocabulary mismatch (e.g., grey→gray, navy→navy blue) and partly from teaching models to use specific color terms like "maroon" and "light blue."</p>

<h3>3. Targeted color improvements</h3>

<p>The constrained prompt's explicit color guidance fixed the worst systematic errors:</p>

<table>
  <thead>
    <tr>
      <th>Problem Color</th>
      <th>Qwen Misses (Orig→Constrained)</th>
      <th>Gemini Misses (Orig→Constrained)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><span class="swatch" style="background:#800000"></span>maroon</td>
      <td>8 → <span style="color:var(--green);font-weight:700">3</span></td>
      <td>6 → <span style="color:var(--green);font-weight:700">3</span></td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#87ceeb"></span>light blue</td>
      <td>7 → <span style="color:var(--green);font-weight:700">1</span></td>
      <td>3 → <span style="color:var(--green);font-weight:700">1</span></td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#3e2723"></span>dark brown</td>
      <td>4 → <span style="color:var(--green);font-weight:700">2</span></td>
      <td>1 → 1</td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#008080"></span>teal</td>
      <td>2 → 2</td>
      <td>2 → 2</td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#9e9e9e"></span>gray</td>
      <td class="warning">7 → 8</td>
      <td>6 → 6</td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#222"></span>black</td>
      <td>6 → 6</td>
      <td>7 → 7</td>
    </tr>
  </tbody>
</table>

<ul>
  <li><strong>Maroon:</strong> Misses cut by at least half for both models (8→3 for Qwen, 6→3 for Gemini). Previously the most-missed color for Qwen; now ranks 5th.</li>
  <li><strong>Light blue:</strong> Near-elimination of the "light blue → blue" confusion for both models (7→1 for Qwen, 3→1 for Gemini).</li>
  <li><strong>Gray/grey:</strong> The spelling normalization instruction eliminated the grey→gray similar-match penalty for Gemini entirely (10 confusions → 0). However, gray misses did not improve (6→6 for Gemini, 7→8 for Qwen): these are images where gray jerseys aren't detected at all, not a naming issue.</li>
  <li><strong>Teal and black</strong> remain stubbornly problematic regardless of prompt.</li>
</ul>

<h3>4. New overcorrection pattern with constrained prompt</h3>

<div class="callout callout-amber">
  <strong>Overcorrection Warning</strong>
  The constrained prompt introduced a new failure mode: models now occasionally over-apply the color terms that the prompt newly emphasizes.
</div>

<ul>
  <li><strong>Qwen + Constrained</strong> reported "maroon" as an extra/wrong color <strong>5 times</strong> (was 0 previously). It's now calling some brown, red, and orange jerseys "maroon", the opposite of the original problem. Cases include 007 (brown→maroon), 031 (brown→maroon), 048 (red→maroon), and 142 (orange→maroon).</li>
  <li><strong>Gemini + Constrained</strong> reported "light blue" as an extra/wrong color <strong>2 times</strong> (was 0 previously), including misidentifying navy blue as light blue (image 081).</li>
</ul>

<p>This overcorrection is a smaller problem than the original misses it replaced, but worth noting.</p>

<h3>5. The capstone prompt did not improve results</h3>

<div class="callout callout-red">
  <strong>Capstone Prompt: No Benefit</strong>
  The capstone prompt performed roughly on par with the original prompt for Qwen and slightly below it for Gemini. Its emphasis on precision over recall ("do not guess") hurt overall detection rates without meaningfully improving color accuracy.
</div>

<ul>
  <li>Qwen: 78.2% recall (same), 87.3% precision (slight improvement from 85.9%)</li>
  <li>Gemini: 78.2% recall (down from 79.7%), 88.7% precision (down from 91.4%)</li>
</ul>

<h3>6. Gemini speed improvement from concurrency</h3>

<p>The speed optimizations (8 concurrent workers, session reuse, and JPEG quality 85) delivered major gains in total run time:</p>

<table>
  <thead>
    <tr>
      <th>Prompt</th>
      <th>Previous Sequential Runs</th>
      <th>Current Concurrent Runs</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Original</td>
      <td>2134s (13.3s avg)</td>
      <td class="best">253s (1.6s avg)</td>
    </tr>
    <tr>
      <td>Capstone</td>
      <td>1882s (11.7s avg)</td>
      <td class="best">260s (1.6s avg)</td>
    </tr>
    <tr>
      <td>Constrained</td>
      <td>n/a</td>
      <td>344s (2.1s avg)</td>
    </tr>
  </tbody>
</table>

<p>That's roughly a <strong>7x to 8x speedup</strong> for the first two prompts (2134s → 253s and 1882s → 260s). The constrained prompt run was somewhat slower (344s), likely due to its longer prompt text (2223 chars vs ~1500 chars).</p>

<hr>

<h2>Persistently Failed Images</h2>

<p>These <strong>10 images</strong> failed across all six runs, representing the hardest cases for current VLMs regardless of model or prompt:</p>

<table>
  <thead>
    <tr>
      <th>Image</th>
      <th>GT Colors</th>
      <th>Typical Error</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>016 - maroon.jpg</td>
      <td><span class="swatch" style="background:#800000"></span>maroon</td>
      <td class="worst">Not detected or called "red"</td>
    </tr>
    <tr>
      <td>034 - light blue.jpg</td>
      <td><span class="swatch" style="background:#87ceeb"></span>light blue</td>
      <td class="worst">Called "blue"</td>
    </tr>
    <tr>
      <td>046 - green.jpg</td>
      <td><span class="swatch" style="background:#388e3c"></span>green</td>
      <td class="worst">Called "black"</td>
    </tr>
    <tr>
      <td>053 - black_white.jpg</td>
      <td><span class="swatch" style="background:#222"></span>black</td>
      <td class="worst">Not detected</td>
    </tr>
    <tr>
      <td>077 - teal_white.jpg</td>
      <td><span class="swatch" style="background:#008080"></span>teal</td>
      <td class="worst">Called "green"</td>
    </tr>
    <tr>
      <td>132 - brown_white.jpg</td>
      <td><span class="swatch" style="background:#795548"></span>brown</td>
      <td class="worst">Called "orange"</td>
    </tr>
    <tr>
      <td>134 - teal_white.jpg</td>
      <td><span class="swatch" style="background:#008080"></span>teal</td>
      <td class="worst">Called "blue" or "light blue"</td>
    </tr>
    <tr>
      <td>138 - maroon.jpg</td>
      <td><span class="swatch" style="background:#800000"></span>maroon</td>
      <td class="worst">Called "red"</td>
    </tr>
    <tr>
      <td>150 - green_gray.jpg</td>
      <td><span class="swatch" style="background:#388e3c"></span>green, <span class="swatch" style="background:#9e9e9e"></span>gray</td>
      <td class="worst">Called "black"</td>
    </tr>
    <tr>
      <td>160 - blue_white.jpg</td>
      <td><span class="swatch" style="background:#2196f3"></span>blue</td>
      <td class="worst">Not detected</td>
    </tr>
  </tbody>
</table>

<p>Notable improvements: Images <strong>029</strong> (maroon), <strong>087/141/161</strong> (light blue), and <strong>099</strong> (maroon) were previously persistent failures but were <strong>fixed by the constrained prompt</strong> for at least one model.</p>
|
||||||
|
|
||||||
|
<hr>
|
||||||
|
|
||||||
|
<h2>Model Comparison</h2>
|
||||||
|
|
||||||
|
<div class="model-cards">
|
||||||
|
<div class="model-card gemini">
|
||||||
|
<h3>Gemini 3 Flash</h3>
|
||||||
|
<ul>
|
||||||
|
<li><strong>Best at:</strong> Precision (94.3% with constrained prompt), fewest hallucinated colors</li>
|
||||||
|
<li><strong>Weakness:</strong> Lower exact recall than Qwen; still uses shade variants even with constraints</li>
|
||||||
|
<li><strong>Speed:</strong> ~250–340s with 8 concurrent workers</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
<div class="model-card qwen">
|
||||||
|
<h3>Qwen3-VL-8B</h3>
|
||||||
|
<ul>
|
||||||
|
<li><strong>Best at:</strong> Recall (82.7% with constrained prompt), highest PASS count (127)</li>
|
||||||
|
<li><strong>Weakness:</strong> Higher false positive rate; introduced "maroon" overcorrection with constrained prompt</li>
|
||||||
|
<li><strong>Speed:</strong> ~1440–1600s sequential (local GPU inference)</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<hr>
|
||||||
|
|
||||||
|
<h2>Recommendations</h2>
|
||||||
|
|
||||||
|
<ol class="recs">
|
||||||
|
<li><strong>Use the constrained prompt</strong> (<code>jersey_prompt_constrained.txt</code>) — it is the clear winner for both models, improving recall and precision simultaneously.</li>
|
||||||
|
<li><strong>Post-processing normalization</strong> could still recover additional matches: map <code>grey</code> → <code>gray</code> (catches any remaining Gemini outputs) and <code>navy</code> → <code>navy blue</code> (catches shorthand usage).</li>
|
||||||
|
<li><strong>Consider a brown/maroon calibration</strong> — the constrained prompt overcorrected on Qwen, turning brown→maroon confusion into a new error source. Adding "Use 'brown' for warm, non-reddish dark colors" or similar guidance may help.</li>
|
||||||
|
<li><strong>Gray and black detection remain unsolved</strong> at the prompt level: these are likely image quality or model perception limitations that prompt changes alone are unlikely to fix. These colors may benefit from a secondary computer vision pass (e.g., dominant color extraction from the jersey region).</li>
|
||||||
|
<li><strong>Retire the capstone prompt</strong>: it offered little or no benefit over the original (a slight precision gain for Qwen, a drop in both recall and precision for Gemini) and trailed the constrained prompt on every accuracy metric.</li>
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<hr>
|
||||||
|
|
||||||
|
<h2>Appendix: Color Similarity Families Used for Scoring</h2>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Family</th>
|
||||||
|
<th>Member Colors</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr><td><span class="swatch" style="background:#2196f3"></span>blue</td><td>blue, dark blue, navy blue, navy, royal blue</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#87ceeb"></span>light_blue</td><td>light blue, sky blue, baby blue, carolina blue, powder blue</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#f44336"></span>red</td><td>red, scarlet, crimson</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#800000"></span>dark_red</td><td>maroon, burgundy, dark red, wine</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#388e3c"></span>green</td><td>green, dark green, forest green, kelly green</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#fdd835"></span>yellow</td><td>yellow, gold, golden</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#ff9800"></span>orange</td><td>orange, burnt orange</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#795548"></span>brown</td><td>brown, dark brown</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#9c27b0"></span>purple</td><td>purple, violet</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#9e9e9e"></span>gray</td><td>gray, grey, silver, charcoal</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#222"></span>black</td><td>black</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#008080"></span>teal</td><td>teal, turquoise, cyan, aqua</td></tr>
|
||||||
|
<tr><td><span class="swatch" style="background:#e91e63"></span>pink</td><td>pink, magenta, hot pink, rose</td></tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<hr>
|
||||||
|
|
||||||
|
<h2>Appendix: Constrained Prompt (<code>jersey_prompt_constrained.txt</code>)</h2>
|
||||||
|
|
||||||
|
<pre><code>You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
|
||||||
|
|
||||||
|
CRITICAL INSTRUCTIONS:
|
||||||
|
1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
|
||||||
|
2. ONLY include jersey numbers that you can ACTUALLY READ in the image
|
||||||
|
3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
|
||||||
|
4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
|
||||||
|
5. DO NOT include jerseys if you cannot clearly see the number
|
||||||
|
|
||||||
|
COLOR VOCABULARY:
|
||||||
|
For "jersey_color" and "number_color", you MUST choose from this list ONLY:
|
||||||
|
red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
|
||||||
|
|
||||||
|
Important color distinctions:
|
||||||
|
- Use "maroon" for dark brownish-red, NOT "red"
|
||||||
|
- Use "light blue" for pale or sky blue, NOT "blue"
|
||||||
|
- Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
|
||||||
|
- Use "teal" for blue-green, NOT "green" or "blue"
|
||||||
|
- Use "gray" (not "grey") for silver or neutral tones
|
||||||
|
- Use "dark brown" for very dark brown, NOT "black"
|
||||||
|
- Use "gold" for metallic or deep yellow, NOT "yellow"
|
||||||
|
|
||||||
|
RESPONSE FORMAT:
|
||||||
|
Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
|
||||||
|
|
||||||
|
Use DOUBLE QUOTES (") for all JSON keys and string values.
|
||||||
|
|
||||||
|
The JSON must have a single key "jerseys" with an array of dictionaries.
|
||||||
|
|
||||||
|
Each dictionary must have exactly these three keys:
|
||||||
|
- "jersey_number": The number on the jersey (as a string, only if clearly visible)
|
||||||
|
- "jersey_color": The primary color of the jersey (MUST be from the color list above)
|
||||||
|
- "number_color": The color of the number on the jersey (MUST be from the color list above)
|
||||||
|
|
||||||
|
Example response for an image WITH visible jerseys:
|
||||||
|
{
|
||||||
|
"jerseys": [
|
||||||
|
{
|
||||||
|
"jersey_number": "10",
|
||||||
|
"jersey_color": "maroon",
|
||||||
|
"number_color": "gold"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"jersey_number": "42",
|
||||||
|
"jersey_color": "light blue",
|
||||||
|
"number_color": "white"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
Example response for an image WITHOUT jerseys or with unclear numbers:
|
||||||
|
{"jerseys": []}
|
||||||
|
|
||||||
|
REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
|
||||||
|
|
||||||
|
Now analyze the image and return the JSON object.</code></pre>
|
||||||
|
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
229
accuracy_analysis_report_round2.md
Normal file
@ -0,0 +1,229 @@
|
|||||||
|
# Jersey Color Detection Accuracy — Round 2 Analysis
|
||||||
|
|
||||||
|
**Date:** March 3, 2026
|
||||||
|
**Models:** Gemini 3 Flash Preview, Qwen3-VL-8B (local via llama.cpp)
|
||||||
|
**Prompts:** jersey_prompt.txt (original), jersey_prompt_capstone.txt (capstone), jersey_prompt_constrained.txt (constrained)
|
||||||
|
**Test set:** 161 annotated images, 202 ground truth colors (excluding white)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary Comparison
|
||||||
|
|
||||||
|
| Metric | Qwen Original | Qwen Capstone | Qwen Constrained | Gemini Original | Gemini Capstone | Gemini Constrained |
|
||||||
|
|----------------------------|:-------------:|:-------------:|:-----------------:|:---------------:|:---------------:|:------------------:|
|
||||||
|
| **Recall (exact)** | 65.3% | 66.3% | **71.8%** | 62.4% | 60.9% | 67.8% |
|
||||||
|
| **Recall (exact+similar)** | 78.2% | 78.2% | **82.7%** | 79.7% | 78.2% | 81.7% |
|
||||||
|
| **Missed** | 21.8% | 21.8% | **17.3%** | 20.3% | 21.8% | 18.3% |
|
||||||
|
| **Precision (exact)** | 71.7% | 74.0% | 78.4% | 72.0% | 69.5% | **78.7%** |
|
||||||
|
| **Precision (exact+sim.)** | 85.9% | 87.3% | 90.3% | 91.4% | 88.7% | **94.3%** |
|
||||||
|
| **Extra/wrong** | 14.1% | 12.7% | 9.7% | 8.6% | 11.3% | **5.7%** |
|
||||||
|
| PASS | 118 | 120 | **127** | 120 | 117 | 124 |
|
||||||
|
| PARTIAL | 19 | 19 | **15** | 20 | 22 | 19 |
|
||||||
|
| FAIL | 24 | 22 | 19 | 21 | 22 | **18** |
|
||||||
|
| Total time | 1557s | 1437s | 1596s | 253s | 260s | 344s |
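The percentages in this table follow directly from raw counts. As a minimal sketch (the helper name `rate` is illustrative and not taken from the test scripts), the headline recall and precision figures can be reproduced from the counts quoted in this report:

```python
def rate(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Counts quoted in this report.
GROUND_TRUTH_COLORS = 202                    # excluding white
qwen_found = 167                             # Qwen + Constrained, exact + similar
gemini_correct, gemini_reported = 164, 174   # Gemini + Constrained

# Recall is measured against the ground truth colors.
recall = rate(qwen_found, GROUND_TRUTH_COLORS)
missed = rate(GROUND_TRUTH_COLORS - qwen_found, GROUND_TRUTH_COLORS)

# Precision is measured against the colors the model reported.
precision = rate(gemini_correct, gemini_reported)
extra = rate(gemini_reported - gemini_correct, gemini_reported)

print(recall, missed, precision, extra)
```

Recall and precision use different denominators, which is why a run can lead on one and trail on the other.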
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Findings
|
||||||
|
|
||||||
|
### 1. The constrained prompt is the best prompt for both models
|
||||||
|
|
||||||
|
The constrained vocabulary prompt delivered the strongest results across the board:
|
||||||
|
|
||||||
|
- **Qwen + Constrained** achieved the highest recall of any combination at **82.7%** (167/202 found), up from 78.2% with both other prompts. It also posted the most PASS images of any run (**127**, up from 118/120) and Qwen's fewest FAIL images (**19**, down from 24/22).
|
||||||
|
|
||||||
|
- **Gemini + Constrained** achieved the highest precision of any combination at **94.3%** (164/174 correct), with only **5.7% extra/wrong** colors, the lowest error rate across all six runs. It also had the fewest FAIL images of the six runs (**18**).
|
||||||
|
|
||||||
|
### 2. Exact match rates jumped significantly
|
||||||
|
|
||||||
|
The constrained prompt's biggest impact was converting similar matches into exact matches by forcing models to use the ground truth vocabulary:
|
||||||
|
|
||||||
|
| Model | Exact Match (Original) | Exact Match (Constrained) | Improvement |
|
||||||
|
|--------|:----------------------:|:-------------------------:|:-----------:|
|
||||||
|
| Qwen | 65.3% (132) | **71.8% (145)** | +6.5 pp |
|
||||||
|
| Gemini | 62.4% (126) | **67.8% (137)** | +5.4 pp |
|
||||||
|
|
||||||
|
This came partly from eliminating vocabulary mismatch (e.g., grey→gray, navy→navy blue) and partly from teaching models to use specific color terms like "maroon" and "light blue."
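The vocabulary mismatches mentioned above can also be absorbed at scoring time rather than in the prompt. The following is a minimal sketch, assuming model output arrives as free-text color strings; the `ALIASES` table and `normalize_color` helper are hypothetical names, and only the two mappings named in this report are included:

```python
# Hypothetical alias table: only the two mappings named in this report.
ALIASES = {
    "grey": "gray",
    "navy": "navy blue",
}


def normalize_color(raw: str) -> str:
    """Lower-case, collapse whitespace, then apply the alias table."""
    cleaned = " ".join(raw.strip().lower().split())
    return ALIASES.get(cleaned, cleaned)


print(normalize_color(" Grey "))     # alias applied
print(normalize_color("navy blue"))  # already canonical, left unchanged
```

Because the lookup is on the whole cleaned string, "navy blue" is not rewritten a second time.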
|
||||||
|
|
||||||
|
### 3. Targeted color improvements
|
||||||
|
|
||||||
|
The constrained prompt's explicit color guidance fixed the worst systematic errors:
|
||||||
|
|
||||||
|
| Problem Color | Qwen Misses (Orig→Constrained) | Gemini Misses (Orig→Constrained) |
|
||||||
|
|----------------|:------------------------------:|:--------------------------------:|
|
||||||
|
| **maroon** | 8 → **3** | 6 → **3** |
|
||||||
|
| **light blue** | 7 → **1** | 3 → **1** |
|
||||||
|
| **dark brown** | 4 → **2** | 1 → 1 |
|
||||||
|
| **teal** | 2 → 2 | 2 → 2 |
|
||||||
|
| **gray** | 7 → 8 | 6 → 6 |
|
||||||
|
| **black** | 6 → 6 | 7 → 7 |
|
||||||
|
|
||||||
|
- **Maroon:** Misses cut by at least half for both models (8 → 3 for Qwen, 6 → 3 for Gemini). Previously the most-missed color for Qwen; now ranks 5th.
|
||||||
|
- **Light blue:** Near-elimination of the "light blue → blue" confusion for both models (7→1 for Qwen, 3→1 for Gemini).
|
||||||
|
- **Gray/grey:** The spelling normalization instruction eliminated the grey→gray similar-match penalty for Gemini entirely (10 confusions → 0). However, gray misses did not improve (Qwen 7 → 8, Gemini 6 → 6): these are images where gray jerseys aren't detected at all, not a naming issue.
|
||||||
|
- **Teal and black** remain stubbornly problematic regardless of prompt.
|
||||||
|
|
||||||
|
### 4. New overcorrection pattern with constrained prompt
|
||||||
|
|
||||||
|
The constrained prompt also introduced a new failure mode, in which models occasionally over-apply the color terms it emphasizes:
|
||||||
|
|
||||||
|
- **Qwen + Constrained** reported "maroon" as an extra/wrong color **5 times** (was 0 previously). It now calls some brown, red, and orange jerseys "maroon", the opposite of the original problem. Cases include 007 (brown→maroon), 031 (brown→maroon), 048 (red→maroon), and 142 (orange→maroon).
|
||||||
|
|
||||||
|
- **Gemini + Constrained** reported "light blue" as an extra/wrong color **2 times** (was 0 previously), including misidentifying navy blue as light blue (image 081).
|
||||||
|
|
||||||
|
This overcorrection is a smaller problem than the original misses it replaced, but worth noting.
|
||||||
|
|
||||||
|
### 5. The capstone prompt did not improve results
|
||||||
|
|
||||||
|
The capstone prompt performed about on par with the original for Qwen and slightly below it for Gemini:
|
||||||
|
|
||||||
|
- Qwen: 78.2% recall (same), 87.3% precision (slight improvement)
|
||||||
|
- Gemini: 78.2% recall (down from 79.7%), 88.7% precision (down from 91.4%)
|
||||||
|
|
||||||
|
The capstone prompt's emphasis on precision over recall ("do not guess") may have hurt overall detection rates without meaningfully improving color accuracy.
|
||||||
|
|
||||||
|
### 6. Gemini speed improvement from concurrency
|
||||||
|
|
||||||
|
The concurrent processing optimization (8 workers + session reuse + JPEG quality 85) delivered major speed gains for the Gemini runs:
|
||||||
|
|
||||||
|
| Prompt (Gemini) | Previous sequential runs | Current concurrent runs |
|-----------------|:------------------------:|:-----------------------:|
| Original | 2134s (13.3s avg) | 253s (1.6s avg) |
| Capstone | 1882s (11.7s avg) | 260s (1.6s avg) |
| Constrained | n/a | 344s (2.1s avg) |
|
||||||
|
|
||||||
|
That's roughly a **7-8x speedup** for the first two prompts (2134s → 253s is about 8.4x; 1882s → 260s is about 7.2x). The constrained prompt run was slower (344s), likely because of its longer prompt text (2223 chars vs ~1500 chars).
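The 8-worker pattern can be sketched with the standard library alone. This is an illustration, not the actual test script: `classify_image` is a hypothetical placeholder for one API call, and a real version would send the image through a shared HTTP session (the "session reuse" part) and parse the reply.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # worker count quoted in this report


def classify_image(path: str) -> dict:
    """Hypothetical stand-in for one model request."""
    time.sleep(0.01)  # simulated network latency
    return {"image": path, "jerseys": []}


def run_concurrently(paths: list[str], workers: int = MAX_WORKERS) -> list[dict]:
    """Process images in parallel; Executor.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_image, paths))


results = run_concurrently([f"image_{i:03d}.jpg" for i in range(1, 17)])
print(len(results), results[0]["image"])
```

Threads suit this workload because each request spends most of its time waiting on the network. The same approach would not necessarily help the local Qwen runs, where one GPU is the bottleneck.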
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Persistently Failed Images
|
||||||
|
|
||||||
|
These **10 images** failed across all six runs, representing the hardest cases for current VLMs regardless of model or prompt:
|
||||||
|
|
||||||
|
| Image | GT Colors | Typical Error |
|
||||||
|
|-------|-----------|---------------|
|
||||||
|
| 016 - maroon.jpg | maroon | Not detected or called "red" |
|
||||||
|
| 034 - light blue.jpg | light blue | Called "blue" |
|
||||||
|
| 046 - green.jpg | green | Called "black" |
|
||||||
|
| 053 - black_white.jpg | black | Not detected |
|
||||||
|
| 077 - teal_white.jpg | teal | Called "green" |
|
||||||
|
| 132 - brown_white.jpg | brown | Called "orange" |
|
||||||
|
| 134 - teal_white.jpg | teal | Called "blue" or "light blue" |
|
||||||
|
| 138 - maroon.jpg | maroon | Called "red" |
|
||||||
|
| 150 - green_gray.jpg | green, gray | Called "black" |
|
||||||
|
| 160 - blue_white.jpg | blue | Not detected |
|
||||||
|
|
||||||
|
Notable improvements: Images **029** (maroon), **087/141/161** (light blue), and **099** (maroon) were previously persistent failures but were **fixed by the constrained prompt** for at least one model.
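A list like the one above can be derived by intersecting the per-run FAIL lists. The sketch below assumes each run's failures are available as a set of file names. The run labels and file names are placeholders, not real results.

```python
def persistent_failures(failed_by_run: dict[str, set[str]]) -> list[str]:
    """Return the images that appear in the FAIL list of every run."""
    if not failed_by_run:
        return []
    common = set.intersection(*failed_by_run.values())
    return sorted(common)


# Placeholder data for illustration only.
example_runs = {
    "run_a": {"img_1.jpg", "img_2.jpg", "img_3.jpg"},
    "run_b": {"img_2.jpg", "img_3.jpg"},
    "run_c": {"img_3.jpg", "img_2.jpg", "img_9.jpg"},
}
print(persistent_failures(example_runs))
```

Images that fail in only some runs, such as those fixed by the constrained prompt, drop out of the intersection automatically.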
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Model Comparison
|
||||||
|
|
||||||
|
### Gemini 3 Flash
|
||||||
|
- **Best at:** Precision (94.3% with constrained prompt), fewest hallucinated colors
|
||||||
|
- **Weakness:** Lower exact recall than Qwen; still uses shade variants even with constraints
|
||||||
|
- **Speed:** ~250-340s with 8 concurrent workers
|
||||||
|
|
||||||
|
### Qwen3-VL-8B
|
||||||
|
- **Best at:** Recall (82.7% with constrained prompt), highest PASS count (127)
|
||||||
|
- **Weakness:** Higher false positive rate; introduced "maroon" overcorrection with constrained prompt
|
||||||
|
- **Speed:** ~1440-1600s sequential (local GPU inference)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Recommendations
|
||||||
|
|
||||||
|
1. **Use the constrained prompt** (`jersey_prompt_constrained.txt`) — it is the clear winner for both models, improving recall and precision simultaneously.
|
||||||
|
|
||||||
|
2. **Post-processing normalization** could still recover additional matches:
|
||||||
|
- Map `grey` → `gray` (catches any remaining Gemini outputs)
|
||||||
|
- Map `navy` → `navy blue` (catches shorthand usage)
|
||||||
|
|
||||||
|
3. **Consider a brown/maroon calibration** — the constrained prompt overcorrected on Qwen, turning brown→maroon confusion into a new error source. Adding "Use 'brown' for warm, non-reddish dark colors" or similar guidance may help.
|
||||||
|
|
||||||
|
4. **Gray and black detection remain unsolved** at the prompt level: these are likely image quality or model perception limitations that prompt changes alone are unlikely to fix. These colors may benefit from a secondary computer vision pass (e.g., dominant color extraction from the jersey region).
|
||||||
|
|
||||||
|
5. **Retire the capstone prompt**: it offered little or no benefit over the original (a slight precision gain for Qwen, a drop in both recall and precision for Gemini) and trailed the constrained prompt on every accuracy metric.
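The secondary computer vision pass suggested in recommendation 4 could start from something as simple as the sketch below. It assumes the jersey region has already been cropped by some other means and is available as a list of RGB tuples. The reference values are the swatch colors from the HTML version of this report, which are presentation colors rather than calibrated references. Both helper names are hypothetical.

```python
from collections import Counter

# Swatch colors from the HTML report, used here as rough reference points.
REFERENCE_RGB = {
    "gray": (0x9E, 0x9E, 0x9E),
    "black": (0x22, 0x22, 0x22),
    "green": (0x38, 0x8E, 0x3C),
    "maroon": (0x80, 0x00, 0x00),
    "teal": (0x00, 0x80, 0x80),
    "brown": (0x79, 0x55, 0x48),
}


def dominant_rgb(pixels, step=32):
    """Find the most common coarse color bucket and return its mean color."""
    def bucket(p):
        return (p[0] // step, p[1] // step, p[2] // step)

    top = Counter(bucket(p) for p in pixels).most_common(1)[0][0]
    members = [p for p in pixels if bucket(p) == top]
    return tuple(sum(p[i] for p in members) // len(members) for i in range(3))


def nearest_named_color(rgb):
    """Pick the reference color with the smallest squared RGB distance."""
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, REFERENCE_RGB[name]))

    return min(REFERENCE_RGB, key=dist)


region = [(160, 160, 160)] * 6 + [(30, 30, 30)] * 3 + [(250, 250, 250)]
print(nearest_named_color(dominant_rgb(region)))
```

Plain RGB distance is a crude measure of perceived color difference, so a real pass would likely need a perceptual color space and some handling of lighting. The sketch only shows the shape of the idea.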
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix: Color Similarity Families Used for Scoring
|
||||||
|
|
||||||
|
| Family | Member Colors |
|
||||||
|
|------------|-------------------------------------------------------|
|
||||||
|
| blue | blue, dark blue, navy blue, navy, royal blue |
|
||||||
|
| light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
|
||||||
|
| red | red, scarlet, crimson |
|
||||||
|
| dark_red | maroon, burgundy, dark red, wine |
|
||||||
|
| green | green, dark green, forest green, kelly green |
|
||||||
|
| yellow | yellow, gold, golden |
|
||||||
|
| orange | orange, burnt orange |
|
||||||
|
| brown | brown, dark brown |
|
||||||
|
| purple | purple, violet |
|
||||||
|
| gray | gray, grey, silver, charcoal |
|
||||||
|
| black | black |
|
||||||
|
| teal | teal, turquoise, cyan, aqua |
|
||||||
|
| pink | pink, magenta, hot pink, rose |
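One plausible reading of how these families drive the exact/similar/missed split is sketched below. The actual scoring script is not shown in this report, so the function names and the precise rule are assumptions. The family memberships are copied from the table above.

```python
FAMILIES = {
    "blue": {"blue", "dark blue", "navy blue", "navy", "royal blue"},
    "light_blue": {"light blue", "sky blue", "baby blue", "carolina blue", "powder blue"},
    "red": {"red", "scarlet", "crimson"},
    "dark_red": {"maroon", "burgundy", "dark red", "wine"},
    "green": {"green", "dark green", "forest green", "kelly green"},
    "yellow": {"yellow", "gold", "golden"},
    "orange": {"orange", "burnt orange"},
    "brown": {"brown", "dark brown"},
    "purple": {"purple", "violet"},
    "gray": {"gray", "grey", "silver", "charcoal"},
    "black": {"black"},
    "teal": {"teal", "turquoise", "cyan", "aqua"},
    "pink": {"pink", "magenta", "hot pink", "rose"},
}


def family_of(color: str):
    """Return the family name for a color, or None if it is not listed."""
    for name, members in FAMILIES.items():
        if color in members:
            return name
    return None


def match_kind(expected: str, predicted: set[str]) -> str:
    """Classify one ground truth color as 'exact', 'similar', or 'missed'."""
    if expected in predicted:
        return "exact"
    family = family_of(expected)
    if family is not None and any(family_of(p) == family for p in predicted):
        return "similar"
    return "missed"


print(match_kind("gray", {"grey"}))    # same family, different string
print(match_kind("maroon", {"red"}))   # different families
```

Under this rule "maroon" answered as "red" and "light blue" answered as "blue" count as misses because the pairs sit in different families, which matches how those cases appear in the results.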
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix: Constrained Prompt (`jersey_prompt_constrained.txt`)
|
||||||
|
|
||||||
|
```
|
||||||
|
You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
|
||||||
|
|
||||||
|
CRITICAL INSTRUCTIONS:
|
||||||
|
1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
|
||||||
|
2. ONLY include jersey numbers that you can ACTUALLY READ in the image
|
||||||
|
3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
|
||||||
|
4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
|
||||||
|
5. DO NOT include jerseys if you cannot clearly see the number
|
||||||
|
|
||||||
|
COLOR VOCABULARY:
|
||||||
|
For "jersey_color" and "number_color", you MUST choose from this list ONLY:
|
||||||
|
red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
|
||||||
|
|
||||||
|
Important color distinctions:
|
||||||
|
- Use "maroon" for dark brownish-red, NOT "red"
|
||||||
|
- Use "light blue" for pale or sky blue, NOT "blue"
|
||||||
|
- Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
|
||||||
|
- Use "teal" for blue-green, NOT "green" or "blue"
|
||||||
|
- Use "gray" (not "grey") for silver or neutral tones
|
||||||
|
- Use "dark brown" for very dark brown, NOT "black"
|
||||||
|
- Use "gold" for metallic or deep yellow, NOT "yellow"
|
||||||
|
|
||||||
|
RESPONSE FORMAT:
|
||||||
|
Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
|
||||||
|
|
||||||
|
Use DOUBLE QUOTES (") for all JSON keys and string values.
|
||||||
|
|
||||||
|
The JSON must have a single key "jerseys" with an array of dictionaries.
|
||||||
|
|
||||||
|
Each dictionary must have exactly these three keys:
|
||||||
|
- "jersey_number": The number on the jersey (as a string, only if clearly visible)
|
||||||
|
- "jersey_color": The primary color of the jersey (MUST be from the color list above)
|
||||||
|
- "number_color": The color of the number on the jersey (MUST be from the color list above)
|
||||||
|
|
||||||
|
Example response for an image WITH visible jerseys:
|
||||||
|
{
|
||||||
|
"jerseys": [
|
||||||
|
{
|
||||||
|
"jersey_number": "10",
|
||||||
|
"jersey_color": "maroon",
|
||||||
|
"number_color": "gold"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"jersey_number": "42",
|
||||||
|
"jersey_color": "light blue",
|
||||||
|
"number_color": "white"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
Example response for an image WITHOUT jerseys or with unclear numbers:
|
||||||
|
{"jerseys": []}
|
||||||
|
|
||||||
|
REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
|
||||||
|
|
||||||
|
Now analyze the image and return the JSON object.
|
||||||
|
```
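Because the prompt fixes both the JSON shape and the color vocabulary, replies can be checked mechanically before scoring. The following is a minimal sketch of such a check. The function name and the error messages are illustrative, and the allowed list is copied from the prompt above.

```python
import json

ALLOWED_COLORS = {
    "red", "blue", "dark blue", "navy blue", "light blue", "green", "yellow",
    "gold", "orange", "purple", "black", "white", "gray", "brown",
    "dark brown", "maroon", "teal", "pink",
}
REQUIRED_KEYS = {"jersey_number", "jersey_color", "number_color"}


def parse_response(text: str):
    """Return (valid_jerseys, problems) for one raw model reply."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("jerseys"), list):
        return [], ['missing "jerseys" array']

    valid, problems = [], []
    for i, item in enumerate(data["jerseys"]):
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            problems.append(f"jersey {i}: unexpected keys")
            continue
        bad = [
            key for key in ("jersey_color", "number_color")
            if not isinstance(item[key], str) or item[key] not in ALLOWED_COLORS
        ]
        if bad:
            problems.append(f"jersey {i}: off-list value for {', '.join(bad)}")
            continue
        valid.append(item)
    return valid, problems


reply = '{"jerseys": [{"jersey_number": "10", "jersey_color": "grey", "number_color": "gold"}]}'
print(parse_response(reply))
```

Logging the off-list values instead of silently dropping them would show how often a model ignores the vocabulary constraint.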
|
||||||
490
accuracy_test_results.md
Normal file
@ -0,0 +1,490 @@
|
|||||||
|
# Gemini 3 Flash Results (Prompt: jersey_prompt.txt):
|
||||||
|
|
||||||
|
================================================================================
|
||||||
|
ACCURACY SUMMARY (gemini-3-flash-preview)
|
||||||
|
================================================================================
|
||||||
|
Images processed: 161
|
||||||
|
Errors: 0
|
||||||
|
Total time: 2134.4s (13.3s avg)
|
||||||
|
|
||||||
|
Ground truth colors: 202 (excluding white)
|
||||||
|
VLM unique colors: 174 (excluding white)
|
||||||
|
|
||||||
|
--- Recall (did VLM find each ground truth color?) ---
|
||||||
|
Exact match: 130 / 202 (64.4%)
|
||||||
|
Similar match: 34 / 202 (16.8%)
|
||||||
|
Total found: 164 / 202 (81.2%)
|
||||||
|
Missed: 38 / 202 (18.8%)
|
||||||
|
|
||||||
|
--- Precision (are VLM colors correct?) ---
|
||||||
|
Exact match: 130 / 174 (74.7%)
|
||||||
|
Similar match: 33 / 174 (19.0%)
|
||||||
|
Total correct: 163 / 174 (93.7%)
|
||||||
|
Extra/wrong: 11 / 174 (6.3%)
|
||||||
|
|
||||||
|
--- Similar-Match Confusions (expected -> got) ---
|
||||||
|
gray -> grey x9
|
||||||
|
navy blue -> blue x7
|
||||||
|
dark brown -> brown x5
|
||||||
|
dark blue -> blue x5
|
||||||
|
gold -> yellow x3
|
||||||
|
dark blue -> navy blue x3
|
||||||
|
navy -> navy blue x1
|
||||||
|
dark blue -> navy x1
|
||||||
|
|
||||||
|
--- Most Missed Ground Truth Colors ---
|
||||||
|
gray 7 #######
|
||||||
|
black 7 #######
|
||||||
|
maroon 5 #####
|
||||||
|
blue 3 ###
|
||||||
|
green 3 ###
|
||||||
|
gold 2 ##
|
||||||
|
light blue 2 ##
|
||||||
|
gold|yellow 2 ##
|
||||||
|
red 2 ##
|
||||||
|
teal 2 ##
|
||||||
|
orange 1 #
|
||||||
|
yellow 1 #
|
||||||
|
brown 1 #
|
||||||
|
|
||||||
|
--- Most Common Extra/Wrong VLM Colors ---
|
||||||
|
red 3 ###
|
||||||
|
blue 3 ###
|
||||||
|
black 2 ##
|
||||||
|
green 1 #
|
||||||
|
orange 1 #
|
||||||
|
dark blue 1 #
|
||||||
|
|
||||||
|
--- Per-Image Verdict ---
|
||||||
|
PASS 124
|
||||||
|
PARTIAL 19
|
||||||
|
FAIL 18
|
||||||
|
|
||||||
|
--- Failed Images (18) ---
|
||||||
|
016 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
029 -maroon_white.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
034 - light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
046 - green.jpg
|
||||||
|
missed: green
|
||||||
|
extra: black
|
||||||
|
048 - red.jpg
|
||||||
|
missed: red
|
||||||
|
053 - black_white.jpg
|
||||||
|
missed: black
|
||||||
|
057 - white_gold or yellow.jpg
|
||||||
|
missed: gold|yellow
|
||||||
|
069 - red_white.jpg
|
||||||
|
missed: red
|
||||||
|
074 - white_orange.jpg
|
||||||
|
missed: orange
|
||||||
|
077 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: green
|
||||||
|
088 - white_maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
129 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
132 - brown_white.jpg
|
||||||
|
missed: brown
|
||||||
|
extra: orange
|
||||||
|
134 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: blue
|
||||||
|
138 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
150 - green_gray.jpg
|
||||||
|
missed: green, gray
|
||||||
|
extra: black
|
||||||
|
160 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
161 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
|
||||||
|
|
||||||
|
# Qwen3-VL-8B Model Results (Prompt: jersey_prompt.txt):
|
||||||
|
|
||||||
|
================================================================================
|
||||||
|
ACCURACY SUMMARY
|
||||||
|
================================================================================
|
||||||
|
Images processed: 161
|
||||||
|
Errors: 0
|
||||||
|
Total time: 1526.4s (9.5s avg)
|
||||||
|
|
||||||
|
Ground truth colors: 202 (excluding white)
|
||||||
|
VLM unique colors: 184 (excluding white)
|
||||||
|
|
||||||
|
--- Recall (did VLM find each ground truth color?) ---
|
||||||
|
Exact match: 130 / 202 (64.4%)
|
||||||
|
Similar match: 26 / 202 (12.9%)
|
||||||
|
Total found: 156 / 202 (77.2%)
|
||||||
|
Missed: 46 / 202 (22.8%)
|
||||||
|
|
||||||
|
--- Precision (are VLM colors correct?) ---
|
||||||
|
Exact match: 130 / 184 (70.7%)
|
||||||
|
Similar match: 26 / 184 (14.1%)
|
||||||
|
Total correct: 156 / 184 (84.8%)
|
||||||
|
Extra/wrong: 28 / 184 (15.2%)
|
||||||
|
|
||||||
|
--- Similar-Match Confusions (expected -> got) ---
|
||||||
|
dark blue -> blue x10
|
||||||
|
navy blue -> blue x8
|
||||||
|
gold -> yellow x5
|
||||||
|
dark brown -> brown x2
|
||||||
|
navy -> blue x1
|
||||||
|
|
||||||
|
--- Most Missed Ground Truth Colors ---
|
||||||
|
light blue 8 ########
|
||||||
|
maroon 8 ########
|
||||||
|
gray 7 #######
|
||||||
|
black 6 ######
|
||||||
|
dark brown 4 ####
|
||||||
|
brown 3 ###
|
||||||
|
blue 3 ###
|
||||||
|
green 3 ###
|
||||||
|
teal 2 ##
|
||||||
|
gold|yellow 1 #
|
||||||
|
red 1 #
|
||||||
|
|
||||||
|
--- Most Common Extra/Wrong VLM Colors ---
|
||||||
|
blue 10 ##########
|
||||||
|
black 7 #######
|
||||||
|
red 7 #######
|
||||||
|
gold 1 #
|
||||||
|
green 1 #
|
||||||
|
redolas 1 #
|
||||||
|
orange 1 #
|
||||||
|
|
||||||
|
--- Per-Image Verdict ---
|
||||||
|
PASS 117
|
||||||
|
PARTIAL 18
|
||||||
|
FAIL 26
|
||||||
|
|
||||||
|
--- Failed Images (26) ---
|
||||||
|
001 -brown_white or dark brown.jpg
|
||||||
|
missed: brown, dark brown
|
||||||
|
extra: black
|
||||||
|
013 - light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
016 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
017 - brown_white.jpg
|
||||||
|
missed: brown
|
||||||
|
extra: black
|
||||||
|
022 - black_light blue.jpg
|
||||||
|
missed: black, light blue
|
||||||
|
extra: blue
|
||||||
|
029 -maroon_white.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
034 - light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
036 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
046 - green.jpg
|
||||||
|
missed: green
|
||||||
|
extra: black
|
||||||
|
053 - black_white.jpg
|
||||||
|
missed: black
|
||||||
|
057 - white_gold or yellow.jpg
|
||||||
|
missed: gold|yellow
|
||||||
|
063 - dark brown.jpg
|
||||||
|
missed: dark brown
|
||||||
|
extra: black
|
||||||
|
069 - red_white.jpg
|
||||||
|
missed: red
|
||||||
|
077 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: green
|
||||||
|
078 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
083 - dark brown_white.jpg
|
||||||
|
missed: dark brown
|
||||||
|
extra: black
|
||||||
|
087 - white_light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
099 - maroon_white.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: redolas, red
|
||||||
|
129 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
132 - brown_white.jpg
|
||||||
|
missed: brown
|
||||||
|
extra: orange
|
||||||
|
134 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: blue
|
||||||
|
138 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
141 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
150 - green_gray.jpg
|
||||||
|
missed: green, gray
|
||||||
|
extra: black
|
||||||
|
160 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
161 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
|
||||||
|
|
||||||
|
# Gemini 3 Flash Results (Prompt: jersey_prompt_capstone.txt):
|
||||||
|
|
||||||
|
================================================================================
|
||||||
|
ACCURACY SUMMARY (gemini-3-flash-preview)
|
||||||
|
================================================================================
|
||||||
|
Images processed: 161
|
||||||
|
Errors: 0
|
||||||
|
Total time: 1881.7s (11.7s avg)
|
||||||
|
|
||||||
|
Ground truth colors: 202 (excluding white)
|
||||||
|
VLM unique colors: 174 (excluding white)
|
||||||
|
|
||||||
|
--- Recall (did VLM find each ground truth color?) ---
|
||||||
|
Exact match: 123 / 202 (60.9%)
|
||||||
|
Similar match: 35 / 202 (17.3%)
|
||||||
|
Total found: 158 / 202 (78.2%)
|
||||||
|
Missed: 44 / 202 (21.8%)
|
||||||
|
|
||||||
|
--- Precision (are VLM colors correct?) ---
|
||||||
|
Exact match: 123 / 174 (70.7%)
|
||||||
|
Similar match: 34 / 174 (19.5%)
|
||||||
|
Total correct: 157 / 174 (90.2%)
|
||||||
|
Extra/wrong: 17 / 174 (9.8%)
|
||||||
|
|
||||||
|
--- Similar-Match Confusions (expected -> got) ---
|
||||||
|
gray -> grey x10
|
||||||
|
navy blue -> blue x6
|
||||||
|
dark blue -> blue x6
|
||||||
|
dark brown -> brown x5
|
||||||
|
dark blue -> navy blue x3
|
||||||
|
gold -> yellow x2
|
||||||
|
navy blue -> navy x1
|
||||||
|
navy -> blue x1
|
||||||
|
dark blue -> navy x1
|
||||||
|
|
||||||
|
--- Most Missed Ground Truth Colors ---
|
||||||
|
maroon 9 #########
|
||||||
|
black 7 #######
|
||||||
|
gray 6 ######
|
||||||
|
green 4 ####
|
||||||
|
gold 3 ###
|
||||||
|
blue 3 ###
|
||||||
|
light blue 2 ##
|
||||||
|
gold|yellow 2 ##
|
||||||
|
red 2 ##
|
||||||
|
teal 2 ##
|
||||||
|
navy blue 1 #
|
||||||
|
dark brown 1 #
|
||||||
|
yellow 1 #
|
||||||
|
brown 1 #
|
||||||
|
|
||||||
|
--- Most Common Extra/Wrong VLM Colors ---
|
||||||
|
red 7 #######
|
||||||
|
black 4 ####
|
||||||
|
blue 2 ##
|
||||||
|
green 1 #
|
||||||
|
orange 1 #
|
||||||
|
light blue 1 #
|
||||||
|
navy 1 #
|
||||||
|
|
||||||
|
--- Per-Image Verdict ---
|
||||||
|
PASS 118
|
||||||
|
PARTIAL 21
|
||||||
|
FAIL 22
|
||||||
|
|
||||||
|
--- Failed Images (22) ---
|
||||||
|
016 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
019 - maroon_gold.jpg
|
||||||
|
missed: maroon, gold
|
||||||
|
extra: red
|
||||||
|
029 -maroon_white.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
030 - navy blue_white.jpg
|
||||||
|
missed: navy blue
|
||||||
|
034 - light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
036 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
046 - green.jpg
|
||||||
|
missed: green
|
||||||
|
extra: black
|
||||||
|
048 - red.jpg
|
||||||
|
missed: red
|
||||||
|
053 - black_white.jpg
|
||||||
|
missed: black
|
||||||
|
057 - white_gold or yellow.jpg
|
||||||
|
missed: gold|yellow
|
||||||
|
069 - red_white.jpg
|
||||||
|
missed: red
|
||||||
|
077 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: green
|
||||||
|
083 - dark brown_white.jpg
|
||||||
|
missed: dark brown
|
||||||
|
extra: black
|
||||||
|
088 - white_maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
099 - maroon_white.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
128 - green_white.jpg
|
||||||
|
missed: green
|
||||||
|
129 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
132 - brown_white.jpg
|
||||||
|
missed: brown
|
||||||
|
extra: orange
|
||||||
|
134 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: light blue
|
||||||
|
138 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
150 - green_gray.jpg
|
||||||
|
missed: green, gray
|
||||||
|
extra: black
|
||||||
|
160 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
|
||||||
|
|
||||||
|
#Qwen3-VL-8B Model Results (Prompt: jersey_prompt_capstone.txt):
|
||||||
|
|
||||||
|
================================================================================
|
||||||
|
ACCURACY SUMMARY
|
||||||
|
================================================================================
|
||||||
|
Images processed: 161
|
||||||
|
Errors: 0
|
||||||
|
Total time: 1435.7s (8.9s avg)
|
||||||
|
|
||||||
|
Ground truth colors: 202 (excluding white)
|
||||||
|
VLM unique colors: 180 (excluding white)
|
||||||
|
|
||||||
|
--- Recall (did VLM find each ground truth color?) ---
|
||||||
|
Exact match: 133 / 202 (65.8%)
|
||||||
|
Similar match: 24 / 202 (11.9%)
|
||||||
|
Total found: 157 / 202 (77.7%)
|
||||||
|
Missed: 45 / 202 (22.3%)
|
||||||
|
|
||||||
|
--- Precision (are VLM colors correct?) ---
|
||||||
|
Exact match: 133 / 180 (73.9%)
|
||||||
|
Similar match: 24 / 180 (13.3%)
|
||||||
|
Total correct: 157 / 180 (87.2%)
|
||||||
|
Extra/wrong: 23 / 180 (12.8%)
|
||||||
|
|
||||||
|
--- Similar-Match Confusions (expected -> got) ---
|
||||||
|
dark blue -> blue x9
|
||||||
|
navy blue -> blue x8
|
||||||
|
gold -> yellow x3
|
||||||
|
dark brown -> brown x2
|
||||||
|
navy -> blue x1
|
||||||
|
dark blue -> navy x1
|
||||||
|
|
||||||
|
--- Most Missed Ground Truth Colors ---
|
||||||
|
gray 9 #########
|
||||||
|
maroon 7 #######
|
||||||
|
black 6 ######
|
||||||
|
light blue 5 #####
|
||||||
|
dark brown 4 ####
|
||||||
|
green 4 ####
|
||||||
|
brown 3 ###
|
||||||
|
gold 2 ##
|
||||||
|
blue 2 ##
|
||||||
|
teal 2 ##
|
||||||
|
gold|yellow 1 #
|
||||||
|
|
||||||
|
--- Most Common Extra/Wrong VLM Colors ---
|
||||||
|
black 7 #######
|
||||||
|
blue 6 ######
|
||||||
|
red 6 ######
|
||||||
|
gold 1 #
|
||||||
|
green 1 #
|
||||||
|
orange 1 #
|
||||||
|
navy 1 #
|
||||||
|
|
||||||
|
--- Per-Image Verdict ---
|
||||||
|
PASS 119
|
||||||
|
PARTIAL 19
|
||||||
|
FAIL 23
|
||||||
|
|
||||||
|
--- Failed Images (23) ---
|
||||||
|
001 -brown_white or dark brown.jpg
|
||||||
|
missed: brown, dark brown
|
||||||
|
extra: black
|
||||||
|
013 - light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
016 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
017 - brown_white.jpg
|
||||||
|
missed: brown
|
||||||
|
extra: black
|
||||||
|
019 - maroon_gold.jpg
|
||||||
|
missed: maroon, gold
|
||||||
|
extra: red
|
||||||
|
029 -maroon_white.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
034 - light blue.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
036 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
039 - gray_white.jpg
|
||||||
|
missed: gray
|
||||||
|
046 - green.jpg
|
||||||
|
missed: green
|
||||||
|
extra: black
|
||||||
|
053 - black_white.jpg
|
||||||
|
missed: black
|
||||||
|
057 - white_gold or yellow.jpg
|
||||||
|
missed: gold|yellow
|
||||||
|
063 - dark brown.jpg
|
||||||
|
missed: dark brown
|
||||||
|
extra: black
|
||||||
|
077 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: green
|
||||||
|
083 - dark brown_white.jpg
|
||||||
|
missed: dark brown
|
||||||
|
extra: black
|
||||||
|
132 - brown_white.jpg
|
||||||
|
missed: brown
|
||||||
|
extra: orange
|
||||||
|
134 - teal_white.jpg
|
||||||
|
missed: teal
|
||||||
|
extra: blue
|
||||||
|
138 - maroon.jpg
|
||||||
|
missed: maroon
|
||||||
|
extra: red
|
||||||
|
141 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
|
145 - green_white.jpg
|
||||||
|
missed: green
|
||||||
|
150 - green_gray.jpg
|
||||||
|
missed: green, gray
|
||||||
|
extra: black
|
||||||
|
160 - blue_white.jpg
|
||||||
|
missed: blue
|
||||||
|
161 - light blue_white.jpg
|
||||||
|
missed: light blue
|
||||||
|
extra: blue
|
||||||
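As a cross-check on the two capstone-prompt summaries, the reported percentages follow directly from the raw counts. The sketch below recomputes them; `rates` is a hypothetical helper introduced here for illustration and is not part of the committed scripts, while `pct` mirrors the one-decimal formatting the test scripts use.

```python
def pct(n: int, d: int) -> str:
    """Format n/d as a percentage with one decimal place."""
    return f"{100 * n / d:.1f}%" if d else "N/A"


def rates(exact: int, similar: int, total: int) -> dict:
    """Derive the four summary percentages from raw counts (hypothetical helper)."""
    found = exact + similar
    return {
        "exact": pct(exact, total),
        "similar": pct(similar, total),
        "found": pct(found, total),
        "rest": pct(total - found, total),  # missed (recall) or extra/wrong (precision)
    }


# Gemini + capstone prompt: recall is measured over 202 ground-truth colors,
# precision over the 174 unique colors the model returned.
gemini_recall = rates(123, 35, 202)
gemini_precision = rates(123, 34, 174)

# Qwen + capstone prompt: same ground truth, 180 unique model colors.
qwen_recall = rates(133, 24, 202)
qwen_precision = rates(133, 24, 180)
```

Recall and precision share the same exact-match numerator (123 for Gemini, 133 for Qwen) but use different denominators, which is why a model that returns fewer colors can score higher on precision without finding more ground-truth colors.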
5609
accuracy_test_results_all.txt
Normal file
File diff suppressed because it is too large
15
jersey_prompt_capstone.txt
Normal file
@ -0,0 +1,15 @@
You are a high-precision sports telemetry system. Your job is to scan the image and output structured data for every visible jersey number.

**Goal:** Identify every clearly readable jersey number, along with its jersey color and number color.

**Input Analysis Guidelines:**

1. **Scan Targets:** Focus entirely on the torso/chest, back, and leg areas of players.
2. **Verify Readability:** For each potential number, check: - Are all digits clearly visible? - Is any part of the number occluded by a limb, fold, or object? - Is the number blurry or too small to read with certainty? - If a number is partially hidden (e.g., looking like a 1 but could be a 7), DISCARD IT.
3. Determine jersey_color from that player's TORSO SHIRT region: - Use the largest contiguous fabric area on the torso (exclude the number itself, stripes/logos, and deep shadows). - Ignore shorts color even if shorts dominate the image. - Choose the single color name that best matches the shirt's base color.

**Examples:** [Image: Player in red shirt with white '10'] -> {"jerseys": [{"jersey_number": "10", "jersey_color": "red", "number_color": "white"}]}

**Output Format:** Provide your output in valid JSON format with the following structure. Do not include markdown formatting (like ```json). { "jerseys": [ { "jersey_number": <string>, "jersey_color": <string>, "number_color": <string> } ] }

**Constraint:** - If no numbers are clearly readable, return "jerseys": []. - Do not guess. Precision is more important than recall.
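The output format this prompt asks for can be checked mechanically before scoring. The following is a minimal sketch, assuming the model reply has already been stripped of any wrapper text; `parse_jerseys` and `REQUIRED_KEYS` are hypothetical names used only for illustration and do not appear in the committed scripts.

```python
import json

# The three string fields the prompt's Output Format section requires per jersey.
REQUIRED_KEYS = {"jersey_number", "jersey_color", "number_color"}


def parse_jerseys(raw: str) -> list:
    """Parse a model reply and verify it matches the prompt's JSON structure."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict) or not isinstance(data.get("jerseys"), list):
        raise ValueError('expected an object with a "jerseys" array')
    for entry in data["jerseys"]:
        if not isinstance(entry, dict):
            raise ValueError("each jersey must be a JSON object")
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        for key in REQUIRED_KEYS:
            if not isinstance(entry[key], str):
                raise ValueError(f"{key} must be a string")
    return data["jerseys"]


# The example given in the prompt itself.
sample = '{"jerseys": [{"jersey_number": "10", "jersey_color": "red", "number_color": "white"}]}'
jerseys = parse_jerseys(sample)
```

An empty `"jerseys"` array is a valid reply under the prompt's constraint section, so the sketch accepts it rather than treating it as an error.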
56
jersey_prompt_constrained.txt
Normal file
@ -0,0 +1,56 @@
You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.

CRITICAL INSTRUCTIONS:
1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
2. ONLY include jersey numbers that you can ACTUALLY READ in the image
3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
5. DO NOT include jerseys if you cannot clearly see the number

COLOR VOCABULARY:
For "jersey_color" and "number_color", you MUST choose from this list ONLY:
red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink

Important color distinctions:
- Use "maroon" for dark brownish-red, NOT "red"
- Use "light blue" for pale or sky blue, NOT "blue"
- Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
- Use "teal" for blue-green, NOT "green" or "blue"
- Use "gray" (not "grey") for silver or neutral tones
- Use "dark brown" for very dark brown, NOT "black"
- Use "gold" for metallic or deep yellow, NOT "yellow"

RESPONSE FORMAT:
Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.

Use DOUBLE QUOTES (") for all JSON keys and string values.

The JSON must have a single key "jerseys" with an array of dictionaries.

Each dictionary must have exactly these three keys:
- "jersey_number": The number on the jersey (as a string, only if clearly visible)
- "jersey_color": The primary color of the jersey (MUST be from the color list above)
- "number_color": The color of the number on the jersey (MUST be from the color list above)

Example response for an image WITH visible jerseys:
{
  "jerseys": [
    {
      "jersey_number": "10",
      "jersey_color": "maroon",
      "number_color": "gold"
    },
    {
      "jersey_number": "42",
      "jersey_color": "light blue",
      "number_color": "white"
    }
  ]
}

Example response for an image WITHOUT jerseys or with unclear numbers:
{"jerseys": []}

REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.

Now analyze the image and return the JSON object.
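Because this prompt restricts both color fields to a closed list, it is possible to measure how often a model strays outside it. The sketch below is one way to do that; `ALLOWED_COLORS` is copied from the prompt's COLOR VOCABULARY section, while `off_vocabulary` is a hypothetical helper that is not part of the committed test scripts.

```python
# The closed color list from the constrained prompt.
ALLOWED_COLORS = {
    "red", "blue", "dark blue", "navy blue", "light blue", "green", "yellow",
    "gold", "orange", "purple", "black", "white", "gray", "brown",
    "dark brown", "maroon", "teal", "pink",
}


def off_vocabulary(jerseys: list) -> list:
    """Return (field, value) pairs whose color is not in the allowed list."""
    violations = []
    for entry in jerseys:
        for field in ("jersey_color", "number_color"):
            value = entry.get(field, "").strip().lower()
            if value not in ALLOWED_COLORS:
                violations.append((field, value))
    return violations


reply = [
    {"jersey_number": "10", "jersey_color": "maroon", "number_color": "gold"},
    {"jersey_number": "42", "jersey_color": "navy", "number_color": "grey"},
]
violations = off_vocabulary(reply)
```

Values such as "navy" and "grey" are flagged here even though the scoring scripts treat them as similar matches, so this check measures prompt compliance rather than color accuracy.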
11
pyproject.toml
Normal file
@ -0,0 +1,11 @@
[project]
name = "jersey-test"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "numpy>=1.24.0",
    "opencv-python>=4.8.0",
    "requests>=2.28.0",
]
44
run_all_accuracy_tests.sh
Executable file
@ -0,0 +1,44 @@
#!/usr/bin/env bash
#
# Run both accuracy test scripts against all three prompts.
# Results are saved to accuracy_test_results_all.txt
#

set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
OUTPUT_FILE="${SCRIPT_DIR}/accuracy_test_results_all.txt"

PROMPTS=(
    "jersey_prompt.txt"
    "jersey_prompt_capstone.txt"
    "jersey_prompt_constrained.txt"
)

echo "Results will be saved to: ${OUTPUT_FILE}"
echo "Started at: $(date)"
echo ""

> "$OUTPUT_FILE"

for prompt in "${PROMPTS[@]}"; do
    prompt_path="${SCRIPT_DIR}/${prompt}"

    echo "========================================" | tee -a "$OUTPUT_FILE"
    echo "Qwen3-VL-8B + ${prompt}" | tee -a "$OUTPUT_FILE"
    echo "Started: $(date)" | tee -a "$OUTPUT_FILE"
    echo "========================================" | tee -a "$OUTPUT_FILE"
    python3 "${SCRIPT_DIR}/test_accuracy.py" "$prompt_path" 2>&1 | tee -a "$OUTPUT_FILE"
    echo "" | tee -a "$OUTPUT_FILE"

    echo "========================================" | tee -a "$OUTPUT_FILE"
    echo "Gemini 3 Flash + ${prompt}" | tee -a "$OUTPUT_FILE"
    echo "Started: $(date)" | tee -a "$OUTPUT_FILE"
    echo "========================================" | tee -a "$OUTPUT_FILE"
    python3 "${SCRIPT_DIR}/test_accuracy_gemini.py" "$prompt_path" 2>&1 | tee -a "$OUTPUT_FILE"
    echo "" | tee -a "$OUTPUT_FILE"
done

echo "========================================" | tee -a "$OUTPUT_FILE"
echo "All tests completed at: $(date)" | tee -a "$OUTPUT_FILE"
echo "Results saved to: ${OUTPUT_FILE}"
402
test_accuracy.py
Normal file
@ -0,0 +1,402 @@
#!/usr/bin/env python3
"""
Test script to measure VLM accuracy for jersey color detection.

Uses annotated test images where ground truth colors are encoded in filenames.
Compares VLM results against ground truth, measuring exact and similar color matches.
White is ignored in both ground truth and VLM results.

Filename format: "014 - orange_dark blue or purple.jpg"
  - Underscore separates distinct jersey colors
  - "or" separates acceptable alternatives for a single jersey

Usage:
    python test_accuracy.py [prompt_file]
"""

import json
import os
import re
import sys
import time
from collections import Counter
from pathlib import Path

import cv2

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from scan_utils.llama_cpp_client import LlamaCppClient

SERVER_URL = "http://agx:8080"
IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files_annotated")
DEFAULT_PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt.txt")
MAX_IMAGE_WIDTH = 768

# ---------------------------------------------------------------------------
# Color similarity – colors in the same family count as "similar" matches
# ---------------------------------------------------------------------------
COLOR_FAMILIES = {
    'blue': ['blue', 'dark blue', 'navy blue', 'navy', 'royal blue'],
    'light_blue': ['light blue', 'sky blue', 'baby blue', 'carolina blue', 'powder blue'],
    'red': ['red', 'scarlet', 'crimson'],
    'dark_red': ['maroon', 'burgundy', 'dark red', 'wine'],
    'green': ['green', 'dark green', 'forest green', 'kelly green'],
    'yellow': ['yellow', 'gold', 'golden'],
    'orange': ['orange', 'burnt orange'],
    'brown': ['brown', 'dark brown'],
    'purple': ['purple', 'violet'],
    'gray': ['gray', 'grey', 'silver', 'charcoal'],
    'black': ['black'],
    'teal': ['teal', 'turquoise', 'cyan', 'aqua'],
    'pink': ['pink', 'magenta', 'hot pink', 'rose'],
}

_COLOR_TO_FAMILY = {}
for _family, _members in COLOR_FAMILIES.items():
    for _color in _members:
        _COLOR_TO_FAMILY[_color] = _family


def colors_are_similar(color1: str, color2: str) -> bool:
    """Return True if two colors belong to the same color family."""
    if color1 == color2:
        return True
    f1 = _COLOR_TO_FAMILY.get(color1)
    f2 = _COLOR_TO_FAMILY.get(color2)
    return bool(f1 and f2 and f1 == f2)
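The family table explains several patterns in the reported results. The sketch below uses a reduced copy of `COLOR_FAMILIES` (only the families needed for the demonstration) and a public-named `COLOR_TO_FAMILY` lookup, both assumptions made for a self-contained example rather than a replacement for the code above.

```python
# Reduced copy of the family table, enough to illustrate the matching rule.
COLOR_FAMILIES = {
    'blue': ['blue', 'dark blue', 'navy blue', 'navy', 'royal blue'],
    'light_blue': ['light blue', 'sky blue'],
    'red': ['red', 'scarlet', 'crimson'],
    'dark_red': ['maroon', 'burgundy', 'dark red', 'wine'],
    'gray': ['gray', 'grey', 'silver', 'charcoal'],
}

# Reverse lookup: color name -> family name.
COLOR_TO_FAMILY = {
    color: family
    for family, members in COLOR_FAMILIES.items()
    for color in members
}


def colors_are_similar(color1: str, color2: str) -> bool:
    """Return True if two colors are identical or share a family."""
    if color1 == color2:
        return True
    f1 = COLOR_TO_FAMILY.get(color1)
    f2 = COLOR_TO_FAMILY.get(color2)
    return bool(f1 and f2 and f1 == f2)


navy_vs_blue = colors_are_similar('navy blue', 'blue')    # same family
gray_vs_grey = colors_are_similar('gray', 'grey')         # spelling variants, same family
maroon_vs_red = colors_are_similar('maroon', 'red')       # different families
light_vs_blue = colors_are_similar('light blue', 'blue')  # different families
unknown = colors_are_similar('chartreuse', 'green')       # unlisted colors never match
```

Since "maroon" and "red" sit in separate families, a maroon jersey reported as red is scored as a miss plus an extra, whereas "gray" reported as "grey" counts as a similar match rather than an exact one.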
# ---------------------------------------------------------------------------
# Ground-truth parsing
# ---------------------------------------------------------------------------
def parse_ground_truth(filename: str) -> list[list[str]]:
    """Parse ground truth colors from an annotated filename.

    Returns a list of color groups. Each group is a list of acceptable
    alternatives (from "or" in the filename). White entries are removed.

    Example: "014 - orange_dark blue or purple.jpg"
        -> [["orange"], ["dark blue", "purple"]]
    """
    name = Path(filename).stem
    # Strip number prefix ("014 - ", "029 -", etc.)
    name = re.sub(r'^\d+\s*-\s*', '', name)
    # Treat hyphens between colors as underscores (e.g. "yellow-black")
    name = name.replace('-', '_')

    color_groups = []
    for part in name.split('_'):
        part = part.strip()
        if not part:
            continue
        alternatives = [a.strip().lower() for a in part.split(' or ')]
        alternatives = [a for a in alternatives if a and a != 'white']
        if alternatives:
            color_groups.append(alternatives)
    return color_groups
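To make the filename convention concrete, the sketch below repeats the parser in self-contained form and runs it on filenames that appear in the results. It is a copy for illustration only, and the variable names holding the results are hypothetical.

```python
import re
from pathlib import Path


def parse_ground_truth(filename: str) -> list:
    """Split an annotated filename into groups of acceptable colors."""
    name = Path(filename).stem
    name = re.sub(r'^\d+\s*-\s*', '', name)  # drop the "014 - " style prefix
    name = name.replace('-', '_')            # hyphens also separate colors
    groups = []
    for part in name.split('_'):
        alternatives = [a.strip().lower() for a in part.split(' or ')]
        alternatives = [a for a in alternatives if a and a != 'white']
        if alternatives:
            groups.append(alternatives)
    return groups


docstring_case = parse_ground_truth("014 - orange_dark blue or purple.jpg")
either_color = parse_ground_truth("057 - white_gold or yellow.jpg")
tight_prefix = parse_ground_truth("029 -maroon_white.jpg")
mixed_group = parse_ground_truth("001 -brown_white or dark brown.jpg")
```

In the last filename, "white or dark brown" reduces to a single group containing only "dark brown" once white is removed, which is why that image contributes two ground-truth colors (brown and dark brown) to the totals.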
# ---------------------------------------------------------------------------
# Response cleaning
# ---------------------------------------------------------------------------
def clean_response(text: str) -> str:
    """Remove think tags and markdown code blocks from model output."""
    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(r'\u25c1think\u25b7.*?\u25c1/think\u25b7', '', cleaned, flags=re.DOTALL)
    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
    cleaned = re.sub(r'\u25c1/?think\u25b7', '', cleaned, flags=re.IGNORECASE)

    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
    if json_block:
        cleaned = json_block.group(1)
    else:
        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)

    return cleaned.strip()


# ---------------------------------------------------------------------------
# Scoring
# ---------------------------------------------------------------------------
def score_image(gt_groups: list[list[str]], vlm_colors: set[str]) -> dict:
    """Compare VLM detected colors against ground truth color groups.

    Recall = how many GT color groups were found in VLM output
    Precision = how many VLM colors match something in the GT
    """
    recall_exact = 0
    recall_similar = 0
    recall_missed = []
    confusions = []

    for group in gt_groups:
        # Try exact match first
        if any(alt in vlm_colors for alt in group):
            recall_exact += 1
            continue
        # Try similar match
        matched_vlm = None
        for alt in group:
            for vc in vlm_colors:
                if colors_are_similar(alt, vc):
                    matched_vlm = vc
                    break
            if matched_vlm:
                break
        if matched_vlm:
            recall_similar += 1
            confusions.append((group, matched_vlm))
        else:
            recall_missed.append(group)

    # Precision: check each VLM color against GT
    all_gt_alts = [alt for group in gt_groups for alt in group]
    precision_exact = 0
    precision_similar = 0
    precision_extra = []
    for vc in vlm_colors:
        if vc in all_gt_alts:
            precision_exact += 1
        elif any(colors_are_similar(vc, gt) for gt in all_gt_alts):
            precision_similar += 1
        else:
            precision_extra.append(vc)

    return {
        'gt_count': len(gt_groups),
        'vlm_count': len(vlm_colors),
        'recall_exact': recall_exact,
        'recall_similar': recall_similar,
        'recall_missed': recall_missed,
        'precision_exact': precision_exact,
        'precision_similar': precision_similar,
        'precision_extra': precision_extra,
        'confusions': confusions,
    }


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def pct(n: int, d: int) -> str:
    return f"{100 * n / d:.1f}%" if d else "N/A"


def print_summary(total_gt, total_vlm, total_recall_exact, total_recall_similar,
                  total_recall_missed, total_precision_exact, total_precision_similar,
                  total_precision_extra, confusion_counter, missed_counter,
                  extra_counter, per_image_results, image_count, errors, total_time):
    """Print the full accuracy summary report."""
    print()
    print("=" * 80)
    print("ACCURACY SUMMARY")
    print("=" * 80)
    print(f"Images processed: {image_count}")
    print(f"Errors: {errors}")
    print(f"Total time: {total_time:.1f}s ({total_time / max(image_count, 1):.1f}s avg)")
    print()
    print(f"Ground truth colors: {total_gt} (excluding white)")
    print(f"VLM unique colors: {total_vlm} (excluding white)")
    print()

    print("--- Recall (did VLM find each ground truth color?) ---")
    print(f" Exact match: {total_recall_exact:4d} / {total_gt} ({pct(total_recall_exact, total_gt)})")
    print(f" Similar match: {total_recall_similar:4d} / {total_gt} ({pct(total_recall_similar, total_gt)})")
    recall_total = total_recall_exact + total_recall_similar
    print(f" Total found: {recall_total:4d} / {total_gt} ({pct(recall_total, total_gt)})")
    print(f" Missed: {total_recall_missed:4d} / {total_gt} ({pct(total_recall_missed, total_gt)})")
    print()

    print("--- Precision (are VLM colors correct?) ---")
    print(f" Exact match: {total_precision_exact:4d} / {total_vlm} ({pct(total_precision_exact, total_vlm)})")
    print(f" Similar match: {total_precision_similar:4d} / {total_vlm} ({pct(total_precision_similar, total_vlm)})")
    prec_total = total_precision_exact + total_precision_similar
    print(f" Total correct: {prec_total:4d} / {total_vlm} ({pct(prec_total, total_vlm)})")
    print(f" Extra/wrong: {total_precision_extra:4d} / {total_vlm} ({pct(total_precision_extra, total_vlm)})")

    if confusion_counter:
        print()
        print("--- Similar-Match Confusions (expected -> got) ---")
        for (expected, got), count in confusion_counter.most_common():
            print(f" {expected:30s} -> {got:20s} x{count}")

    if missed_counter:
        print()
        print("--- Most Missed Ground Truth Colors ---")
        for color, count in missed_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f" {color:30s} {count:3d} {bar}")

    if extra_counter:
        print()
        print("--- Most Common Extra/Wrong VLM Colors ---")
        for color, count in extra_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f" {color:30s} {count:3d} {bar}")

    if per_image_results:
        tags = Counter(r['tag'] for r in per_image_results)
        print()
        print("--- Per-Image Verdict ---")
        for tag in ['PASS', 'PARTIAL', 'FAIL']:
            print(f" {tag:10s} {tags.get(tag, 0):4d}")

        failed = [r for r in per_image_results if r['tag'] == 'FAIL']
        if failed:
            print()
            print(f"--- Failed Images ({len(failed)}) ---")
            for r in failed:
                scores = r['scores']
                missed_strs = ["|".join(g) for g in scores['recall_missed']]
                print(f" {r['file']}")
                print(f" missed: {', '.join(missed_strs)}")
                if scores['precision_extra']:
                    print(f" extra: {', '.join(scores['precision_extra'])}")


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
    prompt_file = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PROMPT_FILE

    with open(prompt_file, 'r') as f:
        prompt = f.read()

    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
    image_files = sorted([
        p for p in Path(IMAGES_DIR).iterdir()
        if p.suffix.lower() in valid_extensions
    ])

    print(f"Images to process: {len(image_files)}")
    print(f"Server: {SERVER_URL}")
    print(f"Prompt: {prompt_file} ({len(prompt)} chars)")
    print("=" * 80)

    client = LlamaCppClient(base_url=SERVER_URL)

    # Accumulators
    total_gt = 0
    total_vlm = 0
    total_recall_exact = 0
    total_recall_similar = 0
    total_recall_missed = 0
    total_precision_exact = 0
    total_precision_similar = 0
    total_precision_extra = 0
    errors = 0
    start_all = time.time()

    confusion_counter = Counter()
    missed_counter = Counter()
    extra_counter = Counter()
    per_image_results = []

    for i, image_path in enumerate(image_files, 1):
        gt_groups = parse_ground_truth(image_path.name)
        gt_display = ", ".join("|".join(g) for g in gt_groups) if gt_groups else "(none)"
        print(f"\n[{i}/{len(image_files)}] {image_path.name}")
        print(f" GT: [{gt_display}]")

        image = cv2.imread(str(image_path))
        if image is None:
            print(" SKIP (failed to load)")
            errors += 1
            continue

        h, w = image.shape[:2]
        if w > MAX_IMAGE_WIDTH:
            scale = MAX_IMAGE_WIDTH / w
            image = cv2.resize(image, (MAX_IMAGE_WIDTH, int(h * scale)), interpolation=cv2.INTER_AREA)

        message = client.create_multimodal_message(role="user", content=prompt, images=[image])

        try:
            t0 = time.time()
            response = client.chat_completion(messages=[message], temperature=0.1, max_tokens=1000)
            elapsed = time.time() - t0

            response_text = response['choices'][0]['message']['content']
            cleaned = clean_response(response_text)
            result = json.loads(cleaned)
            jerseys = result.get('jerseys', [])

            # Unique VLM jersey colors, ignoring white
            vlm_colors = set()
            for j in jerseys:
                jc = j.get('jersey_color', '').strip().lower()
                if jc and jc != 'white':
                    vlm_colors.add(jc)

            vlm_display = ", ".join(sorted(vlm_colors)) if vlm_colors else "(none)"
            print(f" VLM: [{vlm_display}] ({len(jerseys)} jersey(s), {elapsed:.1f}s)")

            if not gt_groups:
                print(" -- no ground truth colors (white-only), skipping scoring")
                continue

            scores = score_image(gt_groups, vlm_colors)
            total_gt += scores['gt_count']
            total_vlm += scores['vlm_count']
            total_recall_exact += scores['recall_exact']
            total_recall_similar += scores['recall_similar']
            total_recall_missed += len(scores['recall_missed'])
            total_precision_exact += scores['precision_exact']
            total_precision_similar += scores['precision_similar']
            total_precision_extra += len(scores['precision_extra'])

            for group, got in scores['confusions']:
                confusion_counter[("|".join(group), got)] += 1
            for group in scores['recall_missed']:
                missed_counter["|".join(group)] += 1
            for ec in scores['precision_extra']:
                extra_counter[ec] += 1

            # Status line
            status_parts = []
            if scores['recall_exact']:
                status_parts.append(f"exact:{scores['recall_exact']}")
            if scores['recall_similar']:
                status_parts.append(f"similar:{scores['recall_similar']}")
            if scores['recall_missed']:
                missed_strs = ["|".join(g) for g in scores['recall_missed']]
                status_parts.append(f"MISS:{','.join(missed_strs)}")
            if scores['precision_extra']:
                status_parts.append(f"extra:{','.join(scores['precision_extra'])}")

            all_found = (scores['recall_exact'] + scores['recall_similar']) == scores['gt_count']
            no_extra = not scores['precision_extra']
            if all_found and no_extra:
                tag = "PASS"
            elif scores['recall_exact'] + scores['recall_similar'] > 0:
                tag = "PARTIAL"
            else:
                tag = "FAIL"
            print(f" {tag} {', '.join(status_parts)}")

            per_image_results.append({
                'file': image_path.name,
                'tag': tag,
                'scores': scores,
            })

        except (json.JSONDecodeError, KeyError, IndexError) as e:
            print(f" PARSE ERROR: {e}")
            errors += 1
        except Exception as e:
            print(f" ERROR: {e}")
            errors += 1

    total_time = time.time() - start_all

    print_summary(
        total_gt, total_vlm, total_recall_exact, total_recall_similar,
        total_recall_missed, total_precision_exact, total_precision_similar,
        total_precision_extra, confusion_counter, missed_counter,
        extra_counter, per_image_results, len(image_files), errors, total_time,
    )


if __name__ == '__main__':
    main()
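The per-image verdict assigned in `main()` reduces to a three-way rule on how many ground-truth groups were found and whether any extra colors were returned. The sketch below isolates that rule; `verdict` is a hypothetical standalone function written for illustration, with `found` standing for exact plus similar recall matches.

```python
def verdict(found: int, gt_count: int, extras: int) -> str:
    """Classify one image using the PASS / PARTIAL / FAIL rule from main()."""
    if found == gt_count and extras == 0:
        return "PASS"      # every ground-truth color found, nothing spurious
    if found > 0:
        return "PARTIAL"   # at least one color found, but something missed or extra
    return "FAIL"          # no ground-truth color found at all


clean_hit = verdict(found=1, gt_count=1, extras=0)
all_found_with_extra = verdict(found=2, gt_count=2, extras=1)
one_of_two = verdict(found=1, gt_count=2, extras=0)
nothing_found = verdict(found=0, gt_count=2, extras=1)
```

Under this rule an image can be PARTIAL even when recall is perfect, because a single extra color is enough to block a PASS. FAIL is reserved for images such as "019 - maroon_gold.jpg", where neither ground-truth color was matched.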
576
test_accuracy_gemini.py
Normal file
@ -0,0 +1,576 @@
#!/usr/bin/env python3
"""
Test script to measure Gemini VLM accuracy for jersey color detection.

Uses annotated test images where ground truth colors are encoded in filenames.
Compares Gemini results against ground truth, measuring exact and similar color
matches. White is ignored in both ground truth and VLM results.

Filename format: "014 - orange_dark blue or purple.jpg"
  - Underscore separates distinct jersey colors
  - "or" separates acceptable alternatives for a single jersey

Usage:
    python test_accuracy_gemini.py [prompt_file]
"""

import base64
import concurrent.futures
import json
import os
import re
import sys
import time
from collections import Counter
from pathlib import Path

import cv2
import requests

GEMINI_MODEL = "gemini-3-flash-preview"
API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{GEMINI_MODEL}:generateContent"

IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files_annotated")
DEFAULT_PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt.txt")
API_KEY_FILE = os.path.join(os.path.dirname(__file__), "gemini_api_key.txt")
MAX_IMAGE_WIDTH = 768
JPEG_QUALITY = 85
CONCURRENT_WORKERS = 8

# ---------------------------------------------------------------------------
# Color similarity – colors in the same family count as "similar" matches
# ---------------------------------------------------------------------------
COLOR_FAMILIES = {
    'blue': ['blue', 'dark blue', 'navy blue', 'navy', 'royal blue'],
    'light_blue': ['light blue', 'sky blue', 'baby blue', 'carolina blue', 'powder blue'],
    'red': ['red', 'scarlet', 'crimson'],
    'dark_red': ['maroon', 'burgundy', 'dark red', 'wine'],
    'green': ['green', 'dark green', 'forest green', 'kelly green'],
    'yellow': ['yellow', 'gold', 'golden'],
    'orange': ['orange', 'burnt orange'],
    'brown': ['brown', 'dark brown'],
    'purple': ['purple', 'violet'],
    'gray': ['gray', 'grey', 'silver', 'charcoal'],
    'black': ['black'],
    'teal': ['teal', 'turquoise', 'cyan', 'aqua'],
    'pink': ['pink', 'magenta', 'hot pink', 'rose'],
}

_COLOR_TO_FAMILY = {}
for _family, _members in COLOR_FAMILIES.items():
    for _color in _members:
        _COLOR_TO_FAMILY[_color] = _family


def colors_are_similar(color1: str, color2: str) -> bool:
    """Return True if two colors belong to the same color family."""
    if color1 == color2:
        return True
    f1 = _COLOR_TO_FAMILY.get(color1)
    f2 = _COLOR_TO_FAMILY.get(color2)
    return bool(f1 and f2 and f1 == f2)


# ---------------------------------------------------------------------------
# Ground-truth parsing
# ---------------------------------------------------------------------------
def parse_ground_truth(filename: str) -> list[list[str]]:
    """Parse ground truth colors from an annotated filename.

    Returns a list of color groups. Each group is a list of acceptable
    alternatives (from "or" in the filename). White entries are removed.

    Example: "014 - orange_dark blue or purple.jpg"
    -> [["orange"], ["dark blue", "purple"]]
    """
    name = Path(filename).stem
    # Strip number prefix ("014 - ", "029 -", etc.)
    name = re.sub(r'^\d+\s*-\s*', '', name)
    # Treat hyphens between colors as underscores (e.g. "yellow-black")
    name = name.replace('-', '_')

    color_groups = []
    for part in name.split('_'):
        part = part.strip()
        if not part:
            continue
        alternatives = [a.strip().lower() for a in part.split(' or ')]
        alternatives = [a for a in alternatives if a and a != 'white']
        if alternatives:
            color_groups.append(alternatives)
    return color_groups


# ---------------------------------------------------------------------------
# Response cleaning & salvage
# ---------------------------------------------------------------------------
def clean_response(text: str) -> str:
    """Remove think tags and markdown code blocks from model output."""
    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)

    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
    if json_block:
        cleaned = json_block.group(1)
    else:
        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)

    return cleaned.strip()


def salvage_jerseys(text: str) -> list[dict]:
    """Extract complete jersey objects from truncated JSON using regex."""
    pattern = re.compile(
        r'\{\s*'
        r'"jersey_number"\s*:\s*"[^"]*"\s*,\s*'
        r'"jersey_color"\s*:\s*"([^"]*)"\s*,\s*'
        r'"number_color"\s*:\s*"([^"]*)"\s*'
        r'\}',
        re.DOTALL,
    )
    jerseys = []
    for m in pattern.finditer(text):
        jerseys.append({
            'jersey_color': m.group(1),
            'number_color': m.group(2),
        })
    return jerseys


# ---------------------------------------------------------------------------
# Gemini API helpers
# ---------------------------------------------------------------------------
def load_api_key() -> str:
    with open(API_KEY_FILE, 'r') as f:
        return f.read().strip()


def encode_image(image_path: str) -> tuple[str, str]:
    """Read an image file, resize if wider than MAX_IMAGE_WIDTH, and return (base64_data, mime_type)."""
    ext = Path(image_path).suffix.lower()
    mime_map = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.webp': 'image/webp',
        '.bmp': 'image/bmp',
        '.tiff': 'image/tiff',
    }
    mime_type = mime_map.get(ext, 'image/jpeg')

    image = cv2.imread(image_path)
    if image is not None:
        h, w = image.shape[:2]
        if w > MAX_IMAGE_WIDTH:
            scale = MAX_IMAGE_WIDTH / w
            image = cv2.resize(image, (MAX_IMAGE_WIDTH, int(h * scale)), interpolation=cv2.INTER_AREA)
        if ext == '.png':
            _, buf = cv2.imencode('.png', image)
        else:
            _, buf = cv2.imencode('.jpg', image, [cv2.IMWRITE_JPEG_QUALITY, JPEG_QUALITY])
            # Everything that is not PNG is re-encoded as JPEG here, so report
            # the matching MIME type instead of the original one (.webp, .bmp, .tiff).
            mime_type = 'image/jpeg'
        data = base64.b64encode(buf).decode('utf-8')
    else:
        # OpenCV could not decode the file; send the raw bytes unchanged.
        with open(image_path, 'rb') as f:
            data = base64.b64encode(f.read()).decode('utf-8')

    return data, mime_type


MAX_RETRIES = 3
RETRY_BACKOFF = [2, 5, 10]


def call_gemini(session: requests.Session, api_key: str, image_data: str,
                mime_type: str, prompt: str) -> dict:
    """Send pre-encoded image + prompt to the Gemini API and return the raw response."""
    payload = {
        "contents": [{
            "parts": [
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        "data": image_data,
                    }
                },
                {
                    "text": prompt,
                }
            ]
        }],
        "generationConfig": {
            "temperature": 0.1,
            "maxOutputTokens": 8192,
            "responseMimeType": "application/json",
        }
    }

    for attempt in range(MAX_RETRIES):
        response = session.post(
            API_URL,
            headers={
                "x-goog-api-key": api_key,
                "Content-Type": "application/json",
            },
            json=payload,
        )

        if response.status_code >= 500 and attempt < MAX_RETRIES - 1:
            time.sleep(RETRY_BACKOFF[attempt])
            continue

        response.raise_for_status()
        return response.json()

    response.raise_for_status()
    return response.json()


def _api_worker(session: requests.Session, api_key: str, image_data: str,
                mime_type: str, prompt: str) -> dict:
    """Wrapper that captures timing and exceptions for concurrent execution."""
    t0 = time.time()
    try:
        resp = call_gemini(session, api_key, image_data, mime_type, prompt)
        return {'resp': resp, 'elapsed': time.time() - t0, 'error': None}
    except Exception as e:
        return {'resp': None, 'elapsed': time.time() - t0, 'error': e}


# ---------------------------------------------------------------------------
# Scoring
# ---------------------------------------------------------------------------
def score_image(gt_groups: list[list[str]], vlm_colors: set[str]) -> dict:
    """Compare VLM detected colors against ground truth color groups.

    Recall = how many GT color groups were found in VLM output
    Precision = how many VLM colors match something in the GT
    """
    recall_exact = 0
    recall_similar = 0
    recall_missed = []
    confusions = []

    for group in gt_groups:
        # Try exact match first
        if any(alt in vlm_colors for alt in group):
            recall_exact += 1
            continue
        # Try similar match
        matched_vlm = None
        for alt in group:
            for vc in vlm_colors:
                if colors_are_similar(alt, vc):
                    matched_vlm = vc
                    break
            if matched_vlm:
                break
        if matched_vlm:
            recall_similar += 1
            confusions.append((group, matched_vlm))
        else:
            recall_missed.append(group)

    # Precision: check each VLM color against GT
    all_gt_alts = [alt for group in gt_groups for alt in group]
    precision_exact = 0
    precision_similar = 0
    precision_extra = []
    for vc in vlm_colors:
        if vc in all_gt_alts:
            precision_exact += 1
        elif any(colors_are_similar(vc, gt) for gt in all_gt_alts):
            precision_similar += 1
        else:
            precision_extra.append(vc)

    return {
        'gt_count': len(gt_groups),
        'vlm_count': len(vlm_colors),
        'recall_exact': recall_exact,
        'recall_similar': recall_similar,
        'recall_missed': recall_missed,
        'precision_exact': precision_exact,
        'precision_similar': precision_similar,
        'precision_extra': precision_extra,
        'confusions': confusions,
    }


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def pct(n: int, d: int) -> str:
    return f"{100 * n / d:.1f}%" if d else "N/A"


def extract_vlm_colors(jerseys: list[dict]) -> set[str]:
    """Return unique jersey colors from VLM output, ignoring white."""
    vlm_colors = set()
    for j in jerseys:
        jc = j.get('jersey_color', '').strip().lower()
        if jc and jc != 'white':
            vlm_colors.add(jc)
    return vlm_colors


def parse_response(result: dict) -> tuple[list[dict], set[str]]:
    """Parse a Gemini response into jerseys list and vlm_colors set.

    On JSON parse failure, attempts to salvage jersey objects from truncated
    output. Returns (jerseys, vlm_colors).
    """
    text = result['resp']['candidates'][0]['content']['parts'][0]['text']
    cleaned = clean_response(text)
    try:
        data = json.loads(cleaned)
        jerseys = data.get('jerseys', [])
    except json.JSONDecodeError:
        jerseys = salvage_jerseys(text)
    return jerseys, extract_vlm_colors(jerseys)


def score_and_format(gt_groups, vlm_colors, scores):
    """Build a status line and tag from scoring results."""
    status_parts = []
    if scores['recall_exact']:
        status_parts.append(f"exact:{scores['recall_exact']}")
    if scores['recall_similar']:
        status_parts.append(f"similar:{scores['recall_similar']}")
    if scores['recall_missed']:
        missed_strs = ["|".join(g) for g in scores['recall_missed']]
        status_parts.append(f"MISS:{','.join(missed_strs)}")
    if scores['precision_extra']:
        status_parts.append(f"extra:{','.join(scores['precision_extra'])}")

    all_found = (scores['recall_exact'] + scores['recall_similar']) == scores['gt_count']
    no_extra = not scores['precision_extra']
    if all_found and no_extra:
        tag = "PASS"
    elif scores['recall_exact'] + scores['recall_similar'] > 0:
        tag = "PARTIAL"
    else:
        tag = "FAIL"
    return tag, status_parts


def print_summary(model_name, total_gt, total_vlm, total_recall_exact,
                  total_recall_similar, total_recall_missed,
                  total_precision_exact, total_precision_similar,
                  total_precision_extra, confusion_counter, missed_counter,
                  extra_counter, per_image_results, image_count, errors,
                  total_time):
    """Print the full accuracy summary report."""
    print()
    print("=" * 80)
    print(f"ACCURACY SUMMARY ({model_name})")
    print("=" * 80)
    print(f"Images processed: {image_count}")
    print(f"Errors: {errors}")
    print(f"Total time: {total_time:.1f}s ({total_time / max(image_count, 1):.1f}s avg)")
    print()
    print(f"Ground truth colors: {total_gt} (excluding white)")
    print(f"VLM unique colors: {total_vlm} (excluding white)")
    print()

    print("--- Recall (did VLM find each ground truth color?) ---")
    print(f" Exact match: {total_recall_exact:4d} / {total_gt} ({pct(total_recall_exact, total_gt)})")
    print(f" Similar match: {total_recall_similar:4d} / {total_gt} ({pct(total_recall_similar, total_gt)})")
    recall_total = total_recall_exact + total_recall_similar
    print(f" Total found: {recall_total:4d} / {total_gt} ({pct(recall_total, total_gt)})")
    print(f" Missed: {total_recall_missed:4d} / {total_gt} ({pct(total_recall_missed, total_gt)})")
    print()

    print("--- Precision (are VLM colors correct?) ---")
    print(f" Exact match: {total_precision_exact:4d} / {total_vlm} ({pct(total_precision_exact, total_vlm)})")
    print(f" Similar match: {total_precision_similar:4d} / {total_vlm} ({pct(total_precision_similar, total_vlm)})")
    prec_total = total_precision_exact + total_precision_similar
    print(f" Total correct: {prec_total:4d} / {total_vlm} ({pct(prec_total, total_vlm)})")
    print(f" Extra/wrong: {total_precision_extra:4d} / {total_vlm} ({pct(total_precision_extra, total_vlm)})")

    if confusion_counter:
        print()
        print("--- Similar-Match Confusions (expected -> got) ---")
        for (expected, got), count in confusion_counter.most_common():
            print(f" {expected:30s} -> {got:20s} x{count}")

    if missed_counter:
        print()
        print("--- Most Missed Ground Truth Colors ---")
        for color, count in missed_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f" {color:30s} {count:3d} {bar}")

    if extra_counter:
        print()
        print("--- Most Common Extra/Wrong VLM Colors ---")
        for color, count in extra_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f" {color:30s} {count:3d} {bar}")

    if per_image_results:
        tags = Counter(r['tag'] for r in per_image_results)
        print()
        print("--- Per-Image Verdict ---")
        for tag in ['PASS', 'PARTIAL', 'FAIL']:
            print(f" {tag:10s} {tags.get(tag, 0):4d}")

    failed = [r for r in per_image_results if r['tag'] == 'FAIL']
    if failed:
        print()
        print(f"--- Failed Images ({len(failed)}) ---")
        for r in failed:
            scores = r['scores']
            missed_strs = ["|".join(g) for g in scores['recall_missed']]
            print(f" {r['file']}")
            print(f" missed: {', '.join(missed_strs)}")
            if scores['precision_extra']:
                print(f" extra: {', '.join(scores['precision_extra'])}")


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
    prompt_file = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PROMPT_FILE
    api_key = load_api_key()

    with open(prompt_file, 'r') as f:
        prompt = f.read()

    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
    image_files = sorted([
        p for p in Path(IMAGES_DIR).iterdir()
        if p.suffix.lower() in valid_extensions
    ])

    print(f"Model: {GEMINI_MODEL}")
    print(f"Images to process: {len(image_files)}")
    print(f"Concurrency: {CONCURRENT_WORKERS} workers")
    print(f"Prompt: {prompt_file} ({len(prompt)} chars)")
    print("=" * 80)

    # ------------------------------------------------------------------
    # Phase 1: Pre-encode all images
    # ------------------------------------------------------------------
    print("Pre-encoding images ... ", end="", flush=True)
    t_enc = time.time()
    encoded_images = []
    for image_path in image_files:
        encoded_images.append(encode_image(str(image_path)))
    print(f"{len(encoded_images)} images in {time.time() - t_enc:.1f}s")

    # ------------------------------------------------------------------
    # Phase 2: Submit all API calls concurrently
    # ------------------------------------------------------------------
    session = requests.Session()
    start_all = time.time()

    print(f"Sending API requests ... ", flush=True)
    api_results = [None] * len(image_files)
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENT_WORKERS) as executor:
        future_to_idx = {}
        for i, (image_data, mime_type) in enumerate(encoded_images):
            future = executor.submit(
                _api_worker, session, api_key, image_data, mime_type, prompt,
            )
            future_to_idx[future] = i

        completed = 0
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            api_results[idx] = future.result()
            completed += 1
            print(f"\r {completed}/{len(image_files)} API calls completed", end="", flush=True)

    api_time = time.time() - start_all
    print(f" ({api_time:.1f}s total)")
    print("=" * 80)

    # ------------------------------------------------------------------
    # Phase 3: Score results in order
    # ------------------------------------------------------------------
    total_gt = 0
    total_vlm = 0
    total_recall_exact = 0
    total_recall_similar = 0
    total_recall_missed = 0
    total_precision_exact = 0
    total_precision_similar = 0
    total_precision_extra = 0
    errors = 0

    confusion_counter = Counter()
    missed_counter = Counter()
    extra_counter = Counter()
    per_image_results = []

    for i, (image_path, result) in enumerate(zip(image_files, api_results), 1):
        gt_groups = parse_ground_truth(image_path.name)
        gt_display = ", ".join("|".join(g) for g in gt_groups) if gt_groups else "(none)"
        print(f"\n[{i}/{len(image_files)}] {image_path.name}")
        print(f" GT: [{gt_display}]")

        if result['error'] is not None:
            e = result['error']
            if isinstance(e, requests.exceptions.HTTPError):
                print(f" HTTP ERROR: {e}")
            else:
                print(f" ERROR: {e}")
            errors += 1
            continue

        elapsed = result['elapsed']

        try:
            jerseys, vlm_colors = parse_response(result)

            vlm_display = ", ".join(sorted(vlm_colors)) if vlm_colors else "(none)"
            print(f" VLM: [{vlm_display}] ({len(jerseys)} jersey(s), {elapsed:.1f}s)")

            if not gt_groups:
                print(" -- no ground truth colors (white-only), skipping scoring")
                continue

            scores = score_image(gt_groups, vlm_colors)
            total_gt += scores['gt_count']
            total_vlm += scores['vlm_count']
            total_recall_exact += scores['recall_exact']
            total_recall_similar += scores['recall_similar']
            total_recall_missed += len(scores['recall_missed'])
            total_precision_exact += scores['precision_exact']
            total_precision_similar += scores['precision_similar']
            total_precision_extra += len(scores['precision_extra'])

            for group, got in scores['confusions']:
                confusion_counter[("|".join(group), got)] += 1
            for group in scores['recall_missed']:
                missed_counter["|".join(group)] += 1
            for ec in scores['precision_extra']:
                extra_counter[ec] += 1

            tag, status_parts = score_and_format(gt_groups, vlm_colors, scores)
            print(f" {tag} {', '.join(status_parts)}")

            per_image_results.append({
                'file': image_path.name,
                'tag': tag,
                'scores': scores,
            })

        except Exception as e:
            print(f" PARSE ERROR: {e}")
            errors += 1

    total_time = time.time() - start_all

    print_summary(
        GEMINI_MODEL, total_gt, total_vlm, total_recall_exact,
        total_recall_similar, total_recall_missed, total_precision_exact,
        total_precision_similar, total_precision_extra, confusion_counter,
        missed_counter, extra_counter, per_image_results, len(image_files),
        errors, total_time,
    )


if __name__ == '__main__':
    main()
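A minimal sketch of how the filename annotations and the family-based recall scoring in `test_accuracy_gemini.py` fit together. It is a simplified stand-in, not the script itself: the two-entry `FAMILIES` table and the `similar` and `recall_counts` helpers are hypothetical names introduced here for illustration, and only the recall side of the scoring is shown.

```python
import re
from pathlib import Path

# Assumption: a reduced family table for illustration only; the script
# defines a much larger COLOR_FAMILIES dict.
FAMILIES = {
    'blue': ['blue', 'dark blue', 'navy'],
    'dark_red': ['maroon', 'burgundy'],
}
COLOR_TO_FAMILY = {c: fam for fam, members in FAMILIES.items() for c in members}


def parse_ground_truth(filename):
    """Split an annotated filename into groups of acceptable color names."""
    # Drop the numeric prefix ("014 - ") and the file extension.
    name = re.sub(r'^\d+\s*-\s*', '', Path(filename).stem)
    groups = []
    # Underscores (and hyphens) separate jerseys; " or " separates alternatives.
    for part in name.replace('-', '_').split('_'):
        alts = [a.strip().lower() for a in part.split(' or ')]
        alts = [a for a in alts if a and a != 'white']  # white is ignored
        if alts:
            groups.append(alts)
    return groups


def similar(a, b):
    """True for identical names or names that share a color family."""
    fa, fb = COLOR_TO_FAMILY.get(a), COLOR_TO_FAMILY.get(b)
    return a == b or (fa is not None and fa == fb)


def recall_counts(gt_groups, vlm_colors):
    """Return (exact, similar, missed) counts over ground-truth groups."""
    exact = similar_hits = missed = 0
    for group in gt_groups:
        if any(alt in vlm_colors for alt in group):
            exact += 1
        elif any(similar(alt, vc) for alt in group for vc in vlm_colors):
            similar_hits += 1
        else:
            missed += 1
    return exact, similar_hits, missed


groups = parse_ground_truth("014 - orange_dark blue or purple.jpg")
counts = recall_counts(groups, {"orange", "navy"})
print(groups, counts)
# → [['orange'], ['dark blue', 'purple']] (1, 1, 0)
```

Here "orange" is an exact hit, while "navy" only counts as a similar match for the "dark blue or purple" group because both names sit in the same family. This distinction is what separates the exact and exact+similar rows in the report.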
198
uv.lock
generated
Normal file
@ -0,0 +1,198 @@
version = 1
revision = 3
requires-python = ">=3.12"

[[package]]
name = "certifi"
version = "2026.1.4"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/e0/2d/a891ca51311197f6ad14a7ef42e2399f36cf2f9bd44752b3dc4eab60fdc5/certifi-2026.1.4.tar.gz", hash = "sha256:ac726dd470482006e014ad384921ed6438c457018f4b3d204aea4281258b2120", size = 154268, upload-time = "2026-01-04T02:42:41.825Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e6/ad/3cc14f097111b4de0040c83a525973216457bbeeb63739ef1ed275c1c021/certifi-2026.1.4-py3-none-any.whl", hash = "sha256:9943707519e4add1115f44c2bc244f782c0249876bf51b6599fee1ffbedd685c", size = 152900, upload-time = "2026-01-04T02:42:40.15Z" },
]

[[package]]
name = "charset-normalizer"
version = "3.4.4"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/13/69/33ddede1939fdd074bce5434295f38fae7136463422fe4fd3e0e89b98062/charset_normalizer-3.4.4.tar.gz", hash = "sha256:94537985111c35f28720e43603b8e7b43a6ecfb2ce1d3058bbe955b73404e21a", size = 129418, upload-time = "2025-10-14T04:42:32.879Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f3/85/1637cd4af66fa687396e757dec650f28025f2a2f5a5531a3208dc0ec43f2/charset_normalizer-3.4.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:0a98e6759f854bd25a58a73fa88833fba3b7c491169f86ce1180c948ab3fd394", size = 208425, upload-time = "2025-10-14T04:40:53.353Z" },
{ url = "https://files.pythonhosted.org/packages/9d/6a/04130023fef2a0d9c62d0bae2649b69f7b7d8d24ea5536feef50551029df/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b5b290ccc2a263e8d185130284f8501e3e36c5e02750fc6b6bdeb2e9e96f1e25", size = 148162, upload-time = "2025-10-14T04:40:54.558Z" },
{ url = "https://files.pythonhosted.org/packages/78/29/62328d79aa60da22c9e0b9a66539feae06ca0f5a4171ac4f7dc285b83688/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74bb723680f9f7a6234dcf67aea57e708ec1fbdf5699fb91dfd6f511b0a320ef", size = 144558, upload-time = "2025-10-14T04:40:55.677Z" },
{ url = "https://files.pythonhosted.org/packages/86/bb/b32194a4bf15b88403537c2e120b817c61cd4ecffa9b6876e941c3ee38fe/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f1e34719c6ed0b92f418c7c780480b26b5d9c50349e9a9af7d76bf757530350d", size = 161497, upload-time = "2025-10-14T04:40:57.217Z" },
{ url = "https://files.pythonhosted.org/packages/19/89/a54c82b253d5b9b111dc74aca196ba5ccfcca8242d0fb64146d4d3183ff1/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2437418e20515acec67d86e12bf70056a33abdacb5cb1655042f6538d6b085a8", size = 159240, upload-time = "2025-10-14T04:40:58.358Z" },
{ url = "https://files.pythonhosted.org/packages/c0/10/d20b513afe03acc89ec33948320a5544d31f21b05368436d580dec4e234d/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:11d694519d7f29d6cd09f6ac70028dba10f92f6cdd059096db198c283794ac86", size = 153471, upload-time = "2025-10-14T04:40:59.468Z" },
{ url = "https://files.pythonhosted.org/packages/61/fa/fbf177b55bdd727010f9c0a3c49eefa1d10f960e5f09d1d887bf93c2e698/charset_normalizer-3.4.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ac1c4a689edcc530fc9d9aa11f5774b9e2f33f9a0c6a57864e90908f5208d30a", size = 150864, upload-time = "2025-10-14T04:41:00.623Z" },
{ url = "https://files.pythonhosted.org/packages/05/12/9fbc6a4d39c0198adeebbde20b619790e9236557ca59fc40e0e3cebe6f40/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:21d142cc6c0ec30d2efee5068ca36c128a30b0f2c53c1c07bd78cb6bc1d3be5f", size = 150647, upload-time = "2025-10-14T04:41:01.754Z" },
{ url = "https://files.pythonhosted.org/packages/ad/1f/6a9a593d52e3e8c5d2b167daf8c6b968808efb57ef4c210acb907c365bc4/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:5dbe56a36425d26d6cfb40ce79c314a2e4dd6211d51d6d2191c00bed34f354cc", size = 145110, upload-time = "2025-10-14T04:41:03.231Z" },
{ url = "https://files.pythonhosted.org/packages/30/42/9a52c609e72471b0fc54386dc63c3781a387bb4fe61c20231a4ebcd58bdd/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:5bfbb1b9acf3334612667b61bd3002196fe2a1eb4dd74d247e0f2a4d50ec9bbf", size = 162839, upload-time = "2025-10-14T04:41:04.715Z" },
{ url = "https://files.pythonhosted.org/packages/c4/5b/c0682bbf9f11597073052628ddd38344a3d673fda35a36773f7d19344b23/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:d055ec1e26e441f6187acf818b73564e6e6282709e9bcb5b63f5b23068356a15", size = 150667, upload-time = "2025-10-14T04:41:05.827Z" },
{ url = "https://files.pythonhosted.org/packages/e4/24/a41afeab6f990cf2daf6cb8c67419b63b48cf518e4f56022230840c9bfb2/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:af2d8c67d8e573d6de5bc30cdb27e9b95e49115cd9baad5ddbd1a6207aaa82a9", size = 160535, upload-time = "2025-10-14T04:41:06.938Z" },
{ url = "https://files.pythonhosted.org/packages/2a/e5/6a4ce77ed243c4a50a1fecca6aaaab419628c818a49434be428fe24c9957/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:780236ac706e66881f3b7f2f32dfe90507a09e67d1d454c762cf642e6e1586e0", size = 154816, upload-time = "2025-10-14T04:41:08.101Z" },
{ url = "https://files.pythonhosted.org/packages/a8/ef/89297262b8092b312d29cdb2517cb1237e51db8ecef2e9af5edbe7b683b1/charset_normalizer-3.4.4-cp312-cp312-win32.whl", hash = "sha256:5833d2c39d8896e4e19b689ffc198f08ea58116bee26dea51e362ecc7cd3ed26", size = 99694, upload-time = "2025-10-14T04:41:09.23Z" },
{ url = "https://files.pythonhosted.org/packages/3d/2d/1e5ed9dd3b3803994c155cd9aacb60c82c331bad84daf75bcb9c91b3295e/charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl", hash = "sha256:a79cfe37875f822425b89a82333404539ae63dbdddf97f84dcbc3d339aae9525", size = 107131, upload-time = "2025-10-14T04:41:10.467Z" },
{ url = "https://files.pythonhosted.org/packages/d0/d9/0ed4c7098a861482a7b6a95603edce4c0d9db2311af23da1fb2b75ec26fc/charset_normalizer-3.4.4-cp312-cp312-win_arm64.whl", hash = "sha256:376bec83a63b8021bb5c8ea75e21c4ccb86e7e45ca4eb81146091b56599b80c3", size = 100390, upload-time = "2025-10-14T04:41:11.915Z" },
{ url = "https://files.pythonhosted.org/packages/97/45/4b3a1239bbacd321068ea6e7ac28875b03ab8bc0aa0966452db17cd36714/charset_normalizer-3.4.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:e1f185f86a6f3403aa2420e815904c67b2f9ebc443f045edd0de921108345794", size = 208091, upload-time = "2025-10-14T04:41:13.346Z" },
{ url = "https://files.pythonhosted.org/packages/7d/62/73a6d7450829655a35bb88a88fca7d736f9882a27eacdca2c6d505b57e2e/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b39f987ae8ccdf0d2642338faf2abb1862340facc796048b604ef14919e55ed", size = 147936, upload-time = "2025-10-14T04:41:14.461Z" },
{ url = "https://files.pythonhosted.org/packages/89/c5/adb8c8b3d6625bef6d88b251bbb0d95f8205831b987631ab0c8bb5d937c2/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:3162d5d8ce1bb98dd51af660f2121c55d0fa541b46dff7bb9b9f86ea1d87de72", size = 144180, upload-time = "2025-10-14T04:41:15.588Z" },
{ url = "https://files.pythonhosted.org/packages/91/ed/9706e4070682d1cc219050b6048bfd293ccf67b3d4f5a4f39207453d4b99/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:81d5eb2a312700f4ecaa977a8235b634ce853200e828fbadf3a9c50bab278328", size = 161346, upload-time = "2025-10-14T04:41:16.738Z" },
{ url = "https://files.pythonhosted.org/packages/d5/0d/031f0d95e4972901a2f6f09ef055751805ff541511dc1252ba3ca1f80cf5/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5bd2293095d766545ec1a8f612559f6b40abc0eb18bb2f5d1171872d34036ede", size = 158874, upload-time = "2025-10-14T04:41:17.923Z" },
{ url = "https://files.pythonhosted.org/packages/f5/83/6ab5883f57c9c801ce5e5677242328aa45592be8a00644310a008d04f922/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a8a8b89589086a25749f471e6a900d3f662d1d3b6e2e59dcecf787b1cc3a1894", size = 153076, upload-time = "2025-10-14T04:41:19.106Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/75/1e/5ff781ddf5260e387d6419959ee89ef13878229732732ee73cdae01800f2/charset_normalizer-3.4.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc7637e2f80d8530ee4a78e878bce464f70087ce73cf7c1caf142416923b98f1", size = 150601, upload-time = "2025-10-14T04:41:20.245Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/d7/57/71be810965493d3510a6ca79b90c19e48696fb1ff964da319334b12677f0/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f8bf04158c6b607d747e93949aa60618b61312fe647a6369f88ce2ff16043490", size = 150376, upload-time = "2025-10-14T04:41:21.398Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/e5/d5/c3d057a78c181d007014feb7e9f2e65905a6c4ef182c0ddf0de2924edd65/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:554af85e960429cf30784dd47447d5125aaa3b99a6f0683589dbd27e2f45da44", size = 144825, upload-time = "2025-10-14T04:41:22.583Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/e6/8c/d0406294828d4976f275ffbe66f00266c4b3136b7506941d87c00cab5272/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:74018750915ee7ad843a774364e13a3db91682f26142baddf775342c3f5b1133", size = 162583, upload-time = "2025-10-14T04:41:23.754Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/d7/24/e2aa1f18c8f15c4c0e932d9287b8609dd30ad56dbe41d926bd846e22fb8d/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:c0463276121fdee9c49b98908b3a89c39be45d86d1dbaa22957e38f6321d4ce3", size = 150366, upload-time = "2025-10-14T04:41:25.27Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/e4/5b/1e6160c7739aad1e2df054300cc618b06bf784a7a164b0f238360721ab86/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:362d61fd13843997c1c446760ef36f240cf81d3ebf74ac62652aebaf7838561e", size = 160300, upload-time = "2025-10-14T04:41:26.725Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/7a/10/f882167cd207fbdd743e55534d5d9620e095089d176d55cb22d5322f2afd/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9a26f18905b8dd5d685d6d07b0cdf98a79f3c7a918906af7cc143ea2e164c8bc", size = 154465, upload-time = "2025-10-14T04:41:28.322Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/89/66/c7a9e1b7429be72123441bfdbaf2bc13faab3f90b933f664db506dea5915/charset_normalizer-3.4.4-cp313-cp313-win32.whl", hash = "sha256:9b35f4c90079ff2e2edc5b26c0c77925e5d2d255c42c74fdb70fb49b172726ac", size = 99404, upload-time = "2025-10-14T04:41:29.95Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/c4/26/b9924fa27db384bdcd97ab83b4f0a8058d96ad9626ead570674d5e737d90/charset_normalizer-3.4.4-cp313-cp313-win_amd64.whl", hash = "sha256:b435cba5f4f750aa6c0a0d92c541fb79f69a387c91e61f1795227e4ed9cece14", size = 107092, upload-time = "2025-10-14T04:41:31.188Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/af/8f/3ed4bfa0c0c72a7ca17f0380cd9e4dd842b09f664e780c13cff1dcf2ef1b/charset_normalizer-3.4.4-cp313-cp313-win_arm64.whl", hash = "sha256:542d2cee80be6f80247095cc36c418f7bddd14f4a6de45af91dfad36d817bba2", size = 100408, upload-time = "2025-10-14T04:41:32.624Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/2a/35/7051599bd493e62411d6ede36fd5af83a38f37c4767b92884df7301db25d/charset_normalizer-3.4.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:da3326d9e65ef63a817ecbcc0df6e94463713b754fe293eaa03da99befb9a5bd", size = 207746, upload-time = "2025-10-14T04:41:33.773Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/10/9a/97c8d48ef10d6cd4fcead2415523221624bf58bcf68a802721a6bc807c8f/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8af65f14dc14a79b924524b1e7fffe304517b2bff5a58bf64f30b98bbc5079eb", size = 147889, upload-time = "2025-10-14T04:41:34.897Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/10/bf/979224a919a1b606c82bd2c5fa49b5c6d5727aa47b4312bb27b1734f53cd/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74664978bb272435107de04e36db5a9735e78232b85b77d45cfb38f758efd33e", size = 143641, upload-time = "2025-10-14T04:41:36.116Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/ba/33/0ad65587441fc730dc7bd90e9716b30b4702dc7b617e6ba4997dc8651495/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:752944c7ffbfdd10c074dc58ec2d5a8a4cd9493b314d367c14d24c17684ddd14", size = 160779, upload-time = "2025-10-14T04:41:37.229Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/67/ed/331d6b249259ee71ddea93f6f2f0a56cfebd46938bde6fcc6f7b9a3d0e09/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d1f13550535ad8cff21b8d757a3257963e951d96e20ec82ab44bc64aeb62a191", size = 159035, upload-time = "2025-10-14T04:41:38.368Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/67/ff/f6b948ca32e4f2a4576aa129d8bed61f2e0543bf9f5f2b7fc3758ed005c9/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ecaae4149d99b1c9e7b88bb03e3221956f68fd6d50be2ef061b2381b61d20838", size = 152542, upload-time = "2025-10-14T04:41:39.862Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/16/85/276033dcbcc369eb176594de22728541a925b2632f9716428c851b149e83/charset_normalizer-3.4.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:cb6254dc36b47a990e59e1068afacdcd02958bdcce30bb50cc1700a8b9d624a6", size = 149524, upload-time = "2025-10-14T04:41:41.319Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/9e/f2/6a2a1f722b6aba37050e626530a46a68f74e63683947a8acff92569f979a/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:c8ae8a0f02f57a6e61203a31428fa1d677cbe50c93622b4149d5c0f319c1d19e", size = 150395, upload-time = "2025-10-14T04:41:42.539Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/60/bb/2186cb2f2bbaea6338cad15ce23a67f9b0672929744381e28b0592676824/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:47cc91b2f4dd2833fddaedd2893006b0106129d4b94fdb6af1f4ce5a9965577c", size = 143680, upload-time = "2025-10-14T04:41:43.661Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/7d/a5/bf6f13b772fbb2a90360eb620d52ed8f796f3c5caee8398c3b2eb7b1c60d/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:82004af6c302b5d3ab2cfc4cc5f29db16123b1a8417f2e25f9066f91d4411090", size = 162045, upload-time = "2025-10-14T04:41:44.821Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/df/c5/d1be898bf0dc3ef9030c3825e5d3b83f2c528d207d246cbabe245966808d/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:2b7d8f6c26245217bd2ad053761201e9f9680f8ce52f0fcd8d0755aeae5b2152", size = 149687, upload-time = "2025-10-14T04:41:46.442Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/a5/42/90c1f7b9341eef50c8a1cb3f098ac43b0508413f33affd762855f67a410e/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:799a7a5e4fb2d5898c60b640fd4981d6a25f1c11790935a44ce38c54e985f828", size = 160014, upload-time = "2025-10-14T04:41:47.631Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/76/be/4d3ee471e8145d12795ab655ece37baed0929462a86e72372fd25859047c/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:99ae2cffebb06e6c22bdc25801d7b30f503cc87dbd283479e7b606f70aff57ec", size = 154044, upload-time = "2025-10-14T04:41:48.81Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/b0/6f/8f7af07237c34a1defe7defc565a9bc1807762f672c0fde711a4b22bf9c0/charset_normalizer-3.4.4-cp314-cp314-win32.whl", hash = "sha256:f9d332f8c2a2fcbffe1378594431458ddbef721c1769d78e2cbc06280d8155f9", size = 99940, upload-time = "2025-10-14T04:41:49.946Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/4b/51/8ade005e5ca5b0d80fb4aff72a3775b325bdc3d27408c8113811a7cbe640/charset_normalizer-3.4.4-cp314-cp314-win_amd64.whl", hash = "sha256:8a6562c3700cce886c5be75ade4a5db4214fda19fede41d9792d100288d8f94c", size = 107104, upload-time = "2025-10-14T04:41:51.051Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/da/5f/6b8f83a55bb8278772c5ae54a577f3099025f9ade59d0136ac24a0df4bde/charset_normalizer-3.4.4-cp314-cp314-win_arm64.whl", hash = "sha256:de00632ca48df9daf77a2c65a484531649261ec9f25489917f09e455cb09ddb2", size = 100743, upload-time = "2025-10-14T04:41:52.122Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "idna"
|
||||||
|
version = "3.11"
|
||||||
|
source = { registry = "https://pypi.org/simple" }
|
||||||
|
sdist = { url = "https://files.pythonhosted.org/packages/6f/6d/0703ccc57f3a7233505399edb88de3cbd678da106337b9fcde432b65ed60/idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902", size = 194582, upload-time = "2025-10-12T14:55:20.501Z" }
|
||||||
|
wheels = [
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "jersey-test"
|
||||||
|
version = "0.1.0"
|
||||||
|
source = { virtual = "." }
|
||||||
|
dependencies = [
|
||||||
|
{ name = "numpy" },
|
||||||
|
{ name = "opencv-python" },
|
||||||
|
{ name = "requests" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[package.metadata]
|
||||||
|
requires-dist = [
|
||||||
|
{ name = "numpy", specifier = ">=1.24.0" },
|
||||||
|
{ name = "opencv-python", specifier = ">=4.8.0" },
|
||||||
|
{ name = "requests", specifier = ">=2.28.0" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "numpy"
|
||||||
|
version = "2.4.2"
|
||||||
|
source = { registry = "https://pypi.org/simple" }
|
||||||
|
sdist = { url = "https://files.pythonhosted.org/packages/57/fd/0005efbd0af48e55eb3c7208af93f2862d4b1a56cd78e84309a2d959208d/numpy-2.4.2.tar.gz", hash = "sha256:659a6107e31a83c4e33f763942275fd278b21d095094044eb35569e86a21ddae", size = 20723651, upload-time = "2026-01-31T23:13:10.135Z" }
|
||||||
|
wheels = [
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/51/6e/6f394c9c77668153e14d4da83bcc247beb5952f6ead7699a1a2992613bea/numpy-2.4.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:21982668592194c609de53ba4933a7471880ccbaadcc52352694a59ecc860b3a", size = 16667963, upload-time = "2026-01-31T23:10:52.147Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/1f/f8/55483431f2b2fd015ae6ed4fe62288823ce908437ed49db5a03d15151678/numpy-2.4.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:40397bda92382fcec844066efb11f13e1c9a3e2a8e8f318fb72ed8b6db9f60f1", size = 14693571, upload-time = "2026-01-31T23:10:54.789Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/2f/20/18026832b1845cdc82248208dd929ca14c9d8f2bac391f67440707fff27c/numpy-2.4.2-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:b3a24467af63c67829bfaa61eecf18d5432d4f11992688537be59ecd6ad32f5e", size = 5203469, upload-time = "2026-01-31T23:10:57.343Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/7d/33/2eb97c8a77daaba34eaa3fa7241a14ac5f51c46a6bd5911361b644c4a1e2/numpy-2.4.2-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:805cc8de9fd6e7a22da5aed858e0ab16be5a4db6c873dde1d7451c541553aa27", size = 6550820, upload-time = "2026-01-31T23:10:59.429Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/b1/91/b97fdfd12dc75b02c44e26c6638241cc004d4079a0321a69c62f51470c4c/numpy-2.4.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6d82351358ffbcdcd7b686b90742a9b86632d6c1c051016484fa0b326a0a1548", size = 15663067, upload-time = "2026-01-31T23:11:01.291Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/f5/c6/a18e59f3f0b8071cc85cbc8d80cd02d68aa9710170b2553a117203d46936/numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9e35d3e0144137d9fdae62912e869136164534d64a169f86438bc9561b6ad49f", size = 16619782, upload-time = "2026-01-31T23:11:03.669Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/b7/83/9751502164601a79e18847309f5ceec0b1446d7b6aa12305759b72cf98b2/numpy-2.4.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:adb6ed2ad29b9e15321d167d152ee909ec73395901b70936f029c3bc6d7f4460", size = 17013128, upload-time = "2026-01-31T23:11:05.913Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/61/c4/c4066322256ec740acc1c8923a10047818691d2f8aec254798f3dd90f5f2/numpy-2.4.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:8906e71fd8afcb76580404e2a950caef2685df3d2a57fe82a86ac8d33cc007ba", size = 18345324, upload-time = "2026-01-31T23:11:08.248Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/ab/af/6157aa6da728fa4525a755bfad486ae7e3f76d4c1864138003eb84328497/numpy-2.4.2-cp312-cp312-win32.whl", hash = "sha256:ec055f6dae239a6299cace477b479cca2fc125c5675482daf1dd886933a1076f", size = 5960282, upload-time = "2026-01-31T23:11:10.497Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/92/0f/7ceaaeaacb40567071e94dbf2c9480c0ae453d5bb4f52bea3892c39dc83c/numpy-2.4.2-cp312-cp312-win_amd64.whl", hash = "sha256:209fae046e62d0ce6435fcfe3b1a10537e858249b3d9b05829e2a05218296a85", size = 12314210, upload-time = "2026-01-31T23:11:12.176Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/2f/a3/56c5c604fae6dd40fa2ed3040d005fca97e91bd320d232ac9931d77ba13c/numpy-2.4.2-cp312-cp312-win_arm64.whl", hash = "sha256:fbde1b0c6e81d56f5dccd95dd4a711d9b95df1ae4009a60887e56b27e8d903fa", size = 10220171, upload-time = "2026-01-31T23:11:14.684Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/a1/22/815b9fe25d1d7ae7d492152adbc7226d3eff731dffc38fe970589fcaaa38/numpy-2.4.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:25f2059807faea4b077a2b6837391b5d830864b3543627f381821c646f31a63c", size = 16663696, upload-time = "2026-01-31T23:11:17.516Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/09/f0/817d03a03f93ba9c6c8993de509277d84e69f9453601915e4a69554102a1/numpy-2.4.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:bd3a7a9f5847d2fb8c2c6d1c862fa109c31a9abeca1a3c2bd5a64572955b2979", size = 14688322, upload-time = "2026-01-31T23:11:19.883Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/da/b4/f805ab79293c728b9a99438775ce51885fd4f31b76178767cfc718701a39/numpy-2.4.2-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:8e4549f8a3c6d13d55041925e912bfd834285ef1dd64d6bc7d542583355e2e98", size = 5198157, upload-time = "2026-01-31T23:11:22.375Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/74/09/826e4289844eccdcd64aac27d13b0fd3f32039915dd5b9ba01baae1f436c/numpy-2.4.2-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:aea4f66ff44dfddf8c2cffd66ba6538c5ec67d389285292fe428cb2c738c8aef", size = 6546330, upload-time = "2026-01-31T23:11:23.958Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/19/fb/cbfdbfa3057a10aea5422c558ac57538e6acc87ec1669e666d32ac198da7/numpy-2.4.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c3cd545784805de05aafe1dde61752ea49a359ccba9760c1e5d1c88a93bbf2b7", size = 15660968, upload-time = "2026-01-31T23:11:25.713Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/04/dc/46066ce18d01645541f0186877377b9371b8fa8017fa8262002b4ef22612/numpy-2.4.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d0d9b7c93578baafcbc5f0b83eaf17b79d345c6f36917ba0c67f45226911d499", size = 16607311, upload-time = "2026-01-31T23:11:28.117Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/14/d9/4b5adfc39a43fa6bf918c6d544bc60c05236cc2f6339847fc5b35e6cb5b0/numpy-2.4.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f74f0f7779cc7ae07d1810aab8ac6b1464c3eafb9e283a40da7309d5e6e48fbb", size = 17012850, upload-time = "2026-01-31T23:11:30.888Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/b7/20/adb6e6adde6d0130046e6fdfb7675cc62bc2f6b7b02239a09eb58435753d/numpy-2.4.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c7ac672d699bf36275c035e16b65539931347d68b70667d28984c9fb34e07fa7", size = 18334210, upload-time = "2026-01-31T23:11:33.214Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/78/0e/0a73b3dff26803a8c02baa76398015ea2a5434d9b8265a7898a6028c1591/numpy-2.4.2-cp313-cp313-win32.whl", hash = "sha256:8e9afaeb0beff068b4d9cd20d322ba0ee1cecfb0b08db145e4ab4dd44a6b5110", size = 5958199, upload-time = "2026-01-31T23:11:35.385Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/43/bc/6352f343522fcb2c04dbaf94cb30cca6fd32c1a750c06ad6231b4293708c/numpy-2.4.2-cp313-cp313-win_amd64.whl", hash = "sha256:7df2de1e4fba69a51c06c28f5a3de36731eb9639feb8e1cf7e4a7b0daf4cf622", size = 12310848, upload-time = "2026-01-31T23:11:38.001Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/6e/8d/6da186483e308da5da1cc6918ce913dcfe14ffde98e710bfeff2a6158d4e/numpy-2.4.2-cp313-cp313-win_arm64.whl", hash = "sha256:0fece1d1f0a89c16b03442eae5c56dc0be0c7883b5d388e0c03f53019a4bfd71", size = 10221082, upload-time = "2026-01-31T23:11:40.392Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/25/a1/9510aa43555b44781968935c7548a8926274f815de42ad3997e9e83680dd/numpy-2.4.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:5633c0da313330fd20c484c78cdd3f9b175b55e1a766c4a174230c6b70ad8262", size = 14815866, upload-time = "2026-01-31T23:11:42.495Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/36/30/6bbb5e76631a5ae46e7923dd16ca9d3f1c93cfa8d4ed79a129814a9d8db3/numpy-2.4.2-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:d9f64d786b3b1dd742c946c42d15b07497ed14af1a1f3ce840cce27daa0ce913", size = 5325631, upload-time = "2026-01-31T23:11:44.7Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/46/00/3a490938800c1923b567b3a15cd17896e68052e2145d8662aaf3e1ffc58f/numpy-2.4.2-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:b21041e8cb6a1eb5312dd1d2f80a94d91efffb7a06b70597d44f1bd2dfc315ab", size = 6646254, upload-time = "2026-01-31T23:11:46.341Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/d3/e9/fac0890149898a9b609caa5af7455a948b544746e4b8fe7c212c8edd71f8/numpy-2.4.2-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:00ab83c56211a1d7c07c25e3217ea6695e50a3e2f255053686b081dc0b091a82", size = 15720138, upload-time = "2026-01-31T23:11:48.082Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/ea/5c/08887c54e68e1e28df53709f1893ce92932cc6f01f7c3d4dc952f61ffd4e/numpy-2.4.2-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2fb882da679409066b4603579619341c6d6898fc83a8995199d5249f986e8e8f", size = 16655398, upload-time = "2026-01-31T23:11:50.293Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/4d/89/253db0fa0e66e9129c745e4ef25631dc37d5f1314dad2b53e907b8538e6d/numpy-2.4.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:66cb9422236317f9d44b67b4d18f44efe6e9c7f8794ac0462978513359461554", size = 17079064, upload-time = "2026-01-31T23:11:52.927Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/2a/d5/cbade46ce97c59c6c3da525e8d95b7abe8a42974a1dc5c1d489c10433e88/numpy-2.4.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:0f01dcf33e73d80bd8dc0f20a71303abbafa26a19e23f6b68d1aa9990af90257", size = 18379680, upload-time = "2026-01-31T23:11:55.22Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/40/62/48f99ae172a4b63d981babe683685030e8a3df4f246c893ea5c6ef99f018/numpy-2.4.2-cp313-cp313t-win32.whl", hash = "sha256:52b913ec40ff7ae845687b0b34d8d93b60cb66dcee06996dd5c99f2fc9328657", size = 6082433, upload-time = "2026-01-31T23:11:58.096Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/07/38/e054a61cfe48ad9f1ed0d188e78b7e26859d0b60ef21cd9de4897cdb5326/numpy-2.4.2-cp313-cp313t-win_amd64.whl", hash = "sha256:5eea80d908b2c1f91486eb95b3fb6fab187e569ec9752ab7d9333d2e66bf2d6b", size = 12451181, upload-time = "2026-01-31T23:11:59.782Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/6e/a4/a05c3a6418575e185dd84d0b9680b6bb2e2dc3e4202f036b7b4e22d6e9dc/numpy-2.4.2-cp313-cp313t-win_arm64.whl", hash = "sha256:fd49860271d52127d61197bb50b64f58454e9f578cb4b2c001a6de8b1f50b0b1", size = 10290756, upload-time = "2026-01-31T23:12:02.438Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/18/88/b7df6050bf18fdcfb7046286c6535cabbdd2064a3440fca3f069d319c16e/numpy-2.4.2-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:444be170853f1f9d528428eceb55f12918e4fda5d8805480f36a002f1415e09b", size = 16663092, upload-time = "2026-01-31T23:12:04.521Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/25/7a/1fee4329abc705a469a4afe6e69b1ef7e915117747886327104a8493a955/numpy-2.4.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:d1240d50adff70c2a88217698ca844723068533f3f5c5fa6ee2e3220e3bdb000", size = 14698770, upload-time = "2026-01-31T23:12:06.96Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/fb/0b/f9e49ba6c923678ad5bc38181c08ac5e53b7a5754dbca8e581aa1a56b1ff/numpy-2.4.2-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:7cdde6de52fb6664b00b056341265441192d1291c130e99183ec0d4b110ff8b1", size = 5208562, upload-time = "2026-01-31T23:12:09.632Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/7d/12/d7de8f6f53f9bb76997e5e4c069eda2051e3fe134e9181671c4391677bb2/numpy-2.4.2-cp314-cp314-macosx_14_0_x86_64.whl", hash = "sha256:cda077c2e5b780200b6b3e09d0b42205a3d1c68f30c6dceb90401c13bff8fe74", size = 6543710, upload-time = "2026-01-31T23:12:11.969Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/09/63/c66418c2e0268a31a4cf8a8b512685748200f8e8e8ec6c507ce14e773529/numpy-2.4.2-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d30291931c915b2ab5717c2974bb95ee891a1cf22ebc16a8006bd59cd210d40a", size = 15677205, upload-time = "2026-01-31T23:12:14.33Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bba37bc29d4d85761deed3954a1bc62be7cf462b9510b51d367b769a8c8df325", size = 16611738, upload-time = "2026-01-31T23:12:16.525Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/c2/a7/39c4cdda9f019b609b5c473899d87abff092fc908cfe4d1ecb2fcff453b0/numpy-2.4.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b2f0073ed0868db1dcd86e052d37279eef185b9c8db5bf61f30f46adac63c909", size = 17028888, upload-time = "2026-01-31T23:12:19.306Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/da/b3/e84bb64bdfea967cc10950d71090ec2d84b49bc691df0025dddb7c26e8e3/numpy-2.4.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7f54844851cdb630ceb623dcec4db3240d1ac13d4990532446761baede94996a", size = 18339556, upload-time = "2026-01-31T23:12:21.816Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/88/f5/954a291bc1192a27081706862ac62bb5920fbecfbaa302f64682aa90beed/numpy-2.4.2-cp314-cp314-win32.whl", hash = "sha256:12e26134a0331d8dbd9351620f037ec470b7c75929cb8a1537f6bfe411152a1a", size = 6006899, upload-time = "2026-01-31T23:12:24.14Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/05/cb/eff72a91b2efdd1bc98b3b8759f6a1654aa87612fc86e3d87d6fe4f948c4/numpy-2.4.2-cp314-cp314-win_amd64.whl", hash = "sha256:068cdb2d0d644cdb45670810894f6a0600797a69c05f1ac478e8d31670b8ee75", size = 12443072, upload-time = "2026-01-31T23:12:26.33Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/37/75/62726948db36a56428fce4ba80a115716dc4fad6a3a4352487f8bb950966/numpy-2.4.2-cp314-cp314-win_arm64.whl", hash = "sha256:6ed0be1ee58eef41231a5c943d7d1375f093142702d5723ca2eb07db9b934b05", size = 10494886, upload-time = "2026-01-31T23:12:28.488Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/36/2f/ee93744f1e0661dc267e4b21940870cabfae187c092e1433b77b09b50ac4/numpy-2.4.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:98f16a80e917003a12c0580f97b5f875853ebc33e2eaa4bccfc8201ac6869308", size = 14818567, upload-time = "2026-01-31T23:12:30.709Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/a7/24/6535212add7d76ff938d8bdc654f53f88d35cddedf807a599e180dcb8e66/numpy-2.4.2-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:20abd069b9cda45874498b245c8015b18ace6de8546bf50dfa8cea1696ed06ef", size = 5328372, upload-time = "2026-01-31T23:12:32.962Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/5e/9d/c48f0a035725f925634bf6b8994253b43f2047f6778a54147d7e213bc5a7/numpy-2.4.2-cp314-cp314t-macosx_14_0_x86_64.whl", hash = "sha256:e98c97502435b53741540a5717a6749ac2ada901056c7db951d33e11c885cc7d", size = 6649306, upload-time = "2026-01-31T23:12:34.797Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/81/05/7c73a9574cd4a53a25907bad38b59ac83919c0ddc8234ec157f344d57d9a/numpy-2.4.2-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:da6cad4e82cb893db4b69105c604d805e0c3ce11501a55b5e9f9083b47d2ffe8", size = 15722394, upload-time = "2026-01-31T23:12:36.565Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/35/fa/4de10089f21fc7d18442c4a767ab156b25c2a6eaf187c0db6d9ecdaeb43f/numpy-2.4.2-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9e4424677ce4b47fe73c8b5556d876571f7c6945d264201180db2dc34f676ab5", size = 16653343, upload-time = "2026-01-31T23:12:39.188Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/b8/f9/d33e4ffc857f3763a57aa85650f2e82486832d7492280ac21ba9efda80da/numpy-2.4.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:2b8f157c8a6f20eb657e240f8985cc135598b2b46985c5bccbde7616dc9c6b1e", size = 17078045, upload-time = "2026-01-31T23:12:42.041Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/c8/b8/54bdb43b6225badbea6389fa038c4ef868c44f5890f95dd530a218706da3/numpy-2.4.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5daf6f3914a733336dab21a05cdec343144600e964d2fcdabaac0c0269874b2a", size = 18380024, upload-time = "2026-01-31T23:12:44.331Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/a5/55/6e1a61ded7af8df04016d81b5b02daa59f2ea9252ee0397cb9f631efe9e5/numpy-2.4.2-cp314-cp314t-win32.whl", hash = "sha256:8c50dd1fc8826f5b26a5ee4d77ca55d88a895f4e4819c7ecc2a9f5905047a443", size = 6153937, upload-time = "2026-01-31T23:12:47.229Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/45/aa/fa6118d1ed6d776b0983f3ceac9b1a5558e80df9365b1c3aa6d42bf9eee4/numpy-2.4.2-cp314-cp314t-win_amd64.whl", hash = "sha256:fcf92bee92742edd401ba41135185866f7026c502617f422eb432cfeca4fe236", size = 12631844, upload-time = "2026-01-31T23:12:48.997Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/32/0a/2ec5deea6dcd158f254a7b372fb09cfba5719419c8d66343bab35237b3fb/numpy-2.4.2-cp314-cp314t-win_arm64.whl", hash = "sha256:1f92f53998a17265194018d1cc321b2e96e900ca52d54c7c77837b71b9465181", size = 10565379, upload-time = "2026-01-31T23:12:51.345Z" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opencv-python"
|
||||||
|
version = "4.13.0.92"
|
||||||
|
source = { registry = "https://pypi.org/simple" }
|
||||||
|
dependencies = [
|
||||||
|
{ name = "numpy" },
|
||||||
|
]
|
||||||
|
wheels = [
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/fc/6f/5a28fef4c4a382be06afe3938c64cc168223016fa520c5abaf37e8862aa5/opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19", size = 46247052, upload-time = "2026-02-05T07:01:25.046Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/08/ac/6c98c44c650b8114a0fb901691351cfb3956d502e8e9b5cd27f4ee7fbf2f/opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9", size = 32568781, upload-time = "2026-02-05T07:01:41.379Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/3e/51/82fed528b45173bf629fa44effb76dff8bc9f4eeaee759038362dfa60237/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a", size = 47685527, upload-time = "2026-02-05T06:59:11.24Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/db/07/90b34a8e2cf9c50fe8ed25cac9011cde0676b4d9d9c973751ac7616223a2/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf", size = 70460872, upload-time = "2026-02-05T06:59:19.162Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/02/6d/7a9cc719b3eaf4377b9c2e3edeb7ed3a81de41f96421510c0a169ca3cfd4/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616", size = 46708208, upload-time = "2026-02-05T06:59:15.419Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/fd/55/b3b49a1b97aabcfbbd6c7326df9cb0b6fa0c0aefa8e89d500939e04aa229/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5", size = 72927042, upload-time = "2026-02-05T06:59:23.389Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/fb/17/de5458312bcb07ddf434d7bfcb24bb52c59635ad58c6e7c751b48949b009/opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59", size = 30932638, upload-time = "2026-02-05T07:02:14.98Z" },
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/e9/a5/1be1516390333ff9be3a9cb648c9f33df79d5096e5884b5df71a588af463/opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5", size = 40212062, upload-time = "2026-02-05T07:02:12.724Z" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "requests"
|
||||||
|
version = "2.32.5"
|
||||||
|
source = { registry = "https://pypi.org/simple" }
|
||||||
|
dependencies = [
|
||||||
|
{ name = "certifi" },
|
||||||
|
{ name = "charset-normalizer" },
|
||||||
|
{ name = "idna" },
|
||||||
|
{ name = "urllib3" },
|
||||||
|
]
|
||||||
|
sdist = { url = "https://files.pythonhosted.org/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
|
||||||
|
wheels = [
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "urllib3"
|
||||||
|
version = "2.6.3"
|
||||||
|
source = { registry = "https://pypi.org/simple" }
|
||||||
|
sdist = { url = "https://files.pythonhosted.org/packages/c7/24/5f1b3bdffd70275f6661c76461e25f024d5a38a46f04aaca912426a2b1d3/urllib3-2.6.3.tar.gz", hash = "sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed", size = 435556, upload-time = "2026-01-07T16:24:43.925Z" }
|
||||||
|
wheels = [
|
||||||
|
{ url = "https://files.pythonhosted.org/packages/39/08/aaaad47bc4e9dc8c725e68f9d04865dbcb2052843ff09c97b08904852d84/urllib3-2.6.3-py3-none-any.whl", hash = "sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4", size = 131584, upload-time = "2026-01-07T16:24:42.685Z" },
|
||||||
|
]
|
||||||