diff --git a/.python-version b/.python-version
new file mode 100644
index 0000000..e4fba21
--- /dev/null
+++ b/.python-version
@@ -0,0 +1 @@
+3.12
diff --git a/accuracy_analysis_report.md b/accuracy_analysis_report.md
new file mode 100644
index 0000000..829e9aa
--- /dev/null
+++ b/accuracy_analysis_report.md
@@ -0,0 +1,170 @@
+# Jersey Color Detection Accuracy Analysis
+
+## Test Configuration
+
+- **Models tested:** Gemini 3 Flash Preview (cloud API), Qwen3-VL-8B (local, via llama.cpp)
+- **Prompts tested:** `jersey_prompt.txt` (original), `jersey_prompt_capstone.txt` (capstone)
+- **Test images:** 161 annotated basketball jersey images
+- **Ground truth colors:** 202 (excluding white)
+- **Images resized** to max 768px wide before submission
+
+---
+
+## Summary Comparison
+
+| Metric                     | Gemini + Original | Gemini + Capstone | Qwen + Original | Qwen + Capstone |
+|----------------------------|:-----------------:|:-----------------:|:---------------:|:---------------:|
+| **Recall (exact)**         | 64.4%             | 60.9%             | 64.4%           | 65.8%           |
+| **Recall (exact+similar)** | **81.2%**         | 78.2%             | 77.2%           | 77.7%           |
+| **Recall (missed)**        | 18.8%             | 21.8%             | 22.8%           | 22.3%           |
+| **Precision (exact)**      | 74.7%             | 70.7%             | 70.7%           | 73.9%           |
+| **Precision (exact+sim.)** | **93.7%**         | 90.2%             | 84.8%           | 87.2%           |
+| **Extra/wrong**            | **6.3%**          | 9.8%              | 15.2%           | 12.8%           |
+| PASS images                | **124**           | 118               | 117             | 119             |
+| PARTIAL images             | 19                | 21                | 18              | 19              |
+| FAIL images                | **18**            | 22                | 26              | 23              |
+| Avg time per image         | 13.3s             | 11.7s             | 9.5s            | 8.9s            |
+
+### Key Takeaways
+
+1. **Gemini + original prompt is the best combination** on most metrics: highest recall including similar matches (81.2%), highest precision (93.7%), fewest failures (18), and fewest extra/wrong colors (6.3%). Only exact recall (Qwen + capstone, 65.8%) and speed favor Qwen.
+
+2. **Exact recall is stable** across all four runs (60.9%–65.8%): roughly 34–39% of ground truth colors are not matched exactly by either model with either prompt, although 12–17 points of that gap are similar matches rather than outright misses.
+
+3. **Gemini produces far fewer hallucinated colors** than Qwen. Gemini's extra/wrong rate is 6.3%–9.8% vs. Qwen's 12.8%–15.2%. When Gemini reports a color, it is correct or in the right color family over 90% of the time.
+
+4. **The capstone prompt did not improve results** for either model. For Gemini it degraded both recall and precision. For Qwen the difference was negligible.
+
+5. **Qwen takes roughly 25–30% less time per image** (8.9–9.5s vs 11.7–13.3s) but at the cost of lower accuracy and more false positives.
+
+---
+
+## Color-Level Analysis
+
+### Most Problematic Ground Truth Colors
+
+Colors most frequently missed across all four test runs:
+
+| Color           | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap | Total Misses | Common Confusion |
+|-----------------|:-----------:|:----------:|:---------:|:--------:|:------------:|------------------|
+| **gray**        | 7           | 6          | 7         | 9        | 29           | Often returned as "grey" (similar match) or missed entirely |
+| **maroon**      | 5           | 9          | 8         | 7        | 29           | Frequently confused with "red" |
+| **black**       | 7           | 7          | 6         | 6        | 26           | Often not detected at all |
+| **light blue**  | 2           | 2          | 8         | 5        | 17           | Returned as "blue" (Qwen especially) |
+| **green**       | 3           | 4          | 3         | 4        | 14           | Sometimes returned as "black" |
+| **blue**        | 3           | 3          | 3         | 2        | 11           | Sometimes not detected at all |
+| **dark brown**  | 0           | 1          | 4         | 4        | 9            | Returned as "black" or "brown" |
+| **brown**       | 1           | 1          | 3         | 3        | 8            | Returned as "black" or "orange" |
+| **teal**        | 2           | 2          | 2         | 2        | 8            | Confused with "green" or "blue" |
+| **gold/yellow** | 2           | 2          | 1         | 1        | 6            | Occasionally missed entirely |
+
+### Most Common Extra/Wrong Colors Reported
+
+| Extra Color  | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap | Notes |
+|--------------|:-----------:|:----------:|:---------:|:--------:|-------|
+| **red**      | 3           | 7          | 7         | 6        | Typically a misread of maroon |
+| **black**    | 2           | 4          | 7         | 7        | Misread of dark brown/green/gray |
+| **blue**     | 3           | 2          | 10        | 6        | Misread of light blue or teal |
+| **green**    | 1           | 1          | 1         | 1        | Misread of teal |
+| **orange**   | 1           | 1          | 1         | 1        | Misread of brown |
+
+### Similar-Match Confusion Patterns
+
+These are cases where the VLM returned a color in the same family but not the exact ground truth term:
+
+| Expected         | Returned As    | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap |
+|------------------|----------------|:-----------:|:----------:|:---------:|:--------:|
+| gray             | grey           | 9           | 10         | —         | —        |
+| navy blue        | blue           | 7           | 6          | 8         | 8        |
+| dark blue        | blue           | 5           | 6          | 10        | 9        |
+| dark brown       | brown          | 5           | 5          | 2         | 2        |
+| gold             | yellow         | 3           | 2          | 5         | 3        |
+| dark blue        | navy blue/navy | 4           | 4          | —         | 1        |
+
+**Observations:**
+- **gray/grey** is purely a spelling variant: Gemini consistently uses the British spelling, while Qwen uses "gray", so this never triggers for Qwen.
+- **navy blue → blue** and **dark blue → blue** are the most common simplifications. Both models tend to drop shade qualifiers.
+- **dark brown → brown** follows the same pattern of dropping the shade qualifier.
+- **gold → yellow** is a genuine color perception difference where models see yellow-dominant gold jerseys.
+
+---
+
+## Persistently Failed Images
+
+These 11 images failed across **all four** test runs, representing the hardest cases:
+
+| Image | GT Colors | Typical VLM Response | Failure Pattern |
+|-------|-----------|---------------------|-----------------|
+| 016 - maroon.jpg | maroon | (none) or red | Maroon not recognized |
+| 029 -maroon_white.jpg | maroon | red | Maroon → red confusion |
+| 034 - light blue.jpg | light blue | blue | Shade qualifier dropped |
+| 046 - green.jpg | green | black | Dark green misread as black |
+| 053 - black_white.jpg | black | (not detected) | Black jerseys missed |
+| 057 - white_gold or yellow.jpg | gold\|yellow | (not detected) | Gold/yellow missed |
+| 132 - brown_white.jpg | brown | orange | Brown → orange confusion |
+| 134 - teal_white.jpg | teal | blue or green | Teal not in model vocabulary |
+| 138 - maroon.jpg | maroon | red | Maroon → red confusion |
+| 150 - green_gray.jpg | green, gray | black | Both colors misread |
+| 160 - blue_white.jpg | blue | (not detected) | Blue not detected |
+
+### Root Cause Categories
+
+1. **Maroon blindness (3 images):** Both models consistently classify maroon as red. This is the single largest systematic error.
+
+2. **Dark color confusion (3 images):** Dark green, brown, and black are frequently confused with each other, especially in low-contrast or shadowed images.
+
+3. **Shade qualifier loss (2 images):** "Light blue" and "teal" are simplified to "blue" or "green"; the models use a coarser color vocabulary than the ground truth.
+
+4. **Non-detection (3 images):** Some jerseys are simply not detected at all, likely due to occlusion, unusual angles, or low image quality.
+
+---
+
+## Model-Specific Observations
+
+### Gemini 3 Flash
+- **Strengths:** Highest precision (93.7%), very few hallucinated colors, good at similar-family matching. Never produced gibberish color names.
+- **Weaknesses:** Consistently uses British "grey" instead of "gray". Slower than local model.
+- **Prompt sensitivity:** The capstone prompt slightly hurt performance (81.2% → 78.2% recall), suggesting the original simpler prompt works better.
+
+### Qwen3-VL-8B
+- **Strengths:** Faster inference (8.9s avg). Slightly higher exact match rate with capstone prompt (65.8%).
+- **Weaknesses:** Much higher false positive rate (12.8–15.2% extra/wrong). Struggles significantly with "light blue" (8 misses with original prompt). Produced one gibberish color ("redolas"). Over-reports "blue" and "black".
+- **Prompt sensitivity:** Minimal difference between prompts. Capstone prompt slightly reduced errors.
+
+---
+
+## Recommendations
+
+1. **Normalize "grey" → "gray"** in post-processing to eliminate the most common similar-match gap for Gemini.
+
+2. **Add "maroon" to the prompt** as an explicit color option or example, since both models struggle to distinguish it from red without guidance.
+
+3. **Consider a constrained color vocabulary** in the prompt (e.g., "Choose from: red, blue, green, yellow, orange, purple, black, gray, brown, maroon, teal, light blue, navy blue, gold, pink") to reduce vocabulary mismatch and shade-qualifier drift.
+
+4. **Post-processing color mapping** could recover many similar-match cases automatically: navy→navy blue, grey→gray, dark blue→navy blue, etc.
+
+5. **The original `jersey_prompt.txt` is the better prompt.** The capstone prompt's additional constraints did not improve accuracy for either model.
+
+---
+
+## Appendix: Color Similarity Families
+
+The following color families were used for "similar match" scoring. Two colors count as a similar match if they appear in the same family:
+
+| Family     | Member Colors                                         |
+|------------|-------------------------------------------------------|
+| blue       | blue, dark blue, navy blue, navy, royal blue          |
+| light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
+| red        | red, scarlet, crimson                                 |
+| dark_red   | maroon, burgundy, dark red, wine                      |
+| green      | green, dark green, forest green, kelly green          |
+| yellow     | yellow, gold, golden                                  |
+| orange     | orange, burnt orange                                  |
+| brown      | brown, dark brown                                     |
+| purple     | purple, violet                                        |
+| gray       | gray, grey, silver, charcoal                          |
+| black      | black                                                 |
+| teal       | teal, turquoise, cyan, aqua                           |
+| pink       | pink, magenta, hot pink, rose                         |
+
+**Note:** Colors in *different* families are never counted as similar, even if perceptually close (e.g., maroon and red are in separate families; brown and orange are in separate families). This is intentional: the similar-match metric captures vocabulary variation within the same color concept, not genuine color misidentification.
diff --git a/accuracy_analysis_report_round2.html b/accuracy_analysis_report_round2.html
new file mode 100644
index 0000000..03df706
--- /dev/null
+++ b/accuracy_analysis_report_round2.html
@@ -0,0 +1,760 @@
+
+
+
+ + +| Metric | +Qwen Original | +Qwen Capstone | +Qwen Constrained | +Gemini Original | +Gemini Capstone | +Gemini Constrained | +
|---|---|---|---|---|---|---|
| Recall (exact) | +65.3% | +66.3% | +71.8% | +62.4% | +60.9% | +67.8% | +
| Recall (exact+similar) | +78.2% | +78.2% | +82.7% | +79.7% | +78.2% | +81.7% | +
| Missed | +21.8% | +21.8% | +17.3% | +20.3% | +21.8% | +18.3% | +
| Precision (exact) | +71.7% | +74.0% | +78.4% | +72.0% | +69.5% | +78.7% | +
| Precision (exact+sim.) | +85.9% | +87.3% | +90.3% | +91.4% | +88.7% | +94.3% | +
| Extra/wrong | +14.1% | +12.7% | +9.7% | +8.6% | +11.3% | +5.7% | +
| PASS | +118 | +120 | +127 | +120 | +117 | +124 | +
| PARTIAL | +19 | +19 | +15 | +20 | +22 | +19 | +
| FAIL | +24 | +22 | +19 | +21 | +22 | +18 | +
| Total time | +1557s | +1437s | +1596s | +253s | +260s | +344s | +
The constrained vocabulary prompt delivered the strongest results across the board:
+ +The constrained prompt's biggest impact was converting similar matches into exact matches by forcing models to use the ground truth vocabulary:
+ +| Model | +Exact Match (Original) | +Exact Match (Constrained) | +Improvement | +
|---|---|---|---|
| Qwen | +65.3% (132) | +71.8% (145) | ++6.5 pp | +
| Gemini | +62.4% (126) | +67.8% (137) | ++5.4 pp | +
This came partly from eliminating vocabulary mismatch (e.g., grey→gray, navy→navy blue) and partly from teaching models to use specific color terms like "maroon" and "light blue."
+ +The constrained prompt's explicit color guidance fixed the worst systematic errors:
+ +| Problem Color | +Qwen Misses (Orig→Constrained) | +Gemini Misses (Orig→Constrained) | +
|---|---|---|
| maroon | +8 → 3 | +6 → 3 | +
| light blue | +7 → 1 | +3 → 1 | +
| dark brown | +4 → 2 | +1 → 1 | +
| teal | +2 → 2 | +2 → 2 | +
| gray | +7 → 8 | +6 → 6 | +
| black | +6 → 6 | +7 → 7 | +
This overcorrection is a smaller problem than the original misses it replaced, but worth noting.
+ +The concurrent processing optimization (8 workers + session reuse + JPEG quality 85) delivered major speed gains:
+ +| Previous Sequential Runs | +Current Concurrent Runs | +
|---|---|
| 2134s (13.3s avg) | +253s (1.6s avg) | +
| 1882s (11.7s avg) | +260s (1.6s avg) | +
| — | +344s (2.1s avg) | +
That's roughly a 7–8x speedup for the first two prompts (2134s → 253s is about 8.4x; 1882s → 260s is about 7.2x). The constrained prompt run was slower (344s), likely because of its longer prompt text (2223 chars vs ~1500 chars).
+ +These 10 images failed across all six runs, representing the hardest cases for current VLMs regardless of model or prompt:
+ +| Image | +GT Colors | +Typical Error | +
|---|---|---|
| 016 - maroon.jpg | +maroon | +Not detected or called "red" | +
| 034 - light blue.jpg | +light blue | +Called "blue" | +
| 046 - green.jpg | +green | +Called "black" | +
| 053 - black_white.jpg | +black | +Not detected | +
| 077 - teal_white.jpg | +teal | +Called "green" | +
| 132 - brown_white.jpg | +brown | +Called "orange" | +
| 134 - teal_white.jpg | +teal | +Called "blue" or "light blue" | +
| 138 - maroon.jpg | +maroon | +Called "red" | +
| 150 - green_gray.jpg | +green, gray | +Called "black" | +
| 160 - blue_white.jpg | +blue | +Not detected | +
Notable improvements: Images 029 (maroon), 087/141/161 (light blue), and 099 (maroon) failed in one or more round-1 runs (029 in all four) but were fixed by the constrained prompt for at least one model.
+ +jersey_prompt_constrained.txt) — it is the clear winner for both models, improving recall and precision simultaneously.grey → gray (catches any remaining Gemini outputs) and navy → navy blue (catches shorthand usage).| Family | +Member Colors | +
|---|---|
| blue | blue, dark blue, navy blue, navy, royal blue |
| light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
| red | red, scarlet, crimson |
| dark_red | maroon, burgundy, dark red, wine |
| green | green, dark green, forest green, kelly green |
| yellow | yellow, gold, golden |
| orange | orange, burnt orange |
| brown | brown, dark brown |
| purple | purple, violet |
| gray | gray, grey, silver, charcoal |
| black | black |
| teal | teal, turquoise, cyan, aqua |
| pink | pink, magenta, hot pink, rose |
jersey_prompt_constrained.txt)You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
+
+CRITICAL INSTRUCTIONS:
+1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
+2. ONLY include jersey numbers that you can ACTUALLY READ in the image
+3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
+4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
+5. DO NOT include jerseys if you cannot clearly see the number
+
+COLOR VOCABULARY:
+For "jersey_color" and "number_color", you MUST choose from this list ONLY:
+red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
+
+Important color distinctions:
+- Use "maroon" for dark brownish-red, NOT "red"
+- Use "light blue" for pale or sky blue, NOT "blue"
+- Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
+- Use "teal" for blue-green, NOT "green" or "blue"
+- Use "gray" (not "grey") for silver or neutral tones
+- Use "dark brown" for very dark brown, NOT "black"
+- Use "gold" for metallic or deep yellow, NOT "yellow"
+
+RESPONSE FORMAT:
+Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
+
+Use DOUBLE QUOTES (") for all JSON keys and string values.
+
+The JSON must have a single key "jerseys" with an array of dictionaries.
+
+Each dictionary must have exactly these three keys:
+- "jersey_number": The number on the jersey (as a string, only if clearly visible)
+- "jersey_color": The primary color of the jersey (MUST be from the color list above)
+- "number_color": The color of the number on the jersey (MUST be from the color list above)
+
+Example response for an image WITH visible jerseys:
+{
+ "jerseys": [
+ {
+ "jersey_number": "10",
+ "jersey_color": "maroon",
+ "number_color": "gold"
+ },
+ {
+ "jersey_number": "42",
+ "jersey_color": "light blue",
+ "number_color": "white"
+ }
+ ]
+}
+
+Example response for an image WITHOUT jerseys or with unclear numbers:
+{"jerseys": []}
+
+REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
+
+Now analyze the image and return the JSON object.
+
+
+
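Reviewer note: the post-processing normalization recommended in this report (grey → gray, navy → navy blue) and the color list from the constrained prompt above could be combined into a small output filter. The following is a minimal sketch, not code from this PR: the names `ALIASES`, `ALLOWED_COLORS`, `normalize_color`, and `filter_colors` are hypothetical, and the alias table contains only the two mappings the report names.

```python
# Hypothetical post-processing helper (not part of this PR).
# ALIASES holds only the two mappings named in the report's recommendations.
ALIASES = {"grey": "gray", "navy": "navy blue"}

# Color list copied from jersey_prompt_constrained.txt.
ALLOWED_COLORS = {
    "red", "blue", "dark blue", "navy blue", "light blue", "green",
    "yellow", "gold", "orange", "purple", "black", "white", "gray",
    "brown", "dark brown", "maroon", "teal", "pink",
}


def normalize_color(raw: str) -> str:
    """Lowercase, trim, collapse inner whitespace, then apply the alias table."""
    cleaned = " ".join(raw.lower().split())
    return ALIASES.get(cleaned, cleaned)


def filter_colors(raw_colors):
    """Split model output into (accepted, rejected) lists after normalization."""
    accepted, rejected = [], []
    for raw in raw_colors:
        color = normalize_color(raw)
        if color in ALLOWED_COLORS:
            accepted.append(color)
        else:
            rejected.append(color)
    return accepted, rejected
```

Keeping rejected values instead of silently dropping them would make out-of-vocabulary outputs such as "redolas" visible in the accuracy logs.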
diff --git a/accuracy_analysis_report_round2.md b/accuracy_analysis_report_round2.md
new file mode 100644
index 0000000..bb38c4a
--- /dev/null
+++ b/accuracy_analysis_report_round2.md
@@ -0,0 +1,229 @@
+# Jersey Color Detection Accuracy — Round 2 Analysis
+
+**Date:** March 3, 2026
+**Models:** Gemini 3 Flash Preview, Qwen3-VL-8B (local via llama.cpp)
+**Prompts:** jersey_prompt.txt (original), jersey_prompt_capstone.txt (capstone), jersey_prompt_constrained.txt (constrained)
+**Test set:** 161 annotated images, 202 ground truth colors (excluding white)
+
+---
+
+## Summary Comparison
+
+| Metric | Qwen Original | Qwen Capstone | Qwen Constrained | Gemini Original | Gemini Capstone | Gemini Constrained |
+|----------------------------|:-------------:|:-------------:|:-----------------:|:---------------:|:---------------:|:------------------:|
+| **Recall (exact)** | 65.3% | 66.3% | **71.8%** | 62.4% | 60.9% | 67.8% |
+| **Recall (exact+similar)** | 78.2% | 78.2% | **82.7%** | 79.7% | 78.2% | 81.7% |
+| **Missed** | 21.8% | 21.8% | **17.3%** | 20.3% | 21.8% | 18.3% |
+| **Precision (exact)** | 71.7% | 74.0% | 78.4% | 72.0% | 69.5% | **78.7%** |
+| **Precision (exact+sim.)** | 85.9% | 87.3% | 90.3% | 91.4% | 88.7% | **94.3%** |
+| **Extra/wrong** | 14.1% | 12.7% | 9.7% | 8.6% | 11.3% | **5.7%** |
+| PASS | 118 | 120 | **127** | 120 | 117 | 124 |
+| PARTIAL | 19 | 19 | **15** | 20 | 22 | 19 |
+| FAIL | 24 | 22 | 19 | 21 | 22 | **18** |
+| Total time | 1557s | 1437s | 1596s | 253s | 260s | 344s |
+
+---
+
+## Key Findings
+
+### 1. The constrained prompt is the best prompt for both models
+
+The constrained vocabulary prompt delivered the strongest results across the board:
+
+- **Qwen + Constrained** achieved the highest recall of any combination at **82.7%** (167/202 found), up from 78.2% with both other prompts. It also posted the most PASS images (**127**, up from 118/120) and the fewest FAIL images (**19**, down from 24/22).
+
+- **Gemini + Constrained** achieved the highest precision of any combination at **94.3%** (164/174 correct), with only **5.7% extra/wrong** colors, the lowest error rate across all six runs. It also had the fewest FAIL images of the six runs at **18**.
+
+### 2. Exact match rates jumped significantly
+
+The constrained prompt's biggest impact was converting similar matches into exact matches by forcing models to use the ground truth vocabulary:
+
+| Model | Exact Match (Original) | Exact Match (Constrained) | Improvement |
+|--------|:----------------------:|:-------------------------:|:-----------:|
+| Qwen | 65.3% (132) | **71.8% (145)** | +6.5 pp |
+| Gemini | 62.4% (126) | **67.8% (137)** | +5.4 pp |
+
+This came partly from eliminating vocabulary mismatch (e.g., grey→gray, navy→navy blue) and partly from teaching models to use specific color terms like "maroon" and "light blue."
+
+### 3. Targeted color improvements
+
+The constrained prompt's explicit color guidance fixed the worst systematic errors:
+
+| Problem Color | Qwen Misses (Orig→Constrained) | Gemini Misses (Orig→Constrained) |
+|----------------|:------------------------------:|:--------------------------------:|
+| **maroon** | 8 → **3** | 6 → **3** |
+| **light blue** | 7 → **1** | 3 → **1** |
+| **dark brown** | 4 → **2** | 1 → 1 |
+| **teal**       | 2 → 2                          | 2 → 2                            |
+| **gray** | 7 → 8 | 6 → 6 |
+| **black** | 6 → 6 | 7 → 7 |
+
+- **Maroon:** Misses fell by half or more for both models (8→3 for Qwen, 6→3 for Gemini). Previously the most-missed color for Qwen; now ranks 5th.
+- **Light blue:** Near-elimination of the "light blue → blue" confusion for both models (7→1 for Qwen, 3→1 for Gemini).
+- **Gray/grey:** The spelling normalization instruction eliminated the grey→gray similar-match penalty for Gemini entirely (10 confusions → 0). However, gray detection misses remain unchanged — these are images where gray jerseys aren't detected at all, not a naming issue.
+- **Teal and black** remain stubbornly problematic regardless of prompt.
+
+### 4. New overcorrection pattern with constrained prompt
+
+The constrained prompt introduced a new failure mode — models now occasionally over-apply newly-learned color terms:
+
+- **Qwen + Constrained** reported "maroon" as an extra/wrong color **5 times** (was 0 previously). It's now calling some brown, red, and orange jerseys "maroon", the opposite of the original problem. Cases include: 007 (brown→maroon), 031 (brown→maroon), 048 (red→maroon), 142 (orange→maroon).
+
+- **Gemini + Constrained** reported "light blue" as an extra/wrong color **2 times** (was 0 previously), including misidentifying navy blue as light blue (image 081).
+
+This overcorrection is a smaller problem than the original misses it replaced, but worth noting.
+
+### 5. The capstone prompt did not improve results
+
+The capstone prompt performed roughly on par with the original prompt for Qwen and slightly below it for Gemini:
+
+- Qwen: 78.2% recall (same), 87.3% precision (slight improvement)
+- Gemini: 78.2% recall (down from 79.7%), 88.7% precision (down from 91.4%)
+
+The capstone prompt's emphasis on precision over recall ("do not guess") may have hurt overall detection rates without meaningfully improving color accuracy.
+
+### 6. Gemini speed improvement from concurrency
+
+The concurrent processing optimization (8 workers + session reuse + JPEG quality 85) delivered major speed gains for the Gemini runs:
+
+| Prompt      | Previous sequential run | Current concurrent run |
+|-------------|:-----------------------:|:----------------------:|
+| Original    | 2134s (13.3s avg)       | 253s (1.6s avg)        |
+| Capstone    | 1882s (11.7s avg)       | 260s (1.6s avg)        |
+| Constrained | —                       | 344s (2.1s avg)        |
+
+That's roughly a **7–8x speedup** for the first two prompts (2134s → 253s is about 8.4x; 1882s → 260s is about 7.2x). The constrained prompt run was slower (344s), likely because of its longer prompt text (2223 chars vs ~1500 chars).
+
+---
+
+## Persistently Failed Images
+
+These **10 images** failed across all six runs, representing the hardest cases for current VLMs regardless of model or prompt:
+
+| Image | GT Colors | Typical Error |
+|-------|-----------|---------------|
+| 016 - maroon.jpg | maroon | Not detected or called "red" |
+| 034 - light blue.jpg | light blue | Called "blue" |
+| 046 - green.jpg | green | Called "black" |
+| 053 - black_white.jpg | black | Not detected |
+| 077 - teal_white.jpg | teal | Called "green" |
+| 132 - brown_white.jpg | brown | Called "orange" |
+| 134 - teal_white.jpg | teal | Called "blue" or "light blue" |
+| 138 - maroon.jpg | maroon | Called "red" |
+| 150 - green_gray.jpg | green, gray | Called "black" |
+| 160 - blue_white.jpg | blue | Not detected |
+
+Notable improvements: Images **029** (maroon), **087/141/161** (light blue), and **099** (maroon) failed in one or more round-1 runs (029 in all four) but were **fixed by the constrained prompt** for at least one model.
+
+---
+
+## Model Comparison
+
+### Gemini 3 Flash
+- **Best at:** Precision (94.3% with constrained prompt), fewest hallucinated colors
+- **Weakness:** Lower exact recall than Qwen; still uses shade variants even with constraints
+- **Speed:** ~250-340s with 8 concurrent workers
+
+### Qwen3-VL-8B
+- **Best at:** Recall (82.7% with constrained prompt), highest PASS count (127)
+- **Weakness:** Higher false positive rate; introduced "maroon" overcorrection with constrained prompt
+- **Speed:** ~1440-1600s sequential (local GPU inference)
+
+---
+
+## Recommendations
+
+1. **Use the constrained prompt** (`jersey_prompt_constrained.txt`) — it is the clear winner for both models, improving recall and precision simultaneously.
+
+2. **Post-processing normalization** could still recover additional matches:
+ - Map `grey` → `gray` (catches any remaining Gemini outputs)
+ - Map `navy` → `navy blue` (catches shorthand usage)
+
+3. **Consider a brown/maroon calibration** — the constrained prompt overcorrected on Qwen, turning brown→maroon confusion into a new error source. Adding "Use 'brown' for warm, non-reddish dark colors" or similar guidance may help.
+
+4. **Gray and black detection remain unsolved** at the prompt level. These are likely image quality or model perception limitations that prompt changes alone are unlikely to fix, so these colors may benefit from a secondary computer vision pass (e.g., dominant color extraction from the jersey region).
+
+5. **Retire the capstone prompt.** It offered little or no benefit over the original and performed worse than the constrained prompt on every accuracy metric.
+
+---
+
+## Appendix: Color Similarity Families Used for Scoring
+
+| Family | Member Colors |
+|------------|-------------------------------------------------------|
+| blue | blue, dark blue, navy blue, navy, royal blue |
+| light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
+| red | red, scarlet, crimson |
+| dark_red | maroon, burgundy, dark red, wine |
+| green | green, dark green, forest green, kelly green |
+| yellow | yellow, gold, golden |
+| orange | orange, burnt orange |
+| brown | brown, dark brown |
+| purple | purple, violet |
+| gray | gray, grey, silver, charcoal |
+| black | black |
+| teal | teal, turquoise, cyan, aqua |
+| pink | pink, magenta, hot pink, rose |
+
+---
+
+## Appendix: Constrained Prompt (`jersey_prompt_constrained.txt`)
+
+```
+You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
+
+CRITICAL INSTRUCTIONS:
+1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
+2. ONLY include jersey numbers that you can ACTUALLY READ in the image
+3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
+4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
+5. DO NOT include jerseys if you cannot clearly see the number
+
+COLOR VOCABULARY:
+For "jersey_color" and "number_color", you MUST choose from this list ONLY:
+red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
+
+Important color distinctions:
+- Use "maroon" for dark brownish-red, NOT "red"
+- Use "light blue" for pale or sky blue, NOT "blue"
+- Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
+- Use "teal" for blue-green, NOT "green" or "blue"
+- Use "gray" (not "grey") for silver or neutral tones
+- Use "dark brown" for very dark brown, NOT "black"
+- Use "gold" for metallic or deep yellow, NOT "yellow"
+
+RESPONSE FORMAT:
+Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
+
+Use DOUBLE QUOTES (") for all JSON keys and string values.
+
+The JSON must have a single key "jerseys" with an array of dictionaries.
+
+Each dictionary must have exactly these three keys:
+- "jersey_number": The number on the jersey (as a string, only if clearly visible)
+- "jersey_color": The primary color of the jersey (MUST be from the color list above)
+- "number_color": The color of the number on the jersey (MUST be from the color list above)
+
+Example response for an image WITH visible jerseys:
+{
+ "jerseys": [
+ {
+ "jersey_number": "10",
+ "jersey_color": "maroon",
+ "number_color": "gold"
+ },
+ {
+ "jersey_number": "42",
+ "jersey_color": "light blue",
+ "number_color": "white"
+ }
+ ]
+}
+
+Example response for an image WITHOUT jerseys or with unclear numbers:
+{"jerseys": []}
+
+REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
+
+Now analyze the image and return the JSON object.
+```
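Reviewer note: the similar-match scoring described in the appendix of this report can be sketched as follows. This is an illustration under stated assumptions, not the scoring script used for these runs: `FAMILIES` is copied from the appendix table, while `FAMILY_OF`, `classify`, and `recall_counts` are hypothetical names, and the handling of alternative ground truth labels such as `gold|yellow` is left out because the report does not describe it.

```python
# Color families copied from the report's appendix table.
FAMILIES = {
    "blue": {"blue", "dark blue", "navy blue", "navy", "royal blue"},
    "light_blue": {"light blue", "sky blue", "baby blue", "carolina blue", "powder blue"},
    "red": {"red", "scarlet", "crimson"},
    "dark_red": {"maroon", "burgundy", "dark red", "wine"},
    "green": {"green", "dark green", "forest green", "kelly green"},
    "yellow": {"yellow", "gold", "golden"},
    "orange": {"orange", "burnt orange"},
    "brown": {"brown", "dark brown"},
    "purple": {"purple", "violet"},
    "gray": {"gray", "grey", "silver", "charcoal"},
    "black": {"black"},
    "teal": {"teal", "turquoise", "cyan", "aqua"},
    "pink": {"pink", "magenta", "hot pink", "rose"},
}

# Reverse index (hypothetical helper): color name -> family name.
FAMILY_OF = {color: family for family, members in FAMILIES.items() for color in members}


def classify(expected: str, predicted: set) -> str:
    """Return 'exact', 'similar', or 'missed' for one ground truth color."""
    if expected in predicted:
        return "exact"
    family = FAMILY_OF.get(expected)
    if family is not None and any(FAMILY_OF.get(p) == family for p in predicted):
        return "similar"
    return "missed"


def recall_counts(ground_truth, predicted):
    """Tally recall outcomes over all ground truth colors of one image."""
    counts = {"exact": 0, "similar": 0, "missed": 0}
    for color in ground_truth:
        counts[classify(color, set(predicted))] += 1
    return counts
```

Under this sketch, a "grey" answer for a gray jersey scores as similar, while a "red" answer for a maroon jersey scores as missed, because maroon and red sit in different families.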
diff --git a/accuracy_test_results.md b/accuracy_test_results.md
new file mode 100644
index 0000000..486863f
--- /dev/null
+++ b/accuracy_test_results.md
@@ -0,0 +1,490 @@
+# Gemini 3 Flash Results (Prompt: jersey_prompt.txt):
+
+================================================================================
+ACCURACY SUMMARY (gemini-3-flash-preview)
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 2134.4s (13.3s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 174 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 130 / 202 (64.4%)
+ Similar match: 34 / 202 (16.8%)
+ Total found: 164 / 202 (81.2%)
+ Missed: 38 / 202 (18.8%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 130 / 174 (74.7%)
+ Similar match: 33 / 174 (19.0%)
+ Total correct: 163 / 174 (93.7%)
+ Extra/wrong: 11 / 174 (6.3%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ gray -> grey x9
+ navy blue -> blue x7
+ dark brown -> brown x5
+ dark blue -> blue x5
+ gold -> yellow x3
+ dark blue -> navy blue x3
+ navy -> navy blue x1
+ dark blue -> navy x1
+
+--- Most Missed Ground Truth Colors ---
+ gray 7 #######
+ black 7 #######
+ maroon 5 #####
+ blue 3 ###
+ green 3 ###
+ gold 2 ##
+ light blue 2 ##
+ gold|yellow 2 ##
+ red 2 ##
+ teal 2 ##
+ orange 1 #
+ yellow 1 #
+ brown 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ red 3 ###
+ blue 3 ###
+ black 2 ##
+ green 1 #
+ orange 1 #
+ dark blue 1 #
+
+--- Per-Image Verdict ---
+ PASS 124
+ PARTIAL 19
+ FAIL 18
+
+--- Failed Images (18) ---
+ 016 - maroon.jpg
+ missed: maroon
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 048 - red.jpg
+ missed: red
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 069 - red_white.jpg
+ missed: red
+ 074 - white_orange.jpg
+ missed: orange
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 088 - white_maroon.jpg
+ missed: maroon
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+
+
+# Qwen3-VL-8B Model Results (Prompt: jersey_prompt.txt):
+
+================================================================================
+ACCURACY SUMMARY
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 1526.4s (9.5s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 184 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 130 / 202 (64.4%)
+ Similar match: 26 / 202 (12.9%)
+ Total found: 156 / 202 (77.2%)
+ Missed: 46 / 202 (22.8%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 130 / 184 (70.7%)
+ Similar match: 26 / 184 (14.1%)
+ Total correct: 156 / 184 (84.8%)
+ Extra/wrong: 28 / 184 (15.2%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ dark blue -> blue x10
+ navy blue -> blue x8
+ gold -> yellow x5
+ dark brown -> brown x2
+ navy -> blue x1
+
+--- Most Missed Ground Truth Colors ---
+ light blue 8 ########
+ maroon 8 ########
+ gray 7 #######
+ black 6 ######
+ dark brown 4 ####
+ brown 3 ###
+ blue 3 ###
+ green 3 ###
+ teal 2 ##
+ gold|yellow 1 #
+ red 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ blue 10 ##########
+ black 7 #######
+ red 7 #######
+ gold 1 #
+ green 1 #
+ redolas 1 #
+ orange 1 #
+
+--- Per-Image Verdict ---
+ PASS 117
+ PARTIAL 18
+ FAIL 26
+
+--- Failed Images (26) ---
+ 001 -brown_white or dark brown.jpg
+ missed: brown, dark brown
+ extra: black
+ 013 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 016 - maroon.jpg
+ missed: maroon
+ 017 - brown_white.jpg
+ missed: brown
+ extra: black
+ 022 - black_light blue.jpg
+ missed: black, light blue
+ extra: blue
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 036 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 063 - dark brown.jpg
+ missed: dark brown
+ extra: black
+ 069 - red_white.jpg
+ missed: red
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 078 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 087 - white_light blue.jpg
+ missed: light blue
+ extra: blue
+ 099 - maroon_white.jpg
+ missed: maroon
+ extra: redolas, red
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 141 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+
+
+#Gemini 3 Flash Results (Prompt: jersey_prompt_capstone.txt):
+
+================================================================================
+ACCURACY SUMMARY (gemini-3-flash-preview)
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 1881.7s (11.7s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 174 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 123 / 202 (60.9%)
+ Similar match: 35 / 202 (17.3%)
+ Total found: 158 / 202 (78.2%)
+ Missed: 44 / 202 (21.8%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 123 / 174 (70.7%)
+ Similar match: 34 / 174 (19.5%)
+ Total correct: 157 / 174 (90.2%)
+ Extra/wrong: 17 / 174 (9.8%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ gray -> grey x10
+ navy blue -> blue x6
+ dark blue -> blue x6
+ dark brown -> brown x5
+ dark blue -> navy blue x3
+ gold -> yellow x2
+ navy blue -> navy x1
+ navy -> blue x1
+ dark blue -> navy x1
+
+--- Most Missed Ground Truth Colors ---
+ maroon 9 #########
+ black 7 #######
+ gray 6 ######
+ green 4 ####
+ gold 3 ###
+ blue 3 ###
+ light blue 2 ##
+ gold|yellow 2 ##
+ red 2 ##
+ teal 2 ##
+ navy blue 1 #
+ dark brown 1 #
+ yellow 1 #
+ brown 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ red 7 #######
+ black 4 ####
+ blue 2 ##
+ green 1 #
+ orange 1 #
+ light blue 1 #
+ navy 1 #
+
+--- Per-Image Verdict ---
+ PASS 118
+ PARTIAL 21
+ FAIL 22
+
+--- Failed Images (22) ---
+ 016 - maroon.jpg
+ missed: maroon
+ 019 - maroon_gold.jpg
+ missed: maroon, gold
+ extra: red
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 030 - navy blue_white.jpg
+ missed: navy blue
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 036 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 048 - red.jpg
+ missed: red
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 069 - red_white.jpg
+ missed: red
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 088 - white_maroon.jpg
+ missed: maroon
+ 099 - maroon_white.jpg
+ missed: maroon
+ extra: red
+ 128 - green_white.jpg
+ missed: green
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: light blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+
+
+#Qwen3-VL-8B Model Results (Prompt: jersey_prompt_capstone.txt):
+
+================================================================================
+ACCURACY SUMMARY
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 1435.7s (8.9s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 180 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 133 / 202 (65.8%)
+ Similar match: 24 / 202 (11.9%)
+ Total found: 157 / 202 (77.7%)
+ Missed: 45 / 202 (22.3%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 133 / 180 (73.9%)
+ Similar match: 24 / 180 (13.3%)
+ Total correct: 157 / 180 (87.2%)
+ Extra/wrong: 23 / 180 (12.8%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ dark blue -> blue x9
+ navy blue -> blue x8
+ gold -> yellow x3
+ dark brown -> brown x2
+ navy -> blue x1
+ dark blue -> navy x1
+
+--- Most Missed Ground Truth Colors ---
+ gray 9 #########
+ maroon 7 #######
+ black 6 ######
+ light blue 5 #####
+ dark brown 4 ####
+ green 4 ####
+ brown 3 ###
+ gold 2 ##
+ blue 2 ##
+ teal 2 ##
+ gold|yellow 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ black 7 #######
+ blue 6 ######
+ red 6 ######
+ gold 1 #
+ green 1 #
+ orange 1 #
+ navy 1 #
+
+--- Per-Image Verdict ---
+ PASS 119
+ PARTIAL 19
+ FAIL 23
+
+--- Failed Images (23) ---
+ 001 -brown_white or dark brown.jpg
+ missed: brown, dark brown
+ extra: black
+ 013 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 016 - maroon.jpg
+ missed: maroon
+ 017 - brown_white.jpg
+ missed: brown
+ extra: black
+ 019 - maroon_gold.jpg
+ missed: maroon, gold
+ extra: red
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 036 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 039 - gray_white.jpg
+ missed: gray
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 063 - dark brown.jpg
+ missed: dark brown
+ extra: black
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 141 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 145 - green_white.jpg
+ missed: green
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
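The percentages in each ACCURACY SUMMARY block above follow directly from the raw counts. As a sanity check, the sketch below recomputes them for the Qwen3-VL-8B + jersey_prompt.txt run (130 exact, 26 similar, 202 ground truth colors, 184 VLM colors). The function name `summarize` and the dictionary keys are illustrative and are not taken from the test script.

```python
def summarize(exact: int, similar: int, gt_total: int, vlm_total: int) -> dict[str, float]:
    """Recompute the summary percentages from raw match counts.

    Assumption: the same exact/similar counts apply to both recall and
    precision, which holds for the Qwen runs reported above.
    """
    found = exact + similar

    def pct(numerator: int, denominator: int) -> float:
        return round(100.0 * numerator / denominator, 1)

    return {
        "recall_exact": pct(exact, gt_total),
        "recall_similar": pct(similar, gt_total),
        "recall_total": pct(found, gt_total),
        "recall_missed": pct(gt_total - found, gt_total),
        "precision_exact": pct(exact, vlm_total),
        "precision_similar": pct(similar, vlm_total),
        "precision_total": pct(found, vlm_total),
        "precision_extra": pct(vlm_total - found, vlm_total),
    }


# Counts copied from the Qwen3-VL-8B + jersey_prompt.txt summary.
qwen_original = summarize(exact=130, similar=26, gt_total=202, vlm_total=184)
for name, value in qwen_original.items():
    print(f"{name:18s} {value:5.1f}%")
```

The Gemini capstone block reports 35 similar matches under recall but 34 under precision, so the actual script evidently counts the two sides separately. A single `similar` argument would not reproduce that block exactly.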
diff --git a/accuracy_test_results_all.txt b/accuracy_test_results_all.txt
new file mode 100644
index 0000000..3d163c9
--- /dev/null
+++ b/accuracy_test_results_all.txt
@@ -0,0 +1,5609 @@
+========================================
+Qwen3-VL-8B + jersey_prompt.txt
+Started: Tue Mar 3 04:40:45 PM MST 2026
+========================================
+Images to process: 161
+Server: http://agx:8080
+Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt.txt (1504 chars)
+================================================================================
+
+[1/161] 001 -brown_white or dark brown.jpg
+ GT: [brown, dark brown]
+ VLM: [black] (3 jersey(s), 11.1s)
+ FAIL MISS:brown,dark brown, extra:black
+
+[2/161] 002 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[3/161] 003 - dark blue.jpg
+ GT: [dark blue]
+ VLM: [blue] (3 jersey(s), 10.8s)
+ PASS similar:1
+
+[4/161] 004 - purple_light blue.jpg
+ GT: [purple, light blue]
+ VLM: [light blue, purple] (3 jersey(s), 11.9s)
+ PASS exact:2
+
+[5/161] 005 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (1 jersey(s), 5.0s)
+ PARTIAL exact:1, MISS:gray
+
+[6/161] 006 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (1 jersey(s), 4.3s)
+ PASS similar:1
+
+[7/161] 007 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[8/161] 008 -red or orange.jpg
+ GT: [red|orange]
+ VLM: [red] (1 jersey(s), 4.3s)
+ PASS exact:1
+
+[9/161] 009 - white_red.jpg
+ GT: [red]
+ VLM: [gold, red] (3 jersey(s), 10.8s)
+ PARTIAL exact:1, extra:gold
+
+[10/161] 010 - white_black.jpg
+ GT: [black]
+ VLM: [black] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[11/161] 011 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (4 jersey(s), 13.8s)
+ PARTIAL exact:1, MISS:gray
+
+[12/161] 012 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[13/161] 013 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 7.5s)
+ FAIL MISS:light blue, extra:blue
+
+[14/161] 014 - orange_dark blue or purple.jpg
+ GT: [orange, dark blue|purple]
+ VLM: [orange, purple] (3 jersey(s), 10.9s)
+ PASS exact:2
+
+[15/161] 015 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[16/161] 016 - maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (0 jersey(s), 1.5s)
+ FAIL MISS:maroon
+
+[17/161] 017 - brown_white.jpg
+ GT: [brown]
+ VLM: [black] (2 jersey(s), 8.8s)
+ FAIL MISS:brown, extra:black
+
+[18/161] 018 - gray_red.jpg
+ GT: [gray, red]
+ VLM: [gray, red] (2 jersey(s), 7.4s)
+ PASS exact:2
+
+[19/161] 019 - maroon_gold.jpg
+ GT: [maroon, gold]
+ VLM: [red, yellow] (2 jersey(s), 7.7s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[20/161] 020 - white_brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[21/161] 021 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[22/161] 022 - black_light blue.jpg
+ GT: [black, light blue]
+ VLM: [light blue] (1 jersey(s), 4.9s)
+ PARTIAL exact:1, MISS:black
+
+[23/161] 023 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[24/161] 024 - white_pink.jpg
+ GT: [pink]
+ VLM: [pink] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[25/161] 025 - blue_green.jpg
+ GT: [blue, green]
+ VLM: [green] (1 jersey(s), 4.3s)
+ PARTIAL exact:1, MISS:blue
+
+[26/161] 026 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[27/161] 027 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 16.3s)
+ PASS exact:1
+
+[28/161] 028 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[29/161] 029 -maroon_white.jpg
+ GT: [maroon]
+ VLM: [red] (2 jersey(s), 7.8s)
+ FAIL MISS:maroon, extra:red
+
+[30/161] 030 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 7.8s)
+ PASS similar:1
+
+[31/161] 031 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[32/161] 032 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[33/161] 033 - navy blue_white or gray.jpg
+ GT: [navy blue, gray]
+ VLM: [blue] (3 jersey(s), 10.9s)
+ PARTIAL similar:1, MISS:gray
+
+[34/161] 034 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (1 jersey(s), 4.7s)
+ FAIL MISS:light blue, extra:blue
+
+[35/161] 035 -green_gold or yellow.jpg
+ GT: [green, gold|yellow]
+ VLM: [green, yellow] (2 jersey(s), 8.1s)
+ PASS exact:2
+
+[36/161] 036 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (4 jersey(s), 13.7s)
+ FAIL MISS:light blue, extra:blue
+
+[37/161] 037 -navy_white.jpg
+ GT: [navy]
+ VLM: [blue] (3 jersey(s), 10.1s)
+ PASS similar:1
+
+[38/161] 038 - red_white.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[39/161] 039 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[40/161] 040 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (1 jersey(s), 5.1s)
+ PARTIAL exact:1, MISS:gray
+
+[41/161] 041 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (8 jersey(s), 25.7s)
+ PASS similar:1
+
+[42/161] 042 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[43/161] 043 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [black, gray] (2 jersey(s), 7.9s)
+ PASS exact:2
+
+[44/161] 044 - purple_black.jpg
+ GT: [purple, black]
+ VLM: [purple] (5 jersey(s), 16.6s)
+ PARTIAL exact:1, MISS:black
+
+[45/161] 045 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[46/161] 046 - green.jpg
+ GT: [green]
+ VLM: [black] (15 jersey(s), 46.4s)
+ FAIL MISS:green, extra:black
+
+[47/161] 047 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[48/161] 048 - red.jpg
+ GT: [red]
+ VLM: [red] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[49/161] 049 - white_gold.jpg
+ GT: [gold]
+ VLM: [yellow] (2 jersey(s), 7.9s)
+ PASS similar:1
+
+[50/161] 050 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[51/161] 051 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[52/161] 052 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [black, yellow] (2 jersey(s), 7.8s)
+ PASS exact:1, similar:1
+
+[53/161] 053 - black_white.jpg
+ GT: [black]
+ VLM: [(none)] (1 jersey(s), 4.9s)
+ FAIL MISS:black
+
+[54/161] 054 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[55/161] 055 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (2 jersey(s), 7.8s)
+ PASS exact:1, similar:1
+
+[56/161] 056 - white_red.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[57/161] 057 - white_gold or yellow.jpg
+ GT: [gold|yellow]
+ VLM: [yellow] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[58/161] 058 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[59/161] 059 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [gold] (1 jersey(s), 4.9s)
+ PARTIAL exact:1, MISS:black
+
+[60/161] 060 - gray_navy blue.jpg
+ GT: [gray, navy blue]
+ VLM: [blue] (2 jersey(s), 7.9s)
+ PARTIAL similar:1, MISS:gray
+
+[61/161] 061 - brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[62/161] 062 - orange_blue.jpg
+ GT: [orange, blue]
+ VLM: [blue, orange] (2 jersey(s), 7.5s)
+ PASS exact:2
+
+[63/161] 063 - dark brown.jpg
+ GT: [dark brown]
+ VLM: [black] (1 jersey(s), 4.9s)
+ FAIL MISS:dark brown, extra:black
+
+[64/161] 064 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[65/161] 065 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (3 jersey(s), 10.4s)
+ PASS exact:1, similar:1
+
+[66/161] 066 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (1 jersey(s), 4.7s)
+ PASS exact:1
+
+[67/161] 067 - red_white.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[68/161] 068 - gold.jpg
+ GT: [gold]
+ VLM: [gold] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[69/161] 069 - red_white.jpg
+ GT: [red]
+ VLM: [(none)] (4 jersey(s), 13.7s)
+ FAIL MISS:red
+
+[70/161] 070 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[71/161] 071 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[72/161] 072 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.5s)
+ PASS exact:1
+
+[73/161] 073 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.4s)
+ PASS exact:1
+
+[74/161] 074 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.5s)
+ PASS exact:1
+
+[75/161] 075 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[76/161] 076 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (3 jersey(s), 11.4s)
+ PASS exact:1
+
+[77/161] 077 - teal_white.jpg
+ GT: [teal]
+ VLM: [green] (4 jersey(s), 13.4s)
+ FAIL MISS:teal, extra:green
+
+[78/161] 078 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 7.6s)
+ FAIL MISS:light blue, extra:blue
+
+[79/161] 079 - blue_maroon.jpg
+ GT: [blue, maroon]
+ VLM: [blue, red] (4 jersey(s), 13.8s)
+ PARTIAL exact:1, MISS:maroon, extra:red
+
+[80/161] 080 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 7.8s)
+ PASS similar:1
+
+[81/161] 081 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 7.8s)
+ PASS similar:1
+
+[82/161] 082 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (3 jersey(s), 10.6s)
+ PASS similar:1
+
+[83/161] 083 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [black] (2 jersey(s), 7.8s)
+ FAIL MISS:dark brown, extra:black
+
+[84/161] 084 - dark brown_yellow.jpg
+ GT: [dark brown, yellow]
+ VLM: [black, yellow] (2 jersey(s), 7.9s)
+ PARTIAL exact:1, MISS:dark brown, extra:black
+
+[85/161] 085 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[86/161] 086 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [brown] (2 jersey(s), 8.0s)
+ PASS similar:1
+
+[87/161] 087 - white_light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 7.8s)
+ FAIL MISS:light blue, extra:blue
+
+[88/161] 088 - white_maroon.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[89/161] 089 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[90/161] 090 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (4 jersey(s), 14.2s)
+ PASS exact:1
+
+[91/161] 091 - teal.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[92/161] 092 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 13.7s)
+ PASS exact:1
+
+[93/161] 093 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 7.9s)
+ PASS similar:1
+
+[94/161] 094 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 12.5s)
+ PASS exact:1
+
+[95/161] 095 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[96/161] 096 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 8.6s)
+ PASS exact:1
+
+[97/161] 097 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [gray] (2 jersey(s), 8.0s)
+ PARTIAL exact:1, MISS:black
+
+[98/161] 098 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 8.7s)
+ PASS exact:1
+
+[99/161] 099 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [red] (3 jersey(s), 12.0s)
+ FAIL MISS:maroon, extra:red
+
+[100/161] 100 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[101/161] 101 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 17.0s)
+ PASS exact:1
+
+[102/161] 102 - yellow-black.jpg
+ GT: [yellow, black]
+ VLM: [black, yellow] (3 jersey(s), 10.9s)
+ PASS exact:2
+
+[103/161] 103 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 11.1s)
+ PASS exact:1
+
+[104/161] 104 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[105/161] 105 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 9.1s)
+ PASS exact:1
+
+[106/161] 106 - black_gray.jpg
+ GT: [black, gray]
+ VLM: [black, gray] (2 jersey(s), 9.0s)
+ PASS exact:2
+
+[107/161] 107 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[108/161] 108 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[109/161] 109 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[110/161] 110 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[111/161] 111 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[112/161] 112 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[113/161] 113 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[114/161] 114 - black_white.jpg
+ GT: [black]
+ VLM: [black] (2 jersey(s), 8.2s)
+ PASS exact:1
+
+[115/161] 115 - navy blue_maroon.jpg
+ GT: [navy blue, maroon]
+ VLM: [blue, red] (4 jersey(s), 13.8s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[116/161] 116 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[117/161] 117 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[118/161] 118 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 7.4s)
+ PASS similar:1
+
+[119/161] 119 - black_yellow.jpg
+ GT: [black, yellow]
+ VLM: [black, yellow] (3 jersey(s), 10.9s)
+ PASS exact:2
+
+[120/161] 120 - red_dark blue.jpg
+ GT: [red, dark blue]
+ VLM: [blue, red] (3 jersey(s), 10.7s)
+ PASS exact:1, similar:1
+
+[121/161] 121 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[122/161] 122 - gray.jpg
+ GT: [gray]
+ VLM: [gray] (1 jersey(s), 6.2s)
+ PASS exact:1
+
+[123/161] 123 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[124/161] 124 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (4 jersey(s), 13.7s)
+ PASS similar:1
+
+[125/161] 125 - dark blue_maroon.jpg
+ GT: [dark blue, maroon]
+ VLM: [blue, red] (2 jersey(s), 8.2s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[126/161] 126 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[127/161] 127 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[128/161] 128 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[129/161] 129 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (3 jersey(s), 10.9s)
+ FAIL MISS:blue
+
+[130/161] 130 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [black, yellow] (2 jersey(s), 8.4s)
+ PASS exact:2
+
+[131/161] 131 - purple_orange.jpg
+ GT: [purple, orange]
+ VLM: [orange, purple] (3 jersey(s), 10.8s)
+ PASS exact:2
+
+[132/161] 132 - brown_white.jpg
+ GT: [brown]
+ VLM: [orange] (3 jersey(s), 10.9s)
+ FAIL MISS:brown, extra:orange
+
+[133/161] 133 - light blue.png
+ GT: [light blue]
+ VLM: [light blue] (6 jersey(s), 21.1s)
+ PASS exact:1
+
+[134/161] 134 - teal_white.jpg
+ GT: [teal]
+ VLM: [blue] (1 jersey(s), 4.9s)
+ FAIL MISS:teal, extra:blue
+
+[135/161] 135 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[136/161] 136 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[137/161] 137 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[138/161] 138 - maroon.jpg
+ GT: [maroon]
+ VLM: [red] (1 jersey(s), 4.9s)
+ FAIL MISS:maroon, extra:red
+
+[139/161] 139 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 8.0s)
+ PASS similar:1
+
+[140/161] 140 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.6s)
+ PASS exact:1
+
+[141/161] 141 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (3 jersey(s), 11.1s)
+ FAIL MISS:light blue, extra:blue
+
+[142/161] 142 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[143/161] 143 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[144/161] 144 - green.jpg
+ GT: [green]
+ VLM: [green] (10 jersey(s), 31.9s)
+ PASS exact:1
+
+[145/161] 145 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[146/161] 146 - red_gray.jpg
+ GT: [red, gray]
+ VLM: [gray, red] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[147/161] 147 - green.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[148/161] 148 - yellow_purple.jpg
+ GT: [yellow, purple]
+ VLM: [purple, yellow] (2 jersey(s), 7.9s)
+ PASS exact:2
+
+[149/161] 149 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 16.7s)
+ PASS exact:1
+
+[150/161] 150 - green_gray.jpg
+ GT: [green, gray]
+ VLM: [black] (2 jersey(s), 7.8s)
+ FAIL MISS:green,gray, extra:black
+
+[151/161] 151 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [blue, yellow] (5 jersey(s), 16.7s)
+ PARTIAL exact:1, MISS:black, extra:blue
+
+[152/161] 152 - pink_dark blue.jpg
+ GT: [pink, dark blue]
+ VLM: [blue, pink] (2 jersey(s), 7.8s)
+ PASS exact:1, similar:1
+
+[153/161] 153 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[154/161] 154 - dark brown.jpeg
+ GT: [dark brown]
+ VLM: [brown] (5 jersey(s), 16.8s)
+ PASS similar:1
+
+[155/161] 155 - white_green_gray_purple_yellow.jpg
+ GT: [green, gray, purple, yellow]
+ VLM: [gray, purple, yellow] (5 jersey(s), 17.3s)
+ PARTIAL exact:3, MISS:green
+
+[156/161] 156 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (2 jersey(s), 7.7s)
+ PARTIAL exact:1, MISS:gray
+
+[157/161] 157 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[158/161] 158 - dark blue_yellow.jpg
+ GT: [dark blue, yellow]
+ VLM: [blue, yellow] (4 jersey(s), 14.0s)
+ PASS exact:1, similar:1
+
+[159/161] 159 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[160/161] 160 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (1 jersey(s), 4.9s)
+ FAIL MISS:blue
+
+[161/161] 161 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 7.7s)
+ FAIL MISS:light blue, extra:blue
+
+================================================================================
+ACCURACY SUMMARY
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 1557.4s (9.7s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 184 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 132 / 202 (65.3%)
+ Similar match: 26 / 202 (12.9%)
+ Total found: 158 / 202 (78.2%)
+ Missed: 44 / 202 (21.8%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 132 / 184 (71.7%)
+ Similar match: 26 / 184 (14.1%)
+ Total correct: 158 / 184 (85.9%)
+ Extra/wrong: 26 / 184 (14.1%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ dark blue -> blue x10
+ navy blue -> blue x8
+ gold -> yellow x5
+ dark brown -> brown x2
+ navy -> blue x1
+
+--- Most Missed Ground Truth Colors ---
+ maroon 8 ########
+ gray 7 #######
+ light blue 7 #######
+ black 6 ######
+ dark brown 4 ####
+ brown 3 ###
+ blue 3 ###
+ green 3 ###
+ teal 2 ##
+ red 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ blue 9 #########
+ black 7 #######
+ red 7 #######
+ gold 1 #
+ green 1 #
+ orange 1 #
+
+--- Per-Image Verdict ---
+ PASS 118
+ PARTIAL 19
+ FAIL 24
+
+--- Failed Images (24) ---
+ 001 -brown_white or dark brown.jpg
+ missed: brown, dark brown
+ extra: black
+ 013 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 016 - maroon.jpg
+ missed: maroon
+ 017 - brown_white.jpg
+ missed: brown
+ extra: black
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 036 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 053 - black_white.jpg
+ missed: black
+ 063 - dark brown.jpg
+ missed: dark brown
+ extra: black
+ 069 - red_white.jpg
+ missed: red
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 078 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 087 - white_light blue.jpg
+ missed: light blue
+ extra: blue
+ 099 - maroon_white.jpg
+ missed: maroon
+ extra: red
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 141 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+
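The per-image entries above use a regular four-line layout (header, `GT:`, `VLM:`, verdict), so they can be loaded for further analysis with a small parser. The sketch below is one possible approach, not part of the test script. The `Entry` class, the `parse_entries` function, and the regular expressions are hypothetical names inferred from the log layout shown here.

```python
import re
from dataclasses import dataclass, field

HEADER_RE = re.compile(r"^\[(\d+)/(\d+)\]\s+(.*)$")
GT_RE = re.compile(r"^GT:\s*\[(.*)\]$")
VLM_RE = re.compile(r"^VLM:\s*\[(.*?)\]\s*\((\d+) jersey\(s\), ([\d.]+)s\)$")
VERDICTS = {"PASS", "PARTIAL", "FAIL"}


@dataclass
class Entry:
    index: int
    filename: str
    gt: list[str] = field(default_factory=list)
    vlm: list[str] = field(default_factory=list)
    jerseys: int = 0
    seconds: float = 0.0
    verdict: str = ""


def _split_colors(items: str) -> list[str]:
    """Split a bracketed color list; '(none)' becomes an empty list."""
    colors = [c.strip() for c in items.split(",")]
    return [c for c in colors if c and c != "(none)"]


def parse_entries(text: str) -> list[Entry]:
    """Parse per-image log entries; a leading '+' diff marker is tolerated."""
    entries: list[Entry] = []
    current: Entry | None = None
    for raw in text.splitlines():
        line = raw.lstrip("+").strip()
        if m := HEADER_RE.match(line):
            current = Entry(index=int(m.group(1)), filename=m.group(3))
            entries.append(current)
        elif current is None:
            continue
        elif m := GT_RE.match(line):
            current.gt = _split_colors(m.group(1))
        elif m := VLM_RE.match(line):
            current.vlm = _split_colors(m.group(1))
            current.jerseys = int(m.group(2))
            current.seconds = float(m.group(3))
        elif not current.verdict and line.split(" ", 1)[0] in VERDICTS:
            # Only the first verdict after a header counts; later summary
            # rows such as "PASS 118" must not overwrite it.
            current.verdict = line.split(" ", 1)[0]
    return entries


# Two entries copied from the Qwen3-VL-8B + jersey_prompt.txt run.
SAMPLE = """\
+[1/161] 001 -brown_white or dark brown.jpg
+  GT:  [brown, dark brown]
+  VLM: [black]  (3 jersey(s), 11.1s)
+  FAIL  MISS:brown,dark brown, extra:black
+
+[16/161] 016 - maroon.jpg
+  GT:  [maroon]
+  VLM: [(none)]  (0 jersey(s), 1.5s)
+  FAIL  MISS:maroon
"""

for entry in parse_entries(SAMPLE):
    print(entry.index, entry.verdict, entry.gt, entry.vlm, entry.seconds)
```

Alternatives such as `red|orange` are kept as a single string here. Splitting them on `|` is left to the caller, since the log does not spell out how the scorer treats them.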
+========================================
+Gemini 3 Flash + jersey_prompt.txt
+Started: Tue Mar 3 05:06:43 PM MST 2026
+========================================
+Model: gemini-3-flash-preview
+Images to process: 161
+Concurrency: 8 workers
+Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt.txt (1504 chars)
+================================================================================
+Pre-encoding images ... 161 images in 1.7s
+Sending API requests ...
+
+161/161 API calls completed (253.2s total)
+================================================================================
+
+[1/161] 001 -brown_white or dark brown.jpg
+ GT: [brown, dark brown]
+ VLM: [brown] (1 jersey(s), 9.0s)
+ PASS exact:1, similar:1
+
+[2/161] 002 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (2 jersey(s), 6.6s)
+ PASS exact:1
+
+[3/161] 003 - dark blue.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (3 jersey(s), 9.4s)
+ PASS similar:1
+
+[4/161] 004 - purple_light blue.jpg
+ GT: [purple, light blue]
+ VLM: [light blue, purple] (2 jersey(s), 10.5s)
+ PASS exact:2
+
+[5/161] 005 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (1 jersey(s), 3.0s)
+ PARTIAL exact:1, MISS:gray
+
+[6/161] 006 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [dark blue] (1 jersey(s), 3.1s)
+ PASS similar:1
+
+[7/161] 007 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[8/161] 008 -red or orange.jpg
+ GT: [red|orange]
+ VLM: [red] (1 jersey(s), 5.1s)
+ PASS exact:1
+
+[9/161] 009 - white_red.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 17.9s)
+ PASS exact:1
+
+[10/161] 010 - white_black.jpg
+ GT: [black]
+ VLM: [black] (3 jersey(s), 11.3s)
+ PASS exact:1
+
+[11/161] 011 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (4 jersey(s), 8.5s)
+ PARTIAL exact:1, MISS:gray
+
+[12/161] 012 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 3.8s)
+ PASS exact:1
+
+[13/161] 013 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 10.2s)
+ FAIL MISS:light blue, extra:blue
+
+[14/161] 014 - orange_dark blue or purple.jpg
+ GT: [orange, dark blue|purple]
+ VLM: [orange, purple] (3 jersey(s), 6.1s)
+ PASS exact:2
+
+[15/161] 015 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 3.4s)
+ PASS exact:1
+
+[16/161] 016 - maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (0 jersey(s), 3.2s)
+ FAIL MISS:maroon
+
+[17/161] 017 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 4.8s)
+ PASS exact:1
+
+[18/161] 018 - gray_red.jpg
+ GT: [gray, red]
+ VLM: [grey] (1 jersey(s), 6.5s)
+ PARTIAL similar:1, MISS:red
+
+[19/161] 019 - maroon_gold.jpg
+ GT: [maroon, gold]
+ VLM: [maroon] (1 jersey(s), 4.4s)
+ PARTIAL exact:1, MISS:gold
+
+[20/161] 020 - white_brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 5.6s)
+ PASS exact:1
+
+[21/161] 021 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[22/161] 022 - black_light blue.jpg
+ GT: [black, light blue]
+ VLM: [light blue] (1 jersey(s), 3.3s)
+ PARTIAL exact:1, MISS:black
+
+[23/161] 023 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.7s)
+ PASS exact:1
+
+[24/161] 024 - white_pink.jpg
+ GT: [pink]
+ VLM: [pink] (2 jersey(s), 5.1s)
+ PASS exact:1
+
+[25/161] 025 - blue_green.jpg
+ GT: [blue, green]
+ VLM: [green] (1 jersey(s), 3.7s)
+ PARTIAL exact:1, MISS:blue
+
+[26/161] 026 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 6.8s)
+ PASS exact:1
+
+[27/161] 027 - red_white.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 35.2s)
+ PASS exact:1
+
+[28/161] 028 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 37.9s)
+ PASS exact:1
+
+[29/161] 029 -maroon_white.jpg
+ GT: [maroon]
+ VLM: [red] (2 jersey(s), 4.8s)
+ FAIL MISS:maroon, extra:red
+
+[30/161] 030 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 38.6s)
+ PASS similar:1
+
+[31/161] 031 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 4.9s)
+ PASS exact:1
+
+[32/161] 032 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 5.3s)
+ PASS exact:1
+
+[33/161] 033 - navy blue_white or gray.jpg
+ GT: [navy blue, gray]
+ VLM: [blue] (5 jersey(s), 37.0s)
+ PARTIAL similar:1, MISS:gray
+
+[34/161] 034 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (6 jersey(s), 15.9s)
+ FAIL MISS:light blue, extra:blue
+
+[35/161] 035 -green_gold or yellow.jpg
+ GT: [green, gold|yellow]
+ VLM: [green, yellow] (2 jersey(s), 14.4s)
+ PASS exact:2
+
+[36/161] 036 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 6.4s)
+ PASS exact:1
+
+[37/161] 037 -navy_white.jpg
+ GT: [navy]
+ VLM: [navy blue] (4 jersey(s), 8.3s)
+ PASS similar:1
+
+[38/161] 038 - red_white.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 8.2s)
+ PASS exact:1
+
+[39/161] 039 - gray_white.jpg
+ GT: [gray]
+ VLM: [grey] (2 jersey(s), 4.6s)
+ PASS similar:1
+
+[40/161] 040 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [grey, maroon] (2 jersey(s), 7.3s)
+ PASS exact:1, similar:1
+
+[41/161] 041 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [navy blue] (8 jersey(s), 42.8s)
+ PASS exact:1
+
+[42/161] 042 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.0s)
+ PASS exact:1
+
+[43/161] 043 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [black, grey] (5 jersey(s), 39.4s)
+ PASS exact:1, similar:1
+
+[44/161] 044 - purple_black.jpg
+ GT: [purple, black]
+ VLM: [purple] (8 jersey(s), 35.7s)
+ PARTIAL exact:1, MISS:black
+
+[45/161] 045 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 34.7s)
+ PASS exact:1
+
+[46/161] 046 - green.jpg
+ GT: [green]
+ VLM: [black] (8 jersey(s), 39.6s)
+ FAIL MISS:green, extra:black
+
+[47/161] 047 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 6.5s)
+ PASS exact:1
+
+[48/161] 048 - red.jpg
+ GT: [red]
+ VLM: [(none)] (0 jersey(s), 7.4s)
+ FAIL MISS:red
+
+[49/161] 049 - white_gold.jpg
+ GT: [gold]
+ VLM: [yellow] (2 jersey(s), 3.3s)
+ PASS similar:1
+
+[50/161] 050 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (6 jersey(s), 39.2s)
+ PASS exact:1
+
+[51/161] 051 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.1s)
+ PASS exact:1
+
+[52/161] 052 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [black] (1 jersey(s), 3.2s)
+ PARTIAL exact:1, MISS:gold
+
+[53/161] 053 - black_white.jpg
+ GT: [black]
+ VLM: [(none)] (1 jersey(s), 3.2s)
+ FAIL MISS:black
+
+[54/161] 054 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (2 jersey(s), 3.5s)
+ PASS exact:1
+
+[55/161] 055 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (2 jersey(s), 5.8s)
+ PASS exact:1, similar:1
+
+[56/161] 056 - white_red.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 12.5s)
+ PASS exact:1
+
+[57/161] 057 - white_gold or yellow.jpg
+ GT: [gold|yellow]
+ VLM: [(none)] (1 jersey(s), 4.1s)
+ FAIL MISS:gold|yellow
+
+[58/161] 058 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (4 jersey(s), 5.3s)
+ PASS exact:1
+
+[59/161] 059 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [gold] (1 jersey(s), 3.4s)
+ PARTIAL exact:1, MISS:black
+
+[60/161] 060 - gray_navy blue.jpg
+ GT: [gray, navy blue]
+ VLM: [blue] (2 jersey(s), 5.7s)
+ PARTIAL similar:1, MISS:gray
+
+[61/161] 061 - brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (1 jersey(s), 3.0s)
+ PASS exact:1
+
+[62/161] 062 - orange_blue.jpg
+ GT: [orange, blue]
+ VLM: [blue, orange] (2 jersey(s), 5.7s)
+ PASS exact:2
+
+[63/161] 063 - dark brown.jpg
+ GT: [dark brown]
+ VLM: [brown] (1 jersey(s), 5.1s)
+ PASS similar:1
+
+[64/161] 064 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 4.0s)
+ PASS exact:1
+
+[65/161] 065 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (4 jersey(s), 38.3s)
+ PASS exact:1, similar:1
+
+[66/161] 066 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (1 jersey(s), 3.3s)
+ PASS exact:1
+
+[67/161] 067 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 10.7s)
+ PASS exact:1
+
+[68/161] 068 - gold.jpg
+ GT: [gold]
+ VLM: [gold] (1 jersey(s), 6.1s)
+ PASS exact:1
+
+[69/161] 069 - red_white.jpg
+ GT: [red]
+ VLM: [(none)] (5 jersey(s), 39.4s)
+ FAIL MISS:red
+
+[70/161] 070 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 6.2s)
+ PASS exact:1
+
+[71/161] 071 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 11.6s)
+ PASS exact:1
+
+[72/161] 072 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 4.7s)
+ PASS exact:1
+
+[73/161] 073 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[74/161] 074 - white_orange.jpg
+ GT: [orange]
+ VLM: [(none)] (1 jersey(s), 7.0s)
+ FAIL MISS:orange
+
+[75/161] 075 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 3.4s)
+ PASS exact:1
+
+[76/161] 076 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 8.5s)
+ PASS exact:1
+
+[77/161] 077 - teal_white.jpg
+ GT: [teal]
+ VLM: [green] (5 jersey(s), 37.9s)
+ FAIL MISS:teal, extra:green
+
+[78/161] 078 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 6.8s)
+ PASS exact:1
+
+[79/161] 079 - blue_maroon.jpg
+ GT: [blue, maroon]
+ VLM: [blue, maroon] (6 jersey(s), 7.7s)
+ PASS exact:2
+
+[80/161] 080 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (1 jersey(s), 4.4s)
+ PASS similar:1
+
+[81/161] 081 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 4.4s)
+ PASS similar:1
+
+[82/161] 082 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (3 jersey(s), 6.8s)
+ PASS similar:1
+
+[83/161] 083 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [black] (2 jersey(s), 10.1s)
+ FAIL MISS:dark brown, extra:black
+
+[84/161] 084 - dark brown_yellow.jpg
+ GT: [dark brown, yellow]
+ VLM: [brown, yellow] (2 jersey(s), 3.4s)
+ PASS exact:1, similar:1
+
+[85/161] 085 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 9.2s)
+ PASS exact:1
+
+[86/161] 086 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [brown] (1 jersey(s), 5.7s)
+ PASS similar:1
+
+[87/161] 087 - white_light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 8.4s)
+ PASS exact:1
+
+[88/161] 088 - white_maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (2 jersey(s), 5.3s)
+ FAIL MISS:maroon
+
+[89/161] 089 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 7.4s)
+ PASS exact:1
+
+[90/161] 090 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (5 jersey(s), 36.7s)
+ PASS exact:1
+
+[91/161] 091 - teal.jpg
+ GT: [teal]
+ VLM: [teal] (3 jersey(s), 6.0s)
+ PASS exact:1
+
+[92/161] 092 - green_white.jpg
+ GT: [green]
+ VLM: [green] (6 jersey(s), 10.9s)
+ PASS exact:1
+
+[93/161] 093 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 4.5s)
+ PASS similar:1
+
+[94/161] 094 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 6.6s)
+ PASS exact:1
+
+[95/161] 095 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 36.8s)
+ PASS exact:1
+
+[96/161] 096 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 2.7s)
+ PASS exact:1
+
+[97/161] 097 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [grey] (3 jersey(s), 36.8s)
+ PARTIAL similar:1, MISS:black
+
+[98/161] 098 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 6.7s)
+ PASS exact:1
+
+[99/161] 099 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 8.1s)
+ PASS exact:1
+
+[100/161] 100 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 5.7s)
+ PASS exact:1
+
+[101/161] 101 - green_white.jpg
+ GT: [green]
+ VLM: [green] (7 jersey(s), 12.1s)
+ PASS exact:1
+
+[102/161] 102 - yellow-black.jpg
+ GT: [yellow, black]
+ VLM: [black] (1 jersey(s), 3.4s)
+ PARTIAL exact:1, MISS:yellow
+
+[103/161] 103 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 18.0s)
+ PASS exact:1
+
+[104/161] 104 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 35.2s)
+ PASS exact:1
+
+[105/161] 105 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 5.3s)
+ PASS exact:1
+
+[106/161] 106 - black_gray.jpg
+ GT: [black, gray]
+ VLM: [black, grey] (2 jersey(s), 34.5s)
+ PASS exact:1, similar:1
+
+[107/161] 107 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 4.7s)
+ PASS exact:1
+
+[108/161] 108 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 4.5s)
+ PASS exact:1
+
+[109/161] 109 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 4.7s)
+ PASS exact:1
+
+[110/161] 110 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 9.0s)
+ PASS exact:1
+
+[111/161] 111 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 37.6s)
+ PASS exact:1
+
+[112/161] 112 - orange_white.jpg
+ GT: [orange]
+ VLM: [(none)] (0 jersey(s), 6.8s)
+ FAIL MISS:orange
+
+[113/161] 113 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.4s)
+ PASS exact:1
+
+[114/161] 114 - black_white.jpg
+ GT: [black]
+ VLM: [black] (2 jersey(s), 5.1s)
+ PASS exact:1
+
+[115/161] 115 - navy blue_maroon.jpg
+ GT: [navy blue, maroon]
+ VLM: [blue, red] (4 jersey(s), 7.7s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[116/161] 116 - gray_white.jpg
+ GT: [gray]
+ VLM: [grey] (2 jersey(s), 7.1s)
+ PASS similar:1
+
+[117/161] 117 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 3.8s)
+ PASS exact:1
+
+[118/161] 118 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 7.9s)
+ PASS similar:1
+
+[119/161] 119 - black_yellow.jpg
+ GT: [black, yellow]
+ VLM: [black, yellow] (4 jersey(s), 8.5s)
+ PASS exact:2
+
+[120/161] 120 - red_dark blue.jpg
+ GT: [red, dark blue]
+ VLM: [blue, red] (3 jersey(s), 20.5s)
+ PASS exact:1, similar:1
+
+[121/161] 121 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 6.5s)
+ PASS exact:1
+
+[122/161] 122 - gray.jpg
+ GT: [gray]
+ VLM: [grey] (1 jersey(s), 3.4s)
+ PASS similar:1
+
+[123/161] 123 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (4 jersey(s), 20.7s)
+ PASS exact:1
+
+[124/161] 124 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (4 jersey(s), 7.8s)
+ PASS similar:1
+
+[125/161] 125 - dark blue_maroon.jpg
+ GT: [dark blue, maroon]
+ VLM: [navy, red] (3 jersey(s), 7.7s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[126/161] 126 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 7.5s)
+ PASS exact:1
+
+[127/161] 127 - yellow.jpg
+ GT: [yellow]
+ VLM: [black, yellow] (5 jersey(s), 22.9s)
+ PARTIAL exact:1, extra:black
+
+[128/161] 128 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 36.1s)
+ PASS exact:1
+
+[129/161] 129 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (3 jersey(s), 6.0s)
+ FAIL MISS:blue
+
+[130/161] 130 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [yellow] (1 jersey(s), 3.3s)
+ PARTIAL exact:1, MISS:black
+
+[131/161] 131 - purple_orange.jpg
+ GT: [purple, orange]
+ VLM: [orange, purple] (3 jersey(s), 5.4s)
+ PASS exact:2
+
+[132/161] 132 - brown_white.jpg
+ GT: [brown]
+ VLM: [orange] (3 jersey(s), 30.8s)
+ FAIL MISS:brown, extra:orange
+
+[133/161] 133 - light blue.png
+ GT: [light blue]
+ VLM: [light blue] (7 jersey(s), 42.4s)
+ PASS exact:1
+
+[134/161] 134 - teal_white.jpg
+ GT: [teal]
+ VLM: [blue] (1 jersey(s), 7.1s)
+ FAIL MISS:teal, extra:blue
+
+[135/161] 135 - green.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 3.6s)
+ PASS exact:1
+
+[136/161] 136 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 3.5s)
+ PASS exact:1
+
+[137/161] 137 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 7.3s)
+ PASS exact:1
+
+[138/161] 138 - maroon.jpg
+ GT: [maroon]
+ VLM: [red] (1 jersey(s), 3.5s)
+ FAIL MISS:maroon, extra:red
+
+[139/161] 139 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (1 jersey(s), 12.2s)
+ PASS similar:1
+
+[140/161] 140 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 4.0s)
+ PASS exact:1
+
+[141/161] 141 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (3 jersey(s), 4.7s)
+ PASS exact:1
+
+[142/161] 142 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.0s)
+ PASS exact:1
+
+[143/161] 143 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 5.9s)
+ PASS exact:1
+
+[144/161] 144 - green.jpg
+ GT: [green]
+ VLM: [green] (13 jersey(s), 8.2s)
+ PASS exact:1
+
+[145/161] 145 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[146/161] 146 - red_gray.jpg
+ GT: [red, gray]
+ VLM: [grey, red] (2 jersey(s), 4.2s)
+ PASS exact:1, similar:1
+
+[147/161] 147 - green.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 4.8s)
+ PASS exact:1
+
+[148/161] 148 - yellow_purple.jpg
+ GT: [yellow, purple]
+ VLM: [purple, yellow] (2 jersey(s), 6.0s)
+ PASS exact:2
+
+[149/161] 149 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 37.0s)
+ PASS exact:1
+
+[150/161] 150 - green_gray.jpg
+ GT: [green, gray]
+ VLM: [black] (2 jersey(s), 12.3s)
+ FAIL MISS:green,gray, extra:black
+
+[151/161] 151 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [navy, yellow] (6 jersey(s), 39.2s)
+ PARTIAL exact:1, MISS:black, extra:navy
+
+[152/161] 152 - pink_dark blue.jpg
+ GT: [pink, dark blue]
+ VLM: [navy blue, pink] (3 jersey(s), 5.9s)
+ PASS exact:1, similar:1
+
+[153/161] 153 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 5.2s)
+ PASS exact:1
+
+[154/161] 154 - dark brown.jpeg
+ GT: [dark brown]
+ VLM: [brown] (5 jersey(s), 7.0s)
+ PASS similar:1
+
+[155/161] 155 - white_green_gray_purple_yellow.jpg
+ GT: [green, gray, purple, yellow]
+ VLM: [grey, purple, yellow] (5 jersey(s), 7.7s)
+ PARTIAL exact:2, similar:1, MISS:green
+
+[156/161] 156 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (3 jersey(s), 35.1s)
+ PARTIAL exact:1, MISS:gray
+
+[157/161] 157 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 40.2s)
+ PASS exact:1
+
+[158/161] 158 - dark blue_yellow.jpg
+ GT: [dark blue, yellow]
+ VLM: [dark blue, yellow] (6 jersey(s), 33.8s)
+ PASS exact:2
+
+[159/161] 159 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 36.7s)
+ PASS exact:1
+
+[160/161] 160 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (1 jersey(s), 4.1s)
+ FAIL MISS:blue
+
+[161/161] 161 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 4.8s)
+ FAIL MISS:light blue, extra:blue
+
+================================================================================
+ACCURACY SUMMARY (gemini-3-flash-preview)
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 253.2s (1.6s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 175 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 126 / 202 (62.4%)
+ Similar match: 35 / 202 (17.3%)
+ Total found: 161 / 202 (79.7%)
+ Missed: 41 / 202 (20.3%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 126 / 175 (72.0%)
+ Similar match: 34 / 175 (19.4%)
+ Total correct: 160 / 175 (91.4%)
+ Extra/wrong: 15 / 175 (8.6%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ gray -> grey x10
+ navy blue -> blue x6
+ dark brown -> brown x5
+ dark blue -> navy blue x5
+ gold -> yellow x3
+ dark blue -> blue x3
+ navy blue -> dark blue x1
+ navy -> navy blue x1
+ dark blue -> navy x1
+
+--- Most Missed Ground Truth Colors ---
+ black 7 #######
+ gray 6 ######
+ maroon 6 ######
+ light blue 3 ###
+ red 3 ###
+ blue 3 ###
+ green 3 ###
+ gold 2 ##
+ orange 2 ##
+ teal 2 ##
+ gold|yellow 1 #
+ dark brown 1 #
+ yellow 1 #
+ brown 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ blue 4 ####
+ red 4 ####
+ black 4 ####
+ green 1 #
+ orange 1 #
+ navy 1 #
+
+--- Per-Image Verdict ---
+ PASS 120
+ PARTIAL 20
+ FAIL 21
+
+--- Failed Images (21) ---
+ 013 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 016 - maroon.jpg
+ missed: maroon
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 048 - red.jpg
+ missed: red
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 069 - red_white.jpg
+ missed: red
+ 074 - white_orange.jpg
+ missed: orange
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 088 - white_maroon.jpg
+ missed: maroon
+ 112 - orange_white.jpg
+ missed: orange
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+
+========================================
+Qwen3-VL-8B + jersey_prompt_capstone.txt
+Started: Tue Mar 3 05:10:58 PM MST 2026
+========================================
+Images to process: 161
+Server: http://agx:8080
+Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_capstone.txt (1511 chars)
+================================================================================
+
+[1/161] 001 -brown_white or dark brown.jpg
+ GT: [brown, dark brown]
+ VLM: [black] (2 jersey(s), 8.2s)
+ FAIL MISS:brown,dark brown, extra:black
+
+[2/161] 002 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[3/161] 003 - dark blue.jpg
+ GT: [dark blue]
+ VLM: [blue] (3 jersey(s), 8.3s)
+ PASS similar:1
+
+[4/161] 004 - purple_light blue.jpg
+ GT: [purple, light blue]
+ VLM: [light blue, purple] (3 jersey(s), 11.9s)
+ PASS exact:2
+
+[5/161] 005 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (1 jersey(s), 3.8s)
+ PARTIAL exact:1, MISS:gray
+
+[6/161] 006 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (1 jersey(s), 4.2s)
+ PASS similar:1
+
+[7/161] 007 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[8/161] 008 -red or orange.jpg
+ GT: [red|orange]
+ VLM: [red] (1 jersey(s), 3.2s)
+ PASS exact:1
+
+[9/161] 009 - white_red.jpg
+ GT: [red]
+ VLM: [gold, red] (3 jersey(s), 10.8s)
+ PARTIAL exact:1, extra:gold
+
+[10/161] 010 - white_black.jpg
+ GT: [black]
+ VLM: [black] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[11/161] 011 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (4 jersey(s), 13.8s)
+ PARTIAL exact:1, MISS:gray
+
+[12/161] 012 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[13/161] 013 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 7.5s)
+ FAIL MISS:light blue, extra:blue
+
+[14/161] 014 - orange_dark blue or purple.jpg
+ GT: [orange, dark blue|purple]
+ VLM: [orange, purple] (3 jersey(s), 11.0s)
+ PASS exact:2
+
+[15/161] 015 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 5.4s)
+ PASS exact:1
+
+[16/161] 016 - maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (0 jersey(s), 1.7s)
+ FAIL MISS:maroon
+
+[17/161] 017 - brown_white.jpg
+ GT: [brown]
+ VLM: [black] (2 jersey(s), 6.9s)
+ FAIL MISS:brown, extra:black
+
+[18/161] 018 - gray_red.jpg
+ GT: [gray, red]
+ VLM: [gray, red] (2 jersey(s), 7.3s)
+ PASS exact:2
+
+[19/161] 019 - maroon_gold.jpg
+ GT: [maroon, gold]
+ VLM: [red] (1 jersey(s), 3.7s)
+ FAIL MISS:maroon,gold, extra:red
+
+[20/161] 020 - white_brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 6.2s)
+ PASS exact:1
+
+[21/161] 021 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 6.1s)
+ PASS exact:1
+
+[22/161] 022 - black_light blue.jpg
+ GT: [black, light blue]
+ VLM: [light blue] (1 jersey(s), 3.8s)
+ PARTIAL exact:1, MISS:black
+
+[23/161] 023 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.9s)
+ PASS exact:1
+
+[24/161] 024 - white_pink.jpg
+ GT: [pink]
+ VLM: [pink] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[25/161] 025 - blue_green.jpg
+ GT: [blue, green]
+ VLM: [green] (1 jersey(s), 3.2s)
+ PARTIAL exact:1, MISS:blue
+
+[26/161] 026 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[27/161] 027 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 16.1s)
+ PASS exact:1
+
+[28/161] 028 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[29/161] 029 -maroon_white.jpg
+ GT: [maroon]
+ VLM: [red] (2 jersey(s), 7.8s)
+ FAIL MISS:maroon, extra:red
+
+[30/161] 030 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 5.9s)
+ PASS similar:1
+
+[31/161] 031 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[32/161] 032 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 6.1s)
+ PASS exact:1
+
+[33/161] 033 - navy blue_white or gray.jpg
+ GT: [navy blue, gray]
+ VLM: [blue] (3 jersey(s), 10.9s)
+ PARTIAL similar:1, MISS:gray
+
+[34/161] 034 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (1 jersey(s), 3.7s)
+ FAIL MISS:light blue, extra:blue
+
+[35/161] 035 -green_gold or yellow.jpg
+ GT: [green, gold|yellow]
+ VLM: [green, yellow] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[36/161] 036 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (4 jersey(s), 13.8s)
+ FAIL MISS:light blue, extra:blue
+
+[37/161] 037 -navy_white.jpg
+ GT: [navy]
+ VLM: [blue] (3 jersey(s), 10.1s)
+ PASS similar:1
+
+[38/161] 038 - red_white.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[39/161] 039 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[40/161] 040 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (1 jersey(s), 5.1s)
+ PARTIAL exact:1, MISS:gray
+
+[41/161] 041 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (9 jersey(s), 28.9s)
+ PASS similar:1
+
+[42/161] 042 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.8s)
+ PASS exact:1
+
+[43/161] 043 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [black, gray] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[44/161] 044 - purple_black.jpg
+ GT: [purple, black]
+ VLM: [purple] (7 jersey(s), 22.5s)
+ PARTIAL exact:1, MISS:black
+
+[45/161] 045 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[46/161] 046 - green.jpg
+ GT: [green]
+ VLM: [black] (15 jersey(s), 46.5s)
+ FAIL MISS:green, extra:black
+
+[47/161] 047 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[48/161] 048 - red.jpg
+ GT: [red]
+ VLM: [red] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[49/161] 049 - white_gold.jpg
+ GT: [gold]
+ VLM: [yellow] (2 jersey(s), 6.1s)
+ PASS similar:1
+
+[50/161] 050 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[51/161] 051 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.8s)
+ PASS exact:1
+
+[52/161] 052 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [black] (1 jersey(s), 4.8s)
+ PARTIAL exact:1, MISS:gold
+
+[53/161] 053 - black_white.jpg
+ GT: [black]
+ VLM: [(none)] (1 jersey(s), 3.7s)
+ FAIL MISS:black
+
+[54/161] 054 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (2 jersey(s), 5.9s)
+ PASS exact:1
+
+[55/161] 055 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (2 jersey(s), 7.7s)
+ PASS exact:1, similar:1
+
+[56/161] 056 - white_red.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.9s)
+ PASS exact:1
+
+[57/161] 057 - white_gold or yellow.jpg
+ GT: [gold|yellow]
+ VLM: [(none)] (1 jersey(s), 3.7s)
+ FAIL MISS:gold|yellow
+
+[58/161] 058 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[59/161] 059 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [gold] (1 jersey(s), 3.8s)
+ PARTIAL exact:1, MISS:black
+
+[60/161] 060 - gray_navy blue.jpg
+ GT: [gray, navy blue]
+ VLM: [blue] (2 jersey(s), 6.0s)
+ PARTIAL similar:1, MISS:gray
+
+[61/161] 061 - brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (1 jersey(s), 3.7s)
+ PASS exact:1
+
+[62/161] 062 - orange_blue.jpg
+ GT: [orange, blue]
+ VLM: [blue, orange] (2 jersey(s), 5.6s)
+ PASS exact:2
+
+[63/161] 063 - dark brown.jpg
+ GT: [dark brown]
+ VLM: [black] (1 jersey(s), 3.7s)
+ FAIL MISS:dark brown, extra:black
+
+[64/161] 064 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[65/161] 065 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (3 jersey(s), 10.4s)
+ PASS exact:1, similar:1
+
+[66/161] 066 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (1 jersey(s), 3.5s)
+ PASS exact:1
+
+[67/161] 067 - red_white.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[68/161] 068 - gold.jpg
+ GT: [gold]
+ VLM: [gold] (1 jersey(s), 3.7s)
+ PASS exact:1
+
+[69/161] 069 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 16.6s)
+ PASS exact:1
+
+[70/161] 070 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 8.3s)
+ PASS exact:1
+
+[71/161] 071 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[72/161] 072 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.5s)
+ PASS exact:1
+
+[73/161] 073 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (1 jersey(s), 3.4s)
+ PASS exact:1
+
+[74/161] 074 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.4s)
+ PASS exact:1
+
+[75/161] 075 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[76/161] 076 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 14.2s)
+ PASS exact:1
+
+[77/161] 077 - teal_white.jpg
+ GT: [teal]
+ VLM: [green] (3 jersey(s), 10.4s)
+ FAIL MISS:teal, extra:green
+
+[78/161] 078 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 5.9s)
+ PASS exact:1
+
+[79/161] 079 - blue_maroon.jpg
+ GT: [blue, maroon]
+ VLM: [blue, red] (4 jersey(s), 13.8s)
+ PARTIAL exact:1, MISS:maroon, extra:red
+
+[80/161] 080 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (1 jersey(s), 3.5s)
+ PASS similar:1
+
+[81/161] 081 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 5.8s)
+ PASS similar:1
+
+[82/161] 082 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (3 jersey(s), 10.6s)
+ PASS similar:1
+
+[83/161] 083 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [black] (1 jersey(s), 3.7s)
+ FAIL MISS:dark brown, extra:black
+
+[84/161] 084 - dark brown_yellow.jpg
+ GT: [dark brown, yellow]
+ VLM: [black, yellow] (2 jersey(s), 6.0s)
+ PARTIAL exact:1, MISS:dark brown, extra:black
+
+[85/161] 085 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 3.6s)
+ PASS exact:1
+
+[86/161] 086 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [brown] (1 jersey(s), 5.0s)
+ PASS similar:1
+
+[87/161] 087 - white_light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[88/161] 088 - white_maroon.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[89/161] 089 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[90/161] 090 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (4 jersey(s), 14.2s)
+ PASS exact:1
+
+[91/161] 091 - teal.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[92/161] 092 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[93/161] 093 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 5.9s)
+ PASS similar:1
+
+[94/161] 094 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 12.5s)
+ PASS exact:1
+
+[95/161] 095 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[96/161] 096 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 6.7s)
+ PASS exact:1
+
+[97/161] 097 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [gray] (2 jersey(s), 8.0s)
+ PARTIAL exact:1, MISS:black
+
+[98/161] 098 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 6.9s)
+ PASS exact:1
+
+[99/161] 099 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 12.2s)
+ PASS exact:1
+
+[100/161] 100 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[101/161] 101 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 17.0s)
+ PASS exact:1
+
+[102/161] 102 - yellow-black.jpg
+ GT: [yellow, black]
+ VLM: [black, yellow] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[103/161] 103 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 17.3s)
+ PASS exact:1
+
+[104/161] 104 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[105/161] 105 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.4s)
+ PASS exact:1
+
+[106/161] 106 - black_gray.jpg
+ GT: [black, gray]
+ VLM: [black, gray] (2 jersey(s), 7.3s)
+ PASS exact:2
+
+[107/161] 107 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[108/161] 108 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[109/161] 109 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[110/161] 110 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[111/161] 111 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 6.1s)
+ PASS exact:1
+
+[112/161] 112 - orange_white.jpg
+ GT: [orange]
+ VLM: [(none)] (1 jersey(s), 3.6s)
+ FAIL MISS:orange
+
+[113/161] 113 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.8s)
+ PASS exact:1
+
+[114/161] 114 - black_white.jpg
+ GT: [black]
+ VLM: [black] (2 jersey(s), 6.3s)
+ PASS exact:1
+
+[115/161] 115 - navy blue_maroon.jpg
+ GT: [navy blue, maroon]
+ VLM: [blue, red] (4 jersey(s), 13.8s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[116/161] 116 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[117/161] 117 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 6.2s)
+ PASS exact:1
+
+[118/161] 118 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 7.4s)
+ PASS similar:1
+
+[119/161] 119 - black_yellow.jpg
+ GT: [black, yellow]
+ VLM: [black, yellow] (3 jersey(s), 10.9s)
+ PASS exact:2
+
+[120/161] 120 - red_dark blue.jpg
+ GT: [red, dark blue]
+ VLM: [blue, red] (3 jersey(s), 10.6s)
+ PASS exact:1, similar:1
+
+[121/161] 121 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[122/161] 122 - gray.jpg
+ GT: [gray]
+ VLM: [gray] (1 jersey(s), 5.1s)
+ PASS exact:1
+
+[123/161] 123 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[124/161] 124 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (4 jersey(s), 13.7s)
+ PASS similar:1
+
+[125/161] 125 - dark blue_maroon.jpg
+ GT: [dark blue, maroon]
+ VLM: [blue, red] (2 jersey(s), 8.2s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[126/161] 126 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[127/161] 127 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[128/161] 128 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[129/161] 129 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[130/161] 130 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [black, yellow] (2 jersey(s), 8.4s)
+ PASS exact:2
+
+[131/161] 131 - purple_orange.jpg
+ GT: [purple, orange]
+ VLM: [orange, purple] (3 jersey(s), 8.3s)
+ PASS exact:2
+
+[132/161] 132 - brown_white.jpg
+ GT: [brown]
+ VLM: [orange] (3 jersey(s), 10.8s)
+ FAIL MISS:brown, extra:orange
+
+[133/161] 133 - light blue.png
+ GT: [light blue]
+ VLM: [light blue] (7 jersey(s), 23.5s)
+ PASS exact:1
+
+[134/161] 134 - teal_white.jpg
+ GT: [teal]
+ VLM: [blue] (1 jersey(s), 5.0s)
+ FAIL MISS:teal, extra:blue
+
+[135/161] 135 - green.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 3.9s)
+ PASS exact:1
+
+[136/161] 136 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 6.2s)
+ PASS exact:1
+
+[137/161] 137 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[138/161] 138 - maroon.jpg
+ GT: [maroon]
+ VLM: [red] (1 jersey(s), 3.8s)
+ FAIL MISS:maroon, extra:red
+
+[139/161] 139 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 6.0s)
+ PASS similar:1
+
+[140/161] 140 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.7s)
+ PASS exact:1
+
+[141/161] 141 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (3 jersey(s), 8.6s)
+ FAIL MISS:light blue, extra:blue
+
+[142/161] 142 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 6.1s)
+ PASS exact:1
+
+[143/161] 143 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[144/161] 144 - green.jpg
+ GT: [green]
+ VLM: [green] (12 jersey(s), 37.7s)
+ PASS exact:1
+
+[145/161] 145 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[146/161] 146 - red_gray.jpg
+ GT: [red, gray]
+ VLM: [gray, red] (2 jersey(s), 6.7s)
+ PASS exact:2
+
+[147/161] 147 - green.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 8.3s)
+ PASS exact:1
+
+[148/161] 148 - yellow_purple.jpg
+ GT: [yellow, purple]
+ VLM: [purple, yellow] (2 jersey(s), 7.9s)
+ PASS exact:2
+
+[149/161] 149 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 13.7s)
+ PASS exact:1
+
+[150/161] 150 - green_gray.jpg
+ GT: [green, gray]
+ VLM: [black] (2 jersey(s), 7.8s)
+ FAIL MISS:green,gray, extra:black
+
+[151/161] 151 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [navy, yellow] (5 jersey(s), 17.1s)
+ PARTIAL exact:1, MISS:black, extra:navy
+
+[152/161] 152 - pink_dark blue.jpg
+ GT: [pink, dark blue]
+ VLM: [blue, pink] (2 jersey(s), 7.9s)
+ PASS exact:1, similar:1
+
+[153/161] 153 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[154/161] 154 - dark brown.jpeg
+ GT: [dark brown]
+ VLM: [brown] (5 jersey(s), 12.9s)
+ PASS similar:1
+
+[155/161] 155 - white_green_gray_purple_yellow.jpg
+ GT: [green, gray, purple, yellow]
+ VLM: [purple, yellow] (4 jersey(s), 14.2s)
+ PARTIAL exact:2, MISS:green,gray
+
+[156/161] 156 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (2 jersey(s), 7.6s)
+ PARTIAL exact:1, MISS:gray
+
+[157/161] 157 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 8.3s)
+ PASS exact:1
+
+[158/161] 158 - dark blue_yellow.jpg
+ GT: [dark blue, yellow]
+ VLM: [navy, yellow] (4 jersey(s), 14.0s)
+ PASS exact:1, similar:1
+
+[159/161] 159 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[160/161] 160 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (1 jersey(s), 3.8s)
+ FAIL MISS:blue
+
+[161/161] 161 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 5.8s)
+ FAIL MISS:light blue, extra:blue
+
+================================================================================
+ACCURACY SUMMARY
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 1437.3s (8.9s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 181 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 134 / 202 (66.3%)
+ Similar match: 24 / 202 (11.9%)
+ Total found: 158 / 202 (78.2%)
+ Missed: 44 / 202 (21.8%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 134 / 181 (74.0%)
+ Similar match: 24 / 181 (13.3%)
+ Total correct: 158 / 181 (87.3%)
+ Extra/wrong: 23 / 181 (12.7%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ dark blue -> blue x9
+ navy blue -> blue x8
+ gold -> yellow x3
+ dark brown -> brown x2
+ navy -> blue x1
+ dark blue -> navy x1
+
+--- Most Missed Ground Truth Colors ---
+ gray 8 ########
+ maroon 7 #######
+ black 6 ######
+ light blue 5 #####
+ dark brown 4 ####
+ brown 3 ###
+ green 3 ###
+ gold 2 ##
+ blue 2 ##
+ teal 2 ##
+ gold|yellow 1 #
+ orange 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ black 7 #######
+ blue 6 ######
+ red 6 ######
+ gold 1 #
+ green 1 #
+ orange 1 #
+ navy 1 #
+
+--- Per-Image Verdict ---
+ PASS 120
+ PARTIAL 19
+ FAIL 22
+
+--- Failed Images (22) ---
+ 001 -brown_white or dark brown.jpg
+ missed: brown, dark brown
+ extra: black
+ 013 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 016 - maroon.jpg
+ missed: maroon
+ 017 - brown_white.jpg
+ missed: brown
+ extra: black
+ 019 - maroon_gold.jpg
+ missed: maroon, gold
+ extra: red
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 036 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 063 - dark brown.jpg
+ missed: dark brown
+ extra: black
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 112 - orange_white.jpg
+ missed: orange
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 141 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+
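The percentages in the summary above follow from the raw counts in the same block. A minimal Python sketch of that arithmetic, assuming each rate is simply count / total rounded to one decimal place; `rates` is a hypothetical helper for illustration, not a function from the test script:

```python
def rates(exact, similar, total):
    """Return exact / similar / combined / remainder percentages for one side
    (recall uses the ground-truth total, precision uses the VLM total)."""
    def pct(n):
        return round(100.0 * n / total, 1)

    found = exact + similar
    return {
        "exact": pct(exact),
        "similar": pct(similar),
        "total": pct(found),
        "rest": pct(total - found),  # "Missed" for recall, "Extra/wrong" for precision
    }


# Counts copied from the summary above.
recall = rates(exact=134, similar=24, total=202)
precision = rates(exact=134, similar=24, total=181)

# Average time per image is total wall time divided by images processed.
avg_seconds = round(1437.3 / 161, 1)

print(recall)
print(precision)
print(avg_seconds)
```

The "rest" field is the complement of the combined rate, which is why Missed plus Total found (and Extra/wrong plus Total correct) sum to 100% up to rounding.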
+========================================
+Gemini 3 Flash + jersey_prompt_capstone.txt
+Started: Tue Mar 3 05:34:55 PM MST 2026
+========================================
+Model: gemini-3-flash-preview
+Images to process: 161
+Concurrency: 8 workers
+Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_capstone.txt (1511 chars)
+================================================================================
+Pre-encoding images ... 161 images in 1.7s
+Sending API requests ...
+161/161 API calls completed (259.8s total)
+================================================================================
+
+[1/161] 001 -brown_white or dark brown.jpg
+ GT: [brown, dark brown]
+ VLM: [brown] (1 jersey(s), 7.0s)
+ PASS exact:1, similar:1
+
+[2/161] 002 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (2 jersey(s), 4.6s)
+ PASS exact:1
+
+[3/161] 003 - dark blue.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 7.5s)
+ PASS similar:1
+
+[4/161] 004 - purple_light blue.jpg
+ GT: [purple, light blue]
+ VLM: [light blue, purple] (3 jersey(s), 18.8s)
+ PASS exact:2
+
+[5/161] 005 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (1 jersey(s), 3.7s)
+ PARTIAL exact:1, MISS:gray
+
+[6/161] 006 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [dark blue] (1 jersey(s), 4.7s)
+ PASS similar:1
+
+[7/161] 007 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 6.3s)
+ PASS exact:1
+
+[8/161] 008 -red or orange.jpg
+ GT: [red|orange]
+ VLM: [red] (1 jersey(s), 7.5s)
+ PASS exact:1
+
+[9/161] 009 - white_red.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 12.1s)
+ PASS exact:1
+
+[10/161] 010 - white_black.jpg
+ GT: [black]
+ VLM: [black] (3 jersey(s), 13.8s)
+ PASS exact:1
+
+[11/161] 011 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (4 jersey(s), 12.5s)
+ PARTIAL exact:1, MISS:gray
+
+[12/161] 012 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 3.5s)
+ PASS exact:1
+
+[13/161] 013 - light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 4.1s)
+ PASS exact:1
+
+[14/161] 014 - orange_dark blue or purple.jpg
+ GT: [orange, dark blue|purple]
+ VLM: [orange, purple] (3 jersey(s), 4.6s)
+ PASS exact:2
+
+[15/161] 015 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 4.0s)
+ PASS exact:1
+
+[16/161] 016 - maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (0 jersey(s), 5.0s)
+ FAIL MISS:maroon
+
+[17/161] 017 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (3 jersey(s), 8.9s)
+ PASS exact:1
+
+[18/161] 018 - gray_red.jpg
+ GT: [gray, red]
+ VLM: [grey] (1 jersey(s), 4.1s)
+ PARTIAL similar:1, MISS:red
+
+[19/161] 019 - maroon_gold.jpg
+ GT: [maroon, gold]
+ VLM: [red] (1 jersey(s), 5.0s)
+ FAIL MISS:maroon,gold, extra:red
+
+[20/161] 020 - white_brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 4.0s)
+ PASS exact:1
+
+[21/161] 021 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 4.3s)
+ PASS exact:1
+
+[22/161] 022 - black_light blue.jpg
+ GT: [black, light blue]
+ VLM: [light blue] (1 jersey(s), 5.3s)
+ PARTIAL exact:1, MISS:black
+
+[23/161] 023 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 3.6s)
+ PASS exact:1
+
+[24/161] 024 - white_pink.jpg
+ GT: [pink]
+ VLM: [pink] (2 jersey(s), 3.6s)
+ PASS exact:1
+
+[25/161] 025 - blue_green.jpg
+ GT: [blue, green]
+ VLM: [green] (1 jersey(s), 3.3s)
+ PARTIAL exact:1, MISS:blue
+
+[26/161] 026 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 5.9s)
+ PASS exact:1
+
+[27/161] 027 - red_white.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 36.1s)
+ PASS exact:1
+
+[28/161] 028 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 38.3s)
+ PASS exact:1
+
+[29/161] 029 -maroon_white.jpg
+ GT: [maroon]
+ VLM: [red] (2 jersey(s), 4.8s)
+ FAIL MISS:maroon, extra:red
+
+[30/161] 030 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 10.8s)
+ PASS similar:1
+
+[31/161] 031 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 4.2s)
+ PASS exact:1
+
+[32/161] 032 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 4.8s)
+ PASS exact:1
+
+[33/161] 033 - navy blue_white or gray.jpg
+ GT: [navy blue, gray]
+ VLM: [blue] (7 jersey(s), 40.2s)
+ PARTIAL similar:1, MISS:gray
+
+[34/161] 034 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (1 jersey(s), 12.7s)
+ FAIL MISS:light blue, extra:blue
+
+[35/161] 035 -green_gold or yellow.jpg
+ GT: [green, gold|yellow]
+ VLM: [green, yellow] (3 jersey(s), 9.2s)
+ PASS exact:2
+
+[36/161] 036 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 5.0s)
+ PASS exact:1
+
+[37/161] 037 -navy_white.jpg
+ GT: [navy]
+ VLM: [blue] (4 jersey(s), 7.5s)
+ PASS similar:1
+
+[38/161] 038 - red_white.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 36.8s)
+ PASS exact:1
+
+[39/161] 039 - gray_white.jpg
+ GT: [gray]
+ VLM: [blue, grey] (4 jersey(s), 38.9s)
+ PARTIAL similar:1, extra:blue
+
+[40/161] 040 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [grey, maroon] (2 jersey(s), 11.3s)
+ PASS exact:1, similar:1
+
+[41/161] 041 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (8 jersey(s), 7.2s)
+ PASS similar:1
+
+[42/161] 042 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.5s)
+ PASS exact:1
+
+[43/161] 043 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [black, grey] (5 jersey(s), 7.6s)
+ PASS exact:1, similar:1
+
+[44/161] 044 - purple_black.jpg
+ GT: [purple, black]
+ VLM: [purple] (8 jersey(s), 36.8s)
+ PARTIAL exact:1, MISS:black
+
+[45/161] 045 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 5.4s)
+ PASS exact:1
+
+[46/161] 046 - green.jpg
+ GT: [green]
+ VLM: [black] (8 jersey(s), 39.3s)
+ FAIL MISS:green, extra:black
+
+[47/161] 047 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 4.7s)
+ PASS exact:1
+
+[48/161] 048 - red.jpg
+ GT: [red]
+ VLM: [(none)] (0 jersey(s), 34.4s)
+ FAIL MISS:red
+
+[49/161] 049 - white_gold.jpg
+ GT: [gold]
+ VLM: [yellow] (2 jersey(s), 4.1s)
+ PASS similar:1
+
+[50/161] 050 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (5 jersey(s), 37.3s)
+ PASS exact:1
+
+[51/161] 051 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.2s)
+ PASS exact:1
+
+[52/161] 052 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [black] (1 jersey(s), 3.7s)
+ PARTIAL exact:1, MISS:gold
+
+[53/161] 053 - black_white.jpg
+ GT: [black]
+ VLM: [(none)] (1 jersey(s), 3.4s)
+ FAIL MISS:black
+
+[54/161] 054 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (2 jersey(s), 3.3s)
+ PASS exact:1
+
+[55/161] 055 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green] (1 jersey(s), 11.1s)
+ PARTIAL exact:1, MISS:gold
+
+[56/161] 056 - white_red.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 6.6s)
+ PASS exact:1
+
+[57/161] 057 - white_gold or yellow.jpg
+ GT: [gold|yellow]
+ VLM: [(none)] (1 jersey(s), 4.0s)
+ FAIL MISS:gold|yellow
+
+[58/161] 058 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (4 jersey(s), 7.7s)
+ PASS exact:1
+
+[59/161] 059 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [gold] (1 jersey(s), 4.3s)
+ PARTIAL exact:1, MISS:black
+
+[60/161] 060 - gray_navy blue.jpg
+ GT: [gray, navy blue]
+ VLM: [blue] (2 jersey(s), 4.8s)
+ PARTIAL similar:1, MISS:gray
+
+[61/161] 061 - brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (1 jersey(s), 4.0s)
+ PASS exact:1
+
+[62/161] 062 - orange_blue.jpg
+ GT: [orange, blue]
+ VLM: [blue, orange] (2 jersey(s), 4.3s)
+ PASS exact:2
+
+[63/161] 063 - dark brown.jpg
+ GT: [dark brown]
+ VLM: [brown] (1 jersey(s), 3.5s)
+ PASS similar:1
+
+[64/161] 064 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 5.7s)
+ PASS exact:1
+
+[65/161] 065 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (5 jersey(s), 40.7s)
+ PASS exact:1, similar:1
+
+[66/161] 066 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (1 jersey(s), 4.7s)
+ PASS exact:1
+
+[67/161] 067 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 5.8s)
+ PASS exact:1
+
+[68/161] 068 - gold.jpg
+ GT: [gold]
+ VLM: [gold] (1 jersey(s), 4.3s)
+ PASS exact:1
+
+[69/161] 069 - red_white.jpg
+ GT: [red]
+ VLM: [(none)] (5 jersey(s), 38.2s)
+ FAIL MISS:red
+
+[70/161] 070 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 6.2s)
+ PASS exact:1
+
+[71/161] 071 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 3.8s)
+ PASS exact:1
+
+[72/161] 072 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 3.5s)
+ PASS exact:1
+
+[73/161] 073 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (1 jersey(s), 9.0s)
+ PASS exact:1
+
+[74/161] 074 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 4.3s)
+ PASS exact:1
+
+[75/161] 075 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 3.2s)
+ PASS exact:1
+
+[76/161] 076 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue, pink] (4 jersey(s), 8.1s)
+ PARTIAL exact:1, extra:pink
+
+[77/161] 077 - teal_white.jpg
+ GT: [teal]
+ VLM: [green] (5 jersey(s), 37.1s)
+ FAIL MISS:teal, extra:green
+
+[78/161] 078 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 10.8s)
+ FAIL MISS:light blue, extra:blue
+
+[79/161] 079 - blue_maroon.jpg
+ GT: [blue, maroon]
+ VLM: [blue, red] (6 jersey(s), 36.8s)
+ PARTIAL exact:1, MISS:maroon, extra:red
+
+[80/161] 080 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (1 jersey(s), 3.4s)
+ PASS similar:1
+
+[81/161] 081 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 3.9s)
+ PASS similar:1
+
+[82/161] 082 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (3 jersey(s), 6.3s)
+ PASS similar:1
+
+[83/161] 083 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [black] (2 jersey(s), 14.2s)
+ FAIL MISS:dark brown, extra:black
+
+[84/161] 084 - dark brown_yellow.jpg
+ GT: [dark brown, yellow]
+ VLM: [brown, yellow] (2 jersey(s), 3.8s)
+ PASS exact:1, similar:1
+
+[85/161] 085 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[86/161] 086 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [brown] (1 jersey(s), 4.5s)
+ PASS similar:1
+
+[87/161] 087 - white_light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.5s)
+ PASS exact:1
+
+[88/161] 088 - white_maroon.jpg
+ GT: [maroon]
+ VLM: [maroon] (4 jersey(s), 41.4s)
+ PASS exact:1
+
+[89/161] 089 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 5.8s)
+ PASS exact:1
+
+[90/161] 090 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (5 jersey(s), 38.4s)
+ PASS exact:1
+
+[91/161] 091 - teal.jpg
+ GT: [teal]
+ VLM: [teal] (3 jersey(s), 10.2s)
+ PASS exact:1
+
+[92/161] 092 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 39.3s)
+ PASS exact:1
+
+[93/161] 093 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (2 jersey(s), 4.6s)
+ PASS similar:1
+
+[94/161] 094 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 7.0s)
+ PASS exact:1
+
+[95/161] 095 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 22.9s)
+ PASS exact:1
+
+[96/161] 096 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 5.2s)
+ PASS exact:1
+
+[97/161] 097 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [grey] (2 jersey(s), 19.4s)
+ PARTIAL similar:1, MISS:black
+
+[98/161] 098 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 4.3s)
+ PASS exact:1
+
+[99/161] 099 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [red] (3 jersey(s), 4.5s)
+ FAIL MISS:maroon, extra:red
+
+[100/161] 100 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 40.0s)
+ PASS exact:1
+
+[101/161] 101 - green_white.jpg
+ GT: [green]
+ VLM: [green] (7 jersey(s), 39.2s)
+ PASS exact:1
+
+[102/161] 102 - yellow-black.jpg
+ GT: [yellow, black]
+ VLM: [black] (1 jersey(s), 4.2s)
+ PARTIAL exact:1, MISS:yellow
+
+[103/161] 103 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 36.3s)
+ PASS exact:1
+
+[104/161] 104 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 4.1s)
+ PASS exact:1
+
+[105/161] 105 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 6.3s)
+ PASS exact:1
+
+[106/161] 106 - black_gray.jpg
+ GT: [black, gray]
+ VLM: [black, grey] (2 jersey(s), 4.0s)
+ PASS exact:1, similar:1
+
+[107/161] 107 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 4.4s)
+ PASS exact:1
+
+[108/161] 108 - red_white.jpg
+ GT: [red]
+ VLM: [red] (1 jersey(s), 47.1s)
+ PASS exact:1
+
+[109/161] 109 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 5.2s)
+ PASS exact:1
+
+[110/161] 110 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 10.7s)
+ PASS exact:1
+
+[111/161] 111 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 34.8s)
+ PASS exact:1
+
+[112/161] 112 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 5.8s)
+ PASS exact:1
+
+[113/161] 113 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 5.1s)
+ PASS exact:1
+
+[114/161] 114 - black_white.jpg
+ GT: [black]
+ VLM: [black] (2 jersey(s), 5.7s)
+ PASS exact:1
+
+[115/161] 115 - navy blue_maroon.jpg
+ GT: [navy blue, maroon]
+ VLM: [blue, red] (4 jersey(s), 7.9s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[116/161] 116 - gray_white.jpg
+ GT: [gray]
+ VLM: [grey] (2 jersey(s), 3.9s)
+ PASS similar:1
+
+[117/161] 117 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 3.8s)
+ PASS exact:1
+
+[118/161] 118 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (1 jersey(s), 8.5s)
+ PASS similar:1
+
+[119/161] 119 - black_yellow.jpg
+ GT: [black, yellow]
+ VLM: [black, yellow] (3 jersey(s), 4.7s)
+ PASS exact:2
+
+[120/161] 120 - red_dark blue.jpg
+ GT: [red, dark blue]
+ VLM: [dark blue, red] (3 jersey(s), 7.0s)
+ PASS exact:2
+
+[121/161] 121 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 5.9s)
+ PASS exact:1
+
+[122/161] 122 - gray.jpg
+ GT: [gray]
+ VLM: [grey] (1 jersey(s), 2.6s)
+ PASS similar:1
+
+[123/161] 123 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (4 jersey(s), 8.8s)
+ PASS exact:1
+
+[124/161] 124 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (4 jersey(s), 4.9s)
+ PASS similar:1
+
+[125/161] 125 - dark blue_maroon.jpg
+ GT: [dark blue, maroon]
+ VLM: [navy, red] (3 jersey(s), 8.1s)
+ PARTIAL similar:1, MISS:maroon, extra:red
+
+[126/161] 126 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 5.8s)
+ PASS exact:1
+
+[127/161] 127 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (4 jersey(s), 4.8s)
+ PASS exact:1
+
+[128/161] 128 - green_white.jpg
+ GT: [green]
+ VLM: [(none)] (0 jersey(s), 42.6s)
+ FAIL MISS:green
+
+[129/161] 129 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (3 jersey(s), 16.8s)
+ FAIL MISS:blue
+
+[130/161] 130 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [yellow] (1 jersey(s), 3.4s)
+ PARTIAL exact:1, MISS:black
+
+[131/161] 131 - purple_orange.jpg
+ GT: [purple, orange]
+ VLM: [orange, purple] (3 jersey(s), 3.8s)
+ PASS exact:2
+
+[132/161] 132 - brown_white.jpg
+ GT: [brown]
+ VLM: [orange] (2 jersey(s), 10.2s)
+ FAIL MISS:brown, extra:orange
+
+[133/161] 133 - light blue.png
+ GT: [light blue]
+ VLM: [light blue] (8 jersey(s), 43.5s)
+ PASS exact:1
+
+[134/161] 134 - teal_white.jpg
+ GT: [teal]
+ VLM: [light blue] (1 jersey(s), 4.3s)
+ FAIL MISS:teal, extra:light blue
+
+[135/161] 135 - green.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 5.2s)
+ PASS exact:1
+
+[136/161] 136 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 3.8s)
+ PASS exact:1
+
+[137/161] 137 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 8.6s)
+ PASS exact:1
+
+[138/161] 138 - maroon.jpg
+ GT: [maroon]
+ VLM: [red] (1 jersey(s), 3.3s)
+ FAIL MISS:maroon, extra:red
+
+[139/161] 139 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [blue] (1 jersey(s), 4.8s)
+ PASS similar:1
+
+[140/161] 140 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 3.5s)
+ PASS exact:1
+
+[141/161] 141 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (3 jersey(s), 5.2s)
+ FAIL MISS:light blue, extra:blue
+
+[142/161] 142 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 5.5s)
+ PASS exact:1
+
+[143/161] 143 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 4.7s)
+ PASS exact:1
+
+[144/161] 144 - green.jpg
+ GT: [green]
+ VLM: [green] (8 jersey(s), 39.7s)
+ PASS exact:1
+
+[145/161] 145 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 3.8s)
+ PASS exact:1
+
+[146/161] 146 - red_gray.jpg
+ GT: [red, gray]
+ VLM: [grey, red] (2 jersey(s), 4.0s)
+ PASS exact:1, similar:1
+
+[147/161] 147 - green.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 4.1s)
+ PASS exact:1
+
+[148/161] 148 - yellow_purple.jpg
+ GT: [yellow, purple]
+ VLM: [purple, yellow] (2 jersey(s), 5.9s)
+ PASS exact:2
+
+[149/161] 149 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 36.9s)
+ PASS exact:1
+
+[150/161] 150 - green_gray.jpg
+ GT: [green, gray]
+ VLM: [black] (1 jersey(s), 12.8s)
+ FAIL MISS:green,gray, extra:black
+
+[151/161] 151 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [dark blue, yellow] (5 jersey(s), 38.6s)
+ PARTIAL exact:1, MISS:black, extra:dark blue
+
+[152/161] 152 - pink_dark blue.jpg
+ GT: [pink, dark blue]
+ VLM: [navy blue, pink] (3 jersey(s), 22.1s)
+ PASS exact:1, similar:1
+
+[153/161] 153 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 3.7s)
+ PASS exact:1
+
+[154/161] 154 - dark brown.jpeg
+ GT: [dark brown]
+ VLM: [brown] (5 jersey(s), 5.1s)
+ PASS similar:1
+
+[155/161] 155 - white_green_gray_purple_yellow.jpg
+ GT: [green, gray, purple, yellow]
+ VLM: [grey, purple, yellow] (5 jersey(s), 7.4s)
+ PARTIAL exact:2, similar:1, MISS:green
+
+[156/161] 156 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (1 jersey(s), 12.0s)
+ PARTIAL exact:1, MISS:gray
+
+[157/161] 157 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 38.1s)
+ PASS exact:1
+
+[158/161] 158 - dark blue_yellow.jpg
+ GT: [dark blue, yellow]
+ VLM: [blue, yellow] (7 jersey(s), 37.4s)
+ PASS exact:1, similar:1
+
+[159/161] 159 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 11.2s)
+ PASS exact:1
+
+[160/161] 160 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (1 jersey(s), 4.2s)
+ FAIL MISS:blue
+
+[161/161] 161 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [blue] (2 jersey(s), 5.6s)
+ FAIL MISS:light blue, extra:blue
+
+================================================================================
+ACCURACY SUMMARY (gemini-3-flash-preview)
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 259.8s (1.6s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 177 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 123 / 202 (60.9%)
+ Similar match: 35 / 202 (17.3%)
+ Total found: 158 / 202 (78.2%)
+ Missed: 44 / 202 (21.8%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 123 / 177 (69.5%)
+ Similar match: 34 / 177 (19.2%)
+ Total correct: 157 / 177 (88.7%)
+ Extra/wrong: 20 / 177 (11.3%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ gray -> grey x10
+ navy blue -> blue x7
+ dark brown -> brown x5
+ dark blue -> blue x5
+ dark blue -> navy blue x3
+ gold -> yellow x2
+ navy blue -> dark blue x1
+ navy -> blue x1
+ dark blue -> navy x1
+
+--- Most Missed Ground Truth Colors ---
+ maroon 8 ########
+ black 7 #######
+ gray 6 ######
+ light blue 4 ####
+ green 4 ####
+ red 3 ###
+ gold 3 ###
+ blue 3 ###
+ teal 2 ##
+ gold|yellow 1 #
+ dark brown 1 #
+ yellow 1 #
+ brown 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ red 7 #######
+ blue 5 #####
+ black 3 ###
+ pink 1 #
+ green 1 #
+ orange 1 #
+ light blue 1 #
+ dark blue 1 #
+
+--- Per-Image Verdict ---
+ PASS 117
+ PARTIAL 22
+ FAIL 22
+
+--- Failed Images (22) ---
+ 016 - maroon.jpg
+ missed: maroon
+ 019 - maroon_gold.jpg
+ missed: maroon, gold
+ extra: red
+ 029 -maroon_white.jpg
+ missed: maroon
+ extra: red
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 048 - red.jpg
+ missed: red
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 069 - red_white.jpg
+ missed: red
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 078 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 099 - maroon_white.jpg
+ missed: maroon
+ extra: red
+ 128 - green_white.jpg
+ missed: green
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: light blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 141 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+ 161 - light blue_white.jpg
+ missed: light blue
+ extra: blue
+
+========================================
+Qwen3-VL-8B + jersey_prompt_constrained.txt
+Started: Tue Mar 3 05:39:17 PM MST 2026
+========================================
+Images to process: 161
+Server: http://agx:8080
+Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_constrained.txt (2223 chars)
+================================================================================
+
+[1/161] 001 -brown_white or dark brown.jpg
+ GT: [brown, dark brown]
+ VLM: [black] (3 jersey(s), 11.6s)
+ FAIL MISS:brown,dark brown, extra:black
+
+[2/161] 002 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[3/161] 003 - dark blue.jpg
+ GT: [dark blue]
+ VLM: [dark blue] (3 jersey(s), 11.1s)
+ PASS exact:1
+
+[4/161] 004 - purple_light blue.jpg
+ GT: [purple, light blue]
+ VLM: [light blue, purple] (3 jersey(s), 11.7s)
+ PASS exact:2
+
+[5/161] 005 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (1 jersey(s), 5.0s)
+ PARTIAL exact:1, MISS:gray
+
+[6/161] 006 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [dark blue] (1 jersey(s), 4.4s)
+ PASS similar:1
+
+[7/161] 007 - brown_white.jpg
+ GT: [brown]
+ VLM: [maroon] (2 jersey(s), 8.0s)
+ FAIL MISS:brown, extra:maroon
+
+[8/161] 008 -red or orange.jpg
+ GT: [red|orange]
+ VLM: [red] (1 jersey(s), 4.3s)
+ PASS exact:1
+
+[9/161] 009 - white_red.jpg
+ GT: [red]
+ VLM: [gold, red] (3 jersey(s), 10.8s)
+ PARTIAL exact:1, extra:gold
+
+[10/161] 010 - white_black.jpg
+ GT: [black]
+ VLM: [black, maroon] (3 jersey(s), 11.0s)
+ PARTIAL exact:1, extra:maroon
+
+[11/161] 011 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (4 jersey(s), 13.8s)
+ PARTIAL exact:1, MISS:gray
+
+[12/161] 012 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[13/161] 013 - light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.6s)
+ PASS exact:1
+
+[14/161] 014 - orange_dark blue or purple.jpg
+ GT: [orange, dark blue|purple]
+ VLM: [orange, purple] (3 jersey(s), 11.0s)
+ PASS exact:2
+
+[15/161] 015 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[16/161] 016 - maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (0 jersey(s), 1.7s)
+ FAIL MISS:maroon
+
+[17/161] 017 - brown_white.jpg
+ GT: [brown]
+ VLM: [dark brown] (2 jersey(s), 8.8s)
+ PASS similar:1
+
+[18/161] 018 - gray_red.jpg
+ GT: [gray, red]
+ VLM: [gray, red] (2 jersey(s), 7.3s)
+ PASS exact:2
+
+[19/161] 019 - maroon_gold.jpg
+ GT: [maroon, gold]
+ VLM: [maroon, yellow] (2 jersey(s), 7.8s)
+ PASS exact:1, similar:1
+
+[20/161] 020 - white_brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[21/161] 021 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[22/161] 022 - black_light blue.jpg
+ GT: [black, light blue]
+ VLM: [light blue] (1 jersey(s), 5.0s)
+ PARTIAL exact:1, MISS:black
+
+[23/161] 023 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[24/161] 024 - white_pink.jpg
+ GT: [pink]
+ VLM: [pink] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[25/161] 025 - blue_green.jpg
+ GT: [blue, green]
+ VLM: [green] (1 jersey(s), 4.3s)
+ PARTIAL exact:1, MISS:blue
+
+[26/161] 026 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[27/161] 027 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 16.1s)
+ PASS exact:1
+
+[28/161] 028 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[29/161] 029 -maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[30/161] 030 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 7.8s)
+ PASS similar:1
+
+[31/161] 031 - brown_white.jpg
+ GT: [brown]
+ VLM: [maroon] (2 jersey(s), 7.9s)
+ FAIL MISS:brown, extra:maroon
+
+[32/161] 032 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[33/161] 033 - navy blue_white or gray.jpg
+ GT: [navy blue, gray]
+ VLM: [blue] (3 jersey(s), 10.9s)
+ PARTIAL similar:1, MISS:gray
+
+[34/161] 034 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (1 jersey(s), 4.8s)
+ FAIL MISS:light blue, extra:blue
+
+[35/161] 035 -green_gold or yellow.jpg
+ GT: [green, gold|yellow]
+ VLM: [green, yellow] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[36/161] 036 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[37/161] 037 -navy_white.jpg
+ GT: [navy]
+ VLM: [dark blue] (3 jersey(s), 10.3s)
+ PASS similar:1
+
+[38/161] 038 - red_white.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[39/161] 039 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[40/161] 040 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (1 jersey(s), 5.1s)
+ PARTIAL exact:1, MISS:gray
+
+[41/161] 041 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [navy blue] (9 jersey(s), 30.6s)
+ PASS exact:1
+
+[42/161] 042 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[43/161] 043 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [black, gray] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[44/161] 044 - purple_black.jpg
+ GT: [purple, black]
+ VLM: [purple] (7 jersey(s), 22.6s)
+ PARTIAL exact:1, MISS:black
+
+[45/161] 045 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[46/161] 046 - green.jpg
+ GT: [green]
+ VLM: [black] (15 jersey(s), 46.5s)
+ FAIL MISS:green, extra:black
+
+[47/161] 047 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[48/161] 048 - red.jpg
+ GT: [red]
+ VLM: [maroon] (1 jersey(s), 5.0s)
+ FAIL MISS:red, extra:maroon
+
+[49/161] 049 - white_gold.jpg
+ GT: [gold]
+ VLM: [yellow] (2 jersey(s), 7.9s)
+ PASS similar:1
+
+[50/161] 050 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 14.1s)
+ PASS exact:1
+
+[51/161] 051 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[52/161] 052 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [black, yellow] (2 jersey(s), 7.8s)
+ PASS exact:1, similar:1
+
+[53/161] 053 - black_white.jpg
+ GT: [black]
+ VLM: [(none)] (1 jersey(s), 4.9s)
+ FAIL MISS:black
+
+[54/161] 054 - white_blue.jpg
+ GT: [blue]
+ VLM: [navy blue] (2 jersey(s), 8.1s)
+ PASS similar:1
+
+[55/161] 055 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (2 jersey(s), 7.8s)
+ PASS exact:1, similar:1
+
+[56/161] 056 - white_red.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[57/161] 057 - white_gold or yellow.jpg
+ GT: [gold|yellow]
+ VLM: [(none)] (1 jersey(s), 4.9s)
+ FAIL MISS:gold|yellow
+
+[58/161] 058 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[59/161] 059 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [gold] (1 jersey(s), 4.9s)
+ PARTIAL exact:1, MISS:black
+
+[60/161] 060 - gray_navy blue.jpg
+ GT: [gray, navy blue]
+ VLM: [blue] (2 jersey(s), 8.1s)
+ PARTIAL similar:1, MISS:gray
+
+[61/161] 061 - brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[62/161] 062 - orange_blue.jpg
+ GT: [orange, blue]
+ VLM: [blue, orange] (2 jersey(s), 7.5s)
+ PASS exact:2
+
+[63/161] 063 - dark brown.jpg
+ GT: [dark brown]
+ VLM: [dark brown] (1 jersey(s), 5.0s)
+ PASS exact:1
+
+[64/161] 064 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[65/161] 065 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [dark green, yellow] (3 jersey(s), 10.6s)
+ PASS similar:2
+
+[66/161] 066 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[67/161] 067 - red_white.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 13.7s)
+ PASS exact:1
+
+[68/161] 068 - gold.jpg
+ GT: [gold]
+ VLM: [gold] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[69/161] 069 - red_white.jpg
+ GT: [red]
+ VLM: [(none)] (4 jersey(s), 14.1s)
+ FAIL MISS:red
+
+[70/161] 070 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 11.1s)
+ PASS exact:1
+
+[71/161] 071 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[72/161] 072 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.5s)
+ PASS exact:1
+
+[73/161] 073 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.4s)
+ PASS exact:1
+
+[74/161] 074 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.5s)
+ PASS exact:1
+
+[75/161] 075 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[76/161] 076 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (3 jersey(s), 11.4s)
+ PASS exact:1
+
+[77/161] 077 - teal_white.jpg
+ GT: [teal]
+ VLM: [green] (4 jersey(s), 13.4s)
+ FAIL MISS:teal, extra:green
+
+[78/161] 078 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[79/161] 079 - blue_maroon.jpg
+ GT: [blue, maroon]
+ VLM: [blue, maroon] (4 jersey(s), 14.1s)
+ PASS exact:2
+
+[80/161] 080 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 7.8s)
+ PASS similar:1
+
+[81/161] 081 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 7.7s)
+ PASS similar:1
+
+[82/161] 082 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [dark blue] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[83/161] 083 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [black] (2 jersey(s), 7.9s)
+ FAIL MISS:dark brown, extra:black
+
+[84/161] 084 - dark brown_yellow.jpg
+ GT: [dark brown, yellow]
+ VLM: [dark brown, gold] (2 jersey(s), 8.0s)
+ PASS exact:1, similar:1
+
+[85/161] 085 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 4.8s)
+ PASS exact:1
+
+[86/161] 086 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [dark brown] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[87/161] 087 - white_light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[88/161] 088 - white_maroon.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[89/161] 089 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 11.1s)
+ PASS exact:1
+
+[90/161] 090 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (4 jersey(s), 14.3s)
+ PASS exact:1
+
+[91/161] 091 - teal.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[92/161] 092 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[93/161] 093 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 8.1s)
+ PASS similar:1
+
+[94/161] 094 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 12.5s)
+ PASS exact:1
+
+[95/161] 095 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[96/161] 096 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 8.6s)
+ PASS exact:1
+
+[97/161] 097 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [light blue] (2 jersey(s), 8.3s)
+ FAIL MISS:gray,black, extra:light blue
+
+[98/161] 098 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 8.7s)
+ PASS exact:1
+
+[99/161] 099 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 12.2s)
+ PASS exact:1
+
+[100/161] 100 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 13.8s)
+ PASS exact:1
+
+[101/161] 101 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 17.0s)
+ PASS exact:1
+
+[102/161] 102 - yellow-black.jpg
+ GT: [yellow, black]
+ VLM: [black, yellow] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[103/161] 103 - green_white.jpg
+ GT: [green]
+ VLM: [green] (5 jersey(s), 17.3s)
+ PASS exact:1
+
+[104/161] 104 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[105/161] 105 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 9.2s)
+ PASS exact:1
+
+[106/161] 106 - black_gray.jpg
+ GT: [black, gray]
+ VLM: [black, gray] (2 jersey(s), 9.1s)
+ PASS exact:2
+
+[107/161] 107 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[108/161] 108 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[109/161] 109 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[110/161] 110 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 14.0s)
+ PASS exact:1
+
+[111/161] 111 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[112/161] 112 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 7.8s)
+ PASS exact:1
+
+[113/161] 113 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[114/161] 114 - black_white.jpg
+ GT: [black]
+ VLM: [black] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[115/161] 115 - navy blue_maroon.jpg
+ GT: [navy blue, maroon]
+ VLM: [blue, maroon] (4 jersey(s), 14.0s)
+ PASS exact:1, similar:1
+
+[116/161] 116 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[117/161] 117 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[118/161] 118 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 7.8s)
+ PASS similar:1
+
+[119/161] 119 - black_yellow.jpg
+ GT: [black, yellow]
+ VLM: [black, yellow] (3 jersey(s), 10.9s)
+ PASS exact:2
+
+[120/161] 120 - red_dark blue.jpg
+ GT: [red, dark blue]
+ VLM: [navy blue, red] (3 jersey(s), 11.1s)
+ PASS exact:1, similar:1
+
+[121/161] 121 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 10.9s)
+ PASS exact:1
+
+[122/161] 122 - gray.jpg
+ GT: [gray]
+ VLM: [gray] (1 jersey(s), 6.3s)
+ PASS exact:1
+
+[123/161] 123 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (4 jersey(s), 14.1s)
+ PASS exact:1
+
+[124/161] 124 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [dark blue] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[125/161] 125 - dark blue_maroon.jpg
+ GT: [dark blue, maroon]
+ VLM: [dark blue, red] (2 jersey(s), 8.2s)
+ PARTIAL exact:1, MISS:maroon, extra:red
+
+[126/161] 126 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[127/161] 127 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[128/161] 128 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[129/161] 129 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 17.2s)
+ PASS exact:1
+
+[130/161] 130 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [black, yellow] (2 jersey(s), 8.4s)
+ PASS exact:2
+
+[131/161] 131 - purple_orange.jpg
+ GT: [purple, orange]
+ VLM: [orange, purple] (3 jersey(s), 10.8s)
+ PASS exact:2
+
+[132/161] 132 - brown_white.jpg
+ GT: [brown]
+ VLM: [orange] (3 jersey(s), 10.8s)
+ FAIL MISS:brown, extra:orange
+
+[133/161] 133 - light blue.png
+ GT: [light blue]
+ VLM: [light blue] (6 jersey(s), 21.2s)
+ PASS exact:1
+
+[134/161] 134 - teal_white.jpg
+ GT: [teal]
+ VLM: [light blue] (1 jersey(s), 5.1s)
+ FAIL MISS:teal, extra:light blue
+
+[135/161] 135 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[136/161] 136 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 8.0s)
+ PASS exact:1
+
+[137/161] 137 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 11.0s)
+ PASS exact:1
+
+[138/161] 138 - maroon.jpg
+ GT: [maroon]
+ VLM: [red] (1 jersey(s), 4.9s)
+ FAIL MISS:maroon, extra:red
+
+[139/161] 139 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 8.3s)
+ PASS similar:1
+
+[140/161] 140 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 7.7s)
+ PASS exact:1
+
+[141/161] 141 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (3 jersey(s), 11.2s)
+ PASS exact:1
+
+[142/161] 142 - orange_white.jpg
+ GT: [orange]
+ VLM: [maroon] (2 jersey(s), 8.2s)
+ FAIL MISS:orange, extra:maroon
+
+[143/161] 143 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 11.1s)
+ PASS exact:1
+
+[144/161] 144 - green.jpg
+ GT: [green]
+ VLM: [green] (10 jersey(s), 31.9s)
+ PASS exact:1
+
+[145/161] 145 - green_white.jpg
+ GT: [green]
+ VLM: [(none)] (1 jersey(s), 5.0s)
+ FAIL MISS:green
+
+[146/161] 146 - red_gray.jpg
+ GT: [red, gray]
+ VLM: [gray, red] (2 jersey(s), 8.0s)
+ PASS exact:2
+
+[147/161] 147 - green.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 10.8s)
+ PASS exact:1
+
+[148/161] 148 - yellow_purple.jpg
+ GT: [yellow, purple]
+ VLM: [purple, yellow] (2 jersey(s), 7.8s)
+ PASS exact:2
+
+[149/161] 149 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 16.7s)
+ PASS exact:1
+
+[150/161] 150 - green_gray.jpg
+ GT: [green, gray]
+ VLM: [dark blue] (2 jersey(s), 7.9s)
+ FAIL MISS:green,gray, extra:dark blue
+
+[151/161] 151 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [dark blue, yellow] (5 jersey(s), 17.1s)
+ PARTIAL exact:1, MISS:black, extra:dark blue
+
+[152/161] 152 - pink_dark blue.jpg
+ GT: [pink, dark blue]
+ VLM: [navy blue, pink] (2 jersey(s), 8.3s)
+ PASS exact:1, similar:1
+
+[153/161] 153 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 8.1s)
+ PASS exact:1
+
+[154/161] 154 - dark brown.jpeg
+ GT: [dark brown]
+ VLM: [dark brown] (5 jersey(s), 17.3s)
+ PASS exact:1
+
+[155/161] 155 - white_green_gray_purple_yellow.jpg
+ GT: [green, gray, purple, yellow]
+ VLM: [gray, purple, yellow] (5 jersey(s), 17.4s)
+ PARTIAL exact:3, MISS:green
+
+[156/161] 156 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (2 jersey(s), 7.7s)
+ PARTIAL exact:1, MISS:gray
+
+[157/161] 157 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 10.7s)
+ PASS exact:1
+
+[158/161] 158 - dark blue_yellow.jpg
+ GT: [dark blue, yellow]
+ VLM: [dark blue, yellow] (4 jersey(s), 14.3s)
+ PASS exact:2
+
+[159/161] 159 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (4 jersey(s), 13.9s)
+ PASS exact:1
+
+[160/161] 160 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (2 jersey(s), 7.9s)
+ PASS exact:1
+
+[161/161] 161 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 7.9s)
+ PASS exact:1
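
For post-processing a log like this one, the `VLM:` lines can be parsed with a regular expression. This is a minimal sketch that assumes the line layout seen in the entries above (`VLM: [colors] (N jersey(s), T.Ts)`); the names `VLM_RE` and `parse_vlm_line` are hypothetical and do not come from the test script.

```python
import re

# Matches e.g. "VLM: [blue, yellow] (7 jersey(s), 37.4s)".
VLM_RE = re.compile(
    r"VLM:\s*\[(?P<colors>[^\]]*)\]\s*"
    r"\((?P<count>\d+) jersey\(s\), (?P<secs>[\d.]+)s\)"
)

def parse_vlm_line(line: str):
    """Return (colors, jersey_count, seconds), or None if the line does not match."""
    m = VLM_RE.search(line)
    if m is None:
        return None
    raw = m.group("colors").strip()
    # The log prints "(none)" when the model reported no non-white color.
    colors = [] if raw == "(none)" else [c.strip() for c in raw.split(",") if c.strip()]
    return colors, int(m.group("count")), float(m.group("secs"))

print(parse_vlm_line("+ VLM: [blue, yellow] (7 jersey(s), 37.4s)"))
print(parse_vlm_line("+ VLM: [(none)] (1 jersey(s), 4.2s)"))
```
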
+
+================================================================================
+ACCURACY SUMMARY
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 1596.1s (9.9s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 185 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 145 / 202 (71.8%)
+ Similar match: 22 / 202 (10.9%)
+ Total found: 167 / 202 (82.7%)
+ Missed: 35 / 202 (17.3%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 145 / 185 (78.4%)
+ Similar match: 22 / 185 (11.9%)
+ Total correct: 167 / 185 (90.3%)
+ Extra/wrong: 18 / 185 (9.7%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ navy blue -> blue x6
+ gold -> yellow x5
+ dark blue -> navy blue x5
+ navy blue -> dark blue x1
+ brown -> dark brown x1
+ navy -> dark blue x1
+ blue -> navy blue x1
+ green -> dark green x1
+ yellow -> gold x1
+
+--- Most Missed Ground Truth Colors ---
+ gray 8 ########
+ black 6 ######
+ brown 4 ####
+ green 4 ####
+ maroon 3 ###
+ dark brown 2 ##
+ red 2 ##
+ teal 2 ##
+ blue 1 #
+ light blue 1 #
+ gold|yellow 1 #
+ orange 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ maroon 5 #####
+ black 3 ###
+ light blue 2 ##
+ red 2 ##
+ dark blue 2 ##
+ gold 1 #
+ blue 1 #
+ green 1 #
+ orange 1 #
+
+--- Per-Image Verdict ---
+ PASS 127
+ PARTIAL 15
+ FAIL 19
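
The verdict rule appears to be: PASS when every ground-truth color is found and nothing extra is reported, FAIL when no ground-truth color is found, and PARTIAL otherwise. The sketch below is inferred from the per-image entries in this log, not taken from the scoring script. The similar-color groups are an assumption reconstructed from the confusions listed above, and `score_image` is a hypothetical name.

```python
# Assumed similar-color groups, reconstructed from the logged confusions.
SIMILAR_GROUPS = [
    {"gray", "grey"},
    {"blue", "dark blue", "navy", "navy blue"},
    {"brown", "dark brown"},
    {"gold", "yellow"},
    {"green", "dark green"},
]

def is_similar(a: str, b: str) -> bool:
    return any(a in group and b in group for group in SIMILAR_GROUPS)

def score_image(gt: list[str], vlm: list[str]):
    """Score one image. GT entries may hold alternatives, e.g. "red|orange"."""
    exact = similar = 0
    missed, used = [], set()
    for entry in gt:
        alts = entry.split("|")
        hit = next((a for a in alts if a in vlm), None)
        if hit is not None:
            exact += 1
            used.add(hit)
            continue
        near = next((v for v in vlm if any(is_similar(a, v) for a in alts)), None)
        if near is not None:
            similar += 1
            used.add(near)
        else:
            missed.append(entry)
    # A VLM color is extra only if it matched no ground-truth entry at all.
    extra = [v for v in vlm if v not in used]
    if exact + similar == 0:
        verdict = "FAIL"
    elif missed or extra:
        verdict = "PARTIAL"
    else:
        verdict = "PASS"
    return verdict, exact, similar, missed, extra

print(score_image(["red"], ["gold", "red"]))
print(score_image(["light blue"], ["blue"]))
```

Under these assumptions, one VLM color can satisfy two ground-truth entries: `["brown", "dark brown"]` against `["dark brown"]` scores as one exact and one similar match.
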
+
+--- Failed Images (19) ---
+ 001 -brown_white or dark brown.jpg
+ missed: brown, dark brown
+ extra: black
+ 007 - brown_white.jpg
+ missed: brown
+ extra: maroon
+ 016 - maroon.jpg
+ missed: maroon
+ 031 - brown_white.jpg
+ missed: brown
+ extra: maroon
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 048 - red.jpg
+ missed: red
+ extra: maroon
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 069 - red_white.jpg
+ missed: red
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 083 - dark brown_white.jpg
+ missed: dark brown
+ extra: black
+ 097 - gray_black.jpg
+ missed: gray, black
+ extra: light blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: light blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 142 - orange_white.jpg
+ missed: orange
+ extra: maroon
+ 145 - green_white.jpg
+ missed: green
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: dark blue
+
+========================================
+Gemini 3 Flash + jersey_prompt_constrained.txt
+Started: Tue Mar 3 06:05:53 PM MST 2026
+========================================
+Model: gemini-3-flash-preview
+Images to process: 161
+Concurrency: 8 workers
+Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_constrained.txt (2223 chars)
+================================================================================
+Pre-encoding images ... 161 images in 1.7s
+Sending API requests ...
+
+161/161 API calls completed (344.4s total)
+================================================================================
+
+[1/161] 001 -brown_white or dark brown.jpg
+ GT: [brown, dark brown]
+ VLM: [dark brown] (2 jersey(s), 36.3s)
+ PASS exact:1, similar:1
+
+[2/161] 002 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (2 jersey(s), 6.3s)
+ PASS exact:1
+
+[3/161] 003 - dark blue.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 7.5s)
+ PASS similar:1
+
+[4/161] 004 - purple_light blue.jpg
+ GT: [purple, light blue]
+ VLM: [light blue, purple] (2 jersey(s), 37.3s)
+ PASS exact:2
+
+[5/161] 005 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (1 jersey(s), 4.5s)
+ PARTIAL exact:1, MISS:gray
+
+[6/161] 006 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [navy blue] (1 jersey(s), 5.0s)
+ PASS exact:1
+
+[7/161] 007 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 6.1s)
+ PASS exact:1
+
+[8/161] 008 -red or orange.jpg
+ GT: [red|orange]
+ VLM: [red] (1 jersey(s), 3.2s)
+ PASS exact:1
+
+[9/161] 009 - white_red.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 35.1s)
+ PASS exact:1
+
+[10/161] 010 - white_black.jpg
+ GT: [black]
+ VLM: [black] (3 jersey(s), 10.5s)
+ PASS exact:1
+
+[11/161] 011 - white or gray_purple.jpg
+ GT: [gray, purple]
+ VLM: [purple] (4 jersey(s), 40.8s)
+ PARTIAL exact:1, MISS:gray
+
+[12/161] 012 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 5.3s)
+ PASS exact:1
+
+[13/161] 013 - light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 8.9s)
+ PASS exact:1
+
+[14/161] 014 - orange_dark blue or purple.jpg
+ GT: [orange, dark blue|purple]
+ VLM: [orange, purple] (3 jersey(s), 9.8s)
+ PASS exact:2
+
+[15/161] 015 - green.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 4.4s)
+ PASS exact:1
+
+[16/161] 016 - maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (0 jersey(s), 3.9s)
+ FAIL MISS:maroon
+
+[17/161] 017 - brown_white.jpg
+ GT: [brown]
+ VLM: [dark brown] (2 jersey(s), 6.5s)
+ PASS similar:1
+
+[18/161] 018 - gray_red.jpg
+ GT: [gray, red]
+ VLM: [gray] (1 jersey(s), 8.7s)
+ PARTIAL exact:1, MISS:red
+
+[19/161] 019 - maroon_gold.jpg
+ GT: [maroon, gold]
+ VLM: [maroon] (1 jersey(s), 4.5s)
+ PARTIAL exact:1, MISS:gold
+
+[20/161] 020 - white_brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (2 jersey(s), 4.9s)
+ PASS exact:1
+
+[21/161] 021 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 9.1s)
+ PASS exact:1
+
+[22/161] 022 - black_light blue.jpg
+ GT: [black, light blue]
+ VLM: [light blue] (1 jersey(s), 5.0s)
+ PARTIAL exact:1, MISS:black
+
+[23/161] 023 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.2s)
+ PASS exact:1
+
+[24/161] 024 - white_pink.jpg
+ GT: [pink]
+ VLM: [pink] (2 jersey(s), 5.7s)
+ PASS exact:1
+
+[25/161] 025 - blue_green.jpg
+ GT: [blue, green]
+ VLM: [green] (1 jersey(s), 3.8s)
+ PARTIAL exact:1, MISS:blue
+
+[26/161] 026 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 6.8s)
+ PASS exact:1
+
+[27/161] 027 - red_white.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 37.7s)
+ PASS exact:1
+
+[28/161] 028 - green_white.jpg
+ GT: [green]
+ VLM: [green] (6 jersey(s), 41.4s)
+ PASS exact:1
+
+[29/161] 029 -maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 7.1s)
+ PASS exact:1
+
+[30/161] 030 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (2 jersey(s), 5.8s)
+ PASS similar:1
+
+[31/161] 031 - brown_white.jpg
+ GT: [brown]
+ VLM: [brown] (2 jersey(s), 6.0s)
+ PASS exact:1
+
+[32/161] 032 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 5.9s)
+ PASS exact:1
+
+[33/161] 033 - navy blue_white or gray.jpg
+ GT: [navy blue, gray]
+ VLM: [blue] (8 jersey(s), 43.6s)
+ PARTIAL similar:1, MISS:gray
+
+[34/161] 034 - light blue.jpg
+ GT: [light blue]
+ VLM: [blue] (1 jersey(s), 11.5s)
+ FAIL MISS:light blue, extra:blue
+
+[35/161] 035 -green_gold or yellow.jpg
+ GT: [green, gold|yellow]
+ VLM: [green] (1 jersey(s), 11.6s)
+ PARTIAL exact:1, MISS:gold|yellow
+
+[36/161] 036 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 9.7s)
+ PASS exact:1
+
+[37/161] 037 -navy_white.jpg
+ GT: [navy]
+ VLM: [navy blue] (3 jersey(s), 16.0s)
+ PASS similar:1
+
+[38/161] 038 - red_white.jpg
+ GT: [red]
+ VLM: [red] (3 jersey(s), 38.7s)
+ PASS exact:1
+
+[39/161] 039 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (3 jersey(s), 18.4s)
+ PASS exact:1
+
+[40/161] 040 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [gray, maroon] (2 jersey(s), 5.4s)
+ PASS exact:2
+
+[41/161] 041 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [navy blue] (8 jersey(s), 41.5s)
+ PASS exact:1
+
+[42/161] 042 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 5.1s)
+ PASS exact:1
+
+[43/161] 043 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [black, gray] (5 jersey(s), 39.3s)
+ PASS exact:2
+
+[44/161] 044 - purple_black.jpg
+ GT: [purple, black]
+ VLM: [purple] (8 jersey(s), 36.4s)
+ PARTIAL exact:1, MISS:black
+
+[45/161] 045 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 36.0s)
+ PASS exact:1
+
+[46/161] 046 - green.jpg
+ GT: [green]
+ VLM: [black] (8 jersey(s), 35.2s)
+ FAIL MISS:green, extra:black
+
+[47/161] 047 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (3 jersey(s), 5.3s)
+ PASS exact:1
+
+[48/161] 048 - red.jpg
+ GT: [red]
+ VLM: [(none)] (0 jersey(s), 36.0s)
+ FAIL MISS:red
+
+[49/161] 049 - white_gold.jpg
+ GT: [gold]
+ VLM: [yellow] (2 jersey(s), 3.6s)
+ PASS similar:1
+
+[50/161] 050 - white_orange.jpg
+ GT: [orange]
+ VLM: [orange] (6 jersey(s), 40.4s)
+ PASS exact:1
+
+[51/161] 051 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 5.8s)
+ PASS exact:1
+
+[52/161] 052 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [black] (1 jersey(s), 24.0s)
+ PARTIAL exact:1, MISS:gold
+
+[53/161] 053 - black_white.jpg
+ GT: [black]
+ VLM: [(none)] (1 jersey(s), 4.3s)
+ FAIL MISS:black
+
+[54/161] 054 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (2 jersey(s), 6.5s)
+ PASS exact:1
+
+[55/161] 055 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (2 jersey(s), 12.6s)
+ PASS exact:1, similar:1
+
+[56/161] 056 - white_red.jpg
+ GT: [red]
+ VLM: [red] (4 jersey(s), 36.0s)
+ PASS exact:1
+
+[57/161] 057 - white_gold or yellow.jpg
+ GT: [gold|yellow]
+ VLM: [(none)] (1 jersey(s), 4.4s)
+ FAIL MISS:gold|yellow
+
+[58/161] 058 - purple.jpg
+ GT: [purple]
+ VLM: [purple] (4 jersey(s), 6.2s)
+ PASS exact:1
+
+[59/161] 059 - black_gold.jpg
+ GT: [black, gold]
+ VLM: [gold] (1 jersey(s), 4.5s)
+ PARTIAL exact:1, MISS:black
+
+[60/161] 060 - gray_navy blue.jpg
+ GT: [gray, navy blue]
+ VLM: [blue] (2 jersey(s), 7.1s)
+ PARTIAL similar:1, MISS:gray
+
+[61/161] 061 - brown or orange.jpg
+ GT: [brown|orange]
+ VLM: [orange] (1 jersey(s), 3.4s)
+ PASS exact:1
+
+[62/161] 062 - orange_blue.jpg
+ GT: [orange, blue]
+ VLM: [blue, orange] (2 jersey(s), 4.8s)
+ PASS exact:2
+
+[63/161] 063 - dark brown.jpg
+ GT: [dark brown]
+ VLM: [brown] (1 jersey(s), 4.7s)
+ PASS similar:1
+
+[64/161] 064 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 5.3s)
+ PASS exact:1
+
+[65/161] 065 - green_gold.jpg
+ GT: [green, gold]
+ VLM: [green, yellow] (5 jersey(s), 37.1s)
+ PASS exact:1, similar:1
+
+[66/161] 066 - yellow.jpg
+ GT: [yellow]
+ VLM: [yellow] (1 jersey(s), 6.6s)
+ PASS exact:1
+
+[67/161] 067 - red_white.jpg
+ GT: [red]
+ VLM: [red] (5 jersey(s), 36.5s)
+ PASS exact:1
+
+[68/161] 068 - gold.jpg
+ GT: [gold]
+ VLM: [gold] (1 jersey(s), 39.5s)
+ PASS exact:1
+
+[69/161] 069 - red_white.jpg
+ GT: [red]
+ VLM: [(none)] (5 jersey(s), 40.6s)
+ FAIL MISS:red
+
+[70/161] 070 - green_white.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 7.9s)
+ PASS exact:1
+
+[71/161] 071 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 4.4s)
+ PASS exact:1
+
+[72/161] 072 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 5.6s)
+ PASS exact:1
+
+[73/161] 073 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (1 jersey(s), 4.2s)
+ PASS exact:1
+
+[74/161] 074 - white_orange.jpg
+ GT: [orange]
+ VLM: [(none)] (1 jersey(s), 8.9s)
+ FAIL MISS:orange
+
+[75/161] 075 - green_white.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 5.0s)
+ PASS exact:1
+
+[76/161] 076 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (4 jersey(s), 38.6s)
+ PASS exact:1
+
+[77/161] 077 - teal_white.jpg
+ GT: [teal]
+ VLM: [green] (5 jersey(s), 34.5s)
+ FAIL MISS:teal, extra:green
+
+[78/161] 078 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 5.7s)
+ PASS exact:1
+
+[79/161] 079 - blue_maroon.jpg
+ GT: [blue, maroon]
+ VLM: [blue, maroon] (6 jersey(s), 10.0s)
+ PASS exact:2
+
+[80/161] 080 - navy blue_white.jpg
+ GT: [navy blue]
+ VLM: [blue] (1 jersey(s), 7.9s)
+ PASS similar:1
+
+[81/161] 081 - navy blue.jpg
+ GT: [navy blue]
+ VLM: [light blue] (2 jersey(s), 6.6s)
+ FAIL MISS:navy blue, extra:light blue
+
+[82/161] 082 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (3 jersey(s), 21.3s)
+ PASS similar:1
+
+[83/161] 083 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [dark brown] (2 jersey(s), 40.1s)
+ PASS exact:1
+
+[84/161] 084 - dark brown_yellow.jpg
+ GT: [dark brown, yellow]
+ VLM: [dark brown, gold] (2 jersey(s), 8.6s)
+ PASS exact:1, similar:1
+
+[85/161] 085 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 25.5s)
+ PASS exact:1
+
+[86/161] 086 - dark brown_white.jpg
+ GT: [dark brown]
+ VLM: [dark brown] (1 jersey(s), 38.5s)
+ PASS exact:1
+
+[87/161] 087 - white_light blue.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 10.2s)
+ PASS exact:1
+
+[88/161] 088 - white_maroon.jpg
+ GT: [maroon]
+ VLM: [(none)] (2 jersey(s), 34.9s)
+ FAIL MISS:maroon
+
+[89/161] 089 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 7.7s)
+ PASS exact:1
+
+[90/161] 090 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (5 jersey(s), 36.9s)
+ PASS exact:1
+
+[91/161] 091 - teal.jpg
+ GT: [teal]
+ VLM: [teal] (3 jersey(s), 7.6s)
+ PASS exact:1
+
+[92/161] 092 - green_white.jpg
+ GT: [green]
+ VLM: [green] (6 jersey(s), 40.0s)
+ PASS exact:1
+
+[93/161] 093 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 6.6s)
+ PASS similar:1
+
+[94/161] 094 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 6.6s)
+ PASS exact:1
+
+[95/161] 095 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 35.6s)
+ PASS exact:1
+
+[96/161] 096 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 3.7s)
+ PASS exact:1
+
+[97/161] 097 - gray_black.jpg
+ GT: [gray, black]
+ VLM: [gray] (4 jersey(s), 39.1s)
+ PARTIAL exact:1, MISS:black
+
+[98/161] 098 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (2 jersey(s), 35.9s)
+ PASS exact:1
+
+[99/161] 099 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (3 jersey(s), 5.6s)
+ PASS exact:1
+
+[100/161] 100 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (4 jersey(s), 34.6s)
+ PASS exact:1
+
+[101/161] 101 - green_white.jpg
+ GT: [green]
+ VLM: [green] (7 jersey(s), 38.7s)
+ PASS exact:1
+
+[102/161] 102 - yellow-black.jpg
+ GT: [yellow, black]
+ VLM: [black] (1 jersey(s), 7.1s)
+ PARTIAL exact:1, MISS:yellow
+
+[103/161] 103 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 35.0s)
+ PASS exact:1
+
+[104/161] 104 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 35.3s)
+ PASS exact:1
+
+[105/161] 105 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 4.8s)
+ PASS exact:1
+
+[106/161] 106 - black_gray.jpg
+ GT: [black, gray]
+ VLM: [black, gray] (2 jersey(s), 6.9s)
+ PASS exact:2
+
+[107/161] 107 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 7.8s)
+ PASS exact:1
+
+[108/161] 108 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.3s)
+ PASS exact:1
+
+[109/161] 109 - purple_white.jpg
+ GT: [purple]
+ VLM: [purple] (2 jersey(s), 4.8s)
+ PASS exact:1
+
+[110/161] 110 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 7.0s)
+ PASS exact:1
+
+[111/161] 111 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (2 jersey(s), 10.9s)
+ PASS exact:1
+
+[112/161] 112 - orange_white.jpg
+ GT: [orange]
+ VLM: [(none)] (0 jersey(s), 37.6s)
+ FAIL MISS:orange
+
+[113/161] 113 - orange.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 3.5s)
+ PASS exact:1
+
+[114/161] 114 - black_white.jpg
+ GT: [black]
+ VLM: [black] (2 jersey(s), 5.5s)
+ PASS exact:1
+
+[115/161] 115 - navy blue_maroon.jpg
+ GT: [navy blue, maroon]
+ VLM: [blue, maroon] (4 jersey(s), 7.4s)
+ PASS exact:1, similar:1
+
+[116/161] 116 - gray_white.jpg
+ GT: [gray]
+ VLM: [gray] (2 jersey(s), 39.7s)
+ PASS exact:1
+
+[117/161] 117 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 37.5s)
+ PASS exact:1
+
+[118/161] 118 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (2 jersey(s), 12.1s)
+ PASS similar:1
+
+[119/161] 119 - black_yellow.jpg
+ GT: [black, yellow]
+ VLM: [black, yellow] (4 jersey(s), 36.4s)
+ PASS exact:2
+
+[120/161] 120 - red_dark blue.jpg
+ GT: [red, dark blue]
+ VLM: [navy blue, red] (3 jersey(s), 17.4s)
+ PASS exact:1, similar:1
+
+[121/161] 121 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (3 jersey(s), 17.7s)
+ PASS exact:1
+
+[122/161] 122 - gray.jpg
+ GT: [gray]
+ VLM: [gray] (1 jersey(s), 4.1s)
+ PASS exact:1
+
+[123/161] 123 - teal_white.jpg
+ GT: [teal]
+ VLM: [teal] (4 jersey(s), 11.1s)
+ PASS exact:1
+
+[124/161] 124 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (4 jersey(s), 8.1s)
+ PASS similar:1
+
+[125/161] 125 - dark blue_maroon.jpg
+ GT: [dark blue, maroon]
+ VLM: [maroon, navy blue] (4 jersey(s), 17.9s)
+ PASS exact:1, similar:1
+
+[126/161] 126 - white_blue.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 6.8s)
+ PASS exact:1
+
+[127/161] 127 - yellow.jpg
+ GT: [yellow]
+ VLM: [black, gold] (5 jersey(s), 39.3s)
+ PARTIAL similar:1, extra:black
+
+[128/161] 128 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 11.6s)
+ PASS exact:1
+
+[129/161] 129 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (3 jersey(s), 6.1s)
+ FAIL MISS:blue
+
+[130/161] 130 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [yellow] (1 jersey(s), 4.3s)
+ PARTIAL exact:1, MISS:black
+
+[131/161] 131 - purple_orange.jpg
+ GT: [purple, orange]
+ VLM: [orange, purple] (3 jersey(s), 9.4s)
+ PASS exact:2
+
+[132/161] 132 - brown_white.jpg
+ GT: [brown]
+ VLM: [orange] (2 jersey(s), 36.4s)
+ FAIL MISS:brown, extra:orange
+
+[133/161] 133 - light blue.png
+ GT: [light blue]
+ VLM: [light blue] (7 jersey(s), 38.8s)
+ PASS exact:1
+
+[134/161] 134 - teal_white.jpg
+ GT: [teal]
+ VLM: [light blue] (1 jersey(s), 11.2s)
+ FAIL MISS:teal, extra:light blue
+
+[135/161] 135 - green.jpg
+ GT: [green]
+ VLM: [green] (1 jersey(s), 4.9s)
+ PASS exact:1
+
+[136/161] 136 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 6.8s)
+ PASS exact:1
+
+[137/161] 137 - green_white.jpg
+ GT: [green]
+ VLM: [green] (4 jersey(s), 9.8s)
+ PASS exact:1
+
+[138/161] 138 - maroon.jpg
+ GT: [maroon]
+ VLM: [red] (1 jersey(s), 4.3s)
+ FAIL MISS:maroon, extra:red
+
+[139/161] 139 - dark blue_white.jpg
+ GT: [dark blue]
+ VLM: [navy blue] (1 jersey(s), 5.3s)
+ PASS similar:1
+
+[140/161] 140 - red_white.jpg
+ GT: [red]
+ VLM: [red] (2 jersey(s), 5.3s)
+ PASS exact:1
+
+[141/161] 141 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (3 jersey(s), 6.3s)
+ PASS exact:1
+
+[142/161] 142 - orange_white.jpg
+ GT: [orange]
+ VLM: [orange] (1 jersey(s), 5.3s)
+ PASS exact:1
+
+[143/161] 143 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (3 jersey(s), 5.7s)
+ PASS exact:1
+
+[144/161] 144 - green.jpg
+ GT: [green]
+ VLM: [green] (8 jersey(s), 38.3s)
+ PASS exact:1
+
+[145/161] 145 - green_white.jpg
+ GT: [green]
+ VLM: [green] (2 jersey(s), 7.3s)
+ PASS exact:1
+
+[146/161] 146 - red_gray.jpg
+ GT: [red, gray]
+ VLM: [gray, red] (2 jersey(s), 4.7s)
+ PASS exact:2
+
+[147/161] 147 - green.jpg
+ GT: [green]
+ VLM: [green] (3 jersey(s), 5.2s)
+ PASS exact:1
+
+[148/161] 148 - yellow_purple.jpg
+ GT: [yellow, purple]
+ VLM: [purple, yellow] (2 jersey(s), 8.4s)
+ PASS exact:2
+
+[149/161] 149 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 38.0s)
+ PASS exact:1
+
+[150/161] 150 - green_gray.jpg
+ GT: [green, gray]
+ VLM: [black] (2 jersey(s), 10.3s)
+ FAIL MISS:green,gray, extra:black
+
+[151/161] 151 - yellow_black.jpg
+ GT: [yellow, black]
+ VLM: [gold, navy blue] (6 jersey(s), 35.2s)
+ PARTIAL similar:1, MISS:black, extra:navy blue
+
+[152/161] 152 - pink_dark blue.jpg
+ GT: [pink, dark blue]
+ VLM: [navy blue, pink] (3 jersey(s), 7.9s)
+ PASS exact:1, similar:1
+
+[153/161] 153 - maroon_white.jpg
+ GT: [maroon]
+ VLM: [maroon] (2 jersey(s), 4.6s)
+ PASS exact:1
+
+[154/161] 154 - dark brown.jpeg
+ GT: [dark brown]
+ VLM: [brown] (5 jersey(s), 8.9s)
+ PASS similar:1
+
+[155/161] 155 - white_green_gray_purple_yellow.jpg
+ GT: [green, gray, purple, yellow]
+ VLM: [gold, gray, purple] (5 jersey(s), 21.6s)
+ PARTIAL exact:2, similar:1, MISS:green
+
+[156/161] 156 - maroon_gray.jpg
+ GT: [maroon, gray]
+ VLM: [maroon] (2 jersey(s), 15.0s)
+ PARTIAL exact:1, MISS:gray
+
+[157/161] 157 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 37.0s)
+ PASS exact:1
+
+[158/161] 158 - dark blue_yellow.jpg
+ GT: [dark blue, yellow]
+ VLM: [gold, navy blue] (5 jersey(s), 37.4s)
+ PASS similar:2
+
+[159/161] 159 - blue_white.jpg
+ GT: [blue]
+ VLM: [blue] (5 jersey(s), 10.1s)
+ PASS exact:1
+
+[160/161] 160 - blue_white.jpg
+ GT: [blue]
+ VLM: [(none)] (1 jersey(s), 4.3s)
+ FAIL MISS:blue
+
+[161/161] 161 - light blue_white.jpg
+ GT: [light blue]
+ VLM: [light blue] (2 jersey(s), 4.4s)
+ PASS exact:1
+
+================================================================================
+ACCURACY SUMMARY (gemini-3-flash-preview)
+================================================================================
+Images processed: 161
+Errors: 0
+Total time: 344.4s (2.1s avg)
+
+Ground truth colors: 202 (excluding white)
+VLM unique colors: 174 (excluding white)
+
+--- Recall (did VLM find each ground truth color?) ---
+ Exact match: 137 / 202 (67.8%)
+ Similar match: 28 / 202 (13.9%)
+ Total found: 165 / 202 (81.7%)
+ Missed: 37 / 202 (18.3%)
+
+--- Precision (are VLM colors correct?) ---
+ Exact match: 137 / 174 (78.7%)
+ Similar match: 27 / 174 (15.5%)
+ Total correct: 164 / 174 (94.3%)
+ Extra/wrong: 10 / 174 (5.7%)
+
+--- Similar-Match Confusions (expected -> got) ---
+ dark blue -> navy blue x10
+ navy blue -> blue x5
+ yellow -> gold x5
+ gold -> yellow x3
+ brown -> dark brown x2
+ dark brown -> brown x2
+ navy -> navy blue x1
+
+--- Most Missed Ground Truth Colors ---
+ black 7 #######
+ gray 6 ######
+ maroon 3 ###
+ red 3 ###
+ blue 3 ###
+ green 3 ###
+ gold 2 ##
+ gold|yellow 2 ##
+ orange 2 ##
+ teal 2 ##
+ light blue 1 #
+ navy blue 1 #
+ yellow 1 #
+ brown 1 #
+
+--- Most Common Extra/Wrong VLM Colors ---
+ black 3 ###
+ light blue 2 ##
+ blue 1 #
+ green 1 #
+ orange 1 #
+ red 1 #
+ navy blue 1 #
+
+--- Per-Image Verdict ---
+ PASS 124
+ PARTIAL 19
+ FAIL 18
+
+--- Failed Images (18) ---
+ 016 - maroon.jpg
+ missed: maroon
+ 034 - light blue.jpg
+ missed: light blue
+ extra: blue
+ 046 - green.jpg
+ missed: green
+ extra: black
+ 048 - red.jpg
+ missed: red
+ 053 - black_white.jpg
+ missed: black
+ 057 - white_gold or yellow.jpg
+ missed: gold|yellow
+ 069 - red_white.jpg
+ missed: red
+ 074 - white_orange.jpg
+ missed: orange
+ 077 - teal_white.jpg
+ missed: teal
+ extra: green
+ 081 - navy blue.jpg
+ missed: navy blue
+ extra: light blue
+ 088 - white_maroon.jpg
+ missed: maroon
+ 112 - orange_white.jpg
+ missed: orange
+ 129 - blue_white.jpg
+ missed: blue
+ 132 - brown_white.jpg
+ missed: brown
+ extra: orange
+ 134 - teal_white.jpg
+ missed: teal
+ extra: light blue
+ 138 - maroon.jpg
+ missed: maroon
+ extra: red
+ 150 - green_gray.jpg
+ missed: green, gray
+ extra: black
+ 160 - blue_white.jpg
+ missed: blue
+
+========================================
+All tests completed at: Tue Mar 3 06:11:40 PM MST 2026
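Reviewer note: the per-image verdicts in the log above appear to follow a simple rule (no color found gives FAIL, any miss or extra gives PARTIAL, otherwise PASS). The scoring script is not part of this section, so the following is only a minimal sketch of that rule under stated assumptions. `SIMILAR_GROUPS` and `score_image` are hypothetical names, and the similarity groups are inferred from the "Similar-Match Confusions" list, not taken from the real implementation.

```python
# Hypothetical similarity groups, inferred from the confusions reported in the log
# (e.g. "dark blue -> navy blue", "yellow -> gold"). The real script may differ.
SIMILAR_GROUPS = [
    {"navy", "navy blue", "dark blue", "blue"},
    {"gold", "yellow"},
    {"brown", "dark brown"},
]


def is_similar(a: str, b: str) -> bool:
    """True if both color names fall in the same assumed similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def score_image(gt: list[str], vlm: list[str]) -> dict:
    """Score one image.

    gt entries may list alternatives separated by '|', e.g. 'gold|yellow'.
    White is excluded from the VLM colors, as the log summary states.
    """
    remaining = [c for c in vlm if c != "white"]
    exact = similar = 0
    unmatched = []

    # Pass 1: exact matches (any alternative of a GT entry counts).
    for entry in gt:
        alts = entry.split("|")
        hit = next((c for c in remaining if c in alts), None)
        if hit is not None:
            exact += 1
            remaining.remove(hit)
        else:
            unmatched.append(entry)

    # Pass 2: similar matches for GT entries that had no exact match.
    missed = []
    for entry in unmatched:
        alts = entry.split("|")
        hit = next(
            (c for c in remaining if any(is_similar(a, c) for a in alts)), None
        )
        if hit is not None:
            similar += 1
            remaining.remove(hit)
        else:
            missed.append(entry)

    # Verdict rule as it appears in the log (assumed, not confirmed).
    if exact + similar == 0:
        verdict = "FAIL"
    elif missed or remaining:
        verdict = "PARTIAL"
    else:
        verdict = "PASS"

    return {
        "verdict": verdict,
        "exact": exact,
        "similar": similar,
        "missed": missed,
        "extra": remaining,
    }
```

Under these assumptions, `score_image(["yellow", "black"], ["gold", "navy blue"])` yields a PARTIAL verdict with one similar match, `black` missed, and `navy blue` as an extra, which agrees with the entry for image 151. `score_image(["light blue"], ["blue"])` yields FAIL, as for image 034, because `light blue` is deliberately left out of the assumed blue group.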
diff --git a/jersey_prompt_capstone.txt b/jersey_prompt_capstone.txt
new file mode 100644
index 0000000..36698d9
--- /dev/null
+++ b/jersey_prompt_capstone.txt
@@ -0,0 +1,15 @@
+You are a high-precision sports telemetry system. Your job is to scan the image and output structured data for every visible jersey number.
+
+**Goal:** Identify every clearly readable jersey number, along with its jersey color and number color.
+
+**Input Analysis Guidelines:**
+
+1. **Scan Targets:** Focus entirely on the torso/chest, back, and leg areas of players.
+2. **Verify Readability:** For each potential number, check: - Are all digits clearly visible? - Is any part of the number occluded by a limb, fold, or object? - Is the number blurry or too small to read with certainty? - If a number is partially hidden (e.g., looking like a 1 but could be a 7), DISCARD IT.
+3. Determine jersey_color from that player's TORSO SHIRT region: - Use the largest contiguous fabric area on the torso (exclude the number itself, stripes/logos, and deep shadows). - Ignore shorts color even if shorts dominate the image. - Choose the single color name that best matches the shirt's base color.
+
+**Examples:** [Image: Player in red shirt with white '10'] -> {"jerseys": [{"jersey_number": "10", "jersey_color": "red", "number_color": "white"}]}
+
+**Output Format:** Provide your output in valid JSON format with the following structure. Do not include markdown formatting (like ```json). { "jerseys": [ { "jersey_number":