Add accuracy test framework, prompts, results, and analysis reports

Includes accuracy test scripts for Qwen (local) and Gemini (cloud API), three prompt variants (original, capstone, constrained), test results from all runs, and two analysis reports with an HTML presentation version.
2026-03-03 18:44:49 -07:00
parent 435033ea07
commit 5405d7f7dc
13 changed files with 8561 additions and 0 deletions
--- a/.python-version
+++ b/.python-version
@@ -0,0 +1 @@
 3.12
--- a/accuracy_analysis_report.md
+++ b/accuracy_analysis_report.md
@@ -0,0 +1,170 @@
 # Jersey Color Detection Accuracy Analysis
 ## Test Configuration
 - **Models tested:** Gemini 3 Flash Preview (cloud API), Qwen3-VL-8B (local, via llama.cpp)
 - **Prompts tested:** `jersey_prompt.txt` (original), `jersey_prompt_capstone.txt` (capstone)
 - **Test images:** 161 annotated basketball jersey images
 - **Ground truth colors:** 202 (excluding white)
 - **Images resized** to max 768px wide before submission
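
The resize step above can be sketched as a pure size calculation. This is a minimal sketch, assuming the aspect ratio is preserved and images already 768px wide or narrower are left untouched; `target_size` is a hypothetical helper, not a function taken from the test scripts.

```python
def target_size(width: int, height: int, max_width: int = 768) -> tuple[int, int]:
    """Return (width, height) scaled down so that width <= max_width.

    Assumptions (not confirmed by the report): the aspect ratio is kept
    and images narrower than max_width are never upscaled.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    # Keep at least 1px of height for extremely wide images.
    return max_width, max(1, round(height * scale))
```

The resulting tuple could then be handed to an image library's resize call before the image is encoded and submitted.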
 ---
 ## Summary Comparison
 | Metric                     | Gemini + Original | Gemini + Capstone | Qwen + Original | Qwen + Capstone |
 |----------------------------|:-----------------:|:-----------------:|:----------------:|:---------------:|
 | **Recall (exact)**         | 64.4%             | 60.9%             | 64.4%            | 65.8%           |
 | **Recall (exact+similar)** | **81.2%**         | 78.2%             | 77.2%            | 77.7%           |
 | **Recall (missed)**        | 18.8%             | 21.8%             | 22.8%            | 22.3%           |
 | **Precision (exact)**      | 74.7%             | 70.7%             | 70.7%            | 73.9%           |
 | **Precision (exact+sim.)** | **93.7%**         | 90.2%             | 84.8%            | 87.2%           |
 | **Extra/wrong**            | **6.3%**          | 9.8%              | 15.2%            | 12.8%           |
 | PASS images                | **124**           | 118               | 117              | 119             |
 | PARTIAL images             | 19                | 21                | 18               | 19              |
 | FAIL images                | **18**            | 22                | 26               | 23              |
 | Avg time per image         | 13.3s             | 11.7s             | 9.5s             | 8.9s            |
 ### Key Takeaways
 1. **Gemini + original prompt is the best combination** across all major metrics: highest recall (81.2%), highest precision (93.7%), fewest failures (18), and fewest extra/wrong colors (6.3%).
 2. **Exact recall is remarkably stable** across all four runs (60.9%–65.8%): roughly 35% of ground truth colors are not named exactly by either model under either prompt, and 19–23% are still missed when similar matches are counted.
 3. **Gemini produces far fewer hallucinated colors** than Qwen. Gemini's extra/wrong rate is 6.3%–9.8% vs. Qwen's 12.8%–15.2%. When Gemini detects a color, it is almost always correct.
 4. **The capstone prompt did not improve results** for either model. For Gemini it degraded both recall and precision. For Qwen the difference was negligible.
 5. **Qwen is faster per image** (8.9–9.5s vs. 11.7–13.3s, roughly 24–29% less time with the same prompt), but at the cost of lower accuracy and more false positives.
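
The percentages in the summary table are ratios of match counts. The sketch below shows one way such figures could be derived, assuming recall is measured against the ground truth colors and precision against the colors a run reported. The function name and the example counts are hypothetical and are not taken from the test scripts or results.

```python
def summarize(exact: int, similar: int, ground_truth_total: int,
              reported_total: int) -> dict[str, float]:
    """Turn raw match counts into the percentages used in the summary table.

    exact   -- ground truth colors matched by the identical color term
    similar -- ground truth colors matched only by a same-family term
    Assumption: each found ground truth color corresponds to exactly one
    reported color, so the same counts feed both recall and precision.
    """
    def pct(numerator: int, denominator: int) -> float:
        return round(100.0 * numerator / denominator, 1) if denominator else 0.0

    found = exact + similar
    return {
        "recall_exact": pct(exact, ground_truth_total),
        "recall_exact_similar": pct(found, ground_truth_total),
        "missed": pct(ground_truth_total - found, ground_truth_total),
        "precision_exact": pct(exact, reported_total),
        "precision_exact_similar": pct(found, reported_total),
        "extra_wrong": pct(reported_total - found, reported_total),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = summarize(exact=60, similar=20, ground_truth_total=100, reported_total=90)
```

By construction, "recall (exact+similar)" and "missed" sum to 100%, as do "precision (exact+sim.)" and "extra/wrong", which matches the pattern in every column of the table.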
 ---
 ## Color-Level Analysis
 ### Most Problematic Ground Truth Colors
 Colors most frequently missed across all four test runs:
 | Color           | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap | Total Misses | Common Confusion    |
 |-----------------|:-----------:|:----------:|:---------:|:--------:|:------------:|---------------------|
 | **gray**        | 7           | 6          | 7         | 9        | 29           | Often returned as "grey" (similar match) or missed entirely |
 | **maroon**      | 5           | 9          | 8         | 7        | 29           | Frequently confused with "red"   |
 | **black**       | 7           | 7          | 6         | 6        | 26           | Often not detected at all        |
 | **light blue**  | 2           | 2          | 8         | 5        | 17           | Returned as "blue" (Qwen especially) |
 | **green**       | 3           | 4          | 3         | 4        | 14           | Sometimes returned as "black"    |
 | **blue**        | 3           | 3          | 3         | 2        | 11           | Sometimes not detected at all    |
 | **dark brown**  | 0           | 1          | 4         | 4        | 9            | Returned as "black" or "brown"   |
 | **brown**       | 1           | 1          | 3         | 3        | 8            | Returned as "black" or "orange"  |
 | **teal**        | 2           | 2          | 2         | 2        | 8            | Confused with "green" or "blue"  |
 | **gold/yellow** | 2           | 2          | 1         | 1        | 6            | Occasionally missed entirely     |
 ### Most Common Extra/Wrong Colors Reported
 | Extra Color  | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap | Notes |
 |--------------|:-----------:|:----------:|:---------:|:--------:|-------|
 | **red**      | 3           | 7          | 7         | 6        | Typically a misread of maroon    |
 | **black**    | 2           | 4          | 7         | 7        | Misread of dark brown/green/gray |
 | **blue**     | 3           | 2          | 10        | 6        | Misread of light blue or teal    |
 | **green**    | 1           | 1          | 1         | 1        | Misread of teal                  |
 | **orange**   | 1           | 1          | 1         | 1        | Misread of brown                 |
 ### Similar-Match Confusion Patterns
 These are cases where the VLM returned a color in the same family but not the exact ground truth term:
 | Expected         | Returned As    | Gemini+Orig | Gemini+Cap | Qwen+Orig | Qwen+Cap |
 |------------------|----------------|:-----------:|:----------:|:---------:|:--------:|
 | gray             | grey           | 9           | 10         | —         | —        |
 | navy blue        | blue           | 7           | 6          | 8         | 8        |
 | dark blue        | blue           | 5           | 6          | 10        | 9        |
 | dark brown       | brown          | 5           | 5          | 2         | 2        |
 | gold             | yellow         | 3           | 2          | 5         | 3        |
 | dark blue        | navy blue/navy | 4           | 4          | —         | 1        |
 **Observations:**
 - **gray/grey** is purely a spelling variant — Gemini consistently uses British spelling. Qwen uses "gray" so this never triggers for Qwen.
 - **navy blue → blue** and **dark blue → blue** are the most common simplifications. Both models tend to drop shade qualifiers.
 - **dark brown → brown** follows the same pattern of dropping the shade qualifier.
 - **gold → yellow** reflects a genuine perceptual ambiguity: gold jerseys that look predominantly yellow in the image are reported as yellow.
 ---
 ## Persistently Failed Images
 These 11 images failed across **all four** test runs, representing the hardest cases:
 | Image | GT Colors | Typical VLM Response | Failure Pattern |
 |-------|-----------|---------------------|-----------------|
 | 016 - maroon.jpg            | maroon          | (none) or red        | Maroon not recognized |
 | 029 - maroon_white.jpg      | maroon          | red                  | Maroon → red confusion |
 | 034 - light blue.jpg        | light blue      | blue                 | Shade qualifier dropped |
 | 046 - green.jpg             | green           | black                | Dark green misread as black |
 | 053 - black_white.jpg       | black           | (not detected)       | Black jerseys missed |
 | 057 - gold or yellow.jpg    | gold\|yellow    | (not detected)       | Gold/yellow missed |
 | 132 - brown_white.jpg       | brown           | orange               | Brown → orange confusion |
 | 134 - teal_white.jpg        | teal            | blue or green        | Teal not in model vocabulary |
 | 138 - maroon.jpg            | maroon          | red                  | Maroon → red confusion |
 | 150 - green_gray.jpg        | green, gray     | black                | Both colors misread |
 | 160 - blue_white.jpg        | blue            | (not detected)       | Blue not detected |
 ### Root Cause Categories
 1. **Maroon blindness (3 images):** Both models consistently classify maroon as red. This is the single largest systematic error.
 2. **Dark color confusion (3 images):** Dark green, brown, and black are frequently confused with each other, especially in low-contrast or shadowed images.
 3. **Shade qualifier loss (2 images):** "Light blue" and "teal" are simplified to "blue" or "green" — models use a coarser color vocabulary than the ground truth.
 4. **Non-detection (3 images):** Some jerseys are simply not detected at all, likely due to occlusion, unusual angles, or low image quality.
 ---
 ## Model-Specific Observations
 ### Gemini 3 Flash
 - **Strengths:** Highest precision (93.7%), very few hallucinated colors, good at similar-family matching. Never produced gibberish color names.
 - **Weaknesses:** Consistently uses British "grey" instead of "gray". Slower than local model.
 - **Prompt sensitivity:** The capstone prompt slightly hurt performance (81.2% → 78.2% recall), suggesting the original simpler prompt works better.
 ### Qwen3-VL-8B
 - **Strengths:** Faster inference (8.9–9.5s avg per image). Slightly higher exact match rate with capstone prompt (65.8%).
 - **Weaknesses:** Much higher false positive rate (12.8–15.2% extra/wrong). Struggles significantly with "light blue" (8 misses with original prompt). Produced one gibberish color ("redolas"). Over-reports "blue" and "black".
 - **Prompt sensitivity:** Minimal difference between prompts. Capstone prompt slightly reduced errors.
 ---
 ## Recommendations
 1. **Normalize "grey" → "gray"** in post-processing to eliminate the most common similar-match gap for Gemini.
 2. **Add "maroon" to the prompt** as an explicit color option or example, since both models struggle to distinguish it from red without guidance.
 3. **Consider a constrained color vocabulary** in the prompt (e.g., "Choose from: red, blue, green, yellow, orange, purple, black, gray, brown, maroon, teal, light blue, navy blue, gold, pink") to reduce vocabulary mismatch and shade-qualifier drift.
 4. **Post-processing color mapping** could recover many similar-match cases automatically: navy→navy blue, grey→gray, dark blue→navy blue, etc.
 5. **The original `jersey_prompt.txt` is the better prompt** — the capstone prompt's additional constraints did not improve accuracy for either model.
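
The normalization in recommendations 1 and 4 could look like the following. This is a minimal sketch: the alias table and the `normalize_color` name are hypothetical, and only the two unambiguous aliases are included. A `dark blue` to `navy blue` mapping is left out here because both terms appear as ground truth labels in the similar-match table, so collapsing them would need a decision about which label wins.

```python
# Hypothetical alias table; extend it to match the ground truth vocabulary.
ALIASES = {
    "grey": "gray",       # British spelling used by Gemini
    "navy": "navy blue",  # shorthand for the ground truth term
}


def normalize_color(raw: str) -> str:
    """Lowercase, collapse whitespace, then apply the alias table."""
    cleaned = " ".join(raw.strip().lower().split())
    return ALIASES.get(cleaned, cleaned)
```

Applying this to model output before scoring would turn `grey` into an exact match for `gray` without touching colors that are already in the expected vocabulary.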
 ---
 ## Appendix: Color Similarity Families
 The following color families were used for "similar match" scoring. Two colors count as a similar match if they appear in the same family:
 | Family     | Member Colors                                         |
 |------------|-------------------------------------------------------|
 | blue       | blue, dark blue, navy blue, navy, royal blue          |
 | light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
 | red        | red, scarlet, crimson                                 |
 | dark_red   | maroon, burgundy, dark red, wine                      |
 | green      | green, dark green, forest green, kelly green          |
 | yellow     | yellow, gold, golden                                  |
 | orange     | orange, burnt orange                                  |
 | brown      | brown, dark brown                                     |
 | purple     | purple, violet                                        |
 | gray       | gray, grey, silver, charcoal                          |
 | black      | black                                                 |
 | teal       | teal, turquoise, cyan, aqua                           |
 | pink       | pink, magenta, hot pink, rose                         |
 **Note:** Colors in *different* families are never counted as similar, even if perceptually close (e.g., maroon and red are in separate families; brown and orange are in separate families). This is intentional — the similar-match metric captures vocabulary variation within the same color concept, not genuine color misidentification.
--- a/accuracy_analysis_report_round2.html
+++ b/accuracy_analysis_report_round2.html
@@ -0,0 +1,760 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>Jersey Color Detection Accuracy — Round 2 Analysis</title>
 <style>
  :root {
    --green: #16a34a;
    --green-bg: #dcfce7;
    --red: #dc2626;
    --red-bg: #fee2e2;
    --blue: #2563eb;
    --blue-bg: #dbeafe;
    --amber: #d97706;
    --amber-bg: #fef3c7;
    --gray-50: #f9fafb;
    --gray-100: #f3f4f6;
    --gray-200: #e5e7eb;
    --gray-300: #d1d5db;
    --gray-600: #4b5563;
    --gray-700: #374151;
    --gray-800: #1f2937;
    --gray-900: #111827;
  }
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body {
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
    line-height: 1.6;
    color: var(--gray-800);
    max-width: 1100px;
    margin: 0 auto;
    padding: 2rem;
    background: #fff;
  }
  h1 {
    font-size: 2rem;
    color: var(--gray-900);
    border-bottom: 3px solid var(--blue);
    padding-bottom: 0.5rem;
    margin-bottom: 0.5rem;
  }
  h2 {
    font-size: 1.5rem;
    color: var(--blue);
    margin-top: 2.5rem;
    margin-bottom: 1rem;
    border-bottom: 2px solid var(--gray-200);
    padding-bottom: 0.3rem;
  }
  h3 {
    font-size: 1.15rem;
    color: var(--gray-700);
    margin-top: 1.5rem;
    margin-bottom: 0.5rem;
  }
  .meta {
    color: var(--gray-600);
    font-size: 0.95rem;
    margin-bottom: 1.5rem;
  }
  .meta strong { color: var(--gray-800); }
  p { margin-bottom: 0.75rem; }
  hr {
    border: none;
    border-top: 1px solid var(--gray-200);
    margin: 2rem 0;
  }
  /* Tables */
  table {
    width: 100%;
    border-collapse: collapse;
    margin: 1rem 0 1.5rem;
    font-size: 0.9rem;
  }
  th, td {
    padding: 0.5rem 0.75rem;
    text-align: center;
    border: 1px solid var(--gray-200);
  }
  th {
    background: var(--gray-800);
    color: #fff;
    font-weight: 600;
  }
  td:first-child, th:first-child {
    text-align: left;
    font-weight: 600;
  }
  tr:nth-child(even) { background: var(--gray-50); }
  tr:hover { background: var(--gray-100); }
  /* Highlight classes */
  .best {
    background: var(--green-bg) !important;
    color: var(--green);
    font-weight: 700;
  }
  .worst {
    background: var(--red-bg) !important;
    color: var(--red);
    font-weight: 600;
  }
  .improved {
    background: var(--blue-bg) !important;
    color: var(--blue);
    font-weight: 600;
  }
  .warning {
    background: var(--amber-bg) !important;
    color: var(--amber);
    font-weight: 600;
  }
  /* Callout boxes */
  .callout {
    border-left: 4px solid;
    padding: 1rem 1.25rem;
    margin: 1rem 0;
    border-radius: 0 6px 6px 0;
  }
  .callout-green {
    border-color: var(--green);
    background: var(--green-bg);
  }
  .callout-red {
    border-color: var(--red);
    background: var(--red-bg);
  }
  .callout-blue {
    border-color: var(--blue);
    background: var(--blue-bg);
  }
  .callout-amber {
    border-color: var(--amber);
    background: var(--amber-bg);
  }
  .callout strong { display: block; margin-bottom: 0.25rem; }
  /* Model comparison cards */
  .model-cards {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1.5rem;
    margin: 1rem 0;
  }
  .model-card {
    border: 2px solid var(--gray-200);
    border-radius: 8px;
    padding: 1.25rem;
  }
  .model-card h3 {
    margin-top: 0;
    padding-bottom: 0.4rem;
    border-bottom: 2px solid;
  }
  .model-card.gemini h3 { border-color: var(--blue); color: var(--blue); }
  .model-card.qwen h3 { border-color: var(--green); color: var(--green); }
  .model-card ul {
    list-style: none;
    padding: 0;
    margin: 0.5rem 0 0;
  }
  .model-card li {
    padding: 0.3rem 0;
    font-size: 0.9rem;
  }
  .model-card li strong { color: var(--gray-700); }
  /* Recommendation list */
  ol.recs {
    counter-reset: rec;
    list-style: none;
    padding: 0;
  }
  ol.recs li {
    counter-increment: rec;
    padding: 0.75rem 1rem 0.75rem 3.25rem;
    margin-bottom: 0.5rem;
    border-radius: 6px;
    background: var(--gray-50);
    border: 1px solid var(--gray-200);
    position: relative;
  }
  ol.recs li::before {
    content: counter(rec);
    position: absolute;
    left: 0.75rem;
    top: 0.75rem;
    width: 1.75rem;
    height: 1.75rem;
    background: var(--blue);
    color: #fff;
    border-radius: 50%;
    text-align: center;
    line-height: 1.75rem;
    font-weight: 700;
    font-size: 0.85rem;
  }
  /* Code / prompt block */
  pre {
    background: var(--gray-900);
    color: #e5e7eb;
    padding: 1.25rem;
    border-radius: 8px;
    overflow-x: auto;
    font-size: 0.85rem;
    line-height: 1.5;
    margin: 1rem 0;
  }
  code {
    font-family: "SF Mono", "Fira Code", "Fira Mono", Menlo, Consolas, monospace;
    background: var(--gray-100);
    padding: 0.15rem 0.35rem;
    border-radius: 3px;
    font-size: 0.88em;
  }
  pre code {
    background: none;
    padding: 0;
  }
  /* Color swatch in similarity table */
  .swatch {
    display: inline-block;
    width: 14px;
    height: 14px;
    border-radius: 3px;
    margin-right: 6px;
    vertical-align: middle;
    border: 1px solid var(--gray-300);
  }
  /* Badge */
  .badge {
    display: inline-block;
    padding: 0.15rem 0.5rem;
    border-radius: 4px;
    font-size: 0.8rem;
    font-weight: 700;
    text-transform: uppercase;
    letter-spacing: 0.03em;
  }
  .badge-pass { background: var(--green-bg); color: var(--green); }
  .badge-partial { background: var(--amber-bg); color: var(--amber); }
  .badge-fail { background: var(--red-bg); color: var(--red); }
  /* Print styles */
  @media print {
    body { padding: 0; font-size: 11pt; }
    .callout, .model-card { break-inside: avoid; }
    h2 { break-after: avoid; }
  }
 </style>
 </head>
 <body>
 <h1>Jersey Color Detection Accuracy — Round 2 Analysis</h1>
 <div class="meta">
  <strong>Date:</strong> March 3, 2026<br>
  <strong>Models:</strong> Gemini 3 Flash Preview, Qwen3-VL-8B (local via llama.cpp)<br>
  <strong>Prompts:</strong> jersey_prompt.txt (original), jersey_prompt_capstone.txt (capstone), jersey_prompt_constrained.txt (constrained)<br>
  <strong>Test set:</strong> 161 annotated images, 202 ground truth colors (excluding white)
 </div>
 <hr>
 <h2>Summary Comparison</h2>
 <table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Qwen Original</th>
      <th>Qwen Capstone</th>
      <th>Qwen Constrained</th>
      <th>Gemini Original</th>
      <th>Gemini Capstone</th>
      <th>Gemini Constrained</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Recall (exact)</td>
      <td>65.3%</td>
      <td>66.3%</td>
      <td class="best">71.8%</td>
      <td>62.4%</td>
      <td class="worst">60.9%</td>
      <td class="improved">67.8%</td>
    </tr>
    <tr>
      <td>Recall (exact+similar)</td>
      <td>78.2%</td>
      <td>78.2%</td>
      <td class="best">82.7%</td>
      <td>79.7%</td>
      <td class="worst">78.2%</td>
      <td class="improved">81.7%</td>
    </tr>
    <tr>
      <td>Missed</td>
      <td>21.8%</td>
      <td>21.8%</td>
      <td class="best">17.3%</td>
      <td>20.3%</td>
      <td class="worst">21.8%</td>
      <td class="improved">18.3%</td>
    </tr>
    <tr>
      <td>Precision (exact)</td>
      <td>71.7%</td>
      <td>74.0%</td>
      <td class="improved">78.4%</td>
      <td>72.0%</td>
      <td class="worst">69.5%</td>
      <td class="best">78.7%</td>
    </tr>
    <tr>
      <td>Precision (exact+sim.)</td>
      <td>85.9%</td>
      <td>87.3%</td>
      <td class="improved">90.3%</td>
      <td>91.4%</td>
      <td class="worst">88.7%</td>
      <td class="best">94.3%</td>
    </tr>
    <tr>
      <td>Extra/wrong</td>
      <td>14.1%</td>
      <td>12.7%</td>
      <td class="improved">9.7%</td>
      <td>8.6%</td>
      <td class="worst">11.3%</td>
      <td class="best">5.7%</td>
    </tr>
    <tr>
      <td><span class="badge badge-pass">PASS</span></td>
      <td>118</td>
      <td>120</td>
      <td class="best">127</td>
      <td>120</td>
      <td>117</td>
      <td class="improved">124</td>
    </tr>
    <tr>
      <td><span class="badge badge-partial">PARTIAL</span></td>
      <td>19</td>
      <td>19</td>
      <td class="best">15</td>
      <td>20</td>
      <td class="worst">22</td>
      <td>19</td>
    </tr>
    <tr>
      <td><span class="badge badge-fail">FAIL</span></td>
      <td>24</td>
      <td>22</td>
      <td>19</td>
      <td>21</td>
      <td>22</td>
      <td class="best">18</td>
    </tr>
    <tr>
      <td>Total time</td>
      <td>1557s</td>
      <td>1437s</td>
      <td>1596s</td>
      <td class="best">253s</td>
      <td>260s</td>
      <td>344s</td>
    </tr>
  </tbody>
 </table>
 <hr>
 <h2>Key Findings</h2>
 <h3>1. The constrained prompt is the best prompt for both models</h3>
 <p>The constrained vocabulary prompt delivered the strongest results across the board:</p>
 <div class="callout callout-green">
  <strong>Qwen + Constrained</strong>
  Achieved the highest recall of any combination at <strong>82.7%</strong> (167/202 found), up from 78.2% with both other prompts. It also posted the most PASS images of any run (<strong>127</strong>, up from 118/120) and Qwen's fewest FAIL images (<strong>19</strong>, down from 24/22).
 </div>
 <div class="callout callout-blue">
  <strong>Gemini + Constrained</strong>
  Achieved the highest precision of any combination at <strong>94.3%</strong> (164/174 correct), with only <strong>5.7% extra/wrong</strong> colors, the lowest error rate across all six runs. It also had the fewest failures of the six runs at <strong>18</strong>, matching the Gemini + original result from the earlier sequential test.
 </div>
 <h3>2. Exact match rates jumped significantly</h3>
 <p>The constrained prompt's biggest impact was converting similar matches into exact matches by forcing models to use the ground truth vocabulary:</p>
 <table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Exact Match (Original)</th>
      <th>Exact Match (Constrained)</th>
      <th>Improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Qwen</td>
      <td>65.3% (132)</td>
      <td class="best">71.8% (145)</td>
      <td class="improved">+6.5 pp</td>
    </tr>
    <tr>
      <td>Gemini</td>
      <td>62.4% (126)</td>
      <td class="best">67.8% (137)</td>
      <td class="improved">+5.4 pp</td>
    </tr>
  </tbody>
 </table>
 <p>This came partly from eliminating vocabulary mismatch (e.g., grey→gray, navy→navy blue) and partly from teaching models to use specific color terms like "maroon" and "light blue."</p>
 <h3>3. Targeted color improvements</h3>
 <p>The constrained prompt's explicit color guidance fixed the worst systematic errors:</p>
 <table>
  <thead>
    <tr>
      <th>Problem Color</th>
      <th>Qwen Misses (Orig→Constrained)</th>
      <th>Gemini Misses (Orig→Constrained)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><span class="swatch" style="background:#800000"></span>maroon</td>
      <td>8 → <span style="color:var(--green);font-weight:700">3</span></td>
      <td>6 → <span style="color:var(--green);font-weight:700">3</span></td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#87ceeb"></span>light blue</td>
      <td>7 → <span style="color:var(--green);font-weight:700">1</span></td>
      <td>3 → <span style="color:var(--green);font-weight:700">1</span></td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#3e2723"></span>dark brown</td>
      <td>4 → <span style="color:var(--green);font-weight:700">2</span></td>
      <td>1 → 1</td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#008080"></span>teal</td>
      <td>2 → 2</td>
      <td>2 → 2</td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#9e9e9e"></span>gray</td>
      <td class="warning">7 → 8</td>
      <td>6 → 6</td>
    </tr>
    <tr>
      <td><span class="swatch" style="background:#222"></span>black</td>
      <td>6 → 6</td>
      <td>7 → 7</td>
    </tr>
  </tbody>
 </table>
 <ul>
  <li><strong>Maroon:</strong> Misses cut by half or more for both models (8→3 for Qwen, 6→3 for Gemini). Previously the most-missed color for Qwen; now ranks 5th.</li>
  <li><strong>Light blue:</strong> Near-elimination of the "light blue → blue" confusion for both models (7→1 for Qwen, 3→1 for Gemini).</li>
  <li><strong>Gray/grey:</strong> The spelling normalization instruction eliminated the grey→gray similar-match penalty for Gemini entirely (10 confusions → 0). However, gray detection misses did not improve (6 → 6 for Gemini, 7 → 8 for Qwen). These are images where gray jerseys aren't detected at all, not a naming issue.</li>
  <li><strong>Teal and black</strong> remain stubbornly problematic regardless of prompt.</li>
 </ul>
 <h3>4. New overcorrection pattern with constrained prompt</h3>
 <div class="callout callout-amber">
  <strong>Overcorrection Warning</strong>
  The constrained prompt introduced a new failure mode — models now occasionally over-apply newly-learned color terms.
 </div>
 <ul>
  <li><strong>Qwen + Constrained</strong> reported "maroon" as an extra/wrong color <strong>5 times</strong> (was 0 previously). It now calls some brown, red, and orange jerseys "maroon", the opposite of the original problem. Cases include: 007 (brown→maroon), 031 (brown→maroon), 048 (red→maroon), 142 (orange→maroon).</li>
  <li><strong>Gemini + Constrained</strong> reported "light blue" as an extra/wrong color <strong>2 times</strong> (was 0 previously), including misidentifying navy blue as light blue (image 081).</li>
 </ul>
 <p>This overcorrection is a smaller problem than the original misses it replaced, but worth noting.</p>
 <h3>5. The capstone prompt did not improve results</h3>
 <div class="callout callout-red">
  <strong>Capstone Prompt: No Benefit</strong>
  The capstone prompt performed about the same as the original prompt for Qwen and slightly below it for Gemini. Its emphasis on precision over recall ("do not guess") did not help detection rates (flat for Qwen, lower for Gemini) and did not meaningfully improve color accuracy.
 </div>
 <ul>
  <li>Qwen: 78.2% recall (same), 87.3% precision (slight improvement)</li>
  <li>Gemini: 78.2% recall (down from 79.7%), 88.7% precision (down from 91.4%)</li>
 </ul>
 <h3>6. Gemini speed improvement from concurrency</h3>
 <p>The concurrent processing optimization (8 workers + session reuse + JPEG quality 85) delivered major speed gains:</p>
 <table>
  <thead>
    <tr>
      <th>Prompt</th>
      <th>Previous Sequential Runs</th>
      <th>Current Concurrent Runs</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Original</td>
      <td>2134s (13.3s avg)</td>
      <td class="best">253s (1.6s avg)</td>
    </tr>
    <tr>
      <td>Capstone</td>
      <td>1882s (11.7s avg)</td>
      <td class="best">260s (1.6s avg)</td>
    </tr>
    <tr>
      <td>Constrained</td>
      <td>—</td>
      <td>344s (2.1s avg)</td>
    </tr>
  </tbody>
 </table>
 <p>That's roughly a <strong>7–8x speedup</strong> for the first two prompts (2134s → 253s and 1882s → 260s). The constrained prompt run was slower (344s), likely because of its longer prompt text (2223 chars vs ~1500 chars).</p>
 <hr>
 <h2>Persistently Failed Images</h2>
 <p>These <strong>10 images</strong> failed across all six runs, representing the hardest cases for current VLMs regardless of model or prompt:</p>
 <table>
  <thead>
    <tr>
      <th>Image</th>
      <th>GT Colors</th>
      <th>Typical Error</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>016 - maroon.jpg</td>
      <td><span class="swatch" style="background:#800000"></span>maroon</td>
      <td class="worst">Not detected or called "red"</td>
    </tr>
    <tr>
      <td>034 - light blue.jpg</td>
      <td><span class="swatch" style="background:#87ceeb"></span>light blue</td>
      <td class="worst">Called "blue"</td>
    </tr>
    <tr>
      <td>046 - green.jpg</td>
      <td><span class="swatch" style="background:#388e3c"></span>green</td>
      <td class="worst">Called "black"</td>
    </tr>
    <tr>
      <td>053 - black_white.jpg</td>
      <td><span class="swatch" style="background:#222"></span>black</td>
      <td class="worst">Not detected</td>
    </tr>
    <tr>
      <td>077 - teal_white.jpg</td>
      <td><span class="swatch" style="background:#008080"></span>teal</td>
      <td class="worst">Called "green"</td>
    </tr>
    <tr>
      <td>132 - brown_white.jpg</td>
      <td><span class="swatch" style="background:#795548"></span>brown</td>
      <td class="worst">Called "orange"</td>
    </tr>
    <tr>
      <td>134 - teal_white.jpg</td>
      <td><span class="swatch" style="background:#008080"></span>teal</td>
      <td class="worst">Called "blue" or "light blue"</td>
    </tr>
    <tr>
      <td>138 - maroon.jpg</td>
      <td><span class="swatch" style="background:#800000"></span>maroon</td>
      <td class="worst">Called "red"</td>
    </tr>
    <tr>
      <td>150 - green_gray.jpg</td>
      <td><span class="swatch" style="background:#388e3c"></span>green, <span class="swatch" style="background:#9e9e9e"></span>gray</td>
      <td class="worst">Called "black"</td>
    </tr>
    <tr>
      <td>160 - blue_white.jpg</td>
      <td><span class="swatch" style="background:#2196f3"></span>blue</td>
      <td class="worst">Not detected</td>
    </tr>
  </tbody>
 </table>
 <p>Notable improvements: Images <strong>029</strong> (maroon), <strong>087/141/161</strong> (light blue), and <strong>099</strong> (maroon) were previously persistent failures but were <strong>fixed by the constrained prompt</strong> for at least one model.</p>
 <hr>
 <h2>Model Comparison</h2>
 <div class="model-cards">
  <div class="model-card gemini">
    <h3>Gemini 3 Flash</h3>
    <ul>
      <li><strong>Best at:</strong> Precision (94.3% with constrained prompt), fewest hallucinated colors</li>
      <li><strong>Weakness:</strong> Lower exact recall than Qwen; still uses shade variants even with constraints</li>
      <li><strong>Speed:</strong> ~250–340s with 8 concurrent workers</li>
    </ul>
  </div>
  <div class="model-card qwen">
    <h3>Qwen3-VL-8B</h3>
    <ul>
      <li><strong>Best at:</strong> Recall (82.7% with constrained prompt), highest PASS count (127)</li>
      <li><strong>Weakness:</strong> Higher false positive rate; introduced "maroon" overcorrection with constrained prompt</li>
      <li><strong>Speed:</strong> ~1440–1600s sequential (local GPU inference)</li>
    </ul>
  </div>
 </div>
 <hr>
 <h2>Recommendations</h2>
 <ol class="recs">
  <li><strong>Use the constrained prompt</strong> (<code>jersey_prompt_constrained.txt</code>) — it is the clear winner for both models, improving recall and precision simultaneously.</li>
  <li><strong>Post-processing normalization</strong> could still recover additional matches: map <code>grey</code> → <code>gray</code> (catches any remaining Gemini outputs) and <code>navy</code> → <code>navy blue</code> (catches shorthand usage).</li>
  <li><strong>Consider a brown/maroon calibration</strong> — the constrained prompt overcorrected on Qwen, turning brown→maroon confusion into a new error source. Adding "Use 'brown' for warm, non-reddish dark colors" or similar guidance may help.</li>
  <li><strong>Gray and black detection remain unsolved</strong> at the prompt level. Misses for these colors did not improve with the constrained prompt, which points to image quality or model perception limits rather than prompt wording. These colors may benefit from a secondary computer vision pass (e.g., dominant color extraction from the jersey region).</li>
  <li><strong>Retire the capstone prompt</strong>: it offered no meaningful benefit over the original and performed worse than the constrained prompt on every accuracy metric.</li>
 </ol>
 <hr>
 <h2>Appendix: Color Similarity Families Used for Scoring</h2>
 <table>
  <thead>
    <tr>
      <th>Family</th>
      <th>Member Colors</th>
    </tr>
  </thead>
  <tbody>
    <tr><td><span class="swatch" style="background:#2196f3"></span>blue</td><td>blue, dark blue, navy blue, navy, royal blue</td></tr>
    <tr><td><span class="swatch" style="background:#87ceeb"></span>light_blue</td><td>light blue, sky blue, baby blue, carolina blue, powder blue</td></tr>
    <tr><td><span class="swatch" style="background:#f44336"></span>red</td><td>red, scarlet, crimson</td></tr>
    <tr><td><span class="swatch" style="background:#800000"></span>dark_red</td><td>maroon, burgundy, dark red, wine</td></tr>
    <tr><td><span class="swatch" style="background:#388e3c"></span>green</td><td>green, dark green, forest green, kelly green</td></tr>
    <tr><td><span class="swatch" style="background:#fdd835"></span>yellow</td><td>yellow, gold, golden</td></tr>
    <tr><td><span class="swatch" style="background:#ff9800"></span>orange</td><td>orange, burnt orange</td></tr>
    <tr><td><span class="swatch" style="background:#795548"></span>brown</td><td>brown, dark brown</td></tr>
    <tr><td><span class="swatch" style="background:#9c27b0"></span>purple</td><td>purple, violet</td></tr>
    <tr><td><span class="swatch" style="background:#9e9e9e"></span>gray</td><td>gray, grey, silver, charcoal</td></tr>
    <tr><td><span class="swatch" style="background:#222"></span>black</td><td>black</td></tr>
    <tr><td><span class="swatch" style="background:#008080"></span>teal</td><td>teal, turquoise, cyan, aqua</td></tr>
    <tr><td><span class="swatch" style="background:#e91e63"></span>pink</td><td>pink, magenta, hot pink, rose</td></tr>
  </tbody>
 </table>
 <hr>
 <h2>Appendix: Constrained Prompt (<code>jersey_prompt_constrained.txt</code>)</h2>
 <pre><code>You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
 CRITICAL INSTRUCTIONS:
 1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
 2. ONLY include jersey numbers that you can ACTUALLY READ in the image
 3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
 4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
 5. DO NOT include jerseys if you cannot clearly see the number
 COLOR VOCABULARY:
 For "jersey_color" and "number_color", you MUST choose from this list ONLY:
 red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
 Important color distinctions:
 - Use "maroon" for dark brownish-red, NOT "red"
 - Use "light blue" for pale or sky blue, NOT "blue"
 - Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
 - Use "teal" for blue-green, NOT "green" or "blue"
 - Use "gray" (not "grey") for silver or neutral tones
 - Use "dark brown" for very dark brown, NOT "black"
 - Use "gold" for metallic or deep yellow, NOT "yellow"
 RESPONSE FORMAT:
 Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
 Use DOUBLE QUOTES (") for all JSON keys and string values.
 The JSON must have a single key "jerseys" with an array of dictionaries.
 Each dictionary must have exactly these three keys:
 - "jersey_number": The number on the jersey (as a string, only if clearly visible)
 - "jersey_color": The primary color of the jersey (MUST be from the color list above)
 - "number_color": The color of the number on the jersey (MUST be from the color list above)
 Example response for an image WITH visible jerseys:
 {
  "jerseys": [
    {
      "jersey_number": "10",
      "jersey_color": "maroon",
      "number_color": "gold"
    },
    {
      "jersey_number": "42",
      "jersey_color": "light blue",
      "number_color": "white"
    }
  ]
 }
 Example response for an image WITHOUT jerseys or with unclear numbers:
 {"jerseys": []}
 REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
 Now analyze the image and return the JSON object.</code></pre>
 </body>
 </html>
--- a/accuracy_analysis_report_round2.md
+++ b/accuracy_analysis_report_round2.md
@ -0,0 +1,229 @@
 # Jersey Color Detection Accuracy — Round 2 Analysis
 **Date:** March 3, 2026
 **Models:** Gemini 3 Flash Preview, Qwen3-VL-8B (local via llama.cpp)
 **Prompts:** jersey_prompt.txt (original), jersey_prompt_capstone.txt (capstone), jersey_prompt_constrained.txt (constrained)
 **Test set:** 161 annotated images, 202 ground truth colors (excluding white)
 ---
 ## Summary Comparison
 | Metric                     | Qwen Original | Qwen Capstone | Qwen Constrained | Gemini Original | Gemini Capstone | Gemini Constrained |
 |----------------------------|:-------------:|:-------------:|:-----------------:|:---------------:|:---------------:|:------------------:|
 | **Recall (exact)**         | 65.3%         | 66.3%         | **71.8%**         | 62.4%           | 60.9%           | 67.8%              |
 | **Recall (exact+similar)** | 78.2%         | 78.2%         | **82.7%**         | 79.7%           | 78.2%           | 81.7%              |
 | **Missed**                 | 21.8%         | 21.8%         | **17.3%**         | 20.3%           | 21.8%           | 18.3%              |
 | **Precision (exact)**      | 71.7%         | 74.0%         | 78.4%             | 72.0%           | 69.5%           | **78.7%**          |
 | **Precision (exact+sim.)** | 85.9%         | 87.3%         | 90.3%             | 91.4%           | 88.7%           | **94.3%**          |
 | **Extra/wrong**            | 14.1%         | 12.7%         | 9.7%              | 8.6%            | 11.3%           | **5.7%**           |
 | PASS                       | 118           | 120           | **127**           | 120             | 117             | 124                |
 | PARTIAL                    | 19            | 19            | **15**            | 20              | 22              | 19                 |
 | FAIL                       | 24            | 22            | 19                | 21              | 22              | **18**             |
 | Total time                 | 1557s         | 1437s         | 1596s             | 253s            | 260s            | 344s               |
 ---
 ## Key Findings
 ### 1. The constrained prompt is the best prompt for both models
 The constrained vocabulary prompt delivered the strongest results across the board:
 - **Qwen + Constrained** achieved the highest recall of any combination at **82.7%** (167/202 found), up from 78.2% with both other prompts. It also posted the most PASS images of any run (**127**, up from 118 and 120) and Qwen's fewest FAIL images (**19**, down from 24 and 22).
 - **Gemini + Constrained** achieved the highest precision of any combination at **94.3%** (164/174 correct), with only **5.7% extra/wrong** colors, the lowest error rate across all six runs. It also had the fewest FAIL images of the six runs (**18**).
 ### 2. Exact match rates jumped significantly
 The constrained prompt's biggest impact was converting similar matches into exact matches by forcing models to use the ground truth vocabulary:
 | Model  | Exact Match (Original) | Exact Match (Constrained) | Improvement |
 |--------|:----------------------:|:-------------------------:|:-----------:|
 | Qwen   | 65.3% (132)            | **71.8% (145)**           | +6.5 pp     |
 | Gemini | 62.4% (126)            | **67.8% (137)**           | +5.4 pp     |
 This came partly from eliminating vocabulary mismatch (e.g., grey→gray, navy→navy blue) and partly from teaching models to use specific color terms like "maroon" and "light blue."
 ### 3. Targeted color improvements
 The constrained prompt's explicit color guidance fixed the worst systematic errors:
 | Problem Color  | Qwen Misses (Orig→Constrained) | Gemini Misses (Orig→Constrained) |
 |----------------|:------------------------------:|:--------------------------------:|
 | **maroon**     | 8 → **3**                      | 6 → **3**                        |
 | **light blue** | 7 → **1**                      | 3 → **1**                        |
 | **dark brown** | 4 → **2**                      | 1 → 1                            |
 | **teal**       | 2 → 2                          | 2 → 2                            |
 | **gray**       | 7 → 8                          | 6 → 6                            |
 | **black**      | 6 → 6                          | 7 → 7                            |
 - **Maroon:** Misses fell by half or more for both models (8→3 for Qwen, 6→3 for Gemini). Previously the most-missed color for Qwen; now ranks 5th.
 - **Light blue:** Near-elimination of the "light blue → blue" confusion for both models (7→1 for Qwen, 3→1 for Gemini).
 - **Gray/grey:** The spelling normalization instruction eliminated the grey→gray similar-match penalty for Gemini entirely (10 confusions → 0). However, gray detection misses did not improve (7→8 for Qwen, 6→6 for Gemini). These are images where gray jerseys aren't detected at all, not a naming issue.
 - **Teal and black** remain stubbornly problematic regardless of prompt.
 ### 4. New overcorrection pattern with constrained prompt
 The constrained prompt introduced a new failure mode — models now occasionally over-apply newly-learned color terms:
 - **Qwen + Constrained** reported "maroon" as an extra/wrong color **5 times** (was 0 previously). It now labels some brown, red, and orange jerseys "maroon", the opposite of the original problem. Cases include: 007 (brown→maroon), 031 (brown→maroon), 048 (red→maroon), 142 (orange→maroon).
 - **Gemini + Constrained** reported "light blue" as an extra/wrong color **2 times** (was 0 previously), including misidentifying navy blue as light blue (image 081).
 This overcorrection is a smaller problem than the original misses it replaced, but worth noting.
 ### 5. The capstone prompt did not improve results
 The capstone prompt gave Qwen at most marginal gains and left Gemini slightly worse than the original prompt:
 - Qwen: 78.2% recall (same), 87.3% precision (up slightly from 85.9%)
 - Gemini: 78.2% recall (down from 79.7%), 88.7% precision (down from 91.4%)
 The capstone prompt's emphasis on precision over recall ("do not guess") may have hurt overall detection rates without meaningfully improving color accuracy.
 ### 6. Gemini speed improvement from concurrency
 The concurrent processing optimization (8 workers + session reuse + JPEG quality 85) delivered major speed gains for the Gemini runs:
 | Prompt      | Previous sequential run | Current concurrent run |
 |-------------|:-----------------------:|:----------------------:|
 | Original    | 2134s (13.3s avg)       | 253s (1.6s avg)        |
 | Capstone    | 1882s (11.7s avg)       | 260s (1.6s avg)        |
 | Constrained | n/a                     | 344s (2.1s avg)        |
 That's roughly a **7-8x speedup** for the first two prompts (2134/253 ≈ 8.4, 1882/260 ≈ 7.2). The constrained prompt run was somewhat slower (344s), likely due to its longer prompt text (2223 chars vs ~1500 chars).
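
A minimal sketch of the worker-pool pattern behind these numbers, using only the standard library. `query_model` and `fake_query` are hypothetical stand-ins for the real API call; the actual test script, its HTTP session reuse, and the JPEG re-encoding step are not shown in this report, so none of that is reproduced here. The `speedup` helper only repeats the arithmetic from the timing table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_concurrently(
    image_paths: list[str],
    query_model: Callable[[str], dict],
    max_workers: int = 8,
) -> list[dict]:
    """Call `query_model` on every image using a pool of worker threads.

    Results are returned in the same order as `image_paths`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(query_model, image_paths))


def speedup(sequential_s: float, concurrent_s: float) -> float:
    """Ratio of sequential wall-clock time to concurrent wall-clock time."""
    return sequential_s / concurrent_s


def fake_query(path: str) -> dict:
    """Hypothetical placeholder for the real model request."""
    return {"image": path, "jerseys": []}


if __name__ == "__main__":
    print(run_concurrently(["a.jpg", "b.jpg"], fake_query))
    print(round(speedup(2134, 253), 1), round(speedup(1882, 260), 1))
```

Because `pool.map` preserves input order, per-image results can still be matched to the ground truth by position.
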
 ---
 ## Persistently Failed Images
 These **10 images** failed across all six runs, representing the hardest cases for current VLMs regardless of model or prompt:
 | Image | GT Colors | Typical Error |
 |-------|-----------|---------------|
 | 016 - maroon.jpg             | maroon       | Not detected or called "red" |
 | 034 - light blue.jpg         | light blue   | Called "blue" |
 | 046 - green.jpg              | green        | Called "black" |
 | 053 - black_white.jpg        | black        | Not detected |
 | 077 - teal_white.jpg         | teal         | Called "green" |
 | 132 - brown_white.jpg        | brown        | Called "orange" |
 | 134 - teal_white.jpg         | teal         | Called "blue" or "light blue" |
 | 138 - maroon.jpg             | maroon       | Called "red" |
 | 150 - green_gray.jpg         | green, gray  | Called "black" |
 | 160 - blue_white.jpg         | blue         | Not detected |
 Notable improvements: Images **029** (maroon), **087/141/161** (light blue), and **099** (maroon) were previously persistent failures but were **fixed by the constrained prompt** for at least one model.
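
One way to derive a "failed in every run" list is a set intersection over the per-run FAIL lists. The sketch below assumes each run's failures are available as a set of image names; the function name and the toy run labels and image names are illustrative only and are not taken from the test scripts or results.

```python
def persistent_failures(failed_by_run: dict[str, set[str]]) -> list[str]:
    """Return the images that appear in the FAIL set of every run, sorted."""
    if not failed_by_run:
        return []
    common = set.intersection(*failed_by_run.values())
    return sorted(common)


if __name__ == "__main__":
    # Toy data, not real results.
    toy_runs = {
        "run_a": {"img_1", "img_2", "img_3"},
        "run_b": {"img_2", "img_3"},
        "run_c": {"img_2", "img_3", "img_9"},
    }
    print(persistent_failures(toy_runs))
```
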
 ---
 ## Model Comparison
 ### Gemini 3 Flash
 - **Best at:** Precision (94.3% with constrained prompt), fewest hallucinated colors
 - **Weakness:** Lower exact recall than Qwen; still uses shade variants even with constraints
 - **Speed:** ~250-340s with 8 concurrent workers
 ### Qwen3-VL-8B
 - **Best at:** Recall (82.7% with constrained prompt), highest PASS count (127)
 - **Weakness:** Higher false positive rate; introduced "maroon" overcorrection with constrained prompt
 - **Speed:** ~1440-1600s sequential (local GPU inference)
 ---
 ## Recommendations
 1. **Use the constrained prompt** (`jersey_prompt_constrained.txt`) — it is the clear winner for both models, improving recall and precision simultaneously.
 2. **Post-processing normalization** could still recover additional matches:
   - Map `grey` → `gray` (catches any remaining Gemini outputs)
   - Map `navy` → `navy blue` (catches shorthand usage)
 3. **Consider a brown/maroon calibration** — the constrained prompt overcorrected on Qwen, turning brown→maroon confusion into a new error source. Adding "Use 'brown' for warm, non-reddish dark colors" or similar guidance may help.
 4. **Gray and black detection remain unsolved** at the prompt level. These misses likely reflect image quality or model perception limits that prompt changes alone are unlikely to fix; a secondary computer vision pass (e.g., dominant color extraction from the jersey region) may help.
 5. **Retire the capstone prompt**: it offered no consistent benefit over the original and scored worse than the constrained prompt on every accuracy metric.
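
Recommendation 2 could be sketched as a small lookup applied to each model output before scoring. The alias table holds only the two mappings named above; the function name and the case and whitespace handling are assumptions, not part of the existing test scripts.

```python
# Only the two aliases named in the recommendations; extend as needed.
COLOR_ALIASES = {
    "grey": "gray",
    "navy": "navy blue",
}


def normalize_color(name: str) -> str:
    """Lower-case, collapse whitespace, then map known aliases."""
    cleaned = " ".join(name.strip().lower().split())
    return COLOR_ALIASES.get(cleaned, cleaned)


if __name__ == "__main__":
    for raw in ["Grey", " navy ", "navy blue", "maroon"]:
        print(repr(raw), "->", normalize_color(raw))
```

The lookup is applied to the whole cleaned string, so `navy blue` is left alone and does not become `navy blue blue`.
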
 ---
 ## Appendix: Color Similarity Families Used for Scoring
 | Family     | Member Colors                                         |
 |------------|-------------------------------------------------------|
 | blue       | blue, dark blue, navy blue, navy, royal blue          |
 | light_blue | light blue, sky blue, baby blue, carolina blue, powder blue |
 | red        | red, scarlet, crimson                                 |
 | dark_red   | maroon, burgundy, dark red, wine                      |
 | green      | green, dark green, forest green, kelly green          |
 | yellow     | yellow, gold, golden                                  |
 | orange     | orange, burnt orange                                  |
 | brown      | brown, dark brown                                     |
 | purple     | purple, violet                                        |
 | gray       | gray, grey, silver, charcoal                          |
 | black      | black                                                 |
 | teal       | teal, turquoise, cyan, aqua                           |
 | pink       | pink, magenta, hot pink, rose                         |
 ---
 ## Appendix: Constrained Prompt (`jersey_prompt_constrained.txt`)
 ```
 You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
 CRITICAL INSTRUCTIONS:
 1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
 2. ONLY include jersey numbers that you can ACTUALLY READ in the image
 3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
 4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
 5. DO NOT include jerseys if you cannot clearly see the number
 COLOR VOCABULARY:
 For "jersey_color" and "number_color", you MUST choose from this list ONLY:
 red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
 Important color distinctions:
 - Use "maroon" for dark brownish-red, NOT "red"
 - Use "light blue" for pale or sky blue, NOT "blue"
 - Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
 - Use "teal" for blue-green, NOT "green" or "blue"
 - Use "gray" (not "grey") for silver or neutral tones
 - Use "dark brown" for very dark brown, NOT "black"
 - Use "gold" for metallic or deep yellow, NOT "yellow"
 RESPONSE FORMAT:
 Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
 Use DOUBLE QUOTES (") for all JSON keys and string values.
 The JSON must have a single key "jerseys" with an array of dictionaries.
 Each dictionary must have exactly these three keys:
 - "jersey_number": The number on the jersey (as a string, only if clearly visible)
 - "jersey_color": The primary color of the jersey (MUST be from the color list above)
 - "number_color": The color of the number on the jersey (MUST be from the color list above)
 Example response for an image WITH visible jerseys:
 {
  "jerseys": [
    {
      "jersey_number": "10",
      "jersey_color": "maroon",
      "number_color": "gold"
    },
    {
      "jersey_number": "42",
      "jersey_color": "light blue",
      "number_color": "white"
    }
  ]
 }
 Example response for an image WITHOUT jerseys or with unclear numbers:
 {"jerseys": []}
 REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
 Now analyze the image and return the JSON object.
 ```
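
A response produced under this prompt can be checked mechanically before scoring. The sketch below assumes the raw model output is available as a string; `validate_response` is a hypothetical helper, and the allowed-color set is copied from the prompt's vocabulary list.

```python
import json

ALLOWED_COLORS = {
    "red", "blue", "dark blue", "navy blue", "light blue", "green",
    "yellow", "gold", "orange", "purple", "black", "white", "gray",
    "brown", "dark brown", "maroon", "teal", "pink",
}
REQUIRED_KEYS = {"jersey_number", "jersey_color", "number_color"}


def validate_response(raw: str) -> list[str]:
    """Return a list of problems found in a model response (empty if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    if (
        not isinstance(data, dict)
        or set(data) != {"jerseys"}
        or not isinstance(data["jerseys"], list)
    ):
        return ['top level must be an object with a single "jerseys" array']

    problems = []
    for i, jersey in enumerate(data["jerseys"]):
        if not isinstance(jersey, dict) or set(jersey) != REQUIRED_KEYS:
            problems.append(f"jersey {i}: expected exactly keys {sorted(REQUIRED_KEYS)}")
            continue
        if not isinstance(jersey["jersey_number"], str):
            problems.append(f"jersey {i}: jersey_number must be a string")
        for key in ("jersey_color", "number_color"):
            value = jersey[key]
            if not isinstance(value, str) or value not in ALLOWED_COLORS:
                problems.append(f"jersey {i}: {key}={value!r} is not in the allowed color list")
    return problems


if __name__ == "__main__":
    print(validate_response('{"jerseys": []}'))
    print(validate_response(
        '{"jerseys": [{"jersey_number": "10", "jersey_color": "grey", "number_color": "white"}]}'
    ))
```

Counting how often a run triggers the "not in the allowed color list" message would give a direct measure of how well a model respects the constrained vocabulary.
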
--- a/accuracy_test_results.md
+++ b/accuracy_test_results.md
@ -0,0 +1,490 @@
 # Gemini 3 Flash Results (Prompt: jersey_prompt.txt):
 ================================================================================
 ACCURACY SUMMARY  (gemini-3-flash-preview)
 ================================================================================
 Images processed:       161
 Errors:                 0
 Total time:             2134.4s (13.3s avg)
 Ground truth colors:    202  (excluding white)
 VLM unique colors:      174  (excluding white)
 --- Recall (did VLM find each ground truth color?) ---
  Exact match:           130 / 202  (64.4%)
  Similar match:          34 / 202  (16.8%)
  Total found:           164 / 202  (81.2%)
  Missed:                 38 / 202  (18.8%)
 --- Precision (are VLM colors correct?) ---
  Exact match:           130 / 174  (74.7%)
  Similar match:          33 / 174  (19.0%)
  Total correct:         163 / 174  (93.7%)
  Extra/wrong:            11 / 174  (6.3%)
 --- Similar-Match Confusions (expected -> got) ---
  gray                           -> grey                  x9
  navy blue                      -> blue                  x7
  dark brown                     -> brown                 x5
  dark blue                      -> blue                  x5
  gold                           -> yellow                x3
  dark blue                      -> navy blue             x3
  navy                           -> navy blue             x1
  dark blue                      -> navy                  x1
 --- Most Missed Ground Truth Colors ---
  gray                             7  #######
  black                            7  #######
  maroon                           5  #####
  blue                             3  ###
  green                            3  ###
  gold                             2  ##
  light blue                       2  ##
  gold|yellow                      2  ##
  red                              2  ##
  teal                             2  ##
  orange                           1  #
  yellow                           1  #
  brown                            1  #
 --- Most Common Extra/Wrong VLM Colors ---
  red                              3  ###
  blue                             3  ###
  black                            2  ##
  green                            1  #
  orange                           1  #
  dark blue                        1  #
 --- Per-Image Verdict ---
  PASS        124
  PARTIAL      19
  FAIL         18
 --- Failed Images (18) ---
  016 - maroon.jpg
    missed: maroon
  029 -maroon_white.jpg
    missed: maroon
    extra:  red
  034 - light blue.jpg
    missed: light blue
    extra:  blue
  046 - green.jpg
    missed: green
    extra:  black
  048 - red.jpg
    missed: red
  053 - black_white.jpg
    missed: black
  057 - white_gold or yellow.jpg
    missed: gold|yellow
  069 - red_white.jpg
    missed: red
  074 - white_orange.jpg
    missed: orange
  077 - teal_white.jpg
    missed: teal
    extra:  green
  088 - white_maroon.jpg
    missed: maroon
  129 - blue_white.jpg
    missed: blue
  132 - brown_white.jpg
    missed: brown
    extra:  orange
  134 - teal_white.jpg
    missed: teal
    extra:  blue
  138 - maroon.jpg
    missed: maroon
    extra:  red
  150 - green_gray.jpg
    missed: green, gray
    extra:  black
  160 - blue_white.jpg
    missed: blue
  161 - light blue_white.jpg
    missed: light blue
    extra:  blue
 # Qwen3-VL-8B Model Results (Prompt: jersey_prompt.txt):
 ================================================================================
 ACCURACY SUMMARY
 ================================================================================
 Images processed:       161
 Errors:                 0
 Total time:             1526.4s (9.5s avg)
 Ground truth colors:    202  (excluding white)
 VLM unique colors:      184  (excluding white)
 --- Recall (did VLM find each ground truth color?) ---
  Exact match:           130 / 202  (64.4%)
  Similar match:          26 / 202  (12.9%)
  Total found:           156 / 202  (77.2%)
  Missed:                 46 / 202  (22.8%)
 --- Precision (are VLM colors correct?) ---
  Exact match:           130 / 184  (70.7%)
  Similar match:          26 / 184  (14.1%)
  Total correct:         156 / 184  (84.8%)
  Extra/wrong:            28 / 184  (15.2%)
 --- Similar-Match Confusions (expected -> got) ---
  dark blue                      -> blue                  x10
  navy blue                      -> blue                  x8
  gold                           -> yellow                x5
  dark brown                     -> brown                 x2
  navy                           -> blue                  x1
 --- Most Missed Ground Truth Colors ---
  light blue                       8  ########
  maroon                           8  ########
  gray                             7  #######
  black                            6  ######
  dark brown                       4  ####
  brown                            3  ###
  blue                             3  ###
  green                            3  ###
  teal                             2  ##
  gold|yellow                      1  #
  red                              1  #
 --- Most Common Extra/Wrong VLM Colors ---
  blue                            10  ##########
  black                            7  #######
  red                              7  #######
  gold                             1  #
  green                            1  #
  redolas                          1  #
  orange                           1  #
 --- Per-Image Verdict ---
  PASS        117
  PARTIAL      18
  FAIL         26
 --- Failed Images (26) ---
  001 -brown_white or dark brown.jpg
    missed: brown, dark brown
    extra:  black
  013 - light blue.jpg
    missed: light blue
    extra:  blue
  016 - maroon.jpg
    missed: maroon
  017 - brown_white.jpg
    missed: brown
    extra:  black
  022 - black_light blue.jpg
    missed: black, light blue
    extra:  blue
  029 -maroon_white.jpg
    missed: maroon
    extra:  red
  034 - light blue.jpg
    missed: light blue
    extra:  blue
  036 - light blue_white.jpg
    missed: light blue
    extra:  blue
  046 - green.jpg
    missed: green
    extra:  black
  053 - black_white.jpg
    missed: black
  057 - white_gold or yellow.jpg
    missed: gold|yellow
  063 - dark brown.jpg
    missed: dark brown
    extra:  black
  069 - red_white.jpg
    missed: red
  077 - teal_white.jpg
    missed: teal
    extra:  green
  078 - light blue_white.jpg
    missed: light blue
    extra:  blue
  083 - dark brown_white.jpg
    missed: dark brown
    extra:  black
  087 - white_light blue.jpg
    missed: light blue
    extra:  blue
  099 - maroon_white.jpg
    missed: maroon
    extra:  redolas, red
  129 - blue_white.jpg
    missed: blue
  132 - brown_white.jpg
    missed: brown
    extra:  orange
  134 - teal_white.jpg
    missed: teal
    extra:  blue
  138 - maroon.jpg
    missed: maroon
    extra:  red
  141 - light blue_white.jpg
    missed: light blue
    extra:  blue
  150 - green_gray.jpg
    missed: green, gray
    extra:  black
  160 - blue_white.jpg
    missed: blue
  161 - light blue_white.jpg
    missed: light blue
    extra:  blue
 # Gemini 3 Flash Results (Prompt: jersey_prompt_capstone.txt):
 ================================================================================
 ACCURACY SUMMARY  (gemini-3-flash-preview)
 ================================================================================
 Images processed:       161
 Errors:                 0
 Total time:             1881.7s (11.7s avg)
 Ground truth colors:    202  (excluding white)
 VLM unique colors:      174  (excluding white)
 --- Recall (did VLM find each ground truth color?) ---
  Exact match:           123 / 202  (60.9%)
  Similar match:          35 / 202  (17.3%)
  Total found:           158 / 202  (78.2%)
  Missed:                 44 / 202  (21.8%)
 --- Precision (are VLM colors correct?) ---
  Exact match:           123 / 174  (70.7%)
  Similar match:          34 / 174  (19.5%)
  Total correct:         157 / 174  (90.2%)
  Extra/wrong:            17 / 174  (9.8%)
 --- Similar-Match Confusions (expected -> got) ---
  gray                           -> grey                  x10
  navy blue                      -> blue                  x6
  dark blue                      -> blue                  x6
  dark brown                     -> brown                 x5
  dark blue                      -> navy blue             x3
  gold                           -> yellow                x2
  navy blue                      -> navy                  x1
  navy                           -> blue                  x1
  dark blue                      -> navy                  x1
 --- Most Missed Ground Truth Colors ---
  maroon                           9  #########
  black                            7  #######
  gray                             6  ######
  green                            4  ####
  gold                             3  ###
  blue                             3  ###
  light blue                       2  ##
  gold|yellow                      2  ##
  red                              2  ##
  teal                             2  ##
  navy blue                        1  #
  dark brown                       1  #
  yellow                           1  #
  brown                            1  #
 --- Most Common Extra/Wrong VLM Colors ---
  red                              7  #######
  black                            4  ####
  blue                             2  ##
  green                            1  #
  orange                           1  #
  light blue                       1  #
  navy                             1  #
 --- Per-Image Verdict ---
  PASS        118
  PARTIAL      21
  FAIL         22
 --- Failed Images (22) ---
  016 - maroon.jpg
    missed: maroon
  019 - maroon_gold.jpg
    missed: maroon, gold
    extra:  red
  029 -maroon_white.jpg
    missed: maroon
    extra:  red
  030 - navy blue_white.jpg
    missed: navy blue
  034 - light blue.jpg
    missed: light blue
    extra:  blue
  036 - light blue_white.jpg
    missed: light blue
    extra:  blue
  046 - green.jpg
    missed: green
    extra:  black
  048 - red.jpg
    missed: red
  053 - black_white.jpg
    missed: black
  057 - white_gold or yellow.jpg
    missed: gold|yellow
  069 - red_white.jpg
    missed: red
  077 - teal_white.jpg
    missed: teal
    extra:  green
  083 - dark brown_white.jpg
    missed: dark brown
    extra:  black
  088 - white_maroon.jpg
    missed: maroon
  099 - maroon_white.jpg
    missed: maroon
    extra:  red
  128 - green_white.jpg
    missed: green
  129 - blue_white.jpg
    missed: blue
  132 - brown_white.jpg
    missed: brown
    extra:  orange
  134 - teal_white.jpg
    missed: teal
    extra:  light blue
  138 - maroon.jpg
    missed: maroon
    extra:  red
  150 - green_gray.jpg
    missed: green, gray
    extra:  black
  160 - blue_white.jpg
    missed: blue
 # Qwen3-VL-8B Model Results (Prompt: jersey_prompt_capstone.txt):
 ================================================================================
 ACCURACY SUMMARY
 ================================================================================
 Images processed:       161
 Errors:                 0
 Total time:             1435.7s (8.9s avg)
 Ground truth colors:    202  (excluding white)
 VLM unique colors:      180  (excluding white)
 --- Recall (did VLM find each ground truth color?) ---
  Exact match:           133 / 202  (65.8%)
  Similar match:          24 / 202  (11.9%)
  Total found:           157 / 202  (77.7%)
  Missed:                 45 / 202  (22.3%)
 --- Precision (are VLM colors correct?) ---
  Exact match:           133 / 180  (73.9%)
  Similar match:          24 / 180  (13.3%)
  Total correct:         157 / 180  (87.2%)
  Extra/wrong:            23 / 180  (12.8%)
 --- Similar-Match Confusions (expected -> got) ---
  dark blue                      -> blue                  x9
  navy blue                      -> blue                  x8
  gold                           -> yellow                x3
  dark brown                     -> brown                 x2
  navy                           -> blue                  x1
  dark blue                      -> navy                  x1
 --- Most Missed Ground Truth Colors ---
  gray                             9  #########
  maroon                           7  #######
  black                            6  ######
  light blue                       5  #####
  dark brown                       4  ####
  green                            4  ####
  brown                            3  ###
  gold                             2  ##
  blue                             2  ##
  teal                             2  ##
  gold|yellow                      1  #
 --- Most Common Extra/Wrong VLM Colors ---
  black                            7  #######
  blue                             6  ######
  red                              6  ######
  gold                             1  #
  green                            1  #
  orange                           1  #
  navy                             1  #
 --- Per-Image Verdict ---
  PASS        119
  PARTIAL      19
  FAIL         23
 --- Failed Images (23) ---
  001 -brown_white or dark brown.jpg
    missed: brown, dark brown
    extra:  black
  013 - light blue.jpg
    missed: light blue
    extra:  blue
  016 - maroon.jpg
    missed: maroon
  017 - brown_white.jpg
    missed: brown
    extra:  black
  019 - maroon_gold.jpg
    missed: maroon, gold
    extra:  red
  029 -maroon_white.jpg
    missed: maroon
    extra:  red
  034 - light blue.jpg
    missed: light blue
    extra:  blue
  036 - light blue_white.jpg
    missed: light blue
    extra:  blue
  039 - gray_white.jpg
    missed: gray
  046 - green.jpg
    missed: green
    extra:  black
  053 - black_white.jpg
    missed: black
  057 - white_gold or yellow.jpg
    missed: gold|yellow
  063 - dark brown.jpg
    missed: dark brown
    extra:  black
  077 - teal_white.jpg
    missed: teal
    extra:  green
  083 - dark brown_white.jpg
    missed: dark brown
    extra:  black
  132 - brown_white.jpg
    missed: brown
    extra:  orange
  134 - teal_white.jpg
    missed: teal
    extra:  blue
  138 - maroon.jpg
    missed: maroon
    extra:  red
  141 - light blue_white.jpg
    missed: light blue
    extra:  blue
  145 - green_white.jpg
    missed: green
  150 - green_gray.jpg
    missed: green, gray
    extra:  black
  160 - blue_white.jpg
    missed: blue
  161 - light blue_white.jpg
    missed: light blue
    extra:  blue
--- a/accuracy_test_results_all.txt
+++ b/accuracy_test_results_all.txt
--- a/jersey_prompt_capstone.txt
+++ b/jersey_prompt_capstone.txt
@ -0,0 +1,15 @@
 You are a high-precision sports telemetry system. Your job is to scan the image and output structured data for every visible jersey number.
 **Goal:** Identify every clearly readable jersey number, along with its jersey color and number color.
 **Input Analysis Guidelines:**
 1. **Scan Targets:** Focus entirely on the torso/chest, back, and leg areas of players.
 2. **Verify Readability:** For each potential number, check: - Are all digits clearly visible? - Is any part of the number occluded by a limb, fold, or object? - Is the number blurry or too small to read with certainty? - If a number is partially hidden (e.g., looking like a 1 but could be a 7), DISCARD IT.
 3. Determine jersey_color from that player's TORSO SHIRT region: - Use the largest contiguous fabric area on the torso (exclude the number itself, stripes/logos, and deep shadows). - Ignore shorts color even if shorts dominate the image. - Choose the single color name that best matches the shirt's base color.
 **Examples:** [Image: Player in red shirt with white '10'] -> {"jerseys": [{"jersey_number": "10", "jersey_color": "red", "number_color": "white"}]}
 **Output Format:** Provide your output in valid JSON format with the following structure. Do not include markdown formatting (like ```json). { "jerseys": [ { "jersey_number": <string>, "jersey_color": <string>, "number_color": <string> } ] }
 **Constraint:** - If no numbers are clearly readable, return "jerseys": []. - Do not guess. Precision is more important than recall.
--- a/jersey_prompt_constrained.txt
+++ b/jersey_prompt_constrained.txt
@ -0,0 +1,56 @@
 You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
 CRITICAL INSTRUCTIONS:
 1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
 2. ONLY include jersey numbers that you can ACTUALLY READ in the image
 3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
 4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
 5. DO NOT include jerseys if you cannot clearly see the number
 COLOR VOCABULARY:
 For "jersey_color" and "number_color", you MUST choose from this list ONLY:
 red, blue, dark blue, navy blue, light blue, green, yellow, gold, orange, purple, black, white, gray, brown, dark brown, maroon, teal, pink
 Important color distinctions:
 - Use "maroon" for dark brownish-red, NOT "red"
 - Use "light blue" for pale or sky blue, NOT "blue"
 - Use "navy blue" for very dark blue, NOT "blue" or "dark blue"
 - Use "teal" for blue-green, NOT "green" or "blue"
 - Use "gray" (not "grey") for silver or neutral tones
 - Use "dark brown" for very dark brown, NOT "black"
 - Use "gold" for metallic or deep yellow, NOT "yellow"
 RESPONSE FORMAT:
 Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
 Use DOUBLE QUOTES (") for all JSON keys and string values.
 The JSON must have a single key "jerseys" with an array of dictionaries.
 Each dictionary must have exactly these three keys:
 - "jersey_number": The number on the jersey (as a string, only if clearly visible)
 - "jersey_color": The primary color of the jersey (MUST be from the color list above)
 - "number_color": The color of the number on the jersey (MUST be from the color list above)
 Example response for an image WITH visible jerseys:
 {
  "jerseys": [
    {
      "jersey_number": "10",
      "jersey_color": "maroon",
      "number_color": "gold"
    },
    {
      "jersey_number": "42",
      "jersey_color": "light blue",
      "number_color": "white"
    }
  ]
 }
 Example response for an image WITHOUT jerseys or with unclear numbers:
 {"jerseys": []}
 REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return empty array.
 Now analyze the image and return the JSON object.
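The color vocabulary above is enforced only by the prompt text, so a caller may want to check replies against it as well. The following is a minimal sketch under stated assumptions and is not part of this commit: `ALLOWED_COLORS` copies the list from the prompt, and `validate_jerseys` is a hypothetical helper name.

```python
import json

# Copied from the COLOR VOCABULARY list in jersey_prompt_constrained.txt.
ALLOWED_COLORS = {
    "red", "blue", "dark blue", "navy blue", "light blue", "green",
    "yellow", "gold", "orange", "purple", "black", "white", "gray",
    "brown", "dark brown", "maroon", "teal", "pink",
}


def validate_jerseys(raw: str) -> tuple[list[dict], list[str]]:
    """Hypothetical helper: split a reply into valid jerseys and problems."""
    data = json.loads(raw)
    valid, problems = [], []
    for i, jersey in enumerate(data.get("jerseys", [])):
        # Collect the color fields whose value is not in the vocabulary.
        bad = [
            key for key in ("jersey_color", "number_color")
            if str(jersey.get(key, "")).strip().lower() not in ALLOWED_COLORS
        ]
        if bad:
            problems.append(f"jersey {i}: off-vocabulary {', '.join(bad)}")
        else:
            valid.append(jersey)
    return valid, problems


reply = (
    '{"jerseys": ['
    '{"jersey_number": "10", "jersey_color": "maroon", "number_color": "gold"},'
    '{"jersey_number": "42", "jersey_color": "grey", "number_color": "white"}'
    ']}'
)
valid, problems = validate_jerseys(reply)
print(len(valid), problems)
```

Here the second jersey is rejected because the prompt asks for "gray", not "grey". Whether to drop, remap, or merely log such entries is a design choice this sketch leaves open.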
--- a/pyproject.toml
+++ b/pyproject.toml
@ -0,0 +1,11 @@
 [project]
 name = "jersey-test"
 version = "0.1.0"
 description = "Accuracy tests for VLM jersey color detection (Qwen3-VL via llama.cpp, Gemini API)"
 readme = "README.md"
 requires-python = ">=3.12"
 dependencies = [
    "numpy>=1.24.0",
    "opencv-python>=4.8.0",
    "requests>=2.28.0",
 ]
--- a/run_all_accuracy_tests.sh
+++ b/run_all_accuracy_tests.sh
@ -0,0 +1,44 @@
 #!/usr/bin/env bash
 #
 # Run both accuracy test scripts against all three prompts.
 # Results are saved to accuracy_test_results_all.txt
 #
 # pipefail: without it, a failing test script is masked by the "| tee" pipeline.
 set -eo pipefail
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 OUTPUT_FILE="${SCRIPT_DIR}/accuracy_test_results_all.txt"
 PROMPTS=(
    "jersey_prompt.txt"
    "jersey_prompt_capstone.txt"
    "jersey_prompt_constrained.txt"
 )
 echo "Results will be saved to: ${OUTPUT_FILE}"
 echo "Started at: $(date)"
 echo ""
 > "$OUTPUT_FILE"
 for prompt in "${PROMPTS[@]}"; do
    prompt_path="${SCRIPT_DIR}/${prompt}"
    echo "========================================" | tee -a "$OUTPUT_FILE"
    echo "Qwen3-VL-8B + ${prompt}" | tee -a "$OUTPUT_FILE"
    echo "Started: $(date)" | tee -a "$OUTPUT_FILE"
    echo "========================================" | tee -a "$OUTPUT_FILE"
    python3 "${SCRIPT_DIR}/test_accuracy.py" "$prompt_path" 2>&1 | tee -a "$OUTPUT_FILE"
    echo "" | tee -a "$OUTPUT_FILE"
    echo "========================================" | tee -a "$OUTPUT_FILE"
    echo "Gemini 3 Flash + ${prompt}" | tee -a "$OUTPUT_FILE"
    echo "Started: $(date)" | tee -a "$OUTPUT_FILE"
    echo "========================================" | tee -a "$OUTPUT_FILE"
    python3 "${SCRIPT_DIR}/test_accuracy_gemini.py" "$prompt_path" 2>&1 | tee -a "$OUTPUT_FILE"
    echo "" | tee -a "$OUTPUT_FILE"
 done
 echo "========================================" | tee -a "$OUTPUT_FILE"
 echo "All tests completed at: $(date)" | tee -a "$OUTPUT_FILE"
 echo "Results saved to: ${OUTPUT_FILE}"
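Because every run is appended to one file, comparing runs afterwards means splitting that file back into sections. The sketch below is one way to do that, assuming the header layout written by the script above (a 40-character rule, a "model + prompt" line, a "Started:" line, and a closing rule). `split_runs` is a hypothetical helper, not part of this commit.

```python
import re

# A run header as written by run_all_accuracy_tests.sh. The test scripts
# print 80-character rules, so matching exactly 40 avoids false positives.
HEADER = re.compile(
    r"^={40}\n(?P<model>.+?) \+ (?P<prompt>\S+)\nStarted: .*\n={40}\n",
    re.MULTILINE,
)


def split_runs(text: str) -> dict[tuple[str, str], str]:
    """Hypothetical helper: map (model, prompt) to that run's raw output."""
    matches = list(HEADER.finditer(text))
    runs = {}
    for m, nxt in zip(matches, matches[1:] + [None]):
        # A run's body extends to the next header, or to the end of the file.
        end = nxt.start() if nxt else len(text)
        runs[(m["model"], m["prompt"])] = text[m.end():end].strip()
    return runs
```

Usage would look like `split_runs(Path("accuracy_test_results_all.txt").read_text())`. Note that the body of the last run also contains the closing "All tests completed at" footer, which this sketch does not strip.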
--- a/test_accuracy.py
+++ b/test_accuracy.py
@ -0,0 +1,402 @@
 #!/usr/bin/env python3
 """
 Test script to measure VLM accuracy for jersey color detection.
 Uses annotated test images where ground truth colors are encoded in filenames.
 Compares VLM results against ground truth, measuring exact and similar color matches.
 White is ignored in both ground truth and VLM results.
 Filename format: "014 - orange_dark blue or purple.jpg"
  - Underscore separates distinct jersey colors
  - "or" separates acceptable alternatives for a single jersey
 Usage:
    python test_accuracy.py [prompt_file]
 """
 import json
 import os
 import re
 import sys
 import time
 from collections import Counter
 from pathlib import Path
 import cv2
 sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
 from scan_utils.llama_cpp_client import LlamaCppClient
 SERVER_URL = "http://agx:8080"
 IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files_annotated")
 DEFAULT_PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt.txt")
 MAX_IMAGE_WIDTH = 768
 # ---------------------------------------------------------------------------
 # Color similarity – colors in the same family count as "similar" matches
 # ---------------------------------------------------------------------------
 COLOR_FAMILIES = {
    'blue':       ['blue', 'dark blue', 'navy blue', 'navy', 'royal blue'],
    'light_blue': ['light blue', 'sky blue', 'baby blue', 'carolina blue', 'powder blue'],
    'red':        ['red', 'scarlet', 'crimson'],
    'dark_red':   ['maroon', 'burgundy', 'dark red', 'wine'],
    'green':      ['green', 'dark green', 'forest green', 'kelly green'],
    'yellow':     ['yellow', 'gold', 'golden'],
    'orange':     ['orange', 'burnt orange'],
    'brown':      ['brown', 'dark brown'],
    'purple':     ['purple', 'violet'],
    'gray':       ['gray', 'grey', 'silver', 'charcoal'],
    'black':      ['black'],
    'teal':       ['teal', 'turquoise', 'cyan', 'aqua'],
    'pink':       ['pink', 'magenta', 'hot pink', 'rose'],
 }
 _COLOR_TO_FAMILY = {}
 for _family, _members in COLOR_FAMILIES.items():
    for _color in _members:
        _COLOR_TO_FAMILY[_color] = _family
 def colors_are_similar(color1: str, color2: str) -> bool:
    """Return True if two colors belong to the same color family."""
    if color1 == color2:
        return True
    f1 = _COLOR_TO_FAMILY.get(color1)
    f2 = _COLOR_TO_FAMILY.get(color2)
    return bool(f1 and f2 and f1 == f2)
 # ---------------------------------------------------------------------------
 # Ground-truth parsing
 # ---------------------------------------------------------------------------
 def parse_ground_truth(filename: str) -> list[list[str]]:
    """Parse ground truth colors from an annotated filename.
    Returns a list of color groups.  Each group is a list of acceptable
    alternatives (from "or" in the filename).  White entries are removed.
    Example: "014 - orange_dark blue or purple.jpg"
      -> [["orange"], ["dark blue", "purple"]]
    """
    name = Path(filename).stem
    # Strip number prefix ("014 - ", "029 -", etc.)
    name = re.sub(r'^\d+\s*-\s*', '', name)
    # Treat hyphens between colors as underscores (e.g. "yellow-black")
    name = name.replace('-', '_')
    color_groups = []
    for part in name.split('_'):
        part = part.strip()
        if not part:
            continue
        alternatives = [a.strip().lower() for a in part.split(' or ')]
        alternatives = [a for a in alternatives if a and a != 'white']
        if alternatives:
            color_groups.append(alternatives)
    return color_groups
 # ---------------------------------------------------------------------------
 # Response cleaning
 # ---------------------------------------------------------------------------
 def clean_response(text: str) -> str:
    """Remove think tags and markdown code blocks from model output."""
    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(r'\u25c1think\u25b7.*?\u25c1/think\u25b7', '', cleaned, flags=re.DOTALL)
    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
    cleaned = re.sub(r'\u25c1/?think\u25b7', '', cleaned, flags=re.IGNORECASE)
    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
    if json_block:
        cleaned = json_block.group(1)
    else:
        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)
    return cleaned.strip()
 # ---------------------------------------------------------------------------
 # Scoring
 # ---------------------------------------------------------------------------
 def score_image(gt_groups: list[list[str]], vlm_colors: set[str]) -> dict:
    """Compare VLM detected colors against ground truth color groups.
    Recall  = how many GT color groups were found in VLM output
    Precision = how many VLM colors match something in the GT
    """
    recall_exact = 0
    recall_similar = 0
    recall_missed = []
    confusions = []
    for group in gt_groups:
        # Try exact match first
        if any(alt in vlm_colors for alt in group):
            recall_exact += 1
            continue
        # Try similar match
        matched_vlm = None
        for alt in group:
            for vc in vlm_colors:
                if colors_are_similar(alt, vc):
                    matched_vlm = vc
                    break
            if matched_vlm:
                break
        if matched_vlm:
            recall_similar += 1
            confusions.append((group, matched_vlm))
        else:
            recall_missed.append(group)
    # Precision: check each VLM color against GT
    all_gt_alts = [alt for group in gt_groups for alt in group]
    precision_exact = 0
    precision_similar = 0
    precision_extra = []
    for vc in vlm_colors:
        if vc in all_gt_alts:
            precision_exact += 1
        elif any(colors_are_similar(vc, gt) for gt in all_gt_alts):
            precision_similar += 1
        else:
            precision_extra.append(vc)
    return {
        'gt_count': len(gt_groups),
        'vlm_count': len(vlm_colors),
        'recall_exact': recall_exact,
        'recall_similar': recall_similar,
        'recall_missed': recall_missed,
        'precision_exact': precision_exact,
        'precision_similar': precision_similar,
        'precision_extra': precision_extra,
        'confusions': confusions,
    }
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 def pct(n: int, d: int) -> str:
    return f"{100 * n / d:.1f}%" if d else "N/A"
 def print_summary(total_gt, total_vlm, total_recall_exact, total_recall_similar,
                  total_recall_missed, total_precision_exact, total_precision_similar,
                  total_precision_extra, confusion_counter, missed_counter,
                  extra_counter, per_image_results, image_count, errors, total_time):
    """Print the full accuracy summary report."""
    print()
    print("=" * 80)
    print("ACCURACY SUMMARY")
    print("=" * 80)
    print(f"Images processed:       {image_count}")
    print(f"Errors:                 {errors}")
    print(f"Total time:             {total_time:.1f}s ({total_time / max(image_count, 1):.1f}s avg)")
    print()
    print(f"Ground truth colors:    {total_gt}  (excluding white)")
    print(f"VLM unique colors:      {total_vlm}  (excluding white)")
    print()
    print("--- Recall (did VLM find each ground truth color?) ---")
    print(f"  Exact match:          {total_recall_exact:4d} / {total_gt}  ({pct(total_recall_exact, total_gt)})")
    print(f"  Similar match:        {total_recall_similar:4d} / {total_gt}  ({pct(total_recall_similar, total_gt)})")
    recall_total = total_recall_exact + total_recall_similar
    print(f"  Total found:          {recall_total:4d} / {total_gt}  ({pct(recall_total, total_gt)})")
    print(f"  Missed:               {total_recall_missed:4d} / {total_gt}  ({pct(total_recall_missed, total_gt)})")
    print()
    print("--- Precision (are VLM colors correct?) ---")
    print(f"  Exact match:          {total_precision_exact:4d} / {total_vlm}  ({pct(total_precision_exact, total_vlm)})")
    print(f"  Similar match:        {total_precision_similar:4d} / {total_vlm}  ({pct(total_precision_similar, total_vlm)})")
    prec_total = total_precision_exact + total_precision_similar
    print(f"  Total correct:        {prec_total:4d} / {total_vlm}  ({pct(prec_total, total_vlm)})")
    print(f"  Extra/wrong:          {total_precision_extra:4d} / {total_vlm}  ({pct(total_precision_extra, total_vlm)})")
    if confusion_counter:
        print()
        print("--- Similar-Match Confusions (expected -> got) ---")
        for (expected, got), count in confusion_counter.most_common():
            print(f"  {expected:30s} -> {got:20s}  x{count}")
    if missed_counter:
        print()
        print("--- Most Missed Ground Truth Colors ---")
        for color, count in missed_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f"  {color:30s} {count:3d}  {bar}")
    if extra_counter:
        print()
        print("--- Most Common Extra/Wrong VLM Colors ---")
        for color, count in extra_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f"  {color:30s} {count:3d}  {bar}")
    if per_image_results:
        tags = Counter(r['tag'] for r in per_image_results)
        print()
        print("--- Per-Image Verdict ---")
        for tag in ['PASS', 'PARTIAL', 'FAIL']:
            print(f"  {tag:10s} {tags.get(tag, 0):4d}")
        failed = [r for r in per_image_results if r['tag'] == 'FAIL']
        if failed:
            print()
            print(f"--- Failed Images ({len(failed)}) ---")
            for r in failed:
                scores = r['scores']
                missed_strs = ["|".join(g) for g in scores['recall_missed']]
                print(f"  {r['file']}")
                print(f"    missed: {', '.join(missed_strs)}")
                if scores['precision_extra']:
                    print(f"    extra:  {', '.join(scores['precision_extra'])}")
 # ---------------------------------------------------------------------------
 # Main
 # ---------------------------------------------------------------------------
 def main():
    prompt_file = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PROMPT_FILE
    with open(prompt_file, 'r') as f:
        prompt = f.read()
    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
    image_files = sorted([
        p for p in Path(IMAGES_DIR).iterdir()
        if p.suffix.lower() in valid_extensions
    ])
    print(f"Images to process: {len(image_files)}")
    print(f"Server: {SERVER_URL}")
    print(f"Prompt: {prompt_file} ({len(prompt)} chars)")
    print("=" * 80)
    client = LlamaCppClient(base_url=SERVER_URL)
    # Accumulators
    total_gt = 0
    total_vlm = 0
    total_recall_exact = 0
    total_recall_similar = 0
    total_recall_missed = 0
    total_precision_exact = 0
    total_precision_similar = 0
    total_precision_extra = 0
    errors = 0
    start_all = time.time()
    confusion_counter = Counter()
    missed_counter = Counter()
    extra_counter = Counter()
    per_image_results = []
    for i, image_path in enumerate(image_files, 1):
        gt_groups = parse_ground_truth(image_path.name)
        gt_display = ", ".join("|".join(g) for g in gt_groups) if gt_groups else "(none)"
        print(f"\n[{i}/{len(image_files)}] {image_path.name}")
        print(f"         GT: [{gt_display}]")
        image = cv2.imread(str(image_path))
        if image is None:
            print("         SKIP (failed to load)")
            errors += 1
            continue
        h, w = image.shape[:2]
        if w > MAX_IMAGE_WIDTH:
            scale = MAX_IMAGE_WIDTH / w
            image = cv2.resize(image, (MAX_IMAGE_WIDTH, int(h * scale)), interpolation=cv2.INTER_AREA)
        message = client.create_multimodal_message(role="user", content=prompt, images=[image])
        try:
            t0 = time.time()
            response = client.chat_completion(messages=[message], temperature=0.1, max_tokens=1000)
            elapsed = time.time() - t0
            response_text = response['choices'][0]['message']['content']
            cleaned = clean_response(response_text)
            result = json.loads(cleaned)
            jerseys = result.get('jerseys', [])
            # Unique VLM jersey colors, ignoring white
            vlm_colors = set()
            for j in jerseys:
                jc = j.get('jersey_color', '').strip().lower()
                if jc and jc != 'white':
                    vlm_colors.add(jc)
            vlm_display = ", ".join(sorted(vlm_colors)) if vlm_colors else "(none)"
            print(f"         VLM: [{vlm_display}]  ({len(jerseys)} jersey(s), {elapsed:.1f}s)")
            if not gt_groups:
                print("         -- no ground truth colors (white-only), skipping scoring")
                continue
            scores = score_image(gt_groups, vlm_colors)
            total_gt += scores['gt_count']
            total_vlm += scores['vlm_count']
            total_recall_exact += scores['recall_exact']
            total_recall_similar += scores['recall_similar']
            total_recall_missed += len(scores['recall_missed'])
            total_precision_exact += scores['precision_exact']
            total_precision_similar += scores['precision_similar']
            total_precision_extra += len(scores['precision_extra'])
            for group, got in scores['confusions']:
                confusion_counter[("|".join(group), got)] += 1
            for group in scores['recall_missed']:
                missed_counter["|".join(group)] += 1
            for ec in scores['precision_extra']:
                extra_counter[ec] += 1
            # Status line
            status_parts = []
            if scores['recall_exact']:
                status_parts.append(f"exact:{scores['recall_exact']}")
            if scores['recall_similar']:
                status_parts.append(f"similar:{scores['recall_similar']}")
            if scores['recall_missed']:
                missed_strs = ["|".join(g) for g in scores['recall_missed']]
                status_parts.append(f"MISS:{','.join(missed_strs)}")
            if scores['precision_extra']:
                status_parts.append(f"extra:{','.join(scores['precision_extra'])}")
            all_found = (scores['recall_exact'] + scores['recall_similar']) == scores['gt_count']
            no_extra = not scores['precision_extra']
            if all_found and no_extra:
                tag = "PASS"
            elif scores['recall_exact'] + scores['recall_similar'] > 0:
                tag = "PARTIAL"
            else:
                tag = "FAIL"
            print(f"         {tag}  {', '.join(status_parts)}")
            per_image_results.append({
                'file': image_path.name,
                'tag': tag,
                'scores': scores,
            })
        except (json.JSONDecodeError, KeyError, IndexError) as e:
            print(f"         PARSE ERROR: {e}")
            errors += 1
        except Exception as e:
            print(f"         ERROR: {e}")
            errors += 1
    total_time = time.time() - start_all
    print_summary(
        total_gt, total_vlm, total_recall_exact, total_recall_similar,
        total_recall_missed, total_precision_exact, total_precision_similar,
        total_precision_extra, confusion_counter, missed_counter,
        extra_counter, per_image_results, len(image_files), errors, total_time,
    )
 if __name__ == '__main__':
    main()
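A worked example may make the scoring rules in test_accuracy.py easier to follow. The sketch below is a condensed re-implementation for illustration only: `FAMILIES` is a subset of `COLOR_FAMILIES`, `similar` mirrors `colors_are_similar`, and the VLM answer `{"orange", "navy blue", "black"}` is invented for the example rather than taken from any test run.

```python
import re
from pathlib import Path

# Subset of COLOR_FAMILIES from test_accuracy.py, enough for this example.
FAMILIES = {
    "blue": ["blue", "dark blue", "navy blue", "navy", "royal blue"],
    "orange": ["orange", "burnt orange"],
    "purple": ["purple", "violet"],
    "black": ["black"],
}
FAMILY_OF = {c: fam for fam, members in FAMILIES.items() for c in members}


def similar(a: str, b: str) -> bool:
    """Same rule as colors_are_similar: equal, or in the same family."""
    return a == b or (a in FAMILY_OF and FAMILY_OF[a] == FAMILY_OF.get(b))


def parse_ground_truth(filename: str) -> list[list[str]]:
    """Condensed copy of the filename parser in test_accuracy.py."""
    name = re.sub(r"^\d+\s*-\s*", "", Path(filename).stem).replace("-", "_")
    groups = []
    for part in name.split("_"):
        alts = [a.strip().lower() for a in part.split(" or ")]
        alts = [a for a in alts if a and a != "white"]
        if alts:
            groups.append(alts)
    return groups


gt = parse_ground_truth("014 - orange_dark blue or purple.jpg")
vlm = {"orange", "navy blue", "black"}  # invented model answer

# Recall: each ground-truth group is exact, similar, or missed.
exact = [g for g in gt if any(alt in vlm for alt in g)]
near = [g for g in gt if g not in exact
        and any(similar(alt, vc) for alt in g for vc in vlm)]
missed = [g for g in gt if g not in exact and g not in near]

# Precision: a VLM color is "extra" if it matches no ground-truth alternative.
all_alts = [alt for g in gt for alt in g]
extra = sorted(vc for vc in vlm if not any(similar(vc, alt) for alt in all_alts))

found = len(exact) + len(near)
tag = "PASS" if found == len(gt) and not extra else "PARTIAL" if found else "FAIL"
print(gt, len(exact), len(near), missed, extra, tag)
```

In this example both ground-truth groups are found (orange exactly, "dark blue or purple" through the similar "navy blue"), so recall is 2 of 2. The unmatched "black" counts as an extra color, which is why the image is tagged PARTIAL rather than PASS.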
--- a/test_accuracy_gemini.py
+++ b/test_accuracy_gemini.py
@ -0,0 +1,576 @@
 #!/usr/bin/env python3
 """
 Test script to measure Gemini VLM accuracy for jersey color detection.
 Uses annotated test images where ground truth colors are encoded in filenames.
 Compares Gemini results against ground truth, measuring exact and similar color
 matches.  White is ignored in both ground truth and VLM results.
 Filename format: "014 - orange_dark blue or purple.jpg"
  - Underscore separates distinct jersey colors
  - "or" separates acceptable alternatives for a single jersey
 Usage:
    python test_accuracy_gemini.py [prompt_file]
 """
 import base64
 import concurrent.futures
 import json
 import os
 import re
 import sys
 import time
 from collections import Counter
 from pathlib import Path
 import cv2
 import requests
 GEMINI_MODEL = "gemini-3-flash-preview"
 API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{GEMINI_MODEL}:generateContent"
 IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files_annotated")
 DEFAULT_PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt.txt")
 API_KEY_FILE = os.path.join(os.path.dirname(__file__), "gemini_api_key.txt")
 MAX_IMAGE_WIDTH = 768
 JPEG_QUALITY = 85
 CONCURRENT_WORKERS = 8
 # ---------------------------------------------------------------------------
 # Color similarity – colors in the same family count as "similar" matches
 # ---------------------------------------------------------------------------
 COLOR_FAMILIES = {
    'blue':       ['blue', 'dark blue', 'navy blue', 'navy', 'royal blue'],
    'light_blue': ['light blue', 'sky blue', 'baby blue', 'carolina blue', 'powder blue'],
    'red':        ['red', 'scarlet', 'crimson'],
    'dark_red':   ['maroon', 'burgundy', 'dark red', 'wine'],
    'green':      ['green', 'dark green', 'forest green', 'kelly green'],
    'yellow':     ['yellow', 'gold', 'golden'],
    'orange':     ['orange', 'burnt orange'],
    'brown':      ['brown', 'dark brown'],
    'purple':     ['purple', 'violet'],
    'gray':       ['gray', 'grey', 'silver', 'charcoal'],
    'black':      ['black'],
    'teal':       ['teal', 'turquoise', 'cyan', 'aqua'],
    'pink':       ['pink', 'magenta', 'hot pink', 'rose'],
 }
 _COLOR_TO_FAMILY = {}
 for _family, _members in COLOR_FAMILIES.items():
    for _color in _members:
        _COLOR_TO_FAMILY[_color] = _family
 def colors_are_similar(color1: str, color2: str) -> bool:
    """Return True if two colors belong to the same color family."""
    if color1 == color2:
        return True
    f1 = _COLOR_TO_FAMILY.get(color1)
    f2 = _COLOR_TO_FAMILY.get(color2)
    return bool(f1 and f2 and f1 == f2)
 # ---------------------------------------------------------------------------
 # Ground-truth parsing
 # ---------------------------------------------------------------------------
 def parse_ground_truth(filename: str) -> list[list[str]]:
    """Parse ground truth colors from an annotated filename.
    Returns a list of color groups.  Each group is a list of acceptable
    alternatives (from "or" in the filename).  White entries are removed.
    Example: "014 - orange_dark blue or purple.jpg"
      -> [["orange"], ["dark blue", "purple"]]
    """
    name = Path(filename).stem
    # Strip number prefix ("014 - ", "029 -", etc.)
    name = re.sub(r'^\d+\s*-\s*', '', name)
    # Treat hyphens between colors as underscores (e.g. "yellow-black")
    name = name.replace('-', '_')
    color_groups = []
    for part in name.split('_'):
        part = part.strip()
        if not part:
            continue
        alternatives = [a.strip().lower() for a in part.split(' or ')]
        alternatives = [a for a in alternatives if a and a != 'white']
        if alternatives:
            color_groups.append(alternatives)
    return color_groups
 # ---------------------------------------------------------------------------
 # Response cleaning & salvage
 # ---------------------------------------------------------------------------
 def clean_response(text: str) -> str:
    """Remove think tags and markdown code blocks from model output."""
    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
    if json_block:
        cleaned = json_block.group(1)
    else:
        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)
    return cleaned.strip()
 def salvage_jerseys(text: str) -> list[dict]:
    """Extract complete jersey objects from truncated JSON using regex.
    Only objects with all three keys, in the order the prompts specify, are recovered.
    """
    pattern = re.compile(
        r'\{\s*'
        r'"jersey_number"\s*:\s*"([^"]*)"\s*,\s*'
        r'"jersey_color"\s*:\s*"([^"]*)"\s*,\s*'
        r'"number_color"\s*:\s*"([^"]*)"\s*'
        r'\}',
        re.DOTALL,
    )
    jerseys = []
    for m in pattern.finditer(text):
        # Keep the jersey number too, so salvaged objects match the normal shape.
        jerseys.append({
            'jersey_number': m.group(1),
            'jersey_color': m.group(2),
            'number_color': m.group(3),
        })
    return jerseys
 # ---------------------------------------------------------------------------
 # Gemini API helpers
 # ---------------------------------------------------------------------------
 def load_api_key() -> str:
    with open(API_KEY_FILE, 'r') as f:
        return f.read().strip()
 def encode_image(image_path: str) -> tuple[str, str]:
    """Read an image file, resize if wider than MAX_IMAGE_WIDTH, and return (base64_data, mime_type)."""
    ext = Path(image_path).suffix.lower()
    mime_map = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.webp': 'image/webp',
        '.bmp': 'image/bmp',
        '.tiff': 'image/tiff',
    }
    mime_type = mime_map.get(ext, 'image/jpeg')
    image = cv2.imread(image_path)
    if image is not None:
        h, w = image.shape[:2]
        if w > MAX_IMAGE_WIDTH:
            scale = MAX_IMAGE_WIDTH / w
            image = cv2.resize(image, (MAX_IMAGE_WIDTH, int(h * scale)), interpolation=cv2.INTER_AREA)
        if ext == '.png':
            _, buf = cv2.imencode('.png', image)
        else:
            _, buf = cv2.imencode('.jpg', image, [cv2.IMWRITE_JPEG_QUALITY, JPEG_QUALITY])
        data = base64.b64encode(buf).decode('utf-8')
    else:
        with open(image_path, 'rb') as f:
            data = base64.b64encode(f.read()).decode('utf-8')
    return data, mime_type
 MAX_RETRIES = 3
 RETRY_BACKOFF = [2, 5, 10]
 def call_gemini(session: requests.Session, api_key: str, image_data: str,
                mime_type: str, prompt: str) -> dict:
    """Send pre-encoded image + prompt to the Gemini API and return the raw response."""
    payload = {
        "contents": [{
            "parts": [
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        "data": image_data,
                    }
                },
                {
                    "text": prompt,
                }
            ]
        }],
        "generationConfig": {
            "temperature": 0.1,
            "maxOutputTokens": 8192,
            "responseMimeType": "application/json",
        }
    }
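    for attempt in range(MAX_RETRIES):
        response = session.post(
            API_URL,
            headers={
                "x-goog-api-key": api_key,
                "Content-Type": "application/json",
            },
            json=payload,
            timeout=120,  # seconds; without it a stalled connection blocks the worker indefinitely
        )
        # Retry server errors with backoff; the final attempt falls through
        # to raise_for_status() so the caller sees the failure.
        if response.status_code >= 500 and attempt < MAX_RETRIES - 1:
            time.sleep(RETRY_BACKOFF[attempt])
            continue
        response.raise_for_status()
        return response.json()
    # Only reachable if MAX_RETRIES is set to 0.
    raise RuntimeError("MAX_RETRIES must be at least 1")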
 def _api_worker(session: requests.Session, api_key: str, image_data: str,
                mime_type: str, prompt: str) -> dict:
    """Wrapper that captures timing and exceptions for concurrent execution."""
    t0 = time.time()
    try:
        resp = call_gemini(session, api_key, image_data, mime_type, prompt)
        return {'resp': resp, 'elapsed': time.time() - t0, 'error': None}
    except Exception as e:
        return {'resp': None, 'elapsed': time.time() - t0, 'error': e}
 # ---------------------------------------------------------------------------
 # Scoring
 # ---------------------------------------------------------------------------
 def score_image(gt_groups: list[list[str]], vlm_colors: set[str]) -> dict:
    """Compare VLM detected colors against ground truth color groups.
    Recall  = how many GT color groups were found in VLM output
    Precision = how many VLM colors match something in the GT
    """
    recall_exact = 0
    recall_similar = 0
    recall_missed = []
    confusions = []
    for group in gt_groups:
        # Try exact match first
        if any(alt in vlm_colors for alt in group):
            recall_exact += 1
            continue
        # Try similar match
        matched_vlm = None
        for alt in group:
            for vc in vlm_colors:
                if colors_are_similar(alt, vc):
                    matched_vlm = vc
                    break
            if matched_vlm:
                break
        if matched_vlm:
            recall_similar += 1
            confusions.append((group, matched_vlm))
        else:
            recall_missed.append(group)
    # Precision: check each VLM color against GT
    all_gt_alts = [alt for group in gt_groups for alt in group]
    precision_exact = 0
    precision_similar = 0
    precision_extra = []
    for vc in vlm_colors:
        if vc in all_gt_alts:
            precision_exact += 1
        elif any(colors_are_similar(vc, gt) for gt in all_gt_alts):
            precision_similar += 1
        else:
            precision_extra.append(vc)
    return {
        'gt_count': len(gt_groups),
        'vlm_count': len(vlm_colors),
        'recall_exact': recall_exact,
        'recall_similar': recall_similar,
        'recall_missed': recall_missed,
        'precision_exact': precision_exact,
        'precision_similar': precision_similar,
        'precision_extra': precision_extra,
        'confusions': confusions,
    }
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 def pct(n: int, d: int) -> str:
    return f"{100 * n / d:.1f}%" if d else "N/A"
 def extract_vlm_colors(jerseys: list[dict]) -> set[str]:
    """Return unique jersey colors from VLM output, ignoring white."""
    vlm_colors = set()
    for j in jerseys:
        jc = j.get('jersey_color', '').strip().lower()
        if jc and jc != 'white':
            vlm_colors.add(jc)
    return vlm_colors
 def parse_response(result: dict) -> tuple[list[dict], set[str]]:
    """Parse a Gemini response into jerseys list and vlm_colors set.
    On JSON parse failure, attempts to salvage jersey objects from truncated
    output.  Returns (jerseys, vlm_colors).
    """
    text = result['resp']['candidates'][0]['content']['parts'][0]['text']
    cleaned = clean_response(text)
    try:
        data = json.loads(cleaned)
        jerseys = data.get('jerseys', [])
    except json.JSONDecodeError:
        jerseys = salvage_jerseys(text)
    return jerseys, extract_vlm_colors(jerseys)
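 def score_and_format(gt_groups, vlm_colors, scores):
    """Return (tag, status_parts) for one image's scoring results.
    tag is "PASS" (every GT color found, no extras), "PARTIAL" (at least one
    GT color found), or "FAIL".  gt_groups and vlm_colors are currently unused.
    """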
    status_parts = []
    if scores['recall_exact']:
        status_parts.append(f"exact:{scores['recall_exact']}")
    if scores['recall_similar']:
        status_parts.append(f"similar:{scores['recall_similar']}")
    if scores['recall_missed']:
        missed_strs = ["|".join(g) for g in scores['recall_missed']]
        status_parts.append(f"MISS:{','.join(missed_strs)}")
    if scores['precision_extra']:
        status_parts.append(f"extra:{','.join(scores['precision_extra'])}")
    found = scores['recall_exact'] + scores['recall_similar']
    all_found = found == scores['gt_count']
    no_extra = not scores['precision_extra']
    if all_found and no_extra:
        tag = "PASS"
    elif found > 0:
        tag = "PARTIAL"
    else:
        tag = "FAIL"
    return tag, status_parts
 def print_summary(model_name, total_gt, total_vlm, total_recall_exact,
                  total_recall_similar, total_recall_missed,
                  total_precision_exact, total_precision_similar,
                  total_precision_extra, confusion_counter, missed_counter,
                  extra_counter, per_image_results, image_count, errors,
                  total_time):
    """Print the full accuracy summary report."""
    print()
    print("=" * 80)
    print(f"ACCURACY SUMMARY  ({model_name})")
    print("=" * 80)
    print(f"Images processed:       {image_count}")
    print(f"Errors:                 {errors}")
    print(f"Total time:             {total_time:.1f}s ({total_time / max(image_count, 1):.1f}s avg)")
    print()
    print(f"Ground truth colors:    {total_gt}  (excluding white)")
    print(f"VLM unique colors:      {total_vlm}  (excluding white)")
    print()
    print("--- Recall (did VLM find each ground truth color?) ---")
    print(f"  Exact match:          {total_recall_exact:4d} / {total_gt}  ({pct(total_recall_exact, total_gt)})")
    print(f"  Similar match:        {total_recall_similar:4d} / {total_gt}  ({pct(total_recall_similar, total_gt)})")
    recall_total = total_recall_exact + total_recall_similar
    print(f"  Total found:          {recall_total:4d} / {total_gt}  ({pct(recall_total, total_gt)})")
    print(f"  Missed:               {total_recall_missed:4d} / {total_gt}  ({pct(total_recall_missed, total_gt)})")
    print()
    print("--- Precision (are VLM colors correct?) ---")
    print(f"  Exact match:          {total_precision_exact:4d} / {total_vlm}  ({pct(total_precision_exact, total_vlm)})")
    print(f"  Similar match:        {total_precision_similar:4d} / {total_vlm}  ({pct(total_precision_similar, total_vlm)})")
    prec_total = total_precision_exact + total_precision_similar
    print(f"  Total correct:        {prec_total:4d} / {total_vlm}  ({pct(prec_total, total_vlm)})")
    print(f"  Extra/wrong:          {total_precision_extra:4d} / {total_vlm}  ({pct(total_precision_extra, total_vlm)})")
    if confusion_counter:
        print()
        print("--- Similar-Match Confusions (expected -> got) ---")
        for (expected, got), count in confusion_counter.most_common():
            print(f"  {expected:30s} -> {got:20s}  x{count}")
    if missed_counter:
        print()
        print("--- Most Missed Ground Truth Colors ---")
        for color, count in missed_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f"  {color:30s} {count:3d}  {bar}")
    if extra_counter:
        print()
        print("--- Most Common Extra/Wrong VLM Colors ---")
        for color, count in extra_counter.most_common(20):
            bar = "#" * min(count, 40)
            print(f"  {color:30s} {count:3d}  {bar}")
    if per_image_results:
        tags = Counter(r['tag'] for r in per_image_results)
        print()
        print("--- Per-Image Verdict ---")
        for tag in ['PASS', 'PARTIAL', 'FAIL']:
            print(f"  {tag:10s} {tags.get(tag, 0):4d}")
        failed = [r for r in per_image_results if r['tag'] == 'FAIL']
        if failed:
            print()
            print(f"--- Failed Images ({len(failed)}) ---")
            for r in failed:
                scores = r['scores']
                missed_strs = ["|".join(g) for g in scores['recall_missed']]
                print(f"  {r['file']}")
                print(f"    missed: {', '.join(missed_strs)}")
                if scores['precision_extra']:
                    print(f"    extra:  {', '.join(scores['precision_extra'])}")
 # ---------------------------------------------------------------------------
 # Main
 # ---------------------------------------------------------------------------
 def main():
    prompt_file = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PROMPT_FILE
    api_key = load_api_key()
    with open(prompt_file, 'r', encoding='utf-8') as f:
        prompt = f.read()
    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
    image_files = sorted([
        p for p in Path(IMAGES_DIR).iterdir()
        if p.suffix.lower() in valid_extensions
    ])
    print(f"Model: {GEMINI_MODEL}")
    print(f"Images to process: {len(image_files)}")
    print(f"Concurrency: {CONCURRENT_WORKERS} workers")
    print(f"Prompt: {prompt_file} ({len(prompt)} chars)")
    print("=" * 80)
    # ------------------------------------------------------------------
    # Phase 1: Pre-encode all images
    # ------------------------------------------------------------------
    print("Pre-encoding images ... ", end="", flush=True)
    t_enc = time.time()
    encoded_images = []
    for image_path in image_files:
        encoded_images.append(encode_image(str(image_path)))
    print(f"{len(encoded_images)} images in {time.time() - t_enc:.1f}s")
    # ------------------------------------------------------------------
    # Phase 2: Submit all API calls concurrently
    # ------------------------------------------------------------------
    session = requests.Session()
    start_all = time.time()
    print("Sending API requests ... ", flush=True)
    api_results = [None] * len(image_files)
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENT_WORKERS) as executor:
        future_to_idx = {}
        for i, (image_data, mime_type) in enumerate(encoded_images):
            future = executor.submit(
                _api_worker, session, api_key, image_data, mime_type, prompt,
            )
            future_to_idx[future] = i
        completed = 0
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            api_results[idx] = future.result()
            completed += 1
            print(f"\r  {completed}/{len(image_files)} API calls completed", end="", flush=True)
    api_time = time.time() - start_all
    print(f"  ({api_time:.1f}s total)")
    print("=" * 80)
    # ------------------------------------------------------------------
    # Phase 3: Score results in order
    # ------------------------------------------------------------------
    total_gt = 0
    total_vlm = 0
    total_recall_exact = 0
    total_recall_similar = 0
    total_recall_missed = 0
    total_precision_exact = 0
    total_precision_similar = 0
    total_precision_extra = 0
    errors = 0
    confusion_counter = Counter()
    missed_counter = Counter()
    extra_counter = Counter()
    per_image_results = []
    for i, (image_path, result) in enumerate(zip(image_files, api_results), 1):
        gt_groups = parse_ground_truth(image_path.name)
        gt_display = ", ".join("|".join(g) for g in gt_groups) if gt_groups else "(none)"
        print(f"\n[{i}/{len(image_files)}] {image_path.name}")
        print(f"         GT: [{gt_display}]")
        if result['error'] is not None:
            e = result['error']
            if isinstance(e, requests.exceptions.HTTPError):
                print(f"         HTTP ERROR: {e}")
            else:
                print(f"         ERROR: {e}")
            errors += 1
            continue
        elapsed = result['elapsed']
        try:
            jerseys, vlm_colors = parse_response(result)
            vlm_display = ", ".join(sorted(vlm_colors)) if vlm_colors else "(none)"
            print(f"         VLM: [{vlm_display}]  ({len(jerseys)} jersey(s), {elapsed:.1f}s)")
            if not gt_groups:
                print("         -- no ground truth colors (white-only), skipping scoring")
                continue
            scores = score_image(gt_groups, vlm_colors)
            total_gt += scores['gt_count']
            total_vlm += scores['vlm_count']
            total_recall_exact += scores['recall_exact']
            total_recall_similar += scores['recall_similar']
            total_recall_missed += len(scores['recall_missed'])
            total_precision_exact += scores['precision_exact']
            total_precision_similar += scores['precision_similar']
            total_precision_extra += len(scores['precision_extra'])
            for group, got in scores['confusions']:
                confusion_counter[("|".join(group), got)] += 1
            for group in scores['recall_missed']:
                missed_counter["|".join(group)] += 1
            for ec in scores['precision_extra']:
                extra_counter[ec] += 1
            tag, status_parts = score_and_format(gt_groups, vlm_colors, scores)
            print(f"         {tag}  {', '.join(status_parts)}")
            per_image_results.append({
                'file': image_path.name,
                'tag': tag,
                'scores': scores,
            })
        except Exception as e:
            print(f"         PARSE ERROR: {e}")
            errors += 1
    total_time = time.time() - start_all
    print_summary(
        GEMINI_MODEL, total_gt, total_vlm, total_recall_exact,
        total_recall_similar, total_recall_missed, total_precision_exact,
        total_precision_similar, total_precision_extra, confusion_counter,
        missed_counter, extra_counter, per_image_results, len(image_files),
        errors, total_time,
    )
 if __name__ == '__main__':
    main()
--- a/uv.lock
+++ b/uv.lock
@ -0,0 +1,198 @@
 version = 1
 revision = 3
 requires-python = ">=3.12"
 [[package]]
 name = "certifi"
 version = "2026.1.4"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/e0/2d/a891ca51311197f6ad14a7ef42e2399f36cf2f9bd44752b3dc4eab60fdc5/certifi-2026.1.4.tar.gz", hash = "sha256:ac726dd470482006e014ad384921ed6438c457018f4b3d204aea4281258b2120", size = 154268, upload-time = "2026-01-04T02:42:41.825Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/e6/ad/3cc14f097111b4de0040c83a525973216457bbeeb63739ef1ed275c1c021/certifi-2026.1.4-py3-none-any.whl", hash = "sha256:9943707519e4add1115f44c2bc244f782c0249876bf51b6599fee1ffbedd685c", size = 152900, upload-time = "2026-01-04T02:42:40.15Z" },
 ]
 [[package]]
 name = "charset-normalizer"
 version = "3.4.4"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/13/69/33ddede1939fdd074bce5434295f38fae7136463422fe4fd3e0e89b98062/charset_normalizer-3.4.4.tar.gz", hash = "sha256:94537985111c35f28720e43603b8e7b43a6ecfb2ce1d3058bbe955b73404e21a", size = 129418, upload-time = "2025-10-14T04:42:32.879Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/f3/85/1637cd4af66fa687396e757dec650f28025f2a2f5a5531a3208dc0ec43f2/charset_normalizer-3.4.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:0a98e6759f854bd25a58a73fa88833fba3b7c491169f86ce1180c948ab3fd394", size = 208425, upload-time = "2025-10-14T04:40:53.353Z" },
    { url = "https://files.pythonhosted.org/packages/9d/6a/04130023fef2a0d9c62d0bae2649b69f7b7d8d24ea5536feef50551029df/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b5b290ccc2a263e8d185130284f8501e3e36c5e02750fc6b6bdeb2e9e96f1e25", size = 148162, upload-time = "2025-10-14T04:40:54.558Z" },
    { url = "https://files.pythonhosted.org/packages/78/29/62328d79aa60da22c9e0b9a66539feae06ca0f5a4171ac4f7dc285b83688/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74bb723680f9f7a6234dcf67aea57e708ec1fbdf5699fb91dfd6f511b0a320ef", size = 144558, upload-time = "2025-10-14T04:40:55.677Z" },
    { url = "https://files.pythonhosted.org/packages/86/bb/b32194a4bf15b88403537c2e120b817c61cd4ecffa9b6876e941c3ee38fe/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f1e34719c6ed0b92f418c7c780480b26b5d9c50349e9a9af7d76bf757530350d", size = 161497, upload-time = "2025-10-14T04:40:57.217Z" },
    { url = "https://files.pythonhosted.org/packages/19/89/a54c82b253d5b9b111dc74aca196ba5ccfcca8242d0fb64146d4d3183ff1/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2437418e20515acec67d86e12bf70056a33abdacb5cb1655042f6538d6b085a8", size = 159240, upload-time = "2025-10-14T04:40:58.358Z" },
    { url = "https://files.pythonhosted.org/packages/c0/10/d20b513afe03acc89ec33948320a5544d31f21b05368436d580dec4e234d/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:11d694519d7f29d6cd09f6ac70028dba10f92f6cdd059096db198c283794ac86", size = 153471, upload-time = "2025-10-14T04:40:59.468Z" },
    { url = "https://files.pythonhosted.org/packages/61/fa/fbf177b55bdd727010f9c0a3c49eefa1d10f960e5f09d1d887bf93c2e698/charset_normalizer-3.4.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ac1c4a689edcc530fc9d9aa11f5774b9e2f33f9a0c6a57864e90908f5208d30a", size = 150864, upload-time = "2025-10-14T04:41:00.623Z" },
    { url = "https://files.pythonhosted.org/packages/05/12/9fbc6a4d39c0198adeebbde20b619790e9236557ca59fc40e0e3cebe6f40/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:21d142cc6c0ec30d2efee5068ca36c128a30b0f2c53c1c07bd78cb6bc1d3be5f", size = 150647, upload-time = "2025-10-14T04:41:01.754Z" },
    { url = "https://files.pythonhosted.org/packages/ad/1f/6a9a593d52e3e8c5d2b167daf8c6b968808efb57ef4c210acb907c365bc4/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:5dbe56a36425d26d6cfb40ce79c314a2e4dd6211d51d6d2191c00bed34f354cc", size = 145110, upload-time = "2025-10-14T04:41:03.231Z" },
    { url = "https://files.pythonhosted.org/packages/30/42/9a52c609e72471b0fc54386dc63c3781a387bb4fe61c20231a4ebcd58bdd/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:5bfbb1b9acf3334612667b61bd3002196fe2a1eb4dd74d247e0f2a4d50ec9bbf", size = 162839, upload-time = "2025-10-14T04:41:04.715Z" },
    { url = "https://files.pythonhosted.org/packages/c4/5b/c0682bbf9f11597073052628ddd38344a3d673fda35a36773f7d19344b23/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:d055ec1e26e441f6187acf818b73564e6e6282709e9bcb5b63f5b23068356a15", size = 150667, upload-time = "2025-10-14T04:41:05.827Z" },
    { url = "https://files.pythonhosted.org/packages/e4/24/a41afeab6f990cf2daf6cb8c67419b63b48cf518e4f56022230840c9bfb2/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:af2d8c67d8e573d6de5bc30cdb27e9b95e49115cd9baad5ddbd1a6207aaa82a9", size = 160535, upload-time = "2025-10-14T04:41:06.938Z" },
    { url = "https://files.pythonhosted.org/packages/2a/e5/6a4ce77ed243c4a50a1fecca6aaaab419628c818a49434be428fe24c9957/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:780236ac706e66881f3b7f2f32dfe90507a09e67d1d454c762cf642e6e1586e0", size = 154816, upload-time = "2025-10-14T04:41:08.101Z" },
    { url = "https://files.pythonhosted.org/packages/a8/ef/89297262b8092b312d29cdb2517cb1237e51db8ecef2e9af5edbe7b683b1/charset_normalizer-3.4.4-cp312-cp312-win32.whl", hash = "sha256:5833d2c39d8896e4e19b689ffc198f08ea58116bee26dea51e362ecc7cd3ed26", size = 99694, upload-time = "2025-10-14T04:41:09.23Z" },
    { url = "https://files.pythonhosted.org/packages/3d/2d/1e5ed9dd3b3803994c155cd9aacb60c82c331bad84daf75bcb9c91b3295e/charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl", hash = "sha256:a79cfe37875f822425b89a82333404539ae63dbdddf97f84dcbc3d339aae9525", size = 107131, upload-time = "2025-10-14T04:41:10.467Z" },
    { url = "https://files.pythonhosted.org/packages/d0/d9/0ed4c7098a861482a7b6a95603edce4c0d9db2311af23da1fb2b75ec26fc/charset_normalizer-3.4.4-cp312-cp312-win_arm64.whl", hash = "sha256:376bec83a63b8021bb5c8ea75e21c4ccb86e7e45ca4eb81146091b56599b80c3", size = 100390, upload-time = "2025-10-14T04:41:11.915Z" },
    { url = "https://files.pythonhosted.org/packages/97/45/4b3a1239bbacd321068ea6e7ac28875b03ab8bc0aa0966452db17cd36714/charset_normalizer-3.4.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:e1f185f86a6f3403aa2420e815904c67b2f9ebc443f045edd0de921108345794", size = 208091, upload-time = "2025-10-14T04:41:13.346Z" },
    { url = "https://files.pythonhosted.org/packages/7d/62/73a6d7450829655a35bb88a88fca7d736f9882a27eacdca2c6d505b57e2e/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b39f987ae8ccdf0d2642338faf2abb1862340facc796048b604ef14919e55ed", size = 147936, upload-time = "2025-10-14T04:41:14.461Z" },
    { url = "https://files.pythonhosted.org/packages/89/c5/adb8c8b3d6625bef6d88b251bbb0d95f8205831b987631ab0c8bb5d937c2/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:3162d5d8ce1bb98dd51af660f2121c55d0fa541b46dff7bb9b9f86ea1d87de72", size = 144180, upload-time = "2025-10-14T04:41:15.588Z" },
    { url = "https://files.pythonhosted.org/packages/91/ed/9706e4070682d1cc219050b6048bfd293ccf67b3d4f5a4f39207453d4b99/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:81d5eb2a312700f4ecaa977a8235b634ce853200e828fbadf3a9c50bab278328", size = 161346, upload-time = "2025-10-14T04:41:16.738Z" },
    { url = "https://files.pythonhosted.org/packages/d5/0d/031f0d95e4972901a2f6f09ef055751805ff541511dc1252ba3ca1f80cf5/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5bd2293095d766545ec1a8f612559f6b40abc0eb18bb2f5d1171872d34036ede", size = 158874, upload-time = "2025-10-14T04:41:17.923Z" },
    { url = "https://files.pythonhosted.org/packages/f5/83/6ab5883f57c9c801ce5e5677242328aa45592be8a00644310a008d04f922/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a8a8b89589086a25749f471e6a900d3f662d1d3b6e2e59dcecf787b1cc3a1894", size = 153076, upload-time = "2025-10-14T04:41:19.106Z" },
    { url = "https://files.pythonhosted.org/packages/75/1e/5ff781ddf5260e387d6419959ee89ef13878229732732ee73cdae01800f2/charset_normalizer-3.4.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc7637e2f80d8530ee4a78e878bce464f70087ce73cf7c1caf142416923b98f1", size = 150601, upload-time = "2025-10-14T04:41:20.245Z" },
    { url = "https://files.pythonhosted.org/packages/d7/57/71be810965493d3510a6ca79b90c19e48696fb1ff964da319334b12677f0/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f8bf04158c6b607d747e93949aa60618b61312fe647a6369f88ce2ff16043490", size = 150376, upload-time = "2025-10-14T04:41:21.398Z" },
    { url = "https://files.pythonhosted.org/packages/e5/d5/c3d057a78c181d007014feb7e9f2e65905a6c4ef182c0ddf0de2924edd65/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:554af85e960429cf30784dd47447d5125aaa3b99a6f0683589dbd27e2f45da44", size = 144825, upload-time = "2025-10-14T04:41:22.583Z" },
    { url = "https://files.pythonhosted.org/packages/e6/8c/d0406294828d4976f275ffbe66f00266c4b3136b7506941d87c00cab5272/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:74018750915ee7ad843a774364e13a3db91682f26142baddf775342c3f5b1133", size = 162583, upload-time = "2025-10-14T04:41:23.754Z" },
    { url = "https://files.pythonhosted.org/packages/d7/24/e2aa1f18c8f15c4c0e932d9287b8609dd30ad56dbe41d926bd846e22fb8d/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:c0463276121fdee9c49b98908b3a89c39be45d86d1dbaa22957e38f6321d4ce3", size = 150366, upload-time = "2025-10-14T04:41:25.27Z" },
    { url = "https://files.pythonhosted.org/packages/e4/5b/1e6160c7739aad1e2df054300cc618b06bf784a7a164b0f238360721ab86/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:362d61fd13843997c1c446760ef36f240cf81d3ebf74ac62652aebaf7838561e", size = 160300, upload-time = "2025-10-14T04:41:26.725Z" },
    { url = "https://files.pythonhosted.org/packages/7a/10/f882167cd207fbdd743e55534d5d9620e095089d176d55cb22d5322f2afd/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9a26f18905b8dd5d685d6d07b0cdf98a79f3c7a918906af7cc143ea2e164c8bc", size = 154465, upload-time = "2025-10-14T04:41:28.322Z" },
    { url = "https://files.pythonhosted.org/packages/89/66/c7a9e1b7429be72123441bfdbaf2bc13faab3f90b933f664db506dea5915/charset_normalizer-3.4.4-cp313-cp313-win32.whl", hash = "sha256:9b35f4c90079ff2e2edc5b26c0c77925e5d2d255c42c74fdb70fb49b172726ac", size = 99404, upload-time = "2025-10-14T04:41:29.95Z" },
    { url = "https://files.pythonhosted.org/packages/c4/26/b9924fa27db384bdcd97ab83b4f0a8058d96ad9626ead570674d5e737d90/charset_normalizer-3.4.4-cp313-cp313-win_amd64.whl", hash = "sha256:b435cba5f4f750aa6c0a0d92c541fb79f69a387c91e61f1795227e4ed9cece14", size = 107092, upload-time = "2025-10-14T04:41:31.188Z" },
    { url = "https://files.pythonhosted.org/packages/af/8f/3ed4bfa0c0c72a7ca17f0380cd9e4dd842b09f664e780c13cff1dcf2ef1b/charset_normalizer-3.4.4-cp313-cp313-win_arm64.whl", hash = "sha256:542d2cee80be6f80247095cc36c418f7bddd14f4a6de45af91dfad36d817bba2", size = 100408, upload-time = "2025-10-14T04:41:32.624Z" },
    { url = "https://files.pythonhosted.org/packages/2a/35/7051599bd493e62411d6ede36fd5af83a38f37c4767b92884df7301db25d/charset_normalizer-3.4.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:da3326d9e65ef63a817ecbcc0df6e94463713b754fe293eaa03da99befb9a5bd", size = 207746, upload-time = "2025-10-14T04:41:33.773Z" },
    { url = "https://files.pythonhosted.org/packages/10/9a/97c8d48ef10d6cd4fcead2415523221624bf58bcf68a802721a6bc807c8f/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8af65f14dc14a79b924524b1e7fffe304517b2bff5a58bf64f30b98bbc5079eb", size = 147889, upload-time = "2025-10-14T04:41:34.897Z" },
    { url = "https://files.pythonhosted.org/packages/10/bf/979224a919a1b606c82bd2c5fa49b5c6d5727aa47b4312bb27b1734f53cd/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74664978bb272435107de04e36db5a9735e78232b85b77d45cfb38f758efd33e", size = 143641, upload-time = "2025-10-14T04:41:36.116Z" },
    { url = "https://files.pythonhosted.org/packages/ba/33/0ad65587441fc730dc7bd90e9716b30b4702dc7b617e6ba4997dc8651495/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:752944c7ffbfdd10c074dc58ec2d5a8a4cd9493b314d367c14d24c17684ddd14", size = 160779, upload-time = "2025-10-14T04:41:37.229Z" },
    { url = "https://files.pythonhosted.org/packages/67/ed/331d6b249259ee71ddea93f6f2f0a56cfebd46938bde6fcc6f7b9a3d0e09/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d1f13550535ad8cff21b8d757a3257963e951d96e20ec82ab44bc64aeb62a191", size = 159035, upload-time = "2025-10-14T04:41:38.368Z" },
    { url = "https://files.pythonhosted.org/packages/67/ff/f6b948ca32e4f2a4576aa129d8bed61f2e0543bf9f5f2b7fc3758ed005c9/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ecaae4149d99b1c9e7b88bb03e3221956f68fd6d50be2ef061b2381b61d20838", size = 152542, upload-time = "2025-10-14T04:41:39.862Z" },
    { url = "https://files.pythonhosted.org/packages/16/85/276033dcbcc369eb176594de22728541a925b2632f9716428c851b149e83/charset_normalizer-3.4.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:cb6254dc36b47a990e59e1068afacdcd02958bdcce30bb50cc1700a8b9d624a6", size = 149524, upload-time = "2025-10-14T04:41:41.319Z" },
    { url = "https://files.pythonhosted.org/packages/9e/f2/6a2a1f722b6aba37050e626530a46a68f74e63683947a8acff92569f979a/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:c8ae8a0f02f57a6e61203a31428fa1d677cbe50c93622b4149d5c0f319c1d19e", size = 150395, upload-time = "2025-10-14T04:41:42.539Z" },
    { url = "https://files.pythonhosted.org/packages/60/bb/2186cb2f2bbaea6338cad15ce23a67f9b0672929744381e28b0592676824/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:47cc91b2f4dd2833fddaedd2893006b0106129d4b94fdb6af1f4ce5a9965577c", size = 143680, upload-time = "2025-10-14T04:41:43.661Z" },
    { url = "https://files.pythonhosted.org/packages/7d/a5/bf6f13b772fbb2a90360eb620d52ed8f796f3c5caee8398c3b2eb7b1c60d/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:82004af6c302b5d3ab2cfc4cc5f29db16123b1a8417f2e25f9066f91d4411090", size = 162045, upload-time = "2025-10-14T04:41:44.821Z" },
    { url = "https://files.pythonhosted.org/packages/df/c5/d1be898bf0dc3ef9030c3825e5d3b83f2c528d207d246cbabe245966808d/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:2b7d8f6c26245217bd2ad053761201e9f9680f8ce52f0fcd8d0755aeae5b2152", size = 149687, upload-time = "2025-10-14T04:41:46.442Z" },
    { url = "https://files.pythonhosted.org/packages/a5/42/90c1f7b9341eef50c8a1cb3f098ac43b0508413f33affd762855f67a410e/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:799a7a5e4fb2d5898c60b640fd4981d6a25f1c11790935a44ce38c54e985f828", size = 160014, upload-time = "2025-10-14T04:41:47.631Z" },
    { url = "https://files.pythonhosted.org/packages/76/be/4d3ee471e8145d12795ab655ece37baed0929462a86e72372fd25859047c/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:99ae2cffebb06e6c22bdc25801d7b30f503cc87dbd283479e7b606f70aff57ec", size = 154044, upload-time = "2025-10-14T04:41:48.81Z" },
    { url = "https://files.pythonhosted.org/packages/b0/6f/8f7af07237c34a1defe7defc565a9bc1807762f672c0fde711a4b22bf9c0/charset_normalizer-3.4.4-cp314-cp314-win32.whl", hash = "sha256:f9d332f8c2a2fcbffe1378594431458ddbef721c1769d78e2cbc06280d8155f9", size = 99940, upload-time = "2025-10-14T04:41:49.946Z" },
    { url = "https://files.pythonhosted.org/packages/4b/51/8ade005e5ca5b0d80fb4aff72a3775b325bdc3d27408c8113811a7cbe640/charset_normalizer-3.4.4-cp314-cp314-win_amd64.whl", hash = "sha256:8a6562c3700cce886c5be75ade4a5db4214fda19fede41d9792d100288d8f94c", size = 107104, upload-time = "2025-10-14T04:41:51.051Z" },
    { url = "https://files.pythonhosted.org/packages/da/5f/6b8f83a55bb8278772c5ae54a577f3099025f9ade59d0136ac24a0df4bde/charset_normalizer-3.4.4-cp314-cp314-win_arm64.whl", hash = "sha256:de00632ca48df9daf77a2c65a484531649261ec9f25489917f09e455cb09ddb2", size = 100743, upload-time = "2025-10-14T04:41:52.122Z" },
    { url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
 ]
 [[package]]
 name = "idna"
 version = "3.11"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/6f/6d/0703ccc57f3a7233505399edb88de3cbd678da106337b9fcde432b65ed60/idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902", size = 194582, upload-time = "2025-10-12T14:55:20.501Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" },
 ]
 [[package]]
 name = "jersey-test"
 version = "0.1.0"
 source = { virtual = "." }
 dependencies = [
    { name = "numpy" },
    { name = "opencv-python" },
    { name = "requests" },
 ]
 [package.metadata]
 requires-dist = [
    { name = "numpy", specifier = ">=1.24.0" },
    { name = "opencv-python", specifier = ">=4.8.0" },
    { name = "requests", specifier = ">=2.28.0" },
 ]
 [[package]]
 name = "numpy"
 version = "2.4.2"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/57/fd/0005efbd0af48e55eb3c7208af93f2862d4b1a56cd78e84309a2d959208d/numpy-2.4.2.tar.gz", hash = "sha256:659a6107e31a83c4e33f763942275fd278b21d095094044eb35569e86a21ddae", size = 20723651, upload-time = "2026-01-31T23:13:10.135Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/51/6e/6f394c9c77668153e14d4da83bcc247beb5952f6ead7699a1a2992613bea/numpy-2.4.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:21982668592194c609de53ba4933a7471880ccbaadcc52352694a59ecc860b3a", size = 16667963, upload-time = "2026-01-31T23:10:52.147Z" },
    { url = "https://files.pythonhosted.org/packages/1f/f8/55483431f2b2fd015ae6ed4fe62288823ce908437ed49db5a03d15151678/numpy-2.4.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:40397bda92382fcec844066efb11f13e1c9a3e2a8e8f318fb72ed8b6db9f60f1", size = 14693571, upload-time = "2026-01-31T23:10:54.789Z" },
    { url = "https://files.pythonhosted.org/packages/2f/20/18026832b1845cdc82248208dd929ca14c9d8f2bac391f67440707fff27c/numpy-2.4.2-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:b3a24467af63c67829bfaa61eecf18d5432d4f11992688537be59ecd6ad32f5e", size = 5203469, upload-time = "2026-01-31T23:10:57.343Z" },
    { url = "https://files.pythonhosted.org/packages/7d/33/2eb97c8a77daaba34eaa3fa7241a14ac5f51c46a6bd5911361b644c4a1e2/numpy-2.4.2-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:805cc8de9fd6e7a22da5aed858e0ab16be5a4db6c873dde1d7451c541553aa27", size = 6550820, upload-time = "2026-01-31T23:10:59.429Z" },
    { url = "https://files.pythonhosted.org/packages/b1/91/b97fdfd12dc75b02c44e26c6638241cc004d4079a0321a69c62f51470c4c/numpy-2.4.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6d82351358ffbcdcd7b686b90742a9b86632d6c1c051016484fa0b326a0a1548", size = 15663067, upload-time = "2026-01-31T23:11:01.291Z" },
    { url = "https://files.pythonhosted.org/packages/f5/c6/a18e59f3f0b8071cc85cbc8d80cd02d68aa9710170b2553a117203d46936/numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9e35d3e0144137d9fdae62912e869136164534d64a169f86438bc9561b6ad49f", size = 16619782, upload-time = "2026-01-31T23:11:03.669Z" },
    { url = "https://files.pythonhosted.org/packages/b7/83/9751502164601a79e18847309f5ceec0b1446d7b6aa12305759b72cf98b2/numpy-2.4.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:adb6ed2ad29b9e15321d167d152ee909ec73395901b70936f029c3bc6d7f4460", size = 17013128, upload-time = "2026-01-31T23:11:05.913Z" },
    { url = "https://files.pythonhosted.org/packages/61/c4/c4066322256ec740acc1c8923a10047818691d2f8aec254798f3dd90f5f2/numpy-2.4.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:8906e71fd8afcb76580404e2a950caef2685df3d2a57fe82a86ac8d33cc007ba", size = 18345324, upload-time = "2026-01-31T23:11:08.248Z" },
    { url = "https://files.pythonhosted.org/packages/ab/af/6157aa6da728fa4525a755bfad486ae7e3f76d4c1864138003eb84328497/numpy-2.4.2-cp312-cp312-win32.whl", hash = "sha256:ec055f6dae239a6299cace477b479cca2fc125c5675482daf1dd886933a1076f", size = 5960282, upload-time = "2026-01-31T23:11:10.497Z" },
    { url = "https://files.pythonhosted.org/packages/92/0f/7ceaaeaacb40567071e94dbf2c9480c0ae453d5bb4f52bea3892c39dc83c/numpy-2.4.2-cp312-cp312-win_amd64.whl", hash = "sha256:209fae046e62d0ce6435fcfe3b1a10537e858249b3d9b05829e2a05218296a85", size = 12314210, upload-time = "2026-01-31T23:11:12.176Z" },
    { url = "https://files.pythonhosted.org/packages/2f/a3/56c5c604fae6dd40fa2ed3040d005fca97e91bd320d232ac9931d77ba13c/numpy-2.4.2-cp312-cp312-win_arm64.whl", hash = "sha256:fbde1b0c6e81d56f5dccd95dd4a711d9b95df1ae4009a60887e56b27e8d903fa", size = 10220171, upload-time = "2026-01-31T23:11:14.684Z" },
    { url = "https://files.pythonhosted.org/packages/a1/22/815b9fe25d1d7ae7d492152adbc7226d3eff731dffc38fe970589fcaaa38/numpy-2.4.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:25f2059807faea4b077a2b6837391b5d830864b3543627f381821c646f31a63c", size = 16663696, upload-time = "2026-01-31T23:11:17.516Z" },
    { url = "https://files.pythonhosted.org/packages/09/f0/817d03a03f93ba9c6c8993de509277d84e69f9453601915e4a69554102a1/numpy-2.4.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:bd3a7a9f5847d2fb8c2c6d1c862fa109c31a9abeca1a3c2bd5a64572955b2979", size = 14688322, upload-time = "2026-01-31T23:11:19.883Z" },
    { url = "https://files.pythonhosted.org/packages/da/b4/f805ab79293c728b9a99438775ce51885fd4f31b76178767cfc718701a39/numpy-2.4.2-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:8e4549f8a3c6d13d55041925e912bfd834285ef1dd64d6bc7d542583355e2e98", size = 5198157, upload-time = "2026-01-31T23:11:22.375Z" },
    { url = "https://files.pythonhosted.org/packages/74/09/826e4289844eccdcd64aac27d13b0fd3f32039915dd5b9ba01baae1f436c/numpy-2.4.2-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:aea4f66ff44dfddf8c2cffd66ba6538c5ec67d389285292fe428cb2c738c8aef", size = 6546330, upload-time = "2026-01-31T23:11:23.958Z" },
    { url = "https://files.pythonhosted.org/packages/19/fb/cbfdbfa3057a10aea5422c558ac57538e6acc87ec1669e666d32ac198da7/numpy-2.4.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c3cd545784805de05aafe1dde61752ea49a359ccba9760c1e5d1c88a93bbf2b7", size = 15660968, upload-time = "2026-01-31T23:11:25.713Z" },
    { url = "https://files.pythonhosted.org/packages/04/dc/46066ce18d01645541f0186877377b9371b8fa8017fa8262002b4ef22612/numpy-2.4.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d0d9b7c93578baafcbc5f0b83eaf17b79d345c6f36917ba0c67f45226911d499", size = 16607311, upload-time = "2026-01-31T23:11:28.117Z" },
    { url = "https://files.pythonhosted.org/packages/14/d9/4b5adfc39a43fa6bf918c6d544bc60c05236cc2f6339847fc5b35e6cb5b0/numpy-2.4.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f74f0f7779cc7ae07d1810aab8ac6b1464c3eafb9e283a40da7309d5e6e48fbb", size = 17012850, upload-time = "2026-01-31T23:11:30.888Z" },
    { url = "https://files.pythonhosted.org/packages/b7/20/adb6e6adde6d0130046e6fdfb7675cc62bc2f6b7b02239a09eb58435753d/numpy-2.4.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c7ac672d699bf36275c035e16b65539931347d68b70667d28984c9fb34e07fa7", size = 18334210, upload-time = "2026-01-31T23:11:33.214Z" },
    { url = "https://files.pythonhosted.org/packages/78/0e/0a73b3dff26803a8c02baa76398015ea2a5434d9b8265a7898a6028c1591/numpy-2.4.2-cp313-cp313-win32.whl", hash = "sha256:8e9afaeb0beff068b4d9cd20d322ba0ee1cecfb0b08db145e4ab4dd44a6b5110", size = 5958199, upload-time = "2026-01-31T23:11:35.385Z" },
    { url = "https://files.pythonhosted.org/packages/43/bc/6352f343522fcb2c04dbaf94cb30cca6fd32c1a750c06ad6231b4293708c/numpy-2.4.2-cp313-cp313-win_amd64.whl", hash = "sha256:7df2de1e4fba69a51c06c28f5a3de36731eb9639feb8e1cf7e4a7b0daf4cf622", size = 12310848, upload-time = "2026-01-31T23:11:38.001Z" },
    { url = "https://files.pythonhosted.org/packages/6e/8d/6da186483e308da5da1cc6918ce913dcfe14ffde98e710bfeff2a6158d4e/numpy-2.4.2-cp313-cp313-win_arm64.whl", hash = "sha256:0fece1d1f0a89c16b03442eae5c56dc0be0c7883b5d388e0c03f53019a4bfd71", size = 10221082, upload-time = "2026-01-31T23:11:40.392Z" },
    { url = "https://files.pythonhosted.org/packages/25/a1/9510aa43555b44781968935c7548a8926274f815de42ad3997e9e83680dd/numpy-2.4.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:5633c0da313330fd20c484c78cdd3f9b175b55e1a766c4a174230c6b70ad8262", size = 14815866, upload-time = "2026-01-31T23:11:42.495Z" },
    { url = "https://files.pythonhosted.org/packages/36/30/6bbb5e76631a5ae46e7923dd16ca9d3f1c93cfa8d4ed79a129814a9d8db3/numpy-2.4.2-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:d9f64d786b3b1dd742c946c42d15b07497ed14af1a1f3ce840cce27daa0ce913", size = 5325631, upload-time = "2026-01-31T23:11:44.7Z" },
    { url = "https://files.pythonhosted.org/packages/46/00/3a490938800c1923b567b3a15cd17896e68052e2145d8662aaf3e1ffc58f/numpy-2.4.2-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:b21041e8cb6a1eb5312dd1d2f80a94d91efffb7a06b70597d44f1bd2dfc315ab", size = 6646254, upload-time = "2026-01-31T23:11:46.341Z" },
    { url = "https://files.pythonhosted.org/packages/d3/e9/fac0890149898a9b609caa5af7455a948b544746e4b8fe7c212c8edd71f8/numpy-2.4.2-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:00ab83c56211a1d7c07c25e3217ea6695e50a3e2f255053686b081dc0b091a82", size = 15720138, upload-time = "2026-01-31T23:11:48.082Z" },
    { url = "https://files.pythonhosted.org/packages/ea/5c/08887c54e68e1e28df53709f1893ce92932cc6f01f7c3d4dc952f61ffd4e/numpy-2.4.2-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2fb882da679409066b4603579619341c6d6898fc83a8995199d5249f986e8e8f", size = 16655398, upload-time = "2026-01-31T23:11:50.293Z" },
    { url = "https://files.pythonhosted.org/packages/4d/89/253db0fa0e66e9129c745e4ef25631dc37d5f1314dad2b53e907b8538e6d/numpy-2.4.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:66cb9422236317f9d44b67b4d18f44efe6e9c7f8794ac0462978513359461554", size = 17079064, upload-time = "2026-01-31T23:11:52.927Z" },
    { url = "https://files.pythonhosted.org/packages/2a/d5/cbade46ce97c59c6c3da525e8d95b7abe8a42974a1dc5c1d489c10433e88/numpy-2.4.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:0f01dcf33e73d80bd8dc0f20a71303abbafa26a19e23f6b68d1aa9990af90257", size = 18379680, upload-time = "2026-01-31T23:11:55.22Z" },
    { url = "https://files.pythonhosted.org/packages/40/62/48f99ae172a4b63d981babe683685030e8a3df4f246c893ea5c6ef99f018/numpy-2.4.2-cp313-cp313t-win32.whl", hash = "sha256:52b913ec40ff7ae845687b0b34d8d93b60cb66dcee06996dd5c99f2fc9328657", size = 6082433, upload-time = "2026-01-31T23:11:58.096Z" },
    { url = "https://files.pythonhosted.org/packages/07/38/e054a61cfe48ad9f1ed0d188e78b7e26859d0b60ef21cd9de4897cdb5326/numpy-2.4.2-cp313-cp313t-win_amd64.whl", hash = "sha256:5eea80d908b2c1f91486eb95b3fb6fab187e569ec9752ab7d9333d2e66bf2d6b", size = 12451181, upload-time = "2026-01-31T23:11:59.782Z" },
    { url = "https://files.pythonhosted.org/packages/6e/a4/a05c3a6418575e185dd84d0b9680b6bb2e2dc3e4202f036b7b4e22d6e9dc/numpy-2.4.2-cp313-cp313t-win_arm64.whl", hash = "sha256:fd49860271d52127d61197bb50b64f58454e9f578cb4b2c001a6de8b1f50b0b1", size = 10290756, upload-time = "2026-01-31T23:12:02.438Z" },
    { url = "https://files.pythonhosted.org/packages/18/88/b7df6050bf18fdcfb7046286c6535cabbdd2064a3440fca3f069d319c16e/numpy-2.4.2-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:444be170853f1f9d528428eceb55f12918e4fda5d8805480f36a002f1415e09b", size = 16663092, upload-time = "2026-01-31T23:12:04.521Z" },
    { url = "https://files.pythonhosted.org/packages/25/7a/1fee4329abc705a469a4afe6e69b1ef7e915117747886327104a8493a955/numpy-2.4.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:d1240d50adff70c2a88217698ca844723068533f3f5c5fa6ee2e3220e3bdb000", size = 14698770, upload-time = "2026-01-31T23:12:06.96Z" },
    { url = "https://files.pythonhosted.org/packages/fb/0b/f9e49ba6c923678ad5bc38181c08ac5e53b7a5754dbca8e581aa1a56b1ff/numpy-2.4.2-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:7cdde6de52fb6664b00b056341265441192d1291c130e99183ec0d4b110ff8b1", size = 5208562, upload-time = "2026-01-31T23:12:09.632Z" },
    { url = "https://files.pythonhosted.org/packages/7d/12/d7de8f6f53f9bb76997e5e4c069eda2051e3fe134e9181671c4391677bb2/numpy-2.4.2-cp314-cp314-macosx_14_0_x86_64.whl", hash = "sha256:cda077c2e5b780200b6b3e09d0b42205a3d1c68f30c6dceb90401c13bff8fe74", size = 6543710, upload-time = "2026-01-31T23:12:11.969Z" },
    { url = "https://files.pythonhosted.org/packages/09/63/c66418c2e0268a31a4cf8a8b512685748200f8e8e8ec6c507ce14e773529/numpy-2.4.2-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d30291931c915b2ab5717c2974bb95ee891a1cf22ebc16a8006bd59cd210d40a", size = 15677205, upload-time = "2026-01-31T23:12:14.33Z" },
    { url = "https://files.pythonhosted.org/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bba37bc29d4d85761deed3954a1bc62be7cf462b9510b51d367b769a8c8df325", size = 16611738, upload-time = "2026-01-31T23:12:16.525Z" },
    { url = "https://files.pythonhosted.org/packages/c2/a7/39c4cdda9f019b609b5c473899d87abff092fc908cfe4d1ecb2fcff453b0/numpy-2.4.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b2f0073ed0868db1dcd86e052d37279eef185b9c8db5bf61f30f46adac63c909", size = 17028888, upload-time = "2026-01-31T23:12:19.306Z" },
    { url = "https://files.pythonhosted.org/packages/da/b3/e84bb64bdfea967cc10950d71090ec2d84b49bc691df0025dddb7c26e8e3/numpy-2.4.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7f54844851cdb630ceb623dcec4db3240d1ac13d4990532446761baede94996a", size = 18339556, upload-time = "2026-01-31T23:12:21.816Z" },
    { url = "https://files.pythonhosted.org/packages/88/f5/954a291bc1192a27081706862ac62bb5920fbecfbaa302f64682aa90beed/numpy-2.4.2-cp314-cp314-win32.whl", hash = "sha256:12e26134a0331d8dbd9351620f037ec470b7c75929cb8a1537f6bfe411152a1a", size = 6006899, upload-time = "2026-01-31T23:12:24.14Z" },
    { url = "https://files.pythonhosted.org/packages/05/cb/eff72a91b2efdd1bc98b3b8759f6a1654aa87612fc86e3d87d6fe4f948c4/numpy-2.4.2-cp314-cp314-win_amd64.whl", hash = "sha256:068cdb2d0d644cdb45670810894f6a0600797a69c05f1ac478e8d31670b8ee75", size = 12443072, upload-time = "2026-01-31T23:12:26.33Z" },
    { url = "https://files.pythonhosted.org/packages/37/75/62726948db36a56428fce4ba80a115716dc4fad6a3a4352487f8bb950966/numpy-2.4.2-cp314-cp314-win_arm64.whl", hash = "sha256:6ed0be1ee58eef41231a5c943d7d1375f093142702d5723ca2eb07db9b934b05", size = 10494886, upload-time = "2026-01-31T23:12:28.488Z" },
    { url = "https://files.pythonhosted.org/packages/36/2f/ee93744f1e0661dc267e4b21940870cabfae187c092e1433b77b09b50ac4/numpy-2.4.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:98f16a80e917003a12c0580f97b5f875853ebc33e2eaa4bccfc8201ac6869308", size = 14818567, upload-time = "2026-01-31T23:12:30.709Z" },
    { url = "https://files.pythonhosted.org/packages/a7/24/6535212add7d76ff938d8bdc654f53f88d35cddedf807a599e180dcb8e66/numpy-2.4.2-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:20abd069b9cda45874498b245c8015b18ace6de8546bf50dfa8cea1696ed06ef", size = 5328372, upload-time = "2026-01-31T23:12:32.962Z" },
    { url = "https://files.pythonhosted.org/packages/5e/9d/c48f0a035725f925634bf6b8994253b43f2047f6778a54147d7e213bc5a7/numpy-2.4.2-cp314-cp314t-macosx_14_0_x86_64.whl", hash = "sha256:e98c97502435b53741540a5717a6749ac2ada901056c7db951d33e11c885cc7d", size = 6649306, upload-time = "2026-01-31T23:12:34.797Z" },
    { url = "https://files.pythonhosted.org/packages/81/05/7c73a9574cd4a53a25907bad38b59ac83919c0ddc8234ec157f344d57d9a/numpy-2.4.2-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:da6cad4e82cb893db4b69105c604d805e0c3ce11501a55b5e9f9083b47d2ffe8", size = 15722394, upload-time = "2026-01-31T23:12:36.565Z" },
    { url = "https://files.pythonhosted.org/packages/35/fa/4de10089f21fc7d18442c4a767ab156b25c2a6eaf187c0db6d9ecdaeb43f/numpy-2.4.2-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9e4424677ce4b47fe73c8b5556d876571f7c6945d264201180db2dc34f676ab5", size = 16653343, upload-time = "2026-01-31T23:12:39.188Z" },
    { url = "https://files.pythonhosted.org/packages/b8/f9/d33e4ffc857f3763a57aa85650f2e82486832d7492280ac21ba9efda80da/numpy-2.4.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:2b8f157c8a6f20eb657e240f8985cc135598b2b46985c5bccbde7616dc9c6b1e", size = 17078045, upload-time = "2026-01-31T23:12:42.041Z" },
    { url = "https://files.pythonhosted.org/packages/c8/b8/54bdb43b6225badbea6389fa038c4ef868c44f5890f95dd530a218706da3/numpy-2.4.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5daf6f3914a733336dab21a05cdec343144600e964d2fcdabaac0c0269874b2a", size = 18380024, upload-time = "2026-01-31T23:12:44.331Z" },
    { url = "https://files.pythonhosted.org/packages/a5/55/6e1a61ded7af8df04016d81b5b02daa59f2ea9252ee0397cb9f631efe9e5/numpy-2.4.2-cp314-cp314t-win32.whl", hash = "sha256:8c50dd1fc8826f5b26a5ee4d77ca55d88a895f4e4819c7ecc2a9f5905047a443", size = 6153937, upload-time = "2026-01-31T23:12:47.229Z" },
    { url = "https://files.pythonhosted.org/packages/45/aa/fa6118d1ed6d776b0983f3ceac9b1a5558e80df9365b1c3aa6d42bf9eee4/numpy-2.4.2-cp314-cp314t-win_amd64.whl", hash = "sha256:fcf92bee92742edd401ba41135185866f7026c502617f422eb432cfeca4fe236", size = 12631844, upload-time = "2026-01-31T23:12:48.997Z" },
    { url = "https://files.pythonhosted.org/packages/32/0a/2ec5deea6dcd158f254a7b372fb09cfba5719419c8d66343bab35237b3fb/numpy-2.4.2-cp314-cp314t-win_arm64.whl", hash = "sha256:1f92f53998a17265194018d1cc321b2e96e900ca52d54c7c77837b71b9465181", size = 10565379, upload-time = "2026-01-31T23:12:51.345Z" },
 ]
 [[package]]
 name = "opencv-python"
 version = "4.13.0.92"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "numpy" },
 ]
 wheels = [
    { url = "https://files.pythonhosted.org/packages/fc/6f/5a28fef4c4a382be06afe3938c64cc168223016fa520c5abaf37e8862aa5/opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19", size = 46247052, upload-time = "2026-02-05T07:01:25.046Z" },
    { url = "https://files.pythonhosted.org/packages/08/ac/6c98c44c650b8114a0fb901691351cfb3956d502e8e9b5cd27f4ee7fbf2f/opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9", size = 32568781, upload-time = "2026-02-05T07:01:41.379Z" },
    { url = "https://files.pythonhosted.org/packages/3e/51/82fed528b45173bf629fa44effb76dff8bc9f4eeaee759038362dfa60237/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a", size = 47685527, upload-time = "2026-02-05T06:59:11.24Z" },
    { url = "https://files.pythonhosted.org/packages/db/07/90b34a8e2cf9c50fe8ed25cac9011cde0676b4d9d9c973751ac7616223a2/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf", size = 70460872, upload-time = "2026-02-05T06:59:19.162Z" },
    { url = "https://files.pythonhosted.org/packages/02/6d/7a9cc719b3eaf4377b9c2e3edeb7ed3a81de41f96421510c0a169ca3cfd4/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616", size = 46708208, upload-time = "2026-02-05T06:59:15.419Z" },
    { url = "https://files.pythonhosted.org/packages/fd/55/b3b49a1b97aabcfbbd6c7326df9cb0b6fa0c0aefa8e89d500939e04aa229/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5", size = 72927042, upload-time = "2026-02-05T06:59:23.389Z" },
    { url = "https://files.pythonhosted.org/packages/fb/17/de5458312bcb07ddf434d7bfcb24bb52c59635ad58c6e7c751b48949b009/opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59", size = 30932638, upload-time = "2026-02-05T07:02:14.98Z" },
    { url = "https://files.pythonhosted.org/packages/e9/a5/1be1516390333ff9be3a9cb648c9f33df79d5096e5884b5df71a588af463/opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5", size = 40212062, upload-time = "2026-02-05T07:02:12.724Z" },
 ]
 [[package]]
 name = "requests"
 version = "2.32.5"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "certifi" },
    { name = "charset-normalizer" },
    { name = "idna" },
    { name = "urllib3" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
 ]
 [[package]]
 name = "urllib3"
 version = "2.6.3"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/c7/24/5f1b3bdffd70275f6661c76461e25f024d5a38a46f04aaca912426a2b1d3/urllib3-2.6.3.tar.gz", hash = "sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed", size = 435556, upload-time = "2026-01-07T16:24:43.925Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/39/08/aaaad47bc4e9dc8c725e68f9d04865dbcb2052843ff09c97b08904852d84/urllib3-2.6.3-py3-none-any.whl", hash = "sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4", size = 131584, upload-time = "2026-01-07T16:24:42.685Z" },
 ]