Add color variety and hex specificity test scripts with report

- test_color_variety.py: named-color test for local llama.cpp VLM
- test_color_variety_gemini.py: named-color test for Gemini 3 Flash API
- test_hex_color_specificity.py: hex color specificity test for Gemini
- test_hex_color_specificity_llama.py: hex color specificity test for local VLM
- jersey_prompt_hex_color.txt: prompt requesting hex color codes
- COLOR_TEST_REPORT.md: analysis report comparing 3 models across 5 tests
- color_test_results.md: raw test output from all runs
2026-02-24 11:30:41 -07:00
parent 825f3c19a9
commit 435033ea07
7 changed files with 1646 additions and 0 deletions
--- a/COLOR_TEST_REPORT.md
+++ b/COLOR_TEST_REPORT.md
@@ -0,0 +1,205 @@
+# Jersey Color Detection - VLM Comparison Report
+
+**Date:** 2026-02-24
+**Test set:** 161 basketball images (`basketball_jersery_color_test_files/`)
+
+## Overview
+
+Five tests were run to evaluate how vision-language models describe jersey colors:
+
+| Test | Model | Images | Prompt | Purpose |
+|------|-------|--------|--------|---------|
+| 1 | Qwen2.5-VL-7B (local, llama.cpp) | 161 | Named colors | Baseline color vocabulary |
+| 2 | Gemini 3 Flash (cloud API) | 161 | Named colors | Cloud model color vocabulary |
+| 3 | Qwen3-VL-8B (local, llama.cpp) | 161 | Named colors | Newer local model color vocabulary |
+| 4 | Gemini 3 Flash (cloud API) | 20 (random, seed=42) | Hex codes (jersey only) | Hex color specificity |
+| 5 | Qwen3-VL-8B (local, llama.cpp) | 20 (random, seed=42) | Hex codes (jersey only) | Hex color specificity |
+
+---
+
+## Named Color Vocabulary (Tests 1-3)
+
+### Detection Volume
+
+| Metric | Qwen2.5-VL-7B | Gemini 3 Flash | Qwen3-VL-8B |
+|--------|---------------|----------------|--------------|
+| Jerseys detected | 369 | 453 | 444 |
+| Errors | 0 | 0 | 1 |
+| Avg time/image | 14.9s | 15.9s | 17.0s |
+| Unique jersey colors | 15 | 19 | 15 |
+| Unique number colors | 11 | 15 | 13 |
+| Combined palette size | 15 | 19 | 17 |
+
+Gemini detected the most jerseys (453) and used the broadest color vocabulary (19 terms). Qwen3-VL-8B detected nearly as many jerseys (444) as Gemini but with a vocabulary closer to the older Qwen2.5 model.
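
The per-image averages and the relative gain between the two Qwen models follow directly from the table. A minimal sketch of that arithmetic, using only the totals above and the 161-image test set (the helper names `per_image` and `relative_gain` are illustrative, not taken from the test scripts):

```python
# Totals copied from the Detection Volume table; each run covered 161 images.
IMAGES = 161
jerseys_detected = {
    "Qwen2.5-VL-7B": 369,
    "Gemini 3 Flash": 453,
    "Qwen3-VL-8B": 444,
}


def per_image(total: int, images: int = IMAGES) -> float:
    """Average number of jerseys reported per image."""
    return total / images


def relative_gain(new: int, old: int) -> float:
    """Relative increase of `new` over `old`, as a percentage."""
    return (new - old) / old * 100


for model, total in jerseys_detected.items():
    print(f"{model:15s} {per_image(total):.2f} jerseys/image")

gain = relative_gain(
    jerseys_detected["Qwen3-VL-8B"], jerseys_detected["Qwen2.5-VL-7B"]
)
print(f"Qwen3-VL-8B vs Qwen2.5-VL-7B: +{gain:.1f}%")
```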
+
+### Jersey Color Distribution
+
+| Color | Qwen2.5-VL-7B | Gemini 3 Flash | Qwen3-VL-8B | Notes |
+|-------|---------------|----------------|--------------|-------|
+| white | 84 (22.8%) | 125 (27.6%) | 120 (27.0%) | Top color for all three |
+| blue | 60 (16.3%) | 43 (9.5%) | 69 (15.5%) | Both Qwen models lump blues |
+| green | 48 (13.0%) | 60 (13.2%) | 53 (11.9%) | Consistent across models |
+| black | 31 (8.4%) | 21 (4.6%) | 33 (7.4%) | |
+| purple | 25 (6.8%) | 28 (6.2%) | 30 (6.8%) | Consistent |
+| red | 27 (7.3%) | 22 (4.9%) | 28 (6.3%) | |
+| orange | 24 (6.5%) | 27 (6.0%) | 27 (6.1%) | Very consistent |
+| yellow | 27 (7.3%) | 24 (5.3%) | 26 (5.9%) | |
+| maroon | 14 (3.8%) | 23 (5.1%) | 15 (3.4%) | Gemini uses maroon more |
+| light blue | 6 (1.6%) | 22 (4.9%) | 13 (2.9%) | Gemini distinguishes light blue most |
+| gray/grey | 9 (2.4%) | 12 (2.6%) | 10 (2.3%) | |
+| brown | 6 (1.6%) | 13 (2.9%) | 9 (2.0%) | |
+| teal | 4 (1.1%) | 7 (1.5%) | 7 (1.6%) | |
+| pink | 2 (0.5%) | 2 (0.4%) | 2 (0.5%) | |
+| gold | 2 (0.5%) | 2 (0.4%) | 2 (0.5%) | |
+| navy blue | -- | 11 (2.4%) | -- | Gemini-only |
+| dark blue | -- | 9 (2.0%) | -- | Gemini-only |
+| dark brown | -- | 1 (0.2%) | -- | Gemini-only |
+| navy | -- | 1 (0.2%) | -- | Gemini-only |
+
+### Number Color Distribution
+
+| Color | Qwen2.5-VL-7B | Gemini 3 Flash | Qwen3-VL-8B |
+|-------|---------------|----------------|--------------|
+| white | 195 (52.8%) | 183 (40.4%) | 184 (41.4%) |
+| black | 60 (16.3%) | 40 (8.8%) | 44 (9.9%) |
+| yellow | 39 (10.6%) | 58 (12.8%) | 32 (7.2%) |
+| red | 30 (8.1%) | 44 (9.7%) | 41 (9.2%) |
+| blue | 23 (6.2%) | 39 (8.6%) | 39 (8.8%) |
+| orange | 8 (2.2%) | 21 (4.6%) | 29 (6.5%) |
+| gold | -- | 5 (1.1%) | 21 (4.7%) |
+| dark blue | -- | 14 (3.1%) | 9 (2.0%) |
+| maroon | 2 (0.5%) | 14 (3.1%) | 12 (2.7%) |
+| green | 3 (0.8%) | 13 (2.9%) | 14 (3.2%) |
+| purple | 4 (1.1%) | 11 (2.4%) | 11 (2.5%) |
+| pink | 3 (0.8%) | 6 (1.3%) | 6 (1.4%) |
+| brown | 2 (0.5%) | 2 (0.4%) | -- |
+| grey | -- | 2 (0.4%) | -- |
+| navy blue | -- | 1 (0.2%) | -- |
+| silver | -- | -- | 2 (0.5%) |
+
+### Key Differences in Named Color Mode
+
+1. **Gemini has the richest vocabulary.** It uses 19 distinct jersey color terms vs 15 for each Qwen model. The four extras are three blue-shade variants (navy blue, dark blue, navy) and dark brown.
+
+2. **Both Qwen models lump blues together.** Qwen2.5-VL-7B reports 60 "blue" and 6 "light blue" jerseys; Qwen3-VL-8B reports 69 and 13. Gemini splits the family into blue (43), light blue (22), navy blue (11), dark blue (9), and navy (1), for 86 blue-family detections spread over five terms instead of two.
+
+3. **Qwen3-VL-8B is a modest upgrade over Qwen2.5-VL-7B.** It detects 20% more jerseys (444 vs 369) using the same 15 jersey color terms, with a slightly more balanced distribution. Its number-color palette differs: it adds "gold", "dark blue", and "silver" and drops "brown", so the combined palette grows from 15 to 17 terms ("dark blue" and "silver" are new).
+
+4. **Gemini detects the most jerseys overall.** 453 vs 444 (Qwen3) vs 369 (Qwen2.5). The two newer models are close, while Qwen2.5 lags behind.
+
+5. **All three models are dominated by basic colors.** White, blue, green, and black together account for 55-62% of jersey detections in every model. None spontaneously uses precise shade names like "crimson", "cobalt", or "forest green".
+
+6. **Qwen3-VL-8B favors "gold" for number colors.** It reported gold 21 times for number colors vs Gemini's 5 and Qwen2.5's 0. This may reflect team-specific coloring (e.g., Lakers gold numbers).
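
The blue-family comparison in the list above can be reproduced by grouping the reported color names. The sketch below uses the jersey counts from the distribution table; treating any name containing "blue" or "navy" as blue-family (and leaving out "teal") is an assumption made here for illustration, not a rule from the test scripts.

```python
from collections import Counter

# Blue-related jersey counts copied from the Jersey Color Distribution table.
jersey_colors = {
    "Qwen2.5-VL-7B": Counter({"blue": 60, "light blue": 6}),
    "Gemini 3 Flash": Counter(
        {"blue": 43, "light blue": 22, "navy blue": 11, "dark blue": 9, "navy": 1}
    ),
    "Qwen3-VL-8B": Counter({"blue": 69, "light blue": 13}),
}


def blue_family(counts: Counter) -> tuple[int, int]:
    """Return (total detections, number of distinct terms) for blue-family names.

    Assumption: a name belongs to the family if it contains "blue" or "navy".
    """
    family = {name: n for name, n in counts.items() if "blue" in name or "navy" in name}
    return sum(family.values()), len(family)


for model, counts in jersey_colors.items():
    total, terms = blue_family(counts)
    print(f"{model:15s} {total:3d} detections across {terms} term(s)")
```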
+
+---
+
+## Hex Color Specificity (Tests 4-5)
+
+Both tests used the same 20 random images (seed=42) and evaluated **jersey colors only** (number colors excluded since they are usually primary colors like white or black).
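
Using a fixed seed is what lets both tests see the same 20 images. How the test scripts draw the sample is not shown in this report, so the following is only a sketch of one common way to do it: sort the file names first so the result does not depend on directory listing order, then sample with a seeded generator. The function name `pick_sample` and the placeholder file names are hypothetical.

```python
import random


def pick_sample(names, k=20, seed=42):
    """Draw a reproducible sample of k names.

    Sorting first makes the result independent of the input order;
    a private Random instance avoids touching global random state.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(names), k)


# Placeholder names standing in for the 161 test images.
names = [f"img_{i:03d}.jpg" for i in range(161)]

first = pick_sample(names)
second = pick_sample(reversed(names))  # same files, different listing order
print(first == second)
```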
+
+### Summary
+
+| Metric | Gemini 3 Flash | Qwen3-VL-8B |
+|--------|----------------|--------------|
+| Images tested | 20 | 20 |
+| Total jerseys | 56 | 59 |
+| Jersey color values | 56 | 59 |
+| Valid hex codes | 56/56 (100%) | 59/59 (100%) |
+| Unique hex values | 24 | 21 |
+| Specific (distinct shade) | 40 (71.4%) | 37 (62.7%) |
+| Generic (near primary) | 16 (28.6%) | 22 (37.3%) |
+
+### Distance from Nearest Primary Color
+
+| Stat | Gemini 3 Flash | Qwen3-VL-8B |
+|------|----------------|--------------|
+| Min | 0.0 | 0.0 |
+| Avg | 44.5 | 34.5 |
+| Max | 111.0 | 110.7 |
+
+(Scale: 0 = exact primary match. 20 = generic threshold. Higher = more specific.)
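
The reported distances are consistent with a Euclidean distance in RGB space to the nearest entry of a reference palette; for example, `#FFCD00` is 10.0 away from gold (255, 215, 0). The sketch below reproduces a few table values under that assumption. The palette shown is a partial one using standard CSS named-color values, and the strict `<` comparison against the threshold is a guess; the actual palette and comparison used by the test script may differ.

```python
import math

# Partial reference palette (CSS named-color RGB values). Assumed, not taken
# from the test script, which is not shown in this report.
REFERENCE = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
    "maroon": (128, 0, 0),
    "gold": (255, 215, 0),
    "silver": (192, 192, 192),
    "navy": (0, 0, 128),
    "purple": (128, 0, 128),
}
GENERIC_THRESHOLD = 20.0  # from the scale note above


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def classify(code: str) -> tuple[str, str, float]:
    """Return (label, nearest reference name, distance rounded to 0.1)."""
    rgb = hex_to_rgb(code)
    name, ref = min(REFERENCE.items(), key=lambda kv: math.dist(rgb, kv[1]))
    d = math.dist(rgb, ref)
    label = "GENERIC" if d < GENERIC_THRESHOLD else "specific"
    return label, name, round(d, 1)


for code in ("#FFCD00", "#990000", "#F5F5F5", "#191970", "#6A0DAD"):
    print(code, classify(code))
```

Because the palette here is incomplete, colors whose nearest reference is missing (for example the reds that the tables place near brown) would be assigned to a different neighbor than in the report.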
+
+### Gemini 3 Flash - Unique Hex Values (24)
+
+| Hex | RGB | Count | Classification |
+|-----|-----|-------|---------------|
+| `#004B23` | (0, 75, 35) | x7 | specific, near green (dark), d=63.5 |
+| `#1A2344` | (26, 35, 68) | x2 | specific, near navy, d=74.2 |
+| `#1E4BA1` | (30, 75, 161) | x1 | specific, near navy, d=87.3 |
+| `#2B231D` | (43, 35, 29) | x1 | specific, near black, d=62.6 |
+| `#3D2B1F` | (61, 43, 31) | x1 | specific, near black, d=80.8 |
+| `#461D7C` | (70, 29, 124) | x1 | specific, near purple, d=65.0 |
+| `#4B2E83` | (75, 46, 131) | x5 | specific, near purple, d=70.2 |
+| `#701112` | (112, 17, 18) | x1 | specific, near maroon, d=29.5 |
+| `#7BAFD4` | (123, 175, 212) | x3 | specific, near silver, d=73.8 |
+| `#990000` | (153, 0, 0) | x2 | specific, near maroon, d=25.0 |
+| `#A9A9A9` | (169, 169, 169) | x1 | specific, near silver, d=39.8 |
+| `#C41230` | (196, 18, 48) | x1 | specific, near brown, d=39.7 |
+| `#D11111` | (209, 17, 17) | x2 | specific, near red, d=51.9 |
+| `#D32F2F` | (211, 47, 47) | x2 | specific, near brown, d=46.5 |
+| `#E31837` | (227, 24, 55) | x1 | specific, near brown, d=65.9 |
+| `#E31B23` | (227, 27, 35) | x1 | specific, near red, d=52.3 |
+| `#E3242B` | (227, 36, 43) | x2 | specific, near brown, d=62.3 |
+| `#E6E600` | (230, 230, 0) | x1 | specific, near gold, d=29.2 |
+| `#E8E8E8` | (232, 232, 232) | x1 | specific, near white, d=39.8 |
+| `#E91E63` | (233, 30, 99) | x1 | specific, near brown, d=89.5 |
+| `#F06292` | (240, 98, 146) | x2 | specific, near pink, d=111.0 |
+| `#F57C00` | (245, 124, 0) | x1 | specific, near orange, d=42.2 |
+| `#FFCD00` | (255, 205, 0) | x1 | GENERIC, near gold, d=10.0 |
+| `#FFFFFF` | (255, 255, 255) | x15 | GENERIC, near white, d=0.0 |
+
+### Qwen3-VL-8B - Unique Hex Values (21)
+
+| Hex | RGB | Count | Classification |
+|-----|-----|-------|---------------|
+| `#000000` | (0, 0, 0) | x1 | GENERIC, near black, d=0.0 |
+| `#006400` | (0, 100, 0) | x10 | specific, near green (dark), d=28.0 |
+| `#191970` | (25, 25, 112) | x1 | specific, near navy, d=38.8 |
+| `#19418A` | (25, 65, 138) | x1 | specific, near navy, d=70.4 |
+| `#3D2B21` | (61, 43, 33) | x2 | specific, near black, d=81.6 |
+| `#66B2FF` | (102, 178, 255) | x3 | specific, near silver, d=110.7 |
+| `#6A0DAD` | (106, 13, 173) | x6 | specific, near purple, d=51.7 |
+| `#8B0000` | (139, 0, 0) | x1 | GENERIC, near maroon, d=11.0 |
+| `#A9A9A9` | (169, 169, 169) | x1 | specific, near silver, d=39.8 |
+| `#B22234` | (178, 34, 52) | x2 | GENERIC, near brown, d=18.2 |
+| `#D32F2F` | (211, 47, 47) | x3 | specific, near brown, d=46.5 |
+| `#D60000` | (214, 0, 0) | x3 | specific, near red, d=41.0 |
+| `#DC143C` | (220, 20, 60) | x2 | specific, near brown, d=61.9 |
+| `#F5F5DC` | (245, 245, 220) | x2 | specific, near white, d=37.7 |
+| `#F5F5F5` | (245, 245, 245) | x1 | GENERIC, near white, d=17.3 |
+| `#FF0000` | (255, 0, 0) | x1 | GENERIC, near red, d=0.0 |
+| `#FF6347` | (255, 99, 71) | x1 | specific, near orange, d=96.9 |
+| `#FF69B4` | (255, 105, 180) | x2 | specific, near pink, d=90.0 |
+| `#FFD700` | (255, 215, 0) | x1 | GENERIC, near gold, d=0.0 |
+| `#FFFF00` | (255, 255, 0) | x1 | GENERIC, near yellow, d=0.0 |
+| `#FFFFFF` | (255, 255, 255) | x14 | GENERIC, near white, d=0.0 |
+
+### Notable Findings
+
+- **Both models can produce valid hex codes.** 100% of returned values were valid hex in both cases.
+
+- **Gemini is more specific overall.** 71.4% of its jersey hex codes were distinct shades vs 62.7% for Qwen3. Gemini also produced more unique hex values (24 vs 21) and had a higher average distance from primaries (44.5 vs 34.5).
+
+- **Gemini uses more varied shades of each color family.** For red-family jerseys, Gemini returned 8 distinct hex values (`#701112`, `#990000`, `#C41230`, `#D11111`, `#D32F2F`, `#E31837`, `#E31B23`, `#E3242B`), all classified as specific. Qwen3 returned 6 (`#8B0000`, `#B22234`, `#D32F2F`, `#D60000`, `#DC143C`, `#FF0000`), of which three fall under the generic threshold (`#8B0000`, `#B22234`, `#FF0000`) and one is an exact primary (`#FF0000`).
+
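+- **Qwen3 reuses hex values more heavily.** It returned 21 unique values across 59 jerseys vs Gemini's 24 across 56. Its two most frequent values, `#FFFFFF` (x14) and `#006400` (dark green, x10), account for 41% of its results; Gemini's top two, `#FFFFFF` (x15) and `#004B23` (x7), account for 39%, so the difference is modest and shows up mainly in the spread of the remaining shades.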
+
+- **White dominates both models.** `#FFFFFF` was the single most common value for both (Gemini: x15, Qwen3: x14), consistent with white being the most frequent jersey color in the named-color tests.
+
+- **Both models share some exact hex codes.** `#A9A9A9` (dark gray), `#D32F2F` (medium red), and `#FFFFFF` (white) appeared in both models' outputs, and Gemini's `#3D2B1F` differs from Qwen3's `#3D2B21` (both dark brown) by only two units in the blue channel, suggesting some convergence on certain color estimates.
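
Overlap and reuse between the two models can be checked directly from the two unique-value tables. A minimal sketch, with the hex counts copied from those tables (the helper name `top_share` is illustrative):

```python
# Hex value -> count, copied from the two "Unique Hex Values" tables.
gemini = {
    "#004B23": 7, "#1A2344": 2, "#1E4BA1": 1, "#2B231D": 1, "#3D2B1F": 1,
    "#461D7C": 1, "#4B2E83": 5, "#701112": 1, "#7BAFD4": 3, "#990000": 2,
    "#A9A9A9": 1, "#C41230": 1, "#D11111": 2, "#D32F2F": 2, "#E31837": 1,
    "#E31B23": 1, "#E3242B": 2, "#E6E600": 1, "#E8E8E8": 1, "#E91E63": 1,
    "#F06292": 2, "#F57C00": 1, "#FFCD00": 1, "#FFFFFF": 15,
}
qwen3 = {
    "#000000": 1, "#006400": 10, "#191970": 1, "#19418A": 1, "#3D2B21": 2,
    "#66B2FF": 3, "#6A0DAD": 6, "#8B0000": 1, "#A9A9A9": 1, "#B22234": 2,
    "#D32F2F": 3, "#D60000": 3, "#DC143C": 2, "#F5F5DC": 2, "#F5F5F5": 1,
    "#FF0000": 1, "#FF6347": 1, "#FF69B4": 2, "#FFD700": 1, "#FFFF00": 1,
    "#FFFFFF": 14,
}


def top_share(counts: dict[str, int], n: int = 2) -> float:
    """Fraction of all results covered by the n most frequent values."""
    top = sorted(counts.values(), reverse=True)[:n]
    return sum(top) / sum(counts.values())


shared = sorted(set(gemini) & set(qwen3))
print("Hex codes returned by both models:", shared)
print(f"Gemini top-2 share: {top_share(gemini):.1%}")
print(f"Qwen3 top-2 share:  {top_share(qwen3):.1%}")
```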
+
+---
+
+## Conclusions
+
+1. **For basic color categorization, all three models work.** If you only need to distinguish "white vs dark vs colored" jerseys, any will do. Gemini offers slightly finer granularity with its blue-shade vocabulary (navy blue, dark blue, navy).
+
+2. **Gemini detects the most jerseys per image** (2.81 avg), followed closely by Qwen3-VL-8B (2.76 avg), with Qwen2.5-VL-7B trailing (2.29 avg).
+
+3. **Qwen3-VL-8B is a solid upgrade over Qwen2.5-VL-7B** for detection volume (20% more jerseys) with the same jersey color vocabulary, at a somewhat higher cost per image (17.0s vs 14.9s). It runs locally without cloud API costs, making it a good default choice.
+
+4. **Hex color prompting works for jersey body colors.** Both models return specific hex shades the majority of the time (Gemini 71%, Qwen3 63%). Gemini produces more varied and specific shades, while Qwen3 tends to reuse a smaller set of hex values.
+
+5. **Neither model is a reliable colorimeter.** The hex values should be treated as rough shade estimates, not pixel-accurate measurements. For precise color matching, traditional computer vision (e.g., sampling pixels from the detected jersey region) would be more reliable.
+
+6. **Recommendation:** Use named-color prompts for general jersey classification. Reserve hex-color prompts for use cases where distinguishing similar shades matters (e.g., telling apart two teams that both wear "blue"). Gemini gives the best hex specificity but requires a cloud API; Qwen3-VL-8B is a capable local alternative.
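
The pixel-sampling alternative mentioned in the conclusions could look roughly like the sketch below. It assumes the jersey region has already been located by some other step and that its pixels are available as (r, g, b) tuples; the function name `region_hex` is hypothetical. A per-channel median is used here because it is less sensitive than a mean to numbers, logos, and shadows inside the region, but that choice is an illustration, not something evaluated in this report.

```python
from statistics import median


def region_hex(pixels) -> str:
    """Estimate a region's color as the per-channel median of its pixels.

    `pixels` is an iterable of (r, g, b) tuples with values 0-255.
    Note: images loaded with cv2.imread are in BGR order and would need
    their channels reversed before being passed in.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("region contains no pixels")
    # zip(*pixels) regroups the tuples into three per-channel sequences.
    r, g, b = (int(median(channel)) for channel in zip(*pixels))
    return f"#{r:02X}{g:02X}{b:02X}"


# Three made-up pixels from a dark red region.
sample = [(139, 0, 0), (141, 2, 1), (137, 1, 0)]
print(region_hex(sample))
```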
--- a/color_test_results.md
+++ b/color_test_results.md
@@ -0,0 +1,299 @@
+# Qwen2.5-VL-7B Model Results:
+
+======================================================================
+COLOR VARIETY SUMMARY
+======================================================================
+Images processed: 161
+Total jerseys detected: 369
+Errors: 0
+Total time: 2397.7s (14.9s avg)
+
+--- Jersey Colors (15 unique) ---
+  white                       84  ##################################################
+  blue                        60  ##################################################
+  green                       48  ################################################
+  black                       31  ###############################
+  yellow                      27  ###########################
+  red                         27  ###########################
+  purple                      25  #########################
+  orange                      24  ########################
+  maroon                      14  ##############
+  gray                         9  #########
+  light blue                   6  ######
+  brown                        6  ######
+  teal                         4  ####
+  pink                         2  ##
+  gold                         2  ##
+
+--- Number Colors (11 unique) ---
+  white                      195  ##################################################
+  black                       60  ##################################################
+  yellow                      39  #######################################
+  red                         30  ##############################
+  blue                        23  #######################
+  orange                       8  ########
+  purple                       4  ####
+  pink                         3  ###
+  green                        3  ###
+  brown                        2  ##
+  maroon                       2  ##
+
+--- Combined Color Palette (15 unique values) ---
+  black                      jersey: 31  number: 60
+  blue                       jersey: 60  number: 23
+  brown                      jersey:  6  number:  2
+  gold                       jersey:  2  number:  0
+  gray                       jersey:  9  number:  0
+  green                      jersey: 48  number:  3
+  light blue                 jersey:  6  number:  0
+  maroon                     jersey: 14  number:  2
+  orange                     jersey: 24  number:  8
+  pink                       jersey:  2  number:  3
+  purple                     jersey: 25  number:  4
+  red                        jersey: 27  number: 30
+  teal                       jersey:  4  number:  0
+  white                      jersey: 84  number:195
+  yellow                     jersey: 27  number: 39
+
+
+# Gemini 3 Flash Results:
+
+======================================================================
+COLOR VARIETY SUMMARY  (gemini-3-flash-preview)
+======================================================================
+Images processed: 161
+Total jerseys detected: 453
+Errors: 0
+Total time: 2560.0s (15.9s avg)
+
+--- Jersey Colors (19 unique) ---
+  white                      125  ##################################################
+  green                       60  ##################################################
+  blue                        43  ###########################################
+  purple                      28  ############################
+  orange                      27  ###########################
+  yellow                      24  ########################
+  maroon                      23  #######################
+  light blue                  22  ######################
+  red                         22  ######################
+  black                       21  #####################
+  brown                       13  #############
+  grey                        12  ############
+  navy blue                   11  ###########
+  dark blue                    9  #########
+  teal                         7  #######
+  pink                         2  ##
+  gold                         2  ##
+  dark brown                   1  #
+  navy                         1  #
+
+--- Number Colors (15 unique) ---
+  white                      183  ##################################################
+  yellow                      58  ##################################################
+  red                         44  ############################################
+  black                       40  ########################################
+  blue                        39  #######################################
+  orange                      21  #####################
+  dark blue                   14  ##############
+  maroon                      14  ##############
+  green                       13  #############
+  purple                      11  ###########
+  pink                         6  ######
+  gold                         5  #####
+  brown                        2  ##
+  grey                         2  ##
+  navy blue                    1  #
+
+--- Combined Color Palette (19 unique values) ---
+  black                      jersey: 21  number: 40
+  blue                       jersey: 43  number: 39
+  brown                      jersey: 13  number:  2
+  dark blue                  jersey:  9  number: 14
+  dark brown                 jersey:  1  number:  0
+  gold                       jersey:  2  number:  5
+  green                      jersey: 60  number: 13
+  grey                       jersey: 12  number:  2
+  light blue                 jersey: 22  number:  0
+  maroon                     jersey: 23  number: 14
+  navy                       jersey:  1  number:  0
+  navy blue                  jersey: 11  number:  1
+  orange                     jersey: 27  number: 21
+  pink                       jersey:  2  number:  6
+  purple                     jersey: 28  number: 11
+  red                        jersey: 22  number: 44
+  teal                       jersey:  7  number:  0
+  white                      jersey:125  number:183
+  yellow                     jersey: 24  number: 58
+
+
+# Qwen3-VL-8B Model Results:
+
+======================================================================
+COLOR VARIETY SUMMARY
+======================================================================
+Images processed: 161
+Total jerseys detected: 444
+Errors: 1
+Total time: 2738.7s (17.0s avg)
+
+--- Jersey Colors (15 unique) ---
+  white                      120  ##################################################
+  blue                        69  ##################################################
+  green                       53  ##################################################
+  black                       33  #################################
+  purple                      30  ##############################
+  red                         28  ############################
+  orange                      27  ###########################
+  yellow                      26  ##########################
+  maroon                      15  ###############
+  light blue                  13  #############
+  gray                        10  ##########
+  brown                        9  #########
+  teal                         7  #######
+  pink                         2  ##
+  gold                         2  ##
+
+--- Number Colors (13 unique) ---
+  white                      184  ##################################################
+  black                       44  ############################################
+  red                         41  #########################################
+  blue                        39  #######################################
+  yellow                      32  ################################
+  orange                      29  #############################
+  gold                        21  #####################
+  green                       14  ##############
+  maroon                      12  ############
+  purple                      11  ###########
+  dark blue                    9  #########
+  pink                         6  ######
+  silver                       2  ##
+
+--- Combined Color Palette (17 unique values) ---
+  black                      jersey: 33  number: 44
+  blue                       jersey: 69  number: 39
+  brown                      jersey:  9  number:  0
+  dark blue                  jersey:  0  number:  9
+  gold                       jersey:  2  number: 21
+  gray                       jersey: 10  number:  0
+  green                      jersey: 53  number: 14
+  light blue                 jersey: 13  number:  0
+  maroon                     jersey: 15  number: 12
+  orange                     jersey: 27  number: 29
+  pink                       jersey:  2  number:  6
+  purple                     jersey: 30  number: 11
+  red                        jersey: 28  number: 41
+  silver                     jersey:  0  number:  2
+  teal                       jersey:  7  number:  0
+  white                      jersey:120  number:184
+  yellow                     jersey: 26  number: 32
+
+
+
+# Gemini 3 Flash (Hex Colors, random sample of 20 images) Results:
+
+Test params: test_hex_color_specificity.py --sample 20 --seed 42
+
+======================================================================
+HEX COLOR SPECIFICITY ANALYSIS
+======================================================================
+Model: gemini-3-flash-preview
+Images tested: 20 (seed=42)
+Total jerseys: 56
+Total jersey color values: 56
+Errors: 0
+
+Valid hex codes: 56/56
+
+--- Specificity Breakdown ---
+  Generic (near a pure primary):   16  (28.6%)
+  Specific (distinct shade):       40  (71.4%)
+
+--- Unique Hex Values (24) ---
+  #004B23  RGB(  0, 75, 35)  HSL(148.0,100.0%,14.7%)  x7  [specific, near green (dark), d=63.5]
+  #1A2344  RGB( 26, 35, 68)  HSL(227.1,44.7%,18.4%)  x2  [specific, near navy, d=74.2]
+  #1E4BA1  RGB( 30, 75,161)  HSL(219.4,68.6%,37.5%)  x1  [specific, near navy, d=87.3]
+  #2B231D  RGB( 43, 35, 29)  HSL( 25.7,19.4%,14.1%)  x1  [specific, near black, d=62.6]
+  #3D2B1F  RGB( 61, 43, 31)  HSL( 24.0,32.6%,18.0%)  x1  [specific, near black, d=80.8]
+  #461D7C  RGB( 70, 29,124)  HSL(265.9,62.1%,30.0%)  x1  [specific, near purple, d=65.0]
+  #4B2E83  RGB( 75, 46,131)  HSL(260.5,48.0%,34.7%)  x5  [specific, near purple, d=70.2]
+  #701112  RGB(112, 17, 18)  HSL(359.4,73.6%,25.3%)  x1  [specific, near maroon, d=29.5]
+  #7BAFD4  RGB(123,175,212)  HSL(204.9,50.9%,65.7%)  x3  [specific, near silver, d=73.8]
+  #990000  RGB(153,  0,  0)  HSL(  0.0,100.0%,30.0%)  x2  [specific, near maroon, d=25.0]
+  #A9A9A9  RGB(169,169,169)  HSL(  0.0, 0.0%,66.3%)  x1  [specific, near silver, d=39.8]
+  #C41230  RGB(196, 18, 48)  HSL(349.9,83.2%,42.0%)  x1  [specific, near brown, d=39.7]
+  #D11111  RGB(209, 17, 17)  HSL(  0.0,85.0%,44.3%)  x2  [specific, near red, d=51.9]
+  #D32F2F  RGB(211, 47, 47)  HSL(  0.0,65.1%,50.6%)  x2  [specific, near brown, d=46.5]
+  #E31837  RGB(227, 24, 55)  HSL(350.8,80.9%,49.2%)  x1  [specific, near brown, d=65.9]
+  #E31B23  RGB(227, 27, 35)  HSL(357.6,78.7%,49.8%)  x1  [specific, near red, d=52.3]
+  #E3242B  RGB(227, 36, 43)  HSL(357.8,77.3%,51.6%)  x2  [specific, near brown, d=62.3]
+  #E6E600  RGB(230,230,  0)  HSL( 60.0,100.0%,45.1%)  x1  [specific, near gold, d=29.2]
+  #E8E8E8  RGB(232,232,232)  HSL(  0.0, 0.0%,91.0%)  x1  [specific, near white, d=39.8]
+  #E91E63  RGB(233, 30, 99)  HSL(339.6,82.2%,51.6%)  x1  [specific, near brown, d=89.5]
+  #F06292  RGB(240, 98,146)  HSL(339.7,82.6%,66.3%)  x2  [specific, near pink, d=111.0]
+  #F57C00  RGB(245,124,  0)  HSL( 30.4,100.0%,48.0%)  x1  [specific, near orange, d=42.2]
+  #FFCD00  RGB(255,205,  0)  HSL( 48.2,100.0%,50.0%)  x1  [GENERIC, near gold, d=10.0]
+  #FFFFFF  RGB(255,255,255)  HSL(  0.0, 0.0%,100.0%)  x15  [GENERIC, near white, d=0.0]
+
+--- Distance from Nearest Primary ---
+  Min: 0.0   Avg: 44.5   Max: 111.0
+  (Higher = more specific. Threshold for 'generic' = 20)
+
+--- Verdict ---
+  MIXED results (71% specific).
+  The model sometimes returns specific shades but often falls back to primaries.
+
+
+
+# Qwen3-VL-8B (Hex Colors, random sample of 20 images) Results:
+
+Test params: test_hex_color_specificity_llama.py --sample 20 --seed 42
+
+======================================================================
+HEX COLOR SPECIFICITY ANALYSIS
+======================================================================
+Model: unsloth_Qwen3-VL-8B-Instruct-GGUF_Qwen3-VL-8B-Instruct-BF16
+Server: http://agx:8080
+Images tested: 20 (seed=42)
+Total jerseys: 59
+Total jersey color values: 59
+Errors: 0
+
+Valid hex codes: 59/59
+
+--- Specificity Breakdown ---
+  Generic (near a pure primary):   22  (37.3%)
+  Specific (distinct shade):       37  (62.7%)
+
+--- Unique Hex Values (21) ---
+  #000000  RGB(  0,  0,  0)  HSL(  0.0, 0.0%, 0.0%)  x1  [GENERIC, near black, d=0.0]
+  #006400  RGB(  0,100,  0)  HSL(120.0,100.0%,19.6%)  x10  [specific, near green (dark), d=28.0]
+  #191970  RGB( 25, 25,112)  HSL(240.0,63.5%,26.9%)  x1  [specific, near navy, d=38.8]
+  #19418A  RGB( 25, 65,138)  HSL(218.8,69.3%,32.0%)  x1  [specific, near navy, d=70.4]
+  #3D2B21  RGB( 61, 43, 33)  HSL( 21.4,29.8%,18.4%)  x2  [specific, near black, d=81.6]
+  #66B2FF  RGB(102,178,255)  HSL(210.2,100.0%,70.0%)  x3  [specific, near silver, d=110.7]
+  #6A0DAD  RGB(106, 13,173)  HSL(274.9,86.0%,36.5%)  x6  [specific, near purple, d=51.7]
+  #8B0000  RGB(139,  0,  0)  HSL(  0.0,100.0%,27.3%)  x1  [GENERIC, near maroon, d=11.0]
+  #A9A9A9  RGB(169,169,169)  HSL(  0.0, 0.0%,66.3%)  x1  [specific, near silver, d=39.8]
+  #B22234  RGB(178, 34, 52)  HSL(352.5,67.9%,41.6%)  x2  [GENERIC, near brown, d=18.2]
+  #D32F2F  RGB(211, 47, 47)  HSL(  0.0,65.1%,50.6%)  x3  [specific, near brown, d=46.5]
+  #D60000  RGB(214,  0,  0)  HSL(  0.0,100.0%,42.0%)  x3  [specific, near red, d=41.0]
+  #DC143C  RGB(220, 20, 60)  HSL(348.0,83.3%,47.1%)  x2  [specific, near brown, d=61.9]
+  #F5F5DC  RGB(245,245,220)  HSL( 60.0,55.6%,91.2%)  x2  [specific, near white, d=37.7]
+  #F5F5F5  RGB(245,245,245)  HSL(  0.0, 0.0%,96.1%)  x1  [GENERIC, near white, d=17.3]
+  #FF0000  RGB(255,  0,  0)  HSL(  0.0,100.0%,50.0%)  x1  [GENERIC, near red, d=0.0]
+  #FF6347  RGB(255, 99, 71)  HSL(  9.1,100.0%,63.9%)  x1  [specific, near orange, d=96.9]
+  #FF69B4  RGB(255,105,180)  HSL(330.0,100.0%,70.6%)  x2  [specific, near pink, d=90.0]
+  #FFD700  RGB(255,215,  0)  HSL( 50.6,100.0%,50.0%)  x1  [GENERIC, near gold, d=0.0]
+  #FFFF00  RGB(255,255,  0)  HSL( 60.0,100.0%,50.0%)  x1  [GENERIC, near yellow, d=0.0]
+  #FFFFFF  RGB(255,255,255)  HSL(  0.0, 0.0%,100.0%)  x14  [GENERIC, near white, d=0.0]
+
+--- Distance from Nearest Primary ---
+  Min: 0.0   Avg: 34.5   Max: 110.7
+  (Higher = more specific. Threshold for 'generic' = 20)
+
+--- Verdict ---
+  MIXED results (63% specific).
+  The model sometimes returns specific shades but often falls back to primaries.
+
+
+
--- a/jersey_prompt_hex_color.txt
+++ b/jersey_prompt_hex_color.txt
@@ -0,0 +1,50 @@
+You are an expert at detecting sports jerseys in images. Carefully examine the provided image and identify all visible sports jerseys.
+
+CRITICAL INSTRUCTIONS:
+1. ONLY detect jerseys that are CLEARLY VISIBLE in the image
+2. ONLY include jersey numbers that you can ACTUALLY READ in the image
+3. If you CANNOT see any jerseys, you MUST return {"jerseys": []}
+4. DO NOT make up, imagine, or guess jersey numbers that aren't visible
+5. DO NOT include jerseys if you cannot clearly see the number
+
+COLOR INSTRUCTIONS:
+- Report jersey_color and number_color as HEX color codes (e.g. "#8B0000", "#1E3A5F")
+- Do NOT use generic color names like "red", "blue", "white"
+- Estimate the SPECIFIC shade you see in the image as precisely as possible
+- For example: dark maroon should be "#800000", not "#FF0000"
+- Royal blue should be "#4169E1", not "#0000FF"
+
+RESPONSE FORMAT:
+Respond ONLY with a valid JSON object. No explanations, no markdown, no extra text.
+
+Use DOUBLE QUOTES (") for all JSON keys and string values.
+
+The JSON must have a single key "jerseys" with an array of dictionaries.
+
+Each dictionary must have exactly these three keys:
+- "jersey_number": The number on the jersey (as a string, only if clearly visible)
+- "jersey_color": The primary color of the jersey as a HEX code (e.g. "#8B0000")
+- "number_color": The color of the number on the jersey as a HEX code (e.g. "#FFFFFF")
+
+Example response for an image WITH visible jerseys:
+{
+  "jerseys": [
+    {
+      "jersey_number": "101",
+      "jersey_color": "#8B0000",
+      "number_color": "#F5F5DC"
+    },
+    {
+      "jersey_number": "142",
+      "jersey_color": "#1E3A5F",
+      "number_color": "#DAA520"
+    }
+  ]
+}
+
+Example response for an image WITHOUT jerseys or with unclear numbers:
+{"jerseys": []}
+
+REMEMBER: Only include jerseys with numbers you can ACTUALLY SEE in the image. When in doubt, return an empty array.
+
+Now analyze the image and return the JSON object.
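
A response to the prompt above can be checked against the stated format before its colors are tallied. The sketch below is one possible validator, written from the rules in the prompt (a single "jerseys" key, exactly three keys per entry, six-digit hex codes for both colors); the names `validate_response`, `HEX_RE`, and `REQUIRED_KEYS` are hypothetical and do not come from the test scripts.

```python
import json
import re

HEX_RE = re.compile(r"#[0-9A-Fa-f]{6}")
REQUIRED_KEYS = {"jersey_number", "jersey_color", "number_color"}


def validate_response(text: str) -> list[str]:
    """Return a list of format problems; an empty list means the response conforms."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if (
        not isinstance(data, dict)
        or set(data) != {"jerseys"}
        or not isinstance(data["jerseys"], list)
    ):
        return ['top level must be an object with the single key "jerseys" holding an array']

    problems = []
    for i, jersey in enumerate(data["jerseys"]):
        if not isinstance(jersey, dict) or set(jersey) != REQUIRED_KEYS:
            problems.append(f"jersey {i}: expected exactly the keys {sorted(REQUIRED_KEYS)}")
            continue
        if not isinstance(jersey["jersey_number"], str):
            problems.append(f"jersey {i}: jersey_number must be a string")
        for key in ("jersey_color", "number_color"):
            value = jersey[key]
            if not (isinstance(value, str) and HEX_RE.fullmatch(value)):
                problems.append(f"jersey {i}: {key} is not a #RRGGBB hex code: {value!r}")
    return problems


good = '{"jerseys": [{"jersey_number": "23", "jersey_color": "#8B0000", "number_color": "#FFFFFF"}]}'
bad = '{"jerseys": [{"jersey_number": "23", "jersey_color": "red", "number_color": "#FFFFFF"}]}'
print(validate_response(good))
print(validate_response(bad))
```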
--- a/test_color_variety.py
+++ b/test_color_variety.py
@@ -0,0 +1,151 @@
+#!/usr/bin/env python3
+"""
+Test script to discover the variety of colors a VLM returns for jersey detection.
+
+Submits all test images to the VLM and tallies every unique jersey_color and
+number_color value, producing a summary of the model's color vocabulary.
+
+Usage:
+    python test_color_variety.py
+"""
+
+import json
+import os
+import re
+import sys
+import time
+from collections import Counter
+from pathlib import Path
+
+import cv2
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from scan_utils.llama_cpp_client import LlamaCppClient
+
+SERVER_URL = "http://agx:8080"
+IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files")
+PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt.txt")
+
+
+def clean_response(text: str) -> str:
+    """Remove think tags and markdown code blocks from model output."""
+    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
+    cleaned = re.sub(r'\u25c1think\u25b7.*?\u25c1/think\u25b7', '', cleaned, flags=re.DOTALL)
+    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
+    cleaned = re.sub(r'\u25c1/?think\u25b7', '', cleaned, flags=re.IGNORECASE)
+
+    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
+    if json_block:
+        cleaned = json_block.group(1)
+    else:
+        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)
+
+    return cleaned.strip()
+
+
+def main():
+    # Load prompt
+    with open(PROMPT_FILE, 'r', encoding='utf-8') as f:
+        prompt = f.read()
+
+    # Gather image files (extensions OpenCV can handle)
+    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
+    image_files = sorted([
+        p for p in Path(IMAGES_DIR).iterdir()
+        if p.is_file() and p.suffix.lower() in valid_extensions
+    ])
+
+    skipped = sorted([
+        p.name for p in Path(IMAGES_DIR).iterdir()
+        if p.is_file() and p.suffix.lower() not in valid_extensions
+    ])
+
+    print(f"Images to process: {len(image_files)}")
+    if skipped:
+        print(f"Skipping {len(skipped)} unsupported files: {', '.join(skipped)}")
+    print(f"Server: {SERVER_URL}")
+    print(f"Prompt: {PROMPT_FILE} ({len(prompt)} chars)")
+    print("=" * 70)
+
+    client = LlamaCppClient(base_url=SERVER_URL)
+
+    jersey_color_counter = Counter()
+    number_color_counter = Counter()
+    total_jerseys = 0
+    errors = 0
+    start_all = time.time()
+
+    for i, image_path in enumerate(image_files, 1):
+        print(f"[{i}/{len(image_files)}] {image_path.name} ... ", end="", flush=True)
+
+        image = cv2.imread(str(image_path))
+        if image is None:
+            print("SKIP (failed to load)")
+            errors += 1
+            continue
+
+        message = client.create_multimodal_message(role="user", content=prompt, images=[image])
+
+        try:
+            t0 = time.time()
+            response = client.chat_completion(messages=[message], temperature=0.1, max_tokens=1000)
+            elapsed = time.time() - t0
+
+            response_text = response['choices'][0]['message']['content']
+            cleaned = clean_response(response_text)
+            result = json.loads(cleaned)
+            jerseys = result.get('jerseys', [])
+
+            colors_found = []
+            for j in jerseys:
+                # `or ''` guards against a JSON null, which would otherwise raise on .strip()
+                jc = str(j.get('jersey_color') or '').strip().lower()
+                nc = str(j.get('number_color') or '').strip().lower()
+                if jc:
+                    jersey_color_counter[jc] += 1
+                if nc:
+                    number_color_counter[nc] += 1
+                colors_found.append(f"{jc}/{nc}")
+                total_jerseys += 1
+
+            print(f"{len(jerseys)} jersey(s) in {elapsed:.1f}s  {', '.join(colors_found) if colors_found else '(none)'}")
+
+        except (json.JSONDecodeError, KeyError, IndexError) as e:
+            print(f"PARSE ERROR: {e}")
+            errors += 1
+        except Exception as e:
+            print(f"ERROR: {e}")
+            errors += 1
+
+    total_time = time.time() - start_all
+
+    # --- Summary ---
+    print()
+    print("=" * 70)
+    print("COLOR VARIETY SUMMARY")
+    print("=" * 70)
+    print(f"Images processed: {len(image_files)}")
+    print(f"Total jerseys detected: {total_jerseys}")
+    print(f"Errors: {errors}")
+    print(f"Total time: {total_time:.1f}s ({total_time / max(len(image_files), 1):.1f}s avg)")
+
+    print(f"\n--- Jersey Colors ({len(jersey_color_counter)} unique) ---")
+    for color, count in jersey_color_counter.most_common():
+        bar = "#" * min(count, 50)
+        print(f"  {color:25s} {count:4d}  {bar}")
+
+    print(f"\n--- Number Colors ({len(number_color_counter)} unique) ---")
+    for color, count in number_color_counter.most_common():
+        bar = "#" * min(count, 50)
+        print(f"  {color:25s} {count:4d}  {bar}")
+
+    # Combined unique palette
+    all_colors = sorted(set(jersey_color_counter.keys()) | set(number_color_counter.keys()))
+    print(f"\n--- Combined Color Palette ({len(all_colors)} unique values) ---")
+    for color in all_colors:
+        jc = jersey_color_counter.get(color, 0)
+        nc = number_color_counter.get(color, 0)
+        print(f"  {color:25s}  jersey:{jc:3d}  number:{nc:3d}")
+
+
+if __name__ == '__main__':
+    main()
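The tallying step in `main()` trims and lowercases each color before counting it. A minimal sketch of that step in isolation, assuming a hypothetical helper `tally_colors` (not part of the script) and assuming that a null or missing color should count as an empty string rather than raise:

```python
from collections import Counter


def tally_colors(jerseys, jersey_counter, number_counter):
    """Normalize and count colors; returns the 'jersey/number' pairs seen.

    Hypothetical helper for illustration only. Null or missing values are
    treated as empty strings (an assumption, not the script's behavior).
    """
    pairs = []
    for j in jerseys:
        jc = str(j.get("jersey_color") or "").strip().lower()
        nc = str(j.get("number_color") or "").strip().lower()
        if jc:
            jersey_counter[jc] += 1
        if nc:
            number_counter[nc] += 1
        pairs.append(f"{jc}/{nc}")
    return pairs


# Made-up sample in the shape the prompt asks the model to return.
sample = [
    {"jersey_number": "23", "jersey_color": " White ", "number_color": "RED"},
    {"jersey_number": "7", "jersey_color": "white", "number_color": None},
    {"jersey_number": "11"},
]
jersey_counter, number_counter = Counter(), Counter()
pairs = tally_colors(sample, jersey_counter, number_counter)
print(pairs, dict(jersey_counter), dict(number_counter))
```

Normalizing before counting is what keeps "White" and "white" from showing up as two palette entries in the summary.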
--- a/test_color_variety_gemini.py
+++ b/test_color_variety_gemini.py
@ -0,0 +1,270 @@
+#!/usr/bin/env python3
+"""
+Test script to discover the variety of colors the Gemini 3 Flash VLM returns
+for jersey detection.
+
+Submits all test images to the Gemini API and tallies every unique jersey_color
+and number_color value, producing a summary of the model's color vocabulary.
+
+Usage:
+    python test_color_variety_gemini.py
+"""
+
+import base64
+import json
+import os
+import re
+import sys
+import time
+from collections import Counter
+from pathlib import Path
+
+import cv2
+import requests
+
+GEMINI_MODEL = "gemini-3-flash-preview"
+API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{GEMINI_MODEL}:generateContent"
+
+IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files")
+PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt.txt")
+API_KEY_FILE = os.path.join(os.path.dirname(__file__), "gemini_api_key.txt")
+
+
+def load_api_key() -> str:
+    with open(API_KEY_FILE, 'r') as f:
+        return f.read().strip()
+
+
+def clean_response(text: str) -> str:
+    """Remove think tags and markdown code blocks from model output."""
+    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
+    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
+
+    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
+    if json_block:
+        cleaned = json_block.group(1)
+    else:
+        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)
+
+    return cleaned.strip()
+
+
+def salvage_jerseys(text: str) -> list[dict]:
+    """Extract complete jersey objects from truncated JSON using regex.
+
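+    When the response is cut off mid-JSON, json.loads() fails.  We can still
+    recover every fully formed jersey object that was returned before the
+    truncation point.  The pattern assumes string values and the key order
+    jersey_number, jersey_color, number_color; other shapes are skipped.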
+    """
+    pattern = re.compile(
+        r'\{\s*'
+        r'"jersey_number"\s*:\s*"[^"]*"\s*,\s*'
+        r'"jersey_color"\s*:\s*"([^"]*)"\s*,\s*'
+        r'"number_color"\s*:\s*"([^"]*)"\s*'
+        r'\}',
+        re.DOTALL,
+    )
+    jerseys = []
+    for m in pattern.finditer(text):
+        jerseys.append({
+            'jersey_color': m.group(1),
+            'number_color': m.group(2),
+        })
+    return jerseys
+
+
+def encode_image(image_path: str) -> tuple[str, str]:
+    """Read an image file and return (base64_data, mime_type)."""
+    ext = Path(image_path).suffix.lower()
+    mime_map = {
+        '.jpg': 'image/jpeg',
+        '.jpeg': 'image/jpeg',
+        '.png': 'image/png',
+        '.webp': 'image/webp',
+        '.bmp': 'image/bmp',
+        '.tiff': 'image/tiff',
+    }
+    mime_type = mime_map.get(ext, 'image/jpeg')
+
+    with open(image_path, 'rb') as f:
+        data = base64.b64encode(f.read()).decode('utf-8')
+
+    return data, mime_type
+
+
+MAX_RETRIES = 3
+RETRY_BACKOFF = [2, 5, 10]  # seconds between retries
+
+
+def call_gemini(api_key: str, image_path: str, prompt: str) -> dict:
+    """Send an image + prompt to the Gemini API and return parsed JSON."""
+    image_data, mime_type = encode_image(image_path)
+
+    payload = {
+        "contents": [{
+            "parts": [
+                {
+                    "inline_data": {
+                        "mime_type": mime_type,
+                        "data": image_data,
+                    }
+                },
+                {
+                    "text": prompt,
+                }
+            ]
+        }],
+        "generationConfig": {
+            "temperature": 0.1,
+            "maxOutputTokens": 8192,
+            "responseMimeType": "application/json",
+        }
+    }
+
+    for attempt in range(MAX_RETRIES):
+        response = requests.post(
+            API_URL,
+            headers={
+                "x-goog-api-key": api_key,
+                "Content-Type": "application/json",
+            },
+            json=payload,
+            timeout=120,  # seconds; assumed value so a stalled connection cannot hang the run
+        )
+
+        if response.status_code >= 500 and attempt < MAX_RETRIES - 1:
+            wait = RETRY_BACKOFF[attempt]
+            print(f"HTTP {response.status_code}, retry in {wait}s ... ", end="", flush=True)
+            time.sleep(wait)
+            continue
+
+        response.raise_for_status()
+        return response.json()
+
+    # Should not reach here, but just in case
+    response.raise_for_status()
+    return response.json()
+
+
+def main():
+    api_key = load_api_key()
+
+    with open(PROMPT_FILE, 'r') as f:
+        prompt = f.read()
+
+    # Gather image files (extensions the API can handle)
+    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
+    image_files = sorted([
+        p for p in Path(IMAGES_DIR).iterdir()
+        if p.suffix.lower() in valid_extensions
+    ])
+
+    skipped = sorted([
+        p.name for p in Path(IMAGES_DIR).iterdir()
+        if p.is_file() and p.suffix.lower() not in valid_extensions
+    ])
+
+    print(f"Model: {GEMINI_MODEL}")
+    print(f"Images to process: {len(image_files)}")
+    if skipped:
+        print(f"Skipping {len(skipped)} unsupported files: {', '.join(skipped)}")
+    print(f"Prompt: {PROMPT_FILE} ({len(prompt)} chars)")
+    print("=" * 70)
+
+    jersey_color_counter = Counter()
+    number_color_counter = Counter()
+    total_jerseys = 0
+    errors = 0
+    start_all = time.time()
+
+    for i, image_path in enumerate(image_files, 1):
+        print(f"[{i}/{len(image_files)}] {image_path.name} ... ", end="", flush=True)
+
+        resp = None  # reset so the parse-error handler never reads a previous image's response
+        try:
+            t0 = time.time()
+            resp = call_gemini(api_key, str(image_path), prompt)
+            elapsed = time.time() - t0
+
+            # Extract text from Gemini response
+            text = resp['candidates'][0]['content']['parts'][0]['text']
+            cleaned = clean_response(text)
+            result = json.loads(cleaned)
+            jerseys = result.get('jerseys', [])
+
+            colors_found = []
+            for j in jerseys:
+                # `or ''` guards against a JSON null, which would otherwise raise on .strip()
+                jc = str(j.get('jersey_color') or '').strip().lower()
+                nc = str(j.get('number_color') or '').strip().lower()
+                if jc:
+                    jersey_color_counter[jc] += 1
+                if nc:
+                    number_color_counter[nc] += 1
+                colors_found.append(f"{jc}/{nc}")
+                total_jerseys += 1
+
+            print(f"{len(jerseys)} jersey(s) in {elapsed:.1f}s  {', '.join(colors_found) if colors_found else '(none)'}")
+
+        except (json.JSONDecodeError, KeyError, IndexError) as e:
+            raw = ""
+            try:
+                raw = resp['candidates'][0]['content']['parts'][0]['text']
+            except Exception:
+                pass
+            # Try to salvage complete jersey objects from truncated JSON
+            salvaged = salvage_jerseys(raw) if raw else []
+            if salvaged:
+                colors_found = []
+                for j in salvaged:
+                    jc = j.get('jersey_color', '').strip().lower()
+                    nc = j.get('number_color', '').strip().lower()
+                    if jc:
+                        jersey_color_counter[jc] += 1
+                    if nc:
+                        number_color_counter[nc] += 1
+                    colors_found.append(f"{jc}/{nc}")
+                    total_jerseys += 1
+                print(f"TRUNCATED, salvaged {len(salvaged)} jersey(s) in {elapsed:.1f}s  {', '.join(colors_found)}")
+            else:
+                print(f"PARSE ERROR: {e}")
+                if raw:
+                    print(f"         raw: {raw[:200]}")
+                errors += 1
+        except requests.exceptions.HTTPError as e:
+            print(f"HTTP ERROR: {e}")
+            errors += 1
+        except Exception as e:
+            print(f"ERROR: {e}")
+            errors += 1
+
+    total_time = time.time() - start_all
+
+    # --- Summary ---
+    print()
+    print("=" * 70)
+    print(f"COLOR VARIETY SUMMARY  ({GEMINI_MODEL})")
+    print("=" * 70)
+    print(f"Images processed: {len(image_files)}")
+    print(f"Total jerseys detected: {total_jerseys}")
+    print(f"Errors: {errors}")
+    print(f"Total time: {total_time:.1f}s ({total_time / max(len(image_files), 1):.1f}s avg)")
+
+    print(f"\n--- Jersey Colors ({len(jersey_color_counter)} unique) ---")
+    for color, count in jersey_color_counter.most_common():
+        bar = "#" * min(count, 50)
+        print(f"  {color:25s} {count:4d}  {bar}")
+
+    print(f"\n--- Number Colors ({len(number_color_counter)} unique) ---")
+    for color, count in number_color_counter.most_common():
+        bar = "#" * min(count, 50)
+        print(f"  {color:25s} {count:4d}  {bar}")
+
+    # Combined unique palette
+    all_colors = sorted(set(jersey_color_counter.keys()) | set(number_color_counter.keys()))
+    print(f"\n--- Combined Color Palette ({len(all_colors)} unique values) ---")
+    for color in all_colors:
+        jc = jersey_color_counter.get(color, 0)
+        nc = number_color_counter.get(color, 0)
+        print(f"  {color:25s}  jersey:{jc:3d}  number:{nc:3d}")
+
+
+if __name__ == '__main__':
+    main()
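`salvage_jerseys` recovers whole objects from a response that was cut off mid-JSON. The sketch below reuses the same pattern on made-up payloads (the sample strings are illustrative, not real model output). It also shows a limitation: an object whose keys arrive in a different order is not matched.

```python
import json
import re

# Same pattern as salvage_jerseys: one complete object, fixed key order.
JERSEY_PATTERN = re.compile(
    r'\{\s*'
    r'"jersey_number"\s*:\s*"[^"]*"\s*,\s*'
    r'"jersey_color"\s*:\s*"([^"]*)"\s*,\s*'
    r'"number_color"\s*:\s*"([^"]*)"\s*'
    r'\}',
    re.DOTALL,
)


def salvage(text):
    """Return the color fields of every complete jersey object in text."""
    return [
        {"jersey_color": m.group(1), "number_color": m.group(2)}
        for m in JERSEY_PATTERN.finditer(text)
    ]


# Illustrative payload cut off inside the second object.
truncated = (
    '{"jerseys": [{"jersey_number": "23", "jersey_color": "white", '
    '"number_color": "red"}, {"jersey_number": "7", "jersey_color": "navy blue", '
    '"number_color": "wh'
)

try:
    json.loads(truncated)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

recovered = salvage(truncated)

# Valid JSON, but the keys are in a different order, so the regex skips it.
reordered = '{"jersey_color": "red", "jersey_number": "5", "number_color": "white"}'
missed = salvage(reordered)

print(parsed_ok, recovered, missed)
```

Because the regex is only a fallback after `json.loads()` fails, the key-order restriction matters only for truncated responses.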
--- a/test_hex_color_specificity.py
+++ b/test_hex_color_specificity.py
@ -0,0 +1,355 @@
+#!/usr/bin/env python3
+"""
+Test whether the Gemini 3 Flash VLM can return specific hex color codes
+rather than generic named colors.
+
+Sends a random sample of images (10 by default; set with --sample) using a
+hex-color prompt, then analyzes how specific the returned jersey colors are
+by measuring their RGB distance to a set of basic "pure primary" hex values.
+
+Usage:
+    python test_hex_color_specificity.py
+    python test_hex_color_specificity.py --sample 20
+    python test_hex_color_specificity.py --seed 42
+"""
+
+import argparse
+import base64
+import colorsys
+import json
+import math
+import os
+import random
+import re
+import time
+from pathlib import Path
+
+import requests
+
+GEMINI_MODEL = "gemini-3-flash-preview"
+API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{GEMINI_MODEL}:generateContent"
+
+IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files")
+PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt_hex_color.txt")
+API_KEY_FILE = os.path.join(os.path.dirname(__file__), "gemini_api_key.txt")
+
+# Pure primary / basic colors that indicate the model is NOT being specific
+PRIMARY_COLORS = {
+    "#FF0000": "red",
+    "#00FF00": "green",
+    "#0000FF": "blue",
+    "#FFFF00": "yellow",
+    "#FF00FF": "magenta",
+    "#00FFFF": "cyan",
+    "#FFFFFF": "white",
+    "#000000": "black",
+    "#808080": "gray",
+    "#FFA500": "orange",
+    "#800080": "purple",
+    "#FFC0CB": "pink",
+    "#A52A2A": "brown",
+    "#800000": "maroon",
+    "#008000": "green (dark)",
+    "#000080": "navy",
+    "#C0C0C0": "silver",
+    "#FFD700": "gold",
+}
+
+# Distance threshold: how close to a primary a hex must be to count as "generic".
+# Euclidean RGB distance runs from 0 to ~441.7 (black to white), so 20 is very close.
+GENERIC_DISTANCE_THRESHOLD = 20
+
+
+def hex_to_rgb(h: str) -> tuple[int, int, int] | None:
+    """Parse a hex color string to (r, g, b). Returns None if invalid."""
+    h = h.strip().lstrip('#')
+    if len(h) == 6:
+        try:
+            return (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))
+        except ValueError:
+            return None
+    if len(h) == 3:
+        try:
+            return (int(h[0]*2, 16), int(h[1]*2, 16), int(h[2]*2, 16))
+        except ValueError:
+            return None
+    return None
+
+
+def rgb_distance(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
+    """Euclidean distance between two RGB colors."""
+    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
+
+
+def rgb_to_hsl(r: int, g: int, b: int) -> tuple[float, float, float]:
+    """Convert RGB (0-255) to HSL (h: 0-360, s: 0-100, l: 0-100)."""
+    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
+    return round(h * 360, 1), round(s * 100, 1), round(l * 100, 1)
+
+
+def classify_color(hex_str: str) -> dict:
+    """Classify a hex color as generic/primary or specific.
+
+    Returns a dict with:
+        valid: bool - whether the hex code is a valid color
+        hex: str - normalized hex code
+        rgb: tuple - (r, g, b)
+        hsl: tuple - (h, s, l)
+        is_generic: bool - True if the color is a pure primary / basic color
+        nearest_primary: str - the closest primary color name
+        primary_distance: float - RGB distance to the nearest primary
+    """
+    rgb = hex_to_rgb(hex_str)
+    if rgb is None:
+        return {'valid': False, 'hex': hex_str, 'reason': 'invalid hex'}
+
+    normalized = f"#{rgb[0]:02X}{rgb[1]:02X}{rgb[2]:02X}"
+    hsl = rgb_to_hsl(*rgb)
+
+    # Find nearest primary
+    best_name = "unknown"
+    best_dist = float('inf')
+    for phex, pname in PRIMARY_COLORS.items():
+        prgb = hex_to_rgb(phex)
+        d = rgb_distance(rgb, prgb)
+        if d < best_dist:
+            best_dist = d
+            best_name = pname
+
+    is_generic = best_dist < GENERIC_DISTANCE_THRESHOLD
+
+    return {
+        'valid': True,
+        'hex': normalized,
+        'rgb': rgb,
+        'hsl': hsl,
+        'is_generic': is_generic,
+        'nearest_primary': best_name,
+        'primary_distance': round(best_dist, 1),
+    }
+
+
+def load_api_key() -> str:
+    with open(API_KEY_FILE, 'r') as f:
+        return f.read().strip()
+
+
+def clean_response(text: str) -> str:
+    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
+    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
+    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
+    if json_block:
+        cleaned = json_block.group(1)
+    else:
+        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)
+    return cleaned.strip()
+
+
+def salvage_jerseys(text: str) -> list[dict]:
+    """Extract complete jersey objects from truncated JSON."""
+    pattern = re.compile(
+        r'\{\s*'
+        r'"jersey_number"\s*:\s*"[^"]*"\s*,\s*'
+        r'"jersey_color"\s*:\s*"([^"]*)"\s*,\s*'
+        r'"number_color"\s*:\s*"([^"]*)"\s*'
+        r'\}',
+        re.DOTALL,
+    )
+    return [{'jersey_color': m.group(1), 'number_color': m.group(2)} for m in pattern.finditer(text)]
+
+
+def encode_image(image_path: str) -> tuple[str, str]:
+    ext = Path(image_path).suffix.lower()
+    mime_map = {'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg', '.png': 'image/png',
+                '.webp': 'image/webp', '.bmp': 'image/bmp', '.tiff': 'image/tiff'}
+    mime_type = mime_map.get(ext, 'image/jpeg')
+    with open(image_path, 'rb') as f:
+        data = base64.b64encode(f.read()).decode('utf-8')
+    return data, mime_type
+
+
+MAX_RETRIES = 3
+RETRY_BACKOFF = [2, 5, 10]
+
+
+def call_gemini(api_key: str, image_path: str, prompt: str) -> dict:
+    image_data, mime_type = encode_image(image_path)
+    payload = {
+        "contents": [{"parts": [
+            {"inline_data": {"mime_type": mime_type, "data": image_data}},
+            {"text": prompt},
+        ]}],
+        "generationConfig": {
+            "temperature": 0.1,
+            "maxOutputTokens": 8192,
+            "responseMimeType": "application/json",
+        }
+    }
+    for attempt in range(MAX_RETRIES):
+        response = requests.post(
+            API_URL,
+            headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
+            json=payload,
+            timeout=120,  # seconds; assumed value so a stalled connection cannot hang the run
+        )
+        if response.status_code >= 500 and attempt < MAX_RETRIES - 1:
+            wait = RETRY_BACKOFF[attempt]
+            print(f"HTTP {response.status_code}, retry in {wait}s ... ", end="", flush=True)
+            time.sleep(wait)
+            continue
+        response.raise_for_status()
+        return response.json()
+    response.raise_for_status()
+    return response.json()
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Test hex color specificity from Gemini VLM")
+    parser.add_argument('--sample', type=int, default=10, help='Number of random images to test (default: 10)')
+    parser.add_argument('--seed', type=int, default=None, help='Random seed for reproducibility')
+    args = parser.parse_args()
+
+    api_key = load_api_key()
+
+    with open(PROMPT_FILE, 'r') as f:
+        prompt = f.read()
+
+    # Gather and sample image files
+    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
+    all_images = sorted([
+        p for p in Path(IMAGES_DIR).iterdir()
+        if p.suffix.lower() in valid_extensions
+    ])
+
+    seed = args.seed if args.seed is not None else random.randint(0, 99999)
+    rng = random.Random(seed)
+    sample_size = min(args.sample, len(all_images))
+    sample_images = rng.sample(all_images, sample_size)
+    sample_images.sort()
+
+    print(f"Model: {GEMINI_MODEL}")
+    print(f"Prompt: {PROMPT_FILE}")
+    print(f"Sample: {sample_size} images (seed={seed})")
+    print(f"Selected: {', '.join(p.name for p in sample_images)}")
+    print("=" * 70)
+
+    # Collect all color classifications
+    all_colors = []  # list of dicts with image, field, hex, classification
+    total_jerseys = 0
+    errors = 0
+
+    for i, image_path in enumerate(sample_images, 1):
+        print(f"\n[{i}/{sample_size}] {image_path.name}")
+
+        try:
+            t0 = time.time()
+            resp = call_gemini(api_key, str(image_path), prompt)
+            elapsed = time.time() - t0
+
+            text = resp['candidates'][0]['content']['parts'][0]['text']
+            cleaned = clean_response(text)
+
+            try:
+                result = json.loads(cleaned)
+                jerseys = result.get('jerseys', [])
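+            except json.JSONDecodeError:
+                jerseys = salvage_jerseys(cleaned)
+                if jerseys:
+                    print(f"  (truncated response, salvaged {len(jerseys)} jersey(s))")
+                else:
+                    # Count unparseable responses instead of silently reporting 0 jerseys
+                    print("  PARSE ERROR: invalid JSON and nothing could be salvaged")
+                    errors += 1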
+
+            print(f"  {len(jerseys)} jersey(s) detected in {elapsed:.1f}s")
+
+            for j in jerseys:
+                total_jerseys += 1
+                raw_hex = str(j.get('jersey_color') or '')  # tolerate a JSON null
+                c = classify_color(raw_hex)
+                c['image'] = image_path.name
+                c['field'] = 'jersey_color'
+                c['raw'] = raw_hex
+                all_colors.append(c)
+
+                if not c['valid']:
+                    status = f"  INVALID ({raw_hex})"
+                elif c['is_generic']:
+                    status = f"  GENERIC  ~{c['nearest_primary']}"
+                else:
+                    status = f"  SPECIFIC (nearest: {c['nearest_primary']}, dist={c['primary_distance']})"
+
+                print(f"    jersey: {raw_hex:10s} -> {status}")
+
+        except requests.exceptions.HTTPError as e:
+            print(f"  HTTP ERROR: {e}")
+            errors += 1
+        except Exception as e:
+            print(f"  ERROR: {e}")
+            errors += 1
+
+    # --- Analysis ---
+    print()
+    print("=" * 70)
+    print("HEX COLOR SPECIFICITY ANALYSIS")
+    print("=" * 70)
+    print(f"Model: {GEMINI_MODEL}")
+    print(f"Images tested: {sample_size} (seed={seed})")
+    print(f"Total jerseys: {total_jerseys}")
+    print(f"Total jersey color values: {len(all_colors)}")
+    print(f"Errors: {errors}")
+
+    valid_colors = [c for c in all_colors if c['valid']]
+    invalid_colors = [c for c in all_colors if not c['valid']]
+
+    print(f"\nValid hex codes: {len(valid_colors)}/{len(all_colors)}")
+    if invalid_colors:
+        print(f"Invalid values ({len(invalid_colors)}):")
+        for c in invalid_colors:
+            print(f"  {c['image']} {c['field']}: {c['raw']}")
+
+    if not valid_colors:
+        print("\nNo valid hex colors returned. The model may not support hex output.")
+        return
+
+    generic = [c for c in valid_colors if c['is_generic']]
+    specific = [c for c in valid_colors if not c['is_generic']]
+
+    pct_specific = len(specific) / len(valid_colors) * 100
+
+    print("\n--- Specificity Breakdown ---")
+    print(f"  Generic (near a pure primary):  {len(generic):3d}  ({100 - pct_specific:.1f}%)")
+    print(f"  Specific (distinct shade):      {len(specific):3d}  ({pct_specific:.1f}%)")
+
+    # Show unique hex values returned
+    unique_hexes = sorted(set(c['hex'] for c in valid_colors))
+    print(f"\n--- Unique Hex Values ({len(unique_hexes)}) ---")
+    for h in unique_hexes:
+        rgb = hex_to_rgb(h)
+        hsl = rgb_to_hsl(*rgb)
+        cl = classify_color(h)
+        tag = "GENERIC" if cl['is_generic'] else "specific"
+        count = sum(1 for c in valid_colors if c['hex'] == h)
+        print(f"  {h}  RGB({rgb[0]:3d},{rgb[1]:3d},{rgb[2]:3d})  "
+              f"HSL({hsl[0]:5.1f},{hsl[1]:4.1f}%,{hsl[2]:4.1f}%)  "
+              f"x{count}  [{tag}, near {cl['nearest_primary']}, d={cl['primary_distance']}]")
+
+    # Distance statistics
+    distances = [c['primary_distance'] for c in valid_colors]
+    avg_dist = sum(distances) / len(distances)
+    min_dist = min(distances)
+    max_dist = max(distances)
+    print("\n--- Distance from Nearest Primary ---")
+    print(f"  Min: {min_dist:.1f}   Avg: {avg_dist:.1f}   Max: {max_dist:.1f}")
+    print(f"  (Higher = more specific. Threshold for 'generic' = {GENERIC_DISTANCE_THRESHOLD})")
+
+    # Verdict
+    print("\n--- Verdict ---")
+    if pct_specific >= 80:
+        print(f"  The model returns SPECIFIC hex colors ({pct_specific:.0f}% distinct shades).")
+        print("  Its output goes beyond basic color equivalents (accuracy is not measured here).")
+    elif pct_specific >= 40:
+        print(f"  MIXED results ({pct_specific:.0f}% specific).")
+        print("  The model sometimes returns specific shades but often falls back to primaries.")
+    else:
+        print(f"  The model mostly returns GENERIC primary colors ({pct_specific:.0f}% specific).")
+        print("  Hex codes are largely just the standard primary color equivalents.")
+
+
+if __name__ == '__main__':
+    main()
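The generic-versus-specific decision comes down to the Euclidean RGB distance to the nearest basic color, compared against a threshold of 20. A minimal sketch of that rule, assuming a reduced five-color palette and hypothetical helper names (`parse_hex`, `nearest_basic`) that are not part of the script:

```python
import math

# Reduced palette for illustration; the script uses a longer PRIMARY_COLORS table.
PALETTE = {
    "#FF0000": "red",
    "#0000FF": "blue",
    "#000080": "navy",
    "#FFFFFF": "white",
    "#000000": "black",
}
THRESHOLD = 20  # same value as GENERIC_DISTANCE_THRESHOLD


def parse_hex(h):
    """Parse '#RRGGBB' or '#RGB' into an (r, g, b) tuple, or None if invalid."""
    h = h.strip().lstrip("#")
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)  # expand shorthand: f00 -> ff0000
    if len(h) != 6:
        return None
    try:
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    except ValueError:
        return None


def nearest_basic(hex_str):
    """Return (name, distance, is_generic) for the closest palette entry."""
    rgb = parse_hex(hex_str)
    if rgb is None:
        return None
    name, dist = min(
        ((pname, math.dist(rgb, parse_hex(phex))) for phex, pname in PALETTE.items()),
        key=lambda pair: pair[1],
    )
    return name, dist, dist < THRESHOLD


for value in ("#FF0505", "#f00", "#1D428A", "teal"):
    print(value, nearest_basic(value))
```

A near-red such as `#FF0505` sits about 7 units from pure red and is flagged generic, while a shade like `#1D428A` is roughly 73 units from navy and counts as specific. A named color such as `teal` is not a hex code at all and is reported as invalid.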
--- a/test_hex_color_specificity_llama.py
+++ b/test_hex_color_specificity_llama.py
@ -0,0 +1,316 @@
+#!/usr/bin/env python3
+"""
+Test whether a local llama.cpp VLM can return specific hex color codes
+rather than generic named colors.
+
+Sends a random sample of images (10 by default; set with --sample) using a
+hex-color prompt, then analyzes how specific the returned jersey colors are
+by measuring their RGB distance to a set of basic "pure primary" hex values.
+
+Usage:
+    python test_hex_color_specificity_llama.py
+    python test_hex_color_specificity_llama.py --sample 20
+    python test_hex_color_specificity_llama.py --seed 42
+    python test_hex_color_specificity_llama.py --server-url http://hitagi:8080
+"""
+
+import argparse
+import colorsys
+import json
+import math
+import os
+import random
+import re
+import sys
+import time
+from pathlib import Path
+
+import cv2
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from scan_utils.llama_cpp_client import LlamaCppClient
+
+SERVER_URL = "http://agx:8080"
+IMAGES_DIR = os.path.join(os.path.dirname(__file__), "basketball_jersery_color_test_files")
+PROMPT_FILE = os.path.join(os.path.dirname(__file__), "jersey_prompt_hex_color.txt")
+
+# Pure primary / basic colors that indicate the model is NOT being specific
+PRIMARY_COLORS = {
+    "#FF0000": "red",
+    "#00FF00": "green",
+    "#0000FF": "blue",
+    "#FFFF00": "yellow",
+    "#FF00FF": "magenta",
+    "#00FFFF": "cyan",
+    "#FFFFFF": "white",
+    "#000000": "black",
+    "#808080": "gray",
+    "#FFA500": "orange",
+    "#800080": "purple",
+    "#FFC0CB": "pink",
+    "#A52A2A": "brown",
+    "#800000": "maroon",
+    "#008000": "green (dark)",
+    "#000080": "navy",
+    "#C0C0C0": "silver",
+    "#FFD700": "gold",
+}
+
+# Distance threshold: how close to a primary a hex must be to count as "generic".
+# Euclidean RGB distance runs from 0 to ~441.7 (black to white), so 20 is very close.
+GENERIC_DISTANCE_THRESHOLD = 20
+
+
+def hex_to_rgb(h: str) -> tuple[int, int, int] | None:
+    """Parse a hex color string to (r, g, b). Returns None if invalid."""
+    h = h.strip().lstrip('#')
+    if len(h) == 6:
+        try:
+            return (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))
+        except ValueError:
+            return None
+    if len(h) == 3:
+        try:
+            return (int(h[0]*2, 16), int(h[1]*2, 16), int(h[2]*2, 16))
+        except ValueError:
+            return None
+    return None
+
+
+def rgb_distance(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
+    """Euclidean distance between two RGB colors."""
+    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
+
+
+def rgb_to_hsl(r: int, g: int, b: int) -> tuple[float, float, float]:
+    """Convert RGB (0-255) to HSL (h: 0-360, s: 0-100, l: 0-100)."""
+    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
+    return round(h * 360, 1), round(s * 100, 1), round(l * 100, 1)
+
+
+def classify_color(hex_str: str) -> dict:
+    """Classify a hex color as generic/primary or specific."""
+    rgb = hex_to_rgb(hex_str)
+    if rgb is None:
+        return {'valid': False, 'hex': hex_str, 'reason': 'invalid hex'}
+
+    normalized = f"#{rgb[0]:02X}{rgb[1]:02X}{rgb[2]:02X}"
+    hsl = rgb_to_hsl(*rgb)
+
+    best_name = "unknown"
+    best_dist = float('inf')
+    for phex, pname in PRIMARY_COLORS.items():
+        prgb = hex_to_rgb(phex)
+        d = rgb_distance(rgb, prgb)
+        if d < best_dist:
+            best_dist = d
+            best_name = pname
+
+    is_generic = best_dist < GENERIC_DISTANCE_THRESHOLD
+
+    return {
+        'valid': True,
+        'hex': normalized,
+        'rgb': rgb,
+        'hsl': hsl,
+        'is_generic': is_generic,
+        'nearest_primary': best_name,
+        'primary_distance': round(best_dist, 1),
+    }
+
+
+def clean_response(text: str) -> str:
+    cleaned = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL | re.IGNORECASE)
+    cleaned = re.sub(r'\u25c1think\u25b7.*?\u25c1/think\u25b7', '', cleaned, flags=re.DOTALL)
+    cleaned = re.sub(r'</?think>', '', cleaned, flags=re.IGNORECASE)
+    cleaned = re.sub(r'\u25c1/?think\u25b7', '', cleaned, flags=re.IGNORECASE)
+    json_block = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', cleaned, flags=re.DOTALL | re.IGNORECASE)
+    if json_block:
+        cleaned = json_block.group(1)
+    else:
+        cleaned = re.sub(r'```(?:json)?', '', cleaned, flags=re.IGNORECASE)
+    return cleaned.strip()
+
+
+def salvage_jerseys(text: str) -> list[dict]:
+    """Extract complete jersey objects from truncated JSON."""
+    pattern = re.compile(
+        r'\{\s*'
+        r'"jersey_number"\s*:\s*"[^"]*"\s*,\s*'
+        r'"jersey_color"\s*:\s*"([^"]*)"\s*,\s*'
+        r'"number_color"\s*:\s*"([^"]*)"\s*'
+        r'\}',
+        re.DOTALL,
+    )
+    return [{'jersey_color': m.group(1), 'number_color': m.group(2)} for m in pattern.finditer(text)]
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Test hex color specificity from local llama.cpp VLM")
+    parser.add_argument('--sample', type=int, default=10, help='Number of random images to test (default: 10)')
+    parser.add_argument('--seed', type=int, default=None, help='Random seed for reproducibility')
+    parser.add_argument('--server-url', default=SERVER_URL, help=f'llama.cpp server URL (default: {SERVER_URL})')
+    args = parser.parse_args()
+
+    with open(PROMPT_FILE, 'r') as f:
+        prompt = f.read()
+
+    client = LlamaCppClient(base_url=args.server_url)
+
+    # Detect model name from server
+    model_name = "unknown"
+    try:
+        models = client.get_models()
+        if 'data' in models and len(models['data']) > 0:
+            model_id = models['data'][0].get('id', 'unknown')
+            model_name = os.path.splitext(os.path.basename(model_id))[0] if model_id else "unknown"
+    except Exception:
+        pass
+
+    # Gather and sample image files
+    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.webp'}
+    all_images = sorted([
+        p for p in Path(IMAGES_DIR).iterdir()
+        if p.suffix.lower() in valid_extensions
+    ])
+
+    seed = args.seed if args.seed is not None else random.randint(0, 99999)
+    rng = random.Random(seed)
+    sample_size = min(args.sample, len(all_images))
+    sample_images = rng.sample(all_images, sample_size)
+    sample_images.sort()
+
+    print(f"Model: {model_name}")
+    print(f"Server: {args.server_url}")
+    print(f"Prompt: {PROMPT_FILE}")
+    print(f"Sample: {sample_size} images (seed={seed})")
+    print(f"Selected: {', '.join(p.name for p in sample_images)}")
+    print("=" * 70)
+
+    # Collect all color classifications
+    all_colors = []
+    total_jerseys = 0
+    errors = 0
+
+    for i, image_path in enumerate(sample_images, 1):
+        print(f"\n[{i}/{sample_size}] {image_path.name}")
+
+        image = cv2.imread(str(image_path))
+        if image is None:
+            print("  SKIP (failed to load)")
+            errors += 1
+            continue
+
+        try:
+            t0 = time.time()
+
+            message = client.create_multimodal_message(role="user", content=prompt, images=[image])
+            response = client.chat_completion(messages=[message], temperature=0.1, max_tokens=1000)
+            elapsed = time.time() - t0
+
+            response_text = response['choices'][0]['message']['content']
+            cleaned = clean_response(response_text)
+
+            try:
+                result = json.loads(cleaned)
+                jerseys = result.get('jerseys', [])
+            except json.JSONDecodeError:
+                jerseys = salvage_jerseys(cleaned)
+                if jerseys:
+                    print(f"  (truncated response, salvaged {len(jerseys)} jersey(s))")
+
+            print(f"  {len(jerseys)} jersey(s) detected in {elapsed:.1f}s")
+
+            for j in jerseys:
+                total_jerseys += 1
+                # Coerce to str: a null or numeric value would break the ':10s' format below
+                raw_hex = str(j.get('jersey_color') or '')
+                c = classify_color(raw_hex)
+                c['image'] = image_path.name
+                c['field'] = 'jersey_color'
+                c['raw'] = raw_hex
+                all_colors.append(c)
+
+                if not c['valid']:
+                    status = f"  INVALID ({raw_hex})"
+                elif c['is_generic']:
+                    status = f"  GENERIC  ~{c['nearest_primary']}"
+                else:
+                    status = f"  SPECIFIC (nearest: {c['nearest_primary']}, dist={c['primary_distance']})"
+
+                print(f"    jersey: {raw_hex:10s} -> {status}")
+
+        except Exception as e:
+            print(f"  ERROR: {e}")
+            errors += 1
+
+    # --- Analysis ---
+    print()
+    print("=" * 70)
+    print("HEX COLOR SPECIFICITY ANALYSIS")
+    print("=" * 70)
+    print(f"Model: {model_name}")
+    print(f"Server: {args.server_url}")
+    print(f"Images tested: {sample_size} (seed={seed})")
+    print(f"Total jerseys: {total_jerseys}")
+    print(f"Total jersey color values: {len(all_colors)}")
+    print(f"Errors: {errors}")
+
+    valid_colors = [c for c in all_colors if c['valid']]
+    invalid_colors = [c for c in all_colors if not c['valid']]
+
+    print(f"\nValid hex codes: {len(valid_colors)}/{len(all_colors)}")
+    if invalid_colors:
+        print(f"Invalid values ({len(invalid_colors)}):")
+        for c in invalid_colors:
+            print(f"  {c['image']} {c['field']}: {c['raw']}")
+
+    if not valid_colors:
+        print("\nNo valid hex colors returned. The model may not support hex output.")
+        return
+
+    generic = [c for c in valid_colors if c['is_generic']]
+    specific = [c for c in valid_colors if not c['is_generic']]
+
+    pct_specific = len(specific) / len(valid_colors) * 100
+
+    print("\n--- Specificity Breakdown ---")
+    print(f"  Generic (near a pure primary):  {len(generic):3d}  ({100 - pct_specific:.1f}%)")
+    print(f"  Specific (distinct shade):      {len(specific):3d}  ({pct_specific:.1f}%)")
+
+    # Show unique hex values returned
+    unique_hexes = sorted(set(c['hex'] for c in valid_colors))
+    print(f"\n--- Unique Hex Values ({len(unique_hexes)}) ---")
+    for h in unique_hexes:
+        rgb = hex_to_rgb(h)
+        hsl = rgb_to_hsl(*rgb)
+        cl = classify_color(h)
+        tag = "GENERIC" if cl['is_generic'] else "specific"
+        count = sum(1 for c in valid_colors if c['hex'] == h)
+        print(f"  {h}  RGB({rgb[0]:3d},{rgb[1]:3d},{rgb[2]:3d})  "
+              f"HSL({hsl[0]:5.1f},{hsl[1]:4.1f}%,{hsl[2]:4.1f}%)  "
+              f"x{count}  [{tag}, near {cl['nearest_primary']}, d={cl['primary_distance']}]")
+
+    # Distance statistics
+    distances = [c['primary_distance'] for c in valid_colors]
+    avg_dist = sum(distances) / len(distances)
+    min_dist = min(distances)
+    max_dist = max(distances)
+    print("\n--- Distance from Nearest Primary ---")
+    print(f"  Min: {min_dist:.1f}   Avg: {avg_dist:.1f}   Max: {max_dist:.1f}")
+    print(f"  (Higher = more specific. Threshold for 'generic' = {GENERIC_DISTANCE_THRESHOLD})")
+
+    # Verdict
+    print("\n--- Verdict ---")
+    if pct_specific >= 80:
+        print(f"  The model returns SPECIFIC hex colors ({pct_specific:.0f}% distinct shades).")
+        print("  It is capable of estimating precise colors from images.")
+    elif pct_specific >= 40:
+        print(f"  MIXED results ({pct_specific:.0f}% specific).")
+        print("  The model sometimes returns specific shades but often falls back to primaries.")
+    else:
+        print(f"  The model mostly returns GENERIC primary colors ({pct_specific:.0f}% specific).")
+        print("  Hex codes are largely just the standard primary color equivalents.")
+
+
+if __name__ == '__main__':
+    main()