Add color variety and hex specificity test scripts with report

- test_color_variety.py: named-color test for local llama.cpp VLM
- test_color_variety_gemini.py: named-color test for Gemini 3 Flash API
- test_hex_color_specificity.py: hex color specificity test for Gemini
- test_hex_color_specificity_llama.py: hex color specificity test for local VLM
- jersey_prompt_hex_color.txt: prompt requesting hex color codes
- COLOR_TEST_REPORT.md: analysis report comparing 3 models across 5 tests
- color_test_results.md: raw test output from all runs
2026-02-24 11:30:41 -07:00
parent 825f3c19a9
commit 435033ea07
7 changed files with 1646 additions and 0 deletions

color_test_results.md (new file, 299 lines)

@ -0,0 +1,299 @@
# Qwen2.5-VL-7B Model Results
======================================================================
COLOR VARIETY SUMMARY
======================================================================
Images processed: 161
Total jerseys detected: 369
Errors: 0
Total time: 2397.7s (14.9s avg)
--- Jersey Colors (15 unique) ---
white 84 ##################################################
blue 60 ##################################################
green 48 ################################################
black 31 ###############################
yellow 27 ###########################
red 27 ###########################
purple 25 #########################
orange 24 ########################
maroon 14 ##############
gray 9 #########
light blue 6 ######
brown 6 ######
teal 4 ####
pink 2 ##
gold 2 ##
--- Number Colors (11 unique) ---
white 195 ##################################################
black 60 ##################################################
yellow 39 #######################################
red 30 ##############################
blue 23 #######################
orange 8 ########
purple 4 ####
pink 3 ###
green 3 ###
brown 2 ##
maroon 2 ##
--- Combined Color Palette (15 unique values) ---
black jersey: 31 number: 60
blue jersey: 60 number: 23
brown jersey: 6 number: 2
gold jersey: 2 number: 0
gray jersey: 9 number: 0
green jersey: 48 number: 3
light blue jersey: 6 number: 0
maroon jersey: 14 number: 2
orange jersey: 24 number: 8
pink jersey: 2 number: 3
purple jersey: 25 number: 4
red jersey: 27 number: 30
teal jersey: 4 number: 0
white jersey: 84 number:195
yellow jersey: 27 number: 39
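The bars in these summaries use one `#` per occurrence and appear to be capped at 50 marks: white (84) and blue (60) both print 50, while green (48) prints 48. Below is a minimal sketch of that tally, assuming the script collects one color name per detected jersey (or number). `render_histogram` and `BAR_CAP` are hypothetical names, not taken from `test_color_variety.py`.

```python
from collections import Counter

BAR_CAP = 50  # assumption: bars in the report appear to stop at 50 '#' marks


def render_histogram(title, colors):
    """Tally color names and render one text bar per name, most common first."""
    counts = Counter(colors)
    lines = [f"--- {title} ({len(counts)} unique) ---"]
    # most_common() sorts by count; ties keep first-seen order
    for name, n in counts.most_common():
        lines.append(f"{name:<12}{n:>4}  {'#' * min(n, BAR_CAP)}")
    return lines


if __name__ == "__main__":
    sample = ["white"] * 84 + ["green"] * 48 + ["teal"] * 4
    print("\n".join(render_histogram("Jersey Colors", sample)))
```

Capping the bar keeps wide counts such as `white 195` from overflowing the terminal, at the cost of making every count above 50 look the same length.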
# Gemini 3 Flash Results
======================================================================
COLOR VARIETY SUMMARY (gemini-3-flash-preview)
======================================================================
Images processed: 161
Total jerseys detected: 453
Errors: 0
Total time: 2560.0s (15.9s avg)
--- Jersey Colors (19 unique) ---
white 125 ##################################################
green 60 ##################################################
blue 43 ###########################################
purple 28 ############################
orange 27 ###########################
yellow 24 ########################
maroon 23 #######################
light blue 22 ######################
red 22 ######################
black 21 #####################
brown 13 #############
grey 12 ############
navy blue 11 ###########
dark blue 9 #########
teal 7 #######
pink 2 ##
gold 2 ##
dark brown 1 #
navy 1 #
--- Number Colors (15 unique) ---
white 183 ##################################################
yellow 58 ##################################################
red 44 ############################################
black 40 ########################################
blue 39 #######################################
orange 21 #####################
dark blue 14 ##############
maroon 14 ##############
green 13 #############
purple 11 ###########
pink 6 ######
gold 5 #####
brown 2 ##
grey 2 ##
navy blue 1 #
--- Combined Color Palette (19 unique values) ---
black jersey: 21 number: 40
blue jersey: 43 number: 39
brown jersey: 13 number: 2
dark blue jersey: 9 number: 14
dark brown jersey: 1 number: 0
gold jersey: 2 number: 5
green jersey: 60 number: 13
grey jersey: 12 number: 2
light blue jersey: 22 number: 0
maroon jersey: 23 number: 14
navy jersey: 1 number: 0
navy blue jersey: 11 number: 1
orange jersey: 27 number: 21
pink jersey: 2 number: 6
purple jersey: 28 number: 11
red jersey: 22 number: 44
teal jersey: 7 number: 0
white jersey:125 number:183
yellow jersey: 24 number: 58
# Qwen3-VL-8B Model Results
======================================================================
COLOR VARIETY SUMMARY
======================================================================
Images processed: 161
Total jerseys detected: 444
Errors: 1
Total time: 2738.7s (17.0s avg)
--- Jersey Colors (15 unique) ---
white 120 ##################################################
blue 69 ##################################################
green 53 ##################################################
black 33 #################################
purple 30 ##############################
red 28 ############################
orange 27 ###########################
yellow 26 ##########################
maroon 15 ###############
light blue 13 #############
gray 10 ##########
brown 9 #########
teal 7 #######
pink 2 ##
gold 2 ##
--- Number Colors (13 unique) ---
white 184 ##################################################
black 44 ############################################
red 41 #########################################
blue 39 #######################################
yellow 32 ################################
orange 29 #############################
gold 21 #####################
green 14 ##############
maroon 12 ############
purple 11 ###########
dark blue 9 #########
pink 6 ######
silver 2 ##
--- Combined Color Palette (17 unique values) ---
black jersey: 33 number: 44
blue jersey: 69 number: 39
brown jersey: 9 number: 0
dark blue jersey: 0 number: 9
gold jersey: 2 number: 21
gray jersey: 10 number: 0
green jersey: 53 number: 14
light blue jersey: 13 number: 0
maroon jersey: 15 number: 12
orange jersey: 27 number: 29
pink jersey: 2 number: 6
purple jersey: 30 number: 11
red jersey: 28 number: 41
silver jersey: 0 number: 2
teal jersey: 7 number: 0
white jersey:120 number:184
yellow jersey: 26 number: 32
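In the Qwen3-VL-8B run, "dark blue" and "silver" appear only as number colors, so the combined palette has 17 entries (the 15 jersey colors plus those 2) and both show a jersey count of 0. A minimal sketch of that merge, assuming the script keeps separate jersey and number tallies; `combined_palette` is a hypothetical name.

```python
from collections import Counter


def combined_palette(jersey_counts, number_counts):
    """Merge two color tallies into alphabetical (name, jersey, number) rows.

    A color seen only on numbers (or only on jerseys) gets 0 in the other
    column, which is why the combined list can be longer than either tally.
    """
    jersey = Counter(jersey_counts)
    number = Counter(number_counts)
    names = sorted(set(jersey) | set(number))
    # Counter returns 0 for missing keys, so absent colors need no default
    return [(name, jersey[name], number[name]) for name in names]


if __name__ == "__main__":
    rows = combined_palette(
        {"white": 120, "blue": 69, "teal": 7},
        {"white": 184, "dark blue": 9, "silver": 2},
    )
    for name, j, n in rows:
        print(f"{name:<12} jersey:{j:>3}  number:{n:>3}")
```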
# Gemini 3 Flash (Hex Colors, random sample of 20 images) Results
Test params: test_hex_color_specificity.py --sample 20 --seed 42
======================================================================
HEX COLOR SPECIFICITY ANALYSIS
======================================================================
Model: gemini-3-flash-preview
Images tested: 20 (seed=42)
Total jerseys: 56
Total jersey color values: 56
Errors: 0
Valid hex codes: 56/56
--- Specificity Breakdown ---
Generic (near a pure primary): 16 (28.6%)
Specific (distinct shade): 40 (71.4%)
--- Unique Hex Values (24) ---
#004B23 RGB( 0, 75, 35) HSL(148.0,100.0%,14.7%) x7 [specific, near green (dark), d=63.5]
#1A2344 RGB( 26, 35, 68) HSL(227.1,44.7%,18.4%) x2 [specific, near navy, d=74.2]
#1E4BA1 RGB( 30, 75,161) HSL(219.4,68.6%,37.5%) x1 [specific, near navy, d=87.3]
#2B231D RGB( 43, 35, 29) HSL( 25.7,19.4%,14.1%) x1 [specific, near black, d=62.6]
#3D2B1F RGB( 61, 43, 31) HSL( 24.0,32.6%,18.0%) x1 [specific, near black, d=80.8]
#461D7C RGB( 70, 29,124) HSL(265.9,62.1%,30.0%) x1 [specific, near purple, d=65.0]
#4B2E83 RGB( 75, 46,131) HSL(260.5,48.0%,34.7%) x5 [specific, near purple, d=70.2]
#701112 RGB(112, 17, 18) HSL(359.4,73.6%,25.3%) x1 [specific, near maroon, d=29.5]
#7BAFD4 RGB(123,175,212) HSL(204.9,50.9%,65.7%) x3 [specific, near silver, d=73.8]
#990000 RGB(153, 0, 0) HSL( 0.0,100.0%,30.0%) x2 [specific, near maroon, d=25.0]
#A9A9A9 RGB(169,169,169) HSL( 0.0, 0.0%,66.3%) x1 [specific, near silver, d=39.8]
#C41230 RGB(196, 18, 48) HSL(349.9,83.2%,42.0%) x1 [specific, near brown, d=39.7]
#D11111 RGB(209, 17, 17) HSL( 0.0,85.0%,44.3%) x2 [specific, near red, d=51.9]
#D32F2F RGB(211, 47, 47) HSL( 0.0,65.1%,50.6%) x2 [specific, near brown, d=46.5]
#E31837 RGB(227, 24, 55) HSL(350.8,80.9%,49.2%) x1 [specific, near brown, d=65.9]
#E31B23 RGB(227, 27, 35) HSL(357.6,78.7%,49.8%) x1 [specific, near red, d=52.3]
#E3242B RGB(227, 36, 43) HSL(357.8,77.3%,51.6%) x2 [specific, near brown, d=62.3]
#E6E600 RGB(230,230, 0) HSL( 60.0,100.0%,45.1%) x1 [specific, near gold, d=29.2]
#E8E8E8 RGB(232,232,232) HSL( 0.0, 0.0%,91.0%) x1 [specific, near white, d=39.8]
#E91E63 RGB(233, 30, 99) HSL(339.6,82.2%,51.6%) x1 [specific, near brown, d=89.5]
#F06292 RGB(240, 98,146) HSL(339.7,82.6%,66.3%) x2 [specific, near pink, d=111.0]
#F57C00 RGB(245,124, 0) HSL( 30.4,100.0%,48.0%) x1 [specific, near orange, d=42.2]
#FFCD00 RGB(255,205, 0) HSL( 48.2,100.0%,50.0%) x1 [GENERIC, near gold, d=10.0]
#FFFFFF RGB(255,255,255) HSL( 0.0, 0.0%,100.0%) x15 [GENERIC, near white, d=0.0]
--- Distance from Nearest Primary ---
Min: 0.0 Avg: 44.5 Max: 111.0
(Higher = more specific. Threshold for 'generic' = 20)
--- Verdict ---
MIXED results (71% specific).
The model sometimes returns specific shades but often falls back to primaries.
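The `d=` values in the table above are consistent with plain Euclidean distance in RGB space to a small palette of CSS named colors: #FFCD00 is exactly 10.0 from gold (#FFD700), and #F5F5F5 is 17.3 from white (#FFFFFF). Below is a minimal sketch of the validate, convert, and classify steps under that assumption. The `REFERENCE` palette is inferred from the reported distances and may be incomplete, and `parse_hex`, `to_hsl`, and `classify` are hypothetical names, not the actual functions in `test_hex_color_specificity.py`.

```python
import colorsys
import math
import re

# Hypothetical palette: CSS named colors whose RGB values reproduce the
# distances in the report. The real script may use more (or other) entries.
REFERENCE = {
    "white": (255, 255, 255), "black": (0, 0, 0), "red": (255, 0, 0),
    "green (dark)": (0, 128, 0), "yellow": (255, 255, 0), "gold": (255, 215, 0),
    "orange": (255, 165, 0), "pink": (255, 192, 203), "purple": (128, 0, 128),
    "maroon": (128, 0, 0), "navy": (0, 0, 128), "brown": (165, 42, 42),
    "silver": (192, 192, 192),
}
GENERIC_THRESHOLD = 20.0  # from the report: d below 20 counts as "generic"

HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


def parse_hex(code):
    """Return (r, g, b) for a '#RRGGBB' string, or None if it is not valid."""
    if not HEX_RE.match(code):
        return None
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))


def to_hsl(rgb):
    """Convert 0-255 RGB to (hue in degrees, saturation %, lightness %)."""
    # colorsys works on 0-1 floats and returns the components in H, L, S order
    h, l, s = colorsys.rgb_to_hls(*(c / 255 for c in rgb))
    return round(h * 360, 1), round(s * 100, 1), round(l * 100, 1)


def classify(code):
    """Return (label, nearest reference name, rounded RGB distance)."""
    rgb = parse_hex(code)
    if rgb is None:
        raise ValueError(f"invalid hex code: {code!r}")
    name, dist = min(
        ((ref_name, math.dist(rgb, ref_rgb)) for ref_name, ref_rgb in REFERENCE.items()),
        key=lambda pair: pair[1],
    )
    label = "GENERIC" if dist < GENERIC_THRESHOLD else "specific"
    return label, name, round(dist, 1)


if __name__ == "__main__":
    for code in ("#FFCD00", "#7BAFD4", "#004B23"):
        print(code, to_hsl(parse_hex(code)), classify(code))
```

Because the threshold is a fixed RGB distance, a near-primary such as #F5F5F5 counts as generic while #E8E8E8 (39.8 from white) counts as specific, even though both read as "white" to a viewer.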
# Qwen3-VL-8B (Hex Colors, random sample of 20 images) Results
Test params: test_hex_color_specificity_llama.py --sample 20 --seed 42
======================================================================
HEX COLOR SPECIFICITY ANALYSIS
======================================================================
Model: unsloth_Qwen3-VL-8B-Instruct-GGUF_Qwen3-VL-8B-Instruct-BF16
Server: http://agx:8080
Images tested: 20 (seed=42)
Total jerseys: 59
Total jersey color values: 59
Errors: 0
Valid hex codes: 59/59
--- Specificity Breakdown ---
Generic (near a pure primary): 22 (37.3%)
Specific (distinct shade): 37 (62.7%)
--- Unique Hex Values (21) ---
#000000 RGB( 0, 0, 0) HSL( 0.0, 0.0%, 0.0%) x1 [GENERIC, near black, d=0.0]
#006400 RGB( 0,100, 0) HSL(120.0,100.0%,19.6%) x10 [specific, near green (dark), d=28.0]
#191970 RGB( 25, 25,112) HSL(240.0,63.5%,26.9%) x1 [specific, near navy, d=38.8]
#19418A RGB( 25, 65,138) HSL(218.8,69.3%,32.0%) x1 [specific, near navy, d=70.4]
#3D2B21 RGB( 61, 43, 33) HSL( 21.4,29.8%,18.4%) x2 [specific, near black, d=81.6]
#66B2FF RGB(102,178,255) HSL(210.2,100.0%,70.0%) x3 [specific, near silver, d=110.7]
#6A0DAD RGB(106, 13,173) HSL(274.9,86.0%,36.5%) x6 [specific, near purple, d=51.7]
#8B0000 RGB(139, 0, 0) HSL( 0.0,100.0%,27.3%) x1 [GENERIC, near maroon, d=11.0]
#A9A9A9 RGB(169,169,169) HSL( 0.0, 0.0%,66.3%) x1 [specific, near silver, d=39.8]
#B22234 RGB(178, 34, 52) HSL(352.5,67.9%,41.6%) x2 [GENERIC, near brown, d=18.2]
#D32F2F RGB(211, 47, 47) HSL( 0.0,65.1%,50.6%) x3 [specific, near brown, d=46.5]
#D60000 RGB(214, 0, 0) HSL( 0.0,100.0%,42.0%) x3 [specific, near red, d=41.0]
#DC143C RGB(220, 20, 60) HSL(348.0,83.3%,47.1%) x2 [specific, near brown, d=61.9]
#F5F5DC RGB(245,245,220) HSL( 60.0,55.6%,91.2%) x2 [specific, near white, d=37.7]
#F5F5F5 RGB(245,245,245) HSL( 0.0, 0.0%,96.1%) x1 [GENERIC, near white, d=17.3]
#FF0000 RGB(255, 0, 0) HSL( 0.0,100.0%,50.0%) x1 [GENERIC, near red, d=0.0]
#FF6347 RGB(255, 99, 71) HSL( 9.1,100.0%,63.9%) x1 [specific, near orange, d=96.9]
#FF69B4 RGB(255,105,180) HSL(330.0,100.0%,70.6%) x2 [specific, near pink, d=90.0]
#FFD700 RGB(255,215, 0) HSL( 50.6,100.0%,50.0%) x1 [GENERIC, near gold, d=0.0]
#FFFF00 RGB(255,255, 0) HSL( 60.0,100.0%,50.0%) x1 [GENERIC, near yellow, d=0.0]
#FFFFFF RGB(255,255,255) HSL( 0.0, 0.0%,100.0%) x14 [GENERIC, near white, d=0.0]
--- Distance from Nearest Primary ---
Min: 0.0 Avg: 34.5 Max: 110.7
(Higher = more specific. Threshold for 'generic' = 20)
--- Verdict ---
MIXED results (63% specific).
The model sometimes returns specific shades but often falls back to primaries.
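The summary lines of each hex report follow from its per-value table: every hex value contributes its occurrence count (`xN`), "generic" means a distance below 20, and the average distance is weighted by occurrences. Weighting reproduces the reported 34.5 for Qwen3-VL-8B, while an unweighted mean over the 21 unique values would not. A minimal sketch that recomputes those figures from the rows above; `summarize` and `ROWS` are hypothetical names.

```python
# (hex, occurrences, distance to nearest reference), copied from the
# Qwen3-VL-8B table above
ROWS = [
    ("#000000", 1, 0.0), ("#006400", 10, 28.0), ("#191970", 1, 38.8),
    ("#19418A", 1, 70.4), ("#3D2B21", 2, 81.6), ("#66B2FF", 3, 110.7),
    ("#6A0DAD", 6, 51.7), ("#8B0000", 1, 11.0), ("#A9A9A9", 1, 39.8),
    ("#B22234", 2, 18.2), ("#D32F2F", 3, 46.5), ("#D60000", 3, 41.0),
    ("#DC143C", 2, 61.9), ("#F5F5DC", 2, 37.7), ("#F5F5F5", 1, 17.3),
    ("#FF0000", 1, 0.0), ("#FF6347", 1, 96.9), ("#FF69B4", 2, 90.0),
    ("#FFD700", 1, 0.0), ("#FFFF00", 1, 0.0), ("#FFFFFF", 14, 0.0),
]
GENERIC_THRESHOLD = 20.0  # "Threshold for 'generic' = 20" in the report


def summarize(rows, threshold=GENERIC_THRESHOLD):
    """Recompute the breakdown and distance stats, weighting by occurrences."""
    total = sum(n for _, n, _ in rows)
    generic = sum(n for _, n, d in rows if d < threshold)
    weighted_avg = sum(n * d for _, n, d in rows) / total
    return {
        "total": total,
        "generic": generic,
        "generic_pct": round(100 * generic / total, 1),
        "specific_pct": round(100 * (total - generic) / total, 1),
        "avg_distance": round(weighted_avg, 1),
        "min": min(d for _, _, d in rows),
        "max": max(d for _, _, d in rows),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```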