Includes accuracy test scripts for Qwen (local) and Gemini (cloud API), three prompt variants (original, capstone, constrained), test results from all runs, and two analysis reports with an HTML presentation version.
# Gemini 3 Flash Results (Prompt: `jersey_prompt.txt`)

## Accuracy Summary (gemini-3-flash-preview)

| Metric | Value |
| --- | --- |
| Images processed | 161 |
| Errors | 0 |
| Total time | 2134.4 s (13.3 s avg per image) |
| Ground-truth colors (excluding white) | 202 |
| VLM unique colors (excluding white) | 174 |
### Recall (did the VLM find each ground-truth color?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 130 / 202 | 64.4% |
| Similar match | 34 / 202 | 16.8% |
| Total found | 164 / 202 | 81.2% |
| Missed | 38 / 202 | 18.8% |
### Precision (are the VLM colors correct?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 130 / 174 | 74.7% |
| Similar match | 33 / 174 | 19.0% |
| Total correct | 163 / 174 | 93.7% |
| Extra/wrong | 11 / 174 | 6.3% |
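For this run, the recall and precision percentages can be reproduced from the raw counts. The minimal Python sketch below assumes recall is measured against the 202 non-white ground-truth colors and precision against the 174 unique colors the VLM returned, as the section headings suggest; `pct` and `summarize` are illustrative names, not functions from the test scripts. The two similar-match counts (34 and 33) are used exactly as reported.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as in the summaries."""
    return round(100 * numerator / denominator, 1)


def summarize(exact: int, similar: int, total: int) -> dict:
    """Share of `total` items matched exactly, matched by a similar color, or unmatched."""
    found = exact + similar
    return {
        "exact": pct(exact, total),
        "similar": pct(similar, total),
        "found": pct(found, total),
        "unmatched": pct(total - found, total),
    }


# Gemini 3 Flash, jersey_prompt.txt
recall = summarize(exact=130, similar=34, total=202)     # denominator: ground-truth colors
precision = summarize(exact=130, similar=33, total=174)  # denominator: VLM unique colors
print(recall)
print(precision)
```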
### Similar-Match Confusions

| Expected | Got | Count |
| --- | --- | --- |
| gray | grey | 9 |
| navy blue | blue | 7 |
| dark brown | brown | 5 |
| dark blue | blue | 5 |
| gold | yellow | 3 |
| dark blue | navy blue | 3 |
| navy | navy blue | 1 |
| dark blue | navy | 1 |
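The evaluation script's similarity rules are not part of this report. As a hypothetical sketch, one way to sort a ground-truth color into exact match, similar match, or miss is a hand-written table of accepted near-misses. The `SIMILAR` table below only collects the expected -> got pairs that appear in this report's confusion tables, and treating `gold|yellow` as "either color counts as exact" is an assumption.

```python
# Hypothetical near-miss table: expected color -> answers accepted as "similar".
# Built only from the confusion pairs observed in this report.
SIMILAR: dict = {
    "gray": {"grey"},
    "navy blue": {"blue", "navy"},
    "navy": {"blue", "navy blue"},
    "dark blue": {"blue", "navy blue", "navy"},
    "dark brown": {"brown"},
    "gold": {"yellow"},
}


def classify(expected: str, predicted: list) -> str:
    """Return 'exact', 'similar', or 'missed' for one ground-truth color."""
    got = {c.strip().lower() for c in predicted} - {"white"}  # white is excluded from scoring
    options = expected.split("|")  # assumed: "gold|yellow" accepts either color
    if any(opt in got for opt in options):
        return "exact"
    if any(SIMILAR.get(opt, set()) & got for opt in options):
        return "similar"
    return "missed"


print(classify("navy blue", ["Blue", "white"]))  # similar
print(classify("maroon", ["red", "white"]))      # missed
```

With this table, `maroon` reported as `red` stays a miss, which matches the failed-image entries where maroon is missed and red appears as an extra.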
### Most Missed Ground-Truth Colors

| Color | Times missed |
| --- | --- |
| gray | 7 |
| black | 7 |
| maroon | 5 |
| blue | 3 |
| green | 3 |
| gold | 2 |
| light blue | 2 |
| gold\|yellow | 2 |
| red | 2 |
| teal | 2 |
| orange | 1 |
| yellow | 1 |
| brown | 1 |
### Most Common Extra/Wrong VLM Colors

| Color | Times reported |
| --- | --- |
| red | 3 |
| blue | 3 |
| black | 2 |
| green | 1 |
| orange | 1 |
| dark blue | 1 |
### Per-Image Verdict

| Verdict | Images |
| --- | --- |
| PASS | 124 |
| PARTIAL | 19 |
| FAIL | 18 |
### Failed Images (18)

| Image | Missed | Extra |
| --- | --- | --- |
| `016 - maroon.jpg` | maroon | |
| `029 -maroon_white.jpg` | maroon | red |
| `034 - light blue.jpg` | light blue | blue |
| `046 - green.jpg` | green | black |
| `048 - red.jpg` | red | |
| `053 - black_white.jpg` | black | |
| `057 - white_gold or yellow.jpg` | gold\|yellow | |
| `069 - red_white.jpg` | red | |
| `074 - white_orange.jpg` | orange | |
| `077 - teal_white.jpg` | teal | green |
| `088 - white_maroon.jpg` | maroon | |
| `129 - blue_white.jpg` | blue | |
| `132 - brown_white.jpg` | brown | orange |
| `134 - teal_white.jpg` | teal | blue |
| `138 - maroon.jpg` | maroon | red |
| `150 - green_gray.jpg` | green, gray | black |
| `160 - blue_white.jpg` | blue | |
| `161 - light blue_white.jpg` | light blue | blue |
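Every image in the failed list above has all of its non-white ground-truth colors missed. That is consistent with, though not proof of, a verdict rule like the hypothetical one sketched here: FAIL when no ground-truth color was found, PASS when nothing was missed and nothing extra was reported, PARTIAL otherwise. `ImageResult` and `verdict` are illustrative names, not code from the test scripts.

```python
from dataclasses import dataclass, field


@dataclass
class ImageResult:
    """Per-image outcome; color lists exclude white, as in the summaries."""
    name: str
    found: list = field(default_factory=list)   # ground-truth colors matched (exact or similar)
    missed: list = field(default_factory=list)  # ground-truth colors not matched
    extra: list = field(default_factory=list)   # VLM colors with no ground-truth match


def verdict(result: ImageResult) -> str:
    """Hypothetical PASS/PARTIAL/FAIL rule inferred from the failed-image list."""
    if result.missed and not result.found:
        return "FAIL"     # no ground-truth color recovered at all
    if result.missed or result.extra:
        return "PARTIAL"  # something recovered, but a color was missed or spurious
    return "PASS"         # every ground-truth color found, nothing extra


print(verdict(ImageResult("029 -maroon_white.jpg", missed=["maroon"], extra=["red"])))  # FAIL
print(verdict(ImageResult("hypothetical.jpg", found=["navy blue"], extra=["red"])))     # PARTIAL
```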
# Qwen3-VL-8B Model Results (Prompt: `jersey_prompt.txt`)

## Accuracy Summary

| Metric | Value |
| --- | --- |
| Images processed | 161 |
| Errors | 0 |
| Total time | 1526.4 s (9.5 s avg per image) |
| Ground-truth colors (excluding white) | 202 |
| VLM unique colors (excluding white) | 184 |
### Recall (did the VLM find each ground-truth color?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 130 / 202 | 64.4% |
| Similar match | 26 / 202 | 12.9% |
| Total found | 156 / 202 | 77.2% |
| Missed | 46 / 202 | 22.8% |
### Precision (are the VLM colors correct?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 130 / 184 | 70.7% |
| Similar match | 26 / 184 | 14.1% |
| Total correct | 156 / 184 | 84.8% |
| Extra/wrong | 28 / 184 | 15.2% |
### Similar-Match Confusions

| Expected | Got | Count |
| --- | --- | --- |
| dark blue | blue | 10 |
| navy blue | blue | 8 |
| gold | yellow | 5 |
| dark brown | brown | 2 |
| navy | blue | 1 |
### Most Missed Ground-Truth Colors

| Color | Times missed |
| --- | --- |
| light blue | 8 |
| maroon | 8 |
| gray | 7 |
| black | 6 |
| dark brown | 4 |
| brown | 3 |
| blue | 3 |
| green | 3 |
| teal | 2 |
| gold\|yellow | 1 |
| red | 1 |
### Most Common Extra/Wrong VLM Colors

| Color | Times reported |
| --- | --- |
| blue | 10 |
| black | 7 |
| red | 7 |
| gold | 1 |
| green | 1 |
| redolas | 1 |
| orange | 1 |
### Per-Image Verdict

| Verdict | Images |
| --- | --- |
| PASS | 117 |
| PARTIAL | 18 |
| FAIL | 26 |
### Failed Images (26)

| Image | Missed | Extra |
| --- | --- | --- |
| `001 -brown_white or dark brown.jpg` | brown, dark brown | black |
| `013 - light blue.jpg` | light blue | blue |
| `016 - maroon.jpg` | maroon | |
| `017 - brown_white.jpg` | brown | black |
| `022 - black_light blue.jpg` | black, light blue | blue |
| `029 -maroon_white.jpg` | maroon | red |
| `034 - light blue.jpg` | light blue | blue |
| `036 - light blue_white.jpg` | light blue | blue |
| `046 - green.jpg` | green | black |
| `053 - black_white.jpg` | black | |
| `057 - white_gold or yellow.jpg` | gold\|yellow | |
| `063 - dark brown.jpg` | dark brown | black |
| `069 - red_white.jpg` | red | |
| `077 - teal_white.jpg` | teal | green |
| `078 - light blue_white.jpg` | light blue | blue |
| `083 - dark brown_white.jpg` | dark brown | black |
| `087 - white_light blue.jpg` | light blue | blue |
| `099 - maroon_white.jpg` | maroon | redolas, red |
| `129 - blue_white.jpg` | blue | |
| `132 - brown_white.jpg` | brown | orange |
| `134 - teal_white.jpg` | teal | blue |
| `138 - maroon.jpg` | maroon | red |
| `141 - light blue_white.jpg` | light blue | blue |
| `150 - green_gray.jpg` | green, gray | black |
| `160 - blue_white.jpg` | blue | |
| `161 - light blue_white.jpg` | light blue | blue |
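The most-missed and extra/wrong tables for each run are frequency counts over per-image results. A minimal sketch with `collections.Counter` follows, printing the counts as `#` bars. Its input is a hand-copied subset of this run's failed images, so it illustrates the tallying only; the totals will not match the full tables, which also count misses and extras from PARTIAL images.

```python
from collections import Counter

# Hand-copied subset of this run's failed images: (missed colors, extra colors).
failures = [
    (["light blue"], ["blue"]),           # 013 - light blue.jpg
    (["black", "light blue"], ["blue"]),  # 022 - black_light blue.jpg
    (["maroon"], ["red"]),                # 029 -maroon_white.jpg
    (["maroon"], ["redolas", "red"]),     # 099 - maroon_white.jpg
    (["green", "gray"], ["black"]),       # 150 - green_gray.jpg
]

missed = Counter(color for miss, _ in failures for color in miss)
extra = Counter(color for _, ext in failures for color in ext)

# most_common() sorts by count, keeping first-seen order among ties.
for color, n in missed.most_common():
    print(f"{color:<12} {n:>3} {'#' * n}")
```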
# Gemini 3 Flash Results (Prompt: `jersey_prompt_capstone.txt`)

## Accuracy Summary (gemini-3-flash-preview)

| Metric | Value |
| --- | --- |
| Images processed | 161 |
| Errors | 0 |
| Total time | 1881.7 s (11.7 s avg per image) |
| Ground-truth colors (excluding white) | 202 |
| VLM unique colors (excluding white) | 174 |
### Recall (did the VLM find each ground-truth color?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 123 / 202 | 60.9% |
| Similar match | 35 / 202 | 17.3% |
| Total found | 158 / 202 | 78.2% |
| Missed | 44 / 202 | 21.8% |
### Precision (are the VLM colors correct?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 123 / 174 | 70.7% |
| Similar match | 34 / 174 | 19.5% |
| Total correct | 157 / 174 | 90.2% |
| Extra/wrong | 17 / 174 | 9.8% |
### Similar-Match Confusions

| Expected | Got | Count |
| --- | --- | --- |
| gray | grey | 10 |
| navy blue | blue | 6 |
| dark blue | blue | 6 |
| dark brown | brown | 5 |
| dark blue | navy blue | 3 |
| gold | yellow | 2 |
| navy blue | navy | 1 |
| navy | blue | 1 |
| dark blue | navy | 1 |
### Most Missed Ground-Truth Colors

| Color | Times missed |
| --- | --- |
| maroon | 9 |
| black | 7 |
| gray | 6 |
| green | 4 |
| gold | 3 |
| blue | 3 |
| light blue | 2 |
| gold\|yellow | 2 |
| red | 2 |
| teal | 2 |
| navy blue | 1 |
| dark brown | 1 |
| yellow | 1 |
| brown | 1 |
### Most Common Extra/Wrong VLM Colors

| Color | Times reported |
| --- | --- |
| red | 7 |
| black | 4 |
| blue | 2 |
| green | 1 |
| orange | 1 |
| light blue | 1 |
| navy | 1 |
### Per-Image Verdict

| Verdict | Images |
| --- | --- |
| PASS | 118 |
| PARTIAL | 21 |
| FAIL | 22 |
### Failed Images (22)

| Image | Missed | Extra |
| --- | --- | --- |
| `016 - maroon.jpg` | maroon | |
| `019 - maroon_gold.jpg` | maroon, gold | red |
| `029 -maroon_white.jpg` | maroon | red |
| `030 - navy blue_white.jpg` | navy blue | |
| `034 - light blue.jpg` | light blue | blue |
| `036 - light blue_white.jpg` | light blue | blue |
| `046 - green.jpg` | green | black |
| `048 - red.jpg` | red | |
| `053 - black_white.jpg` | black | |
| `057 - white_gold or yellow.jpg` | gold\|yellow | |
| `069 - red_white.jpg` | red | |
| `077 - teal_white.jpg` | teal | green |
| `083 - dark brown_white.jpg` | dark brown | black |
| `088 - white_maroon.jpg` | maroon | |
| `099 - maroon_white.jpg` | maroon | red |
| `128 - green_white.jpg` | green | |
| `129 - blue_white.jpg` | blue | |
| `132 - brown_white.jpg` | brown | orange |
| `134 - teal_white.jpg` | teal | light blue |
| `138 - maroon.jpg` | maroon | red |
| `150 - green_gray.jpg` | green, gray | black |
| `160 - blue_white.jpg` | blue | |
# Qwen3-VL-8B Model Results (Prompt: `jersey_prompt_capstone.txt`)

## Accuracy Summary

| Metric | Value |
| --- | --- |
| Images processed | 161 |
| Errors | 0 |
| Total time | 1435.7 s (8.9 s avg per image) |
| Ground-truth colors (excluding white) | 202 |
| VLM unique colors (excluding white) | 180 |
### Recall (did the VLM find each ground-truth color?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 133 / 202 | 65.8% |
| Similar match | 24 / 202 | 11.9% |
| Total found | 157 / 202 | 77.7% |
| Missed | 45 / 202 | 22.3% |
### Precision (are the VLM colors correct?)

| Match type | Count | Percent |
| --- | --- | --- |
| Exact match | 133 / 180 | 73.9% |
| Similar match | 24 / 180 | 13.3% |
| Total correct | 157 / 180 | 87.2% |
| Extra/wrong | 23 / 180 | 12.8% |
### Similar-Match Confusions

| Expected | Got | Count |
| --- | --- | --- |
| dark blue | blue | 9 |
| navy blue | blue | 8 |
| gold | yellow | 3 |
| dark brown | brown | 2 |
| navy | blue | 1 |
| dark blue | navy | 1 |
### Most Missed Ground-Truth Colors

| Color | Times missed |
| --- | --- |
| gray | 9 |
| maroon | 7 |
| black | 6 |
| light blue | 5 |
| dark brown | 4 |
| green | 4 |
| brown | 3 |
| gold | 2 |
| blue | 2 |
| teal | 2 |
| gold\|yellow | 1 |
### Most Common Extra/Wrong VLM Colors

| Color | Times reported |
| --- | --- |
| black | 7 |
| blue | 6 |
| red | 6 |
| gold | 1 |
| green | 1 |
| orange | 1 |
| navy | 1 |
### Per-Image Verdict

| Verdict | Images |
| --- | --- |
| PASS | 119 |
| PARTIAL | 19 |
| FAIL | 23 |
### Failed Images (23)

| Image | Missed | Extra |
| --- | --- | --- |
| `001 -brown_white or dark brown.jpg` | brown, dark brown | black |
| `013 - light blue.jpg` | light blue | blue |
| `016 - maroon.jpg` | maroon | |
| `017 - brown_white.jpg` | brown | black |
| `019 - maroon_gold.jpg` | maroon, gold | red |
| `029 -maroon_white.jpg` | maroon | red |
| `034 - light blue.jpg` | light blue | blue |
| `036 - light blue_white.jpg` | light blue | blue |
| `039 - gray_white.jpg` | gray | |
| `046 - green.jpg` | green | black |
| `053 - black_white.jpg` | black | |
| `057 - white_gold or yellow.jpg` | gold\|yellow | |
| `063 - dark brown.jpg` | dark brown | black |
| `077 - teal_white.jpg` | teal | green |
| `083 - dark brown_white.jpg` | dark brown | black |
| `132 - brown_white.jpg` | brown | orange |
| `134 - teal_white.jpg` | teal | blue |
| `138 - maroon.jpg` | maroon | red |
| `141 - light blue_white.jpg` | light blue | blue |
| `145 - green_white.jpg` | green | |
| `150 - green_gray.jpg` | green, gray | black |
| `160 - blue_white.jpg` | blue | |
| `161 - light blue_white.jpg` | light blue | blue |
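To compare the two prompts, the headline rates can be recomputed from the counts reported in the four summaries. The sketch below is only arithmetic on those numbers: recall uses the 202 non-white ground-truth colors as denominator, precision uses each run's VLM unique-color count, and changes are in percentage points (pp). `RUNS` and `prompt_delta` are illustrative names.

```python
# (total found, total correct, VLM unique colors) per run, copied from the summaries.
RUNS = {
    ("Gemini 3 Flash", "jersey_prompt.txt"): (164, 163, 174),
    ("Gemini 3 Flash", "jersey_prompt_capstone.txt"): (158, 157, 174),
    ("Qwen3-VL-8B", "jersey_prompt.txt"): (156, 156, 184),
    ("Qwen3-VL-8B", "jersey_prompt_capstone.txt"): (157, 157, 180),
}
GROUND_TRUTH = 202  # non-white ground-truth colors across the 161 images


def rates(found: int, correct: int, unique: int) -> tuple:
    """Recall and precision in percent for one run."""
    return 100 * found / GROUND_TRUTH, 100 * correct / unique


def prompt_delta(model: str) -> tuple:
    """Percentage-point change from jersey_prompt.txt to jersey_prompt_capstone.txt."""
    r0, p0 = rates(*RUNS[(model, "jersey_prompt.txt")])
    r1, p1 = rates(*RUNS[(model, "jersey_prompt_capstone.txt")])
    return round(r1 - r0, 1), round(p1 - p0, 1)


for model in ("Gemini 3 Flash", "Qwen3-VL-8B"):
    d_recall, d_precision = prompt_delta(model)
    print(f"{model}: recall {d_recall:+.1f} pp, precision {d_precision:+.1f} pp")
```

Computing from the raw counts rather than subtracting the rounded percentages avoids compounding rounding error in the deltas.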