========================================
Qwen3-VL-8B + jersey_prompt.txt
Started: Tue Mar 3 04:40:45 PM MST 2026
========================================
Images to process: 161
Server: http://agx:8080
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt.txt (1504 chars)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg GT: [brown, dark brown] VLM: [black] (3 jersey(s), 11.1s) FAIL MISS:brown,dark brown, extra:black
[2/161] 002 - yellow.jpg GT: [yellow] VLM: [yellow] (2 jersey(s), 7.9s) PASS exact:1
[3/161] 003 - dark blue.jpg GT: [dark blue] VLM: [blue] (3 jersey(s), 10.8s) PASS similar:1
[4/161] 004 - purple_light blue.jpg GT: [purple, light blue] VLM: [light blue, purple] (3 jersey(s), 11.9s) PASS exact:2
[5/161] 005 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (1 jersey(s), 5.0s) PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg GT: [navy blue] VLM: [blue] (1 jersey(s), 4.3s) PASS similar:1
[7/161] 007 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 7.9s) PASS exact:1
[8/161] 008 -red or orange.jpg GT: [red|orange] VLM: [red] (1 jersey(s), 4.3s) PASS exact:1
[9/161] 009 - white_red.jpg GT: [red] VLM: [gold, red] (3 jersey(s), 10.8s) PARTIAL exact:1, extra:gold
[10/161] 010 - white_black.jpg GT: [black] VLM: [black] (3 jersey(s), 10.9s) PASS exact:1
[11/161] 011 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (4 jersey(s), 13.8s) PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.3s) PASS exact:1
[13/161] 013 - light blue.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 7.5s) FAIL MISS:light blue, extra:blue
[14/161] 014 - orange_dark blue or purple.jpg GT: [orange, dark blue|purple] VLM: [orange, purple] (3 jersey(s), 10.9s) PASS exact:2
[15/161] 015 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 7.3s) PASS exact:1
[16/161] 016 - maroon.jpg GT: [maroon] VLM: [(none)] (0 jersey(s), 1.5s) FAIL MISS:maroon
[17/161] 017 - brown_white.jpg GT: [brown] VLM: [black] (2 jersey(s), 8.8s) FAIL MISS:brown, extra:black
[18/161] 018 - gray_red.jpg GT: [gray, red] VLM: [gray, red] (2 jersey(s), 7.4s) PASS exact:2
[19/161] 019 - maroon_gold.jpg GT: [maroon, gold] VLM: [red, yellow] (2 jersey(s), 7.7s) PARTIAL similar:1, MISS:maroon, extra:red
[20/161] 020 - white_brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 8.1s) PASS exact:1
[21/161] 021 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.9s) PASS exact:1
[22/161] 022 - black_light blue.jpg GT: [black, light blue] VLM: [light blue] (1 jersey(s), 4.9s) PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.8s) PASS exact:1
[24/161] 024 - white_pink.jpg GT: [pink] VLM: [pink] (2 jersey(s), 7.8s) PASS exact:1
[25/161] 025 - blue_green.jpg GT: [blue, green] VLM: [green] (1 jersey(s), 4.3s) PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1
[27/161] 027 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 16.3s) PASS exact:1
[28/161] 028 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.8s) PASS exact:1
[29/161] 029 -maroon_white.jpg GT: [maroon] VLM: [red] (2 jersey(s), 7.8s) FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 7.8s) PASS similar:1
[31/161] 031 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 7.8s) PASS exact:1
[32/161] 032 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 8.0s) PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg GT: [navy blue, gray] VLM: [blue] (3 jersey(s), 10.9s) PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg GT: [light blue] VLM: [blue] (1 jersey(s), 4.7s) FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg GT: [green, gold|yellow] VLM: [green, yellow] (2 jersey(s), 8.1s) PASS exact:2
[36/161] 036 - light blue_white.jpg GT: [light blue] VLM: [blue] (4 jersey(s), 13.7s) FAIL MISS:light blue, extra:blue
[37/161] 037 -navy_white.jpg GT: [navy] VLM: [blue] (3 jersey(s), 10.1s) PASS similar:1
[38/161] 038 - red_white.jpg GT: [red] VLM: [red] (3 jersey(s), 10.9s) PASS exact:1
[39/161] 039 - gray_white.jpg GT: [gray] VLM: [gray] (2 jersey(s), 7.9s) PASS exact:1
[40/161] 040 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (1 jersey(s), 5.1s) PARTIAL exact:1, MISS:gray
[41/161] 041 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (8 jersey(s), 25.7s) PASS similar:1
[42/161] 042 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.8s) PASS exact:1
[43/161] 043 - gray_black.jpg GT: [gray, black] VLM: [black, gray] (2 jersey(s), 7.9s) PASS exact:2
[44/161] 044 - purple_black.jpg GT: [purple, black] VLM: [purple] (5 jersey(s), 16.6s) PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.8s) PASS exact:1
[46/161] 046 - green.jpg GT: [green] VLM: [black] (15 jersey(s), 46.4s) FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg GT: [purple] VLM: [purple] (3 jersey(s), 10.7s) PASS exact:1
[48/161] 048 - red.jpg GT: [red] VLM: [red] (1 jersey(s), 4.9s) PASS exact:1
[49/161] 049 - white_gold.jpg GT: [gold] VLM: [yellow] (2 jersey(s), 7.9s) PASS similar:1
[50/161] 050 - white_orange.jpg GT: [orange] VLM: [orange] (4 jersey(s), 13.8s) PASS exact:1
[51/161] 051 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.9s) PASS exact:1
[52/161] 052 - black_gold.jpg GT: [black, gold] VLM: [black, yellow] (2 jersey(s), 7.8s) PASS exact:1, similar:1
[53/161] 053 - black_white.jpg GT: [black] VLM: [(none)] (1 jersey(s), 4.9s) FAIL MISS:black
[54/161] 054 - white_blue.jpg GT: [blue] VLM: [blue] (2 jersey(s), 7.7s) PASS exact:1
[55/161] 055 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (2 jersey(s), 7.8s) PASS exact:1, similar:1
[56/161] 056 - white_red.jpg GT: [red] VLM: [red] (2 jersey(s), 7.9s) PASS exact:1
[57/161] 057 - white_gold or yellow.jpg GT: [gold|yellow] VLM: [yellow] (2 jersey(s), 7.9s) PASS exact:1
[58/161] 058 - purple.jpg GT: [purple] VLM: [purple] (4 jersey(s), 14.0s) PASS exact:1
[59/161] 059 - black_gold.jpg GT: [black, gold] VLM: [gold] (1 jersey(s), 4.9s) PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg GT: [gray, navy blue] VLM: [blue] (2 jersey(s), 7.9s) PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 7.9s) PASS exact:1
[62/161] 062 - orange_blue.jpg GT: [orange, blue] VLM: [blue, orange] (2 jersey(s), 7.5s) PASS exact:2
[63/161] 063 - dark brown.jpg GT: [dark brown] VLM: [black] (1 jersey(s), 4.9s) FAIL MISS:dark brown, extra:black
[64/161] 064 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.7s) PASS exact:1
[65/161] 065 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (3 jersey(s), 10.4s) PASS exact:1, similar:1
[66/161] 066 - yellow.jpg GT: [yellow] VLM: [yellow] (1 jersey(s), 4.7s) PASS exact:1
[67/161] 067 - red_white.jpg GT: [red] VLM: [red] (4 jersey(s), 13.8s) PASS exact:1
[68/161] 068 - gold.jpg GT: [gold] VLM: [gold] (1 jersey(s), 4.8s) PASS exact:1
[69/161] 069 - red_white.jpg GT: [red] VLM: [(none)] (4 jersey(s), 13.7s) FAIL MISS:red
[70/161] 070 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 10.8s) PASS exact:1
[71/161] 071 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.9s) PASS exact:1
[72/161] 072 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.5s) PASS exact:1
[73/161] 073 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.4s) PASS exact:1
[74/161] 074 - white_orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.5s) PASS exact:1
[75/161] 075 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.8s) PASS exact:1
[76/161] 076 - light blue_white.jpg GT: [light blue] VLM: [light blue] (3 jersey(s), 11.4s) PASS exact:1
[77/161] 077 - teal_white.jpg GT: [teal] VLM: [green] (4 jersey(s), 13.4s) FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 7.6s) FAIL MISS:light blue, extra:blue
[79/161] 079 - blue_maroon.jpg GT: [blue, maroon] VLM: [blue, red] (4 jersey(s), 13.8s) PARTIAL exact:1, MISS:maroon, extra:red
[80/161] 080 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 7.8s) PASS similar:1
[81/161] 081 - navy blue.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 7.8s) PASS similar:1
[82/161] 082 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (3 jersey(s), 10.6s) PASS similar:1
[83/161] 083 - dark brown_white.jpg GT: [dark brown] VLM: [black] (2 jersey(s), 7.8s) FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg GT: [dark brown, yellow] VLM: [black, yellow] (2 jersey(s), 7.9s) PARTIAL exact:1, MISS:dark brown, extra:black
[85/161] 085 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 4.8s) PASS exact:1
[86/161] 086 - dark brown_white.jpg GT: [dark brown] VLM: [brown] (2 jersey(s), 8.0s) PASS similar:1
[87/161] 087 - white_light blue.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 7.8s) FAIL MISS:light blue, extra:blue
[88/161] 088 - white_maroon.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.9s) PASS exact:1
[89/161] 089 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 10.8s) PASS exact:1
[90/161] 090 - maroon_white.jpg GT: [maroon] VLM: [maroon] (4 jersey(s), 14.2s) PASS exact:1
[91/161] 091 - teal.jpg GT: [teal] VLM: [teal] (2 jersey(s), 8.0s) PASS exact:1
[92/161] 092 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 13.7s) PASS exact:1
[93/161] 093 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 7.9s) PASS similar:1
[94/161] 094 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 12.5s) PASS exact:1
[95/161] 095 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1
[96/161] 096 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 8.6s) PASS exact:1
[97/161] 097 - gray_black.jpg GT: [gray, black] VLM: [gray] (2 jersey(s), 8.0s) PARTIAL exact:1, MISS:black
[98/161] 098 - teal_white.jpg GT: [teal] VLM: [teal] (2 jersey(s), 8.7s) PASS exact:1
[99/161] 099 - maroon_white.jpg GT: [maroon] VLM: [red] (3 jersey(s), 12.0s) FAIL MISS:maroon, extra:red
[100/161] 100 - orange_white.jpg GT: [orange] VLM: [orange] (4 jersey(s), 13.9s) PASS exact:1
[101/161] 101 - green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 17.0s) PASS exact:1
[102/161] 102 - yellow-black.jpg GT: [yellow, black] VLM: [black, yellow] (3 jersey(s), 10.9s) PASS exact:2
[103/161] 103 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 11.1s) PASS exact:1
[104/161] 104 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.0s) PASS exact:1
[105/161] 105 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 9.1s) PASS exact:1
[106/161] 106 - black_gray.jpg GT: [black, gray] VLM: [black, gray] (2 jersey(s), 9.0s) PASS exact:2
[107/161] 107 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.7s) PASS exact:1
[108/161] 108 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.9s) PASS exact:1
[109/161] 109 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.8s) PASS exact:1
[110/161] 110 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 13.9s) PASS exact:1
[111/161] 111 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 8.0s) PASS exact:1
[112/161] 112 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.8s) PASS exact:1
[113/161] 113 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.9s) PASS exact:1
[114/161] 114 - black_white.jpg GT: [black] VLM: [black] (2 jersey(s), 8.2s) PASS exact:1
[115/161] 115 - navy blue_maroon.jpg GT: [navy blue, maroon] VLM: [blue, red] (4 jersey(s), 13.8s) PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg GT: [gray] VLM: [gray] (2 jersey(s), 7.9s) PASS exact:1
[117/161] 117 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.1s) PASS exact:1
[118/161] 118 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 7.4s) PASS similar:1
[119/161] 119 - black_yellow.jpg GT: [black, yellow] VLM: [black, yellow] (3 jersey(s), 10.9s) PASS exact:2
[120/161] 120 - red_dark blue.jpg GT: [red, dark blue] VLM: [blue, red] (3 jersey(s), 10.7s) PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 10.9s) PASS exact:1
[122/161] 122 - gray.jpg GT: [gray] VLM: [gray] (1 jersey(s), 6.2s) PASS exact:1
[123/161] 123 - teal_white.jpg GT: [teal] VLM: [teal] (3 jersey(s), 10.9s) PASS exact:1
[124/161] 124 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (4 jersey(s), 13.7s) PASS similar:1
[125/161] 125 - dark blue_maroon.jpg GT: [dark blue, maroon] VLM: [blue, red] (2 jersey(s), 8.2s) PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg GT: [blue] VLM: [blue] (3 jersey(s), 10.8s) PASS exact:1
[127/161] 127 - yellow.jpg GT: [yellow] VLM: [yellow] (4 jersey(s), 14.0s) PASS exact:1
[128/161] 128 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1
[129/161] 129 - blue_white.jpg GT: [blue] VLM: [(none)] (3 jersey(s), 10.9s) FAIL MISS:blue
[130/161] 130 - yellow_black.jpg GT: [yellow, black] VLM: [black, yellow] (2 jersey(s), 8.4s) PASS exact:2
[131/161] 131 - purple_orange.jpg GT: [purple, orange] VLM: [orange, purple] (3 jersey(s), 10.8s) PASS exact:2
[132/161] 132 - brown_white.jpg GT: [brown] VLM: [orange] (3 jersey(s), 10.9s) FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png GT: [light blue] VLM: [light blue] (6 jersey(s), 21.1s) PASS exact:1
[134/161] 134 - teal_white.jpg GT: [teal] VLM: [blue] (1 jersey(s), 4.9s) FAIL MISS:teal, extra:blue
[135/161] 135 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 8.0s) PASS exact:1
[136/161] 136 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.1s) PASS exact:1
[137/161] 137 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 10.9s) PASS exact:1
[138/161] 138 - maroon.jpg GT: [maroon] VLM: [red] (1 jersey(s), 4.9s) FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 8.0s) PASS similar:1
[140/161] 140 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.6s) PASS exact:1
[141/161] 141 - light blue_white.jpg GT: [light blue] VLM: [blue] (3 jersey(s), 11.1s) FAIL MISS:light blue, extra:blue
[142/161] 142 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 8.1s) PASS exact:1
[143/161] 143 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 11.0s) PASS exact:1
[144/161] 144 - green.jpg GT: [green] VLM: [green] (10 jersey(s), 31.9s) PASS exact:1
[145/161] 145 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1
[146/161] 146 - red_gray.jpg GT: [red, gray] VLM: [gray, red] (2 jersey(s), 8.0s) PASS exact:2
[147/161] 147 - green.jpg GT: [green] VLM: [green] (3 jersey(s), 10.8s) PASS exact:1
[148/161] 148 - yellow_purple.jpg GT: [yellow, purple] VLM: [purple, yellow] (2 jersey(s), 7.9s) PASS exact:2
[149/161] 149 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 16.7s) PASS exact:1
[150/161] 150 - green_gray.jpg GT: [green, gray] VLM: [black] (2 jersey(s), 7.8s) FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg GT: [yellow, black] VLM: [blue, yellow] (5 jersey(s), 16.7s) PARTIAL exact:1, MISS:black, extra:blue
[152/161] 152 - pink_dark blue.jpg GT: [pink, dark blue] VLM: [blue, pink] (2 jersey(s), 7.8s) PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.0s) PASS exact:1
[154/161] 154 - dark brown.jpeg GT: [dark brown] VLM: [brown] (5 jersey(s), 16.8s) PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg GT: [green, gray, purple, yellow] VLM: [gray, purple, yellow] (5 jersey(s), 17.3s) PARTIAL exact:3, MISS:green
[156/161] 156 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (2 jersey(s), 7.7s) PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 10.7s) PASS exact:1
[158/161] 158 - dark blue_yellow.jpg GT: [dark blue, yellow] VLM: [blue, yellow] (4 jersey(s), 14.0s) PASS exact:1, similar:1
[159/161] 159 - blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 13.9s) PASS exact:1
[160/161] 160 - blue_white.jpg GT: [blue] VLM: [(none)] (1 jersey(s), 4.9s) FAIL MISS:blue
[161/161] 161 - light blue_white.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 7.7s) FAIL MISS:light blue, extra:blue

================================================================================
ACCURACY SUMMARY
================================================================================
Images processed: 161
Errors: 0
Total time: 1557.4s (9.7s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 184 (excluding white)

--- Recall (did VLM find each ground truth color?) ---
Exact match: 132 / 202 (65.3%)
Similar match: 26 / 202 (12.9%)
Total found: 158 / 202 (78.2%)
Missed: 44 / 202 (21.8%)

--- Precision (are VLM colors correct?) ---
Exact match: 132 / 184 (71.7%)
Similar match: 26 / 184 (14.1%)
Total correct: 158 / 184 (85.9%)
Extra/wrong: 26 / 184 (14.1%)

--- Similar-Match Confusions (expected -> got) ---
dark blue -> blue x10
navy blue -> blue x8
gold -> yellow x5
dark brown -> brown x2
navy -> blue x1

--- Most Missed Ground Truth Colors ---
maroon 8 ########
gray 7 #######
light blue 7 #######
black 6 ######
dark brown 4 ####
brown 3 ###
blue 3 ###
green 3 ###
teal 2 ##
red 1 #

--- Most Common Extra/Wrong VLM Colors ---
blue 9 #########
black 7 #######
red 7 #######
gold 1 #
green 1 #
orange 1 #

--- Per-Image Verdict ---
PASS 118
PARTIAL 19
FAIL 24

--- Failed Images (24) ---
001 -brown_white or dark brown.jpg missed: brown, dark brown extra: black
013 - light blue.jpg missed: light blue extra: blue
016 - maroon.jpg missed: maroon
017 - brown_white.jpg missed: brown extra: black
029 -maroon_white.jpg missed: maroon extra: red
034 - light blue.jpg missed: light blue extra: blue
036 - light blue_white.jpg missed: light blue extra: blue
046 - green.jpg missed: green extra: black
053 - black_white.jpg missed: black
063 - dark brown.jpg missed: dark brown extra: black
069 - red_white.jpg missed: red
077 - teal_white.jpg missed: teal extra: green
078 - light blue_white.jpg missed: light blue extra: blue
083 - dark brown_white.jpg missed: dark brown extra: black
087 - white_light blue.jpg missed: light blue extra: blue
099 - maroon_white.jpg missed: maroon extra: red
129 - blue_white.jpg missed: blue
132 - brown_white.jpg missed: brown extra: orange
134 - teal_white.jpg missed: teal extra: blue
138 - maroon.jpg missed: maroon extra: red
141 - light blue_white.jpg missed: light blue extra: blue
150 - green_gray.jpg missed: green, gray extra: black
160 - blue_white.jpg missed: blue
161 - light blue_white.jpg missed: light blue extra: blue

========================================
Gemini 3 Flash + jersey_prompt.txt
Started: Tue Mar 3 05:06:43 PM MST 2026
========================================
Model: gemini-3-flash-preview
Images to process: 161
Concurrency: 8 workers
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt.txt (1504 chars)
================================================================================
Pre-encoding images ... 161 images in 1.7s
Sending API requests ...
161/161 API calls completed (253.2s total)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg GT: [brown, dark brown] VLM: [brown] (1 jersey(s), 9.0s) PASS exact:1, similar:1
[2/161] 002 - yellow.jpg GT: [yellow] VLM: [yellow] (2 jersey(s), 6.6s) PASS exact:1
[3/161] 003 - dark blue.jpg GT: [dark blue] VLM: [navy blue] (3 jersey(s), 9.4s) PASS similar:1
[4/161] 004 - purple_light blue.jpg GT: [purple, light blue] VLM: [light blue, purple] (2 jersey(s), 10.5s) PASS exact:2
[5/161] 005 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (1 jersey(s), 3.0s) PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg GT: [navy blue] VLM: [dark blue] (1 jersey(s), 3.1s) PASS similar:1
[7/161] 007 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 6.0s) PASS exact:1
[8/161] 008 -red or orange.jpg GT: [red|orange] VLM: [red] (1 jersey(s), 5.1s) PASS exact:1
[9/161] 009 - white_red.jpg GT: [red] VLM: [red] (4 jersey(s), 17.9s) PASS exact:1
[10/161] 010 - white_black.jpg GT: [black] VLM: [black] (3 jersey(s), 11.3s) PASS exact:1
[11/161] 011 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (4 jersey(s), 8.5s) PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 3.8s) PASS exact:1
[13/161] 013 - light blue.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 10.2s) FAIL MISS:light blue, extra:blue
[14/161] 014 - orange_dark blue or purple.jpg GT: [orange, dark blue|purple] VLM: [orange, purple] (3 jersey(s), 6.1s) PASS exact:2
[15/161] 015 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 3.4s) PASS exact:1
[16/161] 016 - maroon.jpg GT: [maroon] VLM: [(none)] (0 jersey(s), 3.2s) FAIL MISS:maroon
[17/161] 017 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 4.8s) PASS exact:1
[18/161] 018 - gray_red.jpg GT: [gray, red] VLM: [grey] (1 jersey(s), 6.5s) PARTIAL similar:1, MISS:red
[19/161] 019 - maroon_gold.jpg GT: [maroon, gold] VLM: [maroon] (1 jersey(s), 4.4s) PARTIAL exact:1, MISS:gold
[20/161] 020 - white_brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 5.6s) PASS exact:1
[21/161] 021 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.8s) PASS exact:1
[22/161] 022 - black_light blue.jpg GT: [black, light blue] VLM: [light blue] (1 jersey(s), 3.3s) PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 5.7s) PASS exact:1
[24/161] 024 - white_pink.jpg GT: [pink] VLM: [pink] (2 jersey(s), 5.1s) PASS exact:1
[25/161] 025 - blue_green.jpg GT: [blue, green] VLM: [green] (1 jersey(s), 3.7s) PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 6.8s) PASS exact:1
[27/161] 027 - red_white.jpg GT: [red] VLM: [red] (4 jersey(s), 35.2s) PASS exact:1
[28/161] 028 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 37.9s) PASS exact:1
[29/161] 029 -maroon_white.jpg GT: [maroon] VLM: [red] (2 jersey(s), 4.8s) FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 38.6s) PASS similar:1
[31/161] 031 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 4.9s) PASS exact:1
[32/161] 032 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 5.3s) PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg GT: [navy blue, gray] VLM: [blue] (5 jersey(s), 37.0s) PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg GT: [light blue] VLM: [blue] (6 jersey(s), 15.9s) FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg GT: [green, gold|yellow] VLM: [green, yellow] (2 jersey(s), 14.4s) PASS exact:2
[36/161] 036 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 6.4s) PASS exact:1
[37/161] 037 -navy_white.jpg GT: [navy] VLM: [navy blue] (4 jersey(s), 8.3s) PASS similar:1
[38/161] 038 - red_white.jpg GT: [red] VLM: [red] (3 jersey(s), 8.2s) PASS exact:1
[39/161] 039 - gray_white.jpg GT: [gray] VLM: [grey] (2 jersey(s), 4.6s) PASS similar:1
[40/161] 040 - maroon_gray.jpg GT: [maroon, gray] VLM: [grey, maroon] (2 jersey(s), 7.3s) PASS exact:1, similar:1
[41/161] 041 - navy blue_white.jpg GT: [navy blue] VLM: [navy blue] (8 jersey(s), 42.8s) PASS exact:1
[42/161] 042 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.0s) PASS exact:1
[43/161] 043 - gray_black.jpg GT: [gray, black] VLM: [black, grey] (5 jersey(s), 39.4s) PASS exact:1, similar:1
[44/161] 044 - purple_black.jpg GT: [purple, black] VLM: [purple] (8 jersey(s), 35.7s) PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg GT: [purple] VLM: [purple] (3 jersey(s), 34.7s) PASS exact:1
[46/161] 046 - green.jpg GT: [green] VLM: [black] (8 jersey(s), 39.6s) FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg GT: [purple] VLM: [purple] (3 jersey(s), 6.5s) PASS exact:1
[48/161] 048 - red.jpg GT: [red] VLM: [(none)] (0 jersey(s), 7.4s) FAIL MISS:red
[49/161] 049 - white_gold.jpg GT: [gold] VLM: [yellow] (2 jersey(s), 3.3s) PASS similar:1
[50/161] 050 - white_orange.jpg GT: [orange] VLM: [orange] (6 jersey(s), 39.2s) PASS exact:1
[51/161] 051 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.1s) PASS exact:1
[52/161] 052 - black_gold.jpg GT: [black, gold] VLM: [black] (1 jersey(s), 3.2s) PARTIAL exact:1, MISS:gold
[53/161] 053 - black_white.jpg GT: [black] VLM: [(none)] (1 jersey(s), 3.2s) FAIL MISS:black
[54/161] 054 - white_blue.jpg GT: [blue] VLM: [blue] (2 jersey(s), 3.5s) PASS exact:1
[55/161] 055 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (2 jersey(s), 5.8s) PASS exact:1, similar:1
[56/161] 056 - white_red.jpg GT: [red] VLM: [red] (4 jersey(s), 12.5s) PASS exact:1
[57/161] 057 - white_gold or yellow.jpg GT: [gold|yellow] VLM: [(none)] (1 jersey(s), 4.1s) FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg GT: [purple] VLM: [purple] (4 jersey(s), 5.3s) PASS exact:1
[59/161] 059 - black_gold.jpg GT: [black, gold] VLM: [gold] (1 jersey(s), 3.4s) PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg GT: [gray, navy blue] VLM: [blue] (2 jersey(s), 5.7s) PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg GT: [brown|orange] VLM: [orange] (1 jersey(s), 3.0s) PASS exact:1
[62/161] 062 - orange_blue.jpg GT: [orange, blue] VLM: [blue, orange] (2 jersey(s), 5.7s) PASS exact:2
[63/161] 063 - dark brown.jpg GT: [dark brown] VLM: [brown] (1 jersey(s), 5.1s) PASS similar:1
[64/161] 064 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 4.0s) PASS exact:1
[65/161] 065 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (4 jersey(s), 38.3s) PASS exact:1, similar:1
[66/161] 066 - yellow.jpg GT: [yellow] VLM: [yellow] (1 jersey(s), 3.3s) PASS exact:1
[67/161] 067 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 10.7s) PASS exact:1
[68/161] 068 - gold.jpg GT: [gold] VLM: [gold] (1 jersey(s), 6.1s) PASS exact:1
[69/161] 069 - red_white.jpg GT: [red] VLM: [(none)] (5 jersey(s), 39.4s) FAIL MISS:red
[70/161] 070 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 6.2s) PASS exact:1
[71/161] 071 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 11.6s) PASS exact:1
[72/161] 072 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 4.7s) PASS exact:1
[73/161] 073 - maroon_white.jpg GT: [maroon] VLM: [maroon] (1 jersey(s), 4.8s) PASS exact:1
[74/161] 074 - white_orange.jpg GT: [orange] VLM: [(none)] (1 jersey(s), 7.0s) FAIL MISS:orange
[75/161] 075 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 3.4s) PASS exact:1
[76/161] 076 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 8.5s) PASS exact:1
[77/161] 077 - teal_white.jpg GT: [teal] VLM: [green] (5 jersey(s), 37.9s) FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 6.8s) PASS exact:1
[79/161] 079 - blue_maroon.jpg GT: [blue, maroon] VLM: [blue, maroon] (6 jersey(s), 7.7s) PASS exact:2
[80/161] 080 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (1 jersey(s), 4.4s) PASS similar:1
[81/161] 081 - navy blue.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 4.4s) PASS similar:1
[82/161] 082 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (3 jersey(s), 6.8s) PASS similar:1
[83/161] 083 - dark brown_white.jpg GT: [dark brown] VLM: [black] (2 jersey(s), 10.1s) FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg GT: [dark brown, yellow] VLM: [brown, yellow] (2 jersey(s), 3.4s) PASS exact:1, similar:1
[85/161] 085 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 9.2s) PASS exact:1
[86/161] 086 - dark brown_white.jpg GT: [dark brown] VLM: [brown] (1 jersey(s), 5.7s) PASS similar:1
[87/161] 087 - white_light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 8.4s) PASS exact:1
[88/161] 088 - white_maroon.jpg GT: [maroon] VLM: [(none)] (2 jersey(s), 5.3s) FAIL MISS:maroon
[89/161] 089 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 7.4s) PASS exact:1
[90/161] 090 - maroon_white.jpg GT: [maroon] VLM: [maroon] (5 jersey(s), 36.7s) PASS exact:1
[91/161] 091 - teal.jpg GT: [teal] VLM: [teal] (3 jersey(s), 6.0s) PASS exact:1
[92/161] 092 - green_white.jpg GT: [green] VLM: [green] (6 jersey(s), 10.9s) PASS exact:1
[93/161] 093 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 4.5s) PASS similar:1
[94/161] 094 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 6.6s) PASS exact:1
[95/161] 095 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 36.8s) PASS exact:1
[96/161] 096 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 2.7s) PASS exact:1
[97/161] 097 - gray_black.jpg GT: [gray, black] VLM: [grey] (3 jersey(s), 36.8s) PARTIAL similar:1, MISS:black
[98/161] 098 - teal_white.jpg GT: [teal] VLM: [teal] (2 jersey(s), 6.7s) PASS exact:1
[99/161] 099 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 8.1s) PASS exact:1
[100/161] 100 - orange_white.jpg GT: [orange] VLM: [orange] (4 jersey(s), 5.7s) PASS exact:1
[101/161] 101 - green_white.jpg GT: [green] VLM: [green] (7 jersey(s), 12.1s) PASS exact:1
[102/161] 102 - yellow-black.jpg GT: [yellow, black] VLM: [black] (1 jersey(s), 3.4s) PARTIAL exact:1, MISS:yellow
[103/161] 103 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 18.0s) PASS exact:1
[104/161] 104 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 35.2s) PASS exact:1
[105/161] 105 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 5.3s) PASS exact:1
[106/161] 106 - black_gray.jpg GT: [black, gray] VLM: [black, grey] (2 jersey(s), 34.5s) PASS exact:1, similar:1
[107/161] 107 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 4.7s) PASS exact:1
[108/161] 108 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 4.5s) PASS exact:1
[109/161] 109 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 4.7s) PASS exact:1
[110/161] 110 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 9.0s) PASS exact:1
[111/161] 111 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 37.6s) PASS exact:1
[112/161] 112 - orange_white.jpg GT: [orange] VLM: [(none)] (0 jersey(s), 6.8s) FAIL MISS:orange
[113/161] 113 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.4s) PASS exact:1
[114/161] 114 - black_white.jpg GT: [black] VLM: [black] (2 jersey(s), 5.1s) PASS exact:1
[115/161] 115 - navy blue_maroon.jpg GT: [navy blue, maroon] VLM: [blue, red] (4 jersey(s), 7.7s) PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg GT: [gray] VLM: [grey] (2 jersey(s), 7.1s) PASS similar:1
[117/161] 117 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 3.8s) PASS exact:1
[118/161] 118 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 7.9s) PASS similar:1
[119/161] 119 - black_yellow.jpg GT: [black, yellow] VLM: [black, yellow] (4 jersey(s), 8.5s) PASS exact:2
[120/161] 120 - red_dark blue.jpg GT: [red, dark blue] VLM: [blue, red] (3 jersey(s), 20.5s) PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 6.5s) PASS exact:1
[122/161] 122 - gray.jpg GT: [gray] VLM: [grey] (1 jersey(s), 3.4s) PASS similar:1
[123/161] 123 - teal_white.jpg GT: [teal] VLM: [teal] (4 jersey(s), 20.7s) PASS exact:1
[124/161] 124 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (4 jersey(s), 7.8s) PASS similar:1
[125/161] 125 - dark blue_maroon.jpg GT: [dark blue, maroon] VLM: [navy, red] (3 jersey(s), 7.7s) PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg GT: [blue] VLM: [blue] (3 jersey(s), 7.5s) PASS exact:1
[127/161] 127 - yellow.jpg GT: [yellow] VLM: [black, yellow] (5 jersey(s), 22.9s) PARTIAL exact:1, extra:black
[128/161] 128 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 36.1s) PASS exact:1
[129/161] 129 - blue_white.jpg GT: [blue] VLM: [(none)] (3 jersey(s), 6.0s) FAIL MISS:blue
[130/161] 130 - yellow_black.jpg GT: [yellow, black] VLM: [yellow] (1 jersey(s), 3.3s) PARTIAL exact:1, MISS:black
[131/161] 131 - purple_orange.jpg GT: [purple, orange] VLM: [orange, purple] (3 jersey(s), 5.4s) PASS exact:2
[132/161] 132 - brown_white.jpg GT: [brown] VLM: [orange] (3 jersey(s), 30.8s) FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png GT: [light blue] VLM: [light blue] (7 jersey(s), 42.4s) PASS exact:1
[134/161] 134 - teal_white.jpg GT: [teal] VLM: [blue] (1 jersey(s), 7.1s) FAIL MISS:teal, extra:blue
[135/161] 135 - green.jpg GT: [green] VLM: [green] (1 jersey(s), 3.6s) PASS exact:1
[136/161] 136 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 3.5s) PASS exact:1
[137/161] 137 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 7.3s) PASS exact:1
[138/161] 138 - maroon.jpg GT: [maroon] VLM: [red] (1 jersey(s), 3.5s) FAIL MISS:maroon, extra:red [139/161] 139 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (1 jersey(s), 12.2s) PASS similar:1 [140/161] 140 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 4.0s) PASS exact:1 [141/161] 141 - light blue_white.jpg GT: [light blue] VLM: [light blue] (3 jersey(s), 4.7s) PASS exact:1 [142/161] 142 - orange_white.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.0s) PASS exact:1 [143/161] 143 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 5.9s) PASS exact:1 [144/161] 144 - green.jpg GT: [green] VLM: [green] (13 jersey(s), 8.2s) PASS exact:1 [145/161] 145 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.0s) PASS exact:1 [146/161] 146 - red_gray.jpg GT: [red, gray] VLM: [grey, red] (2 jersey(s), 4.2s) PASS exact:1, similar:1 [147/161] 147 - green.jpg GT: [green] VLM: [green] (3 jersey(s), 4.8s) PASS exact:1 [148/161] 148 - yellow_purple.jpg GT: [yellow, purple] VLM: [purple, yellow] (2 jersey(s), 6.0s) PASS exact:2 [149/161] 149 - blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 37.0s) PASS exact:1 [150/161] 150 - green_gray.jpg GT: [green, gray] VLM: [black] (2 jersey(s), 12.3s) FAIL MISS:green,gray, extra:black [151/161] 151 - yellow_black.jpg GT: [yellow, black] VLM: [navy, yellow] (6 jersey(s), 39.2s) PARTIAL exact:1, MISS:black, extra:navy [152/161] 152 - pink_dark blue.jpg GT: [pink, dark blue] VLM: [navy blue, pink] (3 jersey(s), 5.9s) PASS exact:1, similar:1 [153/161] 153 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 5.2s) PASS exact:1 [154/161] 154 - dark brown.jpeg GT: [dark brown] VLM: [brown] (5 jersey(s), 7.0s) PASS similar:1 [155/161] 155 - white_green_gray_purple_yellow.jpg GT: [green, gray, purple, yellow] VLM: [grey, purple, yellow] (5 jersey(s), 7.7s) PARTIAL exact:2, similar:1, MISS:green [156/161] 156 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (3 jersey(s), 35.1s) PARTIAL exact:1, MISS:gray [157/161] 157 
- blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 40.2s) PASS exact:1
[158/161] 158 - dark blue_yellow.jpg GT: [dark blue, yellow] VLM: [dark blue, yellow] (6 jersey(s), 33.8s) PASS exact:2
[159/161] 159 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 36.7s) PASS exact:1
[160/161] 160 - blue_white.jpg GT: [blue] VLM: [(none)] (1 jersey(s), 4.1s) FAIL MISS:blue
[161/161] 161 - light blue_white.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 4.8s) FAIL MISS:light blue, extra:blue

================================================================================
ACCURACY SUMMARY (gemini-3-flash-preview)
================================================================================
Images processed: 161
Errors: 0
Total time: 253.2s (1.6s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 175 (excluding white)

--- Recall (did VLM find each ground truth color?) ---
  Exact match:   126 / 202 (62.4%)
  Similar match:  35 / 202 (17.3%)
  Total found:   161 / 202 (79.7%)
  Missed:         41 / 202 (20.3%)

--- Precision (are VLM colors correct?) ---
  Exact match:   126 / 175 (72.0%)
  Similar match:  34 / 175 (19.4%)
  Total correct: 160 / 175 (91.4%)
  Extra/wrong:    15 / 175 (8.6%)

--- Similar-Match Confusions (expected -> got) ---
  gray -> grey x10
  navy blue -> blue x6
  dark brown -> brown x5
  dark blue -> navy blue x5
  gold -> yellow x3
  dark blue -> blue x3
  navy blue -> dark blue x1
  navy -> navy blue x1
  dark blue -> navy x1

--- Most Missed Ground Truth Colors ---
  black        7 #######
  gray         6 ######
  maroon       6 ######
  light blue   3 ###
  red          3 ###
  blue         3 ###
  green        3 ###
  gold         2 ##
  orange       2 ##
  teal         2 ##
  gold|yellow  1 #
  dark brown   1 #
  yellow       1 #
  brown        1 #

--- Most Common Extra/Wrong VLM Colors ---
  blue    4 ####
  red     4 ####
  black   4 ####
  green   1 #
  orange  1 #
  navy    1 #

--- Per-Image Verdict ---
  PASS    120
  PARTIAL  20
  FAIL     21

--- Failed Images (21) ---
  013 - light blue.jpg missed: light blue extra: blue
  016 - maroon.jpg missed: maroon
  029 -maroon_white.jpg missed: maroon extra: red
  034 - light blue.jpg missed: light blue extra: blue
  046 - green.jpg missed: green extra: black
  048 - red.jpg missed: red
  053 - black_white.jpg missed: black
  057 - white_gold or yellow.jpg missed: gold|yellow
  069 - red_white.jpg missed: red
  074 - white_orange.jpg missed: orange
  077 - teal_white.jpg missed: teal extra: green
  083 - dark brown_white.jpg missed: dark brown extra: black
  088 - white_maroon.jpg missed: maroon
  112 - orange_white.jpg missed: orange
  129 - blue_white.jpg missed: blue
  132 - brown_white.jpg missed: brown extra: orange
  134 - teal_white.jpg missed: teal extra: blue
  138 - maroon.jpg missed: maroon extra: red
  150 - green_gray.jpg missed: green, gray extra: black
  160 - blue_white.jpg missed: blue
  161 - light blue_white.jpg missed: light blue extra: blue

========================================
Qwen3-VL-8B + jersey_prompt_capstone.txt
Started: Tue Mar 3 05:10:58 PM MST 2026
========================================
Images to process: 161
Server: http://agx:8080
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_capstone.txt (1511 chars)
================================================================================ [1/161] 001 -brown_white or dark brown.jpg GT: [brown, dark brown] VLM: [black] (2 jersey(s), 8.2s) FAIL MISS:brown,dark brown, extra:black [2/161] 002 - yellow.jpg GT: [yellow] VLM: [yellow] (2 jersey(s), 6.0s) PASS exact:1 [3/161] 003 - dark blue.jpg GT: [dark blue] VLM: [blue] (3 jersey(s), 8.3s) PASS similar:1 [4/161] 004 - purple_light blue.jpg GT: [purple, light blue] VLM: [light blue, purple] (3 jersey(s), 11.9s) PASS exact:2 [5/161] 005 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (1 jersey(s), 3.8s) PARTIAL exact:1, MISS:gray [6/161] 006 - navy blue.jpg GT: [navy blue] VLM: [blue] (1 jersey(s), 4.2s) PASS similar:1 [7/161] 007 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 6.0s) PASS exact:1 [8/161] 008 -red or orange.jpg GT: [red|orange] VLM: [red] (1 jersey(s), 3.2s) PASS exact:1 [9/161] 009 - white_red.jpg GT: [red] VLM: [gold, red] (3 jersey(s), 10.8s) PARTIAL exact:1, extra:gold [10/161] 010 - white_black.jpg GT: [black] VLM: [black] (3 jersey(s), 10.9s) PASS exact:1 [11/161] 011 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (4 jersey(s), 13.8s) PARTIAL exact:1, MISS:gray [12/161] 012 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.3s) PASS exact:1 [13/161] 013 - light blue.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 7.5s) FAIL MISS:light blue, extra:blue [14/161] 014 - orange_dark blue or purple.jpg GT: [orange, dark blue|purple] VLM: [orange, purple] (3 jersey(s), 11.0s) PASS exact:2 [15/161] 015 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 5.4s) PASS exact:1 [16/161] 016 - maroon.jpg GT: [maroon] VLM: [(none)] (0 jersey(s), 1.7s) FAIL MISS:maroon [17/161] 017 - brown_white.jpg GT: [brown] VLM: [black] (2 jersey(s), 6.9s) FAIL MISS:brown, extra:black [18/161] 018 - gray_red.jpg GT: [gray, red] VLM: [gray, red] (2 jersey(s), 7.3s) PASS exact:2 [19/161] 019 - maroon_gold.jpg GT: [maroon, gold] VLM: [red] (1 
jersey(s), 3.7s) FAIL MISS:maroon,gold, extra:red [20/161] 020 - white_brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 6.2s) PASS exact:1 [21/161] 021 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 6.1s) PASS exact:1 [22/161] 022 - black_light blue.jpg GT: [black, light blue] VLM: [light blue] (1 jersey(s), 3.8s) PARTIAL exact:1, MISS:black [23/161] 023 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 5.9s) PASS exact:1 [24/161] 024 - white_pink.jpg GT: [pink] VLM: [pink] (2 jersey(s), 7.7s) PASS exact:1 [25/161] 025 - blue_green.jpg GT: [blue, green] VLM: [green] (1 jersey(s), 3.2s) PARTIAL exact:1, MISS:blue [26/161] 026 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1 [27/161] 027 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 16.1s) PASS exact:1 [28/161] 028 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1 [29/161] 029 -maroon_white.jpg GT: [maroon] VLM: [red] (2 jersey(s), 7.8s) FAIL MISS:maroon, extra:red [30/161] 030 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 5.9s) PASS similar:1 [31/161] 031 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 6.0s) PASS exact:1 [32/161] 032 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 6.1s) PASS exact:1 [33/161] 033 - navy blue_white or gray.jpg GT: [navy blue, gray] VLM: [blue] (3 jersey(s), 10.9s) PARTIAL similar:1, MISS:gray [34/161] 034 - light blue.jpg GT: [light blue] VLM: [blue] (1 jersey(s), 3.7s) FAIL MISS:light blue, extra:blue [35/161] 035 -green_gold or yellow.jpg GT: [green, gold|yellow] VLM: [green, yellow] (2 jersey(s), 8.0s) PASS exact:2 [36/161] 036 - light blue_white.jpg GT: [light blue] VLM: [blue] (4 jersey(s), 13.8s) FAIL MISS:light blue, extra:blue [37/161] 037 -navy_white.jpg GT: [navy] VLM: [blue] (3 jersey(s), 10.1s) PASS similar:1 [38/161] 038 - red_white.jpg GT: [red] VLM: [red] (3 jersey(s), 11.0s) PASS exact:1 [39/161] 039 - gray_white.jpg GT: [gray] VLM: [gray] (2 
jersey(s), 7.9s) PASS exact:1 [40/161] 040 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (1 jersey(s), 5.1s) PARTIAL exact:1, MISS:gray [41/161] 041 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (9 jersey(s), 28.9s) PASS similar:1 [42/161] 042 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.8s) PASS exact:1 [43/161] 043 - gray_black.jpg GT: [gray, black] VLM: [black, gray] (2 jersey(s), 8.0s) PASS exact:2 [44/161] 044 - purple_black.jpg GT: [purple, black] VLM: [purple] (7 jersey(s), 22.5s) PARTIAL exact:1, MISS:black [45/161] 045 - purple.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.9s) PASS exact:1 [46/161] 046 - green.jpg GT: [green] VLM: [black] (15 jersey(s), 46.5s) FAIL MISS:green, extra:black [47/161] 047 - purple_white.jpg GT: [purple] VLM: [purple] (3 jersey(s), 10.7s) PASS exact:1 [48/161] 048 - red.jpg GT: [red] VLM: [red] (1 jersey(s), 4.9s) PASS exact:1 [49/161] 049 - white_gold.jpg GT: [gold] VLM: [yellow] (2 jersey(s), 6.1s) PASS similar:1 [50/161] 050 - white_orange.jpg GT: [orange] VLM: [orange] (4 jersey(s), 13.8s) PASS exact:1 [51/161] 051 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.8s) PASS exact:1 [52/161] 052 - black_gold.jpg GT: [black, gold] VLM: [black] (1 jersey(s), 4.8s) PARTIAL exact:1, MISS:gold [53/161] 053 - black_white.jpg GT: [black] VLM: [(none)] (1 jersey(s), 3.7s) FAIL MISS:black [54/161] 054 - white_blue.jpg GT: [blue] VLM: [blue] (2 jersey(s), 5.9s) PASS exact:1 [55/161] 055 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (2 jersey(s), 7.7s) PASS exact:1, similar:1 [56/161] 056 - white_red.jpg GT: [red] VLM: [red] (2 jersey(s), 5.9s) PASS exact:1 [57/161] 057 - white_gold or yellow.jpg GT: [gold|yellow] VLM: [(none)] (1 jersey(s), 3.7s) FAIL MISS:gold|yellow [58/161] 058 - purple.jpg GT: [purple] VLM: [purple] (4 jersey(s), 14.0s) PASS exact:1 [59/161] 059 - black_gold.jpg GT: [black, gold] VLM: [gold] (1 jersey(s), 3.8s) PARTIAL exact:1, MISS:black [60/161] 060 - gray_navy blue.jpg GT: 
[gray, navy blue] VLM: [blue] (2 jersey(s), 6.0s) PARTIAL similar:1, MISS:gray [61/161] 061 - brown or orange.jpg GT: [brown|orange] VLM: [orange] (1 jersey(s), 3.7s) PASS exact:1 [62/161] 062 - orange_blue.jpg GT: [orange, blue] VLM: [blue, orange] (2 jersey(s), 5.6s) PASS exact:2 [63/161] 063 - dark brown.jpg GT: [dark brown] VLM: [black] (1 jersey(s), 3.7s) FAIL MISS:dark brown, extra:black [64/161] 064 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 4.8s) PASS exact:1 [65/161] 065 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (3 jersey(s), 10.4s) PASS exact:1, similar:1 [66/161] 066 - yellow.jpg GT: [yellow] VLM: [yellow] (1 jersey(s), 3.5s) PASS exact:1 [67/161] 067 - red_white.jpg GT: [red] VLM: [red] (4 jersey(s), 13.8s) PASS exact:1 [68/161] 068 - gold.jpg GT: [gold] VLM: [gold] (1 jersey(s), 3.7s) PASS exact:1 [69/161] 069 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 16.6s) PASS exact:1 [70/161] 070 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 8.3s) PASS exact:1 [71/161] 071 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.9s) PASS exact:1 [72/161] 072 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.5s) PASS exact:1 [73/161] 073 - maroon_white.jpg GT: [maroon] VLM: [maroon] (1 jersey(s), 3.4s) PASS exact:1 [74/161] 074 - white_orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.4s) PASS exact:1 [75/161] 075 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 4.8s) PASS exact:1 [76/161] 076 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 14.2s) PASS exact:1 [77/161] 077 - teal_white.jpg GT: [teal] VLM: [green] (3 jersey(s), 10.4s) FAIL MISS:teal, extra:green [78/161] 078 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 5.9s) PASS exact:1 [79/161] 079 - blue_maroon.jpg GT: [blue, maroon] VLM: [blue, red] (4 jersey(s), 13.8s) PARTIAL exact:1, MISS:maroon, extra:red [80/161] 080 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (1 
jersey(s), 3.5s) PASS similar:1 [81/161] 081 - navy blue.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 5.8s) PASS similar:1 [82/161] 082 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (3 jersey(s), 10.6s) PASS similar:1 [83/161] 083 - dark brown_white.jpg GT: [dark brown] VLM: [black] (1 jersey(s), 3.7s) FAIL MISS:dark brown, extra:black [84/161] 084 - dark brown_yellow.jpg GT: [dark brown, yellow] VLM: [black, yellow] (2 jersey(s), 6.0s) PARTIAL exact:1, MISS:dark brown, extra:black [85/161] 085 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 3.6s) PASS exact:1 [86/161] 086 - dark brown_white.jpg GT: [dark brown] VLM: [brown] (1 jersey(s), 5.0s) PASS similar:1 [87/161] 087 - white_light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 6.0s) PASS exact:1 [88/161] 088 - white_maroon.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.9s) PASS exact:1 [89/161] 089 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 11.0s) PASS exact:1 [90/161] 090 - maroon_white.jpg GT: [maroon] VLM: [maroon] (4 jersey(s), 14.2s) PASS exact:1 [91/161] 091 - teal.jpg GT: [teal] VLM: [teal] (2 jersey(s), 8.1s) PASS exact:1 [92/161] 092 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 13.8s) PASS exact:1 [93/161] 093 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 5.9s) PASS similar:1 [94/161] 094 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 12.5s) PASS exact:1 [95/161] 095 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1 [96/161] 096 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 6.7s) PASS exact:1 [97/161] 097 - gray_black.jpg GT: [gray, black] VLM: [gray] (2 jersey(s), 8.0s) PARTIAL exact:1, MISS:black [98/161] 098 - teal_white.jpg GT: [teal] VLM: [teal] (2 jersey(s), 6.9s) PASS exact:1 [99/161] 099 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 12.2s) PASS exact:1 [100/161] 100 - orange_white.jpg GT: [orange] VLM: [orange] (4 jersey(s), 13.8s) PASS exact:1 [101/161] 101 - 
green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 17.0s) PASS exact:1 [102/161] 102 - yellow-black.jpg GT: [yellow, black] VLM: [black, yellow] (2 jersey(s), 8.0s) PASS exact:2 [103/161] 103 - green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 17.3s) PASS exact:1 [104/161] 104 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 11.0s) PASS exact:1 [105/161] 105 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.4s) PASS exact:1 [106/161] 106 - black_gray.jpg GT: [black, gray] VLM: [black, gray] (2 jersey(s), 7.3s) PASS exact:2 [107/161] 107 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 10.7s) PASS exact:1 [108/161] 108 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.9s) PASS exact:1 [109/161] 109 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 6.0s) PASS exact:1 [110/161] 110 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 13.9s) PASS exact:1 [111/161] 111 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 6.1s) PASS exact:1 [112/161] 112 - orange_white.jpg GT: [orange] VLM: [(none)] (1 jersey(s), 3.6s) FAIL MISS:orange [113/161] 113 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.8s) PASS exact:1 [114/161] 114 - black_white.jpg GT: [black] VLM: [black] (2 jersey(s), 6.3s) PASS exact:1 [115/161] 115 - navy blue_maroon.jpg GT: [navy blue, maroon] VLM: [blue, red] (4 jersey(s), 13.8s) PARTIAL similar:1, MISS:maroon, extra:red [116/161] 116 - gray_white.jpg GT: [gray] VLM: [gray] (2 jersey(s), 6.0s) PASS exact:1 [117/161] 117 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 6.2s) PASS exact:1 [118/161] 118 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 7.4s) PASS similar:1 [119/161] 119 - black_yellow.jpg GT: [black, yellow] VLM: [black, yellow] (3 jersey(s), 10.9s) PASS exact:2 [120/161] 120 - red_dark blue.jpg GT: [red, dark blue] VLM: [blue, red] (3 jersey(s), 10.6s) PASS exact:1, similar:1 [121/161] 121 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 
11.0s) PASS exact:1 [122/161] 122 - gray.jpg GT: [gray] VLM: [gray] (1 jersey(s), 5.1s) PASS exact:1 [123/161] 123 - teal_white.jpg GT: [teal] VLM: [teal] (4 jersey(s), 13.9s) PASS exact:1 [124/161] 124 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (4 jersey(s), 13.7s) PASS similar:1 [125/161] 125 - dark blue_maroon.jpg GT: [dark blue, maroon] VLM: [blue, red] (2 jersey(s), 8.2s) PARTIAL similar:1, MISS:maroon, extra:red [126/161] 126 - white_blue.jpg GT: [blue] VLM: [blue] (3 jersey(s), 10.8s) PASS exact:1 [127/161] 127 - yellow.jpg GT: [yellow] VLM: [yellow] (4 jersey(s), 13.9s) PASS exact:1 [128/161] 128 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1 [129/161] 129 - blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 13.8s) PASS exact:1 [130/161] 130 - yellow_black.jpg GT: [yellow, black] VLM: [black, yellow] (2 jersey(s), 8.4s) PASS exact:2 [131/161] 131 - purple_orange.jpg GT: [purple, orange] VLM: [orange, purple] (3 jersey(s), 8.3s) PASS exact:2 [132/161] 132 - brown_white.jpg GT: [brown] VLM: [orange] (3 jersey(s), 10.8s) FAIL MISS:brown, extra:orange [133/161] 133 - light blue.png GT: [light blue] VLM: [light blue] (7 jersey(s), 23.5s) PASS exact:1 [134/161] 134 - teal_white.jpg GT: [teal] VLM: [blue] (1 jersey(s), 5.0s) FAIL MISS:teal, extra:blue [135/161] 135 - green.jpg GT: [green] VLM: [green] (1 jersey(s), 3.9s) PASS exact:1 [136/161] 136 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 6.2s) PASS exact:1 [137/161] 137 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 10.9s) PASS exact:1 [138/161] 138 - maroon.jpg GT: [maroon] VLM: [red] (1 jersey(s), 3.8s) FAIL MISS:maroon, extra:red [139/161] 139 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 6.0s) PASS similar:1 [140/161] 140 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 5.7s) PASS exact:1 [141/161] 141 - light blue_white.jpg GT: [light blue] VLM: [blue] (3 jersey(s), 8.6s) FAIL MISS:light blue, extra:blue [142/161] 142 - 
orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 6.1s) PASS exact:1 [143/161] 143 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 11.0s) PASS exact:1 [144/161] 144 - green.jpg GT: [green] VLM: [green] (12 jersey(s), 37.7s) PASS exact:1 [145/161] 145 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1 [146/161] 146 - red_gray.jpg GT: [red, gray] VLM: [gray, red] (2 jersey(s), 6.7s) PASS exact:2 [147/161] 147 - green.jpg GT: [green] VLM: [green] (3 jersey(s), 8.3s) PASS exact:1 [148/161] 148 - yellow_purple.jpg GT: [yellow, purple] VLM: [purple, yellow] (2 jersey(s), 7.9s) PASS exact:2 [149/161] 149 - blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 13.7s) PASS exact:1 [150/161] 150 - green_gray.jpg GT: [green, gray] VLM: [black] (2 jersey(s), 7.8s) FAIL MISS:green,gray, extra:black [151/161] 151 - yellow_black.jpg GT: [yellow, black] VLM: [navy, yellow] (5 jersey(s), 17.1s) PARTIAL exact:1, MISS:black, extra:navy [152/161] 152 - pink_dark blue.jpg GT: [pink, dark blue] VLM: [blue, pink] (2 jersey(s), 7.9s) PASS exact:1, similar:1 [153/161] 153 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.1s) PASS exact:1 [154/161] 154 - dark brown.jpeg GT: [dark brown] VLM: [brown] (5 jersey(s), 12.9s) PASS similar:1 [155/161] 155 - white_green_gray_purple_yellow.jpg GT: [green, gray, purple, yellow] VLM: [purple, yellow] (4 jersey(s), 14.2s) PARTIAL exact:2, MISS:green,gray [156/161] 156 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (2 jersey(s), 7.6s) PARTIAL exact:1, MISS:gray [157/161] 157 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 8.3s) PASS exact:1 [158/161] 158 - dark blue_yellow.jpg GT: [dark blue, yellow] VLM: [navy, yellow] (4 jersey(s), 14.0s) PASS exact:1, similar:1 [159/161] 159 - blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 13.9s) PASS exact:1 [160/161] 160 - blue_white.jpg GT: [blue] VLM: [(none)] (1 jersey(s), 3.8s) FAIL MISS:blue [161/161] 161 - light blue_white.jpg GT: [light blue] 
VLM: [blue] (2 jersey(s), 5.8s) FAIL MISS:light blue, extra:blue

================================================================================
ACCURACY SUMMARY
================================================================================
Images processed: 161
Errors: 0
Total time: 1437.3s (8.9s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 181 (excluding white)

--- Recall (did VLM find each ground truth color?) ---
  Exact match:   134 / 202 (66.3%)
  Similar match:  24 / 202 (11.9%)
  Total found:   158 / 202 (78.2%)
  Missed:         44 / 202 (21.8%)

--- Precision (are VLM colors correct?) ---
  Exact match:   134 / 181 (74.0%)
  Similar match:  24 / 181 (13.3%)
  Total correct: 158 / 181 (87.3%)
  Extra/wrong:    23 / 181 (12.7%)

--- Similar-Match Confusions (expected -> got) ---
  dark blue -> blue x9
  navy blue -> blue x8
  gold -> yellow x3
  dark brown -> brown x2
  navy -> blue x1
  dark blue -> navy x1

--- Most Missed Ground Truth Colors ---
  gray         8 ########
  maroon       7 #######
  black        6 ######
  light blue   5 #####
  dark brown   4 ####
  brown        3 ###
  green        3 ###
  gold         2 ##
  blue         2 ##
  teal         2 ##
  gold|yellow  1 #
  orange       1 #

--- Most Common Extra/Wrong VLM Colors ---
  black   7 #######
  blue    6 ######
  red     6 ######
  gold    1 #
  green   1 #
  orange  1 #
  navy    1 #

--- Per-Image Verdict ---
  PASS    120
  PARTIAL  19
  FAIL     22

--- Failed Images (22) ---
  001 -brown_white or dark brown.jpg missed: brown, dark brown extra: black
  013 - light blue.jpg missed: light blue extra: blue
  016 - maroon.jpg missed: maroon
  017 - brown_white.jpg missed: brown extra: black
  019 - maroon_gold.jpg missed: maroon, gold extra: red
  029 -maroon_white.jpg missed: maroon extra: red
  034 - light blue.jpg missed: light blue extra: blue
  036 - light blue_white.jpg missed: light blue extra: blue
  046 - green.jpg missed: green extra: black
  053 - black_white.jpg missed: black
  057 - white_gold or yellow.jpg missed: gold|yellow
  063 - dark brown.jpg missed: dark brown extra: black
  077 - teal_white.jpg missed: teal extra: green
  083 - dark brown_white.jpg missed: dark brown extra: black
  112 - orange_white.jpg missed: orange
  132 - brown_white.jpg missed: brown extra: orange
  134 - teal_white.jpg missed: teal extra: blue
  138 - maroon.jpg missed: maroon extra: red
  141 - light blue_white.jpg missed: light blue extra: blue
  150 - green_gray.jpg missed: green, gray extra: black
  160 - blue_white.jpg missed: blue
  161 - light blue_white.jpg missed: light blue extra: blue

========================================
Gemini 3 Flash + jersey_prompt_capstone.txt
Started: Tue Mar 3 05:34:55 PM MST 2026
========================================
Model: gemini-3-flash-preview
Images to process: 161
Concurrency: 8 workers
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_capstone.txt (1511 chars)
================================================================================
Pre-encoding images ... 161 images in 1.7s
Sending API requests ...
161/161 API calls completed (259.8s total)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg GT: [brown, dark brown] VLM: [brown] (1 jersey(s), 7.0s) PASS exact:1, similar:1
[2/161] 002 - yellow.jpg GT: [yellow] VLM: [yellow] (2 jersey(s), 4.6s) PASS exact:1
[3/161] 003 - dark blue.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 7.5s) PASS similar:1
[4/161] 004 - purple_light blue.jpg GT: [purple, light blue] VLM: [light blue, purple] (3 jersey(s), 18.8s) PASS exact:2
[5/161] 005 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (1 jersey(s), 3.7s) PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg GT: [navy blue] VLM: [dark blue]
(1 jersey(s), 4.7s) PASS similar:1 [7/161] 007 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 6.3s) PASS exact:1 [8/161] 008 -red or orange.jpg GT: [red|orange] VLM: [red] (1 jersey(s), 7.5s) PASS exact:1 [9/161] 009 - white_red.jpg GT: [red] VLM: [red] (3 jersey(s), 12.1s) PASS exact:1 [10/161] 010 - white_black.jpg GT: [black] VLM: [black] (3 jersey(s), 13.8s) PASS exact:1 [11/161] 011 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (4 jersey(s), 12.5s) PARTIAL exact:1, MISS:gray [12/161] 012 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 3.5s) PASS exact:1 [13/161] 013 - light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 4.1s) PASS exact:1 [14/161] 014 - orange_dark blue or purple.jpg GT: [orange, dark blue|purple] VLM: [orange, purple] (3 jersey(s), 4.6s) PASS exact:2 [15/161] 015 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 4.0s) PASS exact:1 [16/161] 016 - maroon.jpg GT: [maroon] VLM: [(none)] (0 jersey(s), 5.0s) FAIL MISS:maroon [17/161] 017 - brown_white.jpg GT: [brown] VLM: [brown] (3 jersey(s), 8.9s) PASS exact:1 [18/161] 018 - gray_red.jpg GT: [gray, red] VLM: [grey] (1 jersey(s), 4.1s) PARTIAL similar:1, MISS:red [19/161] 019 - maroon_gold.jpg GT: [maroon, gold] VLM: [red] (1 jersey(s), 5.0s) FAIL MISS:maroon,gold, extra:red [20/161] 020 - white_brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 4.0s) PASS exact:1 [21/161] 021 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 4.3s) PASS exact:1 [22/161] 022 - black_light blue.jpg GT: [black, light blue] VLM: [light blue] (1 jersey(s), 5.3s) PARTIAL exact:1, MISS:black [23/161] 023 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 3.6s) PASS exact:1 [24/161] 024 - white_pink.jpg GT: [pink] VLM: [pink] (2 jersey(s), 3.6s) PASS exact:1 [25/161] 025 - blue_green.jpg GT: [blue, green] VLM: [green] (1 jersey(s), 3.3s) PARTIAL exact:1, MISS:blue [26/161] 026 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 5.9s) PASS exact:1 
[27/161] 027 - red_white.jpg GT: [red] VLM: [red] (4 jersey(s), 36.1s) PASS exact:1
[28/161] 028 - green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 38.3s) PASS exact:1
[29/161] 029 -maroon_white.jpg GT: [maroon] VLM: [red] (2 jersey(s), 4.8s) FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 10.8s) PASS similar:1
[31/161] 031 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 4.2s) PASS exact:1
[32/161] 032 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 4.8s) PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg GT: [navy blue, gray] VLM: [blue] (7 jersey(s), 40.2s) PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg GT: [light blue] VLM: [blue] (1 jersey(s), 12.7s) FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg GT: [green, gold|yellow] VLM: [green, yellow] (3 jersey(s), 9.2s) PASS exact:2
[36/161] 036 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 5.0s) PASS exact:1
[37/161] 037 -navy_white.jpg GT: [navy] VLM: [blue] (4 jersey(s), 7.5s) PASS similar:1
[38/161] 038 - red_white.jpg GT: [red] VLM: [red] (3 jersey(s), 36.8s) PASS exact:1
[39/161] 039 - gray_white.jpg GT: [gray] VLM: [blue, grey] (4 jersey(s), 38.9s) PARTIAL similar:1, extra:blue
[40/161] 040 - maroon_gray.jpg GT: [maroon, gray] VLM: [grey, maroon] (2 jersey(s), 11.3s) PASS exact:1, similar:1
[41/161] 041 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (8 jersey(s), 7.2s) PASS similar:1
[42/161] 042 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.5s) PASS exact:1
[43/161] 043 - gray_black.jpg GT: [gray, black] VLM: [black, grey] (5 jersey(s), 7.6s) PASS exact:1, similar:1
[44/161] 044 - purple_black.jpg GT: [purple, black] VLM: [purple] (8 jersey(s), 36.8s) PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg GT: [purple] VLM: [purple] (2 jersey(s), 5.4s) PASS exact:1
[46/161] 046 - green.jpg GT: [green] VLM: [black] (8 jersey(s), 39.3s) FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg GT: [purple] VLM: [purple] (3 jersey(s), 4.7s) PASS exact:1
[48/161] 048 - red.jpg GT: [red] VLM: [(none)] (0 jersey(s), 34.4s) FAIL MISS:red
[49/161] 049 - white_gold.jpg GT: [gold] VLM: [yellow] (2 jersey(s), 4.1s) PASS similar:1
[50/161] 050 - white_orange.jpg GT: [orange] VLM: [orange] (5 jersey(s), 37.3s) PASS exact:1
[51/161] 051 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.2s) PASS exact:1
[52/161] 052 - black_gold.jpg GT: [black, gold] VLM: [black] (1 jersey(s), 3.7s) PARTIAL exact:1, MISS:gold
[53/161] 053 - black_white.jpg GT: [black] VLM: [(none)] (1 jersey(s), 3.4s) FAIL MISS:black
[54/161] 054 - white_blue.jpg GT: [blue] VLM: [blue] (2 jersey(s), 3.3s) PASS exact:1
[55/161] 055 - green_gold.jpg GT: [green, gold] VLM: [green] (1 jersey(s), 11.1s) PARTIAL exact:1, MISS:gold
[56/161] 056 - white_red.jpg GT: [red] VLM: [red] (3 jersey(s), 6.6s) PASS exact:1
[57/161] 057 - white_gold or yellow.jpg GT: [gold|yellow] VLM: [(none)] (1 jersey(s), 4.0s) FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg GT: [purple] VLM: [purple] (4 jersey(s), 7.7s) PASS exact:1
[59/161] 059 - black_gold.jpg GT: [black, gold] VLM: [gold] (1 jersey(s), 4.3s) PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg GT: [gray, navy blue] VLM: [blue] (2 jersey(s), 4.8s) PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg GT: [brown|orange] VLM: [orange] (1 jersey(s), 4.0s) PASS exact:1
[62/161] 062 - orange_blue.jpg GT: [orange, blue] VLM: [blue, orange] (2 jersey(s), 4.3s) PASS exact:2
[63/161] 063 - dark brown.jpg GT: [dark brown] VLM: [brown] (1 jersey(s), 3.5s) PASS similar:1
[64/161] 064 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 5.7s) PASS exact:1
[65/161] 065 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (5 jersey(s), 40.7s) PASS exact:1, similar:1
[66/161] 066 - yellow.jpg GT: [yellow] VLM: [yellow] (1 jersey(s), 4.7s) PASS exact:1
[67/161] 067 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 5.8s) PASS exact:1
[68/161] 068 - gold.jpg GT: [gold] VLM: [gold] (1 jersey(s), 4.3s) PASS exact:1
[69/161] 069 - red_white.jpg GT: [red] VLM: [(none)] (5 jersey(s), 38.2s) FAIL MISS:red
[70/161] 070 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 6.2s) PASS exact:1
[71/161] 071 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 3.8s) PASS exact:1
[72/161] 072 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 3.5s) PASS exact:1
[73/161] 073 - maroon_white.jpg GT: [maroon] VLM: [maroon] (1 jersey(s), 9.0s) PASS exact:1
[74/161] 074 - white_orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 4.3s) PASS exact:1
[75/161] 075 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 3.2s) PASS exact:1
[76/161] 076 - light blue_white.jpg GT: [light blue] VLM: [light blue, pink] (4 jersey(s), 8.1s) PARTIAL exact:1, extra:pink
[77/161] 077 - teal_white.jpg GT: [teal] VLM: [green] (5 jersey(s), 37.1s) FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 10.8s) FAIL MISS:light blue, extra:blue
[79/161] 079 - blue_maroon.jpg GT: [blue, maroon] VLM: [blue, red] (6 jersey(s), 36.8s) PARTIAL exact:1, MISS:maroon, extra:red
[80/161] 080 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (1 jersey(s), 3.4s) PASS similar:1
[81/161] 081 - navy blue.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 3.9s) PASS similar:1
[82/161] 082 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (3 jersey(s), 6.3s) PASS similar:1
[83/161] 083 - dark brown_white.jpg GT: [dark brown] VLM: [black] (2 jersey(s), 14.2s) FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg GT: [dark brown, yellow] VLM: [brown, yellow] (2 jersey(s), 3.8s) PASS exact:1, similar:1
[85/161] 085 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.3s) PASS exact:1
[86/161] 086 - dark brown_white.jpg GT: [dark brown] VLM: [brown] (1 jersey(s), 4.5s) PASS similar:1
[87/161] 087 - white_light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.5s) PASS exact:1
[88/161] 088 - white_maroon.jpg GT: [maroon] VLM: [maroon] (4 jersey(s), 41.4s) PASS exact:1
[89/161] 089 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 5.8s) PASS exact:1
[90/161] 090 - maroon_white.jpg GT: [maroon] VLM: [maroon] (5 jersey(s), 38.4s) PASS exact:1
[91/161] 091 - teal.jpg GT: [teal] VLM: [teal] (3 jersey(s), 10.2s) PASS exact:1
[92/161] 092 - green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 39.3s) PASS exact:1
[93/161] 093 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (2 jersey(s), 4.6s) PASS similar:1
[94/161] 094 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 7.0s) PASS exact:1
[95/161] 095 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 22.9s) PASS exact:1
[96/161] 096 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 5.2s) PASS exact:1
[97/161] 097 - gray_black.jpg GT: [gray, black] VLM: [grey] (2 jersey(s), 19.4s) PARTIAL similar:1, MISS:black
[98/161] 098 - teal_white.jpg GT: [teal] VLM: [teal] (2 jersey(s), 4.3s) PASS exact:1
[99/161] 099 - maroon_white.jpg GT: [maroon] VLM: [red] (3 jersey(s), 4.5s) FAIL MISS:maroon, extra:red
[100/161] 100 - orange_white.jpg GT: [orange] VLM: [orange] (4 jersey(s), 40.0s) PASS exact:1
[101/161] 101 - green_white.jpg GT: [green] VLM: [green] (7 jersey(s), 39.2s) PASS exact:1
[102/161] 102 - yellow-black.jpg GT: [yellow, black] VLM: [black] (1 jersey(s), 4.2s) PARTIAL exact:1, MISS:yellow
[103/161] 103 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 36.3s) PASS exact:1
[104/161] 104 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 4.1s) PASS exact:1
[105/161] 105 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 6.3s) PASS exact:1
[106/161] 106 - black_gray.jpg GT: [black, gray] VLM: [black, grey] (2 jersey(s), 4.0s) PASS exact:1, similar:1
[107/161] 107 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 4.4s) PASS exact:1
[108/161] 108 - red_white.jpg GT: [red] VLM: [red] (1 jersey(s), 47.1s) PASS exact:1
[109/161] 109 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 5.2s) PASS exact:1
[110/161] 110 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 10.7s) PASS exact:1
[111/161] 111 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 34.8s) PASS exact:1
[112/161] 112 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 5.8s) PASS exact:1
[113/161] 113 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 5.1s) PASS exact:1
[114/161] 114 - black_white.jpg GT: [black] VLM: [black] (2 jersey(s), 5.7s) PASS exact:1
[115/161] 115 - navy blue_maroon.jpg GT: [navy blue, maroon] VLM: [blue, red] (4 jersey(s), 7.9s) PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg GT: [gray] VLM: [grey] (2 jersey(s), 3.9s) PASS similar:1
[117/161] 117 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 3.8s) PASS exact:1
[118/161] 118 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (1 jersey(s), 8.5s) PASS similar:1
[119/161] 119 - black_yellow.jpg GT: [black, yellow] VLM: [black, yellow] (3 jersey(s), 4.7s) PASS exact:2
[120/161] 120 - red_dark blue.jpg GT: [red, dark blue] VLM: [dark blue, red] (3 jersey(s), 7.0s) PASS exact:2
[121/161] 121 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 5.9s) PASS exact:1
[122/161] 122 - gray.jpg GT: [gray] VLM: [grey] (1 jersey(s), 2.6s) PASS similar:1
[123/161] 123 - teal_white.jpg GT: [teal] VLM: [teal] (4 jersey(s), 8.8s) PASS exact:1
[124/161] 124 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (4 jersey(s), 4.9s) PASS similar:1
[125/161] 125 - dark blue_maroon.jpg GT: [dark blue, maroon] VLM: [navy, red] (3 jersey(s), 8.1s) PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg GT: [blue] VLM: [blue] (3 jersey(s), 5.8s) PASS exact:1
[127/161] 127 - yellow.jpg GT: [yellow] VLM: [yellow] (4 jersey(s), 4.8s) PASS exact:1
[128/161] 128 - green_white.jpg GT: [green] VLM: [(none)] (0 jersey(s), 42.6s) FAIL MISS:green
[129/161] 129 - blue_white.jpg GT: [blue] VLM: [(none)] (3 jersey(s), 16.8s) FAIL MISS:blue
[130/161] 130 - yellow_black.jpg GT: [yellow, black] VLM: [yellow] (1 jersey(s), 3.4s) PARTIAL exact:1, MISS:black
[131/161] 131 - purple_orange.jpg GT: [purple, orange] VLM: [orange, purple] (3 jersey(s), 3.8s) PASS exact:2
[132/161] 132 - brown_white.jpg GT: [brown] VLM: [orange] (2 jersey(s), 10.2s) FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png GT: [light blue] VLM: [light blue] (8 jersey(s), 43.5s) PASS exact:1
[134/161] 134 - teal_white.jpg GT: [teal] VLM: [light blue] (1 jersey(s), 4.3s) FAIL MISS:teal, extra:light blue
[135/161] 135 - green.jpg GT: [green] VLM: [green] (1 jersey(s), 5.2s) PASS exact:1
[136/161] 136 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 3.8s) PASS exact:1
[137/161] 137 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 8.6s) PASS exact:1
[138/161] 138 - maroon.jpg GT: [maroon] VLM: [red] (1 jersey(s), 3.3s) FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg GT: [dark blue] VLM: [blue] (1 jersey(s), 4.8s) PASS similar:1
[140/161] 140 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 3.5s) PASS exact:1
[141/161] 141 - light blue_white.jpg GT: [light blue] VLM: [blue] (3 jersey(s), 5.2s) FAIL MISS:light blue, extra:blue
[142/161] 142 - orange_white.jpg GT: [orange] VLM: [orange] (1 jersey(s), 5.5s) PASS exact:1
[143/161] 143 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 4.7s) PASS exact:1
[144/161] 144 - green.jpg GT: [green] VLM: [green] (8 jersey(s), 39.7s) PASS exact:1
[145/161] 145 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 3.8s) PASS exact:1
[146/161] 146 - red_gray.jpg GT: [red, gray] VLM: [grey, red] (2 jersey(s), 4.0s) PASS exact:1, similar:1
[147/161] 147 - green.jpg GT: [green] VLM: [green] (3 jersey(s), 4.1s) PASS exact:1
[148/161] 148 - yellow_purple.jpg GT: [yellow, purple] VLM: [purple, yellow] (2 jersey(s), 5.9s) PASS exact:2
[149/161] 149 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 36.9s) PASS exact:1
[150/161] 150 - green_gray.jpg GT: [green, gray] VLM: [black] (1 jersey(s), 12.8s) FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg GT: [yellow, black] VLM: [dark blue, yellow] (5 jersey(s), 38.6s) PARTIAL exact:1, MISS:black, extra:dark blue
[152/161] 152 - pink_dark blue.jpg GT: [pink, dark blue] VLM: [navy blue, pink] (3 jersey(s), 22.1s) PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 3.7s) PASS exact:1
[154/161] 154 - dark brown.jpeg GT: [dark brown] VLM: [brown] (5 jersey(s), 5.1s) PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg GT: [green, gray, purple, yellow] VLM: [grey, purple, yellow] (5 jersey(s), 7.4s) PARTIAL exact:2, similar:1, MISS:green
[156/161] 156 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (1 jersey(s), 12.0s) PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg GT: [blue] VLM: [blue] (4 jersey(s), 38.1s) PASS exact:1
[158/161] 158 - dark blue_yellow.jpg GT: [dark blue, yellow] VLM: [blue, yellow] (7 jersey(s), 37.4s) PASS exact:1, similar:1
[159/161] 159 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 11.2s) PASS exact:1
[160/161] 160 - blue_white.jpg GT: [blue] VLM: [(none)] (1 jersey(s), 4.2s) FAIL MISS:blue
[161/161] 161 - light blue_white.jpg GT: [light blue] VLM: [blue] (2 jersey(s), 5.6s) FAIL MISS:light blue, extra:blue
================================================================================
ACCURACY SUMMARY (gemini-3-flash-preview)
================================================================================
Images processed: 161
Errors: 0
Total time: 259.8s (1.6s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 177 (excluding white)
--- Recall (did VLM find each ground truth color?) ---
Exact match: 123 / 202 (60.9%)
Similar match: 35 / 202 (17.3%)
Total found: 158 / 202 (78.2%)
Missed: 44 / 202 (21.8%)
--- Precision (are VLM colors correct?) ---
Exact match: 123 / 177 (69.5%)
Similar match: 34 / 177 (19.2%)
Total correct: 157 / 177 (88.7%)
Extra/wrong: 20 / 177 (11.3%)
--- Similar-Match Confusions (expected -> got) ---
gray -> grey x10
navy blue -> blue x7
dark brown -> brown x5
dark blue -> blue x5
dark blue -> navy blue x3
gold -> yellow x2
navy blue -> dark blue x1
navy -> blue x1
dark blue -> navy x1
--- Most Missed Ground Truth Colors ---
maroon 8 ########
black 7 #######
gray 6 ######
light blue 4 ####
green 4 ####
red 3 ###
gold 3 ###
blue 3 ###
teal 2 ##
gold|yellow 1 #
dark brown 1 #
yellow 1 #
brown 1 #
--- Most Common Extra/Wrong VLM Colors ---
red 7 #######
blue 5 #####
black 3 ###
pink 1 #
green 1 #
orange 1 #
light blue 1 #
dark blue 1 #
--- Per-Image Verdict ---
PASS 117
PARTIAL 22
FAIL 22
--- Failed Images (22) ---
016 - maroon.jpg missed: maroon
019 - maroon_gold.jpg missed: maroon, gold extra: red
029 -maroon_white.jpg missed: maroon extra: red
034 - light blue.jpg missed: light blue extra: blue
046 - green.jpg missed: green extra: black
048 - red.jpg missed: red
053 - black_white.jpg missed: black
057 - white_gold or yellow.jpg missed: gold|yellow
069 - red_white.jpg missed: red
077 - teal_white.jpg missed: teal extra: green
078 - light blue_white.jpg missed: light blue extra: blue
083 - dark brown_white.jpg missed: dark brown extra: black
099 - maroon_white.jpg missed: maroon extra: red
128 - green_white.jpg missed: green
129 - blue_white.jpg missed: blue
132 - brown_white.jpg missed: brown extra: orange
134 - teal_white.jpg missed: teal extra: light blue
138 - maroon.jpg missed: maroon extra: red
141 - light blue_white.jpg missed: light blue extra: blue
150 - green_gray.jpg missed: green, gray extra: black
160 - blue_white.jpg missed: blue
161 - light blue_white.jpg missed: light blue extra: blue
========================================
Qwen3-VL-8B + jersey_prompt_constrained.txt
Started: Tue Mar 3 05:39:17 PM MST 2026
========================================
Images to process: 161
Server: http://agx:8080
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_constrained.txt (2223 chars)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg GT: [brown, dark brown] VLM: [black] (3 jersey(s), 11.6s) FAIL MISS:brown,dark brown, extra:black
[2/161] 002 - yellow.jpg GT: [yellow] VLM: [yellow] (2 jersey(s), 8.0s) PASS exact:1
[3/161] 003 - dark blue.jpg GT: [dark blue] VLM: [dark blue] (3 jersey(s), 11.1s) PASS exact:1
[4/161] 004 - purple_light blue.jpg GT: [purple, light blue] VLM: [light blue, purple] (3 jersey(s), 11.7s) PASS exact:2
[5/161] 005 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (1 jersey(s), 5.0s) PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg GT: [navy blue] VLM: [dark blue] (1 jersey(s), 4.4s) PASS similar:1
[7/161] 007 - brown_white.jpg GT: [brown] VLM: [maroon] (2 jersey(s), 8.0s) FAIL MISS:brown, extra:maroon
[8/161] 008 -red or orange.jpg GT: [red|orange] VLM: [red] (1 jersey(s), 4.3s) PASS exact:1
[9/161] 009 - white_red.jpg GT: [red] VLM: [gold, red] (3 jersey(s), 10.8s) PARTIAL exact:1, extra:gold
[10/161] 010 - white_black.jpg GT: [black] VLM: [black, maroon] (3 jersey(s), 11.0s) PARTIAL exact:1, extra:maroon
[11/161] 011 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (4 jersey(s), 13.8s) PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.3s) PASS exact:1
[13/161] 013 - light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.6s) PASS exact:1
[14/161] 014 - orange_dark blue or purple.jpg GT: [orange, dark blue|purple] VLM: [orange, purple] (3 jersey(s), 11.0s) PASS exact:2
[15/161] 015 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 7.3s) PASS exact:1
[16/161] 016 - maroon.jpg GT: [maroon] VLM: [(none)] (0 jersey(s), 1.7s) FAIL MISS:maroon
[17/161] 017 - brown_white.jpg GT: [brown] VLM: [dark brown] (2 jersey(s), 8.8s) PASS similar:1
[18/161] 018 - gray_red.jpg GT: [gray, red] VLM: [gray, red] (2 jersey(s), 7.3s) PASS exact:2
[19/161] 019 - maroon_gold.jpg GT: [maroon, gold] VLM: [maroon, yellow] (2 jersey(s), 7.8s) PASS exact:1, similar:1
[20/161] 020 - white_brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 8.1s) PASS exact:1
[21/161] 021 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 8.0s) PASS exact:1
[22/161] 022 - black_light blue.jpg GT: [black, light blue] VLM: [light blue] (1 jersey(s), 5.0s) PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 7.7s) PASS exact:1
[24/161] 024 - white_pink.jpg GT: [pink] VLM: [pink] (2 jersey(s), 7.7s) PASS exact:1
[25/161] 025 - blue_green.jpg GT: [blue, green] VLM: [green] (1 jersey(s), 4.3s) PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1
[27/161] 027 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 16.1s) PASS exact:1
[28/161] 028 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.9s) PASS exact:1
[29/161] 029 -maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.0s) PASS exact:1
[30/161] 030 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 7.8s) PASS similar:1
[31/161] 031 - brown_white.jpg GT: [brown] VLM: [maroon] (2 jersey(s), 7.9s) FAIL MISS:brown, extra:maroon
[32/161] 032 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 8.1s) PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg GT: [navy blue, gray] VLM: [blue] (3 jersey(s), 10.9s) PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg GT: [light blue] VLM: [blue] (1 jersey(s), 4.8s) FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg GT: [green, gold|yellow] VLM: [green, yellow] (2 jersey(s), 8.0s) PASS exact:2
[36/161] 036 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 14.0s) PASS exact:1
[37/161] 037 -navy_white.jpg GT: [navy] VLM: [dark blue] (3 jersey(s), 10.3s) PASS similar:1
[38/161] 038 - red_white.jpg GT: [red] VLM: [red] (3 jersey(s), 10.9s) PASS exact:1 [39/161] 039 - gray_white.jpg GT: [gray] VLM: [gray] (2 jersey(s), 7.9s) PASS exact:1 [40/161] 040 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (1 jersey(s), 5.1s) PARTIAL exact:1, MISS:gray [41/161] 041 - navy blue_white.jpg GT: [navy blue] VLM: [navy blue] (9 jersey(s), 30.6s) PASS exact:1 [42/161] 042 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.9s) PASS exact:1 [43/161] 043 - gray_black.jpg GT: [gray, black] VLM: [black, gray] (2 jersey(s), 8.0s) PASS exact:2 [44/161] 044 - purple_black.jpg GT: [purple, black] VLM: [purple] (7 jersey(s), 22.6s) PARTIAL exact:1, MISS:black [45/161] 045 - purple.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.8s) PASS exact:1 [46/161] 046 - green.jpg GT: [green] VLM: [black] (15 jersey(s), 46.5s) FAIL MISS:green, extra:black [47/161] 047 - purple_white.jpg GT: [purple] VLM: [purple] (3 jersey(s), 10.8s) PASS exact:1 [48/161] 048 - red.jpg GT: [red] VLM: [maroon] (1 jersey(s), 5.0s) FAIL MISS:red, extra:maroon [49/161] 049 - white_gold.jpg GT: [gold] VLM: [yellow] (2 jersey(s), 7.9s) PASS similar:1 [50/161] 050 - white_orange.jpg GT: [orange] VLM: [orange] (4 jersey(s), 14.1s) PASS exact:1 [51/161] 051 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.9s) PASS exact:1 [52/161] 052 - black_gold.jpg GT: [black, gold] VLM: [black, yellow] (2 jersey(s), 7.8s) PASS exact:1, similar:1 [53/161] 053 - black_white.jpg GT: [black] VLM: [(none)] (1 jersey(s), 4.9s) FAIL MISS:black [54/161] 054 - white_blue.jpg GT: [blue] VLM: [navy blue] (2 jersey(s), 8.1s) PASS similar:1 [55/161] 055 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (2 jersey(s), 7.8s) PASS exact:1, similar:1 [56/161] 056 - white_red.jpg GT: [red] VLM: [red] (2 jersey(s), 7.9s) PASS exact:1 [57/161] 057 - white_gold or yellow.jpg GT: [gold|yellow] VLM: [(none)] (1 jersey(s), 4.9s) FAIL MISS:gold|yellow [58/161] 058 - purple.jpg GT: [purple] VLM: [purple] (4 
jersey(s), 14.0s) PASS exact:1 [59/161] 059 - black_gold.jpg GT: [black, gold] VLM: [gold] (1 jersey(s), 4.9s) PARTIAL exact:1, MISS:black [60/161] 060 - gray_navy blue.jpg GT: [gray, navy blue] VLM: [blue] (2 jersey(s), 8.1s) PARTIAL similar:1, MISS:gray [61/161] 061 - brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 7.8s) PASS exact:1 [62/161] 062 - orange_blue.jpg GT: [orange, blue] VLM: [blue, orange] (2 jersey(s), 7.5s) PASS exact:2 [63/161] 063 - dark brown.jpg GT: [dark brown] VLM: [dark brown] (1 jersey(s), 5.0s) PASS exact:1 [64/161] 064 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 10.7s) PASS exact:1 [65/161] 065 - green_gold.jpg GT: [green, gold] VLM: [dark green, yellow] (3 jersey(s), 10.6s) PASS similar:2 [66/161] 066 - yellow.jpg GT: [yellow] VLM: [yellow] (1 jersey(s), 4.8s) PASS exact:1 [67/161] 067 - red_white.jpg GT: [red] VLM: [red] (4 jersey(s), 13.7s) PASS exact:1 [68/161] 068 - gold.jpg GT: [gold] VLM: [gold] (1 jersey(s), 4.8s) PASS exact:1 [69/161] 069 - red_white.jpg GT: [red] VLM: [(none)] (4 jersey(s), 14.1s) FAIL MISS:red [70/161] 070 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 11.1s) PASS exact:1 [71/161] 071 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.0s) PASS exact:1 [72/161] 072 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.5s) PASS exact:1 [73/161] 073 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.4s) PASS exact:1 [74/161] 074 - white_orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.5s) PASS exact:1 [75/161] 075 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 10.7s) PASS exact:1 [76/161] 076 - light blue_white.jpg GT: [light blue] VLM: [light blue] (3 jersey(s), 11.4s) PASS exact:1 [77/161] 077 - teal_white.jpg GT: [teal] VLM: [green] (4 jersey(s), 13.4s) FAIL MISS:teal, extra:green [78/161] 078 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.7s) PASS exact:1 [79/161] 079 - blue_maroon.jpg 
GT: [blue, maroon] VLM: [blue, maroon] (4 jersey(s), 14.1s) PASS exact:2 [80/161] 080 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 7.8s) PASS similar:1 [81/161] 081 - navy blue.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 7.7s) PASS similar:1 [82/161] 082 - dark blue_white.jpg GT: [dark blue] VLM: [dark blue] (3 jersey(s), 10.8s) PASS exact:1 [83/161] 083 - dark brown_white.jpg GT: [dark brown] VLM: [black] (2 jersey(s), 7.9s) FAIL MISS:dark brown, extra:black [84/161] 084 - dark brown_yellow.jpg GT: [dark brown, yellow] VLM: [dark brown, gold] (2 jersey(s), 8.0s) PASS exact:1, similar:1 [85/161] 085 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 4.8s) PASS exact:1 [86/161] 086 - dark brown_white.jpg GT: [dark brown] VLM: [dark brown] (2 jersey(s), 8.0s) PASS exact:1 [87/161] 087 - white_light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 8.0s) PASS exact:1 [88/161] 088 - white_maroon.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.8s) PASS exact:1 [89/161] 089 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 11.1s) PASS exact:1 [90/161] 090 - maroon_white.jpg GT: [maroon] VLM: [maroon] (4 jersey(s), 14.3s) PASS exact:1 [91/161] 091 - teal.jpg GT: [teal] VLM: [teal] (2 jersey(s), 8.0s) PASS exact:1 [92/161] 092 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 14.0s) PASS exact:1 [93/161] 093 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 8.1s) PASS similar:1 [94/161] 094 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 12.5s) PASS exact:1 [95/161] 095 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.0s) PASS exact:1 [96/161] 096 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 8.6s) PASS exact:1 [97/161] 097 - gray_black.jpg GT: [gray, black] VLM: [light blue] (2 jersey(s), 8.3s) FAIL MISS:gray,black, extra:light blue [98/161] 098 - teal_white.jpg GT: [teal] VLM: [teal] (2 jersey(s), 8.7s) PASS exact:1 [99/161] 099 - maroon_white.jpg GT: [maroon] VLM: 
[maroon] (3 jersey(s), 12.2s) PASS exact:1 [100/161] 100 - orange_white.jpg GT: [orange] VLM: [orange] (4 jersey(s), 13.8s) PASS exact:1 [101/161] 101 - green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 17.0s) PASS exact:1 [102/161] 102 - yellow-black.jpg GT: [yellow, black] VLM: [black, yellow] (2 jersey(s), 8.0s) PASS exact:2 [103/161] 103 - green_white.jpg GT: [green] VLM: [green] (5 jersey(s), 17.3s) PASS exact:1 [104/161] 104 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.0s) PASS exact:1 [105/161] 105 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 9.2s) PASS exact:1 [106/161] 106 - black_gray.jpg GT: [black, gray] VLM: [black, gray] (2 jersey(s), 9.1s) PASS exact:2 [107/161] 107 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.7s) PASS exact:1 [108/161] 108 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 8.0s) PASS exact:1 [109/161] 109 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 7.8s) PASS exact:1 [110/161] 110 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 14.0s) PASS exact:1 [111/161] 111 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.9s) PASS exact:1 [112/161] 112 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 7.8s) PASS exact:1 [113/161] 113 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 4.9s) PASS exact:1 [114/161] 114 - black_white.jpg GT: [black] VLM: [black] (2 jersey(s), 8.1s) PASS exact:1 [115/161] 115 - navy blue_maroon.jpg GT: [navy blue, maroon] VLM: [blue, maroon] (4 jersey(s), 14.0s) PASS exact:1, similar:1 [116/161] 116 - gray_white.jpg GT: [gray] VLM: [gray] (2 jersey(s), 8.0s) PASS exact:1 [117/161] 117 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.1s) PASS exact:1 [118/161] 118 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 7.8s) PASS similar:1 [119/161] 119 - black_yellow.jpg GT: [black, yellow] VLM: [black, yellow] (3 jersey(s), 10.9s) PASS exact:2 [120/161] 120 - red_dark blue.jpg GT: [red, dark 
blue] VLM: [navy blue, red] (3 jersey(s), 11.1s) PASS exact:1, similar:1 [121/161] 121 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 10.9s) PASS exact:1 [122/161] 122 - gray.jpg GT: [gray] VLM: [gray] (1 jersey(s), 6.3s) PASS exact:1 [123/161] 123 - teal_white.jpg GT: [teal] VLM: [teal] (4 jersey(s), 14.1s) PASS exact:1 [124/161] 124 - dark blue_white.jpg GT: [dark blue] VLM: [dark blue] (4 jersey(s), 13.9s) PASS exact:1 [125/161] 125 - dark blue_maroon.jpg GT: [dark blue, maroon] VLM: [dark blue, red] (2 jersey(s), 8.2s) PARTIAL exact:1, MISS:maroon, extra:red [126/161] 126 - white_blue.jpg GT: [blue] VLM: [blue] (3 jersey(s), 11.0s) PASS exact:1 [127/161] 127 - yellow.jpg GT: [yellow] VLM: [yellow] (4 jersey(s), 13.9s) PASS exact:1 [128/161] 128 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.0s) PASS exact:1 [129/161] 129 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 17.2s) PASS exact:1 [130/161] 130 - yellow_black.jpg GT: [yellow, black] VLM: [black, yellow] (2 jersey(s), 8.4s) PASS exact:2 [131/161] 131 - purple_orange.jpg GT: [purple, orange] VLM: [orange, purple] (3 jersey(s), 10.8s) PASS exact:2 [132/161] 132 - brown_white.jpg GT: [brown] VLM: [orange] (3 jersey(s), 10.8s) FAIL MISS:brown, extra:orange [133/161] 133 - light blue.png GT: [light blue] VLM: [light blue] (6 jersey(s), 21.2s) PASS exact:1 [134/161] 134 - teal_white.jpg GT: [teal] VLM: [light blue] (1 jersey(s), 5.1s) FAIL MISS:teal, extra:light blue [135/161] 135 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 8.1s) PASS exact:1 [136/161] 136 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 8.0s) PASS exact:1 [137/161] 137 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 11.0s) PASS exact:1 [138/161] 138 - maroon.jpg GT: [maroon] VLM: [red] (1 jersey(s), 4.9s) FAIL MISS:maroon, extra:red [139/161] 139 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 8.3s) PASS similar:1 [140/161] 140 - red_white.jpg GT: [red] VLM: [red] (2 
jersey(s), 7.7s) PASS exact:1 [141/161] 141 - light blue_white.jpg GT: [light blue] VLM: [light blue] (3 jersey(s), 11.2s) PASS exact:1 [142/161] 142 - orange_white.jpg GT: [orange] VLM: [maroon] (2 jersey(s), 8.2s) FAIL MISS:orange, extra:maroon [143/161] 143 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 11.1s) PASS exact:1 [144/161] 144 - green.jpg GT: [green] VLM: [green] (10 jersey(s), 31.9s) PASS exact:1 [145/161] 145 - green_white.jpg GT: [green] VLM: [(none)] (1 jersey(s), 5.0s) FAIL MISS:green [146/161] 146 - red_gray.jpg GT: [red, gray] VLM: [gray, red] (2 jersey(s), 8.0s) PASS exact:2 [147/161] 147 - green.jpg GT: [green] VLM: [green] (3 jersey(s), 10.8s) PASS exact:1 [148/161] 148 - yellow_purple.jpg GT: [yellow, purple] VLM: [purple, yellow] (2 jersey(s), 7.8s) PASS exact:2 [149/161] 149 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 16.7s) PASS exact:1 [150/161] 150 - green_gray.jpg GT: [green, gray] VLM: [dark blue] (2 jersey(s), 7.9s) FAIL MISS:green,gray, extra:dark blue [151/161] 151 - yellow_black.jpg GT: [yellow, black] VLM: [dark blue, yellow] (5 jersey(s), 17.1s) PARTIAL exact:1, MISS:black, extra:dark blue [152/161] 152 - pink_dark blue.jpg GT: [pink, dark blue] VLM: [navy blue, pink] (2 jersey(s), 8.3s) PASS exact:1, similar:1 [153/161] 153 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 8.1s) PASS exact:1 [154/161] 154 - dark brown.jpeg GT: [dark brown] VLM: [dark brown] (5 jersey(s), 17.3s) PASS exact:1 [155/161] 155 - white_green_gray_purple_yellow.jpg GT: [green, gray, purple, yellow] VLM: [gray, purple, yellow] (5 jersey(s), 17.4s) PARTIAL exact:3, MISS:green [156/161] 156 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (2 jersey(s), 7.7s) PARTIAL exact:1, MISS:gray [157/161] 157 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 10.7s) PASS exact:1 [158/161] 158 - dark blue_yellow.jpg GT: [dark blue, yellow] VLM: [dark blue, yellow] (4 jersey(s), 14.3s) PASS exact:2 [159/161] 159 - blue_white.jpg GT: 
[blue] VLM: [blue] (4 jersey(s), 13.9s) PASS exact:1
[160/161] 160 - blue_white.jpg GT: [blue] VLM: [blue] (2 jersey(s), 7.9s) PASS exact:1
[161/161] 161 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 7.9s) PASS exact:1

================================================================================
ACCURACY SUMMARY
================================================================================
Images processed: 161
Errors: 0
Total time: 1596.1s (9.9s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 185 (excluding white)

--- Recall (did VLM find each ground truth color?) ---
Exact match: 145 / 202 (71.8%)
Similar match: 22 / 202 (10.9%)
Total found: 167 / 202 (82.7%)
Missed: 35 / 202 (17.3%)

--- Precision (are VLM colors correct?) ---
Exact match: 145 / 185 (78.4%)
Similar match: 22 / 185 (11.9%)
Total correct: 167 / 185 (90.3%)
Extra/wrong: 18 / 185 (9.7%)

--- Similar-Match Confusions (expected -> got) ---
navy blue -> blue x6
gold -> yellow x5
dark blue -> navy blue x5
navy blue -> dark blue x1
brown -> dark brown x1
navy -> dark blue x1
blue -> navy blue x1
green -> dark green x1
yellow -> gold x1

--- Most Missed Ground Truth Colors ---
gray 8 ########
black 6 ######
brown 4 ####
green 4 ####
maroon 3 ###
dark brown 2 ##
red 2 ##
teal 2 ##
blue 1 #
light blue 1 #
gold|yellow 1 #
orange 1 #

--- Most Common Extra/Wrong VLM Colors ---
maroon 5 #####
black 3 ###
light blue 2 ##
red 2 ##
dark blue 2 ##
gold 1 #
blue 1 #
green 1 #
orange 1 #

--- Per-Image Verdict ---
PASS 127
PARTIAL 15
FAIL 19

--- Failed Images (19) ---
001 -brown_white or dark brown.jpg missed: brown, dark brown extra: black
007 - brown_white.jpg missed: brown extra: maroon
016 - maroon.jpg missed: maroon
031 - brown_white.jpg missed: brown extra: maroon
034 - light blue.jpg missed: light blue extra: blue
046 - green.jpg missed: green extra: black
048 - red.jpg missed: red extra: maroon
053 - black_white.jpg missed: black
057 - white_gold or yellow.jpg missed: gold|yellow
069 - red_white.jpg missed: red
077 - teal_white.jpg missed: teal extra: green
083 - dark brown_white.jpg missed: dark brown extra: black
097 - gray_black.jpg missed: gray, black extra: light blue
132 - brown_white.jpg missed: brown extra: orange
134 - teal_white.jpg missed: teal extra: light blue
138 - maroon.jpg missed: maroon extra: red
142 - orange_white.jpg missed: orange extra: maroon
145 - green_white.jpg missed: green
150 - green_gray.jpg missed: green, gray extra: dark blue

========================================
Gemini 3 Flash + jersey_prompt_constrained.txt
Started: Tue Mar 3 06:05:53 PM MST 2026
========================================
Model: gemini-3-flash-preview
Images to process: 161
Concurrency: 8 workers
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_constrained.txt (2223 chars)
================================================================================
Pre-encoding images ... 161 images in 1.7s
Sending API requests ...
1/161 API calls completed
[progress lines 2/161 through 160/161 omitted]
161/161 API calls completed (344.4s total)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg GT: [brown, dark brown] VLM: [dark brown] (2 jersey(s), 36.3s) PASS exact:1, similar:1
[2/161] 002 - yellow.jpg GT: [yellow] VLM: [yellow] (2 jersey(s), 6.3s) PASS exact:1
[3/161] 003 - dark blue.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 7.5s) PASS similar:1
[4/161] 004 - purple_light blue.jpg GT: [purple, light blue] VLM: [light blue, purple] (2 jersey(s), 37.3s) PASS exact:2
[5/161] 005 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple]
(1 jersey(s), 4.5s) PARTIAL exact:1, MISS:gray [6/161] 006 - navy blue.jpg GT: [navy blue] VLM: [navy blue] (1 jersey(s), 5.0s) PASS exact:1 [7/161] 007 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 6.1s) PASS exact:1 [8/161] 008 -red or orange.jpg GT: [red|orange] VLM: [red] (1 jersey(s), 3.2s) PASS exact:1 [9/161] 009 - white_red.jpg GT: [red] VLM: [red] (4 jersey(s), 35.1s) PASS exact:1 [10/161] 010 - white_black.jpg GT: [black] VLM: [black] (3 jersey(s), 10.5s) PASS exact:1 [11/161] 011 - white or gray_purple.jpg GT: [gray, purple] VLM: [purple] (4 jersey(s), 40.8s) PARTIAL exact:1, MISS:gray [12/161] 012 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 5.3s) PASS exact:1 [13/161] 013 - light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 8.9s) PASS exact:1 [14/161] 014 - orange_dark blue or purple.jpg GT: [orange, dark blue|purple] VLM: [orange, purple] (3 jersey(s), 9.8s) PASS exact:2 [15/161] 015 - green.jpg GT: [green] VLM: [green] (2 jersey(s), 4.4s) PASS exact:1 [16/161] 016 - maroon.jpg GT: [maroon] VLM: [(none)] (0 jersey(s), 3.9s) FAIL MISS:maroon [17/161] 017 - brown_white.jpg GT: [brown] VLM: [dark brown] (2 jersey(s), 6.5s) PASS similar:1 [18/161] 018 - gray_red.jpg GT: [gray, red] VLM: [gray] (1 jersey(s), 8.7s) PARTIAL exact:1, MISS:red [19/161] 019 - maroon_gold.jpg GT: [maroon, gold] VLM: [maroon] (1 jersey(s), 4.5s) PARTIAL exact:1, MISS:gold [20/161] 020 - white_brown or orange.jpg GT: [brown|orange] VLM: [orange] (2 jersey(s), 4.9s) PASS exact:1 [21/161] 021 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 9.1s) PASS exact:1 [22/161] 022 - black_light blue.jpg GT: [black, light blue] VLM: [light blue] (1 jersey(s), 5.0s) PARTIAL exact:1, MISS:black [23/161] 023 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 5.2s) PASS exact:1 [24/161] 024 - white_pink.jpg GT: [pink] VLM: [pink] (2 jersey(s), 5.7s) PASS exact:1 [25/161] 025 - blue_green.jpg GT: [blue, green] VLM: [green] (1 jersey(s), 3.8s) PARTIAL 
exact:1, MISS:blue [26/161] 026 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 6.8s) PASS exact:1 [27/161] 027 - red_white.jpg GT: [red] VLM: [red] (4 jersey(s), 37.7s) PASS exact:1 [28/161] 028 - green_white.jpg GT: [green] VLM: [green] (6 jersey(s), 41.4s) PASS exact:1 [29/161] 029 -maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 7.1s) PASS exact:1 [30/161] 030 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (2 jersey(s), 5.8s) PASS similar:1 [31/161] 031 - brown_white.jpg GT: [brown] VLM: [brown] (2 jersey(s), 6.0s) PASS exact:1 [32/161] 032 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 5.9s) PASS exact:1 [33/161] 033 - navy blue_white or gray.jpg GT: [navy blue, gray] VLM: [blue] (8 jersey(s), 43.6s) PARTIAL similar:1, MISS:gray [34/161] 034 - light blue.jpg GT: [light blue] VLM: [blue] (1 jersey(s), 11.5s) FAIL MISS:light blue, extra:blue [35/161] 035 -green_gold or yellow.jpg GT: [green, gold|yellow] VLM: [green] (1 jersey(s), 11.6s) PARTIAL exact:1, MISS:gold|yellow [36/161] 036 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 9.7s) PASS exact:1 [37/161] 037 -navy_white.jpg GT: [navy] VLM: [navy blue] (3 jersey(s), 16.0s) PASS similar:1 [38/161] 038 - red_white.jpg GT: [red] VLM: [red] (3 jersey(s), 38.7s) PASS exact:1 [39/161] 039 - gray_white.jpg GT: [gray] VLM: [gray] (3 jersey(s), 18.4s) PASS exact:1 [40/161] 040 - maroon_gray.jpg GT: [maroon, gray] VLM: [gray, maroon] (2 jersey(s), 5.4s) PASS exact:2 [41/161] 041 - navy blue_white.jpg GT: [navy blue] VLM: [navy blue] (8 jersey(s), 41.5s) PASS exact:1 [42/161] 042 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 5.1s) PASS exact:1 [43/161] 043 - gray_black.jpg GT: [gray, black] VLM: [black, gray] (5 jersey(s), 39.3s) PASS exact:2 [44/161] 044 - purple_black.jpg GT: [purple, black] VLM: [purple] (8 jersey(s), 36.4s) PARTIAL exact:1, MISS:black [45/161] 045 - purple.jpg GT: [purple] VLM: [purple] (3 jersey(s), 36.0s) PASS exact:1 [46/161] 046 
- green.jpg GT: [green] VLM: [black] (8 jersey(s), 35.2s) FAIL MISS:green, extra:black [47/161] 047 - purple_white.jpg GT: [purple] VLM: [purple] (3 jersey(s), 5.3s) PASS exact:1 [48/161] 048 - red.jpg GT: [red] VLM: [(none)] (0 jersey(s), 36.0s) FAIL MISS:red [49/161] 049 - white_gold.jpg GT: [gold] VLM: [yellow] (2 jersey(s), 3.6s) PASS similar:1 [50/161] 050 - white_orange.jpg GT: [orange] VLM: [orange] (6 jersey(s), 40.4s) PASS exact:1 [51/161] 051 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 5.8s) PASS exact:1 [52/161] 052 - black_gold.jpg GT: [black, gold] VLM: [black] (1 jersey(s), 24.0s) PARTIAL exact:1, MISS:gold [53/161] 053 - black_white.jpg GT: [black] VLM: [(none)] (1 jersey(s), 4.3s) FAIL MISS:black [54/161] 054 - white_blue.jpg GT: [blue] VLM: [blue] (2 jersey(s), 6.5s) PASS exact:1 [55/161] 055 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (2 jersey(s), 12.6s) PASS exact:1, similar:1 [56/161] 056 - white_red.jpg GT: [red] VLM: [red] (4 jersey(s), 36.0s) PASS exact:1 [57/161] 057 - white_gold or yellow.jpg GT: [gold|yellow] VLM: [(none)] (1 jersey(s), 4.4s) FAIL MISS:gold|yellow [58/161] 058 - purple.jpg GT: [purple] VLM: [purple] (4 jersey(s), 6.2s) PASS exact:1 [59/161] 059 - black_gold.jpg GT: [black, gold] VLM: [gold] (1 jersey(s), 4.5s) PARTIAL exact:1, MISS:black [60/161] 060 - gray_navy blue.jpg GT: [gray, navy blue] VLM: [blue] (2 jersey(s), 7.1s) PARTIAL similar:1, MISS:gray [61/161] 061 - brown or orange.jpg GT: [brown|orange] VLM: [orange] (1 jersey(s), 3.4s) PASS exact:1 [62/161] 062 - orange_blue.jpg GT: [orange, blue] VLM: [blue, orange] (2 jersey(s), 4.8s) PASS exact:2 [63/161] 063 - dark brown.jpg GT: [dark brown] VLM: [brown] (1 jersey(s), 4.7s) PASS similar:1 [64/161] 064 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 5.3s) PASS exact:1 [65/161] 065 - green_gold.jpg GT: [green, gold] VLM: [green, yellow] (5 jersey(s), 37.1s) PASS exact:1, similar:1 [66/161] 066 - yellow.jpg GT: [yellow] VLM: [yellow] (1 
jersey(s), 6.6s) PASS exact:1 [67/161] 067 - red_white.jpg GT: [red] VLM: [red] (5 jersey(s), 36.5s) PASS exact:1 [68/161] 068 - gold.jpg GT: [gold] VLM: [gold] (1 jersey(s), 39.5s) PASS exact:1 [69/161] 069 - red_white.jpg GT: [red] VLM: [(none)] (5 jersey(s), 40.6s) FAIL MISS:red [70/161] 070 - green_white.jpg GT: [green] VLM: [green] (3 jersey(s), 7.9s) PASS exact:1 [71/161] 071 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 4.4s) PASS exact:1 [72/161] 072 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 5.6s) PASS exact:1 [73/161] 073 - maroon_white.jpg GT: [maroon] VLM: [maroon] (1 jersey(s), 4.2s) PASS exact:1 [74/161] 074 - white_orange.jpg GT: [orange] VLM: [(none)] (1 jersey(s), 8.9s) FAIL MISS:orange [75/161] 075 - green_white.jpg GT: [green] VLM: [green] (1 jersey(s), 5.0s) PASS exact:1 [76/161] 076 - light blue_white.jpg GT: [light blue] VLM: [light blue] (4 jersey(s), 38.6s) PASS exact:1 [77/161] 077 - teal_white.jpg GT: [teal] VLM: [green] (5 jersey(s), 34.5s) FAIL MISS:teal, extra:green [78/161] 078 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 5.7s) PASS exact:1 [79/161] 079 - blue_maroon.jpg GT: [blue, maroon] VLM: [blue, maroon] (6 jersey(s), 10.0s) PASS exact:2 [80/161] 080 - navy blue_white.jpg GT: [navy blue] VLM: [blue] (1 jersey(s), 7.9s) PASS similar:1 [81/161] 081 - navy blue.jpg GT: [navy blue] VLM: [light blue] (2 jersey(s), 6.6s) FAIL MISS:navy blue, extra:light blue [82/161] 082 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (3 jersey(s), 21.3s) PASS similar:1 [83/161] 083 - dark brown_white.jpg GT: [dark brown] VLM: [dark brown] (2 jersey(s), 40.1s) PASS exact:1 [84/161] 084 - dark brown_yellow.jpg GT: [dark brown, yellow] VLM: [dark brown, gold] (2 jersey(s), 8.6s) PASS exact:1, similar:1 [85/161] 085 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 25.5s) PASS exact:1 [86/161] 086 - dark brown_white.jpg GT: [dark brown] VLM: [dark brown] (1 jersey(s), 
38.5s) PASS exact:1 [87/161] 087 - white_light blue.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 10.2s) PASS exact:1 [88/161] 088 - white_maroon.jpg GT: [maroon] VLM: [(none)] (2 jersey(s), 34.9s) FAIL MISS:maroon [89/161] 089 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 7.7s) PASS exact:1 [90/161] 090 - maroon_white.jpg GT: [maroon] VLM: [maroon] (5 jersey(s), 36.9s) PASS exact:1 [91/161] 091 - teal.jpg GT: [teal] VLM: [teal] (3 jersey(s), 7.6s) PASS exact:1 [92/161] 092 - green_white.jpg GT: [green] VLM: [green] (6 jersey(s), 40.0s) PASS exact:1 [93/161] 093 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 6.6s) PASS similar:1 [94/161] 094 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 6.6s) PASS exact:1 [95/161] 095 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 35.6s) PASS exact:1 [96/161] 096 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 3.7s) PASS exact:1 [97/161] 097 - gray_black.jpg GT: [gray, black] VLM: [gray] (4 jersey(s), 39.1s) PARTIAL exact:1, MISS:black [98/161] 098 - teal_white.jpg GT: [teal] VLM: [teal] (2 jersey(s), 35.9s) PASS exact:1 [99/161] 099 - maroon_white.jpg GT: [maroon] VLM: [maroon] (3 jersey(s), 5.6s) PASS exact:1 [100/161] 100 - orange_white.jpg GT: [orange] VLM: [orange] (4 jersey(s), 34.6s) PASS exact:1 [101/161] 101 - green_white.jpg GT: [green] VLM: [green] (7 jersey(s), 38.7s) PASS exact:1 [102/161] 102 - yellow-black.jpg GT: [yellow, black] VLM: [black] (1 jersey(s), 7.1s) PARTIAL exact:1, MISS:yellow [103/161] 103 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 35.0s) PASS exact:1 [104/161] 104 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 35.3s) PASS exact:1 [105/161] 105 - orange.jpg GT: [orange] VLM: [orange] (2 jersey(s), 4.8s) PASS exact:1 [106/161] 106 - black_gray.jpg GT: [black, gray] VLM: [black, gray] (2 jersey(s), 6.9s) PASS exact:2 [107/161] 107 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 7.8s) PASS 
exact:1 [108/161] 108 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 5.3s) PASS exact:1 [109/161] 109 - purple_white.jpg GT: [purple] VLM: [purple] (2 jersey(s), 4.8s) PASS exact:1 [110/161] 110 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 7.0s) PASS exact:1 [111/161] 111 - orange_white.jpg GT: [orange] VLM: [orange] (2 jersey(s), 10.9s) PASS exact:1 [112/161] 112 - orange_white.jpg GT: [orange] VLM: [(none)] (0 jersey(s), 37.6s) FAIL MISS:orange [113/161] 113 - orange.jpg GT: [orange] VLM: [orange] (1 jersey(s), 3.5s) PASS exact:1 [114/161] 114 - black_white.jpg GT: [black] VLM: [black] (2 jersey(s), 5.5s) PASS exact:1 [115/161] 115 - navy blue_maroon.jpg GT: [navy blue, maroon] VLM: [blue, maroon] (4 jersey(s), 7.4s) PASS exact:1, similar:1 [116/161] 116 - gray_white.jpg GT: [gray] VLM: [gray] (2 jersey(s), 39.7s) PASS exact:1 [117/161] 117 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 37.5s) PASS exact:1 [118/161] 118 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (2 jersey(s), 12.1s) PASS similar:1 [119/161] 119 - black_yellow.jpg GT: [black, yellow] VLM: [black, yellow] (4 jersey(s), 36.4s) PASS exact:2 [120/161] 120 - red_dark blue.jpg GT: [red, dark blue] VLM: [navy blue, red] (3 jersey(s), 17.4s) PASS exact:1, similar:1 [121/161] 121 - orange_white.jpg GT: [orange] VLM: [orange] (3 jersey(s), 17.7s) PASS exact:1 [122/161] 122 - gray.jpg GT: [gray] VLM: [gray] (1 jersey(s), 4.1s) PASS exact:1 [123/161] 123 - teal_white.jpg GT: [teal] VLM: [teal] (4 jersey(s), 11.1s) PASS exact:1 [124/161] 124 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (4 jersey(s), 8.1s) PASS similar:1 [125/161] 125 - dark blue_maroon.jpg GT: [dark blue, maroon] VLM: [maroon, navy blue] (4 jersey(s), 17.9s) PASS exact:1, similar:1 [126/161] 126 - white_blue.jpg GT: [blue] VLM: [blue] (3 jersey(s), 6.8s) PASS exact:1 [127/161] 127 - yellow.jpg GT: [yellow] VLM: [black, gold] (5 jersey(s), 39.3s) PARTIAL similar:1, extra:black [128/161] 128 - 
green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 11.6s) PASS exact:1 [129/161] 129 - blue_white.jpg GT: [blue] VLM: [(none)] (3 jersey(s), 6.1s) FAIL MISS:blue [130/161] 130 - yellow_black.jpg GT: [yellow, black] VLM: [yellow] (1 jersey(s), 4.3s) PARTIAL exact:1, MISS:black [131/161] 131 - purple_orange.jpg GT: [purple, orange] VLM: [orange, purple] (3 jersey(s), 9.4s) PASS exact:2 [132/161] 132 - brown_white.jpg GT: [brown] VLM: [orange] (2 jersey(s), 36.4s) FAIL MISS:brown, extra:orange [133/161] 133 - light blue.png GT: [light blue] VLM: [light blue] (7 jersey(s), 38.8s) PASS exact:1 [134/161] 134 - teal_white.jpg GT: [teal] VLM: [light blue] (1 jersey(s), 11.2s) FAIL MISS:teal, extra:light blue [135/161] 135 - green.jpg GT: [green] VLM: [green] (1 jersey(s), 4.9s) PASS exact:1 [136/161] 136 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 6.8s) PASS exact:1 [137/161] 137 - green_white.jpg GT: [green] VLM: [green] (4 jersey(s), 9.8s) PASS exact:1 [138/161] 138 - maroon.jpg GT: [maroon] VLM: [red] (1 jersey(s), 4.3s) FAIL MISS:maroon, extra:red [139/161] 139 - dark blue_white.jpg GT: [dark blue] VLM: [navy blue] (1 jersey(s), 5.3s) PASS similar:1 [140/161] 140 - red_white.jpg GT: [red] VLM: [red] (2 jersey(s), 5.3s) PASS exact:1 [141/161] 141 - light blue_white.jpg GT: [light blue] VLM: [light blue] (3 jersey(s), 6.3s) PASS exact:1 [142/161] 142 - orange_white.jpg GT: [orange] VLM: [orange] (1 jersey(s), 5.3s) PASS exact:1 [143/161] 143 - blue_white.jpg GT: [blue] VLM: [blue] (3 jersey(s), 5.7s) PASS exact:1 [144/161] 144 - green.jpg GT: [green] VLM: [green] (8 jersey(s), 38.3s) PASS exact:1 [145/161] 145 - green_white.jpg GT: [green] VLM: [green] (2 jersey(s), 7.3s) PASS exact:1 [146/161] 146 - red_gray.jpg GT: [red, gray] VLM: [gray, red] (2 jersey(s), 4.7s) PASS exact:2 [147/161] 147 - green.jpg GT: [green] VLM: [green] (3 jersey(s), 5.2s) PASS exact:1 [148/161] 148 - yellow_purple.jpg GT: [yellow, purple] VLM: [purple, yellow] (2 jersey(s), 
8.4s) PASS exact:2
[149/161] 149 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 38.0s) PASS exact:1
[150/161] 150 - green_gray.jpg GT: [green, gray] VLM: [black] (2 jersey(s), 10.3s) FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg GT: [yellow, black] VLM: [gold, navy blue] (6 jersey(s), 35.2s) PARTIAL similar:1, MISS:black, extra:navy blue
[152/161] 152 - pink_dark blue.jpg GT: [pink, dark blue] VLM: [navy blue, pink] (3 jersey(s), 7.9s) PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg GT: [maroon] VLM: [maroon] (2 jersey(s), 4.6s) PASS exact:1
[154/161] 154 - dark brown.jpeg GT: [dark brown] VLM: [brown] (5 jersey(s), 8.9s) PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg GT: [green, gray, purple, yellow] VLM: [gold, gray, purple] (5 jersey(s), 21.6s) PARTIAL exact:2, similar:1, MISS:green
[156/161] 156 - maroon_gray.jpg GT: [maroon, gray] VLM: [maroon] (2 jersey(s), 15.0s) PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 37.0s) PASS exact:1
[158/161] 158 - dark blue_yellow.jpg GT: [dark blue, yellow] VLM: [gold, navy blue] (5 jersey(s), 37.4s) PASS similar:2
[159/161] 159 - blue_white.jpg GT: [blue] VLM: [blue] (5 jersey(s), 10.1s) PASS exact:1
[160/161] 160 - blue_white.jpg GT: [blue] VLM: [(none)] (1 jersey(s), 4.3s) FAIL MISS:blue
[161/161] 161 - light blue_white.jpg GT: [light blue] VLM: [light blue] (2 jersey(s), 4.4s) PASS exact:1

================================================================================
ACCURACY SUMMARY (gemini-3-flash-preview)
================================================================================
Images processed: 161
Errors: 0
Total time: 344.4s (2.1s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 174 (excluding white)

--- Recall (did VLM find each ground truth color?) ---
Exact match: 137 / 202 (67.8%)
Similar match: 28 / 202 (13.9%)
Total found: 165 / 202 (81.7%)
Missed: 37 / 202 (18.3%)

--- Precision (are VLM colors correct?) ---
Exact match: 137 / 174 (78.7%)
Similar match: 27 / 174 (15.5%)
Total correct: 164 / 174 (94.3%)
Extra/wrong: 10 / 174 (5.7%)

--- Similar-Match Confusions (expected -> got) ---
dark blue -> navy blue x10
navy blue -> blue x5
yellow -> gold x5
gold -> yellow x3
brown -> dark brown x2
dark brown -> brown x2
navy -> navy blue x1

--- Most Missed Ground Truth Colors ---
black 7 #######
gray 6 ######
maroon 3 ###
red 3 ###
blue 3 ###
green 3 ###
gold 2 ##
gold|yellow 2 ##
orange 2 ##
teal 2 ##
light blue 1 #
navy blue 1 #
yellow 1 #
brown 1 #

--- Most Common Extra/Wrong VLM Colors ---
black 3 ###
light blue 2 ##
blue 1 #
green 1 #
orange 1 #
red 1 #
navy blue 1 #

--- Per-Image Verdict ---
PASS 124
PARTIAL 19
FAIL 18

--- Failed Images (18) ---
016 - maroon.jpg missed: maroon
034 - light blue.jpg missed: light blue extra: blue
046 - green.jpg missed: green extra: black
048 - red.jpg missed: red
053 - black_white.jpg missed: black
057 - white_gold or yellow.jpg missed: gold|yellow
069 - red_white.jpg missed: red
074 - white_orange.jpg missed: orange
077 - teal_white.jpg missed: teal extra: green
081 - navy blue.jpg missed: navy blue extra: light blue
088 - white_maroon.jpg missed: maroon
112 - orange_white.jpg missed: orange
129 - blue_white.jpg missed: blue
132 - brown_white.jpg missed: brown extra: orange
134 - teal_white.jpg missed: teal extra: light blue
138 - maroon.jpg missed: maroon extra: red
150 - green_gray.jpg missed: green, gray extra: dark blue
160 - blue_white.jpg missed: blue

========================================
All tests completed at: Tue Mar 3 06:11:40 PM MST 2026
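
Note on the summary arithmetic: the recall and precision rows in both ACCURACY SUMMARY blocks can be recomputed from the raw counts the log prints. The sketch below is not the test harness's own code; the function names `pct` and `summarize` are hypothetical. It assumes "Total found" is exact plus similar matches over ground truth colors, and "Total correct" is exact plus similar matches over VLM colors. The similar-match count is passed separately for each side because the Gemini run reports 28 on the recall side and 27 on the precision side. This is possibly because one VLM color can satisfy two ground-truth entries, as in image 001, but the log does not state the reason.

```python
def pct(count, total):
    """Percentage of total, rounded to one decimal place like the summary rows."""
    return round(100.0 * count / total, 1)


def summarize(exact, similar_recall, similar_precision, gt_total, vlm_total):
    """Recompute the headline recall/precision figures from raw match counts.

    Hypothetical helper: the formulas are inferred from the printed summary,
    not taken from the harness that produced this log.
    """
    found = exact + similar_recall        # ground-truth colors the VLM found
    correct = exact + similar_precision   # VLM colors judged correct
    return {
        "recall_total": pct(found, gt_total),
        "missed": gt_total - found,
        "precision_total": pct(correct, vlm_total),
        "extra": vlm_total - correct,
    }


# Counts copied from the two summaries in this log.
qwen = summarize(145, 22, 22, gt_total=202, vlm_total=185)
gemini = summarize(137, 28, 27, gt_total=202, vlm_total=174)
print("Qwen3-VL-8B:   ", qwen)
print("Gemini 3 Flash:", gemini)
```

Under these assumptions the recomputed totals agree with the printed ones: 82.7% recall and 90.3% precision for Qwen3-VL-8B, and 81.7% recall and 94.3% precision for Gemini 3 Flash.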