jersey_test/accuracy_test_results_all.txt
Rick McEwen 5405d7f7dc Add accuracy test framework, prompts, results, and analysis reports
Includes accuracy test scripts for Qwen (local) and Gemini (cloud API),
three prompt variants (original, capstone, constrained), test results
from all runs, and two analysis reports with an HTML presentation version.
2026-03-03 18:44:49 -07:00


========================================
Qwen3-VL-8B + jersey_prompt.txt
Started: Tue Mar 3 04:40:45 PM MST 2026
========================================
Images to process: 161
Server: http://agx:8080
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt.txt (1504 chars)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg
GT: [brown, dark brown]
VLM: [black] (3 jersey(s), 11.1s)
FAIL MISS:brown,dark brown, extra:black
[2/161] 002 - yellow.jpg
GT: [yellow]
VLM: [yellow] (2 jersey(s), 7.9s)
PASS exact:1
[3/161] 003 - dark blue.jpg
GT: [dark blue]
VLM: [blue] (3 jersey(s), 10.8s)
PASS similar:1
[4/161] 004 - purple_light blue.jpg
GT: [purple, light blue]
VLM: [light blue, purple] (3 jersey(s), 11.9s)
PASS exact:2
[5/161] 005 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (1 jersey(s), 5.0s)
PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (1 jersey(s), 4.3s)
PASS similar:1
[7/161] 007 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 7.9s)
PASS exact:1
[8/161] 008 -red or orange.jpg
GT: [red|orange]
VLM: [red] (1 jersey(s), 4.3s)
PASS exact:1
[9/161] 009 - white_red.jpg
GT: [red]
VLM: [gold, red] (3 jersey(s), 10.8s)
PARTIAL exact:1, extra:gold
[10/161] 010 - white_black.jpg
GT: [black]
VLM: [black] (3 jersey(s), 10.9s)
PASS exact:1
[11/161] 011 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (4 jersey(s), 13.8s)
PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.3s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 7.5s)
FAIL MISS:light blue, extra:blue
[14/161] 014 - orange_dark blue or purple.jpg
GT: [orange, dark blue|purple]
VLM: [orange, purple] (3 jersey(s), 10.9s)
PASS exact:2
[15/161] 015 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.3s)
PASS exact:1
[16/161] 016 - maroon.jpg
GT: [maroon]
VLM: [(none)] (0 jersey(s), 1.5s)
FAIL MISS:maroon
[17/161] 017 - brown_white.jpg
GT: [brown]
VLM: [black] (2 jersey(s), 8.8s)
FAIL MISS:brown, extra:black
[18/161] 018 - gray_red.jpg
GT: [gray, red]
VLM: [gray, red] (2 jersey(s), 7.4s)
PASS exact:2
[19/161] 019 - maroon_gold.jpg
GT: [maroon, gold]
VLM: [red, yellow] (2 jersey(s), 7.7s)
PARTIAL similar:1, MISS:maroon, extra:red
[20/161] 020 - white_brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 8.1s)
PASS exact:1
[21/161] 021 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.9s)
PASS exact:1
[22/161] 022 - black_light blue.jpg
GT: [black, light blue]
VLM: [light blue] (1 jersey(s), 4.9s)
PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.8s)
PASS exact:1
[24/161] 024 - white_pink.jpg
GT: [pink]
VLM: [pink] (2 jersey(s), 7.8s)
PASS exact:1
[25/161] 025 - blue_green.jpg
GT: [blue, green]
VLM: [green] (1 jersey(s), 4.3s)
PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[27/161] 027 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 16.3s)
PASS exact:1
[28/161] 028 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.8s)
PASS exact:1
[29/161] 029 -maroon_white.jpg
GT: [maroon]
VLM: [red] (2 jersey(s), 7.8s)
FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 7.8s)
PASS similar:1
[31/161] 031 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 7.8s)
PASS exact:1
[32/161] 032 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 8.0s)
PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg
GT: [navy blue, gray]
VLM: [blue] (3 jersey(s), 10.9s)
PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg
GT: [light blue]
VLM: [blue] (1 jersey(s), 4.7s)
FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg
GT: [green, gold|yellow]
VLM: [green, yellow] (2 jersey(s), 8.1s)
PASS exact:2
[36/161] 036 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (4 jersey(s), 13.7s)
FAIL MISS:light blue, extra:blue
[37/161] 037 -navy_white.jpg
GT: [navy]
VLM: [blue] (3 jersey(s), 10.1s)
PASS similar:1
[38/161] 038 - red_white.jpg
GT: [red]
VLM: [red] (3 jersey(s), 10.9s)
PASS exact:1
[39/161] 039 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 7.9s)
PASS exact:1
[40/161] 040 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (1 jersey(s), 5.1s)
PARTIAL exact:1, MISS:gray
[41/161] 041 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (8 jersey(s), 25.7s)
PASS similar:1
[42/161] 042 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.8s)
PASS exact:1
[43/161] 043 - gray_black.jpg
GT: [gray, black]
VLM: [black, gray] (2 jersey(s), 7.9s)
PASS exact:2
[44/161] 044 - purple_black.jpg
GT: [purple, black]
VLM: [purple] (5 jersey(s), 16.6s)
PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.8s)
PASS exact:1
[46/161] 046 - green.jpg
GT: [green]
VLM: [black] (15 jersey(s), 46.4s)
FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 10.7s)
PASS exact:1
[48/161] 048 - red.jpg
GT: [red]
VLM: [red] (1 jersey(s), 4.9s)
PASS exact:1
[49/161] 049 - white_gold.jpg
GT: [gold]
VLM: [yellow] (2 jersey(s), 7.9s)
PASS similar:1
[50/161] 050 - white_orange.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 13.8s)
PASS exact:1
[51/161] 051 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.9s)
PASS exact:1
[52/161] 052 - black_gold.jpg
GT: [black, gold]
VLM: [black, yellow] (2 jersey(s), 7.8s)
PASS exact:1, similar:1
[53/161] 053 - black_white.jpg
GT: [black]
VLM: [(none)] (1 jersey(s), 4.9s)
FAIL MISS:black
[54/161] 054 - white_blue.jpg
GT: [blue]
VLM: [blue] (2 jersey(s), 7.7s)
PASS exact:1
[55/161] 055 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (2 jersey(s), 7.8s)
PASS exact:1, similar:1
[56/161] 056 - white_red.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.9s)
PASS exact:1
[57/161] 057 - white_gold or yellow.jpg
GT: [gold|yellow]
VLM: [yellow] (2 jersey(s), 7.9s)
PASS exact:1
[58/161] 058 - purple.jpg
GT: [purple]
VLM: [purple] (4 jersey(s), 14.0s)
PASS exact:1
[59/161] 059 - black_gold.jpg
GT: [black, gold]
VLM: [gold] (1 jersey(s), 4.9s)
PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg
GT: [gray, navy blue]
VLM: [blue] (2 jersey(s), 7.9s)
PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 7.9s)
PASS exact:1
[62/161] 062 - orange_blue.jpg
GT: [orange, blue]
VLM: [blue, orange] (2 jersey(s), 7.5s)
PASS exact:2
[63/161] 063 - dark brown.jpg
GT: [dark brown]
VLM: [black] (1 jersey(s), 4.9s)
FAIL MISS:dark brown, extra:black
[64/161] 064 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.7s)
PASS exact:1
[65/161] 065 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (3 jersey(s), 10.4s)
PASS exact:1, similar:1
[66/161] 066 - yellow.jpg
GT: [yellow]
VLM: [yellow] (1 jersey(s), 4.7s)
PASS exact:1
[67/161] 067 - red_white.jpg
GT: [red]
VLM: [red] (4 jersey(s), 13.8s)
PASS exact:1
[68/161] 068 - gold.jpg
GT: [gold]
VLM: [gold] (1 jersey(s), 4.8s)
PASS exact:1
[69/161] 069 - red_white.jpg
GT: [red]
VLM: [(none)] (4 jersey(s), 13.7s)
FAIL MISS:red
[70/161] 070 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.8s)
PASS exact:1
[71/161] 071 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.9s)
PASS exact:1
[72/161] 072 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.5s)
PASS exact:1
[73/161] 073 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.4s)
PASS exact:1
[74/161] 074 - white_orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.5s)
PASS exact:1
[75/161] 075 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.8s)
PASS exact:1
[76/161] 076 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (3 jersey(s), 11.4s)
PASS exact:1
[77/161] 077 - teal_white.jpg
GT: [teal]
VLM: [green] (4 jersey(s), 13.4s)
FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 7.6s)
FAIL MISS:light blue, extra:blue
[79/161] 079 - blue_maroon.jpg
GT: [blue, maroon]
VLM: [blue, red] (4 jersey(s), 13.8s)
PARTIAL exact:1, MISS:maroon, extra:red
[80/161] 080 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 7.8s)
PASS similar:1
[81/161] 081 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 7.8s)
PASS similar:1
[82/161] 082 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (3 jersey(s), 10.6s)
PASS similar:1
[83/161] 083 - dark brown_white.jpg
GT: [dark brown]
VLM: [black] (2 jersey(s), 7.8s)
FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg
GT: [dark brown, yellow]
VLM: [black, yellow] (2 jersey(s), 7.9s)
PARTIAL exact:1, MISS:dark brown, extra:black
[85/161] 085 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 4.8s)
PASS exact:1
[86/161] 086 - dark brown_white.jpg
GT: [dark brown]
VLM: [brown] (2 jersey(s), 8.0s)
PASS similar:1
[87/161] 087 - white_light blue.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 7.8s)
FAIL MISS:light blue, extra:blue
[88/161] 088 - white_maroon.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.9s)
PASS exact:1
[89/161] 089 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 10.8s)
PASS exact:1
[90/161] 090 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (4 jersey(s), 14.2s)
PASS exact:1
[91/161] 091 - teal.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 8.0s)
PASS exact:1
[92/161] 092 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 13.7s)
PASS exact:1
[93/161] 093 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 7.9s)
PASS similar:1
[94/161] 094 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 12.5s)
PASS exact:1
[95/161] 095 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[96/161] 096 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 8.6s)
PASS exact:1
[97/161] 097 - gray_black.jpg
GT: [gray, black]
VLM: [gray] (2 jersey(s), 8.0s)
PARTIAL exact:1, MISS:black
[98/161] 098 - teal_white.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 8.7s)
PASS exact:1
[99/161] 099 - maroon_white.jpg
GT: [maroon]
VLM: [red] (3 jersey(s), 12.0s)
FAIL MISS:maroon, extra:red
[100/161] 100 - orange_white.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 13.9s)
PASS exact:1
[101/161] 101 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 17.0s)
PASS exact:1
[102/161] 102 - yellow-black.jpg
GT: [yellow, black]
VLM: [black, yellow] (3 jersey(s), 10.9s)
PASS exact:2
[103/161] 103 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 11.1s)
PASS exact:1
[104/161] 104 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.0s)
PASS exact:1
[105/161] 105 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 9.1s)
PASS exact:1
[106/161] 106 - black_gray.jpg
GT: [black, gray]
VLM: [black, gray] (2 jersey(s), 9.0s)
PASS exact:2
[107/161] 107 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.7s)
PASS exact:1
[108/161] 108 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.9s)
PASS exact:1
[109/161] 109 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.8s)
PASS exact:1
[110/161] 110 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 13.9s)
PASS exact:1
[111/161] 111 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 8.0s)
PASS exact:1
[112/161] 112 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.8s)
PASS exact:1
[113/161] 113 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.9s)
PASS exact:1
[114/161] 114 - black_white.jpg
GT: [black]
VLM: [black] (2 jersey(s), 8.2s)
PASS exact:1
[115/161] 115 - navy blue_maroon.jpg
GT: [navy blue, maroon]
VLM: [blue, red] (4 jersey(s), 13.8s)
PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 7.9s)
PASS exact:1
[117/161] 117 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.1s)
PASS exact:1
[118/161] 118 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 7.4s)
PASS similar:1
[119/161] 119 - black_yellow.jpg
GT: [black, yellow]
VLM: [black, yellow] (3 jersey(s), 10.9s)
PASS exact:2
[120/161] 120 - red_dark blue.jpg
GT: [red, dark blue]
VLM: [blue, red] (3 jersey(s), 10.7s)
PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 10.9s)
PASS exact:1
[122/161] 122 - gray.jpg
GT: [gray]
VLM: [gray] (1 jersey(s), 6.2s)
PASS exact:1
[123/161] 123 - teal_white.jpg
GT: [teal]
VLM: [teal] (3 jersey(s), 10.9s)
PASS exact:1
[124/161] 124 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (4 jersey(s), 13.7s)
PASS similar:1
[125/161] 125 - dark blue_maroon.jpg
GT: [dark blue, maroon]
VLM: [blue, red] (2 jersey(s), 8.2s)
PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 10.8s)
PASS exact:1
[127/161] 127 - yellow.jpg
GT: [yellow]
VLM: [yellow] (4 jersey(s), 14.0s)
PASS exact:1
[128/161] 128 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[129/161] 129 - blue_white.jpg
GT: [blue]
VLM: [(none)] (3 jersey(s), 10.9s)
FAIL MISS:blue
[130/161] 130 - yellow_black.jpg
GT: [yellow, black]
VLM: [black, yellow] (2 jersey(s), 8.4s)
PASS exact:2
[131/161] 131 - purple_orange.jpg
GT: [purple, orange]
VLM: [orange, purple] (3 jersey(s), 10.8s)
PASS exact:2
[132/161] 132 - brown_white.jpg
GT: [brown]
VLM: [orange] (3 jersey(s), 10.9s)
FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png
GT: [light blue]
VLM: [light blue] (6 jersey(s), 21.1s)
PASS exact:1
[134/161] 134 - teal_white.jpg
GT: [teal]
VLM: [blue] (1 jersey(s), 4.9s)
FAIL MISS:teal, extra:blue
[135/161] 135 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.0s)
PASS exact:1
[136/161] 136 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.1s)
PASS exact:1
[137/161] 137 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.9s)
PASS exact:1
[138/161] 138 - maroon.jpg
GT: [maroon]
VLM: [red] (1 jersey(s), 4.9s)
FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 8.0s)
PASS similar:1
[140/161] 140 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.6s)
PASS exact:1
[141/161] 141 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (3 jersey(s), 11.1s)
FAIL MISS:light blue, extra:blue
[142/161] 142 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 8.1s)
PASS exact:1
[143/161] 143 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 11.0s)
PASS exact:1
[144/161] 144 - green.jpg
GT: [green]
VLM: [green] (10 jersey(s), 31.9s)
PASS exact:1
[145/161] 145 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[146/161] 146 - red_gray.jpg
GT: [red, gray]
VLM: [gray, red] (2 jersey(s), 8.0s)
PASS exact:2
[147/161] 147 - green.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.8s)
PASS exact:1
[148/161] 148 - yellow_purple.jpg
GT: [yellow, purple]
VLM: [purple, yellow] (2 jersey(s), 7.9s)
PASS exact:2
[149/161] 149 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 16.7s)
PASS exact:1
[150/161] 150 - green_gray.jpg
GT: [green, gray]
VLM: [black] (2 jersey(s), 7.8s)
FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg
GT: [yellow, black]
VLM: [blue, yellow] (5 jersey(s), 16.7s)
PARTIAL exact:1, MISS:black, extra:blue
[152/161] 152 - pink_dark blue.jpg
GT: [pink, dark blue]
VLM: [blue, pink] (2 jersey(s), 7.8s)
PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.0s)
PASS exact:1
[154/161] 154 - dark brown.jpeg
GT: [dark brown]
VLM: [brown] (5 jersey(s), 16.8s)
PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg
GT: [green, gray, purple, yellow]
VLM: [gray, purple, yellow] (5 jersey(s), 17.3s)
PARTIAL exact:3, MISS:green
[156/161] 156 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (2 jersey(s), 7.7s)
PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 10.7s)
PASS exact:1
[158/161] 158 - dark blue_yellow.jpg
GT: [dark blue, yellow]
VLM: [blue, yellow] (4 jersey(s), 14.0s)
PASS exact:1, similar:1
[159/161] 159 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 13.9s)
PASS exact:1
[160/161] 160 - blue_white.jpg
GT: [blue]
VLM: [(none)] (1 jersey(s), 4.9s)
FAIL MISS:blue
[161/161] 161 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 7.7s)
FAIL MISS:light blue, extra:blue
================================================================================
ACCURACY SUMMARY
================================================================================
Images processed: 161
Errors: 0
Total time: 1557.4s (9.7s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 184 (excluding white)
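The ground-truth lists above appear to be derived from the image filenames. "_" (and in one case "-") separates jersey colors, " or " marks acceptable alternatives (logged as "a|b"), and white is dropped. The sketch below reproduces that mapping for the filenames in this log. parse_ground_truth is a hypothetical name, and the rules are inferred from the logged GT lines, not taken from the actual test script.

```python
import os
import re


def parse_ground_truth(filename):
    """Map an image filename to its ground-truth color list (white excluded).

    Rules inferred from this log: strip the extension and the leading
    "NNN - " index, split colors on "_" or "-", and treat " or " as
    alternatives joined with "|".
    """
    stem, _ext = os.path.splitext(filename)
    stem = re.sub(r"^\d+\s*-\s*", "", stem)  # also handles "001 -brown..."
    colors = []
    for part in re.split(r"[_-]", stem):
        alts = [a.strip() for a in part.split(" or ")]
        alts = [a for a in alts if a and a != "white"]
        if alts:  # a part that was only "white" disappears entirely
            colors.append("|".join(alts))
    return colors


print(parse_ground_truth("014 - orange_dark blue or purple.jpg"))
```

Applied to "005 - white or gray_purple.jpg", the white alternative is removed and the entry collapses to plain "gray", which matches the GT line logged for that image.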
--- Recall (did VLM find each ground truth color?) ---
Exact match: 132 / 202 (65.3%)
Similar match: 26 / 202 (12.9%)
Total found: 158 / 202 (78.2%)
Missed: 44 / 202 (21.8%)
--- Precision (are VLM colors correct?) ---
Exact match: 132 / 184 (71.7%)
Similar match: 26 / 184 (14.1%)
Total correct: 158 / 184 (85.9%)
Extra/wrong: 26 / 184 (14.1%)
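The recall and precision percentages above follow from four counts: exact matches, similar matches, ground-truth colors and VLM colors. The sketch below recomputes them. It assumes, as holds for this run's numbers, that the same 158 matches serve as the numerator for both recall and precision. That need not hold in general, because a single VLM color can satisfy two ground-truth entries (see image 001 in the Gemini run).

```python
def pct(n, d):
    """Percentage rounded to one decimal, as printed in the summary."""
    return round(100 * n / d, 1)


def summarize(exact, similar, gt_total, vlm_total):
    found = exact + similar
    return {
        # Recall: share of ground-truth colors the VLM found.
        "recall_exact": pct(exact, gt_total),
        "recall_similar": pct(similar, gt_total),
        "recall_total": pct(found, gt_total),
        "missed": pct(gt_total - found, gt_total),
        # Precision: share of VLM colors that match the ground truth.
        "precision_exact": pct(exact, vlm_total),
        "precision_similar": pct(similar, vlm_total),
        "precision_total": pct(found, vlm_total),
        "extra": pct(vlm_total - found, vlm_total),
    }


stats = summarize(exact=132, similar=26, gt_total=202, vlm_total=184)
print(stats)
```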
--- Similar-Match Confusions (expected -> got) ---
dark blue -> blue x10
navy blue -> blue x8
gold -> yellow x5
dark brown -> brown x2
navy -> blue x1
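The per-image records distinguish exact matches, "similar" matches (the confusions listed above), misses and extras. The sketch below is one scoring procedure consistent with every record in this log, not the test script's own code. The SIMILAR_PAIRS table is a hypothetical reconstruction: it holds the confusions above plus pairs the Gemini run scores as similar (grey for gray, navy blue for dark blue or navy). maroon/red and light blue/blue are deliberately left out because the log scores them as a miss plus an extra.

```python
# Pairs the log scores as "similar" rather than as a miss. Inferred from
# this log (both runs); hypothetical, not the test script's own table.
SIMILAR_PAIRS = [
    ("dark blue", "blue"), ("navy blue", "blue"), ("navy", "blue"),
    ("dark blue", "navy blue"), ("navy", "navy blue"),
    ("gold", "yellow"), ("dark brown", "brown"), ("gray", "grey"),
]
SIMILAR = {frozenset(pair) for pair in SIMILAR_PAIRS}


def is_similar(a, b):
    return frozenset((a, b)) in SIMILAR


def score_image(gt, vlm):
    """Score one image. gt entries may hold alternatives as "a|b"; white is
    assumed to be removed already. Returns (exact, similar, missed, extra)."""
    exact = similar = 0
    missed = []
    used = set()  # VLM colors that matched something; one may match several GT entries
    for entry in gt:
        alts = entry.split("|")
        hits = [c for c in vlm if c in alts]
        if hits:
            exact += 1
            used.update(hits)
            continue
        near = [c for c in vlm if any(is_similar(a, c) for a in alts)]
        if near:
            similar += 1
            used.update(near)
        else:
            missed.append(entry)
    extra = [c for c in vlm if c not in used]
    return exact, similar, missed, extra


# Image 019 above: gold ~ yellow counts, maroon vs red does not.
print(score_image(["maroon", "gold"], ["red", "yellow"]))
```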
--- Most Missed Ground Truth Colors ---
maroon 8 ########
gray 7 #######
light blue 7 #######
black 6 ######
dark brown 4 ####
brown 3 ###
blue 3 ###
green 3 ###
teal 2 ##
red 1 #
--- Most Common Extra/Wrong VLM Colors ---
blue 9 #########
black 7 #######
red 7 #######
gold 1 #
green 1 #
orange 1 #
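The two frequency lists above tally misses and extras across all images and draw one "#" per occurrence. A minimal sketch of that tally and rendering follows. tally and bar_chart are hypothetical names, and the sample data is a small subset of the Qwen run, not the full result.

```python
from collections import Counter


def tally(per_image):
    """Count colors across images; per_image is a list of per-image lists."""
    counts = Counter()
    for colors in per_image:
        counts.update(colors)
    return counts


def bar_chart(counts, top=10):
    """Render "color count ####" rows, most frequent first."""
    return [f"{color} {n} {'#' * n}" for color, n in counts.most_common(top)]


# Extras from four of the failed Qwen images above (013, 017, 029, 099).
extras = tally([["blue"], ["black"], ["red"], ["red"]])
for row in bar_chart(extras):
    print(row)
```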
--- Per-Image Verdict ---
PASS 118
PARTIAL 19
FAIL 24
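Every PASS/PARTIAL/FAIL line in this log is consistent with the rule sketched below. It is inferred from the records, not read from the script. An image fails when nothing in its ground truth was found, exactly or similarly. It passes when everything was found and no extra color was reported. Anything in between is PARTIAL; for example, image 009 is PARTIAL only because of the extra "gold".

```python
from collections import Counter


def verdict(exact, similar, missed, extra):
    """PASS/PARTIAL/FAIL rule inferred from this log's per-image records."""
    if exact + similar == 0:
        return "FAIL"      # nothing in the ground truth was found
    if not missed and not extra:
        return "PASS"      # everything found, nothing spurious
    return "PARTIAL"       # something found, but a miss or an extra remains


# (exact, similar, missed, extra) for images 003, 009, 013, 053 and 155 above.
records = [
    (0, 1, [], []),
    (1, 0, [], ["gold"]),
    (0, 0, ["light blue"], ["blue"]),
    (0, 0, ["black"], []),
    (3, 0, ["green"], []),
]
print(Counter(verdict(*r) for r in records))
```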
--- Failed Images (24) ---
001 -brown_white or dark brown.jpg
missed: brown, dark brown
extra: black
013 - light blue.jpg
missed: light blue
extra: blue
016 - maroon.jpg
missed: maroon
017 - brown_white.jpg
missed: brown
extra: black
029 -maroon_white.jpg
missed: maroon
extra: red
034 - light blue.jpg
missed: light blue
extra: blue
036 - light blue_white.jpg
missed: light blue
extra: blue
046 - green.jpg
missed: green
extra: black
053 - black_white.jpg
missed: black
063 - dark brown.jpg
missed: dark brown
extra: black
069 - red_white.jpg
missed: red
077 - teal_white.jpg
missed: teal
extra: green
078 - light blue_white.jpg
missed: light blue
extra: blue
083 - dark brown_white.jpg
missed: dark brown
extra: black
087 - white_light blue.jpg
missed: light blue
extra: blue
099 - maroon_white.jpg
missed: maroon
extra: red
129 - blue_white.jpg
missed: blue
132 - brown_white.jpg
missed: brown
extra: orange
134 - teal_white.jpg
missed: teal
extra: blue
138 - maroon.jpg
missed: maroon
extra: red
141 - light blue_white.jpg
missed: light blue
extra: blue
150 - green_gray.jpg
missed: green, gray
extra: black
160 - blue_white.jpg
missed: blue
161 - light blue_white.jpg
missed: light blue
extra: blue
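The failed-image list can be re-derived from the per-image records themselves, which is a quick way to check a summary against the log. A minimal sketch, assuming records keep the "[n/total] filename" header followed by a verdict line as above. Because this file concatenates several runs, the lines should first be split at each run's header block. The function resets after each verdict so that the summary's own "FAIL 24" line is not mistaken for a record.

```python
import re

# Matches record headers such as "[13/161] 013 - light blue.jpg".
RECORD = re.compile(r"^\[(\d+)/(\d+)\] (.+)$")


def failed_images(lines):
    """Return the filenames whose record ends in a FAIL verdict."""
    failed, name = [], None
    for raw in lines:
        line = raw.strip()
        m = RECORD.match(line)
        if m:
            name = m.group(3)
        elif name is not None and line.startswith(("PASS", "PARTIAL", "FAIL")):
            if line.startswith("FAIL"):
                failed.append(name)
            name = None  # one verdict per record; summary tallies are ignored


    return failed


sample = """[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.3s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 7.5s)
FAIL MISS:light blue, extra:blue
FAIL 24""".splitlines()
print(failed_images(sample))
```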
========================================
Gemini 3 Flash + jersey_prompt.txt
Started: Tue Mar 3 05:06:43 PM MST 2026
========================================
Model: gemini-3-flash-preview
Images to process: 161
Concurrency: 8 workers
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt.txt (1504 chars)
================================================================================
Pre-encoding images ... 161 images in 1.7s
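The Gemini script encodes all images before sending any request. A plausible reading, and it is only an assumption since the log does not show the payload format, is that each file is read once and base64-encoded with its MIME type, so the worker threads only have to send requests. The sketch below does that with the standard library. pre_encode is a hypothetical name. The dataset mixes .jpg, .jpeg and .png files, hence the per-file MIME lookup.

```python
import base64
import mimetypes
from pathlib import Path


def pre_encode(paths):
    """Read each image once and return {filename: (mime_type, base64_text)}."""
    encoded = {}
    for path in map(Path, paths):
        mime, _ = mimetypes.guess_type(path.name)
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        encoded[path.name] = (mime or "application/octet-stream", data)
    return encoded
```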
Sending API requests ...
161/161 API calls completed (253.2s total)
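With "Concurrency: 8 workers", up to eight requests are in flight at once, so calls complete out of order while the per-image report is still printed in image order. The sketch below shows that pattern with a standard-library thread pool. run_concurrently is a hypothetical name, and call stands in for whatever function the script uses to send one image to the API. The progress messages mirror the format of the log line above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_concurrently(items, call, workers=8):
    """Run call(item) on a thread pool, printing progress as calls finish.
    Results are returned in input order, not completion order."""
    results = [None] * len(items)
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(call, item): i for i, item in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            msg = f"{done}/{len(items)} API calls completed"
            if done == len(items):
                msg += f" ({time.monotonic() - start:.1f}s total)"
            print(msg)
    return results
```

For example, run_concurrently(["001.jpg", "002.jpg"], len, workers=2) prints two progress lines and returns [7, 7].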
================================================================================
[1/161] 001 -brown_white or dark brown.jpg
GT: [brown, dark brown]
VLM: [brown] (1 jersey(s), 9.0s)
PASS exact:1, similar:1
[2/161] 002 - yellow.jpg
GT: [yellow]
VLM: [yellow] (2 jersey(s), 6.6s)
PASS exact:1
[3/161] 003 - dark blue.jpg
GT: [dark blue]
VLM: [navy blue] (3 jersey(s), 9.4s)
PASS similar:1
[4/161] 004 - purple_light blue.jpg
GT: [purple, light blue]
VLM: [light blue, purple] (2 jersey(s), 10.5s)
PASS exact:2
[5/161] 005 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (1 jersey(s), 3.0s)
PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg
GT: [navy blue]
VLM: [dark blue] (1 jersey(s), 3.1s)
PASS similar:1
[7/161] 007 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 6.0s)
PASS exact:1
[8/161] 008 -red or orange.jpg
GT: [red|orange]
VLM: [red] (1 jersey(s), 5.1s)
PASS exact:1
[9/161] 009 - white_red.jpg
GT: [red]
VLM: [red] (4 jersey(s), 17.9s)
PASS exact:1
[10/161] 010 - white_black.jpg
GT: [black]
VLM: [black] (3 jersey(s), 11.3s)
PASS exact:1
[11/161] 011 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (4 jersey(s), 8.5s)
PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 3.8s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 10.2s)
FAIL MISS:light blue, extra:blue
[14/161] 014 - orange_dark blue or purple.jpg
GT: [orange, dark blue|purple]
VLM: [orange, purple] (3 jersey(s), 6.1s)
PASS exact:2
[15/161] 015 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 3.4s)
PASS exact:1
[16/161] 016 - maroon.jpg
GT: [maroon]
VLM: [(none)] (0 jersey(s), 3.2s)
FAIL MISS:maroon
[17/161] 017 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 4.8s)
PASS exact:1
[18/161] 018 - gray_red.jpg
GT: [gray, red]
VLM: [grey] (1 jersey(s), 6.5s)
PARTIAL similar:1, MISS:red
[19/161] 019 - maroon_gold.jpg
GT: [maroon, gold]
VLM: [maroon] (1 jersey(s), 4.4s)
PARTIAL exact:1, MISS:gold
[20/161] 020 - white_brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 5.6s)
PASS exact:1
[21/161] 021 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.8s)
PASS exact:1
[22/161] 022 - black_light blue.jpg
GT: [black, light blue]
VLM: [light blue] (1 jersey(s), 3.3s)
PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.7s)
PASS exact:1
[24/161] 024 - white_pink.jpg
GT: [pink]
VLM: [pink] (2 jersey(s), 5.1s)
PASS exact:1
[25/161] 025 - blue_green.jpg
GT: [blue, green]
VLM: [green] (1 jersey(s), 3.7s)
PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 6.8s)
PASS exact:1
[27/161] 027 - red_white.jpg
GT: [red]
VLM: [red] (4 jersey(s), 35.2s)
PASS exact:1
[28/161] 028 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 37.9s)
PASS exact:1
[29/161] 029 -maroon_white.jpg
GT: [maroon]
VLM: [red] (2 jersey(s), 4.8s)
FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 38.6s)
PASS similar:1
[31/161] 031 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 4.9s)
PASS exact:1
[32/161] 032 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 5.3s)
PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg
GT: [navy blue, gray]
VLM: [blue] (5 jersey(s), 37.0s)
PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg
GT: [light blue]
VLM: [blue] (6 jersey(s), 15.9s)
FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg
GT: [green, gold|yellow]
VLM: [green, yellow] (2 jersey(s), 14.4s)
PASS exact:2
[36/161] 036 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 6.4s)
PASS exact:1
[37/161] 037 -navy_white.jpg
GT: [navy]
VLM: [navy blue] (4 jersey(s), 8.3s)
PASS similar:1
[38/161] 038 - red_white.jpg
GT: [red]
VLM: [red] (3 jersey(s), 8.2s)
PASS exact:1
[39/161] 039 - gray_white.jpg
GT: [gray]
VLM: [grey] (2 jersey(s), 4.6s)
PASS similar:1
[40/161] 040 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [grey, maroon] (2 jersey(s), 7.3s)
PASS exact:1, similar:1
[41/161] 041 - navy blue_white.jpg
GT: [navy blue]
VLM: [navy blue] (8 jersey(s), 42.8s)
PASS exact:1
[42/161] 042 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.0s)
PASS exact:1
[43/161] 043 - gray_black.jpg
GT: [gray, black]
VLM: [black, grey] (5 jersey(s), 39.4s)
PASS exact:1, similar:1
[44/161] 044 - purple_black.jpg
GT: [purple, black]
VLM: [purple] (8 jersey(s), 35.7s)
PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 34.7s)
PASS exact:1
[46/161] 046 - green.jpg
GT: [green]
VLM: [black] (8 jersey(s), 39.6s)
FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 6.5s)
PASS exact:1
[48/161] 048 - red.jpg
GT: [red]
VLM: [(none)] (0 jersey(s), 7.4s)
FAIL MISS:red
[49/161] 049 - white_gold.jpg
GT: [gold]
VLM: [yellow] (2 jersey(s), 3.3s)
PASS similar:1
[50/161] 050 - white_orange.jpg
GT: [orange]
VLM: [orange] (6 jersey(s), 39.2s)
PASS exact:1
[51/161] 051 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.1s)
PASS exact:1
[52/161] 052 - black_gold.jpg
GT: [black, gold]
VLM: [black] (1 jersey(s), 3.2s)
PARTIAL exact:1, MISS:gold
[53/161] 053 - black_white.jpg
GT: [black]
VLM: [(none)] (1 jersey(s), 3.2s)
FAIL MISS:black
[54/161] 054 - white_blue.jpg
GT: [blue]
VLM: [blue] (2 jersey(s), 3.5s)
PASS exact:1
[55/161] 055 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (2 jersey(s), 5.8s)
PASS exact:1, similar:1
[56/161] 056 - white_red.jpg
GT: [red]
VLM: [red] (4 jersey(s), 12.5s)
PASS exact:1
[57/161] 057 - white_gold or yellow.jpg
GT: [gold|yellow]
VLM: [(none)] (1 jersey(s), 4.1s)
FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg
GT: [purple]
VLM: [purple] (4 jersey(s), 5.3s)
PASS exact:1
[59/161] 059 - black_gold.jpg
GT: [black, gold]
VLM: [gold] (1 jersey(s), 3.4s)
PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg
GT: [gray, navy blue]
VLM: [blue] (2 jersey(s), 5.7s)
PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (1 jersey(s), 3.0s)
PASS exact:1
[62/161] 062 - orange_blue.jpg
GT: [orange, blue]
VLM: [blue, orange] (2 jersey(s), 5.7s)
PASS exact:2
[63/161] 063 - dark brown.jpg
GT: [dark brown]
VLM: [brown] (1 jersey(s), 5.1s)
PASS similar:1
[64/161] 064 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 4.0s)
PASS exact:1
[65/161] 065 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (4 jersey(s), 38.3s)
PASS exact:1, similar:1
[66/161] 066 - yellow.jpg
GT: [yellow]
VLM: [yellow] (1 jersey(s), 3.3s)
PASS exact:1
[67/161] 067 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 10.7s)
PASS exact:1
[68/161] 068 - gold.jpg
GT: [gold]
VLM: [gold] (1 jersey(s), 6.1s)
PASS exact:1
[69/161] 069 - red_white.jpg
GT: [red]
VLM: [(none)] (5 jersey(s), 39.4s)
FAIL MISS:red
[70/161] 070 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 6.2s)
PASS exact:1
[71/161] 071 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 11.6s)
PASS exact:1
[72/161] 072 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 4.7s)
PASS exact:1
[73/161] 073 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (1 jersey(s), 4.8s)
PASS exact:1
[74/161] 074 - white_orange.jpg
GT: [orange]
VLM: [(none)] (1 jersey(s), 7.0s)
FAIL MISS:orange
[75/161] 075 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 3.4s)
PASS exact:1
[76/161] 076 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 8.5s)
PASS exact:1
[77/161] 077 - teal_white.jpg
GT: [teal]
VLM: [green] (5 jersey(s), 37.9s)
FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 6.8s)
PASS exact:1
[79/161] 079 - blue_maroon.jpg
GT: [blue, maroon]
VLM: [blue, maroon] (6 jersey(s), 7.7s)
PASS exact:2
[80/161] 080 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (1 jersey(s), 4.4s)
PASS similar:1
[81/161] 081 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 4.4s)
PASS similar:1
[82/161] 082 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (3 jersey(s), 6.8s)
PASS similar:1
[83/161] 083 - dark brown_white.jpg
GT: [dark brown]
VLM: [black] (2 jersey(s), 10.1s)
FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg
GT: [dark brown, yellow]
VLM: [brown, yellow] (2 jersey(s), 3.4s)
PASS exact:1, similar:1
[85/161] 085 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 9.2s)
PASS exact:1
[86/161] 086 - dark brown_white.jpg
GT: [dark brown]
VLM: [brown] (1 jersey(s), 5.7s)
PASS similar:1
[87/161] 087 - white_light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 8.4s)
PASS exact:1
[88/161] 088 - white_maroon.jpg
GT: [maroon]
VLM: [(none)] (2 jersey(s), 5.3s)
FAIL MISS:maroon
[89/161] 089 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 7.4s)
PASS exact:1
[90/161] 090 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (5 jersey(s), 36.7s)
PASS exact:1
[91/161] 091 - teal.jpg
GT: [teal]
VLM: [teal] (3 jersey(s), 6.0s)
PASS exact:1
[92/161] 092 - green_white.jpg
GT: [green]
VLM: [green] (6 jersey(s), 10.9s)
PASS exact:1
[93/161] 093 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 4.5s)
PASS similar:1
[94/161] 094 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 6.6s)
PASS exact:1
[95/161] 095 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 36.8s)
PASS exact:1
[96/161] 096 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 2.7s)
PASS exact:1
[97/161] 097 - gray_black.jpg
GT: [gray, black]
VLM: [grey] (3 jersey(s), 36.8s)
PARTIAL similar:1, MISS:black
[98/161] 098 - teal_white.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 6.7s)
PASS exact:1
[99/161] 099 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 8.1s)
PASS exact:1
[100/161] 100 - orange_white.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 5.7s)
PASS exact:1
[101/161] 101 - green_white.jpg
GT: [green]
VLM: [green] (7 jersey(s), 12.1s)
PASS exact:1
[102/161] 102 - yellow-black.jpg
GT: [yellow, black]
VLM: [black] (1 jersey(s), 3.4s)
PARTIAL exact:1, MISS:yellow
[103/161] 103 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 18.0s)
PASS exact:1
[104/161] 104 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 35.2s)
PASS exact:1
[105/161] 105 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 5.3s)
PASS exact:1
[106/161] 106 - black_gray.jpg
GT: [black, gray]
VLM: [black, grey] (2 jersey(s), 34.5s)
PASS exact:1, similar:1
[107/161] 107 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 4.7s)
PASS exact:1
[108/161] 108 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 4.5s)
PASS exact:1
[109/161] 109 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 4.7s)
PASS exact:1
[110/161] 110 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 9.0s)
PASS exact:1
[111/161] 111 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 37.6s)
PASS exact:1
[112/161] 112 - orange_white.jpg
GT: [orange]
VLM: [(none)] (0 jersey(s), 6.8s)
FAIL MISS:orange
[113/161] 113 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.4s)
PASS exact:1
[114/161] 114 - black_white.jpg
GT: [black]
VLM: [black] (2 jersey(s), 5.1s)
PASS exact:1
[115/161] 115 - navy blue_maroon.jpg
GT: [navy blue, maroon]
VLM: [blue, red] (4 jersey(s), 7.7s)
PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg
GT: [gray]
VLM: [grey] (2 jersey(s), 7.1s)
PASS similar:1
[117/161] 117 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 3.8s)
PASS exact:1
[118/161] 118 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 7.9s)
PASS similar:1
[119/161] 119 - black_yellow.jpg
GT: [black, yellow]
VLM: [black, yellow] (4 jersey(s), 8.5s)
PASS exact:2
[120/161] 120 - red_dark blue.jpg
GT: [red, dark blue]
VLM: [blue, red] (3 jersey(s), 20.5s)
PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 6.5s)
PASS exact:1
[122/161] 122 - gray.jpg
GT: [gray]
VLM: [grey] (1 jersey(s), 3.4s)
PASS similar:1
[123/161] 123 - teal_white.jpg
GT: [teal]
VLM: [teal] (4 jersey(s), 20.7s)
PASS exact:1
[124/161] 124 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (4 jersey(s), 7.8s)
PASS similar:1
[125/161] 125 - dark blue_maroon.jpg
GT: [dark blue, maroon]
VLM: [navy, red] (3 jersey(s), 7.7s)
PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 7.5s)
PASS exact:1
[127/161] 127 - yellow.jpg
GT: [yellow]
VLM: [black, yellow] (5 jersey(s), 22.9s)
PARTIAL exact:1, extra:black
[128/161] 128 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 36.1s)
PASS exact:1
[129/161] 129 - blue_white.jpg
GT: [blue]
VLM: [(none)] (3 jersey(s), 6.0s)
FAIL MISS:blue
[130/161] 130 - yellow_black.jpg
GT: [yellow, black]
VLM: [yellow] (1 jersey(s), 3.3s)
PARTIAL exact:1, MISS:black
[131/161] 131 - purple_orange.jpg
GT: [purple, orange]
VLM: [orange, purple] (3 jersey(s), 5.4s)
PASS exact:2
[132/161] 132 - brown_white.jpg
GT: [brown]
VLM: [orange] (3 jersey(s), 30.8s)
FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png
GT: [light blue]
VLM: [light blue] (7 jersey(s), 42.4s)
PASS exact:1
[134/161] 134 - teal_white.jpg
GT: [teal]
VLM: [blue] (1 jersey(s), 7.1s)
FAIL MISS:teal, extra:blue
[135/161] 135 - green.jpg
GT: [green]
VLM: [green] (1 jersey(s), 3.6s)
PASS exact:1
[136/161] 136 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 3.5s)
PASS exact:1
[137/161] 137 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 7.3s)
PASS exact:1
[138/161] 138 - maroon.jpg
GT: [maroon]
VLM: [red] (1 jersey(s), 3.5s)
FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (1 jersey(s), 12.2s)
PASS similar:1
[140/161] 140 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 4.0s)
PASS exact:1
[141/161] 141 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (3 jersey(s), 4.7s)
PASS exact:1
[142/161] 142 - orange_white.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.0s)
PASS exact:1
[143/161] 143 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 5.9s)
PASS exact:1
[144/161] 144 - green.jpg
GT: [green]
VLM: [green] (13 jersey(s), 8.2s)
PASS exact:1
[145/161] 145 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.0s)
PASS exact:1
[146/161] 146 - red_gray.jpg
GT: [red, gray]
VLM: [grey, red] (2 jersey(s), 4.2s)
PASS exact:1, similar:1
[147/161] 147 - green.jpg
GT: [green]
VLM: [green] (3 jersey(s), 4.8s)
PASS exact:1
[148/161] 148 - yellow_purple.jpg
GT: [yellow, purple]
VLM: [purple, yellow] (2 jersey(s), 6.0s)
PASS exact:2
[149/161] 149 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 37.0s)
PASS exact:1
[150/161] 150 - green_gray.jpg
GT: [green, gray]
VLM: [black] (2 jersey(s), 12.3s)
FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg
GT: [yellow, black]
VLM: [navy, yellow] (6 jersey(s), 39.2s)
PARTIAL exact:1, MISS:black, extra:navy
[152/161] 152 - pink_dark blue.jpg
GT: [pink, dark blue]
VLM: [navy blue, pink] (3 jersey(s), 5.9s)
PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 5.2s)
PASS exact:1
[154/161] 154 - dark brown.jpeg
GT: [dark brown]
VLM: [brown] (5 jersey(s), 7.0s)
PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg
GT: [green, gray, purple, yellow]
VLM: [grey, purple, yellow] (5 jersey(s), 7.7s)
PARTIAL exact:2, similar:1, MISS:green
[156/161] 156 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (3 jersey(s), 35.1s)
PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 40.2s)
PASS exact:1
[158/161] 158 - dark blue_yellow.jpg
GT: [dark blue, yellow]
VLM: [dark blue, yellow] (6 jersey(s), 33.8s)
PASS exact:2
[159/161] 159 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 36.7s)
PASS exact:1
[160/161] 160 - blue_white.jpg
GT: [blue]
VLM: [(none)] (1 jersey(s), 4.1s)
FAIL MISS:blue
[161/161] 161 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 4.8s)
FAIL MISS:light blue, extra:blue
================================================================================
ACCURACY SUMMARY (gemini-3-flash-preview)
================================================================================
Images processed: 161
Errors: 0
Total time: 253.2s (1.6s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 175 (excluding white)
--- Recall (did VLM find each ground truth color?) ---
Exact match: 126 / 202 (62.4%)
Similar match: 35 / 202 (17.3%)
Total found: 161 / 202 (79.7%)
Missed: 41 / 202 (20.3%)
--- Precision (are VLM colors correct?) ---
Exact match: 126 / 175 (72.0%)
Similar match: 34 / 175 (19.4%)
Total correct: 160 / 175 (91.4%)
Extra/wrong: 15 / 175 (8.6%)
--- Similar-Match Confusions (expected -> got) ---
gray -> grey x10
navy blue -> blue x6
dark brown -> brown x5
dark blue -> navy blue x5
gold -> yellow x3
dark blue -> blue x3
navy blue -> dark blue x1
navy -> navy blue x1
dark blue -> navy x1
--- Most Missed Ground Truth Colors ---
black 7 #######
gray 6 ######
maroon 6 ######
light blue 3 ###
red 3 ###
blue 3 ###
green 3 ###
gold 2 ##
orange 2 ##
teal 2 ##
gold|yellow 1 #
dark brown 1 #
yellow 1 #
brown 1 #
--- Most Common Extra/Wrong VLM Colors ---
blue 4 ####
red 4 ####
black 4 ####
green 1 #
orange 1 #
navy 1 #
--- Per-Image Verdict ---
PASS 120
PARTIAL 20
FAIL 21
--- Failed Images (21) ---
013 - light blue.jpg
missed: light blue
extra: blue
016 - maroon.jpg
missed: maroon
029 -maroon_white.jpg
missed: maroon
extra: red
034 - light blue.jpg
missed: light blue
extra: blue
046 - green.jpg
missed: green
extra: black
048 - red.jpg
missed: red
053 - black_white.jpg
missed: black
057 - white_gold or yellow.jpg
missed: gold|yellow
069 - red_white.jpg
missed: red
074 - white_orange.jpg
missed: orange
077 - teal_white.jpg
missed: teal
extra: green
083 - dark brown_white.jpg
missed: dark brown
extra: black
088 - white_maroon.jpg
missed: maroon
112 - orange_white.jpg
missed: orange
129 - blue_white.jpg
missed: blue
132 - brown_white.jpg
missed: brown
extra: orange
134 - teal_white.jpg
missed: teal
extra: blue
138 - maroon.jpg
missed: maroon
extra: red
150 - green_gray.jpg
missed: green, gray
extra: black
160 - blue_white.jpg
missed: blue
161 - light blue_white.jpg
missed: light blue
extra: blue
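The PASS / PARTIAL / FAIL verdicts above follow a pattern that can be read off the per-image lines. A ground-truth entry such as `red|orange` accepts either color. Any unmatched VLM color is reported as `extra` and downgrades a full match to PARTIAL. An image with no matched ground-truth color is a FAIL. The sketch below reconstructs that rule. It is inferred only from this log, not taken from the test script. The `SIMILAR` table and the `score`/`verdict` names are hypothetical, and the table is assembled from the "Similar-Match Confusions" lists.

```python
from dataclasses import dataclass

# HYPOTHETICAL similar-color table, built from the "Similar-Match Confusions"
# lists in this log; the real test script's table may differ.
SIMILAR = {
    "gray": {"grey"},
    "navy blue": {"blue", "dark blue", "navy"},
    "dark blue": {"blue", "navy blue", "navy"},
    "navy": {"blue", "navy blue"},
    "dark brown": {"brown"},
    "gold": {"yellow"},
}


@dataclass
class Scored:
    exact: int
    similar: int
    missed: list
    extra: list


def score(gt: list[str], vlm: list[str]) -> Scored:
    """Match each ground-truth entry; "a|b" means either color is acceptable."""
    exact = similar = 0
    missed, used = [], set()
    for entry in gt:
        alts = entry.split("|")
        hit = next((a for a in alts if a in vlm), None)
        if hit is not None:  # exact name match
            exact += 1
            used.add(hit)
            continue
        close = next((v for a in alts for v in vlm if v in SIMILAR.get(a, ())), None)
        if close is not None:  # accepted near-synonym, e.g. gray -> grey
            similar += 1
            used.add(close)
        else:
            missed.append(entry)
    # VLM colors that satisfied no ground-truth entry are reported as "extra"
    extra = [v for v in vlm if v not in used]
    return Scored(exact, similar, missed, extra)


def verdict(s: Scored) -> str:
    """Verdict rule inferred from the log lines (assumption, not the script)."""
    if s.exact + s.similar == 0:
        return "FAIL"  # e.g. 069: VLM [(none)] -> FAIL MISS:red
    if not s.missed and not s.extra:
        return "PASS"  # e.g. 065: exact:1, similar:1
    return "PARTIAL"  # e.g. 009: exact:1, extra:gold
```

Run against image 155 (`GT: [green, gray, purple, yellow]`, `VLM: [grey, purple, yellow]`), the sketch reproduces the logged `PARTIAL exact:2, similar:1, MISS:green`.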
========================================
Qwen3-VL-8B + jersey_prompt_capstone.txt
Started: Tue Mar 3 05:10:58 PM MST 2026
========================================
Images to process: 161
Server: http://agx:8080
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_capstone.txt (1511 chars)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg
GT: [brown, dark brown]
VLM: [black] (2 jersey(s), 8.2s)
FAIL MISS:brown,dark brown, extra:black
[2/161] 002 - yellow.jpg
GT: [yellow]
VLM: [yellow] (2 jersey(s), 6.0s)
PASS exact:1
[3/161] 003 - dark blue.jpg
GT: [dark blue]
VLM: [blue] (3 jersey(s), 8.3s)
PASS similar:1
[4/161] 004 - purple_light blue.jpg
GT: [purple, light blue]
VLM: [light blue, purple] (3 jersey(s), 11.9s)
PASS exact:2
[5/161] 005 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (1 jersey(s), 3.8s)
PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (1 jersey(s), 4.2s)
PASS similar:1
[7/161] 007 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 6.0s)
PASS exact:1
[8/161] 008 -red or orange.jpg
GT: [red|orange]
VLM: [red] (1 jersey(s), 3.2s)
PASS exact:1
[9/161] 009 - white_red.jpg
GT: [red]
VLM: [gold, red] (3 jersey(s), 10.8s)
PARTIAL exact:1, extra:gold
[10/161] 010 - white_black.jpg
GT: [black]
VLM: [black] (3 jersey(s), 10.9s)
PASS exact:1
[11/161] 011 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (4 jersey(s), 13.8s)
PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.3s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 7.5s)
FAIL MISS:light blue, extra:blue
[14/161] 014 - orange_dark blue or purple.jpg
GT: [orange, dark blue|purple]
VLM: [orange, purple] (3 jersey(s), 11.0s)
PASS exact:2
[15/161] 015 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 5.4s)
PASS exact:1
[16/161] 016 - maroon.jpg
GT: [maroon]
VLM: [(none)] (0 jersey(s), 1.7s)
FAIL MISS:maroon
[17/161] 017 - brown_white.jpg
GT: [brown]
VLM: [black] (2 jersey(s), 6.9s)
FAIL MISS:brown, extra:black
[18/161] 018 - gray_red.jpg
GT: [gray, red]
VLM: [gray, red] (2 jersey(s), 7.3s)
PASS exact:2
[19/161] 019 - maroon_gold.jpg
GT: [maroon, gold]
VLM: [red] (1 jersey(s), 3.7s)
FAIL MISS:maroon,gold, extra:red
[20/161] 020 - white_brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 6.2s)
PASS exact:1
[21/161] 021 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 6.1s)
PASS exact:1
[22/161] 022 - black_light blue.jpg
GT: [black, light blue]
VLM: [light blue] (1 jersey(s), 3.8s)
PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.9s)
PASS exact:1
[24/161] 024 - white_pink.jpg
GT: [pink]
VLM: [pink] (2 jersey(s), 7.7s)
PASS exact:1
[25/161] 025 - blue_green.jpg
GT: [blue, green]
VLM: [green] (1 jersey(s), 3.2s)
PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[27/161] 027 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 16.1s)
PASS exact:1
[28/161] 028 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[29/161] 029 -maroon_white.jpg
GT: [maroon]
VLM: [red] (2 jersey(s), 7.8s)
FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 5.9s)
PASS similar:1
[31/161] 031 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 6.0s)
PASS exact:1
[32/161] 032 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 6.1s)
PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg
GT: [navy blue, gray]
VLM: [blue] (3 jersey(s), 10.9s)
PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg
GT: [light blue]
VLM: [blue] (1 jersey(s), 3.7s)
FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg
GT: [green, gold|yellow]
VLM: [green, yellow] (2 jersey(s), 8.0s)
PASS exact:2
[36/161] 036 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (4 jersey(s), 13.8s)
FAIL MISS:light blue, extra:blue
[37/161] 037 -navy_white.jpg
GT: [navy]
VLM: [blue] (3 jersey(s), 10.1s)
PASS similar:1
[38/161] 038 - red_white.jpg
GT: [red]
VLM: [red] (3 jersey(s), 11.0s)
PASS exact:1
[39/161] 039 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 7.9s)
PASS exact:1
[40/161] 040 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (1 jersey(s), 5.1s)
PARTIAL exact:1, MISS:gray
[41/161] 041 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (9 jersey(s), 28.9s)
PASS similar:1
[42/161] 042 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.8s)
PASS exact:1
[43/161] 043 - gray_black.jpg
GT: [gray, black]
VLM: [black, gray] (2 jersey(s), 8.0s)
PASS exact:2
[44/161] 044 - purple_black.jpg
GT: [purple, black]
VLM: [purple] (7 jersey(s), 22.5s)
PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.9s)
PASS exact:1
[46/161] 046 - green.jpg
GT: [green]
VLM: [black] (15 jersey(s), 46.5s)
FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 10.7s)
PASS exact:1
[48/161] 048 - red.jpg
GT: [red]
VLM: [red] (1 jersey(s), 4.9s)
PASS exact:1
[49/161] 049 - white_gold.jpg
GT: [gold]
VLM: [yellow] (2 jersey(s), 6.1s)
PASS similar:1
[50/161] 050 - white_orange.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 13.8s)
PASS exact:1
[51/161] 051 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.8s)
PASS exact:1
[52/161] 052 - black_gold.jpg
GT: [black, gold]
VLM: [black] (1 jersey(s), 4.8s)
PARTIAL exact:1, MISS:gold
[53/161] 053 - black_white.jpg
GT: [black]
VLM: [(none)] (1 jersey(s), 3.7s)
FAIL MISS:black
[54/161] 054 - white_blue.jpg
GT: [blue]
VLM: [blue] (2 jersey(s), 5.9s)
PASS exact:1
[55/161] 055 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (2 jersey(s), 7.7s)
PASS exact:1, similar:1
[56/161] 056 - white_red.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.9s)
PASS exact:1
[57/161] 057 - white_gold or yellow.jpg
GT: [gold|yellow]
VLM: [(none)] (1 jersey(s), 3.7s)
FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg
GT: [purple]
VLM: [purple] (4 jersey(s), 14.0s)
PASS exact:1
[59/161] 059 - black_gold.jpg
GT: [black, gold]
VLM: [gold] (1 jersey(s), 3.8s)
PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg
GT: [gray, navy blue]
VLM: [blue] (2 jersey(s), 6.0s)
PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (1 jersey(s), 3.7s)
PASS exact:1
[62/161] 062 - orange_blue.jpg
GT: [orange, blue]
VLM: [blue, orange] (2 jersey(s), 5.6s)
PASS exact:2
[63/161] 063 - dark brown.jpg
GT: [dark brown]
VLM: [black] (1 jersey(s), 3.7s)
FAIL MISS:dark brown, extra:black
[64/161] 064 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 4.8s)
PASS exact:1
[65/161] 065 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (3 jersey(s), 10.4s)
PASS exact:1, similar:1
[66/161] 066 - yellow.jpg
GT: [yellow]
VLM: [yellow] (1 jersey(s), 3.5s)
PASS exact:1
[67/161] 067 - red_white.jpg
GT: [red]
VLM: [red] (4 jersey(s), 13.8s)
PASS exact:1
[68/161] 068 - gold.jpg
GT: [gold]
VLM: [gold] (1 jersey(s), 3.7s)
PASS exact:1
[69/161] 069 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 16.6s)
PASS exact:1
[70/161] 070 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 8.3s)
PASS exact:1
[71/161] 071 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.9s)
PASS exact:1
[72/161] 072 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.5s)
PASS exact:1
[73/161] 073 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (1 jersey(s), 3.4s)
PASS exact:1
[74/161] 074 - white_orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.4s)
PASS exact:1
[75/161] 075 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 4.8s)
PASS exact:1
[76/161] 076 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 14.2s)
PASS exact:1
[77/161] 077 - teal_white.jpg
GT: [teal]
VLM: [green] (3 jersey(s), 10.4s)
FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 5.9s)
PASS exact:1
[79/161] 079 - blue_maroon.jpg
GT: [blue, maroon]
VLM: [blue, red] (4 jersey(s), 13.8s)
PARTIAL exact:1, MISS:maroon, extra:red
[80/161] 080 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (1 jersey(s), 3.5s)
PASS similar:1
[81/161] 081 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 5.8s)
PASS similar:1
[82/161] 082 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (3 jersey(s), 10.6s)
PASS similar:1
[83/161] 083 - dark brown_white.jpg
GT: [dark brown]
VLM: [black] (1 jersey(s), 3.7s)
FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg
GT: [dark brown, yellow]
VLM: [black, yellow] (2 jersey(s), 6.0s)
PARTIAL exact:1, MISS:dark brown, extra:black
[85/161] 085 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 3.6s)
PASS exact:1
[86/161] 086 - dark brown_white.jpg
GT: [dark brown]
VLM: [brown] (1 jersey(s), 5.0s)
PASS similar:1
[87/161] 087 - white_light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 6.0s)
PASS exact:1
[88/161] 088 - white_maroon.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.9s)
PASS exact:1
[89/161] 089 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 11.0s)
PASS exact:1
[90/161] 090 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (4 jersey(s), 14.2s)
PASS exact:1
[91/161] 091 - teal.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 8.1s)
PASS exact:1
[92/161] 092 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 13.8s)
PASS exact:1
[93/161] 093 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 5.9s)
PASS similar:1
[94/161] 094 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 12.5s)
PASS exact:1
[95/161] 095 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[96/161] 096 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 6.7s)
PASS exact:1
[97/161] 097 - gray_black.jpg
GT: [gray, black]
VLM: [gray] (2 jersey(s), 8.0s)
PARTIAL exact:1, MISS:black
[98/161] 098 - teal_white.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 6.9s)
PASS exact:1
[99/161] 099 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 12.2s)
PASS exact:1
[100/161] 100 - orange_white.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 13.8s)
PASS exact:1
[101/161] 101 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 17.0s)
PASS exact:1
[102/161] 102 - yellow-black.jpg
GT: [yellow, black]
VLM: [black, yellow] (2 jersey(s), 8.0s)
PASS exact:2
[103/161] 103 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 17.3s)
PASS exact:1
[104/161] 104 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 11.0s)
PASS exact:1
[105/161] 105 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.4s)
PASS exact:1
[106/161] 106 - black_gray.jpg
GT: [black, gray]
VLM: [black, gray] (2 jersey(s), 7.3s)
PASS exact:2
[107/161] 107 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 10.7s)
PASS exact:1
[108/161] 108 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.9s)
PASS exact:1
[109/161] 109 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 6.0s)
PASS exact:1
[110/161] 110 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 13.9s)
PASS exact:1
[111/161] 111 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 6.1s)
PASS exact:1
[112/161] 112 - orange_white.jpg
GT: [orange]
VLM: [(none)] (1 jersey(s), 3.6s)
FAIL MISS:orange
[113/161] 113 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.8s)
PASS exact:1
[114/161] 114 - black_white.jpg
GT: [black]
VLM: [black] (2 jersey(s), 6.3s)
PASS exact:1
[115/161] 115 - navy blue_maroon.jpg
GT: [navy blue, maroon]
VLM: [blue, red] (4 jersey(s), 13.8s)
PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 6.0s)
PASS exact:1
[117/161] 117 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 6.2s)
PASS exact:1
[118/161] 118 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 7.4s)
PASS similar:1
[119/161] 119 - black_yellow.jpg
GT: [black, yellow]
VLM: [black, yellow] (3 jersey(s), 10.9s)
PASS exact:2
[120/161] 120 - red_dark blue.jpg
GT: [red, dark blue]
VLM: [blue, red] (3 jersey(s), 10.6s)
PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 11.0s)
PASS exact:1
[122/161] 122 - gray.jpg
GT: [gray]
VLM: [gray] (1 jersey(s), 5.1s)
PASS exact:1
[123/161] 123 - teal_white.jpg
GT: [teal]
VLM: [teal] (4 jersey(s), 13.9s)
PASS exact:1
[124/161] 124 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (4 jersey(s), 13.7s)
PASS similar:1
[125/161] 125 - dark blue_maroon.jpg
GT: [dark blue, maroon]
VLM: [blue, red] (2 jersey(s), 8.2s)
PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 10.8s)
PASS exact:1
[127/161] 127 - yellow.jpg
GT: [yellow]
VLM: [yellow] (4 jersey(s), 13.9s)
PASS exact:1
[128/161] 128 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[129/161] 129 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 13.8s)
PASS exact:1
[130/161] 130 - yellow_black.jpg
GT: [yellow, black]
VLM: [black, yellow] (2 jersey(s), 8.4s)
PASS exact:2
[131/161] 131 - purple_orange.jpg
GT: [purple, orange]
VLM: [orange, purple] (3 jersey(s), 8.3s)
PASS exact:2
[132/161] 132 - brown_white.jpg
GT: [brown]
VLM: [orange] (3 jersey(s), 10.8s)
FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png
GT: [light blue]
VLM: [light blue] (7 jersey(s), 23.5s)
PASS exact:1
[134/161] 134 - teal_white.jpg
GT: [teal]
VLM: [blue] (1 jersey(s), 5.0s)
FAIL MISS:teal, extra:blue
[135/161] 135 - green.jpg
GT: [green]
VLM: [green] (1 jersey(s), 3.9s)
PASS exact:1
[136/161] 136 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 6.2s)
PASS exact:1
[137/161] 137 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.9s)
PASS exact:1
[138/161] 138 - maroon.jpg
GT: [maroon]
VLM: [red] (1 jersey(s), 3.8s)
FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 6.0s)
PASS similar:1
[140/161] 140 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.7s)
PASS exact:1
[141/161] 141 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (3 jersey(s), 8.6s)
FAIL MISS:light blue, extra:blue
[142/161] 142 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 6.1s)
PASS exact:1
[143/161] 143 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 11.0s)
PASS exact:1
[144/161] 144 - green.jpg
GT: [green]
VLM: [green] (12 jersey(s), 37.7s)
PASS exact:1
[145/161] 145 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[146/161] 146 - red_gray.jpg
GT: [red, gray]
VLM: [gray, red] (2 jersey(s), 6.7s)
PASS exact:2
[147/161] 147 - green.jpg
GT: [green]
VLM: [green] (3 jersey(s), 8.3s)
PASS exact:1
[148/161] 148 - yellow_purple.jpg
GT: [yellow, purple]
VLM: [purple, yellow] (2 jersey(s), 7.9s)
PASS exact:2
[149/161] 149 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 13.7s)
PASS exact:1
[150/161] 150 - green_gray.jpg
GT: [green, gray]
VLM: [black] (2 jersey(s), 7.8s)
FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg
GT: [yellow, black]
VLM: [navy, yellow] (5 jersey(s), 17.1s)
PARTIAL exact:1, MISS:black, extra:navy
[152/161] 152 - pink_dark blue.jpg
GT: [pink, dark blue]
VLM: [blue, pink] (2 jersey(s), 7.9s)
PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.1s)
PASS exact:1
[154/161] 154 - dark brown.jpeg
GT: [dark brown]
VLM: [brown] (5 jersey(s), 12.9s)
PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg
GT: [green, gray, purple, yellow]
VLM: [purple, yellow] (4 jersey(s), 14.2s)
PARTIAL exact:2, MISS:green,gray
[156/161] 156 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (2 jersey(s), 7.6s)
PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 8.3s)
PASS exact:1
[158/161] 158 - dark blue_yellow.jpg
GT: [dark blue, yellow]
VLM: [navy, yellow] (4 jersey(s), 14.0s)
PASS exact:1, similar:1
[159/161] 159 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 13.9s)
PASS exact:1
[160/161] 160 - blue_white.jpg
GT: [blue]
VLM: [(none)] (1 jersey(s), 3.8s)
FAIL MISS:blue
[161/161] 161 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 5.8s)
FAIL MISS:light blue, extra:blue
================================================================================
ACCURACY SUMMARY
================================================================================
Images processed: 161
Errors: 0
Total time: 1437.3s (8.9s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 181 (excluding white)
--- Recall (did VLM find each ground truth color?) ---
Exact match: 134 / 202 (66.3%)
Similar match: 24 / 202 (11.9%)
Total found: 158 / 202 (78.2%)
Missed: 44 / 202 (21.8%)
--- Precision (are VLM colors correct?) ---
Exact match: 134 / 181 (74.0%)
Similar match: 24 / 181 (13.3%)
Total correct: 158 / 181 (87.3%)
Extra/wrong: 23 / 181 (12.7%)
--- Similar-Match Confusions (expected -> got) ---
dark blue -> blue x9
navy blue -> blue x8
gold -> yellow x3
dark brown -> brown x2
navy -> blue x1
dark blue -> navy x1
--- Most Missed Ground Truth Colors ---
gray 8 ########
maroon 7 #######
black 6 ######
light blue 5 #####
dark brown 4 ####
brown 3 ###
green 3 ###
gold 2 ##
blue 2 ##
teal 2 ##
gold|yellow 1 #
orange 1 #
--- Most Common Extra/Wrong VLM Colors ---
black 7 #######
blue 6 ######
red 6 ######
gold 1 #
green 1 #
orange 1 #
navy 1 #
--- Per-Image Verdict ---
PASS 120
PARTIAL 19
FAIL 22
--- Failed Images (22) ---
001 -brown_white or dark brown.jpg
missed: brown, dark brown
extra: black
013 - light blue.jpg
missed: light blue
extra: blue
016 - maroon.jpg
missed: maroon
017 - brown_white.jpg
missed: brown
extra: black
019 - maroon_gold.jpg
missed: maroon, gold
extra: red
029 -maroon_white.jpg
missed: maroon
extra: red
034 - light blue.jpg
missed: light blue
extra: blue
036 - light blue_white.jpg
missed: light blue
extra: blue
046 - green.jpg
missed: green
extra: black
053 - black_white.jpg
missed: black
057 - white_gold or yellow.jpg
missed: gold|yellow
063 - dark brown.jpg
missed: dark brown
extra: black
077 - teal_white.jpg
missed: teal
extra: green
083 - dark brown_white.jpg
missed: dark brown
extra: black
112 - orange_white.jpg
missed: orange
132 - brown_white.jpg
missed: brown
extra: orange
134 - teal_white.jpg
missed: teal
extra: blue
138 - maroon.jpg
missed: maroon
extra: red
141 - light blue_white.jpg
missed: light blue
extra: blue
150 - green_gray.jpg
missed: green, gray
extra: black
160 - blue_white.jpg
missed: blue
161 - light blue_white.jpg
missed: light blue
extra: blue
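The summary percentages are plain ratios. Recall divides matches by the 202 ground-truth colors. Precision divides matches by the unique VLM colors, which is 181 in this run. The sketch below recomputes the Qwen3-VL-8B + capstone figures from the counts printed above. The precision and recall "similar" counts are kept as separate inputs because they can differ: the earlier summary reports 35 similar matches for recall but 34 for precision. The helper names are illustrative only.

```python
# Counts copied from the ACCURACY SUMMARY above (Qwen3-VL-8B + capstone prompt).
GT_COLORS = 202          # ground-truth colors, excluding white
VLM_COLORS = 181         # unique VLM colors, excluding white
RECALL_EXACT, RECALL_SIMILAR = 134, 24
PREC_EXACT, PREC_SIMILAR = 134, 24
TOTAL_TIME_S, IMAGES = 1437.3, 161


def pct(n: int, d: int) -> str:
    """Format n/d as a one-decimal percentage, like the summary lines."""
    return f"{100 * n / d:.1f}%"


def summary() -> dict[str, str]:
    found = RECALL_EXACT + RECALL_SIMILAR   # ground-truth colors matched
    correct = PREC_EXACT + PREC_SIMILAR     # VLM colors that matched something
    return {
        "recall_exact": pct(RECALL_EXACT, GT_COLORS),
        "recall_total": pct(found, GT_COLORS),
        "missed": pct(GT_COLORS - found, GT_COLORS),
        "precision_exact": pct(PREC_EXACT, VLM_COLORS),
        "precision_total": pct(correct, VLM_COLORS),
        "extra": pct(VLM_COLORS - correct, VLM_COLORS),
        "avg_time": f"{TOTAL_TIME_S / IMAGES:.1f}s",
    }


for key, value in summary().items():
    print(f"{key:16} {value}")
```

Each value should match the corresponding summary line: 66.3%, 78.2%, 21.8%, 74.0%, 87.3%, 12.7%, and 8.9s average.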
========================================
Gemini 3 Flash + jersey_prompt_capstone.txt
Started: Tue Mar 3 05:34:55 PM MST 2026
========================================
Model: gemini-3-flash-preview
Images to process: 161
Concurrency: 8 workers
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_capstone.txt (1511 chars)
================================================================================
Pre-encoding images ... 161 images in 1.7s
Sending API requests ...
161/161 API calls completed (259.8s total)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg
GT: [brown, dark brown]
VLM: [brown] (1 jersey(s), 7.0s)
PASS exact:1, similar:1
[2/161] 002 - yellow.jpg
GT: [yellow]
VLM: [yellow] (2 jersey(s), 4.6s)
PASS exact:1
[3/161] 003 - dark blue.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 7.5s)
PASS similar:1
[4/161] 004 - purple_light blue.jpg
GT: [purple, light blue]
VLM: [light blue, purple] (3 jersey(s), 18.8s)
PASS exact:2
[5/161] 005 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (1 jersey(s), 3.7s)
PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg
GT: [navy blue]
VLM: [dark blue] (1 jersey(s), 4.7s)
PASS similar:1
[7/161] 007 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 6.3s)
PASS exact:1
[8/161] 008 -red or orange.jpg
GT: [red|orange]
VLM: [red] (1 jersey(s), 7.5s)
PASS exact:1
[9/161] 009 - white_red.jpg
GT: [red]
VLM: [red] (3 jersey(s), 12.1s)
PASS exact:1
[10/161] 010 - white_black.jpg
GT: [black]
VLM: [black] (3 jersey(s), 13.8s)
PASS exact:1
[11/161] 011 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (4 jersey(s), 12.5s)
PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 3.5s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 4.1s)
PASS exact:1
[14/161] 014 - orange_dark blue or purple.jpg
GT: [orange, dark blue|purple]
VLM: [orange, purple] (3 jersey(s), 4.6s)
PASS exact:2
[15/161] 015 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 4.0s)
PASS exact:1
[16/161] 016 - maroon.jpg
GT: [maroon]
VLM: [(none)] (0 jersey(s), 5.0s)
FAIL MISS:maroon
[17/161] 017 - brown_white.jpg
GT: [brown]
VLM: [brown] (3 jersey(s), 8.9s)
PASS exact:1
[18/161] 018 - gray_red.jpg
GT: [gray, red]
VLM: [grey] (1 jersey(s), 4.1s)
PARTIAL similar:1, MISS:red
[19/161] 019 - maroon_gold.jpg
GT: [maroon, gold]
VLM: [red] (1 jersey(s), 5.0s)
FAIL MISS:maroon,gold, extra:red
[20/161] 020 - white_brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 4.0s)
PASS exact:1
[21/161] 021 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 4.3s)
PASS exact:1
[22/161] 022 - black_light blue.jpg
GT: [black, light blue]
VLM: [light blue] (1 jersey(s), 5.3s)
PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 3.6s)
PASS exact:1
[24/161] 024 - white_pink.jpg
GT: [pink]
VLM: [pink] (2 jersey(s), 3.6s)
PASS exact:1
[25/161] 025 - blue_green.jpg
GT: [blue, green]
VLM: [green] (1 jersey(s), 3.3s)
PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 5.9s)
PASS exact:1
[27/161] 027 - red_white.jpg
GT: [red]
VLM: [red] (4 jersey(s), 36.1s)
PASS exact:1
[28/161] 028 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 38.3s)
PASS exact:1
[29/161] 029 -maroon_white.jpg
GT: [maroon]
VLM: [red] (2 jersey(s), 4.8s)
FAIL MISS:maroon, extra:red
[30/161] 030 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 10.8s)
PASS similar:1
[31/161] 031 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 4.2s)
PASS exact:1
[32/161] 032 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 4.8s)
PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg
GT: [navy blue, gray]
VLM: [blue] (7 jersey(s), 40.2s)
PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg
GT: [light blue]
VLM: [blue] (1 jersey(s), 12.7s)
FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg
GT: [green, gold|yellow]
VLM: [green, yellow] (3 jersey(s), 9.2s)
PASS exact:2
[36/161] 036 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 5.0s)
PASS exact:1
[37/161] 037 -navy_white.jpg
GT: [navy]
VLM: [blue] (4 jersey(s), 7.5s)
PASS similar:1
[38/161] 038 - red_white.jpg
GT: [red]
VLM: [red] (3 jersey(s), 36.8s)
PASS exact:1
[39/161] 039 - gray_white.jpg
GT: [gray]
VLM: [blue, grey] (4 jersey(s), 38.9s)
PARTIAL similar:1, extra:blue
[40/161] 040 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [grey, maroon] (2 jersey(s), 11.3s)
PASS exact:1, similar:1
[41/161] 041 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (8 jersey(s), 7.2s)
PASS similar:1
[42/161] 042 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.5s)
PASS exact:1
[43/161] 043 - gray_black.jpg
GT: [gray, black]
VLM: [black, grey] (5 jersey(s), 7.6s)
PASS exact:1, similar:1
[44/161] 044 - purple_black.jpg
GT: [purple, black]
VLM: [purple] (8 jersey(s), 36.8s)
PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 5.4s)
PASS exact:1
[46/161] 046 - green.jpg
GT: [green]
VLM: [black] (8 jersey(s), 39.3s)
FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 4.7s)
PASS exact:1
[48/161] 048 - red.jpg
GT: [red]
VLM: [(none)] (0 jersey(s), 34.4s)
FAIL MISS:red
[49/161] 049 - white_gold.jpg
GT: [gold]
VLM: [yellow] (2 jersey(s), 4.1s)
PASS similar:1
[50/161] 050 - white_orange.jpg
GT: [orange]
VLM: [orange] (5 jersey(s), 37.3s)
PASS exact:1
[51/161] 051 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.2s)
PASS exact:1
[52/161] 052 - black_gold.jpg
GT: [black, gold]
VLM: [black] (1 jersey(s), 3.7s)
PARTIAL exact:1, MISS:gold
[53/161] 053 - black_white.jpg
GT: [black]
VLM: [(none)] (1 jersey(s), 3.4s)
FAIL MISS:black
[54/161] 054 - white_blue.jpg
GT: [blue]
VLM: [blue] (2 jersey(s), 3.3s)
PASS exact:1
[55/161] 055 - green_gold.jpg
GT: [green, gold]
VLM: [green] (1 jersey(s), 11.1s)
PARTIAL exact:1, MISS:gold
[56/161] 056 - white_red.jpg
GT: [red]
VLM: [red] (3 jersey(s), 6.6s)
PASS exact:1
[57/161] 057 - white_gold or yellow.jpg
GT: [gold|yellow]
VLM: [(none)] (1 jersey(s), 4.0s)
FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg
GT: [purple]
VLM: [purple] (4 jersey(s), 7.7s)
PASS exact:1
[59/161] 059 - black_gold.jpg
GT: [black, gold]
VLM: [gold] (1 jersey(s), 4.3s)
PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg
GT: [gray, navy blue]
VLM: [blue] (2 jersey(s), 4.8s)
PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (1 jersey(s), 4.0s)
PASS exact:1
[62/161] 062 - orange_blue.jpg
GT: [orange, blue]
VLM: [blue, orange] (2 jersey(s), 4.3s)
PASS exact:2
[63/161] 063 - dark brown.jpg
GT: [dark brown]
VLM: [brown] (1 jersey(s), 3.5s)
PASS similar:1
[64/161] 064 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 5.7s)
PASS exact:1
[65/161] 065 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (5 jersey(s), 40.7s)
PASS exact:1, similar:1
[66/161] 066 - yellow.jpg
GT: [yellow]
VLM: [yellow] (1 jersey(s), 4.7s)
PASS exact:1
[67/161] 067 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 5.8s)
PASS exact:1
[68/161] 068 - gold.jpg
GT: [gold]
VLM: [gold] (1 jersey(s), 4.3s)
PASS exact:1
[69/161] 069 - red_white.jpg
GT: [red]
VLM: [(none)] (5 jersey(s), 38.2s)
FAIL MISS:red
[70/161] 070 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 6.2s)
PASS exact:1
[71/161] 071 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 3.8s)
PASS exact:1
[72/161] 072 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 3.5s)
PASS exact:1
[73/161] 073 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (1 jersey(s), 9.0s)
PASS exact:1
[74/161] 074 - white_orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 4.3s)
PASS exact:1
[75/161] 075 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 3.2s)
PASS exact:1
[76/161] 076 - light blue_white.jpg
GT: [light blue]
VLM: [light blue, pink] (4 jersey(s), 8.1s)
PARTIAL exact:1, extra:pink
[77/161] 077 - teal_white.jpg
GT: [teal]
VLM: [green] (5 jersey(s), 37.1s)
FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 10.8s)
FAIL MISS:light blue, extra:blue
[79/161] 079 - blue_maroon.jpg
GT: [blue, maroon]
VLM: [blue, red] (6 jersey(s), 36.8s)
PARTIAL exact:1, MISS:maroon, extra:red
[80/161] 080 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (1 jersey(s), 3.4s)
PASS similar:1
[81/161] 081 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 3.9s)
PASS similar:1
[82/161] 082 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (3 jersey(s), 6.3s)
PASS similar:1
[83/161] 083 - dark brown_white.jpg
GT: [dark brown]
VLM: [black] (2 jersey(s), 14.2s)
FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg
GT: [dark brown, yellow]
VLM: [brown, yellow] (2 jersey(s), 3.8s)
PASS exact:1, similar:1
[85/161] 085 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.3s)
PASS exact:1
[86/161] 086 - dark brown_white.jpg
GT: [dark brown]
VLM: [brown] (1 jersey(s), 4.5s)
PASS similar:1
[87/161] 087 - white_light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.5s)
PASS exact:1
[88/161] 088 - white_maroon.jpg
GT: [maroon]
VLM: [maroon] (4 jersey(s), 41.4s)
PASS exact:1
[89/161] 089 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 5.8s)
PASS exact:1
[90/161] 090 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (5 jersey(s), 38.4s)
PASS exact:1
[91/161] 091 - teal.jpg
GT: [teal]
VLM: [teal] (3 jersey(s), 10.2s)
PASS exact:1
[92/161] 092 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 39.3s)
PASS exact:1
[93/161] 093 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (2 jersey(s), 4.6s)
PASS similar:1
[94/161] 094 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 7.0s)
PASS exact:1
[95/161] 095 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 22.9s)
PASS exact:1
[96/161] 096 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 5.2s)
PASS exact:1
[97/161] 097 - gray_black.jpg
GT: [gray, black]
VLM: [grey] (2 jersey(s), 19.4s)
PARTIAL similar:1, MISS:black
[98/161] 098 - teal_white.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 4.3s)
PASS exact:1
[99/161] 099 - maroon_white.jpg
GT: [maroon]
VLM: [red] (3 jersey(s), 4.5s)
FAIL MISS:maroon, extra:red
[100/161] 100 - orange_white.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 40.0s)
PASS exact:1
[101/161] 101 - green_white.jpg
GT: [green]
VLM: [green] (7 jersey(s), 39.2s)
PASS exact:1
[102/161] 102 - yellow-black.jpg
GT: [yellow, black]
VLM: [black] (1 jersey(s), 4.2s)
PARTIAL exact:1, MISS:yellow
[103/161] 103 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 36.3s)
PASS exact:1
[104/161] 104 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 4.1s)
PASS exact:1
[105/161] 105 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 6.3s)
PASS exact:1
[106/161] 106 - black_gray.jpg
GT: [black, gray]
VLM: [black, grey] (2 jersey(s), 4.0s)
PASS exact:1, similar:1
[107/161] 107 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 4.4s)
PASS exact:1
[108/161] 108 - red_white.jpg
GT: [red]
VLM: [red] (1 jersey(s), 47.1s)
PASS exact:1
[109/161] 109 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 5.2s)
PASS exact:1
[110/161] 110 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 10.7s)
PASS exact:1
[111/161] 111 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 34.8s)
PASS exact:1
[112/161] 112 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 5.8s)
PASS exact:1
[113/161] 113 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 5.1s)
PASS exact:1
[114/161] 114 - black_white.jpg
GT: [black]
VLM: [black] (2 jersey(s), 5.7s)
PASS exact:1
[115/161] 115 - navy blue_maroon.jpg
GT: [navy blue, maroon]
VLM: [blue, red] (4 jersey(s), 7.9s)
PARTIAL similar:1, MISS:maroon, extra:red
[116/161] 116 - gray_white.jpg
GT: [gray]
VLM: [grey] (2 jersey(s), 3.9s)
PASS similar:1
[117/161] 117 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 3.8s)
PASS exact:1
[118/161] 118 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (1 jersey(s), 8.5s)
PASS similar:1
[119/161] 119 - black_yellow.jpg
GT: [black, yellow]
VLM: [black, yellow] (3 jersey(s), 4.7s)
PASS exact:2
[120/161] 120 - red_dark blue.jpg
GT: [red, dark blue]
VLM: [dark blue, red] (3 jersey(s), 7.0s)
PASS exact:2
[121/161] 121 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 5.9s)
PASS exact:1
[122/161] 122 - gray.jpg
GT: [gray]
VLM: [grey] (1 jersey(s), 2.6s)
PASS similar:1
[123/161] 123 - teal_white.jpg
GT: [teal]
VLM: [teal] (4 jersey(s), 8.8s)
PASS exact:1
[124/161] 124 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (4 jersey(s), 4.9s)
PASS similar:1
[125/161] 125 - dark blue_maroon.jpg
GT: [dark blue, maroon]
VLM: [navy, red] (3 jersey(s), 8.1s)
PARTIAL similar:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 5.8s)
PASS exact:1
[127/161] 127 - yellow.jpg
GT: [yellow]
VLM: [yellow] (4 jersey(s), 4.8s)
PASS exact:1
[128/161] 128 - green_white.jpg
GT: [green]
VLM: [(none)] (0 jersey(s), 42.6s)
FAIL MISS:green
[129/161] 129 - blue_white.jpg
GT: [blue]
VLM: [(none)] (3 jersey(s), 16.8s)
FAIL MISS:blue
[130/161] 130 - yellow_black.jpg
GT: [yellow, black]
VLM: [yellow] (1 jersey(s), 3.4s)
PARTIAL exact:1, MISS:black
[131/161] 131 - purple_orange.jpg
GT: [purple, orange]
VLM: [orange, purple] (3 jersey(s), 3.8s)
PASS exact:2
[132/161] 132 - brown_white.jpg
GT: [brown]
VLM: [orange] (2 jersey(s), 10.2s)
FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png
GT: [light blue]
VLM: [light blue] (8 jersey(s), 43.5s)
PASS exact:1
[134/161] 134 - teal_white.jpg
GT: [teal]
VLM: [light blue] (1 jersey(s), 4.3s)
FAIL MISS:teal, extra:light blue
[135/161] 135 - green.jpg
GT: [green]
VLM: [green] (1 jersey(s), 5.2s)
PASS exact:1
[136/161] 136 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 3.8s)
PASS exact:1
[137/161] 137 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 8.6s)
PASS exact:1
[138/161] 138 - maroon.jpg
GT: [maroon]
VLM: [red] (1 jersey(s), 3.3s)
FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg
GT: [dark blue]
VLM: [blue] (1 jersey(s), 4.8s)
PASS similar:1
[140/161] 140 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 3.5s)
PASS exact:1
[141/161] 141 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (3 jersey(s), 5.2s)
FAIL MISS:light blue, extra:blue
[142/161] 142 - orange_white.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 5.5s)
PASS exact:1
[143/161] 143 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 4.7s)
PASS exact:1
[144/161] 144 - green.jpg
GT: [green]
VLM: [green] (8 jersey(s), 39.7s)
PASS exact:1
[145/161] 145 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 3.8s)
PASS exact:1
[146/161] 146 - red_gray.jpg
GT: [red, gray]
VLM: [grey, red] (2 jersey(s), 4.0s)
PASS exact:1, similar:1
[147/161] 147 - green.jpg
GT: [green]
VLM: [green] (3 jersey(s), 4.1s)
PASS exact:1
[148/161] 148 - yellow_purple.jpg
GT: [yellow, purple]
VLM: [purple, yellow] (2 jersey(s), 5.9s)
PASS exact:2
[149/161] 149 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 36.9s)
PASS exact:1
[150/161] 150 - green_gray.jpg
GT: [green, gray]
VLM: [black] (1 jersey(s), 12.8s)
FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg
GT: [yellow, black]
VLM: [dark blue, yellow] (5 jersey(s), 38.6s)
PARTIAL exact:1, MISS:black, extra:dark blue
[152/161] 152 - pink_dark blue.jpg
GT: [pink, dark blue]
VLM: [navy blue, pink] (3 jersey(s), 22.1s)
PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 3.7s)
PASS exact:1
[154/161] 154 - dark brown.jpeg
GT: [dark brown]
VLM: [brown] (5 jersey(s), 5.1s)
PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg
GT: [green, gray, purple, yellow]
VLM: [grey, purple, yellow] (5 jersey(s), 7.4s)
PARTIAL exact:2, similar:1, MISS:green
[156/161] 156 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (1 jersey(s), 12.0s)
PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 38.1s)
PASS exact:1
[158/161] 158 - dark blue_yellow.jpg
GT: [dark blue, yellow]
VLM: [blue, yellow] (7 jersey(s), 37.4s)
PASS exact:1, similar:1
[159/161] 159 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 11.2s)
PASS exact:1
[160/161] 160 - blue_white.jpg
GT: [blue]
VLM: [(none)] (1 jersey(s), 4.2s)
FAIL MISS:blue
[161/161] 161 - light blue_white.jpg
GT: [light blue]
VLM: [blue] (2 jersey(s), 5.6s)
FAIL MISS:light blue, extra:blue
================================================================================
ACCURACY SUMMARY (gemini-3-flash-preview)
================================================================================
Images processed: 161
Errors: 0
Total time: 259.8s (1.6s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 177 (excluding white)
--- Recall (did VLM find each ground truth color?) ---
Exact match: 123 / 202 (60.9%)
Similar match: 35 / 202 (17.3%)
Total found: 158 / 202 (78.2%)
Missed: 44 / 202 (21.8%)
--- Precision (are VLM colors correct?) ---
Exact match: 123 / 177 (69.5%)
Similar match: 34 / 177 (19.2%)
Total correct: 157 / 177 (88.7%)
Extra/wrong: 20 / 177 (11.3%)
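The recall and precision percentages above follow directly from the logged counts. The following is a minimal sketch that reproduces them. The formulas are inferred from the section headings; they are not taken from the test script. Recall divides by the 202 ground-truth colors, and precision divides by the 177 unique VLM colors. The similar-match counts differ (35 vs 34), which appears to happen when one VLM color satisfies two ground-truth colors. For example, 001 logs "exact:1, similar:1" from the single VLM color "brown".

```python
# Hypothetical helper: recomputes the ACCURACY SUMMARY percentages from the
# raw counts printed in the log. Formulas are inferred from the headings,
# not copied from the actual accuracy test script.

def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, matching the log format."""
    return round(100.0 * part / whole, 1)

def summarize(exact_gt: int, similar_gt: int, gt_total: int,
              exact_vlm: int, similar_vlm: int, vlm_total: int) -> dict:
    found = exact_gt + similar_gt        # ground-truth colors the VLM recovered
    correct = exact_vlm + similar_vlm    # VLM colors that match some ground truth
    return {
        "recall_exact": pct(exact_gt, gt_total),
        "recall_similar": pct(similar_gt, gt_total),
        "recall_total": pct(found, gt_total),
        "missed": pct(gt_total - found, gt_total),
        "precision_exact": pct(exact_vlm, vlm_total),
        "precision_similar": pct(similar_vlm, vlm_total),
        "precision_total": pct(correct, vlm_total),
        "extra": pct(vlm_total - correct, vlm_total),
    }

# Counts from the gemini-3-flash-preview run above
stats = summarize(123, 35, 202, 123, 34, 177)
for name, value in stats.items():
    print(f"{name:18s} {value:5.1f}%")
```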
--- Similar-Match Confusions (expected -> got) ---
gray -> grey x10
navy blue -> blue x7
dark brown -> brown x5
dark blue -> blue x5
dark blue -> navy blue x3
gold -> yellow x2
navy blue -> dark blue x1
navy -> blue x1
dark blue -> navy x1
--- Most Missed Ground Truth Colors ---
maroon 8 ########
black 7 #######
gray 6 ######
light blue 4 ####
green 4 ####
red 3 ###
gold 3 ###
blue 3 ###
teal 2 ##
gold|yellow 1 #
dark brown 1 #
yellow 1 #
brown 1 #
--- Most Common Extra/Wrong VLM Colors ---
red 7 #######
blue 5 #####
black 3 ###
pink 1 #
green 1 #
orange 1 #
light blue 1 #
dark blue 1 #
--- Per-Image Verdict ---
PASS 117
PARTIAL 22
FAIL 22
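The per-image verdicts appear to follow a simple rule. An image is PASS when every ground-truth color is found and there are no extras. It is FAIL when no ground-truth color is found, and PARTIAL otherwise. The following is a hypothetical reconstruction inferred from the logged lines. Examples are 076 "PARTIAL exact:1, extra:pink" and 048 "FAIL MISS:red". It is not the test script's actual code. Alternatives such as "gold|yellow" are assumed to count as a single ground-truth slot.

```python
# Hypothetical verdict rule inferred from the log output; the real script may
# differ. "found" counts exact + similar ground-truth hits for one image.

def verdict(found: int, missed: int, extra: int) -> str:
    """Classify one image from its ground-truth hits, misses, and extra colors."""
    if found == 0:
        return "FAIL"      # no ground-truth color recovered at all
    if missed == 0 and extra == 0:
        return "PASS"      # every ground-truth color found, nothing spurious
    return "PARTIAL"       # some found, but something was missed or added

# 002 yellow (PASS), 076 light blue + pink (PARTIAL), 048 red -> none (FAIL)
print(verdict(1, 0, 0), verdict(1, 0, 1), verdict(0, 1, 0))
```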
--- Failed Images (22) ---
016 - maroon.jpg
missed: maroon
019 - maroon_gold.jpg
missed: maroon, gold
extra: red
029 -maroon_white.jpg
missed: maroon
extra: red
034 - light blue.jpg
missed: light blue
extra: blue
046 - green.jpg
missed: green
extra: black
048 - red.jpg
missed: red
053 - black_white.jpg
missed: black
057 - white_gold or yellow.jpg
missed: gold|yellow
069 - red_white.jpg
missed: red
077 - teal_white.jpg
missed: teal
extra: green
078 - light blue_white.jpg
missed: light blue
extra: blue
083 - dark brown_white.jpg
missed: dark brown
extra: black
099 - maroon_white.jpg
missed: maroon
extra: red
128 - green_white.jpg
missed: green
129 - blue_white.jpg
missed: blue
132 - brown_white.jpg
missed: brown
extra: orange
134 - teal_white.jpg
missed: teal
extra: light blue
138 - maroon.jpg
missed: maroon
extra: red
141 - light blue_white.jpg
missed: light blue
extra: blue
150 - green_gray.jpg
missed: green, gray
extra: black
160 - blue_white.jpg
missed: blue
161 - light blue_white.jpg
missed: light blue
extra: blue
========================================
Qwen3-VL-8B + jersey_prompt_constrained.txt
Started: Tue Mar 3 05:39:17 PM MST 2026
========================================
Images to process: 161
Server: http://agx:8080
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_constrained.txt (2223 chars)
================================================================================
[1/161] 001 -brown_white or dark brown.jpg
GT: [brown, dark brown]
VLM: [black] (3 jersey(s), 11.6s)
FAIL MISS:brown,dark brown, extra:black
[2/161] 002 - yellow.jpg
GT: [yellow]
VLM: [yellow] (2 jersey(s), 8.0s)
PASS exact:1
[3/161] 003 - dark blue.jpg
GT: [dark blue]
VLM: [dark blue] (3 jersey(s), 11.1s)
PASS exact:1
[4/161] 004 - purple_light blue.jpg
GT: [purple, light blue]
VLM: [light blue, purple] (3 jersey(s), 11.7s)
PASS exact:2
[5/161] 005 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (1 jersey(s), 5.0s)
PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg
GT: [navy blue]
VLM: [dark blue] (1 jersey(s), 4.4s)
PASS similar:1
[7/161] 007 - brown_white.jpg
GT: [brown]
VLM: [maroon] (2 jersey(s), 8.0s)
FAIL MISS:brown, extra:maroon
[8/161] 008 -red or orange.jpg
GT: [red|orange]
VLM: [red] (1 jersey(s), 4.3s)
PASS exact:1
[9/161] 009 - white_red.jpg
GT: [red]
VLM: [gold, red] (3 jersey(s), 10.8s)
PARTIAL exact:1, extra:gold
[10/161] 010 - white_black.jpg
GT: [black]
VLM: [black, maroon] (3 jersey(s), 11.0s)
PARTIAL exact:1, extra:maroon
[11/161] 011 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (4 jersey(s), 13.8s)
PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.3s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.6s)
PASS exact:1
[14/161] 014 - orange_dark blue or purple.jpg
GT: [orange, dark blue|purple]
VLM: [orange, purple] (3 jersey(s), 11.0s)
PASS exact:2
[15/161] 015 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.3s)
PASS exact:1
[16/161] 016 - maroon.jpg
GT: [maroon]
VLM: [(none)] (0 jersey(s), 1.7s)
FAIL MISS:maroon
[17/161] 017 - brown_white.jpg
GT: [brown]
VLM: [dark brown] (2 jersey(s), 8.8s)
PASS similar:1
[18/161] 018 - gray_red.jpg
GT: [gray, red]
VLM: [gray, red] (2 jersey(s), 7.3s)
PASS exact:2
[19/161] 019 - maroon_gold.jpg
GT: [maroon, gold]
VLM: [maroon, yellow] (2 jersey(s), 7.8s)
PASS exact:1, similar:1
[20/161] 020 - white_brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 8.1s)
PASS exact:1
[21/161] 021 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 8.0s)
PASS exact:1
[22/161] 022 - black_light blue.jpg
GT: [black, light blue]
VLM: [light blue] (1 jersey(s), 5.0s)
PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.7s)
PASS exact:1
[24/161] 024 - white_pink.jpg
GT: [pink]
VLM: [pink] (2 jersey(s), 7.7s)
PASS exact:1
[25/161] 025 - blue_green.jpg
GT: [blue, green]
VLM: [green] (1 jersey(s), 4.3s)
PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[27/161] 027 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 16.1s)
PASS exact:1
[28/161] 028 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.9s)
PASS exact:1
[29/161] 029 -maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.0s)
PASS exact:1
[30/161] 030 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 7.8s)
PASS similar:1
[31/161] 031 - brown_white.jpg
GT: [brown]
VLM: [maroon] (2 jersey(s), 7.9s)
FAIL MISS:brown, extra:maroon
[32/161] 032 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 8.1s)
PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg
GT: [navy blue, gray]
VLM: [blue] (3 jersey(s), 10.9s)
PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg
GT: [light blue]
VLM: [blue] (1 jersey(s), 4.8s)
FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg
GT: [green, gold|yellow]
VLM: [green, yellow] (2 jersey(s), 8.0s)
PASS exact:2
[36/161] 036 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 14.0s)
PASS exact:1
[37/161] 037 -navy_white.jpg
GT: [navy]
VLM: [dark blue] (3 jersey(s), 10.3s)
PASS similar:1
[38/161] 038 - red_white.jpg
GT: [red]
VLM: [red] (3 jersey(s), 10.9s)
PASS exact:1
[39/161] 039 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 7.9s)
PASS exact:1
[40/161] 040 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (1 jersey(s), 5.1s)
PARTIAL exact:1, MISS:gray
[41/161] 041 - navy blue_white.jpg
GT: [navy blue]
VLM: [navy blue] (9 jersey(s), 30.6s)
PASS exact:1
[42/161] 042 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.9s)
PASS exact:1
[43/161] 043 - gray_black.jpg
GT: [gray, black]
VLM: [black, gray] (2 jersey(s), 8.0s)
PASS exact:2
[44/161] 044 - purple_black.jpg
GT: [purple, black]
VLM: [purple] (7 jersey(s), 22.6s)
PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.8s)
PASS exact:1
[46/161] 046 - green.jpg
GT: [green]
VLM: [black] (15 jersey(s), 46.5s)
FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 10.8s)
PASS exact:1
[48/161] 048 - red.jpg
GT: [red]
VLM: [maroon] (1 jersey(s), 5.0s)
FAIL MISS:red, extra:maroon
[49/161] 049 - white_gold.jpg
GT: [gold]
VLM: [yellow] (2 jersey(s), 7.9s)
PASS similar:1
[50/161] 050 - white_orange.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 14.1s)
PASS exact:1
[51/161] 051 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.9s)
PASS exact:1
[52/161] 052 - black_gold.jpg
GT: [black, gold]
VLM: [black, yellow] (2 jersey(s), 7.8s)
PASS exact:1, similar:1
[53/161] 053 - black_white.jpg
GT: [black]
VLM: [(none)] (1 jersey(s), 4.9s)
FAIL MISS:black
[54/161] 054 - white_blue.jpg
GT: [blue]
VLM: [navy blue] (2 jersey(s), 8.1s)
PASS similar:1
[55/161] 055 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (2 jersey(s), 7.8s)
PASS exact:1, similar:1
[56/161] 056 - white_red.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.9s)
PASS exact:1
[57/161] 057 - white_gold or yellow.jpg
GT: [gold|yellow]
VLM: [(none)] (1 jersey(s), 4.9s)
FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg
GT: [purple]
VLM: [purple] (4 jersey(s), 14.0s)
PASS exact:1
[59/161] 059 - black_gold.jpg
GT: [black, gold]
VLM: [gold] (1 jersey(s), 4.9s)
PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg
GT: [gray, navy blue]
VLM: [blue] (2 jersey(s), 8.1s)
PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 7.8s)
PASS exact:1
[62/161] 062 - orange_blue.jpg
GT: [orange, blue]
VLM: [blue, orange] (2 jersey(s), 7.5s)
PASS exact:2
[63/161] 063 - dark brown.jpg
GT: [dark brown]
VLM: [dark brown] (1 jersey(s), 5.0s)
PASS exact:1
[64/161] 064 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.7s)
PASS exact:1
[65/161] 065 - green_gold.jpg
GT: [green, gold]
VLM: [dark green, yellow] (3 jersey(s), 10.6s)
PASS similar:2
[66/161] 066 - yellow.jpg
GT: [yellow]
VLM: [yellow] (1 jersey(s), 4.8s)
PASS exact:1
[67/161] 067 - red_white.jpg
GT: [red]
VLM: [red] (4 jersey(s), 13.7s)
PASS exact:1
[68/161] 068 - gold.jpg
GT: [gold]
VLM: [gold] (1 jersey(s), 4.8s)
PASS exact:1
[69/161] 069 - red_white.jpg
GT: [red]
VLM: [(none)] (4 jersey(s), 14.1s)
FAIL MISS:red
[70/161] 070 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 11.1s)
PASS exact:1
[71/161] 071 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.0s)
PASS exact:1
[72/161] 072 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.5s)
PASS exact:1
[73/161] 073 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.4s)
PASS exact:1
[74/161] 074 - white_orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.5s)
PASS exact:1
[75/161] 075 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.7s)
PASS exact:1
[76/161] 076 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (3 jersey(s), 11.4s)
PASS exact:1
[77/161] 077 - teal_white.jpg
GT: [teal]
VLM: [green] (4 jersey(s), 13.4s)
FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.7s)
PASS exact:1
[79/161] 079 - blue_maroon.jpg
GT: [blue, maroon]
VLM: [blue, maroon] (4 jersey(s), 14.1s)
PASS exact:2
[80/161] 080 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 7.8s)
PASS similar:1
[81/161] 081 - navy blue.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 7.7s)
PASS similar:1
[82/161] 082 - dark blue_white.jpg
GT: [dark blue]
VLM: [dark blue] (3 jersey(s), 10.8s)
PASS exact:1
[83/161] 083 - dark brown_white.jpg
GT: [dark brown]
VLM: [black] (2 jersey(s), 7.9s)
FAIL MISS:dark brown, extra:black
[84/161] 084 - dark brown_yellow.jpg
GT: [dark brown, yellow]
VLM: [dark brown, gold] (2 jersey(s), 8.0s)
PASS exact:1, similar:1
[85/161] 085 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 4.8s)
PASS exact:1
[86/161] 086 - dark brown_white.jpg
GT: [dark brown]
VLM: [dark brown] (2 jersey(s), 8.0s)
PASS exact:1
[87/161] 087 - white_light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 8.0s)
PASS exact:1
[88/161] 088 - white_maroon.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.8s)
PASS exact:1
[89/161] 089 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 11.1s)
PASS exact:1
[90/161] 090 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (4 jersey(s), 14.3s)
PASS exact:1
[91/161] 091 - teal.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 8.0s)
PASS exact:1
[92/161] 092 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 14.0s)
PASS exact:1
[93/161] 093 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 8.1s)
PASS similar:1
[94/161] 094 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 12.5s)
PASS exact:1
[95/161] 095 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.0s)
PASS exact:1
[96/161] 096 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 8.6s)
PASS exact:1
[97/161] 097 - gray_black.jpg
GT: [gray, black]
VLM: [light blue] (2 jersey(s), 8.3s)
FAIL MISS:gray,black, extra:light blue
[98/161] 098 - teal_white.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 8.7s)
PASS exact:1
[99/161] 099 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 12.2s)
PASS exact:1
[100/161] 100 - orange_white.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 13.8s)
PASS exact:1
[101/161] 101 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 17.0s)
PASS exact:1
[102/161] 102 - yellow-black.jpg
GT: [yellow, black]
VLM: [black, yellow] (2 jersey(s), 8.0s)
PASS exact:2
[103/161] 103 - green_white.jpg
GT: [green]
VLM: [green] (5 jersey(s), 17.3s)
PASS exact:1
[104/161] 104 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.0s)
PASS exact:1
[105/161] 105 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 9.2s)
PASS exact:1
[106/161] 106 - black_gray.jpg
GT: [black, gray]
VLM: [black, gray] (2 jersey(s), 9.1s)
PASS exact:2
[107/161] 107 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.7s)
PASS exact:1
[108/161] 108 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 8.0s)
PASS exact:1
[109/161] 109 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 7.8s)
PASS exact:1
[110/161] 110 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 14.0s)
PASS exact:1
[111/161] 111 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.9s)
PASS exact:1
[112/161] 112 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 7.8s)
PASS exact:1
[113/161] 113 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 4.9s)
PASS exact:1
[114/161] 114 - black_white.jpg
GT: [black]
VLM: [black] (2 jersey(s), 8.1s)
PASS exact:1
[115/161] 115 - navy blue_maroon.jpg
GT: [navy blue, maroon]
VLM: [blue, maroon] (4 jersey(s), 14.0s)
PASS exact:1, similar:1
[116/161] 116 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 8.0s)
PASS exact:1
[117/161] 117 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.1s)
PASS exact:1
[118/161] 118 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 7.8s)
PASS similar:1
[119/161] 119 - black_yellow.jpg
GT: [black, yellow]
VLM: [black, yellow] (3 jersey(s), 10.9s)
PASS exact:2
[120/161] 120 - red_dark blue.jpg
GT: [red, dark blue]
VLM: [navy blue, red] (3 jersey(s), 11.1s)
PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 10.9s)
PASS exact:1
[122/161] 122 - gray.jpg
GT: [gray]
VLM: [gray] (1 jersey(s), 6.3s)
PASS exact:1
[123/161] 123 - teal_white.jpg
GT: [teal]
VLM: [teal] (4 jersey(s), 14.1s)
PASS exact:1
[124/161] 124 - dark blue_white.jpg
GT: [dark blue]
VLM: [dark blue] (4 jersey(s), 13.9s)
PASS exact:1
[125/161] 125 - dark blue_maroon.jpg
GT: [dark blue, maroon]
VLM: [dark blue, red] (2 jersey(s), 8.2s)
PARTIAL exact:1, MISS:maroon, extra:red
[126/161] 126 - white_blue.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 11.0s)
PASS exact:1
[127/161] 127 - yellow.jpg
GT: [yellow]
VLM: [yellow] (4 jersey(s), 13.9s)
PASS exact:1
[128/161] 128 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.0s)
PASS exact:1
[129/161] 129 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 17.2s)
PASS exact:1
[130/161] 130 - yellow_black.jpg
GT: [yellow, black]
VLM: [black, yellow] (2 jersey(s), 8.4s)
PASS exact:2
[131/161] 131 - purple_orange.jpg
GT: [purple, orange]
VLM: [orange, purple] (3 jersey(s), 10.8s)
PASS exact:2
[132/161] 132 - brown_white.jpg
GT: [brown]
VLM: [orange] (3 jersey(s), 10.8s)
FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png
GT: [light blue]
VLM: [light blue] (6 jersey(s), 21.2s)
PASS exact:1
[134/161] 134 - teal_white.jpg
GT: [teal]
VLM: [light blue] (1 jersey(s), 5.1s)
FAIL MISS:teal, extra:light blue
[135/161] 135 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.1s)
PASS exact:1
[136/161] 136 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 8.0s)
PASS exact:1
[137/161] 137 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 11.0s)
PASS exact:1
[138/161] 138 - maroon.jpg
GT: [maroon]
VLM: [red] (1 jersey(s), 4.9s)
FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 8.3s)
PASS similar:1
[140/161] 140 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 7.7s)
PASS exact:1
[141/161] 141 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (3 jersey(s), 11.2s)
PASS exact:1
[142/161] 142 - orange_white.jpg
GT: [orange]
VLM: [maroon] (2 jersey(s), 8.2s)
FAIL MISS:orange, extra:maroon
[143/161] 143 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 11.1s)
PASS exact:1
[144/161] 144 - green.jpg
GT: [green]
VLM: [green] (10 jersey(s), 31.9s)
PASS exact:1
[145/161] 145 - green_white.jpg
GT: [green]
VLM: [(none)] (1 jersey(s), 5.0s)
FAIL MISS:green
[146/161] 146 - red_gray.jpg
GT: [red, gray]
VLM: [gray, red] (2 jersey(s), 8.0s)
PASS exact:2
[147/161] 147 - green.jpg
GT: [green]
VLM: [green] (3 jersey(s), 10.8s)
PASS exact:1
[148/161] 148 - yellow_purple.jpg
GT: [yellow, purple]
VLM: [purple, yellow] (2 jersey(s), 7.8s)
PASS exact:2
[149/161] 149 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 16.7s)
PASS exact:1
[150/161] 150 - green_gray.jpg
GT: [green, gray]
VLM: [dark blue] (2 jersey(s), 7.9s)
FAIL MISS:green,gray, extra:dark blue
[151/161] 151 - yellow_black.jpg
GT: [yellow, black]
VLM: [dark blue, yellow] (5 jersey(s), 17.1s)
PARTIAL exact:1, MISS:black, extra:dark blue
[152/161] 152 - pink_dark blue.jpg
GT: [pink, dark blue]
VLM: [navy blue, pink] (2 jersey(s), 8.3s)
PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 8.1s)
PASS exact:1
[154/161] 154 - dark brown.jpeg
GT: [dark brown]
VLM: [dark brown] (5 jersey(s), 17.3s)
PASS exact:1
[155/161] 155 - white_green_gray_purple_yellow.jpg
GT: [green, gray, purple, yellow]
VLM: [gray, purple, yellow] (5 jersey(s), 17.4s)
PARTIAL exact:3, MISS:green
[156/161] 156 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (2 jersey(s), 7.7s)
PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 10.7s)
PASS exact:1
[158/161] 158 - dark blue_yellow.jpg
GT: [dark blue, yellow]
VLM: [dark blue, yellow] (4 jersey(s), 14.3s)
PASS exact:2
[159/161] 159 - blue_white.jpg
GT: [blue]
VLM: [blue] (4 jersey(s), 13.9s)
PASS exact:1
[160/161] 160 - blue_white.jpg
GT: [blue]
VLM: [blue] (2 jersey(s), 7.9s)
PASS exact:1
[161/161] 161 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 7.9s)
PASS exact:1
================================================================================
ACCURACY SUMMARY
================================================================================
Images processed: 161
Errors: 0
Total time: 1596.1s (9.9s avg)
Ground truth colors: 202 (excluding white)
VLM unique colors: 185 (excluding white)
--- Recall (did VLM find each ground truth color?) ---
Exact match: 145 / 202 (71.8%)
Similar match: 22 / 202 (10.9%)
Total found: 167 / 202 (82.7%)
Missed: 35 / 202 (17.3%)
--- Precision (are VLM colors correct?) ---
Exact match: 145 / 185 (78.4%)
Similar match: 22 / 185 (11.9%)
Total correct: 167 / 185 (90.3%)
Extra/wrong: 18 / 185 (9.7%)
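The recall and precision figures above are plain ratios with different denominators. Recall divides by the 202 ground-truth colors, and precision divides by the 185 colors the VLM reported. The minimal Python sketch below recomputes the printed percentages from the logged counts. The helper name `pct` and the constant names are hypothetical and are not taken from the test script.

```python
def pct(numerator: int, denominator: int) -> str:
    """Format a ratio the way the summary prints it, e.g. '145 / 202 (71.8%)'."""
    return f"{numerator} / {denominator} ({100 * numerator / denominator:.1f}%)"


# Counts copied from the summary above.
GT_COLORS = 202                                # ground-truth colors, excluding white
VLM_COLORS = 185                               # colors reported by the VLM, excluding white
RECALL_EXACT, RECALL_SIMILAR = 145, 22         # ground-truth colors that were matched
PRECISION_EXACT, PRECISION_SIMILAR = 145, 22   # reported colors that were matched

found = RECALL_EXACT + RECALL_SIMILAR          # a GT color counts if matched exactly or similarly
correct = PRECISION_EXACT + PRECISION_SIMILAR  # a reported color counts the same way

print(pct(found, GT_COLORS))                   # 167 / 202 (82.7%)
print(pct(GT_COLORS - found, GT_COLORS))       # 35 / 202 (17.3%)
print(pct(correct, VLM_COLORS))                # 167 / 185 (90.3%)
print(pct(VLM_COLORS - correct, VLM_COLORS))   # 18 / 185 (9.7%)
```

The same helper reproduces the Gemini summary as well. There, 165 of 202 ground-truth colors were found and 164 of 174 reported colors were correct, which gives 81.7% and 94.3%.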
--- Similar-Match Confusions (expected -> got) ---
navy blue -> blue x6
gold -> yellow x5
dark blue -> navy blue x5
navy blue -> dark blue x1
brown -> dark brown x1
navy -> dark blue x1
blue -> navy blue x1
green -> dark green x1
yellow -> gold x1
--- Most Missed Ground Truth Colors ---
gray 8 ########
black 6 ######
brown 4 ####
green 4 ####
maroon 3 ###
dark brown 2 ##
red 2 ##
teal 2 ##
blue 1 #
light blue 1 #
gold|yellow 1 #
orange 1 #
--- Most Common Extra/Wrong VLM Colors ---
maroon 5 #####
black 3 ###
light blue 2 ##
red 2 ##
dark blue 2 ##
gold 1 #
blue 1 #
green 1 #
orange 1 #
--- Per-Image Verdict ---
PASS 127
PARTIAL 15
FAIL 19
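The per-image verdicts in this log appear to follow a rule that can be read off the individual result lines:

- PASS when every ground-truth color is matched, either exactly or as a similar shade, and no extra color is reported.
- FAIL when no ground-truth color is matched.
- PARTIAL otherwise.

A ground-truth entry written as `red|orange` can be satisfied by either color. A `[(none)]` result corresponds to an empty list.

The minimal sketch below encodes that inferred rule. It is a hypothetical reconstruction, not the test script's code. The `SIMILAR` table is an assumption assembled from the similar matches recorded in this file.

```python
from typing import List, Optional

# Hypothetical similar-shade pairs, assembled from similar matches seen in this log.
SIMILAR = {
    frozenset(pair)
    for pair in [
        ("navy blue", "blue"), ("dark blue", "blue"), ("dark blue", "navy blue"),
        ("navy", "navy blue"), ("navy", "dark blue"),
        ("gold", "yellow"), ("brown", "dark brown"), ("green", "dark green"),
    ]
}


def match(gt_entry: str, vlm_color: str) -> Optional[str]:
    """Compare one ground-truth entry (possibly 'a|b') with one VLM color."""
    options = gt_entry.split("|")
    if vlm_color in options:
        return "exact"
    if any(frozenset((opt, vlm_color)) in SIMILAR for opt in options):
        return "similar"
    return None


def verdict(gt: List[str], vlm: List[str]) -> str:
    """Return PASS, PARTIAL, or FAIL using the rule inferred from the log."""
    found = [g for g in gt if any(match(g, v) for v in vlm)]
    extra = [v for v in vlm if not any(match(g, v) for g in gt)]
    if not found:
        return "FAIL"
    if len(found) == len(gt) and not extra:
        return "PASS"
    return "PARTIAL"


print(verdict(["dark blue", "maroon"], ["dark blue", "red"]))  # PARTIAL (image 125)
print(verdict(["navy blue", "maroon"], ["blue", "maroon"]))    # PASS (image 115)
print(verdict(["gray", "black"], ["light blue"]))              # FAIL (image 097)
```

In this sketch an extra color alone is enough to downgrade a PASS to PARTIAL. Image 009 in the first Qwen run (`[gold, red]` for ground truth `[red]`) shows that behavior.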
--- Failed Images (19) ---
001 -brown_white or dark brown.jpg
missed: brown, dark brown
extra: black
007 - brown_white.jpg
missed: brown
extra: maroon
016 - maroon.jpg
missed: maroon
031 - brown_white.jpg
missed: brown
extra: maroon
034 - light blue.jpg
missed: light blue
extra: blue
046 - green.jpg
missed: green
extra: black
048 - red.jpg
missed: red
extra: maroon
053 - black_white.jpg
missed: black
057 - white_gold or yellow.jpg
missed: gold|yellow
069 - red_white.jpg
missed: red
077 - teal_white.jpg
missed: teal
extra: green
083 - dark brown_white.jpg
missed: dark brown
extra: black
097 - gray_black.jpg
missed: gray, black
extra: light blue
132 - brown_white.jpg
missed: brown
extra: orange
134 - teal_white.jpg
missed: teal
extra: light blue
138 - maroon.jpg
missed: maroon
extra: red
142 - orange_white.jpg
missed: orange
extra: maroon
145 - green_white.jpg
missed: green
150 - green_gray.jpg
missed: green, gray
extra: dark blue
========================================
Gemini 3 Flash + jersey_prompt_constrained.txt
Started: Tue Mar 3 06:05:53 PM MST 2026
========================================
Model: gemini-3-flash-preview
Images to process: 161
Concurrency: 8 workers
Prompt: /home/rmcewen/data/dev.python/jersey_test/jersey_prompt_constrained.txt (2223 chars)
================================================================================
Pre-encoding images ... 161 images in 1.7s
Sending API requests ...
161/161 API calls completed (344.4s total)
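This run used 8 concurrent workers, so its two timing figures measure different things:

- The per-image times in the listing below (up to 43.6s, for image 033) appear to be individual request latencies.
- The summary's "2.1s avg" is the 344.4s wall-clock total divided by 161 images. It is a throughput figure and is not directly comparable to the sequential Qwen average.

The minimal sketch below illustrates the distinction with Python's standard `concurrent.futures.ThreadPoolExecutor`. `fake_api_call` is a hypothetical stand-in that sleeps instead of calling the Gemini API.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fake_api_call(latency: float) -> float:
    """Hypothetical stand-in for one API request; returns its measured latency."""
    start = time.perf_counter()
    time.sleep(latency)
    return time.perf_counter() - start


def run(latencies: List[float], workers: int = 8) -> Dict[str, float]:
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_call = list(pool.map(fake_api_call, latencies))
    wall = time.perf_counter() - wall_start
    return {
        # Comparable to the per-image times printed for each result.
        "mean_latency": sum(per_call) / len(per_call),
        # Comparable to the "(2.1s avg)" figure: wall-clock time per item.
        "wall_avg": wall / len(latencies),
    }


stats = run([0.2] * 16, workers=8)
print(stats)
```

With 16 calls of 0.2s spread over 8 workers, the mean latency stays near 0.2s. The wall-clock average drops to roughly 0.4s / 16, about 0.025s.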
================================================================================
[1/161] 001 -brown_white or dark brown.jpg
GT: [brown, dark brown]
VLM: [dark brown] (2 jersey(s), 36.3s)
PASS exact:1, similar:1
[2/161] 002 - yellow.jpg
GT: [yellow]
VLM: [yellow] (2 jersey(s), 6.3s)
PASS exact:1
[3/161] 003 - dark blue.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 7.5s)
PASS similar:1
[4/161] 004 - purple_light blue.jpg
GT: [purple, light blue]
VLM: [light blue, purple] (2 jersey(s), 37.3s)
PASS exact:2
[5/161] 005 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (1 jersey(s), 4.5s)
PARTIAL exact:1, MISS:gray
[6/161] 006 - navy blue.jpg
GT: [navy blue]
VLM: [navy blue] (1 jersey(s), 5.0s)
PASS exact:1
[7/161] 007 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 6.1s)
PASS exact:1
[8/161] 008 -red or orange.jpg
GT: [red|orange]
VLM: [red] (1 jersey(s), 3.2s)
PASS exact:1
[9/161] 009 - white_red.jpg
GT: [red]
VLM: [red] (4 jersey(s), 35.1s)
PASS exact:1
[10/161] 010 - white_black.jpg
GT: [black]
VLM: [black] (3 jersey(s), 10.5s)
PASS exact:1
[11/161] 011 - white or gray_purple.jpg
GT: [gray, purple]
VLM: [purple] (4 jersey(s), 40.8s)
PARTIAL exact:1, MISS:gray
[12/161] 012 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 5.3s)
PASS exact:1
[13/161] 013 - light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 8.9s)
PASS exact:1
[14/161] 014 - orange_dark blue or purple.jpg
GT: [orange, dark blue|purple]
VLM: [orange, purple] (3 jersey(s), 9.8s)
PASS exact:2
[15/161] 015 - green.jpg
GT: [green]
VLM: [green] (2 jersey(s), 4.4s)
PASS exact:1
[16/161] 016 - maroon.jpg
GT: [maroon]
VLM: [(none)] (0 jersey(s), 3.9s)
FAIL MISS:maroon
[17/161] 017 - brown_white.jpg
GT: [brown]
VLM: [dark brown] (2 jersey(s), 6.5s)
PASS similar:1
[18/161] 018 - gray_red.jpg
GT: [gray, red]
VLM: [gray] (1 jersey(s), 8.7s)
PARTIAL exact:1, MISS:red
[19/161] 019 - maroon_gold.jpg
GT: [maroon, gold]
VLM: [maroon] (1 jersey(s), 4.5s)
PARTIAL exact:1, MISS:gold
[20/161] 020 - white_brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (2 jersey(s), 4.9s)
PASS exact:1
[21/161] 021 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 9.1s)
PASS exact:1
[22/161] 022 - black_light blue.jpg
GT: [black, light blue]
VLM: [light blue] (1 jersey(s), 5.0s)
PARTIAL exact:1, MISS:black
[23/161] 023 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.2s)
PASS exact:1
[24/161] 024 - white_pink.jpg
GT: [pink]
VLM: [pink] (2 jersey(s), 5.7s)
PASS exact:1
[25/161] 025 - blue_green.jpg
GT: [blue, green]
VLM: [green] (1 jersey(s), 3.8s)
PARTIAL exact:1, MISS:blue
[26/161] 026 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 6.8s)
PASS exact:1
[27/161] 027 - red_white.jpg
GT: [red]
VLM: [red] (4 jersey(s), 37.7s)
PASS exact:1
[28/161] 028 - green_white.jpg
GT: [green]
VLM: [green] (6 jersey(s), 41.4s)
PASS exact:1
[29/161] 029 -maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 7.1s)
PASS exact:1
[30/161] 030 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (2 jersey(s), 5.8s)
PASS similar:1
[31/161] 031 - brown_white.jpg
GT: [brown]
VLM: [brown] (2 jersey(s), 6.0s)
PASS exact:1
[32/161] 032 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 5.9s)
PASS exact:1
[33/161] 033 - navy blue_white or gray.jpg
GT: [navy blue, gray]
VLM: [blue] (8 jersey(s), 43.6s)
PARTIAL similar:1, MISS:gray
[34/161] 034 - light blue.jpg
GT: [light blue]
VLM: [blue] (1 jersey(s), 11.5s)
FAIL MISS:light blue, extra:blue
[35/161] 035 -green_gold or yellow.jpg
GT: [green, gold|yellow]
VLM: [green] (1 jersey(s), 11.6s)
PARTIAL exact:1, MISS:gold|yellow
[36/161] 036 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 9.7s)
PASS exact:1
[37/161] 037 -navy_white.jpg
GT: [navy]
VLM: [navy blue] (3 jersey(s), 16.0s)
PASS similar:1
[38/161] 038 - red_white.jpg
GT: [red]
VLM: [red] (3 jersey(s), 38.7s)
PASS exact:1
[39/161] 039 - gray_white.jpg
GT: [gray]
VLM: [gray] (3 jersey(s), 18.4s)
PASS exact:1
[40/161] 040 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [gray, maroon] (2 jersey(s), 5.4s)
PASS exact:2
[41/161] 041 - navy blue_white.jpg
GT: [navy blue]
VLM: [navy blue] (8 jersey(s), 41.5s)
PASS exact:1
[42/161] 042 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 5.1s)
PASS exact:1
[43/161] 043 - gray_black.jpg
GT: [gray, black]
VLM: [black, gray] (5 jersey(s), 39.3s)
PASS exact:2
[44/161] 044 - purple_black.jpg
GT: [purple, black]
VLM: [purple] (8 jersey(s), 36.4s)
PARTIAL exact:1, MISS:black
[45/161] 045 - purple.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 36.0s)
PASS exact:1
[46/161] 046 - green.jpg
GT: [green]
VLM: [black] (8 jersey(s), 35.2s)
FAIL MISS:green, extra:black
[47/161] 047 - purple_white.jpg
GT: [purple]
VLM: [purple] (3 jersey(s), 5.3s)
PASS exact:1
[48/161] 048 - red.jpg
GT: [red]
VLM: [(none)] (0 jersey(s), 36.0s)
FAIL MISS:red
[49/161] 049 - white_gold.jpg
GT: [gold]
VLM: [yellow] (2 jersey(s), 3.6s)
PASS similar:1
[50/161] 050 - white_orange.jpg
GT: [orange]
VLM: [orange] (6 jersey(s), 40.4s)
PASS exact:1
[51/161] 051 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 5.8s)
PASS exact:1
[52/161] 052 - black_gold.jpg
GT: [black, gold]
VLM: [black] (1 jersey(s), 24.0s)
PARTIAL exact:1, MISS:gold
[53/161] 053 - black_white.jpg
GT: [black]
VLM: [(none)] (1 jersey(s), 4.3s)
FAIL MISS:black
[54/161] 054 - white_blue.jpg
GT: [blue]
VLM: [blue] (2 jersey(s), 6.5s)
PASS exact:1
[55/161] 055 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (2 jersey(s), 12.6s)
PASS exact:1, similar:1
[56/161] 056 - white_red.jpg
GT: [red]
VLM: [red] (4 jersey(s), 36.0s)
PASS exact:1
[57/161] 057 - white_gold or yellow.jpg
GT: [gold|yellow]
VLM: [(none)] (1 jersey(s), 4.4s)
FAIL MISS:gold|yellow
[58/161] 058 - purple.jpg
GT: [purple]
VLM: [purple] (4 jersey(s), 6.2s)
PASS exact:1
[59/161] 059 - black_gold.jpg
GT: [black, gold]
VLM: [gold] (1 jersey(s), 4.5s)
PARTIAL exact:1, MISS:black
[60/161] 060 - gray_navy blue.jpg
GT: [gray, navy blue]
VLM: [blue] (2 jersey(s), 7.1s)
PARTIAL similar:1, MISS:gray
[61/161] 061 - brown or orange.jpg
GT: [brown|orange]
VLM: [orange] (1 jersey(s), 3.4s)
PASS exact:1
[62/161] 062 - orange_blue.jpg
GT: [orange, blue]
VLM: [blue, orange] (2 jersey(s), 4.8s)
PASS exact:2
[63/161] 063 - dark brown.jpg
GT: [dark brown]
VLM: [brown] (1 jersey(s), 4.7s)
PASS similar:1
[64/161] 064 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 5.3s)
PASS exact:1
[65/161] 065 - green_gold.jpg
GT: [green, gold]
VLM: [green, yellow] (5 jersey(s), 37.1s)
PASS exact:1, similar:1
[66/161] 066 - yellow.jpg
GT: [yellow]
VLM: [yellow] (1 jersey(s), 6.6s)
PASS exact:1
[67/161] 067 - red_white.jpg
GT: [red]
VLM: [red] (5 jersey(s), 36.5s)
PASS exact:1
[68/161] 068 - gold.jpg
GT: [gold]
VLM: [gold] (1 jersey(s), 39.5s)
PASS exact:1
[69/161] 069 - red_white.jpg
GT: [red]
VLM: [(none)] (5 jersey(s), 40.6s)
FAIL MISS:red
[70/161] 070 - green_white.jpg
GT: [green]
VLM: [green] (3 jersey(s), 7.9s)
PASS exact:1
[71/161] 071 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 4.4s)
PASS exact:1
[72/161] 072 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 5.6s)
PASS exact:1
[73/161] 073 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (1 jersey(s), 4.2s)
PASS exact:1
[74/161] 074 - white_orange.jpg
GT: [orange]
VLM: [(none)] (1 jersey(s), 8.9s)
FAIL MISS:orange
[75/161] 075 - green_white.jpg
GT: [green]
VLM: [green] (1 jersey(s), 5.0s)
PASS exact:1
[76/161] 076 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (4 jersey(s), 38.6s)
PASS exact:1
[77/161] 077 - teal_white.jpg
GT: [teal]
VLM: [green] (5 jersey(s), 34.5s)
FAIL MISS:teal, extra:green
[78/161] 078 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 5.7s)
PASS exact:1
[79/161] 079 - blue_maroon.jpg
GT: [blue, maroon]
VLM: [blue, maroon] (6 jersey(s), 10.0s)
PASS exact:2
[80/161] 080 - navy blue_white.jpg
GT: [navy blue]
VLM: [blue] (1 jersey(s), 7.9s)
PASS similar:1
[81/161] 081 - navy blue.jpg
GT: [navy blue]
VLM: [light blue] (2 jersey(s), 6.6s)
FAIL MISS:navy blue, extra:light blue
[82/161] 082 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (3 jersey(s), 21.3s)
PASS similar:1
[83/161] 083 - dark brown_white.jpg
GT: [dark brown]
VLM: [dark brown] (2 jersey(s), 40.1s)
PASS exact:1
[84/161] 084 - dark brown_yellow.jpg
GT: [dark brown, yellow]
VLM: [dark brown, gold] (2 jersey(s), 8.6s)
PASS exact:1, similar:1
[85/161] 085 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 25.5s)
PASS exact:1
[86/161] 086 - dark brown_white.jpg
GT: [dark brown]
VLM: [dark brown] (1 jersey(s), 38.5s)
PASS exact:1
[87/161] 087 - white_light blue.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 10.2s)
PASS exact:1
[88/161] 088 - white_maroon.jpg
GT: [maroon]
VLM: [(none)] (2 jersey(s), 34.9s)
FAIL MISS:maroon
[89/161] 089 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 7.7s)
PASS exact:1
[90/161] 090 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (5 jersey(s), 36.9s)
PASS exact:1
[91/161] 091 - teal.jpg
GT: [teal]
VLM: [teal] (3 jersey(s), 7.6s)
PASS exact:1
[92/161] 092 - green_white.jpg
GT: [green]
VLM: [green] (6 jersey(s), 40.0s)
PASS exact:1
[93/161] 093 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 6.6s)
PASS similar:1
[94/161] 094 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 6.6s)
PASS exact:1
[95/161] 095 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 35.6s)
PASS exact:1
[96/161] 096 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 3.7s)
PASS exact:1
[97/161] 097 - gray_black.jpg
GT: [gray, black]
VLM: [gray] (4 jersey(s), 39.1s)
PARTIAL exact:1, MISS:black
[98/161] 098 - teal_white.jpg
GT: [teal]
VLM: [teal] (2 jersey(s), 35.9s)
PASS exact:1
[99/161] 099 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (3 jersey(s), 5.6s)
PASS exact:1
[100/161] 100 - orange_white.jpg
GT: [orange]
VLM: [orange] (4 jersey(s), 34.6s)
PASS exact:1
[101/161] 101 - green_white.jpg
GT: [green]
VLM: [green] (7 jersey(s), 38.7s)
PASS exact:1
[102/161] 102 - yellow-black.jpg
GT: [yellow, black]
VLM: [black] (1 jersey(s), 7.1s)
PARTIAL exact:1, MISS:yellow
[103/161] 103 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 35.0s)
PASS exact:1
[104/161] 104 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 35.3s)
PASS exact:1
[105/161] 105 - orange.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 4.8s)
PASS exact:1
[106/161] 106 - black_gray.jpg
GT: [black, gray]
VLM: [black, gray] (2 jersey(s), 6.9s)
PASS exact:2
[107/161] 107 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 7.8s)
PASS exact:1
[108/161] 108 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.3s)
PASS exact:1
[109/161] 109 - purple_white.jpg
GT: [purple]
VLM: [purple] (2 jersey(s), 4.8s)
PASS exact:1
[110/161] 110 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 7.0s)
PASS exact:1
[111/161] 111 - orange_white.jpg
GT: [orange]
VLM: [orange] (2 jersey(s), 10.9s)
PASS exact:1
[112/161] 112 - orange_white.jpg
GT: [orange]
VLM: [(none)] (0 jersey(s), 37.6s)
FAIL MISS:orange
[113/161] 113 - orange.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 3.5s)
PASS exact:1
[114/161] 114 - black_white.jpg
GT: [black]
VLM: [black] (2 jersey(s), 5.5s)
PASS exact:1
[115/161] 115 - navy blue_maroon.jpg
GT: [navy blue, maroon]
VLM: [blue, maroon] (4 jersey(s), 7.4s)
PASS exact:1, similar:1
[116/161] 116 - gray_white.jpg
GT: [gray]
VLM: [gray] (2 jersey(s), 39.7s)
PASS exact:1
[117/161] 117 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 37.5s)
PASS exact:1
[118/161] 118 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (2 jersey(s), 12.1s)
PASS similar:1
[119/161] 119 - black_yellow.jpg
GT: [black, yellow]
VLM: [black, yellow] (4 jersey(s), 36.4s)
PASS exact:2
[120/161] 120 - red_dark blue.jpg
GT: [red, dark blue]
VLM: [navy blue, red] (3 jersey(s), 17.4s)
PASS exact:1, similar:1
[121/161] 121 - orange_white.jpg
GT: [orange]
VLM: [orange] (3 jersey(s), 17.7s)
PASS exact:1
[122/161] 122 - gray.jpg
GT: [gray]
VLM: [gray] (1 jersey(s), 4.1s)
PASS exact:1
[123/161] 123 - teal_white.jpg
GT: [teal]
VLM: [teal] (4 jersey(s), 11.1s)
PASS exact:1
[124/161] 124 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (4 jersey(s), 8.1s)
PASS similar:1
[125/161] 125 - dark blue_maroon.jpg
GT: [dark blue, maroon]
VLM: [maroon, navy blue] (4 jersey(s), 17.9s)
PASS exact:1, similar:1
[126/161] 126 - white_blue.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 6.8s)
PASS exact:1
[127/161] 127 - yellow.jpg
GT: [yellow]
VLM: [black, gold] (5 jersey(s), 39.3s)
PARTIAL similar:1, extra:black
[128/161] 128 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 11.6s)
PASS exact:1
[129/161] 129 - blue_white.jpg
GT: [blue]
VLM: [(none)] (3 jersey(s), 6.1s)
FAIL MISS:blue
[130/161] 130 - yellow_black.jpg
GT: [yellow, black]
VLM: [yellow] (1 jersey(s), 4.3s)
PARTIAL exact:1, MISS:black
[131/161] 131 - purple_orange.jpg
GT: [purple, orange]
VLM: [orange, purple] (3 jersey(s), 9.4s)
PASS exact:2
[132/161] 132 - brown_white.jpg
GT: [brown]
VLM: [orange] (2 jersey(s), 36.4s)
FAIL MISS:brown, extra:orange
[133/161] 133 - light blue.png
GT: [light blue]
VLM: [light blue] (7 jersey(s), 38.8s)
PASS exact:1
[134/161] 134 - teal_white.jpg
GT: [teal]
VLM: [light blue] (1 jersey(s), 11.2s)
FAIL MISS:teal, extra:light blue
[135/161] 135 - green.jpg
GT: [green]
VLM: [green] (1 jersey(s), 4.9s)
PASS exact:1
[136/161] 136 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 6.8s)
PASS exact:1
[137/161] 137 - green_white.jpg
GT: [green]
VLM: [green] (4 jersey(s), 9.8s)
PASS exact:1
[138/161] 138 - maroon.jpg
GT: [maroon]
VLM: [red] (1 jersey(s), 4.3s)
FAIL MISS:maroon, extra:red
[139/161] 139 - dark blue_white.jpg
GT: [dark blue]
VLM: [navy blue] (1 jersey(s), 5.3s)
PASS similar:1
[140/161] 140 - red_white.jpg
GT: [red]
VLM: [red] (2 jersey(s), 5.3s)
PASS exact:1
[141/161] 141 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (3 jersey(s), 6.3s)
PASS exact:1
[142/161] 142 - orange_white.jpg
GT: [orange]
VLM: [orange] (1 jersey(s), 5.3s)
PASS exact:1
[143/161] 143 - blue_white.jpg
GT: [blue]
VLM: [blue] (3 jersey(s), 5.7s)
PASS exact:1
[144/161] 144 - green.jpg
GT: [green]
VLM: [green] (8 jersey(s), 38.3s)
PASS exact:1
[145/161] 145 - green_white.jpg
GT: [green]
VLM: [green] (2 jersey(s), 7.3s)
PASS exact:1
[146/161] 146 - red_gray.jpg
GT: [red, gray]
VLM: [gray, red] (2 jersey(s), 4.7s)
PASS exact:2
[147/161] 147 - green.jpg
GT: [green]
VLM: [green] (3 jersey(s), 5.2s)
PASS exact:1
[148/161] 148 - yellow_purple.jpg
GT: [yellow, purple]
VLM: [purple, yellow] (2 jersey(s), 8.4s)
PASS exact:2
[149/161] 149 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 38.0s)
PASS exact:1
[150/161] 150 - green_gray.jpg
GT: [green, gray]
VLM: [black] (2 jersey(s), 10.3s)
FAIL MISS:green,gray, extra:black
[151/161] 151 - yellow_black.jpg
GT: [yellow, black]
VLM: [gold, navy blue] (6 jersey(s), 35.2s)
PARTIAL similar:1, MISS:black, extra:navy blue
[152/161] 152 - pink_dark blue.jpg
GT: [pink, dark blue]
VLM: [navy blue, pink] (3 jersey(s), 7.9s)
PASS exact:1, similar:1
[153/161] 153 - maroon_white.jpg
GT: [maroon]
VLM: [maroon] (2 jersey(s), 4.6s)
PASS exact:1
[154/161] 154 - dark brown.jpeg
GT: [dark brown]
VLM: [brown] (5 jersey(s), 8.9s)
PASS similar:1
[155/161] 155 - white_green_gray_purple_yellow.jpg
GT: [green, gray, purple, yellow]
VLM: [gold, gray, purple] (5 jersey(s), 21.6s)
PARTIAL exact:2, similar:1, MISS:green
[156/161] 156 - maroon_gray.jpg
GT: [maroon, gray]
VLM: [maroon] (2 jersey(s), 15.0s)
PARTIAL exact:1, MISS:gray
[157/161] 157 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 37.0s)
PASS exact:1
[158/161] 158 - dark blue_yellow.jpg
GT: [dark blue, yellow]
VLM: [gold, navy blue] (5 jersey(s), 37.4s)
PASS similar:2
[159/161] 159 - blue_white.jpg
GT: [blue]
VLM: [blue] (5 jersey(s), 10.1s)
PASS exact:1
[160/161] 160 - blue_white.jpg
GT: [blue]
VLM: [(none)] (1 jersey(s), 4.3s)
FAIL MISS:blue
[161/161] 161 - light blue_white.jpg
GT: [light blue]
VLM: [light blue] (2 jersey(s), 4.4s)
PASS exact:1
================================================================================
ACCURACY SUMMARY (gemini-3-flash-preview)
================================================================================
Images processed: 161
Errors: 0
Total time: 344.4s wall-clock (2.1s avg per image with 8 concurrent workers)
Ground truth colors: 202 (excluding white)
VLM unique colors: 174 (excluding white)
--- Recall (did VLM find each ground truth color?) ---
Exact match: 137 / 202 (67.8%)
Similar match: 28 / 202 (13.9%)
Total found: 165 / 202 (81.7%)
Missed: 37 / 202 (18.3%)
--- Precision (are VLM colors correct?) ---
Exact match: 137 / 174 (78.7%)
Similar match: 27 / 174 (15.5%)
Total correct: 164 / 174 (94.3%)
Extra/wrong: 10 / 174 (5.7%)
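In this run, recall counts 28 similar matches while precision counts 27. The gap comes from the two metrics counting from opposite sides. Recall asks whether each ground-truth color was matched, and precision asks whether each reported color was matched.

Image 001 is the likely source of the gap. Its single reported color, `dark brown`, is an exact hit for the ground-truth `dark brown` and a similar hit for `brown`. It therefore adds one similar match to recall but only an exact match to precision.

The minimal sketch below shows per-side tallying. It rests on two assumptions: a hypothetical one-entry `SIMILAR` table, and an exact match taking priority over a similar one when a color is checked against a list.

```python
from typing import Dict, List

# Hypothetical similar-shade table, just enough for image 001.
SIMILAR = {frozenset(("brown", "dark brown"))}


def kind(a: str, b: str) -> str:
    if a == b:
        return "exact"
    if frozenset((a, b)) in SIMILAR:
        return "similar"
    return "none"


def best(color: str, others: List[str]) -> str:
    """Best match of one color against a list: exact beats similar beats none."""
    kinds = {kind(color, other) for other in others}
    for k in ("exact", "similar"):
        if k in kinds:
            return k
    return "none"


def tally(gt: List[str], vlm: List[str]) -> Dict[str, Dict[str, int]]:
    recall = {"exact": 0, "similar": 0, "none": 0}
    precision = {"exact": 0, "similar": 0, "none": 0}
    for g in gt:    # recall: one verdict per ground-truth color
        recall[best(g, vlm)] += 1
    for v in vlm:   # precision: one verdict per reported color
        precision[best(v, gt)] += 1
    return {"recall": recall, "precision": precision}


print(tally(["brown", "dark brown"], ["dark brown"]))
# {'recall': {'exact': 1, 'similar': 1, 'none': 0}, 'precision': {'exact': 1, 'similar': 0, 'none': 0}}
```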
--- Similar-Match Confusions (expected -> got) ---
dark blue -> navy blue x10
navy blue -> blue x5
yellow -> gold x5
gold -> yellow x3
brown -> dark brown x2
dark brown -> brown x2
navy -> navy blue x1
--- Most Missed Ground Truth Colors ---
black 7 #######
gray 6 ######
maroon 3 ###
red 3 ###
blue 3 ###
green 3 ###
gold 2 ##
gold|yellow 2 ##
orange 2 ##
teal 2 ##
light blue 1 #
navy blue 1 #
yellow 1 #
brown 1 #
--- Most Common Extra/Wrong VLM Colors ---
black 3 ###
light blue 2 ##
blue 1 #
green 1 #
orange 1 #
red 1 #
navy blue 1 #
--- Per-Image Verdict ---
PASS 124
PARTIAL 19
FAIL 18
--- Failed Images (18) ---
016 - maroon.jpg
missed: maroon
034 - light blue.jpg
missed: light blue
extra: blue
046 - green.jpg
missed: green
extra: black
048 - red.jpg
missed: red
053 - black_white.jpg
missed: black
057 - white_gold or yellow.jpg
missed: gold|yellow
069 - red_white.jpg
missed: red
074 - white_orange.jpg
missed: orange
077 - teal_white.jpg
missed: teal
extra: green
081 - navy blue.jpg
missed: navy blue
extra: light blue
088 - white_maroon.jpg
missed: maroon
112 - orange_white.jpg
missed: orange
129 - blue_white.jpg
missed: blue
132 - brown_white.jpg
missed: brown
extra: orange
134 - teal_white.jpg
missed: teal
extra: light blue
138 - maroon.jpg
missed: maroon
extra: red
150 - green_gray.jpg
missed: green, gray
extra: black
160 - blue_white.jpg
missed: blue
========================================
All tests completed at: Tue Mar 3 06:11:40 PM MST 2026