# Jersey Color Detection - VLM Comparison Report

**Date:** 2026-02-24

**Test set:** 161 basketball images (`basketball_jersery_color_test_files/`)

## Overview

Five tests were run to evaluate how vision-language models describe jersey colors:

| Test | Model | Images | Prompt | Purpose |
|------|-------|--------|--------|---------|
| 1 | Qwen2.5-VL-7B (local, llama.cpp) | 161 | Named colors | Baseline color vocabulary |
| 2 | Gemini 3 Flash (cloud API) | 161 | Named colors | Cloud model color vocabulary |
| 3 | Qwen3-VL-8B (local, llama.cpp) | 161 | Named colors | Newer local model color vocabulary |
| 4 | Gemini 3 Flash (cloud API) | 20 (random, seed=42) | Hex codes (jersey only) | Hex color specificity |
| 5 | Qwen3-VL-8B (local, llama.cpp) | 20 (random, seed=42) | Hex codes (jersey only) | Hex color specificity |

---

## Named Color Vocabulary (Tests 1-3)

### Detection Volume

| Metric | Qwen2.5-VL-7B | Gemini 3 Flash | Qwen3-VL-8B |
|--------|---------------|----------------|--------------|
| Jerseys detected | 369 | 453 | 444 |
| Errors | 0 | 0 | 1 |
| Avg time/image | 14.9s | 15.9s | 17.0s |
| Unique jersey colors | 15 | 19 | 15 |
| Unique number colors | 11 | 15 | 13 |
| Combined palette size | 15 | 19 | 17 |

Gemini detected the most jerseys (453) and used the broadest color vocabulary (19 terms). Qwen3-VL-8B detected nearly as many jerseys (444) as Gemini but with a vocabulary closer to the older Qwen2.5 model.

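The detection counts can be normalized to make the models easier to compare. The sketch below derives jerseys per image and the relative gain of Qwen3-VL-8B over Qwen2.5-VL-7B from the table values. It assumes the average is taken over all 161 images (including the one image on which Qwen3-VL-8B errored); the helper names `per_image` and `relative_gain` are illustrative, not part of the original test harness.

```python
# Jersey counts copied from the Detection Volume table.
TOTAL_IMAGES = 161  # assumption: averages are taken over the full test set

jerseys_detected = {
    "Qwen2.5-VL-7B": 369,
    "Gemini 3 Flash": 453,
    "Qwen3-VL-8B": 444,
}


def per_image(count: int, images: int = TOTAL_IMAGES) -> float:
    """Average number of jerseys detected per image."""
    return count / images


def relative_gain(new: int, old: int) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


for model, count in jerseys_detected.items():
    print(f"{model}: {per_image(count):.2f} jerseys/image")

gain = relative_gain(jerseys_detected["Qwen3-VL-8B"],
                     jerseys_detected["Qwen2.5-VL-7B"])
print(f"Qwen3-VL-8B vs Qwen2.5-VL-7B: +{gain:.1f}% jerseys")
```

With these inputs the per-image averages come out to about 2.29, 2.81, and 2.76, and the Qwen3-VL-8B gain to about 20%.
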
### Jersey Color Distribution

| Color | Qwen2.5-VL-7B | Gemini 3 Flash | Qwen3-VL-8B | Notes |
|-------|---------------|----------------|--------------|-------|
| white | 84 (22.8%) | 125 (27.6%) | 120 (27.0%) | Top color for all three |
| blue | 60 (16.3%) | 43 (9.5%) | 69 (15.5%) | Both Qwen models lump blues |
| green | 48 (13.0%) | 60 (13.2%) | 53 (11.9%) | Consistent across models |
| black | 31 (8.4%) | 21 (4.6%) | 33 (7.4%) | |
| purple | 25 (6.8%) | 28 (6.2%) | 30 (6.8%) | Consistent |
| red | 27 (7.3%) | 22 (4.9%) | 28 (6.3%) | |
| orange | 24 (6.5%) | 27 (6.0%) | 27 (6.1%) | Very consistent |
| yellow | 27 (7.3%) | 24 (5.3%) | 26 (5.9%) | |
| maroon | 14 (3.8%) | 23 (5.1%) | 15 (3.4%) | Gemini uses maroon more |
| light blue | 6 (1.6%) | 22 (4.9%) | 13 (2.9%) | Gemini distinguishes light blue most |
| gray/grey | 9 (2.4%) | 12 (2.6%) | 10 (2.3%) | |
| brown | 6 (1.6%) | 13 (2.9%) | 9 (2.0%) | |
| teal | 4 (1.1%) | 7 (1.5%) | 7 (1.6%) | |
| pink | 2 (0.5%) | 2 (0.4%) | 2 (0.5%) | |
| gold | 2 (0.5%) | 2 (0.4%) | 2 (0.5%) | |
| navy blue | -- | 11 (2.4%) | -- | Gemini-only |
| dark blue | -- | 9 (2.0%) | -- | Gemini-only |
| dark brown | -- | 1 (0.2%) | -- | Gemini-only |
| navy | -- | 1 (0.2%) | -- | Gemini-only |

### Number Color Distribution

| Color | Qwen2.5-VL-7B | Gemini 3 Flash | Qwen3-VL-8B |
|-------|---------------|----------------|--------------|
| white | 195 (52.8%) | 183 (40.4%) | 184 (41.4%) |
| black | 60 (16.3%) | 40 (8.8%) | 44 (9.9%) |
| yellow | 39 (10.6%) | 58 (12.8%) | 32 (7.2%) |
| red | 30 (8.1%) | 44 (9.7%) | 41 (9.2%) |
| blue | 23 (6.2%) | 39 (8.6%) | 39 (8.8%) |
| orange | 8 (2.2%) | 21 (4.6%) | 29 (6.5%) |
| gold | -- | 5 (1.1%) | 21 (4.7%) |
| dark blue | -- | 14 (3.1%) | 9 (2.0%) |
| maroon | 2 (0.5%) | 14 (3.1%) | 12 (2.7%) |
| green | 3 (0.8%) | 13 (2.9%) | 14 (3.2%) |
| purple | 4 (1.1%) | 11 (2.4%) | 11 (2.5%) |
| pink | 3 (0.8%) | 6 (1.3%) | 6 (1.4%) |
| brown | 2 (0.5%) | 2 (0.4%) | -- |
| grey | -- | 2 (0.4%) | -- |
| navy blue | -- | 1 (0.2%) | -- |
| silver | -- | -- | 2 (0.5%) |

### Key Differences in Named Color Mode

1. **Gemini has the richest vocabulary.** It uses 19 distinct jersey color terms vs 15 for both Qwen models. The four extras are three blue-shade variants (navy blue, dark blue, navy) plus dark brown.

2. **Both Qwen models lump blues together.** Qwen2.5-VL-7B reports 60 "blue" jerseys and Qwen3-VL-8B reports 69. Gemini splits its blue-family detections into blue (43), light blue (22), navy blue (11), dark blue (9), and navy (1), for a total of 86. Plain "blue" covers only 50% of Gemini's blue family, compared with 91% for Qwen2.5 (60 of 66, counting light blue) and 84% for Qwen3 (69 of 82).

3. **Qwen3-VL-8B is a modest upgrade over Qwen2.5-VL-7B.** It detects 20% more jerseys (444 vs 369) and uses the same 15 jersey color terms with a slightly more balanced distribution. Its number-color palette adds "gold", "dark blue", and "silver" and drops "brown"; of these, "dark blue" and "silver" are new to its overall vocabulary.

4. **Gemini detects the most jerseys overall.** 453 vs 444 (Qwen3) vs 369 (Qwen2.5). The two newer models are close, while Qwen2.5 lags behind.

5. **All three models are dominated by basic colors.** White, blue, green, and black together account for more than half of jersey detections in every model (roughly 55% to 62%). None spontaneously uses precise shade names like "crimson", "cobalt", or "forest green".

6. **Qwen3-VL-8B favors "gold" for number colors.** It reported gold 21 times for number colors vs Gemini's 5 and Qwen2.5's 0. This may reflect team-specific coloring (e.g., Lakers gold numbers). Because all three models saw the same images, it may also be a labeling preference: Qwen3 reports fewer "yellow" numbers (32) than Gemini (58).

---

## Hex Color Specificity (Tests 4-5)

Both tests used the same 20 random images (seed=42) and evaluated **jersey colors only**. Number colors were excluded because they are usually primary colors like white or black.

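For both models to see the same 20 images, the subset has to be drawn deterministically. The report only states "random, seed=42"; the sketch below is one plausible way to do it with Python's `random` module, not necessarily how the original subset was drawn. The function name `pick_subset` and the filenames are hypothetical.

```python
import random
from typing import List


def pick_subset(filenames: List[str], k: int = 20, seed: int = 42) -> List[str]:
    """Draw a reproducible sample of k filenames.

    Sorting first makes the result independent of directory listing order;
    a dedicated Random instance avoids touching the global RNG state.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(filenames), k)


# Hypothetical filenames standing in for the 161 test images.
all_images = [f"img_{i:03d}.jpg" for i in range(161)]

subset_for_gemini = pick_subset(all_images)
# Same files listed in a different order still yield the same subset.
subset_for_qwen3 = pick_subset(list(reversed(all_images)))

print(subset_for_gemini == subset_for_qwen3)  # prints True
```

Sorting before sampling matters because directory listings are not guaranteed to come back in the same order across machines or runs.
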
### Summary

| Metric | Gemini 3 Flash | Qwen3-VL-8B |
|--------|----------------|--------------|
| Images tested | 20 | 20 |
| Total jerseys | 56 | 59 |
| Jersey color values | 56 | 59 |
| Valid hex codes | 56/56 (100%) | 59/59 (100%) |
| Unique hex values | 24 | 21 |
| Specific (distinct shade) | 40 (71.4%) | 37 (62.7%) |
| Generic (near primary) | 16 (28.6%) | 22 (37.3%) |

### Distance from Nearest Primary Color

| Stat | Gemini 3 Flash | Qwen3-VL-8B |
|------|----------------|--------------|
| Min | 0.0 | 0.0 |
| Avg | 44.5 | 34.5 |
| Max | 111.0 | 110.7 |

(Scale: 0 = exact primary match. 20 = generic threshold. Higher = more specific.)

### Gemini 3 Flash - Unique Hex Values (24)

| Hex | RGB | Count | Classification |
|-----|-----|-------|---------------|
| `#004B23` | (0, 75, 35) | x7 | specific, near green (dark), d=63.5 |
| `#1A2344` | (26, 35, 68) | x2 | specific, near navy, d=74.2 |
| `#1E4BA1` | (30, 75, 161) | x1 | specific, near navy, d=87.3 |
| `#2B231D` | (43, 35, 29) | x1 | specific, near black, d=62.6 |
| `#3D2B1F` | (61, 43, 31) | x1 | specific, near black, d=80.8 |
| `#461D7C` | (70, 29, 124) | x1 | specific, near purple, d=65.0 |
| `#4B2E83` | (75, 46, 131) | x5 | specific, near purple, d=70.2 |
| `#701112` | (112, 17, 18) | x1 | specific, near maroon, d=29.5 |
| `#7BAFD4` | (123, 175, 212) | x3 | specific, near silver, d=73.8 |
| `#990000` | (153, 0, 0) | x2 | specific, near maroon, d=25.0 |
| `#A9A9A9` | (169, 169, 169) | x1 | specific, near silver, d=39.8 |
| `#C41230` | (196, 18, 48) | x1 | specific, near brown, d=39.7 |
| `#D11111` | (209, 17, 17) | x2 | specific, near red, d=51.9 |
| `#D32F2F` | (211, 47, 47) | x2 | specific, near brown, d=46.5 |
| `#E31837` | (227, 24, 55) | x1 | specific, near brown, d=65.9 |
| `#E31B23` | (227, 27, 35) | x1 | specific, near red, d=52.3 |
| `#E3242B` | (227, 36, 43) | x2 | specific, near brown, d=62.3 |
| `#E6E600` | (230, 230, 0) | x1 | specific, near gold, d=29.2 |
| `#E8E8E8` | (232, 232, 232) | x1 | specific, near white, d=39.8 |
| `#E91E63` | (233, 30, 99) | x1 | specific, near brown, d=89.5 |
| `#F06292` | (240, 98, 146) | x2 | specific, near pink, d=111.0 |
| `#F57C00` | (245, 124, 0) | x1 | specific, near orange, d=42.2 |
| `#FFCD00` | (255, 205, 0) | x1 | GENERIC, near gold, d=10.0 |
| `#FFFFFF` | (255, 255, 255) | x15 | GENERIC, near white, d=0.0 |

### Qwen3-VL-8B - Unique Hex Values (21)

| Hex | RGB | Count | Classification |
|-----|-----|-------|---------------|
| `#000000` | (0, 0, 0) | x1 | GENERIC, near black, d=0.0 |
| `#006400` | (0, 100, 0) | x10 | specific, near green (dark), d=28.0 |
| `#191970` | (25, 25, 112) | x1 | specific, near navy, d=38.8 |
| `#19418A` | (25, 65, 138) | x1 | specific, near navy, d=70.4 |
| `#3D2B21` | (61, 43, 33) | x2 | specific, near black, d=81.6 |
| `#66B2FF` | (102, 178, 255) | x3 | specific, near silver, d=110.7 |
| `#6A0DAD` | (106, 13, 173) | x6 | specific, near purple, d=51.7 |
| `#8B0000` | (139, 0, 0) | x1 | GENERIC, near maroon, d=11.0 |
| `#A9A9A9` | (169, 169, 169) | x1 | specific, near silver, d=39.8 |
| `#B22234` | (178, 34, 52) | x2 | GENERIC, near brown, d=18.2 |
| `#D32F2F` | (211, 47, 47) | x3 | specific, near brown, d=46.5 |
| `#D60000` | (214, 0, 0) | x3 | specific, near red, d=41.0 |
| `#DC143C` | (220, 20, 60) | x2 | specific, near brown, d=61.9 |
| `#F5F5DC` | (245, 245, 220) | x2 | specific, near white, d=37.7 |
| `#F5F5F5` | (245, 245, 245) | x1 | GENERIC, near white, d=17.3 |
| `#FF0000` | (255, 0, 0) | x1 | GENERIC, near red, d=0.0 |
| `#FF6347` | (255, 99, 71) | x1 | specific, near orange, d=96.9 |
| `#FF69B4` | (255, 105, 180) | x2 | specific, near pink, d=90.0 |
| `#FFD700` | (255, 215, 0) | x1 | GENERIC, near gold, d=0.0 |
| `#FFFF00` | (255, 255, 0) | x1 | GENERIC, near yellow, d=0.0 |
| `#FFFFFF` | (255, 255, 255) | x14 | GENERIC, near white, d=0.0 |

### Notable Findings

- **Both models can produce valid hex codes.** 100% of returned values were valid hex in both cases.
- **Gemini is more specific overall.** 71.4% of its jersey hex codes were distinct shades vs 62.7% for Qwen3. Gemini also produced more unique hex values (24 vs 21) and had a higher average distance from primaries (44.5 vs 34.5).
- **Gemini uses more varied shades of each color family.** For red-family jerseys, Gemini returned 8 distinct hex values (`#701112`, `#990000`, `#C41230`, `#D11111`, `#D32F2F`, `#E31837`, `#E31B23`, `#E3242B`), all classified as specific. Qwen3 returned 6 (`#8B0000`, `#B22234`, `#D32F2F`, `#D60000`, `#DC143C`, `#FF0000`), of which one is an exact primary (`#FF0000`) and two more fall under the generic threshold (`#8B0000`, `#B22234`).
- **Qwen3 reuses hex values somewhat more heavily.** `#006400` (dark green) appeared 10 times and `#FFFFFF` 14 times, so two values account for 41% of its 59 results. Gemini's top two (`#FFFFFF` x15 and `#004B23` x7) account for a similar 39% of its 56 results, but the remainder is spread across 22 other values vs 19 for Qwen3.
- **White dominates both models.** `#FFFFFF` was the single most common value for both (Gemini: x15, Qwen3: x14), consistent with white being the most common jersey color in the named-color tests.
- **Both models share some exact hex codes.** Besides `#FFFFFF`, `#A9A9A9` (dark gray) and `#D32F2F` (medium red) appeared in both models' outputs. The dark browns are nearly identical but not exact matches (`#3D2B1F` for Gemini, `#3D2B21` for Qwen3). This suggests some convergence on certain color estimations.

---

## Conclusions

1. **For basic color categorization, all three models work.** If you only need to distinguish "white vs dark vs colored" jerseys, any will do. Gemini offers slightly finer granularity with its blue-shade vocabulary (navy blue, dark blue, navy).

2. **Gemini detects the most jerseys per image** (2.81 avg), followed closely by Qwen3-VL-8B (2.76 avg), with Qwen2.5-VL-7B trailing (2.29 avg).

3. **Qwen3-VL-8B is a solid upgrade over Qwen2.5-VL-7B** for detection volume (+20% more jerseys) while maintaining the same jersey color vocabulary. It runs locally without cloud API costs, making it a good default choice.

4. **Hex color prompting works for jersey body colors.** Both models return specific hex shades the majority of the time (Gemini 71%, Qwen3 63%). Gemini produces more varied and specific shades, while Qwen3 tends to reuse a smaller set of hex values.

5. **Neither model is a reliable colorimeter.** The hex values should be treated as rough shade estimates, not pixel-accurate measurements. For precise color matching, traditional computer vision (e.g., sampling pixels from the detected jersey region) would be more reliable.

6. **Recommendation:** Use named-color prompts for general jersey classification. Reserve hex-color prompts for use cases where distinguishing similar shades matters (e.g., telling apart two teams that both wear "blue"). Gemini gives the best hex specificity but requires a cloud API; Qwen3-VL-8B is a capable local alternative.
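
The "specific" vs "GENERIC" labels in the hex tables are consistent with Euclidean distance in RGB space to the nearest color in a reference palette, with a cutoff of 20. The sketch below reproduces that classification under stated assumptions: the palette is inferred from the "near ..." labels and distances in the tables (the original script's palette may contain more entries), and treating values strictly below 20 as generic is a guess, since no tabulated value falls exactly on the threshold. The names `REFERENCE_COLORS`, `nearest_reference`, and `classify` are illustrative.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]

# ASSUMPTION: palette inferred from the report's tables, not taken from
# the original analysis script.
REFERENCE_COLORS = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
    "maroon": (128, 0, 0),
    "brown": (165, 42, 42),
    "orange": (255, 165, 0),
    "gold": (255, 215, 0),
    "yellow": (255, 255, 0),
    "green (dark)": (0, 128, 0),
    "navy": (0, 0, 128),
    "purple": (128, 0, 128),
    "pink": (255, 192, 203),
    "silver": (192, 192, 192),
}
GENERIC_THRESHOLD = 20.0  # from the report's scale note


def hex_to_rgb(hex_code: str) -> RGB:
    """Convert '#RRGGBB' to an (R, G, B) tuple of ints."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def nearest_reference(hex_code: str) -> Tuple[str, float]:
    """Return the closest reference color name and its RGB distance."""
    rgb = hex_to_rgb(hex_code)
    name = min(REFERENCE_COLORS, key=lambda n: math.dist(rgb, REFERENCE_COLORS[n]))
    return name, math.dist(rgb, REFERENCE_COLORS[name])


def classify(hex_code: str) -> Tuple[str, str, float]:
    """Label a hex code as GENERIC or specific, as in the tables above."""
    name, distance = nearest_reference(hex_code)
    # ASSUMPTION: strictly-less-than comparison at the threshold.
    label = "GENERIC" if distance < GENERIC_THRESHOLD else "specific"
    return label, name, round(distance, 1)


for code in ("#FFFFFF", "#B22234", "#006400", "#E6E600"):
    print(code, classify(code))
```

One side effect of this metric is visible in the tables: several clearly red shades such as `#D32F2F` are labeled "near brown" because the brown reference (165, 42, 42) is closer in RGB space than pure red. A perceptual color space would likely group them differently.
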