Colour in Translation: Data, Models, and Benchmarking for Cross-Linguistic Colour Naming

Abstract

Colour naming links vision and language. Yet effective cross-linguistic colour communication is limited by the lack of multilingual data and of computational models for comprehensive colour name translation. We collected 6,408 unique colour naming responses in five languages using online experiments and fieldwork. For each language, we train a "spin colour forest", a novel model built from partially rotated decision trees that accurately estimates colour naming distributions across the full gamut, consistently outperforming existing methods. Unlike prior work that assumed 11 universal colour categories, our results reveal cross-linguistic variation in naming granularity: American English uses 47 indispensable colour names, British English 32, French 27, Greek 32, and the Himba 7 to categorise the same perceptually uniform colour space. Building on these findings, we develop a colour translation benchmark, which we demonstrate by evaluating both the lexical and perceptual accuracy of a large language model. Our evaluation reveals a critical lexical-perceptual disconnect, demonstrating that language models lack perceptual grounding in colour translation. Our data, models, and benchmark provide an empirical foundation for inclusive design that reflects how people communicate colour across cultures.

Authors
Dimitris Mylonas
Northeastern University London, London, United Kingdom
Rafique Ahmed
Northeastern University London, London, United Kingdom
Akvile Sinkeviciute
Northeastern University London, London, United Kingdom
Alexandros Koliousis
Northeastern University London, London, United Kingdom

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Modeling Spatial, Linguistic, and Sensory Errors

P1 - Room 128
6 presentations
2026-04-14 20:15:00
2026-04-14 21:45:00