Colour in Translation: Data, Models, and Benchmarking for Cross-Linguistic Colour Naming

Colour naming links vision and language. Yet effective cross-linguistic colour communication is limited by the lack of multilingual data and computational models for comprehensive colour name translation. We collected 6,408 unique colour naming responses in five languages using online experiments and fieldwork. For each language, we train a "spin colour forest", a novel model built from partially rotated decision trees that accurately estimates colour naming distributions across the full gamut, consistently outperforming existing methods. Unlike prior work that assumed 11 universal colour categories, our results reveal cross-linguistic variation in naming granularity: American English uses 47 indispensable colour names, British English 32, French 27, Greek 32, and the Himba 7 to categorise the same perceptually uniform colour space. Building on these findings, we develop a colour translation benchmark, which we demonstrate by evaluating both the lexical and perceptual accuracy of a large language model. Our evaluation reveals a critical lexical-perceptual disconnect, demonstrating that language models lack perceptual grounding in colour translation. Our data, models, and benchmark provide an empirical foundation for inclusive design that reflects how people communicate colour across cultures.

Northeastern University London, London, United Kingdom

ACM CHI Conference on Human Factors in Computing Systems

P1 - Room 128

6 presentations

Start: 2026-04-14 20:15:00

End: 2026-04-14 21:45:00

Conference: CHI 2026

Session: Modeling Spatial, Linguistic, and Sensory Errors