Text-to-image (T2I) models enable users worldwide to create high-definition and realistic images through text prompts, where the underrepresentation and potential misinformation of images have raised growing concerns. However, few existing works examine cultural representativeness, especially involving whether the generated content can fairly and accurately reflect global cultures. Combining automated and human methods, we investigate this issue in multiple dimensions quantificationally and conduct a set of evaluations on three prevailing T2I models (DALL-E v2, Stable Diffusion v1.5 and v2.1). Introducing attributes of cultural cluster and subject, we provide a fresh interdisciplinary perspective to bias analysis. The benchmark dataset UCOGC is presented, which encompasses authentic images of unique cultural objects from global clusters. Our results reveal that the culture of a disadvantaged country is prone to be neglected, some specified subjects often present a stereotype or a simple patchwork of elements, and over half of cultural objects are mispresented.
https://doi.org/10.1145/3613904.3642877
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)