This paper proposes that the ability to generate diverse outputs in response to a single prompt is necessary for text-to-image models to become more effective creativity support tools. It formalises the problem of measuring the diversity of generated text and images, with an emphasis on interactive, exploratory use in open-ended and creative tasks. It suggests, motivated by research in the psychology of creativity, that diversity should sit alongside image quality and fit-to-prompt as critical measures in this setting. The paper adapts several diversity measures from the literature to this task, then explores how they compare to human diversity ratings. These evaluations show that algorithmic measures of diversity can be a useful proxy for human ratings, with agreement between the two declining as the difficulty of the task increases. The paper concludes with an exploratory qualitative analysis of the factors involved in human diversity judgements, to guide future research in this emerging area.
ACM CHI Conference on Human Factors in Computing Systems