Art or Artifice? Large Language Models and the False Promise of Creativity

Abstract

Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities, from blogs to stories. However, objectively evaluating the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique and propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original TTCT dimensions of Fluency, Flexibility, Originality, and Elaboration. We recruit 10 creative writers and use TTCW to conduct a human assessment of 48 stories written either by professional authors or by LLMs. Our analysis shows that LLM-generated stories pass 3 to 10 times fewer TTCW tests than stories written by professionals. In addition, we explore the use of LLMs as assessors to automate the TTCW evaluation, revealing that none of the LLMs positively correlate with the expert assessments.
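The abstract describes TTCW as 14 binary (pass/fail) tests grouped into four dimensions, and compares LLM assessors against expert judgments. The following is a minimal sketch of how such verdicts might be tallied per story. It assumes hypothetical test IDs and a hypothetical 5/3/3/3 split across the dimensions, since this page gives neither. For assessor-versus-expert agreement it uses Cohen's kappa, a standard chance-corrected agreement statistic swapped in for illustration. The abstract speaks only of correlation with expert assessments, so this is not necessarily the paper's statistic.

```python
# Hypothetical TTCW tally: test IDs and the per-dimension split are
# placeholders; the abstract states only "14 binary tests" in four dimensions.
DIMENSIONS: dict[str, list[str]] = {
    "Fluency": ["F1", "F2", "F3", "F4", "F5"],
    "Flexibility": ["X1", "X2", "X3"],
    "Originality": ["O1", "O2", "O3"],
    "Elaboration": ["E1", "E2", "E3"],
}
ALL_TESTS: list[str] = [t for tests in DIMENSIONS.values() for t in tests]
assert len(ALL_TESTS) == 14


def score_story(verdicts: dict[str, bool]) -> dict[str, int]:
    """Count passed tests per dimension from one story's pass/fail verdicts."""
    missing = set(ALL_TESTS) - set(verdicts)
    if missing:
        raise ValueError(f"missing verdicts for tests: {sorted(missing)}")
    # Booleans sum as 0/1, so this counts the passes in each dimension.
    return {dim: sum(verdicts[t] for t in tests) for dim, tests in DIMENSIONS.items()}


def total_passed(verdicts: dict[str, bool]) -> int:
    """Total number of TTCW tests passed (0 to 14)."""
    return sum(score_story(verdicts).values())


def cohen_kappa(a: list[bool], b: list[bool]) -> float:
    """Chance-corrected agreement between two raters' binary labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance if both raters labeled independently.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        # Both raters used one constant label; kappa is undefined, so by
        # convention report perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    expert_story = {t: True for t in ALL_TESTS}
    llm_story = {t: t.startswith("F") for t in ALL_TESTS}  # passes Fluency only
    print(score_story(llm_story))
    print(total_passed(expert_story), total_passed(llm_story))  # 14 5
    # Agreement no better than chance yields kappa of 0.0.
    print(cohen_kappa([True, True, False, False], [True, False, False, True]))  # 0.0
```

Scoring binary tests per dimension rather than as one total makes it easy to see where two groups of stories diverge, for example whether a gap comes mostly from Originality or from Elaboration.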

Authors
Tuhin Chakrabarty
Columbia University, New York, New York, United States
Philippe Laban
Salesforce Research, New York, New York, United States
Divyansh Agarwal
Salesforce Research, New York, New York, United States
Smaranda Muresan
Columbia University, New York, New York, United States
Chien-Sheng Wu
Salesforce AI, Palo Alto, California, United States
Paper URL

doi.org/10.1145/3613904.3642731

Conference: CHI 2024

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)

Session: Arts and Creative AI

320 'Emalani Theater
4 presentations
2024-05-14 20:00:00 to 2024-05-14 21:20:00