This study introduces \textbf{InteractEval}, a framework that integrates the outcomes of a Think-Aloud (TA) process conducted by humans and LLMs to generate attributes for checklist-based text evaluation. By combining humans' flexibility and high-level reasoning with LLMs' consistency and extensive knowledge, InteractEval outperforms text evaluation baselines on a text summarization benchmark (SummEval) and an essay scoring benchmark (ELLIPSE). An in-depth analysis further shows that the framework promotes divergent thinking in both humans and LLMs, leading to a wider range of relevant attributes and improved text evaluation performance. A subsequent comparative analysis reveals that humans excel at identifying attributes related to internal quality (Coherence and Fluency), whereas LLMs perform better on attributes related to external alignment (Consistency and Relevance). Consequently, leveraging humans and LLMs together produces the best evaluation outcomes, highlighting the need to combine them effectively in automated checklist-based text evaluation.
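For intuition, the sketch below shows one way a checklist assembled from human- and LLM-generated attributes could be applied to a text: each attribute is posed as a yes/no question to a judge, and the score is the fraction answered affirmatively. This is a minimal illustrative sketch; the function and variable names (\texttt{score\_with\_checklist}, \texttt{toy\_judge}) and the aggregation rule are assumptions for exposition, not the implementation described in the paper.

\begin{verbatim}
# Minimal sketch of checklist-based scoring (hypothetical names; not the
# paper's implementation). `judge` stands in for an LLM call that answers
# yes/no to a checklist question about a text.
from typing import Callable, Iterable

def score_with_checklist(text: str,
                         attributes: Iterable[str],
                         judge: Callable[[str, str], bool]) -> float:
    """Return the fraction of checklist attributes the text satisfies."""
    attrs = list(attributes)
    if not attrs:
        return 0.0
    passed = sum(1 for question in attrs if judge(text, question))
    return passed / len(attrs)

# Toy usage: attributes merged from human and LLM think-aloud sessions,
# with a keyword-based stand-in judge so the example runs without an API.
human_attrs = ["Does each sentence follow logically from the previous one?"]
llm_attrs = ["Are all stated facts supported by the source document?"]
checklist = human_attrs + llm_attrs

def toy_judge(text: str, question: str) -> bool:
    return "because" in text.lower()  # placeholder heuristic, not an LLM

print(score_with_checklist("The plan works because costs fall.", checklist))
\end{verbatim}

In practice, the judge would be an LLM prompted with the attribute question and the text under evaluation, and per-dimension scores (e.g., Coherence, Consistency) would be computed over the attributes belonging to that dimension.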
https://dl.acm.org/doi/10.1145/3706598.3713181
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)