Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation

要旨

This study introduces \textbf{InteractEval}, a framework that integrates the outcomes of Think-Aloud (TA) conducted by humans and LLMs to generate attributes for checklist-based text evaluation. By combining humans' flexibility and high-level reasoning with LLMs' consistency and extensive knowledge, InteractEval outperforms text evaluation baselines on a text summarization benchmark (SummEval) and an essay scoring benchmark (ELLIPSE). Furthermore, an in-depth analysis shows that it promotes divergent thinking in both humans and LLMs, leading to the generation of a wider range of relevant attributes and enhancement of text evaluation performance. A subsequent comparative analysis reveals that humans excel at identifying attributes related to internal quality (Coherence and Fluency), but LLMs perform better at those attributes related to external alignment (Consistency and Relevance). Consequently, leveraging both humans and LLMs together produces the best evaluation outcomes, highlighting the necessity of effectively combining humans and LLMs in an automated checklist-based text evaluation.

著者
SeongYeub Chu
KAIST, Daejeon, Korea, Republic of
JongWoo Kim
Kaist, Daejeon, Korea, Republic of
Mun Yong. Yi
KAIST, Daejeon, Daejeon, Korea, Republic of
DOI

10.1145/3706598.3713181

論文URL

https://dl.acm.org/doi/10.1145/3706598.3713181

動画

会議: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

セッション: Text Entry

Annex Hall F205
7 件の発表
2025-04-29 23:10:00
2025-04-30 00:40:00
日本語まとめ
読み込み中…