Open-ended Structured Question Assessment with Human-LLM Collaboration

Abstract

Open-ended Structured Questions (OSQs) assess not only students’ knowledge but also their reasoning and expression. However, grading OSQs requires fine-grained, scoring point–level analysis, which is labor-intensive and difficult to scale. Although recent LLM-based and human–AI collaborative grading systems improve efficiency, they mainly operate at the whole-response level and lack support for point-level inspection, correction, and feedback integration. We present VeriGrader, a novel human–AI collaborative system for OSQ grading. It combines chain-of-thought prompting with scoring point– and response-level in-context learning to enable interpretable LLM grading and iterative refinement from instructor feedback. A coordinated multi-view interface supports efficient verification of response segments, matched scoring points, and rationales. We evaluate VeriGrader using real course data and a user study with 12 participants. Results show that VeriGrader improves grading efficiency, accuracy, and consistency over the baselines, demonstrating its effectiveness and promoting human–AI collaboration in educational assessment.

Authors
Fengyan Lin
South China University of Technology, Guangzhou, Guangdong, China
Yanna Lin
University of Waterloo, Waterloo, Ontario, Canada
Kai Cao
South China University of Technology, Guangzhou, China
Zikun Deng
South China University of Technology, Guangzhou, Guangdong, China
Yi Cai
South China University of Technology, Guangzhou, China

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: HCAI and Collaboration

P1 - Room 130
6 presentations
2026-04-15, 18:00–19:30