Open-ended Structured Question Assessment with Human-LLM Collaboration

Abstract

Open-ended Structured Questions (OSQs) assess not only students’ knowledge but also their reasoning and expression. However, grading OSQs requires fine-grained, scoring point–level analysis, which is labor-intensive and difficult to scale. Although recent LLM-based and human–AI collaborative grading systems improve efficiency, they mainly operate at the whole-response level and lack support for point-level inspection, correction, and feedback integration. We present VeriGrader, a novel human–AI collaborative system for OSQ grading. It combines chain-of-thought prompting with scoring point– and response-level in-context learning to enable interpretable LLM grading and iterative refinement from instructor feedback. A coordinated multi-view interface supports efficient verification of response segments, matched scoring points, and rationales. We evaluate VeriGrader using real course data and a user study with 12 participants. Results show that VeriGrader improves grading efficiency, accuracy, and consistency over the baselines, demonstrating its effectiveness and promoting human–AI collaboration in educational assessment.

Authors
Fengyan Lin
South China University of Technology, Guangzhou, Guangdong, China
Yanna Lin
University of Waterloo, Waterloo, Ontario, Canada
Kai Cao
South China University of Technology, Guangzhou, China
Zikun Deng
South China University of Technology, Guangzhou, Guangdong, China
Yi Cai
South China University of Technology, Guangzhou, China

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: HCAI and Collaboration

P1 - Room 130
6 presentations
2026-04-15, 18:00–19:30