Despite recognizing that Large Language Models (LLMs) can generate inaccurate or unacceptable responses, universities are increasingly making such models available to their students. Existing university policies defer the responsibility of checking the correctness and appropriateness of LLM responses to students and assume that students have the knowledge and skills to do so on their own. In this work, we conducted a series of user studies with students (N=47) from a large North American public research university to understand whether and how they critically engage with LLMs. Our participants evaluated an LLM provided by the university in a quasi-experimental setup: first by themselves, and then with a scaffolded design probe that guided them through an end-user auditing exercise. Qualitative analysis of participant think-aloud and LLM interaction data showed that students without basic AI literacy skills struggle to conceptualize and evaluate LLM biases on their own. However, they transition to focused thinking and purposeful interactions when provided with structured guidance. We highlight areas where current university policies may fall short and offer policy and design recommendations to better support students.
https://dl.acm.org/doi/10.1145/3706598.3713714
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)