The rapid advancement of large language models (LLMs) such as ChatGPT has made LLM-based academic tools feasible. However, little empirical research has examined how scholars perform different types of academic tasks with LLMs. Through an empirical study followed by semi-structured interviews, we assessed the performance of 48 early-stage scholars on core academic activities (i.e., paper reading and literature review) under different levels of time pressure. Before the tasks, participants received different training programs on the capabilities and limitations of LLMs; after the tasks, each participant was interviewed. We analyzed quantitative data on how time pressure, task type, and training program influenced participants' performance on the academic tasks. The semi-structured interviews provided additional insight into factors affecting task performance, participants' perceptions of LLMs, and their concerns about integrating LLMs into academic workflows. These findings can guide more appropriate use and design of LLM-based tools for assisting academic work.