Multimodal large language models (MLLMs) are Generative AI models that take different modalities such as text, audio, and video as input and generate appropriate multimodal output. Since such models will be integrated into future educational tools, a human-centered design approach that takes students’ perspectives into account is essential while designing such applications. This paper describes two co-design workshops which were conducted with 79 student groups to examine how they design and prototype future educational tools integrated with MLLMs. Through various activities in the workshops, students discussed relevant educational problems, created journey maps, storyboards and low fidelity prototypes for their applications, and evaluated their applications based on relevant design principles. We found that students’ applications used MLLMs for important learning environment design features such as multimodal content creation, personalization, and feedback. Based on these findings, we discuss future research directions for the design of multimodality in generative AI educational applications.
https://dl.acm.org/doi/10.1145/3706598.3714146
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)