Language Matters

Conference Name
CHI 2025
Unknown Word Detection for English as a Second Language (ESL) Learners using Gaze and Pre-trained Language Models
Abstract

English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words in real time from text content and eye-gaze trajectory. A 20-participant user study showed that our method achieves an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to demonstrate the effectiveness of EyeLingo; a user study of the prototype showed improvements in willingness to use and usefulness compared to baseline methods.
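For readers curious how text and gaze might be combined, the following is a minimal sketch, not the authors' implementation: it concatenates per-token gaze features with contextual embeddings from a pre-trained language model and scores each token as unknown or known. The model name, gaze feature set, and classifier head are all assumptions made for illustration.

```python
# Minimal sketch (not the EyeLingo code): fuse per-token gaze features with
# contextual embeddings from a pre-trained language model to estimate the
# probability that each word is unknown to the reader.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class UnknownWordScorer(nn.Module):
    def __init__(self, lm_name: str = "bert-base-uncased", gaze_dim: int = 4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(lm_name)
        hidden = self.encoder.config.hidden_size
        # Assumed gaze features per token: fixation count, total fixation
        # duration, mean fixation duration, regression count.
        self.head = nn.Sequential(
            nn.Linear(hidden + gaze_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, input_ids, attention_mask, gaze_features):
        # gaze_features: (batch, seq_len, gaze_dim), aligned to the tokens
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([states, gaze_features], dim=-1)
        return torch.sigmoid(self.head(fused)).squeeze(-1)  # per-token probability

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = UnknownWordScorer()
batch = tokenizer(["The sesquipedalian lecture baffled everyone."],
                  return_tensors="pt")
gaze = torch.rand(1, batch["input_ids"].shape[1], 4)  # stand-in gaze features
probs = model(batch["input_ids"], batch["attention_mask"], gaze)
print(probs)
```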

Authors
Jiexin Ding
Tsinghua University, Beijing, China
Bowen Zhao
Groundlight AI, Seattle, Washington, United States
Yuntao Wang
Tsinghua University, Beijing, China
Xinyun Liu
Rice University, Houston, Texas, United States
Rui Hao
University of Chinese Academy of Sciences, Beijing, China
Ishan Chatterjee
University of Washington, Seattle, Washington, United States
Yuanchun Shi
Tsinghua University, Beijing, China
DOI

10.1145/3706598.3714181

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714181

Tap&Say: Touch Location-Informed Large Language Model for Multimodal Text Correction on Smartphones
Abstract

While voice input offers a convenient alternative to traditional text editing on mobile devices, practical implementations face two key challenges: 1) reliably distinguishing between editing commands and content dictation, and 2) effortlessly pinpointing the intended edit location. We propose Tap&Say, a novel multimodal system that combines touch interactions with Large Language Models (LLMs) for accurate text correction. By tapping near an error, users signal their edit intent and location, addressing both challenges; the user then speaks the correction text. Tap&Say uses the touch location, voice input, and existing text to generate contextually relevant correction suggestions. We propose a novel touch location-informed attention layer that integrates the tap location into the LLM's attention mechanism, enabling the model to exploit the tap location for text correction. We fine-tuned the touch location-informed LLM on synthetic touch locations and correction commands, achieving significantly higher correction accuracy than the state-of-the-art method VT. A 16-person user study showed that Tap&Say outperforms VT, with 16.4% shorter task completion time and 47.5% fewer keyboard clicks, and is preferred by users.
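The touch location-informed attention layer is the paper's core mechanism; the sketch below is only a rough illustration of the general idea, not the paper's layer. It adds a distance-based additive bias to self-attention so that tokens whose character offsets lie near the tapped position receive more attention. The layer dimensions, bias scale, and linear distance penalty are all assumptions.

```python
# Illustrative sketch only (not Tap&Say's layer): bias self-attention toward
# tokens whose character offsets lie near the user's tap location.
import torch
import torch.nn as nn

class TapBiasedSelfAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4, scale: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scale = scale

    def forward(self, x, token_positions, tap_position):
        # x: (batch, seq, dim) token states
        # token_positions: (batch, seq) character offset of each token
        # tap_position: (batch,) character offset closest to the tap
        dist = (token_positions - tap_position.unsqueeze(1)).abs().float()
        bias = (-self.scale * dist).unsqueeze(1)           # (batch, 1, seq)
        bias = bias.expand(-1, x.shape[1], -1)             # (batch, tgt, src)
        bias = bias.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(x, x, x, attn_mask=bias)        # additive float mask
        return out

layer = TapBiasedSelfAttention()
x = torch.randn(1, 6, 256)
positions = torch.tensor([[0, 4, 10, 15, 21, 27]])
tap = torch.tensor([14])                                   # tap landed near offset 15
out = layer(x, positions, tap)
```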

Authors
Maozheng Zhao
Stony Brook University, Stony Brook, New York, United States
Michael Xuelin Huang
Google, Mountain View, California, United States
Nathan G. Huang
Westlake High School, Austin, Texas, United States
Shanqing Cai
Google, Mountain View, California, United States
Henry Huang
Harvard University, Cambridge, Massachusetts, United States
Michael G. Huang
University of Texas at Austin, Austin, Texas, United States
Shumin Zhai
Google, Mountain View, California, United States
IV Ramakrishnan
Stony Brook University, Stony Brook, New York, United States
Xiaojun Bi
Stony Brook University, Stony Brook, New York, United States
DOI

10.1145/3706598.3713376

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713376

Lookee: Gaze Tracking-based Infant Vocabulary Comprehension Assessment and Analysis
Abstract

Measuring the preverbal vocabulary comprehension of young children is vital for early intervention and developmental evaluation, yet challenging due to their limited communication abilities. We introduce Lookee, an AI-powered, gaze-tracking-based vocabulary comprehension assessment tool for toddlers in the preverbal stage. Lookee incorporates the Intermodal Preferential Looking Paradigm (IPLP), one of the prominent word comprehension measures for toddlers, and estimates word comprehension with a random forest model. We design and validate Lookee through user studies involving 19 toddlers and their parents, and then identify necessary design requirements from potential stakeholders' perspectives through in-depth interviews with researchers, clinicians, and parents. Lookee achieves considerable estimation accuracy with sufficient system usability, and our study identifies key design requirements for each stakeholder group. From these findings, we highlight design implications for developing and validating AI-powered clinical tools for toddlers.
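As a rough sketch of the kind of analysis the abstract mentions (not the Lookee pipeline), a random forest could classify IPLP trials as comprehended or not from simple gaze statistics. The features and values below are invented for illustration only.

```python
# Rough sketch, not the Lookee pipeline: classify IPLP trials as
# "comprehended" or not from simple gaze statistics with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-trial features: proportion of looking time on the target
# image after word onset, latency (s) of the first shift to the target, and
# number of gaze switches between the two images.
X = np.array([
    [0.74, 0.45, 2],   # mostly looks at the named picture
    [0.49, 1.30, 5],   # near-chance looking
    [0.68, 0.60, 3],
    [0.45, 1.10, 6],
])
y = np.array([1, 0, 1, 0])  # 1 = word judged as comprehended

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.70, 0.50, 2]]))
```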

Authors
Minji Kim
Seoul National University, Seoul, Korea, Republic of
Minkyu Shim
Seoul National University, Seoul, Korea, Republic of
Jun Ho Chai
Sunway University, Kuala Lumpur, Malaysia
Eon-Suk Ko
Chosun University, Gwangju, Korea, Republic of
Youngki Lee
Seoul National University, Seoul, Korea, Republic of
DOI

10.1145/3706598.3713386

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713386

Unlocking the Power of Speech: Game-Based Accent and Oral Communication Training for Immigrant English Language Learners via Large Language Models
Abstract

With the growing number of immigrants globally, language barriers have become a significant challenge, particularly for those entering English-speaking countries. Traditional language learning methods often fail to provide sufficient practical opportunities, especially for diverse accents. To address this, we introduce Language Urban Odyssey (LUO), a serious game that leverages large language models (LLMs) and game-based learning to offer a low-cost, accessible virtual environment for English learners. Built on the Minecraft platform, LUO enables real-time speech interaction with NPCs of various accents, supported by multi-modal feedback. A controlled study (N=30) showed improvements in speaking ability, accent comprehension, and emotional confidence. Our findings suggest that LUO provides a scalable, immersive platform that bridges gaps in language learning for immigrants facing cultural and social challenges.
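The abstract does not specify how NPC dialogue is produced. Purely as an assumed example of LLM-driven, accent-aware NPC interaction, the sketch below composes a chat prompt that makes a model role-play an NPC with a requested accent and give brief feedback; the model name and NPC details are invented, and the call uses the standard OpenAI chat API rather than anything from the paper.

```python
# Assumed illustration only, not the LUO implementation: an LLM role-plays a
# game NPC with a requested accent and offers one short correction.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def npc_reply(npc_role: str, accent: str, learner_utterance: str) -> str:
    messages = [
        {"role": "system",
         "content": (f"You are {npc_role} in a language-learning game set in "
                     f"Minecraft. Stay in character and answer the player in "
                     f"a {accent} English accent, reflecting it in word choice "
                     "and phrasing. End with one short, encouraging correction "
                     "of the player's sentence if it has errors.")},
        {"role": "user", "content": learner_utterance},
    ]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content

print(npc_reply("a market vendor", "Scottish", "I wants to buy three apple."))
```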

Authors
Yijun Zhao
Zhejiang University, Hangzhou, China
Jiangyu Pan
Zhejiang University, Hangzhou, China
Jiacheng Cao
Zhejiang University, Hangzhou, China
Jiarong Zhang
Zhejiang University, Hangzhou, Zhejiang, China
Yan Dong
Zhejiang University, Hangzhou, China
Yicheng Wang
Zhejiang University, Hangzhou, Zhejiang, China
Preben Hansen
Stockholm University, Kista, Sweden
Guanyun Wang
Zhejiang University, Hangzhou, China
DOI

10.1145/3706598.3713945

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713945

Designing for Transactional Moments: Features of Tools for Child-centred Speech Language Teletherapy
Abstract

Teletherapy for speech-language therapy (SLT) has become essential for many families. Early intervention for young children is important to ensure that developmental milestones are met. In this study, drawing on a corpus of 10 videos, we present three cases of online and in-person therapy sessions with children between the ages of 3 and 6. Our analysis shows how tools are used in online and in-person SLT sessions and how they are conscripted into social and transactional moments, and it identifies features of tools that support or hinder therapists’ goals (see Figure 1). From our findings, we discuss in detail four overarching features of tools and their implications for design. These features support engagement, space usage, child-centred play, and adaptability in therapy sessions. The paper outlines how these features are present in the tools used in SLT and describes how they impact SLT activities, therapists’ and children’s goals, and the environment for social and transactional activities.

Authors
Sarah Matthews
Queensland University of Technology, Brisbane, QLD, Australia
Susan Danby
Queensland University of Technology, Brisbane, QLD, Australia
Sophie Westwood
Queensland University of Technology, Brisbane, QLD, Australia
Maryanne Theobald
Queensland University of Technology, Brisbane, QLD, Australia
Peta Wyeth
University of Technology Sydney, Sydney, Australia
DOI

10.1145/3706598.3713394

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713394

BrickSmart: Leveraging Generative AI to Support Children's Spatial Language Learning in Family Block Play
Abstract

Block-building activities are crucial for developing children's spatial reasoning and mathematical skills, yet parents often lack the expertise to guide these activities effectively. BrickSmart, a pioneering system, addresses this gap by providing spatial language guidance through a structured three-step process: Discovery & Design, Build & Learn, and Explore & Expand. The system uniquely supports parents in 1) generating personalized block-building instructions, 2) teaching spatial language during building and interactive play, and 3) tracking children's learning progress, altogether enhancing children's engagement and cognitive development. In a comparative study involving 12 parent-child pairs (children aged 6–8 years) for both experimental and control groups, BrickSmart demonstrated improvements in supportiveness, efficiency, and innovation, with a significant increase in children's use of spatial vocabulary during block play, thereby offering an effective framework for fostering spatial language skills in children.
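As a minimal sketch of the progress-tracking step (not BrickSmart's actual analysis), one could count spatial terms in transcribed parent-child talk during block play. The word list below is a small assumed sample, not the study's coding scheme.

```python
# Minimal sketch, not BrickSmart itself: count spatial vocabulary in a
# transcribed block-play conversation to track a child's usage over sessions.
import re
from collections import Counter

SPATIAL_TERMS = {"above", "below", "under", "behind", "between", "next to",
                 "on top of", "left", "right", "inside", "corner", "taller"}

def spatial_term_counts(transcript: str) -> Counter:
    text = transcript.lower()
    counts = Counter()
    for term in SPATIAL_TERMS:
        counts[term] = len(re.findall(rf"\b{re.escape(term)}\b", text))
    return counts

print(spatial_term_counts("Put the red brick on top of the blue one, "
                          "then place a window between the two towers."))
```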

Award
Honorable Mention
Authors
Yujia Liu
Tsinghua University, Beijing, China
Siyu Zha
Tsinghua University, Beijing, China
Yuewen Zhang
Tsinghua University, Beijing, China
Yanjin Wang
University of Toronto, Toronto, Ontario, Canada
Yangming Zhang
Wuhan University, Wuhan, Hubei, China
Qi Xin
Tsinghua University, Beijing, China
Lun Yiu Nie
The University of Texas at Austin, Austin, Texas, United States
Chao Zhang
Cornell University, Ithaca, New York, United States
Yingqing Xu
Tsinghua University, Beijing, China
DOI

10.1145/3706598.3714212

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714212
