From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task Automation

Abstract

Traditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind those operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g., collecting and summarizing introductions that depend on different search keywords across varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations and the cognitive dependencies between them from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph, integrating Large Language Models (LLMs) to dynamically generalize new parameters for the operations. We compared TaskMind with a baseline end-to-end LLM that automates tasks from demonstrations and natural-language commands without a task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperformed the baseline in both success rate and controllability.
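To make the idea concrete, here is a minimal sketch of what such a task graph could look like, written in Python under our own assumptions: the names (`Operation`, `TaskGraph`, `generalize`, `run_stub`) and the execution logic are illustrative guesses, not the paper's actual implementation.

```python
# Illustrative sketch only: the paper's real task-graph format is not given
# here, so every name and field below is a hypothetical assumption.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Operation:
    """One recorded GUI operation from the demonstration."""
    op_id: str
    action: str          # e.g., "type", "click", "copy"
    target: str          # description of the GUI element acted on
    parameter: str = ""  # e.g., the search keyword that was typed


@dataclass
class TaskGraph:
    """Operations plus the cognitive dependencies between them.

    An edge (a, b) records that operation b's parameter is derived from
    operation a's output, so b must be re-parameterized whenever the
    task is generalized to a new goal.
    """
    operations: list[Operation] = field(default_factory=list)
    dependencies: list[tuple[str, str]] = field(default_factory=list)

    def execute(
        self,
        generalize: Callable[[Operation, dict[str, str], str], str],
        run: Callable[[Operation], str],
        new_goal: str,
    ) -> None:
        """Replay operations in demonstration order; for each operation
        that depends on earlier outputs, ask the LLM-backed `generalize`
        callback for a fresh parameter suited to `new_goal`."""
        outputs: dict[str, str] = {}
        for op in self.operations:
            sources = [s for s, t in self.dependencies if t == op.op_id]
            if sources:
                context = {s: outputs[s] for s in sources}
                op.parameter = generalize(op, context, new_goal)
            outputs[op.op_id] = run(op)  # `run` drives the GUI (stub below)


def run_stub(op: Operation) -> str:
    """Stand-in GUI executor: print the action and echo its parameter."""
    print(f"{op.action} {op.target!r} with parameter {op.parameter!r}")
    return op.parameter
```

For the abstract's example, one node would type a search keyword and a later node would summarize the copied introduction, linked by a single dependency edge so that the summary is regenerated whenever the keyword changes.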

Authors
Yiwen Yin
Tsinghua University, Beijing, China
Yu Mei
Tsinghua University, Beijing, China
Chun Yu
Tsinghua University, Beijing, China
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
Aamir Khan Jadoon
Tsinghua University, Beijing, China
Sixiang Cheng
Tsinghua University, Beijing, China
Weinan Shi
Tsinghua University, Beijing, China
Mohan Chen
Tsinghua University, Beijing, China
Yuanchun Shi
Tsinghua University, Beijing, China
DOI

10.1145/3706598.3713356

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713356

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Malleable and Adaptive Interface

Room: G401
7 presentations
2025-04-28 20:10:00 – 2025-04-28 21:40:00