From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task Automation

Abstract

Traditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind those operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g., collecting and summarizing introductions that depend on different search keywords across varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations and the cognitive dependencies between them from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph, integrating Large Language Models (LLMs) to dynamically generalize new parameters for the operations. We compared TaskMind with a baseline end-to-end LLM that automates tasks from demonstrations and natural-language commands without a task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperformed the baseline in both success rate and controllability.
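To make the idea concrete, here is a minimal sketch of what such a task graph could look like, written in Python under our own assumptions: the names (`Operation`, `TaskGraph`, `generalize`, `run_stub`) and the execution logic are illustrative guesses, not the paper's actual implementation.

```python
# Illustrative sketch only: the paper's real task-graph format is not given
# here, so every name and field below is a hypothetical assumption.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Operation:
    """One recorded GUI operation from the demonstration."""
    op_id: str
    action: str          # e.g., "type", "click", "copy"
    target: str          # description of the GUI element acted on
    parameter: str = ""  # e.g., the search keyword that was typed


@dataclass
class TaskGraph:
    """Operations plus the cognitive dependencies between them.

    An edge (a, b) records that operation b's parameter is derived from
    operation a's output, so b must be re-parameterized whenever the
    task is generalized to a new goal.
    """
    operations: list[Operation] = field(default_factory=list)
    dependencies: list[tuple[str, str]] = field(default_factory=list)

    def execute(
        self,
        generalize: Callable[[Operation, dict[str, str], str], str],
        run: Callable[[Operation], str],
        new_goal: str,
    ) -> None:
        """Replay operations in demonstration order; for each operation
        that depends on earlier outputs, ask the LLM-backed `generalize`
        callback for a fresh parameter suited to `new_goal`."""
        outputs: dict[str, str] = {}
        for op in self.operations:
            sources = [s for s, t in self.dependencies if t == op.op_id]
            if sources:
                context = {s: outputs[s] for s in sources}
                op.parameter = generalize(op, context, new_goal)
            outputs[op.op_id] = run(op)  # `run` drives the GUI (stub below)


def run_stub(op: Operation) -> str:
    """Stand-in GUI executor: print the action and echo its parameter."""
    print(f"{op.action} {op.target!r} with parameter {op.parameter!r}")
    return op.parameter
```

For the abstract's example, one node would type a search keyword and a later node would summarize the copied introduction, linked by a single dependency edge so that the summary is regenerated whenever the keyword changes.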

Authors
Yiwen Yin
Tsinghua University, Beijing, China
Yu Mei
Tsinghua University, Beijing, China
Chun Yu
Tsinghua University, Beijing, China
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
Aamir Khan Jadoon
Tsinghua University, Beijing, China
Sixiang Cheng
Tsinghua University, Beijing, China
Weinan Shi
Tsinghua University, Beijing, China
Mohan Chen
Tsinghua University, Beijing, China
Yuanchun Shi
Tsinghua University, Beijing, China
DOI

10.1145/3706598.3713356

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713356

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Malleable and Adaptive Interface

Room: G401
7 presentations
2025-04-28 20:10:00 – 2025-04-28 21:40:00