3. Machine Learning for User Interfaces

Conference Name
UIST 2024
UIClip: A Data-driven Model for Assessing User Interface Design
Abstract

User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In our paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and natural language description. To train UIClip, we used a combination of automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on the dataset, UIClip implicitly learns properties of good and bad designs by (i) assigning a numerical score that represents a UI design's relevance and quality and (ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baselines to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with ground-truth rankings. Finally, we present three example applications that demonstrate how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality: (i) UI code generation, (ii) UI design tips generation, and (iii) quality-aware UI example search.
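To make the kind of screenshot-plus-description scoring described above concrete, the sketch below computes an image-text relevance score with a generic, publicly available CLIP checkpoint from Hugging Face. It is an illustration only: the checkpoint name, file path, and cosine-similarity score are stand-ins, not the released UIClip model or its training recipe.

```python
# Illustrative only: scoring a UI screenshot against a natural language
# description with a generic CLIP checkpoint, not the trained UIClip weights.
# "openai/clip-vit-base-patch32" and the file path are placeholder choices.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def relevance_score(screenshot_path: str, description: str) -> float:
    """Cosine similarity between a UI screenshot and its textual description."""
    image = Image.open(screenshot_path).convert("RGB")
    inputs = processor(text=[description], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float((image_emb @ text_emb.T).item())

# Example usage (path and description are hypothetical):
# print(relevance_score("checkout_screen.png", "a clean checkout form for a shopping app"))
```

UIClip itself is additionally trained to reflect design quality and to surface design suggestions, which a plain CLIP similarity score like this does not capture.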

Authors
Jason Wu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Yi-Hao Peng
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Xin Yue Amanda Li
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Amanda Swearngin
Apple, Seattle, Washington, United States
Jeffrey P. Bigham
Apple, Pittsburgh, Pennsylvania, United States
Jeffrey Nichols
Apple Inc, San Diego, California, United States
Paper URL

https://doi.org/10.1145/3654777.3676408

UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset
Abstract

Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven designers, each with at least a year of professional design experience. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.
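The few-shot prompting approach mentioned in the abstract can be illustrated with a small sketch: pack a handful of designer critiques and ratings into a prompt and ask a general-purpose LLM to critique a new UI. The record fields, rating scale, and example critiques below are hypothetical placeholders, not the actual UICrit schema or data.

```python
# A minimal sketch of few-shot prompting for UI critique. The records, field
# names, and 1-5 rating scale are illustrative assumptions, not UICrit's schema.
from dataclasses import dataclass

@dataclass
class CritiqueExample:
    ui_summary: str   # textual summary of the mobile UI (stand-in for the screenshot)
    critique: str     # designer-written critique
    rating: int       # overall quality rating, e.g. 1 (poor) to 5 (excellent)

FEW_SHOT = [
    CritiqueExample(
        ui_summary="Login screen with small gray text on a white background.",
        critique="Body text fails contrast guidelines; increase font size and contrast.",
        rating=2,
    ),
    CritiqueExample(
        ui_summary="Settings list with consistent spacing and clear section headers.",
        critique="Clear hierarchy and grouping; consider adding search for long lists.",
        rating=4,
    ),
]

def build_fewshot_prompt(target_ui_summary: str) -> str:
    """Assemble a few-shot prompt asking an LLM to critique and rate a new UI."""
    parts = ["You are a UI design expert. Critique the UI and rate it from 1 to 5.\n"]
    for ex in FEW_SHOT:
        parts.append(f"UI: {ex.ui_summary}\nCritique: {ex.critique}\nRating: {ex.rating}\n")
    parts.append(f"UI: {target_ui_summary}\nCritique:")
    return "\n".join(parts)

print(build_fewshot_prompt("Checkout form where the primary button is below the fold."))
```

The resulting prompt could be sent to any general-purpose (multi-modal) LLM; the paper's visual prompting variants additionally supply the UI screenshot itself.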

Authors
Peitong Duan
UC Berkeley, Berkeley, California, United States
Chin-Yi Cheng
Google Research, Mountain View, California, United States
Gang Li
Google Research, Mountain View, California, United States
Bjoern Hartmann
UC Berkeley, Berkeley, California, United States
Yang Li
Google Research, Mountain View, California, United States
Paper URL

https://doi.org/10.1145/3654777.3676381

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning
Abstract

From a visual-perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", no scanpath model has been capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which utilizes a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that predicts gaze locations. Our model offers the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and durations, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model.
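As a rough illustration of a Transformer used as a policy network for scanpath prediction, the toy model below emits (x, y, duration) fixations from a Gaussian policy head, exposing the log-probabilities a policy-gradient RL algorithm would need. The dimensions, the Gaussian head, and the rollout are assumptions for this sketch and do not reproduce EyeFormer's actual architecture or its personalization mechanism.

```python
# A toy Transformer policy that emits a scanpath as (x, y, duration) steps.
# This is an illustrative stand-in for the idea in the abstract, not EyeFormer;
# the layer sizes, Gaussian head, and rollout below are assumptions.
import torch
import torch.nn as nn
from torch.distributions import Normal

class ScanpathPolicy(nn.Module):
    def __init__(self, d_model: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(3, d_model)   # embed previous (x, y, duration) fixations
        encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, n_layers)
        self.head = nn.Linear(d_model, 6)    # mean and log-std for (x, y, duration)

    def forward(self, history: torch.Tensor) -> Normal:
        """history: (batch, steps, 3) previous fixations; returns a distribution
        over the next fixation, suitable for sampling and policy-gradient updates."""
        h = self.encoder(self.embed(history))
        mean, log_std = self.head(h[:, -1]).chunk(2, dim=-1)
        return Normal(mean, log_std.exp())

policy = ScanpathPolicy()
history = torch.zeros(1, 1, 3)                    # dummy first fixation as a start token
dist = policy(history)
next_fixation = dist.sample()                     # (x, y, duration) for the next gaze step
log_prob = dist.log_prob(next_fixation).sum(-1)   # term a REINFORCE-style update would use
print(next_fixation, log_prob)
```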

Authors
Yue Jiang
Aalto University, Espoo, Finland
Zixin Guo
Aalto University, Espoo, Finland
Hamed Rezazadegan Tavakoli
Nokia Technologies, Espoo, Finland
Luis A. Leiva
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Antti Oulasvirta
Aalto University, Helsinki, Finland
Paper URL

https://doi.org/10.1145/3654777.3676436

GPTVoiceTasker: Advancing Multi-step Mobile Task Efficiency Through Dynamic Interface Exploration and Learning
Abstract

Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face challenges in their real-world usability, characterized by inefficiency and struggles in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GPTVoiceTasker, a virtual assistant poised to enhance user experiences and task efficiency on mobile devices. GPTVoiceTasker excels at intelligently deciphering user commands and executing relevant device interactions to streamline task completion. For unprecedented tasks, GPTVoiceTasker utilizes the contextual information and on-screen content to continuously explore and execute the tasks. In addition, the system continually learns from historical user commands to automate subsequent task invocations, further enhancing execution efficiency. From our experiments, GPTVoiceTasker achieved 84.5% accuracy in parsing human commands into executable actions and 85.7% accuracy in automating multi-step tasks. In our user study, GPTVoiceTasker boosted task efficiency in real-world scenarios by 34.85%, accompanied by positive participant feedback. We made GPTVoiceTasker open-source, inviting further research into LLM utilization for diverse tasks through prompt engineering and into leveraging usage data to improve efficiency.
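One of the abstract's key mechanisms, learning from historical user commands so repeated tasks are automated without re-planning, can be sketched as a simple command-to-action cache. The Action fields, the normalization rule, and the example commands are illustrative assumptions, not GPTVoiceTasker's actual data model.

```python
# A simplified sketch of the "learn from history" idea above: cache the action
# sequence recorded for a command so later invocations replay it without
# another LLM round-trip. Fields and normalization are illustrative only.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Action:
    kind: str        # e.g. "tap", "type", "scroll"
    target: str      # accessibility label or id of the on-screen element
    text: str = ""   # text to enter, for "type" actions

class TaskMemory:
    """Remembers the action sequence that completed a spoken command."""

    def __init__(self) -> None:
        self._learned: Dict[str, List[Action]] = {}

    @staticmethod
    def _normalize(command: str) -> str:
        return " ".join(command.lower().split())

    def record(self, command: str, actions: List[Action]) -> None:
        self._learned[self._normalize(command)] = actions

    def recall(self, command: str) -> Optional[List[Action]]:
        # None means the command is new; fall back to LLM-driven exploration.
        return self._learned.get(self._normalize(command))

memory = TaskMemory()
memory.record("send a message to Alex",
              [Action("tap", "Messages"), Action("tap", "Alex"),
               Action("type", "Message field", "On my way")])
print(memory.recall("Send a message to  Alex"))  # replayed without re-planning
```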

Authors
Minh Duc Vu
CSIRO's Data61, Clayton, Victoria, Australia
Han Wang
Monash University, Melbourne, VIC, Australia
Jieshan Chen
CSIRO's Data61, Sydney, New South Wales, Australia
Zhuang Li
Monash University, Melbourne, Australia
Shengdong Zhao
City University of Hong Kong, Hong Kong, China
Zhenchang Xing
CSIRO's Data61 & Australian National University, ACTON, ACT, Australia
Chunyang Chen
Technical University of Munich, Heilbronn, Germany
Paper URL

https://doi.org/10.1145/3654777.3676356

VisionTasker: Mobile Task Automation Using Vision Based UI Understanding and LLM Task Planning
Abstract

Mobile task automation is an emerging field that leverages AI to streamline and optimize the execution of routine tasks on mobile devices, thereby enhancing efficiency and productivity. Traditional methods, such as Programming By Demonstration (PBD), are limited due to their dependence on predefined tasks and susceptibility to app updates. Recent advancements have utilized the view hierarchy to collect UI information and employed Large Language Models (LLM) to enhance task automation. However, view hierarchies have accessibility issues and face potential problems like missing object descriptions or misaligned structures. This paper introduces VisionTasker, a two-stage framework combining vision-based UI understanding and LLM task planning, for mobile task automation in a step-by-step manner. VisionTasker firstly converts a UI screenshot into natural language interpretations using a vision-based UI understanding approach, eliminating the need for view hierarchies. Secondly, it adopts a step-by-step task planning method, presenting one interface at a time to the LLM. The LLM then identifies relevant elements within the interface and determines the next action, enhancing accuracy and practicality. Extensive experiments show that VisionTasker outperforms previous methods, providing effective UI representations across four datasets. Additionally, in automating 147 real-world tasks on an Android smartphone, VisionTasker demonstrates advantages over humans in tasks where humans show unfamiliarity and shows significant improvements when integrated with the PBD mechanism. VisionTasker is open-source and available at https://github.com/AkimotoAyako/VisionTasker.
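The two-stage, step-by-step loop the abstract describes, vision-based screen understanding followed by one LLM planning decision per interface, can be outlined as below. The callable names, the "DONE" token, the action format, and the stubbed components are placeholders for illustration; see the project repository linked above for the actual implementation.

```python
# A schematic sketch of the step-by-step loop: a vision component turns the
# current screen into text, and an LLM planner picks one action at a time
# until it declares the task done. Names and formats here are assumptions.
from typing import Callable

def run_task(task: str,
             describe_screen: Callable[[], str],           # vision-based UI understanding: screenshot -> text
             plan_next_action: Callable[[str, str], str],  # (task, screen text) -> one action or "DONE"
             execute: Callable[[str], None],               # performs the chosen action on the device
             max_steps: int = 20) -> bool:
    """Drive a mobile task one interface at a time; returns True if the planner finishes."""
    for _ in range(max_steps):
        screen_text = describe_screen()               # stage 1: screenshot -> natural language
        action = plan_next_action(task, screen_text)  # stage 2: LLM chooses the next step
        if action == "DONE":
            return True
        execute(action)
    return False

# Toy usage with stub components, just to show the control flow:
screens = iter(["Home screen with a 'Settings' button", "Settings screen with a 'Wi-Fi' toggle"])
ok = run_task(
    "turn on Wi-Fi",
    describe_screen=lambda: next(screens, "Wi-Fi is now on"),
    plan_next_action=lambda task, s: "DONE" if "Wi-Fi is now on" in s else f"tap element mentioned in: {s}",
    execute=lambda a: print("executing:", a),
)
print("task completed:", ok)
```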

Authors
Yunpeng Song
Xi'an Jiaotong University, Xi'an, China
Yiheng Bian
Xi'an Jiaotong University, Xi'an, China
Yongtao Tang
Xi'an Jiaotong University, Xi'an, China
Guiyu Ma
Xi'an Jiaotong University, Xi'an, China
Zhongmin Cai
Xi'an Jiaotong University, Xi'an, China
Paper URL

https://doi.org/10.1145/3654777.3676386
