3. Machine Learning for User Interfaces

https://doi.org/10.1145/3654777.3676381

Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven designers, each with at least a year of professional design experience. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55\% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.

UC Berkeley, Berkeley, California, United States

Google Research, Mountain View, California, United States

UC Berkeley, Berkeley, California, United States

Google Research, Mountain View, California, United States

https://doi.org/10.1145/3654777.3676436

From a visual-perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention ``on average'', no scanpath model has been capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which utilizes a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that predicts gaze locations. Our model offers the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and durations, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model.

Aalto University, Espoo, Finland

Nokia Technologies, Espoo, Finland

University of Luxembourg, Esch-sur-Alzette, Luxembourg

Aalto University, Helsinki, Finland

https://doi.org/10.1145/3654777.3676356

Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face challenges in their real-world usability, characterized by inefficiency and struggles in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GPTVoiceTasker, a virtual assistant poised to enhance user experiences and task efficiency on mobile devices. GPTVoiceTasker excels at intelligently deciphering user commands and executing relevant device interactions to streamline task completion. For unprecedented tasks, GPTVoiceTasker utilises the contextual information and on-screen content to continuously explore and execute the tasks. In addition, the system continually learns from historical user commands to automate subsequent task invocations, further enhancing execution efficiency. From our experiments, GPTVoiceTasker achieved 84.5% accuracy in parsing human commands into executable actions and 85.7% accuracy in automating multi-step tasks. In our user study, GPTVoiceTasker boosted task efficiency in real-world scenarios by 34.85%, accompanied by positive participant feedback. We made GPTVoiceTasker open-source, inviting further research into LLMs utilization for diverse tasks through prompt engineering and leveraging user usage data to improve efficiency.

CSIRO's Data61, Clayton, Victoria, Australia

Monash University, Melbourne, VIC, Australia

CSIRO's Data61, Sydney, New South Wales, Australia

Monash University, Melbourne, Australia

City University of Hong Kong, Hong Kong, China

CSIRO's Data61 & Australian National University, ACTON, ACT, Australia

Technical University of Munich, Heilbronn, Germany