UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset

Abstract

Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven designers, each with at least a year of professional design experience. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.

Authors
Peitong Duan
UC Berkeley, Berkeley, California, United States
Chin-Yi Cheng
Google Research, Mountain View, California, United States
Gang Li
Google Research, Mountain View, California, United States
Bjoern Hartmann
UC Berkeley, Berkeley, California, United States
Yang Li
Google Research, Mountain View, California, United States
Paper URL

https://doi.org/10.1145/3654777.3676381

Conference: UIST 2024

ACM Symposium on User Interface Software and Technology

Session: 3. Machine Learning for User Interfaces

Westin: Allegheny 3
5 presentations
2024-10-15 18:00:00
2024-10-15 19:15:00