Visual search is key to understanding and improving interaction with graphical user interfaces (GUIs), yet predicting scanpaths on real GUIs remains an open challenge. Unlike free-viewing, visual search is goal-driven and shaped by both linguistic and visual features of the GUI. State-of-the-art models of visual search, trained on natural images, fail on GUIs because they cannot capture the effects of grouping and semantics on search strategies. We present \textsc{SeekUI}, a reward-augmented Vision Language Model (VLM) that predicts scanpaths directly from a GUI screenshot and a text cue describing the desired target. Our model extends the capability of VLMs to reproduce human-like visual search behavior on GUIs and outperforms baseline models across different types of GUIs. Importantly, it reproduces key empirical phenomena established in eye-tracking studies of visual search, including the Guess–Scan–Confirm strategy. In sum, \textsc{SeekUI} provides a foundation for predicting visual search behavior and has the potential to inform GUI evaluation and optimization.