Look at me

https://doi.org/10.1145/3313831.3376578

We present GazeConduits, a calibration-free ad-hoc mobile interaction concept that enables users to collaboratively interact with tablets, other users, and content in a cross-device setting using gaze and touch input. GazeConduits leverages recently introduced smartphone capabilities to detect facial features and estimate users' gaze directions. To join a collaborative setting, users place one or more tablets onto a shared table and position their phone in the center, which then tracks users present as well as their gaze direction to determine the tablets they look at. We present a series of techniques using GazeConduits for collaborative interaction across mobile devices for content selection and manipulation. Our evaluation with 20 simultaneous tablets on a table shows that GazeConduits can reliably identify which tablet or collaborator a user is looking at.

Cross-device interaction

gaze input

touch input

RWTH Aachen University, Aachen, Germany

ETH Zürich, Zürich, Switzerland

Aarhus University, Aarhus, Denmark

University College London, London, United Kingdom

10.1145/3313831.3376578

https://doi.org/10.1145/3313831.3376799

Eye movements provide insight into what parts of an image a viewer finds most salient, interesting, or relevant to the task at hand. Unfortunately, eye tracking data, a commonly used proxy for attention, is cumbersome to collect. Here we explore an alternative: a comprehensive web-based toolbox for crowdsourcing visual attention. We draw from four main classes of attention-capturing methodologies in the literature. ZoomMaps is a novel zoom-based interface that captures viewing on a mobile phone. CodeCharts is a self-reporting methodology that records points of interest at precise viewing durations. ImportAnnots is an "annotation" tool for selecting important image regions, and cursor-based BubbleView lets viewers click to deblur a small area. We compare these methodologies using a common analysis framework in order to develop appropriate use cases for each interface. This toolbox and our analyses provide a blueprint for how to gather attention data at scale without an eye tracker.

Eye tracking

attention

crowdsourcing

interaction techniques

Massachusetts Institute of Technology, Cambridge, MA, USA

Harvard University, Cambridge, MA, USA

University of California, Berkeley, Berkeley, CA, USA

Boston College, Chestnut Hill, MA, USA

Adobe Research, Cambridge, MA, USA

10.1145/3313831.3376799

https://doi.org/10.1145/3313831.3376641

Listening to text using read-aloud applications is a popular way for people to consume content when their visual attention is situationally impaired (e.g., commuting, walking, tired eyes). However, due to the linear nature of audio, such apps do not support skimming (a non-linear, rapid form of reading), which is essential for quickly grasping the gist and organization of difficult texts, like academic or professional documents. To support auditory skimming for situational impairments, we (1) identified the user needs and challenges in auditory skimming through a formative study (N=20), (2) derived the concept of "eyes-reduced" skimming that blends auditory and visual modes of reading, inspired by how participants mixed visual and non-visual interactions, (3) generated a set of design guidelines for eyes-reduced skimming, and (4) designed and evaluated a novel audio skimming app that embodies the guidelines. Our in-situ preliminary observation study (N=6) suggested that participants were positive about our design and were able to auditorily skim documents. We discuss design implications for eyes-reduced reading, read-aloud apps, and text-to-speech engines.

Situational impairments

skim reading

eyes-free

eyes-reduced

text-to-speech

mobile device

accessibility

design guidelines

interactive prototype

University of British Columbia, Vancouver, BC, Canada

10.1145/3313831.3376641

https://doi.org/10.1145/3313831.3376266

Subtitles play a crucial role in cross-lingual distribution of multimedia content and help communicate information where auditory content is not feasible (loud environments, hearing impairments, unknown languages). Established methods utilize text at the bottom of the screen, which may distract from the video. Alternative techniques place captions closer to related content (e.g., faces) but are not applicable to arbitrary videos such as documentaries. Hence, we propose to leverage live gaze as an indirect input method to adapt captions to individual viewing behavior. We implemented two gaze-adaptive methods and compared them in a user study (n=54) to traditional captions and audio-only videos. The results show that viewers with less experience with captions prefer our gaze-adaptive methods as they assist them in reading. Furthermore, gaze distributions resulting from our methods are closer to natural viewing behavior compared to the traditional approach. Based on these results, we provide design implications for gaze-adaptive captions.

Eye Tracking

Gaze Input

Gaze-Responsive Display

Multimedia

Video Captions

Subtitles

ETH Zürich, Zürich, Switzerland

University of Stuttgart, Stuttgart, Germany

ETH Zürich, Zürich, Switzerland

10.1145/3313831.3376266