Desired model behavior often differs across contexts (e.g., different geographies, communities, or institutions), but there is little infrastructure to facilitate the context-specific evaluations that are key to deployment decisions and to building trust. Here, we present Kaleidoscope, a system for evaluating models in terms of user-driven, domain-relevant concepts. Kaleidoscope’s iterative workflow enables generalizing from a few examples into a larger, diverse set representing an important concept. These example sets can be used to test model outputs or shifts in model behavior in semantically meaningful ways. For instance, we might construct a “xenophobic comments” set and test that its examples are more likely to be flagged by a content moderation model than those of a “civil discussion” set. To evaluate Kaleidoscope, we compare it against template- and DSL-based grouping methods, and conduct a usability study with 13 Reddit users testing a content moderation model. We find that Kaleidoscope facilitates iterative, exploratory hypothesis testing across diverse, conceptually meaningful example sets.
https://doi.org/10.1145/3544548.3581482
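As a minimal sketch of the concept-set testing idea described above, the snippet below checks that examples from a concept set receive higher flag probabilities from a moderation model than examples from a contrast set. The scorer `flag_probability` and all names are hypothetical stand-ins, not Kaleidoscope's actual API.

```python
# Sketch of a concept-set test against a content moderation model.
from statistics import mean
from typing import Callable, List

def concept_set_test(
    flag_probability: Callable[[str], float],  # hypothetical scorer returning P(flag)
    concept_set: List[str],                    # e.g., "xenophobic comments" examples
    contrast_set: List[str],                   # e.g., "civil discussion" examples
    margin: float = 0.0,
) -> bool:
    """Pass if the concept set's mean flag probability exceeds the
    contrast set's by at least `margin`."""
    concept_score = mean(flag_probability(x) for x in concept_set)
    contrast_score = mean(flag_probability(x) for x in contrast_set)
    return concept_score > contrast_score + margin
```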
There have been significant advances in simulation models predicting human behavior across various interactive tasks. One issue remains, however: identifying the parameter values that best describe an individual user. These parameters often express personal cognitive and physiological characteristics, and how accurately they are inferred has significant effects on individual-level predictions. Still, the high complexity of simulation models usually causes parameter inference to consume prohibitively large amounts of time, as much as days per user. We investigated amortized inference for its potential to reduce inference time dramatically, to mere tens of milliseconds. Its principle is to pre-train a neural proxy model for probabilistic inference, using synthetic data simulated from a range of parameter combinations. By examining the efficiency and prediction performance of amortized inference in three challenging cases involving real-world data (menu search, point-and-click, and touchscreen typing), the paper demonstrates that an amortized inference approach permits analyzing large-scale datasets by means of simulation models. It also addresses emerging opportunities and challenges in applying amortized inference in HCI.
https://doi.org/10.1145/3544548.3581439
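A minimal sketch of the amortized-inference recipe described above: sample parameters from a prior, simulate synthetic behavior, and pre-train a neural proxy so that later per-user inference is a single forward pass. The architecture, dimensions, and point-estimate loss are illustrative assumptions, and the `simulate` function is assumed user-supplied.

```python
import torch
import torch.nn as nn

def make_training_set(simulate, n_samples, theta_low, theta_high):
    """Sample parameters uniformly from the prior box and simulate data for each."""
    dim = theta_low.numel()
    thetas = theta_low + (theta_high - theta_low) * torch.rand(n_samples, dim)
    data = torch.stack([simulate(t) for t in thetas])  # fixed-size behavior summary
    return data, thetas

# Neural proxy mapping a 16-dim behavior summary to 3 parameters (dims illustrative).
proxy = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),
)

def pretrain(proxy, data, thetas, epochs=100, lr=1e-3):
    opt = torch.optim.Adam(proxy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # Point-estimate regression; a full treatment would amortize the
        # posterior (e.g., with normalizing flows) rather than a single value.
        loss = nn.functional.mse_loss(proxy(data), thetas)
        loss.backward()
        opt.step()

# After pre-training, per-user inference is one forward pass (milliseconds):
# theta_hat = proxy(observed_summary)
```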
Perceptual dissimilarities, which require high-cost user ratings, have contributed to designing well-distinguishable vibrations for conveying associated meanings. Appropriate metrics can reduce this cost, but known vibration similarity/dissimilarity metrics have not predicted perceptual dissimilarities robustly. We propose a physiology-based model (PM) that predicts the perceptual dissimilarities of a given vibration set via two parallel processes: Neural Coding (NC), which mimics neural signal transfer, and One-dimensional Convolution (OC), which captures rhythmic features. The model's eight parameters were trained on six datasets published in the literature to maximize Spearman's rank correlation. We validated PM against six metrics (RMSE, DTW, Spectral Matching, Temporal Matching, ST-SIM, and SPQI) on twelve datasets: the six training datasets and six untrained datasets, including measured accelerations. In all validations, PM's predictions showed robust correlations with user data and similar structures in perceptual spaces. Some baseline metrics fit specific datasets better, but none showed robust correlations and similar perceptual spaces across all twelve datasets.
https://doi.org/10.1145/3544548.3580686
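The training and validation criterion above, Spearman's rank correlation between predicted and user-rated dissimilarities, can be computed directly. A minimal sketch using SciPy follows; the example data is illustrative.

```python
from scipy.stats import spearmanr

def rank_correlation(predicted, user_rated):
    """Spearman's rank correlation between model-predicted and user-rated
    dissimilarities for the same ordered list of vibration pairs."""
    rho, _p = spearmanr(predicted, user_rated)
    return rho

# Illustrative example: dissimilarities for six vibration pairs.
print(rank_correlation([0.1, 0.4, 0.35, 0.8, 0.7, 0.2],
                       [1.0, 3.0, 2.5, 6.0, 5.0, 1.5]))
```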
Illusory VR interaction techniques such as hand redirection work because humans use vision to adjust their motor commands during movement (e.g., reaching). Existing simulations of redirected reaching are limited, however, and have not yet incorporated important stochastic characteristics like sensorimotor noise, nor captured redirection's effect on movement duration. In this work, we propose adapting a stochastic optimal feedback control (SOFC) model of normal reach to simulate redirection by augmenting sensory feedback at run-time. We present a summary of our simulation and validate it against user data gathered in multiple redirection conditions. We also evaluate the impacts of visual attention on the effectiveness of redirection in real users and replicate the effects in simulation. Our results show that an infinite-horizon SOFC model is able to reproduce key characteristics of redirected reaches and highlight the benefits of SOFC as a tool for simulating, evaluating, and gaining insights about redirection techniques.
https://doi.org/10.1145/3544548.3580767
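A toy illustration of how redirection can be injected into a closed-loop reach simulation: the controller below corrects toward the target using displaced, noisy visual feedback, so the physical endpoint lands offset from the target. This is a simple proportional controller with additive noise standing in for the paper's infinite-horizon SOFC model; all gains and noise scales are illustrative assumptions.

```python
import numpy as np

def simulate_redirected_reach(target, redirection_offset, steps=200, dt=0.01,
                              gain=5.0, motor_noise=0.01, sensory_noise=0.005,
                              seed=0):
    rng = np.random.default_rng(seed)
    hand = np.zeros(2)  # physical hand position
    for _ in range(steps):
        # The user sees a displaced (redirected) hand, corrupted by sensory noise.
        seen_hand = hand + redirection_offset + rng.normal(0.0, sensory_noise, 2)
        # Visual feedback drives the *seen* hand toward the target...
        velocity = gain * (np.asarray(target) - seen_hand)
        # ...while motor noise perturbs execution (signal-dependent in full SOFC).
        hand = hand + dt * velocity + rng.normal(0.0, motor_noise, 2)
    return hand  # physical endpoint, displaced by roughly -redirection_offset

# Example: a 2 cm rightward redirection shifts the physical endpoint leftward.
print(simulate_redirected_reach(target=[0.3, 0.0], redirection_offset=[0.02, 0.0]))
```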
This paper presents a Shape-Adaptive Ternary-Gaussian model for describing endpoint uncertainty when pointing at moving targets of arbitrary shapes. The basic idea of the model is to combine the uncertainty related to the target shape with the uncertainty caused by the target motion. First, we propose a model that predicts endpoint distributions on static targets based on a Dual-Space Decomposition (DUDE) algorithm. Then, we linearly combine a 2D Ternary-Gaussian model with the new DUDE-based model, making the 2D Ternary-Gaussian model adaptable to moving targets of arbitrary shapes. To verify the performance of our model, we compared it with the original 2D Ternary-Gaussian model and a recently proposed Inscribed Circle model in predicting endpoint distributions. The results show that the proposed model outperformed the two baseline models while maintaining good robustness across different shapes and movement speeds.
https://doi.org/10.1145/3544548.3581217
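A minimal sketch of the linear-combination step described above: blend the endpoint covariance predicted by a motion-based Ternary-Gaussian term with one predicted by a static-shape (DUDE-based) term, then sample endpoints from the resulting Gaussian. The weight and both sub-model outputs are placeholders, not the paper's fitted components.

```python
import numpy as np

def combined_endpoint_covariance(sigma_motion, sigma_shape, w):
    """Linearly blend two 2x2 covariance predictions (w in [0, 1])."""
    return w * np.asarray(sigma_motion) + (1.0 - w) * np.asarray(sigma_shape)

def sample_endpoints(mean, cov, n=1000, seed=0):
    """Draw predicted endpoints from the combined Gaussian."""
    return np.random.default_rng(seed).multivariate_normal(mean, cov, size=n)

# Illustrative placeholders for the two sub-models' covariance predictions.
cov = combined_endpoint_covariance([[4.0, 0.0], [0.0, 2.0]],
                                   [[1.0, 0.3], [0.3, 1.5]], w=0.6)
points = sample_endpoints(mean=[0.0, 0.0], cov=cov)
```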
User models play an important role in interaction design, supporting automation of interaction design choices. To do so, model parameters must be estimated from user data. While very large amounts of user data are sometimes required, recent research has shown how experiments can be designed so as to gather data and infer parameters as efficiently as possible, thereby minimising the data requirement. In the current article, we investigate a variant of these methods that amortises the computational cost of designing experiments by training a policy for choosing experimental designs with simulated participants. Our solution learns which experiments provide the most useful data for parameter estimation by interacting with in-silico agents sampled from the model space, thereby using synthetic data rather than vast amounts of human data. The approach is demonstrated for three progressively complex models of pointing.
https://doi.org/10.1145/3544548.3581483
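A minimal sketch of evaluating experimental designs against in-silico participants: score each candidate design by the expected parameter-recovery error over agents sampled from the model space, and pick the best. This greedy one-step scorer stands in for the learned policy described above; `simulate_response` and `estimate_params` are assumed user-supplied.

```python
import numpy as np

def score_design(design, prior_samples, simulate_response, estimate_params, rng):
    """Expected parameter-recovery error if `design` were run on agents
    sampled from the prior over the model space."""
    errors = []
    for theta in prior_samples:  # in-silico participants
        y = simulate_response(theta, design, rng)
        theta_hat = estimate_params(y, design)
        errors.append(np.sum((theta_hat - theta) ** 2))
    return np.mean(errors)

def choose_design(candidate_designs, prior_samples, simulate_response,
                  estimate_params, seed=0):
    rng = np.random.default_rng(seed)
    scores = [score_design(d, prior_samples, simulate_response,
                           estimate_params, rng)
              for d in candidate_designs]
    return candidate_designs[int(np.argmin(scores))]  # lowest expected error
```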