Experiments in human-computer interaction (HCI) often evaluate whether a prototype is “better,” yet novelty alone can affect users’ judgments and possibly their performance. To quantify this effect, we conducted a within-subjects study with 48 participants comparing four pairs of functionally identical prototypes (mice, keyboards, search engines, and AI chatbots). Each pair differed only in cosmetic features and a label marking one version as “old” and the other as “new.” Novelty labeling shifted preference: up to 77% of participants favored the version labeled “new.” Subjective ratings of the search engine increased under the “new” label by up to 7.1%. For the AI chatbot, ratings were driven by preference, with the preferred version rated up to 11.6% higher than the unpreferred one. Performance differences were modest and emerged mainly in error measures (e.g., 9.7% fewer misses with the “new” mouse and up to 7.2% lower error rates with the “new” keyboard). Technology readiness predicted baseline skill and occasionally moderated performance but did not protect judgments from novelty bias. These results show that novelty labeling reframes interpretation and preference more than performance, raising concerns for HCI evaluations that rely on participant judgments.
ACM CHI Conference on Human Factors in Computing Systems