In recent years, various initiatives from within and outside the HCI field have encouraged researchers to improve research ethics, openness, and transparency in their empirical research. We quantify how the CHI literature might have changed in these three aspects by analyzing samples of 118 CHI 2017 and 127 CHI 2022 papers, randomly drawn and stratified across conference sessions. We operationalized research ethics, openness, and transparency into 45 criteria and manually annotated the sampled papers. The results show that the CHI 2022 sample was better in 18 criteria but showed no improvement in the remaining criteria. The most noticeable improvements were related to research transparency (10 out of 17 criteria). We also explored the possibility of assisting the verification process by developing a proof-of-concept screening system. We tested this tool on eight criteria; six achieved high accuracy and F1 scores. We discuss the implications for future research practices and education. This paper and all supplementary materials are freely available at https://doi.org/10.17605/osf.io/n25d6.
https://doi.org/10.1145/3544548.3580848
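As a rough illustration of the kind of evaluation the screening system above would need, the sketch below scores one automated criterion check against manual annotations using accuracy and F1. The labels are invented placeholders and the setup (a single binary criterion per paper) is an assumption for illustration, not the authors' pipeline.

```python
# Hypothetical sketch: comparing a screening tool's predictions for one
# criterion (e.g., "data availability statement present") against manual
# annotations. All labels below are placeholders, not the study's data.
from sklearn.metrics import accuracy_score, f1_score

manual_labels   = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # human annotations (1 = criterion met)
screened_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]   # tool's predictions

print(f"accuracy: {accuracy_score(manual_labels, screened_labels):.2f}")
print(f"F1 score: {f1_score(manual_labels, screened_labels):.2f}")
```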
Smartphones enable understanding human behavior through activity recognition to support people's daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a single country. However, people are more sedentary in the post-pandemic world, with the prevalence of remote/hybrid work and study settings, making the detection of simple activities less meaningful for context-aware applications. Hence, our understanding of (i) how multimodal smartphone sensors and machine learning models could be used to detect complex daily activities that better inform about people's daily lives, and (ii) how such models generalize to unseen countries, is limited. We analyzed in-the-wild smartphone data and ~216K self-reports from 637 college students in five countries (Italy, Mongolia, UK, Denmark, Paraguay). We then defined a 12-class complex daily activity recognition task and evaluated performance with different approaches. We found that even though the generic multi-country approach provided an AUROC of 0.70, the country-specific approach performed better, with AUROC scores in the range 0.79-0.89. We believe that research along the lines of diversity awareness is fundamental for advancing human behavior understanding through smartphones and machine learning, and for greater real-world utility across countries.
https://doi.org/10.1145/3544548.3581190
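A minimal sketch of the generic vs. country-specific comparison described above, assuming a 12-class task scored with one-vs-rest macro AUROC. The synthetic features, labels, and RandomForest model are stand-ins; the study's actual smartphone sensors, features, and models are not reproduced here.

```python
# Hypothetical sketch: country-specific models (train/test within one country)
# vs. a generic multi-country model (pooled training data), both scored with
# one-vs-rest AUROC on a 12-class activity label. Data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_country, n_features, n_classes = 400, 20, 12
countries = ["IT", "MN", "UK", "DK", "PY"]

# Per-country datasets with slightly shifted feature distributions.
data = {c: (rng.normal(i * 0.1, 1.0, (n_per_country, n_features)),
            rng.integers(0, n_classes, n_per_country))
        for i, c in enumerate(countries)}

def auroc(X_tr, y_tr, X_te, y_te):
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te), multi_class="ovr")

# Country-specific: train and evaluate within each country separately.
for c, (X, y) in data.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    print(c, "country-specific AUROC:", round(auroc(X_tr, y_tr, X_te, y_te), 2))

# Generic: pool all countries into one training set.
X_all = np.vstack([X for X, _ in data.values()])
y_all = np.concatenate([y for _, y in data.values()])
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=0)
print("generic multi-country AUROC:", round(auroc(X_tr, y_tr, X_te, y_te), 2))
```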
Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. Algorithms powering such systems tend to consist of supervised-learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the problem, including the need to balance exploration and exploitation, via the multi-armed bandit (MAB) framework. However, MAB-based approaches rely heavily on assumptions about human preferences. These preference assumptions are seldom tested in human subject studies, partly due to the lack of publicly available toolkits to conduct such studies. In this work, we conduct a study with crowdworkers in a comics recommendation MAB setting. Each arm represents a comic category, and users provide feedback after each recommendation. We check the validity of the core MAB assumption that human preferences (reward distributions) are fixed over time, and find that it does not hold. This finding suggests that any MAB algorithm used for recommender systems should account for human preference dynamics. While answering these questions, we provide a flexible experimental framework for understanding human preference dynamics and testing MAB algorithms with human users. The code for our experimental framework and the collected data can be found at https://github.com/HumainLab/human-bandit-evaluation.
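To make the stationarity question above concrete, the sketch below runs a simple epsilon-greedy bandit over hypothetical comic categories and compares each arm's early vs. late mean reward; a large gap is one crude signal that the fixed-reward assumption fails. The drifting simulated rewards are an assumption for illustration, not the crowdworker data, and the framework at the GitHub link above remains the authoritative implementation.

```python
# Hypothetical sketch: epsilon-greedy bandit with a crude stationarity check.
# Rewards are simulated with a slow drift to mimic preference dynamics.
import numpy as np

rng = np.random.default_rng(1)
arms = ["superhero", "slice-of-life", "satire", "sci-fi"]
n_rounds, epsilon = 2000, 0.1
counts, means = np.zeros(len(arms)), np.zeros(len(arms))
history = [[] for _ in arms]

def reward(arm, t):
    # Simulated user feedback whose mean drifts over time (non-stationary preferences).
    drift = 0.3 * np.sin(2 * np.pi * t / n_rounds + arm)
    return rng.normal(0.5 + drift, 0.2)

for t in range(n_rounds):
    a = rng.integers(len(arms)) if rng.random() < epsilon else int(np.argmax(means))
    r = reward(a, t)
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]   # incremental mean update
    history[a].append(r)

# Stationarity check: a large early-vs-late gap suggests the fixed-reward assumption fails.
for name, h in zip(arms, history):
    half = len(h) // 2
    if half:
        print(f"{name}: early mean {np.mean(h[:half]):.2f}, late mean {np.mean(h[half:]):.2f}")
```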
Lifelogging is traditionally used for memory augmentation. However, recent research shows that users' trust in the completeness and accuracy of lifelogs might skew their memories. Privacy-protection alterations such as body blurring and content deletion are commonly applied to photos to avoid capturing sensitive information. However, their impact on how users remember memories remains unclear. To this end, we conduct a white-hat memory attack and report on an iterative experiment (N=21) to compare the impact of viewing 1) unaltered lifelogs, 2) blurred lifelogs, and 3) a subset of the lifelogs after deleting private ones, on confidently remembering memories. Findings indicate that all the privacy methods impact memory quality similarly and that users tend to change their answers in recognition more than in recall scenarios. Results also show that users have high confidence in their remembered content across all privacy methods. Our work raises awareness about the mindful design of technological interventions.
https://doi.org/10.1145/3544548.3581565
The ubiquity of devices connected to the internet raises concerns about the security and privacy of smart homes. The effectiveness of interventions to support secure user behaviors is limited by a lack of validated instruments to measure relevant psychological constructs, such as self-efficacy, the belief that one is able to perform certain behaviors. We developed and validated the Cybersecurity Self-Efficacy in Smart Homes (CySESH) scale, a 12-item unidimensional measure of domain-specific self-efficacy beliefs, across five studies (N=1247). Three pilot studies generated and refined an item pool. We report evidence from one initial and one major, preregistered validation study for (1) excellent reliability (α = 0.90), (2) convergent validity with self-efficacy in information security (r_SEIS = 0.64, p < .001), and (3) discriminant validity with outcome expectations (r_OE = 0.26, p < .001), self-esteem (r_RSE = 0.17, p < .001), and optimism (r_LOT-R = 0.18, p < .001). We discuss CySESH's potential to advance future HCI research on cybersecurity, practitioner user assessments, and implications for consumer protection policy.
https://doi.org/10.1145/3544548.3580860
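For readers unfamiliar with the statistics reported above, the sketch below computes Cronbach's alpha for a 12-item scale and a Pearson correlation of the kind used for convergent validity. The simulated responses and the placeholder criterion scale are assumptions for illustration, not the CySESH validation data.

```python
# Hypothetical sketch: reliability (Cronbach's alpha) and a convergent-validity
# correlation for a 12-item scale, using simulated placeholder responses.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_respondents, n_items = 300, 12

# Simulated item responses driven by one shared latent trait.
latent = rng.normal(0, 1, n_respondents)
items = latent[:, None] + rng.normal(0, 1, (n_respondents, n_items))

def cronbach_alpha(x):
    """Classic alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

cysesh_total = items.sum(axis=1)
criterion_total = 0.6 * latent + rng.normal(0, 1, n_respondents)   # placeholder criterion scale

print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
r, p = pearsonr(cysesh_total, criterion_total)
print(f"convergent validity r = {r:.2f}, p = {p:.3g}")
```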
This paper analyses Human-Computer Interaction (HCI) literature reviews to provide a clear conceptual basis for authors, reviewers, and readers. HCI is multidisciplinary, and various types of literature reviews exist, from systematic reviews to critical reviews in the style of essays. Yet there is insufficient consensus on what to expect of literature reviews in HCI. Thus, a shared understanding of literature reviews and clear terminology are needed to plan, evaluate, and use literature reviews, and to further improve review methodology. We analysed 189 literature reviews published at all SIGCHI conferences and in ACM Transactions on Computer-Human Interaction (TOCHI) up until August 2022. We report on the main dimensions of variation: (i) contribution types and topics; and (ii) structure and methodologies applied. We identify gaps and trends to inform future meta work in HCI and provide a starting point for moving towards a more comprehensive terminology for literature reviews in HCI.
https://doi.org/10.1145/3544548.3581332