Recent research in robotic proxies has demonstrated that many non-verbal cues important in co-located collaboration can be reproduced automatically. However, these systems often require a symmetrical hardware setup in each location. We present the VRoxy system, designed to enable access to remote spaces through a robotic embodiment, using a VR headset in a much smaller space, such as a personal office. VRoxy maps small movements in VR space to larger movements in the physical space of the robot, allowing the user to navigate large physical spaces easily. Using VRoxy, the VR user can quickly explore and navigate a low-fidelity rendering of the remote space. Upon the robot's arrival, the system uses a 360° camera feed to support real-time interactions. The system also facilitates various interaction modalities by rendering micro-mobility around shared spaces, head and facial animations, and pointing gestures on the proxy. We demonstrate how our system can accommodate mapping multiple physical locations onto a unified virtual space. In a formative study, users completed a design decision task in which they navigated and collaborated in a complex 7.5 m × 5 m layout using a 3 m × 2 m VR space.
https://doi.org/10.1145/3586183.3606743
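The abstract's core idea of mapping a small VR tracking volume onto a larger remote space can be pictured as a per-axis rescaling. The sketch below is an illustrative assumption, not VRoxy's actual redirection algorithm; the function name and the uniform linear mapping are hypothetical simplifications.

```python
def map_vr_to_physical(vr_pos, vr_size, physical_size):
    """Scale a position in the VR tracking volume onto the robot's space.

    vr_pos: (x, y) position inside the VR space, in meters.
    vr_size / physical_size: extents of the two spaces, in meters.
    """
    return tuple(p / v * s for p, v, s in zip(vr_pos, vr_size, physical_size))

# The center of a 3 m x 2 m VR space lands at the center of the
# 7.5 m x 5 m remote layout from the study.
print(map_vr_to_physical((1.5, 1.0), (3.0, 2.0), (7.5, 5.0)))  # (3.75, 2.5)
```

A uniform scale like this is only the simplest case; any practical system would also need to handle obstacles in the remote space and differing room shapes.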
Despite the advances and ubiquity of digital communication media such as videoconferencing and virtual reality, they remain oblivious to the rich intentions expressed by users. Beyond transmitting audio, videos, and messages, we envision digital communication media as proactive facilitators that can provide unobtrusive assistance to enhance communication and collaboration. Informed by the results of a formative study, we propose three key design concepts to explore the systematic integration of intelligence into communication and collaboration, including the panel substrate, language-based intent recognition, and lightweight interaction techniques. We developed CrossTalk, a videoconferencing system that instantiates these concepts, which was found to enable a more fluid and flexible communication and collaboration experience.
https://doi.org/10.1145/3586183.3606773
Virtual reality (VR) telepresence applications and the so-called "metaverse" promise to be the next major medium of human-computer interaction. However, with recent studies demonstrating the ease with which VR users can be profiled and deanonymized, metaverse platforms carry many of the privacy risks of the conventional internet (and more) while at present offering few of the defensive utilities that users are accustomed to. To remedy this, we present the first known method of implementing an "incognito mode" for VR. Our technique leverages local ε-differential privacy to quantifiably obscure sensitive user data attributes, with a focus on intelligently adding noise when and where it is needed most to maximize privacy while minimizing usability impact. Our system is capable of flexibly adapting to the unique needs of each VR application to further optimize this trade-off. We implement our solution as a universal Unity (C#) plugin that we then evaluate using several popular VR applications. Upon faithfully replicating the most well-known VR privacy attack studies, we show a significant degradation of attacker capabilities when using our solution.
https://doi.org/10.1145/3586183.3606754
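The local ε-differential privacy primitive this abstract relies on is commonly realized with the Laplace mechanism: each released value is perturbed with noise scaled to sensitivity/ε. The sketch below shows that mechanism for a single numeric attribute; it is a minimal illustration under that assumption, not the paper's adaptive, per-application noise system, and all names and parameter values here are hypothetical.

```python
import random

def privatize(value, sensitivity, epsilon):
    """Release `value` under ε-local-DP via the Laplace mechanism.

    scale = sensitivity / epsilon, so a larger epsilon means less noise
    (weaker privacy, better utility). The difference of two i.i.d.
    Exponential(mean=scale) draws is Laplace(0, scale).
    """
    scale = sensitivity / epsilon
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return value + noise

# Hypothetical example: obscuring a user's height attribute (meters)
# with sensitivity 0.5 m and a privacy budget of epsilon = 2.
reported_height = privatize(1.75, sensitivity=0.5, epsilon=2.0)
```

Repeated queries consume privacy budget additively, which is one reason an adaptive system that adds noise only "when and where it is needed most" can offer a better privacy-utility trade-off than naive uniform noising.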
We present MARS (Metadata Augmented Real-time Streaming), a system that enables game-aware streaming interfaces for Twitch. Current streaming interfaces provide a video stream of gameplay and a chat channel for conversation, but do not allow viewers to interact with game content independently of the streamer or other viewers. With MARS, a Unity game’s metadata is rendered in real-time onto a Twitch viewer’s interface. The metadata can then power viewer-side interfaces that are aware of the streamer’s game activity and provide new capabilities for viewers. Use cases include providing contextual information (e.g. clicking on a unit to learn more), improving accessibility (e.g. slowing down text presentation speed), and supporting novel stream-based game designs (e.g. asymmetric designs where the viewers know more than the streamer). We share the details of MARS’ architecture and capabilities in this paper, and showcase a working prototype for each of our three proposed use cases.
https://doi.org/10.1145/3586183.3606753
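One way to picture game-aware streaming is as a structured metadata payload serialized alongside each frame of video, which a viewer-side overlay can parse and react to. The schema below is entirely hypothetical, for illustration only; it is not MARS's actual wire format.

```python
import json

# Hypothetical per-frame metadata a Unity game might broadcast alongside
# the stream; every field name here is illustrative, not from the paper.
frame_metadata = {
    "frame": 1042,
    "units": [
        {"id": "u7", "name": "Archer", "screen_x": 312, "screen_y": 140,
         "tooltip": "Ranged unit, deals 12 damage"},
    ],
    "dialogue": {"text": "Welcome, traveler.", "speed": "slow"},
}

# Serialize on the game side, deserialize on the viewer side.
payload = json.dumps(frame_metadata)
restored = json.loads(payload)
# A viewer-side interface could now show the tooltip when the viewer
# clicks near screen position (312, 140), or slow down dialogue speed.
```

A payload like this is what makes the three use cases possible: contextual tooltips, viewer-controlled text speed, and asymmetric designs where metadata is shown only to viewers.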
Crafting a rich and unique environment is crucial for fictional world-building, but can be difficult to achieve since illustrating a world from scratch requires time and significant skill. We investigate the use of recent multi-modal image generation systems to enable users to iteratively visualize and modify elements of their fictional world using a combination of text input, sketching, and region-based filling. WorldSmith enables novice world builders to quickly visualize a fictional world with layered edits and hierarchical compositions. Through a formative study (4 participants) and a first-use study (13 participants), we demonstrate that WorldSmith offers more expressive interactions with prompt-based models. With this work, we explore how creatives can be empowered to leverage prompt-based generative AI as a tool in their creative process, beyond current "click-once" prompting UI paradigms.
https://doi.org/10.1145/3586183.3606772
With the increasing deployment of voice-controlled devices in homes and enterprises, there is an urgent demand for voice identification to prevent unauthorized access to sensitive information and property loss. However, due to the broadcast nature of sound waves, a voice-only system is vulnerable to adverse conditions and malicious attacks. We observe that combining millimeter waves (mmWave) and voice signals can significantly improve the effectiveness and security of user identification. Based on these observations, we propose a multi-modal user identification system (named WavoID) that fuses the uniqueness of mmWave-sensed vocal vibration and the mic-recorded voice of users. To estimate fine-grained waveforms, WavoID splits signals and adaptively combines useful decomposed signals according to correlative contents in both mmWave and voice. An elaborate anti-spoofing module in WavoID, which draws on bimodal biometric information, defends against attacks. WavoID produces and fuses the response maps of mmWave and voice to improve the representational power of the fused features, enabling accurate identification even in adverse circumstances. We evaluate WavoID in extensive experiments using commercial sensors. WavoID identifies users with over 98% accuracy on a 100-user dataset.
https://doi.org/10.1145/3586183.3606775
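The multi-modal fusion idea can be illustrated in its simplest form as score-level fusion: each modality yields a similarity score against an enrolled user, and a weighted combination drives the accept/reject decision. This is a hedged toy sketch only; WavoID actually fuses learned response maps of mmWave and voice features inside its model, not raw scores, and the weights and threshold below are invented.

```python
def fuse(mmwave_score, voice_score, w_mmwave=0.6):
    """Weighted sum of per-modality similarity scores, each in [0, 1]."""
    return w_mmwave * mmwave_score + (1.0 - w_mmwave) * voice_score

def identify(scores_by_user, threshold=0.8):
    """Return the best-matching enrolled user, or None if below threshold.

    scores_by_user maps a user ID to a (mmwave_score, voice_score) pair.
    """
    user, best = max(scores_by_user.items(), key=lambda kv: fuse(*kv[1]))
    return user if fuse(*best) >= threshold else None

print(identify({"alice": (0.95, 0.90), "bob": (0.40, 0.55)}))  # alice
```

Even this toy version shows why bimodality helps against spoofing: a replayed recording might score highly on voice alone, but without a matching mmWave-sensed vocal vibration the fused score stays below threshold.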