Teamwork Triumphs: Collaborative Experiences

Conference Name
UIST 2023
VRoxy: Wide-Area Collaboration From an Office Using a VR-Driven Robotic Proxy
Abstract

Recent research on robotic proxies has demonstrated that many non-verbal cues important in co-located collaboration can be reproduced automatically. However, these systems often require a symmetrical hardware setup in each location. We present the VRoxy system, designed to enable access to remote spaces through a robotic embodiment, using a VR headset in a much smaller space, such as a personal office. VRoxy maps small movements in VR space to larger movements in the physical space of the robot, allowing the user to navigate large physical spaces easily. Using VRoxy, the VR user can quickly explore and navigate a low-fidelity rendering of the remote space. Upon the robot's arrival, the system uses the feed of a 360° camera to support real-time interactions. The system also facilitates various interaction modalities by rendering micro-mobility around shared spaces, head and facial animations, and pointing gestures on the proxy. We demonstrate how our system can accommodate mapping multiple physical locations onto a unified virtual space. In a formative study, users completed a design-decision task in which they navigated and collaborated in a complex 7.5m x 5m layout using a 3m x 2m VR space.
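The core navigation idea, mapping a small VR workspace onto a larger remote space, can be sketched as a simple per-axis scaling. This is a hypothetical, minimal illustration (the function name and 2D simplification are assumptions; the actual system also handles robot path planning and real-time rendering, which this omits), using the study's 3m x 2m VR space and 7.5m x 5m remote layout as defaults:

```python
def vr_to_remote(pos_vr, vr_size=(3.0, 2.0), remote_size=(7.5, 5.0)):
    """Map a 2D position (meters) in the VR space onto the remote space.

    Each axis is scaled independently, so a small step in VR becomes a
    proportionally larger movement of the robotic proxy.
    """
    scale_x = remote_size[0] / vr_size[0]  # 2.5x along x with the defaults
    scale_y = remote_size[1] / vr_size[1]  # 2.5x along y with the defaults
    return (pos_vr[0] * scale_x, pos_vr[1] * scale_y)
```

With the default sizes, walking to the far corner of the 3m x 2m VR space maps to the far corner of the 7.5m x 5m remote layout.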

Authors
Mose Sakashita
Cornell University, Ithaca, New York, United States
Hyunju Kim
Cornell University, Ithaca, New York, United States
Brandon J. Woodard
Brown University, Providence, Rhode Island, United States
Ruidong Zhang
Cornell University, Ithaca, New York, United States
Francois Guimbretiere
Cornell University, Ithaca, New York, United States
Paper URL

https://doi.org/10.1145/3586183.3606743

Video
CrossTalk: Intelligent Substrates for Language-Oriented Interaction in Video-Based Communication and Collaboration
Abstract

Despite the advances and ubiquity of digital communication media such as videoconferencing and virtual reality, they remain oblivious to the rich intentions expressed by users. Beyond transmitting audio, video, and messages, we envision digital communication media as proactive facilitators that can provide unobtrusive assistance to enhance communication and collaboration. Informed by the results of a formative study, we propose three key design concepts for systematically integrating intelligence into communication and collaboration: the panel substrate, language-based intent recognition, and lightweight interaction techniques. We developed CrossTalk, a videoconferencing system that instantiates these concepts, which we found to enable a more fluid and flexible communication and collaboration experience.

Authors
Haijun Xia
University of California, San Diego, San Diego, California, United States
Tony Wang
University of California San Diego, La Jolla, California, United States
Aditya Gunturu
Manipal Institute of Technology, Manipal, Karnataka, India
Peiling Jiang
University of California San Diego, San Diego, California, United States
William Duan
University of California San Diego, San Diego, California, United States
Xiaoshuo Yao
University of California, San Diego, San Diego, California, United States
Paper URL

https://doi.org/10.1145/3586183.3606773

Video
Going Incognito in the Metaverse: Achieving Theoretically Optimal Privacy-Usability Tradeoffs in VR
Abstract

Virtual reality (VR) telepresence applications and the so-called "metaverse" promise to be the next major medium of human-computer interaction. However, with recent studies demonstrating the ease with which VR users can be profiled and deanonymized, metaverse platforms carry many of the privacy risks of the conventional internet (and more) while at present offering few of the defensive utilities that users are accustomed to having access to. To remedy this, we present the first known method of implementing an "incognito mode" for VR. Our technique leverages local ε-differential privacy to quantifiably obscure sensitive user data attributes, with a focus on intelligently adding noise when and where it is needed most to maximize privacy while minimizing usability impact. Our system is capable of flexibly adapting to the unique needs of each VR application to further optimize this trade-off. We implement our solution as a universal Unity (C#) plugin that we then evaluate using several popular VR applications. Upon faithfully replicating the most well-known VR privacy attack studies, we show a significant degradation of attacker capabilities when using our solution.
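The noise-adding step in local ε-differential privacy is typically the Laplace mechanism, which can be sketched as follows. This is a minimal illustration, not the paper's Unity plugin; the function names and the example attribute are hypothetical, and the paper's contribution is in adaptively choosing when and where to add such noise:

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize(value, epsilon, sensitivity):
    """Release `value` under local epsilon-DP via the Laplace mechanism.

    `sensitivity` bounds how much the attribute can differ between any
    two users (e.g. a plausible range of body heights, in meters).
    Smaller epsilon means more noise: more privacy, less usability.
    """
    return value + laplace_noise(sensitivity / epsilon)
```

For example, a tracked body height could be passed through `privatize` before it reaches an untrusted VR application; the trade-off is tuned per attribute by choosing ε.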

Award
Best Paper
Authors
Vivek C. Nair
University of California, Berkeley, Berkeley, California, United States
Gonzalo Munilla-Garrido
Technical University of Munich, Munich, Germany
Dawn Song
University of California, Berkeley, Berkeley, California, United States
Paper URL

https://doi.org/10.1145/3586183.3606754

Video
The View from MARS: Empowering Game Stream Viewers with Metadata Augmented Real-time Streaming
Abstract

We present MARS (Metadata Augmented Real-time Streaming), a system that enables game-aware streaming interfaces for Twitch. Current streaming interfaces provide a video stream of gameplay and a chat channel for conversation, but do not allow viewers to interact with game content independently of the streamer or other viewers. With MARS, a Unity game's metadata is rendered in real-time onto a Twitch viewer's interface. The metadata can then power viewer-side interfaces that are aware of the streamer's game activity and provide new capabilities for viewers. Use cases include providing contextual information (e.g., clicking on a unit to learn more), improving accessibility (e.g., slowing down text presentation speed), and supporting novel stream-based game designs (e.g., asymmetric designs where the viewers know more than the streamer). We share the details of MARS' architecture and capabilities in this paper, and showcase a working prototype for each of our three proposed use cases.
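The contextual-information use case, a viewer clicking a unit on the stream to learn more, can be sketched as a lookup from click coordinates into a metadata frame. This is a hypothetical simplification (the JSON schema, field names, and unit descriptions are invented for illustration; MARS itself streams Unity metadata to the Twitch viewer interface):

```python
import json

# Hypothetical metadata frame, as the game might emit it each tick:
# each unit carries a screen-space bounding box and contextual info.
frame_json = json.dumps({
    "units": [
        {"id": "knight-1", "x": 120, "y": 340, "w": 32, "h": 48,
         "info": "Knight: melee unit, strong vs. cavalry"},
        {"id": "archer-2", "x": 400, "y": 200, "w": 32, "h": 48,
         "info": "Archer: ranged unit, weak in close combat"},
    ]
})

def unit_at(frame_json, click_x, click_y):
    """Return contextual info for the unit under a viewer's click, if any."""
    frame = json.loads(frame_json)
    for unit in frame["units"]:
        if (unit["x"] <= click_x < unit["x"] + unit["w"]
                and unit["y"] <= click_y < unit["y"] + unit["h"]):
            return unit["info"]
    return None
```

Because the lookup runs entirely on the viewer's side, it works without interrupting the streamer or other viewers.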

Authors
Noor Hammad
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Erik Harpstead
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Jessica Hammer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3586183.3606753

Video
WorldSmith: A Multi-Modal Image Synthesis Tool for Fictional World Building
Abstract

Crafting a rich and unique environment is crucial for fictional world-building, but can be difficult to achieve, since illustrating a world from scratch requires time and significant skill. We investigate the use of recent multi-modal image generation systems to enable users to iteratively visualize and modify elements of their fictional world using a combination of text input, sketching, and region-based filling. WorldSmith enables novice world builders to quickly visualize a fictional world with layered edits and hierarchical compositions. Through a formative study (4 participants) and a first-use study (13 participants), we demonstrate that WorldSmith offers more expressive interactions with prompt-based models. With this work, we explore how creatives can be empowered to leverage prompt-based generative AI as a tool in their creative process, beyond current "click-once" prompting UI paradigms.

Authors
Hai Dang
University of Bayreuth, Bayreuth, Germany
Frederik Brudy
Autodesk Research, Toronto, Ontario, Canada
George Fitzmaurice
Autodesk Research, Toronto, Ontario, Canada
Fraser Anderson
Autodesk Research, Toronto, Ontario, Canada
Paper URL

https://doi.org/10.1145/3586183.3606772

Video
WavoID: Robust and Secure Multi-modal User Identification via mmWave-voice Mechanism
Abstract

With the increasing deployment of voice-controlled devices in homes and enterprises, there is an urgent demand for voice identification to prevent unauthorized access to sensitive information and property loss. However, due to the broadcast nature of sound waves, a voice-only system is vulnerable to adverse conditions and malicious attacks. We observe that the cooperation of millimeter waves (mmWave) and voice signals can significantly improve the effectiveness and security of user identification. Based on these properties, we propose a multi-modal user identification system (named WavoID) that fuses the uniqueness of mmWave-sensed vocal vibration and the mic-recorded voice of users. To estimate fine-grained waveforms, WavoID splits signals and adaptively combines useful decomposed signals according to correlative content in both mmWave and voice. An elaborate anti-spoofing module in WavoID, built on the bimodal biometric information, defends against attacks. WavoID produces and fuses the response maps of mmWave and voice to improve the representational power of the fused features, enabling accurate identification even in adverse circumstances. We evaluate WavoID using commercial sensors in extensive experiments. WavoID achieves strong identification performance, with over 98% accuracy on a 100-user dataset.
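The idea of combining two biometric modalities into one decision can be sketched as score-level fusion of per-modality match scores. This is a hypothetical simplification for illustration (function names, the fixed weight, and the threshold are assumptions; WavoID actually fuses response maps of learned features rather than scalar scores):

```python
def fuse_scores(mmwave_score, voice_score, w_mmwave=0.5):
    """Combine per-modality match scores (each in [0, 1]) into one score."""
    return w_mmwave * mmwave_score + (1.0 - w_mmwave) * voice_score

def identify(scores_by_user, threshold=0.8):
    """Pick the best-matching enrolled user, or None if no fused score
    reaches the acceptance threshold.

    scores_by_user: {user_id: (mmwave_score, voice_score)}
    """
    best_user, best_score = None, 0.0
    for user, (mm, vo) in scores_by_user.items():
        fused = fuse_scores(mm, vo)
        if fused > best_score:
            best_user, best_score = user, fused
    return best_user if best_score >= threshold else None
```

The benefit of the fusion is that an attacker who spoofs one channel (e.g. a voice replay) still scores poorly on the other, pulling the fused score below the acceptance threshold.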

Authors
Tiantian Liu
Zhejiang University, Hangzhou, China
Feng Lin
Zhejiang University, Hangzhou, Zhejiang, China
Chao Wang
Zhejiang University, Hangzhou, Zhejiang, China
Chenhan Xu
University at Buffalo, Buffalo, New York, United States
Xiaoyu Zhang
Computer Science and Engineering, Buffalo, New York, United States
Zhengxiong Li
University of Colorado Denver, Denver, Colorado, United States
Wenyao Xu
University at Buffalo, Buffalo, New York, United States
Ming-Chun Huang
Duke Kunshan University, Kunshan, China
Kui Ren
Zhejiang University, Hangzhou, China
Paper URL

https://doi.org/10.1145/3586183.3606775

Video