Existing conversational approaches for Internet of Things (IoT) service mashup do not support modification because of the usability challenge, although it is common for users to modify the service mashups in IoT environments. To support the modification of IoT service mashups through conversational interfaces in a usable manner, we propose the conversational mashup modification agent (CoMMA). Users can modify IoT service mashups using CoMMA through natural language conversations. CoMMA has a two-step mashup modification interaction, an implicature-based localization step, and a modification step with a disambiguation strategy. The localization step allows users to easily search for a mashup by vocalizing their expressions in the environment. The modification step supports users to modify mashups by speaking simple modification commands. We conducted a user study and the results show that CoMMA is as effective as visual approaches in terms of task completion time and perceived task workload for modifying IoT service mashups.
https://dl.acm.org/doi/abs/10.1145/3491102.3517655
We examine mid-air typing data collected from touch typists to evaluate the features and classification models for recognizing finger stroke. A large number of finger movement traces have been collected using finger motion capture systems, labeled into individual finger strokes, and classified into several key features. We test finger kinematic features, including 3D position, velocity, acceleration, and temporal features, including previous fingers and keys. Based on this analysis, we assess the performance of various classifiers, including Naive Bayes, Random Forest, Support Vector Machines, and Deep Neural Networks, in terms of the accuracy for correctly classifying the keystroke. We finally incorporate a linguistic heuristic to explore the effectiveness of the character prediction model and improve the total accuracy.
https://dl.acm.org/doi/abs/10.1145/3491102.3502100
Touchscreen tracking latency, often 80ms or more, creates a rubber-banding effect in everyday direct manipulation tasks such as dragging, scrolling, and drawing. This has been shown to decrease system preference, user performance, and overall realism of these interfaces. In this research, we demonstrate how the addition of a thin, 2D micro-patterned surface with 5 micron spaced features can be used to reduce motor-visual touchscreen latency. When a finger, stylus, or tangible is translated across this textured surface frictional forces induce acoustic vibrations which naturally encode sliding velocity. This acoustic signal is sampled at 192kHz using a conventional audio interface pipeline with an average latency of 28ms. When fused with conventional low-speed, but high-spatial-accuracy 2D touch position data, our machine learning model can make accurate predictions of real time touch location.
https://dl.acm.org/doi/abs/10.1145/3491102.3502069
Full-body tracking in virtual reality improves presence, allows interaction via body postures, and facilitates better social expression among users. However, full-body tracking systems today require a complex setup fixed to the environment (e.g., multiple lighthouses/cameras) and a laborious calibration process, which goes against the desire to make VR systems more portable and integrated. We present HybridTrak, which provides accurate, real-time full-body tracking by augmenting inside-out upper-body VR tracking systems with a single external off-the-shelf RGB web camera. HybridTrak converts and transforms users' 2D full-body poses from the webcam to 3D poses leveraging the inside-out upper-body tracking data with a full-neural solution. We showed HybridTrak is more accurate than RGB or depth-based tracking method on the MPI-INF-3DHP dataset. We also tested HybridTrak in the popular VRChat app and showed that body postures presented by HybridTrak are more distinguishable and more natural than a solution using an RGBD camera.
https://dl.acm.org/doi/abs/10.1145/3491102.3502045
Gaze and speech are rich contextual sources of information that, when combined, can result in effective and rich multimodal interactions. This paper proposes a machine learning-based pipeline that leverages and combines users’ natural gaze activity, the semantic knowledge from their vocal utterances and the synchronicity between gaze and speech data to facilitate users’ interaction. We evaluated our proposed approach on an existing dataset, which involved 32 participants recording voice notes while reading an academic paper. Using a Logistic Regression classifier, we demonstrate that our proposed multimodal approach maps voice notes with accurate text passages with an average 𝐹1-Score of 0.90. Our proposed pipeline motivates the design of multimodal interfaces that combines natural gaze and speech patterns to enable robust interactions
https://dl.acm.org/doi/abs/10.1145/3491102.3502134