Speech & language

https://doi.org/10.1145/3313831.3376322

Learning to speak a foreign language is hard. Speech shadowing, in which a learner listens to a native speech template and repeats it as nearly simultaneously as possible, has been gaining ground as a proven way to practice speaking. However, shadowing is difficult because learners frequently fail to keep up with the speech and unintentionally interrupt the practice session. Worse, because no technical method for evaluating shadowing performance in real time has been established, no automated solutions are available to help. In this paper, we propose a technical framework that uses context-dependent speech recognition to evaluate shadowing in real time. We also present a shadowing tutor system called WithYou, which automatically adjusts the playback and the difficulty of a speech template when learners fail, so that shadowing becomes smooth and tailored. Results from a user study show that WithYou provides greater speech improvements (14%) than the conventional method (2.7%) with a lower cognitive load.

Computer Assisted Language Learning (CALL)

Speaking

Shadowing

Speech Recognition

Intelligent Tutoring System

Language Learning

University of Tokyo, Tokyo, Japan

University of Tokyo & Sony Computer Science Laboratories, Tokyo, Japan

10.1145/3313831.3376322

https://doi.org/10.1145/3313831.3376519

We present a system that automatically transforms text articles into audio-visual slideshows by leveraging the notion of word concreteness, which measures how strongly a word or phrase is related to some perceptible concept. In a formative study, we learn that people not only prefer such audio-visual slideshows but also find the content easier to understand than text articles or text articles augmented with images. We use word concreteness to select search terms and find images relevant to the text. Then, based on the distribution of concrete words and the grammatical structure of an article, we time-align the selected images with audio narration obtained through text-to-speech to produce audio-visual slideshows. In a user evaluation, we find that our concreteness-based algorithm selects images that are highly relevant to the text. The quality of our slideshows is comparable to that of slideshows produced manually using standard video editing tools, and people strongly prefer our slideshows to those generated using a simple keyword-search-based approach.
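The idea of using word concreteness to pick image search terms can be illustrated with a minimal sketch. This is not the authors' algorithm: the score table, the 1-5 scale, the threshold, and the function name `select_search_terms` are all assumptions made up for the example.

```python
import re

# Hypothetical concreteness scores on an illustrative 1-5 scale.
# These values are invented for the example, not taken from any lexicon.
CONCRETENESS = {
    "volcano": 4.8,
    "ash": 4.6,
    "village": 4.5,
    "risk": 2.1,
    "considered": 1.7,
}


def select_search_terms(sentence, scores=CONCRETENESS, threshold=4.0):
    """Return the distinct words whose concreteness score meets the threshold.

    Words missing from the score table are treated as abstract (score 0.0).
    Order of first appearance in the sentence is preserved.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    seen = set()
    terms = []
    for word in words:
        if scores.get(word, 0.0) >= threshold and word not in seen:
            seen.add(word)
            terms.append(word)
    return terms


terms = select_search_terms(
    "The volcano covered the village in ash, a risk long considered remote."
)
print(terms)
```

In this toy setup only the highly concrete nouns survive as candidate image queries, while abstract words such as "risk" are dropped.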

Audio-visual slideshows

Text-to-video

Word concreteness

Stanford University, Stanford, CA, USA

Adobe Research, Cambridge, MA, USA

Adobe Research, San Francisco, CA, USA

Stanford University, Stanford, CA, USA

10.1145/3313831.3376519

https://doi.org/10.1145/3313831.3376451

Analyzing queries from search engines and intelligent assistants is difficult. A key challenge is organizing queries into interpretable, context-preserving, representative, and flexible groups. We present structural templates, abstract queries that replace tokens with their linguistic feature forms, as a query grouping method. The templates allow analysts to create query groups with structural similarity at different granularities. We introduce Tempura, an interactive tool that lets analysts explore a query dataset with structural templates. Tempura summarizes a query dataset by selecting a representative subset of templates to show the query distribution. The tool also helps analysts navigate the template space by suggesting related templates likely to yield further explorations. Our user study shows that Tempura helps analysts examine the distribution of a query dataset, find labeling errors, and discover model error patterns and outliers.
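As a rough illustration of structural templates, the sketch below replaces tokens with a linguistic feature (here, a part-of-speech tag) and groups queries that share the resulting template. It is a simplified assumption of how such grouping might look, not Tempura's implementation: the tag table `TOY_TAGS`, the `abstract_from` granularity knob, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-written part-of-speech tags for the example only.
# A real tool would obtain linguistic features from a parser or tagger.
TOY_TAGS = {
    "play": "VERB", "call": "VERB",
    "some": "DET", "my": "DET",
    "jazz": "NOUN", "mom": "NOUN", "music": "NOUN",
}


def to_template(query, abstract_from=0):
    """Keep tokens before index `abstract_from`; replace the rest with tags.

    Tokens missing from TOY_TAGS are replaced with the placeholder tag "X".
    """
    parts = []
    for i, token in enumerate(query.lower().split()):
        if i >= abstract_from:
            parts.append(TOY_TAGS.get(token, "X"))
        else:
            parts.append(token)
    return " ".join(parts)


def group_by_template(queries, abstract_from=0):
    """Group queries that map to the same structural template."""
    groups = defaultdict(list)
    for query in queries:
        groups[to_template(query, abstract_from)].append(query)
    return dict(groups)


queries = ["play some jazz", "call my mom", "play some music"]
print(group_by_template(queries, abstract_from=0))  # coarse: one group
print(group_by_template(queries, abstract_from=1))  # finer: split by first word
```

Changing `abstract_from` mimics grouping at different granularities: abstracting every token yields one broad group, while keeping the first token literal separates "play" queries from "call" queries.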

Natural Language Processing

Error Analysis

Query Analysis

University of Washington & Apple Inc., Seattle, WA, USA

Apple Inc., Seattle, WA, USA

10.1145/3313831.3376451

https://doi.org/10.1145/3313831.3376383

Social signals are crucial when we decide whether to interact with someone online. However, social signals are typically limited to the few that platform designers provide, and most can be easily manipulated. In this paper, we propose a new idea called synthesized social signals (S3s): social signals computationally derived from an account's history and then rendered into the profile. Unlike conventional social signals such as profile bios, S3s use computational summarization to reduce receiver costs and raise the cost of faking signals. To demonstrate and explore the concept, we built Sig, an extensible Chrome extension that computes and visualizes S3s. After a formative study, we conducted a field deployment of Sig on Twitter, targeting two well-known problems on social media: toxic accounts and misinformation. Results show that Sig reduced receiver costs and added important signals beyond conventionally available ones, and that a few users felt safer using Twitter as a result. We conclude by reflecting on the opportunities and challenges S3s provide for augmenting interaction on social platforms.
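A synthesized social signal can be pictured as a summary computed over an account's past posts. The sketch below is only an assumption about what such a computation might look like, not how Sig works: the function name `toxicity_signal`, the caller-supplied `is_toxic` classifier, and the `flag_ratio` threshold are all hypothetical.

```python
def toxicity_signal(posts, is_toxic, flag_ratio=0.3):
    """Summarize an account's history as a single signal.

    `posts` is a list of post texts, `is_toxic` is any caller-supplied
    predicate (a stand-in for a real classifier), and `flag_ratio` is an
    assumed cutoff for marking the account on its profile.
    """
    if not posts:
        # No history: nothing to summarize, so do not flag the account.
        return {"ratio": 0.0, "flagged": False}
    hits = sum(1 for post in posts if is_toxic(post))
    ratio = hits / len(posts)
    return {"ratio": ratio, "flagged": ratio >= flag_ratio}


# Toy history and a deliberately naive keyword predicate, for illustration.
history = ["you are an idiot", "nice photo", "idiot again", "good morning"]
signal = toxicity_signal(history, lambda text: "idiot" in text)
print(signal)
```

Because the signal is derived from the whole posting history rather than from a self-authored bio, faking it would require changing past behavior, which is the cost asymmetry the abstract describes.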

social computing

social signals

social platform

social media

University of Michigan – Ann Arbor, Ann Arbor, MI, USA

Georgia Institute of Technology, Atlanta, GA, USA

University of Michigan – Ann Arbor, Ann Arbor, MI, USA

10.1145/3313831.3376383