Improving Automatic Summarization for Browsing Longform Spoken Dialog

Abstract

Longform spoken dialog delivers rich streams of informative content through podcasts, interviews, debates, and meetings. While production of this medium has grown tremendously, spoken dialog remains challenging to consume: listening is slower than reading, and audio is difficult to skim or navigate relative to text. Recent systems leveraging automatic speech recognition (ASR) and automatic summarization allow users to better browse speech data and forage for information of interest. However, these systems ingest disfluent speech, which causes automatic summarization to suffer from readability, adequacy, and accuracy problems. To improve the navigability and browsability of speech, we present three training-agnostic post-processing techniques that address the dialog concerns of readability, coherence, and adequacy. We integrate these improvements with user interfaces that communicate estimated summary metrics to aid user browsing heuristics. Quantitative evaluation metrics show a 19% improvement in summary quality. We discuss how summarization technologies can help people browse longform audio in trustworthy and readable ways.

Authors
Daniel Li
Columbia University, New York, New York, United States
Thomas Chen
Microsoft, Seattle, Washington, United States
Alec Zadikian
Google LLC, Mountain View, California, United States
Albert Tung
Stanford University, Stanford, California, United States
Lydia B. Chilton
Columbia University, New York, New York, United States
Paper URL

https://doi.org/10.1145/3544548.3581339

Conference: CHI 2023

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)

Session: Communication and Social Good

Hall G2
6 presentations
2023-04-26 23:30:00 – 2023-04-27 00:55:00