Improving Automatic Summarization for Browsing Longform Spoken Dialog

Longform spoken dialog delivers rich streams of informative content through podcasts, interviews, debates, and meetings. While production of this medium has grown tremendously, spoken dialog remains challenging to consume: listening is slower than reading, and speech is harder to skim or navigate than text. Recent systems that leverage automatic speech recognition (ASR) and automatic summarization allow users to better browse speech data and forage for information of interest. However, these systems ingest disfluent speech, which causes automatic summaries to suffer from problems of readability, adequacy, and accuracy. To improve the navigability and browsability of speech, we present three training-agnostic post-processing techniques that address the dialog concerns of readability, coherence, and adequacy. We integrate these improvements with user interfaces that communicate estimated summary metrics to aid users' browsing heuristics. Quantitative evaluation metrics show a 19% improvement in summary quality. We discuss how summarization technologies can help people browse longform audio in trustworthy and readable ways.

Columbia University, New York, New York, United States

Microsoft, Seattle, Washington, United States

Google LLC, Mountain View, California, United States

Stanford University, Stanford, California, United States

Columbia University, New York, New York, United States

https://doi.org/10.1145/3544548.3581339

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)

Hall G2

Start time: 2023-04-26 23:30:00

End time: 2023-04-27 00:55:00