Synthesis-Assisted Video Prototyping From a Document

Abstract

Video production commonly starts with a script, especially for talking head videos that feature a speaker narrating to the camera. When the source material comes from a written document, such as a web tutorial, it takes iterations to refine the content from a text article into spoken dialogue while considering the visual composition of each scene. We propose Doc2Video, a video prototyping approach that converts a document into an interactive script with a preview of synthesized talking head videos. Our pipeline decomposes a source document into a series of scenes and automatically creates a synthesized video of a virtual instructor for each one. Designed for a specific domain, programming cookbooks, the pipeline places visual elements from the source document, such as a keyword, a code snippet, or a screenshot, in suitable layouts. Users edit narration sentences, break or combine sections, and modify visuals to prototype a video in our Editing UI. We evaluated our pipeline on public programming cookbooks. Feedback from professional creators shows that our method provided a reasonable starting point that engaged them in interactive scripting for a narrated instructional video.
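The paper does not publish its implementation, but the decomposition step can be pictured with a minimal sketch. The Python below splits a markdown tutorial into scenes, turning each heading or paragraph into narration and attaching any fenced code block to the preceding scene as its on-screen visual. The `Scene` structure and the paragraph-level heuristic are illustrative assumptions, not Doc2Video's actual algorithm.

```python
import re
from dataclasses import dataclass

FENCE = "```"  # markdown code-fence delimiter


@dataclass
class Scene:
    """One scene of the video prototype: narration plus an optional visual."""
    narration: str
    visual: str | None = None  # e.g. a code snippet, keyword, or screenshot


def document_to_scenes(markdown: str) -> list[Scene]:
    """Split a tutorial into scenes: each heading or paragraph becomes
    narration, and a fenced code block is attached to the preceding scene
    as its on-screen visual."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]
    scenes: list[Scene] = []
    for block in blocks:
        if block.startswith(FENCE):
            # Attach the code to the previous scene as its visual.
            code = block.strip("`\n")
            if scenes and scenes[-1].visual is None:
                scenes[-1].visual = code
            else:
                scenes.append(Scene(narration="", visual=code))
        else:
            # Drop markdown heading markers; the remaining text is narrated.
            scenes.append(Scene(narration=re.sub(r"^#+\s*", "", block)))
    return scenes


if __name__ == "__main__":
    doc = "\n".join([
        "# Read a CSV file",
        "",
        "Use pandas to load the data into a DataFrame.",
        "",
        FENCE,
        "import pandas as pd",
        'df = pd.read_csv("data.csv")',
        FENCE,
    ])
    for i, scene in enumerate(document_to_scenes(doc), start=1):
        print(f"Scene {i}: narration={scene.narration!r}, visual={scene.visual!r}")
```

In this sketch the prose and its accompanying code travel together into one scene, which mirrors the paper's idea of pairing each narrated segment with a visual element drawn from the source document.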

Authors
Peggy Chi
Google Research, Mountain View, California, United States
Tao Dong
Google, Mountain View, California, United States
Christian Frueh
Google Research, Mountain View, California, United States
Brian Colonna
Google Research, Pittsburgh, Pennsylvania, United States
Vivek Kwatra
Google Research, Mountain View, California, United States
Irfan Essa
Google, Atlanta, Georgia, United States
Paper URL

https://doi.org/10.1145/3526113.3545676

Conference: UIST 2022

The ACM Symposium on User Interface Software and Technology

Session: Storytelling and Presentation (6 presentations)

Session time: 2022-10-31 23:30 to 2022-11-01 01:00