Synthesis-Assisted Video Prototyping From a Document

Abstract

Video production commonly starts with a script, especially for talking head videos that feature a speaker narrating to the camera. When the source material comes from a written document, such as a web tutorial, it takes iterations to refine the content from a text article into spoken dialogue while considering the visual composition of each scene. We propose Doc2Video, a video prototyping approach that converts a document into an interactive script with a preview of synthesized talking head videos. Our pipeline decomposes a source document into a series of scenes and automatically creates a synthesized video of a virtual instructor for each one. Designed for a specific domain, programming cookbooks, the pipeline places visual elements from the source document, such as a keyword, a code snippet, or a screenshot, in suitable layouts. Users edit narration sentences, break or combine sections, and modify visuals to prototype a video in our Editing UI. We evaluated our pipeline on public programming cookbooks. Feedback from professional creators shows that our method provided a reasonable starting point that engaged them in interactive scripting for a narrated instructional video.
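The paper does not publish its implementation, but the decomposition step can be pictured with a minimal sketch. The Python below splits a markdown tutorial into scenes, turning each heading or paragraph into narration and attaching any fenced code block to the preceding scene as its on-screen visual. The `Scene` structure and the paragraph-level heuristic are illustrative assumptions, not Doc2Video's actual algorithm.

```python
import re
from dataclasses import dataclass

FENCE = "```"  # markdown code-fence delimiter


@dataclass
class Scene:
    """One scene of the video prototype: narration plus an optional visual."""
    narration: str
    visual: str | None = None  # e.g. a code snippet, keyword, or screenshot


def document_to_scenes(markdown: str) -> list[Scene]:
    """Split a tutorial into scenes: each heading or paragraph becomes
    narration, and a fenced code block is attached to the preceding scene
    as its on-screen visual."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]
    scenes: list[Scene] = []
    for block in blocks:
        if block.startswith(FENCE):
            # Attach the code to the previous scene as its visual.
            code = block.strip("`\n")
            if scenes and scenes[-1].visual is None:
                scenes[-1].visual = code
            else:
                scenes.append(Scene(narration="", visual=code))
        else:
            # Drop markdown heading markers; the remaining text is narrated.
            scenes.append(Scene(narration=re.sub(r"^#+\s*", "", block)))
    return scenes


if __name__ == "__main__":
    doc = "\n".join([
        "# Read a CSV file",
        "",
        "Use pandas to load the data into a DataFrame.",
        "",
        FENCE,
        "import pandas as pd",
        'df = pd.read_csv("data.csv")',
        FENCE,
    ])
    for i, scene in enumerate(document_to_scenes(doc), start=1):
        print(f"Scene {i}: narration={scene.narration!r}, visual={scene.visual!r}")
```

In this sketch the prose and its accompanying code travel together into one scene, which mirrors the paper's idea of pairing each narrated segment with a visual element drawn from the source document.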

Authors
Peggy Chi
Google Research, Mountain View, California, United States
Tao Dong
Google, Mountain View, California, United States
Christian Frueh
Google Research, Mountain View, California, United States
Brian Colonna
Google Research, Pittsburgh, Pennsylvania, United States
Vivek Kwatra
Google Research, Mountain View, California, United States
Irfan Essa
Google, Atlanta, Georgia, United States
Paper URL

https://doi.org/10.1145/3526113.3545676

Conference: UIST 2022

The ACM Symposium on User Interface Software and Technology

Session: Storytelling and Presentation (6 presentations)

Session time: 2022-10-31 23:30 to 2022-11-01 01:00