Automatic segmentation of logs for creativity tools such as image editing systems could improve their usability and learnability by supporting interaction use cases such as smart history navigation or recommendation of alternative design choices. We propose a multi-level segmentation model that works across many image editing tasks, including poster creation, portrait retouching, and special-effect creation. The lowest-level chunks of logged events are computed using a support vector machine model, and higher-level chunks are built on top of these at a level of granularity that can be customized for specific use cases. Our model takes into account features derived from four event attributes collected in realistically complex Photoshop sessions with expert users: command, timestamp, image content, and artwork layer. We present a detailed analysis of the relevance of each feature and evaluate the model using both quantitative performance metrics and qualitative analysis of sample sessions.
https://doi.org/10.1145/3313831.3376152
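As an illustration of the low-level chunking idea, the following is a minimal sketch of boundary detection between consecutive logged events, assuming scikit-learn; the Event fields and pairwise features mirror the four attributes named in the abstract, but the exact feature names and model settings are illustrative, not taken from the paper.

```python
# Hypothetical sketch: classify whether a chunk boundary falls between two
# consecutive logged events, using features derived from command, timestamp,
# image content, and artwork layer. Feature names and parameters are illustrative.
from dataclasses import dataclass
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

@dataclass
class Event:
    command: str        # e.g. "brush", "crop", "layer.new"
    timestamp: float    # seconds since session start
    layer_id: int       # active artwork layer
    image_diff: float   # illustrative pixel-change score vs. previous canvas state

def pair_features(prev: Event, curr: Event) -> list[float]:
    """Features describing the gap between two consecutive events."""
    return [
        curr.timestamp - prev.timestamp,        # pause length
        float(curr.command != prev.command),    # command switch
        float(curr.layer_id != prev.layer_id),  # layer switch
        curr.image_diff,                        # magnitude of visual change
    ]

def train_boundary_model(sessions, labels):
    """sessions: list of Event lists; labels[k][i] is 1 if a chunk boundary follows event i."""
    X, y = [], []
    for events, bounds in zip(sessions, labels):
        for i in range(1, len(events)):
            X.append(pair_features(events[i - 1], events[i]))
            y.append(bounds[i - 1])
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
    model.fit(X, y)
    return model
```

Higher-level chunks could then be formed by merging adjacent low-level chunks until a use-case-specific granularity is reached.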
Many artists broadcast their creative process through live streaming platforms like Twitch and YouTube, and people often watch archives of these broadcasts later for learning and inspiration. Unfortunately, because live stream videos are often multiple hours long and hard to skim and browse, few can leverage the wealth of knowledge hidden in these archives. We present an approach for automatic temporal segmentation of creative live stream videos. Using an audio transcript and a log of software usage, the system segments the video into sections that the artist can optionally label with meaningful titles. We evaluate this approach by gathering feedback from expert streamers and comparing automatic segmentations to those made by viewers. We find that, while there is no one "correct" way to segment a live stream, our automatic method performs similarly to viewers, and streamers find it useful for navigating their streams after making slight adjustments and adding section titles.
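One simple way to combine an audio transcript with a software-usage log for segmentation is to look for points where both signals change at once. The sketch below is a hypothetical illustration of that idea (window size, weighting, and threshold are placeholders, not values from the paper):

```python
# Hypothetical sketch: propose section boundaries in a live stream by combining
# novelty in the transcript with novelty in the software-usage log.
def jaccard_distance(a: set, b: set) -> float:
    union = a | b
    return 1.0 - (len(a & b) / len(union)) if union else 0.0

def novelty_curve(windows: list[set]) -> list[float]:
    """Distance between token sets of adjacent fixed-length time windows."""
    return [jaccard_distance(windows[i - 1], windows[i]) for i in range(1, len(windows))]

def segment(transcript_windows, command_windows, weight=0.5, threshold=0.6):
    """transcript_windows / command_windows: per-window sets of spoken words / tool commands."""
    text_nov = novelty_curve(transcript_windows)
    tool_nov = novelty_curve(command_windows)
    combined = [weight * t + (1 - weight) * c for t, c in zip(text_nov, tool_nov)]
    # Propose a boundary wherever the combined novelty peaks above the threshold.
    return [i + 1 for i in range(1, len(combined) - 1)
            if combined[i] > threshold
            and combined[i] >= combined[i - 1]
            and combined[i] >= combined[i + 1]]
```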
We present GAZED, eye GAZe-guided EDiting for videos captured by a solitary, static, wide-angle and high-resolution camera. Eye gaze has been effectively employed in computational applications as a cue to capture interesting scene content; we employ gaze as a proxy to select shots for inclusion in the edited video. Given the original video, scene content and user eye-gaze tracks are combined to generate an edited video comprising cinematically valid actor shots and shot transitions, yielding an aesthetic and vivid representation of the original narrative. We model cinematic video editing as an energy minimization problem over shot selection, whose constraints capture cinematographic editing conventions. Gazed-at scene locations primarily determine the shots constituting the edited video. The effectiveness of GAZED against multiple competing methods is demonstrated via a psychophysical study involving 12 users and 12 performance videos.
https://doi.org/10.1145/3313831.3376544
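Energy minimization over shot selection of this kind can be solved with a Viterbi-style dynamic program. The following is a minimal sketch under that assumption: the unary term (here `gaze_cost`) rewards shots that cover gazed-at locations and the pairwise term penalizes cuts; the cost values and the fixed transition penalty are illustrative placeholders, not the paper's actual energy terms.

```python
# Hypothetical sketch: pick one shot per frame by minimizing unary gaze cost
# plus a pairwise cut penalty, via dynamic programming with backtracking.
import numpy as np

def select_shots(gaze_cost: np.ndarray, transition_penalty: float = 5.0) -> list[int]:
    """gaze_cost: (num_frames, num_shots); lower = gaze better covered by that shot.
    Returns one shot index per frame minimizing the total energy."""
    num_frames, num_shots = gaze_cost.shape
    total = gaze_cost[0].copy()                      # best energy ending in each shot
    back = np.zeros((num_frames, num_shots), dtype=int)

    for t in range(1, num_frames):
        # Cost of arriving at shot s from any previous shot (penalty if they differ).
        switch = total[:, None] + transition_penalty * (1 - np.eye(num_shots))
        back[t] = switch.argmin(axis=0)
        total = switch.min(axis=0) + gaze_cost[t]

    # Backtrack the minimum-energy shot sequence.
    path = [int(total.argmin())]
    for t in range(num_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```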
Photographic composition is often taught as alignment with composition grids, most commonly the rule of thirds. Professional photographers use more complex grids, like the harmonic armature, to achieve more diverse, dynamic compositions. We are interested in understanding whether these complex grids are helpful to amateurs. In a formative study, we found that overlaying the harmonic armature in the camera can help less experienced photographers discover and achieve different compositions, but it can also be overwhelming due to the large number of lines. Photographers actually use subsets of lines from the armature to explain different aspects of composition; however, this occurs mainly offline, to analyze existing images. We propose bringing this mental model into the camera by adaptively highlighting the lines relevant to the current scene and point of view. We describe a saliency-based algorithm for selecting these lines and present an evaluation of the system showing that photographers found the proposed adaptive armatures helpful for capturing more well-composed images.
https://doi.org/10.1145/3313831.3376635
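A saliency-based line selection of this flavor could be sketched as scoring each armature line by how much saliency lies near it and keeping the top few; the distance band, the number of lines kept, and the function names below are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical sketch: rank harmonic-armature lines by nearby image saliency,
# then overlay only the top-scoring subset in the viewfinder.
import numpy as np

def line_score(saliency: np.ndarray, line_mask: np.ndarray) -> float:
    """saliency: HxW map in [0, 1]; line_mask: HxW boolean band around one armature line."""
    return float(saliency[line_mask].sum())

def select_armature_lines(saliency: np.ndarray, line_masks: dict[str, np.ndarray], keep: int = 3):
    """Return the names of the `keep` armature lines best supported by scene saliency."""
    scores = {name: line_score(saliency, mask) for name, mask in line_masks.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]
```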
In this paper we introduce Camera Adversaria, a mobile app designed to disrupt the automatic surveillance of personal photographs by technology companies. The app leverages the brittleness of deep neural networks with respect to high-frequency signals, adding generative adversarial perturbations to users' photographs. These perturbations confound image classification systems but are virtually imperceptible to human viewers. Camera Adversaria builds on methods developed by machine learning researchers as well as a growing body of work, primarily from art and design, which transgresses contemporary surveillance systems. We map the design space of responses to surveillance and identify an under-explored region where our project is situated. Finally, we show that the language typically used in the adversarial perturbation literature serves to affirm corporate surveillance practices and malign resistance. This raises significant questions about the function of the research community in countenancing systems of surveillance.
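To make the underlying idea concrete, the sketch below adds a bounded high-frequency perturbation to a photo so that pixel changes stay nearly imperceptible. It illustrates the general class of additive perturbations the abstract describes, not the app's actual generator; the epsilon budget and noise shaping are illustrative assumptions.

```python
# Hypothetical sketch: add small, bounded high-frequency noise to an image.
# This is an illustration of additive perturbation, not Camera Adversaria's method.
import numpy as np

def perturb(image: np.ndarray, epsilon: float = 8.0, seed: int = 0) -> np.ndarray:
    """image: HxWx3 uint8 array. Adds bounded high-frequency noise, returns uint8."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=image.shape)
    # Keep mostly the high-frequency component by subtracting a local average.
    low = (noise + np.roll(noise, 1, axis=0) + np.roll(noise, 1, axis=1)) / 3.0
    high = noise - low
    high /= np.abs(high).max() + 1e-8
    out = image.astype(np.float32) + epsilon * high
    return np.clip(out, 0, 255).astype(np.uint8)
```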