Macros are building-block tasks of everyday smartphone activity (e.g., "login" or "booking a flight"). Effectively extracting macros is important for understanding mobile interaction and enabling task automation. These macros are, however, difficult to extract at scale, as they can comprise multiple steps yet remain hidden within the programmatic components of mobile apps. In this paper, we introduce a novel approach based on Large Language Models (LLMs) to automatically extract semantically meaningful macros from both random and user-curated mobile interaction traces. The macros produced by our approach are automatically tagged with natural language descriptions and are fully executable. We conduct multiple studies to validate the quality of the extracted macros, including a user evaluation, a comparative analysis against human-curated tasks, and automatic execution of the macros. These experiments and analyses demonstrate the effectiveness of our approach and the usefulness of the extracted macros in various downstream applications.
https://doi.org/10.1145/3613904.3642074
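To make the idea above concrete, here is a minimal, hypothetical sketch of prompting an LLM to name and describe the task a recorded interaction trace accomplishes. The trace format, prompt wording, and the `call_llm` helper are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch: summarizing a recorded UI trace into a named,
# natural-language-described macro via an LLM. All names are assumptions.
import json

def serialize_trace(trace):
    """Render each UI action (screen, element, action type) as one text line."""
    return "\n".join(
        f"{i}. {step['action']} '{step['element']}' on screen '{step['screen']}'"
        for i, step in enumerate(trace, start=1)
    )

def extract_macro(trace, call_llm):
    """Ask an LLM (via a caller-supplied `call_llm` function) to describe the trace."""
    prompt = (
        "Below is a sequence of UI actions recorded in a mobile app.\n"
        f"{serialize_trace(trace)}\n\n"
        "Summarize the task these steps accomplish as JSON with keys "
        "'name' (a short macro name) and 'description' (one sentence)."
    )
    return json.loads(call_llm(prompt))

trace = [
    {"screen": "Home", "action": "tap", "element": "Search flights"},
    {"screen": "Search", "action": "type", "element": "Destination field"},
    {"screen": "Results", "action": "tap", "element": "Book"},
]
# extract_macro(trace, call_llm) might return:
# {"name": "book_flight", "description": "Search for and book a flight."}
```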
Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blind spot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users 'envision' translating their goals into clear intentions and craft prompts to obtain the desired LLM response. We define a process of Envisioning by highlighting three misalignments arising from not knowing: (1) what the task should be, (2) how to instruct the LLM to do the task, and (3) what to expect from the LLM's output in meeting the goal. Finally, we make recommendations to narrow the envisioning gulf in human-LLM interactions.
https://doi.org/10.1145/3613904.3642754
Recent studies indicated that GPT-4 outperforms online crowd workers in data-labeling accuracy, notably workers from Amazon Mechanical Turk (MTurk). However, these studies were criticized for deviating from standard crowdsourcing practices and for emphasizing individual workers' performance over the whole data-annotation process. This paper compared GPT-4 against an ethical, well-executed MTurk pipeline in which 415 workers labeled 3,177 sentence segments from 200 scholarly articles using the CODA-19 scheme. Two worker interfaces yielded 127,080 labels, which were then used to infer the final labels through eight label-aggregation algorithms. Our evaluation showed that despite these best practices, the MTurk pipeline's highest accuracy was 81.5%, whereas GPT-4 achieved 83.6%. Interestingly, when GPT-4's labels were combined with crowd labels collected via an advanced worker interface for aggregation, two of the eight algorithms achieved even higher accuracy (87.5% and 87.0%). Further analysis suggested that when the crowd's and GPT-4's labeling strengths are complementary, aggregating them can increase labeling accuracy.
https://doi.org/10.1145/3613904.3642834
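As a concrete illustration of label aggregation, the sketch below combines several crowd labels with a GPT-4 label via simple majority voting. Majority voting is only a generic baseline, not necessarily one of the eight algorithms the paper evaluates, and the CODA-19 class names shown are illustrative.

```python
# Minimal majority-vote aggregation sketch over crowd + GPT-4 labels.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties broken by first occurrence."""
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:  # preserve input order on ties
        if counts[label] == top:
            return label

# Example: three crowd labels plus one GPT-4 label for a sentence segment.
crowd = ["finding", "method", "finding"]
gpt4 = "finding"
print(majority_vote(crowd + [gpt4]))  # -> "finding"
```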
Across many domains (e.g., media/entertainment, mobile apps, finance, IoT, cybersecurity), there is a growing need for stateful analytics over streams of events to meet key business outcomes. Stateful analytics over event streams entails carefully modeling the sequence, timing, and contextual correlations of events to dynamic attributes. Unfortunately, expressing such analytics in existing frameworks and languages (e.g., SQL, Flink, Spark) entails significant code complexity and expert effort because of the analytics' dynamic and stateful nature. Our overarching goal is to simplify and democratize stateful analytics. Through an iterative design and evaluation process, including a foundational user study and two rounds of formative evaluations with 15 industry practitioners, we created SEAM-EZ, a no-code visual programming platform for quickly creating and validating stateful metrics. SEAM-EZ features a node-graph editor, interactive tooltips, embedded data views, and auto-suggestion features to facilitate the creation and validation of stateful analytics. We then conducted three real-world case studies of SEAM-EZ with 20 additional practitioners. Our results suggest that practitioners who previously could not create stateful metrics, or had to spend significant effort to do so using traditional tools such as SQL or Spark, can now easily and quickly create and validate such metrics using SEAM-EZ.
https://doi.org/10.1145/3613904.3642055
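To illustrate why such metrics are "stateful", here is a hand-written sketch of per-user sessionization, the kind of logic SEAM-EZ aims to let practitioners express visually instead of coding. The event format and session gap are assumptions for illustration, not SEAM-EZ's API.

```python
# Computing per-user session counts requires remembering each user's
# last event time -- the per-key state that makes stream metrics "stateful".
SESSION_GAP = 30 * 60  # seconds of inactivity that start a new session (assumed)

def count_sessions(events, gap=SESSION_GAP):
    """events: iterable of (user_id, unix_timestamp) pairs, in arrival order."""
    last_seen = {}   # per-user state: timestamp of the previous event
    sessions = {}    # per-user metric: number of sessions observed
    for user, ts in events:
        if user not in last_seen or ts - last_seen[user] > gap:
            sessions[user] = sessions.get(user, 0) + 1
        last_seen[user] = ts
    return sessions

events = [("u1", 0), ("u2", 10), ("u1", 100), ("u1", 4000)]
print(count_sessions(events))  # -> {'u1': 2, 'u2': 1}
```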
Spreadsheet programs for interactive surfaces have limited manipulation capabilities and are often frustrating to use. One key reason is that the spreadsheet grid creates a layer that intercepts most user input events, making it difficult to reach the cell values that lie underneath. We conduct an analysis of commercial spreadsheet programs and an elicitation study to understand what users can do, and what they would like to do, with spreadsheets on interactive surfaces. Informed by these findings, we design interaction techniques that leverage the precision of the pen to mitigate friction between the different layers. These techniques enable more operations through direct manipulation on and through the grid, targeting not only cells and groups of cells but also values and substrings within and across cells. We prototype these interaction techniques and conduct a qualitative study with information workers who perform a variety of spreadsheet operations on their own data.
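As a rough illustration of the layering problem this abstract describes, the hypothetical hit-testing sketch below decides whether a pen event targets the grid (cell selection) or passes through to a character within the cell's value. The geometry and pass-through rule are invented for illustration and do not reflect the paper's prototype.

```python
# Hypothetical hit test: route a pen-down point either to the grid layer
# (a cell) or through it to a character of the cell's text. All constants
# and the pass-through rule are assumptions.
from dataclasses import dataclass
from typing import Optional

CELL_W, CELL_H, CHAR_W = 100, 24, 8  # assumed cell size and glyph width, in px

@dataclass
class HitResult:
    row: int
    col: int
    char_index: Optional[int]  # None when the event stays on the grid layer

def hit_test(x, y, pen_mode_through=False):
    """Map a pen-down point to a cell, or through the grid to a character."""
    row, col = int(y // CELL_H), int(x // CELL_W)
    if not pen_mode_through:
        return HitResult(row, col, None)      # grid layer consumes the event
    char_index = int((x % CELL_W) // CHAR_W)  # position within the cell text
    return HitResult(row, col, char_index)

print(hit_test(250, 30))                         # cell (1, 2), grid-level target
print(hit_test(250, 30, pen_mode_through=True))  # character 6 inside that cell
```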