Commercially available EEG-based Brain-Computer Interface (BCI) wearable headsets are always on and therefore power hungry, requiring users to recharge them multiple times a day. In this paper, we tackle the problem of designing and detecting a wake-up command for BCI headsets, and explore how battery life can be extended to roughly a full day. The key challenge we address is enabling the headset to operate in a near-sleep mode while still reliably detecting and interpreting an EEG-based wake-up command from the user. To address this challenge, we present a solution built on eye-blinks. Our core contribution is Trance, a user-friendly, robust, and computationally lightweight wake-up command for BCI headsets. Using experimental results and multiple datasets collected through user studies, we show that Trance extends battery life by approximately 2.7x, to about 10 hours for a typical wearable battery, while remaining user-friendly.
https://doi.org/10.1145/3313831.3376738
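The paper's detection pipeline is not reproduced above, but the core idea, spotting a deliberate blink pattern in a single low-rate EEG channel with very little computation, can be sketched as follows. The sampling rate, amplitude threshold, and three-blink pattern are illustrative assumptions, not Trance's actual parameters.

```python
# Hypothetical sketch of a lightweight eye-blink wake-up detector; all
# parameters (sampling rate, threshold, blink pattern) are assumptions,
# not Trance's published values.
import numpy as np

FS = 128            # assumed EEG sampling rate (Hz)
THRESHOLD = 75.0    # assumed blink amplitude threshold (microvolts)
MIN_GAP = 0.15      # assumed refractory gap between blinks (s)
N_BLINKS = 3        # assumed wake-up pattern: three deliberate blinks
WINDOW = 2.0        # assumed span the pattern must fit inside (s)

def detect_wakeup(signal: np.ndarray) -> bool:
    """True if the assumed blink pattern occurs anywhere in `signal`."""
    above = signal > THRESHOLD
    # Rising edges of the thresholded signal mark blink onsets (seconds).
    onsets = (np.flatnonzero(above[1:] & ~above[:-1]) + 1) / FS
    # Enforce a refractory gap so a single blink is not double-counted.
    blinks = []
    for t in onsets:
        if not blinks or t - blinks[-1] >= MIN_GAP:
            blinks.append(t)
    # Accept if N_BLINKS consecutive blinks fit inside a WINDOW-second span.
    return any(
        blinks[i + N_BLINKS - 1] - blinks[i] <= WINDOW
        for i in range(len(blinks) - N_BLINKS + 1)
    )
```

A detector of this shape needs only a threshold comparison and a few bookkeeping operations per sample, which is consistent with running on a near-sleep, low-power processing mode.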
Selecting an appropriate model to forecast product demand is critical in the manufacturing industry. However, data complexity, market uncertainty, and users' demanding requirements make it challenging for demand analysts to select a proper model. Although existing model selection methods reduce the manual burden to some extent, they often fail to present model performance details on individual products or to reveal the potential risk of the selected model. This paper presents DFSeer, an interactive visualization system for reliable model selection in demand forecasting based on products with similar historical demand. It supports model comparison and selection at different levels of detail. In addition, it shows differences in model performance on similar products, revealing the risk of model selection and increasing users' confidence in choosing a forecasting model. Two case studies and interviews with domain experts demonstrate the effectiveness and usability of DFSeer.
https://doi.org/10.1145/3313831.3376866
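DFSeer's similarity measure and candidate models are not specified above; the sketch below illustrates the underlying idea with assumed stand-ins (Euclidean similarity over equal-length demand histories, and two toy forecasters): score each candidate model on the target product's nearest neighbors, so the spread of errors exposes selection risk.

```python
# Illustrative sketch of selecting a demand-forecasting model using
# similar products. The Euclidean similarity and the two toy forecasters
# are assumed stand-ins; DFSeer's actual metrics and models differ.
import numpy as np

def similar_products(target, catalog, k=5):
    """IDs of the k products whose (equal-length) demand history is
    closest to the target's, by Euclidean distance."""
    dists = {pid: np.linalg.norm(hist - target) for pid, hist in catalog.items()}
    return sorted(dists, key=dists.get)[:k]

def naive_forecast(history):
    return history[-1]          # repeat the last observation

def mean_forecast(history):
    return history.mean()       # forecast the historical mean

def selection_report(target_id, catalog, models, actuals):
    """Per-model error mean and spread on the target's nearest neighbors;
    a large spread signals a risky model choice."""
    others = {p: h for p, h in catalog.items() if p != target_id}
    neighbors = similar_products(catalog[target_id], others)
    report = {}
    for name, model in models.items():
        errs = [abs(model(catalog[p]) - actuals[p]) for p in neighbors]
        report[name] = (float(np.mean(errs)), float(np.std(errs)))
    return report
```

Two models with similar mean error but very different spread across neighbors would look equally good in aggregate benchmarks; surfacing that spread is the kind of risk signal the abstract describes.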
The development of a statistical modeling program requires example data to observe and verify the behavior of the program. Such example data are either taken from an existing dataset or synthesized using commands. Programmers may want to design an arbitrary dataset directly or modify it interactively, but this is difficult in current development environments. We therefore propose combining a code editor with an interactive scatter plot editor to efficiently understand the behavior of statistical modeling algorithms. The user interactively creates and modifies the dataset in the scatter plot editor, while the system continuously executes the code in the code editor, taking the data as input, and displays the result. This paper presents the design rationale behind the system and introduces several usage scenarios.
https://doi.org/10.1145/3313831.3376455
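A minimal sketch of the proposed loop, not the paper's implementation: each edit to the dataset in the scatter plot triggers re-execution of the modeling code on the new points, and the result is redrawn in place. Here the "modeling code" is reduced to a least-squares line fit, and editing is reduced to clicking to add points.

```python
# Minimal sketch of the edit -> re-execute -> redraw loop described in
# the abstract. The actual system couples a full code editor with the
# plot; here the model is a hard-coded least-squares fit.
import numpy as np
import matplotlib.pyplot as plt

xs, ys = [], []
fig, ax = plt.subplots()
scatter = ax.scatter([], [])
(line,) = ax.plot([], [], "r-")
ax.set(xlim=(0, 10), ylim=(0, 10))

def rerun_model():
    """Stand-in for executing the user's statistical modeling code."""
    if len(xs) >= 2:
        slope, intercept = np.polyfit(xs, ys, 1)
        grid = np.linspace(0, 10, 2)
        line.set_data(grid, slope * grid + intercept)

def on_click(event):
    """Each click adds a data point, i.e. edits the dataset."""
    if event.inaxes is not ax:
        return
    xs.append(event.xdata)
    ys.append(event.ydata)
    scatter.set_offsets(np.column_stack([xs, ys]))
    rerun_model()            # continuous execution on every data edit
    fig.canvas.draw_idle()

fig.canvas.mpl_connect("button_press_event", on_click)
plt.show()
```

Even in this toy form, dragging the dataset through edge cases (collinear points, outliers, tight clusters) immediately shows how the fitted model responds, which is the workflow the abstract motivates.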
Online fraud is a well-known dark side of the modern Internet. Unsupervised fraud detection algorithms are widely used to address this problem. However, selecting features, adjusting hyperparameters, evaluating the algorithms, and eliminating false positives all require human expert involvement. In this work, we design and implement an end-to-end interactive visualization system, FDHelper, based on a deep understanding of the mechanisms of the black market and of fraud detection algorithms. We identify a workflow based on the experience of both fraud detection algorithm experts and domain experts. Using a multi-granularity, three-layer visualization map that embeds an entropy-based distance metric, ColDis, analysts can interactively select different feature sets, refine fraud detection algorithms, tune parameters, and evaluate detection results in near real time. We demonstrate the effectiveness and significance of FDHelper through two case studies with state-of-the-art fraud detection algorithms, interviews with domain experts and algorithm experts, and a user study with eight first-time end users.
https://doi.org/10.1145/3313831.3376140
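The paper describes ColDis only as an entropy-based distance metric, and its definition is not reproduced above. As an illustrative stand-in, the sketch below computes one well-known entropy-based distance between two feature columns' value distributions, the Jensen-Shannon distance; it should not be read as ColDis itself.

```python
# Generic entropy-based distance between two feature columns, used here
# only as a stand-in for ColDis, whose exact formula is not given above.
import numpy as np
from collections import Counter

def _distribution(column, support):
    """Empirical probability of each support value in the column."""
    counts = Counter(column)
    total = len(column)
    return np.array([counts.get(v, 0) / total for v in support])

def _entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def js_distance(col_a, col_b):
    """Jensen-Shannon distance between two columns' value distributions."""
    support = sorted(set(col_a) | set(col_b))
    p = _distribution(col_a, support)
    q = _distribution(col_b, support)
    m = 0.5 * (p + q)
    jsd = _entropy(m) - 0.5 * (_entropy(p) + _entropy(q))  # JS divergence
    return np.sqrt(max(jsd, 0.0))  # square root yields a proper metric
```

A distance of this kind lets a visualization layer place feature columns (or entity clusters) so that distributionally similar ones sit close together, which matches the map-style layout the abstract describes.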
Successful machine learning (ML) applications require iteration on both modeling and the underlying data. While prior visualization tools for ML primarily focus on modeling, our interviews with 23 ML practitioners reveal that they frequently improve model performance by iterating on their data (e.g., collecting new data, adding labels) rather than their models. We also identify common types of data iterations and the associated analysis tasks and challenges. To help attribute data iterations to model performance, we design a collection of interactive visualizations and integrate them into a prototype, Chameleon, that lets users compare data features, training/testing splits, and performance across data versions. We present two case studies where developers apply Chameleon to their own evolving datasets on production ML projects. Our interface helps them verify data collection efforts, find failure cases stretching across data versions, capture data processing changes that impacted performance, and identify opportunities for future data iterations.
https://doi.org/10.1145/3313831.3376177
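Chameleon's implementation is not shown above; the sketch below illustrates the kind of per-feature, per-version comparison such a tool might surface. The distance measure, the binning, and the sklearn-style `score` API are assumptions for illustration.

```python
# Hypothetical sketch of comparing two dataset versions: how much a
# feature's distribution shifted, and how the same model's score moved.
# Metrics and the sklearn-style `score` API are assumptions.
import numpy as np

def feature_shift(v1: np.ndarray, v2: np.ndarray, bins: int = 20) -> float:
    """Total variation distance between a feature's histograms in two
    data versions; 0 means identical, 1 means disjoint."""
    lo = min(v1.min(), v2.min())
    hi = max(v1.max(), v2.max())
    h1, _ = np.histogram(v1, bins=bins, range=(lo, hi))
    h2, _ = np.histogram(v2, bins=bins, range=(lo, hi))
    p, q = h1 / h1.sum(), h2 / h2.sum()
    return 0.5 * float(np.abs(p - q).sum())

def performance_delta(model, test_v1, test_v2) -> float:
    """Change in score when the same fitted model is evaluated on the
    test split of two different data versions."""
    x1, y1 = test_v1
    x2, y2 = test_v2
    return model.score(x2, y2) - model.score(x1, y1)
```

Pairing a per-feature shift measure with a per-version performance delta is one simple way to attribute a performance change to a specific data iteration, which is the analysis task the abstract highlights.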