Enabling collaborative data science development with the Ballet framework

Abstract

While the open-source software development model has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small teams. We describe challenges to scaling data science collaborations and present a conceptual framework and ML programming model to address them. We instantiate these ideas in Ballet, a lightweight framework for collaborative, open-source data science through a focus on feature engineering, and an accompanying cloud-based development environment. Using our framework, collaborators incrementally propose feature definitions to a repository which are each subjected to an ML performance evaluation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct a case study analysis of an income prediction problem with 27 collaborators, and discuss implications for future designers of collaborative projects.

Authors
Micah J. Smith
MIT, Cambridge, Massachusetts, United States
Jürgen Cito
TU Wien, Vienna, Austria
Kelvin Lu
MIT, Cambridge, Massachusetts, United States
Kalyan Veeramachaneni
MIT, Cambridge, Massachusetts, United States
Paper URL

https://doi.org/10.1145/3479575

Conference: CSCW2021

The 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing

Session: Data Work Across Contexts and Disciplines

Papers Room D
8 presentations
2021-10-26 19:00:00
2021-10-26 20:30:00