Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference

October 2020

General Chairs:
Jeannette Wing, Columbia University, USA
David Madigan, Northeastern University, USA

Publisher:

Association for Computing Machinery, New York, NY, United States

Conference:

FODS '20: ACM-IMS Foundations of Data Science Conference, Virtual Event, USA, October 19–20, 2020

ISBN:

978-1-4503-8103-1

Published:

18 October 2020


Bibliometrics (reflects downloads up to 20 Sep 2024)

Citation count: 118
Downloads (6 weeks): 100
Downloads (12 months): 970
Downloads (cumulative): 4,058


Abstract

Computing and statistics underpin the rapid emergence of data science as a pivotal academic discipline. The Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS), the two key academic organizations in these areas, have come together to launch a conference series on the Foundations of Data Science. Our inaugural event, the ACM-IMS Interdisciplinary Summit on the Foundations of Data Science, took place in San Francisco in 2019. FODS-2020 represents the first of what will be an annual conference series with refereed conference proceedings. This interdisciplinary event brings together researchers and practitioners to address foundational data science challenges in prediction, inference, fairness, ethics and the future of data science.

We received 58 submissions and the program committee reviewed each paper thoroughly. We accepted 17 papers for plenary presentation and inclusion in the proceedings. The program also included keynote addresses by Professor Mihaela van der Schaar and Professor Oren Etzioni and half-day tutorials by Professor Michael Kearns and Professor David Blei.


SESSION: Keynote Talk I

section

Session details: Keynote Talk I

David Madigan

https://doi.org/10.1145/3429731

Metrics
Total Citations: 0

keynote

AutoML and Interpretability: Powering the Machine Learning Revolution in Healthcare

Mihaela van der Schaar

Page 1
https://doi.org/10.1145/3412815.3416879

AutoML and interpretability are both fundamental to the successful uptake of machine learning by non-expert end users. The former will lower barriers to entry and unlock potent new capabilities that are out of reach when working with ad-hoc models, ...

Metrics
Total Citations: 1
Total Downloads: 224
Last 12 Months: 23
Last 6 weeks: 1

SESSION: Session 1: Methodology

section

Session details: Session 1: Methodology

Julia Kempe

https://doi.org/10.1145/3429732

Metrics
Total Citations: 0

research-article

ADAGES: Adaptive Aggregation with Stability for Distributed Feature Selection

Yu Gui

Pages 3–12
https://doi.org/10.1145/3412815.3416881

In this era of big data, not only the large amount of data keeps motivating distributed computing, but concerns on data privacy also put forward the emphasis on distributed learning. To conduct feature selection and to control the false discovery rate ...

Metrics
Total Citations: 1
Total Downloads: 113
Last 12 Months: 6
Last 6 weeks: 0

research-article

Classification Acceleration via Merging Decision Trees

Chenglin Fan,
Ping Li

Pages 13–22
https://doi.org/10.1145/3412815.3416886

We study the problem of merging decision trees: Given k decision trees $T_1, T_2, T_3, \ldots, T_k$, we merge these trees into one super tree T with (often) much smaller size. The resultant super tree T, which is an integration of k decision trees with each ...

Metrics
Total Citations: 9
Total Downloads: 213
Last 12 Months: 49
Last 6 weeks: 7

research-article

Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

Sarah Tan,
Matvey Soloviev,
Giles Hooker,
Martin T. Wells

Pages 23–34
https://doi.org/10.1145/3412815.3416893

Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to ...

Metrics
Total Citations: 26
Total Downloads: 373
Last 12 Months: 59
Last 6 weeks: 7

research-article

Open Access

Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting

Miguel Á Carreira-Perpiñán,
Arman Zharmagambetov

Pages 35–46
https://doi.org/10.1145/3412815.3416882

Ensemble methods based on trees, such as Random Forests, AdaBoost and gradient boosting, are widely recognized as among the best off-the-shelf classifiers: they typically achieve state-of-the-art accuracy in many problems with little effort in tuning ...

Metrics
Total Citations: 20
Total Downloads: 577
Last 12 Months: 247
Last 6 weeks: 25

SESSION: Session 2: Fairness, Privacy, Interpretability

section

Session details: Session 2: Fairness, Privacy, Interpretability

Jeff Goldsmith

https://doi.org/10.1145/3429733

Metrics
Total Citations: 0

research-article

Interpreting Black Box Models via Hypothesis Testing

Collin Burns,
Jesse Thomason,
Wesley Tansey

Pages 47–57
https://doi.org/10.1145/3412815.3416889

In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would therefore ...

Metrics
Total Citations: 19
Total Downloads: 349
Last 12 Months: 63
Last 6 weeks: 3

research-article

Congenial Differential Privacy under Mandated Disclosure

Ruobin Gong,
Xiao-Li Meng

Pages 59–70
https://doi.org/10.1145/3412815.3416892

Differentially private data releases are often required to satisfy a set of external constraints that reflect the legal, ethical, and logical mandates to which the data curator is obligated. The enforcement of constraints, when treated as post-...

Metrics
Total Citations: 5
Total Downloads: 130
Last 12 Months: 20
Last 6 weeks: 1

research-article

Incentives Needed for Low-Cost Fair Lateral Data Reuse

Roland Maio,
Augustin Chaintreau

Pages 71–82
https://doi.org/10.1145/3412815.3416890

A central goal of algorithmic fairness is to build systems with fairness properties that compose gracefully. A major effort and step towards this goal in data science has been the development of fair representations which guarantee demographic parity ...

Metrics
Total Citations: 0
Total Downloads: 67
Last 12 Months: 5
Last 6 weeks: 0

research-article

Public Access

Applying Algorithmic Accountability Frameworks with Domain-specific Codes of Ethics: A Case Study in Ecosystem Forecasting for Shellfish Toxicity in the Gulf of Maine

Isabella Grasso,
David Russell,
Abigail Matthews,
Jeanna Matthews,
Nicholas R. Record

Pages 83–91
https://doi.org/10.1145/3412815.3416897

Ecological forecasts are used to inform decisions that can have significant impacts on the lives of individuals and on the health of ecosystems. These forecasts, or models, embody the ethics of their creators as well as many seemingly arbitrary ...

Metrics
Total Citations: 3
Total Downloads: 334
Last 12 Months: 115
Last 6 weeks: 13

SESSION: Keynote Talk II

section

Session details: Keynote Talk II

Jeannette Wing

https://doi.org/10.1145/3429734

Metrics
Total Citations: 0

keynote

Semantic Scholar, NLP, and the Fight against COVID-19

Oren Etzioni

Page 93
https://doi.org/10.1145/3412815.3416880

This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) at the Allen Institute for AI and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its ...

Metrics
Total Citations: 0
Total Downloads: 57
Last 12 Months: 2
Last 6 weeks: 0

SESSION: Session 3: Data Science Theory

section

Session details: Session 3: Data Science Theory

Yannis Ioannidis

https://doi.org/10.1145/3429735

Metrics
Total Citations: 0

research-article

Non-Uniform Sampling of Fixed Margin Binary Matrices

Alex Fout,
Bailey K. Fosdick,
Matthew P. Hitt

Pages 95–105
https://doi.org/10.1145/3412815.3416887

Data sets in the form of binary matrices are ubiquitous across scientific domains, and researchers are often interested in identifying and quantifying noteworthy structure. One approach is to compare the observed data to that which might be obtained ...

Metrics
Total Citations: 0
Total Downloads: 63
Last 12 Months: 1
Last 6 weeks: 0

research-article

Large Very Dense Subgraphs in a Stream of Edges

Claire Mathieu,
Michel de Rougemont

Pages 107–117
https://doi.org/10.1145/3412815.3416884

We study the detection and the reconstruction of a large very dense subgraph in a social graph with n nodes and m edges given as a stream of edges, when the graph follows a power law degree distribution, in the regime when $m = O(n \log n)$. A subgraph is ...

Metrics
Total Citations: 2
Total Downloads: 63
Last 12 Months: 3
Last 6 weeks: 0

research-article

Toward Communication Efficient Adaptive Gradient Method

Xiangyi Chen,
Xiaoyun Li,
Ping Li

Pages 119–128
https://doi.org/10.1145/3412815.3416891

In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed ...

Metrics
Total Citations: 11
Total Downloads: 270
Last 12 Months: 23
Last 6 weeks: 4

research-article

Towards Practical Lipschitz Bandits

Tianyu Wang,
Weicheng Ye,
Dawei Geng,
Cynthia Rudin

Pages 129–138
https://doi.org/10.1145/3412815.3416885

Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and ...

Metrics
Total Citations: 9
Total Downloads: 183
Last 12 Months: 36
Last 6 weeks: 3

research-article

Open Access

On Reinforcement Learning for Turn-based Zero-sum Markov Games

Devavrat Shah,
Varun Somani,
Qiaomin Xie,
Zhi Xu

Pages 139–148
https://doi.org/10.1145/3412815.3416888

We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose Explore-Improve-Supervise (EIS) ...

Metrics
Total Citations: 0
Total Downloads: 395
Last 12 Months: 180
Last 6 weeks: 12

SESSION: Session 4: Foundations in Practice

section

Session details: Session 4: Foundations in Practice

Stan Ahalt

https://doi.org/10.1145/3429736

Metrics
Total Citations: 0

research-article

Public Access

Transforming Probabilistic Programs for Model Checking

Ryan Bernstein,
Matthijs Vákár,
Jeannette Wing

Pages 149–159
https://doi.org/10.1145/3412815.3416896

Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of ...

Metrics
Total Citations: 0
Total Downloads: 166
Last 12 Months: 47
Last 6 weeks: 13

research-article

Open Access

StyleCAPTCHA: CAPTCHA Based on Stylized Images to Defend against Deep Networks

Haitian Chen,
Bai Jiang,
Hao Chen

Pages 161–170
https://doi.org/10.1145/3412815.3416895

CAPTCHAs are widely deployed for bot detection. Many CAPTCHAs are based on visual perception tasks such as text and object classification. However, they are under serious threat from advanced visual perception technologies based on deep convolutional ...

Metrics
Total Citations: 1
Total Downloads: 231
Last 12 Months: 52
Last 6 weeks: 9

research-article

Statistical Significance in High-dimensional Linear Mixed Models

Lina Lin,
Mathias Drton,
Ali Shojaie

Pages 171–181
https://doi.org/10.1145/3412815.3416883

This paper develops an inferential framework for high-dimensional linear mixed effect models. Such models are suitable, e.g., when collecting n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (...

Metrics
Total Citations: 2
Total Downloads: 92
Last 12 Months: 14
Last 6 weeks: 0

research-article

Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data

Thanh Le,
Vasant Honavar

Pages 183–188
https://doi.org/10.1145/3412815.3416894

Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable ...

Metrics
Total Citations: 3
Total Downloads: 158
Last 12 Months: 25
Last 6 weeks: 2
