Computing and statistics underpin the rapid emergence of data science as a pivotal academic discipline. The Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS), the two key academic organizations in these areas, have come together to launch a conference series on the Foundations of Data Science. Our inaugural event, the ACM-IMS Interdisciplinary Summit on the Foundations of Data Science, took place in San Francisco in 2019. FODS-2020 represents the first of what will be an annual conference series with refereed conference proceedings. This interdisciplinary event brings together researchers and practitioners to address foundational data science challenges in prediction, inference, fairness, ethics and the future of data science.
We received 58 submissions and the program committee reviewed each paper thoroughly. We accepted 17 papers for plenary presentation and inclusion in the proceedings. The program also included keynote addresses by Professor Mihaela van der Schaar and Professor Oren Etzioni and half-day tutorials by Professor Michael Kearns and Professor David Blei.
Proceeding Downloads
AutoML and Interpretability: Powering the Machine Learning Revolution in Healthcare
AutoML and interpretability are both fundamental to the successful uptake of machine learning by non-expert end users. The former will lower barriers to entry and unlock potent new capabilities that are out of reach when working with ad-hoc models, ...
ADAGES: Adaptive Aggregation with Stability for Distributed Feature Selection
In this era of big data, not only the large amount of data keeps motivating distributed computing, but concerns on data privacy also put forward the emphasis on distributed learning. To conduct feature selection and to control the false discovery rate ...
Classification Acceleration via Merging Decision Trees
We study the problem of merging decision trees: Given k decision trees $T_1, T_2, T_3, \ldots, T_k$, we merge these trees into one super tree T with (often) much smaller size. The resultant super tree T, which is an integration of k decision trees with each ...
Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to ...
Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting
Ensemble methods based on trees, such as Random Forests, AdaBoost and gradient boosting, are widely recognized as among the best off-the-shelf classifiers: they typically achieve state-of-the-art accuracy in many problems with little effort in tuning ...
Interpreting Black Box Models via Hypothesis Testing
In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would therefore ...
Congenial Differential Privacy under Mandated Disclosure
Differentially private data releases are often required to satisfy a set of external constraints that reflect the legal, ethical, and logical mandates to which the data curator is obligated. The enforcement of constraints, when treated as post-...
Incentives Needed for Low-Cost Fair Lateral Data Reuse
A central goal of algorithmic fairness is to build systems with fairness properties that compose gracefully. A major effort and step towards this goal in data science has been the development of fair representations which guarantee demographic parity ...
Applying Algorithmic Accountability Frameworks with Domain-specific Codes of Ethics: A Case Study in Ecosystem Forecasting for Shellfish Toxicity in the Gulf of Maine
Ecological forecasts are used to inform decisions that can have significant impacts on the lives of individuals and on the health of ecosystems. These forecasts, or models, embody the ethics of their creators as well as many seemingly arbitrary ...
Semantic Scholar, NLP, and the Fight against COVID-19
This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) at the Allen Institute for AI and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its ...
Non-Uniform Sampling of Fixed Margin Binary Matrices
Data sets in the form of binary matrices are ubiquitous across scientific domains, and researchers are often interested in identifying and quantifying noteworthy structure. One approach is to compare the observed data to that which might be obtained ...
Large Very Dense Subgraphs in a Stream of Edges
We study the detection and the reconstruction of a large very dense subgraph in a social graph with n nodes and m edges given as a stream of edges, when the graph follows a power law degree distribution, in the regime when $m = O(n \log n)$. A subgraph is ...
Toward Communication Efficient Adaptive Gradient Method
In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed ...
Towards Practical Lipschitz Bandits
Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and ...
On Reinforcement Learning for Turn-based Zero-sum Markov Games
We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose Explore-Improve-Supervise (EIS) ...
Transforming Probabilistic Programs for Model Checking
Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of ...
StyleCAPTCHA: CAPTCHA Based on Stylized Images to Defend against Deep Networks
CAPTCHAs are widely deployed for bot detection. Many CAPTCHAs are based on visual perception tasks such as text and object classification. However, they are under serious threat from advanced visual perception technologies based on deep convolutional ...
Statistical Significance in High-dimensional Linear Mixed Models
This paper develops an inferential framework for high-dimensional linear mixed effect models. Such models are suitable, e.g., when collecting n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (...
Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data
Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable ...
Index Terms
- Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference