Finding Label and Model Errors in Perception Data With Learned Observation Assertions

Authors:

Nikos Arechiga,

Peter D. Bailis,

Matei Zaharia

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

Pages 496 - 505

https://doi.org/10.1145/3514221.3517907

Published: 11 June 2022

Abstract

ML is being deployed in complex, real-world scenarios where errors have impactful consequences. In these systems, thorough testing of the ML pipelines is critical. A key component in ML deployment pipelines is the curation of labeled training data. Common practice in the ML literature assumes that labels are the ground truth. However, in our experience in a large autonomous vehicle development center, we have found that vendors can often provide erroneous labels, which can lead to downstream safety risks in trained models.

To address these issues, we propose a new abstraction, learned observation assertions, and implement it in a system called Fixy. Fixy leverages existing organizational resources, such as existing (possibly noisy) labeled datasets or previously trained ML models, to learn a probabilistic model for finding errors in human- or model-generated labels. Given user-provided features and these existing resources, Fixy learns feature distributions that specify likely and unlikely values (e.g., that a speed of 30 mph is likely but 300 mph is unlikely). It then uses these feature distributions to score labels for potential errors. We show that Fixy can automatically rank potential errors in real datasets with up to 2x higher precision compared to recent work on model assertions and standard techniques such as uncertainty sampling. Furthermore, Fixy can uncover labeling errors in 70% of scenes in a popular autonomous vehicle dataset.
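
The idea of learning a feature distribution and then scoring labels against it can be illustrated with a minimal sketch. This is not Fixy's implementation: it assumes a single feature (speed) modeled by one Gaussian, and all names (`fit_gaussian`, `log_likelihood`, `rank_by_suspicion`, the `speed_mph` field) and data values are hypothetical. The abstract does not specify which distributions Fixy fits, so the Gaussian here is only a stand-in.

```python
import math
from statistics import mean, stdev

def fit_gaussian(values):
    """Estimate the mean and standard deviation of one feature from existing labels."""
    return mean(values), stdev(values)

def log_likelihood(x, mu, sigma):
    """Log-density of x under a normal distribution N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def rank_by_suspicion(labels, feature, mu, sigma):
    """Sort labels so that the least likely feature values come first."""
    return sorted(labels, key=lambda lab: log_likelihood(lab[feature], mu, sigma))

# Hypothetical speeds (mph) from an existing, possibly noisy, labeled dataset.
historical_speeds = [22.0, 28.0, 30.0, 31.0, 35.0, 27.0, 33.0, 29.0]
mu, sigma = fit_gaussian(historical_speeds)

# Hypothetical new labels to audit; track "c" has an implausible speed.
new_labels = [
    {"track_id": "a", "speed_mph": 30.0},
    {"track_id": "b", "speed_mph": 41.0},
    {"track_id": "c", "speed_mph": 300.0},
]
ranked = rank_by_suspicion(new_labels, "speed_mph", mu, sigma)
print([lab["track_id"] for lab in ranked])  # most suspicious first: ['c', 'b', 'a']
```

In this sketch a reviewer would inspect labels from the top of the ranking down, so the 300 mph track is surfaced before the plausible ones. A real system would likely combine several features rather than rely on one.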

Supplemental Material

MP4 File: Video for LOA (21.15 MB)

PDF File: Read me (21.12 KB)

ZIP File: Source Code (44.32 MB)

Index Terms

Finding Label and Model Errors in Perception Data With Learned Observation Assertions
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Data model extensions
        Inconsistent data
        Uncertainty
    2. Information integration
      1. Data cleaning

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

June 2022

2597 pages

ISBN:9781450392495

DOI:10.1145/3514221

General Chair: Zachary Ives, University of Pennsylvania (USA)

Program Chairs: Angela Bonifati, Lyon 1 University (France); Amr El Abbadi, University of California, Santa Barbara (USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF
Google

Conference

SIGMOD/PODS '22

Sponsor:

SIGMOD

SIGMOD/PODS '22: International Conference on Management of Data

June 12 - 17, 2022

Philadelphia, PA, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 8
Total Downloads: 354
Downloads (Last 12 months): 50
Downloads (Last 6 weeks): 5

Reflects downloads up to 05 Mar 2025

Cited By

Shankar S, Li H, Asawa P, Hulsebos M, Lin Y, Zamfirescu-Pereira J, Chase H, Fu-Hinthorn W, Parameswaran A, Wu E. (2024). spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines. Proceedings of the VLDB Endowment 17:12 (4173-4186). Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.14778/3685800.3685835
Yin Y, Feng Y, Weng S, Yao Y, Liu J, Zhao Z, Christakis M, Pradel M. (2024). Datactive: Data Fault Localization for Object Detection Systems. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (895-907). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680329
Kang D, Guibas J, Bailis P, Hashimoto T, Sun Y, Zaharia M. (2024). Data Management for ML-Based Analytics and Beyond. ACM / IMS Journal of Data Science 1:1 (1-23). Online publication date: 16-Jan-2024.
https://dl.acm.org/doi/10.1145/3611093
Schubert M, Riedlinger T, Kahl K, Kröll D, Schoenen S, Šegvić S, Rottmann M. (2024). Identifying Label Errors in Object Detection Datasets by Loss Inspection. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (4570-4579). Online publication date: 3-Jan-2024.
https://doi.org/10.1109/WACV57701.2024.00452
Yan P, Abdulkadir A, Luley P, Rosenthal M, Schatte G, Grewe B, Stadelmann T. (2024). A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions. IEEE Access 12 (3768-3789). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2023.3349132
Weng S, Feng Y, Yin Y, Dai Y, Liu J, Zhao Z. (2024). Seeing the invisible: test prioritization for object detection system. Empirical Software Engineering 29:6. Online publication date: 23-Sep-2024.
https://doi.org/10.1007/s10664-024-10539-4
Lin C, Jackson S. (2023). From Bias to Repair: Error as a Site of Collaboration and Negotiation in Applied Data Science Work. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1-32). Online publication date: 16-Apr-2023.
https://dl.acm.org/doi/10.1145/3579607
Luley P, Deriu J, Yan P, Schatte G, Stadelmann T. (2023). From Concept to Implementation: The Data-Centric Development Process for AI in Industry. 2023 10th IEEE Swiss Conference on Data Science (SDS) (73-76). Online publication date: Jun-2023.
https://doi.org/10.1109/SDS57534.2023.00017
