
DOI: 10.1145/3387940.3391465

Evaluating Surprise Adequacy for Question Answering

Published: 25 September 2020

Abstract

With the wide and rapid adoption of Deep Neural Networks (DNNs) in various domains, an urgent need to validate their behaviour has arisen, resulting in various test adequacy metrics for DNNs. One of these metrics, Surprise Adequacy (SA), aims to measure how surprising a new input is based on its similarity to the data used for training. While SA has been shown to be effective for image classifiers based on Convolutional Neural Networks (CNNs), it has not been studied for the Natural Language Processing (NLP) domain. This paper applies SA to NLP, in particular to the question answering task: the aim is to investigate whether SA correlates well with the correctness of answers. An empirical evaluation using the widely used Stanford Question Answering Dataset (SQuAD) shows that SA can work well as a test adequacy metric for the question answering task.
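
For context, the sketch below illustrates the core idea behind Likelihood-based Surprise Adequacy (LSA) as proposed in the original SA work by Kim et al.: fit a kernel density estimate over the activation traces that training inputs produce at a chosen layer, then score a new input by the negative log density of its own trace, so that inputs unlike anything seen during training receive high surprise values. This is a minimal illustration rather than the implementation evaluated in this paper; the function name and the use of SciPy's gaussian_kde are assumptions, and in practice the traces are typically pruned (e.g., by discarding low-variance neurons) to keep the density estimation tractable.

```python
import numpy as np
from scipy.stats import gaussian_kde

def likelihood_surprise(train_traces: np.ndarray, new_trace: np.ndarray) -> float:
    """Illustrative Likelihood-based Surprise Adequacy (LSA) score.

    train_traces: (n_samples, n_neurons) activations of a chosen layer,
                  collected over the training set.
    new_trace:    (n_neurons,) activations of the same layer for a new input.
    Returns the negative log density of the new trace under a Gaussian KDE
    fitted to the training traces; higher values mean "more surprising".
    """
    # gaussian_kde expects the data as (n_dims, n_samples).
    kde = gaussian_kde(train_traces.T)
    density = kde(new_trace.reshape(-1, 1))[0]
    # Clamp to avoid -inf when the new trace lies far outside the training data.
    return float(-np.log(max(density, 1e-300)))
```

Given such per-question scores for a SQuAD model, the correlation with answer correctness that the paper investigates could then be assessed by, for example, labelling each prediction as correct or incorrect (exact match, or F1 above some threshold) and measuring how well the SA score separates the two groups, e.g., with ROC-AUC.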

        Published In

ICSEW'20: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops
June 2020, 831 pages
ISBN: 9781450379632
DOI: 10.1145/3387940
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

        Author Tags

        1. Deep Learning
        2. Natural Language Processing
        3. Software Testing

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

Conference

ICSE '20: 42nd International Conference on Software Engineering
June 27 - July 19, 2020, Seoul, Republic of Korea

Cited By

• (2024) Neuron importance-aware coverage analysis for deep neural network testing. Empirical Software Engineering 29:5. DOI: 10.1007/s10664-024-10524-x. Online publication date: 25-Jul-2024.
• (2023) Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks. ACM Transactions on Software Engineering and Methodology 33:1, 1-29. DOI: 10.1145/3617593. Online publication date: 23-Nov-2023.
• (2023) Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing. ACM Transactions on Software Engineering and Methodology 32:3, 1-48. DOI: 10.1145/3576040. Online publication date: 26-Apr-2023.
• (2023) Uncertainty quantification for deep neural networks: An empirical comparison and usage guidelines. Software Testing, Verification and Reliability 33:6. DOI: 10.1002/stvr.1840. Online publication date: 20-Jan-2023.
• (2022) CheapET-3: cost-efficient use of remote DNN models. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1811-1813. DOI: 10.1145/3540250.3559082. Online publication date: 7-Nov-2022.
• (2022) Quality assurance study with mismatched data in sentiment analysis. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 442-446. DOI: 10.1109/APSEC57359.2022.00059. Online publication date: Dec-2022.
• (2021) Improved Surprise Adequacy Tools for Corner Case Data Description and Detection. Applied Sciences 11:15 (6826). DOI: 10.3390/app11156826. Online publication date: 25-Jul-2021.
• (2021) Corner Case Data Description and Detection. 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN), 19-26. DOI: 10.1109/WAIN52551.2021.00009. Online publication date: May-2021.
• (2021) A Review and Refinement of Surprise Adequacy. 2021 IEEE/ACM Third International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest), 17-24. DOI: 10.1109/DeepTest52559.2021.00009. Online publication date: Jun-2021.
• (2021) Multimodal Surprise Adequacy Analysis of Inputs for Natural Language Processing DNN Models. 2021 IEEE/ACM International Conference on Automation of Software Test (AST), 80-89. DOI: 10.1109/AST52587.2021.00017. Online publication date: May-2021.
