research-article

MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems

Published: 27 December 2024

Abstract

Autonomous driving systems (ADSs) are complex cyber-physical systems (CPSs) that must ensure safety even in uncertain conditions. Modern ADSs often employ deep neural networks (DNNs), which may not produce correct results in every possible driving scenario. Thus, an approach to estimate the confidence of an ADS at runtime is necessary to prevent potentially dangerous situations. In this article, we propose MarMot, an online monitoring approach for ADSs based on metamorphic relations (MRs), which are properties of a system that hold among multiple inputs and the corresponding outputs. Using domain-specific MRs, MarMot estimates the uncertainty of the ADS at runtime, allowing the identification of anomalous situations that are likely to cause faulty behavior of the ADS, such as driving off the road.
We perform an empirical assessment of MarMot with five different MRs on two subject ADSs: a small-scale physical ADS and a simulated ADS. Our evaluation covers the identification of both external anomalies (e.g., fog) and internal anomalies (e.g., faulty DNNs caused by mislabeled training data). Our results show that MarMot can identify up to 65% of the external anomalies and 100% of the internal anomalies in the physical ADS, and up to 54% of the external anomalies and 88% of the internal anomalies in the simulated ADS. With these results, MarMot outperforms or is comparable to other state-of-the-art approaches, including SelfOracle, Ensemble, and MC Dropout-based ADS monitors.
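
The idea of monitoring with an MR can be illustrated with a minimal sketch. The abstract does not name the five MRs or say how violations are turned into an uncertainty score, so everything below is an assumption made for illustration: the horizontal-flip relation (mirroring the camera frame should negate the steering angle), the names `flip_horizontal`, `mr_violation`, and `is_anomalous`, the fixed threshold, and the toy steering models. It is not MarMot's implementation.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale frame: rows of pixel intensities


def flip_horizontal(image: Image) -> Image:
    """Build the follow-up input by mirroring the frame left to right."""
    return [row[::-1] for row in image]


def mr_violation(model: Callable[[Image], float], image: Image) -> float:
    """Measure how strongly the assumed flip MR is violated.

    Assumed MR (hypothetical, not taken from the article): mirroring the
    frame should negate the steering angle, so the two outputs should
    sum to roughly zero.
    """
    source_output = model(image)
    follow_up_output = model(flip_horizontal(image))
    return abs(source_output + follow_up_output)


def is_anomalous(model: Callable[[Image], float], image: Image,
                 threshold: float) -> bool:
    """Flag the frame when the violation exceeds a chosen threshold."""
    return mr_violation(model, image) > threshold


def toy_model(image: Image) -> float:
    """Toy steering model: steer toward the brighter half of the frame."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[half:]) for row in image)
    return (right - left) / (len(image) * len(image[0]))


def biased_model(image: Image) -> float:
    """Toy faulty model: a constant steering bias breaks the symmetry."""
    return toy_model(image) + 0.25


frame = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]

print(mr_violation(toy_model, frame))        # symmetric model satisfies the MR
print(mr_violation(biased_model, frame))     # biased model violates it
print(is_anomalous(biased_model, frame, threshold=0.1))
```

In this sketch the violation magnitude plays the role of an uncertainty estimate: the symmetric toy model yields no violation, while the biased one is flagged. A real monitor would need MRs suited to the driving domain and a threshold calibrated on nominal data.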

Published In

ACM Transactions on Software Engineering and Methodology, Volume 34, Issue 1
January 2025
967 pages
EISSN: 1557-7392
DOI: 10.1145/3703005
Editor: Abhik Roychoudhury

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 27 December 2024
Online AM: 15 July 2024
Accepted: 06 July 2024
Revised: 07 June 2024
Received: 11 October 2023

Author Tags

1. Autonomous Driving Systems
2. Runtime Monitoring
3. Metamorphic Testing
4. Cyber-Physical Systems
5. Deep Neural Networks

Qualifiers

• Research-article

Funding Sources

• Basque Government through their Elkartek
• Department of Education, Universities and Research of the Basque Country
