research-article

MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems

Authors:

Miren Illarramendi,

Aitor Arrieta

ACM Transactions on Software Engineering and Methodology, Volume 34, Issue 1

Article No.: 18, Pages 1 - 35

https://doi.org/10.1145/3678171

Published: 27 December 2024

Abstract

Autonomous driving systems (ADSs) are complex cyber-physical systems (CPSs) that must ensure safety even in uncertain conditions. Modern ADSs often employ deep neural networks (DNNs), which may not produce correct results in every possible driving scenario. Thus, an approach to estimate the confidence of an ADS at runtime is necessary to prevent potentially dangerous situations. In this article, we propose MarMot, an online monitoring approach for ADSs based on metamorphic relations (MRs), which are properties of a system that hold among multiple inputs and the corresponding outputs. Using domain-specific MRs, MarMot estimates the uncertainty of the ADS at runtime, allowing the identification of anomalous situations that are likely to cause faulty behavior of the ADS, such as driving off the road.

We perform an empirical assessment of MarMot with five different MRs, using two different subject ADSs: a small-scale physical ADS and a simulated ADS. Our evaluation encompasses the identification of both external anomalies (e.g., fog) and internal anomalies (e.g., faulty DNNs due to mislabeled training data). Our results show that MarMot can identify up to 65% of the external anomalies and 100% of the internal anomalies in the physical ADS, and up to 54% of the external anomalies and 88% of the internal anomalies in the simulated ADS. With these results, MarMot outperforms or is comparable to other state-of-the-art approaches, including SelfOracle, Ensemble, and MC Dropout-based ADS monitors.
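
The monitoring idea summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the horizontal-flip relation is a commonly used example MR and is not necessarily one of the five MRs evaluated in the article, and `toy_model`, `mr_violation`, `monitor`, and the threshold value are hypothetical names and parameters, not MarMot's actual implementation. The sketch feeds a source input and a transformed follow-up input to the model, measures how far the two outputs deviate from the expected relation, and treats a large deviation as a sign of an anomalous situation.

```python
from typing import Callable, List

Image = List[List[float]]         # grayscale frame as rows of pixel intensities
Model = Callable[[Image], float]  # hypothetical steering model: frame -> angle


def flip_horizontal(image: Image) -> Image:
    """Mirror the frame left-to-right (the follow-up input of the MR)."""
    return [list(reversed(row)) for row in image]


def mr_violation(model: Model, image: Image) -> float:
    """Illustrative MR (an assumption): mirroring the scene negates the angle.

    Returns |f(x) + f(flip(x))|, which is 0 when the relation holds exactly.
    """
    source_output = model(image)
    follow_up_output = model(flip_horizontal(image))
    return abs(source_output + follow_up_output)


def monitor(model: Model, frames: List[Image], threshold: float) -> List[bool]:
    """Flag each frame whose MR violation exceeds a (hypothetical) threshold."""
    return [mr_violation(model, frame) > threshold for frame in frames]


def toy_model(image: Image) -> float:
    """Toy stand-in for a DNN: steer toward the brighter half of the frame."""
    width = len(image[0])
    half = width // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[width - half:]) for row in image)
    return (right - left) / (left + right + 1e-9)


def biased_model(image: Image) -> float:
    """Toy 'faulty' model: a constant offset breaks the mirror symmetry."""
    return toy_model(image) + 0.3


if __name__ == "__main__":
    frame = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
    print(monitor(toy_model, [frame], threshold=0.1))
    print(monitor(biased_model, [frame], threshold=0.1))
```

In this toy setting, the symmetric model satisfies the relation while the biased one violates it, so only the latter is flagged. A real monitor would need MRs suited to the driving domain and a threshold calibrated on nominal data.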

Cited By

M. Caro, A. Brando, and J. Abella. 2025. Semantic Diverse DMR and TMR for High-Integrity AI-Based Function Efficiency. ACM Transactions on Cyber-Physical Systems. Online publication date: 31 January 2025. https://doi.org/10.1145/3716140

R. Grewal, P. Tonella, and A. Stocco. 2024. Predicting Safety Misbehaviours in Autonomous Driving Systems Using Uncertainty Quantification. In 2024 IEEE Conference on Software Testing, Verification and Validation (ICST). 70–81. Online publication date: 27 May 2024. https://doi.org/10.1109/ICST60714.2024.00016

Index Terms

MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems
1. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software safety

Published In

ACM Transactions on Software Engineering and Methodology Volume 34, Issue 1

January 2025

967 pages

EISSN: 1557-7392

DOI: 10.1145/3703005

Editor:
Abhik Roychoudhury
National University of Singapore, Singapore

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2024

Online AM: 15 July 2024

Accepted: 06 July 2024

Revised: 07 June 2024

Received: 11 October 2023

Published in TOSEM Volume 34, Issue 1

Qualifiers

Research-article

Funding Sources

Basque Government through their Elkartek
Department of Education, Universities and Research of the Basque Country
