DOI: 10.5555/3635637.3663252
MAGNets: Micro-Architectured Group Neural Networks

Published: 06 May 2024

Abstract

Reinforcement Learning (RL) algorithms have achieved human-level performance in complex domains such as games, autonomous vehicles, and industrial robots. However, the Deep Neural Networks (DNNs) used to approximate large Deep Reinforcement Learning (DRL) policies are resource-hungry and opaque, which limits the applicability of DRL in safety-critical applications running on resource-constrained platforms. On the other hand, inspecting the design of most multi-output safety-critical embedded control applications reveals that such systems often derive each output from intermediate artifacts, which are in turn derived from the input variables. Given these dependencies of internal system states on inputs, each derived artifact can plausibly be approximated by a smaller network in a multi-network DRL setting. In this work, we propose Micro-Architectured Group Neural Networks (MAGNets), which distill the learning of a large DRL network into multiple small neural networks. Using several OpenAI Gym environments, we show that existing verification tools can verify the outputs of MAGNets while preserving the performance of the large neural policy. We also report gains in network compactness, which directly improve execution latency and applicability on edge devices.
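The decomposition described in the abstract can be illustrated with a minimal sketch: rather than one large policy mapping all inputs to all outputs, each output is approximated by its own small "micro" network that reads only the inputs it depends on, trained by distillation against the large teacher. The teacher, the input-dependency groups, and the linear students below are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: a fixed random "large" policy mapping 8 inputs -> 3 outputs
# (stands in for a trained DRL policy network).
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 3))

def teacher(x):
    return np.tanh(x @ W1) @ W2

# Assumed dependency structure: output i depends only on a subset of inputs.
groups = [[0, 1, 2], [2, 3, 4, 5], [5, 6, 7]]

# Students: one tiny linear model per output group, fitted by gradient
# descent on the distillation (mean-squared) loss against the teacher.
X = rng.normal(size=(512, 8))
Y = teacher(X)
students = [np.zeros(len(g)) for g in groups]

def distill_loss():
    preds = np.stack([X[:, g] @ w for g, w in zip(groups, students)], axis=1)
    return float(np.mean((preds - Y) ** 2))

before = distill_loss()
for _ in range(200):
    for i, (g, w) in enumerate(zip(groups, students)):
        err = X[:, g] @ w - Y[:, i]             # per-output residual vs. teacher
        w -= 0.01 * (X[:, g].T @ err) / len(X)  # gradient step on MSE
after = distill_loss()
print(before, after)
```

Because each student is small and reads only a few inputs, each one is far more amenable to the kinds of neural-network verification tools cited by the paper than the monolithic teacher would be.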



Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024
2898 pages
ISBN:9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. edge computation
  2. edge machine learning
  3. knowledge distillation
  4. reinforcement learning

Qualifiers

  • Research-article

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
