A Hexagon Sensor and A Layer-Based Conversion Method for Hexagon Clusters
Figure 2. Graph of results of learning in a hexagon-cluster-based map without any sensor components provided to collect observations.
Figure 3. The RayPerceptionSensor, which utilizes collision data to collect observations.
Figure 4. Axial coordinate conversion, which approximates the hexagon cluster on a grid based on a diagonally tilted horizontal x-axis and a vertical y-axis.
Figure 5. Structure for sequentially observing cells in a hexagon cluster without empty cells, based on their position index.
Figure 6. Structure for sequentially observing cells in a hexagon cluster with empty cells, excluding the empty cells, based on their position index.
Figure 7. Overview of the observation collected by the buffer sensor. The observation consists of the x and y values of the coordinates produced by axial coordinate conversion and the internal values assigned to each cell.
Figure 8. Graph of results of learning in a hexagon-cluster-based map with the buffer sensor. The failure of the learning metrics to improve, except for entropy, indicates that the agent has been trained on an incorrect policy.
Figure 9. Observation structure of the buffer sensor; excluding empty cells from the observation results in misalignment between the position index and the observation index.
Figure 10. Observation structure of the hexagon sensor, which gathers the values written by all cells into an array, resulting in alignment between the position index and the observation index.
Figure 11. Overview of the hexagon sensor observing the cells in a hexagon cluster, with position indexes assigned by axial coordinate conversion.
Figure 12. A conversion method that divides the hexagon cluster into multiple layers from the center and assigns position indexes to the cells according to their layers.
Figure 13. Overview of the hexagon sensor observing the cells in a hexagon cluster, with position indexes assigned by layer-based conversion.
Figure 14. An example of a generated map, consisting of 13 positive cells, 7 negative cells, and 1 destination cell, all connected without any disconnections.
Figure 15. The observable range of the hexagon sensor used in the experiment, covering the three layers surrounding the agent.
Figure 16. The directions of movement from the current cell based on the actions selected by the agent.
Figure 17. Hyper-parameter settings in the configuration file for learning with Unity ML-Agents.
Figure 18. Graph of learning results for each agent regarding reward, episode, and policy. The agent using the hexagon sensor and the layer-based conversion method proposed in this paper is denoted Agent C and marked in red.
Figure 19. Learning time required by each agent to reach the predetermined max steps.
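Several captions above refer to axial coordinate conversion, which places hexagon cells on a grid with a diagonally tilted horizontal x-axis and a vertical y-axis. As a minimal sketch of this well-known scheme (following the standard hexagonal-grid conventions described by Patel; the function names are illustrative and not taken from the paper):

```python
def offset_to_axial(col, row):
    """Convert odd-q offset coordinates (flat-top hexagons)
    to axial coordinates (q, r)."""
    q = col
    r = row - (col - (col & 1)) // 2
    return q, r

def axial_neighbors(q, r):
    """Return the six neighboring cells of an axial-coordinate hexagon cell."""
    directions = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    return [(q + dq, r + dr) for dq, dr in directions]
```

With axial coordinates, every cell has exactly six neighbors reachable by constant offsets, which is what makes the conversion convenient for sequential observation of a hexagon cluster.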
Abstract
1. Introduction
2. Related Works
2.1. Sensor Components of Unity ML-Agents
2.2. Addressing Hexagon Cluster Information
3. Background and Motivation
4. Design
4.1. The Hexagon Sensor
4.2. Layer-Based Conversion
5. The Experimental Environment
5.1. The Map Design Based on the Hexagon Cluster
5.2. Creation of the Cells
5.3. Sensor Range
5.4. The Agent
5.5. Rewards
5.6. Hyper-Parameters and the Environment’s Spec
6. Performance Results
- Agent A: Converts the positions of the cells into coordinates using the axial coordinate conversion method and then collects observations, with the coordinates explicitly written, using the buffer sensor.
- Agent B: Converts the position of the cells into coordinates using the axial coordinate conversion method but collects observations using the hexagon sensor, without explicitly writing the coordinates.
- Agent C: Collects observations using the hexagon sensor with the layer-based conversion method, without converting the positions of the cells into coordinates.
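Agent C relies on the layer-based conversion illustrated in Figure 12: the cluster is divided into concentric layers around the center cell, and each cell receives a position index according to its layer. A sketch of one plausible way to enumerate cells layer by layer, using standard axial ring traversal (the helper names and the traversal's starting direction are assumptions, not details from the paper):

```python
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_ring(layer):
    """Axial coordinates of every cell in a given layer (ring) around the origin."""
    if layer == 0:
        return [(0, 0)]
    cells = []
    # Start 'layer' steps away in direction (-1, 1), then walk the six sides.
    q, r = -layer, layer
    for side in range(6):
        for _ in range(layer):
            cells.append((q, r))
            dq, dr = AXIAL_DIRECTIONS[side]
            q, r = q + dq, r + dr
    return cells

def layer_based_indexes(num_layers):
    """Assign sequential position indexes layer by layer from the center outward."""
    index = {}
    i = 0
    for layer in range(num_layers + 1):
        for cell in hex_ring(layer):
            index[cell] = i
            i += 1
    return index
```

Layer k contains 6k cells (and the center layer a single cell), so a three-layer observable range like the one in Figure 15 covers 1 + 6 + 12 + 18 = 37 cells, each with a stable position index regardless of empty cells elsewhere in the cluster.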
6.1. Learning Performance Evaluation
6.2. Inference Performance Evaluation
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Types of Reward | Reward Value
---|---
Move to a positive cell | +0.001
Move to a negative cell | −0.0005
Reaches the destination cell | 1 − (0.001 × Timer) ¹
Error penalty | −0.02
Continuous penalty | −0.0005
Excessive frame count penalty | −1
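The reward values above can be combined into a single per-event routine. A hypothetical sketch (the event names and the unit of the Timer value are our assumptions; the paper defines the reward scheme in Section 5.5):

```python
def step_reward(event, timer=0.0):
    """Return the reward for one event, per the reward table.
    'timer' is the elapsed time used to discount the destination reward."""
    rewards = {
        "positive_cell": 0.001,        # move to a positive cell
        "negative_cell": -0.0005,      # move to a negative cell
        "destination": 1 - 0.001 * timer,  # reach the destination, time-discounted
        "error": -0.02,                # error penalty
        "continuous": -0.0005,         # continuous penalty
        "excessive_frames": -1.0,      # excessive frame count penalty
    }
    return rewards[event]
```

The time-discounted destination reward means a slower agent earns strictly less for finishing, which pushes the policy toward shorter paths.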
Hardware and Software | Specification and Version
---|---
CPU | AMD Ryzen 5 5600X (6 cores)
GPU | NVIDIA GeForce RTX 3060 Ti
Memory | Samsung DDR5-4800 RAM (32 GB)
Storage | Samsung PM981a (1 TB)
ML-Agents release version | Release 21
ML-Agents Unity package | 3.0.0
ML-Agents Python package | 1.0.0
Python | 3.10.13
PyTorch | 1.13.1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kim, J.-H.; Sung, H. A Hexagon Sensor and A Layer-Based Conversion Method for Hexagon Clusters. Information 2024, 15, 747. https://doi.org/10.3390/info15120747