Abstract
This paper investigates the design of game-playing agents that automatically play Scotland Yard, an asymmetric hide-and-search board game with imperfect information. Neural network approaches have been developed to make the agents behave human-like, in the sense that they assess the game environment the way a human would. Specifically, a thorough investigation has been conducted into the application of adversarial neural networks combined with Q-learning for designing the game-playing agents. The searchers, called detectives, and the hider, called Mister X (Mr. X), have been modeled as neural network agents that play the game of Scotland Yard. Although it is a two-sided game, all five detectives must cooperate to capture the hider and win the game. A special feature space has been designed for both the detectives and Mr. X that aids cooperation among the detectives. Rigorous experiments have been conducted, and the performance in each experiment has been noted. The obtained results demonstrate that the designed neural agents show promising performance in learning the game, cooperating, and making efforts to win.
Notes
\(l_i\) denotes the number of neurons in layer \(i\).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest in relation to the publication of this article.
Additional information
Sahith N. Dambekodi and Preetham N. Reddy have contributed equally to this work.
Appendix 1: Some additional details
1.1 Scotland Yard: the implemented version
In the present work, we implement the original Scotland Yard board game. It bears some similarities to games of the same name played in Tokyo (Scotland Yard Tokyo) and in Switzerland (Scotland Yard - Swiss Edition). For clarity, we summarize the version implemented in this work. There are two teams in this game: Mr. X (alone) and five detectives out to catch him. Mr. X moves covertly, and his position is revealed only five times throughout the entire game. The detectives must cleverly corner and capture Mr. X before the game ends. Each location on the map is a node, numbered from 1 to 199. To travel between nodes, a player can use one of three transportation mechanisms: taxi, bus, and subway, each of which has different connectivity across the map. Each of these services is paid for with tokens given to the players at the start of the game. We have not used double turns and waterways in our implementation of the game. What makes the game more interesting is that after each move by a detective, the detective hands the used token to Mr. X, who can use it himself in his covert movements. The overall result is more interaction between the two parties, with the detectives able to affect the game in more than one way.
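The following is a minimal sketch (not the authors' implementation) of how the board just described can be represented: one adjacency map per transport mode, plus per-player token counts and the token hand-off rule. The node numbers and token counts in the usage example are illustrative placeholders.

```python
from collections import defaultdict

TRANSPORT_MODES = ("taxi", "bus", "subway")

class Board:
    def __init__(self):
        # adjacency[mode][node] -> set of nodes reachable from `node` by `mode`
        self.adjacency = {mode: defaultdict(set) for mode in TRANSPORT_MODES}

    def add_edge(self, mode, u, v):
        # connections on the Scotland Yard map are bidirectional
        self.adjacency[mode][u].add(v)
        self.adjacency[mode][v].add(u)

    def legal_moves(self, node, tokens):
        # a move is legal if the player holds a token for that mode and
        # the mode connects the current node to some neighbour
        return [(mode, nbr)
                for mode in TRANSPORT_MODES if tokens.get(mode, 0) > 0
                for nbr in self.adjacency[mode][node]]

def spend_token(detective_tokens, mr_x_tokens, mode):
    # the hand-off described above: a detective's spent token goes to Mr. X
    detective_tokens[mode] -= 1
    mr_x_tokens[mode] = mr_x_tokens.get(mode, 0) + 1

# Illustrative usage with placeholder edges:
board = Board()
board.add_edge("taxi", 1, 8)
board.add_edge("bus", 1, 58)
print(board.legal_moves(1, {"taxi": 10, "bus": 0}))  # [("taxi", 8)]
```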
1.2 Q-learning: configuration
We have used the following configuration for the Q-network. The detectives and Mr. X move using a static (frozen) network until the game is over, and each of their moves is stored in the replay memory. At the end of the game, the appropriate reward is given according to which team won (either Mr. X or the detectives). After the game is done, during the learning phase, the reward is back-propagated through each move made during the entire game. As we move backwards through the memory, the reward is diminished exponentially; in this way, greater weight is given to the late-game turns.
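A minimal sketch of this backward reward assignment, assuming the replay memory is a chronological list of (state, action) pairs from one game. The discount factor `gamma` is an illustrative hyper-parameter, not a value taken from the paper.

```python
def assign_discounted_rewards(replay_memory, terminal_reward, gamma=0.95):
    """Propagate the terminal reward backwards with exponential decay,
    so late-game moves receive the largest share."""
    targets = []
    reward = terminal_reward
    for state, action in reversed(replay_memory):
        targets.append((state, action, reward))
        reward *= gamma  # diminish as we move backwards from the final turn
    targets.reverse()    # restore chronological order for training
    return targets
```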
Exploration is linearly decreased from 1 to 0 (as shown in the figures above) over the total number of turns in the game. Exploration is defined as the probability that the agent takes a random move instead of using the network.
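A minimal sketch of this linear schedule, assuming the turn index and total turn count are known; with probability epsilon the agent moves randomly, otherwise it follows the Q-network's choice.

```python
import random

def epsilon(turn, total_turns):
    # linearly decays from 1 at turn 0 to 0 at the final turn
    return max(0.0, 1.0 - turn / total_turns)

def select_move(turn, total_turns, legal_moves, network_move):
    if random.random() < epsilon(turn, total_turns):
        return random.choice(legal_moves)  # explore: random legal move
    return network_move                    # exploit: the Q-network's move
```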
The reward system is +100 for a win and −100 for a loss. For the detectives, the reward is scaled dynamically based on each detective's distance from Mr. X when the game is over: the closer a detective is to Mr. X, the higher its fraction of the +100 reward on a win, and the lower its fraction of the −100 penalty on a loss.
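A minimal sketch of this distance-weighted split. The exact weighting function is an assumption chosen only to match the qualitative rule above: `distances` holds each detective's shortest-path distance to Mr. X at game end.

```python
def detective_rewards(distances, win):
    """Split the terminal reward among detectives by proximity to Mr. X."""
    rewards = {}
    for det, d in distances.items():
        closeness = 1.0 / (1.0 + d)  # near 1 when adjacent, tends to 0 when far
        if win:
            rewards[det] = 100.0 * closeness           # closer => larger reward
        else:
            rewards[det] = -100.0 * (1.0 - closeness)  # closer => milder penalty
    return rewards
```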