Advances
in Swarm Intelligence
12th International Conference, ICSI 2021
Qingdao, China, July 17–21, 2021
Proceedings, Part I
Lecture Notes in Computer Science 12689
Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA
Editors
Ying Tan
Peking University, Beijing, China
Yuhui Shi
Southern University of Science and Technology, Shenzhen, China
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This book and its companion volume, comprising LNCS volumes 12689 and 12690,
constitute the proceedings of the Twelfth International Conference on
Swarm Intelligence (ICSI 2021), held during July 17–21, 2021, in Qingdao, China, both
on-site and online.
The theme of ICSI 2021 was “Serving Life with Swarm Intelligence.” The conference
provided an excellent opportunity for academics and practitioners to present
and discuss the latest scientific results and methods, innovative ideas, and advances
in theories, technologies, and applications of swarm intelligence. The technical
program covered a number of aspects of swarm intelligence and its related areas. ICSI
2021 was the twelfth international gathering for academics and researchers working on
most aspects of swarm intelligence, following successful events in Serbia (ICSI 2020,
virtually), Chiang Mai (ICSI 2019), Shanghai (ICSI 2018), Fukuoka (ICSI 2017), Bali
(ICSI 2016), Beijing (ICSI-CCI 2015), Hefei (ICSI 2014), Harbin (ICSI 2013),
Shenzhen (ICSI 2012), Chongqing (ICSI 2011), and Beijing (ICSI 2010), which
provided a high-level academic forum for participants to disseminate their new research
findings and discuss emerging areas of research. ICSI 2021 also created a stimulating
environment for participants to interact and exchange information on future challenges
and opportunities in the field of swarm intelligence research.
Due to the ongoing COVID-19 pandemic, ICSI 2021 provided opportunities for
both online and offline presentations. On the one hand, ICSI 2021 was held normally in
Qingdao, China, but on the other hand, the ICSI 2021 technical team provided the
ability for authors who were subject to restrictions on overseas travel to present their
work through an interactive online platform or video replay. The presentations by
accepted authors were made available to all registered attendees on-site and online.
The host city of ICSI 2021, Qingdao (also spelled Tsingtao), is a major
sub-provincial city in the eastern Shandong province, China. Located on the western
shore of the Yellow Sea, Qingdao is a major nodal city on the 21st Century Maritime
Silk Road arm of the Belt and Road Initiative that connects East Asia with Europe, and
has the highest GDP of any city in the province. It had jurisdiction over seven districts
and three county-level cities till 2019, and as of 2014 had a population of 9,046,200
with an urban population of 6,188,100. Lying across the Shandong Peninsula and
looking out to the Yellow Sea to its south, Qingdao borders the prefectural cities of
Yantai to the northeast, Weifang to the west, and Rizhao to the southwest.
ICSI 2021 received 177 regular and invited submissions from about 392
authors in 32 countries and regions (Algeria, Australia, Bangladesh, Belgium, Brazil,
Bulgaria, Canada, China, Colombia, India, Italy, Japan, Jordan, Mexico, Nigeria, Peru,
Portugal, Romania, Russia, Saudi Arabia, Serbia, Slovakia, South Africa, Spain,
Sweden, Taiwan (China), Thailand, Turkey, United Arab Emirates, UK, USA, and
Vietnam) across 6 continents (Asia, Europe, North America, South America, Africa,
and Oceania). Each submission was reviewed by at least two reviewers, receiving
2.5 reviews on average. Based on rigorous reviews by the Program Committee members
and additional reviewers, 104 high-quality papers were selected for publication in this
proceedings, an acceptance rate of 58.76%. The papers are organized into 16 cohesive
sections covering major topics of swarm intelligence research and its development and
applications.
On behalf of the Organizing Committee of ICSI 2021, we would like to express our
sincere thanks to the International Association of Swarm and Evolutionary Intelligence
(IASEI), which is the premier international scholarly society devoted to advancing the
theories, algorithms, real-world applications, and developments of swarm intelligence
and evolutionary intelligence. We would also like to thank Peking University, Southern
University of Science and Technology, and Ocean University of China for their
co-sponsorships, and the Computational Intelligence Laboratory of Peking University
and IEEE Beijing Chapter for their technical co-sponsorships, as well as our supporters
including the International Neural Network Society, World Federation on Soft Com-
puting, International Journal of Intelligence Systems, MDPI’s journals Electronics and
Mathematics, Beijing Xinghui Hi-Tech Co., and Springer Nature.
We would also like to thank the members of the Advisory Committee for their
guidance, the members of the Program Committee and additional reviewers for
reviewing the papers, and the members of the Publication Committee for checking the
accepted papers in a short period of time. We are particularly grateful to Springer for
publishing the proceedings in the prestigious series of Lecture Notes in Computer
Science. Moreover, we wish to express our heartfelt appreciation to the plenary
speakers, session chairs, and student helpers. In addition, there are still many more
colleagues, associates, friends, and supporters who helped us in immeasurable ways;
we express our sincere gratitude to them all. Last but not the least, we would like to
thank all the speakers, authors, and participants for their great contributions that made
ICSI 2021 successful and all the hard work worthwhile.
General Co-chairs
Ying Tan Peking University, China
Russell C. Eberhart IUPUI, USA
Tutorial Co-chairs
Junqi Zhang Tongji University, China
Shi Cheng Shanxi Normal University, China
Publications Co-chairs
Swagatam Das Indian Statistical Institute, India
Radu-Emil Precup Politehnica University of Timisoara, Romania
Publicity Co-chairs
Yew-Soon Ong Nanyang Technological University, Singapore
Carlos Coello CINVESTAV-IPN, Mexico
Yaochu Jin University of Surrey, UK
Rossi Kamal GERIOT, Bangladesh
Dongbin Zhao Institute of Automation, China
Conference Secretariat
Renlong Chen Peking University, China
Program Committee
Ashik Ahmed Islamic University of Technology, Bangladesh
Rafael Alcala University of Granada, Spain
Abdelmalek Amine Tahar Moulay University of Saida, Algeria
Sabri Arik Istanbul University, Turkey
Carmelo J. A. Bastos Filho University of Pernambuco, Brazil
Sandeep Bhongade G.S. Institute of Technology, India
Sujin Bureerat Khon Kaen University, Thailand
Bin Cao Tsinghua University, China
Additional Reviewers
Differential Evolution
Fireworks Algorithms
Multi-objective Optimization
Map Fusion Method Based on Image Stitching for Multi-robot SLAM (p. 146)
Qirong Tang, Kun Zhang, Pengjie Xu, Jingtao Zhang, and Yuanzhe Cui
Machine Learning
Data Mining
Artificial Fish Swarm Algorithm for Mining High Utility Itemsets (p. 407)
Wei Song, Junya Li, and Chaomin Huang
Other Applications
1 Introduction
A basic feature of modern swarm development is the growing complexity of the tasks
that every unit must solve when working on a common swarm goal [1–3]. When physical
swarm units operate in a shared environment, the problem is to minimize the time
spent on mutual control of the units, which increases the demands on the units' digital
control systems [4]. As a rule, Von Neumann computers are used to control the unit's
onboard equipment. Compared with analogue controllers, such a device possesses new
properties that follow from the sequential, operator-by-operator interpretation of the
algorithm [5–7] embedded into the controller. Consequently, time must be spent to
calculate the action transmitted to an actuator after data are received from the sensors [8].
The time intervals that emerge between input/output data vectors (data skew), and
between input data from sensors and output data to actuators (pure lag), affect the
quality characteristics of the swarm unit control system as a whole [9, 10], so they
should be taken into account when designing the system.
There is no difficulty in estimating these time intervals in the simple case where the
cyclic control algorithm includes only input, output, calculation, and return operators.
But when the structure of the software, involving transaction operators, is more
complicated, the problem is to estimate the time intervals between transactions at the
algorithm design stage. To solve this problem, one should take into account the following
facts: the data formed at the outputs of the swarm unit's sensors are random; the data may
be processed by quite different algorithm branches, possessing quite different time
complexities; and the algorithm includes decision operators at branching points. Therefore,
to facilitate the solution, semi-Markov process theory [11–14] should be accepted as the
basic concept for control algorithm simulation. Methods for simulating a swarm unit's
digital control system at the design stage, in order to determine unit performance, are not
widespread, which confirms the necessity and relevance of investigation in this area.
The structure of a multi-loop (K-loop) digital control system (DCS) is shown in Fig. 1.
It is rather classic, and includes two subsystems: a linear Object Under Control (OUC)
and a Digital Controller (DC). The OUC consists of K units, each of which is described
by a transfer function vector Wk(s) = [Wk1(s), ..., Wkl(s), ..., WkK(s)] and a scalar
feedback transfer function W0,k(s), 1 ≤ k ≤ K. The vectors Wk(s) and scalars W0,k(s)
describe the dynamics of the k-th OUC unit itself and its feedback sensor, respectively.
The DC is a real-time Von Neumann type computer, which interprets the control program
and cyclically generates quests both to actuators and to sensors to organize the control
procedure.
Fig. 1. Structure of the K-loop digital control system: the digital controller (DC) drives the
object under control (OUC) units Wk(s), with inputs Fk(s), controls Uk(s), states Xk(s), and
feedback signals X0,k(z) formed through the sensors W0,k(s)
The vector U(s) = [U1(s), ..., Uk(s), ..., UK(s)] is generated by the software, and is
physically transformed into the swarm unit state X(s) = [X1(s), ..., Xk(s), ..., XK(s)] as
follows:

X(s) = U(s) · W(s),   (1)

where W(s) = [W1(s); ...; Wk(s); ...; WK(s)] is the K × K matrix whose rows are the
vectors Wk(s).
In accordance with the theorem about shifting in the time domain [15, 16],

L[ϕ(t − τ)] = exp(−τs) · Φ(s),   (2)

where τ is the shifting value; t is the time; ϕ(t) is a function; L[...] denotes the direct
Laplace transform; Φ(s) is the Laplace transform of ϕ(t).

From (2) it follows that

Fsh(s) = F(s) · Qf(s);   (3)

Xsh(s) = X0(s) · Q0(s);   (4)

Ush(s) = U(s) · Qu(s),   (5)

where Fsh(s), Xsh(s), Ush(s) are the vectors F(s), X0(s), U(s) whose elements are delayed
in time; Qf(s) = [Qf,kl(s)], Q0(s) = [Q0,kl(s)], Qu(s) = [Qu,kl(s)] are diagonal lag
matrices, in which

Qf,kl(s) = { 0, when k ≠ l; 1, when k = l = 1; exp(−τf,k·s), when 2 ≤ k = l ≤ K;   (6)

Q0,kl(s) = { 0, when k ≠ l; exp(−τ0,k·s), when k = l;   (7)

Qu,kl(s) = { 0, when k ≠ l; exp(−τu,k·s), when k = l.   (8)
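As an illustration of Eqs. (6)–(8), the following sketch (a hypothetical numerical example with assumed delay values, not part of the original method) builds the diagonal lag matrices and evaluates them at one point s of the complex frequency plane:

```python
import numpy as np

def lag_matrix(taus, s, unit_first=False):
    """Diagonal lag matrix with elements exp(-tau_k * s).

    If unit_first is True, the first diagonal element is 1
    (no delay on the first channel), as in Eq. (6).
    """
    K = len(taus)
    Q = np.zeros((K, K), dtype=complex)
    for k in range(K):
        if unit_first and k == 0:
            Q[k, k] = 1.0
        else:
            Q[k, k] = np.exp(-taus[k] * s)
    return Q

# Hypothetical delays for a K = 3 loop system.
tau_f = [0.0, 0.01, 0.02]     # lags of F(s) elements, Eq. (6)
tau_0 = [0.005, 0.01, 0.015]  # lags of X0(s) elements, Eq. (7)
tau_u = [0.02, 0.02, 0.03]    # lags of U(s) elements, Eq. (8)

s = 1.0 + 2.0j  # an arbitrary point of the complex frequency plane
Qf = lag_matrix(tau_f, s, unit_first=True)
Q0 = lag_matrix(tau_0, s)
Qu = lag_matrix(tau_u, s)
# A delayed vector is then obtained as, e.g., Fsh(s) = F(s) @ Qf.
```

Since the matrices are diagonal, multiplying a row vector by them simply scales each element by its own lag factor exp(−τk·s), which is exactly the time shift of Eq. (2) seen in the frequency domain.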
The data processed in the DC are discrete, so strictly speaking the ordinary transfer
function apparatus is not fit for describing the calculation of the elements of the vector
U(s). But when the sampling period approaches zero, the data processing may be
described in the frequency domain by an ordinary transfer function matrix Wc(s). So, in
the case when the DC realizes a linear control law while processing the feedback signal,
the following vector signal is generated at its outputs U(s) [5, 6]:

where Wc(s) = [Wc,kl(s)] is the K × K matrix of linear transfer functions, which are
embedded into the DC as software; W0(s) = [W0,kl(s)] is the K × K diagonal matrix,
whose elements are as follows:

W0,kl(s) = { 0, when k ≠ l; W0,k(s), when k = l.   (10)

Simultaneous solution of (9) and (11) with respect to X(s) gives the following
expression.
For the estimation of time intervals, a model of Von Neumann computer operation in the
time domain should be worked out. For simplicity, it may be represented as including
transaction operators only. The control process in such a model is reduced to reading the
elements of the vectors F(s), X0(s) from the interface and writing the elements of the
vector U(s) to the interface. The algorithm that generates quests is cyclic, but has no
looping effect. The algorithm may generate transactions in an arbitrary sequence, with
one exception: the same transaction cannot be generated twice at a time. Also, due to the
fact that all elements of the vectors F(s) and X0(s) should be used for the calculation of
the control action U(s), the strong connectivity condition should be imposed [17, 18] on
the graph that represents the structure of the control algorithm. In the common case,
such properties are possessed by the full oriented graph without loops, shown in
Fig. 2 a. In the simplest case, the elements of the vectors F(s) and X0(s) are quested
in turn, after which the control action is calculated, and then the elements of U(s) are
quested in turn (Fig. 2 b).
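The in-turn transaction sequence of Fig. 2 b can be sketched as a cyclic control loop. In the sketch below the sensor and actuator calls are stubs standing in for real interface transactions, and the recorded timestamps illustrate where data skew and pure lag arise:

```python
import time

def read_sensor(k):
    return 0.0  # stub standing in for an input transaction

def write_actuator(k, u):
    pass  # stub standing in for an output transaction

K = 3
for _ in range(3):  # a few control cycles
    t_read = []
    x = []
    for k in range(K):            # quest sensor elements in turn
        x.append(read_sensor(k))
        t_read.append(time.perf_counter())
    u = [-0.5 * xk for xk in x]   # placeholder linear control law
    t_write = []
    for k in range(K):            # quest actuator elements in turn
        write_actuator(k, u[k])
        t_write.append(time.perf_counter())
    data_skew = t_read[-1] - t_read[0]  # spread inside one input vector
    pure_lag = t_write[0] - t_read[-1]  # input-to-output (computation) delay
```

In a real controller the time spent in each transaction and in the control-law calculation is random, which is exactly why the semi-Markov apparatus introduced next is needed to estimate these intervals at design time.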
Taking into account the randomness of the time intervals between transactions and, for
an external observer, the stochastic transaction sequence, an adequate approach to
algorithm simulation is the semi-Markov process [11–14], whose states are abstract
analogues of the algorithm operators. The semi-Markov process is represented by the
semi-Markov matrix

h(t) = [hkl(t)] = [gkl(t)] ⊗ [pkl(t)],   (12)

where pkl(t) is the probability of direct switching from the k-th state to the l-th state;
gkl(t) is the time density of residence of the process (12) in the k-th state before
switching into the l-th state; ⊗ is the direct multiplication sign; t is the physical time.
Fig. 2. Common structure of the semi-Markov process (a), simplest case (b), and the model for
time interval estimation (c)
The semi-Markov process (12) is ergodic and includes neither absorbing nor partially
absorbing states. Due to the ergodicity of the semi-Markov process, the following
restrictions are imposed on the densities gkl(t) and probabilities pkl(t):

Σ_{l=1}^{3K} pkl = 1,   (14)

where 3K is the common quantity of transaction operators; Tklmin and Tklmax are the
lower and upper bounds of the domain of the density gkl(t).
When estimating the time intervals between transactions, it does not matter how the
semi-Markov process (12) reaches the l-th state from the first one. What matters is that
the switch is the first, and not the second, third, etc. For time interval estimation, the
initial semi-Markov process should be transformed into a process with the structure
shown in Fig. 2 c, in which the first state is the starting one, and the l-th state is the
absorbing one. To obtain such a structure, the probabilities are recalculated according
to (15). In such a way,

h(t) → h′(t) = [gkl(t) · p′kl].   (16)
After recalculation of the probabilities according to (15), the partially absorbing states
are annihilated, and the events of reaching the l-th state from the first state make up a
full group of incompatible events. In such a way, the time density of wandering from
the first state to the l-th state may be estimated as follows [19]:

g1l(t) = I1r · L−1[ Σ_{j=1}^{∞} [L[h′(t)]]^j ] · Ilc,   (17)

where L−1[...] is the inverse Laplace transform; I1r is the row vector whose first element
is equal to one, with all other elements equal to zero; Ilc is the column vector whose l-th
element is equal to one, with all other elements equal to zero.
For the time density (17), the expectation and the dispersion may be calculated as
usual [20]:

T1l = ∫_0^∞ t · g1l(t) dt;   (18)

D1l = ∫_0^∞ (t − T1l)² · g1l(t) dt.   (19)
In the simplest case, the density, expectation, and dispersion of the time of reaching the
l-th state from the first are as follows:

g1l(t) = L−1[ Π_{k=1}^{l−1} L[g_{k,k+1}(t)] ],   (20)

T1l = Σ_{k=1}^{l−1} T_{k,k+1};   (21)

D1l = Σ_{k=1}^{l−1} D_{k,k+1},   (22)

where g_{k,k+1}(t), T_{k,k+1}, D_{k,k+1} are the density, expectation, and dispersion of the
time of residence of the process shown in Fig. 2 c in the k-th state before switching
into the (k + 1)-th state.
The expectations T1l give middle estimates τ1l = T1l of the time delays. The time
intervals may also be estimated using the “three sigma rule” [21], as follows:

τ1l = T1l + 3·√D1l.   (23)
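The estimates (17)–(23) can be cross-checked by direct simulation. The sketch below is an illustrative Monte Carlo, not the authors' analytical procedure: it assumes exponential sojourn densities (a hypothetical choice) and 0-based state indices, wanders over a small absorbing chain of the Fig. 2 c type, and estimates the expectation (18), the dispersion (19), and the three-sigma bound (23):

```python
import random

def simulate_first_passage(P, rates, l, n_runs=20000, seed=0):
    """Monte Carlo estimate of the first-passage time from state 0 to state l.

    P[k] lists the switching probabilities from state k (each row sums
    to 1, Eq. 14); sojourn times are drawn, for illustration only, from
    exponential densities with the given rates.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_runs):
        state, t = 0, 0.0
        while state != l:
            t += rng.expovariate(rates[state])  # residence time in `state`
            # choose the next state according to the switching probabilities
            state = rng.choices(range(len(P)), weights=P[state])[0]
        times.append(t)
    T = sum(times) / n_runs                        # expectation, Eq. (18)
    D = sum((t - T) ** 2 for t in times) / n_runs  # dispersion, Eq. (19)
    return T, D

# A 3-state chain 0 -> 1 -> 2, the "simplest case" of Fig. 2 b/c;
# state 2 is absorbing (its rate is never used).
P = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
rates = [10.0, 5.0, 1.0]  # hypothetical sojourn rates
T, D = simulate_first_passage(P, rates, l=2)
tau = T + 3 * D ** 0.5    # three-sigma estimate, Eq. (23)
```

For this chain, Eqs. (21)–(22) give T = 1/10 + 1/5 = 0.3 and D = 1/100 + 1/25 = 0.05, and the simulated values converge to these sums, illustrating that expectations and dispersions of the stage sojourn times simply add along the chain.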
Estimates (17)–(23) define the lags of the elements of the input/output vectors F(s),
X0(s), U(s) with respect to the input of the element F1(s). All other delays may be
obtained from these parameters. For example, the delay between the input of the l-th
element, 1 ≤ l ≤ 2K, and the output of the m-th element, 2K + 1 ≤ m ≤ 3K, may be
defined as the difference of the corresponding delay estimates.
When obtaining the swarm unit control system closed-loop transfer function according
to (11), estimates (18), (21), or (23) may be used.
Fig. 3. Structure of the two-loop example system: inputs F1(s), F2(s), controls U1(s), U2(s),
states X1(s), X2(s), and cross-coupled transfer functions W11(s), W12(s), W21(s), W22(s)
between the DC and the OUC
In all the named plots, the processes start with delays defined by the output lags of the
signals U1(s), U2(s). Figure 4 b demonstrates a system that is on the whole stable, but
with increased overshoot and settling time. Figure 4 c demonstrates the performance of
a system close to the stability border, and Fig. 4 d shows a fully unstable system.
5 Conclusion
As a result, a mathematical model of a physical swarm unit's digital control system,
which takes into account the real characteristics of Von Neumann type controllers, has
been worked out. A method is proposed for estimating the time intervals between the
transactions, generated by a digital controller algorithm of arbitrary complexity, to the
unit's actuators and sensors. It is shown that time delays between input/output elements
of the same vector (data skew), and between the input of data from sensors and the
output of data to actuators (feedback lag), cause deterioration of swarm unit performance
characteristics, such as overshoot and settling time. The results of the investigation may
be recommended for use in the engineering practice of swarm unit software design.
Further investigations in this domain may be directed at working out methods for the
synthesis of practical swarm control algorithms that are optimal with respect to the
complexity-quality ratio.
The research was supported by the Foundation for Basic Research under the project
19-47-710004 r_a.
References
1. Bouallègue, S., Haggège, J., Ayadi, M., Benrejeb, M.: PID-type fuzzy logic controller tuning
based on particle swarm optimization. Eng. Appl. Artif. Intell. 25(3), 484–493 (2012). https://
doi.org/10.1016/j.engappai.2011.09.018
2. Reyes-Sierra, M., Coello Coello, C.A.: Multi-objective particle swarm optimizers: a survey
of the state-of-the-art. Int. J. Comput. Intell. Res. 2(3), 287–308 (2006). https://doi.org/10.
5019/j.ijcir.2006.68
3. Babishin, V., Taghipour, S.: Optimal maintenance policy for multicomponent systems with
periodic and opportunistic inspections and preventive replacements. Appl. Math. Model.
40(23), 10480–10505 (2016). https://doi.org/10.1016/j.apm.2016.07.019
4. Larkin, E., Antonov, M.: On assessing the temporal characteristics of reaching the milestone
by a swarm. In: Tan, Y., Shi, Y., Tuba, M. (eds.) ICSI 2020. LNCS, vol. 12145, pp. 46–55.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-53956-6_5
5. Landau, I.D., Zito, G.: Digital Control Systems, Design, Identification and Implementation,
p. 484. Springer, Heidelberg (2006)
6. Aström, J., Wittenmark, B.: Computer Controlled Systems: Theory and Design, p. 557.
Tsinghua University Press. Prentice Hall (2002)
7. Fadali, M.S., Visioli, A.: Digital Control Engineering: Analysis and Design, pp. 239–272.
Elsevier Inc. (2013)
8. Larkin, E.V., Ivutin, A.N.: Estimation of latency in embedded real-time systems. In: 3rd
Mediterranean Conference on Embedded Computing (MECO 2014), Budva, Montenegro,
pp. 236–239 (2014)
9. Auslander, D.M., Ridgely, J.R., Jones, J.C.: Real-time software for implementation of feed-
back control. In: Levine, W.S. (ed.) The Control Handbook. Control System Fundamentals,
pp. 16-1–16-32. CRC Press. Taylor and Francis Group (2017)
10. Karnopp, D.C., Margolis, D.L., Rosenberg, R.C.: System Dynamics: Modeling, Simulation
and Control of Mechatronic Systems, p. 636. Wiley, Hoboken (2012)
11. Bielecki, T., Jakubowski, J., Niew˛egłowski, M.: Conditional Markov chains: properties, con-
struction and structured dependence. Stochast. Process. Appl. 127(4), 1125–1170 (2017).
https://doi.org/10.1016/j.spa.2016.07.010
12. Ching, W.K., Huang, X., Ng, M.K., Siu, T.K.: Markov Chains: Models, Algorithms and
Applications. International Series in Operations Research & Management Science, vol. 189,
p. 241. Springer, New York (2013)
13. Howard, R.A.: Dynamic Probabilistic Systems, vol. 1: Markov Models. vol. II: Semi-Markov
and Decision Processes. Courier Corporation (2012)
14. Janssen, J., Manca, R.: Applied Semi-Markov Processes, p. 310. Springer, Heidelberg (2006)
15. Schiff, J.L.: The Laplace Transform: Theory and Applications, p. 233. Springer, New York
(1991)
16. Li, J., Farquharson, C.G., Hu, X.: Three effective inverse Laplace transform algorithms for
computing time-domain electromagnetic responses. Geophysics 81(2), E75–E90 (2015)
17. Arnold, K.A.: Timing analysis in embedded systems. In: Ganssler, J., Arnold, K., et al. (eds.)
Embedded Hardware, pp. 239–272. Elsevier Inc. (2008)
18. Balsamo, S., Harrison, P., Marin, A.: Methodological construction of product-form stochastic
Petri nets for performance evaluation. J. Syst. Softw. 85(7), 1520–1539 (2012). https://doi.
org/10.1016/j.jss.2011.11.1042
19. Larkin, E., Akimenko, T., Privalov, A.: Synchronized swarm operation. In: Tan, Y., Shi, Y.,
Tuba, M. (eds.) Advances in Swarm Intelligence, ICSI 2020, pp. 15–24. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-53956-6_2
20. Kobayashi, H., Mark, B.L., Turin, W.: Probability, Random Processes and Statistical Analysis,
p. 812. Cambridge University Press (2012)
21. Pukelsheim, F.: The three sigma rule. Am. Stat. 48(2), 88–91 (1994)
Natural Emergence of Heterogeneous
Strategies in Artificially Intelligent
Competitive Teams
1 Introduction
Multi agent systems can play an important role in scenarios such as disaster
relief, defense against enemies and games. There have been studies on various
Supported by ONR grant N00014-19-C-1070, AFOSR/AFRL award FA9550-
18-1-0251, DARPA Cooperative Agreement No. HR00111820051, and AFOSR
award FA9550-15-1-0442.
© Springer Nature Switzerland AG 2021
Y. Tan and Y. Shi (Eds.): ICSI 2021, LNCS 12689, pp. 13–25, 2021.
https://doi.org/10.1007/978-3-030-78743-1_2
Fig. 1. The FortAttack environment in which guards (green) need to protect the fort
(cyan semicircle at the top) from the attackers (red). The attackers win when any one
of them reaches the fort. Each agent can shoot a laser which can kill an opponent.
aspects of it including task assignment [16], resilience to failure [12], scalability [1]
and opponent modeling [23]. Multi agent systems become increasingly complex
in mixed cooperative-competitive scenarios where an agent has to cooperate with
other agents of the same team to jointly compete against the opposing team. It
becomes difficult to model the behavior of an agent or a team by hand, and learning-based
methods are of particular appeal.
Our goal is to develop a learning based algorithm for decentralized control of
multi agent systems in mixed cooperative-competitive scenarios with the abil-
ity to handle a variable number of agents, as some robots may get damaged in
a real world scenario or some agents may get killed in a game. To be able to
handle a variable number of agents and to scale to many agents, we propose to
use a Graph Neural Networks (GNNs) based architecture to model inter-agent
interactions, similar to [1] and [3]. This approach relies on shared parameters
amongst all agents in a team which renders all of them homogeneous. We aim
to study whether heterogeneous behavior can emerge out of such homogeneous agents
(within the same team) when such behavior can lead to the team's success.
Such behavior implicitly includes heterogeneous task allocation and complex
coordination within a team, none of which had to be explicitly crafted but which
can be extremely beneficial for multi agent systems.
2 Related Work
2.1 Multi-agent Reinforcement Learning
The recent successes of reinforcement learning in games, [11,17] and robotics, [6,
15] have encouraged researchers to extend reinforcement learning to multi agent
settings.
There are three broad categories of approaches used, centralized, decentral-
ized and a mix of the two. Centralized approaches have a single reinforcement
learning agent for the entire team, which has global state information and selects
joint actions for the team. However, the joint state and action spaces grow exponentially
with the number of agents, rendering centralized approaches difficult to
scale [5].
Independent Q-learning, [19,20] is a decentralized approach where each agent
learns separately with Q-learning, [22] and treats all other agents as parts of the
environment. Inter agent interactions are not explicitly modeled and performance
is generally sub-par.
Centralized learning with decentralized execution has gained attention
because it is reasonable to remove communication restrictions at training time.
Some approaches use a decentralized actor with a centralized critic, which is
accessible only at training time. MADDPG, [10] learns a centralized critic for
each agent and trains policies using DDPG, [9]. QMIX, [13] proposes a mono-
tonic decomposition of action value function. However, the use of centralized
critic requires that the number of agents be fixed in the environment.
GridNet, [7] addresses the issue of multiple and variable number of agents
without exponentially growing the policy representation by representing a pol-
icy with an encoder-decoder architecture with convolution layers. However, the
centralized execution realm renders it infeasible in many scenarios.
Graphs can naturally model multi agent systems with each node representing
an agent. [18] modeled inter agent interactions in multi agent teams using GNNs
which can be learnt through back propagation. [8] proposed to use attention
and [1] proposed to use an entity graph for augmenting environment information.
However, these settings don’t involve two opposing multi agent teams that both
evolve by learning.
[3] explored multi agent reinforcement learning for the game of hide and
seek. They find that increasingly complex behaviors emerge out of the simple rules of
the game over many episodes of interaction. However, they relied on extremely
heavy computations spanning over many millions of episodes of environment
exploration.
We draw inspiration from [1] and [3]. For each team we propose to have two
components within the graph, one to model the observations of the opponents
and one to model the interactions with fellow team mates. Our work falls in the
paradigm of centralized training with decentralized execution. We were able to
train our agents in the FortAttack environment using the proposed approach on
a commodity laptop. We believe that the reasonable computational requirement
would encourage further research in the field of mixed cooperative-competitive
MARL.
3 Method
The agents in a multi-agent team can be treated as nodes of a graph to lever-
age the power of Graph Neural Networks (GNNs). GNNs form a deep-learning
architecture where the computations at the nodes and edges of the graph are
performed by neural networks (parameterized non-linear functions), [1]. Due to
the presence of graph structure and multiple neural networks, they are called
GNNs.
We describe our use of GNNs from the perspective of one team and use Xi
to denote the state of ith friendly agent in the team, which in our case is its
position, orientation and velocity. We use XOppj to denote the state of the j th
opponent in the opposing team. Let S = {1, 2, . . . , N1 } denote the set of friendly
agents and SOpp = {N1 + 1, N1 + 2, . . . , N1 + N2 } denote the set of opponents.
Note that a symmetric view can be presented from the perspective of the other
team.
In the following, we describe how agent 1 processes the observations of its
opponents and how it interacts with its teammates. Figure 2 shows this pictori-
ally for a 3 agents vs 3 agents scenario. All the other agents have a symmetric
representation of interactions.
Fig. 2. Modeling of inter agent interactions with Graph Neural Networks (GNNs) from
the perspective of agent 1, in a 3 friendly agents vs 3 opponents scenario. Left: agent
1’s embedding, H10 is formed by taking into consideration the states of all opponents
through an attention layer. Right: agent 1’s embedding gets updated, (H1k → H1k+1 )
by taking into consideration its team mates through an attention layer.
Friendly agent 1 takes its state, X1 and passes it through a non-linear function,
fθa to generate an embedding, h1 . Similarly, it forms an embedding, hOppj from
each of its opponents with the function fθb .
Note that the opponents don’t share their information with the friendly agent
1. Friendly agent 1 merely makes its own observation of the opponents. It then
computes a dot product attention, ψ1j which describes how much attention it
pays to each of its opponents. The dimension of h1 and hOppj are d1 each. This
attention allows agent 1 to compute a joint embedding, e1 of all of its opponents.
ψ̂1j = (1/d1) <h1, hOppj>, ∀j ∈ SOpp   (3)

ψ1j = exp(ψ̂1j) / Σ_{m∈SOpp} exp(ψ̂1m)   (4)

e1 = Σ_{j∈SOpp} ψ1j hOppj   (5)

In Eq. 3, <, > denotes the vector dot product. Note that Σ_{j∈SOpp} ψ1j = 1, which
ensures that the net attention paid by agent 1 to its opponents is fixed. Finally,
e1 is concatenated with h1 to form the agent embedding H1^0 = [h1, e1].
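A minimal numpy sketch of Eqs. (3)–(5), with random placeholder embeddings standing in for the outputs of the networks fθa and fθb (the dimensions and values below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = 4                            # embedding dimension
h1 = rng.normal(size=d1)          # friendly agent 1's embedding (from f_theta_a)
h_opp = rng.normal(size=(3, d1))  # embeddings of 3 observed opponents (from f_theta_b)

psi_hat = h_opp @ h1 / d1                      # scaled dot products, Eq. (3)
psi = np.exp(psi_hat) / np.exp(psi_hat).sum()  # softmax attention weights, Eq. (4)
e1 = psi @ h_opp                               # joint opponent embedding, Eq. (5)
H1_0 = np.concatenate([h1, e1])                # agent embedding H1^0
```

Because the weights ψ1j are softmax-normalized, the net attention paid to the opponents sums to one regardless of how many opponents are currently alive, which is what lets the same parameters handle a variable number of agents.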
3.3 Policy
The final embedding of friendly agent 1, H1K is passed through a policy head. In
our experiments, we use a stochastic policy in discrete action space and hence
the policy head has a sigmoid activation which outputs a categorical distribution
specifying the probability of each action, αm .
π(αm |O1 ) = π (αm |H1K ) = sigmoid(fθd (H1K )) (12)
where, O1 = {Xi : i ∈ S} ∪ {XOppj : j ∈ SOpp }
Here, O1 is the observation of agent 1, which consists of its own state and the
states of all other agents that it observes. This corresponds to a fully connected
graph. We do this for simplicity. In practice, we could limit the observation space
of an agent within a fixed neighborhood around the agent similar to [1] and [3].
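A sketch of sampling from the categorical policy of Eq. (12). Here a random linear map stands in for fθd (a placeholder assumption), and the logits are normalized with a softmax, the usual way to obtain a valid categorical distribution (the paper itself states a sigmoid activation):

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 5
H1_K = rng.normal(size=8)            # final embedding H1^K (placeholder)
W = rng.normal(size=(n_actions, 8))  # stands in for f_theta_d

logits = W @ H1_K
probs = np.exp(logits - logits.max())  # subtract max for numerical stability
probs /= probs.sum()                   # categorical distribution over actions
action = rng.choice(n_actions, p=probs)  # sample a_m ~ pi(. | O_1)
```

Sampling (rather than taking the argmax) keeps the policy stochastic during training, which is what the policy-gradient update relies on for exploration.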
3.5 Training
4 Environment
Fig. 3. Average reward per agent per episode for the teams of attackers and guards as
training progresses. The reward plots have distinct extrema and corresponding snap-
shots of the environment are shown. The x-axis shows the number of steps of environ-
ment interaction. The reward is plotted after Gaussian smoothing.
Each agent can observe all the other agents in the environment. Hence, the
observation space consists of the states (positions, orientations, and velocities)
of teammates and opponents. We assume full observability as the environment
Natural Emergence of Multi-agent AI Strategies 21
Fig. 4. Sample sequences for different strategies that evolved during training. Each
row represents one sequence and time moves from left to right.
22 A. Deka and K. Sycara
Fig. 5. Average reward per agent per episode for guards as ensemble training
progresses. The reward is shown after Gaussian smoothing.
5 Results
We show the results for the 5 guards vs. 5 attackers scenario in the FortAttack
environment. As Fig. 3 shows, the reward plot has distinct extrema: a maximum for one
team and a corresponding minimum for the other. These extrema correspond to
increasingly complex strategies that evolve naturally: as one team gets better
at its task, it creates pressure for the other team, which in turn comes up with
a stronger and more complex strategic behavior.
1. Random behavior: At the beginning of training, agents move around randomly
and shoot wildly. They explore, trying to make sense of the FortAttack
environment and their goals in this world.
2. Flash laser: Attackers eventually learn to approach the fort and the guards
adopt a simple strategy to win. They all continuously flash their lasers,
creating a protection zone in front of the fort which kills any attacker that
tries to enter.
3. Sneak: As the guards block entry from the front, the attackers get smarter. They
approach from all directions; some of them get killed, but one of them
manages to sneak in from the side.
4. Spread and flash: In response to the sneaking behavior, the guards learn to
spread out and kill all attackers before they can sneak in.
5. Deceive: To tackle the strong guards, the attackers come up with the strategy
of deception. Most of them move forward from the right while one holds back
on the left. The guards start shooting at the attackers on the right, which
diverts their attention from the single attacker on the left. This attacker
quietly waits for the right moment to sneak in, bringing victory for the whole
team. Note that this strategy requires heterogeneous behavior amongst the
homogeneous agents, which naturally evolved without explicitly being encouraged
to do so.
6. Spread smartly: In response to this, the guards learn to spread smartly,
covering a wider region and killing attackers before they can sneak in.
For ensemble training, we train a single guard policy against all of the attacker
strategies, randomly sampling one attacker strategy for each environment episode.
Figure 5 shows the reward for the guards as training progresses. This time, the
reward for the guards increases continually and does not oscillate.
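The ensemble procedure described above reduces to a simple loop: sample one frozen attacker strategy per episode and train the guard policy against it. The function names below (`run_ensemble_training`, `train_guard_episode`) are hypothetical placeholders, not the paper's code:

```python
import random

def run_ensemble_training(attacker_strategies, train_guard_episode, n_episodes):
    """Train one guard policy against a pool of frozen attacker strategies.

    Each episode draws one attacker strategy uniformly at random, so the
    guards cannot overfit to a single opponent behavior.
    """
    for _ in range(n_episodes):
        attacker = random.choice(attacker_strategies)  # sample per episode
        train_guard_episode(attacker)                  # one RL episode vs. it
```

Uniform sampling over the strategy pool is one simple choice; a curriculum or a distribution weighted toward harder opponents would be a natural variation.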
6 Conclusions
In this work we were able to scale to multiple agents by modeling inter-agent
interactions with a graph containing two attention layers. We studied the evolution
of complex multi-agent strategies in a mixed cooperative-competitive
environment. In particular, we saw the natural emergence of a deception strategy,
which required heterogeneous behavior amongst homogeneous agents. If instead
we wanted to explicitly encode heterogeneous strategies, a simple extension of
our work would be to have different sets of policy parameters (fθd) within the
same team, e.g., one set for aggressive guards and one for defensive guards.
We believe that our study will inspire further work towards scaling multi-agent
reinforcement learning to large numbers of agents in more complex mixed
cooperative-competitive scenarios.
References
1. Agarwal, A., Kumar, S., Sycara, K.: Learning transferable cooperative behavior in
multi-agent teams. arXiv preprint arXiv:1906.01202 (2019)
2. Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I., Abbeel, P.:
Continuous adaptation via meta-learning in nonstationary and competitive envi-
ronments. arXiv preprint arXiv:1710.03641 (2017)
3. Baker, B., et al.: Emergent tool use from multi-agent autocurricula. arXiv preprint
arXiv:1909.07528 (2019)
4. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
5. Bu, L., Babu, R., De Schutter, B., et al.: A comprehensive survey of multiagent
reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 38(2),
156–172 (2008)
6. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maxi-
mum entropy deep reinforcement learning with a stochastic actor. arXiv preprint
arXiv:1801.01290 (2018)
7. Han, L., et al.: Grid-wise control for multi-agent reinforcement learning in video
game AI. In: International Conference on Machine Learning, pp. 2576–2585 (2019)
8. Hoshen, Y.: VAIN: attentional multi-agent predictive modeling. In: Advances in
Neural Information Processing Systems, pp. 2701–2711 (2017)
9. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv
preprint arXiv:1509.02971 (2015)
10. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-agent
actor-critic for mixed cooperative-competitive environments. In: Neural Informa-
tion Processing Systems (NIPS) (2017)
11. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature
518(7540), 529–533 (2015)