The development of multi-agent active sensing is relatively recent. Models in the literature address cooperative, self-interested, or adversarial settings. Cooperative settings usually focus on coordination to efficiently cover the environment or to maintain a common belief state. In self-interested and adversarial settings, the agents typically focus on gathering information to anticipate the other agents’ actions and react appropriately, for example to avoid an undesirable state such as a collision or to counter an adversarial attack. In both settings, however, it is interesting to note that the information-gathering process targets not only environmental factors (such as a possible opponent’s location or speed) but also the other agents’ internal states (their intentions, goals, beliefs, etc.).
5.2.1 Cooperative Settings.
To our knowledge, the first models to address multi-agent active perception are proposed by Renoux et al. [58] and Eck and Soh [22], who develop two different models focusing on cases in which agents cannot collaborate in advance to create a joint policy. Renoux et al. [58] represent each agent in the system with a POMDP that reasons over extended belief states, which include the agent’s own beliefs over environmental factors as well as a level-1 nested belief over the other agents in the system. The reward function is based on a model of agent-based relevance, defined in Reference [59] and presented in Equation (8), where \(b_{t}\) is the belief state of the agent before receiving the observation \(o\), \(b_{t+1}\) is the belief state of the agent after receiving the observation \(o\), \(D_{KL}\) is the Kullback–Leibler distance, and \(H\) is the negative entropy. This formulation assumes that an observation is relevant for a given agent if it is new (i.e., it changes the agent’s beliefs significantly, as measured by the Kullback–Leibler distance) or if it makes the agent’s beliefs more precise (as measured by the difference in entropy). The parameters \(\alpha\) and \(\beta\) balance these two (sometimes contradictory) aspects, and the parameter \(\delta\) ensures that the relevance remains positive. Using this formulation and the extended belief state, the agents can proactively weigh the expected impact of an observation on another agent and send the most impactful one. This decentralized information sharing enables decentralized cooperation. To solve the extended POMDP, Renoux et al. [58] transform it into a Belief-MDP, similarly to what has been described for the single-agent case (Section 4.2.3). Later, Renoux et al. [60] use similar extended belief states to extend the POMDP-IR framework into a Communicative POMDP-IR (Com-POMDP-IR). The Com-POMDP-IR extends the prediction actions of the POMDP-IR to incorporate predictions about a human operator’s belief state and uses these prediction actions to optimize a communication strategy. In this case, the resulting POMDP is solved using standard POMDP approaches, similarly to what has been described for the single-agent case (Section 4.2.1).
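The full expression of Equation (8) is given in Reference [59]; as a rough, non-authoritative illustration of the two terms described above, the following Python sketch scores an observation over a discrete belief by linearly combining a novelty term (Kullback–Leibler divergence) and a precision term (difference in negative entropy). The function names, the linear combination, and the parameter values are assumptions made for illustration, not the authors’ implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D_KL(p || q) between two discrete beliefs."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def neg_entropy(p, eps=1e-12):
    """Negative Shannon entropy: larger values mean a more concentrated belief."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * np.log(p + eps)))

def relevance(b_t, b_t1, alpha=1.0, beta=1.0, delta=0.0):
    """Hypothetical agent-based relevance of an observation o that updated the
    belief from b_t to b_t1: a weighted sum of novelty (KL term) and precision
    gain (difference in negative entropy), offset by delta to stay positive."""
    novelty = kl_divergence(b_t1, b_t)
    precision_gain = neg_entropy(b_t1) - neg_entropy(b_t)
    return alpha * novelty + beta * precision_gain + delta

# Example: an observation that sharpens a uniform belief scores as highly relevant.
b_before = [0.25, 0.25, 0.25, 0.25]
b_after = [0.70, 0.10, 0.10, 0.10]
print(relevance(b_before, b_after, alpha=1.0, beta=0.5, delta=np.log(4)))
```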
Eck and Soh [22] also represent each agent as a POMDP and consider that each agent can request information from its neighbors. When they share information, agents share their entire belief state, and the receiving agent updates its own beliefs according to Equation (9), where \(b\) is the belief state of the agent receiving the information, \(b_{Sh}\) is the shared belief state, \(x,x^{\prime } \in DOM(X)\) are the possible values for a partially observable phenomenon \(X\), and \(w\) is a constant weight that dampens shared information, as is commonly used in the information fusion literature [27]. The resulting model is solved by transforming it into a Knowledge POMDP, as described in Section 4.2.3. The main difference between Reference [58] and Reference [22] lies in the fact that Reference [58] exchanges single observations based on their estimated relevance and updates the planning agent’s belief state using the standard POMDP Bayes rule, while Reference [22] exchanges entire belief states and uses an information fusion approach to merge them. However, both approaches focus on “on-the-go” multi-agent cooperation, where prior synchronization is unnecessary.
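Equation (9) is not reproduced here; as a minimal sketch consistent with the quantities listed above, the snippet below blends the shared belief into the local one with a constant dampening weight and renormalizes over \(DOM(X)\). This weighted-average form is one common choice in the information fusion literature and is an assumption, not necessarily the exact rule of Reference [22].

```python
import numpy as np

def fuse_beliefs(b, b_sh, w=0.5):
    """Hypothetical dampened belief fusion over a discrete phenomenon X.

    b    -- receiving agent's current belief over DOM(X)
    b_sh -- belief shared by the neighbor over DOM(X)
    w    -- constant weight that dampens the shared information

    The shared belief is blended into the local one and the result is
    renormalized over all x' in DOM(X).
    """
    b, b_sh = np.asarray(b, dtype=float), np.asarray(b_sh, dtype=float)
    fused = (1.0 - w) * b + w * b_sh
    return fused / fused.sum()

# Example: a neighbor's sharper belief pulls the local belief toward state 0.
print(fuse_beliefs([0.4, 0.3, 0.3], [0.8, 0.1, 0.1], w=0.3))  # [0.52, 0.24, 0.24]
```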
Lauri et al. [42] propose an alternative approach to multi-agent active perception by extending the \(\rho\)-POMDP approach to decentralized systems. The resulting \(\rho\)Dec-POMDP uses an entropy measure in the reward function to perform active sensing. The authors use an exact algorithm and assume periodic explicit communication to compute a joint belief estimate. Lauri et al. [44] relax the explicit communication assumption and present a heuristic for the \(\rho\)Dec-POMDP, making it possible to solve larger problems. This heuristic uses a final reward, i.e., a reward granted at the end of the finite horizon, which is also based on the Shannon entropy of the joint belief, as in Reference [42]. Because of this final reward, the algorithms in References [44] and [45] still require the explicit computation of the joint belief estimate, which incurs a large memory and computation overhead. Lauri and Oliehoek [43] show that the \(\rho\)Dec-POMDP can be converted into a Dec-POMDP with linear rewards, similarly to the \(\rho\)-POMDP and POMDP-IR equivalence. To do so, the authors introduce individual prediction actions in the \(\rho\)Dec-POMDP model, similar to the prediction actions of the POMDP-IR model. This equivalence allows the use of standard Dec-POMDP solvers and therefore does not require the computation of joint state estimates. Finally, Best et al. [12] introduce the Decentralized Monte-Carlo Tree Search (Dec-MCTS), an online algorithm for asynchronous multi-robot coordination. Even though Dec-MCTS is not limited to active perception scenarios, the authors illustrate its efficiency in an active perception scenario. Coordination is achieved by considering the plans of the other robots, which are communicated during a dedicated communication phase.
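As a small illustration of the entropy-based final reward used in References [42, 44], the sketch below computes a terminal reward from a (flattened) joint belief over the Dec-POMDP state space: the lower the Shannon entropy of the joint belief at the end of the horizon, the higher the reward. The sign convention and the absence of any weighting are assumptions made for readability, not the exact reward of the cited papers.

```python
import numpy as np

def joint_belief_entropy(joint_belief, eps=1e-12):
    """Shannon entropy of a joint belief over the Dec-POMDP state space."""
    p = np.asarray(joint_belief, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def final_reward(joint_belief):
    """Hypothetical final reward granted at the end of the finite horizon:
    the negative entropy of the joint belief, so that more certain joint
    state estimates receive a higher reward."""
    return -joint_belief_entropy(joint_belief)

# A concentrated joint belief earns a higher final reward than a uniform one.
print(final_reward([0.9, 0.05, 0.05]))   # ~ -0.39
print(final_reward([1/3, 1/3, 1/3]))     # ~ -1.10
```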
In the approaches above, the active perception process only concerns environmental factors, and mechanisms are implemented to allow the agents to reach a common belief over these factors. However, very little work has been done to include active perception of the other agents’ internal states. To our knowledge, only Best et al. [13] take this approach, extending the Dec-MCTS model with a communication planning algorithm that optimizes a sequence of communication requests. Agents evaluate how the other agents’ beliefs evolve and request that the other agents send them their plans. In doing so, the agents introduce active perception actions over the other agents’ internal states by requesting a plan only when needed.
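A schematic decision rule in this spirit is sketched below: an agent requests another agent’s plan only when that agent’s belief (as predicted locally) has drifted far enough from the belief it held when it last shared a plan, since the cached plan is then likely outdated. The KL-based drift measure and the fixed threshold are illustrative assumptions, not the communication planning algorithm of Reference [13].

```python
import numpy as np

def should_request_plan(belief_at_last_exchange, predicted_current_belief,
                        threshold=0.2, eps=1e-12):
    """Schematic trigger for an active communication request: request the other
    agent's plan when its predicted belief has drifted (in KL divergence) by
    more than `threshold` since the last plan exchange."""
    p = np.asarray(predicted_current_belief, dtype=float)
    q = np.asarray(belief_at_last_exchange, dtype=float)
    drift = float(np.sum(p * np.log((p + eps) / (q + eps))))
    return drift > threshold

# Example: little drift -> keep the cached plan; large drift -> request a new one.
print(should_request_plan([0.5, 0.5], [0.55, 0.45]))  # False
print(should_request_plan([0.5, 0.5], [0.95, 0.05]))  # True
```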