1. Introduction
The target allocation problem in multi-AUV (autonomous underwater vehicle) operations represents a significant challenge in constrained combinatorial optimization, closely analogous to resource allocation problems in other domains. This problem, first introduced by [1] and established as NP-complete by [2] in 1986, is characterized by exponentially increasing solution times as the number of targets grows. Traditional enumeration-based algorithms [3] are therefore limited to addressing only small-scale problems. The central objective of target allocation is to minimize resource utilization while maximizing operational benefits, requiring an effective allocation of targets to tasks under various scenarios. Two critical aspects define the problem: the probability of AUV effectiveness, which determines the success of the allocation strategy, and a refined analytical description of the allocation process, which is essential for developing computational models and algorithms.
The target allocation problem can be categorized into static and dynamic allocation problems [4]. In static allocation, all resources are assigned at one stage, with complete knowledge of problem parameters, making the goal to optimize temporary operational tasks. In contrast, dynamic allocation involves multiple stages, where decisions are made iteratively based on the outcomes of previous stages. In multi-AUV non-cooperative scenarios, the target allocation problem is inherently dynamic [5], and it can be modeled as a finite-state Markov decision process [6]. Unlike static problems, dynamic allocation considers not only the matching of resources to objectives but also the interrelations between decision phases, objectives, and resources, making it a complex triple-optimization problem [7]. Effective algorithms for this problem must meet three requirements: rapidity, rationality, and effectiveness of allocation.
Existing research on target assignment has primarily focused on two types of models: asset-based optimization models [6,7,8,9,10,11] and goal-based optimization models [12,13,14]. Ref. [15] proposed an asset-based dynamic target assignment optimization model, which was later extended by [16] to incorporate four key constraints: capability, strategy, resource, and feasibility. Dynamic target allocation has also been formulated as a Markovian decision process, with [6] applying approximate dynamic programming (ADP) to solve representative examples. Other works, such as [9,10], have developed integer linear programming models or linearization algorithms to address constraints and resource interdependencies. Additionally, ref. [17] introduced a hierarchical approach within the observe–orient–decide–act (OODA) framework for non-cooperative scenarios. In non-cooperative scenarios, asset-based models often rely on intelligence gathering and analysis, which can be challenging in real-world applications.
In non-cooperative environments, where asset-based models depend heavily on intelligence gathering, some researchers have shifted toward goal-based optimization models that focus on minimizing target threats or maximizing operational gains without relying on asset values. For instance, ref. [18] proposed a dual-target allocation optimization model balancing expected threat neutralization and resource expenditure, while ref. [19] developed a two-target multi-stage tasking model emphasizing sensor-platform synergy. Building on these approaches, ref. [12] defined an objective function that maximizes system security and engagement sequence stability. To address uncertainty, ref. [20] introduced a bi-objective optimization model using a conditional value-at-risk measure, and ref. [21] proposed a multi-objective optimization model with probability thresholds and time window constraints. Chandra et al. [22] explored the distributed decision-making problem in multi-agent systems and proposed a game-theoretic control barrier function-based distributed multi-robot navigation method. Other works, such as [23,24], explored nonlinear multi-objective planning and zero-sum game approaches, with [14] synthesizing these methods into a two-party zero-sum dual-matrix game model for multi-AUV scenarios. These models are particularly useful in environments where intelligence is limited or unreliable, allowing decision-makers to focus on high-level objectives rather than specific resource capabilities.
To solve dynamic target assignment problems, numerous algorithms have emerged, particularly with the advent of intelligent optimization techniques. Simulated annealing [25], genetic algorithms [9,26,27], ant colony optimization [28,29], and particle swarm optimization [30,31] have all been applied to this domain. Romano et al. [32] studied the social behavior of robots, providing a foundation for researching collective intelligence. These studies [33,34,35] reveal the influence of phenotypic traits, group structure, and dynamic principles on collective movement and behavioral decision-making. Hybrid algorithms have also gained traction: [36] combined ant colony optimization (ACO) with simulated annealing (SA), while [29] integrated ACO with greedy algorithms for large-scale problems. Novel approaches, such as an improved membrane swarm algorithm [37], an artificial bee colony (ABC) algorithm using rule-based heuristic factors [38], and elite collaborative genetic algorithms [39], have been proposed to address challenges like slow convergence and low search efficiency. Additionally, ref. [40] developed an improved genetic algorithm for cross-regional collaborative task allocation, and ref. [21] introduced a hybrid multi-objective discrete particle swarm optimization algorithm for cooperative environments. Recently, ref. [41] proposed an adaptive simulated annealing–particle swarm optimization (SA–PSO) algorithm, integrating simulated annealing to enhance convergence speed and real-time performance in multi-objective target assignment. However, most existing algorithms focus on reducing the number of iterations or improving convergence efficiency, often at the expense of robustness. This can lead to suboptimal solutions, especially in dynamic, non-cooperative environments where avoiding local optima is critical.
Despite these advancements, several gaps remain. Most existing models and algorithms focus on either defensive strategies or offensive surprise tactics, with limited attention to dynamic, non-cooperative target allocation models that simultaneously consider the strategies of both sides in multi-AUV tasks. Additionally, while many studies aim to improve computational efficiency, fewer efforts have been directed toward designing algorithms that effectively escape local optima and achieve globally optimal solutions in complex, real-world scenarios. To address these challenges, this paper makes the following contributions:
We established a game model for the non-cooperative target allocation problem in multi-AUV tasks, creating an AUV dominance evaluation system and quantifying and analyzing the benefits of target allocation schemes.
We proposed a Nash equilibrium solution algorithm for the non-cooperative target allocation game to address the computational challenges of the NP-hard problem.
We improved the quantum particle swarm optimization algorithm, which can be applied not only to Nash equilibrium problems but also to NP-hard problems with “steep” non-global optimal solutions.
The remainder of the article is organized as follows: Section 2 develops a dual matrix game model for multi-AUV target assignment. In Section 3, we design a repulsion-based quantum particle swarm Nash equilibrium solution algorithm. Finally, in Section 4 and Section 5, simulation experiments are designed to verify the effectiveness and superiority of the proposed algorithm.
2. Game Modeling of the Multi-AUV Target Assignment Problem
To address the complex demands of target assignment in multi-AUV operations, particularly in non-cooperative scenarios, this paper adopts game theory principles by modeling the red and blue AUVs as participants in a strategic game. In multi-AUV operations, target assignment refers to the process of allocating specific tasks or objectives (targets) to each autonomous underwater vehicle (AUV) within a system to ensure the effective completion of the overall mission. This task becomes especially critical in non-cooperative scenarios, where opposing parties, such as red and blue AUV groups, act independently and potentially in conflict with one another. The importance of solving the target assignment problem lies in maximizing operational effectiveness while minimizing resource waste and conflict. In non-cooperative settings, understanding and predicting the behavior of the opposing side is crucial for achieving a strategic advantage. For example, a poor assignment of targets could lead to redundant efforts, missed mission objectives, or even jeopardize the success of an operation due to adversarial interference. In this paper, we establish a situation advantage evaluation model using the operational information of both parties. The target assignment strategies of the red and blue AUVs are treated as game strategies, and the interactions are formalized as a bi-matrix game. A payoff function is constructed to evaluate the total gains and losses of both sides under various target assignment combinations, capturing the competitive dynamics of resource allocation and mission success. By solving the Nash equilibrium of the game, the optimal target assignment strategy for each side is determined, ensuring that no participant can unilaterally improve its outcome without disrupting the balance.
2.1. AUV Capability Model
The prerequisite for matching is the quantification of capability. In many existing studies, whether addressing AUVs or other unmanned systems in target allocation problems, there is a lack of detailed explanation regarding threat capabilities. Typically, a simple probability (often a fixed value) is used to represent it. While this approach has minimal impact on algorithms in static allocation problems, in dynamic allocation scenarios, the distribution at different stages relies on feedback from the previous moment’s results. As the task duration increases, using fixed values to represent allocation effects can lead to cumulative errors that grow larger over time, resulting in significant deviations between experimental outcomes and actual situations. Therefore, before modeling the target allocation problem, it is important to provide a clear description of the AUV’s threat capability.
In actual engagements, there are scenarios where multiple AUVs select the same target. This situation can be simulated as a simultaneous volley, ignoring the time intervals of the firing process, and the threat capability can be described using coverage domains. Assuming that all AUVs operate independently during this process, at time t, n AUVs select a target, and their hit probabilities, which depend on their respective states, are denoted accordingly. The capability function of each AUV follows a zero-bias circular normal distribution, so the capability of a single AUV can be expressed in terms of the standard error of the error variable in each direction and the threat radius R of the AUV. The capability of this group simultaneously selecting the target is then defined by combining the individual AUV capabilities under the independence assumption.
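A hedged reconstruction of these expressions, assuming the standard zero-bias circular normal model (the symbols σ_i for the per-axis standard error, R for the threat radius, and p_i for the ith AUV's hit probability follow the definitions above; the paper's exact equations may differ), is

$$ p_i \;=\; \iint_{x^2+y^2\le R^2} \frac{1}{2\pi\sigma_i^2}\exp\!\left(-\frac{x^2+y^2}{2\sigma_i^2}\right)\mathrm{d}x\,\mathrm{d}y \;=\; 1-\exp\!\left(-\frac{R^2}{2\sigma_i^2}\right), $$

and, under the independence assumption, the group's combined capability against the shared target is

$$ P \;=\; 1-\prod_{i=1}^{n}\left(1-p_i\right). $$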
In real-world scenarios, targets are often not singular, necessitating consideration of multiple targets when modeling the threat capability of AUVs. To simplify this study, the concept of a simultaneous “volley” is introduced into the modeling process. Although achieving truly simultaneous strikes on multiple targets is challenging in practice, the underwater environment imposes significant time costs on detection feedback, among other unique characteristics. For instance, underwater communication is constrained by low efficiency and high latency due to the physical properties of acoustic communication. Additionally, AUVs in underwater environments face slower movement speeds and limited navigation and positioning accuracy, which add complexity to task planning and allocation. Therefore, the process can be simplified by assuming no immediate feedback after deploying the AUVs, effectively treating the launches as “simultaneous”. In this context, “simultaneous” does not emphasize precise timing consistency but rather the absence of feedback after each AUV completes its mission, allowing the engagement to be modeled as a single, unified phase.
2.2. Bi-Matrix Game Model of Multi-AUV Target Assignment
The two-person zero-sum bi-matrix game can be viewed as an extension of the n-person finite game, in which the set of players consists of the red and blue parties, each composed of multiple AUVs. The mixed strategy sets extend the players' pure strategy sets and correspond to the target assignment schemes of red and blue, respectively, while the payoff functions and payoff matrices are extended to mixed strategies accordingly. Under a mixed strategy profile, the expected returns of players R and B are computed from their respective payoff matrices, and a profile from which neither player can profitably deviate unilaterally is a mixed Nash equilibrium of the game.
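For concreteness, a standard formulation consistent with this description, using assumed symbols A and B for the red and blue payoff matrices and x and y for their mixed strategies, is

$$ E_R(x,y)=x^{\top}Ay,\qquad E_B(x,y)=x^{\top}By, $$

and (x*, y*) is a mixed Nash equilibrium if and only if

$$ x^{\top}Ay^{*}\le x^{*\top}Ay^{*}\ \text{for all mixed }x,\qquad x^{*\top}By\le x^{*\top}By^{*}\ \text{for all mixed }y. $$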
In the calculation of multi-AUV target assignment, the computational difficulty increases significantly with the number of AUVs. Therefore, this paper introduces the concept of strategy dominance [32], first reducing the size of the payoff matrix.
Definition 1. In game G, suppose strategies α and β of player i satisfy the dominance inequality, i.e., α yields a payoff at least as high as β against every strategy of the opponent; this indicates that strategy α of player i is superior to strategy β. The relevant corollaries for the solution of the Nash equilibrium mixed strategy are given below; in what follows, the jth column of A and the ith row of A are used.
Inference 1. Let G be a bi-matrix game; then, a mixed strategy profile is a mixed Nash equilibrium of G if and only if the following conditions hold:
Theorem 1. Let a mixed strategy profile be a mixed Nash equilibrium of G; then, the complementary slackness conditions hold. According to Definition 1, a profile is the mixed Nash equilibrium of the bi-matrix game only when it is a solution of the corresponding inequality set. Definition 2. By solving the above inequality set, the mixed Nash equilibrium strategy of each player can be obtained.
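As a numerical illustration of Inference 1, the following is a minimal sketch that checks the best-response conditions for a candidate mixed profile; the function name and test matrices are ours for illustration, not the paper's implementation.

```python
import numpy as np

def is_mixed_nash(A, B, x, y, tol=1e-9):
    """Check the conditions of Inference 1 numerically.

    A, B : (m, n) payoff matrices of the red and blue players.
    x, y : candidate mixed strategies of length m and n.
    The profile is a mixed Nash equilibrium iff no pure strategy of
    either player earns more than the equilibrium payoff.
    """
    vR = x @ A @ y                    # red's expected return at (x, y)
    vB = x @ B @ y                    # blue's expected return at (x, y)
    # (A @ y)[i] is red's payoff for pure strategy i against y;
    # (x @ B)[j] is blue's payoff for pure strategy j against x.
    return bool(np.all(A @ y <= vR + tol) and np.all(x @ B <= vB + tol))

# Sanity check with matching pennies, whose unique equilibrium mixes 50/50.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
print(is_mixed_nash(A, B, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # True
print(is_mixed_nash(A, B, np.array([1.0, 0.0]), np.array([0.5, 0.5])))  # False
```

Because it suffices to compare against pure-strategy deviations, the check costs only O(mn) operations.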
2.3. Multi-AUV Confrontation Superiority Evaluation System
Based on the analysis of the multi-AUV countermeasures problem, a situation advantage function based on the heading, velocity, and distance of both sides and an attack and defense efficiency advantage function based on the attack performance of both sides' AUVs are established, together forming the multi-AUV attack and defense dominance evaluation system, which provides the basis for establishing the payment matrix.
2.3.1. Situation Advantage Function
The red AUV cluster is a set whose ith member represents the ith red AUV, and the blue AUV cluster is a set whose jth member represents the jth blue AUV. The offensive and defensive situation of a red AUV against a blue AUV is shown in Figure 1, which depicts the linear distance between the two AUVs and the velocity vectors of the red and blue AUVs. Figure 1 also defines the off-axis angle of the blue AUV relative to the red AUV and the entry angle of the target.
A rightward deviation of the target bearing and the target entry angle is taken as positive, and a leftward deviation as negative. According to the confrontation situation between red and blue, the heading advantage is defined over four cases of the off-axis and entry angles: the red AUVs hold the angular attack advantage; the two sides are in a state of equilibrium; both sides are at a disadvantage; or the blue AUVs are in the dominant state.
The velocity advantage function is shown in Equation (9), in which the velocity vectors of the red AUV and the blue AUV appear; the higher an AUV's velocity, the greater its attack advantage.
The distance advantage function is shown in Equation (10), whose arguments are the linear distance between the red and blue AUVs, the maximum effective range of the AUV, and the minimum delivery distance of the AUV. When the distance exceeds the maximum effective range, the distance advantage is zero, and the advantage gradually increases as the distance decreases; when the distance reaches the minimum delivery distance, the distance advantage is at its maximum, and it gradually decreases as the distance decreases further.
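One piecewise form consistent with this description (with assumed symbols D for the current distance, D_max for the maximum effective range, and D_min for the minimum delivery distance; the paper's Equation (10) may differ in detail) is

$$ S_d=\begin{cases}0, & D\ge D_{\max},\\[2pt] \dfrac{D_{\max}-D}{D_{\max}-D_{\min}}, & D_{\min}\le D<D_{\max},\\[2pt] \dfrac{D}{D_{\min}}, & D<D_{\min},\end{cases} $$

which is zero beyond the maximum effective range, rises to its maximum at the minimum delivery distance, and falls off again at closer ranges.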
According to the above heading, velocity, and distance advantage functions, the multi-AUV offensive and defensive situation advantage function is shown in Equation (11) as a weighted combination whose weighting coefficients sum to 1.
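A hedged reconstruction of Equation (11), writing the heading, velocity, and distance advantages as S_a, S_v, and S_d with weighting coefficients ω_1, ω_2, ω_3 (symbols assumed here), is

$$ S=\omega_1 S_a+\omega_2 S_v+\omega_3 S_d,\qquad \omega_1+\omega_2+\omega_3=1,\quad \omega_i\ge 0. $$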
2.3.2. AUV Capability Efficiency Evaluation Function
The confrontation capability of an AUV is mainly measured by six factors: maneuver capability, detection capability, communication capability, load capability, stealth capability, and AUV performance. The calculation method is given in Equation (12), where B is the maneuver capability parameter, and the remaining parameters are the performance parameter of the AUV threat capacity, the detection performance parameter, the communication capability coefficient, the stealth performance coefficient, and the load capacity coefficient. After normalizing Equation (12), the advantage functions of the red and blue AUVs, i.e., the target assignment game advantages of red and blue, respectively, can be obtained.
In summary, the advantage function of the multi-AUV attack and defense game is a weighted sum of the situational advantage function of both sides and the advantage function of offensive and defensive effectiveness, with the two weighting coefficients summing to 1.
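Under the same convention, a plausible form of this combined advantage (with assumed symbols S for the situational advantage, C for the capability-efficiency advantage, and λ_1, λ_2 for the weights) is

$$ W=\lambda_1 S+\lambda_2 C,\qquad \lambda_1+\lambda_2=1. $$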
2.4. Multi-AUV Target Assignment Game Payment
The multi-AUV target assignment dominance evaluation system provides the basis for the multi-AUV target assignment game payment matrix. The overall revenue functions of red and blue are given below, where the two totals are the total offensive and defensive gains of the red and blue sides, respectively, summed over the red and blue AUVs; the value-return terms represent the value of each blue AUV relative to the red side and of each red AUV relative to the blue side; and the survival probabilities of the red and blue AUVs after K steps are determined by the following:
where the first pair of quantities denotes the number of AUVs used by the jth blue AUV to attack the ith red AUV in step k and the corresponding single-shot damage probability, and the second pair denotes the number of AUVs used by the ith red AUV to attack the jth blue AUV in step k and the corresponding single-shot damage probability. The ideal damage probabilities are those obtained under ideal experimental conditions, and they are scaled by the dominance coefficients of the ith red AUV over the jth blue AUV and of the jth blue AUV over the ith red AUV at the kth step. The initial survival probability of both parties is defined to be 1. According to the control exercised by both sides over the use of AUVs, the model constraint conditions are determined as follows:
where the first pair of bounds denotes the maximum number of targets that the red and blue parties may assign to the ith red AUV and the jth blue AUV in the kth round of the game, and the second pair denotes the maximum number of targets that red AUV i and blue AUV j may deploy in the kth round.
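To illustrate the recursive survival bookkeeping described above, here is a minimal Python sketch (variable names are ours, and it does not reproduce Equations (17) and (18) verbatim) that updates the red AUVs' survival probabilities after one step, treating all shots as independent, as in the volley model of Section 2.1:

```python
import numpy as np

def update_survival(q_prev, n_shots, p_hit, adv):
    """One-step survival update for the red AUVs.

    q_prev : (m,) survival probabilities of red AUVs after step k-1
    n_shots: (n, m) number of AUVs blue AUV j fires at red AUV i in step k
    p_hit  : (n, m) ideal single-shot damage probability of blue j vs. red i
    adv    : (n, m) dominance coefficients of blue j over red i at step k
    Returns the (m,) survival probabilities after step k, assuming all
    shots are independent, as in the volley model.
    """
    effective = adv * p_hit                                   # situation-adjusted hit probability
    miss_all = np.prod((1.0 - effective) ** n_shots, axis=0)  # survive every shot aimed at each red AUV
    return q_prev * miss_all

q0 = np.ones(3)                          # initial survival probability of both parties is 1
n = np.array([[1, 0, 2], [0, 1, 0]])     # shot counts: blue j -> red i
p = np.full((2, 3), 0.6)                 # ideal single-shot damage probabilities
a = np.full((2, 3), 0.8)                 # dominance coefficients
print(update_survival(q0, n, p, a))      # e.g. [0.52, 0.52, 0.2704]
```

The symmetric update for the blue AUVs follows by swapping the roles of the two sides.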
2.5. Nash Equilibrium of the Multi-AUV Target Assignment Game
To address the Nash equilibrium problem in the bi-matrix game for target allocation, the process begins with optimizing the payoff matrix through strategic refinement to eliminate non-optimal strategies, thereby forming an optimal strategy set. Next, the feasibility of a pure strategy solution is assessed. If feasible, both parties select the pure strategy scenario as the optimal solution for target assignment at this stage. Otherwise, the mixed Nash equilibrium solution is calculated using quantum particle swarm optimization based on a repulsion process, and targets are assigned probabilistically according to the mixed Nash equilibrium. Given that the Nash equilibrium solution for target assignment in multi-AUV non-cooperative scenarios is an NP-hard problem, this paper introduces the repulsion process quantum particle swarm optimization (RPQPSO) algorithm in the next section to solve the Nash equilibrium.
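The strategy-refinement step in this pipeline amounts to iterated elimination of strictly dominated strategies (Definition 1). The following is a minimal sketch under assumed names, not the paper's implementation:

```python
import numpy as np

def eliminate_dominated(A, B):
    """Iteratively delete strictly dominated red rows and blue columns
    from a bi-matrix game (Definition 1), shrinking the payment matrix."""
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            # Row r is strictly dominated if some other row pays red
            # strictly more against every surviving blue strategy.
            if any(np.all(A[r2, cols] > A[r, cols]) for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:
            # Column c is strictly dominated if some other column pays
            # blue strictly more against every surviving red strategy.
            if any(np.all(B[rows, c2] > B[rows, c]) for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols  # indices of the surviving strategies
```

Applying such a reduction to the 54 × 60 payment matrices that arise in Section 4.2 shrinks them before any equilibrium search.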
4. Simulation Experiment of Target Assignment for the Multi-AUV Task
To verify the RPQPSO algorithm within the non-cooperative game process of multi-AUV target assignment, this paper describes a multi-AUV simulation experiment. The game payoff matrix for multi-AUV target assignment in non-cooperative scenarios was established. To solve the mixed Nash equilibrium, a traditional particle swarm optimization algorithm was implemented for comparison with the RPQPSO algorithm proposed in Section 3. This comparison verified the algorithm's real-time performance and accuracy. A multi-round game was constructed based on the multi-AUV target assignment model, verifying the feasibility of the solution method for multi-AUV task allocation in non-cooperative scenarios.
4.1. Multi-AUV Non-Cooperative Scenario Parameter Setting
On an unbounded, calm seabed (without the influence of waves and currents), the red team consisted of 12 isomorphic AUVs forming three AUV groups, and the blue team consisted of six isomorphic AUVs forming two AUV groups. The experiments were conducted on a Windows system for algorithm computations, using VS2013 for numerical simulations and AnyLogic for visualization simulations.
The red AUVs were informed of the general position of the blue AUVs and moved towards them. Blue AUVs stayed on course. All the team members had good detection and communication performance, and the group maintained communication and shared information during the task.
Table 1 and Table 2 compare the basic performance parameters of the AUVs deployed by the two sides. The initial velocities and poses of the red and blue AUV groups are shown in Figure 2, where each AUV's position is given in the current north-east coordinate system, together with its heading angle and longitudinal velocity u.
The AUV value parameter matrices of both sides are shown in Equations (33) and (34), whose entries represent the value benefits of each blue AUV relative to the red side and of each red AUV relative to the blue side and are obtained from intelligence information.
Constraints on the maximum number of targets deployed by each red/blue AUV in each step:
Constraints on the maximum number of targets that the blue/red side may assign to each opposing red/blue AUV:
4.2. First-Round Simulation of the Target Assignment
According to the proximity rule, each red group chooses the closest blue group to attack. In the situation shown in Figure 2, two of the red groups attack the two blue groups, respectively, while the third red group does not participate in the confrontation. The following takes the target assignment game in which one red group attacks its chosen blue group in the first round of attack and defense as an example.
According to the situation assessment system, the game advantage matrix of both sides can be obtained:
The red group has 54 target assignment schemes, and the blue group has 60 target assignment schemes. The two sides' task assignment schemes are expressed as Equations (40) and (41).
In the equations, each entry represents the number of targets assigned by a red AUV to a blue AUV or, conversely, by a blue AUV to a red AUV.
Substituting all the load assignment schemes of Equations (40) and (41) for the red and blue groups, together with the advantage matrices of Equations (38) and (39), into Equations (19) and (20) yields the actual hit probabilities of both parties' AUVs. Substituting Equation (19) into Equations (17) and (18) then yields the predicted survival probabilities of the blue and red AUVs, from which the payment matrix is obtained. Because the two sides have 54 and 60 target assignment schemes, respectively, the payment matrix is a 54 × 60 matrix that is difficult to compute directly. Therefore, strategy refinement is adopted to eliminate strategies with low payment values and simplify the payment matrices of both sides, finally yielding the following:
where the rows correspond to the ith strategy of the red group, and the columns correspond to the jth strategy of the blue group.
The underlining method is adopted to solve for the pure Nash equilibrium: the maximum values of each column of the red payment matrix and the maximum values of each row of the blue payment matrix are marked, and the entry at which they overlap is a pure Nash equilibrium. The optimal mixed Nash equilibrium solution of the bi-matrix game obtained by PSO required an average time of 1.47 s, while the optimal mixed Nash equilibrium strategy obtained by the RPQPSO required an average time of 2.47 s. The result of the repulsion-process QPSO was selected as the optimal result: the red side chose the 28th strategy with a probability of 0.9997, and the blue side chose the 18th strategy with a probability of 0.9999, yielding the corresponding expected returns for the red and blue AUVs.
If the blue side unilaterally chooses to execute the 44th strategy instead of the 18th strategy, the red side will continue to adopt its equilibrium strategy and obtain an expected return of 0.11547, while the blue side can only obtain an expected return of −0.37840, which demonstrates that a player who unilaterally departs from the Nash equilibrium only reduces its own benefit.
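A small sketch of the underlining method described above (marking the column maxima of the red matrix and the row maxima of the blue matrix and intersecting them); the function and the 2 × 2 example matrices are illustrative, not the simplified matrices of this experiment:

```python
import numpy as np

def pure_nash(A, B):
    """Underlining method: cell (i, j) is a pure Nash equilibrium when
    A[i, j] is the maximum of its column (red cannot improve by switching
    rows) and B[i, j] is the maximum of its row (blue cannot improve by
    switching columns)."""
    red_best = A == A.max(axis=0, keepdims=True)   # underline column maxima of A
    blue_best = B == B.max(axis=1, keepdims=True)  # underline row maxima of B
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(red_best & blue_best))]

# Illustrative 2 x 2 coordination game with two pure equilibria.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])
print(pure_nash(A, B))   # [(0, 0), (1, 1)]
```

When no overlapping cell exists, no pure equilibrium exists, and the mixed-equilibrium search of the previous section takes over.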
4.3. Simulation of the Multi-AUV Target Assignment Process
Because the game involves a delay before the deployed targets reach their designated positions, and the multi-AUV task execution is not accomplished instantly, it generally requires multiple rounds of interaction. When multi-AUVs engage in tasks based on target allocation, the expected benefits and target assignment for both sides in the first round of task execution can be determined. The expected rewards for the red and blue AUVs are as follows:
and the assignment is shown as follows:
After each round of assignment, both sides' AUVs maneuver, and the second round again paired the opposing groups. Since the first round's deployments had not yet hit their targets, neither side could determine the benefits of the first round of assignment. Therefore, the expected rewards were calculated by combining the first-round assignment effects with the current maneuver situation.
The opposing parties choose the target assignment scheme according to probability, and the final target assignment is as follows:
The effects of the first round of assignment by the red and blue parties and the positions of each AUV are shown in Figure 3. If an AUV's survival probability falls below 0.4, it is considered damaged and no longer participates in the confrontation.
When the second round of target assignment is completed, both sides begin to prepare for the third round. Before the third round of the game, the first round of engagement had already occurred, and its effects were known. Based on the probability of target neutralization and the probability of failure, the survival probabilities of both parties can be calculated as follows:
The status before the third round of assignment and the situation resulting from the second round of target allocation can then be assessed. It can be observed from the game results that one red group suffered significant losses, leading it to withdraw from the game in the third round, while the remaining groups continue the engagement. Since the targets from the first round have been engaged, and the targets from the second round have been deployed but have not yet reached the target area, their outcomes cannot yet be determined. Therefore, by combining the second round of target assignments with the results of the first-round engagements, the expected returns for the second and third rounds can be calculated as follows:
The two sides choose the target assignment scheme according to the probability results obtained by the matrix game, and the final target assignment is as follows:
The final survival probability of each AUV group member on the red and blue sides is shown as follows:
At this point, the blue AUVs have lost the ability to continue the task, marking the end of the game. An integrated analysis is conducted on the target distribution over the three rounds of tasks. For intuitive expression, the coding scheme is designed according to group, sub-target, and round, as illustrated in Figure 4 and Figure 5. The first row of the diagram indicates, for the first assignment, which red-group AUVs selected which blue-group AUVs. Similarly, the blue target allocation in the figure follows the same coding rules, and a 0 in the figure indicates that the AUV did not participate in that round or lost the ability to continue in a previous round.
5. Analysis of Algorithm Superiority
In order to prove the effectiveness of the proposed algorithm, a comparison experiment was designed to compare the computational performance of RPQPSO, RPSO, QPSO, and PSO on dynamic target allocation. The PSO algorithm uses a particle population of 20 and a maximum of 1000 iterations, with standard settings of its inertia weight and acceleration coefficients, and its updating formula adopts Equation (71). The QPSO algorithm uses a particle population of 20 and a maximum of 1000 iterations, and its contraction-expansion coefficient a follows a linear value reduction strategy. The repulsion process parameters are set accordingly; the parameters in the formula refer to Section 2.4. To reduce the influence of random factors, the offline calculation of each algorithm was repeated 50 times under the same conditions.
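For context on the parameter choices above, the following is a minimal sketch of a canonical QPSO iteration with a simple repulsion heuristic added; the variable names and the specific repulsion rule are our assumptions for illustration, not the RPQPSO implementation of Section 3:

```python
import numpy as np

rng = np.random.default_rng(0)

def qpso_step(x, pbest, gbest, a, repel_radius=0.05, repel_gain=0.5):
    """One canonical QPSO update with a simple repulsion heuristic.

    x     : (n, d) particle positions
    pbest : (n, d) personal best positions
    gbest : (d,)   global best position
    a     : contraction-expansion coefficient (decreased linearly outside)
    Particles crowding within repel_radius of gbest are scattered, the
    flavor of perturbation used to escape "steep" local optima.
    """
    n, d = x.shape
    mbest = pbest.mean(axis=0)                        # mean best position
    phi = rng.random((n, d))
    p = phi * pbest + (1.0 - phi) * gbest             # local attractors
    u = rng.random((n, d))
    sign = np.where(rng.random((n, d)) < 0.5, 1.0, -1.0)
    x_new = p + sign * a * np.abs(mbest - x) * np.log(1.0 / u)
    # Repulsion: scatter particles that have collapsed onto gbest.
    dist = np.linalg.norm(x_new - gbest, axis=1)
    crowded = dist < repel_radius
    x_new[crowded] += repel_gain * rng.standard_normal((int(crowded.sum()), d))
    return x_new
```

In this form, the coefficient a is typically decreased linearly over the iterations (a common choice runs from about 1.0 down to 0.5), matching the linear value reduction strategy mentioned above.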
Figure 6 shows the comparison of algorithm convergence. Figure 7, Figure 8 and Figure 9 present statistics of the algorithms' optimal solutions and optimization times, including the minimum and average running times of each algorithm, the minimum and average numbers of iterations, and c, the number of times the optimal solution was found.
Comparing the experimental results, it can be seen that the particle swarm optimization algorithm exhibits good convergence performance but is prone to falling into local optima, achieving the optimal solution only 5 times out of 50 experiments. In contrast, the quantum particle swarm optimization algorithm based on the repulsion process achieves the optimal solution 42 times out of 50 experiments. Its convergence curve is smoother, less prone to abrupt changes, and demonstrates stronger global convergence capability and higher accuracy. However, its convergence speed is slower, resulting in a correspondingly longer solving time.
To verify that the error introduced by early termination of the algorithm is small, the quantum particle swarm optimization algorithm based on the repulsion process was run 100 times under the above conditions, recording the optimal fitness when the algorithm reaches stability, the optimal fitness when the algorithm reaches the maximum number of iterations, and the running time and number of iterations at which the algorithm reaches stability. Table 3 shows the statistical results of the effectiveness of the algorithm when it reaches stability, including the maximum and the average difference between the stable-state and final optimal fitness values over the 100 experiments.
As can be seen from Table 3, the stable-state detection of the repulsion-process quantum particle swarm algorithm effectively reduces the running time of the algorithm, with an error of at most 0.17, which has little influence on solution accuracy while improving the algorithm's real-time performance. In conclusion, the quantum particle swarm optimization algorithm based on the repulsion process proposed in this paper solves the mixed Nash equilibrium well in terms of both real-time performance and accuracy.
6. Conclusions
The target assignment problem has received extensive attention and research, but research on non-cooperative target assignment problems is still relatively limited. This paper studies target assignment in the context of multi-AUV non-cooperative tasks. Firstly, a bi-matrix game model of target assignment based on a multi-AUV system was proposed. Secondly, in order to solve the Nash equilibrium of the bi-matrix game effectively and avoid local optima, a quantum particle swarm optimization algorithm with a local repulsion mechanism was designed. Finally, the advantages of the algorithm were analyzed in terms of the number of iterations and the fitness value of the optimal solution.
Experiments show that the multi-AUV non-cooperative target assignment model designed in this paper meets the requirements of non-cooperative scenarios and can objectively reflect real task conditions. The established AUV situation assessment system provides a foundation for constructing the payoff matrix and subsequent target assignment. Additionally, the dual matrix game-based non-cooperative target assignment model designed in this paper can adapt to dynamic target assignment tasks. Experiments verify that any deviation from the resolved Nash equilibrium solution by either party leads to a decline in returns, confirming the model’s effectiveness.
Since this paper starts from the perspective of both parties and uses target assignment as a means of managing non-cooperative scenarios, with Nash equilibrium as the final assignment strategy, solving the Nash equilibrium becomes the focus. However, it is challenging to solve the Nash equilibrium in practical situations. By designing a quantum particle swarm optimization algorithm based on a repulsion process, the Nash equilibrium solution meets the requirements of non-cooperative scenarios. A typical matrix game experiment is designed to verify the effectiveness and superiority of the proposed algorithm for solving Nash equilibrium in bi-matrix games.
Future research can further expand and enhance the outcomes of this study in the following aspects: first, more complex non-cooperative scenarios can be explored, such as multi-party games or mixed target assignment problems involving multiple types of tasks, to improve the model’s applicability and real-world relevance. Second, incomplete information game models could be introduced in dynamic environments, enabling AUVs to make smarter decisions when faced with dynamically changing task conditions or uncertain information. Additionally, algorithm performance can be further optimized by incorporating methods such as deep learning or multi-agent reinforcement learning, thereby improving the efficiency and adaptability of Nash equilibrium solutions. Finally, in practical applications, simulation tests or the actual deployment of multi-AUV task systems can be conducted to validate the robustness and feasibility of the proposed approach. This would provide stronger technical support for fields such as ocean resource exploration, military confrontation, and environmental monitoring.