CN111488992A - Simulator adversary reinforcing device based on artificial intelligence - Google Patents
- Publication number
- CN111488992A (application CN202010140651.6A)
- Authority
- CN
- China
- Prior art keywords
- module
- simulator
- information
- workstation
- reinforcement learning
- Prior art date 2020-03-03
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/042—Backward inferencing
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00—Simulators for teaching or training purposes
- G09B9/003—Simulators for teaching or training purposes for military purposes and tactics
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00—Simulators for teaching or training purposes
- G09B9/02—Simulators for teaching or training purposes for teaching control of vehicles or other craft
- G09B9/08—Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of aircraft, e.g. Link trainer
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
Abstract
The invention discloses a simulator adversary reinforcing device based on artificial intelligence, comprising a deep reinforcement learning module, a workstation module and an adaptation module. The workstation module is provided with a self-play adversarial training engine and sends the data obtained from self-play adversarial training to the deep reinforcement learning module; the deep reinforcement learning module trains an agent with the received data; and the workstation module is connected to the simulator through the adaptation module so that the simulator operation object and the agent can carry out adversarial simulation. With a unified technical architecture, the invention can provide networked human-machine combat on various types of simulators, complete information acquisition automatically, and thereby drive the continuous upgrading of the agent.
Description
Technical Field
The invention belongs to the technical field of artificial-intelligence adversarial gaming and simulation-based wargaming, and particularly relates to a simulator adversary reinforcing device based on artificial intelligence.
Background
Various simulators are commonly used to model realistic adversarial environments, whether for games or for training specialist personnel. A flight simulator, for example, can closely approximate a real aircraft through simulation of its dynamic model, environment, cockpit and controls. At low cost, aircrew can train continuously on a flight simulator; it is safe, reliable, economical and efficient, and contributes positively to raising pilots' flying proficiency.
However, existing simulators are mainly upgraded one by one by their original manufacturers. Because flight simulators come from many different manufacturers, such upgrades are difficult and expensive.
Existing simulators are mainly oriented toward flight simulation and can partially realize simple autonomous behaviors such as cruising and fixed-point flight. Some flight simulators allow a simulated opponent to be selected for air-combat adversarial training, but the adversary algorithms are mostly based on simple expert knowledge, provide a low level of opposition, and adapt poorly to different scenarios.
Taking air-combat training as the objective, the shortcomings of existing flight simulators are mainly the following: the virtual opponent is not intelligent enough and the adversarial experience is insufficient, so strong-adversary confrontation and intelligent assistance cannot be achieved; the simulator systems are closed, making external interaction difficult, extension inconvenient and on-demand upgrading impossible; and different simulators differ greatly in hardware and software design, so maintenance is costly, solutions cannot be ported, and customized upgrades waste time and labor.
In view of these disadvantages, upgrading existing simulators urgently requires solving the following problems: (1) the upgrade scheme should be easy to replicate, broadly adaptable, and independent of the simulator type; (2) the game opponent should have a high level of intelligence and, with air combat as the guiding scenario, come closer to actual combat; (3) maintenance cost should be low and porting convenient.
Disclosure of Invention
The object of the invention is to provide a simulator adversary reinforcing device based on artificial intelligence that increases the adversarial intensity of games or training and provides operators with a highly intelligent virtual opponent.
To achieve this object, the technical solution of the application is as follows:
A simulator adversary reinforcing device based on artificial intelligence comprises a deep reinforcement learning module, a workstation module and an adaptation module, and its working modes comprise a training mode and an inference mode, wherein:
when operating in the training mode, the workstation module, which is provided with a self-play adversarial training engine, sends the data obtained from self-play adversarial training to the deep reinforcement learning module;
the deep reinforcement learning module trains an agent using the received data;
when operating in the inference mode, the workstation module is connected to the simulator through the adaptation module so that the simulator operation object and the agent can carry out adversarial simulation; the adaptation module captures on-screen display information from the simulator, recognizes it, and sends the recognized pose information of the simulator operation object to the workstation module;
the workstation module sends the pose information of the simulator operation object and the pose information of the agent to the deep reinforcement learning module; the deep reinforcement learning module generates decision information and returns it to the workstation module; and after executing the decision, the workstation module sends the pose information of the agent to the adaptation module;
and the adaptation module generates an image containing the agent's pose from the agent's pose information and sends the image to the simulator for on-screen display.
Further, the simulator adversary reinforcing device based on artificial intelligence further comprises a data management module for saving the sample data generated in the training mode and the inference mode and for saving trained agent models; the data management module is connected to the workstation module and to the deep reinforcement learning module respectively.
Further, the simulator adversary reinforcing device based on artificial intelligence further comprises a performance evaluation module, which is connected to the workstation module and is used for evaluating training results.
Further, the deep reinforcement learning module trains the agent using the DDPG algorithm.
Further, the adaptation module captures on-screen display information from the simulator for recognition, and uses OCR to recognize the images obtained from the on-screen display information and obtain the pose information of the simulator operation object.
The simulator adversary reinforcing device based on artificial intelligence provided by this application constructs an air-combat simulation environment through the workstation module and interfaces with an existing simulator through the adaptation module, so that the flight simulator can operate within the constructed simulation environment. Through the deep reinforcement learning module, a strong-adversary agent is generated by training, providing simulation trainees with a strong AI opponent. The advantages of the technical solution of this application are: (1) with a unified technical architecture, networked human-machine combat can be realized on various types of flight simulators; (2) with continuous high-speed image recognition at its core, the aircraft state can be extracted simply from the video output of the simulator and fed into the intelligent air-combat engine, while the virtual enemy target in the engine is overlaid on the display of the original simulator, upgrading the flight simulator into an intelligent air-combat simulator and raising the level of air-combat tactics training; (3) during human-machine combat, tactical information acquisition is completed automatically, further driving the continuous upgrading of the intelligent air-combat system.
Drawings
FIG. 1 is a schematic structural diagram of the simulator adversary reinforcing device based on artificial intelligence according to the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a simulator adversary reinforcing device based on artificial intelligence comprises a deep reinforcement learning module, a workstation module and an adaptation module, and its working modes comprise a training mode and an inference mode, wherein:
when operating in the training mode, the workstation module, which is provided with a self-play adversarial training engine, sends the data obtained from self-play adversarial training to the deep reinforcement learning module;
the deep reinforcement learning module trains an agent using the received data;
when operating in the inference mode, the workstation module is connected to the simulator through the adaptation module so that the simulator operation object and the agent can carry out adversarial simulation; the adaptation module captures on-screen display information from the simulator, recognizes it, and sends the recognized pose information of the simulator operation object to the workstation module;
the workstation module sends the pose information of the simulator operation object and the pose information of the agent to the deep reinforcement learning module; the deep reinforcement learning module generates decision information and returns it to the workstation module; and after executing the decision, the workstation module sends the pose information of the agent to the adaptation module;
and the adaptation module generates an image containing the agent's pose from the agent's pose information and sends the image to the simulator for on-screen display.
This embodiment is described with pilot training as an example; the method also applies to reinforcing artificial-intelligence opponents for other simulators, for example an artificial-intelligence opponent in a game, which is not described separately below. In this embodiment, the artificial-intelligence opponent is called the agent, and the object operated manually on the simulator is called the simulator operation object. In pilot training, the agent takes part in the adversarial exercise as the blue force in the air-combat engagement, and the simulated aircraft operated by the pilot in the simulated engagement is the red force (the simulator operation object).
In this embodiment the workstation module has a built-in self-play adversarial training engine, and may also connect to an external one. For pilot training, the self-play adversarial training engine may be a simulated air-combat engine that realizes the functions of air-combat simulation, covering elements such as the air-combat scenario and the aircraft, and the scenario and the aircraft type can be configured individually. The workstation module is the hub of the device and serves as the bridge between all the modules, supporting scenario selection, aircraft parameter configuration and the like. It can define air-combat scenarios, edit scenario parameters, load scenarios, define aircraft models, edit aircraft model parameters, load aircraft models, configure win/loss rule parameters, and so on. A minimal configuration sketch is given below.
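For illustration only, the following sketch shows what such a scenario and aircraft configuration might look like; all field names and numeric values are assumptions made for this sketch and are not taken from the patent.

```python
# Hypothetical scenario/aircraft configuration the workstation module might
# load; the fields and values below are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AircraftModel:
    name: str
    max_speed_mps: float      # maximum speed, m/s
    radar_range_km: float     # detection radar range
    missile_count: int

@dataclass
class Scenario:
    name: str                 # e.g. a "2v2" setup
    red_aircraft: list = field(default_factory=list)
    blue_aircraft: list = field(default_factory=list)
    win_rules: dict = field(default_factory=dict)

scenario = Scenario(
    name="2v2",
    red_aircraft=[AircraftModel("red-1", 600.0, 120.0, 4)] * 2,
    blue_aircraft=[AircraftModel("blue-1", 600.0, 120.0, 4)] * 2,
    win_rules={"max_steps": 3000, "kill_reward": 1.0},
)
```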
Taking pilot training as an example, the deep reinforcement learning module integrates air-combat-related algorithms, including reinforcement learning, expert rules and the like, and can realize classical air-combat tactics such as crossing attack, single-side attack and double-side attack.
The deep reinforcement learning module of this embodiment trains the agent with the DDPG (Deep Deterministic Policy Gradient) algorithm, which is a relatively mature algorithm and is not described in detail here. Let s_t be the combat situation observed by the agent in the current state, a_t the decision action executed by the agent in that situation, r_t the immediate reward obtained after executing the decision action, and s_{t+1} the combat situation returned by the environment in the next state. The agent interacts with the air-combat environment through a sequence of (s, a, r), with the goal of maximizing the cumulative reward. In the current state s_t, the agent computes the decision action a_t = μ(s_t) according to its policy μ and executes it in the simulation environment, and this repeats until the engagement ends. A sketch of this interaction loop is given below.
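The following is a minimal sketch of the agent-environment interaction loop described above. `AirCombatEnv`-style `env`, `actor` and `buffer` are hypothetical stand-ins for the self-play engine, the policy network μ(s) and the experience store; the gym-like step interface is an assumption, not the patent's implementation.

```python
# Minimal sketch of the DDPG-style interaction loop described above.
# env, actor and buffer are assumed stand-ins; actor.act(s) returns a numpy
# action vector a_t = mu(s_t).
import numpy as np

def run_episode(env, actor, buffer, noise_std=0.1):
    s = env.reset()                        # initial combat situation s_0
    done, total_reward = False, 0.0
    while not done:
        a = actor.act(s)                   # deterministic action a_t = mu(s_t)
        a = a + np.random.normal(0.0, noise_std, size=a.shape)  # exploration noise
        s_next, r, done, _ = env.step(a)   # environment returns r_t and s_{t+1}
        buffer.add(s, a, r, s_next, done)  # store transition for off-policy training
        total_reward += r
        s = s_next
    return total_reward                    # goal: maximize cumulative reward
```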
In the training mode, training can either start from scratch or continue from a loaded existing model. The self-play adversarial training engine built into the workstation module provides the simulation environment for air-combat simulation, including scenario modeling, aircraft modeling and reward-function modeling. Scenario modeling establishes the organizational relationships between aircraft, such as 2v2 and 4v2 scenario assumptions and engagement rule design. Aircraft modeling comprises basic-element modeling and maneuvering-capability modeling: basic-element modeling is an important part of building the simulation environment and covers the detection radar, electro-optical radar, missiles, jamming/decoy rounds, jamming pods and the like; maneuvering-capability modeling is the basis of both the air-combat simulation environment and tactical strategy design, and for the intelligent air-combat game engine the state information of the virtual aircraft, such as position, angle, speed, radar state and missile state, is the basis that influences decision making. The reward function is the standard by which the agent judges its own performance, and building a reasonable reward function R(s) is key to reinforcement learning. The reward function should weigh attainability against sparsity and strike a balance between implementing tactics and allowing exploration. Its shape is not unique, and for value optimization the distribution of positive and negative rewards is preferably balanced over the range of its arguments. A hedged example of such a shaped reward is sketched below.
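The sketch below illustrates one possible shaped reward R(s) combining small dense terms with sparse terminal terms; the weights, state fields and terminal bonuses are assumptions made for this sketch, not values from the patent.

```python
# Illustrative shaped reward R(s) for an air-combat agent; weights and state
# fields are assumptions for this sketch.
import math

def reward(state):
    """state: dict describing the relative geometry between agent and opponent."""
    r = 0.0
    # dense shaping terms (kept small so they do not drown out terminal rewards)
    r += 0.01 * math.cos(state["aspect_angle"])       # reward pointing at the target
    r -= 0.001 * abs(state["range_km"] - 10.0)        # prefer a useful engagement range
    # sparse terminal terms
    if state.get("opponent_destroyed"):
        r += 1.0
    if state.get("self_destroyed"):
        r -= 1.0
    return r
```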
The workstation module returns self-play training data such as states and rewards to the deep reinforcement learning module, the deep reinforcement learning module returns decision actions to the workstation module, and the workstation module executes those decisions; in this way an increasingly strong adversarial agent is trained.
In the inference mode, the workstation module of this embodiment is connected to the simulator through the adaptation module; in the pilot-training example the simulator is a flight simulator. The adaptation module adapts to different simulators and mainly comprises an image recognition module and an image processing module. The pilot operates the flight simulator and normally observes, in real time through its visual system, his own aircraft's information as well as the azimuth, range and other information of enemy aircraft within radar detection range, in order to plan the next flight operation. Likewise, the agent must obtain the situation information of the current air-combat environment in real time and output the next flight operation. However, because existing flight simulators are diverse and do not necessarily provide an interface for exporting flight data, how to obtain flight data from the flight simulator is a key problem. A screen-capture sketch under stated assumptions is given below.
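One way to obtain the raw picture without any data interface is to grab the simulator's video output directly from the screen. The patent does not name a capture tool; the sketch below assumes the `mss` library purely for illustration.

```python
# Illustrative real-time capture of the simulator's screen output using mss
# (an assumption; the patent does not name a capture tool).
import numpy as np
import mss

def capture_frame(region={"left": 0, "top": 0, "width": 800, "height": 600}):
    with mss.mss() as grabber:
        shot = grabber.grab(region)          # raw BGRA screen pixels
        frame = np.asarray(shot)[:, :, :3]   # drop alpha channel -> BGR image
    return frame
```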
In this embodiment, pose information of the pilot's aircraft, such as position and heading angle, is obtained by continuously recognizing the instrument-panel information in the pilot's visual system at high speed. During the air-combat engagement, in order to give the pilot the feeling of real flight, the visual system renders the picture according to the current environment, the movement of the control stick and the current motion state of the aircraft; for example, strong light may make the instrument panel hard to read, and moving the control stick may occlude useful information. Different countermeasures to this problem exist:
a) Split the main instrument panel onto its own screen, capture the screen image in real time, and apply image recognition to read the information shown on the panel. The instrument-panel information is simple and accurate to recognize, and the acquired situation information is stable. The disadvantage is that the flight simulator must provide the ability to split off the instrument panel.
b) Recognize the visual system directly, i.e. the picture from the pilot's viewing angle. When useful information is occluded, the continuous data produced by continuous image recognition is analyzed to recover it, using algorithms such as interpolation and prediction. The advantage is that this does not depend on whether the flight simulator provides any external interface, so it is more universal. The disadvantages are that the situation information is obtained less stably, dedicated algorithms must be formulated for the different abnormal picture conditions, and a deep learning method with strong generalization ability would require a large number of training samples. A sketch of the interpolation idea is given after this list.
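The following sketch illustrates the interpolation/prediction idea for occluded readings from approach b); the gap threshold and the simple linear trend model are assumptions made for this sketch.

```python
# Illustrative gap-filling for occluded instrument readings: interpolate short
# gaps linearly, otherwise extrapolate the last observed trend.
def fill_gaps(values, max_gap=5):
    """values: list of floats, with None where the reading was occluded."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] is None:
            j += 1                       # [i, j) is an occluded gap
        if i > 0 and j < len(out) and (j - i) <= max_gap:
            step = (out[j] - out[i - 1]) / (j - i + 1)
            for k in range(i, j):        # linear interpolation across the gap
                out[k] = out[k - 1] + step
        elif i > 1:
            trend = out[i - 1] - out[i - 2]
            for k in range(i, j):        # short-term prediction from the trend
                out[k] = out[k - 1] + trend
        i = j
    return out
```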
The image recognition module of this embodiment recognizes the image with OCR and obtains the pose information. OCR (optical character recognition) determines the approximate shape of the image content by detecting patterns of light and dark and then converts that shape into computer text with a character recognition method. OCR typically performs the following operations on an image: 1) image preprocessing, such as binarization, to improve the readability of the image; 2) detection, i.e. locating the regions of the picture that contain text; 3) recognition of the characters in those regions. A sketch of such a pipeline is given below.
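A minimal sketch of this preprocess-detect-recognize pipeline, assuming OpenCV and the Tesseract engine via pytesseract as stand-ins for the patent's unspecified OCR implementation; the input frame is assumed to come from a real-time screen grab as above.

```python
# Sketch of the preprocess -> detect -> recognize OCR pipeline; OpenCV and
# pytesseract are assumptions, the patent names no specific tools.
import cv2
import pytesseract

def read_panel(frame_bgr):
    # 1) preprocessing: grayscale + Otsu binarization improves readability
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 2) detection: candidate text regions found as connected components
    n, _, stats, _ = cv2.connectedComponentsWithStats(255 - binary)
    readings = {}
    for i in range(1, n):
        x, y, w, h, area = stats[i]
        if area < 50:                  # skip small noise blobs
            continue
        roi = binary[y:y + h, x:x + w]
        # 3) recognition: character recognition on the cropped region
        text = pytesseract.image_to_string(roi, config="--psm 7").strip()
        if text:
            readings[(x, y)] = text
    return readings
```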
Commonly used detection methods include: a) connected-component-based methods; b) sliding-window-based methods; c) deep-learning-based methods. Connected-component methods assume that characters appear in the image as connected components, so they first extract all connected components in the image as candidates and then select the character components according to a classifier or some rules. Sliding-window methods extract features from the image region covered by a sliding window and feed them to a pre-trained scoring classifier that judges whether the current region contains characters. Deep-learning-based detection is generally used when the scene is more complex: given a large amount of data in which the text regions are labeled with boxes, a multi-layer learnable network is fitted to the mapping between the data and the labeled boxes by minimizing a predefined loss function. The sliding-window idea is sketched below.
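The sliding-window approach can be sketched as follows; `classifier` is a hypothetical pre-trained scoring object, and the window size, stride and threshold are assumptions made for illustration.

```python
# Illustrative sliding-window text detection: slide a fixed window across the
# image and let a pre-trained scoring classifier decide whether it holds text.
def detect_text_windows(gray, classifier, win=(32, 16), stride=8, thresh=0.5):
    h, w = gray.shape
    boxes = []
    for y in range(0, h - win[1] + 1, stride):
        for x in range(0, w - win[0] + 1, stride):
            patch = gray[y:y + win[1], x:x + win[0]]
            if classifier.score(patch) > thresh:   # window likely contains text
                boxes.append((x, y, win[0], win[1]))
    return boxes
```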
The detected target regions are then recognized. In principle the methods fall roughly into two types: template matching, and feature extraction plus a classifier. Traditional feature extraction usually extracts edge information, gray-scale features and similar hand-crafted features of the target region, and commonly used classifiers include the support vector machine and nearest-neighbor methods. Deep learning, which has recently become popular, is another form of the feature-extraction-plus-classifier model: trained end to end in a high-dimensional fitting space with a predefined loss function and a large amount of labeled data, it automatically extracts features that are easy to classify. Because it extracts features by itself, achieves higher recognition accuracy and generalizes well, it is widely used.
The image recognition module in the adaptation module of this embodiment recognizes the situation information of the simulator operation object (the red side) in the simulator and passes it to the workstation module, while the image processing module overlays the situation information of the agent (the blue side) onto the radar situation picture and sends it to the simulator, thereby realizing human-machine confrontation.
The workstation module sends the pose information of the simulator operation object and the pose information of the agent to the deep reinforcement learning module; the deep reinforcement learning module outputs a decision action for the current situation, generates decision information and sends it to the workstation module; and after executing the decision, the workstation module updates the agent's state in the simulation environment and sends the agent's pose information to the adaptation module.
The adaptation module generates an image containing the agent's pose from the agent's pose information and sends it to the simulator for on-screen display; the pilot sees the blue-side situation on the screen and carries out the human-machine game accordingly. A sketch of this overlay step is given below.
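The sketch below illustrates overlaying the agent onto a captured radar picture with OpenCV; the coordinate conversion, scale and symbology are assumptions for the sketch, not the patent's actual display format.

```python
# Illustrative overlay of the agent (blue side) onto a captured radar picture.
import cv2

def overlay_agent(radar_frame, agent_pose, metres_per_pixel=100.0):
    """agent_pose: dict with 'x', 'y' (metres, radar-centred) and 'heading' (deg)."""
    h, w = radar_frame.shape[:2]
    cx, cy = w // 2, h // 2
    px = int(cx + agent_pose["x"] / metres_per_pixel)
    py = int(cy - agent_pose["y"] / metres_per_pixel)
    cv2.circle(radar_frame, (px, py), 6, (255, 0, 0), 2)           # blue target symbol
    cv2.putText(radar_frame, f"{agent_pose['heading']:.0f}", (px + 8, py),
                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 0, 0), 1)     # heading label
    return radar_frame
```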
In one embodiment, the simulator adversary reinforcing device based on artificial intelligence further comprises a data management module for saving the sample data generated in the training mode and the inference mode and for saving trained agent models; the data management module is connected to the workstation module and to the deep reinforcement learning module. In the training mode, the workstation module generates adversarial sample data through the built-in self-play adversarial training engine and sends it to the deep reinforcement learning module for training, while the same data is sent to the data management module for persistent storage; during training the deep reinforcement learning module also sends the agent model to the data management module from time to time, depending on how training is converging, and the data management module stores and manages the models uniformly according to their descriptions. In the inference mode, the workstation module receives the real-time engagement data sent by the adaptation module and, while making decisions, sends the data to the data management module for persistent storage.
Sample data, agent models and the like generated during training and inference are managed uniformly by the data management module and can be stored, loaded, selected, copied and so on through the workstation module. A minimal persistence sketch is given below.
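The following sketch outlines the data management module's persistence duties; the directory layout, file formats and method names are assumptions made for illustration.

```python
# Minimal sketch of the data management module's persistence duties.
import json
import pickle
import time
from pathlib import Path

class DataManager:
    def __init__(self, root="./datastore"):
        self.root = Path(root)
        (self.root / "samples").mkdir(parents=True, exist_ok=True)
        (self.root / "models").mkdir(parents=True, exist_ok=True)

    def save_samples(self, transitions, mode="training"):
        """Persist a batch of (s, a, r, s_next, done) transitions."""
        name = f"{mode}_{int(time.time())}.pkl"
        with open(self.root / "samples" / name, "wb") as f:
            pickle.dump(transitions, f)

    def save_model(self, model_bytes, description):
        """Store a serialized agent model together with its description."""
        stamp = int(time.time())
        (self.root / "models" / f"agent_{stamp}.bin").write_bytes(model_bytes)
        with open(self.root / "models" / f"agent_{stamp}.json", "w") as f:
            json.dump(description, f)
```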
In another embodiment, the simulator adversary reinforcing device based on artificial intelligence further comprises a performance evaluation module connected to the workstation module. During training and inference the workstation module sends the pose information of both sides and the agent's decision information to the performance evaluation module, which evaluates the effect of the agent's decisions and the situational advantage of the two sides from the received information and displays the evaluation results visually.
The performance evaluation module helps the user understand the training progress and assess how well the agent's training is converging. By modifying the parameters of the simulation environment and comparing the resulting training outcomes and agent win rates, a decision maker can understand how strongly each parameter influences the final result and remedy the agent's weaknesses in each respect. The performance evaluation module can also evaluate the combat effectiveness of each aircraft and judge the strength of the agent from the evaluation result. A win-rate evaluation sketch follows.
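The sketch below computes simple win-rate and engagement-length metrics over a batch of episodes; the metric names and episode record format are assumptions made for this sketch, not the patent's definitions.

```python
# Illustrative win-rate and engagement evaluation over recorded episodes.
def evaluate(episodes):
    """episodes: list of dicts like {'winner': 'blue', 'steps': 812}."""
    n = len(episodes)
    wins = sum(1 for e in episodes if e["winner"] == "blue")
    avg_steps = sum(e["steps"] for e in episodes) / max(n, 1)
    return {
        "episodes": n,
        "blue_win_rate": wins / max(n, 1),    # proxy for agent strength
        "avg_engagement_length": avg_steps,   # shorter may indicate decisiveness
    }

# usage: report = evaluate(history); print(report)
```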
The above embodiments express only several embodiments of the present application; their description is specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within its scope of protection. The protection scope of this patent is therefore defined by the appended claims.
Claims (5)
1. A simulator adversary reinforcing device based on artificial intelligence, characterized in that the device comprises a deep reinforcement learning module, a workstation module and an adaptation module, and that its working modes comprise a training mode and an inference mode, wherein:
when operating in the training mode, the workstation module, which is provided with a self-play adversarial training engine, sends the data obtained from self-play adversarial training to the deep reinforcement learning module;
the deep reinforcement learning module trains an agent using the received data;
when operating in the inference mode, the workstation module is connected to the simulator through the adaptation module so that the simulator operation object and the agent can carry out adversarial simulation; the adaptation module captures on-screen display information from the simulator, recognizes it, and sends the recognized pose information of the simulator operation object to the workstation module;
the workstation module sends the pose information of the simulator operation object and the pose information of the agent to the deep reinforcement learning module; the deep reinforcement learning module generates decision information and returns it to the workstation module; and after executing the decision, the workstation module sends the pose information of the agent to the adaptation module;
and the adaptation module generates an image containing the agent's pose from the agent's pose information and sends the image to the simulator for on-screen display.
2. The simulator adversary reinforcing device based on artificial intelligence of claim 1, further comprising a data management module for saving the sample data generated in the training mode and the inference mode and for saving the trained agent model, wherein the data management module is connected to the workstation module and to the deep reinforcement learning module respectively.
3. The simulator adversary reinforcing device based on artificial intelligence of claim 1, further comprising a performance evaluation module connected to the workstation module for evaluating training results.
4. The simulator adversary reinforcing device based on artificial intelligence of claim 1, wherein the deep reinforcement learning module trains the agent using the DDPG algorithm.
5. The simulator adversary reinforcing device based on artificial intelligence of claim 1, wherein the adaptation module captures on-screen display information from the simulator for recognition, and uses OCR to recognize the images obtained from the on-screen display information and to obtain the pose information of the simulator operation object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010140651.6A CN111488992A (en) | 2020-03-03 | 2020-03-03 | Simulator adversary reinforcing device based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111488992A (en) | 2020-08-04
Family
ID=71798144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010140651.6A (CN111488992A, pending) | Simulator adversary reinforcing device based on artificial intelligence | 2020-03-03 | 2020-03-03
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111488992A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108021754A (en) * | 2017-12-06 | 2018-05-11 | 北京航空航天大学 | A kind of unmanned plane Autonomous Air Combat Decision frame and method |
CN109496318A (en) * | 2018-07-30 | 2019-03-19 | 东莞理工学院 | Adaptive game playing algorithm based on deeply study |
CN109636699A (en) * | 2018-11-06 | 2019-04-16 | 中国电子科技集团公司第五十二研究所 | A kind of unsupervised intellectualized battle deduction system based on deeply study |
CN109670596A (en) * | 2018-12-14 | 2019-04-23 | 启元世界(北京)信息技术服务有限公司 | Non-fully game decision-making method, system and the intelligent body under information environment |
CN110428057A (en) * | 2019-05-06 | 2019-11-08 | 南京大学 | A kind of intelligent game playing system based on multiple agent deeply learning algorithm |
Non-Patent Citations (1)
Title |
---|
张春明主编 [Zhang Chunming (ed.)]: 《防空导弹飞行控制系统仿真测试技术》 [Simulation and Test Technology for Air-Defense Missile Flight Control Systems], 30 June 2014, 中国宇航出版社 [China Astronautics Publishing House] *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560332A (en) * | 2020-11-30 | 2021-03-26 | 北京航空航天大学 | Aviation soldier system intelligent behavior modeling method based on global situation information |
CN112836036A (en) * | 2021-03-18 | 2021-05-25 | 中国平安人寿保险股份有限公司 | Interactive training method, device, terminal and storage medium for intelligent agent |
CN112836036B (en) * | 2021-03-18 | 2023-09-08 | 中国平安人寿保险股份有限公司 | Interactive training method and device for intelligent agent, terminal and storage medium |
CN113298260A (en) * | 2021-06-11 | 2021-08-24 | 中国人民解放军国防科技大学 | Confrontation simulation deduction method based on deep reinforcement learning |
CN115470710A (en) * | 2022-09-26 | 2022-12-13 | 北京鼎成智造科技有限公司 | Air game simulation method and device |
KR20240048839A (en) * | 2022-10-07 | 2024-04-16 | 엘아이지넥스원 주식회사 | Virtual training apparatus capable of adaptive simulation of virtual target and training method using the same |
KR102718368B1 (en) * | 2022-10-07 | 2024-10-16 | 엘아이지넥스원 주식회사 | Virtual training apparatus capable of adaptive simulation of virtual target and training method using the same |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200804