
CN111488992A - Simulator adversary reinforcing device based on artificial intelligence - Google Patents

Simulator adversary reinforcing device based on artificial intelligence

Info

Publication number
CN111488992A
CN111488992A (application CN202010140651.6A)
Authority
CN
China
Prior art keywords: module, simulator, information, workstation, reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010140651.6A
Other languages
Chinese (zh)
Inventor
夏少杰
刘长卫
瞿崇晓
张瑞峰
高翔
李永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 52 Research Institute
Original Assignee
CETC 52 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 52 Research Institute
Priority to CN202010140651.6A
Publication of CN111488992A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/042 Backward inferencing
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 Simulators for teaching or training purposes
    • G09B9/003 Simulators for teaching or training purposes for military purposes and tactics
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 Simulators for teaching or training purposes
    • G09B9/02 Simulators for teaching or training purposes for teaching control of vehicles or other craft
    • G09B9/08 Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of aircraft, e.g. Link trainer
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6027 Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an artificial-intelligence-based simulator opponent reinforcing device, which comprises a deep reinforcement learning module, a workstation module and an adaptation module. The workstation module is provided with a self-play adversarial training engine and sends the data obtained from self-play adversarial training to the deep reinforcement learning module; the deep reinforcement learning module uses the received data to train an agent; and the workstation module is connected to the simulator through the adaptation module, so that the simulator-operated object and the agent can engage in adversarial simulation. Using a unified technical architecture, the invention can realize networked human-machine combat on many types of simulators, can complete information acquisition automatically, and thereby drives the continuous upgrading of the agent.

Description

Simulator adversary reinforcing device based on artificial intelligence
Technical Field
The invention belongs to the technical field of artificial-intelligence confrontation and simulation-based deduction, and in particular relates to an artificial-intelligence-based simulator opponent reinforcing device.
Background
Various simulators are commonly used to model realistic adversarial environments, whether for games or for the training of specialized personnel. A flight simulator, for example, can approach reality to a great extent by simulating a real aircraft, including approximating its dynamics model, environment, cockpit and controls. At low cost, aircrew can train continuously on a flight simulator; such training is safe, reliable, economical and efficient, and has a positive effect on improving pilots' flying skills.
However, existing simulators are mainly upgraded one by one by their original manufacturers, and because flight simulators come from many different manufacturers, such upgrades are extremely difficult and expensive.
Existing simulators are generally aimed at flight simulation and can partially realize simple autonomous behaviors such as cruising and fixed-point flight. Some flight simulators allow a simulated opponent to be selected for air-combat training, but the opponent's algorithms mostly rely on simple expert knowledge, so the level of opposition is low and adaptability across different scenarios is poor.
Taking air combat as the training objective, the shortcomings of existing flight simulators are mainly the following: the intelligence of the virtual opponent is insufficient and the adversarial experience is weak, so confrontation against a strong enemy cannot be realized and the goal of intelligent training assistance cannot be achieved; the simulator system is closed, making external interaction difficult, extension inconvenient, and on-demand upgrades impossible; and different simulators differ greatly in hardware and software design, so maintenance is costly, solutions cannot be ported, and customized upgrades waste time and labor.
In view of these shortcomings, upgrading existing simulators urgently requires solving the following problems: (1) the upgrade scheme must be easy to replicate, widely applicable, and independent of the simulator type; (2) the game opponent must be highly intelligent and, with air combat as the guide, closer to actual combat; (3) maintenance must be cheap and porting convenient.
Disclosure of Invention
The aim of the invention is to provide an artificial-intelligence-based simulator opponent reinforcing device that improves the adversarial quality of games or training and provides operators with a highly intelligent virtual opponent.
In order to achieve the purpose, the technical scheme of the application is as follows:
an artificial intelligence-based simulator opponent augmentation device, the artificial intelligence-based simulator opponent augmentation device comprising: the simulator opponent reinforcing device based on artificial intelligence comprises a deep reinforcement learning module, a workstation module and an adaptation module, wherein the working modes of the simulator opponent reinforcing device based on artificial intelligence comprise a training mode and an inference mode, wherein:
when the intelligent game machine works in a training mode, the workstation module is provided with a self-game confrontation training engine and sends data obtained by self-game confrontation training to the deep reinforcement learning module;
the deep reinforcement learning module trains an agent by adopting the received data;
when the intelligent agent is in an inference mode, the workstation module is connected with the simulator through the adaptation module so that the simulator operation object and the intelligent agent can perform confrontation simulation, the adaptation module acquires screen display information from the simulator for recognition, and sends recognized attitude and posture information of the simulator operation object to the workstation module;
the workstation module sends the attitude information of the simulator operation object and the attitude information of the intelligent body to the deep reinforcement learning module, the deep reinforcement learning module generates decision information and sends the decision information to the workstation module, and the workstation module sends the attitude information of the intelligent body to the adaptation module after executing the decision;
and the adaptation module generates an image containing the attitude of the intelligent body according to the attitude information of the intelligent body and sends the image to the simulator for screen display.
Further, the device also comprises a data management module, which stores the sample data generated in the training mode and the inference mode and saves the trained agent models; the data management module is connected to the workstation module and to the deep reinforcement learning module respectively.
Further, the device also comprises a performance evaluation module, which is connected to the workstation module and used to evaluate training results.
Further, the deep reinforcement learning module trains the agent using the DDPG algorithm.
Further, the adaptation module captures on-screen display information from the simulator for recognition, using OCR to recognize images obtained from the on-screen display information and extract the situation information of the simulator-operated object.
The artificial-intelligence-based simulator opponent reinforcing device provided by the application constructs a simulated air-combat environment through the workstation module and interfaces with the flight simulator through the adaptation module, letting the flight simulator operate inside the constructed simulation environment. Through the deep reinforcement learning module, a strong opponent agent is generated by training, providing a powerful AI adversary for trainees. The advantages of this technical scheme are: (1) using a unified technical architecture, networked human-machine combat can be realized on many types of flight simulators; (2) with high-speed recognition of continuous images at its core, the aircraft state can be extracted simply from the video output of the simulator and fed into the intelligent air-combat engine, while the virtual enemy target in the engine is overlaid on the original simulator's display, upgrading a flight simulator into an intelligent air-combat simulator and raising the level of air-combat tactical training; (3) during human-machine combat, tactical information acquisition is completed automatically, further driving the continuous upgrading of the intelligent air-combat system.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence-based simulator opponent reinforcing device according to the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, an artificial-intelligence-based simulator opponent reinforcing device comprises a deep reinforcement learning module, a workstation module and an adaptation module, and has two working modes, a training mode and an inference mode, wherein:
in the training mode, the workstation module runs a self-play adversarial training engine and sends the data obtained from self-play adversarial training to the deep reinforcement learning module;
the deep reinforcement learning module uses the received data to train an agent;
in the inference mode, the workstation module is connected to the simulator through the adaptation module so that the simulator-operated object and the agent can engage in adversarial simulation; the adaptation module captures on-screen display information from the simulator for recognition, and sends the recognized situation information of the simulator-operated object to the workstation module;
the workstation module sends the situation information of the simulator-operated object and the situation information of the agent to the deep reinforcement learning module; the deep reinforcement learning module generates decision information and returns it to the workstation module; after executing the decision, the workstation module sends the agent's situation information to the adaptation module;
and the adaptation module generates an image containing the agent's situation from the agent's situation information and sends it to the simulator for on-screen display.
This embodiment is explained using pilot training as an example, but the method is equally applicable to reinforcing artificial-intelligence opponents in other simulators, for example in games; such cases are not described separately below. In this embodiment, the artificial-intelligence opponent is referred to as the agent, and the object manually operated on the simulator is referred to as the simulator-operated object. In pilot training, the agent takes part in adversarial drills as the Blue force in the air-combat confrontation, while the simulated aircraft operated by the trainee pilot acts as the Red force (the simulator-operated object).
In this embodiment the self-play adversarial training engine is built into the workstation module, though an external engine may also be connected. For pilot training the engine is a simulated air-combat engine that realizes the functions of air-combat simulation, covering elements such as the air-combat scenario and the aircraft, and allows the scenario and the aircraft type to be configured individually. The workstation module is the hub of the device and the bridge between all modules; it supports scenario selection, aircraft parameter configuration and so on, including scenario definition, scenario parameter editing, scenario loading, aircraft model definition, aircraft model parameter editing, aircraft model loading, and configuration of the win/lose rule parameters. A sketch of such a configuration follows.
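As an illustration only, the scenario, aircraft and rule parameters described above might be expressed as a parameter set like the following Python sketch; every key and value here is an assumption for illustration, not taken from the patent.

```python
# Hypothetical scenario / aircraft / rules configuration for the workstation
# module; all names and values are illustrative assumptions.
scenario_config = {
    "scenario": {"name": "2v2_air_combat", "red_aircraft": 2, "blue_aircraft": 2},
    "aircraft": {"max_speed_mps": 600.0, "radar_range_km": 120.0,
                 "missiles": 4, "decoy_flares": 2},
    "rules": {"max_duration_s": 1200, "win_condition": "all_opponents_destroyed"},
}
```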
Taking pilot training as the example, the deep reinforcement learning module integrates the algorithms relevant to air combat, including reinforcement learning and expert rules, and can realize classical air-combat tactics such as cross attack, unilateral attack and bilateral attack.
The deep reinforcement learning module of this embodiment trains the agent with the DDPG (Deep Deterministic Policy Gradient) algorithm; DDPG is a relatively mature algorithm and is not described in detail here. Let s_t be the combat situation observed by the agent in the current state, a_t the decision action executed by the agent under the current situation, r_t the immediate reward the agent obtains after executing the decision action, and s_{t+1} the combat situation returned by the environment in the next state after the action is executed. The agent interacts with the air-combat environment through a sequence of states, actions and rewards, with the goal of maximizing the cumulative reward. In the current state s_t, the agent computes a decision action a_t according to the policy μ, a = μ(s), and executes it in the simulation environment until the engagement ends.
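For concreteness, the following is a minimal sketch of the DDPG actor-critic update described above, written in PyTorch. The network sizes, names and hyperparameters are illustrative assumptions, not the patent's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):  # deterministic policy mu: s -> a
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):  # action-value function Q(s, a)
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer
    # Critic step: regress Q(s_t, a_t) toward r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})).
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor step: ascend the critic's estimate of Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft-update the target networks toward the online networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```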
In the training mode, training can start from scratch or continue from a loaded existing model. The self-play adversarial training engine built into the workstation module provides the simulation environment for air combat, including scenario modeling, aircraft modeling and reward-function modeling. Scenario modeling establishes the organizational relationships among aircraft, such as 2v2 and 4v2 scenario assumptions and the design of engagement rules. Aircraft modeling comprises basic-element modeling and maneuverability modeling: basic-element modeling is an important part of constructing the simulation environment and covers the detection radar, electro-optical radar, missiles, decoy flares, jamming pods and so on; maneuverability modeling is the basis of building the air-combat simulation environment and designing tactics, and for the intelligent air-combat engine the state information of the virtual aircraft, such as position, angle, speed, radar state and missile state, is the basis on which decisions are made. The reward function is the standard by which the agent judges its own performance, and establishing a reasonable reward function R(s) is the key to reinforcement learning. The reward function should weigh attainability against sparsity and strike a balance between implementing tactics and encouraging exploration. Its shape is not unique, and for value optimization the distribution of positive and negative rewards should preferably be balanced over the range of its argument; a sketch follows.
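A minimal sketch of such a shaped reward function R(s) is shown below; the state keys, terms and weights are illustrative assumptions chosen so that a sparse terminal reward dominates while small dense terms ease exploration.

```python
def reward(state):
    """Illustrative shaped air-combat reward; the `state` keys are assumptions."""
    r = 0.0
    if state.get("enemy_destroyed"):
        r += 1.0            # sparse terminal reward: win
    if state.get("self_destroyed"):
        r -= 1.0            # sparse terminal penalty: loss
    # Small dense shaping terms, balancing attainability against sparsity.
    r += 0.01 * state.get("angle_advantage", 0.0)       # nose-on advantage in [-1, 1]
    r -= 0.001 * abs(state.get("range_error_km", 0.0))  # deviation from desired range
    return r
```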
The workstation module returns the data of self-play adversarial training, such as states and rewards, to the deep reinforcement learning module; the deep reinforcement learning module returns decision actions to the workstation module; and the workstation module executes the decisions, so that an ever stronger opponent agent is trained.
In the inference mode, the workstation module of this embodiment is connected to the simulator through the adaptation module; in the pilot-training example the simulator is a flight simulator. The adaptation module adapts to different simulators and mainly comprises an image recognition module and an image processing module. The pilot operates the flight simulator and usually observes, in real time through its vision system, his own aircraft's information together with the azimuth, distance and other information of enemy aircraft within radar range, and then performs the next flight operation. Likewise, the agent must obtain the situation information of the current air-combat environment in real time and output its next flight operation. However, because flight simulators are diverse and do not necessarily expose an interface for transmitting flight data to the outside, how to obtain the flight data from the flight simulator is a key problem.
In this embodiment, situation information such as the position and heading angle of the piloted aircraft is obtained by continuous high-speed recognition of the instrument-panel information in the pilot's vision system. During air combat, to give the pilot the feeling of real flight, the vision system renders the picture according to the current environment, control-stick movements and the aircraft's current motion state; as a result, strong light may make the instrument panel hard to read, and moving the control stick may occlude useful information. There are different countermeasures to this problem:
a) Split the main instrument panel onto its own screen, capture the screen image in real time, and use image recognition to obtain the information shown by the panel. Recognizing panel information is simple and accurate, and the acquired situation information is stable. The disadvantage is that the flight simulator must be able to display the instrument panel on a separate screen.
b) Recognize the vision system directly, i.e. the picture from the pilot's viewpoint. When useful information is occluded, it is recovered by analyzing the continuous data provided by continuous image recognition, using algorithms such as interpolation and prediction (see the sketch after this list). The advantage is that this does not depend on whether the flight simulator provides a corresponding external interface, so it is more universal. The disadvantages are that the situation information obtained is less stable, dedicated algorithms must be devised for each kind of picture anomaly, and a deep learning method with strong generalization would require a large number of training samples.
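The following is a minimal sketch of the interpolation idea from approach b): readings that recognition failed to produce (occluded frames) are filled in from the surrounding valid samples. It is numpy-only and purely illustrative.

```python
import numpy as np

def fill_occluded(readings):
    """readings: 1-D sequence of values, with np.nan where recognition failed."""
    values = np.asarray(readings, dtype=float)
    ok = ~np.isnan(values)
    # Linearly interpolate the missing samples from the valid neighbours.
    return np.interp(np.arange(len(values)), np.flatnonzero(ok), values[ok])

# e.g. fill_occluded([100.0, np.nan, np.nan, 130.0]) -> [100., 110., 120., 130.]
```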
The image recognition module of this embodiment recognizes images through OCR and extracts the situation information. OCR (optical character recognition) determines the approximate shapes in the image content by detecting patterns of light and dark, and then converts those shapes into computer text by character recognition. OCR typically performs the following operations on an image: 1) image preprocessing, such as binarization, to enhance the readability of the image; 2) detection, locating the regions of the picture that contain characters; 3) recognition of the characters within those regions. A sketch of this pipeline follows.
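As one possible realization of these three steps, the sketch below binarizes an assumed instrument-panel region with OpenCV and delegates detection and recognition to the Tesseract engine via pytesseract; the region coordinates and character whitelist are illustrative assumptions, not the patent's layout.

```python
import cv2
import pytesseract

def read_panel(frame_bgr, region=(50, 400, 300, 60)):
    x, y, w, h = region  # assumed location of one instrument readout
    gray = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # 1) Preprocessing: Otsu binarization to enhance readability.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 2) + 3) Detection and recognition, restricted to characters that
    # typically appear on instrument readouts.
    text = pytesseract.image_to_string(
        binary, config="--psm 7 -c tessedit_char_whitelist=0123456789.-")
    return text.strip()
```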
Commonly used detection methods include: a) connected-component-based methods; b) sliding-window-based methods; c) deep-learning-based methods. Connected-component-based methods assume that characters appear in the image as connected components, so they first extract all connected components in the image as candidates and then select the character components according to a classifier or rules. Sliding-window-based methods extract features from the image area covered by the sliding window and feed them into a pre-trained classifier to judge whether the current region contains characters. Deep-learning-based detection is generally used for more complex scenes: given a large amount of data in which text regions are labeled with boxes, a multi-layer learnable network fits the mapping from the data to the labels (boxes) by minimizing a predefined loss function.
The detected target regions are then recognized. In principle the methods divide roughly into two types: template matching, and feature extraction plus a classifier. Traditional feature extraction usually extracts edge information, grayscale features and other hand-crafted features of the target region, and commonly used classifiers include the support vector machine and nearest neighbor. The recently popular deep learning is another kind of feature-extraction-plus-classifier model: through an end-to-end design, a high-dimensional fitting space, a predefined loss function and a large amount of labeled data, it automatically extracts features that are easy to classify. Because it extracts features by itself, achieves higher recognition accuracy, and generalizes strongly, it is widely used.
In this embodiment, the image recognition module in the adaptation module recognizes the situation information of the simulator-operated object (the Red force) in the simulator and passes it to the workstation module, while the image processing module overlays the agent's (Blue force) situation information onto the radar situation picture and sends it to the simulator, thereby realizing human-machine confrontation.
The workstation module sends the situation information of the simulator-operated object and the situation information of the agent to the deep reinforcement learning module; the deep reinforcement learning module outputs a decision action for the current situation, generating decision information that it sends to the workstation module; after executing the decision, the workstation module updates the agent's state in the simulation environment and sends the agent's situation information to the adaptation module.
The adaptation module generates an image containing the agent's situation from the agent's situation information and sends it to the simulator for on-screen display; seeing the Blue force's situation on the screen, the pilot then makes his own human-machine game moves. One full inference cycle is sketched below.
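Putting the inference-mode data flow together, one cycle might look like the following sketch; the module interfaces (recognize_screen, decide, execute, overlay) are assumptions, since the text does not specify the device's APIs.

```python
def inference_step(adapter, workstation, drl_module):
    red = adapter.recognize_screen()       # OCR: situation of the simulator-operated object
    blue = workstation.agent_situation()   # current situation of the agent
    action = drl_module.decide(red, blue)  # decision from the trained agent
    blue = workstation.execute(action)     # update the agent in the simulation environment
    adapter.overlay(blue)                  # render the agent onto the simulator's screen
```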
In one embodiment, the device further comprises a data management module, which stores the sample data generated in the training mode and the inference mode and saves the trained agent models; the data management module is connected to the workstation module and to the deep reinforcement learning module respectively. In the training mode, the workstation module generates adversarial sample data through the built-in self-play adversarial training engine; the generated sample data is sent to the deep reinforcement learning module for training, while the same data is sent to the data management module for persistent storage. During training, the deep reinforcement learning module also sends the agent model to the data management module from time to time, according to how the model's training is converging, and the data management module stores and manages the models uniformly according to each model's description. In the inference mode, the workstation module receives the real-time adversarial data sent by the adaptation module, and while making decisions sends the data to the data management module for persistent storage.
The sample data, agent models and other artifacts generated during training and inference are managed uniformly by the data management module and can be stored, loaded, selected, copied and so on through the workstation module. A sketch of this persistence follows.
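A hedged sketch of these persistence duties is shown below; the file layout and formats are assumptions for illustration only.

```python
import json
import pathlib
import pickle
import time

class DataManager:
    """Stores adversarial samples from both modes plus agent-model checkpoints."""
    def __init__(self, root="runs"):
        self.root = pathlib.Path(root)
        self.root.mkdir(exist_ok=True)

    def save_samples(self, samples, mode):  # mode: "train" or "infer"
        path = self.root / f"{mode}_{int(time.time())}.jsonl"
        with path.open("w") as f:
            for sample in samples:          # one JSON record per line
                f.write(json.dumps(sample) + "\n")

    def save_model(self, model, description):  # checkpoint keyed by its description
        with (self.root / f"{description}.pkl").open("wb") as f:
            pickle.dump(model, f)
```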
In another embodiment, the device further comprises a performance evaluation module connected to the workstation module. During training and inference, the workstation module sends the situation information of both sides and the agent's decision information to the performance evaluation module, which evaluates the effectiveness of the agent's decisions and the situation and advantage of both sides from the received information, and visualizes the evaluation results.
The performance evaluation module helps the user better understand training progress and assess how the agent's training is converging; by modifying simulation-environment parameters and comparing the training results and the agent's win rate, the decision maker can see how much each parameter influences the final outcome and remedy the agent's weaknesses. The module can evaluate the combat effectiveness of each aircraft and judge the strength of the agent from the evaluation results, as sketched below.
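One concrete metric of the kind described, win rate compared across two environment-parameter settings, could be computed as in the sketch below (the names are illustrative assumptions).

```python
def win_rate(outcomes):
    """outcomes: list with 1 for an agent win, 0 for a loss or draw."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def compare_settings(outcomes_a, outcomes_b):
    """Compare how two environment-parameter settings affect the win rate."""
    return {"setting_a": win_rate(outcomes_a),
            "setting_b": win_rate(outcomes_b),
            "delta": win_rate(outcomes_a) - win_rate(outcomes_b)}
```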
The above embodiments express only several implementations of the present application; their description is comparatively specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (5)

1. An artificial-intelligence-based simulator opponent reinforcing device, characterized in that it comprises a deep reinforcement learning module, a workstation module and an adaptation module, and has two working modes, a training mode and an inference mode, wherein:
in the training mode, the workstation module runs a self-play adversarial training engine and sends the data obtained from self-play adversarial training to the deep reinforcement learning module;
the deep reinforcement learning module uses the received data to train an agent;
in the inference mode, the workstation module is connected to the simulator through the adaptation module so that the simulator-operated object and the agent can engage in adversarial simulation; the adaptation module captures on-screen display information from the simulator for recognition, and sends the recognized situation information of the simulator-operated object to the workstation module;
the workstation module sends the situation information of the simulator-operated object and the situation information of the agent to the deep reinforcement learning module; the deep reinforcement learning module generates decision information and returns it to the workstation module; after executing the decision, the workstation module sends the agent's situation information to the adaptation module;
and the adaptation module generates an image containing the agent's situation from the agent's situation information and sends it to the simulator for on-screen display.
2. The artificial-intelligence-based simulator opponent reinforcing device of claim 1, further comprising a data management module for saving the sample data generated in the training mode and the inference mode and for saving the trained agent model; the data management module is connected to the workstation module and to the deep reinforcement learning module respectively.
3. The artificial-intelligence-based simulator opponent reinforcing device of claim 1, further comprising a performance evaluation module connected to the workstation module for evaluating training results.
4. The artificial-intelligence-based simulator opponent reinforcing device of claim 1, wherein the deep reinforcement learning module trains the agent using the DDPG algorithm.
5. The artificial-intelligence-based simulator opponent reinforcing device of claim 1, wherein the adaptation module captures on-screen display information from the simulator for recognition, using OCR to recognize images obtained from the on-screen display information and extract the situation information of the simulator-operated object.
CN202010140651.6A 2020-03-03 2020-03-03 Simulator adversary reinforcing device based on artificial intelligence Pending CN111488992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010140651.6A CN111488992A (en) 2020-03-03 2020-03-03 Simulator adversary reinforcing device based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010140651.6A CN111488992A (en) 2020-03-03 2020-03-03 Simulator adversary reinforcing device based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN111488992A true CN111488992A (en) 2020-08-04

Family

ID=71798144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010140651.6A Pending CN111488992A (en) 2020-03-03 2020-03-03 Simulator adversary reinforcing device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111488992A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560332A (en) * 2020-11-30 2021-03-26 北京航空航天大学 Aviation soldier system intelligent behavior modeling method based on global situation information
CN112836036A (en) * 2021-03-18 2021-05-25 中国平安人寿保险股份有限公司 Interactive training method, device, terminal and storage medium for intelligent agent
CN113298260A (en) * 2021-06-11 2021-08-24 中国人民解放军国防科技大学 Confrontation simulation deduction method based on deep reinforcement learning
CN115470710A (en) * 2022-09-26 2022-12-13 北京鼎成智造科技有限公司 Air game simulation method and device
KR20240048839A (en) * 2022-10-07 2024-04-16 엘아이지넥스원 주식회사 Virtual training apparatus capable of adaptive simulation of virtual target and training method using the same

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021754A (en) * 2017-12-06 2018-05-11 北京航空航天大学 A kind of unmanned plane Autonomous Air Combat Decision frame and method
CN109496318A (en) * 2018-07-30 2019-03-19 东莞理工学院 Adaptive game playing algorithm based on deeply study
CN109636699A (en) * 2018-11-06 2019-04-16 中国电子科技集团公司第五十二研究所 A kind of unsupervised intellectualized battle deduction system based on deeply study
CN109670596A (en) * 2018-12-14 2019-04-23 启元世界(北京)信息技术服务有限公司 Non-fully game decision-making method, system and the intelligent body under information environment
CN110428057A (en) * 2019-05-06 2019-11-08 南京大学 A kind of intelligent game playing system based on multiple agent deeply learning algorithm

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021754A (en) * 2017-12-06 2018-05-11 北京航空航天大学 A kind of unmanned plane Autonomous Air Combat Decision frame and method
CN109496318A (en) * 2018-07-30 2019-03-19 东莞理工学院 Adaptive game playing algorithm based on deeply study
CN109636699A (en) * 2018-11-06 2019-04-16 中国电子科技集团公司第五十二研究所 A kind of unsupervised intellectualized battle deduction system based on deeply study
CN109670596A (en) * 2018-12-14 2019-04-23 启元世界(北京)信息技术服务有限公司 Non-fully game decision-making method, system and the intelligent body under information environment
CN110428057A (en) * 2019-05-06 2019-11-08 南京大学 A kind of intelligent game playing system based on multiple agent deeply learning algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Chunming (ed.), "Simulation and Test Technology for Air-Defense Missile Flight Control Systems" (《防空导弹飞行控制系统仿真测试技术》), China Astronautic Publishing House, 30 June 2014 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560332A (en) * 2020-11-30 2021-03-26 北京航空航天大学 Aviation soldier system intelligent behavior modeling method based on global situation information
CN112836036A (en) * 2021-03-18 2021-05-25 中国平安人寿保险股份有限公司 Interactive training method, device, terminal and storage medium for intelligent agent
CN112836036B (en) * 2021-03-18 2023-09-08 中国平安人寿保险股份有限公司 Interactive training method and device for intelligent agent, terminal and storage medium
CN113298260A (en) * 2021-06-11 2021-08-24 中国人民解放军国防科技大学 Confrontation simulation deduction method based on deep reinforcement learning
CN115470710A (en) * 2022-09-26 2022-12-13 北京鼎成智造科技有限公司 Air game simulation method and device
KR20240048839A (en) * 2022-10-07 2024-04-16 엘아이지넥스원 주식회사 Virtual training apparatus capable of adaptive simulation of virtual target and training method using the same
KR102718368B1 (en) * 2022-10-07 2024-10-16 엘아이지넥스원 주식회사 Virtual training apparatus capable of adaptive simulation of virtual target and training method using the same

Similar Documents

Publication Publication Date Title
CN111488992A (en) Simulator adversary reinforcing device based on artificial intelligence
CN110929394B (en) Combined combat system modeling method based on super network theory and storage medium
CN113705102B (en) Deduction simulation system, deduction simulation method, deduction simulation equipment and deduction simulation storage medium for sea-air cluster countermeasure
CN112131786A (en) Target detection and distribution method and device based on multi-agent reinforcement learning
CN105678030B (en) Divide the air-combat tactics team emulation mode of shape based on expert system and tactics tactics
CN110109653B (en) A kind of land warfare wargame intelligent engine and its operation method
KR102560798B1 (en) unmanned vehicle simulator
CN109597839B (en) Data mining method based on avionic combat situation
Zacharias et al. SAMPLE: Situation awareness model for pilot in-the-loop evaluation
CN113625569A (en) Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN118194691A (en) Human experience guided unmanned aerial vehicle air combat method based on deep reinforcement learning
CN116861779A (en) Intelligent anti-unmanned aerial vehicle simulation system and method based on digital twinning
CN115185294B (en) QMIX-based aviation soldier multi-formation collaborative autonomous behavior decision modeling method
Madni et al. Augmenting MBSE with Digital Twin Technology: Implementation, Analysis, Preliminary Results, and Findings
CN113469853A (en) Method for accelerating command control of fighting and artificial intelligence device
US20240320551A1 (en) Autonomous virtual entities continuously learning from experience
KR101345645B1 (en) Simulation System And Method for War Game
Liang et al. A conception of flight test mode for future intelligent cockpit
Wang et al. Research on naval air defense intelligent operations on deep reinforcement learning
CN115909027A (en) Situation estimation method and device
Xiaochao et al. A cgf behavior decision-making model based on fuzzy bdi framework
Bisantz et al. Validating methods in cognitive engineering: a comparison of two work domain models
Dimitriu et al. A Reinforcement Learning Approach to Military Simulations in Command: Modern Operations
Kushnier et al. Situation Assessment Through Collaborative Human‐Computer Interaction
He et al. Knowledge Graph Construction of System Capability in the Simulation Training Commanded with Electronic Countermeasures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200804