Disclosure of Invention
The invention aims to provide a brain-computer cooperation digital twin reinforcement learning control method and system to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a brain-computer cooperation digital twin reinforcement learning control method comprises the following steps:
step 1), constructing a brain-computer cooperative control model based on a digital twin environment: in the digital twin environment, an operator gives the virtual robot a direction instruction, the operator's electroencephalogram signals are collected at the same time, and a corresponding speed instruction for the virtual robot is given according to the collected electroencephalogram signals;
step 2), the virtual robot performs the action specified by the direction instruction and the speed instruction; a reward value is assigned to the brain-computer cooperative control model according to the completion quality of the specified action, which completes the training of the brain-computer cooperative control model at the current moment;
step 3), repeating step 1) to step 2) to train the brain-computer cooperative control model at successive moments; when the absolute value of the difference between two adjacent reward values of the model is smaller than a threshold K, the training of the brain-computer cooperative control model is finished; otherwise, step 1) to step 2) are repeated until training is finished;
and step 4), using the trained brain-computer cooperative control model to achieve accurate brain-computer cooperative control of the physical robot, thereby completing the brain-computer cooperative digital twin reinforcement control.
Further, a virtual robot digital twin environment platform is built, and the adjustable instructions of the virtual robot are set, comprising a direction instruction and a speed instruction; the direction instruction is issued by the operator through a control device, while the speed instruction, which controls the speed of the virtual robot, is obtained from the brain state of the operator.
Furthermore, in the digital twin environment, the operator gives the virtual robot its direction instruction through a virtual control platform, and the virtual robot's speed instruction is given according to the operator's electroencephalogram signals.
Further, the operator controls the virtual robot through the control device, from which the virtual robot acquires its current direction instruction; the electroencephalogram signals recorded while the operator gives the direction instruction are collected, the corresponding speed instruction of the virtual robot is given according to the collected signals, and the association between the electroencephalogram signals and the speed instructions is established to obtain a preliminary brain-computer cooperative control model.
Furthermore, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the operator's scalp electroencephalogram signals from the 600 ms before time t are collected, and the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the 600 ms electroencephalogram signals are calculated. The three feature matrices are combined along the row vector to form multi-dimensional feature data S_t reflecting the current brain state, and the association with the actual speed instruction of the virtual robot is established to obtain a preliminary brain-computer cooperative control model.
Further, after the virtual robot receives the speed instruction A_t at time t, it combines it with the operator's simultaneous direction command C_t and begins executing the corresponding action, until the action A_(t+1) and the operator's direction command C_(t+1) at the next moment are received; the next action is then executed, and so on until one round of the task is completed. After each round, the execution of the task by the virtual robot is recorded, and the reward R_t is calculated according to two criteria: task completion quality and completion time.
Further, a data set is formed from the brain state S_t, the speed command A_t and the reward R_t, and the brain-computer cooperative control model is updated with it.
Further, the operator sends a direction instruction C to the virtual robot through the control device; at the same time the operator's electroencephalogram signals are detected, converted into a speed instruction A and sent to the virtual robot, which executes the task specified by the combination of the direction instruction C and the speed instruction A. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the running state of the virtual robot, while the operator's electroencephalogram signals are continuously acquired to give the corresponding speed command A.
A brain-computer cooperation digital twin reinforcement learning control system comprises an electroencephalogram acquisition module, a model training module and a control module;
the electroencephalogram acquisition module is used for acquiring the operator's electroencephalogram signals when the operator gives the virtual robot a direction instruction, giving the corresponding speed instruction of the virtual robot according to the acquired signals, and transmitting the speed instruction to the model training module; the model training module has the virtual robot perform the action specified by the direction instruction and the speed instruction, assigns a reward value to the brain-computer cooperative control model according to the completion quality of the specified action, and thereby completes the training of the model at the current moment; the control module achieves accurate brain-computer cooperative control of the physical robot using the trained brain-computer cooperative control model.
Further, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the model training module collects the operator's scalp electroencephalogram signals from the 600 ms before time t, calculates the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the 600 ms electroencephalogram signals, combines the three feature matrices along the row vector to form multi-dimensional feature data S_t reflecting the current brain state, and establishes the association with the actual speed instruction of the virtual robot to obtain a preliminary brain-computer cooperative control model.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention relates to a brain-machine cooperation digital twin reinforcement learning control method, which comprises the steps of constructing a brain-machine cooperation control model, setting a virtual robot direction instruction by an operator in a digital twin environment, simultaneously acquiring electroencephalograms when the virtual robot direction instruction is set by the operator, setting a corresponding speed instruction of a virtual robot according to the acquired electroencephalograms, finishing the direction instruction and speed instruction appointed action according to the obtained direction instruction and speed instruction, carrying out reward values on the brain-machine cooperation control model according to the finishing quality of the appointed action, finishing the training of the brain-machine cooperation control model at the current moment, realizing a double loop information interaction mechanism between brain and machines by reinforcement learning through the brain-machine cooperation digital twin environment, wherein the model has good mobility from a virtual scene to a real scene, the method realizes the updating of model parameters in a control algorithm in the cooperative control process of the operator and the robot, along with the increase of the interaction times between the operator and the robot, the performance can be continuously improved, and the method has the capability of crossing individuals and tasks. Compared with other brain-computer cooperation methods, the robustness and the generalization capability are improved, and the brain-computer mutual adaptation and growth are realized.
Furthermore, with the electroencephalogram signals as the environment object and the control algorithm as the agent, a double-loop information interaction mechanism is provided: in the active loop, the operator sends operation commands to the robot through the control device while supervising the robot's running state in real time through visual and auditory information, adjusting the commands and correcting errors; in the passive loop, the operator's electroencephalogram signals are processed by the integrated brain-computer cooperation model, which sends regulation instructions to the robot. The active-loop and passive-loop instructions act on the robot cooperatively, so that the robot executes tasks safely, accurately and efficiently.
The brain-computer cooperation digital twin reinforcement learning control system realizes the mutual adaptation, mutual supervision and mutual growth of brain-computer cooperation, so that the robot can execute tasks accurately, safely and efficiently.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
a brain-computer cooperation digital twin reinforcement learning control method comprises the following steps:
step 1), constructing a brain-computer cooperative control model based on a digital twin environment and training it in that environment: the operator gives the virtual robot a direction instruction through a virtual control platform, the operator's electroencephalogram signals are acquired at the same time, and the corresponding speed instruction of the virtual robot is given according to those signals;
establishing the virtual environment: a brain-computer cooperative control model is established based on a digital twin environment, and reinforcement learning training of the model is performed in that environment; a digital twin environment platform of the virtual robot is built, and the adjustable instructions of the virtual robot are set, comprising a direction instruction and a speed instruction; the direction instruction is issued by the operator through a control device, and the speed instruction, which controls the speed of the virtual robot, is obtained from the operator's electroencephalogram signals.
The control device is used for outputting control instructions and may be, for example, a mouse, a gamepad or a direction controller.
The operator controls the virtual robot through the control device, from which the virtual robot acquires its current direction instruction; meanwhile, the electroencephalogram signals recorded while the operator gives the direction instruction are collected, the corresponding speed instruction of the virtual robot is given according to the collected signals, and the association between the electroencephalogram signals and the speed instructions is established to obtain a preliminary brain-computer cooperative control model;
for speed control of the virtual robot, brain-computer interface technology is used: a computer analyzes the brain state from the operator's electroencephalogram signals and outputs the speed instruction controlling the virtual robot according to that state.
When control starts, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the operator's scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2, giving 32 channels of electroencephalogram signals in total.
The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz). The three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
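As an illustration of this feature pipeline, the sketch below assumes a NumPy array of shape (32, 600) holding one 600 ms, 32-channel window sampled at 1000 Hz; the Gaussian closed form for differential entropy, a Welch estimate for the power spectral density, and mean Welch power as the per-channel F_P summary are common choices assumed here, not requirements of the method.

```python
# Sketch of the S_t feature extraction described above. Assumptions not in the
# source: the EEG window is a (32, 600) NumPy array sampled at 1000 Hz, DE uses
# the Gaussian closed form, and PSD/band energies come from a Welch estimate.
import numpy as np
from scipy.signal import welch

FS = 1000  # sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 16), "beta": (16, 32)}  # per the text

def band_energy(freqs, psd, lo, hi):
    """Integrate the PSD over [lo, hi) Hz for every channel."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[:, mask], freqs[mask], axis=1)

def extract_state(eeg):
    """Build S_t = [F_D, F_P, F_P3] from one 600 ms, 32-channel window."""
    # F_D: differential entropy, Gaussian closed form 0.5*ln(2*pi*e*var)
    f_d = 0.5 * np.log(2 * np.pi * np.e * np.var(eeg, axis=1))
    # F_P: power spectral density, summarized as mean Welch power per channel
    freqs, psd = welch(eeg, fs=FS, nperseg=256, axis=1)
    f_p = psd.mean(axis=1)
    # F_P3: (theta band energy + alpha band energy) / beta band energy
    e = {name: band_energy(freqs, psd, lo, hi) for name, (lo, hi) in BANDS.items()}
    f_p3 = (e["theta"] + e["alpha"]) / e["beta"]
    return np.concatenate([f_d, f_p, f_p3])  # shape (96,)

s_t = extract_state(np.random.randn(32, 600))  # placeholder window
```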
Specifically, the operator sends a direction instruction C to the virtual robot through the control device; at the same time the operator's electroencephalogram signals are detected, converted into a speed instruction A and sent to the virtual robot, which executes the task specified by the combination of the direction instruction C and the speed instruction A. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the running state of the virtual robot, while the operator's electroencephalogram signals are acquired to give the corresponding speed command A.
Step 2), the virtual robot finishes the appointed action of the direction instruction and the speed instruction according to the obtained direction instruction and speed instruction, carries out reward value on the brain-computer cooperative control model according to the finishing quality of the appointed action, and finishes the training of the brain-computer cooperative control model at the current moment;
after the virtual robot receives the speed instruction A_t at time t, it combines it with the operator's simultaneous direction command C_t and begins executing the corresponding action, until the action A_(t+1) and the operator's direction command C_(t+1) at the next moment are received; the next action is then executed, and so on until one round of the task is completed. After each round, the execution of the task by the virtual robot is recorded, and the reward R_t is calculated according to two criteria: task completion quality and completion time.
Specifically, the brain-computer cooperative control model adopts a 5-layer fully connected neural network. The parameters of the network are updated with the data set (S_t, A_t, R_t) formed from the brain state S_t, the speed command A_t and the reward R_t. The update proceeds as follows: when the reward R_t is high, the updated model becomes more likely to output the speed command A_t the next time the brain state S_t is input; when the reward R_t is low, the updated model becomes less likely to output A_t for the same input S_t. Trained in this way, the model outputs, for an input brain state S, a corresponding speed instruction A such that the obtained reward R stabilizes at a high level.
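The described update matches a REINFORCE-style policy gradient: scaling the log-probability of the taken action by the reward raises the probability of A_t under S_t when R_t is positive and lowers it when R_t is negative. The sketch below is a minimal illustration, assuming PyTorch, a 96-dimensional S_t matching the earlier feature sketch, two speed commands, a reward normalized to signed values as described later in (1-6), and layer sizes that the source does not specify.

```python
# Illustrative REINFORCE-style update for the 5-layer fully connected model.
# Assumptions beyond the text: PyTorch, 96-dim S_t, two speed actions
# (accelerate / decelerate), signed reward, and a plain SGD step.
import torch
import torch.nn as nn

policy = nn.Sequential(              # 5 fully connected layers
    nn.Linear(96, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),                # logits for the 2 speed commands
)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)

def update(s_t, a_t, r_t):
    """Raise P(a_t | s_t) when r_t is positive, lower it when negative."""
    logits = policy(torch.as_tensor(s_t, dtype=torch.float32))
    log_prob = torch.log_softmax(logits, dim=-1)[a_t]
    loss = -r_t * log_prob           # gradient ascent on r_t * log pi(a_t|s_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```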
Step 3), repeating the step 1) to the step 2), finishing the training of the brain-computer cooperative control model at different moments, finishing the training of the brain-computer cooperative control model when the absolute value of the difference of two adjacent reward values of the brain-computer cooperative control model is smaller than a threshold value K, and otherwise, continuing repeating the step 1) to the step 2) until the training of the brain-computer cooperative control model is finished;
a model training threshold K is set; when the absolute value of the difference between two adjacent rewards R is smaller than K, model training is finished, and otherwise training continues until it is.
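A minimal sketch of this stopping rule follows; the value of K and the run_training_round helper (one pass of steps 1)-2) returning that round's reward) are hypothetical placeholders.

```python
# Sketch of the stopping rule in step 3). run_training_round is a hypothetical
# placeholder for one pass of steps 1)-2) that returns the round's reward R.
K = 0.05  # threshold K; its value is not specified in the source

r_prev = None
while True:
    r_curr = run_training_round()          # hypothetical: one training round
    if r_prev is not None and abs(r_curr - r_prev) < K:
        break                              # training of the model is finished
    r_prev = r_curr
```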
And step 4), using the trained brain-computer cooperative control model to achieve accurate brain-computer cooperative control of the physical robot, thereby completing the brain-computer cooperative digital twin reinforcement control.
Specifically, the trained brain-computer cooperative control model is transplanted to the controller of the physical robot; during control, the real environment and the virtual environment are synchronized through the digital twin method, so that the parameters of the physical robot's controller are corrected in real time.
Example:
The trained model is transplanted to the controller of the physical robot to achieve accurate brain-computer cooperative control of the physical robot. Meanwhile, during control, the real environment and the virtual environment are kept fully synchronized by digital twin technology, and the parameters of the physical robot controller are corrected in real time.
Step 1: a real physical environment for operating the robot is built; compared with the virtual training platform, except that the controlled object is the physical robot, everything else is the same;
Step 2: when control starts, at time t the operator sends a direction command C_t to the physical robot through the control device; meanwhile, scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2, giving 32 channels of electroencephalogram signals in total;
Step 3: the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz); the three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
Step 4: the brain state S_t is input into the trained brain-computer cooperative control model, which outputs the action A_t for the corresponding moment, and the output action A_t is transmitted to the physical robot by wireless communication;
Step 5: at time t the physical robot receives the action A_t sent by the computer and, combining it with the operator's direction command C_t, begins executing the corresponding action until the action A_(t+1) sent by the computer and the command C_(t+1) sent by the operator at the next moment are received, whereupon the next action is executed; this continues until the control task is finished.
Step 6: while the physical robot executes the corresponding actions, sensors transmit the real environment parameters and the physical robot's state parameters to the digital twin environment, so that the virtual and real environments are synchronized and the parameters of the physical robot controller are corrected in real time.
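Steps 2-6 amount to a fixed real-time loop; the condensed sketch below reuses extract_state from the earlier feature sketch, and every other helper (read_direction, pull_eeg_window, select_action, send_action, read_sensors, and the twin object's methods) is a hypothetical stand-in for the platform's actual interfaces.

```python
# Condensed sketch of steps 2-6 on the physical robot. All I/O helpers and
# the twin object are hypothetical stand-ins for the real platform interfaces.
def control_loop(policy, twin, robot):
    while not robot.task_done():
        c_t = read_direction()                # operator's direction command C_t
        eeg = pull_eeg_window()               # scalp EEG from the 600 ms before t
        s_t = extract_state(eeg)              # feature vector S_t (step 3)
        a_t, dv = select_action(policy, s_t)  # speed action A_t (step 4)
        send_action(robot, a_t, c_t)          # wireless transmission to the robot
        state = read_sensors(robot)           # real env + robot state parameters
        twin.sync(state)                      # synchronize virtual and real (step 6)
        twin.correct_controller(robot)        # real-time parameter correction
```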
Training the brain-computer cooperative control model:
(1-1) A robotic arm digital twin environment platform is built (as shown in figure 1), and 8 adjustable instructions are set for the end of the virtual robotic arm (direction: forward, backward, left, right, up and down; speed: accelerate and decelerate), where the direction instruction C is controlled by the operator through a joystick and the speed instruction A is controlled by the controller according to the operator's brain state. The operator sends the direction instruction C to the virtual robotic arm by operating the joystick, while the speed instruction A is sent to the virtual robotic arm according to the operator's detected brain state. The virtual robotic arm combines the direction and speed instructions to execute an end-effector trajectory tracking task. During brain-computer cooperative control, the operator continuously adjusts the direction instruction C by observing the running state of the virtual robotic arm; meanwhile, that running state in turn influences the operator's brain state (from which the controller adjusts the speed instruction A of the virtual robotic arm), so that the virtual robotic arm is accurately controlled through brain-computer cooperation.
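For concreteness, the 8 adjustable commands can be written as two enums; the names below are illustrative, as the source only lists the six directions and the two speed changes.

```python
# The 8 adjustable end-of-arm commands from (1-1) as two enums. Names are
# illustrative; the source only lists six directions and two speed changes.
from enum import Enum, auto

class Direction(Enum):          # issued by the operator via the joystick
    FORWARD = auto()
    BACKWARD = auto()
    LEFT = auto()
    RIGHT = auto()
    UP = auto()
    DOWN = auto()

class SpeedCommand(Enum):       # issued by the controller from the brain state
    ACCELERATE = auto()
    DECELERATE = auto()
```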
(1-2) When control starts, at time t the operator sends a direction command C_t to the virtual robot through the control device. Meanwhile, scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2 (shown in figure 3), giving 32 channels of electroencephalogram signals. In this embodiment, the acquisition equipment is a 32-channel Neuracle NeuSen W32 system; following the equipment's scheme, the reference electrodes use a dual-reference arrangement on the AFz and CPz channels, the sampling frequency is 1000 Hz, and the signals are transmitted to the computer over a local area network.
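One way to pull such windows in software is sketched below; Lab Streaming Layer (pylsl) is assumed as the LAN transport, which the source does not specify, so treat the stream setup as illustrative.

```python
# Sketch of acquiring one 600 ms, 32-channel window at 1000 Hz. The source only
# says the amplifier streams over the local area network; Lab Streaming Layer
# (pylsl) is assumed here for illustration.
import numpy as np
from pylsl import StreamInlet, resolve_stream

FS, WINDOW = 1000, 600                     # sampling rate (Hz), window (samples)

streams = resolve_stream("type", "EEG")    # find the EEG stream on the LAN
inlet = StreamInlet(streams[0])

def pull_eeg_window():
    """Accumulate 600 samples per channel (600 ms at 1000 Hz)."""
    rows = []
    while len(rows) < WINDOW:
        chunk, _ = inlet.pull_chunk(timeout=1.0)
        rows.extend(chunk)                 # each row: one sample x 32 channels
    return np.asarray(rows[:WINDOW]).T     # shape (32, 600)
```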
(1-3) The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz); the three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
(1-4) A 5-layer fully connected neural network is established in the controller. The network input is the brain state S_t at time t; after training, the network outputs the speed action A_t for the corresponding moment, i.e. increase the speed by +δv or decrease it by -δv, and A_t is sent to the virtual robotic arm.
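At inference time the network can be sampled as below; this is a minimal sketch in which DELTA_V is a hypothetical step size (the source only names +δv and -δv) and policy is the 5-layer network from the earlier update sketch.

```python
# Inference sketch for (1-4): sample the speed action from the 5-layer network
# and map it to a velocity increment. DELTA_V is a hypothetical step size; the
# source only names +delta v and -delta v.
import torch

DELTA_V = 0.05  # hypothetical speed step

def select_action(policy, s_t):
    """Return the sampled action index and its velocity increment."""
    logits = policy(torch.as_tensor(s_t, dtype=torch.float32))
    a_t = torch.distributions.Categorical(logits=logits).sample().item()
    return a_t, (DELTA_V if a_t == 0 else -DELTA_V)  # 0: accelerate, 1: decelerate
```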
(1-5) After the virtual robotic arm receives the speed regulation instruction A_t at time t, it combines it with the operator's simultaneous direction command C_t and starts moving in the virtual environment at the corresponding speed and direction, until the action A_(t+1) and the operator's direction command C_(t+1) at the next moment are received, whereupon the next action is executed.
(1-6) A round ends when the task is completed or when it fails within the specified time. The virtual environment feeds back and scores the quality with which the virtual robotic arm completed the task. There are two cases: if the task fails, a score of 0 is recorded; if the task is completed, the score consists of two parts: a base score (50 points) plus a trajectory-quality and completion-time score (0-50 points). The scores are then normalized and converted into the signed reward R_t (positive or negative) recognized by the controller.
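A sketch of this scoring, with one assumption flagged: how the 0-50 points split between trajectory quality and completion time is not specified, so an even split is used.

```python
# Round scoring per (1-6). The split of the 0-50 points between trajectory
# quality and completion time is not specified; an even split is assumed.
def score_round(failed, quality, time_score):
    """quality and time_score each assumed in [0, 25]; returns 0-100."""
    if failed:
        return 0.0
    return 50.0 + quality + time_score     # base score + quality/time score

def to_reward(score):
    """Normalize the 0-100 score to a signed reward R_t in [-1, 1]."""
    return (score - 50.0) / 50.0
```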
(1-7) N (N = 5) groups of data sets (S_t, A_t, R_t), each consisting of the brain state S_t, the virtual robotic arm action A_t and the reward R_t, are collected; their averages (S_t_a, A_t_a, R_t_a) are calculated and input into the controller, and the parameters of the model are updated by gradient descent.
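A sketch of that averaging step, reusing the update function from the earlier policy-gradient sketch; buffer is a hypothetical list of recorded (S_t, A_t, R_t) tuples, and rounding the averaged action back to an index is a literal reading of the text (in practice one might instead average the gradient over the batch).

```python
# (1-7) taken literally: average N = 5 (S_t, A_t, R_t) tuples, then take one
# gradient step. buffer is a hypothetical list of recorded tuples.
import numpy as np

N = 5

def batch_average(batch):
    s, a, r = zip(*batch)
    return np.mean(s, axis=0), np.mean(a), np.mean(r)

s_a, a_a, r_a = batch_average(buffer[-N:])   # last N interactions
update(s_a, int(round(a_a)), r_a)            # gradient step (see earlier sketch)
```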
(1-8) A model training threshold K is set, and the rewards R_(t+1) and R_t at times t+1 and t are calculated; if the absolute value of their difference is smaller than K, model training is judged to be finished; otherwise the procedure returns to step (1-3) for the next cycle, until model training is finished.
(1-9) The trained model parameters are input into the computer that controls the physical robotic arm.
Brain-computer cooperative digital twin operation:
(2-1) A real physical environment is built to control the physical robotic arm; compared with the virtual training platform, the controlled object is the physical robotic arm, and everything else is the same.
(2-2) When control starts, at time t the operator sends a direction command C_t to the physical robotic arm through the control device. Meanwhile, scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2 (shown in figure 3), giving 32 channels of electroencephalogram signals. In this embodiment, the acquisition equipment is a 32-channel Neuracle NeuSen W32 system; following the equipment's scheme, the reference electrodes use a dual-reference arrangement on the AFz and CPz channels, the sampling frequency is 1000 Hz, and the signals are transmitted to the computer over a local area network.
(2-3) The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz); the three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
(2-4) The brain state S_t is input into the trained brain-computer cooperative control model, which outputs the action A_t for the corresponding moment; the output action A_t is transmitted to the physical robotic arm over the local area network.
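A minimal transmission sketch follows; UDP, the address and the JSON packet format are assumptions, as the source only says the action is sent over the local area network.

```python
# Sketch of sending A_t to the physical arm over the LAN. UDP, the address/port
# and the packet format are assumptions; the source only names the LAN.
import json
import socket

ARM_ADDR = ("192.168.1.50", 9000)  # hypothetical controller address
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_speed_action(a_t, delta_v):
    """Serialize the action index and velocity increment and send them."""
    packet = json.dumps({"action": a_t, "delta_v": delta_v}).encode()
    sock.sendto(packet, ARM_ADDR)
```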
(2-5) At time t the physical robotic arm receives the action A_t sent by the computer and, combining it with the operator's direction command C_t, begins executing the corresponding action, until the action A_(t+1) sent by the computer and the command C_(t+1) sent by the operator at the next moment are received, whereupon the next action is executed; this continues until the control task is finished.
(2-6) While the physical robotic arm executes the corresponding actions, sensors transmit the real environment parameters and the arm's state parameters to the digital twin environment, synchronizing the virtual and real environments and correcting the parameters of the physical robotic arm's controller in real time.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.