Disclosure of Invention
The invention aims to provide a brain-computer cooperation digital twin reinforcement learning control method and system to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a brain-computer cooperation digital twin reinforcement learning control method comprises the following steps:
step 1), constructing a brain-computer cooperative control model based on a digital twin environment: in the digital twin environment, an operator gives the virtual robot a direction instruction, the operator's electroencephalogram signals are collected at the same time, and a corresponding speed instruction for the virtual robot is given according to the collected electroencephalogram signals;
step 2), the virtual robot performs the action specified by the direction instruction and the speed instruction; a reward value is assigned to the brain-computer cooperative control model according to the completion quality of the specified action, which completes the training of the brain-computer cooperative control model at the current moment;
step 3), repeating step 1) to step 2) to train the brain-computer cooperative control model at successive moments; when the absolute value of the difference between two adjacent reward values of the model is smaller than a threshold K, the training of the brain-computer cooperative control model is finished; otherwise, step 1) to step 2) are repeated until training is finished;
and step 4), using the trained brain-computer cooperative control model to achieve accurate brain-computer cooperative control of the physical robot, thereby completing the brain-computer cooperative digital twin reinforcement control.
Further, a virtual robot digital twin environment platform is built, and the adjustable instructions of the virtual robot are set, comprising a direction instruction and a speed instruction; the direction instruction is issued by the operator through a control device, while the speed instruction, which controls the speed of the virtual robot, is obtained from the brain state of the operator.
Furthermore, in the digital twin environment, the operator gives the virtual robot its direction instruction through a virtual control platform, and the virtual robot's speed instruction is given according to the operator's electroencephalogram signals.
Further, the operator controls the virtual robot through the control device, from which the virtual robot acquires its current direction instruction; the electroencephalogram signals recorded while the operator gives the direction instruction are collected, the corresponding speed instruction of the virtual robot is given according to the collected signals, and the association between the electroencephalogram signals and the speed instructions is established to obtain a preliminary brain-computer cooperative control model.
Furthermore, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the operator's scalp electroencephalogram signals from the 600 ms before time t are collected, and the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the 600 ms electroencephalogram signals are calculated. The three feature matrices are combined along the row vector to form multi-dimensional feature data S_t reflecting the current brain state, and the association with the actual speed instruction of the virtual robot is established to obtain a preliminary brain-computer cooperative control model.
Further, after the virtual robot receives the speed instruction A_t at time t, it combines it with the operator's simultaneous direction command C_t and begins executing the corresponding action, until the action A_(t+1) and the operator's direction command C_(t+1) at the next moment are received; the next action is then executed, and so on until one round of the task is completed. After each round, the execution of the task by the virtual robot is recorded, and the reward R_t is calculated according to two criteria: task completion quality and completion time.
Further, a data set is formed from the brain state S_t, the speed command A_t and the reward R_t, and the brain-computer cooperative control model is updated with it.
Further, the operator sends a direction instruction C to the virtual robot through the control device; at the same time the operator's electroencephalogram signals are detected, converted into a speed instruction A and sent to the virtual robot, which executes the task specified by the combination of the direction instruction C and the speed instruction A. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the running state of the virtual robot, while the operator's electroencephalogram signals are continuously acquired to give the corresponding speed command A.
A brain-computer cooperation digital twin reinforcement learning control system comprises an electroencephalogram acquisition module, a model training module and a control module;
the electroencephalogram acquisition module is used for acquiring the operator's electroencephalogram signals when the operator gives the virtual robot a direction instruction, giving the corresponding speed instruction of the virtual robot according to the acquired signals, and transmitting the speed instruction to the model training module; the model training module has the virtual robot perform the action specified by the direction instruction and the speed instruction, assigns a reward value to the brain-computer cooperative control model according to the completion quality of the specified action, and thereby completes the training of the model at the current moment; the control module achieves accurate brain-computer cooperative control of the physical robot using the trained brain-computer cooperative control model.
Further, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the model training module collects the operator's scalp electroencephalogram signals from the 600 ms before time t, calculates the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the 600 ms electroencephalogram signals, combines the three feature matrices along the row vector to form multi-dimensional feature data S_t reflecting the current brain state, and establishes the association with the actual speed instruction of the virtual robot to obtain a preliminary brain-computer cooperative control model.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention relates to a brain-machine cooperation digital twin reinforcement learning control method, which comprises the steps of constructing a brain-machine cooperation control model, setting a virtual robot direction instruction by an operator in a digital twin environment, simultaneously acquiring electroencephalograms when the virtual robot direction instruction is set by the operator, setting a corresponding speed instruction of a virtual robot according to the acquired electroencephalograms, finishing the direction instruction and speed instruction appointed action according to the obtained direction instruction and speed instruction, carrying out reward values on the brain-machine cooperation control model according to the finishing quality of the appointed action, finishing the training of the brain-machine cooperation control model at the current moment, realizing a double loop information interaction mechanism between brain and machines by reinforcement learning through the brain-machine cooperation digital twin environment, wherein the model has good mobility from a virtual scene to a real scene, the method realizes the updating of model parameters in a control algorithm in the cooperative control process of the operator and the robot, along with the increase of the interaction times between the operator and the robot, the performance can be continuously improved, and the method has the capability of crossing individuals and tasks. Compared with other brain-computer cooperation methods, the robustness and the generalization capability are improved, and the brain-computer mutual adaptation and growth are realized.
Furthermore, with the electroencephalogram signals as the environment object and the control algorithm as the agent, a double-loop information interaction mechanism is provided: in the active loop, the operator sends operation commands to the robot through the control device while supervising the robot's running state in real time through visual and auditory information, adjusting the commands and correcting errors; in the passive loop, the operator's electroencephalogram signals are processed by the integrated brain-computer cooperation model, which sends regulation instructions to the robot. The active-loop and passive-loop instructions act on the robot cooperatively, so that the robot executes tasks safely, accurately and efficiently.
The brain-computer cooperation digital twin reinforcement learning control system realizes the mutual adaptation, mutual supervision and mutual growth of brain-computer cooperation, so that the robot can execute tasks accurately, safely and efficiently.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
a brain-computer cooperation digital twin reinforcement learning control method comprises the following steps:
step 1), constructing a brain-computer cooperative control model based on a digital twin environment and training it in that environment: the operator gives the virtual robot a direction instruction through a virtual control platform, the operator's electroencephalogram signals are acquired at the same time, and the corresponding speed instruction of the virtual robot is given according to those signals;
establishing the virtual environment: a brain-computer cooperative control model is established based on a digital twin environment, and reinforcement learning training of the model is performed in that environment; a digital twin environment platform of the virtual robot is built, and the adjustable instructions of the virtual robot are set, comprising a direction instruction and a speed instruction; the direction instruction is issued by the operator through a control device, and the speed instruction, which controls the speed of the virtual robot, is obtained from the operator's electroencephalogram signals.
The control device is used for outputting control instructions and may be, for example, a mouse, a gamepad or a direction controller.
The operator controls the virtual robot through the control device, from which the virtual robot acquires its current direction instruction; meanwhile, the electroencephalogram signals recorded while the operator gives the direction instruction are collected, the corresponding speed instruction of the virtual robot is given according to the collected signals, and the association between the electroencephalogram signals and the speed instructions is established to obtain a preliminary brain-computer cooperative control model;
for speed control of the virtual robot, brain-computer interface technology is used: a computer analyzes the brain state from the operator's electroencephalogram signals and outputs the speed instruction controlling the virtual robot according to that state.
When control starts, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the operator's scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2, giving 32 channels of electroencephalogram signals in total.
The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz). The three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
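As an illustration of this feature pipeline, the sketch below assumes a NumPy array of shape (32, 600) holding one 600 ms, 32-channel window sampled at 1000 Hz; the Gaussian closed form for differential entropy, a Welch estimate for the power spectral density, and mean Welch power as the per-channel F_P summary are common choices assumed here, not requirements of the method.

```python
# Sketch of the S_t feature extraction described above. Assumptions not in the
# source: the EEG window is a (32, 600) NumPy array sampled at 1000 Hz, DE uses
# the Gaussian closed form, and PSD/band energies come from a Welch estimate.
import numpy as np
from scipy.signal import welch

FS = 1000  # sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 16), "beta": (16, 32)}  # per the text

def band_energy(freqs, psd, lo, hi):
    """Integrate the PSD over [lo, hi) Hz for every channel."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[:, mask], freqs[mask], axis=1)

def extract_state(eeg):
    """Build S_t = [F_D, F_P, F_P3] from one 600 ms, 32-channel window."""
    # F_D: differential entropy, Gaussian closed form 0.5*ln(2*pi*e*var)
    f_d = 0.5 * np.log(2 * np.pi * np.e * np.var(eeg, axis=1))
    # F_P: power spectral density, summarized as mean Welch power per channel
    freqs, psd = welch(eeg, fs=FS, nperseg=256, axis=1)
    f_p = psd.mean(axis=1)
    # F_P3: (theta band energy + alpha band energy) / beta band energy
    e = {name: band_energy(freqs, psd, lo, hi) for name, (lo, hi) in BANDS.items()}
    f_p3 = (e["theta"] + e["alpha"]) / e["beta"]
    return np.concatenate([f_d, f_p, f_p3])  # shape (96,)

s_t = extract_state(np.random.randn(32, 600))  # placeholder window
```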
Specifically, the operator sends a direction instruction C to the virtual robot through the control device; at the same time the operator's electroencephalogram signals are detected, converted into a speed instruction A and sent to the virtual robot, which executes the task specified by the combination of the direction instruction C and the speed instruction A. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the running state of the virtual robot, while the operator's electroencephalogram signals are acquired to give the corresponding speed command A.
Step 2), the virtual robot finishes the appointed action of the direction instruction and the speed instruction according to the obtained direction instruction and speed instruction, carries out reward value on the brain-computer cooperative control model according to the finishing quality of the appointed action, and finishes the training of the brain-computer cooperative control model at the current moment;
after the virtual robot receives the speed instruction A_t at time t, it combines it with the operator's simultaneous direction command C_t and begins executing the corresponding action, until the action A_(t+1) and the operator's direction command C_(t+1) at the next moment are received; the next action is then executed, and so on until one round of the task is completed. After each round, the execution of the task by the virtual robot is recorded, and the reward R_t is calculated according to two criteria: task completion quality and completion time.
Specifically, the brain-computer cooperative control model adopts a 5-layer fully connected neural network. The parameters of the network are updated with the data set (S_t, A_t, R_t) formed from the brain state S_t, the speed command A_t and the reward R_t. The update proceeds as follows: when the reward R_t is high, the updated model becomes more likely to output the speed command A_t the next time the brain state S_t is input; when the reward R_t is low, the updated model becomes less likely to output A_t for the same input S_t. Trained in this way, the model outputs, for an input brain state S, a corresponding speed instruction A such that the obtained reward R stabilizes at a high level.
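The described update matches a REINFORCE-style policy gradient: scaling the log-probability of the taken action by the reward raises the probability of A_t under S_t when R_t is positive and lowers it when R_t is negative. The sketch below is a minimal illustration, assuming PyTorch, a 96-dimensional S_t matching the earlier feature sketch, two speed commands, a reward normalized to signed values as described later in (1-6), and layer sizes that the source does not specify.

```python
# Illustrative REINFORCE-style update for the 5-layer fully connected model.
# Assumptions beyond the text: PyTorch, 96-dim S_t, two speed actions
# (accelerate / decelerate), signed reward, and a plain SGD step.
import torch
import torch.nn as nn

policy = nn.Sequential(              # 5 fully connected layers
    nn.Linear(96, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),                # logits for the 2 speed commands
)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)

def update(s_t, a_t, r_t):
    """Raise P(a_t | s_t) when r_t is positive, lower it when negative."""
    logits = policy(torch.as_tensor(s_t, dtype=torch.float32))
    log_prob = torch.log_softmax(logits, dim=-1)[a_t]
    loss = -r_t * log_prob           # gradient ascent on r_t * log pi(a_t|s_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```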
Step 3), repeating the step 1) to the step 2), finishing the training of the brain-computer cooperative control model at different moments, finishing the training of the brain-computer cooperative control model when the absolute value of the difference of two adjacent reward values of the brain-computer cooperative control model is smaller than a threshold value K, and otherwise, continuing repeating the step 1) to the step 2) until the training of the brain-computer cooperative control model is finished;
a model training threshold K is set; when the absolute value of the difference between two adjacent rewards R is smaller than K, model training is finished, and otherwise training continues until it is.
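A minimal sketch of this stopping rule follows; the value of K and the run_training_round helper (one pass of steps 1)-2) returning that round's reward) are hypothetical placeholders.

```python
# Sketch of the stopping rule in step 3). run_training_round is a hypothetical
# placeholder for one pass of steps 1)-2) that returns the round's reward R.
K = 0.05  # threshold K; its value is not specified in the source

r_prev = None
while True:
    r_curr = run_training_round()          # hypothetical: one training round
    if r_prev is not None and abs(r_curr - r_prev) < K:
        break                              # training of the model is finished
    r_prev = r_curr
```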
And step 4), using the trained brain-computer cooperative control model to achieve accurate brain-computer cooperative control of the physical robot, thereby completing the brain-computer cooperative digital twin reinforcement control.
Specifically, the trained brain-computer cooperative control model is transplanted to the controller of the physical robot; during control, the real environment and the virtual environment are synchronized through the digital twin method, so that the parameters of the physical robot's controller are corrected in real time.
Example:
The trained model is transplanted to the controller of the physical robot to achieve accurate brain-computer cooperative control of the physical robot. Meanwhile, during control, the real environment and the virtual environment are kept fully synchronized by digital twin technology, and the parameters of the physical robot controller are corrected in real time.
Step 1: a real physical environment for operating the robot is built; compared with the virtual training platform, except that the controlled object is the physical robot, everything else is the same;
Step 2: when control starts, at time t the operator sends a direction command C_t to the physical robot through the control device; meanwhile, scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2, giving 32 channels of electroencephalogram signals in total;
Step 3: the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz); the three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
Step 4: the brain state S_t is input into the trained brain-computer cooperative control model, which outputs the action A_t for the corresponding moment, and the output action A_t is transmitted to the physical robot by wireless communication;
Step 5: at time t the physical robot receives the action A_t sent by the computer and, combining it with the operator's direction command C_t, begins executing the corresponding action until the action A_(t+1) sent by the computer and the command C_(t+1) sent by the operator at the next moment are received, whereupon the next action is executed; this continues until the control task is finished.
Step 6: while the physical robot executes the corresponding actions, sensors transmit the real environment parameters and the physical robot's state parameters to the digital twin environment, so that the virtual and real environments are synchronized and the parameters of the physical robot controller are corrected in real time.
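Steps 2-6 amount to a fixed real-time loop; the condensed sketch below reuses extract_state from the earlier feature sketch, and every other helper (read_direction, pull_eeg_window, select_action, send_action, read_sensors, and the twin object's methods) is a hypothetical stand-in for the platform's actual interfaces.

```python
# Condensed sketch of steps 2-6 on the physical robot. All I/O helpers and
# the twin object are hypothetical stand-ins for the real platform interfaces.
def control_loop(policy, twin, robot):
    while not robot.task_done():
        c_t = read_direction()                # operator's direction command C_t
        eeg = pull_eeg_window()               # scalp EEG from the 600 ms before t
        s_t = extract_state(eeg)              # feature vector S_t (step 3)
        a_t, dv = select_action(policy, s_t)  # speed action A_t (step 4)
        send_action(robot, a_t, c_t)          # wireless transmission to the robot
        state = read_sensors(robot)           # real env + robot state parameters
        twin.sync(state)                      # synchronize virtual and real (step 6)
        twin.correct_controller(robot)        # real-time parameter correction
```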
Training the brain-computer cooperative control model:
(1-1) A robotic arm digital twin environment platform is built (as shown in figure 1), and 8 adjustable instructions are set for the end of the virtual robotic arm (direction: forward, backward, left, right, up and down; speed: accelerate and decelerate), where the direction instruction C is controlled by the operator through a joystick and the speed instruction A is controlled by the controller according to the operator's brain state. The operator sends the direction instruction C to the virtual robotic arm by operating the joystick, while the speed instruction A is sent to the virtual robotic arm according to the operator's detected brain state. The virtual robotic arm combines the direction and speed instructions to execute an end-effector trajectory tracking task. During brain-computer cooperative control, the operator continuously adjusts the direction instruction C by observing the running state of the virtual robotic arm; meanwhile, that running state in turn influences the operator's brain state (from which the controller adjusts the speed instruction A of the virtual robotic arm), so that the virtual robotic arm is accurately controlled through brain-computer cooperation.
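For concreteness, the 8 adjustable commands can be written as two enums; the names below are illustrative, as the source only lists the six directions and the two speed changes.

```python
# The 8 adjustable end-of-arm commands from (1-1) as two enums. Names are
# illustrative; the source only lists six directions and two speed changes.
from enum import Enum, auto

class Direction(Enum):          # issued by the operator via the joystick
    FORWARD = auto()
    BACKWARD = auto()
    LEFT = auto()
    RIGHT = auto()
    UP = auto()
    DOWN = auto()

class SpeedCommand(Enum):       # issued by the controller from the brain state
    ACCELERATE = auto()
    DECELERATE = auto()
```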
(1-2) When control starts, at time t the operator sends a direction command C_t to the virtual robot through the control device. Meanwhile, scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2 (shown in figure 3), giving 32 channels of electroencephalogram signals. In this embodiment, the acquisition equipment is a 32-channel Neuracle NeuSen W32 system; following the equipment's scheme, the reference electrodes use a dual-reference arrangement on the AFz and CPz channels, the sampling frequency is 1000 Hz, and the signals are transmitted to the computer over a local area network.
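One way to pull such windows in software is sketched below; Lab Streaming Layer (pylsl) is assumed as the LAN transport, which the source does not specify, so treat the stream setup as illustrative.

```python
# Sketch of acquiring one 600 ms, 32-channel window at 1000 Hz. The source only
# says the amplifier streams over the local area network; Lab Streaming Layer
# (pylsl) is assumed here for illustration.
import numpy as np
from pylsl import StreamInlet, resolve_stream

FS, WINDOW = 1000, 600                     # sampling rate (Hz), window (samples)

streams = resolve_stream("type", "EEG")    # find the EEG stream on the LAN
inlet = StreamInlet(streams[0])

def pull_eeg_window():
    """Accumulate 600 samples per channel (600 ms at 1000 Hz)."""
    rows = []
    while len(rows) < WINDOW:
        chunk, _ = inlet.pull_chunk(timeout=1.0)
        rows.extend(chunk)                 # each row: one sample x 32 channels
    return np.asarray(rows[:WINDOW]).T     # shape (32, 600)
```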
(1-3) The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz); the three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
(1-4) A 5-layer fully connected neural network is established in the controller. The network input is the brain state S_t at time t; after training, the network outputs the speed action A_t for the corresponding moment, i.e. increase the speed by +δv or decrease it by -δv, and A_t is sent to the virtual robotic arm.
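At inference time the network can be sampled as below; this is a minimal sketch in which DELTA_V is a hypothetical step size (the source only names +δv and -δv) and policy is the 5-layer network from the earlier update sketch.

```python
# Inference sketch for (1-4): sample the speed action from the 5-layer network
# and map it to a velocity increment. DELTA_V is a hypothetical step size; the
# source only names +delta v and -delta v.
import torch

DELTA_V = 0.05  # hypothetical speed step

def select_action(policy, s_t):
    """Return the sampled action index and its velocity increment."""
    logits = policy(torch.as_tensor(s_t, dtype=torch.float32))
    a_t = torch.distributions.Categorical(logits=logits).sample().item()
    return a_t, (DELTA_V if a_t == 0 else -DELTA_V)  # 0: accelerate, 1: decelerate
```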
(1-5) After the virtual robotic arm receives the speed regulation instruction A_t at time t, it combines it with the operator's simultaneous direction command C_t and starts moving in the virtual environment at the corresponding speed and direction, until the action A_(t+1) and the operator's direction command C_(t+1) at the next moment are received, whereupon the next action is executed.
(1-6) A round ends when the task is completed or when it fails within the specified time. The virtual environment feeds back and scores the quality with which the virtual robotic arm completed the task. There are two cases: if the task fails, a score of 0 is recorded; if the task is completed, the score consists of two parts: a base score (50 points) plus a trajectory-quality and completion-time score (0-50 points). The scores are then normalized and converted into the signed reward R_t (positive or negative) recognized by the controller.
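A sketch of this scoring, with one assumption flagged: how the 0-50 points split between trajectory quality and completion time is not specified, so an even split is used.

```python
# Round scoring per (1-6). The split of the 0-50 points between trajectory
# quality and completion time is not specified; an even split is assumed.
def score_round(failed, quality, time_score):
    """quality and time_score each assumed in [0, 25]; returns 0-100."""
    if failed:
        return 0.0
    return 50.0 + quality + time_score     # base score + quality/time score

def to_reward(score):
    """Normalize the 0-100 score to a signed reward R_t in [-1, 1]."""
    return (score - 50.0) / 50.0
```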
(1-7) N (N = 5) groups of data sets (S_t, A_t, R_t), each consisting of the brain state S_t, the virtual robotic arm action A_t and the reward R_t, are collected; their averages (S_t_a, A_t_a, R_t_a) are calculated and input into the controller, and the parameters of the model are updated by gradient descent.
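A sketch of that averaging step, reusing the update function from the earlier policy-gradient sketch; buffer is a hypothetical list of recorded (S_t, A_t, R_t) tuples, and rounding the averaged action back to an index is a literal reading of the text (in practice one might instead average the gradient over the batch).

```python
# (1-7) taken literally: average N = 5 (S_t, A_t, R_t) tuples, then take one
# gradient step. buffer is a hypothetical list of recorded tuples.
import numpy as np

N = 5

def batch_average(batch):
    s, a, r = zip(*batch)
    return np.mean(s, axis=0), np.mean(a), np.mean(r)

s_a, a_a, r_a = batch_average(buffer[-N:])   # last N interactions
update(s_a, int(round(a_a)), r_a)            # gradient step (see earlier sketch)
```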
(1-8) A model training threshold K is set, and the rewards R_(t+1) and R_t at times t+1 and t are calculated; if the absolute value of their difference is smaller than K, model training is judged to be finished; otherwise the procedure returns to step (1-3) for the next cycle, until model training is finished.
(1-9) The trained model parameters are input into the computer that controls the physical robotic arm.
Brain-computer cooperative digital twin operation:
(2-1) A real physical environment is built to control the physical robotic arm; compared with the virtual training platform, the controlled object is the physical robotic arm, and everything else is the same.
(2-2) When control starts, at time t the operator sends a direction command C_t to the physical robotic arm through the control device. Meanwhile, scalp electroencephalogram signals from the 600 ms before time t are collected. The channel layout of the electroencephalogram cap follows the international 10/20 standard, with electrodes placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2 (shown in figure 3), giving 32 channels of electroencephalogram signals. In this embodiment, the acquisition equipment is a 32-channel Neuracle NeuSen W32 system; following the equipment's scheme, the reference electrodes use a dual-reference arrangement on the AFz and CPz channels, the sampling frequency is 1000 Hz, and the signals are transmitted to the computer over a local area network.
(2-3) The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms electroencephalogram signals are calculated, where F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz); the three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3] to form multi-dimensional feature data S_t reflecting the current brain state.
(2-4) The brain state S_t is input into the trained brain-computer cooperative control model, which outputs the action A_t for the corresponding moment; the output action A_t is transmitted to the physical robotic arm over the local area network.
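A minimal transmission sketch follows; UDP, the address and the JSON packet format are assumptions, as the source only says the action is sent over the local area network.

```python
# Sketch of sending A_t to the physical arm over the LAN. UDP, the address/port
# and the packet format are assumptions; the source only names the LAN.
import json
import socket

ARM_ADDR = ("192.168.1.50", 9000)  # hypothetical controller address
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_speed_action(a_t, delta_v):
    """Serialize the action index and velocity increment and send them."""
    packet = json.dumps({"action": a_t, "delta_v": delta_v}).encode()
    sock.sendto(packet, ARM_ADDR)
```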
(2-5) At time t the physical robotic arm receives the action A_t sent by the computer and, combining it with the operator's direction command C_t, begins executing the corresponding action, until the action A_(t+1) sent by the computer and the command C_(t+1) sent by the operator at the next moment are received, whereupon the next action is executed; this continues until the control task is finished.
(2-6) While the physical robotic arm executes the corresponding actions, sensors transmit the real environment parameters and the arm's state parameters to the digital twin environment, synchronizing the virtual and real environments and correcting the parameters of the physical robotic arm's controller in real time.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.