
CN112171669B - A brain-computer collaboration digital twin reinforcement learning control method and system - Google Patents

A brain-computer collaboration digital twin reinforcement learning control method and system Download PDF

Info

Publication number
CN112171669B
CN112171669B (application CN202010998177.0A)
Authority
CN
China
Prior art keywords
brain
virtual robot
computer
operator
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010998177.0A
Other languages
Chinese (zh)
Other versions
CN112171669A (en)
Inventor
张小栋
张腾
陆竹风
张毅
蒋志明
王雅纯
朱文静
蒋永玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010998177.0A priority Critical patent/CN112171669B/en
Publication of CN112171669A publication Critical patent/CN112171669A/en
Application granted granted Critical
Publication of CN112171669B publication Critical patent/CN112171669B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Neurology (AREA)
  • Dermatology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Neurosurgery (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract


The invention discloses a brain-computer cooperation digital twin reinforcement learning control method and system. A brain-computer cooperative control model is constructed: the operator gives the virtual robot a direction command while the operator's EEG signals are collected; from the collected EEG signals, a corresponding speed command is given to the virtual robot to complete the specified action; a reward value is assigned to the brain-computer cooperative control model according to the completion quality, completing the model's training. Through the brain-computer cooperative digital twin environment, reinforcement learning realizes a dual-loop information interaction mechanism between brain and machine, achieving interaction at both the information layer and the command layer. The invention detects the operator's brain state through EEG signals and compensates and regulates the robot's commands according to that brain state, achieving precise manipulation. Compared with other brain-computer cooperation methods, it improves robustness and generalization ability, and realizes mutual adaptation and mutual growth between brain and machine.


Description

Brain-computer cooperation digital twin reinforcement learning control method and system
Technical Field
The invention belongs to the technical field combining brain-computer interfaces and artificial intelligence, and relates to a brain-computer cooperation digital twin reinforcement learning control method and system.
Background
With the development of robot technology, demand is growing for intelligent robots with human-like advanced perception and cognition that can execute unscripted tasks in highly complex environments. However, artificial intelligence alone cannot yet produce an intelligent robot with a human style of reasoning, autonomous discovery and feature extraction, online incremental learning, and the ability to process many kinds of information comprehensively. Human-machine intelligence fusion, which plays to the different strengths of human and machine intelligence, is therefore an important route to intelligent robots. As the tasks and scenes faced by human-machine hybrid intelligent systems grow more complex, higher requirements are placed on perceiving and recognizing human intention; on this basis, brain-computer cooperative enhanced intelligence built on the "human-in-the-loop" principle has been proposed and has rapidly attracted attention. However, in the field of precision manipulation (e.g., surgical robots, special operation robots), brain-controlled systems still carry risks in stability and safety compared with limb-operated systems, so limb-issued control commands remain the main commands in precision control today.
Research shows that two problems remain in the field of precision control: (1) the lack of bidirectional information interaction between the operator and the robot prevents precise perception of the operator's intention; (2) distraction, mental fatigue, excessive mental workload, and similar brain states degrade the performance of the brain-computer hybrid intelligent system and can even make it dangerous. On the brain-machine cooperation side, no effective fusion of human brain intelligence and machine intelligence has been formed. On the brain-state side, current work considers only one-way compensation of control commands and lacks a dual-loop interaction mechanism between brain and machine. In summary, human-in-the-loop brain-machine cooperative control still lacks an integrated brain-machine cooperation model at the present stage; deep fusion of the information layer and the command layer cannot yet be achieved effectively, and the precision, stability, and safety of brain-machine cooperative control need to be improved.
Disclosure of Invention
The invention aims to provide a brain-computer cooperation digital twin reinforcement learning control method and system to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a brain-computer cooperation digital twin reinforcement learning control method comprises the following steps:
step 1), constructing a brain-computer cooperative control model based on a digital twin environment: in the digital twin environment, the operator gives the virtual robot a direction command while the operator's EEG signals are collected, and a corresponding speed command is given to the virtual robot according to the collected EEG signals;
step 2), the virtual robot completes the action specified by the direction and speed commands it has received; a reward value is assigned to the brain-computer cooperative control model according to the completion quality of the specified action, completing the model's training at the current moment;
step 3), repeating steps 1) and 2) to train the brain-computer cooperative control model at successive moments; training is complete when the absolute value of the difference between two consecutive reward values of the model is smaller than a threshold K, otherwise steps 1) and 2) continue to be repeated until training is complete;
and step 4), using the trained brain-machine cooperative control model to achieve precise brain-machine cooperative control of the physical robot, completing the brain-computer cooperation digital twin reinforcement learning control.
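The four steps above can be expressed as a simple training loop. The sketch below is a toy illustration, not the patent's implementation: `ToyEnv` and `ToyModel` are hypothetical stand-ins for the digital twin environment and the brain-computer cooperative control model, and the reward function and threshold K are placeholders.

```python
import random

class ToyEnv:
    """Hypothetical stand-in for the digital twin environment."""
    def get_operator_direction(self):
        # operator's direction command C_t (joystick, etc.)
        return random.choice(["forward", "back", "left", "right", "up", "down"])
    def collect_eeg(self, window_ms=600):
        # 600 ms EEG window recorded while C_t is given
        return [random.gauss(0.0, 1.0) for _ in range(window_ms)]
    def run_episode(self, direction, speed):
        # reward R_t: toy stand-in for task-quality / completion-time scoring
        return 1.0 - abs(speed - 0.5)

class ToyModel:
    """Hypothetical stand-in for the brain-computer cooperative control model."""
    def extract_features(self, eeg):          # brain state S_t
        return sum(eeg) / len(eeg)
    def speed_command(self, state):           # speed command A_t
        return 0.5
    def update(self, state, speed, reward):   # step 2): learn from (S_t, A_t, R_t)
        pass

def train(model, env, K=0.05, max_rounds=100):
    """Repeat steps 1)-2); stop when two consecutive rewards differ by < K (step 3))."""
    prev_reward, rounds = None, 0
    for _ in range(max_rounds):
        rounds += 1
        direction = env.get_operator_direction()
        state = model.extract_features(env.collect_eeg())
        speed = model.speed_command(state)
        reward = env.run_episode(direction, speed)
        model.update(state, speed, reward)
        if prev_reward is not None and abs(reward - prev_reward) < K:
            break
        prev_reward = reward
    return rounds
```

Each pass collects one (C_t, S_t, A_t, R_t) interaction and updates the model; the loop terminates on the step-3) convergence criterion.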
Further, a virtual robot digital twin environment platform is built, and the adjustable commands of the virtual robot are defined, comprising a direction command and a speed command: the direction command controls the virtual robot's direction and is issued by the operator through a control device; the speed command controls the virtual robot's speed and is obtained from the operator's brain state.
Furthermore, in a digital twin environment, an operator gives a virtual robot direction instruction through a virtual control platform, and gives a virtual robot speed instruction according to electroencephalogram signals of the operator.
Further, the operator controls the virtual robot through the control device, whereby the virtual robot obtains a direction command; the operator's EEG signals while giving the direction command are collected, a corresponding speed command is given to the virtual robot from the collected EEG signals, and the association between the EEG signals and the speed command is established to obtain a preliminary brain-computer cooperative control model.
Furthermore, at time t the operator sends a direction command C_t to the virtual robot through the control device; at the same time, the operator's scalp EEG signals during the 600 ms before time t are collected, and the differential entropy feature F_D, power spectral density feature F_P, and three-band energy-ratio feature F_P3 of the extracted 600 ms EEG signal are computed. The three feature matrices are concatenated along the row vector to form multi-dimensional feature data S_t reflecting the current brain state, and the association with the EEG signal is established according to the virtual robot's actual speed command, yielding a preliminary brain-computer cooperative control model.
Further, after the virtual robot receives the speed command A_t at time t, it combines it with the operator's simultaneous direction command C_t and begins executing the corresponding action, until it receives the next action A_{t+1} and the operator's direction command C_{t+1} and executes the next action; this continues until one round of the task is finished. After each round, the execution of the virtual robot's task is recorded, and the reward R_t is computed from two criteria: task completion quality and completion time.
Further, a data set is formed from the brain state S_t, the speed command A_t, and the reward R_t, and the brain-machine cooperative control model is updated.
Further, the operator sends a direction command C to the virtual robot through the control device while the operator's EEG signals are detected, converted into a speed command A, and sent to the virtual robot; the virtual robot combines the direction command C and the speed command A to execute the task they specify. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the running state of the virtual robot, while the operator's EEG signals are collected to give the corresponding speed command A.
A brain-computer cooperation digital twin reinforcement learning control system comprises an electroencephalogram acquisition module, a model training module and a control module;
the electroencephalogram acquisition module is used for acquiring an electroencephalogram signal of an operator when the operator gives a direction instruction of the virtual robot, giving a corresponding speed instruction of the virtual robot according to the acquired electroencephalogram signal, and transmitting the speed instruction to the model training module; the model training module finishes the appointed action of the direction instruction and the speed instruction according to the obtained direction instruction and speed instruction, carries out reward value on the brain-machine cooperation control model according to the finishing quality of the appointed action, finishes the training of the brain-machine cooperation control model at the current moment, and the control module realizes the brain-machine cooperation accurate control of the entity robot according to the brain-machine cooperation control model obtained by the training.
Further, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile the model training module collects the operator's scalp EEG signals during the 600 ms before time t, computes the differential entropy feature F_D, power spectral density feature F_P, and three-band energy-ratio feature F_P3 of the extracted 600 ms EEG signal, concatenates the three feature matrices along the row vector into multi-dimensional feature data S_t reflecting the current brain state, and establishes the association with the EEG signal according to the virtual robot's actual speed command, obtaining a preliminary brain-computer cooperative control model.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention relates to a brain-machine cooperation digital twin reinforcement learning control method, which comprises the steps of constructing a brain-machine cooperation control model, setting a virtual robot direction instruction by an operator in a digital twin environment, simultaneously acquiring electroencephalograms when the virtual robot direction instruction is set by the operator, setting a corresponding speed instruction of a virtual robot according to the acquired electroencephalograms, finishing the direction instruction and speed instruction appointed action according to the obtained direction instruction and speed instruction, carrying out reward values on the brain-machine cooperation control model according to the finishing quality of the appointed action, finishing the training of the brain-machine cooperation control model at the current moment, realizing a double loop information interaction mechanism between brain and machines by reinforcement learning through the brain-machine cooperation digital twin environment, wherein the model has good mobility from a virtual scene to a real scene, the method realizes the updating of model parameters in a control algorithm in the cooperative control process of the operator and the robot, along with the increase of the interaction times between the operator and the robot, the performance can be continuously improved, and the method has the capability of crossing individuals and tasks. Compared with other brain-computer cooperation methods, the robustness and the generalization capability are improved, and the brain-computer mutual adaptation and growth are realized.
Furthermore, with the EEG signals as the environment object and the control algorithm as the agent, a dual-loop information interaction mechanism is provided: the operator sends operation commands to the robot through the control device while supervising the robot's running state in real time through visual and auditory information, adjusting the commands and correcting errors; after processing by the integrated brain-computer cooperation model, regulation commands are sent to the robot, and the active-loop and passive-loop commands act on the robot cooperatively, enabling the robot to execute tasks safely, accurately, and efficiently.
The brain-computer cooperation digital twin reinforcement learning control system realizes the mutual adaptation, mutual supervision and mutual growth of brain-computer cooperation, so that the robot can execute tasks accurately, safely and efficiently.
Drawings
FIG. 1 is a flowchart illustrating an exemplary control procedure according to an embodiment of the present invention.
FIG. 2 is a block diagram of a method flow in an embodiment of the invention.
FIG. 3 is a schematic diagram of the electrode arrangement position of the electroencephalogram signal acquisition module in the embodiment of the present invention.
Fig. 4 is a schematic diagram of an integrated brain-computer cooperation model in the embodiment of the invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
a brain-computer cooperation digital twin reinforcement learning control method comprises the following steps:
step 1), constructing a brain-machine cooperative control model based on a digital twin environment and training it in that environment: the operator gives the virtual robot a direction command through a virtual control platform while the operator's EEG signals are collected, and a corresponding speed command is given to the virtual robot according to those EEG signals;
establishing a virtual environment: establishing a brain-machine cooperation control model based on a digital twin environment, and performing reinforcement learning training of the brain-machine cooperation control model in the digital twin environment; setting up a digital twin environment platform of the virtual robot, and setting adjustable instructions of the virtual robot, wherein the adjustable instructions comprise a direction instruction and a speed instruction; the direction control instruction is a direction control instruction which is controlled by an operator through an operation control device; the speed instruction is a speed control instruction for controlling the virtual robot, and the speed instruction is obtained according to the electroencephalogram signal of the operator.
The control device outputs control commands and may be a mouse, a handheld controller, or a direction controller.
An operator controls the virtual robot through an operation device, the virtual robot acquires a direction instruction at the moment, electroencephalograms when the operator gives the direction instruction of the virtual robot are collected at the same time, corresponding speed instructions of the virtual robot are given according to the collected electroencephalograms, and the correlation between the electroencephalograms and the speed instructions is established to obtain a brain-computer cooperative control preliminary model;
in the aspect of speed instruction control of the virtual robot, a brain-computer interface technology is utilized, a computer analyzes a brain state according to an electroencephalogram signal of a manipulator, and a speed instruction for controlling the virtual robot is output according to the brain state.
When control starts, at time t the operator sends a direction command C_t to the virtual robot through the control device; meanwhile, the operator's scalp EEG signals during the 600 ms before time t are collected. The EEG cap channel placement follows the international 10/20 standard, with electrodes at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1, and O2 — 32 EEG channels in total.
The differential entropy feature F_D, power spectral density feature F_P, and three-band energy-ratio feature F_P3 of the extracted 600 ms EEG signal are computed (F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz)). The three feature matrices are concatenated along the row vector in the form [F_D, F_P, F_P3], forming multi-dimensional feature data S_t that reflects the current brain state.
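A minimal sketch of this feature extraction, under common-usage assumptions: differential entropy of a Gaussian signal for F_D, a mean Welch PSD estimate for F_P, and the stated (theta + alpha)/beta ratio for F_P3. The patent does not specify the PSD settings or any normalization, so those details are illustrative.

```python
import numpy as np
from scipy.signal import welch

def eeg_features(x, fs=1000):
    """Features of a 600 ms EEG window `x` with shape (n_channels, n_samples).

    Returns S_t = [F_D, F_P, F_P3] concatenated along the row vector.
    F_D: differential entropy assuming each channel is Gaussian,
    F_P: mean power spectral density per channel (Welch estimate),
    F_P3: (theta 4-8 Hz + alpha 8-16 Hz) / (beta 16-32 Hz) band energy.
    """
    f, psd = welch(x, fs=fs, nperseg=min(256, x.shape[-1]), axis=-1)
    def band_energy(lo, hi):
        mask = (f >= lo) & (f < hi)
        return psd[..., mask].sum(axis=-1)
    F_D = 0.5 * np.log(2 * np.pi * np.e * x.var(axis=-1))   # differential entropy
    F_P = psd.mean(axis=-1)                                 # power spectral density
    F_P3 = (band_energy(4, 8) + band_energy(8, 16)) / band_energy(16, 32)
    return np.concatenate([F_D, F_P, F_P3])                 # S_t
```

For the 32-channel, 600-sample window described here, S_t has 3 x 32 = 96 entries.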
Specifically, the operator sends a direction command C to the virtual robot through the control device while the operator's EEG signals are detected, converted into a speed command A, and sent to the virtual robot; the virtual robot combines the direction command C and the speed command A to execute the task they specify. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the running state of the virtual robot, while the operator's EEG signals are collected to give the corresponding speed command A.
Step 2), the virtual robot completes the action specified by the direction and speed commands it has received; a reward value is assigned to the brain-computer cooperative control model according to the completion quality of the specified action, completing the model's training at the current moment.
the virtual robot receives a speed instruction A at the time ttThen, combine the direction command C of the operator at the same timetStarting to execute the corresponding action until receiving action A at the next momentt+1And direction command C of the operatort+1Then, executing the next action until the task of one round is executed; after each round of task is finished, recording the execution condition of the virtual robot task, and calculating the reward R according to two standards of task finishing quality and finishing timet
Specifically, the brain-machine cooperative control model adopts a 5-layer fully connected neural network. The network's parameters are updated from the data set (S_t, A_t, R_t) formed by the brain state S_t, the speed command A_t, and the reward R_t. The update works as follows: when the reward R_t is high, the updated model becomes more likely to output the speed command A_t the next time the brain state S_t is input; when R_t is low, it becomes less likely. Through this training process, for an input brain state S the model outputs a corresponding speed command A such that the obtained reward R stabilizes at a high level.
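The update just described — raising the probability of the emitted speed command A_t in brain state S_t when R_t is high, lowering it when R_t is low — has the shape of a policy-gradient (REINFORCE-style) rule. Below is a toy linear-softmax version; it is a hypothetical stand-in for the patent's 5-layer fully connected network, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class SpeedPolicy:
    """Toy softmax policy over two speed actions (+dv, -dv)."""
    def __init__(self, n_features=4, n_actions=2, lr=0.1):
        self.W = np.zeros((n_actions, n_features))
        self.lr = lr

    def probs(self, s):
        # pi(a | s): softmax over action scores
        z = self.W @ s
        e = np.exp(z - z.max())
        return e / e.sum()

    def update(self, s, a, r):
        # REINFORCE step: move W along r * grad log pi(a | s), so a high
        # reward makes action a more probable in state s, a low (negative)
        # reward makes it less probable.
        p = self.probs(s)
        grad = -np.outer(p, s)
        grad[a] += s
        self.W += self.lr * r * grad

s = rng.normal(size=4)          # stand-in brain state S_t
pol = SpeedPolicy()
before = pol.probs(s)[0]        # probability of action 0 before training
for _ in range(20):
    pol.update(s, a=0, r=1.0)   # repeatedly reward action 0 in state s
after = pol.probs(s)[0]         # probability of action 0 after training
```

With an untrained (zero-weight) policy both actions start at probability 0.5; rewarding action 0 raises its probability, which is exactly the qualitative behavior the paragraph describes.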
Step 3), steps 1) and 2) are repeated to train the brain-computer cooperative control model at successive moments. A model training threshold K is set: when the absolute value of the difference between two consecutive rewards R of the model is smaller than K, training is complete; otherwise steps 1) and 2) continue to be repeated until training is complete.
Step 4), the trained brain-machine cooperative control model is used to achieve precise brain-machine cooperative control of the physical robot, completing the brain-computer cooperation digital twin reinforcement learning control.
Specifically, the trained brain-machine cooperative control model is ported to the physical robot's controller; during control, the real and virtual environments are synchronized by the digital twin method, so the parameters of the physical robot's controller are corrected in real time.
Example:
The trained model is ported to the physical robot's controller to achieve precise brain-machine cooperative control of the physical robot. Meanwhile, during control, the real and virtual environments are fully synchronized using digital twin technology, and the parameters of the physical robot's controller are corrected in real time.
Step 1: a robot operating in the real physical environment is set up; compared with the virtual training platform, apart from the controlled object being the physical robot, the other operating elements are the same;
step 2: when the control is started, the operator sends a direction command C through the control device at the moment ttFeeding the entity robot; meanwhile, brain surface electroencephalogram signals 600ms before t moment are collected; the position arrangement of brain wave cap channels conforms to the international 10/20 standard, and electrodes are arranged at the positions of Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2. 32 channels of electroencephalogram signals are counted;
Step 3: the differential entropy feature F_D, power spectral density feature F_P, and three-band energy-ratio feature F_P3 of the extracted 600 ms EEG signal are computed (F_P3: the sum of the band energies of the theta rhythm (4-8 Hz) and alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz)); the three feature matrices are concatenated along the row vector in the form [F_D, F_P, F_P3], forming multi-dimensional feature data S_t reflecting the current brain state;
Step 4: S_t is input to the trained brain-machine cooperative control model, which outputs the action A_t for the corresponding time; the output action A_t is transmitted to the physical robot by wireless communication;
Step 5: after receiving the action A_t sent by the computer at time t, the physical robot combines it with the operator's direction command C_t and begins executing the corresponding action, until it receives the action A_{t+1} sent by the computer at the next moment together with the operator's command C_{t+1}, whereupon it executes the next action; this continues until the control task is finished.
Step 6: while the physical robot executes the corresponding actions, sensors transmit real-environment parameters and physical-robot state parameters to the digital twin environment, synchronizing the virtual and real environments and correcting the parameters of the physical robot's controller in real time.
Training a brain-machine cooperation control model:
(1-1) A robotic-arm digital twin environment platform is built (as shown in FIG. 1), and eight adjustable commands are defined for the virtual arm's end effector (direction: forward, back, left, right, up, down; speed: accelerate, decelerate). The direction command C is controlled by the operator through a joystick, and the speed command A is controlled by the controller according to the operator's brain state. The operator sends the direction command C to the virtual arm by operating the joystick, and the controller sends the speed command A to the virtual arm by detecting the operator's brain state. The virtual arm combines the direction and speed commands to execute an end-effector trajectory-tracking task. During brain-computer cooperative control, the operator continuously adjusts the direction command C by observing the virtual arm's running state; meanwhile, the arm's running state also influences the operator's brain state (the controller adjusts the arm's speed command A by detecting that brain state), so the virtual arm is precisely controlled through brain-machine cooperation.
(1-2) When control starts, at time t the operator sends a direction command C_t to the virtual robot through the control device. Meanwhile, scalp EEG signals during the 600 ms before time t are collected. The EEG cap channel placement follows the international 10/20 standard, with electrodes at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1, and O2 (as shown in FIG. 3), and EEG signals of these 32 channels are collected. In this embodiment, a 32-channel Neuracle NeuSen W32 EEG acquisition device is used; following the device, the reference adopts a dual-reference arrangement on the AFz and CPz channels, the sampling frequency is 1000 Hz, and data are transmitted to the computer over a local area network.
(1-3) The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms EEG signal are computed (F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz)). The three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3], forming multi-dimensional feature data S_t that reflects the current state of the brain.
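The three features of step (1-3) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the differential entropy uses the common closed form for a Gaussian-distributed signal, and `band_power` is a plain periodogram estimate; the exact estimators are not specified in the text.

```python
import numpy as np

def band_power(x, fs, lo, hi):
    """Mean periodogram power of signal x in the [lo, hi) Hz band."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].mean()

def eeg_features(eeg, fs=1000):
    """eeg: (channels, samples) array holding one 600 ms window.
    Returns [F_D, F_P, F_P3] combined along the row vector, as in (1-3)."""
    # F_D: differential entropy per channel (Gaussian closed form, assumed)
    f_d = 0.5 * np.log(2 * np.pi * np.e * eeg.var(axis=1))
    # F_P: broadband (4-32 Hz) power per channel
    f_p = np.array([band_power(ch, fs, 4, 32) for ch in eeg])
    # F_P3: (theta + alpha) band energy divided by beta band energy
    theta = np.array([band_power(ch, fs, 4, 8) for ch in eeg])
    alpha = np.array([band_power(ch, fs, 8, 16) for ch in eeg])
    beta = np.array([band_power(ch, fs, 16, 32) for ch in eeg])
    f_p3 = (theta + alpha) / beta
    return np.concatenate([f_d, f_p, f_p3])
```

For the 32-channel, 1000 Hz setup above, a 600-sample window yields a 96-dimensional state vector S_t.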
(1-4) A 5-layer fully-connected neural network is built in the controller. The network input is the brain state S_t at time t; after training, the network outputs the action A_t at the corresponding time, i.e. increase the speed by +δv or decrease it by -δv, and A_t is sent to the virtual mechanical arm.
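A sketch of such a network follows. The patent specifies only "a 5-layer fully-connected network" with a two-way speed output; the layer widths, ReLU activations and δv = 0.05 here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZES = [96, 128, 64, 32, 16, 2]   # 5 weight layers; widths are assumed
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(SIZES[:-1], SIZES[1:])]

def forward(params, s):
    """Fully-connected forward pass: ReLU hidden layers, linear output."""
    a = s
    for w, b in params[:-1]:
        a = np.maximum(0.0, a @ w + b)
    w, b = params[-1]
    return a @ w + b                 # two logits: accelerate / decelerate

def speed_action(params, s, delta_v=0.05):
    """Map brain state S_t to the speed command A_t (+delta_v or -delta_v)."""
    logits = forward(params, s)
    return +delta_v if logits[0] >= logits[1] else -delta_v
```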
(1-5) After receiving the speed regulation command A_t at time t, the virtual mechanical arm combines it with the operator's direction command C_t and begins to move in the virtual environment at the corresponding speed and direction, until the next action A_{t+1} and the operator's direction command C_{t+1} are received, whereupon the next action is executed.
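One control tick of (1-5) — applying the controller's speed increment A_t and then moving along the operator's direction C_t — can be sketched as below; the time step and minimum speed are assumptions not stated in the patent.

```python
import numpy as np

DIRECTIONS = {                      # the six direction commands from (1-1)
    "front": (1, 0, 0), "back": (-1, 0, 0),
    "left": (0, 1, 0), "right": (0, -1, 0),
    "up": (0, 0, 1), "down": (0, 0, -1),
}

def step_tip(pos, c_t, v, a_t, dt=0.1, v_min=0.0):
    """Advance the arm tip one tick: update the speed with A_t (±δv),
    then move along direction command C_t until the next command pair."""
    v = max(v_min, v + a_t)
    d = np.asarray(DIRECTIONS[c_t], dtype=float)
    return pos + v * dt * d, v
```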
(1-6) When the current round's task is completed, or fails within the specified time, the round ends. The virtual environment evaluates and scores the quality of the task completed by the virtual mechanical arm. There are two cases: if the task fails, the score is 0; if the task is completed, the score consists of two parts: a base score (50 points) plus a trajectory-quality and completion-time score (0-50 points). The score is then normalized and converted into a reward R_t (positive or negative) that the controller can recognize.
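The scoring rule of (1-6) might look like the sketch below. The patent gives only the base score (50) and the 0-50 share for trajectory quality and completion time; the equal split between the two and the normalization to [-1, 1] are assumptions.

```python
def episode_reward(completed, quality, elapsed, t_max):
    """Score one round per (1-6): 0 on failure; on success, base 50 plus
    up to 50 points shared (here equally, an assumption) between trajectory
    quality in [0, 1] and completion speed, then normalized to [-1, 1]."""
    if not completed:
        score = 0.0
    else:
        time_part = max(0.0, 1.0 - elapsed / t_max)    # faster -> higher
        score = 50.0 + 50.0 * (0.5 * quality + 0.5 * time_part)
    return score / 50.0 - 1.0                          # [0, 100] -> [-1, 1]
```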
(1-7) N (N = 5) data groups (S_t, A_t, R_t), each composed of the brain state S_t, the virtual mechanical arm action A_t and the reward R_t, are collected; their averages (S_t_a, A_t_a, R_t_a) are computed and input to the controller, and the model parameters are updated by gradient descent.
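The averaging-and-update step of (1-7) can be sketched as follows; `grad_fn`, which returns the gradient of the controller's loss at the averaged sample, is a hypothetical stand-in since the patent does not give the loss function.

```python
import numpy as np

def batch_update(history, params, grad_fn, lr=0.01, n=5):
    """Average the last n (S_t, A_t, R_t) tuples per (1-7) and take one
    gradient-descent step on the controller parameters."""
    batch = history[-n:]
    s_a = np.mean([s for s, a, r in batch], axis=0)   # S_t_a
    a_a = np.mean([a for s, a, r in batch])           # A_t_a
    r_a = np.mean([r for s, a, r in batch])           # R_t_a
    return params - lr * grad_fn(params, s_a, a_a, r_a)
```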
(1-8) A model training threshold K is set, and the rewards R_{t+1} and R_t at times t+1 and t are computed. If the absolute value of their difference is less than K, model training is judged complete; otherwise, return to step (1-3) and continue to the next cycle until model training is complete.
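The stopping rule of (1-8) — train until two consecutive episode rewards differ by less than K — as a loop sketch; `run_round`, standing in for steps (1-3) through (1-7), is hypothetical, and the default K and round cap are assumptions.

```python
def train_until_converged(run_round, k=0.01, max_rounds=1000):
    """Repeat rounds until |R_{t+1} - R_t| < K, per step (1-8).
    run_round() executes one full round and returns its reward."""
    r_prev = run_round()
    for _ in range(max_rounds):
        r = run_round()
        if abs(r - r_prev) < k:
            return r                 # converged: rewards have stabilized
        r_prev = r
    raise RuntimeError("no convergence within max_rounds")
```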
(1-9) The trained model parameters are input into the computer used to control the physical mechanical arm.
Brain-machine cooperative digital twin manipulation:
(2-1) A real physical environment is built to control the physical mechanical arm. Compared with the virtual training platform, the controlled object is now the physical mechanical arm; everything else is the same.
(2-2) Control begins: at time t the operator sends a direction command C_t to the physical mechanical arm through the manipulation device. Meanwhile, scalp EEG signals over the 600 ms before time t are collected. The EEG cap channel layout conforms to the international 10/20 standard; electrodes are placed at Fp1, Fp2, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, Cz, C3, C4, T3, T4, CP1, CP2, CP5, CP6, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, Oz, O1 and O2 (shown in figure 3), and EEG signals from these 32 channels are collected. In this embodiment, the acquisition equipment is a 32-channel Briokang Nersenw 32 EEG acquisition device; following the device's scheme, the reference uses a double-reference arrangement on the AFz and CPz channels, the sampling frequency is 1000 Hz, and the data are transmitted to a computer over a local area network.
(2-3) The differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms EEG signal are computed (F_P3 is the sum of the band energies of the theta rhythm (4-8 Hz) and the alpha rhythm (8-16 Hz) divided by the band energy of the beta rhythm (16-32 Hz)). The three feature matrices are combined along the row vector in the form [F_D, F_P, F_P3], forming multi-dimensional feature data S_t that reflects the current state of the brain.
(2-4) S_t is input into the trained brain-machine cooperative control model, which outputs the action A_t at the corresponding time; A_t is then transmitted to the physical mechanical arm over the local area network.
(2-5) The physical mechanical arm receives the action A_t sent by the computer at time t and, combined with the operator's direction command C_t, begins to execute the corresponding action until the action A_{t+1} sent by the computer at the next moment and the operator's command C_{t+1} are received, whereupon the next action is executed, until the control task is finished.
(2-6) While the physical mechanical arm executes the corresponding actions, sensors transmit the real environment parameters and the physical arm's state parameters to the digital twin environment, synchronizing the virtual and real environments and correcting the parameters of the physical arm's controller in real time.
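Step (2-6) can be sketched as a sync-and-correct cycle. The dictionary state representation and the simple proportional correction rule below are assumptions; the patent states only that sensor data synchronize the twin and correct the controller parameters in real time.

```python
def sync_and_correct(real_state, twin_state, ctrl_params, gain=0.1):
    """Mirror the measured state of the physical arm into the digital twin,
    then nudge each controller parameter against the twin/real discrepancy
    (proportional rule, assumed for illustration)."""
    error = {k: real_state[k] - twin_state.get(k, 0.0) for k in real_state}
    twin_state.update(real_state)            # synchronize twin with reality
    corrected = {k: v - gain * error.get(k, 0.0)
                 for k, v in ctrl_params.items()}
    return twin_state, corrected
```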
The above contents only illustrate the technical idea of the present invention and do not thereby limit its protection scope; any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (7)

1. A brain-computer collaboration digital twin reinforcement learning control method, characterized by comprising the following steps:

Step 1): constructing a brain-computer cooperative control model based on a digital twin environment. In the digital twin environment, the operator controls the virtual robot through a manipulation device; the virtual robot receives a direction command, the operator's EEG signals at the moment the direction command is given are collected, and a corresponding speed command is given to the virtual robot according to the collected EEG signals. Specifically: at time t the operator sends a direction command C_t to the virtual robot through the manipulation device; meanwhile, the operator's scalp EEG signals over the 600 ms before time t are collected, and the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms EEG signal are computed; the matrices of these three features are combined along the row vector to form multi-dimensional feature data S_t reflecting the current state of the brain; the correlation between S_t and the virtual robot's actual speed command is established, yielding a preliminary brain-computer cooperative control model;

Step 2): the virtual robot completes the action specified by the obtained direction and speed commands; according to the completion quality of the specified action, a reward value is given to the brain-computer cooperative control model, completing the training of the model at the current moment;

Step 3): Steps 1) and 2) are repeated to train the brain-computer cooperative control model at successive moments; when the absolute value of the difference between two adjacent reward values of the model is less than a threshold K, the training of the model is complete; otherwise, Steps 1) and 2) continue to be repeated until the model is fully trained;

Step 4): the trained brain-computer cooperative control model is used to achieve precise brain-computer cooperative control of the physical robot, thereby completing brain-computer collaboration digital twin reinforced manipulation.

2. The brain-computer collaboration digital twin reinforcement learning control method according to claim 1, characterized in that a virtual robot digital twin environment platform is built and adjustable commands of the virtual robot are set, the adjustable commands comprising a direction command and a speed command; the direction command is a direction control command issued by the operator through the manipulation device; the speed command is a speed control command for the virtual robot, obtained according to the operator's brain state.

3. The brain-computer collaboration digital twin reinforcement learning control method according to claim 2, characterized in that, in the digital twin environment, the operator gives the virtual robot direction commands through a virtual control platform, and the virtual robot's speed commands are given according to the operator's EEG signals.

4. The brain-computer collaboration digital twin reinforcement learning control method according to claim 1, characterized in that, after receiving the speed command A_t at time t, the virtual robot, combining the operator's direction command C_t, begins to execute the corresponding action until the next action A_{t+1} and the operator's direction command C_{t+1} are received, whereupon the next action is executed, until one round of the task is completed; when each round is completed, the virtual robot's task execution is recorded, and the reward R_t is calculated according to the two criteria of task completion quality and completion time.

5. The brain-computer collaboration digital twin reinforcement learning control method according to claim 4, characterized in that the brain-computer cooperative control model is updated according to the data group composed of the brain state S_t, the speed command A_t and the reward R_t.

6. The brain-computer collaboration digital twin reinforcement learning control method according to claim 1, characterized in that the operator sends the direction command C to the virtual robot through the manipulation device while the operator's EEG signals are detected and converted into the speed command A, which is sent to the virtual robot; the virtual robot, combining direction command C and speed command A, executes the task specified by them; during brain-computer cooperative manipulation, the operator continuously adjusts direction command C by observing the running state of the virtual robot, while the operator's EEG signals are collected to give the corresponding speed command A.

7. A brain-computer collaboration digital twin reinforcement learning control system, characterized by comprising an EEG acquisition module, a model training module and a control module; the EEG acquisition module is used to acquire the operator's EEG signals when the operator gives the virtual robot a direction command, to give the virtual robot a corresponding speed command according to the collected EEG signals, and to transmit the speed command to the model training module; the model training module completes the action specified by the obtained direction and speed commands and, according to the completion quality of the specified action, gives a reward value to the brain-computer cooperative control model, completing the training of the model at the current moment; the control module achieves precise brain-computer cooperative control of the physical robot according to the trained brain-computer cooperative control model; in the model training module, at time t the operator sends a direction command C_t to the virtual robot through the manipulation device; meanwhile, the operator's scalp EEG signals over the 600 ms before time t are collected, the differential entropy feature F_D, the power spectral density feature F_P and the three-band energy relationship feature F_P3 of the extracted 600 ms EEG signal are computed, and the matrices of these three features are combined along the row vector to form multi-dimensional feature data S_t reflecting the current state of the brain; the correlation between S_t and the virtual robot's actual speed command is established, yielding the preliminary brain-computer cooperative control model.
CN202010998177.0A 2020-09-21 2020-09-21 A brain-computer collaboration digital twin reinforcement learning control method and system Active CN112171669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010998177.0A CN112171669B (en) 2020-09-21 2020-09-21 A brain-computer collaboration digital twin reinforcement learning control method and system


Publications (2)

Publication Number Publication Date
CN112171669A (en) 2021-01-05
CN112171669B (en) 2021-10-08

Family

ID=73955701



Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495578B (en) * 2021-09-07 2021-12-10 南京航空航天大学 A Reinforcement Learning Method for Cluster Track Planning Based on Digital Twin Training
CN114310870A (en) * 2021-11-10 2022-04-12 达闼科技(北京)有限公司 Intelligent agent control method and device, electronic equipment and storage medium
CN114147706A (en) * 2021-11-25 2022-03-08 北京邮电大学 A collaborative robot remote monitoring system and method based on digital twin
CN115577641B (en) * 2022-11-14 2023-04-07 成都飞机工业(集团)有限责任公司 Training method, device, equipment and medium for digital twin model

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US9050200B2 (en) * 2007-05-02 2015-06-09 University Of Florida Research Foundation, Inc. System and method for brain machine interface (BMI) control using reinforcement learning
US20100324440A1 (en) * 2009-06-19 2010-12-23 Massachusetts Institute Of Technology Real time stimulus triggered by brain state to enhance perception and cognition
WO2016094862A2 (en) * 2014-12-12 2016-06-16 Francis Joseph T Autonomous brain-machine interface
US10664766B2 (en) * 2016-01-27 2020-05-26 Bonsai AI, Inc. Graphical user interface to an artificial intelligence engine utilized to generate one or more trained artificial intelligence models
CN105563495B (en) * 2016-02-01 2017-12-15 浙江大学 Arm-and-hand system and method based on refinement motion imagination EEG signals control
US10712820B2 (en) * 2016-10-27 2020-07-14 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for a hybrid brain interface for robotic swarms using EEG signals and an input device
CN109015635A (en) * 2018-08-08 2018-12-18 西安科技大学 A kind of service robot control method based on brain-machine interaction



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant