CN118249474A - Energy control strategy for the multi-source energy harvesting and storage system of a bionic ray submersible - Google Patents
- Publication number
- CN118249474A (CN202410658334.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J7/00—Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
- H02J7/007—Regulation of charging or discharging current or voltage
- H02J7/00712—Regulation of charging or discharging current or voltage the cycle being controlled or terminated in response to electric parameters
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B63—SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
- B63H—MARINE PROPULSION OR STEERING
- B63H21/00—Use of propulsion power plant or units on vessels
- B63H21/12—Use of propulsion power plant or units on vessels the vessels being motor-driven
- B63H21/17—Use of propulsion power plant or units on vessels the vessels being motor-driven by electric motor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B63—SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
- B63H—MARINE PROPULSION OR STEERING
- B63H21/00—Use of propulsion power plant or units on vessels
- B63H21/21—Control means for engine or transmission, specially adapted for use on marine vessels
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J7/00—Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
- H02J7/0047—Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries with monitoring or indicating devices or circuits
- H02J7/0048—Detection of remaining charge capacity or state of charge [SOC]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J7/00—Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
- H02J7/32—Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries for charging batteries from a charging set comprising a non-electric prime mover rotating at constant speed
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J7/00—Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
- H02J7/34—Parallel operation in networks using both storage and other dc sources, e.g. providing buffering
- H02J7/35—Parallel operation in networks using both storage and other dc sources, e.g. providing buffering with light sensitive cells
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02N—ELECTRIC MACHINES NOT OTHERWISE PROVIDED FOR
- H02N1/00—Electrostatic generators or motors using a solid moving electrostatic charge carrier
- H02N1/04—Friction generators
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02N—ELECTRIC MACHINES NOT OTHERWISE PROVIDED FOR
- H02N1/00—Electrostatic generators or motors using a solid moving electrostatic charge carrier
- H02N1/06—Influence generators
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B63—SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
- B63H—MARINE PROPULSION OR STEERING
- B63H21/00—Use of propulsion power plant or units on vessels
- B63H21/12—Use of propulsion power plant or units on vessels the vessels being motor-driven
- B63H21/17—Use of propulsion power plant or units on vessels the vessels being motor-driven by electric motor
- B63H2021/171—Use of propulsion power plant or units on vessels the vessels being motor-driven by electric motor making use of photovoltaic energy conversion, e.g. using solar panels
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B63—SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
- B63H—MARINE PROPULSION OR STEERING
- B63H21/00—Use of propulsion power plant or units on vessels
- B63H21/21—Control means for engine or transmission, specially adapted for use on marine vessels
- B63H2021/216—Control means for engine or transmission, specially adapted for use on marine vessels using electric control means
Abstract
The invention relates to the technical field of energy management, and in particular to an energy control strategy for the multi-source energy harvesting and storage system of a bionic ray submersible. The strategy comprises the following steps: acquiring the relevant parameters at each moment while the bionic ray submersible sails underwater; establishing a load prediction model and predicting the total load power of the submersible at the next moment; constructing an action policy network model of the submersible for each operating mode; obtaining a target action policy network model; and predicting the action to be taken by the submersible at the next moment in the current operating mode and controlling the submersible accordingly. The invention enables the energy system of the bionic ray submersible to meet multi-objective task requirements under complex operating modes. Because the energy control strategy is determined autonomously by the algorithm without human intervention, design cost and error probability are greatly reduced compared with a traditional logic strategy, and the precision of energy control is improved.
Description
Technical Field
The invention relates to the technical field of energy management, and in particular to an energy control strategy for the multi-source energy harvesting and storage system of a bionic ray submersible.
Background
The bionic ray submersible is an innovative technology with broad prospects and great potential. It not only provides new ideas and tools but also makes an important contribution to safeguarding maritime rights. However, a single type of lithium battery can hardly support long-term covert operation of the submersible in the deep sea. To improve endurance, several energy storage and energy supply devices have therefore been designed and mounted according to the performance requirements and working environment of the bionic submersible. Specifically, a solar energy harvesting system on the back captures solar energy; an ocean current friction power generation device on the abdomen captures ocean current energy; and bendable flexible lithium batteries are carried on the flapping-wing mechanisms on both sides. Combining flexible and conventional lithium batteries realizes distributed storage of the energy harvested from multiple sources. By capturing renewable energy from multiple sources in the deep sea, the multi-source energy harvesting and storage system greatly extends the exploration radius of unmanned underwater equipment.
However, in the face of a complex and changeable deep-sea environment and multi-mode task demands, traditional experience-based control strategies adapt poorly: they cannot coordinate the multi-source energy harvesting and storage subsystems, and they cannot produce an optimal control strategy for the different operating conditions of different environments, so the accuracy of the control strategy is low.
Therefore, it is necessary to provide an energy control strategy for the multi-source energy harvesting and storage system of a bionic ray submersible to solve the above problems.
Disclosure of Invention
The invention provides an energy control strategy for the multi-source energy harvesting and storage system of a bionic ray submersible. It aims to solve the problems that existing experience-based control strategies adapt poorly, cannot coordinate the multi-source energy harvesting and storage subsystems, and cannot produce an optimal control strategy for the different operating conditions of different environments, so that the accuracy of the control strategy is low.
The energy control strategy for the multi-source energy harvesting and storage system of a bionic ray submersible adopts the following technical scheme:
acquiring the relevant parameters at each moment while the bionic ray submersible sails underwater, wherein the relevant parameters comprise: the light intensity, the sailing speed, and the output power of the solar power generation module, the ocean current energy power generation module and the battery pack module of the submersible;
establishing a load prediction model, taking the relevant parameters at the current moment as the input of the load prediction model, and predicting the total load power of the bionic ray submersible at the next moment;
constructing the loss function for the high-speed maneuvering mode based on the total load power that the load prediction model predicts for the next moment under the high-speed maneuvering mode, and constructing the action policy network model of the submersible for the high-speed maneuvering mode based on that loss function;
constructing the loss functions for the long-term self-sustaining mode and the benthic residence mode based on the sailing distance of the submersible under each of these modes, and constructing the corresponding action policy network models of the submersible for the long-term self-sustaining mode and the benthic residence mode based on those loss functions;
optimizing the action policy network model for each operating mode with a reinforcement learning algorithm to obtain the target action policy network models;
and inputting the state-of-charge value of the battery pack module, the light intensity, the sailing speed and the action taken by the submersible in the current operating mode into the target action policy network model corresponding to the current operating mode, predicting the action to be taken by the submersible at the next moment in the current operating mode, and controlling the submersible according to the predicted action.
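As an illustration of the final step, the following minimal Python sketch shows how a controller might select the target policy for the current operating mode and query it for the next action. The names `Mode`, `PolicyInput`, `control_step` and `placeholder_policy`, the three-element string encoding of an action, and the 0.2 state-of-charge threshold are assumptions made for this example and do not come from the patent; a trained target action policy network would take the place of the placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple

# Hypothetical action encoding: (solar module, ocean current module, battery pack).
Action = Tuple[str, str, str]


class Mode(Enum):
    LONG_TERM_SELF_SUSTAINING = 1
    HIGH_SPEED_MANEUVER = 2
    BENTHIC_RESIDENCE = 3


@dataclass(frozen=True)
class PolicyInput:
    soc: float               # state of charge of the battery pack module (0..1)
    light_intensity: float   # units are an assumption of this sketch
    speed: float             # sailing speed, units are an assumption
    current_action: Action   # action taken at the current moment


Policy = Callable[[PolicyInput], Action]


def control_step(mode: Mode, obs: PolicyInput, policies: Dict[Mode, Policy]) -> Action:
    """Query the policy that belongs to the current operating mode."""
    return policies[mode](obs)


def placeholder_policy(obs: PolicyInput) -> Action:
    """Stand-in for a trained network: recharge when the battery is low."""
    solar, current, _ = obs.current_action
    battery = "charge" if obs.soc < 0.2 else "discharge"
    return (solar, current, battery)


policies = {mode: placeholder_policy for mode in Mode}
obs = PolicyInput(soc=0.1, light_intensity=300.0, speed=1.5,
                  current_action=("generate", "generate", "discharge"))
next_action = control_step(Mode.LONG_TERM_SELF_SUSTAINING, obs, policies)
```

Keeping one policy per mode in a dictionary mirrors the text's separation of a target model for each operating mode; how the current mode is detected is outside the scope of this sketch.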
Preferably, the expression of the loss function under the long-term self-sustaining mode is as follows:
[formula not reproduced in the source text]
where the symbols denote, in order:
the loss function value under the long-term self-sustaining mode;
the farthest sailing distance of the bionic ray submersible under the long-term self-sustaining mode.
Preferably, the expression of the loss function under the high-speed maneuvering mode is as follows:
[formula not reproduced in the source text]
where the symbols denote, in order:
the loss function value under the high-speed maneuvering mode;
the total load power of the bionic ray submersible at the next moment, as predicted by the load prediction model;
the total power output by the bionic ray submersible at the next moment under the high-speed maneuvering mode;
the output power of the solar power generation module;
the output power of the ocean current energy power generation module;
the output power of the battery pack module.
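The quantities listed for the high-speed maneuvering mode suggest a loss that measures how far the supplied power is from the predicted demand. The Python sketch below is one possible reading and not the patent's formula: it assumes that the total output power is the sum of the three module powers, that battery power is positive when discharging, and that the mismatch is penalized with a squared error. The function names are hypothetical.

```python
def total_output_power(p_solar: float, p_current: float, p_battery: float) -> float:
    """Total supplied power, assumed here to be the sum of the three modules."""
    return p_solar + p_current + p_battery


def power_mismatch_loss(p_load_pred: float, p_solar: float,
                        p_current: float, p_battery: float) -> float:
    """Squared gap between predicted load and supplied power (assumed form)."""
    p_out = total_output_power(p_solar, p_current, p_battery)
    return (p_load_pred - p_out) ** 2


# Demand of 100 W met exactly by 20 W solar + 30 W current + 50 W battery.
balanced = power_mismatch_loss(100.0, 20.0, 30.0, 50.0)
# Same demand with only 40 W from the battery leaves a 10 W shortfall.
shortfall = power_mismatch_loss(100.0, 20.0, 30.0, 40.0)
```

Under these assumptions the loss is zero when supply matches the predicted load and grows with the square of any surplus or shortfall.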
Preferably, the expression of the loss function under the benthic residence mode is as follows:
[formula not reproduced in the source text]
where the symbols denote, in order:
the loss function value under the benthic residence mode;
the farthest sailing distance of the bionic ray submersible under the benthic residence mode.
Preferably, the step of constructing the action policy network model of the bionic ray submersible for each operating mode comprises:
constructing an initial action policy network model: taking the loss function of each operating mode as the loss function of the network model to obtain the initial action policy network model for that operating mode;
training the initial action policy network model: for each operating mode, the state-of-charge value of the battery pack module, the light intensity, the sailing speed and the action taken by the submersible at the current moment are used as the input of the corresponding initial action policy network model, the action taken by the submersible in that operating mode is used as its output, and the initial model is trained to obtain the trained action policy network model for each operating mode;
and taking the trained action policy network model as the action policy network model for the corresponding operating mode.
Preferably, when the action policy network model under each operating mode is optimized, the expression of the reward function of the reinforcement learning algorithm is:
[formula not reproduced in the source text]
where the symbols denote, in order:
the reward function of the reinforcement learning algorithm when the trained action policy network model under a given operating mode is optimized;
the discount factor;
the reward value obtained after the bionic ray submersible, in the system state at the k-th moment under the given operating mode, takes an action and interacts with the environment;
k, the index of the k-th moment.
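The terms named above (a discount factor, a per-moment reward value and a time index k) match the usual discounted return in reinforcement learning. Assuming that standard form, which the text itself does not show, the accumulated reward for one operating mode can be sketched in Python as follows; `discounted_return` is a hypothetical helper name.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * r_k over k = 0, 1, 2, ... (standard discounted return)."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


# Three steps with reward 1.0 each and a discount factor of 0.5:
# 1.0 + 0.5 + 0.25 = 1.75
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A discount factor below 1 makes rewards that arrive later count less, which is the usual reason for including it.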
Preferably, in the system state under each operating mode, the reward value obtained after the bionic ray submersible takes an action and interacts with the environment is determined as follows:
when the operating mode of the bionic ray submersible is the long-term self-sustaining mode, the expression of the reward value after the action and the environment interaction is:
[formula not reproduced in the source text]
when the operating mode of the bionic ray submersible is the high-speed maneuvering mode, the expression of the reward value after the action and the environment interaction is:
[formula not reproduced in the source text]
when the operating mode of the bionic ray submersible is the benthic residence mode, the expression of the reward value after the action and the environment interaction is:
[formula not reproduced in the source text]
where the symbols denote, in order:
the reward value obtained after the submersible, in the system state at the given moment under the long-term self-sustaining mode, takes an action and interacts with the environment;
the reward value obtained after the submersible, in the system state at the given moment under the high-speed maneuvering mode, takes an action and interacts with the environment;
the reward value obtained after the submersible, in the system state at the given moment under the benthic residence mode, takes an action and interacts with the environment;
the trend of the sailing distance of the submersible under the long-term self-sustaining mode;
the trend of the loss function of the action policy network model of the submersible under the high-speed maneuvering mode;
the trend of the sailing distance of the submersible under the benthic residence mode;
the reward term of the submersible under the long-term self-sustaining mode;
the penalty term of the submersible under the long-term self-sustaining mode;
the reward term of the submersible under the benthic residence mode;
the penalty term of the submersible under the benthic residence mode;
the state-of-charge value of the battery pack module.
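The reward expressions are not shown, so the Python sketch below only illustrates how the listed ingredients could fit together; every formula in it is an assumption. It treats a "trend" as the sign of a change, rewards an increasing sailing distance in the long-term self-sustaining mode, subtracts a penalty when the state of charge leaves a hypothetical safe window (0.2 to 0.9), and rewards a decreasing loss in the high-speed maneuvering mode, for which the text lists only a trend term.

```python
def trend(previous: float, current: float) -> int:
    """+1 if the value increased, -1 if it decreased, 0 if unchanged."""
    return (current > previous) - (current < previous)


def long_term_reward(prev_distance: float, distance: float, soc: float,
                     bonus: float = 1.0, penalty: float = 1.0,
                     soc_min: float = 0.2, soc_max: float = 0.9) -> float:
    """Reward a growing sailing distance; penalize an out-of-window SOC.

    bonus, penalty, soc_min and soc_max are hypothetical values.
    """
    reward = bonus * trend(prev_distance, distance)
    if not (soc_min <= soc <= soc_max):
        reward -= penalty
    return reward


def high_speed_reward(prev_loss: float, loss: float) -> float:
    """Reward a falling policy loss: +1 when the loss decreased."""
    return float(trend(loss, prev_loss))


r_ok = long_term_reward(10.0, 12.0, soc=0.5)       # distance up, SOC in window
r_low_soc = long_term_reward(10.0, 12.0, soc=0.05)  # distance up, SOC too low
r_fast = high_speed_reward(prev_loss=4.0, loss=1.0)
```

A benthic residence reward could follow the same pattern as `long_term_reward`, since the text lists the same kinds of terms (trend, reward term, penalty term) for that mode.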
Preferably, the expression of the action taken by the bionic ray submersible under the long-term self-sustaining mode is as follows:
[formula not reproduced in the source text]
where the symbols denote, in order:
the action taken by the submersible under the long-term self-sustaining mode;
the action value when the solar power generation module is generating power under the long-term self-sustaining mode;
the action value when the solar power generation module is switched off under the long-term self-sustaining mode;
the action value when the ocean current energy power generation module is generating power under the long-term self-sustaining mode;
the action value when the ocean current energy power generation module is switched off under the long-term self-sustaining mode;
the action value when the battery pack module is charging under the long-term self-sustaining mode;
the action value when the battery pack module is discharging under the long-term self-sustaining mode.
Preferably, the expression of the action taken by the bionic ray submersible under the high-speed maneuvering mode is as follows:
[formula not reproduced in the source text]
where the symbols denote, in order:
the action taken by the submersible under the high-speed maneuvering mode;
the action value when the solar power generation module is generating power under the high-speed maneuvering mode;
the action value when the solar power generation module is switched off under the high-speed maneuvering mode;
the action value when the ocean current energy power generation module is generating power under the high-speed maneuvering mode;
the action value when the ocean current energy power generation module is switched off under the high-speed maneuvering mode;
the action value when the battery pack module is charging under the high-speed maneuvering mode;
the action value when the battery pack module is discharging under the high-speed maneuvering mode.
Preferably, the expression of the action taken by the bionic ray submersible under the benthic residence mode is as follows:
[formula not reproduced in the source text]
where the symbols denote, in order:
the action taken by the submersible under the benthic residence mode;
the action value when the solar power generation module is switched off under the benthic residence mode;
the action value when the ocean current energy power generation module is generating power under the benthic residence mode;
the action value when the ocean current energy power generation module is switched off under the benthic residence mode;
the action value when the battery pack module is charging under the benthic residence mode;
the action value when the battery pack module is discharging under the benthic residence mode.
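The three action definitions differ in one respect: the benthic residence mode lists no action value for solar power generation, only for the solar module being switched off. The Python sketch below makes that difference concrete by enumerating the action combinations per mode. Treating an action as one choice per module, and the string labels and dictionary keys used here, are assumptions of this example rather than the patent's encoding.

```python
from itertools import product
from typing import List, Tuple

SOLAR = ("generate", "off")
CURRENT = ("generate", "off")
BATTERY = ("charge", "discharge")

# Per-mode options for (solar, ocean current, battery); labels are hypothetical.
MODE_OPTIONS = {
    "long_term_self_sustaining": (SOLAR, CURRENT, BATTERY),
    "high_speed_maneuver": (SOLAR, CURRENT, BATTERY),
    # On the seabed only the "off" state is listed for the solar module.
    "benthic_residence": (("off",), CURRENT, BATTERY),
}


def action_space(mode: str) -> List[Tuple[str, str, str]]:
    """Enumerate every combination of per-module choices for a mode."""
    return list(product(*MODE_OPTIONS[mode]))


long_term_actions = action_space("long_term_self_sustaining")
benthic_actions = action_space("benthic_residence")
```

With this encoding the long-term self-sustaining and high-speed maneuvering modes each have eight candidate actions, while the benthic residence mode has four.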
The beneficial effects of the invention are as follows:
Based on the relevant parameters at the current moment while the bionic ray submersible sails underwater, the load prediction model predicts the total load power at the next moment so that the energy control strategy can be adjusted in advance, which improves the dynamic response capability of the system. An action policy network model for the high-speed maneuvering mode is then built from the total load power predicted for the next moment under that mode, and action policy network models for the long-term self-sustaining mode and the benthic residence mode are built from the sailing distances corresponding to those modes. Finally, the action policy network models are optimized with a reinforcement learning algorithm so that the optimized target action policy network models output the optimal control strategy. The invention enables the energy system of the bionic ray submersible to meet multi-objective task requirements under complex operating modes. Because the energy control strategy is determined autonomously by the algorithm without human intervention, design cost and error probability are greatly reduced compared with a traditional logic strategy, and the precision of energy control is improved.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the energy control strategy for the multi-source energy-harvesting and storage system of the manta-ray-like submersible of the present invention;
FIG. 2 is a schematic diagram of the second-order RC equivalent circuit model in the present embodiment;
FIG. 3 is a schematic structural diagram of the ocean current energy power generation module in the present embodiment;
FIG. 4 is a schematic diagram of the network structure of the load prediction model in the present embodiment;
FIG. 5 is a schematic diagram of the network structure of the action strategy network model in the present embodiment;
FIG. 6 is a flow chart of the reinforcement learning algorithm in the present embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
An embodiment of the energy control strategy for the multi-source energy-harvesting and storage system of the manta-ray-like submersible of the invention, as shown in FIG. 1, comprises:
S1, acquiring the relevant parameters at each moment while the manta-ray-like submersible navigates underwater;
Specifically, the relevant parameters include the output powers of the solar power generation module, the ocean current energy power generation module and the battery pack module of the manta-ray-like submersible.
In this embodiment, a virtual mathematical model of the multi-source energy-harvesting and storage system of the manta-ray-like submersible is established first. A virtual mathematical model of the physical entity is created digitally, the behavior of the physical entity in the real environment is simulated from data, and the relevant parameters at each moment of underwater navigation are fed back through virtual-real interaction between the physical entity and the virtual mathematical model. The operating environment of the multi-source energy-harvesting and storage system is then constructed: according to the task requirements of the submersible, its operation is divided into a long-term self-sustaining mode working condition, a high-speed maneuvering mode working condition and a benthonic residence mode working condition, and the virtual mathematical model is run under each of the three modal working conditions to obtain the relevant parameters at each moment of underwater navigation.
The virtual mathematical model of the multi-source energy-harvesting and storage system comprises the solar power generation module, the ocean current energy power generation module and the battery pack module. The battery pack module is composed of lithium-ion batteries, so in this embodiment it is represented by a second-order RC equivalent circuit model. Specifically, as shown in FIG. 2, the second-order RC equivalent circuit model connects two parallel RC networks in series with the internal-resistance model, so that the resistive and capacitive elements express the internal electrochemical polarization characteristics of the lithium-ion battery pack during operation. The expressions for the circuit parameters are obtained from Thevenin's theorem:
where $U_t$ is the battery terminal voltage; $U_1$ and $U_2$ are the terminal voltages of the two parallel RC networks; $U_{oc}$ is the battery open-circuit voltage; $I$ is the circuit current; $R_0$ is the ohmic internal resistance, contributed by internal battery components such as the electrode materials, the electrolyte and the separator; $R_1$ and $R_2$ are the electrochemical polarization resistances; and $C_1$ and $C_2$ are the electrochemical polarization capacitances caused by the polarization reaction of the cell. The second-order RC equivalent circuit model has a simple structure, represents the electrochemical reaction process inside the battery with high accuracy, and is one of the most widely used battery models.
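For reference, the second-order RC (Thevenin-type) model is commonly written in the form below, which uses only the quantities named in the definitions: terminal voltage $U_t$, open-circuit voltage $U_{oc}$, circuit current $I$, ohmic resistance $R_0$, and the polarization voltages, resistances and capacitances $U_{1,2}$, $R_{1,2}$, $C_{1,2}$. The symbol names and the sign convention (discharge current taken as positive) are assumptions; this is a sketch of the standard model rather than a reproduction of the original expression.

```latex
\begin{aligned}
U_t &= U_{oc} - I R_0 - U_1 - U_2, \\
\frac{\mathrm{d}U_1}{\mathrm{d}t} &= -\frac{U_1}{R_1 C_1} + \frac{I}{C_1}, \\
\frac{\mathrm{d}U_2}{\mathrm{d}t} &= -\frac{U_2}{R_2 C_2} + \frac{I}{C_2}.
\end{aligned}
```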
The SOC of the lithium-ion battery is then estimated by the ampere-hour integration method. The state of charge of the lithium-ion battery at time $t$, $SOC(t)$, is calculated as follows:
where $SOC_0$ is the initial state of charge of the battery; $Q_N$ is the rated capacity of the battery; $I$ is the current; and $\eta$ is the coulombic efficiency, i.e. the battery charge-discharge efficiency, for which a fixed value is generally taken. The ampere-hour integration method is the most commonly used method for estimating battery SOC: integrating the current yields the relative change in charge, and the method is simple to apply and accurate. In practical applications, current drift in the Hall sensor used to measure the current increases the integration error. Here, however, the composite energy system runs in a simulation environment, so there is no current drift, and the initial state of charge of the lithium-ion battery is also specified. The ampere-hour integration method can therefore track the state of charge accurately during operation, and it is used to obtain the state-of-charge value of the battery pack module.
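The ampere-hour integration described above can be sketched in discrete form as follows. This is a minimal illustration, assuming the usual form SOC(t) = SOC_0 - (1/Q_N) * integral of eta * I dt with discharge current taken as positive; the function name, the default `eta=1.0` and the clamping to [0, 1] are assumptions, not values from the source.

```python
def soc_ampere_hour(soc0, currents_a, dt_s, capacity_ah, eta=1.0):
    """Return the SOC trajectory obtained by discrete ampere-hour integration.

    Assumed convention: positive current means discharge, so SOC decreases.
    eta is the coulombic efficiency (hypothetical default of 1.0).
    """
    capacity_as = capacity_ah * 3600.0  # rated capacity in ampere-seconds
    soc = soc0
    trajectory = [soc]
    for current in currents_a:
        soc -= eta * current * dt_s / capacity_as  # charge removed in this step
        soc = min(1.0, max(0.0, soc))              # keep SOC physically bounded
        trajectory.append(soc)
    return trajectory


# Example: a 10 Ah pack discharged at 5 A for half an hour, sampled every second.
trace = soc_ampere_hour(1.0, [5.0] * 1800, 1.0, 10.0)
```

Because the simulated current has no sensor drift and the initial SOC is known, the accumulated error of such an integration stays at the level of floating-point rounding.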
The basic principle of the triboelectric nanogenerator is that, through the coupling of triboelectrification and electrostatic induction, the surface charge of a dielectric material under a periodic external force generates an alternating electric field that drives electrons through the external circuit, thereby outputting electric energy. The structure of the ocean current energy power generation module is shown in FIG. 3. According to Maxwell's equations, the expression for triboelectric power generation from ocean current energy by the ocean current energy power generation module is as follows:
where $S$ is the contact area; $d_0$ is the effective dielectric material thickness; $x(t)$ is the distance moved by the two friction materials over time; $\varepsilon_0$ is the dielectric constant; $\sigma$ is the charge density; and $Q$ is the amount of charge generated.
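For a contact-separation triboelectric nanogenerator, the relation between output voltage, transferred charge and separation distance is commonly written as below, using the quantities listed above: contact area $S$, effective dielectric thickness $d_0$, separation distance $x(t)$, dielectric constant $\varepsilon_0$, charge density $\sigma$ and charge $Q$. The output-voltage symbol $V$ and the exact form are assumptions based on the standard model, not a reproduction of the original expression.

```latex
V = -\frac{Q}{S\,\varepsilon_0}\,\bigl(d_0 + x(t)\bigr) + \frac{\sigma\,x(t)}{\varepsilon_0}
```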
The photovoltaic (solar) power generation module comprises a solar panel. Each photovoltaic array on the solar panel comprises several strings of series-connected power generation modules, and the strings are connected in parallel. Each power generation module comprises a photo-generated current source, a diode, a series resistor and a parallel resistor: the photo-generated current source is connected in series with the series resistor and the diode, and the parallel resistor is connected in parallel between the input end of the diode and the series resistor. The photovoltaic array is a five-parameter model that uses the photo-generated current source, the diode, the series resistor and the parallel resistor to represent the irradiance- and temperature-dependent I-V characteristics of the module. The diode I-V characteristic of a single module is:
where $V_d$ is the diode voltage; $I_d$ is the diode current; $I_0$ is the diode saturation current; nI is the diode ideality factor, taken as nI = 0.9; k is the Boltzmann constant; q is the electron charge, 1.6022 × 10^-19 C; Ncell is the number of cells connected in series in the module; exp is the exponential function with the natural constant e as its base; $V_T$ is the thermal voltage; and $T$ is the temperature.
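The diode characteristic of a five-parameter photovoltaic module is usually written as I_d = I_0 * (exp(V_d / V_T) - 1) with V_T = (k * T / q) * nI * Ncell. The sketch below evaluates that form; the function names, the example saturation current and the default `n_cell=60` are hypothetical, and the exact form of the original expression is assumed rather than confirmed.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K
Q_ELECTRON = 1.6022e-19     # electron charge in C, as quoted in the text


def thermal_voltage(temp_k, n_ideality=0.9, n_cell=60):
    """Thermal voltage V_T scaled by the ideality factor and cells in series."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON * n_ideality * n_cell


def diode_current(v_d, i_sat, temp_k, n_ideality=0.9, n_cell=60):
    """Diode current I_d for diode voltage v_d (assumed Shockley form)."""
    v_t = thermal_voltage(temp_k, n_ideality, n_cell)
    return i_sat * (math.exp(v_d / v_t) - 1.0)


# Example: diode current at 25 degrees Celsius for a 30 V diode voltage.
i_d = diode_current(30.0, 1e-10, 298.15)
```

The ideality factor of 0.9 follows the text; the temperature enters only through the thermal voltage, which is why the I-V curve shifts with temperature.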
The model of the manta-ray-like submersible is as follows:
where the terms represent, in order: the input power required by the drive motor of the submersible; the drive motor torque; the drive motor efficiency; the drive motor rotation speed; the thrust generated by the submersible; $L$, the structural length of the flapping wing of the submersible; and the sailing resistance.
The collected current, voltage, navigational speed and illumination intensity are passed to the mathematical model of the multi-source energy-harvesting and storage system of the manta-ray-like submersible. The output powers of the solar power generation module, the ocean current energy power generation module and the battery pack module are calculated from the model, while the light intensity and the navigational speed are acquired directly.
S2, establishing a load prediction model and predicting the total load power of the manta-ray-like submersible at the next moment;
A load prediction model is established; the relevant parameters at the current moment are taken as its input, and the total load power of the manta-ray-like submersible at the next moment is predicted. The load prediction model is established as follows: an initial load prediction model is constructed and then trained. In the training process, relevant parameters are collected from historical data and used as the input of the initial load prediction model, and the total load power of the submersible at the moment following each historical sample is used as the output. The initial load prediction model is trained until the loss function converges, which yields the load prediction model. The total load at the initial moment is 0.
In this embodiment, the load prediction model predicts the total load of the manta-ray-like submersible with a back-propagation neural network (BPNN). The network is a 5-input, single-output model. The input variables are the light intensity, the navigational speed, and the output powers of the solar power generation module (P1), the ocean current energy power generation module (P2) and the battery pack module (P3); from these inputs the network predicts the total load power of the submersible at the next moment. Each of the two hidden layers has 10 neurons, so 10 data features are extracted from the five input variables by the fully connected operation, and the output result is finally obtained through a fully connected layer. The structure of the load prediction model is shown in FIG. 4.
Specifically, the load prediction model performs prediction in the following steps:
Step 21, data preprocessing: the five input variables are converted into a tensor data type that the neural network can process; 70% of the data are used as training data for the load prediction model and 30% as test data. To prevent the weights of the load prediction model from being biased toward features with large values, the samples are normalized so that the data are symmetric about the origin.
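The preprocessing in step 21 can be sketched as below. This is an illustration under assumptions: the source does not name the normalization, so zero-mean, unit-variance standardization is used here as one way of making the data symmetric about the origin, and the split is taken in order without shuffling. All function names are hypothetical.

```python
def split_70_30(rows):
    """Split samples into 70% training and 30% test data, keeping their order."""
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]


def fit_standardizer(train_rows):
    """Compute the per-feature mean and standard deviation on the training data."""
    n = len(train_rows)
    dims = len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in train_rows) / n
        stds.append(var ** 0.5 or 1.0)  # constant feature: avoid dividing by zero
    return means, stds


def standardize(rows, means, stds):
    """Center every feature on zero and scale it by its standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Each row: light intensity, navigational speed, P1, P2, P3 (made-up numbers).
samples = [[float(i), 2.0 * i, 1.0, 0.0, 5.0] for i in range(10)]
train_rows, test_rows = split_70_30(samples)
means, stds = fit_standardizer(train_rows)
train_z = standardize(train_rows, means, stds)
test_z = standardize(test_rows, means, stds)
```

Fitting the statistics on the training portion only and reusing them for the test portion keeps the test data from influencing the scaling.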
Step 22, forward propagation of the neural network: the weight matrices (W1, W2, W3) and bias vectors (b1, b2, b3) of the load prediction model are randomly initialized. A fully connected operation between the input values and the weight W1 yields 10 feature values; after non-linearization by the ReLU function, a fully connected operation with the weight W2 extracts further features, followed by another ReLU non-linearization; finally, the output value is obtained through W3.
The calculation process from the input layer to hidden layer 1 is as follows:
$$h_1 = f(W_1 x + b_1)$$
where $x$ is the $5 \times 1$ input vector; $W_1$ is the $10 \times 5$ weight matrix; $b_1$ is the $10 \times 1$ bias parameter vector; and $f$ is the ReLU activation function.
The calculation process from hidden layer 1 to hidden layer 2 is as follows:
$$h_2 = f(W_2 h_1 + b_2)$$
where $h_1$ is the $10 \times 1$ input vector; $W_2$ is the $10 \times 10$ weight matrix; $b_2$ is the $10 \times 1$ bias parameter vector; and $f$ is the ReLU activation function.
The calculation process from hidden layer 2 to the output layer is as follows:
$$\hat{y} = W_3 h_2 + b_3$$
where $W_3$ is the $1 \times 10$ weight matrix; $b_3$ is the bias vector; and $h_2$ represents the feature variables of hidden layer 2.
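The forward pass of step 22 (5 inputs, two hidden layers of 10 ReLU neurons, one linear output) can be sketched in plain Python as follows. The helper names, the uniform initialization range and the fixed random seed are assumptions made for illustration; the source only states that weights and biases are randomly initialized.

```python
import random


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, x):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Randomly initialize an n_out x n_in weight matrix and a bias vector."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, bias


def forward(params, x):
    """Predict the next-moment total load power from the five input variables."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(dense(w1, b1, x))   # input layer -> hidden layer 1
    h2 = relu(dense(w2, b2, h1))  # hidden layer 1 -> hidden layer 2
    return dense(w3, b3, h2)[0]   # hidden layer 2 -> single output


rng = random.Random(0)
params = [init_layer(10, 5, rng), init_layer(10, 10, rng), init_layer(1, 10, rng)]
prediction = forward(params, [0.1, 0.2, 0.3, 0.4, 0.5])
```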
Step 23, back propagation of the neural network: the model loss is calculated by calling the loss function on the output value of the load prediction model and the label; the weight matrices are then updated with the Adam optimizer according to the loss value.
The loss function SSE of the load prediction model is calculated as follows:
where $e$ denotes the absolute error; $y$ denotes the actual total load power of the manta-ray-like submersible at the next moment; and $\hat{y}$ denotes the total load power of the manta-ray-like submersible at the next moment predicted by the load prediction model.
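Reading SSE as the sum of squared errors between the actual and predicted total load power, the loss can be sketched as below. The function name and the sample values are illustrative assumptions.

```python
def sse_loss(actual, predicted):
    """Sum of squared errors between actual and predicted total load power."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Example with three samples (values in arbitrary power units).
loss = sse_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

Training stops once this value no longer decreases, which matches the convergence criterion stated for the load prediction model.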
S3, constructing the action strategy network model of the manta-ray-like submersible under each modal working condition;
Specifically, a loss function for the high-speed maneuvering mode working condition is constructed from the total load power at the next moment predicted by the load prediction model, and the action strategy network model of the submersible under the high-speed maneuvering mode working condition is constructed from that loss function. Loss functions for the long-term self-sustaining mode and benthonic residence mode working conditions are constructed from the sailing distances of the submersible under those working conditions, and the corresponding action strategy network models are constructed from those loss functions. The inputs, outputs and network structure of the action strategy network model are shown in FIG. 5.
Step 31, constructing the action strategy network model of the manta-ray-like submersible under each modal working condition;
Step 311, constructing the action strategy network model of the submersible under the long-term self-sustaining mode working condition:
Step 3111, constructing the loss function of the submersible under the long-term self-sustaining mode working condition: the long-term self-sustaining mode working condition applies when the submersible performs long-distance, long-duration cruising tasks. The energy system must achieve energy self-sufficiency by capturing solar energy and ocean current energy; the main aim is to improve the endurance of the submersible, and the requirement on its maneuverability is low. The long-term self-sustaining mode working condition therefore focuses on the state-of-charge value of the battery pack module, and the expression of the corresponding loss function is as follows:
In the formula, the first term represents the loss function value under the long-term self-sustaining mode working condition;
the second term represents the furthest sailing distance of the submersible under the long-term self-sustaining mode working condition.
Step 3112, constructing, from the loss function for the long-term self-sustaining mode working condition, the initial action strategy network model of the submersible under that working condition.
Step 3113, training the initial action strategy network model under the long-term self-sustaining mode working condition to obtain the action strategy network model of the submersible under that working condition. Specifically, in this embodiment, the state-of-charge value of the battery pack module, the light intensity, the navigational speed and the action taken by the submersible at the current moment under the long-term self-sustaining mode working condition are used as the input of the initial action strategy network model, and the action taken by the submersible at the next moment is used as its output; the initial action strategy network model is trained to obtain the trained action strategy network model under the long-term self-sustaining mode working condition.
The expression of the action taken by the submersible under the long-term self-sustaining mode working condition in this embodiment is as follows:
In the formula, the terms represent, in order: the action taken by the submersible under the long-term self-sustaining mode working condition; the action value when the solar power generation module generates power; the action value when the solar power generation module is switched off; the action value when the ocean current energy power generation module generates power; the action value when the ocean current energy power generation module is switched off; the action value when the battery pack module is charging; and the action value when the battery pack module is discharging, all under the long-term self-sustaining mode working condition.
Step 312, constructing the action strategy network model of the manta-ray-like submersible under the high-speed maneuvering mode working condition:
Step 3121, constructing the loss function of the submersible under the high-speed maneuvering mode working condition:
Under the high-speed maneuvering mode working condition, the submersible is used to pursue fast-moving targets and to strike rapidly. The submersible therefore needs a fast dynamic response, and the energy system must supply all the power required by the load (that is, the power predicted by the load prediction model) within a short time, while the requirement on endurance is low. The expression of the corresponding loss function under the high-speed maneuvering mode working condition in this embodiment is therefore as follows:
In the formula, the first term represents the loss function value under the high-speed maneuvering mode working condition;
the second term represents the total load power of the submersible at the next moment predicted by the load prediction model;
the third term represents the total power output by the submersible at the next moment under the high-speed maneuvering mode working condition;
P1 represents the output power of the solar power generation module;
P2 represents the output power of the ocean current energy power generation module;
P3 represents the output power of the battery pack module.
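The idea behind the high-speed maneuvering loss can be sketched as the mismatch between the predicted load power and the total power supplied by the three modules. The absolute-difference form and the function names are assumptions made for illustration; the source states only which quantities the loss involves.

```python
def total_output_power(p_solar, p_current, p_battery):
    """Total power delivered by the solar, ocean current and battery modules."""
    return p_solar + p_current + p_battery


def high_speed_loss(p_load_pred, p_solar, p_current, p_battery):
    """Assumed loss: absolute gap between predicted load and supplied power."""
    return abs(p_load_pred - total_output_power(p_solar, p_current, p_battery))


# Example: the load predictor asks for 500 W and the modules supply 400 W.
gap = high_speed_loss(500.0, 100.0, 0.0, 300.0)
```

A loss of zero means the energy system covers the whole predicted load, which is the stated goal of this working condition.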
Step 3122, constructing, from the loss function for the high-speed maneuvering mode working condition, the initial action strategy network model of the manta-ray-like submersible under that working condition.
Step 3123, training the initial action strategy network model under the high-speed maneuvering mode working condition to obtain the action strategy network model of the submersible under that working condition. Specifically, in this embodiment, the state-of-charge value of the battery pack module, the light intensity, the navigational speed and the action taken by the submersible at the current moment under the high-speed maneuvering mode working condition are used as the input of the initial action strategy network model, and the action taken by the submersible at the next moment is used as its output; the initial action strategy network model is trained to obtain the trained action strategy network model under the high-speed maneuvering mode working condition.
The expression of the action taken by the submersible under the high-speed maneuvering mode working condition is as follows:
In the formula, the terms represent, in order: the action taken by the submersible under the high-speed maneuvering mode working condition; the action value when the solar power generation module generates power; the action value when the solar power generation module is switched off; the action value when the ocean current energy power generation module generates power; the action value when the ocean current energy power generation module is switched off; the action value when the battery pack module is charging; and the action value when the battery pack module is discharging, all under the high-speed maneuvering mode working condition.
Step 313, constructing the action strategy network model of the manta-ray-like submersible under the benthonic residence mode working condition:
Step 3131, constructing the loss function of the submersible under the benthonic residence mode working condition: the benthonic residence mode working condition is used for long-duration lurking tasks. Under this working condition only ocean current energy is captured for charging, and the requirement on the maneuverability of the submersible is low, so the expression of the corresponding loss function in this embodiment is as follows:
In the formula, the first term represents the loss function value under the benthonic residence mode working condition; the second term represents the furthest sailing distance of the submersible under the benthonic residence mode working condition.
Step 3132, constructing, from the loss function for the benthonic residence mode working condition, the initial action strategy network model of the submersible under that working condition.
Step 3133, training the initial action strategy network model under the benthonic residence mode working condition to obtain the action strategy network model of the submersible under that working condition. Specifically, in this embodiment, the state-of-charge value of the battery pack module, the light intensity, the navigational speed and the action taken by the submersible at the current moment under the benthonic residence mode working condition are used as the input of the initial action strategy network model, and the action taken by the submersible at the next moment is used as its output; the initial action strategy network model is trained to obtain the trained action strategy network model under the benthonic residence mode working condition.
The expression of the action taken by the submersible under the benthonic residence mode working condition in this embodiment is as follows:
In the formula, the terms represent, in order: the action taken by the submersible under the benthonic residence mode working condition; the action value when the solar power generation module is switched off; the action value when the ocean current energy power generation module generates power; the action value when the ocean current energy power generation module is switched off; the action value when the battery pack module is charging; and the action value when the battery pack module is discharging, all under the benthonic residence mode working condition.
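One way to picture the action sets of the three modal working conditions is as combinations of one choice per module. The sketch below enumerates them; treating an action as a (solar, ocean current, battery) triple, and the labels used, are assumptions for illustration and not the encoding given in the source. The benthonic residence mode keeps the solar module switched off, as its action list contains no solar generation entry.

```python
from itertools import product

SOLAR_OPTIONS = ("generate", "off")
CURRENT_OPTIONS = ("generate", "off")
BATTERY_OPTIONS = ("charge", "discharge")


def action_space(solar_available=True):
    """Enumerate (solar, ocean current, battery) action combinations.

    solar_available=False models the benthonic residence mode, in which
    the solar power generation module stays switched off.
    """
    solar = SOLAR_OPTIONS if solar_available else ("off",)
    return list(product(solar, CURRENT_OPTIONS, BATTERY_OPTIONS))


cruise_actions = action_space()                        # long-term or high-speed mode
benthonic_actions = action_space(solar_available=False)
```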
S4, acquiring a target action strategy network model;
The action executed by the manta-ray-like submersible is selected by the action strategy network model, and the submersible then interacts with its environment. The state change of the submersible after taking an action is calculated with the submersible model. Suitable observation variables are selected according to the task characteristics of each modal working condition, and a reward evaluation system scores the submersible on the basis of these variables and the action taken; points are deducted if the state change after the action departs from the ideal situation or violates a constraint. Selecting actions with the action strategy network model is a continuous optimization process: the best action is sought by trial and error, and the maximum cumulative return is obtained by improving the behavior policy. The reinforcement learning algorithm shown in FIG. 6 is therefore used to search for the best action (the strategy with the highest score). The action at the initial moment is obtained by random initialization; since it carries no learned value, the evaluation of the initial action is 0. Optimizing the action strategy network model under each modal working condition with the reinforcement learning algorithm to obtain the target action strategy network model comprises the following steps:
Step 41, when the action strategy network model under the long-term self-sustaining mode working condition is optimized, the expression of the reward function of the reinforcement learning algorithm is as follows:
In the formula, $R$ denotes the reward function of the reinforcement learning algorithm when the action strategy network model under the long-term self-sustaining mode working condition is optimized; $\gamma$ denotes the discount factor and $\gamma^{k}$ its $k$-th power; $r_t$ denotes the reward value obtained after the action taken by the manta-ray-like submersible in the system state at time $t$ under the long-term self-sustaining mode working condition interacts with the environment; $k$ denotes the $k$-th moment; and $r_{t+k}$ denotes the reward value obtained in the same way in the system state at time $t+k$.
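A reward function built from a discount factor, its k-th power and the per-moment reward values is usually the discounted return, R = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... The sketch below computes it for a finite reward sequence; taking this as the intended form, and the function name, are assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over a finite sequence of reward values."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total  # fold from the last reward backwards
    return total


# Example: three consecutive rewards of 1.0 with a discount factor of 0.5.
ret = discounted_return([1.0, 1.0, 1.0], 0.5)
```

A discount factor closer to 1 weights later rewards more heavily, which suits the long-horizon objectives of the long-term self-sustaining and benthonic residence modes.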
Under the long-term self-sustaining mode working condition, the expression of the reward value obtained after the action taken by the manta-ray-like submersible interacts with the environment is:
where the terms represent, in order: the reward value obtained after the action taken by the submersible in the system state at the given moment under the long-term self-sustaining mode working condition interacts with the environment; the variation trend of the sailing distance of the submersible under the long-term self-sustaining mode working condition; the reward item under the long-term self-sustaining mode working condition; the detail penalty item under the long-term self-sustaining mode working condition; and the state-of-charge value of the battery pack module. To ensure that the action strategy network of the submersible keeps updating and does not become trapped in a local optimum, the case in which the sailing distance remains unchanged under the long-term self-sustaining mode working condition is set as a deduction item, and a state-of-charge (SOC) detail penalty is added.
Step 42: when optimizing the action strategy network model for the high-speed maneuvering mode, the reward function of the reinforcement learning algorithm is expressed as:
In the expression, the terms denote, in order: the reward function of the reinforcement learning algorithm when optimizing the action strategy network model for the high-speed maneuvering mode; the discount factor; the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the high-speed maneuvering mode; k, the index of the k-th time step; and the reward value obtained in the same way under the system state at the other time index that appears in the expression.
In the high-speed maneuvering mode, the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment is expressed as:
In the expression, the terms denote, in order: the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the high-speed maneuvering mode; and the trend of the loss function of the action strategy network model of the submersible for the high-speed maneuvering mode.
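A reward driven by the trend of the loss function can be sketched as follows. This is an illustration under stated assumptions: claim 3 relates the high-speed maneuvering loss to the predicted total load power and the total output power of the solar, ocean current, and battery pack modules, but the absolute-difference form, the function names, and the unit bonus used here are hypothetical.

```python
def power_mismatch_loss(predicted_load_w: float, solar_w: float,
                        current_w: float, battery_w: float) -> float:
    """Assumed loss: gap between predicted load and total output power."""
    total_output_w = solar_w + current_w + battery_w
    return abs(predicted_load_w - total_output_w)


def high_speed_reward(prev_loss: float, loss: float,
                      bonus: float = 1.0) -> float:
    """Reward a falling loss, penalize a rising one (assumed symmetric)."""
    if loss < prev_loss:
        return bonus
    if loss > prev_loss:
        return -bonus
    return 0.0
```

For example, a step that narrows the gap between supplied and demanded power from 10 W to 5 W would earn the bonus, reflecting better dynamic load response.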
Step 43: when optimizing the action strategy network model for the benthic residence mode, the reward function of the reinforcement learning algorithm is expressed as:
In the expression, the terms denote, in order: the reward function of the reinforcement learning algorithm when optimizing the action strategy network model for the benthic residence mode; the discount factor; the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the benthic residence mode; k, the index of the k-th time step; and the reward value obtained in the same way under the system state at the other time index that appears in the expression.
In the benthic residence mode, the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment is expressed as:
In the expression, the terms denote, in order: the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the benthic residence mode; the trend of the sailing distance of the submersible in the benthic residence mode; the reward term of the submersible in the benthic residence mode; the detail penalty term of the submersible in the benthic residence mode; and the state of charge of the battery pack module.
Optimizing the action strategy network model for each mode with the reinforcement learning algorithm yields the corresponding target action strategy network model for that mode.
S5: predict the action to be taken by the bionic manta ray submersible at the next moment in the current mode, and control the submersible accordingly.
Step 51: when the bionic manta ray submersible selects the long-term self-sustaining mode, the action strategy selection network takes as input the state of charge of the battery pack module, the light intensity, the sailing speed, and the action taken by the submersible. With the maximum sailing distance as its target, the network selects the execution strategy with the highest cumulative return under the corresponding reward mechanism and outputs the action to be taken at the next moment. This cycle repeats until the mode ends and the submersible switches to another mode.
Step 52: when the bionic manta ray submersible selects the high-speed maneuvering mode, the action strategy selection network takes as input the state of charge of the battery pack module, the light intensity, the sailing speed, and the action taken by the submersible. With the dynamic load response capability as its target, the network selects the execution strategy with the highest cumulative return under the corresponding reward mechanism and outputs the action a_{t+1} to be taken at the next moment. This cycle repeats until the mode ends and the submersible switches to another mode.
Step 53: when the bionic manta ray submersible selects the benthic residence mode, the action strategy selection network takes as input the state of charge of the battery pack module, the light intensity, the sailing speed, and the action taken by the submersible. With the maximum sailing distance as its target, the network selects the execution strategy with the highest cumulative return under the corresponding reward mechanism and outputs the action to be taken at the next moment. This cycle repeats until the mode ends and the submersible switches to another mode.
Step 54: control the bionic manta ray submersible according to the action predicted for the next moment in the current mode.
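Steps 51 to 54 amount to a per-mode dispatch loop: at each moment the model for the current mode maps the observed state and the previous action to the next action. The sketch below illustrates only that control flow. The names `run_control`, `Observation`, and `Policy`, the integer action encoding, and the mode labels are hypothetical, and the policies are plain callables standing in for the trained target action strategy network models.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (state of charge, light intensity, sailing speed, previous action)
Observation = Tuple[float, float, float, int]
Policy = Callable[[Observation], int]


def run_control(policies: Dict[str, Policy],
                steps: Iterable[Tuple[str, float, float, float]],
                initial_action: int = 0) -> List[Tuple[str, int]]:
    """Dispatch each time step to the policy of the currently selected mode."""
    action = initial_action
    log: List[Tuple[str, int]] = []
    for mode, soc, light, speed in steps:
        policy = policies[mode]  # target model for the current mode
        action = policy((soc, light, speed, action))
        log.append((mode, action))  # stand-in for commanding the vehicle
    return log
```

Because the previous action is fed back as an input, a mode switch changes which model is queried but preserves continuity of the action history.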
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (10)
1. An energy control strategy for a multi-source energy harvesting and storage system of a bionic manta ray submersible, characterized by comprising:
acquiring relevant parameters at each moment while the bionic manta ray submersible sails underwater, wherein the relevant parameters comprise: the output powers corresponding to the solar power generation module, the ocean current energy power generation module and the battery pack module of the bionic manta ray submersible, the light intensity, and the sailing speed of the bionic manta ray submersible;
establishing a load prediction model, taking the relevant parameters at the current moment as the input of the load prediction model, and predicting the total load power of the bionic manta ray submersible at the next moment;
constructing a loss function for the high-speed maneuvering mode based on the total load power of the bionic manta ray submersible at the next moment in the high-speed maneuvering mode, as predicted by the load prediction model, and constructing an action strategy network model of the bionic manta ray submersible for the high-speed maneuvering mode based on that loss function;
constructing loss functions for the long-term self-sustaining mode and the benthic residence mode based on the sailing distances of the bionic manta ray submersible in the long-term self-sustaining mode and the benthic residence mode, respectively, and constructing the corresponding action strategy network models of the bionic manta ray submersible for the long-term self-sustaining mode and the benthic residence mode based on those loss functions;
optimizing the action strategy network model for each mode with a reinforcement learning algorithm to obtain a target action strategy network model;
and inputting the state of charge of the battery pack module, the light intensity, the sailing speed and the action taken by the bionic manta ray submersible in the current mode into the target action strategy network model corresponding to the current mode, predicting the action to be taken by the bionic manta ray submersible at the next moment in the current mode, and controlling the bionic manta ray submersible according to that predicted action.
2. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the loss function for the long-term self-sustaining mode is expressed as:
wherein the terms of the expression denote, in order: the loss function value for the long-term self-sustaining mode;
and the farthest sailing distance of the bionic manta ray submersible in the long-term self-sustaining mode.
3. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the loss function for the high-speed maneuvering mode is expressed as:
wherein the terms of the expression denote, in order: the loss function value for the high-speed maneuvering mode;
the total load power of the bionic manta ray submersible at the next moment, as predicted by the load prediction model;
the total power output by the bionic manta ray submersible at the next moment in the high-speed maneuvering mode;
the output power of the solar power generation module;
the output power of the ocean current energy power generation module;
and the output power of the battery pack module.
4. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the loss function for the benthic residence mode is expressed as:
wherein the terms of the expression denote, in order: the loss function value for the benthic residence mode;
and the farthest sailing distance of the bionic manta ray submersible in the benthic residence mode.
5. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the step of constructing the action strategy network model of the bionic manta ray submersible for each mode comprises:
constructing an initial action strategy network model: taking the loss function for each mode as the loss function of the network model to obtain the initial action strategy network model for that mode;
training the initial action strategy network model: for each mode, taking the state of charge of the battery pack module, the light intensity, the sailing speed and the action taken by the bionic manta ray submersible at the current moment as the input of the initial action strategy network model for that mode, taking the action taken by the bionic manta ray submersible in that mode as the output of the initial action strategy network model for that mode, and training the initial action strategy network model for each mode to obtain a trained action strategy network model for each mode;
and taking the trained action strategy network model as the action strategy network model for the corresponding mode.
6. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that, when the action strategy network model for each mode is optimized, the reward function of the reinforcement learning algorithm is expressed as:
wherein the terms of the expression denote, in order: the reward function of the reinforcement learning algorithm when the trained action strategy network model for the indexed mode is optimized;
the discount factor;
the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the indexed time in the indexed mode;
and k, the index of the k-th time step.
7. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 6, characterized in that the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state in each mode, is obtained as follows:
when the mode of the bionic manta ray submersible is the long-term self-sustaining mode, the reward value obtained after the action taken interacts with the environment is expressed as:
when the mode of the bionic manta ray submersible is the high-speed maneuvering mode, the reward value obtained after the action taken interacts with the environment is expressed as:
when the mode of the bionic manta ray submersible is the benthic residence mode, the reward value obtained after the action taken interacts with the environment is expressed as:
wherein the terms of the expressions denote, in order: the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the long-term self-sustaining mode;
the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the high-speed maneuvering mode;
the reward value obtained after the action taken by the bionic manta ray submersible interacts with the environment, under the system state at the corresponding time in the benthic residence mode;
the trend of the sailing distance of the bionic manta ray submersible in the long-term self-sustaining mode;
the trend of the loss function of the action strategy network model of the bionic manta ray submersible for the high-speed maneuvering mode;
the trend of the sailing distance of the bionic manta ray submersible in the benthic residence mode;
the reward term of the bionic manta ray submersible in the long-term self-sustaining mode;
the detail penalty term of the bionic manta ray submersible in the long-term self-sustaining mode;
the reward term of the bionic manta ray submersible in the benthic residence mode;
the detail penalty term of the bionic manta ray submersible in the benthic residence mode;
and the state of charge of the battery pack module.
8. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the action taken by the bionic manta ray submersible in the long-term self-sustaining mode is expressed as:
wherein the terms of the expression denote, in order: the action taken by the bionic manta ray submersible in the long-term self-sustaining mode;
the action value when the solar power generation module is generating power in the long-term self-sustaining mode;
the action value when the solar power generation module is closed in the long-term self-sustaining mode;
the action value when the ocean current energy power generation module is generating power in the long-term self-sustaining mode;
the action value when the ocean current energy power generation module is closed in the long-term self-sustaining mode;
the action value when the battery pack module is charging in the long-term self-sustaining mode;
and the action value when the battery pack module is discharging in the long-term self-sustaining mode.
9. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the action taken by the bionic manta ray submersible in the high-speed maneuvering mode is expressed as:
wherein the terms of the expression denote, in order: the action taken by the bionic manta ray submersible in the high-speed maneuvering mode;
the action value when the solar power generation module is generating power in the high-speed maneuvering mode;
the action value when the solar power generation module is closed in the high-speed maneuvering mode;
the action value when the ocean current energy power generation module is generating power in the high-speed maneuvering mode;
the action value when the ocean current energy power generation module is closed in the high-speed maneuvering mode;
the action value when the battery pack module is charging in the high-speed maneuvering mode;
and the action value when the battery pack module is discharging in the high-speed maneuvering mode.
10. The energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible according to claim 1, characterized in that the action taken by the bionic manta ray submersible in the benthic residence mode is expressed as:
wherein the terms of the expression denote, in order: the action taken by the bionic manta ray submersible in the benthic residence mode;
the action value when the solar power generation module is closed in the benthic residence mode;
the action value when the ocean current energy power generation module is generating power in the benthic residence mode;
the action value when the ocean current energy power generation module is closed in the benthic residence mode;
the action value when the battery pack module is charging in the benthic residence mode;
and the action value when the battery pack module is discharging in the benthic residence mode.
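Claims 8 to 10 describe each action in terms of per-module values: solar module generating or closed, ocean current module generating or closed, battery pack charging or discharging, with only the closed state listed for the solar module in the benthic residence mode. One way to picture the resulting action sets is the sketch below. The Cartesian-product combination, the string labels, and the function `action_space` are assumptions made for illustration; the claimed expressions themselves are not shown here.

```python
from itertools import product
from typing import List, Tuple

SOLAR = ("generate", "closed")
OCEAN_CURRENT = ("generate", "closed")
BATTERY = ("charge", "discharge")


def action_space(mode: str) -> List[Tuple[str, str, str]]:
    """Enumerate (solar, ocean current, battery) combinations for a mode."""
    # Claim 10 lists only the closed state for the solar module in the
    # benthic residence mode; the mode label used here is hypothetical.
    solar_options = ("closed",) if mode == "benthic_residence" else SOLAR
    return list(product(solar_options, OCEAN_CURRENT, BATTERY))
```

Under these assumptions the long-term self-sustaining and high-speed maneuvering modes each have eight candidate actions, and the benthic residence mode has four.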
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410658334.1A CN118249474B (en) | 2024-05-27 | 2024-05-27 | Energy control strategy of multi-source energy harvesting and storage system of bionic manta ray submersible |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118249474A (en) | 2024-06-25 |
CN118249474B (en) | 2024-08-06 |
Family
ID=91554042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410658334.1A (Active, CN118249474B) | Energy control strategy of multi-source energy harvesting and storage system of bionic manta ray submersible | 2024-05-27 | 2024-05-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118249474B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012245944A (en) * | 2011-05-31 | 2012-12-13 | Sugino Gomu Kagaku Kogyosho:Kk | Seabed exploration apparatus |
JP2021034050A (en) * | 2019-08-21 | 2021-03-01 | 哈爾浜工程大学 | Auv action plan and operation control method based on reinforcement learning |
CN113381639A (en) * | 2021-06-17 | 2021-09-10 | 河南科技学院 | Robot is from electricity generation and micro-energy storage and discharge system under micro-sea environment |
CN115986834A (en) * | 2022-12-07 | 2023-04-18 | 北京交通大学 | Near-end strategy optimization algorithm-based optical storage charging station operation optimization method and system |
CN117421566A (en) * | 2023-12-19 | 2024-01-19 | 国网山东省电力公司营销服务中心(计量中心) | Photovoltaic power generation power prediction method based on IMRFO-StemNN |
CN117879128A (en) * | 2024-03-13 | 2024-04-12 | 西北工业大学宁波研究院 | Energy management system and management strategy of large container ship composite energy system |
CN117977818A (en) * | 2023-09-13 | 2024-05-03 | 西北工业大学宁波研究院 | Multi-source energy harvesting-distributed energy storage system and method for simulated ray of light diving device |
Non-Patent Citations (2)
Title |
---|
JUNJIE HE ET AL.: "A New Type of Bionic Manta Ray Robot", GLOBAL OCEANS 2020: SINGAPORE – U.S. GULF COAST, 9 April 2021 (2021-04-09), pages 1 - 6 * |
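GAO Pengcheng (高鹏骋) et al.: "Study on the hydrodynamic performance of manta ray group gliding" (蝠鲼集群滑翔水动力性能研究), Journal of Northwestern Polytechnical University (西北工业大学学报), vol. 41, no. 3, 30 June 2023 (2023-06-30), pages 595 - 600 * |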
Also Published As
Publication number | Publication date |
---|---|
CN118249474B (en) | 2024-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||