CN118249474A

CN118249474A - Energy control strategy for a multi-source energy harvesting and storage system of a bionic manta ray submersible

Info

Publication number: CN118249474A
Application number: CN202410658334.1A
Authority: CN
Inventors: 卢丞一; 张劭玮; 裴毓; 李炬晨; 潘光; 罗斯伦; 曹永辉; 晁赫; 任鑫彤
Original assignee: Ningbo Research Institute of Northwestern Polytechnical University
Current assignee: Ningbo Research Institute of Northwestern Polytechnical University
Priority date: 2024-05-27
Filing date: 2024-05-27
Publication date: 2024-06-25
Anticipated expiration: 2044-05-27
Also published as: CN118249474B

Abstract

The invention relates to the technical field of energy management, and in particular to an energy control strategy for a multi-source energy harvesting and storage system of a bionic manta ray submersible. The strategy comprises the following steps:

- acquiring the relevant parameters of the submersible at each moment while it navigates underwater;
- establishing a load prediction model and predicting the total load power of the submersible at the next moment;
- constructing an action policy network model of the submersible for each operating mode;
- obtaining a target action policy network model;
- predicting the action the submersible will take at the next moment under the current operating mode, and controlling the submersible accordingly.

The invention enables the energy system of the submersible to meet multi-objective task requirements under complex operating modes. The energy control strategy is determined autonomously by the algorithm without human intervention. Compared with traditional logic-based strategies, this greatly reduces design cost and error probability and improves the precision of energy control.

Description

Energy control strategy for a multi-source energy harvesting and storage system of a bionic manta ray submersible

Technical Field

The invention relates to the technical field of energy management, and in particular to an energy control strategy for a multi-source energy harvesting and storage system of a bionic manta ray submersible.

Background

The bionic manta ray submersible is an innovative scientific and technological achievement with broad prospects and great potential. It provides new ideas and tools and makes an important contribution to safeguarding maritime rights and interests. However, a single lithium battery can hardly support long-term covert operation of the submersible in the deep sea. To improve its endurance, multiple energy storage and energy supply devices are therefore designed and installed according to the performance requirements and working environment of the bionic fish submersible:

- a solar energy capture system is mounted on the back of the bionic fish to harvest solar energy;
- a triboelectric ocean current energy generator is mounted on its abdomen to harvest ocean current energy;
- bendable flexible lithium batteries are carried on the flapping-wing mechanisms on both sides.

Combining flexible and conventional lithium batteries realizes distributed storage of the multi-source harvested energy. By capturing multiple renewable energy sources in the deep sea, this multi-source energy harvesting and storage system greatly extends the exploration radius of unmanned underwater equipment.

However, deep-sea environments are complex and changeable, and task demands span multiple modes. Traditional experience-based control strategies adapt poorly to these conditions. They cannot coordinate the multiple energy harvesting and storage subsystems with one another, and they cannot provide an optimal control strategy for each working condition in each environment. As a result, their control accuracy is low.

Therefore, it is necessary to provide an energy control strategy for the multi-source energy harvesting and storage system of a bionic manta ray submersible that solves the above problems.

Disclosure of Invention

The invention provides an energy control strategy for the multi-source energy harvesting and storage system of a bionic manta ray submersible. It addresses three shortcomings of existing experience-based control strategies, which together lead to low control accuracy:

- they adapt poorly to changing conditions;
- they cannot coordinate the multiple energy harvesting and storage subsystems;
- they cannot provide an optimal control strategy for the different working conditions of different environments.

The energy control strategy disclosed by the invention adopts the following technical solution:

acquiring the relevant parameters of the bionic manta ray submersible at each moment while it navigates underwater. The relevant parameters comprise the light intensity, the speed, and the output powers of the submersible's solar power generation module, ocean current energy power generation module and battery pack module;

establishing a load prediction model that takes the relevant parameters at the current moment as input and predicts the total load power of the submersible at the next moment;

for the high-speed maneuvering mode, constructing a loss function based on the submersible's next-moment total load power predicted by the load prediction model, and constructing the submersible's action policy network model for that mode based on this loss function;

for the long-term self-sustaining mode and the benthic residence mode, constructing a loss function for each mode based on the submersible's sailing distance under that mode, and constructing the submersible's action policy network model for each of these two modes based on its loss function;

optimizing the action policy network model for each operating mode with a reinforcement learning algorithm to obtain a target action policy network model; and

inputting four quantities into the target action policy network model for the current operating mode: the state of charge of the battery pack module, the light intensity, the speed, and the action the submersible is taking under that mode. The model predicts the action the submersible will take at the next moment, and the submersible is controlled according to this predicted action.
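The steps above can be pictured as a per-moment decision loop. The Python sketch below is illustrative only. The `State` fields mirror the policy inputs named in the text. The function `control_step`, the mode keys and the callable interfaces are hypothetical placeholders, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class State:
    """Policy inputs named in the text (field names are hypothetical)."""
    soc: float      # state of charge of the battery pack module
    light: float    # light intensity
    speed: float    # navigational speed
    action: int     # index of the action taken at the current moment


LoadModel = Callable[[Sequence[float]], float]   # relevant parameters -> next-moment load
Policy = Callable[[State], int]                  # trained target policy for one mode


def control_step(mode: str, state: State, params: Sequence[float],
                 load_model: LoadModel,
                 policies: Dict[str, Policy]) -> Tuple[float, int]:
    """One decision moment: predict next-moment load, then query the active mode's policy."""
    predicted_load = load_model(params)   # feeds the high-speed mode's loss
    next_action = policies[mode](state)   # action to execute at the next moment
    return predicted_load, next_action
```

Keeping one policy per mode in a dictionary reflects the text's design: each operating mode has its own target network, and the active mode alone selects which network is queried.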

Preferably, the loss function for the long-term self-sustaining mode is expressed as follows:

The quantities in this expression are:

- the loss function value under the long-term self-sustaining mode;
- the furthest sailing distance of the submersible under the long-term self-sustaining mode.

Preferably, the loss function for the high-speed maneuvering mode is expressed as follows:

The quantities in this expression are:

- the loss function value under the high-speed maneuvering mode;
- the total load power of the submersible at the next moment, as predicted by the load prediction model;
- the total power output by the submersible at the next moment under the high-speed maneuvering mode;
- the output power of the solar power generation module;
- the output power of the ocean current energy power generation module;
- the output power of the battery pack module.
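The source defines this loss only through the quantities listed above. One natural reading is that the loss penalizes the mismatch between the predicted load and the total supplied power, with the total taken as the sum of the three module outputs. The sketch below assumes that reading. Using the squared error is an illustrative choice, not the patent's stated formula.

```python
def high_speed_loss(p_load_pred: float, p_solar: float,
                    p_current: float, p_battery: float) -> float:
    """Assumed loss: squared mismatch between predicted load and total supplied power."""
    # Assumption: total output = solar + ocean current + battery pack output power.
    p_total = p_solar + p_current + p_battery
    return (p_load_pred - p_total) ** 2
```

The loss is zero when the three sources exactly cover the predicted load, and it grows with either a shortfall or a surplus.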

Preferably, the loss function for the benthic residence mode is expressed as follows:

The quantities in this expression are:

- the loss function value under the benthic residence mode;
- the furthest sailing distance of the submersible under the benthic residence mode.

Preferably, constructing the submersible's action policy network model for each operating mode comprises the following steps:

Constructing an initial action policy network model: the loss function of each operating mode is used as the loss function of the network, giving an initial action policy network model for each operating mode;

Training the initial action policy network model: for each operating mode, the model's inputs are the state of charge of the battery pack module, the light intensity, the speed, and the action taken by the submersible at the current moment. Its output is the action taken by the submersible under that mode. Each initial model is trained to obtain a trained action policy network model for its operating mode;

and using the trained action policy network model as the action policy network model for each operating mode.

Preferably, when the action policy network model for each operating mode is optimized, the reward function of the reinforcement learning algorithm is expressed as follows:

The quantities in this expression are:

- the reward function of the reinforcement learning algorithm used to optimize the trained action policy network model under a given operating mode;
- the discount factor;
- the reward obtained after the submersible takes an action in the system state at the k-th moment under that operating mode and interacts with the environment;
- the k-th moment.
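The reward function is built from a discount factor and per-moment rewards. It therefore plausibly takes the standard discounted-return form $R = \sum_k \gamma^k r_k$. The exact expression, including its summation limits, is an assumption here. A minimal Python sketch:

```python
def discounted_return(rewards, gamma):
    """Assumed form R = sum_k gamma**k * r_k, accumulated backwards (Horner's scheme)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, rewards `[2, 0, 4]` with `gamma = 0.5` give 2 + 0 + 4·0.25 = 3. A smaller discount factor makes the policy favor immediate energy gains over distant ones.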

Preferably, the reward the submersible obtains after taking an action and interacting with the environment in the system state of each operating mode is determined as follows:

when the submersible is in the long-term self-sustaining mode, the reward after the action and environment interaction is expressed as:

when the submersible is in the high-speed maneuvering mode, the reward after the action and environment interaction is expressed as:

when the submersible is in the benthic residence mode, the reward after the action and environment interaction is expressed as:

The quantities in these expressions are:

- the reward obtained after the submersible takes an action in the system state at a given moment under the long-term self-sustaining mode and interacts with the environment;
- the corresponding reward under the high-speed maneuvering mode;
- the corresponding reward under the benthic residence mode;
- the change trend of the submersible's sailing distance under the long-term self-sustaining mode;
- the change trend of the loss function of the submersible's action policy network model under the high-speed maneuvering mode;
- the change trend of the submersible's sailing distance under the benthic residence mode;
- the reward term of the submersible under the long-term self-sustaining mode;
- the detailed penalty term of the submersible under the long-term self-sustaining mode;
- the reward term of the submersible under the benthic residence mode;
- the detailed penalty term of the submersible under the benthic residence mode;
- the state of charge of the battery pack module.

Preferably, the action taken by the submersible under the long-term self-sustaining mode is expressed as follows:

The quantities in this expression are:

- the action taken by the submersible under the long-term self-sustaining mode;
- the action value when the solar power generation module is generating power under this mode;
- the action value when the solar power generation module is switched off under this mode;
- the action value when the ocean current energy power generation module is generating power under this mode;
- the action value when the ocean current energy power generation module is switched off under this mode;
- the action value when the battery pack module is charging under this mode;
- the action value when the battery pack module is discharging under this mode.

Preferably, the action taken by the submersible under the high-speed maneuvering mode is expressed as follows:

The quantities in this expression are:

- the action taken by the submersible under the high-speed maneuvering mode;
- the action value when the solar power generation module is generating power under this mode;
- the action value when the solar power generation module is switched off under this mode;
- the action value when the ocean current energy power generation module is generating power under this mode;
- the action value when the ocean current energy power generation module is switched off under this mode;
- the action value when the battery pack module is charging under this mode;
- the action value when the battery pack module is discharging under this mode.

Preferably, the action taken by the submersible under the benthic residence mode is expressed as follows:

The quantities in this expression are:

- the action taken by the submersible under the benthic residence mode;
- the action value when the solar power generation module is switched off under this mode;
- the action value when the ocean current energy power generation module is generating power under this mode;
- the action value when the ocean current energy power generation module is switched off under this mode;
- the action value when the battery pack module is charging under this mode;
- the action value when the battery pack module is discharging under this mode.
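The action definitions above imply a small discrete action space per mode. Each generator is either on or off, and the battery is either charging or discharging. The benthic residence mode has no solar-generation option. The sketch below enumerates these combinations. Encoding actions as string tuples is an illustrative assumption, because the numerical action values are not specified in this text.

```python
from itertools import product

# Switch options per module, following the action definitions in the text.
SOLAR = ("generate", "off")
OCEAN_CURRENT = ("generate", "off")
BATTERY = ("charge", "discharge")

ACTION_SPACES = {
    "long_term_self_sustaining": list(product(SOLAR, OCEAN_CURRENT, BATTERY)),
    "high_speed_maneuvering": list(product(SOLAR, OCEAN_CURRENT, BATTERY)),
    # Benthic residence: only the "solar off" option is defined.
    "benthic_residence": list(product(("off",), OCEAN_CURRENT, BATTERY)),
}


def action_index(mode, action):
    """Map an action tuple to the integer index a discrete policy network could output."""
    return ACTION_SPACES[mode].index(action)
```

This gives eight joint actions in the long-term self-sustaining and high-speed maneuvering modes and four in the benthic residence mode.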

The beneficial effects of the invention are as follows:

The load prediction model uses the relevant parameters of the bionic manta ray submersible at the current moment of underwater navigation to predict the total load power at the next moment. The energy control strategy can therefore be adjusted in advance, which improves the dynamic response of the system.

Action policy network models are then built for each operating mode:

- for the high-speed maneuvering mode, on the next-moment total load power predicted by the load prediction model;
- for the long-term self-sustaining mode and the benthic residence mode, on the submersible's sailing distance under each of those modes.

These models are optimized with a reinforcement learning algorithm, so that the optimized target action policy network models output the optimal control strategy.

The invention enables the energy system of the bionic manta ray submersible to meet multi-objective task requirements under complex operating modes. The energy control strategy is determined autonomously by the algorithm without human intervention. Compared with traditional logic-based strategies, this greatly reduces design cost and error probability and improves the precision of energy control.

Drawings

To describe the embodiments of the invention and the prior-art technical solutions more clearly, the drawings required for them are briefly introduced below. The drawings described below show only some embodiments of the invention. A person skilled in the art may derive other drawings from them without inventive effort.

FIG. 1 is a flowchart of the energy control strategy for the multi-source energy harvesting and storage system of the bionic manta ray submersible of the invention;

FIG. 2 is a schematic diagram of a second-order RC equivalent circuit model in the present embodiment;

FIG. 3 is a schematic structural diagram of the ocean current energy power generation module in the present embodiment;

FIG. 4 is a schematic diagram of a network structure of a load prediction model in the present embodiment;

FIG. 5 is a schematic diagram of a network structure of an action policy network model in the present embodiment;

Fig. 6 is a flowchart of the reinforcement learning algorithm in the present embodiment.

Detailed Description

The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the protection scope of the invention.

As shown in FIG. 1, an embodiment of the energy control strategy for the multi-source energy harvesting and storage system of a bionic manta ray submersible comprises:

S1, acquiring the relevant parameters of the bionic manta ray submersible at each moment while it navigates underwater;

Specifically, the relevant parameters include the light intensity, the speed, and the output powers of the submersible's solar power generation module, ocean current energy power generation module and battery pack module.

In this embodiment, a virtual mathematical model of the submersible's multi-source energy harvesting and storage system is established first. This model is a digital representation of the physical entity that uses data to simulate the entity's behavior in a real environment. Through virtual-real interaction between the physical entity and the virtual model, the relevant parameters of the submersible are fed back at each moment of underwater navigation.

The operating environment of the multi-source energy harvesting and storage system is then constructed. According to the submersible's task requirements, its operation is divided into three operating modes: the long-term self-sustaining mode, the high-speed maneuvering mode and the benthic residence mode. The virtual model is run under each of the three modes to obtain the submersible's relevant parameters at each moment of underwater navigation.

The virtual mathematical model of the multi-source energy harvesting and storage system comprises the solar power generation module, the ocean current energy power generation module and the battery pack module. The battery pack module is composed of lithium-ion batteries, so in this embodiment it is modeled with a second-order RC equivalent circuit. As shown in FIG. 2, this model connects two parallel RC networks in series with the ohmic internal resistance. The resistive-capacitive elements represent the internal electrochemical polarization of the lithium-ion battery pack during operation. Applying Thevenin's theorem gives the following expressions for the circuit quantities:

The quantities in these expressions are, in order:

- the battery terminal voltage;
- the terminal voltages of the two parallel RC networks;
- the battery open-circuit voltage;
- I, the circuit current;
- the ohmic internal resistance, which arises from internal battery components such as the electrode materials, the electrolyte and the separator;
- the electrochemical polarization resistances;
- the electrochemical polarization capacitances caused by the polarization reactions in the cell.

The second-order RC equivalent circuit model has a simple structure yet accurately represents the electrochemical processes inside the battery. It is currently one of the most widely used battery models.
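For orientation, the textbook second-order RC model gives the terminal voltage as the open-circuit voltage minus the ohmic drop and the two RC-branch voltages. Each branch voltage relaxes with its own time constant R·C. The sketch below simulates that standard form with an exact zero-order-hold discretization. It rests on several assumptions: discharge current is positive, the open-circuit voltage is constant (in practice it depends on SOC), and all parameter names are illustrative.

```python
import math


def simulate_2rc(currents, dt, u_oc, r0, r1, c1, r2, c2):
    """Terminal voltage of a standard second-order RC (Thevenin) battery model.

    Assumed conventions: discharge current positive, u_oc held constant,
    dt in seconds, resistances in ohms, capacitances in farads.
    """
    u1 = u2 = 0.0                      # RC-branch (polarization) voltages
    a1 = math.exp(-dt / (r1 * c1))     # per-step decay of branch 1
    a2 = math.exp(-dt / (r2 * c2))     # per-step decay of branch 2
    voltages = []
    for i in currents:
        voltages.append(u_oc - i * r0 - u1 - u2)
        # Exact update for a current held constant over the step.
        u1 = a1 * u1 + r1 * (1.0 - a1) * i
        u2 = a2 * u2 + r2 * (1.0 - a2) * i
    return voltages
```

Under a constant current I, the terminal voltage first drops by I·R0. It then settles toward U_oc - I·(R0 + R1 + R2) as both branches charge.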

Next, the state of charge (SOC) of the lithium-ion battery is estimated with the ampere-hour integration method. The SOC of the lithium-ion battery at time t is calculated as follows:

The quantities in this expression are:

- the initial SOC of the battery;
- the rated capacity of the battery;
- the current;
- the coulombic factor, i.e. the charge-discharge efficiency of the battery, which is generally taken as a fixed value.

Ampere-hour integration is the most commonly used method for estimating battery SOC. Integrating the current gives the relative amount of charge, and the method is simple to implement and accurate. In practical applications, however, the current drift of the Hall sensor used to measure the current accumulates integration error. Here the hybrid energy system runs in a simulation environment, so it is free of current drift, and the initial SOC of the lithium-ion battery is set explicitly. Under these conditions, ampere-hour integration tracks the SOC accurately during operation, and it is used to obtain the SOC of the battery pack module.
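A minimal sketch of ampere-hour integration in its usual form, SOC(t) = SOC0 - (1/C_N)∫ηI dt, using a rectangular (forward-Euler) sum over sampled currents. The sketch assumes positive discharge current, time in seconds and capacity in ampere-hours. The default efficiency of 1.0 is a placeholder, not the patent's value.

```python
def soc_ah_integration(soc0, currents, dt, capacity_ah, eta=1.0):
    """Ampere-hour integration with a rectangular sum.

    Assumed units: current in A (discharge positive), dt in s, capacity in Ah.
    eta is the coulombic efficiency; 1.0 is only a placeholder default.
    """
    soc = soc0
    history = [soc]
    for i in current_samples(currents):
        soc -= eta * i * dt / (capacity_ah * 3600.0)   # A*s -> Ah, then fraction of C_N
        history.append(soc)
    return history


def current_samples(currents):
    """Yield current samples as floats (kept separate for clarity)."""
    for i in currents:
        yield float(i)
```

For example, discharging a 10 Ah pack at 10 A for 360 s removes 1 Ah, so the SOC falls from 1.0 to 0.9. A negative (charging) current raises it by the same rule.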

The triboelectric nanogenerator works as follows. Under a periodic external force, the surface charges of the dielectric materials generate an alternating electric field through the coupling of contact electrification and electrostatic induction. This field drives electrons through the external circuit, so electrical energy is output. The structure of the ocean current energy power generation module is shown in FIG. 3. Derived from Maxwell's equations, the output of the module, which generates electricity triboelectrically from ocean current energy, is expressed as follows:

where S is the contact area, and the remaining quantities are:

- the effective thickness of the dielectric material;
- the distance the two triboelectric materials move over time;
- the dielectric constant;
- the charge density;
- the amount of charge generated.
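As an assumption, the module can be described by the widely used V-Q-x relation of a contact-separation triboelectric nanogenerator: V = -Q/(Sε0)·(d0 + x) + σx/ε0. Here d0 is the effective dielectric thickness, x the separation, σ the charge density and Q the transferred charge. The text does not confirm that the patent's module uses exactly this form. The sketch below evaluates it, together with the short-circuit charge obtained by setting V = 0.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def teng_voltage(q, x, s, d0, sigma, eps0=EPS0):
    """Assumed contact-separation V-Q-x relation: V = -Q/(S*eps0)*(d0 + x) + sigma*x/eps0."""
    return -q / (s * eps0) * (d0 + x) + sigma * x / eps0


def teng_short_circuit_charge(x, s, d0, sigma):
    """Charge transferred with shorted terminals (V = 0): Q_sc = S*sigma*x/(d0 + x)."""
    return s * sigma * x / (d0 + x)
```

With Q = 0 the expression reduces to the open-circuit voltage σx/ε0. This shows why larger separations driven by the ocean current raise the available voltage.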

The photovoltaic (solar) power generation module comprises a solar panel. Each photovoltaic array on the panel consists of strings of series-connected modules, and the strings are connected in parallel. Each module consists of a photo-generated current source, a diode, a series resistor and a parallel resistor. The current source is connected to the series resistor and the diode, and the parallel resistor is connected between the diode input and the series resistor. The photovoltaic array uses a five-parameter model, in which these four elements together represent the module's I-V characteristics as functions of irradiance and temperature. The diode I-V characteristics of a single module are:

The quantities in these expressions are:

- the diode voltage;
- the diode current;
- the diode saturation current;
- nI, the diode ideality factor, taken as nI = 0.9;
- k, the Boltzmann constant;
- q, the electron charge, 1.6022e-19 C;
- Ncell, the number of series-connected cells in the module;
- the exponential function with base e;
- the thermal voltage;
- the temperature.
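Given the quantities listed above, the single-module diode law of a standard five-parameter PV model is I_d = I_0[exp(V_d/V_T) - 1], with V_T = (kT/q)·nI·Ncell. The sketch below assumes that form. nI = 0.9 and the value of q follow the text. The default `n_cell = 1` and any saturation current used with it are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q_E = 1.6022e-19     # electron charge, C (value given in the text)


def thermal_voltage(temp_k, n_i=0.9, n_cell=1):
    """Module thermal voltage V_T = k*T/q * nI * Ncell (assumed standard form)."""
    return K_B * temp_k / Q_E * n_i * n_cell


def diode_current(v_d, i0, temp_k, n_i=0.9, n_cell=1):
    """Diode current I_d = I0 * (exp(V_d / V_T) - 1)."""
    return i0 * (math.exp(v_d / thermal_voltage(temp_k, n_i, n_cell)) - 1.0)
```

Because Ncell multiplies the thermal voltage, a module with many series cells needs a proportionally larger diode voltage to draw the same diode current.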

The model of the bionic manta ray submersible is as follows:

The quantities in this model are:

- the power the submersible's drive motor must draw;
- the torque of the drive motor;
- the efficiency of the drive motor;
- the rotational speed of the drive motor;
- the thrust generated by the submersible;
- L, the structural length of the submersible's flapping wing;
- the sailing resistance.

The collected current, voltage, speed and light intensity are fed into the mathematical model of the submersible's multi-source energy harvesting and storage system. The model calculates the output powers of the solar power generation module, the ocean current energy power generation module and the battery pack module, while the light intensity and speed are acquired directly.
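The following is only a sketch of how the listed quantities could relate during steady cruise. It rests on three assumptions, none stated explicitly in the source: thrust balances the sailing resistance, torque equals thrust times the flapping-wing length, and input power equals the mechanical power T·ω divided by the motor efficiency. Rotational speed is taken in rpm.

```python
import math


def motor_input_power(resistance_n, wing_length_m, speed_rpm, efficiency):
    """Electrical power the drive motor must draw under assumed steady-cruise relations.

    Assumptions (not stated explicitly in the source):
      * thrust balances sailing resistance at constant speed: F = R
      * torque acts through the flapping-wing length: T = F * L
      * P_in = T * omega / eta, with omega = 2*pi*n/60 for n in rpm
    """
    thrust = resistance_n
    torque = thrust * wing_length_m
    omega = 2.0 * math.pi * speed_rpm / 60.0   # rad/s
    return torque * omega / efficiency
```

For example, 10 N of resistance, a 0.5 m wing, 60 rpm and 80% efficiency give 5 N·m × 2π rad/s / 0.8 ≈ 39.3 W.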

S2, establishing a load prediction model and predicting the total load power of the bionic manta ray submersible at the next moment;

A load prediction model is established that takes the relevant parameters at the current moment as input and predicts the total load power of the submersible at the next moment. The model is established as follows:

- An initial load prediction model is constructed and then trained.
- During training, the relevant parameters are collected from historical data and used as the input of the initial model. For each historical sample, the output is the submersible's total load power at the following moment.
- The model is trained until the loss function converges, giving the load prediction model.

The total load at the initial moment is 0.

In this embodiment, the load prediction model predicts the total load of the submersible with a BPNN (back-propagation neural network) that has five inputs and a single output. The input variables are:

- the light intensity;
- the speed;
- the output power of the solar power generation module, P1;
- the output power of the ocean current energy power generation module, P2;
- the output power of the battery pack module, P3.

From these inputs, the model predicts the total load power of the submersible at the next moment. Each of the two hidden layers has 10 neurons, and fully connected operations extract 10 features from the five input variables. A final fully connected layer produces the output. The structure of the load prediction model is shown in FIG. 4.

Specifically, the load prediction model predicts the following steps:

step 21, data preprocessing: converting the five input variables into tensor data types which can be identified by the neural network; 70% of the data were used as training data for the load prediction model, and 30% were used as test data. In order to prevent the weight of the load prediction model from deviating from the data having a large value, it is necessary to normalize the sample so that the data is symmetrical about the origin.
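A minimal sketch of this preprocessing step in plain Python. The 70/30 ratio comes from the text. Two choices are assumptions: the split is chronological (unshuffled), and the data are z-score standardized, which is one way to make them "symmetric about the origin". The statistics are computed on the training part only.

```python
from statistics import mean, pstdev


def split_and_standardize(rows, train_frac=0.7):
    """Chronological train/test split followed by z-score standardization.

    Each row holds the five inputs: light intensity, speed, P1, P2, P3.
    Mean and standard deviation come from the training part only and are
    reused on the test part, so no test information leaks into training.
    """
    n_train = int(round(train_frac * len(rows)))
    train, test = rows[:n_train], rows[n_train:]
    columns = list(zip(*train))
    mu = [mean(c) for c in columns]
    sd = [pstdev(c) or 1.0 for c in columns]  # avoid division by zero on constant columns

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, mu, sd)]

    return [scale(r) for r in train], [scale(r) for r in test], mu, sd
```

The returned `mu` and `sd` should be kept so that live measurements can be scaled identically at prediction time.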

Step 22, forward propagation of the neural network: the weight matrices (W1, W2, W3) and bias vectors (b1, b2, b3) of the load prediction model are randomly initialized. A fully connected operation between the input and W1 gives 10 feature values, which the ReLU function makes nonlinear. A second fully connected operation with W2 extracts further features and is again followed by ReLU. Finally, W3 maps these features to the output value.

The calculation from the input layer to hidden layer 1 is:

$$h_1 = \sigma\left(W_1 x + b_1\right)$$

where $x$ is the $5 \times 1$ input vector, $W_1$ is the $10 \times 5$ weight matrix, $b_1$ is the $10 \times 1$ bias vector, and $\sigma(\cdot) = \max(0, \cdot)$ is the ReLU activation function.

The calculation from hidden layer 1 to hidden layer 2 is:

$$h_2 = \sigma\left(W_2 h_1 + b_2\right)$$

The calculation from hidden layer 2 to the output layer is:

$$\hat{y} = W_3 h_2 + b_3$$

where $W_2$ ($10 \times 10$) and $W_3$ ($1 \times 10$) are weight matrices, $b_2$ and $b_3$ are bias vectors, and $h_2$ denotes the feature variables of hidden layer 2.
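A minimal NumPy sketch of the 5-10-10-1 forward pass is given below. The code stores weights in the row-vector convention $x W$, which is the transpose of $W x$ in the equations. It also assumes a Gaussian initialization scale of 0.1, which the text does not specify. The helper names `init_params` and `forward` are hypothetical.

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    """ReLU activation: element-wise max(0, z)."""
    return np.maximum(z, 0.0)

def init_params(n_in: int = 5, n_hidden: int = 10, n_out: int = 1, seed: int = 0) -> dict:
    """Randomly initialize W1..W3 and b1..b3 (init scale 0.1 is an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_hidden)), "b2": np.zeros(n_hidden),
        "W3": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b3": np.zeros(n_out),
    }

def forward(params: dict, x: np.ndarray) -> np.ndarray:
    """5 -> 10 -> 10 -> 1 forward pass; x is shape (5,) or a batch of shape (N, 5)."""
    h1 = relu(x @ params["W1"] + params["b1"])   # input layer -> hidden layer 1
    h2 = relu(h1 @ params["W2"] + params["b2"])  # hidden layer 1 -> hidden layer 2
    return h2 @ params["W3"] + params["b3"]      # hidden layer 2 -> output (no activation)
```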

Step 23, back propagation of the neural network: the loss function is applied to the output of the load prediction model and the label to compute the model loss. The Adam optimizer then updates the weight matrices according to the loss value.

The loss function of the load prediction model is the sum of squared errors (SSE):

$$\mathrm{SSE} = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}$$

where $e_i = y_i - \hat{y}_i$ is the absolute error of the $i$-th sample, $y_i$ is the actual total load power of the simulated manta ray submersible at the next moment, $\hat{y}_i$ is the total load power at the next moment predicted by the load prediction model, and $n$ is the number of samples.
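To make Steps 22 and 23 concrete, the sketch below trains the 5-10-10-1 network with the SSE loss. It uses hand-written back propagation and the standard Adam update (bias-corrected first and second moments). The learning rate, epoch count, initialization scale and function names are assumptions for illustration, not values from the patent.

```python
import numpy as np

def sse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Sum of squared errors: sum_i (y_i - yhat_i)^2."""
    e = y_true - y_pred
    return float(np.sum(e ** 2))

def train_bpnn(X, y, n_hidden=10, lr=1e-2, epochs=2000, seed=0,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Full-batch training of a 5-10-10-1 ReLU network with SSE loss and Adam."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    P = {"W1": rng.normal(0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
         "W2": rng.normal(0, 0.5, (n_hidden, n_hidden)), "b2": np.zeros(n_hidden),
         "W3": rng.normal(0, 0.5, (n_hidden, 1)), "b3": np.zeros(1)}
    m = {k: np.zeros_like(w) for k, w in P.items()}  # Adam first moments
    v = {k: np.zeros_like(w) for k, w in P.items()}  # Adam second moments
    y = y.reshape(-1, 1)
    history = []
    for t in range(1, epochs + 1):
        # Forward pass (Step 22)
        z1 = X @ P["W1"] + P["b1"]; h1 = np.maximum(z1, 0.0)
        z2 = h1 @ P["W2"] + P["b2"]; h2 = np.maximum(z2, 0.0)
        y_hat = h2 @ P["W3"] + P["b3"]
        history.append(sse(y, y_hat))
        # Backward pass (Step 23): dSSE/dy_hat = -2 (y - y_hat)
        g = -2.0 * (y - y_hat)
        grads = {"W3": h2.T @ g, "b3": g.sum(axis=0)}
        g2 = (g @ P["W3"].T) * (z2 > 0)   # back through ReLU of hidden layer 2
        grads["W2"] = h1.T @ g2; grads["b2"] = g2.sum(axis=0)
        g1 = (g2 @ P["W2"].T) * (z1 > 0)  # back through ReLU of hidden layer 1
        grads["W1"] = X.T @ g1; grads["b1"] = g1.sum(axis=0)
        # Adam update with bias correction
        for k in P:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            P[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return P, history
```

In practice one would stop once `history` stops decreasing, matching the "until the loss function converges" criterion. A framework optimizer such as PyTorch's `torch.optim.Adam` would replace the manual update loop.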

S3, constructing the action policy network model of the simulated manta ray submersible under each modal operating condition;

Specifically, for the high-speed maneuvering mode, a loss function is built from the next-moment total load power of the simulated manta ray submersible predicted by the load prediction model. The action policy network model for that mode is then constructed on this loss function. For the long-term self-sustaining mode and the benthic residence mode, the loss functions are built from the sailing distance of the submersible in each mode. The corresponding action policy network models are then constructed on those loss functions. The inputs, outputs and network structure of the action policy network model are shown in fig. 5.

Step 31, constructing the action policy network model of the simulated manta ray submersible under each modal operating condition;

Step 311, constructing the action policy network model of the simulated manta ray submersible under the long-term self-sustaining mode:

Step 3111, constructing the loss function of the simulated manta ray submersible under the long-term self-sustaining mode. The submersible uses this mode for long-distance, long-duration cruising tasks. The energy system must become self-sufficient by harvesting solar and ocean current energy. The main goal is to extend the endurance of the submersible, and maneuverability matters less. The long-term self-sustaining mode therefore focuses on the state of charge of the battery pack module, and the corresponding loss function is:

Step 3112, based on the loss function of the long-term self-sustaining mode, constructing the initial action policy network model of the simulated manta ray submersible for that mode.

Step 3113, training the initial action policy network model under the long-term self-sustaining mode to obtain the corresponding action policy network model. Specifically, in this embodiment, the initial action policy network model for this mode takes four inputs: the state of charge of the battery pack module, the light intensity, the speed, and the action taken by the submersible at the current moment. Its output is the action to be taken at the next moment. The initial model is trained to obtain the trained action policy network model for the long-term self-sustaining mode.

In this embodiment, the action taken by the simulated manta ray submersible under the long-term self-sustaining mode is expressed as follows:

where the action taken by the submersible under the long-term self-sustaining mode consists of six action values: the solar power generation module generating power; the solar power generation module closed; the ocean current energy power generation module generating power; the ocean current energy power generation module closed; the battery pack module charging; and the battery pack module discharging.
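One way to make these action values concrete is to encode them as three on/off decisions, giving 2³ = 8 joint actions that a policy network could index. The patent gives no encoding, so the `Action` tuple, the ordering and the mode name `"benthic_residence"` are all hypothetical. The benthic filter reflects the statement that only ocean current energy is harvested in that mode, so the solar module stays closed.

```python
from itertools import product
from typing import List, NamedTuple

class Action(NamedTuple):
    """Hypothetical encoding of the action values as three on/off decisions."""
    solar_on: bool        # True: solar module generating; False: solar module closed
    current_on: bool      # True: ocean-current module generating; False: closed
    battery_charge: bool  # True: battery pack charging; False: battery pack discharging

# All 2**3 = 8 joint actions in a fixed order, so each has an integer index.
ACTIONS: List[Action] = [Action(*bits) for bits in product([True, False], repeat=3)]

def allowed_actions(mode: str) -> List[Action]:
    """Actions available in a mode (mode names are illustrative only)."""
    if mode == "benthic_residence":
        # Only ocean-current energy is harvested, so the solar module stays closed.
        return [a for a in ACTIONS if not a.solar_on]
    return list(ACTIONS)

def action_index(a: Action) -> int:
    """Integer label of an action, e.g. as a classifier-style policy output."""
    return ACTIONS.index(a)
```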

Step 312, constructing the action policy network model of the simulated manta ray submersible under the high-speed maneuvering mode:

Step 3121, constructing the loss function of the simulated manta ray submersible under the high-speed maneuvering mode:

Under the high-speed maneuvering mode, the simulated manta ray submersible pursues targets at high speed and makes rapid strikes. This requires a fast dynamic response: the energy system must supply, within a short time, all the power the load requires (that is, the power predicted by the load prediction model). Endurance is less important. The corresponding loss function under the high-speed maneuvering mode in this embodiment is therefore:

where P1 denotes the output power of the solar power generation module and P3 denotes the output power of the battery pack module.

Step 3122, based on the loss function of the high-speed maneuvering mode, constructing the initial action policy network model of the simulated manta ray submersible for that mode.

Step 3123, training the initial action policy network model under the high-speed maneuvering mode to obtain the corresponding action policy network model. Specifically, in this embodiment, the initial action policy network model for this mode takes four inputs: the state of charge of the battery pack module, the light intensity, the speed, and the action taken by the submersible at the current moment. Its output is the action to be taken at the next moment. The initial model is trained to obtain the trained action policy network model for the high-speed maneuvering mode.

In this embodiment, the action taken by the simulated manta ray submersible under the high-speed maneuvering mode is expressed as follows:

where the action taken by the submersible under the high-speed maneuvering mode consists of six action values: the solar power generation module generating power; the solar power generation module closed; the ocean current energy power generation module generating power; the ocean current energy power generation module closed; the battery pack module charging; and the battery pack module discharging.

Step 313, constructing the action policy network model of the simulated manta ray submersible under the benthic residence mode:

Step 3131, constructing the loss function of the simulated manta ray submersible under the benthic residence mode. The submersible uses this mode for long-duration lurking tasks. Only ocean current energy is harvested for charging in this mode, and maneuverability matters less. The corresponding loss function under the benthic residence mode in this embodiment is therefore:

where the expression gives the loss value under the benthic residence mode in terms of the furthest sailing distance of the simulated manta ray submersible in that mode.

Step 3132, based on the loss function of the benthic residence mode, constructing the initial action policy network model of the simulated manta ray submersible for that mode.

Step 3133, training the initial action policy network model under the benthic residence mode to obtain the corresponding action policy network model. Specifically, in this embodiment, the initial action policy network model for this mode takes four inputs: the state of charge of the battery pack module, the light intensity, the speed, and the action taken by the submersible at the current moment. Its output is the action to be taken at the next moment. The initial model is trained to obtain the trained action policy network model for the benthic residence mode.

In this embodiment, the action taken by the simulated manta ray submersible under the benthic residence mode is expressed as follows:

where the action taken by the submersible under the benthic residence mode consists of five action values: the solar power generation module closed; the ocean current energy power generation module generating power; the ocean current energy power generation module closed; the battery pack module charging; and the battery pack module discharging.

S4, obtaining the target action policy network model;

The action policy network model selects the action that the simulated manta ray submersible executes, and the submersible then interacts with its environment. A model of the submersible computes how its state changes after a given action. For each modal operating condition, observation variables suited to the task are selected. A reward evaluation system then scores the submersible from these variables and the action taken. Points are deducted if the state after the action departs from the ideal condition or violates a constraint. Selecting the action policy network model is therefore a continuous optimization process: the best action is sought by trial and error, and the behavior is improved to maximize the cumulative return. The best action (the highest-scoring policy) is searched for with the reinforcement learning algorithm shown in fig. 6. The action at the initial moment is randomly initialized, and because it has no learned value, its evaluation is 0. Optimizing the action policy network model under each modal operating condition with the reinforcement learning algorithm to obtain the target action policy network model comprises the following steps:
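The patent does not detail the algorithm of fig. 6. The sketch below only illustrates the trial-and-error idea, using a deliberately simplified, stateless substitute: epsilon-greedy action-value estimation, a multi-armed bandit technique. Every estimate starts at 0 to mirror "the evaluation corresponding to the initial action is 0", so the first greedy choice is random. The class name, `epsilon` and the seed are hypothetical.

```python
import random
from typing import List

class EpsilonGreedySelector:
    """Trial-and-error action selection; every action's evaluation starts at 0."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 0):
        self.q: List[float] = [0.0] * n_actions  # running mean reward per action
        self.n: List[int] = [0] * n_actions      # times each action was tried
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))  # explore: random action
        best = max(self.q)
        # exploit: highest-scoring action, ties broken at random
        return self.rng.choice([i for i, q in enumerate(self.q) if q == best])

    def update(self, action: int, reward: float) -> None:
        """Incremental mean: q <- q + (reward - q) / n."""
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]
```

A full treatment would condition on the observed state (state of charge, light intensity, speed, current action), as the action policy network does. The stateless version only shows how scoring and trial and error steer the choice.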

Step 41, when optimizing the action policy network model under the long-term self-sustaining mode, the reward function of the reinforcement learning algorithm is:

$$R = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} = r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots$$

where $R$ is the reward function of the reinforcement learning algorithm when optimizing the action policy network model under the long-term self-sustaining mode; $\gamma$ is the discount factor and $\gamma^{k}$ is its $k$-th power; $r_t$ is the reward value obtained after the action taken by the simulated manta ray submersible in the system state at time $t$ interacts with the environment; $k$ indexes the $k$-th time step; and $r_{t+k}$ is the corresponding reward value at time $t+k$.
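Over a finite episode, the discounted cumulative return $\sum_k \gamma^k r_{t+k}$ can be computed as below. Truncating the infinite sum to the recorded rewards, and accumulating backwards via $G = r + \gamma G$ (which gives the same value), are implementation choices rather than details from the patent.

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = sum_k gamma**k * r_{t+k}, accumulated backwards as G = r + gamma * G."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three unit rewards with γ = 0.5 give 1 + 0.5 + 0.25 = 1.75.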

When the simulated manta ray submersible operates under the long-term self-sustaining mode, the reward value obtained after the action taken interacts with the environment is expressed as:

The expression defines the reward value at time t in terms of four quantities: the trend of the sailing distance of the submersible under the long-term self-sustaining mode, a reward term, a detail penalty term, and the state of charge (SOC) of the battery pack module. An unchanged sailing distance under this mode is treated as a deduction item, which keeps the action policy network updating and prevents it from being trapped in a local optimum. An SOC detail penalty is added as well.
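The exact reward expression is not recoverable from the text. The function below is one hypothetical shape that follows the two stated rules: a deduction when the sailing distance does not increase, and an SOC detail penalty. The SOC band (0.2 to 0.9), the weights and the function name are illustrative assumptions only.

```python
def self_sustain_reward(delta_distance: float, soc: float,
                        bonus: float = 1.0, stall_penalty: float = 1.0,
                        soc_low: float = 0.2, soc_high: float = 0.9,
                        soc_weight: float = 5.0) -> float:
    """Hypothetical reward for the long-term self-sustaining mode.

    All thresholds and weights are assumptions, not values from the patent.
    """
    # Reward term if the sailing distance grew; deduction if it did not change (or shrank).
    r = bonus if delta_distance > 0 else -stall_penalty
    # SOC detail penalty, proportional to how far SOC leaves the assumed safe band.
    if soc < soc_low:
        r -= soc_weight * (soc_low - soc)
    elif soc > soc_high:
        r -= soc_weight * (soc - soc_high)
    return r
```

The benthic residence reward, which uses the same kinds of terms, could follow the same pattern with its own thresholds.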

Step 42, when optimizing the action policy network model under the high-speed maneuvering mode, the reward function of the reinforcement learning algorithm is:

$$R = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} = r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots$$

where $R$ is the reward function of the reinforcement learning algorithm when optimizing the action policy network model under the high-speed maneuvering mode; $\gamma$ is the discount factor; $r_t$ is the reward value obtained after the action taken by the simulated manta ray submersible in the system state at time $t$ interacts with the environment; $k$ indexes the $k$-th time step; and $r_{t+k}$ is the corresponding reward value at time $t+k$.

When the simulated manta ray submersible operates under the high-speed maneuvering mode, the reward value obtained after the action taken interacts with the environment is expressed as:

The expression defines the reward value at time t in terms of the trend of the loss function of the corresponding action policy network model under the high-speed maneuvering mode.

Step 43, when optimizing the action policy network model under the benthic residence mode, the reward function of the reinforcement learning algorithm is:

$$R = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} = r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots$$

where $R$ is the reward function of the reinforcement learning algorithm when optimizing the action policy network model under the benthic residence mode; $\gamma$ is the discount factor; $r_t$ is the reward value obtained after the action taken by the simulated manta ray submersible in the system state at time $t$ interacts with the environment; $k$ indexes the $k$-th time step; and $r_{t+k}$ is the corresponding reward value at time $t+k$.

When the simulated manta ray submersible operates under the benthic residence mode, the reward value obtained after the action taken interacts with the environment is expressed as:

The expression defines the reward value at time t in terms of four quantities: the trend of the sailing distance of the submersible under the benthic residence mode, a reward term, a detail penalty term, and the state of charge of the battery pack module.

Optimizing the action policy network model of each modal operating condition with the reinforcement learning algorithm yields the corresponding target action policy network model for each mode.

S5, predicting the action the simulated manta ray submersible takes at the next moment under the current modal operating condition, and controlling the submersible accordingly.

Step 51, when the simulated manta ray submersible selects the long-term self-sustaining mode: the action policy selection network takes four inputs: the state of charge of the battery pack module, the light intensity, the speed, and the action currently taken by the submersible. With maximum sailing distance as its objective, it selects the execution policy with the highest cumulative return under the corresponding reward mechanism and outputs the action to be taken at the next moment. This cycle repeats until the mode ends and the submersible switches to another mode.

Step 52, when the simulated manta ray submersible selects the high-speed maneuvering mode: the action policy selection network takes four inputs: the state of charge of the battery pack module, the light intensity, the speed, and the action currently taken. With load dynamic response capability as its objective, it selects the execution policy with the highest cumulative return under the corresponding reward mechanism and outputs the action a_{t+1} to be taken at the next moment. This cycle repeats until the mode ends and the submersible switches to another mode.

Step 53, when the simulated manta ray submersible selects the benthic residence mode: the action policy selection network takes four inputs: the state of charge of the battery pack module, the light intensity, the speed, and the action currently taken. With maximum sailing distance as its objective, it selects the execution policy with the highest cumulative return under the corresponding reward mechanism and outputs the action to be taken at the next moment. This cycle repeats until the mode ends and the submersible switches to another mode.

Step 54, controlling the simulated manta ray submersible according to the action predicted for the next moment under the current modal operating condition.
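The run-time loop of Steps 51 to 54 can be sketched as a per-mode policy lookup that feeds each predicted action back into the next input. The state layout (SOC, light intensity, speed, current action index), the `step` transition callback, the mode names, and the fixed step count standing in for "until the mode ends" are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical state layout: (SOC, light intensity, speed, current action index).
State = Tuple[float, float, float, int]
Policy = Callable[[State], int]

def run_mode(policies: Dict[str, Policy], mode: str, state: State,
             step: Callable[[State, int], State], n_steps: int) -> List[int]:
    """Apply the target policy of the active mode until the mode ends.

    `step` stands in for executing the action on the vehicle and observing the
    new state; n_steps stands in for the end of the mode.
    """
    policy = policies[mode]          # target action policy network for this mode
    actions: List[int] = []
    for _ in range(n_steps):
        a_next = policy(state)       # action a_{t+1} for the next moment
        actions.append(a_next)
        state = step(state, a_next)  # control the vehicle and observe the result
    return actions
```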

The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention falls within its scope of protection.

Claims

1. An energy control strategy of a multi-source energy harvesting and storage system of a simulated manta ray submersible, characterized by comprising:

2. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the loss function under the long-term self-sustaining mode is expressed as:

3. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the loss function under the high-speed maneuvering mode is expressed as:

where P1 denotes the output power of the solar power generation module and P3 denotes the output power of the battery pack module.

4. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the loss function under the benthic residence mode is expressed as:

5. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the step of constructing the action policy network model of the simulated manta ray submersible under each modal operating condition comprises:

6. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein, when the action policy network model under each modal operating condition is optimized, the reward function of the reinforcement learning algorithm is expressed as:

where γ denotes the discount factor and k denotes the k-th time step.

7. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 6, wherein the reward value is obtained as follows when the action taken by the simulated manta ray submersible, in the system state under each modal operating condition, interacts with the environment:

where SOC denotes the state of charge of the battery pack module.

8. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the action taken by the simulated manta ray submersible under the long-term self-sustaining mode is expressed as:

9. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the action taken by the simulated manta ray submersible under the high-speed maneuvering mode is expressed as:

10. The energy control strategy of the multi-source energy harvesting and storage system of the simulated manta ray submersible according to claim 1, wherein the action taken by the simulated manta ray submersible under the benthic residence mode is expressed as: