Deep Reinforcement Learning for Charging Scheduling of Electric Vehicles Considering Distribution Network Voltage Stability
Figure 1. The collaborative EV charging and voltage control framework.
Figure 2. Structure of the proposed DDPG algorithm.
Figure 3. The architecture of the designed policy network.
Figure 4. Schematic diagram of the framework for offline training and online operation.
Figure 5. Modified IEEE 33-bus distribution network; the numbers on the feeder (0–32) indicate node numbers.
Figure 6. Comparison of node voltage between the coordinated solution and the dispatch-only solution. (a) Coordinated solution of Case 1; (b) dispatch-only solution of Case 1; (c) coordinated solution of Case 2; (d) dispatch-only solution of Case 2.
Figure 7. Comparison of the average cumulative voltage violation during training for different learning algorithms. (a) Average CVV of Case 1; (b) average CVV of Case 2.
Figure 8. Control results for the OLTC and SCB. (a) Results of Case 1; (b) results of Case 2.
Figure 9. Comparison of the cumulative cost of different scheduling methods on the test set. (a) Results of Case 1; (b) results of Case 2.
Figure 10. Comparison of the average reward during training for different learning algorithms. (a) Results of Case 1; (b) results of Case 2.
Figure 11. Operation results of DDPG for the IEEE-33 node system. (a) Active and reactive power generation of DG1 and DG2; (b) charging/discharging power and energy status of the EVs.
Abstract
1. Introduction
- A time-independent two-layer coordinated EV charging and voltage control framework is proposed to minimize EV charging costs while stabilizing the voltage of the distribution network.
- An MDP with unknown transition probabilities is formulated for the EV charging problem with DN voltage stabilization. The reward function is designed to balance the EV charging objective against the voltage stability objective.
- The model-free DDPG algorithm is introduced to solve the coordinated optimization problem, with a DNN-based policy network designed to output hybrid continuous scheduling signals and discrete control signals.
2. Materials and Methods
2.1. Modelling of DN System
2.1.1. Controllable Units in the Distribution Network
1. EVs
2. Controllable DGs
3. OLTCs and VRs
4. SCBs
2.1.2. Operational Constraints of the DN
2.1.3. MDP Model
1. State
2. Action
3. Reward function
4. Objective
2.2. Deep Reinforcement Learning Solution
2.2.1. DRL-Based Approach
2.2.2. Design of the Parameterized Policy Network
2.2.3. Practical Implementation
Algorithm 1: DDPG-Based Learning Algorithm
1: Initialize the weights of the critic network and the actor network
2: Initialize the weights of the corresponding target networks
3: Initialize the experience replay buffer
4: for each training episode do
5:   Receive the initial observation state
6:   for each time step do
7:     Choose an action and run the power-flow simulation in pandapower
8:     Observe the reward and the next state
9:     Store the transition in the replay buffer
10:    Sample a random minibatch of transitions from the replay buffer
11:    Compute the target value according to Equation (26)
12:    Update the critic network parameters by minimizing the loss in Equation (27)
13:    Update the actor policy using the sampled policy gradient in Equation (28)
14:    Softly update the target networks with the updated critic and actor parameters
15:  end for
16: end for
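Lines 11 and 14 of Algorithm 1 can be sketched in a few lines; the helper names below, together with `tau = 0.01` and `gamma = 0.95` taken from the training parameters, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.01):
    """Polyak-average the online weights into the target weights
    (the soft target-network update in Algorithm 1, line 14)."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

def td_target(reward, next_q, gamma=0.95, done=False):
    """TD target y = r + gamma * Q'(s', mu'(s')) used in the critic loss
    (the target value of Equation (26))."""
    return reward + (0.0 if done else gamma * next_q)

# One update with the paper's tau = 0.01 and gamma = 0.95:
target = soft_update([np.zeros(3)], [np.ones(3)], tau=0.01)
y = td_target(reward=-2.0, next_q=10.0, gamma=0.95)
```

Because the target networks move only a fraction `tau` per step, the critic's regression target changes slowly, which is what stabilizes DDPG training.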
Algorithm 2: Online Running Algorithm
Input: system state
Output: EV charging/discharging schedule and voltage control signals
1: for each time step do
2:   Obtain historical information and the EV charging demand
3:   Build the observation state according to Equation (13)
4:   Choose an action according to Equation (24) using the policy trained by Algorithm 1
5:   Output the EV charging/discharging schedule and the voltage control signals
6: end for
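The online loop of Algorithm 2 is a plain forward pass per time step, with no further learning. A minimal sketch follows; `policy` and `get_state` are hypothetical stand-ins for the trained DDPG actor and the state-construction step of Equation (13).

```python
def run_online(policy, get_state, horizon=24):
    """Online operation (Algorithm 2): at each step build the observation
    state, query the trained policy once, and emit the resulting EV
    schedule / voltage control signals."""
    signals = []
    for t in range(horizon):
        s = get_state(t)   # observation state for time step t
        a = policy(s)      # EV schedule + voltage control action
        signals.append(a)
    return signals

# Stub policy and state source, just to show the control flow:
out = run_online(policy=lambda s: -s, get_state=lambda t: float(t), horizon=4)
```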
3. Results and Discussion
3.1. IEEE-33 Node System and Parameter Settings
3.2. Simulation Comparison of Voltage Control Performance
3.3. Simulation Comparison of Cost Reduction Performance
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Nomenclature
Abbreviations

| Abbreviation | Full Term |
|---|---|
| EV | Electric vehicle |
| DN | Distribution network |
| DRL | Deep reinforcement learning |
| DDPG | Deep deterministic policy gradient |
| MDP | Markov decision process |
| V2G | Vehicle-to-grid |
| DG | Distributed generator |
| VVC | Volt/Var control |
| RES | Renewable energy source |
| OLTC | On-load tap changer |
| VR | Voltage regulator |
| CB | Capacitor bank |
| MPC | Model predictive control |
| ANN | Artificial neural network |
| DR | Demand response |
| DNN | Deep neural network |
| CC | Central controller |
| SOC | State of charge |
| WT | Wind turbine |
| PV | Photovoltaic |
| DQN | Deep Q-network |
| DPG | Deterministic policy gradient |
| MLP | Multilayer perceptron |
| LMP | Locational marginal price |
| SAC | Soft actor–critic |
| CVV | Cumulative voltage violation |
| CPC | Constant power charge |
| TOU | Time of use |
| PSO | Particle swarm optimization |
| NVC | No voltage control |
Subscripts and Superscripts

| Symbol | Meaning |
|---|---|
| | Index of node |
| | Index of branch |
| | Index of time slot |
| | Charging |
| | Discharging |
| | Arrival time of EV |
| | Departure time of EV |
| | Substation |
| | Load demand |
| | Net load demand |
| | Main grid |
| | Voltage |
| | Critic network |
| | Actor network |

Variables

| Symbol | Meaning |
|---|---|
| | Active power |
| | State of charge |
| | Reactive power |
| | Electricity price |
| | Tap position of OLTC/VR |
| | Number of SCB units in operation |
| | Nodal voltage |
| | Branch current |
| | State of DN |
| | Action of policy |
| | Reward |
| | Parameters of actor network |
| | Parameters of critic network |
| | Discounted state visitation distribution |
| | Stochastic behavior policy |

Sets

| Symbol | Meaning |
|---|---|
| | Set of time slots |
| | Set of nodes |
| | Set of branches |

Parameters

| Symbol | Meaning |
|---|---|
| | Charging/discharging efficiency |
| | Apparent power of substation |
| | Weight coefficient |
| | Cost coefficient of DG (USD/kWh²) |
| | Cost coefficient of DG (USD/kWh) |
| | Cost coefficient of DG (USD/h) |
| | Penalty coefficient |
| | Discount factor |
| | Maximum number of training episodes |
| | Capacity of replay buffer |
| | Batch size |
| | Soft update factor |
| | Interval of one time step |
| | Number of time steps |
References
- Awad, A.S.A.; Shaaban, M.F.; Fouly, T.H.M.E.; El-Saadany, E.F.; Salama, M.M.A. Optimal Resource Allocation and Charging Prices for Benefit Maximization in Smart PEV-Parking Lots. IEEE Trans. Sustain. Energy 2017, 8, 906–915.
- Revankar, S.R.; Kalkhambkar, V.N. Grid integration of battery swapping station: A review. J. Energy Storage 2021, 41, 102937.
- Wan, Z.; Li, H.; He, H.; Prokhorov, D. Model-Free Real-Time EV Charging Scheduling Based on Deep Reinforcement Learning. IEEE Trans. Smart Grid 2019, 10, 5246–5257.
- Cao, Y.; Wang, H.; Li, D.; Zhang, G. Smart Online Charging Algorithm for Electric Vehicles via Customized Actor-Critic Learning. IEEE Internet Things J. 2022, 9, 684–694.
- Revankar, S.R.; Kalkhambkar, V.N.; Gupta, P.P.; Kumbhar, G.B. Economic Operation Scheduling of Microgrid Integrated with Battery Swapping Station. Arab. J. Sci. Eng. 2022, 47, 13979–13993.
- The eGallon: How Much Cheaper Is It to Drive on Electricity? Available online: https://www.energy.gov/articles/egallon-how-much-cheaper-it-drive-electricity (accessed on 28 October 2022).
- Tang, W.; Bi, S.; Zhang, Y.J. Online Charging Scheduling Algorithms of Electric Vehicles in Smart Grid: An Overview. IEEE Commun. Mag. 2016, 54, 76–83.
- Moghaddass, R.; Mohammed, O.A.; Skordilis, E.; Asfour, S. Smart Control of Fleets of Electric Vehicles in Smart and Connected Communities. IEEE Trans. Smart Grid 2019, 10, 6883–6897.
- Patil, H.; Kalkhambkar, V.N. Grid Integration of Electric Vehicles for Economic Benefits: A Review. J. Mod. Power Syst. Clean Energy 2021, 9, 13–26.
- Li, H.; Wan, Z.; He, H. Constrained EV Charging Scheduling Based on Safe Deep Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 2427–2439.
- Deng, W.; Pei, W.; Wu, Q.; Kong, L. Study on Stability of Low-voltage Multi-terminal DC System Under Electric Vehicle Integration. In Proceedings of the 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2), Wuhan, China, 30 October–1 November 2020; pp. 1913–1918.
- Li, H.; He, H. Learning to Operate Distribution Networks With Safe Deep Reinforcement Learning. IEEE Trans. Smart Grid 2022, 13, 1860–1872.
- Cao, D.; Hu, W.; Zhao, J.; Huang, Q.; Chen, Z.; Blaabjerg, F. A Multi-Agent Deep Reinforcement Learning Based Voltage Regulation Using Coordinated PV Inverters. IEEE Trans. Power Syst. 2020, 35, 4120–4123.
- Hu, D.; Ye, Z.; Gao, Y.; Ye, Z.; Peng, Y.; Yu, N. Multi-agent Deep Reinforcement Learning for Voltage Control with Coordinated Active and Reactive Power Optimization. IEEE Trans. Smart Grid 2022, 13, 4873–4886.
- Pourjafari, E.; Reformat, M. A Support Vector Regression Based Model Predictive Control for Volt-Var Optimization of Distribution Systems. IEEE Access 2019, 7, 93352–93363.
- Hu, Y.; Liu, W.; Wang, W. A Two-Layer Volt-Var Control Method in Rural Distribution Networks Considering Utilization of Photovoltaic Power. IEEE Access 2020, 8, 118417–118425.
- Savasci, A.; Inaolaji, A.; Paudyal, S. Two-Stage Volt-VAr Optimization of Distribution Grids With Smart Inverters and Legacy Devices. IEEE Trans. Ind. Appl. 2022, 58, 5711–5723.
- Li, S.; Sun, Y.; Ramezani, M.; Xiao, Y. Artificial Neural Networks for Volt/VAR Control of DER Inverters at the Grid Edge. IEEE Trans. Smart Grid 2019, 10, 5564–5573.
- Wang, W.; Yu, N.; Gao, Y.; Shi, J. Safe Off-Policy Deep Reinforcement Learning Algorithm for Volt-VAR Control in Power Distribution Systems. IEEE Trans. Smart Grid 2020, 11, 3008–3018.
- Sun, X.; Qiu, J. Hierarchical Voltage Control Strategy in Distribution Networks Considering Customized Charging Navigation of Electric Vehicles. IEEE Trans. Smart Grid 2021, 12, 4752–4764.
- Kesler, M.; Kisacikoglu, M.C.; Tolbert, L.M. Vehicle-to-Grid Reactive Power Operation Using Plug-In Electric Vehicle Bidirectional Offboard Charger. IEEE Trans. Ind. Electron. 2014, 61, 6778–6784.
- Zheng, Y.; Song, Y.; Hill, D.J.; Meng, K. Online Distributed MPC-Based Optimal Scheduling for EV Charging Stations in Distribution Systems. IEEE Trans. Ind. Inform. 2019, 15, 638–649.
- Nazir, N.; Almassalkhi, M. Voltage Positioning Using Co-Optimization of Controllable Grid Assets in Radial Networks. IEEE Trans. Power Syst. 2021, 36, 2761–2770.
- Yong, J.Y.; Ramachandaramurthy, V.K.; Tan, K.M.; Selvaraj, J. Experimental Validation of a Three-Phase Off-Board Electric Vehicle Charger With New Power Grid Voltage Control. IEEE Trans. Smart Grid 2018, 9, 2703–2713.
- Patil, H.; Kalkhambkar, V.N. Charging cost minimisation by centralised controlled charging of electric vehicles. Int. Trans. Electr. Energy Syst. 2020, 30, e12226.
- Dabbaghjamanesh, M.; Moeini, A.; Kavousi-Fard, A. Reinforcement Learning-Based Load Forecasting of Electric Vehicle Charging Station Using Q-Learning Technique. IEEE Trans. Ind. Inform. 2021, 17, 4229–4237.
- Jahangir, H.; Gougheri, S.S.; Vatandoust, B.; Golkar, M.A.; Golkar, M.A.; Ahmadian, A.; Hajizadeh, A. A Novel Cross-Case Electric Vehicle Demand Modeling Based on 3D Convolutional Generative Adversarial Networks. IEEE Trans. Power Syst. 2022, 37, 1173–1183.
- Jahangir, H.; Gougheri, S.S.; Vatandoust, B.; Golkar, M.A.; Ahmadian, A.; Hajizadeh, A. Plug-in Electric Vehicle Behavior Modeling in Energy Market: A Novel Deep Learning-Based Approach With Clustering Technique. IEEE Trans. Smart Grid 2020, 11, 4738–4748.
- Velamuri, S.; Cherukuri, S.H.C.; Sudabattula, S.K.; Prabaharan, N.; Hossain, E. Combined Approach for Power Loss Minimization in Distribution Networks in the Presence of Gridable Electric Vehicles and Dispersed Generation. IEEE Syst. J. 2022, 16, 3284–3295.
- Li, S.; Hu, W.; Cao, D.; Zhang, Z.; Huang, Q.; Chen, Z.; Blaabjerg, F. EV Charging Strategy Considering Transformer Lifetime via Evolutionary Curriculum Learning-Based Multiagent Deep Reinforcement Learning. IEEE Trans. Smart Grid 2022, 13, 2774–2787.
- Javadi, M.S.; Gough, M.; Mansouri, S.A.; Ahmarinejad, A.; Nematbakhsh, E.; Santos, S.F.; Catalao, J.P.S. A two-stage joint operation and planning model for sizing and siting of electrical energy storage devices considering demand response programs. Int. J. Electr. Power Energy Syst. 2022, 138, 107912.
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21 June 2014; pp. 387–395.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale Machine Learning on Heterogeneous Systems. Available online: https://www.tensorflow.org/ (accessed on 22 August 2022).
- Thurner, L.; Scheidler, A.; Schafer, F.; Menke, J.; Dollichon, J.; Meier, F.; Meinecke, S.; Braun, M. pandapower—An Open-Source Python Tool for Convenient Modeling, Analysis, and Optimization of Electric Power Systems. IEEE Trans. Power Syst. 2018, 33, 6510–6521.
- Baran, M.E.; Wu, F.F. Network reconfiguration in distribution systems for loss reduction and load balancing. IEEE Trans. Power Deliv. 1989, 4, 1401–1407.
- Li, H.; Li, G.; Wang, K. Real-time Dispatch Strategy for Electric Vehicles Based on Deep Reinforcement Learning. Autom. Electr. Power Syst. 2020, 44, 161–167.
- OASIS. California ISO Open Access Same-Time Information System. Available online: http://oasis.caiso.com/mrioasis/logon.do (accessed on 9 September 2021).
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 2018.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
| Layer | Output Dimension |
|---|---|
| Input layer (state space) | |
| Fully connected layer + ReLU (256 units) | 256 |
| Fully connected layer + ReLU (128 units) | 128 |
| Fully connected layer + ReLU (64 units) | 64 |
| Fully connected layer + tanh (action dimension) | |
| Round block and inverse-transform block | |
| Output of hybrid action | |
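The layer sequence in the table can be sketched as a single forward pass. The random weights, the state dimension of 10, and the choice of five actions with the last two entries discretized by the round block (e.g., OLTC tap and SCB count) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_policy(state, dims=(256, 128, 64), n_actions=5, n_discrete=2):
    """Forward pass matching the table: three ReLU hidden layers, a tanh
    output head, then a round block that discretizes the last n_discrete
    entries. Weights are random here; in practice they come from the
    trained DDPG actor."""
    x = state
    for d in dims:
        W = rng.standard_normal((x.size, d)) * 0.1
        x = np.maximum(W.T @ x, 0.0)           # fully connected + ReLU
    W_out = rng.standard_normal((x.size, n_actions)) * 0.1
    a = np.tanh(W_out.T @ x)                    # continuous actions in [-1, 1]
    # Round block: map the discrete-control entries to integer positions.
    a[-n_discrete:] = np.round(a[-n_discrete:] * 2)
    return a

a = mlp_policy(np.ones(10))
```

Keeping the whole head continuous and rounding only the discrete entries is what lets one deterministic actor emit the hybrid continuous/discrete action the paper describes.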
| Type | No. | Maximum Power (kW) | Minimum Power (kW) | a (USD/kWh²) | b (USD/kWh) | c (USD/h) |
|---|---|---|---|---|---|---|
| DG | 1 | 300 | 100 | 0.0175 | 1.75 | 0 |
| DG | 2 | 400 | 100 | 0.0625 | 1 | 0 |
| RES | WT1–2 | 15 | 0 | – | – | – |
| RES | PV1–2 | 8 | 0 | – | – | – |
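The coefficient units in the table suggest the usual quadratic generation-cost form over the energy produced in a time step. A minimal sketch follows; the energy-based formulation is an assumption read off the units (USD/kWh², USD/kWh, USD/h), not stated explicitly here.

```python
def dg_cost(p_kw, a, b, c, dt_h=1.0):
    """Quadratic DG cost a*E^2 + b*E + c*dt for energy E = P*dt (kWh),
    using the coefficients from the table above (assumed form)."""
    e = p_kw * dt_h
    return a * e**2 + b * e + c * dt_h

# DG1 running at its 300 kW maximum for one hour:
cost_dg1 = dg_cost(300, a=0.0175, b=1.75, c=0.0)
```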
| Variable | Distribution | Boundary |
|---|---|---|
| Arrival time | | |
| Departure time | | |
| Initial SOC | | |
| Symbol | Parameter | Value |
|---|---|---|
| | Training episodes | 3000 |
| | Learning rate of actor | 0.00001 |
| | Learning rate of critic | 0.001 |
| | Soft update coefficient | 0.01 |
| | Memory capacity | 25,000 |
| | Batch size | 48 |
| | Discount factor | 0.95 |
| | Trade-off factor | 0.5 |
| | Penalty of voltage fluctuation | 100,000 |
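The cumulative voltage violation (CVV) metric and the penalized reward can be sketched as follows. The 0.95–1.05 p.u. band and the reward form (negative cost minus penalty × CVV) are assumptions consistent with the penalty coefficient of 100,000 above, not the paper's exact Equation.

```python
import numpy as np

def cumulative_voltage_violation(v_pu, v_min=0.95, v_max=1.05):
    """Sum of per-node voltage-limit violations in p.u.; the CVV metric
    reported in the results. The 0.95-1.05 p.u. band is an assumed limit."""
    v = np.asarray(v_pu, dtype=float)
    return float(np.sum(np.maximum(v - v_max, 0.0) + np.maximum(v_min - v, 0.0)))

def reward(cost, cvv, penalty=100_000):
    """Negative operating cost minus a large violation penalty, mirroring
    the penalty coefficient from the table above (assumed reward shape)."""
    return -cost - penalty * cvv

# One node 0.01 p.u. above and one 0.01 p.u. below the band:
cvv = cumulative_voltage_violation([1.0, 1.06, 0.94])
```

The large penalty makes any violation dominate the charging-cost term, which is how a single scalar reward can trade off the two objectives.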
| Type | Time Period | Price (USD/kWh) |
|---|---|---|
| Valley | 1:00–8:00 | 0.295 |
| Peak | 9:00–12:00, 18:00–21:00 | 0.845 |
| Flat | 13:00–17:00, 22:00–24:00 | 0.56 |
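The TOU tariff above maps directly to a lookup function; the hour convention (hours 1–24 with inclusive band edges) is an assumption read from the table.

```python
def tou_price(hour):
    """Map an hour of day (1-24) to the TOU tariff from the table above."""
    if 1 <= hour <= 8:
        return 0.295                      # valley
    if 9 <= hour <= 12 or 18 <= hour <= 21:
        return 0.845                      # peak
    return 0.560                          # flat (13:00-17:00, 22:00-24:00)

# Full daily price curve for one EV charging horizon:
daily = [tou_price(h) for h in range(1, 25)]
```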
| Case | Metric | DDPG | DQN | SAC | PPO |
|---|---|---|---|---|---|
| Case 1 | Training (h) | 13.57 | 28.36 | 18.64 | 16.85 |
| Case 1 | Testing (s) | 0.0014 | 0.0016 | 0.0014 | 0.0015 |
| Case 2 | Training (h) | 14.85 | 40.46 | 20.81 | 18.72 |
| Case 2 | Testing (s) | 0.0024 | 0.0032 | 0.0026 | 0.0027 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, D.; Zeng, P.; Cui, S.; Song, C. Deep Reinforcement Learning for Charging Scheduling of Electric Vehicles Considering Distribution Network Voltage Stability. Sensors 2023, 23, 1618. https://doi.org/10.3390/s23031618