Article

Speed Optimization Control of a Permanent Magnet Synchronous Motor Based on TD3

Zuolei Hu, Yingjie Zhang, Ming Li and Yuhua Liao
1 College of Information Science and Engineering, Hunan University, Changsha 410000, China
2 School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
3 Department of Mechanical, Aerospace and Civil Engineering, The University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
Energies 2025, 18(4), 901; https://doi.org/10.3390/en18040901
Submission received: 13 January 2025 / Revised: 7 February 2025 / Accepted: 9 February 2025 / Published: 13 February 2025
(This article belongs to the Section F1: Electrical Power System)

Abstract

Permanent magnet synchronous motors (PMSMs) are widely used in industrial automation and electric vehicles due to their high efficiency and excellent dynamic performance. However, controlling PMSMs presents challenges such as parameter variations and system nonlinearities. This paper proposes a twin delayed deep deterministic policy gradient (TD3)-based energy-saving optimization control method for PMSM drive systems. The TD3 algorithm uses double networks, target policy smoothing regularization, and delayed actor network updates to improve training stability and accuracy. Simulation experiments under two operating conditions show that the TD3 algorithm outperforms traditional proportional–integral (PI) controllers and linear active disturbance rejection control (LADRC) controllers in terms of reference trajectory tracking, q-axis current regulation, and speed tracking error minimization. The results demonstrate the TD3 algorithm’s effectiveness in enhancing motor efficiency and system robustness, offering a novel approach to PMSM drive system control through deep reinforcement learning.

1. Introduction

PMSMs have gained significant traction in industrial automation and electric vehicle applications due to their exceptional efficiency, high power density, and superior dynamic performance [1,2]. The PMSM drive system represents a typical dual-timescale system. Among the most efficient control designs for PMSMs is the cascade structure, which incorporates a rapid inner loop responsible for armature current control while employing a slower outer loop intended to adjust angular velocity by generating appropriate current reference signals. However, achieving the desired control performance for PMSM applications presents challenges, including issues such as parameter variations, external disturbances, and system nonlinearity. In reference [3], it is pointed out that the permanent magnetic flux linkage can vary by 20% of its nominal value, while the stator resistance can vary by 200% of its nominal value. In practical applications, researchers often aim to employ a stable and highly accurate control strategy that can effectively achieve a rapid response and robustness in the presence of uncertain parameters. However, conventional PI controllers prove inadequate for tracking the outer loop velocity when motor parameter uncertainties arise. Consequently, researchers have explored diverse advanced control techniques to address the velocity servo problem, encompassing model predictive control (MPC), robust control, adaptive control, fuzzy control, disturbance rejection control, sliding mode control, prediction-based model-free control, and deep learning and reinforcement learning-based controls. Paper [4] proposes a robust model predictive current control method based on nonlinear extended state observation to enhance the control performance of PMSM in the presence of parameter variations. Paper [5] presents a robust adaptive model predictive speed control method based on a recurrent neural network to tackle the speed control problem of PMSMs under the conditions of mismatched parameters.
Paper [6] presents a universal control framework that utilizes an observer to estimate both the system state and disturbances, while establishing a predictive current controller based on an enhanced system model. Paper [7] introduces a motor-parameter-free model predictive voltage control strategy for PMSM drive systems. Its fundamental concept aims to reduce dependence on motor parameters in PMSM control, thus enhancing the robustness of this control strategy. Papers [8,9,10,11,12] employ the MPC algorithm to mitigate torque fluctuations in PMSM drive systems induced by interturn faults (ITFs); by incorporating an adaptive compensation current approach, these works simplify the control methods for PMSM drives under ITF conditions.
In the field of robust adaptive control, the compensator based on the extended state observer proposed in reference [13] effectively addresses the issue of excessively high switching gains required for disturbance rejection. Reference [14] introduced a system transformation method that effectively converts a PMSM system with current constraints into an unconstrained system, thereby streamlining the controller design process. Reference [15] proposed an observer scheme utilizing neural networks along with a sensorless robust optimal control approach to address the speed and current tracking challenges in partially unknown PMSM systems under disturbances and saturation voltages. The paper [16] integrates the adaptive integral sliding mode method and employs a self-regulation approach to adjust the amplitude of the sliding mode function as well as compensate for load disturbances, thereby enhancing the dynamic performance of the system. Papers [17,18,19] utilize deep reinforcement learning (DRL) to solve the control problem of PMSMs. By introducing artificial intelligence algorithms into the traditional parameter optimization process, a DRL model is constructed that can automatically optimize and adjust parameters in different application scenarios, thereby achieving optimal control effects in various environments.
In the domain of prediction-based control strategies, reference [20] introduces an advanced model-free active disturbance rejection deadbeat predictive current control approach specifically designed for PMSM. This approach is based on a data-driven methodology, aiming to address the issue of parameter mismatching in deadbeat predictive current control and improve the performance of PMSM control systems. The paper [21] introduces a model-free predictive current control drive system that incorporates an extended Kalman filter to address performance degradation in model predictive control due to variations in motor parameters. The paper [22] introduces a model-free predictive current control strategy for the drive system of PMSMs in electric vehicles. This innovative approach effectively mitigates performance limitations caused by inaccurate inertia estimation through real-time dynamic adjustment of inertia parameters. Paper [23] presents a novel speed control strategy that combines an adaptive speed controller with a radial basis function neural network for precise speed regulation of PMSM. This approach effectively mitigates the impact of parameter uncertainties and load variations on system performance. The paper [24] utilizes a linear–nonlinear switching active disturbance rejection control strategy to design speed and current controllers for PMSM in servo systems, aiming to improve the disturbance rejection performance of PMSM speed and current controllers. Paper [25] presents an optimal tracking control strategy for PMSM systems characterized by partially unknown dynamics, voltage saturation, and varying speed and current. By integrating an advanced feedforward control input, conventional velocity and current tracking challenges are redefined as novel optimal control problems within a cascaded framework. Experimental results demonstrate that both tracking and approximation errors are uniformly bounded.
In the vector control system of a three-phase PMSM, the conventional PI regulator is widely adopted as the speed controller due to its simplicity and robustness. However, the PMSM exhibits nonlinear dynamics and strong coupling among multiple variables. In the presence of external disturbances or variations in internal parameters, the traditional PI control method struggles to meet stringent control requirements. To improve the dynamic performance of the PMSM speed regulation system, it is crucial to implement a control strategy that remains insensitive to external disturbances and parameter changes while ensuring rapid response and high accuracy. Furthermore, achieving high-performance control for a PMSM requires precise rotor position and speed information within the magnetic field-oriented vector control framework. The use of mechanical sensors for this purpose, however, increases system cost, size, and weight, and imposes strict operating environment constraints. Sensorless control technology addresses these challenges by monitoring electrical signals within the motor windings and employing advanced algorithms to accurately estimate rotor position and speed, thereby enhancing the robustness and reliability of the PMSM vector control system. This paper introduces an energy-optimized speed control algorithm based on TD3 for PMSM drive systems. Building on the traditional double closed-loop control structure for PMSM speed and current, the TD3 algorithm is employed to train an intelligent agent aimed at optimizing the inner-loop current controller. In designing the reward function for this intelligent agent, both the speed tracking error of the PMSM and the energy consumption of the controller are taken into account. The performance of this optimized controller is then compared with that of PI and LADRC controllers. The main contributions are summarized as follows.
  • The TD3-based optimal control reduces the difficulty of designing a speed tracking controller for nonlinear PMSM.
  • Adding energy consumption optimization control to the traditional control objective of steadiness, accuracy, and speed effectively improves the efficiency of the motor.
  • The better generalization of the algorithm enables the motor to exhibit better control performance under different operating conditions.
The rest of this article is organized as follows. In Section 2, the paper details the modeling process of PMSM and clearly defines the control objectives. In Section 3, we introduce a PMSM speed tracking control algorithm utilizing TD3 and present an in-depth analysis of both the foundational environment and the implementation process of this algorithm. Section 4 analyzes and discusses the experimental results in detail. Finally, Section 5 provides a summary of the article.

2. Description of the Control Problem

2.1. Control Object Model

The motion equation of PMSM can be described as follows:
$$J\frac{d\omega_m}{dt} = T_e - T_L - B\omega_m \tag{1}$$
$$T_e = \frac{3}{2} p_n i_q \left[ i_d \left( L_d - L_q \right) + \varphi_f \right] \tag{2}$$
Among them, $\omega_m$, $J$, $B$, $p_n$, $T_e$, and $T_L$ represent the mechanical angular velocity, moment of inertia, damping coefficient, number of pole pairs, electromagnetic torque, and load torque of the motor, respectively. To facilitate controller design, PMSM models are commonly established in the synchronous rotating $dq$ coordinate system. The stator voltage equation can be expressed as Equation (3):
$$u_d = Ri_d + \frac{d\varphi_d}{dt} - \omega_e \varphi_q, \qquad u_q = Ri_q + \frac{d\varphi_q}{dt} + \omega_e \varphi_d \tag{3}$$
The stator magnetic flux equation is given by Equation (4):
$$\varphi_d = L_d i_d + \varphi_f, \qquad \varphi_q = L_q i_q \tag{4}$$
Substituting Equation (4) into Equation (3) yields the stator voltage equation shown in Equation (5):
$$u_d = Ri_d + L_d\frac{di_d}{dt} - \omega_e L_q i_q, \qquad u_q = Ri_q + L_q\frac{di_q}{dt} + \omega_e\left(L_d i_d + \varphi_f\right) \tag{5}$$
Among them, $u_d$, $u_q$ are the components of the stator voltage on the $dq$ axes, while $i_d$, $i_q$ are the components of the stator current on the $dq$ axes; $R$ represents the stator resistance; $\varphi_d$, $\varphi_q$ are the $d$- and $q$-axis components of the stator flux linkage; $\omega_e$ represents the electrical angular velocity; $L_d$, $L_q$ are the $dq$-axis inductance components; and $\varphi_f$ represents the permanent magnet flux linkage. In addition, it is important to note the relationships between variables given in Equation (6) when constructing a PMSM simulation model.
$$\omega_e = p_n \omega_m, \qquad N_r = \frac{30}{\pi}\omega_m, \qquad \theta_e = \int \omega_e \, dt \tag{6}$$
where $\omega_m$ represents the mechanical angular velocity of the motor (rad/s), and $N_r$ represents the motor speed (r/min). Common traditional vector control methods include $i_d = 0$ control and maximum torque-per-ampere control. Because the $dq$-axis inductances of a surface-mounted PMSM are equal, i.e., $L_d = L_q$, the $i_d = 0$ method is mainly applied to three-phase surface-mounted PMSMs. In this case, Equation (2) can be rewritten as Equation (7):
$$T_e = \frac{3}{2} p_n i_q \varphi_f \tag{7}$$
For surface-mounted three-phase PMSMs, $i_d = 0$ control and maximum torque-per-ampere control are therefore equivalent.
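To make the model concrete, the following minimal Python sketch integrates the surface-mounted PMSM equations above with a forward-Euler step. The electrical parameters are taken from Table 2; the inertia J, damping B, and step size dt are illustrative assumptions, since the paper does not report them.

```python
import numpy as np

# Electrical parameters from Table 2; J, B, and dt are illustrative assumptions.
R, Ld, Lq = 0.2930, 8.7678e-5, 7.7724e-5   # stator resistance (ohm), inductances (H)
phi_f, pn = 0.0046, 7                      # PM flux linkage (Wb), pole pairs
J, B = 1e-4, 1e-5                          # inertia (kg*m^2), damping (N*m*s) -- assumed

def pmsm_step(x, u, TL, dt=1e-6):
    """One forward-Euler step of the dq model, Equations (1), (5)-(7)."""
    i_d, i_q, wm = x            # dq currents (A), mechanical speed (rad/s)
    ud, uq = u                  # dq voltages (V)
    we = pn * wm                                         # electrical speed, Equation (6)
    did = (ud - R * i_d + we * Lq * i_q) / Ld            # d-axis current dynamics
    diq = (uq - R * i_q - we * (Ld * i_d + phi_f)) / Lq  # q-axis current dynamics
    Te = 1.5 * pn * phi_f * i_q                          # electromagnetic torque, Equation (7)
    dwm = (Te - TL - B * wm) / J                         # motion equation, Equation (1)
    return np.array([i_d + dt * did, i_q + dt * diq, wm + dt * dwm])
```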

2.2. Speed Loop Control

Assuming the motor starts with no load, i.e., $T_L = 0$, when adopting the $i_d = 0$ control strategy, the active damping is defined as in Equation (8):
$$i_q = i_q' - B_a \omega_m \tag{8}$$
Combining Equations (1) and (2) with Equation (8) yields Equation (9).
$$\frac{d\omega_m}{dt} = \frac{1.5\, p_n \varphi_f}{J}\left(i_q' - B_a \omega_m\right) - \frac{B}{J}\omega_m \tag{9}$$
Placing the pole of Equation (9) at the desired closed-loop bandwidth $\beta$ and applying the Laplace transform yields the transfer function from the q-axis current to the motor speed, as shown in Equation (10).
$$\omega_m(s) = \frac{1.5\, p_n \varphi_f / J}{s + \beta}\, i_q'(s) \tag{10}$$
Here, $\beta$ represents the desired bandwidth of the speed control loop. The active damping coefficient $B_a$, derived by comparing Equations (9) and (10), is presented in Equation (11).
$$B_a = \frac{\beta J - B}{1.5\, p_n \varphi_f} \tag{11}$$
The speed loop controller can be mathematically represented as Equation (12) when employing a conventional PI controller.
$$i_q^* = \left(K_p + \frac{K_i}{s}\right)\left(\omega_m^* - \omega_m\right) - B_a \omega_m \tag{12}$$
where $K_p = \frac{\beta J}{1.5\, p_n \varphi_f}$ and $K_i = \beta K_p$.
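As a worked sketch of Equations (11) and (12), the helper below computes the active damping coefficient and the speed-loop PI gains from a chosen bandwidth β. The numerical values in the example call are assumptions for illustration, not tuned values from the paper.

```python
def speed_loop_gains(beta, J, B, pn, phi_f):
    """Active damping and speed-loop PI gains, Equations (11)-(12)."""
    k = 1.5 * pn * phi_f
    Ba = (beta * J - B) / k   # active damping coefficient, Equation (11)
    Kp = beta * J / k         # proportional gain
    Ki = beta * Kp            # integral gain
    return Ba, Kp, Ki

# Example with an assumed bandwidth of 100 rad/s and illustrative J, B values
Ba, Kp, Ki = speed_loop_gains(beta=100.0, J=1e-4, B=1e-5, pn=7, phi_f=0.0046)
```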

2.3. Current Loop Control

The $dq$-axis current equations, corresponding to Equation (13), can be obtained by rewriting Equation (5).
$$\frac{di_d}{dt} = -\frac{R}{L_d} i_d + \frac{L_q}{L_d}\omega_e i_q + \frac{1}{L_d}u_d, \qquad \frac{di_q}{dt} = -\frac{R}{L_q} i_q - \frac{1}{L_q}\omega_e\left(L_d i_d + \varphi_f\right) + \frac{1}{L_q}u_q \tag{13}$$
Complete decoupling of $i_d$ and $i_q$ yields Equation (14).
$$u_{d0} = u_d + \omega_e L_q i_q, \qquad u_{q0} = u_q - \omega_e\left(L_d i_d + \varphi_f\right) \tag{14}$$
The substitution of Equation (5) into Equation (14) results in the derivation of Equation (15).
$$u_{d0} = R i_d + L_d\frac{di_d}{dt}, \qquad u_{q0} = R i_q + L_q\frac{di_q}{dt} \tag{15}$$
where $u_{d0}$, $u_{q0}$ are the $d$-axis and $q$-axis voltages after current decoupling, respectively. Using the conventional PI controller combined with the feed-forward decoupling control strategy, the $dq$-axis voltages can be obtained as shown in Equation (16).
$$u_d^* = \left(K_{pd} + \frac{K_{id}}{s}\right)\left(i_d^* - i_d\right) - \omega_e L_q i_q, \qquad u_q^* = \left(K_{pq} + \frac{K_{iq}}{s}\right)\left(i_q^* - i_q\right) + \omega_e\left(L_d i_d + \varphi_f\right) \tag{16}$$
where $K_{pd}$, $K_{pq}$ are the proportional gains and $K_{id}$, $K_{iq}$ are the integral gains of the $d$- and $q$-axis PI regulators, respectively.
The block diagram in Figure 1 illustrates the three-phase PMSM vector control employing the method with i d = 0 . It is evident from the figure that three-phase PMSM vector control primarily comprises three components, namely a speed loop controller, a current loop controller, and the space vector pulse width modulation (SVPWM) algorithm.
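The following is a minimal sketch of the feed-forward decoupled current PI controller of Equation (16), written as a discrete-time class. The gains and sample time are placeholders to be tuned; they are not values reported in the paper.

```python
class DecoupledCurrentPI:
    """dq-axis PI controller with feed-forward decoupling, Equation (16)."""
    def __init__(self, Kpd, Kid, Kpq, Kiq, Ld, Lq, phi_f, dt=1e-4):
        self.Kpd, self.Kid, self.Kpq, self.Kiq = Kpd, Kid, Kpq, Kiq
        self.Ld, self.Lq, self.phi_f, self.dt = Ld, Lq, phi_f, dt
        self.int_d = 0.0   # d-axis error integral
        self.int_q = 0.0   # q-axis error integral

    def step(self, id_ref, iq_ref, i_d, i_q, we):
        ed, eq = id_ref - i_d, iq_ref - i_q
        self.int_d += ed * self.dt
        self.int_q += eq * self.dt
        # PI terms plus the decoupling feed-forward of Equation (16)
        ud = self.Kpd * ed + self.Kid * self.int_d - we * self.Lq * i_q
        uq = self.Kpq * eq + self.Kiq * self.int_q + we * (self.Ld * i_d + self.phi_f)
        return ud, uq
```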

3. TD3 of PMSM

The TD3 algorithm proposes three key enhancement techniques based on the deep deterministic policy gradient (DDPG) algorithm, as delineated below [26]. (1) The double network refers to the utilization of two Critic networks, wherein the smaller one is employed for computing the target value in order to mitigate potential issues related to overestimation. (2) Target policy smoothing regularization incorporates perturbations to the action in the subsequent state while calculating the target value, aiming to enhance the precision of value evaluation. (3) Following multiple updates to the Critic network, subsequent updates are made to the Actor network to enhance training stability. Figure 2 shows the structure of the TD3 algorithm [27].
The Actor network updates by maximizing the cumulative expected return through a deterministic policy gradient, while both the Critic1 and Critic2 networks update by minimizing the error between the evaluated value and the target value using mean squared error. All target networks are updated using an exponential moving average (EMA) soft update method. During the training phase, a batch of data is sampled from the Replay Buffer with a specific batch size. The Replay Buffer plays a pivotal role in reinforcement learning: it enhances the efficiency of sample utilization and accelerates training. Additionally, by mitigating data correlation, it improves both the stability and generalization capability of the model. In this study, the capacity of the Replay Buffer is configured to 10,000. Assuming one sample is denoted as $(s, a, r, s', done)$, the update process for all networks is as follows. The updating procedure for the Critic1 and Critic2 networks entails employing the Target Actor network to calculate the action $a' = \mu'(s' \mid \theta^{\mu'})$ in state $s'$. Subsequently, target policy smoothing regularization is implemented, and $\varepsilon$ is incorporated into the target action $a'$. Here, $\varepsilon$ represents clipped Gaussian noise, which adheres to a normal distribution; $s'$ and $a'$ denote the state and action at the subsequent time step, respectively; $\mu'(\cdot)$ signifies the target Actor network function, while $\theta^{\mu'}$ represents the parameters of this target network.
$$a' = a' + \varepsilon, \qquad \varepsilon \sim \mathrm{clip}\left(\mathcal{N}(0, \sigma), -c, c\right) \tag{17}$$
Building upon the concept of dual networks, Equation (18) is employed for the computation of the target value y .
$$y = r + \gamma \min_{i=1,2} Q_i'\left(s', a' \mid \theta^{Q_i'}\right) \tag{18}$$
The gradient descent algorithm is then employed to minimize the mean squared error between the evaluated value and the target value (Equation (19)), thereby updating the parameters of the Critic1 and Critic2 networks.
$$L\left(\theta^{Q_i}\right) = \frac{1}{N}\sum \left(y - Q_i\left(s, a \mid \theta^{Q_i}\right)\right)^2, \qquad i = 1, 2 \tag{19}$$
After updating the Critic1 and Critic2 networks for $d$ steps, an update of the Actor network is initiated. The Actor network is employed to compute the action $a_{new}$ for state $s$, where $a_{new} = \mu(s \mid \theta^{\mu})$.
It is crucial to emphasize that there is no necessity to introduce noise after computing this action, as our objective is for the Actor network to update towards the direction of maximum value; adding noise here would be inconsequential. Subsequently, we employ either the Critic1 or Critic2 network to assess the state–action pair $(s, a_{new})$; assuming the Critic1 network is used, $q_{new} = Q_1(s, a_{new} \mid \theta^{Q_1})$. The Actor network is ultimately updated by employing a gradient ascent algorithm to maximize $q_{new}$.
The target networks are updated using a soft update method, wherein a smoothing factor $\tau$ is introduced to calculate the weighted average between the old parameters of the target network and the corresponding new parameters; the averaged parameters are then assigned to the target network (Equation (20)).
$$\theta^{Q_i'} = \tau \theta^{Q_i} + (1 - \tau)\,\theta^{Q_i'}, \quad i = 1, 2; \qquad \theta^{\mu'} = \tau \theta^{\mu} + (1 - \tau)\,\theta^{\mu'} \tag{20}$$
where $\tau \in (0, 1)$; the value of $\tau$ is usually set to 0.005.
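To make the update sequence of Equations (17)-(20) concrete, the PyTorch sketch below implements one TD3 update step: clipped-noise target actions, the clipped double-Q target, the critic loss, the delayed Actor update, and the soft target update. The network sizes, learning rates, noise parameters, and actor delay d are illustrative assumptions; only the smoothing factor τ = 0.006 follows Table 1.

```python
import torch
import torch.nn as nn

def mlp(n_in, n_out):
    # Small fully connected network; layer sizes are illustrative assumptions.
    return nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, n_out))

S, A = 6, 2   # state dim (the six state quantities below), action dim (ud, uq)
actor, actor_t = mlp(S, A), mlp(S, A)
q1, q2, q1_t, q2_t = mlp(S + A, 1), mlp(S + A, 1), mlp(S + A, 1), mlp(S + A, 1)
for tgt, src in ((actor_t, actor), (q1_t, q1), (q2_t, q2)):
    tgt.load_state_dict(src.state_dict())   # targets start as copies
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)

def td3_update(batch, step, gamma=0.99, sigma=0.2, c=0.5, d=2, tau=0.006):
    s, a, r, s2, done = batch   # tensors sampled from the replay buffer
    with torch.no_grad():
        eps = (torch.randn_like(a) * sigma).clamp(-c, c)     # Equation (17)
        a2 = actor_t(s2) + eps
        q_min = torch.min(q1_t(torch.cat([s2, a2], dim=1)),  # clipped double Q
                          q2_t(torch.cat([s2, a2], dim=1)))
        y = r + gamma * (1.0 - done) * q_min                 # Equation (18)
    sa = torch.cat([s, a], dim=1)
    loss_c = ((q1(sa) - y) ** 2).mean() + ((q2(sa) - y) ** 2).mean()  # Equation (19)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    if step % d == 0:                                        # delayed Actor update
        loss_a = -q1(torch.cat([s, actor(s)], dim=1)).mean() # gradient ascent on Q1
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()
        with torch.no_grad():
            for tgt, src in ((actor_t, actor), (q1_t, q1), (q2_t, q2)):
                for p_t, p in zip(tgt.parameters(), src.parameters()):
                    p_t.mul_(1 - tau).add_(tau * p)          # Equation (20), soft update
```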
In this paper, the current loop control algorithm of the traditional vector control double closed-loop system is upgraded to the TD3 reinforcement learning algorithm; the resulting structure, which combines Figure 1 and Figure 2, is illustrated in Figure 3. The agent operates within the current inner loop of the PMSM, regulating the voltage inputs to its d-axis and q-axis. During training, the state quantities selected are the q-axis current error, d-axis current error, integral of the q-axis current error, integral of the d-axis current error, motor speed $\omega_m$, and motor reference speed $\omega_m^*$; the actions are the q-axis and d-axis input voltages. The reward at each time step is
$$r_t = -\left(\alpha_1 i_{d,err}^2 + \alpha_2 i_{q,err}^2 + \beta \sum_j u_j(t-1)^2\right) \tag{21}$$
Here, $\alpha_1$, $\alpha_2$, $\beta$ are the coefficients of each term, $i_{d,err}$ is the d-axis current error, $i_{q,err}$ is the q-axis current error, and $u_j(t-1)$ are the actions from the previous time step.
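A direct transcription of this reward into code might look as follows; the coefficient values are placeholders, since the paper does not report the tuned α1, α2, and β.

```python
def reward(id_err, iq_err, u_prev, a1=1.0, a2=1.0, b=0.01):
    """Per-step reward of Equation (21): penalize dq current tracking errors
    and the control effort of the previous actions (ud, uq).
    a1, a2, b are placeholder coefficients, not the paper's values."""
    return -(a1 * id_err**2 + a2 * iq_err**2 + b * sum(u * u for u in u_prev))
```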
The hyperparameter settings for the training process are shown in Table 1.

4. Results and Analysis

The parameters of the motor utilized in the simulation presented in this paper are detailed in Table 2. The PI, LADRC, and TD3 controllers discussed in this article were implemented in MATLAB/Simulink to control the PMSM. The simulation was performed using MATLAB version R2024b on a personal computer equipped with a Windows 11 64-bit operating system, 16 GB of memory, and a CPU running at 3.7 GHz. A snapshot of the implemented software is shown in Figure 4.
In this article, simulation experiments and data analysis employ per-unit (pu) values. Per-unit values are a dimensionless indicator commonly utilized in power systems to represent the ratio between actual values and reference values. This enables relative comparisons among different systems, electrical quantities, or engineering parameters, facilitating quantitative analysis and research. The reference values used in this article can be found in Table 3.
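As a quick illustration of the per-unit convention with the bases of Table 3, a speed command of 0.5 pu corresponds to 0.5 × 3476 = 1738 RPM, and the conversion is a single division:

```python
SPEED_BASE_RPM, CURRENT_BASE_A, VOLTAGE_BASE_V = 3476, 21.4286, 13.8564  # Table 3

def to_pu(value, base):
    """Convert an actual quantity to per-unit by dividing by its base value."""
    return value / base

print(to_pu(1738, SPEED_BASE_RPM))  # -> 0.5 pu
```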
To evaluate the effectiveness and robustness of the algorithm, simulation experiments were conducted under two representative PMSM operating conditions.
Working condition 1: the PMSM load is constant, the given speed at the starting moment is 0.5 pu, and the reference speed trajectory is a sinusoidal signal, $Speed_{ref} = 0.3\sin(t) + 0.5$; the motor starts with no load. The experimental results are shown in Figure 5 and Figure 6. In Figure 5, Episode Q0 refers to the critic's estimate of the value of the initial state (i.e., $Q(S_0)$). The same applies to Figure 7 below.
As can be seen from Figure 5, the agent’s Episode reward value converges around −310 after approximately 100 episodes. An episode consists of a sequence of time steps. During each time step, the agent performs an action, receives a reward from the environment, and transitions to a new state. This sequence of interactions forms a complete episode. The duration of different episodes can vary significantly.
From Figure 6a, it is evident that the proposed TD3 algorithm achieves faster tracking of the reference speed trajectory with minimal overshoot compared to the other controllers. The traditional PI controller exhibits significant overshoot, whereas the LADRC controller requires a longer time to reach a steady state. From Figure 6b, it can be observed that the TD3-based controller stabilizes the q-axis current closer to the reference value more rapidly and with less fluctuation. In contrast, the PI controller shows substantial overshoot, and the LADRC controller experiences greater fluctuations. From Figure 6c, it is clear that the TD3 algorithm, as a reinforcement learning approach, demonstrates the smallest speed tracking error and superior stability.
Working condition 2: the PMSM speed is suddenly increased while the load is constant; the given speed is 0.5 pu at the starting moment and is suddenly changed to 0.8 pu at 1 s; the motor starts with no load. The experimental results are shown in Figure 7 and Figure 8.
As can be seen from Figure 7, the agent’s Episode reward value converges around −230 after approximately 100 episodes.
From Figure 8a, it is evident that the controller using the LADRC algorithm exhibits the smallest overshoot but requires the longest time to reach a steady state. In contrast, the RL algorithm achieves the fastest return to a steady state with significantly less overshoot compared to the PI algorithm. From Figure 8b, it can be observed that the torque fluctuation caused by a sudden change in rotational speed is also minimized. Furthermore, as shown in Figure 8c, the proposed algorithm demonstrates the smallest overall rotational speed tracking error. Table 4 summarizes the performance comparison of the three algorithms over a 1 s period.
From Table 4, it is evident that the RL algorithm demonstrates superior adaptability to sudden speed changes. It can rapidly return to a steady state, with the overshoot remaining within an acceptable range.
To verify the stability of the proposed algorithm, torque disturbances with abrupt changes were introduced under the two typical operating conditions. In condition 1, a torque load of 1 N·m was applied at 2 s, while in condition 2, the same load was applied at 1.5 s. The experimental results are presented in Figure 9. Under both conditions, the proposed method exhibited robust anti-interference performance.

5. Conclusions

The PMSM drive system is a typical dual-timescale system characterized by control challenges such as parameter variations, external disturbances, and nonlinearities. In this paper, we propose a TD3-based optimal control method aimed at minimizing the energy consumption of PMSM drive systems. We conducted simulation experiments under two typical operating conditions to validate the effectiveness and robustness of the proposed algorithm. When designing the algorithm, the energy loss of the controller was also taken into account; that is, a penalty on the control quantity was included in the agent's reward function. The experimental results demonstrate that the TD3 algorithm surpasses both the traditional PI controller and the linear active disturbance rejection control (LADRC) algorithm in terms of reference trajectory tracking accuracy, q-axis current regulation, and speed tracking error minimization. Furthermore, when the PMSM was subjected to torque mutations, the algorithm proposed in this paper exhibited superior control performance. This study offers a novel approach to PMSM drive system control, significantly enhancing motor efficiency and system robustness through the integration of deep reinforcement learning.

Author Contributions

Conceptualization, Y.Z.; methodology and writing—original draft preparation, Z.H.; software, M.L.; validation, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly funded by the National Key Research and Development Program of China, grant number 2019YFE0105300, and the Hunan Provincial Regional Joint Fund Project, grant number 2024JJ7179.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, X.; Xu, R.; Yao, W.; Gao, Y.; Sun, G.; Liu, J. Observer-Based Prescribed Performance Speed Control for PMSMs: A Data-Driven RBF Neural Network Approach. IEEE Trans. Ind. Inform. 2024, 20, 7502–7512. [Google Scholar] [CrossRef]
  2. Lin, X.; Wu, C.; Yao, W.; Liu, Z.; Shen, X.; Xu, R. Observer-Based Fixed-Time Control for Permanent-Magnet Synchronous Motors with Parameter Uncertainties. IEEE Trans. Power Electron. 2023, 38, 4335–4344. [Google Scholar] [CrossRef]
  3. Krishnan, R. Electric Motor Drives: Modeling, Analysis, and Control; Prentice-Hall: Upper Saddle River, NJ, USA, 2001. [Google Scholar]
  4. Zhang, Z.; Liu, Y.; Liang, X.; Guo, H.; Zhuang, X. Robust Model Predictive Current Control of PMSM Based on Nonlinear Extended State Observer. IEEE J. Emerg. Sel. Top. Power Electron. 2023, 11, 862–873. [Google Scholar] [CrossRef]
  5. Yang, C.; Meng, F.; Zhang, H.; Zhao, J.; Wang, H.; Zhou, L. Optimal Coordinated Control for Speed Tracking and Torque Synchronization of Rigidly Connected Dual-Motor Systems. IEEE/ASME Trans. Mechatron. 2023, 28, 2609–2620. [Google Scholar] [CrossRef]
  6. Li, X.; Tian, W.; Gao, X. A Generalized Observer-Based Robust Predictive Current Control Strategy for PMSM Drive System. IEEE Trans. Ind. Electron. 2022, 69, 1322–1332. [Google Scholar] [CrossRef]
  7. Wang, Y.; Fang, S.; Hu, J.; Huang, D. A Novel Active Disturbance Rejection Control of PMSM Based on Deep Reinforcement Learning for More Electric Aircraft. IEEE Trans. Energy Convers. 2023, 38, 1461–1470. [Google Scholar] [CrossRef]
  8. Wang, Y.; Fang, S.; Hu, J. Active Disturbance Rejection Control Based on Deep Reinforcement Learning of PMSM for More Electric Aircraft. IEEE Trans. Power Electron. 2023, 38, 406–416. [Google Scholar] [CrossRef]
  9. Jiang, X.; Yang, Y.; Fan, M.; Ji, A.; Xiao, Y.; Zhang, X. An Improved Implicit Model Predictive Current Control with Continuous Control Set for PMSM Drives. IEEE Trans. Transp. Electrif. 2022, 8, 2444–2455. [Google Scholar] [CrossRef]
  10. Xu, B.; Jiang, Q.; Ji, W.; Ding, S. An Improved Three-Vector-Based Model Predictive Current Control Method for Surface-Mounted PMSM Drives. IEEE Trans. Transp. Electrif. 2022, 8, 4418–4430. [Google Scholar] [CrossRef]
  11. Wang, X.; Wang, Z.; Gu, M.; Xu, Z.; Zou, Z.; Wang, W. Fault-Tolerant Control of Common Electrical Faults in Dual Three-Phase PMSM Drives Fed by T-Type Three-Level Inverters. IEEE Trans. Ind. Appl. 2021, 57, 481–491. [Google Scholar] [CrossRef]
  12. Sun, Z.; Deng, Y.; Wang, J.; Yang, T.; Wei, Z.; Cao, H. Finite Control Set Model-Free Predictive Current Control of PMSM with Two Voltage Vectors Based on Ultralocal Model. IEEE Trans. Power Electron. 2023, 38, 776–788. [Google Scholar] [CrossRef]
  13. Ma, Y.; Li, D.; Li, Y.; Yang, L. A Novel Discrete Compound Integral Terminal Sliding Mode Control With Disturbance Compensation for PMSM Speed System. IEEE/ASME Trans. Mechatron. 2022, 27, 549–560. [Google Scholar] [CrossRef]
  14. Zhang, J.; Ren, W.; Sun, X. Current-Constrained Adaptive Robust Control for Uncertain PMSM Drive Systems: Theory and Experimentation. IEEE Trans. Transp. Electrif. 2023, 9, 4158–4169. [Google Scholar] [CrossRef]
  15. Tan, L.; Cong, T.; Cong, D. Neural Network Observers and Sensorless Robust Optimal Control for Partially Unknown PMSM with Disturbances and Saturating Voltages. IEEE Trans. Power Electron. 2021, 36, 12045–12056. [Google Scholar] [CrossRef]
  16. Li, Z.; Wang, F.; Ke, D.; Li, J.; Zhang, W. Robust Continuous Model Predictive Speed and Current Control for PMSM with Adaptive Integral Sliding-Mode Approach. IEEE Trans. Power Electron. 2021, 36, 14398–14408. [Google Scholar] [CrossRef]
  17. Zhao, J.; Yang, C.; Gao, W.; Zhou, L. Reinforcement Learning and Optimal Control of PMSM Speed Servo System. IEEE Trans. Ind. Electron. 2023, 70, 8305–8313. [Google Scholar] [CrossRef]
  18. Attestog, S.; Senanayaka, J.; Khang, H.; Robbersmyr, K. Robust Active Learning Multiple Fault Diagnosis of PMSM Drives with Sensorless Control Under Dynamic Operations and Imbalanced Datasets. IEEE Trans. Ind. Inform. 2023, 19, 9291–9301. [Google Scholar] [CrossRef]
  19. Wang, Y.; Fang, S.; Hu, J.; Huang, D. Multiscenarios Parameter Optimization Method for Active Disturbance Rejection Control of PMSM Based on Deep Reinforcement Learning. IEEE Trans. Ind. Electron. 2023, 70, 10957–10968. [Google Scholar] [CrossRef]
  20. Wang, Y.; Fang, S.; Huang, D. An Improved Model-Free Active Disturbance Rejection Deadbeat Predictive Current Control Method of PMSM Based on Data-Driven. IEEE Trans. Power Electron. 2023, 38, 9606–9616. [Google Scholar] [CrossRef]
  21. Luo, L.; Huang, W.; Huang, M.; Fan, Q. Model-Free Predictive Current Control of Sensorless PMSM Drives with Extended Kalman Filter. In Proceedings of the 2023 26th International Conference on Electrical Machines and Systems (ICEMS), Zhuhai, China, 5–8 November 2023; pp. 2369–2374. [Google Scholar]
  22. Wei, Y.; Men, S.; Wei, Y.; Qi, H.; Wang, F. A Model-Free Predictive Current Control for PMSM Driving System of EV with Adjustable Low Inertia. In Proceedings of the 2022 IEEE Transportation Electrification Conference and Expo, Asia-Pacific (ITEC Asia-Pacific), Haining, China, 28–31 October 2022; pp. 1–7. [Google Scholar]
  23. Jie, H.; Zheng, G.; Zou, J.; Xin, X.; Guo, L. Speed Regulation Based on Adaptive Control and RBFNN for PMSM Considering Parametric Uncertainty and Load Fluctuation. IEEE Access 2020, 8, 190147–190159. [Google Scholar] [CrossRef]
  24. Lin, P.; Wu, Z.; Liu, K.; Sun, X. A Class of Linear–Nonlinear Switching Active Disturbance Rejection Speed and Current Controllers for PMSM. IEEE Trans. Power Electron. 2021, 36, 14366–14382. [Google Scholar] [CrossRef]
  25. Tan, L.; Pham, T. Optimal Tracking Control for PMSM With Partially Unknown Dynamics, Saturation Voltages, Torque, and Voltage Disturbances. IEEE Trans. Ind. Electron. 2022, 69, 3481–3491. [Google Scholar] [CrossRef]
  26. Saglam, B.; Mutlu, F.; Cicek, D.; Kozat, S. Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients. Neural. Process. Lett. 2024, 56, 80. [Google Scholar] [CrossRef]
  27. Yin, F.; Yuan, X.; Ma, Z.; Xu, X. Vector Control of PMSM Using TD3 Reinforcement Learning Algorithm. Algorithms 2023, 16, 404. [Google Scholar] [CrossRef]
Figure 1. Three-phase PMSM vector control block diagram.
Figure 2. Structure of the TD3 algorithm.
Figure 3. The dual closed-loop control structure of PMSM speed and current based on TD3.
Figure 4. Snapshot of the implemented software.
Figure 5. Training results for working condition 1.
Figure 6. Experimental results of a PMSM operating in working condition 1. (a) Rotor speed; (b) q-axis current; (c) speed tracking error.
Figure 7. Training results for working condition 2.
Figure 8. Experimental results of a PMSM operating in working condition 2. (a) Rotor speed; (b) q-axis current; (c) speed tracking error.
Figure 9. Experimental results of a PMSM operating under torque disturbances. (a) Working condition 1; (b) working condition 2.
Table 1. The hyperparameter settings for the training process.

Parameter                   Value
Sample Time (s)             0.0001
Discount Factor             0.99
Experience Buffer Length    10,000
Target Smoothing Factor     0.006
Target Update Frequency     10
Mini-Batch Size             256
Table 2. Parameters of the PMSM.

Parameter                       Symbol    Value
Rated current (A)               I_N       7.2600
Rated torque (N·m)              T_N       0.3471
Maximum speed (r/min)           N_max     4300
Number of pole pairs            N_p       7
Nominal phase resistance (Ω)    R_s       0.2930
Nominal d-axis inductance (H)   L_d       8.7678 × 10−5
Nominal q-axis inductance (H)   L_q       7.7724 × 10−5
Nominal permanent flux (Wb)     Φ_m       0.0046
Table 3. The reference values used in this article.

Name                Value
Speed base (RPM)    3476
Current base (A)    21.4286
Voltage base (V)    13.8564
Table 4. Performance comparison under different control algorithms.

Performance Parameter    PI       LADRC    RL
Settling time (s)        0.25     0.50     0.18
Rise time (s)            0.03     0.15     0.04
Undershoot (%)           11.25    2.38     5.01
