A Study on Path Planning for Curved Surface UV Printing Robots Based on Reinforcement Learning
List of Figures

- Figure 1. Five-axis machining process.
- Figure 2. TCP diagram of a UV printing robot.
- Figure 3. Complex surface path planning framework based on GAIL-SAC.
- Figure 4. GAIL-SAC network algorithm framework.
- Figure 5. Main printing path data generation diagram. (a) The generated CNC data. (b) The robot data obtained after conversion.
- Figure 6. Reacher experiment flowchart.
- Figure 7. Reward curve in the Reacher environment.
- Figure 8. Spray printing experiment flowchart.
- Figure 9. Comparison of algorithm rewards for the printing environment.
- Figure 10. Cartesian space path comparison diagram. (a) Comparison of algorithm paths in the X-Z plane. (b) Comparison of algorithm paths in the X-Y plane. (c) Comparison of algorithm paths in the Y-Z plane.
- Figure 11. Comparison of algorithm rewards for the printing environment.
- Figure 12. Comparison chart of joint space smoothness. (a-d) Velocity fluctuations in joint space for the conventional path planning algorithm, the Genetic Algorithm (GA), the Particle Swarm Optimization (PSO) algorithm, and the GAIL-SAC algorithm, respectively. (e-j) Comparative analysis of velocity fluctuations in joints 0-5 across algorithms.
- Figure 13. Comparison chart of joint space smoothness.
- Figure 14. Comparison between simulated robots and real robots. (a) Simulation environment. (b) Real environment.
- Figure 15. Spray printing process and spray printing effect diagram. (a-e) Robot movement positions 1-5 during the printing process. (f-j) Printing results at positions 1-5.
- Figure 16. Comparison chart of smoothness in Cartesian space. (a-d) Cartesian-space velocity variation for the conventional path planning algorithm, the GA algorithm, the PSO algorithm, and the GAIL-SAC algorithm, respectively. (e-g) Comparison of velocity variation along the X, Y, and Z axes for the different algorithms.
- Figure 17. Comparison chart of the standard deviation of velocity variation in Cartesian space.
Abstract
1. Introduction
- Path generation methods based on CAD/point cloud data mainly focus on geometric modeling [1] but lack optimization for trajectory smoothness and precision. Current research predominantly addresses spray gun modeling and coating thickness optimization, with relatively little focus on path accuracy and smoothness optimization. This may lead to suboptimal performance in high-speed, high-precision processing, negatively impacting printing quality.
- Traditional optimization algorithms, such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied in path optimization. However, these methods [2] tend to fall into local optima in high-dimensional optimization problems, exhibit slow convergence rates, and have limited capability in trajectory smoothness optimization.
- Reinforcement learning (RL), known for its adaptive optimization capabilities, has been applied to path optimization problems [3]. However, existing RL methods suffer from challenges such as convergence difficulties, low training efficiency, and inadequate trajectory smoothness optimization.
- A curved surface trajectory generation method based on CNC path transformation is proposed, which integrates CAD models and point cloud data. A conversion strategy from CNC machining paths to robot trajectories is designed to ensure path accuracy and operational feasibility.
- A robot motion accuracy model is established and formulated as an MDP, which is then solved with the SAC reinforcement learning algorithm. Applying reinforcement learning to joint-space trajectory planning ensures trajectory smoothness not only in joint space but also in Cartesian space, while maintaining high accuracy.
- A GAIL-SAC reinforcement learning framework is proposed, leveraging imitation learning to improve training efficiency and reinforcement learning to optimize trajectory precision, thereby enhancing the algorithm’s convergence speed and stability.
- Experimental validation demonstrates that the proposed method outperforms existing methods in terms of trajectory smoothness and accuracy while significantly reducing training time and improving the robustness of trajectory optimization.
2. Related Work
2.1. Curved Surface Path Generation Method
2.2. Traditional Optimization Algorithms (GA and PSO) and Their Limitations
2.3. Reinforcement Learning Path Optimization Method
3. Surface Path Planning Method for Spray Printing
3.1. Generate Main Path
3.2. Establishment of Robot Motion Accuracy Model
3.3. Markov Decision Process (MDP)
- S (State set): Represents all possible states the agent can be in. For instance, in path planning, the state could include the robot’s position, orientation, and other relevant information at a given time. Each state encapsulates the full information about the environment.
- A (Action set): Represents the set of actions the agent can take in each state. The action set can be discrete (e.g., move up, move down, move left, move right) or continuous (e.g., adjusting the robot’s speed or direction). Each action corresponds to a specific behavior.
- P(s′ | s, a) (State transition function): Describes the probability of transitioning from state s to state s′ after performing action a. It reflects the dynamic characteristics of the environment. For example, in path planning, the robot may fail to reach the desired position due to external disturbances.
- R(s, a) (Reward function): Represents the immediate reward received after taking action a in state s. The reward is a scalar value used to measure the quality of the outcome of an action. In path planning problems, rewards can be related to factors such as path length, smoothness, and obstacle avoidance, with the goal of maximizing the cumulative reward.
- γ (Discount factor): Used to balance the trade-off between current rewards and future rewards, with values typically in the range [0, 1]. When γ is close to 1, the agent focuses more on long-term returns; when γ is smaller (closer to 0), the agent focuses more on short-term rewards. A minimal code sketch of this MDP formulation is given after this list.
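To make the formulation concrete, the following is a minimal Python sketch of a printing MDP with continuous joint-increment actions and a reward that penalizes TCP tracking error and non-smooth motion. The configuration values, reward weights, action bound, and placeholder forward kinematics are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the printing MDP, assuming a 6-DOF arm whose state stacks
# the joint angles with the TCP tracking error; the reward weights and the
# 0.05 rad action bound are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class PrintingMDPConfig:
    n_joints: int = 6
    max_joint_step: float = 0.05   # rad per step (assumed bound)
    w_err: float = 1.0             # weight on TCP tracking error
    w_smooth: float = 0.1          # weight on joint-velocity smoothness

class PrintingMDP:
    def __init__(self, cfg: PrintingMDPConfig, reference_path: np.ndarray):
        self.cfg = cfg
        self.ref = reference_path          # (T, 3) Cartesian waypoints
        self.t = 0
        self.q = np.zeros(cfg.n_joints)    # joint angles (state)

    def forward_kinematics(self, q: np.ndarray) -> np.ndarray:
        # Placeholder FK; a real implementation would use the robot's DH model.
        return q[:3]

    def step(self, action: np.ndarray):
        # A (action set): continuous joint increments, clipped to the assumed bound.
        dq = np.clip(action, -self.cfg.max_joint_step, self.cfg.max_joint_step)
        self.q = self.q + dq
        self.t += 1
        # R(s, a): penalize TCP tracking error and non-smooth joint motion.
        tcp_err = np.linalg.norm(self.forward_kinematics(self.q) - self.ref[self.t])
        reward = -(self.cfg.w_err * tcp_err + self.cfg.w_smooth * np.linalg.norm(dq))
        done = self.t >= len(self.ref) - 1
        return np.concatenate([self.q, [tcp_err]]), reward, done
```

In the actual system the forward kinematics would come from the robot's kinematic model, and the state would also carry the TCP orientation, as described above.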
3.4. Reinforcement Learning SAC Algorithm
- Q-Network (Critic): Used to evaluate the return of each state–action pair. SAC uses dual Q-networks ($Q_{\theta_1}$ and $Q_{\theta_2}$), where the two Q-value networks independently estimate the return for the same state–action pair. Using dual Q-networks reduces the overestimation of Q-values, enhancing the stability of the learning process. The Q-value network update formula is given by Equation (17), the standard soft Bellman residual [33]: $J_Q(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim D}\left[\tfrac{1}{2}\left(Q_{\theta_i}(s,a) - y\right)^2\right]$.
- Target Q-Network: To stabilize the training of the Q-network, SAC introduces target Q-networks $Q_{\bar{\theta}_1}$ and $Q_{\bar{\theta}_2}$, used to compute the target value $y = r + \gamma\left(\min_{i} Q_{\bar{\theta}_i}(s', a') - \alpha \log \pi_\phi(a' \mid s')\right)$, preventing overestimation of the Q-values. The target Q-networks are updated through soft updates: $\bar{\theta}_i \leftarrow \tau \theta_i + (1 - \tau)\bar{\theta}_i$.
- Policy Network (Actor): The purpose of the policy network is to generate the probability distribution of actions given a state. In SAC, the policy is represented by a Gaussian distribution whose mean and standard deviation are output by the network. The goal of the policy network is to maximize the return while maintaining exploration; its update objective is $J_\pi(\phi) = \mathbb{E}_{s \sim D,\, a \sim \pi_\phi}\left[\alpha \log \pi_\phi(a \mid s) - \min_i Q_{\theta_i}(s, a)\right]$.
- Temperature Parameter (α): The temperature parameter controls the balance between the entropy of the policy and the return. A higher temperature encourages more exploration, while a lower temperature strengthens the maximization of the return. The temperature is updated by minimizing $J(\alpha) = \mathbb{E}_{a \sim \pi_\phi}\left[-\alpha \log \pi_\phi(a \mid s) - \alpha \bar{\mathcal{H}}\right]$, where $\bar{\mathcal{H}}$ is the target entropy.
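The four components above can be tied together in a single gradient step. The sketch below is a minimal PyTorch version of that step, following the standard SAC updates of Haarnoja et al. [33,34]; the `actor.sample` interface (returning a reparameterized action and its log-probability), the shared critic optimizer, and the hyperparameter values are illustrative assumptions rather than the paper's implementation.

```python
# One SAC gradient step over a replay-buffer batch; tensors are assumed to
# have shape (batch, 1) for r, done, and the Q-network outputs.
import torch
import torch.nn.functional as F

def sac_update(batch, actor, q1, q2, q1_targ, q2_targ, log_alpha,
               q_opt, pi_opt, alpha_opt, gamma=0.99, tau=0.005,
               target_entropy=-6.0):
    s, a, r, s2, done = batch
    alpha = log_alpha.exp()

    # --- Critic update: soft Bellman residual (Equation (17)-style) ---
    with torch.no_grad():
        a2, logp2 = actor.sample(s2)                   # reparameterized sample
        q_targ = torch.min(q1_targ(s2, a2), q2_targ(s2, a2))
        y = r + gamma * (1 - done) * (q_targ - alpha * logp2)
    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # --- Actor update: maximize the entropy-regularized soft Q-value ---
    a_new, logp = actor.sample(s)
    pi_loss = (alpha.detach() * logp
               - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # --- Temperature update: drive policy entropy toward the target ---
    alpha_loss = -(log_alpha * (logp.detach() + target_entropy)).mean()
    alpha_opt.zero_grad(); alpha_loss.backward(); alpha_opt.step()

    # --- Soft update of the target Q-networks ---
    for net, targ in ((q1, q1_targ), (q2, q2_targ)):
        for p, pt in zip(net.parameters(), targ.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)
```

A typical driver would call `sac_update` once per environment step after the replay buffer holds enough transitions.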
3.5. Generative Adversarial Imitation Learning
4. Framework for Path Planning of Complex Surface Spray Printing
- Stage 1: The primary path is generated and designed using multi-axis CNC machining (CAM) software such as UG. The tool path is exported as CNC data, and a conversion algorithm transforms the CNC tool-path data into a Cartesian path for the robot (a minimal conversion sketch is given after this list).
- Stage 2: Robot motion planning is performed based on the primary path generated in the first stage: at each position along the path, the robot must select appropriate joint angles to realize the motion. This selection process is treated as an MDP problem and solved using SAC.
- Stage 3: This stage utilizes the GAIL-SAC framework to improve the convergence speed and trajectory accuracy of reinforcement learning training, and its algorithmic process is shown in Table 1.
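For Stage 1, the sketch below illustrates one plausible conversion step, assuming APT/CLS-style `GOTO/x,y,z,i,j,k` records of the kind CAM packages such as UG can export. The record format and frame conventions are assumptions; a production post-processor would also handle feed rates and the machine's B-C axis kinematics.

```python
# Stage-1 sketch: convert APT/CLS-style CNC tool-path records ("GOTO/x,y,z,i,j,k")
# into robot TCP poses expressed in the workpiece frame {WCS}.
import numpy as np

def tool_axis_to_rotation(k_axis: np.ndarray) -> np.ndarray:
    """Build a rotation matrix whose z-axis is the (unit) tool-axis vector,
    so the print head stays normal to the surface."""
    z = k_axis / np.linalg.norm(k_axis)
    # Pick a reference axis that is not parallel to z to complete the frame.
    ref = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(ref, z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.column_stack([x, y, z])

def cnc_to_robot_poses(cls_lines):
    poses = []  # list of 4x4 homogeneous TCP poses in {WCS}
    for line in cls_lines:
        if not line.startswith("GOTO/"):
            continue  # skip non-motion records in this sketch
        vals = [float(v) for v in line[len("GOTO/"):].split(",")]
        p, ijk = np.array(vals[:3]), np.array(vals[3:6])
        T = np.eye(4)
        T[:3, :3] = tool_axis_to_rotation(ijk)
        T[:3, 3] = p
        poses.append(T)
    return poses

# Example record: poses = cnc_to_robot_poses(["GOTO/120.0,35.5,10.2,0.0,0.1,0.995"])
```

Each resulting pose keeps the print head normal to the surface, matching the tool-axis setting used for main path generation.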
4.1. Spray Printing Trajectory Generation Scheme
4.2. Spray Printing Trajectory Planning Scheme
4.3. Reinforcement Learning GAIL-SAC
5. Simulation and Experiments
5.1. Experimental Environment Configuration
5.2. Main Path Generation Experiment
5.3. Convergence of GAIL-SAC
5.4. Printing Path Planning Experiment
6. Case Analysis
7. Practical Implications Analysis
- Automotive Industry: Automotive parts, such as bodies, hoods, doors, spoilers, and interior components, often have complex surface shapes. Traditional spraying methods often fail to meet high-precision spraying requirements. UV printing technology can accurately print patterns and colors or provide protective coatings on these irregular surfaces, thereby enhancing the aesthetics of automotive exteriors and extending the lifespan of parts.
- Aerospace: Aircraft components such as fuselages, wings, tail fins, engines, and turbine blades require clear identification, including production numbers, model types, airline logos, and safety marks. UV printing technology can accurately print these marks on complex surfaces, ensuring compliance with aviation safety standards and providing durability. Additionally, it offers uniform and high-quality coatings for complex aerospace parts, thereby improving production efficiency.
- Medical Equipment: In medical device manufacturing, UV printing technology can meet the personalized printing needs of devices with special geometric shapes, such as custom prosthetics, orthotics, surgical instruments, and medical monitoring equipment, ensuring surface printing accuracy and enhancing both the functionality and aesthetics of the devices.
- Food Packaging: UV printing technology can provide precise pattern printing solutions for beverage bottles, cans, and other packaging, enhancing brand recognition and increasing market competitiveness.
- Consumer Electronics: In consumer electronic products such as smartphones, tablets, and other devices, UV printing technology can achieve high-quality coating printing, providing protection against static electricity and fingerprints and enhancing the product's lifespan and consumer experience.
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| {WCS} | Workpiece Coordinate System |
| {LCS} | Local Tool Coordinate System |
| RL | Reinforcement Learning |
| MDP | Markov Decision Process |
| PSO | Particle Swarm Optimization |
| GA | Genetic Algorithm |
| SAC | Soft Actor–Critic |
| GAIL | Generative Adversarial Imitation Learning |
| GAIL-SAC | Generative Adversarial Imitation Learning and Soft Actor–Critic |
| CAM | Computer-Aided Manufacturing |
| BC | Behavioral Cloning |
| HER | Hindsight Experience Replay |
| TCP | Tool Center Point |
| CNC | Computer Numerical Control |
| CAD | Computer-Aided Design |
References
1. Verduyn, A.; De Schutter, J.; Decré, W.; Vochten, M. Shape-based path adaptation and simulation-based velocity optimization of initial tool trajectories for robotic spray painting. In Proceedings of the 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), Auckland, New Zealand, 26–30 August 2023; pp. 1–8.
2. Gao, R.; Zhou, Q.; Cao, S.; Jiang, Q. Apple-Picking Robot Picking Path Planning Algorithm Based on Improved PSO. Electronics 2023, 12, 1832.
3. Huang, Z.; Chen, G.; Shen, Y.; Wang, R.; Liu, C.; Zhang, L. An Obstacle-Avoidance Motion Planning Method for Redundant Space Robot via Reinforcement Learning. Actuators 2023, 12, 69.
4. Nieto Bastida, S.; Lin, C.Y. Autonomous Trajectory Planning for Spray Painting on Complex Surfaces Based on a Point Cloud Model. Sensors 2023, 23, 9634.
5. Weber, A.M.; Gambao, E.; Brunete, A. A Survey on Autonomous Offline Path Generation for Robot-Assisted Spraying Applications. Actuators 2023, 12, 403.
6. Bedaka, A.K.; Lin, C.Y. CAD-based robot path planning and simulation using OPEN CASCADE. Procedia Comput. Sci. 2018, 133, 779–785.
7. Gleeson, D.; Jakobsson, S.; Salman, R.; Ekstedt, F.; Sandgren, N.; Edelvik, F.; Carlson, J.S.; Lennartson, B. Generating optimized trajectories for robotic spray painting. IEEE Trans. Autom. Sci. Eng. 2022, 19, 1380–1391.
8. Park, J.H.; Lim, Y.E.; Choi, J.H.; Hwang, M.J. Trajectory-based 3D point cloud ROI determination methods for autonomous mobile robot. IEEE Access 2023, 11, 8504–8522.
9. Meng, Y.; Jiang, Y.; Li, Y.; Pang, G.; Tong, Q. Research on point cloud processing and grinding trajectory planning of steel helmet based on 3D scanner. IEEE Access 2023, 12, 3085–3097.
10. Shah, S.H.; Khan, S.G.; Tran, C.C. Surface Normal Generation and Compliance Control for Robotic Based Machining Operations. In Proceedings of the 2024 9th International Conference on Control and Robotics Engineering (ICCRE), Osaka, Japan, 10–12 May 2024; pp. 74–79.
11. Wu, L.; Zang, X.; Yin, W.; Zhang, X.; Li, C.; Zhu, Y.; Zhao, J. Pose and Path Planning for Industrial Robot Surface Machining Based on Direction Fields. IEEE Robot. Autom. Lett. 2024, 9, 10455–10462.
12. Wang, G.; Li, W.; Jiang, C.; Zhu, D.; Li, Z.; Xu, W.; Zhao, H.; Ding, H. Trajectory planning and optimization for robotic machining based on measured point cloud. IEEE Trans. Robot. 2021, 38, 1621–1637.
13. Zeng, Y.; Yu, Y.; Zhao, X.; Liu, Y.; Liu, J.; Liu, D. Trajectory planning of spray gun with variable posture for irregular plane based on boundary constraint. IEEE Access 2021, 9, 52902–52912.
14. Zhang, Y.; Xu, C.; Xiao, H.; Zhou, B.; Zeng, Y. Planning method of offset spray path for patch considering boundary factors. Math. Probl. Eng. 2018, 2018, 6067391.
15. Lu, S.; Ding, B.; Li, Y. Minimum-jerk trajectory planning pertaining to a translational 3-degree-of-freedom parallel manipulator through piecewise quintic polynomials interpolation. Adv. Mech. Eng. 2020, 12, 1687814020913667.
16. Zhu, J.; Pan, D. Improved Genetic Algorithm for Solving Robot Path Planning Based on Grid Maps. Mathematics 2024, 12, 4017.
17. Gao, Y.; Li, Z.; Wang, H.; Hu, Y.; Jiang, H.; Jiang, X.; Chen, D. An Improved Spider-Wasp Optimizer for Obstacle Avoidance Path Planning in Mobile Robots. Mathematics 2024, 12, 2604.
18. Hsieh, H.T.; Chu, C.H. Improving optimization of tool path planning in 5-axis flank milling using advanced PSO algorithms. Robot. Comput.-Integr. Manuf. 2013, 29, 3–11.
19. Prianto, E.; Park, J.H.; Bae, J.H.; Kim, J.S. Deep reinforcement learning-based path planning for multi-arm manipulators with periodically moving obstacles. Appl. Sci. 2021, 11, 2587.
20. Zhao, T.; Wang, M.; Zhao, Q.; Zheng, X.; Gao, H. A path-planning method based on improved soft actor-critic algorithm for mobile robots. Biomimetics 2023, 8, 481.
21. von Eschwege, D.; Engelbrecht, A. Soft Actor-Critic Approach to Self-Adaptive Particle Swarm Optimisation. Mathematics 2024, 12, 3481.
22. He, Y.; Hu, R.; Liang, K.; Liu, Y.; Zhou, Z. Deep Reinforcement Learning Algorithm with Long Short-Term Memory Network for Optimizing Unmanned Aerial Vehicle Information Transmission. Mathematics 2024, 13, 46.
23. Huang, Y.; Zhou, C.; Zhang, L.; Lu, X. A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization. Mathematics 2024, 12, 4020.
24. Chen, W.; Li, X.; Ge, H.; Wang, L.; Zhang, Y. Trajectory planning for spray painting robot based on point cloud slicing technique. Electronics 2020, 9, 908.
25. He, S.; Hu, C.; Lin, S.; Zhu, Y. An online time-optimal trajectory planning method for constrained multi-axis trajectory with guaranteed feasibility. IEEE Robot. Autom. Lett. 2022, 7, 7375–7382.
26. He, S.; Hu, C.; Lin, S.; Zhu, Y.; Tomizuka, M. Real-time time-optimal continuous multi-axis trajectory planning using the trajectory index coordination method. ISA Trans. 2022, 131, 639–649.
27. Praniewicz, M.; Kurfess, T.R.; Saldana, C. Error qualification for multi-axis BC-type machine tools. J. Manuf. Syst. 2019, 52, 211–216.
28. Xie, S.; Sun, L.; Chen, G.; Wang, Z.; Wang, Z. A novel solution to the inverse kinematics problem of general 7R robots. IEEE Access 2022, 10, 67451–67469.
29. Chen, W.; Liu, J.; Tang, Y.; Huan, J.; Liu, H. Trajectory optimization of spray painting robot for complex curved surface based on exponential mean Bézier method. Math. Probl. Eng. 2017, 2017, 4259869.
30. Gao, G.; Sun, G.; Na, J.; Guo, Y.; Wu, X. Structural parameter identification for 6 DOF industrial robots. Mech. Syst. Signal Process. 2018, 113, 145–155.
31. Ren, J.; Sun, Y.; Hui, J.; Ahmad, R.; Ma, Y. Coating thickness optimization for a robotized thermal spray system. Robot. Comput.-Integr. Manuf. 2023, 83, 102569.
32. Teng, Q.; Yi, J.; Zhu, X.; Zhang, Y. Extraction method of position and posture information of robot arm picking up target based on RGB-D data. Therm. Sci. 2020, 24, 1481–1488.
33. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
34. Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905.
35. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; Volume 27.
36. Zuo, G.; Zhao, Q.; Huang, S.; Li, J.; Gong, D. Adversarial imitation learning with mixed demonstrations from multiple demonstrators. Neurocomputing 2021, 457, 365–376.
37. Kidera, S.; Shintani, K.; Tsuneda, T.; Yamane, S. Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning. IEEE Access 2024, 12, 19942–19951.
38. Tsurumine, Y.; Matsubara, T. Goal-aware generative adversarial imitation learning from imperfect demonstration for robotic cloth manipulation. Robot. Auton. Syst. 2022, 158, 104264.
39. Xu, L.; Cao, M.; Song, B. A new approach to smooth path planning of mobile robot based on quartic Bezier transition curve and improved PSO algorithm. Neurocomputing 2022, 473, 98–106.
40. Wang, F.; Wu, Z.; Bao, T. Time-jerk optimal trajectory planning of industrial robots based on a hybrid WOA-GA algorithm. Processes 2022, 10, 1014.
Table 1. GAIL-SAC algorithm steps.

| Step | Operation |
|---|---|
| 1 | Input: initialize the neural network parameters |
| 2 | Initialize the target Q-network parameters |
| 3 | Initialize the replay buffer D |
| 4 | Initialize the expert buffer D_E |
| 5 | for each iteration do |
| 6 | for each environment step do |
| 7 | Select an action through the policy based on the current state |
| 8 | Execute the action; the robot reaches the next state and receives an immediate reward r |
| 9 | Store the transition (s, a, r, s′) in the replay buffer D |
| 10 | end for |
| 11 | for each gradient step do |
| 12 | Update the Q-network parameters |
| 13 | Update the discriminator network parameters |
| 14 | Determine the update formula for the Actor network via Equation (26) |
| 15 | if the condition of Equation (26) holds then |
| 16 | Update the Actor-network parameters |
| 17 | else |
| 18 | Update the Actor-network parameters |
| 19 | Update the temperature parameter α |
| 20 | Update the target Q-network parameters (soft update) |
| 21 | end for |
| 22 | end for |
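As a complement to steps 13-14 of Table 1, the sketch below shows a minimal discriminator update and the imitation reward it induces, following the standard GAIL/GAN objectives [35]. The discriminator interface and the idea of blending this reward with the environment reward are illustrative assumptions; the paper's exact rule is given by its Equation (26).

```python
# Minimal GAIL discriminator step: expert state-action pairs come from the
# expert buffer D_E, policy pairs from the replay buffer D.
import torch
import torch.nn.functional as F

def discriminator_update(disc, d_opt, expert_sa, policy_sa):
    # Label expert pairs 1 and policy pairs 0 (binary cross-entropy, as in GANs).
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
            + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p)))
    d_opt.zero_grad(); loss.backward(); d_opt.step()

def imitation_reward(disc, sa):
    # r_GAIL = -log(1 - D(s, a)); higher when the policy looks expert-like.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(sa)) + 1e-8)
```

In GAIL-SAC, this imitation reward accelerates early training while the SAC updates of Section 3.4 refine trajectory precision.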
| Parameter | Setting |
|---|---|
| Machining method | Variable contour milling |
| Tool | Ball-end (spherical) milling cutter |
| Tool-axis vector | Perpendicular to the machined surface |
| Guide | Boundary curve |
| Path direction | One-way |
| Machine tool type | Five-axis machine tool (B-C axes) |
| Algorithm | Training Epochs | Reward |
|---|---|---|
| SAC | 3000 | −10.12 |
| BC | 3000 | −15.23 |
| GAIL-SAC | 3000 | −6.72 |
| Algorithm | X-Z Plane MSE (mm²) | X-Y Plane MSE (mm²) | Y-Z Plane MSE (mm²) |
|---|---|---|---|
| PSO | 55.436 | 7.987 | 3.138 |
| MOVE | 1231.129 | 1672.257 | 1560.586 |
| RL | 40.025 | 4.877 | 4.771 |
| GA | 51.443 | 5.533 | 8.249 |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Liu, J.; Lin, X.; Huang, C.; Cai, Z.; Liu, Z.; Chen, M.; Li, Z. A Study on Path Planning for Curved Surface UV Printing Robots Based on Reinforcement Learning. Mathematics 2025, 13, 648. https://doi.org/10.3390/math13040648