Abstract
As reinforcement learning (RL) algorithms have continued to improve, they have achieved excellent performance on a growing number of automatic control tasks. However, applying these algorithms to realistic automatic assembly still poses challenges, the most significant being that the stability of model-free RL methods cannot be effectively guaranteed. Stability is the most critical property of a control system and is closely tied to its reliability and safety. To ensure system stability, we reconstruct the RL algorithm on the basis of a Lyapunov stability theory for stochastic systems proposed in this paper, yielding an actor-critic learning framework based on Lyapunov stability (LSAC) for automatic assembly. In addition, this paper proposes a median Q-value method to alleviate the Q-value estimation bias that limits the performance of RL algorithms. To allow the RL agent to better complete the automatic assembly task, this paper also designs an adaptive impedance control algorithm that executes the actions output by the LSAC framework. Finally, realistic automatic assembly experiments verify the robustness and superiority of the proposed strategy.
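For context, Lyapunov-based actor-critic methods generally constrain policy updates so that a learned Lyapunov candidate $L$ decreases on average along trajectories, e.g. $\mathbb{E}_{s' \sim P_\pi}[L(s')] - L(s) \le -\alpha\, c(s)$ for some $\alpha > 0$ and cost $c$; the specific stochastic-stability condition used here is the one proposed in the paper. The abstract also names two concrete mechanisms: a median Q-value bootstrap target and an adaptive impedance controller that executes the learned actions. As an illustration only (this is not the authors' implementation; the function name, the use of three critics, and all numbers are assumptions), a minimal Python sketch of a median-based target: where clipped double Q-learning takes the minimum of two critics, taking the median of several critics is one way to damp both over- and under-estimation.

import numpy as np

def median_q_target(reward, done, next_q_estimates, gamma=0.99):
    # Hypothetical helper: next_q_estimates holds Q_i(s', a') from several
    # critics evaluated at the same next state-action pair.
    q_next = np.median(np.asarray(next_q_estimates))
    # Standard one-step bootstrap; (1 - done) zeroes the tail at episode end.
    return reward + gamma * (1.0 - done) * q_next

# Three critics disagree; the median suppresses the optimistic outlier 14.5.
target = median_q_target(reward=1.0, done=0.0,
                         next_q_estimates=[10.2, 9.8, 14.5])

Likewise, a hedged sketch of the kind of discrete-time impedance law the abstract alludes to, in which the policy output could supply the stiffness and damping gains each control cycle; the single-axis scalar form, variable names, and time step below are placeholders rather than the paper's controller.

def impedance_step(x, dx, f_ext, f_ref, stiffness, damping, inertia, dt):
    # Target dynamics: inertia*ddx + damping*dx + stiffness*x = f_ext - f_ref.
    # One explicit-Euler integration per control cycle yields the position
    # correction that a position-controlled robot would then track.
    ddx = (f_ext - f_ref - damping * dx - stiffness * x) / inertia
    dx = dx + ddx * dt
    x = x + dx * dt
    return x, dx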
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grants 52175025 and 51721003).
Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
ESM 1 (MP4 12,361 kb)
Cite this article
Li, X., Xiao, J., Cheng, Y. et al. An actor-critic learning framework based on Lyapunov stability for automatic assembly. Appl Intell 53, 4801–4812 (2023). https://doi.org/10.1007/s10489-022-03844-2