
Methods for Learning Control Policies from Variable-Constraint Demonstrations

  • Chapter
From Motor Learning to Interaction Learning in Robots

Part of the book series: Studies in Computational Intelligence (SCI, volume 264)

Abstract

Many everyday human skills can be framed as performing some task subject to constraints imposed by the task or the environment. Constraints are usually not directly observable and frequently change between contexts. In this chapter, we explore the problem of learning control policies from data containing variable, dynamic and non-linear constraints on motion. We argue that an effective approach is to learn the underlying unconstrained policy in a way that is consistent with the observed constraints. We then discuss several recent algorithms for extracting policies from movement data recorded under variable, unknown constraints. We review a number of experiments testing the performance of these algorithms and demonstrating how the resultant policy models generalise over constraints, allowing behaviour to be predicted in unseen settings where new constraints apply.
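The core idea can be illustrated with a minimal numerical sketch. Below, a ground-truth linear policy is observed only through random one-dimensional constraints, each projecting the action onto the constraint's null space; the unconstrained policy is then recovered by least squares. This is only an illustration of the problem setting, not the chapter's algorithms: here the per-sample constraint matrices are assumed known, whereas the methods discussed in the chapter handle the harder case of unknown constraints. Names such as `pi_true` and `K_true` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth unconstrained policy (linear, for illustration): pi(x) = -K x
K_true = np.array([[1.0, 0.5],
                   [0.0, 2.0]])

def pi_true(x):
    return -K_true @ x

# Generate observations under variable constraints: each sample applies a
# random 1-D constraint A u = 0, so the observed action is the projection
# of pi(x) onto the constraint null space, N = I - A^+ A.
X, U, Ns = [], [], []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, 2)
    A = rng.normal(size=(1, 2))                # random constraint row
    N = np.eye(2) - np.linalg.pinv(A) @ A      # null-space projector
    X.append(x)
    U.append(N @ pi_true(x))                   # constrained observation
    Ns.append(N)

# Recover pi(x) = -K x from the projected model u_n = -N_n K x_n.
# Using vec(N K x) = (x^T kron N) vec(K), each sample contributes two
# linear equations in the four unknowns of vec(K).
A_rows = np.vstack([np.kron(x.reshape(1, -1), N) for x, N in zip(X, Ns)])
b = -np.concatenate(U)
vecK, *_ = np.linalg.lstsq(A_rows, b, rcond=None)
K_est = vecK.reshape(2, 2, order="F")          # undo column-stacking vec()
print("recovered K:\n", np.round(K_est, 3))
```

Although every single observation is rank-deficient (the projector hides one direction of the policy), the constraints vary across samples, so together they pin down the full unconstrained policy. This is the sense in which learning "consistent with the constraints" can succeed where naive regression on the raw observations would not.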




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Howard, M., Klanke, S., Gienger, M., Goerick, C., Vijayakumar, S. (2010). Methods for Learning Control Policies from Variable-Constraint Demonstrations. In: Sigaud, O., Peters, J. (eds) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol 264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05181-4_12

  • DOI: https://doi.org/10.1007/978-3-642-05181-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05180-7

  • Online ISBN: 978-3-642-05181-4

  • eBook Packages: Engineering, Engineering (R0)
