Abstract
In this paper, we study how to develop an imitation process that improves learning in the framework of learning classifier systems. We present three approaches to taking an observed behavior into account through a guidance interaction: two that use a model of this behavior and one that does not. These approaches are evaluated and compared in different environments when applied to three major classifier systems: ZCS, XCS and ACS. The results are analyzed and discussed. They highlight the importance of using a model of the observed behavior to enable efficient imitation. Moreover, they show the advantages of taking this model into account through a specialized internal action. Finally, they provide new comparative results between ZCS, XCS and ACS.
Notes
Butz uses the following parameters: γ = 0.95, b_r = 0.05, b_q = 0.05, θ_r = 0.9 and θ_i = 0.1.
A system applying the policy of M1 around the obstacles and food but performing another action in the empty situation would have the following performance: 3.70 with E or W actions, 3.35 with SE or NW, and 3.80 with SW or NE. N and S actions may lead to cycling behaviors.
References
Atkeson CG, Schaal S (1997) Robot learning from demonstration. In: Proceedings of the 14th international conference on machine learning. Morgan Kaufmann, pp 12–20
Bakker P, Kuniyoshi Y (1996) Robot see, robot do: an overview of robot imitation. In: AISB’96 workshop on learning in robots and animals. Brighton, UK, pp 3–11
Bull L, Hurst J (2002) ZCS redux. Evol Comput 10(2):185–205
Butz M, Goldberg DE, Stolzmann W (1999) New challenges for an ACS: hard problems and possible solutions. Technical report 99019, University of Illinois at Urbana-Champaign, Urbana, IL, October 1999
Butz M, Goldberg DE, Stolzmann W (2000) The anticipatory classifier system and genetic generalization. Technical report 2000032, Illinois Genetic Algorithms Laboratory, 2000
Butz MV, Wilson SW (2002) An algorithmic description of XCS. Soft Comput 6:144–153
Cliff D, Ross S (1994) Adding temporary memory to ZCS. Adapt Behav 3(2):101–150
Damer BF (1998) Avatars, exploring and building Virtual Worlds on the Internet. Addison-Wesley Longman/Peachpit Press, USA
Dorigo M, Colombetti M (1994) Robot shaping: developing autonomous agents through learning. Artif Intell 71:321–370
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading, Mass
Heudin J-C (1998) Virtual Worlds. In: Heudin JC (ed) Virtual Worlds: synthetic universes, digital life and complexity. Perseus Books, pp 1–28
Hoffmann J (1993) Vorhersage und Erkenntnis [Anticipation and Cognition]. Hogrefe
Holland JH (1986) Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In: Mitchell, Michalski, Carbonell (eds) Machine learning, an artificial intelligence approach, vol II, chapter 20. Morgan Kaufmann, pp 593–623
Kovacs T (1996) Evolving optimal populations with XCS classifier systems. Master’s thesis, School of Computer Science, University of Birmingham, Birmingham, UK, 1996. Also technical report CSR-96-17 and CSRP-96-17
Kovacs T, Kerber M (2001) What makes a problem hard for XCS? In: Lanzi PL, Stolzmann W, Wilson SW (eds) Advances in learning classifier systems, volume 1996 of LNAI. Springer–Verlag, Berlin, pp 80–99
Lanzi PL (1999) An analysis of generalization in the XCS classifier system. Evol Comput 7(2):125–149
Lanzi PL (2002) Learning classifier systems from a reinforcement learning perspective. Soft Comput 6(3):162–170
Lin L-J (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8:293–321
Mataric MJ (1998) Using communication to reduce locality in distributed multiagent learning. J Exp Theor Artif Intell 10(3):357–369
Mitchell RW (1989) A comparative-developmental approach to understanding imitation. Perspect Ethol 7:183–215
Ng AY, Russell S (2000) Algorithms for inverse reinforcement learning. In: Proceedings of the 17th international conference on machine learning. Morgan Kaufmann, San Francisco, CA, pp 663–670
Price B (2003) Accelerating reinforcement learning with imitation. PhD thesis, University of British Columbia
Price B, Boutilier C (1999) Implicit imitation in multiagent reinforcement learning. In: Proceedings of the 16th international conference on machine learning. Morgan Kaufmann, San Francisco, CA, pp 325–334
Sammut C, Hurst S, Kedzier D, Michie D (1992) Learning to fly. In: Proceedings of the ninth international conference on machine learning, July 1992. Morgan Kaufmann, Aberdeen
Stolzmann W (1998) Anticipatory classifier systems. In: Proceedings of the third annual genetic programming conference. Morgan Kaufmann, pp 658–664
Šuc D, Bratko I (1997) Skill reconstruction as induction of LQ controllers with subgoals. In: Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI-97), August 23–29 1997. Morgan Kaufmann Publishers, San Francisco, pp 914–919
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. The MIT Press, Cambridge
Urbančič T, Bratko I (1994) Reconstructing human skill with machine learning. In: Cohn AG (ed) Proceedings of the eleventh European conference on artificial intelligence, August 8–12 1994. John Wiley and Sons, Chichester, pp 498–502
Utgoff PE, Clouse JA (1991) Two kinds of training information for evaluation function learning. In: Dean, McKeown (eds) Proceedings of the 9th national conference on artificial intelligence. MIT Press, July, pp 596–600
Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):272–292
Whitehead SD (1991) A complexity analysis of cooperative mechanisms in reinforcement learning. In: Dean, McKeown (eds) Proceedings of the 9th national conference on artificial intelligence. MIT Press, pp 607–613
Wilson SW (1994) ZCS: a zeroth level classifier system. Evol Comput 2(1):1–18
Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
Cite this article
Métivier, M., Lattaud, C. Imitation guided learning in learning classifier systems. Nat Comput 8, 29–56 (2009). https://doi.org/10.1007/s11047-007-9054-8
DOI: https://doi.org/10.1007/s11047-007-9054-8