Abstract
Value function approximation is a critical task in solving Markov decision processes and in accurately modeling reinforcement learning agents. A significant issue is how to construct efficient feature spaces from samples collected from the environment in order to obtain an optimal policy. This study addresses the challenge by proposing an online kernel-based clustering approach that builds appropriate basis functions during the learning process. The method uses a kernel function capable of handling state-action pairs as they are sequentially generated by the agent. At each time step, the procedure either adds a new cluster or adjusts the parameters of the winning cluster. The value function is modeled as a linear combination of the constructed basis functions, and the weights are optimized in a temporal-difference framework to minimize the Bellman approximation error. The proposed method is evaluated in a number of well-known simulated environments.
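The loop described in the abstract (add a cluster or adjust the winner, then update the weights of a linear value function) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the kernel, the cluster-creation rule, or the weight optimizer, so everything below is an assumption made for illustration. In particular, the Gaussian kernel restricted to matching discrete actions, the similarity `threshold`, the running-mean center update, the plain TD(0) gradient step, and the class name `OnlineKernelClusterQ` are all hypothetical choices.

```python
import math


class OnlineKernelClusterQ:
    """Illustrative sketch only; every design choice here is an assumption."""

    def __init__(self, sigma=0.5, threshold=0.6, alpha=0.1, gamma=0.95):
        self.sigma = sigma          # kernel width (assumed Gaussian kernel)
        self.threshold = threshold  # below this similarity, create a new cluster
        self.alpha = alpha          # TD step size
        self.gamma = gamma          # discount factor
        self.centers = []           # one (state_mean, action) pair per cluster
        self.counts = []            # number of samples assigned to each cluster
        self.weights = []           # one linear weight per basis function

    def kernel(self, s1, a1, s2, a2):
        # Assumed state-action kernel: Gaussian in the state, zero across actions.
        if a1 != a2:
            return 0.0
        d2 = sum((x - y) ** 2 for x, y in zip(s1, s2))
        return math.exp(-d2 / (2.0 * self.sigma ** 2))

    def features(self, s, a):
        # One basis function per cluster, evaluated at the pair (s, a).
        return [self.kernel(s, a, c, ca) for c, ca in self.centers]

    def q_value(self, s, a):
        return sum(w * f for w, f in zip(self.weights, self.features(s, a)))

    def observe(self, s, a):
        """Add a new cluster or adjust the winning one; return its index."""
        phi = self.features(s, a)
        if not phi or max(phi) < self.threshold:
            self.centers.append((list(s), a))
            self.counts.append(1)
            self.weights.append(0.0)
            return len(self.centers) - 1
        k = max(range(len(phi)), key=phi.__getitem__)
        self.counts[k] += 1
        n = self.counts[k]
        center, ca = self.centers[k]
        # Running-mean update of the winning cluster's center.
        self.centers[k] = ([c + (x - c) / n for c, x in zip(center, s)], ca)
        return k

    def td_update(self, s, a, r, s_next, a_next, done=False):
        """Cluster the sample, then take one TD(0) step on the weights."""
        self.observe(s, a)
        phi = self.features(s, a)
        target = r if done else r + self.gamma * self.q_value(s_next, a_next)
        delta = target - self.q_value(s, a)
        self.weights = [w + self.alpha * delta * f
                        for w, f in zip(self.weights, phi)]
        return delta
```

A caller would invoke `td_update(s, a, r, s_next, a_next)` once per transition and read estimates back with `q_value(s, a)`. A least-squares temporal-difference solver could replace the gradient step; the stochastic update is used here only because it keeps the sketch short.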
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tziortziotis, N., Blekas, K. (2012). An Online Kernel-Based Clustering Approach for Value Function Approximation. In: Maglogiannis, I., Plagianakos, V., Vlahavas, I. (eds) Artificial Intelligence: Theories and Applications. SETN 2012. Lecture Notes in Computer Science, vol 7297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30448-4_23
DOI: https://doi.org/10.1007/978-3-642-30448-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30447-7
Online ISBN: 978-3-642-30448-4
eBook Packages: Computer Science (R0)