Abstract
In this paper, we solve the call admission control and routing problem in multimedia networks via reinforcement learning (RL). The problem requires that network revenue be maximized while simultaneously meeting quality of service constraints that forbid entry into certain states and use of certain actions. The problem can be formulated as a constrained semi-Markov decision process. We show that RL provides a solution to this problem and is able to earn significantly higher revenues than alternative heuristics.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Altman, E., & Shwartz, A. (1991). Adaptive control of constrained Markov chains. IEEE Trans. on Auto. Control, 36:4, 454–462.
Bertsekas, D. P., & Gallager, R. (1992). Data networks, 2nd ed. Englewood Cliff, NJ: Prentice Hall.
Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neural-dynamic programming. Belmont, MA: Athena Scientific.
Beutler, F. J., & Ross, K. W. (1985). Optimal policies for controlled Markov chains with a constraint. J. Math. Anal. Appl., 112, 236–252.
Beutler, F. J., & Ross, K. W. (1986). Time-average optimal constrained semi-Markov decision processes. Adv. Appl. Prob., 18, 341–359.
Boyan, J. A., & Littman, M. L. (1994). Packet routing in dynamically changing networks: A reinforcement learning approach. In J. D. Cowan, et al. (Eds.), Advances in NIPS 6 (pp. 671–678). San Francisco: Morgan Kauffman.
Brown, T. X. (1997). Adaptive statistical multiplexing for broadband communications. In Fifth IFIP Workshop on Performance Modeling & Evaluation of ATM Networks, Ilkley, U.K., invited tutorial; also published in D. Kouvatsos (Ed.), Performance evaluation and application of ATM networks (pp. 51–79). Boston, MA: Kluwer.
Brown, T. X. (2001). Statistical classification based admission control. In R. D. van der Mei et al. (Eds.), Proc. SPIE Conf. on Internet Performance and Control of Network Systems II (Vol. 4523, pp. 1-14).
Brown, T. X., Tong, H., & Singh, S. (1999). Optimizing admission control while ensuring quality of service in multimedia networks via reinforcement learning. In M. Kearns et al. (Eds.), Advances in neural information processing systems 11 (pp. 982–988). Cambridge, MA: MIT Press.
Carlstrom, J., & Nordstrom, E. (1997). Control of self-similar ATM call traffic by reinforcement learning. In J. Alspector et al. (Eds.), Applications of neural networks to telecommunications 3. LEA Publishers.
Dziong, Z., & Mason, L. G. (1994). Call admission and routing in multi-service loss networks. IEEE Trans. Commun., 42:4, 2011–2022.
Dziong, Z. (1997). ATM network resource management. New York: McGraw-Hill.
Feinberg, E. A. (1994). Constrained semi-Markov decision processes with average reward. ZOR-Math. Methods Oper. Res., 39, 257–288.
Gabor, Z., Kalmar, Z., & Szepesvari, C. (1998). Multi-criteria reinforcement learning. In Proc. International Conference on Machine Learning, Madison, WI.
Krishnan, K. R. (1990). Markov decision algorithms for dynamic routing. IEEE Commun. Mag., Oct., 66-69.
Lee, W. C., Hluchyj, M. G., & Humblet, P. A. (1995). Routing subject to quality of service constraints in integrated communication networks. IEEE Network, July/August, 46-55.
Leland,W. E., Taqqu, M. S., Willingger, W., & Wilson, D. V. (1993). On the self-similar nature of Ethernet traffic. In Proc. of ACM SIGCOMM (pp. 183-193).
Marbach, P., & Tsitsiklis, J. N. (1997). A neuro-dynamic approach to admission control in ATM networks: The single link case. In ICASSP.
Marbach, P., Mihatsch, O., & Tsitsiklis, J. N. (2000). Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE J. Sel. Areas Commun., 18:2, 197–208.
Mitra, D., Reiman, M. I., & Wang, J. (1998). Robust dynamic admission control for unified cell and call QoS in statistical multiplexers. IEEE J. Sel. Areas Commun., 16:5, 692–707.
Nie, J., & Haykin, S. (1999). A Q-learning based dynamic channel assignment technique for mobile communication systems. IEEE Trans. on Vehicular Technology, 48:5, 1676–1687.
Pornavalai, C., Chakraborty, G., & Shiratori, N. (1997). Qos based routing algorithm in integrated services packet networks. In Proc. 5th Intl. Conf. on Network Protocols, Atlanta, GA (pp. 167-174).
Singh, S. P., & Bertsekas, D. P. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In M. Mozer et al. (Eds.), Advances in NIPS 9 (pp. 974–980). Cambridge, MA: MIT Press.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning, An introduction. Cambridge MA: The MIT Press.
Szepesvari, Cs. (1998). The asymptotic convergence-rate of Q-learning. In M. Jordan et al. (Eds.), Advances in NIPS 10 (pp. 1064–1070). Cambridge, MA: MIT Press.
Tong, H. (1999). Adaptive admission control and routing under quality of service constraints in broadband communications. Ph.D. thesis, Dept. of Electrical and Computer Engineering, Univ. of Colorado, Boulder, CO.
Tong, H., & Brown, T. X. (1998). Estimating loss rates in an integrated services network by neural networks. In Proc. IEEE Globecom98, Sydney, Australia.
Tong, H., & Brown, T. X. (2000). Adaptive call admission control under quality of service constraints: A reinforcement learning solution. IEEE J. Sel. Areas Commun., 18:2, 209–221.
Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Tong, H., Brown, T.X. Reinforcement Learning for Call Admission Control and Routing under Quality of Service Constraints in Multimedia Networks. Machine Learning 49, 111–139 (2002). https://doi.org/10.1023/A:1017924227920
Issue Date:
DOI: https://doi.org/10.1023/A:1017924227920