Abstract
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when the agent picks an action, it cannot infer ex post the rewards that other actions would have induced. In reinforcement learning, however, actions have consequences: they influence not only immediate rewards but also future states of the world. The goal of reinforcement learning is to find an optimal policy, a mapping from the states of the world to the set of actions that maximizes cumulative reward, which is inherently a long-term objective. Exploring may be sub-optimal over a short horizon but can lead to better long-term outcomes. Many optimal control problems, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, driven in particular by deep learning algorithms, can help economists solve complex behavioral problems. In this article, we survey state-of-the-art reinforcement learning techniques and present applications in economics, game theory, operations research and finance.
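The abstract contrasts bandit-style exploration with full reinforcement learning, where actions also move the state of the world. As a concrete illustration only, not code from the article, the following sketch implements tabular Q-learning with epsilon-greedy exploration on a hypothetical five-state chain; the environment, reward structure and hyper-parameters are assumptions made for the example.

```python
import numpy as np

# Minimal tabular Q-learning on a toy "chain" environment. The environment,
# reward structure and hyper-parameters below are illustrative assumptions,
# not taken from the article.

n_states, n_actions = 5, 2            # states 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # action-value estimates Q(s, a)
alpha, gamma, eps = 0.1, 0.95, 0.3    # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

def step(s, a):
    """Move left or right on the chain; reaching the last state pays 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = (s_next == n_states - 1)
    return s_next, (1.0 if done else 0.0), done

for episode in range(500):
    s = 0
    for _ in range(10_000):           # cap episode length as a safeguard
        # epsilon-greedy: explore with probability eps, otherwise act greedily
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward the bootstrapped target
        target = r + gamma * (0.0 if done else Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
        if done:
            break

# Greedy policy in non-terminal states: should be all 1s (always move right).
print(Q.argmax(axis=1)[:-1])
```

The exploration branch sacrifices immediate reward to discover that moving right eventually pays, which is precisely the short-term versus long-term trade-off described above.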
Funding
Arthur Charpentier acknowledges the financial support of the AXA Research Fund through the joint research initiative Use and value of unusual data in actuarial science, as well as NSERC grant 2019-07077.
Cite this article
Charpentier, A., Élie, R. & Remlinger, C. Reinforcement Learning in Economics and Finance. Comput Econ 62, 425–462 (2023). https://doi.org/10.1007/s10614-021-10119-4
Keywords
- Causality
- Control
- Machine learning
- Markov decision process
- Multi-armed bandits
- Online learning
- Q-learning
- Regret
- Reinforcement learning
- Rewards
- Sequential learning