Abstract
As a model-free algorithm, a deep reinforcement learning (DRL) agent learns and makes decisions by interacting with the environment without supervision. In recent years, DRL algorithms have been widely applied to portfolio optimization over consecutive trading periods, since a DRL agent can dynamically adapt to market changes and does not rely on a specification of the joint dynamics of the assets. However, typical DRL agents for portfolio optimization cannot learn a policy that is aware of the dynamic correlations between portfolio asset returns. Since these dynamic correlations are crucial to portfolio optimization, their absence makes it difficult for a DRL agent to maximize the return per unit of risk, especially when the target market permits short selling (e.g., the US stock market). In this research, we propose a hybrid portfolio optimization model that combines a DRL agent with the Black-Litterman (BL) model, enabling the agent to learn the dynamic correlations between portfolio asset returns and to implement an effective long/short strategy based on them. Essentially, the DRL agent is trained to learn a policy that applies the BL model to determine the target portfolio weights. We formulate a specific objective function based on the environment's reward function, which accounts for the return, risk, and transaction scale of the portfolio, and we train the DRL agent by propagating the objective function's gradient to its policy function. To test the agent, we construct a portfolio from all the Dow Jones Industrial Average constituent stocks. Empirical results on real-world United States stock market data demonstrate that our DRL agent outperforms various comparison portfolio choice strategies and alternative DRL frameworks by at least 42% in terms of accumulated return, and that it significantly outperforms these baselines, as well as strategies based on other machine learning frameworks, in terms of return per unit of risk.
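For context, the Black-Litterman model referenced above blends equilibrium excess returns with a set of views; in the hybrid framework described here, the views are supplied by the DRL agent's policy. A standard statement of the BL posterior expected return (Black and Litterman 1992) is

$$\mu_{\mathrm{BL}} = \left[(\tau\Sigma)^{-1} + P^{\top}\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\Pi + P^{\top}\Omega^{-1}Q\right],$$

where $\Pi$ is the vector of equilibrium excess returns, $\Sigma$ the asset-return covariance matrix, $\tau$ a scalar confidence parameter, $P$ and $Q$ the view matrix and view vector, and $\Omega$ the covariance of view uncertainty. The following is a minimal, hypothetical Python sketch of this step (the function and parameter names are ours, not the paper's); it assumes the agent provides $P$ and $Q$, and uses the common heuristic $\Omega = \mathrm{diag}(\tau P \Sigma P^{\top})$ for view uncertainty. The paper's actual parameterization and training procedure are given in the full text.

import numpy as np

def black_litterman_weights(Sigma, w_mkt, P, Q, tau=0.05, delta=2.5):
    # Illustrative sketch (not the authors' implementation): combine
    # equilibrium returns implied by market-cap weights with views (P, Q).
    Pi = delta * Sigma @ w_mkt                       # reverse-optimized equilibrium excess returns
    Omega = np.diag(np.diag(tau * P @ Sigma @ P.T))  # heuristic view-uncertainty covariance
    tS_inv = np.linalg.inv(tau * Sigma)
    O_inv = np.linalg.inv(Omega)
    # Posterior expected returns, mu_BL, via the formula above.
    mu_bl = np.linalg.solve(tS_inv + P.T @ O_inv @ P,
                            tS_inv @ Pi + P.T @ O_inv @ Q)
    # Unconstrained mean-variance weights; negative entries are short positions.
    return np.linalg.inv(delta * Sigma) @ mu_bl

Because the resulting weights are unconstrained, negative entries correspond naturally to the short positions that the long/short strategy exploits.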
Data availability
This research uses only publicly available American stock data from the Yahoo Finance platform. All datasets generated and/or analyzed during the current study, as well as the models and code that support the findings of this study, are available from the corresponding author upon reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, R., Stefanidis, A., Jiang, Z. et al. Combining transformer based deep reinforcement learning with Black-Litterman model for portfolio optimization. Neural Comput & Applic 36, 20111–20146 (2024). https://doi.org/10.1007/s00521-024-09805-9