Abstract
As a model-free algorithm, a deep reinforcement learning (DRL) agent learns and makes decisions by interacting with the environment without supervision. In recent years, DRL algorithms have been widely applied to portfolio optimization over consecutive trading periods, since a DRL agent can dynamically adapt to market changes and does not rely on a specification of the joint dynamics of the assets. However, typical DRL agents for portfolio optimization cannot learn a policy that is aware of the dynamic correlations between portfolio asset returns. Since these dynamic correlations are crucial to portfolio optimization, their absence makes it difficult for a DRL agent to maximize the return per unit of risk, especially when the target market permits short selling (e.g., the US stock market). In this research, we propose a hybrid portfolio optimization model that combines a DRL agent with the Black-Litterman (BL) model, enabling the agent to learn the dynamic correlations between portfolio asset returns and to implement an effective long/short strategy based on them. Essentially, the DRL agent is trained to learn a policy that applies the BL model to determine the target portfolio weights. We formulate a specific objective function based on the environment's reward function, which accounts for the return, risk, and transaction scale of the portfolio, and we train the DRL agent by propagating the objective function's gradient to its policy function. To test the agent, we construct a portfolio from all the Dow Jones Industrial Average constituent stocks. Empirical results on real-world United States stock market data demonstrate that our DRL agent outperforms various comparison portfolio choice strategies and alternative DRL frameworks by at least 42% in terms of accumulated return, and that it significantly outperforms these baselines, as well as strategies based on other machine learning frameworks, in terms of return per unit of risk.
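For context, the Black-Litterman model referenced above blends equilibrium excess returns with a set of views; in the hybrid framework described here, the views are supplied by the DRL agent's policy. A standard statement of the BL posterior expected return (Black and Litterman 1992) is

$$\mu_{\mathrm{BL}} = \left[(\tau\Sigma)^{-1} + P^{\top}\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\Pi + P^{\top}\Omega^{-1}Q\right],$$

where $\Pi$ is the vector of equilibrium excess returns, $\Sigma$ the asset-return covariance matrix, $\tau$ a scalar confidence parameter, $P$ and $Q$ the view matrix and view vector, and $\Omega$ the covariance of view uncertainty. The following is a minimal, hypothetical Python sketch of this step (the function and parameter names are ours, not the paper's); it assumes the agent provides $P$ and $Q$, and uses the common heuristic $\Omega = \mathrm{diag}(\tau P \Sigma P^{\top})$ for view uncertainty. The paper's actual parameterization and training procedure are given in the full text.

import numpy as np

def black_litterman_weights(Sigma, w_mkt, P, Q, tau=0.05, delta=2.5):
    # Illustrative sketch (not the authors' implementation): combine
    # equilibrium returns implied by market-cap weights with views (P, Q).
    Pi = delta * Sigma @ w_mkt                       # reverse-optimized equilibrium excess returns
    Omega = np.diag(np.diag(tau * P @ Sigma @ P.T))  # heuristic view-uncertainty covariance
    tS_inv = np.linalg.inv(tau * Sigma)
    O_inv = np.linalg.inv(Omega)
    # Posterior expected returns, mu_BL, via the formula above.
    mu_bl = np.linalg.solve(tS_inv + P.T @ O_inv @ P,
                            tS_inv @ Pi + P.T @ O_inv @ Q)
    # Unconstrained mean-variance weights; negative entries are short positions.
    return np.linalg.inv(delta * Sigma) @ mu_bl

Because the resulting weights are unconstrained, negative entries correspond naturally to the short positions that the long/short strategy exploits.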
Data availability
This research uses only publicly available American stock data from the Yahoo Finance platform. All datasets generated and/or analyzed during the current study, as well as the models and code that support the findings of this study, are available from the corresponding author upon reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, R., Stefanidis, A., Jiang, Z. et al. Combining transformer based deep reinforcement learning with Black-Litterman model for portfolio optimization. Neural Comput & Applic 36, 20111–20146 (2024). https://doi.org/10.1007/s00521-024-09805-9