
Combining transformer based deep reinforcement learning with Black-Litterman model for portfolio optimization

  • Original Article
  • Neural Computing and Applications

Abstract

As a model-free algorithm, a deep reinforcement learning (DRL) agent learns and makes decisions by interacting with its environment without explicit supervision. In recent years, DRL algorithms have been widely applied to portfolio optimization over consecutive trading periods, since a DRL agent can dynamically adapt to market changes and does not rely on a specification of the joint dynamics of the assets. However, typical DRL agents for portfolio optimization cannot learn a policy that is aware of the dynamic correlation between portfolio asset returns. Since the dynamic correlations among portfolio assets are crucial to optimizing the portfolio, this lack of knowledge makes it difficult for a DRL agent to maximize the return per unit of risk, especially when the target market permits short selling (e.g., the US stock market). In this research, we propose a hybrid portfolio optimization model combining a DRL agent with the Black-Litterman (BL) model, enabling the agent to learn the dynamic correlation between portfolio asset returns and to implement an efficacious long/short strategy based on that correlation. Essentially, the DRL agent is trained to learn the policy by which the BL model determines the target portfolio weights. In this model, we formulate a specific objective function based on the environment's reward function, which considers the return, risk, and transaction scale of the portfolio. The DRL agent is trained by propagating the gradient of this objective function to its policy function. To test the agent, we construct a portfolio from all the Dow Jones Industrial Average constituent stocks. Empirical results on real-world United States stock market data demonstrate that our DRL agent significantly outperforms various comparison portfolio choice strategies and alternative DRL frameworks by at least 42% in terms of accumulated return. In terms of return per unit of risk, it also significantly outperforms comparative portfolio choice strategies and alternative strategies based on other machine learning frameworks.
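The Black-Litterman update mentioned in the abstract blends equilibrium (prior) expected returns with investor views to produce posterior expected returns. The following is a minimal sketch of that generic BL posterior; in the paper the views are produced by the trained DRL agent, whereas here the view matrix `P`, view returns `q`, scalar `tau`, and all numbers are purely illustrative placeholders.

```python
import numpy as np

def black_litterman(pi, sigma, P, q, tau=0.05, omega=None):
    """Blend equilibrium returns `pi` with investor views (P, q).

    pi    : (n,)   equilibrium (prior) expected returns
    sigma : (n, n) covariance of asset returns
    P     : (k, n) view "pick" matrix (one row per view)
    q     : (k,)   expected returns implied by the views
    omega : (k, k) view uncertainty; defaults to diag(tau * P Sigma P^T)
    """
    if omega is None:
        omega = np.diag(np.diag(tau * P @ sigma @ P.T))
    tau_sigma_inv = np.linalg.inv(tau * sigma)
    omega_inv = np.linalg.inv(omega)
    # Posterior mean:
    # [(tau Sigma)^-1 + P^T Omega^-1 P]^-1 [(tau Sigma)^-1 pi + P^T Omega^-1 q]
    A = tau_sigma_inv + P.T @ omega_inv @ P
    b = tau_sigma_inv @ pi + P.T @ omega_inv @ q
    return np.linalg.solve(A, b)

# Tiny two-asset illustration: a single view that asset 0
# outperforms asset 1 by 2% (the prior spread is only 1%).
pi = np.array([0.05, 0.04])
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
P = np.array([[1.0, -1.0]])
q = np.array([0.02])
mu_bl = black_litterman(pi, sigma, P, q)
```

The posterior spread between the two assets lands between the prior's 1% and the view's 2%, pulled toward whichever side carries less uncertainty.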


Figs. 1–15 and Algorithms 1–3 appear in the full article.


Data availability

This research uses only publicly available American stock data from the Yahoo Finance platform. All datasets generated and/or analyzed during the current study, as well as the models and code that support the findings of this study, are available from the corresponding author upon reasonable request.
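Once daily close prices are retrieved from Yahoo Finance, a typical preprocessing step for a portfolio agent is converting them into a log-return matrix. The sketch below uses synthetic stand-in prices for two tickers so it runs without a network connection; the ticker names and numbers are illustrative, not the paper's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic daily close prices standing in for data downloaded
# from Yahoo Finance (three business days, two tickers).
prices = pd.DataFrame(
    {"AAPL": [150.0, 153.0, 151.5], "MSFT": [300.0, 303.0, 309.06]},
    index=pd.date_range("2023-01-02", periods=3, freq="B"),
)

# Log returns: r_t = ln(p_t / p_{t-1}); the first row is NaN and dropped.
log_returns = np.log(prices / prices.shift(1)).dropna()
```

Log returns are additive across time, which makes accumulated-return metrics like the one reported in the abstract straightforward to compute as a column sum.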

Notes

  1. http://www.finance.yahoo.com.

  2. https://qlib.readthedocs.io/en/latest/component/strategy.html.



Author information

Corresponding authors

Correspondence to Zhengyong Jiang or Jionglong Su.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper, received no support from any organization for the submitted work, and have no other relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Table 11 Hyper-parameters used in the paper

Appendix 2

Table 12 The list of the Dow Jones Industrial Average (DJIA) components for portfolio construction and their respective tickers, names, and categories

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, R., Stefanidis, A., Jiang, Z. et al. Combining transformer based deep reinforcement learning with Black-Litterman model for portfolio optimization. Neural Comput & Applic 36, 20111–20146 (2024). https://doi.org/10.1007/s00521-024-09805-9

