research-article

Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network

Authors:

Anisoara Calinescu,

Jakob Foerster

ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance

Pages 91 - 99

https://doi.org/10.1145/3604237.3626898

Published: 25 November 2023

Abstract

Developing a generative model of realistic order flow in financial markets is a challenging open problem, with numerous applications for market participants. Addressing this, we propose the first end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages. These messages are interpreted by the JAX-LOB simulator, which updates the LOB state. To handle long sequences efficiently, the model employs simplified structured state-space layers to process sequences of order book states and tokenized messages. Using LOBSTER data of NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens, similar to tokenization in large language models. Out-of-sample results show promising performance in approximating the data distribution, as evidenced by low model perplexity. Furthermore, the mid-price returns calculated from the generated order flow exhibit a significant correlation with the data, indicating impressive conditional forecast performance. Due to the granularity of generated data, and the accuracy of the model, it offers new application areas for future work beyond forecasting, e.g. acting as a world model in high-frequency financial reinforcement learning applications. Overall, our results invite the use and extension of the model in the direction of autoregressive large financial models for the generation of high-frequency financial data.
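To make the two technical ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one way to convert groups of successive digits of a numeric message field into tokens, and how perplexity is computed from per-token probabilities. The group size of 3 digits, the field width of 9 digits, and the function names `tokenize_field`, `detokenize_field` and `perplexity` are all assumptions made for illustration; the abstract does not specify them.

```python
import math

GROUP = 3  # digits per token; an assumed value, not taken from the paper


def tokenize_field(value: int, width: int = 9, group: int = GROUP) -> list[int]:
    """Split a non-negative integer into fixed-size digit groups.

    The value is zero-padded to `width` digits and cut into chunks of
    `group` digits, most significant chunk first. Each chunk is one token
    in a vocabulary of size 10**group.
    """
    if width % group != 0:
        raise ValueError("width must be a multiple of group")
    if value < 0 or value >= 10 ** width:
        raise ValueError("value does not fit in the given width")
    digits = str(value).zfill(width)
    return [int(digits[i:i + group]) for i in range(0, width, group)]


def detokenize_field(tokens: list[int], group: int = GROUP) -> int:
    """Invert tokenize_field by re-padding each token and joining the digits."""
    return int("".join(str(t).zfill(group) for t in tokens))


def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood of the observed tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


if __name__ == "__main__":
    price = 1234500  # hypothetical price field in integer units
    tokens = tokenize_field(price)
    print(tokens, detokenize_field(tokens))
    # A model that assigns probability 0.25 to every observed token is as
    # uncertain as a uniform choice among four options.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Fixed-width digit groups keep the vocabulary small (here at most 1000 symbols per field position) while still allowing any value in the field's range to be represented exactly, which is the property the round trip through `detokenize_field` checks.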

Cited By

S. Elomari-Kessab, G. Maitrier, J. Bonart, and J. Bouchaud. 2024. Microstructure Modes -- Disentangling the Joint Dynamics of Prices & Order Flow. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4831906

A. Sideras, K. Bougiatiotis, E. Zavitsanos, G. Paliouras, and G. Vouros. 2024. Bankruptcy Prediction: Data Augmentation, LLMs and the Need for Auditor's Opinion. In Proceedings of the 5th ACM International Conference on AI in Finance. 453–460. https://dl.acm.org/doi/10.1145/3677052.3698627

Index Terms

Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning

Published In

ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance

November 2023

697 pages

ISBN:9798400702402

DOI:10.1145/3604237

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICAIF '23

ICAIF '23: 4th ACM International Conference on AI in Finance

November 27 - 29, 2023

Brooklyn, NY, USA
