A Comparative Study of Bitcoin Price Prediction Using Deep Learning
Figure 1. Bitcoin daily prices on Bitstamp (USD) from 29 November 2011 to 31 December 2018. The upper line shows log prices, whereas the lower line shows plain prices. Some recurring patterns seem to exist when considering the log value of the Bitcoin price.

Figure 2. Spearman rank correlation coefficient matrix between Bitcoin blockchain features.

Figure 3. Changes in the values of selected features.

Figure 4. Data preprocessing. A total of 2590 − m + 1 sequences are generated from the whole dataset of 2590 days if m consecutive days are analyzed to predict the next Bitcoin price.

Figure 5. An example of a fully-connected deep neural network (DNN) model.

Figure 6. Recurrent neural network (RNN) model (left) and long short-term memory (LSTM) model (right).

Figure 7. Our CNN model. It consists of a single 2D convolution layer in which 36 filters of size 3 × 18 are used for convolution. An m × 18 input matrix is translated into an (m − 2) × 36 matrix by the Conv2D layer.

Figure 8. Our ResNet model. It uses four residual blocks, which implement shortcut connections between two convolution layers.

Figure 9. Our CRNN model (a combination of CNNs and RNNs), which combines the results of the CNN and LSTM models using a concatenation operator. The Conv2D block represents the 2D convolution layer in Figure 7, and the LSTM block represents the LSTM model discussed in Section 3.2.2.

Figure 10. Stacking ensemble model: DNN, LSTM, and CNN are used as base learners at the first level, and another DNN is used as a meta learner at the second level.

Figure 11. Prediction results of the DNN model with m = 5, 10, 20, 50, 100. The log values of the major features, the sequential partitioning, and the first value-based normalization were used.

Figure 12. Prediction results of the proposed models with m = 20. The log values of the major features, the sequential partitioning, and the first value-based normalization were used.

Figure 13. Prediction results of DNN with m = 5, 20 (left) and prediction results of DNN and LSTM with m = 20 (right) from 1 December 2017 to 1 April 2018, excerpted from Figures 11 and 12.

Figure 14. Prediction results of the DNN model with different normalization methods. The sequence size m = 20. The log values of the major features and the sequential partitioning were used.
Abstract
1. Introduction
1.1. Related Work
1.2. Organization of the Paper
2. Datasets and Feature Engineering
3. Methods
3.1. Data Preparation
3.2. Prediction Models
3.2.1. Deep Neural Networks
3.2.2. Recurrent Neural Networks and Long Short-Term Memory
3.2.3. Convolutional Neural Networks
3.2.4. Deep Residual Networks
3.2.5. Combinations of CNNs and RNNs
3.2.6. Ensemble Models
4. Experimental Results
- The first set of experiments varied the size, m, of the input sequence. We used m = 5, 10, 20, 50, and 100. For the other experiments below, m was set to 20 for regression problems and 50 for classification problems.
- In the second set of experiments, for the six blockchain features difficulty, est-trans-vol-usd, hash-rate, my-wallets, trade-vol, and trans-fees-usd, whose ranges between the minimum and the maximum values are very large, we compared the performance of using their log values with that of using their plain values. For the other experiments, the default setting was to use their log values.
- Different data split methods were also compared. The default split method was to use the first part of the entire data as training data and the rest as test data (i.e., sequential partitioning).
- In addition, different normalization methods were compared. The default normalization method was the first value-based normalization discussed in Section 3.1.
- Finally, we compared the profitability of the proposed models using a simple trading strategy: if the model predicts a price rise, buy Bitcoin with all available funds (or keep holding it); otherwise, sell all Bitcoin (or stay out of the market).
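The data preparation and trading rule described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the exact form of the first value-based normalization (dividing every value in a window by the window's first value) is assumed from its name.

```python
def make_windows(series, m):
    # A series of n days yields n - m (window, next-day target) pairs.
    # (The paper counts 2590 - m + 1 windows of m days; the last window
    # has no next-day target, so only n - m pairs are usable here.)
    X = [series[i:i + m] for i in range(len(series) - m)]
    y = [series[i + m] for i in range(len(series) - m)]
    return X, y

def first_value_normalize(window):
    # Divide every value by the window's first value so each window
    # starts at 1.0 (assumed form of the first value-based normalization).
    return [v / window[0] for v in window]

def simulate_trading(prices, predicted_up, initial_cash=10000.0):
    # All-in/all-out rule: hold Bitcoin while a rise is predicted,
    # hold cash otherwise; report the final balance in USD.
    cash, btc = initial_cash, 0.0
    for price, up in zip(prices, predicted_up):
        if up and cash > 0:
            btc, cash = cash / price, 0.0   # predicted rise: buy with all funds
        elif not up and btc > 0:
            cash, btc = btc * price, 0.0    # predicted fall: sell everything
    return cash + btc * prices[-1]

# Toy example: 7 days of prices, windows of m = 3 days.
series = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0, 108.0]
X, y = make_windows(series, m=3)
Xn = [first_value_normalize(w) for w in X]
```

The normalization makes windows from different price regimes comparable: a window starting at $100 and one starting at $10,000 both begin at 1.0, so the model sees relative movements rather than absolute price levels.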
4.1. Effect of the Sequence Size
4.1.1. Effect of the Sequence Size on Regression
4.1.2. Effect of the Sequence Size on Classification
4.2. Effect of Using Log Values
4.3. Effect of Data Split Methods
4.4. Effect of Normalization Methods
4.5. Profitability of the Proposed Models
5. Discussion and Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Feature | Description |
---|---|
avg-block-size | The 24 h average block size in MB. |
blockchain-size | The total size of all block headers and transactions. |
cost-per-trans | Miners' revenue divided by the number of transactions. |
cost-per-trans-pct | Miners' revenue as a percentage of the transaction volume. |
difficulty | A relative measure of difficulty in finding a new block. |
est-trans-vol | The estimated value of transactions on the Bitcoin blockchain in BTC. |
est-trans-vol-usd | The estimated USD value of transactions. |
hash-rate | The estimated number of tera hashes per second the Bitcoin network is performing. |
market-cap | The total USD value of Bitcoin supply in circulation. |
market-price | The average USD market price across major Bitcoin exchanges. |
med-cfm-time | The median time for a transaction to be accepted into a mined block. |
mempool-count | The number of transactions waiting to be confirmed. |
mempool-growth | The rate of the memory pool (mempool) growth per second. |
mempool-size | The aggregate size of transactions waiting to be confirmed. |
miners-revenue | The total value of coinbase block rewards and transaction fees paid to miners. |
my-wallets | The total number of blockchain wallets created. |
n-trans | The number of daily confirmed Bitcoin transactions. |
n-trans-excl-100 | The total number of transactions per day, excluding transaction chains longer than 100. |
n-trans-excl-popular | The total number of transactions, excluding those involving any of the network's 100 most popular addresses. |
n-trans-per-block | The average number of transactions per block. |
n-trans-total | The total number of transactions. |
n-unique-addr | The total number of unique addresses used on the Bitcoin blockchain. |
output-val | The total value of all transaction outputs per day. |
total-bitcoins | The total number of Bitcoins that have already been mined. |
trade-vol | The total USD value of trading volume on major Bitcoin exchanges. |
trans-fees | The total BTC value of all transaction fees paid to miners. |
trans-fees-usd | The total USD value of all transaction fees paid to miners. |
trans-per-sec | The number of Bitcoin transactions added to the mempool per second. |
utxo-count | The number of unspent Bitcoin transaction outputs (UTXOs). |
Bitcoin Feature | Coefficient | Bitcoin Feature | Coefficient |
---|---|---|---|
avg-block-size | 0.8496 | n-trans | 0.8162 |
blockchain-size | 0.9030 | n-trans-excl-100 | 0.8469 |
cost-per-trans | 0.7542 | n-trans-excl-popular | 0.8233 |
cost-per-trans-pct | −0.4874 | n-trans-per-block | 0.8070 |
difficulty | 0.9033 | n-trans-total | 0.9030 |
est-trans-vol | −0.0595 | n-unique-addr | 0.8655 |
est-trans-vol-usd | 0.9436 | output-vol | 0.1537 |
hash-rate | 0.9033 | total-bitcoins | 0.9030 |
market-cap | 0.9975 | trade-vol | 0.8977 |
market-price | 0.9997 | trans-fees | 0.4227 |
med-cfm-time | 0.0895 | trans-fees-usd | 0.9390 |
miners-revenue | 0.9692 | utxo-count | 0.9067 |
my-wallets | 0.9030 | | |
Notation | Definition |
---|---|
S | A raw dataset |
S(i:j) | The sequence from the i-th to the j-th data point of S |
X | A data window of size m |
x_i | The i-th data point in X |
x_{i,j} | The value of the j-th dimension of the i-th data point in X |
Size | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM |
---|---|---|---|---|---|---|---|
5 | 3.61 | 3.79 | 4.27 | 4.95 | 4.12 | 4.02 | 4.75 |
10 | 4.00 | 3.96 | 4.88 | 7.12 | 4.26 | 4.80 | 4.88 |
20 | 4.81 | 4.46 | 7.93 | 8.96 | 5.90 | 6.19 | 5.19 |
50 | 10.88 | 6.68 | 20.00 | 16.91 | 10.11 | 11.45 | 6.34 |
100 | 21.44 | 27.75 | 115.22 | 52.10 | 39.13 | 48.87 | 12.77 |
Size | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM | Base | Random |
---|---|---|---|---|---|---|---|---|---|
5 | 49.16% | 50.43% | 50.14% | 50.88% | 50.02% | 49.14% | 50.88% | 50.88% | 51.38% |
10 | 48.19% | 50.22% | 48.74% | 50.51% | 48.37% | 49.43% | 50.79% | 50.79% | 50.62% |
20 | 50.64% | 49.70% | 50.78% | 49.84% | 48.86% | 51.02% | 50.80% | 50.80% | 49.68% |
50 | 53.06% | 50.94% | 52.48% | 49.83% | 51.52% | 52.02% | 50.85% | 50.85% | 50.98% |
100 | 48.16% | 48.89% | 48.49% | 50.00% | 48.59% | 49.40% | 50.00% | 50.00% | 48.61% |
Measure | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM | Base | Random |
---|---|---|---|---|---|---|---|---|---|
Accuracy | 53.06% | 50.94% | 52.48% | 49.83% | 51.52% | 52.02% | 50.85% | 50.85% | 50.98% |
Precision | 52.90% | 51.40% | 52.70% | 20.40% | 52.00% | 51.89% | 51.00% | 51.00% | 51.80% |
Recall | 69.70% | 66.70% | 66.80% | 40.00% | 63.50% | 75.22% | 100.00% | 100.00% | 51.20% |
Specificity | 53.70% | 50.00% | 52.30% | 29.40% | 50.80% | 52.33% | 0.00% | 0.00% | 50.20% |
F1 score | 60.00% | 57.90% | 58.50% | 26.80% | 56.90% | 61.33% | 67.00% | 67.00% | 51.50% |
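The F1 scores in the table above are consistent with the standard definition F1 = 2PR/(P + R). The short check below (illustrative code, not from the paper) reproduces the DNN and SVM rows; note that the SVM and baseline columns, with 100% recall and 0% specificity, correspond to a classifier that always predicts a rise.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# DNN row: precision 52.90%, recall 69.70% -> F1 of roughly 60%
dnn_f1 = f1_score(0.5290, 0.6970)

# SVM row: precision 51.00%, recall 100.00% -> F1 of roughly 67%
svm_f1 = f1_score(0.5100, 1.0)
```

A classifier that always predicts a rise achieves high recall and a deceptively decent F1 score whenever rises are slightly more common than falls, which is why accuracy and specificity are also reported in the table.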
Feature values | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM |
---|---|---|---|---|---|---|---|
Log values | 4.81 | 4.46 | 7.93 | 8.96 | 5.90 | 6.19 | 5.19 |
Plain values | 5.48 | 5.21 | 10.13 | 8.94 | 7.50 | 6.40 | 7.02 |
Feature values | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM | Base | Random |
---|---|---|---|---|---|---|---|---|---|
Log values | 53.06% | 50.94% | 52.48% | 49.83% | 51.52% | 52.02% | 50.85% | 50.85% | 50.98% |
Plain values | 50.85% | 50.53% | 49.83% | 50.81% | 52.22% | 49.74% | 50.85% | 50.85% | 49.76% |
Split method | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM |
---|---|---|---|---|---|---|---|
sequential | 4.81 | 4.46 | 7.93 | 8.96 | 5.90 | 6.19 | 5.19 |
random | 3.65 | 3.52 | 4.93 | 5.26 | 5.20 | 4.16 | 4.23 |
5-fold CV | 4.83 | 4.09 | 6.80 | 9.75 | 7.12 | 5.18 | 5.04 |
Split method | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM | Base | Random |
---|---|---|---|---|---|---|---|---|---|
sequential | 53.06% | 50.94% | 52.48% | 49.83% | 51.52% | 52.02% | 50.85% | 50.85% | 50.98% |
random | 52.46% | 52.24% | 52.17% | 49.76% | 52.40% | 52.09% | 50.97% | 51.18% | 50.63% |
5-fold CV | 53.60% | 51.82% | 51.07% | 51.92% | 51.06% | 51.14% | 53.76% | 54.00% | 50.68% |
Normalization | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM |
---|---|---|---|---|---|---|---|
first value | 4.81 | 4.46 | 7.93 | 8.96 | 5.90 | 6.19 | 5.19 |
minmax | 14.23 | 14.18 | 157.00 | 92.68 | 35.73 | 22.34 | 34.44 |
Normalization | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM | Base | Random |
---|---|---|---|---|---|---|---|---|---|
first value | 53.06% | 50.94% | 52.48% | 49.83% | 51.52% | 52.02% | 50.85% | 50.85% | 50.98% |
minmax | 51.71% | 51.28% | 49.83% | 50.85% | 49.36% | 49.44% | 50.85% | 50.85% | 49.82% |
Task | DNN | LSTM | CNN | ResNet | CRNN | Ensemble | SVM | Base | Random |
---|---|---|---|---|---|---|---|---|---|
regression | 6755.55 | 8806.72 | 6616.87 | 7608.35 | 8102.71 | 5772.99 | 9842.95 | − | − |
classification | 10877.07 | 10359.42 | 10422.19 | 10619.98 | 10315.18 | 10432.44 | 9532.43 | 9532.43 | 9918.70 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Ji, S.; Kim, J.; Im, H. A Comparative Study of Bitcoin Price Prediction Using Deep Learning. Mathematics 2019, 7, 898. https://doi.org/10.3390/math7100898