Composite leading search index: a preprocessing method of internet search data for stock trends prediction

Ying Liu¹, Yibing Chen¹, Sheng Wu¹, Geng Peng¹ & Benfu Lv¹


Abstract

Previous studies have shown that Internet search data is a new source of data that can be used to predict the stock market. In this new, data-driven research field, the choice of data preprocessing method is crucial for achieving accurate predictions. This paper proposes a preprocessing method for Internet search data, the composite leading search index (CLSI), which consists of three steps: (a) keyword selection, (b) time difference measurement, and (c) leading index composition. We demonstrate the validity of CLSI by comparing its results with those of the search volume index (SVI), the measure most commonly used in previous literature. We build a time series model (TS) with error correction and a support vector regression (SVR) model for stock trend prediction, and combine them with the two preprocessing methods into four models for comparison: SVI–TS, CLSI–TS, SVI–SVR, and CLSI–SVR. We test these four models on the Chinese stock market, which attracts growing interest from investors, and analyze the results on nine datasets: the stable, peak, and trough periods of the Shanghai Composite Index, the Shenzhen Composite Index, and the Hushen 300 Index. The results show that, with either TS or SVR as the forecasting model, CLSI outperforms SVI on the majority of the test datasets and performs almost the same as SVI on the remaining ones. This provides some evidence that CLSI is a more efficient preprocessing method of Internet search data for stock trend prediction.
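The abstract lists the three CLSI steps without giving their formulas. The sketch below illustrates steps (b) and (c) under assumptions that are not taken from the paper: the time difference of each keyword is taken as the lag (0 to max_lag periods) that maximises the absolute Pearson correlation between its search volume and the stock index, and the composite is a correlation-weighted average of the standardized series of keywords that lead the index by at least min_lag periods. The reference list also cites the Kullback–Leibler distance, so the authors' actual time difference measure may well differ. The functions pearson, best_lead, zscore and composite_leading_index and their parameters are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def best_lead(search, target, max_lag):
    """Return (k, r): the lag k in [0, max_lag] maximising |corr(search[t], target[t + k])|."""
    n = len(search)
    if len(target) != n or not 0 <= max_lag < n - 1:
        raise ValueError("series must have equal length and max_lag < len - 1")
    best_lag, best_r = 0, 0.0
    for k in range(max_lag + 1):
        # Pair search volume at t with the index value k periods later.
        r = pearson(search[: n - k], target[k:])
        if abs(r) > abs(best_r):
            best_lag, best_r = k, r
    return best_lag, best_r


def zscore(x):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s if s else 0.0 for v in x]


def composite_leading_index(series, target, max_lag=5, min_lag=1):
    """Correlation-weighted average of z-scored keyword series that lead the target.

    series: dict mapping keyword -> search volumes, same length as target.
    Keywords whose best lag is below min_lag (or with zero correlation) are dropped.
    Returns (index, leads), where leads maps each keyword to its (lag, corr).
    """
    leads = {kw: best_lead(s, target, max_lag) for kw, s in series.items()}
    kept = {kw: lr for kw, lr in leads.items() if lr[0] >= min_lag and lr[1] != 0}
    if not kept:
        raise ValueError("no leading keywords found")
    total = sum(abs(r) for _, r in kept.values())
    index = [0.0] * len(target)
    for kw, (_, r) in kept.items():
        w = r / total  # a negative weight flips keywords that move against the target
        for t, z in enumerate(zscore(series[kw])):
            index[t] += w * z
    return index, leads
```

For example, a keyword series that runs two periods ahead of the index (search[t] equal to target[t + 2]) gets lag 2 and correlation 1 from best_lead, while a constant series is dropped. In a real backtest, lag selection and standardization would need to be fitted on the training window only, since using the full sample leaks future information into the index.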



Notes

LIBSVM is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html.
In convex optimization, when the Karush–Kuhn–Tucker (KKT) conditions are satisfied, the optimal solution of the original (primal) problem coincides with the solution obtained through its dual problem. In applications of Support Vector Machines (including classification and regression), the dual problem rather than the primal problem is solved.
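For orientation, the dual problem mentioned in the note above can be written in the textbook ε-insensitive SVR form, where K is the kernel, C the penalty parameter, ε the width of the insensitive zone, and (x_i, y_i), i = 1, …, l, the training pairs. This is the standard formulation, not necessarily the exact one used in the paper (whose kernel and parameter choices are not shown here); LIBSVM solves an equivalent problem written with its own sign conventions.

```latex
\begin{aligned}
\max_{\alpha,\alpha^*}\quad & -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(x_i,x_j)
  -\varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*)+\sum_{i=1}^{l}y_i(\alpha_i-\alpha_i^*)\\
\text{s.t.}\quad & \sum_{i=1}^{l}(\alpha_i-\alpha_i^*)=0,\qquad 0\le\alpha_i,\ \alpha_i^*\le C,\quad i=1,\dots,l,\\[4pt]
f(x) &= \sum_{i=1}^{l}(\alpha_i-\alpha_i^*)\,K(x_i,x)+b .
\end{aligned}
```

The KKT complementarity conditions imply that only training points lying on or outside the ε-tube receive nonzero α_i or α_i^*; these support vectors alone determine the regression function f.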

References

Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259–1294.
Askitas, N., & Zimmermann, K. F. (2009). Google econometrics and unemployment forecasting. Applied Economics Quarterly, 55(2), 107–120.
Boehm, E. A. (2001). The contribution of economic indicator analysis to understanding and forecasting business cycles. Indian Economic Review, 36, 1–36.
Bowerman, B. L., O’Connell, R. T., & Koehler, A. B. (2004). Forecasting, time series and regression: An applied approach. Belmont, CA: Thomson Brooks/Cole.
Cao, Q., Parry, M. E., & Leggio, K. B. (2011). The three-factor model and artificial neural networks: Predicting stock price movement in China. Annals of Operations Research, 185(1), 25–44.
Capon, N., Fitzsimons, G. J., & Prince, R. A. (1996). An individual level analysis of the mutual fund investment decision. Journal of Financial Services Research, 10, 59–82.
Choi, H., & Varian, H. (2012). Predicting the present with Google Trends. Economic Record, 88(s1), 2–9.
Clarkson, G. P. E. (1963). A model of the trust investment process. In Computers and thought. New York: McGraw-Hill.
Da, Z., Engelberg, J., & Gao, P. (2011). In search of attention. The Journal of Finance, 66(5), 1461–1499.
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457, 1012–1014.
Granger, C. W. (1988). Some recent developments in a concept of causality. Journal of Econometrics, 39(1), 199–211.
Hanke, J. E., & Reitsch, A. G. (1995). Business forecasting (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Huang, W., Nakamori, Y., & Wang, S. (2005). Forecasting stock market movement direction with support vector machine. Computers and Operations Research, 32, 2513–2522.
Hulth, A., Rydevik, G., & Linde, A. (2009). Web queries as a source for syndromic surveillance. PLoS One, 4(2), e4378.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Kim, K. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55, 307–319.
Kullback, S. (1987). The Kullback–Leibler distance. The American Statistician, 41(4), 340–341.
Mao, H., Counts, S., & Bollen, J. (2011). Predicting financial markets: Comparing survey, news, Twitter and search engine data. arXiv preprint arXiv:1112.1051.
Mitchell, T. (2009). Mining our reality. Science, 326, 1644–1645.
Moore, G. H., & Shiskin, J. (1967). Indicators of business expansions and contractions. NBER Occasional Paper No. 103.
Smith, G. P. (2012). Google internet search activity and volatility prediction in the market for foreign currency. Finance Research Letters, 9(2), 103–110.
Tierney, H. L. R., & Pan, B. (2012). A Poisson regression examination of the relationship between website traffic and search engine queries. NETNOMICS: Economic Research and Electronic Networking, 13, 155–189.
Tumarkin, R., & Whitelaw, R. F. (2001). News or noise? Internet postings and stock prices. Financial Analysts Journal, 57(3), 41–51.
Wang, L., & Zhu, J. (2010). Financial market forecasting using a two-step kernel learning method for the support vector regression. Annals of Operations Research, 174(1), 103–120.


Acknowledgments

This work has been partially supported by the National Natural Science Foundation of China under Grants 71202115, 71172199, 71201143, and 70972104, the Beijing Natural Science Foundation under Grant 9143021, the Postdoctoral Science Foundation of China under Grant 2013T60158, and sponsorship from the China Scholarship Council (CSC).

Author information

Authors and Affiliations

School of Management, University of Chinese Academy of Sciences, Beijing, 100190, China
Ying Liu, Yibing Chen, Sheng Wu, Geng Peng & Benfu Lv


Corresponding author

Correspondence to Geng Peng.

Additional information

The authors are grateful to the anonymous reviewers and editors for their helpful comments and suggestions to improve the paper.


About this article

Cite this article

Liu, Y., Chen, Y., Wu, S. et al. Composite leading search index: a preprocessing method of internet search data for stock trends prediction. Ann Oper Res 234, 77–94 (2015). https://doi.org/10.1007/s10479-014-1779-z


Published: 13 January 2015
Issue Date: November 2015
DOI: https://doi.org/10.1007/s10479-014-1779-z
