Abstract
The World Wide Web is rich in information, including text, XML, multimedia, and time series data. The web is usually represented as a large graph, and PageRank is computed over it to rank the importance of web pages. In this paper, we study the problem of ranking evolving time series and discovering leaders among them by analyzing lead-lag relations. A time series is considered a leader if its rise or fall impacts the behavior of many other time series. At each time point, we compute the lagged correlation between each pair of time series and model these correlations as a graph. A leadership rank is then computed from the graph, which brings order to the time series, and the leaders are extracted based on this ranking. The problem is challenging because the dynamic nature of time series makes the graph that models their relationships highly evolving. We propose an efficient algorithm that tracks the lagged correlations and computes the leaders incrementally while still achieving good accuracy. Our experiments on real weather science data and stock data show that the algorithm computes time series leaders efficiently in real time, and that the detected leaders demonstrate high predictive power on the events of general time series entities, which can inform both weather monitoring and financial risk control.
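The pipeline described in the abstract (pairwise lagged correlation, a graph over the series, a PageRank-style leadership rank) can be illustrated with a minimal sketch. This is not the paper's algorithm: it recomputes everything from scratch for a single window rather than incrementally, and the function names, the `max_lag` and `threshold` parameters, the rule of keeping only the stronger direction for each pair, and the damping factor are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lagged_correlation(leader, follower, max_lag):
    """Largest |correlation| between leader[t] and follower[t + lag], lag = 1..max_lag."""
    return max(
        abs(pearson(leader[:-lag], follower[lag:]))
        for lag in range(1, max_lag + 1)
    )


def build_graph(series, max_lag=3, threshold=0.8):
    """Return edges[follower][leader] = weight.

    Assumption: for each pair, only the direction with the stronger lagged
    correlation is kept, and only if it reaches the threshold.
    """
    names = list(series)
    edges = {name: {} for name in names}
    for idx, i in enumerate(names):
        for j in names[idx + 1:]:
            i_leads = best_lagged_correlation(series[i], series[j], max_lag)
            j_leads = best_lagged_correlation(series[j], series[i], max_lag)
            if max(i_leads, j_leads) < threshold:
                continue
            if i_leads >= j_leads:
                edges[j][i] = i_leads  # j follows i
            else:
                edges[i][j] = j_leads  # i follows j
    return edges


def leadership_rank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; rank flows from followers to leaders."""
    names = list(edges)
    n = len(names)
    rank = {v: 1.0 / n for v in names}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in names}
        for follower in names:
            out = edges[follower]
            total = sum(out.values())
            if total == 0:
                # A series that follows nobody spreads its rank evenly.
                for v in names:
                    new[v] += damping * rank[follower] / n
            else:
                for leader, weight in out.items():
                    new[leader] += damping * rank[follower] * weight / total
        rank = new
    return rank


def top_leaders(series, k=1, **graph_options):
    """Names of the k highest-ranked series."""
    rank = leadership_rank(build_graph(series, **graph_options))
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

In this sketch an edge points from a follower to the series it lags behind, so rank accumulates on series that many others follow, in the same way PageRank accumulates on pages that many others link to. A streaming version would update the correlations and the rank as new points arrive rather than rebuilding the graph each time.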
Cite this article
Wu, D., Ke, Y., Yu, J.X. et al. Leadership discovery when data correlatively evolve. World Wide Web 14, 1–25 (2011). https://doi.org/10.1007/s11280-010-0095-z
DOI: https://doi.org/10.1007/s11280-010-0095-z