
A framework to monitor clusters evolution applied to economy and finance problems

Published: 01 January 2012

Abstract

The study of evolution has become an important research issue, especially in the last decade, due to our ability to collect and store highly detailed and time-stamped data. The need to describe and understand the behavior of a given phenomenon over time has led to the emergence of new frameworks and methods focused on the temporal evolution of data and models. In this paper we address the problem of monitoring the evolution of clusters over time and propose the MEC framework. MEC traces evolution through the detection and categorization of cluster transitions, such as births, deaths and merges, and enables their visualization through bipartite graphs. It includes a taxonomy of transitions, a tracking method based on the computation of conditional probabilities, and a transition detection algorithm. We use MEC with two main goals: to determine general evolution trends and to detect abnormal behavior or rare events. To demonstrate the applicability of our framework, we present real-world economic and financial case studies, using datasets extracted from the Banco de Portugal Central Balance-Sheet Database and The Data Page of New York University, Leonard N. Stern School of Business. The results allow us to draw interesting conclusions about the evolution of activity sectors and European companies.
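The abstract describes tracking clusters between consecutive time points by computing conditional probabilities and reading transitions off a bipartite graph. The Python sketch below illustrates that general idea under stated assumptions; it is not the exact MEC taxonomy or detection algorithm from the paper. It assumes each object carries a cluster label at times t and t+1, estimates the edge weights P(x in C_j at t+1 | x in C_i at t) and the reverse direction only from objects present at both times, and prunes edges below a threshold `tau` (0.5 here). The function name `cluster_transitions`, the threshold value, and the split and survival rules are hypothetical illustrative choices.

```python
from collections import Counter


def cluster_transitions(labels_t, labels_next, tau=0.5):
    """Hypothetical sketch: classify transitions between two clusterings.

    labels_t / labels_next map object id -> cluster id at times t and t+1.
    Edge weights are empirical conditional probabilities estimated from the
    objects present at both times; `tau` is an assumed pruning threshold.
    """
    old, new = set(labels_t.values()), set(labels_next.values())
    common = labels_t.keys() & labels_next.keys()  # objects seen at both times
    pair_counts = Counter((labels_t[o], labels_next[o]) for o in common)
    size_t = Counter(labels_t[o] for o in common)
    size_next = Counter(labels_next[o] for o in common)

    # Bipartite graph between clusters at t and clusters at t+1:
    # succ[i] holds j with P(x in C_j at t+1 | x in C_i at t) >= tau,
    # pred[j] holds i with P(x in C_i at t | x in C_j at t+1) >= tau.
    succ = {i: set() for i in old}
    pred = {j: set() for j in new}
    for (i, j), n in pair_counts.items():
        if n / size_t[i] >= tau:
            succ[i].add(j)
        if n / size_next[j] >= tau:
            pred[j].add(i)

    events, split_sources, merge_targets = [], set(), set()
    for i in sorted(old):
        children = {j for j in new if i in pred[j]}
        if len(children) >= 2:  # several new clusters come mostly from C_i
            split_sources.add(i)
            events.append(("split", i, tuple(sorted(children))))
        elif not succ[i] and not children:  # C_i has no significant successor
            events.append(("death", i, None))
    for j in sorted(new):
        parents = {i for i in old if j in succ[i]}
        if len(parents) >= 2:  # most members of several old clusters end up in C_j
            merge_targets.add(j)
            events.append(("merge", tuple(sorted(parents)), j))
        elif not pred[j] and not parents:  # C_j has no significant predecessor
            events.append(("birth", None, j))
    for i in sorted(old):
        if len(succ[i]) == 1 and i not in split_sources:
            j = next(iter(succ[i]))
            if pred[j] == {i} and j not in merge_targets:  # one-to-one match
                events.append(("survival", i, j))
    return events


# Toy example: A and B merge into X, C splits into Y and Z,
# D's members disappear, E survives as W, and V is new.
t0 = {1: "A", 2: "A", 3: "A", 4: "A", 5: "B", 6: "B",
      7: "C", 8: "C", 9: "C", 10: "C", 11: "D", 12: "D",
      14: "E", 15: "E", 16: "E"}
t1 = {1: "X", 2: "X", 3: "X", 4: "X", 5: "X", 6: "X",
      7: "Y", 8: "Y", 9: "Z", 10: "Z",
      14: "W", 15: "W", 16: "W", 17: "V", 18: "V"}
print(cluster_transitions(t0, t1))
# [('split', 'C', ('Y', 'Z')), ('death', 'D', None), ('birth', None, 'V'),
#  ('merge', ('A', 'B'), 'X'), ('survival', 'E', 'W')]
```

Raising `tau` keeps fewer edges, so more clusters are reported as births or deaths; lowering it adds edges and makes merges and splits easier to trigger. Clusters whose edges match none of these patterns are simply left unclassified in this sketch.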

Information

Published In

Intelligent Data Analysis, Volume 16, Issue 1
January 2012
159 pages

Publisher

IOS Press

Netherlands

Author Tags

  1. Bipartite Graphs
  2. Change Mining
  3. Clustering Evolution
  4. Monitoring
  5. Transitions

Qualifiers

  • Article

Bibliometrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 12 Nov 2024

Cited By

  • (2018) Clustering stream data by exploring the evolution of density mountain. Proceedings of the VLDB Endowment 11:4 (393-405). DOI: 10.1145/3164135.3164136. Online publication date: 5-Oct-2018
  • (2018) Data Stream Evolution Diagnosis Using Recursive Wavelet Density Estimators. ACM Transactions on Knowledge Discovery from Data 12:1 (1-28). DOI: 10.1145/3106369. Online publication date: 23-Jan-2018
  • (2017) Clustering stream data by exploring the evolution of density mountain. Proceedings of the VLDB Endowment 11:4 (393-405). DOI: 10.1145/3186728.3164136. Online publication date: 1-Dec-2017
  • (2016) Analyzing the behavior dynamics of grain price indexes using Tucker tensor decomposition and spatio-temporal trajectories. Computers and Electronics in Agriculture 120:C (72-78). DOI: 10.1016/j.compag.2015.11.011. Online publication date: 1-Jan-2016
  • (2014) Survival analysis on data streams. International Journal of Applied Mathematics and Computer Science 24:1 (199-212). DOI: 10.2478/amcs-2014-0015. Online publication date: 1-Mar-2014
  • (2014) Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter 16:1 (1-10). DOI: 10.1145/2674026.2674028. Online publication date: 25-Sep-2014
  • (2013) Data stream clustering. ACM Computing Surveys 46:1 (1-31). DOI: 10.1145/2522968.2522981. Online publication date: 11-Jul-2013