On exploiting the power of time in data mining

Published: 20 December 2008

Abstract

We introduce the new paradigm of Change Mining as data mining over a volatile, evolving world with the objective of understanding change. While there is much work on incremental mining and stream mining, both of which focus on the adaptation of patterns to a changing data distribution, Change Mining concentrates on understanding the changes themselves. This includes detecting when change occurs in the population under observation, describing the change, predicting change, and acting proactively in response to it. We identify the main tasks of Change Mining and discuss to what extent they are already present in related research areas. We elaborate on research results that can contribute to these tasks, giving a brief overview of the current state of the art and identifying open areas and challenges for the new research area.

Published In

ACM SIGKDD Explorations Newsletter, Volume 10, Issue 2
December 2008
98 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1540276

Publisher

Association for Computing Machinery

New York, NY, United States
