Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Temporal Skeletonization on Sequential Data: Patterns, Categorization, and Visualization

Published: 01 January 2016 Publication History

Abstract

Sequential pattern analysis aims at finding statistically relevant temporal structures where the values are delivered in a sequence. With the growing complexity of real-world dynamic scenarios, more and more symbols are often needed to encode the sequential values. This is so-called “curse of cardinality”, which can impose significant challenges to the design of sequential analysis methods in terms of computational efficiency and practical use. Indeed, given the overwhelming scale and the heterogeneous nature of the sequential data, new visions and strategies are needed to face the challenges. To this end, in this paper, we propose a “temporal skeletonization” approach to proactively reduce the cardinality of the representation for sequences by uncovering significant, hidden temporal structures. The key idea is to summarize the temporal correlations in an undirected graph, and use the “skeleton” of the graph as a higher granularity on which hidden temporal patterns are more likely to be identified. As a consequence, the embedding topology of the graph allows us to translate the rich temporal content into a metric space. This opens up new possibilities to explore, quantify, and visualize sequential data. Our approach has shown to greatly alleviate the curse of cardinality in challenging tasks of sequential pattern mining and clustering. Evaluation on a business-to-business (B2B) marketing application demonstrates that our approach can effectively discover critical buying paths from noisy customer event data.

References

[1]
R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proc. Int. Conf. Data Eng., 1995, pp. 3–14.
[2]
J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential pattern mining using a bitmap representation,” in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2002, pp. 429–435.
[3]
M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in Proc. Adv. Neural Inf. Process. Syst., 2001, vol. 14, pp. 585 –591.
[4]
Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, Aug. 1995.
[5]
P. Demartines and J. Hérault, “Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets,” IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 148–154, Jan. 1997.
[6]
F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi, “Trajectory pattern mining,” in Proc. 13th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2007, pp. 330–339.
[7]
J. Han and Y. Fu, “ Mining multiple-level association rules in large databases,” IEEE Trans. Knowl. Data Eng., vol. 11, no. 5, pp. 798–805, Sep./Oct. 1999.
[8]
J. Han, J. Pei, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth,” in Proc. 17th Int. conf. Data Eng., 2001, pp. 215–224.
[9]
J. Han, H. Cheng, D. Xin, and X. Yan, “Frequent pattern mining: Current status and future directions,” Data Mining Knowl. Discovery, vol. 15, no. 1, pp. 55–86, 2007.
[10]
B. K. Joseph, “Nonmetric multidimensional scaling: A numerical method, ” Psychometrika, vol. 29, no. 2, pp. 115 –129, 1964.
[11]
J.-G. Lee, J. Han, X. Li, and H. Cheng, “Mining discriminative patterns for classifying trajectories on road networks,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 5, pp. 713–726, May 2011.
[12]
C. Liu. Demo: Temporal Skeletonization [Online]. Available: http://liuchuanren.xminer.org/publications/S2Net.htm, 2015.
[13]
C. Liu, K. Zhang, H. Xiong, G. Jiang, and Q. Yang, “Temporal skeletonization on sequential data: Patterns, categorization, and visualization,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 1336–1345.
[14]
Y. N. Andrew, I. J. Michael, and Y. Weiss, “ On spectral clustering: Analysis and an algorithm,” in Proc. Adv. Neural Inf. Process. Syst., 2002, vol. 2, pp. 849–856.
[15]
M. O. Lane, E. A. Les, and D. B. Gary, “Automatic clustering of vector time-series for manufacturing machine monitoring,” in Proc. Int. Conf. Acoust., Speech Signal Process., 1997, vol. 4, pp. 3393 –3396.
[16]
J. Pei, G. Dong, W. Zou, and J. Han, “On computing condensed frequent pattern bases,” in Proc. Int. Conf. Data Mining, 2002, pp. 378 –385.
[17]
K. Zhang, Q. Wang, Z. Chen, I. Marsic, V. Kumar, G. Jiang, and J. Zhang, “From categorical to numerical: Multiple transitive distance learning and embedding,” in Proc. SIAM Int. Conf. Data Mining, 2015, pp. 46–54.
[18]
W. S. Bernard, Density Estimation for Statistics and Data Analysis, vol. 26. Boca Raton, FL, USA: CRC Press, 1986.
[19]
R. Srikant and R. Agrawal, “Mining generalized association rules, ” in Proc. 21st Int. Conf. Very Large Databases, 1995, vol. 95, pp. 407–419.
[20]
R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” in Proc. 5th Int. Conf. Extending Database Technol.: Adv. Database Technol., 1996, pp. 3–17.
[21]
B. T. Joshua, V. D. Silva, and C. L. John, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[22]
R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, “Sparsity and smoothness via the fused lasso, ” J. Royal Statist. Soc, Ser. B (Statist. Methodol.), vol. 67, no. 1, pp. 91–108, 2005.
[23]
J. Venna and S. Kaski, “Local multidimensional scaling,” Neural Netw., vol. 19, no. 6, pp. 889–899, 2006.
[24]
D. Xin, J. Han, X. Yan, and H. Cheng, “Mining compressed frequent-pattern sets,” in Proc. 31st Int. Conf. Very Large Data Bases, 2005, pp. 709 –720.
[25]
S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, “ Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40 –51, Jan. 2007.
[26]
J. Z. Mohammed, “Spade: An efficient algorithm for mining frequent sequences, ” Mach. Learn., vol. 42, nos. 1/2, pp. 31 –60, 2001.
[27]
K. Zhang and J. Kwok, “Clustered nyström method for large scale manifold learning and dimension reduction,” IEEE Trans. Neural Netw. Learn. Syst., vol. 21, no. 10, pp. 1576–1587, Oct. 2010.

Cited By

View all
  • (2023)MUSE: Visual Analysis of Musical Semantic SequenceIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2022.317536429:9(4015-4030)Online publication date: 1-Sep-2023
  • (2022)Sequence graph transform (SGT): a feature embedding function for sequence data miningData Mining and Knowledge Discovery10.1007/s10618-021-00813-036:2(668-708)Online publication date: 1-Mar-2022
  • (2020)A Framework for Event-oriented Text Retrieval Based on Temporal AspectsProceedings of the 2020 12th International Conference on Machine Learning and Computing10.1145/3383972.3384051(39-46)Online publication date: 15-Feb-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering  Volume 28, Issue 1
Jan. 2016
294 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 January 2016

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)MUSE: Visual Analysis of Musical Semantic SequenceIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2022.317536429:9(4015-4030)Online publication date: 1-Sep-2023
  • (2022)Sequence graph transform (SGT): a feature embedding function for sequence data miningData Mining and Knowledge Discovery10.1007/s10618-021-00813-036:2(668-708)Online publication date: 1-Mar-2022
  • (2020)A Framework for Event-oriented Text Retrieval Based on Temporal AspectsProceedings of the 2020 12th International Conference on Machine Learning and Computing10.1145/3383972.3384051(39-46)Online publication date: 15-Feb-2020
  • (2019)Detecting Pickpocket Suspects from Large-Scale Public Transit RecordsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2018.283490931:3(465-478)Online publication date: 1-Mar-2019
  • (2017)A Data-driven Process Recommender FrameworkProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining10.1145/3097983.3098174(2111-2120)Online publication date: 13-Aug-2017
  • (2016)Data-driven Automatic Treatment Regimen Development and RecommendationProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining10.1145/2939672.2939866(1865-1874)Online publication date: 13-Aug-2016
  • (2016)Unified Point-of-Interest Recommendation with Temporal Interval AssessmentProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining10.1145/2939672.2939773(1015-1024)Online publication date: 13-Aug-2016
  • (2016)Catch Me If You CanProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining10.1145/2939672.2939687(87-96)Online publication date: 13-Aug-2016

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media