Abstract
Sequential data are generated in many domains of science and technology. Although many studies have been carried out on sequence classification in the past decade, the problem remains a challenge, particularly for pattern-based methods. We identify two important issues related to pattern-based sequence classification, which motivate the present work: the curse of parameter tuning and the instability of common interestingness measures. To alleviate these issues, we suggest a new approach and framework for mining sequential rule patterns for classification purposes. We introduce a space of rule pattern models and a prior distribution defined on this model space, from which we derive a Bayesian criterion for evaluating the interest of sequential patterns. We also develop a user parameter-free algorithm to efficiently mine sequential patterns from the model space. Extensive experiments show that (i) the new criterion identifies interesting and robust patterns and (ii) the direct use of the mined rules as new features in a classification process yields higher inductive performance than state-of-the-art sequential pattern-based classifiers.
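As a minimal illustration of the feature-construction step mentioned in the abstract, the sketch below turns the body of each mined sequential rule into a binary indicator feature that any standard classifier can consume. This is only a hedged sketch, not the authors' exact encoding; the function names, the subsequence semantics, and the toy data are assumptions.

```python
def contains_subsequence(sequence, pattern):
    # True if `pattern` occurs as an (ordered, possibly non-contiguous) subsequence of `sequence`.
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def rules_to_features(sequences, rule_bodies):
    # One binary feature per mined rule body: does the sequence contain it?
    return [[int(contains_subsequence(seq, body)) for body in rule_bodies]
            for seq in sequences]

# Hypothetical usage: the resulting matrix can be fed to any standard classifier.
X = rules_to_features([["a", "b", "c", "d"], ["b", "d", "a"]],
                      rule_bodies=[["a", "c"], ["b", "d"]])
# X == [[1, 1], [0, 1]]
```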
Notes
If MiSeRe has difficulty finding the required 1024 rules, an additional time-based constraint is applied: if MiSeRe cannot find any interesting rule within 5 minutes of generating the last interesting rule, the algorithm is terminated.
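A minimal sketch of this dual stopping policy is given below; the function names and the candidate-generation loop are assumptions for illustration, not MiSeRe's actual implementation.

```python
import time

def mine_interesting_rules(generate_candidate, is_interesting,
                           target_rules=1024, inactivity_timeout=300.0):
    # Stop once `target_rules` interesting rules are found, or when no new
    # interesting rule has appeared for `inactivity_timeout` seconds (5 minutes).
    rules = []
    last_found = time.monotonic()
    while len(rules) < target_rules:
        if time.monotonic() - last_found > inactivity_timeout:
            break  # no interesting rule for 5 minutes since the last one: stop early
        candidate = generate_candidate()
        if is_interesting(candidate):
            rules.append(candidate)
            last_found = time.monotonic()  # reset the inactivity clock
    return rules
```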
Additional information
This work was done while Dominique Gay was a research engineer at Orange Labs.
Appendix: Proofs of Theorems 1 and 2
Here, we present a complete proof of Theorem 1; Theorem 2 can be proven in the same way.
Proof
Given the cost of the null model in Eq. , the prior terms become negligible when the number of sequences n is very large. Therefore, when \(n \rightarrow \infty \), the cost of the null model can be written as:
Using the approximation \(\log (n!) = n(\log (n)-1)+O(\log (n))\) based on Stirling's formula, the cost of the null model can be written as:
Notice that \(n=\sum _{i=1}^{j} n_{c_i}\). Therefore, we have:
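A compact version of this computation, assuming that the surviving likelihood term of the null model is the multinomial coefficient over the class counts \(n_{c_1},\dots ,n_{c_j}\) (as in MODL-style criteria), reads:

\[
\begin{aligned}
\log \frac{n!}{\prod _{i=1}^{j} n_{c_i}!}
&= \log (n!) - \sum _{i=1}^{j}\log (n_{c_i}!)\\
&= n(\log (n)-1) - \sum _{i=1}^{j} n_{c_i}\bigl(\log (n_{c_i})-1\bigr) + O(\log (n))\\
&= n\log (n) - \sum _{i=1}^{j} n_{c_i}\log (n_{c_i}) + O(\log (n)) \qquad \text{since } n=\sum _{i=1}^{j} n_{c_i}\\
&= \sum _{i=1}^{j} n_{c_i}\log \frac{n}{n_{c_i}} + O(\log (n))
\;=\; n\sum _{i=1}^{j} -\frac{n_{c_i}}{n}\log \frac{n_{c_i}}{n} + O(\log (n)).
\end{aligned}
\]

Under this assumption, the cost of the null model is thus asymptotically n times the Shannon entropy of the class distribution.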
Cite this article
Egho, E., Gay, D., Boullé, M. et al. A user parameter-free approach for mining robust sequential classification rules. Knowl Inf Syst 52, 53–81 (2017). https://doi.org/10.1007/s10115-016-1002-4