Abstract
Sequential data are generated in many domains of science and technology. Although many studies have been carried out on sequence classification in the past decade, the problem remains a challenge, particularly for pattern-based methods. We identify two important issues related to pattern-based sequence classification, which motivate the present work: the curse of parameter tuning and the instability of common interestingness measures. To alleviate these issues, we suggest a new approach and framework for mining sequential rule patterns for classification purposes. We introduce a space of rule pattern models and a prior distribution defined on this model space, from which we derive a Bayesian criterion for evaluating the interest of sequential patterns. We also develop a user parameter-free algorithm to efficiently mine sequential patterns from the model space. Extensive experiments show that (i) the new criterion identifies interesting and robust patterns and (ii) the direct use of the mined rules as new features in a classification process yields higher inductive performance than state-of-the-art sequential pattern-based classifiers.
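As a minimal illustration of the feature-construction step mentioned in the abstract, the sketch below turns the body of each mined sequential rule into a binary indicator feature that any standard classifier can consume. This is only a hedged sketch, not the authors' exact encoding; the function names, the subsequence semantics, and the toy data are assumptions.

```python
def contains_subsequence(sequence, pattern):
    # True if `pattern` occurs as an (ordered, possibly non-contiguous) subsequence of `sequence`.
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def rules_to_features(sequences, rule_bodies):
    # One binary feature per mined rule body: does the sequence contain it?
    return [[int(contains_subsequence(seq, body)) for body in rule_bodies]
            for seq in sequences]

# Hypothetical usage: the resulting matrix can be fed to any standard classifier.
X = rules_to_features([["a", "b", "c", "d"], ["b", "d", "a"]],
                      rule_bodies=[["a", "c"], ["b", "d"]])
# X == [[1, 1], [0, 1]]
```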
Notes
If MiSeRe has difficulty finding the required 1024 rules, an additional time-based constraint is applied: if MiSeRe cannot find any interesting rule within 5 minutes of generating the last interesting rule, the algorithm is terminated.
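A minimal sketch of this dual stopping policy is given below; the function names and the candidate-generation loop are assumptions for illustration, not MiSeRe's actual implementation.

```python
import time

def mine_interesting_rules(generate_candidate, is_interesting,
                           target_rules=1024, inactivity_timeout=300.0):
    # Stop once `target_rules` interesting rules are found, or when no new
    # interesting rule has appeared for `inactivity_timeout` seconds (5 minutes).
    rules = []
    last_found = time.monotonic()
    while len(rules) < target_rules:
        if time.monotonic() - last_found > inactivity_timeout:
            break  # no interesting rule for 5 minutes since the last one: stop early
        candidate = generate_candidate()
        if is_interesting(candidate):
            rules.append(candidate)
            last_found = time.monotonic()  # reset the inactivity clock
    return rules
```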
Additional information
This work was done while Dominique Gay was a research engineer at Orange Labs.
Appendix: Proofs of Theorems 1 and 2
Here, we present a complete proof of Theorem 1; Theorem 2 can be proven in the same way.
Proof
Given the cost of the null model in Eq. , the prior terms become negligible when the number of sequences n is very large. Therefore, when \(n \rightarrow \infty \), the cost of the null model can be written as:
Using the approximation \(\log (n!) = n(\log (n)-1)+O(\log (n))\) based on Stirling's formula, the cost of the null model can be written as:
Notice that \(n=\sum _{i=1}^{j} n_{c_i}\). Therefore, we have:
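A compact version of this computation, assuming that the surviving likelihood term of the null model is the multinomial coefficient over the class counts \(n_{c_1},\dots ,n_{c_j}\) (as in MODL-style criteria), reads:

\[
\begin{aligned}
\log \frac{n!}{\prod _{i=1}^{j} n_{c_i}!}
&= \log (n!) - \sum _{i=1}^{j}\log (n_{c_i}!)\\
&= n(\log (n)-1) - \sum _{i=1}^{j} n_{c_i}\bigl(\log (n_{c_i})-1\bigr) + O(\log (n))\\
&= n\log (n) - \sum _{i=1}^{j} n_{c_i}\log (n_{c_i}) + O(\log (n)) \qquad \text{since } n=\sum _{i=1}^{j} n_{c_i}\\
&= \sum _{i=1}^{j} n_{c_i}\log \frac{n}{n_{c_i}} + O(\log (n))
\;=\; n\sum _{i=1}^{j} -\frac{n_{c_i}}{n}\log \frac{n_{c_i}}{n} + O(\log (n)).
\end{aligned}
\]

Under this assumption, the cost of the null model is thus asymptotically n times the Shannon entropy of the class distribution.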
Cite this article
Egho, E., Gay, D., Boullé, M. et al. A user parameter-free approach for mining robust sequential classification rules. Knowl Inf Syst 52, 53–81 (2017). https://doi.org/10.1007/s10115-016-1002-4