article

Frequency-based views to pattern collections

Author:

Taneli MielikäinenAuthors Info & Claims

Discrete Applied Mathematics, Volume 154, Issue 7

Pages 1113 - 1139

Published: 01 May 2006 Publication History

Abstract

Finding interesting patterns from data is one of the most important problems in data mining and it has been studied actively for more than a decade. However, it is still largely open problem which patterns are interesting and which are not. The problem of detecting the interesting patterns (in a predefined class of patterns) has been attempted to solve by determining quality values for potentially interesting patterns and deciding a pattern to be interesting if its quality value (i.e., the interestingness of the pattern) is higher than a given threshold value. Again, it is very difficult to find a threshold value and a way to determine the quality values such that the collection of patterns with quality values greater than the threshold value would contain almost all truly interesting patterns and only few uninteresting ones. To enable more accurate characterization of interesting patterns, use of constraints to further prune the pattern collection has been proposed. However, most of the constrained pattern discovery research has been focused on structural constraints for the pattern collections and the patterns. We take a complementary approach and focus on constraining the quality values of the patterns. We propose quality value simplifications as a complementary approach to structural constraints on patterns. As a special case of the quality value simplifications, we consider discretizing the quality values. We analyze the worst-case error of certain discretization functions and give efficient discretization algorithms minimizing several loss functions. In addition to that, we show that the discretizations of the quality values can be used to obtain small approximate condensed representations for collections of interesting patterns. We evaluate the proposed condensation approach experimentally using frequent itemsets.

References

[1]

Agrawal, R., Imielinski, T. and Swami, A.N., Mining association rules between sets of items in large databases. In: Buneman, P., Jajodia, S. (Eds.), Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, ACM Press, New York. pp. 207-216.

Digital Library

[2]

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A.I., Fast discovery of association rules. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, New York. pp. 307-328.

Digital Library

[3]

B¿doiu, M., Har-Peled, S. and Indyk, P., Approximate clustering via core-sets. In: Proceedings on 34th Annual ACM Symposium on Theory of Computing, ACM Press, New York. pp. 250-257.

Digital Library

[4]

Bayardo Jr., R.J., Agrawal, R. and Gunopulos, D., Constraint-based rule mining in large, dense databases. Data Min. Knowledge Discovery. v4 i2/3. 217-240.

Digital Library

[5]

R. Bayardo, B. Goethals, M.J. Zaki (Eds.), Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI-04), Brighton, UK, 1 November 2004, CEUR Workshop Proceedings, vol. 126, 2004, http://CEUR-WS.org/Vol-126/.

[6]

F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi, ExAnte: anticipated data reduction in constrained pattern mining, in: N. Lavrač, D. Gamberger, H. Blockeel, L. Todorovski (Eds.), Knowledge Discovery in Databases: PKDD 2003, Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22-26 September 2003, Proceedings, Lecture Notes in Artificial Intelligence, vol. 2838, Springer, Berlin, 2003, pp. 59-70.

[7]

J.-F. Boulicaut, A. Bykowski, Frequent closures as a concise representation for binary data mining, in: T. Terano, H. Liu, A.L.P. Chen (Eds.), Knowledge Discovery and Data Mining, Current Issues and New Applications, Fourth Pacific-Asia Conference, 18-20 April 2000, PAKDD 2000, Kyoto, Japan, Proceedings, Lecture Notes in Computer Science, vol. 1805, Springer, Berlin, 2000, pp. 62-73.

Digital Library

[8]

Boulicaut, J.-F., Bykowski, A. and Rigotti, C., Free-sets: a condensed representation of Boolean data for the approximation of frequency queries. Data Min. Knowledge Discovery. v7 i1. 5-22.

Digital Library

[9]

Bykowski, A. and Rigotti, C., A condensed representation to find frequent patterns. In: Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, New York.

Digital Library

[10]

T. Calders, B. Goethals, Mining all non-derivable frequent itemsets, in: T. Elomaa, H. Mannila, H. Toivonen (Eds.), Principles of Data Mining and Knowledge Discovery, Sixth European Conference, PKDD 2002, Helsinki, Finland, 19-23 August 2002, Proceedings, Lecture Notes in Artificial Intelligence, vol. 2431, Springer, Berlin, 2002, pp. 74-865.

Digital Library

[11]

T. Calders, B. Goethals, Minimal k-free representations of frequent sets, in: N. Lavrač, D. Gamberger, H. Blockeel, L. Todorovski (Eds.), Knowledge Discovery in Databases: PKDD 2003, Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22-26 September 2003, Proceedings, Lecture Notes in Artificial Intelligence, vol. 2838, Springer, Berlin, 2003, pp. 71-82.

[12]

G. Casas-Garriga, Discovering unbounded episodes in sequential data, in: N. Lavrač, D. Gamberger, H. Blockeel, L. Todorovski (Eds.), Knowledge Discovery in Databases: PKDD 2003, Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22-26 September 2003, Proceedings, Lecture Notes in Artificial Intelligence, vol. 2838, Springer, Berlin, 2003, pp. 83-94.

[13]

S. Dasgupta, Performance guarantees for hierarchical clustering, in: J. Kivinen, R.H. Sloan (Eds.), Computational Learning Theory, 15th Annual Conference on Computational Learning Theory, COLT 2002, Sydney, Australia, 8-10 July 2002, Proceedings, Lecture Notes in Artificial Intelligence, vol. 2375, Springer, Berlin, 2002, pp. 351-363.

Digital Library

[14]

de la Vega, W.F., Karpinski, M., Kenyon, C. and Rabani, Y., Approximation schemes for clustering problems. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing, ACM Press, New York.

Digital Library

[15]

L. De Raedt, M. Jaeger, S.D. Lee, H. Mannila, A theory of inductive query answering, in: V. Kumar, S. Tsumoto (Eds.), Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9-12 December 2002, Maebashi City, Japan, IEEE Computer Society, Silver Spring, MD, 2002, pp. 123-130.

Digital Library

[16]

Dehaspe, L. and Toivonen, H.T.T., Discovery of relational association rules. In: D¿eroski, S., Lavrač, N. (Eds.), Relational Data Mining, Springer, Berlin. pp. 189-212.

Digital Library

[17]

Elomaa, T. and Rousu, J., On the computational complexity of optimal multisplitting. Fundam. Inform. v47 i1-2. 35-52.

[18]

Feder, T. and Greene, D.H., Optimal algorithms for approximate clustering. In: Proceedings of the 20th Annual ACM Symposium on Theory of Computing, ACM Press, New York. pp. 434-444.

Digital Library

[19]

Fisher, W.D., On grouping for maximum homogeneity. J. Amer. Statist. Assoc. v53 i284. 789-798.

[20]

B. Goethals, J. Van den Bussche, On supporting interactive association rule mining, in: Y. Kambayashi, M.K. Mohania, A.M. Tjoa (Eds.), DaWaK, Lecture Notes in Computer Science, vol. 1874, Springer, Berlin, 2000, pp. 307-316.

Digital Library

[21]

B. Goethals, J. Van den Bussche, Relational association rules: getting WARMeR, in: D.J. Hand, N.M. Adams, R.J. Bolton (Eds.), Pattern Detection and Discovery, ESF Exploratory Workshop, London, UK, 16-19 September 2002, Proceedings, Lecture Notes in Computer Science, vol. 2447, Springer, Berlin, 2002, pp. 125-139.

Digital Library

[22]

B. Goethals, M.J. Zaki (Eds.), Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI-03), Melbourne, FL, USA, 19 November 2003, CEUR Workshop Proceedings, vol. 90, 2003, http://CEUR-WS.org/Vol-90/.

[23]

Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H. and Sharma, R.S., Discovering all most specific sentences. ACM Trans. Database Systems. v28 i2. 140-174.

Digital Library

[24]

R. Gwadera, M. Atallah, W. Szpankowski, Reliable detection of episodes in event sequences, in: X. Wu, A. Tuzhilin, J. Shavlik (Eds.), Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003), 19-22 December 2003, Melbourne, FL, USA, IEEE Computer Society Press, Silver Spring, MD, 2003, pp. 67-74.

Digital Library

[25]

Han, J., Pei, J., Yin, Y. and Mao, R., Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowledge Discovery. v8 i1. 53-87.

Digital Library

[26]

D.J. Hand, Pattern detection and discovery, in: D.J. Hand, N.M. Adams, R.J. Bolton (Eds.), Pattern Detection and Discovery, ESF Exploratory Workshop, London, UK, 16-19 September 2002, Proceedings, Lecture Notes in Computer Science, vol. 2447, Springer, Berlin, 2002, pp. 1-12.

Digital Library

[27]

Hand, D.J., Mannila, H. and Smyth, P., Principles of Data Mining. 2001. MIT Press, Cambridge, MA.

Digital Library

[28]

Har-Peled, S. and Sadri, B., How fast is the k-means method?. Algorithmica. v41. 185-202.

Digital Library

[29]

T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer, Berlin, 2001.

[30]

Inokuchi, A., Washio, T. and Motoda, H., Complete mining of frequent patterns from graphs: mining graph data. Mach. Learn. v50. 321-354.

Digital Library

[31]

Ioannidis, Y.E., The history of histograms (abridged). In: Freytag, J.C., Lockemann, P.C., Abiteboul, S., Carey, M.J., Selinger, P.G., Heuer, A. (Eds.), VLDB 2003, Proceedings of 29th International Conference on Very Large Data Bases, Morgan Kaufmann, Los Altos, CA. pp. 19-30.

Digital Library

[32]

Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C. and Suel, T., Optimal histograms with quality guarantees. In: Gupta, A., Shmueli, O., Widom, J. (Eds.), VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, Morgan Kaufmann, Los Altos, CA. pp. 275-286.

Digital Library

[33]

Kannan, R., Vempala, S. and Vetta, A., On clusterings: good, bad and spectral. J. ACM. v51 i3. 497-515.

Digital Library

[34]

Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R. and Wu, A.Y., A local search approximation algorithm for k-means clustering. Comput. Geom.: Theory Appl. v28. 89-112.

Digital Library

[35]

Kifer, D., Gehrke, J., Bucila, C. and White, W., How to quickly find a witness. In: Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, New York. pp. 272-283.

Digital Library

[36]

Knuth, D.E., Sorting and Searching, The Art of Computer Programming. 1998. second ed. Addison-Wesley, Reading, MA.

Digital Library

[37]

M. Kryszkiewicz, Concise representation of frequent patterns based on disjunction-free generators, in: N. Cercone, T.Y. Lin, X. Wu (Eds.), Proceedings of the 2001 IEEE International Conference on Data Mining, 29 November-2 December 2001, San Jose, California, USA, IEEE Computer Society Press, Silver Spring, MD, 2001, pp. 305-312.

Digital Library

[38]

Kumar, A., Sabharwal, Y. and Sen, S., A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In: 45th Symposium on Foundations of Computer Science (FOCS 2004), IEEE Computer Society Press, Silver Spring, MD. pp. 454-462.

Digital Library

[39]

M. Kurakochi, G. Karypis, Frequent subgraph discovery, in: N. Cercone, T.Y. Lin, X. Wu (Eds.), Proceedings of the 2001 IEEE International Conference on Data Mining, 29 November-2 December 2001, San Jose, California, USA, IEEE Computer Society Press, Silver Spring, MD, 2001, pp. 313-320.

Digital Library

[40]

Lakshmanan, L.V.S., Leung, C.K.-S. and Ng, R.T., Efficient dynamic mining of constrained frequent sets. ACM Trans. Database Systems. v28 i4. 337-389.

Digital Library

[41]

J. Maloberti, E. Suzuki, Improving efficiency of frequent query discovery by eliminating non-relevant candidates, in: G. Grieser, Y. Tanaka, A. Yamamoto (Eds.), Discovery Science, Sixth International Conference, DS 2003, Sapporo, Japan, 17-19 October 2003, Proceedings, Lecture Notes in Computer Science, vol. 2843, Springer, Berlin, 2003, pp. 220-232.

[42]

H. Mannila, Local and global methods in data mining: basic techniques and open problems, in: P. Widmayer, F.T. Ruiz, R.M. Bueno, M. Hennessy, S. Eidenbenz, R. Conejo (Eds.), Automata, Languages and Programming, 29th International Colloquium, ICALP 2002, Malaga, Spain, 8-13 July 2002, Proceedings, Lecture Notes in Computer Science, vol. 2380, Springer, Berlin, 2002, pp. 57-68.

Digital Library

[43]

Mannila, H. and Toivonen, H., Levelwise search and borders of theories in knowledge discovery. Data Min. Knowledge Discovery. v1 i3. 241-258.

Digital Library

[44]

Mannila, H., Toivonen, H. and Verkamo, A.I., Discovery of frequent episodes in event sequences. Data Min. Knowledge Discovery. v1 i3. 259-289.

Digital Library

[45]

T. Mielikäinen, Chaining patterns, in: G. Grieser, Y. Tanaka, A. Yamamoto (Eds.), Discovery Science, Sixth International Conference, DS 2003, Sapporo, Japan, 17-19 October 2003, Proceedings, Lecture Notes in Computer Science, vol. 2843, Springer, Berlin, 2003, pp. 232-243.

[46]

Mielikäinen, T., Finding all occurring sets of interest. In: Boulicaut, J.-F., D¿eroski, S. (Eds.), Second International Workshop on Knowledge Discovery in Inductive Databases, pp. 97-106.

[47]

T. Mielikäinen, Separating structure from interestingness, in: H. Dai, R. Srikant, C. Zhang (Eds.), Advances in Knowledge Discovery and Data Mining, Eighth Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, 26-28 May 2004, Proceedings, Lecture Notes in Artificial Intelligence, vol. 3056, Springer, Berlin, 2004, pp. 476-485.

[48]

T. Mielikäinen, H. Mannila, The pattern ordering problem, in: N. Lavrač, D. Gamberger, H. Blockeel, L. Todorovski (Eds.), Knowledge Discovery in Databases: PKDD 2003, Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22-26 September 2003, Proceedings, Lecture Notes in Artificial Intelligence, vol. 2838, Springer, Berlin, 2003, pp. 327-338.

[49]

N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: C. Beeri, P. Buneman (Eds.), Database Theory-ICDT '99, Seventh International Conference, Jerusalem, Israel, 10-12 January 1999, Proceedings, Lecture Notes in Computer Science, vol. 1540, Springer, Berlin, 1999, pp. 398-416.

Digital Library

[50]

J. Pei, G. Dong, W. Zou, J. Han, On computing condensed pattern bases, in: V. Kumar, S. Tsumoto (Eds.), Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9-12 December 2002, Maebashi City, Japan, IEEE Computer Society Press, Silver Spring, MD, 2002, pp. 378-385.

Digital Library

[51]

Srikant, R., Vu, Q. and Agrawal, R., Mining association rules with item constraints. In: Heckerman, D., Mannila, H., Pregibon, D. (Eds.), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press. pp. 67-73.

[52]

P.-N. Tan, V. Kumar, J. Srivastava, Selecting the right interestingness measure for association patterns, in: D. Hand, D. Keim, R. Ng (Eds.), Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 23-26 July 2002, Edmonton, Alberta, Canada, ACM Press, New York, 2002.

Digital Library

[53]

Toivonen, H., Sampling large databases for association rules. In: Vijayaraman, T.M., Buchmann, A.P., Mohan, C., Sarda, N.L. (Eds.), VLDB'96, Proceedings of 22nd International Conference on Very Large Data Bases, Morgan Kaufmann, Los Altos, CA. pp. 134-145.

Digital Library

[54]

Wang, J. and Han, J., BIDE: efficient mining of frequent closed sequences. In: Proceedings of the 20th International Conference on Data Engineering (ICDE 2004), IEEE Computer Society, Silver Spring, MD.

Digital Library

[55]

Wang, X., Wang, J.T.L., Shasha, D., Shapiro, B.A., Rigoutsos, I. and Zhang, K., Finding patterns in three-dimensional graphs: algorithms and applications to scientific data mining. IEEE Trans. Knowledge Data Eng. v14 i4. 731-749.

Digital Library

[56]

Y. Xiao, J.-F. Yao, Z. Li, M.H. Dunham, Efficient data mining for maximal frequent subtrees, in: X. Wu, A. Tuzhilin, J. Shavlik (Eds.), Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003), 19-22 December 2003, Melbourne, FL, USA, IEEE Computer Society, Silver Spring, MD, 2003, pp. 379-386.

Digital Library

[57]

Yan, X., Cheng, H., Han, J. and Xin, D., Summarizing itemset patterns: a profile-based approach. . In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York.

Digital Library

[58]

X. Yan, J. Han, gSpan: graph-based substructure pattern mining, in: V. Kumar, S. Tsumoto (Eds.), Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9-12 December 2002, Maebashi City, Japan, IEEE Computer Society Press, Silver Spring, MD, 2002, pp. 721-724.

Digital Library

[59]

Zaki, M.J., Scalable algorithms for association mining. IEEE Trans. Knowledge Data Eng. v12 i3. 372-390.

Digital Library

[60]

Zaki, M.J., SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. v42. 31-60.

[61]

M.J. Zaki, Efficiently mining frequent trees in a forest, in: D. Hand, D. Keim, R. Ng (Eds.), Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 23-26 July 2002, Edmonton, Alberta, Canada, ACM Press, New York, 2002.

Digital Library

Cited By

Mielikäinen T(2003)Change ProfilesProceedings of the Third IEEE International Conference on Data Mining10.5555/951949.952183Online publication date: 19-Nov-2003
https://dl.acm.org/doi/10.5555/951949.952183

Index Terms

Frequency-based views to pattern collections

Recommendations

A constrained frequent pattern mining system for handling aggregate constraints
IDEAS '12: Proceedings of the 16th International Database Engineering & Applications Sysmposium

Frequent pattern mining searches data for sets of items that are frequently co-occurring together. Most of algorithms find all the frequent patterns. However, there are many real-life situations in which users is interested in only some small portions ...
Closed frequent similar pattern mining

The concept of closed frequent similar pattern mining is introduced.Several lemmas to prune the search space are introduced and proved.A novel closed frequent similar pattern mining algorithm (CFSP-Miner), is proposed.CFSP-Miner is more efficient than ...
Visually Contrast Two Collections of Frequent Patterns
ICDMW '11: Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops

Frequent pattern mining searches for frequently occurring sets of items or events. While users are interested in finding these frequent patterns in most situations, they may want to compare and contrast the mined frequent patterns in some other ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Discrete Applied Mathematics

Discrete Applied Mathematics Volume 154, Issue 7

Special issue: Discrete mathematics & data mining II (DM & DM II)

1 May 2006

127 pages

ISSN:0166-218X

Issue’s Table of Contents

Copyright © Elsevier B.V. © 2005.

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 May 2006

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Mielikäinen T(2003)Change ProfilesProceedings of the Third IEEE International Conference on Data Mining10.5555/951949.952183Online publication date: 19-Nov-2003
https://dl.acm.org/doi/10.5555/951949.952183

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents