Abstract
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '93). Washington, D.C., pp. 207–216.
Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE '95). Taipei, Taiwan, pp. 3–14.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Menlo Park, CA: AAAI Press, pp. 307–328.
Bairoch, A., Bucher, P., and Hofmann, K. 1995. The PROSITE database, its status in 1995. Nucleic Acids Research, 24:189–196.
Bettini, C., Wang, X.S., and Jajodia, S. 1996. Testing complex temporal relationships involving multiple granularities and its application to data mining. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '96). Montréal, Canada, pp. 68–78.
Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., and Kärkkäinen, J. 1997. Episode matching. In Proceedings of the 8th Symposium on Combinatorial Pattern Matching (CPM '97). Aarhus, Denmark, pp. 12–27.
Denning, P.J. 1968. The working set model of program behavior. Communications of the ACM, 11(5):323–333.
Dousson, C., Gaborit, P., and Ghallab, M. 1993. Situation recognition: Representation and algorithms. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93). Chambery, France, pp. 166–172.
ExPASy Molecular Biology Server. Geneva University Hospital and University of Geneva, Switzerland. http://expasy.hcuge.ch/.
Forgy, C.L. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19:17–37.
Gehani, N., Jagadish, H., and Shmueli, O. 1992. Composite event specification in active databases. In Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB '92). San Diego, CA, pp. 327–338.
Goodman, R.M., Ambrose, B.E., Latin, H.W., and Ulmer, C.T. 1995. Noaa-An expert system managing the telephone network. In Integrated Network Management IV, A.S. Sethi, Y. Raynaud, and F. Faure-Vincent (Eds.), London, UK: Chapman and Hall, pp. 316–327.
Grossi, R. and Luccio, F. 1989. Simple and efficient string matching with k mismatches. Information Processing Letters, 33:113–120.
Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 420–431.
Hätönen, K., Klemettinen, M., Mannila, H., Ronkainen, P., and Toivonen, H. 1996a. Knowledge discovery from telecommunication network alarm databases. In 12th International Conference on Data Engineering (ICDE '96). New Orleans, LA, pp. 115–122.
Hätönen, K., Klemettinen, M., Mannila, H., Ronkainen, P., and Toivonen, H. 1996b. TASA: Telecommunication alarm sequence analyzer, or how to enjoy faults in your network. In 1996 IEEE Network Operations and Management Symposium (NOMS '96). Kyoto, Japan, pp. 520–529.
Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. 1995. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 150–155.
Howe, A.E. 1995. Finding dependencies in event streams using local search. In Preliminary Papers of the Fifth International Workshop on Artificial Intelligence and Statistics. Ft. Lauderdale, FL, pp. 271–277.
Jakobson, G. and Weissman, M.D. 1993. Alarm correlation. IEEE Network, 7(6):52–59.
Jonassen, I., Collins, J.F., and Higgins, D.G. 1995. Finding flexible patterns in unaligned protein sequences. Protein Science, 4(8):1587–1595.
Kalbfleisch, J.D. and Prentice, R.L. 1980. The Statistical Analysis of Failure Time Data. New York, NY: John Wiley Inc.
Laird, P. 1993. Identifying and using patterns in sequential data. In Algorithmic Learning Theory, 4th International Workshop (ALT 93). (Lecture Notes in Artificial Intelligence 744, Berlin: Springer-Verlag). Chofu, Japan, pp. 1–18.
Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 210–215.
Mannila, H. and Toivonen, H. 1996. Discovering generalized episodes using minimal occurrences. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96). Portland, OR, pp. 146–151.
Mannila, H. and Toivonen, H. 1997. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3):241–258.
Milne, R., Nicol, C., Ghallab, M., Trave-Massuyes, L., Bousson, K., Dousson, C., Quevedo, J., Aguilar, J., and Guasch, A. 1994. TIGER: Real-time situation assessment of dynamic systems. Intelligent Systems Engineering, pp. 103–124.
Morris, R.A., Shoaff, W.D., and Khatib, L. 1994. An algebraic formulation of temporal knowledge for reasoning about recurring events. In Proceedings of the InternationalWorkshop onTemporal Representation and Reasoning (TIME-94). Pensacola, FL, pp. 29–34.
Muggleton, S. 1992. Inductive Logic Programming. London, UK: Academic Press.
Oates, T. and Cohen, P.R. 1996. Searching for structure in multiple streams of data. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML '96). Bari, Italy, pp. 346–354.
Padmanabhan, B. and Tuzhilin, A. 1996. Pattern discovery in temporal databases: A temporal logic approach. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96). Portland, OR, pp. 351–354.
Seshadri, P., Livny, M., and Ramakrishnan, R. 1996. SEQ: Design and implementation of a sequence database system. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96). Mumbay, India, pp. 99–110.
Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 407–419.
Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Advances in Database Technology-5th International Conference on Extending Database Technology (EDBT '96). Avignon, France, pp. 3–17.
Wang, J.T.-L., Chirn, G.-W., Marr, T.G., Shapiro, B., Shasha, D., and Zhang, K. 1994. Combinatorial pattern discovery for scientific data: Some preliminary results. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '94). Minneapolis, MI, pp. 115–125.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Mannila, H., Toivonen, H. & Inkeri Verkamo, A. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery 1, 259–289 (1997). https://doi.org/10.1023/A:1009748302351
Issue Date:
DOI: https://doi.org/10.1023/A:1009748302351