Abstract
Sequential pattern mining aims to find the subsequences in a sequence database that appear together in the order of timestamps. Although there exist sequential pattern mining techniques, they ignore the temporal relationship information between the itemsets in the subsequences. This information is important in many real-world applications. For example, even if healthcare providers know that symptom Y frequently occurs after symptom X, it is also valuable for them to be able to estimate when Y will occur after X so that they can provide treatment at the right time. Considering temporal relationship information for sequential pattern mining raises new issues to be solved, such as designing a new data structure to save this information and traversing this structure efficiently to discover patterns without re-scanning the database. In this paper, we propose an algorithm called Minits-AllOcc (MINIng Timed Sequential Pattern for All-time Occurrences) to find sequential patterns and the transition time between itemsets based on all possible occurrences of a pattern in the database. We also propose a parallel multicore CPU version of this algorithm, called MMinits-AllOcc (Multicore Minits-AllOcc), to deal with Big Data. Extensive experiments on real and synthetic datasets show the advantages of this approach over the brute-force method. Also, the multicore CPU version of the algorithm is shown to outperform the single-core version on Big Data by 2.5X.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the 11th IEEE International Conference, Taiwan, pp. 3–14. IEEE (1995)
AlZahrani, M.Y., Mazarbhuiya, F.A. Discovering constraint-based sequential patterns from medical datasets. Int. J. Recent Tech. Eng. (IJRTE) (2019). ISSN 2277-3878
Chen, Y.L., Chiang, M.C., Ko, M.T.: Discovering time-interval sequential patterns in sequence databases. Expert Syst. Appl. 25(3), 343–354 (2003)
Fournier-Viger, P., et al.: The SPMF open-source data mining library version 2. In: Berendt, B., et al. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9853, pp. 36–40. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46131-1_8
Fournier-Viger, P., Lin, J.C.W., Kiran, R.U., Koh, Y.S., Thomas, R.: A survey of sequential pattern mining. Data Sci. Pattern Recognit. 1(1), 54–77 (2017)
Gan, W., Lin, J.C.W., Fournier-Viger, P., Chao, H.C., Yu, P.S.: A survey of parallel sequential pattern mining. ACM Trans. Knowl. Discov. Data (TKDD) 13(3), 1–34 (2019)
Giannotti, F., Nanni, M., Pedreschi, D.: Efficient mining of temporally annotated sequences. In: Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 348–359 (2006)
Giannotti, F., Nanni, M., Pinelli, F., Pedreschi, D.: Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 330–339 (2007)
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 355–359 (2000)
Hu, Y.H., Huang, T.C.K., Yang, H.R., Chen, Y.L.: On mining multi-time-interval sequential patterns. Data Knowl. Eng. 68(10), 1112–1127 (2009)
Huynh, B., Vo, B., Snasel, V.: An efficient method for mining frequent sequential patterns using multi-core processors. Appl. Intell. 46(3), 703–716 (2017)
Jay, N., Herengt, G., Albuisson, E., Kohler, F., Napoli, A.: Sequential pattern mining and classification of patient path. Medinfo 1667 (2004)
Yuan, J., et al.: T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2010, pp. 99–108. ACM, New York (2010)
Yuan, J., Zheng, Y., Xie, X., Sun, G.: Driving with knowledge from the physical world. In: The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011. ACM, New York (2011)
Karsoum, S., Gruenwald, L., Barrus, C., Leal, E.: Using timed sequential patterns in the transportation industry. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 3573–3582. IEEE, December 2019
Li, H., Zhou, X., Pan, C.: Study on GSP algorithm based on Hadoop. In: 5th IEEE International Conference Electronics Information and Emergency Communication (ICEIEC), pp. 321–324 (2015)
Pei, J., et al.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICCCN, p. 0215 (2001)
Pramono, Y.W.T.: Anomaly-based intrusion detection and prevention system on website usage using rule-growth sequential pattern analysis: case study: statistics of Indonesia (BPS) website. In: 2014 International Conference of Advanced Informatics: Concept, Theory, and Application (ICAICTA), pp. 203–208. IEEE, August 2014
Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: Apers, P., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol 1057, pp. 1–17. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0014140
Wei, Y.-Q., Liu, D., Duan, L.-S.: Distributed prefixspan algorithm based on mapreduce. In: International Symposium on IEEE Information Technology in Medicine and Education (ITME), vol. 2, pp. 901–904 (2012)
Yang, H., Gruenwald, L., Boulanger, M.: A novel real-time framework for extracting patterns from trajectory data streams. In: Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, pp. 26–32 (2013)
Zaki, M.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Karsoum, S., Barrus, C., Gruenwald, L., Leal, E. (2021). Minits-AllOcc: An Efficient Algorithm for Mining Timed Sequential Patterns. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science(), vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_53
Download citation
DOI: https://doi.org/10.1007/978-3-030-75762-5_53
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer ScienceComputer Science (R0)