Abstract
Sequential pattern mining algorithms are unsupervised machine learning algorithms that find sequential patterns in data sequences whose elements are arranged in a particular order. Most of these algorithms are optimized for data sequences containing more than one element. Hence, we argue that there is a need for algorithms optimized specifically for data sequences that contain only one element. In this research, we study the design and development of a novel algorithm that is optimized for data sets whose data sequences have single elements and that detects sequential patterns with high performance. The time and memory requirements of the proposed algorithm are examined experimentally. The results show that the proposed algorithm has low running times while achieving the same accuracy as comparable algorithms in the literature. The obtained results are promising.
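To make the problem setting concrete, the following is a minimal sketch of sequential pattern mining over sequences of single items. It is not the algorithm proposed in this paper, which the abstract does not describe; it is a plain pattern-growth miner in the style of PrefixSpan. It assumes that "single element" means each position in a sequence holds one item rather than a set of items. The function name `mine_sequential_patterns`, the parameter `min_support`, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mine_sequential_patterns(sequences, min_support):
    """Return {pattern: support} for every frequent subsequence.

    Illustrative sketch only (assumption: one item per position).
    A pattern's support is the number of input sequences that contain
    it as a subsequence, gaps allowed.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start position) pairs: the
        # suffix of each sequence left after matching the current prefix.
        next_start = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                item = seq[pos]
                # Keep only the leftmost occurrence of each item per sequence.
                if seq_id not in next_start[item]:
                    next_start[item][seq_id] = pos + 1
        for item, occurrences in next_start.items():
            support = len(occurrences)
            if support >= min_support:
                pattern = prefix + (item,)
                results[pattern] = support
                # Recurse on the database projected onto the longer prefix.
                grow(pattern, list(occurrences.items()))

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy data: three sequences of single items.
toy_sequences = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = mine_sequential_patterns(toy_sequences, min_support=2)
```

With this toy input and a minimum support of 2, the frequent patterns are the single items `a`, `b`, and `c` together with the two-item patterns `(a, c)` and `(b, c)`; `(a, b)` occurs in only one sequence and is pruned.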
Acknowledgements
This study was supported by BiletBank R&D Center. We would like to thank BiletBank for providing us with the necessary hardware and access to their datasets.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Can, A.B., Uzun-Per, M., Aktas, M.S. (2022). A Novel Sequential Pattern Mining Algorithm for Large Scale Data Sequences. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13377. Springer, Cham. https://doi.org/10.1007/978-3-031-10536-4_46
DOI: https://doi.org/10.1007/978-3-031-10536-4_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10535-7
Online ISBN: 978-3-031-10536-4
eBook Packages: Computer Science, Computer Science (R0)