Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

An Evolutive Frequent Pattern Tree-based Incremental Knowledge Discovery Algorithm

Published: 04 February 2022 Publication History

Abstract

To understand current situation in specific scenarios, valuable knowledge should be mined from both historical data and emerging new data. However, most existing algorithms take the historical data and the emerging data as a whole and periodically repeat to analyze all of them, which results in heavy computation overhead. It is also challenging to accurately discover new knowledge in time, because the emerging data are usually small compared to the historical data. To address these challenges, we propose a novel knowledge discovery algorithm based on double evolving frequent pattern trees that can trace the dynamically evolving data by an incremental sliding window. One tree is used to record frequent patterns from the historical data, and the other one records incremental frequent items. The structures of the double frequent pattern trees and their relationships are updated periodically according to the emerging data and a sliding window. New frequent patterns are mined from the incremental data and new knowledge can be obtained from pattern changes. Evaluations show that this algorithm can discover new knowledge from evolving data with good performance and high accuracy.

References

[1]
Charu C. Aggarwal, Philip S. Yu, Jiawei Han, and Jianyong Wang. 2003. A Framework for Clustering Evolving Data Streams. In Proceedings 2003 VLDB Conference, Johann-Christoph Freytag, Peter Lockemann, Serge Abiteboul, Michael Carey, Patricia Selinger, and Andreas Heuer (Eds.). Morgan Kaufmann, San Francisco, 81–92.DOI:https://doi.org/10.1016/B978-012722442-8/50016-1
[2]
Rakesh Agrawal, Ramakrishnan Srikant et al. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases. 487–499.
[3]
U. Ahmed, J. C. W. Lin, G. Srivastava, R. Yasin, and Y. Djenouri. 2021. An evolutionary model to mine high expected utility patterns from uncertain databases. IEEE Trans. Emerg. Topics Computat. Intell.19–28.
[4]
Anindita Borah and Bhabesh Nath. 2020. Rare association rule mining from incremental databases. Pattern Anal. Applic. 23, 1 (2020), 113–134. DOI:https://doi.org/10.1007/s10044-018-0759-3
[5]
Jiawei Chen, Hongyan Liu, Yinghui (Catherine) Yang, and Jun He. 2019. Effective selection of a compact and high-quality review set with information preservation. ACM Trans. Manag. Inf. Syst. 10, 4 (Dec. 2019). DOI:https://doi.org/10.1145/3369395
[6]
Dwl Cheung, J. Han, V. Ng, and Cy Wong. 1996. Maintenance of discovered association rules in large databases: An incremental updating technique. In Proc. of the Intl. Conf. on Data Engineering (New Orleans, Louisiana, USA). 106–114.
[7]
David W. Cheung, S. D. Lee, and Benjamin Kao. 1997. A general incremental technique for maintaining discovered association rules. In Database Systems for Advanced Applications ’97. Advanced Database Research and Development Series, Vol. 6. WORLD SCIENTIFIC, 185–194. DOI:https://doi.org/10.1142/9789812819536_0020
[8]
S. Danjuma, T. Herawan, M. A. Ismail, H. Chiroma, A. I. Abubakar, and A. M. Zeki. 2017. A review on soft set-based parameter reduction and decision making. IEEE Access 5 (2017), 4671–4689. DOI:https://doi.org/10.1109/ACCESS.2017.2682231
[9]
Razieh Davashi. 2021. ILUNA: Single-pass incremental method for uncertain frequent pattern mining without false positives. Inf. Sci. 564 (2021), 1–26. DOI:https://doi.org/10.1016/j.ins.2021.02.067
[10]
Youcef Djenouri, Asma Belhadi, Philippe Fournier-Viger, and Hamido Fujita. 2018. Mining diversified association rules in big datasets: A cluster/GPU/genetic approach. Inf. Sci. 459 (2018), 117–134. DOI:https://doi.org/10.1016/j.ins.2018.05.031
[11]
Youcef Djenouri, Jerry Chun-Wei Lin, Kjetil Nørvåg, Heri Ramampiaro, and Philip S. Yu. 2021. Exploring decomposition for solving pattern mining problems. ACM Trans. Manag. Inf. Syst. 12, 2 (Feb. 2021). DOI:https://doi.org/10.1145/3439771
[12]
Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Philippe Fournier-Viger, Xuan Wang, and Philip S. Yu. 2020. Utility-driven mining of trend information for intelligent system. ACM Trans. Manag. Inf. Syst. 11, 3 (June 2020). DOI:https://doi.org/10.1145/3391251
[13]
Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Han-Chieh Chao. 2016. More efficient algorithm for mining frequent patterns with multiple minimum supports. In Web-Age Information Management, Bin Cui, Nan Zhang, Jianliang Xu, Xiang Lian, and Dexi Liu (Eds.). Springer International Publishing, Cham, 3–16.
[14]
Alexander Gepperth and Barbara Hammer. 2016. Incremental learning algorithms and applications. In European Symposium on Artificial Neural Networks (ESANN’16), Bruges, Belgium.
[15]
Bin Gu, Xin Quan, Yunhua Gu, Victor S. Sheng, and Guansheng Zheng. 2018. Chunk incremental learning for cost-sensitive hinge loss support vector machine. Pattern Recog. 83 (2018), 196–208.
[16]
Kamel Eddine Heraguemi, Nadjet Kamel, and Habiba Drias. 2016. Multi-swarm bat algorithm for association rule mining using multiple cooperative strategies. Appl. Intell. 45, 4 (2016), 1021–1033.
[17]
Tzung Pei Hong, Chun Wei Lin, and Yu Lung Wu. 2008. An efficient FUFP-tree maintenance algorithm for record modification. Int. J. Innov. Comput. Inf. Contr. 4, 11 (2008).
[18]
Yanyong Huang, Tianrui Li, Chuan Luo, Hamido Fujita, and Shi-jinn Horng. 2017. Matrix-based dynamic updating rough fuzzy approximations for data mining. Knowl.-based Syst. 119 (2017), 273–283.
[19]
Yexi Jiang, Chunqiu Zeng, Jian Xu, and Tao Li. 2014. Real time contextual collective anomaly detection over multiple data streams. Proceedings of the ODD. 23–30.
[20]
Yu Jiang, Minghao Zhao, Chengquan Hu, Lili He, Hongtao Bai, and Jin Wang. 2018. A parallel FP-growth algorithm on World Ocean Atlas data with multi-core CPU. J. Supercomput. 75, 2 (2018).
[21]
Jaein Kim and Buhyun Hwang. 2016. Real-time stream data mining based on CanTree and Gtree. Inf. Sci. 367 (2016), 512–528.
[22]
Georg Krempl, Indre Žliobaite, Dariusz Brzeziński, Eyke Hüllermeier, Mark Last, Vincent Lemaire, Tino Noack, Ammar Shaker, Sonja Sievi, Myra Spiliopoulou et al. 2014. Open challenges for data stream mining research. ACM SIGKDD Explor. Newslett. 16, 1 (2014), 1–10.
[23]
Tuong Le, Bay Vo, Philippe Fournier-Viger, Mi Young Lee, and Sung Wook Baik. 2019. SPPC: A new tree structure for mining erasable patterns in data streams. Appl. Intell. 49, 2 (Feb. 2019), 478–495. DOI:https://doi.org/10.1007/s10489-018-1280-5
[24]
Ning Li, Wenjuan Luo, Kun Yang, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. 2018. Self-organizing weighted incremental probabilistic latent semantic analysis. Int. J. Mach. Learn. Cybern. 9, 12 (Dec. 2018), 1987–1998. DOI:https://doi.org/10.1007/s13042-017-0681-9
[25]
Guanqing Liang, Jingxin Zhao, Helena Yan Ping Lau, and Cane Wing-Ki Leung. 2021. Using social media to analyze public concerns and policy responses to COVID-19 in Hong Kong. ACM Trans. Manag. Inf. Syst. 12, 4 (Sept. 2021). DOI:https://doi.org/10.1145/3460124
[26]
Chun-Wei Lin and Tzung-Pei Hong. 2014. Maintenance of prelarge trees for data mining with modified records. Inf. Sci. 278 (2014), 88–103. DOI:https://doi.org/10.1016/j.ins.2014.03.023
[27]
Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu. 2009. The pre-FUFP algorithm for incremental mining. Expert Syst. Applic. 36, 5 (2009), 9498–9505. DOI:https://doi.org/10.1016/j.eswa.2008.03.014
[28]
Jerry Chun-Wei Lin, Wensheng Gan, and Tzung-Pei Hong. 2016. Maintaining the discovered high-utility itemsets with transaction modification. Appl. Intell. 44, 1 (2016), 166–178.
[29]
Jerry Chun-Wei Lin, Yina Shao, Philippe Fournier-Viger, Youcef Djenouri, and Xiangmin Guo. 2018. Maintenance algorithm for high average-utility itemsets with transaction deletion. Appl. Intell. 48, 10 (Oct. 2018), 3691–3706. DOI:https://doi.org/10.1007/s10489-018-1180-8
[30]
Kawuu W. Lin, Sheng-Hao Chung, and Chun-Cheng Lin. 2016. A fast and distributed algorithm for mining frequent patterns in congested networks. Computing 98, 3 (2016), 235–256.
[31]
Xin Liu, Xiaomiao Zhang, Yiwen Wang, Jiehan Zhou, Sumi Helal, Zhidong Xu, Weishan Zhang, and Shuai Cao. 2018. PARMTRD: Parallel association rules based multiple-topic relationships detection. In International Conference on Web Services. Springer, 422–436.
[32]
José María Luna, Alberto Cano, Mykola Pechenizkiy, and Sebastián Ventura. 2016. Speeding-up association rule mining with inverted index compression. IEEE Trans. Cybern. 46, 12 (2016), 3059–3072.
[33]
D. Martín, María Martínez-Ballesteros, Diego García-Gil, Jesús Alcalá-Fdez, Francisco Herrera, and J. C. Riquelme-Santos. 2018. MRQAR: A generic MapReduce framework to discover quantitative association rules in big data problems. Knowl.-based Syst. 153 (2018), 176–192.
[34]
Li Ni, Wenjian Luo, Nannan Lu, and Wenjie Zhu. 2020. Mining the local dependency itemset in a products network. ACM Trans. Manag. Inf. Syst. 11, 1 (Apr. 2020). https://doi.org/10.1145/3384473
[35]
Mahardhika Pratama, Jie Lu, Edwin Lughofer, Guangquan Zhang, and Meng Joo Er. 2016. An incremental learning of concept drifts using evolving type-2 recurrent fuzzy neural networks. IEEE Trans. Fuzzy Syst. 25, 5 (2016), 1175–1192.
[36]
Beatriz Pérez-Sánchez, Oscar Fontenla-Romero, and Bertha Guijarro-Berdiñas. 2018. A review of adaptive online learning for artificial neural networks. Artif. Intell. Rev. 49, 2 (2018), 281–299.DOI:https://doi.org/10.1007/s10462-016-9526-2
[37]
Xueheng Qiu, Ponnuthurai Nagaratnam Suganthan, and Gehan A. J. Amaratunga. 2018. Ensemble incremental learning random vector functional link network for short-term electric load forecasting. Knowl.-based Syst. 145 (2018), 182–196.
[38]
Zhiguo Qu, John Keeney, Sebastian Robitzsch, Faisal Zaman, and Xiaojun Wang. 2016. Multilevel pattern mining architecture for automatic network monitoring in heterogeneous wireless communication networks. China Commun. 13, 7 (2016), 108–116.
[39]
Md Mamunur Rashid, Muhammad Amar, Iqbal Gondal, and Joarder Kamruzzaman. 2016. A data mining approach for machine fault diagnosis based on associated frequency patterns. Appl. Intell. 45, 3 (2016), 638–651.
[40]
Mandeep Kaur Saggi and Sushma Jain. 2018. A survey towards an integration of big data analytics to big insights for value-creation. Inf. Process. Manag. 54, 5 (Sept. 2018), 758–790. https://doi.org/10.1016/j.ipm.2018.01.010
[41]
Wenzhong Shi, Anshu Zhang, and Geoffrey I. Webb. 2018. Mining significant crisp-fuzzy spatial association rules. Int. J. Geog. Inf. Sci. 32, 6 (2018), 1247–1270.
[42]
Ömer M. Soysal, Eera Gupta, and Harisha Donepudi. 2016. A sparse memory allocation data structure for sequential and parallel association rule mining. J. Supercomput. 72, 2 (Feb. 2016), 347–370. https://doi.org/10.1007/s11227-015-1566-x
[43]
G. Srivastava, J. C. W. Lin, A. Jolfaei, Y. Li, and Y. Djenouri. 2020. Uncertain-driven analytics of sequence data in IoCV environments. IEEE Trans. Intell. Transport. Syst. 22, 8 (2020), 1–12. DOI:https://doi.org/10.1109/TITS.2020.3012387
[44]
G. Srivastava, J. C. W. Lin, X. Zhang, and Y. Li. 2020. Large-scale high-utility sequential pattern analytics in internet of things. IEEE Internet Things J. 8, 16 (2020), 1–1. DOI:https://doi.org/10.1109/JIOT.2020.3026826
[45]
Y. Sun, K. Tang, Z. Zhu, and X. Yao. 2018. Concept drift adaptation by exploiting historical knowledge. IEEE Trans. Neural Netw. Learn. Syst. 29, 10 (2018), 4822–4832. DOI:https://doi.org/10.1109/TNNLS.2017.2775225
[46]
Liang Tang, Chang-Jie Tang, Lei Duan, Chuan Li, Ye-Xi Jiang, Chun-Qiu Zeng, and Jun Zhu. 2008. MoStream: An efficient algorithm for monitoring clusters evolving in data streams. In Proceedings of the IEEE International Conference on Granular Computing. IEEE, 582–587.
[47]
Darshan Tank. 2012. Real-time Business Intelligence & Frequent Pattern Mining Algorithm: Timely Consistent Analysis Using Real-time Data Warehouse Environment and Improving Efficiency of Apriori Algorithm. LAP Lambert Academic Publishing.
[48]
Thanh-LongNguyen, Vo Bay, and Vaclav Snasel. 2017. Efficient algorithms for mining colossal patterns in high dimensional databases. Knowl.-based Systems 122 (2017).
[49]
Wil van der Aalst. 2012. Process mining: Overview and opportunities. ACM Trans. Manag. Inf. Syst. 3, 2 (July 2012). DOI:https://doi.org/10.1145/2229156.2229157
[50]
Huang Wei-Dong, Wang Qian, and Cao Jie. 2018. Tracing public opinion propagation and emotional evolution based on public emergencies in social networks. Int. J. Comput. Commun. Contr. 13, 1 (2018), 129–142.
[51]
Jun Wu, Zengyou He, Feiyang Gu, Xiaoqing Liu, Jianyu Zhou, and Can Yang. 2016. Computing exact permutation p-values for association rules. Inf. Sci. 346 (2016), 146–162.
[52]
Y. Xun, Xiaohui Cui, Jifu Zhang, and Qingxia Yin. 2021. Incremental frequent itemsets mining based on frequent pattern tree and multi-scale. Expert Syst. Appl. 163 (2021), 113805.
[53]
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy (Eds.).1996. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
[54]
Binbin Zhang, Jerry Chun-Wei Lin, Yinan Shao, Philippe Fournier-Viger, and Youcef Djenouri. 2018. Maintenance of discovered high average-utility itemsets in dynamic databases. Appl. Sci. 8, 5 (2018). DOI:https://doi.org/10.3390/app8050769
[55]
Lei Zhu, Kazushi Ikeda, Shaoning Pang, Tao Ban, and Abdolhossein Sarrafzadeh. 2018. Merging weighted SVMs for parallel incremental learning. Neural Netw. 100 (2018), 25–38.

Cited By

View all
  • (2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 3-May-2024
  • (2024)A Survey of Graph Neural Networks for Social Recommender SystemsACM Computing Surveys10.1145/366182156:10(1-34)Online publication date: 22-Jun-2024
  • (2024)Counterfactual Graph Convolutional Learning for Personalized RecommendationACM Transactions on Intelligent Systems and Technology10.1145/365563215:4(1-20)Online publication date: 18-Jun-2024
  • Show More Cited By

Index Terms

  1. An Evolutive Frequent Pattern Tree-based Incremental Knowledge Discovery Algorithm

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Management Information Systems
    ACM Transactions on Management Information Systems  Volume 13, Issue 3
    September 2022
    312 pages
    ISSN:2158-656X
    EISSN:2158-6578
    DOI:10.1145/3512349
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 February 2022
    Accepted: 01 November 2021
    Revised: 01 October 2021
    Received: 01 January 2021
    Published in TMIS Volume 13, Issue 3

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Association rule
    2. evolutive frequent pattern tree
    3. knowledge discovery
    4. data mining
    5. incremental window
    6. public opinion analysis

    Qualifiers

    • Research-article
    • Refereed

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)84
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 3-May-2024
    • (2024)A Survey of Graph Neural Networks for Social Recommender SystemsACM Computing Surveys10.1145/366182156:10(1-34)Online publication date: 22-Jun-2024
    • (2024)Counterfactual Graph Convolutional Learning for Personalized RecommendationACM Transactions on Intelligent Systems and Technology10.1145/365563215:4(1-20)Online publication date: 18-Jun-2024
    • (2024)ID-SR: Privacy-Preserving Social Recommendation Based on Infinite Divisibility for Trustworthy AIACM Transactions on Knowledge Discovery from Data10.1145/363941218:7(1-25)Online publication date: 19-Jun-2024
    • (2024)Privacy-Preserving Individual-Level COVID-19 Infection Prediction via Federated Graph LearningACM Transactions on Information Systems10.1145/363320242:3(1-29)Online publication date: 22-Jan-2024
    • (2024)DeCoCDR: Deployable Cloud-Device Collaboration for Cross-Domain RecommendationProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657786(2114-2123)Online publication date: 10-Jul-2024
    • (2024)Revisit Targeted Model Poisoning on Federated Recommendation: Optimize via Multi-objective TransportProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657764(1722-1732)Online publication date: 10-Jul-2024
    • (2024)GraphScope Flex: LEGO-like Graph Computing StackCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653383(386-399)Online publication date: 9-Jun-2024
    • (2024)Mining the colossal patterns using ISSA based KMC with VGHHO clustering model for high dimensional dataTelematics and Informatics Reports10.1016/j.teler.2024.10012513(100125)Online publication date: Mar-2024
    • (2023)FedGTA: Topology-Aware Averaging for Federated Graph LearningProceedings of the VLDB Endowment10.14778/3617838.361784217:1(41-50)Online publication date: 1-Sep-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Full Text

    View this article in Full Text.

    Full Text

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media