A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining
<p>Pre-large concept for transaction deletion.</p> "> Figure 2
<p>Three side effects of the sanitization process.</p> "> Figure 3
<p>Runtime under varied percentages of sensitivity.</p> "> Figure 4
<p>Hiding failure under varied percentages of sensitivity.</p> "> Figure 5
<p>Missing cost under varied percentages of sensitivity.</p> "> Figure 6
<p>Artificial cost under varied percentages of sensitivity.</p> "> Figure 7
<p>Database dissimilarity under varied percentages of sensitivity.</p> "> Figure 8
<p>Scalability under varied database sizes.</p> ">
Abstract
:1. Introduction
- This is the first work regarding the design of an MOPSO-based framework in PPDM, which shows a better performance in terms of side effects compared to a conventional single-objective approach and the NSGA-II-based model.
- We present two updating strategies for the global best (gbest) and personal best (pbest) solutions during the updating process of the MOPSO framework, which can be utilized in the PPDM problem for obtaining un-dominated solutions.
- To speed up the evolutionary process, the pre-large concept is utilized here to speed up the process of the evaluated solution, and thus multiple database scans can be greatly avoided.
- Experiment results demonstrate the performance of the designed GMPSO in terms of the time cost and four side effects as compared to a traditional single-objective algorithm and the NSGA-II-based approach.
2. Literature Review
2.1. Evolutionary Computation
2.2. Privacy-Preserving Data Mining
2.3. Pre-Large Concept
3. Preliminary Information and Problem Statement
4. Proposed MOPSO-Based Framework for Data Sanitization
4.1. Data Processing
Algorithm 1: Data Processing |
4.2. Evolution Process
Algorithm 2: Proposed GMPSO Algorithm |
Algorithm 3: GridProb(Pool) |
5. An Illustrated Example
6. Experimental Results
6.1. Runtime
6.2. Side Effects
6.3. Scalability
7. Conclusions and Areas of Future Study
Author Contributions
Funding
Conflicts of Interest
Appendix A
Acronym | Full Name and Its Explanation |
MDT | The maximal deleted transaction of all the confidential itemsets |
m | The size of the particle in the evolution |
D | The original database before sanitization |
D’ | The sanitized database |
CI | The confidential information that should be hidden before sanitization |
CI’ | The remaining confidential information after sanitization |
FI | The set of frequent itemsets before sanitization |
FI’ | The set of frequent itemsets after sanitization |
PF | The set of pre-large itemsets |
The hiding failure after sanitization | |
The missing cost after sanitization | |
The artificial cost after sanitization | |
Dis | The database dissimilarity between the original database and the sanitized one |
minsup | The minimum support threshold |
lowsup | The lower support threshold |
LUS | The utilized local updating strategy in the designed algorithm |
GUS | The utilized global updating strategy in the designed algorithm |
References
- Agrawal, R.; Srikant, R. Fast algorithms for mining association rules in large databases. In Proceedings of the International Conference on Very Large Data Base, Santiago, Chile, 12–15 September 1994; pp. 487–499. [Google Scholar]
- Chen, M.S.; Han, J.; Yu, P.S. Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng. 1996, 8, 866–883. [Google Scholar] [CrossRef]
- Gan, W.; Lin, J.C.W.; Chao, H.C.; Zhan, J. Data mining in distributed environment: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2017, 7, e1216. [Google Scholar] [CrossRef]
- Gan, W.; Lin, J.C.W.; Fournier-Viger, P.; Chao, H.C.; Hong, T.P.; Fujita, H. A survey of incremental high-utility itemset mining. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1242. [Google Scholar] [CrossRef]
- Lin, J.C.W.; Yang, L.; Fournier-Viger, P.; Hong, T.P. Mining of skyline patterns by considering both frequent and utility constraints. Eng. Appl. Artif. Intell. 2019, 77, 229–238. [Google Scholar] [CrossRef]
- Fournier-Viger, P.; Lin, J.C.W.; Kiran, R.U.; Koh, Y.S.; Thomas, R. A survey of sequential pattern mining. Data Sci. Pattern Recognit. 2017, 1, 54–77. [Google Scholar]
- Atallah, M.; Bertino, E.; Elmagarmid, A.; Ibrahim, M.; Verykios, V. Disclosure limitation of sensitive rules. In Proceedings of the Workshop on Knowledge and Data Engineering Exchange, Chicago, IL, USA, 7 November 1999; pp. 45–52. [Google Scholar]
- Aggarwal, C.C.; Pei, J.; Zhang, B. On privacy preservation against adversarial data mining. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 510–516. [Google Scholar]
- Oliveira, S.R.M.; Zaïane, O.R. Privacy preserving frequent itemset mining. In Proceedings of the IEEE International Conference on Privacy, Security and Data Mining, Maebashi City, Japan, 23–26 July 2002; pp. 43–54. [Google Scholar]
- Verykios, V.S.; Bertino, E.; Fovino, I.N.; Provenza, L.P.; Saygin, Y.; Theodoridis, Y. State-of-the-art in privacy preserving data mining. ACM SIGMOD Rec. 2004, 33, 50–57. [Google Scholar] [CrossRef]
- Lindell, Y.; Pinkas, B. Privacy preserving data mining. In Proceedings of the Annual International Cryptology Conference on Advances in Cryptology, Santa Barbara, CA, USA, 20–24 August 2000; pp. 36–54. [Google Scholar]
- Clifton, C.; Kantarcioglu, M.; Vaidya, J.; Lin, X.; Zhu, M.Y. Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. 2003, 4, 28–34. [Google Scholar] [CrossRef]
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3876, pp. 265–284. [Google Scholar]
- Wu, Y.H.; Chiang, C.M.; Chen, A.L.P. Hiding sensitive association rules with limited side effects. IEEE Trans. Knowl. Data Eng. 2007, 19, 29–42. [Google Scholar] [CrossRef]
- Hong, T.P.; Lin, C.W.; Yang, K.T.; Wang, S.L. Using TF-IDF to hide sensitive itemsets. Appl. Intell. 2012, 38, 502–510. [Google Scholar] [CrossRef]
- Dasseni, E.; Verykios, V.S.; Elmagarmid, A.K.; Bertino, E. Hiding association rules by using confidence and support. In Proceedings of the International Workshop on Information Hiding, Pittsburgh, PA, USA, 25–27 April 2001; pp. 369–383. [Google Scholar]
- Evfimievski, A.; Srikant, R.; Agrawal, R.; Gehrke, J. Privacy preserving mining of association rules. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 217–228. [Google Scholar]
- Lin, C.W.; Hong, T.P.; Chang, C.C.; Wang, S.L. A greedy-based approach for hiding sensitive itemsets by transaction insertion. J. Inf. Hiding Multimed. Signal Process. 2013, 4, 201–227. [Google Scholar]
- Lin, C.W.; Zhang, B.; Yang, K.T.; Hong, T.P. Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms. Sci. World J. 2014, 2014, 398269. [Google Scholar] [CrossRef] [PubMed]
- Lin, C.W.; Hong, T.P.; Yang, K.T.; Wang, S.L. The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl. Intell. 2015, 42, 210–230. [Google Scholar] [CrossRef]
- Cheng, P.; Lee, I.; Lin, C.W.; Pan, J.S. Association rule hiding based on evolutionary multi-objective optimization. Intell. Data Anal. 2016, 20, 495–514. [Google Scholar] [CrossRef]
- Deb, K.; Pratap, A.; Agrawal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef] [Green Version]
- Lin, J.C.W.; Zhang, Y.; Zhang, B.; Fournier-Viger, P.; Djenouri, Y. Hiding sensitive itemsets with multiple objective optimization. Soft Comput. 2019, 1–19. [Google Scholar] [CrossRef]
- Coello, C.A.; Lechuga, M.S. MOPSO: A proposal for multiple objective particle swarm optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, Honolulu, HI, USA, 12–17 May 2002; pp. 1051–1056. [Google Scholar]
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
- Goldberg, D.E. Genetic Algorithms in Search, Optimization and Machine Learning; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1989. [Google Scholar]
- Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992. [Google Scholar]
- Colorni, A.; Dorigo, M.; Maniezzo, V. Distributed optimization by ant colonies. In Proceedings of the European Conference on Artificial Life, Paris, France, 11–13 December 1991; pp. 134–142. [Google Scholar]
- Karaboga, D.; Basturk, B. A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm. J. Glob. Optim. 2007, 39, 459–471. [Google Scholar] [CrossRef]
- Fonseca, C.M.; Fleming, P.J. Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In Proceedings of the International Conference on Genetic Algorithms, Urbana-Champaign, IL, USA, 17–21 June 1993; pp. 416–423. [Google Scholar]
- Srinivas, N.; Deb, K. Multiobjective optimization using nondominated sorting in genetic algorithms. Evolut. Comput. 1994, 2, 221–248. [Google Scholar] [CrossRef]
- Jeyadevi, S.; Baskar, S.; Babulal, C.K.; Iruthayarajan, M.W. Solving multiobjective optimal reactive power dispatch using modified NSGA-II. Int. J. Electr. Power Energy Syst. 2011, 33, 219–228. [Google Scholar] [CrossRef]
- Zitzler, E.; Thiele, L. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Trans. Evolut. Comput. 1999, 3, 257–271. [Google Scholar] [CrossRef]
- Knowles, J.; Corne, D. The pareto archived evolution strategy: A new baseline algorithm for Pareto multiobjective optimisation. In Proceedings of the IEEE Congress on Evolutionary Computation, Washington, DC, USA, 6–9 July 1999; pp. 98–105. [Google Scholar]
- Chen, C.H.; Chen, Y.H.; Lin, J.C.W.; Wu, M.E. An effective approach for obtaining a group trading strategy portfolio using grouping genetic algorithm. IEEE Access 2019, 7, 7313–7325. [Google Scholar] [CrossRef]
- Pan, J.S.; Kong, L.; Sung, T.W.; Tsai, P.W.; Snášel, V. A clustering scheme for wireless sensor networks based on genetic algorithm and dominating set. J. Internet Technol. 2018, 19, 1111–1118. [Google Scholar]
- Wu, J.M.T.; Zhan, J.; Lin, J.C.W. An ACO-based approach to mine high-utility itemsets. Knowl. Based Syst. 2017, 116, 102–113. [Google Scholar] [CrossRef]
- Agrawal, R.; Srikant, R. Privacy-preserving data mining. ACM SIGMOD Rec. 2000, 29, 439–450. [Google Scholar] [CrossRef] [Green Version]
- Islam, M.Z.; Brankovic, L. Privacy preserving data mining: A noise addition framework using a novel clustering technique. Knowl. Based Syst. 2011, 24, 1214–1223. [Google Scholar] [CrossRef]
- Han, S.; Ng, W.K. Privacy-preserving genetic algorithms for rule discovery. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Regensburg, Germany, 3–7 September 2007; pp. 407–417. [Google Scholar]
- Hasan, A.S.M.T.; Jiang, Q.; Chen, H.; Wang, S. A new approach to privacy-preserving multiple independent data publishing. Appl. Sci. 2018, 8, 783. [Google Scholar] [CrossRef]
- Liu, F.; Li, T. A clustering k-anonymity privacy-preserving method for wearable IoT devices. Secur. Commun. Netw. 2018, 2018, 4945152. [Google Scholar] [CrossRef]
- Han, J.; Pei, J.; Yin, Y.; Mao, R. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Discov. 2004, 8, 53–87. [Google Scholar] [CrossRef]
- Cheung, D.W.; Han, J.; Ng, V.T.; Wong, C.Y. Maintenance of discovered association rules in large databases: An incremental updating technique. In Proceedings of the International Conference on Data Engineering, New Orleans, LA, USA, 26 February–1 March 1996; pp. 106–114. [Google Scholar]
- Lin, C.W.; Hong, T.P.; Lu, W.H. The pre-FUFP algorithm for incremental mining. Expert Syst. Appl. 2009, 36, 9498–9505. [Google Scholar] [CrossRef]
- Hong, T.P.; Wang, C.Y.; Tao, Y.H. A new incremental data mining algorithm using pre-large itemsets. Intell. Data Anal. 2001, 5, 111–129. [Google Scholar] [CrossRef]
- Deb, K. Multi-Objective Optimization Using Evolutionary Algorithms; John Wiley & Sons, Inc.: New York, NY, USA, 2001. [Google Scholar]
- Fournier-Viger, P.; Lin, J.C.W.; Gomariz, A.; Gueniche, T.; Soltani, A.; Deng, Z. The SPMF open-source data mining library version 2. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Riva del Garda, Italy, 19–23 September 2016; pp. 36–40. [Google Scholar]
- Agrawal, R.; Srikant, R. Quest Synthetic Data Generator; IBM Almaden Research Center: San Jose, CA, USA, 1994; Available online: http://www.Almaden.ibm.com/cs/quest/syndata.html (accessed on 12 December 2018).
TID | Items | TID | Items |
---|---|---|---|
A, B, E | B, C, E | ||
B, C, E | B, C, D, E | ||
A, B, C, E | A, B, E | ||
A, C, D | A, B | ||
B, C, E | C, E |
1-Itemset | Count | 2-Itemset | Count | 3-Itmset | Count |
---|---|---|---|---|---|
(A) | 5 | (AB) | 4 | (BCE) | 5 |
(B) | 8 | (BC) | 5 | ||
(C) | 7 | (BE) | 7 | ||
(E) | 8 | (CE) | 6 |
TID | Items |
---|---|
B, C, E | |
A, B, C, E | |
B, C, E | |
B, C, E | |
B, C, D, E | |
C, E |
Particle | TIDs | ||||
---|---|---|---|---|---|
2, 3, 5, 6 | 0 | 1 | 0 | 4 | |
0, 5, 6, 7 | 1 | 1 | 2 | 3 | |
0, 2, 3, 3 | 1 | 1 | 1 | 3 | |
0, 5, 6, 0 | 1 | 0 | 1 | 2 | |
2, 3, 0, 10 | 2 | 0 | 0 | 3 |
Dataset | AvgLen | MaxLen | Type | ||
---|---|---|---|---|---|
chess | 3196 | 74 | 37 | 37 | dense |
mushroom | 8124 | 119 | 23 | 23 | dense |
foodmart | 21,556 | 1559 | 4 | 11 | sparse |
T10I4D100K | 100,000 | 870 | 10.1 | 29 | sparse |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, T.-Y.; Lin, J.C.-W.; Zhang, Y.; Chen, C.-H. A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining. Appl. Sci. 2019, 9, 774. https://doi.org/10.3390/app9040774
Wu T-Y, Lin JC-W, Zhang Y, Chen C-H. A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining. Applied Sciences. 2019; 9(4):774. https://doi.org/10.3390/app9040774
Chicago/Turabian StyleWu, Tsu-Yang, Jerry Chun-Wei Lin, Yuyu Zhang, and Chun-Hao Chen. 2019. "A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining" Applied Sciences 9, no. 4: 774. https://doi.org/10.3390/app9040774
APA StyleWu, T. -Y., Lin, J. C. -W., Zhang, Y., & Chen, C. -H. (2019). A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining. Applied Sciences, 9(4), 774. https://doi.org/10.3390/app9040774