research-article

Privacy preserving rare itemset mining

Published: 25 June 2024

Abstract

In recent years, rare pattern mining has proven valuable in real-world fields such as disease diagnosis, criminal behavior analysis, and network anomaly detection. When organizations publish or share data, that data is at risk of leakage, because data mining techniques may uncover sensitive knowledge and information. Privacy-preserving data mining (PPDM) has been proposed and widely studied to prevent competitors from extracting hidden information by mining a released database. However, most PPDM techniques target frequent pattern mining and cannot handle the privacy problems that arise in rare pattern mining, such as network vulnerability detection and the analysis of abnormal medical data. To address this limitation, we introduce a privacy-preserving technique for rare pattern mining. In this paper, we propose two novel algorithms, Longest Transaction-Minimum Item Number (LT-MIN) and Longest Transaction-Maximum Item Number (LT-MAX), which hide sensitive rare itemsets and return a sanitized database. Both algorithms hide the target itemsets while minimizing side effects on the original database. Moreover, they employ a projection mechanism to reduce the time spent scanning the database. In addition to the traditional PPDM evaluation criteria, we propose two similarity measures that evaluate performance in terms of the itemsets and the structural integrity of the database. The experimental results indicate that the proposed algorithms hide sensitive rare itemsets successfully and efficiently, and that the evaluation measures used could serve as evaluation criteria for privacy-preserving rare itemset mining (PPRIM).
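The abstract does not spell out how LT-MIN and LT-MAX work, so the Python sketch below is only a hypothetical illustration of the general idea, not the authors' algorithms. It assumes that an itemset is rare when its support lies in [min_sup, max_sup), and that hiding means deleting items until a sensitive itemset's support drops below min_sup. It reads the algorithm names loosely: supporting transactions are modified longest first, and the victim item is the item of the sensitive itemset with the smallest ("min") or largest ("max") count in the database. The "projection" step simply restricts each pass to the transactions that contain the sensitive itemset. `side_effects` computes hiding failure, missing cost, and artificial cost, the traditional PPDM criteria, in one common formulation; the paper's two additional similarity measures are not reproduced. All function names and parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t)


def rare_itemsets(db, min_sup, max_sup, max_len=3):
    """Itemsets of up to max_len items with min_sup <= support < max_sup.

    This rarity definition is an assumption of the sketch. Brute-force
    enumeration, suitable only for toy data.
    """
    items = sorted(set().union(*db))
    rare = set()
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if min_sup <= support(db, combo) < max_sup:
                rare.add(frozenset(combo))
    return rare


def sanitize(db, sensitive, min_sup, victim_rule="min"):
    """Hypothetical LT-MIN/LT-MAX-style hiding (NOT the authors' algorithm).

    For each sensitive itemset, delete one of its items from just enough of
    the longest supporting transactions to push its support below min_sup.
    """
    if victim_rule not in ("min", "max"):
        raise ValueError("victim_rule must be 'min' or 'max'")
    choose = min if victim_rule == "min" else max
    db = [set(t) for t in db]  # work on a copy; the input stays untouched
    counts = Counter(item for t in db for item in t)
    for target in sorted(sensitive, key=len, reverse=True):
        # Projection: only transactions that contain the target are scanned.
        proj = [i for i, t in enumerate(db) if target <= t]
        excess = len(proj) - min_sup + 1  # deletions needed so support < min_sup
        proj.sort(key=lambda i: len(db[i]), reverse=True)  # longest first
        for i in proj[:max(excess, 0)]:
            # Victim: item of the target with the smallest/largest global count.
            victim = choose(sorted(target), key=lambda item: counts[item])
            db[i].discard(victim)
            counts[victim] -= 1
    return db


def side_effects(db, db_sanitized, sensitive, min_sup, max_sup, max_len=3):
    """Hiding failure, missing cost, and artificial cost on rare itemsets."""
    before = rare_itemsets(db, min_sup, max_sup, max_len)
    after = rare_itemsets(db_sanitized, min_sup, max_sup, max_len)
    sens_before = before & sensitive
    nonsens_before = before - sensitive
    hf = len(after & sensitive) / len(sens_before) if sens_before else 0.0
    mc = len(nonsens_before - after) / len(nonsens_before) if nonsens_before else 0.0
    ac = len(after - before) / len(before) if before else 0.0
    return {"hiding_failure": hf, "missing_cost": mc, "artificial_cost": ac}


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b"},
          {"b", "c"}, {"a", "c", "d"}, {"b", "d"}]
    sensitive = {frozenset({"a", "c", "d"})}
    clean = sanitize(db, sensitive, min_sup=2, victim_rule="min")
    print(support(clean, {"a", "c", "d"}))  # 1: "d" removed from db[1]
    # {'hiding_failure': 0.0, 'missing_cost': 0.375, 'artificial_cost': 0.0}
    print(side_effects(db, clean, sensitive, min_sup=2, max_sup=4))
```

In this toy run, {a, c, d} is hidden, but {a, d}, {b, d}, and {c, d} also stop being rare, which is exactly the kind of side effect (missing cost) that victim-selection rules try to keep small. Deleting items can also lower the support of itemsets that were previously too frequent to be rare, which would show up as artificial cost.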


Cited By

  • (2024) A Comprehensive Survey on Rare Event Prediction. ACM Computing Surveys 57(3), 1–39. DOI: 10.1145/3699955. Online publication date: 11 Nov 2024.


Information

Published In

Information Sciences: an International Journal, Volume 662, Issue C
Mar 2024
1436 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Data mining
  2. Privacy-preserving
  3. Rare pattern
  4. Minimum side effects



Bibliometrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Dec 2024
