A transversal hypergraph approach for the frequent itemset hiding problem

Elias C. Stavropoulos ORCID: orcid.org/0000-0002-0706-781X^1,2,
Vassilios S. Verykios^1,3 &
Vasileios Kagklis^1,4

Abstract

We propose a methodology for hiding all sensitive frequent itemsets in a transaction database. Our methodology relies on a novel technique that enumerates the minimal transversals of a hypergraph in order to induce the ideal border between frequent and sensitive itemsets. The ideal border is then used to formulate an integer linear program (ILP) that determines whether there exists a feasible sanitized database attaining the ideal border. The solution of the program identifies the set of transactions that need to be modified (sanitized) so that the hiding is achieved with maximum accuracy. If no solution exists, we modify the ILP by relaxing the constraints that must be satisfied, so that the sanitized database is guaranteed to preserve privacy while having minimal effect on data quality. Experimental evaluation of the proposed approach on a number of real datasets has shown that the produced sanitized databases exhibit higher accuracy than the solutions of other well-known approaches.
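The abstract does not state exactly how the minimal transversals yield the ideal border. The sketch below assumes the standard identity between the borders of a downward-closed family of itemsets: an itemset is minimal infrequent (negative border) exactly when it is a minimal transversal of the complements of the maximal frequent itemsets (positive border), and, symmetrically, the maximal frequent itemsets are the complements of the minimal transversals of the negative border. Adding the sensitive itemsets to the negative border and dualizing back gives the ideal positive border: the maximal itemsets that may stay frequent once every sensitive itemset and its supersets are hidden. The helper names (`minimize`, `minimal_transversals`, `negative_border_from_positive`, `ideal_positive_border`) and the toy data are hypothetical, and the enumeration uses plain Berge multiplication, which is simple but not necessarily the technique the authors use.

```python
from itertools import chain
from typing import AbstractSet, FrozenSet, Iterable, Set

Itemset = FrozenSet[str]


def minimize(sets: Iterable[AbstractSet[str]]) -> Set[Itemset]:
    """Keep only the inclusion-minimal sets of a family."""
    family = {frozenset(s) for s in sets}
    return {s for s in family if not any(t < s for t in family)}


def minimal_transversals(edges: Iterable[AbstractSet[str]]) -> Set[Itemset]:
    """Enumerate the minimal transversals (hitting sets) of a hypergraph.

    Plain Berge multiplication: process the edges one at a time, extend every
    partial transversal that misses the current edge by one item of that edge,
    then discard the non-minimal results.
    """
    transversals: Set[Itemset] = {frozenset()}  # a hypergraph with no edges is hit by {}
    for edge in edges:
        hyperedge = frozenset(edge)
        extended: Set[Itemset] = set()
        for t in transversals:
            if t & hyperedge:  # t already hits this edge
                extended.add(t)
            else:  # branch on every item of the missed edge
                extended.update(t | {item} for item in hyperedge)
        transversals = minimize(extended)  # an empty edge leaves no transversals
    return transversals


def negative_border_from_positive(items, positive_border) -> Set[Itemset]:
    """Bd- = minimal transversals of the complements of the maximal frequent itemsets."""
    universe = frozenset(items)
    return minimal_transversals(universe - frozenset(m) for m in positive_border)


def ideal_positive_border(items, negative_border, sensitive) -> Set[Itemset]:
    """Maximal itemsets that remain frequent once the sensitive itemsets are hidden."""
    universe = frozenset(items)
    # Sensitive itemsets join the negative border; keep only the minimal elements.
    new_negative = minimize(chain(negative_border, sensitive))
    # Dualize back: Bd+ = complements of the minimal transversals of Bd-.
    return {universe - t for t in minimal_transversals(new_negative)}


if __name__ == "__main__":
    items = "abcd"
    bd_pos = [{"a", "b", "c"}, {"b", "d"}]  # toy maximal frequent itemsets
    bd_neg = negative_border_from_positive(items, bd_pos)
    print(sorted("".join(sorted(s)) for s in bd_neg))  # ['ad', 'cd']
    ideal = ideal_positive_border(items, bd_neg, [{"b", "c"}])  # hide "bc"
    print(sorted("".join(sorted(s)) for s in ideal))  # ['ab', 'ac', 'bd']
```

In the toy run, hiding `bc` removes `abc` from the positive border and replaces it with `ab` and `ac`, while `bd` is untouched. In a pipeline like the one the abstract outlines, an ILP would then search for transactions to sanitize so that the itemsets of this ideal border stay frequent while the sensitive ones fall below the support threshold.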

Acknowledgments

The authors wish to thank the anonymous referees for their valuable comments that improved the final presentation of the paper.

Author information

Authors and Affiliations

Educational Content, Methodology and Technology Laboratory, Hellenic Open University, 278, Patron Claus Str., 263 35, Patras, Greece
Elias C. Stavropoulos, Vassilios S. Verykios & Vasileios Kagklis
Business Administration Department, Technological Educational Institute of Western Greece, 263 34, Patras, Greece
Elias C. Stavropoulos
School of Science and Technology, Hellenic Open University, 263 35, Patras, Greece
Vassilios S. Verykios
Computer Engineering and Informatics Department, University of Patras, 265 04, Patras, Greece
Vasileios Kagklis

Corresponding author

Correspondence to Elias C. Stavropoulos.

About this article

Cite this article

Stavropoulos, E.C., Verykios, V.S. & Kagklis, V. A transversal hypergraph approach for the frequent itemset hiding problem. Knowl Inf Syst 47, 625–645 (2016). https://doi.org/10.1007/s10115-015-0862-3

Received: 09 December 2014
Revised: 20 March 2015
Accepted: 06 July 2015
Published: 17 July 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10115-015-0862-3
