Discovering frequent patterns in sensitive data

Authors: Raghav Bhaskar, Srivatsan Laxman, Abhradeep Thakurta

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 503 - 512

https://doi.org/10.1145/1835804.1835869

Published: 25 July 2010

Abstract

Discovering frequent patterns from data is a popular exploratory technique in data mining. However, if the data are sensitive (e.g., patient health records, user behavior records), releasing information about significant patterns or trends carries a substantial risk to privacy. This paper shows how one can accurately discover and release the most significant patterns, along with their frequencies, in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there.

We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.
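
To make the idea of a 'noisy' top-k list concrete, the following is a minimal sketch and not the paper's algorithms, which the abstract does not spell out. It assumes the candidate patterns are single items, that neighboring data sets differ by one transaction (so each count changes by at most 1), and that k items are picked one at a time with the standard exponential mechanism, splitting the privacy budget evenly across the k rounds. The function names and the parameter `epsilon` are illustrative choices, not taken from the paper.

```python
import math
import random
from collections import Counter


def item_counts(transactions):
    """Count how many transactions contain each item (duplicates within a transaction count once)."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def private_top_k(counts, k, epsilon, rng=None):
    """Pick k items by repeated exponential-mechanism sampling.

    Assumption: each count has sensitivity 1, and the total budget
    epsilon is split evenly over the k selection rounds.
    """
    rng = rng or random.Random()
    remaining = dict(counts)
    eps_round = epsilon / k
    chosen = []
    for _ in range(min(k, len(remaining))):
        items = list(remaining)
        top = max(remaining.values())
        # Subtracting the maximum count avoids overflow and leaves
        # the sampling distribution unchanged.
        weights = [math.exp(eps_round * (remaining[i] - top) / 2.0) for i in items]
        pick = rng.choices(items, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def false_negative_rate(true_top, reported):
    """Fraction of the true top-k items that are missing from the reported list."""
    missed = set(true_top) - set(reported)
    return len(missed) / len(true_top)
```

With a small `epsilon` the selection is close to uniform, and with a large `epsilon` it concentrates on the true top-k; comparing the reported list against the true list with `false_negative_rate` gives one simple way to observe that trade-off. This sketch selects items only; releasing their frequencies as well would need additional noise and its own share of the privacy budget.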

Index Terms

Discovering frequent patterns in sensitive data
1. Information systems
  1. Data management systems

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

July 2010

1240 pages

ISBN:9781450300551

DOI:10.1145/1835804

General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)

Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '10

KDD '10: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

July 25 - 28, 2010

Washington, DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
