Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2976749.2978409acmconferencesArticle/Chapter ViewAbstractPublication PagesccsConference Proceedingsconference-collections
research-article

Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy

Published: 24 October 2016 Publication History

Abstract

In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case that each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to the heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority true heavy hitters in practical settings.

References

[1]
Aol search log. http://www.gregsadetsky.com/aol-data/.
[2]
Spmf: An open-source data mining library. http://www.philippe-fournier-viger.com/spmf.
[3]
R. Bassily and A. D. Smith. Local, private, efficient protocols for succinct histograms. In STOC, pages 127--135, 2015.
[4]
R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta. Discovering frequent patterns in sensitive data. In SIGKDD, pages 503--512, 2010.
[5]
A. Blum, K. Ligett, and A. Roth. A learning theory approach to noninteractive database privacy. JACM, 60(2):12, 2013.
[6]
C. Burges et al. Learning to rank using gradient descent. In ICML, pages 89--96, 2005.
[7]
R. Chen et al. Publishing set-valued data via differential privacy. PVLDB, 4(11):1087--1098, 2011.
[8]
J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Privacy aware learning. J. ACM, 61(6):38:1--38:57.
[9]
J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. In FOCS, pages 429--438, 2013.
[10]
C. Dwork. Differential privacy: A survey of results. In Theory and applications of models of computation, pages 1--19. 2008.
[11]
C. Dwork, M. Naor, T. Pitassi, G. N. Rothblum, and S. Yekhanin. Pan-private streaming algorithms. In ICS, pages 66--80, 2010.
[12]
Ú. Erlingsson et al. Rappor: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054--1067, 2014.
[13]
G. Fanti et al. Building a rappor with the unknown: Privacy-preserving learning of associations and data dictionaries. PoPETS, issue 3, 2016, 2016.
[14]
A. Gupta, M. Hardt, A. Roth, and J. Ullman. Privately releasing conjunctions and the statistical query barrier. SICOMP, 42(4):1494--1520, 2013.
[15]
S. Hansell. Aol removes search data on vast group of web users. New York Times, 8:C4, 2006.
[16]
M. Hardt and K. Talwar. On the geometry of differential privacy. In STOC, pages 705--714, 2010.
[17]
M. Hay et al. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 3(1--2):1021--1032, 2010.
[18]
X. He, A. Machanavajjhala, and B. Ding. Blowfish privacy: Tuning privacy-utility trade-offs using policies. In SIGMOD, pages 1447--1458, 2014.
[19]
J. Hsu, S. Khanna, and A. Roth. Distributed private heavy hitters. In Automata, Languages, and Programming, pages 461--472. 2012.
[20]
W. B. Johnson and J. Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics, 26(189--206):1, 1984.
[21]
D. Karger et al. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In STOC, pages 654--663, 1997.
[22]
S. P. Kasiviswanathan et al. What can we learn privately? SICOMP, 40(3):793--826, 2011.
[23]
C. Li et al. Optimizing linear counting queries under differential privacy. In PODS, pages 123--134, 2010.
[24]
H. Li et al. Differentially private histogram publication for dynamic datasets: an adaptive sampling approach. In CIKM, pages 1001--1010, 2015.
[25]
N. Li, W. Qardaji, D. Su, and J. Cao. Privbasis: Frequent itemset mining with differential privacy. PVLDB, 5(11):1340--1351, 2012.
[26]
F. D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD, pages 19--30, 2009.
[27]
T. T. Nguyên et al. Collecting and analyzing data from smart device users with local differential privacy. arXiv preprint arXiv:1606.05053, 2016.
[28]
S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the ASA, 60(309):63--69, 1965.
[29]
X. Xiao, Y. Tao, and M. Chen. Optimal random perturbation at multiple privacy levels. PVLDB, 2(1):814--825, 2009.
[30]
X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. TKDE, 23(8):1200--1214, 2011.
[31]
J. Xu et al. Differentially private histogram publication. VLDBJ, 22(6):797--822, 2013.
[32]
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the netflix prize. In Algorithmic Aspects in Information and Management, pages 337--348. 2008.

Cited By

View all
  • (2025)WF-LDPSR: A local differential privacy mechanism based on water-filling for secure release of trajectory statistics dataComputers & Security10.1016/j.cose.2024.104165148(104165)Online publication date: Jan-2025
  • (2024)Local differential privacy for data security in key value pair dataJournal of Computational Methods in Sciences and Engineering10.3233/JCM-23001624:3(1955-1970)Online publication date: 17-Jun-2024
  • (2024)An efficient online histogram publication method for data streams with local differential privacy一种基于局部差分隐私的数据流高效在线直方图发布算法Frontiers of Information Technology & Electronic Engineering10.1631/FITEE.230036825:8(1096-1109)Online publication date: 30-Aug-2024
  • Show More Cited By

Index Terms

  1. Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
    October 2016
    1924 pages
    ISBN:9781450341394
    DOI:10.1145/2976749
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 October 2016

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. heavy hitter
    2. local differential privacy

    Qualifiers

    • Research-article

    Funding Sources

    • MOE Singapore

    Conference

    CCS'16
    Sponsor:

    Acceptance Rates

    CCS '16 Paper Acceptance Rate 137 of 831 submissions, 16%;
    Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

    Upcoming Conference

    CCS '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)149
    • Downloads (Last 6 weeks)17
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)WF-LDPSR: A local differential privacy mechanism based on water-filling for secure release of trajectory statistics dataComputers & Security10.1016/j.cose.2024.104165148(104165)Online publication date: Jan-2025
    • (2024)Local differential privacy for data security in key value pair dataJournal of Computational Methods in Sciences and Engineering10.3233/JCM-23001624:3(1955-1970)Online publication date: 17-Jun-2024
    • (2024)An efficient online histogram publication method for data streams with local differential privacy一种基于局部差分隐私的数据流高效在线直方图发布算法Frontiers of Information Technology & Electronic Engineering10.1631/FITEE.230036825:8(1096-1109)Online publication date: 30-Aug-2024
    • (2024)Privacy Amplification via Shuffling: Unified, Simplified, and TightenedProceedings of the VLDB Endowment10.14778/3659437.365944417:8(1870-1883)Online publication date: 1-Apr-2024
    • (2024)AAA: An Adaptive Mechanism for Locally Differentially Private Mean EstimationProceedings of the VLDB Endowment10.14778/3659437.365944217:8(1843-1855)Online publication date: 31-May-2024
    • (2024)Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded MemoryProceedings of the ACM on Management of Data10.1145/36392852:1(1-27)Online publication date: 26-Mar-2024
    • (2024)Collection and Analysis of Sensitive Data with Privacy Protection by a Distributed Randomized Response ProtocolProceedings of the 39th ACM/SIGAPP Symposium on Applied Computing10.1145/3605098.3636024(1415-1424)Online publication date: 8-Apr-2024
    • (2024)Differentially Private Top-$k$ Flows Estimation Mechanism in Network TrafficIEEE Transactions on Network Science and Engineering10.1109/TNSE.2023.330925011:3(2462-2472)Online publication date: May-2024
    • (2024)PTT: Piecewise Transformation Technique for Analyzing Numerical Data Under Local Differential PrivacyIEEE Transactions on Mobile Computing10.1109/TMC.2024.336449623:10(9518-9531)Online publication date: Oct-2024
    • (2024)Locally Private Set-Valued Data Analyses: Distribution and Heavy Hitters EstimationIEEE Transactions on Mobile Computing10.1109/TMC.2023.334205623:8(8050-8065)Online publication date: Aug-2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media