Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3459637.3482467acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy

Published: 30 October 2021 Publication History

Abstract

With local differential privacy (LDP), users can privatize their data and thus guarantee privacy properties before transmitting it to the server (a.k.a. the aggregator). One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users for each possible value. In practice, when a study with rich content on a population is desired, the interest is in the multiple attributes of the population, that is to say, in multidimensional data (d ≥ 2). However, contrary to the problem of frequency estimation of a single attribute (the majority of the works), the multidimensional aspect imposes to pay particular attention to the privacy budget. This one can indeed grow extremely quickly due to the composition theorem. To the authors' knowledge, two solutions seem to stand out for this task: 1) splitting the privacy budget for each attribute, i.e., send each value with ε d ≥-LDP (Spl), and 2) random sampling a single attribute and spend all the privacy budget to send it with ε-LDP (Smp). AlthoughSmp adds additional sampling error, it has proven to provide higher data utility than the formerSpl solution. However, we argue that aggregators (who are also seen as attackers) are aware of the sampled attribute and its LDP value, which is protected by a "less strict" eε probability bound (rather than e^ε/d ). This way, we propose a solution named Random S ampling plus Fake Data (RS+FD), which allows creatinguncertainty over the sampled attribute by generating fake data for each non-sampled attribute; RS+FD further benefits from amplification by sampling. We theoretically and experimentally validate our proposed solution on both synthetic and real-world datasets to show that RS+FD achieves nearly the same or better utility than the state-of-the-artSmp solution.

Supplementary Material

MP4 File (CIKM21-rgfp1742.mp4)
This paper investigates the problem of collecting multidimensional data under ?-local-differential-privacy for the fundamental task of frequency estimation. Unlike the single attribute frequency estimation problem, the multidimensional setting needs to consider the allocation of the privacy budget. To the authors' knowledge, there are mainly two solutions: 1) Splitting (Spl) the privacy budget over the number of attributes and 2) Sampling (Smp) a single attribute per user and using the whole privacy budget on it. The latter solution has proven to provide higher data utility than the former. This paper proposes a novel solution named Random Sampling Plus Fake Data (RS+FD), which randomly samples a single attribute per user and generates one fake data for each non-sampled attribute. This way, there is uncertainty on the view of the aggregator concerning the reported attribute (higher privacy protection). RS+FD achieves nearly the same or better performance than the state-of-the-art Smp solution.

References

[1]
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy (CCS '16). Association for Computing Machinery, New York, NY, USA, 308--318. https://doi.org/10.1145/2976749.2978318
[2]
John M. Abowd. 2018. The U.S. Census Bureau Adopts Differential Privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM. https://doi.org/10.1145/3219819.3226070
[3]
Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. 2019. Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research, Vol. 89), Kamalika Chaudhuri and Masashi Sugiyama (Eds.). PMLR, 1120--1129.
[4]
Ahmet Aktay et al. 2020. Google COVID-19 community mobility reports: Anonymization process description (version 1.0). arXiv preprint arXiv:2004.04145 (2020).
[5]
Mario Alvim, Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Anna Pazii. 2018. Invited Paper: Local Differential Privacy on Metric Spaces: Optimizing the Trade-Off with Utility. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF). IEEE. https://doi.org/10.1109/csf.2018.00026
[6]
Héber H. Arcolezi, Jean-François Couchot, Oumaya Baala, Jean-Michel Contet, Bechara Al Bouna, and Xiaokui Xiao. 2020a. Mobility modeling through mobile data: generating an optimized and open dataset respecting privacy. In 2020 International Wireless Communications and Mobile Computing (IWCMC). IEEE. https://doi.org/10.1109/iwcmc48107.2020.9148138
[7]
Hé ber H. Arcolezi, Jean-Francc ois Couchot, Bechara Al Bouna, and Xiaokui Xiao. 2021. Longitudinal Collection and Analysis of Mobile Phone Data with Local Differential Privacy. In IFIP Advances in Information and Communication Technology. Springer International Publishing, 40--57. https://doi.org/10.1007/978--3-030--72465--8_3
[8]
Hé ber H. Arcolezi, Jean-Francc ois Couchot, Selene Cerna, Christophe Guyeux, Guillaume Royer, Bé chara Al Bouna, and Xiaokui Xiao. 2020b. Forecasting the number of firefighter interventions per region with local-differential-privacy-based data. Computers & Security, Vol. 96 (Sept. 2020), 101888. https://doi.org/10.1016/j.cose.2020.101888
[9]
Borja Balle, Gilles Barthe, and Marco Gaboardi. 2018. Privacy amplification by subsampling: tight analyses via couplings and divergences. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 6280--6290.
[10]
Borja Balle, Gilles Barthe, and Marco Gaboardi. 2020. Privacy profiles and amplification by subsampling. Journal of Privacy and Confidentiality, Vol. 10, 1 (2020).
[11]
Borja Balle, James Bell, Adrià Gascó n, and Kobbi Nissim. 2019. The Privacy Blanket of the Shuffle Model. In Advances in Cryptology textendash CRYPTO 2019. Springer International Publishing, 638--667. https://doi.org/10.1007/978--3-030--26951--7_22
[12]
Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Thakurta. 2017. Practical locally private heavy hitters. arXiv preprint arXiv:1707.04982 (2017).
[13]
Raef Bassily and Adam Smith. 2015. Local, Private, Efficient Protocols for Succinct Histograms. In Proceedings of the forty-seventh annual ACM symposium on Theory of Computing. ACM. https://doi.org/10.1145/2746539.2746632
[14]
Kamalika Chaudhuri and Nina Mishra. 2006. When Random Sampling Preserves Privacy. In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 198--213. https://doi.org/10.1007/11818175_12
[15]
Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting Telemetry Data Privately. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3571--3580.
[16]
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[17]
John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2018. Minimax Optimal Procedures for Locally Private Estimation. J. Amer. Statist. Assoc., Vol. 113, 521 (Jan. 2018), 182--201. https://doi.org/10.1080/01621459.2017.1389735
[18]
Cynthia Dwork. 2006. Differential Privacy. In Automata, Languages and Programming. Springer Berlin Heidelberg, 1--12. https://doi.org/10.1007/11787006_1
[19]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography. Springer Berlin Heidelberg, 265--284. https://doi.org/10.1007/11681878_14
[20]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, Vol. 9, 3--4 (2014), 211--407.
[21]
Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, and Abhradeep Thakurta. 2020. Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation. arXiv preprint arXiv:2001.03618 (2020).
[22]
Ú lfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. 2019. Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2468--2479. https://doi.org/10.1137/1.9781611975482.151
[23]
Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (Scottsdale, Arizona, USA). ACM, New York, NY, USA, 1054--1067. https://doi.org/10.1145/2660267.2660348
[24]
Giulia Fanti, Vasyl Pihur, and Ú lfar Erlingsson. 2016. Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries. Proceedings on Privacy Enhancing Technologies, Vol. 2016, 3 (May 2016), 41--61. https://doi.org/10.1515/popets-2016-0015
[25]
Vitaly Feldman, Ilya Mironov, Kunal Talwar, and Abhradeep Thakurta. 2018. Privacy Amplification by Iteration. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS). 521--532. https://doi.org/10.1109/FOCS.2018.00056
[26]
Simson Garfinkel. 2021. Implementing Differential Privacy for the 2020census. USENIX Association.
[27]
Peter Kairouz, Keith Bonawitz, and Daniel Ramage. 2016. Discrete distribution estimation under local privacy. In International Conference on Machine Learning. PMLR, 2436--2444.
[28]
Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2008. What Can We Learn Privately?. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE. https://doi.org/10.1109/focs.2008.27
[29]
Stephan Kessler, Jens Hoff, and Johann-Christoph Freytag. 2019. SAP HANA goes private. Proceedings of the VLDB Endowment, Vol. 12, 12 (Aug. 2019), 1998--2009. https://doi.org/10.14778/3352063.3352119
[30]
Jong Wook Kim, Dae-Ho Kim, and Beakcheol Jang. 2018. Application of Local Differential Privacy to Collection of Indoor Positioning Data. IEEE Access, Vol. 6 (2018), 4276--4286. https://doi.org/10.1109/access.2018.2791588
[31]
Ninghui Li, Wahbeh Qardaji, and Dong Su. 2012. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security - ASIACCS textquotesingle12. ACM Press. https://doi.org/10.1145/2414456.2414474
[32]
Zitao Li, Tianhao Wang, Milan Lopuha"a-Zwakenberg, Ninghui Li, and Boris vS koric. 2020. Estimating Numerical Distributions under Local Differential Privacy. In Proceedings of the 2020aCM SIGMOD International Conference on Management of Data. ACM. https://doi.org/10.1145/3318464.3389700
[33]
David McCandless, Tom Evans, Miriam Quick, Ella Hollowood, Christian Miles, Dan Hampson, and Duncan Geere. 2021. World's Biggest Data Breaches & Hacks. https://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/. Online; accessed 11 March 2021.
[34]
Thông T. Nguyên, Xiaokui Xiao, Yin Yang, Siu Cheung Hui, Hyejin Shin, and Junbum Shin. 2016. Collecting and Analyzing Data from Smart Device Users with Local Differential Privacy. ArXiv, Vol. abs/1606.05053 (2016).
[35]
Fan Peng, Shaohua Tang, Bowen Zhao, and Yuxian Liu. 2019. A privacy-preserving data aggregation of mobile crowdsensing based on local differential privacy. In Proceedings of the ACM Turing Celebration Conference - China. ACM. https://doi.org/10.1145/3321408.3321602
[36]
Zhan Qin, Yin Yang, Ting Yu, Issa Khalil, Xiaokui Xiao, and Kui Ren. 2016. Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM. https://doi.org/10.1145/2976749.2978409
[37]
Xuebin Ren, Chia-mu Yu, Weiren Yu, Shusen Yang, Senior Member, Xinyu Yang, Julie A. Mccann, Philip S. Yu, and Life Fellow. 2018. LoPub : High-Dimensional Crowdsourced Data., Vol. 13, 9 (2018), 2151--2166. https://doi.org/10.1109/TIFS.2018.2812146
[38]
Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, and Parvez Ahammad. 2020. LinkedIn's Audience Engagements API: A privacy preserving data analytics system at scale. arXiv preprint arXiv:2002.05839 (2020).
[39]
Apple Differential Privacy Team. 2017. Learning with privacy at scale. https://docs-assets.developer.apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf. Online; accessed 11 March 2021.
[40]
Ning Wang, Xiaokui Xiao, Yin Yang, Jun Zhao, Siu Cheung Hui, Hyejin Shin, Junbum Shin, and Ge Yu. 2019. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE. https://doi.org/10.1109/icde.2019.00063
[41]
Tianhao Wang, Jeremiah Blocki, Ninghui Li, and Somesh Jha. 2017. Locally Differentially Private Protocols for Frequency Estimation. In 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, Vancouver, BC, 729--745.
[42]
Tianhao Wang, Bolin Ding, Min Xu, Zhicong Huang, Cheng Hong, Jingren Zhou, Ninghui Li, and Somesh Jha. 2020a. Improving utility and security of the shuffler-based differential privacy. Proceedings of the VLDB Endowment, Vol. 13, 13 (Sept. 2020), 3545--3558. https://doi.org/10.14778/3424573.3424576
[43]
Tianhao Wang, Ninghui Li, and Somesh Jha. 2018. Locally Differentially Private Frequent Itemset Mining. In 2018 IEEE Symposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2018.00035
[44]
Tianhao Wang, Ninghui Li, and Somesh Jha. 2021a. Locally Differentially Private Heavy Hitter Identification. IEEE Transactions on Dependable and Secure Computing, Vol. 18, 2 (March 2021), 982--993. https://doi.org/10.1109/tdsc.2019.2927695
[45]
Tianhao Wang, Milan Lopuhaa-Zwakenberg, Zitao Li, Boris Skoric, and Ninghui Li. 2020b. Locally Differentially Private Frequency Estimation with Consistency. In Proceedings 2020 Network and Distributed System Security Symposium. Internet Society. https://doi.org/10.14722/ndss.2020.24157
[46]
Teng Wang, Jun Zhao, Zhi Hu, Xinyu Yang, Xuebin Ren, and Kwok-Yan Lam. 2021b. Local Differential Privacy for data collection and analysis. Neurocomputing, Vol. 426 (Feb. 2021), 114--133. https://doi.org/10.1016/j.neucom.2020.09.073
[47]
Stanley L. Warner. 1965. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J. Amer. Statist. Assoc., Vol. 60, 309 (March 1965), 63--69. https://doi.org/10.1080/01621459.1965.10480775
[48]
Xingxing Xiong, Shubo Liu, Dan Li, Zhaohui Cai, and Xiaoguang Niu. 2020. A Comprehensive Survey on Local Differential Privacy. Security and Communication Networks, Vol. 2020 (Oct. 2020), 1--29. https://doi.org/10.1155/2020/8829523
[49]
Zhikun Zhang, Tianhao Wang, Ninghui Li, Shibo He, and Jiming Chen. 2018. CALM: Consistent adaptive local marginal for marginal release under local differential privacy. Proceedings of the ACM Conference on Computer and Communications Security (2018), 212--229. https://doi.org/10.1145/3243734.3243742
[50]
Dan Zhao, Hong Chen, Suyun Zhao, Xiaoying Zhang, Cuiping Li, and Ruixuan Liu. 2019. Local Differential Privacy with K-anonymous for Frequency Estimation. In 2019 IEEE International Conference on Big Data (Big Data). IEEE. https://doi.org/10.1109/bigdata47090.2019.9006022

Cited By

View all
  • (2024)Local differential privacy and its applicationsComputer Standards & Interfaces10.1016/j.csi.2023.10382789:COnline publication date: 25-Jun-2024
  • (2024)Towards answering analytical query over hierarchical histogram under untrusted serversDistributed and Parallel Databases10.1007/s10619-024-07447-343:1Online publication date: 12-Nov-2024
  • (2024)On the impact of multi-dimensional local differential privacy on fairnessData Mining and Knowledge Discovery10.1007/s10618-024-01031-038:4(2252-2275)Online publication date: 27-May-2024
  • Show More Cited By

Index Terms

  1. Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
    October 2021
    4966 pages
    ISBN:9781450384469
    DOI:10.1145/3459637
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 October 2021

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. frequency estimation
    2. local differential privacy
    3. multidimensional data
    4. sampling

    Qualifiers

    • Research-article

    Funding Sources

    • EIPHI-BFC Graduate School
    • Region of Bourgogne Franche-Comt?

    Conference

    CIKM '21
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)78
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Local differential privacy and its applicationsComputer Standards & Interfaces10.1016/j.csi.2023.10382789:COnline publication date: 25-Jun-2024
    • (2024)Towards answering analytical query over hierarchical histogram under untrusted serversDistributed and Parallel Databases10.1007/s10619-024-07447-343:1Online publication date: 12-Nov-2024
    • (2024)On the impact of multi-dimensional local differential privacy on fairnessData Mining and Knowledge Discovery10.1007/s10618-024-01031-038:4(2252-2275)Online publication date: 27-May-2024
    • (2023)Neural graph generation from graph statisticsProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3667698(36324-36338)Online publication date: 10-Dec-2023
    • (2023)On the Risks of Collecting Multidimensional Data Under Local Differential PrivacyProceedings of the VLDB Endowment10.14778/3579075.357908616:5(1126-1139)Online publication date: 1-Jan-2023
    • (2023)Collaborative Sampling for Partial Multi-Dimensional Value Collection Under Local Differential PrivacyIEEE Transactions on Information Forensics and Security10.1109/TIFS.2023.328900718(3948-3961)Online publication date: 1-Jan-2023
    • (2023)Local Hashing and Fake Data for Privacy-Aware Frequency Estimation2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)10.1109/IMCOM56909.2023.10035583(1-4)Online publication date: 3-Jan-2023
    • (2022)SMARTEN—A Sample-Based Approach towards Privacy-Friendly Data RefinementJournal of Cybersecurity and Privacy10.3390/jcp20300312:3(606-628)Online publication date: 15-Aug-2022
    • (2022)Sarve: synthetic data and local differential privacy for private frequency estimationCybersecurity10.1186/s42400-022-00129-65:1Online publication date: 3-Aug-2022
    • (2022)LDP-IDS: Local Differential Privacy for Infinite Data StreamsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526190(1064-1077)Online publication date: 10-Jun-2022

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media