Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

U-DPAP: Utility-aware Efficient Range Counting on Privacy-preserving Spatial Data Federation

Published: 11 February 2025 Publication History

Abstract

Range counting is a fundamental operation in spatial data applications. There is a growing demand to facilitate this operation over a data federation, where spatial data are separately held by multiple data providers (a.k.a., data silos). Most existing data federation schemes employ Secure Multiparty Computation (SMC) to protect privacy, but this approach is computationally expensive and leads to high latency. Consequently, private data federations are often impractical for typical database workloads.This challenge highlights the need for a private data federation scheme capable of providing fast and accurate query responses while maintaining strong privacy.
To address this issue, we propose U-DPAP, a utility-aware efficient privacy-preserving method. It is the first scheme to exclusively use differential privacy for privacy protection in spatial data federation, without employing SMC. Moreover, it combines approximate query processing to further enhance efficiency. Our experimental results indicate that a straightforward combination of the two techniques results in unacceptable impacts on data utility. Thus, we design two novel algorithms: one to make differential privacy practical by optimizing the privacy-utility trade-off, and another to address the efficiency-utility trade-off in approximate query processing. The grouping-based perturbation algorithm reduces noise by grouping similar data and applying noise to the groups. The representative data silos selection algorithm minimizes approximate error by selecting representative silos using the similarity between data silos. We rigorously prove the privacy guarantees of U-DPAP. Moreover, experimental results demonstrate that U-DPAP enhances data utility by an order of magnitude while maintaining high communication efficiency.

References

[1]
2021. 9 Bike. http://www.9bikelm.com
[2]
Gergely Acs and Claude Castelluccia. 2014. A case study: Privacy preserving release of spatio-temporal density in paris. In ACM SIGKDD.
[3]
Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, and Joel Saltz. 2013. Hadoop-GIS: A high performance spatial data warehousing system over MapReduce. In Morgan Kaufmann/ACM VLDB.
[4]
Ahmad Ali, Yanmin Zhu, and Muhammad Zakarya. 2021. A data aggregation based approach to exploit dynamic spatio-temporal correlations for citywide crowd flows prediction in fog computing. Multimedia Tools and Applications 80, 20 (2021), 31401--31433.
[5]
Johes Bater, Gregory Elliott, Craig Eggen, Satyender Goel, Abel N Kho, and Jennie Rogers. 2017. SMCQL: Secure Query Processing for Private Data Networks. In Morgan Kaufmann/ACM VLDB.
[6]
Johes Bater, Xi He, William Ehrich, Ashwin Machanavajjhala, and Jennie Rogers. 2018. Shrinkwrap: efficient sql query processing in differentially private data federations. In Morgan Kaufmann/ACM VLDB.
[7]
Johes Bater, Yongjoo Park, Xi He, Xiao Wang, and Jennie Rogers. 2020. Saqe: practical privacy-preserving approximate query processing for data federations. In Morgan Kaufmann/ACM VLDB.
[8]
Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim. 2001. Approximate query processing using wavelets. The VLDB Journal 10 (2001), 199--223.
[9]
Rui Chen, Haoran Li, A Kai Qin, Shiva Prasad Kasiviswanathan, and Hongxia Jin. 2016. Private spatial data aggregation in the local setting. In IEEE ICDE.
[10]
Graham Cormode. 2011. Sketch techniques for approximate query processing. Foundations and Trends in Databases. NOW publishers 15 (2011).
[11]
Graham Cormode, Cecilia Procopiuc, Divesh Srivastava, Entong Shen, and Ting Yu. 2012. Differentially private spatial decompositions. In IEEE ICDE.
[12]
Cynthia Dwork. 2006. Differential privacy. In Springer ICALP.
[13]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Springer TCC.
[14]
Ahmed Eldawy and Mohamed F Mokbel. 2015. Spatialhadoop: A mapreduce framework for spatial data. In IEEE ICDE.
[15]
David K Hsiao. 1992. Federated databases and systems: part i-a tutorial on their data sharing. The VLDB Journal 1, 1 (1992), 127--179.
[16]
Maocheng Li, Yuxiang Zeng, and Lei Chen. 2023. Efficient and accurate range counting on privacy-preserving spatial data federation. In Springer DASFAA.
[17]
Yawen Li, Ye Yuan, YishuWang, Xiang Lian, Yuliang Ma, and GuorenWang. 2020. Distributed multimodal path queries. IEEE Transactions on Knowledge and Data Engineering 34, 7 (2020), 3196--3210.
[18]
Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In ACM SIGMOD.
[19]
Prashanth Mohan, Abhradeep Thakurta, Elaine Shi, Dawn Song, and David Culler. 2012. GUPT: privacy preserving data analysis made easy. In ACM SIGMOD.
[20]
Wahbeh Qardaji, Weining Yang, and Ninghui Li. 2013. Differentially private grids for geospatial data. In IEEE ICDE.
[21]
Amit P Sheth and James A Larson. 1990. Federated database systems for managing distributed, heterogeneous, and autonomous databases. Comput. Surveys 22, 3 (1990), 183--236.
[22]
Yexuan Shi, Yongxin Tong, Yuxiang Zeng, Zimu Zhou, Bolin Ding, and Lei Chen. 2021. Efficient approximate range aggregation over large-scale spatial data federation. IEEE Transactions on Knowledge and Data Engineering 35, 1 (2021), 418--430.
[23]
Hien To, Gabriel Ghinita, and Cyrus Shahabi. 2014. A framework for protecting worker location privacy in spatial crowdsourcing. In Morgan Kaufmann/ACM VLDB.
[24]
Yongxin Tong, Xuchen Pan, Yuxiang Zeng, Yexuan Shi, Chunbo Xue, Zimu Zhou, Xiaofei Zhang, Lei Chen, Yi Xu, Ke Xu, et al. 2022. Hu-fu: Efficient and secure spatial queries over data federation. In Morgan Kaufmann/ACM VLDB.
[25]
Nikolaj Volgushev, Malte Schwarzkopf, Ben Getchell, Mayank Varia, Andrei Lapets, and Azer Bestavros. 2019. Conclave: secure multi-party computation on big data. In ACM EuroSys.
[26]
Xuan-Son Vu, Addi Ait-Mlouk, Erik Elmroth, and Lili Jiang. 2019. Graph-based interactive data federation system for heterogeneous data retrieval and analytics. In ACM WWW.
[27]
Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, and Minyi Guo. 2016. Simba: Efficient in-memory spatial analytics. In IEEE ICDE.
[28]
Ye Yuan, Delong Ma, Zhenyu Wen, Zhiwei Zhang, and Guoren Wang. 2021. Subgraph matching over graph federation. In Morgan Kaufmann/ACM VLDB.
[29]
Kaining Zhang, Yongxin Tong, Yexuan Shi, Yuxiang Zeng, Yi Xu, Lei Chen, Zimu Zhou, Ke Xu, Weifeng Lv, and Zhiming Zheng. 2023. Approximate k-nearest neighbor query over spatial data federation. In Springer DASFAA.
[30]
Meifan Zhang, Hongzhi Wang, Jianzhong Li, and Hong Gao. 2020. SUM-optimal histograms for approximate query processing. Knowledge and Information Systems 62 (2020), 3155--3180.
[31]
Yuanyuan Zhang, Yexuan Shi, Zimu Zhou, Chunbo Xue, Yi Xu, Ke Xu, and Junping Du. 2023. Efficient and secure skyline queries over vertical data federation. IEEE Transactions on Knowledge and Data Engineering 35, 9 (2023), 9269--9280.

Index Terms

  1. U-DPAP: Utility-aware Efficient Range Counting on Privacy-preserving Spatial Data Federation

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Management of Data
    Proceedings of the ACM on Management of Data  Volume 3, Issue 1
    SIGMOD
    February 2025
    2261 pages
    EISSN:2836-6573
    DOI:10.1145/3717614
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 February 2025
    Published in PACMMOD Volume 3, Issue 1

    Permissions

    Request permissions for this article.

    Author Tags

    1. data federation
    2. differential privacy
    3. range counting

    Qualifiers

    • Research-article

    Funding Sources

    • the National Natural Science Foundation of China

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 14
      Total Downloads
    • Downloads (Last 12 months)14
    • Downloads (Last 6 weeks)14
    Reflects downloads up to 16 Feb 2025

    Other Metrics

    Citations

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media