Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2806416.2806421acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article
Open access

An Optimal Online Algorithm For Retrieving Heavily Perturbed Statistical Databases In The Low-Dimensional Querying Model

Published: 17 October 2015 Publication History

Abstract

We give the first Õ(1 over √ T)-error online algorithm for reconstructing noisy statistical databases, where T is the number of (online) sample queries received. The algorithm is optimal up to the poly(log(T)) factor in terms of the error and requires only O(log T) memory. It aims to learn a hidden database-vector w* Ε in ℜ D in order to accurately answer a stream of queries regarding the hidden database, which arrive in an online fashion from some unknown distribution D. We assume the distribution D is defined on the neighborhood of a low-dimensional manifold. The presented algorithm runs in O(dD)-time per query, where d is the dimensionality of the query-space. Contrary to the classical setting, there is no separate training set that is used by the algorithm to learn the database --- the stream on which the algorithm will be evaluated must also be used to learn the database-vector. The algorithm only has access to a binary oracle Ο that answers whether a particular linear function of the database-vector plus random noise is larger than a threshold, which is specified by the algorithm. We note that we allow for a significant O(D) amount of noise to be added while other works focused on the low noise o(√D)-setting. For a stream of T queries our algorithm achieves an average error Õ(1 over √T) by filtering out random noise, adapting threshold values given to the oracle based on its previous answers and, as a consequence, recovering with high precision a projection of a database-vector w* onto the manifold defining the query-space. Our algorithm may be also applied in the adversarial machine learning context to compromise machine learning engines by heavily exploiting the vulnerabilities of the systems that output only binary signal and in the presence of significant noise.

References

[1]
Cynthia Dwork. Differential privacy. In ICALP (2), pages 1--12, 2006.
[2]
Cynthia Dwork. Differential privacy in new settings. In Moses Charikar, editor, SODA, pages 174--183. SIAM, 2010.
[3]
Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Leonard J. Schulman, editor, STOC, pages 715--724. ACM, 2010.
[4]
Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin. Pan-private streaming algorithms. In Andrew Chi-Chih Yao, editor, ICS, pages 66--80. Tsinghua University Press, 2010.
[5]
Kobbi Nissim, Rann Smorodinsky, and Moshe Tennenholtz. Approximately optimal mechanism design via differential privacy. CoRR, abs/1004.2888, 2010.
[6]
Krzysztof Choromanski and Tal Malkin. The power of the dinur-nissim algorithm: breaking privacy of statistical and graph databases. In PODS, pages 65--76, 2012.
[7]
Sergey Yekhanin. Private information retrieval. Commun. ACM, 53(4):68--73, 2010.
[8]
Cynthia Dwork, Frank McSherry, and Kunal Talwar. The price of privacy and the limits of LP decoding. In David S. Johnson and Uriel Feige, editors, STOC, pages 85--94. ACM, 2007.
[9]
Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In PODS, pages 202--210. ACM, 2003.
[10]
Cynthia Dwork and Sergey Yekhanin. New efficient attacks on statistical disclosure control mechanisms. In CRYPTO, pages 469--480, 2008.
[11]
Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. In STOC, pages 537--546, 2008.
[12]
Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373--1396, 2003.
[13]
Richard G. Baraniuk, Volkan Cevher, and Michael B. Wakin. Low-dimensional models for dimensionality reduction and signal recovery: A geometric perspective. pages 959--971, 2010.
[14]
Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 99--108, New York, NY, USA, 2004. ACM.
[15]
Yevgeniy Vorobeychik and Bo Li. Optimal randomized classification in adversarial settings. In AAMAS, pages 485--492, 2014.
[16]
Kevin G. Jamieson, Maya R. Gupta, Eric Swanson, and Hyrum S. Anderson. Training a support vector machine to classify signals in a real environment given clean training data. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, 14-19 March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA, pages 2214--2217, 2010.
[17]
S. Ross. Stochastic processes. Wiley, 1996.

Index Terms

  1. An Optimal Online Algorithm For Retrieving Heavily Perturbed Statistical Databases In The Low-Dimensional Querying Model

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
    October 2015
    1998 pages
    ISBN:9781450337946
    DOI:10.1145/2806416
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 October 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. adversarial machine learning
    2. bisection method
    3. database privacy
    4. low-dimensional querying model
    5. online retrieval algorithms
    6. statistical databases

    Qualifiers

    • Research-article

    Conference

    CIKM'15
    Sponsor:

    Acceptance Rates

    CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 257
      Total Downloads
    • Downloads (Last 12 months)34
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 18 Dec 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media