
What Can We Learn Privately?

Published: 01 June 2011

Abstract

Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask, What concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. Our goal is a broad understanding of the resources required for private learning in terms of samples, computation time, and interaction. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (nonprivate) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private probabilistically approximately correct learner for the class of parity functions. This result dispels the similarity between learning with noise and private learning (both must be robust to small changes in inputs), since parity is thought to be very hard to learn given random classification noise. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Therefore, for local private learning algorithms, the similarity to learning with noise is stronger: local learning is equivalent to SQ learning, and SQ algorithms include most known noise-tolerant learning algorithms. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms. Because of the equivalence to SQ learning, this result also separates adaptive and nonadaptive SQ learning.
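
For concreteness, the privacy notion used throughout is epsilon-differential privacy: a randomized algorithm A is epsilon-differentially private if, for every pair of databases x and x' differing in a single entry and every set S of outputs, Pr[A(x) in S] <= e^epsilon * Pr[A(x') in S]. The generic upper bound in the abstract (private agnostic learning with a sample size roughly logarithmic in |C|) is obtained by running the exponential mechanism of McSherry and Talwar over the concept class, scoring each concept by its empirical error. Below is a minimal sketch of that idea, not the paper's code; the function names are ours, and concepts are assumed to be given explicitly as callables:

    import math
    import random

    def exponential_mechanism_learner(samples, concepts, epsilon):
        # Score each concept by minus its number of misclassified
        # samples. Changing one sample changes each score by at most 1
        # (sensitivity 1), so sampling a concept with probability
        # proportional to exp(epsilon * score / 2) is
        # epsilon-differentially private.
        scores = [-sum(1 for (x, y) in samples if c(x) != y)
                  for c in concepts]
        max_s = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(epsilon * (s - max_s) / 2.0) for s in scores]
        r = random.random() * sum(weights)
        acc = 0.0
        for c, w in zip(concepts, weights):
            acc += w
            if r <= acc:
                return c
        return concepts[-1]

With roughly O(log |C| / (epsilon * alpha)) samples, the empirically best concept dominates the weights, which is the source of the logarithmic sample complexity mentioned above.

The "local (or randomized response)" model is the special case in which each individual randomizes their own data before it reaches the untrusted curator. Warner's classic randomized response is the simplest local algorithm and illustrates the model; again the sketch and the choice epsilon = 1 are illustrative assumptions of ours:

    def randomized_response(bit, epsilon):
        # Report the true bit with probability e^eps / (1 + e^eps),
        # the flipped bit otherwise. The two output probabilities
        # differ by a factor of exactly e^eps, so each report is
        # eps-differentially private on its own.
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return bit if random.random() < p_truth else 1 - bit

    def estimate_fraction(reports, epsilon):
        # Debias the aggregated reports: if mu is the true fraction
        # of 1s and p = e^eps / (1 + e^eps), the reported fraction q
        # satisfies E[q] = p*mu + (1 - p)*(1 - mu), hence
        # mu = (q - (1 - p)) / (2p - 1).
        p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        q = sum(reports) / len(reports)
        return (q - (1.0 - p)) / (2.0 * p - 1.0)

    # Hypothetical usage: 10,000 respondents, 30% hold the sensitive bit.
    random.seed(0)
    truth = [1 if random.random() < 0.3 else 0 for _ in range(10000)]
    reports = [randomized_response(b, epsilon=1.0) for b in truth]
    print(estimate_fraction(reports, epsilon=1.0))  # close to 0.3

This aggregate-then-debias step hints at one direction of the characterization in the abstract: a statistical query asking for E[f(z)] up to tolerance tau can be answered in the local model by having each respondent apply randomized response to the bit f(z_i), so an SQ learner can be simulated by a local private learner, with enough respondents to make the debiased estimate accurate to within tau.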

    Published In

SIAM Journal on Computing, Volume 40, Issue 3
    May 2011
    356 pages

    Publisher

    Society for Industrial and Applied Mathematics

    United States

    Author Tags

    1. data privacy
    2. differential privacy
    3. probabilistically approximately correct learning
    4. statistical query learning
