The tao of inference in privacy-protected databases

Published: 01 July 2018

Abstract

To protect database confidentiality even in the face of full compromise while supporting standard functionality, recent academic proposals and commercial products rely on a mix of encryption schemes. The recommendation is to apply strong, semantically secure encryption to the "sensitive" columns and protect other columns with property-revealing encryption (PRE) that supports operations such as sorting.
We design, implement, and evaluate a new methodology for inferring data stored in such encrypted databases. The cornerstone is the multinomial attack, a new inference technique that is analytically optimal and empirically outperforms prior heuristic attacks against PRE-encrypted data. We also extend the multinomial attack to take advantage of correlations across multiple columns. This recovers PRE-encrypted data with sufficient accuracy to then apply machine learning and record linkage methods to infer columns protected by semantically secure encryption or redaction.
We evaluate our methodology on medical, census, and union-membership datasets, showing for the first time how to infer full database records. For PRE-encrypted attributes such as demographics and ZIP codes, our attack outperforms the best prior heuristic by a factor of 16. Unlike any prior technique, we also infer attributes, such as incomes and medical diagnoses, protected by strong encryption. For example, when we infer that a patient in a hospital-discharge dataset has a mental health or substance abuse condition, this prediction is 97% accurate.
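
The abstract does not spell out how the multinomial attack is constructed, so the following is only a minimal sketch of the general idea behind likelihood-based inference against a deterministically encrypted column. It is not the authors' algorithm. It assumes the attacker holds an auxiliary plaintext distribution, models the observed ciphertext counts as a multinomial sample, and picks the ciphertext-to-plaintext assignment that maximizes the log-likelihood. The names `log_likelihood`, `rank_match`, `column`, and `aux`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def log_likelihood(counts, probs, assignment):
    """Multinomial log-likelihood (up to an additive constant) of an assignment.

    counts:     ciphertext -> number of times it appears in the column
    probs:      plaintext  -> probability under the auxiliary distribution
    assignment: ciphertext -> guessed plaintext
    """
    return sum(c * math.log(probs[assignment[ct]]) for ct, c in counts.items())


def rank_match(counts, probs):
    """Pair the i-th most frequent ciphertext with the i-th most likely plaintext.

    For a one-to-one assignment this maximizes the sum of count * log(prob)
    by the rearrangement inequality. Ties are broken by name so the result
    is deterministic. If there are fewer ciphertexts than plaintexts, the
    least likely plaintexts are left unassigned.
    """
    cts = sorted(counts, key=lambda ct: (-counts[ct], ct))
    pts = sorted(probs, key=lambda pt: (-probs[pt], pt))
    return dict(zip(cts, pts))


# Hypothetical toy data: an encrypted column and an auxiliary distribution.
column = ["c7", "c2", "c7", "c9", "c7", "c2"]
aux = {"A": 0.5, "B": 0.3, "C": 0.2}

counts = Counter(column)
guess = rank_match(counts, aux)
guess_ll = log_likelihood(counts, aux, guess)

# Brute-force check over every one-to-one assignment (feasible only for toy sizes).
cts = list(counts)
best_ll = max(
    log_likelihood(counts, aux, dict(zip(cts, perm)))
    for perm in permutations(aux, len(cts))
)
```

On this toy input the rank-matched guess reaches the same log-likelihood as the brute-force optimum. Exploiting order leakage or correlations across columns, as the abstract describes, would need a richer model than this single-column sketch.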

Published In

Proceedings of the VLDB Endowment, Volume 11, Issue 11
July 2018
507 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 July 2018
Published in PVLDB Volume 11, Issue 11

Qualifiers

  • Research-article
