Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/2525373.2525389dlproceedingsArticle/Chapter ViewAbstractPublication PagesausdmConference Proceedingsconference-collections
research-article
Free access

An iterative two-party protocol for scalable privacy-preserving record linkage

Published: 05 December 2012 Publication History

Abstract

Record linkage is the process of identifying which records in different databases refer to the same real-world entities. When personal details of individuals, such as names and addresses, are used to link databases across different organisations, then privacy becomes a major concern. Often it is not permissible to exchange identifying data among organisations. Linking databases in situations where no private or confidential information can be revealed is known as 'privacy-preserving record linkage' (PPRL). We propose a novel protocol for scalable and approximate PPRL based on Bloom filters in a scenario where no third party is available to conduct a linkage.
While two-party protocols are more secure because there is no possibility of collusion between one of the database owners and the third party, these protocols generally require more complex and expensive techniques to ensure that a database owner cannot infer any sensitive information about the other party's data during the linkage process. Our two-party protocol uses an efficient privacy technique called Bloom filters, and conducts an iterative classification of record pairs into matches and non-matches, as selected bits of the Bloom filters are revealed. Experiments conducted on real-world databases that contain nearly two million records, show that our protocol is scalable to large databases while providing sufficient privacy characteristics and achieving high linkage quality.

References

[1]
Agrawal, R., Evfimievski, A. & Srikant, R. (2003), Information sharing across private databases, in 'ACM SIGMOD', San Diego, pp. 86--97.
[2]
Al-Lawati, A., Lee, D. & McDaniel, P. (2005), Blocking-aware private record linkage, in 'Journal of Data and Information Quality', pp. 59--68.
[3]
Atallah, M., Kerschbaum, F. & Du, W. (2003), Secure and private sequence comparisons, in 'ACM Workshop on Privacy in the Electronic Society', pp. 39--44.
[4]
Bloom, B. (1970), 'Space/time trade-offs in hash coding with allowable errors', Communications of the ACM 13(7), 422--426.
[5]
Broder, A., Mitzenmacher, M. & Mitzenmacher, A. (2002), Network applications of Bloom filters: A survey, in 'Internet Mathematics'.
[6]
Christen, P. (2006), Privacy-preserving data linkage and geocoding: Current approaches and research directions, in 'IEEE ICDM Workshop on Privacy Aspects of Data Mining', Hong Kong.
[7]
Christen, P. (2009), Geocode matching and privacy preservation, in 'Workshop on Privacy, Security, and Trust in KDD', Springer, pp. 7--24.
[8]
Christen, P. (2011), 'A survey of indexing techniques for scalable record linkage and deduplication', IEEE Transactions on Knowledge and Data Engineering.
[9]
Christen, P. (2012), Data Matching, Data-Centric Systems and Applications, Springer.
[10]
Clifton, C., Kantarcioglu, M., Doan, A., Schadow, G., Vaidya, J., Elmagarmid, A. & Suciu, D. (2004), Privacy-preserving data integration and sharing, in 'ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery', pp. 19--26.
[11]
Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X. & Zhu, M. (2002), 'Tools for privacy preserving distributed data mining', SIGKDD Explorations 4(2), 28--34.
[12]
Durham, E. (2012), A framework for accurate, efficient private record linkage, PhD thesis, Vanderbilt University.
[13]
Durham, E., Xue, Y., Kantarcioglu, M. & Malin, B. (2010), Private medical record linkage with approximate matching, in 'AMIA Annual Symposium Proceedings', p. 182.
[14]
Durham, E., Xue, Y., Kantarcioglu, M. & Malin, B. (2011), 'Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage', Information Fusion.
[15]
Dusserre, L., Quantin, C. & Bouzelat, H. (1995), 'A one way public key cryptosystem for the linkage of nominal files in epidemiological studies', Medinfo 8, 644--647.
[16]
Freedman, M., Ishai, Y., Pinkas, B. & Reingold, O. (2005), 'Keyword search and oblivious pseudorandom functions', Theory of Cryptography.
[17]
Gionis, A., Indyk, P. & Motwani, R. (1999), Similarity search in high dimensions via hashing, in 'Proceedings of the International Conference on Very Large Data Bases', pp. 518--529.
[18]
Goldreich, O. (2004), Foundations of cryptography: Basic applications, Vol. 2, Cambridge Univ Press.
[19]
Hall, R. & Fienberg, S. (2010), Privacy-preserving record linkage, in 'Privacy in Statistical Databases, Springer LNCS 6344', Corfu, Greece, pp. 269--283.
[20]
Inan, A., Kantarcioglu, M., Bertino, E. & Scannapieco, M. (2008), A hybrid approach to private record linkage, in 'IEEE ICDE', Cancun, Mexico, pp. 496--505.
[21]
Inan, A., Kantarcioglu, M., Ghinita, G. & Bertino, E. (2010), Private record matching using differential privacy, in 'EDBT'.
[22]
Kantarcioglu, M., Jiang, W. & Malin, B. (2008), A privacy-preserving framework for integrating person-specific databases, in 'Privacy in Statistical Databases', Springer, pp. 298--314.
[23]
Karakasidis, A. & Verykios, V. (2010), Advances in privacy preserving record linkage, in 'E-activity and Innovative Technology, Advances in Applied Intelligence Technologies Book Series', IGI Global.
[24]
Karakasidis, A. & Verykios, V. (2011), 'Secure blocking+secure matching = secure record linkage', Journal of Computing Science and Engineering 5, 223--235.
[25]
Karakasidis, A. & Verykios, V. (2012), Reference table based k-anonymous private blocking, in 'ACM Symposium on Applied Computing', Riva del Garda, Italy.
[26]
Karakasidis, A., Verykios, V. & Christen, P. (2011), Fake injection strategies for private phonetic matching, in 'International Workshop on Data Privacy Management', Leuven, Belgium.
[27]
Kargupta, H., Datta, S., Wang, Q. & Sivakumar, K. (2003), On the privacy preserving properties of random data perturbation techniques, in 'ICDM', IEEE, pp. 99--106.
[28]
Kissner, L. & Song, D. (2004), Private and threshold set-intersection, in 'Technical Report', Carnegie Mellon University.
[29]
Kuzu, M., Kantarcioglu, M., Durham, E. & Malin, B. (2011), A constraint satisfaction cryptanalysis of Bloom filters in private record linkage, in 'Privacy Enhancing Technologies', Springer, pp. 226--245.
[30]
Lai, P., Yiu, S., Chow, K., Chong, C. & Hui, L. (2006), An Efficient Bloom filter based Solution for Multiparty Private Matching, in 'International Conference on Security and Management'.
[31]
Lindell, Y. & Pinkas, B. (2009), 'Secure multiparty computation for privacy-preserving data mining', Journal of Privacy and Confidentiality 1(1), 5.
[32]
Mohammed, N., Fung, B. & Debbabi, M. (2011), 'Anonymity meets game theory: secure data integration with malicious participants', VLDB 20(4), 567--588.
[33]
O'Keefe, C., Yung, M., Gu, L. & Baxter, R. (2004), Privacy-preserving data linkage protocols, in 'ACM Workshop on Privacy in the Electronic Society'.
[34]
Pang, C., Gu, L., Hansen, D. & Maeder, A. (2009), 'Privacy-preserving fuzzy matching using a public reference table', Intelligent Patient Management pp. 71--89.
[35]
Ravikumar, P., Cohen, W. & Fienberg, S. (2004), A secure protocol for computing string distance metrics, in 'Workshop on Privacy and Security Aspects of Data Mining held at IEEE ICDM'04', pp. 40--46.
[36]
Scannapieco, M., Figotin, I., Bertino, E. & Elmagarmid, A. (2007), Privacy preserving schema and data matching, in 'ACM SIGMOD', pp. 653--664.
[37]
Schnell, R., Bachteler, T. & Reiher, J. (2009), 'Privacy-preserving record linkage using Bloom filters', BMC Medical Informatics and Decision Making 9(1).
[38]
Song, D., Wagner, D. & Perrig, A. (2000), 'Practical techniques for searches on encrypted data', sp.
[39]
Trepetin, S. (2008), 'Privacy-preserving string comparisons in record linkage systems: a review', Information Security Journal: A Global Perspective 17(5), 253--266.
[40]
Van Eycken, E., Haustermans, K., Buntinx, F., Ceuppens, A., Weyler, J., Wauters, E., VAN, O. et al. (2000), 'Evaluation of the encryption procedure and record linkage in the Belgian National Cancer Registry', Archives of public health 58(6), 281--294.
[41]
Vatsalan, D., Christen, P. & Verykios, V. (2011), An efficient two-party protocol for approximate matching in private record linkage, in 'AusDM, CRPIT 121'.
[42]
Vatsalan, D., Christen, P. & Verykios, V. (2013), 'A taxonomy of privacy-preserving record linkage techniques', to appear in Journal of Information Systems.
[43]
Verykios, V., Karakasidis, A. & Mitrogiannis, V. (2009), 'Privacy preserving record linkage approaches', Int. J. of Data Mining, Modelling and Management 1(2), 206--221.
[44]
Weber, S., Lowe, H., Das, A. & Ferris, T. (2012), 'A simple heuristic for blindfolded record linkage', Journal of the American Medical Informatics Association.
[45]
Yakout, M., Atallah, M. & Elmagarmid, A. (2012), 'Efficient and practical approach for private record linkage', Journal of Data and Information Quality 3(3), 5.

Cited By

View all
  • (2016)Privacy-preserving matching of similar patientsJournal of Biomedical Informatics10.1016/j.jbi.2015.12.00459:C(285-298)Online publication date: 1-Feb-2016
  • (2013)Efficient two-party private blocking based on sorted nearest neighborhood clusteringProceedings of the 22nd ACM international conference on Information & Knowledge Management10.1145/2505515.2505757(1949-1958)Online publication date: 27-Oct-2013
  • (2013)A taxonomy of privacy-preserving record linkage techniquesInformation Systems10.1016/j.is.2012.11.00538:6(946-969)Online publication date: 1-Sep-2013
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image DL Hosted proceedings
AusDM '12: Proceedings of the Tenth Australasian Data Mining Conference - Volume 134
December 2012
233 pages
ISBN:9781921770142

Publisher

Australian Computer Society, Inc.

Australia

Publication History

Published: 05 December 2012

Author Tags

  1. Bloom filter
  2. approximate matching
  3. data matching
  4. entity resolution
  5. privacy
  6. scalability

Qualifiers

  • Research-article

Acceptance Rates

AusDM '12 Paper Acceptance Rate 25 of 55 submissions, 45%;
Overall Acceptance Rate 98 of 232 submissions, 42%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)56
  • Downloads (Last 6 weeks)7
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2016)Privacy-preserving matching of similar patientsJournal of Biomedical Informatics10.1016/j.jbi.2015.12.00459:C(285-298)Online publication date: 1-Feb-2016
  • (2013)Efficient two-party private blocking based on sorted nearest neighborhood clusteringProceedings of the 22nd ACM international conference on Information & Knowledge Management10.1145/2505515.2505757(1949-1958)Online publication date: 27-Oct-2013
  • (2013)A taxonomy of privacy-preserving record linkage techniquesInformation Systems10.1016/j.is.2012.11.00538:6(946-969)Online publication date: 1-Sep-2013
  • (2013)Privacy-preserving record linkageWiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery10.1002/widm.11013:5(321-332)Online publication date: 1-Sep-2013

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media