research-article

Steered Microaggregation as a Unified Primitive to Anonymize Data Sets and Data Streams

Authors:

Josep Domingo-Ferrer,

Jordi Soria-Comas,

Rafael Mulero-Vellido

IEEE Transactions on Information Forensics and Security, Volume 14, Issue 12

Pages 3298 - 3311

https://doi.org/10.1109/TIFS.2019.2914832

Published: 01 December 2019

Abstract

As data grow in quantity and complexity, data anonymization is becoming increasingly challenging. On one side, a great diversity of masking methods, synthetic data generation methods, and privacy models exists, and this diversity is often perceived as unsettling by practitioners. On the other side, most of the anonymization methodology was designed for static, structured, and small data, whereas the current landscape includes big data and, in particular, data streams. We explore here a unified and conceptually simple anonymization approach, by presenting a primitive called steered microaggregation that can be tailored to enforce various privacy models on static data sets and also on data streams. Steered microaggregation is based on adding artificial attributes that are properly initialized and weighted in order to guide the microaggregation process into meeting certain desired constraints. To demonstrate the potential of this type of microaggregation, we show how it can be used to achieve $k$-anonymity, $t$-closeness, $l$-diversity, and $\epsilon$-differential privacy in the context of static data sets; furthermore, we discuss how it can be used to achieve $k$-anonymity of data streams while controlling tuple reordering. Beyond its flexibility and theoretical appeal, steered microaggregation can drastically reduce information loss, as shown by our experimental evaluation.
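The idea of steering through an extra weighted attribute can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the greedy fixed-size grouping heuristic, the function names (`microaggregate`, `steered_microaggregate`), and the choice of steering values and weight are all assumptions made for illustration. The sketch appends one artificial attribute to each record, scales it by a weight, groups records on the augmented vectors, and then replaces the original attributes by their group centroid.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def microaggregate(vectors, k):
    """Greedy fixed-size partition (illustrative heuristic, an assumption).

    Assumes len(vectors) >= k, so that every returned group has at
    least k members. Returns a list of groups of record indices.
    """
    remaining = list(range(len(vectors)))
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([vectors[i] for i in remaining])
        # Seed a group with the record farthest from the current centroid.
        seed = max(remaining, key=lambda i: dist(vectors[i], c))
        # The group is the k remaining records closest to the seed.
        nearest = sorted(remaining,
                         key=lambda i: dist(vectors[i], vectors[seed]))[:k]
        groups.append(nearest)
        chosen = set(nearest)
        remaining = [i for i in remaining if i not in chosen]
    if remaining:
        groups.append(remaining)  # between k and 2k-1 leftover records
    return groups


def steered_microaggregate(records, steering, weight, k):
    """Group on records augmented with one weighted artificial attribute.

    `steering` holds one hypothetical artificial value per record (for a
    stream this could be the arrival position); `weight` controls how
    strongly it influences the grouping. Only the original attributes
    are averaged in the masked output.
    """
    augmented = [tuple(r) + (weight * s,) for r, s in zip(records, steering)]
    groups = microaggregate(augmented, k)
    masked = [None] * len(records)
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            masked[i] = c
    return groups, masked


records = [(1.0,), (1.1,), (5.0,), (5.1,)]
steering = [0, 10, 1, 11]  # hypothetical artificial attribute values

unsteered, _ = steered_microaggregate(records, steering, weight=0.0, k=2)
steered, _ = steered_microaggregate(records, steering, weight=100.0, k=2)
```

With a zero weight the artificial attribute has no effect and records are grouped by value alone; with a large weight the grouping follows the artificial attribute instead. Intermediate weights trade one against the other, which is the sense in which the extra attribute steers the partition in this sketch.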

Cited By

Ruiz N (2024) Bistochastically Private Release of Data Streams with Zero Delay. Modeling Decisions for Artificial Intelligence, pp. 152-164. doi:10.1007/978-3-031-68208-7_13. Online publication date: 27-Aug-2024
https://dl.acm.org/doi/10.1007/978-3-031-68208-7_13
Chen M, Cang L, Chang Z, Iqbal M, Almakhles D (2023) Data anonymization evaluation against re-identification attacks in edge storage. Wireless Networks 30:6, pp. 5263-5277. doi:10.1007/s11276-023-03235-6. Online publication date: 21-Feb-2023
https://dl.acm.org/doi/10.1007/s11276-023-03235-6
Alshomrani S, Li S (2022) PUFDCA. Wireless Communications & Mobile Computing 2022. doi:10.1155/2022/6367579. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1155/2022/6367579

Published In

IEEE Transactions on Information Forensics and Security Volume 14, Issue 12

Dec. 2019

98 pages

ISSN:1556-6013

1556-6013 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Publisher

IEEE Press
