research-article

Steered Microaggregation as a Unified Primitive to Anonymize Data Sets and Data Streams

Published: 01 December 2019

Abstract

As data grow in quantity and complexity, data anonymization is becoming increasingly challenging. On one side, a great diversity of masking methods, synthetic data generation methods, and privacy models exists, and this diversity is often perceived as unsettling by practitioners. On the other side, most of the anonymization methodology was designed for static, structured, and small data, whereas the current landscape includes big data and, in particular, data streams. We explore here a unified and conceptually simple anonymization approach, by presenting a primitive called steered microaggregation that can be tailored to enforce various privacy models on static data sets and also on data streams. Steered microaggregation is based on adding artificial attributes that are properly initialized and weighted in order to guide the microaggregation process into meeting certain desired constraints. To demonstrate the potential of this type of microaggregation, we show how it can be used to achieve $k$-anonymity, $t$-closeness, $l$-diversity, and $\epsilon$-differential privacy in the context of static data sets; furthermore, we discuss how it can be used to achieve $k$-anonymity of data streams while controlling tuple reordering. Beyond its flexibility and theoretical appeal, steered microaggregation can drastically reduce information loss, as shown by our experimental evaluation.
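
The idea of steering through a weighted artificial attribute can be illustrated with a minimal sketch. This is not the algorithm from the article: it assumes a simplified greedy heuristic (seed the group at the record farthest from the centroid, then add its nearest neighbours), and it assumes, purely for illustration, that the artificial attribute is each record's arrival position in a stream. The names `microaggregate`, `anonymize`, `steer`, and `weight` are hypothetical.

```python
from math import dist  # available in Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def microaggregate(records, k, steer=None, weight=0.0):
    """Partition record indices into groups of at least k records.

    records: list of numeric tuples (the quasi-identifier values)
    steer:   optional artificial attribute, one value per record (assumption)
    weight:  scale of the artificial attribute in the distance (assumption)
    """
    if len(records) < k:
        raise ValueError("need at least k records")
    if steer is None:
        points = [tuple(r) for r in records]
    else:
        # Append the weighted artificial attribute to every record.
        points = [tuple(r) + (weight * s,) for r, s in zip(records, steer)]

    remaining = list(range(len(points)))
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([points[i] for i in remaining])
        # Seed the next group at the record farthest from the centroid.
        seed = max(remaining, key=lambda i: dist(points[i], c))
        nearest = sorted(remaining,
                         key=lambda i: dist(points[i], points[seed]))[:k]
        groups.append(sorted(nearest))
        remaining = [i for i in remaining if i not in nearest]
    groups.append(sorted(remaining))  # between k and 2k-1 records are left
    return groups


def anonymize(records, groups):
    """Replace each record by the centroid of its group."""
    masked = [None] * len(records)
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            masked[i] = c
    return masked


# Toy stream: values alternate between two clusters as they arrive.
records = [(1.0,), (10.0,), (1.2,), (10.1,), (1.1,), (10.2,)]
arrival = list(range(len(records)))

# Unsteered: groups follow value similarity, so tuples are reordered freely.
print(sorted(microaggregate(records, k=3)))
# Steered with a heavy weight on arrival position: groups stay consecutive.
print(sorted(microaggregate(records, k=3, steer=arrival, weight=100.0)))
```

In this sketch the weight acts as a dial: at zero the artificial attribute is ignored and grouping minimizes distance on the original attributes only, while a large weight lets the artificial attribute dominate the distance, trading some information loss for groups made of consecutive arrivals.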


Published In

IEEE Transactions on Information Forensics and Security, Volume 14, Issue 12
Dec. 2019
98 pages

Publisher

IEEE Press
