Abstract
Protecting the privacy of individuals whose data are released to untrusted parties is a problem that has captured the attention of the scientific community for years. Several techniques have been proposed to cope with it. Among them, microaggregation provides a good trade-off between information loss and disclosure risk, and many efforts have therefore been devoted to its study. Microaggregation is a statistical disclosure control (SDC) technique that protects the privacy of individual respondents by aggregating the information of similar respondents so as to make them indistinguishable. Although microaggregation is a very interesting approach, optimally microaggregating multivariate data sets is known to be an NP-hard problem. Consequently, heuristics have been suggested as a way to solve the problem in reasonable time. Specifically, genetic algorithms (GA) have been shown to find good solutions to the microaggregation problem for small, multivariate data sets. However, due to the very nature of the problem, GA can hardly cope with large, multivariate data sets. To apply GA to large data sets, these must first be partitioned into smaller disjoint subsets that the GA can handle separately. In this chapter, we summarise several proposals for partitioning data sets so that GA can be applied to microaggregate them. In addition, we elaborate on a partitioning strategy based on the variable-MDAV algorithm and study the effect of several parameters, namely the dimension, the aggregation parameter (k), and the size of the data sets, among others. We also compare it with the most relevant previous proposals.
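To make the idea of microaggregation concrete, the following minimal Python sketch groups similar numerical records into clusters of at least k records, replaces each record by its group centroid, and measures information loss as the within-group sum of squared errors (SSE). It is an illustrative greedy heuristic only: it is neither the V-MDAV algorithm nor the GA-based method studied in the chapter, and the function names (`greedy_microaggregate`, `aggregate`, `sse`) are hypothetical helpers introduced here.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def greedy_microaggregate(records, k):
    """Partition record indices into groups of at least k similar records.

    Assumed greedy heuristic (not V-MDAV, not a GA): repeatedly pick the
    record farthest from the centroid of the remaining records and group it
    with its nearest neighbours. Fewer than k leftover records are attached
    to the group whose centroid is closest.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    groups = []
    while len(remaining) >= k:
        c = centroid([records[i] for i in remaining])
        seed = max(remaining, key=lambda i: dist(records[i], c))
        # The k records closest to the seed (the seed itself is at distance 0).
        group = sorted(remaining, key=lambda i: dist(records[i], records[seed]))[:k]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    for i in remaining:
        closest = min(
            groups,
            key=lambda g: dist(records[i], centroid([records[j] for j in g])),
        )
        closest.append(i)
    return groups


def aggregate(records, groups):
    """Return a masked copy where each record is replaced by its group centroid."""
    masked = list(records)
    for g in groups:
        c = centroid([records[j] for j in g])
        for j in g:
            masked[j] = c
    return masked


def sse(records, masked):
    """Sum of squared distances between original and masked records."""
    return sum(dist(r, m) ** 2 for r, m in zip(records, masked))
```

In the masked output every record shares its values with at least k - 1 others, which is the sense in which respondents become indistinguishable; a lower SSE means less information was lost. A partitioning strategy such as the one discussed in the chapter would first split a large data set into disjoint subsets and then optimise the grouping within each subset separately.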
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Solanas, A., González-Nicolás, Ú., Martínez-Ballesté, A. (2012). Mixing Genetic Algorithms and V-MDAV to Protect Microdata. In: Elizondo, D., Solanas, A., Martinez-Balleste, A. (eds) Computational Intelligence for Privacy and Security. Studies in Computational Intelligence, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25237-2_8
DOI: https://doi.org/10.1007/978-3-642-25237-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25236-5
Online ISBN: 978-3-642-25237-2
eBook Packages: Engineering, Engineering (R0)