Abstract
We identify privacy risks associated with releasing network datasets and provide an algorithm that mitigates those risks. A network dataset is a graph representing entities connected by edges representing relations such as friendship, communication or shared activity. Maintaining privacy when publishing a network dataset is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we introduce a parameterized model of structural knowledge available to the adversary and quantify the success of attacks on individuals in anonymized networks. We show that the risks of these attacks vary based on network structure and size and provide theoretical results that explain the anonymity risk in random networks. We then propose a novel approach to anonymizing network data that models aggregate network structure and allows analysis to be performed by sampling from the model. The approach guarantees anonymity for entities in the network while allowing accurate estimates of a variety of network measures with relatively little bias.
Similar content being viewed by others
References
Aiello, W., Chung, F.R.K., Lu, L.: A random graph model for massive graphs. In: Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, pp. 171–180. ACM Press, New York (2000)
Alderson D.L., Li L.: Diversity of graphs with highly variable connectivity. Phys. Rev. E 75(4), 046102 (2007)
Babai, L., Kucera, L.: Canonical labelling of graphs in linear average time. In: Proceedings of the Twentieth Annual Symposium on Foundations of Computer Science, pp. 39–46, IEEE Computer Society, San Juan, Puerto Rico (1979)
Backstrom, L., Dwork, C., Kleinberg, J.M.: Wherefore art thou R3579X? Anonymized social networks, hidden patterns and structural steganography. In: Proceedings of the Sixteenth International World Wide Web Conference, pp. 181–190. ACM Press, New York (2007)
Barabási A.-L., Albert R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Blitzstein, J., Diaconis, P.: A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Unpublished (2006)
Cohen, W.W.: Enron email dataset. http://www.cs.cmu.edu/~enron/, (2005)
Cormode G., Srivastava D., Bhagat S., Krishnamurthy B.: Class-based graph anonymization for social network data. Proc. VLDB Endowment 2(1), 766–777 (2009)
Cormode G., Srivastava D., Yu T., Zhang Q.: Anonymizing bipartite graph data using safe groupings. Proc. VLDB Endowment 1(1), 833–844 (2008)
Corneil D.G., Gotlieb C.C.: An efficient algorithm for graph isomorphism. J. ACM 17, 51–64 (1970)
Costa L., Rodrigues F., Travieso G., Boas P.: Characterization of complex networks: A survey of measurements. Adv. Phys. 56(1), 167–242 (2007)
Dwork, C.: Differential privacy: A survey of results. In Proceedings of the Theory and Applications of Models of Computation Fifth International Conference, number 4978 in Lecture Notes in Computer Science, pp. 1–19. Springer, Xi’an, 25–29 April (2008)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Third Theory of Cryptography Conference, number 3876 in Lecture Notes in Computer Science, pp. 265–284. Springer, New York, 4–7 March (2006)
Erdös P., Rényi A.: On the evolution of random graphs. Bull. Inst. Int. Stat. 38, 343–347 (1961)
Friedkin N.: Horizons of observability and limits of informal control in organizations. Soc. Forces 62(1), 54–77 (1983)
Frikken, K.B., Golle, P.: Private social network analysis: How to assemble pieces of a graph privately. In: Proceedings of the ACM Workshop on Privacy in the Electronic Society, pp. 89–98. ACM Press, Alexandria, VA, 30 October (2006)
Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 265–273. ACM Press, New York (2008)
Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: Proceedings of the IEEE International Conference on Data Mining, pp. 169–178. IEEE Computer Society, Los Alamitos (2009)
Hay M., Miklau G., Jensen D.D., Towsley D.F., Weis P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endowment 1(1), 102–114 (2008)
Hay, M., Miklau, G., Jensen, D.D., Weis, P., Srivastava, S.: Anonymizing social networks. Technical Report UM-CS-2007-19, Department of Computer Science, University of Massachusetts, Amherst (2007)
Hay M., Rastogi V., Miklau G., Suciu D.: Boosting the accuracy of differentially-private histograms through consistency. Proc. VLDB Endowment 3(1), 1021–1032 (2010)
Holme P., Kim B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002)
Kifer, D.: Attacks on privacy and de Finetti’s theorem. In: Proceedings of the Thirty-fifth SIGMOD International Conference on Management of Data, pp. 127–138. ACM Press, New York (2009)
Kleinberg J.M.: Navigation in a small world. Nature 406(6798), 845 (2000)
Korolova A., Motwani R., Nabar S.U., Xu Y.: Link privacy in social networks. In: Proceedings of the Seventeenth ACM Conference on Information and Knowledge Management, pp. 289–298. ACM Press, Napa Valley, 26–30 October (2008)
Levina, E., Bickel, P.: The earth mover’s distance is the mallows distance: some insights from statistics. In: Proceedings of the Eighth IEEE International Conference on Computer Vision, vol. 2, pp. 251–256 (2001)
Li L., Alderson D., Doyle J., Willinger W.: Towards a theory of scale-free graphs: Definition, properties, and implications. Int. Math. 2(4), 431–523 (2005)
Li, L., Alderson, D., Willinger, W., Doyle, J.: A first-principles approach to understanding the internet’s router-level topology. In: Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 3–14. ACM Press, New York (2004)
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and ℓ-diversity. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 106–115. IEEE Computer Society, Istanbul (2007)
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM Press, New York, 10–12 June (2008)
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: ℓ-diversity: Privacy beyond k-anonymity. In: Proceedings of the Twenty-second International Conference on Data Engineering, p. 24. IEEE Computer Society, Atlanta (2006)
Martin, D.J., Kifer, D., Machanavajjhala, A., Gehrke, J., Halpern, J.Y.: Worst-case background knowledge for privacy-preserving data publishing. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 126–135. IEEE Computer Society, Istanbul (2007)
Mitzenmacher M., Upfal E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)
Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 173–187. IEEE Computer Society , Oakland, 17–20 May (2009)
Newman M.E.J.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
Nissim K., Raskhodnikova S., Smith A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pp. 75–84. ACM Press, New York, 11–13 June (2007)
Potterat J.J., Phillips-Plummer L., Muth S.Q., Rothenberg R.B., Woodhouse D.E., Maldonado-Long T.S., Zimmerman H.P., Muth J.B.: Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs. Sex. Transm. Infect. 78(Suppl. 1), i159–i163 (2002)
Rastogi, V., Hay, M., Miklau, G., Suciu, D.: Relationship privacy: Output perturbation for queries with joins. In: Proceedings of the Twenty-Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 107–116. ACM Press, Providence June 29–July 2 (2009)
Rastogi, V., Hong, S., Suciu, D.: The boundary between privacy and utility in data publishing. In: Proceedings of the Thirty-third International Conference on Very Large Data Bases, pp. 531–542. ACM Press, New York (2007)
Russell S.J., Norvig P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Upper Saddle River (2003)
Samarati P.: Protecting respondents’ privacy in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, Computer Science Laboratory, SRI International (1998)
Singh, L., Zhan, J.: Measuring topological anonymity in social networks. In: Proceedings of the IEEE International Conference on Granular Computing, pp. 770–770 (2007)
Sweeney L.: K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., Willinger, W.: Network topology generators: degree-based vs. structural. In: Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 147–159. ACM Press, New York (2002)
Wang, D.-W., Liau, C.-J., sheng Hsu, T.: Privacy protection in social network data disclosure based on granular computing. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 997–1003 (2006)
Watts D.J., Strogatz S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)
Wong, R.C.-W., Fu, A.W.-C., Wang, K., Pei, J.: Minimality attack in privacy preserving data publishing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 543–554. VLDB Endowment (2007)
Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 689–700. ACM Press, New York (2007)
Ying, X., Wu, X.: Randomizing social networks: A spectrum preserving approach. In: Proceedings of the SIAM International Conference on Data Mining, pp. 739–750. Society for Industrial and Applied Mathematics, Atlanta, 24–26 April (2008)
Zheleva, E., Getoor, L.: Preserving the privacy of sensitive relationships in graph data. In: First ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD, Revised Selected Papers, number 4890 in Lecture Notes in Computer Science, pp. 153–171. Springer, San Jose (2007)
Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the Twenty-fourth International Conference on Data Engineering, pp. 506–515. IEEE Computer Society, Cancun (2008)
Zou L., Chen L., Özsu M.T.: K-Automorphism: A general framework for privacy preserving network publication. Proc. VLDB Endowment 2(1), 946–957 (2009)
Author information
Authors and Affiliations
Corresponding author
Additional information
An earlier version appeared at the 34th International Conference on Very Large Data Bases (VLDB), 2008.
Rights and permissions
About this article
Cite this article
Hay, M., Miklau, G., Jensen, D. et al. Resisting structural re-identification in anonymized social networks. The VLDB Journal 19, 797–823 (2010). https://doi.org/10.1007/s00778-010-0210-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-010-0210-x