Nothing Special   »   [go: up one dir, main page]

Skip to main content

Advertisement

Log in

Resisting structural re-identification in anonymized social networks

  • Special Issue Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

We identify privacy risks associated with releasing network datasets and provide an algorithm that mitigates those risks. A network dataset is a graph representing entities connected by edges representing relations such as friendship, communication or shared activity. Maintaining privacy when publishing a network dataset is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we introduce a parameterized model of structural knowledge available to the adversary and quantify the success of attacks on individuals in anonymized networks. We show that the risks of these attacks vary based on network structure and size and provide theoretical results that explain the anonymity risk in random networks. We then propose a novel approach to anonymizing network data that models aggregate network structure and allows analysis to be performed by sampling from the model. The approach guarantees anonymity for entities in the network while allowing accurate estimates of a variety of network measures with relatively little bias.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Aiello, W., Chung, F.R.K., Lu, L.: A random graph model for massive graphs. In: Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, pp. 171–180. ACM Press, New York (2000)

  2. Alderson D.L., Li L.: Diversity of graphs with highly variable connectivity. Phys. Rev. E 75(4), 046102 (2007)

    Article  MathSciNet  Google Scholar 

  3. Babai, L., Kucera, L.: Canonical labelling of graphs in linear average time. In: Proceedings of the Twentieth Annual Symposium on Foundations of Computer Science, pp. 39–46, IEEE Computer Society, San Juan, Puerto Rico (1979)

  4. Backstrom, L., Dwork, C., Kleinberg, J.M.: Wherefore art thou R3579X? Anonymized social networks, hidden patterns and structural steganography. In: Proceedings of the Sixteenth International World Wide Web Conference, pp. 181–190. ACM Press, New York (2007)

  5. Barabási A.-L., Albert R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)

    Article  MathSciNet  Google Scholar 

  6. Blitzstein, J., Diaconis, P.: A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Unpublished (2006)

  7. Cohen, W.W.: Enron email dataset. http://www.cs.cmu.edu/~enron/, (2005)

  8. Cormode G., Srivastava D., Bhagat S., Krishnamurthy B.: Class-based graph anonymization for social network data. Proc. VLDB Endowment 2(1), 766–777 (2009)

    Google Scholar 

  9. Cormode G., Srivastava D., Yu T., Zhang Q.: Anonymizing bipartite graph data using safe groupings. Proc. VLDB Endowment 1(1), 833–844 (2008)

    Google Scholar 

  10. Corneil D.G., Gotlieb C.C.: An efficient algorithm for graph isomorphism. J. ACM 17, 51–64 (1970)

    Article  MATH  MathSciNet  Google Scholar 

  11. Costa L., Rodrigues F., Travieso G., Boas P.: Characterization of complex networks: A survey of measurements. Adv. Phys. 56(1), 167–242 (2007)

    Article  Google Scholar 

  12. Dwork, C.: Differential privacy: A survey of results. In Proceedings of the Theory and Applications of Models of Computation Fifth International Conference, number 4978 in Lecture Notes in Computer Science, pp. 1–19. Springer, Xi’an, 25–29 April (2008)

  13. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Third Theory of Cryptography Conference, number 3876 in Lecture Notes in Computer Science, pp. 265–284. Springer, New York, 4–7 March (2006)

  14. Erdös P., Rényi A.: On the evolution of random graphs. Bull. Inst. Int. Stat. 38, 343–347 (1961)

    MATH  Google Scholar 

  15. Friedkin N.: Horizons of observability and limits of informal control in organizations. Soc. Forces 62(1), 54–77 (1983)

    Article  Google Scholar 

  16. Frikken, K.B., Golle, P.: Private social network analysis: How to assemble pieces of a graph privately. In: Proceedings of the ACM Workshop on Privacy in the Electronic Society, pp. 89–98. ACM Press, Alexandria, VA, 30 October (2006)

  17. Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 265–273. ACM Press, New York (2008)

  18. Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: Proceedings of the IEEE International Conference on Data Mining, pp. 169–178. IEEE Computer Society, Los Alamitos (2009)

  19. Hay M., Miklau G., Jensen D.D., Towsley D.F., Weis P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endowment 1(1), 102–114 (2008)

    Google Scholar 

  20. Hay, M., Miklau, G., Jensen, D.D., Weis, P., Srivastava, S.: Anonymizing social networks. Technical Report UM-CS-2007-19, Department of Computer Science, University of Massachusetts, Amherst (2007)

  21. Hay M., Rastogi V., Miklau G., Suciu D.: Boosting the accuracy of differentially-private histograms through consistency. Proc. VLDB Endowment 3(1), 1021–1032 (2010)

    Google Scholar 

  22. Holme P., Kim B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002)

    Article  Google Scholar 

  23. Kifer, D.: Attacks on privacy and de Finetti’s theorem. In: Proceedings of the Thirty-fifth SIGMOD International Conference on Management of Data, pp. 127–138. ACM Press, New York (2009)

  24. Kleinberg J.M.: Navigation in a small world. Nature 406(6798), 845 (2000)

    Article  Google Scholar 

  25. Korolova A., Motwani R., Nabar S.U., Xu Y.: Link privacy in social networks. In: Proceedings of the Seventeenth ACM Conference on Information and Knowledge Management, pp. 289–298. ACM Press, Napa Valley, 26–30 October (2008)

  26. Levina, E., Bickel, P.: The earth mover’s distance is the mallows distance: some insights from statistics. In: Proceedings of the Eighth IEEE International Conference on Computer Vision, vol. 2, pp. 251–256 (2001)

  27. Li L., Alderson D., Doyle J., Willinger W.: Towards a theory of scale-free graphs: Definition, properties, and implications. Int. Math. 2(4), 431–523 (2005)

    MATH  MathSciNet  Google Scholar 

  28. Li, L., Alderson, D., Willinger, W., Doyle, J.: A first-principles approach to understanding the internet’s router-level topology. In: Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 3–14. ACM Press, New York (2004)

  29. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and -diversity. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 106–115. IEEE Computer Society, Istanbul (2007)

  30. Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM Press, New York, 10–12 June (2008)

  31. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: -diversity: Privacy beyond k-anonymity. In: Proceedings of the Twenty-second International Conference on Data Engineering, p. 24. IEEE Computer Society, Atlanta (2006)

  32. Martin, D.J., Kifer, D., Machanavajjhala, A., Gehrke, J., Halpern, J.Y.: Worst-case background knowledge for privacy-preserving data publishing. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 126–135. IEEE Computer Society, Istanbul (2007)

  33. Mitzenmacher M., Upfal E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)

    MATH  Google Scholar 

  34. Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 173–187. IEEE Computer Society , Oakland, 17–20 May (2009)

  35. Newman M.E.J.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  36. Nissim K., Raskhodnikova S., Smith A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pp. 75–84. ACM Press, New York, 11–13 June (2007)

  37. Potterat J.J., Phillips-Plummer L., Muth S.Q., Rothenberg R.B., Woodhouse D.E., Maldonado-Long T.S., Zimmerman H.P., Muth J.B.: Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs. Sex. Transm. Infect. 78(Suppl. 1), i159–i163 (2002)

    Article  Google Scholar 

  38. Rastogi, V., Hay, M., Miklau, G., Suciu, D.: Relationship privacy: Output perturbation for queries with joins. In: Proceedings of the Twenty-Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 107–116. ACM Press, Providence June 29–July 2 (2009)

  39. Rastogi, V., Hong, S., Suciu, D.: The boundary between privacy and utility in data publishing. In: Proceedings of the Thirty-third International Conference on Very Large Data Bases, pp. 531–542. ACM Press, New York (2007)

  40. Russell S.J., Norvig P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Upper Saddle River (2003)

    Google Scholar 

  41. Samarati P.: Protecting respondents’ privacy in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)

    Article  Google Scholar 

  42. Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, Computer Science Laboratory, SRI International (1998)

  43. Singh, L., Zhan, J.: Measuring topological anonymity in social networks. In: Proceedings of the IEEE International Conference on Granular Computing, pp. 770–770 (2007)

  44. Sweeney L.: K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  45. Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., Willinger, W.: Network topology generators: degree-based vs. structural. In: Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 147–159. ACM Press, New York (2002)

  46. Wang, D.-W., Liau, C.-J., sheng Hsu, T.: Privacy protection in social network data disclosure based on granular computing. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 997–1003 (2006)

  47. Watts D.J., Strogatz S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)

    Article  Google Scholar 

  48. Wong, R.C.-W., Fu, A.W.-C., Wang, K., Pei, J.: Minimality attack in privacy preserving data publishing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 543–554. VLDB Endowment (2007)

  49. Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 689–700. ACM Press, New York (2007)

  50. Ying, X., Wu, X.: Randomizing social networks: A spectrum preserving approach. In: Proceedings of the SIAM International Conference on Data Mining, pp. 739–750. Society for Industrial and Applied Mathematics, Atlanta, 24–26 April (2008)

  51. Zheleva, E., Getoor, L.: Preserving the privacy of sensitive relationships in graph data. In: First ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD, Revised Selected Papers, number 4890 in Lecture Notes in Computer Science, pp. 153–171. Springer, San Jose (2007)

  52. Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the Twenty-fourth International Conference on Data Engineering, pp. 506–515. IEEE Computer Society, Cancun (2008)

  53. Zou L., Chen L., Özsu M.T.: K-Automorphism: A general framework for privacy preserving network publication. Proc. VLDB Endowment 2(1), 946–957 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael Hay.

Additional information

An earlier version appeared at the 34th International Conference on Very Large Data Bases (VLDB), 2008.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hay, M., Miklau, G., Jensen, D. et al. Resisting structural re-identification in anonymized social networks. The VLDB Journal 19, 797–823 (2010). https://doi.org/10.1007/s00778-010-0210-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-010-0210-x

Keywords

Navigation