Resisting structural re-identification in anonymized social networks

Michael Hay¹,
Gerome Miklau¹,
David Jensen¹,
Don Towsley¹ &
…
Chao Li¹

550 Accesses
76 Citations
Explore all metrics

Abstract

We identify privacy risks associated with releasing network datasets and provide an algorithm that mitigates those risks. A network dataset is a graph representing entities connected by edges representing relations such as friendship, communication or shared activity. Maintaining privacy when publishing a network dataset is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we introduce a parameterized model of structural knowledge available to the adversary and quantify the success of attacks on individuals in anonymized networks. We show that the risks of these attacks vary based on network structure and size and provide theoretical results that explain the anonymity risk in random networks. We then propose a novel approach to anonymizing network data that models aggregate network structure and allows analysis to be performed by sampling from the model. The approach guarantees anonymity for entities in the network while allowing accurate estimates of a variety of network measures with relatively little bias.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Privacy and uniqueness of neighborhoods in social networks

Article Open access 11 October 2021

The effect of distant connections on node anonymity in complex networks

Article Open access 12 January 2024

Hiding Information in Social Networks from De-anonymization Attacks by Using Identity Separation

References

Aiello, W., Chung, F.R.K., Lu, L.: A random graph model for massive graphs. In: Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, pp. 171–180. ACM Press, New York (2000)
Alderson D.L., Li L.: Diversity of graphs with highly variable connectivity. Phys. Rev. E 75(4), 046102 (2007)
Article MathSciNet Google Scholar
Babai, L., Kucera, L.: Canonical labelling of graphs in linear average time. In: Proceedings of the Twentieth Annual Symposium on Foundations of Computer Science, pp. 39–46, IEEE Computer Society, San Juan, Puerto Rico (1979)
Backstrom, L., Dwork, C., Kleinberg, J.M.: Wherefore art thou R3579X? Anonymized social networks, hidden patterns and structural steganography. In: Proceedings of the Sixteenth International World Wide Web Conference, pp. 181–190. ACM Press, New York (2007)
Barabási A.-L., Albert R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Article MathSciNet Google Scholar
Blitzstein, J., Diaconis, P.: A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Unpublished (2006)
Cohen, W.W.: Enron email dataset. http://www.cs.cmu.edu/~enron/, (2005)
Cormode G., Srivastava D., Bhagat S., Krishnamurthy B.: Class-based graph anonymization for social network data. Proc. VLDB Endowment 2(1), 766–777 (2009)
Google Scholar
Cormode G., Srivastava D., Yu T., Zhang Q.: Anonymizing bipartite graph data using safe groupings. Proc. VLDB Endowment 1(1), 833–844 (2008)
Google Scholar
Corneil D.G., Gotlieb C.C.: An efficient algorithm for graph isomorphism. J. ACM 17, 51–64 (1970)
Article MATH MathSciNet Google Scholar
Costa L., Rodrigues F., Travieso G., Boas P.: Characterization of complex networks: A survey of measurements. Adv. Phys. 56(1), 167–242 (2007)
Article Google Scholar
Dwork, C.: Differential privacy: A survey of results. In Proceedings of the Theory and Applications of Models of Computation Fifth International Conference, number 4978 in Lecture Notes in Computer Science, pp. 1–19. Springer, Xi’an, 25–29 April (2008)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Third Theory of Cryptography Conference, number 3876 in Lecture Notes in Computer Science, pp. 265–284. Springer, New York, 4–7 March (2006)
Erdös P., Rényi A.: On the evolution of random graphs. Bull. Inst. Int. Stat. 38, 343–347 (1961)
MATH Google Scholar
Friedkin N.: Horizons of observability and limits of informal control in organizations. Soc. Forces 62(1), 54–77 (1983)
Article Google Scholar
Frikken, K.B., Golle, P.: Private social network analysis: How to assemble pieces of a graph privately. In: Proceedings of the ACM Workshop on Privacy in the Electronic Society, pp. 89–98. ACM Press, Alexandria, VA, 30 October (2006)
Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 265–273. ACM Press, New York (2008)
Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: Proceedings of the IEEE International Conference on Data Mining, pp. 169–178. IEEE Computer Society, Los Alamitos (2009)
Hay M., Miklau G., Jensen D.D., Towsley D.F., Weis P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endowment 1(1), 102–114 (2008)
Google Scholar
Hay, M., Miklau, G., Jensen, D.D., Weis, P., Srivastava, S.: Anonymizing social networks. Technical Report UM-CS-2007-19, Department of Computer Science, University of Massachusetts, Amherst (2007)
Hay M., Rastogi V., Miklau G., Suciu D.: Boosting the accuracy of differentially-private histograms through consistency. Proc. VLDB Endowment 3(1), 1021–1032 (2010)
Google Scholar
Holme P., Kim B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002)
Article Google Scholar
Kifer, D.: Attacks on privacy and de Finetti’s theorem. In: Proceedings of the Thirty-fifth SIGMOD International Conference on Management of Data, pp. 127–138. ACM Press, New York (2009)
Kleinberg J.M.: Navigation in a small world. Nature 406(6798), 845 (2000)
Article Google Scholar
Korolova A., Motwani R., Nabar S.U., Xu Y.: Link privacy in social networks. In: Proceedings of the Seventeenth ACM Conference on Information and Knowledge Management, pp. 289–298. ACM Press, Napa Valley, 26–30 October (2008)
Levina, E., Bickel, P.: The earth mover’s distance is the mallows distance: some insights from statistics. In: Proceedings of the Eighth IEEE International Conference on Computer Vision, vol. 2, pp. 251–256 (2001)
Li L., Alderson D., Doyle J., Willinger W.: Towards a theory of scale-free graphs: Definition, properties, and implications. Int. Math. 2(4), 431–523 (2005)
MATH MathSciNet Google Scholar
Li, L., Alderson, D., Willinger, W., Doyle, J.: A first-principles approach to understanding the internet’s router-level topology. In: Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 3–14. ACM Press, New York (2004)
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and ℓ-diversity. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 106–115. IEEE Computer Society, Istanbul (2007)
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM Press, New York, 10–12 June (2008)
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: ℓ-diversity: Privacy beyond k-anonymity. In: Proceedings of the Twenty-second International Conference on Data Engineering, p. 24. IEEE Computer Society, Atlanta (2006)
Martin, D.J., Kifer, D., Machanavajjhala, A., Gehrke, J., Halpern, J.Y.: Worst-case background knowledge for privacy-preserving data publishing. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 126–135. IEEE Computer Society, Istanbul (2007)
Mitzenmacher M., Upfal E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)
MATH Google Scholar
Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 173–187. IEEE Computer Society , Oakland, 17–20 May (2009)
Newman M.E.J.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
Article MATH MathSciNet Google Scholar
Nissim K., Raskhodnikova S., Smith A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pp. 75–84. ACM Press, New York, 11–13 June (2007)
Potterat J.J., Phillips-Plummer L., Muth S.Q., Rothenberg R.B., Woodhouse D.E., Maldonado-Long T.S., Zimmerman H.P., Muth J.B.: Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs. Sex. Transm. Infect. 78(Suppl. 1), i159–i163 (2002)
Article Google Scholar
Rastogi, V., Hay, M., Miklau, G., Suciu, D.: Relationship privacy: Output perturbation for queries with joins. In: Proceedings of the Twenty-Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 107–116. ACM Press, Providence June 29–July 2 (2009)
Rastogi, V., Hong, S., Suciu, D.: The boundary between privacy and utility in data publishing. In: Proceedings of the Thirty-third International Conference on Very Large Data Bases, pp. 531–542. ACM Press, New York (2007)
Russell S.J., Norvig P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Upper Saddle River (2003)
Google Scholar
Samarati P.: Protecting respondents’ privacy in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Article Google Scholar
Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, Computer Science Laboratory, SRI International (1998)
Singh, L., Zhan, J.: Measuring topological anonymity in social networks. In: Proceedings of the IEEE International Conference on Granular Computing, pp. 770–770 (2007)
Sweeney L.: K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Article MATH MathSciNet Google Scholar
Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., Willinger, W.: Network topology generators: degree-based vs. structural. In: Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 147–159. ACM Press, New York (2002)
Wang, D.-W., Liau, C.-J., sheng Hsu, T.: Privacy protection in social network data disclosure based on granular computing. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 997–1003 (2006)
Watts D.J., Strogatz S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)
Article Google Scholar
Wong, R.C.-W., Fu, A.W.-C., Wang, K., Pei, J.: Minimality attack in privacy preserving data publishing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 543–554. VLDB Endowment (2007)
Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 689–700. ACM Press, New York (2007)
Ying, X., Wu, X.: Randomizing social networks: A spectrum preserving approach. In: Proceedings of the SIAM International Conference on Data Mining, pp. 739–750. Society for Industrial and Applied Mathematics, Atlanta, 24–26 April (2008)
Zheleva, E., Getoor, L.: Preserving the privacy of sensitive relationships in graph data. In: First ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD, Revised Selected Papers, number 4890 in Lecture Notes in Computer Science, pp. 153–171. Springer, San Jose (2007)
Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the Twenty-fourth International Conference on Data Engineering, pp. 506–515. IEEE Computer Society, Cancun (2008)
Zou L., Chen L., Özsu M.T.: K-Automorphism: A general framework for privacy preserving network publication. Proc. VLDB Endowment 2(1), 946–957 (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Massachusetts Amherst, Amherst, MA, 01002, USA
Michael Hay, Gerome Miklau, David Jensen, Don Towsley & Chao Li

Authors

Michael Hay
View author publications
You can also search for this author in PubMed Google Scholar
Gerome Miklau
View author publications
You can also search for this author in PubMed Google Scholar
David Jensen
View author publications
You can also search for this author in PubMed Google Scholar
Don Towsley
View author publications
You can also search for this author in PubMed Google Scholar
Chao Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Michael Hay.

Additional information

An earlier version appeared at the 34th International Conference on Very Large Data Bases (VLDB), 2008.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hay, M., Miklau, G., Jensen, D. et al. Resisting structural re-identification in anonymized social networks. The VLDB Journal 19, 797–823 (2010). https://doi.org/10.1007/s00778-010-0210-x

Download citation

Received: 15 August 2009
Revised: 21 May 2010
Accepted: 19 November 2010
Published: 14 December 2010
Issue Date: December 2010
DOI: https://doi.org/10.1007/s00778-010-0210-x

Resisting structural re-identification in anonymized social networks

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Privacy and uniqueness of neighborhoods in social networks

The effect of distant connections on node anonymity in complex networks

Hiding Information in Social Networks from De-anonymization Attacks by Using Identity Separation

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Resisting structural re-identification in anonymized social networks

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Privacy and uniqueness of neighborhoods in social networks

The effect of distant connections on node anonymity in complex networks

Hiding Information in Social Networks from De-anonymization Attacks by Using Identity Separation

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation