Abstract
Privacy and utility are two main desiderata of good sensitive information publishing schemes. For publishing social networks, many existing algorithms rely on \(k\)-anonymity as a criterion to guarantee privacy protection. They reduce the utility loss by first using the degree sequence to model the structural properties of the original social network and then minimizing the changes on the degree sequence caused by the anonymization process. However, the degree sequence-based graph model is simple, and it fails to capture many important graph topological properties. Consequently, the existing anonymization algorithms that rely on this simple graph model to measure utility cannot guarantee generating anonymized social networks of high utility. In this paper, we propose novel utility measurements that are based on more complex community-based graph models. We also design a general \(k\)-anonymization framework, which can be used with various utility measurements to achieve \(k\)-anonymity with small utility loss on given social networks. Finally, we conduct extensive experimental evaluation on real datasets to evaluate the effectiveness of the new utility measurements proposed. The results demonstrate that our scheme achieves significant improvement on the utility of the anonymized social networks compared with the existing anonymization algorithms. The utility losses of many social network statistics of the anonymized social networks generated by our scheme are under 1 % in most cases.
Similar content being viewed by others
Notes
The attacker could possess some non-structural information about the targets as well (e.g., the labels of vertices and edges), but in this paper, we only consider the structural background knowledge.
Zhou et al provide the NP-hardness proof in Zhou and Pei [37] by an induction from the \(k\)-anonymity problem in relational data.
We want to highlight that the additive adjustment only applies to the refining local structure information step of Algorithm 1 (i.e., lines 10–11). In other part of the algorithm (e.g., lines 5–9 where edge operations are performed to change the current graph toward the target graph), we consider both the edge addition operations and the edge deletion operations.
The \(y\)-axis is in logarithm scale.
References
Aggarwal CC, Yu PS (2008) Privacy-preserving data mining: models and algorithms. Springer, Berlin
Backstrom L, Dwork C, Kleinberg J (2007) Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. In: WWW’07, pp 181–190
Bhagat S, Cormode G, Krishnamurthy B, Srivastava D (2009) Class-based graph anonymization for social network data. VLDB Endow 2(1):766–777
Cheng J, Fu AWC, Liu J (2010) K-isomorphism: privacy preserving network publication against structural attacks. In: SIGMOD’10, pp 459–470
Clauset A, Moore C, Newman MEJ (2007) Structural inference of hierarchies in networks. In: ICML’06, pp 1–13
Clauset A, Moore C, Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453(7191):98–101
comScore (2011) It’s a social world: Top 10 need-to-knows about social networking and where it’s headed http://www.comscore.com/Press_Events/Presentations_Whitepapers/2011/it_is_a_social_world_top_10_need-to-knows_about_social_networking
Dwork C (2008) Differential privacy: a survey of results. In: TAMC’08, pp 1–19
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: TCC’06, pp 265–284
Frikken KB, Golle P (2006) Private social network analysis: how to assemble pieces of a graph privately. In: WPES’06, pp 89–98
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4):14:1–14:53
Gao J, Yu JX, Jin R, Zhou J, Wang T, Yang D (2011) Neighborhood-privacy protected shortest distance computing in cloud. In: SIGMOD ’11, pp 409–420
Ghinita G, Tao Y, Kalnis P (2008) On the anonymization of sparse high-dimensional data. In: ICDE ’08, pp 715–724
Hay M, Miklau G, Jensen D (2007) Anonymizing social networks. Tech. rep, UMass Amberst
Hay M, Miklau G, Jensen D, Towsley D, Weis P (2008) Resisting structural re-identification in anonymized social networks. VLDB Endow 1(1):102–114
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: ICDM ’09, pp 169–178
Kifer D, Gehrke J (2006) Injecting utility into anonymized datasets. In: SIGMOD’06, pp 217–228
Li T, Li N (2009) On the tradeoff between privacy and utility in data publishing. In: SIGKDD’09, pp 517–525
Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: SIGMOD’08, pp 93–106
Liu K, Das K, Grandison T, Kargupta H (2008) Privacy-preserving data analysis on graphs and social networks. In: Next generation of data mining, chap 21. Chapman & Hall/CRC, pp 419–437
Liu L, Wang J, Liu J, Zhang J (2009) Privacy preservation in social networks with sensitive edge weights. In: SDM’09, pp 954–965
Maiya AS, Berger-Wolf TY (2010) Sampling community structure. In: WWW’10, pp 701–710
Mir DJ, Wright RN (2012) A differentially private graph estimator for the stochastic kronecker graph model. In: EDBT, PAIS workshops, pp 122–129
Newman MEJ (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69(6):066,133+
Rastogi V, Suciu D, Hong S (2007) The boundary between privacy and utility in data publishing. In: VLDB’07, pp 531–542
Sala A, Zhao X, Wilson C, Zheng H, Zhao BY (2011) Sharing graphs using differentially private graph models. In: IMC ’11, pp 81–98
Santo F (2010) Community detection in graphs. Phys Rep 486:75–174
Sun X, Wang H, Li J, Pei J (2011) Publishing anonymous survey rating data. Data Mining Knowl Discov 23(3):379–406
Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
Wong RCW, Fu AWC, Wang K, Pei J (2007) Minimality attack in privacy preserving data publishing. In: VLDB ’07, pp 543–554
Wu W, Xiao Y, Wang W, He Z, Wang Z (2010) K-symmetry model for identity anonymization in social networks. In: EDBT’10, pp 111–122
Xiao Q, Wang Z, Tan KL (2011) Lora: link obfuscation by randomization in graphs. In: SDM’11, pp 33–51
Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: SDM’08, pp 739–750
Ying X, Wu X (2009) On link privacy in randomizing social networks. In: PAKDD’09, pp 28–39
Zheleva E, Getoor L (2008) Preserving the privacy of sensitive relationships in graph data. In: PinKDD’07, pp 153–171
Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: ICDE’08, pp 506–515
Zhou B, Pei J (2010) The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 24(2):1–38
Zhou B, Pei J, Luk W (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. SIGKDD Explor Newsl 10:12–22
Zou L, Chen L, Özsu M (2009) K-automorphism: a general framework for privacy preserving network publication. VLDB Endow 2(1):946–957
Acknowledgments
This study was funded through a research Grant 10-C220-SMU-005 from the Office of Research, Singapore Management University.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Wang, Y., Xie, L., Zheng, B. et al. High utility K-anonymization for social network publishing. Knowl Inf Syst 41, 697–725 (2014). https://doi.org/10.1007/s10115-013-0674-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-013-0674-2