Abstract
Compressed representations have become effective to store and access large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph” patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with “virtual nodes”; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed from a given node), using 1.0–1.8 bits per edge (bpe) on large Web graphs, and retrieving each neighbor of a node in 0.6–1.0 microseconds (\(\upmu \)s). When supporting both out- and in-neighbor queries, instead, our technique generally offers the best time when using little space. If the reduced graph, instead, is represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9–1.5 bpe) that support out/in-neighbor navigation; yet, the time per neighbor extracted raises to around 5–20 \(\upmu \)s. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks, it achieves 4–13 bpe and 8–12 \(\upmu \)s per out/in-neighbor retrieved, which improves upon all existing representations.
Similar content being viewed by others
Notes
www.worldwidewebsize.com, on August 6, 2012.
http://newsroom.fb.com/content/default.aspx?NewsAreaId=22, considering June 2012.
The term “dense subgraph” appears in the literature with different meanings [41], but in this paper, we use it to mean the described generalization of cliques and bicliques.
Available at www.cse.psu.edu/~madduri/software/GTgraph.
Available at http://micans.org/mcl/.
Available at law.dsi.unimi.it.
Available at http://www.dia.uniroma3.it/~drovandi/software.php.
Available at http://law.dsi.unimi.it/webdata/uk-2006-05.
Available at snap.stanford.edu/data.
Available at http://libcds.recoded.cl.
References
Adler M, Mitzenmacher M (2001) Towards compressing web graphs. In: Proceedings of the data compression conference (DCC). Snowbird, UT, pp 203–212
Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, Berlin
Anh V, Moffat A (2010) Local modeling for webgraph compression. In: Proceedings of the data compression conference (DCC). Snowbird UT, p 519
Apostolico A, Drovandi G (2009) Graph compression by BFS. Algorithms 2(3):1031–1044
Bader D, Madduri K (2005) Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In: Proceedings of the 12th international high performance computing (HiPC). Goa, India, pp 465–476
Becchetti L, Castillo C, Donato D, Baeza-Yates R, Leonardi S (2008) Link analysis for web spam detection. ACM Trans Web 2(1):2
Boldi P, Vigna S (2004) The Webgraph framework I: compression techniques. In: Proceedings of the 13th international conference on the world wide web (WWW), New York, NY, pp 595–602
Boldi P, Santini M, Vigna S (2009) Permuting web graph. In: The 6th workshop on algorithms and models for the web graph (WAW), Barcelona, Spain, pp 116–126
Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on world wide web (WWW), Hyderabad, India, pp 587–596
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 1(7):107–117
Brisaboa N, Ladra S, Navarro G (2009) K2-trees for compact web graph representation. In: Proceedings of the 16th international symposium on string processing and information retrieval (SPIRE), Saariselkä, Finland, pp 18–30
Brisaboa N, Ladra S, Navarro G (2012) Personal communication including code
Broder A (2000) Min-wise independent permutations: theory and practice. In: Proceedings of the 27th international colloquium on automata, languages and programming (ICALP), Geneva, Italy, p 808
Brohée S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488
Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (Algorithm 457). Commun ACM 16(9):575–576
Buehrer G, Chellapilla K (2008) A scalable pattern mining approach to web graph compression with communities. In: Proceedings of the international conference on web search and web data mining (WSDM), Palo Alto, CA, pp 95–106
Cha M, Mislove A, Gummadi P (2009) A measurement-driven analysis of information propagation in the Flickr social networking. In: Proceedings of the 20th international conference on world wide web (WWW), Madrid, Spain, pp 721–730
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the 4th SIAM international conference on data mining (SDM), Lake Buena Vista, FL
Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Paris, France, pp 219–228
Claude F, Navarro F (2010) Extended compact web graph representations. In: Algorithms and applications. Lecture notes in computer science 6060. Springer, Berlin, pp 77–91
Claude F, Navarro G (2010) Fast and compact web graph representations. ACM Trans Web 4(4):16
Claude F, Navarro G (2008) Practical rank/select queries over arbitrary sequences. In: Proceedings of the 15th international symposium on string processing and information retrieval (SPIRE), Melbourne, Australia, pp 176–187
Claude F, Ladra S (2011) Practical representations for web and social graphs. In: Proceedings of the 20th ACM conference on information and knowledge management (CIKM), Glasgow, UK, pp 1185–1190
Clark D (1996) Compact Pat trees. Ph.D. Thesis, University of Waterloo, Canada
Demetrescu C, Finocchi I, Ribichini A (2006) Trading off space for passes in graph streaming problems. In: Proceedings of the 17th ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 714–723
Donato D, Millozzi S, Leonardi S, Tsaparas P (2005) Mining the inner structure of the web graph. In: Proceedings of the 8th workshop on the web and databases (WebDB), Baltimore, MD, pp 145–150
Dourisboure Y, Geraci F, Pellegrini M (2007) Extraction and classification of dense communities in the web. In: Proceedings of the 16th international conference on world wide web (WWW) Banff, Alberta, Canada, pp 461–470
Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st international conference on very large data bases (VLDB), Trondheim, Norway, pp 721–732
González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: Poster Proceedings of the volume of 4th workshop on efficient and experimental algorithms (WEA), Santorini Island, Greece, pp 27–38
Golynski A, Munro J, Rao S (2006) Rank/select operations on large alphabets: a tool for text indexing. In: Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 368–373
Grabowski S, Bieniecki W (2010) Tight and simple web graph compression. CoRR abs/006.0809
Grabowski S, Bieniecki W (2011) Merging adjacency lists for efficient web graph compression. Adv Intell Soft Comput 103(1):385–392
Grossi R, Gupta A, Vitter J (2003) High-order entropy-compressed text indexes. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms (SODA), Baltimore, MD, pp 841–850
Hasan M, Salem S, Zaki M (2011) SimClus: an effective algorithm for clustering with a lower bound on similarity. Knowl Inf Syst 28(3):665–685
Hernández C, Navarro G (2011) Compression of web and social graphs supporting neighbor and community queries. In: Proceedings of the 6th ACM workshop on social network mining and analysis (SNAKDD), San Diego, CA
Hernández C, Navarro G (2012) Compressed representation of web and social networks via dense subgraphs. In: Proceedings of the 19th international symposium on string processing and information retrieval (SPIRE), Cartagena de Indias, Colombia, pp 264–276
Katarzyna M, Przemyslaw K, Piotr B (2009) User position measures in social networks. In: Proceedings of the 4th ACM workshop on social network mining and analysis (SNAKDD), Paris, France, pp 1–9
Kleinberg J (1999) Authoritative sources in a hyperlinked environment. JACM 46(5):604–632
Kumar R, Raghavan P, Rajagopalan S, Tomkins A (1999) Trawling the web for emerging cyber-communities. Comput Netw 31(11):1481–1493
Larsson N, Moffat A (1999) Offline dictionary-based compression. In: Proceedings of the data compression conference (DCC), Snowbird, Utah, pp 296–305
Lee V, Ruan N, Jin R, Aggarwal C (2010) A survey of algorithms for dense subgraph discovery. Manag Min Graph Data 2010:303–336
Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. PVLDB J 3(1):693–702
Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 533–542
Mcpherson J, Ma K, Ogawa M (2005) Discovering parametric clusters in social small-world graphs. In: Proceedings of the ACM symposium on applied computing, Santa Fe, New Mexico, USA
Mislove A, Marcon M, Gummadi P, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the internet measurement conference (IMC), San Diego, CA, pp 29–42
Mishra R, Shukla S, Arora D, Kumar M (2011) An effective comparison of graph clustering algorithms via random graphs. Int J Comput Appl 22(1):22–27
Morik K, Kaspari A, Wurst M (2012) Multi-objective frequent termset clustering. Knowl Inf Syst 30(3):715–738
Randall K, Stata R, Wiener J, Wickremesinghe R (2002) The link database: fast access to graphs of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 122–131
Raman R, Raman V, Rao S (2002) Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proceedings of the 13th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 233–242
Saito H, Toyoda M, Kitsuregawa M, Aihara K (2007) A large-scale study of link spam detection by graph algorithms. In: Proceedings of adversarial information retrieval on the web (AIRWeb), Banff, Alberta, Canada
Saito K, Kimura M, Ohara K, Motoda H (2012) Efficient discovery of influential nodes for SIS models in social networks. Knowl Inf Syst 30(3):613–635
Suel T, Yuan J (2001) Compressing the graph structure of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 213–222
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on the world wide web (WWW), Hyderabad, India, pp 607–614
Van Dongen, S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht, The Netherlands
Van Dongen S (2008) Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30(1):121–141
Vitter J (2001) External memory algorithms and data structures: dealing with massive data. ACM Comput Surv 33(2):209–271
Zhuge H (2009) Communities and emerging semantics in semantic link network: discovery and learning. IEEE Trans Knowl Data Eng 21(6):785–799
Acknowledgments
Partially funded by Millennium Nucleus Information and Coordination in Networks ICM/FIC P10-024F, and Fondecyt Grant 1-110066, Chile.
Author information
Authors and Affiliations
Corresponding author
Additional information
Partial versions of this article appeared in Proc. SNA-KDD 2011 and Proc. SPIRE 2012.
Rights and permissions
About this article
Cite this article
Hernández, C., Navarro, G. Compressed representations for web and social graphs. Knowl Inf Syst 40, 279–313 (2014). https://doi.org/10.1007/s10115-013-0648-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-013-0648-4