Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks

Published: 01 September 2013 Publication History

Abstract

Similarity assessment is one of the core tasks in hyperlink analysis. Recently, with the proliferation of applications, e.g., web search and collaborative filtering, SimRank has been a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, i.e., "zero-similarity": It only accommodates paths with equal length from a common "center" node. Thus, a large portion of other paths are fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm) time, where m is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank*, and demonstrate its high computation efficiency.

References

[1]
I. Antonellis, H. G. Molina, and C. Chang. SimRank++: Query rewriting through link analysis of the click graph. PVLDB, 1(1), 2008.
[2]
P. Berkhin. Survey: A survey on PageRank computing. Internet Mathematics, 2(1), 2005.
[3]
V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. V. Dooren. A measure of similarity between graph vertices: Applications to synonym extraction and web searching. SIAM Rev., 46(4), 2004.
[4]
R. Brualdi and D. Cvetkovic. A Combinatorial Approach to Matrix Theory and Its Applications. Discrete Mathematics and Its Applications. Taylor & Francis, 2008.
[5]
G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, 2008.
[6]
S. Chakrabarti. Dynamic personalized PageRank in entity-relation graphs. In WWW, pages 571--580, 2007.
[7]
D. Fogaras and B. Rácz. Scaling link-based similarity search. In WWW, 2005.
[8]
G. He, H. Feng, C. Li, and H. Chen. Parallel SimRank computation on large graphs with iterative aggregation. In KDD, 2010.
[9]
G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In KDD, 2002.
[10]
R. Jin, V. E. Lee, and H. Hong. Axiomatic ranking of network role similarity. In KDD, 2011.
[11]
M. M. Kessler. Bibliographic coupling between scientific papers. Amer. Doc., 14(1): 10--25, 1963.
[12]
P. Lee, L. V. S. Lakshmanan, and J. X. Yu. On top-k structural similarity search. In ICDE, 2012.
[13]
E. A. Leicht, P. Holme, and M. E. J. Newman. Vertex similarity in networks. Physical Review E, 73(2), 2006.
[14]
C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, and T. Wu. Fast computation of SimRank for static and dynamic information networks. In EDBT, 2010.
[15]
X. Lin. On the computational complexity of edge concentration. Discrete Applied Mathematics, 101(1-3): 197--205, 2000.
[16]
Z. Lin, M. R. Lyu, and I. King. MatchSim: A novel similarity measure based on maximum neighborhood matching. Knowl. Inf. Syst., 32(1), 2012.
[17]
D. Lizorkin, P. Velikhov, M. N. Grinev, and D. Turdakov. Accuracy estimate and optimization techniques for SimRank computation. PVLDB, 1(1), 2008.
[18]
H. Small. Co-citation in the scientific literature: A new measure of the relationship between two documents. J. Am. Soc. Inf. Sci., 24(4), 1973.
[19]
H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In ICDM, 2006.
[20]
W. Xi, E. A. Fox, W. Fan, B. Zhang, Z. Chen, J. Yan, and D. Zhuang. SimFusion: Measuring similarity using unified relationship matrix. In SIGIR, 2005.
[21]
X. Yin, J. Han, and P. S. Yu. LinkClus: Efficient clustering via heterogeneous semantic links. In VLDB, 2006.
[22]
W. Yu, X. Lin, W. Zhang, L. Chang, and J. Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. http://www.cse.unsw.edu.au/~weirenyu/pubs/20130428.pdf UNSW-CSE-TR-201304, University of New South Wales, 2013.
[23]
P. Zhao, J. Han, and Y. Sun. P-Rank: A comprehensive structural similarity measure over information networks. In CIKM, 2009.
[24]
Y. Zhou, H. Cheng, and J. X. Yu. Graph clustering based on structural / attribute similarities. PVLDB, 2(1), 2009.

Cited By

View all
  • (2023)Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and ImprovementProceedings of the VLDB Endowment10.14778/3636218.363621917:4(617-629)Online publication date: 1-Dec-2023
  • (2023)Efficient Single-Source SimRank Query by Path AggregationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599328(3342-3352)Online publication date: 6-Aug-2023
  • (2021)DiskProceedings of the VLDB Endowment10.14778/3430915.343092514:3(351-363)Online publication date: 9-Dec-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 7, Issue 1
September 2013
96 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 September 2013
Published in PVLDB Volume 7, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)0
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and ImprovementProceedings of the VLDB Endowment10.14778/3636218.363621917:4(617-629)Online publication date: 1-Dec-2023
  • (2023)Efficient Single-Source SimRank Query by Path AggregationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599328(3342-3352)Online publication date: 6-Aug-2023
  • (2021)DiskProceedings of the VLDB Endowment10.14778/3430915.343092514:3(351-363)Online publication date: 9-Dec-2021
  • (2021)Comprehensively Computing Link-based Similarities by Building A Random Surfer GraphProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3482329(2578-2587)Online publication date: 26-Oct-2021
  • (2021)Efficient structural node similarity computation on billion-scale graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00654-930:3(471-493)Online publication date: 23-Feb-2021
  • (2020)SimTabProceedings of the VLDB Endowment10.14778/3407790.340781913:12(2202-2214)Online publication date: 14-Sep-2020
  • (2020)Realtime index-free single source SimRank processing on web-scale graphsProceedings of the VLDB Endowment10.14778/3384345.338434713:7(966-980)Online publication date: 26-Mar-2020
  • (2020)Exact Single-Source SimRank Computation on Large GraphsProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3389781(653-663)Online publication date: 11-Jun-2020
  • (2020)LSimRank: Node Similarity in a Labeled GraphWeb and Big Data10.1007/978-3-030-60259-8_10(127-144)Online publication date: 12-Aug-2020
  • (2019)Pivotal relationship identificationProceedings of the 28th International Joint Conference on Artificial Intelligence10.5555/3367471.3367721(4874-4880)Online publication date: 10-Aug-2019
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media