P-Simrank: Extending Simrank to Scale-Free Bipartite Networks

Published: 20 April 2020

Abstract

Measuring the similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of node similarity in a graph. It has both theoretical and intuitive properties, and it has been extensively studied and used in applications such as query rewriting, link prediction, and collaborative filtering. Existing works based on Simrank primarily focus on preserving the microscopic structure, such as the second- and third-order proximity of the vertices, while the macroscopic scale-free property is largely ignored. The scale-free property, in which vertex degrees follow a heavy-tailed distribution, is a critical property of any real-world web graph. In this paper, we introduce P-Simrank, which extends the idea of Simrank to scale-free bipartite networks. To study the efficacy of the proposed solution on a real-world problem, we tested it on the well-known query-rewriting problem in the sponsored search domain using a bipartite click graph, similar to Simrank++ [1], which acts as our baseline. We show that Simrank++ produces sub-optimal similarity scores for bipartite graphs in which the degree distribution of the vertices follows a power law. We also show how P-Simrank can be optimized for large real-world graphs. Finally, we experimentally evaluate the P-Simrank algorithm against Simrank++, using actual click graphs obtained from Bing, and show that P-Simrank outperforms Simrank++ on a variety of metrics.
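For orientation, the classic bipartite SimRank recurrence of Jeh and Widom [7], on which Simrank++ and P-Simrank build, can be sketched as below. This is a minimal illustration of the baseline idea only, not P-Simrank or Simrank++, whose formulas are not given in this abstract. The function name `bipartite_simrank`, the decay constant `c=0.8`, the iteration count, and the toy query/ad names are all assumptions made for the example.

```python
from itertools import product


def bipartite_simrank(edges, c=0.8, iterations=5):
    """Classic bipartite SimRank on a query-ad click graph (illustrative sketch).

    edges: iterable of (query, ad) pairs. Edge weights are ignored here.
    Returns two dicts mapping node pairs to similarity scores.
    """
    q_nbrs, a_nbrs = {}, {}
    for q, a in edges:
        q_nbrs.setdefault(q, set()).add(a)
        a_nbrs.setdefault(a, set()).add(q)

    def identity(nodes):
        # Start with s(u, u) = 1 and s(u, v) = 0 for u != v.
        return {(u, v): 1.0 if u == v else 0.0
                for u, v in product(nodes, repeat=2)}

    def update(nbrs, other_sim):
        # s(u, v) = c / (|E(u)| |E(v)|) * sum of similarities of neighbor pairs.
        new = {}
        for u, v in product(nbrs, repeat=2):
            if u == v:
                new[(u, v)] = 1.0
            else:
                total = sum(other_sim[(i, j)]
                            for i in nbrs[u] for j in nbrs[v])
                new[(u, v)] = c * total / (len(nbrs[u]) * len(nbrs[v]))
        return new

    q_sim, a_sim = identity(q_nbrs), identity(a_nbrs)
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        q_sim, a_sim = update(q_nbrs, a_sim), update(a_nbrs, q_sim)
    return q_sim, a_sim


# Hypothetical toy click graph: two queries share one ad, a third is unrelated.
q_sim, a_sim = bipartite_simrank([("q1", "a1"), ("q2", "a1"), ("q3", "a2")])
print(q_sim[("q1", "q2")], q_sim[("q1", "q3")])
```

In this sketch every neighbor contributes equally regardless of its degree, which is the kind of behavior the abstract argues is problematic when vertex degrees follow a power law.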

References

[1]
Ioannis Antonellis, Hector Garcia-Molina, and Chi Chao Chang. 2008. Simrank++: query rewriting through link analysis of the click graph. Proceedings of the VLDB Endowment 1, 1 (2008), 408–421.
[2]
Béla Bollobás, Christian Borgs, Jennifer Chayes, and Oliver Riordan. 2003. Directed scale-free graphs. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 132–139.
[3]
Yuanzhe Cai, Gao Cong, Xu Jia, Hongyan Liu, Jun He, Jiaheng Lu, and Xiaoyong Du. 2009. Efficient algorithm for computing link-based similarity in real world networks. In 2009 Ninth IEEE International Conference on Data Mining. IEEE, 734–739.
[4]
Hongbo Deng, Irwin King, and Michael R Lyu. 2009. Entropy-biased models for query representation on the click graph. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM, 339–346.
[5]
Todd Z DeSantis, Keith Keller, Ulas Karaoz, Alexander V Alekseyenko, Navjeet NS Singh, Eoin L Brodie, Zhiheng Pei, Gary L Andersen, and Niels Larsen. 2011. Simrank: Rapid and sensitive general-purpose k-mer search tool. BMC ecology 11, 1 (2011), 11.
[6]
Guoming He, Haijun Feng, Cuiping Li, and Hong Chen. 2010. Parallel SimRank computation on large graphs with iterative aggregation. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 543–552.
[7]
Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 538–543.
[8]
Minhao Jiang, Ada Wai-Chee Fu, and Raymond Chi-Wing Wong. 2017. READS: a random walk approach for efficient and accurate dynamic SimRank. Proceedings of the VLDB Endowment 10, 9 (2017), 937–948.
[9]
Jérôme Kunegis. 2013. Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343–1350.
[10]
Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. 2014. Scalable similarity search for SimRank. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 325–336.
[11]
Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. 2010. Fast computation of simrank for static and dynamic information networks. In Proceedings of the 13th International Conference on Extending Database Technology. ACM, 465–476.
[12]
Lina Li, Cuiping Li, Hong Chen, and Xiaoyong Du. 2013. Mapreduce-based SimRank computation and its application in social recommender system. In 2013 IEEE international congress on big data. IEEE, 133–140.
[13]
Zhenguo Li, Yixiang Fang, Qin Liu, Jiefeng Cheng, Reynold Cheng, and John Lui. 2015. Walking in the cloud: parallel SimRank at scale. Proceedings of the VLDB Endowment 9, 1 (2015), 24–35.
[14]
Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. 2017. ProbeSim: scalable single-source and top-k SimRank computations on dynamic graphs. Proceedings of the VLDB Endowment 11, 1 (2017), 14–26.
[15]
Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, and Denis Turdakov. 2008. Accuracy estimate and optimization techniques for simrank computation. Proceedings of the VLDB Endowment 1, 1 (2008), 422–433.
[16]
Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, and Denis Turdakov. 2010. Accuracy estimate and optimization techniques for SimRank computation. The VLDB Journal–The International Journal on Very Large Data Bases 19, 1 (2010), 45–66.
[17]
Claude E Shannon. 1951. Prediction and entropy of printed English. Bell system technical journal 30, 1 (1951), 50–64.
[18]
Yingxia Shao, Bin Cui, Lei Chen, Mingming Liu, and Xing Xie. 2015. An efficient similarity search framework for SimRank over large dynamic graphs. Proceedings of the VLDB Endowment 8, 8 (2015), 838–849.
[19]
Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Yu Liu, Xiaoyong Du, and Ji-Rong Wen. 2019. PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs. arXiv preprint arXiv:1905.02354 (2019).
[20]
Weiren Yu, Xuemin Lin, and Wenjie Zhang. 2014. Fast incremental SimRank on link-evolving graphs. In 2014 IEEE 30th International Conference on Data Engineering. IEEE, 304–315.
[21]
Weiren Yu and Julie A McCann. 2015. Efficient partial-pairs simrank search on large networks. Proceedings of the VLDB Endowment 8, 5 (2015), 569–580.
[22]
Weiren Yu, Wenjie Zhang, Xuemin Lin, Qing Zhang, and Jiajin Le. 2012. A space and time efficient algorithm for SimRank computation. World Wide Web 15, 3 (2012), 327–353.

        Information & Contributors

        Information

        Published In

        WWW '20: Proceedings of The Web Conference 2020
        April 2020
        3143 pages
        ISBN:9781450370233
        DOI:10.1145/3366423
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Network Analysis
        2. Power-law graphs
        3. Simrank

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '20
        WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months): 38
        • Downloads (Last 6 weeks): 6
        Reflects downloads up to 19 Nov 2024

        Citations

        Cited By

        • (2024) BIRD: Efficient Approximation of Bidirectional Hidden Personalized PageRank. Proceedings of the VLDB Endowment 17:9 (2255-2268). DOI: 10.14778/3665844.3665855. Online publication date: 1-May-2024
        • (2024) HetFS: a method for fast similarity search with ad-hoc meta-paths on heterogeneous information networks. World Wide Web 27:6. DOI: 10.1007/s11280-024-01303-1. Online publication date: 18-Sep-2024
        • (2023) Social Network Analysis: A Survey on Measure, Structure, Language Information Analysis, Privacy, and Applications. ACM Transactions on Asian and Low-Resource Language Information Processing 22:5 (1-47). DOI: 10.1145/3539732. Online publication date: 9-May-2023
        • (2022) Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge Graphs. ACM Transactions on Information Systems 40:4 (1-45). DOI: 10.1145/3495209. Online publication date: 11-Jan-2022
        • (2022) Efficient and Effective Similarity Search over Bipartite Graphs. Proceedings of the ACM Web Conference 2022 (308-318). DOI: 10.1145/3485447.3511959. Online publication date: 25-Apr-2022
        • (2020) Graph Regularization for Multi-lingual Topic Models. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (1741-1744). DOI: 10.1145/3397271.3401231. Online publication date: 25-Jul-2020
