Nothing Special   »   [go: up one dir, main page]

skip to main content
article

A space and time efficient algorithm for SimRank computation

Published: 01 May 2012 Publication History

Abstract

SimRank has become an important similarity measure to rank web documents based on a graph model on hyperlinks. The existing approaches for conducting SimRank computation adopt an iteration paradigm. The most efficient deterministic technique yields $O\left(n^3\right)$ worst-case time per iteration with the space requirement $O\left(n^2\right)$ , where n is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes $O \left(\min \left\{ n \cdot m , n^r \right\}\right)$ time and $O \left( n + m \right)$ space, where m is the number of edges in a web-graph model and r log2 7. In addition, we extend the similarity transition matrix to prevent random surfers getting stuck, and devise a pruning technique to eliminate impractical similarities for each iteration. Moreover, we also develop a reordering technique combined with an over-relaxation method, not only speeding up the convergence rate of the existing techniques, but achieving I/O efficiency as well. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.

References

[1]
Antonellis, I., Garcia-Molina, H., Chang, C.C.: Simrank++: query rewriting through link analysis of the click graph. PVLDB 1 (1), 408-421 (2008).
[2]
Bhatia, R.: Matrix Analysis. Springer, New York (1997).
[3]
Cai, Y., Li, P., Liu, H., He, J., Du, X.: S-simrank: combining content and link information to cluster papers effectively and efficiently. In: ADMA (2008).
[4]
Chan, W.M., George, A.: A linear time implementation of the reverse cuthill-mckee algorithm. BIT 20 (1), 8-14 (1980).
[5]
Cohen, J., Roth, M.S.: On the implementation of strassen's fast multiplication algorithm. Acta Inf. 6 , 341-355 (1976).
[6]
Coppersmith, D.,Winograd, S.: On the asymptotic complexity of matrix multiplication. SIAM J. Comput. 11 (3), 82-90 (1982).
[7]
Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. J. Symb. Comput. 9 (3), 1-6 (1990).
[8]
D'Azevedo, E.F., Fahey, M.R., Mills, R.T.: Vectorized sparse matrix multiply for compressed row storage format. In: International Conference on Computational Science (1) (2005).
[9]
Fogaras, D., Racz, B.: A scalable randomized method to compute link-based similarity rank on the web graph. In: EDBT Workshops (2004).
[10]
Fogaras, D., Rácz, B.: Scaling link-based similarity search. In: WWW (2005).
[11]
He, G., Feng, H., Li, C., Chen, H.: Parallel simrank computation on large graphs with iterative aggregation. In: KDD (2010).
[12]
Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: KDD (2002).
[13]
Li, P., Cai, Y., Liu, H., He, J., Du, X.: Exploiting the block structure of link graph for efficient similarity computation. In: PAKDD (2009).
[14]
Li, C., Han, J., He, G., Jin, X., Sun, Y., Yu, Y., Wu, T.: Fast computation of simrank for static and dynamic information networks. In: EDBT (2010).
[15]
Lim, A., Rodrigues, B., Xiao, F.: Heuristics for matrix bandwidth reduction. Eur. J. Oper. Res. 174 (1), 69-91 (2006).
[16]
Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. PVLDB 1 (1), 422-433 (2008).
[17]
Lizorkin, D., Velikhov, P., Grinev, M.N., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19 (1), 45-66 (2010).
[18]
Mendelzon, A.O.: Review--authoritative sources in a hyperlinked environment. ACM SIGMOD Digit. Rev. 1 , 604-632 (2000).
[19]
Page, L., Brin, S.R.M., Winograd, T.: The pagerank citation ranking bringing order to the web. Technial report (1998).
[20]
Pathak, A., Chakrabarti, S., Gupta, M.S.: Index design for dynamic personalized pagerank. In: ICDE (2008).
[21]
Quevedo, J.U., Huang, S.H.S.: Similarity among web pages based on their link structure. In: IKE (2003).
[22]
Weinberg, B.H.: Bibliographic coupling: a review. Inf. Storage Retr. 10 (5-6), 189-196 (1974).
[23]
Wijaya, D.T., Bressan, S.: Clustering web documents using co-citation, coupling, incoming, and outgoing hyperlinks: a comparative performance analysis of algorithms. IJWIS 2 (2), 69-76 (2006).
[24]
Xi, W., Fox, E.A., Fan, W., Zhang, B., Chen, Z., Yan, J., Zhuang, D.: Simfusion: measuring similarity using unified relationship matrix. In: SIGIR (2005).
[25]
Yu,W., Lin, X., Le, J.:Aspace and time efficient algorithm for simrank computation. In:APWeb (2010).
[26]
Yu, W., Lin, X., Le, J.: Taming computational complexity: efficient and parallel simrank optimizations on undirected graphs. In: WAIM (2010).
[27]
Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2 (1), 718-729 (2009).
[28]
Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM '09: Proceeding of the 18th ACM Conference on Information and Knowledge Management (2009).

Cited By

View all
  • (2023)Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and ImprovementProceedings of the VLDB Endowment10.14778/3636218.363621917:4(617-629)Online publication date: 1-Dec-2023
  • (2023)Efficient Single-Source SimRank Query by Path AggregationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599328(3342-3352)Online publication date: 6-Aug-2023
  • (2023)Hierarchical All-Pairs SimRank CalculationDatabase Systems for Advanced Applications10.1007/978-3-031-30675-4_17(252-268)Online publication date: 17-Apr-2023
  • Show More Cited By
  1. A space and time efficient algorithm for SimRank computation

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image World Wide Web
    World Wide Web  Volume 15, Issue 3
    May 2012
    146 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 May 2012

    Author Tags

    1. SimRank
    2. graph similarity
    3. link-based analysis
    4. optimal algorithms

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 22 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and ImprovementProceedings of the VLDB Endowment10.14778/3636218.363621917:4(617-629)Online publication date: 1-Dec-2023
    • (2023)Efficient Single-Source SimRank Query by Path AggregationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599328(3342-3352)Online publication date: 6-Aug-2023
    • (2023)Hierarchical All-Pairs SimRank CalculationDatabase Systems for Advanced Applications10.1007/978-3-031-30675-4_17(252-268)Online publication date: 17-Apr-2023
    • (2021)DiskProceedings of the VLDB Endowment10.14778/3430915.343092514:3(351-363)Online publication date: 9-Dec-2021
    • (2021)ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truthsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00672-730:6(989-1015)Online publication date: 5-Jun-2021
    • (2020)SimTabProceedings of the VLDB Endowment10.14778/3407790.340781913:12(2202-2214)Online publication date: 14-Sep-2020
    • (2020)Realtime index-free single source SimRank processing on web-scale graphsProceedings of the VLDB Endowment10.14778/3384345.338434713:7(966-980)Online publication date: 26-Mar-2020
    • (2020)P-Simrank: Extending Simrank to Scale-Free Bipartite NetworksProceedings of The Web Conference 202010.1145/3366423.3380081(3084-3090)Online publication date: 20-Apr-2020
    • (2020)Exact Single-Source SimRank Computation on Large GraphsProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3389781(653-663)Online publication date: 11-Jun-2020
    • (2019)PRSimProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3319873(1042-1059)Online publication date: 25-Jun-2019
    • Show More Cited By

    View Options

    View options

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media