Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Graph homomorphism revisited for graph matching

Published: 01 September 2010 Publication History

Abstract

In a variety of emerging applications one needs to decide whether a graph G matches another Gp, i.e., whether G has a topological structure similar to that of Gp. The traditional notions of graph homomorphism and isomorphism often fall short of capturing the structural similarity in these applications. This paper studies revisions of these notions, providing a full treatment from complexity to algorithms. (1) We propose p-homomorphism (p-hom) and 1-1 p-hom, which extend graph homomorphism and subgraph isomorphism, respectively, by mapping edges from one graph to paths in another, and by measuring the similarity of nodes. (2) We introduce metrics to measure graph similarity, and several optimization problems for p-hom and 1-1 p-hom. (3) We show that the decision problems for p-hom and 1-1 p-hom are NP-complete even for DAGs, and that the optimization problems are approximation-hard. (4) Nevertheless, we provide approximation algorithms with provable guarantees on match quality. We experimentally verify the effectiveness of the revised notions and the efficiency of our algorithms in Web site matching, using real-life and synthetic data.

References

[1]
Chemistry development kit (cdk). http://sourceforge.net/projects/cdk/.
[2]
Stanford webbase. http://diglib.stanford.edu:8091/testbed/doc2/WebBase.
[3]
M. Aery and S. Chakravarthy. eMailSift: Email classification based on structure and content. In ICDM, 2005.
[4]
K. Bharat and A. Broder. Mirror, mirror on the Web: a study of host pairs with replicated content. Comput. Netw., 31(11--16), 1999.
[5]
K. Bharat et al. A comparison of techniques to find mirrored hosts on the WWW. J. Am. Soc. Inf. Sci., 51(12), 2000.
[6]
V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. V. Dooren. A measure of similarity between graph vertices: Applications to synonym extraction and web searching. SIAM Rev., 46(4), 2004.
[7]
R. B. Boppana and M. M. Halldórsson. Approximating maximum independent sets by excluding subgraphs. BIT, 32(2), 1992.
[8]
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the Web. Comput. Netw. ISDN Syst., 29(8--13), 1997.
[9]
H. Bunke. Graph matching: Theoretical foundations, algorithms, and applications. Vision Interface, 3, 2000.
[10]
L. Chen, A. Gupta, and M. E. Kurul. Stack-based algorithms for pattern matching on dags. In VLDB, 2005.
[11]
J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In ICDE, 2008.
[12]
J. Cho, N. Shivakumar, and H. Garcia-Molina. Finding replicated Web collections. SIGMOD Rec., 29(2), 2000.
[13]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 2001.
[14]
W. Fan and P. Bohannon. Information preserving XML schema embedding. TODS, 33(1), 2008.
[15]
M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[16]
M. M. Halldórsson. Approximations of weighted independent set and hereditary subset problems. J. Graph Algorithms Appl., 4(1), 2000.
[17]
M. R. Henzinger, T. A. Henzinger, and P. W. Kopke. Computing simulations on finite and infinite graphs. In FOCS, 1995.
[18]
S. Joshi et al. A bag of paths model for measuring structural similarity in Web documents. In KDD, 2003.
[19]
V. Kann. On the approximability of the maximum common subgraph problem. In STACS, 1992.
[20]
C. Liu, C. Chen, J. Han, and P. S. Yu. Gplag: Detection of software plagiarism by program dependence graph analysis. In SIGKDD, 2006.
[21]
S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching algorithm. In ICDE, 2002.
[22]
E. Nuutila. An efficient transitive closure algorithm for cyclic digraphs. Inf. Process. Lett., 52(4), 1994.
[23]
P. Papadimitriou, A. Dasdan, and H. Garcia-Molina. Web graph similarity for anomaly detection. Technical report, 2008.
[24]
I. Sanz, M. Mesiti, G. Guerrini, and R. B. Llavori. Fragment-based approximate retrieval in highly heterogeneous XML collections. Data Knowl. Eng., 64(1):266--293, 2008.
[25]
A. Schenker, M. Last, H. Bunke, and A. Kandel. Classification of Web documents using graph matching. IJPRAI, 18(3), 2004.
[26]
D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, 2002.
[27]
Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, 2008.
[28]
V. V. Vazirani. Approximation Algorithms. Springer, 2003.
[29]
Webconfs. Similar page checker. www.webconfs.com/similar-page-checker.php.
[30]
X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, 2005.
[31]
Z. Zeng, A. K. Tung, J. Wang, J. Feng, and L. Zhou. Edit distance evaluation on graph structures. In VLDB, 2009.
[32]
L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. In VLDB, 2009.

Cited By

View all
  1. Graph homomorphism revisited for graph matching

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
    September 2010
    1658 pages

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 September 2010
    Published in PVLDB Volume 3, Issue 1-2

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)17
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Extending Graph Rules with OraclesProceedings of the VLDB Endowment10.14778/3654621.365464117:7(1775-1787)Online publication date: 1-Mar-2024
    • (2023)Knowledge Graphs QueryingACM SIGMOD Record10.1145/3615952.361595652:2(18-29)Online publication date: 11-Aug-2023
    • (2022)iSpan: Parallel Identification of Strongly Connected Components with Spanning TreesACM Transactions on Parallel Computing10.1145/35435429:3(1-27)Online publication date: 18-Aug-2022
    • (2022)Efficient In-Memory Evaluation of Reachability Graph Pattern Queries on Data GraphsDatabase Systems for Advanced Applications10.1007/978-3-031-00123-9_4(55-71)Online publication date: 11-Apr-2022
    • (2020)BOOMER: A Tool for Blending Visual P-Homomorphic Queries on Large NetworksProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3384680(2685-2688)Online publication date: 11-Jun-2020
    • (2020)A Hybrid Index for Distance QueriesWeb Information Systems Engineering – WISE 202010.1007/978-3-030-62005-9_17(227-241)Online publication date: 20-Oct-2020
    • (2020)Parallel SCC Detection Based on Reusing Warps and Coloring Partitions on GPUsAlgorithms and Architectures for Parallel Processing10.1007/978-3-030-60245-1_3(31-46)Online publication date: 2-Oct-2020
    • (2020)Processing Big Data Across InfrastructuresBig Data – BigData 202010.1007/978-3-030-59612-5_4(38-51)Online publication date: 18-Sep-2020
    • (2020)Virtual network embedding on massive substrate networksTransactions on Emerging Telecommunications Technologies10.1002/ett.384931:2Online publication date: 16-Feb-2020
    • (2019)NAVIGATEProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3320245(1965-1968)Online publication date: 25-Jun-2019
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media