Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1951365.1951408acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Efficient discovery of frequent subgraph patterns in uncertain graph databases

Published: 21 March 2011 Publication History

Abstract

Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there is a growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging, when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property for enumerating candidate subgraph patterns efficiently. Then, the index is used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination, that further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates the significant cost savings with respect to the state-of-the-art approach.

References

[1]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, pages 487--499, 1994.
[2]
S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Research, 14:1170--1175, 2004.
[3]
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970.
[4]
J. Cheng, Y. Ke, and W. Ng. Graphgen: A synthetic graph generator. http://www.cse.ust.hk/graphgen/, 2006.
[5]
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007.
[6]
L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. An improved algorithm for matching large graphs. In 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, pages 149--159, 2001.
[7]
J. Ghosh, H. Q. Ngo, S. Yoon, and C. Qiao. On a routing problem within probabilistic graphs and its application to intermittently connected networks. In INFOCOM, pages 1721--1729, 2007.
[8]
E. Gudes, S. E. Shimony, and N. Vanetik. Discovering frequent graph patterns using disjoint paths. IEEE Trans. Knowl. Data Eng., 18(11):1441--1456, 2006.
[9]
H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, page 38, 2006.
[10]
C. Helma, R. D. King, S. Kramer, and A. Srinivasan. The predictive toxicology evaluation challenge 2000--2001. Bioinformatics, 17(1):107--108, 2001.
[11]
P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. Data Min. Knowl. Discov., 17(1):3--23, 2008.
[12]
M. Hua and J. Pei. Probabilistic path queries in road networks: traffic uncertainty aware path selection. In EDBT, pages 347--358, 2010.
[13]
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, 2003.
[14]
J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In KDD, pages 581--586, 2004.
[15]
J. Huff and J. Haseman. Long-term chemical carcinogenesis experiments for identifying potential human cancer hazards. Environmental Health Perspectives, 96(3):23--31, 1991.
[16]
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD, pages 13--23, 2000.
[17]
D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003.
[18]
M. Kuramochi and G. Karypis. An efficient algorithm for discovering frequent subgraphs. IEEE Trans. Knowl. Data Eng., 16(9):1038--1051, 2004.
[19]
D. Liben-Nowell and J. M. Kleinberg. The link prediction problem for social networks. In CIKM, pages 556--559, 2003.
[20]
Y. Liu, J. Li, and H. Gao. Summarizing graph patterns. In ICDE, pages 903--912, 2008.
[21]
S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004.
[22]
M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-nearest neighbors in uncertain graphs. In PVLDB, 2010.
[23]
J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976.
[24]
L. G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189--201, 1979.
[25]
C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In KDD, pages 316--325, 2004.
[26]
T. Washio and H. Motoda. State of the art of graph-based data mining. SIGKDD Explor. Newsl., 5(1):59--68, 2003.
[27]
X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In SIGMOD, pages 433--444, 2008.
[28]
X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002.
[29]
X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In KDD, pages 286--295, 2003.
[30]
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004.
[31]
S. Zhang, J. Yang, and S. Li. Ring: An integrated method for frequent representative subgraph mining. In ICDM, pages 1082--1087, 2009.
[32]
Z. Zou, J. Li, H. Gao, and S. Zhang. Frequent subgraph pattern mining on uncertain graph data. In CIKM, pages 583--592, 2009.

Cited By

View all
  • (2024)Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic GraphsElectronics10.3390/electronics1301019213:1(192)Online publication date: 1-Jan-2024
  • (2024)Fast Query Answering by Labeling Index on Uncertain Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00311(4058-4071)Online publication date: 13-May-2024
  • (2023)Discovering Interesting Patterns from HypergraphsACM Transactions on Knowledge Discovery from Data10.1145/362294018:1(1-34)Online publication date: 16-Oct-2023
  • Show More Cited By

Index Terms

  1. Efficient discovery of frequent subgraph patterns in uncertain graph databases

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
    March 2011
    587 pages
    ISBN:9781450305280
    DOI:10.1145/1951365
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • Microsoft Research: Microsoft Research

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 March 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Research-article

    Conference

    EDBT/ICDT '11
    Sponsor:
    • Microsoft Research
    EDBT/ICDT '11: EDBT/ICDT '11 joint conference
    March 21 - 24, 2011
    Uppsala, Sweden

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)3
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 10 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic GraphsElectronics10.3390/electronics1301019213:1(192)Online publication date: 1-Jan-2024
    • (2024)Fast Query Answering by Labeling Index on Uncertain Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00311(4058-4071)Online publication date: 13-May-2024
    • (2023)Discovering Interesting Patterns from HypergraphsACM Transactions on Knowledge Discovery from Data10.1145/362294018:1(1-34)Online publication date: 16-Oct-2023
    • (2023)Art of Rare Pattern Discovery Techniques in Data Mining2023 Global Conference on Information Technologies and Communications (GCITC)10.1109/GCITC60406.2023.10426173(1-4)Online publication date: 1-Dec-2023
    • (2022)Depression Classification Using Frequent Subgraph Mining Based on Pattern Growth of Frequent Edge in Functional Magnetic Resonance Imaging Uncertain NetworkFrontiers in Neuroscience10.3389/fnins.2022.88910516Online publication date: 29-Apr-2022
    • (2022)Approximating probabilistic group steiner trees in graphsProceedings of the VLDB Endowment10.14778/3565816.356583416:2(343-355)Online publication date: 1-Oct-2022
    • (2022)On the Evolutionary of Bloom Filter False Positives - An Information Theoretical Approach to Optimizing Bloom Filter ParametersIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3200045(1-13)Online publication date: 2022
    • (2022)A Graph Structure to Discover Patterns in Unstructured Processes of Product Development2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI)10.1109/IRI54793.2022.00057(222-227)Online publication date: Aug-2022
    • (2021)UFreS: A New Technique for Discovering Frequent Subgraph Patterns in Uncertain Graph Databases2021 IEEE International Conference on Big Knowledge (ICBK)10.1109/ICKG52313.2021.00042(253-260)Online publication date: Dec-2021
    • (2020)An efficient mining algorithm for maximal frequent patterns in uncertain graph databaseJournal of Intelligent & Fuzzy Systems10.3233/JIFS-200237(1-13)Online publication date: 17-Oct-2020
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media