Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2020408.2020511acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Dual active feature and sample selection for graph classification

Published: 21 August 2011 Publication History

Abstract

Graph classification has become an important and active research topic in the last decade. Current research on graph classification focuses on mining discriminative subgraph features under supervised settings. The basic assumption is that a large number of labeled graphs are available. However, labeling graph data is quite expensive and time consuming for many real-world applications. In order to reduce the labeling cost for graph data, we address the problem of how to select the most important graph to query for the label. This problem is challenging and different from conventional active learning problems because there is no predefined feature vector. Moreover, the subgraph enumeration problem is NP-hard. The active sample selection problem and the feature selection problem are correlated for graph data. Before we can solve the active sample selection problem, we need to find a set of optimal subgraph features. To address this challenge, we demonstrate how one can simultaneously estimate the usefulness of a query graph and a set of subgraph features. The idea is to maximize the dependency between subgraph features and graph labels using an active learning framework. We propose a branch-and-bound algorithm to search for the optimal query graph and optimal features simultaneously. Empirical studies on nine real-world tasks demonstrate that the proposed method can obtain better accuracy on graph data than alternative approaches.

References

[1]
M. F. Balcan, A. Z. Broder, and T. Zhang. Margin based active learning. In COLT, pages 35--50, 2007.
[2]
C. Borgelt and M. Berthold. Mining molecular fragments: Finding relevant substructures of molecules. In ICDM, pages 211--218, 2002.
[3]
K. M. Borgwardt. Graph Kernels. PhD thesis, Ludwig-Maximilians-University Munich, 2007.
[4]
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/ cjlin/libsvm.
[5]
H. Cheng, D. Lo, Y. Zhou, X. Wang, and X. Yan. Identifying bug signatures using discrimative graph mining. In ISSTA, pages 141--152, 2009.
[6]
I. Dagan and S. P. Engenlson. Committee-based sampling for training probabilistic classifiers. In ICML, pages 150--157, 1995.
[7]
S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In ICML, pages 208--215, 2008.
[8]
P. Donmez, J. G. Carbonell, and P. N. Bennett. Dual strategy active learning. In ECML, pages 116--127, 2007.
[9]
W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. S. Yu, and O. Verscheure. Direct mining of discriminative and essential frequent patterns via model-based search tree. In KDD, pages 230--238, 2008.
[10]
Y. Freund, E. S. H. S. Seung, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2--3):133--168, 1997.
[11]
T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In COLT/Kernel, 2003.
[12]
A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pages 63--77, Singapore, 2005.
[13]
T. Horvath, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In KDD, 2004.
[14]
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraph in the presence of isomorphism. In ICDM, pages 549--552, 2003.
[15]
S.-J. Huang, R. Jin, and Z.-H. Zhou. Active learning by querying informative and representative examples. In NIPS. 2011.
[16]
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD, pages 13--23, 2000.
[17]
N. Jin, C. Young, and W. Wang. GAIA: graph classification using evolutionary computation. In SIGMOD, pages 879--890, 2010.
[18]
X. Kong and P. S. Yu. Semi-supervised feature selection for graph classification. In KDD, pages 793--802, 2010.
[19]
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM, pages 313--320, 2001.
[20]
H. T. Nguyen and A. W. M. . Smeulders. Active learning using pre-cluster. In ICML, pages 623--630, 2004.
[21]
S. Nijssen and J. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004.
[22]
M. Opper, H. S. Seung, and H. Sompolinsky. Query by committee. In COLT, pages 287--294, 1992.
[23]
B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin--Madison, 2009.
[24]
M. Thoma, H. Cheng, A. Gretton, J. Han, H. Kriegel, A. Smola, L. Song, P. Yu, X. Yan, and K. Borgwardt. Near-optimal supervised feature selection among frequent subgraphs. In SDM, pages 1075--1086, 2009.
[25]
S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In ICML, pages 999--1006, 2000.
[26]
X. Yan, H. Cheng, J. Han, and P. Yu. Mining significant graph patterns by leap search. In SIGMOD, pages 433--444, 2008.
[27]
X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002.
[28]
X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent struture analysis. ACM Transactions on Database Systems, 30(4):960--993, 2005.
[29]
B. Yang, J. Sun, T. Wang, and Z. Chen. Effective multi-label active learning for text classification. In KDD, pages 917--926, 2009.
[30]
K. Yu, J. Bi, and V. Tresp. Active learning via transductive experimental design. In ICML, pages 1081--1088, 2006.

Cited By

View all
  • (2024)Real-Time Sampling Strategies for Regression with Irrelevant FeaturesInternational Journal of Semantic Computing10.1142/S1793351X2443001318:03(417-435)Online publication date: 29-May-2024
  • (2022)Depression Classification Using Frequent Subgraph Mining Based on Pattern Growth of Frequent Edge in Functional Magnetic Resonance Imaging Uncertain NetworkFrontiers in Neuroscience10.3389/fnins.2022.88910516Online publication date: 29-Apr-2022
  • (2022)Active Learning of Discriminative Subgraph Patterns for API Misuse DetectionIEEE Transactions on Software Engineering10.1109/TSE.2021.306997848:8(2761-2783)Online publication date: 1-Aug-2022
  • Show More Cited By

Index Terms

  1. Dual active feature and sample selection for graph classification

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2011
    1446 pages
    ISBN:9781450308137
    DOI:10.1145/2020408
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 August 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. active learning
    2. data mining
    3. feature construction
    4. feature selection
    5. graph classification
    6. subgraph pattern

    Qualifiers

    • Research-article

    Conference

    KDD '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)12
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Real-Time Sampling Strategies for Regression with Irrelevant FeaturesInternational Journal of Semantic Computing10.1142/S1793351X2443001318:03(417-435)Online publication date: 29-May-2024
    • (2022)Depression Classification Using Frequent Subgraph Mining Based on Pattern Growth of Frequent Edge in Functional Magnetic Resonance Imaging Uncertain NetworkFrontiers in Neuroscience10.3389/fnins.2022.88910516Online publication date: 29-Apr-2022
    • (2022)Active Learning of Discriminative Subgraph Patterns for API Misuse DetectionIEEE Transactions on Software Engineering10.1109/TSE.2021.306997848:8(2761-2783)Online publication date: 1-Aug-2022
    • (2022)Unsupervised Instance and Subnetwork Selection for Network Data2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA)10.1109/DSAA54385.2022.10032410(1-10)Online publication date: 13-Oct-2022
    • (2021)ALPINE: Active Link Prediction Using Network EmbeddingApplied Sciences10.3390/app1111504311:11(5043)Online publication date: 29-May-2021
    • (2021)Feedback-Based Dynamic Feature Selection for Constrained Continuous Data Acquisition2021 American Control Conference (ACC)10.23919/ACC50511.2021.9483180(3507-3512)Online publication date: 25-May-2021
    • (2019)Joint Active Learning with Feature Selection via CUR Matrix DecompositionIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2018.284098041:6(1382-1396)Online publication date: 1-Jun-2019
    • (2019)Tree++: Truncated Tree Based Graph KernelsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.2946149(1-1)Online publication date: 2019
    • (2019)Recursive Maximum Margin Active LearningIEEE Access10.1109/ACCESS.2019.29153347(59933-59943)Online publication date: 2019
    • (2019)Learning compact graph representations via an encoder-decoder networkApplied Network Science10.1007/s41109-019-0157-94:1Online publication date: 18-Jul-2019
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media