Abstract
Motivated by chemical applications, we revisit and extend a family of positive definite kernels for graphs based on the detection of common subtrees, initially proposed by Ramon and Gärtner (Proceedings of the first international workshop on mining graphs, trees and sequences, pp. 65–74, 2003). We propose new kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs. This parameter allows to smoothly interpolate between classical graph kernels based on the count of common walks, on the one hand, and kernels that emphasize the detection of large common subtrees, on the other hand. We also propose two modular extensions to this formulation. The first extension increases the number of subtrees that define the feature space, and the second one removes noisy features from the graph representations. We validate experimentally these new kernels on problems of toxicity and anti-cancer activity prediction for small molecules with support vector machines.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., & Kriegel, H.-P. (2005). Protein function prediction via graph kernels. Bioinformatics, 21(Suppl. 1), i47–i56.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines.
Chen, J., Swamidass, S. J., Dou, Y., Bruand, J., & Baldi, P. (2005). ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21(22), 4133–4139.
Deshpande, M., Kuramochi, M., Wale, N., & Karypis, G. (2005). Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1036–1050.
Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: hardness results and efficient alternatives. In B. Schölkopf & M. Warmuth (Eds.), Lecture notes in computer science : Vol. 2777. Proceedings of the sixteenth annual conference on computational learning theory and the seventh annual workshop on kernel machines (pp. 129–143). Heidelberg: Springer.
Helma, C., Cramer, T., Kramer, S., & De Raedt, L. (2004). Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. Journal of Chemical Information and Computer Sciences, 44(4), 1402–1411.
Horváth, T., Gärtner, T., & Wrobel, S. (2004). Cyclic pattern kernels for predictive graph mining. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 158–167). New York: ACM Press.
Inokuchi, A., Washio, T., & Motoda, H. (2003). Complete mining of frequent patterns from graphs: mining graph data. Machine Learning, 50(3), 321–354.
Karklin, Y., Meraz, R. F., & Holbrook, S. R. (2005). Classification of non-coding RNA using graph representations of secondary structure. In Pacific symposium on biocomputing (pp. 4–15).
Kashima, H., Tsuda, K., & Inokuchi, A. (2004). Kernels for graphs. In B. Schölkopf, K. Tsuda, & J. P. Vert (Eds.), Kernel methods in computational biology (pp. 155–170). Cambridge: MIT Press.
King, R. D., Muggleton, S. H., Srinivasan, A., & Sternberg, M. J. (1996). Structure-activity relationships derived by machine learning: the use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences of the United States of America, 93(1), 438–442.
Leach, A. R., & Gillet, V. J. (2003). An introduction to chemoinformatics. Dordrecht: Kluwer Academic.
Leslie, C., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: a string kernel for SVM protein classification. In R. B. Altman, A. K. Dunker, L. Hunter, K. Lauerdale, & T. E. Klein (Eds.), Proceedings of the Pacific symposium on biocomputing 2002 (pp. 564–575). Singapore: World Scientific.
Mahé, P., Ueda, N., Akutsu, T., Perret, J.-L., & Vert, J.-P. (2005). Graph kernels for molecular structure-activity relationship analysis with support vector machines. Journal of Chemical Information and Modeling, 45(4), 939–951.
Menchetti, S., Costa, F., & Frasconi, P. (2005). Weighted decomposition kernels. In L. De Raedt & S. Wrobel (Eds.), Proceedings of the twenty-second international conference on machine learning (ICML 2005) (pp. 585–592). New York: Assoc. Comput. Mach.
Ralaivola, L., Swamidass, S. J., Saigo, H., & Baldi, P. (2005). Graph kernels for chemical informatics. Neural Networks, 18(8), 1093–1110.
Ramon, J., & Gärtner, T. (2003). Expressivity versus efficiency of graph kernels. In T. Washio & L. De Raedt (Eds.), Proceedings of the first international workshop on mining graphs, trees and sequences (pp. 65–74).
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge: MIT Press.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Swamidass, S. J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., & Baldi, P. (2005). Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(Suppl. 1), i359–i368.
Author information
Authors and Affiliations
Corresponding author
Additional information
Guest Editors: Thomas Gärtner, Gemma C. Garriga.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Mahé, P., Vert, JP. Graph kernels based on tree patterns for molecules. Mach Learn 75, 3–35 (2009). https://doi.org/10.1007/s10994-008-5086-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-008-5086-2