DOI: 10.1145/3292500.3330923

Efficient Global String Kernel with Random Features: Beyond Counting Substructures

Published: 25 July 2019

Abstract

Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the string, which hardly capture long discriminative patterns, (ii) sum over too many substructures, such as all possible subsequences, which leads to diagonal dominance of the kernel matrix, or (iii) rely on non-positive-definite similarity measures derived from the edit distance. Furthermore, while there have been works addressing the computational challenge with respect to the length of the string, most of them still incur quadratic complexity in the number of training samples when used in a kernel-based classifier. In this paper, we present a new class of global string kernels that aims to (i) discover global properties hidden in the strings through global alignments, (ii) maintain positive-definiteness of the kernel without introducing a diagonally dominant kernel matrix, and (iii) have a training cost linear with respect to not only the length of the string but also the number of training string samples. To this end, the proposed kernels are explicitly defined through a series of different random feature maps, each corresponding to a distribution of random strings. We show that kernels defined this way are always positive-definite and offer computational benefits, since they directly produce Random String Embeddings (RSE) that can be used in any linear classification model. Our extensive experiments on nine benchmark datasets corroborate that RSE achieves accuracy better than or comparable to state-of-the-art baselines, especially on longer strings. In addition, we empirically show that RSE scales linearly with both the number and the length of the training strings.
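To make the construction concrete, the sketch below illustrates the random-feature idea in the spirit of RSE: each input string is embedded by comparing it against a set of random strings and passing the resulting similarity scores through a feature transform, after which any linear classifier can be trained on the vectors. This is a minimal illustration only, not the paper's method: it substitutes plain Levenshtein edit distance for the paper's global alignment measure, and the feature transform exp(-gamma * d), the number of random strings R, and the maximum random-string length are assumed parameters chosen for the example.

import numpy as np

def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming (two-row variant).
    # Used here only as an illustrative stand-in for the paper's global alignment measure.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution
        prev = curr
    return prev[n]

def random_string_embedding(strings, alphabet, R=128, max_len=10, gamma=0.1, seed=0):
    # Map each string to an R-dimensional vector by comparing it against R random
    # strings drawn from `alphabet` (hypothetical distribution: uniform characters,
    # uniform lengths in [1, max_len]); all parameters here are illustrative assumptions.
    rng = np.random.default_rng(seed)
    letters = list(alphabet)
    random_strings = ["".join(rng.choice(letters, size=rng.integers(1, max_len + 1)))
                      for _ in range(R)]
    features = np.empty((len(strings), R))
    for i, s in enumerate(strings):
        for j, w in enumerate(random_strings):
            # Assumed feature transform of the string-to-random-string distance.
            features[i, j] = np.exp(-gamma * edit_distance(s, w))
    return features / np.sqrt(R)

# Example: DNA-like strings mapped to fixed-length vectors for a linear model.
X = random_string_embedding(["ACGTACGT", "TTTTGCA", "ACGT"], alphabet="ACGT", R=32)
print(X.shape)  # (3, 32)

Because the embedding is computed once per string and the downstream model is linear (e.g., a linear SVM or logistic regression), training cost grows linearly with the number of string samples, which is the computational benefit highlighted in the abstract.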


Cited By

  • (2022) Sequence graph transform (SGT): a feature embedding function for sequence data mining. Data Mining and Knowledge Discovery 36(2), 668-708. DOI: 10.1007/s10618-021-00813-0. Online publication date: 4 January 2022.


    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN: 9781450362016
    DOI: 10.1145/3292500

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. random features
    2. string embedding
    3. string kernel

    Qualifiers

    • Research-article

    Conference

    KDD '19

    Acceptance Rates

    KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions, 9%
    Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
