DOI: 10.1145/1835804.1835910
research-article

Large linear classification when data cannot fit in memory

Published: 25 July 2010

Abstract

Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
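
As a rough illustration of the block minimization idea described in the abstract, the sketch below cycles over blocks stored on disk, loads one block at a time, and runs a few passes of dual coordinate descent for the L1-loss linear SVM on that block, while only the weight vector and the dual variables stay in memory. It is not the authors' implementation: the pickle-based block files, the function names `write_blocks` and `train_block_dual_cd`, the parameters `outer_iters` and `inner_iters`, and the choice to keep all dual variables in an in-memory dictionary are assumptions made for illustration.

```python
import os
import pickle
import random
import tempfile


def write_blocks(samples, block_size, directory):
    """Split (x, y) samples into pickle files, one per block; return the paths."""
    paths = []
    for start in range(0, len(samples), block_size):
        path = os.path.join(directory, f"block_{len(paths)}.pkl")
        with open(path, "wb") as f:
            pickle.dump(samples[start:start + block_size], f)
        paths.append(path)
    return paths


def train_block_dual_cd(paths, dim, C=1.0, outer_iters=20, inner_iters=5, seed=0):
    """Hypothetical block minimization loop for the L1-loss linear SVM dual.

    Only w, the dual variables alpha, and one block are held in memory at once.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    alpha = {}  # (block index, row index) -> dual variable, implicitly 0
    for _ in range(outer_iters):
        for b, path in enumerate(paths):
            with open(path, "rb") as f:  # sequential read of a whole block
                block = pickle.load(f)
            for _ in range(inner_iters):  # approximately solve the block sub-problem
                order = list(range(len(block)))
                rng.shuffle(order)
                for i in order:
                    x, y = block[i]
                    qii = sum(v * v for v in x)
                    if qii == 0.0:
                        continue
                    # Gradient of the dual objective with respect to alpha_i.
                    grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
                    old = alpha.get((b, i), 0.0)
                    # Projected Newton step onto the box [0, C].
                    new = min(max(old - grad / qii, 0.0), C)
                    if new != old:
                        alpha[(b, i)] = new
                        step = (new - old) * y
                        for j, xj in enumerate(x):
                            w[j] += step * xj
    return w
```

Under these assumptions, `write_blocks(samples, 2, directory)` on six labelled points yields three block files, and `train_block_dual_cd(paths, dim=3)` returns a weight vector whose inner product with a feature vector gives the predicted sign. Reading each block sequentially, rather than fetching individual instances from disk at random, is the design choice the abstract motivates.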

Supplementary Material

JPG File (kdd2010_yu_llcw_01.jpg)
MOV File (kdd2010_yu_llcw_01.mov)

Information & Contributors

Information

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN:9781450300551
DOI:10.1145/1835804

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SVM
  2. block minimization
  3. large scale learning

Qualifiers

  • Research-article

Conference

KDD '10
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Automated Disaster Monitoring From Social Media Posts Using AI-Based Location Intelligence and Sentiment Analysis. IEEE Transactions on Computational Social Systems 11:4 (4614-4624). DOI: 10.1109/TCSS.2022.3157142. Online publication date: Aug-2024.
  • (2024) An Innovative Way of Analyzing COVID Topics with LLM. Journal of Economy and Technology. DOI: 10.1016/j.ject.2024.11.004. Online publication date: Nov-2024.
  • (2023) A New Social Media Analytics Method for Identifying Factors Contributing to COVID-19 Discussion Topics. Information 14:10 (545). DOI: 10.3390/info14100545. Online publication date: 5-Oct-2023.
  • (2022) Automated Analysis of Australian Tropical Cyclones with Regression, Clustering and Convolutional Neural Network. Sustainability 14:16 (9830). DOI: 10.3390/su14169830. Online publication date: 9-Aug-2022.
  • (2022) A New Decision Support System for Analyzing Factors of Tornado Related Deaths in Bangladesh. Sustainability 14:10 (6303). DOI: 10.3390/su14106303. Online publication date: 22-May-2022.
  • (2021) Knowledge Discovery of Global Landslides Using Automated Machine Learning Algorithms. IEEE Access 9 (131400-131419). DOI: 10.1109/ACCESS.2021.3115043. Online publication date: 2021.
  • (2018) Faster learning by reduction of data access time. Applied Intelligence 48:12 (4715-4729). DOI: 10.1007/s10489-018-1235-x. Online publication date: 1-Dec-2018.
  • (2017) Group-Based Alternating Direction Method of Multipliers for Distributed Linear Classification. IEEE Transactions on Cybernetics 47:11 (3568-3582). DOI: 10.1109/TCYB.2016.2570808. Online publication date: Nov-2017.
  • (2016) Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (1875-1884). DOI: 10.1145/2939672.2939864. Online publication date: 13-Aug-2016.
  • (2016) Support vector analysis of large-scale data based on kernels with iteratively increasing order. The Journal of Supercomputing 72:9 (3297-3311). DOI: 10.1007/s11227-015-1404-1. Online publication date: 1-Sep-2016.
