DOI: 10.1109/ICDM.2005.90
Article

Making Logistic Regression a Core Data Mining Tool with TR-IRLS

Published: 27 November 2005

Abstract

Binary classification is a core data mining task. For large datasets or real-time applications, desirable classifiers are accurate, fast, and need no parameter tuning. We present a simple implementation of logistic regression that meets these requirements. A combination of regularization, truncated Newton methods, and iteratively re-weighted least squares makes it faster and more accurate than modern SVM implementations, and relatively insensitive to parameters. It is robust to linear dependencies and some scaling problems, making most data preprocessing unnecessary.
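
The three ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an L2 (ridge) penalty as the regularizer and a fixed cap on linear conjugate gradient steps as the truncation rule, and every name in it (fit_logistic_irls, truncated_cg, lam, cg_iters, and so on) is hypothetical. Each outer IRLS step solves (X^T W X + lam I) beta = X^T W z only approximately, where W holds the weights mu(1 - mu) and z is the adjusted dependent variable.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hessian_vec(X, w, lam, v):
    # Compute (X^T W X + lam I) v without forming the matrix explicitly.
    out = [lam * vj for vj in v]
    for row, wi in zip(X, w):
        c = wi * dot(row, v)
        for j, xij in enumerate(row):
            out[j] += xij * c
    return out

def truncated_cg(X, w, lam, b, x0, max_iters, tol):
    # Approximately solve (X^T W X + lam I) x = b using at most
    # max_iters conjugate gradient steps, warm-started at x0.
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, hessian_vec(X, w, lam, x))]
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iters):
        if math.sqrt(rs) <= tol:
            break
        Ap = hessian_vec(X, w, lam, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def fit_logistic_irls(X, y, lam=1.0, outer_iters=30, cg_iters=20,
                      cg_tol=1e-8, tol=1e-6):
    # Ridge-regularized logistic regression fitted by IRLS, with each
    # weighted least squares subproblem solved by truncated CG.
    # Assumption: lam must be positive so the system is positive definite.
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(outer_iters):
        eta = [dot(row, beta) for row in X]
        mu = [sigmoid(e) for e in eta]
        w = [max(m * (1.0 - m), 1e-10) for m in mu]
        # W z, where z = eta + (y - mu) / w is the adjusted dependent variable.
        wz = [wi * ei + (yi - mi) for wi, ei, yi, mi in zip(w, eta, y, mu)]
        b = [sum(row[j] * v for row, v in zip(X, wz)) for j in range(d)]
        new_beta = truncated_cg(X, w, lam, b, beta, cg_iters, cg_tol)
        change = max(abs(a - c) for a, c in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta

if __name__ == "__main__":
    # Toy data: first column is a constant intercept feature.
    X = [[1.0, float(i)] for i in range(6)]
    y = [0, 0, 1, 0, 1, 1]
    print(fit_logistic_irls(X, y, lam=0.1))
```

In this sketch the Hessian is only ever touched through matrix-vector products, which is what makes the approach attractive for large sparse data, and the ridge term keeps the linear system positive definite even when columns of X are linearly dependent.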

Published In

ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining
November 2005
837 pages
ISBN:0769522785

Publisher

IEEE Computer Society

United States

Cited By

  • (2018) Emerging trend of big data analytics in bioinformatics. International Journal of Bioinformatics Research and Applications 14:1-2 (144-205). DOI: 10.5555/3192082.3192091. Online publication date: 1-Jan-2018
  • (2017) Body orientation estimation with the ensemble of logistic regression classifiers. Multimedia Tools and Applications 76:22 (23589-23605). DOI: 10.1007/s11042-016-4129-0. Online publication date: 1-Nov-2017
  • (2014) omniClassifier. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (514-523). DOI: 10.1145/2649387.2649439. Online publication date: 20-Sep-2014
  • (2014) Transfer Learning for Emotional Polarity Classification. Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 02 (94-101). DOI: 10.1109/WI-IAT.2014.85. Online publication date: 11-Aug-2014
  • (2012) Collective context-aware topic models for entity disambiguation. Proceedings of the 21st international conference on World Wide Web (729-738). DOI: 10.1145/2187836.2187935. Online publication date: 16-Apr-2012
  • (2012) Scalable subspace logistic regression models for high dimensional data. Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications (685-694). DOI: 10.1007/978-3-642-29253-8_65. Online publication date: 11-Apr-2012
  • (2009) Integrating genomic data and topological metrics to obtain reliable protein-protein interactions. Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 5 (57-60). DOI: 10.5555/1801874.1801887. Online publication date: 14-Aug-2009
  • (2009) Learning from positive and unlabeled documents for retrieval of bacterial protein-protein interaction literature. Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology (62-70). DOI: 10.1007/978-3-642-13131-8_8. Online publication date: 28-Jun-2009
  • (2008) Contextual advertising by combining relevance with click feedback. Proceedings of the 17th international conference on World Wide Web (417-426). DOI: 10.1145/1367497.1367554. Online publication date: 21-Apr-2008
