Benchmarking Classification Algorithms on High-Performance Computing Clusters

  • Conference paper
Data Analysis, Machine Learning and Knowledge Discovery

Abstract

Comparing and benchmarking classification algorithms is an important topic in applied data analysis. Extensive, thorough studies of this kind carry a considerable computational burden and are therefore best delegated to high-performance computing clusters. We build on our recently developed R packages BatchJobs (Map, Reduce and Filter operations from functional programming for clusters) and BatchExperiments (parallelization and management of statistical experiments). With these two packages, such experiments can be performed efficiently and reproducibly with minimal effort for the researcher. We present benchmarking results for standard classification algorithms and study the influence of pre-processing steps on their performance.
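The Map, Filter and Reduce pattern named in the abstract can be illustrated with a small sketch. It is written in Python with only the standard library and is not the BatchJobs or BatchExperiments API: a local thread pool stands in for the cluster, and the synthetic data, the two toy learners and all function names (`make_data`, `run_job`, `LEARNERS`) are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import product
import random


def make_data(seed, n=200):
    """Synthetic two-class data (assumption): one feature, class means 0 and 2."""
    rng = random.Random(seed)
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]


def majority(train):
    """Baseline learner: always predict the most frequent training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label


def nearest_mean(train):
    """Toy learner: predict the class whose training mean is closest."""
    means = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means[c] = sum(xs) / len(xs)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


LEARNERS = {"majority": majority, "nearest_mean": nearest_mean}


def run_job(job):
    """One experiment: fit on the first half, report test error on the second."""
    seed, learner = job
    data = make_data(seed)
    train, test = data[:100], data[100:]
    predict = LEARNERS[learner](train)
    error = sum(predict(x) != y for x, y in test) / len(test)
    return {"seed": seed, "learner": learner, "error": error}


# Experiment grid: every replication seed crossed with every learner.
jobs = list(product(range(5), LEARNERS))

# Map: run each job independently (a thread pool stands in for a cluster).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_job, jobs))

# Filter: select the subset of results belonging to one learner.
nearest_only = list(filter(lambda r: r["learner"] == "nearest_mean", results))


def collect(acc, r):
    """Reduce step: group error rates by learner."""
    acc.setdefault(r["learner"], []).append(r["error"])
    return acc


# Reduce: fold all job results into one summary per learner.
by_learner = reduce(collect, results, {})
mean_error = {name: sum(errs) / len(errs) for name, errs in by_learner.items()}
print(mean_error)
```

Because each job depends only on its own seed and learner name, jobs can run in any order and on any worker, and a fixed seed per job keeps the run reproducible. A pre-processing step could be added as a third factor of the grid in the same way.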

Author information

Corresponding author

Correspondence to Bernd Bischl.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bischl, B., Schiffner, J., Weihs, C. (2014). Benchmarking Classification Algorithms on High-Performance Computing Clusters. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_3