Benchmarking Classification Algorithms on High-Performance Computing Clusters

Bernd Bischl,
Julia Schiffner &
Claus Weihs

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Comparing and benchmarking classification algorithms is an important topic in applied data analysis. Extensive and thorough studies of this kind impose a considerable computational burden and are therefore best delegated to high-performance computing clusters. We build upon our recently developed R packages BatchJobs (Map, Reduce and Filter operations from functional programming for clusters) and BatchExperiments (parallelization and management of statistical experiments). Using these two packages, such experiments can now be performed effectively and reproducibly with minimal effort for the researcher. We present benchmarking results for standard classification algorithms and study the influence of pre-processing steps on their performance.
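The Map, Reduce and Filter pattern named in the abstract can be illustrated with a minimal sketch. This is not the BatchJobs or BatchExperiments API: Python built-ins stand in for the cluster operations, everything runs sequentially in one process, and the toy data sets, the two learners, and all function names (`run_job`, `accumulate`, and so on) are hypothetical choices made only for illustration.

```python
from functools import reduce
from itertools import product

# Hypothetical toy 1-D data sets: lists of (feature, label) pairs.
DATASETS = {
    "separable": [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)],
    "overlap": [(0.0, 0), (4.0, 0), (6.0, 0), (5.0, 1), (7.0, 1), (10.0, 1)],
}

def majority(train):
    """Baseline learner: always predict the most frequent training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

def nearest_centroid(train):
    """Predict the class whose training mean is closest to x."""
    centroids = {}
    for c in {y for _, y in train}:
        xs = [x for x, y in train if y == c]
        centroids[c] = sum(xs) / len(xs)
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

LEARNERS = {"majority": majority, "nearest_centroid": nearest_centroid}

def run_job(job):
    """One job: leave-one-out error of one learner on one data set."""
    data_name, learner_name = job
    data = DATASETS[data_name]
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        predict = LEARNERS[learner_name](train)
        errors += int(predict(x) != y)
    return {"data": data_name, "learner": learner_name,
            "error": errors / len(data)}

# Map: one independent job per (data set, learner) combination.
jobs = list(product(sorted(DATASETS), sorted(LEARNERS)))
results = list(map(run_job, jobs))

# Filter: select a subset of job results, here those for one data set.
overlap_results = list(filter(lambda r: r["data"] == "overlap", results))

# Reduce: fold all job results into per-learner (error sum, job count).
def accumulate(acc, r):
    total, n = acc.get(r["learner"], (0.0, 0))
    acc[r["learner"]] = (total + r["error"], n + 1)
    return acc

totals = reduce(accumulate, results, {})
mean_error = {name: s / n for name, (s, n) in totals.items()}
print(mean_error)
```

Because each job depends only on its own (data set, learner) pair, the map step is the part that a cluster scheduler could distribute across nodes; the filter and reduce steps then operate on the collected results.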

Author information

Authors and Affiliations

Computational Statistics, Department of Statistics, TU Dortmund, Dortmund, Germany
Bernd Bischl (Chair), Julia Schiffner (Chair) & Claus Weihs (Chair)

Corresponding author

Correspondence to Bernd Bischl.

Editor information

Editors and Affiliations

Faculty of Computer Science, Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany
Myra Spiliopoulou
Institute of Computer Science, University of Hildesheim, Hildesheim, Germany
Lars Schmidt-Thieme
Institute of Computer Science, University of Hildesheim, Hildesheim, Germany
Ruth Janning

About this paper

Cite this paper

Bischl, B., Schiffner, J., Weihs, C. (2014). Benchmarking Classification Algorithms on High-Performance Computing Clusters. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_3

DOI: https://doi.org/10.1007/978-3-319-01595-8_3
Published: 10 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01594-1
Online ISBN: 978-3-319-01595-8
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
