Abstract
Several experiments were conducted to apply recently proposed statistical procedures that are recommended for analysing multiple 1×n (one control algorithm against all others) and n×n (all pairs) comparisons of machine learning algorithms. Eleven regression algorithms implemented in the data mining system KEEL were employed: 5 deterministic algorithms and 6 neural network algorithms. All experiments were performed on 29 benchmark datasets for regression. The investigation demonstrated the usefulness and strength of multiple comparison statistical procedures for analysing and selecting machine learning algorithms.
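The abstract does not spell out the procedures. One widely used pipeline for comparing several algorithms over many datasets works in three steps. First, rank the algorithms on each dataset. Second, test the average ranks with the Friedman test. Third, run post-hoc comparisons whose p-values are corrected for multiplicity. The sketch below is a minimal stdlib-only Python illustration of that pipeline, not the paper's implementation. It assumes that lower error is better and uses Holm's step-down correction for both the 1×n (control vs. others) and n×n (all pairs) cases. All function names and the toy data are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def rank_row(errors):
    """Rank one dataset's errors: rank 1 = lowest error; tied values share the average rank."""
    order = sorted(range(len(errors)), key=lambda j: errors[j])
    ranks = [0.0] * len(errors)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # sorted positions i..j (0-based) share their mean rank
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman(results):
    """results[d][j] = error of algorithm j on dataset d; returns (average ranks, chi2_F, p-value)."""
    n, k = len(results), len(results[0])
    rank_rows = [rank_row(row) for row in results]
    avg = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0)
    return avg, chi2, chi2_sf(chi2, k - 1)

def z_pvalue(ri, rj, k, n):
    """Two-sided normal p-value for the difference between two average Friedman ranks."""
    se = math.sqrt(k * (k + 1) / (6.0 * n))
    return math.erfc(abs(ri - rj) / se / math.sqrt(2.0))

def holm_adjust(pvals):
    """Holm step-down adjusted p-values for a dict {hypothesis: raw p-value}."""
    m = len(pvals)
    adjusted, running = {}, 0.0
    for i, (h, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1])):
        running = max(running, min(1.0, (m - i) * p))  # enforce monotone adjusted p-values
        adjusted[h] = running
    return adjusted

def one_vs_all(avg, n, control):
    """1×n comparisons: every algorithm against a control algorithm, Holm-adjusted."""
    k = len(avg)
    raw = {j: z_pvalue(avg[j], avg[control], k, n) for j in range(k) if j != control}
    return holm_adjust(raw)

def all_pairs(avg, n):
    """n×n comparisons: all k(k-1)/2 pairs of algorithms, Holm-adjusted."""
    k = len(avg)
    raw = {(i, j): z_pvalue(avg[i], avg[j], k, n) for i in range(k) for j in range(i + 1, k)}
    return holm_adjust(raw)

if __name__ == "__main__":
    names = ["A", "B", "C", "D"]  # hypothetical algorithm labels
    # Made-up errors (rows = datasets, columns = algorithms); illustrative only, not the paper's results.
    errors = [
        [0.10, 0.12, 0.15, 0.20],
        [0.30, 0.31, 0.35, 0.40],
        [0.05, 0.07, 0.06, 0.09],
        [0.22, 0.25, 0.27, 0.26],
        [0.11, 0.13, 0.14, 0.19],
        [0.40, 0.42, 0.41, 0.50],
    ]
    avg, chi2, p = friedman(errors)
    print("average ranks:", dict(zip(names, avg)))
    print(f"Friedman chi2 = {chi2:.3f}, p = {p:.4g}")
    control = min(range(len(avg)), key=avg.__getitem__)  # best-ranked algorithm as control
    for j, p_adj in one_vs_all(avg, len(errors), control).items():
        print(f"{names[control]} vs {names[j]}: Holm-adjusted p = {p_adj:.4g}")
```

Holm is used here only because it is simple and valid in both the 1×n and n×n settings. For all-pairs comparisons, more powerful corrections exist, such as Shaffer's procedure or Bergmann and Hommel's procedure, which exploit logical relations among the pairwise hypotheses.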
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Graczyk, M., Lasota, T., Telec, Z., Trawiński, B. (2010). Nonparametric Statistical Analysis of Machine Learning Algorithms for Regression Problems. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15387-7_15
DOI: https://doi.org/10.1007/978-3-642-15387-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15386-0
Online ISBN: 978-3-642-15387-7
eBook Packages: Computer Science (R0)