Nonparametric Statistical Analysis of Machine Learning Algorithms for Regression Problems

Magdalena Graczyk, Tadeusz Lasota, Zbigniew Telec & Bogdan Trawiński

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6276)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Several experiments were conducted to apply recently proposed statistical procedures recommended for analysing multiple 1×n and n×n comparisons of machine learning algorithms. Eleven regression algorithms, comprising 5 deterministic and 6 neural network ones, all implemented in the data mining system KEEL, were employed. All experiments were performed on 29 benchmark regression datasets. The investigation demonstrated the usefulness and strength of multiple comparison statistical procedures for analysing and selecting machine learning algorithms.
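As a rough illustration of what a 1×n comparison involves, the sketch below ranks the algorithms on each dataset, computes the Friedman statistic from the average ranks, and applies Holm's step-down adjustment to the p-values of a control algorithm against the others. The abstract does not say which procedures were used, so the choice of Friedman plus Holm, the function names, and the toy error values are all assumptions made for illustration; they are not the authors' setup or results.

```python
import math

def average_ranks(errors):
    """errors[d][a] is the error of algorithm a on dataset d (lower is better)."""
    n_datasets, k = len(errors), len(errors[0])
    totals = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda a: row[a])
        i = 0
        while i < k:
            j = i
            # Extend j over a run of tied values.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # tied entries share the mean rank
            for pos in range(i, j + 1):
                totals[order[pos]] += mean_rank
            i = j + 1
    return [t / n_datasets for t in totals]

def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic (no tie correction applied)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def control_p_values(avg_ranks, n_datasets, control=0):
    """Two-sided normal-approximation p-values of each algorithm vs. the control."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6 * n_datasets))
    return {
        a: math.erfc(abs(avg_ranks[a] - avg_ranks[control]) / se / math.sqrt(2))
        for a in range(k) if a != control
    }

def holm_adjust(p_values):
    """Holm step-down adjusted p-values for a dict {name: p}."""
    items = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted, running = {}, 0.0
    for i, (name, p) in enumerate(items):
        running = max(running, min(1.0, (m - i) * p))  # enforce monotonicity
        adjusted[name] = running
    return adjusted

# Hypothetical errors: 4 datasets (rows) x 3 algorithms (columns).
toy_errors = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.20],
    [0.12, 0.30, 0.40],
    [0.11, 0.21, 0.31],
]
ranks = average_ranks(toy_errors)
statistic = friedman_statistic(ranks, len(toy_errors))
adjusted = holm_adjust(control_p_values(ranks, len(toy_errors), control=0))
```

An n×n analysis would follow the same pattern but form p-values for every pair of algorithms rather than only for pairs involving the control, which increases the number of hypotheses that the adjustment has to account for.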


Author information

Authors and Affiliations

Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Magdalena Graczyk, Zbigniew Telec & Bogdan Trawiński
Dept. of Spatial Management, Wrocław University of Environmental and Life Sciences, ul. Norwida 25/27, 50-375, Wrocław, Poland
Tadeusz Lasota

Editor information

Editors and Affiliations

School of Engineering, The Parade, Cardiff University, CF24 3AA, Cardiff, UK
Rossitza Setchi
Dept. of Computer Science and Software Engineering, Buckingham Building, Lion Terrace, University of Portsmouth, PO1 3HE, Portsmouth, UK
Ivan Jordanov
KES International, 145-157, St. John Street, EC1V 4PY, London, UK
Robert J. Howlett
School of Electrical and Information Engineering, University of South Australia, Adelaide, Mawson Lakes Campus, 5095, SA, Australia
Lakhmi C. Jain

Cite this paper

Graczyk, M., Lasota, T., Telec, Z., Trawiński, B. (2010). Nonparametric Statistical Analysis of Machine Learning Algorithms for Regression Problems. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15387-7_15

DOI: https://doi.org/10.1007/978-3-642-15387-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15386-0
Online ISBN: 978-3-642-15387-7
eBook Packages: Computer Science, Computer Science (R0)
