Departmental Technical Reports (CS)

Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation

Afshin Gholamy, The University of Texas at El Paso
Vladik Kreinovich, The University of Texas at El Paso
Olga Kosheleva, The University of Texas at El Paso

Publication Date

2-2018

Comments

Technical Report: UTEP-CS-18-09

Abstract

When learning a dependence from data, to avoid overfitting, it is important to divide the data into the training set and the testing set. We first train our model on the training set, and then we use the data from the testing set to gauge the accuracy of the resulting model. Empirical studies show that the best results are obtained if we use 20-30% of the data for testing, and the remaining 70-80% of the data for training. In this paper, we provide a possible explanation for this empirical result.
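The procedure described in the abstract (split the data, train on one part, gauge accuracy on the other) can be illustrated with a minimal sketch. This is not code from the report: the helper names `train_test_split`, `fit_line`, and `mean_squared_error`, the synthetic linear data, and the choice of a 20% testing fraction are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and return (train, test) lists.

    Hypothetical helper; test_fraction=0.2 corresponds to an 80/20 split.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_squared_error(model, points):
    """Average squared prediction error of the fitted line on `points`."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


# Synthetic data (an assumption for this sketch): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in range(100)]

# Train only on the training set, then gauge accuracy on the held-out testing set.
train, test = train_test_split(data, test_fraction=0.2)
model = fit_line(train)
test_mse = mean_squared_error(model, test)

print(len(train), len(test))
print(test_mse)
```

Changing `test_fraction` to 0.3 gives the 70/30 split mentioned in the abstract; the sketch only shows the mechanics of the split and does not reproduce the report's explanation of why these proportions work well.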

Included in

Computer Sciences Commons
