Browse Datasets
Sort by # Views, desc
Adult
Predict whether annual income of an individual exceeds $50K/yr based on census data. Also known as "Census Income" dataset.
Bank Marketing
The data is related with direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit (variable y).
Student Performance
Predict student performance in secondary education (high school).
Predict Students' Dropout and Academic Success
A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and social-economic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models to predict students' dropout and academic sucess. The problem is formulated as a three category classification task, in which there is a strong imbalance towards one of the classes.
Default of Credit Card Clients
This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods.
CDC Diabetes Health Indicators
The Diabetes Health Indicators Dataset contains healthcare statistics and lifestyle survey information about people in general along with their diagnosis of diabetes. The 35 features consist of some demographics, lab test results, and answers to survey questions for each patient. The target variable for classification is whether a patient has diabetes, is pre-diabetic, or healthy.
Census Income
Predict whether income exceeds $50K/yr based on census data. Also known as Adult dataset.
Higher Education Students Performance Evaluation
The data was collected from the Faculty of Engineering and Faculty of Educational Sciences students in 2019. The purpose is to predict students' end-of-term performances using ML techniques.
National Poll on Healthy Aging (NPHA)
This is a subset of the NPHA dataset filtered down to develop and validate machine learning algorithms for predicting the number of doctors a survey respondent sees in a year. This dataset’s records represent seniors who responded to the NPHA survey.
Drug Consumption (Quantified)
Classify type of drug consumer by personality data
0 to 10 of 22