Abstract
The prevalence of Web browsing may lead to Internet Addiction Disorder (IAD), which negatively affects Web users' general health. Young people who are very active online are prone to IAD, which harms their academic performance and social lives. The earlier IAD is detected, the better the treatment. Therefore, this pilot study aimed to predict IAD among young people to encourage early treatment.
The sample comprised 30 undergraduate students at Universitas Indonesia (UI). Their Web browsing histories over five weeks were recorded from their laptops and analyzed using a support vector machine (SVM) with a radial basis function (RBF) kernel as the machine learning method for prediction. The SVM results were subsequently compared with those of ensemble learning methods such as random forest (RF) and gradient boosting (GB). Features were extracted from the Web browsing histories to classify activities into five types: information retrieval (IR), instant messaging (IM), social networking services (SNS), leisure, and online shopping (OS). The extracted features served as input for classifying participants' IAD status, and the predictions were matched against respondents' results on the Internet Addiction Test (IAT) questionnaire, which measures IAD levels. Respondents' general health data were collected with the 12-item General Health Questionnaire (GHQ-12). Machine learning was also employed to classify the input into respondents' general health (GH) status, which was matched against their responses to the GHQ-12 questionnaire.
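The feature extraction step described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the domain-to-category lookup (`DOMAIN_TO_CATEGORY`), the function names, and the choice of "share of visits per activity type" as the feature are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# The five activity types named in the abstract.
CATEGORIES = ["IR", "IM", "SNS", "leisure", "OS"]

# Hypothetical domain-to-category lookup, invented for illustration only.
DOMAIN_TO_CATEGORY = {
    "search.example": "IR",
    "chat.example": "IM",
    "social.example": "SNS",
    "video.example": "leisure",
    "shop.example": "OS",
}


def categorize(url):
    """Return the activity type for a URL, or None if the domain is unknown."""
    host = urlparse(url).hostname or ""
    return DOMAIN_TO_CATEGORY.get(host)


def activity_features(urls):
    """Turn one participant's visited URLs into a five-element feature vector.

    Each element is the share of categorized visits that fall into one
    activity type, in the order given by CATEGORIES. Visits to unknown
    domains are ignored (an assumption of this sketch).
    """
    counts = Counter(c for c in map(categorize, urls) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


# Toy browsing history for a single, made-up participant.
history = [
    "https://search.example/q?term=svm",
    "https://social.example/feed",
    "https://social.example/profile",
    "https://shop.example/cart",
    "https://unknown.example/",
]
features = activity_features(history)
print(dict(zip(CATEGORIES, features)))
```

One such vector per participant, paired with an IAT-based or GHQ-12-based label, could then be passed to a classifier, for instance scikit-learn's `SVC(kernel="rbf")` for the SVM with RBF kernel.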
The findings revealed prediction accuracies of 66.67% for IAD status and 65.17% for GH status using SVM. With RF, the accuracies for predicting IAD and GH were 63.33% and 44.33%, respectively; with GB, they were 63.33% and 67.17%. Thus, RF yielded lower prediction accuracies than SVM, whereas GB performed similarly to SVM.
With SVM and RF, IAD status was predicted more accurately than GH status, whereas GB gave a higher accuracy for GH status. One way to improve the outcomes is to obtain data from the Internet firewall instead of the Web browsing history on users' laptops. Firewall data can provide richer and more realistic records of Web access because they are collected from any device connected to the university's computer networks. However, this approach requires consent from the participants and from the authority managing the infrastructure. Provided that each class has a balanced number of examples, we plan to add more features and employ other types of ensemble learning to achieve higher accuracy. Furthermore, performing multiclass prediction could reveal specific IAD severity levels and the class of mental health status, i.e., anxiety and depression.