Authors:
Paraskevas Koukaras, Vasiliki Tsichli and Christos Tjortjis
Affiliation:
School of Science and Technology, International Hellenic University, 14th km Thessaloniki–N. Moudania, Thermi, 57001, Thessaloniki, Greece
Keyword(s):
Social Media, Prediction, Machine Learning, Data Science, Stocks.
Abstract:
Microblogging data analysis and sentiment extraction have become a popular approach to market prediction. However, such data are noisy, and it is difficult to distinguish genuinely informative content. In this work, we collected 782,459 tweets posted between 1 November 2018 and 31 July 2019. For each day, we created a graph describing users and their followers (271 graphs in total). From each graph we computed PageRank scores, which were multiplied by the sentiment data. Findings indicate that an importance-based measure such as PageRank can improve the predictive performance of the applied models. We validated this approach on three datasets (PageRank, economic and sentiment). On average, models using the PageRank dataset achieved a lower mean squared error than those using the economic or the sentiment dataset. Finally, among the machine learning models tested, XGBoost performed best, random forest second best, and LSTM worst.
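The core idea of weighting sentiment by a user's importance in a daily follower graph can be illustrated with a small sketch. The abstract does not state the edge direction, the sentiment scale, or how weighted scores are aggregated per day, so everything below is an assumption: edges point from a follower to the user they follow, sentiment is a per-tweet polarity score, and the daily feature is the sum of PageRank × sentiment. The function names `pagerank` and `weighted_daily_sentiment` are hypothetical, and PageRank is computed here with a plain power iteration rather than any particular library.

```python
"""Hypothetical sketch: PageRank-weighted daily sentiment (assumptions labeled)."""


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a directed graph.

    ASSUMPTION: each edge is (follower, followed), so followed users gain rank.
    Dangling nodes (no out-links) spread their rank uniformly over all nodes.
    """
    out_links, nodes = {}, set()
    for src, dst in edges:
        out_links.setdefault(src, set()).add(dst)
        nodes.update((src, dst))
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank mass held by nodes with no out-links is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def weighted_daily_sentiment(edges, tweet_sentiment):
    """Sum of PageRank(user) * sentiment over one day's (user, score) pairs.

    ASSUMPTION: users missing from the day's graph get weight 0.
    """
    rank = pagerank(edges)
    return sum(rank.get(user, 0.0) * score for user, score in tweet_sentiment)


if __name__ == "__main__":
    day_edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
    day_tweets = [("bob", 0.8), ("alice", -0.2), ("carol", 0.5)]
    print(round(weighted_daily_sentiment(day_edges, day_tweets), 4))
```

In this toy graph, `bob` is followed by two users and so receives the largest weight, meaning his positive tweet dominates the daily score. One such value per day could then serve as a feature alongside economic data, though the exact feature construction used in the study is not given in the abstract.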