Ahmad 2017 Ijca 915758
3 authors:
Iftikhar Ali
Virtual University of Pakistan
All content following this page was uploaded by Shabib Aftab on 21 November 2017.
International Journal of Computer Applications (0975 – 8887)
Volume 177 – No.5, November 2017
According to the authors, machine learning algorithms such as Naïve Bayes, Maximum Entropy and SVM can achieve an accuracy of more than 80% when trained with emotion tweets. The study also highlighted the preprocessing steps used to achieve high classification accuracy.

In [24], the authors presented an application of Arabic sentiment analysis on Twitter data. They analyzed 1000 tweets for polarity detection using two machine learning techniques, NB and SVM. In the proposed approach, feature vectors were applied to machine learning classifiers for higher accuracy. The authors also pointed out some problem areas in the training data, such as multiple occurrences of tweets, opinion spamming and dual-opinion tweets. These issues could call the level of achieved accuracy into question.

In [25], the authors used three machine learning algorithms, Naïve Bayes, Decision Trees and Support Vector Machine, for sentiment classification of an Arabic dataset obtained from Twitter. This research followed a framework for Arabic tweet classification in which two special sub-tasks were performed in pre-processing: Term Frequency-Inverse Document Frequency (TF-IDF) and Arabic stemming. They used one dataset with three algorithms, and performance was evaluated on the basis of three information retrieval metrics: precision, recall and f-measure.

In [26], the authors proposed an efficient feature vector technique that divides the feature extraction process into two steps after preprocessing. In the first step, Twitter-specific features are extracted and added to the feature vector. These features are then removed from the tweets, and feature extraction is performed again as for normal text. These extracted features are also added to the feature vector. The accuracy of the proposed feature vector technique is the same for Naïve Bayes, SVM, Maximum Entropy and Ensemble classifiers. However, this technique performed well for the domain of electronic products.

In [27], the authors proposed an adaptive multiclass SVM model that works with a topic-adaptive sentiment classifier. The authors focused on non-text features to handle the sparsity of tweets. An iterative algorithm is proposed, consisting of three steps: optimization, unlabeled data selection and adaptive feature expansion. With tweets on 6 topics, the proposed algorithm achieved promisingly high accuracy compared to other well-known supervised and semi-supervised classifiers.

The authors in [28] focused on the polarity of hashtags as a classification feature of tweets in the political domain. They proposed rules for automatic dataset labeling based on positive and negative hashtags, and finally proposed a method to enrich the terms in a tweet by hashtag term extraction. The authors highlighted that the use of positive and negative hashtags for dataset labeling and sentiment classification has an accuracy of more than 95%. Moreover, this hashtag feature outperforms the unigram feature when combined with Naïve Bayes, SVM or Logistic Regression algorithms, but the accuracy decreases when it is combined with the Random Forest algorithm, based on the computing time to build the model.

In [29], three data mining techniques are used to predict and analyze students' academic performance. The authors used Decision Tree (C4.5), Multilayer Perceptron (MLP) and Naïve Bayes. All these techniques were applied to student data collected from 2 undergraduate courses over two semesters. According to the results, Naïve Bayes showed a prediction accuracy of 86%, which was higher than that of MLP and Decision Tree. With this type of prediction, teachers can detect early those students who are expected to get an F grade in the course. Ultimately, with the teacher's special care for those students, their academic performance can be improved.

3. MATERIALS AND METHODS
This paper aims to analyze the performance of Support Vector Machine (SVM) for polarity detection (positive, negative and neutral) of textual data. Two pre-labeled Twitter datasets are considered for this analysis. Pre-labeled tweets are chosen as test data so that the performance and accuracy of SVM can be analyzed. The output polarity for each tweet from this algorithm is compared to the pre-labeled class, and the difference is calculated by Weka. The performance is measured in terms of precision, recall and f-measure [1], [2], [3], [8].

3.1 Weka
In this study, we have used Weka [4], [7] for classification and performance analysis of SVM. It is one of the most widely used tools for analyzing the working of data mining and machine learning algorithms. Weka is developed in Java at the University of Waikato, New Zealand. It is widely accepted due to its easy-to-use GUI, and it is popular because of its portability and its General Public License.

3.2 Datasets
Two pre-labeled datasets of tweets are used in this research. The first dataset contains tweets about self-driving cars [5]. It contains 110 very negative, 685 slightly negative, 4245 neutral, 1444 slightly positive, 459 very positive and 213 irrelevant tweets.

Table 1. Twitter dataset for self-driving cars

Class              Tweets
Very Negative      110
Slightly Negative  685
Neutral            4245
Slightly Positive  1444
Very Positive      459
Irrelevant         213
Total              7156

The second dataset [6] contains tweets about Apple products (iPhone, iPod etc.). This dataset consists of 1218 negative, 2162 neutral, 423 positive and 81 irrelevant tweets.

Table 2. Twitter dataset for Apple products

Class       Tweets
Negative    1218
Neutral     2162
Positive    423
Irrelevant  81
Total       3884

The dataset or input phase of our classification approach includes the downloading of relevant datasets and transformation of this data into CSV/ARFF format to use in WEKA Workbench [4], [7]. We have used simple CLI to
convert text files into ARFF format by using
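As a rough illustration of the target format (not the Weka command used in this study), the following Python sketch writes labeled tweets as ARFF text with one string attribute and one nominal class attribute. The function name `to_arff`, the attribute names `text` and `sentiment`, and the relation name are assumptions made for this example.

```python
def quote(value: str) -> str:
    """Quote a string for ARFF, escaping backslashes and single quotes."""
    flat = " ".join(value.split())  # collapse newlines and repeated spaces
    escaped = flat.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_arff(relation, rows, classes):
    """Build ARFF text from (tweet_text, label) pairs.

    `classes` lists the allowed nominal labels, e.g. negative/neutral/positive.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute sentiment {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    for text, label in rows:
        if label not in classes:
            raise ValueError(f"unknown class label: {label!r}")
        lines.append(f"{quote(text)},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("I love it", "positive"), ("don't like", "negative")]
    print(to_arff("tweets", sample, ["negative", "neutral", "positive"]))
```

A file produced this way still needs a text-to-vector step (for example a word-vector filter) before a classifier such as SVM can be trained on it.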
5. RESULTS
This section focuses on the results and a comparative analysis of SVM on both datasets. For comparison, three evaluation measures are used in this study: precision, recall and f-measure.
Precision can be calculated from the numbers of true positives (TP) and false positives (FP) as
shown below:
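The standard definition is Precision = TP / (TP + FP), where TP and FP are counted per class; recall is TP / (TP + FN), and the f-measure is the harmonic mean of the two. A minimal Python sketch of these definitions follows, assuming the gold and predicted labels are available as two equal-length lists. The function name `per_class_scores` and the toy labels are illustrative assumptions, not values from this study.

```python
from collections import Counter


def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f_measure)} for two label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


if __name__ == "__main__":
    gold = ["pos", "pos", "neg", "neu", "neg"]
    pred = ["pos", "neg", "neg", "neu", "pos"]
    for label, (p, r, f) in per_class_scores(gold, pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Classes that are never predicted get a precision of 0.0 here by convention; other tools may report such cases differently.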
Apple 71.2%
[6] Crowdflower.com. (2017). [online] Available at: https://www.crowdflower.com/wp-content/uploads/2016/03/Apple-Twitter-Sentiment-DFE.csv [Accessed 15 Aug. 2017].
[7] Weka: http://www.cs.waikato.ac.nz/~ml/weka/
[8] Zainudin, S., Jasim, D. S., & Bakar, A. A. (2016). Comparative Analysis of Data Mining Techniques for Malaysian Rainfall Prediction. International Journal on Advanced Science, Engineering and Information Technology, 6(6), 1148-1153.
[9] Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1-135.
[10] Saif, H., He, Y., Fernandez, M., & Alani, H. (2016). Contextual semantics for sentiment analysis of Twitter. Information Processing & Management, 52(1), 5-19.
[11] Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), 1-167.
[12] Ahmad, M., Aftab, S., Muhammad, S. S., & Waheed, U. (2017). Tools and Techniques for Lexicon Driven Sentiment Analysis: A Review. Int. J. Multidiscip. Sci. Eng, 8(1), 17-23.
[13] Ahmad, M., Aftab, S., Muhammad, S. S., & Ahmad, S. (2017). Machine Learning Techniques for Sentiment Analysis: A Review. Int. J. Multidiscip. Sci. Eng, 8(3), 27-32.
[14] Mudinas, A., Zhang, D., & Levene, M. (2012, August). Combining lexicon and learning based approaches for concept-level sentiment analysis. In Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining (p. 5). ACM.
[15] Malandrakis, N., Kazemzadeh, A., Potamianos, A., & Narayanan, S. (2013, June). SAIL: A hybrid approach to sentiment analysis. In SemEval@NAACL-HLT (pp. 438-442).
[16] Balage Filho, P., & Pardo, T. (2013, June). NILC_USP: A Hybrid System for Sentiment Analysis in Twitter Messages. In SemEval@NAACL-HLT (pp. 568-572).
[17] “AlchemyAPI.” [Online]. Available: https://www.ibm.com/watson/alchemy-api.html.
[18] Ahmad, M., Aftab, S., Ali, I., & Hameed, N. (2017). Hybrid Tools and Techniques for Sentiment Analysis: A Review. Int. J. Multidiscip. Sci. Eng, 8(3).
[19] Cortes, C., & Vapnik, V. (1995). Support vector machine. Machine learning, 20(3), 273-297.
[20] Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics.
[21] Zgheib, W. A., & Barbar, A. M. A Study using Support Vector Machines to Classify the Sentiments of Tweets.
[22] Arora, R. (2012). Comparative analysis of classification algorithms on different datasets using WEKA. International Journal of Computer Applications, 54(13).
[23] Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), 12.
[24] Shoukry, A., & Rafea, A. (2012, May). Sentence-level Arabic sentiment analysis. In Collaboration Technologies and Systems (CTS), 2012 International Conference on (pp. 546-550). IEEE.
[25] Altawaier, M. M., & Tiun, S. (2016). Comparison of Machine Learning Approaches on Arabic Twitter Sentiment Analysis. International Journal on Advanced Science, Engineering and Information Technology, 6(6), 1067-1073.
[26] Neethu, M. S., & Rajasree, R. (2013, July). Sentiment analysis in twitter using machine learning techniques. In Computing, Communications and Networking Technologies (ICCCNT), 2013 Fourth International Conference on (pp. 1-5). IEEE.
[27] Liu, S., Li, F., Li, F., Cheng, X., & Shen, H. (2013, October). Adaptive co-training SVM for sentiment classification on tweets. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management (pp. 2079-2088). ACM.
[28] Alfina, I., Sigmawaty, D., Nurhidayati, F., & Hidayanto, A. N. (2017, February). Utilizing Hashtags for Sentiment Analysis of Tweets in The Political Domain. In Proceedings of the 9th International Conference on Machine Learning and Computing (pp. 43-47). ACM.
[29] Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and Predicting Students' Academic Performance Using Data Mining Techniques. International Journal of Modern Education and Computer Science, 8(11), 36.
[30] Isa, D., Lee, L. H., Kallimani, V. P., & Rajkumar, R. (2008). Text document preprocessing with the Bayes formula for classification using the support vector machine. IEEE Transactions on Knowledge and Data engineering, 20(9), 1264-1272.
IJCATM : www.ijcaonline.org