Sentiment Analysis for Social Media

Carlos A. Iglesias (Editor)


Sentiment Analysis
for Social Media
Edited by
Carlos A. Iglesias and Antonio Moreno
Printed Edition of the Special Issue Published in Applied Sciences

www.mdpi.com/journal/applsci

Special Issue Editors


Carlos A. Iglesias
Antonio Moreno

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin
Special Issue Editors
Carlos A. Iglesias
Departamento de Ingeniería de Sistemas Telemáticos,
ETSI Telecomunicación
Spain

Antonio Moreno
Departament d’Enginyeria Informàtica i Matemàtiques,
Escola Tècnica Superior d’Enginyeria, Universitat Rovira i Virgili (URV)
Spain

Editorial Office
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal
Applied Sciences (ISSN 2076-3417) (available at:
https://www.mdpi.com/journal/applsci/special_issues/Sentiment_Social_Media).

For citation purposes, cite each article independently as indicated on the article page online and as
indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number,
Page Range.

ISBN 978-3-03928-572-3 (Pbk)


ISBN 978-3-03928-573-0 (PDF)


© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative
Commons Attribution (CC BY) license, which allows users to download, copy and build upon
published articles, as long as the author and publisher are properly credited, which ensures maximum
dissemination and a wider impact of our publications.
The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons
license CC BY-NC-ND.
Contents

About the Special Issue Editors

Carlos A. Iglesias and Antonio Moreno
Sentiment Analysis for Social Media
Reprinted from: Appl. Sci. 2019, 9, 5037, doi:10.3390/app9235037

Hyoji Ha, Hyunwoo Han, Seongmin Mun, Sungyun Bae, Jihye Lee and Kyungwon Lee
An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment
Word of Movie Review Data
Reprinted from: Appl. Sci. 2019, 9, 2419, doi:10.3390/app9122419

Hannah Kim and Young-Seob Jeong
Sentiment Classification Using Convolutional Neural Networks
Reprinted from: Appl. Sci. 2019, 9, 2347, doi:10.3390/app9112347

Xingliang Mao, Shuai Chang, Jinjing Shi, Fangfang Li and Ronghua Shi
Sentiment-Aware Word Embedding for Emotion Classification
Reprinted from: Appl. Sci. 2019, 9, 1334, doi:10.3390/app9071334

Mohammed Jabreel and Antonio Moreno
A Deep Learning-Based Approach for Multi-Label Emotion Classification in Tweets
Reprinted from: Appl. Sci. 2019, 9, 1123, doi:10.3390/app9061123

Eline M. van den Broek-Altenburg and Adam J. Atherly
Using Social Media to Identify Consumers’ Sentiments towards Attributes of Health Insurance
during Enrollment Season
Reprinted from: Appl. Sci. 2019, 9, 2035, doi:10.3390/app9102035

Sunghee Park and Jiyoung Woo
Gender Classification Using Sentiment Analysis and Deep Learning in a Health Web Forum
Reprinted from: Appl. Sci. 2019, 9, 1249, doi:10.3390/app9061249

Hui Liu, Yinghui Huang, Zichao Wang, Kai Liu, Xiangen Hu and Weijun Wang
Personality or Value: A Comparative Study of Psychographic Segmentation Based on an Online
Review Enhanced Recommender System
Reprinted from: Appl. Sci. 2019, 9, 1992, doi:10.3390/app9101992

Guadalupe Obdulia Gutiérrez-Esparza, Maite Vallejo-Allende and José Hernández-Torruco
Classification of Cyber-Aggression Cases Applying Machine Learning
Reprinted from: Appl. Sci. 2019, 9, 1828, doi:10.3390/app9091828

About the Special Issue Editors
Carlos A. Iglesias (Professor). Prof. Carlos Iglesias is an Associate Professor at the Universidad
Politécnica de Madrid. He holds a Ph.D. in Telecommunication Engineering. He was previously
Deputy Director at Grupo Gesfor and Innovation Director at Germinus XXI. He has been actively
involved in research projects funded by private companies as well as national and European
programs. His research interests are focused on intelligent systems (knowledge engineering,
multi-agent systems, machine learning, and natural language processing).

Antonio Moreno (Professor). Dr. Moreno is a Full Professor of Artificial Intelligence at
University Rovira i Virgili (URV) in Tarragona, Spain. He was the founder and director of the
ITAKA (Intelligent Technologies for Advanced Knowledge Acquisition) research group until 2019.
Since 2018, he has been the Deputy Director of URV Engineering School. He has been the author
of more than 60 journal papers and over 125 conference papers. He has supervised 8 Ph.D. theses
on different topics, including ontology learning, agents applied in health care, intelligent data
analysis applied on healthcare data, recommender systems, and multi-criteria decision making. His
current research interests are focused on sentiment analysis, recommender systems, and multi-criteria
decision support systems.

Editorial
Sentiment Analysis for Social Media
Carlos A. Iglesias 1, *,† and Antonio Moreno 2,†
1 Intelligent Systems Group, ETSI Telecomunicación, Avda. Complutense 30, 28040 Madrid, Spain
2 Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) Group,
Escola Tècnica Superior d’Enginyeria, Departament d’Enginyeria Informàtica i Matemàtiques,
Universitat Rovira i Virgili, 43007 Tarragona, Spain; antonio.moreno@urv.cat
* Correspondence: carlosangel.iglesias@upm.es; Tel.: +34-910671900
† These authors contributed equally to this work.

Received: 31 October 2019; Accepted: 19 November 2019; Published: 22 November 2019

Abstract: Sentiment analysis has become a key technology to gain insight from social networks.
The field has reached a level of maturity that paves the way for its exploitation in many different
fields such as marketing, health, banking or politics. The latest technological advancements, such as
deep learning techniques, have solved some of the traditional challenges in the area caused by the
scarcity of lexical resources. In this Special Issue, different approaches that advance this discipline
are presented. The contributed articles belong to two broad groups: technological contributions
and applications.

Keywords: sentiment analysis; emotion analysis; social media; affect computing

1. Introduction
Sentiment analysis technologies enable the automatic analysis of the information distributed
through social media to identify the polarity of posted opinions [1]. These technologies have been
extended in recent years to analyze other aspects, such as the stance of a user towards a topic [2]
or the users’ emotions [3], even combining text analytics with other inputs, including multimedia
analysis [4] or social network analysis [5].
This Special Issue, “Sentiment Analysis for Social Media”, aims to reflect recent developments in
the field and to present advances that enable the development of future sentiment analysis and
social media monitoring methods. The following sections detail the selected works on new
techniques as well as new applications.

2. New Paths in Sentiment Analysis on Social Media


Traditionally, sentiment analysis has focused on text analysis using Natural Language Processing
and feature-based Machine Learning techniques. The advances in disciplines such as Big Data and
Deep Learning technologies have impacted and benefited the evolution of the field. This Special Issue
includes four works that propose novel techniques.
In the first work, titled “An Improved Study of Multilevel Semantic Network Visualization for Analyzing
Sentiment Word of Movie Review Data” [6], Ha et al. propose a method for sentiment visualization in
massive social media. For this purpose, they design a multi-level sentiment network visualization
mechanism based on emotional words in the movie review domain. They propose three visualization
methods: a heatmap visualization of the semantic words of every node, a two-dimensional scaling
map of semantic word data, and a constellation visualization using asterism images for each cluster
of the network. The proposed visualizations have been used as a recommender system that suggests
movies with emotions similar to those of previously watched ones. This novel idea of recommending
content based on similar emotional patterns can be applied to other social networks.

Appl. Sci. 2019, 9, 5037; doi:10.3390/app9235037; www.mdpi.com/journal/applsci

In the second contribution, titled “Sentiment Classification Using Convolutional Neural Networks” [7],
Kim and Jeong deal with the problem of textual sentiment classification. They propose a Convolutional
Neural Network (CNN) model consisting of an embedding layer, two convolutional layers, a pooling
layer, and a fully-connected layer. The model is evaluated in three datasets (movie review data,
customer review data and Stanford Sentiment Treebank data) and compared with traditional Machine
Learning models and state-of-the-art Deep Learning models. Their main conclusion is that the use of
consecutive convolutional layers is effective for relatively long texts.
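To make the layer sequence in this description concrete, the following is a minimal forward-pass sketch in plain Python. The toy weights, the tiny dimensions, the ReLU after each convolution, the use of global max pooling as the pooling layer, and the sigmoid output are all illustrative assumptions; they are not the configuration reported in [7].

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = channels


def conv1d(x: Matrix, kernels: List[Matrix], biases: List[float]) -> Matrix:
    """Valid 1D convolution over time followed by ReLU.

    kernels[f][j][c] is the weight of filter f at offset j for input channel c.
    """
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias
            for j in range(width):
                for c, w in enumerate(kernel[j]):
                    s += w * x[t + j][c]
            row.append(max(0.0, s))  # ReLU (assumed activation)
        out.append(row)
    return out


def global_max_pool(x: Matrix) -> List[float]:
    """Keep the maximum activation of each channel over time."""
    return [max(column) for column in zip(*x)]


def dense_sigmoid(features: List[float], weights: List[float], bias: float) -> float:
    """Fully-connected layer with a single sigmoid output."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(token_ids, embedding, conv1, conv2, dense) -> float:
    x = [embedding[i] for i in token_ids]  # embedding layer (table lookup)
    x = conv1d(x, *conv1)                  # first convolutional layer
    x = conv1d(x, *conv2)                  # second, consecutive convolutional layer
    pooled = global_max_pool(x)            # pooling layer
    return dense_sigmoid(pooled, *dense)   # fully-connected layer


# Hypothetical parameters: 4-word vocabulary, 2-dimensional embeddings.
embedding = [[0.1, 0.2], [0.5, -0.3], [-0.4, 0.9], [0.7, 0.1]]
conv1 = ([[[0.5, -0.2], [0.3, 0.8]], [[-0.6, 0.4], [0.2, 0.1]]], [0.0, 0.1])
conv2 = ([[[1.0, -0.5], [0.5, 0.5]]], [0.0])
dense = ([1.5], -0.2)

prob = classify([0, 2, 1, 3, 2], embedding, conv1, conv2, dense)
```

Stacking the two convolutions widens the span of tokens that each pooled feature can see (here three tokens instead of two), which is one plausible reading of why consecutive layers help on longer texts.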
In the third work, titled “Sentiment-Aware Word Embedding for Emotion Classification” [8], Mao et al.
suggest the use of a sentiment-aware word embedding for improving emotional analysis. The proposed
method builds a hybrid representation that combines emotional word embeddings based on
an emotional lexicon with semantic word embeddings based on Word2Vec [9]. They use the emotional
lexicon DUTIR, which is a Chinese ontology resource collated and labeled by the Dalian University
of Technology Information Retrieval Laboratory [10]. This resource annotates lexicon entries with
a model of seven emotions (happiness, trust, anger, sadness, fear, disgust and surprise). The evaluation
is done with data from Weibo, a popular Chinese social networking site. The paper evaluates two
methods (direct combination and addition) for building the hybrid representation in several datasets.
They conclude from their experiments that the use of hybrid word vectors is effective for supervised
emotion classification, significantly improving classification accuracy.
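The two ways of building the hybrid representation can be sketched as follows. This is a minimal illustration that assumes "direct combination" means concatenation and "addition" means an element-wise sum of equally sized vectors; the function names and the toy vectors are hypothetical and do not come from [8].

```python
from typing import List

Vector = List[float]


def combine_direct(semantic: Vector, emotional: Vector) -> Vector:
    """Direct combination: concatenate the semantic and emotional vectors."""
    return semantic + emotional


def combine_addition(semantic: Vector, emotional: Vector) -> Vector:
    """Addition: element-wise sum; both vectors must have the same length."""
    if len(semantic) != len(emotional):
        raise ValueError("addition requires vectors of equal length")
    return [s + e for s, e in zip(semantic, emotional)]


# Toy values for one word (not taken from the paper).
semantic_vec = [0.5, -1.0, 0.25]   # e.g., a Word2Vec-style embedding
emotional_vec = [1.0, 0.0, 0.75]   # e.g., an embedding derived from an emotion lexicon

hybrid_concat = combine_direct(semantic_vec, emotional_vec)
hybrid_sum = combine_addition(semantic_vec, emotional_vec)
```

Concatenation keeps the two sources of information in separate dimensions at the cost of a longer vector, whereas addition keeps the dimensionality fixed but mixes the two signals.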
Finally, in the fourth theoretical contribution, titled “A Deep Learning-Based Approach for Multi-Label
Emotion Classification in Tweets” [11], Jabreel and Moreno address the problem of multi-label emotion
classification based on Deep Learning techniques. The most popular approach for this problem is
to transform it into multiple binary classification problems, one for each emotion class. This paper
proposes a new transformation approach, called xy-pair-set, that transforms the original problem
into just one binary classification problem. The transformed problem is solved with a Deep
Learning-based system called BNet. This system consists of three modules: an embedding module
that uses three embedding models and an attention function, an encoding module based on Recurrent
Neural Networks (RNNs), and a classification module that uses two feed-forward layers with the
ReLU activation function followed by a sigmoid unit. The system is evaluated using the dataset “Affect
in Tweets” of SemEval-2018 Task 1 [2], and it outperformed the state-of-the-art systems.
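The difference between the two problem transformations can be illustrated with a small sketch. It assumes that the single binary problem is defined over (text, label) pairs whose target states whether the label applies to the text; this is one reading of the summary above, and the exact formulation in [11] may differ. The label set and the example tweets are made up.

```python
from typing import Dict, List, Set, Tuple

Example = Tuple[str, Set[str]]

EMOTIONS = ["anger", "joy", "sadness"]  # illustrative label set


def binary_relevance(data: List[Example], labels: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Popular approach: one binary dataset (and one classifier) per emotion."""
    return {
        label: [(text, int(label in gold)) for text, gold in data]
        for label in labels
    }


def xy_pair_set(data: List[Example], labels: List[str]) -> List[Tuple[Tuple[str, str], int]]:
    """Assumed single-problem view: classify every (text, label) pair as 0 or 1."""
    return [
        ((text, label), int(label in gold))
        for text, gold in data
        for label in labels
    ]


tweets = [
    ("I can't believe we won!", {"joy"}),
    ("Stuck in traffic again, missed the show", {"anger", "sadness"}),
]

per_label = binary_relevance(tweets, EMOTIONS)  # three separate binary problems
pairs = xy_pair_set(tweets, EMOTIONS)           # one binary problem with six examples
```

In the pair-based view a single model sees the label as part of its input, so it can share parameters across emotions instead of training one independent classifier per class.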

3. Applications of Sentiment Analysis in Social Media


The wide range of applications of sentiment analysis has fostered its evolution. Sentiment analysis
techniques have made it possible to make sense of big social media data in order to make more informed
decisions and understand social events, product marketing or political events. Four works selected for
this Special Issue deal with the application of sentiment analysis to improving health insurance,
understanding AIDS patients, e-commerce user profiling and cyberaggression detection.
In the first work, titled “Using Social Media to Identify Consumers’ Sentiments towards Attributes
of Health Insurance during Enrollment Season” [12], van den Broek-Altenburg and Atherly aim to
understand consumers’ sentiments towards health insurance. For this purpose, they mined
Twitter discussions and analyzed them with a dictionary-based approach built on the NRC Emotion
Lexicon [13], which provides for each word its polarity as well as its related emotions (anger, anticipation,
disgust, fear, joy, sadness, surprise and trust). The main finding of this study is that consumers are
worried about provider networks, prescription drug benefits and political preferences. In addition,
consumers trust medical providers but fear unexpected events. These results suggest that more
research is needed to understand the origin of the sentiments that drive consumers so that insurers can
provide better insurance plans.
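A dictionary-based approach of this kind can be sketched in a few lines. The miniature lexicon below is invented for illustration and does not reproduce entries of the NRC Emotion Lexicon; the tokenizer and the function name are likewise assumptions rather than the authors' pipeline.

```python
import re
from collections import Counter
from typing import Dict, Set

# Made-up miniature lexicon: word -> associated emotions.
# These are NOT actual entries of the NRC Emotion Lexicon.
TOY_LEXICON: Dict[str, Set[str]] = {
    "doctor": {"trust"},
    "covered": {"joy", "trust"},
    "bill": {"fear", "anger"},
}


def emotion_counts(text: str, lexicon: Dict[str, Set[str]]) -> Counter:
    """Count how often each emotion is triggered by the words of a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts: Counter = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


tweet = "My doctor is covered but the bill scared me"
counts = emotion_counts(tweet, TOY_LEXICON)
```

Summing such counts over all tweets that mention a given insurance attribute gives a rough emotion profile for that attribute; words missing from the lexicon (such as "scared" in this toy example) are simply ignored, which is a known limitation of lexicon lookups.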
In the second contribution, titled “Gender Classification Using Sentiment Analysis and Deep Learning
in a Health Web Forum” [14], Park and Woo also deal with the application of sentiment analysis
techniques to health-related topics. In particular, they apply sentiment analysis for identifying
gender in health forums based on Deep Learning techniques. The authors analyze messages from
an AIDS-related bulletin board from HealthBoard.com and evaluate both traditional and Deep Learning
techniques for gender classification.
In the third approach [15], titled “Personality or Value: A Comparative Study of Psychographic
Segmentation Based on an Online Review Enhanced Recommender System”, Liu et al. analyze the predictive
and explanatory capability of psychographic characteristics in e-commerce user preferences. For this
purpose, they construct a psychographic lexicon based on seed words provided by psycholinguistics that
are expanded using synonyms from WordNet [16], resulting in positive and negative lexicons for two
psychographic models, Schwartz Value Survey (SVS) [17] and Big Five Factor (BFF) [18]. Then they
construct word embeddings using Word2Vec [9] and extend the corpus with word embeddings
from an Amazon corpus [19]. Finally, they incorporate the lexicons in a deep neural network-based
recommender system to predict the users’ online purchasing behaviour. They also evaluate customer
segmentation based on DBSCAN clustering [20], but this does not provide a significant improvement.
The main insight of this research is that psychographic variables improve the explanatory power of
e-consumer preferences, but their prediction capability is not significant.
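The seed-expansion step can be pictured with the following sketch. A small hand-written synonym map stands in for WordNet, and the seed word, the synonyms and the `depth` parameter are hypothetical; the sketch shows only the general idea of growing a lexicon from seeds, not the procedure used in [15].

```python
from typing import Dict, Iterable, List, Set


def expand_lexicon(seeds: Iterable[str], synonyms: Dict[str, List[str]], depth: int = 1) -> Set[str]:
    """Grow a lexicon by following synonym links for `depth` rounds."""
    expanded = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        next_frontier = set()
        for word in frontier:
            for synonym in synonyms.get(word, ()):
                if synonym not in expanded:
                    expanded.add(synonym)
                    next_frontier.add(synonym)
        frontier = next_frontier
    return expanded


# Hand-written stand-in for a thesaurus lookup (not real WordNet output).
TOY_SYNONYMS = {
    "kind": ["caring", "gentle"],
    "caring": ["compassionate"],
    "rude": ["impolite"],
}

positive_seeds = {"kind"}
one_hop = expand_lexicon(positive_seeds, TOY_SYNONYMS, depth=1)
two_hop = expand_lexicon(positive_seeds, TOY_SYNONYMS, depth=2)
```

Each extra round adds coverage but also risks drifting away from the meaning of the seed, so the expansion depth is a trade-off between recall and precision of the lexicon.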
Finally, in the fourth work [21], titled “Classification of Cyber-Aggression Cases Applying Machine
Learning”, Gutiérrez-Esparza et al. deal with the detection of cyberaggression. They build and label
a corpus of cyberaggression news from Facebook in Latin America and develop a classification model
based on Machine Learning techniques. The developed corpus can foster research in this field, given
the scarcity of lexical resources in languages other than English.

4. Conclusions
The diversity of approaches of the articles included in this Special Issue shows the great interest
and dynamism of this field. Moreover, this Special Issue of Applied Sciences helps to provide
a good overview of some of the main areas of research in this field.

Funding: This research received no external funding.


Acknowledgments: The Guest Editors would like to thank all the authors that have participated in this Special
Issue and also the reference contact in MDPI, Nyssa Yuan, for all the support and work dedicated to the success of
this Special Issue.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Liu, B. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
2. Mohammad, S.; Bravo-Marquez, F.; Salameh, M.; Kiritchenko, S. SemEval-2018 task 1: Affect in tweets.
In Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, LA, USA,
5–6 June 2018; pp. 1–17.
3. Cambria, E.; Poria, S.; Hussain, A.; Liu, B. Computational Intelligence for Affective Computing and
Sentiment Analysis [Guest Editorial]. IEEE Comput. Intell. Mag. 2019, 14, 16–17.
4. Li, Z.; Fan, Y.; Jiang, B.; Lei, T.; Liu, W. A survey on sentiment analysis and opinion mining for social
multimedia. Multimed. Tools Appl. 2019, 78, 6939–6967.
5. Sánchez-Rada, J.F.; Iglesias, C.A. Social context in sentiment analysis: Formal definition, overview of current
trends and framework for comparison. Inf. Fusion 2019, 52, 344–356.
6. Ha, H.; Han, H.; Mun, S.; Bae, S.; Lee, J.; Lee, K. An Improved Study of Multilevel Semantic Network
Visualization for Analyzing Sentiment Word of Movie Review Data. Appl. Sci. 2019, 9, 2419. [CrossRef]
7. Kim, H.; Jeong, Y.S. Sentiment Classification Using Convolutional Neural Networks. Appl. Sci. 2019, 9, 2347.
[CrossRef]
8. Mao, X.; Chang, S.; Shi, J.; Li, F.; Shi, R. Sentiment-Aware Word Embedding for Emotion Classification.
Appl. Sci. 2019, 9, 1334. [CrossRef]
9. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space.
arXiv 2013, arXiv:1301.3781.


10. Chen, J. The Construction and Application of Chinese Emotion Word Ontology. Master’s Thesis, Dalian
University of Technology, Dalian, China, 2008.
11. Jabreel, M.; Moreno, A. A Deep Learning-Based Approach for Multi-Label Emotion Classification in Tweets.
Appl. Sci. 2019, 9, 1123. [CrossRef]
12. Van den Broek-Altenburg, E.M.; Atherly, A.J. Using Social Media to Identify Consumers’ Sentiments towards
Attributes of Health Insurance during Enrollment Season. Appl. Sci. 2019, 9, 2035. [CrossRef]
13. Mohammad, S.M.; Kiritchenko, S.; Zhu, X. NRC-Canada: Building the state-of-the-art in sentiment analysis
of tweets. arXiv 2013, arXiv:1308.6242.
14. Park, S.; Woo, J. Gender Classification Using Sentiment Analysis and Deep Learning in a Health Web Forum.
Appl. Sci. 2019, 9, 1249. [CrossRef]
15. Liu, H.; Huang, Y.; Wang, Z.; Liu, K.; Hu, X.; Wang, W. Personality or Value: A Comparative Study of
Psychographic Segmentation Based on an Online Review Enhanced Recommender System. Appl. Sci. 2019,
9, 1992. [CrossRef]
16. Miller, G.A. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41. [CrossRef]
17. Sagiv, L.; Schwartz, S.H. Cultural values in organisations: Insights for Europe. Eur. J. Int. Manag. 2007,
1, 176–190. [CrossRef]
18. McCrae, R.R.; Costa, P.T., Jr. The five-factor theory of personality. In Handbook of Personality: Theory and
Research; The Guilford Press: New York, NY, USA, 2008; pp. 159–181.
19. McAuley, J.; Targett, C.; Shi, Q.; Van Den Hengel, A. Image-based recommendations on styles and
substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development
in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52.
20. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial
databases with noise. In KDD-96 Proceedings; AAAI Press: Portland, OR, USA, 1996; pp. 226–231.
21. Gutiérrez-Esparza, G.O.; Vallejo-Allende, M.; Hernández-Torruco, J. Classification of Cyber-Aggression
Cases Applying Machine Learning. Appl. Sci. 2019, 9, 1828. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Article
An Improved Study of Multilevel Semantic Network
Visualization for Analyzing Sentiment Word of
Movie Review Data
Hyoji Ha 1 , Hyunwoo Han 1 , Seongmin Mun 1,2 , Sungyun Bae 1 , Jihye Lee 1 and
Kyungwon Lee 3, *
1 Lifemedia Interdisciplinary Program, Ajou University, Suwon 16499, Korea; hjha0508@ajou.ac.kr (H.H.);
ainatsumi@ajou.ac.kr (H.H.); stat34@ajou.ac.kr (S.M.); roah@ajou.ac.kr (S.B.); alice0428@ajou.ac.kr (J.L.)
2 Science of Language, MoDyCo UMR 7114 CNRS, University Paris Nanterre, 92000 Nanterre, France
3 Department of Digital Media, Ajou University, Suwon 16499, Korea
* Correspondence: kwlee@ajou.ac.kr

Received: 30 April 2019; Accepted: 31 May 2019; Published: 13 June 2019

Abstract: This paper suggests a method for refining a massive amount of collective intelligence data
and visualizing it with a multilevel sentiment network in order to understand the relevant information
in an intuitive and semantic way. This semantic interpretation method minimizes the need for users
to relearn the network, because a fixed network topology exists purely as a guideline to help users
understand it. Furthermore, users do not need to explore every single node to understand the
characteristics of each cluster within the network. After extracting and analyzing the sentiment words
from the movie review data, we designed a movie network based on the similarities between the words.
The network formed in this way appears as a multilevel sentiment network visualization after the
following three steps: (1) design a heatmap visualization to effectively discover the main emotions in
each movie review; (2) create a two-dimensional multidimensional scaling (MDS) map of semantic word
data to facilitate semantic understanding of the network and then fix the movie network topology on
the map; (3) create an asterism graphic with emotions to allow users to easily interpret node groups
with similar sentiment words. The research also presents a virtual scenario about how our network
visualization can be used as a movie recommendation system. We next evaluated our progress to
determine whether it would improve user cognition for multilevel analysis experience compared to
the existing network system. Results showed that our method provided improved user experience in
terms of cognition. Thus, it is appropriate as an alternative method for semantic understanding.

Keywords: collaborative schemes of sentiment analysis and sentiment systems; review data mining;
semantic networks; sentiment word analysis

1. Introduction

1.1. Background and Purpose of Research


At present, we are faced with an enormous amount of information every day due to the vast
growth of information and communications technology. Thus, there is increased interest in effective
data processing and analysis. In particular, big data is playing an increasingly important role since
it is suitable for refined and semantic processing, even if the amount of data is considerable or if its
structure is complex [1]. Big data has also attracted great attention in the field of data visualization
primarily for the design of efficient processing and semantic analysis. Data visualization is a redesigned
concept of data analysis with better readability, offering distinct insights that cannot be grasped from
a table or graph [2]. Network visualization is a visual tool to semantically analyze data if there is a
massive amount or if the structure is complex [3]. Therefore, this study aims to demonstrate massive
collective intelligence data through network visualization. This study also proposes an intuitive
and semantic analysis method by designing multilevel sentiment network visualization based upon
emotion words [4]. Social network analysis plays a significant role in understanding and finding
solutions for society-functional problems by examining the original structure and relationships of the
network. Therefore, network visualization is applied in a wide variety of fields, including network
analysis based on data similarity, network analysis about social-scientific situations, graph theory, and
recommendation systems.
The force-directed graph drawing algorithm is a standard layout algorithm for designing a network
graph. It is considered highly useful since it allows related nodes to form a cluster [5]. However,
when a force-directed graph drawing algorithm is used, the location of each node varies every time the
graph is formed, because the initial location of each node is random and its eventual position is
determined by its relative connections with other nodes. Therefore, users of a force-directed network
must relearn the system each time, since the absolute location information is not fixed (Figure 1).
This is a notable drawback. Such difficulties can become major obstacles when interpreting a network
that consists of a considerable amount of data. Furthermore, collective intelligence data visualized
with a force-directed layout can convey the wrong information, since the locations of nodes may vary.
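The dependence of node positions on the random starting point can be reproduced with a small experiment. The code below is a simplified spring embedder in the spirit of common force-directed methods, written only for illustration; the force formulas, the parameters `k` and `max_step`, and the four-node graph are assumptions and not the layout implementation used in this study.

```python
import math
import random
from typing import Dict, List, Tuple


def force_directed_layout(
    nodes: List[str],
    edges: List[Tuple[str, str]],
    seed: int,
    iterations: int = 200,
    k: float = 1.0,
    max_step: float = 0.05,
) -> Dict[str, Tuple[float, float]]:
    """Simplified spring embedder with random initial positions."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels each other.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Connected nodes attract each other.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net force, capped at max_step per iteration.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(length, max_step) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return {n: (pos[n][0], pos[n][1]) for n in nodes}


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]

first_run = force_directed_layout(nodes, edges, seed=1)
second_run = force_directed_layout(nodes, edges, seed=2)   # different random start
repeat_run = force_directed_layout(nodes, edges, seed=1)   # same random start
```

Only a fixed seed reproduces the same coordinates; with a fresh random start the same graph ends up at different absolute positions, which is the drawback that a layout anchored to fixed semantic positions is meant to avoid.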

Figure 1. Force-directed layout network (left: 300 nodes; right: 400 nodes). Locations of nodes continue
to change whenever data is added or modified. Reproduced with permission from [6]; published by
[IEEE], 2015.

Following our preliminary studies [6–8], which introduced the sentiment network visualization
approaches that form the basis of this work, we designed a multilevel sentiment network visualization
to facilitate intuitive understanding of complex collective intelligence data. We also present
an approach that addresses the difficulties of the force-directed layout described above. Specifically,
the primary contributions of our work are:

• A more complete description of the process of developing the sentiment movie network: we explain
the flock algorithm method. With this method, nodes do not overlap even when their number
increases on a large scale, which makes larger sentiment network configurations possible
with improved delivery.
• Consideration of more diverse visual metaphor graphic models, and a survey on the
selection of graphics to better support interaction with the sentiment movie network.
• An evaluation comparing the awareness level of network location.
• An evaluation comparing the final visualization map (Network + MDS + Constellation
Metaphor) with our previous visualization map [8].


1.2. Research Process


After targeting “movie review data” among examples of collective intelligence, we selected 36
sentiment words that commonly appear in movie reviews. These were classified into seven cluster
characteristics through principal component analysis. The keywords were also analyzed through
multidimensional scaling (MDS) to discover correlations among them, such as similarities and
differences, and a two-dimensional distribution map was then designed according to these correlations.

We then designed a sentiment-based movie similarity network on this two-dimensional
distribution map in three steps. First, assuming that each node contains the information of one
movie, edges were created between the nodes that shared the most similar emotion characteristics.
This eventually led to a network of 678 movie nodes. Heatmap visualization was also applied so that
the frequency of sentiment words in each movie could be grasped more easily [9]. Second, a
two-dimensional distribution map of the sentiment words related to the movie reviews was created
in order to place nodes at semantic positions. As each node was influenced by the semantic
points according to its attribute values, the absolute positions of nodes were designed to reflect
their attributes [10]. Thus, the nodes representing the movies formed a network layout in which each
node was attracted toward the locations of the sentiment words on the two-dimensional distribution
map according to the frequency of those words. The whole structure is referred to as the
sentiment-movie network. Third, we applied a constellation visualization to label the characteristics
of each cluster when nodes in the network structure clustered on the two-dimensional keyword
distribution map. An asterism graphic consists of objects representing the traits of each cluster
formed by the nodes and edges of the network.

We next present a virtual scenario from the general public’s point of view in order to show how
this network visualization could be applied in an actual movie recommendation process. Assuming
that a random user enjoyed a certain movie, we demonstrate the process that the user goes through
in order to find a movie with similar emotions.

Finally, three kinds of evaluations were conducted to verify whether the proposed visualization
method improves users’ cognition in multilevel analysis. The first test was designed to verify
whether users would show a satisfactory understanding of the meaning of the location structure of
nodes in a movie network visualization. The second test compared two groups, one provided with the
heatmap visualization and one without, to see how well they could make use of the sentiment word
data when exploring the network. The last test was designed to determine which visualization worked
most effectively for subject groups conducting a semantic analysis among the following three cases:
(1) the first visualization with network nodes; (2) the second visualization, which fuses the first
visualization with the two-dimensional sentiment word map indicating the locations of the sentiments;
and (3) the third visualization, which superimposes the constellation metaphor over the second
visualization. Figure 2 illustrates the three processes: sentiment analysis, data visualization,
and the evaluation workflow.

Figure 2. Research framework.

Appl. Sci. 2019, 9, 2419

2. Materials and Methods

2.1. Sentiment Analysis and Visualization


Sentiment word analysis is a fundamental process in numerous applications of opinion and
emotion mining, such as review crawling, data refining, and SNS classification [11]. Numerous studies
in the sentiment analysis and data visualization research fields have applied sentiment words and
sentiment visualization systems.
MyungKyu et al. [12] dealt with sentiment words shown in online postings and social
media blogs. JoungYeon et al. [13] illustrated adjectives used to describe haptic texture and
indicated the relations between adjectives using multi-dimensional scaling.
Ali et al. [14] provided a clear and logical taxonomy of sentiment analysis work. This taxonomy
was made up of two main subcategories in this field: opinion mining and emotion mining. They also
presented a set of significant resources, including lexicons and datasets, that researchers need for a
polarity classification task. These resources allow researchers to study the sentiment of any topic and
determine its polarity, whether positive or negative.
Kostiantyn et al. [15] surveyed state-of-the-art sentiment visualization techniques and trends,
providing 132 cases in the “sentimentvis” interactive survey browser. In this research, all the cases are
classified according to categorization standards covering the data domain, data source, data
properties, analytic tasks, visualization tasks, visual variable, and visual representation.
The collected data also indicate growing multidisciplinary insight into the visualization of sentiment
with regard to multiple data domains and tasks.

2.2. Movie Recommendation


Studies on movie recommendation methods have mainly focused on “content-based
recommendation systems utilizing information filtering technology” and “collaborative recommendation
systems.” According to Oard et al. [16], a content-based recommendation system extracts
characteristics from an individual’s information and the preferences that follow from it.
Movie recommendation systems based on collaborative filtering have been analyzed by
Sarwar et al. [17] and Li et al. [18]. Such systems recommend options selected by a group that
shares similar information with the individual user.
While these previous studies made recommendations based on individual information of users, we
further managed user’s experience data as in “emotional review data felt during the movie watching,”
thus enriching emotional attributes that fit the purpose of movie recommendation.
Recently, Ziani et al. [19] suggested a recommendation algorithm based on sentiment analysis to
help users decide on products, restaurants, movies, and other services using online product reviews.
The main goal of that study was to combine a recommendation system with sentiment analysis in
order to generate the most accurate recommendations for users.
However, that work is limited in that it did not present a user-centered recommendation system
and focused on developing an automatic recommendation algorithm based on semi-supervised
support vector machines (S3VM). In contrast, our system provides a user-centered recommendation
experience through network visualization and a metaphorical sentiment graphic, both of which are
easy to analyze.

2.3. Network Visualization and Layouts


A number of studies have been conducted on network visualization methods, including several
recent studies on user’s perception. For example, Cody Dunne et al. [20] have introduced a technique
called motif simplification, in which common patterns of nodes and links are replaced with compact
and meaningful glyphs, allowing users to analyze network visualization easily.
While this method identifies the maximal motif more accurately, even enabling the estimation of
size through glyphs and interaction, it poses several difficulties for ordinary users. For example, users
must put considerable effort into learning the concept of motifs and interpreting glyphs. In addition,
they have difficulty in discovering the optimal set of motifs.
Another study by Giorgio et al. [21] presented a tool called “Knot” with a focus on the analysis of
multi-dimensional and heterogeneous data to achieve interface design and information visualization
in a multidisciplinary research context.
Furthermore, Nathalie et al. [22] suggested methods to solve clustering ambiguity and increase
readability in network visualization. That paper states that major challenges facing social network
visualization and analysis include the lack of readability of resulting large graphs and the ambiguous
assignment of actors shared among multiple communities to a single community.
They proposed using actor duplication in social networks in order to assign actors to multiple
communities without substantially affecting the readability. Duplications significantly improve
community-related tasks but can interfere with other graph readability tasks.
Their research provided meaningful insights into how central actors could bridge communities.
However, it also caused confusion when distinguishing duplicated nodes and analyzing visualizations
that exceeded a certain size, since node duplications could artificially distort the visualization.
Gloria et al. [23] presented a novel method that uses semantic network analysis as an efficient way
to analyze vaccine sentiment. This study enhanced understanding of the scope and variability of
attitudes and beliefs toward vaccination by using the Gephi network layout.
The four studies mentioned above all aimed to improve network visualization from the
perspective of users, with a particular focus on settling the challenges of visualization distortion and
of existing networks that require users to learn new techniques. However, the network layouts used in
previous studies could not handle overlapping nodes when faced with increasing amounts of data.
In addition, because node locations may vary, users may have to relearn the layout each time.
Furthermore, these network layouts were inefficient in that users had to check each node one by one
in order to identify the characteristics of a cluster.
Our research is consistent with previous studies in that it fixes the network data based upon
sentiment words. It was designed to minimize users’ learning and prevent distorted interpretation by
applying a metaphor based on the characteristics of nodes in the network.

3. Sentiment Analysis Data Processing


Sentiment analysis is the field of study that analyzes people’s opinions, sentiment, attitudes,
evaluations, survey, and emotions towards entities such as issues, events, topics, and their attributes.
For sentiment analysis, we performed three data processes as follows.

3.1. Sentiment Words Collection


We selected 100 sentiment words filtered from 834 sentiment words based on the research
conducted by DougWoong et al. [24] in order to create a sentiment word distribution map. A further
survey of 30 subjects aged from 20 to 29 years determined the most frequently used words among
these 100 sentiment words. Following basic instruction on the concept of sentiment words during
movies, we investigated to what degree the emotion represented in each sentiment word could be
drawn from watching some movies.
The survey began with the question “Describe how much you feel as in each sentiment words after
watching the movies with the following genres, based on your previous experience.” The questionnaire
used a 7-point Likert Scale to evaluate responses ranging from “Strongly irrelevant” to “Strongly
relevant.” After eliminating 32 sentiment words that scored below the average, 68 sentiment words
were finally selected [9].

3.2. Sentiment Words Refinement


In order to select the final sentiment words used for a two-dimensional distribution map among
68 sentiment words from the user survey, we collected and compared sentiment word data in existing
movie reviews, eliminating words that were rarely used. This procedure consisted of three phases as
shown below.
Crawling: Movie review data used in this research were collected from the movie information
service of NAVER [25], a web portal site with the largest number of users in Korea. We designed a web
crawler to automate sentiment word collection from movie reviews. This crawler covered three stages:
(1) collecting unrefined movie reviews and tags from the NAVER movie web page, (2) refining the
collected data to suit this research, and (3) extracting sentiment words based on analysis of the refined
data. As a result, we obtained 4,107,605 reviews of 2289 movies from 2004 to 2015.
Establishing sentiment word dictionary: We divided the text data collected through the crawling
process into morphemes using mecab-ko-lucene-analyzer [26] and then extracted sentiment morphemes.
A morpheme is the smallest meaningful grammatical unit of a language [27]. A total of 133 morpheme
clusters were selected through several text mining processes.
Each selected emotion morpheme was classified into one of the 68 detailed sentiment word
categories. A sentiment word dictionary organized by the chosen sentiment words was then established.
Extracting emotion morphemes and classifying them by category were conducted in consultation with
Korean linguists.
To produce more accurate results, we eliminated less influential sentiment word clusters after
matching them with actual movie review data. We calculated the term frequency (tf) of each sentiment
word cluster t, over the words w belonging to it, using the following formula.

$$tf(t, d) = \sum_{i=0}^{j} f(w_i, d) \tag{1}$$

j = number of words in sentimental group t
tf(t, d) = the number of times that term t occurs in document d
Then, the inverse document frequency (idf) was calculated using the following formula to lower the
weight of general sentiment word groups. The inverse document frequency (idf) is “the logarithmically
scaled inverse fraction of the documents that contain the word” [28]. It reduces the weight of terms
with a high incidence across documents.
 
$$idf(t, D) = \log\left(\frac{N}{|\{d \in D : t \in d\}|}\right) \tag{2}$$

N = total number of documents in the corpus, N = |D|
D = document set
TF-IDF score of sentiment word clusters on each movie was calculated using the following formula:

$$TFIDF(t, d, D) = tf(t, d) \times idf(t, D) \tag{3}$$
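
As a minimal sketch, Equations (1) to (3) could be computed as follows. The function names, the toy documents, and the convention that a cluster occurring in no document receives an idf of zero are assumptions made for illustration; they are not taken from the original pipeline. Each document here stands for the tokenized reviews of one movie.

```python
import math
from collections import Counter


def cluster_tf(cluster_words, doc_tokens):
    """Eq. (1): sum the counts of every word of cluster t in document d."""
    counts = Counter(doc_tokens)
    return sum(counts[w] for w in cluster_words)


def cluster_idf(cluster_words, docs):
    """Eq. (2): log(N / number of documents that contain the cluster)."""
    words = set(cluster_words)
    containing = sum(1 for d in docs if words & set(d))
    if containing == 0:
        return 0.0  # assumption: a cluster that never occurs gets zero weight
    return math.log(len(docs) / containing)


def cluster_tfidf(cluster_words, doc_tokens, docs):
    """Eq. (3): product of term frequency and inverse document frequency."""
    return cluster_tf(cluster_words, doc_tokens) * cluster_idf(cluster_words, docs)


# Hypothetical toy corpus: one token list per movie.
docs = [
    ["sweet", "lovely", "sweet", "funny"],
    ["scary", "creepy", "funny"],
    ["sad", "tearful", "funny"],
]
sweet_cluster = ["sweet", "lovely"]  # hypothetical cluster members
funny_cluster = ["funny"]

sweet_score = cluster_tfidf(sweet_cluster, docs[0], docs)
funny_score = cluster_tfidf(funny_cluster, docs[0], docs)
```

In this toy corpus the "funny" cluster appears in every document, so its idf and therefore its TF-IDF score are zero, which is the down-weighting of general sentiment words that the text describes.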

We next considered the maximum TF-IDF score attained by each sentiment word across all movies in
order to decrease the number of sentiment words. For example, the word “Aghast” showed a TF-IDF
score of no more than 0.8% in every movie, whereas “Sweet” scored 42% for at least one movie. We
eliminated sentiment words whose TF-IDF scores remained under 10%. Eventually, we selected 36
sentiment words. These sentiment word clusters were broadly divided into “Happy,” “Surprise,” “Boring,”
“Sad,” “Anger,” “Disgust,” and “Fear” as shown in Table 1.
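
The elimination rule above can be illustrated with a short sketch. It assumes that the scores are expressed as fractions (so that 10% is 0.10); apart from the maxima 0.008 and 0.42 quoted in the text, the per-movie scores below are hypothetical.

```python
def keep_by_max_score(scores_by_word, threshold=0.10):
    """Keep words whose best score over all movies reaches the threshold."""
    return [w for w, scores in scores_by_word.items() if max(scores) >= threshold]


# Hypothetical per-movie scores; only the maxima follow the text.
scores = {
    "Aghast": [0.002, 0.008, 0.005],
    "Sweet": [0.03, 0.42, 0.11],
}
kept = keep_by_max_score(scores)
```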

Table 1. Final sentiment words. Reproduced with permission from [6]; published by [IEEE], 2015.

Clustering Characteristics | Sentiment Words
Happy | Happy, Sweet, Funny, Excited, Pleasant, Fantastic, Gratified, Enjoyable, Energetic
Surprise | Surprised, Ecstatic, Awesome, Wonderful, Great, Touched, Impressed
Boring | Calm, Drowsy, Bored
Sad | Pitiful, Lonely, Mournful, Sad, Heartbroken, Unfortunate
Anger | Outraged, Furious
Disgust | Ominous, Cruel, Disgusted
Fear | Scared, Chilly, Horrified, Terrified, Creepy, Fearsome

3.3. Movie Data Collection


Movie samples used in network visualization were also collected from the NAVER movie service in
the same way as the movie review data [25]. From the 2289 movie samples from 2004 to 2015 registered
in the NAVER movie service, we retained only movies with more than 1000 emotion morphemes so as
to ensure a sufficient emotion level. As a result, 678 movie samples were ultimately selected and used
as network sample data.

4. Visualization Proposal
We proposed three methods to solve problems of the existing social network visualization layouts
as follows.

4.1. Heatmap Visualization


Heatmap visualization is a method consisting of rectangular tiling shaded in scaled color [29,30].
It is used to search for anomalous patterns of data metrics as outlined in previous research [31]. To
comprehend convergences, the method described by Young-Sik et al. [32] was used. A recent study
visualized and analyzed 100 cases of IT fractures using a 2D parametric fracture probability
heatmap [33]. The map projection technique proposed there can project high-dimensional proximal
femur fracture information onto a single 2D plane. The authors also applied a heatmap in order to
present the level of risk.
In this research, we showed the TF-IDF size frequency of sentiment words in a heatmap, utilizing
the coordinate space of the two-dimensional distribution map of sentiment words in order to visualize a
sentiment distribution graphic for each movie node. The detailed research process is as follows.
First, we measured the distances among the selected 36 sentiment words and analyzed their correlations
in order to design a two-dimensional sentiment word distribution map. We then conducted
multi-dimensional scaling (MDS). We conducted a survey on the semantic distance among the 36
sentiment words by enrolling 20 college students majoring in Digital Media Technology. These 36
sentiment words were placed on both axes (36 × 36) and the distance between each pair of words was
scored in a range of plus/minus 3 points by considering their emotional distance. We used UCINET,
which facilitates a variety of network analysis methods, on the data obtained from the 20 survey
participants [10]. We also created the metric MDS map shown in Figure 3 based on the semantic
distance among the sentiment words from movie reviews. As a result, positive emotions such as
“Happy” and “Surprise” were distributed on the right side of the X-axis while negative feelings such as
“Anger” and “Disgust” were distributed on the left side. Active emotions that generally involve
exaggerated gestures, such as “Fear” and “Surprise,” were distributed at the top of the Y-axis while
static emotions such as “Sad” and “Boring” were at the bottom. Furthermore, each type of sentiment
word was clustered according to particular characteristics such as “Happy,” “Surprise,” “Boring,”
“Sad,” “Anger,” “Disgust,” and “Fear” on the two-dimensional distribution map.
Results showed that the cluster characteristic “Surprise” could be divided into “Happy” and “Fear”
clusters. This implies that both overflowing joy and sudden fright are dominant in particular movies [10].
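
To make the MDS step concrete, the following is a minimal sketch of classical (Torgerson) metric MDS written with NumPy. It is not the UCINET procedure used in the study: the function name, the four example words, and the distance values are hypothetical, and the mapping from the plus/minus 3 survey scores to non-negative distances is assumed to have been done beforehand.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the directions with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical averaged semantic distances among four sentiment words.
words = ["Happy", "Sweet", "Scared", "Creepy"]
dist = np.array([
    [0, 1, 5, 5],
    [1, 0, 5, 5],
    [5, 5, 0, 1],
    [5, 5, 1, 0],
], dtype=float)
coords = classical_mds(dist)  # one (x, y) row per word
```

In the resulting map, words judged semantically close ("Happy" and "Sweet") land near each other and far from the "Scared" and "Creepy" pair, which is the property the two-dimensional sentiment word distribution map relies on.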

Figure 3. 36 sentiment words multidimensional scaling (MDS) map. Reproduced with permission
from [6]; published by [IEEE], 2015.

Heatmap visualization was then designed based on the two-dimensional sentiment word distribution
map as well as the frequencies of the 36 sentiment words that make up the map. An arbitrary movie
was selected to design the heatmap. We measured the frequencies of sentiment words for each movie
by matching the sentiment words in the movie review data obtained through the data construction
process against the sentiment words in the morphological dictionary. In addition, we calculated
TF-IDF scores to lower the weight of particular sentiment words that emerged frequently regardless
of the typical characteristics of the movie.
Therefore, the TF-IDF score of each sentiment word can be interpreted as the numerical value
reflected in the heatmap visualization graph for the target movie. The final heatmap graph consisted of
the two-dimensional distribution map with sentiment words and tiny triangular cells. Every cell was
initialized to a numerical value of 0. The value then increased depending on the TF-IDF score of the
sentiment word located in the pertinent cell. As the numerical value of a cell increased, the color
of the cell changed, making it easier to discover the TF-IDF score of the pertinent sentiment
word. Furthermore, because high-valued cells influenced the values of surrounding cells, the heatmap
changed gradually from cell to cell.
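
The cell-update rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: it uses a square grid instead of triangular cells, models the influence on surrounding cells with a Gaussian falloff, and the word positions, scores, and the `spread` parameter are hypothetical.

```python
import math


def build_heatmap(word_positions, word_scores, size=20, spread=2.0):
    """Accumulate each word's score on a size x size grid with Gaussian falloff."""
    grid = [[0.0] * size for _ in range(size)]  # every cell starts at 0
    for word, (cx, cy) in word_positions.items():
        score = word_scores.get(word, 0.0)
        for y in range(size):
            for x in range(size):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                # Cells near the word receive most of its score.
                grid[y][x] += score * math.exp(-d2 / (2 * spread ** 2))
    return grid


positions = {"Furious": (3, 15), "Funny": (16, 5)}  # hypothetical cell coordinates
scores = {"Furious": 0.9, "Funny": 0.1}             # hypothetical TF-IDF scores
heat = build_heatmap(positions, scores)
```

The cell holding the high-scoring word has the largest value, and values fall off smoothly with distance, so a color scale applied to `heat` gives the gradual appearance described in the text.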
Figure 4b is a heatmap graph representing the distribution map of sentiment words from movie
reviews written by viewers about the movie “Snowpiercer.” This graph shows high frequencies of
emotions such as “Pitiful and boring” as well as “Funny and great.” Some reviews noted “Not so
impressive, below expectation. It was pitiful and boring,” “It was crueler than I thought, and it lasts
pretty long in my head,” and “The movie was pretty pitiful, probably because my expectations were
too high.” As these reviews show, viewers experienced a variety of emotions about this
movie, including disappointment.

Figure 4. (a) Heatmap of “Don’t Cry Mommy” showing single emotions (Furious, Outraged).
(b) Heatmap of “Snowpiercer” that shows various emotions (Cruel, Pitiful, Lonely, Bored, Funny, Great,
and Energetic). Reproduced with permission from [6]; published by [IEEE], 2015.

Heatmaps showing movie sentiment words can therefore be divided into two types [7,8]. The first
type covers movies with a single frequent sentiment word. In Figure 4a, the upper table presents a
ranking of sentiment words in the movie “Don’t Cry Mommy,” while the lower table shows actual
reviews of the movie. Since the movie describes a sexual crime, emotions related to “Furious” and
“Outraged” are prominent. Such parts are shown in red in the heatmap. The second type covers movies
with various frequent sentiment words. For instance, the word frequency table and reviews of the
movie “Snowpiercer” (Figure 4b) revealed different sentiment words such as “Boring, Cruel, Funny,
and Pitiful.” Thus, its heatmap suggests patterns based on such information. Furthermore, heatmap
visualization helps us easily understand the types of emotions that viewers have experienced in a
movie by reflecting the frequency of each sentiment word in the movie.

4.2. Sentiment Movie Network


In this section, we describe the basic structure of the suggested graph along with examples. The
location of a node changes depending on the main sentiment word in the movie’s reviews. The suggested
graph is similar to the artifact actor network, a type of multi-layered social network. The artifact actor
network connects the artifact network and the social network using semantic connections.
Thus, it expresses the semantic relation between the two networks [34]. In our proposed graph, we
connected the sentiment words on the two-dimensional scaling map with the movie network. In this
paper, we refer to this network as the sentiment-movie network. Figure 5 shows the basic structure of
the sentiment-movie network [6].

Figure 5. The multilevel structure of the sentiment movie network. Reproduced with permission
from [6]; published by [IEEE], 2015.

As shown in Figure 5, the suggested graph is comprised of two layers. The first layer is called the
semantic layer. It consists of semantic points based on the 36 sentiment words. The semantic point of
the sentiment word is located at an initially set value. It is immovable. The second layer is called the
network layer, which includes the nodes comprising the movie network. Each movie node forms edges
with other movie nodes based on similarities. It also forms imaginary edges with sentiment words in
the two-dimensional distribution map based on the sentiment words connoted by the pertinent node.
Nodes connected by edges are subject to both attractive and repulsive forces based on a force-directed
algorithm. By contrast, the semantic points of sentiment words are immovable, so they exert only an
attractive force. For edge composition between nodes, we calculated cosine similarity between movies based on
TF-IDF scores of the 36 sentiment words. The similarity between movie A and movie B, SIM(A, B), is
given as follows.

$$SIM(A, B) = \frac{\sum_{i=0}^{n} A_i \times B_i}{\sqrt{\sum_{i=0}^{n} (A_i)^2} \times \sqrt{\sum_{i=0}^{n} (B_i)^2}} \tag{4}$$
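
As a minimal sketch, the cosine similarity of Equation (4) between two movies' TF-IDF vectors could be computed as follows. The function name and the convention of returning 0 for an all-zero vector are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a movie with no sentiment words matches nothing
    return dot / (norm_a * norm_b)


# Vectors that point in the same direction are maximally similar,
# regardless of their magnitude.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
```

Because the measure ignores vector length, a movie with many reviews and a movie with few reviews can still be judged similar when their sentiment word proportions match.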

An edge between a node and a semantic point is generated using a fixed threshold value: a sentiment
word whose value for the movie is greater than the threshold is designated as a semantic feature of the
node, and an edge is created between them. Figure 6a,b show examples of how the location of a node on
the graph changes depending on the frequency of the sentiment words indicated in the heatmap
visualization [7,8].
Figure 6a shows that the node is located in the space of the sentiment word with an overwhelmingly
high frequency. If a movie has more than two frequent sentiment words, its node is located as shown in
Figure 6b. As the heatmap visualization in Figure 6b shows, the node of “Snowpiercer” is influenced by
keywords such as Pitiful, Bored, Energetic, Great, and Cruel. In addition, the node is particularly
influenced by the locations of “Pitiful” and “Bored,” since those two are the strongest, so it is placed
toward the negative y-axis. In this way, frequent sentiment words determine where nodes are placed.
To show users a host of nodes in meaningful positions without overlapping, the flock algorithm has
been applied. The flock algorithm consists of separation, alignment, and cohesion [35]. We applied only
the separation method because the nodes of the developed visualization do not move and have
no direction.
Solution: Each node compares its position with those of all other nodes, and overlapping nodes move
in opposite directions.

Algorithm 1. Separate nodes: avoid crowding neighbors

Input: nodes → Node objects to be displayed on the screen.
Output: Node objects whose locations have been recalculated so that they are distributed
without overlap.
Method
Initialize SEPARATION_DISTANCE
WHILE true
    change = false
    FOR src in nodes
        FOR dst in nodes
            IF src == dst
                continue
            ELSE
                dist = src.position - dst.position
                IF dist.size < SEPARATION_DISTANCE
                    change = true
                    src.position += 1/dist.size * dist.direction
                    dst.position -= 1/dist.size * dist.direction
                END IF
            END IF
        END FOR
    END FOR
    IF change == false
        break
    END IF
END WHILE

Even if the number of nodes increases greatly or similar nodes are added, nodes do not overlap with
this method, which makes larger network configurations possible and conveys them more clearly.
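
Algorithm 1 can be sketched in Python as follows. The sketch departs from the pseudocode in three labeled ways, all of which are assumptions: it moves nodes by a fixed `step` rather than by the reciprocal of their distance, it picks an arbitrary direction for nodes at exactly the same position, and it caps the number of passes with `max_iter` so that the loop always ends.

```python
import math


def separate(positions, min_dist=1.0, step=0.1, max_iter=1000):
    """Push apart any two points closer than min_dist until none overlap."""
    pts = [list(p) for p in positions]
    for _ in range(max_iter):
        changed = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[i][0] - pts[j][0]
                dy = pts[i][1] - pts[j][1]
                dist = math.hypot(dx, dy)
                if dist >= min_dist:
                    continue
                changed = True
                if dist == 0.0:
                    # Coincident points have no direction; pick one (assumption).
                    dx, dy, dist = 1.0, 0.0, 1.0
                ux, uy = dx / dist, dy / dist
                # Move the pair in opposite directions along their connecting line.
                pts[i][0] += step * ux
                pts[i][1] += step * uy
                pts[j][0] -= step * ux
                pts[j][1] -= step * uy
        if not changed:
            break  # a full pass without moves means no overlaps remain
    return [tuple(p) for p in pts]


# Three hypothetical nodes, two of them exactly on top of each other.
spread_out = separate([(0.0, 0.0), (0.0, 0.0), (0.2, 0.0)])
```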
Once every node in the network built with the suggested methods is located in the graph,
clusters are formed as similar movies gather in the space of sentiment words with high frequency,
reflecting both the connections between movies and the connections to related sentiment words.
Figure 7 shows an extreme position of a node and a cluster [8]. This network allows users to easily
understand the network structure even if the number of nodes changes, because the topology of the
movie network is fixed based on the sentiment word distribution map of movie reviews. Finally, a
k-means clustering operation using cosine similarity values was conducted to classify the cluster
characteristics of each node. The number of clusters considered ranged from 9 to 12. The final cluster
number was chosen to be 11, as the number of nodes in each cluster was evenly distributed and various
characteristics were well clustered. Furthermore, each node was colored to indicate its node group
based on these 11 clusters.
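
One way to combine k-means with cosine similarity is to normalize every vector to unit length and assign it to the centroid with the highest cosine similarity, often called spherical k-means. The sketch below assumes this variant; the source does not specify the exact procedure, and the seeding with the first k items, the fixed iteration count, and the toy vectors are hypothetical choices.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def spherical_kmeans(vectors, k, iterations=20):
    """Assign each vector to the centroid with the highest cosine similarity."""
    data = [normalize(v) for v in vectors]
    centroids = [data[i] for i in range(k)]  # assumption: seed with the first k items
    labels = [0] * len(data)
    for _ in range(iterations):
        # For unit vectors the dot product equals the cosine similarity.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(v, centroids[c])))
            for v in data
        ]
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Hypothetical TF-IDF vectors over three sentiment words.
movies = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0], [0.1, 0.9, 0], [1, 0.2, 0]]
labels = spherical_kmeans(movies, k=2)
```

To mirror the selection of 11 clusters from the range 9 to 12, one could run such a procedure for each candidate k and compare how evenly the resulting cluster sizes are distributed.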

Figure 6. (a) Heatmap visualization and positioning on the sentiment-movie network (one point
position) in the case of “Don’t Cry Mommy.” (b) Heatmap visualization and positioning on
the sentiment-movie network (more than two point positions) in the case of “Snowpiercer.”
Reproduced with permission from [8]; published by [Journal of The Korea Society of Computer
and Information], 2016.

Figure 7. Sentiment movie network (678 Movie nodes). Reproduced with permission from [6];
published by [IEEE], 2015.

4.3. Constellation Visualization


This section describes the process used to design the constellation image visualization, which is
based on specific nodes and edges with significant sentiment word frequency, in order to clarify the
semantic characteristics of each cluster. Metaphorical visualizations can map characteristics of certain
well-understood visual images/patterns to a more poorly understood data source so that aspects of the
target can be more understandable.
Recently, information graphics and the field of information visualization have applied a variety of
metaphorical devices to make large, complex, abstract, or otherwise difficult-to-comprehend
information understandable in graphical terms [36,37].
We created an asterism graphic of each cluster network by considering the significant sentiment
words, information on movies, and synopses in each cluster. Our research makes the sentiment data
system easy for the general public to understand by using visual metaphor graphics.
In order to realize the asterism images, we referred to the labeling data of the 11 different clusters
yielded from k-means clustering, the most dominant categories of sentiment words in each cluster,
and the corresponding information on movies and synopses. In order to select a pool of constellation
graphics, experts with academic backgrounds in graphic design and movie scenario professionals were
consulted, ultimately narrowing down the options to two or three graphics per sentiment cluster.
Table 2 below contains the lists of image options corresponding to each of these 11 clusters.
For the next phase, 30 students majoring in design and visual media were surveyed to choose
the final constellation graphics for representing the sentiment words around each cluster. The survey
incorporated statements that described relations between the sentiment words shown in clusters and
the choice of images, rated on a 5-point Likert scale. For instance, when a statement of “An image of a
dynamite bears relation with the feeling of surprise” was presented, the test subject was supposed to
mark an answer in a range from “not at all (1 point)” to “very much (5 points)” and thus evaluate the
strength of the link between the sentiment word and the graphic. Table 2 presents the averages drawn
from these 30 students’ data. The graphic with the highest average was chosen as the final
constellation to represent the relevant cluster.

Table 2. List of candidate graphics linked to a cluster of sentiment words.

Cluster Name | List of Candidate Graphics (average score)
Cruel and dreadful | Red pyramid (4.7), Piranha (4.1), Jigsaw (3.3)
Dramatic Emotional | Comet (2.1), Ballerinas (3.5), Whale (4.1)
Dynamic mood change | Chameleon (3.3), Persona mask (4.3)
Thrilling and horrifying | Alien (3.8), Jaws (3.0)
Surprising | A surprised face mask (1.7), Dynamite (3.4), Jack in the box (4.2)
Pleasing and exciting | Firecracker (3.3), Gambit (3.7), Party hat (3.2)
Authentic fear | Reaper (4.5), Scream Mask (4.1), Dracula (2.3)
Generally Monotonous | Sloth (3.0), Snail (2.1), Yawner (2.3)
Fun and cheerful | Wine Glass (4.3), Heart (3.2), A diamond ring (3.5)
Cute | Gingerbread Cookie (3.5), Kitty (3.0)
Sad and touching | Mermaid (4.2), Teardrop (2.8)
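
The selection rule, choosing the candidate graphic with the highest average score, can be illustrated with three of the clusters from Table 2. The dictionary layout and variable names are assumptions for illustration; the averages are those reported in the table.

```python
# Average 5-point Likert scores for three clusters, as reported in Table 2.
candidates = {
    "Cruel and dreadful": {"Red pyramid": 4.7, "Piranha": 4.1, "Jigsaw": 3.3},
    "Surprising": {
        "A surprised face mask": 1.7,
        "Dynamite": 3.4,
        "Jack in the box": 4.2,
    },
    "Sad and touching": {"Mermaid": 4.2, "Teardrop": 2.8},
}

# Pick, for every cluster, the graphic with the highest average.
chosen = {
    cluster: max(averages, key=averages.get)
    for cluster, averages in candidates.items()
}
```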

Table 3 below shows the main emotions and movie examples contained in each cluster as well
as motives for choosing each asterism name [6]. This constellation visualization helped us naturally
perceive characteristics of a node without having to individually review parts of the network.

Table 3. Definition of constellation visualization. Reproduced with permission from [6]; published by
[IEEE], 2015.

Cluster Name | Movie Examples | Asterism Name | Motives for Each Name
Cruel and dreadful | Final Destination 3, Piranha 3D | Red pyramid | Symbolized the cruelly murdering character in a movie <Silent Hill>
Dramatic Emotional | Pride & Prejudice, The Notebook | Whale | Inspired from the scene when grampus appears in a movie <Life of Pi>, which aroused dramatic and emotional image simultaneously
Dynamic mood change | Snowpiercer, Transformers | Persona mask | Persona masks are supposed to express various emotions, which is similar to movies with dynamic mood changes
Thrilling and horrifying | Resident Evil, War Of The Worlds | Alien | Aliens arouse fear and suspense in unrealistic situations
Surprising | Saw, A Perfect Getaway | Jack in the Box | Symbolized an object popping out of the box to express surprising moments
Pleasing and exciting | Iron Man, Avatar | Gambit | Relevant to the magician character of a movie <X-men>, who is fun and surprising
Authentic fear | Paranormal Activity, The Conjuring | Reaper | Symbolized as a reaper to show the authentic and intrinsic fear
Generally Monotonous | 127 Hours, Changeling | Sloth | Originated from the idea that sloths are boring and mundane
Fun and cheerful | Hairspray, The Spy: Undercover Operation | Wine Glass | Wine glass is a symbol of romantic and festive atmosphere
Cute | Despicable Me, Puss In Boots | Gingerbread Cookie | Gingerbread men cookies represent cute and sweet sensations
Sad and touching | Million Dollar Baby, Man on fire | Mermaid | The story of little mermaid shows touching, magical and sad ambience at the same time

Constellation visualization helped us naturally perceive characteristics of a node without having
to review parts of the network individually. A comprehensive network map based on information
presented in this table is shown in Figure 8. We call this movie data network based on three proposals
“CosMovis.” Figure 8 also indicates that it is substantially easier to semantically analyze network
visualization using overlapping asterism images on each sentiment word and symbolic nodes with
connection structure of edges.

Figure 8. Comprehensive “CosMovis” constellation map of sentiment word-based movies. (Demo Site
URL: https://hyunwoo.io/cosmovis/). Reproduced with permission from [7]; published by [IEEE], 2014.

5. CosMovis Network Analysis Scenario


Our research fixed the network data based upon the sentiment words MDS map and was designed
to minimize users’ learning and prevent distorted interpretation by applying a metaphor based on
the sentiment keyword characteristics of the nodes in the network. In order to show that it can be
applied widely to users’ goals, this section presents an ideal scenario of how our sentiment network
visualization is used for movie recommendation and how it affects the general public’s decision-making
process [8]. The scenario is as follows.
Scenario: Getting a movie recommendation from the cluster with similar emotions, based on
movies that the general public has previously watched.
This scenario describes a situation of obtaining a movie recommendation from clusters that share
similar emotions based on the sentiment words implied by a user’s previous movie preferences. Assume
that the user watched “Star Trek” and was satisfied with the emotions provided by the movie. The
user will then search for a new movie in the network. The user will first find the node of “Star Trek” in
the network and then look for movies located close to “Star Trek.” As a result, the user will be able to
find three other movie nodes near “Star Trek,” shown in red in Figure 9.
Next, the user will focus on the heatmaps to understand what kinds of emotions those three movies
contain. Figure 10 shows the heatmaps of movie nodes A, B, and C as well as their sentiment
word distribution tables. Node A is the movie “Avatar” and Node B is the movie “Ironman 3.”
One can see that the heatmaps of these two nodes are very similar to that of “Star Trek.” Thus, a user
who wants a similar emotional experience could watch “Avatar” or “Ironman 3.” Node C is the movie
“Masquerade.” According to its heatmap, this movie also contains “touching” aspects in addition to
the other emotions present in “Star Trek.” Therefore, a user who wants a movie more emotional than
“Star Trek” could pick this movie.

Appl. Sci. 2019, 9, 2419

We can also note that node C is a historical genre movie, “Masquerade,” unlike “Star Trek,”
“Avatar,” or “Ironman 3”. This implies that a sentiment-based movie similarity network can also
recommend movies in different genres as long as they contain similar sentiments. Users who want
movie recommendations based on movies they have watched can therefore make more effective
decisions by understanding the network structure, selecting candidate movies among those similar
to their previously enjoyed movies, and analyzing the sentiment word frequency of each candidate node.
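The lookup described in this scenario, finding the movies located closest to a previously watched movie, can be sketched in code. The sketch below is only an illustration under stated assumptions and is not the CosMovis implementation: the TF-IDF weights, the four-word vocabulary, the placeholder titles "Movie A" to "Movie D", and the use of cosine similarity as the closeness measure are all hypothetical.

```python
import math

# Hypothetical TF-IDF weights over a tiny sentiment vocabulary (illustrative only).
SENTIMENT_WORDS = ["funny", "touching", "surprising", "dreadful"]
movies = {
    "Star Trek": [0.40, 0.10, 0.45, 0.05],
    "Movie A": [0.38, 0.12, 0.44, 0.06],
    "Movie B": [0.42, 0.08, 0.40, 0.10],
    "Movie C": [0.30, 0.35, 0.30, 0.05],
    "Movie D": [0.05, 0.10, 0.05, 0.80],
}


def cosine(u, v):
    """Cosine similarity between two sentiment-word weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(title, k=3):
    """Return the k movies whose sentiment profile is closest to `title`."""
    target = movies[title]
    scored = [(other, cosine(target, vec))
              for other, vec in movies.items() if other != title]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


candidates = recommend("Star Trek")
for name, score in candidates:
    print(f"{name}: similarity {score:.3f}")
```

In this toy data, "Movie C" carries more weight on "touching" than "Star Trek" does, so it ranks below the two near-identical profiles yet remains a candidate, which mirrors the cross-genre recommendation discussed above.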

Figure 9. (upper) Movie “Star Trek” Poster and TF-IDF size frequency of sentiment words. (middle)
“Star Trek” heatmap visualization. (lower) After selecting the movie “Star Trek (blue node),” three
movies A, B, and C (red nodes) that are located closely within the same cluster are discovered.
Reproduced with permission from [8]; published by [Journal of The Korea Society of Computer and
Information], 2016.


Figure 10. (upper) Movie “Avatar” TF-IDF size frequency of sentiment words and heatmap visualization.
(middle) Movie “Ironman3,” (lower) movie “Masquerade.”

6. Evaluation
Using an experimental method, we explored whether the network visualization put forth by this
research can improve user cognition. For the purpose of verification, three evaluations were
conducted as follows. All evaluations were designed based on social science and statistics in
accordance with Institutional Review Board regulations.

6.1. Evaluation 1: User Awareness Level Test on Network Location


Purpose and Method of Cognition Evaluation: To gauge users’ awareness of the network structure
created in this research, we designed an evaluation involving 40 participants. The participants were
divided into two groups of 20; one group received a simple two-minute explanation of the
visualization, while the other received a more detailed five-minute explanation. After a single movie
node was selected from the network and its heatmap shown, participants were asked to perform a
series of tasks such as “Choose the heatmap located adjacent to (or across from) the selected node.”
The movie network was verified along three axes, as illustrated in Figure 11. The survey contained a
total of 20 questions: five questions per axis and an additional five questions pertaining to the node
in the center of the network.

Figure 11. Three axes and sentiment word directional guide for verifying the network structure.

Comparison of group average: To compare group averages, an independent t-test was conducted.
An independent t-test verifies whether a difference in population means exists between two groups
that follow normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. The two
groups are assumed to be independent of each other [38]. In this research, to determine the
difference in awareness levels between the two groups given either a simple or a detailed explanation
of the network structure, a survey asked participants to find the heatmap that represented a particular
node on the visualization map. We recruited 40 participants (20 in each group) who were majoring
in related fields and had an understanding of visualization. We also assumed a normal distribution in
accordance with the central limit theorem. The results for these two groups regarding network
structure are presented in Table 4.
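The two variants of the independent t-test that correspond to an "equal variance" or "heteroscedasticity" assumption can be sketched as follows. This is a minimal illustration using only the Python standard library; the score lists are hypothetical and are not the study data, and the function names are introduced here for the example. Only the test statistic and degrees of freedom are computed, since the standard library has no t-distribution; a statistics package would normally supply the P-value.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical numbers of correct answers (out of five) for two small groups.
brief = [3, 4, 4, 5, 3, 4]       # group given the short explanation
detailed = [4, 5, 5, 5, 4, 5]    # group given the longer explanation

t_pooled, df_pooled = pooled_t(brief, detailed)
t_welch, df_welch = welch_t(brief, detailed)
print(f"pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.2f}")
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, so the choice of variance assumption mainly affects the P-value rather than the sign or size of t.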

Table 4. Results of independent t-test analysis.

| Question | P-Value | Equal Variance Assumption | T-Value | P-Value | P-Value/Alternative Hypothesis Adoption |
|---|---|---|---|---|---|
| b_A~a_A | 0.01079 ** | Heteroscedasticity ** | −2.1963 | 0.03612 ** | Adopt ** |
| b_B~a_B | 0.8118 | Equal variance | −2.5591 | 0.01461 ** | Adopt ** |
| b_C~a_C | 0.3576 | Equal variance | −1.9321 | 0.06117 * | Adopt * |
| b_M~a_M | 0.6122 | Equal variance | −1.3635 | 0.1809 | Dismissal |

* 90% Confidence level, ** 95% Confidence level (A: A-axis/B: B-axis/C: C-axis/M: Middle area)/ (a: 2 min—Group/
b: 5 min—Group)/ (Left~Right: Compare Left and Right).

Independent t-test results between the more-informed group and the less-informed group showed
that P-values for questions on the three axes (excluding the middle area) were smaller than the
significance level of 0.05 (95% confidence level) or 0.1 (90% confidence level). This indicated a
significant difference between the more-instructed and less-instructed groups in their understanding
of the network. Table 5 and Figure 12 present details regarding the two groups compared through
independent t-tests.

Table 5. Details regarding the two test groups.

| Question | 95% Confidence | Before | After |
|---|---|---|---|
| b_A~a_A | −1.351534 < μ < −0.048465 | 3.9 | 4.6 |
| b_B~a_B | −1.612026 < μ < −0.187974 | 3.4 | 4.3 |
| b_C~a_C | −1.741933 < μ < −0.041933 | 2.5 | 3.35 |
| b_M~a_M | −1.242694 < μ < 0.242694 | 2.15 | 2.65 |

Figure 12. Comparison of correct answer rate between test groups.

Details on the two test groups indicated that the well-informed group outperformed the
less-informed group in answering questions about the network structure in every area. For the A-axis,
after a further explanation of the network structure was provided, the mean number of correct
answers increased by 0.7. Similarly, for the B-axis, C-axis, and center node, the mean number of
correct answers increased by 0.9, 0.85, and 0.5, respectively. The improvement was most pronounced
for the B-axis, whereas questions regarding the center node produced the smallest improvement.
Thus, we can infer that explaining the network structure before the visualization is used can help
raise the level of user awareness of the visualization. To further boost awareness regarding the center
node, a supplementary method may be required in addition to an explanation of the network
structure. Nevertheless, with the correct answer rate for the less-informed group hovering around
50% on average, we can conclude that user awareness of the visualization is high in general.
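The per-area gains quoted in this paragraph follow directly from the "Before" and "After" columns of Table 5. The short sketch below recomputes them; it assumes that the "Before" column refers to the less-informed group and the "After" column to the more-informed group, and the dictionary names are introduced here only for illustration.

```python
# Mean scores per area, copied from the "Before" and "After" columns of Table 5.
before = {"A-axis": 3.9, "B-axis": 3.4, "C-axis": 2.5, "Middle": 2.15}
after = {"A-axis": 4.6, "B-axis": 4.3, "C-axis": 3.35, "Middle": 2.65}

# Gain per area, rounded to two decimals to suppress floating-point noise.
gains = {area: round(after[area] - before[area], 2) for area in before}

largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)

for area, gain in gains.items():
    print(f"{area}: +{gain}")
print(f"largest gain: {largest}, smallest gain: {smallest}")
```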

6.2. Evaluation 2: Test on Usability Impact of the Heatmap Function


Purpose and Method of Usability Measurement Experiment: The first part of this research
concluded that explaining the network structure prior to visualization usage could enhance user
awareness [8]. To further improve the awareness level on center nodes, an additional supporting
measure was required. As one possible method for boosting user awareness on center nodes, this
study introduced a new function through which moving the mouse over a particular center node
displays its heatmap visualization. The second test was thus designed to confirm whether a difference
in visualization usability existed between group 1 (provided with the heatmap function) and group
2 (not provided with such a function). Usability was measured through a survey comprising five
statements regarding learnability, efficiency, understandability, feature functionality, and accuracy,
each rated on a 7-point Likert scale. Forty college students (two groups of 20) who were currently
studying data visualization and had knowledge of the visualization field were selected as samples.
Reliability Analysis: Before analyzing the gathered data, a reliability analysis of the above survey
was carried out. Reliability analysis measures the internal consistency among questions/statements
employed in the survey based on Cronbach’s α. Cronbach’s α takes a value between 0 and 1; the
closer the value is to 1, the higher the reliability. Generally, a Cronbach’s α of 0.6 or higher is
considered reliable. This single index can be used for all questions to yield a comprehensive analysis.
Results of the reliability analysis performed on the final data set are presented in Table 6.
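Cronbach’s α, and the "α after eliminating an item" values used in this kind of analysis, can be computed with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration with hypothetical 7-point Likert responses (five statements, six respondents); neither the data nor the function names come from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists
    that all cover the same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical responses: one list per statement, one entry per respondent.
responses = [
    [5, 6, 4, 7, 5, 6],  # learnability
    [5, 7, 4, 6, 5, 6],  # efficiency
    [6, 6, 5, 7, 4, 6],  # understandability
    [4, 6, 4, 7, 5, 5],  # feature functionality
    [5, 5, 3, 6, 5, 7],  # accuracy
]

alpha = cronbach_alpha(responses)
dropped = alpha_if_deleted(responses)
print(f"alpha = {alpha:.3f}")
print("alpha if item deleted:", [round(a, 3) for a in dropped])
```

An item whose removal raises α above the overall value contributes least to internal consistency, which is how the per-item columns of a reliability table are usually read.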

Table 6. Results of reliability analysis. Reproduced with permission from [8]; published by [Journal of
The Korea Society of Computer and Information], 2016.

| Categories | Statements | Cronbach’s α (Provide) | Cronbach’s α (Non-Provide) | Total Cronbach’s α |
|---|---|---|---|---|
| Learnability | 1. It is easy to select a movie based on the sentiment words. | 0.698 | 0.827 | |
| Efficiency | 2. It is efficient to select the node based on the sentiment of the movie. | 0.698 | 0.826 | 0.666 |
| Understandability | 3. It is easy to understand the sentiment distribution depending on varying node locations. | 0.661 | 0.747 | |
| Feature Functionality | 4. It provides an adequate function to help user choose a movie based on certain sentiment words. | 0.663 | 0.749 | |
| Accuracy | 5. The selected movie and the sentiment distribution predicted from the movie’s map coordinate matches. | 0.742 | 0.725 | |

Comparing the values of Cronbach’s α obtained after eliminating each item showed that, when the
heatmap was provided, the value was highest (0.742) after eliminating Accuracy, whereas when the
heatmap was not provided, it was highest (0.827) after eliminating Learnability. In addition,
every Cronbach’s α value exceeded 0.6, suggesting that no statement was unreliable and that internal
consistency across the survey was high. Thus, it can be concluded that the survey is highly reliable.
Average Comparison per Group: To compare averages, an independent t-test was conducted.
An independent t-test verifies whether there is a difference in population means between two groups
that follow normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. The two
groups are assumed to be independent of each other [39]. This research, through further data distillation,
examined 30 sets of survey data in accordance with the central limit theorem. The two groups were
assumed to follow a normal distribution and be independent of each other in comparing averages.
Results of the average comparison analysis of the two groups are shown in Table 7.


Table 7. Results of the independent t-test analysis. Reproduced with permission from [8]; published by
[Journal of The Korea Society of Computer and Information], 2016.

| Question | P-Value | Equal Variance Assumption | T-Value | P-Value | P-Value/Alternative Hypothesis Adoption |
|---|---|---|---|---|---|
| 1_1~1_2 | 0.08203 | Heteroscedasticity | 4.8295 | 0.00003 ** | Adopt ** |
| 2_1~2_2 | 0.5064 | Heteroscedasticity | 7.2038 | 0.00000001 ** | Adopt ** |
| 3_1~3_2 | 0.2327 | Heteroscedasticity | 4.7609 | 0.000032 ** | Adopt ** |
| 4_1~4_2 | 0.07771 | Heteroscedasticity | 4.3814 | 0.00011 ** | Adopt ** |
| 5_1~5_2 | 0.0026 | Equal variance | 4.9205 | 0.000036 ** | Adopt * |

* 90% Confidence level, ** 95% Confidence level (a: Heatmap group/b: No heatmap group)/ (Left~Right: Compare
Left and Right).

An independent t-test on the two groups, for which the heatmap function was either available or
not available, revealed significant differences between the two. Table 8 below presents
details of statements along with the alternative hypothesis.

Table 8. Details of statements with alternative hypothesis. Reproduced with permission from [8];
published by [Journal of The Korea Society of Computer and Information], 2016.

| Question | 95% Confidence | Provide | Non-Provide |
|---|---|---|---|
| 1_1~1_2 | 1.128555 < μ < 2.771445 | 5.45 | 3.5 |
| 2_1~2_2 | 1.86879 < μ < 3.33121 | 5.95 | 3.35 |
| 3_1~3_2 | 1.118809 < μ < 2.781191 | 5.8 | 3.85 |
| 4_1~4_2 | 0.9908482 < μ < 2.7091518 | 5.65 | 3.8 |
| 5_1~5_2 | 1.254113 < μ < 3.045887 | 5.75 | 3.6 |

(Left~Right: Compare Left and Right).

For all statements that rejected the null hypothesis (providing heatmap function does not produce
a significant difference between the two groups) and adopted the alternative hypothesis (providing
heatmap function produces a significant difference between the two groups), it was confirmed that
the average value of the group provided with the heatmap function was higher than that of the
control group. Thus, we could conclude that a valid difference existed between the two groups based
on whether or not heatmap function was provided. We could further conclude that providing the
heatmap function reinforced the usability of the visualization map. It was also effective in enhancing
the awareness level of middle area nodes, which required an additional measure to be understood
better based on the first part of this research.
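A quick consistency check on Table 8 helps in reading it: a t-based confidence interval for a difference in means is centered on the observed difference, so the midpoint of each 95% interval should equal "Provide" minus "Non-Provide". The sketch below performs that arithmetic on the values copied from Table 8; the variable names are introduced here for illustration only.

```python
# (lower bound, upper bound, mean with heatmap, mean without heatmap), from Table 8.
table8 = {
    "1_1~1_2": (1.128555, 2.771445, 5.45, 3.5),
    "2_1~2_2": (1.86879, 3.33121, 5.95, 3.35),
    "3_1~3_2": (1.118809, 2.781191, 5.8, 3.85),
    "4_1~4_2": (0.9908482, 2.7091518, 5.65, 3.8),
    "5_1~5_2": (1.254113, 3.045887, 5.75, 3.6),
}

checks = {}
for question, (low, high, provide, non_provide) in table8.items():
    midpoint = (low + high) / 2       # center of the confidence interval
    difference = provide - non_provide  # observed difference in group means
    checks[question] = (midpoint, difference)
    print(f"{question}: midpoint = {midpoint:.2f}, mean difference = {difference:.2f}")

# Every interval lies entirely above zero, in line with the significant differences.
all_positive = all(low > 0 for low, _, _, _ in table8.values())
```

Because each interval excludes zero, the heatmap group scored higher on every statement at the 95% confidence level, which matches the adoption of the alternative hypothesis reported above.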

6.3. Evaluation 3: Reaction Time Test on Adding Constellation Visualization Function


Purpose and Method of Measuring Reaction Time: This evaluation compares the final visualization
map with previous visualization maps [6–8]. The final visualization in this study was formed by
combining clustered (colored) nodes based on sentiment words, the sentiment word MDS map
indicating locations of sentiments, and the constellation metaphor applied to sentiment words. This
comparison test was designed to determine whether there was any difference in the reaction time
spent selecting a node in each visualization. The visualizations applied in this test were of three
different types: visualizations 1 to 3. Table 9, Table 10 and Figure 13 present further details of these
types. Test subjects were divided into three groups, one for each version of the visualization. In this
study, visualization scenes were divided into four quadrants. Subjects were given 30 s to observe
each quadrant of the visualization; thus, the total observation time was 120 s. During this time,
subjects were allowed to explore the content and structure of “CosMovis” freely. They were then
asked questions (e.g., on which part of the map is the movie cluster that stirs touching and sad
feelings located?) on the complex sentiments represented by six polygons. Reaction time for
recognizing the location of the complex sentiment was then measured. College students currently
studying data visualization with knowledge of the visualization field were selected as samples. A total
of 45 participants were tested (15 participants for each visualization group) [40].

Table 9. The basic structure of visualization (1 to 3).

| Visualization Type | Basic Structure |
|---|---|
| Visualization 1 (Network) | Visualization on clustered (colored) nodes from sentiment words |
| Visualization 2 (Network + MDS) | Combined visualization on clustered nodes and MDS map which shows the location of sentiments |
| Visualization 3 (Network + MDS + Constellation Metaphor) | Combined visualization on clustered nodes, MDS map which shows the location of sentiments, and constellation metaphors based on sentiment words |

Table 10. Polygon-related questions.

| Polygon Name (Sentiment Words) | Question |
|---|---|
| Mermaid (Sad, Touching) | 1. Where are the movies that stir up touching and sad feelings clustered around? |
| Wine Glass (Funny, Sweet, Cheerful) | 2. Where are the movies that arouse fun, sweet and pleasing emotions clustered around? |
| Jack in the box (Surprising) | 3. Where are the movies that trigger feelings of surprise clustered around? |
| Red-Pyramid (Cruel, Dreadful) | 4. Where are the movies that induce cruel and dreadful emotions clustered around? |
| Reaper (Authentic fear) | 5. Where are the movies that set off genuine fear clustered around? |
| Whale (Dramatic Emotional) | 6. Where are the movies that elicit dramatic sentiments clustered around? |


Figure 13. (upper) First visualization: network; (middle) second visualization: network + MDS; (lower)
third visualization: network + MDS + constellation metaphor.

ANOVA per group: To analyze the variance of data pertaining to multiple groups, this study
conducted two-way ANOVA. Two-way ANOVA can be used when a data set is comprised of two
independent variables and one dependent variable in order to verify the existence of a significant
difference in variance among different groups caused by changes in an independent variable [41]. This
study collected data through a closed testing method in which a series of 1:1 individual tests were
undertaken. We used questionnaires and measured reaction times of the three groups that were given
different versions of visualization for accurately locating the complex sentiment suggested by the six
polygons. Results of two-way ANOVA are presented in Table 11.

Table 11. Results of the two-way ANOVA.

| | DF | Sum Sq | Mean Sq | F-Value | Pr(>F) |
|---|---|---|---|---|---|
| Group | 1 | 618.5 | 618.5 | 278.387 | $2 \times 10^{-16}$ ** |
| Question | 1 | 83.8 | 83.8 | 37.74 | $2.16 \times 10^{-9}$ ** |
| Question: Group | 1 | 0.1 | 0.1 | 0.056 | 0.814 |
| Residuals | 356 | 790.9 | 2.2 | | |

* 90% Confidence level, ** 95% Confidence level.

We reviewed the difference in reaction time among groups given different versions of the
visualization map and among the six kinds of polygon questions. For the visualization group factor,
the test yielded an F-Value of 278.387 and a P-Value of $2 \times 10^{-16}$, lower than the alpha
value of 0.05, leading us to reject the null hypothesis (no difference exists in reaction time among
groups given different versions of the visualization) and adopt the alternative hypothesis (a difference
exists in reaction time among groups given different versions of the visualization). For the polygon
question factor, two-way ANOVA yielded an F-Value of 37.74 and a P-Value of $2.16 \times 10^{-9}$,
also lower than the threshold alpha value of 0.05. Therefore, we rejected the null hypothesis (no
difference exists in reaction time among the different polygon questions) in favor of the alternative
hypothesis (a difference exists in reaction time among the different polygon questions). Finally, we
analyzed the difference in reaction time among nested groups with different polygon-visualization
combinations and obtained an F-Value of 0.056 and a P-Value of 0.814, which was larger than the
alpha value of 0.05. Thus, we retained the null hypothesis (no difference exists in reaction time among
nested groups given different versions of polygon-visualization) rather than adopting the alternative
hypothesis (a difference exists in reaction time among nested groups given different versions of
polygon-visualization).
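The F-Values discussed here are ratios of each effect's mean square to the residual mean square, and they can be approximately recovered from the sums of squares and degrees of freedom in Table 11. The sketch below does this arithmetic with the table's rounded figures, so small deviations from the reported F-Values are expected (most visibly for the interaction row, whose sum of squares is rounded to 0.1). The variable names are introduced here for illustration.

```python
# (degrees of freedom, sum of squares) per effect, copied from Table 11.
effects = {
    "Group": (1, 618.5),
    "Question": (1, 83.8),
    "Question:Group": (1, 0.1),
}
resid_df, resid_ss = 356, 790.9

# Residual mean square: the denominator of every F ratio.
ms_resid = resid_ss / resid_df

# F = (SS_effect / DF_effect) / MS_residual
f_values = {name: (ss / df) / ms_resid for name, (df, ss) in effects.items()}

for name, f in f_values.items():
    print(f"{name}: F = {f:.3f}")
```

A ratio far above 1, as for the group and question factors, means the effect explains much more variance than the residual noise, whereas the interaction ratio well below 1 is consistent with its non-significant P-Value.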
Based on ANOVA test results, we concluded that a significant difference satisfying the 95%
confidence level existed among the groups provided with different versions of the visualization and
among different questions of the test. In order to further delve into the makeup of such differences, a
post-hoc analysis was conducted using a Boxplot. Results are shown in Figure 14.
The post-hoc analysis using boxplots revealed that the first group (given the first version of the
visualization) had the longest reaction time in solving questions to find complex sentiment clusters,
spending an average of 12.01 s to complete the task. The second group (given the second version of
the visualization) took 10.41 s on average to locate the sentiment cluster, a reduction of 1.6 s
compared to the first group. The third group (given the third version of the visualization) recorded
the shortest reaction time, with an average of 8.8 s. Based on these results, it can be concluded that
the final visualization form, which combines clustered (colored) nodes, the sentiment word MDS
map, and the constellation metaphor, can help users better understand the visualization relative to
the first two versions while minimizing the reaction time, thereby facilitating the use of the
visualization.

Figure 14. Post-hoc analysis for the question*group.

7. Conclusions
This study proposed three methods for intuitive and semantic analysis based on the multilevel
sentiment network visualization that we created from movie review data to serve as collective
intelligence. These three methods were: (1) heatmap visualization that indicates the semantic word
characteristics of each node; (2) a method that describes network nodes based upon a two-dimensional
sentiment word map; and (3) a method using an asterism graphic for semantic interpretation of
clustering. An evaluation was then conducted to verify these proposals.
By using the three methods mentioned above, our system presents insights in a form that is easy for
users to understand.
The first evaluation revealed that participants understood the relations between the locations of the
nodes and the heatmap relatively well. However, their levels of awareness dropped significantly
for nodes in the middle area of the network, where awareness was the lowest even when subjects
had been informed of relevant content. From the second evaluation, we concluded that providing
heatmap visualization reinforced the understanding of emotions in each movie since it delivered
emotion characteristics. The final evaluation led to the conclusion that the emotion word guideline of
nodes, as well as the asterism metaphor, allowed users to understand the semantic network better than
being exposed to the network solely with nodes.
Results of these three methods signify that a heatmap visualization is a useful tool for identifying
subtle differences in sentiment word distribution among nodes placed in adjacent locations. This
sentiment-movie network allows users to promptly understand the characteristics of the nodes since it
assigns nodes based on sentiment words of each movie. Moreover, a two-dimensional distribution
map showing sentiment words facilitates the understanding of the main emotion of movie nodes.
Likewise, it is expected that these three methods can help users understand semantic and effective
multilevel sentiment analysis in different ways.
Our results also imply that the general public could efficiently select a movie, as illustrated by the
virtual scenario for obtaining a movie recommendation using our network visualization, heatmap,
and asterism graphic. This indicates that “CosMovis” could be adopted to create a novel movie
recommendation algorithm that provides new content with emotional patterns. However, this would
only be possible after further improvements regarding nodes in the middle area, since their awareness
level needs to be enhanced according to our second evaluation.
In addition to the movie review data covered so far, our research can be applied to any numerical
sentiment data (or other factors that might work as collective intelligence) as long as it can be formed
into a network structure. For instance, when analyzing a trending issue on Twitter, placing the issue
as a node in a network such as “CosMovis” can yield a direct and intuitive outcome by examining
relations between the emotion pattern of that topic and those of other topics. Future research is
needed to analyze data in ontology structures as well as sentiment analysis so that multilevel semantic
visualization can be adopted to better clarify the criteria or meanings of ontology-structured data. In
addition, we will conduct an additional evaluation to select visual metaphorical graphics for the
general public. Through this evaluation, we will select appropriate graphics that can make the general
public empathize with sentiment information. We will also repeat the three evaluations described in
Section 6 in order to demonstrate user effectiveness for the general public.

Author Contributions: Conceptualization, H.H.; Methodology, H.H., J.L. and K.L.; Software, H.H.; Validation,
S.M.; Formal Analysis, H.H. and J.L.; Investigation, S.M.; Data Curation, H.H.; Writing—Original Draft Preparation,
H.H.; Writing—Review & Editing, H.H., S.B. and K.L.; Visualization, H.H., H.H. and S.B.; Supervision, K.L.
Funding: This research was funded by the Ministry of Education of the Republic of Korea and the National
Research Foundation of Korea, grant number NRF-2018S1A5B6075104, and by the Brain Korea 21 Plus Digital
Therapy Research Team, grant number NRF31Z20130012946.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Hwansoo, L.; Dongwon, L.; Hangjung, Z. Personal Information Overload and User Resistance in the Big
Data Age. J. Intell. Inf. Syst. 2013, 19, 125–139.
2. Fayyad, U.M.; Wierse, A.; Grinstein, G.G. Information Visualization in Data Mining and Knowledge Discovery;
Morgan Kaufmann: Burlington, MA, USA, 2002.
3. Shneiderman, B.; Aris, A. Network Visualization by Semantic Substrates. IEEE Trans. Vis. Comput. Graph.
2006, 12, 733–740. [CrossRef] [PubMed]
4. Hao, M.; Rohrdantz, C.; Janetzko, H.; Dayal, U.; Keim, D.; Haug, L.E.; Hsu, M.C. Visual sentiment analysis
on twitter data streams. In Proceedings of the Visual Analytics Science and Technology (VAST), Providence,
RI, USA, 23–28 October 2011.
5. Thomas, M.J.; Edward, M. Graph Drawing by Force-directed Placement. Softw. Pract. Exp. 1991, 21, 1129–1164.
6. Hyoji, H.; Wonjoo, H.; Sungyun, B.; Hanmin, C.; Hyunwoo, H.; Gi-nam, K.; Kyungwon, L. CosMovis:
Semantic Network Visualization by Using Sentiment Words of Movie Review Data. In Proceedings of the
19th International Conference on Information Visualisation (IV 2015), Barcelona, Spain, 21 July–24 July 2015.
7. Hyoji, H.; Gi-nam, K.; Wonjoo, H.; Hanmin, C.; Kyungwon, L. CosMovis: Analyzing semantic network of
sentiment words in movie reviews. In Proceedings of the IEEE 4th Symposium on Large Data Analysis and
Visualization (LDAV 2014), Paris, France, 9–10 November 2014.
8. Hyoji, H.; Hyunwoo, H.; Seongmin, M.S.; Sungyun, B.; Jihye, L.; Kyungwon, L. Visualization of movie
recommendation system using the sentimental vocabulary distribution map. J. Korea Soc. Comput. Inf. 2016,
21, 19–29.
9. Doi, K.; Park, H.; Junekyu, S.; Sunyeong, P.; Kyungwon, L. Visualization of Movie Recommender System
using Distribution Maps. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis 2012),
Songdo, Korea, 28 February–2 March 2012.
10. Hyoji, H.; Gi-nam, K.; Kyungwon, L. A Study on Analysis of Sentiment Words in Movie Reviews and the
Situation of Watching Movies. Soc. Des. Converg. 2013, 43, 17–32.
11. Deng, Z.H.; Yu, H.; Yang, Y. Identifying sentiment words using an optimization model with l1 regularization.
In Thirtieth AAAI Conference on Artificial Intelligence; AAAI Press: Menlo Park, CA, USA, 2016.
12. MyungKyu, K.; JungHo, K.; MyungHoon, C.; Soo-Hoan, C. An Emotion Scanning System on Text Documents.
Korean J. Sci. Emot. 2009, 12, 433–442.
13. JoungYeon, S.; KwangSu, C. The Perceived Lexical Space for Haptic Adjective based on Visual Texture
aroused form Need for Touch. Soc. Des. Converg. 2013, 38, 117–128.
14. Yadollahi, A.; Shahraki, A.G.; Zaiane, O.R. Current state of text sentiment analysis from opinion to emotion
mining. ACM Comput. Surv. CSUR 2017, 50, 25. [CrossRef]
15. Kucher, K.; Paradis, C.; Kerren, A. The state of the art in sentiment visualization. Comput. Graph. Forum 2018,
37, 71–96. [CrossRef]
16. Oard, D.W.; Marchionini, G. A Conceptual Framework for Text Filtering; DRUM: College Park, MD, USA, 1996;
pp. 1–6.
17. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based Collaborative Filtering Recommendation Algorithms.
In Proceedings of the 10th International World Wide Web Conference, HongKong, China, 1–5 May 2001;
pp. 285–295.
18. Li, P.; Yamada, S. A Movie Recommender System Based on Inductive Learning. In Proceedings of the 2004
IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 1–3 December 2004; pp. 318–323.
19. Ziani, A.; Azizi, N.; Schwab, D.; Aldwairi, M.; Chekkai, N.; Zenakhra, D.; Cheriguene, S. Recommender
System Through Sentiment Analysis. In Proceedings of the 2nd International Conference on Automatic
Control, Telecommunications and Signals, Annaba, Algeria, 11–12 December 2017.
20. Cody, D.; Ben, S. Motif simplification: Improving network visualization readability with fan, connector, and
clique glyphs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13); ACM:
New York, NY, USA, 2013; pp. 3247–3256.
21. Uboldi, G.; Caviglia, G.; Coleman, N.; Heymann, S.; Mantegari, G.; Ciuccarelli, P. Knot: an interface for the
study of social networks in the humanities. In Proceedings of the Biannual Conference of the Italian Chapter of
SIGCHI (CHItaly ’13); ACM: New York, NY, USA, 2013; Volume 15, pp. 1–9.
22. Henry, N.; Bezerianos, A.; Fekete, J. Improving the Readability of Clustered Social Networks using Node
Duplication. IEEE Vis. Comput. Graph. 2008, 14, 1317–1324. [CrossRef] [PubMed]
23. Kang, G.J.; Ewing-Nelson, S.R.; Mackey, L.; Schlitt, J.T.; Marathe, A.; Abbas, K.M.; Swarup, S. Semantic
network analysis of vaccine sentiment in online social media. Vaccine 2017, 35, 3621–3638. [CrossRef]
[PubMed]
24. DougWoong, H.; HyeJa, K. Appropriateness and Frequency of Emotion Terms in Korea. Korean J. Psychol. Gen.
2000, 19, 78–98.
25. NAVER Movie. Available online: https://movie.naver.com/ (accessed on 1 February 2019).
26. Mecab-ko-lucene-analyzer. Available online: http://eunjeon.blogspot.kr (accessed on 15 February 2018).
27. Morpheme. Available online: https://en.wikipedia.org/wiki/Morpheme (accessed on 15 February 2018).
28. Srinivasa-Desikan, B. Natural Language Processing and Computational Linguistics: A Practical Guide to Text
Analysis with Python, Gensim, SpaCy, and Keras; Packt Publishing Ltd.: Birmingham, UK, 2018.
29. Wilkinson, L.; Friendly, M. The History of the Cluster Heat Map. Am. Stat. 2009, 63, 179–184. [CrossRef]
30. Van Eck, N.J.; Waltman, L.; Den Berg, J.; Kaymak, U. Visualizing the computational intelligence field.
IEEE Comput. Intell. Mag. 2006, 1, 6–10. [CrossRef]
31. Robert, G.; Nick, G.; Rose, K.; Emre, S.; Awalin, S.; Cody, D.; Ben, S. Meirav Taieb-Maimon NetVisia: Heat
Map, Matrix Visualization of Dynamic Social Network Statistics & Content. In Proceedings of the Third IEEE
International Conference on Social Computing (the SocialCom 2011), Boston, MA, USA, 9–11 October 2011.
32. Young-Sik, J.; Chung, Y.J.; Jae Hyo, P. Visualisation of efficiency coverage and energy consumption of sensors
in wireless sensor networks using heat map. IET Commun. 2011, 5, 1129–1137.
33. Fu, Y.; Liu, R.; Liu, Y.; Lu, J. Intertrochanteric fracture visualization and analysis using a map projection
technique. Med. Biol. Eng. Comput. 2019, 57, 633–642. [CrossRef]
34. Reinhardt, W.; Moi, M.; Varlem, T. Artefact-Actor-Networks as tie between social networks and artefact
networks. In Proceedings of the 5th International Conference on Collaborative Computing: Networking,
Applications and Worksharing, Washington, DC, USA, 11–14 November 2009.
35. Flocking Algorithms. Available online: https://en.wikipedia.org/wiki/Flocking_(behavior) (accessed on
15 February 2018).
36. Risch, J.S. On the role of metaphor in information visualization. arXiv 2008, arXiv:0809.0884.
37. Hiniker, A.; Hong, S.; Kim, Y.S.; Chen, N.C.; West, J.D.; Aragon, C. Toward the operationalization of visual
metaphor. J. Assoc. Inf. Sci. Technol. 2017, 68, 2338–2349. [CrossRef]
38. Wiebe, K.L.; Bortolotti, G.R. Variation in carotenoid-based color in northern flickers in a hybrid zone. Wilson
J. Ornithol. 2002, 114, 393–401. [CrossRef]
39. Edgell, S.E.; Stephen, E.; Noon, S.M.; Sheila, M. Effect of violation of normality on the t test of the correlation
coefficient. Psychol. Bull. 1984, 95, 579. [CrossRef]
40. Clinch, J.J.; Keselman, H.J. Parametric alternatives to the analysis of variance. J. Educ. Stat. 1982, 7, 207–214.
[CrossRef]
41. Fujikoshi, Y. Two-way ANOVA models with unbalanced data. Discret. Math. 1993, 116, 315–334. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

applied
sciences
Article
Sentiment Classification Using Convolutional
Neural Networks
Hannah Kim and Young-Seob Jeong *
Department of Future Convergence Technology, Soonchunhyang University, Asan-si 31538, Korea;
hannah@sch.ac.kr
* Correspondence: bytecell@sch.ac.kr

Received: 29 April 2019; Accepted: 4 June 2019; Published: 7 June 2019 

Abstract: As the amount of textual data increases exponentially, it becomes more important to
develop models that analyze text data automatically. Texts may carry various labels such
as gender, age, country, sentiment, and so forth. Using such labels may bring benefits to some
industrial fields, so many studies of text classification have appeared. Recently, the Convolutional
Neural Network (CNN) has been adopted for the task of text classification and has shown quite
successful results. In this paper, we propose convolutional neural networks for the task of sentiment
classification. Through experiments with three well-known datasets, we show that employing
consecutive convolutional layers is effective for relatively long texts and that our networks perform
better than other state-of-the-art deep learning models.

Keywords: deep learning; convolutional neural network; sentiment classification

1. Introduction
In the Big Data era, the amount of data of various kinds, such as images, video, sound, and text, is
increasing exponentially. As text is the largest among them, studies related to text analysis have been
actively conducted from the past to the present. In particular, text classification has drawn much
attention because text may have categorical labels such as sentiment (e.g., positive or negative),
author gender, language category, or various types (e.g., spam or ham). For example, the users of
Social Network Services (SNS) often express their sentiments, and they often share
opinions about daily news with the public or friends. Emotional analysis involves mood categories
(e.g., happiness, joy, satisfaction, anger), while sentiment analysis involves categories such as positive,
neutral, and negative. In this paper, we target sentiment analysis, which classifies a given text
into one of the sentiment categories. On websites about movies, people are likely to post comments
that contain sentiment or opinions. If such sentiment is accurately predicted, the prediction
can be applied to various industrial fields (e.g., movie recommendation, personalized news-feed).
Indeed, the international market of movies is growing much faster than before, so many companies
(e.g., Netflix) provide movie recommendation services that essentially predict the sentiment or rating
scores of customers.
There have been many studies that have adopted machine learning techniques for text
classification. Although the machine learning techniques have been widely used and have shown
quite successful performance, they strongly depend on manually-defined features, where the feature
definition requires much effort of domain experts. For this reason, deep learning techniques have
been drawing attention recently, as they may reduce the effort for the feature definition and achieve
relatively high performance (e.g., accuracy). In this paper, we aim at sentiment classification for text
data and propose an architecture of the Convolutional Neural Network (CNN) [1–3], which is a type of

Appl. Sci. 2019, 9, 2347; doi:10.3390/app9112347 www.mdpi.com/journal/applsci



deep learning model. We demonstrate the effectiveness of our proposed network through experimental
comparison with other machine learning models.
The contributions of this paper can be summarized as follows: (1) we design an architecture of two
consecutive convolutional layers to improve performance for long and complex texts; (2) we provide
a discussion about the architectural comparison between our models and two other state-of-the-art
models; (3) we apply the CNN model to binary sentiment classification and achieved 80.96%, 81.4%,
and 70.2% weighted F1 scores for Movie Review (MR) data [4–6], Customer Review (CR) data [7],
and Stanford Sentiment Treebank (SST) data [8], respectively; and (4) we also show that our model
achieved 68.31% for ternary sentiment classification with the MR data.
The remainder of this paper is organized as follows. Section 2 reviews related studies about
sentiment classification, machine learning, and deep learning models. Section 3 presents a detailed
description of our proposed network. Section 4 describes the experimental settings and datasets
and compares the proposed model with some other models. Thereafter, Section 5 discusses the
experimental results. Finally, Section 6 concludes this paper.

2. Background

2.1. Machine Learning for Sentiment Classification


Sentiment can be defined as a view of or an attitude toward a situation or event. It often
involves several types of self-indulgent feelings: happiness, tenderness, sadness, or nostalgia. One may
define the sentiment labels as a polarity or valence (e.g., positive, neutral, and negative) or as several
types of emotional feeling (e.g., angry, happy, sad, proud). The definition of sentiment labels will
affect the outcome of the sentiment analysis, so we need to define the sentiment labels carefully. There
have been studies that defined three or more sentiment labels (e.g., opinion rating scores, emotional
feelings) [9–13], and some studies adopted binary labels (e.g., positive and negative) [14–26].
Although there has been much performance improvement in the field of sentiment analysis, the binary
classification of sentiment still remains challenging; for example, the performance (e.g., accuracy)
of recent studies ranged from 70% to 90% depending on the characteristics of the data. In this paper, we aim
at the binary classification (positive or negative) and the ternary classification (positive, neutral,
and negative).
There have been many studies on classifying sentiments using machine learning models, such
as Support Vector Machine (SVM), Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient
Descent (SGD), and ensemble. The most widely-used features for such machine learning models
have been n-grams. Read [14] used unigram features for sentiment binary classification and obtained
88.94% accuracy using the SVM. Kennedy and Inkpen [15] adopted unigram and bigram features
for sentiment binary classification and achieved 84.4% accuracy for movie review data [27] using
the SVM. Wan [20] tackled binary classification with unigram and bigram features and achieved
86.1% accuracy for translated Amazon product review data. In [21], Akaichi utilized a combination
of unigram, bigram, and trigram features and obtained 72.78% accuracy using the SVM. Valakunde
and Patwardhan [28] aimed at five-class (e.g., strong positive, positive, neutral, negative, and strong
negative) sentiment classification and obtained 81% accuracy using the SVM with bigram features. In
the study of Gautam and Yadav [18], they utilized the SVM along with the semantic analysis model
for the sentiment binary classification of Twitter texts and achieved 89.9% accuracy using unigram
features. Tripathy et al. [19] also used the SVM with n-gram features and obtained 88.94% accuracy
for sentiment binary classification. Hasan et al. [25] adopted unigram features for sentiment binary
classification and achieved 79% using the NB for translated Urdu tweets data. All of these studies
that utilized n-gram features generally achieved 70–90% accuracies, and the most effective model was
the SVM.
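As a minimal sketch of the n-gram features mentioned above, the snippet below extracts unigrams and bigrams from a tokenized sentence. The helper name `ngrams` and the example sentence are assumptions made for illustration; none of the cited studies specifies this implementation.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token tuples in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical tokenized review.
tokens = ["not", "a", "good", "movie"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
print(bigrams)
```

A bigram such as ("not", "a") keeps a little word order that unigrams alone discard, which is one reason the studies above combine several n-gram sizes.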
There also have been studies that have defined hand-crafted features for sentiment classification.
Yassine and Hajj [29] used affective lexicon, misspelling, and emoticons as features and obtained


87% accuracy for ternary classification using the SVM. Denecke [22] defined three scores (e.g.,
positivity, negativity, and objectivity) as features and achieved 67% precision and 66% recall for
binary classification using the Logistic Regression (LR) classifier. Jiang et al. [16] tackled binary
classification using the SVM and achieved 67.8% accuracy for Twitter texts, where they adopted two
kinds of target-independent features (e.g., twitter content features and sentiment lexicon features).
Bahrainian and Dengel [23] used the number of positive words and the number of negative words
as features and achieved 86.7% accuracy for binary classification with Twitter texts. Neethu and
Rajasree [17] combined unigram features and their own Twitter-specific features and obtained 90%
accuracy for binary classification using the SVM. Karamibekr and Ghorbani [30] defined the number
of opinion nouns as a feature and combined the feature with unigrams. They achieved 65.46% accuracy
for ternary classification using the SVM. Antai [24] used the normalized frequencies of words as
features and obtained 84% accuracy for binary classification using the SVM. Ghiassi and Lee [31]
aimed at five-class sentiment classification, and they defined a domain-independent feature set for
Twitter data. They achieved a 92.7% F1 score using the SVM. Mensikova and Mattmann [26] utilized
the results of Named Entity (NE) extraction as features and obtained a 0.9 False Positive Rate (FPR).
These studies elaborated on feature definition rather than using only n-gram features, and they
accomplished better performance (e.g., 92.7% F1 score). It is not fair, of course, to compare the
performance between the previous studies because they used different datasets. However, the results
in [17,30] suggest that combining hand-crafted features with n-grams works better
than using n-gram features alone.
Although there has been success using machine learning models with hand-crafted features
and n-gram features, these studies share a common limitation: their performance
depends on how well the features were defined, so for different data, much effort of domain experts will
be required to achieve better performance. This limitation also exists in information fusion approaches
for sentiment analysis [32] that combine other resources (e.g., ontology, lexicon), as they cost much
time and effort of domain experts. Deep learning is one solution to this limitation,
as it is known to capture arbitrary patterns (i.e., features) automatically. Furthermore, as stated in [33],
using a deep learning model for sentiment analysis provides a meta-level feature representation
that generalizes well to new domains.
In this paper, we take the deep learning technique to tackle the sentiment classification. In the
next subsection, we review previous studies that adopted the deep learning techniques for the
sentiment classification.

2.2. Deep Learning for Sentiment Classification


The deep learning technique, one of the machine learning techniques, has recently been widely
used to classify sentiments. Dong et al. [34] designed a new model, namely the Adaptive Recursive
Neural Network (AdaRNN), that classifies Twitter texts into three sentiment labels: positive,
neutral, and negative. In their experiments, the AdaRNN achieved 66.3% accuracy. Huang et al. [35]
proposed Hierarchical Long Short-Term Memory (HLSTM) and obtained 64.1% accuracy on Weibo
tweet texts. Tang et al. [36] introduced a new variant of the RNN model, the Gated Recurrent Neural
Network (GRNN), which achieved 66.6% (Yelp 2013–2015 data) and 45.3% (IMDB data) accuracies. All
of these studies commonly assumed that there were three or more sentiment labels.
Meanwhile, Qian et al. [37] utilized Long Short-Term Memory (LSTM) [38–41] for binary
classification of sentiment and obtained 82.1% accuracy on the movie review data [27]. Kim [1]
had a result of a maximum of 89.6% accuracy with seven different types of data through their CNN
model with one convolutional layer. Zhang et al. [42] proposed three-way classification and obtained
a maximum 94.2% accuracy with four datasets, where their best three-way model was NB-SVM.
Severyn and Moschitti [43] employed a pretrained Word2Vec for their CNN model and achieved
84.79% (phrase-level) and 64.59% (message-level) accuracies with SemEval-2015 data. The CNN model
used in [43] was essentially the same as the model of [1]. Deriu et al. [44] trained the CNN model that


had a combination of two convolutional layers and two pooling layers to classify the sentiment of
tweet data in four languages and obtained a 67.79% F1 score. In the study of Ouyang et al. [45], a CNN
model with three convolution/pooling layer pairs was proposed, and the model outperformed
other previous models including the Matrix-Vector Recursive Neural Network (MV-RNN) [46]. We
conducted experiments with several of our CNN models that had different structures, and two of our
models (the seventh and eighth models) were similar to the models of [44,45].
As summarized above, for the sentiment classification, there are two dominant types of deep
learning technique: RNN and CNN. In this work, we propose a CNN model, the structure of which
was carefully designed for effective sentiment classification. In the next subsection, we explain the
advantages of the CNN in analyzing text.

2.3. Convolutional Neural Network for Text Classification


Among the existing studies using deep learning to classify texts, the CNN takes advantage of the
so-called convolutional filters that automatically learn features suitable for the given task. For example,
if we use the CNN for the sentiment classification, the convolutional filters may capture inherent
syntactic and semantic features of sentimental expressions, as shown in [47]. It has been shown
that a single convolutional layer, a combination of convolutional filters, might achieve comparable
performance even without any special hyperparameter adjustment [1]. Furthermore, the CNN does
not require expert knowledge about the linguistic structure of a target language [48]. Thanks to these
advantages, the CNN has been successfully applied to various text analyses: semantic parsing [49],
search by query [50], sentence modeling [2].
One may argue that the Recurrent Neural Network (RNN) [51] might be better for text
classification than the CNN, as it preserves the order of the word sequence. However, the CNN
is also capable of capturing sequential patterns, because its convolutional filters capture local
patterns; for example, convolutional filters along with the attention technique have been successfully
applied to machine translation [52]. Moreover, compared to the RNN, the CNN mostly has a smaller
number of parameters, so the CNN is trainable with a small amount of data [43]. The CNN is also
known to exploit the richness of pretrained word embeddings [53].
In this paper, we design a CNN model for the sentiment classification and show that our network
is better than other deep learning models through experimental results.

3. The Proposed Method


The CNN, which has been widely used on image datasets, extracts the significant features of an
image as the “convolutional” filter (i.e., kernel) moves through the image. If the input data are given
as a one-dimensional sequence, the same operation can be applied to text as well. In the text domain, as
the filter moves, local information of the text is captured, and important features are extracted. Therefore,
using the CNN for text classification is effective.
Figure 1 shows a graphical representation of the proposed network. The network consists of an
embedding layer, two convolutional layers, a pooling layer, and a fully-connected layer. We padded
the sentence vectors to make them a fixed size. That is, sentences that were too long were cut to a fixed
length, and sentences that were too short were padded with the [PAD] token. We set the fixed length S to
be the maximum length of the sentences. An embedding layer that maps each word of a sentence to an
E-dimensional feature vector outputs an S × E matrix, where E denotes the embedding size. For example,
suppose that king, shoes, and queen sit at positions 10, 11, and 20 in the embedding space. Positions 10
and 20 are close in this space because of the semantic similarity of king and queen, but positions 10 and
11 are quite far apart because of the semantic dissimilarity of king and shoes. In this example, 10, 11,
and 20 are not numeric values; they are simply labels for positions in this space. In other words, the
embedding layer places the words received as input into a semantically well-designed space, where words
with similar meanings are located close together and words with opposite meanings are located far apart,
digitizing them into vectors.
The embedding is the process of projecting a two-dimensional matrix into a low-dimensional vector


space (E-dimension) to obtain a word vector. The embedding vectors can be obtained from other
resources (e.g., Word2Vec) or from the training process. In this paper, our embedding layer was
obtained through the training process, and all word tokens including the [UNK] token for unseen
words would be converted to numeric values using the embedding layer.
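A minimal sketch of the embedding lookup described above, assuming a toy four-word vocabulary and randomly initialized vectors (in the paper the vectors are learned during training). The names `make_embedding_table` and `embed`, the sizes S = 5 and E = 4, and the token IDs are illustrative assumptions, not the authors' code.

```python
import random

def make_embedding_table(vocab_size, embed_dim, seed=0):
    """Build a vocab_size x embed_dim table of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up one E-dimensional row per token ID, giving an S x E matrix."""
    return [table[i] for i in token_ids]

# Assumed toy vocabulary: 0 = [PAD], 1 = [UNK], 2 = "great", 3 = "movie".
S, E = 5, 4
table = make_embedding_table(vocab_size=4, embed_dim=E)

# "great movie <unseen word>" padded to length S with [PAD].
token_ids = [2, 3, 1, 0, 0]
matrix = embed(token_ids, table)
print(len(matrix), len(matrix[0]))  # number of rows (S) and columns (E)
```

Every [PAD] position maps to the same row of the table, so padding adds no new information beyond the sentence length.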

Figure 1. The graphical representation of the network, where the output dimensions of each layer are
represented at the bottom of the corresponding layers.

The S × E matrix, the output of the embedding layer, is then fed into the first convolutional
layer. The first convolutional layer uses C1 × E filters, which capture the local information needed
to classify the sentiment class in the S × E matrix and convey it to the next convolutional
layer. Each C1 × E filter slides (i.e., convolves) over the S × E matrix with a chosen
stride, calculates the dot product at each position, and passes the results to the next layer. The second
convolutional layer uses C2 × 1 filters to extract features from the contextual information of the
main word based on the local information captured by the first convolutional layer. C1 and C2 denote the
filter sizes of the two convolutional layers, which have K1 and K2 distinct filters,
respectively, to capture unique contextual information. In other words, the first convolutional layer
looks at simple contextual information while scanning the S × E matrix, and the second
convolutional layer captures and extracts the key features (e.g., worst, great) that
carry the sentiment affecting classification.
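The two consecutive convolutions can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: a single filter per layer (K1 = K2 = 1), stride 1, no bias and no activation function, with hand-picked toy values; the function names are hypothetical.

```python
def conv_full_width(matrix, filt, stride=1):
    """Slide a C x E filter down an S x E matrix; one dot product per position."""
    c = len(filt)
    out = []
    for start in range(0, len(matrix) - c + 1, stride):
        total = 0.0
        for r in range(c):
            for a, b in zip(matrix[start + r], filt[r]):
                total += a * b
        out.append(total)
    return out

def conv_1d(seq, filt, stride=1):
    """Slide a length-C filter along a one-dimensional sequence."""
    c = len(filt)
    return [sum(seq[s + r] * filt[r] for r in range(c))
            for s in range(0, len(seq) - c + 1, stride)]

# Toy S x E input with S = 5 and E = 2.
embedded = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2]]
filt1 = [[1, 1], [1, 1]]   # C1 = 2, spans the full embedding width E
filt2 = [1, 0, -1]         # C2 = 3, applied to the first layer's output

first = conv_full_width(embedded, filt1)   # length S - C1 + 1 = 4
second = conv_1d(first, filt2)             # length 4 - C2 + 1 = 2
print(first, second)
```

With stride 1 and no padding, each layer shortens the sequence by its filter size minus one, which is why stacking layers is only useful when the input is long enough.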
The matrix that has passed through the consecutive convolutional layers is used as the input to the
pooling layer. While average-pooling and L2-norm pooling have also been used for the pooling
layer, in this paper, we used max-pooling, which is a technique for selecting the largest value as
a representative of the neighboring values. Since the sentiment is often determined by a combination of
several words rather than being expressed by every word in the sentence, we adopted the
max-pooling technique. The pooling layer slides over all the values of the matrix that is the output of the
second convolutional layer with a chosen stride, resulting in output vectors. Since max-pooling
passes to the next layer only the largest value among several values, it results in output
vectors of a much smaller size. In other words, the convolutional layers look at the context and extract
the main features, and the pooling layer plays a role in selecting the most prominent features.
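A small sketch of max-pooling over a one-dimensional feature sequence follows. The window size, the stride, and the input values are assumptions chosen for illustration; the paper does not state them at this point.

```python
def max_pool_1d(seq, pool_size, stride):
    """Keep only the largest value inside each window of pool_size values."""
    return [max(seq[s:s + pool_size])
            for s in range(0, len(seq) - pool_size + 1, stride)]

# Hypothetical feature values produced by the second convolutional layer.
features = [0.1, 0.9, 0.3, 0.2, 0.7, 0.4]
pooled = max_pool_1d(features, pool_size=2, stride=2)
print(pooled)
```

Here six values shrink to three, and each surviving value is the strongest response in its window.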
After passing through the pooling layer, a flattening process is performed to convert the
two-dimensional feature map from the output into a one-dimensional format and to deliver it
to an F-dimensional Fully-Connected (FC) layer. Since the FC layer takes a one-dimensional vector
as the input, the two-dimensional feature map delivered from the pooling layer needs to be flattened. The
FC layer connects all input and output neurons. A vector that passes through the FC layer forms an
output that is classified as positive or negative. The softmax activation function is used in the FC
layer to classify multiple classes; it outputs a probability value for each class.
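The flattening and softmax steps can be sketched as below. The pooled feature map and the fully-connected weights are made-up values for illustration only (real weights are learned), and the subtraction of the maximum score is a standard numerical-stability trick rather than something the paper mentions.

```python
import math

def flatten(feature_map):
    """Turn a two-dimensional feature map into a one-dimensional list."""
    return [v for row in feature_map for v in row]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                           # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pooled output (2 x 2) and FC weights for two classes.
pooled_map = [[0.2, 1.5], [0.7, 0.1]]
flat = flatten(pooled_map)
weights = [[1.0, 1.0, 1.0, 1.0],       # assumed weights for class 0
           [-1.0, -1.0, -1.0, -1.0]]   # assumed weights for class 1
scores = [sum(w * x for w, x in zip(row, flat)) for row in weights]
probs = softmax(scores)
print(probs)
```

The predicted class is the index of the largest probability, here class 0 under the assumed weights.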
One might expect that stacking many convolutional layers would be better for storing local
information and extracting contextual information; however, deep networks do not always perform
better than shallow networks. Because a network’s variation (e.g., deep or shallow) might rely


on the length of the data and the number of data and features, in this paper, we argue that data passing
through two consecutive convolutional layers and then passing through the pooling layer is successful
at storing context information and extracting prominent features. This assertion is demonstrated by
experiments and discussed in Section 5.

4. Experiment

4.1. Data
We used three Movie Review (MR) datasets from Kaggle [4–6]. The dataset in [4] consisted of two
labels, positive and negative, while [5] was composed of three labels: positive, neutral, and negative.
Furthermore, the dataset in [6] was composed of five labels: positive, somewhat positive, neutral,
somewhat negative, and negative. The positive class of [4], the positive class of [5], and the positive
and somewhat positive classes of [6] were merged and labeled as the positive class. The neutral class
of [5] and the neutral class of [6] were merged and labeled as the neutral class. Then, the negative
class of [4], the negative class of [5], and the somewhat negative and negative classes of [6] were
merged and labeled as the negative class. Positive, neutral, and negative were encoded as 1, 0, and 2,
respectively. In total, 27,435 samples were used: 11,800 positive, 5937 neutral, and 9698 negative.
Amazon Customer Review data (CR) [7] were labeled as positive and negative, and the amount
of positive data was larger than that of negative data. Therefore, we used only 3671 out of 34,062 samples
to control the positive:negative ratio. Stanford Sentiment Treebank data (SST) [8] were labeled with a
value between 0 and 1, so a value of 0.5 or more was relabeled as positive, and a value of less than 0.5
was relabeled as negative.
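The label merging described above amounts to a small lookup table plus a threshold. The sketch below assumes the source labels are available as the lowercase strings shown; the actual label encodings of the Kaggle files are not given in the text, so these strings and the function names are hypothetical.

```python
# Integer codes as stated in the text: positive = 1, neutral = 0, negative = 2.
POSITIVE, NEUTRAL, NEGATIVE = 1, 0, 2

# Assumed label strings; "somewhat" classes are folded into their polarity.
MERGE = {
    "positive": POSITIVE,
    "somewhat positive": POSITIVE,
    "neutral": NEUTRAL,
    "somewhat negative": NEGATIVE,
    "negative": NEGATIVE,
}

def merge_label(name):
    """Map a two-, three-, or five-class label onto the merged scheme."""
    return MERGE[name]

def relabel_sst(score):
    """SST scores lie in [0, 1]: 0.5 or more is positive, otherwise negative."""
    return POSITIVE if score >= 0.5 else NEGATIVE

# The class counts reported in the text add up to the stated total.
total = 11800 + 5937 + 9698
print(merge_label("somewhat negative"), relabel_sst(0.5), total)
```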
Table 1 shows the detailed statistics of the data. N is the number of text data, Dist (+, −) is the
ratio of positive:negative classes, and aveL/maxL is the average length/maximum length of the text.
We divided all the data into three sets (i.e., train, validation, and test) with a ratio of 55:20:25, and |V|
is the dictionary size of each dataset.

Table 1. Detailed statistics of the data.

Data   N        Dist (+,−)   aveL/maxL   Train:Test:Val      |V|
MR     21,498   55:45        31/290      12,095:5375:4031    9396
CR     3671     62:38        19/227      2064:918:689        1417
SST    11,286   52:48        12/41       6348:2822:2116      3550
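A train/validation/test split with the stated 55:20:25 ratio could be sketched as follows. The shuffling, the fixed seed, and the integer rounding are assumptions; the text does not say how the authors drew the split, so the counts this sketch produces need not match Table 1 exactly.

```python
import random

def split_dataset(items, ratios=(55, 20, 25), seed=0):
    """Shuffle items and cut them into train, validation, and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test

# Hypothetical dataset of 100 sample IDs.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```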

4.2. Preprocessing
Preprocessing was carried out to prepare the text data for the experiment. We
lowercased the text and did not mark the start and end of the sentences. We deleted #, runs of two or more
spaces, tabs, Retweets (RT), and stop words. We also replaced URLs beginning
with “http” with [URL] and account IDs beginning with “@” with [NAME].
In addition, we changed digits to [NUM] and special characters to [SPE]. We changed “can’t” and
“isn’t” to “can not” and “is not”, respectively, since “not” is important in sentiment analysis.
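The preprocessing rules above can be sketched with the standard `re` module. Several details are assumptions, since the text leaves them open: the tiny stop-word list, the order in which the rules are applied, the use of a straight apostrophe in the contractions, and the choice to emit one [NUM] per run of digits and one [SPE] per remaining special character.

```python
import re

STOP_WORDS = {"the", "a", "an", "of"}            # assumed, illustrative only
CONTRACTIONS = {"can't": "can not", "isn't": "is not"}
PIECE = re.compile(r"\d+|[a-z]+|[^a-z\d\s]")     # digits, words, or one symbol

def preprocess(text):
    """Apply the cleaning rules and return a list of tokens."""
    text = text.lower().replace("#", "")
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    tokens = []
    for raw in text.split():                     # also drops tabs, extra spaces
        if raw == "rt":
            continue                             # retweet marker
        if raw.startswith("http"):
            tokens.append("[URL]")
        elif raw.startswith("@"):
            tokens.append("[NAME]")
        else:
            for piece in PIECE.findall(raw):
                if piece[0].isdigit():
                    tokens.append("[NUM]")
                elif "a" <= piece[0] <= "z":
                    if piece not in STOP_WORDS:
                        tokens.append(piece)
                else:
                    tokens.append("[SPE]")
    return tokens

cleaned = preprocess("RT @bob The movie isn't  good! 10 stars http://t.co/x #fun")
print(cleaned)
```

Keeping "not" as its own token after expanding the contraction is what lets a later filter see the negation next to "good".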
We split the text data by spaces and constructed a new dictionary using only the words appearing
more than six times in the whole text. In the MR data, 9394 words out of a total of 38,195 words were
included in the dictionary. In the CR data, 1415 out of a total of 5356 words, and in the SST data, 3548
out of a total of 16,870 words were included. The new dictionary also included [PAD] for padding and
[UNK] to cover missing words. Padding is a method of cropping all text to a fixed length and filling text
shorter than that length with the [PAD] token. Here, the fixed length was the maximum length of text
in each dataset.
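A minimal sketch of the dictionary construction and padding described above, using `collections.Counter`. The function names, the ID assignment order, and the toy corpus are assumptions for illustration; only the "more than six times" threshold and the [PAD] and [UNK] tokens come from the text.

```python
from collections import Counter

def build_vocab(tokenized_texts, min_count=7):
    """Keep words seen more than six times; reserve IDs for [PAD] and [UNK]."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for word, n in sorted(counts.items()):
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(tokens, vocab, length):
    """Map tokens to IDs, crop to the fixed length, and pad with [PAD]."""
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens][:length]
    return ids + [vocab["[PAD]"]] * (length - len(ids))

# Toy corpus: "good" occurs seven times, "bad" only six times.
corpus = [["good"] * 7, ["bad"] * 6]
vocab = build_vocab(corpus)
encoded = encode(["good", "bad"], vocab, length=4)
print(vocab, encoded)
```

The two reserved tokens explain why each |V| in Table 1 is two larger than the word counts quoted above (for example, 9394 + 2 = 9396 for MR).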

In these latitudes the forest becomes a little diversified by the
appearance of several evergreens. One of the commonest is the
evergreen or live-oak. I frequently saw from the cars a small tree
having very much the appearance of our bay. There is an occasional
magnolia. There are several evergreen shrubs, and some creepers.
In swampy places there are long reaches of cane-brake. This is the
Bambusa gracilis, which we sometimes see in this country in
gardens, in spots where moisture and shelter can be combined.
These cane- or bamboo-brakes are evergreen impervious jungles
about twenty feet high. But the most interesting form to Northern
eyes is that of the little palmetto palm. Its fan-like leaf, however, is all
that can be seen of it, for its trunk scarcely rises above the ground.
In some of the swamps around New Orleans it and a kind of iris are
the most conspicuous plants.
The question of the colonisation or resettlement of the South by
the North is simply one of £. s. d. Will it be more profitable for a
Northern man to grow cotton in the South, or wheat, pork, beef, &c.
in the West? People say Northern emigrants will never go to the poor
lands of the South till all the good land of the West is taken up. This
is quite a wrong way of putting the case between the two. A man
would not go to the South to grow wheat, beef, and pork. That can
be done cheaper in the fertile West. But it is not impossible that an
equal amount of capital and labour embarked in the cotton culture of
the South might produce a greater return. In that case the poor soils
of the South might be preferred to the rich soils of the West.
One infers how much more completely the fabric of society rested
on slavery in South Carolina than it did in Virginia, by the fact that
three fourths of the devastation caused by the great fire at Richmond
have been already repaired, while nothing had been done to repair
the damages done by the great fire at Charleston, which destroyed
the whole of the centre of the city. Literally not one single brick has
yet been laid upon another for this purpose. There stand the
blackened remains of churches, residences, and stores, just as if it
were a city of the dead.
Still, even here I was surprised at the number and magnitude of
the hotels. I had taken up my quarters at the Mills House, where I
suppose there were three hundred guests. I saw another, the Pavilion,
quite as large; and I was told of a third, the Charleston Hotel, which
was described to me as being much larger than either
of the two just mentioned.
Before I reached Charleston there had been some wet weather,
and the streets were muddy. I therefore used two pair of boots the
first day I was in the place. The next morning the Sam Weller of the
hotel refused to clean, or as he called it, to shine, more than one
pair. I remonstrated with him, but it was to no purpose, as he was
quite persuaded of the force of his own argument, ‘that if everyone in
the house was to wear two pair of boots a-day his work would be
doubled.’ Boots was a Paddy, and, since his naturalisation in the
land of freedom, had become sure that he had as much right to form
opinions for himself, express them, and act upon them, as the
President himself. I carried my grievance to the manager. He
promised redress, but the boots were not cleaned, that is to say, not
in the hotel. I mention this as it was the only instance of
insubordination I met with in this class in my tour through the United
States.
Nature has done much for Charleston. Its fine harbour is formed
by the junction of two large navigable rivers, the Ashley and the
Cooper. It is free from yellow fever, the scourge of Mobile and New
Orleans. The orange and oleander grow in the spaces between its
streets and its houses. Hitherto it has been the chief winter resort of
those in the Northern States who were unable to bear the severity of
their own climate, and of those from the West Indies who were
unable to bear the heat of theirs. Thus the victims both of heat and of
cold met in Charleston, and each found in its delightfully tempered
atmosphere exactly what he sought. And the society of these
visitors, combined with that of the rich proprietors of the State who
had their residences here, made it a very gay and pleasant place. Of
course America, hard-worked and consumption-scourged above all
other nations, ought to have its Naples; and it will be equally a matter
of course that the Naples of America should temper gaiety with
trade, and combine work with idleness, and should not be entirely a
population of do-nothings, of Lazaroni, and of pleasure-seekers. Still,
the feeling that came over one at Charleston was, How far off this
place is from the world! It was not the distance that caused this
feeling, for it is only nine hundred miles from New York, while
Chicago, where one has no feelings of this kind, is three hundred
miles further off. Perhaps fifty years hence, when probably a busy
white population will be cultivating the land around it, there will be
nothing in the place to suggest the thought that, when there, one is
out of the world.
I was surprised at finding how few Englishmen had, at all events of
late, been travelling in the Southern States. At Charleston, in a
dozen folio pages of names in the guest-book of the hotel,
which I looked at for this purpose, I could only see one entry from
England. At Richmond I only found two English names in forty folio
pages, and the address appended to these two names was
Manchester. In the North I heard that Lord Morley, and Lord
Camperdown, and Mr. Cowper—I do not know whether the Right
Hon. W. Cowper was meant—had lately been through that part of
the country inspecting the common schools. But during my tour
through the United States, I did not fall in with a single English
traveller, nor did I fall in with one either on my outward or homeward
voyage. All my fellow-passengers on both occasions were
Americans. Of course in the summer—though it may be questioned
whether that is, so decidedly as most people suppose, the best
season for travelling in the United States—I might not have found my
countrymen so conspicuous by their absence. And yet there is no
part of the world which Englishmen ought to find so instructive and
interesting. What is there in the world more worthy of investigation
than the existing condition of things, and the events that are now
taking place, upon this great continent, which contains within itself
everything that is necessary for the well-being of man, which is
indeed a world in itself, and which stands in the same relation
towards the Atlantic and Pacific oceans that this little island does
towards the Irish and North seas; and which, whatever else may
befall, will at all events be thoroughly Anglo-Saxon?
Wherever one went in the South, one heard instances of the
hardships, the deadliness, and the evils of the late war. Of the family
of which I saw most at Richmond, the three sons had been sent to
the army. Of these three one was killed, and another maimed for life.
These were men who were no longer young, and many a day in their
long marches, and when before the enemy, their commissariat
having been utterly exhausted, they had had nothing to eat but a few
handfuls of horse corn. There were regiments that did not bring
home one fourth of the number with which they originally went out;
the rest having died of disease, or fallen in battle. In the family of
which I saw most at Charleston there were two sons, the eldest only
eighteen years of age. They both enlisted in Hampton’s cavalry, for
all above the age of sixteen had to go. And so again at New Orleans
the gentleman to whom I was especially consigned, and who had
held the commission of colonel in the Confederate army, and had
been a rich merchant of the place, told me that he and his whole
regiment were frequently without shoes, and that on one occasion he
went barefooted for six weeks continuously.
Who are now Schoolmasters in the South.
One of the most lamentable results of the great cataclysm that ensued
on the close of the war, has been that almost a complete end has been
put to the education of Southern children. Formerly many
were sent to the North, but now parents have
neither the inclination nor the means to continue this practice. In the
South itself schools did not abound; and of those which existed
before the war the greater number have followed the fate of so many
other Southern things. Some effort is being made to remedy this. In
one of the Southern cities I had been advised to call on a gentleman
who would be able to give me information on these matters. I found
him at last, after some trouble, teaching a private school of his own.
He was engaged in giving a French lesson. I remember him as a
remarkably handsome and well-mannered man. On my saying
something which implied astonishment at finding him so employed,
he replied that his profession was that of a lawyer; but that since the
war he had not been allowed to practise, because he was unable to
take the oaths which the North had imposed on all who had held any
office in the South during the war. But he added, ‘I am not in bad
company, for many of the best men in the South, beginning with
General Lee, are employed in teaching.’
While I was in the South the conventions were sitting. These were
assemblies elected under the new system of universal suffrage,
including the blacks, for the purpose of drawing up new constitutions
for their several States. Of course where the negroes were
numerically in the ascendant, the majority of the convention were
either negroes or negro nominees. I felt a repugnance to witness this
degradation of the whites; still it would have been foolish to have let
pass the opportunity for seeing what indications the blacks gave of
fitness for equal political power; and, as the governor of the State I
was then in offered to take me to the convention and introduce me to
some of the members, I accepted the offer. We had been talking on
the irrepressible subject of the negro, and I had said I thought the
blacks ought to be put on exactly the same footing as the whites. He
had assented to this idea, as it sounded very much like what he was
there to maintain. But I am not quite sure that he altogether
approved of my explanation, when I went on to say that what I meant
was, ‘that the negro should be left to work or starve, just as the white
man was in New York, in England, and everywhere else; that I did
not at all see why the negro should be petted, and patted on the
back, and have soup given to him, while he was doing nothing, and
have expectations raised in his mind that something would soon be
done for him, which treatment could not in the end be of any
advantage at all to him, while it was very costly to the whites.’ His
reply to this was the offer I just mentioned to take me to the
convention and introduce me to some of the members who were
among the leading partisans of the negro race in the State.
I was present for two hours in this convention; during that time no
speeches were made; I was therefore unable to judge in this manner
of the intelligence and animus of the black members. I did not,
however, leave the convention without a short conversation with one
of them. He was a man of unmixed African blood, and, seeing me
conversing with the white whom he regarded as the head of his
party, he left his place in the house, and came up to me and held out
his hand. I extended mine in return. On taking hold of it, he accosted
me with the words,
‘Sir, you then believe that the franchise is a God-given right.’
I said ‘that was not my belief.’
‘Why?’ he asked, rather astonished.
‘Because,’ I said, ‘it was given to you by Mr. Lincoln and the
North.’
‘No, sir,’ he replied, ‘it is a God-given right.’
The Southern Conventions.
‘If it is a God-given right,’ I rejoined, ‘how does it come to pass
that so few of mankind have ever possessed it?’
He then inquired what limitation I would put to the ‘right.’ I told him
that I did not think it wise or just that those who had no property
should possess the power of imposing taxes on those who had
property, and deciding how the proceeds of the taxes were to be
disposed of; and that the argument would be strengthened, if these
persons without property were also without the knowledge requisite
for enabling them to read the constitution of their State. Instead of
replying to this, he returned to his seat.
I give the above dialogue as a specimen of what are a negro’s
ideas on the great subject of the day in that part of the world. I
afterwards heard that the gentleman who had been speaking to me
was the most prominent negro in the State.
In the convention I have just spoken of there was nothing
remarkable in the appearance of the members of either race. In
another convention, however, where I spent a morning, this was not
the case; for I did not see a single white who had at all the air of a
legislator, or even the appearance of a respectable member of
society. Here I heard a man who was black, or as nearly black as
one could be, make several speeches. He assumed a kind of
leadership, or at all events of authority in the assembly, to which, as
far as I could judge, he was most fully entitled. He certainly was the
best speaker I heard in the United States, or, indeed, ever heard
anywhere else, as far as his knowledge went. He spoke with perfect
ease, and complete confidence in himself, and at the same time
quite in good taste. He said nothing but what appeared to be most
reasonable, proper, and fair to both races. He was for putting an end
at once to all ideas and hopes of confiscation on the part of the
blacks, and to the fears of the whites on the subject, by some
authoritative declaration; for he believed that these hopes and fears
were giving false expectations to his own race, and causing much
uncertainty in the minds of the whites, which prevented their setting
about the re-establishment of cultivation on their estates. He had a
good musical voice, and he could vary its tones at his pleasure. His
thoughts were clearly conceived and clearly put. I must not, however,
omit to mention that, though the traces of white blood were so slight
in the colour of his skin, he had most completely the head and
features of the European—a high forehead, a thin straight nose, and
a small chin and mouth. His hair was woolly in the extreme. I
afterwards understood that the whites in this convention, who were
so greatly his inferiors in debate, were almost all Northern
adventurers and ‘mean whites’; the whole of the upper class having
declined to take any part in forming the new constitution.
South Carolina Orphan Asylum.
I was taken over the South Carolina Orphan Asylum. It is a large fine
building in the town of Charleston, with well-kept and extensive grounds
around it for the children to play in. The number of
children who are within the walls is two hundred. They are fed,
clothed, educated, and placed out in life by the institution. The
expenses were formerly divided between the State, a numerous
body of subscribers, and the interest that accrued from some very
considerable bequests. But during the war these investments went
the way of all other investments in the South. They were placed in
Confederate bonds, which are now worth nothing. Pretty nearly,
therefore, the whole of the burden of maintaining the asylum has
since been met by the State alone, in its present condition of
extreme impoverishment. All the children from the different class
rooms were summoned together for my inspection, into a large room
somewhat resembling a theatre, in which they are taught music and
some other subjects collectively. Small and great, they all appeared
to be very well under control. They seemed also to be happy and
healthy, but with more (which in those who lived all the year at
Charleston could hardly have been otherwise) of the Southern lily
than of the Northern rose in their complexions. As might have been
expected, there was not the animation and zeal one sees in Northern
schools, worked at the highest of high pressures. But it is a noble
institution: in it Mr. Memminger, the first Secretary of the Confederate
Treasury, was brought up.
CHAPTER IX.
COLD IN SOUTH CAROLINA AND GEORGIA—CURIOUS
APPEARANCE OF ICE—TIME NOT VALUED IN THE SOUTH—WHY
AMERICANS WILL NOT CULTIVATE THE OLIVE—TEA MIGHT
GROW IN GEORGIA—ATLANTA BOUND TO BE GREAT—CATTLE
BADLY OFF IN WINTER—A VIRGINIAN’S RECOLLECTION OF THE
WAR—HIS POSITION AND PROSPECTS—APPROACH TO MOBILE
BY THE ALABAMA RIVER—MOBILE—THE HARBOUR—WHY NO
AMERICAN SHIPS THERE—A DAY ON THE GULF—
PONCHATRAIN—NEW ORLEANS—FRENCH SUNDAY MARKET—
FRENCH APPEARANCE OF TOWN—A NEW ORLEANS
GENTLEMAN ON THE EPISCOPAL CHURCH—BISHOP ELECT OF
GEORGIA—MISSISSIPPI—THE CEMETERIES—EXPENSIVENESS
OF EVERYTHING—TRANSATLANTIC NEWS—FUSION OF NORTH
AND SOUTH—FRENCH HALF-BREEDS—ROADS—THE BEST IN
THE WORLD—APPROACH TO NEW ORLEANS BY LAND—SUGAR
PLANTATIONS—A PRAYER FOR A BROTHER MINISTER.
I left Charleston at midday. It was a cold day, but there was no ice in
the town. A few miles out of the town, on the Augusta railway, it was
freezing sharply, the railway-side ditches were coated with ice, and
the ground was white with hoarfrost wherever the sun had not
touched it. I supposed that the warmer air from the bay had kept the
frost out of the town. All through that day in travelling as far as
Augusta, and for the next two days, I found the frost in the country
between Augusta and Atlanta very severe. We should have
considered them unusually cold days in England; and yet there was
not a cloud to intercept a ray of the Southern sun. To my sensations
it was colder than I found it on any other occasion during my tour in
the United States. I mention this because it is satisfactory to collect
indications of large districts of the South being adapted to white
labour. No doubt the summer throughout this region is very warm,
but here it was cold enough to brace up relaxed constitutions.
The Olive unfitted for America.
I here saw an effect of frost, which, I suppose from differences in
radiation and evaporation, is never seen in England. Everywhere
along the railway embankments and cuttings, the ice appeared to have
shot out in rays or spikes, three or four inches in length, and then
to have bent over. When the rays were shorter they
remained straight. I asked a gentleman how the
people of the country explained the phenomenon. ‘Our explanation
of it is,’ he replied, ‘that in these parts the land spues up the ice.’
A Southern man does not set the same value on time a Northern
one does. The day, he thinks, will be long enough for all he has to
do. I often saw trains stopped, not at a station, for the purpose of
taking up or putting down a single passenger. I even saw this done
that a parcel or letter might be taken from a person standing by the
railway side. On one occasion an acquaintance with whom I was
travelling that day, and myself, both happened to have had no
dinner. We mentioned this to the conductor, and asked him if he
could manage in any way to let us have some supper. ‘Oh, yes,’ he
readily replied, ‘I will at eleven o’clock stop the train at a house in the
forest, where I sometimes have had supper myself. I will give you
twenty minutes.’ I suppose the other passengers, none of whom left
their seats, imagined that we had stopped to repair some small
damage, or to take in wood or water, for on returning to the car we
heard no observations made on the delay.
Whenever I suggested to Americans the probability that their long
range of Southern coast was well suited for the culture of the olive,
the suggestion was met with merriment. ‘There is no one in this
country,’ they would say, ‘who looks fifteen or twenty years ahead’
(the time it takes for an olive tree to come into profitable bearing).
‘Everybody here supposes that long before so many years have
expired he shall have sold his land very advantageously, or that his
business will have taken him to some other part of the country, or
that he shall have made his fortune and retired from business.’
The same objection does not lie against the culture of tea, for
which the uplands of Georgia appeared well adapted.
Atlanta I thought the most flourishing place in the South. I saw
several manufactories there, and much building was going on. It has
34,000 inhabitants. ‘Sir,’ said an Atlantan gentleman to me, ‘this
place is bound to become great and prosperous, because it is the
most central town of the Southern States.’ I suppose he had not yet
been able to divest his mind of the idea that the Southern States
formed a political unit, and must have a central capital.
A Virginian’s Recollections of the War.
The cattle of the South must, during the winter, be among the most
miserable of their kind. I saw nothing at all resembling what we call
pastures; and if such institutions (in America everything is an
institution, even the lift in an hotel) are known in
the South, they can be of little use at that time of year, for every
blade of grass in America is then withered and dry. The cattle
appeared to be kept generally in the woods, and in the maize fields,
where of course they could get nothing but the leaves that were
hanging on the dead stems. In the North, where the dead grass is
buried in snow, and the cattle therefore must be housed and kept on
artificial food and grain, they are sufficiently well off; while their
brethren in the South become the victims of a more beneficent
climate.
I sometimes repeat the remarks of persons I casually met, without
noticing whether I accept or disagree with the statements they
contain, or the spirit which appears to animate them, because what I
thought about the matter is of no consequence, while by reporting
occasionally what I heard I enable others to form some idea of what
passes in the minds of the people I came in contact with. For this
purpose I will report what a fellow-passenger said to me one night on
our way through the interminable forest in Alabama. I had several
times during the day had some talk with this gentleman, and had
been much struck with him and interested in what he said. He was a
handsome man—a very noble-looking specimen of humanity; and
his manners and ideas corresponded to his appearance. At night we
were seated together talking about the war, and the prospects of the
country, when he gave me the following account of himself. He was
a Virginian, and before the war had possessed a good property.
Though disliking the Yankees (I am giving his own words) and their
interference with the internal affairs of the Southern States, he had at
first opposed the war. But when his State had decided for it, he took
up his rifle, and joined it unreservedly. Everything he had possessed
had been lost in the war; but he was determined neither to complain,
nor to be beholden to any man. It was not a pleasant thing, for one
who had lived as a gentleman, to work for others; but that was what
he was now doing, for he had become travelling clerk for a large
mercantile house. The period of his agreement had nearly expired,
and if it was not renewed, and he could get nothing better, he would
drive a dray. He spoke bitterly of the Yankees, for their greed of
plunder, and for their want of a sense of honour. When the South
surrendered, they did it in perfect good faith; acknowledging that
fortune had entirely decided against them, and determining to submit
honestly to the award. But the Yankees would not believe in their
good faith, and had sent into every Southern State a military dictator
with an army, to oppress and insult the whites, and to keep them in
subjection to the blacks. He had loved his country, and been proud
of it: but now he had no country, no home, no prospects. He said the
blacks fought with more desperation than the Yankees. He had been
through the whole war, and had had plenty of opportunities for
comparing them. He would rather meet three Yankees than two
blacks. The black was easily wrought up into a state of enthusiasm,
and would fight like a fanatic. The Yankee was always calculating
chances, and taking care of himself. The West it was that decided
the war; and he thought it should have arbitrated at first, and
prevented the fighting. I lost sight of this fallen but brave-hearted
Virginian on the steps of the St. Charles Hotel at New Orleans, to
which he was so obliging as to guide me on our arrival at that city.
He was then on his way to Texas.
Mobile.
The railway does not run into Mobile, but ends at a wharf, about
twenty miles from the city, on the
Tensas (pronounced Tensaw), a kind of loop
branch of the Alabama, which it rejoins at Mobile. It is a fine broad
piece of water, and its banks are clothed with the undisturbed
primæval forest, which is always to the European a sight of great
interest. On unloading the train I saw that we had picked up during
the night a dozen fine bucks, which we were to take on to Mobile. I
had observed in the early morning two or three small herds of wild
deer feeding in the forest. They seem to have become accustomed
to the locomotive. Even here it was freezing very sharply, and the
buckets of water on board the steamer were thickly coated with ice.
This frost, I found at Mobile, had killed all the young wood of the
orange trees. The crew of the steamer were negroes, and I was
surprised to see them on so cold a morning washing their woolly
heads in buckets of water drawn from the river, and then leaving
their wet hair and faces to be dried by the cold morning air. At the
junction of the Tensas and Alabama there was a great deal of
swampy land, partly covered with reeds, and much shallow water,
upon which were large flocks of wild fowl. The river was here full of
snags and sawyers, and its navigation was still further impeded by a
fortification of piles the Confederates had driven across it during the
war, to keep the enemy from getting up to the city. This approach to
Mobile had more of the air of novelty about it than anything I had yet
seen in America. It made me feel that I was really in a new world.
As I intended to make no stay at Mobile, I did not use any of the
letters of introduction I had with me, thinking I should in a short time
see more of the men and manners of the place, if I accompanied a
travelling acquaintance I had made, in his calls upon the firms with
whom he had to transact business. I was four times during the
morning invited, not to liquor, that expression I never once heard in
America, but to take a drink. There is much heartiness of feeling
here, and everybody carries out, to the full extent, the American
practice of shaking hands with everybody, which is a rational way of
expressing goodwill without saying anything. I walked about a mile
out into the environs to see the houses of the merchants and well-to-
do inhabitants. I passed three hospitals, one for yellow-fever cases
that can pay, one for yellow-fever cases that cannot pay, and one for
general cases. The streets of the town were full of pedestrians and
of traffic. In this respect it bore a very favourable comparison with
Charleston, where nothing was going on. The population amounts to
about 34,000. The Spaniards who originally settled the place have
been utterly obliterated. On inquiry I found that none had remained in
the city, being quite unable in any department of business to support
the competition of the Anglo-Saxon, but that among the mean
whites, at some little distance from Mobile, a few Spanish names
were to be found. These remnants of the original settlers the
Alabamans call Dagos, a corruption of the common Spanish name of
Diego.
A Day on the Gulf of Mexico.
The great cotton ships cannot come up to Mobile. There is, however, a
magnificent bay, in the form of a great lake with a narrow inlet from the
sea, in which they ride at anchor, waiting for their
cargoes, at a distance of between thirty and forty miles from the city.
I counted thirty-seven of these ships. They were almost all English.
Some said there was not a single American among them. A few
years ago far the greater part of them would have been American;
but since they have taxed heavily everything received into the
country, or manufactured in it, they have ceased to be able to build
or to sail ships as cheaply as we can.
I shall never forget the day I passed on the Gulf of Mexico in going
from Mobile to New Orleans. The air was fresh and had just the
slightest movement in it. The sky was unclouded, and the sun
delightfully warm. We have pleasant enough days at home
occasionally, but this belonged to quite another order of things. And
as the darkness came on, the night was as fine and bright, after its
kind, as the day had been. Many sat talking on the deck till long after
the sun was down. Some, I suppose, felt that this would be their only
day on the Gulf of Mexico.
The communication between Mobile and New Orleans is not
carried on by the mouth of the Mississippi, but by the Lake of
Ponchatrain. This is a large piece of very shallow water, seldom
more than seven feet deep, which communicates with the sea. A
railway is carried out into the lake on piles for a distance of five
miles. At the terminus of this long pier the steamers deposit and
receive their passengers.
On entering the city, at the other terminus of this railway, at half-
past six o’clock in the morning, the first sight that attracted attention
was the French Sunday Market. This is what everyone who visits
New Orleans is expected to see. It is a general market, and the
largest of the week. I do not remember ever to have seen a larger or
a busier one. What attracted my attention most, on passing through
it, was the great quantity and variety of wild fowl exhibited for sale.
The Marché des Fleurs was very good. In short there was an
abundant supply of everything. The shops in the neighbourhood
were all open; and in the American part of the city also, I saw several
open on the morning of this day.
New Orleans still retains very much of the air of a French city.
Many of the streets are narrow, and paved (which I saw nowhere
else in America) with large blocks of granite. This is brought from
New England. Something however of the kind was necessary here,
on account of the wet alluvial soil on which the city stands; it would
be truer to say on which it floats. The houses are generally lofty, and
their external character is rather French than English. The French
language is spoken by a large part of the population. In the street
cars, one is almost sure to hear it, coming often from the mouths of
coloured people.
Episcopal Church at New Orleans.
While at New Orleans I heard Dr. Beckwith, the Bishop elect of Georgia.
His church is about a mile and a half from the St. Charles’s Hotel, in
one of the best suburbs of the city. In going I asked a
gentleman, who was seated next to me in the
street car, the way. He replied that he was one of the doctor’s
congregation, and would be my guide. This led to some
conversation; he said ‘that of late years, in New Orleans and
elsewhere in the States, the Episcopal Church had begun to exert
itself, and was now doing wonders in bringing people into its
communion.’ I told him that only a few days before I had seen it
stated in an editorial of a New York paper, ‘that the Episcopal Church
was now quite the church of the best society in the United States;
and that if one wished to get into good society, it was wise to join this
communion.’ He replied ‘that statements of this kind read well in
newspapers, and that of course there were some people who could
be influenced by such considerations; but that in his opinion the most
effective reason for attracting people to the Episcopal Church was
the character of the Church itself, and of those who did belong and
had belonged to it. It was an historical church, with a grand
theological literature of its own, and that, indeed, almost the whole
literature of England appeared to belong to the Episcopal Church;
and it had, which he thought the most potent reason of all, a definite
creed and a dignified ritual.’
Dr. Beckwith’s congregation consisted of about a thousand very
well-dressed people. As is usual everywhere in Episcopal churches
in America, there was an offertory; and I saw, as its pecuniary result,
four large velvet dishes piled full of greenbacks, placed in the hands
of the three officiating clergymen. Nobody gives less than a quarter
of a dollar note. The Bishop elect preached. He is a very good-
looking, able, and eloquent man. He ridiculed the idea of a ‘psalm-
singing’ eternity, and affirmed that the possession of knowledge
would be an immeasurably nobler means of happiness. But if we
concede this, there will still remain the question whether the exercise
of the feelings of the heart would not confer on the majority of the
human race far more happiness than the exercise of the powers of
the intellect.
I looked into another large Episcopal church on my way to Dr.
Beckwith’s, and found in it several young men teaching Sunday
classes.
One gets so accustomed in the lakes, rivers and harbours of
America, to vast expanses of water, that the first sight of the
Mississippi at New Orleans becomes on that account more
disappointing to most people than it otherwise would be. As you
cross the Levee, you see before you a stream not three quarters of a
mile wide. The houses on the opposite side do not appear to be at
even that humble distance. The traveller remembers how many
streams he has crossed, particularly on the eastern and southern
coast, some of them even unnoticed on the map he is carrying with
him, but which had wider channels. And so he becomes dissatisfied
with his first view of the Father of Waters. Still, there he has before
him, in that stream not three quarters of a mile wide, the outlet for
the waters of a valley as large as half of Europe. What mighty rivers,
commingled together, are passing before him—the Arkansas, the
Red River, the Platte, the Missouri, the Ohio, the Wabash, the
Cumberland, the Tennessee! How great then must be the depth of
the channel through which this vast accumulation of water is being
conveyed to the ocean! On this last point I questioned several
persons in the city, getting from all the same answer, that several
attempts had been made to fathom this part of the river, but that
none had been attended with success.
The Cemeteries.
On my calling the attention of the stout, mediæval, coloured female who
had charge of the baths of the St. Charles’s Hotel, to the water in the
one she was preparing for me,—for it was of the
colour, and not far from the consistency of pea-soup,—she
convinced me in a moment of my ignorance, and of the irrational
character of my remark. ‘Child,’ she said, ‘it is Mississippi river, which
we all have to drink here all our lives.’
I visited the celebrated burial grounds of New Orleans, one in the
suburbs, and three others contiguous to one another, in an older part
of the city. None of them are more than two or three acres in size. In
each case the enclosure was surrounded with a high wall, which was
chambered on the inner side for the reception of coffins. The whole
peculiarity of these burial grounds arises from the fact that, the soil
being too swampy to admit of interment, the coffin must be placed in
some receptacle above ground. Many of the trades of the city, and
several other associations, appear to have buildings of their own in
the cemeteries, for the common reception of the bodies of those who
had in life belonged to the brotherhood. Most of the families, too, of
the place appear to have their own above-ground tombs. They are
almost universally of brick, plastered outside, and kept scrupulously
clean with whitewash. It was on a Sunday evening that I visited the
cemetery in the suburbs. It was very cold on that evening to my
sensations, and so I suppose it must have felt much colder to people
who were accustomed to the climate of New Orleans; still I saw
many persons, sometimes alone, and sometimes in parties, sitting or
standing by the tombs that contained the remains of those who had
been dear to them, and the recollection of whom they still cherished.
In some cases I saw one man, two men, or more than two, seated at
a grave smoking. In some cases there would be a whole family. This
I noticed particularly at what was far the best monument in the place.
It was one that had been raised to the memory of a young man who
had fallen in the late war. There was a small granite tomb, over
which rose a pillar of granite bearing the inscription. A little space all
round was paved with the same material, and this was edged with a
massive rim, also of granite, about two feet high. Upon this were
seated many of his sorrowing relatives, old and young. In the same
cemetery I saw two other monuments to young men who had died
soldiers’ deaths in the Confederate service. On each of the three, the
inscription ran that the deceased had died in the discharge of his
duty, or in defence of the rights of his country, or some expression
was used to indicate the enthusiastic feeling of the South. One was
stated to have been the last survivor of eight children, and the stone
went on to say that his parents felt that they had given their last child
to their country and to God. These were all English inscriptions; but I
saw some that were in two languages, English being mixed in some
cases with French, in others with German. I saw no inscriptions that
had any direct reference in any way to what Christians believe.
Amalgamation of North and South.
At New Orleans, fifteen hundred miles from New York, you get, in your
morning paper, whatever was known in London yesterday of English and
European news. And this department of an
American journal contains a great deal more than
we, in this country, are in the habit of seeing in the Times and other
English papers, as the messages brought to us by the Atlantic cable;
because we want intelligence only from the United States, whereas
they wish to get from us not only what is going on in England, but in
every part of Europe, and in fact in every part of the Old World.
As one is thus day after day, whether you be in the centre, or
thousands of miles away, at some almost unknown extremity, of this
vast transatlantic region, kept well informed as to what is passing
over almost all the earth, one feels that there are agencies at work
amongst us which in some respects render ‘the wisdom of the
ancients’ a little obsolete. Formerly it would have been thought
impossible to harmonise such discordant elements as the North and
the South. How could they ever dwell together as brethren, who
were locally so remote from each other, that while one was basking
in a sub-tropical sun, the other was shrinking from the nipping frosts
of the severest winter; whose institutions, too, and interests, and
antecedents had in many essential points been very dissimilar; and
whose differences at last had broken out into a fierce and sanguinary
war? Could they ever be fused into a single homogeneous people?
Down to the times of our fathers, it would have been quite
impossible. Each would then have kept only to his own region, and
known no influences but those which were native to it. But now we
have changed all that. A few threads of wire overhead, and a few
bars of iron on the levelled ground, will do all that is wanted. For
extreme remoteness they have substituted so close a contiguity that
the North and South can now talk together. The dissimilar
institutions, interests, and antecedents of the past, however strong
they may be in themselves, become powerless when something
stronger has arisen; and this new power is now vigorously at work
undermining and counteracting their effects. And it is a power that
will also exorcise envy, hatred, and malice. Men are what their ideas
are, for it is ideas that make the man. And every morning these two
people have the same ideas, and the same facts out of which ideas
are made, put in the same words before them. The wire threads
overhead do this. And if a Southern man, from what he reads this
morning, thinks that his interests call him to the North, or a Northern
man that his call him to the South, the railway, like the piece of
carpet in the Eastern story, will transport them hither and thither in a
moment. It ensures that there shall be a constant stream of human
beings flowing in each direction. Everyone can foresee the result—
that there must, sooner or later, be one homogeneous people.
Formerly the difference of their occupations produced difference of
feeling, and was a dissociating cause. Now it will lead to rapid,
constant, and extensive interchange of productions, by the aid of the
telegraph and railway. Each will always be occupied with the thought
of supplying the wants of the other. And this will lead to social
intercourse, and the union of families. So that what was impossible
before is what must be now.
High Prices.
Everything in the United States, except railway
fares and the per diem charges of hotels, is
unreasonably dear; and the hotels themselves
participate in the general irrationality on this subject as soon as you
order anything that is not down in the list of what is allowed for the
daily sum charged you. I never got a fire lighted in my bed-room for a
few hours for less than a dollar, that is 3s. 6d.; or half a dozen pieces
washed for less than a dollar and a half. But the one matter of all in
which the charges are the most insane is that of hackney coaches
and railway omnibuses. You get into an omnibus at the station to be
carried two or three hundred yards to your hotel. As soon as the
vehicle begins to move, the conductor begins to levy his black mail.
He is only doing to you what his own government is doing to him.
And in America it seems to be taken for granted that one will pay just
what he is ordered to pay. ‘Sir,’ he addresses you, ‘you must pay
now; three quarters of a dollar for yourself, and a quarter of a dollar
for each piece,’ that is of luggage. You have perhaps four pieces,
being an ignorant stranger; if you had been a well-informed native
you would have had only one piece; and for these four pieces and
yourself you pay about 7s. In an English railway omnibus you would
have paid 6d. or 1s. The hackney coaches are very much worse. I
found a driver in New York who would take me for a short distance
for a dollar and a half, but I never found so reasonable a gentleman
in the profession elsewhere. At New Orleans there appear to be a
great many hackney coaches, all apparently quite new, with a great
deal of silver-plated mounting about them, almost as if they had
been intended for civic processions on the scale of our Lord Mayor’s
show. Each of them is drawn by a very fair pair of horses. I once
counted two-and-thirty of these coaches standing for hire, on a rainy
day, at the door of the St. Charles’s Hotel; and I was told that it was
their rule not to move off the stand for less than two dollars, or to
take one out to dinner, and bring one back, for less than ten dollars.
Books of travel in the United States generally contain some
remarks on the personal attractions of the mixed race in New
Orleans. From the little I saw of them, I can add that they appeared
to me as spirituel as the French themselves. I am more disposed to
believe that this is hereditary, than that it is the result merely of
imitation. But with respect to their personal appearance, after having
of late seen so many of the coarse and ill-visaged half-breeds in
which an Anglo-Saxon was the father, I was much struck with their
