Advances in Machine Learning/Deep Learning-Based Technologies: Selected Papers in Honour of Professor Nikolaos G. Bourbakis, Vol. 2 (Learning and Analytics in Intelligent Systems 23), George A. Tsihrintzis
Volume 23
Series Editors
George A. Tsihrintzis
University of Piraeus, Piraeus, Greece
Maria Virvou
University of Piraeus, Piraeus, Greece
Lakhmi C. Jain
Faculty of Engineering and Information Technology, Centre for Artificial
Intelligence, University of Technology, Sydney, NSW, Australia; KES
International, Shoreham-by-Sea, UK; Liverpool Hope University,
Liverpool, UK
Maria Virvou
Department of Informatics, University of Piraeus, Piraeus, Greece
Lakhmi C. Jain
KES International, Shoreham-by-Sea, UK
This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in
any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Further Reading
1. Arthur Samuel, Some Studies in Machine Learning Using the Game
of Checkers. IBM J. 3(3), 210–229 (1959)
2. https://en.wikipedia.org/wiki/Arthur_Samuel
Michalis Zervakis
Preface
A world-recognized researcher can be honored in a variety of ways,
including elevation of his professional status or various prestigious
awards and distinctions. When, additionally, the same researcher has
served as advisor to generations of undergraduate, graduate, and
doctoral students and as mentor to faculty and colleagues, the task of
appropriately honoring him becomes even harder. Perhaps the best way to honor such a person is to ask former doctoral students, as well as colleagues and fellow researchers from around the world, to include some of their recent research results in one or more high-quality volumes edited in his honor. Such an edition indicates that other researchers are pursuing and further extending what they have learned from him in research areas where he made outstanding contributions.
Professor Nikolaos G. Bourbakis has been serving the fields of
Artificial Intelligence (including Machine Learning/Deep Learning) and
Assistive Technologies from various posts for almost fifty years now. He
received a BS in Mathematics from the National and Kapodistrian
University of Athens, Greece, a Certificate in Electrical Engineering from
the University of Patras, Greece, and a Ph.D. in Computer Engineering
and Informatics (awarded with excellence), from the Department of
Computer Engineering & Informatics, University of Patras, Greece.
Dr. Bourbakis (IEEE Fellow-1996) is currently a Distinguished
Professor of Information & Technology and the Director of the Center of
Assistive Research Technologies (CART) at Wright State University,
Ohio, USA. He is the founder and Editor-in-Chief of the International
Journal on Artificial Intelligence Tools, the International Journal on
Monitoring and Surveillance Technology Research (IGI-Global, Publ.),
and the EAI Transactions on Bioengineering & Bioinformatics. He is
also the Founder and Steering Committee Chair of several International
IEEE Computer Society Conferences (namely, ICTAI, ICBIBE, ICIISA),
Symposia and Workshops. He pursues research in Assistive
Technologies, Applied Artificial Intelligence, Bioengineering,
Information Security, and Parallel/Distributed Processing, which is
funded by USA and European government and industry. He has
published extensively in IEEE and International Journals and he has
graduated, as the main advisor, several dozens of doctoral students. His
research work has been internationally recognized and he has received
several prestigious awards, including: IEEE Computer Society Technical
Research Achievement Award; Member of the New York Academy of
Sciences; Diploma of Honor in AI School of Engineering, University of
Patras, Greece; ASC Outstanding Scientists & Engineers Research
Award; Dr. F. Russ IEEE Biomedical Engineering award, Dayton Ohio;
Most Cited Article in Pattern Recognition Journal; IEEE ICTAI and
ICBIBE best paper Awards; Recognition Award for Outstanding
Scholarly Achievements and Contributions in the field of Computer
Science, University of Piraeus, Greece; IEEE EMBS-GR Award of
Achievements; IEEE Computer Society 30 years ICTAI Outstanding
Service & Leadership Recognition; Honorary Doctorate degree of the
University of Piraeus, Greece.
We have been collaborating with Prof. Nikolaos G. Bourbakis for many years. Thus, we proposed and undertook with pleasure the task of editing a special book in his honor. The response from his former mentees, colleagues, and fellow researchers has been great! Unfortunately, page limitations forced us to limit the number of works included in the book, and we apologize to those authors whose works could not be included. Even so, it became apparent that not one but three volumes of the special book had to be developed, each focusing on different aspects of Dr. Nikolaos G. Bourbakis's research activities.
The book at hand constitutes the second volume and is devoted to
Advances in Machine Learning/Deep Learning-based
Technologies. While honoring Professor Nikolaos G. Bourbakis, this
book also serves the purpose of exposing its reader to some of the most
significant advances in Machine Learning/Deep Learning-based
technologies. As such, the book is directed towards professors, researchers, scientists, engineers, and students in computer science-related disciplines. It is also directed towards readers who come from other disciplines and are interested in becoming versed in some of the most significant advances in Machine Learning/Deep Learning-based technologies. We hope that all of them will find it useful and inspiring in their work and research.
We are grateful to the authors and reviewers for their excellent
contributions and visionary ideas. We are also thankful to Springer for
agreeing to publish this book in its Learning and Analytics in
Intelligent Systems series. Last, but not least, we are grateful to the
Springer staff for their excellent work in producing this book.
George A. Tsihrintzis
Maria Virvou
Lefteri Tsoukalas
Anna Esposito
Lakhmi C. Jain
Piraeus, Greece
Piraeus, Greece
Lafayette, Indiana, USA
Vietri, Italy
Sydney, Australia
Contents
1 Introduction to Advances in Machine Learning/Deep Learning-
Based Technologies
George A. Tsihrintzis, Maria Virvou and Lakhmi C. Jain
1.1 Editorial Note
1.2 Book Summary and Future Volumes
References
Part I Machine Learning/Deep Learning in Socializing and
Entertainment
2 Semi-supervised Feature Selection Method for Fuzzy Clustering
of Emotional States from Social Streams Messages
Ferdinando Di Martino and Sabrina Senatore
2.1 Introduction
2.2 The FS-EFCM Algorithm
2.2.1 EFCM Execution: Main Steps
2.2.2 Initial Parameter Setting
2.3 Experimental Results
2.3.1 Dataset
2.3.2 Feature Selection
2.3.3 FS-EFCM at Work
2.4 Conclusion
References
3 AI in (and for) Games
Kostas Karpouzis and George A. Tsatiris
3.1 Introduction
3.2 Game Content and Databases
3.3 Intelligent Game Content Generation and Selection
3.3.1 Generating Content for a Language Education Game
3.4 Conclusions
References
Part II Machine Learning/Deep Learning in Education
4 Computer-Human Mutual Training in a Virtual Laboratory
Environment
Vasilis Zafeiropoulos and Dimitris Kalles
4.1 Introduction
4.1.1 Purpose and Development of the Virtual Lab
4.1.2 Different Playing Modes
4.1.3 Evaluation
4.2 Background and Related Work
4.3 Architecture of the Virtual Laboratory
4.3.1 Conceptual Design
4.3.2 State-Transition Diagrams
4.3.3 High Level Design
4.3.4 State Machine
4.3.5 Individual Scores
4.3.6 Quantization
4.3.7 Normalization
4.3.8 Composite Evaluation
4.3.9 Success Rate
4.3.10 Weighted Average
4.3.11 Artificial Neural Network
4.3.12 Penalty Points
4.3.13 Aggregate Score
4.4 Machine Learning Algorithms
4.4.1 Genetic Algorithm for the Weighted Average
4.4.2 Training the Artificial Neural Network with Back-
Propagation
4.5 Implementation
4.5.1 Instruction Mode
4.5.2 Evaluation Mode
4.5.3 Computer Training Mode
4.5.4 Training Data Collection Sub-mode
4.5.5 Machine Learning Sub-mode
4.6 Training-Testing Process and Results
4.6.1 Training Data
4.6.2 Training and Testing on Various Data Set Groups
4.6.3 Genetic Algorithm Results
4.6.4 Artificial Neural Network Training Results
4.7 Conclusions
References
5 Exploiting Semi-supervised Learning in the Education Field: A Critical Survey
Georgios Kostopoulos and Sotiris Kotsiantis
5.1 Introduction
5.2 Semi-supervised Learning
5.3 Literature Review
5.3.1 Performance Prediction
5.3.2 Dropout Prediction
5.3.3 Grade Level Prediction
5.3.4 Grade Point Value Prediction
5.3.5 Other Studies
5.3.6 Discussion
5.4 The Potential of SSL in the Education Field
5.5 Conclusions
References
Part III Machine Learning/Deep Learning in Security
6 Survey of Machine Learning Approaches in Radiation Data
Analytics Pertained to Nuclear Security
Miltiadis Alamaniotis and Alexander Heifetz
6.1 Introduction
6.2 Machine Learning Methodologies in Nuclear Security
6.2.1 Nuclear Signature Identification
6.2.2 Background Radiation Estimation
6.2.3 Radiation Sensor Placement
6.2.4 Source Localization
6.2.5 Anomaly Detection
6.3 Conclusion
References
7 AI for Cybersecurity: ML-Based Techniques for Intrusion Detection Systems
Dilara Gumusbas and Tulay Yildirim
7.1 Introduction
7.1.1 Why Does AI Pose Great Importance for
Cybersecurity?
7.1.2 Contribution
7.2 ML-Based Models for Cybersecurity
7.2.1 K-Means
7.2.2 Autoencoder (AE)
7.2.3 Generative Adversarial Network (GAN)
7.2.4 Self Organizing Map
7.2.5 K-Nearest Neighbors (k-NN)
7.2.6 Bayesian Network
7.2.7 Decision Tree
7.2.8 Fuzzy Logic (Fuzzy Set Theory)
7.2.9 Multilayer Perceptron (MLP)
7.2.10 Support Vector Machine (SVM)
7.2.11 Ensemble Methods
7.2.12 Evolutionary Algorithms
7.2.13 Convolutional Neural Networks (CNN)
7.2.14 Recurrent Neural Network (RNN)
7.2.15 Long Short Term Memory (LSTM)
7.2.16 Restricted Boltzmann Machine (RBM)
7.2.17 Deep Belief Network (DBN)
7.2.18 Reinforcement Learning (RL)
7.3 Open Topics and Potential Directions
7.3.1 Novel Feature Representations
7.3.2 Unsupervised Learning Based Detection Systems
References
Part IV Machine Learning/Deep Learning in Time Series
Forecasting
8 A Comparison of Contemporary Methods on Univariate Time
Series Forecasting
Aikaterini Karanikola, Charalampos M. Liapis and Sotiris Kotsiantis
8.1 Introduction
8.2 Related Work
8.3 Theoretical Background
8.3.1 ARIMA
8.3.2 Prophet
8.3.3 The Holt-Winters Seasonal Models
8.3.4 N-BEATS: Neural Basis Expansion Analysis
8.3.5 DeepAR
8.3.6 Trigonometric BATS
8.4 Experiments and Results
8.4.1 Datasets
8.4.2 Algorithms
8.4.3 Evaluation
8.4.4 Results
8.5 Conclusions
References
9 Application of Deep Learning in Recurrence Plots for
Multivariate Nonlinear Time Series Forecasting
Sun Arthur A. Ojeda, Elmer C. Peramo and Geoffrey A. Solano
9.1 Introduction
9.2 Related Work
9.2.1 Background on Recurrence Plots
9.2.2 Time Series Imaging and Convolutional Neural
Networks
9.3 Time Series Nonlinearity
9.4 Time Series Imaging
9.4.1 Dimensionality Reduction
9.4.2 Optimal Parameters
9.5 Convolutional Neural Networks
9.6 Model Pipeline and Architecture
9.6.1 Architecture
9.7 Experimental Setup
9.8 Results
9.9 Conclusion
References
Part V Machine Learning in Video Coding and Information
Extraction
10 A Formal and Statistical AI Tool for Complex Human Activity
Recognition
Anargyros Angeleas and Nikolaos Bourbakis
10.1 Introduction
10.2 The Hybrid Framework—Formal Languages
10.3 Formal Tool and Statistical Pipeline Architecture
10.4 DATA Pipeline
10.5 Tools for Implementation
10.6 Experimentation with Datasets to Identify the Ideal Model
10.6.1 KINISIS—Single Human Activity Recognition
Modeling
10.6.2 DRASIS—Change of Human Activity Recognition
Modeling
10.7 Conclusions
References
11 A CU Depth Prediction Model Based on Pre-trained
Convolutional Neural Network for HEVC Intra Encoding
Complexity Reduction
Jiaming Li, Ming Yang, Ying Xie and Zhigang Li
11.1 Introduction
11.2 H.265 High Efficiency Video Coding
11.2.1 Coding Tree Unit Partition
11.2.2 Rate Distortion Optimization
11.2.3 CU Partition and Image Texture Features
11.3 Proposed Methodology
11.3.1 The Hierarchical Classifier
11.3.2 The Methodology of Transfer Learning
11.3.3 Structure of Convolutional Neural Network
11.3.4 Dataset Construction
11.4 Experiments and Results
11.5 Conclusion
References
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
G. A. Tsihrintzis et al. (eds.), Advances in Machine Learning/Deep Learning-based
Technologies, Learning and Analytics in Intelligent Systems 23
https://doi.org/10.1007/978-3-030-76794-5_1
Abstract
The field of Machine Learning and its sub-field of Deep Learning are among the most active areas of research in Artificial Intelligence, as researchers
worldwide continuously develop and announce both new theoretical
results and innovative applications in increasingly many and diverse
other disciplines. The book at hand aims at exposing its readers to
some of the most significant recent advances in Machine
Learning/Deep Learning-based technologies. At the same time, the
book aims at honouring Professor Nikolaos G. Bourbakis, an
outstanding researcher in this area who has contributed significantly to
the development of Machine Learning/Deep Learning-based
technologies. As such, the book is directed towards professors,
researchers, scientists, engineers and students in computer science-
related disciplines. It is also directed towards readers who come from
other disciplines and are interested in becoming versed in some of the
most recent progress in Machine Learning/Deep Learning-based
technologies. An extensive list of bibliographic references at the end of
each chapter guides the readers to probe deeper into their areas of
interest.
References
1. J. Toonders, Data is the new oil of the digital economy. Wired. https://www.wired.com/insights/2014/07/data-new-oil-digital-economy/
2. K. Schwab, The Fourth Industrial Revolution—what it means and how to respond. Foreign Affairs. https://www.foreignaffairs.com/articles/2015-12-12/fourth-industrial-revolution. Accessed 12 Dec 2015
3. From Industry 4.0 to Society 5.0: the big societal transformation plan of Japan, https://www.i-scoop.eu/industry-4-0/society-5-0/
4. Society 5.0, https://www8.cao.go.jp/cstp/english/society5_0/index.html
5. E. Rich, K. Knight, S.B. Nair, Artificial Intelligence, 3rd edn. (Tata McGraw-Hill Publishing Company, 2010)
6. J. Watt, R. Borhani, A.K. Katsaggelos, Machine Learning Refined—Foundations, Algorithms and Applications, 2nd edn. (Cambridge University Press, 2020)
7. A. Samuel, Some studies in machine learning using the game of checkers. IBM J. 3(3), 210–229 (1959)
8. A.S. Lampropoulos, G.A. Tsihrintzis, Machine learning paradigms—applications in recommender systems, in Intelligent Systems Reference Library Book Series, vol. 92 (Springer, 2015)
9. D.N. Sotiropoulos, G.A. Tsihrintzis, Machine learning paradigms—artificial immune systems and their application in software personalization, in Intelligent Systems Reference Library Book Series, vol. 118 (Springer, 2017)
10. G.A. Tsihrintzis, D.N. Sotiropoulos, L.C. Jain (eds.), Machine learning paradigms—advances in data analytics, in Intelligent Systems Reference Library Book Series, vol. 149 (Springer, 2018)
11. A.E. Hassanien (ed.), Machine learning paradigms: theory and application, in Studies in Computational Intelligence Book Series, vol. 801 (Springer, 2019)
12. G.A. Tsihrintzis, M. Virvou, E. Sakkopoulos, L.C. Jain (eds.), Machine learning paradigms—applications of learning and analytics in intelligent systems, in Learning and Analytics in Intelligent Systems Book Series, vol. 1 (Springer, 2019)
13. J.K. Mandal, S. Mukhopadhyay, P. Dutta, K. Dasgupta (eds.), Algorithms in machine learning paradigms, in Studies in Computational Intelligence Book Series, vol. 870 (Springer, 2020)
14. M. Virvou, E. Alepis, G.A. Tsihrintzis, L.C. Jain (eds.), Machine learning paradigms—advances in learning analytics, in Intelligent Systems Reference Library Book Series, vol. 158 (Springer, 2020)
15. J. Patterson, A. Gibson, Deep Learning—A Practitioner's Approach (O'Reilly, 2017)
16. Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
17. J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
18. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
19. G.A. Tsihrintzis, L.C. Jain (eds.), Machine learning paradigms—advances in deep learning-based technological applications, in Learning and Analytics in Intelligent Systems Book Series, vol. 18 (Springer, 2020)
Part I
Machine Learning/Deep Learning in
Socializing and Entertainment
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
G. A. Tsihrintzis et al. (eds.), Advances in Machine Learning/Deep Learning-based Technologies, Learning and
Analytics in Intelligent Systems 23
https://doi.org/10.1007/978-3-030-76794-5_2
Sabrina Senatore
Email: ssenatore@unisa.it
Abstract
Capturing text content, especially when it reflects human emotional states and feelings, is crucial in every decision-making process: from item purchase to marketing campaigns, the user's mood has become an essential characteristic to monitor continuously. This work proposes a new method based on a fuzzy clustering algorithm that takes human suggestions into account for feature selection. The method exploits two fuzzy indices, namely the feature relevance, initially provided by human expertise, and the feature incidence on a specific cluster. The Extended Fuzzy C-Means (EFCM) clustering is used to balance the two "dueling" indices; a t-norm-based feature importance index enables the selection of the appropriate feature set. Experimental results on social message streams show the method's effectiveness in retaining those emotions the human considers relevant in the textual context.
2.1 Introduction
Nowadays, high-throughput technologies routinely produce large volumes of data that are recorded and stored for analytics purposes. In the Social Web particularly, the continuous stream of user-generated content needs to be arranged appropriately to accelerate text analysis and information retrieval tasks. Data often contain irrelevant and redundant features, along with a high level of noise.
Especially in classification tasks, large feature vectors can significantly slow down the process and, even though such vectors are expected to have more discriminating power, in practice they often produce models that do not yield a well-generalized representation of the data.
This problem is quite evident in the processing of textual information, such as papers, websites, reviews, tweets, or snippets. The expressiveness of natural language compounds the difficulty of discriminating appropriate features that accurately support classification methods. On the other hand, the increasing volume of opinionated data disseminated on the Web calls for enhanced approaches to analyze and process data efficiently, capturing the actual meaning behind the text. Natural language is indeed imprecise and ambiguous; in general, text is composed of a loosely structured sequence of words and symbols that helps humans capture the actual meaning of sentences, but this activity is quite complex for computational systems, which cannot infer the right context for a group of words. These issues are amplified when text mining activities are targeted at capturing emotions and sentiments from opinions [20].
Analyzing user-generated content in social media to capture people's emotions and understand public attitude and mood is a crucial task for market analysis, business, and political consensus studies. Consumers can influence other users' consumption activities: an opinion, a comment, or a reaction quickly reaches global audiences who share similar interests in a product or brand. Text mining is a complex activity that aims to discover relevant information from large collections of textual data, which are often unstructured, redundant, and duplicated.
Feature selection thus becomes a mandatory preprocessing phase to reduce dimensionality and eliminate duplicated and unwanted features in the data.
Many feature selection algorithms have been developed in the literature [7], often cast as optimization problems [9]. In Information Retrieval (IR) approaches, the Bag-of-Words (BoW) is the best-known vector space model used to represent documents. It is filled with word frequencies over a fixed dictionary. Feature selection methods remove the lowest-ranking terms based on a scoring function, such as term occurrences [16] and frequencies, TF-IDF [17], as well as mutual information (MI) and chi-squared (χ2) ranking [12]. Selecting the highest-ranked terms does not guarantee obtaining the most relevant features, especially in text mining tasks, where polysemy and synonymy can affect classification: redundant features do not add new information to describe the concept, while irrelevant features simply add noise to the mining process [20]. On the other hand, reducing the feature set shrinks the data space and therefore also decreases the complexity of classification and prediction problems.
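As a concrete illustration of this kind of score-based selection, the short sketch below ranks the terms of a toy corpus by plain TF-IDF and keeps only the top-k; the corpus, the scoring variant (unsmoothed TF-IDF), and the cut-off k are illustrative assumptions, not the chapter's setup.

```python
import math

# Toy BoW scoring sketch: rank vocabulary terms by their maximum TF-IDF
# over the corpus and keep the top-k; lower-ranking terms are dropped.
corpus = [
    "happy great product love it",
    "sad angry terrible service",
    "love this brand so happy",
    "angry refund terrible experience",
]
docs = [doc.split() for doc in corpus]
vocab = sorted({w for d in docs for w in d})
n_docs = len(docs)

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)           # term frequency in one document
    df = sum(1 for d in docs if term in d)    # document frequency in the corpus
    return tf * math.log(n_docs / df)         # plain TF-IDF, no smoothing

scores = {t: max(tf_idf(t, d) for d in docs) for t in vocab}
k = 5
selected = sorted(scores, key=scores.get, reverse=True)[:k]
```

Terms that occur in many documents (here, e.g., "happy" or "angry") receive a low IDF and drop out of the selection, which is exactly the behavior that makes purely frequency-based rankings blind to polysemy and synonymy.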
In addition to traditional Information Retrieval and Text Mining approaches, which are mainly based on a preliminary feature set definition, many approaches in the literature seek latent semantics in the data space to cope with polysemy, synonymy, homonymy, and phrase-dependency issues. Latent models [1] are useful to identify the semantic concepts in text documents and uncover the latent semantic structure embedded in document collections. In [5], latent models help discriminate emotions in textual space: the projection of words expressing sentiments and emotions into the same space topology provides effective contexts to overcome linguistic ambiguity in natural language. Sentiment analysis has been extensively studied in recent years [11–19], often focusing on natural language processing techniques to address the issues involved in understanding written language with the aim of interpreting human moods.
In [1], the Latent Dirichlet Allocation (LDA) model is used to extract latent topics and is
combined with a Bayesian approach to extract concepts to associate with the topics. Some
approaches [8–21] adopt external resources such as WordNet [22], as well as dictionaries, thesauri, and knowledge bases, to discriminate the sense and context of terms in sentences.
For classification in natural language processing tasks, Deep Learning techniques [14] offer compelling methods to capture the complexity of language, overcoming problems such as the curse of dimensionality that arises when text is represented with sparse, high-dimensional matrices. With the recent popularity of word embeddings, neural-based approaches exhibit good performance compared to more traditional machine learning models. Nevertheless, empirical evidence shows that discovering linguistic patterns remains an open issue in language understanding: the complexity of natural language, with its metaphors, rhetoric, and figurative expressions, renders known automatic models for feature extraction and selection ineffective.
This paper presents a novel feature selection method applied to fuzzy clustering. The algorithm, called FS-EFCM (Feature Selection on Extended Fuzzy C-Means), extends the EFCM algorithm [10]. The algorithm takes external scores into account as additional parameters for the initial configuration. The idea is to allow human suggestions in the discrimination of relevant features, viz., the features that are crucial to describe the domain of interest the features come from, and then to steer the clustering process into discarding irrelevant and noisy information.
Thus, experts can provide their relevance values (weights) to each feature based on
their expertise and knowledge of the reference domain.
During the EFCM execution, some features may be discarded based on the experts' feature relevance selection. The features are also evaluated with respect to their impact, i.e., their effect on the formation of the clusters. Both feature relevance and incidence are monitored during the FS-EFCM execution to evaluate which features are crucial for both the clustering performance and the experts.
The remainder of the paper is organized as follows. Section 2.2 introduces the proposed FS-EFCM algorithm: a general overview is presented first, then the main steps and the pseudocode describe the algorithm in detail. Additional discussion of the parameter setting is also provided. Section 2.3 is devoted to the experimental results: a dataset composed of tweet streams is analyzed to classify tweet trends by capturing sentiments and emotions from text analysis, and the experiments show the effectiveness of the proposed method. Finally, conclusions are given in the last section.
The algorithm takes as input the data collection and the expert-driven scores (weights) associated with the features. Depending on the data, suitable data analysis and preprocessing activities may be required to make it amenable to the algorithm. For example, a textual dataset must be processed by applying typical NLP tasks (i.e., tokenization, stemming, stop-word removal, POS tagging, etc.); sentiment and emotion analysis instead focuses on capturing the emotional aspect embedded in the word or sentence meaning. The EFCM algorithm works on data translated into matrix form.
Each score given by the human experts and associated with a feature describes how relevant that feature is in the domain of interest, according to the expert's viewpoint.
As shown in Fig. 2.1, the scores are processed (Feature relevance (FR) Estimation) to
rescale them according to the appropriate range and evaluation metrics.
Once the input is acquired, the algorithm implements an iterative process: in each iteration, EFCM is launched and, until the stability condition is verified, the whole algorithm is re-run with an updated parameter configuration. Precisely, the EFCM output is used to evaluate the incidence of each feature on the clustering formation (Feature Incidence (FI) Assessment).
The stability condition is strictly correlated to two important indices of the algorithm: the feature relevance FR and the feature incidence FI, which represent, respectively, the importance of a feature from the human viewpoint and the incidence of the same feature, derived from the data distribution, on the clustering structure.
The algorithm stops when a condition of stability is reached, i.e., when all the remaining features strongly affect the clustering formation.
Otherwise, when the stability condition is not satisfied, a further evaluation based on the two indices identifies the candidate features to be discarded. Once they are removed, the process is re-iterated on the remaining features.
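The iterative scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the authors' implementation: `run_efcm` stands in for the EFCM step and is treated as a black box returning cluster centers, the incidence measure is a simple rescaled mean pairwise distance between center components (the chapter's Eq. (2.1) may normalize differently), and the minimum t-norm is used for the significance measure.

```python
import numpy as np

def feature_incidence(centers):
    """Per-feature incidence: mean absolute spread of each feature component
    across all cluster-center pairs, rescaled to [0, 1] (assumes >= 2 clusters)."""
    c, _ = centers.shape
    diffs = np.abs(centers[:, None, :] - centers[None, :, :])  # pairwise gaps
    w = diffs.sum(axis=(0, 1)) / (c * (c - 1))                 # mean over pairs
    return w / w.max() if w.max() > 0 else w

def fs_efcm(X, relevance, run_efcm, theta=0.01, delta=0.2, max_iter=20):
    """Iterate EFCM, dropping features whose combined relevance/incidence
    (minimum t-norm) falls below delta, until the incidences stabilize."""
    keep = np.arange(X.shape[1])   # indices of currently retained features
    prev = None
    for _ in range(max_iter):
        centers = run_efcm(X[:, keep])          # cluster on current features
        fi = feature_incidence(centers)
        if prev is not None and np.all(np.abs(fi - prev) < theta):
            break                                # stability condition reached
        mu = np.minimum(relevance[keep], fi)     # minimum t-norm significance
        survivors = mu >= delta
        if survivors.all():
            prev = fi                            # nothing dropped: track FI
            continue
        keep = keep[survivors]                   # discard weak features
        prev = None                              # feature set changed: reset
    return keep
```

Any routine returning a centers matrix (rows = cluster prototypes) is enough to exercise the loop in place of the real EFCM step.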
2.2.1 EFCM Execution: Main Steps
The FS-EFCM algorithm can be described by the following steps:
1. Feature relevance (FR) Estimation: the collected scores assigned by experts to the features sh, with h = 1, …, H, are translated into a proper scale in the range [0, 1]. In general, scores assigned by experts may be defined on a scale correlated to the data domain; thus, an index may be necessary to bound the feature score in the interval [0, 1]. For example, the score sh can be fuzzified by assigning a membership degree μFR(sh) to a pre-defined fuzzy set (e.g., a sigma fuzzy set on a universe of discourse given by an interval of the real line).
2. EFCM algorithm execution: given the data and the feature scores, the EFCM algorithm is executed; in the first run, all the input features are used. The generated clusters are hyperspheres in the feature space.
3. Feature incidence (FI) assessment: the clusters generated by EFCM are analyzed, and the incidence of each feature on the clustering structure is calculated by evaluating the impact of the hth feature component on each cluster prototype (measured as the distance between the feature components of the cluster prototype pairs). More formally, at the tth algorithm iteration, the weight value w(t)h of the hth feature component is its feature incidence value, calculated as follows:
(2.1)
where v′ih and v′kh are the hth component values of all the cluster center pairs, evaluated at the tth iteration. The weight value w(t)h assumes values between 0 and 1; the higher the value, the more the feature affects the cluster formation. Similarly to the FR index, the incidence value is used to calculate the corresponding membership degree to a prefixed fuzzy set.
4. Stability Condition Check: The algorithm stops when the difference between the FI
values of a feature in two successive iterations is below a prefixed threshold θ.
Formally, the stability condition is given by:
(2.2)
where μFI(wh) is the membership degree of the incidence value wh to a prefixed fuzzy
set. If the condition holds, the current feature remains in the feature set; otherwise, the
algorithm continues to the next step.
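The per-feature stability check of Step 4 reduces to comparing two successive fuzzified incidence values against the threshold θ; the default θ = 0.01 below is an assumed value:

```python
def is_stable(mu_prev, mu_curr, theta=0.01):
    """True when the fuzzified incidence of a feature changed by less
    than theta between two successive iterations (the stability
    condition of Step 4; theta = 0.01 is an assumed value)."""
    return abs(mu_curr - mu_prev) < theta
```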
5. Discard less significant features: the remaining features are candidates for removal
from the feature set, since their contribution to the clustering process may not be
significant. Candidate features are selected by defining a new measure of significance μh
of the hth feature for the clustering structure, applying a t-norm operator as follows:
$$\mu_h = \mu_{FR}(s_h)\otimes\mu_{FI}(w_h) \tag{2.3}$$
where ⊗ denotes the selected t-norm.
If μh is lower than a prefixed threshold δ, the hth feature is considered not meaningful
and is removed from the current feature set. Finally, the process returns to Step 2,
considering only the retained features in the next iteration.
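Step 5 can be sketched by combining the two membership degrees with a t-norm; the minimum t-norm and the threshold δ = 0.1 used below are illustrative choices, not values fixed by the chapter:

```python
def significance(mu_fr, mu_fi):
    """Significance of a feature as the t-norm of its expert-relevance
    membership mu_fr and its incidence membership mu_fi; the minimum
    t-norm is one admissible choice."""
    return min(mu_fr, mu_fi)

def keep_feature(mu_fr, mu_fi, delta=0.1):
    """A feature survives when its significance reaches the threshold
    delta (delta = 0.1 is an assumed value)."""
    return significance(mu_fr, mu_fi) >= delta
```

A feature thus needs to be both relevant to the experts and influential on the cluster prototypes to survive; either membership being low drags the t-norm down.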
The pseudocode of the FS-EFCM is shown in Listing 1.
A further choice concerns the t-norm operator used in Line 13 of Listing 2 and defined
in Step 5 of the FS-EFCM algorithm (Sect. 2.2.1). Among the families of t-norms
(triangular norms) defined in the literature, the most widely used in applications are the
following:
– minimum (Gödel) t-norm: x ● y = min(x, y)
– product (Goguen) t-norm: x ● y = x ⋅ y
– Łukasiewicz t-norm: x ● y = max(x + y − 1, 0)
Depending on the selected t-norm, different fuzzy intersections are generated; in
particular, the minimum t-norm is the most used in fuzzy control, whereas the product
t-norm produces a more drastic intersection than the minimum t-norm.
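The three t-norms can be written directly as one-line functions; evaluating them on the same pair of arguments also shows the ordering behind the "more drastic" remark (Łukasiewicz ≤ product ≤ minimum):

```python
def t_min(x, y):
    """Minimum (Goedel) t-norm."""
    return min(x, y)

def t_product(x, y):
    """Product (Goguen) t-norm."""
    return x * y

def t_lukasiewicz(x, y):
    """Lukasiewicz t-norm."""
    return max(x + y - 1.0, 0.0)

# For any x, y in [0, 1]: t_lukasiewicz(x, y) <= t_product(x, y) <= t_min(x, y),
# e.g. for (0.7, 0.6) the values are 0.3, 0.42 and 0.6 respectively.
```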
Then, EFCM runs on the remaining 154 selected features. The algorithm stops after 8
cycles/iterations. In the last three cycles, the number of features decreases to 20 and
remains at this value until the last cycle. Table 2.3 shows the results obtained after each
algorithm iteration.
After the eighth cycle, the stopping-criterion value is less than the threshold θ and the
algorithm stops.
In Table 2.4, the remaining twenty features and the measure of their significance are
shown.
Table 2.4 Significances of the final selected features
Feature name Feature significance
Admir 0.09
Afraid 0.15
Anxious 0.18
Attract 0.11
Bad 0.16
Good 0.16
Great 0.13
Happi 0.15
Hurt 0.14
Love 0.10
Pain 0.21
Panic 0.12
Passion 0.15
Sad 0.13
Scare 0.08
Thank 0.22
Touch 0.14
Unhappi 0.15
Upset 0.13
Worri 0.08
The final number of clusters is 16. The final document membership is calculated by
assigning each document to the cluster in which it has the highest membership degree.
Hence, cluster-class mapping is achieved by associating each cluster with the class to
which most of the documents assigned to that cluster belong.
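The majority-vote cluster-class mapping described above can be sketched as follows; the document/class pairs are purely illustrative:

```python
from collections import Counter, defaultdict

def map_clusters_to_classes(assignments):
    """assignments: iterable of (cluster_id, true_class) pairs, one per
    document. Each cluster is mapped to the majority class among the
    documents assigned to it."""
    by_cluster = defaultdict(list)
    for cluster, cls in assignments:
        by_cluster[cluster].append(cls)
    return {c: Counter(lst).most_common(1)[0][0]
            for c, lst in by_cluster.items()}

# Illustrative assignments: cluster 0 is mostly "joy", cluster 1 "fear".
pairs = [(0, "joy"), (0, "joy"), (0, "sad"), (1, "fear"), (1, "fear")]
mapping = map_clusters_to_classes(pairs)  # {0: "joy", 1: "fear"}
```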
Table 2.5 shows for each emotional category, the number of documents in the
class/category, the cluster label associated with the class, the number of documents
assigned to this cluster, and the number of documents assigned to the cluster that belong
to the class (well-classified documents).
Table 2.5 Number of documents assigned to the class and assigned to the correspondent cluster
Now, let us consider only those documents whose membership degree to their assigned
cluster is at or above a specific threshold; let us set that threshold to 0.6.
The number of documents whose membership degree to the assigned cluster is greater
than 0.6 is 2047, about 37% of the whole document collection.
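Filtering the documents that strongly belong to their assigned cluster is a small operation; the sample documents below are illustrative:

```python
def strong_documents(memberships, threshold=0.6):
    """Keep only the documents whose membership degree to their
    assigned cluster exceeds the threshold (0.6 in the experiment
    reported here)."""
    return [doc for doc, mu in memberships if mu > threshold]

# Illustrative (document, membership-to-assigned-cluster) pairs.
docs = [("d1", 0.91), ("d2", 0.40), ("d3", 0.65), ("d4", 0.59)]
kept = strong_documents(docs)  # ["d1", "d3"]
```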
Table 2.7 shows statistics similar to Table 2.5 but considering just the documents that
strongly belong to the cluster they are assigned to.
Table 2.7 Number of documents assigned to the class and assigned to the correspondent cluster
considering the documents with membership degree to the assigned cluster greater than 0.6
Table 2.8 Classification performance indices calculated for the documents in Table 2.7
These results highlight the effectiveness of FS-EFCM, especially in supporting the
classification of social data and in drawing out relevant emotion-driven user behavior.
The experimentation reveals how the approach can generate accurate clusters that are
well described by the features associated with each cluster. In fact, the final features are
those that best represent the data distribution within the clusters and give a clear picture
of the main sentiments and emotions expressed in the analyzed tweet trends.
2.4 Conclusion
This paper presents a semi-supervised clustering approach to classify user-generated
content through the analysis of the main emotions expressed in the text.
The algorithm acquires scores from human experts to assign a relevance degree to the
features of the reference domain. Our experimentation focused on capturing emotions
from tweet streams; for this reason, the experts scored words expressing emotions or
sentiments (such as "joy", "beautiful", etc.). The FS-EFCM algorithm reaches a trade-off
between the feature relevance score provided by the experts and the feature's impact on
cluster formation.
The performance of the algorithm revealed not just the effectiveness of the approach,
but also its reliability in retaining highly relevant features that clearly characterize the
final clusters.
Future development of the algorithm aims at investigating more deeply how documents
are associated with emotional categories. The idea is to consider not only the cluster to
which a document belongs with the highest membership degree, but also the other
clusters to which it can belong (with lower membership degrees), to discover which sets
of multiple emotional categories emerge from the document.