Practical Applications of Computational Biology and Bioinformatics 16th International Conference PACBB 2022 Florentino Fdez-Riverola
Practical Applications
of Computational
Biology and
Bioinformatics,
16th International
Conference
(PACBB 2022)
Lecture Notes in Networks and Systems
Volume 553
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of
Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of
Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and other. Of particular value to
both the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Editors
Florentino Fdez-Riverola
Computer Science Department
Universidad de Vigo
Vigo, Spain

Miguel Rocha
Campus de Gualtar
Universidade do Minho
Braga, Portugal
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
this event will strongly promote the interaction among researchers from international
research groups working in diverse fields. The scientific content will be innovative,
and it will help improve the valuable work that is being carried out by the participants.
This symposium is organized by the University of L’Aquila (Italy) with the
collaboration of the United Arab Emirates University, the University of Minho, the
University of Vigo, the University of Salamanca and the Gheorghe Asachi Technical
University of Iași. We would like to thank all the contributing authors, the members
of the programme committee and the sponsors. We are grateful for the funding support of
the project “Intelligent and sustainable mobility supported by multi-agent systems
and edge computing” (Id. RTI2018-095390-B-C32). Finally, we thank the local
organization members for their valuable work, which is essential for the success of
PACBB’22.
Organization
Mohd Saberi Mohamad, United Arab Emirates University, United Arab Emirates
Miguel Rocha, University of Minho, Portugal
Advisory Committee
Organizing Committee
Programme Committee
1 Introduction
Around one-third of the proteins in a cell are found in its membrane, and approxi-
mately one-third of these proteins are involved in molecule transport [21]. Trans-
membrane transport proteins, also known as transporters, are required for cell
metabolism, ion homeostasis, signal transduction, binding with small molecules in
the extracellular space, immune recognition, energy transduction, and physiological
and developmental processes [21].
Protein research has advanced our knowledge of human health and disease treat-
ment. The decreasing cost of sequencing technology has enabled the generation of
2 Related Work
SCMMTP [15] makes use of a novel scoring card method (SCM) to ascertain the
dipeptide composition of potential membrane transport proteins. SCMMTP begins
with a 400-dipeptide starting matrix and scores dipeptides based on the difference
between positive and negative compositions. Following that, the matrix is optimized
using a genetic algorithm. SCMMTP achieved an overall accuracy of 81.12% and
76.11% and an MCC of 0.62 and 0.47, respectively, on the training and independent
datasets.
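To make the dipeptide scoring idea concrete, the following Python sketch computes a 400-dimensional dipeptide composition and an initial score for each dipeptide as the difference between its mean composition in positive and in negative sequences. It is a simplified illustration under stated assumptions, not the SCMMTP implementation: the function names and toy sequences are hypothetical, and the genetic-algorithm optimization step is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return {dp: (counts[dp] / total if total else 0.0) for dp in DIPEPTIDES}


def initial_scores(positives, negatives):
    """Score = mean composition in positives minus mean composition in negatives."""
    def mean_composition(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}

    pos, neg = mean_composition(positives), mean_composition(negatives)
    return {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}


# Toy sequences (hypothetical, for illustration only).
scores = initial_scores(["MKLLVA", "MKLAAG"], ["GGSGGS", "SGGSGG"])
```

A positive score marks a dipeptide that is more frequent in the positive class, and a negative score one that is more frequent in the negative class.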
Nguyen et al. [17] characterize transporter protein sequences using a word-
embedding technique. The protein sequence is defined by the word embedding and
the protein’s biological terms frequency. They achieved accurate results in terms
of transporter substrate specificity but not in terms of transporter detection: the
prediction accuracy for transporters was only 83.94% under cross-validation
and 85.00% on the independent dataset.
In 2020, Alballa and Butler developed TooT-T [2], an ensemble technique that
combines the results of two distinct approaches: homology annotation transfer and
machine learning. For homology, BLAST searches the Transporter Classification Database (TCDB)
[20] for sequences homologous to the query protein; if a hit meets three thresholds, the query is
predicted to be a transporter. For machine learning, three composition features are computed,
each used to train its own SVM model. Finally, a meta-model combines these outputs to decide
whether the protein is a transport protein. They report accuracies of 90.07% and 92.22% and
MCC values of 0.80 and 0.82 on the cross-validation and independent test sets,
respectively. While incorporating multiple feature sets and classifiers improves the
classification of transport proteins in TooT-T, it also increases the task’s complexity.
3.1 Dataset
This work utilizes the dataset from the TrSSP project [16] which can be accessed
at the following URL: https://www.zhaolab.org/TrSSP/. The dataset was created
using the UniProt database [14], in which 10,780 transporter, carrier, and channel
proteins were initially well characterized at the protein level with different substrate
specificity annotation. Mishra et al. [16] eliminated from this benchmarking dataset
fragmented sequences, sequences with more than two substrate specificities, and
biological function annotations based only on sequence similarity. As presented in
Table 1, the final dataset contains 1,560 protein sequences for the training and test
sets. This dataset is referred to as DS-T, which stands for a dataset for transporter
proteins.
4 H. Ghazikhani and G. Butler
where $Q$ (Query), $K$ (Key) and $V$ (Value) are various linear transformations of the
input features in order to obtain information representations for various subspaces.
The dimension of $K$ is $d_k$, and $W_i^Q$, $W_i^K$, $W_i^V$ and $W_i^O$ are weight matrices.
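As a concrete illustration of the roles of Q, K and V, the sketch below implements single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in [22], in plain Python. The learned weight matrices are left out, and the toy inputs and function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    output, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        output.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return output, weights


# Toy example: two query positions attending over three key/value positions.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` sums to one, so every output vector is a convex combination of the value vectors. In multi-head attention this computation is repeated per head on separately projected inputs.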
BERT is a two-step framework: pre-training and fine-tuning. Pre-training is train-
ing the model on a large amount of unlabeled data in an unsupervised manner. In
contrast, fine-tuning is the process of initializing the model with the pre-trained
parameters and fine-tuning all parameters using labeled data from downstream tasks
via an additional classifier [9].
There are two methods for extracting representations from pre-trained BERT
models: (i) frozen and (ii) fine-tuned. The former extracts features from a pre-trained
BERT model without updating the model’s weights, whereas the latter extracts
Transporters Prediction Using BERT 5
features after training the pre-trained BERT model on a smaller dataset and fine-
tuning the model’s weights [9].
ProtBERT-BFD [10] is the BERT model which has been pre-trained on a large
corpus of protein sequences from the BFD database (https://bfd.mmseqs.com) which
contains 2.5 billion protein sequences. MembraneBERT is ProtBERT-BFD fine-
tuned using the TooT-M membrane proteins dataset [1]. MembraneBERT can be
found at (https://huggingface.co/ghazikhanihamed/MembraneBERT).
The representations from the final hidden layer of ProtBERT-BFD and Mem-
braneBERT models are used in conjunction with a mean-pooling strategy, which is
concluded to be the optimal method in ProtTrans [10].
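The mean-pooling step can be sketched as follows: the per-residue vectors of the final hidden layer are averaged into one fixed-length vector per protein. This is a minimal illustration in plain Python. Excluding padded positions through an attention mask is an assumption, since the text does not state how padding or special tokens were handled, and the toy dimensions are far smaller than those of a real model.

```python
def mean_pool(hidden_states, attention_mask):
    """Average per-residue vectors, ignoring padded positions (mask == 0)."""
    dim = len(hidden_states[0])
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no positions")
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


# Toy final-layer output: 4 positions x 3 dimensions, last position is padding.
hidden = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
embedding = mean_pool(hidden, mask)
```

The result has the same length regardless of sequence length, which is what allows a standard classifier to be trained on it.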
To fine-tune a BERT model, we add a classification layer and train the entire model on the
transporter training set. In this study, we randomly chose 10% of the training
samples as the validation set. During fine-tuning, all weights initialized from
pre-training are updated using the downstream task dataset. We fine-tuned the
BERT models using the Trainer API from HuggingFace [24]. This is a preliminary
investigation of BERT’s role in transport protein analysis, so we used the same
hyperparameter settings as ProtTrans [10], except for the number of training epochs,
which we determined empirically: 13 for ProtBERT-BFD and 10 for MembraneBERT.
These are the epochs at which performance on the validation set was highest.
The additional fine-tuning hyperparameters, as recommended and used in the
ProtTrans project, are listed in Table 2.
3.5 Evaluation
A 10-fold cross-validation (CV) technique was used in this analysis to evaluate the
model’s performance by partitioning the dataset into ten sections. For the purpose of
fine-tuning the BERT, 10% of the training set was used as the validation set, while
the remaining 90% was used for training. The independent test set is utilised for the
sole purpose of evaluating the method.
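The two data splits described above can be sketched in plain Python. This is an illustrative version only: the helper names, the random seed, and the sample count of 100 are hypothetical, and the splits here are not stratified by class, which the original experiments may or may not have been.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def hold_out_validation(train_idx, fraction=0.10, seed=0):
    """Split off a random fraction of the training indices for validation."""
    shuffled = train_idx[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * fraction))
    return shuffled[n_val:], shuffled[:n_val]


splits = list(k_fold_indices(100, k=10))
train_part, val_part = hold_out_validation(list(range(100)))
```

Each sample appears in exactly one test fold across the ten rounds, while the 90/10 hold-out is used only to monitor fine-tuning.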
Four key evaluation criteria are considered in this project: Sensitivity (Sen), Speci-
ficity (Spc), Accuracy (Acc), and MCC.
\[
\mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{4}
\]
Fig. 1 The effect of fine-tuning (This figure depicts the results of fine-tuning the ProtBERT-BFD
(left) and MembraneBERT (right) with accuracy and MCC metrics at each epoch on the validation
set. The y-axis and x-axis display the scores and epochs, respectively)
Table 4 and Fig. 2 are used to compare TooT-BERT-T to other published methods that
use only the protein sequence on the same dataset. As demonstrated, TooT-BERT-T
outperforms other published works in all evaluation metrics except sensitivity, where
Nguyen et al. [17] achieve 100% sensitivity.
Fig. 3 Confusion matrix of TooT-BERT-T, where T represents transport protein and non-T represents non-transport protein

Actual \ Predicted    T      non-T
T                     115    5
non-T                 6      54
TooT-BERT-T has a greater specificity (rate of true negatives) than the approach of
Nguyen et al. [17], indicating that it makes fewer false positive predictions (Fig. 3).
This is essential for achieving a high true negative rate of 90% when describing
non-transport proteins.
The proposed method, TooT-BERT-T, which employs fine-tuned ProtBERT-BFD
representation and a Logistic Regression classifier using the dataset explained in
Sect. 3.1, outperforms previous methods with an accuracy of 93.89% and an MCC
of 0.86 on the independent test set.
The ProtBERT-BFD representation is effective because it understands the context
of each amino acid in different protein sequences, whereas other methods rely on
static protein-encoding techniques.
Figure 3 shows a confusion matrix of TooT-BERT-T for separating transport pro-
teins from non-transport proteins. As depicted in the figure, despite the fact that the
number of errors is quite low, the model makes more mistakes when identifying non-
transporters as transporters (False positive = 6) than when predicting transporters
as non-transporters (False negative = 5). This suggests that the proposed strategy
is somewhat skewed towards predicting the positive class (transport proteins). This
issue may occur when the dataset is imbalanced, with more positive class samples
than negative class samples.
5 Conclusion
References
1. Alballa M, Butler G (2020) Integrative approach for detecting membrane proteins. BMC Bioin-
form 21(19):575
2. Alballa M, Butler G (2020) TooT-T: discrimination of transport proteins from non-transport
proteins. BMC Bioinform 21(3):25
3. Alley EC, Khimulya G, Biswas S, AlQuraishi M, Church GM (2019) Unified rational protein
engineering with sequence-based deep representation learning. Nat Methods 16(12):1315–
1322
4. Aplop F, Butler G (2015) On predicting transport proteins and their substrates for the recon-
struction of metabolic networks. In: 2015 IEEE conference on computational intelligence in
bioinformatics and computational biology (CIBCB), pp 1–9
5. Aplop F, Butler G (2017) TransATH: transporter prediction via annotation transfer by homol-
ogy. ARPN J Eng Appl Sci 12(2):8
6. Bepler T, Berger B (2019) Learning protein sequence embeddings using information from
structure. arXiv:1902.08661 [cs, q-bio, stat]
7. Chicco D, Jurman G (2020) The advantages of the Matthews Correlation Coefficient (MCC)
over F1 score and accuracy in binary classification evaluation. BMC Genom 21(1):6
8. Detlefsen NS, Hauberg S, Boomsma W (2022) Learning meaningful representations of protein
sequences. Nat Commun 13(1):1914
9. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv:1810.04805 [cs]
10. Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer
C, Steinegger M, Bhowmik D, Rost B (2021) ProtTrans: towards cracking the language of
life’s code through self-supervised deep learning and high performance computing. IEEE Trans
Pattern Anal Mach Intell 1
11. Ferruz N, Höcker B (2022) Towards controllable protein design with conditional transformers.
arXiv:2201.07338 [q-bio]
12. Hess AS, Hess JR (2019) Logistic regression. Transfusion 59(7):2197–2198
13. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates
R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-
Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M,
Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW,
Kavukcuoglu K, Kohli P, Hassabis D (2021) Highly accurate protein structure prediction with
AlphaFold. Nature 596(7873):583–589
14. Leinonen R, Diez FG, Binns D, Fleischmann W, Lopez R, Apweiler R (2004) UniProt archive.
Bioinformatics 20(17):3236–3237
15. Liou YF, Vasylenko T, Yeh CL, Lin WC, Chiu SH, Charoenkwan P, Shu LS, Ho SY, Huang HL
(2015) SCMMTP: identifying and characterizing membrane transport proteins using propensity
scores of dipeptides. BMC Genom 16(12):S6
16. Mishra NK, Chang J, Zhao PX (2014) Prediction of membrane transport proteins and their
substrate specificities using primary sequence information. PLoS ONE 9(6):e100278
17. Nguyen TTD, Le NQK, Ho QT, Phan DV, Ou YY (2019) Using word embedding technique
to efficiently represent protein sequences for identifying substrate specificities of transporters.
Anal Biochem 577:73–81
18. Ofer D, Brandes N, Linial M (2021) The language of proteins: NLP, machine learning & protein
sequences. Comput Struct Biotechnol J 19:1750–1758
19. Rao R, Bhattacharya N, Thomas N, Duan Y, Chen P, Canny J, Abbeel P, Song Y (2019)
Evaluating protein transfer learning with TAPE. In: Wallach H, Larochelle H, Beygelzimer A,
Alché-Buc Fd, Fox E, Garnett R (eds) Advances in neural information processing systems,
vol 32. Curran Associates, Inc
20. Saier Jr MH, Tran CV, Barabote RD (2006) TCDB: the transporter classification database for
membrane transport protein analyses and information. Nucleic Acids Res 34(suppl_1):D181–
D186
21. Saier Jr MH (2002) Families of transporters and their classification. In: Transmembrane trans-
porters. Wiley, pp 1–17
22. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I
(2017) Attention is all you need. arXiv
23. Vig J, Madani A, Varshney LR, Xiong C, Socher R, Rajani NF (2021) BERTology meets
biology: interpreting attention in protein language models. arXiv:2006.15222 [cs, q-bio]
24. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz
M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame
M, Lhoest Q, Rush AM (2020) HuggingFace’s transformers: state-of-the-art natural language
processing. arXiv
Machine Learning and Deep Learning
Techniques for Epileptic Seizures
Prediction: A Brief Review
Abstract The third most common neurological disorder, only behind stroke and
migraines, is Epilepsy. The main criteria for its diagnosis are the occurrence of
unprovoked seizures and the possibility of new seizures appearing. Usually, the
professional in charge of detecting these seizures is a neurologist who interprets
the patients’ electroencephalography. However, more accurate, precise, and sensitive
methods are needed. Machine learning has emerged as a viable alternative, reducing
costs and shortening diagnostic time. This work reviews the state of the art in
machine learning applied to epileptic seizure detection and prediction as a prospective
study before developing a novel seizure prediction algorithm.
This work was supported by the HERMES project, funded by the European Union under the Horizon
2020 FET-proactive program, Grant Agreement n. 824164, as well as the “XAI - XAI - Sistemas
Inteligentes Auto Explicativos creados con Módulos de Mezcla de Expertos” project, ID SA082P20,
financed by Junta Castilla y León, Consejería de Educación, and FEDER funds.
1 Introduction
EEG signals usually come with several artifacts that may obscure the signal’s
epilepsy-related information. These artifacts depend on the type of model of study
or the specifics of the EEG recording.
The most typical processing for most EEG signals includes the removal of background
noise, done by filtering out the 50 or 60 Hz powerline interference. A vast number of features
can then be extracted from the processed signal.
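Powerline removal of this kind is commonly done with a narrow notch filter. The sketch below builds a second-order IIR notch from the standard biquad design formulas and applies it to a synthetic signal in plain Python. The sampling rate, quality factor, and test signal are hypothetical, and the reviewed works may use different filters. Library routines such as `scipy.signal.iirnotch` offer a comparable design.

```python
import math


def notch_coefficients(f0, fs, q=30.0):
    """Second-order IIR notch (biquad) coefficients, normalised so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_filter(b, a, x):
    """Direct-form I difference equation for a second-order section."""
    y = []
    for n, xn in enumerate(x):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2)
    return y


def rms(values):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 500.0  # sampling rate in Hz (hypothetical)
t = [n / fs for n in range(int(4 * fs))]
brain = [math.sin(2 * math.pi * 10.0 * ti) for ti in t]        # 10 Hz activity
mains = [0.5 * math.sin(2 * math.pi * 50.0 * ti) for ti in t]  # 50 Hz interference
b, a = notch_coefficients(50.0, fs)
cleaned = apply_filter(b, a, [s + m for s, m in zip(brain, mains)])
```

After an initial transient, the 50 Hz component is suppressed while the 10 Hz component passes almost unchanged. For 60 Hz mains, the same design applies with `f0=60.0`.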
Detection of seizures by EEG has traditionally been done manually by clinical pro-
fessionals, evaluating the frequency, wavelength, voltage, amplitude, and waveforms.
These features are suitable for being analyzed using automated learning algorithms
[28]. Since the first computer analysis of EEG records in 2003 using the wavelet
transform [5], automated detection of ictal events has become an essential matter in
epilepsy research [9, 25].
Many studies have been carried out in recent years to improve the performance of
automated seizure detection. A wide range of Machine Learning and Deep Learning
classifiers have been employed [15, 37]. However, while detecting seizures remains
valuable for research purposes, patients and clinical professionals need tools that
allow them to avoid seizures rather than analyze them post hoc; this is where
seizure prediction becomes necessary.
4 Seizure Prediction
The first experiments achieved prediction times near 2.24 min and demonstrated the
effectiveness of wavelet functions as predictors [47]. The best time results in animal
models were obtained by [64], who used canine EEG data and multiple machine learning
algorithms and were able to predict seizures 1 min ahead. That work established a proof of
concept, so no diagnostic performance analyses were carried out. Nevertheless, some
works have developed systems with sensitivities over 90% but with a
loose prediction time. Rajdev’s team [55] developed a seizure prediction system for
rat EEG recordings based on an adaptive Wiener filter; this approach reaches a sensitivity
of 92%, the highest among the works on animal EEG recordings.
The work of Iasemidis [33] established an upper bound on prediction time,
at 91 min, with a precision of 91.3% and a sensitivity of 81.82%. Other authors
fixed the prediction horizon and kept the algorithm within it:
in [2, 70], horizons of 30 and 50 min were fixed, and sensitivities
between 79.9 and 90.2% were achieved.
Tsiouris [62] predicted seizures 15 to 120 min ahead with a sensitivity of 99%,
using Long Short-Term Memory (LSTM) networks, the first application
of deep learning in the field.
Other deep learning approaches have reached similar results while automating
feature extraction. Wei [65] uses an image of the EEG as input to an architecture
based on Convolutional Neural Networks (CNN) for feature extraction and LSTM
for sequence learning. This network achieves an average accuracy of 93.4% at an
average warning time of 21 min. Transformers have been used in a similar manner
[67], reaching a prediction sensitivity of 96.01% and a false positive rate of 0.047/h,
with an average warning time between 3 and 30 min.
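To show how figures such as sensitivity and false positive rate per hour can be derived from a list of alarms, the following sketch uses one simple convention: an alarm is correct if a seizure onset follows it within the prediction horizon. Definitions differ between studies, so this is an assumed convention rather than the one used in the cited works. The function name and the toy recording are hypothetical.

```python
def prediction_metrics(alarm_times, seizure_onsets, horizon_min, recording_hours):
    """Sensitivity and false positive rate per hour for alarm-based prediction.

    An alarm counts as correct if a seizure onset follows it within
    `horizon_min` minutes; a seizure counts as predicted if at least one
    alarm precedes it within that window. Times are in minutes.
    """
    def covers(alarm, onset):
        return 0 < onset - alarm <= horizon_min

    predicted = sum(any(covers(a, s) for a in alarm_times) for s in seizure_onsets)
    false_alarms = sum(not any(covers(a, s) for s in seizure_onsets) for a in alarm_times)
    sensitivity = predicted / len(seizure_onsets) if seizure_onsets else 0.0
    fpr_per_hour = false_alarms / recording_hours
    return sensitivity, fpr_per_hour


# Hypothetical 24 h recording with three seizures and four alarms (minutes).
sens, fpr = prediction_metrics(
    alarm_times=[100, 400, 700, 1200],
    seizure_onsets=[120, 720, 1000],
    horizon_min=30,
    recording_hours=24,
)
```

In this toy case two of the three seizures are preceded by an alarm within 30 min, and two alarms are not followed by any seizure, giving a sensitivity of 2/3 and 2 false alarms in 24 h.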
5 Conclusion
Although many advances have been made since the first works in automated seizure
prediction, some gaps remain to be cleared.
18 M. Hernández et al.
References
1. Global, regional, and national incidence, prevalence, and years lived with disability for 310
diseases and injuries, 1990-2015: a systematic analysis for the global burden of disease study.
Technical report, GBD 2015 Disease and Injury Incidence and Prevalence Collaborators (2015)
2. Aarabi A, He B (2012) A rule-based seizure prediction method for focal neocortical epilepsy.
Clin Neurophysiol 123:1111–1122
3. Acharya UR, Oh S, Hagiwara Y, Adeli H (2017) Deep convolutional neural network for the
automated detection of seizure using EEG signals. Comput Biol Med 100:270–278
4. Acharya U, Yanti R, Zheng J, Mookiah M, Tan J, Martis R et al (2013) Automated diagnosis
of epilepsy using CWT, HOS and texture parameters. Int J Neural Syst 23
5. Adeli H, Zhou Z, Dadmehr N (2003) Analysis of EEG records in an epileptic patient using
wavelet transform. J Neurosci Methods 123:69–87
6. Alves N, Rodrigues R, Rocha M (2022) BioTMPy: a deep learning-based tool to classify
biomedical literature. In: Lecture notes in networks and systems. LNNS, vol 325, pp 115–125
7. Bandarabadi M, Rasekhi J, Teixeira C, Karami M (2015) On the proper selection of preictal
period for seizure prediction. Epilepsy Behav 45:158–166
8. Bandarabadi M, Teixeira C, Rasekhi J, Dourado A (2015) Epileptic seizure prediction using
relative spectral power features. Clin Neurophysiol 126:237–248
9. Calvaresi D, Albanese G, Calbimonte J, Schumacher M (2020) Seamless: simulation and anal-
ysis for multi-agent system in time-constrained environments. In: Lecture notes in computer
science (including subseries lecture notes in artificial intelligence and lecture notes in bioin-
formatics). LNAI, vol 12092, pp 392–397
10. Casado-Vara R, González-Briones A, Prieto J, Corchado J (2019) Smart contract for monitoring
and control of logistics activities: pharmaceutical utilities case study. Adv Intell Syst Comput
771:509–517
11. Casado-Vara R, Novais P, Gil A, Prieto J, Corchado J (2019) Distributed continuous-time fault
estimation control for multiple devices in IoT networks. IEEE Access 7:11972–11984
12. Casado-Vara R, Prieto-Castrillo F, Corchado J (2018) A game theory approach for cooperative
control to improve data quality and false data detection in WSN. Int J Robust Nonlinear Control
28(16):5087–5102
13. Chen H, Shen J, Wang L, Jin Y (2021) Towards a more effective bidirectional LSTM-based
learning model for human-bacterium protein-protein interactions. In: Advances in intelligent
systems and computing. AISC, vol 1240, pp 91–101
14. Costa Â, Novais P, Corchado J, Neves J (2012) Increased performance and better patient
attendance in an hospital with the use of smart agendas. Logic J IGPL 20(4):689–698
15. Cristani M, Tomazzoli C, Olivieri F, Pasetto L (2020) An ontology of changes in normative
systems from an agentive viewpoint. In: Communications in computer and information science.
CCIS, vol 1233, pp 131–142
16. Cámpora NE, Mininni CJ, Kochen S, Lew SE (2019) Seizure localization using pre ictal phase-
amplitude coupling in intracranial electroencephalography. Sci Rep 9:20022
17. D’Alessandro M, Esteller R, Vachtsevanos G, Hinson A, Echauz J, Litt B (2003) Epileptic
seizure prediction using hybrid feature selection over multiple intracranial EEG electrode
contacts: a report of four patients. IEEE Trans Inf Theory 50:603–615
18. De Meo P, Falcone R, Sapienza A (2020) Fast and efficient partner selection in large agents’
communities: when categories overcome direct experience. In: Lecture notes in computer
science (including subseries lecture notes in artificial intelligence and lecture notes in bioin-
formatics). LNAI, vol 12092, pp 106–117
19. Direito B, Teixeira CA, Sales F, Castelo-Branco M, Dourado A (2017) A realistic seizure
prediction study based on multiclass SVM. Int J Neural Syst 27:1750006–1750021
20. Dragomiretskiy K, Zosso D (2014) Variational mode decomposition. IEEE Trans Signal Pro-
cess 62:531–544
21. Dressler O, Schneider G, Stockmanns G, Kochs E (2004) Awareness and the EEG power
spectrum: analysis of frequencies. Br J Anaesth 93
22. D’Auria M, Scott E, Lather R, Hilty J, Luke S (2020) Assisted parameter and behavior calibra-
tion in agent-based models with distributed optimization. In: Lecture notes in computer science
(including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics).
LNAI, vol 12092, pp 93–105
23. Fatima N (2020) Enhancing performance of a deep neural network: a comparative analysis of
optimization algorithms. ADCAIJ Adv Distrib Comput Artif Intell J 9(2):79–90
24. Gadhoumi K, Lina J, Gotman J (2013) Seizure prediction in patients with mesial temporal lobe
epilepsy using EEG measures of state similarity. Clin Neurophysiol 124:1745–1754
25. García-Retuerta D, Canal-Alonso A, Casado-Vara R, Rey A, Panuccio G, Corchado J (2021)
Bidirectional-pass algorithm for interictal event detection. In: Advances in intelligent systems
and computing. AISC, vol 1240, pp 197–204
26. Garg G, Singh V, Gupta J, Mittal A (2011) Relative wavelet energy as a new feature extractor
for sleep classification using EEG signals. Int J Biomed Signal Process 2:75–80
27. Grassberger P, Procaccia I (1983) Characterization of strange attractors. Phys Rev Lett 50:346–
349
28. Gupta S, Meena J, Gupta O (2020) Neural network based epileptic EEG detection and classi-
fication. ADCAIJ Adv Distrib Comput Artif Intell J 9(2):23–32
29. Gupta S, Ranga V, Agrawal P (2022) EpilNet: a novel approach to IoT based epileptic seizure
prediction and diagnosis system using artificial intelligence. ADCAIJ Adv Distrib Comput
Artif Intell J 10(4):435–452
30. Haenlein A (2004) A beginner’s guide to partial least squares analysis. Underst Stat 283–297
31. Hirtz D, Thurman DJ, Gwinn-Hardy K, Mohamed M, Chaudhuri AR, Zalutsky R (2007) How
common are the “common” neurologic disorders? Neurology 68:326–337
32. Hjorth B, Elema-Schonander A (1970) EEG analysis based on time domain properties. Elec-
troencephalogr Clin Neurophysiol 29:306–310
33. Iasemidis L, Shiau D, Pardalos P, Chaovalitwongse W, Narayanon K, Prasad A (2005)
Long-term prospective on-line real-time seizure prediction. Clin Neurophysiol 116:532–544
34. Khan A, Zubair S, Khan S (2021) Comprehensive performance analysis of neurodegenerative
disease incidence in the females of 60–96 year age group. ADCAIJ Adv Distrib Comput Artif
Intell J 10(2):183–196
35. Kumar M, Rao Y (2018) Epileptic seizures classification in EEG signal based on semantic
features and variational mode decomposition. Clust Comput 1–11
36. Lane N, Kahanda I (2021) DeepACPpred: a novel hybrid CNN-RNN architecture for predicting
anti-cancer peptides. In: Advances in intelligent systems and computing. AISC, vol 1240, pp
60–69
37. Lee K, Jeong H, Kim S, Yang D, Kang HC, Choi E (2022) Real-time seizure detection
using EEG: a comprehensive comparison of recent approaches under a realistic setting.
arXiv:2201.08780 [cs]
38. Li T, Fan H, García J, Corchado J (2018) Second-order statistics analysis and comparison
between arithmetic and geometric average fusion: application to multi-sensor target tracking.
Inf Fusion 51:233–243
39. Li T, Su J, Liu W, Corchado J (2017) Approximate gaussian conjugacy: parametric recursive
filtering under nonlinearity, multimodality, uncertainty, and constraint, and beyond. Front Inf
Technol Electron Eng 18(12):1913–1939
40. Mallat S, Hwang W (1992) Singularity detection and processing with wavelets. IEEE Trans
Inf Theory 38:617–643
41. Mandelbrot B (1983) Geometry of nature. Freeman
42. Mena Mamani N (2020) Machine learning techniques and polygenic risk score application to
prediction genetic diseases. ADCAIJ Adv Distrib Comput Artif Intell J 9(1):5-14
43. Muhamada AW, Mohammed AA (2022) Review on recent computer vision methods for human
action recognition. ADCAIJ Adv Distrib Comput Artif Intell J 10(4):361–379
44. Niedermeyer E, da Silva F (2004) Electroencephalography: basic principles, clinical applica-
tions, and related fields. Williams & Wilkins
45. Nikias C, Petropulu A (1993) Higher order spectra analysis: a nonlinear signal processing
framework. PTR Prentice Hall
46. Nugroho S, Weinmann A, Schindelhauer C, Christ A (2020) Averaging emulated time-series
data using approximate histograms in peer to peer networks. In: Communications in computer
and information science. CCIS, vol 1233, pp 339–346
47. Ouyang G, Li X, Guan X (2007) Application of wavelet-based similarity analysis to epileptic
seizures prediction. Comput Biol Med 37:430–437
48. Ozaktas H, Zalevsky Z, Kutay M (2001) The fractional Fourier transform. Wiley
49. Parvez M, Paul M (2016) Epileptic seizure prediction by exploiting spatiotemporal relationship
of EEG signals using phase correlation. IEEE Trans Neural Syst Rehabil Eng 24:158–168
50. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag
2:559–572
51. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of
max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell
27:1226–1238
52. Press M (ed) (1999) The infinite Gaussian mixture model
53. Press W, Flannery B, Teukolsky S, Vetterling W (1992) Numerical recipes in C: the art of
scientific computing. Cambridge University Press
54. Pérez-López R, Blanco G, Fdez-Riverola F, Lourenço A (2021) The activity of bioinformatics
developers and users in stack overflow. In: Advances in intelligent systems and computing.
AISC, vol 1240, pp 23–31
55. Rajev P, WArd M, Rickus J, Worth R, Irazoqui P (2010) Real-time seizure prediction from
local field potentials using an adaptive wiener algorithm. Comput Biol Med 40:97–108
56. Ramoser H, Mller-Gerking J, Pfurtscheller G (2000) Optimal spatial filtering of single trial
EEG during imagined hand movement. IEEE Trans Rehabil Eng 8:441–446
57. Reddy B, Chatteriji B (1996) An FFT-based technique for translation, rotation, and scale
invariant image registration. IEEE Trans Image Process 5:1266–1271
58. Richman J, Moorman J (2000) Physiological time-series analysis using approximate entropy
and sample entropy. Am J Physiol Heart Circ Physiol 278:2039–2049
59. Robert S, Fisher AC, Arzimanoglou A, Bogacz A, Cross JH, Elger Jr CE, Forsgren L, French
JA, Glynn M, Hesdorffer DC, Lee B, Mathern GW, Moshé SL, Perucca E, Scheffer IE, Tomson
T, Watanabe M, Wiebe S (2014) ILAE official report: a practical clinical definition of epilepsy.
Epilepsia 55:475–482
60. Rosenstein M, Colins J, de Luca C (1993) A practical method for calculating largest Lyapunov
exponents from small data sets. Phys D 65:117–134
61. Shafique A, Sayeed M, Tsakalis K (2018) Nonlinear dynamical systems with chaos and big
data: a case study of epileptic seizure prediction and control. Guide to big data applications.
Springer (2018)
ML and DL Techniques for Epileptic Seizures Prediction ... 21
1 Introduction
areas where health resources are scarce. To face these risks, we recently developed
several classifiers, useful to predict safe discharge, disease severity, and mortality risk
from COVID-19, fed by routine analyses collected in the Emergency Department
(ED) [7].
In this paper, we focus on a system, called COVID-19 Decision Support System
(C19DSS), whose aim is to enable ED doctors to take advantage of the aforementioned
models while managing COVID-19 patients. The C19DSS system has been developed
following the User-Centered Design (UCD) methodology, i.e., involving the doctors
from the early phases of the system design, and revising the development in light of
usability cycles [16]. So far, the system has been used during the fourth wave. It is
currently producing accurate results with patients who did not complete the vaccination
protocol (i.e., up to the second dose). Therefore, we consider the system suitable in
these cases, as well as in all countries with low vaccination rates.
2 Background
To allow the paper to be self-contained, we briefly report the results concerning the
models we developed, which we submitted in [7].
From a dataset containing the routine analyses of 779 patients collected in the
ED, we devised several models, both on the complete cases only and on the full
dataset after missing data imputation [6]. We tried the following models: decision
tree (DT) as a baseline [5], random forest (RF) [4] and gradient boosting machines
(GBM) [9]. The models were developed to predict safe discharge (discharge/admit),
disease severity (mild/severe), and mortality risk (no risk/risk).
For all models and outcomes, we split the dataset into training and test sets (75% of
the data for training, 25% for testing), used 10-fold cross-validation, tuned each
classifier according to its specific hyper-parameters, and calculated the confusion
matrix and the ROC curve [10]. The results are summarized in Table 1. The table lists
all details of each model, for each outcome, for the subset of the complete cases and
for the complete dataset (with missing data imputation). The best AUC is reported
in bold, together with the corresponding model.
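The evaluation protocol above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the code used for the study. The function names, the fixed seed, and the choice to run the 10 folds inside the training portion are all assumptions made for illustration.

```python
import random

def train_test_split(n_rows, train_fraction=0.75, seed=0):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed value
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

def k_folds(indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# 779 is the number of patients reported in the text.
train, test = train_test_split(779)
folds = list(k_folds(train, k=10))
print(len(train), len(test), len(folds))  # prints: 584 195 10
```

Each of the 10 folds would be used once for validation while a classifier is tuned on the other nine; the held-out 25% is only touched for the final confusion matrix and ROC curve.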
Despite the limited size of the dataset and the constraint of using only routine
clinical and laboratory data to devise the models, the performance of our models
is in line with the best prediction models available in the scientific literature that
use data similar to ours [3, 12, 26]. In particular: (i) concerning hospital
admission, Jimenez-Solem et al. [12] developed an RF model that reached an AUC
of 0.82, vs 0.89 and 0.94 for our models; (ii) for severity prediction, Yao et
al. [26] devised a Support Vector Machine (SVM) model that reached an accuracy
of 0.82, vs 0.76 and 0.89 for our models; (iii) for mortality prediction, Booth
et al. [3] developed an SVM model that reached an AUC of 0.93, vs 0.84 and 0.87
for our models.
C19DSS 25
Table 1 Main statistics for all outcomes and classifiers. Acc = Accuracy, Sens = Sensitivity,
Spec = Specificity, AUC = Area Under the Curve

                  Complete cases               Missing data imputation
Model             Acc    Sens   Spec   AUC     Acc    Sens   Spec   AUC
Safe discharge
  DT              0.870  0.941  0.546  0.937   0.824  0.842  0.770  0.858
  RF              0.886  0.960  0.546  0.938   0.829  0.869  0.780  0.894
  GBM             0.886  0.951  0.591  0.943   0.824  0.876  0.666  0.882
Disease severity
  DT              0.805  0.560  0.867  0.792   0.742  0.732  0.752  0.766
  RF              0.886  0.680  0.939  0.886   0.757  0.783  0.732  0.832
  GBM             0.846  0.600  0.908  0.893   0.762  0.804  0.721  0.827
Mortality
  DT              0.829  0.970  0.250  0.758   0.840  0.944  0.290  0.689
  RF              0.878  0.960  0.542  0.866   0.876  0.969  0.387  0.842
  GBM             0.854  0.939  0.500  0.857   0.860  0.944  0.419  0.844
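As a reminder of how the four statistics in Table 1 are defined, the following Python sketch computes them from confusion-matrix counts and classifier scores. The counts and scores are made up for illustration and are not taken from the study; which class is treated as "positive" for each outcome is not stated in the text and is left open here.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "acc": (tp + tn) / total,   # fraction of correct predictions
        "sens": tp / (tp + fn),     # true positive rate
        "spec": tn / (tn + fp),     # true negative rate
    }

def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case scores higher than a
    negative one, counting ties as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative (invented) values.
metrics = confusion_metrics(tp=40, fn=10, tn=30, fp=20)
area = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, round(area, 3))
```

The pairwise form of the AUC is quadratic in the number of cases, which is acceptable for a dataset of a few hundred patients; it equals the area under the ROC curve.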
Among similar tools available in the scientific literature, Liu et al. [15] propose
a system to assist doctors in collecting data, assessing risk, triaging, managing,
and following up on patients during the COVID-19 outbreak. The system uses
logistic regression to predict risk, obtaining an AUC of 0.71. Furthermore, McRae
et al. [18] developed an app that leverages models that use non-laboratory data to
help determine whether hospitalization is necessary (AUC = 0.79) and that predicts
the probability of mortality using biomarker measurements (AUC = 0.95).
Our work continues in the same direction: it adopted state-of-the-art models based
on routine data collected by the involved EDs, and developed a system supporting
doctors in defining the need for hospitalization, disease severity and mortality risk,
for patients accessing the ED.
3 C19DSS
In this section, we first describe the architecture of the overall system, and in particular
of the C19DSS app. Then, we present the usability results and the preliminary use
within our Institution.
26 P. Vittorini et al.
3.1 Architecture
[Figure: screenshots of the app. Panels: (a) Dashboard, (b) Patient list, (c) New data, (d) Edit data]
personal data is ever communicated over the network) to the classification endpoint
(the server follows the RESTful API paradigm) [20]. Then, the server uses R [19] to
apply the model selected by the request to the received data. Next, the server stores
the received data for further analyses (i.e., to evaluate the quality of the predictions
and potentially update the models), and finally returns the classification results to
the app.
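The request flow described above (select the model, store the data, return the classification) can be sketched as follows. This is a hedged illustration in Python, whereas the actual server applies the models in R; the JSON field names, the `classify` function, the in-memory list used as storage, and the constant-output model stubs are all hypothetical.

```python
import json

# Hypothetical stand-ins for the trained models: each maps features to a label.
# The label vocabularies follow the outcomes named in the Background section.
MODELS = {
    "safe_discharge": lambda features: "admit",
    "severity": lambda features: "mild",
    "mortality": lambda features: "no risk",
}

STORED_REQUESTS = []  # stands in for the server-side storage (assumption)

def classify(request_body: str) -> str:
    """Handle one classification request given as a JSON string."""
    request = json.loads(request_body)
    outcome = request["outcome"]
    if outcome not in MODELS:
        return json.dumps({"error": f"unknown outcome: {outcome}"})
    STORED_REQUESTS.append(request)               # kept for later analyses
    label = MODELS[outcome](request["features"])  # apply the selected model
    return json.dumps({"outcome": outcome, "classification": label})

# Example request; the feature names are invented and carry no personal data.
body = json.dumps({"outcome": "severity", "features": {"nlr": 4.0, "plr": 160.0}})
print(classify(body))
```

Keeping model selection behind a single endpoint means the app only needs to know the outcome name, not which model version produces the answer.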
To develop the app, we followed the UCD methodology, i.e., we involved the physicians
from the very first phases of the design, and then adapted and improved the design
and implementation through consecutive cycles of usability tests.
In the first phase, we discussed and defined the navigational structure and the app
user interface with three physicians, using mockups. After the system implementation,
the first usability test took place. The following three tasks were evaluated
with seven physicians: (i) data entry, (ii) classification and (iii) data editing. We
collected quantitative and qualitative measures based on the Single Ease Question
(SEQ) and through unstructured interviews [22]. At the first iteration, we measured
an average SEQ of 3.86/5, 3.71/5 and 4.00/5 for the three tasks, respectively, and we
collected a few issues and suggestions on how to improve the app. In response, we added
the automated calculation of the P/F, NLR and PLR values¹, implemented a clearer
visualization of the classification, and fixed a bug that blocked the classification.
At the second iteration, the average SEQs increased to 4.71/5, 4.43/5 and 4.71/5.
In summary, we increased the overall average ease of completing all tasks from
3.86/5 to 4.62/5, from the first to the second iteration.
¹ P/F (PaO2/FiO2) = Oxygenation Index, NLR = Neutrophil-to-Lymphocyte Ratio, PLR =
Platelet-to-Lymphocyte Ratio.
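The automated calculation of P/F, NLR and PLR mentioned above amounts to three ratios. The Python sketch below shows one way to compute them. The function names, the input validation, and the units are assumptions: PaO2 in mmHg, FiO2 as a fraction between 0 and 1, and both cell counts of a ratio expressed in the same unit. The app's actual implementation is not described in the text.

```python
def _ratio(numerator, denominator, name):
    """Divide, rejecting non-positive denominators with a clear message."""
    if denominator <= 0:
        raise ValueError(f"{name}: denominator must be positive")
    return numerator / denominator

def pf_ratio(pao2_mmhg, fio2_fraction):
    """P/F = PaO2 / FiO2 (FiO2 assumed to be a fraction, e.g. 0.21 for room air)."""
    return _ratio(pao2_mmhg, fio2_fraction, "P/F")

def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-Lymphocyte Ratio."""
    return _ratio(neutrophils, lymphocytes, "NLR")

def plr(platelets, lymphocytes):
    """Platelet-to-Lymphocyte Ratio."""
    return _ratio(platelets, lymphocytes, "PLR")

# Illustrative values only, not patient data.
print(round(pf_ratio(80, 0.21), 1), nlr(6.0, 1.5), plr(240, 1.5))
```

Computing these values in the app removes a manual step from data entry and avoids arithmetic slips, which is consistent with the higher ease-of-use scores reported at the second iteration.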
4 Discussion
The work presented in this paper builds on previous research aimed at devising
state-of-the-art ML models, fed by routine clinical and laboratory analyses, to be
used by physicians to manage safe discharge, severe disease (on the seventh day
after medical presentation) and mortality during hospitalization.
However, the models were devised from a cohort of unvaccinated patients, i.e., a
cohort not previously immunized against SARS-CoV-2, and therefore they should be
considered applicable to unvaccinated patients.
At the time of writing, available data suggest long-term vaccine effectiveness in
fully vaccinated healthy adults, but there is some uncertainty regarding waning
protection in patients who are not fully vaccinated or are immunocompromised. Some
evidence suggests that the risk of severe disease is higher in immunocompromised and
elderly patients [1, 13, 14]. On this basis, the app could also be useful for vulnerable
patients, in whom immunization seems to be less effective after a prolonged time.
In order to optimize the app performance for fully vaccinated patients as well, during
data entry we also save the vaccination status of every new patient. So far, this
information is not used by our models. However, once enough data have been collected,
we could devise new models that also consider the vaccination status. Moreover,
given the client/server architecture and given that the predictions are provided by the
server, physicians could use the new models with only a server upgrade, without any
change to the app and without affecting the user experience.
With specific regard to the C19DSS system, adopting the UCD methodology to design
and develop the app enabled us to gradually improve the user experience and collect
useful suggestions on how to improve the overall system. Finally, the physicians who
used the system reported that the application was easy and intuitive to use; the
process of data entry and classification did not hamper the normal ED work routine;
on the contrary, it helped them to organize the workflow of COVID-19 patients.
5 Conclusions
Presumably, in the near future, the SARS-CoV-2 pandemic will no longer be a global
emergency, but in the absence of an efficient global vaccination campaign, SARS-CoV-2
outbreaks could still be a threat to communities where healthcare resources are
limited and the immunization rate has not reached a protective level.
Our study highlighted how AI-powered tools could be a valid support for emergency
care. We do not suggest that mobile apps could replace the physician's bedside
decision process, but we believe that the interaction between emergency physicians
and AI tools could improve healthcare assistance and have a significant impact on
SARS-CoV-2 management.
References
1. Andrews N et al (2022) Duration of protection against mild and severe disease by COVID-19
vaccines. N Engl J Med 386(4):340–350. https://doi.org/10.1056/NEJMOA2115481
2. Bath PA (2008) Health informatics: current issues and challenges. J Inf Sci 34(4):501–518.
https://doi.org/10.1177/0165551508092267
3. Booth AL, Abels E, McCaffrey P (2021) Development of a prognostic model for mortality in
COVID-19 infection using machine learning. Mod Pathol 34(3):522–531.
https://doi.org/10.1038/S41379-020-00700-X
4. Breiman L (2001) Random forests. Mach Learn 45(1):5–32.
https://doi.org/10.1023/A:1010933404324
5. Breiman L, Friedman JH, Olshen RA, Stone CJ (2017) Classification and Regression Trees.
CRC Press, Boca Raton. https://doi.org/10.1201/9781315139470
6. Van Buuren S, Groothuis-Oudshoorn K (2011) MICE: multivariate imputation by chained
equations in R. J Stat Softw 45(3):1–67. https://doi.org/10.18637/JSS.V045.I03
7. Casano N et al (2022) Application of machine learning approach in Emergency Department
to support clinical decision making for SARS-CoV-2 infected patients. Submitted manuscript,
under review
8. Chamoso P, De La Prieta F, Eibenstein A, Santos-Santos D, Tizio A, Vittorini P (2017) A device
supporting the self management of tinnitus. In: Rojas I, Ortuño F (eds) IWBBIO 2017, vol
10209. LNCS. Springer, Cham, pp 399–410. https://doi.org/10.1007/978-3-319-56154-7_36
9. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the
22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp
785–794. ACM, New York, NY, USA. https://doi.org/10.1145/2939672