Information, Volume 14, Issue 3 (March 2023) – 61 articles

Cover Story: The emergence of COVID-19 generated a need for accurate, timely information related to its spread. We propose two methods for using Twitter to help model the spread of COVID-19: machine learning algorithms trained in five languages are used to identify symptomatic individuals, and the geo-location attached to each tweet is used to map where people have symptoms. We calibrated an epidemiological model and then evaluated the usefulness of the data when making predictions of deaths in 50 US states, 16 Latin American countries, 2 European countries, and 7 regions in the UK. Using such tweets from symptomatic individuals results in improved accuracy when predicting COVID-19 deaths. We also show that we can extract useful data describing movements between UK regions.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
11 pages, 448 KiB  
Article
Feature Selection Engineering for Credit Risk Assessment in Retail Banking
by Jaber Jemai and Anis Zarrad
Information 2023, 14(3), 200; https://doi.org/10.3390/info14030200 - 22 Mar 2023
Cited by 11 | Viewed by 5205
Abstract
In classification, feature selection engineering helps in choosing the most relevant data attributes to learn from. It determines the set of features to be rejected on the assumption that they contribute little to discriminating the labels. The effectiveness of a classifier depends mainly on the set of selected features. In this paper, we identify the best features to learn from in the context of credit risk assessment in the financial industry. Financial institutions incur the risk of approving the loan request of a customer who may default later, or of rejecting the request of a customer who could have repaid their debt without default. We propose a feature selection engineering approach to identify the main features to refer to when assessing the risk of a loan request. We use different feature selection methods, including univariate feature selection (UFS), recursive feature elimination (RFE), feature importance using decision trees (FIDT), and the information value (IV). We implement two variants of the XGBoost classifier on the open dataset provided by the Lending Club platform to evaluate and compare the performance of the different feature selection methods. The research shows that the most relevant features are found by the four feature selection techniques.
(This article belongs to the Special Issue Machine Learning: From Tech Trends to Business Impact)
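As a rough illustration of the pipeline the abstract describes (rank features with a selector, then train XGBoost on the retained subset), here is a minimal Python sketch; the synthetic data, the choice of k, and the hyperparameters are placeholders, not the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for the Lending Club data (2000 loans, 30 attributes).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

selectors = {
    "UFS": SelectKBest(f_classif, k=10),   # univariate feature selection
    "RFE": RFE(DecisionTreeClassifier(random_state=0),
               n_features_to_select=10),   # recursive feature elimination
}
for name, sel in selectors.items():
    sel.fit(X_tr, y_tr)
    clf = XGBClassifier(n_estimators=200, eval_metric="logloss")
    clf.fit(sel.transform(X_tr), y_tr)
    print(name, "test accuracy:", clf.score(sel.transform(X_te), y_te))
```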
Figure 1: ROC curves of the XGBoost models based on different feature selection techniques. (a) UFS-XGBoost. (b) FIDT-XGBoost. (c) RFE-XGBoost. (d) IV-XGBoost.
Figure 2: Word cloud of the selected features.
20 pages, 6434 KiB  
Article
Research on the R&D Strategies of Iron and Steel Enterprises Based on Semantic Topic Analysis of Patents
by Hongxia Wang, Ming Li, Zhiru Wang and Zitong Shan
Information 2023, 14(3), 199; https://doi.org/10.3390/info14030199 - 22 Mar 2023
Cited by 1 | Viewed by 2393
Abstract
R&D strategies play a decisive role in promoting enterprise innovation output and innovation ability. To thoroughly investigate the R&D strategies of iron and steel enterprises, an R&D strategy analysis framework based on R&D semantic topic analysis and outlier behavior detection was proposed. Empirical research on the R&D layout and direction, R&D quality, and achievement maintenance strategies of enterprises, from both macro and micro perspectives, was conducted, and the feasibility of the R&D strategy analysis framework was verified. The results show that, in terms of R&D topic layout strategy, most enterprises adopted a stable maintenance strategy after quickly completing the layout; regarding the R&D focus strategy, most enterprises focused on R&D fields and carried out strategic management; and for the R&D quality control strategy, some enterprises adopted a strategy of prolonging the duration of invention patents, and high-quality outputs with a long lifetime were developed rapidly. These results provide a reference for Chinese enterprises in adjusting their R&D strategies and for the government in formulating supporting policies.
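The patent-lifetime analysis behind Figures 7-9 rests on Kaplan-Meier survival estimation. A minimal sketch with the lifelines library follows; the durations and censoring flags are invented placeholders, not the paper's data.

```python
from lifelines import KaplanMeierFitter

# Synthetic patent lifetimes in years; 0 in `expired` marks a patent still in
# force (right-censored), as in the paper's KM analysis.
durations = [3, 5, 7, 8, 9, 10, 12, 15, 20, 20]
expired   = [1, 1, 1, 1, 1,  0,  1,  1,  0,  0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=expired, label="invention patents")
print(kmf.median_survival_time_)   # median patent lifetime
print(kmf.survival_function_)      # survival probability over time
# The authors additionally re-fit with unexpired patents treated as expired,
# which here would correspond to setting every entry of `expired` to 1.
```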
Figure 1: The research framework.
Figure 2: The topics of the patent texts and the corresponding perplexity values.
Figure 3: Number of patents and number of topics. Note: "invention patent applying" denotes an invention patent applied for but not yet granted; "invention patent granted" denotes a granted invention patent. The line refers to the number of patent topics.
Figure 4: Number of patents and topics during 1985–2022. Note: "invention patent applying" denotes an invention patent applied for but not yet granted; "invention patent granted" denotes a granted invention patent. The line refers to the number of patent topics.
Figure 5: Number of invention patents and grant rate during 1985–2022. Note: "invention patent applying" denotes an invention patent applied for but not yet granted; "invention patent granted" denotes a granted invention patent. The line refers to the grant rate of invention patents.
Figure 6: Time trend of the invention patent application volume of different topics.
Figure 7: Overall survival rate of the invention patents.
Figure 8: Lifetime of expired and unexpired invention patents. Note: The dark line represents the unexpired invention patents, and the light line represents the expired invention patents. In the calculation of the KM survival curve, censored data were removed from the denominator at each time point; if all data were censored, the plotted survival curve would be a horizontal line with a survival rate of 1. Treating the unexpired patents as censored data makes their distribution hard to read and easily misleading. Therefore, the unexpired patents were also treated as expired, i.e., all currently unexpired patents were assumed to have expired immediately, in order to view their distribution for comparative analysis.
Figure 9: KM curves of different themes over the years.
10 pages, 407 KiB  
Article
Fundamental Research Challenges for Distributed Computing Continuum Systems
by Victor Casamayor Pujol, Andrea Morichetta, Ilir Murturi, Praveen Kumar Donta and Schahram Dustdar
Information 2023, 14(3), 198; https://doi.org/10.3390/info14030198 - 22 Mar 2023
Cited by 23 | Viewed by 4605
Abstract
This article discusses four fundamental topics for future Distributed Computing Continuum Systems: their representation, model, lifelong learning, and business model. Further, it presents techniques and concepts that can be useful in defining these four topics specifically for Distributed Computing Continuum Systems. Finally, it presents a broad view of the synergies among the presented techniques that can enable the development of future Distributed Computing Continuum Systems.
(This article belongs to the Special Issue Best IDEAS: International Database Engineered Applications Symposium)
Figure 1: Starting at the center of the figure, we see that all developments require a trusted and secure environment, where Zero Trust techniques will have great relevance due to the heterogeneity and distribution characteristics of the system. On the left (A), three different abstractions for a Distributed Computing Continuum System are depicted (the granularity shown can be refined by showing network characteristics or unfolding aspects of the application logic). Each abstraction can easily engage one kind of stakeholder; however, the highest level, Resources-Cost-Quality, aims to be understood by all of them. On top (B) is the representation of the system through Markov Blankets, considering α as a high-level abstraction variable that can be assessed by observing its surrounding nodes, i.e., the factors that affect its value; thanks to the Markov Blanket approach, only the relevant ones are included. Further, α can be decomposed into other lower-level variables (β and γ), providing a nested structure that can cover the entire system. On the right (C), we take advantage of this representation to embed SLOs, more specifically a DeepSLO, in order to model the behavior of the system. As an example, SLO1 can relate to the quality of a machine-learning inference component, which has its own metrics and adaptation strategies. This is linked to two other lower-level SLOs: SLO2 can control the input data by making sure that it has the expected resolution, while SLO3 controls the time required to perform the inference. This decomposition enables a fine-grained capacity for adaptation; other lower-level or higher-level SLOs could be developed if a broader view of a larger component or more fine-grained control over a smaller one were needed. On the bottom (D), several cases are shown with forced values for metrics and/or elastic strategies. In this way, causal inference can improve the system's model to deal with future situations that are not yet known, providing a basis for handling unexpected or unforeseen scenarios.
24 pages, 1844 KiB  
Article
A Reference Architecture for Enabling Interoperability and Data Sovereignty in the Agricultural Data Space
by Rodrigo Falcão, Raghad Matar, Bernd Rauch, Frank Elberzhager and Matthias Koch
Information 2023, 14(3), 197; https://doi.org/10.3390/info14030197 - 21 Mar 2023
Cited by 7 | Viewed by 2893
Abstract
Agriculture is one of the major sectors of the global economy and also a software-intensive domain. The digital landscape of agriculture is composed of multiple digital ecosystems, which together constitute an agricultural domain ecosystem, also referred to as the "Agricultural Data Space" (ADS). Because the domain is so large, there are several sub-domains and specialized solutions, each of which poses challenges to interoperability. Additionally, farmers have increasing concerns about data sovereignty. In the context of the research project COGNAC, we elicited architecture drivers for interoperability and data sovereignty in agriculture and designed a reference architecture for a platform that aims to address these qualities in the ADS. In this paper, we present the solution concepts and design decisions that characterize the reference architecture. Early prototypes have been developed and made available to support the validation of the concept.
(This article belongs to the Special Issue Architecting Digital Information Ecosystems)
Figure 1: Overview of the elements of an ecosystem service [12]. Note that both the Service Asset Provider and the Service Asset Consumer are service consumers, since they consume the digital ecosystem asset brokering service.
Figure 2: Activities performed as part of the research method.
Figure 3: Architecture drivers (ADs), solution concepts (SCs), and design decisions (DDs).
Figure 4: Initial functional decomposition of the Twin-Hub.
Figure 5: Example deployment diagram of a federated network of Twin-Hubs (interfaces are omitted for the sake of simplicity).
Figure 6: Further functional decomposition of the Twin-Hub, now including the component Consent Manager.
Figure 7: Sequence diagram of a consent request. First, the farmer provides the service with the URL of their Twin-Hub instance; next, the service prepares and sends a consent request to the Twin-Hub; then the farmer sees the pending consent request on their Twin-Hub, extended by the list of fields to be included in the consent; finally, the farmer decides to which fields they want to grant the service access and confirms the grant, creating a consent.
Figure 8: Further functional decomposition of the Twin-Hub, now including the components Access Manager and Logger.
Figure 9: Sequence diagram of the logging process. First, a service requests the reading of the field boundaries of a certain field. The request reaches the Field Twin Manager through the Twin-Hub API. Next, the Field Twin Manager asks the Access Manager whether access has been granted to the requester. Then the Access Manager checks with the Consent Manager whether there is valid consent for the request. After that, if there is valid consent, the Access Manager logs the access. Finally, the Field Twin Manager retrieves the field boundaries from the corresponding digital field twin and returns this data to the requester.
Figure 10: Final functional decomposition of the Twin-Hub.
Figure 11: Example of the data exchange strategy for a READ operation. First, a service requests the reading of the field boundaries of a certain field. Next, the request reaches the Twin-Hub through its API and is forwarded to the Data Exchange Manager, which stores the data request. Then the Data Exchange Manager asks the Field Twin Manager for the data, which in turn retrieves the data from the storage. Finally, the Data Exchange Manager formats the field data and calls the requester back with a request to send the requested data.
Figure 12: Two different services accessing the same digital field twin.
Figure 13: Screenshots of the users' journey through consent management.
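The consent and logging flow described in Figures 7 and 9 boils down to: check for a valid consent, log the access, then return the data. The Python sketch below illustrates that sequence; all names and the in-memory stores are our own illustrative assumptions, not the project's code.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("twin-hub")

# Hypothetical in-memory stores; a real Twin-Hub would persist these.
consents = {("agri-service", "field-42"): True}
field_twins = {"field-42": {"boundaries": [(0, 0), (0, 9), (7, 9), (7, 0)]}}

def read_field_boundaries(service_id: str, field_id: str):
    # Access Manager: honor the request only if the farmer granted consent.
    if not consents.get((service_id, field_id), False):
        raise PermissionError("no valid consent for this request")
    # Logger: every granted access is recorded.
    log.info("access granted: %s read boundaries of %s", service_id, field_id)
    # Field Twin Manager: return the data from the digital field twin.
    return field_twins[field_id]["boundaries"]

print(read_field_boundaries("agri-service", "field-42"))
```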
29 pages, 6259 KiB  
Article
Human Factors in Leveraging Systems Science to Shape Public Policy for Obesity: A Usability Study
by Philippe J. Giabbanelli and Chirag X. Vesuvala
Information 2023, 14(3), 196; https://doi.org/10.3390/info14030196 - 20 Mar 2023
Cited by 8 | Viewed by 2684
Abstract
Background: despite a broad consensus on their importance, applications of systems thinking in policymaking and practice have been limited. This is partly caused by the longstanding practice of developing systems maps and software with the intention of supporting policymakers, but without knowing their needs and practices. Objective: we aim to ensure the effective use of systems mapping software by policymakers seeking to understand and manage the complex system around obesity and physical and mental well-being. Methods: we performed a usability study with eight policymakers in British Columbia based on a software tool (ActionableSystems) that supports interactions with a map of obesity. Our tasks examine different aspects of systems thinking (e.g., unintended consequences, loops) at several levels of mastery and cover common policymaking needs (identification, evaluation, understanding). Video recordings provided quantitative usability metrics (correctness, time to completion) individually and for the group, while pre- and post-usability interviews yielded qualitative data for thematic analysis. Results: users knew the many different factors that contribute to mental and physical well-being in obesity; however, most were only familiar with lower-level systems thinking concepts (e.g., interconnectedness) rather than higher-level ones (e.g., feedback loops). Most struggles happened at the lowest level of the mastery taxonomy, and predominantly on network representation. Although participants completed tasks on loops and multiple pathways mostly correctly, this came at the cost of significant time spent on these aspects. Results did not depend on the participant, as their experiences with the software were similar. The thematic analysis revealed that policymakers did not have a typical workflow and did not use any special software or tools in their policy work; hence, the integration of a new tool would heavily depend on individual practices. Conclusions: there is an important discrepancy between what constitutes systems thinking to policymakers and what parts of systems thinking are supported by software. Tools may be more successfully integrated when they include tutorials (e.g., case studies), facilitate access to evidence, and can be linked to a policymaker's portfolio.
(This article belongs to the Special Issue Data Science in Health Services)
Figure 1: Key components of this study along with corresponding subsections.
Figure 2: Common tasks in systems thinking with respect to maps include finding structures such as loops (a) and disjoint paths (b). For policymaking, loops can amplify effects or create inertia, while disjoint paths between an intervention and its measurement can include unintended side effects. The software studied here addresses both structures.
Figure 3: Visualization of the PHSA report [56] as a causal map, where each factor is shown as a node (circle). This map was extended into a version with 98 factors [20]. Sizes indicate centrality and colors indicate themes, which are automatically inferred from the network structure by community extraction algorithms. Through this high-level map, it is possible to see how the overall system is composed of interacting themes such as weight stigma (closely related to eating disorders), well-being, or nutrition and health benefits.
Figure 4: Exploring the causal map in the first version of ActionableSystems. Policymakers can view relationships between high-level themes (triangles), or detail them to reveal individual factors (circles). Green lines indicate a causal increase while red lines indicate a causal decrease.
Figure 5: Policymakers can choose a specific factor as an intervention target, which will be placed in the center. The factors directly and indirectly affected by the intervention are organized in concentric circles by ActionableSystems. Factors that belong to different themes are shown in different colors (e.g., eating-related factors are gray).
Figure 6: Policymakers are often interested in measuring the consequences of an intervention. In the first version of ActionableSystems, this is supported by choosing an intervention node (e.g., weight bias) and a measurement target (e.g., mental well-being). The intervention will be placed on the left, and all paths that lead to the target can be followed left-to-right in order to identify all parts of the system that would be impacted by the intervention. This visual can reveal unintended consequences, such as impacted sleep duration. Factors that belong to different themes are shown in different colors (e.g., psychological constructs are light pink while well-being constructs are white).
Figure 7: Correct (green) or incorrect (orange) answers for each user and task.
Figure 8: Time to complete each task for each user. Darker colors indicate a longer time to completion.
Figure 9: Average percentage of correct answers (top) and standard deviation (bottom) across all users.
Figure 10: Average time to completion (top) and standard deviation (bottom) across all users.
Figure 11: Relationship between the percentage of network representation questions that a user got correct and the time it took for users to answer the other questions. R-squared for this linear model is 0.017861.
Figure 12: Relationship between the time spent on network representation questions and the time it took for users to answer the other questions. R-squared for this linear model is 0.0081678.
23 pages, 32221 KiB  
Article
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
by Tilahun Yeshambel, Josiane Mothe and Yaregal Assabie
Information 2023, 14(3), 195; https://doi.org/10.3390/info14030195 - 20 Mar 2023
Cited by 8 | Viewed by 5290
Abstract
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations for natural language processing (NLP) and other tasks. Many NLP applications rely on pre-trained text representations, leading to the development of a number of neural network language models for various languages. However, this is not the case for Amharic, which is known to be a morphologically complex and under-resourced language. Usable pre-trained models for automatic Amharic text processing are not available. This paper presents an investigation into the essence of learned text representation for information retrieval and NLP tasks using word embeddings and BERT language models. We explored the most commonly used methods for word embeddings, including word2vec, GloVe, and fastText, as well as the BERT model. We investigated the performance of query expansion using word embeddings. We also analyzed the use of a pre-trained Amharic BERT model for masked language modeling, next sentence prediction, and text classification tasks. Amharic ad hoc information retrieval test collections that contain word-based, stem-based, and root-based text representations were used for evaluation. We conducted a detailed empirical analysis of the usability of word embeddings and BERT models on word-based, stem-based, and root-based corpora. Experimental results show that word-based query expansion and language modeling perform better than stem-based and root-based text representations, and that fastText outperforms the other word embeddings on the word-based corpus.
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)
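Query expansion with word embeddings, as evaluated in the paper, amounts to appending each query term's nearest neighbours in the embedding space. A minimal gensim sketch follows; the toy English corpus and all parameters are placeholders for the paper's Amharic collections.

```python
from gensim.models import FastText

# Toy English corpus as a stand-in for the Amharic collections.
corpus = [["credit", "risk", "loan"], ["loan", "default", "bank"],
          ["bank", "credit", "customer"]] * 100
model = FastText(sentences=corpus, vector_size=50, window=3,
                 min_count=1, epochs=5)

def expand_query(terms, k=2):
    """Append each term's k nearest embedding neighbours to the query."""
    expanded = list(terms)
    for term in terms:
        expanded += [w for w, _ in model.wv.most_similar(term, topn=k)]
    return expanded

print(expand_query(["loan", "bank"]))
```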
Figure 1: Representation of a document using word-based (top), stem-based (middle), and root-based (bottom) items.
Figure 2: Distribution of words in the corpus.
Figure 3: An example of masked word prediction.
Figure 4: Query expansion architecture using word embeddings for Amharic IR.
Figure 5: Training and evaluation of BERT word-based fine-tuned models for document classification based on relevance and subject.
Figure 6: Number of tokens in the document classification training and testing datasets.
13 pages, 2592 KiB  
Article
Efficient Dynamic Reconfigurable CNN Accelerator for Edge Intelligence Computing on FPGA
by Kaisheng Shi, Mingwei Wang, Xin Tan, Qianghua Li and Tao Lei
Information 2023, 14(3), 194; https://doi.org/10.3390/info14030194 - 20 Mar 2023
Cited by 6 | Viewed by 3775
Abstract
This paper proposes an efficient dynamic reconfigurable CNN accelerator (EDRCA) for FPGAs to tackle the issues of limited hardware resources and low energy efficiency in the deployment of convolutional neural networks on embedded edge computing devices. First, a configuration layer sequence optimization method is proposed to minimize the configuration time overhead and improve performance. Second, accelerator templates for dynamic regions are designed to create a unified high-speed interface and enhance operational performance. The dynamic reconfigurable technology is applied on the Xilinx KV260 FPGA platform to design the EDRCA accelerator, resolving the hardware resource constraints in traditional accelerator design. The YOLOV2-TINY object detection network is used to test the EDRCA accelerator on the Xilinx KV260 platform using floating point data. Results at 250 MHz show a computing performance of 75.1929 GOPS, peak power consumption of 5.25 W, and power efficiency of 13.6219 GOPS/W, indicating the potential of the EDRCA accelerator for edge intelligence computing.
(This article belongs to the Special Issue Artificial Intelligence on the Edge)
Figure 1: Concept of design for dynamic partial reconfiguration.
Figure 2: Overall architecture of the CNN accelerator.
Figure 3: The calculation method of a convolutional layer.
Figure 4: Top-level design interface specification of the single reconfigurable module group.
Figure 5: Xilinx KV260 FPGA hardware platform.
Figure 6: Implementation flow for hardware deployment of YOLOV2-Tiny.
Figure 7: Comparison of the EDRCA and a general accelerator.
Figure 8: Comparison of the original image and predicted results on Xilinx KV260 FPGA, AMD Ryzen7 CPU, and NVIDIA GeForce RTX2060. (a) Original image; (b) Xilinx KV260 FPGA predicted results; (c) AMD Ryzen7 CPU predicted results; (d) NVIDIA GeForce RTX2060 predicted results.
23 pages, 3431 KiB  
Article
The Effect of the Ransomware Dataset Age on the Detection Accuracy of Machine Learning Models
by Qussai M. Yaseen
Information 2023, 14(3), 193; https://doi.org/10.3390/info14030193 - 20 Mar 2023
Cited by 4 | Viewed by 2529
Abstract
Several supervised machine learning models have been proposed and used to detect Android ransomware. These models were trained using different datasets from different sources. However, the age of the ransomware datasets was not considered when training and testing these models. Therefore, the reported detection accuracy of those models is misleading, since they learned features of specific ransomware, either old or new, and did not learn diverse ransomware features from different ages. This paper sheds light on the importance of considering the age of ransomware datasets and its effect on the detection accuracy of supervised machine learning models. It shows that supervised machine learning models trained using a new ransomware dataset are inefficient in detecting old types of ransomware, and vice versa. Moreover, this paper collected a large and diverse dataset of ransomware applications that comprises new and old ransomware developed during the period 2008–2020. Furthermore, the paper proposes a supervised machine learning model that is trained and tested using the diverse dataset. The experiments show that the proposed model is efficient in detecting Android ransomware regardless of its age, achieving an accuracy of approximately 97.48%. Moreover, the results show that the proposed model outperforms the state-of-the-art approaches considered in this work.
(This article belongs to the Section Information Security and Privacy)
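The paper's central experiment, training on one era of ransomware and testing on another, can be mimicked with any sklearn classifier. The sketch below uses synthetic features with an artificial drift between "old" and "new" eras; it illustrates the protocol only, not the paper's dataset or model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_era(shift, n=1000):
    """Synthetic 'era' of samples; `shift` mimics feature drift over time."""
    X = rng.normal(loc=shift, size=(n, 20))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y

X_old, y_old = make_era(0.0)   # stand-in for 2008-era ransomware
X_new, y_new = make_era(1.5)   # stand-in for 2020-era ransomware
Xn_tr, Xn_te, yn_tr, yn_te = train_test_split(X_new, y_new, test_size=0.3,
                                              random_state=0)

# Trained on the old era only: degrades on new-era samples.
clf_old = RandomForestClassifier(random_state=0).fit(X_old, y_old)
print("old -> new accuracy:", accuracy_score(yn_te, clf_old.predict(Xn_te)))

# Trained on a mixed, age-diverse dataset: recovers most of the accuracy.
clf_mix = RandomForestClassifier(random_state=0).fit(
    np.vstack([X_old, Xn_tr]), np.hstack([y_old, yn_tr]))
print("mixed -> new accuracy:", accuracy_score(yn_te, clf_mix.predict(Xn_te)))
```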
Figure 1: Android ransomware attack.
Figure 2: Static and dynamic analysis features.
Figure 3: The methodology of the proposed work.
Figure 4: The methodology of creating the datasets.
Figure 5: The distribution of the training and testing datasets in each part of the experiments.
Figure 6: The performance of supervised machine learning algorithms trained on 100% of the old dataset and tested on 100% of the new dataset.
Figure 7: The performance of supervised machine learning algorithms trained on 70% of the old dataset and tested on 30% of the old dataset.
Figure 8: The performance of supervised machine learning algorithms trained on 100% of the new dataset and tested on 100% of the old dataset.
Figure 9: The performance of supervised machine learning algorithms trained on 70% of the new dataset and tested on 30% of the new dataset.
Figure 10: The performance of supervised machine learning algorithms trained on 70% of the mixed dataset and tested on 30% of the mixed dataset.
30 pages, 1105 KiB  
Review
A Systematic Literature Review on Human Ear Biometrics: Approaches, Algorithms, and Trend in the Last Decade
by Oyediran George Oyebiyi, Adebayo Abayomi-Alli, Oluwasefunmi ‘Tale Arogundade, Atika Qazi, Agbotiname Lucky Imoize and Joseph Bamidele Awotunde
Information 2023, 14(3), 192; https://doi.org/10.3390/info14030192 - 17 Mar 2023
Cited by 15 | Viewed by 4502
Abstract
Biometric technology is fast gaining pace as a veritable developmental tool. So far, biometric procedures have been predominantly used to ensure identity, and ear recognition techniques continue to provide very robust research prospects. This paper identifies and reviews present techniques for ear biometrics using certain parameters, such as machine learning methods and procedures, and provides directions for future research. Ten databases were accessed, including ACM, Wiley, IEEE, Springer, Emerald, Elsevier, Sage, MIT, Taylor & Francis, and Science Direct, and 1121 publications were retrieved. In order to obtain relevant materials, some articles were excluded using certain criteria such as abstract eligibility, duplicity, and uncertainty (indeterminate method). As a result, 73 papers were selected for in-depth assessment of their significance. A quantitative analysis was carried out on the identified works using search strategies: source, technique, datasets, status, and architecture. A quantitative analysis (QA) of feature extraction methods was carried out on the selected studies, with the geometric approach indicating the highest value at 36%, followed by the local method at 27%. Several architectures, such as convolutional neural networks, restricted Boltzmann machines, auto-encoders, deep belief networks, and other unspecified architectures, accounted for 38%, 28%, 21%, 5%, and 4%, respectively. Essentially, this survey also reports the status of the existing methods used in classifying the related studies. A taxonomy of the current methodologies of ear recognition systems is presented, along with a publicly available occlusion- and pose-sensitive black ear image dataset of 970 images. The study concludes with the need for researchers to consider improvements in the speed and security of available feature extraction algorithms.
(This article belongs to the Special Issue Digital Privacy and Security)
Figure 1: PRISMA flow chart for the search procedure.
Figure 2: Pose angles of the left and right ear images.
Figure 3: A taxonomy of state-of-the-art ear recognition methodology.
19 pages, 636 KiB  
Review
A Survey on Feature Selection Techniques Based on Filtering Methods for Cyber Attack Detection
by Yang Lyu, Yaokai Feng and Kouichi Sakurai
Information 2023, 14(3), 191; https://doi.org/10.3390/info14030191 - 17 Mar 2023
Cited by 28 | Viewed by 5204
Abstract
Cyber attack detection technology plays a vital role today, since cyber attacks have been causing great harm and loss to organizations and individuals. Feature selection is a necessary step for many cyber-attack detection systems, because it can reduce training costs, improve detection performance, and make the detection system lightweight. Many techniques related to feature selection for cyber attack detection have been proposed, and each technique has advantages and disadvantages. Determining which technique should be selected is a challenging problem for many researchers and system developers. Although there have been several survey papers on feature selection techniques in the field of cyber security, most of them try to be all-encompassing and are too general, making it difficult for readers to grasp a concrete and comprehensive picture of the methods. In this paper, we survey the filter-based feature selection technique in detail and comprehensively for the first time. The filter-based technique is one popular kind of feature selection technique and is widely used in both research and applications. In addition to general descriptions of this kind of method, we also explain in detail the search algorithms and relevance measures, which are two necessary technical elements commonly used in filter-based techniques.
(This article belongs to the Special Issue Advances in Computing, Communication & Security)
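A filter-based selector of the kind this survey covers ranks features with a relevance measure and keeps the top-k, independently of any downstream classifier. A minimal sketch using mutual information as the relevance measure (one of many the survey discusses) is given below on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for network-traffic features.
X, y = make_classification(n_samples=1000, n_features=25, n_informative=5,
                           random_state=0)

scores = mutual_info_classif(X, y, random_state=0)  # relevance measure
top_k = np.argsort(scores)[::-1][:5]                # keep the 5 best features
print("selected feature indices:", top_k)
X_reduced = X[:, top_k]   # lightweight input for any downstream detector
```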
Figure 1: The proposed NID model based on linear correlation.
24 pages, 2462 KiB  
Article
An Improved Co-Training and Generative Adversarial Network (Diff-CoGAN) for Semi-Supervised Medical Image Segmentation
by Guoqin Li, Nursuriati Jamil and Raseeda Hamzah
Information 2023, 14(3), 190; https://doi.org/10.3390/info14030190 - 17 Mar 2023
Cited by 1 | Viewed by 2385
Abstract
Semi-supervised learning is a technique that utilizes a limited set of labeled data and a large amount of unlabeled data to overcome the challenge of obtaining a perfect dataset in deep learning, especially in medical image segmentation. The accuracy of the predicted labels for the unlabeled data is a critical factor that affects the training performance, and thus the accuracy of segmentation. To address this issue, a semi-supervised learning method based on the Diff-CoGAN framework was proposed, which incorporates co-training and generative adversarial network (GAN) strategies. The proposed Diff-CoGAN framework employs two generators and one discriminator. The generators work together by providing mutual information guidance to produce predicted maps that are more accurate and closer to the ground truth. To further improve segmentation accuracy, the predicted maps are subjected to an intersection operation to identify a high-confidence region of interest, which reduces boundary segmentation errors. The predicted maps are then fed into the discriminator, and the iterative process of adversarial training enhances the generators' ability to generate more precise maps, while also improving the discriminator's ability to distinguish between the predicted maps and the ground truth. This study conducted experiments on the Hippocampus and Spleen images from the Medical Segmentation Decathlon (MSD) dataset using three semi-supervised methods: co-training, semi-GAN, and Diff-CoGAN. The experimental results demonstrated that the proposed Diff-CoGAN approach significantly enhanced segmentation accuracy compared to the other two methods by benefiting from the mutual guidance of the two generators and the adversarial training between the generators and discriminator. The introduction of the intersection operation prior to the discriminator also further reduced boundary segmentation errors.
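The high-confidence intersection operation is easy to state concretely: binarize both generators' predicted maps and keep only the pixels where they agree. A small PyTorch sketch follows; the random tensors stand in for real predicted maps.

```python
import torch

# Random stand-ins for the two generators' predicted maps (B, C, H, W).
pred1 = torch.rand(1, 1, 64, 64)   # from Generator 1
pred2 = torch.rand(1, 1, 64, 64)   # from Generator 2, already rotated back

mask1 = (pred1 > 0.5).float()
mask2 = (pred2 > 0.5).float()
high_confidence = mask1 * mask2    # 1 only where both generators agree
print("agreed foreground pixels:", int(high_confidence.sum()))
```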
Figure 1: The brief framework of Diff-CoGAN, comprising two generators and one discriminator.
Figure 2: Configurations of the Diff-CoGAN framework showing two generators and one discriminator. The input data comprise 2D labeled and unlabeled image datasets. The input to Generator 1 is the original dataset, while Generator 2 is fed the transformed T(X) dataset, where T(X) is a 180° rotation of the original dataset. The predicted maps generated by the two generators are intersected (notated by ∩) to form the input to the discriminator. Prior to intersection, the predicted map output by Generator 2 is inverted, notated by T⁻¹.
Figure 3: The network design of Generator 1 consists of an encoder and a decoder. The encoder extracts features using 4 dense blocks of convolution, batch normalization, and Leaky-ReLU. The feature maps are then upsampled by the decoder to produce the output.
Figure 4: The network design of Generator 2. In Generator 2, a transfer learning model is implemented as the encoder to extract features. The decoder performs four upsampling steps, followed by a series of convolutions to produce the output.
Figure 5: The network of the discriminator. The discriminator determines how close the predicted map is to the ground truth by using an evaluation score.
Figure 6: A sample of the Hippocampus data shown in axial, sagittal, and coronal plane views from left to right. The axial view was chosen in this paper.
Figure 7: A sample of the Spleen data shown in axial, sagittal, and coronal plane views from left to right. The axial view was chosen in this paper.
Figure 8: Examples of the segmentation results for the Hippocampus dataset. The white regions are the segmented results, and the gray portions are the ground truth.
18 pages, 2795 KiB  
Article
A Helium Speech Unscrambling Algorithm Based on Deep Learning
by Yonghong Chen and Shibing Zhang
Information 2023, 14(3), 189; https://doi.org/10.3390/info14030189 - 17 Mar 2023
Cited by 1 | Viewed by 2086
Abstract
Helium speech, the language spoken by divers in the deep sea who breathe a high-pressure helium–oxygen mixture, is almost unintelligible. To accurately unscramble helium speech, a neural network based on deep learning is proposed. First, an isolated helium speech corpus and a continuous helium speech corpus in a normal atmosphere are constructed, and an algorithm to automatically generate label files is proposed. Then, a convolution neural network (CNN), connectionist temporal classification (CTC) and a transformer are combined into a speech recognition network. Finally, an optimization algorithm is proposed to improve the recognition of continuous helium speech, which combines depth-wise separable convolution (DSC), a gated linear unit (GLU) and a feedforward neural network (FNN). The experimental results show that the accuracy of the algorithm, upon combining the CNN, CTC and the transformer, is 91.38%, and the optimization algorithm improves the accuracy of continuous helium speech recognition by 9.26%.
(This article belongs to the Special Issue Intelligent Information Processing for Sensors and IoT Communications)
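The optimization module combines depth-wise separable convolution with a gated linear unit. The PyTorch sketch below shows one plausible way to wire these two pieces together; the layer sizes and kernel choices are illustrative assumptions, not the paper's exact architecture, and the FNN stage is omitted.

```python
import torch
import torch.nn as nn

class GLUBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Depth-wise separable convolution: per-channel conv + 1x1 pointwise,
        # doubling the channels so the GLU can split them into value and gate.
        self.depthwise = nn.Conv1d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.pointwise = nn.Conv1d(channels, 2 * channels, kernel_size=1)

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.pointwise(self.depthwise(x))
        value, gate = h.chunk(2, dim=1)
        return value * torch.sigmoid(gate)   # GLU: value gated by sigmoid

block = GLUBlock(channels=8)
print(block(torch.randn(2, 8, 100)).shape)   # torch.Size([2, 8, 100])
```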
Figure 1: The helium speech recognition model.
Figure 2: The optimization algorithm.
Figure 3: Plots of the ReLU and Swish activation functions.
Figure 4: The GLU architecture.
Figure 5: Formant information of normal speech.
Figure 6: Formant information of helium speech.
Figure 7: The algorithm process when generating label files.
Figure 8: Comparison of the LSF performance of different algorithms.
17 pages, 611 KiB  
Article
Ontology-Driven Knowledge Sharing in Alzheimer’s Disease Research
by Sophia Lazarova, Dessislava Petrova-Antonova and Todor Kunchev
Information 2023, 14(3), 188; https://doi.org/10.3390/info14030188 - 16 Mar 2023
Cited by 2 | Viewed by 2280
Abstract
Alzheimer's disease is a debilitating neurodegenerative condition which is known to be the most common cause of dementia. Despite its rapidly growing prevalence, medicine still lacks a comprehensive definition of the disease. As a result, Alzheimer's disease remains neither preventable nor curable. In recent years, broad interdisciplinary collaborations in Alzheimer's disease research have become more common. Furthermore, such collaborations have already demonstrated their superiority in addressing the complexity of the disease in innovative ways. However, establishing effective communication and optimal knowledge distribution between researchers and specialists with different expertise and backgrounds is not a straightforward task. To address this challenge, we propose the Alzheimer's disease Ontology for Diagnosis and Preclinical Classification (AD-DPC) as a tool for effective knowledge sharing in interdisciplinary/multidisciplinary teams working on Alzheimer's disease. It covers six major conceptual groups, namely Alzheimer's disease pathology, Alzheimer's disease spectrum, Diagnostic process, Symptoms, Assessments, and Relevant clinical findings. All concepts were annotated with definitions or elucidations and in some cases enriched with synonyms and additional resources. The potential of AD-DPC to support non-medical experts is demonstrated through an evaluation of its usability, applicability, and correctness. The results show that participants in the evaluation process who lacked prior medical knowledge could successfully answer Alzheimer's disease-related questions by interacting with AD-DPC. Furthermore, their perceived level of knowledge in the field increased, leading to effective communication with medical experts.
(This article belongs to the Special Issue Semantic Interoperability and Knowledge Building)
Figure 1: Representation of the skeletal methodology used during the development of AD-DPC. The figure is based on the original methodology proposed by Uschold and Gruninger.
Figure 2: Visual representation of the key concepts and relations in AD-DPC. All rectangles represent concepts, and all arrows represent relations. The text under the concepts represents their definitions and descriptions in AD-DPC. Part (A) depicts the temporal sequence of the diagnostic process for MCI and AD. Part (B) shows the process of detecting pathological features of AD. Part (C) shows concepts that do not form rich taxonomies.
25 pages, 1760 KiB  
Review
A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning
by Evans Kotei and Ramkumar Thirunavukarasu
Information 2023, 14(3), 187; https://doi.org/10.3390/info14030187 - 16 Mar 2023
Cited by 34 | Viewed by 12485
Abstract
Transfer learning is a technique utilized in deep learning applications to transfer learned inference to a different target domain. The approach mainly addresses the problem of limited training data, which results in model overfitting and degrades model performance. The study was carried out on publications retrieved from various digital libraries such as SCOPUS, ScienceDirect, IEEE Xplore, ACM Digital Library, and Google Scholar, which formed the primary studies. Secondary studies were retrieved from the primary articles using the backward and forward snowballing approach. Based on set inclusion and exclusion parameters, relevant publications were selected for review. The study focused on transfer learning with pretrained NLP models based on the deep transformer network. BERT and GPT are the two elite pretrained models, trained to capture global and local representations from large unlabeled text datasets through self-supervised learning. Pretrained transformer models offer numerous advantages to natural language processing models, such as knowledge transfer to downstream tasks, which addresses the drawbacks associated with training a model from scratch. This review gives a comprehensive view of transformer architecture, self-supervised learning and pretraining concepts in language models, and their adaptation to downstream tasks. Finally, we present future directions for further improvement in pretrained transformer-based language models.
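Masked language modeling, the self-supervised objective discussed here, can be exercised directly with a pretrained BERT. A minimal sketch with the Hugging Face transformers pipeline follows (the model weights download on first run); the example sentence is ours.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# Print the three most likely completions for the masked token.
for cand in fill("Transfer learning moves [MASK] from one task to another.")[:3]:
    print(round(cand["score"], 3), cand["token_str"])
```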
Figure 1: Article retrieval and selection process based on the PRISMA reporting standard.
Figure 2: Selected article distribution.
Figure 3: Transformer model. (An input sequence is converted into a series of continuous representations by the encoder component of the transformer's architecture before being supplied to the decoder. The decoder combines the encoder's output with the decoder's output from the preceding time step to produce an output sequence.)
Figure 4: (A,B) Attention mechanism in the transformer network. (This is performed in the multi-head attention mechanism, which concurrently implements several single attention functions by masking the output of the scaled multiplication of the Q and K matrices. The multi-head self-attention is comparable to the encoder's first sublayer. This multi-head mechanism receives the keys and values from the encoder's output and the queries from the preceding decoder sublayer on the decoder side. The decoder then focuses on every word in the input sequence.)
Figure 5: Incessant pretraining process. (In this case, the pretraining task is progressively constructed, and the models are pretrained and fine-tuned to respond to different language understanding tasks.)
Figure 6: Pretrained model based on knowledge transfer.
Figure 7: Intermediate-task transfer learning and subsequent fine-tuning. (A pretrained model (BERT) is fine-tuned on an intermediate task; the model is then fine-tuned separately for each target and probing task. The target tasks are of great importance to NLP applications.)
19 pages, 3782 KiB  
Article
A Quick Prototype for Assessing OpenIE Knowledge Graph-Based Question-Answering Systems
by Giuseppina Di Paolo, Diego Rincon-Yanez and Sabrina Senatore
Information 2023, 14(3), 186; https://doi.org/10.3390/info14030186 - 16 Mar 2023
Cited by 5 | Viewed by 3331
Abstract
Due to the rapid growth of knowledge graphs (KG) as representational learning methods in recent years, question-answering approaches have received increasing attention from academia and industry. Question-answering systems use knowledge graphs to organize, navigate, search and connect knowledge entities. Managing such systems requires a thorough understanding of the underlying graph-oriented structures and, at the same time, an appropriate query language, such as SPARQL, to access relevant data. Natural language interfaces are needed to enable non-technical users to query ever more complex data. The paper proposes a question-answering approach to support end users in querying graph-oriented knowledge bases. The system pipeline is composed of two main modules: one is dedicated to translating a natural language query submitted by the user into a triple of the form <subject, predicate, object>, while the second module implements knowledge graph embedding (KGE) models, exploiting the previous module's triple and retrieving the answer to the question. Our framework delivers a fast OpenIE-based knowledge extraction system and a graph-based answer prediction model for question-answering tasks. The system was designed by leveraging existing tools to accomplish a simple prototype for fast experimentation, especially across different knowledge domains, with the added benefit of reducing development time and costs. The experimental results confirm the effectiveness of the proposed system, which provides promising performance, as assessed at the module level; in some cases, the system outperforms approaches in the literature. Finally, a use case example shows the KG generated from user questions in a graphical interface provided by an ad hoc web application.
(This article belongs to the Special Issue Knowledge Graph Technology and Its Applications)
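The KGE module's core check follows the TransE intuition that subject + relation ≈ object in embedding space, with a distance threshold deciding reliability. The sketch below spells that scoring rule out in NumPy; the embeddings and the threshold are random and arbitrary stand-ins, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for trained TransE embeddings.
emb = {name: rng.normal(size=50) for name in
       ("american_gigolo", "directed_by", "paul_schrader")}

def transe_distance(s, p, o):
    """TransE plausibility: small ||s + p - o|| means a reliable triple."""
    return np.linalg.norm(emb[s] + emb[p] - emb[o])

d = transe_distance("american_gigolo", "directed_by", "paul_schrader")
threshold = 10.0   # arbitrary illustrative cut-off
print("plausible" if d < threshold else "implausible", round(d, 2))
```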
Figure 1: A triple example with a property edge "release_year" and a relation edge "direct_by".
Figure 2: Overview of the proposed system: The first module, REBEL, takes a natural language question as input. The textual question is then processed by REBEL, which returns a <s,p,o> triple, whose object (o) represents one of the five possible question types. The extracted triple is given as input to the KGE module. The KGE model interprets relationships as translation operations on low-dimensional embeddings. Once the translated space is obtained, the model compares the distances between the subject + predicate and object embedding features to verify whether subject + relation = object. If the distance is lower than a threshold, the fact is considered reliable and true, and the triple has been completed successfully. The output given by TransE consists of the triple containing the correct answer.
Figure 3: Embedding space representation of a single triple (h, r, t).
Figure 4: Example of a triple from the original MovieQA dataset with the modified new single triples.
Figure 5: REBEL model validation score using 20% of the WikiMovies dataset; the scores show the evaluation of each triple component <subj, pred, obj> and an overall metric, in terms of recall, precision, and F1-score.
Figure 6: KGE training performance: comparison between the TransE, DistMult, and ComplEx models on the modified triples in the WikiMovies dataset, measured in terms of MRR and HITS@N.
Figure 7: KGE validation performance: validation values comparing the TransE, DistMult, and ComplEx models on the modified triples in the WikiMovies dataset, measured in terms of MRR and HITS@N.
Figure 8: The system at work: initially, the user can submit a question (1) by clicking on "Let's start with your question" on the web application GUI; then, the question "Who directed 'American Gigolò'?" is submitted (2), and the corresponding graph is shown (with the graph nodes colored) (3); finally, the answer is returned in the interface, as shown in (4).
15 pages, 2511 KiB  
Article
Aspect-Based Sentiment Analysis with Dependency Relation Weighted Graph Attention
by Tingyao Jiang, Zilong Wang, Ming Yang and Cheng Li
Information 2023, 14(3), 185; https://doi.org/10.3390/info14030185 - 16 Mar 2023
Cited by 10 | Viewed by 3173
Abstract
Aspect-based sentiment analysis is a fine-grained sentiment analysis that focuses on the sentiment polarity of different aspects of text, and most current research methods use a combination of dependent syntactic analysis and graphical neural networks. In this paper, a graph attention network aspect-based [...] Read more.
Aspect-based sentiment analysis is a fine-grained form of sentiment analysis that focuses on the sentiment polarity of different aspects of a text, and most current research methods combine dependency syntax analysis with graph neural networks. In this paper, a graph attention network model for aspect-based sentiment analysis based on the weighting of dependencies (WGAT) is designed to address the problem that traditional models do not sufficiently analyse the types of syntactic dependencies; in the proposed model, graph attention networks can compute weighted averages according to the importance of different nodes when aggregating information. The model first transforms the input text into low-dimensional word vectors through pretraining, while generating a dependency syntax graph by analysing the dependency syntax of the input text and constructing a dependency-weighted adjacency matrix according to the importance of different dependencies in the graph. The word vectors and the dependency-weighted adjacency matrix are then fed into a graph attention network for feature extraction, and sentiment polarity is predicted through the classification layer. The model can thus focus during training on the syntactic dependencies that matter most for sentiment classification. Comparison experiments on the Semeval-2014 laptop and restaurant datasets and the ACL-14 Twitter social comment dataset show that the WGAT model significantly improves accuracy and F1 scores over other baseline models, validating its effectiveness in aspect-level sentiment analysis tasks.
(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)
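The dependency-weighted adjacency matrix described above can be sketched in a few lines of Python; the sentence, dependency edges and relation weights below are hypothetical stand-ins for what a parser and the paper's weighting scheme would supply.

```python
import numpy as np

tokens = ["The", "battery", "life", "is", "great"]
# Hypothetical dependency edges (head index, dependent index, relation);
# a real system would obtain these from a dependency parser.
edges = [(2, 0, "det"), (2, 1, "compound"), (4, 2, "nsubj"), (4, 3, "cop")]

# Assumed relation weights: dependencies judged more useful for sentiment
# (e.g., nsubj linking an aspect to its opinion word) get larger weights.
REL_WEIGHT = {"nsubj": 1.0, "amod": 1.0, "compound": 0.7, "cop": 0.5, "det": 0.2}

n = len(tokens)
A = np.eye(n)                      # self-loops so each node attends to itself
for head, dep, rel in edges:
    w = REL_WEIGHT.get(rel, 0.3)   # default weight for unlisted relations
    A[head, dep] = A[dep, head] = w

print(A)  # weighted adjacency matrix fed to the graph attention layers
```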
Figure 1. Model structural diagram.
Figure 2. Dependency syntax example 1.
Figure 3. Dependency syntax example 2.
Figure 4. Attention calculation diagram.
Figure 5. Performance of the WGAT model on each dataset under different a and b.
Figure 6. Attention visualisation.
21 pages, 1524 KiB  
Article
A Survey on Compression Domain Image and Video Data Processing and Analysis Techniques
by Yuhang Dong and W. David Pan
Information 2023, 14(3), 184; https://doi.org/10.3390/info14030184 - 15 Mar 2023
Cited by 10 | Viewed by 4069
Abstract
A tremendous amount of image and video data are being generated and shared in our daily lives. Image and video data are typically stored and transmitted in compressed form in order to reduce storage space and transmission time. The processing and analysis of compressed image and video data can greatly reduce input data size and eliminate the need for decompression and recompression, thereby achieving significant savings in memory and computation time. There exists a body of research on compression domain data processing and analysis. This survey focuses on the work related to image and video data. The papers cited are categorized based on their target applications, including image and video resizing and retrieval, information hiding and watermark embedding, image and video enhancement and segmentation, object and motion detection, as well as pattern classification, among several other applications. Key methods used for these applications are explained and discussed. Comparisons are drawn among similar approaches. We then point out possible directions of further research.
(This article belongs to the Special Issue Knowledge Management and Digital Humanities)
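To see why compressed-domain processing can avoid decompression, consider resizing performed directly on DCT coefficients, the transform underlying JPEG (Figure 1 below) and MPEG. A minimal sketch using SciPy follows; the random 8×8 block stands in for real image data, and the truncation-based halving shown is a generic technique for illustration, not a specific method from the survey.

```python
import numpy as np
from scipy.fft import dctn, idctn

# An 8x8 image block as a decoder would hold it (random stand-in data).
rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float)

coeffs = dctn(block, norm="ortho")   # forward 2-D DCT, as in baseline JPEG

# Compressed-domain downscaling: keep only the top-left 4x4 low-frequency
# coefficients and invert them directly into a 4x4 block, halving the image
# in each dimension without ever reconstructing the full-resolution pixels.
small = idctn(coeffs[:4, :4], norm="ortho") / 2.0   # /2 rescales the energy

print(block.shape, "->", small.shape)   # (8, 8) -> (4, 4)
```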
Figure 1. Flow chart for baseline JPEG for an image. An input image goes through multiple operations, including the discrete cosine transform (DCT), quantization, differential coding, and variable-length coding. The output is a bitstream with headers and markers containing side information for the decoder.
Figure 2. Output for different image enhancement methods. The operations applied, from left to right, are: original image, histogram equalization, motion blur, deblurring with a Wiener filter, smoothing with a Gaussian filter, image sharpening, and edge detection with the Sobel operator.
Figure 3. Flowchart for image retrieval.
Figure 4. Flowchart for image retargeting.
Figure 5. Toy sample for image hiding.
Figure 6. Overview of the watermark embedding process.
Figure 7. Simplified flowchart for image classification.
Figure 8. Intra (I) and inter (B and P) frames in a video sequence.
Figure 9. Flowchart of MPEG-1 encoding.
Figure 10. Compositing two DCT-based video sequences in the DCT domain.
Figure 11. Watermark embedding in the spatial domain.
Figure 12. Conventional video transcoding flowchart.
Figure 13. Overview of the object tracking method.
Figure 14. Flowchart for salient motion detection.
Figure 15. Flowchart for video summarization.
11 pages, 2307 KiB  
Article
Liver CT Image Recognition Method Based on Capsule Network
by Qifan Wang, Aibin Chen and Yongfei Xue
Information 2023, 14(3), 183; https://doi.org/10.3390/info14030183 - 15 Mar 2023
Cited by 3 | Viewed by 2014
Abstract
The automatic recognition of CT (computed tomography) images of liver cancer is important for the diagnosis and treatment of early liver cancer. However, traditional convolutional neural networks suffer from problems such as a rigid, single model structure and the loss of information in pooling layers when recognizing liver cancer CT images. Therefore, this paper proposes an efficient method for liver CT image recognition based on the capsule network (CapsNet). First, the liver CT images are preprocessed; in the denoising step, the traditional non-local means (NLM) denoising algorithm is optimized with a superpixel segmentation algorithm to better preserve image edge information. CapsNet is then used to recognize the liver CT images. The experimental results show that the average recognition rate of liver CT images reaches 92.9% with CapsNet, which is 5.3% higher than the traditional CNN model, indicating that CapsNet has better recognition accuracy for liver CT images.
(This article belongs to the Special Issue Deep Learning for Human-Centric Computer Vision)
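Capsule networks replace scalar activations with vectors whose length encodes the probability that an entity is present, which is how CapsNet avoids the pooling-layer information loss mentioned above. Here is a minimal sketch of the standard squashing non-linearity from the original CapsNet formulation (not code from this paper):

```python
import numpy as np

def squash(s, eps=1e-9):
    # CapsNet non-linearity: scales a capsule's output vector so its length
    # lies in (0, 1) and can be read as the probability that the entity the
    # capsule represents is present, while preserving the vector's direction.
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

v = squash(np.array([3.0, 4.0]))   # input vector of length 5.0
print(v, np.linalg.norm(v))        # length 25/26 ≈ 0.96, same direction
```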
Figure 1. Central pixel located at the edge and inside of the superpixel.
Figure 2. Improved NLM algorithm flow chart.
Figure 3. (A) Partial liver CT pictures before denoising; (B) partial liver CT pictures after denoising.
Figure 4. Image recognition process of liver cancer by CapsNet.
Figure 5. The structure diagram of the capsule network.
Figure 6. Average recognition rate under different convolution kernel sizes.
11 pages, 1781 KiB  
Article
Deep Learning-Based Semantic Segmentation Methods for Pavement Cracks
by Yu Zhang, Xin Gao and Hanzhong Zhang
Information 2023, 14(3), 182; https://doi.org/10.3390/info14030182 - 15 Mar 2023
Cited by 2 | Viewed by 2337
Abstract
As road mileage continues to expand, the number of disasters caused by expanding pavement cracks is increasing. Two main methods, image processing and deep learning, are used to detect these cracks and to improve the efficiency and quality of pavement crack segmentation. The classical segmentation network, UNet, extracts target edge information poorly, struggles to segment small targets, and is susceptible to distracting objects in the environment, so it fails to segment the tiny cracks on the pavement well. To resolve this problem, we propose a U-shaped network, ALP-UNet, which adds an attention module to each encoding layer. In the decoding phase, we incorporate the Laplacian pyramid to make the feature maps contain more boundary information. We also propose adding a PAN auxiliary head to provide an additional loss for the backbone, improving the overall segmentation effect of the network. The experimental results show that the proposed method effectively reduces the interference of other pavement factors and improves the mIoU and mPA values compared to previous methods.
(This article belongs to the Special Issue Intelligent Manufacturing and Informatization)
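A Laplacian pyramid of the kind ALP-UNet draws on during decoding can be built in a few lines with OpenCV; the random stand-in image and the three-level depth below are assumptions for illustration, showing only the standard pyramid construction and not how the paper fuses it into the decoder.

```python
import numpy as np
import cv2

# Random stand-in for a grayscale pavement image.
img = np.random.rand(256, 256).astype(np.float32)

def laplacian_pyramid(img, levels=3):
    pyramid, cur = [], img
    for _ in range(levels):
        down = cv2.pyrDown(cur)                          # blur + downsample
        up = cv2.pyrUp(down, dstsize=cur.shape[::-1])    # upsample back
        pyramid.append(cur - up)                         # band-pass residual: edges
        cur = down
    pyramid.append(cur)                                  # low-frequency base
    return pyramid

for level in laplacian_pyramid(img):
    print(level.shape)   # (256, 256), (128, 128), (64, 64), (32, 32)
```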
Figure 1. AL-UNet combining CBAM and the Laplacian pyramid.
Figure 2. The ALP-UNet structure.
Figure 3. Results on the pavement segmentation dataset. (a) Input color images; (b) ground truth; (c) UNet results; (d) UperNet results; (e) ResUNet results; (f) PointRend results.
Figure 4. Results on the pavement segmentation dataset with different modules added. (a) Input color images; (b) ground truth; (c) PointRend results; (d) Att-UNet results; (e) AttWS-UNet results; (f) AL-UNet results; (g) ALP-UNet results.
17 pages, 3515 KiB  
Article
Smart Machine Health Prediction Based on Machine Learning in Industry Environment
by Sagar Yeruva, Jeshmitha Gunuganti, Sravani Kalva, Surender Reddy Salkuti and Seong-Cheol Kim
Information 2023, 14(3), 181; https://doi.org/10.3390/info14030181 - 14 Mar 2023
Cited by 2 | Viewed by 5528
Abstract
In an industrial setting, consistent production and machine maintenance can help any company become successful. Machine health checking is a method of observing the status of a machine to estimate mechanical wear and predict machine failure. The most often utilized traditional approaches, reactive and preventive maintenance, are unreliable and wasteful in terms of time and resource utilization. The use of system health management in conjunction with a predictive maintenance strategy allows maintenance to be scheduled so that device malfunction, and its repercussions, are avoided. IoT can help monitor equipment health and provide the best outcomes, especially in an industrial setting. Internet of Things (IoT) and machine learning models are quite successful in providing ongoing knowledge and comprehensive study on infrastructure performance. Our suggested technique uses a mobile application that anticipates the machine's health status with a classification method built on IoT and machine learning technologies, which can benefit the industrial environment by alerting the appropriate maintenance team before significant harm is inflicted on the system and normal operations are disrupted. A comparison of decision tree, XGBoost, SVM, and KNN performance has been carried out. According to our findings, XGBoost achieves higher classification accuracy than the other algorithms. As a result, this model is selected for creating a user-based application that allows the user to easily check the state of the machine's health.
(This article belongs to the Special Issue Health Data Information Retrieval)
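A minimal sketch of the kind of XGBoost health classifier the abstract describes; the sensor features (vibration, temperature, current draw) and the synthetic healthy/faulty labels are invented stand-ins for the authors' IoT data.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for IoT sensor readings: columns are assumed to be
# vibration, temperature and current draw; label 1 = faulty, 0 = healthy.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```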
Figure 1. Block diagram of the system overview.
Figure 2. XGBoost model.
Figure 3. Decision-tree model.
Figure 4. SVM model visualization.
Figure 5. KNN model visualization.
Figure 6. Solution architecture.
Figure 7. Desired experimental setup.
Figure 8. Real-time experimental setup.
Figure 9. Confusion matrices.
Figure 10. Comparison of ROC curves.
Figure 11. User interface.
16 pages, 2129 KiB  
Article
Localization of False Data Injection Attack in Smart Grids Based on SSA-CNN
by Kelei Shen, Wenxu Yan, Hongyu Ni and Jie Chu
Information 2023, 14(3), 180; https://doi.org/10.3390/info14030180 - 14 Mar 2023
Cited by 9 | Viewed by 2399
Abstract
In recent years, smart grids have integrated information and communication technologies into power networks, which brings new network security issues. Among the existing cyberattacks, the false data injection attack (FDIA) compromises state estimation in smart grids by injecting false data into the meter measurements, which adversely affects the grid. Current studies on FDIAs mainly focus on detecting their existence; there are few studies on their localization. Most attack localization methods have difficulty locating the specific bus or line under attack quickly and accurately, have high computational complexity, and are difficult to apply to large power networks. Therefore, this paper proposes a localization method for FDIAs based on a convolutional neural network optimized with a sparrow search algorithm (SSA–CNN). Based on the physical meaning of the measurement vectors, the proposed method can precisely locate a specific bus or line with relatively low computational complexity. To address the difficulty of selecting the CNN hyperparameters, which degrades localization accuracy, an SSA is used to optimize them so that they are optimal when the model is used for localization. Finally, simulation experiments conducted on the IEEE14-bus and IEEE118-bus test systems show that the proposed method has high localization accuracy and largely reduces the false-alarm rate.
(This article belongs to the Special Issue Cyber Security in IoT)
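A short numeric sketch of why FDIAs are hard to catch with the classical bad-data residual test, which motivates learned detectors and locators such as SSA–CNN: an attack vector of the form a = Hc lies in the column space of the measurement matrix and leaves the residual unchanged. The matrices and rates below are toy values, not an IEEE test case.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(6, 3))          # measurement matrix (6 meters, 3 states)
x_true = np.array([1.0, 0.5, -0.2])
z = H @ x_true + rng.normal(scale=0.01, size=6)   # noisy measurements

def residual_norm(z, H):
    # Least-squares state estimate and residual norm ||z - H x_hat||,
    # the quantity a conventional bad-data detector thresholds.
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return np.linalg.norm(z - H @ x_hat)

c = np.array([0.3, -0.1, 0.2])       # attacker's chosen state shift
a = H @ c                            # structured false data: a = H c
print(residual_norm(z, H))           # small: clean measurements
print(residual_norm(z + a, H))       # identical: the attack passes the test
```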
Figure 1. Structure of the proposed CNN.
Figure 2. Structure of the localization method for FDIAs.
Figure 3. IEEE14-bus test system.
Figure 4. Adaptability values of the SSA for the IEEE14-bus test system.
Figure 5. Adaptability values of the SSA for the IEEE118-bus test system.
Figure 6. Accuracy and loss value curves of the IEEE14-bus test system: (a) accuracy value curves; (b) loss value curves.
Figure 7. Accuracy and loss value curves of the IEEE118-bus test system: (a) accuracy value curves; (b) loss value curves.
12 pages, 608 KiB  
Article
Adapting Off-the-Shelf Speech Recognition Systems for Novel Words
by Wiam Fadel, Toumi Bouchentouf, Pierre-André Buvet and Omar Bourja
Information 2023, 14(3), 179; https://doi.org/10.3390/info14030179 - 13 Mar 2023
Viewed by 2875
Abstract
Current speech recognition systems with fixed vocabularies have difficulty recognizing out-of-vocabulary words (OOVs) such as proper nouns and new words. This leads to misunderstandings or even failures in dialog systems. Ensuring effective speech recognition is crucial for the proper functioning of robot assistants. Non-native accents, new vocabulary, and aging voices can cause malfunctions in a speech recognition system; if this task is not executed correctly, the assistant robot will inevitably produce false or random responses. In this paper, we used a statistical approach based on distance algorithms to improve OOV correction. We developed a post-processing algorithm to be combined with a speech recognition model. To this end, we compared two distance algorithms: Damerau–Levenshtein and Levenshtein distance. We validated the performance of the two distance algorithms in conjunction with five off-the-shelf speech recognition models. Damerau–Levenshtein, as compared to the Levenshtein distance algorithm, succeeded in minimizing the Word Error Rate (WER) on the MoroccanFrench test set with five speech recognition systems, namely the VOSK API, Google API, Wav2vec2.0, SpeechBrain, and QuartzNet pre-trained models. Our post-processing method works regardless of the architecture of the speech recognizer, and its results on our MoroccanFrench test set outperformed the five chosen off-the-shelf speech recognizer systems.
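A minimal sketch of the post-processing idea: compute the (restricted) Damerau–Levenshtein distance, which adds adjacent transpositions to the Levenshtein operations, and snap a misrecognized word to the closest lexicon entry. The lexicon and hypothesis below are invented examples, not the MoroccanFrench test set.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    # Restricted Damerau-Levenshtein: edit distance counting insertions,
    # deletions, substitutions and adjacent transpositions.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i-1] == b[j-2] and a[i-2] == b[j-1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

# Hypothetical correction step: snap a misrecognized word to the closest
# entry in a domain lexicon of OOV words (e.g., Moroccan place names).
lexicon = ["Agadir", "Essaouira", "Ouarzazate"]
hypothesis = "Agdair"   # what the recognizer emitted
best = min(lexicon, key=lambda w: damerau_levenshtein(hypothesis.lower(), w.lower()))
print(best)  # Agadir: distance 1 via a single adjacent transposition
```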
Figure 1. Speech recognition post-processing.
23 pages, 4520 KiB  
Review
Quo Vadis Business Simulation Games in the 21st Century?
by Mirjana Pejić Bach, Tamara Ćurlin, Ana Marija Stjepić and Maja Meško
Information 2023, 14(3), 178; https://doi.org/10.3390/info14030178 - 13 Mar 2023
Cited by 7 | Viewed by 3990
Abstract
Business simulation games have become popular in higher education and business environments. The paper aims to identify the primary research trends and topics of business simulation games research using a systematic and automated literature review, distinguishing the motivation of the research (learning-driven and domain-driven), and to project the future development of business simulation games research based on these findings. First, papers that research business simulation games were extracted from Scopus. Second, the research timeline, main publication venues and citation trends were analysed. Third, the most frequent words, phrases, and topics were extracted using text mining. Results indicate that research on business simulation games has stagnated, with the most cited papers published in the 2000s. There is a balance between learning-driven and domain-driven research, while technology-driven research is scarce, indicating that the technology used for business simulation games is mature. We project that research on business simulation games needs to be directed toward new technologies that could improve communication with and among the users (virtual reality, augmented reality, simulation games) and technologies that could improve the reasoning and decision-making complexity in business simulation games (artificial intelligence).
(This article belongs to the Special Issue Systems Engineering and Knowledge Management)
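The phrase-extraction step of such an automated literature review can be sketched with scikit-learn; the three toy "abstracts" below stand in for the Scopus corpus the authors actually mined.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for Scopus abstracts (invented for illustration).
abstracts = [
    "business simulation games improve decision making in management education",
    "a business simulation game for supply chain management learning",
    "learning outcomes of simulation games in higher education",
]

# Count 1-3 word phrases, as an automated word/phrase-occurrence step might.
vec = CountVectorizer(ngram_range=(1, 3), stop_words="english")
counts = vec.fit_transform(abstracts).sum(axis=0).A1
top = sorted(zip(vec.get_feature_names_out(), counts), key=lambda t: -t[1])[:5]
print(top)   # most frequent words and phrases across the corpus
```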
Figure 1. Research questions. Source: Authors' work.
Figure 2. Stages of SLR. Source: Authors' work.
Figure 3. Stages of ALR. Source: Authors' work.
Figure 4. Number of papers per publication year (2000–2021). Source: Authors' work based on Scopus.
Figure 5. Papers' access type in the period 1973–1999 vs. 2000–2023. Source: Authors' work based on Scopus.
Figure 6. The research papers' countries. Source: Authors' work based on Scopus.
Figure 7. Number of papers and citations from 2000 to 2023. Source: Authors' work based on Scopus.
Figure 8. Scatter plot of the number of papers and citations from 2000 to 2023. Source: Authors' work based on Scopus.
Figure 9. Word cloud of word occurrences. Source: Authors' work based on Scopus.
Figure 10. Word cloud of phrase occurrences (10+ occurrences). Source: Authors' work based on Scopus.
Figure 11. Bubble plot of the publications' total frequencies. Source: Authors' work based on Scopus.
Figure 12. Cluster results of the phrases. Source: Authors' work based on Scopus.
Figure 13. Mapping of clusters. Source: Authors' work based on Scopus.
Figure 14. Business simulation games research perspectives. Source: Authors' work.
13 pages, 2072 KiB  
Article
Construction of a Human Resource Sharing System Based on Blockchain Technology
by Guoyao Zhu, Zhenyu Gu and Yonghui Dai
Information 2023, 14(3), 177; https://doi.org/10.3390/info14030177 - 11 Mar 2023
Cited by 1 | Viewed by 2460
Abstract
Human resource data sharing is very important for the cooperation of human resource institutions. Since a human resource data-sharing service needs to take into account the needs of both individuals and talent service institutions, it faces issues such as the security of information sharing and the traceability of information. Therefore, this paper constructs a human resource data-sharing service system based on blockchain technology. Its trust mechanism is based on the Fabric alliance chain, and the system makes full use of its advantages of decentralization and consensus. Our research covers data sharing architecture design, consensus mechanism analysis, smart contract design, the data sharing process, and blockchain construction. The contribution of this paper is twofold. On the one hand, it explores the trust mechanism of human resource data sharing and gives a scheme based on the Fabric alliance chain. On the other hand, it presents the overall architecture and smart contract design for constructing the blockchain, providing a reference for future research on human resource data sharing.
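Traceability on the ledger rests on Merkle trees like the one in Figure 1 below; here is a minimal Python sketch of computing a Merkle root over hypothetical HR records (the records and the odd-leaf duplication convention are illustrative choices, not details from the paper).

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    # Hash the leaves, then pairwise-hash each level until one root remains;
    # an odd node is promoted by duplication, one common convention.
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical HR records shared on the ledger; tampering with any one
# record changes the root, which is what makes shared data traceable.
records = [b"resume:alice", b"certificate:bob", b"evaluation:carol"]
print(merkle_root(records).hex())
```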
Figure 1. Sample of a Merkle tree in blockchain connection.
Figure 2. Data sharing architecture based on the Fabric alliance chain.
Figure 3. Fabric consensus mechanism.
Figure 4. Sequence diagram of the data sharing process.
Figure 5. Sample of the construction of the blockchain.
24 pages, 1639 KiB  
Article
Ontology Learning Applications of Knowledge Base Construction for Microelectronic Systems Information
by Frank Wawrzik, Khushnood Adil Rafique, Farin Rahman and Christoph Grimm
Information 2023, 14(3), 176; https://doi.org/10.3390/info14030176 - 9 Mar 2023
Cited by 5 | Viewed by 2978
Abstract
Knowledge base construction (KBC) using AI has been one of the key goals of this highly popular technology since its emergence, as it helps machines comprehend the entities and relations around us. Knowledge base construction can summarize a piece of text in a machine-processable and understandable way, which can prove valuable and assistive to knowledge engineers. In this paper, we present the application of natural language processing to the construction of knowledge bases. We demonstrate how a trained bidirectional long short-term memory (bi-LSTM) neural network model can be used to construct knowledge bases in accordance with the exact ISO 26262 definitions as defined in the GENIAL! Basic Ontology. We provide the system with an electronic text document from the microelectronics domain, and the system attempts to create a knowledge base from the available information in textual format. This information is then expressed in the form of graphs when queried by the user. This method of information retrieval presents the user with a much more technical and comprehensive understanding of an expert piece of text. This is achieved by applying named entity recognition (NER) for knowledge extraction. This paper reports the current status of our knowledge construction process and knowledge base content, and describes our challenges and experiences.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
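A minimal Keras sketch of a bi-LSTM token tagger of the kind such an NER pipeline trains; the vocabulary size, tag count, sequence length and layer dimensions are placeholders, not the paper's configuration.

```python
import tensorflow as tf

# Placeholder sizes (assumptions for illustration).
VOCAB, TAGS, MAXLEN, DIM = 5000, 7, 64, 128

inputs = tf.keras.Input(shape=(MAXLEN,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB, DIM, mask_zero=True)(inputs)
# Bidirectional LSTM reads each sentence left-to-right and right-to-left,
# so every token's representation sees context from both directions.
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(DIM, return_sequences=True))(x)
# Per-token softmax over the entity tag set.
outputs = tf.keras.layers.Dense(TAGS, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```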
Figure 1. Knowledge-base construction pipeline.
Figure 2. Electrical system and application.
Figure 3. Reasoner output with an NLP-generated sample relationship between computer and RAM.
Figure 4. TBox reasoning of the digital filter system [3].
Figure 5. Distribution of class examples.
Figure 6. Architecture of the applied bi-directional LSTM model.
Figure 7. The final output with a sample context of ten sentences.
Figure 8. Knowledge graph generated with tokens from the test predictions presented in Table 6.
Figure 9. Knowledge graph generated on our test data using a transformer neural network.
18 pages, 3214 KiB  
Article
QoS-Aware Resource Management in 5G and 6G Cloud-Based Architectures with Priorities
by Spiros (Spyridon) Louvros, Michael Paraskevas and Theofilos Chrysikos
Information 2023, 14(3), 175; https://doi.org/10.3390/info14030175 - 9 Mar 2023
Cited by 10 | Viewed by 3280
Abstract
Fifth-generation and, more importantly, the forthcoming sixth-generation networks take special care over latency and are designed to support low-latency applications, including a highly flexible New Radio (NR) interface that can be configured to utilize different subcarrier spacings (SCS), slot durations, special optional scheduling features (mini-slot scheduling), cloud- and virtualization-based transport network infrastructures including slicing, and intelligent radio and transport packet retransmission mechanisms. QoS analysis, with emphasis on determining the average waiting time of transmitted packets, is therefore crucial for both network performance and user applications. The preferred implementations for optimizing the transmission network rely on cloud architectures with a star network topology. In this paper, as part of our original and innovative contribution, a two-stage queue model is proposed and analytically investigated. First, a two-dimensional queue is proposed to estimate the expected delay from priority scheduling decisions over the IP/Ethernet MAC layer for IP packet transmissions between the gNB and the core network. Second, a one-dimensional queue is proposed to estimate the average packet waiting time in the RLC radio buffer before scheduling, mainly due to excessive traffic load and the designed transmission bandwidth availability.
(This article belongs to the Special Issue 5G Networks and Wireless Communication Systems)
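For intuition about the waiting-time quantities analysed here: the paper's model is a two-stage priority queue, but the shape of such results is easiest to see on a plain M/M/1 queue, whose mean wait in queue is W_q = ρ/(μ − λ) with ρ = λ/μ. A sketch with assumed arrival and service rates, offered only as a baseline illustration and not as the paper's model:

```python
def mm1_waiting_time(lam: float, mu: float) -> float:
    # Mean time a packet waits in queue (excluding service) for an M/M/1
    # queue with arrival rate lam and service rate mu, both in packets/s.
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must stay below service rate")
    rho = lam / mu                 # utilisation
    return rho / (mu - lam)

# Assumed example rates: 8000 packets/s arriving at a 10000 packets/s server.
print(f"mean queueing delay: {mm1_waiting_time(8_000, 10_000) * 1e3:.2f} ms")  # 0.40 ms
```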
Figure 1. gNodeB star topology implementation in the 5G/6G cloud-based transmission network.
Figure 2. Two-dimensional Markov chain for mixed traffic services with a packet queue.
Figure 3. The state diagram with queue used to calculate T_{yout/(i,j)}.
Figure 4. First case of transition from states (0,3), (1,3), (1,2) or (2,2) into state (0,2).
Figure 5. Second case of transition from states (1,3), (2,3), (1,2), (2,2) or (2,1) into state (1,1).
Figure 6. Third case of transition from states (2,2) or (2,1) into state (2,0).
Figure 7. The state diagram (two-dimensional Markov chain) for GSM/GPRS traffic without queue.
13 pages, 1703 KiB  
Article
An Attention-Based Deep Convolutional Neural Network for Brain Tumor and Disorder Classification and Grading in Magnetic Resonance Imaging
by Ioannis D. Apostolopoulos, Sokratis Aznaouridis and Mpesi Tzani
Information 2023, 14(3), 174; https://doi.org/10.3390/info14030174 - 9 Mar 2023
Cited by 13 | Viewed by 3588
Abstract
This study proposes the integration of attention modules, feature-fusion blocks, and baseline convolutional neural networks for developing a robust multi-path network that leverages its multiple feature-extraction blocks for non-hierarchical mining of important medical image-related features. The network is evaluated using 10-fold cross-validation on large-scale magnetic resonance imaging datasets involving brain tumor classification, brain disorder classification, and dementia grading tasks. The Attention Feature Fusion VGG19 (AFF-VGG19) network demonstrates superiority against state-of-the-art networks and attains an accuracy of 0.9353 in distinguishing between three brain tumor classes, an accuracy of 0.9565 in distinguishing between Alzheimer's and Parkinson's diseases, and an accuracy of 0.9497 in grading cases of dementia.
(This article belongs to the Special Issue Artificial Intelligence and Big Data Applications)
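To make concrete what an "attention module" in a multi-path CNN looks like, here is a sketch of a generic squeeze-and-excitation-style channel attention block in Keras; this is a standard formulation offered for illustration, not necessarily the exact block used in AFF-VGG19.

```python
import tensorflow as tf

def channel_attention(x, reduction=16):
    # Squeeze-and-excitation-style channel attention: pool each channel to a
    # scalar descriptor, run it through a bottleneck MLP, then rescale the
    # input feature map channel-wise with the resulting sigmoid gates.
    channels = x.shape[-1]
    s = tf.keras.layers.GlobalAveragePooling2D()(x)
    s = tf.keras.layers.Dense(channels // reduction, activation="relu")(s)
    s = tf.keras.layers.Dense(channels, activation="sigmoid")(s)
    s = tf.keras.layers.Reshape((1, 1, channels))(s)
    return x * s   # broadcasted channel-wise rescaling

inputs = tf.keras.Input(shape=(56, 56, 64))
model = tf.keras.Model(inputs, channel_attention(inputs))
print(model.output_shape)  # (None, 56, 56, 64)
```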
Figure 1. Attention Feature-Fusion VGG19 network.
Figure 2. Training–validation accuracy/losses and ROC curve of AFF-VGG19.
60 pages, 1718 KiB  
Article
Quickening Data-Aware Conformance Checking through Temporal Algebras
by Giacomo Bergami, Samuel Appleby and Graham Morgan
Information 2023, 14(3), 173; https://doi.org/10.3390/info14030173 - 8 Mar 2023
Cited by 6 | Viewed by 2847
Abstract
A temporal model describes processes as a sequence of observable events characterised by distinguishable actions in time. Conformance checking allows these models to determine whether any sequence of temporally ordered and fully-observable events complies with their prescriptions; this leads to Explainable and Trustworthy AI, as we can immediately assess flaws in the recorded behaviours while suggesting possible ways to amend the wrongdoings. Recent findings on conformance checking and temporal learning have generated interest in temporal models beyond the usual business process management community, including domains such as Cyber Security, Industry 4.0, and e-Health. As the current technologies for this are purely formal and not ready for real-world workloads returning large data volumes, the need to improve existing conformance checking and temporal model mining algorithms, so as to make Explainable and Trustworthy AI more efficient and competitive, is increasingly pressing. To meet such demands, this paper offers KnoBAB, a novel business process management system for efficient conformance checking computations performed on top of a customised relational model. This architecture was implemented from scratch following common practices in the design of relational database management systems. After defining our proposed temporal algebra for temporal queries (xtLTLf), we show that it can express existing temporal languages over finite and non-empty traces, such as LTLf. This paper also proposes a parallelisation strategy for such queries, reducing conformance checking to an embarrassingly parallel problem and leading to super-linear speed-up. We also present how a single xtLTLf operator (or even an entire sub-expression) might be efficiently implemented via different algorithms, paving the way for future algorithmic improvements. Finally, our benchmarks highlight that our implementation of xtLTLf (KnoBAB) outperforms state-of-the-art conformance checking software running on LTLf logic.
(This article belongs to the Special Issue Best IDEAS: International Database Engineered Applications Symposium)
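A minimal Python sketch of the finite-trace LTL (LTLf) "until" semantics that such algebras evaluate: φ U ψ holds when ψ eventually holds and φ holds at every step before that. The event names echo the cyber-security scenario of Figure 3 below; the implementation is a naive linear scan over one trace, not KnoBAB's relational algorithm.

```python
def holds_until(trace, phi, psi, i=0):
    # A trace is a list of sets of atomic propositions; phi and psi are
    # predicates over a single event. Returns whether (phi U psi) holds at i.
    for j in range(i, len(trace)):
        if psi(trace[j]):
            return True          # psi reached, with phi holding all the way
        if not phi(trace[j]):
            return False         # phi broke before psi ever held
    return False                 # the finite trace ended before psi held

trace = [{"scan"}, {"scan"}, {"exploit"}, {"exfiltrate"}]
print(holds_until(trace, lambda e: "scan" in e, lambda e: "exploit" in e))  # True
print(holds_until(trace, lambda e: "scan" in e, lambda e: "patch" in e))    # False
```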
Figure 1. Table of contents.
Figure 2. KnoBAB architecture for breast cancer patients. Each trace ➀–➂ represents a single patient's clinical history, shown with unique colouring, while each Declare clause Ⓐ–Ⓒ prescribes a temporal condition that such traces shall satisfy. The atomisation process does not consider data distribution but rather partitions the data space as described by the data activation and target conditions. In the query plan, green arrows indicate access to shared sub-queries as in [9], and thick red ellipses indicate which operators are untimed.
Figure 3. A cyber-security scenario expressed by considering (a) possible situations in a Cyber Kill Chain, which are then (b) represented in the activity labels' names associated with the events.
Figure 4. Two exemplifying clauses distinguishing Response and Precedence behaviours. Traces are represented as temporally ordered events associated with activity labels (boxed). Activation (or target) conditions are circled (or ticked/crossed). Ticks (or crosses) indicate a (un)successful match of a target condition. For all activations, there must be an un-failing target condition; for precedence, we shall consider at most one activation. These conditions require the usage of multiple join tests per trace.
Figure 5. In-depth representation of the query plan associated with the model described in Example 15.
Figure 6. Assessing a high-level use case of an intrusion attack on a software system through a declarative model.
Figure 7. Results for the fast set operations (Section 6.1) against the traditional logical implementation.
Figure 8. Results for the custom declarative clause implementations (Section 6.2) against the traditional logical implementation.
Figure 9. Results for the Until operator (Section 6.3).
Figure 10. Results for the derived operators TimedAndFuture and TimedAndGlobally (Section 6.4). We include both variants of the fast implementations to analyse the environments where each thrives.
Figure 11. Results for relational temporal mining (Section 7.2).
Figure 12. Results for parallelisation (Section 7.3). ω indicates the set of threads in the thread pool, and the red dashed horizontal lines indicate running times for single-threaded instances.
Figure 13. Running times over different models (Table S1a) for different atomisation strategies.
Figure 14. Running times for data-aware conformance checking.
20 pages, 6151 KiB  
Article
Architecture-Oriented Agent-Based Simulations and Machine Learning Solution: The Case of Tsunami Emergency Analysis for Local Decision Makers
by Pavel Čech, Martin Mattoš, Viera Anderková, František Babič, Bilal Naji Alhasnawi, Vladimír Bureš, Milan Kořínek, Kamila Štekerová, Martina Husáková, Marek Zanker, Sunanda Manneela and Ioanna Triantafyllou
Information 2023, 14(3), 172; https://doi.org/10.3390/info14030172 - 8 Mar 2023
Cited by 4 | Viewed by 2524
Abstract
Tsunamis are a perilous natural phenomenon endangering growing coastal populations and tourists in many seaside resorts. Failures in responding to recent tsunami events stress the importance of further research on building a robust tsunami warning system, especially in the "last mile" component. The lack of detail, unification and standardisation in information processing and decision support hampers wider implementation of reusable information technology solutions among local authorities and officials. In this paper, the architecture of a tsunami emergency solution is introduced. The aim of the research is to present a tsunami emergency solution for local authorities and officials responsible for preparing tsunami response and evacuation plans. The solution is based on a combination of machine learning techniques and agent-based modelling, enabling analysis of both real and simulated datasets. It is designed and developed based on the principles of enterprise architecture development, and the data exploration follows established practices for data mining and big data analysis. The architecture of the solution is depicted using standardised notation and includes components that responsible local authorities can exploit to test various tsunami impact scenarios and prepare plans for appropriate response measures.
(This article belongs to the Special Issue Feature Papers in Information in 2023)
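A toy sketch of the agent-based evacuation idea: inhabitants head for their nearest shelter while a wave front advances, and the run counts who sheltered and who was struck. Positions, speeds and the one-dimensional wave model are all illustrative assumptions, far simpler than the simulation shown in Figure 10 below.

```python
import numpy as np

rng = np.random.default_rng(7)
agents = rng.uniform(0, 1000, size=(50, 2))          # inhabitants (x, y) in metres
shelters = np.array([[100.0, 900.0], [800.0, 850.0]])
SPEED, STEPS, WAVE_SPEED = 1.5, 600, 3.0             # metres per 1-second tick

wave_x = 1000.0                                      # wave front sweeping leftwards
safe = np.zeros(len(agents), dtype=bool)
struck = np.zeros(len(agents), dtype=bool)
for _ in range(STEPS):
    wave_x -= WAVE_SPEED
    active = ~safe & ~struck
    # Each active agent steps toward its nearest shelter.
    d = np.linalg.norm(agents[:, None, :] - shelters[None, :, :], axis=2)
    target = shelters[d.argmin(axis=1)]
    step = target - agents
    dist = np.linalg.norm(step, axis=1, keepdims=True)
    agents[active] += SPEED * (step / np.maximum(dist, 1e-9))[active]
    safe |= (dist[:, 0] < SPEED) & active            # reached a shelter
    struck |= (agents[:, 0] > wave_x) & ~safe        # overtaken by the wave

print(f"sheltered: {safe.sum()}, struck: {struck.sum()}")
```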
Figure 1. Solution architecture using the ArchiMate specification. Blue denotes application layer elements; green denotes technology layer elements.
Figure 2. Map part of the tsunami emergency dashboard. The map is overlaid with contours, colour-coded from red through yellow to green. Red represents areas with the lowest elevation and highest tsunami risk; green marks places with the highest elevation and lowest tsunami risk. The location depicted is the Manly suburb in Sydney, Australia.
Figure 3. The tsunami wave parameter setting screen used to estimate the impact of the tsunami. The user can use the sliders to change the values calculated by the CFD cloud component, such as speed and wave height. In reference to the tsunami origin, the application recalculates the impact time and time-to-impact fields. The number of endangered people is also recalculated based on the location definitions and the wave height.
Figure 4. The screen of the tsunami emergency assistant. The upper orange part displays the notification received from the tsunami emergency dashboard. The grey bar contains information about the user's status. The map can be used to navigate to the closest available safe location. The large red location icon shows the current user position; the green location icon marks a safe location. The shortest path from the current user position to the safe location is marked with the blue line.
Figure 5. File uploading and data preview (data understanding).
Figure 6. Data visualisation with a map: tsunami events on the map (data understanding).
Figure 7. Data normalisation, splitting and visualisation (data preparation).
Figure 8. Algorithm/parameter settings and interpretability of a model (modelling and evaluation).
Figure 9. The main screen of the data exploration component.
Figure 10. Visualisation of an evacuation simulation of the Manly suburb in Sydney, Australia (see Figure 2 for the map of the area). Blue represents the tsunami wave propagating approximately from right to left. White lines are roads taken from the OpenStreetMap data; yellow marks buildings, also based on OpenStreetMap data. White points are intersections or places with differing elevations, and the agents move along these lines. The coloured lines are elevation contours. Red crosses represent inhabitants struck by the tsunami. The charts on the right show the numbers of inhabitants successfully finding shelter and struck by the tsunami, respectively.
Figure 11. Simulated evacuation process overview using BPMN notation.
20 pages, 1704 KiB  
Article
Scams and Solutions in Cryptocurrencies—A Survey Analyzing Existing Machine Learning Models
by Lakshmi Priya Krishnan, Iman Vakilinia, Sandeep Reddivari and Sanjay Ahuja
Information 2023, 14(3), 171; https://doi.org/10.3390/info14030171 - 8 Mar 2023
Cited by 11 | Viewed by 5209
Abstract
With the emergence of cryptocurrencies and Blockchain technology, the financial sector is turning its gaze toward this latest wave. The use of cryptocurrencies is becoming very common for multiple services: food chains, network service providers, tech companies, grocery stores, and many other services accept cryptocurrency as a mode of payment and give incentives to people who pay with it. Despite this tremendous success, cryptocurrencies have opened the door to fraudulent activities such as Ponzi schemes, HYIPs (high-yield investment programs), money laundering, and much more, which have led to the loss of many millions of dollars. Over the past decade, solutions using several machine learning algorithms have been proposed to detect these felonious activities. The objective of this paper is to survey these models, the datasets used, and the underlying technology. This study identifies the most efficient models, evaluates their performance, and compiles the extracted features, which can serve as a benchmark for future research. Fraudulent activities and their characteristics are exposed in this survey. We identify gaps in the existing models and propose improvement ideas that could detect scams early.
(This article belongs to the Special Issue Machine Learning for the Blockchain)
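A minimal sketch of the account-feature approach many of the surveyed Ponzi detectors share: derive per-account statistics from transaction histories and fit a classifier. The features, values and labels below are synthetic illustrations, not data from the survey.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical account-level features of the kind extracted from transaction
# histories (synthetic values): columns are assumed to be
# [n_incoming_tx, n_outgoing_tx, paid_to_investors_ratio, contract_lifetime_days].
X = np.array([
    [500, 40, 0.15, 30],    # many deposits, few payouts, dies young -> Ponzi-like
    [520, 35, 0.10, 25],
    [60, 58, 0.95, 900],    # balanced flows, long-lived -> legitimate-like
    [45, 47, 0.90, 1200],
])
y = np.array([1, 1, 0, 0])  # 1 = Ponzi scheme, 0 = legitimate

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[480, 30, 0.12, 40]]))   # -> [1], flagged as Ponzi-like
```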
Figure 1. F1 score comparison of Ponzi schemes with account features.
Figure 2. F1 score comparison of Ponzi schemes with code features.
Figure 3. F1 score comparison of Ponzi schemes with account and code features.
Figure 4. F1 score comparison of money laundering.
Figure 5. F1 score comparison of phishing scams.
Figure 6. Accuracy comparison of fake wallets and accounts.