
Advances in Image Processing and Computer Vision Based on Machine Learning

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 11645

Special Issue Editor


Dr. Francesco Beritelli
Guest Editor
Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, 95125 Catania, Italy
Interests: audio signal processing; biometrics; IoT; drone/UAV communications; rainfall estimation and monitoring; post-earthquake geolocation; image processing; computer vision; machine learning-based applications

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to recent advances in image processing and computer vision. Much of the recent explosion of developments and application areas is due to the powerful capabilities of machine learning algorithms and, more specifically, convolutional neural networks (CNNs).

Computer vision plays an important role in health care (e.g., the response to COVID-19), crime prevention, and the mitigation of hydrogeological disruption. This Special Issue aims to present original, unpublished, and breakthrough research in the metaverse and computer vision, focusing on new algorithms and mechanisms such as artificial intelligence, machine learning, and explainable artificial intelligence (XAI). We aim to bring leading scientists and researchers together and to create an interdisciplinary platform for the exchange of computational theories, methodologies, and techniques.

The purpose of this Special Issue is to disseminate research papers or state-of-the-art surveys that pertain to novel or emerging applications in the field of image processing and computer vision based on machine learning algorithms. Papers may contribute to technologies and application areas that have emerged during the past decade. Submissions are particularly welcome in, though not limited to, the areas in the list of keywords below.

Technical Program Committee Member:

Ms. Roberta Avanzato   
E-mail: [email protected]
Homepage: https://www.researchgate.net/profile/Roberta-Avanzato
Affiliation: Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, 95124 Catania, Italy
Research Interests: rainfall estimation; geolocation; natural disasters; Internet of Things; UAV; computer networking; biomedical signal processing

Dr. Francesco Beritelli
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • image segmentation
  • computer vision
  • deep learning
  • machine learning
  • reinforcement learning
  • classification
  • healthcare applications
  • novel industrial applications
  • high-speed computer vision
  • novel applications for 3D vision
  • object recognition
  • object detection
  • object tracking

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (8 papers)


Research

15 pages, 23607 KiB  
Article
Enhancing Image Copy Detection through Dynamic Augmentation and Efficient Sampling with Minimal Data
by Mohamed Fawzy, Noha S. Tawfik and Sherine Nagy Saleh
Electronics 2024, 13(16), 3125; https://doi.org/10.3390/electronics13163125 - 7 Aug 2024
Viewed by 1063
Abstract
Social networks have become deeply integrated into our daily lives, leading to an increase in image sharing across different platforms. Simultaneously, the existence of robust and user-friendly media editors not only facilitates artistic innovation but also raises concerns regarding the ease of creating misleading media. This highlights the need to develop new, advanced techniques for the image copy detection task, which involves evaluating whether photos or videos originate from the same source. This research introduces a novel application of the Vision Transformer (ViT) model to the image copy detection task on the DISC21 dataset. Our approach involves strategic sampling of the extensive DISC21 training set using K-means clustering to obtain a representative subset. Additionally, we employ complex augmentation pipelines applied with varying intensities during training. Our methodology follows the instance discrimination concept, where the Vision Transformer model is used as a classifier to map different augmentations of the same image to the same class. The trained ViT model then extracts descriptors of original and manipulated images, which subsequently undergo post-processing to reduce dimensionality. Our best-performing model, tested on a refined query set of 10K augmented images from the DISC21 dataset, attained a state-of-the-art micro-average precision of 0.79, demonstrating the effectiveness and innovation of our approach.

Figures

Figure 1: Structure of the proposed model divided into three phases: preprocessing, training, and inference.
Figure 2: Original image and its augmented versions, including various augmentation techniques.
Figure 3: Examples of original images from six different clusters and their augmented versions using different augmentation pipelines.
Figure 4: The upper part of the figure demonstrates the architecture of the ViT L16 model. ViT L16 features 24 transformer encoder blocks; the lower part details the architecture of each one of them.
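The sampling idea described in the abstract above, selecting a representative training subset from a very large image collection with K-means clustering, can be illustrated with a minimal sketch. The descriptor source, number of clusters, and per-cluster quota below are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: choose a representative training subset by K-means clustering
# image descriptors and keeping the samples closest to each cluster centroid.
# Descriptors, cluster count, and per-cluster quota are assumed for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128))   # stand-in for per-image embeddings

n_clusters = 50          # assumed number of clusters
per_cluster = 20         # assumed number of images kept per cluster

kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)

subset_indices = []
for c in range(n_clusters):
    members = np.where(kmeans.labels_ == c)[0]
    # rank cluster members by distance to their centroid and keep the closest ones
    dists = np.linalg.norm(descriptors[members] - kmeans.cluster_centers_[c], axis=1)
    subset_indices.extend(members[np.argsort(dists)[:per_cluster]].tolist())

print(f"kept {len(subset_indices)} of {len(descriptors)} images")
```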
27 pages, 6343 KiB  
Article
Detection and Classification of Obstructive Sleep Apnea Using Audio Spectrogram Analysis
by Salvatore Serrano, Luca Patanè, Omar Serghini and Marco Scarpa
Electronics 2024, 13(13), 2567; https://doi.org/10.3390/electronics13132567 - 29 Jun 2024
Cited by 1 | Viewed by 1483
Abstract
Sleep disorders are steadily increasing in the population and can significantly affect daily life. Low-cost and noninvasive systems that can assist the diagnostic process will become increasingly widespread in the coming years. This work aims to investigate and compare the performance of machine learning-based classifiers for the identification of obstructive sleep apnea–hypopnea (OSAH) events, including apnea/non-apnea status classification, apnea–hypopnea index (AHI) prediction, and AHI severity classification. The dataset considered contains recordings from 192 patients. It is derived from a recently released dataset which contains, among other signals, audio recorded with an ambient microphone placed ∼1 m above the studied subjects, together with accurate apnea/hypopnea event annotations performed by specialized medical doctors. We employ mel spectrogram images extracted from the environmental audio signals as the input of a machine-learning-based classifier for apnea/hypopnea event classification. The proposed approach involves a stacked model that combines a pretrained VGG-like audio classification (VGGish) network and a bidirectional long short-term memory (bi-LSTM) network. Performance analysis was conducted using a 5-fold cross-validation approach, leaving out the patients used for training and validation of the models in the testing step. Comparative evaluations with recently presented methods from the literature demonstrate the advantages of the proposed approach. The proposed architecture can be considered a useful tool for supporting OSAHS diagnoses by means of low-cost devices such as smartphones.

Figures

Figure 1: Typical examples of sequences of mel spectrograms extracted for different types of apnea category and in the "No-Apnea" state: (a) "Central Apnea", (b) "Hypopnea", (c) "Mixed Apnea", (d) "Obstructive Apnea", (e) "No-Apnea". The x-axis and y-axis are the time and frequency of the sound, respectively.
Figure 2: Block scheme of the proposed method.
Figure 3: Bi-LSTM architecture optimization: (a) estimated objective function values versus hyperparameters; (b) minimum objective function values versus number of function evaluations.
Figure 4: Training/validation loss and training/validation recall for the (a) M1, (b) M2, (c) M3, (d) M4, and (e) M5 models.
Figure 5: Confusion matrix, class-wise precisions, and class-wise recalls for the (a) M1, (b) M2, (c) M3, (d) M4, and (e) M5 models.
Figure 6: Flow diagram of the proposed architecture: from the process recordings to the class prediction.
Figure 7: Typical time evolution of the d̂(t) trend. The thresholds Th1 and Th2 are also reported.
Figure 8: Outcome of the considered objective functions and related optimal values of the threshold pairs (Th1, Th2) for model M3: (a) O_A objective function, (b) O_R objective function, (c) O_CA objective function.
Figure 9: Curves indicating the percentage of recordings with AHI_e ≤ e for each model, taking into account the (a) O_A, (b) O_R, and (c) O_CA objective functions.
Figure 10: Scatter diagram between true and predicted AHI. The legends report the Pearson correlation coefficient (PCC) and related p-value obtained by each model, with thresholds optimized by the (a) O_A, (b) O_R, and (c) O_CA objective functions.
Figure 11: Confusion matrices obtained to classify OSAHS severity for each model (M1–M5) and each optimization approach (O_A, O_R, O_CA).
Figure 12: Confusion matrices obtained to classify OSAHS severity using aggregate model results and policy P1, with thresholds optimized using (a) O_A, (b) O_R, and (c) O_CA.
Figure 13: Confusion matrices obtained to classify OSAHS severity using aggregate model results and policy P2, with thresholds optimized using (a) O_A, (b) O_R, and (c) O_CA.
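As a rough illustration of the kind of front end described in the abstract above, the following sketch turns an audio segment into a log-mel spectrogram image that an audio classifier could consume. It uses librosa with assumed parameter values (sampling rate, FFT size, hop length, mel-band count); these are not the settings used in the paper.

```python
# Minimal sketch: compute a log-mel spectrogram from an audio segment, the kind
# of input a VGG-like audio classifier can consume. All parameters are assumed.
import numpy as np
import librosa

sr = 16000
audio = np.sin(2 * np.pi * 440 * np.arange(sr * 5) / sr)  # 5 s synthetic tone as a stand-in

mel = librosa.feature.melspectrogram(
    y=audio, sr=sr, n_fft=1024, hop_length=512, n_mels=64
)
log_mel = librosa.power_to_db(mel, ref=np.max)  # shape: (64 mel bands, time frames)
print(log_mel.shape)
```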
25 pages, 940 KiB  
Article
Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications
by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo
Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024
Cited by 1 | Viewed by 856
Abstract
Versatile Video Coding (VVC) achieves an impressive coding gain improvement (roughly 40% or more) over the preceding High-Efficiency Video Coding (HEVC) technology at the cost of extremely high computational complexity. Such an extreme increase in complexity is a great challenge for power-constrained applications, such as the Internet of video things. For intra coding, VVC uses a brute-force recursive search both for the partition structure of the coding unit (CU), which is based on the quadtree with nested multi-type tree (QTMT), and over 67 intra prediction modes, compared to 35 in HEVC. We therefore offer optimization strategies for the CU partition decision and the intra coding modes to lessen the computational overhead. Regarding the high complexity of the CU partition process, CUs are first categorized as simple, fuzzy, or complex based on their texture characteristics. We then train two random forest classifiers to speed up the RDO-based brute-force recursive search: one classifier directly predicts the optimal partition modes for simple and complex CUs, while the other determines the early termination of the partition process for fuzzy CUs. Meanwhile, to reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experimental findings demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1). Additionally, an average coding time saving of 70% is achieved with only a 1.65% increase in BDBR. Furthermore, when compared to state-of-the-art methods, the proposed method also achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods in terms of lowering VVC intra coding complexity, providing an effective solution for power-constrained applications.

Figures

Figure 1: An illustration of the QTMT partition structure [1].
Figure 2: Flowchart of the three-step intra mode decision in the VVC reference software.
Figure 3: An example of the quadtree with nested multi-type tree coding block structure.
Figure 4: Neighboring CUs.
Figure 5: Relation between texture direction and gradient.
Figure 6: Flowchart of the proposed fast CU partition decision based on the random forest classifier.
Figure 7: Different CUs with the same variance.
Figure 8: Illustration of the random forest RF_PM or RF_ET.
Figure 9: Flowchart of the hierarchical search method for fast intra mode prediction.
Figure 10: The accuracy of the two random forest classifiers: (a) RF_PM; (b) RF_ET.
Figure 11: Influence of different components in FCPD for different sequences: (a) BasketballDrive; (b) RaceHorses; (c) BasketballPass; (d) Johnny.
Figure 12: The accuracy of FIMP.
Figure 13: Performance results of the proposed FCPD, FIMD, and overall algorithm compared with the VTM-7.0 encoder on RaceHorseC: (a) RD curves; (b) time saving under different QPs.
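The abstract above describes training random forest classifiers on CU texture features to shortcut the exhaustive RDO search. Below is a minimal, generic sketch of that idea; the feature set (variance and mean gradient magnitudes), the synthetic labels, and the block size are illustrative assumptions, not the paper's actual features or training data.

```python
# Minimal sketch: a random forest that predicts an early-termination decision for
# a CU from simple texture features. Features and labels are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def texture_features(block):
    """Variance plus mean horizontal/vertical gradient magnitudes of a CU."""
    gx = np.abs(np.diff(block, axis=1)).mean()
    gy = np.abs(np.diff(block, axis=0)).mean()
    return [block.var(), gx, gy]

# Synthetic training data: smooth blocks (label 0 = stop splitting) versus
# textured blocks (label 1 = keep searching partitions).
smooth = [rng.normal(128, 2, (32, 32)) for _ in range(200)]
textured = [rng.normal(128, 40, (32, 32)) for _ in range(200)]
X = np.array([texture_features(b) for b in smooth + textured])
y = np.array([0] * 200 + [1] * 200)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([texture_features(rng.normal(128, 3, (32, 32)))]))  # likely [0]
```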
14 pages, 617 KiB  
Article
Automatic Evaluation Method for Functional Movement Screening Based on Multi-Scale Lightweight 3D Convolution and an Encoder–Decoder
by Xiuchun Lin, Yichao Liu, Chen Feng, Zhide Chen, Xu Yang and Hui Cui
Electronics 2024, 13(10), 1813; https://doi.org/10.3390/electronics13101813 - 7 May 2024
Viewed by 1024
Abstract
Functional Movement Screening (FMS) is a test used to evaluate fundamental movement patterns in the human body and identify functional limitations. However, the challenge in automating FMS assessment is that complex human movements are difficult to model accurately and efficiently. To address this challenge, this paper proposes an automatic evaluation method for FMS based on a multi-scale lightweight 3D convolution encoder–decoder (ML3D-ED) architecture. This method adopts a self-built multi-scale lightweight 3D convolution architecture to extract features from videos. The extracted features are then processed using an encoder–decoder architecture and a probabilistic integration technique to effectively predict the final score distribution. Compared with the traditional Two-Stream Inflated 3D ConvNet (I3D) network, this architecture offers better performance and accuracy in capturing advanced human movement features in the temporal and spatial dimensions. Specifically, the ML3D-ED backbone network reduces the number of parameters by 59.5% and the computational cost by 77.7% compared to I3D. Experiments have shown that ML3D-ED achieves an accuracy of 93.33% on public datasets, an improvement of approximately 9% over the best existing method. This outcome demonstrates the effectiveness of the ML3D-ED architecture and the probabilistic integration technique in extracting advanced human movement features and evaluating functional movements.

Figures

Figure 1: ML3D-ED network architecture.
Figure 2: Three-dimensional filter equivalently transformed into two-dimensional + one-dimensional filters.
Figure 3: ML3D architecture.
Figure 4: Encoder–decoder.
Figure 5: Comparison of the feature extraction models in terms of the number of parameters and computational cost.
Figure 6: Three-dimensional convolution decoupling methods.
Figure 7: Three-dimensional convolution decoupling methods.
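One common way to make 3D convolution lightweight, echoed by the "two-dimensional + one-dimensional filters" caption above, is to factorize a full 3D filter into a 2D spatial convolution followed by a 1D temporal convolution. The PyTorch sketch below shows a generic factorized block under assumed channel and kernel sizes; it is not the ML3D-ED module itself.

```python
# Minimal sketch: a 3D convolution factorized into a 2D spatial convolution
# (1 x k x k) followed by a 1D temporal convolution (k x 1 x 1), a common way to
# cut parameters and FLOPs. Channel and kernel sizes are assumed values.
import torch
import torch.nn as nn

class Factorized3DConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                  padding=(k // 2, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return self.act(self.temporal(self.act(self.spatial(x))))

x = torch.randn(2, 3, 16, 112, 112)        # a small batch of video clips
print(Factorized3DConv(3, 32)(x).shape)    # torch.Size([2, 32, 16, 112, 112])
```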
18 pages, 7562 KiB  
Article
Graph- and Machine-Learning-Based Texture Classification
by Musrrat Ali, Sanoj Kumar, Rahul Pal, Manoj K. Singh and Deepika Saini
Electronics 2023, 12(22), 4626; https://doi.org/10.3390/electronics12224626 - 12 Nov 2023
Cited by 2 | Viewed by 1971
Abstract
The analysis of textures is an important task in image processing and computer vision because it provides significant data for image retrieval, synthesis, segmentation, and classification. Automatic texture recognition is difficult, however, and necessitates advanced computational techniques due to the complexity and diversity of natural textures. This paper presents a method for classifying textures using graphs, specifically natural and horizontal visibility graphs. The related image natural visibility graph (INVG) and image horizontal visibility graph (IHVG) are used to obtain features for classifying textures. These features are the clustering coefficient and the degree distribution. The results show that this technique outperforms traditional ones and even comes close to matching the performance of convolutional neural networks (CNNs). Classifiers such as the support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) are utilized for the categorization. The proposed method is tested on well-known image datasets, namely the Brodatz texture and Salzburg texture image (STex) datasets. The results are positive, showing the potential of graph methods for texture classification.

Figures

Figure 1: Natural visibility graph from time series data.
Figure 2: Horizontal visibility graph from time series data.
Figure 3: Sample texture images from the Brodatz texture image dataset.
Figure 4: Sample texture images from the Salzburg texture image dataset.
Figure 5: Flowchart of the proposed method.
Figure 6: Accuracy of (a) INVG, (b) IHVG, and (c) combined graphs on the Brodatz dataset.
Figure 7: Accuracy of (a) degree distribution, (b) clustering coefficient, and (c) combined features on the Brodatz dataset.
Figure 8: Accuracy of (a) degree distribution, (b) clustering coefficient, and (c) combined features on the STex dataset.
Figure 9: Accuracy of (a) INVG, (b) IHVG, and (c) combined graphs on the STex dataset.
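The horizontal visibility graph mentioned above has a simple definition: two samples of a sequence are connected if every sample strictly between them is lower than both. The sketch below builds such a graph from a single pixel row and reads off the degree and clustering features named in the abstract; treating one row as a time series is an illustrative simplification of the full INVG/IHVG construction.

```python
# Minimal sketch: horizontal visibility graph (HVG) of a 1D sequence (e.g., one
# row of pixel intensities), plus the degree/clustering features used for
# texture classification. A single-row HVG is a simplified illustration only.
import numpy as np
import networkx as nx

def horizontal_visibility_graph(series):
    g = nx.Graph()
    g.add_nodes_from(range(len(series)))
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            between = series[i + 1:j]
            # i and j "see" each other if every intermediate value is lower than both
            if len(between) == 0 or max(between) < min(series[i], series[j]):
                g.add_edge(i, j)
    return g

row = np.random.default_rng(0).integers(0, 256, size=64)   # stand-in pixel row
g = horizontal_visibility_graph(row.tolist())

degrees = np.array([d for _, d in g.degree()])
features = {
    "mean_degree": degrees.mean(),
    "avg_clustering": nx.average_clustering(g),
}
print(features)
```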
10 pages, 1247 KiB  
Article
Multi-Modality Tensor Fusion Based Human Fatigue Detection
by Jongwoo Ha, Joonhyuck Ryu and Joonghoon Ko
Electronics 2023, 12(15), 3344; https://doi.org/10.3390/electronics12153344 - 4 Aug 2023
Viewed by 1211
Abstract
Multimodal learning is an expanding research area that pursues a better understanding of given data by considering multiple modalities. Multimodal approaches to qualitative data are used for the quantitative proofing of ground-truth datasets and for discovering unexpected phenomena. In this paper, we investigate the effect of multimodal learning schemes on quantitative data to assess its qualitative state. We attempt to interpret human fatigue levels by analyzing video, thermal image, and voice data together. The experiment showed that the multimodal approach using the three types of data was more effective than using each dataset individually. As a result, we identified the possibility of predicting human fatigue states.

Figures

Figure 1: F-TFN1: SubModels for extracting modality features.
Figure 2: Video SubModel.
Figure 3: Voice SubModel.
Figure 4: F-TFN2: SubNets for modality embedding.
Figure 5: Video subnetwork.
Figure 6: Data collection process.
Figure 7: Display sample of the biological signal acquisition system.
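Tensor fusion, as referenced in the title, is commonly realized by taking the outer product of per-modality embeddings, each augmented with a constant 1 so that unimodal and bimodal interaction terms are retained alongside the trimodal ones. The sketch below shows only that fusion step for three assumed embedding sizes; it is a generic illustration, not the paper's F-TFN network.

```python
# Minimal sketch: tensor fusion of three modality embeddings (video, thermal,
# voice) via a three-way outer product, after appending a constant 1 to each
# embedding. Embedding sizes are assumed for illustration.
import torch

def tensor_fusion(video, thermal, voice):
    one = torch.ones(video.shape[0], 1)
    v = torch.cat([video, one], dim=1)      # (batch, dv + 1)
    t = torch.cat([thermal, one], dim=1)    # (batch, dt + 1)
    a = torch.cat([voice, one], dim=1)      # (batch, da + 1)
    fused = torch.einsum("bi,bj,bk->bijk", v, t, a)   # 3-way outer product
    return fused.flatten(start_dim=1)       # flatten for a downstream classifier

video = torch.randn(4, 16)
thermal = torch.randn(4, 8)
voice = torch.randn(4, 12)
print(tensor_fusion(video, thermal, voice).shape)   # torch.Size([4, 1989])
```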
18 pages, 6534 KiB  
Article
Underwater Image Color Constancy Calculation with Optimized Deep Extreme Learning Machine Based on Improved Arithmetic Optimization Algorithm
by Junyi Yang, Qichao Yu, Sheng Chen and Donghe Yang
Electronics 2023, 12(14), 3174; https://doi.org/10.3390/electronics12143174 - 21 Jul 2023
Cited by 1 | Viewed by 1097
Abstract
To overcome the challenges posed by the underwater environment and restore the true colors of marine objects’ surfaces, a novel underwater image illumination estimation model, termed the iterative chaotic improved arithmetic optimization algorithm for deep extreme learning machines (IAOA-DELM), is proposed. In this study, the gray edge framework is utilized to extract color features from underwater images, which are employed as input vectors. To address the issue of unstable prediction results caused by the random selection of parameters in DELM, the arithmetic optimization algorithm (AOA) is integrated, and the search segment mapping method is optimized by using hidden layer biases and input layer weights. Furthermore, an iterative chaotic mapping initialization strategy is incorporated to provide AOA with a better initial search proxy. The IAOA-DELM model computes illumination information based on the input color vectors. Experimental evaluations conducted on actual underwater images demonstrate that the proposed IAOA-DELM illumination correction model achieves an accuracy of 96.07%. When compared to the ORELM, ELM, RVFL, and BP models, the IAOA-DELM model exhibits improvements of 6.96%, 7.54%, 8.00%, and 8.89%, respectively, making it the most effective among the compared illumination correction models.

Figures

Figure 1: DELM network structure.
Figure 2: Search agent mapping flowchart.
Figure 3: IAOA-DELM algorithm flowchart.
Figure 4: Experimental facility.
Figure 5: IAOA-DELM light correction process diagram.
Figure 6: The average chrominance accuracy under different population sizes and numbers of iterations when alpha = 0.25.
Figure 7: The average chrominance accuracy under different population sizes and numbers of iterations when alpha = 0.5.
Figure 8: The average chrominance accuracy under different population sizes and numbers of iterations when alpha = 0.25.
Figure 9: Comparison of average accuracy results between the IAOA-DELM algorithm and the comparison group algorithms.
Figure 10: Box diagram of the angle error stability analysis for each algorithm.
Figure 11: Correction results of different illumination correction models: (A) underwater image to be corrected under the TL83 light source; (B) standard image in air of the same scene under the D50 light source; (C) IAOA-DELM; (D) AOA-DELM; (E) GWO-DELM; (F) WOA-DELM; (G) HHO-DELM; (H) ORELM; (I) ELM; (J) RVFL; (K) BP.
Figure 12: Comparison with other classical and advanced image correction algorithms: (A) underwater image to be corrected under the TL83 light source; (B) standard image in air of the same scene under the D50 light source; (C) IAOA-DELM; (D) Interactive WB Method; (E) Data-Driven WB Method; (F) WB color augmenter; (G) Grey-World; (H) Max-RGB; (I) Shades of Grey.
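The gray-edge framework mentioned in the abstract estimates the scene illuminant from the Minkowski norm of the image derivatives in each color channel. The sketch below illustrates that feature-extraction step only; the Minkowski order and Gaussian smoothing sigma are assumed parameter choices, and the DELM/IAOA learning stage described in the paper is omitted.

```python
# Minimal sketch: gray-edge style illuminant estimate, i.e., the per-channel
# Minkowski norm of gradient magnitudes, normalized to unit length. The
# Minkowski order p and smoothing sigma are assumed values.
import numpy as np
from scipy import ndimage

def gray_edge_illuminant(image, p=6, sigma=1.0):
    """image: float array (H, W, 3) in [0, 1]; returns a unit-norm RGB illuminant."""
    estimate = np.zeros(3)
    for c in range(3):
        smoothed = ndimage.gaussian_filter(image[:, :, c], sigma)
        gx = ndimage.sobel(smoothed, axis=1)
        gy = ndimage.sobel(smoothed, axis=0)
        grad = np.hypot(gx, gy)
        estimate[c] = (grad ** p).mean() ** (1.0 / p)   # Minkowski p-norm
    return estimate / np.linalg.norm(estimate)

img = np.random.default_rng(0).random((120, 160, 3))   # stand-in underwater image
print(gray_edge_illuminant(img))
```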
25 pages, 9839 KiB  
Article
An Improved Median Filter Based on YOLOv5 Applied to Electrochemiluminescence Image Denoising
by Jun Yang, Junyang Chen, Jun Li, Shijie Dai and Yihui He
Electronics 2023, 12(7), 1544; https://doi.org/10.3390/electronics12071544 - 24 Mar 2023
Cited by 1 | Viewed by 1932
Abstract
In many experiments, the electrochemiluminescence images captured by smartphones contain a lot of noise, which makes it difficult for researchers to accurately analyze the light spot information in the captured images. It is therefore very important to remove the noise from these images. In this paper, a Center-Adaptive Median Filter (CAMF) based on YOLOv5 is proposed. Unlike traditional filtering algorithms, CAMF can adjust its size in real time according to the current pixel position, the center and boundary frame of each light spot, and the distance between them. This gives CAMF both a strong noise reduction capability and a strong ability to protect light spot detail. In our experiment, the evaluation scores of CAMF for the three indicators Peak Signal-to-Noise Ratio (PSNR), Image Enhancement Factor (IEF), and Structural Similarity (SSIM) were 40.47 dB, 613.28, and 0.939, respectively. The results show that CAMF is superior to other filtering algorithms in terms of noise reduction and light spot protection.

Figures

Figure 1: Mosaic data enhancement.
Figure 2: Median filter.
Figure 3: Comparison between five mobile phones: (a) image captured by Xiaomi P30; (b) image captured by Huawei P30; (c) image captured by OPPO Reno8 Pro+; (d) image captured by iPhone 13; (e) image captured by Honor X20.
Figure 4: Image comparison before and after processing: (a) original image; (b) processed image.
Figure 5: Data presentation: (a–h) images with one to eight light spots.
Figure 6: Density plot.
Figure 7: YOLOv5 model diagram.
Figure 8: YOLOv5 used to detect the light spots in the ECL images; in the final detections, the orange and blue frames are the boundary frames corresponding to the two light spots.
Figure 9: Graphical description of s_{i,j}^near(x) and d_{i,j}^near(x).
Figure 10: The process of CAMF.
Figure 11: Precision curve.
Figure 12: Recall curve.
Figure 13: F1-score curve.
Figure 14: mAP@0.5 curve.
Figure 15: Precision–confidence curve.
Figure 16: Precision–recall curve.
Figure 17: Visual comparison of all the filtering algorithms on two noisy images: (a, j) noisy images 1 and 2; the remaining panels show the corresponding results of DAMF, PDBF, AIFF, MF, NAFSMF, MDBUTMF, IMF, and CAMF.
Figure 18: Some images from the FMD dataset.
Figure 19: Some images from the blood cell image dataset.
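The core idea above, a median filter whose window size changes with a pixel's position relative to detected light spots, can be illustrated with a generic sketch. The rule used below (a small window near an assumed spot center, a larger one elsewhere) and the fixed center and radius are simplified stand-ins for CAMF's actual adaptation rule and for the YOLOv5 detection step.

```python
# Minimal sketch: a median filter whose window size adapts to the distance from
# a detected light-spot center, preserving spot detail while smoothing the
# background more aggressively. Spot center, radius, and window sizes are
# simplified assumptions; the YOLOv5 detection stage is not reproduced here.
import numpy as np

def adaptive_median_filter(img, center, radius, small=3, large=7):
    h, w = img.shape
    out = np.empty_like(img)
    cy, cx = center
    for y in range(h):
        for x in range(w):
            # smaller window inside the spot region, larger window in the background
            k = small if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 else large
            r = k // 2
            window = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            out[y, x] = np.median(window)
    return out

rng = np.random.default_rng(0)
noisy = rng.integers(0, 50, (64, 64)).astype(float)
noisy[24:40, 24:40] += 180                      # a synthetic "light spot"
denoised = adaptive_median_filter(noisy, center=(32, 32), radius=12)
print(denoised.shape)
```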