Search Results (86)

Search Parameters:
Keywords = CRNN

20 pages, 3601 KiB  
Article
Full-Scale Piano Score Recognition
by Xiang-Yi Zhang and Jia-Lien Hsu
Appl. Sci. 2025, 15(5), 2857; https://doi.org/10.3390/app15052857 - 6 Mar 2025
Abstract
Sheet music is one of the most efficient methods for storing music. Meanwhile, a large amount of sheet music-image data is stored in paper form, but not in a computer-readable format. Therefore, digitizing sheet music is an essential task, such that the encoded music object could be effectively utilized for tasks such as editing or playback. Although there have been a few studies focused on recognizing sheet music images with simpler structures—such as monophonic scores or more modern scores with relatively simple structures, only containing clefs, time signatures, key signatures, and notes—in this paper we focus on the issue of classical sheet music containing dynamics symbols and articulation signs, more than only clefs, time signatures, key signatures, and notes. Therefore, this study augments the data from the GrandStaff dataset by concatenating single-line scores into multi-line scores and adding various classical music dynamics symbols not included in the original GrandStaff dataset. Given a full-scale piano score in pages, our approach first applies three YOLOv8 models to perform the three tasks: 1. Converting a full page of sheet music into multiple single-line scores; 2. Recognizing the classes and absolute positions of dynamics symbols in the score; and 3. Finding the relative positions of dynamics symbols in the score. Then, the identified dynamics symbols are removed from the original score, and the remaining score serves as the input into a Convolutional Recurrent Neural Network (CRNN) for the following steps. The CRNN outputs KERN notation (KERN, a core pitch/duration representation for common practice music notation) without dynamics symbols. By combining the CRNN output with the relative and absolute position information of the dynamics symbols, the final output is obtained. The results show that with the assistance of YOLOv8, there is a significant improvement in accuracy.
(This article belongs to the Special Issue Integration of AI in Signal and Image Processing)
Figures:
Figure 1: An illustrated example of KERN-encoded pianoform music.
Figure 2: The structure of the proposed model.
Figure 3: The structure of the OMR Processing Block.
Figure 4: Examples of crescendo and decrescendo symbols: (top left) crescendo symbol; (top right) decrescendo symbol; (bottom left) crescendo symbol represented in textual form; (bottom right) decrescendo symbol represented in textual form.
Figure 5: An example of an image column bounding box.
Figure 6: (Left) Chord with a second interval. (Right) Chord without a second interval.
Figure 7: The architecture of the CRNN.
Figure 8: The input image is first transformed into a feature map using the CNN. The feature map is subsequently divided into multiple sub-images of equal width but varying heights, which are then concatenated to form a single sequential representation.
Figure 9: An example of a full-page music score image created by concatenating four single-line score images. Green bounding boxes highlight objects classified as "words", while blue bounding boxes denote objects classified as "staff".
Figure 10: An illustration of erosion and dilation applied to the GrandStaff_dynam dataset alongside the original image. The top row presents the original image, the middle row the eroded image, and the bottom row the dilated image.
Figure 11: Confusion matrix for YOLOv8's accuracy in recognizing dynamics symbols on original images. The horizontal axis represents the ground-truth classification, while the vertical axis represents the predicted classification.
Figure 12: Confusion matrix for YOLOv8's accuracy in recognizing dynamics symbols on eroded or dilated images. The horizontal axis represents the ground-truth classification, while the vertical axis represents the predicted classification.
Figure 13: Examples of dynamics symbols detected by the YOLO model.
Figure 14: CRNN recognition of score images without dynamics symbols. (Top) Ground truth. (Bottom) Images predicted by the CRNN; these images are rendered through Verovio from KERN files, not the original CRNN output.
Figure 15: CRNN recognition of score images with dynamics symbols. (Top) Ground truth. (Bottom) Images predicted by the CRNN; these images are rendered through Verovio from KERN files, not the original CRNN output.
Figure 16: CRNN and YOLOv8 recognition of score images with dynamics symbols. (Top) Ground truth. (Bottom) Images predicted by the CRNN; these images are rendered through Verovio from KERN files, not the original CRNN output.
Figure 17: KERN encoding generated by the CRNN without dynamics symbols. (Left) KERN encoding produced by the CRNN. (Right) Ground truth, with incorrectly predicted symbols displayed in red on the left side.
Figure 18: (Left) The image generated from the recognized KERN encoding and rendered with Verovio; symbols highlighted in red indicate recognition errors. (Right) The actual image serving as the ground truth.
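The entry above describes a pipeline in which YOLOv8 handles dynamics symbols and a CRNN transcribes the remaining score into KERN tokens. As a rough, hedged illustration of only the CRNN half, the PyTorch sketch below shows the generic CNN-then-BiLSTM pattern such models follow; the layer sizes, vocabulary size, and column-wise sequencing are placeholder assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ScoreCRNN(nn.Module):
    """Generic CRNN: CNN feature extractor -> column sequence -> BiLSTM -> per-step token logits."""
    def __init__(self, n_tokens: int, img_height: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_h = img_height // 4                      # two 2x poolings shrink the height
        self.rnn = nn.LSTM(64 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 256, n_tokens)      # a blank class would be added for CTC training

    def forward(self, x):                             # x: (B, 1, H, W) grayscale score image
        f = self.cnn(x)                               # (B, C, H', W')
        f = f.permute(0, 3, 1, 2).flatten(2)          # (B, W', C*H'): one time step per image column
        out, _ = self.rnn(f)
        return self.head(out)                         # (B, W', n_tokens) token logits

logits = ScoreCRNN(n_tokens=100)(torch.randn(2, 1, 128, 1024))
print(logits.shape)   # torch.Size([2, 256, 100])
```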
27 pages, 1938 KiB  
Article
Skeleton Reconstruction Using Generative Adversarial Networks for Human Activity Recognition Under Occlusion
by Ioannis Vernikos and Evaggelos Spyrou
Sensors 2025, 25(5), 1567; https://doi.org/10.3390/s25051567 - 4 Mar 2025
Viewed by 104
Abstract
Recognizing human activities from motion data is a complex task in computer vision, involving the recognition of human behaviors from sequences of 3D motion data. These activities encompass successive body part movements, interactions with objects, or group dynamics. Camera-based recognition methods are cost-effective and perform well under controlled conditions but face challenges in real-world scenarios due to factors such as viewpoint changes, illumination variations, and occlusion. The latter is the most significant challenge in real-world recognition; partial occlusion impacts recognition accuracy to varying degrees depending on the activity and the occluded body parts while complete occlusion can render activity recognition impossible. In this paper, we propose a novel approach for human activity recognition in the presence of partial occlusion, which may be applied in cases wherein up to two body parts are occluded. The proposed approach works under the assumptions that (a) human motion is modeled using a set of 3D skeletal joints, and (b) the same body parts remain occluded throughout the whole activity. Contrary to previous research, in this work, we address this problem using a Generative Adversarial Network (GAN). Specifically, we train a Convolutional Recurrent Neural Network (CRNN), whose goal is to serve as the generator of the GAN. Its aim is to complete the missing parts of the skeleton due to occlusion. Specifically, the input to this CRNN consists of raw 3D skeleton joint positions, upon the removal of joints corresponding to occluded parts. The output of the CRNN is a reconstructed skeleton. For the discriminator of the GAN, we use a simple long short-term memory (LSTM) network. We evaluate the proposed approach using publicly available datasets in a series of occlusion scenarios. We demonstrate that in all scenarios, the occlusion of certain body parts causes a significant decline in performance, although in some cases, the reconstruction process leads to almost perfect recognition. Nonetheless, in almost every circumstance, the herein proposed approach exhibits superior performance compared to previous works, which varies between 2.2% and 37.5%, depending on the dataset used and the occlusion case.
(This article belongs to the Special Issue Robust Motion Recognition Based on Sensor Technology)
Figures:
Figure 1: A human body pose with the 20 and 25 skeletal joints that are extracted using the Microsoft Kinect v1 (left) and v2 (right) cameras. Joints have been divided into subsets, each corresponding to one of the five main body parts, i.e., torso (blue), left hand (green), right hand (red), left leg (orange), and right leg (magenta). For illustrative purposes and to facilitate comparisons between the two versions, body parts have been colored using the same colors. Numbering follows the Kinect SDK in both cases; therefore, there exist several differences between the two versions.
Figure 2: Example skeleton sequences of the activities (a) handshaking and (b) hugging other person from the PKU-MMD dataset, captured by Microsoft Kinect v2. First row: original skeletons, including all 25 joints (i.e., without any occlusion); second row: joints corresponding to (a) the left arm and (b) both arms (see Figure 1) have been discarded (i.e., the skeleton is partially occluded); third row: skeletons reconstructed using the proposed deep regression approach. Example (a) is successfully reconstructed and correctly classified, while example (b) is unsuccessfully reconstructed and incorrectly classified.
Figure 3: The architecture of the generator of the proposed GAN.
Figure 4: The architecture of the discriminator of the proposed GAN architecture.
Figure 5: A visual overview of the proposed approach.
Figure 6: The architecture of the classifier of the proposed approach for the three-camera case.
Figure 7: The architecture of the classifier of the proposed approach for the one-camera case.
Figure 8: Normalized confusion matrices for classification for all datasets, without removing any body part.
Figure 9: Confidence intervals using the proposed approach on all datasets, compared with the best weighted accuracies reported in previous works. For the proposed approach, the red dot denotes the upper bound of the confidence interval, i.e., the best weighted accuracy achieved.
Figure 10: Normalized confusion matrices for classification for the NTU-RGB+D dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
Figure 11: Normalized confusion matrices for classification for the PKU-MMD dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
Figure 12: Normalized confusion matrices for classification for the SYSU-3D-HOI dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
Figure 13: Normalized confusion matrices for classification for the UT-Kinect-Action-3D dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
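The abstract describes a GAN in which a recurrent generator fills in occluded joints and a simple LSTM discriminator judges whether a skeleton sequence looks real. The sketch below is a minimal, assumption-laden training step in that spirit: joint layout (25 joints x 3 coordinates), sequence length, GRU generator, losses, and learning rates are all placeholders rather than the paper's architecture.

```python
import torch
import torch.nn as nn

J = 25 * 3          # 25 joints x (x, y, z) per frame (assumed layout)
T = 50              # frames per sequence (placeholder)

class Generator(nn.Module):          # stands in for the paper's CRNN generator
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(J, 128, batch_first=True)
        self.out = nn.Linear(128, J)
    def forward(self, masked_seq):                   # occluded joints zeroed out
        h, _ = self.rnn(masked_seq)
        return self.out(h)                           # reconstructed joint sequence

class Discriminator(nn.Module):      # simple LSTM real/fake scorer, as the abstract describes
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(J, 64, batch_first=True)
        self.out = nn.Linear(64, 1)
    def forward(self, seq):
        h, _ = self.rnn(seq)
        return self.out(h[:, -1])                    # logit from the last time step

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), 1e-4)
opt_d = torch.optim.Adam(D.parameters(), 1e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, T, J)                          # full skeletons (dummy data)
mask = torch.ones_like(real); mask[:, :, :12] = 0    # pretend one body part stays occluded
fake = G(real * mask)

# discriminator step: real sequences vs. reconstructed ones
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# generator step: fool D and stay close to the ground-truth joints
g_loss = bce(D(fake), torch.ones(8, 1)) + nn.functional.mse_loss(fake, real)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```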
32 pages, 16524 KiB  
Article
HI-CMAIM: Hybrid Intelligence-Based Multi-Source Unstructured Chinese Map Annotation Interpretation Model
by Jiaxin Ren, Wanzeng Liu, Jun Chen, Xiuli Zhu, Ran Li, Tingting Zhao, Jiadong Zhang, Yuan Tao, Shunxi Yin, Xi Zhai, Yunlu Peng and Xinpeng Wang
Remote Sens. 2025, 17(2), 204; https://doi.org/10.3390/rs17020204 - 8 Jan 2025
Viewed by 558
Abstract
Map annotation interpretation is crucial for geographic information extraction and intelligent map analysis. This study addresses the challenges associated with interpreting Chinese map annotations, specifically visual complexity and data scarcity issues, by proposing a hybrid intelligence-based multi-source unstructured Chinese map annotation interpretation method (HI-CMAIM). Firstly, leveraging expert knowledge in an innovative way, we constructed a high-quality expert knowledge-based map annotation dataset (EKMAD), which significantly enhanced data diversity and accuracy. Furthermore, an improved annotation detection model (CMA-DB) and an improved annotation recognition model (CMA-CRNN) were designed based on the characteristics of map annotations, both incorporating expert knowledge. A two-stage transfer learning strategy was employed to tackle the issue of limited training samples. Experimental results demonstrated the superiority of HI-CMAIM over existing algorithms. In the detection task, CMA-DB achieved an 8.54% improvement in Hmean (from 87.73% to 96.27%) compared to the DB algorithm. In the recognition task, CMA-CRNN achieved a 15.54% improvement in accuracy (from 79.77% to 95.31%) and a 4-fold reduction in NED (from 0.1026 to 0.0242), confirming the effectiveness and advancement of the proposed method. This research not only provides a novel approach and data support for Chinese map annotation interpretation but also fills the gap of high-quality, diverse datasets. It holds practical application value in fields such as geographic information systems and cartography, significantly contributing to the advancement of intelligent map interpretation.
(This article belongs to the Special Issue Deep Learning Innovations in Remote Sensing)
Figures:
Figure 1: Overall strategy for map annotation interpretation.
Figure 2: Hybrid intelligence-based method for interpreting multi-source unstructured Chinese map annotations.
Figure 3: Knowledge-guided construction method for the intelligent Chinese map annotation dataset.
Figure 4: Overall architecture of PP-HGNet.
Figure 5: ESE module.
Figure 6: Improved LKPAN module.
Figure 7: Overall architecture of CRNN.
Figure 8: ResNet-34 architecture.
Figure 9: Two types of residual block.
Figure 10: Overview of the transfer learning strategy.
Figure 11: Transfer learning-based CMA-CRNN.
Figure 12: Comparison of annotation detection results on topographic maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 13: Comparison of annotation detection results on administrative maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 14: Comparison of annotation detection results on internet maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 15: Comparison of annotation detection results on image maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 16: Recognition results for annotations of different styles. Red highlights indicate recognition errors, and red underlines denote missing characters. (a) Low-contrast annotations, (b) annotations in complex backgrounds, (c) curved or rotated annotations, (d) incomplete annotations, (e) similar-looking characters, (f) composite characters.
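The abstract highlights a two-stage transfer learning strategy for coping with scarce map-annotation samples. The following sketch shows only the generic freeze-then-adapt pattern that phrase usually refers to; the loaders, loss function, learning rates, and the backbone/head split are illustrative assumptions, not the CMA-CRNN specifics.

```python
import torch
import torch.nn as nn

def two_stage_finetune(model: nn.Module, backbone: nn.Module, head: nn.Module,
                       source_loader, target_loader, loss_fn):
    """Generic two-stage transfer learning: (1) train the whole model on a large
    source-domain set, (2) freeze the backbone and adapt only the head on the
    small target-domain set (here, map annotations)."""
    # Stage 1: full training on source-domain data (e.g., generic scene text)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for imgs, labels in source_loader:
        opt.zero_grad()
        loss_fn(model(imgs), labels).backward()
        opt.step()

    # Stage 2: freeze the feature extractor, fine-tune the recognition head only
    for p in backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)   # smaller LR for adaptation
    for imgs, labels in target_loader:
        opt.zero_grad()
        loss_fn(model(imgs), labels).backward()
        opt.step()
    return model
```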
16 pages, 952 KiB  
Article
SiCRNN: A Siamese Approach for Sleep Apnea Identification via Tracheal Microphone Signals
by Davide Lillini, Carlo Aironi, Lucia Migliorelli, Leonardo Gabrielli and Stefano Squartini
Sensors 2024, 24(23), 7782; https://doi.org/10.3390/s24237782 - 5 Dec 2024
Viewed by 918
Abstract
Sleep apnea syndrome (SAS) affects about 3–7% of the global population, but is often undiagnosed. It involves pauses in breathing during sleep, for at least 10 s, due to partial or total airway blockage. The current gold standard for diagnosing SAS is polysomnography (PSG), an intrusive procedure that depends on subjective assessment by expert clinicians. To address the limitations of PSG, we propose a decision support system, which uses a tracheal microphone for data collection and a deep learning (DL) approach—namely SiCRNN—to detect apnea events during overnight sleep recordings. Our proposed SiCRNN processes Mel spectrograms using a Siamese approach, integrating a convolutional neural network (CNN) backbone and a bidirectional gated recurrent unit (GRU). The final detection of apnea events is performed using an unsupervised clustering algorithm, specifically k-means. Multiple experimental runs were carried out to determine the optimal network configuration and the most suitable type and frequency range for the input data. Tests with data from eight patients showed that our method can achieve a Recall score of up to 95% for apnea events. We also compared the proposed approach to a fully convolutional baseline, recently introduced in the literature, highlighting the effectiveness of the Siamese training paradigm in improving the identification of SAS.
Figures:
Figure 1: The scatter plots illustrate the output of principal component analysis (PCA) applied to the output of the final GRU layer in the SiCRNN model. The resulting embeddings are derived from two patients under two conditions: (a) noise-free patient embeddings and (b) noisy patient embeddings. The observed distances between the apnea and non-apnea clusters are 2.0 in the noise-free scenario and 0.87 in the presence of noise, respectively.
Figure 2: Overview of the proposed SiCRNN framework. The purple dashed line highlights the Siamese configuration employed during the training phase, whereas the green dashed line corresponds to the inference phase, which is carried out through the k-means clustering algorithm.
Figure 3: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of GRU hidden layers used during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of orange represent the number of convolutional blocks used in the model's training.
Figure 4: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the kernel size used during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of gray represent the kernel size used in the model's training.
Figure 5: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of MEL bands selected for each input sample frequency during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of blue represent the number of MEL bands used in the model's training.
Figure 6: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of convolutional blocks used during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of green represent the number of convolutional blocks used in the model's training.
Figure 7: (a) The region located below the top of the red mask indicates the apnea events; (b,c) spectrograms with the labeled red mask display an apnea event with significant spectral content. The time associated with each individual bin in the spectrograms is 11.56 ms.
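SiCRNN, as described above, learns a CNN + bidirectional GRU embedding of Mel spectrogram segments with a Siamese training setup and then separates apnea from non-apnea segments with k-means at inference time. The sketch below is one plausible reading of that recipe using a contrastive pair loss and scikit-learn's KMeans; the network sizes, loss choice, and data shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

class MelEmbedder(nn.Module):
    """CNN + bidirectional GRU embedder over Mel spectrogram segments (sizes are placeholders)."""
    def __init__(self, n_mels=64, emb_dim=32):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.gru = nn.GRU(16 * (n_mels // 2), 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, emb_dim)

    def forward(self, x):                                # x: (B, 1, n_mels, T)
        f = self.cnn(x).permute(0, 3, 1, 2).flatten(2)   # (B, T', C * mels')
        h, _ = self.gru(f)
        return self.fc(h[:, -1])                         # one embedding per segment

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull same-class pairs together, push different-class pairs at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

net = MelEmbedder()
a, b = torch.randn(8, 1, 64, 100), torch.randn(8, 1, 64, 100)   # dummy spectrogram pairs
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(net(a), net(b), same)

# Inference: cluster segment embeddings into two groups (apnea / non-apnea) with k-means
emb = net(torch.randn(200, 1, 64, 100)).detach().numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(emb)
```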
13 pages, 2404 KiB  
Article
Automated Cough Analysis with Convolutional Recurrent Neural Network
by Yiping Wang, Mustafaa Wahab, Tianqi Hong, Kyle Molinari, Gail M. Gauvreau, Ruth P. Cusack, Zhen Gao, Imran Satia and Qiyin Fang
Bioengineering 2024, 11(11), 1105; https://doi.org/10.3390/bioengineering11111105 - 1 Nov 2024
Viewed by 1069
Abstract
Chronic cough is associated with several respiratory diseases and is a significant burden on physical, social, and psychological health. Non-invasive, real-time, continuous, and quantitative monitoring tools are highly desired to assess cough severity, the effectiveness of treatment, and monitor disease progression in clinical practice and research. There are currently limited tools to quantitatively measure spontaneous coughs in daily living settings in clinical trials and in clinical practice. In this study, we developed a machine learning model for the detection and classification of cough sounds. Mel spectrograms are utilized as a key feature representation to capture the temporal and spectral characteristics of coughs. We applied this approach to automate cough analysis using 300 h of audio recordings from cough challenge clinical studies conducted in a clinical lab setting. A number of machine learning algorithms were studied and compared, including decision tree, support vector machine, k-nearest neighbors, logistic regression, random forest, and neural network. We identified that for this dataset, the CRNN approach is the most effective method, reaching 98% accuracy in identifying individual coughs from the audio data. These findings provide insights into the strengths and limitations of various algorithms, highlighting the potential of CRNNs in analyzing complex cough patterns. This research demonstrates the potential of neural network models in fully automated cough monitoring. The approach requires validation in detecting spontaneous coughs in patients with refractory chronic cough in a real-life setting.
Figures:
Figure 1: Examples of cough data in time- (a) and frequency-domain (b–d) formats: (a) 2 min amplitude graph, with coughing events marked by red lines; (b) Mel spectrogram of the same audio segment; (c) 0.5 s audio segment containing a cough occurring between 1217.0 and 1217.3 s; (d) 0.5 s audio segment that does not contain a cough. The color bar shows the volume of the recording in decibels, from yellow (0 dB) to black (80 dB).
Figure 2: The structure of the CRNN model.
Figure 3: Cross-validation results of models trained and tested on different datasets. The metrics used were accuracy, sensitivity, and specificity. Blue represents the full dataset, orange represents set A, green represents set B and red represents. (a) Accuracy across train–test combinations. (b) Sensitivity across train–test combinations. (c) Specificity across train–test combinations.
Figure 4: Training and testing performance metrics of the CRNN model across 50 epochs. In the left panel, the blue line shows the training loss and the orange line the test loss. In the right panel, the blue line shows training accuracy, orange test accuracy, green training sensitivity, red test sensitivity, purple training specificity, and brown test specificity.
Figure 5: ROC curves of all four machine learning models and the CRNN model. All models were trained and tested on the filtered CCH dataset. The ROC curve of the CRNN is from the best mixed-performance model.
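The study above feeds short audio segments to the CRNN as Mel spectrograms. The following sketch shows one common way to produce such a feature with librosa; the window length, FFT size, hop length, and number of Mel bands are assumed values, not the ones used in the paper.

```python
import numpy as np
import librosa

def mel_segment(audio: np.ndarray, sr: int, start_s: float, dur_s: float = 0.5,
                n_mels: int = 64) -> np.ndarray:
    """Cut a short window from a recording and convert it to a log-Mel spectrogram,
    the kind of 2-D feature a CRNN cough classifier would consume."""
    seg = audio[int(start_s * sr): int((start_s + dur_s) * sr)]
    mel = librosa.feature.melspectrogram(y=seg, sr=sr, n_mels=n_mels,
                                         n_fft=1024, hop_length=256)
    return librosa.power_to_db(mel, ref=np.max)       # (n_mels, frames), in dB

sr = 16000
audio = np.random.randn(sr * 5).astype(np.float32)    # dummy 5 s recording
feat = mel_segment(audio, sr, start_s=1.0)
print(feat.shape)                                     # e.g., (64, 32)
```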
20 pages, 3003 KiB  
Article
Equipment Sounds’ Event Localization and Detection Using Synthetic Multi-Channel Audio Signal to Support Collision Hazard Prevention
by Kehinde Elelu, Tuyen Le and Chau Le
Buildings 2024, 14(11), 3347; https://doi.org/10.3390/buildings14113347 - 23 Oct 2024
Viewed by 709
Abstract
Construction workplaces often face unforeseen collision hazards due to a decline in auditory situational awareness among on-foot workers, leading to severe injuries and fatalities. Previous studies that used auditory signals to prevent collision hazards focused on employing a classical beamforming approach to determine equipment sounds’ Direction of Arrival (DOA). No existing frameworks implement a neural network-based approach for both equipment sound classification and localization. This paper presents an innovative framework for sound classification and localization using multichannel sound datasets artificially synthesized in a virtual three-dimensional space. The simulation synthesized 10,000 multi-channel datasets using just fourteen single sound source audiotapes. This training includes a two-staged convolutional recurrent neural network (CRNN), where the first stage learns multi-label sound event classes followed by the second stage to estimate their DOA. The proposed framework achieves a low average DOA error of 30 degrees and a high F-score of 0.98, demonstrating accurate localization and classification of equipment near workers’ positions on the site.
(This article belongs to the Special Issue Big Data Technologies in Construction Management)
Figures:
Figure 1: Multichannel audio-based collision hazard detection pipeline.
Figure 2: Spectrogram for (left) crane, mobile equipment; (right) saw, stationary equipment.
Figure 3: Sample simulation setup.
Figure 4: Sample scenario of equipment moving toward workers on a construction site. One piece of equipment is mobile, moving towards the right (left ball), and another is stationary (right ball). (A) Initial position of both pieces of equipment sound; (B) the mobile equipment (left ball) approaches the workers, while the stationary equipment (right ball) remains in place; (C) the mobile equipment is halfway toward the workers, with a potential collision hazard emerging; (D) the mobile equipment reaches its closest point to the workers.
Figure 5: Two-stage sound event detection and localization network.
Figure 6: SELD score for scenarios with both stationary and mobile equipment.
Figure 7: SELD score for scenarios with two concurrent mobile equipment.
Figure 8: DOA error distribution across different equipment types.
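The abstract describes a two-staged CRNN: one stage predicts multi-label sound event activity and the other estimates direction of arrival. As a hedged illustration of how such dual outputs are commonly wired onto a shared convolutional-recurrent trunk, the sketch below uses placeholder feature sizes, a 4-microphone input, 14 classes, and a single-angle (azimuth) DOA head; none of this is taken from the paper's actual configuration.

```python
import torch
import torch.nn as nn

class SELDNet(nn.Module):
    """Shared CRNN trunk with two heads: multi-label event activity (sigmoid)
    and per-class direction-of-arrival regression (azimuth in degrees)."""
    def __init__(self, n_feat=64, n_classes=14):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.rnn = nn.GRU(32 * (n_feat // 2), 128, batch_first=True, bidirectional=True)
        self.sed_head = nn.Linear(256, n_classes)          # stage 1: which sources are active
        self.doa_head = nn.Linear(256, n_classes)          # stage 2: one azimuth per class

    def forward(self, x):                                  # x: (B, 4 mics, n_feat, T)
        f = self.cnn(x).permute(0, 3, 1, 2).flatten(2)     # (B, T, C * F')
        h, _ = self.rnn(f)
        sed = torch.sigmoid(self.sed_head(h))              # per-frame event probabilities
        doa = 180.0 * torch.tanh(self.doa_head(h))         # azimuth constrained to [-180, 180] degrees
        return sed, doa

sed, doa = SELDNet()(torch.randn(2, 4, 64, 100))
print(sed.shape, doa.shape)     # torch.Size([2, 100, 14]) torch.Size([2, 100, 14])
```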
13 pages, 3573 KiB  
Review
Cornulin as a Key Diagnostic and Prognostic Biomarker in Cancers of the Squamous Epithelium
by Varun Shankavaram, Dean Shah, Aseel Alashqar, Jackson Sweeney and Hilal Arnouk
Genes 2024, 15(9), 1122; https://doi.org/10.3390/genes15091122 - 26 Aug 2024
Viewed by 1820
Abstract
The prevalence of squamous cell carcinoma is increasing, and efforts that aid in an early and accurate diagnosis are crucial to improve clinical outcomes for patients. Cornulin, a squamous epithelium-specific protein, has recently garnered attention due to its implications in the progression of squamous cell carcinoma developed in several tissues. As an epidermal differentiation marker, it is involved in skin anchoring, regulating cellular proliferation, and is a putative tumor suppressor. The physiologically healthy squamous epithelium displays a considerable level of Cornulin, whereas squamous cell carcinomas have marked downregulation, suggesting that Cornulin expression levels can be utilized for the early detection and follow-up on the progression of these types of cancer. Cornulin’s expression patterns in cervical cancer have been examined, and findings support the stepwise downregulation of Cornulin levels that accompanies the progression to neoplasia in the cervix. Additional studies documented a similar trend in expression in other types of cancer, such as cutaneous, esophageal, and oropharyngeal squamous cell carcinomas. The consistent and predictable pattern of Cornulin expression across several squamous cell carcinomas and its correlation with key clinicopathological parameters make it a reliable biomarker for assessing the transformation and progression events in the squamous epithelium, thus potentially contributing to the early detection, definitive diagnosis, and more favorable prognosis for these cancer patients.
(This article belongs to the Special Issue Molecular Diagnostic and Prognostic Markers of Human Cancers)
Figures:
Graphical abstract.
Figure 1: Schematic graph showing the correlation between the downregulation of Cornulin expression and the progression of oral squamous cell carcinomas from normal oral mucosa to dysplastic premalignant lesions to invasive phenotypes. Representative immunohistochemistry staining for Cornulin in normal oral mucosa (A), a leukoplakia lesion (B), and oral squamous cell carcinoma (C). Similar trends have been documented for cervical and esophageal cancers.
Figure 2: Illustration of evaluating the extent of tumor spread and margins using direct visual examination of the tumor mass (yellow zone), microscopic examination of histological alterations (green zone), and molecular studies to reveal genetic and proteomic alterations in the precursor fields (red zone) that can undergo malignant transformation leading to local relapses in head and neck cancer patients.
Figure 3: Cornulin expression around keratin pearls in well-differentiated cutaneous squamous cell carcinoma tissue samples. Representative images: (A) H&E-stained and (B) immunohistochemistry-stained sections show intense Cornulin immunoreactivity in the central keratinocytes (green asterisk) adjacent to the keratin pearl (dotted circle), while the peripheral keratinocytes (red asterisk) do not show any detectable levels of Cornulin expression.
21 pages, 4424 KiB  
Article
CSA-SA-CRTNN: A Dual-Stream Adaptive Convolutional Cyclic Hybrid Network Combining Attention Mechanisms for EEG Emotion Recognition
by Ren Qian, Xin Xiong, Jianhua Zhou, Hongde Yu and Kaiwen Sha
Brain Sci. 2024, 14(8), 817; https://doi.org/10.3390/brainsci14080817 - 15 Aug 2024
Viewed by 1088
Abstract
In recent years, EEG-based emotion recognition technology has made progress, but there are still problems of low model efficiency and loss of emotional information, and there is still room for improvement in recognition accuracy. To fully utilize EEG’s emotional information and improve recognition accuracy while reducing computational costs, this paper proposes a Convolutional-Recurrent Hybrid Network with a dual-stream adaptive approach and an attention mechanism (CSA-SA-CRTNN). Firstly, the model utilizes a CSAM module to assign corresponding weights to EEG channels. Then, an adaptive dual-stream convolutional-recurrent network (SA-CRNN and MHSA-CRNN) is applied to extract local spatial-temporal features. After that, the extracted local features are concatenated and fed into a temporal convolutional network with a multi-head self-attention mechanism (MHSA-TCN) to capture global information. Finally, the extracted EEG information is used for emotion classification. We conducted binary and ternary classification experiments on the DEAP dataset, achieving 99.26% and 99.15% accuracy for arousal and valence in binary classification and 97.69% and 98.05% in ternary classification, and on the SEED dataset, we achieved an accuracy of 98.63%, surpassing relevant algorithms. Additionally, the model’s efficiency is significantly higher than other models, achieving better accuracy with lower resource consumption.
(This article belongs to the Section Neurotechnology and Neuroimaging)
Figures:
Graphical abstract.
Figure 1: Emotion model: (a) discrete model, (b) two-dimensional valence–arousal model.
Figure 2: Frame diagram of the CSA-SA-CRTNN model. The model consists of four modules, namely the CSAM module, the SA-CRNN module, the MHSA-CRNN module, and the MHSA-TCN module.
Figure 3: CSAM structure diagram.
Figure 4: MHSA structure diagram.
Figure 5: Structure diagram of MHSA-TCN.
Figure 6: Accuracy–epoch relationship diagram. (a) DEAP dataset; (b) SEED dataset.
Figure 7: The average accuracy of arousal and valence on DEAP using CSA-SA-CRTNN for each subject. (a) 2-class; (b) 3-class.
Figure 8: Confusion matrices: (a) 2-class arousal; (b) 2-class valence; (c) 3-class arousal; (d) 3-class valence.
Figure 9: Experimental results on the SEED dataset: (a) average accuracy for each subject; (b) confusion matrix.
Figure 10: Comparison of attention mechanisms in different channels.
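The CSAM module is described as assigning weights to EEG channels before the dual-stream network. The paper's exact module is not reproduced here; the sketch below shows a generic squeeze-and-excitation-style channel weighting that performs the same role, with channel count, window length, and reduction ratio chosen arbitrarily.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style weighting of EEG channels: summarize each channel
    over time, score it with a small MLP, and rescale the input accordingly."""
    def __init__(self, n_channels=32, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction), nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, channels, time)
        summary = x.mean(dim=2)           # squeeze: per-channel average over time
        w = self.mlp(summary)             # excitation: per-channel weight in (0, 1)
        return x * w.unsqueeze(-1)        # reweight each EEG channel

eeg = torch.randn(8, 32, 384)             # 32 channels, 3 s at 128 Hz (placeholder shape)
weighted = ChannelAttention(32)(eeg)
print(weighted.shape)                      # torch.Size([8, 32, 384])
```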
22 pages, 8725 KiB  
Article
Adaptive CAPTCHA: A CRNN-Based Text CAPTCHA Solver with Adaptive Fusion Filter Networks
by Xing Wan, Juliana Johari and Fazlina Ahmat Ruslan
Appl. Sci. 2024, 14(12), 5016; https://doi.org/10.3390/app14125016 - 8 Jun 2024
Cited by 1 | Viewed by 2445
Abstract
Text-based CAPTCHAs remain the most widely adopted security scheme, which is the first barrier to securing websites. Deep learning methods, especially Convolutional Neural Networks (CNNs), are the mainstream approach for text CAPTCHA recognition and are widely used in CAPTCHA vulnerability assessment and data collection. However, verification code recognizers are mostly deployed on the CPU platform as part of a web crawler and security assessment; they are required to have both low complexity and high recognition accuracy. Due to the specifically designed anti-attack mechanisms like noise, interference, geometric deformation, twisting, rotation, and character adhesion in text CAPTCHAs, some characters are difficult to efficiently identify with high accuracy in these complex CAPTCHA images. This paper proposed a recognition model named Adaptive CAPTCHA with a CNN combined with an RNN (CRNN) module and trainable Adaptive Fusion Filtering Networks (AFFN), which effectively handle the interference and learn the correlation between characters in CAPTCHAs to enhance recognition accuracy. Experimental results on two datasets of different complexities show that, compared with the baseline model Deep CAPTCHA, the number of parameters of our proposed model is reduced by about 70%, and the recognition accuracy is improved by more than 10 percentage points in the two datasets. In addition, the proposed model has a faster training convergence speed. Compared with several of the latest models, the model proposed by the study also has better comprehensive performance.
(This article belongs to the Special Issue Advanced Technologies in Data and Information Security III)
Figures:
Figure 1: The network of Deep CAPTCHA.
Figure 2: Some confusing adjacent characters in CAPTCHAs.
Figure 3: The networks of Adaptive CAPTCHA.
Figure 4: Samples of the datasets: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 5: Character statistical distributions: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 6: Markov transition probabilities between characters on the M-CAPTCHA.
Figure 7: The structure of AFFN.
Figure 8: The training process of Alpha.
Figure 9: The structure of CRNN.
Figure 10: ASR with and without filter networks on the M-CAPTCHA.
Figure 11: ASR with and without filter networks on the P-CAPTCHA.
Figure 12: Comparison of images before and after filtering: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 13: Loss comparison before and after filtering: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 14: AASR with different filter units on the P-dataset.
Figure 15: AASR comparison using FC and CRNN: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 16: AASR comparison with and without BN in CRNN.
Figure 17: AASR with different layers of LSTM on the P-dataset.
Figure 18: AASR with different residual connections: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 19: AASR with different loss functions.
Figure 20: AASR confusion matrix of Adaptive CAPTCHA on the M-dataset.
Figure 21: AASR of different models.
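CRNN-based text recognizers of this kind typically emit per-timestep character logits that are collapsed into a string. Whether Adaptive CAPTCHA uses exactly this decoding is not stated here; the sketch below shows the standard CTC greedy rule (argmax per step, merge repeats, drop blanks) as a generic reference, with a made-up charset and dummy logits.

```python
import torch

def ctc_greedy_decode(logits: torch.Tensor, charset: str, blank: int = 0) -> list[str]:
    """Collapse per-timestep predictions: take the argmax at each step,
    merge consecutive repeats, and drop the blank symbol."""
    best = logits.argmax(dim=-1)                  # (B, T) best class index per time step
    out = []
    for seq in best:
        chars, prev = [], blank
        for idx in seq.tolist():
            if idx != blank and idx != prev:
                chars.append(charset[idx - 1])    # index 0 is reserved for the blank
            prev = idx
        out.append("".join(chars))
    return out

charset = "abcdefghijklmnopqrstuvwxyz0123456789"
logits = torch.randn(2, 20, len(charset) + 1)     # dummy CRNN output: (batch, time, classes)
print(ctc_greedy_decode(logits, charset))
```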
16 pages, 4245 KiB  
Article
CrnnCrispr: An Interpretable Deep Learning Method for CRISPR/Cas9 sgRNA On-Target Activity Prediction
by Wentao Zhu, Huanzeng Xie, Yaowen Chen and Guishan Zhang
Int. J. Mol. Sci. 2024, 25(8), 4429; https://doi.org/10.3390/ijms25084429 - 17 Apr 2024
Cited by 3 | Viewed by 2236
Abstract
CRISPR/Cas9 is a powerful genome-editing tool in biology, but its wide applications are challenged by a lack of knowledge governing single-guide RNA (sgRNA) activity. Several deep-learning-based methods have been developed for the prediction of on-target activity. However, there is still room for improvement. Here, we proposed a hybrid neural network named CrnnCrispr, which integrates a convolutional neural network and a recurrent neural network for on-target activity prediction. We performed unbiased experiments with four mainstream methods on nine public datasets with varying sample sizes. Additionally, we incorporated a transfer learning strategy to boost the prediction power on small-scale datasets. Our results showed that CrnnCrispr outperformed existing methods in terms of accuracy and generalizability. Finally, we applied a visualization approach to investigate the generalizable nucleotide-position-dependent patterns of sgRNAs for on-target activity, which shows potential in terms of model interpretability and further helps in understanding the principles of sgRNA design.
Figures:
Figure 1: The heatmap shows (a) mean SCC and (b) mean PCC values of CrnnCrispr and four compared methods on nine datasets of three scales (large, medium, and small). The prediction methods are placed vertically, whereas the test datasets are arranged horizontally. Test datasets are classified by sample size.
Figure 2: Performance comparison of CrnnCrispr trained from scratch and with transfer learning on three small-scale datasets (HCT116, HELA, and HL60) under 5-fold cross-validation.
Figure 3: Performance comparison in terms of SCC of CrnnCrispr and four existing deep-learning-based methods on nine datasets of various scales under a leave-one-cell-out procedure.
Figure 4: Impact of nucleotide composition on sgRNA activity on three large-scale datasets. Bars show the Z-scores of nucleotide frequency for each position. The numbers below represent the positions of the sequence.
Figure 5: Illustration of the CrnnCrispr architecture. The sgRNA is first encoded by one-hot encoding and label encoding and is subsequently used as input to the CNN branch and BiGRU branch, respectively. The outputs of these two branches are concatenated and fed into two LSTM layers for dimensionality reduction. The outputs are flattened and passed through three fully connected layers to generate the final representation, and the final fully connected layer feeds a linear regression transformation that predicts sgRNA on-target activity.
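Figure 5 notes that each sgRNA is encoded twice: one-hot for the CNN branch and label (integer) encoding for the BiGRU branch. The snippet below illustrates those two standard encodings in NumPy; the 23-nt example sequence (20-nt guide plus an NGG PAM) is purely illustrative and not taken from the paper's data.

```python
import numpy as np

BASES = "ACGT"

def one_hot_encode(seq: str) -> np.ndarray:
    """One-hot matrix of shape (len(seq), 4) for the CNN branch."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        mat[i, BASES.index(base)] = 1.0
    return mat

def label_encode(seq: str) -> np.ndarray:
    """Integer codes 1..4 for the recurrent (BiGRU) branch."""
    return np.array([BASES.index(b) + 1 for b in seq.upper()], dtype=np.int64)

sgRNA = "GACGCATAAAGATGAGACGCTGG"   # illustrative 23-nt example: 20-nt guide + NGG PAM
print(one_hot_encode(sgRNA).shape)  # (23, 4)
print(label_encode(sgRNA))
```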
20 pages, 15351 KiB  
Article
Intelligent Analysis System for Teaching and Learning Cognitive Engagement Based on Computer Vision in an Immersive Virtual Reality Environment
by Ce Li, Li Wang, Quanzhi Li and Dongxuan Wang
Appl. Sci. 2024, 14(8), 3149; https://doi.org/10.3390/app14083149 - 9 Apr 2024
Cited by 1 | Viewed by 1297
Abstract
The 20th National Congress of the Communist Party of China and the 14th Five Year Plan for Education Informatization focus on digital technology and intelligent learning and implement innovation-driven education environment reform. An immersive virtual reality (IVR) environment has both immersive and interactive characteristics, which are an important way of virtual learning and are also one of the important ways in which to promote the development of smart education. Based on the above background, this article proposes an intelligent analysis system for Teaching and Learning Cognitive engagement in an IVR environment based on computer vision. By automatically analyzing the cognitive investment of students in the IVR environment, it is possible to better understand their learning status, provide personalized guidance to improve learning quality, and thereby promote the development of smart education. This system uses Vue (developed by Evan You, located in Wuxi, China) and ECharts (developed by Baidu, located in Beijing, China) for visual display, and the algorithm uses the Pytorch framework (developed by Facebook, located in Silicon Valley, CA, USA), YOLOv5 (developed by Ultralytics, located in Washington, DC, USA), and the CRNN model (Convolutional Recurrent Neural Network) to monitor and analyze the visual attention and behavioral actions of students. Through this system, a more accurate analysis of learners’ cognitive states and personalized teaching support can be provided for the education field, providing certain technical support for the development of smart education.
Figures:
Figure 1: Overall system design architecture diagram.
Figure 2: Preprocessing steps diagram.
Figure 3: Preprocessing process diagram. The Chinese word in the picture says "red blood cell".
Figure 4: Preprocessing result image. The Chinese word in the picture says "red blood cell".
Figure 5: YOLOv5 architecture diagram.
Figure 6: C3 module structure diagram.
Figure 7: Bottleneck module architecture diagram.
Figure 8: SPP module architecture diagram.
Figure 9: IOU calculation chart.
Figure 10: Changes in NMS processing. The Chinese word in the picture says "red blood cell".
Figure 11: Text OCR layer flowchart.
Figure 12: CRNN model structure diagram.
Figure 13: Network input image. The Chinese word in the picture says "red blood cell".
Figure 14: Text recognition process diagram. The Chinese word in the picture says "red blood cell".
Figure 15: Video frame cutting results.
Figure 16: Video frame cutting results.
Figure 17: YOLOv5 module detection results. The Chinese words in the picture say "red blood cell" and "mitochondria".
Figure 18: YOLOv5 module detection results. The Chinese word in the picture says "vesica".
Figure 19: Text OCR detection module detection results. The Chinese words in the picture say "microtubule" and "white blood cell".
Figure 20: Integration process diagram of detection data.
Figure 21: Comparison chart between the system's detection of various target objects and the actual frame rate.
Figure 22: Accuracy chart of system detection for various target objects.
Figure 23: Visualization page diagram.
Figure 24: Visualization of IVR videos and learner videos.
Figure 25: Select statistical object visualization.
Figure 26: Experimental flow chart.
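Figures 9 and 10 of this entry cover IoU calculation and non-maximum suppression in the YOLOv5 detection stage. These are standard operations, so the minimal NumPy versions below are generic textbook implementations rather than the system's code; box coordinates are assumed to be (x1, y1, x2, y2).

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an array of boxes, boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Keep the highest-scoring box, drop heavily overlapping ones, repeat."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]
```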
17 pages, 8563 KiB  
Article
Research on the Vision-Based Dairy Cow Ear Tag Recognition Method
by Tianhong Gao, Daoerji Fan, Huijuan Wu, Xiangzhong Chen, Shihao Song, Yuxin Sun and Jia Tian
Sensors 2024, 24(7), 2194; https://doi.org/10.3390/s24072194 - 29 Mar 2024
Cited by 3 | Viewed by 1867
Abstract
With the increase in the scale of breeding at modern pastures, the management of dairy cows has become much more challenging, and individual recognition is the key to the implementation of precision farming. Based on the need for low-cost and accurate herd management and for non-stressful and non-invasive individual recognition, we propose a vision-based automatic recognition method for dairy cow ear tags. Firstly, for the detection of cow ear tags, the lightweight Small-YOLOV5s is proposed, and then a differentiable binarization network (DBNet) combined with a convolutional recurrent neural network (CRNN) is used to achieve the recognition of the numbers on ear tags. The experimental results demonstrated notable improvements: Compared to those of YOLOV5s, Small-YOLOV5s enhanced recall by 1.5%, increased the mean average precision by 0.9%, reduced the number of model parameters by 5,447,802, and enhanced the average prediction speed for a single image by 0.5 ms. The final accuracy of the ear tag number recognition was an impressive 92.1%. Moreover, this study introduces two standardized experimental datasets specifically designed for the ear tag detection and recognition of dairy cows. These datasets will be made freely available to researchers in the global dairy cattle community with the intention of fostering intelligent advancements in the breeding industry.
(This article belongs to the Section Smart Agriculture)
Figures:
Figure 1: Some samples of data from CEID-D. Capture angles: frontal, lateral, and overhead views of cows. Weather conditions during shooting: overcast and sunny days. Captured cow poses: standing, feeding, and lying down.
Figure 2: Ear tag image quality assessment.
Figure 3: Preprocessing of ear tag images. From left to right: the original ear tag, the ear tag after bilateral filtering, the ear tag after edge sharpening, and the ear tag after grayscaling.
Figure 4: Ear tag images annotated with Paddlelabel.
Figure 5: Technology roadmap.
Figure 6: The structure of YOLOV5s.
Figure 7: The structure of Small-YOLOV5s.
Figure 8: The structure of CA.
Figure 9: The structure of DBNet.
Figure 10: The structure of the CRNN.
Figure 11: Comparison of cow ear tag detection results. (a) The results of ear tag detection using the color threshold method, with the original image on the left and the detection results on the right. (b,c) The detection results of cow ear tags in different scenarios using Small-YOLOV5s.
Figure 12: Loss decay and recognition accuracy in CRNN training.
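Figure 3 of this entry shows the ear tag preprocessing chain: bilateral filtering, edge sharpening, then grayscaling. The OpenCV sketch below reproduces that chain generically; the filter diameter, sigma values, and sharpening kernel are assumed parameters, not the ones reported in the paper.

```python
import cv2
import numpy as np

def preprocess_ear_tag(img_bgr: np.ndarray) -> np.ndarray:
    """Denoise while preserving digit edges, sharpen them, then convert to grayscale."""
    smoothed = cv2.bilateralFilter(img_bgr, d=9, sigmaColor=75, sigmaSpace=75)
    sharpen_kernel = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]], dtype=np.float32)
    sharpened = cv2.filter2D(smoothed, ddepth=-1, kernel=sharpen_kernel)
    return cv2.cvtColor(sharpened, cv2.COLOR_BGR2GRAY)

tag = (np.random.rand(64, 128, 3) * 255).astype(np.uint8)   # stand-in for a cropped ear tag
gray = preprocess_ear_tag(tag)
print(gray.shape)    # (64, 128)
```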
18 pages, 3564 KiB  
Article
Offline Mongolian Handwriting Recognition Based on Data Augmentation and Improved ECA-Net
by Qing-Dao-Er-Ji Ren, Lele Wang, Zerui Ma and Saheya Barintag
Electronics 2024, 13(5), 835; https://doi.org/10.3390/electronics13050835 - 21 Feb 2024
Cited by 1 | Viewed by 1156
Abstract
Writing is an important carrier of cultural inheritance, and the digitization of handwritten texts is an effective means to protect national culture. Compared to Chinese and English handwriting recognition, the research on Mongolian handwriting recognition started relatively late and achieved few results due to the characteristics of the script itself and the lack of corpus. First, according to the characteristics of Mongolian handwritten characters, the random erasing data augmentation algorithm was modified, and a dual data augmentation (DDA) algorithm was proposed by combining the improved algorithm with horizontal wave transformation (HWT) to augment the dataset for training the Mongolian handwriting recognition. Second, the classical CRNN handwriting recognition model was improved. The structure of the encoder and decoder was adjusted according to the characteristics of the Mongolian script, and the attention mechanism was introduced in the feature extraction and decoding stages of the model. An improved handwriting recognition model, named the EGA model, suitable for the features of Mongolian handwriting was suggested. Finally, the effectiveness of the EGA model was verified by a large number of data tests. Experimental results demonstrated that the proposed EGA model improves the recognition accuracy of Mongolian handwriting, and the structural modification of the encoder and coder effectively balances the recognition accuracy and complexity of the model.
(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
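As a rough illustration of the two operations the abstract combines into dual data augmentation, below is a minimal NumPy sketch of a horizontal wave transformation and a random erasing step restricted to an elliptical area. The amplitude, period, and ellipse-size choices are illustrative assumptions, not the parameters used in the paper.

import numpy as np

def horizontal_wave(img, R=15, period=60):
    """Shift each row horizontally by a sinusoidal offset (amplitude R pixels)."""
    h = img.shape[0]
    out = np.zeros_like(img)
    for y in range(h):
        shift = int(R * np.sin(2 * np.pi * y / period))
        out[y] = np.roll(img[y], shift, axis=0)
    return out

def elliptical_erase(img, rng=np.random.default_rng()):
    """Erase a random elliptical region (instead of the usual rectangle)."""
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    ry, rx = rng.integers(h // 16, h // 6), rng.integers(w // 16, w // 6)
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    img = img.copy()
    img[mask] = rng.integers(0, 256)       # fill the ellipse with a random gray value
    return img

def dual_augment(img):
    return elliptical_erase(horizontal_wave(img))

sample = np.full((64, 256), 255, dtype=np.uint8)   # blank stand-in handwriting strip
augmented = dual_augment(sample)
print(augmented.shape)  # (64, 256)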
Show Figures

Figure 1: Experimental results of Horizontal Wave Transformation. (a) The original image; (b) the image processed by HWTDA with R = 5, 15, 25, 35, and 45 from left to right.
Figure 2: Experimental test of Improved Random Erasing Data Augmentation. (a) The original image; (b) the original REDA algorithm processes images through rectangular areas; (c) the improved REDA algorithm processes images through elliptical areas.
Figure 3: Flow of the Dual Data Augmentation Algorithm.
Figure 4: EGA Model Training Process.
Figure 5: Network LSTM and Network GRU Structure Comparison. (a) LSTM Network Structure; (b) GRU Network Structure (see the parameter-count sketch after this figure list).
Figure 6: Network Structure of the EGA Model.
Figure 7: Offline Mongolian Handwriting Images.
Figure 8: Effect of Dual Data Augmentation. (a) The original image; (b) the image processed only by HWTDA; (c) the image processed by DDA.
Figure 9: CRNN Model Recognition Accuracy Curve before and after Dual Data Augmentation.
Figure 10: Model Loss Value Curve before and after Dual Data Augmentation.
Figure 11: EGA Model Loss Value Curve.
Figure 12: Comparison of Recognition Accuracy between EGA Model and CRNN Model.
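Figure 5 contrasts LSTM and GRU cells, and the abstract notes that the encoder/decoder changes balance accuracy against model complexity. One quick way to see the complexity side is to compare parameter counts of the two layer types at the same width; a small PyTorch sketch follows, with an arbitrary hidden size chosen purely for illustration.

import torch.nn as nn

def param_count(module):
    return sum(p.numel() for p in module.parameters())

hidden = 256
lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
gru = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)

# A GRU has 3 gates versus the LSTM's 4, so roughly three quarters of the parameters.
print(param_count(lstm), param_count(gru))  # 526336 394752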
24 pages, 1950 KiB  
Article
Optimal Training Dataset Preparation for AI-Supported Multilanguage Real-Time OCRs Using Visual Methods
by Attila Biró, Sándor Miklós Szilágyi and László Szilágyi
Appl. Sci. 2023, 13(24), 13107; https://doi.org/10.3390/app132413107 - 8 Dec 2023
Viewed by 1938
Abstract
In the realm of multilingual, AI-powered, real-time optical character recognition (OCR) systems, this research explores the creation of an optimal, vocabulary-based training dataset. The dataset is designed to satisfy a range of criteria: comprehensive language representation, high-quality and diverse data, balanced composition, contextual understanding, domain-specific adaptation, robustness and noise tolerance, and scalability and extensibility. The approach leverages techniques such as convolutional neural networks, recurrent neural networks, convolutional recurrent neural networks, and single visual models for scene text recognition. While focusing on English, Hungarian, and Japanese as representative languages, the proposed methodology can be extended to any existing or even synthesized language. The development of accurate, efficient, and versatile OCR systems is at the core of this research, offering societal benefits by bridging global communication gaps, ensuring reliability in diverse environments, and demonstrating the adaptability of AI to evolving needs. This work both reflects the state of the art in the field and opens new paths for future innovation, underscoring the importance of sustained research in advancing AI's potential to shape societal development. Full article
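One of the criteria the abstract stresses is a balanced dataset across English, Hungarian, and Japanese. A minimal sketch of one possible balancing step, downsampling every language pool to the size of the smallest one, is shown below; the pool names and sizes are made-up placeholders, not the paper's data.

import random
from collections import Counter

# Hypothetical per-language pools of (image_path, transcription) pairs.
pools = {
    "en": [("en_%05d.png" % i, "sample text") for i in range(50_000)],
    "hu": [("hu_%05d.png" % i, "minta szöveg") for i in range(30_000)],
    "jp": [("jp_%05d.png" % i, "サンプル") for i in range(20_000)],
}

def balance(pools, seed=0):
    """Downsample every language to the size of the smallest pool."""
    rng = random.Random(seed)
    target = min(len(v) for v in pools.values())
    return {lang: rng.sample(items, target) for lang, items in pools.items()}

balanced = balance(pools)
print(Counter({lang: len(items) for lang, items in balanced.items()}))
# Counter({'en': 20000, 'hu': 20000, 'jp': 20000})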
Show Figures

Figure 1: Adjusted PaddleOCR architecture (adapted from [19]).
Figure 2: Result of data imbalance mitigation.
Figure 3: Dataset preparation for real-time OCR (adapted from [42,47]).
Figure 4: Sobel filter on data generation: (a) in the case of single-line text; (b) in the case of multiline text (see the sketch after this figure list).
Figure 5: Distribution of text lengths in the train dataset.
Figure 6: Character number distribution.
Figure 7: Text length distribution—experiment 1 (15M_enhujp_v2_1): English–Hungarian–Japanese distribution.
Figure 8: Text length distribution—experiment 1 (30M_enhujp_v2_4): English–Hungarian–Japanese distribution.
Figure 9: Text length distribution—experiment 1 (50M_enhujp_v2_2): English–Hungarian–Japanese distribution.
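Figure 4 shows a Sobel filter applied during data generation. A minimal OpenCV sketch of that edge-emphasis step is given below; the kernel size and the way the two gradient images are combined are assumptions, and the paper may apply the filter differently.

import cv2
import numpy as np

def sobel_edges(gray):
    """Combine horizontal and vertical Sobel gradients into one edge map."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    return cv2.convertScaleAbs(mag)          # back to uint8 for display or storage

img = np.random.randint(0, 256, (64, 320), dtype=np.uint8)  # stand-in text-line image
edges = sobel_edges(img)
print(edges.shape, edges.dtype)              # (64, 320) uint8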
21 pages, 1406 KiB  
Article
Contactless Heart and Respiration Rates Estimation and Classification of Driver Physiological States Using CW Radar and Temporal Neural Networks
by Amal El Abbaoui, David Sodoyer and Fouzia Elbahhar
Sensors 2023, 23(23), 9457; https://doi.org/10.3390/s23239457 - 28 Nov 2023
Cited by 3 | Viewed by 1966
Abstract
The measurement and analysis of vital signs are of significant research interest, particularly for monitoring the driver’s physiological state, which is of crucial importance for road safety. Various approaches using contact techniques have been proposed to measure vital signs. However, all of these methods are invasive and cumbersome for the driver. This paper proposes using a non-contact sensor based on continuous wave (CW) radar at 24 GHz to measure vital signs. We couple these measurements with distinct temporal neural networks that analyze the signals to detect and extract heart and respiration rates and to classify the physiological state of the driver. This approach offers robust performance in estimating the exact values of heart and respiration rates and in classifying the driver’s physiological state. It is non-invasive and requires no physical contact with the driver, making it particularly practical and safe. The results presented in this paper were obtained with a 1D Convolutional Neural Network (1D-CNN), a Temporal Convolutional Network (TCN), a Recurrent Neural Network, specifically a Bidirectional Long Short-Term Memory network (Bi-LSTM), and a Convolutional Recurrent Neural Network (CRNN). Among these, the CRNN emerged as the most effective deep learning approach for vital signal analysis. Full article
(This article belongs to the Section Biomedical Sensors)
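Of the architectures listed in the abstract, the CRNN performed best. Below is a minimal PyTorch sketch of a 1D CRNN of that general kind, with convolutional feature extraction, a recurrent layer, and two heads for state classification and heart/respiration-rate regression; the layer sizes and the number of physiological classes are assumptions for illustration, not the authors' configuration.

import torch
import torch.nn as nn

class RadarCRNN(nn.Module):
    def __init__(self, n_classes=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(                       # local feature extraction
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden, n_classes)  # physiological state
        self.regress = nn.Linear(2 * hidden, 2)           # heart and respiration rates

    def forward(self, x):                  # x: (batch, 1, time)
        h = self.conv(x).transpose(1, 2)   # -> (batch, time', 64)
        h, _ = self.rnn(h)
        h = h.mean(dim=1)                  # average over time
        return self.classify(h), self.regress(h)

model = RadarCRNN()
logits, rates = model(torch.randn(8, 1, 2048))
print(logits.shape, rates.shape)           # torch.Size([8, 3]) torch.Size([8, 2])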
Show Figures

Figure 1: The general architecture of the proposed models.
Figure 2: A dilated causal convolution with dilation factors d = 1, 2, 4, 8, 16, and 32 and a filter size k = 3 (see the sketch after this figure list).
Figure 3: Fundamental mechanism of CW radar.
Figure 4: Loss function of regression models.
Figure 5: Loss function of classification models.
Figure 6: Confusion matrix for each model using the simulated dataset, dependent on individual variances.
Figure 7: Comparative accuracy curves: predicting heart and respiration rates based on physiological state.
Figure 8: Confusion matrix for each model using the simulated dataset, independent of individual variances.
Figure 9: Loss function of classification models.
Figure 10: Loss function of regression models.
Figure 11: Confusion matrix for each model using the real dataset, independent of individual variances.
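Figure 2 describes a dilated causal convolution stack with dilation factors 1, 2, 4, 8, 16, and 32 and kernel size k = 3; with those settings each output sample sees 1 + (k - 1)(1 + 2 + 4 + 8 + 16 + 32) = 127 past samples. A minimal PyTorch sketch of such a causal stack follows; the channel widths are illustrative assumptions.

import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution padded on the left only, so outputs never see the future."""
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__()
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation)

    def forward(self, x):
        x = nn.functional.pad(x, (self.pad, 0))   # left-pad in time
        return self.conv(x)

layers = []
channels = 32
for d in (1, 2, 4, 8, 16, 32):
    layers += [CausalConv1d(1 if d == 1 else channels, channels, k=3, dilation=d), nn.ReLU()]
tcn = nn.Sequential(*layers)

y = tcn(torch.randn(4, 1, 500))
print(y.shape)   # torch.Size([4, 32, 500]) -- length preserved, receptive field 127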