Search Results (86)

Search Parameters:
Keywords = CRNN

20 pages, 3601 KiB  
Article
Full-Scale Piano Score Recognition
by Xiang-Yi Zhang and Jia-Lien Hsu
Appl. Sci. 2025, 15(5), 2857; https://doi.org/10.3390/app15052857 - 6 Mar 2025
Abstract
Sheet music is one of the most efficient methods for storing music. Meanwhile, a large amount of sheet music-image data is stored in paper form, but not in a computer-readable format. Therefore, digitizing sheet music is an essential task, such that the encoded music object could be effectively utilized for tasks such as editing or playback. Although there have been a few studies focused on recognizing sheet music images with simpler structures—such as monophonic scores or more modern scores with relatively simple structures, only containing clefs, time signatures, key signatures, and notes—in this paper we focus on the issue of classical sheet music containing dynamics symbols and articulation signs, more than only clefs, time signatures, key signatures, and notes. Therefore, this study augments the data from the GrandStaff dataset by concatenating single-line scores into multi-line scores and adding various classical music dynamics symbols not included in the original GrandStaff dataset. Given a full-scale piano score in pages, our approach first applies three YOLOv8 models to perform the three tasks: 1. Converting a full page of sheet music into multiple single-line scores; 2. Recognizing the classes and absolute positions of dynamics symbols in the score; and 3. Finding the relative positions of dynamics symbols in the score. Then, the identified dynamics symbols are removed from the original score, and the remaining score serves as the input into a Convolutional Recurrent Neural Network (CRNN) for the following steps. The CRNN outputs KERN notation (KERN, a core pitch/duration representation for common practice music notation) without dynamics symbols. By combining the CRNN output with the relative and absolute position information of the dynamics symbols, the final output is obtained. The results show that with the assistance of YOLOv8, there is a significant improvement in accuracy.
(This article belongs to the Special Issue Integration of AI in Signal and Image Processing)
Figures:
Figure 1: An illustrated example of KERN-encoded pianoform music.
Figure 2: The structure of the proposed model.
Figure 3: The structure of the OMR Processing Block.
Figure 4: Examples of crescendo and decrescendo symbols: (top left) crescendo symbol; (top right) decrescendo symbol; (bottom left) crescendo symbol represented in textual form; (bottom right) decrescendo symbol represented in textual form.
Figure 5: An example of an image column bounding box.
Figure 6: (Left) Chord with a second interval. (Right) Chord without a second interval.
Figure 7: The architecture of the CRNN.
Figure 8: The input image is first transformed into a feature map using the CNN. The feature map is subsequently divided into multiple sub-images of equal width but varying heights, which are then concatenated to form a single sequential representation.
Figure 9: An example of a full-page music score image created by concatenating four single-line score images. Green bounding boxes highlight objects classified as "words", while blue bounding boxes denote objects classified as "staff".
Figure 10: An illustration of erosion and dilation applied to the GrandStaff_dynam dataset alongside the original image. The top row presents the original image, the middle row the eroded image, and the bottom row the dilated image.
Figure 11: Confusion matrix for YOLOv8's accuracy in recognizing dynamics symbols on original images. The horizontal axis represents the ground-truth classification, while the vertical axis represents the predicted classification.
Figure 12: Confusion matrix for YOLOv8's accuracy in recognizing dynamics symbols on eroded or dilated images. The horizontal axis represents the ground-truth classification, while the vertical axis represents the predicted classification.
Figure 13: Examples of dynamics symbols detected by the YOLO model.
Figure 14: CRNN recognition of score images without dynamics symbols. (Top) Ground truth. (Bottom) Images predicted by the CRNN; these images are rendered through Verovio from KERN files, not the original CRNN output.
Figure 15: CRNN recognition of score images with dynamics symbols. (Top) Ground truth. (Bottom) Images predicted by the CRNN; these images are rendered through Verovio from KERN files, not the original CRNN output.
Figure 16: CRNN and YOLOv8 recognition of score images with dynamics symbols. (Top) Ground truth. (Bottom) Images predicted by the CRNN; these images are rendered through Verovio from KERN files, not the original CRNN output.
Figure 17: KERN encoding generated by the CRNN without dynamics symbols. (Left) KERN encoding produced by the CRNN. (Right) Ground truth, with incorrectly predicted symbols displayed in red on the left side.
Figure 18: (Left) The image generated from the recognized KERN encoding and rendered with Verovio; symbols highlighted in red indicate recognition errors. (Right) The actual image serving as the ground truth.
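The entry above describes a pipeline in which YOLOv8 handles dynamics symbols and a CRNN transcribes the remaining score into KERN tokens. As a rough, hedged illustration of only the CRNN half, the PyTorch sketch below shows the generic CNN-then-BiLSTM pattern such models follow; the layer sizes, vocabulary size, and column-wise sequencing are placeholder assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ScoreCRNN(nn.Module):
    """Generic CRNN: CNN feature extractor -> column sequence -> BiLSTM -> per-step token logits."""
    def __init__(self, n_tokens: int, img_height: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_h = img_height // 4                      # two 2x poolings shrink the height
        self.rnn = nn.LSTM(64 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 256, n_tokens)      # a blank class would be added for CTC training

    def forward(self, x):                             # x: (B, 1, H, W) grayscale score image
        f = self.cnn(x)                               # (B, C, H', W')
        f = f.permute(0, 3, 1, 2).flatten(2)          # (B, W', C*H'): one time step per image column
        out, _ = self.rnn(f)
        return self.head(out)                         # (B, W', n_tokens) token logits

logits = ScoreCRNN(n_tokens=100)(torch.randn(2, 1, 128, 1024))
print(logits.shape)   # torch.Size([2, 256, 100])
```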
27 pages, 1938 KiB  
Article
Skeleton Reconstruction Using Generative Adversarial Networks for Human Activity Recognition Under Occlusion
by Ioannis Vernikos and Evaggelos Spyrou
Sensors 2025, 25(5), 1567; https://doi.org/10.3390/s25051567 - 4 Mar 2025
Viewed by 104
Abstract
Recognizing human activities from motion data is a complex task in computer vision, involving the recognition of human behaviors from sequences of 3D motion data. These activities encompass successive body part movements, interactions with objects, or group dynamics. Camera-based recognition methods are cost-effective and perform well under controlled conditions but face challenges in real-world scenarios due to factors such as viewpoint changes, illumination variations, and occlusion. The latter is the most significant challenge in real-world recognition; partial occlusion impacts recognition accuracy to varying degrees depending on the activity and the occluded body parts while complete occlusion can render activity recognition impossible. In this paper, we propose a novel approach for human activity recognition in the presence of partial occlusion, which may be applied in cases wherein up to two body parts are occluded. The proposed approach works under the assumptions that (a) human motion is modeled using a set of 3D skeletal joints, and (b) the same body parts remain occluded throughout the whole activity. Contrary to previous research, in this work, we address this problem using a Generative Adversarial Network (GAN). Specifically, we train a Convolutional Recurrent Neural Network (CRNN), whose goal is to serve as the generator of the GAN. Its aim is to complete the missing parts of the skeleton due to occlusion. Specifically, the input to this CRNN consists of raw 3D skeleton joint positions, upon the removal of joints corresponding to occluded parts. The output of the CRNN is a reconstructed skeleton. For the discriminator of the GAN, we use a simple long short-term memory (LSTM) network. We evaluate the proposed approach using publicly available datasets in a series of occlusion scenarios. We demonstrate that in all scenarios, the occlusion of certain body parts causes a significant decline in performance, although in some cases, the reconstruction process leads to almost perfect recognition. Nonetheless, in almost every circumstance, the herein proposed approach exhibits superior performance compared to previous works, which varies between 2.2% and 37.5%, depending on the dataset used and the occlusion case.
(This article belongs to the Special Issue Robust Motion Recognition Based on Sensor Technology)
Figures:
Figure 1: A human body pose with the 20 and 25 skeletal joints that are extracted using the Microsoft Kinect v1 (left) and v2 (right) cameras. Joints have been divided into subsets, each corresponding to one of the five main body parts, i.e., torso (blue), left hand (green), right hand (red), left leg (orange), and right leg (magenta). For illustrative purposes and to facilitate comparisons between the two versions, body parts have been colored using the same colors. Numbering follows the Kinect SDK in both cases; therefore, there exist several differences between the two versions.
Figure 2: Example skeleton sequences of the activities (a) handshaking and (b) hugging other person from the PKU-MMD dataset, captured by Microsoft Kinect v2. First row: original skeletons, including all 25 joints (i.e., without any occlusion); second row: joints corresponding to (a) the left arm and (b) both arms (see Figure 1) have been discarded (i.e., the skeleton is partially occluded); third row: skeletons reconstructed using the proposed deep regression approach. Example (a) is successfully reconstructed and correctly classified, while example (b) is unsuccessfully reconstructed and incorrectly classified.
Figure 3: The architecture of the generator of the proposed GAN.
Figure 4: The architecture of the discriminator of the proposed GAN architecture.
Figure 5: A visual overview of the proposed approach.
Figure 6: The architecture of the classifier of the proposed approach for the three-camera case.
Figure 7: The architecture of the classifier of the proposed approach for the one-camera case.
Figure 8: Normalized confusion matrices for classification for all datasets, without removing any body part.
Figure 9: Confidence intervals using the proposed approach on all datasets, compared with the best weighted accuracies reported in previous works. For the proposed approach, the red dot denotes the upper bound of the confidence interval, i.e., the best weighted accuracy achieved.
Figure 10: Normalized confusion matrices for classification for the NTU-RGB+D dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
Figure 11: Normalized confusion matrices for classification for the PKU-MMD dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
Figure 12: Normalized confusion matrices for classification for the SYSU-3D-HOI dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
Figure 13: Normalized confusion matrices for classification for the UT-Kinect-Action-3D dataset. LA, RA, LL, and RL correspond to cases of occluded Left Arm, Right Arm, Left Leg, and Right Leg, respectively.
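The abstract describes a GAN in which a recurrent generator fills in occluded joints and a simple LSTM discriminator judges whether a skeleton sequence looks real. The sketch below is a minimal, assumption-laden training step in that spirit: joint layout (25 joints x 3 coordinates), sequence length, GRU generator, losses, and learning rates are all placeholders rather than the paper's architecture.

```python
import torch
import torch.nn as nn

J = 25 * 3          # 25 joints x (x, y, z) per frame (assumed layout)
T = 50              # frames per sequence (placeholder)

class Generator(nn.Module):          # stands in for the paper's CRNN generator
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(J, 128, batch_first=True)
        self.out = nn.Linear(128, J)
    def forward(self, masked_seq):                   # occluded joints zeroed out
        h, _ = self.rnn(masked_seq)
        return self.out(h)                           # reconstructed joint sequence

class Discriminator(nn.Module):      # simple LSTM real/fake scorer, as the abstract describes
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(J, 64, batch_first=True)
        self.out = nn.Linear(64, 1)
    def forward(self, seq):
        h, _ = self.rnn(seq)
        return self.out(h[:, -1])                    # logit from the last time step

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), 1e-4)
opt_d = torch.optim.Adam(D.parameters(), 1e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, T, J)                          # full skeletons (dummy data)
mask = torch.ones_like(real); mask[:, :, :12] = 0    # pretend one body part stays occluded
fake = G(real * mask)

# discriminator step: real sequences vs. reconstructed ones
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# generator step: fool D and stay close to the ground-truth joints
g_loss = bce(D(fake), torch.ones(8, 1)) + nn.functional.mse_loss(fake, real)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```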
32 pages, 16524 KiB  
Article
HI-CMAIM: Hybrid Intelligence-Based Multi-Source Unstructured Chinese Map Annotation Interpretation Model
by Jiaxin Ren, Wanzeng Liu, Jun Chen, Xiuli Zhu, Ran Li, Tingting Zhao, Jiadong Zhang, Yuan Tao, Shunxi Yin, Xi Zhai, Yunlu Peng and Xinpeng Wang
Remote Sens. 2025, 17(2), 204; https://doi.org/10.3390/rs17020204 - 8 Jan 2025
Viewed by 558
Abstract
Map annotation interpretation is crucial for geographic information extraction and intelligent map analysis. This study addresses the challenges associated with interpreting Chinese map annotations, specifically visual complexity and data scarcity issues, by proposing a hybrid intelligence-based multi-source unstructured Chinese map annotation interpretation method (HI-CMAIM). Firstly, leveraging expert knowledge in an innovative way, we constructed a high-quality expert knowledge-based map annotation dataset (EKMAD), which significantly enhanced data diversity and accuracy. Furthermore, an improved annotation detection model (CMA-DB) and an improved annotation recognition model (CMA-CRNN) were designed based on the characteristics of map annotations, both incorporating expert knowledge. A two-stage transfer learning strategy was employed to tackle the issue of limited training samples. Experimental results demonstrated the superiority of HI-CMAIM over existing algorithms. In the detection task, CMA-DB achieved an 8.54% improvement in Hmean (from 87.73% to 96.27%) compared to the DB algorithm. In the recognition task, CMA-CRNN achieved a 15.54% improvement in accuracy (from 79.77% to 95.31%) and a 4-fold reduction in NED (from 0.1026 to 0.0242), confirming the effectiveness and advancement of the proposed method. This research not only provides a novel approach and data support for Chinese map annotation interpretation but also fills the gap of high-quality, diverse datasets. It holds practical application value in fields such as geographic information systems and cartography, significantly contributing to the advancement of intelligent map interpretation.
(This article belongs to the Special Issue Deep Learning Innovations in Remote Sensing)
Figures:
Figure 1: Overall strategy for map annotation interpretation.
Figure 2: Hybrid intelligence-based method for interpreting multi-source unstructured Chinese map annotations.
Figure 3: Knowledge-guided construction method for the intelligent Chinese map annotation dataset.
Figure 4: Overall architecture of PP-HGNet.
Figure 5: ESE module.
Figure 6: Improved LKPAN module.
Figure 7: Overall architecture of CRNN.
Figure 8: ResNet-34 architecture.
Figure 9: Two types of residual block.
Figure 10: Overview of the transfer learning strategy.
Figure 11: Transfer learning-based CMA-CRNN.
Figure 12: Comparison of annotation detection results on topographic maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 13: Comparison of annotation detection results on administrative maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 14: Comparison of annotation detection results on internet maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 15: Comparison of annotation detection results on image maps. Green boxes indicate correct detections, blue boxes represent false detections, and red boxes denote missed detections.
Figure 16: Recognition results for annotations of different styles. Red highlights indicate recognition errors, and red underlines denote missing characters. (a) Low-contrast annotations, (b) annotations in complex backgrounds, (c) curved or rotated annotations, (d) incomplete annotations, (e) similar-looking characters, (f) composite characters.
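The abstract highlights a two-stage transfer learning strategy for coping with scarce map-annotation samples. The following sketch shows only the generic freeze-then-adapt pattern that phrase usually refers to; the loaders, loss function, learning rates, and the backbone/head split are illustrative assumptions, not the CMA-CRNN specifics.

```python
import torch
import torch.nn as nn

def two_stage_finetune(model: nn.Module, backbone: nn.Module, head: nn.Module,
                       source_loader, target_loader, loss_fn):
    """Generic two-stage transfer learning: (1) train the whole model on a large
    source-domain set, (2) freeze the backbone and adapt only the head on the
    small target-domain set (here, map annotations)."""
    # Stage 1: full training on source-domain data (e.g., generic scene text)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for imgs, labels in source_loader:
        opt.zero_grad()
        loss_fn(model(imgs), labels).backward()
        opt.step()

    # Stage 2: freeze the feature extractor, fine-tune the recognition head only
    for p in backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)   # smaller LR for adaptation
    for imgs, labels in target_loader:
        opt.zero_grad()
        loss_fn(model(imgs), labels).backward()
        opt.step()
    return model
```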
16 pages, 952 KiB  
Article
SiCRNN: A Siamese Approach for Sleep Apnea Identification via Tracheal Microphone Signals
by Davide Lillini, Carlo Aironi, Lucia Migliorelli, Leonardo Gabrielli and Stefano Squartini
Sensors 2024, 24(23), 7782; https://doi.org/10.3390/s24237782 - 5 Dec 2024
Viewed by 918
Abstract
Sleep apnea syndrome (SAS) affects about 3–7% of the global population, but is often undiagnosed. It involves pauses in breathing during sleep, for at least 10 s, due to partial or total airway blockage. The current gold standard for diagnosing SAS is polysomnography (PSG), an intrusive procedure that depends on subjective assessment by expert clinicians. To address the limitations of PSG, we propose a decision support system, which uses a tracheal microphone for data collection and a deep learning (DL) approach—namely SiCRNN—to detect apnea events during overnight sleep recordings. Our proposed SiCRNN processes Mel spectrograms using a Siamese approach, integrating a convolutional neural network (CNN) backbone and a bidirectional gated recurrent unit (GRU). The final detection of apnea events is performed using an unsupervised clustering algorithm, specifically k-means. Multiple experimental runs were carried out to determine the optimal network configuration and the most suitable type and frequency range for the input data. Tests with data from eight patients showed that our method can achieve a Recall score of up to 95% for apnea events. We also compared the proposed approach to a fully convolutional baseline, recently introduced in the literature, highlighting the effectiveness of the Siamese training paradigm in improving the identification of SAS.
Figures:
Figure 1: The scatter plots illustrate the output of principal component analysis (PCA) applied to the output of the final GRU layer in the SiCRNN model. The resulting embeddings are derived from two patients under two conditions: (a) noise-free patient embeddings and (b) noisy patient embeddings. The observed distances between the apnea and non-apnea clusters are 2.0 in the noise-free scenario and 0.87 in the presence of noise, respectively.
Figure 2: Overview of the proposed SiCRNN framework. The purple dashed line highlights the Siamese configuration employed during the training phase, whereas the green dashed line corresponds to the inference phase, which is carried out through the k-means clustering algorithm.
Figure 3: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of GRU hidden layers used during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of orange represent the number of convolutional blocks used in the model's training.
Figure 4: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the kernel size used during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of gray represent the kernel size used in the model's training.
Figure 5: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of MEL bands selected for each input sample frequency during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of blue represent the number of MEL bands used in the model's training.
Figure 6: The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of convolutional blocks used during training. The x-axis reports Precision values, the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of green represent the number of convolutional blocks used in the model's training.
Figure 7: (a) The region located below the top of the red mask indicates the apnea events; (b,c) spectrograms with the labeled red mask display an apnea event with significant spectral content. The time associated with each individual bin in the spectrograms is 11.56 ms.
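SiCRNN, as described above, learns a CNN + bidirectional GRU embedding of Mel spectrogram segments with a Siamese training setup and then separates apnea from non-apnea segments with k-means at inference time. The sketch below is one plausible reading of that recipe using a contrastive pair loss and scikit-learn's KMeans; the network sizes, loss choice, and data shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

class MelEmbedder(nn.Module):
    """CNN + bidirectional GRU embedder over Mel spectrogram segments (sizes are placeholders)."""
    def __init__(self, n_mels=64, emb_dim=32):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.gru = nn.GRU(16 * (n_mels // 2), 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, emb_dim)

    def forward(self, x):                                # x: (B, 1, n_mels, T)
        f = self.cnn(x).permute(0, 3, 1, 2).flatten(2)   # (B, T', C * mels')
        h, _ = self.gru(f)
        return self.fc(h[:, -1])                         # one embedding per segment

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull same-class pairs together, push different-class pairs at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

net = MelEmbedder()
a, b = torch.randn(8, 1, 64, 100), torch.randn(8, 1, 64, 100)   # dummy spectrogram pairs
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(net(a), net(b), same)

# Inference: cluster segment embeddings into two groups (apnea / non-apnea) with k-means
emb = net(torch.randn(200, 1, 64, 100)).detach().numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(emb)
```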
13 pages, 2404 KiB  
Article
Automated Cough Analysis with Convolutional Recurrent Neural Network
by Yiping Wang, Mustafaa Wahab, Tianqi Hong, Kyle Molinari, Gail M. Gauvreau, Ruth P. Cusack, Zhen Gao, Imran Satia and Qiyin Fang
Bioengineering 2024, 11(11), 1105; https://doi.org/10.3390/bioengineering11111105 - 1 Nov 2024
Viewed by 1069
Abstract
Chronic cough is associated with several respiratory diseases and is a significant burden on physical, social, and psychological health. Non-invasive, real-time, continuous, and quantitative monitoring tools are highly desired to assess cough severity, the effectiveness of treatment, and monitor disease progression in clinical practice and research. There are currently limited tools to quantitatively measure spontaneous coughs in daily living settings in clinical trials and in clinical practice. In this study, we developed a machine learning model for the detection and classification of cough sounds. Mel spectrograms are utilized as a key feature representation to capture the temporal and spectral characteristics of coughs. We applied this approach to automate cough analysis using 300 h of audio recordings from cough challenge clinical studies conducted in a clinical lab setting. A number of machine learning algorithms were studied and compared, including decision tree, support vector machine, k-nearest neighbors, logistic regression, random forest, and neural network. We identified that for this dataset, the CRNN approach is the most effective method, reaching 98% accuracy in identifying individual coughs from the audio data. These findings provide insights into the strengths and limitations of various algorithms, highlighting the potential of CRNNs in analyzing complex cough patterns. This research demonstrates the potential of neural network models in fully automated cough monitoring. The approach requires validation in detecting spontaneous coughs in patients with refractory chronic cough in a real-life setting.
Figures:
Figure 1: Examples of cough data in time- (a) and frequency-domain (b–d) formats: (a) 2 min amplitude graph, with coughing events marked by red lines; (b) Mel spectrogram of the same audio segment; (c) 0.5 s audio segment containing a cough occurring between 1217.0 and 1217.3 s; (d) 0.5 s audio segment that does not contain a cough. The color bar shows the volume of the recording in decibels, from yellow (0 dB) to black (80 dB).
Figure 2: The structure of the CRNN model.
Figure 3: Cross-validation results of models trained and tested on different datasets. The metrics used were accuracy, sensitivity, and specificity. Blue represents the full dataset, orange represents set A, green represents set B and red represents. (a) Accuracy across train–test combinations. (b) Sensitivity across train–test combinations. (c) Specificity across train–test combinations.
Figure 4: Training and testing performance metrics of the CRNN model across 50 epochs. In the left panel, the blue line shows the training loss and the orange line the test loss. In the right panel, the blue line shows training accuracy, orange test accuracy, green training sensitivity, red test sensitivity, purple training specificity, and brown test specificity.
Figure 5: ROC curves of all four machine learning models and the CRNN model. All models were trained and tested on the filtered CCH dataset. The ROC curve of the CRNN is from the best mixed-performance model.
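The study above feeds short audio segments to the CRNN as Mel spectrograms. The following sketch shows one common way to produce such a feature with librosa; the window length, FFT size, hop length, and number of Mel bands are assumed values, not the ones used in the paper.

```python
import numpy as np
import librosa

def mel_segment(audio: np.ndarray, sr: int, start_s: float, dur_s: float = 0.5,
                n_mels: int = 64) -> np.ndarray:
    """Cut a short window from a recording and convert it to a log-Mel spectrogram,
    the kind of 2-D feature a CRNN cough classifier would consume."""
    seg = audio[int(start_s * sr): int((start_s + dur_s) * sr)]
    mel = librosa.feature.melspectrogram(y=seg, sr=sr, n_mels=n_mels,
                                         n_fft=1024, hop_length=256)
    return librosa.power_to_db(mel, ref=np.max)       # (n_mels, frames), in dB

sr = 16000
audio = np.random.randn(sr * 5).astype(np.float32)    # dummy 5 s recording
feat = mel_segment(audio, sr, start_s=1.0)
print(feat.shape)                                     # e.g., (64, 32)
```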
20 pages, 3003 KiB  
Article
Equipment Sounds’ Event Localization and Detection Using Synthetic Multi-Channel Audio Signal to Support Collision Hazard Prevention
by Kehinde Elelu, Tuyen Le and Chau Le
Buildings 2024, 14(11), 3347; https://doi.org/10.3390/buildings14113347 - 23 Oct 2024
Viewed by 709
Abstract
Construction workplaces often face unforeseen collision hazards due to a decline in auditory situational awareness among on-foot workers, leading to severe injuries and fatalities. Previous studies that used auditory signals to prevent collision hazards focused on employing a classical beamforming approach to determine equipment sounds’ Direction of Arrival (DOA). No existing frameworks implement a neural network-based approach for both equipment sound classification and localization. This paper presents an innovative framework for sound classification and localization using multichannel sound datasets artificially synthesized in a virtual three-dimensional space. The simulation synthesized 10,000 multi-channel datasets using just fourteen single sound source audiotapes. This training includes a two-staged convolutional recurrent neural network (CRNN), where the first stage learns multi-label sound event classes followed by the second stage to estimate their DOA. The proposed framework achieves a low average DOA error of 30 degrees and a high F-score of 0.98, demonstrating accurate localization and classification of equipment near workers’ positions on the site.
(This article belongs to the Special Issue Big Data Technologies in Construction Management)
Figures:
Figure 1: Multichannel audio-based collision hazard detection pipeline.
Figure 2: Spectrogram for (left) crane, mobile equipment; (right) saw, stationary equipment.
Figure 3: Sample simulation setup.
Figure 4: Sample scenario of equipment moving toward workers on a construction site. One piece of equipment is mobile, moving towards the right (left ball), and another is stationary (right ball). (A) Initial position of both pieces of equipment sound; (B) the mobile equipment (left ball) approaches the workers, while the stationary equipment (right ball) remains in place; (C) the mobile equipment is halfway toward the workers, with a potential collision hazard emerging; (D) the mobile equipment reaches its closest point to the workers.
Figure 5: Two-stage sound event detection and localization network.
Figure 6: SELD score for scenarios with both stationary and mobile equipment.
Figure 7: SELD score for scenarios with two concurrent mobile equipment.
Figure 8: DOA error distribution across different equipment types.
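The abstract describes a two-staged CRNN: one stage predicts multi-label sound event activity and the other estimates direction of arrival. As a hedged illustration of how such dual outputs are commonly wired onto a shared convolutional-recurrent trunk, the sketch below uses placeholder feature sizes, a 4-microphone input, 14 classes, and a single-angle (azimuth) DOA head; none of this is taken from the paper's actual configuration.

```python
import torch
import torch.nn as nn

class SELDNet(nn.Module):
    """Shared CRNN trunk with two heads: multi-label event activity (sigmoid)
    and per-class direction-of-arrival regression (azimuth in degrees)."""
    def __init__(self, n_feat=64, n_classes=14):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.rnn = nn.GRU(32 * (n_feat // 2), 128, batch_first=True, bidirectional=True)
        self.sed_head = nn.Linear(256, n_classes)          # stage 1: which sources are active
        self.doa_head = nn.Linear(256, n_classes)          # stage 2: one azimuth per class

    def forward(self, x):                                  # x: (B, 4 mics, n_feat, T)
        f = self.cnn(x).permute(0, 3, 1, 2).flatten(2)     # (B, T, C * F')
        h, _ = self.rnn(f)
        sed = torch.sigmoid(self.sed_head(h))              # per-frame event probabilities
        doa = 180.0 * torch.tanh(self.doa_head(h))         # azimuth constrained to [-180, 180] degrees
        return sed, doa

sed, doa = SELDNet()(torch.randn(2, 4, 64, 100))
print(sed.shape, doa.shape)     # torch.Size([2, 100, 14]) torch.Size([2, 100, 14])
```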
13 pages, 3573 KiB  
Review
Cornulin as a Key Diagnostic and Prognostic Biomarker in Cancers of the Squamous Epithelium
by Varun Shankavaram, Dean Shah, Aseel Alashqar, Jackson Sweeney and Hilal Arnouk
Genes 2024, 15(9), 1122; https://doi.org/10.3390/genes15091122 - 26 Aug 2024
Viewed by 1820
Abstract
The prevalence of squamous cell carcinoma is increasing, and efforts that aid in an early and accurate diagnosis are crucial to improve clinical outcomes for patients. Cornulin, a squamous epithelium-specific protein, has recently garnered attention due to its implications in the progression of squamous cell carcinoma developed in several tissues. As an epidermal differentiation marker, it is involved in skin anchoring, regulating cellular proliferation, and is a putative tumor suppressor. The physiologically healthy squamous epithelium displays a considerable level of Cornulin, whereas squamous cell carcinomas have marked downregulation, suggesting that Cornulin expression levels can be utilized for the early detection and follow-up on the progression of these types of cancer. Cornulin’s expression patterns in cervical cancer have been examined, and findings support the stepwise downregulation of Cornulin levels that accompanies the progression to neoplasia in the cervix. Additional studies documented a similar trend in expression in other types of cancer, such as cutaneous, esophageal, and oropharyngeal squamous cell carcinomas. The consistent and predictable pattern of Cornulin expression across several squamous cell carcinomas and its correlation with key clinicopathological parameters make it a reliable biomarker for assessing the transformation and progression events in the squamous epithelium, thus potentially contributing to the early detection, definitive diagnosis, and more favorable prognosis for these cancer patients.
(This article belongs to the Special Issue Molecular Diagnostic and Prognostic Markers of Human Cancers)
Figures:
Graphical abstract.
Figure 1: Schematic graph showing the correlation between the downregulation of Cornulin expression and the progression of oral squamous cell carcinomas from normal oral mucosa to dysplastic premalignant lesions to invasive phenotypes. Representative immunohistochemistry staining for Cornulin in normal oral mucosa (A), a leukoplakia lesion (B), and oral squamous cell carcinoma (C). Similar trends have been documented for cervical and esophageal cancers.
Figure 2: Illustration of evaluating the extent of tumor spread and margins using direct visual examination of the tumor mass (yellow zone), microscopic examination of histological alterations (green zone), and molecular studies to reveal genetic and proteomic alterations in the precursor fields (red zone) that can undergo malignant transformation leading to local relapses in head and neck cancer patients.
Figure 3: Cornulin expression around keratin pearls in well-differentiated cutaneous squamous cell carcinoma tissue samples. Representative images: (A) H&E-stained and (B) immunohistochemistry-stained sections show intense Cornulin immunoreactivity in the central keratinocytes (green asterisk) adjacent to the keratin pearl (dotted circle), while the peripheral keratinocytes (red asterisk) do not show any detectable levels of Cornulin expression.
21 pages, 4424 KiB  
Article
CSA-SA-CRTNN: A Dual-Stream Adaptive Convolutional Cyclic Hybrid Network Combining Attention Mechanisms for EEG Emotion Recognition
by Ren Qian, Xin Xiong, Jianhua Zhou, Hongde Yu and Kaiwen Sha
Brain Sci. 2024, 14(8), 817; https://doi.org/10.3390/brainsci14080817 - 15 Aug 2024
Viewed by 1088
Abstract
In recent years, EEG-based emotion recognition technology has made progress, but there are still problems of low model efficiency and loss of emotional information, and there is still room for improvement in recognition accuracy. To fully utilize EEG’s emotional information and improve recognition accuracy while reducing computational costs, this paper proposes a Convolutional-Recurrent Hybrid Network with a dual-stream adaptive approach and an attention mechanism (CSA-SA-CRTNN). Firstly, the model utilizes a CSAM module to assign corresponding weights to EEG channels. Then, an adaptive dual-stream convolutional-recurrent network (SA-CRNN and MHSA-CRNN) is applied to extract local spatial-temporal features. After that, the extracted local features are concatenated and fed into a temporal convolutional network with a multi-head self-attention mechanism (MHSA-TCN) to capture global information. Finally, the extracted EEG information is used for emotion classification. We conducted binary and ternary classification experiments on the DEAP dataset, achieving 99.26% and 99.15% accuracy for arousal and valence in binary classification and 97.69% and 98.05% in ternary classification, and on the SEED dataset, we achieved an accuracy of 98.63%, surpassing relevant algorithms. Additionally, the model’s efficiency is significantly higher than other models, achieving better accuracy with lower resource consumption.
(This article belongs to the Section Neurotechnology and Neuroimaging)
Figures:
Graphical abstract.
Figure 1: Emotion model: (a) discrete model, (b) two-dimensional valence–arousal model.
Figure 2: Frame diagram of the CSA-SA-CRTNN model. The model consists of four modules, namely the CSAM module, the SA-CRNN module, the MHSA-CRNN module, and the MHSA-TCN module.
Figure 3: CSAM structure diagram.
Figure 4: MHSA structure diagram.
Figure 5: Structure diagram of MHSA-TCN.
Figure 6: Accuracy–epoch relationship diagram. (a) DEAP dataset; (b) SEED dataset.
Figure 7: The average accuracy of arousal and valence on DEAP using CSA-SA-CRTNN for each subject. (a) 2-class; (b) 3-class.
Figure 8: Confusion matrices: (a) 2-class arousal; (b) 2-class valence; (c) 3-class arousal; (d) 3-class valence.
Figure 9: Experimental results on the SEED dataset: (a) average accuracy for each subject; (b) confusion matrix.
Figure 10: Comparison of attention mechanisms in different channels.
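The CSAM module is described as assigning weights to EEG channels before the dual-stream network. The paper's exact module is not reproduced here; the sketch below shows a generic squeeze-and-excitation-style channel weighting that performs the same role, with channel count, window length, and reduction ratio chosen arbitrarily.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style weighting of EEG channels: summarize each channel
    over time, score it with a small MLP, and rescale the input accordingly."""
    def __init__(self, n_channels=32, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction), nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, channels, time)
        summary = x.mean(dim=2)           # squeeze: per-channel average over time
        w = self.mlp(summary)             # excitation: per-channel weight in (0, 1)
        return x * w.unsqueeze(-1)        # reweight each EEG channel

eeg = torch.randn(8, 32, 384)             # 32 channels, 3 s at 128 Hz (placeholder shape)
weighted = ChannelAttention(32)(eeg)
print(weighted.shape)                      # torch.Size([8, 32, 384])
```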
22 pages, 8725 KiB  
Article
Adaptive CAPTCHA: A CRNN-Based Text CAPTCHA Solver with Adaptive Fusion Filter Networks
by Xing Wan, Juliana Johari and Fazlina Ahmat Ruslan
Appl. Sci. 2024, 14(12), 5016; https://doi.org/10.3390/app14125016 - 8 Jun 2024
Cited by 1 | Viewed by 2445
Abstract
Text-based CAPTCHAs remain the most widely adopted security scheme, which is the first barrier to securing websites. Deep learning methods, especially Convolutional Neural Networks (CNNs), are the mainstream approach for text CAPTCHA recognition and are widely used in CAPTCHA vulnerability assessment and data collection. However, verification code recognizers are mostly deployed on the CPU platform as part of a web crawler and security assessment; they are required to have both low complexity and high recognition accuracy. Due to the specifically designed anti-attack mechanisms like noise, interference, geometric deformation, twisting, rotation, and character adhesion in text CAPTCHAs, some characters are difficult to efficiently identify with high accuracy in these complex CAPTCHA images. This paper proposed a recognition model named Adaptive CAPTCHA with a CNN combined with an RNN (CRNN) module and trainable Adaptive Fusion Filtering Networks (AFFN), which effectively handle the interference and learn the correlation between characters in CAPTCHAs to enhance recognition accuracy. Experimental results on two datasets of different complexities show that, compared with the baseline model Deep CAPTCHA, the number of parameters of our proposed model is reduced by about 70%, and the recognition accuracy is improved by more than 10 percentage points in the two datasets. In addition, the proposed model has a faster training convergence speed. Compared with several of the latest models, the model proposed by the study also has better comprehensive performance.
(This article belongs to the Special Issue Advanced Technologies in Data and Information Security III)
Figures:
Figure 1: The network of Deep CAPTCHA.
Figure 2: Some confusing adjacent characters in CAPTCHAs.
Figure 3: The networks of Adaptive CAPTCHA.
Figure 4: Samples of the datasets: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 5: Character statistical distributions: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 6: Markov transition probabilities between characters on the M-CAPTCHA.
Figure 7: The structure of AFFN.
Figure 8: The training process of Alpha.
Figure 9: The structure of CRNN.
Figure 10: ASR with and without filter networks on the M-CAPTCHA.
Figure 11: ASR with and without filter networks on the P-CAPTCHA.
Figure 12: Comparison of images before and after filtering: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 13: Loss comparison before and after filtering: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 14: AASR with different filter units on the P-dataset.
Figure 15: AASR comparison using FC and CRNN: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 16: AASR comparison with and without BN in CRNN.
Figure 17: AASR with different layers of LSTM on the P-dataset.
Figure 18: AASR with different residual connections: (a) M-CAPTCHA; (b) P-CAPTCHA.
Figure 19: AASR with different loss functions.
Figure 20: AASR confusion matrix of Adaptive CAPTCHA on the M-dataset.
Figure 21: AASR of different models.
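CRNN-based text recognizers of this kind typically emit per-timestep character logits that are collapsed into a string. Whether Adaptive CAPTCHA uses exactly this decoding is not stated here; the sketch below shows the standard CTC greedy rule (argmax per step, merge repeats, drop blanks) as a generic reference, with a made-up charset and dummy logits.

```python
import torch

def ctc_greedy_decode(logits: torch.Tensor, charset: str, blank: int = 0) -> list[str]:
    """Collapse per-timestep predictions: take the argmax at each step,
    merge consecutive repeats, and drop the blank symbol."""
    best = logits.argmax(dim=-1)                  # (B, T) best class index per time step
    out = []
    for seq in best:
        chars, prev = [], blank
        for idx in seq.tolist():
            if idx != blank and idx != prev:
                chars.append(charset[idx - 1])    # index 0 is reserved for the blank
            prev = idx
        out.append("".join(chars))
    return out

charset = "abcdefghijklmnopqrstuvwxyz0123456789"
logits = torch.randn(2, 20, len(charset) + 1)     # dummy CRNN output: (batch, time, classes)
print(ctc_greedy_decode(logits, charset))
```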
16 pages, 4245 KiB  
Article
CrnnCrispr: An Interpretable Deep Learning Method for CRISPR/Cas9 sgRNA On-Target Activity Prediction
by Wentao Zhu, Huanzeng Xie, Yaowen Chen and Guishan Zhang
Int. J. Mol. Sci. 2024, 25(8), 4429; https://doi.org/10.3390/ijms25084429 - 17 Apr 2024
Cited by 3 | Viewed by 2236
Abstract
CRISPR/Cas9 is a powerful genome-editing tool in biology, but its wide applications are challenged by a lack of knowledge governing single-guide RNA (sgRNA) activity. Several deep-learning-based methods have been developed for the prediction of on-target activity. However, there is still room for improvement. Here, we proposed a hybrid neural network named CrnnCrispr, which integrates a convolutional neural network and a recurrent neural network for on-target activity prediction. We performed unbiased experiments with four mainstream methods on nine public datasets with varying sample sizes. Additionally, we incorporated a transfer learning strategy to boost the prediction power on small-scale datasets. Our results showed that CrnnCrispr outperformed existing methods in terms of accuracy and generalizability. Finally, we applied a visualization approach to investigate the generalizable nucleotide-position-dependent patterns of sgRNAs for on-target activity, which shows potential in terms of model interpretability and further helps in understanding the principles of sgRNA design.
Figures:
Figure 1: The heatmap shows (a) mean SCC and (b) mean PCC values of CrnnCrispr and four compared methods on nine datasets of three scales (large, medium, and small). The prediction methods are placed vertically, whereas the test datasets are arranged horizontally. Test datasets are classified by sample size.
Figure 2: Performance comparison of CrnnCrispr trained from scratch and with transfer learning on three small-scale datasets (HCT116, HELA, and HL60) under 5-fold cross-validation.
Figure 3: Performance comparison in terms of SCC of CrnnCrispr and four existing deep-learning-based methods on nine datasets of various scales under a leave-one-cell-out procedure.
Figure 4: Impact of nucleotide composition on sgRNA activity on three large-scale datasets. Bars show the Z-scores of nucleotide frequency for each position. The numbers below represent the positions of the sequence.
Figure 5: Illustration of the CrnnCrispr architecture. The sgRNA is first encoded by one-hot encoding and label encoding and is subsequently used as input to the CNN branch and BiGRU branch, respectively. The outputs of these two branches are concatenated and fed into two LSTM layers for dimensionality reduction. The outputs are flattened and passed through three fully connected layers to generate the final representation, and the final fully connected layer feeds a linear regression transformation that predicts sgRNA on-target activity.
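Figure 5 notes that each sgRNA is encoded twice: one-hot for the CNN branch and label (integer) encoding for the BiGRU branch. The snippet below illustrates those two standard encodings in NumPy; the 23-nt example sequence (20-nt guide plus an NGG PAM) is purely illustrative and not taken from the paper's data.

```python
import numpy as np

BASES = "ACGT"

def one_hot_encode(seq: str) -> np.ndarray:
    """One-hot matrix of shape (len(seq), 4) for the CNN branch."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        mat[i, BASES.index(base)] = 1.0
    return mat

def label_encode(seq: str) -> np.ndarray:
    """Integer codes 1..4 for the recurrent (BiGRU) branch."""
    return np.array([BASES.index(b) + 1 for b in seq.upper()], dtype=np.int64)

sgRNA = "GACGCATAAAGATGAGACGCTGG"   # illustrative 23-nt example: 20-nt guide + NGG PAM
print(one_hot_encode(sgRNA).shape)  # (23, 4)
print(label_encode(sgRNA))
```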
20 pages, 15351 KiB  
Article
Intelligent Analysis System for Teaching and Learning Cognitive Engagement Based on Computer Vision in an Immersive Virtual Reality Environment
by Ce Li, Li Wang, Quanzhi Li and Dongxuan Wang
Appl. Sci. 2024, 14(8), 3149; https://doi.org/10.3390/app14083149 - 9 Apr 2024
Cited by 1 | Viewed by 1297
Abstract
The 20th National Congress of the Communist Party of China and the 14th Five Year Plan for Education Informatization focus on digital technology and intelligent learning and implement innovation-driven education environment reform. An immersive virtual reality (IVR) environment has both immersive and interactive characteristics, which are an important way of virtual learning and are also one of the important ways in which to promote the development of smart education. Based on the above background, this article proposes an intelligent analysis system for Teaching and Learning Cognitive engagement in an IVR environment based on computer vision. By automatically analyzing the cognitive investment of students in the IVR environment, it is possible to better understand their learning status, provide personalized guidance to improve learning quality, and thereby promote the development of smart education. This system uses Vue (developed by Evan You, located in Wuxi, China) and ECharts (developed by Baidu, located in Beijing, China) for visual display, and the algorithm uses the Pytorch framework (developed by Facebook, located in Silicon Valley, CA, USA), YOLOv5 (developed by Ultralytics, located in Washington, DC, USA), and the CRNN model (Convolutional Recurrent Neural Network) to monitor and analyze the visual attention and behavioral actions of students. Through this system, a more accurate analysis of learners’ cognitive states and personalized teaching support can be provided for the education field, providing certain technical support for the development of smart education.
Figures:
Figure 1: Overall system design architecture diagram.
Figure 2: Preprocessing steps diagram.
Figure 3: Preprocessing process diagram. The Chinese word in the picture says "red blood cell".
Figure 4: Preprocessing result image. The Chinese word in the picture says "red blood cell".
Figure 5: YOLOv5 architecture diagram.
Figure 6: C3 module structure diagram.
Figure 7: Bottleneck module architecture diagram.
Figure 8: SPP module architecture diagram.
Figure 9: IOU calculation chart.
Figure 10: Changes in NMS processing. The Chinese word in the picture says "red blood cell".
Figure 11: Text OCR layer flowchart.
Figure 12: CRNN model structure diagram.
Figure 13: Network input image. The Chinese word in the picture says "red blood cell".
Figure 14: Text recognition process diagram. The Chinese word in the picture says "red blood cell".
Figure 15: Video frame cutting results.
Figure 16: Video frame cutting results.
Figure 17: YOLOv5 module detection results. The Chinese words in the picture say "red blood cell" and "mitochondria".
Figure 18: YOLOv5 module detection results. The Chinese word in the picture says "vesica".
Figure 19: Text OCR detection module detection results. The Chinese words in the picture say "microtubule" and "white blood cell".
Figure 20: Integration process diagram of detection data.
Figure 21: Comparison chart between the system's detection of various target objects and the actual frame rate.
Figure 22: Accuracy chart of system detection for various target objects.
Figure 23: Visualization page diagram.
Figure 24: Visualization of IVR videos and learner videos.
Figure 25: Select statistical object visualization.
Figure 26: Experimental flow chart.
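Figures 9 and 10 of this entry cover IoU calculation and non-maximum suppression in the YOLOv5 detection stage. These are standard operations, so the minimal NumPy versions below are generic textbook implementations rather than the system's code; box coordinates are assumed to be (x1, y1, x2, y2).

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an array of boxes, boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Keep the highest-scoring box, drop heavily overlapping ones, repeat."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]
```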
17 pages, 8563 KiB  
Article
Research on the Vision-Based Dairy Cow Ear Tag Recognition Method
by Tianhong Gao, Daoerji Fan, Huijuan Wu, Xiangzhong Chen, Shihao Song, Yuxin Sun and Jia Tian
Sensors 2024, 24(7), 2194; https://doi.org/10.3390/s24072194 - 29 Mar 2024
Cited by 3 | Viewed by 1867
Abstract
With the increase in the scale of breeding at modern pastures, the management of dairy cows has become much more challenging, and individual recognition is the key to the implementation of precision farming. Based on the need for low-cost and accurate herd management and for non-stressful and non-invasive individual recognition, we propose a vision-based automatic recognition method for dairy cow ear tags. Firstly, for the detection of cow ear tags, the lightweight Small-YOLOV5s is proposed, and then a differentiable binarization network (DBNet) combined with a convolutional recurrent neural network (CRNN) is used to achieve the recognition of the numbers on ear tags. The experimental results demonstrated notable improvements: Compared to those of YOLOV5s, Small-YOLOV5s enhanced recall by 1.5%, increased the mean average precision by 0.9%, reduced the number of model parameters by 5,447,802, and enhanced the average prediction speed for a single image by 0.5 ms. The final accuracy of the ear tag number recognition was an impressive 92.1%. Moreover, this study introduces two standardized experimental datasets specifically designed for the ear tag detection and recognition of dairy cows. These datasets will be made freely available to researchers in the global dairy cattle community with the intention of fostering intelligent advancements in the breeding industry.
(This article belongs to the Section Smart Agriculture)
Figures:
Figure 1: Some samples of data from CEID-D. Capture angles: frontal, lateral, and overhead views of cows. Weather conditions during shooting: overcast and sunny days. Captured cow poses: standing, feeding, and lying down.
Figure 2: Ear tag image quality assessment.
Figure 3: Preprocessing of ear tag images. From left to right: the original ear tag, the ear tag after bilateral filtering, the ear tag after edge sharpening, and the ear tag after grayscaling.
Figure 4: Ear tag images annotated with Paddlelabel.
Figure 5: Technology roadmap.
Figure 6: The structure of YOLOV5s.
Figure 7: The structure of Small-YOLOV5s.
Figure 8: The structure of CA.
Figure 9: The structure of DBNet.
Figure 10: The structure of the CRNN.
Figure 11: Comparison of cow ear tag detection results. (a) The results of ear tag detection using the color threshold method, with the original image on the left and the detection results on the right. (b,c) The detection results of cow ear tags in different scenarios using Small-YOLOV5s.
Figure 12: Loss decay and recognition accuracy in CRNN training.
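Figure 3 of this entry shows the ear tag preprocessing chain: bilateral filtering, edge sharpening, then grayscaling. The OpenCV sketch below reproduces that chain generically; the filter diameter, sigma values, and sharpening kernel are assumed parameters, not the ones reported in the paper.

```python
import cv2
import numpy as np

def preprocess_ear_tag(img_bgr: np.ndarray) -> np.ndarray:
    """Denoise while preserving digit edges, sharpen them, then convert to grayscale."""
    smoothed = cv2.bilateralFilter(img_bgr, d=9, sigmaColor=75, sigmaSpace=75)
    sharpen_kernel = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]], dtype=np.float32)
    sharpened = cv2.filter2D(smoothed, ddepth=-1, kernel=sharpen_kernel)
    return cv2.cvtColor(sharpened, cv2.COLOR_BGR2GRAY)

tag = (np.random.rand(64, 128, 3) * 255).astype(np.uint8)   # stand-in for a cropped ear tag
gray = preprocess_ear_tag(tag)
print(gray.shape)    # (64, 128)
```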
18 pages, 3564 KiB  
Article
Offline Mongolian Handwriting Recognition Based on Data Augmentation and Improved ECA-Net
by Qing-Dao-Er-Ji Ren, Lele Wang, Zerui Ma and Saheya Barintag
Electronics 2024, 13(5), 835; https://doi.org/10.3390/electronics13050835 - 21 Feb 2024
Cited by 1 | Viewed by 1156
Abstract
Writing is an important carrier of cultural inheritance, and the digitization of handwritten texts is an effective means to protect national culture. Compared to Chinese and English handwriting recognition, the research on Mongolian handwriting recognition started relatively late and achieved few results due to the characteristics of the script itself and the lack of corpus. First, according to the characteristics of Mongolian handwritten characters, the random erasing data augmentation algorithm was modified, and a dual data augmentation (DDA) algorithm was proposed by combining the improved algorithm with horizontal wave transformation (HWT) to augment the dataset for training the Mongolian handwriting recognition. Second, the classical CRNN handwriting recognition model was improved. The structure of the encoder and decoder was adjusted according to the characteristics of the Mongolian script, and the attention mechanism was introduced in the feature extraction and decoding stages of the model. An improved handwriting recognition model, named the EGA model, suitable for the features of Mongolian handwriting was suggested. Finally, the effectiveness of the EGA model was verified by a large number of data tests. Experimental results demonstrated that the proposed EGA model improves the recognition accuracy of Mongolian handwriting, and the structural modification of the encoder and coder effectively balances the recognition accuracy and complexity of the model.
(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
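As a rough illustration of the two operations the abstract combines into dual data augmentation, below is a minimal NumPy sketch of a horizontal wave transformation and a random erasing step restricted to an elliptical area. The amplitude, period, and ellipse-size choices are illustrative assumptions, not the parameters used in the paper.

import numpy as np

def horizontal_wave(img, R=15, period=60):
    """Shift each row horizontally by a sinusoidal offset (amplitude R pixels)."""
    h = img.shape[0]
    out = np.zeros_like(img)
    for y in range(h):
        shift = int(R * np.sin(2 * np.pi * y / period))
        out[y] = np.roll(img[y], shift, axis=0)
    return out

def elliptical_erase(img, rng=np.random.default_rng()):
    """Erase a random elliptical region (instead of the usual rectangle)."""
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    ry, rx = rng.integers(h // 16, h // 6), rng.integers(w // 16, w // 6)
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    img = img.copy()
    img[mask] = rng.integers(0, 256)       # fill the ellipse with a random gray value
    return img

def dual_augment(img):
    return elliptical_erase(horizontal_wave(img))

sample = np.full((64, 256), 255, dtype=np.uint8)   # blank stand-in handwriting strip
augmented = dual_augment(sample)
print(augmented.shape)  # (64, 256)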
Show Figures

Figure 1: Experimental results of Horizontal Wave Transformation. (a) The original image; (b) the image processed by HWTDA with R = 5, 15, 25, 35, and 45 from left to right.
Figure 2: Experimental test of Improved Random Erasing Data Augmentation. (a) The original image; (b) the original REDA algorithm processes images through rectangular areas; (c) the improved REDA algorithm processes images through elliptical areas.
Figure 3: Flow of the Dual Data Augmentation Algorithm.
Figure 4: EGA Model Training Process.
Figure 5: Network LSTM and Network GRU Structure Comparison. (a) LSTM Network Structure; (b) GRU Network Structure (see the parameter-count sketch after this figure list).
Figure 6: Network Structure of the EGA Model.
Figure 7: Offline Mongolian Handwriting Images.
Figure 8: Effect of Dual Data Augmentation. (a) The original image; (b) the image processed only by HWTDA; (c) the image processed by DDA.
Figure 9: CRNN Model Recognition Accuracy Curve before and after Dual Data Augmentation.
Figure 10: Model Loss Value Curve before and after Dual Data Augmentation.
Figure 11: EGA Model Loss Value Curve.
Figure 12: Comparison of Recognition Accuracy between EGA Model and CRNN Model.
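Figure 5 contrasts LSTM and GRU cells, and the abstract notes that the encoder/decoder changes balance accuracy against model complexity. One quick way to see the complexity side is to compare parameter counts of the two layer types at the same width; a small PyTorch sketch follows, with an arbitrary hidden size chosen purely for illustration.

import torch.nn as nn

def param_count(module):
    return sum(p.numel() for p in module.parameters())

hidden = 256
lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
gru = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)

# A GRU has 3 gates versus the LSTM's 4, so roughly three quarters of the parameters.
print(param_count(lstm), param_count(gru))  # 526336 394752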
24 pages, 1950 KiB  
Article
Optimal Training Dataset Preparation for AI-Supported Multilanguage Real-Time OCRs Using Visual Methods
by Attila Biró, Sándor Miklós Szilágyi and László Szilágyi
Appl. Sci. 2023, 13(24), 13107; https://doi.org/10.3390/app132413107 - 8 Dec 2023
Viewed by 1938
Abstract
In the realm of multilingual, AI-powered, real-time optical character recognition (OCR) systems, this research explores the creation of an optimal, vocabulary-based training dataset. The dataset is designed to satisfy a range of criteria: comprehensive language representation, high-quality and diverse data, balanced composition, contextual understanding, domain-specific adaptation, robustness and noise tolerance, and scalability and extensibility. The approach leverages techniques such as convolutional neural networks, recurrent neural networks, convolutional recurrent neural networks, and single visual models for scene text recognition. While focusing on English, Hungarian, and Japanese as representative languages, the proposed methodology can be extended to any existing or even synthesized language. The development of accurate, efficient, and versatile OCR systems is at the core of this research, offering societal benefits by bridging global communication gaps, ensuring reliability in diverse environments, and demonstrating the adaptability of AI to evolving needs. This work both reflects the state of the art in the field and opens new paths for future innovation, underscoring the importance of sustained research in advancing AI's potential to shape societal development. Full article
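One of the criteria the abstract stresses is a balanced dataset across English, Hungarian, and Japanese. A minimal sketch of one possible balancing step, downsampling every language pool to the size of the smallest one, is shown below; the pool names and sizes are made-up placeholders, not the paper's data.

import random
from collections import Counter

# Hypothetical per-language pools of (image_path, transcription) pairs.
pools = {
    "en": [("en_%05d.png" % i, "sample text") for i in range(50_000)],
    "hu": [("hu_%05d.png" % i, "minta szöveg") for i in range(30_000)],
    "jp": [("jp_%05d.png" % i, "サンプル") for i in range(20_000)],
}

def balance(pools, seed=0):
    """Downsample every language to the size of the smallest pool."""
    rng = random.Random(seed)
    target = min(len(v) for v in pools.values())
    return {lang: rng.sample(items, target) for lang, items in pools.items()}

balanced = balance(pools)
print(Counter({lang: len(items) for lang, items in balanced.items()}))
# Counter({'en': 20000, 'hu': 20000, 'jp': 20000})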
Show Figures

Figure 1: Adjusted PaddleOCR architecture (adapted from [19]).
Figure 2: Result of data imbalance mitigation.
Figure 3: Dataset preparation for real-time OCR (adapted from [42,47]).
Figure 4: Sobel filter on data generation: (a) in the case of single-line text; (b) in the case of multiline text (see the sketch after this figure list).
Figure 5: Distribution of text lengths in the train dataset.
Figure 6: Character number distribution.
Figure 7: Text length distribution—experiment 1 (15M_enhujp_v2_1): English–Hungarian–Japanese distribution.
Figure 8: Text length distribution—experiment 1 (30M_enhujp_v2_4): English–Hungarian–Japanese distribution.
Figure 9: Text length distribution—experiment 1 (50M_enhujp_v2_2): English–Hungarian–Japanese distribution.
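Figure 4 shows a Sobel filter applied during data generation. A minimal OpenCV sketch of that edge-emphasis step is given below; the kernel size and the way the two gradient images are combined are assumptions, and the paper may apply the filter differently.

import cv2
import numpy as np

def sobel_edges(gray):
    """Combine horizontal and vertical Sobel gradients into one edge map."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    return cv2.convertScaleAbs(mag)          # back to uint8 for display or storage

img = np.random.randint(0, 256, (64, 320), dtype=np.uint8)  # stand-in text-line image
edges = sobel_edges(img)
print(edges.shape, edges.dtype)              # (64, 320) uint8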
21 pages, 1406 KiB  
Article
Contactless Heart and Respiration Rates Estimation and Classification of Driver Physiological States Using CW Radar and Temporal Neural Networks
by Amal El Abbaoui, David Sodoyer and Fouzia Elbahhar
Sensors 2023, 23(23), 9457; https://doi.org/10.3390/s23239457 - 28 Nov 2023
Cited by 3 | Viewed by 1966
Abstract
The measurement and analysis of vital signs are of significant research interest, particularly for monitoring the driver’s physiological state, which is of crucial importance for road safety. Various approaches using contact techniques have been proposed to measure vital signs. However, all of these methods are invasive and cumbersome for the driver. This paper proposes using a non-contact sensor based on continuous wave (CW) radar at 24 GHz to measure vital signs. We couple these measurements with distinct temporal neural networks that analyze the signals to detect and extract heart and respiration rates and to classify the physiological state of the driver. This approach offers robust performance in estimating the exact values of heart and respiration rates and in classifying the driver’s physiological state. It is non-invasive and requires no physical contact with the driver, making it particularly practical and safe. The results presented in this paper were obtained with a 1D Convolutional Neural Network (1D-CNN), a Temporal Convolutional Network (TCN), a Recurrent Neural Network, specifically a Bidirectional Long Short-Term Memory network (Bi-LSTM), and a Convolutional Recurrent Neural Network (CRNN). Among these, the CRNN emerged as the most effective deep learning approach for vital signal analysis. Full article
(This article belongs to the Section Biomedical Sensors)
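Of the architectures listed in the abstract, the CRNN performed best. Below is a minimal PyTorch sketch of a 1D CRNN of that general kind, with convolutional feature extraction, a recurrent layer, and two heads for state classification and heart/respiration-rate regression; the layer sizes and the number of physiological classes are assumptions for illustration, not the authors' configuration.

import torch
import torch.nn as nn

class RadarCRNN(nn.Module):
    def __init__(self, n_classes=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(                       # local feature extraction
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden, n_classes)  # physiological state
        self.regress = nn.Linear(2 * hidden, 2)           # heart and respiration rates

    def forward(self, x):                  # x: (batch, 1, time)
        h = self.conv(x).transpose(1, 2)   # -> (batch, time', 64)
        h, _ = self.rnn(h)
        h = h.mean(dim=1)                  # average over time
        return self.classify(h), self.regress(h)

model = RadarCRNN()
logits, rates = model(torch.randn(8, 1, 2048))
print(logits.shape, rates.shape)           # torch.Size([8, 3]) torch.Size([8, 2])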
Show Figures

Figure 1: The general architecture of the proposed models.
Figure 2: A dilated causal convolution with dilation factors d = 1, 2, 4, 8, 16, and 32 and a filter size k = 3 (see the sketch after this figure list).
Figure 3: Fundamental mechanism of CW radar.
Figure 4: Loss function of regression models.
Figure 5: Loss function of classification models.
Figure 6: Confusion matrix for each model using the simulated dataset, dependent on individual variances.
Figure 7: Comparative accuracy curves: predicting heart and respiration rates based on physiological state.
Figure 8: Confusion matrix for each model using the simulated dataset, independent of individual variances.
Figure 9: Loss function of classification models.
Figure 10: Loss function of regression models.
Figure 11: Confusion matrix for each model using the real dataset, independent of individual variances.
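Figure 2 describes a dilated causal convolution stack with dilation factors 1, 2, 4, 8, 16, and 32 and kernel size k = 3; with those settings each output sample sees 1 + (k - 1)(1 + 2 + 4 + 8 + 16 + 32) = 127 past samples. A minimal PyTorch sketch of such a causal stack follows; the channel widths are illustrative assumptions.

import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution padded on the left only, so outputs never see the future."""
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__()
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation)

    def forward(self, x):
        x = nn.functional.pad(x, (self.pad, 0))   # left-pad in time
        return self.conv(x)

layers = []
channels = 32
for d in (1, 2, 4, 8, 16, 32):
    layers += [CausalConv1d(1 if d == 1 else channels, channels, k=3, dilation=d), nn.ReLU()]
tcn = nn.Sequential(*layers)

y = tcn(torch.randn(4, 1, 500))
print(y.shape)   # torch.Size([4, 32, 500]) -- length preserved, receptive field 127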