Search Results (549)

Search Parameters:
Keywords = audio design

20 pages, 3437 KiB  
Article
Enhancing Player Experience in a First-Person Shooter with Dynamic Audio Cue Adjustment Based on Gaussian Process Regression
by Xiaoxu Li, Yi Xia, Mustafa Can Gursesli, Xiao You, Siyuan Chen and Ruck Thawonmas
Appl. Sci. 2024, 14(23), 11146; https://doi.org/10.3390/app142311146 - 29 Nov 2024
Abstract
This paper analyzes the experience of first-person shooter (FPS) players when game difficulty is adjusted by personalizing their audio cue settings, considering the balance between player performance, modeled using Gaussian process regression (GPR), and prior data serving as designer preference. In addition, we investigate why player experience changes according to in-game audio cues. Previous studies have proposed various dynamic difficulty adjustment (DDA) methods for FPS games; however, few have considered the role of audio cues in the player experience. This paper compares the player experience of personalized enemy audio cue volume settings (GPR-DDA) with that of predetermined settings in an FPS game. Two comprehensive experimental phases, involving 80 participants, were conducted to assess the efficacy of GPR-DDA. Player experience is measured using questions taken from the Game User Experience Satisfaction Scale (GUESS) questionnaire and a final survey asking for open-ended feedback. A large language model (LLM) is used to analyze the natural language expressions of the players according to their native languages. To ensure the LLM effectively assists a limited number of qualified human evaluators in classifying player responses, we developed an original procedure for this task. The GUESS results show that GPR-DDA can improve the player experience. In addition, the high consistency of the classification results over multiple runs of the selected LLM, as well as the similarity between its results and those of our human evaluators, reflects the reliability of the proposed LLM-assisted procedure.
(This article belongs to the Special Issue Advanced Technologies for User-Centered Design and User Experience)
Figures:
Figure 1: Experimental workflow.
Figure 2: Game scene.
Figure 3: Game map design.
Figure 4: Trend of GUESS values in two phases.
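
A minimal sketch of the core mechanism described above: model player performance as a function of enemy-audio-cue volume with GPR, then pick the volume whose predicted performance is closest to a target difficulty. This is not the authors' implementation; the kernel choice, the hit-rate observations, and the 0.55 target are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical observations: cue volume (0..1) vs. measured player hit rate.
volumes = np.array([[0.1], [0.3], [0.5], [0.7], [0.9]])
hit_rates = np.array([0.35, 0.48, 0.60, 0.71, 0.78])

kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(volumes, hit_rates)

# Choose the volume whose predicted performance is closest to the target
# difficulty (here an assumed 55% hit rate).
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
pred, std = gpr.predict(candidates, return_std=True)
best = candidates[np.argmin(np.abs(pred - 0.55))][0]
print(f"next-session cue volume: {best:.2f}")
```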
17 pages, 2093 KiB  
Article
Investigation of Data Augmentation Techniques in Environmental Sound Recognition
by Anastasios Loukas Sarris, Nikolaos Vryzas, Lazaros Vrysis and Charalampos Dimoulas
Electronics 2024, 13(23), 4719; https://doi.org/10.3390/electronics13234719 - 28 Nov 2024
Abstract
The majority of sound events that occur in everyday life, such as those caused by animals or household devices, belong to the environmental sound family. This audio category has not been researched as much as music or speech recognition. One main bottleneck in the design of environmental data-driven monitoring automation is the lack of sufficient data representing each of a wide range of categories. For audio data, an important way to increase the available data is to augment existing datasets. In this study, some of the most widespread time-domain data augmentation techniques are studied, along with their effects on the recognition of environmental sounds, using the UrbanSound8K dataset, which consists of ten classes. The confusion matrix and the metrics derived from it were used to examine the effect of the augmentation. To evaluate the performance of four time-domain augmentation techniques, a convolutional neural network trained on the original set was used. Although the parameters of the applied techniques were chosen conservatively, they helped the model to better cluster the data, especially in the four classes in which confusion was high in the initial classification. Furthermore, to address the difficulty that arises when large datasets are augmented, a web-based data augmentation application was created, in which users can upload their own data and apply these augmentation techniques to both the audio excerpt and its time-frequency representation, the spectrogram.
(This article belongs to the Special Issue Recent Advances in Audio, Speech and Music Processing and Analysis)
Figures:
Figure 1: Process block diagram of the experiment.
Figure 2: Block diagram of the proposed CNN.
Figure 3: Pipeline of the data augmentation web application.
Figure 4: Data augmentation routine implementation (random gain addition).
Figure 5: Class distribution of the UrbanSound8K dataset.
Figure 6: Confusion matrix of the original classification on the test set.
Figure 7: Confusion matrices for the augmented test sets: (a) random gain addition following a uniform distribution, (b) time stretch by a factor of 2 with steady pitch, (c) pitch shift by a semitone with steady duration, and (d) dynamic range compression by 5 dB.
Figure A1: Home page of the web application.
Figure A2: Random Gain Augmentation page (only .wav files can be uploaded).
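
The four time-domain techniques evaluated above are straightforward to reproduce. The sketch below uses librosa and NumPy with the parameter values quoted in the figure captions (uniform random gain, 2x time stretch, one-semitone pitch shift, 5 dB compression); the compressor is a crude stand-in, not the authors' exact routine.

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.example("trumpet"))  # any mono clip works

# 1. Random gain following a uniform distribution.
y_gain = y * np.random.uniform(0.5, 1.5)

# 2. Time stretch by a factor of 2 with pitch preserved.
y_stretch = librosa.effects.time_stretch(y, rate=2.0)

# 3. Pitch shift by a semitone with duration preserved.
y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=1)

# 4. Crude dynamic range compression: scale the portion of each sample
#    above a threshold down by 5 dB.
threshold = 0.5 * np.max(np.abs(y))
ratio = 10 ** (-5 / 20)
over = np.abs(y) > threshold
y_drc = y.copy()
y_drc[over] = np.sign(y[over]) * (threshold + (np.abs(y[over]) - threshold) * ratio)
```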
24 pages, 15074 KiB  
Article
A Symmetric Reversible Audio Information Hiding Algorithm Using Matrix Embedding Within Image Carriers
by Yongqiang Tuo, Guodong Li and Kaiyue Hou
Symmetry 2024, 16(12), 1586; https://doi.org/10.3390/sym16121586 - 27 Nov 2024
Abstract
To address the vulnerability of existing hiding algorithms to differential attacks and the limitations of single chaotic systems, such as a small key space and low security, a novel algorithm combining audio encryption with information hiding is proposed. First, the original audio is divided into blocks to enhance efficiency. A "one-time pad" mechanism is achieved by associating the key with the plaintext, and a new multidimensional sine-coupled chaotic map is designed, which, in conjunction with multiple chaotic systems, generates the key stream. Next, the block-processed audio signals are matrix-converted and then encrypted using cyclic remainder scrambling, an improved Josephus scrambling, XOR diffusion, and bit diffusion, producing an encrypted audio information matrix. Finally, the GHM multiwavelet transform is used to select embedding channels, and the least significant bit (LSB) method is employed to hide the information within the carrier image. The algorithm is symmetric, and decryption simply reverses the encryption process on the stego image. Experimental results demonstrate that the Structural Similarity Index (SSIM) between the carrier image and the stego image is 0.992540, the Peak Signal-to-Noise Ratio (PSNR) is 49.659404 dB, and the Mean Squared Error (MSE) is 0.708044. These metrics indicate high statistical similarity and visual indistinguishability. The key space of the encryption algorithm is approximately 2^850, which effectively resists brute-force attacks. The energy distribution of the encrypted audio approximates noise, with information entropy close to 8, uniform histograms, a high scrambling degree, strong resistance to differential attacks, and robustness against noise and cropping attacks.
(This article belongs to the Special Issue Algebraic Systems, Models and Applications)
Figures:
Figure 1: Tent map phase diagram (left) and bifurcation and Lyapunov exponent plots of the Tent chaotic map (right).
Figure 2: (a) Trajectories, (b) bifurcation diagrams, and (c) Lyapunov exponents of the multidimensional sine-coupled chaotic maps.
Figure 3: Attractors in the x-z plane of the unified chaotic system for α = 0, 0.6, 0.8, and 1, and the maximum Lyapunov exponent for varying α.
Figure 4: Illustration of the enhanced Josephus permutation algorithm.
Figure 5: Illustration of the enhanced Josephus permutation algorithm.
Figure 6: Schematic diagram of the bit diffusion algorithm.
Figure 7: Flowchart of the audio encryption and steganography algorithms.
Figure 8: Simulation results: (a) original audio time series, (b) encrypted audio time series, (c) carrier image, (d) audio image, (e) encrypted audio image, and (f) decrypted audio time series.
Figure 9: Original carrier images 1-4 vs. the corresponding stego images 1-4 with embedded encrypted audio.
Figure 10: Histogram comparison of carrier images 1-4 before and after embedding.
Figure 11: Decryption outcomes using an incorrect key.
Figure 12: Spectrograms of original audio (top) and encrypted audio (bottom).
Figure 13: Scatter plots of adjacent signal amplitudes for original audio 1-3 and encrypted audio 1-3.
Figure 14: Histograms of original audio (left) and encrypted audio (right).
Figure 15: Scatter plot of the permuted index distribution.
Figure 16: Robustness against salt-and-pepper noise: stego images with 1%, 5%, and 10% noise and the corresponding decrypted audio.
Figure 17: Robustness against cropping attacks: stego images after 1%, 5%, and 10% cropping and the corresponding decrypted audio.
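
The final embedding step, hiding the encrypted audio in a carrier image's least significant bits, is standard LSB steganography. The sketch below illustrates only that step, assuming the chaotic encryption stages have already produced `cipher_bytes`; it omits the paper's GHM channel selection and is not the full pipeline.

```python
import numpy as np

def lsb_embed(carrier: np.ndarray, cipher_bytes: bytes) -> np.ndarray:
    """Hide cipher_bytes in the LSBs of a uint8 image (flattened order)."""
    bits = np.unpackbits(np.frombuffer(cipher_bytes, dtype=np.uint8))
    flat = carrier.flatten()
    assert bits.size <= flat.size, "carrier too small for payload"
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(carrier.shape)

def lsb_extract(stego: np.ndarray, n_bytes: int) -> bytes:
    """Reverse operation: read n_bytes back from the stego image."""
    bits = stego.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

carrier = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # toy image
payload = b"encrypted audio block"                               # stand-in ciphertext
stego = lsb_embed(carrier, payload)
assert lsb_extract(stego, len(payload)) == payload
```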
15 pages, 2101 KiB  
Article
Scalable Transformer Accelerator with Variable Systolic Array for Multiple Models in Voice Assistant Applications
by Seok-Woo Chang and Dong-Sun Kim
Electronics 2024, 13(23), 4683; https://doi.org/10.3390/electronics13234683 - 27 Nov 2024
Abstract
The Transformer is a type of deep learning model that has quickly become fundamental in natural language processing (NLP) and other machine learning tasks. Transformer hardware accelerators are usually designed for specific models, such as Bidirectional Encoder Representations from Transformers (BERT) or the Vision Transformer (ViT). In this study, we propose a Scalable Transformer Accelerator Unit (STAU) for multiple models, enabling efficient handling of the various Transformer models used in voice assistant applications. A design centered on a Variable Systolic Array (VSA), with control and data preprocessing performed in embedded processors, enables matrix operations of varying sizes. In addition, we propose an efficient variable structure and a row-wise data input method for natural language processing, where the word count changes. The proposed scalable Transformer accelerator speeds up the text summarization, audio processing, image search, and generative AI used in voice assistants.
(This article belongs to the Topic Theory and Applications of High Performance Computing)
Figures:
Figure 1: Differences in processes with and without hardware accelerators.
Figure 2: Top module architecture.
Figure 3: (a) Multi-Head Attention hardware architecture; (b) data flow of the Transformer encoder.
Figure 4: (a) Comparison of clock cycles with systolic arrays; (b) comparison of MAC units with systolic arrays.
Figure 5: (a) Comparison of clock cycles with systolic arrays; (b) comparison of MAC units with systolic arrays.
Figure 6: Block diagram of the softmax module.
Figure 7: Quantization bit distribution method.
Figure 8: Zero-padding techniques in Multi-Head Attention.
Figure 9: Concatenating eight heads of Multi-Head Attention into vector format.
Figure 10: Layer-form accumulation using block-by-block multiplication for the second matrix multiplication in the Feed-Forward Network.
Figure 11: Speed improvement comparison for a word count of 4.
Figure 12: Performance comparison for different word counts.
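
A software model makes the systolic-array cycle accounting concrete. The following sketch is an assumption-laden stand-in for the STAU hardware, not its RTL: it models an output-stationary grid in which every processing element (PE) performs one multiply-accumulate (MAC) per cycle, plus skew cycles to fill and drain the pipeline; a "variable" array is emulated by sizing the grid to the matrix shape.

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray):
    """Return A @ B and the cycle count of an (M x N) output-stationary PE grid."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for k in range(K):                    # one wavefront of operands per cycle
        C += np.outer(A[:, k], B[k, :])   # every PE does one MAC this cycle
    # Operands are skewed as they stream in, so the pipeline needs
    # (M - 1) + (N - 1) extra cycles to fill and drain.
    cycles = K + (M - 1) + (N - 1)
    return C, cycles

A = np.random.rand(4, 8)   # e.g., 4 tokens (a "word count" of 4)
B = np.random.rand(8, 8)
C, cycles = systolic_matmul(A, B)
assert np.allclose(C, A @ B)
print(f"{A.shape} x {B.shape} -> {cycles} cycles on a {A.shape[0]}x{B.shape[1]} PE grid")
```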
24 pages, 3407 KiB  
Article
Gender and Intersecting Barriers and Facilitators to Access the HIV Cascade of Care in Manitoba, Canada, Before and During the COVID-19 Pandemic: A Qualitative Study
by Enrique Villacis-Alvarez, Cheryl Sobie, Katharina Maier, Margaret Lavallee, Chantal Daniels, Heather Pashe, Joel Baliddawa, Nikki Daniels, Rebecca Murdock, Robert Russell, Clara Dan, Freda Woodhouse, Susie Cusson, Lisa Patrick, Marj Schenkels, Michael Payne, Ken Kasper, Lauren J. MacKenzie, Laurie Ireland, Kimberly Templeton, Kathleen Deering, Margaret Haworth-Brockman, Yoav Keynan and Zulma Vanessa Rueda
Trop. Med. Infect. Dis. 2024, 9(12), 287; https://doi.org/10.3390/tropicalmed9120287 - 25 Nov 2024
Abstract
Marginalized groups in Manitoba, Canada, especially females and people who inject drugs, are overrepresented in new HIV diagnoses and disproportionately affected by HIV and structural disadvantages. Informed by syndemic theory, our aim was to understand the gendered and intersecting barriers and facilitators that people living with HIV (PLHIV) experience across the cascade of HIV care before and during the COVID-19 pandemic. This study was co-designed and co-led alongside people with lived experience and a research advisory committee. We employed semi-structured interviews with thirty-two participants and three questionnaires. Interviews were audio-recorded, transcribed, and coded, and descriptive statistics were performed on the first two questionnaires. Qualitative data analysis used thematic analysis and focused on identifying categories (individual, healthcare, and social/structural) related to the barriers and facilitators to HIV care. A total of 32 PLHIV completed this study, and over 70% of females and 50% of males reported severe or moderate sexual abuse among other traumatic childhood experiences. Barriers to accessing or continuing in the cascade of HIV care included navigating the initial shock of receiving an HIV diagnosis, mental health challenges and inaccessible supports, substance use, violence (including intimate partner violence), internalized and enacted compounded stigma related to houselessness and substance use, discrimination by primary care service providers and social networks, lack of preventative and social supports, lack of accessible housing, and programmatic issues. COVID-19 increased mental health problems and disrupted relationships with HIV service providers and peers living with HIV. Facilitators of HIV care included stopping substance use, caring service providers particularly during HIV diagnosis, welcoming healthcare environments, social opportunities and integrated supports, and supportive social networks. Women, men, and non-binary PLHIV experience interconnected factors complicating their experiences with HIV care. Interventions should consider holistic, person-centered, and trauma-informed care options to address the barriers found in this research and appropriately serve PLHIV.
(This article belongs to the Special Issue An Update on Syndemics)
Figures:
Figure 1: Data collection process; the approach to inviting people living with HIV to participate is grounded in cultural safety, trauma-informed care, and harm reduction.
Figure 2: Severity (none, low, moderate, severe) of each type of childhood trauma (emotional, physical, and sexual abuse; emotional and physical neglect) experienced by people living with HIV in Manitoba, by sex (A) and gender (B).
Figure 3: Barriers to HIV care in Manitoba: a model of the interconnected barriers experienced by people living with HIV.
Figure 4: Barriers and facilitators across an extended cascade of HIV care in Manitoba.
32 pages, 5049 KiB  
Article
Digital Technology in Cultural Heritage: Construction and Evaluation Methods of AI-Based Ethnic Music Dataset
by Dayang Chen, Na Sun, Jong-Hoon Lee, Changman Zou and Wang-Su Jeon
Appl. Sci. 2024, 14(23), 10811; https://doi.org/10.3390/app142310811 - 22 Nov 2024
Abstract
This study focuses on the construction and evaluation of a high-quality Chinese Manchu music dataset designed to facilitate Artificial Intelligence (AI) research and applications within cultural heritage and ethnomusicology. Through a systematic collection and organization of diverse Manchu music resources, including folk songs, dance music, and ceremonial pieces, this dataset effectively represents the cultural breadth of Manchu music. The dataset includes digitized and preprocessed audio data, with comprehensive metadata annotations, such as essential information, musical features, and cultural context, creating a robust foundation for AI-based analysis. Experimental evaluations highlight the dataset's utility across various AI-driven applications: in music classification, using a CNN model, an accuracy of 90% was achieved in the "folk ensemble" category, with an overall accuracy of 85.7% and a precision of 82.3%. For music generation, a Generative Adversarial Network (GAN) model yielded a quality score of 7.8/10 and a Fréchet Audio Distance (FAD) of 0.32. In emotion recognition, the Random Forest model achieved 87% accuracy in identifying the emotion "joy". These results underscore the dataset's potential in supporting digital preservation and expanding AI applications in ethnic music classification, generation, and emotional analysis, contributing to both cultural heritage preservation and AI advancement in ethnomusicology.
(This article belongs to the Special Issue Application of Digital Technology in Cultural Heritage)
Figures:
Figure 1: Population proportion and distribution of the Manchu ethnic group in China.
Figure 2: Examples of traditional Chinese folk musical instruments.
Figure 3: Manchu musical instruments and performance scenes.
Figure 4: Data collection framework.
Figure 5: Audio processing and feature extraction framework.
Figure 6: Music information retrieval algorithm framework.
Figure 7: Music generation and synthesis algorithm framework.
Figure 8: Music classification and sentiment analysis network framework.
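
A minimal version of the classification experiment, log-mel spectrogram in, category out, can be sketched as follows. The architecture, input shape, and five-way label set are illustrative assumptions; the paper's exact CNN is not specified in the abstract.

```python
import numpy as np
import librosa
import tensorflow as tf

def logmel(path, sr=22050, n_mels=64, frames=128):
    """Log-mel spectrogram patch, the assumed input representation."""
    y, _ = librosa.load(path, sr=sr, duration=3.0)
    m = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    m = np.pad(m, ((0, 0), (0, max(0, frames - m.shape[1]))))[:, :frames]
    return m[..., np.newaxis].astype("float32")  # (n_mels, frames, 1)

num_classes = 5  # e.g., folk song, dance, ceremonial, folk ensemble, other
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 128, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Smoke test on random stand-in batches (real training would use logmel()).
X = np.random.rand(8, 64, 128, 1).astype("float32")
y = np.random.randint(0, num_classes, 8)
model.fit(X, y, epochs=1, verbose=0)
```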
13 pages, 212 KiB  
Article
Professional Nurses’ Views and Experiences of Poor Health-Seeking Behavior Among Men in Limpopo Province, South Africa
by Lazarros Chavalala, Rachel Tsakani Lebese and Lufuno Makhado
Healthcare 2024, 12(23), 2320; https://doi.org/10.3390/healthcare12232320 - 21 Nov 2024
Abstract
Background/Objectives: The health-seeking behavior of men is a public health concern and is associated with poor health outcomes and lower life expectancy among men. Masculinity norms are among the factors contributing to men's poor health-seeking behavior. This study explored the views and experiences of purposively selected nurses on men's health-seeking behavior. Methods: Using a qualitative descriptive design, individual semi-structured interviews were conducted with 14 professional nurses, and the collected data were audio-recorded and transcribed verbatim. Tesch's eight steps were used to analyze the data and guide the development of the main themes and sub-themes. Trustworthiness was ensured through credibility, confirmability, dependability, and transferability. Ethical approval was granted by the University of Venda Research Ethics Committee (reference number FHS/21/PH/26/1215). Results: Self-medication, a lack of trust in Western medicine, and the use of traditional healers were identified as reasons men underutilize health care services. Men were seen as consulting health care services only when illness is severe, feeling uncomfortable with female nurses, and valuing respect from clinicians. Conclusions: Cultural norms remain a barrier among men in this study setting, as men did not feel comfortable having their private parts physically examined by female clinicians. There is a need to take note of the cultural norms affecting men's use of health care services.
21 pages, 978 KiB  
Article
Audio Features and Crowdfunding Success: An Empirical Study Using Audio Mining
by Miao Miao, Yudan Wang, Jingpeng Li, Yushi Jiang and Qiang Yang
J. Theor. Appl. Electron. Commer. Res. 2024, 19(4), 3176-3196; https://doi.org/10.3390/jtaer19040154 - 18 Nov 2024
Abstract
Crowdfunding videos have become a crucial tool for entrepreneurs seeking financial support, with audio design playing a critical role in attracting potential investors. However, research on how audio features influence crowdfunding success remains limited. This study uses audio analysis techniques to examine data from 4500 crowdfunding campaigns on the Kickstarter platform between 2013 and 2016, investigating the impact of audio features on crowdfunding success rates. Grounded in signaling theory, we posited four hypotheses suggesting that speech rate, loudness, pitch, and emotional arousal would each exhibit an inverted U-shaped relationship with crowdfunding success rates. Through data analysis, we found that moderate levels of speech rate, loudness, pitch, and emotional arousal significantly enhanced crowdfunding success, whereas extremes in these vocal characteristics had a detrimental effect. Our findings not only extend the application of audio analysis to the crowdfunding domain, but also provide empirical evidence for the influence of audio features on crowdfunding success. This research offers practical guidance for project initiators in developing promotional strategies and for platforms in optimizing user experience.
(This article belongs to the Topic Interactive Marketing in the Digital Era)
Figures:
Figure 1: Sound spectrum generated by the Praat software.
Figure 2: Influence of each independent variable on the dependent variable.
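
The inverted-U hypotheses can be tested by adding a quadratic term to a success-probability model: a significantly negative coefficient on the squared feature indicates an interior optimum. The sketch below simulates that test; feature extraction is shown with librosa (the authors used Praat), and all data and variable names are hypothetical.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def voice_features(path):
    """Illustrative extraction of mean pitch and loudness (unused in the toy test)."""
    y, sr = librosa.load(path)
    f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)
    return float(np.nanmean(f0)), float(librosa.feature.rms(y=y).mean())

# Hypothetical campaign data: a standardized vocal feature and funding outcome,
# simulated so that success is most likely at moderate feature values.
rng = np.random.default_rng(0)
feature = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 - 1.2 * feature**2)))
success = rng.binomial(1, p)

# Inverted-U test: regress success on the feature and its square; a negative
# coefficient on the squared term indicates an inverted U.
X = np.column_stack([feature, feature**2])
fit = LogisticRegression().fit(X, success)
print("quadratic coefficient:", fit.coef_[0][1])  # negative here
```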
20 pages, 4970 KiB  
Article
Revealing the Next Word and Character in Arabic: An Effective Blend of Long Short-Term Memory Networks and ARABERT
by Fawaz S. Al-Anzi and S. T. Bibin Shalini
Appl. Sci. 2024, 14(22), 10498; https://doi.org/10.3390/app142210498 - 14 Nov 2024
Abstract
Arabic raw audio datasets were initially gathered to produce a corresponding signal spectrum, which was further used to extract the Mel-Frequency Cepstral Coefficients (MFCCs). The pronunciation dictionary, language model, and acoustic model were then derived from the MFCC features. These outputs were processed by Baidu's Deep Speech model (an ASR system) to obtain the text corpus. Baidu's Deep Speech model rapidly identifies a global optimum while preserving low word and character error rates, achieving excellent performance in isolated and end-to-end speech recognition. The goal of this work is to forecast the next word and character in sequential order, a task within natural language processing (NLP). This work combines the trained Arabic language model ARABERT with the potential of Long Short-Term Memory (LSTM) networks to predict the next word and character in an Arabic text. We used pre-trained ARABERT embeddings to improve the model's capacity and, to capture semantic relationships within the language, we trained LSTM + CNN and Markov models on Arabic text data to assess the efficacy of this approach. Python libraries such as TensorFlow, Pickle, Keras, and NumPy were used to design the model. We extensively assessed the model's performance on new Arabic text, focusing on evaluation metrics such as accuracy, word error rate, character error rate, BLEU score, and perplexity. The results show that the combined LSTM + ARABERT and Markov models outperformed the baseline models in predicting the next word or character in Arabic text. Accuracy rates of 64.9% for LSTM, 74.6% for ARABERT + LSTM, and 78% for Markov chain models were achieved in predicting the next word, and accuracy rates of 72% for LSTM, 72.22% for LSTM + CNN, and 73% for ARABERT + LSTM were achieved for next-character prediction. This work introduces a novel approach to Arabic natural language processing and points toward more precise next-word and next-character forecasting, useful for text generation and machine translation applications.
Figures:
Figure 1: Baidu's Deep Speech Arabic representation.
Figure 2: Block diagram representation.
Figure 3: LSTM architecture.
Figure 4: Block diagram representation of next-character prediction.
Figure 5: Case 1: word-based prediction.
Figure 6: Case 2: character-based prediction.
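
The next-word component reduces to a familiar pattern: tokenized prefixes in, softmax over the vocabulary out. The following Keras sketch shows the skeleton on a toy English corpus; swapping the trainable embedding for pre-trained ARABERT vectors, as the paper does, is an assumption noted in the comment.

```python
import numpy as np
import tensorflow as tf

texts = ["the cat sat on the mat", "the dog sat on the rug"]  # toy corpus
tok = tf.keras.preprocessing.text.Tokenizer()
tok.fit_on_texts(texts)
vocab = len(tok.word_index) + 1

# Build (prefix -> next word) training pairs.
X, y = [], []
for seq in tok.texts_to_sequences(texts):
    for i in range(1, len(seq)):
        X.append(seq[:i])
        y.append(seq[i])
X = tf.keras.preprocessing.sequence.pad_sequences(X)
y = np.array(y)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab, 32),  # the paper swaps in ARABERT vectors here
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(vocab, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=50, verbose=0)

inv = {v: k for k, v in tok.word_index.items()}
next_id = int(np.argmax(model.predict(X[:1], verbose=0)))
print("predicted next word:", inv.get(next_id, "?"))
```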
34 pages, 11454 KiB  
Article
Compassionate Care with Autonomous AI Humanoid Robots in Future Healthcare Delivery: A Multisensory Simulation of Next-Generation Models
by Joannes Paulus Tolentino Hernandez
Biomimetics 2024, 9(11), 687; https://doi.org/10.3390/biomimetics9110687 - 11 Nov 2024
Abstract
The integration of AI and robotics in healthcare raises concerns, and additional issues regarding autonomous systems are anticipated. Effective communication is crucial for robots to be seen as "caring", necessitating advanced mechatronic design and natural language processing (NLP). This paper examines the potential of humanoid robots to autonomously replicate compassionate care. The study employs computational simulations using mathematical and agent-based modeling to analyze human–robot interactions (HRIs) surpassing Tetsuya Tanioka's TRETON. It incorporates stochastic elements (through neuromorphic computing) and quantum-inspired concepts (through the lens of Martha Rogers' theory), running simulations over 100 iterations to analyze complex behaviors. Multisensory simulations (visual and audio) demonstrate the significance of "dynamic communication", (relational) "entanglement", and (healthcare system and robot's function) "superpositioning" in HRIs. Quantum and neuromorphic computing may enable humanoid robots to empathetically respond to human emotions, based on Jean Watson's ten caritas processes for creating transpersonal states. Autonomous AI humanoid robots will redefine the norms of "caring". Establishing "pluralistic agreements" through open discussions among stakeholders worldwide is necessary to align innovations with the values of compassionate care within a "posthumanist" framework, where the compassionate care provided by Level 4 robots meets human expectations. Achieving compassionate care with autonomous AI humanoid robots involves translating nursing, communication, computer science, and engineering concepts into robotic care representations while considering ethical discourses through collaborative efforts. Nurses should lead the design and implementation of AI and robots guided by "technological knowing" in Rozzano Locsin's TCCN theory.
(This article belongs to the Special Issue Optimal Design Approaches of Bioinspired Robots)
Figures:
Graphical abstract.
Figure 1: Interpretation of Tanioka's model according to cybernetic HRI communication.
Figure 2: Communication in "Level 3" HRI.
Figure 3: Model validation for "Level 3" HRI.
Figure 4: Representation of dissonance with "Level 3" HRI (sonification files: https://github.com/jphernandezrn/Data-Sonification-Human-Robot-Interaction).
Figure 5: Representation of Level 4 HRI (quantum-communication fidelity mathematics referenced from Yuan and Cheng).
Figure 6: Communication, entanglement, and superpositioning of the three states.
Figure 7: Model validation involving overlapping states.
Figure 8: Sonification of frequencies between states exhibiting quantum relationships.
Figure 9: An intuitive, self-regulating, and agile robot system architecture (steps 1-9), with notes on dynamic, unbounded information processing, state transitions, second-order cybernetics, and information granules processed through machine learning with quantum logic.
Figure 10: Care actions and intentionality construed from wave function collapse.
Figure 11: Model validation using machine learning.
Figure 12: Data sonification of simulated care actions.
Figure 13: Spectrogram comparison of the three audio files.
Figure 14: Mathematical model simulation of "stochasticity" and "intentionality" in the humanoid robot, modeled as I = 0.5278 + 0.0666S - 0.0565S^2, with three phases: initial rise (0.0 to ~0.45), peak plateau (~0.45 to ~0.8), and final decline (~0.8 to 1.0).
Figure 15: Simulation of adaptive learning: before learning, intentionality fluctuates with stochasticity; after learning, the curve is smoother and more stable. Both start near 0.5275 intentionality, peak at about 0.5475 at medium stochasticity (0.6), and decline as stochasticity approaches 1.0.
Figure 16: Neuromorphic circuit design.
Figure 17: Quantum-neuromorphic circuit design.
Figure 18: Quantum-neuromorphic circuit simulation.
Figure 19: Data sonification of the quantum-neuromorphic circuit simulation ('x' marks the peak waveform amplitudes).
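
The quadratic model quoted in the Figure 14 note is easy to reproduce and locates the reported peak at medium stochasticity. A minimal check:

```python
import numpy as np

S = np.linspace(0.0, 1.0, 101)           # stochasticity
I = 0.5278 + 0.0666 * S - 0.0565 * S**2  # intentionality (Figure 14 model)
print(f"peak at S = {S[np.argmax(I)]:.2f}")  # ~0.59, i.e., medium stochasticity
```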
13 pages, 1449 KiB  
Article
Evaluating the User Experience and Usability of the MINI Robot for Elderly Adults with Mild Dementia and Mild Cognitive Impairment: Insights and Recommendations
by Aysan Mahmoudi Asl, Jose Miguel Toribio-Guzmán, Álvaro Castro-González, María Malfaz, Miguel A. Salichs and Manuel Franco Martín
Sensors 2024, 24(22), 7180; https://doi.org/10.3390/s24227180 - 8 Nov 2024
Abstract
Introduction: In recent years, the integration of robotic systems into various aspects of daily life has become increasingly common. As these technologies continue to advance, ensuring user-friendly interfaces and seamless interactions becomes more essential. For social robots to genuinely provide lasting value to humans, a favourable user experience (UX) emerges as an essential prerequisite. This article aimed to evaluate the usability of the MINI robot, highlighting its strengths and areas for improvement based on user feedback and performance. Materials and Methods: In a controlled lab setting, a mixed-method qualitative study was conducted with ten individuals aged 65 and above diagnosed with mild dementia (MD) and mild cognitive impairment (MCI). Participants engaged in individual MINI robot interaction sessions, completing cognitive tasks as per written instructions. Video and audio recordings documented interactions, while post-session System Usability Scale (SUS) questionnaires quantified usability perception. Ethical guidelines were followed, ensuring informed consent, and the data underwent qualitative and quantitative analyses, contributing insights into the MINI robot's usability for this demographic. Results: The study addresses the ongoing challenges that tasks present, especially for MD individuals, emphasizing the importance of user support. Most tasks require both verbal and physical interactions, indicating that MD individuals face challenges when switching response methods within subtasks. These complexities originate from the selection and use of response methods, including difficulties with voice recognition, tablet touch, and tactile sensors. These challenges persist across tasks, with individuals with MD struggling to comprehend task instructions and provide correct answers and individuals with MCI struggling to use response devices, often due to the limitations of the robot's speech recognition. Technical shortcomings have been identified. The results of the SUS indicate positive perceptions, although there are lower ratings for instructor assistance and pre-use learning. The average SUS score of 68.3 places device usability in the "good" category. Conclusions: Our study examines the usability of the MINI robot, revealing strengths in quick learning, simple system and operation, and integration of features, while also highlighting areas for improvement. Careful design and modifications are essential for meaningful engagement with people with dementia. The robot could better benefit people with MD and MCI if clear, detailed instructions and instructor assistance were available.
(This article belongs to the Section Sensors and Robotics)
Figures:
Figure 1: MINI robot components.
Figure 2: MINI robot apps categorized: blue for general categories, purple for entertainment types, and orange for specific activities within each.
Figure 3: SUS score for each item.
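
For reference, the reported average of 68.3 follows the standard SUS scoring rule: odd-numbered (positively worded) items contribute (response - 1), even-numbered items contribute (5 - response), and the 0-40 raw sum is scaled by 2.5. A minimal sketch with a hypothetical participant:

```python
def sus_score(responses):
    """responses: ten Likert answers in 1..5, item 1 first."""
    assert len(responses) == 10
    odd = sum(r - 1 for r in responses[0::2])   # positively worded items
    even = sum(5 - r for r in responses[1::2])  # negatively worded items
    return (odd + even) * 2.5                   # scale raw 0..40 to 0..100

print(sus_score([4, 2, 4, 3, 4, 2, 5, 2, 4, 3]))  # hypothetical participant: 72.5
```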
17 pages, 4004 KiB  
Article
Designing a Tactile Document UI for 2D Refreshable Tactile Displays: Towards Accessible Document Layouts for Blind People
by Sara Alzalabny, Omar Moured, Karin Müller, Thorsten Schwarz, Bastian Rapp and Rainer Stiefelhagen
Multimodal Technol. Interact. 2024, 8(11), 102; https://doi.org/10.3390/mti8110102 - 8 Nov 2024
Abstract
Understanding document layouts is vital for enhancing document exploration and information retrieval for sighted individuals. For blind and visually impaired people, however, accessing layout information through typical assistive technologies such as screen readers is challenging. In this paper, we examine the potential benefits of presenting documents on two-dimensional (2D) refreshable tactile displays. These displays enable the tactile perception of 2D data and offer dynamic and interactive functionality. Despite their potential, the development of user interfaces (UIs) for such displays has not advanced significantly. We therefore propose the design of an intelligent tactile user interface (TUI) incorporating touch and audio feedback to represent documents in a tactile format. Our exploratory study evaluating this approach found that participants were satisfied with viewing documents directly in their true form rather than relying on screen-reading interpretations. Participants also offered recommendations for additional features and refinements for future iterations. To facilitate further research and development, we have made our dataset and models publicly available.
Figures:
Graphical abstract.
Figure 1: Pipeline of the tactile document system: (a) the layout extraction module, which uses a YOLOv10 detection model, an OCR model, and ChatGPT to extract metadata from each predicted bounding box; (b) the tactile representation module, responsible for representing the document's metadata in a tactile format, handling touch and button interactions, and providing audio feedback for the auditory representation of text elements.
Figure 2: A document at the various stages of the system pipeline: (a) bounding boxes with assigned ids after document segmentation; (b) a JSON file containing the bounding-box coordinates, reading order, and OCR-generated text; (c) the tactile representation (the first view mode displays bounding boxes; the second shows only Braille letters and element identifiers).
Figure 3: Views based on the VISM principles: (a) element identifier overview, (b) bounding-box overview, (c) element selection via navigation buttons or touch, (d) zoom and filter, and (e) details on demand.
Figure 4: Interactions and the corresponding HyperBraille buttons: (a) navigation controls, (b) audio feedback, (c) view mode button, (d) help buttons, (e) page number, (f) back button, (g) document element with selection box, (h) file name footer, and (i) page navigation button.
Figure 5: (a) A document in overview mode; (b) the auditory information given after clicking the help button.
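
The JSON metadata described in Figure 2b has a simple shape: one record per detected layout element with its bounding box, reading order, and OCR text. The sketch below mocks the detector and OCR outputs (the paper uses YOLOv10 plus an OCR model); the field names and the naive top-to-bottom ordering are assumptions.

```python
import json

detections = [  # hypothetical (x0, y0, x1, y1, label) boxes from the detector
    (40, 30, 560, 80, "title"),
    (40, 100, 560, 400, "paragraph"),
    (60, 420, 540, 700, "figure"),
]
ocr_text = {"title": "Annual Report", "paragraph": "Revenue grew...", "figure": ""}

elements = []
for order, (x0, y0, x1, y1, label) in enumerate(
        sorted(detections, key=lambda d: d[1])):  # naive top-to-bottom order
    elements.append({
        "id": order + 1,
        "type": label,
        "bbox": [x0, y0, x1, y1],
        "reading_order": order,
        "text": ocr_text.get(label, ""),
    })
print(json.dumps({"elements": elements}, indent=2))
```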
17 pages, 2039 KiB  
Article
Overcoming the Challenges of Including Learners with Visual Impairments Through Teacher Collaborations
by Manis Maesala and Ferreira Ronél
Educ. Sci. 2024, 14(11), 1217; https://doi.org/10.3390/educsci14111217 - 4 Nov 2024
Abstract
In this article we report on a study undertaken with 255 teachers working with learners with visual impairments. The focus of our discussion is teachers' implementation of inclusive education policies with learners with visual impairments in full-service schools in South Africa. We foreground the ways in which the teacher participants relied on teacher collaborations to overcome some of the challenges they faced as a result of limited resource provision in schools in this country. We implemented an instrumental case study design and followed the approach of participatory reflection and action (PRA). The sample included teachers (n = 255) from seven full-service and ten special schools in five provinces of South Africa, along with 50 expert stakeholders working in the field of visual impairment. For data generation and documentation, we utilised PRA-based workshops, the observation-as-context-of-interaction method, audio-visual techniques, field notes, and reflective journals. The findings confirm that full-service schools face distinct challenges stemming from limited resources and from teachers being inexperienced in accommodating learners with visual impairments. Even though the teachers in our study were initially reluctant to implement inclusive education practices, collaboration with fellow teachers and other informed stakeholders enabled them to address some of the challenges they experienced and implement inclusive practices. They subsequently formed a team and learnt from one another to facilitate positive change, thereby following a socio-ecological approach to inclusive practices in full-service schools in South Africa.
(This article belongs to the Special Issue Cultivating Inclusive Classrooms: Practices in Special Education)
Figures:
Figure 1: Resources required by teachers to teach learners with visual impairments.
Figure 2: Resources required by teachers to teach learners with visual impairments.
Figure 3: Participants learning from one another.
Figure 4: Gaining knowledge and skills from one another.
Figure 5: Teachers feeling encouraged to implement inclusive education.
13 pages, 2700 KiB  
Article
Hardware Implementation of a 2D Chaotic Map-Based Audio Encryption System Using S-Box
by Hisham M. Elrefai, Wafaa S. Sayed and Lobna A. Said
Electronics 2024, 13(21), 4254; https://doi.org/10.3390/electronics13214254 - 30 Oct 2024
Abstract
This paper presents a hardware-based audio encryption system using a 2D chaotic map and a dynamic S-box design implemented on an Artix-7 FPGA platform. Three distinct chaotic maps—logistic–fraction (2D-LF), logistic–sine (2D-LS), and fraction–sine (2D-FS)—were investigated and implemented on the FPGA. The 2D-LF map was employed in the encryption system for its throughput and power efficiency. The proposed system exploits the randomness of chaotic sequences for block permutation and S-box substitution, enhancing the diffusion and confusion properties of the encrypted speech signal. The system's encryption strength is validated through performance evaluations using the mean squared error (MSE), signal-to-noise ratio (SNR), correlation coefficients, and NIST randomness tests, which confirm the unpredictability of the encrypted speech signal. The hardware implementation achieves a throughput of 2880 Mbps with a power consumption of 0.13 W.
(This article belongs to the Section Circuit and Signal Processing)
Figures:
Figure 1: Phase space trajectories of (a) 2D-LF, (b) 2D-LS, and (c) 2D-FS.
Figure 2: Hardware architecture of (a) 2D-LF, (b) 2D-LS, and (c) 2D-FS.
Figure 3: Hardware architectures of (a) the x-y term, (b) θ, (c) MUL, and (d) O_{i+1}.
Figure 4: Phase space trajectory of the fixed-point hardware design vs. floating-point MATLAB: (a) 2D-LF, (b) 2D-LS, and (c) 2D-FS.
Figure 5: Proposed encryption system for audio signals.
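
The overall pattern, a 2D chaotic map generating a keystream that scrambles audio samples, can be sketched compactly. The recurrence below is a generic logistic-sine coupling chosen for illustration, not the paper's 2D-LF/2D-LS/2D-FS definitions, and the XOR stage stands in for the full permutation-plus-S-box design.

```python
import numpy as np

def keystream(n, x=0.3, y=0.7, r=3.99):
    """Derive n bytes from a toy 2D logistic-sine-style coupled map."""
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x, y = (r * x * (1 - x) + 0.1 * np.sin(np.pi * y)) % 1.0, \
               (r * y * (1 - y) + 0.1 * np.sin(np.pi * x)) % 1.0
        out[i] = int(x * 256) % 256
    return out

samples = np.random.randint(0, 256, 1000, dtype=np.uint8)  # toy 8-bit audio
ks = keystream(samples.size)
cipher = samples ^ ks                       # confusion via keystream XOR
assert np.array_equal(cipher ^ ks, samples)  # XOR with the same keystream decrypts
```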
16 pages, 624 KiB  
Article
Towards the Development of the Clinical Decision Support System for the Identification of Respiration Diseases via Lung Sound Classification Using 1D-CNN
by Syed Waqad Ali, Muhammad Munaf Rashid, Muhammad Uzair Yousuf, Sarmad Shams, Muhammad Asif, Muhammad Rehan and Ikram Din Ujjan
Sensors 2024, 24(21), 6887; https://doi.org/10.3390/s24216887 - 27 Oct 2024
Abstract
Respiratory disorders are commonly regarded as complex to diagnose due to their multi-factorial nature, encompassing the interplay between hereditary variables, comorbidities, environmental exposures, and therapies, among other contributing factors. This study presents a Clinical Decision Support System (CDSS) for the early detection of respiratory disorders using a one-dimensional convolutional neural network (1D-CNN) model. The ICBHI 2017 Respiratory Sound Database, which contains samples of different breathing sounds, was used in this research. During pre-processing, audio clips were resampled to a uniform rate, and breathing cycles were segmented into individual lung sound instances. A 1D-CNN consisting of convolutional layers, max pooling layers, dropout layers, and fully connected layers was designed to classify the processed clips into four categories: normal, crackles, wheezes, and combined crackles and wheezes. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data. Hyperparameters were optimized using grid search with k-fold cross-validation. The model achieved an overall accuracy of 0.95, outperforming state-of-the-art methods; the normal and crackles categories attained the highest F1-scores of 0.97 and 0.95, respectively. The model's robustness was further validated through 5-fold and 10-fold cross-validation experiments. This research demonstrates the diagnosis of lung sounds through artificial intelligence, using a 1D-CNN to classify lung sounds accurately. The proposed approach should enable medical practitioners to diagnose lung disorders more effectively, leading to better patient care.
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)
Figures:
Figure 1: Preprocessing steps applied to the ICBHI 2017 Respiratory Sound Database.
Figure 2: Spectrogram of a lung sound with no crackles or wheezes.
Figure 3: Wave plot and spectrogram of an audio segment containing crackles.
Figure 4: Wave plot and spectrogram of an audio segment containing wheezes.
Figure 5: Wave plot and spectrogram of a segment containing both wheezes and crackles.
Figure 6: One-Dimensional Convolutional Neural Network (1D-CNN) architecture.
Figure 7: Confusion matrix for the four-class classification task.
Figure 8: Training and testing accuracy curves.
Figure 9: Training and testing loss curves.
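
The classification stage pairs SMOTE with a small 1D-CNN. The sketch below wires those two pieces together on stand-in data; the layer sizes, input length, and training settings are assumptions rather than the paper's tuned configuration.

```python
import numpy as np
import tensorflow as tf
from imblearn.over_sampling import SMOTE

n, length, classes = 400, 2000, 4  # toy stand-ins for segmented breathing cycles
X = np.random.randn(n, length).astype("float32")
y = np.random.randint(0, classes, n)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)  # oversample minority classes
X_bal = X_bal[..., np.newaxis]  # (samples, time, 1) as Conv1D expects

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(length, 1)),
    tf.keras.layers.Conv1D(16, 9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, 9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_bal, y_bal, epochs=1, batch_size=32, verbose=0)
```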