Search Results (3)

Search Parameters:
Keywords = mel-filterbank energy features

19 pages, 33216 KiB  
Article
System Design for a Prototype Acoustic Network to Deter Avian Pests in Agriculture Fields
by Destiny Kwabla Amenyedzi, Micheline Kazeneza, Ipyana Issah Mwaisekwa, Frederic Nzanywayingoma, Philibert Nsengiyumva, Peace Bamurigire, Emmanuel Ndashimye and Anthony Vodacek
Agriculture 2025, 15(1), 10; https://doi.org/10.3390/agriculture15010010 - 24 Dec 2024
Viewed by 1063
Abstract
Crop damage attributed to pest birds is an important problem, particularly in low-income countries. This paper describes a prototype system for pest bird detection using a Conv1D neural network model followed by scaring actions to reduce the presence of pest birds on farms. Acoustic recorders were deployed on farms for data collection, supplemented by acoustic libraries. The sounds of pest bird species were identified and labeled. The labeled data were used in Edge Impulse to train a tinyML Conv1D model to detect birds of interest. The model was deployed on Arduino Nano 33 BLE Sense (nodes) and XIAO (base station) microcontrollers to detect the pest birds, and based on the detections, scaring sounds were played to deter the birds. The model achieved an accuracy of 96.1% during training and 92.99% during testing. The testing F1 score was 0.94, and the ROC score was 0.99, signifying good discriminatory ability of the model. The prototype was able to make inferences in 53 ms, using only 14.8 K of peak RAM and 43.8 K of flash memory to store the model. Results from the prototype deployment in the field demonstrated successful detection, triggering of scaring actions, and SMS message notifications. Further development of this novel, integrated, and sustainable solution will add another tool for dealing with pest birds.
(This article belongs to the Special Issue Smart Agriculture Sensors and Monitoring Systems for Field Detection)
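To illustrate the MFE (mel-filterbank energy) features that this article compares against spectrogram and MFCC features, the following Python sketch computes log mel-filterbank energies from a short audio clip with librosa. The sample rate, frame length, hop size, and number of mel bands are assumptions for illustration, not the settings of the paper's Edge Impulse pipeline.

```python
# Illustrative sketch: log mel-filterbank energy (MFE) features from an audio clip.
# Frame length, hop size, and mel-band count are assumptions, not the paper's settings.
import librosa
import numpy as np

def mfe_features(wav_path, sr=16000, n_mels=40, frame_ms=25, hop_ms=10):
    """Return a (num_frames x n_mels) matrix of log mel-filterbank energies."""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)
    n_fft = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Mel-scaled power spectrogram, then log compression: the "MFE" feature,
    # i.e. an MFCC pipeline without the final DCT step.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=n_mels, power=2.0)
    return np.log(mel + 1e-10).T  # shape: (num_frames, n_mels)

# Example (hypothetical file name): features = mfe_features("red_billed_quelea.wav")
```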
Figures

Figure 1. (a) Farmhands making rattling sounds to repel pest birds from the bean farm at the University of Rwanda-Busogo campus (1°33′42.9″ S, 29°33′12.0″ E). (b) Acoustic monitoring deployment. Photos: D.K. Amenyedzi.
Figure 2. Spectrograms illustrating species and environmental sounds. Panels (A–G) are spectrograms for several pest species, namely Chubb's cisticola, common bulbul, common waxbill, red-billed quelea, village weaver, white-browed robin-chat, and yellow-fronted canary, respectively. Panel (H) is a beneficial bird species, the hadada ibis, and panels (I–K) are examples of ambient noise, i.e., a car horn, children talking, and rattling sounds, respectively.
Figure 3. Visual representation of the same audio in the three feature selection techniques. (a) Spectrogram feature. (b) MFCC feature. (c) MFE feature.
Figure 4. Conv1D network architecture.
Figure 5. (a) The prototype setup on the PCB board, minus the speaker. (b) Deployment in the field with solar power to recharge the battery.
Figure 6. Prototype system flowchart.
Figure 7. Confusion matrix describing the performance of the MFE feature with the best Conv1D model.
Figure 8. ROC curve for the MFE feature with the best Conv1D model.
Figure 9. On-device result displayed on the Arduino IDE serial monitor.
Figure 10. Screenshots from a smartphone of SMS messages delivered from the base station and nodes. (a) SMS to the farmer from base station A indicating bird detections. (b) SMS from node B indicating bird detection. (c) SMS to the farmer from node C indicating a security threat.
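">
As a rough sketch of the kind of small Conv1D classifier described in this abstract, the Keras model below takes MFE frames as input and outputs class probabilities. The layer widths, input shape, and class count are assumptions, not the authors' exact Edge Impulse architecture.

```python
# Illustrative sketch of a small Conv1D audio classifier; sizes are assumed, not the
# authors' exact Edge Impulse model.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, NUM_MELS, NUM_CLASSES = 98, 40, 11  # assumed MFE input shape and label count

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, NUM_MELS)),
    layers.Conv1D(8, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling1D(2),
    layers.Dropout(0.25),
    layers.Conv1D(16, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling1D(2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# A model of roughly this size can be converted with the TFLite converter (and
# optionally quantized to int8) before deployment on a microcontroller.
```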
13 pages, 456 KiB  
Article
Robust Detection of Background Acoustic Scene in the Presence of Foreground Speech
by Siyuan Song, Yanjue Song and Nilesh Madhu
Appl. Sci. 2024, 14(2), 609; https://doi.org/10.3390/app14020609 - 10 Jan 2024
Cited by 1 | Viewed by 1083
Abstract
The characterising sound required for an Acoustic Scene Classification (ASC) system is contained in the ambient signal. However, in practice, this is often distorted by, for example, the foreground speech of speakers in the surroundings. Previously, based on the iVector framework, we proposed different strategies to improve the classification accuracy when foreground speech is present. In this paper, we extend these methods to deep-learning (DL)-based ASC systems to improve their robustness to foreground speech. ResNet models are proposed as the baseline, in combination with multi-condition training at different signal-to-background ratios (SBRs). For further robustness, we first investigate noise-floor-based Mel-FilterBank Energies (NF-MFBE) as the input feature of the ResNet model. Next, speech presence information obtained from a speech enhancement (SE) system is incorporated within the ASC framework. As the speech presence information is time-frequency specific, it allows the network to learn to distinguish better between background signal regions and foreground speech. While the proposed modifications improve the performance of ASC systems when foreground speech is dominant, performance is slightly worse in scenarios with low-level or absent foreground speech. Therefore, as a last consideration, ensemble methods are introduced to integrate classification scores from different models in a weighted manner. The experimental study systematically validates the contribution of each proposed modification, and, for the final system, it is shown that with the proposed input features and meta-learner, the classification accuracy is improved at all tested SBRs. In particular, for SBRs of 20 dB, absolute improvements of up to 9% are obtained.
(This article belongs to the Special Issue Deep Learning Based Speech Enhancement Technology)
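The multi-condition training mentioned in this abstract mixes foreground speech into background scenes at several SBRs. The sketch below shows one plausible way to generate such mixtures; the SBR grid and the energy-based scaling convention are assumptions, not the authors' exact recipe.

```python
# Illustrative sketch: mix foreground speech into a background scene at a target
# signal-to-background ratio (SBR). Scaling convention and SBR values are assumed.
import numpy as np

def mix_at_sbr(speech, background, sbr_db):
    """Scale speech so that 10*log10(P_speech / P_background) equals sbr_db.
    Both inputs are 1-D arrays of equal length."""
    p_speech = np.mean(speech ** 2) + 1e-12
    p_background = np.mean(background ** 2) + 1e-12
    gain = np.sqrt(p_background / p_speech * 10 ** (sbr_db / 10.0))
    return background + gain * speech

# Example multi-condition set over a few SBRs (dB):
# mixtures = [mix_at_sbr(speech, scene, sbr) for sbr in (0, 10, 20)]
```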
Figures

Figure 1. ResNet architecture: C1 = 20 and C2 = 40 in the small-footprint model; C1 = 80 and C2 = 160 in the large-footprint model.
Figure 2. Double Conv Block.
Figure 3. Performance of the small- and large-footprint ResNet baselines and the proposed best systems.
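">
The ensemble step in this article integrates classification scores from different models in a weighted manner. The sketch below uses fixed weights and simple posterior averaging rather than the trained meta-learner described in the abstract; the weights and model names are assumptions.

```python
# Illustrative sketch of weighted score fusion across ASC models.
# Fixed weights stand in for the trained meta-learner described in the paper.
import numpy as np

def fuse_scores(score_list, weights):
    """score_list: list of (num_clips, num_classes) posterior arrays.
    Returns the fused class label per clip."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = sum(w * s for w, s in zip(weights, score_list))
    return np.argmax(fused, axis=1)

# Example (hypothetical score arrays):
# labels = fuse_scores([scores_baseline, scores_nf_mfbe], weights=[0.4, 0.6])
```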
18 pages, 10771 KiB  
Article
A Deep Neural Network Model for Speaker Identification
by Feng Ye and Jun Yang
Appl. Sci. 2021, 11(8), 3603; https://doi.org/10.3390/app11083603 - 16 Apr 2021
Cited by 76 | Viewed by 10999
Abstract
Speaker identification is a classification task which aims to identify a subject from given time-series sequential data. Since the speech signal is a continuous one-dimensional time series, most current research methods are based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). These methods perform well in many tasks, but there has been no attempt to combine the two network models to study the speaker identification task. The spectrogram of a speech signal contains the spatial features of the voiceprint (which correspond to the voice spectrum), and CNNs are effective for spatial feature extraction (which corresponds to modeling spectral correlations in acoustic features). At the same time, the speech signal is a time series, and deep RNNs can represent long utterances better than shallow networks. Considering the advantage of the gated recurrent unit (GRU) over the traditional RNN in the segmentation of sequence data, we decided to use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and gated recurrent units (GRUs) for speaker identification. In the network design, the convolutional layer is used for voiceprint feature extraction and reduces dimensionality in both the time and frequency domains, allowing for faster GRU layer computation. In addition, the stacked GRU recurrent layers can learn a speaker's acoustic features. During this research, we also evaluated various other neural network structures, including a 2-D CNN, a deep RNN, and a deep LSTM. All of these models were evaluated on the Aishell-1 speech dataset. The experimental results showed that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96%. The results also demonstrate the effectiveness of the proposed deep GRU network model versus other models for speaker identification. Through further optimization, this method could be applied to other research similar to the study of speaker identification.
(This article belongs to the Special Issue Deep Learning for Signal Processing Applications)
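To make the 2-D CNN plus stacked GRU structure described in this abstract concrete, the Keras sketch below applies a convolutional front end for dimensionality reduction followed by two GRU layers and a softmax classifier. The filter counts, GRU widths, input shape, and speaker count are assumptions, not the paper's exact configuration.

```python
# Illustrative sketch of a 2-D CNN followed by stacked GRU layers for speaker
# identification; all sizes below are assumed, not the paper's exact values.
import tensorflow as tf
from tensorflow.keras import layers, models

FRAMES, FREQ_BINS, NUM_SPEAKERS = 200, 128, 340  # assumed spectrogram shape / label count

inputs = layers.Input(shape=(FRAMES, FREQ_BINS, 1))
# The 2-D convolution reduces dimensionality in both time and frequency,
# which speeds up the subsequent GRU layers.
x = layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same", activation="relu")(inputs)
x = layers.BatchNormalization()(x)
# Collapse the frequency and channel axes so each time step becomes a feature vector.
x = layers.Reshape((FRAMES // 2, (FREQ_BINS // 2) * 32))(x)
# Stacked GRU layers model the frame-level temporal structure of the utterance.
x = layers.GRU(256, return_sequences=True)(x)
x = layers.GRU(256)(x)
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(NUM_SPEAKERS, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```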
Figures

Figure 1. Conventional recurrent neural network (RNN) model architecture. The input is a time series; x_t represents the input information at time t, the middle box represents the hidden state, and y_t represents the output at time t.
Figure 2. Long short-term memory (LSTM) architecture (a single memory block is shown for clarity).
Figure 3. Gated recurrent unit (GRU) architecture (a single memory block is shown for clarity).
Figure 4. The structure of the deep GRU network. The architecture mainly comprises voice signal preprocessing, a two-dimensional convolutional layer, stacked GRU layers, a fully connected layer, and a softmax layer.
Figure 5. The architecture of the deep GRU network model, showing the input and some specific parameters of the other layers; this framework is also the basis of the code implementation.
Figure 6. Spectrogram of speech. The intensity of a given frequency component at a given time is expressed by color depth: darker means lower intensity, brighter means higher intensity.
Figure 7. The data preprocessing process, whose purpose is to obtain a spectrogram of the speech signal for further analysis.
Figure 8. Loss and accuracy of the deep GRU network model on the Aishell-1 dataset. The left panel shows loss vs. training epochs and the right panel shows accuracy vs. training epochs.
Figure 9. Loss and accuracy of the 2-D convolutional neural network (CNN) model on the Aishell-1 dataset. The left panel shows loss vs. training epochs and the right panel shows accuracy vs. training epochs.
Figure 10. Performance comparison of the proposed deep GRU network model with other network models. The left panel shows the performance of the models on the original Aishell-1 dataset, and the right panel shows the performance on the Aishell-1 dataset with Gaussian white noise added.
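">
The preprocessing step shown in Figures 6 and 7 produces a spectrogram of the speech signal for the CNN front end. A minimal sketch is given below; the window and hop sizes are assumptions, not the paper's parameters.

```python
# Illustrative sketch of spectrogram preprocessing: STFT magnitude on a log scale.
# Window and hop sizes are assumed, not taken from the paper.
import librosa
import numpy as np

def log_spectrogram(wav_path, sr=16000, n_fft=512, hop_length=160):
    y, _ = librosa.load(wav_path, sr=sr, mono=True)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    # Log-magnitude spectrogram; add a small floor to avoid log(0).
    return np.log(np.abs(stft) + 1e-10).T  # shape: (num_frames, n_fft // 2 + 1)

# Example (hypothetical file name): spec = log_spectrogram("speaker_0001.wav")
```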