Search Results (2)

Search Parameters:
Keywords = mel-filterbank energy features

13 pages, 456 KiB  
Article
Robust Detection of Background Acoustic Scene in the Presence of Foreground Speech
by Siyuan Song, Yanjue Song and Nilesh Madhu
Appl. Sci. 2024, 14(2), 609; https://doi.org/10.3390/app14020609 - 10 Jan 2024
Viewed by 958
Abstract
The characterising sound required by an Acoustic Scene Classification (ASC) system is contained in the ambient signal. In practice, however, this is often distorted by, e.g., the foreground speech of speakers in the surroundings. Previously, based on the iVector framework, we proposed different strategies to improve the classification accuracy when foreground speech is present. In this paper, we extend these methods to deep-learning (DL)-based ASC systems to improve their robustness to foreground speech. ResNet models are proposed as the baseline, in combination with multi-condition training at different signal-to-background ratios (SBRs). For further robustness, we first investigate noise-floor-based Mel-filterbank energies (NF-MFBE) as the input feature of the ResNet model. Next, speech presence information obtained from a speech enhancement (SE) system is incorporated within the ASC framework. As the speech presence information is time-frequency specific, it allows the network to learn to distinguish better between background signal regions and foreground speech. While the proposed modifications improve the performance of ASC systems when foreground speech is dominant, performance is slightly worse in scenarios with low-level or absent foreground speech. Therefore, as a last consideration, ensemble methods are introduced to integrate the classification scores from different models in a weighted manner. The experimental study systematically validates the contribution of each proposed modification and shows that, for the final system with the proposed input features and meta-learner, the classification accuracy improves at all tested SBRs. For an SBR of 20 dB in particular, absolute improvements of up to 9% are obtained.
(This article belongs to the Section Acoustics and Vibrations)
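The abstract does not detail how the noise-floor-based Mel-filterbank energies (NF-MFBE) are computed. The sketch below is one plausible, minimal interpretation, assuming a simple smoothed running-minimum noise-floor tracker and the standard Mel filterbank from librosa; the function name nf_mfbe and all parameter values (frame length, hop, number of bands, smoothing constant) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of noise-floor-based Mel-filterbank energies (NF-MFBE).
# The smoothing constant, frame/hop sizes, and the running-minimum noise-floor
# tracker are assumptions for illustration; they are not taken from the paper.
import numpy as np
import librosa

def nf_mfbe(wav, sr=16000, n_fft=512, hop=256, n_mels=40, alpha=0.9):
    """Return log Mel-band energies of an estimated noise floor (background)."""
    # Power spectrogram: shape (1 + n_fft // 2, n_frames)
    spec = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop)) ** 2

    # Crude noise-floor estimate: smoothed running minimum over time,
    # standing in for a proper minimum-statistics tracker.
    floor = np.empty_like(spec)
    floor[:, 0] = spec[:, 0]
    for t in range(1, spec.shape[1]):
        smoothed = alpha * floor[:, t - 1] + (1 - alpha) * spec[:, t]
        floor[:, t] = np.minimum(smoothed, spec[:, t])

    # Project the noise-floor spectrogram onto a Mel filterbank and compress.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    return np.log(mel_fb @ floor + 1e-10)
```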
Figures
Figure 1: ResNet architecture: C1 = 20 and C2 = 40 in the small-footprint model; C1 = 80 and C2 = 160 in the large-footprint model.
Figure 2: Double Conv Block.
Figure 3: Performance of both the small- and large-footprint ResNet baselines and the proposed best systems.
18 pages, 10771 KiB  
Article
A Deep Neural Network Model for Speaker Identification
by Feng Ye and Jun Yang
Appl. Sci. 2021, 11(8), 3603; https://doi.org/10.3390/app11083603 - 16 Apr 2021
Cited by 70 | Viewed by 10318
Abstract
Speaker identification is a classification task which aims to identify a subject from given time-series sequential data. Since the speech signal is a continuous one-dimensional time series, most current research methods are based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). These methods perform well in many tasks, but there has been little attempt to combine the two network models for the speaker identification task. The spectrogram of a speech signal contains the spatial features of the voiceprint (which correspond to the voice spectrum), and a CNN is effective for spatial feature extraction (which corresponds to modeling the spectral correlations in acoustic features). At the same time, the speech signal is a time series, and a deep RNN can represent long utterances better than shallow networks. Considering the advantage of the gated recurrent unit (GRU) over the traditional RNN in the segmentation of sequence data, we decided to use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and a GRU for speaker identification. In the network model design, the convolutional layer is used for voiceprint feature extraction and reduces dimensionality in both the time and frequency domains, allowing for faster GRU layer computation. In addition, the stacked GRU recurrent layers can learn a speaker's acoustic features. During this research, we tried various neural network structures, including a 2-D CNN, a deep RNN, and a deep LSTM. The above network models were evaluated on the Aishell-1 speech dataset. The experimental results showed that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96%. The results also demonstrate the effectiveness of the proposed deep GRU network model compared with the other models for speaker identification. With further optimization, this method could be applied to other research tasks similar to speaker identification.
(This article belongs to the Special Issue Deep Learning for Signal Processing Applications)
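The abstract describes the network only at a high level. Below is a minimal sketch of the general 2-D CNN, stacked GRU, fully connected, softmax layout in PyTorch; the channel counts, GRU width and depth, assumed input shape, and number of speakers are placeholder assumptions, not the configuration used in the paper.

```python
# Illustrative 2-D CNN + stacked-GRU speaker-identification model in PyTorch.
# Channel counts, GRU width/depth, and the assumed input shape
# (batch, 1, n_frames, n_freq_bins) are placeholders, not the paper's values.
import torch
import torch.nn as nn

class DeepGRU(nn.Module):
    def __init__(self, n_freq=128, n_speakers=340):
        super().__init__()
        # Convolutional front end: extracts local time-frequency patterns and
        # halves both time and frequency resolution (faster GRU computation).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
        )
        # Stacked GRU layers model the frame-level temporal dynamics.
        self.gru = nn.GRU(input_size=32 * (n_freq // 2), hidden_size=256,
                          num_layers=3, batch_first=True)
        # Classifier over speaker identities (softmax applied via the loss).
        self.fc = nn.Linear(256, n_speakers)

    def forward(self, x):                      # x: (batch, 1, frames, freq)
        h = self.cnn(x)                        # (batch, 32, frames/2, freq/2)
        h = h.permute(0, 2, 1, 3).flatten(2)   # (batch, frames/2, 32*freq/2)
        out, _ = self.gru(h)                   # (batch, frames/2, 256)
        return self.fc(out[:, -1])             # logits per speaker

# Example forward pass on a dummy spectrogram batch.
logits = DeepGRU()(torch.randn(4, 1, 200, 128))   # -> shape (4, 340)
```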
Figures
Figure 1: Conventional recurrent neural network (RNN) model architecture. The input is a time series; x_t denotes the input at time t, the middle box denotes the hidden state, and y_t denotes the output at time t.
Figure 2: Long short-term memory (LSTM) architecture (a single memory block is shown for clarity).
Figure 3: Gated recurrent unit (GRU) architecture (a single memory block is shown for clarity).
Figure 4: The structure of the deep GRU network, comprising voice-signal preprocessing, a two-dimensional convolutional layer, stacked GRU layers, a fully connected layer, and a softmax layer.
Figure 5: The architecture of the deep GRU network model, showing the input and the specific parameters of the other layers; this framework is also the basis of the code implementation.
Figure 6: Spectrogram of speech. The intensity of a given frequency component at a given time is expressed by color depth: darker means smaller, brighter means larger.
Figure 7: The data preprocessing process, whose purpose is to obtain a spectrogram of the speech signal for further analysis.
Figure 8: Loss and accuracy of the deep GRU network model on the Aishell-1 dataset (left: loss vs. training epochs; right: accuracy vs. training epochs).
Figure 9: Loss and accuracy of the 2-D convolutional neural network (CNN) model on the Aishell-1 dataset (left: loss vs. training epochs; right: accuracy vs. training epochs).
Figure 10: Performance comparison of the proposed deep GRU network model with other network models (left: original Aishell-1 dataset; right: Aishell-1 dataset with Gaussian white noise).