CN115206341A - Equipment abnormal sound detection method and device and inspection robot
Equipment abnormal sound detection method and device and inspection robot

- Publication number: CN115206341A
- Application number: CN202210840946.3A
- Authority: CN (China)
- Prior art keywords: abnormal sound, Mel frequency, channel, sound, bottleneck
- Legal status: Granted
Classifications
- G10L25/24 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L25/30 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the analysis technique, using neural networks
Abstract
The application provides an equipment abnormal sound detection method and device and an inspection robot. The method includes: acquiring multi-channel original audio data through a multi-microphone linear array; performing Mel spectrum feature extraction on the original audio data of each channel to obtain the Mel spectrum of the corresponding channel; performing sound classification and recognition with a sound recognition model according to the Mel spectrums of all channels; and triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels. With this method, equipment abnormal sound can be recognized in real time and an alarm can be raised automatically, enabling intelligent monitoring and reducing labor cost.
Description
Technical Field
The application relates to the technical field of audio signal processing, and in particular to an equipment abnormal sound detection method and device and an inspection robot.
Background
With the development of production technology, it has become common for enterprises to deploy more and more large-scale equipment in their production workshops. Sound is produced by vibration: equipment such as motors, gears and conveyor belts vibrates in a regular pattern under normal conditions, and emits additional, abnormal acoustic features when a fault occurs. Detecting abnormal equipment sound therefore plays an important role in intelligent enterprise monitoring.
A common approach to abnormal sound detection uses a single sound pickup to extract audio features from various devices and then feeds these features into a hidden Markov model or a traditional machine-learning model such as a support vector machine for recognition. However, the sound collected by a single pickup contains considerable interference, and the abnormal sound source cannot be located accurately; even when an abnormal sound alarm is raised, a large number of devices must still be checked manually one by one, which is inefficient.
Disclosure of Invention
In view of this, embodiments of the present application provide an equipment abnormal sound detection method and device, and an inspection robot.
In a first aspect, an embodiment of the present application provides an equipment abnormal sound detection method, including:
acquiring multi-channel original audio data through a multi-microphone linear array;
carrying out Mel frequency spectrum feature extraction on the original audio data of each channel to obtain a Mel frequency spectrum of the corresponding channel;
carrying out sound classification and identification by using a sound identification model according to the Mel frequency spectrums of all channels;
and triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels.
In some embodiments, after triggering the abnormal sound alarm, the method further comprises:
performing abnormal sound source localization according to the original audio data of the channel in which the abnormal sound exists, and sending the identified abnormal sound data and the localization information of the abnormal sound source to an equipment management platform.
In some embodiments, the performing abnormal sound source localization according to the original audio data of the channel where the abnormal sound exists includes:
extracting, through gradient back-propagation according to the sound classification result output by the sound identification model, the Mel frequency spectrum local response region of the corresponding audio that triggered the abnormal sound alarm;
extracting a target Mel frequency range segment from the Mel frequency spectrum local response region of each channel of audio, wherein the target Mel frequency range segment comprises the Mel frequency range segments other than the Mel frequency range segment corresponding to the abnormal sound;
carrying out short-time Fourier transform on the original audio data of a channel with abnormal sound to obtain a spectrogram;
and determining the position of an abnormal sound source according to the spectrogram and the target Mel frequency range segment.
In some embodiments, the determining the position of the abnormal sound source according to the spectrogram and the target Mel frequency range segment comprises:
carrying out Mel frequency inverse transformation on the target Mel frequency range segment, and calculating a corresponding target sound frequency range segment;
zeroing the amplitude of the target sound frequency range segment to filter the spectrogram to obtain a filtered spectrogram;
carrying out short-time inverse Fourier transform on the filtered spectrogram to obtain a filtered audio;
and carrying out sound source positioning according to the filtered audio of each channel based on a beam forming technology, and calculating to obtain the position of an abnormal sound source.
In some embodiments, the performing sound classification and identification by using a sound identification model according to the Mel frequency spectrums of all channels includes:
based on the acquisition duration of the original audio data, adjusting the frame number of the Mel frequency spectrum of each channel to obtain a corresponding Mel frequency spectrum frame;
and performing two-dimensional imaging on the Mel frequency spectrum frames of each channel and inputting them into the sound identification model for sound classification and identification, to obtain the sound classification result of each channel.
In some embodiments, the sound identification model comprises an improved batch normalization layer, first to fourth bottleneck stacking structures and a pooling classification layer which are connected in sequence;
the improved batch normalization layer comprises a first channel transposition unit, a batch normalization unit and a second channel transposition unit which are arranged in sequence; the first to fourth bottleneck stacking structures each comprise at least one bottleneck unit and one maximum pooling unit; and the pooling classification layer comprises a pooling unit that pools based on the average value and the maximum value of the input feature map.
In some embodiments, the first bottleneck stacking structure comprises one bottleneck unit, the second and third bottleneck stacking structures each comprise three bottleneck units, and the fourth bottleneck stacking structure comprises two bottleneck units; wherein each bottleneck unit is a bottleneck layer in a residual network.
In a second aspect, an embodiment of the present application provides an equipment abnormal sound detection device, including:
the data acquisition module is used for acquiring multi-channel original audio data through the multi-microphone linear array;
the data analysis module is used for extracting the Mel frequency spectrum characteristics of the original audio data of each channel to obtain the Mel frequency spectrum of the corresponding channel;
the model identification module is used for carrying out sound classification and identification by utilizing a sound identification model according to the Mel frequency spectrums of all channels;
and the abnormal alarm module is used for triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels.
In a third aspect, an embodiment of the present application provides an inspection robot, which includes a multi-microphone linear array, a processor and a memory, wherein the multi-microphone linear array is used for acquiring multi-channel original audio data, the memory stores a computer program, and the processor is used for executing the computer program to implement the above equipment abnormal sound detection method.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, which stores a computer program that, when executed on a processor, implements the above equipment abnormal sound detection method.
Embodiments of the present application have the following beneficial effects:
The equipment abnormal sound detection method acquires multi-channel original audio data in real time with a multi-microphone linear array and performs Mel spectrum feature extraction on the original audio data to obtain the Mel spectrum of each channel; it then performs sound classification and recognition with a pre-built sound recognition model according to the Mel spectrums of all channels, and triggers an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels. The method detects equipment abnormal sound and raises an alarm automatically when abnormal sound occurs, enabling intelligent monitoring and reducing labor cost; in addition, the raw data collected by the multi-microphone linear array can further be used to locate the abnormal sound source, so that devices no longer need to be checked manually one by one, which greatly improves maintenance efficiency.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 shows a schematic structural diagram of an inspection robot according to an embodiment of the present application;
FIG. 2 shows a schematic structural diagram of a four-microphone linear array according to an embodiment of the present application;
fig. 3 shows a first flowchart of an apparatus abnormal sound detection method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a sound recognition model according to an embodiment of the present application;
FIG. 5 illustrates a schematic structural diagram of an improved batch normalization layer according to an embodiment of the present application;
FIG. 6 shows a schematic structural diagram of a pooling classification layer based on a combination of average and maximum values according to an embodiment of the present application;
fig. 7 shows a second flowchart of the device abnormal sound detection method of the embodiment of the present application;
FIG. 8 is a flow chart illustrating abnormal sound source localization according to an embodiment of the present application;
FIGS. 9A and 9B are schematic diagrams respectively illustrating the Mel spectrum input of a single audio clip and the Mel spectrum local response region obtained based on gradient back-propagation;
fig. 10 is a schematic structural diagram of an equipment abnormal sound detection device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having" and their derivatives, as used in various embodiments of the present application, are intended to indicate only specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing. Furthermore, the terms "first", "second", "third" and the like are used solely to distinguish one item from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Please refer to fig. 1, which is a schematic structural diagram of an inspection robot according to an embodiment of the present application. Exemplarily, the inspection robot may include a memory 11, a processor 12 and a sound sensing unit 13, where the memory 11, the processor 12 and the sound sensing unit 13 are electrically connected to each other, directly or indirectly, to realize data transmission or interaction.
In this embodiment, the memory 11 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 11 is used for storing a computer program, and the processor 12 executes the computer program after receiving an execution instruction.
In this embodiment, the processor 12 may be an integrated circuit chip having signal processing capabilities. The processor 12 may be a general-purpose processor including at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU) and a Network Processor (NP), or a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor that implements or executes the methods, steps and logic blocks disclosed in the embodiments of the present application.
In the present embodiment, the sound sensing unit 13 mainly consists of microphones and is used for collecting sound signals from the external environment. Preferably, the inspection robot adopts a multi-microphone linear array. As shown in fig. 2, the embodiment of the present application adopts a four-microphone linear array, which specifically includes a four-channel audio input composed of microphones m1 to m4 and a four-channel audio output out1 to out4 that directly outputs, through a processing unit (DSP), the original audio data collected by m1 to m4. In addition, the array includes channels out5 to out6 for outputting the audio of two loudspeakers, and a speech output channel that outputs interference-free audio data obtained after the audio signals of the six input channels are denoised and echo-cancelled by a third-party audio processing library. It can be understood that, in practical applications, maintenance workers can use the voice intercom function on the inspection robot to hold a voice call over the local area network between the robot end and the central control room.
Based on the above inspection robot structure, the embodiment of the present application provides an equipment abnormal sound detection method, which can be applied to robot inspection scenarios for equipment such as pipe belt conveyors. The method performs abnormal sound detection based on a multi-microphone linear array and gives timely reminders of equipment abnormalities; in addition, after an abnormal sound alarm is raised, the approximate position of the abnormal sound source can be located and uploaded to an equipment management platform, which facilitates subsequent maintenance of the equipment by repair workers.
As shown in fig. 3, the equipment abnormal sound detection method exemplarily includes steps S110 to S140:
and S110, acquiring multi-channel original audio data through a multi-microphone linear array.
Exemplarily, when the multi-microphone linear array is used to acquire audio data, acquisition may be performed in real time according to a preset sampling period, collecting a preset duration each time; for example, audio data with a duration of 3 s to 5 s may be collected in each acquisition, and the specific duration is not limited here and may be adjusted according to actual requirements. It can be understood that the number of channels corresponds to the number of microphones; for the four-microphone linear array shown in fig. 2, the raw audio data output by the four channels out1 to out4 will be obtained.
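The acquisition loop can be illustrated with a short sketch. This is a minimal example assuming the four raw microphone channels are exposed to the host as a single 4-channel input device and that the sounddevice library is used for capture; the sample rate and clip length are the illustrative values mentioned above.

```python
# A minimal sketch of step S110, under the assumption that the four raw
# microphone channels (out1 to out4) appear to the host as one 4-channel
# input device and that the sounddevice library is available.
import sounddevice as sd

SAMPLE_RATE = 16000   # Hz, assumed to match the array's DSP output
CLIP_SECONDS = 3      # each acquisition covers 3 s of audio (3-5 s in the text)
NUM_CHANNELS = 4      # microphones m1..m4 -> channels out1..out4

def acquire_clip():
    """Record one multi-channel clip and return it as an array of shape (samples, channels)."""
    clip = sd.rec(int(CLIP_SECONDS * SAMPLE_RATE),
                  samplerate=SAMPLE_RATE,
                  channels=NUM_CHANNELS,
                  dtype="float32")
    sd.wait()          # block until the clip has been fully recorded
    return clip
```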
And S120, extracting the Mel frequency spectrum characteristics of the original audio data of each channel to obtain the Mel frequency spectrum of the corresponding channel.
The Mel spectrum largely retains the information the human ear needs to understand the original sound, so Mel spectrum feature extraction is performed on the original audio data. The extraction of Mel spectrum features is not described in detail here; reference can be made to the related public literature.
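As a hedged illustration of this step, the sketch below computes a log-Mel spectrogram per channel with librosa, using the window, hop and filter-bank sizes quoted later in the description (n_fft = 512, hop_length = 320, n_mels = 64); the log compression is an assumption, since the text only specifies Mel spectrum feature extraction.

```python
# A sketch of per-channel Mel spectrum extraction (step S120); parameter
# values follow the example given later in the description.
import librosa
import numpy as np

def mel_spectrum(audio, sr=16000, n_fft=512, hop_length=320, n_mels=64):
    """Return a (frames, n_mels) log-Mel spectrogram for one channel of audio."""
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)   # log compression (assumed)
    return log_mel.T                                  # transpose to (frames, n_mels)

# One Mel spectrum per channel of the (samples, channels) clip from step S110:
# mels = [mel_spectrum(clip[:, ch]) for ch in range(clip.shape[1])]
```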
And S130, performing sound classification and recognition by using a sound recognition model according to the Mel spectrums of all channels.
The sound recognition model is obtained through pre-training and is used to classify and recognize sounds; for example, the classification result may indicate whether abnormal sound exists in the input audio. It is understood that abnormal sound mainly refers to one or more sounds generated by the corresponding equipment when an abnormality occurs.
Taking pipe belt conveyor equipment as an example, the abnormal sound may include, but is not limited to, abnormal motor sound, knocking of a metal bracket, carrier roller interference, and the like. For example, a damaged carrier roller easily breaks into two parts; driven by the pipe belt conveyor, the roller rotates at high speed and the two broken parts collide continuously, generating the abnormal sound referred to as carrier roller interference.
In one embodiment, the above sound recognition model is obtained by modifying pre-trained audio neural networks (PANNs), and the modified model structure is shown in fig. 4. Specifically, the sound recognition model comprises an improved batch normalization layer (denoted as the Mel BN structure), first to fourth bottleneck stacking structures and a pooling classification layer (denoted as the Meanmax pool layer), which are connected in sequence. The improved batch normalization layer serves as the input layer of the model and performs batch normalization of the input Mel spectrum feature map along the audio time axis. As shown in fig. 5, the Mel BN structure includes a first channel transposition unit (i.e., transpose), a batch normalization unit (i.e., BN) and a second channel transposition unit, which are arranged in sequence. It is worth noting that, unlike batch normalization in computer vision, the amplitudes of different Mel frequencies are not related to each other, so the Mel BN structure performs batch normalization only along the time dimension. This makes convergence easier to guarantee while highlighting the variation of Mel frequency energy over time, giving better robustness when detecting abnormal sounds distributed over different frequencies. Ablation tests show that the Mel BN structure improves the mAP of abnormal sound recognition and classification by 1.7%.
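A possible PyTorch realization of the Mel BN structure is sketched below: the Mel spectrum is transposed so that each Mel bin becomes a channel, batch normalization is applied (so statistics are computed per Mel bin over the batch and time axes only), and the tensor is transposed back. The input layout (batch, 1, time, n_mels) and the class name are assumptions.

```python
# Sketch of the improved batch normalization ("Mel BN") input layer:
# transpose -> per-Mel-bin batch normalization over time -> transpose back.
import torch
import torch.nn as nn

class MelBN(nn.Module):
    def __init__(self, n_mels: int = 64):
        super().__init__()
        self.bn = nn.BatchNorm2d(n_mels)     # one set of statistics per Mel bin

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, time, n_mels)
        x = x.transpose(1, 3)                # first channel transposition: (batch, n_mels, time, 1)
        x = self.bn(x)                       # normalize each Mel bin along the batch/time axes only
        return x.transpose(1, 3)             # second channel transposition: back to (batch, 1, time, n_mels)
```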
The first to fourth bottleneck stacking structures are mainly used to deepen the network and extract deeper features. In one embodiment, each of the four bottleneck stacking structures includes at least one bottleneck unit (denoted as a CBR structure) and one maximum pooling unit (max pool); for example, the first bottleneck stacking structure includes one bottleneck unit (the CBR ×1 structure in fig. 4), the second and third bottleneck stacking structures each include three bottleneck units (the two CBR ×3 structures in fig. 4), and the fourth bottleneck stacking structure includes two bottleneck units (the CBR ×2 structure in fig. 4). It should be understood that the number of bottleneck stacking structures can be increased or decreased according to actual requirements, and the number of bottleneck units in each stacking structure can be adapted as well. For example, the bottleneck unit in each bottleneck stacking structure may be implemented using the bottleneck layer in a residual network (ResNet).
It can be understood that, for sound classification, the basic stacking unit in the original PANNs network is convolution, normalization and ReLU activation, which easily causes the gradient to vanish in deeper networks. By using a stacking structure of multiple CBR ×n blocks, this embodiment can deepen the network without significantly increasing the amount of model computation, thereby improving the accuracy of recognition and classification of multiple kinds of abnormal sound. Ablation tests show that, after the bottleneck stacking structure is adopted, the mAP of abnormal sound recognition and classification improves by 3%.
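The sketch below illustrates one bottleneck stacking structure: n residual bottleneck units (1×1 reduce, 3×3, 1×1 expand, as in a ResNet bottleneck layer) followed by a max pooling unit. The channel widths, the 4× channel reduction inside each unit and the 2×2 pooling kernel are assumptions, since the text does not specify them.

```python
# Sketch of a CBR x n bottleneck stacking structure followed by max pooling.
import torch
import torch.nn as nn

class BottleneckUnit(nn.Module):
    """Residual bottleneck unit: 1x1 reduce -> 3x3 -> 1x1 expand, with skip connection."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))   # residual connection keeps gradients flowing

def bottleneck_stack(channels: int, n_units: int) -> nn.Sequential:
    """n bottleneck units followed by one maximum pooling unit."""
    layers = [BottleneckUnit(channels) for _ in range(n_units)]
    layers.append(nn.MaxPool2d(kernel_size=2))
    return nn.Sequential(*layers)

# Stacks 1-4 with 1, 3, 3 and 2 bottleneck units respectively (channel widths assumed equal):
# stacks = nn.Sequential(*(bottleneck_stack(64, n) for n in (1, 3, 3, 2)))
```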
The Meanmax pool layer is used to reduce the dimensionality of the features and produce the classification output. As shown in fig. 6, the pooling classification layer may include a pooling unit that combines the average value (mean) and the maximum value (max) of the input feature map, together with a classification function. Compared with using a global average pooling or global maximum pooling layer alone, combining the average and maximum values of the feature map enhances the feature representation capability. Ablation tests show that, compared with global average pooling and global maximum pooling, the Meanmax pool structure improves the mAP of abnormal sound recognition and classification by 2% and 1.4%, respectively.
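A minimal sketch of the Meanmax pooling classification head is given below; summing the global average and global maximum of the feature map is one way to combine the two statistics, and the linear classifier and class count are assumed details.

```python
# Sketch of the Meanmax pool classification layer: combine global average
# and global maximum pooling, then classify.
import torch
import torch.nn as nn

class MeanMaxPoolClassifier(nn.Module):
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, mel) feature map from the last bottleneck stack
        mean_pool = x.mean(dim=(2, 3))       # global average over time and frequency
        max_pool = x.amax(dim=(2, 3))        # global maximum over time and frequency
        return self.fc(mean_pool + max_pool) # combined statistic -> class scores
```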
After a network model with the above structure is constructed, the sound recognition model can be obtained through further training and then deployed in the inspection robot. It can be understood that, by structurally improving the existing PANNs network, this embodiment on the one hand achieves better accuracy and robustness in sound classification and recognition, and on the other hand fully considers that the inspection robot, as an edge processing device, has computing power that is often limited by its hardware.
For step S130 above, exemplarily, the frame number of the Mel spectrum of each channel may be adjusted according to the acquisition duration, the sampling rate and the like of the original audio data to obtain the corresponding Mel spectrum frames; the Mel spectrum frames of each channel are then rendered as a two-dimensional image and input into the trained sound recognition model for sound classification and recognition, thereby obtaining the sound classification result of each channel.
The frame number adjustment mainly depends on the input size of the sound recognition model. Since abnormal equipment sound is generally strongly periodic, 3 s of audio data is basically enough to represent it. Taking original audio data with a length of 3 s as an example, assume the sampling rate sr is 16000, the Hamming window size n_fft = 512, the hop between adjacent frames hop_length = 320 and the number of Mel filters n_mel = 64; the number of Mel spectrum frames of the 3 s audio is then seq_len = (3 × sr − n_fft) / hop_length + 1, and a target frame number of, for example, 160 frames can be selected. This yields two-dimensional Mel spectrum input data of size [160, 64]. It can be appreciated that the original PANNs input is 1000 frames; adjusting it to 160 frames (merely an example) greatly speeds up the algorithm and relieves the computational pressure on edge computing devices. The target frame number may be adapted to the audio duration, sampling rate and the like of each acquisition, and is not limited here.
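The frame-number adjustment can be sketched as follows, using the parameter values quoted above; padding with the spectrogram minimum (or truncating) to reach the target length is an assumed implementation detail, since the text only states that the frame number is adjusted to the model input size.

```python
# Worked sketch of the frame-number adjustment for a 3 s clip.
import numpy as np

sr, n_fft, hop_length, n_mels = 16000, 512, 320, 64
seq_len = (3 * sr - n_fft) // hop_length + 1      # about 149 raw frames for 3 s of audio

def adjust_frames(log_mel: np.ndarray, target: int = 160) -> np.ndarray:
    """Pad (with the minimum value) or truncate so that log_mel has `target` frames."""
    frames = log_mel.shape[0]
    if frames < target:
        pad = np.full((target - frames, log_mel.shape[1]), log_mel.min(), dtype=log_mel.dtype)
        return np.concatenate([log_mel, pad], axis=0)
    return log_mel[:target]                        # two-dimensional input of shape (160, 64)
```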
And S140, triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels.
The preset number may be set adaptively according to the total number of channels, how strict the scenario is about sound recognition, and the like, and is not limited here. For example, with the above four channels, when abnormal sound is identified in the audio input of more than two channels, the abnormal sound alarm may be triggered to remind workers to carry out maintenance in time. Otherwise, if abnormal sound is not identified in the required number of channels, the process returns to step S110 to continue abnormal sound detection.
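The alarm decision itself reduces to counting flagged channels, as in the minimal sketch below (the threshold of 2 matches the four-channel example above and is otherwise an assumption).

```python
# Minimal sketch of the decision in step S140.
def should_alarm(channel_results, preset_number: int = 2) -> bool:
    """channel_results: iterable of booleans, True where abnormal sound was identified."""
    abnormal_channels = sum(1 for abnormal in channel_results if abnormal)
    return abnormal_channels > preset_number       # alarm only when more than the preset number
```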
As a preferred solution, the method further includes automatically locating the abnormal sound source and uploading related information, such as the audio data of the abnormal sound and the position of the abnormal sound source, to the management platform, so as to record the running state of the equipment and prompt maintenance workers to service it.
Exemplarily, as shown in fig. 7, the equipment abnormal sound detection method further includes step S150:
s150, positioning an abnormal sound source according to the original audio data of the channel with the abnormal sound, and sending the identified abnormal sound data and the positioning information of the abnormal sound source to the equipment management platform.
In one embodiment, as shown in fig. 8, the process of locating an abnormal sound source mainly includes:
and S210, extracting a Mel frequency spectrum local response region of the corresponding audio frequency for triggering abnormal sound alarm through gradient return according to the sound classification result output by the sound identification model.
The gradient back-propagation mainly computes the network gradients once according to the classification output of the model. For example, the Grad-CAM algorithm may be adopted: the response heat map of the last convolutional layer is computed from the gradients at its output, and the result is then restored to the original Mel spectrum input size through upsampling interpolation to determine the Mel spectrum local response region. For example, fig. 9A shows the Mel spectrum input of a certain audio clip, and fig. 9B shows the corresponding Mel spectrum local response region extracted based on gradient back-propagation, where the frequency-axis (y-direction) component of the region indicated by the arrow is the Mel frequency range segment of the abnormal sound.
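A hedged Grad-CAM style sketch of this step is shown below: the abnormal-class score is back-propagated once, the last convolutional feature map is weighted by its channel-averaged gradients, and the resulting heat map is upsampled to the Mel spectrum input size. The `model` and `last_conv` handles refer to the network described above and are assumptions about its implementation.

```python
# Sketch of extracting the Mel spectrum local response region via gradient
# back-propagation (Grad-CAM style), for a mel_input of shape (1, 1, time, n_mels).
import torch
import torch.nn.functional as F

def mel_response_region(model, last_conv, mel_input, class_idx):
    feats, grads = {}, {}
    h1 = last_conv.register_forward_hook(lambda m, i, o: feats.update(v=o))
    h2 = last_conv.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    try:
        scores = model(mel_input)                   # (1, num_classes)
        scores[0, class_idx].backward()             # one gradient pass for the abnormal class
        weights = grads["v"].mean(dim=(2, 3), keepdim=True)        # channel importance
        cam = F.relu((weights * feats["v"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=mel_input.shape[-2:],        # upsample back to the
                            mode="bilinear", align_corners=False)  # Mel input size
        return cam.squeeze()                        # (time, n_mels) response region
    finally:
        h1.remove()
        h2.remove()
```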
S220, extracting a target Mel frequency range segment from the Mel spectrum local response region of each audio channel; the target Mel frequency range segment comprises the Mel frequency range segments other than the one corresponding to the abnormal sound.
And S230, performing short-time Fourier transform on the original audio data of the channel with the abnormal sound to obtain a spectrogram.
S240, determining the position of the abnormal sound source according to the spectrogram and the target Mel frequency range.
Exemplarily, when determining the position of the abnormal sound source, an inverse Mel frequency transformation may be performed on the target Mel frequency range segment to compute the corresponding target sound frequency range segment; the amplitude of the target sound frequency range segment is then set to zero to filter the spectrogram, and a short-time inverse Fourier transform is applied to the filtered spectrogram to obtain the filtered audio. It can be understood that the filtered audio suppresses all sounds other than the abnormal sound, preventing interference with the subsequent sound source localization. For example, carrier roller interference noise may occur at a point along a pipe belt conveyor running at high speed; because the motor sound is loud, if it were not suppressed, the sound source localization algorithm would tend to locate the motor rather than the carrier roller interference noise. Finally, sound source localization is performed, based on beam forming technology, on the filtered audio of each channel with abnormal sound, to obtain the position of the abnormal sound source. It can be understood that the multi-channel original audio data is used for abnormal sound recognition and sound source localization, rather than audio that has been denoised, echo-cancelled or otherwise processed, so that the most complete abnormal sound data is retained and abnormal sound features are not lost during audio processing.
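The chain of inverse Mel transformation, band zeroing and inverse short-time Fourier transform can be sketched as below; the function is applied once per target Mel band. The beam-forming localization itself is not reproduced here; the GCC-PHAT delay estimate shown instead is a related but simpler stand-in for illustrating how the filtered channels feed a geometric localization step, and its parameters are assumptions.

```python
# Sketch of suppressing a target frequency band and estimating the
# inter-microphone delay from the filtered channels.
import numpy as np
import librosa

def suppress_band(audio, mel_lo, mel_hi, sr=16000, n_fft=512, hop_length=320):
    """Zero one target band (given in Mel units) of a single channel and resynthesize it."""
    f_lo, f_hi = librosa.mel_to_hz(np.array([mel_lo, mel_hi]))     # inverse Mel transformation
    stft = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length) # spectrogram
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    stft[(freqs >= f_lo) & (freqs <= f_hi), :] = 0                 # zero the target band's amplitude
    return librosa.istft(stft, hop_length=hop_length)              # filtered audio

def gcc_phat_delay(sig_a, sig_b, sr=16000):
    """Time difference of arrival (seconds) between two filtered channels."""
    n = len(sig_a) + len(sig_b)
    cross = np.fft.rfft(sig_a, n) * np.conj(np.fft.rfft(sig_b, n))
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)          # PHAT weighting
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))               # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / sr
```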
According to the above equipment abnormal sound detection method, abnormal sound is recognized and an alarm is raised by using a multi-microphone array and the improved model, which solves the problem of equipment abnormal sound detection, raises an alarm automatically when abnormal sound occurs, realizes intelligent monitoring and reduces labor cost; in addition, the abnormal sound source can be located based on the raw data collected by the multi-microphone linear array, so that manual checking one by one is no longer needed and maintenance efficiency can be greatly improved.
Referring to fig. 10, based on the method of the foregoing embodiments, this embodiment provides an equipment abnormal sound detection device 100. Exemplarily, the device 100 includes:
and the data acquisition module 110 is used for acquiring multi-channel original audio data through the multi-microphone linear array.
The data analysis module 120 is configured to perform mel-frequency spectrum feature extraction on the original audio data of each channel to obtain a mel-frequency spectrum of a corresponding channel.
And the model identification module 130 is used for performing sound classification and identification by using a sound identification model according to the mel frequency spectrums of all the channels.
And the abnormal alarm module 140 is used for triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels.
Further, the equipment abnormal sound detection device 100 also includes a sound source localization module, configured to locate the abnormal sound source according to the original audio data of the channels in which abnormal sound exists and send the identified abnormal sound data and the localization information of the abnormal sound source to the equipment management platform.
It is understood that the apparatus of the present embodiment corresponds to the method of the above embodiment, and the alternatives of the above embodiment are also applicable to the present embodiment, so the description is not repeated here.
The application also provides a readable storage medium for storing the computer program used in the inspection robot.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.
Claims (10)
1. An equipment abnormal sound detection method, comprising:
acquiring multi-channel original audio data through a multi-microphone linear array;
carrying out Mel frequency spectrum feature extraction on the original audio data of each channel to obtain a Mel frequency spectrum of the corresponding channel;
carrying out sound classification and identification by using a sound identification model according to the Mel frequency spectrums of all channels;
and triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels.
2. The equipment abnormal sound detection method according to claim 1, wherein after triggering the abnormal sound alarm, the method further comprises:
performing abnormal sound source localization according to the original audio data of the channel in which the abnormal sound exists, and sending the identified abnormal sound data and the localization information of the abnormal sound source to an equipment management platform.
3. The equipment abnormal sound detection method according to claim 2, wherein the performing abnormal sound source localization according to the original audio data of the channel in which the abnormal sound exists comprises:
extracting, through gradient back-propagation according to the sound classification result output by the sound identification model, the Mel frequency spectrum local response region of the corresponding audio that triggered the abnormal sound alarm;
extracting a target Mel frequency range segment from the Mel frequency spectrum local response region of each channel of audio, wherein the target Mel frequency range segment comprises the Mel frequency range segments other than the Mel frequency range segment corresponding to the abnormal sound;
carrying out short-time Fourier transform on the original audio data of the channel with the abnormal sound to obtain a spectrogram;
and determining the position of an abnormal sound source according to the spectrogram and the target Mel frequency range segment.
4. The equipment abnormal sound detection method according to claim 3, wherein the determining the position of the abnormal sound source according to the spectrogram and the target Mel frequency range segment comprises:
carrying out Mel frequency inverse transformation on the target Mel frequency range segment, and calculating a corresponding target sound frequency range segment;
zeroing the amplitude of the target sound frequency range segment to filter the spectrogram to obtain a filtered spectrogram;
carrying out short-time inverse Fourier transform on the filtered spectrogram to obtain a filtered audio;
and carrying out sound source positioning according to the filtered audio of each channel based on a beam forming technology, and calculating to obtain the position of an abnormal sound source.
5. The equipment abnormal sound detection method according to any one of claims 1 to 4, wherein the performing sound classification and identification by using a sound identification model according to the Mel frequency spectrums of all channels comprises:
based on the acquisition duration of the original audio data, adjusting the frame number of the Mel frequency spectrum of each channel to obtain a corresponding Mel frequency spectrum frame;
and performing two-dimensional imaging on the Mel frequency spectrum frames of each channel and inputting them into the sound identification model for sound classification and identification, to obtain the sound classification result of each channel.
6. The equipment abnormal sound detection method according to any one of claims 1 to 4, wherein the sound identification model comprises an improved batch normalization layer, first to fourth bottleneck stacking structures and a pooling classification layer which are connected in sequence;
the improved batch normalization layer comprises a first channel transposition unit, a batch normalization unit and a second channel transposition unit which are arranged in sequence; the first to fourth bottleneck stacking structures each comprise at least one bottleneck unit and one maximum pooling unit; and the pooling classification layer comprises a pooling unit that pools based on the average value and the maximum value of the input feature map.
7. The equipment abnormal sound detection method according to claim 6, wherein the first bottleneck stacking structure comprises one bottleneck unit, the second and third bottleneck stacking structures each comprise three bottleneck units, and the fourth bottleneck stacking structure comprises two bottleneck units; wherein the bottleneck unit is a bottleneck layer in a residual network.
8. An equipment abnormal sound detection device, comprising:
the data acquisition module is used for acquiring multi-channel original audio data through the multi-microphone linear array;
the data analysis module is used for extracting the Mel frequency spectrum characteristics of the original audio data of each channel to obtain the Mel frequency spectrum of the corresponding channel;
the model identification module is used for carrying out sound classification identification by utilizing a sound identification model according to the Mel frequency spectrums of all the channels;
and the abnormal alarm module is used for triggering an abnormal sound alarm when abnormal sound is identified in more than a preset number of channels.
9. An inspection robot, comprising a multi-microphone linear array, a processor and a memory, wherein the multi-microphone linear array is used for acquiring multi-channel original audio data, the memory stores a computer program, and the processor is used for executing the computer program to implement the equipment abnormal sound detection method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that it stores a computer program which, when executed on a processor, implements the equipment abnormal sound detection method according to any one of claims 1 to 7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210840946.3A (granted as CN115206341B) | 2022-07-18 | 2022-07-18 | Equipment abnormal sound detection method and device and inspection robot |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN115206341A | 2022-10-18 |
| CN115206341B | 2024-10-29 |
Family

- Family ID: 83581701

Family Applications (1)

| Application Number | Priority Date | Filing Date | Status |
|---|---|---|---|
| CN202210840946.3A (CN115206341B) | 2022-07-18 | 2022-07-18 | Active |
Cited By (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| ES2981641A1 | 2022-12-13 | 2024-10-09 | Univ Leon | System, method and program product for automatic accent classification in audio signals |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104320670A (en) * | 2014-11-17 | 2015-01-28 | 东方网力科技股份有限公司 | Summary information extracting method and system for network video |
US20180132815A1 (en) * | 2016-11-11 | 2018-05-17 | iMEDI PLUS Inc. | Multi-mic sound collector and system and method for sound localization |
US20180299527A1 (en) * | 2015-12-22 | 2018-10-18 | Huawei Technologies Duesseldorf Gmbh | Localization algorithm for sound sources with known statistics |
CN110335617A (en) * | 2019-05-24 | 2019-10-15 | 国网新疆电力有限公司乌鲁木齐供电公司 | A kind of noise analysis method in substation |
CN112183647A (en) * | 2020-09-30 | 2021-01-05 | 国网山西省电力公司大同供电公司 | Transformer substation equipment sound fault detection and positioning method based on deep learning |
US20210233554A1 (en) * | 2020-01-24 | 2021-07-29 | Motional Ad Llc | Detection and classification of siren signals and localization of siren signal sources |
CN114220438A (en) * | 2022-02-22 | 2022-03-22 | 武汉大学 | Lightweight speaker identification method and system based on bottleeck and channel segmentation |
CN114373465A (en) * | 2021-11-30 | 2022-04-19 | 北京清微智能信息技术有限公司 | Voiceprint recognition method and device, electronic equipment and computer readable medium |
CN114678030A (en) * | 2022-03-17 | 2022-06-28 | 重庆邮电大学 | Voiceprint identification method and device based on depth residual error network and attention mechanism |
Non-Patent Citations (2)

- SON J et al.: "Practical inter-floor noise sensing system with localization and classification", Sensors, 30 September 2019.
- WANG Ning: "Fault sound source localization system for electromechanical equipment based on a microphone array", China Master's Theses Full-text Database (Engineering Science & Technology II), 15 July 2020, pages 1-2.
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |