CN116072146A - Pumped storage station detection method and system based on voiceprint recognition - Google Patents
- Publication number
- CN116072146A (application CN202211591960.0A)
- Authority
- CN
- China
- Prior art keywords: short-time, voiceprint, component, recognition
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L25/48 — speech or voice analysis techniques specially adapted for particular use
- G10L25/51 — specially adapted for comparison or discrimination
- G10L25/03 — characterised by the type of extracted parameters
- G10L25/09 — the extracted parameters being zero-crossing rates
- G10L25/24 — the extracted parameters being the cepstrum
- G10L25/27 — characterised by the analysis technique
- G10L25/30 — analysis technique using neural networks
- Y02E10/20 — hydro energy (energy generation through renewable energy sources)
Abstract
The invention provides a pumped storage station detection method and system based on voiceprint recognition, comprising the steps of: obtaining sound signals of the operation of a main transformer and a water turbine; processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stationary signal; extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients; realizing multi-mode feature dimension reduction on the short-time domain by a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain by a weighted optimization algorithm to obtain voiceprint enhancement features; and identifying the category of the voiceprint enhancement features through a pre-trained voiceprint recognition model.
Description
Technical Field
The disclosure relates to the technical field of voiceprint recognition, in particular to a pumped storage station detection method and system based on voiceprint recognition.
Background
The voiceprint signals of power transformers, reactors and turbines have long been dismissed as mere noise. During operation, the power transformer (reactor) generates mechanical vibration under the action of magnetostrictive force and electric field force; the acoustic signals generated by the iron core and the windings propagate through the air medium and thus carry a large amount of state information about the equipment.
When the power transformer (reactor) fails, its internal structure changes and the voiceprint signal changes with it, so fault diagnosis of the power transformer (reactor) can be performed through analysis of the voiceprint signal.
The information disclosed in the background section of this application is only for enhancement of understanding of the general background of this application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The embodiment of the disclosure provides a pumped storage station detection method and a pumped storage station detection system based on voiceprint recognition, which at least can solve part of problems in the prior art.
In a first aspect of embodiments of the present disclosure,
the pumped storage station detection method based on voiceprint recognition comprises the following steps:
obtaining sound signals of the operation of a main transformer and a water turbine;
processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stationary signal;
extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
realizing multi-mode feature dimension reduction on the short-time domain by a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain by a weighting optimization algorithm to obtain voiceprint enhancement features;
and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.
In an alternative embodiment of the present invention,
the step of improving the recognition rate of the sound signal through the short-time frequency domain based on a weighted optimization algorithm comprises the following steps:
the recognition rate of the sound signal is improved according to the following formula:

F(k) = Σ_{i=1}^{N} (u_k(i) − u_k)² / Σ_{i=1}^{N} [ (1/n_i) Σ_x (x_k(i) − u_k(i))² ]

wherein F(k) represents the recognition rate of the sound signal corresponding to the kth-dimension component of the short-time frequency domain, N represents the number of recognition objects, u_k(i) represents the kth-dimension component of the feature vector of the ith recognition object, u_k represents the mean value of u_k(i) over all recognition objects, x_k(i) represents the kth-dimension component of the samples of the ith recognition object, and n_i represents the number of samples of the ith recognition object.
In an alternative embodiment of the present invention,
the method for realizing multi-mode feature dimension reduction of the short-time domain based on the PCA algorithm comprises the following steps:
setting the dimension of the joint feature vector formed by the voiceprint time-domain and frequency-domain feature vectors to m, and constructing an m × n matrix X from the n feature samples;
calculating the correlation matrix R of X, and simultaneously calculating the eigenvalues and eigenvectors of R:

R = XᵀX / (n − 1)
calculating the eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λ_m of R and the eigenvector u₁, u₂, …, u_m corresponding to each eigenvalue, and determining the variance contribution rate and the cumulative variance contribution rate, wherein,

variance contribution rate of the jth principal component: η_j = λ_j / Σ_{j=1}^{m} λ_j

cumulative variance contribution rate of the first p components: Σ_{j=1}^{p} η_j
selecting the smallest p for which the cumulative variance contribution rate is larger than 75% as the number of principal components, namely the dimension of the feature vector after dimension reduction, thereby reducing the m-dimensional voiceprint feature vector to a p-dimensional feature vector.
In an alternative embodiment of the present invention,
the voiceprint recognition model is constructed based on a DNN network with four hidden layers, using ReLU as the activation function, a softmax classification function, and a cross-entropy objective function.
In a second aspect of the embodiments of the present disclosure,
provided is a pumped storage station detection device based on voiceprint recognition, comprising:
the first unit is used for acquiring sound signals of the operation of the main transformer and the water turbine;
a second unit, configured to process each frame of the sound signal through a hamming window, and perform fourier transform to obtain a short-time stationary signal;
the third unit is used for extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
a fourth unit, configured to implement multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm, and improve the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm, to obtain voiceprint enhancement features;
and a fifth unit, configured to identify the category of the voiceprint enhancement feature through a voiceprint identification model that is trained in advance.
In a third aspect of the embodiments of the present disclosure,
the utility model provides a pumped storage station detecting system based on voiceprint discernment, including aforementioned pumped storage station detecting device based on voiceprint discernment still includes:
an infrastructure layer comprising an application server, an audio server, and unstructured storage;
the data layer comprises multi-sensor sound data, a sound recognition model and labeling data;
the core component layer comprises a sound characteristic analysis component, a storage component, a network communication component and an interface service component;
the sound analysis service layer comprises an abnormal working condition detection component, an abnormal working condition early warning component, a resource library management component and a storage control component.
In an alternative embodiment of the present invention,
the sound characteristic analysis component comprises sound feature extraction, an intelligent recognition algorithm, and sound stream coding processing;
the storage component comprises a distributed component, storage verification, and a storage structure component;
the network communication component comprises data encryption and decryption, request forwarding, task scheduling, and data caching;
the interface service component comprises interface release, interface service and interface state.
In a fourth aspect of embodiments of the present disclosure,
there is provided an apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fifth aspect of embodiments of the present disclosure,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
Drawings
FIG. 1 is a flow chart of a pumped-storage station detection method based on voiceprint recognition in an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a pumped-storage station detection device based on voiceprint recognition according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in the various embodiments of the present disclosure, the size of the sequence number of each process does not imply its execution order; the execution order of each process should be determined by its functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
It should be understood that in this disclosure, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this disclosure, "plurality" means two or more. "and/or" is merely an association relationship describing an association object, and means that three relationships may exist, for example, and/or B may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. "comprising A, B and C", "comprising A, B, C" means that all three of A, B, C comprise, "comprising A, B or C" means that one of the three comprises A, B, C, and "comprising A, B and/or C" means that any 1 or any 2 or 3 of the three comprises A, B, C.
It should be understood that in this disclosure, "B corresponding to a", "a corresponding to B", or "B corresponding to a" means that B is associated with a from which B may be determined. Determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information. The matching of A and B is that the similarity of A and B is larger than or equal to a preset threshold value.
As used herein, "if" may be interpreted as "when", "upon", "in response to a determination" or "in response to detection", depending on the context.
The technical scheme of the present disclosure is described in detail below with specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of a pumped-storage station detection method based on voiceprint recognition according to an embodiment of the present disclosure, as shown in fig. 1, the method includes:
s101, acquiring a sound signal of operation of a main transformer and a water turbine;
the operation sound signals of the main transformer and the water turbine are tedious and messy, and the similarity of noise in time domain and frequency domain under different working conditions is very high, so that the noise is difficult to directly analyze and identify. By extracting features in the noise signal, subsequent noise analysis can be facilitated.
S102, processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stationary signal;
for noise signals, it is not practical to directly perform time and frequency domain analysis on them, since they are often acquired over a long period of time. Therefore, it is first required to preprocess the sound signal acquired in one period of time, and then to divide the sound signal into several short-time signals. The signal preprocessing comprises two steps of framing and windowing.
To ensure the continuity of the signals of two adjacent frames, the frames generally overlap. In voiceprint recognition, 20 ms to 30 ms is usually taken as one frame. If the frame length is too short, the feature vector scale is small and its representativeness is poor; if the frame length is too long, the signal varies too much within the frame, which affects the accuracy of the feature vector. Compared with a speech signal, a noise signal is stationary, so the frame length can be increased appropriately to obtain higher accuracy, although an excessively long frame seriously reduces recognition efficiency. Experimental analysis shows that a frame length of about 500 ms is most suitable for noise. In addition, since the noise signal under a given working condition is stationary and the continuity of two adjacent frames after framing is good, a lower overlap rate can be used to reduce the amount of calculation; experimental analysis led to an overlap rate of 40%.
To extract per-frame features, a discrete Fourier transform is required. Directly applying the discrete Fourier transform to the framed signal produces large distortion. Therefore, each frame is first windowed and then Fourier transformed, which improves continuity at the frame boundaries and reduces the distortion caused by the Fourier transform. Common window functions include the rectangular window, the Hamming window and the Hanning window. Experiments show that the Hamming window has good low-pass characteristics and better reflects the frequency characteristics of short-time signals. Therefore, the Hamming window is selected to process the framed signal.
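As a rough illustration of the framing and windowing step, the following sketch (assuming a NumPy environment; the frame length, 40% overlap and the test signal are illustrative values taken from the discussion above, not fixed by the patent) splits a signal into overlapping frames, applies a Hamming window to each, and takes the one-sided FFT:

```python
import numpy as np

def frame_and_window(signal, frame_len, overlap=0.4):
    """Split a 1-D signal into overlapping frames and apply a Hamming
    window to each frame.  `frame_len` is in samples; `overlap` is the
    fraction shared by adjacent frames (40% in the text above)."""
    hop = int(frame_len * (1.0 - overlap))
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hamming(frame_len)
    return np.stack([signal[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

def short_time_spectra(frames):
    """One-sided FFT magnitude of each windowed frame."""
    return np.abs(np.fft.rfft(frames, axis=1))

# Illustrative input: 2 s of a 50 Hz hum sampled at 1 kHz, 500 ms frames.
fs = 1000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 50 * t)
frames = frame_and_window(x, frame_len=fs // 2)
spectra = short_time_spectra(frames)
```

With these illustrative numbers the 2 s signal yields six 500-sample frames, and the spectral peak of the 50 Hz tone lands in the FFT bin corresponding to 50 Hz (2 Hz per bin).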
S103, extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
after framing the original noise signal, each frame is regarded as a short-time stable signal, and short-time domain and frequency domain characteristics are extracted respectively. The short-time domain characteristics mainly comprise short-time energy, short-time average zero-crossing rate, short-time energy entropy and the like; the short-time frequency characteristics mainly comprise Mel frequency cepstrum coefficients and the like. In addition, according to the operation condition, the characteristics such as energy similarity, vibration correlation, vibration stability and frequency complexity are designed.
Short-time energy: the energy of each frame of the noise signal is calculated, and the vector formed by the short-time energies of all frames is the short-time energy feature. Short-time average zero-crossing rate: the number of times the signal waveform crosses the horizontal axis in each frame is calculated, and the vector formed by the short-time zero-crossing rates of all frames is the short-time average zero-crossing rate feature;
short-term energy entropy: calculating the uniformity degree of energy distribution in each frame, wherein vectors formed by short-time energy entropy of all frames are short-time energy entropy characteristics;
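The three short-time time-domain features above can be sketched as follows (a minimal NumPy illustration; the sub-block count used for the energy entropy is an assumed parameter, not specified in the text):

```python
import numpy as np

def short_time_energy(frames):
    # Energy of each frame; the vector over all frames is the feature.
    return np.sum(frames ** 2, axis=1)

def short_time_zcr(frames):
    # Number of sign changes (horizontal-axis crossings) per frame.
    return np.sum(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def short_time_energy_entropy(frames, n_sub=10):
    # Split each frame into n_sub sub-blocks and measure how evenly the
    # energy is distributed across them (higher = more uniform).
    n, L = frames.shape
    sub = frames[:, : L - L % n_sub].reshape(n, n_sub, -1)
    e = np.sum(sub ** 2, axis=2)
    p = e / (np.sum(e, axis=1, keepdims=True) + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12), axis=1)
```

For a pure 50 Hz sine frame the energy spreads evenly over the sub-blocks, so the entropy approaches its maximum log2(n_sub).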
Mel frequency cepstrum coefficients: the MFCC extraction comprises five steps: FFT, taking the modulus, Mel filtering, logarithmic transformation and DCT. An MFCC feature vector is obtained for each preprocessed frame, and the vectors are combined to form a feature vector group. Specifically, each framed signal is first FFT-transformed and its modulus taken, and then transformed to the Mel frequency domain by a bank of p Mel filters. The resulting set of p coefficients c(i) constitutes the MFCC feature vector of that frame.
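The five-step MFCC chain (FFT, modulus, Mel filtering, logarithm, DCT) can be sketched as below; the filter count and coefficient count are conventional illustrative choices (26 filters, 13 coefficients), not values from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters with centers spaced evenly on the mel scale.
    mels = np.linspace(0, hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def dct_ii(x, n_coeffs):
    # Type-II DCT of each row, keeping the first n_coeffs coefficients.
    N = x.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(N)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    return x @ basis.T

def mfcc(frames, fs, n_filters=26, n_coeffs=13):
    """FFT -> modulus -> mel filtering -> log -> DCT (the five steps)."""
    spec = np.abs(np.fft.rfft(frames, axis=1))      # FFT + modulus
    fb = mel_filterbank(n_filters, frames.shape[1], fs)
    mel_energy = np.log(spec @ fb.T + 1e-12)        # mel filter + log
    return dct_ii(mel_energy, n_coeffs)             # DCT -> cepstrum
```

Each input frame yields one 13-dimensional MFCC vector; stacking the per-frame vectors gives the feature vector group described above.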
S104, realizing multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm to obtain voiceprint enhancement features; and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.
After the noise features of the transformer are extracted, the target working condition needs to be modeled so that the noise to be detected can be compared and judged. In speech recognition, commonly used models include the Gaussian mixture model (GMM), the hidden Markov model (HMM) and the i-vector model.
However, the above models fit nonlinear acoustic signals poorly. In addition, they contain a large number of parameters, and their computational complexity is high. In 2014, Google proposed using a deep neural network as the speaker model for voiceprint recognition, named d-vector. During training a complete DNN is used; after training is completed, the final classification layer is removed, and the output of the last hidden layer serves as the speaker voice template for comparison. Compared with the best-performing i-vector model, the accuracy is improved and the amount of calculation is greatly reduced. The present embodiment uses DNN as a baseline and adapts it to noise features.
The transformer voiceprint features are analyzed based on the DNN. The network input is the concatenation of the extracted per-frame feature vectors, and the output is the one-hot vector corresponding to each working condition. In the neural network, ReLU is used as the activation function, a softmax function is used for classification, and a cross-entropy function is used as the objective function.
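A minimal forward-pass sketch of such a network — four ReLU hidden layers, a softmax output, and a cross-entropy objective — is given below. The layer widths, input dimension and class count are illustrative assumptions, and real training (backpropagation) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

class DVectorDNN:
    """Four ReLU hidden layers + softmax classifier.  After training,
    the softmax layer is discarded and the last hidden activation is
    used as the d-vector template.  Sizes here are illustrative."""

    def __init__(self, dim_in, hidden=(256, 256, 256, 256), n_classes=4):
        sizes = [dim_in, *hidden, n_classes]
        self.W = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes, sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        h = x
        for W, b in zip(self.W[:-1], self.b[:-1]):
            h = relu(h @ W + b)
        self.d_vector = h                 # last hidden layer output
        return softmax(h @ self.W[-1] + self.b[-1])

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true one-hot classes.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
```

After training, `forward` would be run with the classification layer removed; here `d_vector` simply exposes the last hidden activation that would serve as the working-condition template.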
Because the noise sample data under each working condition is too scarce in the actual running state, the model cannot be sufficiently trained, and collecting a large number of samples under different working conditions for voiceprint recognition is very time-consuming and laborious. In addition, once sound models have been trained for a set of working conditions, using other types of equipment as monitoring objects would require training from scratch, wasting time and resources. Based on the assumption that acoustic features share a large degree of commonality, the idea of parameter migration (transfer learning) is used to train and learn the noise features of different types of equipment. Specifically, a pre-trained model trained on a large amount of general data is further trained to fit the specific monitoring target.
After training the neural network with training data, the sound of the target passes through the trained network with fixed parameters; the labeled output classification layer is removed, so that the output of the last hidden layer serves as the voiceprint template corresponding to the working condition. For each working condition, 5 to 15 samples are randomly taken from the training data set and target templates are obtained through the neural network. Because template selection has a certain randomness, the k-means algorithm is used to cluster the obtained target templates, to ensure that they accurately reflect the noise conditions under each working condition. Specifically, the 5 to 15 corresponding target template outputs are clustered with k = 2. After clustering, the cluster containing the majority of samples is selected as the sample template representing that working condition.
Finally, after its features are extracted, the noise signal to be detected is fed into the trained DNN model, and the output o of the last hidden layer is obtained. The similarity between o and the output p of the target template is then computed. Specifically, the cosine similarity measure is used:

cos(o, p) = (o · p) / (‖o‖ ‖p‖)
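The template comparison can be sketched as follows; `match_condition` and its decision threshold are illustrative additions for showing how the cosine score might be used, not part of the patent:

```python
import numpy as np

def cosine_similarity(o, p):
    """Cosine similarity between the hidden-layer output o of the signal
    under test and a stored working-condition template p."""
    return float(np.dot(o, p) /
                 (np.linalg.norm(o) * np.linalg.norm(p) + 1e-12))

def match_condition(o, templates, threshold=0.8):
    # Compare o against every condition template; the threshold value
    # here is illustrative, not taken from the patent.
    scores = {name: cosine_similarity(o, t) for name, t in templates.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else "unknown"), scores
```

A signal whose hidden-layer output points in nearly the same direction as a template scores close to 1 and is assigned that working condition.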
in an alternative embodiment of the present invention,
the step of improving the recognition rate of the sound signal through the short-time frequency domain based on a weighted optimization algorithm comprises the following steps:
the recognition rate of the sound signal is improved according to the following formula:

F(k) = Σ_{i=1}^{N} (u_k(i) − u_k)² / Σ_{i=1}^{N} [ (1/n_i) Σ_x (x_k(i) − u_k(i))² ]

wherein F(k) represents the recognition rate of the sound signal corresponding to the kth-dimension component of the short-time frequency domain, N represents the number of recognition objects, u_k(i) represents the kth-dimension component of the feature vector of the ith recognition object, u_k represents the mean value of u_k(i) over all recognition objects, x_k(i) represents the kth-dimension component of the samples of the ith recognition object, and n_i represents the number of samples of the ith recognition object.
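Read per the symbol definitions above, the weighting amounts to a Fisher ratio computed per feature dimension: the between-object scatter of the per-object means divided by the averaged within-object variance. A NumPy sketch (the `weight_mfcc` helper and its normalisation are illustrative, not from the patent):

```python
import numpy as np

def f_ratio(samples):
    """Fisher ratio F(k) of each feature dimension.

    `samples` is a list with one (n_i, d) array per recognition object.
    A high F(k) means dimension k separates the objects well, so it
    should receive more weight in the similarity comparison."""
    means = np.stack([s.mean(axis=0) for s in samples])        # u_k(i)
    grand = means.mean(axis=0)                                  # u_k
    between = np.mean((means - grand) ** 2, axis=0)
    within = np.mean(
        [((s - m) ** 2).mean(axis=0) for s, m in zip(samples, means)],
        axis=0)
    return between / (within + 1e-12)

def weight_mfcc(features, F):
    # Scale each cepstral dimension by its normalised Fisher ratio.
    return features * (F / F.sum())
```

In a toy case where dimension 0 cleanly separates two objects and dimension 1 is pure noise, F(0) dwarfs F(1), so dimension 0 dominates after weighting.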
In an alternative embodiment of the present invention,
the method for realizing multi-mode feature dimension reduction of the short-time domain based on the PCA algorithm comprises the following steps:
setting the dimension of the joint feature vector formed by concatenating the voiceprint time-domain and frequency-domain feature vectors to $m$, and constructing the $n \times m$ sample matrix $X$, one joint feature vector per row for $n$ samples;

calculating the correlation matrix $R$ of $X$, together with the eigenvalues and eigenvectors of $R$:

$$R = X^{\mathsf{T}} X / (n - 1)$$

calculating the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$ of $R$ and the eigenvector $u_1, u_2, \ldots, u_m$ corresponding to each eigenvalue, and determining the variance contribution and the cumulative variance contribution rate, wherein,

variance contribution of the $i$-th principal component: $\lambda_i$;

variance contribution rate: $\eta_i = \lambda_i \big/ \sum_{j=1}^{m} \lambda_j$, with cumulative variance contribution rate $\sum_{i=1}^{p} \eta_i$;

and selecting as the number of principal components the smallest $p$ whose cumulative variance contribution rate exceeds 75%; $p$ is the dimension of the feature vector after reduction, so the $m$-dimensional voiceprint feature vector is reduced to $p$ dimensions.
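The PCA steps above can be sketched as follows, assuming the columns of X have already been standardized so that XᵀX/(n−1) is the correlation matrix, with one sample per row:

```python
import numpy as np

def pca_reduce(X, threshold=0.75):
    """PCA dimension reduction keeping the smallest p whose cumulative
    variance contribution rate exceeds `threshold`.
    X: (n, m) matrix, one standardized m-dimensional feature vector per row."""
    n = X.shape[0]
    R = X.T @ X / (n - 1)                      # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    rates = eigvals / eigvals.sum()            # variance contribution rates
    p = int(np.searchsorted(np.cumsum(rates), threshold) + 1)
    return X @ eigvecs[:, :p], p
```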
In an alternative embodiment of the present invention,
the voiceprint recognition model is constructed on a DNN network with four hidden layers, using ReLU as the activation function, a softmax classification layer, and a cross-entropy objective function.
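A forward-pass-only sketch of such a network in plain numpy (the layer widths and class count are illustrative assumptions the patent does not specify; a practical implementation would train the weights with cross-entropy in a deep learning framework):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    """Cross-entropy loss for a single sample with integer class label."""
    return float(-np.log(probs[label]))

class VoiceprintDNN:
    """Four ReLU hidden layers + softmax output; forward pass only."""
    def __init__(self, in_dim, hidden=(256, 256, 256, 256), n_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        dims = (in_dim, *hidden, n_classes)
        self.W = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
                  for a, b in zip(dims[:-1], dims[1:])]
        self.b = [np.zeros(b) for b in dims[1:]]

    def forward(self, x):
        h = x
        for W, b in zip(self.W[:-1], self.b[:-1]):
            h = relu(h @ W + b)
        self.embedding = h          # last hidden layer -> voiceprint template
        return softmax(h @ self.W[-1] + self.b[-1])
```

Stripping the final softmax layer and reading `embedding` is exactly how the template vectors described earlier are obtained.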
In a second aspect of the embodiments of the present disclosure,
Fig. 2 is a schematic structural diagram of a pumped storage station detection device based on voiceprint recognition according to an embodiment of the present disclosure; the device comprises:
the first unit is used for acquiring sound signals of the operation of the main transformer and the water turbine;
a second unit, configured to process each frame of the sound signal through a hamming window, and perform fourier transform to obtain a short-time stationary signal;
the third unit is used for extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
a fourth unit, configured to implement multi-modal feature dimension reduction on the short-time domain based on a PCA algorithm, and to improve the recognition rate of the sound signal in the short-time frequency domain based on a weighted optimization algorithm, obtaining voiceprint enhancement features;

a fifth unit, configured to identify the category of the voiceprint enhancement features through a pre-trained voiceprint recognition model.
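The short-time time-domain features handled by the third unit can be sketched per Hamming-windowed frame as follows (the frame length and hop size are illustrative assumptions; MFCC extraction, which requires a mel filterbank, is omitted):

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Short-time energy and short-time average zero-crossing rate for each
    Hamming-windowed frame of a 1-D signal. Returns an (n_frames, 2) array."""
    w = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * w
        energy = float(np.sum(frame ** 2))                       # short-time energy
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))  # zero-crossing rate
        feats.append((energy, zcr))
    return np.array(feats)
```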
In a third aspect of the embodiments of the present disclosure,
a pumped storage station detection system based on voiceprint recognition is provided, comprising the aforementioned pumped storage station detection device based on voiceprint recognition, and further comprising:
an infrastructure layer comprising an application server, an audio server, and unstructured storage;
the data layer comprises multi-sensor sound data, a sound recognition model and labeling data;
the core component layer comprises a sound characteristic analysis component, a storage component, a network communication component and an interface service component;
the sound analysis service layer comprises an abnormal working condition detection component, an abnormal working condition early warning component, a resource library management component and a storage control component.
In an alternative embodiment of the present invention,
the sound characteristic analysis component comprises sound characteristic extraction, intelligent recognition algorithm and sound stream code processing;
the storage component comprises a distributed component, a storage verification and storage structure component;
the network communication component comprises data encryption and decryption, request forwarding task scheduling and data caching;
the interface service component comprises interface release, interface service and interface state.
In a fourth aspect of embodiments of the present disclosure,
there is provided an apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fifth aspect of embodiments of the present disclosure,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in grooves having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, the electronic circuitry being able to execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features. Where the terms "further", "preferably", "still further" or "more preferably" are used, the description that follows is given on the basis of the foregoing embodiment, and the content following the term combines with the foregoing embodiment to form a complete further embodiment. Several such "further", "preferably", "still further" or "more preferably" passages following the same embodiment may be combined arbitrarily to form additional embodiments.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples and embodiments of the invention may be modified or practiced without departing from the principles described.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present disclosure. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.
Claims (9)
1. A pumped storage station detection method based on voiceprint recognition, comprising:
acquiring sound signals of operation of a main transformer and a water turbine;
processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stable signal;
extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
realizing multi-mode feature dimension reduction on the short-time domain by a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain by a weighting optimization algorithm to obtain voiceprint enhancement features;
and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.
2. The method of claim 1, wherein improving the recognition rate of the sound signal in the short-time frequency domain by the weighted optimization algorithm comprises:

improving the recognition rate of the sound signal according to the following formula:

$$F(k) = \frac{\sum_{i=1}^{N}\left(u_k(i) - u_k\right)^2}{\sum_{i=1}^{N} \dfrac{1}{n_i} \sum_{j=1}^{n_i} \left(x_k^{(j)}(i) - u_k(i)\right)^2}$$

wherein $F(k)$ denotes the recognition rate corresponding to the $k$-th dimensional component of the short-time frequency domain, $N$ the number of recognition objects, $u_k(i)$ the $k$-th dimensional component of the mean feature vector of the $i$-th recognition object, $u_k$ the mean of the $k$-th dimensional component over all samples, $x_k^{(j)}(i)$ the $k$-th dimensional component of the $j$-th sample of the $i$-th recognition object, and $n_i$ the number of samples of the $i$-th recognition object.
3. The method of claim 1, wherein implementing multi-modal feature dimension reduction on the short-time domain by the PCA-based algorithm comprises:

setting the dimension of the joint feature vector formed by concatenating the voiceprint time-domain and frequency-domain feature vectors to $m$, and constructing the $n \times m$ sample matrix $X$, one joint feature vector per row for $n$ samples;

calculating the correlation matrix $R$ of $X$, together with the eigenvalues and eigenvectors of $R$:

$$R = X^{\mathsf{T}} X / (n - 1)$$

calculating the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$ of $R$ and the eigenvector $u_1, u_2, \ldots, u_m$ corresponding to each eigenvalue, and determining the variance contribution and the cumulative variance contribution rate, wherein,

variance contribution of the $i$-th principal component: $\lambda_i$;

variance contribution rate: $\eta_i = \lambda_i \big/ \sum_{j=1}^{m} \lambda_j$, with cumulative variance contribution rate $\sum_{i=1}^{p} \eta_i$;

and selecting as the number of principal components the smallest $p$ whose cumulative variance contribution rate exceeds 75%; $p$ is the dimension of the feature vector after reduction, so the $m$-dimensional voiceprint feature vector is reduced to $p$ dimensions.
4. The method of claim 1, wherein the voiceprint recognition model is constructed on a DNN network with four hidden layers, using ReLU as the activation function, a softmax classification function, and a cross-entropy objective function.
5. A pumped storage station detection device based on voiceprint recognition, characterized by comprising:
the first unit is used for acquiring sound signals of the operation of the main transformer and the water turbine;
a second unit, configured to process each frame of the sound signal through a hamming window, and perform fourier transform to obtain a short-time stationary signal;
the third unit is used for extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
a fourth unit, configured to implement multi-modal feature dimension reduction on the short-time domain based on a PCA algorithm, and to improve the recognition rate of the sound signal in the short-time frequency domain based on a weighted optimization algorithm, to obtain voiceprint enhancement features;
and a fifth unit, configured to identify the category of the voiceprint enhancement feature through a voiceprint identification model that is trained in advance.
6. A voiceprint recognition-based pumped-storage station detection system comprising the voiceprint recognition-based pumped-storage station detection apparatus of claim 5, further comprising:
an infrastructure layer comprising an application server, an audio server, and unstructured storage;
the data layer comprises multi-sensor sound data, a sound recognition model and labeling data;
the core component layer comprises a sound characteristic analysis component, a storage component, a network communication component and an interface service component;
the sound analysis service layer comprises an abnormal working condition detection component, an abnormal working condition early warning component, a resource library management component and a storage control component.
7. The system of claim 6, wherein the system further comprises a controller configured to control the controller,
the sound characteristic analysis component comprises sound characteristic extraction, intelligent recognition algorithm and sound stream code processing;
the storage component comprises a distributed component, a storage verification and storage structure component;
the network communication component comprises data encryption and decryption, request forwarding task scheduling and data caching;
the interface service component comprises interface release, interface service and interface state.
8. An apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 4.
9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591960.0A CN116072146A (en) | 2022-12-12 | 2022-12-12 | Pumped storage station detection method and system based on voiceprint recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591960.0A CN116072146A (en) | 2022-12-12 | 2022-12-12 | Pumped storage station detection method and system based on voiceprint recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116072146A true CN116072146A (en) | 2023-05-05 |
Family
ID=86177840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211591960.0A Pending CN116072146A (en) | 2022-12-12 | 2022-12-12 | Pumped storage station detection method and system based on voiceprint recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116072146A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118391183A (en) * | 2024-04-23 | 2024-07-26 | 淮阴工学院 | A method for turbine condition monitoring and fault diagnosis based on voiceprint imaging |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108198547B (en) | Voice endpoint detection method and device, computer equipment and storage medium | |
WO2021128741A1 (en) | Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium | |
CN110910891B (en) | Speaker segmentation labeling method based on long-time and short-time memory deep neural network | |
CN110349597B (en) | A kind of voice detection method and device | |
CN108305616A (en) | A kind of audio scene recognition method and device based on long feature extraction in short-term | |
JP2019211749A (en) | Method and apparatus for detecting starting point and finishing point of speech, computer facility, and program | |
CN111724770B (en) | Audio keyword identification method for generating confrontation network based on deep convolution | |
CN108986798B (en) | Processing method, device and the equipment of voice data | |
CN110120230B (en) | Acoustic event detection method and device | |
CN112331207B (en) | Service content monitoring method, device, electronic equipment and storage medium | |
CN115101076B (en) | Speaker clustering method based on multi-scale channel separation convolution feature extraction | |
CN112071308A (en) | Awakening word training method based on speech synthesis data enhancement | |
CN108091340B (en) | Voiceprint recognition method, voiceprint recognition system, and computer-readable storage medium | |
CN112735477B (en) | Voice emotion analysis method and device | |
Rahman et al. | Dynamic time warping assisted svm classifier for bangla speech recognition | |
CN114121025A (en) | Voiceprint fault intelligent detection method and device for substation equipment | |
CN116072146A (en) | Pumped storage station detection method and system based on voiceprint recognition | |
Salhi et al. | Robustness of auditory teager energy cepstrum coefficients for classification of pathological and normal voices in noisy environments | |
CN117594061A (en) | A sound detection and localization method based on multi-scale feature attention network | |
CN117877510A (en) | Voice automatic test method, device, electronic equipment and storage medium | |
CN113921018B (en) | Voiceprint recognition model training method and device, voiceprint recognition method and device | |
Faridh et al. | HiVAD: a voice activity detection application based on deep learning | |
Bai et al. | CIAIC-BAD system for DCASE2018 challenge task 3 | |
Lavania et al. | Reviewing Human-Machine Interaction through Speech Recognition approaches and Analyzing an approach for Designing an Efficient System | |
CN110689875A (en) | Language identification method and device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||