
CN116072146A - Pumped storage station detection method and system based on voiceprint recognition - Google Patents

Pumped storage station detection method and system based on voiceprint recognition

Info

Publication number
CN116072146A
CN116072146A (application CN202211591960.0A)
Authority
CN
China
Prior art keywords
short
time
voiceprint
component
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211591960.0A
Other languages
Chinese (zh)
Inventor
于潇
卢彬
任刚
郭旭东
李国宾
董旭龙
赵雪鹏
金清山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Zhanghewan Pumped Storage Power Co ltd
State Grid Xinyuan Co Ltd
Original Assignee
Hebei Zhanghewan Pumped Storage Power Co ltd
State Grid Xinyuan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Zhanghewan Pumped Storage Power Co ltd, State Grid Xinyuan Co Ltd filed Critical Hebei Zhanghewan Pumped Storage Power Co ltd
Priority to CN202211591960.0A
Publication of CN116072146A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/09 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00 Energy generation through renewable energy sources
    • Y02E10/20 Hydro energy

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides a pumped storage station detection method and a system based on voiceprint recognition, comprising the steps of obtaining sound signals of operation of a main transformer and a water turbine; processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stable signal; extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients; realizing multi-mode feature dimension reduction on the short-time domain by a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain by a weighting optimization algorithm to obtain voiceprint enhancement features; and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.

Description

Pumped storage station detection method and system based on voiceprint recognition
Technical Field
The disclosure relates to the technical field of voiceprint recognition, in particular to a pumped storage station detection method and system based on voiceprint recognition.
Background
The voiceprint signals of power transformers, reactors and turbines have long been dismissed as noise. During operation, a power transformer (reactor) vibrates mechanically under the action of magnetostriction and electric field forces; the vibration of the iron core and windings propagates through the air as the acoustic signal of the power transformer (reactor), which therefore carries a large amount of equipment state information.
When the power transformer (reactor) fails, the internal structure changes, so that the voiceprint signal changes, and the power transformer (reactor) can be subjected to fault diagnosis through analysis of the voiceprint signal.
The information disclosed in the background section of this application is only for enhancement of understanding of the general background of this application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The embodiment of the disclosure provides a pumped storage station detection method and a pumped storage station detection system based on voiceprint recognition, which at least can solve part of problems in the prior art.
In a first aspect of embodiments of the present disclosure,
the pumped storage station detection method based on voiceprint recognition comprises the following steps:
acquiring a sound signal of the operation of the main transformer and the water turbine;
processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stable signal;
extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
realizing multi-mode feature dimension reduction on the short-time domain by a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain by a weighting optimization algorithm to obtain voiceprint enhancement features;
and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.
In an alternative embodiment of the present invention,
the step of improving the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm comprises:
improving the recognition rate of the sound signal by weighting each dimension component according to the following formula:

F(k) = \frac{\sum_{i=1}^{N} \left(u_k(i) - u_k\right)^2}{\sum_{i=1}^{N} \frac{1}{n_i} \sum_{j=1}^{n_i} \left(x_k^{(j)}(i) - u_k(i)\right)^2}

wherein F(k) represents the recognition rate of the sound signal corresponding to the k-th dimension component of the short-time frequency domain, N represents the number of recognition objects, u_k(i) represents the k-th dimension component of the mean feature vector of the i-th recognition object, u_k represents the overall mean of the k-th dimension component, x_k^{(j)}(i) represents the k-th dimension component of the j-th sample of the i-th recognition object, and n_i represents the number of samples of the i-th recognition object.
In an alternative embodiment of the present invention,
the method for realizing multi-mode feature dimension reduction of the short-time domain based on the PCA algorithm comprises the following steps:
setting the dimension of the joint feature vector formed by the voiceprint time-domain and frequency-domain feature vectors as m, and constructing a sample matrix X whose n rows are the standardized m-dimensional joint feature vectors;
calculating the correlation matrix R of X, and simultaneously calculating the eigenvalues and eigenvectors of R:

R = X^{T} X / (n - 1), \qquad R\,u_i = \lambda_i u_i

calculating the eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m of R and the eigenvector u_1, u_2, \ldots, u_m corresponding to each eigenvalue, and determining the variance contribution and the cumulative variance contribution rate, wherein,
variance contribution of the i-th principal component: \lambda_i;
variance contribution rate:

\eta_i = \lambda_i \Big/ \sum_{j=1}^{m} \lambda_j

cumulative variance contribution rate of the first p components:

\eta(p) = \sum_{i=1}^{p} \eta_i

and selecting the smallest p whose cumulative variance contribution rate exceeds 75% as the number of principal components, i.e. the dimension of the feature vector after dimension reduction, thereby reducing the m-dimensional voiceprint feature vector to a p-dimensional feature vector.
In an alternative embodiment of the present invention,
the voiceprint recognition model is constructed based on a DNN network, uses ReLU as the activation function and softmax as the classification function, takes the cross-entropy function as the objective function, and comprises four hidden layers.
In a second aspect of the embodiments of the present disclosure,
provided is a pumped storage station detection device based on voiceprint recognition, comprising:
the first unit is used for acquiring sound signals of the operation of the main transformer and the water turbine;
a second unit, configured to process each frame of the sound signal through a hamming window, and perform fourier transform to obtain a short-time stationary signal;
the third unit is used for extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
a fourth unit, configured to implement multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm, and improve the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm, to obtain a voiceprint enhancement feature;
and a fifth unit, configured to identify the category of the voiceprint enhancement feature through a voiceprint identification model that is trained in advance.
In a third aspect of the embodiments of the present disclosure,
the utility model provides a pumped storage station detecting system based on voiceprint discernment, including aforementioned pumped storage station detecting device based on voiceprint discernment still includes:
an infrastructure layer comprising an application server, an audio server, and unstructured storage;
the data layer comprises multi-sensor sound data, a sound recognition model and labeling data;
the core component layer comprises a sound characteristic analysis component, a storage component, a network communication component and an interface service component;
the sound analysis service layer comprises an abnormal working condition detection component, an abnormal working condition early warning component, a resource library management component and a storage control component.
In an alternative embodiment of the present invention,
the sound characteristic analysis component comprises sound characteristic extraction, intelligent recognition algorithm and sound stream code processing;
the storage component comprises a distributed component, a storage verification and storage structure component;
the network communication component comprises data encryption and decryption, request forwarding task scheduling and data caching;
the interface service component comprises interface release, interface service and interface state.
In a fourth aspect of embodiments of the present disclosure,
there is provided an apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fifth aspect of embodiments of the present disclosure,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
Drawings
FIG. 1 is a flow chart of a pumped-storage station detection method based on voiceprint recognition in an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a pumped-storage station detection device based on voiceprint recognition according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present disclosure, the magnitude of the sequence number of each process does not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
It should be understood that in this disclosure, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this disclosure, "plurality" means two or more. "And/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are comprised; "comprising A, B or C" means that one of A, B and C is comprised; "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.
It should be understood that in this disclosure, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A, and that B may be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A and B match when the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context.
The technical scheme of the present disclosure is described in detail below with specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of a pumped-storage station detection method based on voiceprint recognition according to an embodiment of the present disclosure, as shown in fig. 1, the method includes:
s101, acquiring a sound signal of operation of a main transformer and a water turbine;
the operation sound signals of the main transformer and the water turbine are tedious and messy, and the similarity of noise in time domain and frequency domain under different working conditions is very high, so that the noise is difficult to directly analyze and identify. By extracting features in the noise signal, subsequent noise analysis can be facilitated.
S102, processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stable signal;
for noise signals, it is not practical to directly perform time and frequency domain analysis on them, since they are often acquired over a long period of time. Therefore, it is first required to preprocess the sound signal acquired in one period of time, and then to divide the sound signal into several short-time signals. The signal preprocessing comprises two steps of framing and windowing.
To ensure continuity between the signals of two adjacent frames, the frames generally overlap. In voiceprint recognition, 20 ms to 30 ms is usually taken as one frame. If the frame length is too short, the feature vector scale is small and poorly representative; if the frame length is too long, the signal varies too much within a frame, which affects the accuracy of the feature vector. Compared with a speech signal, the noise signal is stationary, so the frame length can be increased appropriately to obtain higher accuracy, but an excessively long frame length severely affects recognition efficiency. Experimental analysis shows that a frame length of about 500 ms is most suitable for noise. In addition, since the noise signal under the same working condition is stationary and adjacent frames remain highly continuous after framing, the overlap rate can be lowered to reduce the amount of computation. Through experimental analysis, an overlap rate of 40% was adopted.
To extract per-frame features, a discrete Fourier transform is required. Directly applying the discrete Fourier transform to the framed signal produces large distortion. Therefore, each frame is first windowed and then Fourier transformed, which increases continuity across the signal and reduces the distortion caused by the Fourier transform. Common window functions include the rectangular window, the Hamming window, and the Hanning window. Experiments show that the Hamming window has good low-pass characteristics and better reflects the frequency characteristics of short-time signals. Therefore, a Hamming window is selected to process the framed signal.
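The preprocessing chain just described can be sketched as follows, with a 500 ms frame, 40% overlap and a Hamming window as reported above; the sampling rate fs and the function name are illustrative assumptions, not from the patent.

```python
import numpy as np

def frame_window_fft(signal, fs, frame_len_s=0.5, overlap=0.4):
    """Split a 1-D noise signal into 500 ms frames with 40% overlap, apply a
    Hamming window to each frame, and return the short-time magnitude spectra."""
    frame_len = int(frame_len_s * fs)
    hop = int(frame_len * (1.0 - overlap))   # 40% overlap between frames
    window = np.hamming(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectra.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum
    return np.array(spectra)
```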
S103, extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
after framing the original noise signal, each frame is regarded as a short-time stable signal, and short-time domain and frequency domain characteristics are extracted respectively. The short-time domain characteristics mainly comprise short-time energy, short-time average zero-crossing rate, short-time energy entropy and the like; the short-time frequency characteristics mainly comprise Mel frequency cepstrum coefficients and the like. In addition, according to the operation condition, the characteristics such as energy similarity, vibration correlation, vibration stability and frequency complexity are designed.
Short-time energy: calculate the energy of each frame of the noise signal; the vector formed by the short-time energies of all frames is the short-time energy feature. Short-time average zero-crossing rate: count the number of times the signal waveform crosses the horizontal axis in each frame; the vector formed by the short-time zero-crossing rates of all frames is the short-time average zero-crossing rate feature;
short-term energy entropy: calculating the uniformity degree of energy distribution in each frame, wherein vectors formed by short-time energy entropy of all frames are short-time energy entropy characteristics;
mel frequency cepstral coefficients: the MFCC extraction process comprises five steps: FFT, magnitude computation, Mel filtering, logarithmic transformation and DCT. An MFCC feature vector is obtained for each preprocessed frame, and these vectors are combined into a feature vector group. Specifically, each framed signal is first FFT transformed and its magnitude taken, and then transformed to the Mel frequency domain by p Mel filter banks. The resulting set of p coefficients c(i) constitutes the MFCC feature vector of that frame.
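Hedged sketches of the three short-time-domain features, computed one frame at a time; the sub-frame count used for the energy entropy is an assumption. The short-time frequency-domain MFCCs can be reproduced with any standard FFT → magnitude → Mel filtering → log → DCT pipeline (e.g., librosa.feature.mfcc).

```python
import numpy as np

def short_time_energy(frame):
    # Energy of one frame: sum of squared amplitudes
    return float(np.sum(frame ** 2))

def short_time_zcr(frame):
    # Number of horizontal-axis crossings, normalized by frame length
    return float(np.sum(np.abs(np.diff(np.sign(frame)))) / (2 * len(frame)))

def short_time_energy_entropy(frame, n_sub=10):
    # Uniformity of the energy distribution over n_sub sub-frames
    sub_energies = np.array([np.sum(s ** 2) for s in np.array_split(frame, n_sub)])
    p = sub_energies / (np.sum(sub_energies) + 1e-12)
    return float(-np.sum(p * np.log2(p + 1e-12)))
```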
S104, realizing multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm to obtain voiceprint enhancement features; and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.
After extracting the noise features of the transformer, the target working condition must be modeled so that the noise to be detected can be compared and judged. In speech recognition, commonly used models include the Gaussian mixture model (Gaussian Mixture Model, GMM), the hidden Markov model (Hidden Markov Model, HMM), and the i-vector model.
However, the above models fit nonlinear acoustic signals poorly. In addition, they contain a large number of parameters and are computationally complex. In 2014, Google proposed using a deep neural network as the speaker model for voiceprint recognition, named the d-vector. During training, the complete DNN is used; after training, the final classification layer is removed and the output of the last hidden layer is used as the speaker voice template for comparison. Compared with the best-performing i-vector model, the accuracy improves and the amount of computation drops greatly. This embodiment uses a DNN as the baseline and improves on the noise features.
The network structure is a DNN built on the transformer voiceprint features: its input is the spliced (concatenated) result of the extracted per-frame feature vectors, and its output is the one-hot vector corresponding to the different working conditions. In the neural network, ReLU is used as the activation function, a softmax function is used for classification, and a cross-entropy function is used as the objective function.
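A minimal PyTorch sketch of the baseline network just described: concatenated frame features in, working-condition logits out, ReLU activations, softmax classification realized through the cross-entropy loss. The four hidden layers follow the embodiment described below; the hidden width of 256 and the number of working conditions are illustrative assumptions, not values from the patent.

```python
import torch.nn as nn

class VoiceprintDNN(nn.Module):
    """d-vector style baseline: four hidden layers with ReLU; the output of
    the last hidden layer serves as the voiceprint template after training."""
    def __init__(self, in_dim, hidden_dim=256, n_conditions=8):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(4):                           # four hidden layers
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        self.hidden = nn.Sequential(*layers)
        self.classifier = nn.Linear(hidden_dim, n_conditions)

    def forward(self, x):
        h = self.hidden(x)         # last hidden output = template/d-vector
        return self.classifier(h)  # logits; softmax is applied in the loss

# nn.CrossEntropyLoss combines log-softmax with cross-entropy, matching the
# softmax + cross-entropy objective described above.
```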
Because the noise sample data under each working condition is too scarce in the actual running state, the model cannot be trained sufficiently, and collecting a large number of samples under different working conditions for voiceprint recognition is very time-consuming and laborious. Moreover, if sound models are trained per working condition, taking a different type of equipment as the monitoring object requires training again, wasting time and resources. Based on the assumption that acoustic features share a large degree of commonality, the idea of parameter transfer is used to train and learn the noise features of different types of equipment. Specifically, a pre-trained model trained on a large amount of general data is fine-tuned to fit the specific monitoring target.
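The parameter-transfer step can be sketched as follows, reusing the VoiceprintDNN above; the attribute names `hidden` and `classifier` come from that sketch and are assumptions, not the patent's API. The generic hidden layers learned from bulk acoustic data are frozen, and only the classification head is rebuilt for the new monitoring target.

```python
import torch.nn as nn

def adapt_to_target(pretrained, n_target_conditions):
    """Freeze the generic hidden layers of a pretrained VoiceprintDNN and
    re-initialize only the classification head for the new target."""
    for param in pretrained.hidden.parameters():
        param.requires_grad = False          # keep general acoustic features
    pretrained.classifier = nn.Linear(
        pretrained.classifier.in_features, n_target_conditions
    )
    return pretrained                        # fine-tune on target samples
```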
After the neural network is trained with the training data, the sound of the target is passed through the trained network with fixed parameters, the labeled output classification layer is removed, and the output of the last hidden layer is taken as the voiceprint template corresponding to the working condition. For each working condition, 5 to 15 samples are randomly drawn from the training data set and passed through the network to obtain target templates. Because template selection is somewhat random, the k-means algorithm is used to cluster the obtained target templates so that they accurately reflect the noise condition under each working condition. Specifically, the 5 to 15 corresponding target template outputs are clustered with k = 2. After clustering, the cluster containing the majority of samples is selected as the sample template representing the working condition.
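A hedged sketch of this template-selection step, assuming the 5 to 15 last-hidden-layer outputs of one working condition are stacked row-wise; the function name is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def condition_template(embeddings):
    """embeddings: (5..15, d) last-hidden-layer outputs for one condition.
    Cluster with k = 2 and keep the centroid of the majority cluster."""
    km = KMeans(n_clusters=2, n_init=10).fit(embeddings)
    labels, counts = np.unique(km.labels_, return_counts=True)
    majority = labels[np.argmax(counts)]      # cluster with most samples
    return embeddings[km.labels_ == majority].mean(axis=0)
```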
Finally, the features of the noise signal to be detected are extracted and fed into the trained DNN model, and the output o of the last hidden layer is obtained. The similarity between o and the output p of the target template is then compared. Specifically, a cosine similarity measure is used:

\cos(o, p) = \frac{o \cdot p}{\lVert o \rVert \, \lVert p \rVert}
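A one-function sketch of this comparison (the small epsilon guarding against a zero norm is an implementation detail, not from the patent):

```python
import numpy as np

def cosine_similarity(o, p):
    # cos(o, p) = o . p / (|o| |p|)
    return float(np.dot(o, p) / (np.linalg.norm(o) * np.linalg.norm(p) + 1e-12))
```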
in an alternative embodiment of the present invention,
the step of improving the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm comprises:
improving the recognition rate of the sound signal by weighting each dimension component according to the following formula:

F(k) = \frac{\sum_{i=1}^{N} \left(u_k(i) - u_k\right)^2}{\sum_{i=1}^{N} \frac{1}{n_i} \sum_{j=1}^{n_i} \left(x_k^{(j)}(i) - u_k(i)\right)^2}

wherein F(k) represents the recognition rate of the sound signal corresponding to the k-th dimension component of the short-time frequency domain, N represents the number of recognition objects, u_k(i) represents the k-th dimension component of the mean feature vector of the i-th recognition object, u_k represents the overall mean of the k-th dimension component, x_k^{(j)}(i) represents the k-th dimension component of the j-th sample of the i-th recognition object, and n_i represents the number of samples of the i-th recognition object.
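A sketch of the criterion as reconstructed above, computing the per-dimension ratio of between-class scatter to average within-class scatter over labeled frame features; the array layout and names are assumptions.

```python
import numpy as np

def fisher_ratio(features, labels):
    """Per-dimension F(k). features: (num_samples, dims);
    labels: recognition-object index of each sample."""
    overall_mean = features.mean(axis=0)                 # u_k
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        x = features[labels == c]
        class_mean = x.mean(axis=0)                      # u_k(i)
        between += (class_mean - overall_mean) ** 2
        within += ((x - class_mean) ** 2).mean(axis=0)   # (1/n_i) * sum over j
    return between / (within + 1e-12)
```

The resulting F(k) values can then be used as weights on the short-time frequency-domain (MFCC) dimensions before classification.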
In an alternative embodiment of the present invention,
the method for realizing multi-mode feature dimension reduction of the short-time domain based on the PCA algorithm comprises the following steps:
setting the dimension of the joint feature vector formed by the voiceprint time-domain and frequency-domain feature vectors as m, and constructing a sample matrix X whose n rows are the standardized m-dimensional joint feature vectors;
calculating the correlation matrix R of X, and simultaneously calculating the eigenvalues and eigenvectors of R:

R = X^{T} X / (n - 1), \qquad R\,u_i = \lambda_i u_i

calculating the eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m of R and the eigenvector u_1, u_2, \ldots, u_m corresponding to each eigenvalue, and determining the variance contribution and the cumulative variance contribution rate, wherein,
variance contribution of the i-th principal component: \lambda_i;
variance contribution rate:

\eta_i = \lambda_i \Big/ \sum_{j=1}^{m} \lambda_j

cumulative variance contribution rate of the first p components:

\eta(p) = \sum_{i=1}^{p} \eta_i

and selecting the smallest p whose cumulative variance contribution rate exceeds 75% as the number of principal components, i.e. the dimension of the feature vector after dimension reduction, thereby reducing the m-dimensional voiceprint feature vector to a p-dimensional feature vector.
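A sketch of this dimension-reduction step, assuming the rows of X are the n joint feature vectors and p is chosen as the smallest number of components whose cumulative variance contribution rate reaches the 75% threshold; names are illustrative.

```python
import numpy as np

def pca_reduce(X, threshold=0.75):
    """Reduce an (n, m) feature matrix to its first p principal components,
    where p is the smallest count whose cumulative variance contribution
    rate reaches the threshold."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    R = Xs.T @ Xs / (Xs.shape[0] - 1)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                 # symmetric eigensolve
    order = np.argsort(eigvals)[::-1]                    # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_ratio = np.cumsum(eigvals) / np.sum(eigvals)
    p = int(np.searchsorted(cum_ratio, threshold) + 1)   # smallest p over threshold
    return Xs @ eigvecs[:, :p]                           # project to p dimensions
```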
In an alternative embodiment of the present invention,
the voiceprint recognition model is constructed based on a DNN network, uses ReLU as the activation function and softmax as the classification function, takes the cross-entropy function as the objective function, and comprises four hidden layers.
In a second aspect of the embodiments of the present disclosure,
fig. 2 is a schematic structural diagram of a pumped-storage station detection device based on voiceprint recognition according to an embodiment of the present disclosure, including:
the first unit is used for acquiring sound signals of the operation of the main transformer and the water turbine;
a second unit, configured to process each frame of the sound signal through a hamming window, and perform fourier transform to obtain a short-time stationary signal;
the third unit is used for extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
a fourth unit, configured to implement multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm, and improve the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm, to obtain a voiceprint enhancement feature;
and a fifth unit, configured to identify the category of the voiceprint enhancement feature through a pre-trained voiceprint recognition model.
In a third aspect of the embodiments of the present disclosure,
the utility model provides a pumped storage station detecting system based on voiceprint discernment, including aforementioned pumped storage station detecting device based on voiceprint discernment still includes:
an infrastructure layer comprising an application server, an audio server, and unstructured storage;
the data layer comprises multi-sensor sound data, a sound recognition model and labeling data;
the core component layer comprises a sound characteristic analysis component, a storage component, a network communication component and an interface service component;
the sound analysis service layer comprises an abnormal working condition detection component, an abnormal working condition early warning component, a resource library management component and a storage control component.
In an alternative embodiment of the present invention,
the sound characteristic analysis component comprises sound characteristic extraction, intelligent recognition algorithm and sound stream code processing;
the storage component comprises a distributed component, a storage verification and storage structure component;
the network communication component comprises data encryption and decryption, request forwarding task scheduling and data caching;
the interface service component comprises interface release, interface service and interface state.
In a fourth aspect of embodiments of the present disclosure,
there is provided an apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fifth aspect of embodiments of the present disclosure,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, the electronic circuitry being able to execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a generic series of equivalent or similar features. Where the words "further", "preferably", "still further" or "more preferably" are used, the description that follows builds on the preceding embodiment, and the content introduced by the word combines with the preceding embodiment to form a complete further embodiment. Several such further or preferred arrangements of the same embodiment may be combined arbitrarily.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples and embodiments of the invention may be modified or practiced without departing from the principles described.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (9)

1. A pumped storage station detection method based on voiceprint recognition, comprising:
acquiring sound signals of operation of a main transformer and a water turbine;
processing each frame of the sound signal through a Hamming window, and performing Fourier transform to obtain a short-time stable signal;
extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
realizing multi-mode feature dimension reduction on the short-time domain by a PCA algorithm, and improving the recognition rate of the sound signal on the short-time frequency domain by a weighting optimization algorithm to obtain voiceprint enhancement features;
and identifying the category of the voiceprint enhancement feature through a pre-trained voiceprint identification model.
2. The method of claim 1, wherein improving the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm comprises:
improving the recognition rate of the sound signal by weighting each dimension component according to the following formula:

F(k) = \frac{\sum_{i=1}^{N} \left(u_k(i) - u_k\right)^2}{\sum_{i=1}^{N} \frac{1}{n_i} \sum_{j=1}^{n_i} \left(x_k^{(j)}(i) - u_k(i)\right)^2}

wherein F(k) represents the recognition rate of the sound signal corresponding to the k-th dimension component of the short-time frequency domain, N represents the number of recognition objects, u_k(i) represents the k-th dimension component of the mean feature vector of the i-th recognition object, u_k represents the overall mean of the k-th dimension component, x_k^{(j)}(i) represents the k-th dimension component of the j-th sample of the i-th recognition object, and n_i represents the number of samples of the i-th recognition object.
3. The method of claim 1, wherein implementing multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm comprises:
setting the dimension of the joint feature vector formed by the voiceprint time-domain and frequency-domain feature vectors as m, and constructing a sample matrix X whose n rows are the standardized m-dimensional joint feature vectors;
calculating the correlation matrix R of X, and simultaneously calculating the eigenvalues and eigenvectors of R:

R = X^{T} X / (n - 1), \qquad R\,u_i = \lambda_i u_i

calculating the eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m of R and the eigenvector u_1, u_2, \ldots, u_m corresponding to each eigenvalue, and determining the variance contribution and the cumulative variance contribution rate, wherein,
variance contribution of the i-th principal component: \lambda_i;
variance contribution rate:

\eta_i = \lambda_i \Big/ \sum_{j=1}^{m} \lambda_j

cumulative variance contribution rate of the first p components:

\eta(p) = \sum_{i=1}^{p} \eta_i

and selecting the smallest p whose cumulative variance contribution rate exceeds 75% as the number of principal components, i.e. the dimension of the feature vector after dimension reduction, thereby reducing the m-dimensional voiceprint feature vector to a p-dimensional feature vector.
4. The method of claim 1, wherein the voiceprint recognition model is constructed based on a DNN network, uses ReLU as the activation function and softmax as the classification function, takes the cross-entropy function as the objective function, and comprises four hidden layers.
5. Pumped storage station detection device based on voiceprint discernment, characterized by comprising:
the first unit is used for acquiring sound signals of the operation of the main transformer and the water turbine;
a second unit, configured to process each frame of the sound signal through a hamming window, and perform fourier transform to obtain a short-time stationary signal;
the third unit is used for extracting a short-time domain and a short-time frequency domain of the short-time stationary signal, wherein the short-time domain comprises at least one of short-time energy, short-time average zero-crossing rate and short-time energy entropy, and the short-time frequency domain comprises Mel frequency cepstrum coefficients;
a fourth unit, configured to implement multi-mode feature dimension reduction on the short-time domain based on a PCA algorithm, and improve the recognition rate of the sound signal on the short-time frequency domain based on a weighted optimization algorithm, to obtain a voiceprint enhancement feature;
and a fifth unit, configured to identify the category of the voiceprint enhancement feature through a voiceprint identification model that is trained in advance.
6. A voiceprint recognition-based pumped-storage station detection system comprising the voiceprint recognition-based pumped-storage station detection apparatus of claim 5, further comprising:
an infrastructure layer comprising an application server, an audio server, and unstructured storage;
the data layer comprises multi-sensor sound data, a sound recognition model and labeling data;
the core component layer comprises a sound characteristic analysis component, a storage component, a network communication component and an interface service component;
the sound analysis service layer comprises an abnormal working condition detection component, an abnormal working condition early warning component, a resource library management component and a storage control component.
7. The system of claim 6, wherein the system further comprises a controller configured to control the controller,
the sound characteristic analysis component comprises sound characteristic extraction, intelligent recognition algorithm and sound stream code processing;
the storage component comprises a distributed component, a storage verification and storage structure component;
the network communication component comprises data encryption and decryption, request forwarding task scheduling and data caching;
the interface service component comprises interface release, interface service and interface state.
8. An apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 4.
9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 4.
CN202211591960.0A 2022-12-12 2022-12-12 Pumped storage station detection method and system based on voiceprint recognition Pending CN116072146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211591960.0A CN116072146A (en) 2022-12-12 2022-12-12 Pumped storage station detection method and system based on voiceprint recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211591960.0A CN116072146A (en) 2022-12-12 2022-12-12 Pumped storage station detection method and system based on voiceprint recognition

Publications (1)

Publication Number Publication Date
CN116072146A true CN116072146A (en) 2023-05-05

Family

ID=86177840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211591960.0A Pending CN116072146A (en) 2022-12-12 2022-12-12 Pumped storage station detection method and system based on voiceprint recognition

Country Status (1)

Country Link
CN (1) CN116072146A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118391183A (en) * 2024-04-23 2024-07-26 淮阴工学院 A method for turbine condition monitoring and fault diagnosis based on voiceprint imaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination