
CN113113026A - Voiceprint identity authentication system and intelligent detection closestool based on home user level - Google Patents


Info

Publication number
CN113113026A
Authority
CN
China
Prior art keywords
voiceprint, module, detection, sound, unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110404410.2A
Other languages
Chinese (zh)
Inventor
李春林 (Li Chunlin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Defang Information Technology Co ltd
Original Assignee
Chongqing Defang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Defang Information Technology Co ltd
Priority to CN202110404410.2A
Publication of CN113113026A
Legal status: Withdrawn


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/06 - Decision making techniques; Pattern matching strategies
    • G10L 17/12 - Score normalisation
    • E - FIXED CONSTRUCTIONS
    • E03 - WATER SUPPLY; SEWERAGE
    • E03D - WATER-CLOSETS OR URINALS WITH FLUSHING DEVICES; FLUSHING VALVES THEREFOR
    • E03D 11/00 - Other component parts of water-closets, e.g. noise-reducing means in the flushing system, flushing pipes mounted in the bowl, seals for the bowl outlet, devices preventing overflow of the bowl contents; devices forming a water seal in the bowl after flushing, devices eliminating obstructions in the bowl outlet or preventing backflow of water and excrements from the waterpipe
    • E03D 11/02 - Water-closet bowls; bowls with a double odour seal, optionally with provisions for a good siphonic action; siphons as part of the bowl
    • G10L 17/04 - Training, enrolment or model building
    • G10L 17/22 - Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Hydrology & Water Resources (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a voiceprint identity authentication system at the home-user level, comprising a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a judgment unit, the sound collector being connected with the sound detection unit. The invention also discloses an intelligent detection closestool. In the invention, home users are authenticated by voiceprint recognition, so the authentication process is simpler, no privacy is disclosed, and the user experience is greatly improved. The sound detection unit starts the voiceprint authentication system only when the maximum amplitude of the audio signal exceeds a preset start amplitude, which reduces the system's standby power consumption. The UBM model is used to train the family members' voiceprint models, so the training time is short and the training efficiency is high.

Description

Voiceprint identity authentication system and intelligent detection closestool based on home user level
Technical Field
The invention relates to the technical field of intelligent toilets, and in particular to a voiceprint identity authentication system at the home-user level and an intelligent detection toilet.
Background
More and more smart homes require a home-user-level identity authentication system, and the prior art generally uses fingerprint recognition or face recognition. Fingerprint recognition, however, requires the person to press a finger directly against a fingerprint collector before using the smart home; it fails when the hands are wet or occupied with something else, when the skin of the palm or fingers is peeling, or when a child's fingers are too small, so the identity cannot be authenticated and detection fails. Face recognition requires a camera, and using a camera at home easily discloses privacy and is likewise inconvenient.
In addition, existing intelligent toilets can already test urine and stool, but the identity of the person tested must be authenticated before detection so that the result can be matched to that person. The prior art again generally uses fingerprint or face recognition. Fingerprint recognition adds a step to using the toilet, and a person in urgent need may have no time to present a fingerprint; face recognition requires a camera, and a camera in a toilet may disclose privacy, which makes it especially unsuitable where the bathroom and toilet are integrated.
Disclosure of Invention
The invention aims to provide a voiceprint identity authentication system based on a home user level and an intelligent detection closestool.
The technical scheme of the invention is as follows:
a voiceprint identity authentication system based on a home user level comprises a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a judgment unit;
the sound collector is used for collecting sound information to generate an audio signal;
the sound detection unit is used for detecting the maximum amplitude of the audio signal and starting the voiceprint authentication system when the maximum amplitude of the audio signal exceeds a preset start amplitude;
the sound preprocessing unit is used for preprocessing the audio signals collected by the sound collector and extracting effective signals;
the voiceprint feature extraction unit is used for extracting voiceprint features from the effective signals;
the UBM model training unit is used for training a voiceprint model of each family member through the UBM model;
the voiceprint library is used for respectively storing the voiceprint models of all family members;
and the judgment unit is used for comparing the extracted voiceprint characteristics with the voiceprint models of all family members in the voiceprint library one by one, scoring and authenticating the identity according to the scoring result.
An intelligent detection closestool comprises a urinal, a water tank, a closestool cover, a voiceprint authentication system, a sampling device, a pipeline system, a detection device and a control system; the voiceprint authentication system, the sampling device, the detection device, the pipeline system and the control system are all arranged on the toilet cover, and the voiceprint authentication system, the sampling device, the detection device and the pipeline system are all electrically connected with the control system;
The voiceprint authentication system is used for training the voice information of family members to generate a voiceprint library and authenticating the identity according to the collected voice information;
the voiceprint authentication system comprises a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a judgment unit;
the sound collector is used for collecting sound information to generate an audio signal;
the sound detection unit is used for detecting the maximum amplitude of the audio signal and starting the voiceprint authentication system when the maximum amplitude of the audio signal exceeds a preset start amplitude;
the sound preprocessing unit is used for preprocessing the audio signals collected by the sound collector and extracting effective signals;
the voiceprint feature extraction unit is used for extracting voiceprint features from the effective signals;
the UBM model training unit is used for training a voiceprint model of each family member through the UBM model;
the voiceprint library is used for respectively storing the voiceprint models of all family members;
the judgment unit is used for comparing the extracted voiceprint characteristics with voiceprint models of all family members in a voiceprint library one by one and scoring, and performing identity authentication according to a scoring result;
The control system is used for controlling the sampling device and the detection device to work after the judgment unit successfully completes the identity authentication, analyzing the detection results sent by the detection device, generating a detection report and storing it in the corresponding family member's directory;
the sampling device is used for collecting a liquid sample, and the liquid sample comprises a urine sample and/or a stool dilution sample;
the pipeline system is used for conveying the collected liquid sample to the detection device;
the detection device is used for detecting the liquid sample and sending a detection result to the control system.
In the invention, identity is authenticated by voiceprint: the user completes authentication simply by saying a few words at random, so use is simpler, no privacy is disclosed, and the user experience is greatly improved. Because the voiceprint identity authentication system targets the home-user level, where the number of people in a family is limited and extremely high recognition accuracy is not required, its training model is simpler; training the family members' voiceprint models with a UBM model keeps the training time short and the training efficiency high. In addition, by providing the sound detection unit, the voiceprint authentication system is started only when the maximum amplitude of the audio signal exceeds the preset start amplitude, which reduces the system's standby power consumption.
Drawings
FIG. 1 is a block diagram of a preferred embodiment of the home user level based voiceprint authentication system of the present invention;
FIG. 2 is a logic diagram of a preferred embodiment of the intelligent detection toilet of the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Example 1
As shown in fig. 1, a preferred embodiment of the voiceprint authentication system based on the home user level of the present invention includes a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a decision unit.
The sound collector is used for collecting sound information to generate an audio signal. The sound detection unit detects the maximum amplitude of the audio signal; when that amplitude exceeds a preset start amplitude, the voiceprint authentication system starts working.
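For illustration only (this sketch is not part of the disclosure), the wake-up check can be a one-line comparison in Python; the threshold value of 0.1 and the assumption that samples are normalized to [-1, 1] are ours, not the patent's:

```python
import numpy as np

def should_wake(frame: np.ndarray, start_amplitude: float = 0.1) -> bool:
    """Start the voiceprint authentication system only when the frame's peak
    amplitude exceeds the preset start amplitude (0.1 is an assumed value;
    samples are assumed normalized to [-1.0, 1.0])."""
    return float(np.max(np.abs(frame))) > start_amplitude
```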
The sound preprocessing unit is used for preprocessing the audio signal collected by the sound collector and extracting the effective signal. It comprises a silence detection module, a voice quality detection module and an effective audio extraction module: the silence detection module removes the silent and pause portions of the audio signal; the voice quality detection module measures the signal-to-noise ratio, clipping level and volume of the audio signal; and the effective audio extraction module extracts the effective audio according to these measurements.
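A minimal sketch of this preprocessing, again illustrative only: energy-based removal of silent frames plus simplified quality checks. All thresholds are assumed values, and the signal-to-noise check is reduced to an RMS level check for brevity.

```python
import numpy as np

def remove_silence(signal: np.ndarray, sr: int, frame_ms: int = 25,
                   energy_thresh: float = 1e-4) -> np.ndarray:
    """Silence detection module: drop frames whose mean energy falls below
    energy_thresh (an assumed threshold)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.mean(frames ** 2, axis=1)
    return frames[energies >= energy_thresh].reshape(-1)

def quality_ok(signal: np.ndarray, min_rms: float = 0.01,
               clip_level: float = 0.99, max_clip_ratio: float = 0.01) -> bool:
    """Voice quality detection module, simplified: reject audio that is too
    quiet (a stand-in for the SNR and volume checks) or heavily clipped."""
    rms = float(np.sqrt(np.mean(signal ** 2)))
    clipped = float(np.mean(np.abs(signal) >= clip_level))
    return rms >= min_rms and clipped <= max_clip_ratio
```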
The voiceprint feature extraction unit is used for extracting voiceprint features from the effective signals; the voiceprint feature extraction unit comprises a pre-emphasis module, a framing module, a windowing module, a Fourier transform module, a Mel filter bank, a cepstrum module and a difference module.
The pre-emphasis module is used for emphasizing the high-frequency part of the sound to enable the energy of the high-frequency part of the sound to be close to the energy of the low-frequency part of the sound.
The framing module is used for framing the audio signal to enable each frame to be a stable signal.
And the windowing module is used for enabling the amplitude of each frame signal to gradually change to 0 at two ends through windowing operation. The windowing operation is actually the multiplication of the frame by a window function.
The Fourier transform module is used for decomposing the time domain signal into a plurality of sinusoidal signals with different frequencies through Fourier transform.
The Mel filter bank is used for filtering the signals after Fourier transform.
The cepstrum module is used for taking the inverse Fourier transform of the logarithm of the short-time power spectrum of the filtered signal to obtain an N-dimensional cepstral coefficient vector (N is a natural number).
The standard cepstral parameters MFCC only reflect the static characteristics of the speech parameters, and the dynamic characteristics of speech can be described by the differential spectrum of these static characteristics. The dynamic and static characteristics can be combined by adding the differential parameters, and the identification performance of the system is effectively improved.
The UBM model training unit is used for training each family member's voiceprint model through the UBM model. The training method is as follows: collect speech from a large number of non-family members and use this voiceprint data as background data to train a universal background model (UBM); then, before a household first uses the system, adaptively train each single Gaussian component of the UBM with each family member's voiceprint feature parameters to obtain that member's voiceprint model.
The algorithm parameters in the UBM training process include the number of mixture components, the training method, the number of iterations and the initialization method. The training data subsets are defined by the total amount of training data, the number of speakers, the number of utterances per speaker, the speaker selection method, the way the feature vectors are used, and the stability of the data across channel, microphone and language.
The parameters for judging whether the UBM model meets the requirements include the false rejection rate, the false acceptance rate, the equal error rate, the accuracy, the ROC curve, the extraction speed and the verification/comparison speed.
The voiceprint library is used for respectively storing the voiceprint models of all family members.
The judgment unit is used for comparing the extracted voiceprint features one by one with each family member's voiceprint model in the voiceprint library, scoring each comparison, and authenticating identity according to the scoring results. The method for authenticating identity according to the scoring results is: set a score threshold; if exactly one score exceeds the score threshold, the authentication result is the family member corresponding to the highest-scoring voiceprint model; if zero or more than one score exceeds the threshold, authentication fails.
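The decision rule can be stated compactly. The sketch below assumes the per-member scores have already been computed; it illustrates the rule in the paragraph above rather than the patent's actual implementation.

```python
from typing import Dict, Optional

def authenticate(scores: Dict[str, float], score_threshold: float) -> Optional[str]:
    """Exactly one score above the threshold: accept the highest-scoring
    member. Zero or several above: authentication fails."""
    above = {name: s for name, s in scores.items() if s > score_threshold}
    if len(above) == 1:
        return max(above, key=above.get)
    return None  # authentication fails; the user must try again
```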
In use, the sound collector picks up sound in real time. If the detected sound level is below the preset wake-up threshold, nothing is done; if it exceeds the threshold, the sound is captured and preprocessed: the silent and pause portions are removed from the audio signal, then the signal-to-noise ratio, clipping level and volume are measured, and the effective audio extraction module extracts the effective audio according to these measurements.
Voiceprint features are then extracted from the preprocessed sound. The pre-emphasis module boosts the high-frequency part of the sound so that its energy approaches that of the low-frequency part, allowing the high-frequency formants to be used in spectral analysis or vocal-tract parameter analysis. The framing and windowing modules then split the audio signal into frames, each of which is approximately stationary, and taper each frame's amplitude to 0 at both ends. The Fourier transform module decomposes each time-domain frame into sinusoids of different frequencies, and the Mel filter bank filters the transformed signal. The cepstrum module then takes the inverse Fourier transform of the logarithm of the short-time power spectrum of the filtered signal to obtain the N-dimensional cepstral coefficients. Finally, the difference module computes the first-order and second-order difference parameters of the cepstral coefficients and assembles the cepstral parameters MFCC. The extracted voiceprint features thus combine dynamic and static characteristics, effectively improving the system's recognition performance.
The judgment unit compares the extracted voiceprint features one by one with each family member's voiceprint model in the voiceprint library and scores each comparison. If exactly one score exceeds the score threshold, authentication succeeds and the result is the family member corresponding to the highest-scoring voiceprint model; if zero or more than one score exceeds the threshold, authentication fails and must be performed again.
Example 2
As shown in fig. 1 and 2, a preferred embodiment of the intelligent detection toilet of the invention comprises a urinal, a water tank, a toilet lid, a voiceprint authentication system, a sampling device, a pipeline system, a detection system and a control system; the voiceprint authentication system, the sampling device, the detection system, the pipeline system and the control system are all arranged on the toilet cover, and the voiceprint authentication system, the sampling device, the detection system and the pipeline system are all electrically connected with the control system.
As shown in fig. 2, the voiceprint authentication system is configured to train voice information of a family member to generate a voiceprint library, and perform identity authentication according to collected voice information; the voiceprint authentication system comprises a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a judgment unit.
The sound collector is used for collecting sound information to generate an audio signal. The sound detection unit detects the maximum amplitude of the audio signal; when that amplitude exceeds a preset start amplitude, the voiceprint authentication system starts working.
The sound preprocessing unit is used for preprocessing the audio signal collected by the sound collector and extracting the effective signal. It comprises a silence detection module, a voice quality detection module and an effective audio extraction module: the silence detection module removes the silent and pause portions of the audio signal; the voice quality detection module measures the signal-to-noise ratio, clipping level and volume of the audio signal; and the effective audio extraction module extracts the effective audio according to these measurements.
The voiceprint feature extraction unit is used for extracting voiceprint features from the effective signals; the voiceprint feature extraction unit comprises a pre-emphasis module, a framing module, a windowing module, a Fourier transform module, a Mel filter bank, a cepstrum module and a difference module.
The pre-emphasis module is used for boosting the high-frequency part of the sound so that its energy is close to that of the low-frequency part. Because the average power spectrum of the speech signal is shaped by glottal excitation and oral-nasal radiation, the high-frequency end falls off above about 800 Hz at roughly 6 dB per octave (equivalently, 20 dB per decade). When the spectrum of the speech signal is computed, the higher the frequency, the smaller the corresponding component, and the high-frequency spectrum is therefore harder to obtain than the low-frequency spectrum. For this reason the high-frequency signal is pre-emphasized during preprocessing. The purpose of pre-emphasis is to lift the high-frequency part so that the high- and low-frequency energies have similar amplitudes, ensuring that the spectrum can be obtained with the same signal-to-noise ratio across the whole band from low to high frequency, and allowing the high-frequency formants to be better exploited in spectral analysis or vocal-tract parameter analysis.
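In practice pre-emphasis is a one-line first-order filter. The sketch below uses the common coefficient 0.97, which is an assumed value; the patent does not fix one.

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """y[n] = x[n] - alpha * x[n-1]: boosts the high-frequency band so its
    energy approaches that of the low-frequency band."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```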
The framing module is used for splitting the audio signal into frames so that each frame is approximately stationary. Audio features can be classified as long-term, mid-term or short-term according to their duration; the short-term view has the useful property that speech can be regarded as approximately stationary over a span of 20-50 milliseconds. This stationarity is essential for the Fourier transform, so the signal is framed before any spectral analysis.
The windowing module is used for making the amplitude of each frame taper to 0 at both ends through a windowing operation. If a framed signal does not join smoothly at its end points, the waveform is discontinuous, which causes spectral leakage in the FFT result. Windowing solves this problem: with the amplitude tapering to 0 at both ends, the left and right end points of the frame connect smoothly no matter how the signal was framed, and each frame behaves like one period of a periodic function, avoiding the Gibbs effect. The windowing operation is simply the multiplication of the frame by a window function.
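A sketch of framing plus windowing, using assumed 25 ms frames with a 10 ms hop and a Hamming window (the patent fixes neither the frame length nor the window type):

```python
import numpy as np

def frame_and_window(signal: np.ndarray, sr: int,
                     frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Slice the signal into overlapping short frames (assumes len(signal) is
    at least one frame) and taper each with a Hamming window so the frame
    ends approach zero before the FFT."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = (np.arange(frame_len)[None, :]
           + hop_len * np.arange(n_frames)[:, None])
    return signal[idx] * np.hamming(frame_len)
```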
The Fourier transform module is used for decomposing the time-domain signal into sinusoidal signals of different frequencies through the Fourier transform. To make the speech signal easier to model, the complex sound wave is decomposed into its component parts: the Fourier transform resolves the time-domain signal into sinusoids of different frequencies, establishing the signal's energy distribution over frequency; that is, it converts the time-domain signal into a frequency-domain signal.
The Mel filter bank is used for filtering the signals after Fourier transform.
The cepstrum module is used for taking the inverse Fourier transform of the logarithm of the short-time power spectrum of the filtered signal to obtain an N-dimensional cepstral coefficient vector (N is a natural number). A speech spectrum exhibits both fine structure and an envelope. The fine structure is the set of small peaks on the spectrogram; their spacing along the horizontal axis is the fundamental frequency, which represents the pitch of the voice: the wider the spacing between peaks, the higher the fundamental frequency and the higher the pitch. The envelope is the smooth curve connecting the tops of these peaks, and it reflects the mouth shape. Cepstral analysis extracts the spectral envelope and the spectral detail from a segment of speech. Taking human auditory characteristics into account, the ear behaves like a bank of filters attending only to particular frequency components; these filters are not uniformly spaced along the frequency axis: many densely spaced filters cover the low-frequency region, while few, sparsely spaced filters cover the high-frequency region.
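In practice the inverse Fourier transform of the log spectrum is usually realized as a discrete cosine transform (DCT) of the log mel filter-bank energies. The sketch below builds a triangular mel filter bank (dense at low frequencies, sparse at high frequencies, as described above) and computes cepstral coefficients from a per-frame power spectrum; the filter count and the number of retained coefficients are assumed values, not the patent's.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters spaced uniformly on the mel scale: many narrow
    filters at low frequencies, few wide ones at high frequencies."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def cepstral_coeffs(power_spec: np.ndarray, fbank: np.ndarray,
                    n_ceps: int = 13) -> np.ndarray:
    """Log mel filter-bank energies followed by a DCT, the usual practical
    realization of the cepstrum described above."""
    energies = np.maximum(power_spec @ fbank.T, 1e-10)  # avoid log(0)
    return dct(np.log(energies), type=2, axis=-1, norm='ortho')[..., :n_ceps]
```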
The difference module is used for computing the first-order and second-order difference parameters of the cepstral coefficients and combining the cepstral coefficients with these difference parameters to obtain the cepstral parameters MFCC. The first-order difference is the difference between consecutive terms of a discrete sequence; physically it is the change from the previous speech frame to the current one, reflecting the relationship between two adjacent frames. The second-order difference, computed on the first-order differences, represents the relationship between consecutive first-order differences and thus captures the dynamic relationship across three adjacent frames. The N-dimensional MFCC parameters can be calculated as:
N-dimensional MFCC parameters = N/3 cepstral coefficients + N/3 first-order difference parameters + N/3 second-order difference parameters
The standard cepstral parameters MFCC only reflect the static characteristics of the speech parameters, and the dynamic characteristics of speech can be described by the differential spectrum of these static characteristics. The dynamic and static characteristics can be combined by adding the differential parameters, and the identification performance of the system is effectively improved.
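A sketch of the difference module and the N/3 assembly rule, using the standard regression-based delta (the window width of 2 frames per side is an assumed value):

```python
import numpy as np

def delta(feat: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based difference over `width` neighbouring frames on each
    side; feat has shape (n_frames, n_dims)."""
    padded = np.pad(feat, ((width, width), (0, 0)), mode='edge')
    denom = 2 * sum(i * i for i in range(1, width + 1))
    n = len(feat)
    return sum(i * (padded[width + i:width + i + n]
                    - padded[width - i:width - i + n])
               for i in range(1, width + 1)) / denom

def assemble_mfcc(ceps: np.ndarray) -> np.ndarray:
    """N-dimensional MFCC = N/3 static cepstral coefficients + N/3
    first-order + N/3 second-order difference parameters."""
    d1 = delta(ceps)
    d2 = delta(d1)
    return np.concatenate([ceps, d1, d2], axis=1)
```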
The UBM model training unit is used for training each family member's voiceprint model through the UBM model. The training method is as follows: collect speech from a large number of non-family members and use this voiceprint data as background data to train a universal background model (UBM); then, before a household first uses the system, adaptively train each single Gaussian component of the UBM with each family member's voiceprint feature parameters to obtain that member's voiceprint model. Voices vary: even two recordings of the same content from the same user differ because of emotion, speaking rate, fatigue and so on. When a Gaussian mixture model (GMM) has enough Gaussian components it can approximate an arbitrary probability distribution, so a GMM can absorb this variability; it is a parameterized generative model with very strong power to represent real data. However, the larger the GMM, the stronger its representational power and the more pronounced its drawbacks: the number of parameters grows proportionally, and more data is needed to drive the GMM's parameter training and obtain a reasonably general model. For example, to model a 50-dimensional acoustic feature with a GMM of 1024 Gaussian components, with each multidimensional Gaussian's covariance simplified to a diagonal matrix, the total number of parameters to estimate is 1024 (component weights) + 1024 × 50 (component means) + 1024 × 50 (component variances) = 103,424, i.e. more than one hundred thousand. In this embodiment, the universal background model UBM provides a good prior estimate of how the speech features are distributed in the model space, and fine-tuning the background model's parameters with the household's data quickly yields each family member's voiceprint model, so the GMM parameters need not be computed from scratch.
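An illustrative sketch of this training scheme, using scikit-learn's GaussianMixture for the UBM and classic mean-only MAP adaptation; the component count and relevance factor are assumed values, and the patent does not prescribe this particular adaptation formula.

```python
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """Fit the universal background model on pooled non-family speech
    features (64 diagonal-covariance components is an assumed size)."""
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag',
                          max_iter=200, random_state=0)
    ubm.fit(background_feats)
    return ubm

def map_adapt_means(ubm: GaussianMixture, member_feats: np.ndarray,
                    relevance: float = 16.0) -> GaussianMixture:
    """Mean-only MAP adaptation: shift each Gaussian mean toward the member's
    data in proportion to how much of that data the component explains."""
    post = ubm.predict_proba(member_feats)       # (T, K) responsibilities
    n_k = post.sum(axis=0)                       # soft frame counts per component
    f_k = post.T @ member_feats                  # first-order statistics, (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]   # adaptation weights
    member = copy.deepcopy(ubm)
    member.means_ = (alpha * f_k / np.maximum(n_k[:, None], 1e-10)
                     + (1.0 - alpha) * ubm.means_)
    return member
```

A member's score for the judgment unit can then be, for example, model.score(feats) - ubm.score(feats), the usual GMM-UBM log-likelihood ratio.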
The algorithm parameters in the UBM training process include the number of mixture components, the training method, the number of iterations and the initialization method. The training data subsets are defined by the total amount of training data, the number of speakers, the number of utterances per speaker, the speaker selection method, the way the feature vectors are used, and the stability of the data across channel, microphone and language.
The parameters for judging whether the UBM model meets the requirements include the false rejection rate, the false acceptance rate, the equal error rate, the accuracy, the ROC curve, the extraction speed and the verification/comparison speed.
False Rejection Rate (FRR): in the classification problem, if two samples are of the same type (same person) but are mistakenly considered to be of different types (non-same person) by the system, the case is a false rejection case. The false rejection rate is the proportion of false rejection cases in all similar matching cases.
False Acceptance Rate (FAR): in the classification problem, if two samples are heterogeneous (not the same person), but are mistaken by the system as homogeneous (the same person), the classification problem is an error acceptance case. The false acceptance rate is the proportion of false acceptance cases in all heterogeneous matching cases.
Equal Error Rate (EER): by adjusting the score threshold, the false rejection rate (FRR) can be made equal to the false acceptance rate (FAR); the common value of FAR and FRR at that point is called the equal error rate.
Accuracy (ACC): adjust the score threshold so that FAR + FRR is minimized; one minus this minimum is the recognition accuracy, i.e. ACC = 1 - min(FAR + FRR).
Extraction speed: expressed as a real-time factor, which relates extraction time to audio duration; for example, if 1 second of processing handles 80 s of audio, the real-time factor is 1:80.
Verification/comparison speed: the average number of voiceprint comparisons that can be performed per second.
ROC curve: a curve describing the relationship between FAR and FRR, with FAR on the X axis and FRR on the Y axis. As the threshold increases from left to right, each threshold yields a pair of FAR and FRR values; plotting these pairs on the graph traces out the ROC curve.
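These quantities follow directly from labeled trial scores. An illustrative threshold sweep that produces the ROC points and the equal error rate:

```python
import numpy as np

def far_frr_eer(genuine: np.ndarray, impostor: np.ndarray):
    """Sweep the score threshold over all observed scores; return the ROC's
    (FAR, FRR) arrays and the equal error rate."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(genuine < t).mean() for t in thresholds])    # genuines rejected
    i = int(np.argmin(np.abs(far - frr)))
    return far, frr, (far[i] + frr[i]) / 2.0
```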
Score threshold: in an accept/reject binary classification system, a score threshold is usually set, and an acceptance decision is made only for scores above it. Adjusting the score threshold balances FAR against FRR according to the needs of the application. With a high score threshold, the system's score requirement for acceptance is strict, FAR falls and FRR rises; with a low score threshold, the requirement is looser, FAR rises and FRR falls. Different application scenarios tune the score threshold differently to strike a balance between security and convenience.
The voiceprint library is used for respectively storing the voiceprint models of all family members.
The judgment unit is used for comparing the extracted voiceprint features one by one with each family member's voiceprint model in the voiceprint library, scoring each comparison, and authenticating identity according to the scoring results. The method for authenticating identity according to the scoring results is: set a score threshold; if exactly one score exceeds the score threshold, the authentication result is the family member corresponding to the highest-scoring voiceprint model; if zero or more than one score exceeds the threshold, authentication fails.
And the control system is used for controlling the sampling device and the detection system to work after the judgment unit finishes the identity authentication, analyzing the detection result sent by the detection system, generating a detection report and storing the detection report in the corresponding family member directory.
The sampling device is used for collecting a liquid sample, and the liquid sample comprises a urine sample and/or a stool dilution sample; the sampling device comprises a urine sampling module and a stool sampling module; the urine sampling module is used for collecting urine samples, and the excrement sampling module is used for collecting excrement dilution samples.
The pipeline system is used for conveying the collected liquid sample to the detection system.
The detection system is used for detecting the liquid sample and sending the detection result to the control system. The detection system comprises a detection card and an image acquisition module, wherein the detection card comprises a urine detection card for detecting a urine sample and a stool detection card for detecting a stool dilution sample.
The working principle of the embodiment is as follows:
As shown in fig. 1 and 2, before use, each family member's voiceprint model must be stored in the voiceprint library through the following steps (a minimal code sketch of the flow appears after the list):
s101, collecting the voice of family members;
step S102, preprocessing collected sound;
step S103, extracting voiceprint characteristics of the preprocessed sound;
step S104, training the extracted voiceprint features through the UBM model and judging whether the trained model meets the requirements; if so, executing step S105, otherwise returning to step S101;
step S105, storing the family member's voiceprint model generated by training into the voiceprint library.
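A minimal sketch of steps S101-S105 as a loop; every callable here is a placeholder for the corresponding unit described above, and map_adapt_means refers to the adaptation sketch given earlier.

```python
def enroll_member(ubm, record, extract, meets_requirements, voiceprint_library, name):
    """Enrolment flow S101-S105 for one family member."""
    while True:
        audio = record()                      # S101: collect the member's voice
        feats = extract(audio)                # S102-S103: preprocess and extract features
        model = map_adapt_means(ubm, feats)   # S104: adapt the UBM to the member
        if meets_requirements(model):         # S104: e.g. the FRR/FAR/EER checks above
            voiceprint_library[name] = model  # S105: store the model in the library
            return model
```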
Because the universal background model UBM is independent of the family members' voices, a large amount of non-family speech can be collected before the product leaves the factory and used as background data to train the UBM. When a household then uses the system, each single Gaussian component of the UBM is adaptively trained with each family member's voiceprint feature parameters to obtain that member's voiceprint model. This greatly reduces the amount of family voiceprint data required for training and shortens the training time.
In everyday standby, the intelligent detection closestool picks up sound in real time. If the detected sound level is below the preset wake-up threshold, nothing is done; if it exceeds the threshold, the sound is captured and preprocessed: the silent and pause portions are removed from the audio signal, then the signal-to-noise ratio, clipping level and volume are measured, and the effective audio extraction module extracts the effective audio according to these measurements.
Voiceprint features are then extracted from the preprocessed sound. The pre-emphasis module boosts the high-frequency part of the sound so that its energy approaches that of the low-frequency part, allowing the high-frequency formants to be used in spectral analysis or vocal-tract parameter analysis. The framing and windowing modules then split the audio signal into frames, each of which is approximately stationary, and taper each frame's amplitude to 0 at both ends. The Fourier transform module decomposes each time-domain frame into sinusoids of different frequencies, and the Mel filter bank filters the transformed signal. The cepstrum module then takes the inverse Fourier transform of the logarithm of the short-time power spectrum of the filtered signal to obtain the N-dimensional cepstral coefficients. Finally, the difference module computes the first-order and second-order difference parameters of the cepstral coefficients and assembles the cepstral parameters MFCC. The extracted voiceprint features thus combine dynamic and static characteristics, effectively improving the system's recognition performance.
The judgment unit compares the extracted voiceprint features one by one with each family member's voiceprint model in the voiceprint library and scores each comparison. If exactly one score exceeds the score threshold, authentication succeeds and the result is the family member corresponding to the highest-scoring voiceprint model; if zero or more than one score exceeds the threshold, authentication fails and must be performed again.
When identity authentication succeeds, the judgment unit sends a start signal and the authenticated identity information to the control system. The control system then controls the sampling device, the pipeline system and the detection system to start working: the urine sample and the diluted stool sample are collected and dripped onto the urine detection card and the stool detection card respectively; after the reaction completes, images of the detection cards are captured and analyzed, and the detection results are generated and stored under the authenticated family member's directory.
In this embodiment, the intelligent detection closestool authenticates identity by voiceprint: the user completes authentication simply by saying a few words at random, so use is simpler, no privacy is disclosed, and the user experience is greatly improved. By providing the sound detection unit, the voiceprint authentication system is started only when the maximum amplitude of the audio signal exceeds the preset start amplitude, reducing the system's standby power consumption. Training the family members' voiceprint models with the UBM model keeps the training time short and the training efficiency high; and because the number of users is small, the UBM model is simplified while recognition accuracy is preserved, reducing cost.
The undescribed parts of the present invention are consistent with the prior art, and are not described herein. The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures made by using the contents of the present specification and the drawings can be directly or indirectly applied to other related technical fields, and are within the scope of the present invention.

Claims (10)

1. A voiceprint identity authentication system based on a home user level is characterized by comprising a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a judgment unit;
the sound collector is used for collecting sound information to generate an audio signal;
the sound detection unit is used for detecting the maximum amplitude of the audio signal and enabling the voiceprint authentication system to start working when the maximum amplitude of the audio signal exceeds a preset starting amplitude;
the sound preprocessing unit is used for preprocessing the audio signals collected by the sound collector and extracting effective signals;
the voiceprint feature extraction unit is used for extracting voiceprint features from the effective signals;
the UBM model training unit is used for training a voiceprint model of each family member through the UBM model;
The voiceprint library is used for respectively storing the voiceprint models of all family members;
and the judgment unit is used for comparing the extracted voiceprint characteristics with the voiceprint models of all family members in the voiceprint library one by one, scoring and authenticating the identity according to the scoring result.
2. The home user-level based voiceprint authentication system according to claim 1, wherein the voiceprint feature extraction unit comprises a pre-emphasis module, a framing module, a windowing module, a fourier transform module, a Mel filter bank, a cepstrum module and a difference module;
the pre-emphasis module is used for emphasizing the high-frequency part of the sound to enable the energy of the high-frequency part of the sound to be close to the energy of the low-frequency part;
the framing module is used for framing the audio signal to enable each frame to be a stable signal;
the windowing module is used for enabling the amplitude of each frame signal to gradually change to 0 at two ends through windowing operation;
the Fourier transform module is used for decomposing the time domain signal into a plurality of sinusoidal signals with different frequencies through Fourier transform;
the Mel filter bank is used for filtering the signals after Fourier transform;
the cepstrum module is used for carrying out inverse logarithmic Fourier transform on the short-time amplitude spectrum of the filtered signal to obtain a one-dimensional or multi-dimensional cepstrum coefficient;
The difference module is used for calculating a first order difference parameter and a second order difference parameter of the cepstrum coefficient according to the cepstrum coefficient, and synthesizing the cepstrum coefficient, the first order difference parameter and the second order difference parameter to obtain a cepstrum parameter MFCC.
3. An intelligent detection closestool is characterized by comprising a urinal, a water tank, a closestool cover, a voiceprint authentication system, a sampling device, a pipeline system, a detection device and a control system; the voiceprint authentication system, the sampling device, the detection device, the pipeline system and the control system are all arranged on the toilet cover, and the voiceprint authentication system, the sampling device, the detection device and the pipeline system are all electrically connected with the control system;
the voiceprint authentication system is used for training the voice information of family members to generate a voiceprint library and authenticating the identity according to the collected voice information;
the voiceprint authentication system comprises a sound collector, a sound detection unit, a sound preprocessing unit, a voiceprint feature extraction unit, a UBM model training unit, a voiceprint library and a judgment unit;
the sound collector is used for collecting sound information to generate an audio signal;
the sound detection unit is used for detecting the maximum amplitude of the audio signal and enabling the voiceprint authentication system to start working when the maximum amplitude of the audio signal exceeds a preset starting amplitude;
The sound preprocessing unit is used for preprocessing the audio signals collected by the sound collector and extracting effective signals;
the voiceprint feature extraction unit is used for extracting voiceprint features from the effective signals;
the UBM model training unit is used for training a voiceprint model of each family member through the UBM model;
the voiceprint library is used for respectively storing the voiceprint models of all family members;
the judgment unit is used for comparing the extracted voiceprint characteristics with voiceprint models of all family members in a voiceprint library one by one and scoring, and performing identity authentication according to a scoring result;
the control system is used for controlling the sampling device and the detection device to work after the judgment unit successfully authenticates the identity, analyzing the detection result sent by the detection device, generating a detection report and storing the detection report in the corresponding family member directory;
the sampling device is used for collecting a liquid sample, and the liquid sample comprises a urine sample and/or a stool dilution sample;
the pipeline system is used for conveying the collected liquid sample to the detection device;
the detection device is used for detecting the liquid sample and sending a detection result to the control system.
4. The intelligent detection closestool of claim 3, wherein the sound preprocessing unit comprises a silence detection module, a voice quality detection module, and an effective audio extraction module;
The silence detection module is used for removing silence and pause parts in the audio signal;
the voice quality detection module is used for detecting the signal-to-noise ratio, the amplitude truncation size and the volume of the audio signal;
the effective audio extraction module is used for extracting effective audio according to the detection result.
5. The intelligent detection toilet of claim 3, wherein the voiceprint feature extraction unit comprises a pre-emphasis module, a framing module, a windowing module, a Fourier transform module, a Mel filter bank, a cepstrum module, and a difference module;
the pre-emphasis module is used for emphasizing the high-frequency part of the sound to enable the energy of the high-frequency part of the sound to be close to the energy of the low-frequency part;
the framing module is used for framing the audio signal to enable each frame to be a stable signal;
the windowing module is used for enabling the amplitude of each frame signal to gradually change to 0 at two ends through windowing operation;
the Fourier transform module is used for decomposing the time domain signal into a plurality of sinusoidal signals with different frequencies through Fourier transform;
the Mel filter bank is used for filtering the signals after Fourier transform;
the cepstrum module is used for carrying out inverse logarithmic Fourier transform on the short-time amplitude spectrum of the filtered signal to obtain a one-dimensional or multi-dimensional cepstrum coefficient;
The difference module is used for calculating a first order difference parameter and a second order difference parameter of the cepstrum coefficient according to the cepstrum coefficient, and synthesizing the cepstrum coefficient, the first order difference parameter and the second order difference parameter to obtain a cepstrum parameter MFCC.
6. The intelligent detection toilet bowl as claimed in claim 3, wherein the training method of the UBM model is as follows: collecting the voice of a large number of non-family members, training a universal background model UBM by using the voiceprint data as background data, and then carrying out self-adaptive training on each single Gaussian model of the UBM according to the voiceprint characteristic parameters of each family member to obtain the voiceprint model of each family member.
7. The intelligent detection toilet bowl according to claim 3, wherein the algorithm parameters in the UBM model training process comprise the number of matrices, a training method, the number of iterations and an initialization method; the training data subset definition method comprises the total training data, the number of speakers, the number of voices spoken by each speaker, a speaker selection method, a use mode of a feature vector and stability of data according to a channel, a microphone and a language; the parameters for judging whether the UBM model meets the requirements comprise an error rejection rate, an error acceptance rate, an equal error rate, an accuracy rate, an ROC curve, an extraction speed and a verification comparison speed.
8. The intelligent detection closestool of claim 3, wherein the method for authenticating identity according to the scoring result is as follows: setting a score threshold; if exactly one score exceeds the score threshold, the identity authentication result is the family member corresponding to the highest-scoring voiceprint model; if zero or more than one score exceeds the score threshold, the authentication fails.
9. The smart detection toilet of claim 3, wherein the sampling device comprises a urine sampling module and a stool sampling module; the urine sampling module is used for collecting urine samples, and the excrement sampling module is used for collecting excrement dilution samples.
10. The intelligent detection closestool of claim 3, wherein the detection device comprises a detection card and an image acquisition module, the pipeline system drops the liquid sample on the detection card, and the control system controls the image acquisition module to acquire the image of the detection card after the liquid sample is dropped on the detection card and the reaction is completed.
CN202110404410.2A 2021-04-15 2021-04-15 Voiceprint identity authentication system and intelligent detection closestool based on home user level Withdrawn CN113113026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110404410.2A CN113113026A (en) 2021-04-15 2021-04-15 Voiceprint identity authentication system and intelligent detection closestool based on home user level


Publications (1)

Publication Number Publication Date
CN113113026A (en) 2021-07-13

Family

ID=76717096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110404410.2A Withdrawn CN113113026A (en) 2021-04-15 2021-04-15 Voiceprint identity authentication system and intelligent detection closestool based on home user level

Country Status (1)

Country Link
CN (1) CN113113026A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN209136623U (en) * 2018-04-12 2019-07-23 北京几何科技有限公司 It is a kind of to provide the detection system of health detection strategy based on user identity
CN109235590A (en) * 2018-09-26 2019-01-18 深圳市博电电子技术有限公司 Intelligent closestool control method and intelligent closestool
CN110274903A (en) * 2019-07-26 2019-09-24 重庆德方信息技术有限公司 Health test apparatus for intelligent closestool
CN110376023A (en) * 2019-07-26 2019-10-25 重庆德方信息技术有限公司 Control system and method applied to intelligent closestool
CN110993043A (en) * 2019-11-25 2020-04-10 重庆德方信息技术有限公司 Medical health management system
CN111122839A (en) * 2019-12-30 2020-05-08 佛山市为博康医疗科技有限公司 Intelligent conventional detection device
CN111305338A (en) * 2020-02-14 2020-06-19 宁波五维检测科技有限公司 Disease early warning system based on excrement ecological evaluation, health monitoring ring and closestool
CN111699387A (en) * 2020-03-05 2020-09-22 厦门波耐模型设计有限责任公司 Closestool type urine and excrement detection robot and Internet of things system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DOMINIC221: "声纹识别之GMM-UBM系统框架简介" ("A brief introduction to the GMM-UBM system framework for voiceprint recognition"), https://zhuanlan.zhihu.com/p/97119639 *

Similar Documents

Publication Publication Date Title
CN104835498B (en) Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter
US9691392B1 (en) System and method for improved audio consistency
US7877254B2 (en) Method and apparatus for enrollment and verification of speaker authentication
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
CN108597496A (en) Voice generation method and device based on generation type countermeasure network
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
CN103236260A (en) Voice recognition system
CN104900235A (en) Voiceprint recognition method based on pitch period mixed characteristic parameters
CN102324232A (en) Voiceprint recognition method and system based on Gaussian mixture model
Kamble et al. Novel energy separation based instantaneous frequency features for spoof speech detection
CN111583936A (en) Intelligent voice elevator control method and device
CN113823293A (en) A method and system for speaker recognition based on speech enhancement
CN106653004A (en) Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
Goh et al. Robust computer voice recognition using improved MFCC algorithm
Hao et al. Speech enhancement using Gaussian scale mixture models
CN113241059A (en) Voice wake-up method, device, equipment and storage medium
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
Wang et al. Robust Text-independent Speaker Identification in a Time-varying Noisy Environment.
CN113113026A (en) Voiceprint identity authentication system and intelligent detection closestool based on home user level
Pati et al. A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information
Bouziane et al. Towards an objective comparison of feature extraction techniques for automatic speaker recognition systems
Zouhir et al. Robust speaker recognition based on biologically inspired features
CN114512133A (en) Sound object recognition method, sound object recognition device, server and storage medium
Parrul et al. Automatic speaker recognition system
Sanderson Speech processing & text-independent automatic person verification

Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination
WW01 - Invention patent application withdrawn after publication (application publication date: 20210713)