CN111710340A - Method, device, server and storage medium for identifying user identity based on voice - Google Patents
- Publication number
- CN111710340A (application number CN202010505687.XA)
- Authority
- CN
- China
- Prior art keywords
- verification
- pronunciation
- user
- voice
- voiceprint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Telephonic Communication Services (AREA)
Abstract
Embodiments of the invention disclose a method, device, server and storage medium for identifying a user's identity based on voice. The method comprises: acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more characters, symbols or numbers; and confirming whether the verification pronunciation matches a preset pronunciation of the verification information, and if so, determining the user's identity according to the voiceprint of the verification pronunciation. By verifying the pronunciation a user produces when reading the verification information, the method can effectively block illegal operations by anyone other than the account holder in common privacy-sensitive scenarios such as account login, transaction inquiry and transaction record deletion, improving account security and enhancing user experience.
Description
Technical Field
Embodiments of the invention relate to voice recognition technology, and in particular to a method, device, server and storage medium for identifying a user's identity based on voice.
Background
Fingerprints and faces are biological features widely used for identity confirmation — fingerprint door locks, face-based check-in, phone fingerprint sensors, face payment and the like — but another biometric is usually overlooked: the voice. An adult's voice changes so little with age that the change can essentially be ignored, while the differences between the voices of men and women, or children and adults, are large; at a finer granularity, the voices of different individuals also differ considerably, and these differences can be exploited as an effective voice biometric.
With the rapid spread of smartphones, face data has become progressively easier to acquire, and fingerprint recognition has advanced to under-screen sensors. Voiceprint collection, by contrast, requires nothing more than a microphone. From the user's perspective, most people dislike seeing their own face in a camera, so face recognition offers a mediocre experience; fingerprint recognition usually requires both the sensor and the finger to be clean, otherwise accuracy degrades sharply; voiceprint recognition only requires the user to speak, which is a better experience.
Voice-based verification is therefore more readily accepted by users. However, prior-art voice recognition schemes either describe only the overall identity-confirmation process or, where they do explain the key technology of each module, rely on outdated techniques that do not follow the current trend toward deep learning and cannot effectively represent the differences between individual identities.
Disclosure of Invention
The invention provides a method for identifying a user's identity based on voice, aiming to improve account security and enhance user experience.
In a first aspect, an embodiment of the present invention provides a method for identifying a user identity based on voice, including:
acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more characters, symbols or numbers;
and confirming whether the verification pronunciation matches a preset pronunciation of the verification information, and if so, determining the user's identity according to the voiceprint of the verification pronunciation.
Optionally, before the acquiring of the verification pronunciation generated by the user reading verification information, the method further includes:
acquiring a plurality of sample pronunciations generated by the user reading a plurality of pieces of sample information;
and if the plurality of sample pronunciations all match preset pronunciations corresponding to the sample information, confirming a voiceprint range of the user according to the plurality of sample pronunciations.
Optionally, the confirming whether the verification pronunciation matches the preset pronunciation of the verification information includes: converting the verification pronunciation into a verification graphic feature using a first deep learning model, and judging whether the verification graphic feature matches a preset graphic feature corresponding to the preset pronunciation of the verification information.
Optionally, the converting the verification pronunciation into the verification graphic feature using the first deep learning model includes:
converting each character, symbol or number of the verification pronunciation individually into a verification graphic feature using the first deep learning model.
Optionally, the converting the verification pronunciation into the verification graphic feature using the first deep learning model includes:
converting all characters, symbols or numbers of the verification pronunciation into one verification graphic feature for recognition using the first deep learning model.
Optionally, the determining the user identity according to the voiceprint of the verification pronunciation includes: converting the verification pronunciation into a frequency distribution graph using a second deep learning model, and confirming the user's voiceprint, and the user identity corresponding to the voiceprint, according to the frequency distribution graph.
Optionally, the verification information is one or more random characters, symbols or numbers in a first part plus one or more characters, symbols or numbers in a second part.
In a second aspect, an embodiment of the present invention further provides a device for recognizing a user identity based on voice, where the device includes:
a voice acquisition module, configured to acquire a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more characters, symbols or numbers;
and a voice recognition module, configured to confirm whether the verification pronunciation matches a preset pronunciation of the verification information, and if so, determine the user's identity according to the voiceprint of the verification pronunciation.
In a third aspect, an embodiment of the present invention further provides a server, where the server includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for identifying a user identity based on voice as described in any of the above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing any of the above methods for identifying a user identity based on voice.
Drawings
Fig. 1 is a flowchart of a method for identifying a speaker identity based on a dynamic voice password according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for identifying the identity of a speaker based on a dynamic voice password according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of identifying digits from waveforms in the second embodiment;
FIG. 4 is a schematic structural diagram of an apparatus for recognizing a user identity based on speech in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer server according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. Note also that, for convenience of description, the drawings show only the structures related to the invention rather than all structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first deep learning model may be referred to as a second deep learning model, and similarly, a second deep learning model may be referred to as a first deep learning model, without departing from the scope of the present application. Both the first deep learning model and the second deep learning model are deep learning models, but they are not the same deep learning model. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Example one
Fig. 1 is a flowchart of a method for identifying a speaker's identity based on a dynamic voice password according to a first embodiment of the present invention. This embodiment is suitable for confirming a user's identity by recognizing the user's voice, and specifically includes the following steps:
Step 100, acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more characters, symbols or numbers.
In this embodiment, the verification information is generated by the server in response to the user's verification request and is sent to the user terminal for voice verification. It resembles a mobile phone verification code — for example the digit string 01273569 or the text string "the weather is good today". This embodiment uses a numeric verification code as the example. Common numeric codes have 4, 6 or 8 digits; because a voice signal must later be collected and its length must remain useful, an 8-digit code is adopted. The verification pronunciation is the audio produced when the user reads the verification information; it can be captured by the terminal's microphone or an app's recording function and uploaded to the server. After the server receives the user's verification pronunciation, it matches it against the preset pronunciation in its database to verify whether the user's pronunciation is accurate.
In this embodiment, each character, symbol or number of the verification information is generated randomly — for example by a random number generator or a custom function. Randomly generating every position avoids a verification-information pool that is too small and lacks diversity. In an optional embodiment, the characters, symbols or numbers in the verification information are mutually independent and non-repeating — for example the digit string 18475920, in which no digit repeats — ensuring that every digit is recognized and matched individually and improving recognition accuracy. In another optional embodiment, any two characters, symbols or numbers whose acoustic pronunciations are close are kept at least 2 positions apart: for example, when the digits 2 and 8 both appear, they must be separated by at least 2 positions, because their pronunciations are acoustically close and linguistically easy to confuse, especially when adjacent. In a further optional embodiment, the verification information is one or more random characters, symbols or numbers in a first part plus one or more characters, symbols or numbers in a second part.
To improve the verification pass rate and enhance user experience, the numeric verification code is formed from a fixed-length random part plus a fixed part. For example, three verification codes generated during registration — 84672591, 07632591 and 48672591 — share the fixed last four digits 2591, while the first four digits are random; during verification, a new code such as 36702591 is obtained by randomizing the first four digits again. This prepares for voiceprint recognition: fixing part of the text turns voiceprint recognition into a text-dependent task, which can effectively improve the recognition rate and increase the verification pass rate.
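As an illustrative sketch (not part of the claimed method), the code-generation rules described above — a random non-repeating prefix, a fixed suffix, and confusable digits kept at least 2 positions apart — can be expressed as follows. The fixed suffix 2591 and the confusable pair {2, 8} come from the examples in the text; the function names are hypothetical.

```python
import random

CONFUSABLE = {("2", "8")}  # acoustically similar digit pairs named in the text


def far_enough(code, min_gap=2):
    """Check that every confusable pair is at least min_gap positions apart."""
    for a, b in CONFUSABLE:
        pos_a = [i for i, d in enumerate(code) if d == a]
        pos_b = [i for i, d in enumerate(code) if d == b]
        for i in pos_a:
            for j in pos_b:
                if abs(i - j) < min_gap:
                    return False
    return True


def make_code(fixed_suffix="2591", total_len=8):
    """Random non-repeating prefix + fixed suffix, e.g. '3670' + '2591'."""
    prefix_len = total_len - len(fixed_suffix)
    while True:
        # Exclude suffix digits so the whole code has no repeated digit.
        pool = [d for d in "0123456789" if d not in fixed_suffix]
        prefix = "".join(random.sample(pool, prefix_len))
        code = prefix + fixed_suffix
        if far_enough(code):
            return code
```

A new code is produced per verification request, e.g. `make_code()` might return `"36702591"`; the rejection loop simply retries until the confusable-distance rule is satisfied.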
Step 110, confirming whether the verification pronunciation matches a preset pronunciation of the verification information, and if so, determining the user's identity according to the voiceprint of the verification pronunciation.
In this embodiment, a voiceprint is the sound-wave spectrum, displayed by electro-acoustic instruments, that carries speech information. Voiceprints are both distinctive and stable: no two people share the same voiceprint, and a person's voiceprint changes little with age, so identifying users by voiceprint has a high success rate. The preset pronunciation is a standard pronunciation of each character, symbol or number, which can be obtained by averaging the pronunciations of many people; it is compared with the user's verification pronunciation to judge whether the user's pronunciation is accurate. After the server receives the verification pronunciation the user produced from the verification information, a match with the preset pronunciation indicates that the user read the verification information accurately, and the server proceeds to voiceprint recognition to further confirm the user's identity. In voiceprint recognition, the server matches the user's verification pronunciation against the voice sample recorded for that user in the database: if the similarity between the voiceprint of the verification pronunciation and that of the stored sample falls within a preset threshold range, verification passes; otherwise it fails.
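The patent does not specify the similarity measure or the threshold; as a minimal sketch, the threshold-based voiceprint match could look like the following, with cosine similarity and the 0.8 cutoff both being illustrative assumptions:

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two voiceprint feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def voiceprint_matches(candidate, enrolled, threshold=0.8):
    """Pass verification when similarity falls within the preset threshold range."""
    return cosine_similarity(candidate, enrolled) >= threshold
```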
Illustratively, a user logging in to a mobile app must authenticate: the user taps to start verification, the server randomly generates a verification code, and the user reads the code aloud, the audio being sent to the server for matching. If verification passes, the user is allowed to log in to the app; otherwise login is refused, ensuring the user's safety when using the app.
This embodiment discloses a method for identifying a user's identity based on voice: acquiring a verification pronunciation generated by a user reading verification information, the verification information comprising one or more characters, symbols or numbers; and confirming whether the verification pronunciation matches a preset pronunciation of the verification information, and if so, determining the user's identity according to the voiceprint of the verification pronunciation. By verifying the pronunciation a user produces when reading the verification information, the method can effectively block illegal operations by anyone other than the account holder in common privacy-sensitive scenarios such as account login, transaction inquiry and transaction record deletion, improving account security and enhancing user experience.
Example two
Fig. 2 is a flowchart of a method for identifying a speaker's identity based on a dynamic voice password according to a second embodiment of the present invention. This embodiment is suitable for confirming a user's identity by recognizing the user's voice, and specifically includes the following steps:
Step 200, acquiring a plurality of sample pronunciations generated by the user reading a plurality of pieces of sample information.
In this embodiment, the sample information is generated by the server in response to the user's authentication request and sent to the user terminal for voice verification; like the verification information, it resembles a mobile phone verification code, such as the digit string 01273569 or the text string "the weather is good today". A sample pronunciation is the audio produced when the user reads the sample information, captured through the terminal's microphone or an app's recording function and uploaded to the server. The format and generation rules of the sample information are the same as those of the verification information in the first embodiment, where they are described in detail. For example, when a user first uses an app, sample pronunciations must be collected to serve as reference audio for later logins: the server sends several pieces of sample information, such as 84672591, 07632591 and 48672591, to the user's handset, and the user reads each of them, producing sample pronunciations that are sent to the server for verification.
Step 210, if the plurality of sample pronunciations all match the preset pronunciations corresponding to the sample information, confirming the user's voiceprint range according to the plurality of sample pronunciations.
In this embodiment, after the server receives the user's sample pronunciations, it matches them against the preset pronunciations; if all the sample pronunciations fall within the fault-tolerance range of the preset pronunciations, verification succeeds, and the server stores the mean of the voiceprints corresponding to the user's sample pronunciations, together with the user's identity information, in the database. For example, several pieces of sample information — 84672591, 07632591 and 48672591 — are collected for user A, corresponding to voiceprint A, voiceprint B and voiceprint C. The server averages voiceprints A, B and C to obtain voiceprint D, confirms a threshold range for voiceprint D, and stores voiceprint D and its threshold range together with user A's information, such as name and telephone number. When someone claiming to be user A requests identity verification, the voiceprint X of that person's voice is collected and matched against user A's voiceprint D in the database; if voiceprint X falls within the threshold range of voiceprint D, the person is verified to be user A.
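The enrollment step above — averaging sample voiceprints A, B, C into a reference D with a threshold range — can be sketched as follows; the voiceprints are treated as plain feature vectors, and the per-dimension tolerance is an illustrative assumption, since the patent does not define the threshold range:

```python
def enroll(voiceprints, tolerance=0.15):
    """Average several sample voiceprints into a reference plus a threshold range.

    `voiceprints` is a list of equal-length feature vectors (A, B, C in the
    text); the returned reference corresponds to voiceprint D.
    """
    n = len(voiceprints)
    reference = [sum(col) / n for col in zip(*voiceprints)]
    return reference, tolerance


def within_range(candidate, reference, tolerance):
    """Check whether candidate voiceprint X falls in D's threshold range."""
    return all(abs(x - r) <= tolerance for x, r in zip(candidate, reference))
```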
In this embodiment, the first deep learning model is vgg13 (a lightweight variant of the VGG model), used to identify specific characters, symbols or numbers from a spectrogram. The preset graphic feature is obtained by converting a pronunciation into a frequency distribution graph; since each digit has a different frequency distribution, the digits can be distinguished by their distribution graphs. Each character, symbol or number of the verification pronunciation is converted individually into a verification graphic feature using the first deep learning model: the whole string is split into single characters, symbols or numbers that are recognized one by one — for example, the digit string 57043102 is split into the eight digits 5, 7, 0, 4, 3, 1, 0 and 2. During recognition, the silent segments of the verification pronunciation are removed to split the speech signal, and each voiced segment is then recognized, achieving Chinese digit recognition on a continuous speech signal. Referring to fig. 3, which illustrates identifying digits from waveforms in this embodiment, the waveform shown is a time-domain signal; because the time domain carries little information, offers few usable features, and discriminates poorly between signals, speech analysis usually applies a Fourier transform to convert the time domain into the frequency domain, where the signal's distribution is observed. After the transform, each digit has a distinct frequency distribution and distinct formants, so a specific digit can be identified from the frequency distribution and formants in the frequency graph.
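The silence-removal step that splits the continuous signal into per-digit voiced segments can be sketched with a crude energy-based voice activity detector (frame length and threshold are illustrative assumptions; a real system would work on sampled audio):

```python
def split_voiced_segments(signal, frame_len=160, energy_threshold=0.01):
    """Split a waveform into voiced segments by dropping low-energy frames.

    Frames whose mean squared amplitude is below the threshold count as
    silence; runs of voiced frames become segments (ideally one per digit).
    """
    segments, current = [], []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            current.extend(frame)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

Each returned segment would then be converted to a spectrogram and classified individually by the first deep learning model.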
For the vgg13 model, the training data cover more than 60 speakers, male and female, each contributing single pronunciations of the digits 0 through 9. Data augmentation is applied on this basis — trimming, loudness compensation, adding background noise, and adjusting pitch and speaking rate — and ten-fold cross-validation is used during training.
In an alternative embodiment, all characters, symbols or numbers of the verification pronunciation are converted by the first deep learning model into a single verification graphic feature and recognized as a whole — for example, the digit string 57043102 is recognized in one pass.
In another alternative embodiment, the mel-energy spectrogram of the speech signal is used as the input feature and the first deep learning model is a CNN-LSTM. Unlike the vgg13 model, the input and output are no longer single-digit speech signals but continuous speech: for example, the input is the pronunciation of 01234567 and the output is the eight digits 01234567. This is an end-to-end approach, with CTC added as the loss function; CTC automatically aligns the per-frame predictions, merges repeated terms, and yields the sequence with the highest confidence.
In this embodiment, the second deep learning model is an identity coding model whose structure combines CNN, RNN, LSTM, GRU and the like, selected for performance. The verification pronunciation is input to the identity coding model and converted into a frequency distribution graph; because every person's voiceprint — and therefore frequency distribution graph — differs, reading the frequency distribution graph distinguishes each person's voiceprint. In the voiceprint feature extraction stage, the traditional MFCC features are abandoned: MFCCs are obtained through manual extraction and dimensionality reduction, which subjectively screens out too much information and hinders the model's own learning. The mel-energy spectrogram instead characterizes the frequency distribution of sounds audible to humans — a deep feature by which people distinguish things by sound — and its distribution in the mel frequency domain is better suited to building a speaker recognition system. Although the mel scale is nonlinear with respect to ordinary frequency, human perception of pitch is linear in the mel domain. After this conversion, the speech signal becomes an image carrying voiceprint information; for a single signal the mel-energy spectrogram is black and white and can be understood as a single-channel picture.
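The patent does not state which mel-scale formula it assumes; one common convention for the Hz-to-mel conversion underlying the mel-energy spectrogram is:

```python
import math


def hz_to_mel(f_hz):
    """One standard Hz-to-mel conversion (the patent does not specify
    which variant it uses): mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```

Under this formula 0 Hz maps to 0 mel and 1000 Hz maps to approximately 1000 mel, with the scale compressing progressively at higher frequencies — the nonlinearity mentioned above.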
In the training stage of the identity coding model, the training objective matters more than the structure at this stage. With model reuse in mind, a discriminative model is adopted rather than a generative one, so a single model covers many categories; however, a plain classification setup still suffers from the fixed number of categories.
In a first alternative embodiment, the identity coding task is trained directly rather than via a classification task. Specifically, a twin (Siamese) network structure is used: two models with the same structure and shared weights receive a pair of inputs each time. When the two inputs come from the same person, the objective is to reduce the difference between the two network outputs; when they come from different people, the objective is to increase it. After convergence, the model can be understood to have learned to distinguish the voices of the same person from those of different people, and the result vector it outputs serves as the identity code.
In a second alternative embodiment, building on the idea of the first, a better result can be obtained with a triplet loss function. Each input is a triple: an anchor, a positive example from the same person as the anchor, and a negative example from a different person. The objective is that, for every input, the difference between the anchor and the positive example is far smaller than the difference between the anchor and the negative example, which better drives the network from voiceprint characterization toward distinguishing the identities of different people.
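The triplet objective above can be sketched as a hinge-style loss over plain feature vectors; the squared-distance metric and the margin value are illustrative assumptions, not specified in the text:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: push the anchor-positive distance below
    the anchor-negative distance by at least `margin` (margin value is an
    illustrative assumption)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

The loss is zero once the negative is sufficiently farther from the anchor than the positive, and grows as the ordering is violated.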
The embodiment of the invention discloses a method for identifying a user identity based on voice, which comprises the following steps: acquiring a plurality of sample pronunciations generated by a user reading a plurality of pieces of sample information; if the plurality of sample pronunciations match the preset pronunciations corresponding to the sample information, confirming the voiceprint range of the user according to the plurality of sample pronunciations; acquiring a verification pronunciation generated by the user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers; converting the verification pronunciation into a verification graphic feature by using a first deep learning model, and judging whether the verification graphic feature matches the preset graphic feature corresponding to the preset pronunciation of the verification information; and converting the verification pronunciation into a frequency distribution graph by using a second deep learning model, and confirming the voiceprint of the user and the user identity corresponding to the voiceprint according to the frequency distribution graph. According to the method for identifying the user identity based on voice provided by the embodiment of the invention, the user identity is identified by recognizing the verification pronunciation produced when the user reads the verification information. In common links involving privacy-related operations, such as account login, transaction information inquiry and transaction record deletion, illegal operations by anyone other than the account owner can thus be effectively prevented, improving the security of the user account and enhancing the user experience.
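The two-stage check summarized above, content match first, then voiceprint match, can be sketched as control flow. Here `content_matches` and `speaker_id` are hypothetical stand-ins for the first and second deep learning models, which this sketch does not implement:

```python
def verify_user(utterance, prompt_text, enrolled_user,
                content_matches, speaker_id):
    """Two-stage check from the claimed method (hypothetical sketch):
    1) first model:  does the utterance actually pronounce the prompt?
    2) second model: does the voiceprint belong to the enrolled user?"""
    if not content_matches(utterance, prompt_text):
        return False  # wrong content (e.g. a replay of an old prompt)
    return speaker_id(utterance) == enrolled_user

# Stub callables standing in for the two deep learning models.
ok = verify_user("raw-audio-bytes", "X7#q", "alice",
                 content_matches=lambda u, t: True,
                 speaker_id=lambda u: "alice")
print(ok)  # True
```

Ordering the checks this way rejects replayed recordings of a different prompt before the (typically more expensive) voiceprint comparison runs.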
Example three
The device for recognizing a user identity based on voice provided by the embodiment of the invention can execute the method for recognizing a user identity based on voice provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method. Fig. 4 is a schematic structural diagram of an apparatus 300 for recognizing a user identity based on voice in an embodiment of the present invention. Referring to fig. 4, the apparatus 300 for recognizing a user identity based on voice according to an embodiment of the present invention may specifically include:
the voice acquisition module is used for acquiring verification pronunciation generated by reading verification information by a user, wherein the verification information comprises one or more of characters, symbols or numbers;
and the voice recognition module is used for confirming whether the verification pronunciation is matched with the preset pronunciation of the verification information or not, and if so, determining the identity of the user according to the voiceprint of the verification pronunciation.
Optionally, before the obtaining of the verification pronunciation generated by the user reading verification information, the method further includes:
acquiring a plurality of sample pronunciations generated by a user reading a plurality of pieces of sample information;
and if the plurality of sample pronunciations are matched with the preset pronunciations corresponding to the sample information, confirming the voiceprint range of the user according to the plurality of sample pronunciations.
Optionally, the determining whether the verification pronunciation matches the preset pronunciation of the verification information includes: and converting the verification pronunciation into a verification graphic feature by using a first deep learning model, and judging whether the verification graphic feature is matched with a preset graphic feature corresponding to a preset pronunciation of the verification information.
Optionally, the converting the verification pronunciation into the verification graphic feature by using the first deep learning model includes:
each word, symbol or number of the verification utterance is separately converted to a verification graph feature using a first deep learning model.
Optionally, the converting the verification pronunciation into the verification graphic feature by using the first deep learning model includes:
and converting all characters, symbols or numbers of the verification pronunciation into a verification graphic feature for recognition by using a first deep learning model.
Optionally, the determining the user identity according to the voiceprint of the verification pronunciation includes: and converting the verification pronunciation into a frequency distribution graph by using a second deep learning model, and confirming the voiceprint of the user and the user identity corresponding to the voiceprint according to the frequency distribution graph.
Optionally, the verification information consists of a random first part of one or more characters, symbols or numbers plus a second part of one or more characters, symbols or numbers.
The embodiment discloses a device for identifying a user identity based on voice, comprising: a voice acquisition module, used for acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers; and a voice recognition module, used for confirming whether the verification pronunciation matches the preset pronunciation of the verification information and, if so, determining the identity of the user according to the voiceprint of the verification pronunciation. According to the device provided by the embodiment of the invention, the user identity is identified by recognizing the verification pronunciation produced when the user reads the verification information. In common links involving privacy-related operations, such as account login, transaction information inquiry and transaction record deletion, illegal operations by anyone other than the account owner can thus be effectively prevented, improving the security of the user account and enhancing the user experience.
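The optional verification-information format described above (a random first part plus a second part of characters, symbols, or numbers) could be generated along these lines. The alphabet, part lengths, and fixed suffix are purely illustrative assumptions, not specified by the patent:

```python
import random
import string

def make_challenge(n_random=4, fixed_suffix="88"):
    """Build verification info in the optional format above:
    a random first part drawn from letters, digits and symbols,
    plus a second, fixed part. All concrete choices here are
    assumptions for illustration."""
    alphabet = string.ascii_uppercase + string.digits + "#@%&"
    random_part = "".join(random.choice(alphabet) for _ in range(n_random))
    return random_part + fixed_suffix

challenge = make_challenge()
print(challenge)  # e.g. "K3#Q88" -- the random part varies per call
```

Randomizing the prompt per attempt is what defeats simple replay of a previously recorded pronunciation.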
Example four
Fig. 5 is a schematic structural diagram of a computer server according to an embodiment of the present invention, as shown in fig. 5, the computer server includes a memory 410 and a processor 420, the number of the processors 420 in the computer server may be one or more, and one processor 420 is taken as an example in fig. 5; the memory 410 and the processor 420 in the device may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The memory 410, as a computer-readable storage medium, is used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for recognizing a user identity based on voice in the embodiment of the present invention (e.g., the voice acquisition module 310 and the voice recognition module 320 in the apparatus 300 for recognizing a user identity based on voice). By running the software programs, instructions and modules stored in the memory 410, the processor 420 executes the various functional applications and data processing of the device/terminal/equipment, thereby implementing the method for recognizing a user identity based on voice.
Wherein the processor 420 is configured to run the computer program stored in the memory 410, and implement the following steps:
acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers;
and confirming whether the verification pronunciation is matched with the preset pronunciation of the verification information, and if so, determining the identity of the user according to the voiceprint of the verification pronunciation.
In one embodiment, the computer program of the computer device provided by the embodiment of the present invention is not limited to the above method operations, and may also perform related operations in the method for recognizing the user identity based on voice provided by any embodiment of the present invention.
The memory 410 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 410 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 410 may further include memory located remotely from the processor 420, which may be connected to devices/terminals/devices through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment discloses a server for identifying a user identity based on voice, configured to execute the following method: acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers; and confirming whether the verification pronunciation matches the preset pronunciation of the verification information and, if so, determining the identity of the user according to the voiceprint of the verification pronunciation. According to the server provided by the embodiment of the invention, the user identity is identified by recognizing the verification pronunciation produced when the user reads the verification information. In common links involving privacy-related operations, such as account login, transaction information inquiry and transaction record deletion, illegal operations by anyone other than the account owner can thus be effectively prevented, improving the security of the user account and enhancing the user experience.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for recognizing a user identity based on speech, the method including:
acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers;
and confirming whether the verification pronunciation is matched with the preset pronunciation of the verification information, and if so, determining the identity of the user according to the voiceprint of the verification pronunciation.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in a method for recognizing a user identity based on a voice according to any embodiment of the present invention.
The computer-readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The embodiment discloses a storage medium for recognizing a user identity based on voice, configured to execute the following method: acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers; and confirming whether the verification pronunciation matches the preset pronunciation of the verification information and, if so, determining the identity of the user according to the voiceprint of the verification pronunciation. According to the storage medium provided by the embodiment of the invention, the user identity is identified by recognizing the verification pronunciation produced when the user reads the verification information. In common links involving privacy-related operations, such as account login, transaction information inquiry and transaction record deletion, illegal operations by anyone other than the account owner can thus be effectively prevented, improving the security of the user account and enhancing the user experience.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method for recognizing user identity based on voice is characterized by comprising the following steps:
acquiring a verification pronunciation generated by a user reading verification information, wherein the verification information comprises one or more of characters, symbols or numbers;
and confirming whether the verification pronunciation is matched with the preset pronunciation of the verification information, and if so, determining the identity of the user according to the voiceprint of the verification pronunciation.
2. The method for recognizing a user identity based on voice according to claim 1, wherein before the acquiring of the verification pronunciation generated by the user reading the verification information, the method further comprises:
acquiring a plurality of sample pronunciations generated by a user reading a plurality of pieces of sample information;
and if the plurality of sample pronunciations are matched with the preset pronunciations corresponding to the sample information, confirming the voiceprint range of the user according to the plurality of sample pronunciations.
3. The method for recognizing user identity based on voice according to claim 1, wherein the confirming whether the verification pronunciation matches the preset pronunciation of the verification information comprises: and converting the verification pronunciation into a verification graphic feature by using a first deep learning model, and judging whether the verification graphic feature is matched with a preset graphic feature corresponding to a preset pronunciation of the verification information.
4. The method for recognizing a user identity based on voice according to claim 3, wherein the converting the verification pronunciation into the verification graphic feature by using the first deep learning model comprises:
each word, symbol or number of the verification utterance is separately converted to a verification graph feature using a first deep learning model.
5. The method for recognizing a user identity based on voice according to claim 3, wherein the converting the verification pronunciation into the verification graphic feature by using the first deep learning model comprises:
and converting all characters, symbols or numbers of the verification pronunciation into a verification graphic feature for recognition by using a first deep learning model.
6. The method of claim 1, wherein the determining the user identity based on the voiceprint of the verification utterance comprises: and converting the verification pronunciation into a frequency distribution graph by using a second deep learning model, and confirming the voiceprint of the user and the user identity corresponding to the voiceprint according to the frequency distribution graph.
7. The method of claim 1, wherein the verification information consists of a random first part of one or more characters, symbols or numbers plus a second part of one or more characters, symbols or numbers.
8. An apparatus for recognizing user identity based on voice, comprising:
the voice acquisition module is used for acquiring verification pronunciation generated by reading verification information by a user, wherein the verification information comprises one or more of characters, symbols or numbers;
and the voice recognition module is used for confirming whether the verification pronunciation is matched with the preset pronunciation of the verification information or not, and if so, determining the identity of the user according to the voiceprint of the verification pronunciation.
9. A server, characterized in that the server comprises:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement a method for recognizing a user identity based on speech as recited in any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for recognizing a user identity on the basis of speech as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010505687.XA CN111710340A (en) | 2020-06-05 | 2020-06-05 | Method, device, server and storage medium for identifying user identity based on voice |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010505687.XA CN111710340A (en) | 2020-06-05 | 2020-06-05 | Method, device, server and storage medium for identifying user identity based on voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111710340A true CN111710340A (en) | 2020-09-25 |
Family
ID=72539403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010505687.XA Pending CN111710340A (en) | 2020-06-05 | 2020-06-05 | Method, device, server and storage medium for identifying user identity based on voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111710340A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102413100A (en) * | 2010-09-25 | 2012-04-11 | 盛乐信息技术(上海)有限公司 | Voiceprint authentication system for voiceprint password picture prompt and implementation method thereof |
CN102708867A (en) * | 2012-05-30 | 2012-10-03 | 北京正鹰科技有限责任公司 | Method and system for identifying faked identity by preventing faked recordings based on voiceprint and voice |
EP3107091A1 (en) * | 2015-06-17 | 2016-12-21 | Baidu Online Network Technology (Beijing) Co., Ltd | Voiceprint authentication method and apparatus |
US20170318013A1 (en) * | 2016-04-29 | 2017-11-02 | Yen4Ken, Inc. | Method and system for voice-based user authentication and content evaluation |
CN107517207A (en) * | 2017-03-13 | 2017-12-26 | 平安科技(深圳)有限公司 | Server, auth method and computer-readable recording medium |
CN109243467A (en) * | 2018-11-14 | 2019-01-18 | 龙马智声(珠海)科技有限公司 | Sound-groove model construction method, method for recognizing sound-groove and system |
CN110751945A (en) * | 2019-10-17 | 2020-02-04 | 成都三零凯天通信实业有限公司 | End-to-end voice recognition method |
CN111048099A (en) * | 2019-12-16 | 2020-04-21 | 随手(北京)信息技术有限公司 | Sound source identification method, device, server and storage medium |
CN111081080A (en) * | 2019-05-29 | 2020-04-28 | 广东小天才科技有限公司 | Voice detection method and learning device |
CN111192574A (en) * | 2018-11-14 | 2020-05-22 | 奇酷互联网络科技(深圳)有限公司 | Intelligent voice interaction method, mobile terminal and computer readable storage medium |
2020-06-05: CN application CN202010505687.XA, publication CN111710340A (en), status: active, Pending
Non-Patent Citations (4)
Title |
---|
吴哲顺 (Wu Zheshun), "Research and Implementation of a Voiceprint Recognition System Based on Collaborative Edge Computing", China Master's Theses Full-text Database, Information Science and Technology, no. 02, pages 30-41 * |
杨楠 (Yang Nan), "Research and Implementation of Speaker Recognition Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, no. 07, 15 July 2019 (2019-07-15), pages 13-30 * |
杨楠 (Yang Nan), "Research and Implementation of Speaker Recognition Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, no. 07, pages 54-55 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113098850A (en) * | 2021-03-24 | 2021-07-09 | 北京嘀嘀无限科技发展有限公司 | Voice verification method and device and electronic equipment |
CN112948788A (en) * | 2021-04-13 | 2021-06-11 | 网易(杭州)网络有限公司 | Voice verification method, device, computing equipment and medium |
CN112948788B (en) * | 2021-04-13 | 2024-05-31 | 杭州网易智企科技有限公司 | Voice verification method, device, computing equipment and medium |
CN113449083A (en) * | 2021-08-31 | 2021-09-28 | 深圳市信润富联数字科技有限公司 | Operation safety management method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200925 |