CN115331673B - Voiceprint recognition household appliance control method and device in complex sound scene - Google Patents
- Publication number
- CN115331673B CN115331673B CN202211256541.1A CN202211256541A CN115331673B CN 115331673 B CN115331673 B CN 115331673B CN 202211256541 A CN202211256541 A CN 202211256541A CN 115331673 B CN115331673 B CN 115331673B
- Authority
- CN
- China
- Prior art keywords
- audio
- voiceprint recognition
- similarity
- model
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L17/04—Training, enrolment or model building (speaker identification or verification techniques)
- G10L17/18—Artificial neural networks; connectionist approaches (speaker identification or verification techniques)
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention provides a voiceprint recognition household appliance control method and device for complex sound scenes, in the field of household appliance control. The template audio fully accounts for the varied conditions of a complex sound scene, is highly representative, and lays the foundation for improving voiceprint recognition accuracy in such scenes. A similarity detection model based on the template audio, a voiceprint recognition decision model based on an SVM, and a voiceprint recognition model based on a convolutional neural network judge the audio in sequence, improving recognition accuracy. Because the models run from simple to complex, audio that is easy to judge is resolved by the simple models and only audio that is hard to judge reaches the complex model, reducing the consumption of computing resources.
Description
Technical Field
The invention relates to the field of household appliance control, in particular to a voiceprint recognition household appliance control method and device in a complex sound scene.
Background
With the progress of science and technology, modern household appliances are used by more and more consumers. As an important identity recognition technology, voiceprint recognition can identify family members, so that an appliance accepts instructions only from specific family members and rejects interference from unrelated persons. Under normal conditions, common voiceprint recognition technology ensures high recognition accuracy, enabling specific family members to control household appliances precisely.
However, household appliances controlled by voiceprint recognition are often used in complex sound scenes, in which the recognition accuracy of the technology drops sharply. As accuracy falls, the application value of appliances controlled by voiceprint recognition falls with it. A voiceprint recognition appliance control method that preserves recognition accuracy in complex sound scenes therefore has very important application value.
Disclosure of Invention
In order to overcome the above problems or at least partially solve the above problems, embodiments of the present invention provide a method and an apparatus for controlling a voiceprint recognition appliance in a complex sound scene.
The embodiment of the invention is realized by the following steps:
in a first aspect, an embodiment of the present invention provides a voiceprint recognition household appliance control method in a complex sound scene, including:
respectively recording multiple sections of audio of specific family members in multiple sound scenes;
encoding a plurality of pieces of audio;
after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model;
when the household appliance user outputs a segment of audio, calculating the similarity between that segment and each template audio; if the similarity to any template audio is greater than a preset similarity, directly identifying the segment as audio of a specific family member; if the similarity to every template audio is smaller than the preset similarity, proceeding to the next step;
and judging whether the output audio of the household appliance user is the audio of the specific family member by using the voiceprint recognition decision model.
Based on the first aspect, in some embodiments of the invention, the machine learning model is an SVM model.
Based on the first aspect, in some embodiments of the present invention, the step of determining whether the output audio of the household appliance user is the audio of a specific family member by using a voiceprint recognition decision model includes:
if the score of the voiceprint recognition decision result based on the SVM model is larger than a first preset score, directly recognizing the audio as the audio of a specific family member; if the score is smaller than a second preset score, directly recognizing the audio as the audio of a non-specific family member; and if the score lies between the first preset score and the second preset score, performing the next step;
and finally judging the output audio of the household appliance user by using a voiceprint recognition model based on the convolutional neural network, and judging whether the audio is the audio of a specific family member.
Based on the first aspect, in some embodiments of the present invention, the step of calculating the similarity between the audio segment and the template audio comprises:
performing, for the segment of audio and the template audio: audio filtering, calculating short-time energy of an audio signal and intercepting effective data of the audio signal;
and calculating the cosine distance between the section of audio and the template audio.
Based on the first aspect, in some embodiments of the present invention, the step of respectively entering multiple pieces of audio of a specific family member in multiple sound scenes includes:
recording multiple audio segments of a specific family member under one or more conditions of high noise, several people speaking at once, and low-volume speech;
and controlling the duration of each piece of audio to be within 5 seconds when the audio is recorded.
Based on the first aspect, in some embodiments of the present invention, the step of encoding the multiple pieces of audio includes:
and coding the multi-segment audio by using an I-Vector calculation method.
Based on the first aspect, in some embodiments of the invention, the step of collecting audio of a plurality of non-specific family members as negative training samples comprises:
collecting more than 50 audio recordings of non-specific family members as negative training samples.
In a second aspect, an embodiment of the present invention provides a home appliance control system for voiceprint recognition in a complex sound scene, including:
a logging module: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
the coding module: encoding a plurality of pieces of audio;
a calculate similarity module: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
a training module: all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model;
an identification module: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member;
a judging module: if the similarity between the audio and any template audio is smaller than the preset similarity, the voiceprint recognition decision model is used for judging whether the output audio of the household appliance user is the audio of the specific family member.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor, at least one memory, and a data bus; wherein:
the processor and the memory complete mutual communication through the data bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program, where the computer program causes a computer to execute the method described above.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
(1) The template audio fully considers various conditions in a complex sound scene, has better representativeness, and lays a foundation for improving the voiceprint recognition precision in the complex sound scene.
(2) And the similarity detection model based on the template audio, the voiceprint recognition decision model based on the SVM model and the voiceprint recognition model based on the convolutional neural network are used for sequentially judging, so that the voiceprint recognition precision is improved.
(3) The similarity detection model based on the template audio, the voiceprint recognition decision model based on the SVM model and the voiceprint recognition model based on the convolutional neural network are used for sequentially judging, the models are simple to complex, the audios which are easy to judge can obtain results by using the simple models, the audios which are difficult to judge can obtain results by using the complex models, and the consumption of computing resources is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of an embodiment of a voiceprint recognition home appliance control method in a complex sound scene according to the present invention;
FIG. 2 is a flowchart of an embodiment of a voiceprint recognition appliance control method in a complex sound scene according to the present invention;
fig. 3 is a block diagram illustrating a structure of a voiceprint recognition home appliance control apparatus in a complex sound scene according to an embodiment of the present invention;
fig. 4 is a block diagram of an electronic device according to an embodiment of the invention.
Icon: 1. a recording module; 2. an encoding module; 3. a calculate similarity module; 4. a training module; 5. an identification module; 6. a judgment module; 7. a processor; 8. a memory; 9. a data bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The system embodiments are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and computer program products according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device, which may be a personal computer, a server, or a network device, to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the embodiments of the present invention, "a plurality" represents at least 2.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed" and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Examples
Referring to fig. 1, in a first aspect, an embodiment of the present invention provides a method for controlling a home appliance for voiceprint recognition in a complex sound scene, including:
s1: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
In this step, the multiple sound scenes reproduce real conditions such as high noise, several people speaking at once and low-volume speech, covering as comprehensively as possible the conditions under which voiceprint recognition is used with the appliance. The audio duration can be set according to actual conditions; since voice commands to appliances are usually short, each segment can be kept within 5 seconds. The template audio thus fully accounts for the varied conditions of a complex sound scene, is highly representative, and lays the foundation for improving voiceprint recognition accuracy in such scenes.
S2: encoding a plurality of pieces of audio;
In this step, the multiple audio segments are encoded using the I-Vector calculation method. In practice, speaker information in recorded speech is mixed with various kinds of interference, and because different acquisition devices have different channels, channel interference is mixed into the collected speech as well. Such interference perturbs the speaker information; the traditional GMM-UBM method cannot overcome this problem, leaving system performance unstable. In the GMM-UBM framework, each target speaker is described by a GMM. Since adapting from the UBM to each speaker's GMM changes only the means, leaving the weights and covariances untouched, most of the speaker information is contained in the GMM means; the GMM mean supervector therefore contains channel information in addition to most of the speaker information. Joint Factor Analysis (JFA) can model speaker variability and channel variability separately, compensating for channel differences and improving system performance. However, JFA requires large training corpora from many different channels, which are difficult to obtain, and its computation is complicated, so it is hard to put into practical use. Dehak proposed a novel solution based on the I-Vector factor analysis technique: where JFA models the speaker and channel difference spaces separately, the I-Vector approach models a single total-variability space. This relaxes the requirements on the corpus, keeps the computation simple, and achieves comparable performance.
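The total-variability model underlying the I-Vector method writes a GMM mean supervector as s = m + Tw and recovers the low-dimensional vector w as the speaker embedding. A minimal numerical sketch of that recovery, as a MAP point estimate with toy dimensions (real systems estimate w from Baum-Welch statistics against a trained UBM; all names and sizes here are illustrative, not the patent's implementation):

```python
import numpy as np

def ivector_encode(supervector, ubm_mean, T, noise_var=1.0):
    """Point estimate of w in the model s = m + T w + noise, with a
    standard-normal prior on w:  w = (I + T'T/s2)^-1 T'(s - m)/s2."""
    d = T.shape[1]
    A = np.eye(d) + T.T @ T / noise_var
    b = T.T @ (supervector - ubm_mean) / noise_var
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
T = rng.standard_normal((200, 10))   # toy total-variability matrix
m = rng.standard_normal(200)         # toy UBM mean supervector
w_true = rng.standard_normal(10)
s = m + T @ w_true                   # noiseless synthetic supervector
w = ivector_encode(s, m, T)          # recovered (slightly shrunk) embedding
```

With a well-conditioned T, the recovered w matches w_true up to the small shrinkage introduced by the prior, which is the intended behavior of a MAP estimate.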
S3: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
In this step, calculating the similarity between every two audio segments of each family member comprises: filtering both segments, calculating the short-time energy of each audio signal, and intercepting its effective data; then calculating the cosine distance between the two segments. Segments whose similarity exceeds a preset value are retained, and all retained segments are regarded as template audio; the preset value can be set reasonably according to actual requirements.
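One plausible reading of the template-selection step, sketched with cosine similarity over the encoded vectors; the threshold value and the keep-if-similar-to-any-other rule are assumptions, since the text only states "similarity larger than a preset value":

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two encoded audio vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_templates(encodings, threshold=0.8):
    """Keep every recording whose encoding is similar enough (above
    `threshold`) to at least one other recording of the same member."""
    keep = []
    for i, e in enumerate(encodings):
        if any(cosine_sim(e, o) > threshold
               for j, o in enumerate(encodings) if j != i):
            keep.append(i)
    return keep

enc = [np.array([1.0, 0.0]),    # two mutually similar recordings...
       np.array([0.9, 0.1]),
       np.array([-1.0, 0.2])]   # ...and one outlier
kept = select_templates(enc, threshold=0.8)  # [0, 1]: the outlier is dropped
```

Retained indices map back to the recorded segments, which together form the template-audio set used in the later stages.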
S4: taking all template audios as positive training samples, collecting audios of a plurality of non-specific family members as negative training samples, and training by using a machine learning model to obtain a voiceprint recognition decision model;
In this step, collecting the audio of a plurality of non-specific family members as negative training samples comprises: collecting more than 50 audio recordings of non-specific family members as negative training samples. The machine learning model may be an SVM model.
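The patent does not fix an SVM implementation. As an illustration only, here is a tiny linear SVM trained by sub-gradient descent on the hinge loss, with synthetic positive samples standing in for template-audio encodings and negatives for non-member audio; all hyperparameters are illustrative:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, reg=0.01, epochs=200):
    """Linear SVM via sub-gradient descent on the regularized hinge loss.
    Labels y must be in {-1, +1}; returns (weights, bias)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:        # margin violated
                w -= lr * (reg * w - y[i] * X[i])
                b += lr * y[i]
            else:                                 # only shrink weights
                w -= lr * reg * w
    return w, b

def svm_score(x, w, b):
    """Signed distance to the hyperplane, used as the decision score."""
    return float(x @ w + b)

rng = np.random.default_rng(1)
pos = rng.normal(1.0, 0.3, size=(20, 2))    # stand-in member encodings
neg = rng.normal(-1.0, 0.3, size=(50, 2))   # stand-in non-member encodings
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [-1] * 50)
w, b = train_linear_svm(X, y)
acc = float(np.mean(np.sign(X @ w + b) == y))
```

In a real system one would instead use a mature SVM library; the point here is only that the trained model yields a continuous score, which the later cascade compares against two preset thresholds.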
S5: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member; if the similarity between the section of audio and any template audio is smaller than the preset similarity, performing the next step;
in this step, the similarity between the piece of audio and the template audio may be calculated using a similarity detection model based on the template audio. The step of calculating the similarity between the audio and the template audio comprises: performing, for the segment of audio and the template audio: audio filtering, calculating short-time energy of an audio signal and intercepting effective data of the audio signal; and calculating the cosine distance between the section of audio and the template audio.
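The preprocessing chain named above (audio filtering, short-time energy, effective-data interception) might look like the following sketch; the frame sizes, the pre-emphasis filter standing in for "audio filtering", and the energy-ratio cut are all assumptions, not parameters given in the patent:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """Simple high-pass pre-emphasis filter, a common stand-in for
    'audio filtering' before feature extraction."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def short_time_energy(signal, frame_len=400, hop=160):
    """Per-frame energy; 400/160 samples = 25 ms frames, 10 ms hop at 16 kHz."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f.astype(float) ** 2) for f in frames])

def trim_to_voiced(signal, frame_len=400, hop=160, ratio=0.1):
    """Keep the span between the first and last frame whose energy exceeds
    `ratio` of the maximum: a crude effective-data interception."""
    e = short_time_energy(signal, frame_len, hop)
    active = np.nonzero(e > ratio * e.max())[0]
    if active.size == 0:
        return signal
    return signal[active[0] * hop: active[-1] * hop + frame_len]

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)              # 1 s of "speech"
sig = np.concatenate([np.zeros(8000), tone, np.zeros(8000)])
trimmed = trim_to_voiced(pre_emphasis(sig) * 0 + sig)  # trim the raw signal
```

The trimmed signal keeps roughly the central voiced second and drops the surrounding silence, after which the cosine distance is computed on the encoded vectors as in the template-selection step.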
S6: and judging whether the output audio of the household appliance user is the audio of the specific family member by using the voiceprint recognition decision model.
The similarity detection model based on the template audio and the SVM-based voiceprint recognition decision model judge the audio in sequence, improving voiceprint recognition accuracy. Because the models run from simple to complex, audio that is easy to judge obtains a result from the simple model and only audio signals that are hard to judge require the complex model, reducing the consumption of computing resources.
In some embodiments of the present invention based on the first aspect, the step of determining whether the output audio of the user of the appliance is the audio of a specific family member by using a voiceprint recognition decision model includes:
Referring to fig. 2, S61: if the score of the voiceprint recognition decision result based on the SVM model is larger than a first preset score, directly recognizing the audio as the audio of a specific family member; if the score is smaller than a second preset score, directly recognizing the audio as the audio of a non-specific family member; and if the score lies between the first preset score and the second preset score, performing the next step;
s62: and finally judging the output audio of the household appliance user by using a voiceprint recognition model based on the convolutional neural network, and judging whether the audio is the audio of a specific family member.
The similarity detection model based on the template audio, the SVM-based voiceprint recognition decision model and the CNN-based voiceprint recognition model judge the audio in sequence, improving voiceprint recognition accuracy. Because the models run from simple to complex, audio that is easy to judge is resolved by the simple models and only audio that is hard to judge reaches the complex model, reducing the consumption of computing resources.
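The three-stage cascade described above can be sketched as a single decision function. The thresholds and the stub scoring functions below are illustrative placeholders for the template similarity, the trained SVM, and the CNN verdict:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cascade_decide(audio_enc, templates, sim_threshold,
                   svm_score_fn, low, high, cnn_predict_fn):
    """Three-stage cascade: cheap template match first, SVM score with two
    thresholds next, CNN only for the ambiguous middle band."""
    # Stage 1: template similarity (cheapest check).
    if any(cosine_sim(audio_enc, t) > sim_threshold for t in templates):
        return "accept", "template"
    # Stage 2: SVM decision score against the two preset scores.
    s = svm_score_fn(audio_enc)
    if s > high:
        return "accept", "svm"
    if s < low:
        return "reject", "svm"
    # Stage 3: CNN verdict, reached only by the hard cases.
    return ("accept" if cnn_predict_fn(audio_enc) else "reject"), "cnn"

templates = [np.array([1.0, 0.0])]
svm_stub = lambda v: float(v[0])    # stand-in for the trained SVM score
cnn_stub = lambda v: v[1] > 0       # stand-in for the CNN classifier
r1 = cascade_decide(np.array([0.99, 0.05]), templates, 0.9, svm_stub, -0.5, 0.5, cnn_stub)
r2 = cascade_decide(np.array([0.0, 1.0]),  templates, 0.9, svm_stub, -0.5, 0.5, cnn_stub)
r3 = cascade_decide(np.array([-1.0, -1.0]), templates, 0.9, svm_stub, -0.5, 0.5, cnn_stub)
```

Here r1 is accepted at the template stage, r3 is rejected by the SVM score alone, and only r2 falls in the ambiguous band and is passed to the CNN, which is exactly the resource-saving behavior the description claims.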
Referring to fig. 3, in a second aspect, an embodiment of the present invention provides a voiceprint recognition home appliance control system in a complex sound scene, including:
the recording module 1: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
and the coding module 2: encoding a plurality of pieces of audio;
calculate similarity module 3: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
the training module 4: all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model;
the identification module 5: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member;
and a judging module 6: if the similarity between the audio and any template audio is smaller than the preset similarity, the voiceprint recognition decision model is used for judging whether the output audio of the household appliance user is the audio of the specific family member.
For the specific implementation of the apparatus, please refer to the implementation of the method, and redundant description is omitted here.
Referring to fig. 4, in a third aspect, an embodiment of the invention provides an electronic device, including:
at least one processor 7, at least one memory 8 and a data bus 9; wherein:
the processor 7 and the memory 8 complete communication with each other through the data bus 9; the memory 8 stores program instructions executable by the processor 7, and the processor 7 calls the program instructions to perform the method. For example, the above steps S1-S6 are performed.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program, where the computer program causes a computer to execute the method described above. For example, the above steps S1-S6 are performed.
In conclusion, the invention provides a voiceprint recognition household appliance control method for complex sound scenes. The template audio fully accounts for the varied conditions of a complex sound scene, is highly representative, and lays the foundation for improving voiceprint recognition accuracy in such scenes. The template-audio similarity detection model, the SVM-based voiceprint recognition decision model and the CNN-based voiceprint recognition model judge the audio in sequence, improving recognition accuracy. Because the models run from simple to complex, audio that is easy to judge is resolved by the simple models and only audio signals that are hard to judge reach the complex model, reducing the consumption of computing resources.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (8)
1. A voiceprint recognition household appliance control method in a complex sound scene is characterized by comprising the following steps:
respectively recording multiple sections of audio of specific family members in multiple sound scenes;
encoding a plurality of pieces of audio;
after encoding, calculating the similarity between every two audio segments of each family member, reserving the audio segments whose similarity is greater than a preset value, and taking all reserved audio segments as template audio;
taking all template audios as positive training samples, collecting audios of a plurality of non-specific family members as negative training samples, and training by using a machine learning model to obtain a voiceprint recognition decision model, wherein the machine learning model is an SVM model;
when the household appliance user outputs a segment of audio, calculating the similarity between the segment of audio and the template audio; if the similarity between the segment of audio and any template audio is greater than a preset similarity, directly identifying the audio as the audio of a specific family member; if the similarity between the segment of audio and every template audio is smaller than the preset similarity, proceeding to the next step;
judging whether the output audio of the household appliance user is the audio of a specific family member by using a voiceprint recognition decision model;
the step of judging whether the output audio of the household appliance user is the audio of a specific family member by using the voiceprint recognition decision model comprises the following steps:
if the score of the voiceprint recognition decision result based on the SVM model is greater than a first preset score, directly identifying the audio as the audio of a specific family member; if the score is smaller than a second preset score, directly identifying the audio as the audio of a non-specific family member; and if the score is between the first preset score and the second preset score, proceeding to the next step;
and finally judging the output audio of the household appliance user by using a voiceprint recognition model based on the convolutional neural network, and judging whether the audio is the audio of a specific family member.
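The three-tier cascade of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the cosine-similarity template match, the SVM scoring function, the CNN verifier, and all three thresholds are hypothetical stand-ins supplied by the caller.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def identify_speaker(embedding, templates, svm_score_fn, cnn_verify_fn,
                     sim_threshold=0.85, score_hi=0.9, score_lo=0.3):
    """Return True if the audio embedding is judged to belong to a specific family member."""
    # Tier 1: cheap template matching -- accept immediately on a strong match.
    if any(cosine_similarity(embedding, t) > sim_threshold for t in templates):
        return True
    # Tier 2: SVM score with two thresholds; only the ambiguous middle band falls through.
    score = svm_score_fn(embedding)
    if score > score_hi:
        return True
    if score < score_lo:
        return False
    # Tier 3: the expensive CNN model settles the remaining ambiguous cases.
    return cnn_verify_fn(embedding)
```

Easily judged audio thus exits at tier 1 or tier 2, and only the ambiguous band between the two preset scores reaches the CNN, which matches the simple-to-complex resource argument in the description.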
2. The method as claimed in claim 1, wherein the user of the home appliance outputs a segment of audio, and the step of calculating the similarity between the segment of audio and the template audio comprises:
performing, for the segment of audio and the template audio: audio filtering, calculating the short-time energy of the audio signal, and extracting the valid portion of the audio signal;
and calculating the cosine distance between the section of audio and the template audio.
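The preprocessing in claim 2 can be sketched as below. The frame length and energy threshold are assumptions for illustration; the patent does not specify them.

```python
import math

def short_time_energy(samples, frame_len=256):
    """Short-time energy of each non-overlapping frame of the signal."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]

def trim_silence(samples, frame_len=256, energy_threshold=1e-3):
    """Keep only frames whose short-time energy exceeds the threshold,
    i.e. the valid (non-silent) portion of the signal."""
    kept = []
    for i, e in enumerate(short_time_energy(samples, frame_len)):
        if e > energy_threshold:
            kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

In practice the trimmed signals would first be encoded (e.g. by the I-Vector method of claim 4) before the cosine comparison; here the trimming and the similarity measure are shown separately.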
3. The method as claimed in claim 1, wherein the step of respectively recording multiple audio segments of a specific family member in a plurality of sound scenes comprises:
recording multiple segments of audio of a specific family member under one or more of the following conditions: high noise, multiple people speaking, and low speech volume;
and controlling the duration of each piece of audio to be within 5 seconds when the audio is recorded.
4. The method as claimed in claim 1, wherein the step of encoding the multiple segments of audio comprises:
and encoding the multiple segments of audio by using an I-Vector calculation method.
5. The method as claimed in claim 1, wherein the step of collecting the audios of a plurality of unspecified family members as negative training samples comprises:
more than 50 non-specific family member audios were collected as negative training samples.
6. A voiceprint recognition household appliance control device in a complex sound scene is characterized by comprising:
a recording module: respectively recording multiple segments of audio of specific family members in multiple sound scenes;
and an encoding module: encoding a plurality of pieces of audio;
a similarity calculation module: after encoding, calculating the similarity between every two audio segments of each family member, reserving the audio segments whose similarity is greater than a preset value, and taking all reserved audio segments as template audio;
a training module: all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model, wherein the machine learning model is an SVM model;
an identification module: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member;
a judgment module: if the similarity between the segment of audio and every template audio is smaller than the preset similarity, judging whether the output audio of the household appliance user is the audio of a specific family member by using the voiceprint recognition decision model;
the judging module comprises:
an identification submodule: if the score of the voiceprint recognition decision result based on the SVM model is greater than a first preset score, directly identifying the audio as the audio of a specific family member; if the score is smaller than a second preset score, directly identifying the audio as the audio of a non-specific family member; and if the score is between the first preset score and the second preset score, proceeding to the next step;
a final decision submodule: and finally judging the output audio of the household appliance user by using a voiceprint recognition model based on the convolutional neural network, and judging whether the audio is the audio of a specific family member.
7. An electronic device, comprising:
at least one processor, at least one memory, and a data bus; wherein:
the processor and the memory complete mutual communication through the data bus; the memory stores program instructions executable by the processor, the processor calling the program instructions to perform the method of any of claims 1 to 5.
8. A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211256541.1A CN115331673B (en) | 2022-10-14 | 2022-10-14 | Voiceprint recognition household appliance control method and device in complex sound scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115331673A CN115331673A (en) | 2022-11-11 |
CN115331673B true CN115331673B (en) | 2023-01-03 |
Family
ID=83913606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211256541.1A Active CN115331673B (en) | 2022-10-14 | 2022-10-14 | Voiceprint recognition household appliance control method and device in complex sound scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115331673B (en) |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9875743B2 (en) * | 2015-01-26 | 2018-01-23 | Verint Systems Ltd. | Acoustic signature building for a speaker from multiple sessions |
CN105895077A (en) * | 2015-11-15 | 2016-08-24 | 乐视移动智能信息技术(北京)有限公司 | Recording editing method and recording device |
CN106971737A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove spoken based on many people |
CN107766868A (en) * | 2016-08-15 | 2018-03-06 | 中国联合网络通信集团有限公司 | A kind of classifier training method and device |
CN108305615B (en) * | 2017-10-23 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Object identification method and device, storage medium and terminal thereof |
CN110164453A (en) * | 2019-05-24 | 2019-08-23 | 厦门快商通信息咨询有限公司 | A kind of method for recognizing sound-groove, terminal, server and the storage medium of multi-model fusion |
CN111785286A (en) * | 2020-05-22 | 2020-10-16 | 南京邮电大学 | Home CNN classification and feature matching combined voiceprint recognition method |
CN112230555A (en) * | 2020-10-12 | 2021-01-15 | 珠海格力电器股份有限公司 | Intelligent household equipment, control method and device thereof and storage medium |
CN112634869B (en) * | 2020-12-09 | 2023-05-26 | 鹏城实验室 | Command word recognition method, device and computer storage medium |
CN112351047B (en) * | 2021-01-07 | 2021-08-24 | 北京远鉴信息技术有限公司 | Double-engine based voiceprint identity authentication method, device, equipment and storage medium |
CN113241081B (en) * | 2021-04-25 | 2023-06-16 | 华南理工大学 | Far-field speaker authentication method and system based on gradient inversion layer |
CN114464193A (en) * | 2022-03-12 | 2022-05-10 | 云知声智能科技股份有限公司 | Voiceprint clustering method and device, storage medium and electronic device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
US20180158449A1 (en) | Method and device for waking up via speech based on artificial intelligence | |
US20180005628A1 (en) | Speech Recognition | |
CN109767765A (en) | Talk about art matching process and device, storage medium, computer equipment | |
CN111785288B (en) | Voice enhancement method, device, equipment and storage medium | |
CN105976812A (en) | Voice identification method and equipment thereof | |
Eyben et al. | Affect recognition in real-life acoustic conditions-a new perspective on feature selection | |
CN110544469B (en) | Training method and device of voice recognition model, storage medium and electronic device | |
US11133022B2 (en) | Method and device for audio recognition using sample audio and a voting matrix | |
CN111816170B (en) | Training of audio classification model and garbage audio recognition method and device | |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender | |
CN114127849A (en) | Speech emotion recognition method and device | |
CN111344717A (en) | Interactive behavior prediction method, intelligent device and computer-readable storage medium | |
CN117789699B (en) | Speech recognition method, device, electronic equipment and computer readable storage medium | |
CN113436633B (en) | Speaker recognition method, speaker recognition device, computer equipment and storage medium | |
CN115331673B (en) | Voiceprint recognition household appliance control method and device in complex sound scene | |
CN116959464A (en) | Training method of audio generation network, audio generation method and device | |
Bovbjerg et al. | Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions | |
CN115547345A (en) | Voiceprint recognition model training and related recognition method, electronic device and storage medium | |
CN115221351A (en) | Audio matching method and device, electronic equipment and computer-readable storage medium | |
CN112489678A (en) | Scene recognition method and device based on channel characteristics | |
CN114333840A (en) | Voice identification method and related device, electronic equipment and storage medium | |
Aung et al. | M-Diarization: A Myanmar Speaker Diarization using Multi-scale dynamic weights | |
CN113035230A (en) | Authentication model training method and device and electronic equipment | |
US20230377560A1 (en) | Speech tendency classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||