
CN115331673B - Voiceprint recognition household appliance control method and device in complex sound scene - Google Patents

Voiceprint recognition household appliance control method and device in complex sound scene

Info

Publication number
CN115331673B
CN115331673B (application number CN202211256541.1A)
Authority
CN
China
Prior art keywords
audio
voiceprint recognition
similarity
model
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211256541.1A
Other languages
Chinese (zh)
Other versions
CN115331673A (en)
Inventor
张林焘
吴昊
别荣芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Normal University filed Critical Beijing Normal University
Priority to CN202211256541.1A priority Critical patent/CN115331673B/en
Publication of CN115331673A publication Critical patent/CN115331673A/en
Application granted granted Critical
Publication of CN115331673B publication Critical patent/CN115331673B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Selective Calling Equipment (AREA)

Abstract

The invention provides a voiceprint recognition household appliance control method and device in a complex sound scene, and relates to the field of household appliance control. The template audio fully considers the various conditions of a complex sound scene, is highly representative, and lays a foundation for improving voiceprint recognition accuracy in a complex sound scene. A similarity detection model based on the template audio, a voiceprint recognition decision model based on an SVM model, and a voiceprint recognition model based on a convolutional neural network judge the audio in sequence, so that the voiceprint recognition accuracy is improved. The models progress from simple to complex: audio that is easy to judge obtains a result from the simple models, audio signals that are difficult to judge obtain a result from the complex model, and the consumption of computing resources is reduced.

Description

Voiceprint recognition household appliance control method and device in complex sound scene
Technical Field
The invention relates to the field of household appliance control, in particular to a voiceprint recognition household appliance control method and device in a complex sound scene.
Background
With the progress of science and technology, modern household appliances are increasingly widely used by consumers. As an important identity recognition technology, voiceprint recognition can identify family members, so that a household appliance accepts instructions only from specific family members and is not disturbed by the instructions of unrelated persons. Under normal conditions, common voiceprint recognition technology can guarantee high recognition accuracy, enabling specific family members to control household appliances accurately.
However, when a household appliance is controlled through voiceprint recognition, a complex sound scene is often present, and the recognition accuracy of the voiceprint recognition technology drops sharply. As the recognition accuracy falls, the application value of household appliances controlled by voiceprint recognition also falls significantly. Therefore, designing a voiceprint recognition household appliance control method that can guarantee recognition accuracy in a complex sound scene has very important application value.
Disclosure of Invention
In order to overcome the above problems or at least partially solve the above problems, embodiments of the present invention provide a method and an apparatus for controlling a voiceprint recognition appliance in a complex sound scene.
The embodiment of the invention is realized by the following steps:
in a first aspect, an embodiment of the present invention provides a voiceprint recognition household appliance control method in a complex sound scene, including:
respectively recording multiple sections of audio of specific family members in multiple sound scenes;
encoding a plurality of pieces of audio;
after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model;
when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member; if the similarity between the section of audio and any template audio is smaller than the preset similarity, carrying out the next step;
and judging whether the output audio of the household appliance user is the audio of the specific family member by using the voiceprint recognition decision model.
Based on the first aspect, in some embodiments of the invention, the machine learning model is an SVM model.
Based on the first aspect, in some embodiments of the present invention, the step of determining whether the output audio of the household appliance user is the audio of a specific family member by using a voiceprint recognition decision model includes:
if the score of the voiceprint recognition decision result based on the SVM model is larger than a first preset score, the audio is directly recognized as the audio of a specific family member; if the score is smaller than a second preset score, the audio is directly recognized as the audio of a non-specific family member; and if the score is between the first preset score and the second preset score, the next step is performed;
and finally judging the output audio of the household appliance user by using a voiceprint recognition model based on a convolutional neural network to determine whether the audio is the audio of a specific family member.
Based on the first aspect, in some embodiments of the present invention, the step of calculating the similarity between the audio segment and the template audio comprises:
performing, for the segment of audio and the template audio: audio filtering, calculating short-time energy of an audio signal and intercepting effective data of the audio signal;
and calculating the cosine distance between the section of audio and the template audio.
Based on the first aspect, in some embodiments of the present invention, the step of respectively recording multiple pieces of audio of a specific family member in multiple sound scenes includes:
recording multiple segments of audio of a specific family member under one or more conditions of high noise, multiple people speaking, and low volume;
and controlling the duration of each piece of audio to be within 5 seconds when the audio is recorded.
Based on the first aspect, in some embodiments of the present invention, the step of encoding the multiple pieces of audio includes:
and coding the multi-segment audio by using an I-Vector calculation method.
Based on the first aspect, in some embodiments of the invention, the step of collecting audio of a plurality of non-specific family members as negative training samples comprises:
collecting more than 50 audio recordings of non-specific family members as negative training samples.
In a second aspect, an embodiment of the present invention provides a home appliance control system for voiceprint recognition in a complex sound scene, including:
a recording module: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
the coding module: encoding a plurality of pieces of audio;
a calculate similarity module: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
a training module: all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model;
an identification module: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member;
a judging module: if the similarity between the audio and any template audio is smaller than the preset similarity, the voiceprint recognition decision model is used for judging whether the output audio of the household appliance user is the audio of the specific family member.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor, at least one memory, and a data bus; wherein:
the processor and the memory complete mutual communication through the data bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program, where the computer program causes a computer to execute the method described above.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
(1) The template audio fully considers various conditions in a complex sound scene, has better representativeness, and lays a foundation for improving the voiceprint recognition precision in the complex sound scene.
(2) The similarity detection model based on the template audio, the voiceprint recognition decision model based on the SVM model, and the voiceprint recognition model based on the convolutional neural network judge the audio in sequence, so that the voiceprint recognition accuracy is improved.
(3) Because the models progress from simple to complex, audio that is easy to judge obtains a result from the simple models, audio that is difficult to judge obtains a result from the complex model, and the consumption of computing resources is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of an embodiment of a voiceprint recognition home appliance control method in a complex sound scene according to the present invention;
FIG. 2 is a flowchart of an embodiment of a voiceprint recognition appliance control method in a complex sound scene according to the present invention;
fig. 3 is a block diagram illustrating a structure of a voiceprint recognition home appliance control apparatus in a complex sound scene according to an embodiment of the present invention;
fig. 4 is a block diagram of an electronic device according to an embodiment of the invention.
Reference numerals: 1. a recording module; 2. an encoding module; 3. a calculate similarity module; 4. a training module; 5. an identification module; 6. a judgment module; 7. a processor; 8. a memory; 9. a data bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The system embodiments are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and computer program products according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device, which may be a personal computer, a server, or a network device, to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the embodiments of the present invention, "a plurality" represents at least 2.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed" and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Examples
Referring to fig. 1, in a first aspect, an embodiment of the present invention provides a method for controlling a home appliance for voiceprint recognition in a complex sound scene, including:
s1: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
In this step, the plurality of sound scenes, namely complex scenes, cover various conditions such as high noise, multiple people speaking, and low volume, so that the conditions under which voiceprint recognition is used with the household appliance are included as comprehensively as possible. The audio duration can be set according to actual conditions; since voice control of a household appliance is usually brief, each segment of audio can be kept within 5 seconds. The template audio thus fully considers the various conditions of a complex sound scene, is highly representative, and lays a foundation for improving voiceprint recognition accuracy in a complex sound scene.
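Purely as an illustration of this enrollment step, the sketch below records one short clip per scene; the scene names, file layout, sample rate, and the use of the sounddevice and soundfile libraries are assumptions of this sketch and are not specified by the patent.

```python
# A minimal sketch of step S1 under stated assumptions: scene names, file
# naming, sample rate and the sounddevice/soundfile libraries are illustrative
# choices, not requirements of the patent.
import sounddevice as sd
import soundfile as sf

SCENES = ["high_noise", "multiple_speakers", "low_volume"]  # complex-scene conditions

def record_enrollment(member: str, seconds: int = 5, sr: int = 16000) -> None:
    """Record one clip (within 5 seconds) per sound scene for one family member."""
    for scene in SCENES:
        input(f"Press Enter to record '{member}' in scene '{scene}' ...")
        audio = sd.rec(int(seconds * sr), samplerate=sr, channels=1)
        sd.wait()  # block until the recording finishes
        sf.write(f"{member}_{scene}.wav", audio, sr)
```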
S2: encoding a plurality of pieces of audio;
In this step, the multiple segments of audio are encoded using an I-Vector calculation method. In practical applications, speaker information is mixed with various kinds of interference in the speech signal, and because different acquisition devices have different channels, channel interference is also mixed into the collected speech. Such interference perturbs the speaker information. The traditional GMM-UBM method cannot overcome this problem, so system performance is unstable. In the GMM-UBM framework, each target speaker is described by a GMM model. Because only the means are adapted from the UBM model to each speaker's GMM model, while the weights and covariances are left unchanged, most of the speaker information is contained in the GMM means. Besides most of the speaker information, the GMM mean vector also contains channel information. Joint Factor Analysis (JFA) can model speaker differences and channel differences separately, thereby compensating for channel differences and improving system performance. However, JFA requires a large amount of training corpora from different channels, which are difficult to obtain, and its calculation is complicated, so it is difficult to put into practical use. Dehak proposed a novel solution based on the I-Vector factor analysis technique: whereas JFA models the speaker difference space and the channel difference space separately, the I-Vector method models the total variability as a whole, which relaxes the requirements on the corpus, keeps the calculation simple, and achieves comparable performance. A minimal sketch of this encoding is given below.
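As a rough illustration of this encoding step, the sketch computes an i-vector as the posterior mean w = (I + Tᵀ Σ⁻¹ N T)⁻¹ Tᵀ Σ⁻¹ F from MFCC features. It assumes a diagonal-covariance UBM and a total-variability matrix T that were trained elsewhere (both hypothetical here), so it is a sketch of the general I-Vector technique rather than the patent's exact implementation.

```python
# A hedged sketch of i-vector encoding, assuming a pre-trained UBM with
# covariance_type='diag' (sklearn GaussianMixture) and a total-variability
# matrix T of shape (C*D, ivec_dim) trained elsewhere.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path: str, sr: int = 16000, n_mfcc: int = 20) -> np.ndarray:
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, D)

def baum_welch_stats(feats: np.ndarray, ubm: GaussianMixture):
    post = ubm.predict_proba(feats)                 # frame responsibilities, (frames, C)
    N = post.sum(axis=0)                            # zeroth-order statistics, (C,)
    F = post.T @ feats - N[:, None] * ubm.means_    # centred first-order statistics, (C, D)
    return N, F

def ivector(feats: np.ndarray, ubm: GaussianMixture, T: np.ndarray) -> np.ndarray:
    """Posterior mean w = (I + T' S^-1 N T)^-1 T' S^-1 F with diagonal covariances."""
    N, F = baum_welch_stats(feats, ubm)
    C, D = ubm.means_.shape
    sigma_inv = 1.0 / ubm.covariances_.reshape(C * D)       # diagonal covariances flattened
    TtS = T.T * sigma_inv                                   # (ivec_dim, C*D)
    L = np.eye(T.shape[1]) + (TtS * np.repeat(N, D)) @ T    # posterior precision matrix
    return np.linalg.solve(L, TtS @ F.reshape(C * D))       # the i-vector
```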
S3: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
In this step, calculating the similarity between every two audio segments of each family member comprises filtering both segments, calculating the short-time energy of each audio signal, intercepting the effective data of each audio signal, and then calculating the cosine distance between the two segments. The audio segments whose similarity is larger than the preset value are retained, and all retained audio is regarded as template audio; the preset value can be set reasonably according to actual requirements.
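For example, if the encoded clips are fixed-length vectors, the pairwise comparison and template selection of this step might reduce to the following sketch; the preset value of 0.75 is an illustrative placeholder rather than a value from the patent.

```python
# A sketch of the S3 template selection over encoded (e.g. i-vector) clips;
# the preset similarity of 0.75 is an assumed placeholder value.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_templates(encoded_clips: list, preset: float = 0.75) -> list:
    """Keep every clip whose similarity to at least one other clip exceeds the preset."""
    templates = []
    for i, vi in enumerate(encoded_clips):
        if any(cosine_similarity(vi, vj) > preset
               for j, vj in enumerate(encoded_clips) if j != i):
            templates.append(vi)
    return templates
```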
S4: taking all template audios as positive training samples, collecting audios of a plurality of non-specific family members as negative training samples, and training by using a machine learning model to obtain a voiceprint recognition decision model;
In this step, collecting the audio of a plurality of non-specific family members as negative training samples comprises collecting more than 50 audio recordings of non-specific family members as negative training samples. The machine learning model may be an SVM model.
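Training the decision model could then look like the scikit-learn sketch below; the RBF kernel and the probability output are assumptions of this sketch, chosen so that the later cascade can threshold a score, and are not hyperparameters disclosed in the patent.

```python
# A minimal sketch of S4: train an SVM on template (positive) versus
# non-specific-member (negative) encodings; kernel and settings are assumptions.
import numpy as np
from sklearn.svm import SVC

def train_decision_model(template_vecs, negative_vecs) -> SVC:
    X = np.vstack([template_vecs, negative_vecs])
    y = np.concatenate([np.ones(len(template_vecs)), np.zeros(len(negative_vecs))])
    model = SVC(kernel="rbf", probability=True)   # probability score feeds the cascade
    model.fit(X, y)
    return model
```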
S5: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member; if the similarity between the section of audio and any template audio is smaller than the preset similarity, performing the next step;
In this step, the similarity between the segment of audio and the template audio may be calculated using a similarity detection model based on the template audio. Calculating this similarity comprises performing, for both the segment of audio and the template audio, audio filtering, calculation of the short-time energy of the audio signal, and interception of the effective data of the audio signal, and then calculating the cosine distance between the segment of audio and the template audio.
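One plausible realization of the filtering, short-time energy, and effective-data steps is sketched below; the 80-4000 Hz band, frame sizes, and energy threshold are assumptions rather than values given in the patent.

```python
# A sketch of the S5 preprocessing: band-pass filtering, short-time energy,
# and interception of the high-energy (effective) frames. All numeric values
# here are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

def effective_data(signal: np.ndarray, sr: int = 16000,
                   frame_len: int = 400, hop: int = 160,
                   energy_ratio: float = 0.1) -> np.ndarray:
    sos = butter(4, [80, 4000], btype="bandpass", fs=sr, output="sos")
    filtered = sosfilt(sos, signal)                              # audio filtering
    frames = [filtered[i:i + frame_len]
              for i in range(0, len(filtered) - frame_len + 1, hop)]
    energy = np.array([float(np.sum(f * f)) for f in frames])    # short-time energy
    keep = energy > energy_ratio * energy.max()                  # effective frames only
    return np.concatenate([f for f, k in zip(frames, keep) if k])
```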
S6: and judging whether the output audio of the household appliance user is the audio of the specific family member by using the voiceprint recognition decision model.
The similarity detection model based on the template audio and the voiceprint recognition decision model based on the SVM model judge the audio in sequence, so that the voiceprint recognition accuracy is improved. The models progress from simple to complex: audio that is easy to judge obtains a result from the simple model, audio signals that are difficult to judge obtain a result from the complex model, and the consumption of computing resources is reduced.
Based on the first aspect, in some embodiments of the present invention, the step of determining whether the output audio of the household appliance user is the audio of a specific family member by using the voiceprint recognition decision model includes:
Referring to fig. 2, S61: if the score of the voiceprint recognition decision result based on the SVM model is larger than a first preset score, the audio is directly recognized as the audio of a specific family member; if the score is smaller than a second preset score, the audio is directly recognized as the audio of a non-specific family member; and if the score is between the first preset score and the second preset score, the next step is performed;
S62: finally judging the output audio of the household appliance user by using a voiceprint recognition model based on a convolutional neural network, to determine whether it is the audio of a specific family member. A minimal sketch of such a network is given below.
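For the final stage, a small convolutional network such as the PyTorch sketch below could serve as the voiceprint recognition model; the spectrogram input and layer sizes are illustrative assumptions, not the architecture disclosed in the patent.

```python
# A minimal PyTorch sketch of the final-stage CNN classifier; the input is
# assumed to be a (batch, 1, n_mels, n_frames) spectrogram and the layer
# widths are illustrative, not taken from the patent.
import torch
import torch.nn as nn

class VoiceprintCNN(nn.Module):
    def __init__(self, n_classes: int = 2):          # specific member vs. not
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))
```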
The similarity detection model based on the template audio, the voiceprint recognition decision model based on the SVM model, and the voiceprint recognition model based on the convolutional neural network judge the audio in sequence, so that the voiceprint recognition accuracy is improved. Because the models progress from simple to complex, audio that is easy to judge obtains a result from the simple models, audio that is difficult to judge obtains a result from the complex model, and the consumption of computing resources is reduced. The complete decision flow is sketched below.
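Putting the three stages together, the decision flow from S5 through S62 might look like the following sketch; the preset similarity and the two preset scores are placeholder values, and cnn_predict stands in for whatever inference wrapper the trained CNN uses (a hypothetical helper, not part of the patent).

```python
# A hedged sketch of the three-stage cascade (template similarity -> SVM score
# -> CNN); the thresholds 0.8 / 0.9 / 0.1 are assumed placeholders.
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_specific_member(audio_vec, templates, svm, cnn_predict,
                       sim_preset=0.8, score_hi=0.9, score_lo=0.1) -> bool:
    # Stage 1: similarity to any template audio
    if any(_cos(audio_vec, t) > sim_preset for t in templates):
        return True
    # Stage 2: SVM decision score with two preset thresholds
    score = svm.predict_proba([audio_vec])[0, 1]
    if score > score_hi:
        return True
    if score < score_lo:
        return False
    # Stage 3: CNN voiceprint model resolves the ambiguous middle band
    return bool(cnn_predict(audio_vec))
```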
Referring to fig. 3, in a second aspect, an embodiment of the present invention provides a voiceprint recognition home appliance control system in a complex sound scene, including:
the recording module 1: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
and the coding module 2: encoding a plurality of pieces of audio;
calculate similarity module 3: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
the training module 4: all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model;
the identification module 5: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member;
and a judging module 6: if the similarity between the audio and any template audio is smaller than the preset similarity, the voiceprint recognition decision model is used for judging whether the output audio of the household appliance user is the audio of the specific family member.
For the specific implementation of the apparatus, please refer to the implementation of the method, and redundant description is omitted here.
Referring to fig. 4, in a third aspect, an embodiment of the invention provides an electronic device, including:
at least one processor 7, at least one memory 8 and a data bus 9; wherein:
the processor 7 and the memory 8 complete communication with each other through the data bus 9; the memory 8 stores program instructions executable by the processor 7, and the processor 7 calls the program instructions to perform the method. For example, the above steps S1-S6 are performed.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program, where the computer program causes a computer to execute the method described above. For example, the above steps S1-S6 are performed.
In conclusion, the invention provides a voiceprint recognition household appliance control method in a complex sound scene. The template audio fully considers the various conditions of a complex sound scene, is highly representative, and lays a foundation for improving voiceprint recognition accuracy in a complex sound scene. The similarity detection model based on the template audio, the voiceprint recognition decision model based on the SVM model, and the voiceprint recognition model based on the convolutional neural network judge the audio in sequence, so that the voiceprint recognition accuracy is improved. The models progress from simple to complex: audio that is easy to judge obtains a result from the simple models, audio signals that are difficult to judge obtain a result from the complex model, and the consumption of computing resources is reduced.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (8)

1. A voiceprint recognition household appliance control method in a complex sound scene is characterized by comprising the following steps:
respectively recording multiple sections of audio of specific family members in multiple sound scenes;
encoding a plurality of pieces of audio;
after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
taking all template audios as positive training samples, collecting audios of a plurality of non-specific family members as negative training samples, and training by using a machine learning model to obtain a voiceprint recognition decision model, wherein the machine learning model is an SVM model;
when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member; if the similarity between the section of audio and any template audio is smaller than the preset similarity, performing the next step;
judging whether the output audio of the household appliance user is the audio of a specific family member by using a voiceprint recognition decision model;
the step of judging whether the output audio of the household appliance user is the audio of a specific family member by using the voiceprint recognition decision model comprises the following steps:
if the score of the voiceprint recognition decision result based on the SVM model is larger than a first preset score, directly recognizing the audio as the audio of a specific family member; if the score is smaller than a second preset score, directly recognizing the audio as the audio of a non-specific family member; and if the score is between the first preset score and the second preset score, performing the next step;
and finally judging the output audio of the household appliance user by using a voiceprint recognition model based on a convolutional neural network to determine whether the audio is the audio of a specific family member.
2. The method as claimed in claim 1, wherein the user of the home appliance outputs a segment of audio, and the step of calculating the similarity between the segment of audio and the template audio comprises:
performing, for the segment of audio and the template audio: audio filtering, calculating short-time energy of an audio signal and intercepting effective data of the audio signal;
and calculating the cosine distance between the section of audio and the template audio.
3. The method as claimed in claim 1, wherein the step of respectively recording multiple audio segments of a specific family member in a plurality of sound scenes comprises:
recording multiple segments of audio of a specific family member under one or more conditions of high noise, multiple people speaking, and low volume;
and controlling the duration of each piece of audio to be within 5 seconds when the audio is recorded.
4. The method as claimed in claim 1, wherein the step of encoding the multiple segments of audio comprises:
and coding the multi-segment audio by using an I-Vector calculation method.
5. The method as claimed in claim 1, wherein the step of collecting the audio of a plurality of non-specific family members as negative training samples comprises:
collecting more than 50 audio recordings of non-specific family members as negative training samples.
6. A voiceprint recognition household appliance control device in a complex sound scene is characterized by comprising:
a recording module: respectively recording multiple sections of audio of specific family members in multiple sound scenes;
and an encoding module: encoding a plurality of pieces of audio;
a calculate similarity module: after encoding, calculating the similarity between every two audios of each family member, reserving a section of audio with the similarity larger than a preset value, and regarding all reserved audio as template audio;
a training module: all template audios are used as positive training samples, audios of a plurality of non-specific family members are collected and used as negative training samples, and a machine learning model is used for training to obtain a voiceprint recognition decision model, wherein the machine learning model is an SVM model;
an identification module: when the household appliance user outputs a section of audio, calculating the similarity between the section of audio and the template audio, and if the similarity between the section of audio and any template audio is greater than the preset similarity, directly identifying the audio as the audio of a specific family member;
a judgment module: if the similarity between the section of audio and any template audio is smaller than the preset similarity, judging whether the output audio of the household appliance user is the audio of a specific family member by using a voiceprint recognition decision model;
the judging module comprises:
an identification submodule: if the score of the voiceprint recognition decision result based on the SVM model is larger than a first preset score, directly recognizing the audio as the audio of a specific family member; if the score is smaller than a second preset score, directly recognizing the audio as the audio of a non-specific family member; and if the score is between the first preset score and the second preset score, performing the next step;
a final decision submodule: finally judging the output audio of the household appliance user by using a voiceprint recognition model based on a convolutional neural network to determine whether the audio is the audio of a specific family member.
7. An electronic device, comprising:
at least one processor, at least one memory, and a data bus; wherein:
the processor and the memory complete mutual communication through the data bus; the memory stores program instructions executable by the processor, the processor calling the program instructions to perform the method of any of claims 1 to 5.
8. A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the method according to any one of claims 1 to 5.
CN202211256541.1A 2022-10-14 2022-10-14 Voiceprint recognition household appliance control method and device in complex sound scene Active CN115331673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211256541.1A CN115331673B (en) 2022-10-14 2022-10-14 Voiceprint recognition household appliance control method and device in complex sound scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211256541.1A CN115331673B (en) 2022-10-14 2022-10-14 Voiceprint recognition household appliance control method and device in complex sound scene

Publications (2)

Publication Number Publication Date
CN115331673A CN115331673A (en) 2022-11-11
CN115331673B true CN115331673B (en) 2023-01-03

Family

ID=83913606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211256541.1A Active CN115331673B (en) 2022-10-14 2022-10-14 Voiceprint recognition household appliance control method and device in complex sound scene

Country Status (1)

Country Link
CN (1) CN115331673B (en)

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9875743B2 (en) * 2015-01-26 2018-01-23 Verint Systems Ltd. Acoustic signature building for a speaker from multiple sessions
CN105895077A (en) * 2015-11-15 2016-08-24 乐视移动智能信息技术(北京)有限公司 Recording editing method and recording device
CN106971737A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method for recognizing sound-groove spoken based on many people
CN107766868A (en) * 2016-08-15 2018-03-06 中国联合网络通信集团有限公司 A kind of classifier training method and device
CN108305615B (en) * 2017-10-23 2020-06-16 腾讯科技(深圳)有限公司 Object identification method and device, storage medium and terminal thereof
CN110164453A (en) * 2019-05-24 2019-08-23 厦门快商通信息咨询有限公司 A kind of method for recognizing sound-groove, terminal, server and the storage medium of multi-model fusion
CN111785286A (en) * 2020-05-22 2020-10-16 南京邮电大学 Home CNN classification and feature matching combined voiceprint recognition method
CN112230555A (en) * 2020-10-12 2021-01-15 珠海格力电器股份有限公司 Intelligent household equipment, control method and device thereof and storage medium
CN112634869B (en) * 2020-12-09 2023-05-26 鹏城实验室 Command word recognition method, device and computer storage medium
CN112351047B (en) * 2021-01-07 2021-08-24 北京远鉴信息技术有限公司 Double-engine based voiceprint identity authentication method, device, equipment and storage medium
CN113241081B (en) * 2021-04-25 2023-06-16 华南理工大学 Far-field speaker authentication method and system based on gradient inversion layer
CN114464193A (en) * 2022-03-12 2022-05-10 云知声智能科技股份有限公司 Voiceprint clustering method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN115331673A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
US20180158449A1 (en) Method and device for waking up via speech based on artificial intelligence
US20180005628A1 (en) Speech Recognition
CN109767765A (en) Talk about art matching process and device, storage medium, computer equipment
CN111785288B (en) Voice enhancement method, device, equipment and storage medium
CN105976812A (en) Voice identification method and equipment thereof
Eyben et al. Affect recognition in real-life acoustic conditions-a new perspective on feature selection
CN110544469B (en) Training method and device of voice recognition model, storage medium and electronic device
US11133022B2 (en) Method and device for audio recognition using sample audio and a voting matrix
CN111816170B (en) Training of audio classification model and garbage audio recognition method and device
CN110136726A (en) A kind of estimation method, device, system and the storage medium of voice gender
CN114127849A (en) Speech emotion recognition method and device
CN111344717A (en) Interactive behavior prediction method, intelligent device and computer-readable storage medium
CN117789699B (en) Speech recognition method, device, electronic equipment and computer readable storage medium
CN113436633B (en) Speaker recognition method, speaker recognition device, computer equipment and storage medium
CN115331673B (en) Voiceprint recognition household appliance control method and device in complex sound scene
CN116959464A (en) Training method of audio generation network, audio generation method and device
Bovbjerg et al. Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
CN115547345A (en) Voiceprint recognition model training and related recognition method, electronic device and storage medium
CN115221351A (en) Audio matching method and device, electronic equipment and computer-readable storage medium
CN112489678A (en) Scene recognition method and device based on channel characteristics
CN114333840A (en) Voice identification method and related device, electronic equipment and storage medium
Aung et al. M-Diarization: A Myanmar Speaker Diarization using Multi-scale dynamic weights
CN113035230A (en) Authentication model training method and device and electronic equipment
US20230377560A1 (en) Speech tendency classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant