
CN111370029A - Voice data processing method and device, storage medium and electronic equipment

Info

Publication number: CN111370029A
Authority: CN (China)
Prior art keywords: voice data, pronunciation, value, user, level
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202010130126.6A
Other languages: Chinese (zh)
Inventor: 陈晓婕
Current Assignee: Beijing Yiyi Education Information Consulting Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Yiyi Education Information Consulting Co., Ltd.
Application filed by Beijing Yiyi Education Information Consulting Co., Ltd.
Priority application: CN202010130126.6A
Publication: CN111370029A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a voice data processing method and device, a storage medium, and an electronic device, wherein the method comprises the following steps: acquiring user voice data input by a user; determining the evaluation level of the user voice data according to the matching degree between the user voice data and standard voice data; and determining a target operation associated with the next stage of the current stage according to the evaluation level, and executing the target operation. With the voice data processing method and device, the storage medium, and the electronic device, the learning or training process of a user is divided into a plurality of stages, and the target operation associated with the next stage of the current stage can be executed based on the evaluation level; this enhances the relevance between the two stages and improves the user's learning efficiency or learning effect.

Description

Voice data processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing voice data, a storage medium, and an electronic device.
Background
With the development of internet technology, more and more application programs have appeared on the market, and their functions have become increasingly powerful; various language learning applications have been developed to improve users' spoken foreign language level. Spoken language plays a particularly important role in the process of learning a foreign language.
At present, a traditional application program generally just moves a real-world learning scene into the application program to realize online or offline learning; for example, the application program displays content to be read aloud to the user, and a spoken language score is obtained based on the user's vocalization. Although traditional application programs enable users to learn spoken language at any time and thus improve convenience, they are essentially the same as classroom teaching: they do not fully exploit the capabilities of an application program, and the user's learning efficiency is not further improved.
Disclosure of Invention
In order to solve the above problem, embodiments of the present invention provide a method and an apparatus for processing voice data, a storage medium, and an electronic device.
In a first aspect, an embodiment of the present invention provides a method for processing voice data, including:
acquiring user voice data input by a user, wherein the user voice data is related to standard voice data of the current stage;
determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data;
and determining a target operation associated with the next phase of the current phase according to the evaluation level, and executing the target operation.
On the basis of the foregoing embodiment, the determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data includes:
if the standard voice data contains standard keywords, determining a key value of the user voice data according to the degree of the user voice data hitting the standard keywords;
determining a corresponding integrity value according to the integrity degree of the user voice data and determining a corresponding pronunciation value according to the pronunciation condition of the user voice data by taking the standard voice data as a reference;
and determining the evaluation level of the user voice data according to the key value, the complete value and the pronunciation value.
On the basis of the above embodiment, the determining the evaluation level of the user voice data according to the key value, the integrity value and the pronunciation value includes:
presetting a first complete interval, a second complete interval, a first pronunciation interval, a second pronunciation interval and a third pronunciation interval;
when the key value indicates that the user voice data hits the standard key words, if the complete value belongs to the first complete interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level;
when the key value indicates that the user voice data hits the standard key words, if the complete value belongs to the first complete interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level;
when the key value indicates that the user voice data does not hit the standard key word, if the complete value belongs to the second complete interval, determining that the evaluation level of the user voice data is a low level;
when the key value indicates that the user voice data does not hit the standard key word, if the pronunciation value belongs to the third pronunciation interval, determining that the evaluation level of the user voice data is a middle level; and when the key value, the integrity value and the pronunciation value belong to other intervals, determining the evaluation level of the user voice data as a middle level.
On the basis of the foregoing embodiment, the determining an evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data further includes:
and if the standard voice data does not contain the standard key words, determining the evaluation level of the user voice data according to the complete value and the pronunciation value.
On the basis of the above embodiment, the determining the evaluation level of the user speech data according to the integrity value and the pronunciation value includes:
presetting a first complete interval, a second complete interval, a first pronunciation interval, a second pronunciation interval and a third pronunciation interval;
if the complete value belongs to the first complete interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level;
if the complete value belongs to the first complete interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level;
if the complete value belongs to the second complete interval, determining the evaluation level of the user voice data as a low level;
if the pronunciation value belongs to the third pronunciation interval, determining that the evaluation level of the user voice data is a middle level; and determining the evaluation level of the user voice data as a middle level when the integrity value and the pronunciation value belong to other intervals.
On the basis of the above embodiment, the determining, according to the evaluation level, a target operation associated with a stage next to the current stage includes:
when the evaluation grade is high grade, automatically jumping to the next stage as target operation;
when the evaluation grade is a medium-high grade, taking a display selection item as a target operation, wherein the selection item comprises repeating the current stage and/or jumping to the next stage;
and when the evaluation level is a medium level or a low level, repeating the current stage and forbidding jumping to the next stage as target operation.
On the basis of the above embodiment, the determining, according to the evaluation level, a target operation associated with a stage next to the current stage further includes:
when the evaluation level is a medium level or a low level, displaying text data corresponding to the standard voice data in a prompt area;
and in the current stage, if the number of times of the evaluation grade of the medium grade or the low grade is greater than a preset threshold value, displaying the selection item.
In a second aspect, an embodiment of the present invention further provides a device for processing voice data, including:
the acquisition module is used for acquiring user voice data input by a user, wherein the user voice data is related to standard voice data of the current stage;
the evaluation module is used for determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data;
and the association operation module is used for determining a target operation associated with the next phase of the current phase according to the evaluation level and executing the target operation.
In a third aspect, an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to perform any one of the foregoing voice data processing methods.
In a fourth aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing voice data according to any one of the foregoing embodiments.
In the solution provided by the first aspect of the embodiments of the present invention, the learning or training process of a user is divided into a plurality of stages, and the user voice data input by the user at each stage is matched against the corresponding standard voice data to determine the corresponding evaluation level, so that the target operation associated with the next stage of the current stage can be executed based on the evaluation level. When the user's evaluation level is high, the user can automatically jump to the next stage, which enhances the relevance between the two stages and improves the user's learning efficiency; when the evaluation level is low, the relevance between the two stages can be cut off and the user is forced to learn repeatedly in the current stage, which helps the user master the knowledge points and improves the learning effect.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for processing voice data according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a display of a display area of an intelligent terminal at a current stage in a method for processing voice data according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating determining an evaluation level of user voice data in a method for processing voice data according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram illustrating a speech data processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device for executing a processing method of voice data according to an embodiment of the present invention.
Detailed Description
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted", "connected", "secured", and the like are to be construed broadly and can, for example, denote fixed, detachable, or integral connections; mechanical or electrical connections; direct connections or indirect connections through intervening media; or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
Referring to fig. 1, a method for processing voice data according to an embodiment of the present invention includes:
step 101: and acquiring user voice data input by a user, wherein the user voice data is related to the standard voice data of the current stage.
In the embodiment of the invention, an actual spoken foreign language learning or training process is generally divided into a plurality of stages in time order; in this embodiment, a "stage" refers to a unit of the learning or training process, and may specifically be a dialog, a topic, etc. In each stage, corresponding standard voice data is preset and the user is guided to speak it; the standard voice data spoken by the user is the user voice data, which can be acquired through a pickup device such as a microphone on the intelligent terminal. For example, if the current learning or training process is a follow-up reading exercise, the intelligent terminal may play the standard voice data of the current stage (e.g., "It's sunny today"), and the user may input corresponding user voice data (e.g., "It's sunny today") to the intelligent terminal by repeating the standard voice data. Alternatively, if the current learning or training process is a dialog scenario, the intelligent terminal may play corresponding voice data in the current stage, for example the question "What's your name?" and the like; accordingly, the standard voice data corresponding to the question may be "My name is …". After hearing the question, the user replies aloud, so that the intelligent terminal can acquire the user voice data, such as "My name is …".
Step 102: and determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data.
In the embodiment of the invention, after the user voice data is acquired, it can be compared with the corresponding standard voice data, and whether the user voice data is qualified is further determined according to the matching degree between them, so that the user's pronunciation level can be scored and evaluated and the corresponding evaluation level determined. For example, by comparing the difference between the user voice data and the standard voice data, the integrity (Integrity) and pronunciation (Pronunciation) of the user voice data can be determined, and the evaluation level of the user voice data can be determined by using the integrity and the pronunciation as the matching degree between them. The integrity represents how completely the user answered, and the pronunciation represents the degree of similarity between the user's pronunciation and the standard voice data.
In this embodiment, the evaluation level is used to indicate the user's pronunciation level: the higher the evaluation level, the higher the user's pronunciation level. There are a plurality of evaluation levels; to distinguish users of different levels, at least three evaluation levels are used.
Step 103: and determining a target operation associated with a next stage of the current stage according to the evaluation level, and executing the target operation.
In the embodiment of the invention, corresponding target operation is set for each evaluation level in advance, and the target operation can be executed after the evaluation level of the current stage is determined. In this embodiment, the target operation is an operation associated with a next stage of the current stage, and it may be determined whether the next stage can be entered based on the target operation, that is, the operation associated with the next stage of the current stage can be executed according to the evaluation level of the current stage of the user, so that the association between the two stages (e.g., the current stage and the next stage) can be increased, and the learning efficiency of the user in the two stages is further improved.
Alternatively, the target operation may be "automatically jump to the next stage", "display a selection item including repeating the current stage and/or jumping to the next stage", "repeating the current stage and prohibiting jumping to the next stage", and the like. In this embodiment, the evaluation levels may at least include a high level, a medium level, and a low level; specifically, when the evaluation grade is high grade, automatically jumping to the next stage as target operation; when the evaluation grade is a medium-high grade, taking the display selection item as a target operation; and when the evaluation level is the middle level or the low level, repeating the current stage and forbidding jumping to the next stage as target operation.
For example, in this embodiment, the learning or training process of the user at this time is divided into a plurality of topics, each topic corresponds to one stage, and in the current stage, if the user completes the corresponding topic and the evaluation level is high, it indicates that the user is skilled in mastering the topic, at this time, the user can automatically jump to the next stage, that is, the user can automatically jump to the next topic for continuous learning, and the user does not need to perform any operation, so that the learning efficiency of the user can be improved. If the evaluation level of the user is a medium-high level, the user is shown to master the question to a certain extent, but the user is not particularly skilled, at this time, two options including 'repeat the current stage' and 'jump to the next stage' can be displayed in the display area of the intelligent terminal for the user to select, if the user selects 'repeat the current stage', the user can repeatedly complete the question of the current stage, and if the user selects 'jump to the next stage', the user can be shown the question of the next stage for the user to continue learning. If the evaluation level of the user is the middle level or the low level, the user does not master the question of the current stage, and the current stage is forcibly and repeatedly executed and the user is prohibited to directly jump to the next stage, namely the user is forced to repeat the question of the current stage, so that the user can master the knowledge through learning the knowledge which is not mastered for many times, and the learning effect can be improved.
Alternatively, when the evaluation level is the medium level or the low level, text data corresponding to the standard voice data may be displayed in the prompt area, so that the standard answer or model sentence is provided to the user in text form. In this embodiment, the prompt area is a part of the display area of the intelligent terminal, generally located in the lower half of the display area. Specifically, as shown in fig. 2, if the current standard voice data is "Good morning" and the user's evaluation level is low (for example, the medium level or the low level), the prompt area may display the corresponding text "Good morning" for the user's reference.
In addition, in the current stage, if the number of times the evaluation level is the medium level or the low level is greater than a preset threshold, the above-mentioned selection items may be displayed. In this embodiment, if the user's evaluation level remains low over multiple attempts (for example, two or three), the user no longer needs to be prohibited from entering the next stage; instead, by displaying the selection items, i.e., "repeat the current stage" and "jump to the next stage", the user can autonomously choose whether to enter the next stage. As shown in fig. 2, the key "re-recording" is the selection item related to "repeat the current stage", and the key "Next" is the selection item related to "jump to the next stage"; the user can select the corresponding item by operating the corresponding key, thereby executing the operation of repeating the current stage or jumping to the next stage. A minimal sketch of this dispatch logic follows.
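To make the control flow concrete, here is a minimal sketch of the target-operation dispatch described above. It is not taken from the patent: the level names, the operation strings, and the retry threshold of 2 are illustrative assumptions.

```python
from enum import Enum

class EvalLevel(Enum):
    HIGH = 3
    MEDIUM_HIGH = 2
    MEDIUM = 1
    LOW = 0

# Illustrative assumption: after this many medium/low attempts in the
# current stage, the selection items are shown instead of forcing
# another repetition.
RETRY_THRESHOLD = 2

def target_operation(level: EvalLevel, failed_attempts: int) -> str:
    """Map the evaluation level of the current stage to the target
    operation associated with the next stage."""
    if level is EvalLevel.HIGH:
        return "jump_to_next_stage"    # automatic jump, no user action
    if level is EvalLevel.MEDIUM_HIGH:
        return "show_options"          # "re-recording" / "Next"
    # Medium or low: the standard text is shown in the prompt area and,
    # unless the retry threshold has been exceeded, a repetition is forced.
    if failed_attempts > RETRY_THRESHOLD:
        return "show_options"
    return "repeat_current_stage"      # jumping ahead is prohibited

print(target_operation(EvalLevel.MEDIUM, failed_attempts=1))  # repeat_current_stage
print(target_operation(EvalLevel.MEDIUM, failed_attempts=3))  # show_options
```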
According to the processing method of the voice data, provided by the embodiment of the invention, the learning or training process of a user is divided into a plurality of stages, the user voice data input by the user in each stage is matched and compared with the corresponding marked voice data to determine the corresponding evaluation level, and then the target operation associated with the next stage of the current stage can be executed based on the evaluation level; when the evaluation level of the user is higher, the user can automatically jump to the next stage, the relevance between the two stages is enhanced, and the learning efficiency of the user is improved; when the evaluation level is low, the relevance between the two stages can be cut off, the user is forced to learn repeatedly in the current stage, the user is helped to master knowledge points, and the learning effect of the user is improved.
On the basis of the above-described embodiment, in the present embodiment, a keyword, that is, a standard keyword, may be set in the standard voice data, and the evaluation level may be determined based on whether and how well the user voice data hits the standard keyword. Specifically, referring to fig. 3, the step 102 "determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data" includes:
step 1021: and if the standard voice data contains the standard keywords, determining the key value of the user voice data according to the degree of hitting the standard keywords by the user voice data.
In the embodiment of the invention, standard keywords can be set for the standard voice data; the standard keywords are the knowledge points or vocabulary that the user needs to master with emphasis. For example, if the standard voice data is "Would you like drinking tea?", then "Would you like" can be a keyword, i.e., a standard keyword. Meanwhile, if the corresponding keywords exist in the user voice data, the user voice data can be considered to hit the standard keywords; evaluation scoring is performed according to the similarity between the keywords in the user voice data and the standard keywords, and the corresponding key value is determined. The higher the similarity between the keywords in the user voice data and the standard keywords, the higher the key value; in addition, a plurality of keywords can be set in the standard voice data, and the more keywords the user voice data hits, the higher the corresponding key value.
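The patent does not specify a formula for the key value; the sketch below is one plausible reading (an assumption, not the patent's method), scoring each standard keyword by how much of it is recovered verbatim in the user's transcript, so that more, and closer, keyword hits yield a higher key value.

```python
import difflib

def key_value(user_text: str, standard_keywords: list[str]) -> float:
    """Hypothetical key-value computation: score each standard keyword
    by the longest contiguous match found in the user's transcript, so
    the key value grows with the number and quality of keyword hits."""
    score = 0.0
    for kw in standard_keywords:
        matcher = difflib.SequenceMatcher(None, kw.lower(), user_text.lower())
        match = matcher.find_longest_match(0, len(kw), 0, len(user_text))
        score += match.size / len(kw)  # fraction of the keyword recovered
    return score

# A "hit" could then be declared when the key value exceeds a preset
# key value threshold (e.g., 0), as described below.
print(key_value("would you like some tea", ["would you like"]))  # 1.0
```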
Step 1022: and determining a corresponding integrity value according to the integrity of the user voice data and determining a corresponding pronunciation value according to the pronunciation condition of the user voice data by taking the standard voice data as a reference.
In the embodiment of the invention, the integrity value of the user voice data can be quantized through comparison and calculated on the conventional percentage scale; the integrity value corresponds to the degree of integrity: the more complete the user voice data, the higher the integrity value. Similarly, the more standard the user's pronunciation, the higher the pronunciation value. The integrity value and the pronunciation value can be determined by an existing scoring engine or pronunciation model.
Step 1023: and determining the evaluation level of the user voice data according to the key value, the integrity value and the pronunciation value.
In the embodiment of the invention, the key value, the complete value and the pronunciation value can all quantitatively evaluate the voice data of the user, and further determine the corresponding evaluation level. Wherein, a corresponding interval range can be set for each value, and the evaluation level is determined according to the interval range in which the value falls. Specifically, the step 1023 of determining the evaluation level of the user voice data according to the key value, the integrity value and the pronunciation value includes:
step A1: a first complete interval, a second complete interval, a first pronunciation interval, a second pronunciation interval and a third pronunciation interval are preset.
In the embodiment of the invention, a plurality of intervals are set in advance for the different values, and the evaluation level can be determined according to the interval into which each value falls. Meanwhile, for different application scenarios, the rating parameters can be adjusted quickly by uniformly changing the upper and lower limits of the intervals, so that the method quickly adapts to different application scenarios. For example, the topics used by lower-grade students are generally easier, and the interval corresponding to the high level can be reduced appropriately; correspondingly, the topics used by senior students are generally more difficult, and the interval corresponding to the high level can be expanded appropriately.
In this embodiment, two intervals are set for the integrity value, namely a first complete interval and a second complete interval; the first and second complete intervals do not completely cover the whole range of the integrity value, i.e., other intervals exist besides them. Meanwhile, the upper limit of the first complete interval is the upper limit of the whole integrity-value range, and the lower limit of the second complete interval is the lower limit of the whole integrity-value range. For example, if the whole range of the integrity value is [0,100], the upper limit of the first complete interval is 100 and the lower limit of the second complete interval is 0; for instance, the first complete interval may be [60,100] and the second complete interval may be [0,10], in which case another interval, namely (10,60), automatically exists; if the integrity value takes only integer values, this other interval may be written as [11,59]. Similarly, in the present embodiment, three intervals are set for the pronunciation value, namely a first pronunciation interval, a second pronunciation interval and a third pronunciation interval; the three pronunciation intervals need not completely cover the whole range of the pronunciation value, i.e., other intervals may exist. Meanwhile, the upper limit of the first pronunciation interval is the upper limit of the whole pronunciation-value range. For example, if the pronunciation value is an integer and its whole range is [0,8], the first pronunciation interval may be [6,8], the second pronunciation interval may be [4,5], and the third pronunciation interval may be [2,3], in which case another interval, namely [0,1], automatically exists. Those skilled in the art will understand that the first complete interval, the second complete interval, the first pronunciation interval, the second pronunciation interval and the third pronunciation interval can be set according to actual conditions, provided there is no intersection between the first and second complete intervals, and no intersection among the first, second and third pronunciation intervals.
Step A2: and when the key value indicates that the user voice data hits the standard keyword, if the complete value belongs to the first complete interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level.
Step A3: and when the key value indicates that the user voice data hits the standard keyword, if the complete value belongs to the first complete interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level.
Step A4: and when the key value indicates that the user voice data does not hit the standard keyword, if the complete value belongs to the second complete interval, determining the evaluation level of the user voice data to be a low level.
In the embodiment of the present invention, the key value may indicate whether the user voice data hits the standard keyword. Specifically, a key value threshold may be preset; if the key value is greater than the key value threshold, the user voice data hits the standard keyword, and otherwise it does not. The key value threshold may specifically be 0, 2, etc., and may be determined based on actual conditions. Specifically, if the user voice data hits the standard keyword, the integrity value belongs to the first complete interval, and the pronunciation value belongs to the first pronunciation interval, then both values fall within the interval ranges with higher numerical values, i.e., the integrity value and the pronunciation value are both high, and the evaluation level can be determined to be the high level.
Similarly, if the user speech data hits the standard keyword, and the integrity value belongs to the first integrity interval, and the pronunciation value belongs to the second pronunciation interval, it is indicated that the integrity value of the user speech data is higher, and the pronunciation value can also be considered to be higher, but since the pronunciation value falls into the second pronunciation interval with a smaller numerical value than the first pronunciation interval, the evaluation level can be determined to be a medium-high level.
If the user voice data does not hit the standard keywords, the standard keywords are most probably absent from the user voice data; meanwhile, if the integrity value belongs to the second complete interval, the integrity of the user voice data is poor (in this case the user's pronunciation is generally also poor, i.e., the pronunciation value is low), and the evaluation level of the user voice data is determined to be the lowest level, i.e., the low level.
Step A5: when the key value indicates that the user voice data does not hit the standard key words, if the pronunciation value belongs to the third pronunciation interval, determining the evaluation level of the user voice data as a middle level; and when the key value, the integrity value and the pronunciation value belong to other intervals, determining the evaluation level of the user voice data as a middle level.
In this embodiment, the evaluation levels from high to low are: the high level, the medium-high level, the medium level, and the low level. If the key value, the integrity value and the pronunciation value of the user voice data belong to other intervals, i.e., the evaluation level of the user voice data is neither high, medium-high, nor low, the evaluation level is set to the medium level. Meanwhile, the integrity value and the pronunciation value determined by the scoring engine have a certain correlation: when the integrity value is small, the pronunciation value is generally also small; correspondingly, if the pronunciation value is reasonably high, the integrity value is generally not small, i.e., there is no case where one of the two values is high while the other is low. Therefore, in the present embodiment, the user can be rated based on the second complete interval alone in step A4 and the third pronunciation interval alone in step A5. For example, if the second complete interval is [0,10] and the third pronunciation interval is [2,3], an integrity value in [0,10] can directly be rated low; correspondingly, a pronunciation value in [2,3] can directly be rated medium.
In addition, in the present embodiment, to simplify the determination of the evaluation level, the high level and the medium-high level share the same complete interval, namely the first complete interval. Meanwhile, because steps A2 and A3 completely cover the case where the user's pronunciation is good (i.e., the standard keyword is hit and both the integrity value and the pronunciation value are high), and step A4 completely covers the case where the user's pronunciation is poor (i.e., the standard keyword is not hit and both the integrity value and the pronunciation value are low), step A5 treats all remaining cases as the medium level, so that the user voice data can be evaluated comprehensively and accurately.
On the basis of the foregoing embodiment, referring to fig. 3, the step 102 "determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data" further includes:
step 1024: and if the standard voice data does not contain the standard key words, determining the evaluation level of the user voice data according to the complete value and the pronunciation value.
In the embodiment of the invention, if no standard keyword is set in the standard voice data, the evaluation level does not need to be determined based on the key value; that is, the evaluation level of the user voice data only needs to be determined according to the integrity value and the pronunciation value. Similar to the above-described embodiment, when the standard voice data contains no standard keyword, the evaluation level of the user voice data can still be determined by a process similar to steps A1-A5 above. Specifically, the step 1024 of determining the evaluation level of the user voice data according to the integrity value and the pronunciation value includes:
step B1: a first complete interval, a second complete interval, a first pronunciation interval, a second pronunciation interval and a third pronunciation interval are preset.
Step B2: and if the complete value belongs to the first complete interval and the pronunciation value belongs to the first pronunciation interval, determining the evaluation level of the user voice data as a high level.
Step B3: and if the complete value belongs to the first complete interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level.
Step B4: and if the integrity value belongs to the second integrity interval, determining the evaluation level of the user voice data as a low level.
Step B5: if the pronunciation value belongs to the third pronunciation interval, determining the evaluation level of the user voice data as a middle level; and when the complete value and the pronunciation value belong to other intervals, determining the evaluation level of the user voice data as a middle level.
In the embodiment of the present invention, step B1 may be identical to step A1; that is, when step A1 has been performed, it may directly serve as step B1, and step B1 need not be executed again. The remaining steps B2-B5 are similar to steps A2-A5 described above with the key value disregarded, and are not described in detail here.
In this embodiment, the process of determining the evaluation level is described in detail through a specific example. In this example, the integrity value I and the pronunciation value P of the user voice data can be determined; the key value K of the user voice data can be determined when a standard keyword exists, and the user voice data is considered to hit the standard keyword when K ≠ 0. Meanwhile, the first complete interval is [60,100], the second complete interval is [0,10], the first pronunciation interval is [6,8], the second pronunciation interval is [4,5], and the third pronunciation interval is [2,3].
Specifically, if the key value K exists and K ≠ 0, the evaluation level is high when I ∈ [60,100] and P ∈ [6,8]; if the key value K does not exist, the evaluation level is likewise high when I ∈ [60,100] and P ∈ [6,8].
If the key value K exists and K ≠ 0, the evaluation level is medium-high when I ∈ [60,100] and P ∈ [4,5]; if the key value K does not exist, the evaluation level is likewise medium-high when I ∈ [60,100] and P ∈ [4,5].
If the key value K exists but K = 0, the evaluation level is medium when P ∈ [2,3]; if the key value K does not exist, the evaluation level is likewise medium when P ∈ [2,3].
If the key value K exists but K = 0, the evaluation level is low when I ∈ [0,10]; if the key value K does not exist, the evaluation level is likewise low when I ∈ [0,10].
Otherwise the evaluation level is medium: for example, if the key value K exists and K ≠ 0, the level is medium when I < 60 or P < 6; if the key value K exists but K = 0, the level is medium when I > 10 and P > 1; if the key value K does not exist and I ∈ (10,60), the level is medium; all other remaining cases are likewise treated as medium and are not enumerated here.
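Pulling this worked example together, the following is a minimal sketch of the rating logic. The interval bounds are taken from the example above, but the function itself is an illustration, not the patent's reference implementation.

```python
from typing import Optional

def evaluation_level(I: float, P: float, K: Optional[float]) -> str:
    """Classify user voice data using the example intervals above:
    first complete interval [60,100], second complete interval [0,10],
    first/second/third pronunciation intervals [6,8], [4,5], [2,3].
    K is None when the standard voice data defines no standard keyword."""
    hit = K is not None and K != 0    # keyword set and hit (steps A2/A3)
    miss = K is not None and K == 0   # keyword set but missed (steps A4/A5)
    no_kw = K is None                 # no keyword set (steps B2-B5)

    if (hit or no_kw) and 60 <= I <= 100 and 6 <= P <= 8:
        return "high"
    if (hit or no_kw) and 60 <= I <= 100 and 4 <= P <= 5:
        return "medium-high"
    if (miss or no_kw) and 0 <= I <= 10:
        return "low"
    if (miss or no_kw) and 2 <= P <= 3:
        return "medium"
    return "medium"                   # catch-all case (steps A5/B5)

print(evaluation_level(I=85, P=7, K=1.0))   # high
print(evaluation_level(I=85, P=4, K=None))  # medium-high
print(evaluation_level(I=5,  P=1, K=0.0))   # low
print(evaluation_level(I=40, P=3, K=0.0))   # medium (third pronunciation interval)
```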
Optionally, after the evaluation level of each stage is determined, all stages of the learning or training process may be summarized to determine a total evaluation value over all stages. For example, a corresponding score may be set for each evaluation level, and the average of the scores of all stages may be taken as the total evaluation value; an example of this is sketched below. The total evaluation value may also be determined in other manners, which is not limited in this embodiment.
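A sketch of the averaging option mentioned here; the per-level scores are illustrative assumptions, as the patent does not fix them.

```python
# Illustrative per-level scores; any consistent assignment would do.
LEVEL_SCORES = {"high": 100, "medium-high": 80, "medium": 60, "low": 30}

def total_evaluation(stage_levels: list[str]) -> float:
    """Average the per-stage scores over all stages of the learning or
    training process to obtain the total evaluation value."""
    return sum(LEVEL_SCORES[level] for level in stage_levels) / len(stage_levels)

print(total_evaluation(["high", "medium-high", "high"]))  # 93.33...
```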
According to the voice data processing method provided by the embodiment of the invention, the learning or training process of a user is divided into a plurality of stages, the user voice data input by the user at each stage is matched against the corresponding standard voice data to determine the corresponding evaluation level, and the target operation associated with the next stage of the current stage can then be executed based on the evaluation level. When the user's evaluation level is high, the user can automatically jump to the next stage, which enhances the relevance between the two stages and improves the user's learning efficiency; when the evaluation level is low, the relevance between the two stages can be cut off and the user is forced to learn repeatedly in the current stage, which helps the user master the knowledge points and improves the learning effect. Setting standard keywords in the standard voice data lets the user focus on the knowledge points related to those keywords, further improving the learning effect; setting a plurality of intervals in advance for the different values of the user voice data allows the rating parameters to be adjusted quickly for different application scenarios. Finally, using the medium level as a catch-all condition allows the user voice data to be evaluated comprehensively and accurately.
The above describes in detail the flow of the processing method of voice data, which can also be implemented by a corresponding apparatus, and the structure and function of the apparatus are described in detail below.
Referring to fig. 4, a processing apparatus for voice data according to an embodiment of the present invention includes:
an obtaining module 41, configured to obtain user voice data input by a user, where the user voice data is related to standard voice data of a current stage;
the evaluation module 42 is configured to determine an evaluation level of the user voice data according to a matching degree between the user voice data and the standard voice data;
and an association operation module 43, configured to determine a target operation associated with a next stage of the current stage according to the evaluation level, and execute the target operation.
On the basis of the above embodiment, the evaluation module 42 includes:
a key value determining unit, configured to determine a key value of the user voice data according to a degree that the user voice data hits a standard keyword if the standard voice data contains the standard keyword;
the complete value and pronunciation value determining unit is used for determining a corresponding complete value according to the complete degree of the user voice data and determining a corresponding pronunciation value according to the pronunciation condition of the user voice data by taking the standard voice data as a reference;
and the first evaluation unit is used for determining the evaluation level of the user voice data according to the key value, the integrity value and the pronunciation value.
On the basis of the above embodiment, the determining, by the first evaluation unit, the evaluation level of the user speech data according to the key value, the integrity value, and the pronunciation value includes:
presetting a first complete interval, a second complete interval, a first pronunciation interval, a second pronunciation interval and a third pronunciation interval;
when the key value indicates that the user voice data hits the standard key words, if the complete value belongs to the first complete interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level;
when the key value indicates that the user voice data hits the standard key words, if the complete value belongs to the first complete interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level;
when the key value indicates that the user voice data does not hit the standard key word, if the complete value belongs to the second complete interval, determining that the evaluation level of the user voice data is a low level;
when the key value indicates that the user voice data does not hit the standard key word, if the pronunciation value belongs to the third pronunciation interval, determining that the evaluation level of the user voice data is a middle level; and when the key value, the integrity value and the pronunciation value belong to other intervals, determining the evaluation level of the user voice data as a middle level.
On the basis of the above embodiment, the evaluation module 42 further includes:
and the second evaluation unit is used for determining the evaluation level of the user voice data according to the complete value and the pronunciation value if the standard voice data does not contain the standard key words.
On the basis of the above embodiment, the determining, by the second evaluation unit, the evaluation level of the user speech data based on the integrated value and the pronunciation value includes:
presetting a first complete interval, a second complete interval, a first pronunciation interval, a second pronunciation interval and a third pronunciation interval;
if the complete value belongs to the first complete interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level;
if the complete value belongs to the first complete interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level;
if the complete value belongs to the second complete interval, determining the evaluation level of the user voice data as a low level;
if the pronunciation value belongs to the third pronunciation interval, determining that the evaluation level of the user voice data is a middle level; and determining the evaluation level of the user voice data as a middle level when the integrity value and the pronunciation value belong to other intervals.
On the basis of the above embodiment, the association operation module 43 determines, according to the evaluation level, a target operation associated with a next stage of the current stage, including:
when the evaluation grade is high grade, automatically jumping to the next stage as target operation;
when the evaluation grade is a medium-high grade, taking a display selection item as a target operation, wherein the selection item comprises repeating the current stage and/or jumping to the next stage;
and when the evaluation level is a medium level or a low level, repeating the current stage and forbidding jumping to the next stage as target operation.
On the basis of the above embodiment, the association operation module 43 determines the target operation associated with the next stage of the current stage according to the evaluation level, and further includes:
when the evaluation level is a medium level or a low level, displaying text data corresponding to the standard voice data in a prompt area;
and in the current stage, if the number of times of the evaluation grade of the medium grade or the low grade is greater than a preset threshold value, displaying the selection item.
According to the voice data processing device provided by the embodiment of the invention, the learning or training process of a user is divided into a plurality of stages, the user voice data input by the user at each stage is matched against the corresponding standard voice data to determine the corresponding evaluation level, and the target operation associated with the next stage of the current stage can then be executed based on the evaluation level. When the user's evaluation level is high, the user can automatically jump to the next stage, which enhances the relevance between the two stages and improves the user's learning efficiency; when the evaluation level is low, the relevance between the two stages can be cut off and the user is forced to learn repeatedly in the current stage, which helps the user master the knowledge points and improves the learning effect. Setting standard keywords in the standard voice data lets the user focus on the knowledge points related to those keywords, further improving the learning effect; setting a plurality of intervals in advance for the different values of the user voice data allows the rating parameters to be adjusted quickly for different application scenarios. Finally, using the medium level as a catch-all condition allows the user voice data to be evaluated comprehensively and accurately.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer-executable instructions, which include a program for executing the above-mentioned voice data processing method, and the computer-executable instructions can execute the method in any of the above-mentioned method embodiments.
The computer storage medium may be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic storage (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid state disk (SSD), etc.).
Fig. 5 shows a block diagram of an electronic device according to another embodiment of the present invention. The electronic device 1100 may be a host server with computing capabilities, a personal computer (PC), a portable computer, a terminal, or the like. The specific embodiment of the present invention does not limit the specific implementation of the electronic device.
The electronic device 1100 includes at least one processor 1110, a communications interface 1120, a memory 1130, and a bus 1140. The processor 1110, the communications interface 1120, and the memory 1130 communicate with each other via the bus 1140.
The communication interface 1120 is used for communicating with network elements including, for example, virtual machine management centers, shared storage, etc.
The processor 1110 is configured to execute programs. The processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The memory 1130 is used for storing executable instructions. The memory 1130 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory. The memory 1130 may also be a memory array. The memory 1130 may also be partitioned, and the blocks may be combined into virtual volumes according to certain rules. The instructions stored in the memory 1130 are executable by the processor 1110 to enable the processor 1110 to perform the method of processing voice data in any of the method embodiments described above.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the modifications or alternative embodiments within the technical scope of the present invention, and shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for processing voice data, comprising:
acquiring user voice data input by a user, wherein the user voice data is related to standard voice data of the current stage;
determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data;
and determining a target operation associated with the next phase of the current phase according to the evaluation level, and executing the target operation.
2. The method according to claim 1, wherein the determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data comprises:
if the standard voice data contains standard keywords, determining a key value of the user voice data according to the degree of the user voice data hitting the standard keywords;
determining a corresponding integrity value according to the integrity degree of the user voice data and determining a corresponding pronunciation value according to the pronunciation condition of the user voice data by taking the standard voice data as a reference;
and determining the evaluation level of the user voice data according to the key value, the complete value and the pronunciation value.
3. The method according to claim 2, wherein the determining the evaluation level of the user voice data according to the key value, the integrity value, and the pronunciation value comprises:
presetting a first integrity interval, a second integrity interval, a first pronunciation interval, a second pronunciation interval, and a third pronunciation interval;
when the key value indicates that the user voice data hits the standard keywords, if the integrity value belongs to the first integrity interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level;
when the key value indicates that the user voice data hits the standard keywords, if the integrity value belongs to the first integrity interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level;
when the key value indicates that the user voice data does not hit the standard keywords, if the integrity value belongs to the second integrity interval, determining that the evaluation level of the user voice data is a low level;
when the key value indicates that the user voice data does not hit the standard keywords, if the pronunciation value belongs to the third pronunciation interval, determining that the evaluation level of the user voice data is a medium level; and
when the key value, the integrity value, and the pronunciation value belong to other intervals, determining that the evaluation level of the user voice data is a medium level.
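
Claim 3 reads most naturally as a decision table. The sketch below encodes it; the interval boundaries are invented purely for illustration, since the claim only requires that the intervals be preset.

```python
# Decision sketch for claim 3; interval bounds are assumed values.
FIRST_INTEGRITY = (0.8, 1.0)
SECOND_INTEGRITY = (0.0, 0.4)
FIRST_PRON = (0.8, 1.0)
SECOND_PRON = (0.6, 0.8)
THIRD_PRON = (0.0, 0.3)

def within(x, interval):
    low, high = interval
    return low <= x <= high

def level_with_keywords(keywords_hit, integrity, pronunciation):
    if keywords_hit:
        if within(integrity, FIRST_INTEGRITY) and within(pronunciation, FIRST_PRON):
            return "high"
        if within(integrity, FIRST_INTEGRITY) and within(pronunciation, SECOND_PRON):
            return "medium-high"
    else:
        if within(integrity, SECOND_INTEGRITY):
            return "low"
        if within(pronunciation, THIRD_PRON):
            return "medium"
    return "medium"  # all other combinations fall through to medium

print(level_with_keywords(True, 0.9, 0.95))  # high
print(level_with_keywords(False, 0.3, 0.5))  # low
```
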
4. The method according to claim 2, wherein the determining the evaluation level of the user voice data according to the matching degree between the user voice data and the standard voice data further comprises:
if the standard voice data does not contain standard keywords, determining the evaluation level of the user voice data according to the integrity value and the pronunciation value.
5. The method according to claim 4, wherein the determining the evaluation level of the user voice data according to the integrity value and the pronunciation value comprises:
presetting a first integrity interval, a second integrity interval, a first pronunciation interval, a second pronunciation interval, and a third pronunciation interval;
if the integrity value belongs to the first integrity interval and the pronunciation value belongs to the first pronunciation interval, determining that the evaluation level of the user voice data is a high level;
if the integrity value belongs to the first integrity interval and the pronunciation value belongs to the second pronunciation interval, determining that the evaluation level of the user voice data is a medium-high level;
if the integrity value belongs to the second integrity interval, determining that the evaluation level of the user voice data is a low level;
if the pronunciation value belongs to the third pronunciation interval, determining that the evaluation level of the user voice data is a medium level; and
when the integrity value and the pronunciation value belong to other intervals, determining that the evaluation level of the user voice data is a medium level.
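
Claim 5 is the same decision table with the keyword gate removed. A self-contained sketch, again with assumed interval bounds:

```python
# Claim-5 sketch (standard voice data has no keywords); bounds assumed.
def level_without_keywords(integrity, pronunciation,
                           first_integrity=(0.8, 1.0),
                           second_integrity=(0.0, 0.4),
                           first_pron=(0.8, 1.0),
                           second_pron=(0.6, 0.8),
                           third_pron=(0.0, 0.3)):
    within = lambda x, iv: iv[0] <= x <= iv[1]
    if within(integrity, first_integrity) and within(pronunciation, first_pron):
        return "high"
    if within(integrity, first_integrity) and within(pronunciation, second_pron):
        return "medium-high"
    if within(integrity, second_integrity):
        return "low"
    if within(pronunciation, third_pron):
        return "medium"
    return "medium"  # all other combinations

print(level_without_keywords(0.9, 0.7))  # medium-high
```
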
6. The method according to claim 1, wherein the determining the target operation associated with the next stage of the current stage according to the evaluation level comprises:
when the evaluation level is a high level, taking automatically jumping to the next stage as the target operation;
when the evaluation level is a medium-high level, taking displaying a selection item as the target operation, wherein the selection item comprises repeating the current stage and/or jumping to the next stage; and
when the evaluation level is a medium level or a low level, taking repeating the current stage and forbidding jumping to the next stage as the target operation.
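
Claim 6's level-to-operation mapping can be sketched as follows; the concrete UI actions are stubbed with prints and are assumptions about presentation, not the disclosure.

```python
# Sketch of claim 6; the concrete UI actions are assumptions.
def execute_target_operation(evaluation_level):
    if evaluation_level == "high":
        print("auto-jump to the next stage")
    elif evaluation_level == "medium-high":
        # User chooses: repeat the current stage and/or jump ahead.
        print("show options: [repeat current stage] [jump to next stage]")
    else:  # "medium" or "low"
        print("repeat current stage; jumping to the next stage is disabled")

execute_target_operation("medium-high")
```
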
7. The method according to claim 6, wherein the determining the target operation associated with the next stage of the current stage according to the evaluation level further comprises:
when the evaluation level is a medium level or a low level, displaying text data corresponding to the standard voice data in a prompt area; and
in the current stage, if a number of times that the evaluation level is the medium level or the low level is greater than a preset threshold, displaying the selection item.
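
Claim 7 adds a remediation path on top of claim 6: show the text of the standard voice data as a prompt, and after repeated failures offer the selection item anyway. A minimal sketch with an assumed threshold value:

```python
# Sketch of claim 7; the threshold value and display calls are assumed.
PRESET_THRESHOLD = 3

def on_medium_or_low(standard_text, fail_count):
    # Show the text of the standard voice data in the prompt area.
    print(f"prompt area: {standard_text}")
    fail_count += 1
    if fail_count > PRESET_THRESHOLD:
        # Escape hatch: offer the selection item after repeated failures.
        print("show options: [repeat current stage] [jump to next stage]")
    return fail_count

fails = 0
for _ in range(4):
    fails = on_medium_or_low("I like red apples", fails)
```
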
8. An apparatus for processing voice data, comprising:
an acquisition module, configured to acquire user voice data input by a user, wherein the user voice data is related to standard voice data of a current stage;
an evaluation module, configured to determine an evaluation level of the user voice data according to a matching degree between the user voice data and the standard voice data; and
an association operation module, configured to determine a target operation associated with a next stage of the current stage according to the evaluation level, and to execute the target operation.
9. A computer storage medium storing computer-executable instructions for performing the method of processing voice data according to any one of claims 1 to 7.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing voice data according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010130126.6A CN111370029A (en) 2020-02-28 2020-02-28 Voice data processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111370029A 2020-07-03

Family

ID=71211606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010130126.6A Withdrawn CN111370029A (en) 2020-02-28 2020-02-28 Voice data processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111370029A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899576A (en) * 2020-07-23 2020-11-06 腾讯科技(深圳)有限公司 Control method and device for pronunciation test application, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040166480A1 (en) * 2003-02-14 2004-08-26 Sayling Wen Language learning system and method with a visualized pronunciation suggestion
CN108428382A (en) * 2018-02-14 2018-08-21 广东外语外贸大学 It is a kind of spoken to repeat methods of marking and system
CN108831503A (en) * 2018-06-07 2018-11-16 深圳习习网络科技有限公司 A kind of method and device for oral evaluation
CN109785698A (en) * 2017-11-13 2019-05-21 上海流利说信息技术有限公司 Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test
CN109817201A (en) * 2019-03-29 2019-05-28 北京金山安全软件有限公司 Language learning method and device, electronic equipment and readable storage medium
CN109872726A (en) * 2019-03-26 2019-06-11 北京儒博科技有限公司 Pronunciation evaluating method, device, electronic equipment and medium
CN110136497A (en) * 2018-02-02 2019-08-16 上海流利说信息技术有限公司 Data processing method and device for verbal learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200703