WO2021179445A1 - Conversation state prediction-based multi-round conversation method, device, and computer apparatus - Google Patents
- Publication number: WO2021179445A1 (international application PCT/CN2020/093426)
- Authority: WO, WIPO (PCT)
- Prior art keywords: round, dialogue, state, text, preset
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This application relates to the field of artificial intelligence, in particular to a multi-round dialogue method, device, computer equipment and storage medium based on dialogue state prediction.
- Multi-round dialogue technology is used to realize rapid information interaction between humans and computers.
- The multi-round dialogue system includes modules such as speech recognition, language understanding, dialogue state maintenance, action candidate ranking, language generation, and speech synthesis.
- The answer logic is mainly embodied in the dialogue state maintenance module, which, after receiving the output of the language understanding module, determines which state the system should jump to.
- The inventor realized that the dialogue state maintenance module is generally configured with manual rules, but a module based on manual rules has no generalization ability: when the user inputs special information for which no manual rule has been set, the entire multi-round dialogue is interrupted. Traditional multi-round dialogue schemes therefore generalize poorly, and their smooth operation cannot be guaranteed.
- The main purpose of this application is to provide a multi-round dialogue method, device, computer equipment, and storage medium based on dialogue state prediction, aiming to improve the generalization ability of the multi-round dialogue scheme and ensure its fluency.
- To this end, this application proposes a multi-round dialogue method based on dialogue state prediction, which includes the following steps:
- performing voice recognition processing on the (i+1)-th round of speech according to a preset voice recognition method, so as to obtain the (i+1)-th round text;
- if the (i+1)-th round text does not trigger a preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the 1st round text, ..., the i-th round text;
- using a preset voice output device to output the (i+1)-th round reply voice.
- This application provides a multi-round dialogue device based on dialogue state prediction, including:
- an (i+1)-th round speech acquisition unit configured to acquire the (i+1)-th round of speech input by the user after i rounds of dialogue with the user, where i is an integer greater than 1;
- an (i+1)-th round text acquisition unit configured to perform voice recognition processing on the (i+1)-th round of speech according to a preset voice recognition method, so as to obtain the (i+1)-th round text;
- a dialogue state generation condition judging unit configured to judge whether the (i+1)-th round text triggers a preset dialogue state generation condition;
- a predicted dialogue state acquisition unit configured to, if the (i+1)-th round text does not trigger a preset dialogue state generation condition, use p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the 1st round text, ..., the i-th round text;
- a predicted dialogue state judging unit configured to determine whether the p predicted dialogue states are the same;
- an (i+1)-th round reply voice acquisition unit configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state and obtain the (i+1)-th round reply voice according to the preset correspondence between dialogue states and reply voices;
- an (i+1)-th round reply voice output unit configured to use a preset voice output device to output the (i+1)-th round reply voice.
- The present application further provides a computer device including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements the steps of the above multi-round dialogue method based on dialogue state prediction.
- This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above multi-round dialogue method based on dialogue state prediction.
- The multi-round dialogue method, device, computer equipment, and storage medium based on dialogue state prediction of the present application acquire the (i+1)-th round of speech input by the user after i rounds of dialogue with the user; perform voice recognition processing on it to obtain the (i+1)-th round text; determine whether the (i+1)-th round text triggers a preset dialogue state generation condition; if it does not, use p preset dialogue state prediction tools to predict the dialogue state, thereby obtaining p predicted dialogue states; determine whether the p predicted dialogue states are the same; if they are, update the current state of the multi-round dialogue to the predicted dialogue state and obtain the (i+1)-th round reply voice according to the preset correspondence between dialogue states and reply voices; and use the preset voice output device to output the (i+1)-th round reply voice.
- In this way, the generalization ability of the multi-round dialogue scheme is improved and its fluency is ensured. Integrating p dialogue state prediction tools improves the accuracy of prediction, and predicting the dialogue state from the preceding information bases the analysis on the multi-round dialogue as a whole, so that the data is more sufficient and the analysis result more accurate. Together, these make the data analysis more thorough, the scheme more adaptable (that is, its generalization ability is improved), and the dialogue more fluent.
- FIG. 1 is a schematic flowchart of a multi-round dialogue method based on dialogue state prediction according to an embodiment of this application;
- FIG. 2 is a schematic block diagram of the structure of a multi-round dialogue device based on dialogue state prediction according to an embodiment of the application;
- FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
- An embodiment of the present application provides a multi-round dialogue method based on dialogue state prediction, including the following steps:
- S2. Performing voice recognition processing on the (i+1)-th round of speech, so as to obtain the (i+1)-th round text;
- S4. If the (i+1)-th round text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the 1st round text, ..., the i-th round text;
- In step S1, after i rounds of dialogue with the user, the (i+1)-th round of speech input by the user is acquired, where i is an integer greater than 1.
- This application is applied in multi-round dialogue, so it is implemented after the first round of dialogue; that is, the (i+1)-th round of speech input by the user is acquired, where i is an integer greater than 1.
- In step S2, voice recognition processing is performed on the (i+1)-th round of speech, so as to obtain the (i+1)-th round text.
- The speech recognition method may be any feasible method; for example, an open-source speech recognition tool may be used to convert speech into text.
- The open-source speech recognition tool is, for example, Google's open-source Live Transcribe speech-to-text tool.
- In step S3, it is determined whether the (i+1)-th round text triggers a preset dialogue state generation condition.
- The dialogue state generation conditions may be pre-recorded in a preset configuration file, such as a JSON configuration file, where the trigger condition corresponds to the "trigger" part of the JSON.
- In step S4, if the (i+1)-th round text does not trigger the preset dialogue state generation condition, p preset dialogue state prediction tools are used to perform dialogue state prediction based on the (i+1)-th round text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the 1st round text, ..., the i-th round text.
- The dialogue state prediction tool may be any feasible tool, for example, a tool obtained by training a neural network model, or a tool based on an external knowledge base.
- The p dialogue state prediction tools are used to bridge the breakpoint, that is, to predict the dialogue state so that the multi-round dialogue can be maintained.
- If the (i+1)-th round text does not trigger the preset dialogue state generation condition and no state is predicted, the multi-round dialogue is forced to end or to restart, which is not conducive to its smooth operation.
- The so-called dialogue state is a data structure containing the dialogue history from time 0 to time t (for example, the current time).
- The predicted dialogue state is, for example, M1-M2-M3, where M1-M2 is the dialogue history (that is, two rounds of dialogue have occurred, including data such as the user's inputs and the replies), and M3 is the new part of the newly predicted dialogue state.
- The dialogue state may also carry fluency and quality labels, such as smooth or unsmooth, or good, excellent, or poor dialogue quality, which makes the data more accurate and is more conducive to accurate dialogue state prediction.
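As an illustration only, such a dialogue state could be modeled as a small Python dataclass whose `history` list holds the state chain (M1-M2-...) and whose optional `fluency` and `quality` fields hold the labels. The class, field, and method names are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogueState:
    """Hypothetical dialogue state: the chain of state nodes from time 0 to
    time t, plus optional fluency and quality labels."""
    history: List[str] = field(default_factory=list)  # e.g. ["M1", "M2"]
    fluency: Optional[str] = None  # e.g. "smooth" or "unsmooth"
    quality: Optional[str] = None  # e.g. "excellent", "good", or "poor"

    def extend(self, new_node: str) -> "DialogueState":
        """Return a new state whose chain is the history plus the newly
        predicted part (M1-M2 -> M1-M2-M3); labels start unset."""
        return DialogueState(history=self.history + [new_node])

    def chain(self) -> str:
        return "-".join(self.history)


current = DialogueState(history=["M1", "M2"], fluency="smooth")
predicted = current.extend("M3")
print(predicted.chain())  # M1-M2-M3
```

Returning a new object instead of mutating `current` keeps the confirmed history separate from a prediction that may still be rejected.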
- In step S5, it is determined whether the p predicted dialogue states are the same. If they are, all dialogue state prediction tools have predicted the same dialogue state, so this predicted dialogue state is taken as the final dialogue state; that is, the current state of the multi-round dialogue should be updated to it.
- In step S6, if the p predicted dialogue states are the same, the current state of the multi-round dialogue is updated to the predicted dialogue state, and the (i+1)-th round reply voice is obtained according to the preset correspondence between dialogue states and reply voices.
- Once the current state of the multi-round dialogue has been updated to the predicted dialogue state, the computer has understood the (i+1)-th round of speech input by the user and should therefore output the corresponding reply voice.
- Because this application presets the correspondence between dialogue states and reply voices, the (i+1)-th round reply voice can be obtained accurately.
- The preset voice output device is then used to output the (i+1)-th round reply voice.
- The voice output device is, for example, a speaker or a sound box.
- Outputting the (i+1)-th round reply voice maintains the multi-round dialogue and gives the user the opportunity to conduct the (i+2)-th round of dialogue.
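The control flow of one round (steps S2 to S6 plus the reply output) can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the recognizer, trigger check, prediction tools, state-to-reply table, and output device are passed in as hypothetical callables, the same reply table is reused for rule-based jumps for simplicity, and when the predictions disagree the sketch simply keeps the current state (the grouping embodiment described later covers that case).

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical tool signature: (this round's text, texts of rounds 1..i) -> state.
Predictor = Callable[[str, List[str]], str]


def run_round(
    speech: bytes,
    history_texts: List[str],
    current_state: str,
    recognize: Callable[[bytes], str],
    check_trigger: Callable[[str], Optional[str]],
    predictors: Sequence[Predictor],
    reply_for_state: Dict[str, str],
    speak: Callable[[str], None],
) -> str:
    """Process the (i+1)-th round and return the updated current state."""
    if len(predictors) < 2:
        raise ValueError("the method uses p > 1 prediction tools")
    text = recognize(speech)                 # S2: speech -> text
    jump_state = check_trigger(text)         # S3: preset condition triggered?
    if jump_state is not None:
        new_state = jump_state               # rule-based jump state
    else:
        # S4: every tool predicts from this round plus the preceding rounds.
        predictions = [tool(text, history_texts) for tool in predictors]
        if len(set(predictions)) != 1:       # S5: not all the same
            history_texts.append(text)
            return current_state             # sketch only: state kept
        new_state = predictions[0]           # S6: update to the agreed state
    history_texts.append(text)
    speak(reply_for_state[new_state])        # reply lookup and voice output
    return new_state


spoken: List[str] = []
history = ["round 1 text"]
state = run_round(
    b"<audio>", history, "T1",
    recognize=lambda audio: "I want to adjust the credit limit",
    check_trigger=lambda text: None,
    predictors=[lambda t, h: "T2", lambda t, h: "T2"],
    reply_for_state={"T2": "Do you need to adjust the temporary quota or the fixed quota?"},
    speak=spoken.append,
)
print(state)  # T2
```

The prediction tools receive `history_texts` before the current text is appended, so they see exactly the preceding information (rounds 1 to i) alongside the (i+1)-th round text.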
- The step S3 of determining whether the (i+1)-th round text triggers a preset dialogue state generation condition includes:
- S302. Determining whether the keyword or combination of keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;
- The method further includes:
- The configuration file is, for example, a JSON configuration file, where the trigger condition, the reply content, and the jump state correspond to the "trigger" part, the "output" part, and the "state" part of the JSON, respectively.
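A sketch of the step S3 check (word segmentation, configuration-file lookup, and trigger decision) might look like the following. It assumes a JSON file in which each rule is an object with "trigger", "output", and "state" keys, and that a rule fires when all of its trigger keywords appear in the segmented text. The schema, the all-keywords matching rule, and the whitespace-based `segment` stand-in are assumptions; Chinese input would need a real word segmenter.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical rules in the "trigger" / "output" / "state" layout described above.
CONFIG_JSON = """
[
  {"trigger": ["adjust", "limit"],
   "output": "Do you need to adjust the temporary quota or the fixed quota?",
   "state": "T2"},
  {"trigger": ["temporary", "quota"],
   "output": "How would you like to adjust the temporary quota?",
   "state": "T3"}
]
"""


def segment(text: str) -> List[str]:
    """Stand-in word segmenter: lowercase and split on whitespace."""
    return text.lower().split()


def check_trigger(text: str, rules: List[dict]) -> Optional[Tuple[str, str]]:
    """Return (output, jump state) of the first rule whose trigger keywords
    all occur in the segmented text, or None if no condition is triggered."""
    keywords = set(segment(text))
    for rule in rules:
        if set(rule["trigger"]) <= keywords:
            return rule["output"], rule["state"]
    return None


rules = json.loads(CONFIG_JSON)
print(check_trigger("I want to adjust the credit limit", rules))
print(check_trigger("Hello there", rules))  # None
```

A `None` result corresponds to the case handled in step S4, where the dialogue state prediction tools take over from the manual rules.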
- The p dialogue state prediction tools include a designated dialogue state prediction tool, which is pre-connected to a preset external knowledge base storing multiple historical multi-round dialogues, and the step S4 of performing dialogue state prediction based on the (i+1)-th round text and the preceding information corresponding to it includes:
- S402. Obtaining designated historical multi-round dialogues from the external knowledge base, wherein the second state chain of each designated historical multi-round dialogue includes the first state chain;
- The second state chain including the first state chain means that all state nodes in the first state chain are state nodes of the second state chain, and the node relationships between all state nodes in the first state chain are also the same as the node relationships of the corresponding state nodes in the second state chain.
- The external knowledge base stores multiple historical multi-round dialogues, each composed of multiple rounds of dialogue, which can be used as a basis for predicting the dialogue state.
- In each round, the execution terminal of this application determines the current dialogue state and then decides which reply voice to return; this is the standard process under manual rules.
- The first state chain is, for example, T1-T2, meaning that the current multi-round dialogue is stuck at the third round. Historical multi-round dialogues containing the T1-T2 chain are therefore obtained from the external knowledge base.
- Such a historical multi-round dialogue is regarded as a designated historical multi-round dialogue.
- If, for example, the state node directly connected to the T1-T2 chain in its state chain is T5, then T5 is the designated state node, and the predicted dialogue state corresponding to the designated dialogue state prediction tool should be recorded as T5.
- Because the second state chain of the designated historical multi-round dialogue includes the first state chain, the designated historical multi-round dialogue is similar to the current multi-round dialogue.
- Therefore, when the multi-round dialogue is stuck, referring to the designated historical multi-round dialogue gives a relatively accurate prediction of the dialogue state, so that the multi-round dialogue can continue.
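The containment test and designated-node lookup can be sketched as below. The sketch assumes state chains are lists of node names and reads "includes the first state chain" as the first chain appearing as a contiguous run inside the second chain (same nodes, same successor relations); the function names and the sample knowledge base are hypothetical.

```python
from typing import List, Optional, Sequence


def chain_start(second: Sequence[str], first: Sequence[str]) -> int:
    """Index at which `first` occurs as a contiguous run inside `second`
    (same nodes in the same order), or -1 if it does not occur."""
    m = len(first)
    for start in range(len(second) - m + 1):
        if list(second[start:start + m]) == list(first):
            return start
    return -1


def designated_node(second: Sequence[str], first: Sequence[str]) -> Optional[str]:
    """State node directly after the first chain inside the second chain."""
    start = chain_start(second, first)
    end = start + len(first)
    if start < 0 or end >= len(second):
        return None
    return second[end]


def designated_dialogues(knowledge_base: List[List[str]], first: List[str]) -> List[List[str]]:
    """Historical state chains that include the first state chain."""
    return [chain for chain in knowledge_base if chain_start(chain, first) >= 0]


knowledge_base = [["T1", "T2", "T5", "T8"], ["T1", "T3", "T6"]]
first_chain = ["T1", "T2"]
matches = designated_dialogues(knowledge_base, first_chain)
if len(matches) == 1:  # exactly one designated historical dialogue
    print(designated_node(matches[0], first_chain))  # T5
```

A historical chain that ends exactly where the first chain ends yields no designated node, so it cannot contribute a prediction.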
- When there are multiple designated historical multi-round dialogues, the priority search principle or the voting decision principle may be adopted to select the most accurate predicted dialogue state.
- The priority search principle takes the designated state node of the first designated historical multi-round dialogue found as the predicted dialogue state.
- The voting decision principle takes the most frequent designated state node as the predicted dialogue state. For example, if there are three historical multi-round dialogues whose state chains are T1-T2-T5-T8, T1-T2-T4-T7, and T1-T2-T5-T9, then T5 is the most frequent designated state node, so T5 is used as the predicted dialogue state node.
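Both selection principles can be sketched on the three example chains. For brevity, this sketch assumes the first state chain is a prefix of each historical chain; the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def next_node(chain: List[str], first: List[str]) -> Optional[str]:
    """Designated state node, assuming `first` is a prefix of `chain`."""
    if chain[:len(first)] == first and len(chain) > len(first):
        return chain[len(first)]
    return None


def priority_search(chains: List[List[str]], first: List[str]) -> Optional[str]:
    """Designated state node of the first matching dialogue found."""
    for chain in chains:
        node = next_node(chain, first)
        if node is not None:
            return node
    return None


def voting_decision(chains: List[List[str]], first: List[str]) -> Optional[str]:
    """Designated state node that occurs most often."""
    votes = Counter(
        node for node in (next_node(c, first) for c in chains) if node is not None
    )
    return votes.most_common(1)[0][0] if votes else None


chains = [["T1", "T2", "T5", "T8"], ["T1", "T2", "T4", "T7"], ["T1", "T2", "T5", "T9"]]
print(voting_decision(chains, ["T1", "T2"]))  # T5
print(priority_search(chains, ["T1", "T2"]))  # T5
```

The two principles agree on this data only because the first chain searched happens to end in T5; with chains T1-T2-T4, T1-T2-T5, T1-T2-T5, priority search returns T4 while voting returns T5.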
- T1 is, for example, the user-authority determination state, and the output reply voice is, for example, "Authority verification is correct, please select the business to be handled" (for example, after the user enters a user name and password in the first round of dialogue);
- T2 is the business confirmation state, and the output reply voice is, for example, "Do you need to adjust the temporary quota or the fixed quota?" (for example, after the user enters a voice input such as "I want to adjust the credit limit");
- T3 is the quota category confirmation state, and the output voice is, for example, "How would you like to adjust the temporary quota?" (for example, after the user has entered the voice input "temporary quota").
- The above example of T1-T3 only illustrates one application scenario of this application and does not limit this application.
- The method further includes:
- In this application, the similarity between each designated historical multi-round dialogue and the current multi-round dialogue is calculated with a preset similarity calculation method, yielding multiple similarity values corresponding to all the designated historical multi-round dialogues; the designated state node of the designated historical multi-round dialogue with the maximum similarity value is obtained and recorded as the predicted dialogue state corresponding to the designated dialogue state prediction tool. This ensures that the designated historical multi-round dialogue most similar to the current multi-round dialogue is found. The next dialogue state of that most similar historical dialogue is the most likely next dialogue state of the current multi-round dialogue, which improves the accuracy of dialogue state prediction.
- The step S4031 of calculating the similarity between the designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method includes:
- The similarity between the designated historical multi-round dialogue and the current multi-round dialogue is calculated with the preset similarity calculation method.
- This application uses not only the current round of speech input by the user but also the user's previous speech inputs as the basis for the similarity calculation, which improves its accuracy.
- The word vector library maps words to vectors and is a common resource in the field of natural language analysis.
- By querying a general-purpose word vector library, i+1 first word vector sequences corresponding respectively to the 1st round, ..., the (i+1)-th round of speech input by the user are obtained, and the i+1 first word vector sequences are concatenated in order to obtain the first comprehensive vector X; likewise, by querying the general-purpose word vector library, i+1 second word vector sequences corresponding respectively to the 1st round, ..., the (i+1)-th round of speech input by the user in the designated historical multi-round dialogue are obtained and concatenated in order to obtain the second comprehensive vector Y.
- The similarity judgment between the current multi-round dialogue and the designated historical multi-round dialogue is thus transformed into a similarity calculation between the vectors X and Y, performed according to a preset formula.
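The similarity formula itself is not reproduced in this text, so the sketch below substitutes cosine similarity as an assumed choice. It builds the comprehensive vectors X and Y by concatenating word vectors round by round, zero-pads the shorter vector (another assumption, since concatenated vectors can differ in length), and picks the designated state node of the most similar historical dialogue. The two-dimensional word vector table and the candidate dialogues are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

# Toy stand-in for a general-purpose word vector library (made-up values).
WORD_VECTORS: Dict[str, List[float]] = {
    "adjust": [1.0, 0.0], "limit": [0.8, 0.2], "temporary": [0.1, 0.9],
    "fixed": [0.9, 0.1], "quota": [0.5, 0.5],
}
DIM = 2


def comprehensive_vector(rounds: Sequence[Sequence[str]]) -> List[float]:
    """Concatenate the word vectors of rounds 1..i+1 in order;
    unknown words map to a zero vector."""
    out: List[float] = []
    for words in rounds:
        for w in words:
            out.extend(WORD_VECTORS.get(w, [0.0] * DIM))
    return out


def cosine(x: List[float], y: List[float]) -> float:
    """Cosine similarity (assumed), zero-padding the shorter vector."""
    n = max(len(x), len(y))
    x = x + [0.0] * (n - len(x))
    y = y + [0.0] * (n - len(y))
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


current = [["adjust", "limit"], ["temporary"]]
candidates = {  # designated state node -> user rounds of that historical dialogue
    "T5": [["adjust", "limit"], ["temporary", "quota"]],
    "T4": [["adjust", "limit"], ["fixed"]],
}
X = comprehensive_vector(current)
scores = {node: cosine(X, comprehensive_vector(r)) for node, r in candidates.items()}
print(max(scores, key=scores.get))  # T5
```

Taking the maximum over `scores` mirrors the maximum-similarity rule stated earlier; any other vector similarity could be dropped into `cosine` without changing the selection step.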
- The method further includes:
- the current state of the multi-round dialogue is updated to the predicted dialogue state corresponding to the first group, and the dialogue state prediction tools corresponding to the second group are deleted from the p dialogue state prediction tools.
- Ideally, the p predicted dialogue states are all the same; in practice, however, the prediction accuracy of the p dialogue state prediction tools differs, so the p predicted dialogue states are quite likely not to be exactly the same.
- This application therefore divides the p predicted dialogue states into multiple groups. The first group, which has the most members, indicates that most dialogue state prediction tools agree on its predicted dialogue state, so the current state of the multi-round dialogue is updated to the predicted dialogue state corresponding to the first group.
- The dialogue state prediction tools corresponding to the second group (the group with the fewest members) are also deleted from the p dialogue state prediction tools, so as to improve the accuracy of the next prediction.
- Deleting them increases the relative weight of the remaining dialogue state prediction tools, thereby improving the accuracy of subsequent dialogue state predictions.
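The grouping step can be sketched as follows, assuming each tool is identified by a name and that ties between equally sized groups are broken by first occurrence; the tool names are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_disagreement(predictions: Dict[str, str]) -> Tuple[str, List[str]]:
    """Group tools by predicted state; return the state of the largest group
    (first group) and the tools in the smallest group (second group)."""
    groups: Dict[str, List[str]] = {}
    for tool, state in predictions.items():
        groups.setdefault(state, []).append(tool)
    if len(groups) < 2:
        raise ValueError("predictions agree; no grouping needed")
    # max/min keep the first group encountered when sizes tie (an assumption).
    first_state = max(groups, key=lambda s: len(groups[s]))
    second_state = min(groups, key=lambda s: len(groups[s]))
    return first_state, groups[second_state]


tools = ["neural_model", "knowledge_base", "keyword_rules"]
predictions = {"neural_model": "T5", "knowledge_base": "T5", "keyword_rules": "T4"}
new_state, removed = resolve_disagreement(predictions)
tools = [t for t in tools if t not in removed]  # delete second-group tools
print(new_state, tools)  # T5 ['neural_model', 'knowledge_base']
```

The guard against a single group matters: when all predictions agree, the largest and smallest group coincide, and deleting the "second group" would remove every tool.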
- An embodiment of the present application provides a multi-round dialogue device based on dialogue state prediction, including:
- an (i+1)-th round speech acquisition unit 10 configured to acquire the (i+1)-th round of speech input by the user after i rounds of dialogue with the user, where i is an integer greater than 1;
- an (i+1)-th round text acquisition unit 20 configured to perform voice recognition processing on the (i+1)-th round of speech according to a preset voice recognition method, so as to obtain the (i+1)-th round text;
- a dialogue state generation condition judging unit 30 configured to judge whether the (i+1)-th round text triggers a preset dialogue state generation condition;
- a predicted dialogue state acquisition unit 40 configured to, if the (i+1)-th round text does not trigger a preset dialogue state generation condition, use p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the 1st round text, ..., the i-th round text;
- a predicted dialogue state judging unit 50 configured to judge whether the p predicted dialogue states are the same;
- an (i+1)-th round reply voice acquisition unit 60 configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state and obtain the (i+1)-th round reply voice based on the preset correspondence between dialogue states and reply voices;
- an (i+1)-th round reply voice output unit 70 configured to use a preset voice output device to output the (i+1)-th round reply voice.
- The dialogue state generation condition judging unit 30 includes:
- a word segmentation processing subunit configured to perform word segmentation on the (i+1)-th round text to obtain multiple keywords;
- a configuration file judging subunit configured to judge whether the keyword or combination of keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;
- a dialogue state generation condition judging subunit configured to determine, if the keyword or combination of keywords is recorded in the trigger condition part, that the (i+1)-th round text triggers a preset dialogue state generation condition.
- The device further includes:
- a dialogue state update unit configured to, if the (i+1)-th round text triggers a preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state and use a preset voice output device to output the reply voice.
- The p dialogue state prediction tools include a designated dialogue state prediction tool, which is pre-connected to a preset external knowledge base storing multiple historical multi-round dialogues, and the predicted dialogue state acquisition unit 40 includes:
- a first state chain generation subunit configured to use the designated dialogue state prediction tool to generate the first state chain of the current multi-round dialogue according to the (i+1)-th round text and the preceding information corresponding to it;
- ... and the node relationships between all state nodes in the first state chain are also the same as the node relationships of the corresponding state nodes in the second state chain;
- a designated historical multi-round dialogue quantity judging subunit configured to judge whether the number of designated historical multi-round dialogues is equal to 1;
- a designated state node obtaining subunit configured to, if the number of designated historical multi-round dialogues is equal to 1, obtain the designated state node in the second state chain and record it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
- The device further includes:
- a similarity calculation unit configured to, if the number of designated historical multi-round dialogues is not equal to 1, calculate the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to the preset similarity calculation method, thereby obtaining multiple similarity values corresponding to all the designated historical multi-round dialogues;
- a predicted dialogue state marking unit configured to obtain the designated state node of the designated historical multi-round dialogue with the largest similarity value and record it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
- The similarity calculation unit includes:
- a first comprehensive vector X obtaining subunit configured to obtain, by querying the general-purpose word vector library, the i+1 first word vector sequences corresponding respectively to the 1st round, ..., the (i+1)-th round of speech input by the user, and to concatenate the i+1 first word vector sequences in order to obtain the first comprehensive vector X;
- a second comprehensive vector Y obtaining subunit configured to obtain, by querying the general-purpose word vector library, the i+1 second word vector sequences corresponding respectively to the 1st round, ..., the (i+1)-th round of speech input by the user in the designated historical multi-round dialogue, and to concatenate the i+1 second word vector sequences in order to obtain the second comprehensive vector Y;
- a similarity M calculation subunit configured to calculate the similarity M between X and Y according to a preset formula.
- the device includes:
- the grouping unit is configured to, if the p predicted dialogue states are not all identical, divide the p predicted dialogue states into multiple groups, wherein all members of a group are the same predicted dialogue state;
- the first group obtaining unit is configured to obtain, from the multiple groups, the first group, which has the most members, and to update the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group;
- the second group obtaining unit is configured to obtain, from the multiple groups, the second group, which has the fewest members, and to delete the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
- an embodiment of the present application also provides a computer device.
- the computer device may be a server, and its internal structure may be as shown in FIG. 3.
- the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, a computer program, and a database.
- the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
- the database of the computer device is used to store data used by the multi-round dialogue method based on dialogue state prediction.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer program is executed by the processor to realize a multi-round dialogue method based on dialogue state prediction.
- the processor executes the multi-round dialogue method based on dialogue state prediction described above. The steps of this method correspond one-to-one to the steps of the method in the foregoing embodiment and are not repeated here.
- the multi-round dialogue method based on dialogue state prediction includes: after i rounds of dialogue with the user, obtaining the (i+1)-th round of speech input by the user, where i is an integer greater than 1; performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text; determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition; if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, performing dialogue state prediction with p preset dialogue state prediction tools, based on the (i+1)-th round of text and the preceding information corresponding to it, to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text; determining whether the p predicted dialogue states are the same; if the p predicted dialogue states are the same, updating the current state of the multi-round dialogue to the predicted dialogue state and obtaining the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech; and outputting the (i+1)-th round of reply speech through a preset speech output device.
- An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
- the storage medium is a volatile storage medium or a non-volatile storage medium.
- the computer program, when executed by a processor, implements the multi-round dialogue method based on dialogue state prediction, which includes: after i rounds of dialogue with the user, obtaining the (i+1)-th round of speech input by the user, where i is an integer greater than 1; performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text; determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition; if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, performing dialogue state prediction with p preset dialogue state prediction tools, based on the (i+1)-th round of text and the preceding information corresponding to it, to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text; determining whether the p predicted dialogue states are the same; if the p predicted dialogue states are the same, updating the current state of the multi-round dialogue to the predicted dialogue state and obtaining the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech; and outputting the (i+1)-th round of reply speech through a preset speech output device.
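The grouping unit, first group obtaining unit, and second group obtaining unit listed above can be sketched as follows. This is a minimal Python sketch under assumptions that the source does not state. Predictions arrive as a mapping from a hypothetical tool name to a state label. Ties between groups of equal size are broken by first appearance. The second group is chosen only from the groups other than the first, so the adopted state is never the one whose tools are dropped.

```python
from typing import Dict, List, Tuple


def resolve_predictions(predictions: Dict[str, str]) -> Tuple[str, List[str]]:
    """Pick the dialogue state to adopt and the prediction tools to keep.

    predictions maps a (hypothetical) tool name to the dialogue state it predicted.
    Returns (adopted_state, remaining_tool_names).
    """
    # Group tools by predicted state: every member of a group predicted the same state.
    groups: Dict[str, List[str]] = {}
    for tool, state in predictions.items():
        groups.setdefault(state, []).append(tool)

    if len(groups) == 1:
        # All p tools agree: adopt the shared state and keep every tool.
        return next(iter(groups)), list(predictions)

    # First group: the one with the most members (max() keeps the first on ties).
    largest = max(groups, key=lambda s: len(groups[s]))
    # Second group: the one with the fewest members among the remaining groups.
    smallest = min((s for s in groups if s != largest), key=lambda s: len(groups[s]))

    dropped = set(groups[smallest])
    remaining = [tool for tool in predictions if tool not in dropped]
    return largest, remaining
```

For example, `resolve_predictions({"a": "S7", "b": "S7", "c": "S9"})` returns `("S7", ["a", "b"])`. The multi-round dialogue moves to state S7, and tool `c` is removed from the pool for later rounds.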
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
A conversation state prediction-based multi-round conversation method, a device, a computer apparatus, and a storage medium. The method comprises: acquiring, after i conversation rounds with a user, (i+1)-th round speech data input by the user (S1); performing, according to a preset speech recognition method, speech recognition processing on the (i+1)-th round speech data to obtain (i+1)-th round text data (S2); determining whether the (i+1)-th round text data triggers a preset conversation state generation criterion (S3); if not, performing conversation state prediction by using p preset conversation state prediction tools to obtain p predicted conversation states (S4); determining whether the p predicted conversation states are the same (S5); if so, updating the current state of the multi-round conversation to the predicted conversation state, and acquiring (i+1)-th round response speech data according to a preset correspondence between conversation states and response speech data (S6); and outputting the (i+1)-th round response speech data by using a preset speech output device (S7). This improves the generalizability of multi-round conversation solutions and ensures fluency.
Description
This application claims priority to Chinese patent application No. 202010177686.7, filed with the Chinese Patent Office on March 13, 2020 and entitled "Multi-round dialogue method, device and computer equipment based on dialogue state prediction", the entire content of which is incorporated herein by reference.
This application relates to the field of artificial intelligence, and in particular to a multi-round dialogue method, device, computer equipment and storage medium based on dialogue state prediction.
Multi-round dialogue technology enables rapid information exchange between humans and computers. A multi-round dialogue system includes modules such as speech recognition, language understanding, dialogue state maintenance, action candidate ranking, language generation, and speech synthesis. The answering logic resides mainly in the dialogue state maintenance module, which decides which state the system should jump to after it receives the output of the language understanding module. The inventor realized that the dialogue state maintenance module is generally configured with manual rules. A rule-based dialogue state maintenance module, however, has no generalization ability. When the user inputs special information for which no manual rule has been set, the entire multi-round dialogue is interrupted. Traditional multi-round dialogue schemes therefore generalize poorly, and smooth operation cannot be guaranteed.
The main purpose of this application is to provide a multi-round dialogue method, device, computer equipment and storage medium based on dialogue state prediction, aiming to improve the generalization ability of multi-round dialogue schemes and ensure fluency.
To achieve the above purpose, this application proposes a multi-round dialogue method based on dialogue state prediction, which includes the following steps:
After i rounds of dialogue with the user, obtaining the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
Performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method, so as to obtain the (i+1)-th round of text;
Determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
If the (i+1)-th round of text does not trigger the preset dialogue state generation condition, performing dialogue state prediction with p preset dialogue state prediction tools, based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text;
Determining whether the p predicted dialogue states are the same;
If the p predicted dialogue states are the same, updating the current state of the multi-round dialogue to the predicted dialogue state, and obtaining the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech;
Outputting the (i+1)-th round of reply speech through a preset speech output device.
This application provides a multi-round dialogue device based on dialogue state prediction, including:
An (i+1)-th round speech acquisition unit, configured to acquire the (i+1)-th round of speech input by the user after i rounds of dialogue with the user, where i is an integer greater than 1;
An (i+1)-th round text acquisition unit, configured to perform speech recognition on the (i+1)-th round of speech according to a preset speech recognition method, so as to obtain the (i+1)-th round of text;
A dialogue state generation condition judging unit, configured to determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
A predicted dialogue state acquisition unit, configured to, if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, perform dialogue state prediction with p preset dialogue state prediction tools, based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text;
A predicted dialogue state judging unit, configured to determine whether the p predicted dialogue states are the same;
An (i+1)-th round reply speech acquisition unit, configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state, and obtain the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech;
An (i+1)-th round reply speech output unit, configured to output the (i+1)-th round of reply speech through a preset speech output device.
This application provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements a multi-round dialogue method based on dialogue state prediction, which includes the following steps:
After i rounds of dialogue with the user, obtaining the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
Performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method, so as to obtain the (i+1)-th round of text;
Determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
If the (i+1)-th round of text does not trigger the preset dialogue state generation condition, performing dialogue state prediction with p preset dialogue state prediction tools, based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text;
Determining whether the p predicted dialogue states are the same;
If the p predicted dialogue states are the same, updating the current state of the multi-round dialogue to the predicted dialogue state, and obtaining the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech;
Outputting the (i+1)-th round of reply speech through a preset speech output device.
This application provides a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements a multi-round dialogue method based on dialogue state prediction, which includes the following steps:
After i rounds of dialogue with the user, obtaining the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
Performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method, so as to obtain the (i+1)-th round of text;
Determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
If the (i+1)-th round of text does not trigger the preset dialogue state generation condition, performing dialogue state prediction with p preset dialogue state prediction tools, based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text;
Determining whether the p predicted dialogue states are the same;
If the p predicted dialogue states are the same, updating the current state of the multi-round dialogue to the predicted dialogue state, and obtaining the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech;
Outputting the (i+1)-th round of reply speech through a preset speech output device.
The multi-round dialogue method, device, computer equipment and storage medium based on dialogue state prediction of this application work as follows. After i rounds of dialogue with the user, they obtain the (i+1)-th round of speech input by the user and perform speech recognition on it to obtain the (i+1)-th round of text. They then determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition. If it does not, they perform dialogue state prediction with p preset dialogue state prediction tools to obtain p predicted dialogue states, and determine whether those p states are the same. If they are, they update the current state of the multi-round dialogue to the predicted dialogue state, obtain the (i+1)-th round of reply speech according to the preset correspondence between dialogue states and reply speech, and output that reply speech through a preset speech output device. This improves the generalization ability of the multi-round dialogue scheme and ensures fluency.
Combining p dialogue state prediction tools improves prediction accuracy. Using the preceding information for dialogue state prediction means that the multi-round dialogue is analyzed as a whole, with more complete data and more accurate results. As a result, the data analysis is more thorough, the scheme is more adaptable (that is, its generalization ability is improved), and the dialogue is more fluent.
FIG. 1 is a schematic flowchart of a multi-round dialogue method based on dialogue state prediction according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of a multi-round dialogue device based on dialogue state prediction according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
Best Mode for Carrying Out This Application
Referring to FIG. 1, an embodiment of this application provides a multi-round dialogue method based on dialogue state prediction, including the following steps:
S1. After i rounds of dialogue with the user, obtain the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
S2. Perform speech recognition on the (i+1)-th round of speech according to a preset speech recognition method, so as to obtain the (i+1)-th round of text;
S3. Determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
S4. If the (i+1)-th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to perform dialogue state prediction, based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text;
S5. Determine whether the p predicted dialogue states are the same;
S6. If the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state, and obtain the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech;
S7. Output the (i+1)-th round of reply speech through a preset speech output device.
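Steps S1 to S7 can be summarized as a single turn handler. The following minimal Python sketch is an illustration only and is not the application's implementation. Speech recognition, trigger matching, the p prediction tools, and speech output are all passed in as hypothetical callables. When the tools disagree, the sketch simply returns None; the grouping units described earlier handle that case.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A prediction tool maps (current text, preceding texts) to a dialogue state label.
Predictor = Callable[[str, List[str]], str]


def handle_turn(
    speech: bytes,
    history: List[str],
    recognize: Callable[[bytes], str],
    check_trigger: Callable[[str], Optional[Tuple[str, str]]],
    predictors: List[Predictor],
    replies: Dict[str, str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Process the (i+1)-th user turn; return the new current state, or None."""
    text = recognize(speech)                  # S2: speech -> (i+1)-th round text
    hit = check_trigger(text)                 # S3: (jump state, reply) if a rule fires
    if hit is not None:
        state, reply = hit                    # S31: rule-based transition
    else:
        # S4: each of the p tools predicts from the new text plus rounds 1..i
        predictions = [predict(text, history) for predict in predictors]
        if len(set(predictions)) != 1:        # S5: tools disagree
            return None
        state = predictions[0]                # S6: unanimous prediction
        reply = replies[state]                # preset state -> reply mapping
    speak(reply)                              # S7: output the reply
    history.append(text)                      # preceding info for round i+2
    return state
```

Passing the components in as callables keeps the control flow of S3 to S7 visible without committing to any particular recognizer or prediction model.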
When the multi-round dialogue gets stuck (that is, the (i+1)-th round of text does not trigger the preset dialogue state generation condition), this application uses a dedicated mechanism to keep the multi-round dialogue going: the p preset dialogue state prediction tools predict the dialogue state. This improves the generalization ability of the multi-round dialogue scheme and ensures smooth operation.
As described in step S1, after i rounds of dialogue with the user, the (i+1)-th round of speech input by the user is obtained, where i is an integer greater than 1. This application is applied during a multi-round dialogue. It is therefore carried out only after the first round of dialogue, that is, on the (i+1)-th round of speech input by the user.
As described in step S2, speech recognition is performed on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text. Any feasible speech recognition method may be used to convert speech into text, for example an open-source speech recognition tool such as Google's open-source Live Transcribe speech-to-text tool.
As described in step S3, it is determined whether the (i+1)-th round of text triggers the preset dialogue state generation condition. The dialogue state generation conditions can be recorded in advance in a preset configuration file, for example a JSON configuration file, in which the trigger conditions correspond to the "trigger" section. If the intent expressed by the (i+1)-th round of text (for example, a keyword or a combination of keywords) is recorded in the "trigger" section, the (i+1)-th round of text is determined to trigger the preset dialogue state generation condition.
As described in step S4, if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, p preset dialogue state prediction tools perform dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it. This yields p predicted dialogue states corresponding respectively to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., and the i-th round of text. A dialogue state prediction tool may be any feasible tool, for example one trained from a neural network model or one based on an external knowledge base. Because the (i+1)-th round of text does not trigger the preset dialogue state generation condition, the multi-round dialogue cannot be maintained under the original rules. The p dialogue state prediction tools therefore bridge the break by predicting the dialogue state, so that the multi-round dialogue can continue.
In traditional schemes, when the (i+1)-th round of text does not trigger the preset dialogue state generation condition, the multi-round dialogue is either forcibly ended or forcibly restarted, which hinders its smooth operation. Here, a dialogue state is a data structure containing the dialogue history from time 0 to time t (for example, the current time). A predicted dialogue state is, for example, M1-M2-M3. In this example, M1-M2 is the dialogue history, meaning that two rounds of dialogue have already taken place, including data such as the user's inputs and the replies the user received. M3 is the newly predicted component of the dialogue state. A dialogue state may also carry fluency and quality labels, such as smooth or unsmooth, or good, excellent, or poor dialogue quality. These labels make the data more precise and help achieve accurate dialogue state prediction.
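The dialogue state described above can be represented as a state chain such as M1-M2-M3, plus optional fluency and quality labels. The following hypothetical Python data structure is for illustration only; its field names and label strings are assumptions and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Turn:
    """One round of dialogue: what the user said and the reply they received."""
    user_text: str
    system_reply: str


@dataclass
class DialogueState:
    nodes: List[str] = field(default_factory=list)   # state chain, e.g. ["M1", "M2"]
    turns: List[Turn] = field(default_factory=list)  # dialogue history from time 0 to t
    fluency: Optional[str] = None                     # e.g. "smooth" or "unsmooth"
    quality: Optional[str] = None                     # e.g. "good", "excellent", "poor"

    def extend(self, new_node: str) -> "DialogueState":
        """Return a new state with a predicted node appended; self is unchanged."""
        return DialogueState(self.nodes + [new_node], list(self.turns),
                             self.fluency, self.quality)

    def chain(self) -> str:
        """Render the state chain in the M1-M2-M3 notation used above."""
        return "-".join(self.nodes)
```

For example, `DialogueState(nodes=["M1", "M2"]).extend("M3").chain()` returns `"M1-M2-M3"`. The original two-node history is left untouched.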
As described in step S5, it is determined whether the p predicted dialogue states are the same. If they are, every dialogue state prediction tool has predicted the same dialogue state, and that state is taken as the final dialogue state. The current state of the multi-round dialogue should then be updated to this predicted dialogue state.
As described in step S6, if the p predicted dialogue states are the same, the current state of the multi-round dialogue is updated to the predicted dialogue state, and the (i+1)-th round of reply speech is obtained according to the preset correspondence between dialogue states and reply speech. Once the current state of the multi-round dialogue has been updated to the predicted dialogue state, the computer has understood the (i+1)-th round of speech input by the user and should output the corresponding reply speech. Because this application presets the correspondence between dialogue states and reply speech, the (i+1)-th round of reply speech can be obtained accurately.
As described in step S7, the (i+1)-th round of reply speech is output through a preset speech output device, such as a loudspeaker or a sound box. The output reply keeps the multi-round dialogue going and gives the user the opportunity to conduct the (i+2)-th round of dialogue.
In one embodiment, step S3 of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition includes:
S301. Perform word segmentation on the (i+1)-th round of text to obtain multiple keywords;
S302. Determine whether any keyword or combination of keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply speech, and jump states;
S303. If a keyword or combination of keywords is recorded in the trigger condition section, determine that the (i+1)-th round of text triggers the preset dialogue state generation condition.
After step S3 of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition, the method includes:
S31. If the (i+1)-th round of text triggers the preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state, and output the reply speech through a preset speech output device.
The steps above determine whether the (i+1)-th round of text triggers the preset dialogue state generation condition, and this application makes that determination with a configuration file. The configuration file is, for example, a JSON configuration file in which the trigger conditions, reply content, and jump states correspond to the "trigger", "output", and "state" sections, respectively. Take adjusting a credit card limit in the banking domain as an example. When the user asks about "credit card limit adjustment", the limit adjustment intent is triggered; for example, the "trigger" section of the configuration file records the combination of "credit card" and "limit adjustment". The system therefore answers "Do you need to adjust the temporary limit or the fixed limit?", which is recorded, for example, in the "output" section. Because the "state" section records 007, the current state is updated to state 007, which completes the (i+1)-th round of dialogue. Since the dialogue state generation condition has been triggered in this case, the multi-round dialogue proceeds smoothly without the dialogue state prediction tools.
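The configuration-file check in steps S301 to S303 and S31 can be sketched with the credit card example. This minimal Python sketch assumes a JSON layout of its own choosing, not one confirmed by the source. The file holds a list of rules, and each rule has a "trigger" list of keyword combinations, an "output" reply, and a "state" label. Word segmentation (S301) is outside the sketch, so the function receives keywords that have already been extracted; for Chinese text, a segmenter such as jieba could produce them.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical configuration mirroring the credit card limit example.
CONFIG_JSON = """
[
  {
    "trigger": [["credit card", "limit adjustment"]],
    "output": "Do you need to adjust the temporary limit or the fixed limit?",
    "state": "007"
  }
]
"""


def match_trigger(keywords: List[str],
                  rules: List[Dict[str, Any]]) -> Optional[Tuple[str, str]]:
    """Return (jump state, reply) for the first rule whose trigger matches, else None.

    A trigger entry is a keyword combination; it matches when every keyword in it
    appears among the keywords segmented from the (i+1)-th round of text.
    A single-keyword trigger is simply a combination of length one.
    """
    found = set(keywords)
    for rule in rules:
        for combination in rule["trigger"]:
            if set(combination) <= found:
                return rule["state"], rule["output"]
    return None


RULES = json.loads(CONFIG_JSON)
```

With these rules, `match_trigger(["credit card", "limit adjustment"], RULES)` returns `("007", "Do you need to adjust the temporary limit or the fixed limit?")`. In contrast, `match_trigger(["credit card"], RULES)` returns `None`, which is the case handed to the p prediction tools in step S4.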
In one embodiment, the p dialogue state prediction tools include a designated dialogue state prediction tool that is connected in advance to a preset external knowledge base storing historical multi-round dialogues, and step S4 of predicting the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it includes:
S401. Using the designated dialogue state prediction tool, generate a first state chain of the current multi-round dialogue based on the (i+1)th round of text and the preceding information corresponding to it;
S402. Obtain designated historical multi-round dialogues from the external knowledge base, where the second state chain of a designated historical multi-round dialogue contains the first state chain. Here, "the second state chain contains the first state chain" means that every state node of the first state chain is also a state node of the second state chain, and the relationships between the state nodes of the first state chain are the same as those between the corresponding state nodes of the second state chain;
S403. Determine whether the number of designated historical multi-round dialogues equals 1;
S404. If the number of designated historical multi-round dialogues equals 1, obtain the designated state node in the second state chain and record it as the predicted dialogue state of the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
As described above, this step predicts the dialogue state from the (i+1)th round of text and the preceding information corresponding to it. The external knowledge base stores historical multi-round dialogues, which serve as the basis for the prediction. A multi-round dialogue consists of several rounds; in each round, the executing terminal of this application determines the current dialogue state and then decides which reply voice to return, which is a standard rule-based flow. Suppose the first state chain is T1-T2, that is, the current multi-round dialogue gets stuck in the third round. Historical multi-round dialogues containing the T1-T2 chain are then retrieved from the external knowledge base. If, for example, a historical multi-round dialogue has the chain T1-T2-T5-T8 (the second state chain), it is taken as a designated historical multi-round dialogue. The state node directly connected to the T1-T2 chain is T5, so T5 is the designated state node and is recorded as the predicted dialogue state of the designated dialogue state prediction tool. Because the second state chain of the designated historical multi-round dialogue contains the first state chain, the two dialogues are similar; when the current dialogue gets stuck, referring to the designated historical dialogue yields a relatively accurate predicted dialogue state and keeps the multi-round dialogue going.
Further, when the number of designated historical multi-round dialogues is not 1, a priority search principle or a voting principle can be used to select the most accurate predicted dialogue state. Under the priority search principle, the designated state node of the first designated historical multi-round dialogue found by the search is taken as the predicted dialogue state. Under the voting principle, the designated state node that occurs most often is taken as the predicted dialogue state. For example, if three historical multi-round dialogues have the state chains T1-T2-T5-T8, T1-T2-T4-T7, and T1-T2-T5-T9, then T5 is the most frequent designated state node and is taken as the predicted dialogue state.
In these examples, T1 is, for instance, a user-authorization state whose reply voice is "Authorization verified. Please select the service you need" (for example, after the user enters a username and password in the first round); T2 is a service-confirmation state whose reply voice is "Do you need to adjust the temporary limit or the fixed limit?" (for example, after the user says something like "I want to adjust my credit limit"); and T3 is a limit-type confirmation state whose reply voice is "How would you like to adjust the temporary limit?" (for example, after the user says "temporary limit"). These examples of T1 to T3 only illustrate one application scenario and do not limit this application.
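Steps S402 to S404, together with the priority search and voting principles, can be sketched as below. This is a minimal sketch under stated assumptions: state chains are plain lists of strings, "the second state chain contains the first state chain" is checked as a contiguous run of identical nodes, and the designated state node is taken to be the node immediately after that run, as in the T1-T2-T5 example. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

def find_subchain(first: List[str], second: List[str]) -> int:
    """Index where `first` occurs as a contiguous run of nodes in `second`, or -1."""
    n = len(first)
    for start in range(len(second) - n + 1):
        if second[start:start + n] == first:
            return start
    return -1

def designated_node(first: List[str], second: List[str]) -> Optional[str]:
    """Node of `second` directly following the matched first chain, if any."""
    start = find_subchain(first, second)
    if start == -1:
        return None  # `second` does not contain the first chain
    end = start + len(first)
    return second[end] if end < len(second) else None

def predict_state(first: List[str], history: List[List[str]],
                  rule: str = "vote") -> Optional[str]:
    """rule="first": priority search principle; rule="vote": voting principle."""
    nodes = [node for node in (designated_node(first, h) for h in history)
             if node is not None]
    if not nodes:
        return None
    if rule == "first":
        return nodes[0]
    return Counter(nodes).most_common(1)[0][0]

history = [["T1", "T2", "T4", "T7"], ["T1", "T2", "T5", "T8"], ["T1", "T2", "T5", "T9"]]
print(predict_state(["T1", "T2"], history, "first"))  # T4
print(predict_state(["T1", "T2"], history, "vote"))   # T5
```

With a single matching historical dialogue both principles return the same node, which is the S404 case.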
In one embodiment, after step S403 of determining whether the number of designated historical multi-round dialogues equals 1, the method includes:
S4031. If the number of designated historical multi-round dialogues does not equal 1, calculate the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, obtaining one similarity value for each designated historical multi-round dialogue;
S4032. Obtain the designated state node of the designated historical multi-round dialogue with the largest similarity value, and record it as the predicted dialogue state of the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
As described above, this step takes the designated state node of the designated historical multi-round dialogue with the largest similarity value as the predicted dialogue state of the designated dialogue state prediction tool. To improve prediction accuracy, this application calculates, using a preset similarity calculation method, the similarity between each designated historical multi-round dialogue and the current multi-round dialogue, and then selects the designated state node of the most similar one. The next dialogue state of the most similar historical dialogue is also the most likely next state of the current dialogue, which improves the accuracy of the predicted dialogue state.
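Step S4032 amounts to an argmax over the similarity values. In this hypothetical sketch each designated historical multi-round dialogue is represented only by its (similarity value, designated state node) pair. The source does not say how ties are broken, so the sketch keeps the earliest entry, since Python's `max` returns the first of several equal maxima.

```python
from typing import List, Tuple

def select_by_similarity(scored: List[Tuple[float, str]]) -> str:
    """Return the designated state node of the most similar historical dialogue.

    `scored` holds one (similarity value, designated state node) pair per
    designated historical multi-round dialogue.
    """
    if not scored:
        raise ValueError("no designated historical multi-round dialogues")
    _, best_node = max(scored, key=lambda pair: pair[0])
    return best_node

print(select_by_similarity([(0.42, "T5"), (0.87, "T4"), (0.61, "T5")]))  # T4
```

Unlike the voting principle, T4 wins here even though T5 occurs twice, because only the single most similar dialogue is consulted.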
In one embodiment, step S4031 of calculating the similarity between a designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method includes:
S40311. Query the preset word vector library to obtain i+1 first word vector sequences corresponding respectively to the user's 1st, ..., (i+1)th rounds of speech, and concatenate the i+1 first word vector sequences in order to obtain a first comprehensive vector X;
S40312. Query the preset word vector library to obtain i+1 second word vector sequences corresponding respectively to the user's 1st, ..., (i+1)th rounds of speech in the designated historical multi-round dialogue, and concatenate the i+1 second word vector sequences in order to obtain a second comprehensive vector Y;
S40313. Calculate the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue according to the formula (not reproduced in this text), where X is the first comprehensive vector, Y is the second comprehensive vector, X_j is the j-th component of X, Y_j is the j-th component of Y, and X and Y each have m components.
As described above, this step calculates the similarity between a designated historical multi-round dialogue and the current multi-round dialogue according to the preset similarity calculation method. This application uses not only the user's current round of speech but also the user's earlier rounds as the basis for the calculation, which improves its accuracy. The word vector library maps words to vectors and is a common resource in natural language processing. Using it, the i+1 first word vector sequences corresponding to the user's 1st, ..., (i+1)th rounds of speech are obtained and concatenated in order into the first comprehensive vector X; likewise, the i+1 second word vector sequences corresponding to the user's 1st, ..., (i+1)th rounds of speech in the designated historical multi-round dialogue are obtained and concatenated in order into the second comprehensive vector Y. Judging the similarity between the current and a historical multi-round dialogue is thereby turned into a similarity calculation between vectors. The similarity M is then calculated according to the formula (not reproduced in this text), which accounts for both the numerical difference and the angular difference between the vectors, further improving the accuracy of the calculation.
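Because the formula for M is not reproduced here, the following sketch uses a stand-in score, not the application's formula: cosine similarity (the angular difference between X and Y) divided by one plus the Euclidean distance (their numerical difference), which reflects both factors the text mentions. The toy word vector library `WORD_VECTORS`, zero vectors for unknown words, and zero-padding X and Y to a common length m are all assumptions.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy word vector library (2-dimensional vectors for brevity).
WORD_VECTORS: Dict[str, List[float]] = {
    "adjust": [0.9, 0.1],
    "credit": [0.2, 0.8],
    "limit": [0.5, 0.5],
    "temporary": [0.7, 0.3],
}
DIM = 2

def comprehensive_vector(rounds: Sequence[Sequence[str]]) -> List[float]:
    """Concatenate the word vectors of rounds 1..i+1, in order, into one flat vector."""
    flat: List[float] = []
    for words in rounds:
        for word in words:
            # Unknown words become zero vectors (assumption).
            flat.extend(WORD_VECTORS.get(word, [0.0] * DIM))
    return flat

def stand_in_similarity(x: List[float], y: List[float]) -> float:
    """Cosine similarity divided by (1 + Euclidean distance); NOT the patent's formula for M."""
    m = max(len(x), len(y))
    x = x + [0.0] * (m - len(x))  # zero-pad so both vectors have m components (assumption)
    y = y + [0.0] * (m - len(y))
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y) if norm_x and norm_y else 0.0
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return cosine / (1.0 + distance)

current = [["credit", "limit", "adjust"], ["temporary"]]
historical = [["credit", "limit", "adjust"], ["credit"]]
X = comprehensive_vector(current)
Y = comprehensive_vector(historical)
print(len(X), round(stand_in_similarity(X, X), 6))  # 8 1.0
```

Identical dialogues score 1; any angular or numerical difference lowers the score, so the most similar historical dialogue can then be chosen by argmax as in S4032.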
In one embodiment, after step S5 of determining whether the p predicted dialogue states are the same, the method includes:
S51. If the p predicted dialogue states are not all the same, divide them into multiple groups such that each group contains only one kind of predicted dialogue state;
S52. From the groups, obtain the first group, namely the group with the most members, and update the current state of the multi-round dialogue to the predicted dialogue state of the first group;
S53. From the groups, obtain the second group, namely the group with the fewest members, and remove the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
As described above, the current state of the multi-round dialogue is updated to the predicted dialogue state of the first group, and the dialogue state prediction tools of the second group are removed from the p tools. Ideally all p predicted dialogue states are the same, but in practice the p tools differ in accuracy, so the p predictions may well disagree. In that case, this application divides the p predicted dialogue states into groups. The first group, having the most members, indicates that most of the tools agree on its predicted dialogue state, so the current state of the multi-round dialogue is updated to that state. In addition, to maintain the prediction accuracy of the p tools, the tools of the second group are removed, so that the relatively more accurate tools carry more relative weight in the next prediction, improving the accuracy of later dialogue state predictions.
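Steps S51 to S53 can be sketched with `collections.Counter`, where each distinct predicted state forms one group. The mapping from tool names to predictions is hypothetical, and because the source does not say how ties between equally sized groups are resolved, the sketch relies on `Counter.most_common` order (equal counts keep first-seen order).

```python
from collections import Counter
from typing import Dict, Tuple

def resolve_disagreement(predictions: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Apply S51 to S53 to a mapping of tool name -> predicted dialogue state.

    Returns the new current state (largest group) and the tools that remain
    after the tools of the smallest group are removed.
    """
    groups = Counter(predictions.values())  # S51: one group per distinct state
    ranked = groups.most_common()           # largest group first
    if len(ranked) == 1:
        return ranked[0][0], dict(predictions)  # all tools agree; nothing to remove
    majority_state = ranked[0][0]   # S52: first group
    minority_state = ranked[-1][0]  # S53: second group
    remaining = {tool: s for tool, s in predictions.items() if s != minority_state}
    return majority_state, remaining

state, tools = resolve_disagreement({"rules": "T5", "knowledge_base": "T5", "model": "T4"})
print(state, sorted(tools))  # T5 ['knowledge_base', 'rules']
```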
Referring to FIG. 2, an embodiment of the present application provides a multi-round dialogue device based on dialogue state prediction, including:
An (i+1)th round speech acquisition unit 10, configured to acquire the (i+1)th round of speech input by the user after i rounds of dialogue with the user, where i is an integer greater than 1;
An (i+1)th round text acquisition unit 20, configured to perform speech recognition on the (i+1)th round of speech according to a preset speech recognition method, to obtain the (i+1)th round of text;
A dialogue state generation condition judging unit 30, configured to determine whether the (i+1)th round of text triggers a preset dialogue state generation condition;
A predicted dialogue state acquisition unit 40, configured to, if the (i+1)th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding respectively to the p tools, where p is an integer greater than 1 and the preceding information includes at least the 1st, ..., ith rounds of text;
A predicted dialogue state judging unit 50, configured to determine whether the p predicted dialogue states are the same;
An (i+1)th round reply voice acquisition unit 60, configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state, and obtain the (i+1)th round reply voice according to a preset correspondence between dialogue states and reply voices;
An (i+1)th round reply voice output unit 70, configured to output the (i+1)th round reply voice using a preset voice output device.
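Units 30 to 70 can be wired together for a single round as in the following hypothetical Python skeleton. Speech recognition (unit 20) and the voice output device are out of scope, word segmentation is replaced by whitespace splitting, and each prediction tool is assumed to be a plain callable over the dialogue history; disagreement between tools is left to steps S51 to S53.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rule = Tuple[str, str]  # (reply text, jump state)
Predictor = Callable[[List[str]], str]

class MultiRoundDialogue:
    """Hypothetical one-round controller mirroring units 30 to 70."""

    def __init__(self, trigger_rules: Dict[Tuple[str, ...], Rule],
                 predictors: List[Predictor], replies: Dict[str, str]) -> None:
        self.trigger_rules = trigger_rules  # keyword combination -> (reply, jump state)
        self.predictors = predictors        # the p dialogue state prediction tools
        self.replies = replies              # dialogue state -> reply text
        self.history: List[str] = []        # text of rounds 1..i+1
        self.state: Optional[str] = None

    def handle_round(self, text: str) -> Optional[str]:
        """Return the reply to hand to the voice output device (unit 70), if any."""
        self.history.append(text)
        words = set(text.split())  # stand-in for word segmentation
        # Unit 30: does the text trigger a configured dialogue state generation condition?
        for keywords, (reply, jump_state) in self.trigger_rules.items():
            if set(keywords) <= words:
                self.state = jump_state
                return reply
        # Unit 40: each tool predicts a state from the current and preceding text.
        predicted = [predict(self.history) for predict in self.predictors]
        # Units 50 and 60: update the state only if all p predictions agree.
        if len(set(predicted)) == 1:
            self.state = predicted[0]
            return self.replies.get(self.state)
        return None  # disagreement: handled by steps S51 to S53

bot = MultiRoundDialogue(
    {("credit", "limit"): ("Do you need to adjust the temporary limit or the fixed limit?", "007")},
    [lambda history: "T3", lambda history: "T3"],
    {"T3": "How would you like to adjust the temporary limit?"},
)
print(bot.handle_round("adjust my credit limit"))
```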
The operations performed by each of the above units, subunits, modules, or submodules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiments, and are not repeated here.
In one embodiment, the dialogue state generation condition judging unit 30 includes:
A word segmentation subunit, configured to perform word segmentation on the (i+1)th round of text to obtain multiple keywords;
A configuration file judging subunit, configured to determine whether a keyword or a combination of keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;
A dialogue state generation condition judging subunit, configured to determine, if the keyword or the combination of keywords is recorded in the trigger condition part, that the (i+1)th round of text triggers the preset dialogue state generation condition;
The device includes:
A dialogue state update unit, configured to, if the (i+1)th round of text triggers the preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state and output the reply voice using a preset voice output device.
The operations performed by each of the above units, subunits, modules, or submodules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiments, and are not repeated here.
In one embodiment, the p dialogue state prediction tools include a designated dialogue state prediction tool that is connected in advance to a preset external knowledge base storing historical multi-round dialogues, and the predicted dialogue state acquisition unit 40 includes:
A first state chain generation subunit, configured to use the designated dialogue state prediction tool to generate a first state chain of the current multi-round dialogue based on the (i+1)th round of text and the preceding information corresponding to it;
A designated historical multi-round dialogue acquisition subunit, configured to obtain designated historical multi-round dialogues from the external knowledge base, where the second state chain of a designated historical multi-round dialogue contains the first state chain; "contains" means that every state node of the first state chain is also a state node of the second state chain, and the relationships between the state nodes of the first state chain are the same as those between the corresponding state nodes of the second state chain;
A designated historical multi-round dialogue count judging subunit, configured to determine whether the number of designated historical multi-round dialogues equals 1;
A designated state node acquisition subunit, configured to, if the number of designated historical multi-round dialogues equals 1, obtain the designated state node in the second state chain and record it as the predicted dialogue state of the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
The operations performed by each of the above units, subunits, modules, or submodules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiments, and are not repeated here.
In one embodiment, the device includes:
A similarity calculation unit, configured to, if the number of designated historical multi-round dialogues does not equal 1, calculate the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, obtaining one similarity value for each designated historical multi-round dialogue;
A predicted dialogue state marking unit, configured to obtain the designated state node of the designated historical multi-round dialogue with the largest similarity value and record it as the predicted dialogue state of the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
The operations performed by each of the above units, subunits, modules, or submodules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiments, and are not repeated here.
In one embodiment, the similarity calculation unit includes:
A first comprehensive vector X acquisition subunit, configured to query the preset word vector library to obtain i+1 first word vector sequences corresponding respectively to the user's 1st, ..., (i+1)th rounds of speech, and concatenate them in order to obtain the first comprehensive vector X;
A second comprehensive vector Y acquisition subunit, configured to query the preset word vector library to obtain i+1 second word vector sequences corresponding respectively to the user's 1st, ..., (i+1)th rounds of speech in the designated historical multi-round dialogue, and concatenate them in order to obtain the second comprehensive vector Y;
A similarity M calculation subunit, configured to calculate the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue according to the formula (not reproduced in this text), where X is the first comprehensive vector, Y is the second comprehensive vector, X_j is the j-th component of X, Y_j is the j-th component of Y, and X and Y each have m components.
The operations performed by each of the above units, subunits, modules, or submodules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiments, and are not repeated here.
In one embodiment, the device includes:
A grouping unit, configured to, if the p predicted dialogue states are not all the same, divide them into multiple groups such that each group contains only one kind of predicted dialogue state;
A first group acquisition unit, configured to obtain from the groups the first group, namely the group with the most members, and update the current state of the multi-round dialogue to the predicted dialogue state of the first group;
A second group acquisition unit, configured to obtain from the groups the second group, namely the group with the fewest members, and remove the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
The operations performed by each of the above units, subunits, modules, or submodules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiments, and are not repeated here.
Referring to FIG. 3, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in the figure. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data used by the multi-round dialogue method based on dialogue state prediction. The network interface of the computer device communicates with an external terminal over a network connection. When executed by the processor, the computer program implements a multi-round dialogue method based on dialogue state prediction.
The processor executes the above multi-round dialogue method based on dialogue state prediction, whose steps correspond one-to-one to those of the method in the foregoing embodiments and are not repeated here. The method includes: after i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1; performing speech recognition on the (i+1)th round of speech according to a preset speech recognition method to obtain the (i+1)th round of text; determining whether the (i+1)th round of text triggers a preset dialogue state generation condition; if it does not, using p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding respectively to the p tools, where p is an integer greater than 1 and the preceding information includes at least the 1st, ..., ith rounds of text; determining whether the p predicted dialogue states are the same; if they are, updating the current state of the multi-round dialogue to the predicted dialogue state and obtaining the (i+1)th round reply voice according to a preset correspondence between dialogue states and reply voices; and outputting the (i+1)th round reply voice using a preset voice output device.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program; the storage medium may be volatile or non-volatile. When executed by a processor, the computer program implements the multi-round dialogue method based on dialogue state prediction, whose steps correspond one-to-one to those of the method in the foregoing embodiments and are not repeated here. The method includes: after i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1; performing speech recognition on the (i+1)th round of speech according to a preset speech recognition method to obtain the (i+1)th round of text; determining whether the (i+1)th round of text triggers a preset dialogue state generation condition; if it does not, using p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding respectively to the p tools, where p is an integer greater than 1 and the preceding information includes at least the 1st, ..., ith rounds of text; determining whether the p predicted dialogue states are the same; if they are, updating the current state of the multi-round dialogue to the predicted dialogue state and obtaining the (i+1)th round reply voice according to a preset correspondence between dialogue states and reply voices; and outputting the (i+1)th round reply voice using a preset voice output device.
Claims (20)
- A multi-round dialogue method based on dialogue state prediction, comprising:
  after i rounds of dialogue with a user, acquiring the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
  performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text;
  determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
  if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states respectively corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;
  determining whether the p predicted dialogue states are identical;
  if the p predicted dialogue states are identical, updating the current state of the multi-round dialogue to the predicted dialogue state, and acquiring the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech; and
  outputting the (i+1)-th round of reply speech through a preset speech output device.
- The multi-round dialogue method based on dialogue state prediction according to claim 1, wherein the step of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition comprises:
  performing word segmentation on the (i+1)-th round of text to obtain a plurality of keywords;
  determining whether the keywords, or a combination of the keywords, are recorded in a preset configuration file, the configuration file recording trigger conditions, reply speech and jump states; and
  if the keywords or the combination of the keywords are recorded in the trigger condition part, determining that the (i+1)-th round of text triggers the preset dialogue state generation condition;
  and wherein, after the step of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition, the method comprises:
  if the (i+1)-th round of text triggers the preset dialogue state generation condition, updating the current state of the multi-round dialogue to the jump state, and outputting the reply speech through the preset speech output device.
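The keyword-trigger check of this claim can be sketched as follows. The structure of the configuration file is not given beyond "trigger conditions, reply speech and jump states", so the `TriggerRule` record, the whitespace segmenter (a stand-in for real Chinese word segmentation), the first-match rule order, and all names and sample rules are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class TriggerRule:
    """One entry of a hypothetical configuration file."""
    trigger: FrozenSet[str]  # a single keyword or a combination of keywords
    reply: str               # reply text, later synthesized as speech
    jump_state: str          # state the dialogue jumps to when the rule fires


def segment(text: str) -> List[str]:
    # Placeholder segmenter: lowercase and split on whitespace.
    # Chinese input would need a real word segmenter instead.
    return text.lower().split()


def check_trigger(text: str, rules: List[TriggerRule]) -> Optional[Tuple[str, str]]:
    """Return (jump_state, reply) for the first rule whose keywords all occur, else None."""
    keywords = set(segment(text))
    for rule in rules:
        if rule.trigger <= keywords:  # subset test covers single keywords and combinations
            return rule.jump_state, rule.reply
    return None


rules = [
    TriggerRule(frozenset({"cancel", "order"}), "Your order has been cancelled.", "ORDER_CANCELLED"),
    TriggerRule(frozenset({"agent"}), "Transferring you to an agent.", "TRANSFER"),
]
print(check_trigger("Please cancel my order", rules))  # ('ORDER_CANCELLED', 'Your order has been cancelled.')
print(check_trigger("What time is it", rules))         # None
```

When a rule fires, the dialogue skips the p predictors entirely and jumps to the configured state.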
- The multi-round dialogue method based on dialogue state prediction according to claim 1, wherein the p dialogue state prediction tools include a designated dialogue state prediction tool that is connected in advance to a preset external knowledge base storing historical multi-round dialogues, and the step of performing dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it comprises:
  using the designated dialogue state prediction tool to generate a first state chain of the current multi-round dialogue based on the (i+1)-th round of text and the preceding information corresponding to it;
  acquiring designated historical multi-round dialogues from the external knowledge base, where the second state chain of each designated historical multi-round dialogue contains the first state chain; here, "the second state chain contains the first state chain" means that every state node in the first state chain is a state node of the second state chain, and the node relationships among all state nodes in the first state chain are the same as those among the corresponding state nodes in the second state chain;
  determining whether the number of designated historical multi-round dialogues is equal to 1; and
  if the number of designated historical multi-round dialogues is equal to 1, acquiring a designated state node in the second state chain and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
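The chain-matching step of this claim can be sketched as below, assuming (this is not stated in the patent) that a state chain is a linear list of state names. Under that assumption, "contains" means the first chain appears as a contiguous run in the second chain (same nodes, same adjacency), and the node "directly connected" to the first chain is the one immediately after that run. The function names and state labels are hypothetical.

```python
from typing import List, Optional, Sequence


def next_node_after(first: Sequence[str], second: Sequence[str]) -> Optional[str]:
    """If `first` occurs in `second` as a contiguous run that is followed by another node,
    return that following node (the node directly connected to the first chain)."""
    n = len(first)
    for start in range(len(second) - n):  # stop early so a following node always exists
        if list(second[start:start + n]) == list(first):
            return second[start + n]
    return None


def predict_from_history(first_chain: Sequence[str],
                         history_chains: List[Sequence[str]]) -> Optional[str]:
    """Return the predicted state only when exactly one historical chain matches."""
    candidates = [node for node in (next_node_after(first_chain, chain) for chain in history_chains)
                  if node is not None]
    if len(candidates) == 1:
        return candidates[0]
    return None  # zero or several matches; several matches go to the similarity step


history = [
    ["GREET", "ASK_DISH", "ASK_DATE", "CONFIRM"],
    ["GREET", "COMPLAIN", "APOLOGIZE"],
]
print(predict_from_history(["GREET", "ASK_DISH"], history))  # ASK_DATE
```

Historical chains that end exactly where the match ends have no next node and are skipped in this sketch.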
- The multi-round dialogue method based on dialogue state prediction according to claim 3, wherein, after the step of determining whether the number of designated historical multi-round dialogues is equal to 1, the method comprises:
  if the number of designated historical multi-round dialogues is not equal to 1, calculating the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, so as to obtain a similarity value for each designated historical multi-round dialogue; and
  acquiring the designated state node of the designated historical multi-round dialogue with the largest similarity value, and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
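Selecting among several matching historical dialogues reduces to an arg-max over similarity values. In this sketch the similarity function is passed in. `word_overlap` (a Jaccard overlap of words) is only a toy stand-in so the example runs, not the patent's measure. All names and sample data are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (user utterances of one designated historical dialogue, its node after the first chain)
Candidate = Tuple[Sequence[str], str]
Similarity = Callable[[Sequence[str], Sequence[str]], float]


def pick_by_similarity(current: Sequence[str], candidates: List[Candidate],
                       similarity: Similarity) -> Optional[str]:
    """Return the next state of the historical dialogue most similar to the current one."""
    if not candidates:
        return None
    _, best_state = max(candidates, key=lambda c: similarity(current, c[0]))
    return best_state  # on ties, max() keeps the first candidate


def word_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    # Toy stand-in measure (Jaccard overlap of words), not the patent's formula.
    sa, sb = set(" ".join(a).split()), set(" ".join(b).split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


current = ["book a table", "tomorrow evening"]
candidates = [
    (["book a table", "for two"], "ASK_TIME"),
    (["book a table", "tomorrow evening please"], "ASK_PARTY_SIZE"),
]
print(pick_by_similarity(current, candidates, word_overlap))  # ASK_PARTY_SIZE
```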
- The multi-round dialogue method based on dialogue state prediction according to claim 4, wherein the step of calculating the similarity between a designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method comprises:
  querying a general-purpose word vector library to obtain i+1 first word vector sequences respectively corresponding to the first, ..., (i+1)-th rounds of speech input by the user, and concatenating the i+1 first word vector sequences in order to obtain a first comprehensive vector X;
  querying the general-purpose word vector library to obtain i+1 second word vector sequences respectively corresponding to the first, ..., (i+1)-th rounds of speech input by the user in the designated historical multi-round dialogue, and concatenating the i+1 second word vector sequences in order to obtain a second comprehensive vector Y; and
  calculating the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue according to the formula [formula not reproduced], where X is the first comprehensive vector, Y is the second comprehensive vector, X_j is the j-th component vector of the first comprehensive vector, Y_j is the j-th component vector of the second comprehensive vector, and the first and second comprehensive vectors each have m component vectors.
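The similarity formula itself is missing from this text. Its variable definitions (component vectors X_j and Y_j, j = 1..m) are consistent with a cosine similarity, so this sketch assumes M = Σ_j X_j·Y_j / (‖X‖ ‖Y‖). Treat that as an assumption, not the claimed formula. The sketch also assumes the shorter sequence is zero-padded to m components and that unknown words map to zero vectors. It uses a tiny hypothetical word-vector table, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def comprehensive_vector(utterances: Sequence[str], word_vectors: Dict[str, Vector],
                         dim: int) -> List[Vector]:
    """Concatenate the word-vector sequences of rounds 1..i+1 in order."""
    seq: List[Vector] = []
    for utterance in utterances:
        for word in utterance.split():  # placeholder segmentation
            seq.append(word_vectors.get(word, [0.0] * dim))  # unknown word -> zero vector
    return seq


def dialogue_similarity(x: List[Vector], y: List[Vector], dim: int) -> float:
    """Cosine similarity over component vectors X_j, Y_j, j = 1..m (assumed formula)."""
    m = max(len(x), len(y))
    pad = [0.0] * dim
    x = x + [pad] * (m - len(x))  # zero-pad so both have m component vectors
    y = y + [pad] * (m - len(y))
    dot = sum(a * b for xj, yj in zip(x, y) for a, b in zip(xj, yj))
    norm_x = math.sqrt(sum(a * a for xj in x for a in xj))
    norm_y = math.sqrt(sum(b * b for yj in y for b in yj))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


vecs = {"book": [1.0, 0.0], "table": [0.0, 1.0], "tomorrow": [1.0, 1.0]}
X = comprehensive_vector(["book table"], vecs, 2)
Y = comprehensive_vector(["tomorrow"], vecs, 2)
print(round(dialogue_similarity(X, X, 2), 6))  # 1.0
print(round(dialogue_similarity(X, Y, 2), 6))  # 0.5
```

Because the vectors are concatenated round by round, this position-wise comparison rewards dialogues whose user turns use similar words in a similar order.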
- The multi-round dialogue method based on dialogue state prediction according to claim 1, wherein, after the step of determining whether the p predicted dialogue states are identical, the method comprises:
  if the p predicted dialogue states are not all identical, dividing the p predicted dialogue states into a plurality of groups, each group containing only one kind of predicted dialogue state;
  acquiring, from the plurality of groups, a first group having the most members, and updating the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group; and
  acquiring, from the plurality of groups, a second group having the fewest members, and deleting the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
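The majority vote and pruning in this claim can be sketched with `collections.Counter`. The claim does not say how ties are broken. This sketch makes three assumptions: the first-seen state wins among equally large groups, only one smallest group is pruned, and nothing is pruned when all groups have the same size. Tool names and states are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def vote_and_prune(tools: Sequence[str], predictions: Sequence[str]) -> Tuple[str, List[str]]:
    """Group identical predictions, adopt the largest group's state,
    and drop the tools whose prediction formed the smallest group."""
    ranked = Counter(predictions).most_common()  # largest group first
    winner, top = ranked[0]
    loser, bottom = ranked[-1]
    if top == bottom:
        # All groups are the same size; the claim does not cover this, so nothing is pruned.
        return winner, list(tools)
    kept = [tool for tool, pred in zip(tools, predictions) if pred != loser]
    return winner, kept


tools = ["rules", "knowledge_base", "model_a", "model_b"]
predictions = ["ASK_DATE", "ASK_DATE", "ASK_DATE", "CONFIRM"]
print(vote_and_prune(tools, predictions))  # ('ASK_DATE', ['rules', 'knowledge_base', 'model_a'])
```

Pruning the minority tools means the pool of predictors shrinks over the conversation, so repeatedly dissenting tools stop influencing later rounds.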
- A multi-round dialogue device based on dialogue state prediction, comprising:
  an (i+1)-th round speech acquisition unit, configured to acquire the (i+1)-th round of speech input by a user after i rounds of dialogue with the user, where i is an integer greater than 1;
  an (i+1)-th round text acquisition unit, configured to perform speech recognition on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text;
  a dialogue state generation condition judgment unit, configured to determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
  a predicted dialogue state acquisition unit, configured to, if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states respectively corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;
  a predicted dialogue state judgment unit, configured to determine whether the p predicted dialogue states are identical;
  an (i+1)-th round reply speech acquisition unit, configured to, if the p predicted dialogue states are identical, update the current state of the multi-round dialogue to the predicted dialogue state and acquire the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech; and
  an (i+1)-th round reply speech output unit, configured to output the (i+1)-th round of reply speech through a preset speech output device.
- The multi-round dialogue device based on dialogue state prediction according to claim 7, wherein the dialogue state generation condition judgment unit comprises:
  a word segmentation subunit, configured to perform word segmentation on the (i+1)-th round of text to obtain a plurality of keywords;
  a configuration file judgment subunit, configured to determine whether the keywords, or a combination of the keywords, are recorded in a preset configuration file, the configuration file recording trigger conditions, reply speech and jump states; and
  a dialogue state generation condition judgment subunit, configured to determine, if the keywords or the combination of the keywords are recorded in the trigger condition part, that the (i+1)-th round of text triggers the preset dialogue state generation condition;
  and the device further comprises:
  a dialogue state update unit, configured to, if the (i+1)-th round of text triggers the preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state and output the reply speech through the preset speech output device.
- A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a multi-round dialogue method based on dialogue state prediction, the method comprising:
  after i rounds of dialogue with a user, acquiring the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
  performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text;
  determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
  if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states respectively corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;
  determining whether the p predicted dialogue states are identical;
  if the p predicted dialogue states are identical, updating the current state of the multi-round dialogue to the predicted dialogue state, and acquiring the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech; and
  outputting the (i+1)-th round of reply speech through a preset speech output device.
- The computer device according to claim 9, wherein the step of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition comprises:
  performing word segmentation on the (i+1)-th round of text to obtain a plurality of keywords;
  determining whether the keywords, or a combination of the keywords, are recorded in a preset configuration file, the configuration file recording trigger conditions, reply speech and jump states; and
  if the keywords or the combination of the keywords are recorded in the trigger condition part, determining that the (i+1)-th round of text triggers the preset dialogue state generation condition;
  and wherein, after the step of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition, the method comprises:
  if the (i+1)-th round of text triggers the preset dialogue state generation condition, updating the current state of the multi-round dialogue to the jump state, and outputting the reply speech through the preset speech output device.
- The computer device according to claim 9, wherein the p dialogue state prediction tools include a designated dialogue state prediction tool that is connected in advance to a preset external knowledge base storing historical multi-round dialogues, and the step of performing dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it comprises:
  using the designated dialogue state prediction tool to generate a first state chain of the current multi-round dialogue based on the (i+1)-th round of text and the preceding information corresponding to it;
  acquiring designated historical multi-round dialogues from the external knowledge base, where the second state chain of each designated historical multi-round dialogue contains the first state chain; here, "the second state chain contains the first state chain" means that every state node in the first state chain is a state node of the second state chain, and the node relationships among all state nodes in the first state chain are the same as those among the corresponding state nodes in the second state chain;
  determining whether the number of designated historical multi-round dialogues is equal to 1; and
  if the number of designated historical multi-round dialogues is equal to 1, acquiring a designated state node in the second state chain and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
- The computer device according to claim 11, wherein, after the step of determining whether the number of designated historical multi-round dialogues is equal to 1, the method comprises:
  if the number of designated historical multi-round dialogues is not equal to 1, calculating the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, so as to obtain a similarity value for each designated historical multi-round dialogue; and
  acquiring the designated state node of the designated historical multi-round dialogue with the largest similarity value, and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
- The computer device according to claim 12, wherein the step of calculating the similarity between a designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method comprises:
  querying a general-purpose word vector library to obtain i+1 first word vector sequences respectively corresponding to the first, ..., (i+1)-th rounds of speech input by the user, and concatenating the i+1 first word vector sequences in order to obtain a first comprehensive vector X;
  querying the general-purpose word vector library to obtain i+1 second word vector sequences respectively corresponding to the first, ..., (i+1)-th rounds of speech input by the user in the designated historical multi-round dialogue, and concatenating the i+1 second word vector sequences in order to obtain a second comprehensive vector Y; and
  calculating the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue according to the formula [formula not reproduced], where X is the first comprehensive vector, Y is the second comprehensive vector, X_j is the j-th component vector of the first comprehensive vector, Y_j is the j-th component vector of the second comprehensive vector, and the first and second comprehensive vectors each have m component vectors.
- The computer device according to claim 9, wherein, after the step of determining whether the p predicted dialogue states are identical, the method comprises:
  if the p predicted dialogue states are not all identical, dividing the p predicted dialogue states into a plurality of groups, each group containing only one kind of predicted dialogue state;
  acquiring, from the plurality of groups, a first group having the most members, and updating the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group; and
  acquiring, from the plurality of groups, a second group having the fewest members, and deleting the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
- A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a multi-round dialogue method based on dialogue state prediction, the method comprising:
  after i rounds of dialogue with a user, acquiring the (i+1)-th round of speech input by the user, where i is an integer greater than 1;
  performing speech recognition on the (i+1)-th round of speech according to a preset speech recognition method to obtain the (i+1)-th round of text;
  determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition;
  if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it, so as to obtain p predicted dialogue states respectively corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;
  determining whether the p predicted dialogue states are identical;
  if the p predicted dialogue states are identical, updating the current state of the multi-round dialogue to the predicted dialogue state, and acquiring the (i+1)-th round of reply speech according to a preset correspondence between dialogue states and reply speech; and
  outputting the (i+1)-th round of reply speech through a preset speech output device.
- The computer-readable storage medium according to claim 15, wherein the step of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition comprises:
  performing word segmentation on the (i+1)-th round of text to obtain a plurality of keywords;
  determining whether the keywords, or a combination of the keywords, are recorded in a preset configuration file, the configuration file recording trigger conditions, reply speech and jump states; and
  if the keywords or the combination of the keywords are recorded in the trigger condition part, determining that the (i+1)-th round of text triggers the preset dialogue state generation condition;
  and wherein, after the step of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition, the method comprises:
  if the (i+1)-th round of text triggers the preset dialogue state generation condition, updating the current state of the multi-round dialogue to the jump state, and outputting the reply speech through the preset speech output device.
- The computer-readable storage medium according to claim 15, wherein the p dialogue state prediction tools include a designated dialogue state prediction tool that is connected in advance to a preset external knowledge base storing historical multi-round dialogues, and the step of performing dialogue state prediction based on the (i+1)-th round of text and the preceding information corresponding to it comprises:
  using the designated dialogue state prediction tool to generate a first state chain of the current multi-round dialogue based on the (i+1)-th round of text and the preceding information corresponding to it;
  acquiring designated historical multi-round dialogues from the external knowledge base, where the second state chain of each designated historical multi-round dialogue contains the first state chain; here, "the second state chain contains the first state chain" means that every state node in the first state chain is a state node of the second state chain, and the node relationships among all state nodes in the first state chain are the same as those among the corresponding state nodes in the second state chain;
  determining whether the number of designated historical multi-round dialogues is equal to 1; and
  if the number of designated historical multi-round dialogues is equal to 1, acquiring a designated state node in the second state chain and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
- The computer-readable storage medium according to claim 17, wherein, after the step of determining whether the number of designated historical multi-round dialogues is equal to 1, the method comprises:
  if the number of designated historical multi-round dialogues is not equal to 1, calculating the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, so as to obtain a similarity value for each designated historical multi-round dialogue; and
  acquiring the designated state node of the designated historical multi-round dialogue with the largest similarity value, and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
- The computer-readable storage medium according to claim 18, wherein the step of calculating the similarity between a designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method comprises:
  querying a general-purpose word vector library to obtain i+1 first word vector sequences respectively corresponding to the first, ..., (i+1)-th rounds of speech input by the user, and concatenating the i+1 first word vector sequences in order to obtain a first comprehensive vector X;
  querying the general-purpose word vector library to obtain i+1 second word vector sequences respectively corresponding to the first, ..., (i+1)-th rounds of speech input by the user in the designated historical multi-round dialogue, and concatenating the i+1 second word vector sequences in order to obtain a second comprehensive vector Y; and
  calculating the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue according to the formula [formula not reproduced], where X is the first comprehensive vector, Y is the second comprehensive vector, X_j is the j-th component vector of the first comprehensive vector, Y_j is the j-th component vector of the second comprehensive vector, and the first and second comprehensive vectors each have m component vectors.
- 16. The computer-readable storage medium according to claim 15, wherein after the step of judging whether the p predicted dialogue states are the same, the steps comprise: if the p predicted dialogue states are not all the same, dividing the p predicted dialogue states into a plurality of groups, where each group contains only one distinct predicted dialogue state; obtaining, from the plurality of groups, the first group, which has the most members, and updating the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group; and obtaining, from the plurality of groups, the second group, which has the fewest members, and deleting the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
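The maximum-similarity selection in the claim that depends on claim 17 can be sketched as follows. This is a hedged illustration, not the patent's implementation: `HistoricalDialogue`, `next_state`, and `predict_state` are hypothetical names, the similarity function is passed in because the claim only calls it "preset", and the single-match branch is an assumption about the step that precedes this claim.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Rounds = Sequence[Sequence[str]]  # segmented user utterances, one token list per round


@dataclass
class HistoricalDialogue:
    """Hypothetical record of a stored multi-round dialogue."""
    user_rounds: List[List[str]]  # user utterances of the stored dialogue, round by round
    next_state: str  # state node directly connected to the matched first state chain


def predict_state(
    current_rounds: Rounds,
    matches: Sequence[HistoricalDialogue],
    similarity: Callable[[Rounds, Rounds], float],
) -> str:
    """Return the predicted dialogue state from the matching historical dialogues."""
    if not matches:
        raise ValueError("no designated historical dialogue matched the state chain")
    if len(matches) == 1:
        # Single match (assumed handling of the preceding step): use its next state node.
        return matches[0].next_state
    # Several matches: compare the current rounds 1..i+1 with the same rounds of
    # each historical dialogue and keep the node of the most similar one.
    n = len(current_rounds)
    best = max(matches, key=lambda h: similarity(current_rounds, h.user_rounds[:n]))
    return best.next_state
```

Only the first i+1 user rounds of each stored dialogue are compared, mirroring the claim's comparison of rounds 1 through i+1; on equal scores, `max` keeps the first candidate, which is an arbitrary tie-break the claim does not specify.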
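The similarity calculation in the claim that depends on claim 18 builds X and Y by concatenating per-round word vectors. The claim's formula is not reproduced in the source text, so the sketch below substitutes cosine similarity as an assumption; when X_j and Y_j are the word vectors themselves, summing X_j·Y_j over j and normalizing gives the same value as the cosine of the flattened concatenations. The word vector library is modeled as a hypothetical dict, and skipping unknown tokens is also an assumption.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical word vector library: token -> embedding (all of the same dimension).
WordVectors = Dict[str, List[float]]


def comprehensive_vector(rounds: Sequence[Sequence[str]], lib: WordVectors) -> List[float]:
    """Concatenate the word vectors of rounds 1..i+1, in order, into one flat vector.

    Each round is assumed to be already segmented into tokens; tokens missing
    from the library are skipped (an assumption, the claim does not say).
    """
    out: List[float] = []
    for tokens in rounds:
        for tok in tokens:
            vec = lib.get(tok)
            if vec is not None:
                out.extend(vec)
    return out


def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity, used here as a stand-in for the unreproduced formula."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of components (m)")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

The length check reflects the claim's statement that X and Y both have m components; how the original method equalizes dialogues of different token lengths (padding, truncation) is not stated, so the sketch simply rejects mismatched inputs.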
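The grouping-and-voting step in the claim that depends on claim 15 amounts to a majority vote followed by pruning of the least-supported tools. A minimal sketch, assuming each tool is identified by a hypothetical name and that the unanimous case (handled in the preceding step) simply keeps every tool; the tie-breaking rules are also assumptions, since the claim does not address ties.

```python
from collections import Counter
from typing import Dict, Tuple


def vote_and_prune(predictions: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Majority-vote the tools' predicted states and drop the least-supported tools.

    `predictions` maps a hypothetical tool name to the dialogue state it predicted.
    Returns (new current state, predictions of the tools that are kept).
    """
    if not predictions:
        raise ValueError("at least one dialogue state prediction tool is required")
    # One group per distinct predicted state; the count is the group size.
    groups = Counter(predictions.values())
    if len(groups) == 1:
        # All p predictions agree (assumed handling of the preceding step).
        return next(iter(groups)), dict(predictions)
    # Largest group wins; on ties, the state seen first wins.
    winner, _ = groups.most_common(1)[0]
    # Smallest group among the remaining ones; its tools are removed.
    loser = min((s for s in groups if s != winner), key=groups.__getitem__)
    kept = {tool: s for tool, s in predictions.items() if s != loser}
    return winner, kept
```

For example, with six tools predicting `s1, s1, s2, s1, s3, s3`, the current state becomes `s1` and the single tool that predicted `s2` is removed. Excluding the winning group when choosing the smallest group prevents the sketch from deleting the tools that agreed with the new state when all groups are the same size.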
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010177686.7 | 2020-03-13 | ||
CN202010177686.7A CN111475616B (en) | 2020-03-13 | 2020-03-13 | Multi-round dialogue method and device based on dialogue state prediction and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021179445A1 | 2021-09-16 |
Family ID: 71748316
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/093426 WO2021179445A1 (en) | 2020-03-13 | 2020-05-29 | Conversation state prediction-based multi-round conversation method, device, and computer apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111475616B (en) |
WO (1) | WO2021179445A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112017663B (en) * | 2020-08-14 | 2024-04-30 | 博泰车联网(南京)有限公司 | Voice generalization method and device and computer storage medium |
CN112185391A (en) * | 2020-09-30 | 2021-01-05 | 深圳供电局有限公司 | Automatic modification processing method for customer service record |
CN112463939B (en) * | 2020-11-12 | 2024-05-24 | 深圳市欢太科技有限公司 | Man-machine conversation method, system, service equipment and computer storage medium |
CN112364147A (en) * | 2020-12-01 | 2021-02-12 | 四川长虹电器股份有限公司 | Cross-domain multi-turn dialogue method based on knowledge graph and implementation system |
CN113220858B (en) * | 2021-05-31 | 2023-10-27 | 平安科技(深圳)有限公司 | Dialogue system updating method, device, computer equipment and storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4408665B2 (en) * | 2003-08-11 | 2010-02-03 | 富士通株式会社 | Speech recognition apparatus for speech recognition, speech data collection method for speech recognition, and computer program |
CN106599196A (en) * | 2016-12-14 | 2017-04-26 | 竹间智能科技(上海)有限公司 | Artificial intelligence conversation method and system |
CN106796787A (en) * | 2014-05-20 | 2017-05-31 | 亚马逊技术有限公司 | Contextual interpretation in natural language processing using previous dialog acts |
CN107665704A (en) * | 2016-07-29 | 2018-02-06 | 科大讯飞股份有限公司 | Voice command detection model construction method, detection method and system, and human-computer interaction method and device |
CN109635085A (en) * | 2018-06-05 | 2019-04-16 | 安徽省泰岳祥升软件有限公司 | Management method of intelligent interaction process, and multi-turn conversation method and device |
CN110032633A (en) * | 2019-04-17 | 2019-07-19 | 腾讯科技(深圳)有限公司 | Multi-turn dialogue processing method, apparatus and device |
CN110050015A (en) * | 2016-11-04 | 2019-07-23 | Prc-迪索托国际公司 | Sulfur-containing poly(alkenyl) ethers, prepolymers incorporating sulfur-containing poly(alkenyl) ethers, and uses thereof |
CN110096567A (en) * | 2019-03-14 | 2019-08-06 | 中国科学院自动化研究所 | Multi-turn dialogue reply selection method and system based on QA knowledge base reasoning |
CN110309170A (en) * | 2019-07-02 | 2019-10-08 | 北京大学 | Method for recognizing complex intents in task-oriented multi-turn dialogue |
CN110442676A (en) * | 2019-07-02 | 2019-11-12 | 北京邮电大学 | Patent retrieval method and device based on multi-turn dialogue |
CN110704588A (en) * | 2019-09-04 | 2020-01-17 | 平安科技(深圳)有限公司 | Multi-round dialogue semantic analysis method and system based on long-term and short-term memory network |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9318109B2 (en) * | 2013-10-02 | 2016-04-19 | Microsoft Technology Licensing, Llc | Techniques for updating a partial dialog state |
CN107369443B (en) * | 2017-06-29 | 2020-09-25 | 北京百度网讯科技有限公司 | Dialog management method and device based on artificial intelligence |
CN109086329B (en) * | 2018-06-29 | 2021-01-05 | 出门问问信息科技有限公司 | Topic keyword guide-based multi-turn conversation method and device |
CN109460450B (en) * | 2018-09-27 | 2021-07-09 | 清华大学 | Dialog state tracking method and device, computer equipment and storage medium |
CN110287297A (en) * | 2019-05-22 | 2019-09-27 | 深圳壹账通智能科技有限公司 | Dialogue reply method, apparatus, computer equipment and computer-readable storage medium |
CN110209791B (en) * | 2019-06-12 | 2021-03-26 | 百融云创科技股份有限公司 | Multi-round dialogue intelligent voice interaction system and device |
2020
- 2020-03-13: CN application CN202010177686.7A (CN111475616B), status: active
- 2020-05-29: WO application PCT/CN2020/093426 (WO2021179445A1), status: active application filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115374266A (en) * | 2022-10-27 | 2022-11-22 | 深圳市人马互动科技有限公司 | Interaction method, device, equipment and storage medium based on plot interaction node |
CN115878775A (en) * | 2022-12-23 | 2023-03-31 | 北京百度网讯科技有限公司 | Method and device for generating cross-type dialogue data |
CN115878775B (en) * | 2022-12-23 | 2024-04-12 | 北京百度网讯科技有限公司 | Method and device for generating cross-type dialogue data |
Also Published As
Publication number | Publication date |
---|---|
CN111475616A (en) | 2020-07-31 |
CN111475616B (en) | 2023-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021179445A1 (en) | Conversation state prediction-based multi-round conversation method, device, and computer apparatus | |
US10424319B2 (en) | Assessing the structural quality of conversations | |
CN106297800B (en) | Self-adaptive voice recognition method and equipment | |
WO2022095380A1 (en) | Ai-based virtual interaction model generation method and apparatus, computer device and storage medium | |
CN110377632B (en) | Litigation result prediction method, litigation result prediction device, litigation result prediction computer device and litigation result prediction storage medium | |
CN109299245B (en) | Method and device for recalling knowledge points | |
US7292976B1 (en) | Active learning process for spoken dialog systems | |
CN104903954A (en) | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination | |
US11094326B2 (en) | Ensemble modeling of automatic speech recognition output | |
US11190641B1 (en) | Automated agent behavior recommendations for call quality improvement | |
US11289075B1 (en) | Routing of natural language inputs to speech processing applications | |
US11380315B2 (en) | Characterizing accuracy of ensemble models for automatic speech recognition by determining a predetermined number of multiple ASR engines based on their historical performance | |
CN110164416B (en) | Voice recognition method and device, equipment and storage medium thereof | |
JP6810580B2 (en) | Language model learning device and its program | |
GB2622755A (en) | Evaluating output sequences using an auto-regressive language model neural network | |
JP2020135689A (en) | Model learning system, intention interpretation system, method for learning model, and model learning program | |
US11024315B2 (en) | Characterizing accuracy of ensemble models for automatic speech recognition | |
US20240005912A1 (en) | Acoustic model for multilingual speech recognition | |
Gunasekara et al. | Quantized-dialog language model for goal-oriented conversational systems | |
JP7161974B2 (en) | Quality control method | |
CN116524926B (en) | Method for generating service form through voice control at mobile terminal | |
US11380308B1 (en) | Natural language processing | |
JPH11202886A (en) | Speech recognition device, word recognition device, word recognition method, and storage medium recorded with word recognition program | |
US11967319B2 (en) | Method and electronic device for processing a spoken utterance | |
US11568469B1 (en) | Systems and methods for generating recommendations based on multi-channel inputs |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20924343; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20924343; Country of ref document: EP; Kind code of ref document: A1 |