WO2021179445A1

WO2021179445A1 - Conversation state prediction-based multi-round conversation method, device, and computer apparatus

Info

Publication number: WO2021179445A1
Application number: PCT/CN2020/093426
Authority: WO
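Inventors: 吴信朝 (Wu Xinchao); 郜开开 (Gao Kaikai); 周宸 (Zhou Chen); 周宝 (Zhou Bao); 陈远旭 (Chen Yuanxu)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)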
Priority date: 2020-03-13
Filing date: 2020-05-29
Publication date: 2021-09-16
Also published as: CN111475616A; CN111475616B

Abstract

A conversation state prediction-based multi-round conversation method, a device, a computer apparatus, and a storage medium. The method comprises: acquiring, after an i-th conversation round with a user, (i+1)-th round speech data input by the user (S1); performing, according to a preset speech recognition method, speech recognition processing on the (i+1)-th round speech data to obtain (i+1)-th round text data (S2); determining whether the (i+1)-th round text data triggers a preset conversation state generation criterion (S3); if not, performing conversation state prediction by using p preset conversation state prediction tools to obtain p predicted conversation states (S4); determining whether the p predicted conversation states are the same (S5); if so, updating the current state of the multi-round conversation to the predicted conversation state, and acquiring (i+1)-th round response speech data according to a correspondence between conversation states and response speech data (S6); and outputting the (i+1)-th round response speech data by using a preset speech output device (S7). In this way, the invention improves the generalization ability of multi-round conversation solutions and ensures fluency.

Description

Multi-round dialogue method, device and computer equipment based on dialogue state prediction

This application claims priority to Chinese patent application No. 202010177686.7, filed with the Chinese Patent Office on March 13, 2020 and entitled "Multi-round dialogue method, device and computer equipment based on dialogue state prediction", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of artificial intelligence, and in particular to a multi-round dialogue method, device, computer equipment, and storage medium based on dialogue state prediction.

Background art

Multi-round dialogue technology enables rapid information exchange between humans and computers. A multi-round dialogue system includes modules such as speech recognition, language understanding, dialogue state maintenance, action candidate ranking, language generation, and speech synthesis. The response logic is mainly embodied in the dialogue state maintenance module: after receiving the output of the language understanding module, it decides which state the system should jump to. The inventor realized that the dialogue state maintenance module is generally configured with manual rules, but a module based on manual rules has no generalization ability: when the user inputs special information for which no manual rule has been set, the entire multi-round dialogue is interrupted. Therefore, traditional multi-round dialogue schemes generalize poorly, and smooth operation cannot be guaranteed.

Technical problem

The main purpose of this application is to provide a multi-round dialogue method, device, computer equipment and storage medium based on dialogue state prediction, aiming to improve the generalization ability of the multi-round dialogue scheme and ensure fluency.

Technical solutions

In order to achieve the above objective, this application proposes a multi-round dialogue method based on dialogue state prediction, which includes the following steps:

After i rounds of dialogue with the user, obtain the (i+1)-th round of voice input by the user, where i is an integer greater than 1;

According to a preset voice recognition method, perform voice recognition on the (i+1)-th round of voice to obtain the (i+1)-th round of text;

Determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition;

If the (i+1)-th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)-th round of text and its preceding information, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

Determine whether the p predicted dialogue states are the same;

If the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state, and obtain the (i+1)-th round of reply voice according to the preset correspondence between dialogue states and reply voices;

Use a preset voice output device to output the (i+1)-th round of reply voice.

This application provides a multi-round dialogue device based on dialogue state prediction, including:

An (i+1)-th round voice acquisition unit, configured to acquire the (i+1)-th round of voice input by the user after i rounds of dialogue with the user, where i is an integer greater than 1;

An (i+1)-th round text acquisition unit, configured to perform voice recognition on the (i+1)-th round of voice according to a preset voice recognition method, so as to obtain the (i+1)-th round of text;

A dialogue state generation condition determination unit, configured to determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition;

A predicted dialogue state acquisition unit, configured to, if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)-th round of text and its preceding information, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

A predicted dialogue state determination unit, configured to determine whether the p predicted dialogue states are the same;

An (i+1)-th round reply voice acquisition unit, configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state, and obtain the (i+1)-th round of reply voice according to the preset correspondence between dialogue states and reply voices;

An (i+1)-th round reply voice output unit, configured to use a preset voice output device to output the (i+1)-th round of reply voice.

The present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the multi-round dialogue method based on dialogue state prediction described above, from obtaining the (i+1)-th round of voice, through determining whether the p predicted dialogue states are the same, to outputting the (i+1)-th round of reply voice.

This application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-round dialogue method based on dialogue state prediction described above, including determining whether the p predicted dialogue states are the same.

Beneficial effect

With the multi-round dialogue method, device, computer equipment, and storage medium based on dialogue state prediction of the present application, after i rounds of dialogue with the user, the (i+1)-th round of voice input by the user is obtained; voice recognition is performed on it according to a preset voice recognition method to obtain the (i+1)-th round of text; whether the (i+1)-th round of text triggers a preset dialogue state generation condition is determined; if it does not, p preset dialogue state prediction tools are used to predict the dialogue state, yielding p predicted dialogue states; whether the p predicted dialogue states are the same is determined; if they are, the current state of the multi-round dialogue is updated to the predicted dialogue state, and the (i+1)-th round of reply voice is obtained according to the preset correspondence between dialogue states and reply voices; and the (i+1)-th round of reply voice is output with a preset voice output device. This improves the generalization ability of the multi-round dialogue scheme and ensures fluency. Two measures contribute: integrating p dialogue state prediction tools, which improves prediction accuracy, and predicting the dialogue state from the preceding information, so that the multi-round dialogue is analyzed as a whole, with more data and more accurate results. Together they make the data analysis more thorough and the scheme more adaptable (that is, better at generalizing), and they keep the dialogue fluent.

Description of the drawings

FIG. 1 is a schematic flowchart of a multi-round dialogue method based on dialogue state prediction according to an embodiment of this application;

FIG. 2 is a schematic block diagram of the structure of a multi-round dialogue device based on dialogue state prediction according to an embodiment of this application;

FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.

Best mode for carrying out this application

Referring to FIG. 1, an embodiment of the present application provides a multi-round dialogue method based on dialogue state prediction, including the following steps:

S1. After i rounds of dialogue with the user, obtain the (i+1)-th round of voice input by the user, where i is an integer greater than 1;

S2. According to a preset voice recognition method, perform voice recognition on the (i+1)-th round of voice to obtain the (i+1)-th round of text;

S3. Determine whether the (i+1)-th round of text triggers a preset dialogue state generation condition;

S4. If the (i+1)-th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)-th round of text and its preceding information, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

S5. Determine whether the p predicted dialogue states are the same;

S6. If the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state, and obtain the (i+1)-th round of reply voice according to the preset correspondence between dialogue states and reply voices;

S7. Use a preset voice output device to output the (i+1)-th round of reply voice.

In this application, when the multi-round dialogue gets stuck (that is, the (i+1)-th round of text does not trigger the preset dialogue state generation condition), a dedicated mechanism keeps the dialogue going: the p preset dialogue state prediction tools predict the dialogue state. This improves the generalization ability of the multi-round dialogue scheme and ensures smooth operation.
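The control flow of steps S1 to S7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the recognizer, trigger check, predictors, reply table, and output function are hypothetical callables supplied by the caller, and for brevity the triggered branch looks up its reply in the same state-to-reply table.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical signature for a dialogue state prediction tool:
# (current round text, preceding round texts) -> predicted state label.
Predictor = Callable[[str, Sequence[str]], str]


def dialogue_turn(
    speech: bytes,
    history: List[str],
    recognize: Callable[[bytes], str],
    triggered_state: Callable[[str], Optional[str]],
    predictors: Sequence[Predictor],
    reply_for_state: Dict[str, str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Handle the (i+1)-th round; return the new current state, or None if unresolved."""
    text = recognize(speech)                 # S2: speech -> text
    state = triggered_state(text)            # S3: configured trigger check
    if state is None:
        # S4: each predictor sees this round's text plus all preceding round texts.
        predicted = [predict(text, history) for predict in predictors]
        # S5: accept the prediction only if all p predictors agree.
        state = predicted[0] if len(set(predicted)) == 1 else None
    history.append(text)                     # this round becomes preceding information
    if state is None:
        return None                          # disagreement: see steps S51 to S53
    speak(reply_for_state[state])            # S6/S7: look up and output the reply
    return state
```

Passing the components in as callables keeps the sketch independent of any particular speech recognizer or prediction model.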

As described in step S1 above, after i rounds of dialogue with the user, the (i+1)-th round of voice input by the user is obtained, where i is an integer greater than 1. Because this application operates within a multi-round dialogue, it takes effect only after the first round of dialogue has taken place, which is why the input being processed is the (i+1)-th round of voice.

As described in step S2 above, voice recognition is performed on the (i+1)-th round of voice according to the preset voice recognition method to obtain the (i+1)-th round of text. Any feasible speech recognition method may be used; for example, an open-source speech recognition tool, such as Google's open-source Live Transcribe speech-to-text tool, can convert the speech into text.

As described in step S3 above, it is determined whether the (i+1)-th round of text triggers a preset dialogue state generation condition. The dialogue state generation conditions can be recorded in advance in a preset configuration file, such as a JSON configuration file, in which the trigger conditions correspond to the "trigger" part. When the intention expressed by the (i+1)-th round of text (for example, a keyword or a combination of keywords) is recorded in the "trigger" part, the (i+1)-th round of text is determined to trigger a preset dialogue state generation condition.

As described in step S4 above, if the (i+1)-th round of text does not trigger the preset dialogue state generation condition, the p preset dialogue state prediction tools are used to predict the dialogue state based on the (i+1)-th round of text and its preceding information, yielding p predicted dialogue states corresponding to the p tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text. A dialogue state prediction tool may be any feasible tool, for example one trained as a neural network model or one based on an external knowledge base. Because the (i+1)-th round of text does not trigger any preset dialogue state generation condition, the multi-round dialogue cannot be maintained by the original rules. The p dialogue state prediction tools therefore bridge the break: they predict the dialogue state so that the multi-round dialogue can continue. In traditional solutions, when the (i+1)-th round of text triggers no preset dialogue state generation condition, the multi-round dialogue is forced to end or restart, which hinders its smooth operation. Here, a dialogue state is a data structure containing the dialogue history from time 0 to time t (for example, the current time). A predicted dialogue state is, for example, M1-M2-M3, where M1-M2 is the dialogue history (that is, two rounds of dialogue have already occurred, including data such as the user's inputs and the replies) and M3 is the new part of the predicted dialogue state.
Furthermore, a dialogue state may carry fluency and quality labels, such as smooth or unsmooth, or good, excellent, or poor dialogue quality, which makes the data more accurate and helps achieve accurate dialogue state prediction.
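A dialogue state such as M1-M2-M3 with optional fluency and quality labels could be modeled as below. The class name `DialogueState`, its fields, and the choice to leave labels unset on a new prediction are illustrative assumptions; the application only describes the state as a data structure holding the dialogue history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogueState:
    """Hypothetical dialogue state: the history chain plus optional labels."""

    chain: List[str] = field(default_factory=list)  # e.g. ["M1", "M2"]
    fluency: Optional[str] = None                    # e.g. "smooth" or "unsmooth"
    quality: Optional[str] = None                    # e.g. "good", "excellent", "poor"

    def extend(self, new_node: str) -> "DialogueState":
        """Return a predicted state: the existing history plus one new node.

        Labels are left unset on the prediction (an assumption of this sketch)."""
        return DialogueState(chain=self.chain + [new_node])

    def label(self) -> str:
        """Render the chain in the M1-M2-M3 notation used in the text."""
        return "-".join(self.chain)


history = DialogueState(["M1", "M2"], fluency="smooth")
predicted = history.extend("M3")
print(predicted.label())
```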

As described in step S5 above, it is determined whether the p predicted dialogue states are the same. If they are, all dialogue state prediction tools predicted the same dialogue state, so that state is taken as the final dialogue state, and the current state of the multi-round dialogue should be updated to it.

As described in step S6 above, if the p predicted dialogue states are the same, the current state of the multi-round dialogue is updated to the predicted dialogue state, and the (i+1)-th round of reply voice is obtained according to the preset correspondence between dialogue states and reply voices. Once the current state has been updated to the predicted dialogue state, the computer has understood the (i+1)-th round of voice input by the user and should output the corresponding reply voice. Because this application presets the correspondence between dialogue states and reply voices, the (i+1)-th round of reply voice can be obtained accurately.

As described in step S7 above, the preset voice output device is used to output the (i+1)-th round of reply voice. The voice output device is, for example, a speaker or a sound box. Outputting the (i+1)-th round of reply voice keeps the multi-round dialogue going and gives the user the opportunity to conduct the (i+2)-th round of dialogue.

In one embodiment, step S3 of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition includes:

S301. Perform word segmentation on the (i+1)-th round of text to obtain multiple keywords;

S302. Determine whether a keyword or a combination of the keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;

S303. If a keyword or a combination of the keywords is recorded in the trigger condition part, determine that the (i+1)-th round of text triggers a preset dialogue state generation condition.

After step S3 of determining whether the (i+1)-th round of text triggers a preset dialogue state generation condition, the method includes:

S31. If the (i+1)-th round of text triggers a preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state, and use a preset voice output device to output the reply voice.

As described above, whether the (i+1)-th round of text triggers the preset dialogue state generation condition is determined by means of a configuration file. The configuration file is, for example, a JSON configuration file in which the trigger condition, reply content, and jump state correspond to the "trigger", "output", and "state" parts, respectively. Take credit card limit adjustment in the banking sector as an example. When the user asks about "credit card limit adjustment", the limit-adjustment intention is triggered (for example, the "trigger" part of the configuration file records the combination of "credit card" and "limit adjustment"). The system therefore answers "Do you need to adjust the temporary limit or the fixed limit?" (for example, the "output" part of the configuration file records this sentence), and since the "state" part records 007, the current state is updated to 007. This completes the (i+1)-th round of dialogue. Because a dialogue state generation condition has been triggered, the multi-round dialogue proceeds successfully without a dialogue state prediction tool.
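The configuration-based trigger check of S301 to S303 and S31, applied to the credit-limit example, might look like the sketch below. The rule layout follows the "trigger"/"output"/"state" parts described above; everything else is assumed. In particular, `segment` is a hypothetical stand-in for word segmentation that simply looks for known vocabulary terms in the text; a real system working on Chinese text would use a proper segmenter.

```python
import json
from typing import List, Optional, Set, Tuple

# Hypothetical configuration using the "trigger" / "output" / "state" parts.
CONFIG_JSON = """
[
  {"trigger": ["credit card", "limit adjustment"],
   "output": "Do you need to adjust the temporary limit or the fixed limit?",
   "state": "007"}
]
"""


def segment(text: str, vocabulary: Set[str]) -> Set[str]:
    """Hypothetical stand-in for S301: return the vocabulary terms found in the text."""
    lowered = text.lower()
    return {term for term in vocabulary if term in lowered}


def match_trigger(text: str, rules: List[dict]) -> Optional[Tuple[str, str]]:
    """S302/S303: return (output, state) of the first rule whose keywords all occur."""
    vocabulary = {kw for rule in rules for kw in rule["trigger"]}
    keywords = segment(text, vocabulary)
    for rule in rules:
        if set(rule["trigger"]) <= keywords:  # the keyword combination is present
            return rule["output"], rule["state"]
    return None  # not triggered: fall through to the S4 prediction tools


rules = json.loads(CONFIG_JSON)
print(match_trigger("I want a credit card limit adjustment", rules))
```

On a match, the caller would set the current state to the returned state (007 here) and speak the returned output, as in S31.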

In one embodiment, the p dialogue state prediction tools include a designated dialogue state prediction tool that is connected in advance to a preset external knowledge base storing multiple historical multi-round dialogues, and step S4 of predicting the dialogue state based on the (i+1)-th round of text and its preceding information includes:

S401. Using the designated dialogue state prediction tool, generate a first state chain of the current multi-round dialogue according to the (i+1)-th round of text and its preceding information;

S402. Obtain a designated historical multi-round dialogue from the external knowledge base, where the second state chain of the designated historical multi-round dialogue includes the first state chain. Here, "the second state chain includes the first state chain" means that all state nodes of the first state chain are state nodes of the second state chain, and the relationships among those nodes in the first state chain are the same as the relationships among the corresponding nodes in the second state chain;

S403. Determine whether the number of designated historical multi-round dialogues is equal to 1;

S404. If the number of designated historical multi-round dialogues is equal to 1, obtain the designated state node in the second state chain and record it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
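One reading of S401 to S404 treats a state chain as an ordered list of node labels and interprets "the second state chain includes the first" as the first chain appearing as a contiguous run inside the second, so that the node relationships (adjacency) are preserved. The helper names and this contiguity interpretation are assumptions made for illustration; historical chains that end exactly at the matched run are skipped because they offer no designated state node.

```python
from typing import List, Optional, Sequence


def find_subchain(first: Sequence[str], second: Sequence[str]) -> int:
    """Start index of `first` as a contiguous run inside `second`, or -1."""
    n = len(first)
    for start in range(len(second) - n + 1):
        if list(second[start:start + n]) == list(first):
            return start
    return -1


def designated_node(first: Sequence[str], second: Sequence[str]) -> Optional[str]:
    """Node directly connected to (immediately after) the matched first chain."""
    start = find_subchain(first, second)
    if start < 0 or start + len(first) >= len(second):
        return None
    return second[start + len(first)]


def predict_from_knowledge_base(
    first: Sequence[str], knowledge_base: List[List[str]]
) -> Optional[str]:
    # S402: keep historical chains that include the first chain and continue past it.
    designated = [c for c in knowledge_base if designated_node(first, c) is not None]
    # S403/S404: a unique match yields the prediction directly.
    if len(designated) == 1:
        return designated_node(first, designated[0])
    return None  # zero or several matches: handled by S4031/S4032 or voting


knowledge_base = [["T1", "T2", "T5", "T8"], ["T3", "T6"]]
print(predict_from_knowledge_base(["T1", "T2"], knowledge_base))
```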

As described above, the dialogue state is predicted based on the (i+1)-th round of text and its preceding information. The external knowledge base stores multiple historical multi-round dialogues, which serve as a basis for predicting the dialogue state. A multi-round dialogue consists of multiple rounds; in each round, the execution terminal of this application determines the current dialogue state and then decides which reply voice to return, which is the standard manual-rule process. Suppose the first state chain is T1-T2, that is, the current multi-round dialogue is stuck at the third round. Historical multi-round dialogues containing the T1-T2 chain are then retrieved from the external knowledge base. If, for example, a historical multi-round dialogue has the chain T1-T2-T5-T8 (the second state chain), it is taken as the designated historical multi-round dialogue. In it, the state node directly connected to the T1-T2 chain is T5, so T5 is the designated state node and is recorded as the predicted dialogue state of the designated dialogue state prediction tool. Because the second state chain of the designated historical multi-round dialogue includes the first state chain, the two dialogues are similar; when the current multi-round dialogue is stuck, referring to the designated historical multi-round dialogue gives a relatively accurate prediction of the dialogue state and keeps the multi-round dialogue going. Further, when the number of designated historical multi-round dialogues is not equal to 1, a first-match principle or a voting principle can be used to select the most accurate predicted dialogue state.
Under the first-match principle, the designated state node of the first designated historical multi-round dialogue found in the search is used as the predicted dialogue state. Under the voting principle, the designated state node that occurs most often is used as the predicted dialogue state. For example, if three historical multi-round dialogues have the state chains T1-T2-T5-T8, T1-T2-T4-T7, and T1-T2-T5-T9, then T5 is the most frequent designated state node and is used as the predicted dialogue state. In this scenario, T1 is, for example, the user-authority verification state, whose reply voice is "Authority verification passed, please select the business to be handled" (for example, after the user enters a user name and password in the first round of dialogue); T2 is the business confirmation state, whose reply voice is "Do you need to adjust the temporary limit or the fixed limit?" (for example, after the user says something like "I want to adjust my credit limit"); and T3 is the limit-category confirmation state, whose reply voice is "How would you like to adjust your temporary limit?" (for example, after the user says "temporary limit"). The T1 to T3 examples above merely illustrate one application scenario of this application and do not limit it.
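The first-match and voting principles can be sketched as follows, reusing the three example chains from the text. The function names are hypothetical, and ties in the vote are broken in favor of the node encountered first (the behavior of `Counter.most_common`), which is an assumption the text does not specify.

```python
from collections import Counter
from typing import List, Optional, Sequence


def following_node(first: Sequence[str], chain: Sequence[str]) -> Optional[str]:
    """Node right after the first contiguous occurrence of `first` in `chain`."""
    n = len(first)
    for start in range(len(chain) - n):
        if list(chain[start:start + n]) == list(first):
            return chain[start + n]
    return None


def first_match(first: Sequence[str], histories: List[List[str]]) -> Optional[str]:
    """First-match principle: use the first history that continues the chain."""
    for chain in histories:
        node = following_node(first, chain)
        if node is not None:
            return node
    return None


def vote(first: Sequence[str], histories: List[List[str]]) -> Optional[str]:
    """Voting principle: the most frequent designated state node wins."""
    nodes = (following_node(first, chain) for chain in histories)
    counts = Counter(node for node in nodes if node is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]  # ties: the node encountered first


histories = [["T1", "T2", "T5", "T8"], ["T1", "T2", "T4", "T7"], ["T1", "T2", "T5", "T9"]]
print(vote(["T1", "T2"], histories))
```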

In one embodiment, after step S403 of determining whether the number of designated historical multi-round dialogues is equal to 1, the method includes:

S4031. If the number of designated historical multi-round dialogues is not equal to 1, calculate the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, so as to obtain multiple similarity values corresponding to all the designated historical multi-round dialogues;

S4032. Obtain the designated state node of the designated historical multi-round dialogue with the largest similarity value, and record it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.

As described above, the designated state node of the designated historical multi-round dialogue with the largest similarity value is recorded as the predicted dialogue state of the designated dialogue state prediction tool. To improve prediction accuracy, this application calculates the similarity between every designated historical multi-round dialogue and the current multi-round dialogue with a preset similarity calculation method, and then takes the designated state node of the dialogue with the largest similarity value. This ensures that the designated historical multi-round dialogue most similar to the current one is selected, and the next dialogue state of that most similar dialogue is the most likely next dialogue state of the current multi-round dialogue, which improves the accuracy of dialogue state prediction.
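Step S4032 reduces to an argmax over the similarity values computed in S4031. In the sketch below the similarity values are assumed to be already computed, and the candidate chains and scores are made-up inputs for illustration; the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pick_most_similar(
    first: Sequence[str], scored: List[Tuple[List[str], float]]
) -> Optional[str]:
    """S4032: continue the chain from the history with the largest similarity value.

    `scored` pairs each designated history chain with its S4031 similarity value."""
    if not scored:
        return None
    best_chain, _ = max(scored, key=lambda pair: pair[1])  # ties: first maximum
    n = len(first)
    for start in range(len(best_chain) - n):
        if list(best_chain[start:start + n]) == list(first):
            return best_chain[start + n]  # the designated state node
    return None


scored = [(["T1", "T2", "T5", "T8"], 0.62), (["T1", "T2", "T4", "T7"], 0.91)]
print(pick_most_similar(["T1", "T2"], scored))
```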

In one embodiment, step S4031 of calculating the similarity between the designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method includes:

S40311. Query a general-purpose word vector database to obtain i+1 first word vector sequences corresponding respectively to the first round, ..., the (i+1)-th round of voice input by the user, and concatenate the i+1 first word vector sequences in order to obtain a first comprehensive vector X;

S40312. Query the general-purpose word vector database to obtain i+1 second word vector sequences corresponding respectively to the first round, ..., the (i+1)-th round of voice input by the user in the designated historical multi-round dialogue, and concatenate the i+1 second word vector sequences in order to obtain a second comprehensive vector Y;

S40313. According to the formula:

Calculate the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue, where X is the first comprehensive vector, Y is the second comprehensive vector, Xj is the j-th component vector of the first comprehensive vector, Yj is the j-th component vector of the second comprehensive vector, and both the first and the second comprehensive vectors have m component vectors.

As described above, the preset similarity calculation method computes the similarity between the designated historical multi-round dialogue and the current multi-round dialogue. This application uses not only the user's current round of voice but also the user's earlier voice inputs as the basis for the similarity calculation, which improves its accuracy. A word vector database maps words to vectors and is commonly used in natural language processing. It is queried to obtain the i+1 first word vector sequences corresponding respectively to the first round, ..., the (i+1)-th round of voice input by the user, which are concatenated in order to obtain the first comprehensive vector X; it is likewise queried to obtain the i+1 second word vector sequences corresponding respectively to the first round, ..., the (i+1)-th round of voice input by the user in the designated historical multi-round dialogue, which are concatenated in order to obtain the second comprehensive vector Y. In this way, judging the similarity between the current and the historical multi-round dialogue becomes a similarity calculation between vectors. According to the formula:

Calculate the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue. The formula accounts for both the numerical difference and the angular difference between the vectors, which further ensures the accuracy of the similarity calculation.
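Because the formula for M is not reproduced in this text, the sketch below uses a stand-in similarity that, like the description of the original, reflects both the angle between the vectors (cosine) and their numerical difference (Euclidean distance). The combination cosine / (1 + distance), the toy word-vector table, and the rule of skipping out-of-vocabulary words are all assumptions for illustration, not the application's formula.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def comprehensive_vector(rounds: Sequence[Sequence[str]], table: Dict[str, Vector]) -> List[Vector]:
    """S40311/S40312: look up each round's words and concatenate the rounds in order.

    Returns the component vectors X1..Xm (one per word). Words missing from the
    table are skipped, which is an assumption of this sketch."""
    return [table[w] for words in rounds for w in words if w in table]


def hypothetical_similarity(x: List[Vector], y: List[Vector]) -> float:
    """Stand-in for M (NOT the application's formula): cosine of the flattened
    vectors (angle) divided by 1 + their Euclidean distance (magnitude)."""
    if len(x) != len(y):
        raise ValueError("both comprehensive vectors must have m component vectors")
    fx = [v for comp in x for v in comp]
    fy = [v for comp in y for v in comp]
    dot = sum(a * b for a, b in zip(fx, fy))
    norm = math.sqrt(sum(a * a for a in fx)) * math.sqrt(sum(b * b for b in fy))
    cosine = dot / norm if norm else 0.0
    return cosine / (1.0 + math.dist(fx, fy))  # math.dist needs Python 3.8+


# Toy word-vector table with made-up values, for illustration only.
WORD_VECTORS = {"credit": [1.0, 0.0], "card": [0.0, 1.0], "limit": [1.0, 1.0]}
current = comprehensive_vector([["credit", "card"], ["limit"]], WORD_VECTORS)
historical = comprehensive_vector([["card", "credit"], ["limit"]], WORD_VECTORS)
print(hypothetical_similarity(current, historical))
```

Identical dialogues score 1.0 under this stand-in, and the score falls as the vectors diverge in either direction or length.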

In one embodiment, after step S5 of determining whether the p predicted dialogue states are the same, the method includes:

S51. If the p predicted dialogue states are not all the same, divide the p predicted dialogue states into multiple groups, where all members of a group share the same predicted dialogue state;

S52. Obtain, from the multiple groups, the first group, which has the most members, and update the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group;

S53. Obtain, from the multiple groups, the second group, which has the fewest members, and delete the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.

As described above, the current state of the multi-round dialogue is updated to the predicted dialogue state of the first group, and the dialogue state prediction tools of the second group are deleted from the p dialogue state prediction tools. Ideally, the p predicted dialogue states are all the same, but in practice the p tools differ in prediction accuracy, so the p predicted dialogue states may well differ. In that case, this application divides them into multiple groups. The first group, which has the most members, represents the predicted dialogue state endorsed by most of the tools, so the current state of the multi-round dialogue is updated to that state. In addition, to maintain the prediction accuracy of the p tools, the tools corresponding to the second group are deleted, which increases the relative weight of the remaining dialogue state prediction tools in the next prediction and thereby improves the accuracy of subsequent dialogue state predictions.
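Steps S51 to S53 amount to grouping the tools by predicted state, adopting the largest group's state, and dropping the tools in the smallest group. The sketch assumes tools are identified by name and breaks ties by insertion order; it also excludes the adopted group when choosing the smallest group, a safeguard the text does not state. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_disagreement(predictions: Dict[str, str]) -> Tuple[str, List[str]]:
    """Steps S51 to S53 for a mapping {tool name: predicted state}.

    Returns (adopted state, names of the tools kept after pruning)."""
    groups: Dict[str, List[str]] = {}
    for tool, state in predictions.items():
        groups.setdefault(state, []).append(tool)  # S51: one predicted state per group
    adopted = max(groups, key=lambda s: len(groups[s]))  # S52: largest group wins
    others = [s for s in groups if s != adopted]
    if not others:  # all tools agreed, so there is nothing to prune
        return adopted, list(predictions)
    smallest = min(others, key=lambda s: len(groups[s]))  # S53: smallest group
    removed = set(groups[smallest])
    kept = [tool for tool in predictions if tool not in removed]
    return adopted, kept


print(resolve_disagreement({"nn": "T5", "kb": "T5", "rules": "T4"}))
```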

Referring to FIG. 2, an embodiment of the present application provides a multi-round dialogue device based on dialogue state prediction, including:

The (i+1)th round speech acquisition unit 10 is configured to acquire, after i rounds of dialogue with the user, the (i+1)th round of speech input by the user, where i is an integer greater than 1;

The (i+1)th round text acquisition unit 20 is configured to perform speech recognition on the (i+1)th round of speech according to a preset speech recognition method, so as to obtain the (i+1)th round of text;

The dialogue state generation condition judging unit 30 is configured to judge whether the (i+1)th round of text triggers a preset dialogue state generation condition;

The predicted dialogue state acquisition unit 40 is configured to, if the (i+1)th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

The predicted dialogue state judging unit 50 is configured to judge whether the p predicted dialogue states are the same;

The (i+1)th round reply voice acquisition unit 60 is configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state and obtain the (i+1)th round reply voice according to the preset correspondence between dialogue states and reply voices;

The (i+1)th round reply voice output unit 70 is configured to output the (i+1)th round reply voice using a preset voice output device.
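To make the flow through units 10 to 70 concrete, the following is a minimal Python sketch of one dialogue round. It assumes every component is passed in as a plain function: `recognize`, `check_trigger`, `predictors`, `replies`, and `speak` are hypothetical stand-ins, since the source does not name any concrete speech recognition engine, prediction model, or output device. The disagreement branch (S51 to S53) is deliberately left out; in this sketch it simply keeps the current state.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signatures standing in for units 10-70; the source does not
# specify concrete speech recognition, prediction, or voice output components.
Recognizer = Callable[[bytes], str]
TriggerCheck = Callable[[str], Optional[Tuple[str, str]]]  # -> (jump_state, reply) or None
Predictor = Callable[[str, List[str]], str]                  # (text, preceding texts) -> state


def run_round(
    speech: bytes,
    history: List[str],
    current_state: str,
    recognize: Recognizer,
    check_trigger: TriggerCheck,
    predictors: List[Predictor],
    replies: Dict[str, str],
    speak: Callable[[str], None],
) -> str:
    """Process round i+1 and return the updated current dialogue state."""
    text = recognize(speech)                     # unit 20
    triggered = check_trigger(text)              # unit 30
    if triggered is not None:
        # A configured trigger fired: jump directly and output the configured reply.
        jump_state, reply = triggered
        history.append(text)
        speak(reply)
        return jump_state
    # Unit 40: each of the p tools predicts from the new text plus rounds 1..i.
    predicted = [predict(text, history) for predict in predictors]
    history.append(text)
    if len(set(predicted)) == 1:                 # unit 50
        new_state = predicted[0]
        speak(replies[new_state])                # units 60 and 70
        return new_state
    # Disagreement is handled by the grouping embodiment (S51-S53); this sketch
    # simply keeps the current state.
    return current_state
```

Passing the components as functions keeps the sketch independent of any particular ASR or text-to-speech stack; a real system would plug in its own implementations.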

The operations performed by the above units, sub-units, modules, or sub-modules correspond one-to-one to the steps of the multi-round dialogue method based on dialogue state prediction in the foregoing embodiment and are not repeated here.

In one embodiment, the dialogue state generation condition judging unit 30 includes:

A word segmentation subunit, configured to perform word segmentation on the (i+1)th round of text to obtain multiple keywords;

A configuration file judging subunit, configured to judge whether a keyword, or a combination of the keywords, is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;

A dialogue state generation condition judging subunit, configured to determine, if the keyword or combination of keywords is recorded in the trigger condition part of the configuration file, that the (i+1)th round of text triggers the preset dialogue state generation condition;

The device further includes:

A dialogue state update unit, configured to, if the (i+1)th round of text triggers the preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state and output the reply voice using the preset voice output device.
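The trigger check above can be sketched as a keyword-subset match against configuration records. This is a minimal sketch under stated assumptions: the `TriggerRule` record layout is hypothetical (the source only says the file records a trigger condition, a reply voice, and a jump state), replies are represented as text rather than audio, and word segmentation is a whitespace split, whereas Chinese input would need a real segmenter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class TriggerRule:
    # One hypothetical configuration-file record: trigger condition, reply, jump state.
    trigger: FrozenSet[str]   # a single keyword or a combination of keywords
    reply: str
    jump_state: str


def segment(text: str) -> List[str]:
    # Placeholder word segmentation: lower-cased whitespace split.
    return text.lower().split()


def match_trigger(text: str, rules: List[TriggerRule]) -> Optional[TriggerRule]:
    """Return the first rule whose keywords all occur in the text, else None."""
    keywords = set(segment(text))
    for rule in rules:
        if rule.trigger <= keywords:  # every trigger keyword is present
            return rule
    return None
```

Because the first matching rule wins, more specific keyword combinations would need to be listed before single-keyword rules; this ordering convention is an assumption of the sketch.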

In one embodiment, the p dialogue state prediction tools include a designated dialogue state prediction tool, which is connected in advance to a preset external knowledge base storing multiple historical multi-round dialogues, and the predicted dialogue state acquisition unit 40 includes:

A first state chain generation subunit, configured to use the designated dialogue state prediction tool to generate a first state chain of the current multi-round dialogue according to the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text;

A designated historical multi-round dialogue acquisition subunit, configured to acquire designated historical multi-round dialogues from the external knowledge base, where the second state chain of a designated historical multi-round dialogue includes the first state chain. Here, "the second state chain includes the first state chain" means that every state node in the first state chain is also a state node of the second state chain, and the relationships among the state nodes in the first state chain are the same as the relationships among the corresponding state nodes in the second state chain;

A designated historical multi-round dialogue quantity judging subunit, configured to judge whether the number of designated historical multi-round dialogues is equal to 1;

A designated state node acquisition subunit, configured to, if the number of designated historical multi-round dialogues is equal to 1, acquire the designated state node in the second state chain and record it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
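The chain-inclusion test and the single-match case can be illustrated for the simplest topology. This minimal sketch assumes, beyond what the source states, that every state chain is linear (a list of state names), so that "same node relationships" reduces to the first chain appearing as a contiguous run inside the second chain, and the "directly connected" designated node is the node right after that run. Historical chains that contain the current chain but have no following node are skipped, which is also an assumption.

```python
from typing import List, Optional, Sequence


def find_run(sub: Sequence[str], chain: Sequence[str]) -> int:
    """Index where `sub` occurs as a contiguous run in `chain`, or -1."""
    n = len(sub)
    for start in range(len(chain) - n + 1):
        if list(chain[start:start + n]) == list(sub):
            return start
    return -1


def next_state_if_unique(current: Sequence[str], history: List[Sequence[str]]) -> Optional[str]:
    """Prediction of the designated tool when exactly one historical chain matches."""
    # Keep historical chains that contain the current chain and extend past it.
    matches = []
    for chain in history:
        start = find_run(current, chain)
        if start != -1 and start + len(current) < len(chain):
            matches.append(chain[start + len(current)])
    # Zero or several matches return None; the source resolves the
    # multiple-match case by similarity instead.
    return matches[0] if len(matches) == 1 else None
```

Branching chains (graphs rather than lists) would need a subgraph check instead of `find_run`.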

In one embodiment, the device includes:

A similarity calculation unit, configured to, if the number of designated historical multi-round dialogues is not equal to 1, calculate the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, thereby obtaining multiple similarity values, one for each designated historical multi-round dialogue;

A predicted dialogue state marking unit, configured to acquire the designated state node of the designated historical multi-round dialogue with the largest similarity value and record it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, where the designated state node is directly connected to the first state chain.
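Selecting the prediction from several matching historical dialogues reduces to an argmax over similarity scores. The sketch below assumes the candidates have already been paired with their designated state nodes and takes the similarity function as a parameter. The `jaccard` word-overlap function is a hypothetical toy used only for the example; it is not the vector-based similarity the source describes.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

D = TypeVar("D")


def next_state_by_similarity(
    candidates: List[Tuple[D, str]],
    current: D,
    similarity: Callable[[D, D], float],
) -> str:
    """Return the designated next state of the most similar historical dialogue.

    candidates: (historical dialogue, its state node directly after the first chain).
    """
    if not candidates:
        raise ValueError("no designated historical multi-round dialogue to compare")
    # max() keeps the first candidate when several share the largest similarity.
    _, best_next_state = max(candidates, key=lambda c: similarity(c[0], current))
    return best_next_state


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy word-overlap similarity used only for the example."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
```

For instance, with candidates `"book a table for two"` (next state `ask_time`) and `"cancel my booking"` (next state `confirm_cancel`), the current dialogue `"book a table"` scores 0.6 against the first and 0 against the second, so `ask_time` is returned.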

In one embodiment, the similarity calculation unit includes:

A first comprehensive vector X acquisition subunit, configured to query a general-purpose word vector library to obtain i+1 first word vector sequences corresponding respectively to the first round, ..., the (i+1)th round of speech input by the user, and to concatenate the i+1 first word vector sequences in order to obtain the first comprehensive vector X;

A second comprehensive vector Y acquisition subunit, configured to query the general-purpose word vector library to obtain i+1 second word vector sequences corresponding respectively to the first round, ..., the (i+1)th round of speech input by the user in the designated historical multi-round dialogue, and to concatenate the i+1 second word vector sequences in order to obtain the second comprehensive vector Y;

A similarity M calculation subunit, configured to calculate the similarity M according to the formula:
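The formula for M did not survive extraction. The claims define X and Y as the two comprehensive vectors, each with m components X_j and Y_j, so one plausible reading, and it is only an assumption, is cosine similarity: M = Σ_j X_j·Y_j / (√(Σ_j X_j²)·√(Σ_j Y_j²)). The sketch below builds X and Y by concatenating word vectors round by round and then applies that assumed formula. The toy word vector library, whitespace segmentation, skipping of out-of-vocabulary words, and treating each X_j as a scalar component are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

WordVectors = Dict[str, List[float]]  # hypothetical stand-in for the word vector library


def comprehensive_vector(utterances: Sequence[str], lib: WordVectors) -> List[float]:
    """Concatenate the word vectors of rounds 1..i+1, in order, into one flat vector."""
    flat: List[float] = []
    for utterance in utterances:
        for word in utterance.lower().split():  # placeholder segmentation
            flat.extend(lib.get(word, []))      # out-of-vocabulary words are skipped (assumption)
    return flat


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed form of M: cosine of the angle between X and Y (both of length m)."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number m of components")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

Because concatenation preserves word order, two dialogues with the same words in a different order can score very differently; the equal-length requirement mirrors the statement that both vectors have m components.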

In one embodiment, the device includes:

A grouping unit, configured to, if the p predicted dialogue states are not all the same, divide them into multiple groups, where all members of a group share the same predicted dialogue state;

A first group acquisition unit, configured to obtain from the groups the first group, i.e., the group with the most members, and update the current state of the multi-round dialogue to the predicted dialogue state of the first group;

A second group acquisition unit, configured to obtain from the groups the second group, i.e., the group with the fewest members, and delete the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.

Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in the figure. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data used by the multi-round dialogue method based on dialogue state prediction. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a multi-round dialogue method based on dialogue state prediction.

The processor executes the above multi-round dialogue method based on dialogue state prediction, whose steps correspond one-to-one to the steps of the method in the foregoing embodiment and are not repeated here. The method includes: after i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1; performing speech recognition on the (i+1)th round of speech according to a preset speech recognition method to obtain the (i+1)th round of text; judging whether the (i+1)th round of text triggers a preset dialogue state generation condition; if it does not, using p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text; judging whether the p predicted dialogue states are the same; if they are, updating the current state of the multi-round dialogue to the predicted dialogue state and obtaining the (i+1)th round reply voice according to the preset correspondence between dialogue states and reply voices; and outputting the (i+1)th round reply voice using a preset voice output device.

An embodiment of the present application also provides a computer-readable storage medium storing a computer program; the storage medium may be volatile or non-volatile. When executed by a processor, the computer program implements a multi-round dialogue method based on dialogue state prediction, whose steps correspond one-to-one to the steps of the method in the foregoing embodiment and are not repeated here. The method includes: after i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1; performing speech recognition on the (i+1)th round of speech according to a preset speech recognition method to obtain the (i+1)th round of text; judging whether the (i+1)th round of text triggers a preset dialogue state generation condition; if it does not, using p preset dialogue state prediction tools to predict the dialogue state based on the (i+1)th round of text and the preceding information corresponding to it, thereby obtaining p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text; judging whether the p predicted dialogue states are the same; if they are, updating the current state of the multi-round dialogue to the predicted dialogue state and obtaining the (i+1)th round reply voice according to the preset correspondence between dialogue states and reply voices; and outputting the (i+1)th round reply voice using a preset voice output device.

Claims

A multi-round dialogue method based on dialogue state prediction, which includes:

After i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1;

According to a preset voice recognition method, perform voice recognition processing on the i+1th round of speech, so as to obtain the i+1th round of text;

Judging whether the (i+1)th round of text triggers a preset dialogue state generation condition;

If the (i+1)th round of text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

Judging whether the p predicted dialog states are the same;

If the p predicted dialogue states are the same, update the current state of multiple rounds of dialogue to the predicted dialogue state, and obtain the i+1th round of reply voice according to the preset correspondence between the dialogue state and the reply voice;

A preset voice output device is used to output the (i+1)th round of reply voice.
The multi-round dialogue method based on dialogue state prediction according to claim 1, wherein the step of judging whether the (i+1)th round of text triggers a preset dialogue state generation condition comprises:

Performing word segmentation on the (i+1)th round of text to obtain multiple keywords;

Judging whether the keyword or the combination of the keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;

If the keyword or the combination of the keywords is recorded in the trigger condition part, determining that the (i+1)th round of text triggers the preset dialogue state generation condition;

After the step of judging whether the (i+1)th round of text triggers a preset dialogue state generation condition, the method includes:

If the (i+1)th round of text triggers a preset dialogue state generation condition, the current state of the multiple rounds of dialogue is updated to the jump state, and the preset voice output device is used to output the reply voice.
The multi-round dialogue method based on dialogue state prediction according to claim 1, wherein the p dialogue state prediction tools comprise a designated dialogue state prediction tool, the designated dialogue state prediction tool is connected in advance to a preset external knowledge base, and the external knowledge base stores multiple historical multi-round dialogues; and the step of performing dialogue state prediction based on the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text includes:

Using the specified dialogue state prediction tool to generate the first state chain of the current multi-round dialogue according to the i+1-th round of text and the preceding information corresponding to the i+1-th round of text;

Obtaining a designated historical multi-round dialogue from the external knowledge base, wherein the second state chain of the designated historical multi-round dialogue includes the first state chain; "the second state chain includes the first state chain" means that all state nodes in the first state chain are state nodes of the second state chain, and the relationships among all state nodes in the first state chain are the same as the relationships among the corresponding state nodes in the second state chain;

Judging whether the number of designated historical multi-round dialogues is equal to 1;

If the number of designated historical multi-round dialogues is equal to 1, obtaining the designated state node in the second state chain and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, wherein the designated state node is directly connected to the first state chain.
The multi-round dialogue method based on dialog state prediction according to claim 3, wherein after the step of judging whether the number of the designated historical multi-round dialogue is equal to 1, the method comprises:

If the number of designated historical multi-round dialogues is not equal to 1, calculating the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, so as to obtain multiple similarity values corresponding respectively to all the designated historical multi-round dialogues;

Obtaining the designated state node of the designated historical multi-round dialogue with the largest similarity value, and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, wherein the designated state node is directly connected to the first state chain.
The multi-round dialogue method based on dialog state prediction according to claim 4, wherein the step of calculating the similarity between the designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method comprises:

Querying a general-purpose word vector library to obtain i+1 first word vector sequences corresponding respectively to the first round, ..., the (i+1)th round of speech input by the user, and concatenating the i+1 first word vector sequences in order to obtain a first comprehensive vector X;

Querying the general-purpose word vector library to obtain i+1 second word vector sequences corresponding respectively to the first round, ..., the (i+1)th round of speech input by the user in the designated historical multi-round dialogue, and concatenating the i+1 second word vector sequences in order to obtain a second comprehensive vector Y;

According to the formula:

Calculating the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue, where X is the first comprehensive vector, Y is the second comprehensive vector, Xj is the j-th component vector of the first comprehensive vector, Yj is the j-th component vector of the second comprehensive vector, and the first comprehensive vector and the second comprehensive vector each have m component vectors.
The multi-round dialogue method based on dialogue state prediction according to claim 1, wherein after the step of judging whether the p predicted dialogue states are the same, the method comprises:

If the p predicted dialogue states are not all the same, dividing the p predicted dialogue states into multiple groups, wherein all members of a group share the same predicted dialogue state;

Obtaining, from the multiple groups, a first group having the most members, and updating the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group;

Obtaining, from the multiple groups, a second group having the fewest members, and deleting the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
A multi-round dialogue device based on dialogue state prediction, which includes:

The (i+1)th round speech acquisition unit is configured to acquire, after i rounds of dialogue with the user, the (i+1)th round of speech input by the user, where i is an integer greater than 1;

The (i+1)th round text acquisition unit is configured to perform voice recognition processing on the (i+1)th round of speech according to a preset voice recognition method, so as to obtain the (i+1)th round of text;

A dialogue state generation condition judging unit, configured to judge whether the (i+1)th round of text triggers a preset dialogue state generation condition;

The predicted dialogue state acquisition unit is configured to, if the (i+1)th round of text does not trigger the preset dialogue state generation condition, use p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

A predictive dialogue state judging unit, configured to determine whether the p predicted dialogue states are the same;

The (i+1)th round reply voice acquisition unit is configured to, if the p predicted dialogue states are the same, update the current state of the multi-round dialogue to the predicted dialogue state and obtain the (i+1)th round reply voice according to the preset correspondence between dialogue states and reply voices;

The i+1th round reply voice output unit is configured to use a preset voice output device to output the i+1th round reply voice.
The multi-round dialogue device based on dialogue state prediction according to claim 7, wherein said dialogue state generation condition judgment unit comprises:

The word segmentation subunit is used to perform word segmentation on the (i+1)th round of text to obtain multiple keywords;

The configuration file judging subunit is used to judge whether the keyword or the combination of the keywords is recorded in a preset configuration file, wherein the configuration file records the trigger condition, the reply voice, and the jump state;

The dialogue state generation condition judging subunit is used to determine, if the keyword or the combination of the keywords is recorded in the trigger condition part, that the (i+1)th round of text triggers the preset dialogue state generation condition;

The device includes:

The dialogue state update unit is configured to, if the (i+1)th round of text triggers the preset dialogue state generation condition, update the current state of the multi-round dialogue to the jump state and output the reply voice using the preset voice output device.
A computer device, which includes a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, a multi-round dialog method based on dialog state prediction is implemented, and the method includes:

After i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1;

According to a preset voice recognition method, perform voice recognition processing on the i+1th round of speech, so as to obtain the i+1th round of text;

Judging whether the (i+1)th round of text triggers a preset dialogue state generation condition;

If the (i+1)th round of text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

Judging whether the p predicted dialog states are the same;

If the p predicted dialogue states are the same, update the current state of multiple rounds of dialogue to the predicted dialogue state, and obtain the i+1th round of reply voice according to the preset correspondence between the dialogue state and the reply voice;

A preset voice output device is used to output the (i+1)th round of reply voice.
The computer device according to claim 9, wherein the step of judging whether the (i+1)th round of text triggers a preset dialogue state generation condition comprises:

Performing word segmentation on the (i+1)th round of text to obtain multiple keywords;

Judging whether the keyword or the combination of the keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply voices, and jump states;

If the keyword or the combination of the keywords is recorded in the trigger condition part, determining that the (i+1)th round of text triggers the preset dialogue state generation condition;

After the step of judging whether the (i+1)th round of text triggers a preset dialogue state generation condition, the method includes:

If the (i+1)th round of text triggers a preset dialogue state generation condition, the current state of the multiple rounds of dialogue is updated to the jump state, and the preset voice output device is used to output the reply voice.
The computer device according to claim 9, wherein the p dialogue state prediction tools comprise a designated dialogue state prediction tool, the designated dialogue state prediction tool is connected in advance to a preset external knowledge base, and the external knowledge base stores multiple historical multi-round dialogues; and the step of performing dialogue state prediction based on the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text includes:

Using the specified dialogue state prediction tool to generate the first state chain of the current multi-round dialogue based on the i+1th round of text and the preceding information corresponding to the i+1th round of text;

Obtaining a designated historical multi-round dialogue from the external knowledge base, wherein the second state chain of the designated historical multi-round dialogue includes the first state chain; "the second state chain includes the first state chain" means that all state nodes in the first state chain are state nodes of the second state chain, and the relationships among all state nodes in the first state chain are the same as the relationships among the corresponding state nodes in the second state chain;

Judging whether the number of designated historical multi-round dialogues is equal to 1;

If the number of designated historical multi-round dialogues is equal to 1, obtaining the designated state node in the second state chain and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, wherein the designated state node is directly connected to the first state chain.
The computer device according to claim 11, wherein after the step of judging whether the number of designated historical multi-round dialogues is equal to 1, the method comprises:

If the number of designated historical multi-round dialogues is not equal to 1, calculating the similarity between each designated historical multi-round dialogue and the current multi-round dialogue according to a preset similarity calculation method, so as to obtain multiple similarity values corresponding respectively to all the designated historical multi-round dialogues;

Obtaining the designated state node of the designated historical multi-round dialogue with the largest similarity value, and recording it as the predicted dialogue state corresponding to the designated dialogue state prediction tool, wherein the designated state node is directly connected to the first state chain.
The computer device according to claim 12, wherein the step of calculating the similarity between the designated historical multiple rounds of dialogue and the current multiple rounds of dialogue according to a preset similarity calculation method comprises:

Querying a general-purpose word vector library to obtain i+1 first word vector sequences corresponding respectively to the first round, ..., the (i+1)th round of speech input by the user, and concatenating the i+1 first word vector sequences in order to obtain a first comprehensive vector X;

Querying the general-purpose word vector library to obtain i+1 second word vector sequences corresponding respectively to the first round, ..., the (i+1)th round of speech input by the user in the designated historical multi-round dialogue, and concatenating the i+1 second word vector sequences in order to obtain a second comprehensive vector Y;

According to the formula:

Calculating the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue, where X is the first comprehensive vector, Y is the second comprehensive vector, Xj is the j-th component vector of the first comprehensive vector, Yj is the j-th component vector of the second comprehensive vector, and the first comprehensive vector and the second comprehensive vector each have m component vectors.
The computer device according to claim 9, wherein after the step of judging whether the p predicted dialogue states are the same, the method comprises:

If the p predicted dialogue states are not all the same, dividing the p predicted dialogue states into multiple groups, wherein all members of a group share the same predicted dialogue state;

Obtaining, from the multiple groups, a first group having the most members, and updating the current state of the multi-round dialogue to the predicted dialogue state corresponding to the first group;

Obtaining, from the multiple groups, a second group having the fewest members, and deleting the dialogue state prediction tools corresponding to the second group from the p dialogue state prediction tools.
A computer-readable storage medium, wherein a computer program is stored thereon, and when the computer program is executed by a processor, a multi-round dialog method based on dialog state prediction is realized, the method includes:

After i rounds of dialogue with the user, acquiring the (i+1)th round of speech input by the user, where i is an integer greater than 1;

According to a preset voice recognition method, perform voice recognition processing on the i+1th round of speech, so as to obtain the i+1th round of text;

Judging whether the (i+1)th round of text triggers a preset dialogue state generation condition;

If the (i+1)th round of text does not trigger the preset dialogue state generation condition, using p preset dialogue state prediction tools to perform dialogue state prediction based on the (i+1)th round of text and the preceding information corresponding to the (i+1)th round of text, so as to obtain p predicted dialogue states corresponding to the p dialogue state prediction tools, where p is an integer greater than 1 and the preceding information includes at least the first round of text, ..., the i-th round of text;

Judging whether the p predicted dialog states are the same;

If the p predicted dialogue states are the same, update the current state of multiple rounds of dialogue to the predicted dialogue state, and obtain the i+1th round of reply voice according to the preset correspondence between the dialogue state and the reply voice;

Outputting the (i+1)-th round reply speech using a preset voice output device.
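The per-round control flow of claim 15 can be sketched in Python as below. This is an illustrative sketch under stated assumptions: speech recognition and voice output are left out, so the round starts from recognized text; `replies` is a hypothetical stand-in for the state-to-reply-speech correspondence; and the predictor signature, `is_triggered` callback and outcome labels are all invented for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical signature: a prediction tool sees the (i+1)-th round text and
# the preceding rounds' texts, and returns a dialog state name.
Predictor = Callable[[str, Sequence[str]], str]


def handle_round(
    text: str,
    history: Sequence[str],
    predictors: Sequence[Predictor],
    replies: Dict[str, str],
    is_triggered: Callable[[str], bool],
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (outcome, new_state, reply) for one round of the dialogue."""
    if len(predictors) < 2:
        raise ValueError("p must be an integer greater than 1")
    if is_triggered(text):
        # Handled by the configured trigger / jump-state path instead.
        return ("triggered", None, None)
    # Each tool predicts from the current text plus rounds 1..i.
    predicted = [predict(text, history) for predict in predictors]
    if len(set(predicted)) == 1:
        state = predicted[0]
        # Stand-in for looking up the preset reply speech for this state.
        return ("unanimous", state, replies[state])
    # Disagreement is resolved by a separate step (e.g. majority voting).
    return ("disagree", None, None)
```

After each round, the caller would append `text` to `history` so that the next round's preceding information covers rounds 1 to i+1.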
16. The computer-readable storage medium according to claim 15, wherein the step of determining whether the (i+1)-th round text triggers a preset dialog state generation condition comprises:

Performing word segmentation on the (i+1)-th round text to obtain multiple keywords;

Determining whether a keyword or a combination of the keywords is recorded in a preset configuration file, where the configuration file records trigger conditions, reply speech, and jump states;

If the keyword or the keyword combination is recorded in the trigger condition part, determining that the (i+1)-th round text triggers the preset dialog state generation condition;

After the step of determining whether the (i+1)-th round text triggers a preset dialog state generation condition, the method comprises:

If the (i+1)-th round text triggers the preset dialog state generation condition, updating the current state of the multi-round dialogue to the corresponding jump state, and outputting the corresponding reply speech using the preset voice output device.
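A keyword-trigger check of this kind might look like the Python sketch below. The rule layout, example keywords, replies and jump states are hypothetical. A whitespace split stands in for word segmentation (Chinese text would need a real segmenter), and a "keyword combination" is assumed to match when every word of the trigger condition occurs in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TriggerRule:
    trigger: FrozenSet[str]  # keyword or keyword combination
    reply: str               # stand-in for the configured reply speech
    jump_state: str          # state to jump to when the rule fires


# Hypothetical configuration file contents.
CONFIG: List[TriggerRule] = [
    TriggerRule(frozenset({"transfer", "human"}), "Transferring you to an agent.", "handoff"),
    TriggerRule(frozenset({"goodbye"}), "Goodbye!", "end"),
]


def segment(text: str) -> List[str]:
    # Whitespace split as a stand-in for word segmentation.
    return text.lower().split()


def check_trigger(text: str, rules: Sequence[TriggerRule] = CONFIG) -> Optional[Tuple[str, str]]:
    """Return (jump_state, reply) for the first matching rule, else None."""
    keywords = set(segment(text))
    for rule in rules:
        # The rule fires when every keyword in its trigger condition occurs.
        if rule.trigger <= keywords:
            return rule.jump_state, rule.reply
    return None
```

A `None` result corresponds to the non-triggered branch, where the p prediction tools take over.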
17. The computer-readable storage medium according to claim 15, wherein the p dialog state prediction tools include a designated dialog state prediction tool that is pre-connected to a preset external knowledge base storing multiple historical multi-round dialogues, and the step of performing dialog state prediction based on the (i+1)-th round text and the preceding information corresponding to the (i+1)-th round text comprises:

Using the designated dialog state prediction tool to generate a first state chain of the current multi-round dialogue according to the (i+1)-th round text and the preceding information corresponding to the (i+1)-th round text;

Obtaining a designated historical multi-round dialogue from the external knowledge base, wherein a second state chain of the designated historical multi-round dialogue contains the first state chain; here, "the second state chain contains the first state chain" means that every state node in the first state chain is also a state node of the second state chain, and the relationships among the state nodes of the first state chain are the same as the relationships among the corresponding state nodes of the second state chain;

Determining whether the number of designated historical multi-round dialogues is equal to 1;

If the number of designated historical multi-round dialogues is equal to 1, obtaining a designated state node in the second state chain and recording the designated state node as the predicted dialog state corresponding to the designated dialog state prediction tool, where the designated state node is directly connected to the first state chain.
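The chain-containment test and the single-match prediction can be sketched in Python as below, under simplifying assumptions that the claim does not state: a state chain is modeled as a linear list of state names, its "node relationships" are the links between consecutive states, and the node "directly connected to the first state chain" is taken to be the state that follows the first chain's last node in the historical chain. All function names are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

Chain = Sequence[str]  # a state chain as an ordered list of state names (assumption)


def edges(chain: Chain) -> Set[Tuple[str, str]]:
    # Node relationships of a linear chain: each state links to the next one.
    return set(zip(chain, chain[1:]))


def contains(second: Chain, first: Chain) -> bool:
    # Every node of `first` is in `second`, and every link in `first` also
    # appears between the same nodes in `second`.
    return set(first) <= set(second) and edges(first) <= edges(second)


def successor(chain: Chain, node: str) -> Optional[str]:
    # The state directly after `node` in `chain`, if any.
    for a, b in zip(chain, chain[1:]):
        if a == node:
            return b
    return None


def predict_from_kb(first: Chain, knowledge_base: Sequence[Chain]) -> Optional[str]:
    """Predict the next state when exactly one historical chain contains `first`."""
    if not first:
        return None
    matches = [chain for chain in knowledge_base if contains(chain, first)]
    if len(matches) != 1:
        return None  # zero or several matches: handled by another step
    return successor(matches[0], first[-1])
```

With a knowledge base holding `greet → ask_date → ask_time → confirm` and `greet → ask_menu → order`, the current chain `greet → ask_date` matches only the first dialogue, so the predicted state is `ask_time`.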
18. The computer-readable storage medium according to claim 17, wherein after the step of determining whether the number of designated historical multi-round dialogues is equal to 1, the method comprises:

If the number of designated historical multi-round dialogues is not equal to 1, calculating, according to a preset similarity calculation method, the similarity between each designated historical multi-round dialogue and the current multi-round dialogue, so as to obtain multiple similarity values corresponding respectively to all the designated historical multi-round dialogues;

Obtaining the designated state node of the designated historical multi-round dialogue that has the largest similarity value, and recording it as the predicted dialog state corresponding to the designated dialog state prediction tool, where the designated state node is directly connected to the first state chain.
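Selecting among several matching historical dialogues can be sketched as follows. The record layout (state chain plus per-round user texts) is hypothetical, and `word_overlap` is a placeholder Jaccard score over words used only so the sketch runs; it is not the preset similarity calculation method, which can be passed in through the `similarity` parameter. As in the single-match case, the designated node is assumed to be the successor of the first chain's last node.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A historical dialogue as (state chain, per-round user texts); hypothetical layout.
Dialogue = Tuple[List[str], List[str]]
Similarity = Callable[[Sequence[str], Sequence[str]], float]


def successor(chain: Sequence[str], node: str) -> Optional[str]:
    # The state directly after `node` in `chain`, if any.
    for a, b in zip(chain, chain[1:]):
        if a == node:
            return b
    return None


def word_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    # Placeholder similarity (Jaccard over words), not the patent's formula.
    wa, wb = set(" ".join(a).split()), set(" ".join(b).split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def pick_by_similarity(
    matches: Sequence[Dialogue],
    first_chain: Sequence[str],
    current_texts: Sequence[str],
    similarity: Similarity = word_overlap,
) -> Optional[str]:
    # Score every matching historical dialogue against the current one and
    # keep the highest-scoring one (the first wins on ties).
    best_chain, _ = max(matches, key=lambda d: similarity(d[1], current_texts))
    # The node directly after the end of the first state chain.
    return successor(best_chain, first_chain[-1])
```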
19. The computer-readable storage medium according to claim 18, wherein the step of calculating, according to a preset similarity calculation method, the similarity between the designated historical multi-round dialogue and the current multi-round dialogue comprises:

Querying a universal word vector library to obtain i+1 first word vector sequences corresponding respectively to the first round, ..., and the (i+1)-th round of speech input by the user, and concatenating the i+1 first word vector sequences in order to obtain a first comprehensive vector X;

Querying the universal word vector library to obtain i+1 second word vector sequences corresponding respectively to the first round, ..., and the (i+1)-th round of speech input by the user in the designated historical multi-round dialogue, and concatenating the i+1 second word vector sequences in order to obtain a second comprehensive vector Y;

According to the formula:

Calculating the similarity M between the designated historical multi-round dialogue and the current multi-round dialogue, where X is the first comprehensive vector, Y is the second comprehensive vector, Xj is the j-th component vector of the first comprehensive vector, Yj is the j-th component vector of the second comprehensive vector, and the first and second comprehensive vectors each have m component vectors.
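The formula itself did not survive in the text above. Its variable definitions (X and Y built from m aligned components Xj and Yj) are consistent with cosine similarity, so the Python sketch below assumes that reading; it should not be taken as the patent's exact formula. The word vector table and its values are hypothetical stand-ins for the universal word vector library. Unknown words mapping to zero vectors and zero-padding to a common length m are further assumptions.

```python
import math
from typing import Dict, List, Sequence

# Toy stand-in for the universal word vector library (hypothetical values).
WORD_VECTORS: Dict[str, List[float]] = {
    "book": [1.0, 0.0],
    "table": [0.0, 1.0],
    "menu": [1.0, 1.0],
}
DIM = 2


def comprehensive_vector(
    utterances: Sequence[str],
    table: Dict[str, List[float]] = WORD_VECTORS,
    dim: int = DIM,
) -> List[float]:
    """Concatenate the word vectors of rounds 1..i+1, in order, into one flat vector."""
    flat: List[float] = []
    for utterance in utterances:
        for word in utterance.split():
            # Unknown words map to a zero vector (assumption).
            flat.extend(table.get(word, [0.0] * dim))
    return flat


def cosine_similarity(x: List[float], y: List[float]) -> float:
    # Zero-pad the shorter vector so both have the same length (assumption).
    m = max(len(x), len(y))
    x = x + [0.0] * (m - len(x))
    y = y + [0.0] * (m - len(y))
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0
```

Flattening each component vector Xj into consecutive scalars leaves both the dot product and the norms unchanged, so summing XjYj over the m component vectors gives the same value as the flat computation here.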
20. The computer-readable storage medium according to claim 15, wherein after the step of determining whether the p predicted dialog states are the same, the method comprises:

If the p predicted dialog states are not all the same, dividing the p predicted dialog states into multiple groups, where each group contains only one kind of predicted dialog state (identical predictions fall into the same group);

Obtaining, from the plurality of groups, a first group having the most members, and updating the current state of the multi-round dialogue to the predicted dialog state corresponding to the first group;

Obtaining, from the plurality of groups, a second group having the fewest members, and deleting the dialog state prediction tools corresponding to the second group from the p dialog state prediction tools.