CN111160514A - Conversation method and system - Google Patents
Conversation method and system
- Publication number
- CN111160514A CN111160514A CN202010251489.5A CN202010251489A CN111160514A CN 111160514 A CN111160514 A CN 111160514A CN 202010251489 A CN202010251489 A CN 202010251489A CN 111160514 A CN111160514 A CN 111160514A
- Authority
- CN
- China
- Prior art keywords
- conversation
- utterance
- responsive
- dialog
- dialogue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
Abstract
The embodiments of this specification disclose a conversation method and system. The conversation method comprises the following steps: obtaining a dialog context, the dialog context comprising at least one user utterance; determining a current dialog state based on the dialog context; obtaining, based on a dialogue model, benefit scores of one or more candidate utterances given the current dialog state, wherein the dialogue model is a reinforcement learning model; and determining, based on the benefit scores, a target utterance responsive to the dialog context from the one or more candidate utterances.
Description
Technical Field
The embodiments of this specification relate to the technical field of artificial intelligence, and in particular to a conversation method and system.
Background
An intelligent dialogue robot is an intelligent dialogue system that interacts with users in natural language. Its dialogues with users may include task-oriented dialogues, FAQ dialogues, chit-chat dialogues, persuasive dialogues, and so on. Based on the ongoing dialogue with a user, the intelligent dialogue robot generates guiding utterances so that the user completes a specific operation after the dialogue. Such guided dialogue can be widely applied in scenarios such as charitable donation, merchandise recommendation, and loan collection.
This specification provides a guided dialogue method and system suitable for an intelligent dialogue robot.
Disclosure of Invention
One aspect of the embodiments of this specification provides a conversation method, the method comprising: obtaining a dialog context, the dialog context comprising at least one user utterance; determining a current dialog state based on the dialog context; obtaining, based on a dialogue model, benefit scores of one or more candidate utterances given the current dialog state, wherein the dialogue model is a reinforcement learning model; and determining, based on the benefit scores, a target utterance responsive to the dialog context from the one or more candidate utterances.
One aspect of the embodiments of this specification provides a conversation system, the system comprising: a dialogue data acquisition module configured to obtain a dialog context, the dialog context comprising at least one user utterance; a current dialog state determination module configured to determine a current dialog state based on the dialog context; a benefit score determination module configured to obtain, based on a dialogue model, benefit scores of one or more candidate utterances given the current dialog state, wherein the dialogue model is a reinforcement learning model; and a target utterance determination module configured to determine, based on the benefit scores, a target utterance responsive to the dialog context from the one or more candidate utterances.
One aspect of the embodiments of this specification provides a conversation apparatus, comprising at least one storage medium and at least one processor, the at least one storage medium storing computer instructions, and the at least one processor being configured to execute the computer instructions to implement any of the methods above.
One aspect of the embodiments of this specification provides a method of training a dialogue model, the method comprising: obtaining multiple rounds of historical dialogue; extracting multiple sets of training data from the multiple rounds of historical dialogue, wherein each set of training data comprises at least a sample current dialog state, a response utterance, a sample next dialog state, and a reward value corresponding to the response utterance; and iteratively updating parameters of the reinforcement learning model based on the multiple sets of training data, so that the trained dialogue model can determine benefit scores of candidate utterances given any current dialog state.
One aspect of the embodiments of this specification provides a system for training a dialogue model, the system comprising: a dialogue data acquisition unit configured to obtain multiple rounds of historical dialogue; a training data extraction unit configured to extract multiple sets of training data from the multiple rounds of historical dialogue, wherein each set of training data comprises at least a sample current dialog state, a response utterance, a sample next dialog state, and a reward value corresponding to the response utterance; and a model parameter updating unit configured to iteratively update parameters of the reinforcement learning model based on the multiple sets of training data, so that the trained dialogue model can determine benefit scores of candidate utterances given any current dialog state.
One aspect of the embodiments of this specification provides an apparatus for training a dialogue model, comprising at least one storage medium and at least one processor, the at least one storage medium storing computer instructions, and the at least one processor being configured to execute the computer instructions to implement any of the methods above.
Drawings
The present description is further illustrated by the following exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, like numerals denote like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of a dialog system shown in accordance with some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of a dialog method shown in accordance with some embodiments of the present description;
FIG. 3 is an exemplary flow diagram of a method of training a dialogue model, shown in accordance with some embodiments of the present description;
FIG. 4 is a schematic illustration of obtaining training data according to some embodiments of the present description;
FIG. 5 is a block diagram of a dialog system shown in accordance with some embodiments of the present description.
Detailed Description
To more clearly illustrate the technical solutions of the embodiments of this specification, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some examples or embodiments of this specification; those skilled in the art can also apply this specification to other similar scenarios based on these drawings without creative effort. Unless otherwise apparent from the context or otherwise indicated, like reference numerals in the figures denote the same structure or operation.
It should be understood that "system," "device," "unit," and/or "module" as used in this specification are terms for distinguishing different components, elements, parts, or assemblies at different levels. However, other words may be substituted if they achieve the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flowcharts are used in this specification to illustrate the operations performed by a system according to the embodiments of this specification. It should be understood that the operations need not be performed exactly in the order shown. Instead, the steps may be processed in reverse order or simultaneously; other operations may be added to the flows, or one or more steps may be removed from them.
The intelligent dialogue robot can be applied to a range of persuasive dialogue scenarios with users, such as debt collection, merchandise promotion, and activity recommendation. In these persuasive scenarios, the robot generates guiding utterances (also called talk scripts or robot scripts) so that the user completes a specific operation after the dialogue.
Taking the application of the intelligent dialogue robot to a debt collection scenario as an example: in some embodiments, the robot (also called a "collection robot") may converse, following a pre-configured fixed dialog flow, with users who have not fulfilled their repayment obligation within the agreed time, with the utterance of each dialogue turn configured manually. This approach has the following characteristic: because the dialog flow is fixed, it covers only a single dialogue scenario.
In still other embodiments, the fixed dialog flow need not be relied upon; instead, the robot intelligently selects, based on the preceding dialogue content, the best-performing candidate utterance with which to respond to the user.
Fig. 1 is a schematic diagram of an exemplary dialog system shown in accordance with some embodiments of the present description. As shown in fig. 1, the dialog system 100 may include a processing device 110, a network 120, a user terminal 130, and a storage device 140.
The processing device 110 may process information and/or data associated with dialogue generation to perform one or more of the functions disclosed in this specification. In some embodiments, the processing device 110 may obtain the dialog context. In some embodiments, the processing device 110 may determine the current dialog state based on the dialog context. In some embodiments, the processing device 110 may obtain, based on the dialogue model, benefit scores of candidate utterances given the current dialog state. In some embodiments, the processing device 110 may determine, based on the benefit scores, a target utterance responsive to the dialog context from the candidate utterances. It is understood that in some embodiments the processing device 110 may implement the functionality of the intelligent dialogue robot or act as a cloud service. In still other embodiments, the processing device 110 may obtain multiple rounds of historical dialogue and train the dialogue model. In some embodiments, the processing device 110 may include one or more processing engines (e.g., single-core or multi-core processors). By way of example only, the processing device 110 may include one or more combinations of central processing units (CPUs), application-specific integrated circuits (ASICs), application-specific instruction-set processors (ASIPs), graphics processing units (GPUs), physics processing units (PPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, microcontroller units, reduced instruction set computers (RISCs), microprocessors, and the like.
Fig. 2 is an exemplary flow diagram of a dialog method, shown in accordance with some embodiments of the present description.
In some embodiments, one or more steps in flow 200 may be implemented in system 100 shown in FIG. 1. For example, one or more steps in flow 200 may be stored as instructions in storage device 140 and invoked and/or executed by processing device 110.
A conversation may refer to a natural-language interaction between a user and a service party.
A user utterance is what the user says in the conversation. In some embodiments, the user may be a service customer or an individual or group with potential service needs, for example a borrower or a user requesting a loan service, or a user ordering or inquiring about goods. In some embodiments, the user may enter the user utterance on the user terminal 130 by voice input, text input, or the like.
The dialog context refers to the preceding dialogue content in natural-language form. In some embodiments, the dialog context comprises at least one user utterance. In some embodiments, the dialog context may be the completed dialogue content between the user and the service party. In some embodiments, the dialog context may include the current user utterance together with the previously completed dialogue content between the user and the service party. The service party may be an intelligent dialogue robot acting on behalf of the service provider.
In some embodiments, the dialogue data acquisition module may acquire the current user utterance from the user terminal 130. In some embodiments, the dialogue data acquisition module may acquire the dialog context from a cache device (e.g., storage device 140) or a customer service log.
In some embodiments, the current dialog state may reflect information in the dialog context, such as textual information, semantic information, and contextual information.
In some embodiments, the current dialog state determination module may determine the current dialog state by encoding the obtained dialog context, i.e., using the vector encoding the dialog context as the current dialog state.
In some embodiments, various encoding methods may be used, for example one-hot encoding, TF-IDF, a Word2Vec model, or a BERT model. By way of example only, the multiple utterances in the dialog context may be concatenated and input into a BERT model to obtain the encoded vector.
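The concatenate-then-encode step above can be sketched with a toy bag-of-words encoder standing in for the BERT/Word2Vec/TF-IDF models mentioned; the function name and the tiny vocabulary are illustrative assumptions, not part of the patent:

```python
from collections import Counter

def encode_dialog_context(utterances, vocab):
    """Concatenate the utterances of the dialog context and encode them
    as a bag-of-words count vector; this vector plays the role of the
    current dialog state s."""
    text = " ".join(utterances).lower()
    counts = Counter(text.split())
    return [counts.get(w, 0) for w in vocab]

vocab = ["hello", "repay", "when", "loan"]
state = encode_dialog_context(["Hello", "When will you repay"], vocab)
# state is a 4-dimensional count vector over the toy vocabulary
```

A production system would replace this with a learned encoder (e.g., feeding the concatenated text through BERT and taking the pooled output), but the interface, dialog context in and state vector out, is the same.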
Step 206: obtain, based on a dialogue model, benefit scores of one or more candidate utterances given the current dialog state, wherein the dialogue model is a reinforcement learning model. Specifically, step 206 may be performed by the benefit score determination module 530.
In some embodiments, a candidate utterance is a service utterance determined to be related to the target task guided by the service party (e.g., the intelligent dialogue robot), where a service utterance is an utterance spoken by the service party. For example, if the robot's goal is to guide the user to repay, the candidate utterances are service utterances related to repayment.
In some embodiments, the candidate utterances may be derived from historical service logs, historical call records, and the like, either by manual summarization or by extraction with a model or algorithm.
In some embodiments, the number of candidate utterances is one or more. The candidate utterances are pre-configured and may be stored in a memory (e.g., storage device 140). When computing benefit scores, the dialogue model may obtain the candidate utterances from the memory by direct reading, through an interface, or in any other manner.
The dialogue model is a reinforcement learning model. A trained reinforcement learning model can select among different actions in the current state and obtain the benefit value of each action. For the dialogue model, the state is the current dialog state, and the possible actions correspond to the candidate utterances. For training of the dialogue model, refer to FIG. 3 and its description, which are not repeated here.
It will be appreciated that, having determined the current dialog state, the benefit score determination module can obtain benefit scores of the one or more candidate utterances via the dialogue model. Specifically, the current dialog state and the candidate utterances may be input into the dialogue model, which outputs a benefit score for each candidate utterance.
In some embodiments, the benefit score is positively correlated with the probability that the corresponding candidate utterance achieves the business objective; that is, the greater the probability of achieving the business objective, the higher the benefit score. The business objective is the purpose the service party wishes to achieve through the conversation. In some embodiments, it is the target task the service party wishes the intelligent dialogue robot to guide the user to complete through the conversation. The business objective depends on the application scenario: in a debt collection scenario, the business objective is the user's repayment; in a merchandising scenario, it is the user's purchase of the promoted goods.
Step 208: determine, based on the benefit scores, a target utterance responsive to the dialog context from the one or more candidate utterances. Specifically, step 208 may be performed by the target utterance determination module 540.
In some embodiments, the target utterance determination module may determine the candidate utterance with the highest benefit score as the target utterance.
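The highest-score selection in step 208 reduces to an argmax over the scored candidates; a minimal sketch, with the function name and example utterances being illustrative:

```python
def select_target_utterance(benefit_scores):
    """Step 208: pick the candidate utterance whose benefit score,
    as output by the dialogue model, is highest.
    benefit_scores maps candidate utterance -> benefit score."""
    return max(benefit_scores, key=benefit_scores.get)

scores = {
    "Please repay today.": 0.2,
    "Shall we set up a repayment plan?": 0.9,
    "Your payment is overdue.": 0.5,
}
target = select_target_utterance(scores)
# target is "Shall we set up a repayment plan?"
```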
By using a dialogue model to determine the target utterance responsive to the dialog context from the candidate utterances, the embodiments of this specification require no pre-configured fixed dialog flow and can be flexibly applied to different dialogue scenarios. Moreover, because the dialogue model is a reinforcement learning model, the optimal candidate utterance can be selected intelligently based on the current dialog state, improving the probability of achieving the business objective.
It should be noted that the above description of flow 200 is for illustration only and does not limit the scope of this specification. Those skilled in the art may make various modifications and changes to flow 200 under the guidance of this specification; such modifications and changes remain within the scope of this specification.
FIG. 3 is an exemplary flow diagram of a method of training a dialogue model, shown in accordance with some embodiments of the present description. In some embodiments, one or more of the steps in flow 300 may be implemented in system 100 shown in FIG. 1. For example, one or more steps in flow 300 may be stored as instructions in storage device 140 and invoked and/or executed by processing device 110. In some embodiments, the process 300 may be implemented by a training module 550.
Step 302: obtain multiple rounds of historical dialogue. Specifically, step 302 may be performed by the dialogue data acquisition unit 551.
In some embodiments, a round of historical dialogue may refer to the full dialogue between two parties that occurred within a past period of time (e.g., a week, a month, half a year, or a year). A round of historical dialogue comprises multiple dialogue turns divided in chronological order; a turn may consist of a user utterance and the response utterance of the service party (e.g., the intelligent dialogue robot) to that user utterance.
For example, let a denote utterances of user A and b denote utterances of intelligent robot B (which may also be a human agent). If the content of the whole dialogue between A and B is a1b1a2b2a3b3a4b4, then a1b1a2b2a3b3a4b4 is one round of full dialogue; a1b1, a2b2, a3b3, and a4b4 may each be regarded as one dialogue turn, namely the first, second, third, and fourth turns, respectively.
In some embodiments, the historical dialogues are derived from historical conversation logs, historical call records, and the like. They may be obtained from an online platform (e.g., a website or an application), from manually consolidated conversation records, or read directly from a storage device storing a large number of rounds of historical dialogue.
Training data refers to the data input into the reinforcement learning model to train it. In some embodiments, each set of training data includes at least a sample current dialog state, a response utterance, a sample next dialog state, and a reward value corresponding to the response utterance. In some embodiments, the training data extraction unit may extract multiple sets of training data from the multiple rounds of historical dialogue.
A response utterance is the service utterance in a historical dialogue that responds to a user utterance, where a service utterance is an utterance spoken by the service party (e.g., the intelligent dialogue robot). In some embodiments, the response utterance is the service utterance of one dialogue turn. It will be appreciated that if a round of full historical dialogue contains multiple turns, multiple response utterances can be determined. In some embodiments, the response utterance may be encoded as a vector, denoted a below.
Continuing the example of step 302, the response utterance may be b1 in the first turn a1b1, b2 in the second turn a2b2, b3 in the third turn a3b3, or b4 in the fourth turn a4b4.
The sample current dialog state is the vector of the user utterance to which the response utterance replies, together with the dialogue content preceding that utterance, denoted s. For the process of obtaining this vector from the user utterance and its preceding dialogue content, refer to the processing of step 204.
Continuing the above example with the round of full historical dialogue a1b1a2b2a3b3a4b4: if the vector of response utterance b1 is b1′, the corresponding sample current dialog state is the vector a1′ of a1; if the vector of response utterance b2 is b2′, the corresponding sample current dialog state is the vector (a1b1a2)′ of a1b1a2; and so on.
The sample next dialog state is the vector of the user utterance following the response utterance, together with the dialogue content preceding that utterance, denoted s′.
Continuing the above example with the round of full historical dialogue a1b1a2b2a3b3a4b4: if the vector of the response utterance is b1′, the corresponding sample next dialog state is the vector (a1b1a2)′ of a1b1a2; if the vector of the response utterance is b2′, the corresponding sample next dialog state is the vector (a1b1a2b2a3)′ of a1b1a2b2a3; and so on.
In some embodiments, the reward value corresponding to the response utterance may reflect one or more of the following: the relevance of the response utterance to the sample current dialog state, the probability of achieving the business objective, the emotion score of the historical user utterance replying to the response utterance, the intent category of that historical user utterance, and the dialogue engagement related to the response utterance. In some embodiments, the training data extraction unit may determine the reward value, denoted r, based on the above relevance, achievement probability of the business objective, emotion score, intent category, and dialogue engagement.
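One plausible way to combine the five signals above into the scalar r is a weighted sum; note the weighted-sum form, the weights, and the function name are assumptions for illustration only, as the patent does not specify how the signals are combined:

```python
def reward(relevance, goal_prob, emotion, intent_bonus, engagement,
           weights=(0.3, 0.3, 0.2, 0.1, 0.1)):
    """Hypothetical combination of the five reward signals listed above
    (relevance, business-objective probability, emotion score, intent
    bonus, engagement) into a single scalar reward r."""
    signals = (relevance, goal_prob, emotion, intent_bonus, engagement)
    return sum(w * x for w, x in zip(weights, signals))

r = reward(relevance=0.8, goal_prob=0.5, emotion=0.6,
           intent_bonus=1.0, engagement=0.7)
```

Any other positively correlated aggregation (e.g., a learned reward model) would fit the description equally well.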
In some embodiments, the sample next dialog state may reflect whether the dialogue has ended: if there is no sample next dialog state, the dialogue has ended. Continuing the above example, no next dialog state can be extracted for the fourth turn, which indicates that the dialogue ends after the fourth turn.
In some embodiments, a dialogue-end identifier, denoted t, may also be added to the training data to indicate whether the dialogue has ended. The identifier marks whether the corresponding training data is the last turn in the round of historical dialogue. In some embodiments, the identifier may be a preset character, such as a number, a letter, or another symbol; illustratively, 0 indicates that the dialogue has ended and 1 indicates that it has not. In some embodiments, if the dialogue-end identifier indicates an end, the training data has no sample next dialog state s′.
In some embodiments, a set of training samples may be represented as (s, a, s′, r, t). As described above, if a round of full historical dialogue contains multiple turns, multiple sets of training samples may be determined. It will be appreciated that training data can be extracted on a per-turn basis.
Continuing the above example with the round of full historical dialogue a1b1a2b2a3b3a4b4: based on the first turn a1b1, the training sample (s, a, s′, r, t) can be extracted as (a1′, b1′, (a1b1a2)′, r1, 1); based on the second turn a2b2, as ((a1b1a2)′, b2′, (a1b1a2b2a3)′, r2, 1); based on the third turn a3b3, as ((a1b1a2b2a3)′, b3′, (a1b1a2b2a3b3a4)′, r3, 1); and based on the fourth turn a4b4, as ((a1b1a2b2a3b3a4)′, b4′, none, r4, 0).
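The per-turn extraction described above can be sketched as follows. The function name is illustrative; the t convention (1 for an ongoing dialogue, 0 for the final turn, with no s′) follows the description above, and `encode` stands for the state-encoding step of the flow:

```python
def extract_training_samples(turns, rewards, encode):
    """Extract (s, a, s', r, t) tuples from one round of full dialogue.
    turns:   list of (user_utterance, response_utterance) pairs,
             e.g. [(a1, b1), (a2, b2), ...]
    rewards: reward value r_i for each turn
    encode:  maps a list of utterances to a state vector"""
    samples, history = [], []
    for i, (user, resp) in enumerate(turns):
        history.append(user)
        s = encode(history)              # state: context up to this user utterance
        history.append(resp)
        if i + 1 < len(turns):
            # next state includes the next user utterance and all prior content
            s_next = encode(history + [turns[i + 1][0]])
            samples.append((s, resp, s_next, rewards[i], 1))
        else:
            samples.append((s, resp, None, rewards[i], 0))  # dialogue ended
    return samples
```

Note that the next state of turn i equals the current state of turn i + 1, matching the a1b1a2b2a3b3a4b4 example.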
Step 306: iteratively update parameters of the reinforcement learning model based on the multiple sets of training data, so that the trained dialogue model can determine benefit scores of candidate utterances given any current dialog state. Specifically, step 306 may be performed by the model parameter updating unit 553.
During model training, the model parameter updating unit continually updates the parameters of the reinforcement learning model based on the training data. Specifically, it may keep adjusting the parameters until the model's loss function satisfies a preset condition, for example until the loss function converges or its value falls below a preset threshold. When the condition is met, training ends and the trained dialogue model is obtained. The trained dialogue model may be a mapping from the current dialog state and the candidate utterances to benefit scores; it may be understood as a Q(s, a) function that determines the benefit score of each candidate utterance given the input current dialog state.
In some embodiments, the model parameter update unit may train the dialogue model through an off-policy reinforcement learning method. Specifically, the model parameter update unit may put the plurality of sets of training data into an experience replay data set, randomly extract training data from the experience replay data set, and update the parameters of the reinforcement learning model with the extracted training data based on the off-policy method. In some embodiments, off-policy reinforcement learning methods may include, but are not limited to: Q-learning, DQN, DDPG, and the like.
The experience replay data set may also be referred to as an experience pool. Extracting training data from it by uniform random sampling breaks the correlation between consecutive training samples; such correlation would otherwise prevent the reinforcement learning model from learning effectively and make it difficult to converge. At the same time, uniformly sampling across many training samples smooths the distribution of the training data and alleviates the problem of shifting sample distributions.
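A minimal experience-pool sketch (class name and capacities are illustrative, not from the patent): tuples go into a bounded buffer, and training batches are drawn by uniform random sampling, which is what breaks the correlation described above.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool: stores (s, a, s_next, r, t) tuples."""

    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)   # oldest samples fall out

    def put(self, sample):
        self.pool.append(sample)

    def sample(self, batch_size):
        # Uniform random sampling over the whole pool.
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))

buf = ReplayBuffer(capacity=1000)
for i in range(100):
    buf.put((f"s{i}", f"a{i}", f"s{i + 1}", 0.5, 1))
batch = buf.sample(32)   # 32 de-correlated training samples
```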
Taking the DQN reinforcement learning algorithm as an example, in some embodiments, when each set of training data does not include a session end identifier, the training loss function may be formula (1):

L_i(θ_i) = f( r + γ·max_{a'} Q(s', a'; θ_{i-1}) − Q(s, a; θ_i) )    (1)

wherein L_i(θ_i) represents the loss function of the model after the i-th iterative training; f represents some positive-correlation mapping (for example, a squared-error form in standard DQN); θ_i represents the parameters of the model in the i-th iterative training; θ_{i-1} represents the parameters of the model in the (i-1)-th iterative training; γ represents the discount factor; r represents the reward value corresponding to the response utterance; s represents the sample dialog current state in the training data; a represents the response utterance in the training data; Q(s, a; θ_i) represents the revenue score of the response utterance a determined by the model after the i-th iterative training based on the sample dialog current state s; s' represents the sample dialog next state in the training data, and a' represents a candidate utterance; max_{a'} Q(s', a'; θ_{i-1}) represents the maximum, over all candidate utterances, of the revenue scores determined by the model of the (i-1)-th iterative training based on the sample dialog next state s'. Here r reflects the current benefit, and γ·max_{a'} Q(s', a'; θ_{i-1}) reflects the future benefit.
As described above, when the session has ended and there is no dialog next state, the loss function becomes L_i(θ_i) = f( r − Q(s, a; θ_i) ).
Taking the DQN reinforcement learning algorithm as an example, in some implementations, if each set of training data further includes a session end identifier, the loss function can be formula (2):

L_i(θ_i) = f( r + t·γ·max_{a'} Q(s', a'; θ_{i-1}) − Q(s, a; θ_i) )    (2)

Except for t, all parameters in formula (2) have the same meaning as in formula (1) and are not described again here. t is the end-of-dialog identifier: when the dialog has ended, t = 0 and formula (2) reduces to L_i(θ_i) = f( r − Q(s, a; θ_i) ); when the dialog has not ended, t = 1 and formula (2) reduces to formula (1).
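The error term inside formulas (1) and (2) can be sketched as follows; q and q_prev are stand-ins for the model at iterations i and i−1, and candidates is the candidate-utterance set (all names here are illustrative).

```python
def td_error(sample, q, q_prev, candidates, gamma=0.9):
    """Error term of formula (2): r + t*gamma*max_a' Q(s', a') - Q(s, a).
    With t = 0 (dialog ended) the future term vanishes, matching the
    terminal-case loss."""
    s, a, s_next, r, t = sample
    future = 0.0
    if t == 1:
        # Max revenue score over all candidate utterances at the next
        # state, under the previous iteration's parameters.
        future = max(q_prev(s_next, a2) for a2 in candidates)
    return r + t * gamma * future - q(s, a)

# Toy check with constant Q-functions:
err = td_error(("s", "a", "s2", 1.0, 1),
               q=lambda s, a: 0.5,
               q_prev=lambda s, a: 1.0,
               candidates=["a1", "a2"])
# 1.0 + 0.9 * 1.0 - 0.5 = 1.4
```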
In some embodiments, the model parameter update unit may also train the dialogue model through an on-policy reinforcement learning method. In some embodiments, on-policy reinforcement learning methods may include, but are not limited to: Sarsa, Sarsa(λ), and the like.
In some embodiments, the trained dialog model may be further optimized with real feedback from online users. Specifically, after the trained dialogue model is put online to hold real conversations with real users, new training data can be extracted from those real conversations. The user utterances in the new training data are the online users' real feedback to the service utterances; training continues on the new training data, and repeating this optimization process can continuously improve the performance of the model.
The reward value of the response utterance obtained in the embodiments of this description combines semantic relevance, the realization probability of the business objective, the emotion score, the intent category, and the conversation engagement feature; a dialogue model trained on training data built from this reward value can effectively weigh the current revenue score of a response utterance and improve the realization probability of the business objective. Meanwhile, training data are extracted from the experience replay data set, the reinforcement learning model is trained offline with an off-policy reinforcement learning method, and the training data in the experience replay data set are updated based on online interaction between the model and users, achieving a closed data loop and effective, continuous iteration of the model.
To illustrate more clearly and completely the method of training the dialogue model shown in some embodiments of the present specification, fig. 4 is taken as an example to schematically illustrate the training data acquisition process of that method.
Illustratively, as shown in fig. 4, the training data may be obtained specifically by:
and acquiring manual conversation logs, wherein each manual conversation log can correspond to one round of historical conversation, so that multiple rounds of historical conversation are obtained based on the manual conversation logs. Obtaining a primary dialog composed of a user utterance and a service dialog replying to the user utterance from a plurality of rounds of historical dialogs, and further, executing the following operations aiming at a certain primary dialog to extract a corresponding set of training data:
and extracting the service dialogues in the dialog, coding the service dialogues, and taking the expression vectors of the service dialogues obtained after coding as response dialogues a.
And extracting the user words before the service dialogues and the dialogues before the user words as first sample dialog texts, coding the first sample dialog texts, and taking the coded representation vectors as sample dialog current states s.
And extracting the user words after the service dialogues and dialogues before the user words as second sample dialog texts, coding the second sample dialog texts, and taking the coded expression vectors as sample dialog next states s'.
A reward value r corresponding to the response utterance is determined based on the relevance of the response utterance to the dialog current state, the realization probability of the business objective, the emotion score of the historical user utterance responding to the response utterance, the intent category of the historical user utterance responding to the response utterance, and the conversation engagement related to the response utterance. For example only, 5 sub-reward values are determined from these five factors, and the 5 sub-reward values are then weighted and summed to obtain r, where each weight is positively correlated with the importance of its sub-reward value. In some embodiments, the sub-reward value based on the relevance of the response utterance to the dialog current state and the sub-reward value determined based on the realization probability of the business objective are weighted more heavily. It can be understood that r measures, from the above 5 angles, the contribution of the response utterance to completing the target task.
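The weighted combination of the five sub-rewards can be sketched as follows. The weight values are illustrative assumptions, not from the patent; only their relative sizes matter here, with the relevance (r1) and business-objective (r2) terms weighted more heavily as the text describes.

```python
def combined_reward(r1, r2, r3, r4, r5,
                    weights=(0.3, 0.3, 0.15, 0.15, 0.1)):
    """Weighted sum of the 5 sub-reward values; the weights are assumed
    to be normalized and positively correlated with importance."""
    return sum(w * sub for w, sub in zip(weights, (r1, r2, r3, r4, r5)))

r = combined_reward(0.8, 0.6, 0.5, 1.0, 0.7)
# 0.3*0.8 + 0.3*0.6 + 0.15*0.5 + 0.15*1.0 + 0.1*0.7 = 0.715
```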
In some embodiments, the relevance of the response utterance to the dialog current state may be the semantic relevance of the two. In some embodiments, a trained semantic matching model may be used to process the sample dialog current state and the response utterance to determine the semantic relevance between them. Specifically, the dialog current state and the response utterance are input into the semantic matching model, which outputs the semantic relevance between the two as a number from 0 to 1.
In some embodiments, a first sub-reward value (denoted r1) may be determined based on the semantic relevance score, e.g., by taking the relevance score itself as r1, or by taking the relevance score minus a base value (e.g., 0.5) as the reward value.
In some embodiments, the semantic matching model may be trained based on sets of labeled training samples obtained from manual dialog logs. A set of training samples may include a sample dialog current state and an utterance. The label may indicate whether the sample dialog current state is related to the utterance (for example, related is represented by 1 and unrelated by 0): if the utterance is a sentence in the historical dialog text corresponding to the sample dialog current state, the label is related; otherwise, it is unrelated.
In some embodiments, a trained business objective prediction model can be used to process the sample dialog current state and the response utterance to determine the realization probability of the business objective. Specifically, the dialog current state and the response utterance are input into the business objective prediction model, which outputs the realization probability of the business objective as a number between 0 and 1.
In some embodiments, a second fractional prize value (denoted by r 2) may be determined based on the probability of achieving the business objective. The determination method is the same as r1 and is not described in detail.
In some embodiments, a business objective prediction model may be trained based on sets of labeled training samples. In some embodiments, a set of training samples may include a sample dialog current state and a response utterance. The label may be the time interval between the server outputting the response utterance and the user completing the target task, e.g., 20 min, 1 h, etc.
In some embodiments, historical user utterances responsive to the responsive utterance may be processed using a trained emotion recognition model, emotion categories of the historical user utterances responsive to the responsive utterance may be determined, and corresponding categories may be converted to scores based on preset mapping rules. The score is a number from 0 to 1.
In some embodiments, a third bonus value (denoted r 3) may be determined based on the sentiment score. The determination method is the same as r1 and is not described in detail.
In some embodiments, the emotion recognition model may be trained based on a plurality of labeled training samples. The training sample may be a user utterance. In some embodiments, the tags may be positive and negative sentiment category labels. For example, 0 characterizes negative emotions and 1 characterizes positive emotions. In some embodiments, the tags may also contain other emotions, e.g., neutral, etc.
In some embodiments, historical user utterances responsive to the responsive utterance may be processed using a trained intent recognition model to determine historical user utterance intent categories responsive to the responsive utterance. In some embodiments, the intent category may be accept, decline, interest, or the like. Different intent categories may correspond to different scores. In some embodiments, a more aggressive intent score is higher, e.g., an accepted score is higher than a rejected score.
In some embodiments, a fourth bonus value (denoted r 4) may be determined based on the intent score. The determination method is the same as r1 and is not described in detail.
In some embodiments, the intent recognition model may be trained based on sets of labeled training samples. In some embodiments, the set of training samples may include user utterances. In some embodiments, the tags may be intent categories.
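The category-to-score conversions for the emotion and intent sub-rewards can be sketched as below. The concrete categories and score values are assumptions consistent with the text (more positive categories score higher, scores in [0, 1]); the patent fixes only the direction of the mapping, not exact values.

```python
# Hypothetical mapping tables for the preset mapping rules.
EMOTION_SCORES = {"negative": 0.0, "neutral": 0.5, "positive": 1.0}
INTENT_SCORES = {"decline": 0.0, "interest": 0.6, "accept": 1.0}

def emotion_subscore(category):
    return EMOTION_SCORES.get(category, 0.5)   # unknown -> neutral

def intent_subscore(category):
    return INTENT_SCORES.get(category, 0.0)    # unknown -> lowest

# A more positive intent scores higher, e.g. accept > decline.
assert intent_subscore("accept") > intent_subscore("decline")
```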
In some embodiments, the historical dialogue data at the time of training the semantic matching model, the business objective prediction model, the emotion recognition model, and the intention recognition model may be different from the data of the training dialogue model.
In some embodiments, the conversation engagement related to the response utterance may be determined based on the number of conversations of the historical round of conversations in which the response utterance is located. In some embodiments, the conversation engagement may be obtained through a preset mapping between the number of conversations and the conversation engagement score, for example, the mapping shown in equation (3):

E = α^(N − n)    (3)

wherein E is the conversation engagement score; α is a constant in (0, 1); N is a preset maximum number of conversations, or the total number of conversations of the round of full historical conversations in which the response utterance is located; and n is the number of conversations of the historical round of conversations in which the response utterance occurs. Since α ∈ (0, 1), E grows toward 1 as n approaches N, consistent with engagement being positively correlated with the number of conversations.
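As a sketch, an engagement mapping with the stated properties (a constant α in (0, 1), a preset maximum number of conversations, and a score that rises with the number of conversations) could look like this; the exact functional form and the value of α are assumptions here, not taken from the patent.

```python
def engagement_score(n, n_max, alpha=0.8):
    """n: number of conversations of the round containing the response
    utterance; n_max: preset maximum (or total) number of conversations;
    alpha: an assumed constant in (0, 1). Returns a score in (0, 1]."""
    n = min(n, n_max)              # cap at the preset maximum
    return alpha ** (n_max - n)    # rises toward 1 as n approaches n_max

assert engagement_score(10, 10) == 1.0                    # full engagement
assert engagement_score(2, 10) < engagement_score(8, 10)  # monotone in n
```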
In some embodiments, a fifth point prize value (denoted r 5) may be determined based on the session engagement. The determination method is the same as r1 and is not described in detail.
In some embodiments, a determination of whether a conversation is complete, i.e., a conversation end identification, may be based on whether there are more user utterances after the responsive utterance.
Finally, the current state of the sample dialog, the response dialog, the corresponding reward value of the response dialog, the next state of the sample dialog and the end-of-dialog identifier are determined as a set of training data.
FIG. 5 is a block diagram of a dialog system shown in accordance with some embodiments of the present description. In some embodiments, the dialog system may be implemented by the processing device 110. In some embodiments, the dialog system may generate a target dialog based on the dialog context and the candidate dialogs. As shown in FIG. 5, dialog system 500 may include a dialog data acquisition module 510, a dialog current state determination module 520, a revenue score determination module 530, a target dialogs determination module 540, and a training module 550.
The dialogue data acquisition module 510 may be configured to acquire a dialog context; the dialog context includes at least one user utterance.
The dialog current state determination module 520 may be configured to determine a dialog current state based on the dialog context.
The revenue score determination module 530 may be configured to obtain, based on a dialogue model, revenue scores of one or more candidate utterances given the dialog current state, wherein the dialogue model is a reinforcement learning model. In some embodiments, the revenue score positively correlates with the probability that the corresponding candidate utterance leads to achievement of the business objective.
The target utterance determination module 540 may be configured to determine, from the one or more candidate utterances based on the revenue scores, a target utterance responsive to the dialog context.
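Modules 520 to 540 together describe a simple inference path: score every candidate utterance with the trained model and reply with the highest-scoring one. A minimal sketch, with model standing in for the trained Q(s, a) function and all strings purely illustrative:

```python
def choose_target_utterance(dialog_state, candidates, model):
    # Revenue score per candidate utterance, then argmax.
    scores = {a: model(dialog_state, a) for a in candidates}
    return max(scores, key=scores.get)

best = choose_target_utterance(
    "user asked about the repayment terms",
    ["offer_discount", "explain_terms", "end_call"],
    model=lambda s, a: {"offer_discount": 0.2,
                        "explain_terms": 0.9,
                        "end_call": 0.1}[a],
)
# best == "explain_terms"
```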
In some embodiments, the dialog system further comprises a training module 550, the training module 550 comprising: a dialogue data acquisition unit 551, a training data extraction unit 552, and a model parameter update unit 553.
In some embodiments, the dialogue data acquisition unit 551 may be used to acquire multiple rounds of historical dialogue.
In some embodiments, the training data extraction unit 552 may be configured to extract a plurality of sets of training data from the multiple rounds of historical conversations, one set of which includes at least: a sample dialog current state, a response utterance, a sample dialog next state, and a reward value corresponding to the response utterance. In some embodiments, a set of training data further comprises a conversation end identifier, used to identify whether the corresponding training data comes from the last dialogue in its round of historical conversations.
In some embodiments, the reward value corresponding to the response utterance reflects one or more of the following: the relevance of the response utterance to the sample dialog current state, the realization probability of the business objective, the emotion score of the historical user utterance responsive to the response utterance, the intent category of the historical user utterance responsive to the response utterance, and the conversation engagement related to the response utterance. In some embodiments, the training data extraction unit may be configured to process the sample dialog current state and the response utterance with a trained semantic matching model to determine the semantic relevance between the two.
In some embodiments, the training data extraction unit 552 may be configured to process the sample dialog current state and the response utterance with a trained business objective prediction model to determine the realization probability of the business objective.
In some embodiments, the training data extraction unit 552 may be configured to process historical user utterances responsive to the response utterance with a trained emotion recognition model to determine the emotion score.
In some embodiments, the training data extraction unit 552 may be used to process historical user utterances responsive to the response utterance using a trained intent recognition model to determine the intent categories. In some embodiments, the conversation engagement related to the response utterance positively correlates with the number of conversations corresponding to the response utterance.
In some embodiments, the model parameter update unit 553 may be configured to iteratively update parameters of the reinforcement learning model based on the plurality of sets of training data, such that the trained dialogue model is capable of determining a revenue score for a candidate utterance based on the current state of any dialog.
In some embodiments, the model parameter update unit 553 is configured to place the sets of training data into an experience replay data set, randomly extract training data from the experience replay data set, and update the parameters of the reinforcement learning model with the extracted training data based on an off-policy reinforcement learning method.
It should be appreciated that the system and its modules (e.g., a dialog system and its modules and/or a system for training a dialog model and its modules) may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above descriptions of the dialog system and its modules, and of the system for training the dialog model and its modules, are only for convenience of description and should not be construed as limiting the present disclosure to the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, modules may be combined or connected to other modules in any configuration without departing from those teachings. For example, the conversation data acquisition module 510, the dialog current state determination module 520, the revenue score determination module 530, the target utterance determination module 540, and the training module 550 disclosed in the dialog system may be different modules in one system, or one module may implement the functions of two or more of these modules. As another example, the modules in the dialog system may share one storage module, or each module may have its own storage module. Such variations are within the scope of the present disclosure.
Embodiments of the present specification also provide an apparatus for dialog, comprising at least one storage medium and at least one processor, the at least one storage medium configured to store computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of dialog described in any of the embodiments of this specification.
Embodiments of the present specification further provide an apparatus for training a dialogue model, comprising at least one storage medium and at least one processor, the at least one storage medium being used to store computer instructions; the at least one processor is configured to execute the computer instructions to implement the method for training a dialogue model described in any embodiment of the present specification.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) the embodiments in this specification use a dialogue model to determine the target utterance responding to the dialog context from the candidate utterances, require no fixed dialog flow configured in advance, and can be flexibly applied to different dialog scenarios; (2) the reward value of the response utterance combines semantic relevance, the realization probability of the business objective, the emotion score, the intent category, and the conversation engagement, so the dialogue model trained on training data built from it can measure the revenue value of candidate utterances from these 5 angles and output to the user the candidate utterance most likely to guide the user to complete the target task, thereby improving the realization probability of the business objective; (3) the trained dialogue model is iteratively optimized based on real feedback data from online users, which can improve the performance of the model. It is to be noted that different embodiments may produce different advantages; in different embodiments, any one or a combination of the above advantages, or any other advantages, may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.
Claims (28)
1. A method of dialog, comprising:
acquiring a dialog context; the dialog context comprises at least one user utterance;
determining a current state of a dialog based on the dialog context;
obtaining, based on a dialogue model, revenue scores of one or more candidate utterances given the current state of the dialog; wherein the dialogue model is a reinforcement learning model;
determining, from the one or more candidate utterances based on the revenue scores, a target utterance responsive to the dialog context.
2. The method of claim 1, wherein the revenue score positively correlates with a probability that the corresponding candidate utterance causes the business objective to be achieved.
3. The method of claim 1, wherein the dialog model is obtained by:
acquiring multiple rounds of historical dialogs;
extracting multiple sets of training data from the multiple rounds of historical dialogs, one of the sets of training data comprising at least: a sample dialog current state, a responsive utterance, a sample dialog next state, and a reward value corresponding to the responsive utterance; and
iteratively updating parameters of the reinforcement learning model based on the multiple sets of training data, so that the trained dialog model can determine revenue scores of candidate utterances based on any dialog current state.
4. The method of claim 3, wherein the reward value corresponding to the responsive utterance reflects one or more of:
a degree of relevance between the responsive utterance and the sample dialog current state, a probability of achieving a business objective, an emotion score of a historical user utterance replying to the responsive utterance, an intent category of the historical user utterance replying to the responsive utterance, and a dialog engagement degree associated with the responsive utterance.
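The reward signals listed in the claim above could be combined, for example, as a weighted sum. The weights and the mapping of each signal to a number below are illustrative assumptions, not specified by the patent.

```python
# Sketch: combine the reward components named in the claim into a single
# scalar reward for a responsive utterance. Weights are arbitrary examples.
def combined_reward(
    relevance: float,      # relevance to the sample dialog current state, in [0, 1]
    goal_prob: float,      # probability the business objective is achieved
    emotion_score: float,  # emotion of the user's reply, in [-1, 1]
    intent_bonus: float,   # bonus/penalty mapped from the reply's intent category
    engagement: float,     # dialog engagement degree
    weights=(1.0, 1.0, 0.5, 0.5, 0.25),
) -> float:
    signals = (relevance, goal_prob, emotion_score, intent_bonus, engagement)
    return sum(w * s for w, s in zip(weights, signals))

r = combined_reward(0.8, 0.3, 0.5, 1.0, 0.2)
```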
5. A dialog system, comprising:
a dialog data acquisition module, configured to acquire a dialog text, the dialog text comprising at least one user utterance;
a dialog current state determination module, configured to determine a current state of a dialog based on the dialog text;
a revenue score determination module, configured to obtain, using a dialog model, revenue scores of one or more candidate utterances based on the current state of the dialog, wherein the dialog model is a reinforcement learning model; and
a target utterance determination module, configured to determine, from the one or more candidate utterances and based on the revenue scores, a target utterance responsive to the dialog text.
6. The system of claim 5, wherein the revenue score positively correlates with a probability that the corresponding candidate utterance causes a business objective to be achieved.
7. The system of claim 5, further comprising a training module, the training module comprising:
a dialog data acquisition unit, configured to acquire multiple rounds of historical dialogs;
a training data extraction unit, configured to extract multiple sets of training data from the multiple rounds of historical dialogs, wherein one of the sets of training data comprises at least: a sample dialog current state, a responsive utterance, a sample dialog next state, and a reward value corresponding to the responsive utterance; and
a model parameter update unit, configured to iteratively update parameters of the reinforcement learning model based on the multiple sets of training data, so that the trained dialog model can determine revenue scores of candidate utterances based on any dialog current state.
8. The system of claim 7, wherein the reward value corresponding to the responsive utterance reflects one or more of:
a degree of relevance between the responsive utterance and the sample dialog current state, a probability of achieving a business objective, an emotion score of a historical user utterance replying to the responsive utterance, an intent category of the historical user utterance replying to the responsive utterance, and a dialog engagement degree associated with the responsive utterance.
9. A dialog apparatus, comprising at least one storage medium and at least one processor, the at least one storage medium being configured to store computer instructions; the at least one processor being configured to execute the computer instructions to implement the method of any one of claims 1-4.
10. A method of training a dialog model, comprising:
acquiring multiple rounds of historical dialogs;
extracting multiple sets of training data from the multiple rounds of historical dialogs, wherein one of the sets of training data comprises at least a sample dialog current state, a responsive utterance, a sample dialog next state, and a reward value corresponding to the responsive utterance; and
iteratively updating parameters of a reinforcement learning model based on the multiple sets of training data, so that the trained dialog model can determine revenue scores of candidate utterances based on any dialog current state.
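The extraction step described above can be sketched as follows. The state representation (concatenated dialog history) and all names are illustrative assumptions; the patent does not prescribe a particular encoding.

```python
# Sketch: turn one round of historical dialog into training tuples of
# (sample dialog current state, responsive utterance, reward,
#  sample dialog next state, dialog-end identifier).
from typing import List, Tuple

def extract_training_data(
    turns: List[Tuple[str, str]],  # (user_utterance, responsive_utterance) pairs
    rewards: List[float],          # precomputed reward per responsive utterance
) -> List[Tuple[str, str, float, str, bool]]:
    data = []
    history = ""
    for i, (user, response) in enumerate(turns):
        state = history + user          # sample dialog current state
        next_state = state + response   # sample dialog next state
        done = i == len(turns) - 1      # marks the last dialog in the round
        data.append((state, response, rewards[i], next_state, done))
        history = next_state
    return data

tuples = extract_training_data([("hi", "hello"), ("price?", "it is $5")], [0.1, 0.9])
```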
11. The method of claim 10, wherein one of the sets of training data further comprises:
a dialog end identifier for identifying whether the corresponding training data is the last dialog in a round of historical dialogs.
12. The method of claim 10, wherein the reward value corresponding to the responsive utterance reflects one or more of:
a degree of relevance between the responsive utterance and the sample dialog current state, a probability of achieving a business objective, an emotion score of a historical user utterance replying to the responsive utterance, an intent category of the historical user utterance replying to the responsive utterance, and a dialog engagement degree associated with the responsive utterance.
13. The method of claim 12, wherein the degree of relevance between the responsive utterance and the sample dialog current state is obtained by:
processing the sample dialog current state and the responsive utterance with a trained semantic matching model to determine a semantic relevance between the sample dialog current state and the responsive utterance.
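A minimal sketch of such a relevance score follows. A bag-of-words cosine similarity stands in for the patent's trained semantic matching model; in practice a learned encoder would produce the vectors.

```python
# Sketch: score semantic relevance between a dialog state and a responsive
# utterance via cosine similarity over word counts (toy stand-in for a
# trained semantic matching model).
import math
from collections import Counter

def semantic_relevance(state: str, response: str) -> float:
    a, b = Counter(state.lower().split()), Counter(response.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

rel = semantic_relevance("how much is shipping", "shipping is free today")
```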
14. The method of claim 12, wherein the probability of achieving the business objective is obtained by:
processing the sample dialog current state and the responsive utterance with a trained business objective prediction model to determine the probability of achieving the business objective; the business objective prediction model being trained based on historical dialog data and the time intervals between the historical dialog data and achievement of the business objective.
15. The method of claim 12, wherein the emotion score of the historical user utterance replying to the responsive utterance is obtained by:
processing the historical user utterance replying to the responsive utterance with a trained emotion recognition model to determine the emotion score.
16. The method of claim 12, wherein the intent category of the historical user utterance replying to the responsive utterance is obtained by:
processing the historical user utterance replying to the responsive utterance with a trained intent recognition model to determine the intent category.
17. The method of claim 12, wherein the dialog engagement degree associated with the responsive utterance positively correlates with the number of dialog rounds corresponding to the responsive utterance.
18. The method of claim 10, wherein iteratively updating the parameters of the reinforcement learning model based on the multiple sets of training data comprises:
placing the multiple sets of training data into an experience replay dataset; and
randomly sampling training data from the experience replay dataset, and updating the parameters of the reinforcement learning model with the sampled training data using an off-policy reinforcement learning method.
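The replay-and-update scheme described above can be sketched with a tabular Q-learning update, which is one common off-policy method; the patent does not commit to a specific algorithm, and the states, actions, and hyperparameters below are illustrative assumptions.

```python
# Sketch: store training tuples in an experience replay dataset, sample
# uniformly at random, and apply an off-policy (Q-learning style) update.
import random
from collections import defaultdict

# Experience replay dataset of (state, action, reward, next_state, done) tuples.
replay = [
    ("s0", "a0", 1.0, "s1", False),
    ("s1", "a1", 0.0, "s2", True),
]

Q = defaultdict(float)           # tabular stand-in for the model's parameters
ACTIONS = ["a0", "a1"]
alpha, gamma = 0.5, 0.9          # learning rate and discount factor

random.seed(0)
for _ in range(100):
    s, a, r, s2, done = random.choice(replay)  # random sampling from replay
    # Off-policy TD target: max over next-state actions, regardless of the
    # behavior policy that generated the data.
    target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Sampling uniformly from replay breaks the temporal correlation between consecutive dialog turns, which is the usual motivation for combining experience replay with an off-policy update.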
19. A system for training a dialog model, comprising:
a dialog data acquisition unit, configured to acquire multiple rounds of historical dialogs;
a training data extraction unit, configured to extract multiple sets of training data from the multiple rounds of historical dialogs, wherein one of the sets of training data comprises at least a sample dialog current state, a responsive utterance, a sample dialog next state, and a reward value corresponding to the responsive utterance; and
a model parameter update unit, configured to iteratively update parameters of a reinforcement learning model based on the multiple sets of training data, so that the trained dialog model can determine revenue scores of candidate utterances based on any dialog current state.
20. The system of claim 19, wherein one of the sets of training data further comprises:
a dialog end identifier for identifying whether the corresponding training data is the last dialog in a round of historical dialogs.
21. The system of claim 19, wherein the reward value corresponding to the responsive utterance reflects one or more of:
a degree of relevance between the responsive utterance and the sample dialog current state, a probability of achieving a business objective, an emotion score of a historical user utterance replying to the responsive utterance, an intent category of the historical user utterance replying to the responsive utterance, and a dialog engagement degree associated with the responsive utterance.
22. The system of claim 21, wherein the training data extraction unit is further configured to:
process the sample dialog current state and the responsive utterance with a trained semantic matching model to determine a semantic relevance between the sample dialog current state and the responsive utterance.
23. The system of claim 21, wherein the training data extraction unit is further configured to:
process the sample dialog current state and the responsive utterance with a trained business objective prediction model to determine the probability of achieving the business objective; the business objective prediction model being trained based on historical dialog data and the time intervals between the historical dialog data and achievement of the business objective.
24. The system of claim 21, wherein the training data extraction unit is further configured to:
process the historical user utterance replying to the responsive utterance with a trained emotion recognition model to determine the emotion score.
25. The system of claim 21, wherein the training data extraction unit is further configured to:
process the historical user utterance replying to the responsive utterance with a trained intent recognition model to determine the intent category.
26. The system of claim 21, wherein the dialog engagement degree associated with the responsive utterance positively correlates with the number of dialog rounds corresponding to the responsive utterance.
27. The system of claim 19, wherein the model parameter update unit is further configured to:
place the multiple sets of training data into an experience replay dataset; and
randomly sample training data from the experience replay dataset, and update the parameters of the reinforcement learning model with the sampled training data using an off-policy reinforcement learning method.
28. An apparatus for training a dialog model, comprising at least one storage medium and at least one processor, the at least one storage medium being configured to store computer instructions; the at least one processor being configured to execute the computer instructions to implement the method of any one of claims 10-18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010251489.5A CN111160514B (en) | 2020-04-01 | 2020-04-01 | Conversation method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010251489.5A CN111160514B (en) | 2020-04-01 | 2020-04-01 | Conversation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111160514A true CN111160514A (en) | 2020-05-15 |
CN111160514B CN111160514B (en) | 2020-08-28 |
Family
ID=70567783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010251489.5A Active CN111160514B (en) | 2020-04-01 | 2020-04-01 | Conversation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160514B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190115027A1 (en) * | 2017-10-12 | 2019-04-18 | Google Llc | Turn-based reinforcement learning for dialog management |
CN108227932A (en) * | 2018-01-26 | 2018-06-29 | 上海智臻智能网络科技股份有限公司 | Interaction is intended to determine method and device, computer equipment and storage medium |
CN108763495A (en) * | 2018-05-30 | 2018-11-06 | 苏州思必驰信息科技有限公司 | Interactive method, system, electronic equipment and storage medium |
CN109086329A (en) * | 2018-06-29 | 2018-12-25 | 出门问问信息科技有限公司 | Dialogue method and device are taken turns in progress based on topic keyword guidance more |
CN110837548A (en) * | 2019-11-05 | 2020-02-25 | 泰康保险集团股份有限公司 | Answer matching method and device, electronic equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
PAULA WESSELMANN et al.: "Curiosity-driven Reinforcement Learning for Dialogue Management", ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) * |
SONG HAOYU et al.: "DQN-based Policy Learning for Open-domain Multi-turn Dialogues", Journal of Chinese Information Processing * |
MA YUE: "Research and Application of a Dialogue Management Model Based on Deep Reinforcement Learning", China Masters' Theses Full-text Database, Information Science and Technology * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761136A (en) * | 2020-06-02 | 2021-12-07 | 阿里巴巴集团控股有限公司 | Dialogue processing method, information processing method, model training method, information processing apparatus, model training apparatus, and storage medium |
CN113761111A (en) * | 2020-07-31 | 2021-12-07 | 北京汇钧科技有限公司 | Intelligent conversation method and device |
CN111966805A (en) * | 2020-08-13 | 2020-11-20 | 贝壳技术有限公司 | Method, device, medium and electronic equipment for assisting in realizing session |
CN112328769A (en) * | 2020-11-16 | 2021-02-05 | 北京沃东天骏信息技术有限公司 | Automatic customer service response method, device and computer readable storage medium |
CN112131372A (en) * | 2020-11-25 | 2020-12-25 | 中国科学院自动化研究所 | Knowledge-driven conversation strategy network optimization method, system and device |
CN112488239A (en) * | 2020-12-02 | 2021-03-12 | 罗科仕管理顾问有限公司 | Method and apparatus for artificial intelligence based computer-aided uniform system |
CN112488239B (en) * | 2020-12-02 | 2022-01-07 | 罗科仕管理顾问有限公司 | Method and apparatus for artificial intelligence based computer-aided uniform system |
CN112507094A (en) * | 2020-12-11 | 2021-03-16 | 润联软件系统(深圳)有限公司 | Customer service robot dialogue method based on reinforcement learning and related components thereof |
CN112307188A (en) * | 2020-12-30 | 2021-02-02 | 北京百度网讯科技有限公司 | Dialog generation method, system, electronic device and readable storage medium |
CN112988991A (en) * | 2021-02-04 | 2021-06-18 | 支付宝(杭州)信息技术有限公司 | Method and system for anti-fraud intervention through man-machine conversation |
CN112988991B (en) * | 2021-02-04 | 2023-04-18 | 支付宝(杭州)信息技术有限公司 | Method and system for performing anti-fraud intervention through man-machine conversation |
Also Published As
Publication number | Publication date |
---|---|
CN111160514B (en) | 2020-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111160514B (en) | Conversation method and system | |
CN110990547B (en) | Phone operation generation method and system | |
US11425064B2 (en) | Customized message suggestion with user embedding vectors | |
CN109285030A (en) | Products Show method, apparatus, terminal and computer readable storage medium | |
CN111177325B (en) | Method and system for automatically generating answers | |
CN116226334A (en) | Method for training generated large language model and searching method based on model | |
CN109829044A (en) | Dialogue method, device and equipment | |
US11995523B2 (en) | Systems and methods for determining training parameters for dialog generation | |
US11514894B2 (en) | Adaptively modifying dialog output by an artificial intelligence engine during a conversation with a customer based on changing the customer's negative emotional state to a positive one | |
CN110399472B (en) | Interview question prompting method and device, computer equipment and storage medium | |
EP3602417A1 (en) | Selecting answer spans from electronic documents using machine learning | |
CN111339309A (en) | Corpus expansion method and system for user intention | |
US11651439B2 (en) | System and method for pre-qualifying a consumer for life and health insurance products or services, benefits products or services based on eligibility and referring a qualified customer to a licensed insurance agent, producer or broker to facilitate the enrollment process | |
CN113010653A (en) | Method and system for training and conversing conversation strategy model | |
CN114722164A (en) | Intelligent comment replying method and device | |
CN114386426B (en) | Gold medal speaking skill recommendation method and device based on multivariate semantic fusion | |
CN116246632A (en) | Method and device for guiding external call operation | |
CN111914077A (en) | Customized speech recommendation method, device, computer equipment and storage medium | |
CN114418320A (en) | Customer service quality evaluation method, apparatus, device, medium, and program product | |
CN118296119A (en) | Method, device, equipment, medium and program product for generating prompt word | |
CN117574907A (en) | Task execution method and device | |
CN117370512A (en) | Method, device, equipment and storage medium for replying to dialogue | |
CN116561284A (en) | Intelligent response method, device, electronic equipment and medium | |
CN111651582B (en) | Method and system for simulating user speaking | |
CN114519094A (en) | Method and device for conversational recommendation based on random state and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||