CN113868395B - Multi-round dialogue generation type model establishment method, system, electronic equipment and medium - Google Patents
- Publication number
- CN113868395B (Application No. CN202111180118.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- vector
- round
- model
- attention distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The application discloses a multi-round dialogue generation model establishment method, system, electronic device and medium, wherein the method comprises the following steps: constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network; processing text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text, to obtain the final multi-round dialogue generation model. By designing an attention mechanism for the multi-round dialogue scene, the application solves the problem of storing previous-round dialogue information, and improves the utilization rate and mining degree of previous-round dialogue information.
Description
Technical Field
The application relates to the technical field of deep learning, and in particular to a multi-round dialogue generation model establishment method, system, electronic device and medium.
Background
In the prior art, multi-round dialogue generation models are mainly established through two schemes: pipeline-based methods and deep-learning-network-based methods. A pipeline-based dialogue generation method mainly comprises three parts: natural language understanding, dialogue state management and natural language generation; because the overall performance of the model is limited by each of these parts, its generalization capability is poor. A deep-learning-network-based multi-round dialogue generation method is mainly limited by the storage and utilization of previous-round dialogue information: the background information grows as the number of dialogue rounds increases, and basic information such as the dialogue pattern and the sequence length is not controlled. How to solve the storage of previous-round dialogue information and improve its utilization rate and mining degree therefore remains a problem to be solved.
Disclosure of Invention
The embodiments of the application provide a multi-round dialogue generation model establishment method, system, electronic device and medium, which at least solve the problems of low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information.
The invention provides a multi-round dialogue generation type model establishment method, which comprises the following steps:
constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
processing the text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
And updating the coding layer by carrying out back propagation calculation on the initial multi-round dialogue generation type model through the response text to obtain a final multi-round dialogue generation type model.
In the above multi-round dialogue generation model establishment method, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text comprises:
extracting features of the text to obtain text features, and vectorizing the text features to obtain the text vectors;
scoring the text vector through a key matrix and a value matrix of the coding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
And obtaining the response text according to the splicing vector through the decoding layer.
In the above multi-round dialogue generation model establishment method, the step of splicing the text vector and the attention distribution vector to obtain a spliced vector comprises:
and processing the text vector and the attention distribution vector based on a jump connection mode to obtain the spliced vector.
In the above multi-round dialogue generation model establishment method, the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
performing product operation on the text vector according to the key matrix to obtain an operation result;
And carrying out product operation on the operation result according to the value matrix to obtain the attention distribution vector.
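The two product operations above can be illustrated with a minimal NumPy sketch. The matrix shapes and the softmax normalization of the scores are assumptions added for concreteness; the method itself specifies only the two product operations:

```python
import numpy as np

def attention_distribution(x, K, V):
    """Score a text vector with the key matrix, then weight it with the value matrix.

    x: text vector of shape (d,)
    K: key matrix of shape (m, d)   -- assumed shape
    V: value matrix of shape (m, d) -- assumed shape
    """
    scores = K @ x                       # product with the key matrix -> operation result
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()    # assumed softmax normalization of the scores
    return V.T @ weights                 # product with the value matrix -> attention distribution vector

rng = np.random.default_rng(0)
d, m = 8, 4
x = rng.standard_normal(d)
K = rng.standard_normal((m, d))
V = rng.standard_normal((m, d))
attn = attention_distribution(x, K, V)   # attention distribution vector of shape (d,)
```

The attention distribution vector keeps the same dimension as the text vector here, so the two can later be spliced by addition.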
In the above multi-round dialogue generation model establishment method, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text further comprises:
and adjusting the dimensionality of the spliced vector through the multi-layer perceptron structure of the coding layer.
In the above multi-round dialogue generation model establishment method, the step of updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text to obtain a final multi-round dialogue generation model comprises:
and carrying out back propagation calculation on the initial multi-round dialogue generation type model according to the response text to obtain a loss function value, and updating model parameters according to the loss function value to obtain the final multi-round dialogue generation type model.
In the above multi-round dialogue generation model establishment method, the model parameters include at least one of the key matrix and the value matrix.
The invention also provides a multi-round dialogue generation model building system, suitable for the above multi-round dialogue generation model establishment method, the system comprising:
the coding layer construction unit is used for constructing a coding layer based on an attention mechanism, processing a text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
And the decoding layer construction unit is used for constructing a decoding layer based on an LSTM network, and updating the coding layer by carrying out back propagation calculation on the initial multi-round dialogue generation type model through the response text to obtain a final multi-round dialogue generation type model.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above multi-round dialogue generation model establishment method when executing the computer program.
The invention also provides a readable storage medium of an electronic device, wherein the readable storage medium stores computer program instructions which, when executed by a processor, implement the above multi-round dialogue generation model establishment method.
Compared with the related art, in the multi-round dialogue generation model establishment method, system, electronic device and medium provided by the application, when forward-propagation calculation is performed on the text in the model training stage, attention calculation is performed on the text of each round using the value matrix and key matrix of the previous round, which contain the information of all previous rounds of dialogue; during back-propagation, the information of the current round is then written into the value matrix and key matrix for use in later rounds of dialogue. This solves the problems, caused by unreasonable storage of previous-round dialogue information, of low utilization rate and mining degree of that information, and improves natural language processing capability.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below; other features, objects and advantages of the application will become apparent from the specification, the drawings and the claims.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a multi-round dialog generation model building method in accordance with an embodiment of the present application;
FIG. 2 is a framework diagram of a multi-round dialog generation model in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of a multi-round dialog generation model building system according to the present invention;
Fig. 4 is a frame diagram of an electronic device according to an embodiment of the application.
Wherein, the reference numerals are as follows:
coding layer construction unit: 51;
decoding layer construction unit: 52;
a bus: 80;
A processor: 81;
a memory: 82;
Communication interface: 83.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art may apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication or manufacture for those of ordinary skill having the benefit of this disclosure, and should not be construed as an insufficiency of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. The terms "first," "second," "third," and the like, as used herein, are merely distinguishing between similar objects and not representing a particular ordering of objects.
Through the design and introduction of an attention mechanism, the invention establishes a model that reasonably stores and utilizes previous-round dialogue information, which has positive significance for analyzing user dialogue scenes to build user portraits and for improving dialogue generation quality.
The invention will now be described with reference to specific examples.
Example 1
The embodiment provides a multi-round dialogue generating model building method. Referring to fig. 1 to 2, fig. 1 is a flowchart of a method for creating a model for generating multiple conversations according to an embodiment of the application; fig. 2 is a frame diagram of a multi-round dialog generation model establishment according to an embodiment of the present application, and as shown in fig. 1 to 2, the multi-round dialog generation model establishment method includes the steps of:
Step S1: constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
step S2: processing the text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
Step S3: updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text, to obtain the final multi-round dialogue generation model.
In a specific embodiment, after the initial multi-round dialogue generation model is built from the attention-based encoding layer and the LSTM-based decoding layer, the model is trained as follows. First, forward-propagation calculation is performed on the text data input by the user: the text is processed through the encoding layer to obtain a text vector and an attention distribution vector, and the text vector and the attention distribution vector are processed through the decoding layer to obtain a response text. Then, back-propagation calculation is performed on the initial multi-round dialogue generation model according to the forward-propagation result: a loss function value is obtained from the response text, and the model parameters are updated according to the loss function value. This yields the final multi-round dialogue generation model.
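The forward pass just described (attention-based encoding layer, then an LSTM-based decoding layer) can be sketched end to end as follows. This is a minimal NumPy illustration under assumed dimensions, with a single hand-written LSTM cell standing in for the decoding layer; all weights and shapes are placeholders, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                   # assumed embedding dimension

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W):
    """One step of a standard LSTM cell (the basic unit of the decoding layer)."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)         # input, forget, output gates and candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Encoding layer: text vector and attention distribution vector.
x = rng.standard_normal(d)              # vectorized text features
K = rng.standard_normal((d, d))         # key matrix (model parameter)
V = rng.standard_normal((d, d))         # value matrix (model parameter)
scores = K @ x                          # score via the key matrix
weights = np.exp(scores - scores.max()); weights /= weights.sum()
attn = V.T @ weights                    # attention distribution vector

spliced = x + attn                      # skip-connection splice of the two vectors

# Decoding layer: the LSTM consumes the spliced vector and emits a hidden state,
# which would then be projected to response-text tokens.
W = rng.standard_normal((4 * d, 2 * d)) * 0.1
h, c = np.zeros(d), np.zeros(d)
h, c = lstm_cell(spliced, h, c, W)
```

In a real system the hidden state `h` would feed a vocabulary projection and the loss on the response text would drive the back-propagation update of `K` and `V`.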
In an embodiment, the step S2 of processing the text by the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector by the decoding layer to obtain a response text includes:
Extracting characteristics of the text to obtain text characteristics, and vectorizing the text characteristics to obtain text vectors;
Scoring the text vector through a key matrix and a value matrix of the coding layer to obtain an attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
And obtaining the response text according to the splicing vector through the decoding layer.
In an embodiment, the step of obtaining the attention distribution vector after scoring the text vector by the key matrix and the value matrix of the coding layer includes:
Obtaining an operation result after carrying out product operation on the text vector according to the key matrix;
And obtaining the attention distribution vector after carrying out product operation on the operation result according to the value matrix.
In specific implementation, after the text is input into the encoding layer, the encoding layer performs feature extraction on the text to obtain text features, and vectorizes them to obtain the text vector. A product operation is performed on the text vector with the key matrix of the encoding layer to obtain an operation result, and a product operation is then performed on the operation result with the value matrix; this scores the text vector, i.e., judges how attention should be distributed over it, so as to obtain the attention distribution vector. When the model is trained through forward- and back-propagation calculation to obtain new model parameters, the value matrix and key matrix are randomly initialized during the first round of dialogue; after the final loss function result is obtained through model training, the model is updated and the value matrix and key matrix are optimized. In each subsequent round of dialogue, the forward-propagation calculation uses the value matrix and key matrix obtained in the previous round of model training.
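The round-to-round handling of the key and value matrices (random initialization in the first round, reuse of the previous round's matrices afterwards) can be sketched as follows. The update rule at the end of each round is an illustrative stand-in for the actual back-propagation step, and all shapes are assumptions:

```python
import numpy as np

d = 6
rng = np.random.default_rng(4)

# First round of dialogue: the key matrix and value matrix are randomly initialized.
K = rng.standard_normal((d, d)) * 0.1
V = rng.standard_normal((d, d)) * 0.1

history = []
for round_idx in range(3):                   # three dialogue rounds
    x = rng.standard_normal(d)               # this round's text vector
    # Forward propagation uses the K and V produced by the previous round's
    # training, which carry the information of all earlier rounds of dialogue.
    scores = K @ x
    attn = V.T @ scores
    history.append(attn.copy())
    # After this round's training the matrices are updated (illustrative rule:
    # nudge K and V toward statistics of the current round's text).
    K = K + 0.01 * np.outer(K @ x, x)
    V = V + 0.01 * np.outer(scores, attn)
```

The point of the sketch is the data flow: `K` and `V` are the only state carried between rounds, so they act as the store of previous-round dialogue information.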
In an embodiment, the step of obtaining the spliced vector after the text vector and the attention distribution vector are spliced includes:
And processing the text vector and the attention distribution vector based on the jump connection mode to obtain a spliced vector.
In a specific implementation, the text vector and the attention distribution vector are summed (i.e., vector addition) following the skip-connection idea of residual networks, thereby obtaining the spliced vector.
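The skip-connection splice described above reduces to a single vector addition, as in this minimal sketch (the dimension-matching check is an assumption made explicit; element-wise addition requires it):

```python
import numpy as np

def splice(text_vec, attn_vec):
    """Residual-style skip connection: the 'splice' is a vector addition."""
    assert text_vec.shape == attn_vec.shape, "skip connection requires matching dimensions"
    return text_vec + attn_vec

v = splice(np.array([1.0, 2.0, 3.0]), np.array([0.5, -2.0, 1.0]))
```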
In an embodiment, the step of processing the text by the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector by the decoding layer to obtain a response text further includes:
And adjusting the dimensionality of the spliced vector through a multi-layer perceptron structure of the coding layer.
In a specific implementation, the dimension of the spliced vector is adjusted through the multi-layer perceptron structure of the encoding layer, according to the input vector dimension set by the LSTM network in the decoding layer.
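A sketch of this dimension adjustment by a multi-layer perceptron; the hidden-layer size and ReLU activation are assumptions, with the output size chosen to match the input dimension expected by the decoding-layer LSTM:

```python
import numpy as np

def mlp_adjust(spliced, d_out, d_hidden=16, seed=2):
    """Map a spliced vector to the input dimension set by the decoding-layer LSTM."""
    rng = np.random.default_rng(seed)
    d_in = spliced.shape[0]
    W1 = rng.standard_normal((d_hidden, d_in)) * 0.1   # first perceptron layer
    W2 = rng.standard_normal((d_out, d_hidden)) * 0.1  # projection to the LSTM input size
    hidden = np.maximum(0.0, W1 @ spliced)             # assumed ReLU activation
    return W2 @ hidden

adjusted = mlp_adjust(np.ones(8), d_out=12)            # 8-dim splice -> 12-dim LSTM input
```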
In an embodiment, the step S3 of updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text to obtain the final multi-round dialogue generation model includes:
And after the back propagation calculation is carried out on the initial multi-round dialogue generating model according to the response text to obtain a loss function value, updating model parameters according to the loss function value to obtain the final multi-round dialogue generating model.
In an embodiment, the model parameters include at least one of a key matrix and a value matrix.
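The back-propagation update of the key and value matrices can be sketched as one gradient step on a loss; the squared-error loss, the omission of the softmax, and the learning rate are all illustrative assumptions made to keep the derivative tractable:

```python
import numpy as np

def update_kv(x, K, V, target, lr=0.1):
    """One illustrative gradient step updating the key and value matrices."""
    scores = K @ x                      # forward pass (softmax omitted for a tractable sketch)
    attn = V.T @ scores
    err = attn - target                 # assumed loss: 0.5 * ||attn - target||^2
    grad_V = np.outer(scores, err)      # dL/dV, since attn_j = sum_i V_ij * scores_i
    grad_K = np.outer(V @ err, x)       # dL/dK via dL/dscores = V @ err
    return K - lr * grad_K, V - lr * grad_V

rng = np.random.default_rng(3)
x, target = rng.standard_normal(4), rng.standard_normal(4)
K = rng.standard_normal((4, 4)) * 0.1
V = rng.standard_normal((4, 4)) * 0.1
loss_before = 0.5 * np.sum((V.T @ (K @ x) - target) ** 2)
K, V = update_kv(x, K, V, target)
loss_after = 0.5 * np.sum((V.T @ (K @ x) - target) ** 2)
```

One such step per round is what writes the current round's information into the key and value matrices for later use.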
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of the multi-round dialogue generation model building system according to the present invention. As shown in fig. 3, the multi-round dialogue generation model building system of the present invention is applicable to the above multi-round dialogue generation model establishment method, and the system includes:
the coding layer construction unit 51 constructs a coding layer based on the attention mechanism, processes the text by the coding layer to obtain a text vector and an attention distribution vector, and processes the text vector and the attention distribution vector by the decoding layer to obtain a response text;
The decoding layer construction unit 52 constructs a decoding layer based on the LSTM network, and updates the encoding layer by back-propagation calculation of the initial multi-turn dialog generation model in response to the text, thereby obtaining a final multi-turn dialog generation model.
Example III
Referring to fig. 4, a specific implementation of an electronic device is disclosed in this embodiment. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 82 may comprise a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a universal serial bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. Memory 82 may be internal or external to the electronic device, where appropriate. In a particular embodiment, memory 82 is non-volatile memory. In particular embodiments, memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these. Where appropriate, the RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data out DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
Memory 82 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 81.
The processor 81 implements any of the multi-turn dialog-generating model building methods of the above-described embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected to each other through the bus 80 and perform communication with each other.
The communication interface 83 is used to enable communication between modules, devices, units and/or equipment in embodiments of the application. The communication interface 83 may also enable data communication with other components, such as external devices, databases, external storage, and workstations.
Bus 80 includes hardware, software, or both that couple components of the electronic device to one another. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device may connect to a multi-round dialog-generating model building system to implement the method in connection with fig. 1-2.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
In summary, through the design and introduction of an attention mechanism for multi-round dialogue (covering the storage mode of previous-round dialogue information, the attention design mode, and the introduction of a residual network), the invention establishes a model that reasonably stores and utilizes previous-round dialogue information, solves the problems of low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information, and has positive significance for analyzing user dialogue scenes to build user portraits and for improving dialogue generation quality.
The above examples illustrate only a few embodiments of the application; although they are described in detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within its scope. The protection scope of this patent application shall therefore be subject to the appended claims.
Claims (7)
1. A method for building a multi-round dialogue generation model, applied to a multi-round dialogue scene, the method comprising:
constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and
updating the encoding layer by performing a back-propagation calculation on the initial multi-round dialogue generation model using the response text, to obtain a final multi-round dialogue generation model;
wherein the step of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text, comprises:
extracting features from the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through a key matrix and a value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector; and
adjusting the dimension of the spliced vector through a multi-layer perceptron structure of the encoding layer;
and wherein the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
performing a product operation on the text vector with the key matrix to obtain an operation result; and
performing a product operation on the operation result with the value matrix to obtain the attention distribution vector.
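Read as matrix products, the scoring steps of claim 1 can be sketched numerically. This is an illustrative sketch only, not the patent's implementation: the softmax normalisation and all dimensions here are assumptions added for the example.

```python
import numpy as np

def attention_distribution(text_vec, key_matrix, value_matrix):
    # Step 1 (claimed): product of the text vector with the key matrix.
    scores = text_vec @ key_matrix
    # Normalise the scores into a distribution (the softmax is an
    # assumption; the claim only specifies a product operation).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Step 2 (claimed): product of the result with the value matrix.
    return weights @ value_matrix

rng = np.random.default_rng(0)
text_vec = rng.normal(size=(8,))          # hypothetical 8-dim text vector
key_matrix = rng.normal(size=(8, 8))
value_matrix = rng.normal(size=(8, 16))
attn = attention_distribution(text_vec, key_matrix, value_matrix)
print(attn.shape)  # (16,)
```

The value matrix here also sets the width of the attention distribution vector, which is why its column count (16) differs from the text-vector dimension.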
2. The method for building a multi-round dialogue generation model according to claim 1, wherein the step of splicing the text vector and the attention distribution vector to obtain the spliced vector comprises:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the spliced vector.
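A minimal sketch of the skip-connection splice of claim 2, with a single linear layer standing in for the multi-layer perceptron of claim 1; the dimensions and the tanh activation are assumptions made for the example, not details from the patent.

```python
import numpy as np

def splice_with_skip(text_vec, attn_vec, mlp_weights):
    # Skip connection: the original text vector is carried forward and
    # concatenated (spliced) with the attention distribution vector.
    spliced = np.concatenate([text_vec, attn_vec])
    # The perceptron layer then adjusts the dimension of the spliced vector.
    return np.tanh(spliced @ mlp_weights)

text_vec = np.ones(8)
attn_vec = np.ones(16)
mlp_weights = np.full((24, 8), 0.01)   # hypothetical projection back to 8 dims
out = splice_with_skip(text_vec, attn_vec, mlp_weights)
print(out.shape)  # (8,)
```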
3. The method for building a multi-round dialogue generation model according to claim 2, wherein the step of updating the encoding layer by performing a back-propagation calculation on the initial multi-round dialogue generation model using the response text comprises:
performing a back-propagation calculation on the initial multi-round dialogue generation model according to the response text to obtain a loss function value, and updating model parameters according to the loss function value to obtain the final multi-round dialogue generation model.
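The parameter update of claim 3 amounts to gradient descent on the loss function value. A sketch under assumed names (the learning rate and the plain SGD rule are illustrative choices, not specified by the claim):

```python
import numpy as np

def update_parameters(params, grads, lr=0.1):
    # Gradient-descent step: move each parameter (e.g. the key and value
    # matrices) against the gradient of the loss function value obtained
    # by back-propagating through the response text.
    return {name: p - lr * grads[name] for name, p in params.items()}

params = {"key_matrix": np.ones((4, 4)), "value_matrix": np.ones((4, 4))}
grads = {"key_matrix": np.full((4, 4), 0.5), "value_matrix": np.zeros((4, 4))}
updated = update_parameters(params, grads, lr=0.1)
print(round(float(updated["key_matrix"][0, 0]), 2))  # 0.95
```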
4. The method for building a multi-round dialogue generation model according to claim 3, wherein the model parameters comprise at least one of the key matrix and the value matrix.
5. A multi-round dialogue generation model building system, adapted to the multi-round dialogue generation model building method of any one of claims 1 to 4, the system comprising:
an encoding layer construction unit, configured to construct an encoding layer based on an attention mechanism, to process a text through the encoding layer to obtain a text vector and an attention distribution vector, and to process the text vector and the attention distribution vector through the decoding layer to obtain a response text; and
a decoding layer construction unit, configured to construct a decoding layer based on an LSTM network, and to update the encoding layer by performing a back-propagation calculation on the initial multi-round dialogue generation model using the response text, to obtain a final multi-round dialogue generation model.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the multi-round dialogue generation model building method of any one of claims 1 to 4 when executing the computer program.
7. A readable storage medium of an electronic device, having stored thereon computer program instructions which, when executed by a processor, implement the multi-round dialogue generation model building method of any one of claims 1 to 4.
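Taken together, the claims describe an encode → attend → splice → decode pipeline. The toy sketch below wires those stages in order; every dimension, the missing normalisation, and the argmax stand-in for the LSTM decoding layer are assumptions for illustration, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, A = 8, 16   # hypothetical text-vector and attention-vector sizes

key_matrix = rng.normal(size=(D, D))
value_matrix = rng.normal(size=(D, A))
mlp_weights = rng.normal(size=(D + A, D))

def encode(text_vec):
    # Encoding layer: score with the key and value matrices, splice via a
    # skip connection, then adjust the dimension with a perceptron layer.
    scores = text_vec @ key_matrix
    attn = scores @ value_matrix
    spliced = np.concatenate([text_vec, attn])
    return spliced @ mlp_weights

def decode(state):
    # Stand-in for the LSTM decoding layer: emit the index of the largest
    # state component as a one-token "response text".
    return int(np.argmax(state))

text_vec = rng.normal(size=(D,))
token = decode(encode(text_vec))
print(0 <= token < D)  # True
```

In a real system the decoder would be an LSTM generating a token sequence, and the back-propagation step of claim 3 would update `key_matrix` and `value_matrix` from the loss on the generated response.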
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111180118.3A CN113868395B (en) | 2021-10-11 | 2021-10-11 | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113868395A CN113868395A (en) | 2021-12-31 |
CN113868395B true CN113868395B (en) | 2024-08-02 |
Family
ID=79002470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111180118.3A Active CN113868395B (en) | 2021-10-11 | 2021-10-11 | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113868395B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776578A (en) * | 2017-01-03 | 2017-05-31 | 竹间智能科技(上海)有限公司 | Method and device for improving the dialogue performance of a conversational system |
CN110032633A (en) * | 2019-04-17 | 2019-07-19 | 腾讯科技(深圳)有限公司 | Multi-round dialogue processing method, apparatus and device |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11449744B2 (en) * | 2016-06-23 | 2022-09-20 | Microsoft Technology Licensing, Llc | End-to-end memory networks for contextual language understanding |
US20180329884A1 (en) * | 2017-05-12 | 2018-11-15 | Rsvp Technologies Inc. | Neural contextual conversation learning |
DK201770431A1 (en) * | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
EP3486842A1 (en) * | 2017-11-17 | 2019-05-22 | Digital Genius Limited | Template generation for a conversational agent |
US10978051B2 (en) * | 2018-09-28 | 2021-04-13 | Capital One Services, Llc | Adversarial learning framework for persona-based dialogue modeling |
US11087092B2 (en) * | 2019-03-05 | 2021-08-10 | Salesforce.Com, Inc. | Agent persona grounded chit-chat generation framework |
CN110413752B (en) * | 2019-07-22 | 2021-11-16 | 中国科学院自动化研究所 | Multi-turn spoken language understanding method, system and device based on conversation logic |
US11264009B2 (en) * | 2019-09-13 | 2022-03-01 | Mitsubishi Electric Research Laboratories, Inc. | System and method for a dialogue response generation system |
CN110929476B (en) * | 2019-09-27 | 2022-09-30 | 中国人民解放军63626部队 | Task type multi-round dialogue model construction method based on mixed granularity attention mechanism |
CN112231457A (en) * | 2020-10-19 | 2021-01-15 | 北京明略昭辉科技有限公司 | Multi-turn dialogue generation method and device for chatting robot and chatting robot |
US11132988B1 (en) * | 2020-10-22 | 2021-09-28 | PolyAI Limited | Dialogue system, a dialogue method, and a method of training |
CN113342947B (en) * | 2021-05-26 | 2022-03-15 | 华南师范大学 | Multi-round dialog text generation method capable of sensing dialog context relative position information |
CN113239174A (en) * | 2021-06-09 | 2021-08-10 | 华南师范大学 | Hierarchical multi-round conversation generation method and device based on double-layer decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||