CN113868395B - Multi-round dialogue generation type model establishment method, system, electronic equipment and medium - Google Patents
- Publication number
- CN113868395B (Application No. CN202111180118.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- vector
- round
- model
- attention distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The application discloses a multi-round dialogue generation model establishment method, system, electronic device and medium, wherein the method comprises the following steps: constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network; processing text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text, to obtain the final multi-round dialogue generation model. By designing an attention mechanism for the multi-round dialogue scene, the application solves the problem of storing previous-round dialogue information, and improves the utilization rate and mining degree of previous-round dialogue information.
Description
Technical Field
The application relates to the technical field of deep learning, and in particular to a multi-round dialogue generation model establishment method, system, electronic device and medium.
Background
In the prior art, multi-round dialogue generation models are mainly established through two schemes: pipeline-based methods and deep-learning-network-based methods. A pipeline-based dialogue generation method mainly comprises three parts: natural language understanding, dialogue state management and natural language generation; because the overall performance of the model is limited by each of these parts, its generalization capability is poor. A deep-learning-network-based multi-round dialogue generation method is mainly limited by the storage and utilization of previous-round dialogue information: the background information grows as the number of dialogue rounds increases, and basic information such as the dialogue pattern and the sequence length is not controlled. How to solve the storage of previous-round dialogue information and improve its utilization rate and mining degree therefore remains a problem to be solved.
Disclosure of Invention
The embodiments of the application provide a multi-round dialogue generation model establishment method, system, electronic device and medium, which at least solve the problems of low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information.
The invention provides a multi-round dialogue generation type model establishment method, which comprises the following steps:
constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
processing the text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
And updating the coding layer by carrying out back propagation calculation on the initial multi-round dialogue generation type model through the response text to obtain a final multi-round dialogue generation type model.
In the above multi-round dialogue generation model establishment method, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text comprises:
extracting features of the text to obtain text features, and vectorizing the text features to obtain the text vectors;
scoring the text vector through a key matrix and a value matrix of the coding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
And obtaining the response text according to the splicing vector through the decoding layer.
In the above multi-round dialogue generation model establishment method, the step of splicing the text vector and the attention distribution vector to obtain a spliced vector comprises:
and processing the text vector and the attention distribution vector based on a jump connection mode to obtain the spliced vector.
In the above multi-round dialogue generation model establishment method, the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
performing product operation on the text vector according to the key matrix to obtain an operation result;
And carrying out product operation on the operation result according to the value matrix to obtain the attention distribution vector.
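The two product operations above can be illustrated with a minimal NumPy sketch. The matrix shapes and the softmax normalization of the scores are assumptions added for concreteness; the method itself specifies only the two product operations:

```python
import numpy as np

def attention_distribution(x, K, V):
    """Score a text vector with the key matrix, then weight it with the value matrix.

    x: text vector of shape (d,)
    K: key matrix of shape (m, d)   -- assumed shape
    V: value matrix of shape (m, d) -- assumed shape
    """
    scores = K @ x                       # product with the key matrix -> operation result
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()    # assumed softmax normalization of the scores
    return V.T @ weights                 # product with the value matrix -> attention distribution vector

rng = np.random.default_rng(0)
d, m = 8, 4
x = rng.standard_normal(d)
K = rng.standard_normal((m, d))
V = rng.standard_normal((m, d))
attn = attention_distribution(x, K, V)   # attention distribution vector of shape (d,)
```

The attention distribution vector keeps the same dimension as the text vector here, so the two can later be spliced by addition.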
In the above multi-round dialogue generation model establishment method, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text further comprises:
and adjusting the dimensionality of the spliced vector through the multi-layer perceptron structure of the coding layer.
In the above multi-round dialogue generation model establishment method, the step of updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text to obtain a final multi-round dialogue generation model comprises:
and carrying out back propagation calculation on the initial multi-round dialogue generation type model according to the response text to obtain a loss function value, and updating model parameters according to the loss function value to obtain the final multi-round dialogue generation type model.
In the above multi-round dialogue generation model establishment method, the model parameters include at least one of the key matrix and the value matrix.
The invention also provides a multi-round dialogue generation model building system, suitable for the above multi-round dialogue generation model establishment method, the system comprising:
the coding layer construction unit is used for constructing a coding layer based on an attention mechanism, processing a text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
And the decoding layer construction unit is used for constructing a decoding layer based on an LSTM network, and updating the coding layer by carrying out back propagation calculation on the initial multi-round dialogue generation type model through the response text to obtain a final multi-round dialogue generation type model.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above multi-round dialogue generation model establishment method when executing the computer program.
The invention also provides a readable storage medium of an electronic device, wherein the readable storage medium stores computer program instructions which, when executed by a processor, implement the above multi-round dialogue generation model establishment method.
Compared with the related art, in the multi-round dialogue generation model establishment method, system, electronic device and medium provided by the application, when forward-propagation calculation is performed on the text in the model training stage, attention calculation is performed on the text of each round using the value matrix and key matrix of the previous round, which contain the information of all previous rounds of dialogue; during back-propagation, the information of the current round is then written into the value matrix and key matrix for use in later rounds of dialogue. This solves the problems, caused by unreasonable storage of previous-round dialogue information, of low utilization rate and mining degree of that information, and improves natural language processing capability.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below; other features, objects and advantages of the application will become apparent from the specification, the drawings and the claims.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a multi-round dialog generation model building method in accordance with an embodiment of the present application;
FIG. 2 is a framework diagram of a multi-round dialog generation model in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of a multi-round dialog generation model building system according to the present invention;
Fig. 4 is a frame diagram of an electronic device according to an embodiment of the application.
Wherein, the reference numerals are as follows:
coding layer construction unit: 51;
decoding layer construction unit: 52;
a bus: 80;
A processor: 81;
a memory: 82;
Communication interface: 83.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art may apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication or manufacture for those of ordinary skill having the benefit of this disclosure, and should not be construed as an insufficiency of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. The terms "first," "second," "third," and the like, as used herein, are merely distinguishing between similar objects and not representing a particular ordering of objects.
Through the design and introduction of an attention mechanism, the invention establishes a model that reasonably stores and utilizes previous-round dialogue information, which has positive significance for analyzing user dialogue scenes to build user portraits and for improving dialogue generation quality.
The invention will now be described with reference to specific examples.
Example 1
The embodiment provides a multi-round dialogue generating model building method. Referring to fig. 1 to 2, fig. 1 is a flowchart of a method for creating a model for generating multiple conversations according to an embodiment of the application; fig. 2 is a frame diagram of a multi-round dialog generation model establishment according to an embodiment of the present application, and as shown in fig. 1 to 2, the multi-round dialog generation model establishment method includes the steps of:
Step S1: constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
step S2: processing the text through the coding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
Step S3: updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text, to obtain the final multi-round dialogue generation model.
In a specific embodiment, after the initial multi-round dialogue generation model is built from the attention-based encoding layer and the LSTM-based decoding layer, the model is trained as follows. First, forward-propagation calculation is performed on the text data input by the user: the text is processed through the encoding layer to obtain a text vector and an attention distribution vector, and the text vector and the attention distribution vector are processed through the decoding layer to obtain a response text. Then, back-propagation calculation is performed on the initial multi-round dialogue generation model according to the forward-propagation result: a loss function value is obtained from the response text, and the model parameters are updated according to the loss function value. This yields the final multi-round dialogue generation model.
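The forward pass just described (attention-based encoding layer, then an LSTM-based decoding layer) can be sketched end to end as follows. This is a minimal NumPy illustration under assumed dimensions, with a single hand-written LSTM cell standing in for the decoding layer; all weights and shapes are placeholders, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                   # assumed embedding dimension

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W):
    """One step of a standard LSTM cell (the basic unit of the decoding layer)."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)         # input, forget, output gates and candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Encoding layer: text vector and attention distribution vector.
x = rng.standard_normal(d)              # vectorized text features
K = rng.standard_normal((d, d))         # key matrix (model parameter)
V = rng.standard_normal((d, d))         # value matrix (model parameter)
scores = K @ x                          # score via the key matrix
weights = np.exp(scores - scores.max()); weights /= weights.sum()
attn = V.T @ weights                    # attention distribution vector

spliced = x + attn                      # skip-connection splice of the two vectors

# Decoding layer: the LSTM consumes the spliced vector and emits a hidden state,
# which would then be projected to response-text tokens.
W = rng.standard_normal((4 * d, 2 * d)) * 0.1
h, c = np.zeros(d), np.zeros(d)
h, c = lstm_cell(spliced, h, c, W)
```

In a real system the hidden state `h` would feed a vocabulary projection and the loss on the response text would drive the back-propagation update of `K` and `V`.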
In an embodiment, the step S2 of processing the text by the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector by the decoding layer to obtain a response text includes:
Extracting characteristics of the text to obtain text characteristics, and vectorizing the text characteristics to obtain text vectors;
Scoring the text vector through a key matrix and a value matrix of the coding layer to obtain an attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
And obtaining the response text according to the splicing vector through the decoding layer.
In an embodiment, the step of obtaining the attention distribution vector after scoring the text vector by the key matrix and the value matrix of the coding layer includes:
Obtaining an operation result after carrying out product operation on the text vector according to the key matrix;
And obtaining the attention distribution vector after carrying out product operation on the operation result according to the value matrix.
In specific implementation, after the text is input into the encoding layer, the encoding layer performs feature extraction on the text to obtain text features, and vectorizes them to obtain the text vector. A product operation is performed on the text vector with the key matrix of the encoding layer to obtain an operation result, and a product operation is then performed on the operation result with the value matrix; this scores the text vector, i.e., judges how attention should be distributed over it, so as to obtain the attention distribution vector. When the model is trained through forward- and back-propagation calculation to obtain new model parameters, the value matrix and key matrix are randomly initialized during the first round of dialogue; after the final loss function result is obtained through model training, the model is updated and the value matrix and key matrix are optimized. In each subsequent round of dialogue, the forward-propagation calculation uses the value matrix and key matrix obtained in the previous round of model training.
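The round-to-round handling of the key and value matrices (random initialization in the first round, reuse of the previous round's matrices afterwards) can be sketched as follows. The update rule at the end of each round is an illustrative stand-in for the actual back-propagation step, and all shapes are assumptions:

```python
import numpy as np

d = 6
rng = np.random.default_rng(4)

# First round of dialogue: the key matrix and value matrix are randomly initialized.
K = rng.standard_normal((d, d)) * 0.1
V = rng.standard_normal((d, d)) * 0.1

history = []
for round_idx in range(3):                   # three dialogue rounds
    x = rng.standard_normal(d)               # this round's text vector
    # Forward propagation uses the K and V produced by the previous round's
    # training, which carry the information of all earlier rounds of dialogue.
    scores = K @ x
    attn = V.T @ scores
    history.append(attn.copy())
    # After this round's training the matrices are updated (illustrative rule:
    # nudge K and V toward statistics of the current round's text).
    K = K + 0.01 * np.outer(K @ x, x)
    V = V + 0.01 * np.outer(scores, attn)
```

The point of the sketch is the data flow: `K` and `V` are the only state carried between rounds, so they act as the store of previous-round dialogue information.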
In an embodiment, the step of obtaining the spliced vector after the text vector and the attention distribution vector are spliced includes:
And processing the text vector and the attention distribution vector based on the jump connection mode to obtain a spliced vector.
In a specific implementation, the text vector and the attention distribution vector are summed (i.e., vector addition) following the skip-connection idea of residual networks, thereby obtaining the spliced vector.
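The skip-connection splice described above reduces to a single vector addition, as in this minimal sketch (the dimension-matching check is an assumption made explicit; element-wise addition requires it):

```python
import numpy as np

def splice(text_vec, attn_vec):
    """Residual-style skip connection: the 'splice' is a vector addition."""
    assert text_vec.shape == attn_vec.shape, "skip connection requires matching dimensions"
    return text_vec + attn_vec

v = splice(np.array([1.0, 2.0, 3.0]), np.array([0.5, -2.0, 1.0]))
```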
In an embodiment, the step of processing the text by the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector by the decoding layer to obtain a response text further includes:
And adjusting the dimensionality of the spliced vector through a multi-layer perceptron structure of the coding layer.
In a specific implementation, the dimension of the spliced vector is adjusted through the multi-layer perceptron structure of the encoding layer, according to the input vector dimension set by the LSTM network in the decoding layer.
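A sketch of this dimension adjustment by a multi-layer perceptron; the hidden-layer size and ReLU activation are assumptions, with the output size chosen to match the input dimension expected by the decoding-layer LSTM:

```python
import numpy as np

def mlp_adjust(spliced, d_out, d_hidden=16, seed=2):
    """Map a spliced vector to the input dimension set by the decoding-layer LSTM."""
    rng = np.random.default_rng(seed)
    d_in = spliced.shape[0]
    W1 = rng.standard_normal((d_hidden, d_in)) * 0.1   # first perceptron layer
    W2 = rng.standard_normal((d_out, d_hidden)) * 0.1  # projection to the LSTM input size
    hidden = np.maximum(0.0, W1 @ spliced)             # assumed ReLU activation
    return W2 @ hidden

adjusted = mlp_adjust(np.ones(8), d_out=12)            # 8-dim splice -> 12-dim LSTM input
```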
In an embodiment, the step S3 of updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generation model through the response text to obtain the final multi-round dialogue generation model includes:
And after the back propagation calculation is carried out on the initial multi-round dialogue generating model according to the response text to obtain a loss function value, updating model parameters according to the loss function value to obtain the final multi-round dialogue generating model.
In an embodiment, the model parameters include at least one of a key matrix and a value matrix.
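The back-propagation update of the key and value matrices can be sketched as one gradient step on a loss; the squared-error loss, the omission of the softmax, and the learning rate are all illustrative assumptions made to keep the derivative tractable:

```python
import numpy as np

def update_kv(x, K, V, target, lr=0.1):
    """One illustrative gradient step updating the key and value matrices."""
    scores = K @ x                      # forward pass (softmax omitted for a tractable sketch)
    attn = V.T @ scores
    err = attn - target                 # assumed loss: 0.5 * ||attn - target||^2
    grad_V = np.outer(scores, err)      # dL/dV, since attn_j = sum_i V_ij * scores_i
    grad_K = np.outer(V @ err, x)       # dL/dK via dL/dscores = V @ err
    return K - lr * grad_K, V - lr * grad_V

rng = np.random.default_rng(3)
x, target = rng.standard_normal(4), rng.standard_normal(4)
K = rng.standard_normal((4, 4)) * 0.1
V = rng.standard_normal((4, 4)) * 0.1
loss_before = 0.5 * np.sum((V.T @ (K @ x) - target) ** 2)
K, V = update_kv(x, K, V, target)
loss_after = 0.5 * np.sum((V.T @ (K @ x) - target) ** 2)
```

One such step per round is what writes the current round's information into the key and value matrices for later use.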
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of the multi-round dialogue generation model building system according to the present invention. As shown in fig. 3, the multi-round dialogue generation model building system of the present invention is applicable to the above multi-round dialogue generation model establishment method, and the system includes:
the coding layer construction unit 51 constructs a coding layer based on the attention mechanism, processes the text by the coding layer to obtain a text vector and an attention distribution vector, and processes the text vector and the attention distribution vector by the decoding layer to obtain a response text;
The decoding layer construction unit 52 constructs a decoding layer based on the LSTM network, and updates the encoding layer by back-propagation calculation of the initial multi-turn dialog generation model in response to the text, thereby obtaining a final multi-turn dialog generation model.
Example III
Referring to fig. 4, a specific implementation of an electronic device is disclosed in this embodiment. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 82 may comprise a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a universal serial bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. Memory 82 may be internal or external to the electronic device, where appropriate. In a particular embodiment, memory 82 is non-volatile memory. In particular embodiments, memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these. Where appropriate, the RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data out DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
Memory 82 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 81.
The processor 81 implements any of the multi-turn dialog-generating model building methods of the above-described embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected to each other through the bus 80 and perform communication with each other.
The communication interface 83 is used to enable communication between modules, devices, units and/or equipment in embodiments of the application. The communication interface 83 may also enable data communication with other components, such as external devices, databases, external storage, and workstations.
Bus 80 includes hardware, software, or both that couple components of the electronic device to one another. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device may connect to a multi-round dialog-generating model building system to implement the method in connection with fig. 1-2.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
In summary, through the design and introduction of an attention mechanism for multi-round dialogue (covering the storage mode of previous-round dialogue information, the attention design mode, and the introduction of a residual network), the invention establishes a model that reasonably stores and utilizes previous-round dialogue information, solves the problems of low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information, and has positive significance for analyzing user dialogue scenes to build user portraits and for improving dialogue generation quality.
The above examples illustrate only a few embodiments of the application; although they are described in detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within its scope. The protection scope of this patent application shall therefore be subject to the appended claims.
Claims (7)
1. A method for building a multi-round dialogue generation model, applied to a multi-round dialogue scene, the method comprising:
constructing an initial multi-round dialogue generation model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
processing a text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and
updating the encoding layer by performing a back-propagation calculation on the initial multi-round dialogue generation model using the response text, to obtain a final multi-round dialogue generation model;
wherein the step of processing the text through the encoding layer to obtain the text vector and the attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain the response text, comprises:
extracting features from the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through a key matrix and a value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector; and
adjusting the dimension of the spliced vector through a multi-layer perceptron structure of the encoding layer;
and wherein the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
performing a product operation on the text vector with the key matrix to obtain an operation result; and
performing a product operation on the operation result with the value matrix to obtain the attention distribution vector.
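Read as matrix products, the scoring steps of claim 1 can be sketched numerically. This is an illustrative sketch only, not the patent's implementation: the softmax normalisation and all dimensions here are assumptions added for the example.

```python
import numpy as np

def attention_distribution(text_vec, key_matrix, value_matrix):
    # Step 1 (claimed): product of the text vector with the key matrix.
    scores = text_vec @ key_matrix
    # Normalise the scores into a distribution (the softmax is an
    # assumption; the claim only specifies a product operation).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Step 2 (claimed): product of the result with the value matrix.
    return weights @ value_matrix

rng = np.random.default_rng(0)
text_vec = rng.normal(size=(8,))          # hypothetical 8-dim text vector
key_matrix = rng.normal(size=(8, 8))
value_matrix = rng.normal(size=(8, 16))
attn = attention_distribution(text_vec, key_matrix, value_matrix)
print(attn.shape)  # (16,)
```

The value matrix here also sets the width of the attention distribution vector, which is why its column count (16) differs from the text-vector dimension.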
2. The method for building a multi-round dialogue generation model according to claim 1, wherein the step of splicing the text vector and the attention distribution vector to obtain the spliced vector comprises:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the spliced vector.
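A minimal sketch of the skip-connection splice of claim 2, with a single linear layer standing in for the multi-layer perceptron of claim 1; the dimensions and the tanh activation are assumptions made for the example, not details from the patent.

```python
import numpy as np

def splice_with_skip(text_vec, attn_vec, mlp_weights):
    # Skip connection: the original text vector is carried forward and
    # concatenated (spliced) with the attention distribution vector.
    spliced = np.concatenate([text_vec, attn_vec])
    # The perceptron layer then adjusts the dimension of the spliced vector.
    return np.tanh(spliced @ mlp_weights)

text_vec = np.ones(8)
attn_vec = np.ones(16)
mlp_weights = np.full((24, 8), 0.01)   # hypothetical projection back to 8 dims
out = splice_with_skip(text_vec, attn_vec, mlp_weights)
print(out.shape)  # (8,)
```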
3. The method for building a multi-round dialogue generation model according to claim 2, wherein the step of updating the encoding layer by performing a back-propagation calculation on the initial multi-round dialogue generation model using the response text comprises:
performing a back-propagation calculation on the initial multi-round dialogue generation model according to the response text to obtain a loss function value, and updating model parameters according to the loss function value to obtain the final multi-round dialogue generation model.
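The parameter update of claim 3 amounts to gradient descent on the loss function value. A sketch under assumed names (the learning rate and the plain SGD rule are illustrative choices, not specified by the claim):

```python
import numpy as np

def update_parameters(params, grads, lr=0.1):
    # Gradient-descent step: move each parameter (e.g. the key and value
    # matrices) against the gradient of the loss function value obtained
    # by back-propagating through the response text.
    return {name: p - lr * grads[name] for name, p in params.items()}

params = {"key_matrix": np.ones((4, 4)), "value_matrix": np.ones((4, 4))}
grads = {"key_matrix": np.full((4, 4), 0.5), "value_matrix": np.zeros((4, 4))}
updated = update_parameters(params, grads, lr=0.1)
print(round(float(updated["key_matrix"][0, 0]), 2))  # 0.95
```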
4. The method for building a multi-round dialogue generation model according to claim 3, wherein the model parameters comprise at least one of the key matrix and the value matrix.
5. A multi-round dialogue generation model building system, adapted to the multi-round dialogue generation model building method of any one of claims 1 to 4, the system comprising:
an encoding layer construction unit, configured to construct an encoding layer based on an attention mechanism, to process a text through the encoding layer to obtain a text vector and an attention distribution vector, and to process the text vector and the attention distribution vector through the decoding layer to obtain a response text; and
a decoding layer construction unit, configured to construct a decoding layer based on an LSTM network, and to update the encoding layer by performing a back-propagation calculation on the initial multi-round dialogue generation model using the response text, to obtain a final multi-round dialogue generation model.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the multi-round dialogue generation model building method of any one of claims 1 to 4 when executing the computer program.
7. A readable storage medium of an electronic device, having stored thereon computer program instructions which, when executed by a processor, implement the multi-round dialogue generation model building method of any one of claims 1 to 4.
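Taken together, the claims describe an encode → attend → splice → decode pipeline. The toy sketch below wires those stages in order; every dimension, the missing normalisation, and the argmax stand-in for the LSTM decoding layer are assumptions for illustration, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, A = 8, 16   # hypothetical text-vector and attention-vector sizes

key_matrix = rng.normal(size=(D, D))
value_matrix = rng.normal(size=(D, A))
mlp_weights = rng.normal(size=(D + A, D))

def encode(text_vec):
    # Encoding layer: score with the key and value matrices, splice via a
    # skip connection, then adjust the dimension with a perceptron layer.
    scores = text_vec @ key_matrix
    attn = scores @ value_matrix
    spliced = np.concatenate([text_vec, attn])
    return spliced @ mlp_weights

def decode(state):
    # Stand-in for the LSTM decoding layer: emit the index of the largest
    # state component as a one-token "response text".
    return int(np.argmax(state))

text_vec = rng.normal(size=(D,))
token = decode(encode(text_vec))
print(0 <= token < D)  # True
```

In a real system the decoder would be an LSTM generating a token sequence, and the back-propagation step of claim 3 would update `key_matrix` and `value_matrix` from the loss on the generated response.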
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111180118.3A CN113868395B (en) | 2021-10-11 | 2021-10-11 | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113868395A CN113868395A (en) | 2021-12-31 |
CN113868395B true CN113868395B (en) | 2024-08-02 |
Family
ID=79002470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111180118.3A Active CN113868395B (en) | 2021-10-11 | 2021-10-11 | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113868395B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776578A (en) * | 2017-01-03 | 2017-05-31 | 竹间智能科技(上海)有限公司 | Method and device for improving the dialogue performance of a conversational system |
CN110032633A (en) * | 2019-04-17 | 2019-07-19 | 腾讯科技(深圳)有限公司 | Multi-round dialogue processing method, apparatus and device |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11449744B2 (en) * | 2016-06-23 | 2022-09-20 | Microsoft Technology Licensing, Llc | End-to-end memory networks for contextual language understanding |
US20180329884A1 (en) * | 2017-05-12 | 2018-11-15 | Rsvp Technologies Inc. | Neural contextual conversation learning |
DK201770431A1 (en) * | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
EP3486842A1 (en) * | 2017-11-17 | 2019-05-22 | Digital Genius Limited | Template generation for a conversational agent |
US10978051B2 (en) * | 2018-09-28 | 2021-04-13 | Capital One Services, Llc | Adversarial learning framework for persona-based dialogue modeling |
US11087092B2 (en) * | 2019-03-05 | 2021-08-10 | Salesforce.Com, Inc. | Agent persona grounded chit-chat generation framework |
CN110413752B (en) * | 2019-07-22 | 2021-11-16 | 中国科学院自动化研究所 | Multi-turn spoken language understanding method, system and device based on conversation logic |
US11264009B2 (en) * | 2019-09-13 | 2022-03-01 | Mitsubishi Electric Research Laboratories, Inc. | System and method for a dialogue response generation system |
CN110929476B (en) * | 2019-09-27 | 2022-09-30 | 中国人民解放军63626部队 | Task type multi-round dialogue model construction method based on mixed granularity attention mechanism |
CN112231457A (en) * | 2020-10-19 | 2021-01-15 | 北京明略昭辉科技有限公司 | Multi-turn dialogue generation method and device for chatting robot and chatting robot |
US11132988B1 (en) * | 2020-10-22 | 2021-09-28 | PolyAI Limited | Dialogue system, a dialogue method, and a method of training |
CN113342947B (en) * | 2021-05-26 | 2022-03-15 | 华南师范大学 | Multi-round dialog text generation method capable of sensing dialog context relative position information |
CN113239174A (en) * | 2021-06-09 | 2021-08-10 | 华南师范大学 | Hierarchical multi-round conversation generation method and device based on double-layer decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||