
CN113868395B - Multi-round dialogue generation type model establishment method, system, electronic equipment and medium - Google Patents

Multi-round dialogue generation type model establishment method, system, electronic equipment and medium

Info

Publication number
CN113868395B
Authority
CN
China
Prior art keywords
text
vector
round
model
attention distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111180118.3A
Other languages
Chinese (zh)
Other versions
CN113868395A (en)
Inventor
刘伟硕 (Liu Weishuo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202111180118.3A
Publication of CN113868395A
Application granted
Publication of CN113868395B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03: Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, system, electronic device, and medium for building a multi-round dialogue generative model. The method comprises the following steps: constructing an initial multi-round dialogue generative model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network; processing text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text; and updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model. By designing an attention mechanism tailored to multi-round dialogue scenes, the application solves the problem of storing previous-round dialogue information and improves the utilization rate and mining degree of that information.

Description

Multi-round dialogue generation type model establishment method, system, electronic equipment and medium
Technical Field
The application relates to the technical field of deep learning, and in particular to a method, system, electronic device, and medium for building a multi-round dialogue generative model.
Background
In the prior art, multi-round dialogue generation models are mainly built in two ways: pipeline-based methods and methods based on deep learning networks. A pipeline-based dialogue generation method mainly comprises natural language understanding, dialogue state management, and natural language generation; because the overall performance of the model is limited by each of these parts, its generalization capability is poor. A multi-round dialogue generation method based on a deep learning network is mainly limited by how previous-round dialogue information is stored and used: background information grows as the number of dialogue rounds increases, and basic information such as the dialogue mode and sequence length is not controlled. How to store previous-round dialogue information and improve its utilization rate and mining degree therefore remains a problem to be solved.
Disclosure of Invention
The embodiments of the application provide a multi-round dialogue generative model building method, system, electronic device, and medium, which at least solve the problems of low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information.
The invention provides a multi-round dialogue generative model building method, comprising the following steps:
constructing an initial multi-round dialogue generative model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
processing text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model.
In the above multi-round dialogue generative model building method, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text, includes:
extracting features from the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through a key matrix and a value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
and obtaining the response text from the spliced vector through the decoding layer.
In the above method, the step of splicing the text vector and the attention distribution vector to obtain a spliced vector includes:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the spliced vector.
In the above method, the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector includes:
multiplying the text vector by the key matrix to obtain an operation result;
and multiplying the operation result by the value matrix to obtain the attention distribution vector.
In the above method, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text, further includes:
adjusting the dimensionality of the spliced vector through the multi-layer perceptron structure of the encoding layer.
In the above method, the step of updating the encoding layer by performing back-propagation calculation on the multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model, includes:
performing back-propagation calculation on the initial multi-round dialogue generative model according to the response text to obtain a loss function value, and updating the model parameters according to the loss function value to obtain the final multi-round dialogue generative model.
In the above method, the model parameters include at least one of the key matrix and the value matrix.
The invention also provides a multi-round dialogue generative model building system, suited to the above multi-round dialogue generative model building method, comprising:
an encoding layer construction unit, used to construct the encoding layer based on an attention mechanism, process text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit, used to construct the decoding layer based on an LSTM network, and to update the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above multi-round dialogue generative model building method when executing the computer program.
The invention also provides a readable storage medium for an electronic device, storing computer program instructions which, when executed by a processor, implement the above multi-round dialogue generative model building method.
Compared with the related art, the method, system, electronic device, and medium for building a multi-round dialogue generative model work as follows: during the forward-propagation calculation on the text in the model training stage, attention calculation is performed on the text of each round using the value matrix and key matrix of the previous round, which contain the information of all previous rounds of dialogue; during back propagation, the information of the current round is then written into the value matrix and key matrix for use in later rounds of dialogue. This solves the problems, caused by unreasonable storage of previous-round dialogue information, of low utilization rate and mining degree of that information, and improves natural-language processing capability.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, so as to make the features, objects, and advantages of the application more readily apparent.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flowchart of a multi-round dialogue generative model building method according to an embodiment of the application;
FIG. 2 is a framework diagram of a multi-round dialogue generative model according to an embodiment of the application;
FIG. 3 is a schematic diagram of a multi-round dialogue generative model building system according to the invention;
FIG. 4 is a framework diagram of an electronic device according to an embodiment of the application.
Wherein the reference numerals are as follows:
encoding layer construction unit: 51;
decoding layer construction unit: 52;
bus: 80;
processor: 81;
memory: 82;
communication interface: 83.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the application, and those of ordinary skill in the art can apply the application to other similar situations according to these drawings without inventive effort. Moreover, while such a development effort might be complex and lengthy, it would be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and should not be construed as an inventive contribution.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application do not limit quantity and may denote the singular or the plural. The terms "comprising," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. The term "plurality" means two or more. "And/or" describes an association between associated objects and covers three cases: for example, "A and/or B" may mean A alone, A and B together, or B alone. The character "/" generally indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering.
Through the design and introduction of the attention mechanism, the invention builds a model that reasonably stores and utilizes previous-round dialogue information, which is of positive significance for analyzing user dialogue scenes to build user portraits and for improving dialogue generation quality.
The invention will now be described with reference to specific examples.
Example 1
This embodiment provides a multi-round dialogue generative model building method. Referring to FIGS. 1 and 2, FIG. 1 is a flowchart of the multi-round dialogue generative model building method according to an embodiment of the application, and FIG. 2 is a framework diagram of the multi-round dialogue generative model according to an embodiment of the application. As shown in FIGS. 1 and 2, the method includes the following steps:
Step S1: construct an initial multi-round dialogue generative model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
Step S2: process the text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
Step S3: update the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model.
In a specific embodiment, after the initial multi-round dialogue generative model is built from the attention-based encoding layer and the LSTM-based decoding layer, the initial model is trained. First, forward-propagation calculation is performed on the text data input by the user: the encoding layer processes the text to obtain a text vector and an attention distribution vector, and the decoding layer processes these to obtain a response text. Then, back-propagation calculation is performed on the initial model according to the forward-propagation result: a loss function value is computed from the response text, and the model parameters are updated according to that value. The result is the final multi-round dialogue generative model.
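To make this flow concrete, the following is a minimal PyTorch sketch of the two-layer architecture described above. It is a reconstruction under stated assumptions, not the patent's implementation: the class names (AttentionEncoder, LSTMDecoder), the embedding-based vectorization, and all sizes are illustrative choices.

import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    # Encoding layer: attention over persistent key/value matrices (assumed design).
    def __init__(self, vocab_size: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)  # vectorizes text features
        # Key and value matrices persist across dialogue rounds: randomly
        # initialized before the first round, then refined by back propagation.
        self.key = nn.Parameter(torch.randn(hidden_dim, hidden_dim))
        self.value = nn.Parameter(torch.randn(hidden_dim, hidden_dim))
        self.mlp = nn.Linear(hidden_dim, hidden_dim)       # multi-layer perceptron structure

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        text_vec = self.embed(token_ids)   # text vector, shape (batch, seq_len, hidden)
        scores = text_vec @ self.key       # score the text vector against the key matrix
        attn_vec = scores @ self.value     # attention distribution vector
        spliced = text_vec + attn_vec      # skip-connection "splice"
        return self.mlp(spliced)           # adjust dimension for the decoding layer

class LSTMDecoder(nn.Module):
    # Decoding layer: LSTM network producing response-text logits.
    def __init__(self, vocab_size: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        dec_out, _ = self.lstm(enc_out)
        return self.out(dec_out)           # logits over response-text tokens

encoder = AttentionEncoder(vocab_size=30000, hidden_dim=256)
decoder = LSTMDecoder(vocab_size=30000, hidden_dim=256)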
In an embodiment, step S2 of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text, includes:
extracting features from the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through the key matrix and value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
and obtaining the response text from the spliced vector through the decoding layer.
In an embodiment, the step of scoring the text vector through the key matrix and value matrix of the encoding layer to obtain the attention distribution vector includes:
multiplying the text vector by the key matrix to obtain an operation result;
and multiplying the operation result by the value matrix to obtain the attention distribution vector.
In a specific implementation, after text is input into the encoding layer, the encoding layer extracts features from the text to obtain text features and vectorizes them to obtain the text vector. The text vector is multiplied by the key matrix of the encoding layer to obtain an operation result, and the operation result is multiplied by the value matrix; this scores the text vector, that is, determines where its attention should be distributed, yielding the attention distribution vector. In the process of training the model through forward- and back-propagation calculation to obtain new model parameters, the value matrix and key matrix are randomly initialized for the first round of dialogue; once the final loss function result is obtained from training, the model is updated and the value matrix and key matrix are optimized. In every round of dialogue, the forward-propagation calculation uses the value matrix and key matrix obtained from the previous round of model training.
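The two product operations can be made concrete with shapes. In the following sketch the sizes and variable names are assumptions; key and value stand for the matrices carried over from the previous round's training.

import torch

hidden_dim = 8
text_vec = torch.randn(1, 5, hidden_dim)     # (batch, seq_len, hidden): vectorized text features
key = torch.randn(hidden_dim, hidden_dim)    # key matrix from the previous round
value = torch.randn(hidden_dim, hidden_dim)  # value matrix from the previous round

scores = text_vec @ key                      # first product: the scoring operation result
attn_vec = scores @ value                    # second product: the attention distribution vector
assert attn_vec.shape == text_vec.shape      # matching shapes permit the splice that follows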
In an embodiment, the step of splicing the text vector and the attention distribution vector to obtain the spliced vector includes:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the spliced vector.
In a specific implementation, following the skip-connection idea of residual networks, the text vector and the attention distribution vector are summed, that is, added element-wise as vectors, to obtain the spliced vector.
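On that reading, the splice is a plain element-wise sum rather than a concatenation. A one-line sketch, reusing the tensors from the previous snippet:

spliced_vec = text_vec + attn_vec            # residual-style skip connection: vector addition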
In an embodiment, the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text, further includes:
adjusting the dimensionality of the spliced vector through the multi-layer perceptron structure of the encoding layer.
In a specific implementation, the dimensionality of the spliced vector is adjusted through the multi-layer perceptron structure of the encoding layer to match the input vector dimension expected by the LSTM network in the decoding layer.
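A sketch of that adjustment, continuing the snippets above; the LSTM input size of 16 is an arbitrary assumed value, and the ReLU inside the perceptron is likewise an assumption:

import torch.nn as nn

lstm_input_dim = 16                          # assumed input dimension of the decoding-layer LSTM
mlp = nn.Sequential(
    nn.Linear(hidden_dim, lstm_input_dim),   # project the spliced vector to the LSTM input size
    nn.ReLU(),
)
decoder_input = mlp(spliced_vec)             # shape becomes (1, 5, lstm_input_dim)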
In an embodiment, step S3 of updating the encoding layer by performing back-propagation calculation on the multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model, includes:
performing back-propagation calculation on the initial multi-round dialogue generative model according to the response text to obtain a loss function value, and then updating the model parameters according to the loss function value to obtain the final multi-round dialogue generative model.
In an embodiment, the model parameters include at least one of a key matrix and a value matrix.
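Putting the pieces together, one training round might look like the sketch below, reusing the encoder and decoder from the earlier skeleton. The cross-entropy loss and Adam optimizer are assumptions; the patent specifies only a loss function value and back propagation.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_round(token_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    # One dialogue round: forward propagation, loss, back propagation.
    logits = decoder(encoder(token_ids))         # forward pass: response-text logits
    loss = criterion(logits.flatten(0, 1), target_ids.flatten())
    optimizer.zero_grad()
    loss.backward()                              # back propagation reaches the key/value matrices
    optimizer.step()                             # the matrices now carry this round's information
    return loss.item()

Because the same encoder (and hence the same key and value matrices) is reused from round to round, each call to train_round leaves the current round's information in the matrices for later rounds, matching the behaviour described above.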
Example 2
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of the multi-round dialogue generative model building system according to the invention. As shown in FIG. 3, the system is suited to the multi-round dialogue generative model building method described above and includes:
an encoding layer construction unit 51, which constructs the encoding layer based on an attention mechanism, processes text through the encoding layer to obtain a text vector and an attention distribution vector, and processes the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit 52, which constructs the decoding layer based on an LSTM network and updates the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain the final multi-round dialogue generative model.
Example 3
Referring to FIG. 4, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example and not limitation, memory 82 may comprise a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a universal serial bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate, and may be internal or external to the device, where appropriate. In a particular embodiment, memory 82 is non-volatile memory. In particular embodiments, memory 82 includes read-only memory (ROM) and random-access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these. The RAM may be a static RAM (SRAM) or a dynamic RAM (DRAM), where the DRAM may be a fast page mode DRAM (FPMDRAM), an extended data out DRAM (EDODRAM), a synchronous DRAM (SDRAM), or the like, as appropriate.
Memory 82 may be used to store or cache various data files to be processed and/or communicated, as well as computer program instructions to be executed by processor 81.
The processor 81 implements any of the multi-round dialogue generative model building methods of the above embodiments by reading and executing the computer program instructions stored in the memory 82.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in FIG. 4, the processor 81, the memory 82, and the communication interface 83 are connected to one another through the bus 80 and communicate with one another.
The communication interface 83 is used to enable communication between modules, apparatuses, and/or units in embodiments of the application. The communication interface 83 can also be used for data communication with other external components, such as external devices, image/abnormal data monitoring equipment, databases, external storage, and image/abnormal data monitoring workstations.
Bus 80 includes hardware, software, or both that couple the components of the electronic device to one another. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, or a local bus. By way of example and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device may connect to a multi-round dialogue generative model building system to implement the method described in connection with FIGS. 1 and 2.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination of them that contains no contradiction should be considered within the scope of this description.
In summary, through the design and introduction of an attention mechanism for multi-round dialogue scenes (the storage scheme for previous-round dialogue information, the attention design, and the introduction of a residual network), the invention builds a model that reasonably stores and utilizes previous-round dialogue information. It solves the problems of low dialogue generation quality, low utilization rate and mining degree of previous-round dialogue information, and unreasonable storage of dialogue information; it supports analyzing user dialogue scenes to build user portraits; and it improves the utilization rate and mining degree of previous-round dialogue information, with positive significance for improving dialogue generation quality.
The above examples illustrate only a few embodiments of the application; their description is detailed but should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the scope of the application. The protection scope of this patent shall therefore be subject to the appended claims.

Claims (7)

1. A multi-round dialogue generative model building method, applied to multi-round dialogue scenes, comprising:
constructing an initial multi-round dialogue generative model from an encoding layer based on an attention mechanism and a decoding layer based on an LSTM network;
processing text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text;
updating the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain a final multi-round dialogue generative model;
wherein the step of processing the text through the encoding layer to obtain a text vector and an attention distribution vector, and processing the text vector and the attention distribution vector through the decoding layer to obtain a response text, comprises:
extracting features from the text to obtain text features, and vectorizing the text features to obtain the text vector;
scoring the text vector through a key matrix and a value matrix of the encoding layer to obtain the attention distribution vector;
splicing the text vector and the attention distribution vector to obtain a spliced vector;
adjusting the dimensionality of the spliced vector through a multi-layer perceptron structure of the encoding layer;
and wherein the step of scoring the text vector through the key matrix and the value matrix of the encoding layer to obtain the attention distribution vector comprises:
multiplying the text vector by the key matrix to obtain an operation result;
and multiplying the operation result by the value matrix to obtain the attention distribution vector.
2. The multi-round dialogue generative model building method of claim 1, wherein the step of splicing the text vector and the attention distribution vector to obtain a spliced vector comprises:
processing the text vector and the attention distribution vector in a skip-connection manner to obtain the spliced vector.
3. The multi-round dialogue generative model building method of claim 2, wherein the step of updating the encoding layer by performing back-propagation calculation on the multi-round dialogue generative model using the response text comprises:
performing back-propagation calculation on the initial multi-round dialogue generative model according to the response text to obtain a loss function value, and updating model parameters according to the loss function value to obtain the final multi-round dialogue generative model.
4. The multi-round dialogue generative model building method of claim 3, wherein the model parameters comprise at least one of the key matrix and the value matrix.
5. A multi-round dialogue generative model building system, adapted to the multi-round dialogue generative model building method of any of claims 1 to 4, comprising:
an encoding layer construction unit, used to construct the encoding layer based on an attention mechanism, process text through the encoding layer to obtain a text vector and an attention distribution vector, and process the text vector and the attention distribution vector through the decoding layer to obtain a response text;
and a decoding layer construction unit, used to construct the decoding layer based on an LSTM network, and to update the encoding layer by performing back-propagation calculation on the initial multi-round dialogue generative model using the response text, to obtain a final multi-round dialogue generative model.
6. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the multi-round dialogue generative model building method of any of claims 1 to 4 when executing the computer program.
7. A readable storage medium for an electronic device, having stored thereon computer program instructions which, when executed by a processor, implement the multi-round dialogue generative model building method of any of claims 1 to 4.
CN202111180118.3A (filed 2021-10-11, priority 2021-10-11): Multi-round dialogue generation type model establishment method, system, electronic equipment and medium. Active; granted as CN113868395B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111180118.3A (granted as CN113868395B) | 2021-10-11 | 2021-10-11 | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111180118.3A (granted as CN113868395B) | 2021-10-11 | 2021-10-11 | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium

Publications (2)

Publication Number | Publication Date
CN113868395A (en) | 2021-12-31
CN113868395B (en) | 2024-08-02

Family

ID=79002470

Family Applications (1)

Application Number | Title
CN202111180118.3A (Active; granted as CN113868395B) | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium

Country Status (1)

Country | Link
CN | CN113868395B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776578A * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Method and device for improving the dialogue performance of a conversational system
CN110032633A * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 Multi-round dialogue processing method, apparatus and device

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11449744B2 (en) * 2016-06-23 2022-09-20 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
DK201770431A1 (en) * 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
EP3486842A1 (en) * 2017-11-17 2019-05-22 Digital Genius Limited Template generation for a conversational agent
US10978051B2 (en) * 2018-09-28 2021-04-13 Capital One Services, Llc Adversarial learning framework for persona-based dialogue modeling
US11087092B2 (en) * 2019-03-05 2021-08-10 Salesforce.Com, Inc. Agent persona grounded chit-chat generation framework
CN110413752B (en) * 2019-07-22 2021-11-16 中国科学院自动化研究所 Multi-turn spoken language understanding method, system and device based on conversation logic
US11264009B2 (en) * 2019-09-13 2022-03-01 Mitsubishi Electric Research Laboratories, Inc. System and method for a dialogue response generation system
CN110929476B (en) * 2019-09-27 2022-09-30 中国人民解放军63626部队 Task type multi-round dialogue model construction method based on mixed granularity attention mechanism
CN112231457A (en) * 2020-10-19 2021-01-15 北京明略昭辉科技有限公司 Multi-turn dialogue generation method and device for chatting robot and chatting robot
US11132988B1 (en) * 2020-10-22 2021-09-28 PolyAI Limited Dialogue system, a dialogue method, and a method of training
CN113342947B (en) * 2021-05-26 2022-03-15 华南师范大学 Multi-round dialog text generation method capable of sensing dialog context relative position information
CN113239174A (en) * 2021-06-09 2021-08-10 华南师范大学 Hierarchical multi-round conversation generation method and device based on double-layer decoding

Also Published As

Publication Number | Publication Date
CN113868395A (en) | 2021-12-31

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant