CN115577118B - Text generation method based on mixed grouping ordering and dynamic entity memory planning - Google Patents
Text generation method based on mixed grouping ordering and dynamic entity memory planning
- Publication number
- CN115577118B (application CN202211216143.7A)
- Authority
- CN
- China
- Prior art keywords
- entity
- graph
- sub
- sequence
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 33
- 230000003068 static effect Effects 0.000 claims abstract description 49
- 230000007246 mechanism Effects 0.000 claims abstract description 38
- 238000002789 length control Methods 0.000 claims abstract description 12
- 239000013598 vector Substances 0.000 claims description 38
- 238000013528 artificial neural network Methods 0.000 claims description 28
- 230000000306 recurrent effect Effects 0.000 claims description 9
- 238000012986 modification Methods 0.000 claims description 6
- 230000004048 modification Effects 0.000 claims description 6
- 238000013519 translation Methods 0.000 claims description 6
- 230000002457 bidirectional effect Effects 0.000 claims description 2
- 230000002708 enhancing effect Effects 0.000 claims description 2
- 230000005055 memory storage Effects 0.000 claims description 2
- 230000001537 neural effect Effects 0.000 claims 1
- 230000006870 function Effects 0.000 description 14
- 238000010586 diagram Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000010076 replication Effects 0.000 description 4
- 238000012546 transfer Methods 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 238000000547 structure data Methods 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000004931 aggregating effect Effects 0.000 description 1
- 230000001427 coherent effect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000002787 reinforcement Effects 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a text generation method based on mixed grouping ordering and dynamic entity memory planning, which aims to automatically convert input structured data into readable text describing the data. In the grouping stage, a length control module and a sub-graph observation module select sub-graphs as groups and order the data group by group; the static planning stage generates a static node content plan so that the data are ordered both within and between groups; on the basis of the static plan, each time step dynamically decides, with the help of a memory network, which data to output next; and three-level reconstruction guides the decoder from multiple angles to capture the essential features of the input. The invention introduces a finer-grained grouping mechanism to bridge the gap between structured data and unstructured text; dynamic content planning combined with a memory network enhances semantic coherence; and a three-level reconstruction mechanism captures the intrinsic feature dependencies between input and output at different levels.
Description
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a text generation method suited to the problem of converting input structured data into readable text describing the data.
Background
Text generation is an important topic in the field of natural language processing. Real-world data may appear in different forms under different circumstances, and some forms, such as knowledge graphs, are difficult for non-experts to understand. Converting such data into readable text by hand costs a great deal of time and effort. The data-to-text task therefore aims to automatically convert input structured data into readable text that describes the data.
Reiter [1] summarized text generation systems as consisting of three relatively independent modules: (1) content planning, i.e. selecting which data records or data fields to describe; (2) sentence planning, i.e. determining the order of the selected data records or fields in the sentences; (3) surface realization, i.e. generating the actual text based on the result of sentence planning. Intuitively, content planning decides what to say, sentence planning decides in what order to say it, and surface realization decides how to say it. This has essentially become the paradigm of text generation systems, and in recent years more and more end-to-end models have added content selection and content planning modules to improve performance. Puduppully et al. [2] proposed a neural architecture that divides the generation task into a content selection and planning stage and a surface realization stage: given a set of data records, a content plan is first generated highlighting which information should be mentioned and in what order, the document is then generated based on the content plan, and a copy mechanism is added to improve the decoder. Chen et al. [3] proposed a text generation model based on dynamic content planning, which adjusts the plan dynamically according to the already generated text and adds a reconstruction mechanism to encourage the decoder to capture the essential features the encoder intends to express. Puduppully et al. [4] dynamically update entity representations according to the generation history and entity memory, capturing entity transitions between sentences, increasing inter-sentence coherence, and selecting the content to be described more appropriately.
Although the surface realization stage can generate fluent text, problems of information loss, repetition or hallucination still occur, so the idea of grouping is widely used to align entities with the descriptive text and alleviate such problems. Lin et al. [5] add separators to the plan for fine-grained segmentation to facilitate long text generation. Shen et al. [6] group the entity data so that each part corresponds to a segment of the target text; the corresponding descriptive text can then be generated from the designated entity pairs without attending to the whole input. Xu et al. [7] sort and aggregate the input triples so as to align them with the output descriptive text, and generate the description sentence by sentence.
Building on the above observations, the content planning and sentence planning parts are further enhanced: a finer-grained grouping mechanism is introduced, together with a matching static plan generation strategy; a memory network is combined with entity transfer to capture how the focus of description shifts between sentences; and reconstruction from multiple angles ensures that the several stages capture the essential features between input and output.
References:
[1] Reiter E. An architecture for data-to-text systems[C]//Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07). 2007: 97-104.
[2] Puduppully R, Dong L, Lapata M. Data-to-text generation with content selection and planning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 6908-6915.
[3] Chen K, Li F, Hu B, et al. Neural data-to-text generation with dynamic content planning[J]. Knowledge-Based Systems, 2021, 215: 106610.
[4] Puduppully R, Dong L, Lapata M. Data-to-text generation with entity modeling[J]. arXiv preprint arXiv:1906.03221, 2019.
[5] Lin X, Cui S, Zhao Z, et al. GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation[C]//Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021: 3253-3257.
[6] Shen X, Chang E, Su H, et al. Neural data-to-text generation via jointly learning the segmentation and correspondence[J]. arXiv preprint arXiv:2005.01096, 2020.
Disclosure of the Invention
Purpose of the invention: when structured data are converted into linear readable text, a structural gap arises. Existing models bridge this gap by planning in advance, but traditional planning methods use a single recurrent neural network, which is simple and not fine-grained enough, and they all plan first and then realize the text, without adjusting the plan during the generation process. The invention addresses these problems.
Technical solution: in order to achieve the above purpose, the invention adopts the following technical solution:
A text generation method based on mixed grouping ordering and dynamic entity memory planning, in which the entity representations are updated on the basis of the static plan using information from the generation process and the entity transfer memory, the static plan is corrected accordingly, and finally three-level reconstruction encourages the decoder to obtain more accurate key information from the encoder. The method specifically comprises the following steps:
step 1) taking a structured data set for which corresponding text needs to be generated as the model input, where the data are expressed in the form of a table or a knowledge graph; converting the obtained data into a bipartite graph and embedding it with a graph attention mechanism;
step 2) grouping and ordering the data vectors obtained in step 1) in a grouping stage; the grouping stage comprises two modules: a length control module and a sub-graph observation module; the length control module acts at every generation step: combining the information of the already generated sub-graph sequence, it maps to a probability distribution from which the number of triples LC of the sub-graph to be generated at the next time step is selected, and that time step may then only select sub-graphs containing LC triples; if LC = -1 is selected, the grouping stage ends and the method proceeds to step 5);
step 3) using the sub-graph length LC obtained in step 2) to control the selection space of sub-graphs, the sub-graph observation mechanism obtains the representation of each sub-graph through a self-attention mechanism over all nodes in the sub-graph, and applies an attention mechanism with the sub-graph and node information of the previously generated sub-graph sequence, so as to produce the probability of selecting each sub-graph;
step 4) selecting a sub-graph according to the probability distribution obtained in step 3), then updating the node representations in all sub-graphs with the hidden state of the current step of the recurrent neural network, i.e. updating the representations of all sub-graphs, and returning to step 2); if LC = -1 is selected in step 2), the final sub-graph sequence is obtained, in which every sub-graph is a subset of the input structured data set;
step 5) the static content planning stage selects and generates the entity sequence SP, using the global node representation V_global as the initialization state of the recurrent neural network; the selection space of each step is the corresponding sub-graph in the sequence of step 4); when the special sub-graph end mark <EOG> is generated, the next input of the recurrent neural network is the representation of the current sub-graph, and the next selection space is obtained according to the sub-graph sequence of step 4); when the sub-graph sequence has been traversed, the final static content planning entity sequence SP is obtained;
step 6) encoding the SP entity sequence obtained in step 5) with a bidirectional gated recurrent network to obtain the SP sequence entity hidden representations e_{1..n}, where n is the total number of entities in the SP sequence; the SP sequence hidden representations are passed to the generation stage and to the entity memory module;
step 7) the entity memory module stores the SP sequence entity hidden representations as its initial memory content; the hidden state d_{t-1} of the generation-stage recurrent neural network is used to update the entity memories u_{t,k}, and the entity memory u_{t,k} is multiplied with d_{t-1} to obtain the memory weight ψ_{t,k}, where t denotes the t-th time step and k the k-th entity;
step 8) according to the hidden state d_{t-1} of the generation-stage recurrent neural network and e_{1..n}, an attention mechanism yields the attention scores a_{1..n}; the attention score a_{t,k} is multiplied with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k};
step 9) the memory weights ψ_{t,k} and the corresponding entity context vectors S_{t,k} are weighted and summed to obtain the context vector q_t, which serves as the input of the pointer-generation decoder; a graph structure enhancement mechanism is adopted to enhance the pointer decoder and generate the translation text corresponding to the structured data;
step 10) three-level reconstruction is adopted so that the decoder fully acquires the information contained in the encoder: the static content plan SP is reconstructed from the translation text, the sub-graph sequence of the grouping stage is reconstructed from the static content planning sequence, and the decoding result of the pointer-generation decoder is restored to the bipartite graph representation.
Further, the data in step 1) are represented in the form of a table or a knowledge graph: in a table the structured data exist in the form of records, while in a knowledge graph they exist in the form of triples;
The knowledge graph is used as the structured input data, and each triple consists of a head entity, a relation and a tail entity; converting the obtained data into a bipartite graph means representing the relation in each triple as a node while adding a global node that observes the structural information of the whole graph; all nodes are embedded using a graph attention mechanism.
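By way of illustration only (not part of the claimed method), the following minimal Python sketch shows one way the triples could be turned into such a bipartite graph; the class name, the choice of one relation node per triple occurrence, and the <GLOBAL> label are assumptions of the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class BipartiteGraph:
    nodes: list = field(default_factory=list)   # node labels: entities, relations, global node
    edges: list = field(default_factory=list)   # (source_index, target_index) pairs

def triples_to_bipartite(triples):
    g, index = BipartiteGraph(), {}

    def node_id(label):
        if label not in index:
            index[label] = len(g.nodes)
            g.nodes.append(label)
        return index[label]

    for head, relation, tail in triples:
        h, t = node_id(head), node_id(tail)
        r = len(g.nodes)                         # assumed: every relation occurrence is its own node
        g.nodes.append(relation)
        g.edges += [(h, r), (r, t)]              # head -> relation -> tail

    glob = node_id("<GLOBAL>")                   # global node observes the whole graph
    g.edges += [(glob, v) for v in range(len(g.nodes)) if v != glob]
    return g

# Example with two triples sharing a head entity.
graph = triples_to_bipartite([("Leipzig", "country", "Germany"),
                              ("Leipzig", "population", "587857")])
print(len(graph.nodes), len(graph.edges))        # 6 nodes, 9 edges
```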
Further, the SP entity sequence is passed through a bidirectional gated recurrent network (Bi-GRU) to obtain the SP entity hidden representations e_{1..n}, fusing the sequence information of the SP into the entity embeddings;
The hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity, including:

u_{-1,k} = W · e_k    (5)

γ_t = softmax(W · d_{t-1} + b_γ)    (6)

δ_{t,k} = γ_t ⊙ softmax(W · d_{t-1} + b_d + W · u_{t-1,k} + b_u)    (7)

First, equation (5) initializes the memory of each entity from its representation e_k, denoted u_{-1,k}; in equation (6), γ_t is a gate that decides whether to modify the memory, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text; in equation (7), δ_{t,k} indicates to what extent the memory needs to be modified, determined by d_{t-1} and the entity memory u_{t-1,k}; equation (8) computes the content ũ_{t,k} of the modification to the entity memory; finally, equation (9) updates the entity memory u_{t,k} of the current time step from the previous memory u_{t-1,k} and the modification content ũ_{t,k}; equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module; W, b_γ, b_d, b_u are parameters.
Further, in step 8), according to the hidden state d_{t-1} of the generation-stage recurrent neural network and e_{1..n}, an attention mechanism yields the attention scores a_{1..n}; the attention score a_{t,k} is multiplied with the corresponding entity hidden representation to obtain the entity context vector S_{t,k}; specifically:

S_{t,k} = a_{t,k} · e_k    (12)

Equation (11) applies an attention mechanism to the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden representations e_{1..n} of the 1st to n-th entities to obtain the attention scores a_{1..n}; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity;
Equation (13) uses the memory module attention weights ψ_{t,k} to weight and sum the entity context vectors S_{t,k}, obtaining the context vector q_t of the current time step t, which serves as the input to the pointer-generation network.
Beneficial effects: compared with the prior art, the technical solution of the invention has the following beneficial technical effects:
the invention relates to a text generation method which combines deep reinforcement learning and is realized while planning, and is used for the condition that a readable text is required to be automatically output given structured input data. In planning, not only the importance of the input data per se is considered, but also the information of the generated text and the memory of the past physical changes are considered, and fine granularity grouping is further carried out, so that the generated planning is consistent with the golden planning as much as possible.
Meanwhile, three-level reconstruction is adopted to capture the essential characteristics between input and output at different levels. The static plan is reconstructed from the generated text so that the generated text stays essentially consistent with the static plan, ensuring that the dynamic plan only fine-tunes the static plan according to the generated information; the order of the selected entities within the groups is reconstructed from the static planning sequence to ensure that the static plan remains ordered between groups; and the decoding result of the pointer-generation decoder is restored to the bipartite graph representation, a reconstruction from the vector angle that makes the final decoding result reflect the essential features of the input.
Drawings
FIG. 1 is a flow chart of a text generation algorithm of the present invention;
FIG. 2 is a block diagram of a text generation algorithm of the present invention;
FIG. 3 is a flow chart of the grouping stage;
FIG. 4 is a block diagram of the grouping stage;
FIG. 5 is a diagram of the sub-graph selection process in the grouping stage.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The text generation algorithm based on mixed grouping ordering and dynamic entity memory planning according to the invention is described in further detail below with reference to the flow chart and an implementation case.
The text generation algorithm adopts mixed grouping ordering and dynamic entity memory planning to improve the planning performance and generate coherent text. The flow of the method is shown in FIG. 1 and the algorithm structure in FIG. 2; the method comprises the following steps:
Step 10, the input structured data are converted into a bipartite graph representation, i.e. the relation of each triple is represented as a node and a global node is added to observe the structural information of the whole graph; the nodes are embedded through a graph attention mechanism (GAT).
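For step 10, the following PyTorch sketch illustrates a single-head graph-attention (GAT-style) layer over the bipartite-graph adjacency; it is a hedged illustration only, and the feature dimensions, the dense toy adjacency and the single attention head are assumptions rather than values fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim) node features, adj: (num_nodes, num_nodes) 0/1 adjacency
        h = self.proj(x)                                        # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)     # (N, N) raw attention logits
        scores = scores.masked_fill(adj == 0, float("-inf"))    # only attend along graph edges
        alpha = torch.softmax(scores, dim=-1)
        return F.elu(alpha @ h)                                 # aggregated node embeddings

# Toy usage: 5 nodes (entities, relations, global) with random initial features;
# the global node makes this toy graph densely connected, hence the all-ones adjacency.
adj = torch.ones(5, 5)
x = torch.randn(5, 16)
node_emb = GraphAttentionLayer(16, 32)(x, adj)
print(node_emb.shape)                                           # torch.Size([5, 32])
```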
Step 20, inter-group ordering is performed by the grouping stage, which comprises two modules: a length control module, which controls the selection space of sub-graphs according to length, and a sub-graph observation module, which selects a sub-graph within the selection space designated by the length control by combining the information of the already generated sub-graph sequence.
Step 30, the static planning stage generates the entity sequence SP, achieving order within and between groups, as shown in equations (1)-(4). The static planning stage uses a recurrent neural network whose purpose is to generate a node sequence, planning in advance the content and order of the text to be generated.

If LC = -1 then d_0 = V_GLOBAL    (1)

The return value LC of the length control module ranges over [-1, MaxLength], where MaxLength is the total number of triples in the input structured data, and determines the selection space of sub-graphs at the next time step. As shown in equation (1), when the return value LC of the length control module in the grouping stage is -1, the grouping stage ends, the static planning stage is entered, and the global node representation V_global is used as the initialization state of the recurrent neural network.

The selection space of nodes is limited by the group: in the content planning stage only the nodes of the current group, or <EOG>, can be selected, where <EOG> indicates that the sub-graph currently serving as the selection space has been used up. Equation (2) computes a gate gate_z from the representation of the z-th node of the k-th sub-graph G_k and the hidden state d_{t-1} of the recurrent network at the previous time step; gate_z measures the relevance of the node itself to the static plan, where z denotes the z-th node. As shown in equation (3), multiplying gate_z with the representation of the z-th node of the k-th sub-graph G_k gives the context representation of the node, so that its importance is judged in combination with the already generated SP sequence. Finally, equation (4) computes from this context representation the probability of selecting node_{k,z} at the current step, i.e. it measures the relevance between each node in the sub-graph and the node sequence generated so far, and a node is selected; each time step of the recurrent neural network takes the node representation selected at the previous step as its input.

When the SP output of the previous time step is the special symbol <EOG>, i.e. the sub-graph G_k currently serving as the selection space is exhausted, the next sub-graph G_{k+1} is selected according to the sub-graph sequence generated in the grouping stage. The vector representation of that sub-graph is fed into the recurrent neural network; the sub-graph vector representation of G_{k+1} is obtained by average pooling over all its node representations. The operations of equations (1)-(4) are then repeated for the sub-graph G_{k+1}. When the traversal of the sub-graph sequence obtained in the grouping stage is finished, the static content planning stage has produced the final SP entity sequence.
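The static planning loop of equations (1)-(4) can be sketched as follows; GRUCell dynamics, dot-product scoring against the hidden state, a zero vector standing in for a learned <EOG> embedding, and the greedy argmax selection are all assumptions of this illustration, not the patented formulas.

```python
import torch
import torch.nn as nn

def static_plan(subgraph_seq, node_emb, v_global, hidden=32):
    """subgraph_seq: list of lists of node indices; node_emb: (N, hidden); v_global: (hidden,)."""
    cell = nn.GRUCell(hidden, hidden)
    gate = nn.Linear(2 * hidden, hidden)      # eqs. (2)-(3): gate each candidate against d_{t-1}
    eog = torch.zeros(hidden)                 # stand-in for a learned <EOG> embedding
    d, plan = v_global, []
    for subgraph in subgraph_seq:
        x = node_emb[subgraph].mean(0)                        # sub-graph representation as input
        for _ in range(len(subgraph) + 1):                    # bounded scan of the sub-graph
            d = cell(x.unsqueeze(0), d.unsqueeze(0)).squeeze(0)
            cand = torch.cat([node_emb[subgraph], eog.unsqueeze(0)])   # candidate nodes + <EOG>
            ctx = torch.sigmoid(gate(torch.cat([cand, d.expand(cand.size(0), -1)], -1))) * cand
            probs = torch.softmax(ctx @ d, 0)                 # eq. (4): selection probabilities
            pick = int(probs.argmax())
            if pick == len(subgraph):                         # <EOG>: sub-graph is used up
                break
            plan.append(subgraph[pick])
            x = node_emb[subgraph[pick]]                      # selected node feeds the next step
    return plan

# Toy usage: 6 nodes grouped into two sub-graphs.
emb = torch.randn(6, 32)
print(static_plan([[0, 1, 2], [3, 4, 5]], emb, torch.randn(32)))
```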
Step 40, the SP entity sequence is passed through the bidirectional gated recurrent network Bi-GRU to obtain the SP entity hidden representations e_{1..n}: (e_1, e_2, ..., e_n) = Bi-GRU(SP_1, SP_2, ..., SP_n), fusing the sequence information of the SP into the entity embeddings.
Step 50, the generated text information is combined with the entity memories stored in the entity memory network to obtain the context vector q_t, as shown in equations (5)-(13).

u_{-1,k} = W · e_k    (5)

γ_t = softmax(W · d_{t-1} + b_γ)    (6)

δ_{t,k} = γ_t ⊙ softmax(W · d_{t-1} + b_d + W · u_{t-1,k} + b_u)    (7)

Further, the hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity. First, equation (5) initializes the memory of each entity from its representation e_k, denoted u_{-1,k}. In equation (6), γ_t is a gate that decides whether to modify the memory, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text. In equation (7), δ_{t,k} indicates to what extent the memory needs to be modified, determined by d_{t-1} and the entity memory u_{t-1,k}. Equation (8) computes the content ũ_{t,k} of the modification to the entity memory. Finally, equation (9) updates the entity memory u_{t,k} of the current time step from the previous memory u_{t-1,k} and the modification content ũ_{t,k}. Equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module.

S_{t,k} = a_{t,k} · e_k    (12)

Further, equation (11) applies an attention mechanism to the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden representations e_{1..n} of the 1st to n-th entities to obtain the attention scores a_{1..n}; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity.

Equation (13) uses the memory module attention weights ψ_{t,k} to weight and sum the entity context vectors S_{t,k}, obtaining the context vector q_t at the current time t, which serves as the input to the pointer-generation network.
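A hedged sketch of the entity memory module and context computation of equations (5)-(13) is given below; since the bodies of equations (8)-(11) are not reproduced in the text, the concrete gating form (a GRU-like modify gate plus dot-product attention read-out) is an approximation rather than the exact patented parameterisation.

```python
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.init_proj = nn.Linear(dim, dim)          # eq. (5): u_{-1,k} = W e_k
        self.gate = nn.Linear(dim, dim)               # eq. (6): gamma_t from d_{t-1}
        self.mod_d = nn.Linear(dim, dim)              # eq. (7): contribution of d_{t-1}
        self.mod_u = nn.Linear(dim, dim)              # eq. (7): contribution of u_{t-1,k}
        self.content = nn.Linear(2 * dim, dim)        # eq. (8): proposed new memory content

    def init_memory(self, e):                         # e: (n, dim) SP entity hidden states
        return self.init_proj(e)

    def step(self, u_prev, d_prev, e):
        gamma = torch.softmax(self.gate(d_prev), -1)                        # whether to modify
        delta = gamma * torch.softmax(self.mod_d(d_prev) + self.mod_u(u_prev), -1)  # how much
        u_tilde = torch.tanh(self.content(torch.cat([u_prev, d_prev.expand_as(u_prev)], -1)))
        u = (1 - delta) * u_prev + delta * u_tilde                          # eq. (9) update
        psi = torch.softmax(u @ d_prev, 0)                                  # eq. (10) memory attn
        a = torch.softmax(e @ d_prev, 0)                                    # eq. (11) entity attn
        s = a.unsqueeze(-1) * e                                             # eq. (12) S_{t,k}
        q = (psi.unsqueeze(-1) * s).sum(0)                                  # eq. (13) context q_t
        return u, q

# Toy usage: 4 SP entities, hidden size 32, one decoding step.
e = torch.randn(4, 32)
mem = EntityMemory(32)
u = mem.init_memory(e)
u, q_t = mem.step(u, torch.randn(32), e)
print(q_t.shape)     # torch.Size([32])
```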
Step 60, q_t serves as the input of the pointer-generation network, which is enhanced with a graph structure enhancement mechanism (Graph Structure Enhancement Mechanism) to generate text, as shown in equations (14)-(19).

Equation (14) computes, from the hidden state d_t of the generation module and the context vector q_t, the context vector of the generation hidden state.

Equation (15) projects the context vector of the generation-stage hidden state into a probability distribution of the same length as the vocabulary. Equation (16) takes the attention weight over the entities in the memory network as the copy probability. Equation (17) adopts the existing graph structure enhancement mechanism to enhance, by means of the graph structure, the copy probability given by the pointer-generation network.

θ = Sigmoid(W · d_t + b_d)    (18)

A conditional probability combines the generation probability and the copy probability: equation (18) computes θ, the probability used to choose between the copy and generate modes, and equation (19) softly combines the generation probability and the graph-structure-enhanced copy probability to obtain the final probability distribution.
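The copy/generate soft switch of equations (18)-(19) can be illustrated with the sketch below; the graph-structure enhancement of equation (17) is abstracted into an already-computed `copy_probs` vector over the memory entities, and the vocabulary projection and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class CopyGenerateSwitch(nn.Module):
    def __init__(self, hidden, vocab_size):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden, vocab_size)   # eq. (15): project to the vocabulary
        self.switch = nn.Linear(hidden, 1)                # eq. (18): theta = sigmoid(W d_t + b)

    def forward(self, d_t, copy_probs, entity_to_vocab):
        # d_t: (hidden,) decoder state fused with q_t; copy_probs: (n_entities,) attention
        # weights over memory entities; entity_to_vocab: (n_entities,) vocab ids of the entities.
        gen = torch.softmax(self.vocab_proj(d_t), -1)                 # generation distribution
        theta = torch.sigmoid(self.switch(d_t))                       # copy-vs-generate gate
        copy = torch.zeros_like(gen).scatter_add(0, entity_to_vocab, copy_probs)
        return theta * gen + (1 - theta) * copy                       # eq. (19) final mixture

# Toy usage: vocabulary of 100 words, 3 copyable entities.
switch = CopyGenerateSwitch(hidden=32, vocab_size=100)
p = switch(torch.randn(32), torch.softmax(torch.randn(3), 0), torch.tensor([5, 17, 42]))
print(p.sum())       # sums to 1: a proper output distribution
```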
The model is built as a pipeline, divided into a grouping stage, a static planning stage, an entity memory stage and a pointer-generation decoding stage. The static planning gold standard is obtained with an existing information extraction system by comparing the reference text of a sample with the input structured data. Since dynamic content planning has no explicit gold standard, the parameters of the memory module are updated through the generation loss function. The reference text in the sample and the obtained static planning gold standard can then be compared with the generated text and the generated static plan, respectively, to obtain the loss functions.

Equation (20) is the negative log-likelihood loss of the generated text, which makes the generated text agree as closely as possible with the reference text given in the sample; here the targets are the reference tokens and t denotes the t-th time step. Equation (21) regularizes the loss function, where T denotes the length of the generated text, i.e. the total number of time steps, the regularization term is the mean of the token probabilities, and γ is a hyperparameter.

Equation (22) is the negative log-likelihood of generating the SP sequence, maximizing the probability of the static planning gold standard, where |SP| denotes the length of the static plan and the summation runs over the nodes of the static content planning gold standard.
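As a rough illustration of the two supervised objectives described around equations (20)-(22), the sketch below computes the negative log-likelihood of the reference tokens and of the gold static plan; the exact regularizer of equation (21) is not reproduced in the text, so the γ-weighted mean probability used here is only an assumed placeholder.

```python
import torch

def generation_loss(token_probs, gamma=0.1):
    """token_probs: (T,) model probability assigned to each reference token."""
    nll = -torch.log(token_probs + 1e-12).sum()        # eq. (20): negative log-likelihood
    reg = gamma * token_probs.mean()                   # eq. (21): assumed placeholder form
    return nll + reg

def static_plan_loss(plan_probs):
    """plan_probs: (|SP|,) probability of each gold static-plan node."""
    return -torch.log(plan_probs + 1e-12).sum()        # eq. (22)

# Toy usage: probabilities the model assigned to 5 reference tokens and 3 plan nodes.
print(generation_loss(torch.rand(5)), static_plan_loss(torch.rand(3)))
```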
Step 70, three-level reconstruction is adopted: the static plan is reconstructed from the generated translation text, the grouping is reconstructed from the static planning sequence, and the decoding result of the pointer-generation decoder is restored to the bipartite graph representation.

P_rec1(SP = node_z) = Softmax(W · h_t + b)    (23)

Further, the first-level reconstruction uses a recurrent neural network as its model and reconstructs the static plan from the translation text, i.e. the static plan SP is extracted from the embedded representations of the decoded vocabulary. The hidden state vector at the last moment of the pointer-generation decoder initializes the hidden state of the recurrent neural network, defined as h_0. A context vector of the translated text, computed by attention between the hidden state and all vocabulary embedded representations of the pointer-generation decoder, serves as the input of the recurrent neural network. Equation (23) computes the probability of selecting a node, where h_t is the hidden state output by the recurrent network at time t and W, b are parameters. The loss function of the reconstruction from the generated text to the static planning part can therefore be defined as:

where |SP| denotes the length of the generated static plan, node_z denotes the z-th node of the SP entity sequence generated by the static planning stage, the regularization term is the mean of the probabilities over the time steps of the first-level reconstruction, and γ is a hyperparameter. The loss function L_rec1 aims to extract, as far as possible, the earlier static plan from the generated descriptive text.

Further, the second-level reconstruction recovers from the static planning sequence the order of the selected entities within the groups, i.e. it reverts from the static plan to the group sequence numbers, with the aim of preserving the between-group order of the grouping stage. The second-level reconstruction uses a structure similar to the first-level reconstruction: the static planning sequence is fed in through an attention mechanism and a recurrent neural network generates the corresponding group sequence. The loss function of the second-level reconstruction can be defined as:

where |G| denotes the length of the group sequence, G_k denotes the k-th sub-graph of the sub-graph sequence generated by the grouping stage, the regularization term is the mean of the probabilities over the time steps of the second-level reconstruction, computed in the same way as in equation (25), and γ is a hyperparameter.

Further, the first-level and second-level reconstructions, i.e. reconstructing the static plan from the generated text and reconstructing the group sequence from the static plan, are reconstructions based on sequence numbers. The third-level reconstruction, i.e. restoring the decoding result of the pointer-generation network to the bipartite graph representation, is a reconstruction based on vector representations.

L_rec3 = KL((m_1, m_2, ..., m_{|V|}), GAT_CUBE(Bipartite))    (27)

Equation (27) encodes and re-decodes the decoding result of the pointer-generation network to obtain the codes m_1, m_2, ..., m_{|V|} of the 1st to |V|-th nodes. L_rec3 requires the decoded result to be consistent with the embedded representation of the bipartite graph after graph attention (GAT) encoding; since the reconstruction is performed from the angle of vector representations, the KL divergence is used as the loss function.
L_TOTAL = λ_1·L_sp + λ_2·L_lm + λ_3·L_rec1 + λ_4·L_rec2 + λ_5·L_rec3    (28)

Finally, the model loss function can be defined as equation (28), a combination of the static planning loss, the text generation loss and the three levels of reconstruction loss, where λ_1 to λ_5 are hyperparameters.
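The third-level reconstruction loss of equation (27) and the weighted total objective of equation (28) could look as follows; the decoder-side node codes and the GAT encoding of the bipartite graph are taken as already-computed tensors, and the λ values shown are placeholder hyperparameters.

```python
import torch
import torch.nn.functional as F

def third_level_reconstruction(decoded_nodes, gat_nodes):
    # KL divergence between the re-encoded decoder output and the original GAT embedding,
    # both treated as distributions over the embedding dimension (softmax-normalised).
    p = F.log_softmax(decoded_nodes, dim=-1)
    q = F.softmax(gat_nodes, dim=-1)
    return F.kl_div(p, q, reduction="batchmean")        # eq. (27)

def total_loss(l_sp, l_lm, l_rec1, l_rec2, l_rec3,
               lambdas=(1.0, 1.0, 0.5, 0.5, 0.5)):      # eq. (28), placeholder weights
    return sum(w * l for w, l in zip(lambdas, (l_sp, l_lm, l_rec1, l_rec2, l_rec3)))

# Toy usage with |V| = 6 nodes of dimension 32.
l3 = third_level_reconstruction(torch.randn(6, 32), torch.randn(6, 32))
print(total_loss(*(torch.rand(4).tolist() + [l3])))
```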
As shown in FIG. 3, the flow of the grouping stage is as follows:

Step 101, the length control module, combining the information of the already generated sub-graph sequence, selects the length LC of the sub-graph to be generated; LC ranges over [-1, MaxLength], where MaxLength is the total number of triples in the input structured data, and determines the selection space of sub-graphs at the next time step. If LC is -1, the grouping stage terminates. The representation of the sub-graph selected at the previous step is used to update the length control memory vector L.

P_LC = Softmax(W_LC · L_t + b_LC)    (30)

Equation (29) updates the length control memory vector L according to the generated sequence; equation (30) projects the vector L into a probability distribution of length MaxLength + 1, from which the number of triples, defined as LC, is selected. The length of the sub-graph limits the selection space for generating a sub-graph at the current step: sub-graph observation may only choose among sub-graphs whose number of triples equals LC. Here γ is a hyperparameter and t denotes the t-th time step.
V_Global ← GAT(Bipartite)    (31)

LC = Sigmoid(W · V_Global + b_LC)    (32)
As shown in the block diagram of FIG. 4, the first time step lacks information about a generated sub-graph sequence, so equation (31) initializes it with the global node of the bipartite graph.

As shown in FIG. 5, the set of all possible sub-graphs can be defined as a three-dimensional tensor, called the cube matrix Cube. Equation (33) converts the selected LC into a one-hot vector LC_onehot; selecting the sub-graph space of a given length can be intuitively understood from FIG. 5 as selecting the corresponding page of the Cube.
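A hedged sketch of the length control module (equations (29)-(33)) is given below; the GRU-style update of L, the mapping of index 0 to LC = -1, and the layout of the Cube tensor (one page per sub-graph size) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LengthControl(nn.Module):
    def __init__(self, dim, max_length):
        super().__init__()
        self.max_length = max_length
        self.cell = nn.GRUCell(dim, dim)                  # eq. (29): update memory vector L
        self.proj = nn.Linear(dim, max_length + 1)        # eq. (30): options -1, 1..MaxLength

    def forward(self, L, prev_subgraph_repr):
        L = self.cell(prev_subgraph_repr.unsqueeze(0), L.unsqueeze(0)).squeeze(0)
        probs = torch.softmax(self.proj(L), -1)
        choice = int(torch.multinomial(probs, 1))         # sampled index into the options
        lc = -1 if choice == 0 else choice                # assumed: index 0 means "stop grouping"
        return L, lc

def select_cube_page(cube, lc):
    # cube: (MaxLength, num_subgraphs, dim) -- page p holds the sub-graphs with p+1 triples.
    one_hot = torch.zeros(cube.size(0))                   # eq. (33): LC as a one-hot vector
    one_hot[lc - 1] = 1.0
    return (one_hot.view(-1, 1, 1) * cube).sum(0)         # the page of size-LC sub-graphs

# Toy usage: hidden size 32, at most 4 triples per sub-graph.
lcm = LengthControl(32, max_length=4)
L, lc = lcm(torch.randn(32), torch.randn(32))
if lc != -1:
    page = select_cube_page(torch.randn(4, 7, 32), lc)
    print(lc, page.shape)
```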
Step 102, the selection space is controlled by the generated sub-graph length LC; sub-graph observation then applies an attention mechanism between the representation of each candidate sub-graph and the sub-graph and node information of the already generated sub-graph sequence, producing the probability of selecting each sub-graph.

Equation (34) average-pools the node representations within a sub-graph to obtain the representation of the sub-graph; since the node representations are updated at every moment, the sub-graph representations differ over time. The subscript t denotes the t-th time step, i the i-th sub-graph, and j the j-th node of the sub-graph. Attention mechanisms are added at both the sub-graph level and the node level.

Equation (35) computes the attention score between a candidate sub-graph and the previously selected sub-graphs, where g_{t,i} is the representation of candidate sub-graph i at time t and G_k is the embedded representation of the sub-graph selected at the earlier time k. Equation (36) fuses the information g_{t,i} of the candidate sub-graph itself with the previously selected sub-graph information G_k through the attention mechanism, completing the sub-graph-level attention and yielding the sub-graph context vector.

Equation (37) computes the attention score between the sub-graph context vector and all nodes of the previously selected sub-graphs, where the node representation refers to the z-th node of the selected sub-graph k. Equation (38) fuses the candidate sub-graph context vector with the information of all nodes in the previously selected sub-graphs through the attention mechanism, completing the node-level attention and yielding the node context vector. Finally, equation (39) computes from the node context vector the probability of selecting the sub-graph, where W_G and b_G are grouping-stage parameters. As shown in FIG. 5, this can be intuitively understood as selecting one sub-graph representation from the sub-graph page of the given length.
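The two-level attention of the sub-graph observation module (equations (34)-(39)) can be sketched as follows; dot-product attention at both levels and the linear fusion layers are assumptions, since the exact parameterisation is only partly given in the text.

```python
import torch
import torch.nn as nn

class SubGraphObserver(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fuse_graph = nn.Linear(2 * dim, dim)     # eq. (36): candidate + sub-graph context
        self.fuse_node = nn.Linear(2 * dim, dim)      # eq. (38): candidate + node context
        self.score = nn.Linear(dim, 1)                # eq. (39): probability of each candidate

    def forward(self, cand_node_embs, prev_graph_reprs, prev_node_embs):
        # cand_node_embs: list of (n_i, dim) node sets, one per candidate sub-graph of size LC
        # prev_graph_reprs: (m, dim) representations of already selected sub-graphs
        # prev_node_embs: (p, dim) all nodes inside the already selected sub-graphs
        g = torch.stack([nodes.mean(0) for nodes in cand_node_embs])      # eq. (34) avg-pool
        attn_g = torch.softmax(g @ prev_graph_reprs.T, -1)                # eq. (35) graph level
        ctx_g = self.fuse_graph(torch.cat([g, attn_g @ prev_graph_reprs], -1))   # eq. (36)
        attn_n = torch.softmax(ctx_g @ prev_node_embs.T, -1)              # eq. (37) node level
        ctx_n = self.fuse_node(torch.cat([ctx_g, attn_n @ prev_node_embs], -1))  # eq. (38)
        return torch.softmax(self.score(ctx_n).squeeze(-1), 0)            # eq. (39)

# Toy usage: 3 candidate sub-graphs, 2 previously selected sub-graphs with 5 nodes in total.
observer = SubGraphObserver(32)
cands = [torch.randn(2, 32), torch.randn(2, 32), torch.randn(2, 32)]
probs = observer(cands, torch.randn(2, 32), torch.randn(5, 32))
print(probs)    # selection probability of each candidate sub-graph
```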
Step 103, after a sub-graph has been selected, the node representations in all sub-graphs are updated, i.e. the representations of all sub-graphs are updated.

u = σ(W_update · G_t + b_update)    (40)

After each step of selecting a sub-graph, the representations of all nodes are updated with the information of the selected sub-graph, so that even if a sub-graph is selected repeatedly, its representation differs each time. Equation (40) computes the update content from the representation of the sub-graph G_t selected in the previous step, where W_update and b_update are the update parameters. Equation (41) computes a gate that balances the retention of previous information against the update information of the newly added sub-graph, where W_gate and b_gate are the gating parameters. Equation (42) updates the representation of every node, where t denotes the t-th time and v the v-th node, so the node representation is different at every time step.
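The gated node update of equations (40)-(42) can be illustrated with the short sketch below; the sigmoid forms and dimensions are assumptions of the illustration.

```python
import torch
import torch.nn as nn

class NodeUpdater(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(dim, dim)             # eq. (40): content proposed by G_t
        self.gate = nn.Linear(2 * dim, dim)           # eq. (41): retain-vs-update gate

    def forward(self, node_embs, selected_graph_repr):
        u = torch.sigmoid(self.update(selected_graph_repr))              # eq. (40)
        g = torch.sigmoid(self.gate(torch.cat(
            [node_embs, u.expand_as(node_embs)], -1)))                   # eq. (41)
        return g * node_embs + (1 - g) * u                               # eq. (42)

# Toy usage: update 6 node embeddings of size 32 with one selected sub-graph representation.
updater = NodeUpdater(32)
new_nodes = updater(torch.randn(6, 32), torch.randn(32))
print(new_nodes.shape)    # torch.Size([6, 32])
```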
Step 104, after all node representations have been updated, the operations of steps 101 to 103 are repeated until the LC value generated in step 101 is -1; the grouping stage then ends and the final sub-graph sequence is obtained.

The grouping stage cannot extract a gold standard from the samples. Therefore, the whole model is first warmed up: in the initial stage, when the model is unfamiliar with the data, the weight distribution is corrected continuously with a small learning rate. Once the model has become sufficiently familiar with the data, all parameters of the subsequent modules are fixed and the weights of the grouping stage are adjusted using the loss functions of the subsequent modules. Method 1: the parameters of the static planning module are fixed, the grouping result of the grouping stage is fed into the static planning stage to generate the SP, and the SP is compared with the gold static plan given by the data set to obtain a loss function. Method 2: all parameters from the static planning module to the pointer-generation network module are fixed, the grouping result is fed in, the generated text is output, and the generated text is compared with the reference text given by the sample to obtain a loss function. The parameters of the grouping stage are trained with methods 1 and 2.
Claims (4)
1. A text generation method based on mixed grouping ordering and dynamic entity memory planning, characterized in that the method comprises the following steps:
step 1) taking a structured data set for which corresponding text needs to be generated as the model input, where the data are expressed in the form of a table or a knowledge graph; converting the obtained data into a bipartite graph and embedding it with a graph attention mechanism;
step 2) grouping and ordering the data vectors obtained in step 1) in a grouping stage; the grouping stage comprises two modules: a length control module and a sub-graph observation module; the length control module acts at every generation step: combining the information of the already generated sub-graph sequence, it maps to a probability distribution from which the number of triples LC of the sub-graph to be generated at the next time step is selected, and that time step may then only select sub-graphs containing LC triples; if LC = -1 is selected, the grouping stage ends and the method proceeds to step 5);
step 3) using the sub-graph length LC obtained in step 2) to control the selection space of sub-graphs, the sub-graph observation mechanism obtains the representation of each sub-graph through a self-attention mechanism over all nodes in the sub-graph, and applies an attention mechanism with the sub-graph and node information of the previously generated sub-graph sequence, so as to produce the probability of selecting each sub-graph;
step 4) selecting a sub-graph according to the probability distribution obtained in step 3), then updating the node representations in all sub-graphs with the hidden state of the current step of the recurrent neural network, i.e. updating the representations of all sub-graphs, and returning to step 2); if LC = -1 is selected in step 2), the final sub-graph sequence is obtained, in which every sub-graph is a subset of the input structured data set;
step 5) the static content planning stage selects and generates the entity sequence SP, using the global node representation V_global as the initialization state of the recurrent neural network; the selection space of each step is the corresponding sub-graph in the sequence of step 4); when the special sub-graph end mark <EOG> is generated, the next input of the recurrent neural network is the representation of the current sub-graph, and the next selection space is obtained according to the sub-graph sequence of step 4); when the sub-graph sequence has been traversed, the final static content planning entity sequence SP is obtained;
step 6) encoding the SP entity sequence obtained in step 5) with a bidirectional gated recurrent network to obtain the SP sequence entity hidden representations e_{1..n}, where n is the total number of entities in the SP sequence; the SP sequence hidden representations are passed to the generation stage and to the entity memory module;
step 7) the entity memory module stores the SP sequence entity hidden representations as its initial memory content; the hidden state d_{t-1} of the generation-stage recurrent neural network is used to update the entity memories u_{t,k}, and the entity memory u_{t,k} is multiplied with d_{t-1} to obtain the memory weight ψ_{t,k}, where t denotes the t-th time step and k the k-th entity;
step 8) according to the hidden state d_{t-1} of the generation-stage recurrent neural network and e_{1..n}, an attention mechanism yields the attention scores a_{1..n}; the attention score a_{t,k} is multiplied with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k};
step 9) the memory weights ψ_{t,k} and the corresponding entity context vectors S_{t,k} are weighted and summed to obtain the context vector q_t, which serves as the input of the pointer-generation decoder; a graph structure enhancement mechanism is adopted to enhance the pointer decoder and generate the translation text corresponding to the structured data;
step 10) three-level reconstruction is adopted so that the decoder fully acquires the information contained in the encoder: the static content plan SP is reconstructed from the translation text, the sub-graph sequence of the grouping stage is reconstructed from the static content planning sequence, and the decoding result of the pointer-generation decoder is restored to the bipartite graph representation.
2. The text generation method according to claim 1, characterized in that: the data in step 1) are represented in the form of a table or a knowledge graph, where in the table the structured data exist in the form of records and in the knowledge graph they exist in the form of triples;
The knowledge graph is used as the structured input data, and each triple consists of a head entity, a relation and a tail entity; converting the obtained data into a bipartite graph means representing the relation in each triple as a node while adding a global node that observes the structural information of the whole graph; all nodes are embedded using a graph attention mechanism.
3. The text generation method according to claim 2, characterized in that: the SP entity sequence is passed through a bidirectional gated recurrent network Bi-GRU to obtain the SP sequence entity hidden representations e_{1..n}, fusing the sequence information of the SP into the entity embeddings;
The hidden state d_{t-1} of the decoding recurrent neural network RNN of the generation stage is used to update each entity memory u_{t,k} in the memory network, where t denotes the t-th time step and k the k-th entity, including:

u_{-1,k} = W · e_k    (5)

γ_t = softmax(W · d_{t-1} + b_γ)    (6)

δ_{t,k} = γ_t ⊙ softmax(W · d_{t-1} + b_d + W · u_{t-1,k} + b_u)    (7)

First, equation (5) initializes the memory of each entity from its representation e_k, denoted u_{-1,k}; in equation (6), γ_t is a gate that decides whether to modify the memory, determined by the hidden state d_{t-1} of the previous time step of the generation stage, i.e. by the information of the already generated text; in equation (7), δ_{t,k} indicates to what extent the memory needs to be modified, determined by d_{t-1} and the entity memory u_{t-1,k}; equation (8) computes the content ũ_{t,k} of the modification to the entity memory; finally, equation (9) updates the entity memory u_{t,k} of the current time step from the previous memory u_{t-1,k} and the modification content ũ_{t,k}; equation (10) applies an attention mechanism to u_{t,k} and d_{t-1} to obtain the attention weight ψ_{t,k} of the memory module; W, b_γ, b_d, b_u are parameters.
4. The text generation method according to claim 3, characterized in that: in step 8), according to the hidden state d_{t-1} of the generation-stage recurrent neural network and e_{1..n}, an attention mechanism yields the attention scores a_{1..n}; the attention score a_{t,k} is multiplied with the corresponding entity memory u_{t,k} to obtain the entity context vector S_{t,k}; specifically:

S_{t,k} = a_{t,k} · e_k    (12)

Equation (11) applies an attention mechanism to the hidden state d_{t-1} of the generation-stage recurrent neural network and the hidden representations e_{1..n} of the 1st to n-th entities to obtain the attention scores a_{1..n}; equation (12) multiplies the attention score a_{t,k} with the entity hidden representation e_k to obtain the entity context vector S_{t,k}, where t denotes the t-th time step and k the k-th entity;
Equation (13) uses the memory module attention weights ψ_{t,k} to weight and sum the entity context vectors S_{t,k}, obtaining the context vector q_t at the current time t, which serves as the input to the pointer-generation network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211216143.7A CN115577118B (en) | 2022-09-30 | 2022-09-30 | Text generation method based on mixed grouping ordering and dynamic entity memory planning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211216143.7A CN115577118B (en) | 2022-09-30 | 2022-09-30 | Text generation method based on mixed grouping ordering and dynamic entity memory planning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115577118A CN115577118A (en) | 2023-01-06 |
CN115577118B true CN115577118B (en) | 2023-05-30 |
Family
ID=84582422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211216143.7A Active CN115577118B (en) | 2022-09-30 | 2022-09-30 | Text generation method based on mixed grouping ordering and dynamic entity memory planning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115577118B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111078866A (en) * | 2019-12-30 | 2020-04-28 | 华南理工大学 | Chinese text abstract generation method based on sequence-to-sequence model |
US11010666B1 (en) * | 2017-10-24 | 2021-05-18 | Tunnel Technologies Inc. | Systems and methods for generation and use of tensor networks |
CN113360655A (en) * | 2021-06-25 | 2021-09-07 | 中国电子科技集团公司第二十八研究所 | Track point classification and text generation method based on sequence annotation |
CN113657115A (en) * | 2021-07-21 | 2021-11-16 | 内蒙古工业大学 | Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion |
CN114048350A (en) * | 2021-11-08 | 2022-02-15 | 湖南大学 | Text-video retrieval method based on fine-grained cross-modal alignment model |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10474709B2 (en) * | 2017-04-14 | 2019-11-12 | Salesforce.Com, Inc. | Deep reinforced model for abstractive summarization |
EP3598339B1 (en) * | 2018-07-19 | 2024-09-04 | Tata Consultancy Services Limited | Systems and methods for end-to-end handwritten text recognition using neural networks |
US11763100B2 (en) * | 2019-05-22 | 2023-09-19 | Royal Bank Of Canada | System and method for controllable machine text generation architecture |
CN110795556B (en) * | 2019-11-01 | 2023-04-18 | 中山大学 | Abstract generation method based on fine-grained plug-in decoding |
US11481418B2 (en) * | 2020-01-02 | 2022-10-25 | International Business Machines Corporation | Natural question generation via reinforcement learning based graph-to-sequence model |
-
2022
- 2022-09-30 CN CN202211216143.7A patent/CN115577118B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11010666B1 (en) * | 2017-10-24 | 2021-05-18 | Tunnel Technologies Inc. | Systems and methods for generation and use of tensor networks |
CN111078866A (en) * | 2019-12-30 | 2020-04-28 | 华南理工大学 | Chinese text abstract generation method based on sequence-to-sequence model |
CN113360655A (en) * | 2021-06-25 | 2021-09-07 | 中国电子科技集团公司第二十八研究所 | Track point classification and text generation method based on sequence annotation |
CN113657115A (en) * | 2021-07-21 | 2021-11-16 | 内蒙古工业大学 | Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion |
CN114048350A (en) * | 2021-11-08 | 2022-02-15 | 湖南大学 | Text-video retrieval method based on fine-grained cross-modal alignment model |
Non-Patent Citations (3)
Title |
---|
Research on key technologies of text emotion prediction with emotion enhancement and emotion fusion; Rong Huan; China Doctoral Dissertations Full-text Database, Information Science and Technology; full text *
A user-granularity personalized social text generation model; Gao Yongbing, Gao Juntian; Journal of Computer Applications; full text *
A coherence-reinforced text summarization model without ground-truth dependence; Ma Tinghuai; Journal of Frontiers of Computer Science and Technology; full text *
Also Published As
Publication number | Publication date |
---|---|
CN115577118A (en) | 2023-01-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111538848B (en) | Knowledge representation learning method integrating multi-source information | |
CN108415977B (en) | Deep neural network and reinforcement learning-based generative machine reading understanding method | |
CN109840322A (en) | It is a kind of based on intensified learning cloze test type reading understand analysis model and method | |
CN111966820B (en) | Method and system for constructing and extracting generative abstract model | |
CN111985205A (en) | Aspect level emotion classification model | |
CN113157919B (en) | Sentence text aspect-level emotion classification method and sentence text aspect-level emotion classification system | |
CN115510236A (en) | Chapter-level event detection method based on information fusion and data enhancement | |
CN117763363A (en) | Cross-network academic community resource recommendation method based on knowledge graph and prompt learning | |
CN112580370B (en) | Mongolian nerve machine translation method integrating semantic knowledge | |
CN115391563B (en) | Knowledge graph link prediction method based on multi-source heterogeneous data fusion | |
CN113641854B (en) | Method and system for converting text into video | |
CN111444328A (en) | Natural language automatic prediction inference method with interpretation generation | |
CN114817574A (en) | Generation type common sense reasoning method based on knowledge graph | |
CN114780725A (en) | Text classification algorithm based on deep clustering | |
CN115577118B (en) | Text generation method based on mixed grouping ordering and dynamic entity memory planning | |
CN118335190A (en) | Method and system for generating amino acid sequence of protein with specific functions and properties by using deep learning technology | |
CN118312833A (en) | Hierarchical multi-label classification method and system for travel resources | |
CN118136155A (en) | Drug target affinity prediction method based on multi-modal information fusion and interaction | |
CN117350378A (en) | Natural language understanding algorithm based on semantic matching and knowledge graph | |
CN112069777B (en) | Two-stage data-to-text generation method based on skeleton | |
CN116977509A (en) | Virtual object action generation method, device, computer equipment and storage medium | |
CN116340569A (en) | Semi-supervised short video classification method based on semantic consistency | |
CN113486180A (en) | Remote supervision relation extraction method and system based on relation hierarchy interaction | |
CN117951313B (en) | Document relation extraction method based on entity relation statistics association | |
CN118070754B (en) | Neural network text sequence generation method, terminal device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |