CN112488290B - Natural language multitask modeling and predicting method and system with dependency relationship - Google Patents
- Publication number
- CN112488290B (application CN202011129406.1A)
- Authority
- CN
- China
- Prior art keywords
- task
- label
- tasks
- layer
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/04—Architecture, e.g. interconnection topology › G06N3/045—Combinations of networks
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/08—Learning methods
Abstract
The invention provides a natural language multitask modeling and predicting method and system with dependency relationships. A hierarchical encoder encodes the embedded input words at different depths for tasks of different levels; a label embedding layer embeds the labels of different tasks into the same latent space; a label migrator migrates the embedded labels; a predictor predicts the probability distribution of each task from the encoding result and the migration result of that task; a Gumbel sampling layer performs Gumbel sampling according to the probability distribution predicted by each task and takes counterfactual values with a set probability, so that counterfactual inference can be performed: if a causal association exists between tasks, the causal effect is obtained and the multitask model is jointly optimized. Through this causal association, low-level tasks obtain feedback from high-level tasks, so that the optimized model predicts low-level tasks more accurately, which in turn improves the prediction accuracy of high-level tasks.
Description
Technical Field
The invention relates to a multitask learning technology in the technical field of natural language processing, and in particular to a natural language multitask modeling and predicting method and system with dependency relationships.
Background
With the development of machine learning, multitask learning has become an important learning method because it exploits knowledge shared among related tasks to improve learning performance. In recent years, several studies have proposed hierarchical multitask models for tasks with dependencies; these generally work better than flat multitask frameworks because they can exploit the potential dependencies between tasks. However, these hierarchical multitask models only stack the encoders of the neural network and ignore the strong logical associations between the prediction results, so the predictions of the individual tasks can be mutually inconsistent, which limits the application of machine-learning models in real scenarios; for example, in judicial judgment prediction based on adjudication documents, the predicted relevant law articles and the predicted charge may be inconsistent.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, the present invention provides a method and system for multi-task modeling and prediction of natural language with dependency relationship.
The invention is realized by the following technical scheme.
According to an aspect of the present invention, there is provided a natural language multitask result prediction method with dependency relationship, including:
S1: perform word embedding on an input text X of length n, converting it into a word embedding sequence E = {e_i}_{1≤i≤n}.
S2: for any task k, embedding and migrating the labels of the first k-1 tasks:
Define the label of task k as Y^k and its embedding as LE^k. The label embedding is obtained by processing the label with a fully connected neural network:

LE^k = W^k · Y^k

where W^k is the parameter matrix of the fully connected neural network for task k;

The label embedding LE^k of each task is migrated through the label migrator, and the migration result TH^k is obtained as:

TH^k = LSTM(LE^k, TH^{k-1})

where the unidirectional LSTM carries the label information of the lower-layer tasks forward, so that TH^{k-1} aggregates the labels of the first k-1 tasks.
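By way of illustration, a minimal PyTorch sketch of this step follows; the class name, the dimensions, and the exact recurrence TH^k = LSTM(LE^k, TH^{k-1}) are assumptions of ours, since the text specifies only a fully connected embedding per task and a unidirectional LSTM migrator.

```python
import torch
import torch.nn as nn

class LabelMigrator(nn.Module):
    """Sketch of step S2: embed each task's label with a fully connected
    layer (the parameter matrix W^k) and migrate the embeddings across
    tasks with a unidirectional LSTM cell, yielding TH^1 ... TH^K."""

    def __init__(self, classes_per_task, embed_dim, hidden_dim):
        super().__init__()
        self.embedders = nn.ModuleList(
            [nn.Linear(c, embed_dim, bias=False) for c in classes_per_task]
        )
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)

    def forward(self, labels):
        # labels[k]: one-hot (or Gumbel-sampled soft) label Y^k, shape (B, C_k)
        b = labels[0].size(0)
        h = labels[0].new_zeros(b, self.cell.hidden_size)  # TH^0 = 0
        c = labels[0].new_zeros(b, self.cell.hidden_size)
        migrated = []
        for embedder, y in zip(self.embedders, labels):
            le = embedder(y)              # LE^k = W^k * Y^k
            h, c = self.cell(le, (h, c))  # TH^k from LE^k and TH^{k-1}
            migrated.append(h)
        return migrated                   # [TH^1, ..., TH^K]
```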
S3: process the word embedding sequence, the migrated label embedding, and the encoding of task k-1 to obtain the encoding H^k of task k:

H^k = Encoder^(k)(E, TH^{k-1}, H^{k-1})

where Encoder^(k) is the encoder of task k;
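For concreteness, a sketch of one level of the hierarchical encoder, using the bidirectional LSTM named among the preferred encoder choices below; concatenating E, H^{k-1}, and a broadcast TH^{k-1} along the feature dimension is an assumed way of combining the three inputs, which the text leaves open.

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Sketch of step S3: encode task k from the word embeddings E, the
    lower task's encoding H^{k-1}, and the migrated labels TH^{k-1}."""

    def __init__(self, embed_dim, lower_dim, label_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.LSTM(embed_dim + lower_dim + label_dim, hidden_dim,
                           batch_first=True, bidirectional=True)

    def forward(self, e, h_lower, th_lower):
        # e: (B, n, embed_dim); h_lower: (B, n, lower_dim); th_lower: (B, label_dim)
        th = th_lower.unsqueeze(1).expand(-1, e.size(1), -1)  # broadcast over words
        h_k, _ = self.rnn(torch.cat([e, h_lower, th], dim=-1))
        return h_k  # H^k, shape (B, n, 2 * hidden_dim)
```

For the lowest task, zero tensors can stand in for h_lower and th_lower.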
S4: predict from the encoding of task k to obtain the output of task k:

O^k = Predictor^(k)(H^k, TH^{k-1})

where Predictor^(k), the predictor of task k, consists of a single-layer or multi-layer fully connected neural network; a softmax function then converts O^k to generate the probability distribution of the prediction result of task k:

P^k = softmax(O^k)

The category corresponding to the maximum probability in the distribution P^k is the prediction result of task k;
S5: perform counterfactual valuation on the prediction result of task k obtained in S4 using Gumbel sampling:

Ŷ^k = softmax((log P^k + g) / τ)

where Ŷ^k denotes the result of sampling according to the probability distribution P^k, g is a vector of values sampled from the Gumbel(0,1) distribution, and τ is the temperature parameter of the softmax function; as τ approaches 0, Ŷ^k approaches a value sampled according to the distribution P^k and converted into a one-hot vector;
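The sampling above is the standard Gumbel-softmax trick; a minimal sketch follows (PyTorch also ships torch.nn.functional.gumbel_softmax, which takes logits rather than probabilities and could be used instead).

```python
import torch

def gumbel_sample(probs, tau=1.0, eps=1e-10):
    """Step S5: differentiable sampling from a predicted distribution P^k.
    probs: (..., C) softmax output; returns a soft one-hot sample of the
    same shape that approaches a hard one-hot vector as tau -> 0."""
    g = -torch.log(-torch.log(torch.rand_like(probs) + eps) + eps)  # Gumbel(0,1)
    return torch.softmax((torch.log(probs + eps) + g) / tau, dim=-1)
```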
S6: the obtained sampled value Ŷ^k replaces the task label used in step S2, and steps S2 to S4 are re-executed to obtain the probability distribution of the task k prediction result;
S7: train and optimize the task k prediction result obtained in S6 using a loss function:

If task k is named entity recognition, the loss function is computed by cross entropy:

L^k = −Σ_{i=1}^{n} Σ_c y^k_{i,c} · log P^k_{i,c}

where y^k_{i,c} is the label of the i-th word for the c-th entity class; a value of 1 indicates that the word belongs to entity class c, and 0 indicates that it does not;

If task k is a text sequence classification task, the loss function is:

L^k = −Σ_c y^k_c · log P^k_c

where y^k_c is the label of the whole text sequence for the c-th class; a value of 1 indicates that the sequence belongs to class c, and 0 indicates that it does not;

Synthesizing the loss functions of the multiple tasks gives:

L = Σ_k λ^k · L^k

where L denotes the total loss function and λ^k the weight corresponding to task k;

Minimizing the total loss function realizes the training and optimization of the task k prediction result.
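A sketch of this joint objective follows; the argument layout and the 'ner' / 'classification' task tags are assumptions of ours.

```python
import torch
import torch.nn.functional as F

def total_loss(task_probs, task_targets, task_types, weights, eps=1e-10):
    """Step S7: weighted sum of per-task cross entropies, L = sum_k lambda^k L^k.
    task_probs[k]: softmax outputs, (B, n, C_k) for NER or (B, C_k) for
    sequence classification; task_targets[k]: gold class indices."""
    losses = []
    for p, t, kind, lam in zip(task_probs, task_targets, task_types, weights):
        logp = torch.log(p + eps)
        if kind == "ner":                        # per-word cross entropy
            ce = F.nll_loss(logp.flatten(0, 1), t.flatten())
        else:                                    # whole-sequence cross entropy
            ce = F.nll_loss(logp, t)
        losses.append(lam * ce)                  # lambda^k * L^k
    return torch.stack(losses).sum()
```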
Preferably, the encoder of task k comprises: a bidirectional long short-term memory network, a convolutional neural network, or an attention-based Transformer network.
Preferably, the predictor of task k comprises a fully connected neural network.
According to another aspect of the present invention, there is provided a method for constructing a natural language multitasking model with dependency relationship, including:
constructing a word embedding layer, wherein the word embedding layer is used for carrying out word embedding on an input text X;
constructing a hierarchical encoder, wherein the hierarchical encoder processes the embedded input words at different depths for tasks of different levels, so that low-level tasks obtain shallow coded representations and high-level tasks obtain deep coded representations;
constructing a label embedding layer, wherein the label embedding layer embeds the labels of different tasks into the same latent space;

constructing a label migrator, wherein the label migrator migrates the embedded labels, so that each task can utilize the label information of all its lower-layer tasks;

constructing a predictor, wherein the predictor predicts the probability distribution of each task from the encoding result output by that task's hierarchical encoder and the migration result output by the label migrator;
and constructing a Gumbel sampling layer, wherein the Gumbel sampling layer performs Gumbel sampling according to the probability distribution predicted by each task and takes counterfactual values with a set probability, thereby performing counterfactual inference; if a causal association exists between tasks, the causal effect is obtained, and the multitask model is jointly optimized.
Preferably, the method of constructing a hierarchical encoder comprises:
constructing an encoder of each task based on the deep neural network;
the encoders of the different tasks are stacked, so that the encoder of each task can utilize the encoder output of its lower-layer task, the original word embedding, and the migrated prediction-result information from the predictors of its lower-layer tasks.
Preferably, the label embedding layer adopts a fully connected neural network to embed the prediction result of each task into the same latent space.
Preferably, the label migrator performs label migration between tasks using a unidirectional long short-term memory network, so that each task can utilize the label information of all its lower-layer tasks.
Preferably, each task can utilize the label information of all its lower-layer tasks through two causal paths:

Indirect causal path: for task k, let its label be Y^k, the encoding result output by the encoder be H^k, and the migrated label information of all its lower-layer tasks be TH^{k-1}; the indirect causal path from the migrated label information to the label is TH^{k-1} → H^k → Y^k, i.e., the label information of the lower-layer tasks influences the encoding result of task k and thereby influences the prediction result of task k;

Direct causal path: for task k, with label Y^k and migrated label information TH^{k-1}, the direct causal path from the migrated label information to the label is TH^{k-1} → Y^k, i.e., the label information of the lower-layer tasks is directly input to the predictor, and this path is not affected by the input text.
Preferably, the method for joint optimization of the Gumbel sampling layer on the multitask model includes:
in a multi-task training stage, Gumbel sampling is carried out according to the prediction probability of each task, so that each task obtains a counterfactual value according to a set probability;
inputting the sampled prediction probability results into the encoder and the predictor; because Gumbel sampling keeps the whole model differentiable end to end, each task propagates a reverse gradient to its lower-layer tasks, penalizing the counterfactual values of the lower-layer tasks;

Gumbel sampling also enables the causal effects between tasks to be calculated from the observed data through counterfactual reasoning.
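Putting the pieces together, a schematic joint-optimization step under assumed interfaces: a model that maps the input text and a list of per-task labels to per-task probability distributions, plus the gumbel_sample and total_loss sketches above.

```python
def train_step(model, optimizer, text, gold_labels, targets, task_types, weights):
    """One joint-optimization step over all tasks (steps S1-S7)."""
    probs = model(text, gold_labels)       # S1-S4: forward pass with gold labels
    # S5: sample each task's label; with some probability the sample is a
    # counterfactual value. No detach: gradients must reach the lower tasks.
    sampled = [gumbel_sample(p, tau=1.0) for p in probs]
    probs_cf = model(text, sampled)        # S6: re-predict with sampled labels
    loss = total_loss(probs_cf, targets, task_types, weights)  # S7
    optimizer.zero_grad()
    loss.backward()  # higher tasks penalize lower tasks' counterfactual values
    optimizer.step()
    return loss.item()
```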
According to a third aspect of the invention, a natural language multitask result prediction system with dependency relationships is provided, comprising a deep-neural-network-based hierarchical encoder module, an inter-task prediction result embedding and migration module, and a Gumbel-sampling-based causal inference joint optimization module; wherein:

the deep-neural-network-based hierarchical encoder module uses deep-neural-network encoders to process the embedded input words at different depths for tasks of different levels, so that low-level tasks obtain shallow coded representations and high-level tasks obtain deep coded representations;

the inter-task prediction result embedding and migration module embeds the labels of different tasks into the same latent space and then migrates the embedded labels with a unidirectional long short-term memory network, so that each task utilizes the label information of all its lower-layer tasks; it obtains the probability-distribution prediction result of each task from that task's encoding result output by the hierarchical encoder module and the migrated label embeddings;

the Gumbel-sampling-based causal inference joint optimization module performs Gumbel sampling on the probability distribution predicted by each task and takes counterfactual values with a set probability, thereby performing counterfactual inference; if a causal association exists between tasks, the causal effect is predicted from the association information, and the hierarchical encoder module and the inter-task prediction result embedding and migration module are jointly optimized.
According to a fourth aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform any of the methods described above.
According to a fifth aspect of the invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the method of any one of the above.
Due to the adoption of the technical scheme, the invention has the following beneficial effects:
The natural language multitask modeling and predicting method and system with dependency relationships make better use of the causal associations between tasks on the basis of causal inference, innovatively introducing a label embedding migrator and counterfactual-inference-based Gumbel sampling on top of a hierarchical multitask model. On the one hand, the label embedding migrator embeds the output results of low-level tasks and provides them to high-level tasks, so that the prediction of a high-level task can utilize the information of its lower-level tasks and the dependency relationships between tasks are modeled better. On the other hand, Gumbel sampling based on counterfactual inference can correctly estimate the causal associations between tasks and lets high-level tasks provide feedback to low-level tasks, and the joint optimization reduces the error accumulation between tasks.

Compared with traditional models, the natural language multitask modeling and predicting method and system with dependency relationships utilize the causal associations between tasks, which improves accuracy and robustness and yields more reasonable predictions on new data.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of the causal hypothesis employed by the present invention.
Figure 2 is a multitasking causal topology presented in a preferred embodiment of the present invention.
FIG. 3 is a diagram of a multitasking model architecture presented in a preferred embodiment of the present invention.
Detailed Description
The following examples illustrate the invention in detail. The examples are implemented on the premise of the technical scheme of the invention, and detailed implementations and specific operation processes are given. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of the present invention.
The embodiments of the invention provide a natural language multitask modeling and predicting method and system with dependency relationships, aiming to model the logical dependency relationships between multiple tasks based on causal inference and to exploit these dependencies to promote multitask learning, thereby improving the accuracy and consistency of the multitask prediction results.
For the situation where logical dependencies exist between tasks, the embodiments of the invention provide a natural language multitask result prediction method with dependency relationships that exploits the dependency relationships between the tasks and improves the consistency and accuracy of the prediction results through counterfactual reasoning in causal inference.
The method provided by the embodiment comprises the following steps:
performing joint learning on a plurality of natural language processing tasks having dependency relationships: for example, joint learning of the two tasks of named entity recognition and relation extraction, where the relation extraction task depends on the named entity recognition task; for another example, judgment prediction for adjudication documents under a civil-law system comprises three tasks, namely relevant law article prediction, charge prediction, and prison term prediction, where the charge prediction depends on the relevant law article prediction and the prison term prediction depends on the first two tasks. Specifically, the multitask learning method can be divided into the following steps:
step 1: word embedding is carried out on an input text X with the length of n, and the input text X is converted into a word embedding sequence E ═ { E }i}1≤i≤n。
Step 2: take the real result of each task as its label, and for any task k, embed and migrate the labels of the first k-1 tasks:

Define the label of task k as Y^k and its embedding as LE^k. The label embedding is obtained by processing the label with a fully connected neural network:

LE^k = W^k · Y^k

where W^k is the parameter matrix of the fully connected neural network for task k;

The label embedding LE^k of each task is migrated through the label migrator, and the migration result TH^k is obtained as:

TH^k = LSTM(LE^k, TH^{k-1})
Step 3: process the word embedding sequence, the migrated label embedding, and the encoding of task k-1 to obtain the encoding H^k of task k:

H^k = Encoder^(k)(E, TH^{k-1}, H^{k-1})

where Encoder^(k) is the encoder of task k;
Step 4: predict from the encoding of task k to obtain the output of task k:

O^k = Predictor^(k)(H^k, TH^{k-1})

where Predictor^(k), the predictor of task k, consists of a single-layer or multi-layer fully connected neural network; a softmax function then converts O^k to generate the probability distribution of the prediction result of task k:

P^k = softmax(O^k)

The category corresponding to the maximum probability in the distribution P^k is the prediction result of task k;
Step 5: because the prediction result in step 4 takes into account the dependency of task k on its first k-1 tasks, accumulated errors between tasks, i.e., errors in the prediction results of the first k-1 tasks, may affect the prediction result of task k. To reduce the accumulated error and improve the consistency of the prediction results of the multiple tasks, this embodiment uses Gumbel sampling to take counterfactual values of the prediction result of task k obtained in step 4:

Ŷ^k = softmax((log P^k + g) / τ)

where Ŷ^k denotes the result of sampling according to the probability distribution P^k, g is a vector of values sampled from the Gumbel(0,1) distribution, and τ is the temperature parameter of the softmax function; as τ approaches 0, Ŷ^k approaches a value sampled according to the distribution P^k and converted into a one-hot vector;
Step 6: the obtained sampled value Ŷ^k replaces the real label used in step 2, and steps 2 to 4 are re-executed to obtain the probability distribution of the task k prediction result;
Step 7: train and optimize the task k prediction result obtained in step 6 using a loss function:

If task k is named entity recognition, the loss function is computed by cross entropy:

L^k = −Σ_{i=1}^{n} Σ_c y^k_{i,c} · log P^k_{i,c}

where y^k_{i,c} is the label of the i-th word for the c-th entity class; a value of 1 indicates that the word belongs to entity class c, and 0 indicates that it does not;

If task k is a text sequence classification task, the loss function is:

L^k = −Σ_c y^k_c · log P^k_c

where y^k_c is the label of the whole text sequence for the c-th class; a value of 1 indicates that the sequence belongs to class c, and 0 indicates that it does not;

Synthesizing the loss functions of the multiple tasks gives:

L = Σ_k λ^k · L^k

where L denotes the total loss function of the learning method and λ^k the weight corresponding to task k;

The process of training and optimization is the process of minimizing the total loss function. Because the prediction of task k incorporates the counterfactual values of the first k-1 tasks, the first k-1 tasks obtain a back-propagated gradient from task k; if a dependency relationship exists between the tasks, task k provides feedback that penalizes wrong prediction results of the first k-1 tasks, thereby realizing the training and optimization of the task k prediction result.
As a preferred embodiment, the encoder of task k includes: a bidirectional long short-term memory network, a convolutional neural network, or an attention-based Transformer network.
As a preferred embodiment, the predictor of task k comprises a fully connected neural network.
Another embodiment of the present invention provides a method for constructing a natural language multitask model with dependency relationship, including:
constructing a word embedding layer, wherein the word embedding layer is used for carrying out word embedding on an input text X;
constructing a hierarchical encoder, wherein the hierarchical encoder processes the embedded input words at different depths for tasks of different levels, so that low-level tasks obtain shallow coded representations and high-level tasks obtain deep coded representations; for example, in the joint learning of named entity recognition and relation extraction, named entity recognition is the low-level task and relation extraction is the high-level task, and the semantic information required by relation extraction is richer, so a deeper coded representation is required;
constructing a label embedding layer, wherein the label embedding layer embeds the labels of different tasks into the same latent space;

constructing a label migrator, wherein the label migrator migrates the embedded labels, so that each task can utilize the label information of all its lower-layer tasks;

constructing a predictor, wherein the predictor predicts the probability distribution of each task from the encoding result output by that task's hierarchical encoder and the migration result output by the label migrator;

and constructing a Gumbel sampling layer, wherein the Gumbel sampling layer performs Gumbel sampling according to the probability distribution predicted by each task and takes counterfactual values with a set probability, thereby performing counterfactual inference; if a causal association exists between tasks, the causal effect of the tasks is obtained, and the multitask model is jointly optimized.
As a preferred embodiment, a method of constructing a hierarchical encoder includes:
constructing an encoder of each task based on the deep neural network;
the encoders of the different tasks are stacked, so that the encoder of each task can utilize the encoder output of its lower-layer task, the original word embedding, and the migrated prediction-result information from the predictors of its lower-layer tasks.
As a preferred embodiment, the label embedding layer adopts a fully connected neural network to embed the prediction result of each task into the same latent space.
As a preferred embodiment, the label migrator performs label migration between tasks using a unidirectional long short-term memory network, so that each task can utilize the label information of all its lower-layer tasks.
As a preferred embodiment, each task can utilize the label information of all its lower-layer tasks through two causal paths:

Indirect causal path: for task k, let its label be Y^k, the encoding result output by the encoder be H^k, and the migrated label information of all its lower-layer tasks be TH^{k-1}; the indirect causal path from the migrated label information to the label is TH^{k-1} → H^k → Y^k, i.e., the label information of the lower-layer tasks influences the encoding result of task k and thereby influences the prediction result of task k;

Direct causal path: for task k, with label Y^k and migrated label information TH^{k-1}, the direct causal path from the migrated label information to the label is TH^{k-1} → Y^k, i.e., the label information of the lower-layer tasks is directly input to the predictor, and this path is not affected by the input text.
As a preferred embodiment, the method for Gumbel sampling layer to perform joint optimization on the multitask model comprises the following steps:
in a multi-task training stage, Gumbel sampling is carried out according to the prediction probability of each task, so that each task obtains a counterfactual value according to a set probability;
inputting the sampled prediction probability results into the encoder and the predictor; because Gumbel sampling keeps the whole model differentiable end to end, each task propagates a reverse gradient to its lower-layer tasks, penalizing the counterfactual values of the lower-layer tasks (i.e., penalizing wrong prediction results of the lower-layer tasks);

Gumbel sampling also enables the causal effects between tasks to be calculated from the observed data through counterfactual reasoning.
The multitask model constructed in the present embodiment may be used to execute the prediction method provided in the above embodiment and to construct the prediction system provided in the following embodiment.
The third embodiment of the invention provides a natural language multitask result prediction system with dependency relationships, comprising a deep-neural-network-based hierarchical encoder module, an inter-task prediction result embedding and migration module, and a Gumbel-sampling-based causal inference joint optimization module; wherein:

the deep-neural-network-based hierarchical encoder module uses deep-neural-network encoders to process the embedded input words at different depths for tasks of different levels, so that low-level tasks obtain shallow coded representations and high-level tasks obtain deep coded representations;

the inter-task prediction result embedding and migration module embeds the labels of different tasks into the same latent space and then migrates the embedded labels with a unidirectional long short-term memory network, so that each task utilizes the label information of all its lower-layer tasks; it obtains the probability-distribution prediction result of each task from that task's encoding result output by the hierarchical encoder module and the migrated label embeddings;

the Gumbel-sampling-based causal inference joint optimization module performs Gumbel sampling on the probability distribution predicted by each task and takes counterfactual values with a set probability, thereby performing counterfactual inference; if a causal association exists between tasks, the causal effect is predicted from the association information, and the deep-neural-network-based hierarchical encoder module and the inter-task prediction result embedding and migration module are jointly optimized.
The technical solutions provided by the above-mentioned embodiments of the present invention, the technical principles involved and the technical effects achieved are further described in detail below with reference to the accompanying drawings.
First, the embodiments of the present invention view the multitask model from the standpoint of causal inference. For two tasks m and n with output variables Y^m and Y^n, conventional multitask learning is based on the confounding causal hypothesis shown in fig. 1(a): the feature representation H extracted from the input sample by the encoder acts as a confounder, and the outputs of the tasks are determined by this shared variable. Under this hypothesis, the variables Y^m and Y^n are correlated but have no causal relationship and are conditionally independent of each other, which means the hypothesis cannot model the dependency between Y^m and Y^n. For multitasking with dependencies, the present invention is instead based on the indirect causal hypothesis shown in fig. 1(b): for the two output variables Y^m and Y^n, Y^m influences the representation H and thereby indirectly influences Y^n. The multitask likelihood function under the confounding causal hypothesis is:

P(Y^m, Y^n | X) = P(Y^m | X) · P(Y^n | X)

The multitask likelihood function under the indirect causal hypothesis is:

P(Y^m, Y^n | X) = P(Y^m | X) · P(Y^n | X, Y^m)

As the likelihood functions show, the indirect causal hypothesis is better able to model the dependency between Y^m and Y^n. Specifically, the embodiments of the present invention adopt the indirect causal hypothesis, under which two causal paths exist: the indirect causal path Y^m → H → Y^n and the direct causal path Y^m → Y^n.
The embodiments of the present invention extend the indirect causal hypothesis to scenarios with more than two tasks; the resulting causal-effect topology is shown in fig. 2, and indirect and direct causal paths likewise exist between labels. For task k, H^k is the output of the encoder, TH^{k-1} is the migration result of the lower-layer task labels, and Y^k is the label; the indirect causal path from the migrated label information to the label is TH^{k-1} → H^k → Y^k, and the direct causal path is TH^{k-1} → Y^k.
Based on the proposed multitask causal topology, the embodiment of the invention provides a method for constructing a causal multitask model; the resulting model architecture is shown in fig. 3. On the basis of a hierarchical multitask model, inter-task prediction result embedding and migration and Gumbel-sampling-based causal inference joint optimization are adopted. As shown in fig. 3, the architecture comprises a word embedding layer, a hierarchical encoder, a predictor, a Gumbel sampling layer, a label embedding layer, and a label migrator.
Based on the multitask model architecture provided by the embodiments of the invention, a multitask result prediction method is also implemented, comprising the following steps:
step 1: word embedding is carried out on an input text X with the length of n, and the input text X is converted into a word embedding sequence E ═ { E }i}1≤i≤n。
Step 2: take the real result of each task as its label, and for any task k, embed and migrate the labels of the first k-1 tasks:

Define the label of task k as Y^k and its embedding as LE^k. The label embedding is obtained by processing the label with a fully connected neural network:

LE^k = W^k · Y^k

where W^k is the parameter matrix of the fully connected neural network for task k;

The label embedding LE^k of each task is migrated through the label migrator, and the migration result TH^k is obtained as:

TH^k = LSTM(LE^k, TH^{k-1})
Step 3: process the word embedding sequence, the migrated label embedding, and the encoding of task k-1 to obtain the encoding H^k of task k:

H^k = Encoder^(k)(E, TH^{k-1}, H^{k-1})

where Encoder^(k) is the encoder of task k;
Step 4: predict from the encoding of task k to obtain the output of task k:

O^k = Predictor^(k)(H^k, TH^{k-1})

where Predictor^(k), the predictor of task k, consists of a single-layer or multi-layer fully connected neural network; a softmax function then converts O^k to generate the probability distribution of the prediction result of task k:

P^k = softmax(O^k)

The category corresponding to the maximum probability in the distribution P^k is the prediction result of task k;
Step 5: because the prediction result in step 4 takes into account the dependency of task k on its first k-1 tasks, accumulated errors between tasks, i.e., errors in the prediction results of the first k-1 tasks, may affect the prediction result of task k. To reduce the accumulated error and improve the consistency of the prediction results of the multiple tasks, the method uses Gumbel sampling to take counterfactual values:

Ŷ^k = softmax((log P^k + g) / τ)

where Ŷ^k denotes the result of sampling according to the probability distribution P^k, g is a vector of values sampled from the Gumbel(0,1) distribution, and τ is the temperature parameter of the softmax function; as τ approaches 0, Ŷ^k approaches a value sampled according to the distribution P^k and converted into a one-hot vector;
Step 6: the obtained sampled value Ŷ^k replaces the real label used in step 2, and steps 2 to 4 are recomputed to obtain the probability distribution of the task k prediction result;
Step 7: train and optimize the result of step 6:

If task k is named entity recognition, its loss function is computed by cross entropy:

L^k = −Σ_{i=1}^{n} Σ_c y^k_{i,c} · log P^k_{i,c}

where y^k_{i,c} is the label of the i-th word for the c-th entity class; a value of 1 indicates that the word belongs to entity class c, and 0 indicates that it does not. If task k is a text sequence classification task, the loss function is:

L^k = −Σ_c y^k_c · log P^k_c

where y^k_c is the label of the whole text sequence for the c-th class; a value of 1 indicates that the sequence belongs to class c, and 0 indicates that it does not. Synthesizing the loss functions of the multiple tasks gives:

L = Σ_k λ^k · L^k

where L denotes the total loss function of the learning method and λ^k the weight corresponding to task k. The process of training and optimization is the process of minimizing the total loss function. Because the prediction of task k incorporates the counterfactual values of the first k-1 tasks, the first k-1 tasks obtain a back-propagated gradient from task k, and if a dependency relationship exists between the tasks, task k provides feedback that penalizes wrong prediction results of the first k-1 tasks.
The technical solution proposed by the above embodiments of the present invention is further described in detail below through a specific example. The described embodiment is only a part of the embodiments of the present invention, is used for illustration only, and should not be construed as limiting the patent. All other embodiments obtained by a person skilled in the art without creative effort shall fall within the protection scope of the present invention.
Example 1
Suppose the artificial-intelligence judicial judgment prediction task includes three subtasks: predicting the relevant law articles, the charge, and the prison term from the ascertained facts described in the judgment document. In a civil-law system (such as the judicial system of China), there are logical dependencies between these three subtasks: the charge and the prison term depend on the relevant law articles, and the prison term also depends on the charge, so the logical order of the three subtasks is: relevant law article prediction → charge prediction → prison term prediction. Consider the ascertained facts in the following judgment document:
the people inspection institute in the third city area of Dongguan city is named as guidance, 7 month, 17 day and 4 hours in 2014, a person being defended is Jiang a party with others to a place where a number 35 Wake's clock is located in Dongguan county east approach of Huangjiang town county, village, 35, and the person being defended is forced to bring the clock to a 317 house of Changan town Xinan two-way Dean business hotel in Dongguan city. During the period, a jiang et al snatches 5000 yuan cash and a bank card in a certain wallet of the clock, forces the certain wallet to inform a bank card password, and then gets 14900 yuan from the certain bank card of the later jiang et al and consumes 515 yuan by the certain bank card of the later jiang et al. After that, Jiang someone and so on send a certain time to Huangjiang town to carry the escape … ….
Prediction results: the multitask model constructed by the model construction method provided by the embodiments of the invention, together with its prediction method, successfully predicts that the relevant law article is Article 238, the charge is illegal detention, and the prison term is 2 to 3 years. Article 238 stipulates that whoever unlawfully detains another person or unlawfully deprives another person of personal freedom by other means shall be sentenced to fixed-term imprisonment of not more than three years, criminal detention, public surveillance, or deprivation of political rights, with a heavier punishment where beating or humiliation is involved. The predicted charge and prison term conform to the provisions of the relevant law article.
Analysis: removing the model's Gumbel sampling and label migrator yields a traditional hierarchical multitask model, whose prediction is that the relevant law article is Article 238 but the charge is kidnapping, i.e., the predicted charge is inconsistent with the predicted law article; this shows that merely stacking encoders is insufficient for modeling the dependencies between tasks. After Gumbel sampling and the label migrator are introduced, the model correctly estimates the causal associations between tasks from the standpoint of causal inference and uses them to improve the accuracy and consistency of the multitask prediction results.
Obviously, the above is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and adjustments can be made according to the actual needs without departing from the principle of the present invention, and these modifications and adjustments should also be regarded as the protection scope of the present invention.
A fourth embodiment of the present invention provides a terminal, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, is configured to perform the modeling method or the prediction method according to any one of the above embodiments of the present invention.
Optionally, a memory is provided for storing a program. The memory may comprise volatile memory, such as random-access memory (RAM), for example static RAM (SRAM) or double data rate synchronous dynamic RAM (DDR SDRAM); the memory may also comprise non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., applications or functional modules implementing the above methods), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner. The above computer programs, computer instructions, data, and the like may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description relating to the preceding method embodiment.
The processor and the memory may be separate structures or may be integrated into one structure. When the processor and the memory are separate structures, the memory and the processor may be coupled by a bus.
A fifth embodiment of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the modeling method or the prediction method of any of the above embodiments.
The natural language multitask modeling and predicting method and system with dependency relationships provided by the embodiments of the invention realize a causal-inference-based natural language multitask model construction method, result prediction method, and result prediction system for the problem of jointly optimizing multiple tasks with dependencies. Wherein:

Hierarchical encoder based on deep neural networks: for tasks with dependency relationships, layered encoders obtain better performance than a flat encoder; the encoder of a complex task is placed at an upper layer, where it can obtain semantically rich representations, while the encoder of a simple task is placed at a lower layer and only needs to process the input shallowly.

Inter-task prediction result embedding and migration: the logical dependencies between the tasks allow the multitask problem to be modeled as serialized label generation, so the labels are embedded into a vector space of fixed dimensionality and then migrated between tasks by a recurrent neural network, so that high-level tasks utilize the prediction information of low-level tasks.

Gumbel-sampling-based causal inference joint optimization: the prediction result migration mechanism exploits the logical associations between tasks but also introduces error accumulation; the embodiments of the invention therefore design a Gumbel-sampling-based causal inference joint optimization method whose core idea is counterfactual reasoning in causal inference, namely observing whether the output of a high-level task changes when the label of a low-level task takes a counterfactual value, and judging whether a causal association exists according to the change. Through this causal association, low-level tasks obtain feedback from high-level tasks, so that the optimized model predicts low-level tasks more accurately and the prediction accuracy of high-level tasks is improved.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.
Claims (10)
1. A method for predicting a natural language multitask result with dependency relationship is characterized by comprising the following steps:
S1: perform word embedding on an input text X of length n, converting it into a word embedding sequence E = {e_i}_{1≤i≤n};
S2: taking the real result of the task as a label, and embedding and migrating the labels of the first k-1 tasks for any task k:
Define the label of task k as Y^k and its embedding as LE^k. The label embedding is obtained by processing the label with a fully connected neural network:

LE^k = W^k · Y^k

where W^k is the parameter matrix of the fully connected neural network for task k;

The label embedding LE^k of each task is migrated through the label migrator, and the migration result TH^k is obtained as:

TH^k = LSTM(LE^k, TH^{k-1})
S3: process the word embedding sequence, the migrated label embedding, and the encoding of task k-1 to obtain the encoding H^k of task k:

H^k = Encoder^(k)(E, TH^{k-1}, H^{k-1})

where Encoder^(k) is the encoder of task k;
S4: predict from the encoding of task k to obtain the output of task k:

O^k = Predictor^(k)(H^k, TH^{k-1})

where Predictor^(k), the predictor of task k, consists of a single-layer or multi-layer fully connected neural network; a softmax function then converts O^k to generate the probability distribution of the prediction result of task k:

P^k = softmax(O^k)

The category corresponding to the maximum probability in the distribution P^k is the prediction result of task k;
S5: perform counterfactual valuation on the prediction result of task k obtained in S4 using Gumbel sampling:

Ŷ^k = softmax((log P^k + g) / τ)

where Ŷ^k denotes the result of sampling according to the probability distribution P^k, g is a vector of values sampled from the Gumbel(0,1) distribution, and τ is the temperature parameter of the softmax function; as τ approaches 0, Ŷ^k approaches a value sampled according to the distribution P^k and converted into a one-hot vector;
S6: the obtained sampled value Ŷ^k replaces the task label used in step S2, and steps S2 to S4 are re-executed to obtain the probability distribution of the task k prediction result;
S7: train and optimize the task k prediction result obtained in S6 using a loss function:

If task k is named entity recognition, the loss function is computed by cross entropy:

L^k = −Σ_{i=1}^{n} Σ_c y^k_{i,c} · log P^k_{i,c}

where y^k_{i,c} is the label of the i-th word for the c-th entity class; a value of 1 indicates that the word belongs to entity class c, and 0 indicates that it does not;

If task k is a text sequence classification task, the loss function is:

L^k = −Σ_c y^k_c · log P^k_c

where y^k_c is the label of the whole text sequence for the c-th class; a value of 1 indicates that the sequence belongs to class c, and 0 indicates that it does not;

Synthesizing the loss functions of the multiple tasks gives:

L = Σ_k λ^k · L^k

where L denotes the total loss function and λ^k the weight corresponding to task k;

Minimizing the total loss function realizes the training and optimization of the task k prediction result;
each task k can utilize the label information of all its lower-layer tasks, wherein:

for task k, let its label be Y^k, the encoding result be H^k, and the migrated label information of all its lower-layer tasks be TH^{k-1}; the indirect causal path from the migrated label information to the label is TH^{k-1} → H^k → Y^k, i.e., the label information of the lower-layer tasks influences the encoding result of task k and thereby influences the prediction result of task k;

for task k, with label Y^k and migrated label information TH^{k-1}, the direct causal path from the migrated label information to the label is TH^{k-1} → Y^k, i.e., the label information of the lower-layer tasks is directly input to the predictor, and this path is not affected by the input text.
2. The method of claim 1, wherein the encoder of task k comprises: a bidirectional long short-term memory network, a convolutional neural network, or an attention-based Transformer network.
3. The method of claim 1, wherein the predictor of task k comprises a fully connected neural network.
4. A method for constructing a natural language multitask model with dependency relationship is characterized by comprising the following steps:
constructing a word embedding layer, wherein the word embedding layer is used for carrying out word embedding on an input text X;
constructing a hierarchical encoder, wherein the hierarchical encoder is used for embedding input words and processing different levels of tasks, and the task encoders are sequentially stacked, so that a low-level task obtains shallow encoding representation and a high-level task obtains deep encoding representation;
constructing a label embedding layer, wherein the label embedding layer embeds the labels of different tasks into the same latent space, the real result of each task serving as its label;

constructing a label migrator, wherein the label migrator migrates the embedded labels, so that each task can utilize the label information of all its lower-layer tasks;

constructing a predictor, wherein the predictor predicts the probability distribution of each task from the encoding result output by that task's hierarchical encoder and the migration result output by the label migrator;
constructing a Gumbel sampling layer, wherein Gumbel sampling is carried out on the Gumbel sampling layer according to the probability distribution predicted by each task, counterfactual value taking is carried out according to set probability, so that counterfactual inference is carried out, if causal association exists among the tasks, the causal effect is obtained, and a multi-task model is subjected to joint optimization;
the label migrator performs label migration between tasks using a unidirectional long short-term memory network, so that each task can utilize the label information of all its lower-layer tasks; wherein:
indirect causal path: for task k, let its label be Y^k, the encoding result output by the encoder be H^k, and the migrated label information of all its lower-layer tasks be TH^{k-1}; the indirect causal path from the migrated label information to the label is TH^{k-1} → H^k → Y^k, i.e., the label information of the lower-layer tasks influences the encoding result of task k and thereby influences the prediction result of task k;

direct causal path: for task k, with label Y^k and migrated label information TH^{k-1}, the direct causal path from the migrated label information to the label is TH^{k-1} → Y^k, i.e., the label information of the lower-layer tasks is directly input to the predictor, and this path is not affected by the input text.
5. The method of constructing a natural language multitask model with dependency relationships according to claim 4, wherein the method of constructing the hierarchical encoder includes:
constructing an encoder of each task based on the deep neural network;
the encoders of the different tasks are stacked, so that the encoder of each task can utilize the encoder output of its lower-layer task, the original word embedding, and the migrated prediction-result information from the predictors of its lower-layer tasks.
6. The method for constructing a natural language multitask model with dependency relationships according to claim 4, wherein the label embedding layer adopts a fully connected neural network to embed each task's sampled value, drawn according to the probability distribution of its prediction result, into the same latent space.
7. The method for constructing a natural language multitask model with dependency relationship according to claim 4, wherein said Gumbel sampling layer is a method for performing joint optimization on a multitask model, and comprises:
in a multi-task training stage, Gumbel sampling is carried out according to the prediction probability of each task, so that each task obtains a counterfactual value according to a set probability;
inputting the sampled prediction probability results into the encoder and the predictor; because Gumbel sampling keeps the whole model differentiable end to end, each task propagates a reverse gradient to its lower-layer tasks, penalizing the counterfactual values of the lower-layer tasks;
Gumbel sampling also enables the causal effects between tasks to be calculated from the training data through counterfactual reasoning.
8. A natural language multitask result prediction system with dependency relationships, characterized by comprising a deep-neural-network-based hierarchical encoder module, an inter-task prediction result embedding and migration module, and a Gumbel-sampling-based causal inference joint optimization module; wherein:
the deep-neural-network-based hierarchical encoder module uses deep-neural-network encoders to process the input word embeddings at different levels for tasks of different levels, so that low-level tasks obtain shallow encoded representations and high-level tasks obtain deep encoded representations;
the inter-task prediction result embedding and migration module takes the true result of each task as its label, embeds the labels of the different tasks into the same latent space, and migrates the embedded labels with a unidirectional long short-term memory network, so that each task uses the label information of all lower-level tasks; it then obtains the probability distribution predicted for each task from the encoding result output by the deep-neural-network-based hierarchical encoder module and the migrated label embeddings; wherein: for task k, with label Y_k, encoding result H_k, and migrated label information of all lower-level tasks TH_{k-1}, the indirect causal path from the migrated label information to the label is TH_{k-1} → H_k → Y_k, i.e. the label information of the lower-level tasks influences the encoding result of task k and thereby its prediction; the direct causal path from the migrated label information to the label is TH_{k-1} → Y_k, i.e. the label information of the lower-level tasks is fed directly into the predictor, and this path is not influenced by the input text;
the Gumbel-sampling-based causal inference joint optimization module performs Gumbel sampling on the probability distribution predicted by each task, takes counterfactual values with a set probability to carry out counterfactual reasoning, predicts the causal effect from the association information when causal associations exist between tasks, and jointly optimizes the deep-neural-network-based hierarchical encoder module and the inter-task prediction result embedding and migration module.
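A rough sketch of the migration module and the two causal paths, assuming PyTorch; LabelMigration and TaskPredictor are hypothetical names, and the per-token dimension is collapsed for brevity:

```python
import torch
import torch.nn as nn

class LabelMigration(nn.Module):
    """Migrate embedded labels along the task hierarchy with a
    unidirectional LSTM, producing TH_{k-1} for task k."""
    def __init__(self, label_emb_dim):
        super().__init__()
        self.lstm = nn.LSTM(label_emb_dim, label_emb_dim, batch_first=True)

    def forward(self, label_embs):
        # label_embs: (batch, num_lower_tasks, label_emb_dim), ordered from
        # the lowest task upward; the final hidden state aggregates the
        # label information of all lower-level tasks.
        _, (h, _) = self.lstm(label_embs)
        return h[-1]  # TH_{k-1}: (batch, label_emb_dim)

class TaskPredictor(nn.Module):
    """Predictor for task k: the encoding H_k carries the indirect path
    TH_{k-1} -> H_k -> Y_k, while feeding TH_{k-1} in directly realizes
    the direct path TH_{k-1} -> Y_k that bypasses the input text."""
    def __init__(self, hidden_dim, label_emb_dim, num_labels):
        super().__init__()
        self.out = nn.Linear(hidden_dim + label_emb_dim, num_labels)

    def forward(self, h_k, th_prev):
        return self.out(torch.cat([h_k, th_prev], dim=-1))
```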
9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the program when executed by the processor is operable to perform the method of any of claims 1-3 or 4-8.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1-3 or 4-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011129406.1A CN112488290B (en) | 2020-10-21 | 2020-10-21 | Natural language multitask modeling and predicting method and system with dependency relationship |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112488290A CN112488290A (en) | 2021-03-12 |
CN112488290B (en) | 2021-09-07
Family
ID=74927084
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011129406.1A (Active) | 2020-10-21 | 2020-10-21 | Natural language multitask modeling and predicting method and system with dependency relationship
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112488290B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113824725B (en) * | 2021-09-24 | 2023-04-07 | 中国人民解放军国防科技大学 | Network security monitoring and analysis method and system based on causal machine learning |
CN116958748B (en) * | 2023-07-28 | 2024-02-13 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Image detection method, device, equipment and medium for multi-task causal learning |
CN117151431B (en) * | 2023-10-30 | 2024-01-26 | 四川省致链数字科技有限公司 | Automatic distribution method and system for wooden furniture order tasks |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491514A (en) * | 2018-03-26 | 2018-09-04 | 清华大学 | Method and device for asking questions in a dialogue system, electronic device, and computer-readable medium |
CN109543199A (en) * | 2018-11-28 | 2019-03-29 | 腾讯科技(深圳)有限公司 | Text translation method and related apparatus |
CN109902296A (en) * | 2019-01-18 | 2019-06-18 | 华为技术有限公司 | Natural language processing method, training method and data processing equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11042796B2 (en) * | 2016-11-03 | 2021-06-22 | Salesforce.Com, Inc. | Training a joint many-task neural network model using successive regularization |
CN110188192B (en) * | 2019-04-16 | 2023-01-31 | 西安电子科技大学 | Multi-task network construction and multi-scale joint prediction method for charges and law articles |
CN110347839B (en) * | 2019-07-18 | 2021-07-16 | 湖南数定智能科技有限公司 | Text classification method based on generative multi-task learning model |
Similar Documents
Publication | Title
---|---
CN112488290B (en) | Natural language multitask modeling and predicting method and system with dependency relationship
CN109062901B (en) | Neural network training method and device and named entity recognition method and device
CN109840322A (en) | Cloze-style reading comprehension analysis model and method based on reinforcement learning
CN111160035A (en) | Text corpus processing method and device
CN107103363B (en) | Construction method of a software fault expert system based on LDA
CN110795657A (en) | Article pushing and model training method and device, storage medium and computer equipment
CN111428525A (en) | Implicit discourse relation identification method and system and readable storage medium
CN110532398A (en) | Automatic family-map construction method based on a multi-task joint neural network model
CN113761197B (en) | Application-form multi-label hierarchical classification method capable of utilizing expert knowledge
CN112508265A (en) | Time and activity multi-task prediction method and system for business process management
CN117291265B (en) | Knowledge graph construction method based on text big data
CN112990530B (en) | Regional population prediction method and device, electronic equipment and storage medium
CN112069825A (en) | Entity relation joint extraction method for alert record data
CN116932722A (en) | Medical visual question answering method and system based on cross-modal data fusion
CN116663540A (en) | Financial event extraction method based on small samples
CN116431813A (en) | Intelligent customer service question classification method and device, electronic equipment and storage medium
CN114881032A (en) | Hierarchical category named entity recognition model design method based on multi-task learning
CN111241392A (en) | Method, device, equipment and readable storage medium for determining popularity of article
CN116596581A (en) | ERP management system and method thereof
CN117422065A (en) | Natural language data processing system based on reinforcement learning algorithm
CN116757773A (en) | Clothing e-commerce sales management system and method thereof
CN116362242A (en) | Small-sample slot value extraction method, device, equipment and storage medium
CN113706347A (en) | Multitask model distillation method, system, medium and electronic terminal
CN116578671A (en) | Emotion-cause pair extraction method and device
CN116910190A (en) | Method, device and equipment for acquiring multi-task perception model and readable storage medium
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||