CN115114409B - Civil aviation unsafe event combined extraction method based on soft parameter sharing - Google Patents
Civil aviation unsafe event combined extraction method based on soft parameter sharing
- Publication number
- CN115114409B CN202210848785.2A
- Authority
- CN
- China
- Prior art keywords
- event
- layer
- network
- model
- tasks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a joint extraction method for civil aviation unsafe events based on soft parameter sharing. The method extracts event information, including event trigger words, event types, event arguments and argument roles, from unstructured unsafe-event texts in the civil aviation field and converts it into structured data. The invention constructs a joint extraction model based on soft parameter sharing that explicitly separates shared parameters from task-specific parameters and uses a two-layer gating network to strengthen the model's ability to extract and filter semantic knowledge. The model can thus learn suitable feature representations for both tasks at the same time, achieving more efficient information sharing and joint representation learning. This addresses the problem that, because the tasks differ, two tasks sharing the same encoding layer cannot both benefit, and it significantly improves the overall effect of event extraction.
Description
Technical Field
The invention belongs to the field of natural language processing, and in particular relates to a joint extraction method for civil aviation unsafe events based on soft parameter sharing.
Background
Event extraction is an important topic in information extraction research and underpins applications such as recommendation systems, intelligent question answering and knowledge graph construction. An event is a specific form of information: something that happens at a specific time and place, involves one or more participants, and can generally be described as a change of state. The main purpose of event extraction is to convert event information such as trigger words and arguments from unstructured text into a structured form. It comprises two subtasks, event identification and argument role classification, and its main elements are event trigger words, event types, event arguments and argument roles.
According to the 2020 summary of civil aviation development, China operated an average of 300,000 flights per month, and as the number of flights grows, so does the number of civil aviation unsafe events. At present, extracting and recording information for most unsafe events depends heavily on manual work and incurs high labor costs. Taking bird strikes as an example, event information such as the time and place of the strike, the flight number and the strike position must be filled into a form by hand from the event description text. This is time-consuming and laborious, and large volumes of event data cannot be extracted uniformly into structured data that could then be used to build event graphs for the civil aviation field. Such event graphs have proven valuable for monitoring risk events, enriching existing knowledge graphs and making historical events easier to manage and query. How to extract the growing number of civil aviation unsafe events efficiently and accurately is therefore an increasingly urgent problem.
In recent years, with the development of deep learning, most research on event extraction algorithms has been based on deep learning architectures, and methods based on CNN, RNN, LSTM, BERT and similar models have achieved good results on event extraction tasks. However, most of the data used in current research comes from general-domain events. Because manual labeling is expensive, public event extraction datasets are small, while the general domain is broad and its event types are numerous and varied. As a result, the performance that recent general-domain event extraction methods achieve on the common public dataset ACE 2005 does not meet the requirements of civil aviation unsafe event extraction. Event data in a restricted domain such as civil aviation unsafe events, by contrast, has a narrow range of text types, highly regular language and denser knowledge. Recent general-domain methods have therefore achieved considerable results on event extraction in specific domains such as financial and medical events, which supports the feasibility of event extraction in the civil aviation field.
Neural network models for deep-learning-based event extraction fall into two main types according to their extraction paradigm: pipeline models and joint models. A pipeline model first extracts trigger words to complete the event identification task and then extracts arguments to complete the argument role classification task. This approach has two drawbacks. The first is error propagation: errors made by the event identification task in the first stage cannot be corrected later and therefore affect the argument role classification task in the second stage. The second is a lack of information interaction: the two subtasks depend on each other, but the pipeline approach usually ignores the interaction between them, so the tasks cannot benefit from each other and the final extraction effect suffers. A joint model performs event identification and argument role classification at the same time. Most joint extraction models of recent years use hard parameter sharing from multi-task learning: the two tasks fully share the same underlying network, which is then connected to each task's own network layers. Joint extraction reduces the influence of error propagation, and because the two tasks are trained together and jointly update the parameters of the whole network, the information interaction between them is strengthened and the final task effect improves.
However, because the two event extraction subtasks differ, the fully shared network in a joint extraction model tends to favor the more complex task (argument role classification is more complex than event identification), which leads to the seesaw phenomenon in multi-task learning: the effect of one subtask improves while the effect of the other declines. Multi-task learning (Multi-Task Learning, MTL) means learning several tasks simultaneously with a single model and improving all of them by sharing information between tasks. Depending on how sharing is done, multi-task learning is generally divided into hard parameter sharing and soft parameter sharing. Unlike hard parameter sharing, a soft parameter sharing model does not fully share the same underlying network but instead encourages the parameters to be similar. While still ensuring parameter sharing between tasks, the shared network layers can better serve the downstream structures of both tasks, which improves several tasks at once and alleviates the seesaw phenomenon.
Disclosure of Invention
In view of the above, the invention aims to provide a joint extraction method for civil aviation unsafe events based on soft parameter sharing. It improves on general-domain joint event extraction methods, constructs an event extraction dataset for civil aviation unsafe events, reduces the negative effect that hard parameter sharing in traditional multi-task learning has on overall model performance, and ultimately extracts civil aviation unsafe events efficiently and accurately.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
A joint extraction method for civil aviation unsafe events based on soft parameter sharing comprises the following steps:
S1: Collect unsafe event text data in the civil aviation field and preprocess it to construct an event extraction dataset. The dataset is stored as data files in JSON format and is divided into a training set, a validation set and a test set in an 8:1:1 ratio.
S2: Process each JSON-format event record in the divided dataset to obtain the input format and corresponding label data required by the model.
S3: Build a joint extraction model for civil aviation unsafe events based on soft parameter sharing, comprising an embedding layer, an encoding layer and a decoding prediction layer.
S4: Shuffle the processed model inputs and feed them in batches of size batchsize into the constructed joint extraction model for iterative training. After each epoch, evaluate the model on the validation set using precision, recall and F1 as metrics. If the F1 values of both event identification and argument role classification exceed those of the previous training, save the model. Early stopping is also set: if the F1 values do not increase for a certain number of iterations, stop training and keep the model that performed best on the validation set.
S5: Take the finally saved best model as the target model, feed the test set samples into it in batches, and output and save the extraction results.
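The model-selection rule in S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `select_best_epoch`, the `patience` parameter and the choice to compare against the best saved scores (rather than strictly the preceding epoch) are all hypothetical.

```python
from typing import List, Optional, Tuple


def select_best_epoch(
    f1_history: List[Tuple[float, float]], patience: int = 3
) -> Tuple[Optional[int], int]:
    """Return (best_epoch, epochs_run) for per-epoch validation F1 pairs.

    Each pair is (event identification F1, argument role classification F1).
    A checkpoint is "saved" only when BOTH F1 values exceed the saved best.
    Training stops after `patience` consecutive epochs without a new checkpoint.
    """
    best_epoch: Optional[int] = None
    best_f1 = (float("-inf"), float("-inf"))
    stale = 0
    epochs_run = 0
    for epoch, (f1_event, f1_role) in enumerate(f1_history):
        epochs_run = epoch + 1
        if f1_event > best_f1[0] and f1_role > best_f1[1]:
            # Both tasks improved: keep this checkpoint and reset the counter.
            best_epoch, best_f1, stale = epoch, (f1_event, f1_role), 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stop
    return best_epoch, epochs_run


# Made-up validation scores, for illustration only.
history = [(0.60, 0.50), (0.65, 0.55), (0.70, 0.54),
           (0.66, 0.55), (0.64, 0.53), (0.80, 0.70)]
best, ran = select_best_epoch(history, patience=3)
print(best, ran)
```

Requiring both F1 values to improve prevents a checkpoint from being kept when one task gains at the other's expense, which is the seesaw behavior the method is designed to avoid.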
Further, the specific steps in S1 for collecting unsafe event text data in the civil aviation field, preprocessing it and constructing the event extraction dataset are as follows:
S11: Label the collected civil aviation unsafe event texts according to predefined event types, marking each sentence that contains an event with its specific event type.
S12: For each event type, label the arguments and argument roles in the text according to the defined event frame.
S13: The final dataset is a set of processed data files in JSON format, divided into a training set, a validation set and a test set in an 8:1:1 ratio. Each processed record contains the original event sentence, the event type, the trigger word start position index, the argument start position index and the argument role type.
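As a rough illustration of S13, the sketch below shows one possible record layout and an 8:1:1 split. The field names, the English example sentence, the role label and the `split_dataset` helper are hypothetical; the source specifies only which pieces of information each record carries.

```python
import random
from typing import Dict, List, Tuple


def split_dataset(
    records: List[Dict], seed: int = 0
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Shuffle records and split them into train/validation/test at 8:1:1."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# Hypothetical record layout; "start" values are character offsets.
example = {
    "sentence": "The aircraft suffered a bird strike during the approach.",
    "event_type": "bird_strike",
    "trigger": {"text": "bird strike", "start": 24},
    "arguments": [{"text": "approach", "start": 47, "role": "flight_phase"}],
}

records = [dict(example, id=i) for i in range(50)]
train, valid, test = split_dataset(records)
print(len(train), len(valid), len(test))
```

Fixing the random seed keeps the split reproducible, so validation and test scores stay comparable across runs.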
Further, processing the JSON-format event data in the divided dataset in S2 to obtain the input format and corresponding label data required by the model specifically includes:
S21: Process the training, validation and test sets each into a tsv file for event identification and a tsv file for argument role classification.
S22: Process the tsv files into the input format and corresponding label data required by the pre-trained model. To obtain the model inputs, the input sentence is split into a character-level sequence and processed by the tokenizer; the resulting input data, containing input_ids, token_type_ids and attention_mask, is fed into the model together with the processed label data.
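The three input fields named in S22 can be illustrated with a toy encoder. This sketch does not call the ChineseBERT tokenizer: the vocabulary, the `build_vocab` and `encode` helpers and the `max_len` value are assumptions, chosen only to show what input_ids, token_type_ids and attention_mask contain for a character-level sequence.

```python
from typing import Dict, List

PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Toy character vocabulary; a real pipeline uses the pre-trained model's own."""
    vocab = {PAD: 0, UNK: 1, CLS: 2, SEP: 3}
    for sentence in sentences:
        for ch in sentence:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(sentence: str, vocab: Dict[str, int], max_len: int) -> Dict[str, List[int]]:
    """Split a sentence into characters and build fixed-length model inputs."""
    chars = list(sentence)[: max_len - 2]  # leave room for [CLS] and [SEP]
    tokens = [CLS] + chars + [SEP]
    input_ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    padding = max_len - len(input_ids)
    input_ids += [vocab[PAD]] * padding
    attention_mask += [0] * padding        # 0 marks padding
    token_type_ids = [0] * max_len         # single-sentence input: one segment
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


vocab = build_vocab(["飞机遭鸟击"])
features = encode("飞机遭鸟击", vocab, max_len=10)
print(features["input_ids"])
print(features["attention_mask"])
```

Because [CLS] is prepended, any per-character label sequence has to be shifted by one position and padded to the same length before it is fed to the model with these inputs.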
Further, the joint extraction model for civil aviation unsafe events based on soft parameter sharing in S3 is implemented by the following steps:
S31: Map the input event sentence to character-level embedding vectors through the embedding layer.
S32: Feed the vector representations into the shared network, the private networks and the gating networks in the encoding layer, and perform feature extraction and feature fusion to obtain the feature representation of each of the two tasks.
S33: Feed the feature representations into the decoding prediction layers of the two tasks respectively and decode with a conditional random field (Conditional Random Field, CRF), selecting the sentence-level globally optimal label sequence from all possible label sequences as the final output.
Furthermore, the embedding layer in step S31 uses ChineseBERT as the pre-trained model to produce the embedding vectors of the characters in the input sequence. ChineseBERT incorporates the glyph and pinyin information of characters in its pre-training stage, which makes it better suited to Chinese natural language processing tasks. The character vector representation of each event sentence is obtained through ChineseBERT and used as the feature input of the encoding layer.
Further, the encoding layer in step S32 mainly comprises a shared network, private networks and gating networks, and is divided by the gating networks into a shared network layer and a private-shared network layer. The shared network consists of a group of fully connected networks acting as sub-networks. The character-level embedding vectors produced by the embedding layer are fed into every sub-network to learn knowledge shared by the two tasks, and each task filters the knowledge learned by this group of sub-networks through the first-layer gating network to obtain its own shared knowledge. The private layer is a network layer owned separately by each task: a BiGRU network serves as the private network of each task and captures the dependencies between characters. The feature vectors it extracts and the feature vectors extracted by the first-layer shared network are selectively fused through the second-layer gating network. Each gating network computes a weight vector through a linear transformation followed by a softmax layer. The first layer learns, from the embedding-layer input, a weight for the output features of each shared sub-network; the second layer learns weights for the private network's output features and the shared network layer's output features. The fused feature representation is the output of each task's encoding layer and is passed to the decoding prediction layer.
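The two-layer gating described for the encoding layer can be sketched for a single character vector as follows. This is an illustrative sketch only: the private BiGRU is replaced by a single tanh layer to keep the example short, the second gate is assumed to take the embedding vector as its input (the source does not say), and all names, sizes and random weights are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    """Combine equally sized vectors using one scalar weight per vector."""
    return [sum(w * v[j] for w, v in zip(weights, vectors))
            for j in range(len(vectors[0]))]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TaskBranch:
    """Gates and private network of one task (hypothetical structure)."""

    def __init__(self, rng: random.Random, d: int, h: int, m: int) -> None:
        self.gate1 = rand_matrix(rng, m, d)    # one logit per shared sub-network
        self.private = rand_matrix(rng, h, d)  # stand-in for the task's BiGRU
        self.gate2 = rand_matrix(rng, 2, d)    # logits for [private, shared]

    def forward(self, x: Vector, shared_outputs: List[Vector]) -> Tuple[Vector, Vector, Vector]:
        w1 = softmax(matvec(self.gate1, x))        # first-layer gate weights
        shared = weighted_sum(w1, shared_outputs)  # task-specific view of shared knowledge
        private = [math.tanh(v) for v in matvec(self.private, x)]
        w2 = softmax(matvec(self.gate2, x))        # second-layer gate weights
        fused = weighted_sum(w2, [private, shared])
        return fused, w1, w2


rng = random.Random(0)
d, h, m = 4, 3, 3  # embedding size, feature size, number of shared sub-networks
shared_nets = [rand_matrix(rng, h, d) for _ in range(m)]
branches = {
    "event_identification": TaskBranch(rng, d, h, m),
    "argument_role_classification": TaskBranch(rng, d, h, m),
}
x = [0.2, -0.1, 0.4, 0.3]
shared_outputs = [matvec(w, x) for w in shared_nets]
for name, branch in branches.items():
    fused, w1, w2 = branch.forward(x, shared_outputs)
    print(name, [round(v, 3) for v in fused])
```

Both tasks read the same sub-network outputs but weight them with their own gates, which is what separates shared parameters from task-specific ones in this design.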
Further, the decoding prediction layer in step S33 mainly comprises a fully connected network and a CRF network, and both tasks are treated as sequence labeling. A softmax classifier is typically chosen for multi-class prediction, but in sequence labeling it cannot take the dependencies between labels into account, for example that the first label of a trigger word or argument begins with B- rather than I-, or that a run of consecutive trigger word or argument labels generally has the same type. The CRF model captures such constraints between labels through a transition matrix and thereby obtains a globally optimal label sequence. In the decoding prediction stage, the Viterbi dynamic programming algorithm finds the label sequence with the highest total score, which is taken as the optimal sequence.
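A minimal Viterbi decoder shows how a transition matrix changes the prediction compared with choosing the best label at each position independently. The label set, the scores and the large negative penalty used to forbid the O to I-TRG transition are made up for illustration; a trained CRF would learn its transition scores.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the label index sequence with the highest total score.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = emissions[0][:]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending at label j in position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


LABELS = ["O", "B-TRG", "I-TRG"]
NEG = -1e4  # effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: I-TRG may not follow O
    [0.0, 0.0, 0.5],  # from B-TRG
    [0.0, 0.0, 0.5],  # from I-TRG
]
emissions = [
    [1.0, 0.8, 0.1],  # per-position argmax would pick O
    [0.2, 0.3, 1.5],  # per-position argmax would pick I-TRG (invalid after O)
    [1.2, 0.1, 0.3],
]
decoded = [LABELS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

A per-position argmax over these scores gives O, I-TRG, O, which violates the BIO scheme; the decoder instead returns B-TRG, I-TRG, O because it scores whole sequences.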
Further, in the iterative training process of step S4, a dynamic weighting scheme assigns new loss weights to the two tasks in each round of training in order to balance the differences between the tasks and keep their training speeds as consistent as possible. The scheme learns the weight of each task from the rate of change of its loss, and the weighted task losses are summed to give the total loss.
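The source says only that the weights follow the rate of change of each task's loss. One well-known scheme that fits this description is Dynamic Weight Average (DWA), sketched below; whether the patent uses exactly this formula is an assumption, as are the temperature value and the example loss figures.

```python
import math
from typing import List


def dwa_weights(
    prev_losses: List[float], prev_prev_losses: List[float], temperature: float = 2.0
) -> List[float]:
    """Dynamic Weight Average: a task whose loss falls more slowly gets a larger weight.

    prev_losses and prev_prev_losses hold each task's average loss in the
    last and second-to-last training rounds. The weights sum to the task count.
    """
    ratios = [lp / lpp for lp, lpp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [len(ratios) * e / total for e in exps]


def total_loss(task_losses: List[float], weights: List[float]) -> float:
    """Weighted sum of the per-task losses."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Average losses per round: (event identification, argument role classification).
history = [[1.00, 2.00], [0.50, 1.80]]
weights = dwa_weights(history[-1], history[-2])
print([round(w, 3) for w in weights])
print(round(total_loss([0.45, 1.70], weights), 3))
```

In this example the argument role classification loss fell by only 10% while the event identification loss halved, so the slower task receives the larger weight in the next round.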
Further, the specific steps in S5 for feeding the test set samples into the target model in batches and outputting and saving the extraction results are as follows:
S51: Feed the event sentences of the test set into the saved best model in batches, and output the optimal sequence predicted for the event identification task and the optimal sequence predicted for the argument role classification task, each as a JSON file.
S52: Merge the two results to obtain an event knowledge file in complete structured form.
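The merge in S52 could look like the sketch below, which turns the two predicted BIO sequences into one structured record. The tag names, the output field names and the `bio_spans` and `merge_event` helpers are hypothetical, and the sketch does not attempt to assign arguments to individual events when a sentence contains several.

```python
from typing import Dict, List, Tuple


def bio_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str, int]]:
    """Convert a BIO tag sequence into (label, text, start_index) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((label, "".join(chars[start:i]), start))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def merge_event(sentence: str, trigger_tags: List[str], role_tags: List[str]) -> Dict:
    """Combine the outputs of both tasks into one structured record."""
    chars = list(sentence)
    return {
        "sentence": sentence,
        "events": [
            {"event_type": label, "trigger": text, "trigger_start": start}
            for label, text, start in bio_spans(chars, trigger_tags)
        ],
        "arguments": [
            {"role": label, "text": text, "start": start}
            for label, text, start in bio_spans(chars, role_tags)
        ],
    }


sentence = "航班在北京遭鸟击"
trigger_tags = ["O", "O", "O", "O", "O", "O", "B-bird_strike", "I-bird_strike"]
role_tags = ["O", "O", "O", "B-place", "I-place", "O", "O", "O"]
result = merge_event(sentence, trigger_tags, role_tags)
print(result["events"])
print(result["arguments"])
```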
Compared with the prior art, the joint extraction method for civil aviation unsafe events based on soft parameter sharing has the following advantages:
(1) The invention explicitly separates shared parameters from task-specific parameters through soft parameter sharing and uses a two-layer gating network to strengthen the model's ability to extract and filter semantic knowledge, so that the model can learn suitable feature representations for both tasks at the same time, achieving more efficient information sharing and joint representation learning and effectively alleviating the seesaw phenomenon;
(2) The invention uses a CRF network in the decoding prediction layer, which accounts for the constraints between labels through a transition matrix to obtain a globally optimal label sequence, effectively improving the model's predictions;
(3) To balance the differences between the two tasks during training and keep their training speeds as consistent as possible, the invention uses dynamic weighting to assign new loss weights to the two tasks in each round of training, helping both tasks train to their best results at the same time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of the method of the present invention;
FIG. 2 is an example of civil aviation unsafe event extraction;
FIG. 3 is a diagram of the joint extraction model for civil aviation unsafe events based on soft parameter sharing;
FIG. 4 is a structural diagram of a GRU network;
FIG. 5 is a structural diagram of a BiGRU network.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature. In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art in a specific case.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in FIG. 1 and FIG. 2, an embodiment of the present invention provides a joint extraction method for civil aviation unsafe events based on soft parameter sharing, including:
S1: Collect unsafe event text data in the civil aviation field and preprocess it to construct an event extraction dataset. The dataset is stored as data files in JSON format and is divided into a training set, a validation set and a test set in an 8:1:1 ratio.
S2: Process each JSON-format event record in the divided dataset to obtain the input format and corresponding label data required by the model. Process the training, validation and test sets each into a tsv file for event identification and a tsv file for argument role classification, then process the tsv files into the input format and corresponding label data required by the pre-trained model. To obtain the model inputs, the input sentence is split into a character-level sequence and processed by the tokenizer; the resulting input data, containing input_ids, token_type_ids and attention_mask, is fed into the model together with the processed label data.
S3: Shuffle the processed model inputs and feed them in batches of size batchsize into the constructed joint extraction model for iterative training. After each epoch, evaluate the model on the validation set using precision, recall and F1 as metrics. If the F1 values of both event identification and argument role classification exceed those of the previous training, save the model. Early stopping is also set: if the F1 values do not increase for a certain number of iterations, stop training and keep the model that performed best on the validation set. The joint extraction model comprises an embedding layer, an encoding layer and a decoding prediction layer. The input event sentence is first mapped to character-level embedding vectors by the embedding layer. The vector representations are then fed into the shared network, private networks and gating networks of the encoding layer, where feature extraction and feature fusion yield the feature representation of each task. Finally, the feature representations are fed into the decoding prediction layers of the two tasks and decoded with a conditional random field (Conditional Random Field, CRF), which selects the sentence-level globally optimal label sequence from all possible label sequences as the final output.
S4: Take the finally saved best model as the target model, feed the test set samples into it in batches, and output and save the extraction results.
Specifically, in one embodiment, the preprocessing in S1 includes labeling the collected civil aviation unsafe event texts according to predefined event types, marking each sentence that contains an event with its specific event type, and, for each event type, labeling the arguments and argument roles in the text according to the defined event frame.
Specifically, in one embodiment, the final dataset in S1 is a set of processed data files in JSON format, divided into a training set, a validation set and a test set in an 8:1:1 ratio. Each processed record contains the original event sentence, the event type, the trigger word start position index, the argument start position index and the argument role type.
Specifically, in one embodiment, in S2, the character vector representation of the input text is obtained through ChineseBERT. For an input sentence $S$, data processing first yields the segmented sequence $T = \{t_0, t_1, \dots, t_n\}$ with $t_i \in V_c$, where $t_0$ and $t_n$ are the special characters [CLS] and [SEP] required by ChineseBERT and $n$ is the maximum length of the sequence. Each character $t_i$ obtains an embedded representation $x_i \in \mathbb{R}^d$ through ChineseBERT, where $d$ is the dimension of the embedding vector:
$x_i = \mathrm{Embedding}(t_i)$
An event sentence is thus embedded as the sequence of vectors $X = \{x_0, x_1, \dots, x_n\}$.
Specifically, in one embodiment, as shown in fig. 3, the shared network in S3 is formed by a group of sub-networks, each of which is a single-layer fully-connected network denoted $f_i$, $i=1,\dots,m$, where m is the number of sub-networks. The output of task k after the first-layer shared network is:

$$f^{k(0)}(x)=\sum_{i=1}^{m} g^{k(0)}(x)_i\,f_i(x)$$

where k = 1, 2 (1 denotes the event recognition task and 2 denotes the argument role classification task), the superscript (0) denotes the first layer, x is the output vector of the embedding layer, and $g^{k(0)}(x)_i$ is the weight that the first-layer gating network of task k assigns to the output $f_i(x)$ of the i-th sub-network. $g^{k(0)}(x)$ is the weighting function that computes the weight vector through a linear transformation and a softmax layer, expressed as

$$g^{k(0)}(x)=\mathrm{softmax}\left(W_g^{k(0)}x\right)$$

where $W_g^{k(0)}$ is a trainable matrix, m is the number of sub-networks, d is the feature dimension of the input, and n is the length of the input sequence. The first-layer gating network integrates the output features of the sub-networks of the shared network, so that each of the two tasks can screen out the information it needs from the shared network while information sharing is preserved.
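The first-layer gating described above can be sketched in NumPy as follows. This is an illustration under assumptions: the function and argument names are invented, the gating matrix is stored as a (d, m) array so it can be applied to every character position at once, and no activation is applied to the sub-networks because the text does not name one.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def shared_layer(x, sub_weights, sub_biases, gate_weights):
    """First-layer soft parameter sharing.

    x            : (n, d) character embeddings from the embedding layer
    sub_weights  : list of m arrays (d, h), one per shared sub-network f_i
    sub_biases   : list of m arrays (h,)
    gate_weights : dict task name -> (d, m) gating matrix of that task
    Returns a dict task name -> (n, h) gated mixture of sub-network outputs.
    """
    # f_i(x) for every sub-network, stacked to shape (m, n, h).
    sub_out = np.stack([x @ W + b for W, b in zip(sub_weights, sub_biases)])
    mixed = {}
    for task, W_g in gate_weights.items():
        gate = softmax(x @ W_g, axis=-1)  # (n, m), each row sums to 1
        # Weighted sum over the m sub-networks at every position.
        mixed[task] = np.einsum("nm,mnh->nh", gate, sub_out)
    return mixed
```

Because each task owns its gating matrix, the two tasks read different mixtures of the same sub-network outputs, which is what distinguishes soft sharing from a single hard-shared encoder.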
Specifically, in one embodiment, in the private-shared network layer in S3, as shown in fig. 3, the private layers of the two tasks are BiGRU networks, which capture the dependency relationships between characters. The structure of the GRU (Gated Recurrent Unit) is shown in fig. 4, and its working principle can be expressed by the following formulas:

$$z_t=\sigma(W_z\cdot[h_{t-1},x_t])$$
$$r_t=\sigma(W_r\cdot[h_{t-1},x_t])$$
$$\tilde{h}_t=\tanh(W_{\tilde{h}}\cdot[r_t\odot h_{t-1},x_t])$$
$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$

where $x_t$ is the information input at the current time step, $h_{t-1}$ is the hidden state of the previous time step, $h_t$ is the hidden state passed to the next time step, $\tilde{h}_t$ is the candidate hidden state, $r_t$ is the reset gate, $z_t$ is the update gate, σ is the sigmoid activation function, which maps its input to a value in (0, 1), tanh is the tanh activation function, which maps its input to a value in (-1, 1), and $W_z$, $W_r$, $W_{\tilde{h}}$ are trainable parameter matrices.
Specifically, in one embodiment, in S3, in order to better capture the bidirectional semantic dependencies of the text, the private network adopts the BiGRU model shown in fig. 5, which consists of two unidirectional GRUs. The input sequence is fed into the two GRU networks in forward order and in reverse order respectively for feature extraction, and the extracted feature vectors are concatenated as the final output of the network. The corresponding calculation formulas are as follows:

$$\overrightarrow{h_t}=\mathrm{GRU}(x_t,\overrightarrow{h}_{t-1})$$
$$\overleftarrow{h_t}=\mathrm{GRU}(x_t,\overleftarrow{h}_{t+1})$$
$$h_t=[\overrightarrow{h_t};\overleftarrow{h_t}]$$
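A GRU step and its bidirectional use can be sketched in NumPy as below. The sketch assumes the common bias-free GRU formulation with update gate, reset gate and candidate state; the function names and the zero initial state are illustrative choices, not details taken from the patent.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; every weight matrix has shape (hidden, hidden + d)."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ concat)                                   # update gate
    r = sigmoid(W_r @ concat)                                   # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(xs, params, hidden):
    """Run a unidirectional GRU over xs (n, d); returns all states (n, hidden)."""
    h = np.zeros(hidden)
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, *params)
        states.append(h)
    return np.stack(states)


def bigru(xs, fwd_params, bwd_params, hidden):
    """Concatenate forward and backward GRU states at every position."""
    forward = run_gru(xs, fwd_params, hidden)
    # Process the reversed sequence, then flip back to align positions.
    backward = run_gru(xs[::-1], bwd_params, hidden)[::-1]
    return np.concatenate([forward, backward], axis=-1)
```

The output at position t therefore has twice the hidden size: the first half summarizes the characters up to t, the second half the characters from t onward.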
Specifically, in one embodiment, in S3, the two tasks each learn a task-specific feature representation through their BiGRU private network, and a second-layer gating network selectively fuses it with the feature representation learned by the shared network layer, so that both tasks can learn as much shared knowledge and task-specific private knowledge as possible in the coding layer. The formula is expressed as follows:

$$O^{k}=g^{k(1)}_{1}\,H_k+g^{k(1)}_{2}\,f^{k(0)}(x)$$

where the superscript (1) denotes the second layer, $O^{k}$ is the fused feature representation of task k, $g^{k(1)}_{1}$ is the weight that the second-layer gating network of task k assigns to the output features of the BiGRU private network, $g^{k(1)}_{2}$ is the weight that it assigns to the output features of the shared network, $H_k$ is the feature representation vector learned by the BiGRU network, and $f^{k(0)}(x)$ is the feature representation vector learned by the shared network layer.
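The second-layer fusion reduces to a two-way softmax gate per position. The sketch below is a hedged illustration: it assumes that the gate is computed from a separate `gate_input` (for example the embedding-layer output) and that the private and shared features have the same width; neither detail is stated explicitly in the text.

```python
import numpy as np


def fuse_features(private_feat, shared_feat, gate_input, gate_weight):
    """Selective fusion of private (BiGRU) and shared features.

    private_feat, shared_feat : (n, h) feature matrices of one task
    gate_input                : (n, d) input of the second-layer gate (assumed)
    gate_weight               : (d, 2) trainable gating matrix of the task
    """
    logits = gate_input @ gate_weight                      # (n, 2)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    gate = np.exp(logits)
    gate = gate / gate.sum(axis=-1, keepdims=True)         # softmax over 2 sources
    # Column 0 weights the private features, column 1 the shared features.
    return gate[:, :1] * private_feat + gate[:, 1:] * shared_feat
```

With an all-zero gating matrix the two sources are averaged; training moves each task toward whichever source is more useful at each position.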
Specifically, in one embodiment, the decoding prediction layer in S3 mainly comprises a linear layer and a CRF layer. For the output sequence $Z=\{z_1,z_2,\dots,z_n\}$ of the linear layer of one task and a tag sequence $Y=\{y_1,y_2,\dots,y_n\}$ output by the CRF, the total score of the tag sequence is:

$$S(Z,Y)=\sum_{i=1}^{n-1}T_{y_i,y_{i+1}}+\sum_{i=1}^{n}z_{i,y_i}$$

where T is the transition score matrix, $T_{y_i,y_{i+1}}$ is the transition score from label $y_i$ to label $y_{i+1}$, and $z_{i,y_i}$ is the output score of the i-th character under label $y_i$. With $S(Z,\tilde{Y})$ denoting the score of each tag sequence $\tilde{Y}$ that may be output, the probability distribution of the output sequence Y is as follows:

$$P(Y\mid Z)=\frac{e^{S(Z,Y)}}{\sum_{\tilde{Y}\in Y_Z}e^{S(Z,\tilde{Y})}}$$

where $Y_Z$ is the set of all possible tag sequences of the sequence Z. During training, the CRF layer is optimized by maximizing the log-likelihood of the correct tag sequence $Y^*$, where $S(Z,Y^*)$ is the score of the correct tag sequence, as follows:

$$\log P(Y^*\mid Z)=S(Z,Y^*)-\log\sum_{\tilde{Y}\in Y_Z}e^{S(Z,\tilde{Y})}$$
The loss function of each task is defined as
$$\mathrm{Loss}=-\log P(Y^*\mid Z)$$
In the decoding prediction stage, a Viterbi dynamic programming algorithm is adopted to solve a label sequence with the highest total score as an optimal sequence.
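The CRF scoring, training loss and Viterbi decoding can be sketched in NumPy as follows. This is a minimal linear-chain CRF without start or end transitions, which is an assumption; `emissions` stands for the linear-layer outputs Z and `transitions` for the matrix T, and all function names are illustrative.

```python
import numpy as np


def sequence_score(emissions, transitions, tags):
    """S(Z, Y): emission scores plus transition scores along one tag path."""
    emit = sum(emissions[i, t] for i, t in enumerate(tags))
    trans = sum(transitions[a, b] for a, b in zip(tags[:-1], tags[1:]))
    return emit + trans


def log_partition(emissions, transitions):
    """log of the sum of exp(S) over all tag sequences (forward algorithm)."""
    alpha = emissions[0]
    for i in range(1, len(emissions)):
        scores = alpha[:, None] + transitions   # scores[k, j]: previous k -> j
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0)) + emissions[i]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())


def crf_loss(emissions, transitions, tags):
    """Negative log-likelihood of the correct tag sequence."""
    return log_partition(emissions, transitions) - sequence_score(
        emissions, transitions, tags)


def viterbi(emissions, transitions):
    """Return the highest-scoring tag path and its total score."""
    n, num_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, num_tags), dtype=int)
    for i in range(1, n):
        cand = score[:, None] + transitions     # cand[k, j]: previous k -> j
        back[i] = cand.argmax(axis=0)           # best previous tag for each j
        score = cand.max(axis=0) + emissions[i]
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1], float(score.max())
```

Both routines run in time proportional to the sequence length times the square of the number of tags, instead of enumerating every possible tag sequence.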
Specifically, in one embodiment, in the iterative training process described in S3, in order to better balance the difference between the two tasks during training so that their training speeds are as consistent as possible, dynamic weighting is adopted to assign new loss weights to the two tasks in each round of training. The method learns the weight of each task from the rate of change of its loss, and the weighted task losses are finally summed as the total loss. The formulas are as follows:

$$r_k(e-1)=\frac{L_k(e-1)}{L_k(e-2)}$$
$$w_k(e)=\frac{2\exp\left(r_k(e-1)\right)}{\sum_{i=1}^{2}\exp\left(r_i(e-1)\right)}$$
$$\mathrm{Loss}=w_1(e)\,\mathrm{Loss}_{Trigger}+w_2(e)\,\mathrm{Loss}_{Argument}$$

where e is the training round (epoch); $r_k$ is the ratio of the loss of task k in the previous round to its loss in the round before that, representing the rate at which the loss changed in the previous round, and takes the value 1 when the epoch is 0 or 1; $L_k$ is the average loss over each epoch; $w_k$ is the loss weight of task k, obtained by applying softmax normalization to $r_k$ and multiplying by 2; $\mathrm{Loss}_{Trigger}$ is the loss of the event recognition task in the current round, and $\mathrm{Loss}_{Argument}$ is the loss of the argument role classification task.
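The dynamic loss weighting can be sketched in plain Python as below. The function names and the dictionary-based loss history are illustrative assumptions; the factor applied after the softmax is the number of tasks, which equals 2 for the two tasks described here.

```python
import math


def dynamic_weights(loss_history, epoch):
    """Loss weights for the given epoch.

    loss_history maps a task name to the list of its average losses,
    indexed by finished epoch (index 0 is the first epoch).
    """
    tasks = list(loss_history)
    if epoch < 2:
        # Not enough history yet: every rate is 1, so every weight is 1.
        rates = {k: 1.0 for k in tasks}
    else:
        rates = {
            k: loss_history[k][epoch - 1] / loss_history[k][epoch - 2]
            for k in tasks
        }
    denom = sum(math.exp(r) for r in rates.values())
    # Softmax over the rates, scaled by the number of tasks.
    return {k: len(tasks) * math.exp(rates[k]) / denom for k in tasks}


def total_loss(weights, losses):
    """Weighted sum of the per-task losses."""
    return sum(weights[k] * losses[k] for k in losses)
```

A task whose loss fell more slowly in the previous round has a larger rate and therefore receives a larger weight, which pushes the two tasks toward similar training speeds.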
Specifically, in one embodiment, step S4 of sending the test set samples into the target model in batches and outputting and saving the extraction results comprises the following specific steps: 1. the event sentences in the test set are input in batches into the saved optimal model, which outputs the optimal sequence predicted for the event recognition task and the optimal sequence predicted for the argument role classification task, each as a Json format file; 2. the two results are combined to obtain an event knowledge file in complete structured form.
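Combining the two predicted sequences into one structured record can be sketched as follows. The BIO tagging scheme, the function names and the output keys are all assumptions made for illustration; the text only states that both tasks use sequence labeling and that the two results are merged.

```python
def decode_spans(chars, tags):
    """Collect labeled spans from BIO tags (B-X starts a span, I-X continues it)."""
    spans, start, label = [], None, None
    # The trailing "O" is a sentinel that flushes a span ending at the last tag.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append({"label": label, "start": start,
                          "text": "".join(chars[start:i])})
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def merge_event(sentence, trigger_tags, argument_tags):
    """Merge the outputs of both tasks into one structured event record."""
    chars = list(sentence)
    return {
        "sentence": sentence,
        "events": [
            {"event_type": s["label"], "trigger": s["text"],
             "trigger_start": s["start"]}
            for s in decode_spans(chars, trigger_tags)
        ],
        "arguments": [
            {"role": s["label"], "argument": s["text"],
             "argument_start": s["start"]}
            for s in decode_spans(chars, argument_tags)
        ],
    }
```

For example, calling `merge_event` on a six-character sentence with one trigger span and two argument spans returns a dictionary that can be written out with `json.dump`.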
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (6)
1. A civil aviation unsafe event joint extraction method based on soft parameter sharing, characterized by comprising the following steps:
S1: collecting unsafe event text data in the civil aviation field, preprocessing the data, and constructing an event extraction data set, wherein the data set is in a data file processed into a Json format and is divided into a training set, a verification set and a test set according to the proportion of 8:1:1;
S2: processing event data in each Json format in the divided data set to obtain an input format and corresponding tag data required by the model;
S3: establishing a civil aviation unsafe event joint extraction model based on soft parameter sharing, wherein the civil aviation unsafe event joint extraction model comprises an embedded layer, a coding layer and a decoding prediction layer;
The embedded layer adopts ChineseBERT as a pre-training model to obtain the embedded vector representation of the character set of an input sequence; the character vector representation of each event sentence is acquired through ChineseBERT and used as the feature input of the coding layer;
The coding layer comprises a shared network, private networks and gating networks, and is divided by the gating networks into a shared network layer and a private-shared network layer; the shared network is composed of a group of fully connected networks serving as sub-networks, and the character-level embedded vectors acquired by the embedded layer are input into each sub-network to acquire the shared knowledge of the two tasks; each of the two tasks screens the knowledge learned by the group of shared sub-networks through its own first-layer gating network and takes the result as the shared knowledge learned by that task; the private layer is the network layer exclusive to each of the two tasks, with a BiGRU network serving as the private network of each task; the feature vector representation extracted by the private network and the feature vector representation extracted by the first-layer shared network undergo selective feature fusion through a second-layer gating network, and the resulting feature representation is finally output to the decoding prediction layer of the next layer;
the decoding prediction layer comprises a full-connection network and a CRF network, both tasks adopt a sequence labeling mode, and in the decoding prediction stage, a Viterbi dynamic programming algorithm is adopted to solve a label sequence with the highest total score as an optimal sequence;
the implementation process of the civil aviation unsafe event joint extraction model based on soft parameter sharing is as follows:
S31: mapping the input event sentence into a character-level embedded vector representation through the embedded layer;
S32: inputting the vector representation into the shared network, the private networks and the gating networks in the coding layer respectively, and performing feature extraction and feature fusion to obtain the respective feature representations of the two tasks;
S33: inputting the feature representations into the decoding prediction layers of the two tasks respectively, decoding them through a conditional random field, and selecting the globally optimal sentence-level tag sequence from all possible tag sequences as the final output;
S4: randomly feeding the data processed in step S2 in batches of size batchsize into the constructed civil aviation unsafe event joint extraction model based on soft parameter sharing for iterative training, and verifying the effect of the model on the verification set after each epoch, wherein the evaluation indexes are accuracy, recall rate and F1 value; if the F1 values of event identification and argument role classification are both greater than the F1 values of the previous training, saving the model; setting early stopping, and if the F1 value does not rise for a certain number of iterations, stopping training and saving the optimal model on the verification set;
S5: and taking the finally stored optimal model as a target model, sending the test set samples into the target model in batches, and outputting and storing the extracted result.
2. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, wherein the method comprises the following steps: in the S1, the method specifically comprises
S11: labeling the sentences containing the event types in the text with specific event types according to the collected text of the civil aviation unsafe event and the predefined event types;
S12: marking the argument roles and argument in the event frames according to the defined event frames for different event types;
S13: the final data set format is a processed data file in Json format, and is divided into a training set, a verification set and a test set according to the proportion of 8:1:1; each piece of processed data of the data set comprises an original event sentence, an event type, a trigger word starting position index, an argument starting position index and an argument role type.
3. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, wherein the method comprises the following steps: in the S2, specifically include
S21: respectively processing the training set, the verification set and the test set into a tsv format file for event identification and a tsv format file for argument role classification;
S22: processing the tsv format files into the input format and the corresponding label data required by the pre-training model to obtain the input data required by the model, comprising: splitting the input sentence into a character-level sequence and processing it with the tokenizer method, and feeding the resulting input data, which contain input_ids, token_type_ids and attention_mask, into the model together with the processed tag data.
4. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, wherein the method comprises the following steps: the gating network in the S3 calculates a weighting function of the weight vector through linear change and a softmax layer, and learns the weight value of each sub-network output characteristic of the sharing network in a first layer based on the input of the embedding layer, and learns the weight values of the private network output characteristic and the output characteristic of the sharing network layer in a second layer.
5. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, wherein the method comprises the following steps: in the step S4, the iterative training process adopts a dynamic weighting mode to allocate new loss weights for two tasks in each round of training, learns the weight value of each task by considering the change rate of the loss, and finally adds the weighted task losses as the total loss.
6. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, wherein the method comprises the following steps: in S5, specifically include
S51: inputting the event sentences in the test set into the saved optimal model in batches, and respectively outputting the optimal sequence predicted for the event recognition task and the optimal sequence predicted for the argument role classification task, each as a Json format file;
S52: and combining the two results to obtain the event knowledge file containing the complete structured form.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210848785.2A CN115114409B (en) | 2022-07-19 | 2022-07-19 | Civil aviation unsafe event combined extraction method based on soft parameter sharing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115114409A (en) | 2022-09-27 |
CN115114409B (en) | 2024-09-06 |
Family ID: 83333322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210848785.2A (Active; CN115114409B) | Civil aviation unsafe event combined extraction method based on soft parameter sharing | 2022-07-19 | 2022-07-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115114409B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116205584B (en) * | 2022-11-21 | 2023-08-22 | 中国民航科学技术研究院 | Civil aviation event association method based on unified space-time coding |
CN116108169B (en) * | 2022-12-12 | 2024-02-20 | 长三角信息智能创新研究院 | Hot wire work order intelligent dispatching method based on knowledge graph |
CN118277836B (en) * | 2024-05-31 | 2024-08-20 | 航安云创科技(北京)有限公司 | Method, device and equipment for determining classification and treatment measures of unsafe events |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032646A (en) * | 2019-05-08 | 2019-07-19 | 山西财经大学 | The cross-domain texts sensibility classification method of combination learning is adapted to based on multi-source field |
CN112069811A (en) * | 2020-08-24 | 2020-12-11 | 武汉大学 | Electronic text event extraction method with enhanced multi-task interaction |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115151903A (en) * | 2020-12-25 | 2022-10-04 | 京东方科技集团股份有限公司 | Text extraction method and device, computer readable storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||