CN115114409B

CN115114409B - Civil aviation unsafe event combined extraction method based on soft parameter sharing

Info

Publication number: CN115114409B
Application number: CN202210848785.2A
Authority: CN
Inventors: 冯小荣; 赵新阳; 冯兴杰; 蒋逸凡
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2022-07-19
Filing date: 2022-07-19
Publication date: 2024-09-06
Anticipated expiration: 2042-07-19
Also published as: CN115114409A

Abstract

The invention discloses a civil aviation unsafe event joint extraction method based on soft parameter sharing. The method extracts event information from unstructured unsafe event texts in the civil aviation field, including event trigger words, event types, event arguments and argument roles, and organizes it into structured data. The invention constructs a joint extraction model based on soft parameter sharing that explicitly separates shared parameters from task-specific parameters and uses a double-layer gating network to strengthen the model's ability to extract and screen semantic knowledge. The model can therefore learn suitable feature representations for both tasks at the same time, achieving more efficient information sharing and joint representation learning. This addresses the problem that, because of task differences, two tasks sharing the same coding layer cannot both benefit, and it significantly improves the overall effect of event extraction.

Description

Civil aviation unsafe event combined extraction method based on soft parameter sharing

Technical Field

The invention belongs to the field of natural language processing, and particularly relates to a civil aviation unsafe event combined extraction method based on soft parameter sharing.

Background

Event extraction is an important topic in information extraction research and underpins applications such as recommendation systems, intelligent question answering and knowledge graph construction. An event is a specific form of information: something that happens at a specific time and place, involves one or more participants, and can generally be described as a change of state. The main purpose of the event extraction task is to extract event information such as event trigger words and arguments from unstructured text into a structured form. Event extraction comprises two subtasks, an event recognition task and an argument role classification task, and its main elements are event trigger words, event types, event arguments and argument roles.

According to the 2020 summary of civil aviation development, about 300,000 flights are operated in China per month on average, and the number of civil aviation unsafe events rises as the number of flights increases. At present, extracting and recording information for most unsafe events depends heavily on manual work, which incurs high labor costs. Taking bird strike events as an example, event information such as the time and place of the strike, the corresponding flight number and the strike position must be filled into a form by hand from the event description text, which is time-consuming and laborious. Large volumes of event data also cannot be extracted uniformly and simultaneously into structured data for further use in building event graphs for the civil aviation field. Such event graphs have proven valuable for monitoring risk events, enriching existing knowledge graphs and managing historical events so that they are easy to query. How to extract the growing number of civil aviation unsafe events efficiently and accurately is therefore an increasingly urgent problem.

In recent years, with the development of deep learning, most research on event extraction algorithms has been based on deep learning architectures, and methods based on CNN, RNN, LSTM, BERT and similar models have achieved good results on event extraction tasks. However, most of the data used in current research comes from the general domain. Because manual labeling is expensive, public event extraction data sets are small, while the general domain is broad and its event types are numerous and varied. As a result, the results that recent general-domain event extraction methods achieve on the common public data set ACE 2005 do not meet the requirements of civil aviation unsafe event extraction. Civil aviation unsafe events, by contrast, are restricted-domain data: the event texts are of a single type, the language is highly regular and the knowledge is dense. For this reason, some recent general-domain methods achieve considerable results on event extraction in specific domains such as financial and medical events, which supports the feasibility of event extraction in the civil aviation field.

Neural network models for deep-learning-based event extraction fall into two main types according to the extraction paradigm: pipeline models and joint models. A pipeline model first extracts trigger words to complete the event recognition task and then extracts arguments to complete the argument role classification task. This approach has two drawbacks. The first is error propagation: errors made by the event recognition task in the first stage cannot be corrected later and therefore affect the role classification task in the second stage. The second is a lack of information interaction: the two subtasks depend on each other, but the pipeline approach ignores the interaction between them, so the tasks cannot benefit from each other and the final extraction effect suffers. A joint model performs event recognition and argument role classification at the same time. Most joint extraction models in recent years adopt hard parameter sharing from multi-task learning: the two tasks fully share the same underlying network, on top of which each task has its own network layers. Joint extraction reduces the influence of error propagation, and because the two tasks are trained together and update the whole network's parameters together, the information interaction between them is strengthened and the final task effect improves.

However, because the two event extraction subtasks differ, the fully shared network in a joint extraction model tends to favor the more complex task (argument role classification is more complex than event recognition), so the seesaw phenomenon of multi-task learning can occur: the effect of one subtask improves while the effect of the other declines. Multi-task learning (MTL) means learning multiple tasks simultaneously with a single model and improving all of them by sharing information between tasks. According to the sharing mode, multi-task learning is generally divided into hard parameter sharing and soft parameter sharing. Unlike hard parameter sharing, a soft parameter sharing model does not fully share the same underlying network but instead encourages parameter similarity. While still ensuring that parameters are shared between tasks, the shared network layer can better serve the downstream structures of both tasks, improving the tasks together and alleviating the seesaw phenomenon.

Disclosure of Invention

In view of the above, the invention aims to provide a civil aviation unsafe event combined extraction method based on soft parameter sharing, which is improved on the method based on event combined extraction in the general field, constructs an event extraction data set for the civil aviation unsafe event, reduces the influence of a hard parameter sharing method in the traditional multi-task learning on the performance of the whole model, and finally completes efficient and accurate extraction of the civil aviation unsafe event.

In order to achieve the above purpose, the technical scheme of the invention is realized as follows:

A civil aviation unsafe event joint extraction method based on soft parameter sharing comprises the following steps:

S1: Unsafe event text data in the civil aviation field are collected and preprocessed to construct an event extraction data set. The data set is stored as data files in Json format and is divided into a training set, a validation set and a test set in an 8:1:1 ratio.

S2: Each piece of Json-format event data in the divided data set is processed to obtain the input format and the corresponding label data required by the model.

S3: A civil aviation unsafe event joint extraction model based on soft parameter sharing is established, comprising an embedding layer, a coding layer and a decoding prediction layer.

S4: The processed model input data are fed randomly, in batches of size batchsize, into the constructed joint extraction model for iterative training. After each epoch the model is evaluated on the validation set, with precision, recall and the F1 value as evaluation metrics. If the F1 values of event recognition and argument role classification are both greater than those of the previous training, the model is saved. Early stopping is set: if the F1 value does not increase for a certain number of iterations, training stops and the model that performed best on the validation set is kept.

S5: The finally saved optimal model is taken as the target model, the test set samples are fed into it in batches, and the extraction results are output and saved.
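The model-saving and early-stopping rule of S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `select_best_epoch`, the `patience` value and the list of per-epoch F1 pairs are hypothetical, and the comparison is made against the most recently saved model, which is one possible reading of "the previous training".

```python
from typing import List, Optional, Tuple


def select_best_epoch(
    f1_history: List[Tuple[float, float]], patience: int = 3
) -> Tuple[Optional[int], int]:
    """Return (best_epoch, stop_epoch) for a list of (trigger_f1, argument_f1).

    A checkpoint is "saved" only when BOTH validation F1 values exceed those
    of the last saved checkpoint. Training stops once `patience` consecutive
    epochs pass without a new checkpoint (hypothetical setting).
    """
    best_trigger, best_argument = -1.0, -1.0
    best_epoch: Optional[int] = None
    epochs_without_gain = 0
    stop_epoch = len(f1_history) - 1

    for epoch, (trigger_f1, argument_f1) in enumerate(f1_history):
        if trigger_f1 > best_trigger and argument_f1 > best_argument:
            best_trigger, best_argument = trigger_f1, argument_f1
            best_epoch = epoch          # a real trainer would save the model here
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                stop_epoch = epoch      # early stop
                break
    return best_epoch, stop_epoch
```

Requiring both F1 values to improve keeps a checkpoint from being chosen when one task gains at the expense of the other, which is the seesaw behavior the method is meant to avoid.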

Further, the specific steps in S1 for collecting unsafe event text data in the civil aviation field, preprocessing the data and constructing the event extraction data set are as follows:

S11: The collected civil aviation unsafe event texts are labeled according to predefined event types, and each sentence in the text that contains an event is labeled with its specific event type.

S12: For each event type, the arguments and argument roles are labeled according to the event frame defined for that type.

S13: The final data set is stored as processed data files in Json format and is divided into a training set, a validation set and a test set in an 8:1:1 ratio. Each processed record in the data set comprises the original event sentence, the event type, the trigger word start position index, the argument start position index and the argument role type.
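As a rough illustration of S13, the sketch below shows one possible shape of a processed record and an 8:1:1 split. The field names, the event type `BirdStrike`, the role `StrikePosition`, the example sentence and the function `split_dataset` are all hypothetical; the source only lists which pieces of information each record contains.

```python
import json
import random
from typing import Dict, List, Tuple


def split_dataset(
    records: List[Dict], seed: int = 0
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Shuffle records and split them 8:1:1 into train/validation/test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Hypothetical record; field names and values are illustrative only.
example_record = {
    "sentence": "The aircraft suffered a bird strike on the radome during approach.",
    "event_type": "BirdStrike",
    "trigger": {"text": "bird strike", "start": 24},
    "arguments": [
        {"role": "StrikePosition", "text": "radome", "start": 43},
    ],
}

# One record serialized as a Json line, as it might be stored on disk.
serialized = json.dumps(example_record, ensure_ascii=False)
```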

Further, processing the Json-format event data in the divided data set in S2 to obtain the input format and the corresponding label data required by the model specifically includes:

S21: The training set, the validation set and the test set are each processed into one tsv file for event recognition and one tsv file for argument role classification.

S22: The tsv files are processed into the input format and the corresponding label data required by the pre-training model. To obtain the model input, each input sentence is split into a character-level sequence and processed by the tokenizer, producing input data that contain input_ids, token_type_ids and attention_mask; these are fed into the model together with the processed label data.
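The processing in S22 can be sketched as below. The `encode` function is a toy stand-in for a BERT-style tokenizer and is not the ChineseBERT tokenizer; the vocabulary, the token ids, the span format and both function names are assumptions made only to show how character-level BIO labels and the three input fields relate to one sentence.

```python
from typing import Dict, List


def bio_labels(length: int, spans: List[Dict]) -> List[str]:
    """Build character-level BIO tags from spans given as start/end/label.

    `end` is exclusive; characters outside every span are tagged "O".
    """
    tags = ["O"] * length
    for span in spans:
        tags[span["start"]] = "B-" + span["label"]
        for i in range(span["start"] + 1, span["end"]):
            tags[i] = "I-" + span["label"]
    return tags


def encode(chars: List[str], vocab: Dict[str, int], max_len: int) -> Dict[str, List[int]]:
    """Toy encoder: add [CLS]/[SEP], map characters to ids, pad to max_len."""
    tokens = ["[CLS]"] + chars[: max_len - 2] + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    pad = max_len - len(ids)
    return {
        "input_ids": ids + [vocab["[PAD]"]] * pad,
        "token_type_ids": [0] * max_len,              # single-sentence input
        "attention_mask": [1] * len(ids) + [0] * pad,  # 0 marks padding
    }
```

In practice the label sequence would also need to be aligned with the added [CLS], [SEP] and padding positions before it is fed to the model.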

Further, the civil aviation unsafe event joint extraction model based on soft parameter sharing in S3 is implemented through the following steps:

S31: The input event sentence is mapped to a character-level embedded vector representation by the embedding layer.

S32: The vector representations are input into the shared network, the private networks and the gating networks of the coding layer, where feature extraction and feature fusion yield a feature representation for each of the two tasks.

S33: The feature representations are input into the decoding prediction layers of the two tasks respectively and decoded with a conditional random field (CRF), which selects the globally optimal sentence-level tag sequence from all possible tag sequences as the final output.

Furthermore, the embedding layer in step S31 adopts ChineseBERT as the pre-training model to obtain the embedded vector representations of the characters of the input sequence. ChineseBERT integrates the glyph and pinyin information of characters in its pre-training stage and is therefore better suited to Chinese natural language processing tasks. The character vector representation of each input event sentence is obtained through ChineseBERT and used as the feature input of the coding layer.

Further, the coding layer in step S32 mainly comprises a shared network, private networks and gating networks, and is divided by the gating networks into a shared network layer and a private-shared network layer. The shared network consists of a group of fully connected networks as sub-networks. The character-level embedded vectors obtained by the embedding layer are input into each sub-network to acquire knowledge shared by the two tasks, and each task screens the knowledge learned by this group of sub-networks through its first-layer gating network to obtain its own shared knowledge. The private layer is the network layer exclusive to each task: a BiGRU network serves as the private network of each task, which makes it easy to capture the dependencies between characters. The feature vector representation it extracts is selectively fused with the feature vector representation extracted by the first-layer shared network through the second-layer gating network. Each gating network is a weighting function that computes a weight vector through a linear transformation and a softmax layer: the first layer learns, from the embedding layer's output, a weight for the output features of each sub-network of the shared network, and the second layer learns weights for the output features of the private network and of the shared network layer. Finally, the resulting feature representation is the output of each task's coding layer and is passed to the decoding prediction layer.

Further, the decoding prediction layer in step S33 mainly comprises a fully connected network and a CRF network, and both tasks are handled as sequence labeling. A softmax classifier is commonly chosen to solve the multi-class problem in the prediction stage, but it cannot take into account the dependencies between labels in a sequence labeling problem, for example that the first label of a trigger word or argument begins with B- rather than I-, and that a group of consecutive trigger word or argument labels generally shares the same type. A CRF model accounts for the constraints between labels through a transition matrix and can therefore obtain a globally optimal tag sequence. In the decoding prediction stage, the Viterbi dynamic programming algorithm is used to find the tag sequence with the highest total score as the optimal sequence.

Further, for the iterative training process described in step S4, in order to better balance the difference between the two tasks in the training process, so that the training speeds of the two tasks are as consistent as possible, a dynamic weighting mode is adopted to allocate new loss weights to the two tasks in each round of training, the method learns the weight value of each task by considering the change rate of the loss, and finally the weighted task losses are added to be the total loss.

Further, the specific steps in S5 for feeding the test set samples into the target model in batches and outputting and saving the extraction results are as follows:

S51: The event sentences in the test set are input into the saved optimal model in batches, and the model outputs the optimal sequence predicted for the event recognition task and the optimal sequence predicted for argument role classification, each as a Json file.

S52: The two results are merged to obtain an event knowledge file in complete structured form.
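The merging in S52 might look like the following sketch, which decodes the two predicted BIO sequences of one sentence into spans and combines them into a single structured record. The output field names, the label names in the usage note and the functions `decode_spans` and `merge_event` are hypothetical, since the source does not specify the layout of the event knowledge file.

```python
from typing import Dict, List, Optional, Tuple


def decode_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) pairs from a BIO-tagged character sequence."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):       # sentinel flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, "".join(chars[start:i])))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def merge_event(chars: List[str], trigger_tags: List[str], argument_tags: List[str]) -> Dict:
    """Combine the outputs of the two tasks into one structured record."""
    triggers = decode_spans(chars, trigger_tags)
    arguments = decode_spans(chars, argument_tags)
    return {
        "sentence": "".join(chars),
        "events": [{"event_type": t, "trigger": text} for t, text in triggers],
        "arguments": [{"role": r, "text": text} for r, text in arguments],
    }
```

For example, tagging the last two characters of "飞机雷达罩遭鸟击" as a `BirdStrike` trigger and characters 2 to 4 as a `Position` argument yields one event with trigger "鸟击" and one argument "雷达罩".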

Compared with the prior art, the civil aviation unsafe event combined extraction method based on soft parameter sharing has the following advantages:

(1) The invention explicitly separates shared parameters from task-specific parameters through soft parameter sharing and uses a double-layer gating network to strengthen the model's ability to extract and screen semantic knowledge, so that the model can learn suitable feature representations for both tasks at the same time, achieve more efficient information sharing and joint representation learning, and effectively alleviate the seesaw phenomenon;

(2) The invention uses a CRF network in the decoding prediction layer, which accounts for the constraints between labels through a transition matrix to obtain a globally optimal tag sequence, effectively improving the prediction effect of the model;

(3) In order to better balance the difference of two tasks in the training process and enable the training speeds of the two tasks to be as consistent as possible, the invention adopts a dynamic weighting mode to distribute new loss weights for training of the two tasks in each round so as to help the tasks to train to an optimal result at the same time.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:

FIG. 1 is a schematic diagram of the method of the present invention;

FIG. 2 is an example of civil aviation unsafe event extraction;

FIG. 3 is a diagram of the civil aviation unsafe event joint extraction model based on soft parameter sharing;

FIG. 4 is a block diagram of a GRU network;

FIG. 5 is a structure diagram of a BiGRU network.

Detailed Description

It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature. In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.

In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art in a specific case.

The invention will be described in detail below with reference to the drawings in connection with embodiments.

As shown in fig. 1 and 2, an embodiment of the present invention provides a civil aviation unsafe event joint extraction method based on soft parameter sharing, including:

S2: Each piece of Json-format event data in the divided data set is processed to obtain the input format and the corresponding label data required by the model. The training set, the validation set and the test set are each processed into one tsv file for event recognition and one tsv file for argument role classification. The tsv files are then processed into the input format and corresponding label data required by the pre-training model: each input sentence is split into a character-level sequence and processed by the tokenizer, producing input data that contain input_ids, token_type_ids and attention_mask, which are fed into the model together with the processed label data.

S3: The processed model input data are fed randomly, in batches of size batchsize, into the constructed civil aviation unsafe event joint extraction model based on soft parameter sharing for iterative training. After each epoch the model is evaluated on the validation set, with precision, recall and the F1 value as evaluation metrics. If the F1 values of event recognition and argument role classification are both greater than those of the previous training, the model is saved. Early stopping is set: if the F1 value does not increase for a certain number of iterations, training stops and the model that performed best on the validation set is kept. The joint extraction model comprises an embedding layer, a coding layer and a decoding prediction layer. First, the input event sentence is mapped to a character-level embedded vector representation by the embedding layer. The vector representations are then input into the shared network, the private networks and the gating networks of the coding layer, where feature extraction and feature fusion yield a feature representation for each of the two tasks. Finally, the feature representations are input into the decoding prediction layers of the two tasks respectively and decoded with a conditional random field (CRF), which selects the globally optimal sentence-level tag sequence from all possible tag sequences as the final output.

S4: The finally saved optimal model is taken as the target model, the test set samples are fed into it in batches, and the extraction results are output and saved.

Specifically, in one embodiment, the preprocessing of the data in S1 includes labeling the collected text of the civil aviation unsafe event according to predefined event types, labeling specific event types for sentences containing the event types in the text, and labeling the argument roles and arguments in the text according to defined event frames for different event types.

Specifically, in one embodiment, the final data set format in S1 is a processed Json format data file, and is divided into a training set, a verification set and a test set according to the ratio of 8:1:1; each piece of processed data of the data set comprises an original event sentence, an event type, a trigger word starting position index, an argument starting position index and an argument role type.

Specifically, in one embodiment, the character vector representation of the input text in S2 is obtained through ChineseBERT. First, for an input sentence $S$, data processing yields the segmented sequence $T = \{t_0, t_1, \dots, t_n\}$ with $t_i \in V_c$, where $t_0$ and $t_n$ are the special characters [CLS] and [SEP] required as input to ChineseBERT and $n$ is the maximum length of the sequence. Each character $t_i$ obtains an embedded representation $x_i \in \mathbb{R}^{d}$ through ChineseBERT, where $d$ is the dimension of the embedded vector:

$$x_i = \mathrm{Embedding}(t_i)$$

An event sentence is thus embedded as $X = \{x_0, x_1, \dots, x_n\}$.

Specifically, in one embodiment, as shown in FIG. 3, the shared network in S3 is formed by a group of sub-networks, each a single-layer fully connected network, denoted $f_i$, $i = 1, \dots, m$, where $m$ is the number of sub-networks. The output of task $k$ after the first-layer shared network is:

$$H_k^{s} = \sum_{i=1}^{m} g_{k,i}^{(0)}(x)\, f_i(x)$$

where $k = 1, 2$, with 1 denoting the event recognition task and 2 the argument role classification task; the superscript $(0)$ denotes the first layer; $x$ is the output vector of the embedding layer; $g_{k,i}^{(0)}(x)$ is the weight that the first-layer gating network of task $k$ assigns to the output $f_i(x)$ of the $i$-th sub-network; and $g_k^{(0)}$ is the weighting function that computes the weight vector through a linear transformation and a softmax layer, expressed as

$$g_k^{(0)}(x) = \mathrm{softmax}\left(W_k^{(0)} x\right)$$

where $W_k^{(0)}$ is a trainable matrix, $m$ is the number of sub-networks, $d$ is the feature dimension of the input and $n$ is the length of the input sequence. The first-layer gating network thus integrates the output features of the sub-networks of the shared network, so that the information each task needs can be screened out of the shared network while information sharing is preserved.
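The first-layer gating over the shared sub-networks can be illustrated with the small sketch below, written with plain Python lists so that it has no dependencies. It rests on several assumptions not stated in the source: the gate is applied to one character vector at a time with a gating matrix of shape (m x d), each sub-network is reduced to a single weight matrix without bias or activation, and the names `gated_shared_output`, `experts` and `gate` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    top = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_shared_output(x: Vector, experts: List[Matrix], gate: Matrix) -> Vector:
    """Weighted sum of sub-network outputs for one task and one character.

    experts: m weight matrices, each (hidden x d), standing in for f_1..f_m.
    gate:    the task's trainable gating matrix, assumed shape (m x d).
    """
    weights = softmax(matvec(gate, x))            # one weight per sub-network
    outputs = [matvec(w, x) for w in experts]     # f_i(x) for i = 1..m
    hidden = len(outputs[0])
    return [
        sum(weights[i] * outputs[i][j] for i in range(len(experts)))
        for j in range(hidden)
    ]
```

Because each task owns its own gating matrix, the two tasks can weight the same sub-network outputs differently, which is what separates this scheme from hard parameter sharing.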

Specifically, in one embodiment, in the private-shared network layer in S3, as shown in FIG. 3, the private layers of the two tasks are BiGRU networks, which capture the dependencies between characters. The structure of the GRU (Gated Recurrent Unit) is shown in FIG. 4, and its working principle can be expressed by the following formulas:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$

$$\tilde{h}_t = \tanh(W_{\tilde{h}} \cdot [r_t * h_{t-1}, x_t])$$

$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

where $x_t$ is the information input at the current time step, $h_{t-1}$ is the hidden state at the previous time step, $h_t$ is the hidden state passed to the next time step, $\tilde{h}_t$ is the candidate hidden state, $r_t$ is the reset gate, $z_t$ is the update gate, $\sigma$ is the sigmoid activation function, which maps its input to a value in (0, 1), tanh is the activation function that maps its input to a value in (-1, 1), $*$ denotes element-wise multiplication, and $W_z$, $W_r$ and $W_{\tilde{h}}$ are trainable parameter matrices.
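A single GRU update can be written out in a few lines. The sketch below follows the standard GRU formulation without bias terms, in which the new hidden state is (1 - z_t) * h_{t-1} + z_t * candidate; the function name `gru_step` and the parameter names are illustrative, and a trained model would of course use learned matrices rather than the values supplied by the caller.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def gru_step(x_t: Vector, h_prev: Vector, w_z: Matrix, w_r: Matrix, w_h: Matrix) -> Vector:
    """One GRU update; each matrix has shape (hidden x (hidden + input))."""
    concat = h_prev + x_t                                   # [h_{t-1}, x_t]
    z_t = [sigmoid(v) for v in matvec(w_z, concat)]         # update gate
    r_t = [sigmoid(v) for v in matvec(w_r, concat)]         # reset gate
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t      # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in matvec(w_h, gated)]     # candidate state
    return [(1 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]
```

A BiGRU runs this step over the sequence once from left to right and once from right to left with separate parameters, then concatenates the two hidden states at each position.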

Specifically, in one embodiment, in S3, in order to better capture the bidirectional semantic dependencies of the text, the private network adopts a BiGRU model, as shown in FIG. 5. It consists of two unidirectional GRUs: the input sequence is fed into the two GRU networks in forward and in reverse order respectively for feature extraction, and the extracted feature vectors are concatenated as the final network output. The corresponding calculation formulas are as follows:

$$\overrightarrow{h_t} = \mathrm{GRU}(x_t, \overrightarrow{h_{t-1}})$$

$$\overleftarrow{h_t} = \mathrm{GRU}(x_t, \overleftarrow{h_{t+1}})$$

$$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$$

Specifically, in one embodiment, in S3, each of the two tasks finally learns its task-specific feature representation through its BiGRU private network and selectively fuses it, through the second-layer gating network, with the feature representation learned by the shared network layer, so that both tasks can learn as much shared knowledge and task-specific private knowledge as possible in the coding layer. The formula is as follows:

$$O_k = g_{k,1}^{(1)} H_k + g_{k,2}^{(1)} H_k^{s}$$

where the superscript $(1)$ denotes the second layer, $g_{k,1}^{(1)}$ is the weight that the second-layer gating network of task $k$ assigns to the output features of the BiGRU private network, $g_{k,2}^{(1)}$ is the weight that it assigns to the output features of the shared network, $H_k$ is the feature representation vector learned by the BiGRU network, $H_k^{s}$ is the feature representation vector learned by the shared network layer, and $O_k$ is the fused output of the coding layer for task $k$.
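The selective fusion performed by the second-layer gating network can be sketched as a two-way softmax blend. Several details here are assumptions rather than statements of the source: the gate is driven by a separate `gate_input` vector (for instance the embedding of the character), the private and shared feature vectors have the same length, and the names `fuse`, `private` and `shared` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(private: Vector, shared: Vector, gate: Matrix, gate_input: Vector) -> Vector:
    """Blend private and shared features with two softmax-normalized weights.

    gate has shape (2 x len(gate_input)): row 0 scores the private BiGRU
    features and row 1 scores the shared-layer features.
    """
    scores = [sum(w * v for w, v in zip(row, gate_input)) for row in gate]
    w_private, w_shared = softmax(scores)
    return [w_private * p + w_shared * s for p, s in zip(private, shared)]
```

With an all-zero gate the two sources are averaged; as training moves the gate, each task can lean more on its private features or on the shared ones.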

Specifically, in one embodiment, the decoding prediction layer in S3 mainly comprises a linear layer and a CRF layer. For the output sequence $Z = \{z_1, z_2, \dots, z_n\}$ of the linear layer of one task and a tag sequence $Y = \{y_1, y_2, \dots, y_n\}$ output by the CRF, the total score of the tag sequence is:

$$S(Z, Y) = \sum_{i=1}^{n-1} T_{y_i, y_{i+1}} + \sum_{i=1}^{n} z_{i, y_i}$$

where $T$ is the transition score matrix, $T_{y_i, y_{i+1}}$ is the score of the transition from label $y_i$ to label $y_{i+1}$, and $z_{i, y_i}$ is the output score of the $i$-th character under label $y_i$. With $S(Z, \tilde{Y})$ denoting the score of each tag sequence $\tilde{Y}$ that may be output, the probability distribution over the output sequence $Y$ is:

$$P(Y \mid Z) = \frac{e^{S(Z, Y)}}{\sum_{\tilde{Y} \in Y_Z} e^{S(Z, \tilde{Y})}}$$

where $Y_Z$ is the set of all possible tag sequences for $Z$. During training, the CRF layer is optimized to maximize the log-likelihood of the correct tag sequence $Y^*$, where $S(Z, Y^*)$ is the score of the correct tag sequence:

$$\log P(Y^* \mid Z) = S(Z, Y^*) - \log \sum_{\tilde{Y} \in Y_Z} e^{S(Z, \tilde{Y})}$$

The loss function of each task is defined as

$$\mathrm{Loss} = -\log P(Y^* \mid Z)$$
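The CRF loss can be computed without enumerating every tag sequence by using the forward algorithm for the normalization term. The sketch below is a plain-Python illustration under simplifying assumptions: there are no dedicated start or end transitions, tags are integer indices, `emissions[i][t]` is the linear-layer score of position i under tag t, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sequence_score(emissions: Matrix, transitions: Matrix, tags: List[int]) -> float:
    """Total score of one tag sequence: emission scores plus transition scores."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def log_sum_exp(values: List[float]) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    num_tags = len(transitions)
    alpha = list(emissions[0])
    for row in emissions[1:]:
        alpha = [
            row[j] + log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)


def crf_loss(emissions: Matrix, transitions: Matrix, gold_tags: List[int]) -> float:
    """Negative log-likelihood of the gold tag sequence."""
    return log_partition(emissions, transitions) - sequence_score(
        emissions, transitions, gold_tags
    )
```

The forward recursion costs time proportional to the sequence length times the square of the number of tags, instead of growing exponentially with the sequence length.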

In the decoding prediction stage, a Viterbi dynamic programming algorithm is adopted to solve a label sequence with the highest total score as an optimal sequence.
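Viterbi decoding over the same kind of score tables can be sketched as follows. As before, this is an illustration rather than the implementation of the invention: tags are integer indices, no start or end transitions are modeled, and the function name `viterbi` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def viterbi(emissions: Matrix, transitions: Matrix) -> List[int]:
    """Return the tag index sequence with the highest total score."""
    num_tags = len(transitions)
    score = list(emissions[0])              # best score ending in each tag
    backpointers: List[List[int]] = []
    for row in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j on this step.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            step_back.append(best_i)
            step_scores.append(score[best_i] + transitions[best_i][j] + row[j])
        score = step_scores
        backpointers.append(step_back)
    best_last = max(range(num_tags), key=lambda t: score[t])
    path = [best_last]
    for step_back in reversed(backpointers):  # follow pointers back to the start
        path.append(step_back[path[-1]])
    path.reverse()
    return path
```

With tags ordered as O, B, I and a large negative transition score from O to I, the decoder prefers a legal B, I sequence even when the first position scores highest as O on its own, which is the label-constraint behavior a plain softmax classifier lacks.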

Specifically, in one embodiment, in the iterative training process described in S4, in order to better balance the difference between the two tasks during training so that their training speeds are as consistent as possible, a dynamic weighting scheme assigns new loss weights to the two tasks in each round of training. The method learns the weight of each task from the rate of change of its loss, and the weighted task losses are added to form the total loss. The formulas are as follows:

$$r_k(e-1) = \frac{L_k(e-1)}{L_k(e-2)}$$

$$w_k(e) = \frac{2 \exp(r_k(e-1))}{\sum_{j=1}^{2} \exp(r_j(e-1))}$$

$$\mathrm{Loss} = w_1(e)\, \mathrm{Loss}_{Trigger} + w_2(e)\, \mathrm{Loss}_{Argument}$$

where $e$ is the index of the training round (epoch); $r_k$ is the ratio of the loss of task $k$ in the previous round to its loss in the round before that, represents the rate of change of the loss, and takes the value 1 when the epoch is 0 or 1; $L_k$ is the average loss of task $k$ over an epoch; $w_k$ is the loss weight of task $k$, obtained by normalizing $r_k$ with softmax and multiplying by 2; $\mathrm{Loss}_{Trigger}$ is the loss of the event recognition task in the current training step; and $\mathrm{Loss}_{Argument}$ is the loss of the argument role classification task.
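The dynamic weighting described above can be sketched as follows: the ratio of each task's last two average epoch losses is passed through a softmax and scaled by the number of tasks. The function names and the layout of `loss_history` are assumptions, and no temperature term is used because the text does not mention one.

```python
import math
from typing import List


def dynamic_weights(loss_history: List[List[float]]) -> List[float]:
    """Loss weights for the next epoch from per-epoch average task losses.

    loss_history[e][k] is the average loss of task k in finished epoch e
    (k = 0 for event recognition, k = 1 for argument role classification).
    With fewer than two finished epochs every ratio r_k is set to 1.
    """
    num_tasks = 2
    if len(loss_history) < 2:
        ratios = [1.0] * num_tasks
    else:
        prev, before = loss_history[-1], loss_history[-2]
        ratios = [prev[k] / before[k] for k in range(num_tasks)]
    exps = [math.exp(r) for r in ratios]
    total = sum(exps)
    return [num_tasks * e / total for e in exps]   # weights always sum to 2


def total_loss(weights: List[float], loss_trigger: float, loss_argument: float) -> float:
    """Weighted sum of the two task losses."""
    return weights[0] * loss_trigger + weights[1] * loss_argument
```

A task whose loss is falling quickly has a small ratio and therefore receives a smaller weight, so the slower task gets more of the gradient in the next round.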

Specifically, in one embodiment, step S4 of feeding the test set samples into the target model in batches and outputting and saving the extraction results comprises the following steps: 1. The event sentences in the test set are input into the saved optimal model in batches, and the model outputs the optimal sequence predicted for the event recognition task and the optimal sequence predicted for argument role classification, each as a Json file; 2. The two results are merged to obtain an event knowledge file in complete structured form.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A civil aviation unsafe event joint extraction method based on soft parameter sharing, characterized in that the method comprises the following steps:

S1: collecting unsafe event text data in the civil aviation field, preprocessing the data, and constructing an event extraction data set, wherein the data set is processed into JSON-format data files and is divided into a training set, a validation set and a test set in a ratio of 8:1:1;

S2: processing each JSON-format event record in the divided data set to obtain the input format and the corresponding tag data required by the model;

The embedding layer adopts ChineseBERT as the pre-training model to obtain the embedded vector representation of the character set of the input sequence; the character vector representation of each event sentence is obtained through ChineseBERT and used as the feature input of the coding layer;

The coding layer comprises a shared network, private networks and gating networks, and is divided by the gating networks into a shared network layer and a private-shared network layer; the shared network consists of a group of fully connected networks serving as sub-networks, and the character-level embedded vectors obtained by the embedding layer are input into each sub-network to acquire knowledge shared by the two tasks; each of the two tasks screens the knowledge learned by the group of shared sub-networks through its own first-layer gating network and takes the result as the shared knowledge learned for that task; the private layer is a network layer exclusive to each of the two tasks, with a BiGRU network serving as the private network of each task; the feature vector representation extracted by the private network and the feature vector representation extracted by the first-layer shared network undergo selective feature fusion through a second-layer gating network, and the resulting feature representation is output to the decoding prediction layer;

the decoding prediction layer comprises a fully connected network and a CRF network; both tasks are handled as sequence labeling, and in the decoding prediction stage the Viterbi dynamic programming algorithm is used to find the label sequence with the highest total score as the optimal sequence;

the implementation process of the civil aviation unsafe event joint extraction model based on soft parameter sharing is as follows:

S31: mapping the input event sentence into a character-level embedded vector representation through the embedding layer;

S32: inputting the vector representation into the shared network, the private networks and the gating networks of the coding layer, and performing feature extraction and feature fusion to obtain the respective feature representations of the two tasks;

S33: inputting the feature representations into the decoding prediction layers of the two tasks respectively, decoding them through a conditional random field, and selecting the sentence-level globally optimal tag sequence from all possible tag sequences as the final output;

S4: feeding the data processed in step S2 in random batches of size batchsize into the constructed civil aviation unsafe event joint extraction model based on soft parameter sharing for iterative training, and verifying the effect of the model on the validation set after each epoch, wherein the evaluation indexes are precision, recall and F1 value; if the F1 values of both event identification and argument role classification are greater than those of the previous training round, saving the model; and setting early stopping, so that if the F1 value does not rise for a certain number of iterations, training is stopped and the model that performed best on the validation set is saved;
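
The model-selection rule in S4 can be sketched as follows. The sketch reads the first evaluation index as precision, compares each epoch against the best values saved so far, and uses a hypothetical `patience` parameter for the unspecified number of iterations; all three points are assumptions of the example.

```python
def precision_recall_f1(n_correct, n_predicted, n_gold):
    """Precision, recall and F1 from counts of correct, predicted and gold items."""
    p = n_correct / n_predicted if n_predicted else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def select_best_epoch(f1_per_epoch, patience):
    """Pick the epoch to keep, given (trigger_f1, argument_f1) per epoch.

    A new best is recorded only when BOTH F1 values improve; training stops
    after `patience` consecutive epochs without such an improvement.
    """
    best = (float("-inf"), float("-inf"))
    best_epoch, stale = None, 0
    for epoch, (trigger_f1, argument_f1) in enumerate(f1_per_epoch):
        if trigger_f1 > best[0] and argument_f1 > best[1]:
            # A real training loop would save the model checkpoint here.
            best, best_epoch, stale = (trigger_f1, argument_f1), epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best
```

Requiring both F1 values to improve keeps the saved checkpoint from favoring one task at the expense of the other.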

2. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, characterized in that S1 specifically comprises:

S11: according to the collected civil aviation unsafe event texts and the predefined event types, labeling each sentence in the text that contains an event with its specific event type;

S12: for each event type, labeling the arguments and argument roles according to the event frame defined for that event type;

3. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, characterized in that S2 specifically comprises:

S21: processing the training set, the validation set and the test set respectively into a tsv-format file for event identification and a tsv-format file for argument role classification;

S22: processing the tsv-format files into the input format and corresponding tag data required by the pre-training model, so as to obtain the input data required by the model, comprising: splitting each input sentence into a character-level sequence and processing it with the tokenizer, so that the input data containing input_ids, token_type_ids and attention_mask are fed into the model together with the processed tag data.
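
The shape of the input produced in S22 can be illustrated with a toy character-level encoder. This is a stand-in written for illustration, not the ChineseBERT tokenizer: the vocabulary, the special-token ids, the `labels` field, and the choice to label the special and padding positions as `O` are all assumptions.

```python
def encode(sentence, tags, vocab, tag2id, max_len):
    """Encode one sentence and its tags into fixed-length model inputs."""
    chars = list(sentence)[: max_len - 2]   # leave room for [CLS] and [SEP]
    tags = list(tags)[: max_len - 2]
    input_ids = (
        [vocab["[CLS]"]]
        + [vocab.get(c, vocab["[UNK]"]) for c in chars]
        + [vocab["[SEP]"]]
    )
    label_ids = [tag2id["O"]] + [tag2id[t] for t in tags] + [tag2id["O"]]
    pad = max_len - len(input_ids)
    return {
        "input_ids": input_ids + [vocab["[PAD]"]] * pad,
        "token_type_ids": [0] * max_len,               # single-sentence input
        "attention_mask": [1] * len(input_ids) + [0] * pad,
        "labels": label_ids + [tag2id["O"]] * pad,
    }
```

Each field has length `max_len`, and `attention_mask` marks which positions hold real characters rather than padding.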

4. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, characterized in that the gating network in S3 computes the weight vector through a linear transformation followed by a softmax layer; in the first layer it learns, based on the input from the embedding layer, the weight of the output features of each sub-network of the shared network, and in the second layer it learns the weights of the output features of the private network and of the output features of the shared network layer.
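
The gating computation can be sketched for a single character position as below. The sketch assumes that both gating layers take the embedding vector as input and that fusion is a weighted sum of the candidate feature vectors; the second-layer input, the fusion operator, the fixed stand-in for the BiGRU output, and all names and numbers are assumptions of the example.

```python
import math

def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weight, x):
    """Matrix-vector product: one output score per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def gate_fuse(gate_weight, gate_input, features):
    """Weight candidate feature vectors by softmax(gate_weight @ gate_input)."""
    weights = softmax(linear(gate_weight, gate_input))
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, fused

# Toy usage for one task and one character position.
x = [1.0, -1.0]                                    # embedding-layer output
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # three shared sub-network outputs
gate1 = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]       # first-layer gate parameters
w1, shared = gate_fuse(gate1, x, experts)          # task-specific shared knowledge
private = [2.0, 2.0]                               # stand-in for the BiGRU output
gate2 = [[0.0, 0.0], [0.0, 0.0]]                   # second-layer gate parameters
w2, fused = gate_fuse(gate2, x, [private, shared]) # feature passed to the decoder
```

Because each task owns its gate parameters, the two tasks can draw different mixtures from the same shared sub-networks, which is the soft form of parameter sharing the method relies on.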

5. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, characterized in that in S4 the iterative training process uses a dynamic weighting scheme to assign new loss weights to the two tasks in each training round, determines the weight of each task from the rate of change of its loss, and finally sums the weighted task losses to give the total loss.

6. The civil aviation unsafe event joint extraction method based on soft parameter sharing according to claim 1, characterized in that S5 specifically comprises:

S51: inputting the event sentences of the test set in batches into the saved optimal model, and outputting the optimal sequence predicted for the event identification task and the optimal sequence predicted for the argument role classification task, each as a JSON-format file;