
CN118069840A - Domain knowledge guided air traffic control unsafe event classification method - Google Patents

Domain knowledge guided air traffic control unsafe event classification method

Info

Publication number
CN118069840A
CN118069840A (application CN202410156217.5A)
Authority
CN
China
Prior art keywords
text
domain knowledge
attention
deep
air traffic control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410156217.5A
Other languages
Chinese (zh)
Inventor
曾维理
郭子逸
朱聃
江灏
周亚东
谭湘花
刘继新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202410156217.5A
Publication of CN118069840A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a domain-knowledge-guided method for classifying air traffic control (ATC) unsafe events. Pre-acquired unsafe event text data are categorized and preprocessed; terms are detected and classified in an unsupervised manner based on a byte pair encoding algorithm and a term-frequency method, and a domain knowledge base of unsafe events is constructed; a deep learning text classification model, comprising an embedding module, a wide module, a deep module, and a classifier, is built to classify ATC unsafe event texts under the guidance of domain knowledge; and historical unsafe event records from the ATC hazard-source database serve as input samples for model training, with model parameters updated iteratively, realizing accurate domain-knowledge-guided classification of ATC unsafe events. The method greatly reduces the difficulty of acquiring domain knowledge, reduces the loss in the interaction between domain knowledge and the input text, and facilitates accurate domain-knowledge-guided classification of ATC unsafe events.

Description

Domain knowledge guided air traffic control unsafe event classification method
Technical Field
The invention belongs to the technical field of digitization and intelligence of civil aviation air traffic control, and particularly relates to a domain-knowledge-guided air traffic control unsafe event classification method.
Background
In recent years, civil aviation transport demand has grown rapidly. To improve safety assurance capability, the collected civil aviation safety information must be fully exploited to assess safety conditions and trends, realizing information-driven safety management, which is essential for safe civil aviation transport. To reach this information-driven goal, automatic classification of the event data in safety information is critical. Automatic event classification is a precondition for applying statistical methods such as causal inference analysis to civil aviation safety management, and greatly helps to discover hidden safety hazards in time, control risks, and prevent civil aviation accidents. Air traffic control (ATC) is the central nervous system of civil aviation, so intelligent analysis of ATC unsafe events deserves attention first.
ATC unsafe event reports are typically recorded as short texts. Existing short text classification techniques fall into three main types. (1) Rule-based methods: such methods divide texts into categories by matching the text to be classified against a set of manually constructed linguistic rules. Rule-based text classification is useful when available training data are limited and offers a transparent, interpretable decision process; however, it relies on a large amount of manual work to create and maintain the rules. (2) Machine-learning-based methods: such methods usually decompose text classification into a two-step task: first design suitable feature extraction and dimensionality reduction according to the characteristics of the text, then train machine learning models to complete the regression from features to categories. For text classification, the greatest drawback of such methods is that the feature engineering is complex, depends heavily on domain knowledge, and generalizes poorly. (3) Deep-learning-based methods: manual feature extraction is replaced by introducing an embedding model that maps texts into low-dimensional continuous feature vectors. On top of the embedded vectors, deep learning models such as LSTM, CNN, and their combinations can serve as classifiers to complete the text classification task.
The prior art currently has the following problems. (1) In highly specialized domains such as air traffic control, performance degrades because of specialized domain knowledge: a deep learning model needs more training data than in general domains to fit domain causal relationships and learn embeddings of domain terms. (2) It is difficult to handle the domain drift caused by the diversity of training-sample expressions. For example, for the category "military-civil aviation conflict", most sample texts contain "military aircraft" or a near synonym, while a small portion never mention "military" but instead describe specific aircraft types such as "bomber". Both expressions actually predict the class through the hidden variable "military aircraft"; what is observed in a sample, however, is only one surface mapping of that hidden variable. Under this influence the model fails to learn the true hidden variable, leading to misjudgments.
Disclosure of Invention
The invention aims to: addressing the above problems in extracting key features of hazard sources for unsafe events, the invention provides a domain-knowledge-guided air traffic control unsafe event classification method, which realizes accurate classification of ATC unsafe events under the guidance of domain knowledge.
The technical scheme is as follows: the domain-knowledge-guided air traffic control unsafe event classification method of the invention comprises the following steps:
(1) Categorizing pre-acquired unsafe event text data and preprocessing the text data;
(2) Detecting and classifying terms in an unsupervised manner based on a byte pair encoding algorithm and a term-frequency method, and constructing a domain knowledge base of unsafe events;
(3) Constructing a deep learning text classification model to classify ATC unsafe event texts under the guidance of domain knowledge; the text classification deep learning model comprises an embedding module, a wide module, a deep module, and a classifier;
(4) Training the model with historical unsafe event records from the ATC hazard-source database as input samples, and iteratively updating the model parameters, thereby realizing accurate domain-knowledge-guided classification of ATC unsafe events.
Further, the categorization of the pre-acquired unsafe event text data in step (1) is implemented as follows:
The unsafe event text data are divided into hazard source number, location unit, specialty, hazard source description, trigger, outcome, likelihood, severity, initial risk level, major hazard source, existing control measures, mitigation measures, regulatory unit, expected severity, expected likelihood, expected risk level, residual or derived risk, safety performance goal, and control status.
Further, the preprocessing of the text data in step (1) is implemented as follows:
Invalid characters in the text data, including garbled characters, redundant spaces, and line breaks, are deleted; descriptions involving spatial entities are replaced with a uniform tag; specialized abbreviations in the hazard-source database are replaced with their corresponding full names; and the unsafe events corresponding to each unsafe-event outcome text are labeled, forming the ATC unsafe event classification dataset.
Further, step (2) is implemented as follows:
The civil aviation regulation corpus is traversed with a byte pair encoding algorithm: the most frequent pair of adjacent Chinese characters is found and replaced with a new word mark; the search then continues for the most frequent adjacent pair, including character-character and character-mark combinations; these operations are repeated until a preset stopping condition is met, yielding a frequency-annotated segmentation dictionary for regulation texts; the regulation corpus is then segmented with this dictionary;
if a term appears more frequently than other terms in the texts of some category, and less frequently in the texts of other categories, it is a term associated with that category; following this principle, the KF-IDF value is computed as:
KF-IDF(term, cat) = docs(term, cat) · log( K / (cats(term) + α) )
where docs(term, cat) is the number of documents in the category that contain the term, cats(term) is the number of categories in which the term appears, K is the number of categories, and α is a smoothing factor;
redundant items in the results obtained with KF-IDF values are further screened by the part of speech of the term phrases, and phrases in the list that do not meet the part-of-speech requirement are eliminated.
Further, the embedding module in step (3) converts the input text and the domain knowledge texts into embedded vectors; the specific implementation is as follows:
A text sequence input {w_1, w_2, ...} is first tokenized by a tokenizer into a token sequence of length N, {t_1, t_2, ..., t_N}; the word-level embedded sequence S_w is then computed by BERT:
S_w = {e_w1, e_w2, ..., e_wN}, e_wn ∈ R^h
where e_wn is the embedded vector of the n-th token and h is the hidden-layer dimension of BERT;
each domain knowledge text is represented by the mean of its token embedding vectors, so that all domain knowledge texts are represented as one complete sequence;
for the set of M domain knowledge texts L = {P_1, P_2, ..., P_M}, the same tokenizer yields for P_m a token sequence of length l(m), {t_1^(m), ..., t_l(m)^(m)}; the embedding of P_m is then computed as:
e_lm = (1/l(m)) · Σ_{i=1..l(m)} e(t_i^(m))
when a sequence length differs from the target length, it is padded or truncated to length M; L is converted into the domain knowledge embedded sequence S_p:
S_p = {e_l1, e_l2, ..., e_lM}
where e_li is the phrase-level embedded vector of the i-th domain knowledge text; the matrix forms of S_w and S_p are M_w ∈ R^{N×h} and M_p ∈ R^{M×h}, respectively.
Further, the wide module in step (3) memorizes and recalls the domain knowledge of each event label; the specific implementation is as follows:
A moving average is taken, token by token, over the embedded vectors of the input text; for the input text M_w and the domain knowledge embedded sequence M_p in matrix form, the query matrix Q_W and key matrix K_W are:
Q_W[n] = (1/(2r+1)) · Σ_{i=n-r..n+r} M_w[i], Q_W ∈ R^{N×h}
K_W = M_p
where r is the half window of the moving average and h is the dimension of the embedded vectors; the value matrix is the one-hot matrix of the categories of the domain knowledge texts:
V_W ∈ {0,1}^{M×K}
where K is the dimension of the one-hot label encoding, i.e., the number of categories; the n-th row of Q_W is the embedded vector of the n-th token of the input text, the m-th row of K_W is the phrase-level embedding of the m-th domain knowledge text, and the m-th row of V_W is the one-hot code of the category of the m-th domain knowledge text;
the attention score is computed with a regularized-compatibility attention mechanism:
B = (Q_W K_W^T)^T
A_W = softmax(B^T / √h) · V_W
where β_j, the j-th column vector of B, measures the element-wise compatibility between the token sequence and the domain knowledge text sequence;
max pooling is applied to the regularized-compatibility attention scores; from A_W, the probability that the input text belongs to the k-th class is:
w_k = max_n (A_W)_{n,k}
further, the deep module of step (3) includes a deep text attention block and a deep domain attention block; the deep text attention block trains a model to pay attention to important words in sentences through the self-attention module; the depth domain attention block trains a model to pay attention to important domain knowledge through a common attention module; and combining the outputs of the multi-layer deep text attention block and the deep field attention block to obtain the output of the deep module, thereby realizing the deep extraction of text information.
Further, the deep text attention block processes the input text with a self-attention module; learnable parameters W_Q^X, W_K^X, and W_V^X are added to the attention mechanism, and the self-attention of the input text is computed:
Q_X = M_w·W_Q^X, K_X = M_w·W_K^X, V_X = M_w·W_V^X
A_X = Attention(Q_X, K_X, V_X)
Residual connection and Layer Normalization (LN) are then applied to A_X:
A_X' = M_w + LayerNorm(A_X)
A_X' is then passed through a feed-forward layer for nonlinearity and enters the next deep text attention block; at the end of all deep text attention blocks, an average pooling is computed over the tokens of the input sequence:
d_X = (1/N) · Σ_{n=1..N} A_X'[n]
where d_X is the sentence-level embedding of the input text after self-attention processing, i.e., the output of the deep text attention block.
Further, the deep domain attention block adds a learnable parameter to the attention mechanism, with the values being the embedded vectors of the domain knowledge texts; the input embedded sequence is matrix-multiplied by the parameter and the attention is then computed:
Q_D = M_w·W_Q^D, K_D = M_p·W_K^D, V_D = M_p
A_D = Attention(Q_D, K_D, V_D)
A_D' = M_p + LayerNorm(A_D)
An average pooling is likewise computed over A_D':
d_D = (1/M) · Σ_{m=1..M} A_D'[m]
The vector d_D is the output of the deep domain attention block; d_D also passes through a feed-forward layer for nonlinearity before entering the next deep domain attention block.
Further, the classifier in step (3) is implemented as follows:
The output w of the wide module has dimension K, the number of labels; the deep module outputs the sentence-level deep text feature d, whose dimension is the embedding dimension h; the classifier integrates the two vectors and outputs the predicted probability of the category of the text; first, d is reduced in dimension and the two vectors are merged into one vector x:
d' = W_1·d + b_1
x = concat(w, d')
where W_1 and b_1 are the parameters of the linear layer that reduces d, and concat denotes vector concatenation; x is fed into a fully connected layer with tanh as activation function for classification, and the result is converted into predicted probabilities by a softmax function:
ŷ = softmax(tanh(W_2·x + b_2))
where W_2 and b_2 are the parameters of the fully connected layer;
the binary cross-entropy loss function between the predicted and true values is minimized:
L = - Σ_{k=1..K} [ y_k·log(ŷ_k) + (1 - y_k)·log(1 - ŷ_k) ]
where y is the true (multi-hot) class vector of the sample.
The beneficial effects are as follows: compared with the prior art, the invention has these advantages:
1. The invention represents prior domain knowledge in text form and maps it into the same embedding space as the input text, enabling interaction with the input text to guide classification; this approach records domain knowledge directly as terms (phrases) and therefore requires no structured data storage skills; it greatly reduces the difficulty of acquiring prior domain knowledge and reduces the loss in the interaction between domain knowledge and the input text;
2. The invention introduces a wide module and a deep module to separately memorize and apply prior knowledge and to mine latent classification patterns, thereby applying prior domain knowledge and solving the domain drift problem;
3. The invention extracts, in an unsupervised manner, the various expressions of ATC-related unsafe events from a large corpus of civil aviation regulation files as candidates for the domain knowledge dictionary; compared with experience-based expert knowledge acquisition, this helps obtain more comprehensive and accurate domain knowledge texts.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the invention provides a domain-knowledge-guided air traffic control unsafe event classification method, which specifically comprises the following steps:
Step 1: classifying the pre-acquired unsafe event text data, and preprocessing the text data.
Firstly, reading in unsafe event text data, determining a classification basis, and then carrying out preprocessing of text denoising, space entity marking and abbreviation replacement on the text data.
The air-management hazard source database divides the data into 19 attribute stores of hazard source number, location unit, specialty, hazard source description, trigger, outcome, likelihood, severity, initial risk level, major hazard source, existing control measures, slow control measures, regulatory unit, expected severity, expected likelihood, expected risk level, remaining/derived risk, safety performance objective and control status. The invention classifies the empty pipe unsafe events according to the 'result' attribute data. The "outcome" is a description of the consequences that the reporting personnel may or have caused to the unsafe event, and is the best attribute for identifying the unsafe event class.
Preprocessing unsafe event text data:
Text denoising: the text data contain invalid characters such as garbled characters, redundant spaces, and line breaks, which interfere with training; these characters are deleted.
Spatial-entity labeling: some descriptions involve spatial entities, such as city names, airport names, and waypoint names. These entities are replaced with a uniform tag, eliminating bias caused by spatial factors.
Abbreviation substitution: the hazard-source database contains many highly specialized abbreviations, which are replaced with their corresponding full names.
The dataset is then labeled. 21 classes of standard unsafe events are defined according to the ICAO Safety Management Manual (Doc 9859) and the civil aviation industry standard Civil Aircraft Incidents (MH/T 2001-2015). An outcome description may correspond to multiple unsafe events, i.e., one data instance may carry multiple labels. The class of each unsafe-event outcome text is judged against the provisions of these two documents and labeled accordingly, forming the ATC unsafe event classification dataset.
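To make Step 1 concrete, the following Python sketch shows one possible preprocessing pipeline; the spatial-entity list and abbreviation dictionary here are illustrative placeholders, since the patent does not publish its actual tables:

```python
import re

# Hypothetical lookup tables; the patent's real entity list and
# abbreviation dictionary are not published, so these are placeholders.
SPATIAL_ENTITIES = ["南京禄口机场", "广州白云机场"]        # airports, cities, waypoints
ABBREVIATIONS = {"ATC": "空中交通管制"}                    # abbreviation -> full name

def preprocess(text: str) -> str:
    # Text denoising: drop control characters, collapse redundant whitespace.
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Spatial-entity labeling: replace location mentions with a uniform tag.
    for entity in SPATIAL_ENTITIES:
        text = text.replace(entity, "[LOC]")
    # Abbreviation substitution: expand specialized abbreviations to full names.
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text
```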
Step 2: extracting terms containing category labels from massive regulation files, firstly adopting a byte pair coding algorithm to realize word segmentation, then adopting a method based on term word frequency, detecting the terms in an unsupervised mode and classifying the terms, and constructing an unsafe event field knowledge base. The method comprises the following specific steps:
Traversing the civil aviation regulation text corpus, searching a pair of adjacent Chinese characters with highest occurrence frequency, and replacing the adjacent Chinese characters with a new word mark; the adjacent Chinese characters with highest occurrence frequency (comprising the combination of Chinese characters and the combination of word marks) are continuously searched. And repeatedly iterating the above operation until the stopping condition is met, and obtaining the regulation text word segmentation dictionary containing the frequency. And segmenting the regulation file corpus according to the table.
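A minimal sketch of this character-level byte-pair merging, assuming the corpus is a list of strings and using "no pair repeats" as a stand-in stopping condition (the patent only calls its condition "preset"):

```python
from collections import Counter

def bpe_dictionary(corpus, num_merges=10000):
    """Greedily merge the most frequent adjacent pair into a new word mark,
    recording each merged mark with its frequency."""
    seqs = [list(doc) for doc in corpus]   # start from single Chinese characters
    vocab = Counter()
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:                        # assumed stopping condition
            break
        merged = a + b                      # new word mark for the pair
        vocab[merged] = freq
        for seq in seqs:                    # replace the pair everywhere
            i = 0
            while i < len(seq) - 1:
                if seq[i] == a and seq[i + 1] == b:
                    seq[i:i + 2] = [merged]
                else:
                    i += 1
    return vocab
```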
A Keyword Frequency-Inverse Document Frequency (KF-IDF) based method is adopted to extract knowledge terms of the unsafe event domain. A term is associated with a category if it appears more frequently than other terms in the texts of that category and less frequently in the texts of other categories. Following this principle, the KF-IDF value is computed as:
KF-IDF(term, cat) = docs(term, cat) · log( K / (cats(term) + α) )
where docs(term, cat) is the number of documents in the category that contain the term, cats(term) is the number of categories in which the term appears, K is the number of categories, and α is a smoothing factor. In the regulation files, each clause is treated as a document, and the unsafe events a clause describes give its categories, obtained by regular-expression matching of the clause's title and content.
Redundant items in the results obtained with KF-IDF values are further screened by the part of speech of the term phrases. Nouns or noun phrases (e.g., "fly-away") and combinations of noun phrases with verbs (e.g., "deviate from control instructions") have a high probability of forming valid terms; only such terms are retained, while phrases in the list that do not meet the part-of-speech requirement are eliminated. Finally, manual review with additions and deletions ensures the quality of the domain knowledge base.
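The KF-IDF computation can be sketched as follows, using the formula reconstructed above (the original formula image is lost, so this follows only the quantities the text defines; documents are represented as sets of segmented terms):

```python
import math
from collections import defaultdict

def kf_idf(docs_by_cat, alpha=1.0):
    """docs_by_cat maps category -> list of documents, each a set of terms.
    Returns a dict of (term, cat) -> KF-IDF score."""
    K = len(docs_by_cat)                       # number of categories
    cats_with_term = defaultdict(set)          # term -> categories containing it
    for cat, docs in docs_by_cat.items():
        for doc in docs:
            for term in doc:
                cats_with_term[term].add(cat)
    scores = {}
    for cat, docs in docs_by_cat.items():
        for term in {t for doc in docs for t in doc}:
            docs_term_cat = sum(term in doc for doc in docs)
            scores[(term, cat)] = docs_term_cat * math.log(
                K / (len(cats_with_term[term]) + alpha))
    return scores
```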
Step 3: constructing a text classification deep learning model to accurately classify the empty-management safe text event under the guidance of domain knowledge; the text classification deep learning model comprises an embedding module, a wide module, a deep module and a classifier.
The embedding module converts the input text and the domain knowledge text into embedded vectors, thereby enabling the deep learning model to understand the natural language text. For text sequence input { w 1,w2,. }, first, it is tokenized by a token analyzer to obtain a string of token sequences of length NNext, the word-level embedding sequence S w is calculated from BERT, namely:
wherein, And (3) representing an embedded vector of the nth word element, wherein h is the hidden layer dimension of the BERT.
For each domain knowledge text, the average value of the embedded vector of each word source is used for representing the domain knowledge text, so that all domain knowledge texts are represented as a complete sequence, and not a single sequence for each domain knowledge text. This approach can reduce the computational effort to the original few percent.
For a set of domain knowledge texts of number M, l= { P 1,P2,…,PM }, a representation of each of the domain knowledge texts is calculated using P m as an example. Obtaining a sequence of tokens of length l (m) using the same token analyzerThen calculate the embedding of P j:
when M is not equal to N, it needs to be filled or truncated so that its length becomes M. Converting L into a domain knowledge text embedding sequence S p, namely:
Where e li represents the phrase-level embedded vector of the i-th domain knowledge text. The matrix forms of S w and S p are respectively:
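A sketch of the embedding module with the HuggingFace transformers library; the checkpoint name bert-base-chinese is an assumption, since the patent only says BERT:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

@torch.no_grad()
def embed_tokens(text):
    """Token-level embedded sequence, i.e. M_w with shape (N, h)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    return bert(**enc).last_hidden_state.squeeze(0)

@torch.no_grad()
def embed_domain_knowledge(phrases):
    """M_p with shape (M, h): each phrase is the mean of its token embeddings,
    so the M domain knowledge texts form one complete sequence."""
    return torch.stack([embed_tokens(p).mean(dim=0) for p in phrases])
```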
The wide module focuses on memorizing and recalling the domain knowledge of each event label. First, a moving average is taken, token by token, over the embedded vectors of the input text. Specifically, the input text M_w and the domain knowledge embedded sequence M_p in matrix form are represented as the query matrix Q_W and the key matrix K_W:
Q_W[n] = (1/(2r+1)) · Σ_{i=n-r..n+r} M_w[i], Q_W ∈ R^{N×h}
K_W = M_p
where r is the half window of the moving average and h is the dimension of the embedded vectors. The value matrix is the one-hot matrix of the categories of the domain knowledge texts:
V_W ∈ {0,1}^{M×K}
where K is the dimension of the one-hot label encoding, i.e., the number of categories. The intuitive reading of the three matrices is: the n-th row of Q_W is the embedded vector of the n-th token of the input text, the m-th row of K_W is the phrase-level embedding of the m-th domain knowledge text, and the m-th row of V_W is the one-hot code of the category of the m-th domain knowledge text.
Next, the attention score is computed with a regularized-compatibility attention mechanism:
B = (Q_W K_W^T)^T
A_W = softmax(B^T / √h) · V_W
where β_j, the j-th column vector of B, measures the element-wise compatibility between the token sequence and the domain knowledge text sequence. Since V_W uses one-hot category encodings, the result of this step actually represents the probability that each input token belongs to each category.
Max pooling is applied to the regularized-compatibility attention scores. From A_W, the probability that the input text belongs to the k-th class is:
w_k = max_n (A_W)_{n,k}
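A tensor-level sketch of the wide module under the reconstructed formulas above; the exact regularization of the compatibility scores is not recoverable from the published text, so a scaled softmax stands in for it:

```python
import torch
import torch.nn.functional as F

def wide_module(M_w, M_p, V_w, r=2):
    """M_w: (N, h) token embeddings; M_p: (M, h) domain-phrase embeddings;
    V_w: (M, K) one-hot categories. Returns w: (K,) class scores."""
    # Token-wise moving average with half window r -> query matrix Q_w.
    kernel = torch.ones(1, 1, 2 * r + 1) / (2 * r + 1)
    Q_w = F.conv1d(M_w.t().unsqueeze(1), kernel, padding=r).squeeze(1).t()
    K_w = M_p                                           # key matrix
    B = (Q_w @ K_w.t()).t()                             # (M, N) compatibility
    A_w = F.softmax(B.t() / M_w.shape[1] ** 0.5, dim=-1) @ V_w  # (N, K)
    return A_w.max(dim=0).values                        # max-pool over tokens
```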
The deep module exploits the generalization capability of deep networks to mine deep features of the input text. It consists of two kinds of blocks: the deep text attention block trains the model, through self-attention, to attend to important words in a sentence, and the deep domain attention block trains the model, through co-attention, to attend to important domain knowledge. Deep extraction of text information is achieved by stacking multiple blocks, and the results of the two parts are combined at the end of the deep module.
Deep text attention block: the input text is processed with a self-attention module. Unlike the category attention mechanism in the wide module, the attention module in the deep text attention block adds learnable parameters W_Q^X, W_K^X, and W_V^X and computes the self-attention of the input text:
Q_X = M_w·W_Q^X, K_X = M_w·W_K^X, V_X = M_w·W_V^X
A_X = Attention(Q_X, K_X, V_X)
Residual connection and Layer Normalization (LN) are then applied to A_X:
A_X' = M_w + LayerNorm(A_X)
A_X' is then passed through a feed-forward layer for nonlinearity before entering the next deep text attention block.
Since A_X' is token-wise, an average pooling over the tokens of the input sequence is computed at the end of all deep text attention blocks:
d_X = (1/N) · Σ_{n=1..N} A_X'[n]
where d_X is the sentence-level embedding of the input text after self-attention processing, and also the output of the deep text attention block.
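A sketch of one deep text attention block in PyTorch; the head count and feed-forward width are assumptions, and the residual is placed as the text describes (input plus LayerNorm of the attention output):

```python
import torch.nn as nn

class DeepTextAttentionBlock(nn.Module):
    def __init__(self, h, n_heads=8):
        super().__init__()
        # Learnable query/key/value projections live inside MultiheadAttention.
        self.attn = nn.MultiheadAttention(h, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(h)
        self.ffn = nn.Sequential(nn.Linear(h, 4 * h), nn.GELU(), nn.Linear(4 * h, h))

    def forward(self, M_w):                    # M_w: (batch, N, h)
        A_x, _ = self.attn(M_w, M_w, M_w)      # self-attention over the text
        A_x = M_w + self.norm(A_x)             # A_X' = M_w + LayerNorm(A_X)
        return self.ffn(A_x)                   # nonlinearity, to the next block

# After the last block: d_X = output.mean(dim=1), the sentence-level embedding.
```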
Deep domain attention block: in essence, the deep domain attention block lets the model learn the latent relevance between the input text and domain knowledge, attaching the corresponding domain knowledge text embeddings to the sentence embedding of the input text.
The deep domain attention block is closer in structure to the wide module. The difference is that its attention mechanism carries a learnable parameter, and its values are the embedded vectors of the domain knowledge texts rather than the one-hot matrix of the wide module. The input embedded sequence is matrix-multiplied by the parameter and the attention is then computed:
Q_D = M_w·W_Q^D, K_D = M_p·W_K^D, V_D = M_p
A_D = Attention(Q_D, K_D, V_D)
A_D' = M_p + LayerNorm(A_D)
An average pooling is likewise computed over A_D':
d_D = (1/M) · Σ_{m=1..M} A_D'[m]
The vector d_D is the output of the deep domain attention block; d_D also passes through a feed-forward layer for nonlinearity before entering the next deep domain attention block.
The output d of the deep module is obtained by combining the outputs of the stacked deep text attention blocks and deep domain attention blocks:
d = d_X + d_D
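The deep domain attention block can be sketched the same way, with queries from the text and keys/values from the domain knowledge embeddings. Because the attention output is token-aligned, the residual here is taken against the text side, a shape-consistent reading of the formulas above:

```python
import torch.nn as nn

class DeepDomainAttentionBlock(nn.Module):
    def __init__(self, h, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(h, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(h)
        self.ffn = nn.Sequential(nn.Linear(h, 4 * h), nn.GELU(), nn.Linear(4 * h, h))

    def forward(self, M_w, M_p):               # (batch, N, h), (batch, M, h)
        # Queries from the input text; keys and values from domain knowledge,
        # attaching relevant domain-text embeddings to each token.
        A_d, _ = self.attn(M_w, M_p, M_p)      # (batch, N, h)
        A_d = M_w + self.norm(A_d)             # shape-consistent residual
        return self.ffn(A_d)

# Deep-module output: d = d_X + d_D, where d_X and d_D are the mean-pooled
# outputs of the last text block and the last domain block, respectively.
```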
The three modules above yield two vectors: the output w of the wide module, with dimension equal to the number of labels K, and the sentence-level deep text feature d of the deep module, with dimension equal to the embedding dimension h. The classifier integrates the two vectors and outputs the predicted probability of the category of the text. First, d is reduced in dimension and the two vectors are merged into one vector x:
d' = W_1·d + b_1
x = concat(w, d')
where W_1 and b_1 are the parameters of the linear layer that reduces d, and concat denotes vector concatenation. x is fed into a fully connected layer with tanh as activation function for classification, and the result is converted into predicted probabilities by a softmax function:
ŷ = softmax(tanh(W_2·x + b_2))
where W_2 and b_2 are the parameters of the fully connected layer.
For the multi-label classification problem addressed in this patent, the Binary Cross Entropy (BCE) loss function between the predicted and true values is minimized:
L = - Σ_{k=1..K} [ y_k·log(ŷ_k) + (1 - y_k)·log(1 - ŷ_k) ]
where y is the true (multi-hot) class vector of the sample.
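A sketch of the classifier head; the reduced dimension of d' is an assumption (the patent does not state it), and the tanh-then-softmax ordering follows the text:

```python
import torch
import torch.nn as nn

class WideDeepClassifier(nn.Module):
    def __init__(self, h, K, reduced=64):      # `reduced` is an assumed size
        super().__init__()
        self.lin1 = nn.Linear(h, reduced)      # W_1, b_1: reduce d to d'
        self.lin2 = nn.Linear(K + reduced, K)  # W_2, b_2: fully connected layer

    def forward(self, w, d):
        x = torch.cat([w, self.lin1(d)], dim=-1)          # x = concat(w, d')
        return torch.softmax(torch.tanh(self.lin2(x)), dim=-1)

criterion = nn.BCELoss()   # binary cross entropy against multi-hot labels
```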
Step 4: and taking the historical unsafe event record of the empty pipe dangerous source database as a model input sample to carry out model training, and iteratively updating model parameters, thereby realizing accurate empty pipe unsafe event classification guided by field knowledge.
The training parameters, training mode, and other attributes of the text classification model are set, and the model is trained on the training set. AdamW from the transformers library serves as the optimizer, with different learning rates for BERT and the downstream modules: an initial learning rate of 2e-5 for BERT and 1e-4 for the other modules. The batch size is 16, the number of training epochs is 5, and the maximum token-sequence length is 256. The dataset is randomly split into a training set and a test set at a ratio of 8:2; training on a server with a 13th Gen Intel Core i9-13900K 3.00 GHz CPU, an NVIDIA GeForce RTX 4090 GPU, and 128 GB RAM takes about 7.5 hours.
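The two-learning-rate setup can be written with parameter groups; torch.optim.AdamW is used here in place of the deprecated transformers.AdamW, the attribute name model.bert is an assumption about how the model exposes its encoder, and model, criterion, and train_loader are assumed to exist from the sketches above:

```python
from torch.optim import AdamW

bert_params = list(model.bert.parameters())            # model.bert: assumed name
bert_ids = {id(p) for p in bert_params}
head_params = [p for p in model.parameters() if id(p) not in bert_ids]

optimizer = AdamW([
    {"params": bert_params, "lr": 2e-5},   # BERT initial learning rate
    {"params": head_params, "lr": 1e-4},   # downstream modules
])

for epoch in range(5):                      # 5 training epochs, batch size 16
    for texts, labels in train_loader:      # token sequences capped at 256
        optimizer.zero_grad()
        loss = criterion(model(texts), labels.float())
        loss.backward()
        optimizer.step()
```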
In the present embodiment, 8,000 hazard-source records of the civil aviation administration from 2012 to 2022 are taken as an example, and the following evaluation metrics are adopted:
Multi-label classification accuracy: the frequency with which the prediction exactly matches the true label set. With n the number of samples, the accuracy is:
Accuracy = (1/n) · Σ_{i=1..n} 1(ŷ_i = y_i)
where ŷ_i is the predicted class set of the i-th sample, y_i is the true class set of the i-th sample, and 1(x) is the indicator function: 1 if x is true, and 0 otherwise.
Top-k accuracy: the fraction of instances whose k highest-scoring predicted labels contain a correct label:
P@k = (1/n) · Σ_{i=1..n} 1(y_i ∩ {ŷ_i^(1), ..., ŷ_i^(k)} ≠ ∅)
where ŷ_i^(j) is the class with the j-th highest prediction score for the i-th sample. Common values of k are 3 (denoted P@3) and 5 (denoted P@5).
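Both metrics are straightforward to compute from multi-hot label matrices and per-class scores; a sketch:

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: a prediction counts only if all labels match."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def precision_at_k(y_true, scores, k=3):
    """P@k: a sample is a hit if any of its k highest-scoring predicted
    labels is a true label."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = [y_true[i, top_k[i]].any() for i in range(len(y_true))]
    return float(np.mean(hits))
```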
TABLE 1 Comparative experimental results of the invention and mainstream methods
Table 1 compares the method of the invention with mainstream text classification methods. As the table shows, the prediction model of the invention achieves higher precision and better prediction performance than mainstream text classification models.

Claims (10)

1. A domain-knowledge-guided air traffic control unsafe event classification method, characterized by comprising the following steps:
(1) Categorizing pre-acquired unsafe event text data and preprocessing the text data;
(2) Detecting and classifying terms in an unsupervised manner based on a byte pair encoding algorithm and a term-frequency method, and constructing a domain knowledge base of unsafe events;
(3) Constructing a deep learning text classification model to classify ATC unsafe event texts under the guidance of domain knowledge; the text classification deep learning model comprises an embedding module, a wide module, a deep module, and a classifier;
(4) Training the model with historical unsafe event records from the ATC hazard-source database as input samples, and iteratively updating the model parameters, thereby realizing accurate domain-knowledge-guided classification of ATC unsafe events.
2. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that the categorization of the pre-acquired unsafe event text data in step (1) is implemented as follows:
The unsafe event text data are divided into hazard source number, location unit, specialty, hazard source description, trigger, outcome, likelihood, severity, initial risk level, major hazard source, existing control measures, mitigation measures, regulatory unit, expected severity, expected likelihood, expected risk level, residual or derived risk, safety performance goal, and control status.
3. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that the preprocessing of the text data in step (1) is implemented as follows:
Invalid characters in the text data, including garbled characters, redundant spaces, and line breaks, are deleted; descriptions involving spatial entities are replaced with a uniform tag; specialized abbreviations in the hazard-source database are replaced with their corresponding full names; and the unsafe events corresponding to each unsafe-event outcome text are labeled, forming the ATC unsafe event classification dataset.
4. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that step (2) is implemented as follows:
The civil aviation regulation corpus is traversed with a byte pair encoding algorithm: the most frequent pair of adjacent Chinese characters is found and replaced with a new word mark; the search then continues for the most frequent adjacent pair, including character-character and character-mark combinations; these operations are repeated until a preset stopping condition is met, yielding a frequency-annotated segmentation dictionary for regulation texts; the regulation corpus is segmented with this dictionary;
if a term appears more frequently than other terms in the texts of some category, and less frequently in the texts of other categories, it is a term associated with that category; following this principle, the KF-IDF value is computed as:
KF-IDF(term, cat) = docs(term, cat) · log( K / (cats(term) + α) )
where docs(term, cat) is the number of documents in the category that contain the term, cats(term) is the number of categories in which the term appears, K is the number of categories, and α is a smoothing factor;
redundant items in the results obtained with KF-IDF values are further screened by the part of speech of the term phrases, and phrases in the list that do not meet the part-of-speech requirement are eliminated.
5. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that the embedding module in step (3) converts the input text and the domain knowledge texts into embedded vectors; the specific implementation is as follows:
a text sequence input {w_1, w_2, ...} is first tokenized by a tokenizer into a token sequence of length N, {t_1, t_2, ..., t_N}; the word-level embedded sequence S_w is then computed by BERT:
S_w = {e_w1, e_w2, ..., e_wN}, e_wn ∈ R^h
where e_wn is the embedded vector of the n-th token and h is the hidden-layer dimension of BERT;
each domain knowledge text is represented by the mean of its token embedding vectors, so that all domain knowledge texts are represented as one complete sequence;
for the set of M domain knowledge texts L = {P_1, P_2, ..., P_M}, the same tokenizer yields for P_m a token sequence of length l(m), {t_1^(m), ..., t_l(m)^(m)}; the embedding of P_m is then computed as:
e_lm = (1/l(m)) · Σ_{i=1..l(m)} e(t_i^(m))
when a sequence length differs from the target length, it is padded or truncated to length M; L is converted into the domain knowledge embedded sequence S_p:
S_p = {e_l1, e_l2, ..., e_lM}
where e_li is the phrase-level embedded vector of the i-th domain knowledge text; the matrix forms of S_w and S_p are M_w ∈ R^{N×h} and M_p ∈ R^{M×h}, respectively.
6. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that the wide module in step (3) memorizes and recalls the domain knowledge of each event label; the specific implementation is as follows:
a moving average is taken, token by token, over the embedded vectors of the input text; for the input text M_w and the domain knowledge embedded sequence M_p in matrix form, the query matrix Q_W and key matrix K_W are:
Q_W[n] = (1/(2r+1)) · Σ_{i=n-r..n+r} M_w[i], Q_W ∈ R^{N×h}
K_W = M_p
where r is the half window of the moving average and h is the dimension of the embedded vectors; the value matrix is the one-hot matrix of the categories of the domain knowledge texts:
V_W ∈ {0,1}^{M×K}
where K is the dimension of the one-hot label encoding, i.e., the number of categories; the n-th row of Q_W is the embedded vector of the n-th token of the input text, the m-th row of K_W is the phrase-level embedding of the m-th domain knowledge text, and the m-th row of V_W is the one-hot code of the category of the m-th domain knowledge text;
the attention score is computed with a regularized-compatibility attention mechanism:
B = (Q_W K_W^T)^T
A_W = softmax(B^T / √h) · V_W
where β_j, the j-th column vector of B, measures the element-wise compatibility between the token sequence and the domain knowledge text sequence;
max pooling is applied to the regularized-compatibility attention scores; from A_W, the probability that the input text belongs to the k-th class is:
w_k = max_n (A_W)_{n,k}
7. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that the deep module in step (3) comprises a deep text attention block and a deep domain attention block; the deep text attention block trains the model to attend to important words in a sentence through a self-attention module; the deep domain attention block trains the model to attend to important domain knowledge through a co-attention module; and the outputs of the stacked deep text attention blocks and deep domain attention blocks are combined to give the output of the deep module, realizing deep extraction of text information.
8. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 7, characterized in that the deep text attention block processes the input text with a self-attention module; learnable parameters W_Q^X, W_K^X, and W_V^X are added to the attention mechanism, and the self-attention of the input text is computed:
Q_X = M_w·W_Q^X, K_X = M_w·W_K^X, V_X = M_w·W_V^X
A_X = Attention(Q_X, K_X, V_X)
residual connection and Layer Normalization (LN) are then applied to A_X:
A_X' = M_w + LayerNorm(A_X)
A_X' is then passed through a feed-forward layer for nonlinearity and enters the next deep text attention block; at the end of all deep text attention blocks, an average pooling is computed over the tokens of the input sequence:
d_X = (1/N) · Σ_{n=1..N} A_X'[n]
where d_X is the sentence-level embedding of the input text after self-attention processing, i.e., the output of the deep text attention block.
9. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 7, characterized in that the deep domain attention block adds a learnable parameter to the attention mechanism, with the values being the embedded vectors of the domain knowledge texts; the input embedded sequence is matrix-multiplied by the parameter and the attention is then computed:
Q_D = M_w·W_Q^D, K_D = M_p·W_K^D, V_D = M_p
A_D = Attention(Q_D, K_D, V_D)
A_D' = M_p + LayerNorm(A_D)
an average pooling is likewise computed over A_D':
d_D = (1/M) · Σ_{m=1..M} A_D'[m]
the vector d_D is the output of the deep domain attention block; d_D also passes through a feed-forward layer for nonlinearity before entering the next deep domain attention block.
10. The domain-knowledge-guided air traffic control unsafe event classification method according to claim 1, characterized in that the classifier in step (3) is implemented as follows:
the output w of the wide module has dimension K, the number of labels; the deep module outputs the sentence-level deep text feature d, whose dimension is the embedding dimension h; the classifier integrates the two vectors and outputs the predicted probability of the category of the text; first, d is reduced in dimension and the two vectors are merged into one vector x:
d' = W_1·d + b_1
x = concat(w, d')
where W_1 and b_1 are the parameters of the linear layer that reduces d, and concat denotes vector concatenation; x is fed into a fully connected layer with tanh as activation function for classification, and the result is converted into predicted probabilities by a softmax function:
ŷ = softmax(tanh(W_2·x + b_2))
where W_2 and b_2 are the parameters of the fully connected layer;
the binary cross-entropy loss function between the predicted and true values is minimized:
L = - Σ_{k=1..K} [ y_k·log(ŷ_k) + (1 - y_k)·log(1 - ŷ_k) ]
where y is the true (multi-hot) class vector of the sample.
CN202410156217.5A 2024-02-04 2024-02-04 Domain knowledge guided air traffic control unsafe event classification method Pending CN118069840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410156217.5A CN118069840A (en) 2024-02-04 2024-02-04 Domain knowledge guided air traffic control unsafe event classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410156217.5A CN118069840A (en) 2024-02-04 2024-02-04 Domain knowledge guided air traffic control unsafe event classification method

Publications (1)

Publication Number Publication Date
CN118069840A true CN118069840A (en) 2024-05-24

Family

ID=91103385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410156217.5A Pending CN118069840A (en) 2024-02-04 2024-02-04 Domain knowledge guided empty pipe unsafe event classification method

Country Status (1)

Country Link
CN (1) CN118069840A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897167A (en) * 2022-05-13 2022-08-12 内蒙古大学 Method and device for constructing knowledge graph in biological field
CN115204140A (en) * 2022-06-22 2022-10-18 西安交通大学 Legal provision prediction method based on attention mechanism and knowledge graph

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Tao (刘桃), "Automatic extraction of domain terms and its application in text classification" (领域术语自动抽取及其在文本分类中的应用), Acta Electronica Sinica (电子学报), 28 February 2007 (2007-02-28) *
Zeng Weili (曾维理), "Hierarchical Method for Mining a Prevailing Flight Pattern in Airport Terminal Airspace", Journal of Aerospace Information Systems, 4 July 2023 (2023-07-04) *
Cai Zhipeng (蔡志鹏), "Key feature extraction of hazard sources in air traffic control" (空中交通管制中的危险源关键特征提取), Aeronautical Computing Technique (航空计算技术), 31 December 2023 (2023-12-31) *

Similar Documents

Publication Publication Date Title
Feng et al. A small samples training framework for deep Learning-based automatic information extraction: Case study of construction accident news reports analysis
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
US10755045B2 (en) Automatic human-emulative document analysis enhancements
CN106066866A (en) A kind of automatic abstracting method of english literature key phrase and system
CN114091460B (en) Multitasking Chinese entity naming identification method
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN112487293B (en) Method, device and medium for extracting structured information of security accident case
Li et al. A method for resume information extraction using bert-bilstm-crf
CN112069307B (en) Legal provision quotation information extraction system
CN112527961A (en) Automatic extraction method for emergency response level of emergency plan and responsibility of administrative unit
CN111709225B (en) Event causal relationship discriminating method, device and computer readable storage medium
CN115238040A (en) Steel material science knowledge graph construction method and system
Ribeiro et al. Discovering IMRaD structure with different classifiers
CN112215002A (en) Electric power system text data classification method based on improved naive Bayes
CN115859980A (en) Semi-supervised named entity identification method, system and electronic equipment
Troxler et al. Actuarial applications of natural language processing using transformers: Case studies for using text features in an actuarial context
KR102563539B1 (en) System for collecting and managing data of denial list and method thereof
Ahmad et al. Machine and deep learning methods with manual and automatic labelling for news classification in bangla language
CN117852541A (en) Entity relation triplet extraction method, system and computer equipment
CN111522945A (en) Poetry style analysis method based on chi-square test
CN118069840A (en) Domain knowledge guided air traffic control unsafe event classification method
CN115544213A (en) Method, device and storage medium for acquiring information in text
CN115238093A (en) Model training method and device, electronic equipment and storage medium
Xu et al. Entity recognition in the field of coal mine construction safety based on a pre-training language model
CN109635046B (en) Protein molecule name analysis and identification method based on CRFs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination