CN110895705B - Abnormal sample detection device, training device and training method thereof
- Publication number
- CN110895705B (application CN201811067951.5A)
- Authority
- CN
- China
- Legal status: Active
Classifications
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
Abstract
The present disclosure relates to a training device and a training method for training an abnormal sample detection device, and to an abnormal sample detection device. The training device according to the present disclosure includes a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data that is normal sample data; and a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data, wherein the first reconstruction unit and the back-end processing unit are subjected to joint training based on a predetermined criterion regarding the first reconstruction error and the second reconstruction error. The abnormal sample detection device comprises the first reconstruction unit and the back-end processing unit that have been subjected to the joint training. Compared with the prior art, the abnormal sample detection device can improve abnormal sample detection performance.
Description
Technical Field
The present invention relates generally to the field of classification and detection, and more particularly, to a training device and training method for training an abnormal sample detection device and an abnormal sample detection device trained by the training device and training method.
Background
The purpose of abnormal sample detection is to identify abnormal samples that deviate from normal samples. The abnormal sample detection has important practical value and wide application range. For example, abnormal sample detection may be applied to industrial control, network intrusion detection, pathology detection, financial risk identification, video monitoring, and the like.
With the continuous development of artificial intelligence technology, deep learning has been applied to the problem of abnormal sample detection. However, the particularity of the abnormal sample detection problem presents a significant challenge for deep learning. First, the purpose of abnormal sample detection is to distinguish normal samples from abnormal samples; however, unlike the situation for conventional classification models, abnormal samples occur infrequently, making it difficult to collect enough abnormal samples for classification training. For example, when abnormal sample detection is applied to detecting abnormalities in the operating temperature of an industrial machine, the operating temperature may be abnormal only once or twice in several days of collected data, so that the collected abnormal temperature samples are insufficient for classification training.
Furthermore, even if enough abnormal samples are collected, it is impossible to obtain complete knowledge of the abnormal samples. For example, in video monitoring, suppose that abnormal situations occurring on a pedestrian street, such as the appearance of a bicycle or a motor vehicle, are to be monitored. However, the types of abnormal samples in the actual scene may exceed what was estimated in advance. For example, if the appearance of a bicycle or a motor vehicle is predefined as the abnormal sample class, it is difficult to judge whether objects such as skateboards, wheelchairs, or tricycles appearing in the monitored scene are normal samples or abnormal samples.
Currently, the solution to the above-mentioned problem is based on the following idea: since the abnormal sample class cannot be defined completely, only the normal sample class is defined, and thus any sample not belonging to the normal sample class is defined as belonging to the abnormal sample class.
Current abnormal sample detection techniques include reconstruction error based abnormal sample detection techniques (e.g., SROSR (sparse representation-based open set recognition)), probability density based abnormal sample detection techniques (e.g., DAGMM (Deep Autoencoding Gaussian Mixture Model)), energy based abnormal sample detection techniques (e.g., DSEBM (Deep Structured Energy Based Model)), and the like. Among these existing abnormal sample detection techniques, an abnormal sample detection technique based on a reconstruction error is widely used because it is simple and has good performance.
In particular, the reconstruction error refers to the error between an input sample of a reconstruction model and the corresponding reconstructed sample, where the reconstruction model is capable of compressing the input sample to extract feature data and reconstructing the input sample based on the extracted feature data. For a reconstruction model, the smaller the reconstruction error between the input sample and the reconstructed sample, the better the reconstruction effect of the reconstruction model.
Only normal samples are used in the training process of the reconstruction model, that is, the reconstruction model only learns how to reconstruct normal samples. The trained reconstruction model therefore generates a small reconstruction error for a normal sample; when the input sample is an abnormal sample, the reconstruction model has not learned how to reconstruct it, and a larger reconstruction error is generated. In this way, the reconstruction model can distinguish normal samples from abnormal samples according to the size of the reconstruction error, thereby realizing abnormal sample detection.
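By way of illustration only, the following Python sketch shows this reconstruction-error-based detection idea under the assumption of an already trained reconstruction model; the function names, the stand-in model, and the threshold value are hypothetical and not part of the schemes described herein.

```python
import numpy as np

def detect_anomaly(x, reconstruct, threshold):
    """Flag a sample as abnormal when its reconstruction error exceeds a threshold.

    x           : 1-D numpy array, the input sample
    reconstruct : callable mapping a sample to its reconstruction
                  (hypothetical, assumed trained on normal data only)
    threshold   : float, chosen e.g. from reconstruction errors on held-out normal samples
    """
    x_rec = reconstruct(x)
    error = np.linalg.norm(x - x_rec)  # Euclidean (L2) reconstruction error
    return error >= threshold          # True -> judged abnormal

# Usage with a trivial stand-in "model" that reproduces its input imperfectly:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=16)
    print(detect_anomaly(normal, lambda v: v * 0.99, threshold=0.5))
```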
However, the reconstruction model of the prior art has the following problems in practical application: the abnormal sample may be less different from the normal sample and thus difficult to correctly identify. Therefore, there is still a need for an abnormal sample detection technique that can more accurately distinguish between normal samples and abnormal samples.
Disclosure of Invention
In order to further improve the abnormal sample detection performance, an abnormal sample detection technique is proposed herein, which uses only normal samples as training data in a training process, reconstructs the normal samples by using a front-end reconstruction model, and performs further back-end processing on information extracted by the front-end reconstruction model to perform joint training with the front-end reconstruction model.
A brief summary of the disclosure will be presented below in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
It is an object of the present disclosure to provide a training device and a training method for training an abnormal sample detection device. The abnormal sample detection device trained by this training device and training method can more accurately distinguish normal samples from abnormal samples.
To achieve the object of the present disclosure, according to one aspect of the present disclosure, there is provided a training apparatus for training an abnormal sample detection apparatus, the training apparatus including: a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data, wherein the first reconstruction unit and the back-end processing unit are subjected to joint training based on predetermined criteria regarding the first reconstruction error and the second reconstruction error.
According to another aspect of the present disclosure, there is provided an abnormal sample detection apparatus including a trained first reconstruction unit and a back-end processing unit obtained by training of the training apparatus according to the above aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a training method for training an abnormal sample detection apparatus, the training method including: a first reconstruction step of generating, by a first reconstruction unit, first reconstruction errors and intermediate feature data based on training sample data as normal sample data; a back-end processing step for generating, by the back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and a joint training step of performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion regarding the first reconstruction error and the second reconstruction error.
According to another aspect of the present disclosure, there is provided a computer program capable of implementing the training method described above. Furthermore, a computer program product in the form of at least a computer readable medium is provided, on which computer program code for implementing the training method described above is recorded.
The abnormal sample detection device trained according to the technology of the present disclosure is trained based on normal samples only, and the information extracted by the front-end reconstruction model is fully utilized in the training process, so that normal samples and abnormal samples can be distinguished more accurately.
Drawings
The above and other objects, features and advantages of the present disclosure will be more readily understood by reference to the following description of the embodiments of the disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating a training apparatus for training an abnormal sample detection apparatus according to the present disclosure;
FIG. 2 is a schematic diagram illustrating a first reconstruction unit implemented using a depth convolutional self-encoder in accordance with an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating a training apparatus according to a first embodiment of the present disclosure;
FIG. 4 is a schematic view showing the construction of a training device according to a first embodiment of the present disclosure;
FIG. 5 is an operational flow diagram illustrating a second reconstruction unit implemented using a long short-term memory (LSTM) model according to a first embodiment of the present disclosure;
FIG. 6A is a graph showing a probability distribution of a first reconstruction error for DCAE;
FIG. 6B is a diagram showing a combined distribution of a first reconstruction error e and intermediate feature data h for DCAE;
FIG. 7 is a schematic view showing the construction of a training device according to a second embodiment of the present disclosure;
FIG. 8 is a graph illustrating a method for predicting a second reconstruction error according to a second embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating a training method for training an abnormal sample detection apparatus according to an embodiment of the present disclosure; and
FIG. 10 is a block diagram illustrating a general machine that may be used to implement a training apparatus and training method according to embodiments of the present disclosure.
Detailed Description
Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the attached illustrative drawings. Where elements of the drawings are designated by reference numerals, the same elements will be designated by the same reference numerals although the same elements are illustrated in different drawings. Further, in the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted where it may make the subject matter of the present disclosure unclear.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, singular forms are intended to include plural forms as well unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "having," when used in this specification, are intended to specify the presence of stated features, entities, operations, and/or components, but do not preclude the presence or addition of one or more other features, entities, operations, and/or components.
Unless defined otherwise, all terms including technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this inventive concept belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. The present disclosure may be practiced without some or all of these specific details. In other instances, only components that are germane to schemes according to the present disclosure have been shown in the drawings, while other details that are not germane to the present disclosure have been omitted in order to avoid obscuring the present disclosure with unnecessary detail.
Hereinafter, a training apparatus and a training method for training an abnormal sample detection apparatus according to each embodiment of the present disclosure will be described in detail with reference to the accompanying drawings.
< First embodiment >
First, a training apparatus 100 for training an abnormal sample detection apparatus according to a first embodiment of the present disclosure will be described with reference to fig. 1 to 6B.
Fig. 1 is a block diagram illustrating a training apparatus 100 for training an abnormal sample detection apparatus according to the present disclosure.
As shown in fig. 1, the training apparatus 100 may include: a first reconstruction unit 101 for generating a first reconstruction error and intermediate feature data based on training sample data, which is normal sample data, and a back-end processing unit 102 for generating a second reconstruction error based on the first reconstruction error and intermediate feature data. During training, joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria regarding the first reconstruction error and the second reconstruction error. The first reconstruction unit 101 and the back-end processing unit 102, which are finally obtained through the joint training, together form an abnormal sample detection device.
According to one embodiment of the present disclosure, the first reconstruction unit 101 may perform a reconstruction operation for the intermediate feature data to output first reconstruction data. Further, according to an embodiment of the present disclosure, the first reconstruction error may be a distance, e.g., euclidean distance, of the first reconstruction data from the corresponding training sample data in the vector space.
Those skilled in the art will recognize that while embodiments of the present disclosure have illustrated the first reconstruction error using euclidean distance as an example, the present disclosure is not so limited. Indeed, other indicators besides euclidean distance may be employed by those skilled in the art for measuring differences between the first reconstructed data and the training sample data, such as mahalanobis distance, cosine distance, etc. in vector space, all of which are equally within the scope of the present disclosure.
According to one embodiment of the present disclosure, the dimension of the training sample data is greater than or equal to the dimension of the intermediate feature data. In fact, the first reconstruction unit 101 may perform a feature extraction operation on the training sample data, the intermediate feature data characterizing the extracted features. For example, where the techniques according to this disclosure are applied to image recognition, the training sample data may be normal two-dimensional image data and the intermediate feature data may be a one-dimensional vector characterizing features of the extracted two-dimensional image data.
Further, in the case where the technique according to the present disclosure is applied to industrial control, the training sample data may be a one-dimensional vector constituted by data sensed by each industrial sensor, and the intermediate feature data may be a one-dimensional vector having a smaller number of elements than the training sample data.
Subsequently, the first reconstruction unit 101 may perform a reconstruction operation based on the intermediate feature data, resulting in first reconstruction data. The first reconstructed data has the same dimensions as the training sample data.
According to one embodiment of the present disclosure, the first reconstruction unit 101 may be implemented by a self-encoder.
The self-encoder is a neural network, and includes an input layer, a hidden layer, and an output layer each including neurons. The self-encoder can realize the compression and decompression processing of data.
The self-encoder is composed of an encoder and a decoder, both of which are essentially performing some kind of transformation on the data. The encoder is used to encode (compress) input data into low-dimensional data, and the decoder is used to decode (decompress) the compressed low-dimensional data into output data. The ideal self-encoder makes the reconstructed output data identical to the original input data, i.e. the error between the input data and the output data is zero.
Since the self-encoder is a technology known to those skilled in the art, the details of the self-encoder are not further described herein for brevity. Further, those skilled in the art will recognize that although embodiments of the present disclosure implement the first reconstruction unit 101 by using a self-encoder, the present disclosure is not limited thereto. Indeed, in light of the ideas of the present disclosure, the person skilled in the art may implement the reconstruction function of the first reconstruction unit using other reconstruction models than the self-encoder, as long as the reconstruction model is capable of extracting intermediate feature data and calculating the first reconstruction error. All such reconstruction models are intended to be within the scope of the present disclosure.
Fig. 2 is a schematic diagram illustrating a first reconstruction unit 101 implemented using a depth convolution self-encoder according to an embodiment of the present disclosure.
As shown in fig. 2, according to one embodiment of the present disclosure, the first reconstruction unit 101 may be implemented by a depth convolution self-encoder (DCAE).
As shown in fig. 2, both the encoder and decoder of the DCAE are implemented by a Convolutional Neural Network (CNN), and thus can process complex image data.
Since CNN is a technology known to those skilled in the art, the details of CNN are not further described herein for the sake of brevity.
Those skilled in the art will recognize that while embodiments of the present disclosure are illustrated by applying a depth convolution self-encoder to image data, the present disclosure is not so limited. Depending on the type of sample data to be processed in a particular application environment, different types of self-encoders may be applied. For example, when the abnormal sample detection apparatus according to the present disclosure is applied to an industrial control environment, the input data may be data composed of data sensed by various sensors, in which case the technical solutions of the present disclosure may be implemented using a sparse self-encoder, all of which are intended to be within the scope of the present disclosure.
For example, as shown in fig. 2, the encoder and decoder of the DCAE constituting the first reconstruction unit 101 each include several hidden layers, such as convolution layers, pooling layers and a fully connected layer on the encoder side, and deconvolution layers, unpooling layers and a fully connected layer on the decoder side. Features of spatial information may be learned by the convolution layers, and information redundancy in the learned feature maps may be eliminated by the pooling layers. In addition, in order to obtain a highly generalizable DCAE through learning, some additional processing may be employed in training, including noise addition, whitening, cropping and flipping of the training sample data. Furthermore, during training, the DCAE may employ dropout processing and regularization processing to prevent overfitting.
Specifically, for the training sample data x that is input as normal sample data, the encoder of the DCAE will perform feature extraction on the training sample data x to obtain intermediate feature data h of a low dimension. Furthermore, the decoder of the DCAE restores the intermediate feature data h into first reconstructed data x' having dimensions consistent with the training sample vector x. The above-described process can be represented by the following formula (1).
h = f_1(W_1 x + b_1),   x' = f_2(W_2 h + b_2)    (1)
where f_1 and f_2 are activation functions, W_1 is the connection weight matrix of the neurons of the convolutional neural network on the encoder side, b_1 is the bias vector of the neurons of the convolutional neural network on the encoder side, W_2 is the connection weight matrix of the neurons of the convolutional neural network on the decoder side, and b_2 is the bias vector of the neurons of the convolutional neural network on the decoder side. Here, W_1 and b_1 are the parameters to be trained on the encoder side, which may be denoted by θ_e, and W_2 and b_2 are the parameters to be trained on the decoder side, which may be denoted by θ_d.
In view of the above-described processes involving DCAE, which are known to those skilled in the art, for the sake of brevity, only the application of DCAE in the embodiments of the present disclosure will be described herein, without a more detailed description of its principles.
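For illustration only, a minimal Python/PyTorch sketch of a convolutional autoencoder in the spirit of the DCAE of formula (1) is given below; the layer sizes, the assumed 1×28×28 single-channel input, and all identifiers are illustrative assumptions rather than the exact network of the present disclosure.

```python
import torch
import torch.nn as nn

class DCAE(nn.Module):
    """Illustrative deep convolutional autoencoder: x -> h -> x' (assumed 1x28x28 input)."""
    def __init__(self, feature_dim=32):
        super().__init__()
        self.encoder_conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
        )
        self.encoder_fc = nn.Linear(32 * 7 * 7, feature_dim)  # roughly h = f_1(W_1 x + b_1)
        self.decoder_fc = nn.Linear(feature_dim, 32 * 7 * 7)
        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder_fc(self.encoder_conv(x).flatten(1))              # intermediate feature data h
        x_rec = self.decoder_conv(self.decoder_fc(h).view(-1, 32, 7, 7))  # first reconstruction data x'
        e = (x - x_rec).flatten(1).norm(dim=1)                            # first reconstruction error per sample
        return x_rec, h, e
```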
Assuming that the training sample set contains N training samples x_i (1 ≤ i ≤ N), the loss function of the DCAE that can be used to characterize the first reconstruction error is represented by the following formula (2):
J_DCAE(θ_e, θ_d) = (1/N) Σ_{i=1}^{N} ||x_i − x'_i||_2^2    (2)
where the subscript 2 indicates taking the L2 norm.
To avoid overfitting and improve generalization capability, according to one embodiment of the present disclosure, a regularization term may be added to the loss function of formula (2) above. Thus, the loss function of formula (2) may take the form shown in the following formula (3).
where θ_i denotes the parameters to be trained of the fully connected layers in the convolutional neural networks that constitute the encoder and decoder of the DCAE, k is the number of parameters to be trained in the fully connected layers, and λ_1 is a predetermined hyper-parameter, i.e. a regularization parameter, which can be determined empirically or experimentally. According to one embodiment of the present disclosure, λ_1 may take a value of, for example, 10000.
Since the loss function of DCAE is known to those skilled in the art, the details thereof will not be further described here for the sake of brevity.
In the joint training process of the training apparatus 100, the parameters to be trained of the first reconstruction unit 101 implemented by the DCAE are θ_e and θ_d. In the case where the loss function contains a regularization term, the parameters to be trained also include θ_i.
It should be noted here that, for the i-th training sample data x_i among the N training sample data, the first reconstruction unit 101 generates first reconstruction data x'_i corresponding to x_i by performing a reconstruction operation, and the difference between the two is the first reconstruction error e_i with respect to the training sample data x_i. In this sense, the loss functions of formulas (2) and/or (3) above may represent the overall first reconstruction error over the N training sample data. Thus, the training process of minimizing the first reconstruction error of the first reconstruction unit 101 may be regarded as a process of minimizing the loss function of the first reconstruction unit 101.
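A hedged sketch of such a loss computation is given below; it assumes the batch-averaged squared L2 reconstruction error of formula (2) and an L2 regularization term on the fully connected parameters in the spirit of formula (3), where the exact way the regularization is weighted is an assumption.

```python
import torch

def dcae_loss(x, x_rec, fc_parameters, lam1):
    """Illustrative DCAE loss: mean squared L2 reconstruction error over the batch
    plus an L2 regularization term on the fully connected parameters.
    The disclosure mentions 10000 as an example value of lambda_1; how exactly the
    regularization term enters the loss is an assumption of this sketch."""
    reconstruction = ((x - x_rec).flatten(1).norm(dim=1) ** 2).mean()
    regularization = sum((p ** 2).sum() for p in fc_parameters)
    return reconstruction + lam1 * regularization
```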
Further, as shown in fig. 2, the first reconstruction unit 101 implemented by the DCAE may generate the first reconstruction error e and the intermediate feature data h based on the training sample data x, wherein the dimension of the training sample data x is greater than or equal to the dimension of the intermediate feature data h.
As described above, the first reconstruction error e may be a difference between the input training sample data and the output first reconstruction data of the first reconstruction unit 101, which may be represented by euclidean distances of the training sample data and the first reconstruction data in the vector space. The intermediate feature data h may represent features of the extracted training sample data, and thus both contain some information of the originally input training sample data.
In the prior art, for normal sample data, the first reconstruction unit implemented by DCAE can obtain a better reconstruction effect, i.e. the first reconstruction error between the input sample data and the output reconstruction data is smaller. However, the first reconstruction unit trained using normal sample data obtains a large first reconstruction error when reconstructing abnormal sample data, thereby being able to distinguish the abnormal sample data.
However, for some abnormal sample data, the first reconstruction error obtained by the first reconstruction unit may also be relatively small and thus may be mistaken for normal sample data, resulting in abnormal sample detection failure.
Therefore, according to the technical solution of the present disclosure, the first reconstruction error e and the intermediate feature data h are considered jointly, so that the training sample data is utilized to the greatest extent.
According to the present disclosure, the back-end processing unit 102 may further process the first reconstruction error e and the intermediate feature data h obtained by the first reconstruction unit 101 to obtain a second reconstruction error e'. In this way, the training device 100 can perform the joint training on the first reconstruction unit 101 and the back-end processing unit 102 based on the predetermined criteria regarding the first reconstruction error e and the second reconstruction error e', which takes into account not only the information of the training sample data included in the first reconstruction error e but also the information of the training sample data included in the intermediate feature data h, and thus can obtain the abnormal sample detection device with improved detection accuracy by the joint training.
The back-end processing unit 102 according to embodiments of the present disclosure may process the first reconstruction error e and the intermediate feature data h in a variety of ways.
Fig. 3 is a block diagram illustrating a training apparatus 100 according to a first embodiment of the present disclosure, wherein an example of a back-end processing unit 102 is given. Fig. 4 is a schematic configuration diagram showing the training device 100 according to the first embodiment of the present disclosure.
As shown in fig. 3, according to the first embodiment of the present disclosure, the back-end processing unit 102 may include a synthesizing unit 1021 for generating synthesized data based on the first reconstruction error and the intermediate feature data, and a second reconstruction unit 1022 for generating a second reconstruction error based on the synthesized data, wherein joint training is performed on the first reconstruction unit 101, the synthesizing unit 1021, and the second reconstruction unit 1022 according to a predetermined criterion that minimizes the first reconstruction error and the second reconstruction error.
According to one embodiment of the present disclosure, the synthesizing unit 1021 may generate the synthesized data z based on the first reconstruction error e and the intermediate feature data h. For example, the synthesizing unit 1021 may directly splice the first reconstruction error e and the intermediate feature data h together to form the synthesized data z. Typically, the first reconstruction error e is a numerical value and the intermediate feature data h is a one-dimensional vector, which are directly stitched together to form a new one-dimensional vector as the composite data z. However, the present disclosure is not limited thereto. The first reconstruction error e and the intermediate feature data h may be otherwise combined by those skilled in the art in light of the teachings of the present disclosure to form the composite data z.
In some cases, the first reconstruction error e may differ greatly in dimension and magnitude from the intermediate feature data h and thus cannot be directly combined.
In this case, according to one embodiment of the present disclosure, the synthesizing unit 1021 may perform normalization processing on the intermediate feature data h to match the first reconstruction error e in terms of dimension and magnitude, and then may combine the normalized intermediate feature data with the first reconstruction error e into synthesized data z. For example, the normalization processing may be normalization processing of each data element of the intermediate feature data h based on the first reconstruction error e.
In addition, the intermediate feature data h is compressed low-dimensional data, which itself does not have a sequence characteristic. As shown in fig. 4, in the case where the second reconstruction unit 1022 is implemented by a long short term memory model (LSTM) (described in more detail later), in order to facilitate further processing of the synthesized data z, sequence learning may be performed on the intermediate feature data h according to one embodiment of the present disclosure.
For example, the sequence learning may be performed according to the following expression (4).
h' = f_3(W_3 h + b_3)    (4)
where h' is the serialized intermediate feature data obtained by performing sequence learning on the intermediate feature data h, W_3 and b_3 are the connection weight matrix and bias vector as parameters to be learned, and f_3 is the activation function. Here, the parameters W_3 and b_3 used for the sequence learning performed by the synthesizing unit 1021 are trained in the joint training process of the first reconstruction unit 101, the synthesizing unit 1021, and the second reconstruction unit 1022. It should be noted that the synthesizing unit 1021 itself has no loss function, and the influence of the parameters W_3 and b_3 on the joint training is reflected in the loss function of the second reconstruction unit 1022.
Subsequently, the resulting serialized intermediate feature data h' after sequence learning is combined with the first reconstruction error e to obtain the synthetic data z.
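The following illustrative sketch shows one possible way the synthesizing unit 1021 could normalize h, apply the sequence-learning mapping of formula (4), and concatenate the result with e; the normalization rule, the choice of activation function, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SynthesizingUnit(nn.Module):
    """Illustrative synthesizing unit: normalizes h, applies the sequence-learning
    mapping of formula (4), and concatenates the result with the first
    reconstruction error e to form the synthesized data z."""
    def __init__(self, feature_dim=32):
        super().__init__()
        self.sequence_fc = nn.Linear(feature_dim, feature_dim)  # W_3, b_3 of formula (4)

    def forward(self, h, e):
        h_norm = h / (h.norm(dim=1, keepdim=True) + 1e-8)   # match h to e in magnitude (assumed scheme)
        h_seq = torch.tanh(self.sequence_fc(h_norm))         # h' = f_3(W_3 h + b_3)
        return torch.cat([h_seq, e.unsqueeze(1)], dim=1)     # synthesized data z = [h', e]
```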
Here, it should be noted that it is not necessary to perform sequence learning on the intermediate feature data h. For example, the second reconstruction unit 1022 may also be implemented by DCAE, in which case sequence learning is not required to be performed on the intermediate feature data h.
As shown in fig. 4, the synthesized data z synthesized by the synthesizing unit 1021 may be input into the second reconstructing unit 1022, and the second reconstructing unit 1022 performs a reconstruction operation on the synthesized data z, and calculates a second reconstruction error e 'from a difference between the obtained second reconstruction data z' and the synthesized data z.
According to one embodiment of the present disclosure, the second reconstruction error e 'may be a distance, e.g., euclidean distance, of the second reconstruction data z' from the synthesized data z in the vector space.
Similar to the first reconstruction error e, although the embodiment of the present disclosure describes the second reconstruction error e' by using the euclidean distance as an example, the present disclosure is not limited thereto. Indeed, other indicators besides euclidean distance may be employed by those skilled in the art for measuring differences between the second reconstructed data and the synthesized data, such as mahalanobis distance, cosine distance, etc., all of which are equally within the scope of this disclosure.
As shown in fig. 4, according to one embodiment of the present disclosure, the second reconstruction unit 1022 may be implemented using a Long Short Term Memory (LSTM) model.
The LSTM model is a recurrent neural network (RNN) suitable for processing and predicting important events with long intervals and delays in sequence data. The LSTM model is capable of learning long-range dependencies through its memory cells. An LSTM unit typically includes four components, namely an input gate i_t, an output gate o_t, a forget gate f_t, and a memory state c_t, where t represents the current time step. The memory state c_t affects the current state of the other components according to the state of the previous time step. The forget gate f_t can be used to determine which information should be discarded. The above process can be represented by the following formula (5):
i_t = σ(W_(i,x) x_t + W_(i,h) h_(t-1) + b_i)
f_t = σ(W_(f,x) x_t + W_(f,h) h_(t-1) + b_f)
g_t = tanh(W_(g,x) x_t + W_(g,h) h_(t-1) + b_g)
c_t = i_t ⊙ g_t + f_t ⊙ c_(t-1)
o_t = σ(W_(o,x) x_t + W_(o,h) h_(t-1) + b_o)
h_t = o_t ⊙ tanh(c_t)    (5)
where σ is the sigmoid function, ⊙ denotes element-wise multiplication of vectors, x_t denotes the input of the current time step t, h_t denotes the intermediate state of the current time step t, and o_t denotes the output of the current time step t. The connection weight matrices W_(i,x), W_(f,x), W_(g,x), W_(o,x) and the bias vectors b_i, b_f, b_g, b_o are parameters to be trained, denoted herein as θ_l.
Since the LSTM model is known to those skilled in the art, for brevity, only its application in the embodiments of the present disclosure will be described herein, without a more detailed description of its principles.
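Purely as a worked illustration of formula (5), the following sketch computes a single LSTM time step; the stacked parameter layout and identifiers are assumptions of this sketch.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM time step following formula (5).
    W_x, W_h, b hold the stacked input-gate/forget-gate/candidate/output-gate
    parameters (assumed shapes [4*H, D], [4*H, H], [4*H])."""
    gates = W_x @ x_t + W_h @ h_prev + b
    H = h_prev.shape[0]
    i_t = torch.sigmoid(gates[0:H])          # input gate
    f_t = torch.sigmoid(gates[H:2 * H])      # forget gate
    g_t = torch.tanh(gates[2 * H:3 * H])     # candidate state
    o_t = torch.sigmoid(gates[3 * H:4 * H])  # output gate
    c_t = i_t * g_t + f_t * c_prev           # memory state update
    h_t = o_t * torch.tanh(c_t)              # intermediate state / output
    return h_t, c_t
```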
According to an embodiment of the present disclosure, in order to improve the effect of the reconstruction operation, the second reconstruction unit 1022, which is implemented by the LSTM model, may perform both forward propagation and backward propagation.
Fig. 5 is an operation flow diagram illustrating the second reconstruction unit 1022 implemented using the long-short term memory model according to the first embodiment of the present disclosure.
As shown in fig. 5, the LSTM model implementing the second reconstruction unit 1022 receives the synthesized data z, which includes the serialized intermediate feature data h' and the first reconstruction error e, and propagates forward through n LSTM units, where n is equal to the vector length of the serialized intermediate feature data h'.
Furthermore, to improve the effect of the reconstruction operation, the LSTM model also performs backward propagation for reconstruction. In fig. 5, the symbols with a tilde superscript represent the reconstruction results of the forward propagation, and the symbols with a caret (hat) superscript represent the reconstruction results of the backward propagation.
Thus, the loss function of the LSTM model realizing the second reconstruction unit 1022 can be represented by the following formula (6).
where h'_(i,j) denotes the j-th sequence vector of the i-th intermediate feature data among the N serialized intermediate feature data h', the tilde-superscripted h'_(i,j) denotes the forward-propagation intermediate state corresponding to h'_(i,j), and the hat-superscripted h'_(i,j) denotes the backward-propagation intermediate state corresponding to h'_(i,j). Furthermore, e_i denotes the i-th first reconstruction error among the N first reconstruction errors, and the tilde-superscripted e_i denotes the forward-propagation result corresponding to e_i.
Furthermore, λ_2 is a predetermined hyper-parameter that can be used to adjust the relative weight of the serialized intermediate feature data h' and the first reconstruction error e in the resulting second reconstruction error e', and it can be determined empirically or experimentally. For example, λ_2 may take a value in the range of 0.1 to 1.
It should be noted here that, due to the recursive nature of the LSTM model and the physical meaning of the first reconstruction error e, the LSTM performs only forward propagation on the first reconstruction error e.
As described above, for the i-th training sample data x_i of the N training sample data, the first reconstruction data corresponding to x_i is x'_i, and the difference between the two is the first reconstruction error e_i of the first reconstruction unit 101 with respect to the training sample data x_i. Furthermore, the first reconstruction unit 101 generates intermediate feature data h_i for the i-th training sample data x_i. The synthesizing unit 1021 performs sequence learning on the intermediate feature data h_i to obtain serialized intermediate feature data h'_i, and combines it with the first reconstruction error e_i into synthesized data z_i.
The LSTM model implementing the second reconstruction unit 1022 generates two reconstructed intermediate feature data for the serialized intermediate feature data h'_i by forward propagation and backward propagation, respectively, and generates a reconstructed first reconstruction error for the first reconstruction error e_i by forward propagation.
In summary, the loss function of the above equation (6) may represent the second reconstruction error with respect to the population of N synthesized data in this sense.
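One possible realization of the second reconstruction unit 1022 is sketched below for illustration; the use of two unidirectional nn.LSTM modules, the shared projection layer, and the way e is appended to the forward sequence are assumptions of this sketch and not the exact architecture of the present disclosure.

```python
import torch
import torch.nn as nn

class SecondReconstructionUnit(nn.Module):
    """Illustrative LSTM-based second reconstruction unit: the serialized intermediate
    feature data h' is reconstructed element by element in the forward and backward
    directions, and the first reconstruction error e is reconstructed by forward
    propagation only, as described in the text."""
    def __init__(self, hidden=16):
        super().__init__()
        self.fwd = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.bwd = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.project = nn.Linear(hidden, 1)  # hidden state -> reconstructed scalar element

    def forward(self, h_seq, e):
        # forward pass runs over the serialized features followed by e (assumed ordering)
        seq_f = torch.cat([h_seq, e.unsqueeze(1)], dim=1).unsqueeze(-1)  # (batch, n+1, 1)
        out_f, _ = self.fwd(seq_f)
        # backward pass runs over the reversed serialized features only (e is forward-only)
        seq_b = torch.flip(h_seq, dims=[1]).unsqueeze(-1)                # (batch, n, 1)
        out_b, _ = self.bwd(seq_b)
        h_tilde = self.project(out_f[:, :-1, :]).squeeze(-1)             # forward reconstruction of h'
        e_tilde = self.project(out_f[:, -1, :]).squeeze(-1)              # forward reconstruction of e
        h_hat = torch.flip(self.project(out_b).squeeze(-1), dims=[1])    # backward reconstruction of h'
        return h_tilde, h_hat, e_tilde
```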
As described above, in the training apparatus 100 according to the first embodiment of the present disclosure, the first reconstruction unit 101 may generate the first reconstruction error and the intermediate feature data by performing reconstruction on the normal sample data for training; the synthesizing unit 1021 may combine the first reconstruction error and the intermediate feature data into synthesized data; subsequently, the second reconstruction unit 1022 may perform reconstruction on the synthetic data to generate a second reconstruction error, wherein the first reconstruction error generated by the first reconstruction unit may be generally represented by the above formula (2) or (3), and the second reconstruction error generated by the second reconstruction unit may be generally represented by the above formula (6).
According to the first embodiment of the present disclosure, the predetermined criterion based on which the joint training is performed on the first reconstruction unit 101 and the back-end processing unit 102 (e.g., including the synthesizing unit 1021 and the second reconstruction unit 1022) is that the sum of the first reconstruction error and the second reconstruction error is minimized. The predetermined criterion may be expressed by the overall loss function of the training apparatus 100 shown in the following formula (7).
J(θ_e, θ_d, θ_l) = J_DCAE(θ_e, θ_d) + λ_3 · J_LSTM(θ_l)    (7)
where λ_3 is a predetermined hyper-parameter, i.e., a weight, that can be used to adjust the relative weight of the first reconstruction error e and the second reconstruction error e' during the joint training process. In general, the loss function of the first reconstruction unit 101 should always be dominant in order to generate representative low-dimensional intermediate feature data and first reconstruction errors. Therefore, the hyper-parameter λ_3 is generally set to be less than 1; for example, the value of λ_3 ranges from 0.1 to 0.001.
The training apparatus 100 performs joint training of the first reconstruction unit 101 and the back-end processing unit 102 (e.g., including the synthesizing unit 1021 and the second reconstruction unit 1022) using training sample data that is normal sample data, by a gradient descent method based on the loss function of formula (7) above, until a predetermined number of iterations is reached, or until the difference between the results of two or more iterations stabilizes within a predetermined range. The first reconstruction unit 101 and the back-end processing unit 102 (e.g., including the synthesizing unit 1021 and the second reconstruction unit 1022) finally obtained by the joint training may constitute an abnormal sample detection device.
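An illustrative sketch of such a joint training loop is given below; the optimizer choice, the default value of lam3, and the helper loss callables standing in for the loss functions of the two units are assumptions of this sketch.

```python
import torch

def joint_train(first_unit, back_end, loader, first_loss, back_loss,
                lam3=0.01, epochs=50, lr=1e-3):
    """Hedged sketch of joint training in the spirit of formula (7): gradient descent
    on J = J_first + lam3 * J_back over the parameters of both units, using normal
    training samples only. `first_loss(x, x_rec)` and `back_loss(back_end, h, e)`
    are assumed callables standing in for the loss functions of the two units."""
    params = list(first_unit.parameters()) + list(back_end.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for x in loader:                     # batches of normal sample data only
            x_rec, h, e = first_unit(x)      # first reconstruction error and intermediate features
            loss = first_loss(x, x_rec) + lam3 * back_loss(back_end, h, e)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return first_unit, back_end
```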
In the detection of abnormal sample data by the abnormal sample detection apparatus, when the input data is normal sample data, the second reconstruction error e' output by the back-end processing unit 102 (for example, including the synthesizing unit 1021 and the second reconstruction unit 1022) including information on the first reconstruction error e output by the first reconstruction unit 101 is smaller than a predetermined threshold value. The predetermined threshold may be used to distinguish between normal sample data and abnormal sample data. The predetermined threshold may be determined empirically or experimentally.
Therefore, when the sample data to be detected is input, if the second reconstruction error e' output by the back-end processing unit 102 (e.g., including the synthesizing unit 1021 and the second reconstruction unit 1022) is not less than the predetermined threshold value, it can be judged that the input sample data is abnormal sample data.
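For illustration, the detection-time decision described above may be sketched as follows; the helper second_error_fn, which turns the back-end output into the scalar e', is hypothetical.

```python
def is_abnormal(sample, first_unit, back_end, second_error_fn, threshold):
    """Detection-time sketch: a sample is judged abnormal when the second
    reconstruction error e' produced by the jointly trained units is not less
    than a predetermined threshold."""
    _, h, e = first_unit(sample)
    e_prime = second_error_fn(back_end(h, e))
    return e_prime >= threshold
```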
The following is a further explanation of the concepts of the present disclosure. Fig. 6A is a diagram showing a probability distribution of the first reconstruction error e of the DCAE, and fig. 6B is a diagram showing a distribution of the first reconstruction error e of the DCAE and the intermediate feature data h. The dark portions in fig. 6A and 6B correspond to normal samples for training, and the light portions correspond to abnormal samples to be detected.
As shown in fig. 6A, if only the first reconstruction error e of the first reconstruction unit 101 is considered, there is an overlap of probability distributions of normal samples and abnormal samples at the circled portion in the figure, and thus the first reconstruction unit 101 cannot accurately identify the abnormal samples within the portion. As shown in fig. 6B, further considering the first reconstruction error e in combination with the intermediate feature data h according to the technique of the present disclosure, it can be clearly seen that normal samples can be more accurately distinguished from abnormal samples.
Thus, according to the techniques of the present disclosure, both the first reconstruction error e and the intermediate feature data h are used in combination for training to preserve as much information as possible of the normal sample data input for training. By the processing according to the technology of the present disclosure, as shown in fig. 6B, a normal sample and an abnormal sample can be clearly distinguished, thereby improving the accuracy of abnormal sample detection.
The abnormal sample detection apparatus according to the first embodiment of the present disclosure was tested against a classical gray-scale image dataset MNIST commonly used in the art. The test results are shown in table 1 below.
TABLE 1
where ρ represents the anomaly ratio, and the indexes Prec (precision), Rec (recall) and F1 (F value) are indexes commonly used in existing abnormal sample detection techniques for measuring detection performance. They are defined as shown in the following formula (8):
Prec = TP / (TP + FP),  Rec = TP / (TP + FN),  F1 = 2 · Prec · Rec / (Prec + Rec)    (8)
where TP, FN, FP and TN in formula (8) represent true positives, false negatives, false positives and true negatives, respectively.
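For reference, these indexes may be computed as in the following short sketch, which follows the standard definitions of formula (8); TN is not needed for these three indexes.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 value as defined in formula (8)."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1
```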
The test results in table 1 show that the abnormal sample detection apparatus obtained by training performed by the training apparatus 100 according to the first embodiment of the present disclosure is superior to the related art abnormal sample detection apparatuses DSEBM, DAGMM, and OCSVM in terms of various indexes for measuring abnormal sample detection performance.
< Second embodiment >
Next, a training device 100 for training an abnormal sample detection device according to a second embodiment of the present disclosure will be described with reference to fig. 7 and 8.
The second embodiment of the present disclosure differs from the first embodiment in that the back-end processing unit 102 that performs back-end processing on the first reconstruction error and the intermediate feature data output by the first reconstruction unit 101 is implemented using a prediction mechanism, and thus, for brevity, a repetitive description of the first reconstruction unit 101 is not made here.
As described above, if only the first reconstruction error e of the first reconstruction unit 101 is considered, there is an overlap of probability distributions of normal samples and abnormal samples at the circled portion in fig. 6A, and thus the first reconstruction unit 101 cannot accurately identify the abnormal samples within the portion.
According to a second embodiment of the present disclosure, the back-end processing unit 102 may predict the second reconstruction error e 'based on the first reconstruction error e and the intermediate feature data h, wherein the predetermined criterion for performing the joint training on the first reconstruction unit 101 and the back-end processing unit 102 is to minimize the difference between the second reconstruction error e' and the first reconstruction error e.
According to one embodiment, the back-end processing unit 102 may be implemented by a multi-layer perceptron (MLP).
Fig. 7 is a schematic configuration diagram showing a training device 100 according to a second embodiment of the present disclosure.
As shown in fig. 7, the first reconstruction unit 101 of the training apparatus 100 of the second embodiment of the present disclosure is the same as the first reconstruction unit 101 of the first embodiment, except that the back-end processing unit 102 is implemented by an MLP.
The MLP is a feedforward neural network with hidden layers that can be used to fit complex functions.
The second embodiment of the present disclosure is based on the idea that the second reconstruction error e' can be predicted from the intermediate feature data h by establishing a correspondence between the intermediate feature data h and the first reconstruction error e through training of the back-end processing unit 102 implemented by the MLP.
The second reconstruction error e' output by the back-end processing unit 102 of the MLP implementation can be represented by the following equation (9).
e' = f_m(W_m h + b_m)    (9)
where f_m is the activation function, W_m is the connection weight matrix of the neurons of each layer in the MLP, and b_m is the bias vector of the neurons. Here, W_m and b_m are the parameters of the MLP to be trained, which may be denoted by θ_m.
Training of the back-end processing unit 102 by the MLP may be regarded as establishing a correspondence between the intermediate feature data h and the first reconstruction errors e, for which the trained back-end processing unit 102 may predict the second reconstruction errors e 'with the aim of bringing the second reconstruction errors e' as close as possible to the corresponding first reconstruction errors e. For example, according to one embodiment of the present disclosure, training is performed on the MLP such that the difference between the second reconstruction error e' and the first reconstruction error e is minimized.
In summary, for N training samples, the cost function of the MLP that can be used to characterize the overall difference between the second reconstruction error e' and the first reconstruction error e may be represented by the following formula (10).
J_MLP(θ_m) = (1/N) Σ_{i=1}^{N} (e'_i − e_i)^2    (10)
In this way, for the MLP trained with the training sample data as the normal sample data, the second reconstruction error e 'very close to the first reconstruction error e can be predicted, and for the abnormal sample data, the difference between the predicted second reconstruction error e' and the corresponding first reconstruction error e is very large, whereby the abnormal sample data can be identified.
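An illustrative sketch of such an MLP back-end and its cost is given below; the layer sizes, the activation choices, and the squared-difference form of the cost are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ErrorPredictor(nn.Module):
    """Illustrative MLP back-end of the second embodiment: predicts the second
    reconstruction error e' from the intermediate feature data h (formula (9))."""
    def __init__(self, feature_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, h):
        return self.mlp(h).squeeze(-1)  # predicted second reconstruction error e'

def mlp_loss(e_pred, e):
    """Cost in the spirit of formula (10): mean squared difference between the
    predicted e' and the first reconstruction error e (exact form is an assumption)."""
    return ((e_pred - e) ** 2).mean()
```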
As described above, in the training apparatus 100 according to the second embodiment of the present disclosure, the first reconstruction unit 101 may generate the first reconstruction error and the intermediate feature data by performing reconstruction on the normal sample data for training; the back-end processing unit 102 may establish a correspondence between the first reconstruction error, which may be generally represented by the above equation (2) or (3), and the intermediate feature data, and predict the second reconstruction error based thereon, which may be generally represented by the above equation (10).
According to a second embodiment of the present disclosure, the predetermined criterion on which the joint training is performed on the first reconstruction unit 101 and the back-end processing unit 102 is to minimize the difference between the first reconstruction error and the second reconstruction error. The predetermined criterion may be expressed by an overall loss function of the training apparatus 100 shown in the following equation (11).
J(θ_e, θ_d, θ_m) = J_DCAE(θ_e, θ_d) + λ_4 · J_MLP(θ_m)    (11)
where λ_4 is a predetermined hyper-parameter, i.e., a weight, that can be used to adjust the relative weight of the first reconstruction error e and the second reconstruction error e' during the joint training process, and it can be determined empirically or experimentally. In general, the loss function of the first reconstruction unit 101 should always be dominant in order to generate representative low-dimensional intermediate feature data and first reconstruction errors. Therefore, the hyper-parameter λ_4 is generally set to be less than 1; for example, the value of λ_4 ranges from 0.1 to 0.001.
The training apparatus 100 performs joint training of the first reconstruction unit 101 and the back-end processing unit 102 using training sample data as normal sample data in a gradient descent method based on the loss function of the above formula (11) until a predetermined number of iterations is reached or until a difference between results of two or more iterations stabilizes within a predetermined range. The first reconstruction unit 101 and the back-end processing unit 102 obtained by this joint training may constitute an abnormal sample detection device.
The principle of the second embodiment of the present disclosure is further described below with reference to fig. 8. Fig. 8 is a graph illustrating a method for predicting a second reconstruction error e' according to a second embodiment of the present disclosure.
The graph shown in fig. 8 is schematic and corresponds to fig. 6A. The dark curve in fig. 8 corresponds to a normal sample for training, while the light curve corresponds to an abnormal sample to be detected.
As shown in fig. 8, since the first reconstruction unit 101 is trained using training sample data consisting of normal sample data, the first reconstruction error is small for normal sample data and large for abnormal sample data. However, as also shown in fig. 8, the probability distribution curve of the first reconstruction error for normal sample data intersects the probability distribution curve of the first reconstruction error for abnormal sample data, so that, within the intersecting portion, it is not possible to accurately judge from the first reconstruction error alone whether the data input to the first reconstruction unit 101 is normal sample data or abnormal sample data.
Here, as shown in fig. 8, the normal sample data may be divided into two groups: a first group that generates smaller first reconstruction errors e_n1 and corresponding intermediate feature data h_n1, and a second group that generates larger first reconstruction errors e_n2 and corresponding intermediate feature data h_n2. Through the joint training of the first reconstruction unit 101 and the back-end processing unit 102, the correspondence between (h_n1, h_n2) and (e_n1, e_n2) is established, so that the differences between the second reconstruction errors e'_n1, e'_n2 predicted by the back-end processing unit 102 and the first reconstruction errors e_n1, e_n2 are both smaller than a certain predetermined threshold.
When the trained abnormal sample detection apparatus detects an abnormal sample, two cases arise. The first case is a larger first reconstruction error e_a1 of a magnitude that never occurred during the training phase; therefore, regardless of the distribution of the intermediate feature data h_a1, the second reconstruction error e'_a1 predicted by the back-end processing unit 102 necessarily differs considerably from the first reconstruction error e_a1, for example by more than the above-mentioned predetermined threshold, and the input can accordingly be determined to be abnormal sample data.
The second case is a smaller first reconstruction error e_a2, similar in magnitude to the larger first reconstruction error e_n2 of normal sample data. However, the distribution of the intermediate feature data h_a2 corresponding to this smaller first reconstruction error e_a2 necessarily differs from the distribution of the intermediate feature data h_n2 corresponding to the larger first reconstruction error e_n2, so that the second reconstruction error e'_a2 predicted by the back-end processing unit 102 necessarily differs from the first reconstruction error e_a2, for example by more than the predetermined threshold, and the input can accordingly be determined to be abnormal sample data.
It can be seen that, because the back-end processing unit 102 establishes the correspondence between the intermediate feature data h of normal sample data and the first reconstruction error e, abnormal samples falling in the region where the probability distribution curves of the first reconstruction error for normal and abnormal sample data intersect can still be identified accurately, thereby improving the accuracy of abnormal sample detection.
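Continuing the sketch above (same assumed module names; the threshold value is likewise an assumption), the two detection cases just described reduce to thresholding the discrepancy between the first and second reconstruction errors:

```python
@torch.no_grad()
def is_abnormal(x, threshold=0.05):
    """Flag a sample as abnormal when the second reconstruction error predicted from the
    intermediate feature data deviates from the first reconstruction error by more than
    a predetermined threshold (the threshold value here is an assumption)."""
    h, e = first_unit(x)      # first reconstruction error e and intermediate feature data h
    e_pred = back_end(h)      # second reconstruction error e' predicted by the back-end unit
    return (e_pred - e).abs() > threshold
```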
The abnormal sample detection apparatus according to the second embodiment of the present disclosure was tested on MNIST, a classical grayscale image dataset commonly used in the art. The test results are shown in table 2 below.
TABLE 2
The test results in table 2 show that the abnormal sample detection apparatus obtained by the training performed by the training apparatus 100 according to the second embodiment of the present disclosure outperforms the related-art abnormal sample detection methods DSEBM, DAGMM, and OCSVM on the various metrics used to measure abnormal sample detection performance.
Correspondingly, the disclosure also provides a training method for training the abnormal sample detection device.
Fig. 9 is a flowchart illustrating a training method 900 for training an abnormal sample detection apparatus according to an embodiment of the present disclosure.
The training method 900 starts in step S901. Subsequently, in a first reconstruction step S902, first reconstruction errors and intermediate feature data are generated by a first reconstruction unit based on training sample data as normal sample data.
The first reconstruction step S902 may be implemented by the first reconstruction unit 101 according to the first and second embodiments of the present disclosure.
Subsequently, in a back-end processing step S903, a second reconstruction error is generated by the back-end processing unit based on the first reconstruction error and the intermediate feature data.
The back-end processing step S903 may be implemented by the back-end processing unit 102 including the synthesizing unit 1021 and the second reconstruction unit 1022 according to the first embodiment of the present disclosure, or by the back-end processing unit 102 implemented by a multi-layer perceptron according to the second embodiment of the present disclosure.
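For illustration, a rough sketch of the first-embodiment variant of this step is given below; the normalization, the treatment of the synthesized data as a sequence fed to an LSTM, and all sizes and names are assumptions rather than the exact construction of the disclosure:

```python
import torch
import torch.nn as nn

class BackEndLSTM(nn.Module):
    """Sketch of the first-embodiment back-end: a synthesizing step followed by a second
    reconstruction of the synthesized data with an LSTM (all sizes are assumptions)."""
    def __init__(self, feat_dim=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=16, batch_first=True)
        self.out = nn.Linear(16, feat_dim)

    def synthesize(self, h, e):
        # Normalize the intermediate feature data so its scale matches the first
        # reconstruction error, then append e as the last element of the sequence.
        h_norm = (h - h.mean(dim=1, keepdim=True)) / (h.std(dim=1, keepdim=True) + 1e-8)
        h_norm = h_norm * e  # crude scale matching; the disclosure's exact normalization is assumed here
        return torch.cat([h_norm, e], dim=1).unsqueeze(-1)  # shape: (batch, sequence_length, 1)

    def forward(self, h, e):
        s = self.synthesize(h, e)                  # synthesized data
        y, _ = self.lstm(s)                        # sequence learning over the synthesized data
        s_rec = self.out(y)                        # second reconstruction data
        e2 = ((s_rec - s) ** 2).mean(dim=(1, 2))   # second reconstruction error
        return e2
```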
Next, in a joint training step S904, joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria regarding the first reconstruction error and the second reconstruction error.
The joint training performed in the joint training step S904 may be iterative training carried out by gradient descent on the training sample data based on the overall loss function, wherein the number of iterations may be set to a predetermined number or determined according to a criterion such as the difference between the results of two or more successive iterations stabilizing within a predetermined range.
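A minimal sketch of the second stopping criterion mentioned here (the window length and tolerance are assumptions) could be:

```python
def has_converged(loss_history, window=3, tol=1e-4):
    """Stop once the losses of the last few iterations stay within a small range."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) < tol
```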
Finally, the training method 900 ends at step S905.
Although the embodiments of the present disclosure are described above by taking image data as an example, it is apparent to those skilled in the art that the embodiments of the present disclosure are equally applicable to other abnormal sample detection fields, such as industrial control, network intrusion detection, pathology detection, financial risk identification, video monitoring, and the like.
Fig. 10 is a block diagram illustrating a general-purpose machine 1000 that may be used to implement the training apparatus and the training method according to embodiments of the present disclosure. The general-purpose machine 1000 may be, for example, a computer system. It should be noted that the general-purpose machine 1000 is only one example and does not imply any limitation on the scope of use or functionality of the methods and apparatus of the present disclosure. Nor should the general-purpose machine 1000 be construed as having any dependency on, or requirement relating to, any component or combination of components illustrated in the above-described training apparatus or training method.
In fig. 10, a Central Processing Unit (CPU) 1001 performs various processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 to a Random Access Memory (RAM) 1003. In the RAM 1003, data necessary when the CPU 1001 executes various processes and the like is also stored as needed. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output interface 1005 is also connected to the bus 1004.
The following components are also connected to the input/output interface 1005: an input section 1006 (including a keyboard, a mouse, and the like), an output section 1007 (including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like), a storage section 1008 (including a hard disk, and the like), and a communication section 1009 (including a network interface card such as a LAN card, a modem, and the like). The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 may also be connected to the input/output interface 1005, as desired. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like may be installed on the drive 1010 as needed, so that a computer program read out therefrom can be installed into the storage section 1008 as needed.
In the case where the series of processes described above is implemented by software, a program constituting the software may be installed from a network such as the internet or from a storage medium such as the removable medium 1011.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 1011 shown in fig. 10, which stores the program and is distributed separately from the apparatus in order to provide the program to the user. Examples of the removable medium 1011 include a magnetic disk (including a floppy disk), an optical disk (including a compact disk read-only memory (CD-ROM) and a digital versatile disk (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1002, a hard disk contained in the storage section 1008, or the like, which stores the program and is distributed to users together with the device containing it.
The present disclosure also provides a program product having stored thereon machine-readable instruction code. The instruction codes, when read and executed by a machine, may perform the training method according to the present disclosure described above. Accordingly, various storage media, as enumerated above, for carrying such program products are included within the scope of the present disclosure.
Specific embodiments of an apparatus and/or method according to embodiments of the present disclosure have been described above in detail with reference to block diagrams, flowcharts, and/or embodiments. When such block diagrams, flowcharts, and/or implementations comprise one or more functions and/or operations, it will be apparent to those skilled in the art that the functions and/or operations of such block diagrams, flowcharts, and/or implementations may be implemented by various hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter described in this specification can be implemented by an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), or other integrated form. However, those skilled in the art will recognize that some aspects of the embodiments described herein can be equivalently implemented in integrated circuits, in whole or in part, as one or more computer programs running on one or more computers (e.g., as one or more computer programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware of this disclosure is well within the skill of one of skill in the art in light of this disclosure.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components. The terms "first," "second," and the like, as used herein, relate to ordinal numbers and do not denote the order of implementation or importance of features, elements, steps, or components, as defined by the terms, but rather are used to identify the features, elements, steps, or components for clarity of description.
In summary, in embodiments according to the present disclosure, the present disclosure provides the following, but is not limited thereto:
Scheme 1. A training device for training an abnormal sample detection device, the training device comprising:
A first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and
A back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data,
Wherein joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria regarding the first reconstruction error and the second reconstruction error.
Scheme 2. The training device of scheme 1, wherein the first reconstruction unit is implemented by an autoencoder.
Scheme 3. The training device of scheme 2, wherein the first reconstruction unit is implemented by a deep convolutional autoencoder.
Scheme 4. The training device of scheme 1 wherein the first reconstruction unit is further configured to output first reconstruction data, the first reconstruction error being a distance in vector space of the first reconstruction data from the training sample data.
Scheme 5. The training device of scheme 1 wherein the vector dimension of the training sample data is greater than or equal to the vector dimension of the intermediate feature data.
Scheme 6. The training device of scheme 1 wherein the back-end processing unit comprises:
A synthesizing unit configured to generate synthesized data based on the first reconstruction error and the intermediate feature data; and
A second reconstruction unit configured to generate the second reconstruction error based on the synthesized data,
Wherein the predetermined criterion is to minimize a sum of the first reconstruction error and the second reconstruction error.
Scheme 7. The training device of scheme 6, wherein the second reconstruction unit is implemented by a long short-term memory model.
Scheme 8. The training device of scheme 6, wherein the second reconstruction unit is configured to output second reconstruction data, and the second reconstruction error is a distance of the second reconstruction data from the synthesized data in vector space.
Scheme 9. The training device of scheme 6 wherein said synthesis unit normalizes said intermediate feature data to match said first reconstruction error.
Scheme 10. The training device according to scheme 7 wherein said synthesis unit performs sequence learning on said intermediate feature data.
Scheme 11. The training device of scheme 6 wherein the loss function of the first reconstruction unit and the loss function of the second reconstruction unit are weighted summed to obtain a total loss function, the joint training being performed based on the total loss function.
Scheme 12. The training device of scheme 11, wherein the loss function of the second reconstruction unit implemented by the long short-term memory model is obtained by performing both forward propagation and backward propagation of the long short-term memory model.
Scheme 13. The training device of scheme 11 wherein the weight of the loss function of the first reconstruction unit is greater than the weight of the loss function of the second reconstruction unit.
Scheme 14. The training device of scheme 1, wherein the back-end processing unit is configured to predict the second reconstruction error based on the first reconstruction error and the intermediate feature data, and
Wherein the predetermined criterion is to minimize a difference between the second reconstruction error and the first reconstruction error.
Scheme 15. The training device of scheme 14 wherein the back-end processing unit is implemented by a multi-layer perceptron.
Scheme 16. The training device of scheme 14 wherein the loss function of the first reconstruction unit and the loss function of the back-end processing unit are weighted summed to obtain a total loss function, the total loss function being used to perform the joint training.
Scheme 17. The training device of scheme 14, wherein the weight of the loss function of the first reconstruction unit is greater than the weight of the loss function of the back-end processing unit.
Scheme 18. An abnormal sample detection apparatus comprising a trained first reconstruction unit and a trained back-end processing unit obtained through training by the training device according to any one of schemes 1 to 17.
Scheme 19. A training method for training an abnormal sample detection apparatus, the training method comprising:
A first reconstruction step of generating, by a first reconstruction unit, first reconstruction errors and intermediate feature data based on training sample data as normal sample data;
A back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
A joint training step of performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion regarding the first reconstruction error and the second reconstruction error.
Scheme 20. A computer readable storage medium having stored thereon a computer program which, when executed by a computer, implements the training method as described in scheme 19.
While the disclosure has been disclosed by the foregoing description of specific embodiments thereof, it will be understood that various modifications, improvements, or equivalents may be devised by those skilled in the art that will fall within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents are intended to be included within the scope of this disclosure.
Claims (16)
1. An abnormal image data detection apparatus configured to process image data to be detected to determine whether the image data to be detected is abnormal image data, the abnormal image data detection apparatus comprising a trained first reconstruction unit and a back-end processing unit obtained by:
A first reconstruction step of generating, by a first reconstruction unit, first reconstruction errors and intermediate feature data based on training image data as normal image data;
A back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
A joint training step for performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion regarding the first reconstruction error and the second reconstruction error to obtain the trained first reconstruction unit and back-end processing unit,
Wherein, in the case where the normal image data is normal two-dimensional image data, the intermediate feature data is a one-dimensional vector characterizing features extracted from the two-dimensional image data,
Wherein, the back-end processing step includes:
Generating, by a synthesizing unit in the back-end processing unit, synthesized data based on the first reconstruction error and the intermediate feature data; and
Performing a reconstruction operation on the synthesized data by a second reconstruction unit in the back-end processing unit to obtain second reconstruction data, and calculating the second reconstruction error from a difference between the second reconstruction data and the synthesized data,
Wherein the predetermined criterion is to minimize a sum of the first reconstruction error and the second reconstruction error.
2. The abnormal image data detection apparatus of claim 1, wherein the first reconstruction unit is implemented by an autoencoder.
3. The abnormal image data detecting apparatus of claim 2, wherein the first reconstruction unit is implemented by a deep convolutional autoencoder.
4. The abnormal image data detection apparatus according to claim 1, wherein the first reconstructing step includes generating first reconstruction data by the first reconstructing unit, the first reconstruction error being a distance of the first reconstruction data from the training image data in a vector space.
5. The abnormal image data detection apparatus according to claim 1, wherein a vector dimension of the training image data is greater than or equal to a vector dimension of the intermediate feature data.
6. The abnormal image data detecting apparatus according to claim 1, wherein the second reconstruction unit is implemented by a long short-term memory model.
7. The abnormal image data detection apparatus of claim 1, wherein the second reconstruction error is a distance of the second reconstruction data from the synthesized data in a vector space.
8. The abnormal image data detecting apparatus of claim 1, wherein generating the synthesized data comprises: normalizing, by the synthesizing unit, the intermediate feature data to match the first reconstruction error.
9. The abnormal image data detecting apparatus of claim 6, wherein generating the synthesized data comprises: performing, by the synthesizing unit, sequence learning on the intermediate feature data.
10. The abnormal image data detection apparatus according to claim 1, wherein the joint training step includes: weighting and summing the loss function of the first reconstruction unit and the loss function of the second reconstruction unit to obtain a total loss function, and performing the joint training based on the total loss function.
11. The abnormal image data detecting apparatus according to claim 10, wherein the loss function of the second reconstruction unit implemented by a long short-term memory model is obtained by performing both forward propagation and backward propagation of the long short-term memory model.
12. The abnormal image data detection apparatus according to claim 10, wherein a weight of a loss function of the first reconstruction unit is larger than a weight of a loss function of the second reconstruction unit.
13. An abnormal image data detection apparatus configured to process image data to be detected to determine whether the image data to be detected is abnormal image data, the abnormal image data detection apparatus comprising a trained first reconstruction unit and a back-end processing unit obtained by:
A first reconstruction step of generating, by a first reconstruction unit, first reconstruction errors and intermediate feature data based on training image data as normal image data;
A back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
A joint training step for performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion regarding the first reconstruction error and the second reconstruction error to obtain the trained first reconstruction unit and back-end processing unit,
Wherein, in the case where the normal image data is normal two-dimensional image data, the intermediate feature data is a one-dimensional vector characterizing features extracted from the two-dimensional image data,
Wherein, the back-end processing step includes: predicting, by the back-end processing unit, the second reconstruction error based on the first reconstruction error and the intermediate feature data, and
Wherein the predetermined criterion is to minimize a difference between the second reconstruction error and the first reconstruction error.
14. The abnormal image data detecting apparatus of claim 13, wherein the back-end processing unit is implemented by a multi-layer perceptron.
15. The abnormal image data detection apparatus according to claim 13, wherein a loss function of the first reconstruction unit and a loss function of the back-end processing unit are weighted and summed to obtain a total loss function, the total loss function being used to perform the joint training.
16. The abnormal image data detection apparatus of claim 15, wherein a weight of the loss function of the first reconstruction unit is greater than a weight of the loss function of the back-end processing unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811067951.5A CN110895705B (en) | 2018-09-13 | 2018-09-13 | Abnormal sample detection device, training device and training method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110895705A CN110895705A (en) | 2020-03-20 |
CN110895705B true CN110895705B (en) | 2024-05-14 |
Family
ID=69785281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811067951.5A Active CN110895705B (en) | 2018-09-13 | 2018-09-13 | Abnormal sample detection device, training device and training method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110895705B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113515684A (en) * | 2020-04-09 | 2021-10-19 | 阿里巴巴集团控股有限公司 | Abnormal data detection method and device |
CN113554146A (en) * | 2020-04-26 | 2021-10-26 | 华为技术有限公司 | Method for verifying labeled data, method and device for model training |
CN111709491B (en) * | 2020-06-30 | 2024-05-14 | 平安科技(深圳)有限公司 | Anomaly detection method, device, equipment and storage medium based on self-encoder |
CN112379269B (en) * | 2020-10-14 | 2024-03-05 | 武汉蔚来能源有限公司 | Battery abnormality detection model training and detection method and device thereof |
CN112287816B (en) * | 2020-10-28 | 2023-05-23 | 西安交通大学 | Dangerous work area accident automatic detection and alarm method based on deep learning |
CN112819156A (en) * | 2021-01-26 | 2021-05-18 | 支付宝(杭州)信息技术有限公司 | Data processing method, device and equipment |
CN113469234A (en) * | 2021-06-24 | 2021-10-01 | 成都卓拙科技有限公司 | Network flow abnormity detection method based on model-free federal meta-learning |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101593269A (en) * | 2008-05-29 | 2009-12-02 | 汉王科技股份有限公司 | Face identification device and method |
CN102708576A (en) * | 2012-05-18 | 2012-10-03 | 西安电子科技大学 | Method for reconstructing partitioned images by compressive sensing on the basis of structural dictionaries |
CN104915686A (en) * | 2015-07-03 | 2015-09-16 | 电子科技大学 | NMF-based target detection method |
CN105608478A (en) * | 2016-03-30 | 2016-05-25 | 苏州大学 | Combined method and system for extracting and classifying features of images |
CN106033548A (en) * | 2015-03-13 | 2016-10-19 | 中国科学院西安光学精密机械研究所 | Crowd abnormity detection method based on improved dictionary learning |
CN106203495A (en) * | 2016-07-01 | 2016-12-07 | 广东技术师范学院 | A kind of based on the sparse method for tracking target differentiating study |
CN106778558A (en) * | 2016-12-02 | 2017-05-31 | 电子科技大学 | A kind of facial age estimation method based on depth sorting network |
CN106803248A (en) * | 2016-12-18 | 2017-06-06 | 南京邮电大学 | Fuzzy license plate image blur evaluation method |
CN107679859A (en) * | 2017-07-18 | 2018-02-09 | 中国银联股份有限公司 | A kind of Risk Identification Method and system based on Transfer Depth study |
CN107729393A (en) * | 2017-09-20 | 2018-02-23 | 齐鲁工业大学 | File classification method and system based on mixing autocoder deep learning |
CN107870321A (en) * | 2017-11-03 | 2018-04-03 | 电子科技大学 | Radar range profile's target identification method based on pseudo label study |
CN108009571A (en) * | 2017-11-16 | 2018-05-08 | 苏州大学 | A kind of semi-supervised data classification method of new direct-push and system |
WO2018120043A1 (en) * | 2016-12-30 | 2018-07-05 | 华为技术有限公司 | Image reconstruction method and apparatus |
CN108399396A (en) * | 2018-03-20 | 2018-08-14 | 深圳职业技术学院 | A kind of face identification method based on kernel method and linear regression |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9098749B2 (en) * | 2013-03-14 | 2015-08-04 | Xerox Corporation | Dictionary design for computationally efficient video anomaly detection via sparse reconstruction techniques |
WO2014189550A1 (en) * | 2013-05-24 | 2014-11-27 | University Of Maryland | Statistical modelling, interpolation, measurement and anthropometry based prediction of head-related transfer functions |
JP6514503B2 (en) * | 2014-12-25 | 2019-05-15 | クラリオン株式会社 | Intention estimation device and intention estimation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |