CN118864454B

CN118864454B - An unsupervised anomaly detection method and system based on memory expert guidance

Info

Publication number: CN118864454B
Application number: CN202411336461.6A
Authority: CN
Inventors: 李刚; 李敏; 周鸣乐; 李旺; 陶瑞涛; 韩德隆; 张蕊; 陈天娇
Original assignee: Shandong Shanke Digital Economy Research Institute Co ltd; Qingdao Haier Smart Technology R&D Co Ltd; National Supercomputing Center in Jinan
Current assignee: Shandong Shanke Digital Economy Research Institute Co ltd; Qingdao Haier Smart Technology R&D Co Ltd; National Supercomputing Center in Jinan
Priority date: 2024-09-25
Filing date: 2024-09-25
Publication date: 2025-01-21
Anticipated expiration: 2044-09-25
Also published as: CN118864454A

Abstract

The invention provides an unsupervised anomaly detection method and system based on memory expert guidance, and relates to the technical field of anomaly detection. The method comprises: obtaining a defect image to be detected; and inputting the defect image into a trained defect detection model to obtain a glass container surface defect detection result. The defect detection model comprises, connected in sequence, a feature distillation network that extracts a difference-salient feature map and an anomaly refinement network that generates the defect detection result from the difference-salient feature map. The feature distillation network uses a normal memory expert to help a denoising student network learn normal samples: the normal memory expert stores memory vectors that a teacher network derives from normal sample features, and the denoising student network updates the query features it generates from the defect image according to these memory vectors. The invention can improve the accuracy of glass container surface defect detection.

Description

Unsupervised anomaly detection method and system based on memory expert guidance

Technical Field

The invention relates to the technical field of anomaly detection, in particular to an unsupervised anomaly detection method and system based on memory expert guidance.

Background

In intelligent manufacturing, anomaly detection is one of the key technologies: it helps enterprises monitor production processes in real time and discover and handle potential problems promptly, thereby safeguarding product quality and production safety. Anomaly detection is the process of identifying abnormal or rare objects, events, and patterns in images or videos. An anomaly may be any condition that does not conform to an expected pattern or that deviates significantly from most of the data. In industrial defect detection, unsupervised anomaly detection methods have attracted wide attention because data acquisition and labeling are costly and the anomaly types of industrial products are unpredictable.

In the glass container production process, anomaly detection is particularly important. During the production of glass containers, minor defects such as bubbles, cracks or uneven coating may affect the quality of the product and even lead to rejection of the product. Because of the variety of types of defects and the difficulty in labeling them completely, unsupervised anomaly detection techniques are critical for glass container defect detection.

Knowledge distillation methods have become an important technique in unsupervised anomaly detection tasks. The knowledge distillation-based anomaly detection framework can effectively utilize the advantages of the teacher network, guide the student network to learn the representation of the normal mode, and detect anomalies by identifying differences between the teacher network and the student network. However, in this process, the following problems often occur:

(1) Insufficient knowledge extraction: when knowledge is transferred from the teacher network to the student network, insufficient model capacity or insufficient training may leave the extraction incomplete, which impairs the student network's learning of defect features.

(2) Insufficient feature extraction: the difference in feature extraction capability between the teacher network and the student network may prevent the student network from learning anomalous features sufficiently, especially when dealing with subtle defects.

(3) Weak teacher constraint: the teacher network may not constrain the student network strongly enough, so the student network performs poorly when detecting defects in glass containers.

Disclosure of Invention

To overcome at least one of the shortcomings of the prior art, the invention aims to provide an unsupervised anomaly detection method and system based on memory expert guidance, so as to improve the accuracy of glass container surface defect detection.

To achieve the above object, according to some embodiments, a first aspect of the present invention provides an unsupervised anomaly detection method based on memory expert guidance, including:

Obtaining a defect image to be detected;

Inputting the defect image to be detected into a trained defect detection model for detection to obtain a glass container surface defect detection result;

The defect detection model comprises a feature distillation network that extracts a difference-salient feature map and an anomaly refinement network that generates the defect detection result from that map. The difference-salient feature map is obtained from the denoising student network and the teacher network of the feature distillation network. The feature distillation network uses a normal memory expert to help the denoising student network learn normal samples: the normal memory expert stores memory vectors that the teacher network derives from normal sample features, and the denoising student network updates the query features it generates from the defect image to be detected according to these memory vectors.

In a second aspect of the present invention, there is provided an unsupervised anomaly detection system based on memory expert guidance, comprising:

The image acquisition module is configured to acquire a defect image to be detected;

The defect detection module is configured to input the defect image to be detected into a trained defect detection model for detection, so as to obtain a glass container surface defect detection result.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides an unsupervised anomaly detection method and system based on memory expert guidance. A memory expert mechanism is introduced into the knowledge distillation process: the teacher network extracts high-level normal features from normal samples, stores them in the normal memory expert as memory vectors, and passes these memory vectors to the denoising student network, which updates its query features according to the stored memory vectors. This accelerates the image compression process, strengthens knowledge distillation, and improves the denoising student network's accuracy in identifying tiny defects. An adaptive perception state enhancement module is introduced into the denoising student network to fully extract global and local information during reconstruction, so that the denoising student network better attends to accurately recovering the surface details and structure of the glass container while denoising, which improves the reconstruction quality of normal samples. A dual-domain comparison method adds frequency-domain information on top of the spatial domain, which strengthens the teacher network's constraint on the denoising student network, lets the teacher guide the student to reconstruct higher-quality normal images, and further improves the accuracy of glass container surface defect detection.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a flow chart of a method according to a first embodiment of the invention;

FIG. 2 is a schematic diagram of the overall architecture of a defect detection model;

FIG. 3 is a schematic diagram of a normal information storage and recall network;

FIG. 4 is a schematic diagram of an adaptive perception state enhancement module;

FIG. 5 is a schematic diagram of a lossless frequency-domain feature encoder;

FIG. 6 is a schematic diagram of an anomaly refinement network.

Detailed Description

The invention will be further described with reference to the drawings and examples.

Example 1

An embodiment of the present invention provides an unsupervised anomaly detection method based on memory expert guidance, as shown in fig. 1 to 6, including:

Obtaining a defect image to be detected;

To address the problems of existing knowledge-distillation-based unsupervised anomaly detection, this embodiment constructs a defect detection model that introduces a memory expert mechanism to help the student network identify tiny defects in glass containers more accurately. First, a defect detection model consisting of a feature distillation network and an anomaly refinement network is constructed; next, the model is trained and tested on collected defective and defect-free industrial product surface images; finally, the trained model is deployed in an industrial application scenario to detect anomalies in glass containers during production.

The specific implementation process comprises the following steps:

Step S1: acquire industrial product surface defect images for training and testing the model, and preprocess the acquired images to obtain a training set and a test set;

Step S2: construct the feature distillation network;

Step S3: construct the anomaly refinement network;

Step S4: connect the feature distillation network and the anomaly refinement network to obtain the defect detection model, train it on the training set obtained in step S1, and test it on the test set to obtain the trained defect detection model;

Step S5: deploy the trained defect detection model.

In step S1, the industrial product surface defect images are screened and partitioned systematically. Defect-free images are selected as the training set so that the anomaly detection model learns to recognize normal surface characteristics accurately, while defective images form the test set used to evaluate the model's performance and accuracy in the actual detection task. To further improve the model's generalization and its sensitivity to anomalies, pseudo-anomalies are constructed from the normal samples in the training set: a pseudo-anomalous image is generated for each normal image, together with a mask that not only marks the anomalous region but also provides additional training information, forming a pseudo-anomaly training set.

The pseudo-anomalous images are generated so as to produce anomalies that are more complex and realistic. Specifically, to increase the diversity of the generated anomalous images, multi-stage nonlinear processing and noise transformations are introduced. The generation of the anomalous image P_a is formulated as follows:

[Equation not reproduced in the text.]

where ⊙ denotes element-wise multiplication; T is an anomaly-free training image, which retains the information of the original scene; an opacity factor, selected at random in the range (0, 1), serves as a data augmentation means; M is the anomaly mask, generated through multi-level nonlinear transformations and activation functions, which marks the anomalous region; and two additional noise terms represent local and global complex noise, respectively, increasing the diversity of the anomalous images, with their characteristics adjusted by two corresponding parameters.

In generating P_a, the non-anomalous region is represented by (1 − M), which makes the distinction between anomalous and non-anomalous regions more direct and pronounced, especially at the boundaries of the anomalous region. The additional local and global noise terms allow the noise characteristics to be controlled more intuitively, and the opacity factor allows the visibility of the anomaly to be adjusted flexibly, which makes image generation more flexible and helps train the model to recognize anomalies of different degrees.

The generation process of the anomaly mask M is as follows:

[Equation not reproduced in the text.]

where W_i is the convolution kernel and X_i is the input image; a ReLU nonlinear activation function is applied at each stage; a Sigmoid activation function is used to generate the mask; and N is the number of convolution layers, which determines the complexity of the generated anomaly mask. The per-stage convolution kernels W_i and activation functions are used to adjust the strength of the transformation.
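The mask generation and blending steps above can be sketched in NumPy. This is a minimal sketch, not the patented implementation: because the patent's own blending formula is not reproduced here, the blend (1 − M)⊙T + M⊙((1 − β)T + β·(T + local noise + global noise)), the random 3 × 3 kernels applied to a Gaussian noise field, the 0.5 threshold on the Sigmoid output, and the parameter names sigma_local and sigma_global are all assumptions chosen for illustration.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 2-D 'same' cross-correlation of a single-channel map with kernel k."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

def make_anomaly_mask(shape, n_layers=3, threshold=0.5, rng=None):
    """N conv stages with ReLU in between, a final Sigmoid, then a threshold (assumed)."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(shape)           # assumed: start from a noise field
    for i in range(n_layers):
        x = conv2d_same(x, rng.standard_normal((3, 3)))
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)           # ReLU between stages
    soft = 1.0 / (1.0 + np.exp(-x))          # Sigmoid produces a soft mask
    return (soft > threshold).astype(np.float64)

def make_pseudo_anomaly(t, sigma_local=0.1, sigma_global=0.05, rng=None):
    """Blend noise into the masked region of an anomaly-free image t in [0, 1] (assumed form)."""
    rng = np.random.default_rng(rng)
    m = make_anomaly_mask(t.shape, rng=rng)
    beta = rng.uniform(0.0, 1.0)                          # opacity factor
    local_noise = sigma_local * rng.standard_normal(t.shape)  # per-pixel noise
    global_noise = sigma_global * rng.standard_normal()       # one offset for the image
    texture = t + local_noise + global_noise
    p_a = (1.0 - m) * t + m * ((1.0 - beta) * t + beta * texture)
    return np.clip(p_a, 0.0, 1.0), m
```

Pixels outside the mask keep the original scene exactly, which mirrors the role of the (1 − M) term described above.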

In step S2, the feature distillation network comprises a denoising student network and a teacher network. A normal information storage and recall network and an adaptive perception state enhancement module are designed within it, and its parameters are updated by a dual-domain comparison method.

Specifically, as shown in FIG. 2, the teacher network consists of three pretrained ResNet blocks of corresponding dimensions, which extract the features of normal samples. The denoising student network has an encoder-decoder architecture: the encoder comprises four ResNet blocks and three normal information storage and recall networks, one connected after each of the first three ResNet blocks, and the decoder comprises two ResNet blocks and two adaptive perception state enhancement modules connected after the ResNet blocks.

Inspired by the way humans recognize anomalies, this embodiment introduces a memory expert mechanism into the knowledge distillation process to assist the denoising learning of the denoising student network; the core of this mechanism is the normal information storage and recall network. As shown in FIG. 3, a normal memory expert is established to store high-level normal features extracted from normal samples by a pretrained ResNet encoder (i.e., the teacher network), forming a set of memory information that represents the normal state. Because these high-level features accurately reflect the high-level information of the normal samples, the normal memory expert can help the student network carry out denoising learning during training: the student network not only learns to identify normal samples but also becomes more robust to noise and anomalies through the high-level features provided by the memory expert.

A normal learning strategy is adopted so that the memory expert acquires prior knowledge of normal data during training. First, a randomly initialized normal memory expert is established to store a fixed number of memory vectors m_i. In the training phase, the normal samples in the training set are fed into the teacher network, whose encoder encodes k training samples into normal sample features f_j ∈ R^d (j = 1, ..., k), where R^d denotes the d-dimensional real vector space. These normal sample features are stored in the normal memory expert so that learned normal knowledge can later be recalled from query features.

To further improve learning, an adaptive weighted update mechanism is introduced, which adjusts the weights according to the dynamic relationship between each memory vector and the currently input normal sample features:

[Equation not reproduced in the text.]

where m_i^t and m_i^(t+1) are the values of memory vector m_i at iterations t and t+1; a dynamic learning rate is adjusted adaptively according to how the loss function changes at each iteration step, which lets the method converge faster or avoid overfitting; a suitably chosen nonlinear function, here a Sigmoid function, adjusts the update amplitude dynamically according to the similarity between m_i and the normal sample feature, which increases the flexibility and expressive power of the model; f_j is the currently input normal sample feature; and an element-wise product preserves the relative orientation information between the input normal sample feature and the memory vector, which helps capture the relationship between the features more accurately.

Next, each normal sample feature f_j is flattened, its cosine similarity with every memory vector m_i in the normal memory expert is computed, and a softmax turns these similarities into weights, forming the first similarity score matrix:

$$w_{j,i} = \frac{\exp\left(\cos(f_j, m_i)\right)}{\sum_{l=1}^{K} \exp\left(\cos(f_j, m_l)\right)}$$

where cos(·, ·) denotes cosine similarity and K is the total number of memory vectors in the normal memory expert. The memory vectors m_i are then aggregated with these weights to obtain the first normalized feature:

$$\hat{f}_j = \sum_{i=1}^{K} w_{j,i}\, m_i$$
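The cosine-similarity addressing, softmax weighting, and weighted aggregation of memory vectors described above can be illustrated with a short NumPy sketch. It assumes the features are already flattened to d-dimensional rows and that the softmax runs over the K memory vectors; the function name memory_recall is hypothetical, and the adaptive memory-update rule is not modeled.

```python
import numpy as np

def memory_recall(features, memory):
    """Cosine-similarity + softmax addressing over K memory vectors, then a weighted sum.

    features: (n, d) flattened sample (or query) features f_j
    memory:   (K, d) memory vectors m_i
    returns:  (n, d) recalled features and the (n, K) similarity score matrix
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    cos = f @ m.T                                    # (n, K) cosine similarities
    cos = cos - cos.max(axis=1, keepdims=True)       # shift for numerical stability
    e = np.exp(cos)
    scores = e / e.sum(axis=1, keepdims=True)        # softmax over the K memory slots
    recalled = scores @ memory                       # sum_i w_{j,i} m_i
    return recalled, scores
```

Because each row of the score matrix sums to 1, the recalled feature is always a convex combination of memory vectors, which is what keeps it inside the "normal" region the memory has learned.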

To ensure that normal information is memorized from the normal sample features, a normal memory loss function L_mem is employed to minimize the difference between the first normalized feature and the normal sample feature:

[Equation not reproduced in the text.]

where, in addition to this difference term, L_mem contains a sparsity loss function and a cross-sample consistency loss function, each multiplied by its own weight coefficient.

To promote sparsity of the memory vectors and avoid overfitting, a sparsity loss function is introduced to ensure that the memory vectors m_i are sparse:

[Equation not reproduced in the text.]

To keep features consistent across samples, a cross-sample consistency loss is introduced, which ensures that the structural relationships among samples remain consistent after reconstruction:

[Equation not reproduced in the text.]

where one term denotes the difference between two different normal sample features f_a and f_b, and the other denotes the difference between their first normalized features.
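The composition of L_mem can be sketched as follows. Because the individual loss formulas are not reproduced in the text, the concrete forms here are assumptions: mean squared error for the difference between each normal feature and its recalled (first normalized) feature, a mean absolute value (L1) penalty for sparsity of the memory, and, for cross-sample consistency, the squared mismatch between the pairwise Euclidean distances of the normal features and those of their recalled counterparts. The weights lam_sparse and lam_cons are hypothetical defaults.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distance between every pair of rows of x, shape (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def normal_memory_loss(f, f_hat, memory, lam_sparse=1e-3, lam_cons=0.1):
    """L_mem = recon + lam_sparse * L_sparse + lam_cons * L_cons (assumed forms).

    f:      (n, d) normal sample features from the teacher
    f_hat:  (n, d) first normalized (recalled) features
    memory: (K, d) memory vectors
    """
    recon = np.mean((f - f_hat) ** 2)            # difference between f_j and its recall
    l_sparse = np.mean(np.abs(memory))           # L1 penalty pushes memory entries to zero
    # Relations between samples should survive the recall step.
    l_cons = np.mean((pairwise_dist(f) - pairwise_dist(f_hat)) ** 2)
    return recon + lam_sparse * l_sparse + lam_cons * l_cons
```

A uniform shift of all recalled features leaves the consistency term at zero and is penalized only by the reconstruction term, which illustrates how the two terms constrain different aspects of the memory.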

After the normal memory expert has been trained on teacher features, the remaining goal of the normal information storage and recall network is to adaptively adjust the normality of the features generated by the student network. To exploit the memorized prior knowledge through query features, the pseudo-anomalous images corresponding to the training set are fed into the denoising student network, which produces query features; these query features then recall the normal information stored in the memory expert.

Specifically, the denoising student network encodes the k pseudo-anomalous training samples into query features q_j ∈ R^d (j = 1, ..., k), which recall the learned normal knowledge from the normal memory expert. Each query feature q_j is flattened, its cosine similarity with every memory vector m_i in the normal memory expert is computed, and a softmax turns these similarities into weights, forming the second similarity score matrix:

$$v_{j,i} = \frac{\exp\left(\cos(q_j, m_i)\right)}{\sum_{l=1}^{K} \exp\left(\cos(q_j, m_l)\right)}$$

where K is the total number of memory vectors in the memory expert.

The second similarity score matrix (weights v_{j,i}) controls how much of the relevant normal information is recalled and integrated during normalization. The memory vectors m_i are then aggregated with these weights to obtain the second normalized feature:

$$\hat{q}_j = \sum_{i=1}^{K} v_{j,i}\, m_i$$

Finally, the second normalized feature is reshaped back to the shape of a feature map and concatenated with the original query feature q_j to form the input to the next stage of the student network.
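One recall step on the student side (flatten the query, address the memory, reshape the recalled vector, and concatenate) might look like the sketch below. It assumes each memory vector has the same flattened size C·H·W as a query feature map and that the concatenation is along the channel axis; nmr_stage is a hypothetical name.

```python
import numpy as np

def nmr_stage(query_map, memory):
    """One normal information storage-and-recall step for a single query feature map.

    query_map: (C, H, W) query feature from a student ResNet block
    memory:    (K, C*H*W) memory vectors, flattened to the same size as the query
    returns:   (2C, H, W) = concat(query, recalled normal feature) along channels
    """
    c, h, w = query_map.shape
    q = query_map.reshape(-1)                                   # flatten q_j
    cos = memory @ q / (np.linalg.norm(memory, axis=1) * np.linalg.norm(q) + 1e-12)
    weights = np.exp(cos - cos.max())
    weights /= weights.sum()                                    # softmax over K slots
    q_hat = (weights @ memory).reshape(c, h, w)                 # weighted sum, back to a map
    return np.concatenate([query_map, q_hat], axis=0)           # input to the next block
```

Concatenating rather than replacing keeps the student's own evidence available to the next block, while the recalled half supplies the memorized normal pattern.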

The encoder of the denoising student network comprises four ResNet18 blocks and three normal information storage and recall networks, connected in the order ResNet block 1 → NMR1 → ResNet block 2 → NMR2 → ResNet block 3 → NMR3 → ResNet block 4, where NMR1, NMR2, and NMR3 denote the first, second, and third normal information storage and recall networks. The teacher network comprises three ResNet blocks corresponding to the first three ResNet blocks of the student encoder; the features extracted by each teacher block are fed into the normal information storage and recall network that follows the corresponding block in the denoising student network, where they train the normal memory expert of that network.
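The encoder wiring can be expressed as a simple forward loop. In this sketch the ResNet blocks and normal information storage and recall networks are arbitrary callables (hypothetical stand-ins, not real layers), which makes the execution order easy to check with string tracing; teacher-side training of the memory experts is not shown.

```python
def student_encoder_forward(x, student_blocks, nmr_modules):
    """block_1 -> NMR_1 -> block_2 -> NMR_2 -> block_3 -> NMR_3 -> block_4.

    student_blocks: list of 4 callables standing in for ResNet blocks.
    nmr_modules:    list of 3 callables standing in for NMR1..NMR3.
    Returns the output of every ResNet block so later losses can tap each scale.
    """
    assert len(student_blocks) == 4 and len(nmr_modules) == 3
    outputs = []
    for i, block in enumerate(student_blocks):
        x = block(x)
        outputs.append(x)
        if i < len(nmr_modules):      # an NMR follows each of the first three blocks
            x = nmr_modules[i](x)
    return outputs
```

With string-appending stand-ins, the last output reads "B1|N1|B2|N2|B3|N3|B4|", matching the connection order listed above.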

The adaptive perception state enhancement module is based on the Mamba state space model. Taking into account the requirements of the model in this embodiment, it improves the post-processing that follows the visual state space block so as to extract both global and local information, which makes the denoising student network pay more attention to accurately recovering details and structure during denoising.

Specifically, the module uses the visual state space block to extract long-range spatial dependencies and adds local information extraction on top of it: after the visual state space block has extracted the spatial features, a strategy combining average pooling and max pooling improves the capture of local region features, which helps the model locate the objects of interest more accurately.

The structure of the adaptive perception state enhancement module EP-VSS is shown in FIG. 4. The input feature X is first processed by the visual state space block VSS to obtain its output X_vss. Average pooling and max pooling are then applied to X_vss to capture the global average features and global maximum features, respectively, and the two are concatenated (cat) to strengthen the model's global understanding of the input data, giving the concatenated feature y, computed as:

[Equation not reproduced in the text.]

where X_vss is the output of the visual state space block; H and W are the height and width of the feature map; and (H, 1) and (1, W) are the window sizes of the pooling operations, which pool along the height and width directions, respectively.

Features are then extracted by a convolution layer, and batch normalization (BN) and a Sigmoid activation function enhance the nonlinear expressive capacity of the model, giving the enhanced feature y_e:

[Equation not reproduced in the text.]

where conv(y) denotes a convolution applied to y and ⊙ denotes element-wise multiplication.

Then y_e is split along the second dimension (the channel dimension) into two parts of sizes H and W, giving the feature maps x_h and x_w:

[Equation not reproduced in the text.]

Next, the attention weights a_w and a_h in the width and height directions are computed by convolution layers followed by a Sigmoid activation function:

[Equation not reproduced in the text.]

where σ is the Sigmoid activation function, and x_w is permuted to fit the convolution operation. Finally, the width and height attention weights are multiplied to obtain the final attention map, which is multiplied with the input feature map to enhance important features:

$$\mathrm{out} = \mathrm{identity}(X) \odot a_w \odot a_h$$

where identity denotes the identity mapping of the input, i.e., the input feature X is used directly, and out is the feature map output by the adaptive perception state enhancement module.
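The pooling, gating, splitting, and directional attention steps of EP-VSS resemble coordinate attention and can be sketched in NumPy for a single (C, H, W) feature map. Several details are assumptions because the formulas are not reproduced: the VSS block is replaced by an identity stand-in, average and max pooling are combined by addition in each direction, 1 × 1 convolutions are channel-mixing matrices, a per-channel standardization stands in for batch normalization, and the gate y ⊙ Sigmoid(BN(conv(y))) is one possible reading of the enhancement step. The name ep_vss_post is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ep_vss_post(x, w_mix, w_h, w_w, vss=lambda t: t, eps=1e-5):
    """Direction-aware attention after a VSS block (coordinate-attention-style sketch).

    x: (C, H, W) input feature; w_mix, w_h, w_w: (C, C) 1x1-conv weights.
    vss: stand-in for the visual state space block (identity by default).
    """
    _, h, _ = x.shape
    x_vss = vss(x)
    # Pool across width -> per-row descriptor (C, H); across height -> per-column (C, W).
    row = x_vss.mean(axis=2) + x_vss.max(axis=2)
    col = x_vss.mean(axis=1) + x_vss.max(axis=1)
    y = np.concatenate([row, col], axis=1)                 # (C, H + W)
    z = w_mix @ y                                          # 1x1 conv = channel mixing
    z = (z - z.mean(axis=1, keepdims=True)) / np.sqrt(z.var(axis=1, keepdims=True) + eps)
    y_e = y * sigmoid(z)                                   # element-wise gate
    # Split back into height and width parts; in this 2-D layout no permute is needed.
    x_h, x_w = y_e[:, :h], y_e[:, h:]
    a_h = sigmoid(w_h @ x_h)                               # (C, H) height attention
    a_w = sigmoid(w_w @ x_w)                               # (C, W) width attention
    # out = identity(X) * a_w * a_h, broadcast over the missing axis
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Since both attention maps lie in (0, 1), the module can only re-weight the input, never amplify it beyond its original magnitude.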

As shown in FIG. 2, to improve the performance of the feature distillation network and optimize the structure of the denoising student network, this embodiment applies a dual-domain comparison between the denoising student network and the teacher network used to extract normal features. On top of conventional spatial-domain analysis, this method introduces a more efficient frequency-domain analysis strategy: the outputs of the teacher network at three scales are compared, in both domains, with the outputs of the last ResNet block and the two adaptive perception state enhancement modules of the denoising student network, which drives the denoising student network to reconstruct higher-quality normal images under the guidance of the teacher network.

Specifically, the feature map obtained by the teacher network and the corresponding feature map produced by the denoising student network are first passed through a lossless frequency-domain feature encoder (LFDFE). The key to this step is that the features are converted from the spatial domain to the frequency domain without losing any image information. Redundant information is then removed by a filter. On this basis, a dual-domain composite loss function is constructed that jointly considers image restoration performance in the feature space and image restoration efficiency in the wavelet frequency-domain space.

The lossless frequency-domain feature encoder first applies a Haar wavelet transform to the input feature map I, decomposing it into four key components: an approximation (low-frequency) component A and detail (high-frequency) components in the horizontal (C), vertical (V), and diagonal (D) directions. The four components are concatenated and then refined by a filter consisting of a standard 1 × 1 convolution layer, a batch normalization layer, and a ReLU activation function, which filters out redundant information and provides more effective representative features. The Haar wavelet transform can be expressed as:

$$I_{\mathrm{Haar}} = H(I) = \{A, C, V, D\}$$

where I_Haar denotes the features obtained after the Haar wavelet transform, and H(·) denotes the Haar wavelet transform.

After the four key components obtained by the wavelet transform are concatenated, they are further processed by the filter:

$$I_{\mathrm{f}} = \mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{conv}_{1\times 1}\left(\mathrm{cat}(A, C, V, D)\right)\right)\right)$$

That is, the four components are concatenated by cat, processed by the 1 × 1 convolution conv1×1, and passed through batch normalization BN and the ReLU activation function to obtain the filtered features I_f.
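A single-level Haar decomposition followed by the concatenate, 1 × 1 convolution, BN, and ReLU filter can be sketched as below. The orthonormal 1/2 scaling makes the transform invertible, which is what "lossless" requires; the naming of the horizontal and vertical detail bands follows one common convention (libraries differ), and the per-channel standardization standing in for batch normalization, as well as the function names haar_dwt2 and lfdfe, are assumptions.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2-D Haar transform of a (C, H, W) map with even H, W.

    Returns the approximation A and horizontal, vertical, diagonal details,
    each of shape (C, H/2, W/2).
    """
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    approx = (a + b + c + d) / 2.0   # low-frequency average
    horiz = (a + b - c - d) / 2.0    # difference between rows
    vert = (a - b + c - d) / 2.0     # difference between columns
    diag = (a - b - c + d) / 2.0     # diagonal difference
    return approx, horiz, vert, diag

def lfdfe(x, w, eps=1e-5):
    """Haar -> cat -> 1x1 conv -> BN-like standardization -> ReLU.

    x: (C, H, W); w: (C_out, 4C) 1x1-conv weight.
    """
    comps = np.concatenate(haar_dwt2(x), axis=0)       # (4C, H/2, W/2)
    z = np.einsum("oc,chw->ohw", w, comps)             # 1x1 conv = channel mixing
    mu = z.mean(axis=(1, 2), keepdims=True)
    var = z.var(axis=(1, 2), keepdims=True)
    z = (z - mu) / np.sqrt(var + eps)                  # stand-in for batch norm
    return np.maximum(z, 0.0)                          # ReLU
```

The transform alone preserves energy and can be inverted exactly (for example, the top-left pixel of each 2 × 2 block is (A + C + V + D) / 2); information is discarded only by the learned filter, by design.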

The wavelet transform provides a multi-scale, multi-directional representation of the image, which helps identify and remove noise more accurately while preserving the image's important structural information. By using a frequency-domain loss function to minimize the difference between the wavelet transform of the normal image features and that of the denoised features, the network learns how to recover image details efficiently in the wavelet domain. The frequency-domain loss function L_freq can be expressed as:

[Equation not reproduced in the text.]

where W denotes lossless frequency-domain feature encoding; F_T^r is the normal image feature output by the teacher network at scale r; F_S^r is the denoised image feature output by the denoising student network at scale r; and R = 3 is the number of scales. Minimizing L_freq compares three pairs of features in the frequency domain: the output of the first ResNet block of the teacher network with that of the last adaptive perception state enhancement module of the student decoder; the output of the second teacher ResNet block with that of the penultimate adaptive perception state enhancement module of the student decoder; and the output of the third teacher ResNet block with that of the last ResNet block of the denoising student network decoder. In this way the network can optimize its parameters in the wavelet domain to achieve better denoising performance.

Combining the feature-domain loss and the frequency-domain loss gives the dual-domain composite loss function L_dual, which lets the network be optimized in both spaces:

[Equation not reproduced in the text.]

where L_feat denotes the feature-domain loss function; r indexes the scales, with R = 3; two weight parameters balance the two loss terms; h and w are the height and width of the feature map; and F_T^r(i, j) and F_S^r(i, j) are the original and denoised feature values at position (i, j).
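The dual-domain composite loss over the three teacher/student pairs can be sketched as follows. The feature-domain term here is a squared error summed over channels and averaged over the h × w positions, and the frequency-domain term compares orthonormal Haar coefficients directly, skipping the learned filter; both forms, and the weights alpha and beta, are assumptions since the patent's formulas are not reproduced.

```python
import numpy as np

def haar_dwt2(x):
    """Orthonormal single-level 2-D Haar transform of (C, H, W), stacked as (4C, H/2, W/2)."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    return np.concatenate([a + b + c + d, a + b - c - d,
                           a - b + c - d, a - b - c + d], axis=0) / 2.0

def dual_domain_loss(teacher_feats, student_feats, alpha=1.0, beta=1.0):
    """alpha * L_feat + beta * L_freq summed over the R paired scales (assumed forms)."""
    l_feat = 0.0
    l_freq = 0.0
    for f_t, f_s in zip(teacher_feats, student_feats):
        _, h, w = f_t.shape
        # Feature domain: squared error over channels, averaged over h * w positions.
        l_feat += ((f_t - f_s) ** 2).sum() / (h * w)
        # Frequency domain: the same comparison on Haar coefficients.
        l_freq += np.mean((haar_dwt2(f_t) - haar_dwt2(f_s)) ** 2)
    return alpha * l_feat + beta * l_freq
```

The two lists are expected in the pairing order described above: (teacher block 1, last EP-VSS module), (teacher block 2, penultimate EP-VSS module), and (teacher block 3, last student ResNet block).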

The teacher features F_T^r and student features F_S^r at the three scales are brought to a uniform scale and concatenated to obtain the difference-salient feature map:

[Equation not reproduced in the text.]

where r = 1, 2, 3 indexes the three scales, U(·) denotes upsampling, and cat(·) denotes concatenation.
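The fusion of the three scales into one difference-salient map can be sketched as below. The text does not specify how teacher and student features are compared at each scale, so the per-position cosine distance 1 − cos(F_T^r, F_S^r) used here (a common choice in distillation-based anomaly detection), the integer-factor nearest-neighbour upsampling, and the stacking of one channel per scale are assumptions; the function names are hypothetical.

```python
import numpy as np

def cosine_distance_map(f_t, f_s, eps=1e-8):
    """Per-position 1 - cos(teacher, student) over channels: (C, h, w) -> (h, w)."""
    num = (f_t * f_s).sum(axis=0)
    den = np.linalg.norm(f_t, axis=0) * np.linalg.norm(f_s, axis=0) + eps
    return 1.0 - num / den

def upsample_nearest(m, size):
    """Nearest-neighbour upsampling of an (h, w) map by integer factors to size (H, W)."""
    fh, fw = size[0] // m.shape[0], size[1] // m.shape[1]
    return np.repeat(np.repeat(m, fh, axis=0), fw, axis=1)

def difference_salient_map(teacher_feats, student_feats, size):
    """cat_r U(diff(F_T^r, F_S^r)) -> (R, H, W), one channel per scale."""
    maps = [upsample_nearest(cosine_distance_map(t, s), size)
            for t, s in zip(teacher_feats, student_feats)]
    return np.stack(maps, axis=0)
```

Where the student reproduces the teacher, the map is close to zero; regions the student fails to reconstruct as normal stand out at every scale.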

In step S3, the anomaly refinement network of the anomaly detection model is constructed to further refine the difference-salient feature map produced by the feature distillation network and thus improve the accuracy of anomaly detection. As shown in FIG. 6, the anomaly refinement network comprises, connected in sequence, a multi-scale feature extraction module, a self-attention fusion module, a cascaded convolution module, a global context perception module, an attention-guided feature fusion module, and a refinement output module, which finally generates the anomaly detection result.

The multi-scale feature extraction module extracts information from the input feature map at different scales, strengthening the capture of anomalous features across scales. Feature extraction at each scale combines convolution and pooling layers:

[Equation not reproduced in the text.]

where Conv denotes convolution, Pooling denotes the pooling operation, and F_r denotes the multi-scale feature extracted at the r-th scale.

The features of different scales are then fused by the self-attention fusion module, which highlights key regions through a self-attention mechanism:

[Equation not reproduced in the text.]

where the self-attention operation computes attention weights and applies them to the multi-scale features, and F_att denotes the output features of the self-attention fusion module.

Next, in the cascade convolution module, each convolution layer further refines the features produced by the previous layer, as follows:

;

where BN denotes batch normalization, ReLU denotes the rectified linear unit activation function, and the left-hand side denotes the output feature of the cascade convolution module.
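The Conv, BN, ReLU cascade can be sketched on a one-dimensional feature sequence, which keeps the example short. Several simplifications apply. Batch normalization is shown in its inference form with given running statistics, the kernels are supplied by hand, and the `(kernel, mean, var)` stage layout is hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D cross-correlation with the same output length."""
    p = len(kernel) // 2
    n = len(x)
    return [sum(kernel[k] * x[i + k - p] for k in range(len(kernel)) if 0 <= i + k - p < n)
            for i in range(n)]


def batch_norm_inference(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """BN at inference time, using fixed running statistics."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def cascade(x, stages):
    """Each stage refines the previous stage's output: ReLU(BN(Conv(x)))."""
    for kernel, mean, var in stages:
        x = relu(batch_norm_inference(conv1d_same(x, kernel), mean, var))
    return x
```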

The global context sensing module then strengthens the capture of global information through a global context sensing mechanism, as follows:

;

where GAP denotes global average pooling, FC denotes a fully connected layer, and the left-hand side denotes the output feature of the global context sensing module.
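The text names only global average pooling and a fully connected layer. The sketch below assumes a squeeze-and-excitation-style design, in which the FC output passes through a sigmoid and rescales each channel. That sigmoid gating step is an assumption, not taken from the source, and `global_context` is a hypothetical name.

```python
import math


def global_context(channels, fc_weights, fc_bias):
    """channels: list of C single-channel h x w maps; fc_weights: C x C; fc_bias: C."""
    # GAP: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]
    # FC layer over the pooled descriptors.
    logits = [sum(w * p for w, p in zip(w_row, pooled)) + b
              for w_row, b in zip(fc_weights, fc_bias)]
    # Assumed sigmoid gate, then channel-wise rescaling of the input maps.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, channels)]
```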

The attention-guided feature fusion module then fuses the multi-scale features through a multi-head attention mechanism to enhance their representational capacity, as follows:

;

where the head terms denote the attention outputs of heads 1, 2, ..., n, cat denotes concatenation, the fused term denotes the resulting fused multi-scale feature, and the final term denotes the output feature of the attention-guided feature fusion module. Finally, the refinement output module processes the final features to generate the anomaly detection result:

;

where Upsample denotes upsampling, which increases the resolution of the feature map, and the left-hand side denotes the output feature of the refinement output module.

The feature distillation network is then fixed, meaning its parameters are frozen. Pseudo-anomalous images are fed into the teacher network and the denoising student network, and the features the two networks extract are used to train the anomaly refinement network. The anomaly refinement loss function combines a multi-scale weighted loss and an adaptive fusion loss. It is designed to capture anomalous features better while balancing the influence of different classes, which improves the performance of the anomaly refinement network.

The multi-scale weighted loss function strengthens attention to anomalous regions by extracting and weighting features at different scales. Its formula is as follows:

;

The terms are defined as follows:
- S is the number of scales, and N_s is the number of samples at the s-th scale.
- The scale weight is the weight of the s-th scale.
- The per-sample weighting parameter at the s-th scale balances the loss across scales.
- The remaining two terms are the ground-truth label and the predicted value of sample c at the s-th scale, respectively.
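Below is a sketch of the multi-scale weighted loss. It assumes the per-sample term is binary cross-entropy on labels in {0, 1}, because the text does not name the per-sample loss. It also assumes that the scale weight multiplies the sample-weighted mean at each scale. The function names are hypothetical.

```python
import math


def bce(y, p, eps=1e-7):
    """Binary cross-entropy for one sample, with the prediction clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multi_scale_weighted_loss(scale_weights, sample_weights, labels, preds):
    """sum_s w_s * (1 / N_s) * sum_c a_{s,c} * BCE(y_{s,c}, p_{s,c}) (assumed form)."""
    total = 0.0
    for w_s, a_s, y_s, p_s in zip(scale_weights, sample_weights, labels, preds):
        n_s = len(y_s)
        total += w_s * sum(a * bce(y, p) for a, y, p in zip(a_s, y_s, p_s)) / n_s
    return total
```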

The adaptive fusion loss function optimizes anomaly detection performance by adaptively fusing the outputs of multiple feature layers. Its formula is as follows:

;

The terms are defined as follows:
- N is the number of samples.
- The adaptive weight of sample d balances the losses of different samples.
- L is the number of feature layers.
- The fusion weight of the l-th feature layer sets that layer's contribution.
- The remaining two terms are the ground-truth label of sample d and the predicted value of sample d at the l-th feature layer, respectively.
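Under the same binary cross-entropy assumption, the adaptive fusion loss can be sketched as a sample-weighted mean over samples of a layer-weighted sum over feature layers. The text does not describe how the adaptive and fusion weights are produced, so they are passed in as plain numbers here.

```python
import math


def bce(y, p, eps=1e-7):
    """Binary cross-entropy for one prediction, clipped for numerical stability."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def adaptive_fusion_loss(sample_weights, layer_weights, labels, layer_preds):
    """(1 / N) * sum_d g_d * sum_l b_l * BCE(y_d, p_{d,l}) (assumed form)."""
    n = len(labels)
    total = 0.0
    for g_d, y_d, preds_d in zip(sample_weights, labels, layer_preds):
        total += g_d * sum(b_l * bce(y_d, p) for b_l, p in zip(layer_weights, preds_d))
    return total / n
```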

The anomaly refinement loss function \(\mathcal{L}_{ar}\) combines the multi-scale weighted loss \(\mathcal{L}_{msw}\) and the adaptive fusion loss \(\mathcal{L}_{af}\) to optimize the performance of the anomaly refinement network:

$$\mathcal{L}_{ar} = \lambda_1 \mathcal{L}_{msw} + \lambda_2 \mathcal{L}_{af}$$

where \(\lambda_1\) and \(\lambda_2\) are the weight parameters of the multi-scale weighted loss and the adaptive fusion loss, respectively, and balance the contributions of the two losses.

In step S4, the defect detection model is assembled and trained on the training set. Each network is trained with its own loss:
- the normal information storage and recall network, with the normal memory loss function;
- the feature distillation network, with the dual-domain composite loss function;
- the anomaly refinement network, with the anomaly refinement loss function.

The test set is then used to verify the model's performance.

The training process is divided into two stages.

In the first stage, normal images are used as input to the teacher network and pseudo-anomalous images as input to the denoising student network. First, the normal information storage and recall network is trained on the teacher network's features. Then the intermediate features output by the encoder of the denoising student network are fed into the normal information storage and recall network for recombination, and passed on to the decoder of the denoising student network. The training objective is to make the feature representations generated by the denoising student network as similar as possible to those of the teacher network. Finally, the feature distillation network generates the difference significant feature map.

In the second stage, the student network is fixed, and pseudo-anomalous images are used as input to both the teacher network and the student network. The anomaly refinement network receives the difference significant feature map from the feature distillation network and produces the final anomaly detection result through multi-level refinement.

The loss function of the model consists of three parts:
- the normal memory loss function, used to train the normal information storage and recall network;
- the dual-domain composite loss function, used to train the denoising student network, which comprises a frequency-domain loss function and a feature-domain loss function;
- the anomaly refinement loss function, used to train the anomaly refinement network.

Training of the defect detection model has two phases. The first 1000 rounds focus on memory enhancement and feature optimization: the normal information storage and recall network and the feature distillation network are trained to develop the student network's ability to remove anomalies. The subsequent 4000 rounds focus on feature refinement and performance optimization, training the anomaly refinement network to further improve the accuracy and generalization of the anomaly detection system. After every 1000 rounds, the model is evaluated on a separate test set and the best-performing weights are retained. This strategy selects the best-performing model at each stage of training, and the model that performs best on the test set is finally chosen for deployment.
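The two-stage schedule with periodic evaluation can be sketched as a loop over training rounds. The callbacks `train_stage1`, `train_stage2` and `evaluate` are hypothetical placeholders, and a higher evaluation score is assumed to be better. The sketch only records which round scored best; a real implementation would save the corresponding weights at that point.

```python
def run_schedule(train_stage1, train_stage2, evaluate,
                 stage1_rounds=1000, stage2_rounds=4000, eval_every=1000):
    """Stage 1: memory + distillation networks; stage 2: anomaly refinement network."""
    best_score, best_round = float("-inf"), None
    for rnd in range(1, stage1_rounds + stage2_rounds + 1):
        if rnd <= stage1_rounds:
            train_stage1(rnd)   # normal information storage/recall + feature distillation
        else:
            train_stage2(rnd)   # anomaly refinement network; distillation network frozen
        if rnd % eval_every == 0:
            score = evaluate(rnd)
            if score > best_score:  # keep the best-performing checkpoint so far
                best_score, best_round = score, rnd
    return best_round, best_score
```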

Example two

The embodiment provides an unsupervised anomaly detection system based on memory expert guidance, which comprises:

It should be noted that each module in this embodiment corresponds to a step of the method in the first embodiment and is implemented in the same way, so the details are not repeated here.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An unsupervised anomaly detection method based on memory expert guidance, characterized by comprising:

acquiring a defect image to be detected;

inputting the defect image to be detected into the trained defect detection model for detection to obtain a glass container surface defect detection result;

The defect detection model includes a feature distillation network for extracting a significant difference feature map and an anomaly refinement network for generating a defect detection result according to the significant difference feature map, which are connected in sequence. The significant difference feature map is obtained according to a denoising student network and a teacher network of the feature distillation network. The feature distillation network helps the denoising student network learn normal samples based on a normal memory expert. The normal memory expert stores a memory vector updated by the teacher network according to the normal sample features. The denoising student network updates the query features generated according to the defect image to be detected according to the memory vector.

The feature distillation network includes a teacher network and a denoising student network. The denoising student network is an encoder-decoder architecture. The encoder includes four Resnet18 blocks. The first three Resnet18 blocks of the encoder are connected to a normal information storage and recall network. Each of the normal information storage and recall networks includes a normal memory expert. The normal memory expert updates its own memory vector according to the normal sample features extracted from the normal sample by the corresponding layer of the teacher network.

The updating process of the memory vector is expressed as:

;

wherein the terms denote, in order, the value of the memory vector at the t-th iteration, the value of the memory vector at the (t+1)-th iteration, the dynamic learning rate, a nonlinear function, and the normal sample feature; · denotes element-wise multiplication.
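The exact memory update formula cannot be recovered from this text. Purely as an illustration of a gated, element-wise update with a dynamic learning rate, the sketch below assumes the form m_{t+1} = m_t + η_t · tanh(f − m_t), with η_t = η_0 / (1 + t). Both the tanh nonlinearity and this decay schedule are assumptions, and `update_memory` is a hypothetical name.

```python
import math


def update_memory(memory, feature, t, eta0=0.5):
    """Hypothetical element-wise update: m <- m + eta_t * tanh(f - m)."""
    eta_t = eta0 / (1.0 + t)  # assumed decaying "dynamic learning rate"
    return [m + eta_t * math.tanh(f - m) for m, f in zip(memory, feature)]
```

Under these assumptions, a memory slot moves steadily toward the normal sample feature without overshooting it, and stays unchanged once it matches the feature.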

2. The unsupervised anomaly detection method based on memory expert guidance according to claim 1, characterized in that the loss function used in training the normal memory expert is a normal memory loss function, expressed as:

;

wherein the symbols denote, in order, the sparsity loss function, the cross-sample consistency loss function, and two weight coefficients; k is the number of training samples; and the first normalized feature is obtained by aggregating the memory vectors according to a first similarity score matrix, which is computed from the similarity between the normal sample features and the memory vectors.

3. The unsupervised anomaly detection method based on memory expert guidance according to claim 1, characterized in that the denoising student network updates the query features generated according to the defect image to be detected according to the memory vector, comprising:

The normal information storage and recall network calculates the similarity between the input query features and the memory vectors stored in the normal memory expert to obtain a second similarity score matrix; aggregates the memory vectors according to the second similarity score matrix to obtain a second normalized feature; and concatenates the second normalized feature with the query feature to obtain the output of the normal information storage and recall network.
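The recall step of claim 3 can be sketched as follows. The sketch assumes cosine similarity for the similarity score and a softmax over memory slots to normalize the second similarity score matrix. It also assumes concatenation along the feature dimension. The `temperature` parameter and the function names are hypothetical.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def recall(query, memory_bank, temperature=1.0):
    """Similarity -> softmax weights -> aggregated memory -> concat with the query."""
    scores = [cosine(query, m) / temperature for m in memory_bank]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # "Second normalized feature": weighted aggregation of the memory vectors.
    normalized = [sum(w * m[c] for w, m in zip(weights, memory_bank)) for c in range(len(query))]
    return query + normalized  # list concatenation along the feature dimension
```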

4. The unsupervised anomaly detection method based on memory expert guidance according to claim 1, characterized in that the decoder of the denoising student network includes two Resnet18 blocks and two adaptive perception state enhancement modules connected after the Resnet18 blocks. The adaptive perception state enhancement module processes the input features with a visual state space block and obtains the global average feature and the global maximum feature of the output of the visual state space block. It concatenates the global average feature and the global maximum feature to obtain a spliced feature, and enhances the spliced feature to obtain the output feature map.

5. The unsupervised anomaly detection method based on memory expert guidance according to claim 4, characterized in that the enhancement of the splicing features comprises:

processing the spliced features sequentially through a convolution layer, batch normalization and an activation function to obtain enhanced features;

splitting the enhanced features and, based on the split feature maps, obtaining attention weights in the width and height directions, respectively;

multiplying the acquired attention weights in the width and height directions by the input features of the adaptive perception state enhancement module to obtain the output feature map.
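Claims 4 and 5 can be read as a coordinate-attention-style module, and the sketch below is one such interpretation. Every design choice in it is an assumption:
- The visual state space block is treated as an identity mapping, so one map serves as both its output and the module input.
- Average and max descriptors are pooled along each spatial axis and stacked as two channels, which plays the role of the splice.
- A 1x1 mapping with batch normalization folded into its weights, followed by ReLU, stands in for the convolution, normalization and activation.
- A sigmoid produces the height and width weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def enhance(x, w_avg=1.0, w_max=1.0, bias=0.0):
    """x: single-channel h x w map (VSS block assumed to be identity here)."""
    h, w = len(x), len(x[0])
    cols = [[x[i][j] for i in range(h)] for j in range(w)]
    # Directional descriptors: per row (pooled along width), then per column (pooled along height).
    avg_seq = [sum(r) / w for r in x] + [sum(c) / h for c in cols]
    max_seq = [max(r) for r in x] + [max(c) for c in cols]
    # Splice avg and max as two channels, then a 1x1 mapping (2 -> 1, BN folded in) and ReLU.
    fused = [max(0.0, w_avg * a + w_max * m + bias) for a, m in zip(avg_seq, max_seq)]
    # Split into the height part and the width part and turn each into attention weights.
    a_h = [sigmoid(v) for v in fused[:h]]
    a_w = [sigmoid(v) for v in fused[h:]]
    # Multiply the weights back onto the module input.
    return [[x[i][j] * a_h[i] * a_w[j] for j in range(w)] for i in range(h)]
```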

6. An unsupervised anomaly detection method based on memory expert guidance as described in claim 4 is characterized in that the feature maps output by the last Resnet18 block and two adaptive perception state enhancement modules in the decoder of the denoising student network are compared with the feature maps output by the corresponding layers of the teacher network in the spatial domain and the frequency domain, so as to guide the denoising student network to reconstruct the normal image through the teacher network.

7. The unsupervised anomaly detection method based on memory expert guidance according to claim 1, characterized in that the significant difference feature map is obtained by correspondingly fusing and concatenating the output features of the last three layers of the denoising student network decoder with the output features of the teacher network at the three scales. The anomaly refinement network includes a multi-scale feature extraction module, a self-attention fusion module, a cascade convolution module, a global context perception module, an attention-guided feature fusion module, and a refinement output module connected in sequence, and it further refines the significant difference feature map to generate anomaly detection results.

8. An unsupervised anomaly detection system based on memory expert guidance, characterized by comprising:

An image acquisition module is configured to acquire an image of a defect to be detected;

A defect detection module is configured to input a defect image to be detected into a trained defect detection model for detection, and obtain a defect detection result of the glass container surface;

The updating process of the memory vector is expressed as:

;