CN116912625A - Data enhancement method based on priori defect characteristics and SSPCAB attention mechanism - Google Patents
- Publication number
- CN116912625A (application number CN202310912045.5A)
- Authority
- CN
- China
- Prior art keywords: image, defect, sample, training, neural network
- Prior art date: 2023-07-25
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06T7/0004—Industrial image inspection
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/776—Validation; Performance evaluation
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20132—Image cropping
- G06T2207/20221—Image fusion; Image merging
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention provides a data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism, and relates to the field of industrial defect detection. A rectangular region of random size is cut from a random position of an input normal sample image. The defect types and shapes of the industrial product are collected as prior knowledge, and a mask transformation is applied to the rectangle to obtain an image patch resembling the defect shape of the corresponding product. The patch is randomly rotated by a certain angle, randomly color-jittered, and pasted at a random position of the original image to obtain a simulated defect image. The simulated defect images and normal images are fed into a ResNet-18 neural network into which a self-supervised attention module (SSPCAB) is integrated. The cyclical focal loss (CFL) serves as the loss function of the network, and the mean squared error loss produced by the attention module is added with a weight to form the objective function, finally yielding a defect detection model.
Description
Technical Field
The invention relates to the technical field of defect detection, and in particular to a data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism.
Background
Surface defect detection of industrial products has long been a research hotspot and is widely applied in various industrial processes, such as the printing industry, glass manufacturing, and the textile industry. Its purpose is to replace manual inspection by automatically detecting and locating defects such as scratches, cracks, dust, and ink marks on product surfaces. With the rapid development of deep learning, its performance now far exceeds that of traditional algorithms in most settings. However, a major difference between deep learning and traditional algorithms is that deep learning must be driven by huge amounts of data, such as the COCO dataset for object detection and the ImageNet dataset for image classification. Driven by such large datasets, the performance of deep learning models on object detection, semantic segmentation, and classification tasks improves greatly, so data acquisition is critical for deep learning algorithms.
However, in current industry, data acquisition suffers from several problems. 1. Defective (abnormal) samples are hard to obtain: in actual production the defect rate is extremely low, and when abnormal samples do occur they are often numerous repetitions of the same useless defect; sometimes no usable abnormal sample exists at all. 2. In actual production, anomalies appear at random positions; even with a sufficiently large dataset, one cannot guarantee that anomalies cover all positions of the image, so a model driven by such a dataset cannot be guaranteed to detect well against all backgrounds. 3. Once a large number of abnormal samples is obtained, they must be annotated; but compared with the targets in training sets such as COCO, the defects produced in industrial manufacturing are tiny and hard to find against complex backgrounds, so annotation often consumes a great deal of manpower and material resources.
Although research on self-supervised and unsupervised defect detection methods has continued in recent years, some basic problems remain. Current embedding-based anomaly detection techniques use the distance between the embedding vectors of a test sample and normal samples as the criterion for computing an anomaly score, and train a network to improve its feature extraction capability so as to obtain the anomalous region. However, because the features extracted by the model must be matched, the computation speed is severely limited. Reconstruction-based methods detect and locate anomalies in an image through the error produced during image reconstruction; autoencoders and generative adversarial networks (GANs) are often used in this approach. The autoencoder is trained with an adversarial loss, and the anomaly score is computed from the errors produced during image reconstruction. Given the strong learning ability of neural networks, however, even abnormal images can be reconstructed with little loss. This violates the premise of the reconstruction method, namely that the abnormal region of an abnormal image cannot be fully reconstructed and can thereby be distinguished from the original image, and thus defeats the method. Anomaly detection techniques based on data enhancement require no data annotation and learn a feature extractor from unlabeled data. A common approach is to randomly copy a small rectangular region from the input image and randomly paste it back into the image to simulate an abnormal sample; structural anomalies are created by pasting rectangular patches that differ in size, aspect ratio, and rotation angle. Most current data enhancement methods, however, cannot simulate real defects well and can only generate abnormal samples by crudely constructing irregularities in the image. A data enhancement method closer to real defects is therefore needed to generate simulation data and thereby drive a better-performing neural network model.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism, comprising the following steps:
step 1: establish an industrial defect detection dataset; the dataset comprises a test set and a training set, where the test set contains a certain proportion of normal samples and defect samples, and the training set contains only a certain number of normal samples;
step 2: establish a prior knowledge base of industrial product defects; through on-site investigation and analysis of the industrial site, the shape and type information of defects that frequently occur on different industrial products is obtained as prior knowledge;
step 3: adopt a data enhancement strategy with prior knowledge: a rectangular patch is obtained by cutting a normal image, the patch is fused with a prior defect shape to obtain a new defect patch that has the defect shape but the characteristics of the original sample, and the patch is pasted back into the original image to generate a simulated abnormal image;
the data enhancement strategy with prior knowledge has two working modes, according to the size of the simulated anomaly: in the first working mode, during cutting, the size of the rectangle to be cut is determined as a certain proportion of the size of the original image; in the second working mode, a numerical range far smaller than the image size is given, and values are randomly selected from that range as the side lengths of the cut rectangle, so as to generate tiny abnormal patches; these two different anomaly simulations enrich the data types;
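For illustration, the following is a minimal sketch of this augmentation step, assuming PIL images and a list of binary prior defect-shape masks collected in step 2. The concrete size ranges (5 to 15 percent of the image side for standard anomalies, 4 to 16 pixels for tiny ones) and the jitter strengths are illustrative assumptions, not values fixed by the invention:

```python
import random
from PIL import Image, ImageEnhance

def simulate_defect(image, prior_masks, tiny=False):
    """Cut a rectangle from a normal image, shape it with a prior defect
    mask, jitter it, and paste it back at a random position."""
    w, h = image.size
    if tiny:
        # Second working mode: side lengths from a small fixed range (assumed 4-16 px).
        cw, ch = random.randint(4, 16), random.randint(4, 16)
    else:
        # First working mode: rectangle sized as a fraction of the image (assumed 5-15%).
        cw = int(w * random.uniform(0.05, 0.15))
        ch = int(h * random.uniform(0.05, 0.15))
    # Cut a rectangular patch at a random position of the normal image.
    x, y = random.randint(0, w - cw), random.randint(0, h - ch)
    patch = image.crop((x, y, x + cw, y + ch))
    # Fuse the patch with a prior defect shape: a binary mask from the knowledge base.
    mask = random.choice(prior_masks).convert("L").resize((cw, ch))
    # Random rotation and random color jitter.
    angle = random.uniform(0, 360)
    patch = patch.rotate(angle, expand=True)
    mask = mask.rotate(angle, expand=True)  # rotated corners stay masked out
    patch = ImageEnhance.Color(patch).enhance(random.uniform(0.8, 1.2))
    patch = ImageEnhance.Brightness(patch).enhance(random.uniform(0.8, 1.2))
    # Paste the shaped patch back at a random position (assumes it still fits).
    pw, ph = patch.size
    px, py = random.randint(0, max(0, w - pw)), random.randint(0, max(0, h - ph))
    out = image.copy()
    out.paste(patch, (px, py), mask)
    return out
```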
step 4: taking normal samples and the simulated abnormal samples generated from them as input, train an image classifier able to detect abnormal images using the simulated and normal images, based on a self-supervised training method; the classifier has four convolutional layers with channel sizes 64, 128, 256 and 512; using ReLU activation functions and two pooling layers, the image is reduced to a 512×1 feature vector, and finally softmax converts the output into a probability distribution;
the classifier has two working modes, according to the classification scheme: in the first working mode, the classification task contains only two categories, normal and abnormal; normal images are labeled 0, and for each abnormal image one of the tiny-anomaly and standard-anomaly simulations is randomly selected during data enhancement and the image is labeled 1; in the second working mode, the classification task contains three categories (normal images, standard anomalies and tiny anomalies); both anomaly types are generated simultaneously during data enhancement, and normal images are labeled 0, standard anomalies 1 and tiny anomalies 2; through these different classification tasks a better-performing classifier is trained, as in the sketch below;
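A minimal PyTorch sketch of such a classifier follows. Only the channel widths (64, 128, 256, 512), the ReLU activations, the two pooling layers, the 512×1 output and the final softmax are fixed by the description (the abstract alternatively names a ResNet-18 backbone); the kernel sizes, the pooling positions and the adaptive pooling used to reach 512×1 are assumptions:

```python
import torch
import torch.nn as nn

class DefectClassifier(nn.Module):
    """Four conv layers (64, 128, 256, 512 channels), ReLU, two pooling
    stages, a 512-dim descriptor, and a softmax over the classes."""
    def __init__(self, num_classes=2):  # 2 for working mode 1, 3 for mode 2
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # first pooling layer
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # second pooling layer
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),              # collapse to 512 x 1 x 1
        )
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        z = self.features(x).flatten(1)            # (B, 512)
        return torch.softmax(self.head(z), dim=1)  # probability distribution
```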
step 5: build a neural network incorporating a self-supervised attention mechanism;
the self-supervised attention mechanism has two parts: (1) the masked convolution layer uses a 3×3 convolution kernel with a masked region M of size 1 at its center, which does not participate in the computation during convolution; the learnable parameters of the masked convolution lie in sub-kernels K_i at the four corners of the receptive field, where k′ ∈ N^+ is a hyperparameter defining the sub-kernel size, C is the number of input channels, d is the distance between a sub-kernel K_i and the masked region M, and M ∈ R^{1×1×C}; the predicted value of M is the sum of the K_i responses; (2) a channel attention mechanism is fused after the convolution layer: the features extracted by the convolution layer undergo a Squeeze compression operation that aggregates the feature maps across the spatial dimensions H×W to produce a channel descriptor, i.e. H×W×C → 1×1×C; global spatial information is thereby compressed into channel descriptors that subsequent layers can use; Squeeze is implemented as global average pooling, expressed as:

z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)

where H is the height of the feature map, W is its width, and u_c is the converted feature matrix of channel c; the number of channels C is kept unchanged while the features are compressed into a channel descriptor;
the gating mechanism is then parameterized by a bottleneck of two fully connected layers, W_1 for dimension reduction and W_2 for dimension restoration, expressed as:

s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))

where W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are fully connected layers, and r is a scaling parameter (here set to 16) that reduces the number of channels to lower the computational cost; δ is a ReLU layer, which leaves the output dimension unchanged; after multiplication by W_2, s is finally obtained through a sigmoid function;
after this module is inserted after the last convolutional layer of the backbone network, its parameters are updated synchronously with the training of the network;
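A PyTorch sketch of this block is given below, matching the description above: four learnable corner sub-kernels whose summed responses predict the masked center (the defaults k′ = 1, d = 0 reproduce the 3×3 masked kernel with a size-1 masked region), followed by squeeze-and-excitation channel attention with r = 16. The zero padding and the module interface are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSPCAB(nn.Module):
    """Masked convolution (learnable corner sub-kernels K1..K4, masked
    center M) followed by squeeze-and-excitation channel attention."""
    def __init__(self, channels, kernel_dim=1, dilation=0, reduction=16):
        super().__init__()
        self.pad = kernel_dim + dilation           # pad so output keeps H x W
        self.crop = kernel_dim + 2 * dilation + 1  # offset between opposite corners
        self.k1 = nn.Conv2d(channels, channels, kernel_dim)  # top-left
        self.k2 = nn.Conv2d(channels, channels, kernel_dim)  # top-right
        self.k3 = nn.Conv2d(channels, channels, kernel_dim)  # bottom-left
        self.k4 = nn.Conv2d(channels, channels, kernel_dim)  # bottom-right
        self.fc1 = nn.Linear(channels, channels // reduction)  # W1: reduce
        self.fc2 = nn.Linear(channels // reduction, channels)  # W2: restore

    def forward(self, x):
        p, b = self.pad, self.crop
        xp = F.pad(x, (p, p, p, p))
        # Prediction of the masked center = sum of the four corner responses.
        out = (self.k1(xp[:, :, :-b, :-b]) + self.k2(xp[:, :, :-b, b:]) +
               self.k3(xp[:, :, b:, :-b]) + self.k4(xp[:, :, b:, b:]))
        # Squeeze: H x W x C -> 1 x 1 x C by global average pooling.
        z = out.mean(dim=(2, 3))
        # Excitation: sigmoid(W2 relu(W1 z)), then rescale the channels.
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(z))))
        return out * s.view(s.size(0), -1, 1, 1)
```

The self-supervised reconstruction term α(G(X) − X)² of the objective in step 6 is then simply α times the mean squared error between the block's output and its input.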
step 6: repeat steps 4 and 5 until the objective function of the neural network converges, obtaining the optimal classification network model;
the objective function is expressed as:

L_mask = −ξ(1 + p_t)^{γ_hc}·log(p_t) + (1 − ξ)·FL + α·(G(X) − X)²

where p_t is the predicted probability; FL is the focal loss; γ_hc is a hyperparameter; ξ is a time-varying parameter assigned different weights in different training periods, whose goal is to combine the early-training focus on confidently predicted classes with the mid-training focus on misclassified hard samples; α is a defined weight hyperparameter, generally taken as 0.1; G is the self-supervised attention module and X is its input tensor; in detail, the parameter ξ, which varies with the training epoch, is defined as:

ξ = 1 − τ·t/T, if τ·t/T ≤ 1;  ξ = (τ·t/T − 1)/(τ − 1), otherwise

where τ is a hyperparameter with τ ≥ 1, used to divide the total number of epochs; T is the total number of epochs; t is the current epoch. For example, if τ is set to 2, the schedule has an inverted-triangle shape: ξ decreases from 1 to 0 in the first half of training and increases from 0 back to 1 in the second half;
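As a sketch, the objective can be computed as below, with ξ implemented from the piecewise-linear description above. The focal-loss form (1 − p_t)^γ with γ = 2 and the default γ_hc = 2 are assumptions, since the text fixes neither value:

```python
import torch

def xi_schedule(t, T, tau=2.0):
    """Cyclical weight: with tau = 2 it traces an inverted triangle,
    falling from 1 to 0 in the first half and rising back to 1."""
    r = tau * t / T
    return 1.0 - r if r <= 1.0 else (r - 1.0) / (tau - 1.0)

def cyclical_focal_loss(probs, targets, t, T, gamma_hc=2.0, gamma=2.0,
                        tau=2.0, alpha=0.1, g_out=None, g_in=None):
    """L_mask = -xi (1 + p_t)^gamma_hc log(p_t) + (1 - xi) FL
               + alpha (G(X) - X)^2, for softmax probabilities `probs`."""
    xi = xi_schedule(t, T, tau)
    p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-8)
    log_pt = p_t.log()
    confident_term = -xi * (1.0 + p_t) ** gamma_hc * log_pt   # early epochs
    focal_term = -(1.0 - xi) * (1.0 - p_t) ** gamma * log_pt  # hard samples (FL)
    loss = (confident_term + focal_term).mean()
    if g_out is not None:  # SSPCAB reconstruction term, weight alpha ~ 0.1
        loss = loss + alpha * torch.mean((g_out - g_in) ** 2)
    return loss
```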
step 7: evaluate the classification model obtained in step 6 on the test set, computing the anomaly score of each image with a Gaussian density estimator in order to classify it;
the Gaussian density estimator performs anomaly detection on the images and computes their anomaly scores; the input images carry only image-level labels, with no pixel-level labels; in the evaluation stage, the features extracted by the last convolutional layer of the model are taken as output and fed to the Gaussian density estimator, and the anomaly score of the features is computed, expressed as:

score(x) = (x − μ)^T Σ^{−1} (x − μ)

where x is the input feature, μ is the mathematical expectation, and Σ is an n-order matrix obtained during learning; they are respectively expressed as:

μ = (1/m)·Σ_{i=1..m} x_i

where x_i is a feature matrix and m is the number of feature matrices;

Σ = (1/m)·Σ_{i=1..m} (x_i − μ)(x_i − μ)^T

where x_i is a feature matrix, μ is the data expectation, and m is the number of feature matrices;
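A minimal NumPy sketch of this estimator follows, under the assumption that the anomaly score is the Mahalanobis distance implied by the μ and Σ formulas above; the small ridge added to keep the covariance invertible is also an assumption:

```python
import numpy as np

class GaussianDensityEstimator:
    """Fit a multivariate Gaussian to the features of normal training
    images; score test features by Mahalanobis distance."""
    def fit(self, feats):                  # feats: (m, n) feature matrix
        self.mu = feats.mean(axis=0)       # mu = (1/m) sum x_i
        diff = feats - self.mu
        cov = diff.T @ diff / len(feats)   # Sigma = (1/m) sum (x-mu)(x-mu)^T
        self.inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

    def score(self, x):                    # anomaly score of one feature vector
        d = x - self.mu
        return float(d @ self.inv_cov @ d)
```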
step 8: plot the classification results of the classification model as an ROC curve; the ROC curve serves as an evaluation index of the classifier and as the basis for computing the AUC;
step 9: from the ROC curve obtained in step 8, compute the AUC score of the classifier as the score of the model evaluation, expressed as:

AUC = (Σ_{i∈positives} rank_i − N(N + 1)/2) / (M·N)

where M is the number of negative samples and N the number of positive samples; their product M·N is the number of pairwise positive-negative orderings; the ranks of the positive samples are summed, rank_i being the 1-based rank of positive sample i when all samples are sorted in ascending order of score (one more than the number of samples with a lower score). If the score is not satisfactory, repeat steps 4 to 9.
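A short NumPy sketch of this rank-based AUC computation (tie handling is omitted for simplicity):

```python
import numpy as np

def auc_from_ranks(scores, labels):
    """AUC = (sum of positive-sample ranks - N(N+1)/2) / (M * N),
    with 1-based ranks over all M + N samples sorted by score."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())              # N positive samples
    n_neg = len(labels) - n_pos            # M negative samples
    pos_rank_sum = ranks[labels == 1].sum()
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```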
Step 10: if the neural network model obtained in step 9 reaches the required AUC score (set according to actual requirements), it is taken as the final detection model; that is, training of the neural network model using the data enhancement method with prior defect characteristics and the SSPCAB self-supervised attention mechanism is complete.
Further, the abnormal image generation strategy of step 3 has two working modes according to the size of the simulated anomaly: in the first working mode, during cutting, the size of the rectangle to be cut is determined as a certain proportion of the size of the original image; in the second working mode, a numerical range far smaller than the image size is given, and values are randomly selected from that range as the side lengths of the cut rectangle, so as to generate tiny abnormal patches; these two different anomaly simulations enrich the data types.
Further, the normal samples and simulated abnormal samples input to the self-supervised training method of step 4 support two working modes according to the classification scheme: in the first working mode, the classification task contains only two categories, normal and abnormal; normal images are labeled 0, and for each abnormal image one of the tiny-anomaly and standard-anomaly simulations is randomly selected during data enhancement and the image is labeled 1; in the second working mode, the classification task contains three categories (normal images, standard anomalies and tiny anomalies); both anomaly types are generated simultaneously during data enhancement, and normal images are labeled 0, standard anomalies 1 and tiny anomalies 2; through these different classification tasks a better-performing classifier is trained.
Further, in step 7, the Gaussian density estimator performs anomaly detection on the images and computes their anomaly scores; the input images carry only image-level labels, with no pixel-level labels; in the evaluation stage, the features extracted by the last convolutional layer of the model are taken as output and fed to the Gaussian density estimator, and the anomaly score of the features is computed, expressed as:

score(x) = (x − μ)^T Σ^{−1} (x − μ)

where x is the input feature, μ is the mathematical expectation, and Σ is an n-order matrix obtained during learning; they are respectively expressed as:

μ = (1/m)·Σ_{i=1..m} x_i

where x_i is a feature matrix and m is the number of feature matrices;

Σ = (1/m)·Σ_{i=1..m} (x_i − μ)(x_i − μ)^T

where x_i is a feature matrix, μ is the data expectation, and m is the number of feature matrices;
Further, the objective function in step 6 is expressed as:

L_mask = −ξ(1 + p_t)^{γ_hc}·log(p_t) + (1 − ξ)·FL + α·(G(X) − X)²

where p_t is the predicted probability; FL is the focal loss; γ_hc is a hyperparameter; ξ is a time-varying parameter assigned different weights in different training periods, whose goal is to combine the early-training focus on confidently predicted classes with the mid-training focus on misclassified hard samples; α is a defined weight hyperparameter, generally taken as 0.1; G is the self-supervised attention module and X is its input tensor; in detail, the parameter ξ, which varies with the training epoch, is defined as:

ξ = 1 − τ·t/T, if τ·t/T ≤ 1;  ξ = (τ·t/T − 1)/(τ − 1), otherwise

where τ is a hyperparameter with τ ≥ 1, used to divide the total number of epochs; T is the total number of epochs; t is the current epoch. For example, if τ is set to 2, the schedule has an inverted-triangle shape: ξ decreases from 1 to 0 in the first half of training and increases from 0 back to 1 in the second half.
the beneficial effects of adopting above-mentioned technical scheme to produce lie in:
the invention provides a data enhancement method based on prior defect characteristics and SSPCAB attention mechanisms, which adopts a neural network combined with a self-supervision attention prediction mechanism, and adds industrial defect prior knowledge acquired on an industrial site in the data enhancement process, so that a generated simulated defect image is more in line with a real defect image, the finally obtained detection precision is higher, and the defect detection efficiency is effectively improved.
Drawings
FIG. 1 is a flow chart of the self-supervised data enhancement method combined with an attention mechanism in an embodiment of the present invention.
FIG. 2 is a schematic diagram of a data enhancement strategy according to an embodiment of the present invention.
Fig. 3 is a block diagram of the attention mechanism in the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings and examples. The following examples illustrate the invention but are not intended to limit its scope.
Traditional industrial defect detection methods suffer heavy interference from external factors in complex scenes; for example, detection is strongly affected when the industrial site experiences illumination changes, contamination of the background light source, or vibration of production-line equipment. Current deep learning methods can avoid these problems to some extent, but they require a large amount of defect data to train the model, which industry currently finds difficult to provide. On this basis, as shown in FIG. 1, the present invention provides a data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism, comprising the following steps:
step 1: establish an industrial defect detection dataset; the dataset comprises a test set and a training set, where the test set contains a certain proportion of normal samples and defect samples, and the training set contains only a certain number of normal samples;
step 2: establish a prior knowledge base of industrial product defects; through on-site investigation and analysis of the industrial site, the shape and type information of defects that frequently occur on different industrial products is obtained as prior knowledge;
step 3: adopt a data enhancement strategy with prior knowledge: a rectangular patch is obtained by cutting a normal image, the patch is fused with a prior defect shape to obtain a new defect patch that has the defect shape but the characteristics of the original sample, and the patch is pasted back into the original image to generate a simulated abnormal image, as shown in FIG. 2;
the data enhancement strategy with prior knowledge has two working modes, according to the size of the simulated anomaly: in the first working mode, during cutting, the size of the rectangle to be cut is determined as a certain proportion of the size of the original image; in the second working mode, a numerical range far smaller than the image size is given, and values are randomly selected from that range as the side lengths of the cut rectangle, so as to generate tiny abnormal patches; these two different anomaly simulations enrich the data types;
step 4: taking normal samples and the simulated abnormal samples generated from them as input, train an image classifier able to detect abnormal images using the simulated and normal images, based on a self-supervised training method; the classifier has four convolutional layers with channel sizes 64, 128, 256 and 512; using ReLU activation functions and two pooling layers, the image is reduced to a 512×1 feature vector, and finally softmax converts the output into a probability distribution;
the classifier has two working modes, according to the classification scheme: in the first working mode, the classification task contains only two categories, normal and abnormal; normal images are labeled 0, and for each abnormal image one of the tiny-anomaly and standard-anomaly simulations is randomly selected during data enhancement and the image is labeled 1; in the second working mode, the classification task contains three categories (normal images, standard anomalies and tiny anomalies); both anomaly types are generated simultaneously during data enhancement, and normal images are labeled 0, standard anomalies 1 and tiny anomalies 2; through these different classification tasks a better-performing classifier is trained;
step 5: build a neural network incorporating a self-supervised attention mechanism, as shown in FIG. 3;
the self-supervised attention mechanism has two parts: (1) the masked convolution layer uses a 3×3 convolution kernel with a masked region M of size 1 at its center, which does not participate in the computation during convolution; the learnable parameters of the masked convolution lie in sub-kernels K_i at the four corners of the receptive field, where k′ ∈ N^+ is a hyperparameter defining the sub-kernel size, C is the number of input channels, d is the distance between a sub-kernel K_i and the masked region M, and M ∈ R^{1×1×C}; the predicted value of M is the sum of the K_i responses; (2) a channel attention mechanism is fused after the convolution layer: the features extracted by the convolution layer undergo a Squeeze compression operation that aggregates the feature maps across the spatial dimensions H×W to produce a channel descriptor, i.e. H×W×C → 1×1×C; global spatial information is thereby compressed into channel descriptors that subsequent layers can use; Squeeze is implemented as global average pooling, expressed as:

z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)

where H is the height of the feature map, W is its width, and u_c is the converted feature matrix of channel c; the number of channels C is kept unchanged while the features are compressed into a channel descriptor;
the gating mechanism is then parameterized by a bottleneck of two fully connected layers, W_1 for dimension reduction and W_2 for dimension restoration, expressed as:

s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))

where W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are fully connected layers, and r is a scaling parameter (here set to 16) that reduces the number of channels to lower the computational cost; δ is a ReLU layer, which leaves the output dimension unchanged; after multiplication by W_2, s is finally obtained through a sigmoid function;
after this module is inserted after the last convolutional layer of the backbone network, its parameters are updated synchronously with the training of the network;
step 6: repeat steps 4 and 5 until the objective function of the neural network converges, obtaining the optimal classification network model;
the objective function is expressed as:

L_mask = −ξ(1 + p_t)^{γ_hc}·log(p_t) + (1 − ξ)·FL + α·(G(X) − X)²

where p_t is the predicted probability; FL is the focal loss; γ_hc is a hyperparameter; ξ is a time-varying parameter assigned different weights in different training periods, whose goal is to combine the early-training focus on confidently predicted classes with the mid-training focus on misclassified hard samples; α is a defined weight hyperparameter, generally taken as 0.1; G is the self-supervised attention module and X is its input tensor; in detail, the parameter ξ, which varies with the training epoch, is defined as:

ξ = 1 − τ·t/T, if τ·t/T ≤ 1;  ξ = (τ·t/T − 1)/(τ − 1), otherwise

where τ is a hyperparameter with τ ≥ 1, used to divide the total number of epochs; T is the total number of epochs; t is the current epoch. For example, if τ is set to 2, the schedule has an inverted-triangle shape: ξ decreases from 1 to 0 in the first half of training and increases from 0 back to 1 in the second half;
step 7: evaluate the classification model obtained in step 6 on the test set, computing the anomaly score of each image with a Gaussian density estimator in order to classify it;
the Gaussian density estimator performs anomaly detection on the images and computes their anomaly scores; the input images carry only image-level labels, with no pixel-level labels; in the evaluation stage, the features extracted by the last convolutional layer of the model are taken as output and fed to the Gaussian density estimator, and the anomaly score of the features is computed, expressed as:

score(x) = (x − μ)^T Σ^{−1} (x − μ)

where x is the input feature, μ is the mathematical expectation, and Σ is an n-order matrix obtained during learning; they are respectively expressed as:

μ = (1/m)·Σ_{i=1..m} x_i

where x_i is a feature matrix and m is the number of feature matrices;

Σ = (1/m)·Σ_{i=1..m} (x_i − μ)(x_i − μ)^T

where x_i is a feature matrix, μ is the data expectation, and m is the number of feature matrices;
step 8: plot the classification results of the classification model as an ROC curve; the ROC curve serves as an evaluation index of the classifier and as the basis for computing the AUC;
step 9: from the ROC curve obtained in step 8, compute the AUC score of the classifier as the score of the model evaluation, expressed as:

AUC = (Σ_{i∈positives} rank_i − N(N + 1)/2) / (M·N)

where M is the number of negative samples and N the number of positive samples; their product M·N is the number of pairwise positive-negative orderings; the ranks of the positive samples are summed, rank_i being the 1-based rank of positive sample i when all samples are sorted in ascending order of score (one more than the number of samples with a lower score). If the score is not satisfactory, repeat steps 4 to 9.
Step 10: if the neural network model obtained in step 9 reaches the required AUC score (set according to actual requirements), it is taken as the final detection model; that is, training of the neural network model using the data enhancement method with prior defect characteristics and the SSPCAB self-supervised attention mechanism is complete.
The embodiments of this self-supervised data enhancement method combined with an attention mechanism can be applied to any device with data processing capability, such as a computer. The apparatus embodiments may be implemented in software, in hardware, or in a combination of hardware and software. Taking a software implementation as an example, the apparatus in the logical sense is formed by the processor of the device reading the corresponding computer program instructions from non-volatile memory into memory and executing them.
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, and also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example solutions in which the above features are replaced by features with similar functions disclosed (but not limited to those disclosed) in the embodiments of the present disclosure.
Claims (5)
1. A data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism, comprising the following steps:
step 1: establish an industrial defect detection dataset; the dataset comprises a test set and a training set, where the test set contains a certain proportion of normal samples and defect samples, and the training set contains only a certain number of normal samples;
step 2: establish a prior knowledge base of industrial product defects; through on-site investigation and analysis of the industrial site, the shape and type information of defects that frequently occur on different industrial products is obtained as prior knowledge;
step 3: apply a data enhancement strategy with prior knowledge: a rectangular patch is obtained by cutting a normal image, the patch is fused with a prior defect shape to obtain a new defect patch that has the defect shape but the characteristics of the original sample, and the patch is pasted back into the original image to generate a simulated abnormal image;
step 4: taking normal samples and the simulated abnormal samples generated from them as input, train an image classifier able to detect abnormal images using the simulated and normal images, based on a self-supervised training method; the classifier has four convolutional layers with channel sizes 64, 128, 256 and 512; using ReLU activation functions and two pooling layers, the image is reduced to a 512×1 feature vector, and finally softmax converts the output into a probability distribution;
step 5: build a neural network incorporating a self-supervised attention mechanism;
the self-supervised attention mechanism has two parts: (1) the masked convolution layer uses a 3×3 convolution kernel with a masked region M of size 1 at its center, which does not participate in the computation during convolution; the learnable parameters of the masked convolution lie in sub-kernels K_i at the four corners of the receptive field, where k′ ∈ N^+ is a hyperparameter defining the sub-kernel size, C is the number of input channels, d is the distance between a sub-kernel K_i and the masked region M, and M ∈ R^{1×1×C}; the predicted value of M is the sum of the K_i responses; (2) a channel attention mechanism is fused after the convolution layer: the features extracted by the convolution layer undergo a Squeeze compression operation that aggregates the feature maps across the spatial dimensions H×W to produce a channel descriptor, i.e. H×W×C → 1×1×C; global spatial information is thereby compressed into channel descriptors that subsequent layers can use; Squeeze is implemented as global average pooling, expressed as:

z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)

where H is the height of the feature map, W is its width, and u_c is the converted feature matrix of channel c; the number of channels C is kept unchanged while the features are compressed into a channel descriptor;
the gating mechanism is then parameterized by a bottleneck of two fully connected layers, W_1 for dimension reduction and W_2 for dimension restoration, expressed as:

s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))

where W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are fully connected layers, and r is a scaling parameter (here set to 16) that reduces the number of channels to lower the computational cost; δ is a ReLU layer, which leaves the output dimension unchanged; after multiplication by W_2, s is finally obtained through a sigmoid function;
after this module is inserted after the last convolutional layer of the backbone network, its parameters are updated synchronously with the training of the network;
step 6: repeat steps 4 and 5 until the objective function of the neural network converges, obtaining the optimal classification network model;
step 7: evaluate the classification model obtained in step 6 on the test set, classifying by computing the anomaly score of each image through Gaussian density estimation;
step 8: plot the classification results of the classification model as an ROC curve; the ROC curve serves as an evaluation index of the classifier and as the basis for computing the AUC;
step 9: from the ROC curve obtained in step 8, compute the AUC score of the classifier as the score of the model evaluation, expressed as:

AUC = (Σ_{i∈positives} rank_i − N(N + 1)/2) / (M·N)

where M is the number of negative samples and N the number of positive samples; their product M·N is the number of pairwise positive-negative orderings; the ranks of the positive samples are summed, rank_i being the 1-based rank of positive sample i when all samples are sorted in ascending order of score (one more than the number of samples with a lower score); if the score is not satisfactory, repeat steps 4 to 9;
step 10: if the neural network model obtained in step 9 reaches the required AUC score (set according to actual requirements), it is taken as the final detection model; that is, training of the neural network model using the data enhancement method with prior defect characteristics and the SSPCAB self-supervised attention mechanism is complete.
2. The data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism according to claim 1, wherein the abnormal image generation strategy of step 3 has two working modes according to the size of the simulated anomaly: in the first working mode, during cutting, the size of the rectangle to be cut is determined as a certain proportion of the size of the original image; in the second working mode, a numerical range far smaller than the image size is given, and values are randomly selected from that range as the side lengths of the cut rectangle, so as to generate tiny abnormal patches; these two different anomaly simulations enrich the data types.
3. The data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism according to claim 1, wherein the normal samples and simulated abnormal samples input to the self-supervised training method of step 4 support two working modes according to the classification scheme: in the first working mode, the classification task contains only two categories, normal and abnormal; normal images are labeled 0, and for each abnormal image one of the tiny-anomaly and standard-anomaly simulations is randomly selected during data enhancement and the image is labeled 1; in the second working mode, the classification task contains three categories (normal images, standard anomalies and tiny anomalies); both anomaly types are generated simultaneously during data enhancement, and normal images are labeled 0, standard anomalies 1 and tiny anomalies 2; through these different classification tasks a better-performing classifier is trained.
4. The data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism according to claim 1, wherein in step 7 the Gaussian density estimator performs anomaly detection on the images and computes their anomaly scores, the input images carrying only image-level labels and no pixel-level labels; in the evaluation stage, the features extracted by the last convolutional layer of the model are taken as output and fed to the Gaussian density estimator, and the anomaly score of the features is computed, expressed as:

score(x) = (x − μ)^T Σ^{−1} (x − μ)

where x is the input feature, μ is the mathematical expectation, and Σ is an n-order matrix obtained during learning; they are respectively expressed as:

μ = (1/m)·Σ_{i=1..m} x_i

where x_i is a feature matrix and m is the number of feature matrices;

Σ = (1/m)·Σ_{i=1..m} (x_i − μ)(x_i − μ)^T

where x_i is a feature matrix, μ is the data expectation, and m is the number of feature matrices.
5. The data enhancement method based on prior defect characteristics and the SSPCAB attention mechanism according to claim 1, wherein the objective function in step 6 is expressed as:

L_mask = −ξ(1 + p_t)^{γ_hc}·log(p_t) + (1 − ξ)·FL + α·(G(X) − X)²

where p_t is the predicted probability; FL is the focal loss; γ_hc is a hyperparameter; ξ is a time-varying parameter assigned different weights in different training periods, whose goal is to combine the early-training focus on confidently predicted classes with the mid-training focus on misclassified hard samples; α is a defined weight hyperparameter, generally taken as 0.1; G is the self-supervised attention module and X is its input tensor; in detail, the parameter ξ, which varies with the training epoch, is defined as:

ξ = 1 − τ·t/T, if τ·t/T ≤ 1;  ξ = (τ·t/T − 1)/(τ − 1), otherwise

where τ is a hyperparameter with τ ≥ 1, used to divide the total number of epochs; T is the total number of epochs; t is the current epoch. For example, if τ is set to 2, the schedule has an inverted-triangle shape: ξ decreases from 1 to 0 in the first half of training and increases from 0 back to 1 in the second half.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310912045.5A CN116912625A (en) | 2023-07-25 | 2023-07-25 | Data enhancement method based on priori defect characteristics and SSPCAB attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912625A (en) | 2023-10-20
Family
ID=88352806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310912045.5A Pending CN116912625A (en) | 2023-07-25 | 2023-07-25 | Data enhancement method based on priori defect characteristics and SSPCAB attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912625A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117557872A (en) * | 2024-01-12 | 2024-02-13 | 苏州大学 | Unsupervised anomaly detection method and device for optimizing storage mode |
CN117557872B (en) * | 2024-01-12 | 2024-03-22 | 苏州大学 | Unsupervised anomaly detection method and device for optimizing storage mode |
CN118485592A (en) * | 2024-07-05 | 2024-08-13 | 华侨大学 | Low-illumination phase contrast cell microscopic image enhancement method based on multi-scale transducer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114627383B (en) | Small sample defect detection method based on metric learning | |
CN111444939B (en) | Small-scale equipment component detection method based on weak supervision cooperative learning in open scene of power field | |
CN116912625A (en) | Data enhancement method based on priori defect characteristics and SSPCAB attention mechanism | |
CN109740676B (en) | Object detection and migration method based on similar targets | |
CN116310785B (en) | Unmanned aerial vehicle image pavement disease detection method based on YOLO v4 | |
CN112434586B (en) | Multi-complex scene target detection method based on domain self-adaptive learning | |
CN109284779A (en) | Object detection method based on deep full convolution network | |
CN111199543A (en) | Refrigerator-freezer surface defect detects based on convolutional neural network | |
CN115136209A (en) | Defect detection system | |
CN115880298A (en) | Glass surface defect detection method and system based on unsupervised pre-training | |
CN115526847A (en) | Mainboard surface defect detection method based on semi-supervised learning | |
CN113591948A (en) | Defect pattern recognition method and device, electronic equipment and storage medium | |
Cui et al. | Real-time detection of wood defects based on SPP-improved YOLO algorithm | |
CN117036243A (en) | Method, device, equipment and storage medium for detecting surface defects of shaving board | |
CN116563250A (en) | Recovery type self-supervision defect detection method, device and storage medium | |
CN118212196B (en) | Industrial defect detection method based on image restoration | |
Ruediger-Flore et al. | CAD-based data augmentation and transfer learning empowers part classification in manufacturing | |
CN116912144A (en) | Data enhancement method based on discipline algorithm and channel attention mechanism | |
CN117911399A (en) | Light-weight multi-scale aluminum profile surface defect detection method | |
CN115294392B (en) | Visible light remote sensing image cloud removal method and system based on network model generation | |
KR102494829B1 (en) | Structure damage evaluation method for using the convolutional neural network, and computing apparatus for performing the method | |
CN113808079B (en) | Industrial product surface defect self-adaptive detection method based on deep learning model AGLNet | |
CN115601610A (en) | Fabric flaw detection method based on improved EfficientDet model | |
US20230084761A1 (en) | Automated identification of training data candidates for perception systems | |
Chen et al. | Noise-assisted data enhancement promoting image classification of municipal solid waste |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |