CN114943864A - Tobacco leaf grading method integrating attention mechanism and convolutional neural network model
- Publication number: CN114943864A (application CN202210666171.2A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06V 10/764 - Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V 10/765 - Classification using rules for classification or partitioning the feature space
- G06V 10/82 - Image or video recognition or understanding using neural networks
- G06V 10/30 - Image preprocessing; noise filtering
- G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06N 3/045 - Combinations of networks (neural network architectures)
- G06N 3/048 - Activation functions
- G06N 3/08 - Learning methods
- Y02P 90/30 - Computing systems specially adapted for manufacturing
Abstract
The invention relates to a tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model, which comprises the following steps: Step S1: acquiring tobacco leaf images, constructing a tobacco leaf sample data set based on the national flue-cured tobacco grading standard and expert experience, and preprocessing the images; Step S2: based on a convolutional neural network, introducing depth separable convolution, decomposing the standard convolution into a depth convolution and a point-by-point convolution, constructing a depth separable convolution model, and training it on the preprocessed tobacco leaf sample data set; Step S3: introducing an attention mechanism module to optimize the depth separable convolution model and obtain the final tobacco leaf grading model. The invention can realize accurate and efficient tobacco leaf grading.
Description
Technical Field
The invention relates to the field of machine learning, in particular to a tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model.
Background
In the tobacco industry chain, grading of flue-cured tobacco is an important link that directly affects the economic return of tobacco growers. Traditional manual grading relies on visual inspection by the naked eye; it is error-prone and inefficient. China's national standard defines forty-two grades, and such fine-grained grading makes tobacco leaf inspection and classification even more difficult. With the continuous development of machine vision, tobacco leaf grading is shifting from inefficient manual grading to automatic computer grading. As inspection volumes grow sharply, traditional manual grading can no longer meet the demands of rapid production, so applying machine vision and deep learning to the grading and identification of tobacco leaf images has great research and application value. A large body of research data and years of accumulated practice in China's tobacco industry show that tobacco leaves from different stalk positions generally have different appearance characteristics, and leaves of different colors differ considerably in chemical composition, smoking quality and physical characteristics. The quality of tobacco leaves is therefore closely and regularly related to their color and shape.
Disclosure of Invention
In view of this, the invention aims to provide a tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model, which can realize accurate and efficient tobacco leaf grading.
In order to achieve the purpose, the invention adopts the following technical scheme:
a tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model comprises the following steps:
Step S1: acquiring tobacco leaf images, constructing a tobacco leaf sample data set based on the national flue-cured tobacco grading standard and expert experience, and preprocessing the images;
Step S2: based on a convolutional neural network, introducing depth separable convolution, decomposing the standard convolution into a depth convolution and a point-by-point convolution, constructing a depth separable convolution model, and training it on the preprocessed tobacco leaf sample data set;
Step S3: introducing an attention mechanism module to optimize the depth separable convolution model and obtain the final tobacco leaf grading model.
Further, the preprocessing comprises cropping, denoising and standardizing the tobacco leaf image, specifically:
cropping the tobacco leaf image, and denoising the cropped image with a bilateral filter; after denoising, standardizing the tobacco leaf image according to the following formula:
where u is the mean of the image, x represents the image matrix, σ represents the standard deviation, N represents the number of pixels of the image x, and adjusted_stddev is the adjustment coefficient.
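The standardization formula itself is not reproduced in the text above; a plausible reconstruction from the listed variables, matching the common per-image standardization (the exact definition of adjusted_stddev is an assumption), is:

$$x' = \frac{x - u}{\mathrm{adjusted\_stddev}}, \qquad \mathrm{adjusted\_stddev} = \max\!\left(\sigma,\ \frac{1}{\sqrt{N}}\right) \tag{1}$$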
Further, in the depth separable convolution model, layer 1 is a depth convolution, which computes spatial correlations and effectively extracts features; layer 2 is a point-by-point convolution, which adjusts the number of output feature channels through linear combinations of the input channels, specifically:
first, the depth convolution is performed: each input channel is filtered by its own single filter, i.e. the multi-channel feature map from the previous layer is split into single-channel feature maps, each is convolved separately, and the results are stacked back together;
assume the input and output feature maps have the same spatial size, the convolution kernel is D_k · D_k, where D_k denotes the side length of the kernel, M is the number of input channels, N is the number of output channels, and D_f denotes the width and height of the input and output feature maps; the M convolution kernels are convolved with the M channels respectively, so the computational cost of the depth convolution is given by formula (2):
D_k · D_k · M · D_f · D_f    (2)
the point-by-point convolution then combines the D_f · D_f · M feature maps with N convolution kernels of size 1 · 1 (an ordinary convolution with 1 × 1 kernels), so the computational cost of the point-by-point convolution is given by formula (3):
M · N · D_f · D_f    (3)
the total computational cost of the depth separable convolution is the sum of the depth convolution and the point-by-point convolution, as shown in formula (4):
D_k · D_k · M · D_f · D_f + M · N · D_f · D_f    (4)
while the computational cost of the standard convolution is given by formula (5):
D_k · D_k · M · D_f · D_f · N    (5)
the ratio of the computational cost of the depth separable convolution to that of the standard convolution is given by formula (6):
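Formula (6) is not reproduced in the text above, but dividing (4) by (5) gives the reduction factor of the depth separable convolution, which is presumably what (6) expresses:

$$\frac{D_k \cdot D_k \cdot M \cdot D_f \cdot D_f + M \cdot N \cdot D_f \cdot D_f}{D_k \cdot D_k \cdot M \cdot D_f \cdot D_f \cdot N} = \frac{1}{N} + \frac{1}{D_k^{2}} \tag{6}$$

For a typical 3 × 3 kernel (D_k = 3), the depth separable convolution therefore needs roughly eight to nine times less computation than the standard convolution.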
further, the attention mechanism module adopts a convolution attention module, combines a space and a channel attention mechanism module, and is particularly suitable for the attention mechanism module
The channel attention module firstly inputs the characteristics, respectively performs maximum pooling and average pooling, then respectively performs element-by-element addition operation on the characteristics output by the shared full-connection layer through the multilayer perceptron, and generates a final channel attention weight through an activation function; finally, performing element-by-element multiplication operation on the channel attention weight and the input feature weight to generate input features required by the space attention module;
the spatial attention module takes the feature map output by the channel attention module as an input feature map of the module; firstly, performing maximum pooling and average pooling based on channels, then performing merging operation on the two results based on the channels, performing convolution operation to reduce the dimension into 1 channel, and generating space attention characteristics through an activation function; and finally, multiplying the space attention characteristic and the input characteristic of the module to obtain the finally generated characteristic.
Further, the channel attention mechanism module compresses the feature map in the spatial dimension to obtain a one-dimensional vector, then performs operation, considers not only the average pooling but also the maximum pooling when performing compression in the spatial dimension, the average pooling and the maximum pooling are used for aggregating spatial information of the feature map, sends the spatial information to a shared network, compresses the spatial dimension of the input feature map, sums and combines element by element to generate the channel attention map,
the channel attention calculation formula is shown in formula (7)
In formula 7:andrespectively representing the average pooling characteristic and the maximum pooling characteristic of the channels; MLP represents a multi-layer perceptron; avgpoolRepresents average pooling; max Pool denotes maximum pooling; w 1 And W 0 Parameters in the represented multi-layer perceptron; sigma represents sigmoid activation function
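Formula (7) is likewise not reproduced; assuming the module follows the standard convolutional block attention (CBAM) formulation, the channel attention map M_c for an input feature F would read:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big) \tag{7}$$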
Further, the spatial attention mechanism module compresses along the channel dimension, performing average pooling and maximum pooling over the channels; the maximum pooling operation MaxPool extracts the maximum value over the channels at each position, repeated height × width times; the average pooling operation AvgPool extracts the mean value over the channels at each position, likewise repeated height × width times; the extracted feature maps are then concatenated to obtain a 2-channel feature map, specifically:
where the average-pooled feature and the max-pooled feature are the two channel-wise pooled descriptors; AvgPool denotes average pooling; MaxPool denotes maximum pooling; 7 × 7 denotes the size of the convolution kernel; and σ denotes the sigmoid activation function.
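The spatial attention formula is also missing from the text; under the same CBAM assumption it would read:

$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7\times 7}([F^s_{avg};\ F^s_{max}])\big) \tag{8}$$

where f^{7×7} denotes the convolution with a 7 × 7 kernel (the formula number is inferred from the numbering above).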
Compared with the prior art, the invention has the following beneficial effects:
the invention establishes a model integrating an attention mechanism and a convolution neural network, introduces a depth separable convolution method, simultaneously adds the attention mechanism, continuously focuses on the most discriminative region to realize the classification of the image, has better prediction effect and effectively improves the identification accuracy.
Drawings
FIG. 1 is an original image of tobacco leaves in an embodiment of the present invention;
FIG. 2 is the filtered and denoised tobacco leaf image according to an embodiment of the present invention;
FIG. 3 is a standard convolution process in accordance with an embodiment of the present invention;
FIG. 4 is a process for depth separable convolution according to an embodiment of the present invention;
FIG. 5 is a diagram of an attention module in accordance with an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to the drawings, the invention provides a tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model, which comprises the following steps:
Step S1: acquiring tobacco leaf images, constructing a tobacco leaf sample data set based on the national flue-cured tobacco grading standard and expert experience, and preprocessing the images;
In this embodiment, the tobacco leaf samples studied all came from Gming, Fujian, and were collected in June 2021. Tobacco leaves grown in the upper, middle and lower stalk positions were selected for the study; for each position, tobacco growers manually picked relatively good, poor and average leaves, and tobacco experts then sorted and graded the samples into three groups, B4F, C2F and X2F, mainly according to the appearance characteristics of the leaves.
More than 600 high-definition tobacco leaf images were captured with a high-resolution document camera, of which 420 qualified images were selected for this experiment. The grade composition of the tobacco leaf samples is as follows:
Pictures of tobacco leaf samples of the different grades are shown in FIG. 1;
in this embodiment, the preprocessing comprises cropping, denoising and standardizing the tobacco leaf image, specifically: cropping the tobacco leaf image, and denoising the cropped image with a bilateral filter; after denoising, standardizing the tobacco leaf image according to the following formula:
where u is the mean of the image, x represents the image matrix, σ represents the standard deviation, N represents the number of pixels of the image x, and adjusted_stddev is the adjustment coefficient.
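A minimal preprocessing sketch of the above steps, assuming OpenCV and NumPy; the crop region, the bilateral filter parameters and the definition of adjusted_stddev (taken here as max(σ, 1/√N)) are illustrative assumptions rather than values fixed by the embodiment:

```python
import cv2
import numpy as np

def preprocess_leaf(path, crop_box=(0, 0, 1024, 1024)):
    """Crop, bilaterally filter, and standardize a single tobacco leaf image."""
    img = cv2.imread(path)                                   # BGR uint8 image
    x0, y0, x1, y1 = crop_box                                # hypothetical crop region
    img = img[y0:y1, x0:x1]

    # Bilateral filtering removes noise while preserving the leaf edges;
    # d, sigmaColor and sigmaSpace are illustrative choices.
    img = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

    # Per-image standardization: (x - mean) / adjusted_stddev,
    # with adjusted_stddev = max(stddev, 1 / sqrt(N)) assumed here.
    x = img.astype(np.float32)
    adjusted_stddev = max(float(x.std()), 1.0 / np.sqrt(x.size))
    return (x - x.mean()) / adjusted_stddev
```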
Step S2: based on a convolutional neural network, introducing depth separable convolution, decomposing the standard convolution into a depth convolution and a point-by-point convolution, constructing a depth separable convolution model, and training it on the preprocessed tobacco leaf sample data set;
In this embodiment, the calculation process of the standard convolution model is shown in FIG. 3; its scale and computational cost are large, because every output is obtained by convolving over all input features. A depth separable convolution model is therefore adopted, whose calculation process is shown in FIG. 4: layer 1 is a depth convolution, which mainly computes spatial correlations and effectively extracts features; layer 2 is a point-by-point convolution, which mainly adjusts the number of output feature channels through linear combinations of the input channels.
First, the depth convolution is performed: each input channel is filtered by its own single filter, i.e. the multi-channel feature map from the previous layer is split into single-channel feature maps, each is convolved separately, and the results are stacked back together.
Assume the input and output feature maps have the same spatial size, the convolution kernel is D_k · D_k, where D_k denotes the side length of the kernel, M is the number of input channels, N is the number of output channels, and D_f denotes the width and height of the input and output feature maps; the M convolution kernels are convolved with the M channels respectively, so the computational cost of the depth convolution is given by formula (2):
D_k · D_k · M · D_f · D_f    (2)
The point-by-point convolution then combines the D_f · D_f · M feature maps with N convolution kernels of size 1 · 1 (an ordinary convolution with 1 × 1 kernels), so the computational cost of the point-by-point convolution is given by formula (3):
M · N · D_f · D_f    (3)
The total computational cost of the depth separable convolution is the sum of the depth convolution and the point-by-point convolution, as shown in formula (4):
D_k · D_k · M · D_f · D_f + M · N · D_f · D_f    (4)
While the computational cost of the standard convolution is given by formula (5):
D_k · D_k · M · D_f · D_f · N    (5)
The ratio of the computational cost of the depth separable convolution to that of the standard convolution is given by formula (6):
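As derived above, formula (6) reduces to 1/N + 1/D_k². A minimal PyTorch sketch of the depth separable block just described follows; the 3 × 3 kernel and the channel counts in the comment are illustrative assumptions, not the patent's exact network configuration:

```python
import torch
import torch.nn as nn

class DepthSeparableConv(nn.Module):
    """Depth (depthwise) convolution followed by a point-by-point (1x1) convolution."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # Depth convolution: one filter per input channel (groups = in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # Point-by-point convolution: 1x1 kernels linearly combine the channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Cost check for D_k = 3, M = 32, N = 64, D_f = 56:
# depthwise 3*3*32*56*56 + pointwise 32*64*56*56 = 7,325,696 multiply-adds,
# versus 3*3*32*56*56*64 = 57,802,752 for the standard convolution,
# i.e. a reduction factor of 1/N + 1/D_k**2 ≈ 0.127.
```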
Step S3: introducing an attention mechanism module to optimize the depth separable convolution model and obtain the final tobacco leaf grading model.
In this embodiment, the attention mechanism module is a convolutional attention module that combines a spatial attention module and a channel attention module, specifically:
the channel attention module first applies maximum pooling and average pooling to the input features separately, passes each pooled result through a shared fully connected multilayer perceptron, adds the two outputs element by element, and generates the final channel attention weights through an activation function; finally, the channel attention weights are multiplied element by element with the input features to generate the input features required by the spatial attention module;
the spatial attention module takes the feature map output by the channel attention module as its input feature map; it first performs channel-wise maximum pooling and average pooling, concatenates the two results along the channel dimension, applies a convolution to reduce them to a single channel, and generates the spatial attention features through an activation function; finally, the spatial attention features are multiplied with the input features of the module to obtain the final features.
Preferably, the channel attention mechanism module compresses the feature map in the spatial dimension to obtain a one-dimensional vector before operating on it; when compressing in the spatial dimension, both average pooling and maximum pooling are considered; average pooling and maximum pooling aggregate the spatial information of the feature map, which is then sent to a shared network; the spatial dimension of the input feature map is compressed, and the outputs are summed element by element to generate the channel attention map.
The channel attention is calculated as shown in formula (7),
where the average-pooled channel feature and the max-pooled channel feature are the two pooled descriptors; MLP denotes the multilayer perceptron; AvgPool denotes average pooling; MaxPool denotes maximum pooling; W_1 and W_0 are the parameters of the multilayer perceptron; and σ denotes the sigmoid activation function.
Preferably, the spatial attention mechanism module compresses along the channel dimension, performing average pooling and maximum pooling over the channels; the maximum pooling operation MaxPool extracts the maximum value over the channels at each position, repeated height × width times; the average pooling operation AvgPool extracts the mean value over the channels at each position, likewise repeated height × width times; the extracted feature maps are then concatenated to obtain a 2-channel feature map, specifically:
where the average-pooled feature and the max-pooled feature are the two channel-wise pooled descriptors; AvgPool denotes average pooling; MaxPool denotes maximum pooling; 7 × 7 denotes the size of the convolution kernel; and σ denotes the sigmoid activation function.
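A minimal sketch of the attention module described above, assuming the standard convolutional block attention (CBAM) layout; the reduction ratio of 16 and the 7 × 7 spatial kernel are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both the average-pooled and max-pooled vectors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))               # average pooling over space
        mx = self.mlp(x.amax(dim=(2, 3)))                # maximum pooling over space
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)     # channel attention weights
        return x * w                                     # re-weight the input features

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)                # channel-wise average pooling
        mx = x.amax(dim=1, keepdim=True)                 # channel-wise maximum pooling
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))
```

In the grading network, such a module would typically be inserted after a depth separable convolution block so that the most discriminative channels and regions of the tobacco leaf are emphasized.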
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
Claims (6)
1. A tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model is characterized by comprising the following steps of:
Step S1: acquiring tobacco leaf images, constructing a tobacco leaf sample data set based on the national flue-cured tobacco grading standard and expert experience, and preprocessing the images;
Step S2: based on a convolutional neural network, introducing depth separable convolution, decomposing the standard convolution into a depth convolution and a point-by-point convolution, constructing a depth separable convolution model, and training it on the preprocessed tobacco leaf sample data set;
Step S3: introducing an attention mechanism module to optimize the depth separable convolution model and obtain the final tobacco leaf grading model.
2. The tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model according to claim 1, wherein the preprocessing comprises cropping, denoising and standardizing the tobacco leaf image, specifically:
cropping the tobacco leaf image, and denoising the cropped image with a bilateral filter; after denoising, standardizing the tobacco leaf image according to the following formula:
where u is the mean of the image, x represents the image matrix, σ represents the standard deviation, N represents the number of pixels of the image x, and adjusted_stddev is the adjustment coefficient.
3. The tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model according to claim 1, wherein in the depth separable convolution model, layer 1 is a depth convolution, which computes spatial correlations and effectively extracts features; layer 2 is a point-by-point convolution, which adjusts the number of output feature channels through linear combinations of the input channels, specifically:
first, the depth convolution is performed: each input channel is filtered by its own single filter, i.e. the multi-channel feature map from the previous layer is split into single-channel feature maps, each is convolved separately, and the results are stacked back together;
assume the input and output feature maps have the same spatial size, the convolution kernel is D_k · D_k, where D_k denotes the side length of the kernel, M is the number of input channels, N is the number of output channels, and D_f denotes the width and height of the input and output feature maps; the M convolution kernels are convolved with the M channels respectively, so the computational cost of the depth convolution is given by formula (2):
D_k · D_k · M · D_f · D_f    (2)
the point-by-point convolution then combines the D_f · D_f · M feature maps with N convolution kernels of size 1 · 1 (an ordinary convolution with 1 × 1 kernels), so the computational cost of the point-by-point convolution is given by formula (3):
M · N · D_f · D_f    (3)
the total computational cost of the depth separable convolution is the sum of the depth convolution and the point-by-point convolution, as shown in formula (4):
D_k · D_k · M · D_f · D_f + M · N · D_f · D_f    (4)
while the computational cost of the standard convolution is given by formula (5):
D_k · D_k · M · D_f · D_f · N    (5)
the ratio of the computational cost of the depth separable convolution to that of the standard convolution is given by formula (6):
4. The tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model according to claim 1, wherein the attention mechanism module is a convolutional attention module that combines a spatial attention module and a channel attention module, specifically:
the channel attention module first applies maximum pooling and average pooling to the input features separately, passes each pooled result through a shared fully connected multilayer perceptron, adds the two outputs element by element, and generates the final channel attention weights through an activation function; finally, the channel attention weights are multiplied element by element with the input features to generate the input features required by the spatial attention module;
the spatial attention module takes the feature map output by the channel attention module as its input feature map; it first performs channel-wise maximum pooling and average pooling, concatenates the two results along the channel dimension, applies a convolution to reduce them to a single channel, and then generates the spatial attention features through an activation function; finally, the spatial attention features are multiplied with the input features of the module to obtain the final features.
5. The tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model according to claim 4, wherein the channel attention mechanism module compresses the feature map in the spatial dimension to obtain a one-dimensional vector before operating on it; when compressing in the spatial dimension, both average pooling and maximum pooling are considered; average pooling and maximum pooling aggregate the spatial information of the feature map, which is then sent to a shared network; the spatial dimension of the input feature map is compressed, and the outputs are summed element by element to generate the channel attention map; the channel attention is calculated as shown in formula (7),
where the average-pooled channel feature and the max-pooled channel feature are the two pooled descriptors; MLP denotes the multilayer perceptron; AvgPool denotes average pooling; MaxPool denotes maximum pooling; W_1 and W_0 are the parameters of the multilayer perceptron; and σ denotes the sigmoid activation function.
6. The tobacco leaf grading method integrating an attention mechanism and a convolutional neural network model according to claim 4, wherein the spatial attention mechanism module compresses along the channel dimension, performing average pooling and maximum pooling over the channels; the maximum pooling operation MaxPool extracts the maximum value over the channels at each position, repeated height × width times; the average pooling operation AvgPool extracts the mean value over the channels at each position, likewise repeated height × width times; the extracted feature maps are then concatenated to obtain a 2-channel feature map, specifically:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210666171.2A CN114943864A (en) | 2022-06-14 | 2022-06-14 | Tobacco leaf grading method integrating attention mechanism and convolutional neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114943864A true CN114943864A (en) | 2022-08-26 |
Family
ID=82908874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210666171.2A Pending CN114943864A (en) | 2022-06-14 | 2022-06-14 | Tobacco leaf grading method integrating attention mechanism and convolutional neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114943864A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199217A (en) * | 2020-01-09 | 2020-05-26 | 上海应用技术大学 | Traffic sign identification method and system based on convolutional neural network |
CN111915580A (en) * | 2020-07-27 | 2020-11-10 | 深圳市识农智能科技有限公司 | Tobacco leaf grading method, system, terminal equipment and storage medium |
CN112767423A (en) * | 2021-02-05 | 2021-05-07 | 吉林师范大学 | Remote sensing image building segmentation method based on improved SegNet |
CN113177465A (en) * | 2021-04-27 | 2021-07-27 | 江苏科技大学 | SAR image automatic target recognition method based on depth separable convolutional neural network |
CN113192633A (en) * | 2021-05-24 | 2021-07-30 | 山西大学 | Stomach cancer fine-grained classification method based on attention mechanism |
CN113344188A (en) * | 2021-06-18 | 2021-09-03 | 东南大学 | Lightweight neural network model based on channel attention module |
CN113469233A (en) * | 2021-06-23 | 2021-10-01 | 临沂大学 | Tobacco leaf automatic grading method and system based on deep learning |
CN114266337A (en) * | 2021-11-16 | 2022-04-01 | 中国烟草总公司职工进修学院 | Intelligent tobacco leaf grading model based on residual error network and grading method using model |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115953384A (en) * | 2023-01-10 | 2023-04-11 | 杭州首域万物互联科技有限公司 | On-line detection and prediction method for tobacco morphological parameters |
CN115953384B (en) * | 2023-01-10 | 2024-02-02 | 杭州首域万物互联科技有限公司 | Online detection and prediction method for morphological parameters of tobacco leaves |
Legal Events
- PB01 - Publication
- SE01 - Entry into force of request for substantive examination