Plants 11 03174 v2
Article
A Copy Paste and Semantic Segmentation-Based Approach for
the Classification and Assessment of Significant Rice Diseases
Zhiyong Li 1,2,† , Peng Chen 1,2,† , Luyu Shuai 1,2,† , Mantao Wang 1,2 , Liang Zhang 1,2 , Yuchao Wang 3
and Jiong Mu 1,2, *
Abstract: The accurate segmentation of significant rice diseases and assessment of the degree of
disease damage are the keys to their early diagnosis and intelligent monitoring and are the core of
accurate pest control and information management. Deep learning applied to rice disease detection
and segmentation can significantly improve the accuracy of disease detection and identification but
requires a large number of training samples to determine the optimal parameters of the model. This
study proposed a lightweight network based on copy paste and semantic segmentation for accurate
disease region segmentation and severity assessment. First, a dataset for rice significant disease
segmentation was selected and collated based on 3 open-source datasets, containing 450 sample
images belonging to 3 categories of rice leaf bacterial blight, blast and brown spot. Then, to increase
the diversity of samples, a data augmentation method, rice leaf disease copy paste (RLDCP), was
proposed that expanded the collected disease samples with the concept of copy and paste. The new
RSegformer model was then trained by replacing the backbone of the lightweight semantic segmentation network Segformer, combining an attention mechanism and changing the upsampling operator, so that the model could better balance local and global information, speed up the training process and reduce the degree of overfitting of the network. The results show that RLDCP could effectively improve the accuracy and generalisation performance of the semantic segmentation model compared with traditional data augmentation methods and could improve the MIoU of the semantic segmentation model by about 5% with a dataset only twice the size. RSegformer can achieve an 85.38% MIoU at a model size of 14.36 M. The method proposed in this paper can quickly, easily and accurately identify disease occurrence areas, their species and the degree of disease damage, providing a reference for timely and effective rice disease control.

Keywords: disease type recognition; disease level differentiation; object detection; semantic segmentation

Citation: Li, Z.; Chen, P.; Shuai, L.; Wang, M.; Zhang, L.; Wang, Y.; Mu, J. A Copy Paste and Semantic Segmentation-Based Approach for the Classification and Assessment of Significant Rice Diseases. Plants 2022, 11, 3174. https://doi.org/10.3390/plants11223174

Academic Editor: Mukhtar Ahmed
Received: 28 August 2022; Accepted: 18 November 2022; Published: 20 November 2022
the accurate classification and identification of rice diseases and the assessment of disease
damage levels. It is also a key to accurately locating rice disease areas and guiding plant
protection equipment to target spraying. Early rice disease target detection algorithms used
a sliding window strategy to select region proposals, extracted region proposal features
and finally used a classifier to classify them to obtain the target area [3]. Although this
method can locate disease targets without missing them, the redundant region proposal
generated can be computationally intensive. It takes more time to traverse all the disease
images, resulting in poor detection performance. In addition, the feature extraction of
region proposal uses manual methods such as the grey-level co-occurrence matrix [4], textural
descriptors [5] and local binary patterns [6], and the extracted features are more focused on
the underlying features such as disease colour and shape, resulting in poor robustness of
disease detection; the classifier uses support vector machines [7], Bayesian classifiers [8],
unsupervised clustering [9] and other machine learning algorithms for disease recognition,
with slow recognition speed and low accuracy rate.
Deep learning can automatically learn features from disease image data, which has
the advantages of high learning ability, high upper-performance limit, good portability
and wide coverage compared with traditional machine learning, which can avoid the
limitations of manual feature engineering [10]. Datasets are the basis for building deep
learning models, and the dataset’s quality determines whether the deep learning model can
be trained successfully. According to the survey, several publicly available plant disease
image datasets have been formed [11]. The datasets for rice disease research are fragmented,
scattered and redundant, and few datasets are publicly available. Therefore, most of the
existing deep learning-based plant disease diagnosis methods use data augmentation to im-
prove the models’ recognition, detection and segmentation accuracy. The commonly used
data augmentation methods are classified into traditional, supervised and unsupervised.
Bhagat et al. used traditional data enhancement methods such as geometric transformation,
colour transformation and fuzzy transformation to expand crop disease image data that
are simple and easy to operate, but the amount of information they add is limited [12].
Therefore, the accuracy of the model is also limited. Hu et al. used SinGAN to generate
many plant leaf disease images [13], but the method requires additional training overhead.
The copy-paste method was proposed by cutting out instances, then jittering, flipping
and pasting them onto another image, where each operation had large randomness [14].
Still, the randomness of its jittering, flipping, pasting position and number of pastes
made the synthesised images difficult to interpret because they did not match the
actual scene.
With the rapid development of semantic segmentation models, many models have
been introduced into plant disease segmentation and classification. However, it is challeng-
ing with existing models to achieve a good trade-off between accuracy and scale. Gonçalves
et al. compared six pixel-level classification prediction methods and obtained relatively
high accuracy with three models, FPN, UNet and DeepLabv3+ (Xception), all of which had
parameter counts above 25 million, and SegNet, PSPNet and DeepLabv3+ (MobileNetv2) all
had model parameter counts of less than 8.0 million, despite relatively weak model
generalization [15]. However, high-accuracy and lightweight models are required for accurate
plant disease segmentation and easy deployment on mobile devices. Furthermore, there are
many challenges with semantic segmentation models for plant foliar disease classification
and segmentation studies, with the overall difficulties centred on the complexity of the
context and the characteristics of the disease itself. To overcome these challenges, some
researchers have improved the model architecture for plant disease segmentation and
classification and produced a richer dataset [16–18]. Hu et al. used the UNet network
model to reduce the influence of complex backgrounds on the assessment results and then
used a multiconvolutional neural network model to automatically identify tea diseases in
small samples [13]. Ji et al. used a two-step approach to detect grapevine black measles
disease and estimate the severity to better extract disease features, first by segmenting the
leaves and disease using the DeepLabv3+ semantic segmentation model based on ResNet50
and second by developing a fuzzy rule-based system for each feature to predict the degree
of damage caused by the disease [19]. However, most of these models only target a single
disease of a single crop in the same period and do not consider the impact of similarities
between symptoms of different diseases of the same crop and changes in symptoms of the
same disease of the same crop in different periods on the accuracy of the models; therefore,
the robustness of the trained models is poor, and their generalisability is weak.
Traditional plant disease severity estimation relies on manual experience. However,
this method is inefficient and requires large labour and time expenditures, and the as-
sessment results are often subjective and unreliable. In addition, more research has been
carried out to automatically estimate plant disease severity by building direct models, i.e.,
by qualitative classification and the detection of plant disease images [20–22]. However,
most of these models cannot achieve refined quantitative estimates, and direct models
have disadvantages such as poor interpretability and weak migration performance and
require retraining the model when the evaluation criteria change [23]. A novel method
proposed for plant disease severity estimation is a semantic segmentation model to achieve
the pixel-level classification of plant disease images and thus obtain the percentage of the
lesion-to-leaf area required for plant disease severity estimation. Wang et al. proposed a
two-stage model fusing DeepLabv3+ and UNet to segment cucumber leaves and disease
spots based on the ratio of segmented disease spots to the leaf pixel area [24] and classified
disease severity based on the percentage of segmented marks in the leaf pixel area. Chen
et al. proposed a new segmentation model, BLSNet, for rice bacterial streak disease and
classified severity classes based on the ratio of lesion area to total leaf area [23]. However,
different plant diseases have different severity estimation criteria, and studies on the fine
assessment of rice disease severity are few and limited to evaluation criteria based on area
percentage; such criteria cannot accurately assess diseases with small but densely distributed
lesion areas, which is required for the timely prevention and control of mid- to late-stage diseases.
To solve the above problems, this paper proposes a new rice leaf disease identification
and segmentation model, RSegformer. The main contributions of this paper are:
(1) A publicly available dataset of common rice diseases was collected and annotated
with semantic segmentation.
(2) A data enhancement method for rice disease images was proposed based on the
copy-and-paste idea to generate more images that match the symptoms of rice diseases.
(3) A new rice disease segmentation model, RSegformer, was proposed, with MIoU
reaching 85.38% on a parametric count of 14.36 million.
(4) An index for classifying rice leaf disease severity levels was proposed, combining
the ratio of spot area to leaf area with the number of spots, providing a valuable reference
for the practical application of leaf disease severity estimation in other plants.
Figure 1. Sample images of rice diseases: (a) dataset 1—bacterial blight, (b) dataset 1—blast, (c) dataset 1—brown spot, (d) dataset 2—bacterial blight, (e) dataset 2—brown spot, (f) dataset 3—bacterial blight, (g) dataset 3—blast, (h) dataset 3—brown spot.
2.1.2. Data Annotation
The dataset used for this work consisted of 450 images. Of these, 150 images were of
each of the three types of diseases: rice bacterial blight, rice blast and brown spot.
Considering the inconsistent resolution of the different data, and in order to facilitate data
augmentation, all images were resized to 640 × 640 pixels, and the 450 images were annotated
using the EISeg annotation software [27], some of which are shown in Figure 2.
Figure 2. Sample rice leaf disease image and segmentation label: (a) bacterial blight, (b) blast, (c) brown spot, (d) bacterial blight label, (e) blast label, (f) brown spot label, where orange, mauve, red, blue and black represent rice bacterial blight, blast, brown spot, healthy leaves and background areas, respectively.
Figure 3. RLDCP data enhancement example: (a) RGB image of the pasted object, (b) RGB image of the copied object, (c) newly synthesised RGB image, (d) mask image of the pasted object, (e) mask image of the copied object, (f) newly synthesised mask image.
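The restricted copy-paste step at the heart of RLDCP can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the function name and the mask convention (one integer label per class) are our assumptions, and the real RLDCP applies further constraints on paste position and paste count so that the synthesised image matches real disease symptoms.

```python
import numpy as np

def copy_paste(src_img, src_mask, dst_img, dst_mask, lesion_class, leaf_class):
    """Paste the lesion pixels of one image onto the leaf region of another.

    Simplified sketch: lesions are only pasted where the target mask marks
    healthy leaf, so the synthesised image stays plausible (lesions never
    land on the background).
    """
    lesion = src_mask == lesion_class              # pixels of the copied lesion
    on_leaf = lesion & (dst_mask == leaf_class)    # restrict paste to healthy leaf
    out_img, out_mask = dst_img.copy(), dst_mask.copy()
    out_img[on_leaf] = src_img[on_leaf]            # transfer RGB values
    out_mask[on_leaf] = lesion_class               # update the label mask to match
    return out_img, out_mask
```

Because the mask is updated together with the image, the synthesised pair can be fed straight back into semantic segmentation training without re-annotation.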
2.1.4. Rice Leaf Disease Severity Label
Different criteria for measuring disease severity were designed for different disease
types to solve the problem of multiple spots with a small total area covered, as in the rice
brown spot in the middle and late stages of disease development. For rice bacterial blight
and rice blast, the criteria are based on the percentage of the total leaf area covered by the
lesion; for rice brown spot, the criteria are based on the percentage of the area covered
by the lesion and the number of lesions, of which the higher level is selected as the final
level. In the area-based criteria, grade 0 is for healthy leaves without disease, grade 1 is for
those with 0.1% to 10% lesion coverage, grade 2 is for those with 11% to 25%, grade 3 is
for those with 26% to 45%, grade 4 is for those with 46% to 65% and grade 5 is for those
with more than 65%. In the count-based criteria, grade 0 is a healthy leaf without disease,
grade 1 is for 1–5 spots in a single image, grade 2 for 6–10, grade 3 for 11–15, grade 4 for
16–20 and grade 5 for greater than 25. Figure 4 shows the distribution of the severity levels
of the rice leaf disease dataset according to the above classification criteria.
Figure 4. Distribution of five severity levels of three diseases in the rice leaf disease dataset.
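The grading criteria above translate directly into code. The sketch below uses our own function names; note that the count criterion in the text jumps from 16–20 (grade 4) to "greater than 25" (grade 5), leaving counts of 21–25 unassigned, so this sketch assumes everything above 20 maps to grade 5.

```python
def area_grade(pct):
    """Severity grade from the lesion-to-leaf area percentage (grades 0-5).

    Boundary handling between the integer bands in the text (e.g. 10% vs 11%)
    is our assumption: each band's upper bound is treated as inclusive.
    """
    if pct < 0.1:   # grade 1 starts at 0.1% coverage
        return 0
    if pct <= 10:
        return 1
    if pct <= 25:
        return 2
    if pct <= 45:
        return 3
    if pct <= 65:
        return 4
    return 5

def count_grade(n):
    """Severity grade from the number of lesions in a single image.

    The text lists grade 4 as 16-20 and grade 5 as 'greater than 25';
    counts of 21-25 are unassigned there, so we treat >20 as grade 5.
    """
    if n == 0:
        return 0
    if n <= 5:
        return 1
    if n <= 10:
        return 2
    if n <= 15:
        return 3
    if n <= 20:
        return 4
    return 5

def brown_spot_grade(pct, n):
    # Brown spot takes the higher of the area-based and count-based grades.
    return max(area_grade(pct), count_grade(n))
```

For example, a brown spot leaf with only 2% lesion coverage but 12 separate spots is graded 3 by the count criterion, capturing the "many small, dense spots" case the combined index was designed for.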
2.2. Model Architecture
2.2.1. Model Architecture Overview
Semantic segmentation and image classification are closely related: semantic segmentation
can be seen as an extension of image classification from the image level to the pixel level. In
fact, since FCN [28], many semantic segmentation frameworks have been derived from image
classification variants of ImageNet [29]. Some current semantic segmentation networks based
on the convolutional neural network family adopt different networks as the feature extraction
backbone, such as VGG [30], ResNet [31] and MobileNetv2 [32], or design modules and
methods such as dilated convolution [33], atrous spatial pyramid pooling [34], cross-attention
mechanisms [35] and point-space attention [36] to expand the perceptual field and obtain rich
contextual information. However, these methods introduce many empirical modules, making
the resulting framework computationally intensive and complex. With the rapid development
of transformers [37] in computer vision, using a transformer as the network backbone to
effectively expand the perceptual field and extract rich feature information through
self-attention mechanisms has become one of the mainstream approaches, of which
Segformer [38] is a typical representative applied to semantic segmentation tasks.

Segformer discards positional encoding, uses a novel multilevel transformer as the
encoding structure to output multiscale features and uses a simple and lightweight
multilayer perceptron (MLP) as the decoder to combine local and global attention, showing
good segmentation performance. However, the model specifies a similar field of perception
for each token feature within each layer, and this constraint inevitably limits the ability
of each self-attention layer to capture multiscale features. The shunted transformer [39]
proposes a novel shunted self-attention that unifies multiscale feature extraction within a
single self-attention layer through multiscale token aggregation. In addition, in Segformer's
decoder, up-sampling using bilinear interpolation is computationally intensive, and the
recovered image edges become somewhat blurred. The lightweight up-sampling operator
Content-Aware ReAssembly of Features (CARAFE) [40] better solves this problem. In this
study, we designed RSegformer, a lightweight and efficient rice leaf disease segmentation
model based on Segformer combined with the shunted transformer, coordinate attention
(CA) [41] and CARAFE. Figure 5 shows the overall network model architecture of RSegformer.
Figure 5. The overall architecture of the RSegformer network.
Similar to the architecture of Segformer, RSegformer is divided into two parts: encoding
and decoding. The encoding part extracts multiscale features through four shunted
transformer blocks and subsequently embeds CA attention into the encoder–decoder
connection part. The decoding part restores the feature map to the original image size via
the CARAFE up-sampling operator.
2.2.2. Encoding Section
(1) Shunted transformer block
The shunted transformer block consists of shunted self-attention and a detail-specific
feed-forward layer.
Shunted self-attention (SSA): SSA divides the attention heads within the same layer
into groups, each of which captures a specific granularity of features by aggregating a
different number of tokens before calculating the self-attention matrix, enabling the
attention heads within a single layer to model objects of various scales efficiently and
simultaneously. The SSA calculation can be expressed as (1)–(4):
Q_i = XW_i^Q (1)
x′ = FC(x; θ1) (5)
x″ = FC(σ(x′ + DS(x′; θ)); θ2) (6)
where θ1 and θ2 represent the output dimensions of the first and second fully connected
layers, respectively, and DS(·) denotes a detail-specific layer with parameters θ implemented
by depthwise separable convolution.
(2) Encoding process
Given an input image of size H × W × 3, the image is first transformed into a sequence
of tokens containing more valid information using the patch embedding mechanism. The
length of the sequence is H × 4^−1 × W × 4^−1, and the dimensionality of each token
vector is C. Patch embedding uses multiple layers of convolution, each of which includes
a specific convolution, BatchNorm2d and the ReLU activation function. The first layer
uses a kernel = 7 × 7, stride = 2, padding = 3 convolutional layer; the second layer stacks
zero or multiple kernel = 3 × 3, stride = 2, padding = 1 convolutional layers depending
on the required model size; and finally, a two-dimensional convolutional mapping with
kernel = 2 × 2, stride = 2 generates an input sequence of length H × 4^−1 × W × 4^−1.
The token sequence is sequentially entered into four stages to obtain multiscale feature
information, each containing a linear embedding and multiple shunted transformer blocks.
The linear embedding uses a convolutional layer with a stride size of 2 to achieve down-
sampling, while each shunted transformer block outputs a feature map of the same size.
Thus, four feature maps F1, F2, F3, F4 are obtained, and stage i outputs a feature map Fi
of size H × 2^−(i+1) × W × 2^−(i+1) × C × 2^(i−1). Table 2 shows the
parameter settings for the different stages.
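The downsampling arithmetic described above can be checked with a few lines of plain Python (the helper names are ours): the patch-embedding convolutions reduce each spatial side by a factor of 4, and stage i then outputs a map of size H × 2^−(i+1) × W × 2^−(i+1) with C × 2^(i−1) channels.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a 2-D convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def patch_embed_len(side):
    """Side length after patch embedding: a 7x7/s2/p3 conv, then a 2x2/s2 conv."""
    side = conv_out(side, kernel=7, stride=2, padding=3)  # halves the side
    return conv_out(side, kernel=2, stride=2, padding=0)  # halves it again -> /4 total

def stage_shapes(H, W, C, num_stages=4):
    """Per-stage feature-map sizes F_i = (H * 2^-(i+1), W * 2^-(i+1), C * 2^(i-1))."""
    return [(H >> (i + 1), W >> (i + 1), C << (i - 1)) for i in range(1, num_stages + 1)]
```

For a 512 × 512 input with C = 64 (the shunted-tiny setting in Table 2), patch embedding yields a 128 × 128 token grid, and the four stages output 128 × 128 × 64, 64 × 64 × 128, 32 × 32 × 256 and 16 × 16 × 512 feature maps, matching the H/4 to H/32 multiscale pyramid the decoder consumes.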
Table 2. Selected parameters in different phases of the shunted-tiny model. Head indicates the
number of heads in a shunted transformer block, Ni indicates the number of shunted transformer
blocks in a phase, Ci indicates the output dimension.
Shunted-Tiny: stage 1: C1 = 64, head = 2, N1 = 1; stage 2: C2 = 128, head = 4, N2 = 2;
stage 3: C3 = 256, head = 8, N3 = 4; stage 4: C4 = 512, head = 16, N4 = 1.
using MLP and finally using an MLP to predict the segmentation mask. The decoding part
can be expressed as Equations (7)–(10):
where CARAFE(·) is the up-sampling operation for the feature map using the CARAFE
operator, Ncls is the number of categories and M is the final prediction segmentation
mask obtained.
3. Experimental Process
3.1. Realisation Details
Our model was trained on a machine with 128 GB of memory and a Quadro RTX 5000
graphics processing unit (GPU) under the Ubuntu 20.04 LTS system environment. In order
to validate the effectiveness of the data augmentation method, the PSPNet [42], HRNet [43]
and OCRNet [44] networks were used to train raw data, traditionally augmented data and
RLDCP augmented data, respectively. To verify the validity of the models, data obtained by
RLDCP augmentation were used, trained with models of similar size (DeepLabv3+ model
with ResNet18 as the backbone and Segformer model with MiT-B1 as the backbone). All
of the experimental models used in our comparison experiments were derived from the
MMSegmentation [45] codebase. Therefore, the pretraining weights and hyperparameters
for the comparison experiments inherited the default settings from MMSegmentation, with
a training image size of 512 × 512. Furthermore, the model proposed in this paper is
also based on the MMSegmentation codebase implementation. The pretraining weights
used are obtained from the shunted transformer backbone trained on the ImageNet-1k
dataset. For this model, we inherited the default settings of MMSegmentation and the
shunted transformer: an initial learning rate of 0.00006, a “poly” learning strategy with
a default factor of 1.0, and 80k iterations using the Adam-W optimiser. In addition, the
batch size during training and validation was set to 2, and the results were evaluated every
500 iterations using a multiclass cross-entropy loss function to calculate the loss, as shown
in Equation (11):
Loss = −(1/K) Σ_{n=1}^{K} [y_n log ŷ] (11)
where y_n indicates the true class label of the pixel, ŷ indicates the predicted class
probability and K indicates the total number of classes.
MIoU = (1/(k + 1)) Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii) (13)
where p_ij denotes the number of pixels originally in class i but predicted to be in class j,
p_ji denotes the number originally in class j but predicted to be in class i, p_ii denotes the
number correctly predicted, p_ij and p_ji are interpreted as false negatives and false
positives, respectively, and k denotes the number of classes.
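Equation (13) is straightforward to compute from a confusion matrix. A NumPy sketch follows; the convention of rows as true classes and columns as predicted classes is ours.

```python
import numpy as np

def miou(conf):
    """MIoU per Eq. (13) from a (k+1) x (k+1) confusion matrix.

    conf[i, j] counts pixels of true class i predicted as class j, so the
    diagonal holds p_ii and the row/column sums give the two sum terms in
    the denominator of Eq. (13).
    """
    tp = np.diag(conf).astype(float)  # p_ii: correctly predicted pixels per class
    row = conf.sum(axis=1)            # sum_j p_ij: all pixels of true class i
    col = conf.sum(axis=0)            # sum_j p_ji: all pixels predicted as class i
    iou = tp / (row + col - tp)       # intersection over union per class
    return iou.mean()                 # mean over the k+1 classes
```

A perfect prediction gives an MIoU of 1.0, since every off-diagonal entry of the confusion matrix is then zero.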
4. Discussion
4.1. Validation of Data Augmentation Methods
In this experiment, we chose three of the more popular network models, namely
PSPNet, HRNet and OCRNet. To verify the effectiveness and superiority of our proposed
data augmentation method and to validate the performance change when RLDCP was
used a different number of times, we compared the segmentation accuracy of four datasets
(the original dataset, the dataset obtained after running the traditional data augmentation
method twice and the dataset obtained after running the RLDCP augmentation method
once and then twice) on three classical semantic segmentation models, using MIoU as the
evaluation metric. The original dataset contained 450 disease images, and each round of
data augmentation added 450 images. Thus, a single data augmentation
produced a dataset with 900 disease images and double data augmentation produced a
dataset with 1350 disease images. It is worth noting that in order to obtain more valid
information from the original image by traditional data augmentation methods, we chose
two classical traditional data augmentation methods, namely random rotation and the
addition of salt-and-pepper noise. In particular, we divided the data within each level of the three
diseases in turn in a ratio of 8:2 to form the training and validation sets required for our
experiments. The experimental results are shown in Table 3. We found that the MIoU
values of the dataset enhanced using the RLDCP method increased on the different network
models, demonstrating the effectiveness of RLDCP in the segmentation process.
Observation of Table 3 revealed that (1) traditional data enhancement methods reduced
the segmentation performance of the model. We analysed the reason for this, probably
because the datasets we used originated from three different environments with widely
varying data distributions. The limited amount of information added by random rotation
and salt-and-pepper noise amplified this imbalance through repeated memorisation of the data. (2) The
RLDCP data enhancement method effectively improves model segmentation accuracy.
Compared with the original dataset, the MIoU of PSPNet, HRNet and OCRNet improved by
5.47%, 6.34% and 5.04%, respectively, after two RLDCP data augmentations. We analyse that
this may be because the rice leaf disease copy paste method synthesises reasonable rice leaf
disease images by restricted copy-paste, which effectively expands the sample data volume,
reduces the impact of data distribution differences and improves the generalisability of
the model. (3) When training with the data from one RLDCP data augmentation, MIoU
increased significantly for all three models. When we added another RLDCP data
augmentation, MIoU increased again for all three models. We believe that as the amount
of data increases, MIoU will tend to saturate. Although the traditional data enhancement
method also increased the amount of data, it did not increase MIoU, so our proposed
RLDCP method is effective.
Figure 7. MIoUs for the three models were validated using 5-fold cross-validation. The dataset was divided into 5 parts, set as data-1 to data-5. fold-i is equal to the experimental results obtained by treating data-i as the validation set and the rest of the data as the training set.

Firstly, the MIoUs of the three models fluctuated little among the five sets of experiments.
Secondly, the differences between the three models were relatively stable in each
experiment. Finally, it is evident that RSegformer performed best among the three models,
with MIoUs on average 1.5% and 2.5% higher than Segformer and Deeplabv3+, respectively.

To verify that the cross-validation results of the three models were statistically
significantly different, and because the crossover experiments for the three models were
independent of each other and their results were consistent with continuity, normality
and homogeneity of variance, we chose a one-way analysis of variance (ANOVA) and
used SPSS software for statistical analysis. The ANOVA results showed that the different
models had significantly different effects on MIoU, F = 39.853, p = 0.000005, as shown in
Table 5. The multiple mean comparison showed that the RSegformer model was
significantly better than Segformer and Deeplabv3+.

Table 5. One-way analysis of variance results for different models.
Model         MIoU (%) (mean ± SD)   F        p          Multiple Comparisons
RSegformer    87.56 ± 0.45           39.853   0.000005   RSegformer > Segformer
Segformer     86.02 ± 0.46                               RSegformer > DeepLabv3+
DeepLabv3+    84.80 ± 0.54                               Segformer > DeepLabv3+
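The F statistic in Table 5 comes from a standard one-way ANOVA over the per-fold MIoU scores of the three models. As a self-contained illustration (the fold scores below are made-up placeholders, not the paper's raw per-fold values), the computation the authors ran in SPSS can be reproduced as:

```python
import numpy as np

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand = np.concatenate(groups).mean()
    # Between-group sum of squares (df = k - 1)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    # Within-group sum of squares (df = n_total - k)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Illustrative 5-fold MIoU scores per model (placeholders, not the paper's data).
rseg = [87.1, 87.4, 87.6, 87.9, 87.8]
seg  = [85.6, 86.0, 86.1, 86.4, 85.9]
deep = [84.2, 84.7, 84.9, 85.3, 84.9]
f_stat = one_way_anova_f([rseg, seg, deep])
```

With real fold scores, `scipy.stats.f_oneway` would return the same F along with the p-value; a large F with a small p, as in Table 5, indicates the between-model differences are not explained by fold-to-fold noise.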
Plants 2022, 11, 3174 15 of 20
Model 1 is the original Segformer model; Model 2 replaces the encoder of Segformer with
the shunted transformer; Model 3 adds CA attention between the encoder and decoder on top
of Model 2; and Model 4 replaces the bilinear up-sampling in the decoder with CARAFE on
top of Model 2.
After replacing the encoder backbone with the shunted transformer, segmentation accuracy
improved for almost all classes (diseases, leaves and background) except rice bacterial
blight. This may be because shunted self-attention (SSA) extracts features of small
targets better through multiscale token aggregation, unifying multiscale feature
extraction within a single self-attention layer, and therefore segments dense, very small
disease spots such as brown spot and rice blast lesions better.
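The multiscale token aggregation idea can be sketched in a few lines. This is our simplified single-head illustration, not the SSA module from the paper: the learned query/key/value projections are omitted and values share the key tokens. The point is that each head pools the key/value grid at its own rate r, so heads with r = 1 keep fine detail (small lesions) while heads with larger r see a coarser, wider context, all inside one attention layer.

```python
import numpy as np

def pool_tokens(x, r):
    """Average-pool a (H, W, C) token grid by factor r (H, W divisible by r)."""
    H, W, C = x.shape
    return x.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))

def shunted_attention_head(q, kv, r):
    """One attention head whose keys/values are aggregated at rate r.

    q: (H*W, C) full-resolution queries; kv: (H, W, C) feature grid.
    Larger r merges more tokens into each key, widening that head's view.
    """
    k = pool_tokens(kv, r).reshape(-1, kv.shape[-1])   # (H*W / r^2, C) keys
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Row-wise softmax over the pooled tokens.
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ k  # values reuse the pooled keys in this sketch
```

Concatenating heads run at different r (e.g. 1, 2, 4) would give the mixed-scale representation that helps with both tiny brown spot lesions and larger blight regions.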
The segmentation accuracy of rice bacterial blight disease improved with the addition
of CA attention. This may be because CA attention takes into account the relationships
of the location information in the feature space, which enables the model to capture the
long-distance dependence between spatial locations. Therefore, there is an improvement in
segmenting images of rice bacterial blight, which has an onset colour similar to that of rice
ears and a wide distribution area.
All disease segmentation accuracies improved significantly with the addition of the
CARAFE operator. This may be because CARAFE up-sampling has a larger receptive field and
makes better use of surrounding information; moreover, the up-sampling kernel in CARAFE
is predicted from the semantic content of the feature map, enabling up-sampling
conditioned on the input. Thus, it can significantly improve the overall segmentation
performance of the network.
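Content-aware reassembly can be illustrated with a heavily simplified sketch. This is not CARAFE itself: real CARAFE predicts a separate softmax kernel for every output sub-pixel with a learned kernel-prediction branch, whereas here the softmax weights are derived from the neighbourhood's own values and shared across sub-pixels, just to show how reassembly weights depend on content rather than on a fixed bilinear pattern.

```python
import numpy as np

def carafe_like_upsample(x, scale=2, k=3):
    """Content-aware 2x upsample of a (H, W) map (simplified CARAFE-style sketch).

    Each output pixel is a weighted average over a k x k neighbourhood of its
    source pixel; the weights are a softmax over that neighbourhood's values,
    standing in for CARAFE's learned kernel prediction.
    """
    H, W = x.shape
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.empty((H * scale, W * scale))
    for i in range(H * scale):
        for j in range(W * scale):
            si, sj = i // scale, j // scale        # source location
            patch = xp[si:si + k, sj:sj + k]
            w = np.exp(patch - patch.max())        # softmax weights from content
            w /= w.sum()
            out[i, j] = (w * patch).sum()          # reassemble with those weights
    return out
```

Because the weights come from the feature content, strong responses (e.g. lesion evidence) dominate the reassembled output instead of being blurred away as with fixed bilinear kernels.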
When CA attention and CARAFE cooperate, the segmentation accuracy of all diseases except
brown spot improved. We speculate that this is because CARAFE better captures semantic
information while CA attention better captures spatial location information; the two
complement each other and effectively improve the model's segmentation performance.
the leaf as a bacterial blight lesion in the segmentation, but Segformer and RSegformer do
not, as the rice blast onset area exhibits very similar colour symptoms to the background.
In the fourth row, it can be seen that the RSegformer model segmented the blurred edges of
brown spot disease very well, in line with the conclusion obtained in the ablation study
that CA attention significantly improved the fine segmentation of margins. In the fifth
row, we find that Deeplabv3+ and Segformer both show missed detection of fine disease
spots. RSegformer can segment them accurately, which has important implications for the
timely monitoring and early warning of rice diseases. The sixth row uses the synthetic
leaf of rice bacterial blight disease and its inference results after data augmentation,
and the segmented area of RSegformer was closer to the labelled image.
Figure 8. Example inference results for the validation sets on the three models. The first column
represents the real images, the second column represents the real labels, the third column shows
the inference results for the DeepLabv3+ network model, the fourth column shows the inference
results for the Segformer network model, and the fifth column shows the inference results for the
RSegformer network model.
Figure 9. Confusion matrices for the severity classes of different rice diseases under different
network models. The first row is the confusion matrix of the three models for the estimation of the
severity of rice bacterial blight disease, the second row is the confusion matrix of the three models
for the estimation of the severity of rice blast disease and the third row is the confusion matrix of
the three models for the estimation of the severity of brown spot disease.
a larger lesion area than the marked area. Brown spot was mostly underestimated at level 2,
probably due to the clustering of spots, which made it difficult to split the predicted spots.
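The severity grading that these confusion matrices evaluate reduces to computing the lesion-to-leaf area ratio from the predicted mask and bucketing it into levels. The sketch below illustrates that pipeline; the cut-off values are hypothetical placeholders of our own, not the paper's published severity criteria, and the function name is ours.

```python
import numpy as np

def severity_level(mask, lesion_id, leaf_id, cutoffs=(0.05, 0.15, 0.30)):
    """Grade disease severity from a segmentation mask.

    Severity is the lesion-to-leaf area ratio bucketed by `cutoffs`
    (hypothetical thresholds, NOT the paper's criteria). Level 0 means
    healthy; higher levels mean a larger infected fraction of the leaf.
    """
    lesion = np.count_nonzero(mask == lesion_id)
    leaf = np.count_nonzero(mask == leaf_id) + lesion  # whole-leaf area
    ratio = lesion / leaf if leaf else 0.0
    level = sum(ratio > c for c in cutoffs)            # count thresholds passed
    return level, ratio
```

Counting the number of distinct spots (the other criterion the paper uses) would additionally require connected-component labelling of the lesion pixels, e.g. with `scipy.ndimage.label`; clustered spots that merge into one component are exactly the failure mode described above for level-2 brown spot.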
5. Conclusions
In this study, a semantic segmentation method based on Segformer with a shunted
transformer encoder, the CA attention mechanism and the CARAFE up-sampling operator was
proposed to identify and segment rice bacterial blight, rice blast and brown spot;
segmentation accuracy was improved with the RLDCP data augmentation method; and the area
and number of spots were calculated from the segmentation results and graded against the
disease severity classification criteria. The results show that: (1) the proposed RLDCP
data augmentation method outperforms traditional data augmentation methods in
generalisation and significantly improves the performance of the semantic segmentation
model without the additional training costs of GAN-based models. (2) The RSegformer
semantic segmentation model achieves an MIoU of 85.38%, exceeding DeepLabV3+ by 1.91% and
the Segformer-B1 model by 1.43%, with only minor increases in parameter count and
computational effort. (3) The model classifies lesion severity more accurately under the
newly established severity criteria.
The semantic segmentation model proposed in this study achieves pixel-level classifi-
cations of different rice diseases and provides a reference for related plant disease detection
studies. In the future, we suggest designing semantic segmentation models with higher
accuracy and smaller size, expanding the dataset of different disease types and disease
stages of rice, producing more fine-grained semantic segmentation labels and adopting
more concise and efficient data enhancement methods to achieve rice disease segmentation
and severity ranking.
Author Contributions: Conceptualization, Z.L., P.C. and L.S.; Data curation, P.C. and L.S.; Formal
analysis, Z.L., M.W. and L.Z.; Funding acquisition, Z.L. and Y.W.; Investigation, P.C. and L.S.;
Methodology, Z.L., P.C. and L.S.; Project administration, Z.L., M.W. and L.Z.; Resources, Z.L., P.C.
and L.S.; Software, L.S.; Supervision, Y.W. and J.M.; Validation, P.C. and L.S.; Visualization, P.C. and
L.S.; Writing—original draft, P.C. and L.S.; Writing—review & editing, Z.L., P.C. and L.S. All authors
have read and agreed to the published version of the manuscript.
Funding: This work was funded by the Research on intelligent monitoring and early warning
technology for major rice pests and diseases of the Sichuan Provincial Department of Science and
Technology (grant number 2022NSFSC0172) and the Research and application of key technologies for
intelligent spraying based on machine vision (key technology research project) of Sichuan Provincial
Department of Science and Technology (grant number 22ZDYF0095).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The three varieties of diseased rice leaf images and masks used
in this study are available at https://www.kaggle.com/datasets/slygirl/rice-leaf-disease-with-
segmentation-labels (accessed on 3 November 2022) and can be shared on request.
Acknowledgments: Thanks to Xueqin Jiang, Xingyu Jia and Boda Zhang for their advice throughout
the research process. Thanks to all the partners of AI Studio for their support.
Conflicts of Interest: The authors declare that they have no known conflicting financial interests or
personal relationships that could have appeared to influence the work reported in this paper.
References
1. Li, H. Research Progress on Acquisition and Processing of Rice Disease Images Based on Computer Vision Technology. J. Phys.
Conf. Ser. 2020, 1453, 012160. [CrossRef]
2. Prajapati, H.B.; Shah, J.P.; Dabhi, V.K. Detection and Classification of Rice Plant Diseases. Intell. Decis. Technol. 2017, 11, 357–373.
[CrossRef]
3. Li, D.; Wang, R.; Xie, C.; Liu, L.; Zhang, J.; Li, R.; Wang, F.; Zhou, M.; Liu, W. A Recognition Method for Rice Plant Diseases and
Pests Video Detection Based on Deep Convolutional Neural Network. Sensors 2020, 20, 578. [CrossRef] [PubMed]
4. Mathew, A.; Antony, A.; Mahadeshwar, Y.; Khan, T.; Kulkarni, A. Plant Disease Detection Using GLCM Feature Extractor and
Voting Classification Approach. Mater. Today Proc. 2022, 58, 407–415. [CrossRef]
5. Ali, H.; Lali, M.I.; Nawaz, M.Z.; Sharif, M.; Saleem, B.A. Symptom Based Automated Detection of Citrus Diseases Using Color
Histogram and Textural Descriptors. Comput. Electron. Agric. 2017, 138, 92–104. [CrossRef]
6. Hou, C.; Zhuang, J.; Tang, Y.; He, Y.; Miao, A.; Huang, H.; Luo, S. Recognition of Early Blight and Late Blight Diseases on Potato
Leaves Based on Graph Cut Segmentation. J. Agric. Food Res. 2021, 5, 100154. [CrossRef]
7. Pallathadka, H.; Ravipati, P.; Sekhar Sajja, G.; Phasinam, K.; Kassanuk, T.; Sanchez, D.T.; Prabhu, P. Application of Machine
Learning Techniques in Rice Leaf Disease Detection. Mater. Today Proc. 2022, 51, 2277–2280. [CrossRef]
8. Javidan, S.M.; Banakar, A.; Vakilian, K.A.; Ampatzidis, Y. Diagnosis of Grape Leaf Diseases Using Automatic K-Means Clustering
and Machine Learning. Smart Agric. Technol. 2023, 3, 100081. [CrossRef]
9. Harakannanavar, S.S.; Rudagi, J.M.; Puranikmath, V.I.; Siddiqua, A.; Pramodhini, R. Plant Leaf Disease Detection Using Computer
Vision and Machine Learning Algorithms. Glob. Transit. Proc. 2022, 3, 305–310. [CrossRef]
10. Ahmad, A.; Saraswat, D.; El Gamal, A. A Survey on Using Deep Learning Techniques for Plant Disease Diagnosis and Recom-
mendations for Development of Appropriate Tools. Smart Agric. Technol. 2023, 3, 100083. [CrossRef]
11. Liu, J.; Wang, X. Plant Diseases and Pests Detection Based on Deep Learning: A Review. Plant Methods 2021, 17, 22. [CrossRef]
12. Bhagat, S.; Kokare, M.; Haswani, V.; Hambarde, P.; Kamble, R. Eff-UNet++: A Novel Architecture for Plant Leaf Segmentation
and Counting. Ecol. Inform. 2022, 68, 101583. [CrossRef]
13. Hu, G.; Fang, M. Using a Multi-Convolutional Neural Network to Automatically Identify Small-Sample Tea Leaf Diseases.
Sustain. Comput. Inform. Syst. 2022, 35, 100696. [CrossRef]
14. Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple Copy-Paste Is a Strong Data
Augmentation Method for Instance Segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2917–2927. [CrossRef]
15. Gonçalves, J.P.; Pinto, F.A.C.; Queiroz, D.M.; Villar, F.M.M.; Barbedo, J.G.A.; del Ponte, E.M. Deep Learning Architectures for
Semantic Segmentation and Automatic Estimation of Severity of Foliar Symptoms Caused by Diseases or Pests. Biosyst. Eng.
2021, 210, 129–142. [CrossRef]
16. Wang, Y.; Wang, H.; Peng, Z. Rice Diseases Detection and Classification Using Attention Based Neural Network and Bayesian
Optimization. Expert Syst. Appl. 2021, 178, 114770. [CrossRef]
17. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Instance Segmentation of Apple Flowers Using the Improved Mask R–CNN Model.
Biosyst. Eng. 2020, 193, 264–278. [CrossRef]
18. Li, H.; Li, C.; Li, G.; Chen, L. A Real-Time Table Grape Detection Method Based on Improved YOLOv4-Tiny Network in Complex
Background. Biosyst. Eng. 2021, 212, 347–359. [CrossRef]
19. Ji, M.; Wu, Z. Automatic Detection and Severity Analysis of Grape Black Measles Disease Based on Deep Learning and Fuzzy
Logic. Comput. Electron. Agric. 2022, 193, 106718. [CrossRef]
20. Liang, Q.; Xiang, S.; Hu, Y.; Coppola, G.; Zhang, D.; Sun, W. PD2SE-Net: Computer-Assisted Plant Disease Diagnosis and Severity
Estimation Network. Comput. Electron. Agric. 2019, 157, 518–529. [CrossRef]
21. Prabhakar, M.; Purushothaman, R.; Awasthi, D.P. Deep Learning Based Assessment of Disease Severity for Early Blight in Tomato
Crop. Multimed. Tools Appl. 2020, 79, 28773–28784. [CrossRef]
22. Esgario, J.G.M.; Krohling, R.A.; Ventura, J.A. Deep Learning for Classification and Severity Estimation of Coffee Leaf Biotic Stress.
Comput. Electron. Agric. 2020, 169, 105162. [CrossRef]
23. Chen, S.; Zhang, K.; Zhao, Y.; Sun, Y.; Ban, W.; Chen, Y.; Zhuang, H.; Zhang, X.; Liu, J.; Yang, T. An Approach for Rice Bacterial
Leaf Streak Disease Segmentation and Disease Severity Estimation. Agriculture 2021, 11, 420. [CrossRef]
24. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A Cucumber Leaf Disease Severity Classification Method Based on the Fusion of
DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373. [CrossRef]
25. Sethy, P.K.; Barpanda, N.K.; Rath, A.K.; Behera, S.K. Deep Feature Based Rice Leaf Disease Identification Using Support Vector
Machine. Comput. Electron. Agric. 2020, 175, 105527. [CrossRef]
26. Leaf Rice Disease | Kaggle. Available online: https://www.kaggle.com/datasets/tedisetiady/leaf-rice-disease-indonesia
(accessed on 7 August 2022).
27. Hao, Y.; Liu, Y.; Wu, Z.; Han, L.; Chen, Y.; Chen, G.; Chu, L.; Tang, S.; Yu, Z.; Chen, Z.; et al. EdgeFlow: Achieving Practical
Interactive Segmentation with Edge-Guided Flow. In Proceedings of the IEEE International Conference on Computer Vision 2021,
Montreal, QC, Canada, 10–17 October 2021; pp. 1551–1560. [CrossRef]
28. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern. Anal. Mach.
Intell. 2014, 39, 640–651. [CrossRef]
29. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the
2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [CrossRef]
30. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd
International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May
2015. [CrossRef]
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
32. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA,
18–23 June 2018; pp. 4510–4520. [CrossRef]
33. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the 4th International Conference
on Learning Representations, ICLR 2016—Conference Track Proceedings, San Juan, Puerto Rico, 2–4 May 2016. [CrossRef]
34. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic
Image Segmentation. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11211, pp. 833–851.
[CrossRef]
35. The Cross-Attention Mechanism | Download Scientific Diagram. Available online: https://www.researchgate.net/figure/The-
cross-attention-mechanism_fig2_350779666 (accessed on 24 October 2022).
36. Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. PSANet: Point-Wise Spatial Attention Network for Scene Parsing. In
Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11213, pp. 270–286. [CrossRef]
37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need.
Adv. Neural Inf. Process. Syst. 2017, 2017, 5999–6009. [CrossRef]
38. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic
Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 15, 12077–12090. [CrossRef]
39. Ren, S.; Zhou, D.; He, S.; Feng, J.; Wang, X. Shunted Self-Attention via Multi-Scale Token Aggregation. arXiv 2021, arXiv:2111.15193.
[CrossRef]
40. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-Aware ReAssembly of FEatures. In Proceedings of the
IEEE International Conference on Computer Vision 2019, Seoul, Korea, 27–28 October 2019; pp. 3007–3016. [CrossRef]
41. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
[CrossRef]
42. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 30th IEEE Conference on Computer
Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239. [CrossRef]
43. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of
the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June
2019; pp. 5686–5696. [CrossRef]
44. Yuan, Y.; Chen, X.; Chen, X.; Wang, J. Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation.
arXiv 2019, arXiv:1909.11065. [CrossRef]
45. Open-Mmlab/Mmsegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. Available online: https:
//github.com/open-mmlab/mmsegmentation (accessed on 7 August 2022).