Article

GVC-YOLO: A Lightweight Real-Time Detection Method for Cotton Aphid-Damaged Leaves Based on Edge Computing

1
College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
2
Institute of Plant Protection, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China
3
Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
4
Division of Pest Monitoring and Forecasting, National Agricultural Technology Service Center, Beijing 100125, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 3046; https://doi.org/10.3390/rs16163046
Submission received: 20 June 2024 / Revised: 15 August 2024 / Accepted: 15 August 2024 / Published: 19 August 2024
(This article belongs to the Special Issue Plant Disease Detection and Recognition Using Remotely Sensed Data)

Abstract

Cotton aphids (Aphis gossypii Glover) pose a significant threat to cotton growth, exerting detrimental effects on both yield and quality. Conventional methods for pest and disease surveillance in agricultural settings lack real-time capability. The use of edge computing devices for real-time processing of cotton aphid-damaged leaves captured by field cameras holds significant practical research value for large-scale disease and pest control measures. Mainstream detection models are generally large, making it challenging to achieve real-time detection on edge computing devices with limited resources. In response to these challenges, we propose GVC-YOLO, a real-time detection method for cotton aphid-damaged leaves based on edge computing. Building upon YOLOv8n, lightweight GSConv and VoVGSCSP modules are employed to reconstruct the neck and backbone networks, thereby reducing model complexity while enhancing multiscale feature fusion. In the backbone network, we integrate the coordinate attention (CA) mechanism and the SimSPPF network to increase the model’s ability to extract features of cotton aphid-damaged leaves, compensating for the accuracy loss introduced by the lightweight design. The experimental results demonstrate that the size of the GVC-YOLO model is only 5.4 MB, a decrease of 14.3% compared with the baseline network, with a reduction of 16.7% in the number of parameters and 17.1% in floating-point operations (FLOPs). The mAP@0.5 and mAP@0.5:0.95 reach 97.9% and 90.3%, respectively. The GVC-YOLO model is optimized and accelerated by TensorRT and then deployed onto the embedded edge computing device Jetson Xavier NX for detecting cotton aphid damage in video captured from the camera. Under FP16 quantization, the detection speed reaches 48 frames per second (FPS). In summary, the proposed GVC-YOLO model demonstrates good detection accuracy and speed, and its performance in detecting cotton aphid damage in edge computing scenarios meets practical application needs. This research provides a convenient and effective intelligent method for the large-scale detection and precise control of pests in cotton fields.

1. Introduction

Cotton is an important economic crop and raw material for the textile industry and plays a crucial role in the national economy and people’s livelihoods [1]. However, cotton often suffers from various diseases and pests during its growth, with pest infestation being one of the most prominent factors affecting the high yield and quality of cotton in China, directly impacting the economic income of cotton farmers [2]. The cotton aphid is an important pest during the seedling stage that is widely distributed in cotton areas throughout China and easily spreads once it occurs. Cotton aphids insert their piercing-sucking mouthparts into the underside or tender parts of cotton leaves, sucking sap. The infested leaves curl upward, forming an arc, with aphid excretions visible on the leaf surface, often promoting mold growth. Infected cotton plants exhibit stunted growth, relatively small leaves, and reduced leaf numbers [3].
Xinjiang, owing to its unique geographical location, is the largest cotton planting and processing base in China [1]. Its large and concentrated planting areas support continuous, large-scale agricultural machinery operations and are therefore well suited to mechanized, intelligent production. Rapid and accurate identification of the severity of cotton aphid damage in the field, followed directly by further control measures based on the identified results, such as precise variable-rate pesticide spraying, improves efficiency, prevents the further spread of pests, and thus enhances the quality and yield of cotton. Currently, the discovery of cotton aphid damage relies primarily on manual observation, which is not only time-consuming and laborious but also prone to inaccuracies in human judgment, leading to the further spread of pest problems and revealing serious drawbacks in large-scale cotton pest control and management.
Traditional image analysis techniques can no longer meet the speed and accuracy demands of modern agricultural production, and with the continuous advancement of computer vision and deep learning, deep learning-based pest and disease detection has gradually become mainstream [4]. These models learn features from large amounts of data and can identify crop pests and diseases efficiently with little human involvement. For example, Alves et al. [5] introduced a field image cotton pest classification method based on deep residual networks. They constructed a dataset containing the 15 most common cotton pests and harmless categories and proposed a specific residual deep learning model, ResNet34*, which automatically classifies major pests from given images. A performance evaluation against other convolutional neural networks revealed that the proposed model achieved the highest accuracy, with an F-score of 0.98. Bao et al. [6] developed a lightweight CA_DenseNet_BC_40 model based on DenseNet for classifying leaf damage caused by cotton aphids. They introduced coordinate attention mechanisms into each submodule of the dense block to better extract features of cotton aphid-induced symptoms, achieving a high classification accuracy of 97.3%. With the rapid development of drone technology, there has been an increasing amount of academic research using drone-based remote sensing technology for crop pest and disease detection. This approach allows for precise and rapid monitoring of pests and diseases [7]. Li et al. [8] utilized high-resolution multispectral images captured by drones. Using data analysis methods such as partial least squares regression (PLSR) and neural network models, they developed a Verticillium wilt monitoring model for cotton based on drone remote sensing images. On this basis, they integrated the use of agricultural drones to generate prescription maps for spraying against Verticillium wilt, thereby achieving an integrated approach from disease detection to pest control.
Object detection is an important direction where computer vision and deep learning intersect [9], which not only classifies objects in images but also accurately identifies their positions. It has extensive applications in industrial manufacturing and smart agriculture, with representative algorithms including the one-stage YOLO [10] series and the two-stage Faster R-CNN [11]. Among them, the YOLO series models, known for their rapid and accurate performance, are widely used in agricultural detection. Some scholars [12] have combined drone multispectral imaging with object detection algorithms to perform seedling detection and counting in cotton crops. By utilizing multispectral images collected by drones, models suitable for counting throughout the cotton seedling stage were established via three deep learning algorithms: YOLOv5, YOLOv7, and CenterNet. The results indicated that YOLOv7 performed better in terms of detection and counting. Li et al. [13] integrated three new modules, SCFEM, DSCFEM, and SPPM, into the YOLOv5 model to extract image features effectively. They applied this model to the detection of ten types of jute diseases and pests, achieving an average accuracy of 96.63%, demonstrating the broad prospects of machine learning and deep learning in precise agricultural disease prevention. Xu et al. [14] collected images of cotton seedling aphid infections via smartphones and compiled them into an image dataset. They constructed three different object detection models, Faster R-CNN, YOLOv5, and SSD, with YOLOv5 achieving the highest mAP (95.7%) and FPS (61.73) and successfully deploying it on a mobile Android phone. Many researchers are dedicated to developing lightweight models that can be more easily deployed on mobile devices, thereby facilitating the effective application of pest and disease detection models in real-world scenarios. Xu et al. [15] developed a lightweight SSV2-YOLO model based on YOLOv5s. By reconstructing the backbone network via Stem and ShuffleNet V2 and adjusting the neck network width, it was used for sugarcane aphid detection. The feature level, data augmentation method, and loss function were further reconstructed to improve the detection performance for small, dense, and overlapping targets. The improved method is extremely lightweight and can be used for real-time detection on mobile devices. Lin et al. [16] developed a system for automatically detecting peanut leaf diseases. The system is based on the YOLOv8n model and combines the advanced FasterNeXt and DSConv modules and the generalized intersection over union loss function. It aims to improve the lightweight nature of the model so that it can adapt to edge computing devices and realize real-time detection of peanut leaf diseases.
In summary, deep learning has demonstrated good performance in crop pest and disease detection because of its powerful feature extraction and autonomous learning capabilities [17]. Many scholars have achieved high real-time detection performance on standard platforms. However, the large size of these computing platforms makes them unsuitable for the detection of pests and diseases in field farm environments. Furthermore, most studies in this field focus on detecting a single image at a time, which has limited detection coverage and low efficiency. This approach is inadequate for the management and control of pests and diseases in large-scale cotton fields in Xinjiang. We aim to use edge computing devices with low power consumption and small sizes for real-time detection of pests and diseases. Real-time detection of video data can display the occurrence of pests and diseases in a larger range and improve detection efficiency. Moreover, the deployment of pest and disease detection models in edge computing scenarios can be combined with pesticide application tractors and drones. By placing the computing unit on the edge, large-area pest and disease detection can be achieved without the need for long-distance data transmission, greatly reducing labor and improving efficiency, thereby achieving precise pesticide spraying, which is more valuable for practical research. However, the computing resources of commonly used edge devices remain significantly limited [18]. Existing general models have numerous network parameters, complex calculations, and high memory consumption, making real-time detection challenging and unable to improve the detection efficiency for pests and diseases in practical applications [19]. Therefore, developing a cotton aphid damage detection model tailored for actual cotton field scenarios, characterized by low computational complexity, effective detection performance, and ease of deployment, is essential.
In response to the aforementioned challenges, this paper proposes an efficient cotton aphid damage detection model designed for real-time deployment on drones or tractors in cotton fields. We build upon the YOLOv8n model by integrating a module inspired by the Slim-Neck design, which reduces computational overhead to enhance detection speed at the edge. Additionally, an attention mechanism is incorporated to ensure high detection accuracy. The resulting model is named GVC-YOLO. The GVC-YOLO model is optimized and accelerated via TensorRT, deployed on the edge computing device Jetson Xavier NX, and connected to cameras to obtain real-time field videos for timely inference and detection of cotton aphid damage. This research can be combined with IoT agricultural machinery to achieve integrated intelligent pest and disease recognition and precise variable pesticide spraying, providing a convenient and effective new method for large-scale pest and disease management in the current cotton industry.

2. Materials and Methods

2.1. Data Acquisition

The image data for this experiment were collected at the Korla Experimental Station of the Institute of Plant Protection, Chinese Academy of Agricultural Sciences (41°44′59″N, 85°48′30″E). This region has a typical temperate continental climate, with abundant sunlight throughout the year, large temperature differences between day and night, and low precipitation, making it suitable for growing high-quality cotton. The cotton varieties used were ‘Zhongmiansuo49’ and ‘Xinluzhong66’, which were sown in mid-to-late April in mulch cultivation mode with drip irrigation under the film. Areas of the cotton field where cotton aphid infestation was severe were selected for image collection. During the experiment, no pesticides were applied to inhibit the growth of the cotton aphids; infestations occurred naturally without any artificial intervention.
The data were collected from late June to mid-July 2019 and from early June to early July 2022, when cotton aphid infestation is most serious and pesticide control is most critical. All image data were obtained under natural conditions to enrich the diversity of the image dataset, including varying light intensities, cotton leaf obstructions, overlaps, small targets, and complex backgrounds. Data acquisition was conducted with HUAWEI Nova, iPhone 8 Plus, and Mi Note 3 smartphones using overhead vertical shooting. The researchers stood next to the cotton plants with a smartphone in hand and photographed the cotton seedlings vertically from a height of 1.2–1.5 m, holding the phone parallel to the ground so that the camera pointed straight down at 90 degrees to the ground. The image resolution was 4030 × 3024 pixels. Additionally, smartphones were moved parallel to the rows to record video data with a resolution of 1920 × 1080 pixels at a frame rate of 30 FPS.

2.2. Data Preprocessing

The collected raw images were manually screened to remove ineffective images, such as those that were blurry or unclear. For video files, key frames were extracted. Because the raw images were large and contained redundant information, such as soil, that could slow down the training of deep learning models, the images were uniformly cropped to 3000 × 3000 pixels to remove irrelevant content. Additionally, because high-definition smartphones were used for data collection, the original image resolutions were large, whereas lower-resolution industrial cameras are common in practical applications; models trained on high-resolution images may therefore not perform optimally in real-world scenarios. Accordingly, the image resolutions were further uniformly reduced to 1024 × 1024 pixels.
Using the annotation tool LabelImg, we conducted graded annotations of cotton aphid damage to cotton leaf canopy regions and stored the annotated data in the PASCAL VOC format. The grading of cotton aphid damage in this study followed both expert experience and the national grading standard (GB/T 15799-2011) [20], as outlined in Table 1. Examples of acquired images for this experiment are shown in Figure 1. Deep learning requires learning and extracting features from a large amount of labeled data. To enhance the model’s generalizability and robustness, optimize its learning from the data, and avoid overfitting, four data augmentation methods, namely horizontal flipping, vertical flipping, adding Gaussian noise, and adjusting the brightness of the original images, were applied to the 3051 annotated images [21]. The expanded dataset contains a total of 16,950 images, which were randomly divided into training, validation, and testing sets at a ratio of 8:1:1. The basic information of the cotton aphid damage dataset is presented in Table 2.
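As a minimal illustration of the preprocessing and augmentation steps described above, the following Python sketch crops and downscales an image and then generates the four augmented variants (horizontal flip, vertical flip, Gaussian noise, brightness adjustment). The center-crop placement, noise level, and brightness offset are illustrative assumptions, and the corresponding PASCAL VOC bounding boxes must be transformed consistently (e.g., mirrored for flips), which is omitted here.

```python
import cv2
import numpy as np

def augment(image):
    """Generate the four augmented variants used to expand the dataset:
    horizontal flip, vertical flip, Gaussian noise, and brightness change."""
    h_flip = cv2.flip(image, 1)                       # horizontal flip
    v_flip = cv2.flip(image, 0)                       # vertical flip
    noisy = np.clip(image.astype(np.float32) +
                    np.random.normal(0, 15, image.shape), 0, 255).astype(np.uint8)
    bright = cv2.convertScaleAbs(image, alpha=1.0, beta=40)  # shift brightness
    return [h_flip, v_flip, noisy, bright]

# Crop a 3000 x 3000 region (center crop assumed) and downscale to 1024 x 1024
img = cv2.imread("cotton_leaf.jpg")
h, w = img.shape[:2]
top, left = max(0, (h - 3000) // 2), max(0, (w - 3000) // 2)
img = cv2.resize(img[top:top + 3000, left:left + 3000], (1024, 1024))
variants = augment(img)
```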

2.3. YOLOv8 Object Detection Model

YOLOv8 is a state-of-the-art (SOTA) model in the field of object detection that integrates and improves upon previous generations of YOLO models. As a one-stage object detection network, it offers unparalleled performance in terms of flexibility and accuracy; its high performance and low latency enable it to accomplish various computer vision tasks quickly. To meet the practical needs of different specific scenarios, YOLOv8 provides a total of five models at different scales: N, S, M, L, and X. The depth of the network and the number of parameters increase progressively, resulting in improved accuracy of detection results, albeit with a corresponding decrease in detection speed [22].
The network structure of YOLOv8 consists of four main parts: the input, backbone, neck, and head. The backbone network comprises standard convolution modules and C2f and SPPF modules. This section still adopts the CSPDarknet structure from YOLOv5, replacing the C3 module with a more gradient-rich and lightweight C2f module. This step of the process effectively extracts features from input images at different levels to capture crucial information. The neck section uses a PAN-FPN structure, combining the advantages of PANet and FPN to fuse multiscale features from different levels more effectively [23]. The head section abandons the anchor-based approach used in previous generations of YOLO structures, foregoing the use of predefined anchor boxes. Instead, it adopts the anchor-free method to directly learn and predict the position and size information of targets from image features without relying on predefined reference boxes. During output, the Decoupled-Head approach separates the localization and classification tasks of object detection into two independent branches. The first branch performs regression tasks to output prediction boxes, whereas the other branch handles classification tasks to output the probabilities of the target classes. This decoupled design avoids mutual interference between tasks, thus preventing inaccurate detection results.
BCEL = -\sum_{i=1}^{N} \left[ y_i \ln \sigma(x_i) + (1 - y_i) \ln\left(1 - \sigma(x_i)\right) \right]    (1)
DFL(S_i, S_{i+1}) = -\left[ (y_{i+1} - y)\log S_i + (y - y_i)\log S_{i+1} \right]    (2)
The classification loss of YOLOv8 employs binary cross-entropy loss (BCEL) to measure the model’s accuracy in classifying targets, as shown in Equation (1), where yi represents the label and σ(xi) represents the model’s predicted result. The bounding box regression loss adopts an improved form of the positional loss function, CIoU loss, and DFL. Distribution focal loss (DFL) attempts to learn and optimize the position of bounding boxes in a more direct manner; this allows the network to focus more quickly on the distribution of neighboring regions near the target position. In Equation (2), y represents the ground truth label, yi and yi+1 are two adjacent quality labels to y, and Si and Si+1 are the sigmoid outputs corresponding to the quality labels. The weighted sum of the three losses mentioned above constitutes the final loss function.
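For concreteness, the following PyTorch sketch shows how the two loss terms in Equations (1) and (2) are typically computed; it is a simplified illustration rather than the exact YOLOv8 implementation, and the tensor shapes and reduction modes are assumptions.

```python
import torch
import torch.nn.functional as F

def bce_classification_loss(logits, targets):
    """Equation (1): binary cross-entropy over class logits; sigma(x_i) is the
    sigmoid of the raw logit x_i and y_i is the (soft) label."""
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="sum")

def distribution_focal_loss(pred_dist, target):
    """Equation (2): `pred_dist` holds per-bin logits of the discretized box
    offset, shape (N, bins); `target` is the continuous offset y in bin units
    (assumed to lie below bins - 1)."""
    y_left = target.long()                 # integer bin y_i to the left of y
    y_right = y_left + 1                   # adjacent bin y_{i+1}
    w_left = y_right.float() - target      # weight (y_{i+1} - y)
    w_right = target - y_left.float()      # weight (y - y_i)
    loss = (F.cross_entropy(pred_dist, y_left, reduction="none") * w_left
            + F.cross_entropy(pred_dist, y_right, reduction="none") * w_right)
    return loss.mean()
```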

2.4. GVC-YOLO Model Construction

In this study, to achieve real-time and accurate detection of cotton aphid damage, the YOLOv8n network, which is the fastest in speed and most suitable for deployment on edge devices, was chosen as the base network. Lightweight reconstructions were made for this network. The standard convolutions in the backbone network and neck section of the original structure were replaced with lightweight GSConv to reduce the computational load of the model. Additionally, the VoVGSCSP structure was introduced in the neck section to further reduce the model’s weight while also improving the fusion of features related to cotton aphid damage. Furthermore, a CA mechanism was added after the C2f module in the backbone network to increase the model’s ability to extract features of cotton aphid damage. The SPPF was replaced with the more computationally efficient SimSPPF network. Figure 2 illustrates the structure of the improved lightweight GVC-YOLO network model.
The convolution operations can efficiently conduct feature extraction and processing through local perception and parameter-sharing mechanisms; they are an indispensable part of computer vision and deep learning. However, standard convolution accumulates many parameters when dealing with high-resolution or high-channel data, increasing computational costs and memory usage.
The lightweight GSConv module [24] combines depthwise separable convolution and standard convolution, aiming to retain as many hidden connections between channels as possible while reducing model complexity and maintaining good accuracy. The structure of the GSConv module is illustrated in Figure 3. Suppose that the number of input channels is C1 and the number of output channels is C2. First, a standard convolution module reduces the number of channels to C2/2; a depthwise separable convolution is then applied, leaving the number of channels unchanged. The two resulting feature maps are concatenated, and a channel shuffle operation is performed to enhance feature learning across the two branches. The mathematical expression of GSConv is given in Equation (3). The computational cost of GSConv is approximately 50% that of standard convolution, while its learning capability is comparable; this advantage is more evident for lightweight detectors.
F_{out} = \mathrm{Shuffle}\left( \mathrm{Cat}\left( f_{\mathrm{Conv}}(X_{in}),\; f_{\mathrm{DWC}}\left( f_{\mathrm{Conv}}(X_{in}) \right) \right) \right)    (3)
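A minimal PyTorch sketch of a GSConv block following Equation (3) is shown below. The kernel sizes, activation function, and two-group channel shuffle are assumptions based on the Slim-Neck reference design rather than the exact GVC-YOLO implementation.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Standard convolution block: Conv2d + BatchNorm + SiLU."""
    def __init__(self, c1, c2, k=1, s=1, g=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GSConv(nn.Module):
    """GSConv (Equation (3)): standard conv to C2/2 channels, a depthwise conv,
    concatenation of the two branches, then a channel shuffle."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = ConvBNAct(c1, c_, k, s)
        self.cv2 = ConvBNAct(c_, c_, 5, 1, g=c_)   # depthwise convolution

    def forward(self, x):
        x1 = self.cv1(x)
        y = torch.cat((x1, self.cv2(x1)), dim=1)   # (N, C2, H, W)
        n, c, h, w = y.shape                       # two-group channel shuffle
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```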
In the feature fusion stage, drawing on the residual idea of ResNet [25], we further introduce the GS bottleneck [24] built on GSConv. Then, following the Slim-Neck design concept, we adopt the one-shot aggregation method to design the cross-stage partial network (CSPNet) module VoVGSCSP [24]. The structure is shown in Figure 4, and the calculations are given in Equations (4) and (5). This feature fusion module effectively balances model accuracy and speed. Replacing the C2f module in the neck with the VoVGSCSP module enables the model to maintain sufficient accuracy and a high feature reuse rate while reducing the computational and structural complexity of the network. Models based on these improvements are easier to deploy on edge computing devices.
GSB_{out} = f_{GSC}\left( f_{GSC}(X_{in}) \right) + f_{Conv}(X_{in})    (4)
VoVGSCSP_{out} = f_{Conv}\left( \mathrm{Concat}\left( GSB_{out},\; f_{Conv}(X_{in}) \right) \right)    (5)
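Continuing the sketch above (and reusing the ConvBNAct and GSConv classes defined there), the GS bottleneck and VoVGSCSP modules of Equations (4) and (5) can be approximated as follows; the channel widths and the number of stacked bottlenecks are assumptions made for illustration.

```python
import torch
import torch.nn as nn  # ConvBNAct and GSConv come from the previous sketch

class GSBottleneck(nn.Module):
    """GS bottleneck (Equation (4)): two stacked GSConv layers plus a
    1x1 convolutional shortcut."""
    def __init__(self, c1, c2):
        super().__init__()
        c_ = c2 // 2
        self.gsc = nn.Sequential(GSConv(c1, c_, 1, 1), GSConv(c_, c2, 3, 1))
        self.shortcut = ConvBNAct(c1, c2, 1, 1)

    def forward(self, x):
        return self.gsc(x) + self.shortcut(x)

class VoVGSCSP(nn.Module):
    """VoVGSCSP (Equation (5)): one-shot aggregation of a GS-bottleneck branch
    and a plain 1x1 convolution branch, fused by a final convolution."""
    def __init__(self, c1, c2, n=1):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = ConvBNAct(c1, c_, 1, 1)      # entry conv of the bottleneck branch
        self.cv2 = ConvBNAct(c1, c_, 1, 1)      # shortcut branch
        self.gsb = nn.Sequential(*(GSBottleneck(c_, c_) for _ in range(n)))
        self.cv3 = ConvBNAct(2 * c_, c2, 1, 1)  # fuse the concatenated branches

    def forward(self, x):
        return self.cv3(torch.cat((self.gsb(self.cv1(x)), self.cv2(x)), dim=1))
```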
The attention mechanism [26] was initially proposed in the field of natural language processing (NLP) and has since been widely applied in computer vision. This mechanism dynamically optimizes the allocation of importance to local regions of images, enabling models to focus more on key information relevant to the current task while reducing attention to less relevant areas, thereby enhancing the effectiveness of target detection.
Coordinate attention (CA) [27] is an attention mechanism used to enhance deep learning models’ understanding of the spatial structure of input data. It encodes positional information in both the horizontal and vertical directions into channel attention, allowing the model to focus on large-scale spatial relationships without introducing excessive computational overhead. As illustrated in Figure 5, the CA mechanism first performs global average pooling operations on the input feature map in both the width and height directions. The resulting two feature maps are subsequently stacked to form a new feature layer with a shape of [C, 1, H + W]. Subsequently, convolutional dimension reduction, batch normalization, and nonlinear activation functions are applied to enrich the feature representation. Then, the features of the feature map in the width and height directions are separated, and the number of channels is adjusted via convolution. The attention weights gh and gw in two directions are obtained via the sigmoid activation function. Finally, the multiplicative weighted calculation is performed with the original feature map, and the final feature map with the attention weight is output. The multiplication weighting formula is shown in Equation (6).
y_c(i, j) = x_c(i, j) \times g_c^{h}(i) \times g_c^{w}(j)    (6)
The CA mechanism, by embedding positional information, effectively enhances the model’s ability to localize cotton aphid damage to leaves in complex scenarios, thereby improving the accuracy of cotton aphid damage detection. Moreover, the CA mechanism is highly flexible, avoiding significant computational overhead and not increasing the complexity of the model.
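The following is a minimal sketch of the coordinate attention block described above, assuming the standard formulation of Hou et al. [27]: directional average pooling, a shared 1×1 convolution with batch normalization and a nonlinear activation, and sigmoid-gated reweighting as in Equation (6). The reduction ratio and activation choice are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pool along H and W separately, encode with a shared
    1x1 conv, then reweight the input by directional gates g^h and g^w."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                         # (N, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)     # (N, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(y_h))                     # (N, C, H, 1)
        g_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2))) # (N, C, 1, W)
        return x * g_h * g_w                                      # Equation (6)
```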
Simplified spatial pyramid pooling-fast (SimSPPF) [28], proposed in YOLOv6, is an efficient improved version of SPPF. Its structure, shown in Figure 6, applies three 5 × 5 max pooling operations in series and replaces the SiLU activation function of SPPF with ReLU, thereby accelerating network convergence and improving operational efficiency.
This network does not change the size or channels of the input feature map; its main function is to extract and fuse high-level features. During the fusion process, multiple max pooling operations are applied, greatly enriching the expressive power of the obtained feature maps. This allows for the extraction of semantic features from as many layers as possible, effectively avoiding the loss of local information in cotton aphid damage images and improving the efficiency and accuracy of feature extraction.
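A minimal sketch of a SimSPPF block matching this description is given below; the input/output channel split follows the usual SPPF convention, and the 1×1 convolution widths are assumptions.

```python
import torch
import torch.nn as nn

class SimSPPF(nn.Module):
    """SimSPPF: three serial 5x5 max-pooling layers whose outputs are
    concatenated with the input branch; ReLU replaces SiLU."""
    def __init__(self, c1, c2, k=5):
        super().__init__()
        c_ = c1 // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c_, 1, bias=False),
                                 nn.BatchNorm2d(c_), nn.ReLU(inplace=True))
        self.cv2 = nn.Sequential(nn.Conv2d(c_ * 4, c2, 1, bias=False),
                                 nn.BatchNorm2d(c2), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))  # fuse multiscale context
```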

2.5. Experimental Platform

The experiments in this paper were conducted on a server platform with Ubuntu 20.04 LTS as the operating system. The hardware and software configuration were as follows: an Xeon(R) Gold 5218 CPU @ 2.30 GHz processor, 252 GB of RAM, and an NVIDIA A100 GPU with 40 GB of VRAM. The deep learning framework used was PyTorch 1.10 with CUDA 11.3, and GPU acceleration was performed via cuDNN 8.0.5. Multiple comparative experiments and ablation studies were conducted. The hyperparameters were kept consistent for each experiment to ensure the validity and reliability of the experiments. Detailed information is provided in Table 3.

2.6. Edge Computing Equipment

Traditional desktop PCs are not suitable for mounting on mobile mechanical equipment in the field due to issues such as their large size and high power consumption. Moreover, traditional cloud computing paradigms have drawbacks such as unstable video data transmission and slow speed [29]. Edge computing methods can effectively address these issues to achieve real-time identification of cotton aphid damage in the field. The Nvidia Jetson Xavier NX embedded edge computing device was selected for model deployment, as shown in Figure 7, with the device parameters detailed in Table 4.
The Nvidia Jetson Xavier NX is compact, powerful, and equipped with rich I/O interfaces. It supports a range of mainstream AI frameworks and algorithms, making it easy for developers to integrate models into the device for tasks such as image recognition, segmentation, and object detection in various computer vision applications [30]. The device’s software environment consists of the Ubuntu 18.04 operating system, Python 3.6, CUDA 10.2, and TensorRT 8.2 for accelerating PyTorch model inference. OpenCV 4.1.1 is used with support for GPU acceleration provided by CUDA.

2.7. Evaluation Indicators

In object detection algorithms, the performance of network models is evaluated via several metrics: precision (P), recall (R), and mean average precision (mAP). Additionally, when lightweight models are considered, parameters, floating-point operations (FLOPs), frames per second (FPS), and the size of the optimal model weight file are also important indicators for evaluating model performance [31]. The metrics mAP@0.5 and mAP@0.5:0.95 represent the mAP values when the IoU threshold is 0.5 and when the IoU ranges from 0.5 to 0.95 with a step size of 0.05, respectively. A higher mAP indicates a higher overall accuracy of the model. FPS refers to the number of frames processed per second, which is primarily used to measure the detection speed of the model at the edge. A higher FPS value indicates that the model can identify and locate targets more quickly. The formula for each indicator is shown below:
P = \frac{TP}{TP + FP}    (7)
R = \frac{TP}{TP + FN}    (8)
AP = \int_{0}^{1} P(R)\, dR    (9)
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i    (10)
FPS = \frac{N_p}{T_i}    (11)
where TP represents positive samples correctly predicted as positive samples, FP represents negative samples incorrectly predicted as positive samples, FN represents positive samples incorrectly predicted as negative samples, N represents the number of categories, Np represents the number of images, and Ti represents the detection time.
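As a small illustration of Equations (7)–(10), the sketch below computes precision, recall, and AP from detection counts and a precision-recall curve; the trapezoidal integration and per-class averaging are standard choices and are assumptions here, since the exact evaluation script is not specified in the text.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Equations (7) and (8): precision and recall from detection counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(recalls, precisions):
    """Equation (9): area under the precision-recall curve (trapezoidal rule)."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order],
                          np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """Equation (10): mean of the per-class AP values."""
    return float(np.mean(ap_per_class))

# Example: one class with a toy precision-recall curve
p, r = precision_recall(tp=90, fp=5, fn=10)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 0.9, 0.7])
```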

3. Results

3.1. GVC-YOLO Model Performance

Figure 8 shows the curves of the precision, recall, mAP@0.5, and mAP@0.5:0.95 evaluation indicators over training epochs for the baseline model YOLOv8n and the improved GVC-YOLO model. The graph clearly shows that the P and R values of both models exhibit significant fluctuations during the first 150 epochs. In the subsequent 50 epochs, the curves gradually stabilize, with the GVC-YOLO model consistently outperforming the baseline model. The final P and R values for the GVC-YOLO model are 96% and 93.2%, respectively. With respect to the mAP@0.5 and mAP@0.5:0.95 indicators, there is notable variability in the curves during the initial 75 epochs, followed by smoother increases in the subsequent 125 epochs, gradually approaching stability. Furthermore, the curves for the GVC-YOLO model consistently surpass those of the baseline model, with final mAP@0.5 and mAP@0.5:0.95 values reaching 97.9% and 90.3%, respectively. These improvements can be attributed to the optimization of the original network structure, achieved through the introduction of the GSConv module for better feature extraction and the incorporation of the CA mechanism, enhancing the model’s ability to extract and localize cotton aphid damage features in images.
It is important to prioritize lightweighting the model while maintaining accuracy to facilitate deployment of the model on resource-constrained edge devices, thereby improving the detection speed at the edge and achieving real-time detection. As shown in Figure 9, a comparison was made between GVC-YOLO and YOLOv8n in terms of parameters, FLOPs, and optimal weight file sizes. The improved GVC-YOLO model exhibited a reduction of 16.7% in the number of parameters and 17.1% in FLOPs. The final size of the optimal model weight file generated was 5.4 MB, representing a 14.3% decrease compared with that before the improvements. In summary, the GVC-YOLO model underwent further optimization while maintaining the high accuracy of the original model and achieved a significant reduction in model size, which is beneficial for subsequent deployment on the edge for real-time detection.
Figure 10 depicts the partial detection results on the test set. The figure shows that both models perform well in detecting various levels of cotton aphid damage in real-world scenarios. However, in some complex scenarios, such as leaf occlusion, poor lighting conditions, and small leaf targets, YOLOv8n results in instances of partial omission. The GVC-YOLO model enhances the detection capability in complex scenarios, achieving precise identification of damaged leaves with high confidence and effectively alleviating the issue of omissions. Additionally, Figure 10 provides a heatmap visualization of the detection results of the GVC-YOLO model. By using Grad-CAM [32], the key regions in the detection result images can be intuitively displayed, further analyzing the effectiveness and interpretability of the model’s detection.
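For reference, a minimal Grad-CAM sketch of the kind used for the heatmaps in Figure 10d is given below, implemented with forward/backward hooks. It assumes the model returns a single raw prediction tensor and uses the highest prediction score as the backpropagation target; both are simplifying assumptions rather than the exact visualization pipeline used in the paper.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Weight the target layer's activations by the spatially averaged gradients
    of the top detection score, then upsample to the input resolution."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    model.eval()
    out = model(image)            # assumed: a single raw prediction tensor
    score = out.max()             # assumed target: the highest raw score
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    a, g = feats["a"], grads["g"]                     # both (1, C, H, W)
    weights = g.mean(dim=(2, 3), keepdim=True)        # per-channel importance
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()
```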
The confusion matrix of the GVC-YOLO model on the test set is illustrated in Figure 11. The model exhibits partial confusion in detecting cotton aphid damage at levels 0 and 1, as these two levels have similar leaf curl characteristics, making it more challenging for the model to differentiate between them. However, overall, the GVC-YOLO model demonstrates good detection performance for all four levels of cotton aphid damage, with an extremely low false detection rate.

3.2. Ablation Experiment

An ablation experiment was conducted using YOLOv8n as the baseline model under the same experimental conditions to validate the contribution of the GSConv, VoVGSCSP, CA, and SimSPPF modules to the GVC-YOLO model, as shown in Table 5. Replacing the standard convolutions in the YOLOv8n network structure with the lightweight GSConv significantly reduced the model’s complexity while also improving its accuracy. When both the GSConv and the VoVGSCSP modules, which are based on the Slim-Neck paradigm, were introduced simultaneously, the model’s complexity and parameter count were further reduced, effectively increasing the model’s detection speed. Next, the CA mechanism was added to the above modifications; owing to the lightweight nature of this attention mechanism, only a small increase in the computational effort and complexity of the model was recorded, with little impact on detection speed. Finally, the SPPF structure was replaced with the more efficient SimSPPF network structure. When all four modules were incorporated into the network simultaneously, the best detection accuracy and fastest detection speed were achieved, indicating the best overall detection performance among the compared configurations. In summary, we designed a lightweight structure for the benchmark model, reducing the number of network layers and parameters as well as computational complexity and memory usage through techniques such as group shuffle convolution (GSConv). We optimized the feature extraction module, introduced a new feature fusion strategy and the coordinate attention mechanism, and enhanced the model’s ability to detect multiscale targets. In the edge computing environment, these improvements significantly increase the model’s detection speed for pests and diseases while ensuring high detection accuracy.

3.3. Comparison with Mainstream Object Detection Models

To further validate the comprehensive performance advantages of the proposed GVC-YOLO model, performance comparison experiments were conducted with six mainstream object detection algorithms: RT-DETR-L [33], YOLOX-tiny [34], YOLOv5n [35], YOLOv6n [36], YOLOv7-tiny [37], and Faster R-CNN (ResNet50). The software and hardware configurations, as well as the training hyperparameters, were kept consistent across all the experiments. The results are shown in Table 6.
The lightweight GVC-YOLO detection method proposed in this paper outperforms current mainstream object detection models with significant advantages. The GVC-YOLO model achieves the highest mAP@0.5 value of 97.9%, which is 1.6%, 1.2%, 2.6%, 6.0%, 7.6%, and 6.6% higher than those of RT-DETR-L, YOLOX-tiny, YOLOv5n, YOLOv6n, YOLOv7-tiny, and Faster R-CNN (ResNet50), respectively. In terms of the indicators used to assess model lightweightness, the GVC-YOLO model’s parameters, FLOPs, and optimal weight file size are 2.5 M, 6.8 G, and 5.4 MB, respectively, the best among the compared models.
Faster R-CNN, as a classic two-stage object detection model, has significant drawbacks in terms of recognition speed compared with one-stage object detection models such as the YOLO series, rendering it unable to meet real-time requirements. RT-DETR is an end-to-end object detection model based on a transformer [38] that performs well in real-time object detection. However, compared with the model proposed in this research, it shows a significant gap in detection speed while maintaining similar model accuracy, thus failing to meet real-time detection requirements.
Figure 12 presents a radar chart analysis comparing GVC-YOLO with other mainstream YOLO series models, demonstrating its significant advantages in both detection accuracy and model lightweightness. In conclusion, the improved lightweight GVC-YOLO model exhibits robust detection capabilities in complex environments while achieving high detection accuracy with a reduced model size. This makes it well suited for deployment on edge devices with high flexibility requirements, outperforming some of the current mainstream object detection models.

3.4. Model Deployment and Real-Time Detection

TensorRT is a high-performance deep learning inference optimizer designed for NVIDIA GPUs and Jetson series hardware. It provides low-latency, high-throughput deployment inference for deep learning applications. When combined with NVIDIA GPUs, TensorRT enables fast and efficient deployment inference in almost all frameworks [39].
To further validate the effectiveness of the improved lightweight model on edge devices, the proposed GVC-YOLO model is deployed and tested on the edge device Jetson Xavier NX. The deployment process, as illustrated in Figure 13, involves first converting the trained PyTorch model into the open intermediate representation format ONNX [40]. The ONNX format file is subsequently transferred to the edge device, where TensorRT is used to generate the accelerated network inference engine. The generated engine file can be serialized for permanent storage and subsequently loaded through deserialization for real-time inference on video data obtained from cameras.
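The sketch below illustrates this two-step pipeline in Python: exporting the trained PyTorch model to ONNX on the training server and then parsing the ONNX file with the TensorRT 8.2 Python API on the Jetson to build and serialize an FP16 engine. The file names, input resolution, and workspace size are assumptions for illustration.

```python
import torch
import tensorrt as trt

# 1) Export the trained PyTorch model (here called `model`) to ONNX
model.eval()
dummy = torch.zeros(1, 3, 640, 640)                    # assumed input size
torch.onnx.export(model, dummy, "gvc_yolo.onnx", opset_version=12,
                  input_names=["images"], output_names=["output"])

# 2) On the Jetson, parse the ONNX file and build a serialized TensorRT engine
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("gvc_yolo.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30                    # 1 GB builder workspace
config.set_flag(trt.BuilderFlag.FP16)                  # FP16; INT8 also needs a calibrator
engine_bytes = builder.build_serialized_network(network, config)
with open("gvc_yolo_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```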
During training, deep learning models typically use Float32 (FP32) precision for high accuracy. However, during model inference, to achieve accelerated inference, TensorRT also supports inference precision in the Float16 (FP16) [41] and INT8 [42] quantization modes. Zhang et al. [43] successfully deployed an improved lightweight strawberry fruit detection model to Jetson nano and used the TensorRT method to optimize and accelerate the model, enabling real-time detection on edge devices and facilitating mechanically automated strawberry harvesting. Figure 14 shows the detection speed of the GVC-YOLO model deployed on Jetson Xavier NX via the INT8 and FP16 quantization methods. The graph shows that the detection speed of the model significantly improved after TensorRT optimization. Under FP16 precision, the FPS is approximately 48, whereas under INT8 precision, the FPS can reach 53, demonstrating excellent real-time performance.
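For completeness, a simplified sketch of loading the serialized engine and running frame-by-frame inference on camera video is shown below (TensorRT 8.2 with PyCUDA and OpenCV). It assumes one input and one output binding, a 640×640 input, and synchronous execution, and it omits letterboxing, box decoding, and non-maximum suppression.

```python
import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("gvc_yolo_fp16.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())   # deserialize the engine
context = engine.create_execution_context()

# Allocate pinned host buffers and device buffers for every binding
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host); dev_bufs.append(dev); bindings.append(int(dev))

cap = cv2.VideoCapture(0)                 # field camera attached to the Jetson
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (640, 640)).astype(np.float32) / 255.0
    img = img.transpose(2, 0, 1)[None]    # HWC -> 1xCxHxW
    np.copyto(host_bufs[0], img.ravel())
    cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
    context.execute_v2(bindings)          # synchronous inference
    cuda.memcpy_dtoh(host_bufs[1], dev_bufs[1])
    # host_bufs[1] now holds raw predictions; decode boxes and apply NMS here
cap.release()
```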
Figure 15 presents the real-time detection results of the GVC-YOLO model deployed on Jetson Xavier NX in actual cotton field scenarios. We tested the model on a video recorded in the cotton field, approximately 1 min long with a frame rate of 30 fps, and randomly selected 12 frames from the real-time detection results for demonstration. The results indicate that even in complex real-world scenarios involving occlusion and overlap of cotton leaves in the canopy layer, the GVC-YOLO model exhibited both fast detection speed and high detection accuracy, achieving remarkable performance in detecting cotton aphid damage to leaves.
The above validation effectively demonstrates the high real-time performance and accuracy of the proposed GVC-YOLO model for video detection at the edge. This method can meet the requirements of pest detection in practical agricultural scenarios, providing the necessary theoretical foundation and technical support for the development of modern agriculture toward intelligence and automation.

4. Discussion

The utilization of intelligent methods for pest and disease detection in farmlands is a critical component of achieving smart agriculture. Currently, there is extensive research on pest and disease detection or monitoring. To monitor pests and diseases over large areas, researchers have employed satellite remote sensing technology to establish monitoring models capable of long-term damage prediction before the onset of infestations [44]. With the rapid development of drone technology, methods for monitoring farmland pests and diseases via drones equipped with multispectral and hyperspectral cameras have been extensively studied by scholars both domestically and internationally [45]. Drone remote sensing has unique advantages for monitoring pests and diseases.
The aforementioned studies are both macroscopic, providing qualitative monitoring of pest and disease occurrence over large areas of farmland. This paper aims to utilize deep learning-based object detection algorithms to perform quantitative assessments of pest and disease infestations in farmland by annotating RGB images or videos captured by smartphones or drones. Advanced object detection models were trained on many labeled images, demonstrating high-precision recognition of pest and disease images. Currently, most related research is carried out in static scenes with traditional computing methods. Video detection at the edge can better improve the efficiency of pest and disease detection, which is more challenging and has more research value. Owing to the limited computational resources of edge mobile devices, running object detection models on such devices poses significant challenges, especially for real-time identification, which demands more computational resources. We are also working on improving the algorithm to develop a lightweight model. The complex background of farmland scenes and the occlusion of cotton leaves pose challenges for recognition. We aim to increase the detection performance by adding attention mechanisms and improving the backbone network.
On the basis of the above research, we developed the GVC-YOLO model for high-precision identification of cotton aphid damage in the field. The model is characterized by high accuracy and speed, enabling precise and rapid identification of leaf damage caused by cotton aphids. The size of the GVC-YOLO model is only 5.4 MB, and in terms of detection accuracy, the mAP@0.5:0.95 reached 90.3%. Finally, we deployed the proposed GVC-YOLO model on the edge computing device Jetson Xavier NX, achieving fast and accurate detection of aphid-damaged leaves at the edge.
We compared the GVC-YOLO model with both classic and state-of-the-art methods in the field. Two-stage methods, such as Faster R-CNN, are not suitable for deployment on edge mobile devices because of their slow detection speeds. In contrast, the YOLO series models feature a simple structure, making them easy to deploy across various platforms while demonstrating strong adaptability. By optimizing the network structure, we further reduce the model’s computational requirements, thereby increasing its detection speed on edge devices and enabling real-time performance. The deployed edge computing device is small and low-power and can be connected to cameras mounted on drones and spraying tractors, enabling real-time detection at the edge. By analyzing real-time detection results and integrating them with an automatic spraying control system, a precise pest control technology system can be established. This provides technical support for precise pesticide application, reducing pesticide waste and lowering agricultural costs.
Despite our efforts to conduct thorough and comprehensive research, our work has several shortcomings. First, although our improvements to the original model greatly reduced the complexity of the model and increased the speed of detection, the improvement in detection accuracy was not significant. This suggests that continued efforts are needed to further improve the accuracy of pest and disease detection. In addition, to fully demonstrate the benefits of the lightweight model, it needs to be tested and deployed on edge device platforms with more limited computational resources, such as the Jetson nano. This will also be the focus of research in our future work.

5. Conclusions

This study proposes a lightweight object detection model called GVC-YOLO to address prominent issues such as the inability to run high-complexity models efficiently in real time on edge devices with limited computational resources. On the basis of the YOLOv8n model, the model’s volume is significantly reduced by introducing the lightweight modules GSConv and VoVGSCSP, thereby reducing model complexity and accelerating detection speed. Additionally, the CA mechanism and SimSPPF network are incorporated to enhance the model’s detection capability for cotton aphid damage features and improve detection accuracy. To further validate the proposed model’s effectiveness, the lightweight GVC-YOLO model is optimized and deployed to the edge computing device Jetson Xavier NX via TensorRT and tested for detection analysis on video data captured in actual cotton field scenarios.
The main conclusions are as follows: the GVC-YOLO model achieves mAP@0.5 and mAP@0.5:0.95 values of 97.9% and 90.3%, respectively, demonstrating high accuracy in recognizing cotton aphid damage and meeting practical detection requirements. Moreover, the parameters and FLOPs are 2.5 M and 6.8 G, respectively, with the optimal model weight size being only 5.4 MB. Compared with YOLOv8n, these figures represent reductions of 16.7%, 17.1%, and 14.3%, respectively, making it the most lightweight and efficient model of its kind for the detection of cotton aphid-damaged leaves. When deployed in actual cotton field scenarios, the GVC-YOLO model achieves a detection speed of 48 FPS with FP16 precision, meeting the real-time detection requirements at the edge.
The lightweight detection method proposed in this study can also be transferred and applied to real-time detection and analysis of various other crop pests and diseases, which is highly important for the mechanization and intelligent development of crop pest control. Edge computing devices can be installed on ground sprayers or low-altitude agricultural drones, and real-time pest information detected by the model can be displayed directly. Farmers can make decisions on the basis of detection results, control pesticide spraying in real time, and realize an integrated process from pest detection to prevention and control. This significantly reduces manual labor and achieves more precise and efficient development in the agricultural industry.

Author Contributions

Conceptualization, Z.Z. and H.Q.; Data curation, Z.Z. and Y.Y.; Formal analysis, J.Y.; Funding acquisition, Y.L. and H.Q.; Investigation, L.L.; Methodology, Z.Z. and Y.Y.; Project administration, H.Q.; Resources, J.L.; Software, X.X.; Supervision, Y.L. and J.L.; Validation, L.L., J.Y. and R.D.; Visualization, X.X.; Writing—original draft, Z.Z.; Writing—review and editing, H.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by Key R&D projects during the 14th Five Year Plan period [2022YFD1400302] and the National Natural Science Foundation of China [U2003119].

Data Availability Statement

The raw/processed data required to reproduce the above findings cannot be shared at this time, as the data also form part of an ongoing study.

Acknowledgments

The authors thank the graduate students from the School of Information and Management Sciences of Henan Agricultural University and the National Agricultural Extension Technology Service Center for their continuous support of our research.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Zhou, Y.; Li, F.; Xin, Q.; Li, Y.; Lin, Z. Historical variability of cotton yield and response to climate and agronomic management in Xinjiang, China. Sci. Total Environ. 2024, 912, 169327. [Google Scholar] [CrossRef] [PubMed]
  2. Khan, M.A.; Wahid, A.; Ahmad, M.; Tahir, M.T.; Ahmed, M.; Ahmad, S.; Hasanuzzaman, M. World Cotton Production and Consumption: An Overview. In Cotton Production and Uses: Agronomy, Crop Protection, and Postharvest Technologies; Ahmad, S., Hasanuzzaman, M., Eds.; Springer: Singapore, 2020; pp. 1–7. [Google Scholar]
  3. Hu, X.; Qiao, H.; Chen, B.; Si, H. A novel approach to grade cotton aphid damage severity with hyperspectral index reconstruction. Appl. Sci. 2022, 12, 8760. [Google Scholar] [CrossRef]
  4. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 215232. [Google Scholar] [CrossRef] [PubMed]
  5. Alves, A.N.; Souza, W.S.; Borges, D.L. Cotton pests classification in field-based images using deep residual networks. Comput. Electron. Agric. 2020, 174, 105488. [Google Scholar] [CrossRef]
  6. Bao, W.; Cheng, T.; Zhou, X.-G.; Guo, W.; Wang, Y.; Zhang, X.; Qiao, H.; Zhang, D. An improved DenseNet model to classify the damage caused by cotton aphid. Comput. Electron. Agric. 2022, 203, 107485. [Google Scholar] [CrossRef]
  7. Kouadio, L.; El Jarroudi, M.; Belabess, Z.; Laasli, S.-E.; Roni, M.Z.K.; Amine, I.D.I.; Mokhtari, N.; Mokrini, F.; Junk, J.; Lahlali, R. A Review on UAV-Based Applications for Plant Disease Detection and Monitoring. Remote Sens. 2023, 15, 4273. [Google Scholar] [CrossRef]
  8. Li, X.; Liang, Z.; Yang, G.; Lin, T.; Liu, B. Assessing the Severity of Verticillium Wilt in Cotton Fields and Constructing Pesticide Application Prescription Maps Using Unmanned Aerial Vehicle (UAV) Multispectral Images. Drones 2024, 8, 176. [Google Scholar] [CrossRef]
  9. Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A survey of deep learning-based object detection. IEEE Access 2019, 7, 128837–128868. [Google Scholar] [CrossRef]
  10. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  12. Feng, Y.; Chen, W.; Ma, Y.; Zhang, Z.; Gao, P.; Lv, X. Cotton Seedling Detection and Counting Based on UAV Multispectral Images and Deep Learning Methods. Remote Sens. 2023, 15, 2680. [Google Scholar] [CrossRef]
  13. Li, D.; Ahmed, F.; Wu, N.; Sethi, A.I. Yolo-JD: A Deep Learning Network for jute diseases and pests detection from images. Plants 2022, 11, 937. [Google Scholar] [CrossRef] [PubMed]
  14. Xu, X.; Shi, J.; Chen, Y.; He, Q.; Liu, L.; Sun, T.; Ding, R.; Lu, Y.; Xue, C.; Qiao, H. Research on machine vision and deep learning based recognition of cotton seedling aphid infestation level. Front. Plant Sci. 2023, 14, 1200901. [Google Scholar] [CrossRef] [PubMed]
  15. Xu, W.; Xu, T.; Thomasson, J.A.; Chen, W.; Karthikeyan, R.; Tian, G.; Shi, Y.; Ji, C.; Su, Q. A lightweight SSV2-YOLO based model for detection of sugarcane aphids in unstructured natural environments. Comput. Electron. Agric. 2023, 211, 107961. [Google Scholar] [CrossRef]
  16. Lin, Y.; Wang, L.; Chen, T.; Liu, Y.; Zhang, L. Monitoring system for peanut leaf disease based on a lightweight deep learning model. Comput. Electron. Agric. 2024, 222, 109055. [Google Scholar] [CrossRef]
  17. Tannous, M.; Stefanini, C.; Romano, D. A Deep-Learning-Based Detection Approach for the Identification of Insect Species of Economic Importance. Insects 2023, 14, 148. [Google Scholar] [CrossRef] [PubMed]
  18. Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. Internet Things J. IEEE 2016, 3, 637–646. [Google Scholar] [CrossRef]
  19. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  20. GB/T 15799-2011; Rules of Monitoring and Forecast of the Cotton Aphid (Aphis gossypii Glover). China Standard Press: Beijing, China, 2011.
  21. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
  22. Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A Review on YOLOv8 and Its Advancements; Springer: Singapore, 2024; pp. 529–545. [Google Scholar]
  23. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef]
  24. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424. [Google Scholar]
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  26. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  27. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  28. Ren, F.; Fei, J.; Li, H.; Doma, B.T. Steel Surface Defect Detection Using Improved Deep Learning Algorithm: ECA-SimSPPF-SIoU-Yolov5. IEEE Access 2024, 12, 32545–32553. [Google Scholar] [CrossRef]
  29. Sandhu, A.K. Big data with cloud computing: Discussions and challenges. Big Data Min. Anal. 2021, 5, 32–40. [Google Scholar] [CrossRef]
  30. Wasule, S.; Khadatkar, G.; Pendke, V.; Rane, P. Xavier Vision: Pioneering Autonomous Vehicle Perception with YOLO v8 on Jetson Xavier NX. In Proceedings of the 2023 IEEE Pune Section International Conference (PuneCon), Pune, India, 15–16 December 2023; pp. 1–6. [Google Scholar]
  31. Guan, H.; Deng, H.; Ma, X.; Zhang, T.; Zhang, Y.; Zhu, T.; Zhou, H.; Gu, Z.; Lu, Y. A corn canopy organs detection method based on improved DBi-YOLOv8 network. Eur. J. Agron. 2024, 154, 127076. [Google Scholar] [CrossRef]
  32. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  33. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. arXiv 2023, arXiv:2304.08069. [Google Scholar]
  34. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  35. Yuan, S.; Du, Y.; Liu, M.; Yue, S.; Li, B.; Zhang, H. YOLOv5-Ytiny: A miniature aggregate detection and classification model. Electronics 2022, 11, 1743. [Google Scholar] [CrossRef]
  36. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  37. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  39. Jeong, E.; Kim, J.; Ha, S. Tensorrt-based framework and optimization methodology for deep learning inference on jetson boards. ACM Trans. Embed. Comput. Syst. (TECS) 2022, 21, 1–26. [Google Scholar] [CrossRef]
  40. Zhou, Y.; Yang, K. Exploring tensorrt to improve real-time inference for deep learning. In Proceedings of the 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), Hainan, China, 18–20 December 2022; pp. 2011–2018. [Google Scholar]
  41. Haidar, A.; Tomov, S.; Dongarra, J.; Higham, N.J. Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers. In Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, 11–16 November 2018; pp. 603–613. [Google Scholar]
  42. Kim, S.; Park, G.; Yi, Y. Performance evaluation of INT8 quantized inference on mobile GPUs. IEEE Access 2021, 9, 164245–164255. [Google Scholar] [CrossRef]
  43. Zhang, Y.; Yu, J.; Chen, Y.; Yang, W.; Zhang, W.; He, Y. Real-time strawberry detection using deep neural networks on embedded system (rtsd-net): An edge AI application. Comput. Electron. Agric. 2022, 192, 106586. [Google Scholar] [CrossRef]
  44. Sári-Barnácz, F.E.; Zalai, M.; Toepfer, S.; Milics, G.; Iványi, D.; Tóthné Kun, M.; Mészáros, J.; Árvai, M.; Kiss, J. Suitability of Satellite Imagery for Surveillance of Maize Ear Damage by Cotton Bollworm (Helicoverpa armigera) Larvae. Remote Sens. 2023, 15, 5602. [Google Scholar] [CrossRef]
  45. Xu, D.; Lu, Y.; Liang, H.; Lu, Z.; Yu, L.; Liu, Q. Areca Yellow Leaf Disease Severity Monitoring Using UAV-Based Multispectral and Thermal Infrared Imagery. Remote Sens. 2023, 15, 3114. [Google Scholar] [CrossRef]
Figure 1. Images of damage caused by cotton aphids.
Figure 2. GVC-YOLO model structure diagram.
Figure 3. GSConv structure diagram.
Figure 4. GS bottleneck and VoVGSCSP structure.
Figure 5. Coordinate attention mechanism structure.
Figure 6. Network structure of SimSPPF.
Figure 7. Embedded edge computing device: (a) Jetson Xavier NX; (b) USB depth camera.
Figure 8. Comparison of the training curves of GVC-YOLO and YOLOv8n.
Figure 9. Lightweight comparison of the GVC-YOLO and YOLOv8n models.
Figure 10. Comparison of detection results in actual scenes: (a) original images; (b) YOLOv8n; (c) GVC-YOLO; (d) Grad-CAM visualization.
Figure 11. Confusion matrix of the GVC-YOLO model on the test set.
Figure 12. Performance comparison of YOLO series algorithms.
Figure 13. Edge deployment process.
Figure 14. Video stream detection speed at the edge.
Figure 15. Real-time video detection screen on the Jetson Xavier NX.
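Figures 13–15 cover the edge deployment workflow and real-time video detection on the Jetson Xavier NX. As a rough illustration only (not the authors' exact pipeline), the sketch below shows how a trained model could be exported to an FP16 TensorRT engine with the Ultralytics API and run on a camera stream while measuring FPS; the file names gvc_yolo.pt and gvc_yolo.engine and the camera index 0 are assumptions.

```python
# Illustrative sketch only: export a trained model to a TensorRT engine (FP16)
# and run it on a USB camera stream while measuring FPS (cf. Figures 13-15).
# File names and the camera index are assumptions, not the authors' code.
import time
import cv2
from ultralytics import YOLO

# One-off export on the Jetson: FP16 TensorRT engine.
YOLO("gvc_yolo.pt").export(format="engine", half=True, device=0)

model = YOLO("gvc_yolo.engine")   # load the optimized engine for inference
cap = cv2.VideoCapture(0)         # USB camera exposed as /dev/video0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.time()
    results = model(frame, verbose=False)   # detect aphid-damaged leaves
    fps = 1.0 / (time.time() - t0)
    annotated = results[0].plot()           # draw boxes and class labels
    cv2.putText(annotated, f"FPS: {fps:.1f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("GVC-YOLO", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```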
Table 1. Classification standard for leaves damaged by cotton aphids.
Damage Level | Description of Symptoms or Damage
0 | Healthy; no aphids; leaves are flat
1 | Aphids present; the most heavily affected leaves are crumpled
2 | Aphids present; the most affected leaves are slightly rolled, nearly semicircular
3 | Aphids present; the most affected leaves are curled up to more than a semicircle, curved, or spherical
Table 2. Basic information of the cotton aphid damage dataset.
Category | Number of Images | Labels (Level 0) | Labels (Level 1) | Labels (Level 2) | Labels (Level 3)
Training set | 13730 | 8551 | 8835 | 8980 | 8917
Validation set | 1695 | 1070 | 1086 | 1108 | 1102
Test set | 1525 | 933 | 933 | 978 | 1006
Total | 16950 | 10554 | 10854 | 11066 | 11025
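Tables 1 and 2 define the four damage classes and the train/validation/test split. A minimal sketch of a YOLO-style dataset configuration built from this information is shown below; the directory layout, paths, and the file name cotton_aphid.yaml are assumptions for illustration, and the class names simply echo the damage levels of Table 1.

```python
# Illustrative sketch only: write a YOLO-style dataset config for the four
# damage levels of Table 1. Paths and the output file name are assumptions.
import yaml

data_cfg = {
    "path": "datasets/cotton_aphid",   # hypothetical dataset root
    "train": "images/train",           # 13730 images (Table 2)
    "val": "images/val",               # 1695 images
    "test": "images/test",             # 1525 images
    "nc": 4,                           # four damage levels (Table 1)
    "names": {0: "level_0", 1: "level_1", 2: "level_2", 3: "level_3"},
}

with open("cotton_aphid.yaml", "w") as f:
    yaml.safe_dump(data_cfg, f, sort_keys=False)
```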
Table 3. Training hyperparameter configuration.
Hyperparameter | Value
Epochs | 200
Batch size | 32
Image size | 640 × 640 pixels
Number of workers | 8
Optimizer | SGD
Initial learning rate (lr0) | 0.001
Final learning rate factor (lrf) | 0.1
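As a sketch of how the hyperparameters in Table 3 map onto the Ultralytics training API (the model definition file, dataset config name, and overall call are assumptions, not the authors' exact code):

```python
# Illustrative sketch only: train with the hyperparameters listed in Table 3.
# "gvc_yolo.yaml" and "cotton_aphid.yaml" are assumed file names.
from ultralytics import YOLO

model = YOLO("gvc_yolo.yaml")     # hypothetical GVC-YOLO model definition
model.train(
    data="cotton_aphid.yaml",     # dataset config (see the sketch after Table 2)
    epochs=200,                   # Epochs
    batch=32,                     # Batch size
    imgsz=640,                    # Image size 640 x 640
    workers=8,                    # Number of workers
    optimizer="SGD",              # Optimizer
    lr0=0.001,                    # Initial learning rate
    lrf=0.1,                      # Final learning rate factor
)
```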
Table 4. Jetson Xavier NX device parameters.
Parameter | Technical Specification
CPU | 6-core NVIDIA Carmel ARM v8.2 64-bit
GPU | 384-core NVIDIA Volta GPU with 48 Tensor Cores
Memory | 8 GB 128-bit LPDDR4
Storage | 16 GB eMMC + 128 GB SSD
Camera | 12 lanes MIPI CSI-2, D-PHY 1.2, up to 6 cameras
Power consumption | 10 W / 15 W
AI performance | 21 TOPS (INT8)
Deep learning accelerator | 2× NVDLA engines
Vision accelerator | 7-way VLIW vision processor
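Before deployment, it can help to confirm that the device listed in Table 4 exposes its GPU and TensorRT runtime to Python. A minimal check, illustrative only, might look like:

```python
# Illustrative sketch only: verify the Jetson Xavier NX environment (Table 4)
# before exporting or running a TensorRT engine.
import torch
import tensorrt as trt

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Expected to report the Xavier NX's Volta-class GPU.
    print("GPU:", torch.cuda.get_device_name(0))
print("TensorRT version:", trt.__version__)
```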
Table 5. Ablation test and analysis of results.
Table 5. Ablation test and analysis of results.
YOLOv8GSConvVoVGSCSPCASimSPPFmAP@
0.5:0.95
mAP@
0.5
Params
(M)
FLOPs
(G)
Size
(MB)
----89.1%97.5%3.018.26.3
---90.0%97.8%2.737.65.7
--89.6%97.5%2.526.75.4
-90.1%97.7%2.536.85.4
90.3%97.9%2.536.75.4
Table 6. Analysis of the results of different object detection models.
Model | mAP@0.5:0.95 | mAP@0.5 | Params (M) | FLOPs (G) | Size (MB)
RT-DETR-L | 89.3% | 96.3% | 32.8 | 108.0 | 66.2
YOLOX-tiny | 76.3% | 96.7% | 5.0 | 7.6 | 70.3
YOLOv5n | 85.7% | 95.4% | 4.1 | 7.3 | 8.6
YOLOv6n | 83.9% | 92.4% | 4.2 | 11.8 | 8.7
YOLOv7-tiny | 78.0% | 91.0% | 6.0 | 13.2 | 12.3
Faster R-CNN (ResNet50) | 84.2% | 90.8% | 41.4 | 90.7 | 338.0
GVC-YOLO (Ours) | 90.3% | 97.9% | 2.5 | 6.8 | 5.4