
CN116935296A - Orchard environment scene detection method and terminal based on multitask deep learning - Google Patents

Orchard environment scene detection method and terminal based on multitask deep learning

Info

Publication number
CN116935296A
Authority
CN
China
Prior art keywords
module
representing
features
orchard
orchard environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310901547.8A
Other languages
Chinese (zh)
Inventor
赵文锋
林暖晨
江政文
梁升濠
刘易迪
蓝海洋
黄袁爵
钟敏悦
李振源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University filed Critical South China Agricultural University
Priority to CN202310901547.8A priority Critical patent/CN116935296A/en
Publication of CN116935296A publication Critical patent/CN116935296A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an orchard environment scene detection method and terminal based on multi-task deep learning. The method comprises: collecting orchard environment images and constructing a data set; inputting an orchard environment image into an improved MobileNetv3 backbone network and obtaining an output feature map through a CBS layer and improved bneck modules in sequence; generating and fusing features of different scales from the output feature map through a spatial pyramid pooling (SPP) module, and generating and fusing features of different semantic levels through a feature pyramid network (FPN) module; and decoding the fused multi-scale and multi-level features simultaneously through a target detection decoding head and a semantic segmentation decoding head, the target detection decoding head outputting the detected targets and the semantic segmentation decoding head segmenting the drivable region. By jointly processing the semantic segmentation and target detection tasks, the invention realizes simultaneous recognition of the drivable region and obstacles in the orchard.

Description

Orchard environment scene detection method and terminal based on multitask deep learning
Technical Field
The invention relates to the technical field of orchard management, in particular to an orchard environment scene detection method and terminal based on multi-task deep learning.
Background
With the expansion of orchard planting area in China, the progress of agricultural mechanization and the rise of labor costs, standardized development and intelligent management of orchards are both necessary and an inevitable trend of future development. To improve orchard management efficiency, reduce labor intensity and lower production costs, fruit growers need to adopt automatic driving technology for the orchard environment. This technology relies on perception of the orchard environment to provide key information to the automatic driving decision module, and image recognition is a very important part of that perception.
Conventional road recognition algorithms typically extract surface features such as texture, color and shape. However, they lack the extraction and expression of deep features and high-level semantic information, and therefore perform poorly on complex, unstructured orchard road scenes. Existing orchard environment recognition algorithms use separate means to handle the semantic segmentation and target detection tasks, which achieves high accuracy; however, processing the tasks sequentially is more time-consuming than handling them in a single pass.
Therefore, how to provide a method and a terminal for detecting an orchard environment scene through multi-task deep learning is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides an orchard environment scene detection method and terminal based on multi-task deep learning, which jointly perform drivable-region segmentation and obstacle detection with a shared backbone network, achieving good real-time performance, high accuracy and short processing time.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the orchard environment scene detection method based on multitask deep learning comprises the following steps:
collecting an orchard environment image and constructing a data set;
inputting an orchard environment image into an improved MobileNetv3 backbone network and obtaining an output feature map through a CBS layer and improved bneck modules in sequence;
generating and fusing features of different scales from the output feature map through a spatial pyramid pooling (SPP) module, and generating and fusing features of different semantic levels through a feature pyramid network (FPN) module;
and decoding the fused multi-scale and multi-level features simultaneously through a target detection decoding head and a semantic segmentation decoding head, obtaining the detected targets from the target detection decoding head and segmenting the drivable region with the semantic segmentation decoding head.
Preferably, the specific processing procedure of the improved bneck module is as follows:
step a: the input features are first expanded in dimension by a 1×1 pointwise convolution, the expanded features are passed through a 3×3 depthwise convolution, and the result is then processed by an ECA-Net attention module;
step b: in the ECA-Net attention module, global average pooling is applied to each channel of the input feature map, local cross-channel information interaction is realized through a one-dimensional convolution, an attention map is formed through a sigmoid activation function, and the input feature map and the attention map are combined by a Hadamard (element-wise) product;
the ECA-Net attention module generates weights ω for each channel by a one-dimensional convolution of size K:
ω=σ(C1 D K (y))
wherein ,C1DK Representing one-dimensional convolution with a convolution kernel of size K, y representing the channel, σ representing the sigmoid activation function; the mapping relationship between the channel dimensions C and K is as follows:
C=Φ(K)≈exp(γ×K-b)
i.e. given the channel dimension C, the convolution kernel size K is adaptively determined:
wherein ,|t|odd An odd number representing the nearest distance t, γ and b representing constants, the value of γ being set to 2, the value of b being set to 1;
step c: a 1×1 pointwise convolution projects the features back to the dimension of the input feature map, and the projected feature map is finally added to the original input feature map to obtain the output feature map.
Preferably, the detection loss L_det of the target detection decoding head is:
L_det = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where L_IoU denotes the overlap loss, L_dis denotes the center-distance loss and L_asp denotes the width-height loss; IoU is the intersection-over-union of the predicted bounding box and the ground-truth bounding box; b and b^gt denote the center-point coordinates of the predicted and ground-truth bounding boxes, w and w^gt the predicted and ground-truth widths, and h and h^gt the predicted and ground-truth heights; ρ(·) denotes the Euclidean distance; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth bounding boxes; and c_w, c_h denote the width and height of that smallest enclosing rectangle;
the drivable-region segmentation loss L_seg of the semantic segmentation decoding head is:
L_seg = −υ_t·(1 − p_t)^(υ_ω)·log(p_t)
where p_t denotes the probability that the model predicts the correct class, with p_t = p when the prediction is correct and p_t = 1 − p otherwise; υ_t and υ_ω are hyperparameters used respectively to modulate the positive/negative sample weights and to control the weighting of hard and easy samples;
the total loss function L_total of the model is:
L_total = γ_1·L_det + γ_2·L_seg
where γ_1 and γ_2 are balance weight parameters.
Preferably, collecting the orchard environment images and constructing the data set specifically comprises:
step 1.1, photographing orchard environment images in the natural environment at different times of day, under different illumination angles and from different viewing angles;
step 1.2, applying target detection labels to the fruit tree targets in the orchard environment and semantic segmentation labels to the drivable region;
and step 1.3, performing data enhancement and augmentation on the orchard environment images through brightness adjustment, Gaussian blur, affine transformation, mirror flipping and feathering, and constructing the data set.
An orchard environment scene detection terminal based on multi-task deep learning comprises: a camera, a processor, a navigation decision module and a robot main body, wherein a deep learning model is deployed on the processor, the deep learning model comprising an improved MobileNetv3 backbone network, a spatial pyramid pooling (SPP) module, a feature pyramid network (FPN) module, a target detection decoding head and a semantic segmentation decoding head, and the improved MobileNetv3 backbone network comprising a CBS layer and improved bneck modules;
the camera is mounted on the robot main body and is used for collecting orchard environment images;
the processor is arranged inside the robot main body and is used for inputting an orchard environment image into the improved MobileNetv3 backbone network, an output feature map being obtained through the CBS layer and the improved bneck modules in sequence;
features of different scales are generated and fused from the output feature map through the spatial pyramid pooling (SPP) module, and features of different semantic levels are generated and fused through the feature pyramid network (FPN) module;
the fused multi-scale and multi-level features are decoded simultaneously through the target detection decoding head and the semantic segmentation decoding head, the detected targets being obtained from the target detection decoding head and the drivable region being segmented by the semantic segmentation decoding head;
the navigation decision module is arranged inside the robot main body and is used for calculating a corresponding path from the drivable region and the detected targets and for controlling the movement of the robot main body.
Preferably, the specific processing procedure of the improved bneck module is as follows:
step a: the input features are first expanded in dimension by a 1×1 pointwise convolution, the expanded features are passed through a 3×3 depthwise convolution, and the result is then processed by an ECA-Net attention module;
step b: in the ECA-Net attention module, global average pooling is applied to each channel of the input feature map, local cross-channel information interaction is realized through a one-dimensional convolution, an attention map is formed through a sigmoid activation function, and the input feature map and the attention map are combined by a Hadamard (element-wise) product;
the ECA-Net attention module generates a weight ω for each channel through a one-dimensional convolution of kernel size K:
ω = σ(C1D_K(y))
where C1D_K denotes a one-dimensional convolution with kernel size K, y denotes the channel descriptor obtained by the global average pooling, and σ denotes the sigmoid activation function; the mapping between the channel dimension C and K is:
C = Φ(K) ≈ 2^(γ·K − b)
that is, given the channel dimension C, the convolution kernel size K is adaptively determined as:
K = ψ(C) = | log2(C)/γ + b/γ |_odd
where |t|_odd denotes the odd number nearest to t, and γ and b are constants, with γ set to 2 and b set to 1;
step c: a 1×1 pointwise convolution projects the features back to the dimension of the input feature map, and the projected feature map is finally added to the original input feature map to obtain the output feature map.
Preferably, the detection loss L_det of the target detection decoding head is:
L_det = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where L_IoU denotes the overlap loss, L_dis denotes the center-distance loss and L_asp denotes the width-height loss; IoU is the intersection-over-union of the predicted bounding box and the ground-truth bounding box; b and b^gt denote the center-point coordinates of the predicted and ground-truth bounding boxes, w and w^gt the predicted and ground-truth widths, and h and h^gt the predicted and ground-truth heights; ρ(·) denotes the Euclidean distance; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth bounding boxes; and c_w, c_h denote the width and height of that smallest enclosing rectangle;
the drivable-region segmentation loss L_seg of the semantic segmentation decoding head is:
L_seg = −υ_t·(1 − p_t)^(υ_ω)·log(p_t)
where p_t denotes the probability that the model predicts the correct class, with p_t = p when the prediction is correct and p_t = 1 − p otherwise; υ_t and υ_ω are hyperparameters used respectively to modulate the positive/negative sample weights and to control the weighting of hard and easy samples;
the total loss function L_total of the model is:
L_total = γ_1·L_det + γ_2·L_seg
where γ_1 and γ_2 are balance weight parameters.
A computer-readable medium stores instructions which, when executed, cause the orchard environment scene detection method based on multi-task deep learning to be performed.
A processing terminal comprises a memory and a processor, wherein a computer program capable of running on the processor is stored in the memory, and the processor realizes an orchard environment scene detection method based on multi-task deep learning when executing the computer program.
Compared with the prior art, the invention discloses an orchard environment scene detection method and terminal based on multi-task deep learning, which jointly process the semantic segmentation and target detection tasks, thereby realizing simultaneous recognition of the drivable region and obstacles in the orchard. The semantic segmentation task is mainly used for segmenting the drivable region, and the target detection task is mainly used for detecting obstacles. By adopting the method, orchard management efficiency can be improved, labor intensity reduced and production costs lowered.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an orchard environment scene detection method based on multi-task deep learning.
FIG. 2 is a schematic diagram of the deep learning model structure of the present invention.
FIG. 3 is a schematic diagram of a modified bneck module structure of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses an orchard environment scene detection method based on multi-task deep learning, which is shown in fig. 1 and comprises the following steps:
collecting an orchard environment image and constructing a data set;
inputting an orchard environment image into an improved MobileNetv3 backbone network and obtaining an output feature map through a CBS layer and improved bneck modules in sequence;
generating and fusing features of different scales from the output feature map through a spatial pyramid pooling (SPP) module, and generating and fusing features of different semantic levels through a feature pyramid network (FPN) module;
and decoding the fused multi-scale and multi-level features simultaneously through a target detection decoding head and a semantic segmentation decoding head, obtaining the detected targets from the target detection decoding head and segmenting the drivable region with the semantic segmentation decoding head. The drivable area is divided into two classes: drivable region and non-drivable region; the obstacle detection targets are classified into people, fruit trees and the like.
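The following is a minimal PyTorch-style sketch of this multi-task layout: one shared backbone feeding a spatial pyramid pooling module, an FPN neck and two parallel decoding heads. The class and attribute names (OrchardMultiTaskNet, det_head, seg_head and so on) are illustrative placeholders rather than the patent's actual implementation; the concrete submodules are assumed to be supplied by the caller.

import torch.nn as nn

class OrchardMultiTaskNet(nn.Module):
    """Shared backbone + SPP + FPN feeding a detection head and a segmentation head."""
    def __init__(self, backbone, spp, fpn, det_head, seg_head):
        super().__init__()
        self.backbone = backbone   # improved MobileNetv3 (CBS layer + improved bneck blocks)
        self.spp = spp             # spatial pyramid pooling on the deepest feature map
        self.fpn = fpn             # feature pyramid network fusing features across levels
        self.det_head = det_head   # target detection decoding head
        self.seg_head = seg_head   # semantic segmentation decoding head

    def forward(self, x):
        c3, c4, c5 = self.backbone(x)             # multi-scale feature maps from the shared backbone
        p5 = self.spp(c5)                         # fuse multi-scale context at the deepest level
        p3, p4, p5 = self.fpn([c3, c4, p5])       # fuse features across semantic levels
        detections = self.det_head([p3, p4, p5])  # obstacles: people, fruit trees, etc.
        drivable_mask = self.seg_head(p3)         # drivable / non-drivable region mask
        return detections, drivable_mask

Because both heads read the same fused features, one forward pass through the shared backbone serves both tasks, which is the source of the single-pass time saving discussed above.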
In this embodiment, the improved MobileNetv3 backbone network replaces the SE block of the original bneck with the ECA attention mechanism to enhance feature extraction.
As shown in fig. 2, the deep learning model includes an improved MobileNetv3 backbone network, a spatial pyramid pooling (SPP) module, a feature pyramid network (FPN) module, a target detection decoding head and a semantic segmentation decoding head, and the improved MobileNetv3 backbone network includes a CBS layer and improved bneck modules.
The CBS layer consists of a two-dimensional convolution layer, a BN layer and a SiLU activation function:
SiLU(x)=x·Sigmoid(x)
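As a small illustration, a CBS layer of this form can be written as the following PyTorch module; the default kernel size and stride are assumptions, not values specified by the patent.

import torch.nn as nn

class CBS(nn.Module):
    """Conv2d -> BatchNorm2d -> SiLU, matching the CBS layer described above."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()  # SiLU(x) = x * sigmoid(x)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))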
As shown in fig. 3, the specific processing procedure of the improved bneck module is as follows:
step a: the input features are first expanded in dimension by a 1×1 pointwise convolution, the expanded features are passed through a 3×3 depthwise convolution, and the result is then processed by an ECA-Net attention module;
step b: in the ECA-Net attention module, global average pooling is applied to each channel of the input feature map, local cross-channel information interaction is realized through a one-dimensional convolution, an attention map is formed through a sigmoid activation function, and the input feature map and the attention map are combined by a Hadamard (element-wise) product;
the ECA-Net attention module generates a weight ω for each channel through a one-dimensional convolution of kernel size K:
ω = σ(C1D_K(y))
where C1D_K denotes a one-dimensional convolution with kernel size K, y denotes the channel descriptor obtained by the global average pooling, and σ denotes the sigmoid activation function; the mapping between the channel dimension C and K is:
C = Φ(K) ≈ 2^(γ·K − b)
that is, given the channel dimension C, the convolution kernel size K is adaptively determined as:
K = ψ(C) = | log2(C)/γ + b/γ |_odd
where |t|_odd denotes the odd number nearest to t, and γ and b are constants, with γ set to 2 and b set to 1;
step c: a 1×1 pointwise convolution projects the features back to the dimension of the input feature map, and the projected feature map is finally added to the original input feature map to obtain the output feature map.
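A hedged PyTorch sketch of steps a-c is given below. The ECA block follows the formulation above (global average pooling per channel, a one-dimensional convolution whose odd kernel size is chosen adaptively from the channel dimension with γ = 2 and b = 1, and a sigmoid gate), and the block structure follows expand, depthwise convolution, attention, project, residual add. The expansion ratio, activation choice and stride-1 assumption are illustrative, not values taken from the patent.

import math
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: GAP per channel, adaptive 1-D conv, sigmoid gate."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                     # force an odd kernel size, as in |t|_odd
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        n, c, _, _ = x.shape
        y = self.pool(x).view(n, 1, c)                # one descriptor per channel
        w = self.gate(self.conv(y)).view(n, c, 1, 1)  # channel attention map
        return x * w                                  # Hadamard product with the input

class ImprovedBneck(nn.Module):
    """1x1 expand -> 3x3 depthwise -> ECA -> 1x1 project -> residual add (stride 1)."""
    def __init__(self, channels, expand=4):
        super().__init__()
        hidden = channels * expand
        self.expand = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.SiLU())
        self.dwise = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.SiLU())
        self.eca = ECA(hidden)
        self.project = nn.Sequential(
            nn.Conv2d(hidden, channels, 1, bias=False), nn.BatchNorm2d(channels))

    def forward(self, x):
        out = self.project(self.eca(self.dwise(self.expand(x))))
        return out + x                                # add back the original feature map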
Preferably, the detection loss L_det of the target detection decoding head is:
L_det = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where L_IoU denotes the overlap loss, L_dis denotes the center-distance loss and L_asp denotes the width-height loss; IoU is the intersection-over-union of the predicted bounding box and the ground-truth bounding box; b and b^gt denote the center-point coordinates of the predicted and ground-truth bounding boxes, w and w^gt the predicted and ground-truth widths, and h and h^gt the predicted and ground-truth heights; ρ(·) denotes the Euclidean distance; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth bounding boxes; and c_w, c_h denote the width and height of that smallest enclosing rectangle;
the drivable-region segmentation loss L_seg of the semantic segmentation decoding head is:
L_seg = −υ_t·(1 − p_t)^(υ_ω)·log(p_t)
where p_t denotes the probability that the model predicts the correct class, with p_t = p when the prediction is correct and p_t = 1 − p otherwise; υ_t and υ_ω are hyperparameters used respectively to modulate the positive/negative sample weights and to control the weighting of hard and easy samples;
the total loss function L_total of the model is:
L_total = γ_1·L_det + γ_2·L_seg
where γ_1 and γ_2 are balance weight parameters.
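A minimal sketch of these three loss terms is shown below, assuming corner-format (x1, y1, x2, y2) boxes for the detection loss and per-pixel foreground probabilities with 0/1 targets for the segmentation loss; the hyperparameter defaults (v_t, v_w, gamma1, gamma2) are illustrative assumptions rather than values from the patent.

import torch

def detection_loss(pred, target, eps=1e-7):
    """Overlap + centre-distance + width/height terms, as in the formula above."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)                      # overlap term
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)   # width of the smallest enclosing box
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)   # height of the smallest enclosing box
    c2 = cw ** 2 + ch ** 2 + eps                     # squared diagonal c^2
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4  # centre distance
    dw2 = ((px2 - px1) - (tx2 - tx1)) ** 2 / (cw ** 2 + eps)   # width term
    dh2 = ((py2 - py1) - (ty2 - ty1)) ** 2 / (ch ** 2 + eps)   # height term
    return (1 - iou) + rho2 / c2 + dw2 + dh2

def segmentation_loss(p, target, v_t=0.25, v_w=2.0, eps=1e-7):
    """Focal-style drivable-region loss: -v_t * (1 - p_t)^v_w * log(p_t)."""
    p_t = torch.where(target == 1, p, 1 - p)
    return -v_t * (1 - p_t) ** v_w * torch.log(p_t + eps)

def total_loss(det, seg, gamma1=1.0, gamma2=1.0):
    """Weighted sum of the mean detection and segmentation losses."""
    return gamma1 * det.mean() + gamma2 * seg.mean()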
In this embodiment, collecting the orchard environment images and constructing the data set specifically comprises:
step 1.1, photographing orchard environment images in the natural environment at different times of day, under different illumination angles and from different viewing angles;
step 1.2, applying target detection labels to the fruit tree targets in the orchard environment and semantic segmentation labels to the drivable region, with manual annotation performed using the labelme annotation tool;
and step 1.3, performing data enhancement and augmentation on the orchard environment images through brightness adjustment, Gaussian blur, affine transformation, mirror flipping and simulated rainfall, constructing the data set, and dividing it into a training set, a test set and a validation set.
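Purely as an illustration, the photometric and geometric augmentations named in step 1.3 could be assembled with torchvision roughly as follows; the parameter values are assumptions, and feathering and simulated rainfall, which have no standard torchvision transform, are omitted. In a real detection-plus-segmentation pipeline the geometric transforms (affine, flip) must of course also be applied to the boxes and masks.

import torchvision.transforms as T

augment = T.Compose([
    T.ColorJitter(brightness=0.3),                       # brightness adjustment
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),     # Gaussian blur
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),  # affine transformation
    T.RandomHorizontalFlip(p=0.5),                       # mirror flipping
])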
The embodiment provides an orchard environment scene detection terminal based on multi-task deep learning, which comprises: a camera, a processor, a navigation decision module and a robot main body, wherein a deep learning model is deployed on the processor, the deep learning model comprising an improved MobileNetv3 backbone network, a spatial pyramid pooling (SPP) module, a feature pyramid network (FPN) module, a target detection decoding head and a semantic segmentation decoding head, and the improved MobileNetv3 backbone network comprising a CBS layer and improved bneck modules;
the camera is mounted on the robot main body and is used for collecting orchard environment images;
the processor is arranged inside the robot main body and is used for inputting an orchard environment image into the improved MobileNetv3 backbone network, an output feature map being obtained through the CBS layer and the improved bneck modules in sequence;
features of different scales are generated and fused from the output feature map through the spatial pyramid pooling (SPP) module, and features of different semantic levels are generated and fused through the feature pyramid network (FPN) module;
the fused multi-scale and multi-level features are decoded simultaneously through the target detection decoding head and the semantic segmentation decoding head, the detected targets being obtained from the target detection decoding head and the drivable region being segmented by the semantic segmentation decoding head;
the navigation decision module is arranged inside the robot main body and is used for calculating a corresponding path from the drivable region and the detected targets and for controlling the movement of the robot main body.
In this embodiment, the specific processing procedure of the improved bneck module is as follows:
step a: the input features are first expanded in dimension by a 1×1 pointwise convolution, the expanded features are passed through a 3×3 depthwise convolution, and the result is then processed by an ECA-Net attention module;
step b: in the ECA-Net attention module, global average pooling is applied to each channel of the input feature map, local cross-channel information interaction is realized through a one-dimensional convolution, an attention map is formed through a sigmoid activation function, and the input feature map and the attention map are combined by a Hadamard (element-wise) product;
the ECA-Net attention module generates a weight ω for each channel through a one-dimensional convolution of kernel size K:
ω = σ(C1D_K(y))
where C1D_K denotes a one-dimensional convolution with kernel size K, y denotes the channel descriptor obtained by the global average pooling, and σ denotes the sigmoid activation function; the mapping between the channel dimension C and K is:
C = Φ(K) ≈ 2^(γ·K − b)
that is, given the channel dimension C, the convolution kernel size K is adaptively determined as:
K = ψ(C) = | log2(C)/γ + b/γ |_odd
where |t|_odd denotes the odd number nearest to t, and γ and b are constants, with γ set to 2 and b set to 1;
step c: a 1×1 pointwise convolution projects the features back to the dimension of the input feature map, and the projected feature map is finally added to the original input feature map to obtain the output feature map.
The detection loss L_det of the target detection decoding head is:
L_det = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where L_IoU denotes the overlap loss, L_dis denotes the center-distance loss and L_asp denotes the width-height loss; IoU is the intersection-over-union of the predicted bounding box and the ground-truth bounding box; b and b^gt denote the center-point coordinates of the predicted and ground-truth bounding boxes, w and w^gt the predicted and ground-truth widths, and h and h^gt the predicted and ground-truth heights; ρ(·) denotes the Euclidean distance; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth bounding boxes; and c_w, c_h denote the width and height of that smallest enclosing rectangle;
the drivable-region segmentation loss L_seg of the semantic segmentation decoding head is:
L_seg = −υ_t·(1 − p_t)^(υ_ω)·log(p_t)
where p_t denotes the probability that the model predicts the correct class, with p_t = p when the prediction is correct and p_t = 1 − p otherwise; υ_t and υ_ω are hyperparameters used respectively to modulate the positive/negative sample weights and to control the weighting of hard and easy samples;
the total loss function L_total of the model is:
L_total = γ_1·L_det + γ_2·L_seg
where γ_1 and γ_2 are balance weight parameters.
The invention can be used for:
1. Orchard planting management: fruit growers can use the invention to detect the condition of orchard roads, discover problems in time and repair them, thereby improving orchard management efficiency and fruit yield.
2. Orchard research and analysis: the method can be used for orchard research and analysis, for example analyzing the influence of road distribution and shape on fruit tree growth, so as to improve fruit quality and yield.
3. Automatic fruit-picking robots: the invention can help a picking machine identify roads and paths in the orchard, avoid entering fruit tree areas by mistake, ensure safe operation, and improve the efficiency and precision of automatic picking. It can also help the machine plan an optimal travelling path, minimizing travelling distance and time and improving working efficiency and economic benefit.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; for identical or similar parts between the embodiments, reference may be made to the other embodiments. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The orchard environment scene detection method based on multitask deep learning is characterized by comprising the following steps of:
collecting an orchard environment image and constructing a data set;
inputting an orchard environment image into an improved MobileNetv3 backbone network and obtaining an output feature map through a CBS layer and improved bneck modules in sequence;
generating and fusing features of different scales from the output feature map through a spatial pyramid pooling (SPP) module, and generating and fusing features of different semantic levels through a feature pyramid network (FPN) module;
and decoding the fused multi-scale and multi-level features simultaneously through a target detection decoding head and a semantic segmentation decoding head, obtaining the detected targets from the target detection decoding head and segmenting the drivable region with the semantic segmentation decoding head.
2. The orchard environment scene detection method based on multi-task deep learning according to claim 1, wherein the specific processing procedure of the improved bneck module is as follows:
step a: the input features are first expanded in dimension by a 1×1 pointwise convolution, the expanded features are passed through a 3×3 depthwise convolution, and the result is then processed by an ECA-Net attention module;
step b: in the ECA-Net attention module, global average pooling is applied to each channel of the input feature map, local cross-channel information interaction is realized through a one-dimensional convolution, an attention map is formed through a sigmoid activation function, and the input feature map and the attention map are combined by a Hadamard (element-wise) product;
the ECA-Net attention module generates a weight ω for each channel through a one-dimensional convolution of kernel size K:
ω = σ(C1D_K(y))
where C1D_K denotes a one-dimensional convolution with kernel size K, y denotes the channel descriptor obtained by the global average pooling, and σ denotes the sigmoid activation function; the mapping between the channel dimension C and K is:
C = Φ(K) ≈ 2^(γ·K − b)
that is, given the channel dimension C, the convolution kernel size K is adaptively determined as:
K = ψ(C) = | log2(C)/γ + b/γ |_odd
where |t|_odd denotes the odd number nearest to t, and γ and b are constants, with γ set to 2 and b set to 1;
step c: a 1×1 pointwise convolution projects the features back to the dimension of the input feature map, and the projected feature map is finally added to the original input feature map to obtain the output feature map.
3. The orchard environment scene detection method based on multi-task deep learning according to claim 1, wherein the detection loss L_det of the target detection decoding head is:
L_det = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where L_IoU denotes the overlap loss, L_dis denotes the center-distance loss and L_asp denotes the width-height loss; IoU is the intersection-over-union of the predicted bounding box and the ground-truth bounding box; b and b^gt denote the center-point coordinates of the predicted and ground-truth bounding boxes, w and w^gt the predicted and ground-truth widths, and h and h^gt the predicted and ground-truth heights; ρ(·) denotes the Euclidean distance; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth bounding boxes; and c_w, c_h denote the width and height of that smallest enclosing rectangle;
the drivable-region segmentation loss L_seg of the semantic segmentation decoding head is:
L_seg = −υ_t·(1 − p_t)^(υ_ω)·log(p_t)
where p_t denotes the probability that the model predicts the correct class, with p_t = p when the prediction is correct and p_t = 1 − p otherwise; υ_t and υ_ω are hyperparameters used respectively to modulate the positive/negative sample weights and to control the weighting of hard and easy samples;
the total loss function L_total of the model is:
L_total = γ_1·L_det + γ_2·L_seg
where γ_1 and γ_2 are balance weight parameters.
4. The orchard environment scene detection method based on multi-task deep learning according to claim 1, wherein collecting the orchard environment images and constructing the data set specifically comprises:
step 1.1, photographing orchard environment images in the natural environment at different times of day, under different illumination angles and from different viewing angles;
step 1.2, applying target detection labels to the fruit tree targets in the orchard environment and semantic segmentation labels to the drivable region;
and step 1.3, performing data enhancement and augmentation on the orchard environment images through brightness adjustment, Gaussian blur, affine transformation, mirror flipping and feathering, and constructing the data set.
5. An orchard environment scene detection terminal based on multi-task deep learning, characterized by comprising: a camera, a processor, a navigation decision module and a robot main body, wherein a deep learning model is deployed on the processor, the deep learning model comprising an improved MobileNetv3 backbone network, a spatial pyramid pooling (SPP) module, a feature pyramid network (FPN) module, a target detection decoding head and a semantic segmentation decoding head, and the improved MobileNetv3 backbone network comprising a CBS layer and improved bneck modules;
the camera is mounted on the robot main body and is used for collecting orchard environment images;
the processor is arranged inside the robot main body and is used for inputting an orchard environment image into the improved MobileNetv3 backbone network, an output feature map being obtained through the CBS layer and the improved bneck modules in sequence;
features of different scales are generated and fused from the output feature map through the spatial pyramid pooling (SPP) module, and features of different semantic levels are generated and fused through the feature pyramid network (FPN) module;
the fused multi-scale and multi-level features are decoded simultaneously through the target detection decoding head and the semantic segmentation decoding head, the detected targets being obtained from the target detection decoding head and the drivable region being segmented by the semantic segmentation decoding head;
the navigation decision module is arranged inside the robot main body and is used for calculating a corresponding path from the drivable region and the detected targets and for controlling the movement of the robot main body.
6. The orchard environment scene detection terminal based on multi-task deep learning according to claim 5, wherein the specific processing procedure of the improved bneck module is as follows:
step a: the input features are first expanded in dimension by a 1×1 pointwise convolution, the expanded features are passed through a 3×3 depthwise convolution, and the result is then processed by an ECA-Net attention module;
step b: in the ECA-Net attention module, global average pooling is applied to each channel of the input feature map, local cross-channel information interaction is realized through a one-dimensional convolution, an attention map is formed through a sigmoid activation function, and the input feature map and the attention map are combined by a Hadamard (element-wise) product;
the ECA-Net attention module generates a weight ω for each channel through a one-dimensional convolution of kernel size K:
ω = σ(C1D_K(y))
where C1D_K denotes a one-dimensional convolution with kernel size K, y denotes the channel descriptor obtained by the global average pooling, and σ denotes the sigmoid activation function; the mapping between the channel dimension C and K is:
C = Φ(K) ≈ 2^(γ·K − b)
that is, given the channel dimension C, the convolution kernel size K is adaptively determined as:
K = ψ(C) = | log2(C)/γ + b/γ |_odd
where |t|_odd denotes the odd number nearest to t, and γ and b are constants, with γ set to 2 and b set to 1;
step c: a 1×1 pointwise convolution projects the features back to the dimension of the input feature map, and the projected feature map is finally added to the original input feature map to obtain the output feature map.
7. The orchard environment scene detection terminal based on multi-task deep learning according to claim 5, wherein the detection loss L_det of the target detection decoding head is:
L_det = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where L_IoU denotes the overlap loss, L_dis denotes the center-distance loss and L_asp denotes the width-height loss; IoU is the intersection-over-union of the predicted bounding box and the ground-truth bounding box; b and b^gt denote the center-point coordinates of the predicted and ground-truth bounding boxes, w and w^gt the predicted and ground-truth widths, and h and h^gt the predicted and ground-truth heights; ρ(·) denotes the Euclidean distance; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth bounding boxes; and c_w, c_h denote the width and height of that smallest enclosing rectangle;
the drivable-region segmentation loss L_seg of the semantic segmentation decoding head is:
L_seg = −υ_t·(1 − p_t)^(υ_ω)·log(p_t)
where p_t denotes the probability that the model predicts the correct class, with p_t = p when the prediction is correct and p_t = 1 − p otherwise; υ_t and υ_ω are hyperparameters used respectively to modulate the positive/negative sample weights and to control the weighting of hard and easy samples;
the total loss function L_total of the model is:
L_total = γ_1·L_det + γ_2·L_seg
where γ_1 and γ_2 are balance weight parameters.
8. A computer-readable medium having instructions stored thereon which, when executed, cause the method according to any one of claims 1-4 to be performed.
9. A processing terminal comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the method according to any of claims 1-4 when executing the computer program.
CN202310901547.8A 2023-07-21 2023-07-21 Orchard environment scene detection method and terminal based on multitask deep learning Pending CN116935296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310901547.8A CN116935296A (en) 2023-07-21 2023-07-21 Orchard environment scene detection method and terminal based on multitask deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310901547.8A CN116935296A (en) 2023-07-21 2023-07-21 Orchard environment scene detection method and terminal based on multitask deep learning

Publications (1)

Publication Number Publication Date
CN116935296A true CN116935296A (en) 2023-10-24

Family

ID=88385817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310901547.8A Pending CN116935296A (en) 2023-07-21 2023-07-21 Orchard environment scene detection method and terminal based on multitask deep learning

Country Status (1)

Country Link
CN (1) CN116935296A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118333825A (en) * 2024-04-22 2024-07-12 仲恺农业工程学院 Pitaya orchard road navigation identification method based on machine vision


Similar Documents

Publication Publication Date Title
CN111046880B (en) Infrared target image segmentation method, system, electronic equipment and storage medium
CN110222626B (en) Unmanned scene point cloud target labeling method based on deep learning algorithm
Bao et al. UAV remote sensing detection of tea leaf blight based on DDMA-YOLO
Blok et al. The effect of data augmentation and network simplification on the image‐based detection of broccoli heads with Mask R‐CNN
Liu et al. Automatic segmentation of overlapped poplar seedling leaves combining Mask R-CNN and DBSCAN
Wang et al. Tea picking point detection and location based on Mask-RCNN
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
Wan et al. A real-time branch detection and reconstruction mechanism for harvesting robot via convolutional neural network and image segmentation
CN107918776A (en) A kind of plan for land method, system and electronic equipment based on machine vision
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN113312999B (en) High-precision detection method and device for diaphorina citri in natural orchard scene
Shuai et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds
CN113408584A (en) RGB-D multi-modal feature fusion 3D target detection method
CN110163798A (en) Fishing ground purse seine damage testing method and system
CN116935296A (en) Orchard environment scene detection method and terminal based on multitask deep learning
CN110298366B (en) Crop distribution extraction method and device
CN116883650A (en) Image-level weak supervision semantic segmentation method based on attention and local stitching
Xiang et al. PhenoStereo: a high-throughput stereo vision system for field-based plant phenotyping-with an application in sorghum stem diameter estimation
Chen et al. Improved fast r-cnn with fusion of optical and 3d data for robust palm tree detection in high resolution uav images
Jiang et al. Thin wire segmentation and reconstruction based on a novel image overlap-partitioning and stitching algorithm in apple fruiting wall architecture for robotic picking
Wang et al. MeDERT: A metal surface defect detection model
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
Islam et al. QuanCro: a novel framework for quantification of corn crops’ consistency under natural field conditions
CN112232403A (en) Fusion method of infrared image and visible light image
CN115995017A (en) Fruit identification and positioning method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination