CN115661767A - Image front vehicle target identification method based on convolutional neural network
- Publication number: CN115661767A
- Application number: CN202211350460.8A
- Authority: CN (China)
- Legal status: Pending
Abstract
The application relates to the technical field of vehicle target recognition, and in particular to a method, based on a convolutional neural network, for identifying a preceding-vehicle target in an image. The method comprises: acquiring an environment image around a vehicle; and inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image. The target detection model comprises a residual network, a feature extraction network and a prediction network. The residual network comprises a plurality of feature layers and a densely connected network, where the dense connections splice the output features of the current feature layer with the output features of all preceding feature layers to form the input of the next feature layer. The residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image. This addresses the problems in the related art that small targets in road scenes are difficult to detect and the detection rate is low.
Description
Technical Field
The application relates to the technical field of vehicle target recognition, and in particular to a method, based on a convolutional neural network, for identifying a preceding-vehicle target in an image.
Background
YOLO (You Only Look Once) is a network for target detection. From the original YOLOv1 to the current YOLOv4, the detection accuracy and speed of the algorithm have improved markedly. The network structure of YOLOv4 can be divided into three parts: the backbone feature extraction network CSPDarknet53, an enhanced feature extraction network, and the prediction network YOLO Head.
The backbone feature extraction network CSPDarknet53 augments the ResNet (Residual Neural Network) structure used in YOLOv3 with a CSPNet (Cross Stage Partial Network) structure, but its feature extraction capability still needs strengthening. In addition, the number and sizes of the pooling kernels used by the SPP (Spatial Pyramid Pooling) module in YOLOv4 cannot sufficiently fuse multi-scale receptive-field information on large feature maps, which limits the achievable improvement in detection performance.
Disclosure of Invention
The application provides a target recognition method and apparatus for a vehicle, a vehicle, and a storage medium, aiming to solve the problems in the related art that small targets in road scenes are difficult to detect and the detection rate is low.
An embodiment of a first aspect of the present application provides a vehicle target recognition method, comprising the following steps: acquiring an environment image around a vehicle; and inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
Optionally, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
Optionally, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
Optionally, the pooling network comprises a plurality of convolution kernels, which may be 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
In a second aspect, an embodiment of the present application provides a vehicle target recognition apparatus, comprising: an acquisition module, configured to acquire an environment image around the vehicle; and an identification module, configured to input the environment image into a pre-trained target detection model and output a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
Optionally, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
Optionally, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
Optionally, the pooling network comprises a plurality of convolution kernels, which may be 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
An embodiment of a third aspect of the present application provides a vehicle, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the vehicle target recognition method described in the above embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement the vehicle target recognition method described in the above embodiments.
Therefore, the application has at least the following beneficial effects:
Dense connections are added to the residual network structure, which improves the feature extraction capability of the network, in particular for small targets, while reducing the number of parameters. An attention mechanism is added at the output feature layers of the backbone feature network, selectively emphasizing informative features without increasing network depth, thereby improving the accuracy of the algorithm. The convolution kernel sizes of the pooling layer are changed, improving the fusion of multi-scale receptive-field information in the feature map; after passing through the pooling module, the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented. The technical problems in the related art that small targets in road scenes are difficult to detect and the detection rate is low are thus solved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for identifying an object of a vehicle according to an embodiment of the present application;
FIG. 2 is a diagram of a residual structure with an added dense connection structure provided in accordance with an embodiment of the present application;
FIG. 3 is a diagram of the ECA (Efficient Channel Attention) attention mechanism provided in accordance with an embodiment of the present application;
FIG. 4 is a diagram of the YOLOv4 target detection model according to the related art;
FIG. 5 is a diagram of the improved YOLOv4 target detection model provided in accordance with an embodiment of the present application;
FIG. 6 is a comparison graph of P-R curves for vehicle detection provided in accordance with an embodiment of the present application;
FIG. 7 is a flow chart of a method of object identification of a vehicle according to one embodiment of the present application;
FIG. 8 is an exemplary diagram of an object recognition device of a vehicle provided in accordance with an embodiment of the present application;
fig. 9 is a schematic structural diagram of a vehicle according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
In the related art, target detection is implemented with YOLO, a network for target detection whose accuracy and speed have improved markedly from the original YOLOv1 to the current YOLOv4. The network structure of YOLOv4 can be divided into three parts: the backbone feature extraction network CSPDarknet53; the enhanced feature extraction network, which comprises an SPP (Spatial Pyramid Pooling) module and a PANet (Path Aggregation Network); and the prediction network YOLO Head, which predicts results from the extracted features. However, the following disadvantages remain:
1. The backbone feature extraction network CSPDarknet53 replaces the ResNet structure of YOLOv3 with a CSPResNet structure. In the CSPResNet structure, the input layer is split into two parts, each processed by its own convolutions: the right part performs feature extraction with stacked ResBlock residual blocks, while the left part is passed through convolution, regularization and an activation function and is then concatenated with the output feature map of the right part (a code sketch of this split is given after this list). Its feature extraction capability, however, still needs to be strengthened.
2. When an image is recognized with a deep CNN (Convolutional Neural Network) model, local information of the image is generally extracted by convolution kernels. However, different pieces of local information contribute differently to whether the image can be recognized correctly, and letting the model learn the importance of the different pieces of local information is a key problem. The backbone network contains no attention mechanism, which can bias the weights learned during training; an attention mechanism can strengthen feature fusion by combining channel and spatial attention and improve the ability to extract features.
3. As the network hierarchy of a convolutional neural network deepens, the image information in deep feature maps becomes highly abstract: semantic information increases while direct image feature information is lost, so detecting small targets from the deep feature maps of the neural network remains difficult and the accuracy of the model needs to be improved. The SPP module can fuse multi-scale local features with global features and enrich the expressive power of the feature map, but the number and sizes of the pooling kernels used by the SPP module in YOLOv4 cannot fully fuse multi-scale receptive-field information on large feature maps, which limits the achievable improvement in detection performance.
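To make the split described in point 1 concrete, the following is a minimal PyTorch sketch of a CSPResNet-style block. The class names, the half-and-half channel split, the number of stacked residual blocks and the fusing convolution after the concatenation are illustrative assumptions rather than the exact CSPDarknet53 configuration.

```python
import torch
import torch.nn as nn

class ConvBNMish(nn.Module):
    """Convolution + batch norm (regularization) + Mish activation."""
    def __init__(self, in_ch, out_ch, k=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResBlock(nn.Module):
    """Plain residual block used on the right-hand branch."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(ConvBNMish(ch, ch // 2, 1), ConvBNMish(ch // 2, ch, 3))

    def forward(self, x):
        return x + self.block(x)

class CSPResBlock(nn.Module):
    """CSP split: the left branch is a single conv path, the right branch stacks
    residual blocks; the two outputs are concatenated along the channel axis.
    Assumes the channel count ch is divisible by 4."""
    def __init__(self, ch, n_blocks=2):
        super().__init__()
        self.left = ConvBNMish(ch, ch // 2, 1)          # shortcut-style branch
        self.right_in = ConvBNMish(ch, ch // 2, 1)
        self.right = nn.Sequential(*[ResBlock(ch // 2) for _ in range(n_blocks)])
        self.fuse = ConvBNMish(ch, ch, 1)               # transition conv after concat

    def forward(self, x):
        left = self.left(x)
        right = self.right(self.right_in(x))
        return self.fuse(torch.cat([left, right], dim=1))
```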
A vehicle target recognition method and apparatus and a storage medium according to embodiments of the present application are described below with reference to the drawings. Addressing the poor target feature extraction capability and low detection performance of the existing target detection network YOLOv4 mentioned in the background, the application provides a vehicle target recognition method in which the YOLOv4 network structure is improved: a dense connection structure is fused into the CSPResNet structure so that shallow feature information is fully reused; the channel attention mechanism adopts a local cross-channel interaction strategy without dimensionality reduction, avoiding the influence of dimensionality reduction on the learning effect of channel attention; and the pooling kernel sizes in YOLOv4 are refined so that more image details are preserved. This solves the problems in the related art that, when detecting vehicle targets, small target vehicles are difficult to detect and the detection rate and detection performance are low.
Specifically, fig. 1 is a schematic flowchart of a method for identifying a target of a vehicle according to an embodiment of the present disclosure.
As shown in fig. 1, the object recognition method of a vehicle includes the steps of:
in step S101, an environment image around the vehicle is acquired.
The environment image around the vehicle may be acquired by a vehicle-mounted camera or in other ways; this is not specifically limited here.
In step S102, the environment image is input into a pre-trained target detection model and a target recognition result of the environment image is output, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, which splices the output features of the current feature layer with the output features of all preceding feature layers to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
It can be understood that, in the embodiment of the present application, the environment image may be input into the pre-trained target detection model to obtain the output target recognition result of the environment image.
The residual network comprises a plurality of feature layers and a densely connected network, and the prediction network predicts results from the extracted features and outputs the recognition result of the environment image.
It can be understood that, in the embodiment of the present application, a densely connected network is added to the residual network. As shown in fig. 2, every preceding feature layer is densely connected to the current feature layer: starting from the input feature layer, the output of each layer serves as part of the input of every subsequent layer, so shallow feature information is fully reused and the ability to detect small target objects is improved. In the densely connected network, the layers are related by

x_n = H_n([x_0, x_1, …, x_{n-2}, x_{n-1}]),

where [x_0, x_1, …, x_{n-2}, x_{n-1}] denotes the splicing, in the channel direction, of the newly added input x_{n-1} with all previous input feature layers x_0, x_1, …, x_{n-3}, x_{n-2}; H_n is the transfer function of the n-th layer; and x_n is the output feature of the n-th layer.
That is, the actual input of the n-th layer is the channel-wise splicing [x_0, x_1, …, x_{n-2}, x_{n-1}] of the newly added input x_{n-1} with all previous input feature layers. This input is passed through regularization, an activation function and a convolution operation to obtain the output feature layer x_n of the n-th layer, and x_n spliced with all previous feature layers, [x_0, x_1, …, x_{n-1}, x_n], then serves as the input of the (n+1)-th layer. The number k of channels newly added per layer is usually kept small and is called the growth rate. Because splicing is used instead of the element-wise addition of the residual network, and the number k of channels added each time is small, the connections appear dense and complicated, but the actual number of parameters and the amount of computation are smaller than in the residual network.
It should be noted that, in the embodiment of the present application, to address the incomplete reuse of shallow feature information in the residual block structure used by the YOLOv4 backbone feature extraction network CSPDarknet53, a dense connection block (DenseBlock) with the same number of layers as the original residual block structure is used. Such a block satisfies the requirement of the CSPNet skip connection that the output of the stacked blocks has the same size as the input feature map, i.e. it does not change the size of the input feature map. The structure can therefore be realized with DenseBlocks alone, without the downsampling operation of a transition layer.
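The following is a minimal PyTorch sketch of such a dense connection block. The growth rate, the number of layers and the regularization-activation-convolution ordering inside the transfer function H_n are illustrative assumptions, not values fixed by this embodiment.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One transfer function H_n: BN -> activation -> 3x3 conv producing k new channels."""
    def __init__(self, in_ch, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.act = nn.LeakyReLU(0.1)
        self.conv = nn.Conv2d(in_ch, growth_rate, 3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.act(self.bn(x)))

class DenseBlock(nn.Module):
    """Dense block: the input of layer n is the channel-wise splicing
    [x_0, x_1, ..., x_{n-1}] of all earlier outputs. The spatial size never changes,
    so the block can stand in for a residual-block stack inside the CSP structure.
    Output channels: in_ch + n_layers * growth_rate."""
    def __init__(self, in_ch, growth_rate=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_ch + i * growth_rate, growth_rate) for i in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_feat = layer(torch.cat(features, dim=1))  # splice all previous outputs
            features.append(new_feat)
        return torch.cat(features, dim=1)
```

Because each layer only adds a small number k of new channels and reuses all earlier ones by concatenation, the spatial size of the feature map is preserved, which is what allows the block to replace the residual-block stack inside the CSP skip connection.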
In the embodiment of the application, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
It should be noted that, since channel attention mechanisms have been shown to hold great potential for improving the performance of deep convolutional neural networks, the embodiment of the present application arranges an attention mechanism at the output features of the feature layers and before the pooling network.
It can be understood that, in the embodiment of the present application, the feature map may be converted into feature vectors by the pooling network, and the feature map and the feature vectors may be fed by the attention mechanism network into the feature pyramid network, which outputs fused features of the image features and the feature vectors; the fused features are then used to output the target recognition result of the environment image.
In the embodiment of the present application, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
The preset activation function is a function that can compute the weight of each channel, for example a Sigmoid activation function.
Specifically, in the embodiment of the present application, an attention mechanism is added at the output feature layers of the backbone network. As shown in fig. 3, ECA-Net (a lightweight channel attention module) assigns different weights to the channels according to their importance and models them by adaptively determining the range of local cross-channel interaction, obtaining cross-channel interaction information in a lightweight manner. The specific steps are as follows:
1. performing a global average pooling operation on the input feature layer;
2. performing a one-dimensional convolution with kernel size k and obtaining the weight ω of each channel through a Sigmoid activation function;
3. multiplying the weights element-wise with the corresponding channels of the original input feature map.
It can be understood that the efficient channel attention (ECA) module in the embodiment of the present application adopts a local cross-channel interaction strategy without dimensionality reduction, which effectively avoids the influence of dimensionality reduction on the learning effect of channel attention; appropriate cross-channel interaction can significantly reduce the complexity of the model while maintaining performance.
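A minimal PyTorch sketch of such an ECA-style module is shown below. The fixed kernel size k is an illustrative simplification of the adaptively determined interaction range described above, and k should be odd so the padding preserves the channel dimension.

```python
import torch
import torch.nn as nn

class ECAAttention(nn.Module):
    """Efficient Channel Attention: global average pooling, a 1-D convolution of
    kernel size k across channels (no dimensionality reduction), a Sigmoid that
    yields per-channel weights, and element-wise re-weighting of the input."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                       # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                  # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1))           # 1-D conv across channels -> (B, 1, C)
        w = self.sigmoid(y).squeeze(1)          # per-channel weights -> (B, C)
        return x * w.view(x.size(0), -1, 1, 1)  # multiply weights with the input map
```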
In an embodiment of the present application, the pooling network comprises a plurality of convolution kernels, which are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
It should be noted that the pooling-layer convolution kernels of the original YOLOv4 are 1×1, 5×5, 9×9 and 13×13; in the embodiment of the present application the pooling kernel sizes are refined to 1×1, 4×4, 7×7, 10×10 and 13×13, which effectively enlarges the range of multi-scale receptive fields over the feature map. After the refined pooling module, the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
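The following is a minimal PyTorch sketch of an SPP-style module using these kernel sizes. The stride-1, size-preserving max pooling and the negative-infinity padding used to keep the even kernels size-preserving are implementation assumptions for illustration.

```python
import torch
import torch.nn as nn

class RefinedSPP(nn.Module):
    """SPP variant with max-pooling kernels 1, 4, 7, 10 and 13 (stride 1, padded so
    the spatial size is preserved); pooled maps are concatenated along the channels,
    so the output has 5x the input channel count."""
    def __init__(self, kernel_sizes=(1, 4, 7, 10, 13)):
        super().__init__()
        self.pools = nn.ModuleList()
        for k in kernel_sizes:
            pad = k - 1  # total padding; even kernels need an asymmetric split
            self.pools.append(nn.Sequential(
                # -inf padding so the padded border never wins the max
                nn.ConstantPad2d((pad // 2, pad - pad // 2, pad // 2, pad - pad // 2),
                                 float("-inf")),
                nn.MaxPool2d(kernel_size=k, stride=1),
            ))

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools], dim=1)
```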
Specifically, the present application improves the original YOLOv4 target detection model (shown in fig. 4); the improved YOLOv4 target detection model is shown in fig. 5. The improvements fall into the following three aspects:
1. To address the difficulty and low detection rate of detecting small target vehicles with machine vision, the YOLOv4 network structure is improved. The improved structure draws on the feature extraction strength of densely connected networks and fuses a dense connection structure into the CSPResNet structure so that shallow feature information is fully reused; a densely connected network is one that establishes short-circuit connections between earlier and later feature layers.
2. Channel attention mechanisms have proven to hold great potential for improving the performance of deep convolutional neural networks. The efficient channel attention (ECA) module adopts a local cross-channel interaction strategy without dimensionality reduction, effectively avoiding the influence of dimensionality reduction on the learning effect of channel attention. Appropriate cross-channel interaction can significantly reduce the complexity of the model while maintaining performance.
3. The pooling-layer convolution kernels of the original YOLOv4 are 1×1, 5×5, 9×9 and 13×13; the pooling kernel sizes are refined to 1×1, 4×4, 7×7, 10×10 and 13×13, enlarging the receptive-field range. Through the refined pooling module, the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
As shown in fig. 6, a comparison of the P-R curves of the original YOLOv4 and the improved YOLOv4 shows that the improved YOLOv4 algorithm detects small targets in road scenes better than the original YOLOv4 and has a lower miss rate. The results before and after the improvement also differ in the confidence of the detected targets: for targets detected by both models, the confidence values output by the improved YOLOv4 are generally higher, indicating that the improved network has stronger detection capability and attends better to the key information of the targets. Compared with the original YOLOv4, the improved YOLOv4 algorithm has a slightly higher detection speed, its average precision is improved by 2.61% to 92.63%, and it performs better on small target detection with a lower miss rate.
The vehicle target recognition method is described below with a specific embodiment. As shown in fig. 7, the steps are as follows:
1. collecting images of the environment around the vehicle and producing a data set;
2. dividing the data set into a validation set, a training set and a test set;
3. constructing a vehicle detection model based on the improved YOLOv4, with dense connections, the ECA attention mechanism and the modified pooling convolution kernels added;
4. training and tuning the model;
5. evaluating the performance of the trained preceding-vehicle model with the validation set;
6. building the development platform, reading the monocular camera and performing video prediction with the model (a minimal sketch of this step follows the list).
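A sketch of the video-prediction step (step 6) might look as follows; the exported model file name, the 416×416 input size, the confidence-threshold drawing left as a comment and the TorchScript loading path are placeholders and assumptions, not details taken from this embodiment.

```python
import cv2
import torch

# Placeholder: a TorchScript export of the improved YOLOv4 model (file name is an assumption).
model = torch.jit.load("improved_yolov4_scripted.pt").eval()

cap = cv2.VideoCapture(0)  # monocular camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(cv2.resize(frame, (416, 416)), cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        detections = model(x)  # boxes, confidences and classes for the current frame
    # ...draw detections above a chosen confidence threshold onto 'frame' here...
    cv2.imshow("preceding vehicle detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```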
According to the vehicle target recognition method provided by the embodiment of the application, dense connections are added to the residual network structure, improving the feature extraction capability of the network, in particular for small targets, while reducing the number of parameters; an attention mechanism is added at the output feature layers of the backbone feature network, selectively emphasizing informative features without increasing network depth and thereby improving the accuracy of the algorithm; and the convolution kernel sizes of the pooling layer are changed, improving the fusion of multi-scale receptive-field information in the feature map, so that after the pooling module the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
Next, an object recognition apparatus of a vehicle according to an embodiment of the present application is described with reference to the drawings.
Fig. 8 is a block diagram schematically illustrating an object recognition device of a vehicle according to an embodiment of the present application.
As shown in fig. 8, the object recognition device 10 of the vehicle includes: an acquisition module 100 and a detection module 200.
The acquiring module 100 is used for acquiring an environment image around the vehicle. The detection module 200 is configured to input the environment image into a pre-trained target detection model and output a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
In the embodiment of the application, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
In the embodiment of the present application, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
In an embodiment of the present application, the pooling network comprises a plurality of convolution kernels, which are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
It should be noted that the foregoing explanation of the embodiment of the vehicle target identification method is also applicable to the vehicle target identification device of this embodiment, and details are not repeated here.
According to the vehicle target recognition device provided by the embodiment of the application, dense connections are added to the residual network structure, improving the feature extraction capability of the network, in particular for small targets, while reducing the number of parameters; an attention mechanism is added at the output feature layers of the backbone feature network, selectively emphasizing informative features without increasing network depth and thereby improving the accuracy of the algorithm; and the convolution kernel sizes of the pooling layer are changed, improving the fusion of multi-scale receptive-field information in the feature map, so that after the pooling module the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
Fig. 9 is a schematic structural diagram of a vehicle according to an embodiment of the present application. The vehicle may include a memory 901, a processor 902 and a communication interface 903. The processor 902, when executing the program, implements the vehicle target recognition method provided in the above embodiments. The communication interface 903 is used for communication between the memory 901 and the processor 902, and the memory 901 is used for storing a computer program executable on the processor 902.
The Memory 901 may include a high-speed RAM (Random Access Memory) Memory, and may also include a nonvolatile Memory, such as at least one disk Memory.
If the memory 901, the processor 902, and the communication interface 903 are implemented independently, the communication interface 903, the memory 901, and the processor 902 may be connected to each other through a bus and perform communication with each other. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 901, the processor 902, and the communication interface 903 are integrated on one chip, the memory 901, the processor 902, and the communication interface 903 may complete mutual communication through an internal interface.
The processor 902 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or one or more Integrated circuits configured to implement embodiments of the present Application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the object identification method of the vehicle as above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array, a field programmable gate array, or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A method of identifying an object of a vehicle, comprising the steps of:
acquiring an environment image around a vehicle;
inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
2. The method according to claim 1, wherein the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
3. The method of claim 2, wherein the attention mechanism of the attention mechanism network comprises:
performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel;
and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
4. The method of claim 2, wherein the pooling network comprises a plurality of convolution kernels, wherein the plurality of convolution kernels are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
5. An object recognition apparatus of a vehicle, characterized by comprising:
the system comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring an environment image around a vehicle;
the detection module is used for inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image, wherein the target detection model comprises a residual error network, a feature extraction network and a prediction network, the residual error network comprises a plurality of feature layers and a dense connection network, the dense connection network is used for splicing output features of a current feature layer with output features of all feature layers before the current feature layer and is used as input of a next feature layer, a feature map of the environment image is extracted by using the residual error network, the feature map is input into the feature extraction network, fusion features are output, the fusion features are input into the prediction network, and the target recognition result of the environment image is output.
6. The apparatus according to claim 5, wherein the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
7. The apparatus of claim 6, wherein the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
8. The apparatus of claim 6, wherein the pooling network comprises a plurality of convolution kernels, wherein the plurality of convolution kernels are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
9. A vehicle, characterized by comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the vehicle target recognition method as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for implementing an object recognition method of a vehicle according to any one of claims 1-4.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211350460.8A | 2022-10-31 | 2022-10-31 | Image front vehicle target identification method based on convolutional neural network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115661767A | 2023-01-31 |

Family ID: 84994693
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116416504A | 2023-03-16 | 2023-07-11 | | Expressway foreign matter detection system and method based on vehicle cooperation |
| CN116416504B | 2023-03-16 | 2024-02-06 | | Expressway foreign matter detection system and method based on vehicle cooperation |
| CN116189115A | 2023-04-24 | 2023-05-30 | 青岛创新奇智科技集团股份有限公司 | Vehicle type recognition method, electronic device and readable storage medium |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |