CN115661767A - Image front vehicle target identification method based on convolutional neural network
- Publication number: CN115661767A
- Application number: CN202211350460.8A
- Authority: CN (China)
- Legal status: Pending
Abstract
The application relates to the technical field of vehicle target recognition, and in particular to a method, based on a convolutional neural network, for identifying a preceding-vehicle target in an image. The method comprises: acquiring an environment image around a vehicle; and inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image. The target detection model comprises a residual network, a feature extraction network and a prediction network. The residual network comprises a plurality of feature layers and a densely connected network, where the dense connections splice the output features of the current feature layer with the output features of all preceding feature layers to form the input of the next feature layer. The residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image. This addresses the problems in the related art that small targets in road scenes are difficult to detect and the detection rate is low.
Description
Technical Field
The application relates to the technical field of vehicle target recognition, and in particular to a method, based on a convolutional neural network, for identifying a preceding-vehicle target in an image.
Background
YOLO (You Only Look Once) is a network for target detection. From the original YOLOv1 to the current YOLOv4, the detection accuracy and speed of the algorithm have improved markedly. The network structure of YOLOv4 can be divided into three parts: the backbone feature extraction network CSPDarknet53, an enhanced feature extraction network, and the prediction network YOLO Head.
The backbone feature extraction network CSPDarknet53 augments the ResNet (Residual Neural Network) structure used in YOLOv3 with a CSPNet (Cross Stage Partial Network) structure, but its feature extraction capability still needs strengthening. In addition, the number and sizes of the pooling kernels used by the SPP (Spatial Pyramid Pooling) module in YOLOv4 cannot sufficiently fuse multi-scale receptive-field information on large feature maps, which limits the achievable improvement in detection performance.
Disclosure of Invention
The application provides a target recognition method and apparatus for a vehicle, a vehicle, and a storage medium, aiming to solve the problems in the related art that small targets in road scenes are difficult to detect and the detection rate is low.
An embodiment of a first aspect of the present application provides a vehicle target recognition method, comprising the following steps: acquiring an environment image around a vehicle; and inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
Optionally, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
Optionally, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
Optionally, the pooling network comprises a plurality of convolution kernels, which may be 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
In a second aspect, an embodiment of the present application provides a vehicle target recognition apparatus, comprising: an acquisition module, configured to acquire an environment image around the vehicle; and an identification module, configured to input the environment image into a pre-trained target detection model and output a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
Optionally, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
Optionally, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
Optionally, the pooling network comprises a plurality of convolution kernels, which may be 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
An embodiment of a third aspect of the present application provides a vehicle, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the vehicle target recognition method described in the above embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement the vehicle target recognition method described in the above embodiments.
Therefore, the application has at least the following beneficial effects:
Dense connections are added to the residual network structure, which improves the feature extraction capability of the network, in particular for small targets, while reducing the number of parameters. An attention mechanism is added at the output feature layers of the backbone feature network, selectively emphasizing informative features without increasing network depth, thereby improving the accuracy of the algorithm. The convolution kernel sizes of the pooling layer are changed, improving the fusion of multi-scale receptive-field information in the feature map; after passing through the pooling module, the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented. The technical problems in the related art that small targets in road scenes are difficult to detect and the detection rate is low are thus solved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for identifying an object of a vehicle according to an embodiment of the present application;
FIG. 2 is a diagram of a residual structure with an added dense connection structure provided in accordance with an embodiment of the present application;
FIG. 3 is a diagram of the ECA (Efficient Channel Attention) attention mechanism provided in accordance with an embodiment of the present application;
FIG. 4 is a diagram of the YOLOv4 target detection model according to the related art;
FIG. 5 is a diagram of the improved YOLOv4 target detection model provided in accordance with an embodiment of the present application;
FIG. 6 is a comparison graph of P-R curves for vehicle detection provided in accordance with an embodiment of the present application;
FIG. 7 is a flow chart of a method of object identification of a vehicle according to one embodiment of the present application;
FIG. 8 is an exemplary diagram of an object recognition device of a vehicle provided in accordance with an embodiment of the present application;
fig. 9 is a schematic structural diagram of a vehicle according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
In the related art, target detection is implemented with YOLO, a network for target detection whose accuracy and speed have improved markedly from the original YOLOv1 to the current YOLOv4. The network structure of YOLOv4 can be divided into three parts: the backbone feature extraction network CSPDarknet53; the enhanced feature extraction network, which comprises an SPP (Spatial Pyramid Pooling) module and a PANet (Path Aggregation Network); and the prediction network YOLO Head, which predicts results from the extracted features. However, the following disadvantages remain:
1. The backbone feature extraction network CSPDarknet53 replaces the ResNet structure of YOLOv3 with a CSPResNet structure. In the CSPResNet structure, the input layer is split into two parts, each processed by its own convolutions: the right part performs feature extraction with stacked ResBlock residual blocks, while the left part is passed through convolution, regularization and an activation function and is then concatenated with the output feature map of the right part (a code sketch of this split is given after this list). Its feature extraction capability, however, still needs to be strengthened.
2. When an image is recognized with a deep CNN (Convolutional Neural Network) model, local information of the image is generally extracted by convolution kernels. However, different pieces of local information contribute differently to whether the image can be recognized correctly, and letting the model learn the importance of the different pieces of local information is a key problem. The backbone network contains no attention mechanism, which can bias the weights learned during training; an attention mechanism can strengthen feature fusion by combining channel and spatial attention and improve the ability to extract features.
3. As the network hierarchy of a convolutional neural network deepens, the image information in deep feature maps becomes highly abstract: semantic information increases while direct image feature information is lost, so detecting small targets from the deep feature maps of the neural network remains difficult and the accuracy of the model needs to be improved. The SPP module can fuse multi-scale local features with global features and enrich the expressive power of the feature map, but the number and sizes of the pooling kernels used by the SPP module in YOLOv4 cannot fully fuse multi-scale receptive-field information on large feature maps, which limits the achievable improvement in detection performance.
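To make the split described in point 1 concrete, the following is a minimal PyTorch sketch of a CSPResNet-style block. The class names, the half-and-half channel split, the number of stacked residual blocks and the fusing convolution after the concatenation are illustrative assumptions rather than the exact CSPDarknet53 configuration.

```python
import torch
import torch.nn as nn

class ConvBNMish(nn.Module):
    """Convolution + batch norm (regularization) + Mish activation."""
    def __init__(self, in_ch, out_ch, k=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResBlock(nn.Module):
    """Plain residual block used on the right-hand branch."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(ConvBNMish(ch, ch // 2, 1), ConvBNMish(ch // 2, ch, 3))

    def forward(self, x):
        return x + self.block(x)

class CSPResBlock(nn.Module):
    """CSP split: the left branch is a single conv path, the right branch stacks
    residual blocks; the two outputs are concatenated along the channel axis.
    Assumes the channel count ch is divisible by 4."""
    def __init__(self, ch, n_blocks=2):
        super().__init__()
        self.left = ConvBNMish(ch, ch // 2, 1)          # shortcut-style branch
        self.right_in = ConvBNMish(ch, ch // 2, 1)
        self.right = nn.Sequential(*[ResBlock(ch // 2) for _ in range(n_blocks)])
        self.fuse = ConvBNMish(ch, ch, 1)               # transition conv after concat

    def forward(self, x):
        left = self.left(x)
        right = self.right(self.right_in(x))
        return self.fuse(torch.cat([left, right], dim=1))
```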
A vehicle target recognition method and apparatus and a storage medium according to embodiments of the present application are described below with reference to the drawings. Addressing the poor target feature extraction capability and low detection performance of the existing target detection network YOLOv4 mentioned in the background, the application provides a vehicle target recognition method in which the YOLOv4 network structure is improved: a dense connection structure is fused into the CSPResNet structure so that shallow feature information is fully reused; the channel attention mechanism adopts a local cross-channel interaction strategy without dimensionality reduction, avoiding the influence of dimensionality reduction on the learning effect of channel attention; and the pooling kernel sizes in YOLOv4 are refined so that more image details are preserved. This solves the problems in the related art that, when detecting vehicle targets, small target vehicles are difficult to detect and the detection rate and detection performance are low.
Specifically, fig. 1 is a schematic flowchart of a method for identifying a target of a vehicle according to an embodiment of the present disclosure.
As shown in fig. 1, the object recognition method of a vehicle includes the steps of:
in step S101, an environment image around the vehicle is acquired.
The environment image around the vehicle may be acquired by a vehicle-mounted camera or in other ways; this is not specifically limited here.
In step S102, the environment image is input into a pre-trained target detection model and a target recognition result of the environment image is output, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, which splices the output features of the current feature layer with the output features of all preceding feature layers to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
It can be understood that, in the embodiment of the present application, the environment image may be input into the pre-trained target detection model to obtain the output target recognition result of the environment image.
The residual network comprises a plurality of feature layers and a densely connected network, and the prediction network predicts results from the extracted features and outputs the recognition result of the environment image.
It can be understood that, in the embodiment of the present application, a densely connected network is added to the residual network. As shown in fig. 2, every preceding feature layer is densely connected to the current feature layer: starting from the input feature layer, the output of each layer serves as part of the input of every subsequent layer, so shallow feature information is fully reused and the ability to detect small target objects is improved. In the densely connected network, the layers are related by

x_n = H_n([x_0, x_1, …, x_{n-2}, x_{n-1}]),

where [x_0, x_1, …, x_{n-2}, x_{n-1}] denotes the splicing, in the channel direction, of the newly added input x_{n-1} with all previous input feature layers x_0, x_1, …, x_{n-3}, x_{n-2}; H_n is the transfer function of the n-th layer; and x_n is the output feature of the n-th layer.
That is, the actual input of the n-th layer is the channel-wise splicing [x_0, x_1, …, x_{n-2}, x_{n-1}] of the newly added input x_{n-1} with all previous input feature layers. This input is passed through regularization, an activation function and a convolution operation to obtain the output feature layer x_n of the n-th layer, and x_n spliced with all previous feature layers, [x_0, x_1, …, x_{n-1}, x_n], then serves as the input of the (n+1)-th layer. The number k of channels newly added per layer is usually kept small and is called the growth rate. Because splicing is used instead of the element-wise addition of the residual network, and the number k of channels added each time is small, the connections appear dense and complicated, but the actual number of parameters and the amount of computation are smaller than in the residual network.
It should be noted that, in the embodiment of the present application, to address the incomplete reuse of shallow feature information in the residual block structure used by the YOLOv4 backbone feature extraction network CSPDarknet53, a dense connection block (DenseBlock) with the same number of layers as the original residual block structure is used. Such a block satisfies the requirement of the CSPNet skip connection that the output of the stacked blocks has the same size as the input feature map, i.e. it does not change the size of the input feature map. The structure can therefore be realized with DenseBlocks alone, without the downsampling operation of a transition layer.
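The following is a minimal PyTorch sketch of such a dense connection block. The growth rate, the number of layers and the regularization-activation-convolution ordering inside the transfer function H_n are illustrative assumptions, not values fixed by this embodiment.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One transfer function H_n: BN -> activation -> 3x3 conv producing k new channels."""
    def __init__(self, in_ch, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.act = nn.LeakyReLU(0.1)
        self.conv = nn.Conv2d(in_ch, growth_rate, 3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.act(self.bn(x)))

class DenseBlock(nn.Module):
    """Dense block: the input of layer n is the channel-wise splicing
    [x_0, x_1, ..., x_{n-1}] of all earlier outputs. The spatial size never changes,
    so the block can stand in for a residual-block stack inside the CSP structure.
    Output channels: in_ch + n_layers * growth_rate."""
    def __init__(self, in_ch, growth_rate=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_ch + i * growth_rate, growth_rate) for i in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_feat = layer(torch.cat(features, dim=1))  # splice all previous outputs
            features.append(new_feat)
        return torch.cat(features, dim=1)
```

Because each layer only adds a small number k of new channels and reuses all earlier ones by concatenation, the spatial size of the feature map is preserved, which is what allows the block to replace the residual-block stack inside the CSP skip connection.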
In the embodiment of the application, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
It should be noted that, since channel attention mechanisms have been shown to hold great potential for improving the performance of deep convolutional neural networks, the embodiment of the present application arranges an attention mechanism at the output features of the feature layers and before the pooling network.
It can be understood that, in the embodiment of the present application, the feature map may be converted into feature vectors by the pooling network, and the feature map and the feature vectors may be fed by the attention mechanism network into the feature pyramid network, which outputs fused features of the image features and the feature vectors; the fused features are then used to output the target recognition result of the environment image.
In the embodiment of the present application, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
The preset activation function is a function that can compute the weight of each channel, for example a Sigmoid activation function.
Specifically, in the embodiment of the present application, an attention mechanism is added at the output feature layers of the backbone network. As shown in fig. 3, ECA-Net (a lightweight channel attention module) assigns different weights to the channels according to their importance and models them by adaptively determining the range of local cross-channel interaction, obtaining cross-channel interaction information in a lightweight manner. The specific steps are as follows:
1. performing a global average pooling operation on the input feature layer;
2. performing a one-dimensional convolution with kernel size k and obtaining the weight ω of each channel through a Sigmoid activation function;
3. multiplying the weights element-wise with the corresponding channels of the original input feature map.
It can be understood that the efficient channel attention (ECA) module in the embodiment of the present application adopts a local cross-channel interaction strategy without dimensionality reduction, which effectively avoids the influence of dimensionality reduction on the learning effect of channel attention; appropriate cross-channel interaction can significantly reduce the complexity of the model while maintaining performance.
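A minimal PyTorch sketch of such an ECA-style module is shown below. The fixed kernel size k is an illustrative simplification of the adaptively determined interaction range described above, and k should be odd so the padding preserves the channel dimension.

```python
import torch
import torch.nn as nn

class ECAAttention(nn.Module):
    """Efficient Channel Attention: global average pooling, a 1-D convolution of
    kernel size k across channels (no dimensionality reduction), a Sigmoid that
    yields per-channel weights, and element-wise re-weighting of the input."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                       # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                  # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1))           # 1-D conv across channels -> (B, 1, C)
        w = self.sigmoid(y).squeeze(1)          # per-channel weights -> (B, C)
        return x * w.view(x.size(0), -1, 1, 1)  # multiply weights with the input map
```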
In an embodiment of the present application, the pooling network comprises a plurality of convolution kernels, which are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
It should be noted that the pooling-layer convolution kernels of the original YOLOv4 are 1×1, 5×5, 9×9 and 13×13; in the embodiment of the present application the pooling kernel sizes are refined to 1×1, 4×4, 7×7, 10×10 and 13×13, which effectively enlarges the range of multi-scale receptive fields over the feature map. After the refined pooling module, the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
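The following is a minimal PyTorch sketch of an SPP-style module using these kernel sizes. The stride-1, size-preserving max pooling and the negative-infinity padding used to keep the even kernels size-preserving are implementation assumptions for illustration.

```python
import torch
import torch.nn as nn

class RefinedSPP(nn.Module):
    """SPP variant with max-pooling kernels 1, 4, 7, 10 and 13 (stride 1, padded so
    the spatial size is preserved); pooled maps are concatenated along the channels,
    so the output has 5x the input channel count."""
    def __init__(self, kernel_sizes=(1, 4, 7, 10, 13)):
        super().__init__()
        self.pools = nn.ModuleList()
        for k in kernel_sizes:
            pad = k - 1  # total padding; even kernels need an asymmetric split
            self.pools.append(nn.Sequential(
                # -inf padding so the padded border never wins the max
                nn.ConstantPad2d((pad // 2, pad - pad // 2, pad // 2, pad - pad // 2),
                                 float("-inf")),
                nn.MaxPool2d(kernel_size=k, stride=1),
            ))

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools], dim=1)
```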
Specifically, the present application improves the original YOLOv4 target detection model (shown in fig. 4); the improved YOLOv4 target detection model is shown in fig. 5. The improvements fall into the following three aspects:
1. To address the difficulty and low detection rate of detecting small target vehicles with machine vision, the YOLOv4 network structure is improved. The improved structure draws on the feature extraction strength of densely connected networks and fuses a dense connection structure into the CSPResNet structure so that shallow feature information is fully reused; a densely connected network is one that establishes short-circuit connections between earlier and later feature layers.
2. Channel attention mechanisms have proven to hold great potential for improving the performance of deep convolutional neural networks. The efficient channel attention (ECA) module adopts a local cross-channel interaction strategy without dimensionality reduction, effectively avoiding the influence of dimensionality reduction on the learning effect of channel attention. Appropriate cross-channel interaction can significantly reduce the complexity of the model while maintaining performance.
3. The pooling-layer convolution kernels of the original YOLOv4 are 1×1, 5×5, 9×9 and 13×13; the pooling kernel sizes are refined to 1×1, 4×4, 7×7, 10×10 and 13×13, enlarging the receptive-field range. Through the refined pooling module, the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
As shown in fig. 6, a comparison of the P-R curves of the original YOLOv4 and the improved YOLOv4 shows that the improved YOLOv4 algorithm detects small targets in road scenes better than the original YOLOv4 and has a lower miss rate. The results before and after the improvement also differ in the confidence of the detected targets: for targets detected by both models, the confidence values output by the improved YOLOv4 are generally higher, indicating that the improved network has stronger detection capability and attends better to the key information of the targets. Compared with the original YOLOv4, the improved YOLOv4 algorithm has a slightly higher detection speed, its average precision is improved by 2.61% to 92.63%, and it performs better on small target detection with a lower miss rate.
The vehicle target recognition method is described below with a specific embodiment. As shown in fig. 7, the steps are as follows:
1. collecting images of the environment around the vehicle and producing a data set;
2. dividing the data set into a validation set, a training set and a test set;
3. constructing a vehicle detection model based on the improved YOLOv4, with dense connections, the ECA attention mechanism and the modified pooling convolution kernels added;
4. training and tuning the model;
5. evaluating the performance of the trained preceding-vehicle model with the validation set;
6. building the development platform, reading the monocular camera and performing video prediction with the model (a minimal sketch of this step follows the list).
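A sketch of the video-prediction step (step 6) might look as follows; the exported model file name, the 416×416 input size, the confidence-threshold drawing left as a comment and the TorchScript loading path are placeholders and assumptions, not details taken from this embodiment.

```python
import cv2
import torch

# Placeholder: a TorchScript export of the improved YOLOv4 model (file name is an assumption).
model = torch.jit.load("improved_yolov4_scripted.pt").eval()

cap = cv2.VideoCapture(0)  # monocular camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(cv2.resize(frame, (416, 416)), cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        detections = model(x)  # boxes, confidences and classes for the current frame
    # ...draw detections above a chosen confidence threshold onto 'frame' here...
    cv2.imshow("preceding vehicle detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```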
According to the vehicle target recognition method provided by the embodiment of the application, dense connections are added to the residual network structure, improving the feature extraction capability of the network, in particular for small targets, while reducing the number of parameters; an attention mechanism is added at the output feature layers of the backbone feature network, selectively emphasizing informative features without increasing network depth and thereby improving the accuracy of the algorithm; and the convolution kernel sizes of the pooling layer are changed, improving the fusion of multi-scale receptive-field information in the feature map, so that after the pooling module the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
Next, an object recognition apparatus of a vehicle according to an embodiment of the present application is described with reference to the drawings.
Fig. 8 is a block diagram schematically illustrating an object recognition device of a vehicle according to an embodiment of the present application.
As shown in fig. 8, the object recognition device 10 of the vehicle includes: an acquisition module 100 and a detection module 200.
The acquiring module 100 is used for acquiring an environment image around the vehicle. The detection module 200 is configured to input the environment image into a pre-trained target detection model and output a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
In the embodiment of the application, the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
In the embodiment of the present application, the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
In an embodiment of the present application, the pooling network comprises a plurality of convolution kernels, which are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
It should be noted that the foregoing explanation of the embodiment of the vehicle target identification method is also applicable to the vehicle target identification device of this embodiment, and details are not repeated here.
According to the vehicle target recognition device provided by the embodiment of the application, dense connections are added to the residual network structure, improving the feature extraction capability of the network, in particular for small targets, while reducing the number of parameters; an attention mechanism is added at the output feature layers of the backbone feature network, selectively emphasizing informative features without increasing network depth and thereby improving the accuracy of the algorithm; and the convolution kernel sizes of the pooling layer are changed, improving the fusion of multi-scale receptive-field information in the feature map, so that after the pooling module the image feature information carried by the neural network is richer and more complex, and finer details of the image are represented.
Fig. 9 is a schematic structural diagram of a vehicle according to an embodiment of the present application. The vehicle may include a memory 901, a processor 902 and a communication interface 903. The processor 902, when executing the program, implements the vehicle target recognition method provided in the above embodiments. The communication interface 903 is used for communication between the memory 901 and the processor 902, and the memory 901 is used for storing a computer program executable on the processor 902.
The Memory 901 may include a high-speed RAM (Random Access Memory) Memory, and may also include a nonvolatile Memory, such as at least one disk Memory.
If the memory 901, the processor 902, and the communication interface 903 are implemented independently, the communication interface 903, the memory 901, and the processor 902 may be connected to each other through a bus and perform communication with each other. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 901, the processor 902, and the communication interface 903 are integrated on one chip, the memory 901, the processor 902, and the communication interface 903 may complete mutual communication through an internal interface.
The processor 902 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or one or more Integrated circuits configured to implement embodiments of the present Application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the object identification method of the vehicle as above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array, a field programmable gate array, or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A method of identifying an object of a vehicle, comprising the steps of:
acquiring an environment image around a vehicle;
inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image, wherein the target detection model comprises a residual network, a feature extraction network and a prediction network; the residual network comprises a plurality of feature layers and a densely connected network, the densely connected network splicing the output features of the current feature layer with the output features of all feature layers before the current feature layer to form the input of the next feature layer; the residual network extracts a feature map of the environment image, the feature map is input into the feature extraction network to output fused features, and the fused features are input into the prediction network to output the target recognition result of the environment image.
2. The method according to claim 1, wherein the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
3. The method of claim 2, wherein the attention mechanism of the attention mechanism network comprises:
performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel;
and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
4. The method of claim 2, wherein the pooling network comprises a plurality of convolution kernels, wherein the plurality of convolution kernels are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
5. An object recognition apparatus of a vehicle, characterized by comprising:
the system comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring an environment image around a vehicle;
the detection module is used for inputting the environment image into a pre-trained target detection model and outputting a target recognition result of the environment image, wherein the target detection model comprises a residual error network, a feature extraction network and a prediction network, the residual error network comprises a plurality of feature layers and a dense connection network, the dense connection network is used for splicing output features of a current feature layer with output features of all feature layers before the current feature layer and is used as input of a next feature layer, a feature map of the environment image is extracted by using the residual error network, the feature map is input into the feature extraction network, fusion features are output, the fusion features are input into the prediction network, and the target recognition result of the environment image is output.
6. The apparatus according to claim 5, wherein the feature extraction network comprises a feature pyramid network and a pooling network, wherein an attention mechanism network is arranged between the output features of the feature layers and the pooling network, and between the output features of the feature layers and the feature pyramid network; the pooling network converts the feature map into feature vectors, the attention mechanism network feeds the feature map and the feature vectors into the feature pyramid network, and fused features of the image features and the feature vectors are output.
7. The apparatus of claim 6, wherein the attention mechanism of the attention mechanism network comprises: performing a global average pooling operation on the feature maps of all channels output by the feature layer to obtain a new feature map for each channel; and performing a one-dimensional convolution with kernel size k across the channels, obtaining the weight of each channel through a preset activation function, and obtaining cross-channel interaction information from the new feature maps and the channel weights.
8. The apparatus of claim 6, wherein the pooling network comprises a plurality of convolution kernels, wherein the plurality of convolution kernels are 1×1, 4×4, 7×7, 10×10 and 13×13, respectively.
9. A vehicle, characterized by comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the vehicle target recognition method as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for implementing an object recognition method of a vehicle according to any one of claims 1-4.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211350460.8A | 2022-10-31 | 2022-10-31 | Image front vehicle target identification method based on convolutional neural network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115661767A | 2023-01-31 |

Family ID: 84994693
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116416504A | 2023-03-16 | 2023-07-11 | | Expressway foreign matter detection system and method based on vehicle cooperation |
| CN116416504B | 2023-03-16 | 2024-02-06 | | Expressway foreign matter detection system and method based on vehicle cooperation |
| CN116189115A | 2023-04-24 | 2023-05-30 | 青岛创新奇智科技集团股份有限公司 | Vehicle type recognition method, electronic device and readable storage medium |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |