CN115187964A - Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
- Publication number: CN115187964A
- Application number: CN202211082826.8A
- Authority: CN (China)
- Prior art keywords: data, image, point cloud, layer, target detection
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G01C21/28 — Navigation in a road network with correlation of data from several navigational instruments
- G06N3/02 — Neural networks
- G06N3/08 — Learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/58 — Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/582 — Recognition of traffic signs
- G06V20/588 — Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
Abstract
The invention discloses an automatic driving decision-making method based on multi-sensor data fusion and an SoC chip, belonging to the technical field of machine learning and automatic driving. An image sensor acquires image data of the road and inputs it into a trained image target detection neural network model, which performs lane image target detection and outputs the target detection data of the lane image. A laser radar collects 3D point cloud data, which are input into a trained point cloud target detection neural network model for obstacle target detection; the result is fused with the obstacle information output by a binocular camera to generate obstacle position and distance data. The lane image data and the obstacle position and distance data are then fused, and the road condition information used for driving is corrected and serves as the basis for the automatic driving decision. The scheme fully meets the real-time requirements of automatic driving scenarios, and by fusing data from different sensors it greatly improves the accuracy of road condition analysis.
Description
Technical Field
The invention belongs to the technical field of machine learning and automatic driving, and particularly relates to an automatic driving decision method based on multi-sensor data fusion and an SoC chip.
Background
Automatic driving technology is receiving more and more attention from vehicle manufacturers, some of which are investing ever more manpower and material resources in developing automatic driving vehicles and even setting them as mass-production targets for the next 5-10 years. The realization of automatic driving is divided into three stages: cognition, judgment and control. Current automatic driving technology still has many problems in the cognition stage (e.g. road recognition and pedestrian recognition), in the judgment stage (e.g. condition judgment), and in path generation.
With the rapid development of artificial intelligence in recent years, its application in the field of automatic driving has become more and more common. Chinese patent CN114708566A discloses an automatic driving target detection method based on improved YOLOv4, with the following steps: acquire a common target detection data set and preprocess it with Mosaic augmentation; construct a new non-maximum suppression algorithm, Soft-CIOU-NMS, from NMS, Soft-NMS and the CIOU loss function; improve the YOLOv4 feature extraction network, extending the original three-scale prediction to four-scale prediction; replace the ordinary convolutions of YOLOv4 with depthwise separable convolutions to speed up detection; and improve the YOLOv4 network structure by adding a CBAM attention mechanism to strengthen feature extraction. However, when images alone serve as the basis for judgment, deviations in the moving image can introduce errors in the detection and classification steps, and an ill-chosen threshold or an error in image cropping or feature extraction can lead to a wrong judgment about the vehicle's operation and hence to an erroneous command.
With the continuous improvement and popularization of 3D equipment such as laser radars and depth cameras, automatic driving in real three-dimensional scenes has become possible, and the requirements placed on an automatic driving system for recognizing and detecting targets in complex scenes have risen, along with the demands for safety and convenience. An automatic driving device usually acquires data with image sensors, laser sensors and radar, and combines several sensors for comprehensive analysis so that the relevant operations can be carried out according to the analysis result. 2D target detection cannot satisfy the environment-perception needs of an unmanned vehicle, whereas 3D target detection can identify object categories together with length, width, height, rotation angle and other information in three-dimensional space. Applying 3D target detection to the targets in a scene lets the automatic vehicle estimate their actual positions and thus accurately predict and plan its own behavior and path, avoiding collisions and violations, greatly reducing traffic accidents, and advancing the intelligence of urban traffic.
To address the wrong driving decisions caused by the dynamic image deviation of a single image sensor, Chinese patent CN114782729A provides a real-time target detection method based on laser radar and vision fusion, comprising: acquiring camera image data and three-dimensional laser radar scan points of the vehicle's surroundings, converting the point cloud data into a local rectangular coordinate system, and preprocessing the 3D point cloud; performing density clustering on the preprocessed 3D point cloud data and extracting 3D regions of interest of targets and the corresponding point cloud features; screening out sparse clusters of the target 3D regions of interest, mapping them to the corresponding image regions, extracting image features and fusing them with the point cloud features; and feeding the point cloud features and image features of all regions of interest into an SSD detector to locate and identify targets. The point cloud feature extraction algorithm may be PointNet++, PointNet, VoxelNet or SECOND. However, such schemes generally suffer from slow detection: PointNet runs at only 5.7 Hz; PointNet++ was proposed for dense point cloud data sets and its performance on sparse laser radar point clouds hardly meets requirements; VoxelNet uses 3D convolutions, which makes the computation excessive and limits it to 4.4 Hz; and although SECOND improves on VoxelNet and raises the processing speed to 20 Hz, it still struggles to meet the real-time requirements of automatic driving scenarios.
Disclosure of Invention
The invention provides an automatic driving decision-making method based on multi-sensor data fusion and an SoC (system on chip) chip, aiming to solve the prior-art problems of inefficient road condition information processing and the misjudgments that arise from relying on a single image sensor in automatic driving.
To solve the above technical problems, the automatic driving decision is made on the basis of multi-sensor data fusion, and different neural network models are trained to detect the image sensor data and the laser radar sensor data respectively. The specific scheme is as follows:
the automatic driving decision-making method based on multi-sensor data fusion comprises the following steps:
S1: an RGB image sensor collects image data of the road on which the vehicle is driving, the image data comprising lane line data, vehicle data, pedestrian data and traffic sign data;
S2: the lane line data, vehicle data, pedestrian data and traffic sign data are input into a trained image target detection neural network model, lane image feature extraction and feature fusion are performed, and the target detection data of the lane image are output, the image target detection neural network model adopting the YOLOv7 target detection algorithm;
S3: a laser radar collects 3D point cloud data; the point cloud data are input into a trained point cloud target detection neural network model, distance feature extraction and feature fusion are performed, and target position and distance data are output; these are fused with the target position and distance information output by a binocular camera to generate the final obstacle position and distance data, the neural network model adopting the PointPillar target detection algorithm;
S4: the lane image data generated in step S2 and the obstacle position and distance data generated in step S3 are fused, each sensor is analyzed for errors, and the road condition information for vehicle driving is corrected;
S5: corresponding decisions are made according to the road condition information corrected in step S4 and applied to automatic driving.
Preferably, the training process of the neural model for detecting image targets in step S2 specifically includes the following steps:
s2-1: establishing a data set of lanes, pedestrians and traffic signs, wherein the data set is used for training a neural network model;
s2-2: preprocessing the lane, pedestrian and traffic sign data sets to generate RGB format images with set resolution;
s2-3: sequentially enabling the format image to pass through an image feature extraction layer, an image feature fusion layer and an image target detection layer of a YOLOv7 network to obtain a neural network model;
s2-4: checking whether the training times reach a set target or not, if not, repeating the step S2-3 until the set training times are reached, and storing the neural network model as an image target detection neural network model.
Preferably, the training process of the point cloud target detection neural model in step S3 specifically includes the following steps:
s3-1: establishing a laser radar data set, wherein the data set is used for training a point cloud target detection neural model;
s3-2: preprocessing the laser radar data set to generate format point cloud data;
s3-3: sequentially passing the format point cloud data through a feature conversion layer, a feature extraction layer and a target detection layer of a PointPillar network to obtain a neural network model;
s3-4: checking whether the training times reach a set target or not, if not, repeating the step S3-3 until the set training times are reached, and storing the neural network model as a point cloud target detection neural network model.
Preferably, in the data fusion of step S4, the processed image data and the radar data are matched in a decision-layer fusion manner, and the obstacle position and distance detection results generated from the radar data are mapped onto the coordinates of the image data to form a comprehensive feature map.
Preferably, the YOLOv7 network model comprises an image input layer, an image feature extraction layer, an image feature fusion layer and an image target detection layer. The image input layer aligns the input images; the image feature extraction layer further comprises a plurality of convolution layers, batch normalization layers and max-pooling layers, and enriches the aligned images and extracts the features of lanes, vehicles and pedestrians; the image feature fusion layer fuses the features extracted at different stages to improve feature accuracy; and the image target detection layer detects the road condition features of the fused feature map and outputs the image detection result.
Preferably, the PointPillar network model comprises a point cloud feature conversion layer, a point cloud feature extraction layer and a point cloud target detection layer; the point cloud feature conversion layer converts the input point cloud into a sparse pseudo image; the point cloud feature extraction layer processes the pseudo image to obtain high-level features; and the point cloud target detection layer detects the position and distance of targets by regressing 3D bounding boxes.
Preferably, the traffic sign data are detected with an improved lightweight convolutional neural network that uses dilated convolution to implement a sliding-window method and uses statistical information from the data set to accelerate the forward propagation of the network, thereby improving the efficiency of traffic sign detection.
Preferably, the format of the point cloud data input into the Pillar feature layer is P × N × D, where P is the selected number of Pillars, N is the maximum number of points stored in each Pillar, and D is the dimension of the point features.
Preferably, the dimensional attribute of the point cloud is 9-dimensional data, characterized as:

(x, y, z, r, x_c, y_c, z_c, x_p, y_p)

where (x, y, z) are the three-dimensional coordinates of the original laser radar point, r is the laser reflection intensity, (x_c, y_c, z_c) is the offset of the laser point from the mean of the N points in its pillar, and (x_p, y_p) is the offset of the laser point from the pillar center coordinates.
An automatic driving decision SoC chip based on multi-sensor data fusion comprises a general-purpose processor and a neural network processor; the general-purpose processor controls the operation of the neural network processor through custom instructions, and the neural network processor is used to execute the method described above.
Compared with the prior art, the invention has the following technical effects:
1. The invention combines the laser radar and the binocular camera to detect the position and distance of obstacles. It efficiently exploits the accurate spatial information contained in the point cloud data, while the binocular camera compensates for the positioning errors of the laser radar when working in harsh environments, extending the applicable range of obstacle detection and meeting the robustness requirements of obstacle detection in automatic driving scenarios.
2. The invention adopts the PointPillar network model to process laser radar data. It operates on pillars rather than voxels, requires no manual tuning of the binning in the vertical direction, and represents the point cloud with pillars so that only 2D convolutions are needed for 3D point cloud detection. This greatly reduces the computational load and raises the processing speed to more than 62 Hz, effectively meeting the real-time requirements of automatic driving scenarios.
3. The invention uses different neural network models for different tasks, giving full play to the strengths of each model, allowing the data streams to be processed in parallel before the comprehensive data fusion, and improving the completeness of the decision method.
4. The method combines the lane image data, the image data of other traffic participants, the traffic sign image data and the obstacle position and distance data in a comprehensive decision-layer data fusion, so that drivable areas and obstacles are judged more accurately in automatic driving scenarios, target recognition capability is good, and the vehicle's perception of its surroundings becomes more accurate.
Drawings
FIG. 1 is a flow chart of an automated driving decision method based on multi-sensor data fusion in accordance with the present invention;
FIG. 2 is a flow chart of a YOLOv7 network model structure of the automatic driving decision method based on multi-sensor data fusion according to the present invention;
FIG. 3 is a flow chart of a PointPillar network model structure of the multi-sensor data fusion-based automatic driving decision method of the present invention.
In the figure: 1. an image input layer; 2. an image feature extraction layer; 3. an image target detection layer; 4. a point cloud feature conversion layer; 5. a point cloud feature extraction layer; 6. a point cloud target detection layer; 21. a convolution module; 22. a first pooling module; 23. a second pooling module; 24. a third pooling module; 41. point cloud data; 42. stacking column data; 43. acquiring characteristic data; 44. pseudo image data.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail and completely with reference to the accompanying drawings.
Referring to fig. 1-3, the present invention provides an automatic driving decision method based on multi-sensor data fusion, comprising the following steps:
S1: The RGB image sensor collects image data of the road on which the vehicle is driving, the image data comprising lane line data, vehicle data, pedestrian data, traffic sign data and data of other traffic participants.
S2: inputting lane line data, vehicle data, pedestrian data and traffic sign data into a trained image target detection neural network model, performing lane image feature extraction and feature fusion, and outputting target detection data of a lane image, wherein the neural network model adopts a YOLOv7 target detection algorithm.
S3: the method comprises the steps that 3D point cloud data are collected through a laser radar, the point cloud data are input into a trained point cloud target detection neural network model, distance feature extraction and feature fusion are conducted, target position and distance data are output, fusion is conducted on the target position and distance information output by a binocular camera, final obstacle position and distance data are generated, and a PointPillar target detection algorithm is adopted by the neural network model. The method can efficiently utilize the advantage that the point cloud data has accurate spatial information, and simultaneously utilizes the binocular camera to make up the problem that the laser radar has positioning errors when working in a severe environment, thereby expanding the application range of obstacle detection and meeting the robustness requirement of the obstacle detection in an automatic driving scene.
S4: and (3) carrying out data fusion on the lane image data generated in the step (S2) and the obstacle position distance data generated in the step (S3), analyzing whether errors exist in each sensor, and correcting the road condition information of vehicle driving.
S5: and making corresponding decision according to the road condition information corrected in the step S4 and applying the decision to automatic driving.
The training process of the image target detection neural model in the step S2 specifically comprises the following steps:
s2-1: and establishing a data set of lanes, pedestrians and traffic signs, wherein the data set comprises normal, crowded, night, lane-free lines, shadows, arrows, glare, curves, intersections and lane condition types under different weather and climate conditions, and also comprises pedestrians, animals, non-motor vehicles and other obstacles, and the data set is used for training a neural network model.
In this embodiment, in order to fully train the YOLOv7 neural network model, the TuSimple data set is used in cooperation with the CULane data set to train the detection of the targets of the lanes and the vehicles, and the RESIDE data set is used to train the detection of the targets of other traffic participants in road traffic.
S2-2: preprocessing a data set of lanes, pedestrians and traffic signs to generate an RGB format image with set resolution, wherein the format image is an RGB three-channel format image with 640 x 640 resolution according to the characteristics of the YOLOv7 input layer 1.
S2-3: and sequentially passing the format image through an image feature extraction layer 2, an image feature fusion layer and an image target detection layer 3 of a YOLOv7 network to obtain a neural network model.
S2-4: checking whether the training times reach a set target, if not, repeating the step S2-3 until the set training times are reached, and storing the neural network model as an image target detection neural network model.
The training process of the point cloud target detection neural network model in step S3 specifically comprises the following steps:
s3-1: and establishing a laser radar data set, wherein the data set is used for training a point cloud target detection neural model, and the data set can adopt data sets such as LiDAR-Video Driving Dataset, KITTI, pandaset, waymo, lyft Level 5, DAIR-V2X, nuScenes and the like.
S3-2: and preprocessing the laser radar data set to generate format point cloud data.
The data format of the point cloud inputted into the Pillar feature layer is P × N × D, where P is the number of selected Pillars, N is the maximum number of point clouds stored in each Pillar, and D is the dimensional attribute of the point cloud.
The dimensional attribute of the point cloud is 9-dimensional data, characterized as:

(x, y, z, r, x_c, y_c, z_c, x_p, y_p)

where (x, y, z) are the three-dimensional coordinates of the original laser radar point, r is the laser reflection intensity, (x_c, y_c, z_c) is the offset of the laser point from the mean of the N points in its pillar, and (x_p, y_p) is the offset of the laser point from the pillar center coordinates.
In this embodiment, the number of Pillars P is 30000 and the maximum number of points stored in each Pillar is 20. If a Pillar contains more than 20 points, 20 points are kept by random sampling and the rest are discarded; if a Pillar contains fewer than 20 points, it is padded with zeros. The point cloud data format input to the Pillar feature layer is therefore P × N × D (30000 × 20 × 9); a construction sketch is given after step S3-4 below.
S3-3: and sequentially passing the format point cloud data through a point cloud feature conversion layer 4, a point cloud feature extraction layer 5 and a point cloud target detection layer 6 of the PointPillar network to obtain a neural network model.
S3-4: checking whether the training times reach a set target, if not, repeating the step S3-3 until the set training times are reached, and storing the neural network model as a point cloud target detection neural network model.
The YOLOv7 network model comprises an image input layer 1, an image feature extraction layer 2, an image feature fusion layer and an image target detection layer 3. The image input layer aligns the input images; the image feature extraction layer further comprises a plurality of convolution layers, batch normalization layers and max-pooling layers, and enriches the aligned images and extracts the features of lanes, vehicles and pedestrians; the image feature fusion layer fuses the features extracted at different stages to improve feature accuracy; and the image target detection layer detects the road condition features of the fused feature map and outputs the image detection result.
The image feature extraction layer 2 comprises a convolution module 21, a first pooling module 22, a second pooling module 23 and a third pooling module 24 arranged in sequence. The convolution module 21 outputs a 4x down-sampled feature map B; the first pooling module 22 receives feature map B and outputs an 8x down-sampled feature map C; the second pooling module 23 receives feature map C and outputs a 16x down-sampled feature map D; and the third pooling module 24 receives feature map D and outputs a 32x down-sampled feature map E. The convolution module 21 consists of four CBR convolution layers followed by an ELAN layer. The first, second and third pooling modules 22, 23 and 24 each consist of a max-pooling layer MP1 followed by an ELAN layer.
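For orientation, the short script below walks through the spatial sizes of feature maps B, C, D and E for a 640 x 640 input using the stated down-sampling factors; the channel counts are illustrative assumptions and are not given in the patent.

```python
# Spatial sizes of feature maps B, C, D, E for a 640 x 640 input.
INPUT_SIZE = 640
for name, stride, channels in [("B", 4, 128), ("C", 8, 256), ("D", 16, 512), ("E", 32, 1024)]:
    side = INPUT_SIZE // stride
    print(f"feature map {name}: {side} x {side} x {channels} ({stride}x down-sampled)")
```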
The image target detection layer 3 performs pyramid pooling on feature map E and outputs target detection results at three different scales through three branches, each consisting of a RepVGG block layer (REP) and a convolution layer (CONV).
The PointPillar network model comprises a point cloud feature conversion layer 4, a point cloud feature extraction layer 5 and a point cloud target detection layer 6. The point cloud feature conversion layer 4 converts the input point cloud into a sparse pseudo image; the point cloud feature extraction layer 5 processes the pseudo image to obtain high-level features; and the point cloud target detection layer 6 performs bounding-box regression through an SSD detection head to detect the position and distance of targets with 3D boxes.
The point cloud feature conversion layer 4 aggregates the input P × N × D (30000 × 20 × 9) point cloud data 41 into stacked pillar data 42, then applies a simplified PointNet and a 1 × 1 convolution to each pillar to obtain the learned feature data 43, and finally scatters the features back to their original positions according to the pillar indices to obtain the pseudo image data 44, whose size is H × W × C (512 × 512 × 64), where H is the pixel height of the pseudo image, W its pixel width and C its number of channels.
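A minimal sketch of the scatter-back step described above: after the simplified PointNet and 1 x 1 convolution produce a C-dimensional feature per pillar, the features are written back at each pillar's grid index to form the sparse H x W x C pseudo image. The index layout is an assumption of this sketch.

```python
import numpy as np

H, W, C, P = 512, 512, 64, 30000

def scatter_to_pseudo_image(pillar_features: np.ndarray,
                            pillar_indices: np.ndarray) -> np.ndarray:
    """pillar_features: (P, C); pillar_indices: (P, 2) integer (row, col) per pillar."""
    canvas = np.zeros((H, W, C), dtype=np.float32)
    rows, cols = pillar_indices[:, 0], pillar_indices[:, 1]
    canvas[rows, cols] = pillar_features        # empty cells stay zero (sparse image)
    return canvas

if __name__ == "__main__":
    feats = np.random.rand(P, C).astype(np.float32)
    idx = np.random.randint(0, 512, size=(P, 2))
    print(scatter_to_pseudo_image(feats, idx).shape)      # (512, 512, 64)
```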
The processing flow of the point cloud feature extraction layer 5 comprises three steps: progressively down-sample the input pseudo image to form pyramid features; up-sample the corresponding features to a uniform size; and concatenate the uniform features. Each down-sampling stage is a block Block(S, L, F), where S is the stride relative to the pseudo image, L is the number of 3 × 3 2D convolution layers, and F is the number of output channels. Each up-sampling operation is written Up(S_in, S_out, F), where S_in and S_out are the input and output strides and F is the number of output channels; the resulting F-channel features are finally concatenated together.
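A minimal PyTorch sketch of one possible realization of the Block(S, L, F) / Up(S_in, S_out, F) pyramid described above; the layer counts and channel widths follow the reference PointPillar configuration and are assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

def block(in_ch, out_ch, stride, num_layers):
    """Down-sampling block: one strided 3x3 conv followed by (num_layers - 1) 3x3 convs."""
    layers = [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
              nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    for _ in range(num_layers - 1):
        layers += [nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def up(in_ch, out_ch, stride):
    """Up-sampling stage: transposed convolution back to a common resolution."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, stride, stride=stride, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

if __name__ == "__main__":
    x = torch.zeros(1, 64, 512, 512)              # pseudo image, C x H x W = 64 x 512 x 512
    f1 = block(64, 64, 2, 4)(x)                   # stride 2 relative to the pseudo image
    f2 = block(64, 128, 2, 6)(f1)                 # stride 4
    f3 = block(128, 256, 2, 6)(f2)                # stride 8
    feats = [up(64, 128, 1)(f1), up(128, 128, 2)(f2), up(256, 128, 4)(f3)]
    print(torch.cat(feats, dim=1).shape)          # concatenated pyramid features
```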
The traffic sign data are detected with an improved lightweight convolutional neural network that uses dilated convolution to implement a sliding-window method and uses statistical information from the data set to accelerate the forward propagation of the network, thereby improving the efficiency of traffic sign detection.
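A minimal sketch of the dilated-convolution sliding-window idea: a small classifier's final fully connected layer is expressed as a (dilated) convolution so that every window position of the full frame is scored in one forward pass. The network layout, class count and frame size are illustrative assumptions, not the patent's exact lightweight network.

```python
import torch
import torch.nn as nn

# Small backbone that would be trained on fixed-size traffic-sign crops (assumed).
crop_classifier_backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)
# The "fully connected over an 8x8 crop feature" is expressed as a dilated 8x8
# convolution, so it slides over the whole frame instead of a single crop.
dense_head = nn.Conv2d(32, 10, kernel_size=8, dilation=2)   # 10 = assumed sign classes

if __name__ == "__main__":
    frame = torch.zeros(1, 3, 160, 320)           # full camera frame (assumed size)
    scores = dense_head(crop_classifier_backbone(frame))
    print(scores.shape)                           # one 10-way score per window position
```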
In the data fusion of step S4, the processed image data and the radar data are matched in a decision-layer fusion manner, and the obstacle position and distance detection results generated from the obstacle data are mapped onto the coordinates of the image data to form a comprehensive feature map. Both the obstacle information data and the image data are converted into BEV (bird's-eye-view) coordinates. The obstacle information data can be regarded as a multi-channel image in polar coordinates whose channels carry Doppler features, and after coordinate conversion it becomes a multi-channel image in BEV; the image data can likewise be regarded as a multi-channel image in BEV after coordinate conversion. With the two data sources in the same coordinate system, they are fused at multiple scales using a Concat-based approach. Judging the road surface on which the vehicle is driving with the fused data makes the determination of drivable areas and obstacles in automatic driving scenarios more accurate, provides good target recognition capability, and improves the accuracy of the vehicle's perception of its surroundings.
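A minimal sketch of the decision-layer fusion described above: the polar obstacle measurements are rasterized onto a BEV grid, the image-derived detections are assumed to be available as another BEV grid, and the two are fused by channel-wise concatenation. Grid size, cell resolution and the rasterization rule are assumptions.

```python
import numpy as np

H, W = 256, 256                                   # BEV grid size in cells (assumed)

def polar_to_bev(ranges: np.ndarray, angles: np.ndarray,
                 values: np.ndarray, cell: float = 0.5) -> np.ndarray:
    """Rasterize polar obstacle measurements (range, angle, e.g. Doppler value) onto a BEV grid."""
    bev = np.zeros((H, W), dtype=np.float32)
    x = ranges * np.cos(angles)
    y = ranges * np.sin(angles)
    col = np.clip((x / cell).astype(int) + W // 2, 0, W - 1)
    row = np.clip((y / cell).astype(int) + H // 2, 0, H - 1)
    bev[row, col] = values
    return bev

def fuse_bev(obstacle_bev: np.ndarray, image_bev: np.ndarray) -> np.ndarray:
    """Channel-wise concatenation of the two BEV representations (Concat fusion)."""
    return np.stack([obstacle_bev, image_bev], axis=-1)      # H x W x 2

if __name__ == "__main__":
    r = np.random.rand(100) * 60
    a = np.random.rand(100) * np.pi - np.pi / 2
    v = np.random.rand(100)
    fused = fuse_bev(polar_to_bev(r, a, v), np.zeros((H, W), dtype=np.float32))
    print(fused.shape)                             # (256, 256, 2)
```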
An automatic driving decision SoC chip based on multi-sensor data fusion comprises a general-purpose processor and a neural network processor; the general-purpose processor controls the operation of the neural network processor through custom instructions, and the neural network processor is used to execute the method described above.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various changes and modifications without departing from the inventive concept, and these changes and modifications are all within the scope of the present invention.
Claims (10)
1. The automatic driving decision method based on multi-sensor data fusion is characterized by comprising the following steps of:
s1: the method comprises the steps that an RGB image sensor collects image data of a vehicle driving road, wherein the image data comprises lane line data, vehicle data, pedestrian data and traffic sign data;
s2: inputting the lane line data, the vehicle data, the pedestrian data and the traffic sign data into a trained image target detection neural network model, performing lane image feature extraction and feature fusion, and outputting target detection data of a lane image, wherein the image target detection neural network model adopts a YOLOv7 target detection algorithm;
s3: collecting 3D point cloud data by a laser radar, inputting the point cloud data into a trained point cloud target detection neural network model, performing distance feature extraction and feature fusion, outputting target position and distance data, fusing the target position and distance data with target position and distance information output by a binocular camera, and generating final obstacle position and distance data, wherein the neural network model adopts a PointPillar target detection algorithm;
s4: carrying out data fusion on the lane image data generated in the step S2 and the obstacle position distance data generated in the step S3, analyzing whether errors exist in all sensors or not, and correcting the road condition information of vehicle driving;
s5: and making corresponding decisions according to the road condition information corrected in the step S4, and applying the decisions to automatic driving.
2. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the training process of the image target detection neural model in the step S2 specifically comprises the following steps:
s2-1: establishing a data set of lanes, pedestrians and traffic signs, wherein the data set is used for training a neural network model;
s2-2: preprocessing the lane, pedestrian and traffic sign data sets to generate RGB format images with set resolution;
s2-3: sequentially enabling the format image to pass through an image feature extraction layer, an image feature fusion layer and an image target detection layer of a YOLOv7 network to obtain a neural network model;
s2-4: checking whether the training times reach a set target, if not, repeating the step S2-3 until the set training times are reached, and storing the neural network model as an image target detection neural network model.
3. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the training process of the point cloud target detection neural model in the step S3 specifically comprises the following steps:
s3-1: establishing a laser radar data set, wherein the data set is used for training a point cloud target detection neural model;
s3-2: preprocessing the laser radar data set to generate format point cloud data;
s3-3: sequentially passing the format point cloud data through a feature conversion layer, a feature extraction layer and a target detection layer of a PointPillar network to obtain a neural network model;
s3-4: checking whether the training times reach a set target or not, if not, repeating the step S3-3 until the set training times are reached, and storing the neural network model as a point cloud target detection neural network model.
4. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the data fusion of step S4 is performed by matching the processed image data with the radar data in a decision layer fusion manner, and mapping the obstacle position and distance detection result generated by the radar data to the coordinates of the image data to form a comprehensive feature map.
5. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the YOLOv7 network model comprises an image input layer, an image feature extraction layer, an image feature fusion layer and an image target detection layer; the image input layer aligns input images; the image feature extraction layer further comprises a plurality of convolution layers, a batch normalization layer and a maximum pooling layer and is used for enriching the features of the aligned images and extracting the features of lanes, vehicles and pedestrians; the image feature fusion layer is used for fusing features extracted at different stages, so that the accuracy of the features is improved; and the image target detection layer detects the road condition information characteristics of the fused characteristic graph and outputs an image detection result.
6. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the PointPillar network model comprises a point cloud feature conversion layer, a point cloud feature extraction layer and a point cloud target detection layer; the point cloud feature conversion layer converts the input point cloud into a sparse pseudo image; the point cloud feature extraction layer processes the pseudo image to obtain the features of a high layer; the point cloud target detection layer detects the position and the distance of a target through a regression 3D frame.
7. The multi-sensor data fusion-based automatic driving decision method according to claim 1, characterized in that the detection method of the traffic sign data adopts an improved lightweight convolutional neural network, the lightweight convolutional neural network uses dilated convolution to implement a sliding-window method, and statistical information in the data set is used to accelerate the forward propagation speed of the network, so as to improve the efficiency of traffic sign detection.
8. The multi-sensor data fusion-based automatic driving decision method of claim 6, wherein the point cloud data format of the input Pillar feature layer is P x N x D, where P is the selected Pillar number, N is the maximum point cloud number stored by each Pillar, and D is the dimensional attribute of the point cloud.
9. The multi-sensor data fusion-based automatic driving decision method of claim 8, wherein the dimensional attribute of the point cloud is 9-dimensional data, characterized as:

(x, y, z, r, x_c, y_c, z_c, x_p, y_p)

where (x, y, z) are the three-dimensional coordinates of the original laser radar point, r is the laser reflection intensity, (x_c, y_c, z_c) is the offset of the laser point from the mean of the N points in its pillar, and (x_p, y_p) is the offset of the laser point from the pillar center coordinates.
10. An automatic driving decision SoC chip based on multi-sensor data fusion is characterized in that the SoC chip comprises a general processor and a neural network processor; the general purpose processor controls the operation of a neural network processor through custom instructions, the neural network processor being configured to perform the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211082826.8A CN115187964A (en) | 2022-09-06 | 2022-09-06 | Automatic driving decision-making method based on multi-sensor data fusion and SoC chip |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115187964A true CN115187964A (en) | 2022-10-14 |
Family
ID=83523212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211082826.8A Pending CN115187964A (en) | 2022-09-06 | 2022-09-06 | Automatic driving decision-making method based on multi-sensor data fusion and SoC chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115187964A (en) |
- 2022-09-06: Application CN202211082826.8A filed in CN; published as CN115187964A; status: Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107886477A (en) * | 2017-09-20 | 2018-04-06 | 武汉环宇智行科技有限公司 | Unmanned neutral body vision merges antidote with low line beam laser radar |
US20210094580A1 (en) * | 2019-09-30 | 2021-04-01 | Toyota Jidosha Kabushiki Kaisha | Driving control apparatus for automated driving vehicle, stop target, and driving control system |
CN113420637A (en) * | 2021-06-18 | 2021-09-21 | 北京轻舟智航科技有限公司 | Laser radar detection method under multi-scale aerial view angle in automatic driving |
CN114397877A (en) * | 2021-06-25 | 2022-04-26 | 南京交通职业技术学院 | Intelligent automobile automatic driving system |
CN114120115A (en) * | 2021-11-19 | 2022-03-01 | 东南大学 | Point cloud target detection method for fusing point features and grid features |
CN114359181A (en) * | 2021-12-17 | 2022-04-15 | 上海应用技术大学 | Intelligent traffic target fusion detection method and system based on image and point cloud |
Non-Patent Citations (1)
Title |
---|
- Wu Xiaohui et al.: "Survey of Traffic Sign Recognition Methods", Computer Engineering and Applications (《计算机工程与应用》) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115984802A (en) * | 2023-03-08 | 2023-04-18 | 安徽蔚来智驾科技有限公司 | Target detection method, computer-readable storage medium and driving equipment |
CN116229452A (en) * | 2023-03-13 | 2023-06-06 | 无锡物联网创新中心有限公司 | Point cloud three-dimensional target detection method based on improved multi-scale feature fusion |
CN116229452B (en) * | 2023-03-13 | 2023-11-17 | 无锡物联网创新中心有限公司 | Point cloud three-dimensional target detection method based on improved multi-scale feature fusion |
CN116453087A (en) * | 2023-03-30 | 2023-07-18 | 无锡物联网创新中心有限公司 | Automatic driving obstacle detection method of data closed loop |
CN116453087B (en) * | 2023-03-30 | 2023-10-20 | 无锡物联网创新中心有限公司 | Automatic driving obstacle detection method of data closed loop |
CN117111055A (en) * | 2023-06-19 | 2023-11-24 | 山东高速集团有限公司 | Vehicle state sensing method based on thunder fusion |
CN117197019A (en) * | 2023-11-07 | 2023-12-08 | 山东商业职业技术学院 | Vehicle three-dimensional point cloud image fusion method and system |
CN117944059A (en) * | 2024-03-27 | 2024-04-30 | 南京师范大学 | Track planning method based on vision and radar feature fusion |
CN117944059B (en) * | 2024-03-27 | 2024-05-31 | 南京师范大学 | Track planning method based on vision and radar feature fusion |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 20221014)