CN111626217B - Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion - Google Patents
- Publication number: CN111626217B
- Application number: CN202010466491.4A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a target detection and tracking method based on the fusion of two-dimensional pictures and three-dimensional point clouds, which relates to the field of target detection and tracking for automatic driving and comprises the following steps: S100, pre-training a DeepLabv3+ model; S200, converting the three-dimensional point cloud data into a specified format; S300, preprocessing the three-dimensional point cloud data in the specified format; S400, training a PointRCNN-DeepLabv3+ model; S500, updating and tracking the target state. According to the invention, each laser data point feature contains spatial information together with an image semantic segmentation result, which improves the recognition performance of PointRCNN and raises the accuracy of identifying pedestrian targets that are small and highly similar to their environment.
Description
Technical Field
The invention relates to the field of automatic driving target detection and tracking, in particular to a target detection and tracking method based on fusion of a two-dimensional picture and a three-dimensional point cloud.
Background
At present, autonomous driving has reached the stage of L3-level deployment, and automobile OEMs, autonomous-driving start-ups, automotive system suppliers and university research institutes have all made such deployment their current focus. The core functional modules of automatic driving are the perception layer, the decision layer and the control layer. The perception layer mainly acquires information about the surrounding environment through devices such as lidar, millimeter-wave radar and vision sensors. The unmanned detection system performs target detection on the acquired images, three-dimensional point clouds and other data, and recognition methods such as scene segmentation provide the unmanned vehicle with an understanding of its surroundings, so that specific functions such as autonomous cruising, automatic lane changing, traffic sign recognition, automatic driving in traffic jams and high-speed driving can be realized. Unlike vision sensors, lidar can effectively improve the accuracy with which the vehicle models its perception of the external environment. Combining various research and practical considerations, the key lidar technologies in automatic driving are three-dimensional point cloud segmentation, road extraction, environment modeling, obstacle detection and tracking, and the fusion of information from multiple sensors. The volume of three-dimensional point cloud data produced by lidar can reach millions of points per second, and common clustering algorithms cannot meet the requirement of real-time computation on such data. Three-dimensional point cloud segmentation means partitioning the point cloud according to the global and local features of its distribution in order to quickly extract useful object information, so that a number of independent subsets are formed. The expectation is that each subset corresponds to a perceived target with physical meaning and reflects the geometric and pose characteristics of the target object. Three-dimensional point cloud segmentation is an important basis for guaranteeing the subsequent target classification and tracking performance of the lidar. Currently, three-dimensional point cloud segmentation and object detection methods based on deep learning are prevalent.
In general, deep neural networks require input information in a normalized format, such as two-dimensional images or time-sequential speech. Raw three-dimensional point cloud data, however, are unordered point sets in space. Suppose a point cloud contains N three-dimensional points, each represented by (x, y, z) coordinates; even without considering occlusion, viewpoint changes and the like, these points can be arranged in N! possible orders. Therefore, a function must be designed whose value is independent of the order of the input data.
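As a minimal illustration of such an order-invariant (symmetric) function, in the spirit of PointNet-style max pooling rather than the exact operator used by the invention, consider the following sketch; the per-point embedding here is a toy stand-in for a learned network:

```python
import numpy as np

def order_invariant_feature(points: np.ndarray) -> np.ndarray:
    """Compute a global feature for an unordered (N, 3) point set.

    A per-point embedding followed by an element-wise max over the point
    dimension is symmetric: permuting the rows of `points` does not change
    the result.
    """
    # Toy per-point embedding (a real network would learn this mapping).
    embedded = np.concatenate([points, points ** 2, np.abs(points)], axis=1)
    return embedded.max(axis=0)  # max pooling over points -> (9,)

pts = np.random.rand(1024, 3)
perm = np.random.permutation(len(pts))
assert np.allclose(order_invariant_feature(pts), order_invariant_feature(pts[perm]))
```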
In practice, training a deep neural network requires a large amount of labeled data. Most three-dimensional point cloud data on the market is labeled manually, and annotators are prone to frequent false detections, missed detections and inconsistent precision. To address this current "pain point", an automatic labeling tool combined with a deep learning algorithm is necessary.
The three-dimensional point cloud target recognition methods proposed so far can be divided into two major categories: grid-based methods and laser-point-based methods. Grid-based methods convert the unordered three-dimensional points into ordered features such as 3D voxels or 2D bird's-eye-view features, and then perform 3D target recognition with a 3D or 2D convolutional neural network. To counter the information loss incurred when converting the point cloud, the current mainstream approach is to fuse multiple sensors so that their information can supplement and correct each other. For example, MV3D-Net, which is being industrialized, fuses vision and laser point cloud information: it searches for a target region of interest using only the top view and front view of the three-dimensional point cloud and combines image features for target recognition, unlike earlier voxel-based methods that suffer from high computational complexity and from information loss during feature conversion. The AVOD model takes the bird's-eye view of the three-dimensional point cloud and the corresponding image as input, crops and scales them with a 3D anchor grid map, fuses the features of the resulting regions of interest, and finally obtains the target recognition result through a fully connected network. MMF processes the lidar data in two stages: on the one hand, original RGB image information is combined with depth features and stitched into RGBD images used for feature extraction as complementary image information; on the other hand, the lidar data is converted to a bird's-eye view, rough regions of interest are proposed by a deep network, and the laser point cloud features and image features inside those regions are stitched and fused for bounding-box refinement, yielding a more accurate recognition result. ContFuse performs deep continuous fusion of the three-dimensional point cloud and images across multiple scales and sensors through a two-stream network structure, achieving high-precision target detection and localization in three-dimensional space.
The other category comprises laser-point-based recognition methods, which extract effective features directly from the laser point cloud data; they have become increasingly popular since PointNet and PointNet++ were proposed. Because PointNet requires no data-point preprocessing, the difficulty of feature extraction caused by the disorder of the point cloud is resolved with a pooling operation, which effectively avoids information loss and makes the final recognition result comparatively accurate. F-PointNet, the first network model to perform target recognition with PointNet, searches for a 2D region of interest using Mask R-CNN, obtains the laser point cloud inside that region by combining depth information, and performs feature extraction and regression of the target bounding-box parameters through two PointNet stages. PointRCNN relies on laser point cloud data alone: the first-stage PointNet performs feature extraction and region-of-interest extraction, and the second stage performs target recognition and refinement, obtaining superior recognition results without any supplementary image information.
Because of the 3D spatial characteristics of the laser point cloud, targets cannot overlap one another as they do in 2D space, so there are relatively few interference factors and the difficulty of multi-target tracking is lower. Most current 3D target tracking schemes are tracking-by-detection: targets in three-dimensional space are identified by a recognition model, the recognition result of the current frame is compared and matched with the tracking results of several previous frames, and the tracking model is then updated. AB3DMOT, currently the tracking model with the highest processing frame rate in three-dimensional point cloud space, can track targets using only a PointRCNN model for 3D target recognition and a 3D Kalman filter.
The methods commonly used at present have the following problems:
1. Pedestrian recognition is difficult: with the excellent performance of deep learning in fields such as images and lidar, more and more capable target detection and tracking algorithms have been proposed. Because of its physical characteristics, lidar provides precise distance information that ordinary cameras lack and largely avoids mutual occlusion between targets, so it has received more and more attention from researchers during the development of automatic driving. Taking the car-class tracking benchmark of the KITTI test data set (a standard benchmark for automatic driving) as an example, the multi-target tracking accuracy on laser point clouds can reach 88.89% at best. The effect of tracking pedestrians, however, is much poorer. Analysis of the two types of three-dimensional point cloud data shows that car targets generally contain more laser points, occupy a larger space, and exhibit obvious L-shaped or I-shaped structures in the point cloud, so recognition by the model is relatively simple. A pedestrian point cloud contains fewer laser points, occupies a smaller volume, and suffers an obvious distance limitation: as the distance increases, the number of points on a pedestrian decreases approximately linearly and the points become sparse, which hampers recognition. Moreover, pedestrians can appear anywhere in any scene, and scenes may contain a series of background objects such as roadblocks, bushes and street lamps that bear a certain similarity to pedestrians, further increasing the difficulty of recognizing pedestrians from laser point cloud data alone.
2. Tracking models are highly complex: comparing the various tracking algorithms on the KITTI leaderboard, one finds that most of these models raise tracking accuracy at the expense of system complexity and computational cost, which makes module-level analysis a great challenge for researchers: for a given accuracy gain, it is hard to tell which parts of the system contribute most to the result, causing confusion. For example, excellent algorithmic models such as FANTrack, DSM and extraCK differ considerably in network structure and data processing, yet their tracking behavior is quite similar. Likewise, in JCSTD and MOTBeyondPixels the adverse effect of increased computational cost is obvious: despite excellent accuracy, the high computational performance and long running time they require keep real-time tracking far out of reach, which in turn drives up deployment costs.
3. The number of tracked targets varies: in multi-target tracking, ID switching is one of the most common problems: when tracked targets touch or come close to one another, the tracking model cannot distinguish them effectively and their IDs are swapped. Because occlusion between objects is low in lidar data and there is no stacking or overlapping, the number of targets in the point cloud space at a given moment is larger than the number contained in the image of the corresponding view, so the requirements on tracking stability and on handling a larger, more varied number of targets are stricter in the laser point cloud space. Current 3D target tracking methods are mainly implemented with filters and depend heavily on the target matching strategy, so the performance of such models is uneven.
Accordingly, those skilled in the art have been working to develop a method for target detection and tracking that achieves high efficiency and high accuracy.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention aims to solve the problems of weak pedestrian-class recognition on current three-dimensional point cloud data sets, overly complex tracking models, and the inapplicability of the conventional intersection-over-union matching criterion.
The inventor has designed a target detection and tracking method based on the fusion of two-dimensional pictures and three-dimensional point clouds. It combines the feature extraction process, led by PointRCNN within AB3DMOT, with the DeepLabv3+ image instance segmentation result, so that each laser data point feature contains spatial information together with an image semantic segmentation result; at the same time, to remedy the shortcoming of the association matching algorithm of AB3DMOT, a novel multi-condition joint judgment scheme suited to pedestrian trajectories is provided.
In one embodiment of the invention, a target detection and tracking method based on fusion of a two-dimensional picture and a three-dimensional point cloud is provided, which comprises the following steps:
S100, pre-training a DeepLabv3+ model;
S200, converting the three-dimensional point cloud data into a specified format;
S300, preprocessing the three-dimensional point cloud data in the specified format;
S400, training a PointRCNN-DeepLabv3+ model;
S500, updating and tracking the target state.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the foregoing embodiment, step S100 includes reading the image files in Cityscapes, pre-training the DeepLabv3+ model based on the truth files in the data set together with the corresponding image files, adopting a specific loss function as the objective, ending the training of the whole deep learning framework once the accuracy no longer improves significantly, and saving the corresponding neural network parameters.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, the specific loss function is formula (1), so that the model can realize accurate semantic segmentation of the image:

L_DeepLabv3+(x) = Σ w(x) log(p_k(x)),    (1)

wherein x is a pixel position on the two-dimensional plane; a_k(x) denotes the value of the k-th channel corresponding to x in the final output layer of the neural network; p_k(x) denotes the probability that the pixel belongs to class k; w(x) denotes the classification vector of the ground-truth label at pixel position x; and L_DeepLabv3+(x) denotes the sum of the probabilities that x belongs to its correct label class.
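A minimal NumPy sketch of how a loss of this form can be evaluated; it assumes p_k(x) is obtained from a softmax over the output channels, and returns the negated sum so that minimizing it corresponds to maximizing the log-probabilities summed in formula (1) (both are presentational assumptions, not details fixed by the patent):

```python
import numpy as np

def deeplab_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Pixel-wise cross-entropy in the spirit of formula (1).

    logits: (H, W, K) raw channel values a_k(x) from the final layer.
    labels: (H, W) integer ground-truth class per pixel.
    """
    # Softmax over channels to get p_k(x) (assumed, see lead-in).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    # w(x) selects the correct class; sum the log-probabilities of correct labels.
    h, w = labels.shape
    log_p_correct = np.log(probs[np.arange(h)[:, None], np.arange(w), labels] + 1e-12)
    return -log_p_correct.sum()  # negated so the quantity can be minimized

loss = deeplab_loss(np.random.randn(4, 4, 19), np.random.randint(0, 19, (4, 4)))
```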
Optionally, in the target detection and tracking method based on two-dimensional image and three-dimensional point cloud fusion in any of the foregoing embodiments, the step S100 further includes:
S110, inputting Cityscapes image data during training, including the batch size, the number of images and the number of channels;
S120, the encoding network obtains feature maps of different receptive-field sizes through hole (atrous) convolution; the feature maps are stacked and concatenated and then fed into a subsequent convolutional network for feature extraction, finally yielding an effective encoded feature result;
S130, the decoding network supplements information through full convolution and the features of the corresponding layers of the encoding network, up-samples layer by layer, finally restores the original input image size, and outputs the classification information of each pixel.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, the hole convolutions use 1×1 and 3×3 kernels with different dilation (sampling) rates.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the foregoing embodiments, the pre-training in step S100 further includes image semantic segmentation and image object classification:
S140, extracting the image semantic segmentation information in the Cityscapes data set, and extracting the classification information of the target pixels;
S150, reading all image data, and configuring the image pixel classes that meet the requirements;
S160, deploying DeepLabv3+ as a microservice (Docker container) on the GPU server.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, the image object classes include cars, trucks, pedestrians, cyclists and the ground.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the foregoing embodiments, step S100 further includes checking the effect of the pre-training.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, checking the effect of pre-training includes developing a visualization in Python using the matplotlib library, comparing the results with the ground truth, and taking the mean ratio of intersection to union between the ground-truth and predicted pixel sets of the image (i.e. MIoU) as the final criterion; the larger the value, the better the performance.
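A minimal sketch of how such an MIoU check can be computed, assuming integer class maps of equal size (illustrative only, not the invention's exact evaluation code):

```python
import numpy as np

def mean_iou(pred: np.ndarray, truth: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union between predicted and ground-truth label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, truth == c).sum()
        union = np.logical_or(pred == c, truth == c).sum()
        if union > 0:                      # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 5, (256, 512))
truth = np.random.randint(0, 5, (256, 512))
print(f"MIoU: {mean_iou(pred, truth, num_classes=5):.3f}")
```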
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the foregoing embodiments, the three-dimensional point cloud in step S200 comes from a multi-beam 3D lidar whose horizontal and vertical fields of view are 360° and 40°, respectively, and whose horizontal range reaches 300 meters.
Optionally, in the target detection and tracking method based on the fusion of the two-dimensional image and the three-dimensional point cloud in any of the embodiments, the specified format in the step S200 is a format that is convenient for the three-dimensional point cloud algorithm to read in.
Optionally, in the target detection and tracking method based on two-dimensional image and three-dimensional point cloud fusion in any of the embodiments, the specified format in the step S200 is a pcd format.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the foregoing embodiments, the preprocessing in step S300 includes increasing the number of points by up-sampling when the points are too sparse, and reducing the number of three-dimensional points by down-sampling when the points are too dense, so that the three-dimensional point cloud is distributed evenly over the whole plane.
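A minimal sketch of such density normalization, using random down-sampling and jittered duplication for up-sampling; this is one simple choice among many, not necessarily the scheme used by the invention:

```python
import numpy as np

def resample_points(points: np.ndarray, target: int, seed: int = 0) -> np.ndarray:
    """Return roughly `target` points from an (N, 3) cloud.

    Too dense  -> random down-sampling without replacement.
    Too sparse -> duplicate random points with small jitter (up-sampling).
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    if n >= target:                                   # down-sample
        idx = rng.choice(n, size=target, replace=False)
        return points[idx]
    extra_idx = rng.choice(n, size=target - n, replace=True)   # up-sample
    jitter = rng.normal(scale=0.01, size=(target - n, 3))
    return np.vstack([points, points[extra_idx] + jitter])

cloud = resample_points(np.random.rand(5000, 3), target=16384)
```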
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the foregoing embodiments, the preprocessing in step S300 further includes foreground and background extraction, whose loss function is formula (3):

L_fore(p_u) = -α_u (1 - p_u)^β log(p_u),    (3)

wherein p_u denotes the probability assigned to foreground or background points, α_u and β are manually defined constants controlling the weights of foreground and background points, and L_fore(p_u) is a Focal Loss used to alleviate the class-imbalance problem when the ratio of foreground points to background points reaches 1:3 or more.
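A minimal NumPy sketch of the focal-loss form in formula (3); the values of α_u and β below, and the convention that background points receive weight 1-α, are placeholders rather than the constants chosen by the invention:

```python
import numpy as np

def focal_loss(p_u: np.ndarray, is_foreground: np.ndarray,
               alpha: float = 0.25, beta: float = 2.0) -> float:
    """Focal loss over per-point foreground probabilities.

    p_u: (N,) predicted foreground probability for each point.
    is_foreground: (N,) boolean ground-truth mask.
    """
    eps = 1e-12
    # Probability assigned to the correct class of each point.
    p_correct = np.where(is_foreground, p_u, 1.0 - p_u)
    # Foreground points get weight alpha, background points 1 - alpha (assumed).
    alpha_u = np.where(is_foreground, alpha, 1.0 - alpha)
    return float(np.sum(-alpha_u * (1.0 - p_correct) ** beta * np.log(p_correct + eps)))

loss = focal_loss(np.random.rand(1000), np.random.rand(1000) < 0.25)
```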
Optionally, in the target detection and tracking method based on two-dimensional image and three-dimensional point cloud fusion in any of the foregoing embodiments, the step S400 includes:
S410, reading the files in the specified format from KITTI;
S420, inputting the truth files of the KITTI data set together with the corresponding three-dimensional point cloud data files and image files;
S430, fixing the DeepLabv3+ model weights;
S440, training the PointRCNN-DeepLabv3+ model;
S450, adopting a specific loss function as the objective and ending the training of the whole deep learning framework once the accuracy no longer improves significantly;
S460, saving the corresponding neural network parameters.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the above embodiments, the specified format in step S410 is the pcd format.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the foregoing embodiments, step S440 further includes semantic segmentation and three-dimensional point cloud object classification:
S441, extracting the three-dimensional bounding-box coordinate information in the KITTI data set together with the corresponding two-dimensional bounding boxes of the left and right two-dimensional views, and extracting the related classification information;
S442, reading all three-dimensional point cloud data, and configuring the target three-dimensional bounding-box information and classes that meet the requirements;
S443, deploying the PointRCNN-DeepLabv3+ model as a microservice (Docker container) on the GPU server.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, the three-dimensional point cloud object classes include cars, trucks, pedestrians, cyclists and the ground.
Optionally, in the target detection and tracking method based on the two-dimensional image and the three-dimensional point cloud fusion in any of the foregoing embodiments, step S440 further includes checking the effect of the pre-training.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, checking the effect of pre-training includes developing a visualization in Python using the PCL (Point Cloud Library), comparing the results with the ground truth, and taking the ratio of intersection to union between the ground-truth 3D bounding box and the predicted bounding box in three-dimensional space (i.e. IoU) as the final criterion; the larger the value, the better the performance.
Optionally, in the method for detecting and tracking a target based on fusion of a two-dimensional picture and a three-dimensional point cloud in any embodiment, step S500 includes: during tracking, the algorithm model uses PointRCNN-DeepLabv3+ to recognize targets, uses the intersection-over-union ratio and the distance ratio as joint matching conditions, uses the Hungarian algorithm to match the recognition results, and updates and tracks the target state through a filter.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any one of the embodiments, the filter is a 3D Kalman filter.
Optionally, in the target detection and tracking method based on two-dimensional image and three-dimensional point cloud fusion in any of the foregoing embodiments, the step S500 includes:
S510, training the AB3DMOT-MCM-DeepLabv3+ model, and inputting Cityscapes image data and KITTI three-dimensional point cloud and image data, including the batch size, the number of images and their number of channels, and the number of three-dimensional point cloud points and their number of channels;
S520, the recognition network searches for target regions of interest through PointRCNN, performs semantic segmentation on the image with DeepLabv3+, and feeds the three-dimensional point cloud features of the regions of interest together with the corresponding image semantic segmentation results as supplementary information into the second stage of PointRCNN to obtain accurate target recognition results;
S530, the data matching module compares the recognized-target parameters with the trajectory-prediction parameters and performs matching calculation using the distance ratio and the intersection-over-union ratio: matched trajectories are updated; unmatched trajectories are checked and deleted once the maximum memory time limit is exceeded, otherwise they remain unchanged; new trajectories are created for unmatched recognized targets;
S540, the 3D Kalman filter predicts and updates the coordinates x, y, z, the size parameters, the yaw angle and the relative speed of the target trajectory in the traditional Kalman-filter creation and update manner.
Further, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in the above embodiment, step S510 further includes combining PointRCNN-DeepLabv3+ with the AB3DMOT multi-matching-condition tracking model, performing target recognition with PointRCNN-DeepLabv3+ and performing target tracking with the tracking model in AB3DMOT.
Optionally, in the target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion in any of the above embodiments, the matching-condition intersection-over-union ratio and distance ratio in step S530 are calculated by formulas (5) and (6), respectively:

IoU = (S_a ∩ S_b) / (S_a ∪ S_b),    (5)

wherein S_a denotes the 3D bounding-box volume of the trajectory prediction result, S_b denotes the 3D bounding-box volume of the detection result, S_a ∩ S_b denotes the intersection of the two volumes, S_a ∪ S_b denotes their union, and IoU denotes the ratio of the intersection to the union of the two 3D bounding regions; according to a set threshold, this result serves as one of the matching criteria;

PosR = dis(t_1, t_2) / w,    (6)

wherein t_1 and t_2 denote the center-point coordinates of the trajectory prediction result and of the detection result, dis(t_1, t_2) denotes the Euclidean distance between the two center points, w is the target width of the trajectory prediction result, and PosR denotes the ratio of the Euclidean distance between the centers of the two 3D bounding boxes to the width predicted by the tracker; according to a set threshold, this value serves as one of the matching criteria.
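A minimal sketch of the two matching quantities for axis-aligned boxes, a simplification since the invention's boxes also carry a yaw angle; the box values below are placeholders:

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol = lambda box: np.prod(box[3:] - box[:3])
    return inter / (vol(a) + vol(b) - inter + 1e-12)

def pos_ratio(center_pred, center_det, width_pred):
    """Distance ratio PosR: Euclidean center distance over the predicted target width."""
    return np.linalg.norm(np.asarray(center_pred) - np.asarray(center_det)) / (width_pred + 1e-12)

iou = iou_3d_axis_aligned((0, 0, 0, 2, 1, 1.5), (0.5, 0, 0, 2.5, 1, 1.5))
posr = pos_ratio((1.0, 0.5, 0.75), (1.5, 0.5, 0.75), width_pred=1.0)
```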
According to the invention, the feature extraction process led by PointRCNN within AB3DMOT and the DeepLabv3+ image instance segmentation are combined, so that each laser data point feature contains spatial information together with an image semantic segmentation result; this improves the recognition performance of PointRCNN and effectively raises the accuracy of identifying pedestrian targets that are small and highly similar to their environment. To remedy the shortcomings of the association matching algorithm of AB3DMOT, a novel multi-condition joint judgment scheme suited to pedestrian trajectories is provided, which improves the target matching capability, makes the model perform better on pedestrian tracking, and achieves the goals of high efficiency and high accuracy.
The conception, specific structure, and technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, features, and effects of the present invention.
Drawings
FIG. 1 is a flowchart illustrating a method according to an example embodiment;
FIG. 2 is a schematic diagram illustrating the DeepLabv3+ flow according to an exemplary embodiment;
fig. 3 is a diagram illustrating the AB3DMOT-MCM-DeepLabv3+ architecture according to an exemplary embodiment.
Detailed Description
The following description of the preferred embodiments of the present invention refers to the accompanying drawings, which make the technical contents thereof more clear and easy to understand. The present invention may be embodied in many different forms of embodiments and the scope of the present invention is not limited to only the embodiments described herein.
In the drawings, like structural elements are referred to by like reference numerals and components having similar structure or function are referred to by like reference numerals. The dimensions and thickness of each component shown in the drawings are arbitrarily shown, and the present invention is not limited to the dimensions and thickness of each component. The thickness of the components is schematically and appropriately exaggerated in some places in the drawings for clarity of illustration.
The inventor combines the feature extraction process led by PointRCNN within AB3DMOT with the DeepLabv3+ image instance segmentation result, so that each laser data point feature contains spatial information together with an image semantic segmentation result, and, to remedy the shortcoming of the association matching algorithm of AB3DMOT, provides a novel multi-condition joint judgment scheme suited to pedestrian trajectories. The inventor has designed a target detection and tracking method based on the fusion of two-dimensional pictures and three-dimensional point clouds, which, as shown in fig. 1, comprises the following steps:
S100, pre-training the DeepLabv3+ model: reading the image files in Cityscapes, pre-training the DeepLabv3+ model based on the truth files in the image data set together with the corresponding image files, adopting a specific loss function as the objective, ending the training of the whole deep learning framework once the accuracy no longer improves significantly, and saving the corresponding neural network parameters. The inventor has defined the specific loss function as follows, so that the model can realize accurate semantic segmentation of the image:

L_DeepLabv3+(x) = Σ w(x) log(p_k(x)),    (1)

wherein x is a pixel position on the two-dimensional plane; a_k(x) denotes the value of the k-th channel corresponding to x in the final output layer of the neural network; p_k(x) denotes the probability that the pixel belongs to class k; w(x) denotes the classification vector of the ground-truth label at pixel position x; and L_DeepLabv3+(x) denotes the sum of the probabilities that x belongs to its correct label class.
Step S100 is refined, as shown in fig. 2:
S110, Cityscapes image data, including the batch size, the number of images and the number of channels, is input for pre-training;
S120, the encoding network (Encoder) first extracts basic features of the image with a deep convolutional network (DCNN). To enlarge the receptive field of the filters so that they learn global and local information more accurately, the encoding network performs feature extraction with hole convolution (Atrous Conv); the specific hole-convolution filters comprise 1×1 Conv, 3×3 Conv rate 6, 3×3 Conv rate 12, 3×3 Conv rate 18 and an Image Pooling layer. The feature maps obtained by these operations are stacked and concatenated, fed into a subsequent convolutional network, and further extracted with a 1×1 Conv, finally yielding an effective encoded feature result;
S130, the decoding network (Decoder) applies a 1×1 Conv to the low-level features output by the DCNN layer of the encoding network to obtain one feature map, up-samples the feature map finally output by the Encoder by a factor of 4 (Upsample by 4) so that it matches that feature map in size, and concatenates (Concat) the two to obtain a new feature map. To obtain more effective features, the new feature map is processed with a 3×3 Conv and again up-sampled by a factor of 4 (Upsample by 4), up-sampling layer by layer until the original input image size is restored, and the classification result (Prediction) of each pixel is output (a compact sketch of this encoder-decoder flow is given below);
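A compact PyTorch-style sketch of the atrous branches and decoder fusion described above; the channel counts and feature shapes are illustrative assumptions, and the patent does not prescribe a specific framework:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPDecoderSketch(nn.Module):
    """Simplified DeepLabv3+-style head: atrous branches + low-level fusion."""
    def __init__(self, high_ch=256, low_ch=64, num_classes=19):
        super().__init__()
        # Encoder branches: 1x1 conv and 3x3 atrous convs at rates 6/12/18.
        rates = (6, 12, 18)
        self.branches = nn.ModuleList(
            [nn.Conv2d(high_ch, 64, 1)] +
            [nn.Conv2d(high_ch, 64, 3, padding=r, dilation=r) for r in rates])
        self.image_pool = nn.AdaptiveAvgPool2d(1)      # image-pooling branch
        self.pool_conv = nn.Conv2d(high_ch, 64, 1)
        self.project = nn.Conv2d(64 * 5, 64, 1)        # fuse concatenated branches
        self.low_proj = nn.Conv2d(low_ch, 48, 1)       # 1x1 conv on low-level features
        self.classifier = nn.Sequential(
            nn.Conv2d(64 + 48, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1))

    def forward(self, high_feat, low_feat):
        pooled = F.interpolate(self.pool_conv(self.image_pool(high_feat)),
                               size=high_feat.shape[-2:], mode="bilinear",
                               align_corners=False)
        enc = self.project(torch.cat([b(high_feat) for b in self.branches] + [pooled], 1))
        enc = F.interpolate(enc, scale_factor=4, mode="bilinear", align_corners=False)
        dec = torch.cat([enc, self.low_proj(low_feat)], dim=1)     # Concat step
        logits = self.classifier(dec)
        return F.interpolate(logits, scale_factor=4, mode="bilinear", align_corners=False)

# Example shapes: high-level 1/16-resolution features, low-level 1/4-resolution features.
out = ASPPDecoderSketch()(torch.randn(1, 256, 32, 64), torch.randn(1, 64, 128, 256))
```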
step S100 further includes image object classification and semantic segmentation, the image object classification including cars, trucks, pedestrians, riders and ground, specifically including:
s140, extracting image semantic segmentation information in the Cityscapes data set, and extracting classification information of target pixels;
s150, reading all image data, and configuring the image pixel classification meeting the requirements;
S160, deploying the model as a microservice (Docker container) on the GPU server.
In addition, step S100 also includes verifying the effect of the pre-training, specifically by developing a visualization in Python using the matplotlib library and then visually comparing the results against the ground truth.
S200, converting the three-dimensional point cloud data into a specified format. The three-dimensional point cloud data come from a multi-beam 3D lidar whose horizontal and vertical fields of view are 360° and 40°, respectively, and whose horizontal range reaches 300 meters; the specified format is one that the three-dimensional point cloud algorithm can read in conveniently, such as the pcd format.
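A minimal sketch of such a conversion using the Open3D library as one possible tool; the patent does not prescribe a specific library, and the file name and raw-array layout here are placeholders:

```python
import numpy as np
import open3d as o3d

# In practice the raw lidar frame would be loaded from disk (e.g. a KITTI-style
# (N, 4) array of x, y, z, intensity); random data stands in for it here.
raw = np.random.rand(1000, 4).astype(np.float32)

cloud = o3d.geometry.PointCloud()
cloud.points = o3d.utility.Vector3dVector(raw[:, :3].astype(np.float64))  # keep xyz
o3d.io.write_point_cloud("frame_000000.pcd", cloud)                       # pcd format
```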
S300, preprocessing the three-dimensional point cloud data in the specified format. The preprocessing includes increasing the number of points by up-sampling when the points are too sparse, and reducing the number of points by down-sampling when the points are too dense, so that the three-dimensional point cloud is distributed evenly over the whole plane. The preprocessing also includes foreground and background extraction, whose loss function is:

L_fore(p_u) = -α_u (1 - p_u)^β log(p_u),    (3)

wherein p_u denotes the probability assigned to foreground or background points, α_u and β are manually defined constants controlling the weights of foreground and background points, and L_fore(p_u) is a Focal Loss used to alleviate the class-imbalance problem when the numbers of foreground and background points differ greatly.
S400, training the PointRCNN-DeepLabv3+ model, comprising:
S410, reading the files in the specified format from KITTI, generally the pcd format;
S420, inputting the truth files of the KITTI data set (from the previous step) together with the corresponding three-dimensional point cloud data files and image files;
S430, fixing the DeepLabv3+ model weights;
S440, training the PointRCNN-DeepLabv3+ model, which involves semantic segmentation and three-dimensional point cloud object classification, the object classes including cars, trucks, pedestrians, cyclists and the ground, specifically:
S441, extracting the three-dimensional bounding-box coordinate information in the KITTI data set together with the corresponding two-dimensional bounding boxes of the left and right two-dimensional views, and extracting the related classification information;
S442, reading all three-dimensional point cloud data, and configuring the target three-dimensional bounding-box information and classes that meet the requirements;
S443, deploying the model as a microservice (Docker container) on the GPU server;
S450, adopting a specific loss function as the objective and ending the training of the whole deep learning framework once the accuracy no longer improves significantly;
S460, saving the corresponding neural network parameters.
Step S400 also includes verifying the effect of the pre-training, specifically by developing a visualization in Python using the PCL (Point Cloud Library) and then visually comparing the results against the ground truth.
S500, updating and tracking the target state. The algorithm model uses PointRCNN-DeepLabv3+ to recognize targets, uses the intersection-over-union ratio and the distance ratio as joint matching conditions, uses the Hungarian algorithm to match the recognition results, and updates and tracks the target state through a filter, for which a 3D Kalman filter is chosen. As shown in fig. 3, the method specifically includes:
S510, training the AB3DMOT-MCM-DeepLabv3+ model. The network structure combines PointRCNN-DeepLabv3+ with the AB3DMOT multi-matching-condition tracking model: 3D target detection and recognition are performed with PointRCNN-DeepLabv3+, and target tracking is then performed with the tracking model in AB3DMOT. Cityscapes image data and KITTI three-dimensional point cloud and image data are input to PointRCNN and DeepLabv3+, respectively, including the batch size, the number of images and their number of channels, and the number of three-dimensional point cloud points and their number of channels;
S520, the 3D target detection network searches for target regions of interest through PointRCNN, performs semantic segmentation on the image with DeepLabv3+, and feeds the three-dimensional point cloud features of the regions of interest together with the corresponding image semantic segmentation results as supplementary information into the second stage of PointRCNN (followed by data matching) to obtain accurate target recognition results;
S530, the data matching module compares the recognized-target parameters with the trajectory-prediction parameters and performs matching calculation using the distance ratio PosR and the intersection-over-union ratio IoU. The specific process is as follows: for each predicted trajectory and each bounding box of a recognized target, the distance ratio PosR and the intersection ratio IoU are first calculated and added with weights of 0.5 each; when the sum exceeds the set threshold of 0.3, the pair is considered matched. For target trajectories whose sum is below 0.3, the intersection ratio IoU is calculated and matched separately, and a pair is classed as a trajectory match when this value exceeds the set threshold of 0.3. All matching results are updated; an unmatched trajectory is deleted once the maximum memory time limit is exceeded, otherwise it remains unchanged; a new trajectory is created for each unmatched detection. After all results are processed, the 3D Kalman filter is updated and its predictions are used in subsequent trajectory-matching calculations (see the matching sketch after the formulas below). The inventor defines the matching-condition intersection-over-union ratio IoU and distance ratio PosR calculation functions as (5) and (6), respectively:
IoU = (S_a ∩ S_b) / (S_a ∪ S_b),    (5)

wherein S_a denotes the 3D bounding-box volume of the trajectory prediction result, S_b denotes the 3D bounding-box volume of the detection result, S_a ∩ S_b denotes the intersection of the two volumes, S_a ∪ S_b denotes their union, and IoU denotes the ratio of the intersection to the union of the two 3D bounding regions; according to a set threshold, this result serves as one of the matching criteria;
PosR = dis(t_1, t_2) / w,    (6)

wherein t_1 and t_2 denote the center-point coordinates of the trajectory prediction result and of the detection result, dis(t_1, t_2) denotes the Euclidean distance between the two center points, w is the target width of the trajectory prediction result, and PosR denotes the ratio of the Euclidean distance between the centers of the two 3D bounding boxes to the width predicted by the tracker; according to a set threshold, this value serves as one of the matching criteria.
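A minimal sketch of the joint matching step described in S530, using scipy's Hungarian solver; the 0.5/0.5 weights and the 0.3 threshold follow the text above, while the way the scores form a single matrix (and the convention that a larger combined score indicates a match) is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(iou_mat: np.ndarray, posr_mat: np.ndarray, thresh: float = 0.3):
    """Match predicted trajectories (rows) to detections (columns).

    iou_mat, posr_mat: (T, D) pairwise IoU and distance-ratio scores.
    Returns (matched pairs, unmatched track indices, unmatched detection indices).
    """
    score = 0.5 * iou_mat + 0.5 * posr_mat          # joint matching condition
    rows, cols = linear_sum_assignment(-score)      # Hungarian algorithm, maximize score
    matches, un_t, un_d = [], set(range(score.shape[0])), set(range(score.shape[1]))
    for r, c in zip(rows, cols):
        # Accept if the joint score, or the IoU alone, clears the threshold.
        if score[r, c] > thresh or iou_mat[r, c] > thresh:
            matches.append((r, c))
            un_t.discard(r)
            un_d.discard(c)
    return matches, sorted(un_t), sorted(un_d)

m, ut, ud = match_tracks(np.random.rand(3, 4), np.random.rand(3, 4))
```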
S540, the 3D Kalman filter predicts and updates the coordinates x, y, z, the size parameters, the yaw angle and the relative speed of the target trajectory in the traditional Kalman-filter creation and update manner.
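A minimal constant-velocity sketch of such a 3D Kalman filter, using the filterpy library as one possible implementation; the state layout (x, y, z, yaw, l, w, h plus three velocities) mirrors common AB3DMOT-style trackers and is an assumption rather than the invention's exact parameterization:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

# State: [x, y, z, yaw, l, w, h, vx, vy, vz]; measurement: the first 7 entries.
kf = KalmanFilter(dim_x=10, dim_z=7)
kf.F = np.eye(10)
kf.F[0, 7] = kf.F[1, 8] = kf.F[2, 9] = 1.0          # constant-velocity motion model
kf.H = np.eye(7, 10)                                 # we observe the box, not the velocity
kf.P[7:, 7:] *= 1000.0                               # high initial velocity uncertainty

detection = np.array([5.0, 1.2, -0.8, 0.1, 4.2, 1.8, 1.5])   # placeholder 3D box
kf.x[:7, 0] = detection                              # trajectory creation from a detection

kf.predict()                                         # predict the track for the next frame
kf.update(detection + 0.05)                          # update with the matched detection
print(kf.x[:7].ravel())
```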
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.
Claims (9)
1. A target detection and tracking method based on the fusion of a two-dimensional picture and a three-dimensional point cloud, characterized by comprising the following steps:
S100, pre-training a DeepLabv3+ model: reading the image files in Cityscapes, pre-training the DeepLabv3+ model based on the truth files in the image data set together with the corresponding image files, adopting a specific loss function as the objective, ending the training of the whole deep learning framework once the accuracy no longer improves significantly, and saving the corresponding neural network parameters, wherein the specific loss function is shown in formula (1), so that the model can realize accurate semantic segmentation of the image:

L_DeepLabv3+(x) = Σ w(x) log(p_k(x)),    (1);

wherein x is a pixel position on the two-dimensional plane; a_k(x) denotes the value of the k-th channel corresponding to x in the final output layer of the neural network; p_k(x) denotes the probability that the pixel belongs to class k; w(x) denotes the classification vector of the ground-truth label at pixel position x; and L_DeepLabv3+(x) denotes the sum of the probabilities that x belongs to its correct label class;
further comprises:
S110, inputting Cityscapes image data during training, including the batch size, the number of images and the number of channels;
S120, the encoding network obtains feature maps of different receptive-field sizes through hole (atrous) convolution; the feature maps are stacked and concatenated and then fed into a subsequent convolutional network for feature extraction, finally yielding an effective encoded feature result;
S130, the decoding network supplements information through full convolution and the features of the corresponding layers of the encoding network, up-samples layer by layer, finally restores the original input image size, and outputs the classification information of each pixel;
S200, converting the three-dimensional point cloud data into a specified format;
S300, preprocessing the three-dimensional point cloud data in the specified format, wherein the preprocessing further includes foreground and background extraction whose loss function is shown in formula (3):

L_fore(p_u) = -α_u (1 - p_u)^β log(p_u),    (3);

wherein p_u denotes the probability assigned to foreground or background points, α_u and β are manually defined constants controlling the weights of foreground and background points, and L_fore(p_u) is a Focal Loss used to alleviate the class-imbalance problem when the ratio of foreground points to background points reaches 1:3 or more;
S400, training a PointRCNN-DeepLabv3+ model;
S500, updating and tracking the target state.
2. The method for detecting and tracking a target based on two-dimensional picture and three-dimensional point cloud fusion according to claim 1, wherein the pre-training of step S100 further comprises semantic segmentation and image object classification:
S140, extracting the image semantic segmentation information in the Cityscapes data set, and extracting the classification information of the target pixels;
S150, reading all image data, and configuring the image pixel classes that meet the requirements;
S160, deploying DeepLabv3+ as a microservice (Docker container) on the GPU server.
3. The method for detecting and tracking a target based on two-dimensional image and three-dimensional point cloud fusion according to claim 2, wherein said step S100 further comprises checking the effect of pre-training.
4. The method for detecting and tracking targets based on two-dimensional picture and three-dimensional point cloud fusion according to claim 3, wherein the method for checking the effect of pre-training comprises developing a visualization in Python using the matplotlib library, comparing the results with the ground truth, and taking the mean ratio of intersection to union between the ground-truth and predicted pixel sets of the image as the final criterion, a larger value indicating better performance.
5. The method for detecting and tracking a target based on two-dimensional picture and three-dimensional point cloud fusion according to claim 4, wherein the preprocessing in step S300 includes increasing the number of points by up-sampling when the points are too sparse, and reducing the number of three-dimensional points by down-sampling when the points are too dense, so that the three-dimensional point cloud is distributed evenly over the whole plane.
6. The method for detecting and tracking a target based on two-dimensional image and three-dimensional point cloud fusion according to claim 5, wherein the step S400 comprises:
S410, reading the files in the specified format from KITTI;
S420, inputting the truth files of the KITTI data set together with the corresponding three-dimensional point cloud data files and image files;
S430, fixing the DeepLabv3+ model weights;
S440, training the PointRCNN-DeepLabv3+ model;
S450, adopting a specific loss function as the objective and ending the training of the whole deep learning framework once the accuracy no longer improves significantly;
S460, saving the corresponding neural network parameters.
7. The method for detecting and tracking a target based on two-dimensional image and three-dimensional point cloud fusion as claimed in claim 6, wherein said step S440 further comprises semantic segmentation and three-dimensional point cloud object classification:
S441, extracting the three-dimensional bounding-box coordinate information in the KITTI data set together with the corresponding two-dimensional bounding boxes of the left and right two-dimensional views, and extracting the related classification information;
S442, reading all three-dimensional point cloud data, and configuring the target three-dimensional bounding-box information and classes that meet the requirements;
S443, deploying the PointRCNN-DeepLabv3+ model as a microservice on the GPU server.
8. The method for detecting and tracking a target based on two-dimensional image and three-dimensional point cloud fusion according to claim 7, wherein the step S500 comprises:
S510, training the AB3DMOT-MCM-DeepLabv3+ model, with Cityscapes image data and KITTI three-dimensional point cloud and image data as input, wherein the input comprises the batch size, the number of images and their channel count, and the number of three-dimensional point cloud points and their channel count;
S520, searching for target regions of interest with PointRCNN in the recognition network, performing semantic segmentation on the image with DeepLabv3+, and feeding the three-dimensional point cloud features of the regions of interest together with the corresponding image semantic segmentation results as supplementary information into the second stage of PointRCNN to obtain refined target recognition results;
S530, performing matching calculation in the data matching module by comparing the recognized target parameters with the track prediction parameters using the distance ratio and the intersection-over-union ratio: updating matched tracks; checking unmatched tracks, deleting them once the maximum memory time limit is exceeded and otherwise leaving them unchanged; and creating new tracks for unmatched recognized targets;
S540, predicting and updating the coordinates x, y, z, the size parameters, the yaw angle and the relative speed of each target track with a 3D Kalman filter, following the conventional Kalman filter creation and update scheme.
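A minimal sketch, assuming NumPy and SciPy, of the track management and filtering in S530-S540. It substitutes a plain centre-distance cost with the Hungarian algorithm for the claim's distance-ratio and intersection-ratio tests, filters only the box centre rather than the full state, and the constants MAX_AGE and GATE are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

MAX_AGE = 3        # illustrative "maximum memory" in frames
GATE = 2.0         # illustrative association threshold (metres)

class Track3D:
    """Constant-velocity Kalman filter over the centre (x, y, z); size and yaw are
    simply carried along to keep the sketch short."""
    def __init__(self, det):
        det = np.asarray(det, dtype=float)
        self.x = np.hstack([det[:3], np.zeros(3)])    # state: position + velocity
        self.P = np.eye(6)
        self.extra = det[3:]                          # l, w, h, yaw, ...
        self.misses = 0

    def predict(self, dt=0.1):
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + 0.01 * np.eye(6)
        return self.x[:3]

    def update(self, det):
        det = np.asarray(det, dtype=float)
        H = np.hstack([np.eye(3), np.zeros((3, 3))])
        S = H @ self.P @ H.T + 0.1 * np.eye(3)
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (det[:3] - H @ self.x)
        self.P = (np.eye(6) - K @ H) @ self.P
        self.extra = det[3:]
        self.misses = 0

def track_step(tracks, detections):
    """One frame: predict every track, associate by centre distance with the
    Hungarian algorithm, then apply the claim's track management rules."""
    dets = [np.asarray(d, dtype=float) for d in detections]
    preds = np.array([t.predict() for t in tracks]).reshape(len(tracks), 3)
    centres = np.array([d[:3] for d in dets]).reshape(len(dets), 3)
    matched_t, matched_d = set(), set()
    if len(tracks) and len(dets):
        cost = np.linalg.norm(preds[:, None, :] - centres[None, :, :], axis=-1)
        for r, c in zip(*linear_sum_assignment(cost)):
            if cost[r, c] < GATE:                     # stands in for the distance/IoU test
                tracks[r].update(dets[c])
                matched_t.add(r); matched_d.add(c)
    for i, t in enumerate(tracks):                    # unmatched tracks age, then are deleted
        if i not in matched_t:
            t.misses += 1
    tracks = [t for t in tracks if t.misses <= MAX_AGE]
    tracks += [Track3D(dets[j]) for j in range(len(dets)) if j not in matched_d]
    return tracks
```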
9. The method for detecting and tracking a target based on two-dimensional image and three-dimensional point cloud fusion according to claim 8, wherein the step S510 further comprises combining the PointRCNN-DeepLabv3+ model with the AB3DMOT multi-matching-condition tracking model, performing target recognition with PointRCNN-DeepLabv3+ and target tracking with the tracking model in AB3DMOT.
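The division of labour in claim 9 amounts to a simple per-frame loop; in the sketch below, `detect` stands in for the trained PointRCNN-DeepLabv3+ detector and `update_tracks` for the AB3DMOT-style tracker (for example the `track_step` sketch above), and both names are placeholders rather than part of the claim:

```python
def detect_and_track(frames, detect, update_tracks, tracks=None):
    """Run detection then tracking on each (image, point cloud) frame."""
    tracks = [] if tracks is None else tracks
    history = []
    for image, points in frames:
        detections = detect(image, points)        # 3-D boxes [x, y, z, l, w, h, yaw] per target
        tracks = update_tracks(tracks, detections)
        history.append(list(tracks))
    return history
```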
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010466491.4A CN111626217B (en) | 2020-05-28 | 2020-05-28 | Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111626217A CN111626217A (en) | 2020-09-04 |
CN111626217B (en) | 2023-08-22 |
Family
ID=72259546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010466491.4A Active CN111626217B (en) | 2020-05-28 | 2020-05-28 | Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111626217B (en) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112487884A (en) * | 2020-11-16 | 2021-03-12 | 香港中文大学(深圳) | Traffic violation behavior detection method and device and computer readable storage medium |
WO2022104774A1 (en) * | 2020-11-23 | 2022-05-27 | 华为技术有限公司 | Target detection method and apparatus |
CN112598735B (en) * | 2020-12-21 | 2024-02-27 | 西北工业大学 | Single image object pose estimation method integrating three-dimensional model information |
CN114758333B (en) * | 2020-12-29 | 2024-02-13 | 北京瓦特曼科技有限公司 | Identification method and system for unhooking hook of ladle lifted by travelling crane of casting crane |
CN112686865B (en) * | 2020-12-31 | 2023-06-02 | 重庆西山科技股份有限公司 | 3D view auxiliary detection method, system, device and storage medium |
CN112712089B (en) * | 2020-12-31 | 2024-09-20 | 的卢技术有限公司 | Obstacle detection method, obstacle detection device, computer device, and storage medium |
CN112329749B (en) * | 2021-01-05 | 2021-04-27 | 新石器慧通(北京)科技有限公司 | Point cloud labeling method and labeling equipment |
CN112884705B (en) * | 2021-01-06 | 2024-05-14 | 西北工业大学 | Two-dimensional material sample position visualization method |
CN112700429B (en) * | 2021-01-08 | 2022-08-26 | 中国民航大学 | Airport pavement underground structure disease automatic detection method based on deep learning |
CN112862858A (en) * | 2021-01-14 | 2021-05-28 | 浙江大学 | Multi-target tracking method based on scene motion information |
CN113424220B (en) * | 2021-03-30 | 2024-03-01 | 商汤国际私人有限公司 | Processing for generating point cloud completion network and point cloud data |
CN113239749B (en) * | 2021-04-27 | 2023-04-07 | 四川大学 | Cross-domain point cloud semantic segmentation method based on multi-modal joint learning |
CN113189610B (en) * | 2021-04-28 | 2024-06-14 | 中国科学技术大学 | Map-enhanced autopilot multi-target tracking method and related equipment |
CN113177969B (en) * | 2021-04-29 | 2022-07-15 | 哈尔滨工程大学 | Point cloud single-target tracking method of candidate seeds based on motion direction change |
CN113239829B (en) * | 2021-05-17 | 2022-10-04 | 哈尔滨工程大学 | Cross-dimension remote sensing data target identification method based on space occupation probability characteristics |
CN113255504B (en) * | 2021-05-19 | 2022-07-22 | 燕山大学 | Road side visual angle beyond visual range global fusion perception system based on deep learning |
CN113111978B (en) * | 2021-06-11 | 2021-10-01 | 之江实验室 | Three-dimensional target detection system and method based on point cloud and image data |
CN113421242B (en) * | 2021-06-23 | 2023-10-27 | 河北科技大学 | Welding spot appearance quality detection method and device based on deep learning and terminal |
CN113378760A (en) * | 2021-06-25 | 2021-09-10 | 北京百度网讯科技有限公司 | Training target detection model and method and device for detecting target |
CN113537316B (en) * | 2021-06-30 | 2024-04-09 | 南京理工大学 | Vehicle detection method based on 4D millimeter wave radar point cloud |
CN113689393A (en) * | 2021-08-19 | 2021-11-23 | 东南大学 | Three-dimensional target detection algorithm based on image and point cloud example matching |
CN113780446A (en) * | 2021-09-16 | 2021-12-10 | 广州大学 | Lightweight voxel deep learning method capable of being heavily parameterized |
CN114266891B (en) * | 2021-11-17 | 2024-09-24 | 京沪高速铁路股份有限公司 | Railway operation environment abnormality identification method based on image and laser data fusion |
CN114120075B (en) * | 2021-11-25 | 2024-09-24 | 武汉大学 | Three-dimensional target detection method integrating monocular camera and laser radar |
CN114119671B (en) * | 2021-12-01 | 2022-09-09 | 清华大学 | Multi-target tracking method based on occlusion compensation and used for three-dimensional space information fusion |
CN114638954B (en) * | 2022-02-22 | 2024-04-19 | 深圳元戎启行科技有限公司 | Training method of point cloud segmentation model, point cloud data segmentation method and related device |
CN114581523A (en) * | 2022-03-04 | 2022-06-03 | 京东鲲鹏(江苏)科技有限公司 | Method and device for determining labeling data for monocular 3D target detection |
CN114612895B (en) * | 2022-03-18 | 2024-08-23 | 上海伯镭智能科技有限公司 | Road detection method and device in nonstandard road scene |
CN114693909A (en) * | 2022-03-31 | 2022-07-01 | 苏州蓝图智慧城市科技有限公司 | Microcosmic vehicle track sensing equipment based on multi-sensor machine vision fusion |
CN115719443A (en) * | 2022-12-01 | 2023-02-28 | 上海人工智能创新中心 | Method and system for using 2D pre-training model as 3D downstream task backbone network |
CN116758518B (en) * | 2023-08-22 | 2023-12-01 | 安徽蔚来智驾科技有限公司 | Environment sensing method, computer device, computer-readable storage medium and vehicle |
TWI842641B (en) * | 2023-10-19 | 2024-05-11 | 財團法人車輛研究測試中心 | Sensor fusion and object tracking system and method thereof |
CN117237401B (en) * | 2023-11-08 | 2024-02-13 | 北京理工大学前沿技术研究院 | Multi-target tracking method, system, medium and equipment for fusion of image and point cloud |
CN118397282B (en) * | 2024-06-27 | 2024-08-30 | 中国民用航空飞行学院 | Three-dimensional point cloud robustness component segmentation method based on semantic SAM large model |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107610084A (en) * | 2017-09-30 | 2018-01-19 | 驭势科技(北京)有限公司 | A kind of method and apparatus that information fusion is carried out to depth image and laser spots cloud atlas |
CN110321910A (en) * | 2018-03-29 | 2019-10-11 | 中国科学院深圳先进技术研究院 | Feature extracting method, device and equipment towards cloud |
CN108509918A (en) * | 2018-04-03 | 2018-09-07 | 中国人民解放军国防科技大学 | Target detection and tracking method fusing laser point cloud and image |
US10404261B1 (en) * | 2018-06-01 | 2019-09-03 | Yekutiel Josefsberg | Radar target detection system for autonomous vehicles with ultra low phase noise frequency synthesizer |
CN110414418A (en) * | 2019-07-25 | 2019-11-05 | 电子科技大学 | A kind of Approach for road detection of image-lidar image data Multiscale Fusion |
CN110675431A (en) * | 2019-10-08 | 2020-01-10 | 中国人民解放军军事科学院国防科技创新研究院 | Three-dimensional multi-target tracking method fusing image and laser point cloud |
CN110751090A (en) * | 2019-10-18 | 2020-02-04 | 宁波博登智能科技有限责任公司 | Three-dimensional point cloud labeling method and device and electronic equipment |
CN111062423A (en) * | 2019-11-29 | 2020-04-24 | 中国矿业大学 | Point cloud classification method of point cloud graph neural network based on self-adaptive feature fusion |
CN110929692A (en) * | 2019-12-11 | 2020-03-27 | 中国科学院长春光学精密机械与物理研究所 | Three-dimensional target detection method and device based on multi-sensor information fusion |
Non-Patent Citations (1)
Title |
---|
Sourabh Vora et al. PointPainting: Sequential Fusion for 3D Object Detection. arXiv:1911.10150v1, 2019, pp. 1-10. *
Also Published As
Publication number | Publication date |
---|---|
CN111626217A (en) | 2020-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111626217B (en) | Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion | |
CN111201451B (en) | Method and device for detecting object in scene based on laser data and radar data of scene | |
CN113359810B (en) | Unmanned aerial vehicle landing area identification method based on multiple sensors | |
Zhu et al. | Overview of environment perception for intelligent vehicles | |
US20210390329A1 (en) | Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium | |
Fang et al. | 3d-siamrpn: An end-to-end learning method for real-time 3d single object tracking using raw point cloud | |
Chen et al. | Vehicle detection in high-resolution aerial images via sparse representation and superpixels | |
CN113506318B (en) | Three-dimensional target perception method under vehicle-mounted edge scene | |
CN115049700A (en) | Target detection method and device | |
Ahmad et al. | An edge-less approach to horizon line detection | |
Wang et al. | An overview of 3d object detection | |
US12008762B2 (en) | Systems and methods for generating a road surface semantic segmentation map from a sequence of point clouds | |
Balaska et al. | Enhancing satellite semantic maps with ground-level imagery | |
Zhu et al. | A review of 6d object pose estimation | |
CN114325634A (en) | Method for extracting passable area in high-robustness field environment based on laser radar | |
Dewangan et al. | Towards the design of vision-based intelligent vehicle system: methodologies and challenges | |
Zhang et al. | Gc-net: Gridding and clustering for traffic object detection with roadside lidar | |
CN113255779A (en) | Multi-source perception data fusion identification method and system and computer readable storage medium | |
Chiang et al. | 3D point cloud classification for autonomous driving via dense-residual fusion network | |
Li et al. | Real-time monocular joint perception network for autonomous driving | |
Gökçe et al. | Recognition of dynamic objects from UGVs using Interconnected Neuralnetwork-based Computer Vision system | |
Danapal et al. | Sensor fusion of camera and LiDAR raw data for vehicle detection | |
Berrio et al. | Fusing lidar and semantic image information in octree maps | |
Zhao et al. | DHA: Lidar and vision data fusion-based on road object classifier | |
CN116664851A (en) | Automatic driving data extraction method based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 315048 room 5-1-1, building 22, East Zone, Ningbo new material innovation center, high tech Zone, Ningbo, Zhejiang Province; Applicant after: Ningbo Boden Intelligent Technology Co.,Ltd.; Address before: 315040 room 210-521, floor 2, building 003, No. 750, Chuangyuan Road, high tech Zone, Ningbo, Zhejiang; Applicant before: NINGBO BODEN INTELLIGENT TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | ||