WO2024199378A1 - Obstacle feature recognition model training method and apparatus, device, and storage medium - Google Patents
- Publication number
- WO2024199378A1 (PCT/CN2024/084501, CN 2024084501 W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voxel
- obstacle
- real
- information
- features
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 88
- 238000012549 training Methods 0.000 title claims abstract description 57
- 238000003860 storage Methods 0.000 title claims abstract description 10
- 238000006243 chemical reaction Methods 0.000 claims abstract description 44
- 238000000605 extraction Methods 0.000 claims abstract description 37
- 238000004590 computer program Methods 0.000 claims abstract description 26
- 238000004364 calculation method Methods 0.000 claims description 6
- 238000010586 diagram Methods 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 7
- 238000001514 detection method Methods 0.000 description 5
- 230000009286 beneficial effect Effects 0.000 description 3
- 239000000284 extract Substances 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
Definitions
- the present disclosure relates to the technical field of obstacle detection, and in particular to a training method and apparatus for an obstacle feature recognition model, a computer device, a computer-readable storage medium, a computer program product, and a computer program.
- Self-driving cars, also known as driverless cars, use computer equipment to control a vehicle so that it drives automatically on the road.
- the realization of self-driving relies on the collaboration of artificial intelligence, visual computing, radar, and positioning components. Because actual road conditions are complex and contain a large number of obstacles such as pedestrians and vehicles, how to recognize obstacles, determine their characteristics in three-dimensional space, and then plan a driving route that avoids them has become the key to self-driving.
- the related obstacle recognition technology mainly recognizes obstacles from images captured by cameras, owing to the low cost of cameras. Although there are techniques that convert the two-dimensional data collected by cameras into three-dimensional data, their accuracy is poor; as a result, the obstacle features obtained from images and recognition models have low accuracy in three-dimensional space, which affects subsequent autonomous-driving decisions.
- At least one embodiment of the present disclosure provides a training method and apparatus for an obstacle feature recognition model, a computer device, a computer-readable storage medium, a computer program product, and a computer program.
- an embodiment of the present disclosure provides a method for training an obstacle feature recognition model, comprising:
- the initial model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network;
- the initial model is trained according to the real obstacle features and the predicted obstacle features corresponding to the obstacles to obtain an obstacle feature recognition model.
- an embodiment of the present disclosure provides a method for acquiring obstacle features, including:
- the obstacle feature recognition model includes a feature extraction network, a feature conversion network and an obstacle feature prediction network.
- the feature extraction network is used to extract the two-dimensional features corresponding to the surround view image
- the feature conversion network is used to convert the two-dimensional features into three-dimensional features
- the obstacle feature prediction network is used to predict the obstacle prediction features corresponding to the obstacle based on the three-dimensional features.
- the present disclosure provides a training device for an obstacle feature recognition model, including:
- the first acquisition module is used to acquire the surround view image and point cloud data corresponding to the same obstacle
- a determination module configured to determine the real obstacle feature corresponding to each voxel of the obstacle based on the point cloud data and preset voxels in the corresponding three-dimensional space;
- a second acquisition module is used to input the surround view image into an initial model to be trained to obtain obstacle prediction features corresponding to the obstacle, wherein the initial model includes a feature extraction network, a feature conversion network and an obstacle feature prediction network;
- the training module is used to train the initial model according to the real obstacle features and the predicted obstacle features corresponding to the obstacles to obtain an obstacle feature recognition model.
- an embodiment of the present disclosure provides a device for acquiring obstacle characteristics, including:
- An image acquisition module is used to acquire a surround view image of obstacles
- a feature prediction module used to input the surround view image into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle;
- the obstacle feature recognition model includes a feature extraction network, a feature conversion network and an obstacle feature prediction network.
- the feature extraction network is used to extract the two-dimensional features corresponding to the surround view image
- the feature conversion network is used to convert the two-dimensional features into three-dimensional features
- the obstacle feature prediction network is used to predict the obstacle prediction features corresponding to the obstacle based on the three-dimensional features.
- an embodiment of the present disclosure provides a computer device, including: a processor and a memory;
- the processor is used to execute any of the obstacle feature recognition model training methods provided in the embodiments of the present disclosure, or to execute any of the obstacle feature acquisition methods provided in the embodiments of the present disclosure, by calling the program or instruction stored in the memory.
- an embodiment of the present disclosure provides a computer-readable storage medium, wherein the computer-readable storage medium stores a program or instruction, wherein the program or instruction enables a computer to execute any of the training methods for the obstacle feature recognition model provided in the embodiments of the present disclosure, or execute any of the methods for acquiring obstacle features provided in the embodiments of the present disclosure.
- an embodiment of the present disclosure provides a computer program product, which, when executed by a processor, implements the training method of the obstacle feature recognition model described in any embodiment of the first aspect of the present disclosure, or the method for obtaining obstacle features provided in any embodiment of the second aspect of the present disclosure.
- an embodiment of the present disclosure provides a computer program, wherein the computer program includes computer program code, and when the computer program code is run on a computer, the computer executes any one of the first aspects of the present disclosure.
- the surround view image and point cloud data corresponding to the same obstacle are obtained, and based on the point cloud data and the preset voxels in the corresponding three-dimensional space, the real obstacle features corresponding to the obstacle in each voxel are determined, and the surround view image is input into the initial model to be trained to obtain the obstacle prediction features corresponding to the obstacle, and then the initial model is trained according to the real obstacle features and the obstacle prediction features corresponding to the obstacle to obtain the obstacle feature recognition model.
- the initial model is used to predict the obstacle features of the surround view image, and the real obstacle features determined based on the point cloud data are used as the training target, and the initial model is trained to obtain the obstacle feature recognition model, so that the trained model can learn accurate obstacle features from the surround view image, improve the recognition accuracy of the obstacle feature recognition model for the 3D features of the obstacle, and thus help improve the accuracy of obstacle recognition.
- FIG1 is a schematic diagram of a flow chart of a method for training an obstacle feature recognition model provided by an embodiment of the present disclosure
- FIG2 is a schematic flow chart of a method for training an obstacle feature recognition model provided by another embodiment of the present disclosure
- FIG3 is a flow chart of a method for training an obstacle feature recognition model provided by another embodiment of the present disclosure.
- FIG4 is a flow chart of a method for training an obstacle feature recognition model provided in yet another embodiment of the present disclosure.
- FIG5 is a schematic diagram showing a network structure of an obstacle feature recognition model according to a specific embodiment of the present disclosure
- FIG6 is a schematic flow chart of a method for acquiring obstacle features according to an embodiment of the present disclosure.
- FIG7 is a schematic diagram of the structure of a training device for an obstacle feature recognition model provided by an embodiment of the present disclosure
- FIG8 is a schematic diagram of the structure of an obstacle feature acquisition device provided in an embodiment of the present disclosure.
- Figure 1 is a flow chart of a training method for an obstacle feature recognition model provided in an embodiment of the present disclosure.
- the training method for an obstacle feature recognition model can be executed by a training device for an obstacle feature recognition model provided in an embodiment of the present disclosure.
- the training device for an obstacle feature recognition model can be implemented using software and/or hardware and can be integrated on a computer device, which can be an electronic device such as a computer or a server.
- the training method of the obstacle feature recognition model provided in the embodiment of the present disclosure may include steps 101 to 104 .
- Step 101 obtaining surround view images and point cloud data corresponding to the same obstacle.
- the surround view image includes multiple images; for example, it may include images in six directions: front, rear, left front, right front, left rear and right rear.
- Obstacles may be, for example, buildings, trees, human bodies, vehicles and the like.
- a surround view image of the obstacle captured by a camera and point cloud data of the obstacle captured by various sensors may be obtained, for example, point cloud data of the obstacle may be obtained by a laser radar.
- LiDAR is a system that integrates three technologies: laser ranging, the Global Navigation Satellite System (GNSS) and the Inertial Navigation System (INS). By combining these three technologies, it can not only actively perceive, in real time, the environment and the dynamic spatial relationships of objects, but also generate accurate three-dimensional spatial models with consistent absolute measurement points. It is used in surface remote sensing, such as acquiring ground elevation, landform and forestry-survey data, as well as in autonomous driving and high-precision map production.
- GNSS Global Navigation Satellite System
- INS Inertial Navigation System
- LiDAR actively emits a laser beam and measures the time it takes for the light to hit an object or surface and then reflect back to calculate the distance from the target point to the laser radar.
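As an illustrative aside (not part of the patent text), this time-of-flight calculation can be sketched in a few lines of Python; the function name and the example pulse time are assumptions:

```python
# Sketch of LiDAR time-of-flight ranging: the target distance is half the
# round-trip distance travelled by the emitted laser pulse.

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def tof_distance(round_trip_time_s: float) -> float:
    """Distance from the sensor to the target, given the round-trip time."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# A pulse returning after 200 ns corresponds to roughly 30 m.
print(round(tof_distance(200e-9), 2))  # 29.98
```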
- repeating this process rapidly yields millions of data points.
- from these points the system builds a detailed "map" of the geographic surface it is measuring, called a "point cloud". Since each point is the return value of a laser beam reflected off an object, every point in the point cloud contains three-dimensional coordinate values, namely the three elements commonly called x, y and z.
- the point cloud data can be collected manually or automatically.
- the point cloud data within the obstacle range can be collected by manually controlling the laser radar.
- the point cloud data within the obstacle range can be collected autonomously by the laser radar on the autonomous mobile device during movement.
- surround view images and point cloud data may be acquired for multiple obstacles, and corresponding surround view images and point cloud data may be acquired for each obstacle.
- Step 102 Based on the point cloud data and preset voxels in the corresponding three-dimensional space, determine the real features of the obstacle corresponding to each voxel of the obstacle.
- the preset voxels may be obtained by dividing the three-dimensional space into voxels according to a preset size.
- the acquired point cloud data can be converted into a three-dimensional space, and the three-dimensional space can be divided into voxels to obtain a plurality of voxels.
- the point cloud data in each voxel can then be processed to obtain the true features of the obstacle corresponding to the obstacle in each voxel.
- the real features of the obstacle may include the real occupancy information of each voxel when the three-dimensional space corresponding to the obstacle is divided into multiple voxels.
- the occupancy information reflects whether the voxel contains an object.
- an occupied voxel contains an object, and an unoccupied voxel does not contain an object.
- the real obstacle features may also include the true position and size information of the occupied voxels of the obstacle.
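To make Step 102 concrete, here is a hedged Python sketch (not the patent's implementation) of dividing three-dimensional space into preset voxels and recording which voxels the point cloud occupies; `voxelize` and the sample points are illustrative assumptions:

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group 3D points into axis-aligned voxels of a preset size.

    Returns a dict mapping each voxel index (ix, iy, iz) to the points that
    fall inside it; a voxel present in the result is 'occupied', and every
    other voxel of the space is unoccupied."""
    voxels = defaultdict(list)
    for (x, y, z) in points:
        idx = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[idx].append((x, y, z))
    return dict(voxels)

points = [(0.2, 0.3, 0.1), (0.4, 0.1, 0.3), (1.5, 0.2, 0.4)]
occupied = voxelize(points, voxel_size=1.0)
print(sorted(occupied))  # [(0, 0, 0), (1, 0, 0)]
```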
- the obstacles in the point cloud data are treated as real (ground truth), and the real obstacle features determined from them are used as training targets in the subsequent supervised learning.
- Step 103 input the surround view image into the initial model to be trained to obtain obstacle prediction features corresponding to the obstacle.
- the initial model may include a feature extraction network, a feature conversion network, and an obstacle feature prediction network. It is understood that the network structure of the initial model is not limited to the above three networks and can be set according to actual needs, which is not limited in the embodiments of the present disclosure.
- the initial model may include a feature extraction network, a feature conversion network, and an obstacle feature prediction network, and each of the aforementioned networks may be one or more.
- the feature extraction network is used to extract different features.
- the initial model may include the currently commonly used feature extraction network and feature conversion network.
- the feature extraction network is used to extract two-dimensional features based on the surround image, and the feature conversion network is used to convert the two-dimensional features into three-dimensional features, and then the obstacle feature prediction network predicts obstacle features based on the three-dimensional features.
- the obstacle feature prediction network can be one or more. When there are multiple obstacle feature prediction networks, each network can be used to predict features of different categories, and the features predicted by each obstacle feature prediction network are used as the final obstacle features.
- the surround view image is input into the initial model to be trained, and the initial model performs feature extraction, feature conversion, obstacle feature prediction and other processing on the surround view image, and finally outputs the obstacle prediction features obtained based on the surround view image prediction.
- the obstacle prediction features output by the initial model may include at least one of voxel occupancy prediction information, voxel position prediction information and voxel size prediction information.
- Step 104 training the initial model according to the real obstacle features and the predicted obstacle features corresponding to the obstacles to obtain an obstacle feature recognition model.
- after obtaining the obstacle prediction features output by the initial model, the initial model can be trained based on the obstacle prediction features and the real obstacle features corresponding to the same obstacle.
- a trained obstacle feature recognition model is obtained.
- the obstacle feature recognition model can more accurately identify the 3D features of obstacles in the surround view image, providing data support for accurately recognizing obstacles.
- the network parameters of the initial model can be adjusted according to the difference between the predicted features of the obstacle and the real features of the obstacle. For example, if there is a convolutional neural network in the initial model, the number of convolution kernels, step size and other parameters can be adjusted. For another example, the parameters of the pooling layer can be adjusted.
- a preset loss function can be used to calculate the loss value based on the data of the obstacle prediction features and the obstacle real features.
- the network parameters of the initial model are adjusted; when there is no difference between the loss value and the loss threshold or the difference is within the allowable error range, the training is terminated to obtain a trained obstacle feature recognition model.
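The adjust-until-converged loop described above can be sketched generically; `step_fn`, `loss_fn` and the toy quadratic loss below are hypothetical stand-ins, not the patent's actual networks or losses:

```python
def train(initial_params, step_fn, loss_fn, loss_threshold, tolerance, max_iters=1000):
    """Adjust parameters until the loss is within `tolerance` of the target
    `loss_threshold` (the stopping rule described above), or until the
    iteration budget runs out."""
    params = initial_params
    for _ in range(max_iters):
        loss = loss_fn(params)
        if abs(loss - loss_threshold) <= tolerance:
            break                       # converged: stop training
        params = step_fn(params, loss)  # otherwise adjust the parameters
    return params

# Toy example: loss (p - 3)^2, gradient step moves p toward the optimum 3.
final = train(
    initial_params=0.0,
    step_fn=lambda p, loss: p + 0.2 * (3 - p),
    loss_fn=lambda p: (p - 3) ** 2,
    loss_threshold=0.0,
    tolerance=1e-6,
)
print(round(final, 2))  # 3.0
```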
- the obstacle feature recognition model training method of the disclosed embodiment obtains the surround view image and point cloud data corresponding to the same obstacle, divides the corresponding three-dimensional space into voxels based on the point cloud data, determines the real obstacle features corresponding to the obstacle in each voxel, inputs the surround view image into the initial model to be trained to obtain the obstacle prediction features corresponding to the obstacle, and then trains the initial model according to the real obstacle features and the obstacle prediction features to obtain an obstacle feature recognition model.
- the initial model is used to predict the obstacle features of the surround view image
- the real obstacle features determined based on the point cloud data are used as the training target to train the initial model to obtain the obstacle feature recognition model, so that the trained model can learn accurate obstacle features from the surround view image, improve the recognition accuracy of the obstacle feature recognition model for the 3D features of the obstacle, and thus help improve the accuracy of obstacle recognition.
- the real features of the obstacle may include real voxel occupancy information
- the obstacle feature prediction network may include a voxel occupancy prediction network.
- step 103 may include steps 201 to 203.
- Step 201 input the surround view image into a feature extraction network to obtain two-dimensional features corresponding to the surround view image.
- the feature extraction network in the initial model may be a currently commonly used image feature extraction network, which is used to extract two-dimensional features from the surround image.
- Step 202 input the two-dimensional features into the feature conversion network to obtain the three-dimensional features corresponding to the obstacles.
- the feature conversion network in the initial model can be a currently commonly used feature conversion network, and the feature conversion network is used to perform feature conversion and restore the input two-dimensional features to three-dimensional features.
- the surround view image is first subjected to feature extraction by the feature extraction network of the initial model to obtain two-dimensional features, and then the feature conversion network in the initial model performs feature conversion to restore the two-dimensional features to three-dimensional features to obtain the three-dimensional features of the obstacle.
- Step 203 input the three-dimensional features into the voxel occupancy prediction network to obtain voxel occupancy prediction information output by the voxel occupancy prediction network.
- the voxel occupancy prediction network is a network built according to actual needs, that is, the obstacle feature prediction network includes the voxel occupancy prediction network.
- the voxel occupancy prediction network can have a variety of structures, and it can predict the occupancy information of voxels based on three-dimensional features.
- the embodiments of the present disclosure do not limit the specific structure of the voxel occupancy prediction network.
- the three-dimensional features output by the feature conversion network of the initial network are input to the voxel occupancy prediction network, and the voxel occupancy prediction network outputs the voxel occupancy prediction information.
- when the voxel occupancy prediction network makes a prediction based on the three-dimensional features, it can first divide the three-dimensional features into a plurality of voxels, and then predict the occupancy information corresponding to each voxel.
- the voxel occupancy prediction information is used to reflect whether the voxel is occupied. If the voxel is occupied, it means that the voxel contains an object, and if the voxel is not occupied, it means that the voxel does not contain an object.
- the voxel occupancy prediction information can be represented by a preset identifier. For example, "1" is preset to indicate that the voxel is occupied, and "0" is preset to indicate that the voxel is not occupied.
- the voxel occupancy prediction network predicts the occupancy state of each voxel in the three-dimensional feature, for a certain voxel, if the prediction result is that the voxel is occupied, the voxel is marked as "1", and if the prediction result is that the voxel is not occupied, the voxel is marked as "0".
- the label information of the voxel represents the voxel occupancy prediction information.
- the initial model can learn the voxel occupancy of the input surround image. Since whether a voxel is occupied reflects whether there is an object in the voxel, this also provides conditions for further predicting the position of obstacles and narrowing the obstacle detection range.
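As a hedged illustration of the "1"/"0" marking described above, a per-voxel occupancy score (e.g. a sigmoid output) can be thresholded into labels; the 0.5 threshold and the function name are assumptions, not part of the patent:

```python
def occupancy_labels(scores, threshold=0.5):
    """Map each voxel's predicted occupancy score in [0, 1] to the label
    1 (occupied) or 0 (unoccupied) used in the description above."""
    return {voxel: (1 if score >= threshold else 0)
            for voxel, score in scores.items()}

scores = {(0, 0, 0): 0.93, (0, 1, 0): 0.12, (1, 0, 0): 0.55}
print(occupancy_labels(scores))  # {(0, 0, 0): 1, (0, 1, 0): 0, (1, 0, 0): 1}
```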
- the real features of the obstacle may also include the real voxel position information and the real voxel size information corresponding to the voxel
- the obstacle feature prediction network may also include a voxel position prediction network and a voxel size prediction network, as shown in FIG. 2
- step 103 may also include steps 204 to 205.
- Step 204 input the three-dimensional features into the voxel position prediction network to obtain voxel position prediction information output by the voxel position prediction network.
- the voxel position prediction network is a network built according to actual needs, that is, the obstacle feature prediction network also includes the voxel position prediction network.
- the voxel position prediction network can have a variety of structures, and it can predict the position information of voxels based on three-dimensional features.
- the embodiments of the present disclosure do not limit the specific structure of the voxel position prediction network.
- the three-dimensional features can also be input into the voxel position prediction network, and the voxel position prediction network outputs voxel position prediction information based on the three-dimensional features.
- the voxel position prediction information reflects the position of the object (ie, obstacle) within the voxel.
- the position of the center of mass of an object within a voxel can be predicted as voxel position prediction information, and the voxel position prediction information can be expressed as (x, y, z), where x, y, and z represent the coordinate values of the predicted center of mass of the object on the x-axis, y-axis, and z-axis, respectively.
- the coordinate offset of the center of mass of the object in the voxel relative to the center point of the voxel can be predicted as the voxel position prediction information
- the voxel position prediction information can be expressed as (dx, dy, dz), where dx, dy and dz represent the offsets of the object's center of mass relative to the center point of the voxel on the x-axis, y-axis and z-axis, respectively.
- dx, dy and dz are predicted values.
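The offset representation above can be sketched as follows; computing the centroid as a uniform mean of the voxel's points is an assumption, since the patent does not fix how the center of mass is obtained:

```python
def centroid(points):
    """Center of mass of the points in a voxel (uniform point weights)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def centroid_offset(points, voxel_center):
    """(dx, dy, dz): offset of the object's center of mass in the voxel
    relative to the voxel's center point."""
    cx, cy, cz = centroid(points)
    vx, vy, vz = voxel_center
    return (cx - vx, cy - vy, cz - vz)

pts = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.75)]
print(centroid_offset(pts, voxel_center=(0.5, 0.5, 0.5)))  # (0.0, 0.0, 0.0)
```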
- Step 205 input the three-dimensional features into the voxel size prediction network to obtain voxel size prediction information output by the voxel size prediction network.
- the voxel size prediction network is a network built according to actual needs, that is, the obstacle feature prediction network also includes a voxel size prediction network.
- the voxel size prediction network can have a variety of structures; any structure that can predict voxel size information based on the three-dimensional features is sufficient.
- the embodiments of the present disclosure do not limit the specific structure of the voxel size prediction network.
- the three-dimensional features can also be input into the voxel size prediction network, and the voxel size prediction network outputs voxel size prediction information based on the three-dimensional features.
- the voxel size prediction information reflects the size of the object (ie, obstacle) within the voxel.
- the length, width and height of the minimum circumscribed cuboid of the point cloud within the voxel can be predicted as voxel size prediction information.
- the voxel size prediction information can be expressed as (dl, dw, dh), where dl represents the length of the minimum circumscribed cuboid, dw represents the width of the minimum circumscribed cuboid, and dh represents the height of the minimum circumscribed cuboid.
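A minimal sketch of the minimum circumscribed cuboid described above, computed axis-aligned from the points within a voxel; the function name and the sample points are assumptions:

```python
def bounding_cuboid_size(points):
    """(dl, dw, dh): length, width and height of the axis-aligned minimum
    circumscribed cuboid of the point cloud within a voxel."""
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))

pts = [(0.0, 0.25, 0.0), (0.5, 0.5, 0.25), (0.25, 1.0, 0.125)]
print(bounding_cuboid_size(pts))  # (0.5, 0.75, 0.25)
```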
- the initial model includes a voxel occupancy prediction network, a voxel position prediction network and a voxel size prediction network, so that voxel occupancy information, position information and size information can be predicted, and these information constitute obstacle prediction features.
- the obstacle prediction feature can be represented by the prediction features corresponding to multiple voxels, and the prediction features corresponding to each voxel can be expressed as (1, x, y, z, dx, dy, dz, dl, dw, dh), where 1 is the voxel occupancy prediction information, indicating that the voxel is occupied, (x, y, z) represents the center point coordinates of the voxel, (dx, dy, dz) represents the voxel position prediction information, and (dl, dw, dh) represents the voxel size prediction information.
- the obstacle prediction feature can also be expressed as (0), where 0 is the voxel occupancy prediction information, indicating that the voxel is unoccupied, and the unoccupied voxel has no voxel position prediction information and voxel size prediction information; or, the obstacle prediction feature can also be expressed as (0, x, y, z, 0, 0, 0, 0, 0), where the first 0 is the voxel occupancy prediction information, indicating that the voxel is unoccupied, (x, y, z) represents the center point coordinates of the unoccupied voxel, and the remaining 0s indicate that the unoccupied voxels have no voxel position prediction information and voxel size prediction information.
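The per-voxel feature layouts above can be assembled as fixed-length tuples; padding the unoccupied case to the same ten-element length is an assumption made here for uniformity (the text itself shows slightly shorter unoccupied variants):

```python
def voxel_feature(occupied, center, offset=(0, 0, 0), size=(0, 0, 0)):
    """Assemble (occupancy, x, y, z, dx, dy, dz, dl, dw, dh) for one voxel.
    Unoccupied voxels carry zeros in place of position and size predictions."""
    if occupied:
        return (1, *center, *offset, *size)
    return (0, *center, 0, 0, 0, 0, 0, 0)

print(voxel_feature(True, (0.5, 0.5, 0.5), (0.1, 0.0, -0.2), (0.5, 0.75, 0.25)))
print(voxel_feature(False, (1.5, 0.5, 0.5)))
```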
- the training method of the obstacle feature recognition model of the embodiment of the present disclosure obtains the real information of the voxel position and the real information of the voxel size as the learning target, and sets the voxel position prediction network and the voxel size prediction network in the initial model, so that the initial model can learn the voxel position and voxel size of the input surround image, so that the trained model can predict more accurate voxel positions and sizes based on the input surround image, and obtain accurate 3D features of obstacles.
- a first loss value can be determined according to the voxel occupancy prediction information and the real voxel occupancy information
- a second loss value can be determined according to the voxel position prediction information and the real voxel position information
- a third loss value can be determined according to the voxel size prediction information and the real voxel size information
- relevant technologies may be used to first determine at least one voxel pair from the real feature of the obstacle and the predicted feature of the obstacle. For example, for any voxel in the predicted feature of the obstacle, a target voxel with the same center point coordinates may be found in the real feature of the obstacle based on the coordinates of the voxel (i.e., the center point coordinates), and the voxel and the target voxel constitute a voxel pair.
- the first loss value is calculated using the voxel occupancy prediction information and the real voxel occupancy information corresponding to the voxel pair
- the second loss value is calculated using the voxel position prediction information and the real voxel position information corresponding to the voxel pair
- the third loss value is calculated using the voxel size prediction information and the real voxel size information corresponding to the voxel pair.
- the loss function used to calculate each loss value can be pre-set according to actual needs, and the embodiment of the present disclosure does not limit the loss function used to calculate each loss value.
- when adjusting the network parameters of the initial model based on the first loss value, the second loss value, and the third loss value, the network parameters can be adjusted based on each loss value separately. For example, when the first loss value does not meet the corresponding loss threshold, the network parameters of the voxel occupancy prediction network in the initial model can be adjusted, and the network parameters of the networks preceding the voxel occupancy prediction network can also be adjusted; when the second loss value does not meet the corresponding loss threshold, the network parameters of the voxel position prediction network can be adjusted, and the network parameters of the networks preceding the voxel position prediction network can also be adjusted; when the third loss value does not meet the corresponding loss threshold, the network parameters of the voxel size prediction network can be adjusted, and the network parameters of the networks preceding the voxel size prediction network can also be adjusted.
- the loss value of the initial model can be first calculated based on the first loss value, the second loss value, and the third loss value.
- the average of the first loss value, the second loss value, and the third loss value can be calculated as the loss value of the initial model; for another example, the sum of the first loss value, the second loss value, and the third loss value can be calculated as the loss value of the initial model, and so on.
- the network parameters of the initial model can be adjusted based on the loss value of the initial model. When the loss value of the initial model does not meet the preset loss threshold, at least one network parameter of the initial model is adjusted.
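As an illustrative sketch of the three loss values and their combination into a model loss: the disclosure does not fix the loss functions, so the binary cross-entropy (for occupancy) and L1 (for position and size) choices below, along with all variable names and toy values, are assumptions for illustration only.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy: an example choice for the occupancy loss."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def l1(pred, target):
    """Mean absolute error: an example choice for the position/size losses."""
    return np.mean(np.abs(pred - target))

# Toy predictions vs. ground truth for matched voxel pairs.
occ_pred, occ_true = np.array([0.9, 0.2]), np.array([1.0, 0.0])
pos_pred, pos_true = np.array([[0.01, -0.02, 0.0]]), np.zeros((1, 3))
size_pred, size_true = np.array([[0.2, 0.1, 0.1]]), np.array([[0.2, 0.1, 0.12]])

loss1 = bce(occ_pred, occ_true)    # first loss: voxel occupancy
loss2 = l1(pos_pred, pos_true)     # second loss: voxel position
loss3 = l1(size_pred, size_true)   # third loss: voxel size
total = loss1 + loss2 + loss3      # "sum" variant; np.mean([...]) gives the "average" variant
```

Either the sum or the average variant yields a single model loss against which all network parameters can be adjusted.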
- a first loss value is determined based on the voxel occupancy prediction information and the voxel occupancy real information
- a second loss value is determined based on the voxel position prediction information and the voxel position real information
- a third loss value is determined based on the voxel size prediction information and the voxel size real information
- step 102 may include steps 301 to 303 .
- Step 301 convert the point cloud data into a three-dimensional coordinate system.
- point cloud data collected by a sensor can be obtained, and the obtained point cloud data can be converted into a three-dimensional coordinate system. Since each point in the point cloud data contains a three-dimensional coordinate value, during the conversion, the obtained point cloud data can be converted into a three-dimensional coordinate system based on the three-dimensional coordinate value contained in each point.
- Step 302 voxelize the three-dimensional space in the three-dimensional coordinate system according to a preset size to obtain a plurality of voxels in the three-dimensional space.
- the preset size can be pre-set according to actual needs.
- the preset size can be set to 0.1m*0.1m*0.1m, 0.1m*0.2m*0.1m, 0.1m*0.1m*0.2m, 0.1m*0.2m*0.3m, and so on.
- the three-dimensional space may be divided into voxels according to a fixed preset size, and the three-dimensional space may be divided into a plurality of voxels of the same size.
- the three-dimensional space is voxelized according to the preset size, and the three-dimensional space can be divided into (300, 300, 100) voxels of equal size.
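The conversion and voxelization of steps 301 and 302 can be sketched as follows; the `voxelize` helper and its `origin`/`voxel_size` parameters are illustrative names, not from the disclosure.

```python
import numpy as np

def voxelize(points, origin, voxel_size):
    """Map each 3-D point to the integer index of the fixed-size voxel containing it."""
    return np.floor((points - origin) / voxel_size).astype(int)

pts = np.array([[0.05, 0.15, 0.25],
                [0.05, 0.14, 0.26]])
# With a 0.1m * 0.1m * 0.1m preset size, both points fall into voxel (0, 1, 2).
idx = voxelize(pts, origin=np.zeros(3), voxel_size=np.array([0.1, 0.1, 0.1]))
```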
- Step 303 Based on the point cloud data contained in each voxel of the multiple voxels, generate the real features of the obstacle corresponding to the obstacle in each voxel by using the key points of the point cloud in the voxel and the circumscribed figure of the point cloud.
- the real features of the obstacle include the real information of voxel occupancy corresponding to each voxel, the real information of voxel position corresponding to the non-empty voxel, and the real information of voxel size.
- the key point of the point cloud within the voxel can be the centroid of the point cloud within the voxel
- the circumscribed figure of the point cloud within the voxel can be the minimum circumscribed cuboid of the point cloud within the voxel.
- for each voxel, the point cloud data contained in the voxel can be determined, and the real voxel information of the voxel can then be determined based on the point cloud data contained in the voxel, the key point of the point cloud in the voxel, and the circumscribed figure of the point cloud, wherein the real voxel information includes the real voxel occupancy information of the voxel and, for non-empty voxels (i.e., voxels whose real voxel occupancy information is occupied), the real voxel position information and the real voxel size information.
- the real voxel occupancy information corresponding to each voxel, the real voxel position information corresponding to the occupied non-empty voxel, and the real voxel size information constitute the real obstacle feature corresponding to the obstacle to which the point cloud data belongs in each voxel.
- the voxel occupancy real information can be determined based on whether the voxel contains a point, and can be represented by different marks. For example, for a voxel containing a point, "1" can be marked as the voxel occupancy real information of the voxel, indicating that an object exists in the voxel; for a voxel that does not contain a point, "0" can be marked as the voxel occupancy real information of the voxel, indicating that no object exists in the voxel.
- the real voxel position information and the real voxel size information of the voxel can be represented by different parameters, which are described below with examples.
- the true voxel position information of the voxel can be represented by the coordinates corresponding to the centroid (i.e., the key point) of the point cloud within the voxel
- the true voxel size information of the voxel can be represented by the coordinates of the vertices of the minimum circumscribed cuboid of the point cloud within the voxel.
- the true voxel position information of the voxel can be represented by the offset between the coordinates of the center point of the voxel and the coordinates corresponding to the center of mass of the point cloud within the voxel (i.e., the key point), and the true voxel size information of the voxel can be represented by the length, width, and height of the minimum circumscribed cuboid of the point cloud within the voxel.
- the training method of the obstacle feature recognition model of the embodiment of the present disclosure converts point cloud data into a three-dimensional coordinate system, voxelizes the three-dimensional space in the three-dimensional coordinate system according to a preset size, and obtains multiple voxels in the three-dimensional space. Then, based on the point cloud data contained in each voxel of the multiple voxels, the key points of the point cloud in the voxel and the circumscribed figure of the point cloud are used to generate the real obstacle features corresponding to the obstacle in each voxel.
- the real obstacle features include the real voxel occupancy information corresponding to each voxel, the real voxel position information corresponding to the non-empty voxel, and the real voxel size information.
- the real voxel occupancy information, real voxel position information and real voxel size information corresponding to each voxel are further determined as the real voxel information of the voxel according to the point cloud data contained in each voxel.
- the real position and real size corresponding to each voxel are calculated from the point cloud data it contains, so that the size information of different voxels can differ, and accurate real positions and real sizes of the voxels can be obtained. This is conducive to improving the representation accuracy of the obstacle position and size, and also provides data support for training a model that can accurately identify obstacle features.
- step 303 may include steps 401 to 405 .
- Step 401 determining the real voxel occupancy information corresponding to each voxel according to the position of each point in the point cloud data, wherein the real voxel occupancy information includes occupied and unoccupied.
- the position of each point in the point cloud data, that is, the coordinate value of each point, can be used to determine whether each voxel contains a point or a point cloud, and the real voxel occupancy information corresponding to each voxel is determined according to the judgment result. Specifically, for a voxel containing a point, the corresponding real voxel occupancy information is determined to be occupied, and for a voxel containing no point, the corresponding real voxel occupancy information is determined to be unoccupied.
- a non-empty voxel containing at least one point may be assigned a value of 1, indicating that the voxel is occupied and there is an object in the voxel; an empty voxel containing no point may be assigned a value of 0, indicating that the voxel is not occupied and there is no object in the voxel.
- "1" and "0" represent the real information of voxel occupancy.
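Marking the occupancy ground truth over the voxel grid can be sketched as follows; the function name and the dense-grid representation are assumptions for illustration.

```python
import numpy as np

def occupancy_map(voxel_indices, grid_shape):
    """Mark 1 for every voxel containing at least one point, 0 otherwise."""
    occ = np.zeros(grid_shape, dtype=np.int8)
    idx = np.asarray(voxel_indices)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return occ

# Two points share the same voxel, so only two voxels are marked occupied.
occ = occupancy_map([[0, 1, 2], [0, 1, 2], [1, 0, 0]], grid_shape=(2, 2, 3))
```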
- Step 402 for the non-empty voxels whose real voxel occupancy information is occupied, determine the centroid of the point cloud in the non-empty voxel as the key point based on a centroid calculation rule according to at least one point in the non-empty voxel.
- the centroid calculation rule can be a preset rule for determining the centroid of the point cloud in a non-empty voxel. If there is only one point in the non-empty voxel, the point can be determined as the centroid, and the coordinates of the point are the coordinates of the centroid; if there are multiple points in the non-empty voxel, the centroid of the point cloud in the non-empty voxel and its corresponding coordinates can be calculated according to the coordinate values of each point in the non-empty voxel, the mass of each point and other information by the currently commonly used method of determining the centroid of the point cloud.
- the coordinates of the centroid can represent the position of the centroid, that is, the position coordinates.
- the center of mass of the point cloud in the non-empty voxel can be determined based on all points contained in the non-empty voxel (called a point cloud) and the center of mass calculation rule, and the center of mass can be used as the key point of the point cloud in the non-empty voxel.
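Assuming unit mass per point (the disclosure mentions point masses but does not fix them), the centroid of step 402 reduces to the coordinate mean:

```python
import numpy as np

def voxel_centroid(points):
    """Key point of a non-empty voxel: the centroid of its points.
    Assuming unit mass per point, this is the coordinate mean;
    a voxel containing a single point has that point as its centroid."""
    return np.asarray(points, dtype=float).mean(axis=0)
```

For example, `voxel_centroid([[0, 0, 0], [0.1, 0.2, 0.3]])` returns `[0.05, 0.1, 0.15]`.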
- Step 403 Determine the real voxel position information of the non-empty voxel according to the key point and based on a preset position determination rule.
- the voxel position real information and voxel size real information corresponding to each non-empty voxel can be determined according to the key points of the point cloud in the non-empty voxels and based on the preset position determination rules.
- the position determination rule includes: determining the position coordinates corresponding to the key point as the real voxel position information of the non-empty voxel.
- the position coordinates of the determined key point can be directly determined as the real voxel position information corresponding to the non-empty voxel.
- the centroid of the point cloud represents the average position of the mass distribution of the point cloud, and the point cloud describes obstacles in geographic space
- the centroid of the point cloud can be used to characterize the true position of the obstacle. Since the general obstacle detection algorithm in three-dimensional space divides the three-dimensional space into voxels to identify obstacles, in the embodiments of the present disclosure, in order to determine the true position of the voxel, the coordinates of the centroid of the point cloud in the voxel (i.e., the key point) can be used as the true voxel position information of the voxel.
- the position determination rule includes: determining the coordinate offset of the center point of the non-empty voxel relative to the key point as the position information of the non-empty voxel. Therefore, in the embodiment of the present disclosure, for any non-empty voxel, when determining the true voxel position information of the non-empty voxel based on the key point and the preset position determination rule, the first coordinate of the center point of the non-empty voxel can be obtained first; the coordinate offset of the center point relative to the key point of the point cloud in the non-empty voxel can then be determined based on the first coordinate and the second coordinate corresponding to the key point, wherein the coordinate offset includes an offset corresponding to each coordinate axis of the three-dimensional coordinate system; and finally, the first coordinate of the center point of the non-empty voxel and the coordinate offset of the center point relative to the center of mass of the point cloud in the non-empty voxel are determined as the true voxel position information of the non-empty voxel.
- the coordinates of the center point of each voxel can be determined.
- the coordinates corresponding to the center point of the non-empty voxel are referred to as the first coordinates
- the coordinates corresponding to the key points of the point cloud in the non-empty voxel are referred to as the second coordinates.
- the coordinate offset of the center point of the non-empty voxel relative to the key point of the point cloud within the non-empty voxel can be calculated based on the first coordinate corresponding to the center point of the non-empty voxel and the second coordinate corresponding to the key point of the point cloud within the non-empty voxel, wherein the coordinate offset includes the offset of the center point relative to the key point on the x, y and z axes of the three-dimensional coordinate system, respectively, and the offset can be expressed by the coordinate difference between the center point and the key point on the same coordinate axis.
- the coordinates of the point can be regarded as the second coordinates of the key point
- the coordinate offset of the center point of the non-empty voxel relative to the point can be regarded as the coordinate offset of the center point of the non-empty voxel relative to the key point.
- the first coordinate corresponding to the center point and the determined coordinate offset can be determined as the real voxel position information of the non-empty voxel.
- the true voxel position information of a non-empty voxel can be expressed as (x1, y1, z1, dx, dy, dz), where (x1, y1, z1) represents the coordinates of the center point of the non-empty voxel, and (dx, dy, dz) represents the coordinate offset of the center point of the non-empty voxel relative to the key point of the point cloud within the non-empty voxel.
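This center-plus-offset variant of the position rule can be sketched as follows; the helper name is illustrative, not from the disclosure.

```python
import numpy as np

def voxel_position_feature(voxel_center, key_point):
    """(x1, y1, z1, dx, dy, dz): the voxel center coordinates followed by the
    per-axis offset of the center point relative to the point-cloud centroid."""
    center = np.asarray(voxel_center, dtype=float)
    offset = center - np.asarray(key_point, dtype=float)  # coordinate difference per axis
    return np.concatenate([center, offset])
```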
- the center of mass of the point cloud in the non-empty voxel is determined as the key point of the point cloud based on at least one point in the non-empty voxel, and the true voxel position information of the non-empty voxel is determined based on the center of mass.
- the true position of the voxel is determined according to the position of the center of mass of the point cloud in the non-empty voxel, which can improve the accuracy of the voxel position and make the voxel representation of the obstacle position more accurate.
- Step 404 determine the real information of the voxel size of the non-empty voxel by using the circumscribed graph corresponding to at least one point in the non-empty voxel.
- the circumscribed figure refers to the circumscribed figure of all points contained in the non-empty voxel, and the circumscribed figure may be, for example, a minimum circumscribed cuboid.
- for any non-empty voxel, the real voxel size information is determined using at least one point in the non-empty voxel.
- the minimum circumscribed figure of all the points in the non-empty voxel can be determined first, for example, the minimum circumscribed cuboid of all the points in the non-empty voxel can be determined, and then the coordinates of each vertex of the minimum circumscribed figure can be determined, and then the coordinates of each vertex can be used to represent the real voxel size information of the non-empty voxel.
- the minimum circumscribed cuboid of at least one point in the non-empty voxel can be first determined based on at least one point in the non-empty voxel, and then the true voxel size information of the non-empty voxel can be determined based on the length, width and height of the minimum circumscribed cuboid.
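The length, width and height of the circumscribed cuboid can be sketched as follows. The disclosure does not fix the algorithm for the minimum circumscribed cuboid, so the axis-aligned bounding box is used here as a common simplification.

```python
import numpy as np

def bounding_cuboid_dims(points):
    """Length, width and height of the bounding cuboid of the points in a voxel.
    Uses the axis-aligned bounding box as a simplification of the
    'minimum circumscribed cuboid'; degenerate (coplanar/single-point) cases
    yield zero side lengths here and need the preset-length handling below."""
    pts = np.asarray(points, dtype=float)
    return pts.max(axis=0) - pts.min(axis=0)
```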
- the currently commonly used method of determining the minimum circumscribed cuboid can be used for implementation, and the embodiments of the present disclosure do not limit the specific implementation method of determining the minimum circumscribed cuboid.
- the minimum circumscribed cuboid can be a cuboid or a cube.
- the determined minimum circumscribed cuboid can be represented by the coordinates of each vertex.
- different methods can be selected to determine the minimum circumscribed cuboid of at least one point in the non-empty voxel according to the different numbers and positions of points contained in the non-empty voxel.
- the following example is used for exemplary explanation.
- the currently commonly used method can be used to determine the minimum circumscribed cuboid of these points.
- the currently commonly used method can be used to determine the minimum circumscribed rectangle of these points, and for another coordinate axis that is not involved in the plane where these points are located, the length on the coordinate axis can be determined based on the preset side length, thereby obtaining a cuboid, and the cuboid is used as the minimum circumscribed cuboid of these points.
- the minimum circumscribed rectangle of these points can be determined based on these points, and the coordinates of each vertex of the minimum circumscribed rectangle can be obtained, and for the x-axis, the coordinates on the x-axis can be determined based on the preset side length of 0.2 to obtain a cuboid.
- the determined minimum circumscribed rectangle can be used as a face of the cuboid, and the coordinates of the vertex when the length is 0.2 in the positive or negative direction of the x-axis can be determined, and the coordinates of the other four vertices of the cuboid can be obtained, thereby obtaining the coordinates of each vertex on the minimum circumscribed cuboid.
- the determined minimum circumscribed rectangle can be used as the mid-section of the cuboid, and the x-coordinate values of the four vertices of the minimum circumscribed rectangle can be increased by 0.1 and decreased by 0.1 respectively, while the coordinates of the y-axis and the z-axis remain unchanged, to obtain eight new coordinate points.
- the cuboid enclosed by these eight new coordinate points is determined as the minimum circumscribed cuboid of at least two points in the non-empty voxel.
- a cube containing the point can be determined as the minimum circumscribed cuboid of the point according to a preset side length.
- a cuboid containing the point can be determined as the minimum circumscribed cuboid of the point according to a preset length, a preset width, and a preset height.
- the non-empty voxel itself can also be used as the minimum circumscribed cuboid.
- when the minimum circumscribed cuboid is determined, its length, width and height are also determined. Further, the real voxel size information of the non-empty voxel can be determined based on the length, width and height of the minimum circumscribed cuboid.
- the length, width and height of the minimum circumscribed cuboid may be directly used as the true voxel size information of the non-empty voxel.
- each side length of the minimum circumscribed cuboid can be compared with the preset length, wherein each side length includes the lengths corresponding to the length, width and height of the minimum circumscribed cuboid, and the preset length can be set according to actual needs, such as setting the preset length to 0.1; when there is a side length less than the preset length in each side length, the side length less than the preset length is updated to the preset length, and then based on each updated side length, the true voxel size information of the non-empty voxel is determined.
- the length, width and height of the minimum circumscribed cuboid can be compared with the preset length respectively. If at least one side length of the length, width and height is less than the preset length, the side length less than the preset length is replaced with the preset length, and then the updated side length is used as the true voxel size information of the non-empty voxel.
- each side length in the true voxel size information of the non-empty voxel is not less than the preset length, so as to avoid the final non-empty voxel size being too small, resulting in a large number of holes, making the voxel discontinuous, thereby affecting the obstacle detection result.
- the true voxel size information of a non-empty voxel can be expressed as (dl, dw, dh), where dl represents the length of the minimum circumscribed cuboid, dw represents its width, and dh represents its height.
- for example, assume the length, width, and height of the minimum circumscribed cuboid determined based on the points in a non-empty voxel are 0.2, 0.1, and 0.08, respectively, so the true voxel size information would initially be (0.2, 0.1, 0.08). Since the height of the minimum circumscribed cuboid is less than the preset length of 0.1, the height is updated to 0.1, and the final voxel size information of the non-empty voxel is (0.2, 0.1, 0.1).
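The side-length clamping described above can be sketched in a couple of lines; the function name is illustrative.

```python
import numpy as np

def clamp_size(dims, preset_length=0.1):
    """Update any side length below the preset length to the preset length,
    so very small point clusters do not yield degenerate, hole-prone voxels."""
    return np.maximum(np.asarray(dims, dtype=float), preset_length)
```

Here `clamp_size([0.2, 0.1, 0.08])` yields `[0.2, 0.1, 0.1]`, reproducing the example above.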
- the minimum circumscribed cuboid of at least one point in the non-empty voxel is determined based on at least one point in the non-empty voxel
- the real voxel size information of the non-empty voxel is determined based on the length, width and height of the minimum circumscribed cuboid.
- when a non-empty voxel contains only one point, the above-mentioned method of comparing the length, width and height of the minimum circumscribed cuboid with the preset length can also be adopted: any side length less than the preset length is updated to the preset length, and the true voxel size information of the non-empty voxel is then determined based on the updated side lengths. In this way, the true voxel size information can be determined even when the non-empty voxel contains only one point.
- the preset length can also be determined directly as the true voxel size information of a non-empty voxel containing one point. Assuming the preset length is 0.1, for a non-empty voxel containing only one point, (0.1, 0.1, 0.1) can be used as the true voxel size information of the non-empty voxel.
- there is no particular order for the execution of step 403 and step 404.
- the two can be executed simultaneously or one after the other.
- the embodiment of the present disclosure only takes the execution of step 404 after step 403 as an example, which cannot be used as a limitation of the present disclosure.
- Step 405 Generate a real obstacle feature corresponding to the obstacle in each voxel based on the real voxel occupancy information corresponding to each voxel, the real voxel position information corresponding to the non-empty voxel, and the real voxel size information.
- the real obstacle features of the obstacle to which the point cloud data belongs can be generated based on the real voxel occupancy information, the real voxel position information and the real voxel size information of the same non-empty voxel, and the real voxel occupancy information corresponding to each unoccupied empty voxel.
- for empty voxels, the real voxel position information and the real voxel size information may not be provided, or the real voxel position information and the real voxel size information of the empty voxel may be set to 0.
- the real information corresponding to each voxel in the real feature of the obstacle can be represented in a unified format as (p, x, y, z, dl, dw, dh), where p represents the real voxel occupancy information, (x, y, z) represents the coordinates corresponding to the centroid of the point cloud in the voxel, and dl, dw and dh represent the length, width and height of the minimum circumscribed cuboid, respectively.
- the real information corresponding to the unoccupied empty voxels in the real feature of the obstacle may be represented by (p1), where p1 indicates that the voxel is unoccupied.
- the true voxel position information of the occupied non-empty voxel is represented by the coordinates of the center point of the non-empty voxel and the coordinate offset of the center point relative to the center of mass of the point cloud in the non-empty voxel
- the true voxel size information of the non-empty voxel is represented by the length, width and height of the minimum circumscribed cuboid of the point cloud in the non-empty voxel
- the true information of the non-empty voxel can be expressed as (p2, x1, y1, z1, dx, dy, dz, dl, dw, dh), where p2 indicates that the voxel is occupied, (x1, y1, z1) represents the coordinates of the center point of the non-empty voxel, (dx, dy, dz) represents the coordinate offset of the center point relative to the center of mass of the point cloud in the non-empty voxel, and (dl, dw, dh) represents the length, width and height of the minimum circumscribed cuboid.
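Putting the per-voxel steps together, the real feature of one occupied voxel might be assembled as follows. This is a sketch assuming uniform point mass, an axis-aligned bounding cuboid, and side-length clamping to a preset minimum; names and values are illustrative.

```python
import numpy as np

def voxel_ground_truth(points, voxel_center, preset_length=0.1):
    """Assemble one occupied voxel's real feature
    (p, x1, y1, z1, dx, dy, dz, dl, dw, dh) from the points it contains."""
    pts = np.asarray(points, dtype=float)
    center = np.asarray(voxel_center, dtype=float)
    key = pts.mean(axis=0)                 # centroid of the point cloud as key point
    offset = center - key                  # per-axis offset of center vs. centroid
    dims = np.maximum(pts.max(axis=0) - pts.min(axis=0), preset_length)  # clamped size
    return np.concatenate([[1.0], center, offset, dims])  # p = 1: occupied

gt = voxel_ground_truth([[0.02, 0.02, 0.02], [0.08, 0.06, 0.04]],
                        voxel_center=[0.05, 0.05, 0.05])
```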
- the training method of the obstacle feature recognition model of the embodiment of the present disclosure determines the real voxel occupancy information corresponding to each voxel according to the position of each point in the point cloud data, where the real voxel occupancy information includes occupied and unoccupied. For non-empty voxels whose real voxel occupancy information is occupied, the centroid of the point cloud in the non-empty voxel is determined as a key point based on a centroid calculation rule according to at least one point in the non-empty voxel, the real voxel position information of the non-empty voxel is determined based on a preset position determination rule according to the determined key point, and the real voxel size information of the non-empty voxel is determined using the circumscribed figure corresponding to at least one point in the non-empty voxel. Then, based on the real voxel occupancy information and the real voxel position information and real voxel size information corresponding to the non-empty voxels, the real features of the obstacle corresponding to each voxel are generated. In this way, the occupation status of the divided voxels can be marked, and the real position and real size corresponding to each non-empty voxel can be determined.
- the real voxel information is dynamically generated according to the points in the voxel, which makes the position and contour of the obstacle more precise, which is conducive to improving the accuracy of obstacle feature recognition, thereby improving the accuracy of obstacle detection.
- multiple frames of point cloud data collected by the laser radar for the same obstacle can first be obtained, and the multiple frames of point cloud data can then be converted into the same three-dimensional coordinate system, after which duplicate points are removed.
- the laser radar obtains one frame of point cloud of the obstacle at a time, and it is difficult for one frame of point cloud to fully describe all the information of the entire obstacle. Therefore, in the embodiment of the present disclosure, multiple frames of point cloud data collected by the laser radar for the same obstacle at different times can be obtained, and the obtained multiple frames of point cloud data can then be spliced.
- Splicing refers to converting multiple frames of point cloud data into the same three-dimensional coordinate system. It can be understood that for the same three-dimensional coordinate position in three-dimensional space, more than one frame of point cloud data may contain the point corresponding to the position.
- the multiple points corresponding to the position can be deduplicated, and only one point is retained to be displayed in the three-dimensional coordinate system.
- the data of multiple points corresponding to the same three-dimensional coordinate position can also be averaged, and the average obtained is used as the information of the point corresponding to the three-dimensional coordinate position in the three-dimensional coordinate system.
- a denser point cloud can be obtained, so that the point cloud data displayed in the three-dimensional coordinate system can better reflect all the information of the real obstacle.
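Splicing with deduplication-by-averaging might be sketched as follows. The 4x4 homogeneous transforms and the grid quantization used to detect points at the "same" position are assumptions, since the disclosure does not fix either representation.

```python
import numpy as np

def splice_frames(frames, transforms, grid=0.01):
    """Convert multiple point-cloud frames into one coordinate system, then
    average points that land on the same (quantized) 3-D position."""
    merged = []
    for pts, T in zip(frames, transforms):
        homo = np.c_[pts, np.ones(len(pts))]      # homogeneous coordinates
        merged.append((homo @ T.T)[:, :3])        # apply frame-to-common transform
    pts = np.vstack(merged)
    keys = np.round(pts / grid).astype(int)       # quantize to find duplicate positions
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    out = np.zeros((len(uniq), 3))
    np.add.at(out, inv, pts)                      # sum points per unique position
    return out / np.bincount(inv)[:, None]        # average the duplicates

# Two frames share the point at the origin, so 3 input points merge into 2.
dense = splice_frames([np.array([[0.0, 0, 0], [1, 0, 0]]), np.array([[0.0, 0, 0]])],
                      [np.eye(4), np.eye(4)])
```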
- FIG5 shows a schematic diagram of the network structure of an obstacle feature recognition model of a specific embodiment of the present disclosure.
- the obstacle feature recognition model includes a feature extraction network, a feature conversion network, a voxel occupancy prediction network, a voxel position prediction network and a voxel size prediction network, wherein the feature extraction network is used to extract features of the input surround view image to obtain two-dimensional features; the feature conversion network is used to convert the two-dimensional features extracted by the feature extraction network into three-dimensional features; the voxel occupancy prediction network is used to obtain voxel occupancy prediction information based on the three-dimensional features, the voxel position prediction network is used to obtain voxel position prediction information based on the three-dimensional features, and the voxel size prediction network is used to obtain voxel size prediction information based on the three-dimensional features.
- the obstacle feature recognition model shown in FIG5 can be obtained by training using the real features of obstacles obtained based on point cloud data as training targets, wherein the real features of obstacles include real voxel occupancy information, real voxel position information and real voxel size information.
- six surround view images are input into the obstacle feature recognition model; two-dimensional features are first obtained through the feature extraction network, and the two-dimensional features are restored to three-dimensional features after passing through the feature conversion network.
- the obstacle feature recognition model may also include an upsampling network.
- the three-dimensional features are respectively input into the voxel occupancy prediction network, the voxel position prediction network and the voxel size prediction network to obtain voxel occupancy prediction information, voxel position prediction information and voxel size prediction information, and then based on the voxel occupancy prediction information, the voxel position prediction information and the voxel size prediction information, the obstacle prediction features are obtained.
- the embodiment of the present disclosure further provides a method for acquiring obstacle features, which uses the obstacle feature recognition model trained by the above embodiment to obtain obstacle prediction features of the obstacle.
- FIG6 is a flow chart of a method for acquiring obstacle features provided in an embodiment of the present disclosure.
- the method for acquiring obstacle features may be executed by an apparatus for acquiring obstacle features provided in an embodiment of the present disclosure.
- the apparatus for acquiring obstacle features may be implemented using software and/or hardware and may be integrated on a computer device, which may be an electronic device such as a computer or a server.
- the method for acquiring obstacle features may include steps 501 and 502.
- Step 501 Acquire a surround view image of an obstacle.
- the obstacles may be buildings, trees, human bodies, vehicles, etc.
- a camera may be used to capture a surround view image of an obstacle, the surround view image including multiple images, for example, the surround view image may include images in six directions: front, rear, left front, right front, left rear, and right rear.
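The six-direction surround view input described above might be assembled as follows; the camera names, image resolution, and the (views, channels, H, W) tensor layout are assumptions for illustration:

```python
import numpy as np

# Hypothetical setup: six directional cameras, each producing an HxW RGB image.
directions = ['front', 'rear', 'front_left', 'front_right', 'rear_left', 'rear_right']
h, w = 224, 224

rng = np.random.default_rng(0)
images = {d: rng.integers(0, 256, (h, w, 3), dtype=np.uint8) for d in directions}

# Stack into a single (6, 3, H, W) float tensor, a common layout for surround-view model input.
batch = np.stack([images[d].transpose(2, 0, 1) for d in directions]).astype(np.float32) / 255.0
print(batch.shape)  # (6, 3, 224, 224)
```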
- Step 502 Input the surround view image into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle.
- the obstacle feature recognition model is pre-trained by the obstacle feature recognition model training method provided in the above embodiment.
- the obstacle feature recognition model includes a feature extraction network, a feature conversion network and an obstacle feature prediction network.
- the feature extraction network is used to extract two-dimensional features corresponding to the surround view image
- the feature conversion network is used to convert the two-dimensional features into three-dimensional features
- the obstacle feature prediction network is used to predict the obstacle prediction features corresponding to the obstacle based on the three-dimensional features.
- the acquired surround view image of the obstacle is input into the obstacle feature recognition model that has been trained in advance.
- the feature extraction network in the obstacle feature recognition model first extracts features from the surround view image to obtain two-dimensional features corresponding to the surround view image.
- the two-dimensional features output by the feature extraction network are input into the feature conversion network, which performs feature conversion to convert the two-dimensional features into three-dimensional features corresponding to the obstacle.
- the three-dimensional features output by the feature conversion network are input into the obstacle feature prediction network, which performs obstacle feature prediction based on the input three-dimensional features and outputs obstacle prediction features corresponding to the obstacle.
- the obstacle feature prediction network can be set according to actual needs.
- the obstacle feature prediction network may include but is not limited to at least one of a voxel occupancy prediction network, a voxel position prediction network, and a voxel size prediction network. Accordingly, at least one of the voxel occupancy prediction information, the voxel position prediction information, and the voxel size prediction information can be predicted as the obstacle prediction feature of the obstacle.
- the obstacle feature acquisition method of the disclosed embodiment obtains a surround view image of the obstacle, inputs the surround view image into a pre-trained obstacle feature recognition model, and obtains obstacle prediction features corresponding to the obstacle.
- by using a pre-trained obstacle feature recognition model, it is possible to obtain high-precision obstacle features of the obstacle in three-dimensional space based on the surround view image of the obstacle, which is beneficial to improving the accuracy of obstacle recognition, thereby helping the vehicle to effectively avoid obstacles in autonomous driving.
- the embodiment of the present disclosure also provides a training device for an obstacle feature recognition model.
- FIG. 7 is a schematic diagram of the structure of a training device for an obstacle feature recognition model provided by an embodiment of the present disclosure. The device can be implemented by software and/or hardware, and can be integrated on a computer device, which can be an electronic device such as a computer or a server.
- the obstacle feature recognition model training device 60 may include: a first acquisition module 601, a determination module 602, a second acquisition module 603 and a training module 604.
- the first acquisition module 601 is used to acquire surround view images and point cloud data corresponding to the same obstacle.
- the determination module 602 is used to determine the real obstacle feature corresponding to each voxel of the obstacle based on the point cloud data and the preset voxels in the corresponding three-dimensional space.
- the second acquisition module 603 is used to input the surround view image into the initial model to be trained to obtain obstacle prediction features corresponding to the obstacle, wherein the initial model includes a feature extraction network, a feature conversion network and an obstacle feature prediction network.
- the training module 604 is used to train the initial model according to the real obstacle features and the predicted obstacle features corresponding to the obstacles to obtain an obstacle feature recognition model.
- the real obstacle feature includes real voxel occupancy information
- the obstacle feature prediction network includes a voxel occupancy prediction network
- the second acquisition module 603 is further used to:
- the three-dimensional features are input into the voxel occupancy prediction network to obtain voxel occupancy prediction information output by the voxel occupancy prediction network.
- the real obstacle feature further includes real voxel position information and real voxel size information corresponding to the voxel
- the obstacle feature prediction network further includes a voxel position prediction network and a voxel size prediction network; the second acquisition module 603 is further used to:
- the three-dimensional features are input into the voxel position prediction network to obtain voxel position prediction information output by the voxel position prediction network, and the three-dimensional features are input into the voxel size prediction network to obtain voxel size prediction information output by the voxel size prediction network.
- the training module 604 is further configured to:
- a network parameter of the initial model is adjusted.
- the determining module 602 includes:
- a conversion submodule used for converting the point cloud data into a three-dimensional coordinate system
- a segmentation submodule used for voxelizing the three-dimensional space in the three-dimensional coordinate system according to a preset size to obtain a plurality of voxels in the three-dimensional space;
- a generation submodule used to generate, based on the point cloud data contained in each of the multiple voxels, the real obstacle features corresponding to the obstacle in each voxel by using the key points of the point cloud in the voxel and the circumscribed graph of the point cloud, wherein the real obstacle features include the real voxel occupancy information corresponding to each voxel, and the real voxel position information and real voxel size information corresponding to the non-empty voxels.
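The voxelization performed by the conversion and segmentation submodules can be sketched as follows; the grid origin, voxel size, and point coordinates are made-up example values:

```python
import numpy as np

# Minimal voxelization sketch (assumed ranges and sizes, not the patent's actual values).
points = np.array([[0.20, 0.30, 0.10],
                   [0.25, 0.35, 0.15],
                   [2.10, 2.20, 0.40]])      # point cloud in the 3D coordinate system, metres
voxel_size = np.array([0.5, 0.5, 0.5])       # preset size used to partition the 3D space
grid_min = np.zeros(3)                       # origin of the voxel grid
grid_shape = (8, 8, 4)

# Assign each point to a voxel index and mark those voxels as occupied (non-empty).
idx = np.floor((points - grid_min) / voxel_size).astype(int)
occupancy = np.zeros(grid_shape, dtype=bool)
occupancy[tuple(idx.T)] = True

print(int(occupancy.sum()))  # number of non-empty voxels -> 2
```

Here the first two points fall into the same voxel, so only two voxels end up occupied.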
- the generating submodule comprises:
- a first determining unit configured to determine, according to the position of each point in the point cloud data, real voxel occupancy information corresponding to each voxel, wherein the real voxel occupancy information includes occupied and unoccupied;
- a second determining unit is configured to determine, for the non-empty voxel whose real voxel occupancy information is occupied, the centroid of the point cloud in the non-empty voxel as the key point based on a centroid calculation rule according to at least one point in the non-empty voxel;
- a third determination unit configured to determine the real voxel position information of the non-empty voxel according to the key point and based on a preset position determination rule
- a fourth determining unit configured to determine the real voxel size information of the non-empty voxel by using the circumscribed graph corresponding to at least one point in the non-empty voxel;
- a generating unit is used to generate a real obstacle feature corresponding to the obstacle in each voxel based on the real voxel occupancy information corresponding to each voxel, the real voxel position information corresponding to the non-empty voxel, and the real voxel size information.
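For a single non-empty voxel, the units above might compute the real features roughly as follows; the voxel origin, the example points, and the use of an axis-aligned minimum circumscribed cuboid are illustrative assumptions:

```python
import numpy as np

# Sketch of the real features for one non-empty voxel (assumed conventions, not the patent's exact rules).
voxel_origin = np.array([0.0, 0.0, 0.0])
voxel_size = 0.5
pts = np.array([[0.20, 0.30, 0.10],
                [0.25, 0.35, 0.15],
                [0.40, 0.10, 0.30]])          # points falling inside this voxel

# Key point: centroid of the point cloud in the voxel.
key_point = pts.mean(axis=0)

# Position: offset of the key point from the voxel centre (one of the rules described above).
voxel_center = voxel_origin + voxel_size / 2
offset = key_point - voxel_center

# Size: side lengths of the axis-aligned minimum circumscribed cuboid of the points.
side_lengths = pts.max(axis=0) - pts.min(axis=0)

print(np.round(key_point, 4), np.round(offset, 4), np.round(side_lengths, 2))
```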
- the position determination rule includes: determining the position coordinates corresponding to the key point as the real voxel position information of the non-empty voxel.
- the position determination rule includes: determining the coordinate offset of the center point of the non-empty voxel relative to the key point as the position information of the non-empty voxel; the third determination unit is further used to:
- the first coordinate and the coordinate offset are determined as the real voxel position information of the non-empty voxel.
- the fourth determining unit is further configured to:
- the real voxel size information of the non-empty voxel is determined.
- the fourth determining unit is further configured to:
- compare each side length of the minimum circumscribed cuboid with a preset length, wherein the side lengths are those corresponding to the length, width, and height of the minimum circumscribed cuboid;
- the real voxel size information of the non-empty voxel is determined.
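One plausible reading of the comparison step is clamping each side length to the preset length; the clamping behaviour and the preset value below are assumptions, since the text only states that the side lengths are compared with a preset length:

```python
import numpy as np

# Sketch of the side-length comparison step (the clamping choice here is an assumption).
preset_length = 0.5                            # e.g. the voxel edge length
raw_sides = np.array([0.62, 0.25, 0.48])       # length, width, height of the minimum circumscribed cuboid

# Cap each side at the preset length so the size labels stay within one voxel.
real_size = np.minimum(raw_sides, preset_length)
print(real_size)  # [0.5  0.25 0.48]
```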
- the training device for the obstacle feature recognition model provided in the embodiments of the present disclosure, which can be configured on a computer device, can execute any training method for the obstacle feature recognition model provided in the embodiments of the present disclosure that is applied to a computer device, and has the corresponding functional modules and beneficial effects of the executed method.
- the embodiment of the present disclosure also provides a device for acquiring obstacle characteristics.
- FIG8 is a schematic diagram of the structure of an obstacle feature acquisition device provided in an embodiment of the present disclosure.
- the device may be implemented using software and/or hardware and may be integrated on a computer device, which may be an electronic device such as a computer or a server.
- the obstacle feature acquisition device 70 may include: an image acquisition module 701 and a feature prediction module 702.
- the image acquisition module 701 is used to acquire a surround view image of an obstacle.
- the feature prediction module 702 is used to input the surround view image into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle.
- the obstacle feature recognition model includes a feature extraction network, a feature conversion network and an obstacle feature prediction network.
- the feature extraction network is used to extract the two-dimensional features corresponding to the surround view image
- the feature conversion network is used to convert the two-dimensional features into three-dimensional features
- the obstacle feature prediction network is used to predict the obstacle prediction features corresponding to the obstacle based on the three-dimensional features.
- the obstacle feature acquisition device provided in the embodiments of the present disclosure, which can be configured on a computer device, can execute any obstacle feature acquisition method provided in the embodiments of the present disclosure that is applied to a computer device, and has the corresponding functional modules and beneficial effects of the executed method.
- the embodiments of the present disclosure further provide a computer device, including a processor and a memory; the processor is used to execute the steps of each embodiment of the training method of the obstacle feature recognition model as described in any of the aforementioned embodiments, or to execute the steps of each embodiment of the method for obtaining obstacle features as described in any of the aforementioned embodiments, by calling the program or instructions stored in the memory. To avoid repeated description, they are not repeated here.
- the embodiments of the present disclosure further provide a computer-readable storage medium, which is non-transitory and stores programs or instructions.
- the programs or instructions enable a computer to execute the steps of each embodiment of the method for training an obstacle feature recognition model as described in any of the aforementioned embodiments, or to execute the steps of each embodiment of the method for acquiring obstacle features as described in any of the aforementioned embodiments. To avoid repeated description, they are not repeated here.
- the embodiments of the present disclosure also provide a computer program product, which is used to execute the steps of each embodiment of the method for training an obstacle feature recognition model as described in any of the aforementioned embodiments, or to execute the steps of each embodiment of the method for acquiring obstacle features as described in any of the aforementioned embodiments.
- the present disclosure also provides a computer program, wherein the computer program includes computer program code, and when the computer program code is run on a computer, the computer executes the training method of the obstacle feature recognition model as described in any of the above embodiments, or the method for acquiring obstacle features as described in any of the above embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Image Processing (AREA)
Abstract
Provided are an obstacle feature recognition model training method and apparatus, a computer device, a computer-readable storage medium, a computer program product, and a computer program. The method comprises: acquiring a surround-view image and point cloud data that are corresponding to a same obstacle; on the basis of the point cloud data and voxels preset in a corresponding three-dimensional space, determining a real obstacle feature corresponding to an obstacle in each voxel; inputting the surround-view image into an initial model to be trained, to obtain a predicted obstacle feature corresponding to the obstacle, wherein the initial model comprises a feature extraction network, a feature conversion network, and an obstacle feature prediction network; and training the initial model according to the real obstacle features and the predicted obstacle feature that are corresponding to the obstacle, so as to obtain an obstacle feature recognition model.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202310333989.7, filed in China on March 30, 2023, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of obstacle detection, and in particular to a training method and apparatus for an obstacle feature recognition model, a computer device, a computer-readable storage medium, a computer program product, and a computer program.
Self-driving cars, also known as driverless cars, use computer equipment to control a vehicle so that it drives automatically on the road; autonomous driving relies on the cooperation of artificial intelligence, visual computing, radar, and positioning components. Because actual road conditions are complex and contain a large number of obstacles such as pedestrians and vehicles, recognizing obstacles, determining their characteristics in three-dimensional space, and then planning a driving route that avoids them has become the key to autonomous driving.
Because cameras are inexpensive, related obstacle recognition technology mainly recognizes obstacles from images captured by cameras. Although techniques exist for converting the two-dimensional data collected by cameras into three-dimensional data, their accuracy is poor, so the obstacle features obtained from images and recognition models have low accuracy in three-dimensional space, which affects subsequent decision-making in autonomous driving.
Therefore, how to obtain a recognition model that can output high-precision obstacle features has become an urgent problem to be solved.
Summary of the Invention
In order to solve the above technical problems, or at least partially solve them, at least one embodiment of the present disclosure provides a training method and apparatus for an obstacle feature recognition model, a computer device, a computer-readable storage medium, a computer program product, and a computer program.
In a first aspect, an embodiment of the present disclosure provides a method for training an obstacle feature recognition model, comprising:
acquiring a surround view image and point cloud data corresponding to the same obstacle;
determining, based on the point cloud data and preset voxels in a corresponding three-dimensional space, the real obstacle feature corresponding to the obstacle in each voxel;
inputting the surround view image into an initial model to be trained to obtain obstacle prediction features corresponding to the obstacle, wherein the initial model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network; and
training the initial model according to the real obstacle features and the predicted obstacle features corresponding to the obstacle to obtain an obstacle feature recognition model.
In a second aspect, an embodiment of the present disclosure provides a method for acquiring obstacle features, comprising:
acquiring a surround view image of an obstacle;
inputting the surround view image into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle;
wherein the obstacle feature recognition model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network; the feature extraction network is used to extract two-dimensional features corresponding to the surround view image, the feature conversion network is used to convert the two-dimensional features into three-dimensional features, and the obstacle feature prediction network is used to predict, based on the three-dimensional features, the obstacle prediction features corresponding to the obstacle.
In a third aspect, an embodiment of the present disclosure provides a training apparatus for an obstacle feature recognition model, comprising:
a first acquisition module, configured to acquire a surround view image and point cloud data corresponding to the same obstacle;
a determination module, configured to determine, based on the point cloud data and preset voxels in a corresponding three-dimensional space, the real obstacle feature corresponding to the obstacle in each voxel;
a second acquisition module, configured to input the surround view image into an initial model to be trained to obtain obstacle prediction features corresponding to the obstacle, wherein the initial model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network; and
a training module, configured to train the initial model according to the real obstacle features and the predicted obstacle features corresponding to the obstacle to obtain an obstacle feature recognition model.
In a fourth aspect, an embodiment of the present disclosure provides an apparatus for acquiring obstacle features, comprising:
an image acquisition module, configured to acquire a surround view image of an obstacle;
a feature prediction module, configured to input the surround view image into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle;
wherein the obstacle feature recognition model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network; the feature extraction network is used to extract two-dimensional features corresponding to the surround view image, the feature conversion network is used to convert the two-dimensional features into three-dimensional features, and the obstacle feature prediction network is used to predict, based on the three-dimensional features, the obstacle prediction features corresponding to the obstacle.
In a fifth aspect, an embodiment of the present disclosure provides a computer device, including a processor and a memory;
the processor is configured, by calling the program or instructions stored in the memory, to execute any training method for the obstacle feature recognition model provided in the embodiments of the present disclosure, or to execute any method for acquiring obstacle features provided in the embodiments of the present disclosure.
In a sixth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing programs or instructions that cause a computer to execute any training method for the obstacle feature recognition model provided in the embodiments of the present disclosure, or to execute any method for acquiring obstacle features provided in the embodiments of the present disclosure.
In a seventh aspect, an embodiment of the present disclosure provides a computer program product which, when executed by a processor, implements the training method of the obstacle feature recognition model described in any embodiment of the first aspect of the present disclosure, or the method for acquiring obstacle features provided in any embodiment of the second aspect of the present disclosure.
In an eighth aspect, an embodiment of the present disclosure provides a computer program, wherein the computer program includes computer program code, and when the computer program code is run on a computer, the computer executes the training method of the obstacle feature recognition model described in any embodiment of the first aspect of the present disclosure, or the method for acquiring obstacle features provided in any embodiment of the second aspect of the present disclosure.
Compared with the related art, the technical solutions provided by the embodiments of the present disclosure have at least the following advantages:
In the embodiments of the present disclosure, a surround view image and point cloud data corresponding to the same obstacle are acquired; based on the point cloud data and preset voxels in the corresponding three-dimensional space, the real obstacle feature corresponding to the obstacle in each voxel is determined; the surround view image is input into an initial model to be trained to obtain obstacle prediction features corresponding to the obstacle; and the initial model is then trained according to the real obstacle features and the predicted obstacle features to obtain an obstacle feature recognition model. With this technical solution, the initial model predicts obstacle features from the surround view image, and the real obstacle features determined from the point cloud data serve as the training target, so that the trained model learns accurate obstacle features from surround view images. This improves the model's accuracy in recognizing the 3D features of obstacles and thus helps improve the accuracy of obstacle recognition.
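A single training step of the kind described above might be sketched as follows; the specific losses (binary cross-entropy on occupancy, L1 on position and size over non-empty voxels) are common choices assumed here for illustration, not ones stated in the disclosure:

```python
import numpy as np

# Hypothetical training step: compare predicted voxel features with the real ones
# determined from point cloud data, and combine the per-head losses.
rng = np.random.default_rng(0)
grid = (10, 10, 4)

occ_true = rng.random(grid) > 0.8                      # real voxel occupancy (bool mask)
occ_pred = rng.random(grid)                            # predicted occupancy probability
pos_true, pos_pred = rng.random(grid + (3,)), rng.random(grid + (3,))
size_true, size_pred = rng.random(grid + (3,)), rng.random(grid + (3,))

# Binary cross-entropy on occupancy over all voxels.
eps = 1e-7
bce = -np.mean(occ_true * np.log(occ_pred + eps) + (~occ_true) * np.log(1 - occ_pred + eps))

# L1 regression on position and size, only over occupied (non-empty) voxels.
mask = occ_true
l1_pos = np.abs(pos_pred - pos_true)[mask].mean()
l1_size = np.abs(size_pred - size_true)[mask].mean()

loss = bce + l1_pos + l1_size   # the loss would then drive a parameter update of the initial model
print(float(loss) > 0.0)
```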
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a method for training an obstacle feature recognition model provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a method for training an obstacle feature recognition model provided by another embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of a method for training an obstacle feature recognition model provided by yet another embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of a method for training an obstacle feature recognition model provided by still another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the network structure of an obstacle feature recognition model according to a specific embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of a method for acquiring obstacle features provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the structure of a training apparatus for an obstacle feature recognition model provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the structure of an apparatus for acquiring obstacle features provided by an embodiment of the present disclosure.
In order to understand the above objects, features, and advantages of the present disclosure more clearly, the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the described embodiments are only some of the embodiments of the present disclosure rather than all of them; the specific embodiments described herein are merely intended to explain the present disclosure rather than to limit it, and in the absence of conflict, the embodiments of the present disclosure and the features therein may be combined with one another. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments fall within the protection scope of the present disclosure.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described herein; obviously, the embodiments in the specification are only some, rather than all, of the embodiments of the present disclosure.
图1为本公开一实施例提供的障碍物特征识别模型的训练方法的流程示意图,该障碍物特征识别模型的训练方法可以由本公开实施例提供的障碍物特征识别模型的训练装置执行,该障碍物特征识别模型的训练装置可以采用软件和/或硬件实现,并可集成在计算机设备上,所述计算机设备可以是电脑、服务器等电子设备。Figure 1 is a flow chart of a training method for an obstacle feature recognition model provided in an embodiment of the present disclosure. The training method for an obstacle feature recognition model can be executed by a training device for an obstacle feature recognition model provided in an embodiment of the present disclosure. The training device for an obstacle feature recognition model can be implemented using software and/or hardware and can be integrated on a computer device, which can be an electronic device such as a computer or a server.
如图1所示,本公开实施例提供的障碍物特征识别模型的训练方法,可以包括步骤101至步骤104。As shown in FIG. 1 , the training method of the obstacle feature recognition model provided in the embodiment of the present disclosure may include steps 101 to 104 .
步骤101,获取同一障碍物对应的环视图像及点云数据。Step 101, obtaining surround view images and point cloud data corresponding to the same obstacle.
其中,环视图像包括多张图像,例如,环视图像可以包括前、后、左前、右前、左后和右后六个方向的图像。障碍物例如可以是建筑、树木、人体、车辆等。The surround image includes multiple images, for example, the surround image may include images in six directions: front, rear, left front, right front, left rear and right rear. Obstacles may be, for example, buildings, trees, human bodies, vehicles and the like.
本公开实施例中,针对同一障碍物,可以获取摄像机采集的该障碍物的环视图像,以及获取各种传感器采集的该障碍物的点云数据,例如由激光雷达来获取障碍物的点云数据。In the disclosed embodiment, for the same obstacle, a surround view image of the obstacle captured by a camera and point cloud data of the obstacle captured by various sensors may be obtained, for example, point cloud data of the obstacle may be obtained by a laser radar.
激光雷达是一种集激光、全球卫星导航系统(Global Navigation Satellite System,GNSS)和惯性导航系统(Inertial Navigation System,INS)三种技术于一身的系统,通过这三种技术的结合,不仅可以主动、实时地感知环境、物体动态空间位置关系,也可在一致绝对测量点位的情况下,生成精确的三维空间模型,应用于地表遥感,例如地面高程和地貌、林业调查等数据获取,以及自动驾驶和高精度地图制作。LiDAR is a system that integrates three technologies: laser, Global Navigation Satellite System (GNSS) and Inertial Navigation System (INS). Through the combination of these three technologies, it can not only actively and in real time perceive the environment and the dynamic spatial position relationship of objects, but also generate accurate three-dimensional spatial models under the condition of consistent absolute measurement points. It is used in surface remote sensing, such as ground elevation and landform, forestry survey and other data acquisition, as well as autonomous driving and high-precision map production.
激光雷达主动发射激光束，通过测量光线打到物体或表面再反射回来所需要的时间，来计算激光雷达到目标点的距离，上述行为在快速重复过程中会获取数百万个数据点，利用这些数据点，系统会构建出其正在测量的地理空间表面的复杂“地图”，称为“点云”。由于点云是激光雷达收到的光束对物体的反馈数值，所以点云中的每个点都包含了三维坐标数值，也即我们常说的x、y、z三个元素。LiDAR actively emits laser beams and calculates the distance from the LiDAR to a target point by measuring the time it takes for the light to hit an object or surface and reflect back. Rapidly repeating this process yields millions of data points, from which the system builds a complex "map" of the geospatial surface being measured, called a "point cloud". Since the point cloud consists of the return values of the beams reflected off objects and received by the LiDAR, each point in the point cloud contains three-dimensional coordinate values, namely the three elements commonly referred to as x, y and z.
能够理解的是,点云数据可以通过人工或自动的方式进行采集。例如,通过人为控制激光雷达采集障碍物范围内的点云数据。通过自主移动设备上的激光雷达在移动过程中自主采集障碍物范围内的点云数据。It is understood that the point cloud data can be collected manually or automatically. For example, the point cloud data within the obstacle range can be collected by manually controlling the laser radar. The point cloud data within the obstacle range can be collected autonomously by the laser radar on the autonomous mobile device during movement.
本公开实施例中,为了获得丰富的训练数据,可以针对多个障碍物获取环视图像和点云数据,针对每个障碍物,均获取对应的环视图像和点云数据。In the disclosed embodiment, in order to obtain rich training data, surround view images and point cloud data may be acquired for multiple obstacles, and corresponding surround view images and point cloud data may be acquired for each obstacle.
步骤102,基于所述点云数据,以及对应三维空间中预设的体素,确定所述障碍物在每个体素中对应的障碍物真实特征。Step 102: Based on the point cloud data and preset voxels in the corresponding three-dimensional space, determine the real features of the obstacle corresponding to each voxel of the obstacle.
其中,预设的体素可以是按照预设的尺寸对三维空间进行体素划分得到的。The preset voxels may be obtained by dividing the three-dimensional space into voxels according to a preset size.
本公开实施例中,对于获取的点云数据,可以将点云数据转换至三维空间中,并对三维空间进行体素划分,得到多个体素,再对各体素中的点云数据进行处理,得到障碍物在每个体素中对应的障碍物真实特征。In the disclosed embodiment, the acquired point cloud data can be converted into a three-dimensional space, and the three-dimensional space can be divided into voxels to obtain a plurality of voxels. The point cloud data in each voxel can then be processed to obtain the true features of the obstacle corresponding to the obstacle in each voxel.
其中，障碍物真实特征可以包括将障碍物对应的三维空间划分为多个体素时，每个体素的真实占用信息，占用信息反映了体素内是否包含物体，被占用的体素中包含物体，未被占用的体素中则不包含物体；也可以包括障碍物中被占用体素的真实位置和尺寸信息。The real features of the obstacle may include the real occupancy information of each voxel when the three-dimensional space corresponding to the obstacle is divided into multiple voxels. The occupancy information reflects whether a voxel contains an object: an occupied voxel contains an object, while an unoccupied voxel does not. The real features may also include the real position and size information of the occupied voxels of the obstacle.
本公开实施例中，点云数据中的障碍物被认定为是真实的，进而以该障碍物作为基准，确定的障碍物真实特征作为训练目标，参与后续的监督学习。In the disclosed embodiment, the obstacle in the point cloud data is regarded as real; with this obstacle as the benchmark, the determined real obstacle features are used as the training target for subsequent supervised learning.
步骤103,将所述环视图像输入待训练的初始模型,得到所述障碍物对应的障碍物预测特征。Step 103: input the surround view image into the initial model to be trained to obtain obstacle prediction features corresponding to the obstacle.
其中,初始模型可以包括特征提取网络、特征转换网络和障碍物特征预测网络。能够理解的是,初始模型的网络结构并不限于上述三种网络,可以根据实际需要进行设置,本公开实施例对此不作限制。The initial model may include a feature extraction network, a feature conversion network, and an obstacle feature prediction network. It is understood that the network structure of the initial model is not limited to the above three networks and can be set according to actual needs, which is not limited in the embodiments of the present disclosure.
在一些实施例中,初始模型可以包括特征提取网络、特征转换网络以及障碍物特征预测网络,前述各个网络可以为一个或多个。其中,特征提取网络用于提取不同的特征,例如,初始模型可以包括目前常用的特征提取网络以及特征转换网络,特征提取网络用于基于环视图像提取出二维特征,特征转换网络用于将二维特征转换为三维特征,进而由障碍物特征预测网络基于三维特征进行障碍物特征预测。能够理解的是,根据实际需求,障碍物特征预测网络可以是一个或多个,当障碍物特征预测网络为多个时,每个网络可以用于预测不同类别的特征,将各障碍物特征预测网络分别预测的特征作为最终得到的障碍物特征。In some embodiments, the initial model may include a feature extraction network, a feature conversion network, and an obstacle feature prediction network, and each of the aforementioned networks may be one or more. Among them, the feature extraction network is used to extract different features. For example, the initial model may include the currently commonly used feature extraction network and feature conversion network. The feature extraction network is used to extract two-dimensional features based on the surround image, and the feature conversion network is used to convert the two-dimensional features into three-dimensional features, and then the obstacle feature prediction network predicts obstacle features based on the three-dimensional features. It can be understood that according to actual needs, the obstacle feature prediction network can be one or more. When there are multiple obstacle feature prediction networks, each network can be used to predict features of different categories, and the features predicted by each obstacle feature prediction network are used as the final obstacle features.
本公开实施例中,将环视图像输入至待训练的初始模型,由初始模型对环视图像进行特征提取、特征转换、障碍物特征预测等处理,最终输出基于环视图像预测得到的障碍物预测特征。In the disclosed embodiment, the surround view image is input into the initial model to be trained, and the initial model performs feature extraction, feature conversion, obstacle feature prediction and other processing on the surround view image, and finally outputs the obstacle prediction features obtained based on the surround view image prediction.
其中,初始模型输出的障碍物预测特征可以包括体素占用预测信息、体素位置预测信息和体素尺寸预测信息中的至少一个。The obstacle prediction features output by the initial model may include at least one of voxel occupancy prediction information, voxel position prediction information and voxel size prediction information.
步骤104,根据所述障碍物对应的所述障碍物真实特征和所述障碍物预测特征对所述初始模型进行训练,得到障碍物特征识别模型。Step 104: training the initial model according to the real obstacle features and the predicted obstacle features corresponding to the obstacles to obtain an obstacle feature recognition model.
本公开实施例中，获取了初始模型输出的障碍物预测特征之后，可以基于同一障碍物对应的障碍物预测特征和障碍物真实特征，对初始模型进行训练，通过不断地调整初始模型的网络参数，得到训练完成的障碍物特征识别模型，障碍物特征识别模型能够较准确的识别出环视图像中障碍物的3D特征，为准确识别出障碍物提供数据支撑。In the disclosed embodiment, after obtaining the obstacle prediction features output by the initial model, the initial model can be trained based on the obstacle prediction features and the real obstacle features corresponding to the same obstacle. By continuously adjusting the network parameters of the initial model, a trained obstacle feature recognition model is obtained. The obstacle feature recognition model can relatively accurately identify the 3D features of obstacles in the surround view image, providing data support for accurate obstacle recognition.
在一些实施例中,可以根据障碍物预测特征和障碍物真实特征之间的差异,调整初始模型的网络参数,例如,初始模型中有卷积神经网络,则可以调整卷积核的数量、步长等参数,又例如,可以调整池化层的参数等。In some embodiments, the network parameters of the initial model can be adjusted according to the difference between the predicted features of the obstacle and the real features of the obstacle. For example, if there is a convolutional neural network in the initial model, the number of convolution kernels, step size and other parameters can be adjusted. For another example, the parameters of the pooling layer can be adjusted.
在一些实施例中,可以利用预设的损失函数,根据障碍物预测特征和障碍物真实特征的数据进行损失值的计算,在计算得到的损失值与预设的损失阈值存在差异或者差异超出允许的误差范围时,则调整初始模型的网络参数;在损失值与损失阈值不存在差异或者差异在允许的误差范围内时,则结束训练,得到训练完成的障碍物特征识别模型。In some embodiments, a preset loss function can be used to calculate the loss value based on the data of the obstacle prediction features and the obstacle real features. When the calculated loss value is different from the preset loss threshold or the difference exceeds the allowable error range, the network parameters of the initial model are adjusted; when there is no difference between the loss value and the loss threshold or the difference is within the allowable error range, the training is terminated to obtain a trained obstacle feature recognition model.
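The loss-threshold stopping rule described above can be sketched as follows. This is an illustrative Python sketch only; the function name `should_stop` and the explicit `tolerance` argument are assumptions for illustration, not part of the disclosure.

```python
def should_stop(loss_value, loss_threshold, tolerance):
    """Stop training when the computed loss value does not differ from the
    preset loss threshold, or differs by no more than the allowed error
    range; otherwise continue adjusting the network parameters."""
    return abs(loss_value - loss_threshold) <= tolerance
```

For example, with a threshold of 0.05 and a tolerance of 0.01, a loss of 0.2 would trigger another round of parameter adjustment, while a loss of 0.055 would end training.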
本公开实施例的障碍物特征识别模型的训练方法，通过获取同一障碍物对应的环视图像及点云数据，基于点云数据，将点云数据在三维空间中进行体素划分，确定障碍物在每个体素中对应的障碍物真实特征，将环视图像输入待训练的初始模型，得到障碍物对应的障碍物预测特征，进而根据障碍物对应的障碍物真实特征和障碍物预测特征对初始模型进行训练，得到障碍物特征识别模型。采用上述技术方案，利用初始模型对环视图像进行障碍物特征预测，并利用基于点云数据确定的障碍物真实特征作为训练目标，对初始模型进行训练以得到障碍物特征识别模型，使得训练后的模型能够从环视图像中学习到准确的障碍物特征，提高障碍物特征识别模型对障碍物的3D特征的识别准确度，从而有利于提高障碍物识别的准确率。In the obstacle feature recognition model training method of the disclosed embodiment, surround view images and point cloud data corresponding to the same obstacle are obtained; based on the point cloud data, the point cloud data is divided into voxels in three-dimensional space to determine the real obstacle features corresponding to the obstacle in each voxel; the surround view image is input into the initial model to be trained to obtain the obstacle prediction features corresponding to the obstacle; and the initial model is then trained according to the real obstacle features and the obstacle prediction features corresponding to the obstacle to obtain the obstacle feature recognition model. With the above technical solution, the initial model is used to predict obstacle features from the surround view image, and the real obstacle features determined based on the point cloud data are used as the training target to train the initial model into the obstacle feature recognition model, so that the trained model can learn accurate obstacle features from the surround view image, improving the recognition accuracy of the model for the 3D features of obstacles and thereby helping to improve the accuracy of obstacle recognition.
在本公开的一种可选实施方式中,障碍物真实特征可以包括体素占用真实信息,障碍物特征预测网络可以包括体素占用预测网络,从而,如图2所示,在前述实施例的基础上,步骤103可以包括步骤201至步骤203。In an optional implementation of the present disclosure, the real features of the obstacle may include real voxel occupancy information, and the obstacle feature prediction network may include a voxel occupancy prediction network. Thus, as shown in FIG. 2 , based on the aforementioned embodiment, step 103 may include steps 201 to 203.
步骤201,将所述环视图像输入特征提取网络,得到所述环视图像对应的二维特征。Step 201: input the surround view image into a feature extraction network to obtain two-dimensional features corresponding to the surround view image.
其中,初始模型中的特征提取网络可以是目前常用的图像特征提取网络,用于从环视图像中提取出二维特征。The feature extraction network in the initial model may be a currently commonly used image feature extraction network, which is used to extract two-dimensional features from the surround image.
步骤202,将所述二维特征输入所述特征转换网络,得到所述障碍物对应的三维特征。Step 202: input the two-dimensional features into the feature conversion network to obtain the three-dimensional features corresponding to the obstacles.
其中,初始模型中的特征转换网络可以是目前常用的特征转换网络,特征转换网络用于进行特征转换,将输入的二维特征恢复为三维特征。Among them, the feature conversion network in the initial model can be a currently commonly used feature conversion network, and the feature conversion network is used to perform feature conversion and restore the input two-dimensional features to three-dimensional features.
本公开实施例中,将获取的环视图像输入初始模型之后,先经过初始模型的特征提取网络对环视图像进行特征提取,得到二维特征,再由初始模型中的特征转换网络进行特征转换,将二维特征恢复为三维特征,得到障碍物的三维特征。In the disclosed embodiment, after the acquired surround view image is input into the initial model, the surround view image is first subjected to feature extraction by the feature extraction network of the initial model to obtain two-dimensional features, and then the feature conversion network in the initial model performs feature conversion to restore the two-dimensional features to three-dimensional features to obtain the three-dimensional features of the obstacle.
步骤203,将所述三维特征输入所述体素占用预测网络,获取所述体素占用预测网络输出的体素占用预测信息。Step 203: input the three-dimensional features into the voxel occupancy prediction network to obtain voxel occupancy prediction information output by the voxel occupancy prediction network.
其中，体素占用预测网络是根据实际需求搭建的网络，即障碍物特征预测网络中包含体素占用预测网络，体素占用预测网络可以有多种结构，能够实现基于三维特征预测出体素的占用信息即可，本公开实施例对体素占用预测网络的具体结构不作限制。The voxel occupancy prediction network is a network built according to actual needs; that is, the obstacle feature prediction network includes a voxel occupancy prediction network. The voxel occupancy prediction network may have any of a variety of structures, as long as it can predict the occupancy information of voxels from the three-dimensional features; the embodiments of the present disclosure do not limit the specific structure of the voxel occupancy prediction network.
本公开实施例中,将初始网络的特征转换网络输出的三维特征输入至体素占用预测网络,由体素占用预测网络输出体素占用预测信息。能够理解的是,本公开实施例中,体素占用预测网络在基于三维特征进行预测时,可以先将三维特征划分为多个体素,进而预测每个体素对应的占用信息。In the disclosed embodiment, the three-dimensional features output by the feature conversion network of the initial network are input to the voxel occupancy prediction network, and the voxel occupancy prediction network outputs the voxel occupancy prediction information. It can be understood that in the disclosed embodiment, when the voxel occupancy prediction network makes a prediction based on the three-dimensional features, it can first divide the three-dimensional features into a plurality of voxels, and then predict the occupancy information corresponding to each voxel.
其中,体素占用预测信息用于反映体素是否被占用,体素被占用表示体素内包含物体,体素未被占用即表示体素内不包含物体。The voxel occupancy prediction information is used to reflect whether the voxel is occupied. If the voxel is occupied, it means that the voxel contains an object, and if the voxel is not occupied, it means that the voxel does not contain an object.
作为一种示例，体素占用预测信息可以通过预设的标识进行表示。例如，预先设置“1”表示体素被占用，设置“0”表示体素未被占用，则体素占用预测网络在预测三维特征中各体素的占用状态时，对于某个体素，若预测结果为该体素被占用，则将该体素标记为“1”，若预测结果为该体素未被占用，则将该体素标记为“0”。体素的标记信息即代表了体素占用预测信息。As an example, the voxel occupancy prediction information can be represented by a preset identifier. For example, "1" is preset to indicate that a voxel is occupied, and "0" is preset to indicate that a voxel is not occupied. When the voxel occupancy prediction network predicts the occupancy state of each voxel in the three-dimensional features, for a given voxel, if the prediction result is that the voxel is occupied, the voxel is marked as "1"; if the prediction result is that the voxel is not occupied, the voxel is marked as "0". The label information of the voxels thus represents the voxel occupancy prediction information.
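The 1/0 marking convention above can be sketched as follows, under the hypothetical assumption (not stated in the disclosure) that the network emits a per-voxel occupancy probability that is binarized against a threshold:

```python
def occupancy_labels(probabilities, threshold=0.5):
    # Mark a voxel as 1 (occupied) when its predicted occupancy probability
    # exceeds the threshold, and as 0 (unoccupied) otherwise. The probability
    # output and the 0.5 default threshold are illustrative assumptions.
    return [1 if p > threshold else 0 for p in probabilities]
```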
在本公开实施例中,通过获取体素占用真实信息作为学习目标,并在初始模型中设置特征提取网络、特征转换网络和体素占用预测网络,使得初始模型能够对输入的环视图像的体素占用情况进行学习,由于体素是否被占用反映了体素内是否有物体,这也为进一步预测障碍物的位置、缩小障碍物检测范围提供了条件。In the disclosed embodiment, by obtaining the real information of voxel occupancy as the learning target, and setting a feature extraction network, a feature conversion network and a voxel occupancy prediction network in the initial model, the initial model can learn the voxel occupancy of the input surround image. Since whether a voxel is occupied reflects whether there is an object in the voxel, this also provides conditions for further predicting the position of obstacles and narrowing the obstacle detection range.
在本公开的一种可选实施方式中,障碍物真实特征还可以包括体素对应的体素位置真实信息及体素尺寸真实信息,障碍物特征预测网络还可以包括体素位置预测网络和体素尺寸预测网络,如图2所示,步骤103还可以包括步骤204至步骤205。In an optional implementation of the present disclosure, the real features of the obstacle may also include the real voxel position information and the real voxel size information corresponding to the voxel, and the obstacle feature prediction network may also include a voxel position prediction network and a voxel size prediction network, as shown in FIG. 2 , and step 103 may also include steps 204 to 205.
步骤204,将所述三维特征输入所述体素位置预测网络,获取所述体素位置预测网络输出的体素位置预测信息。Step 204: input the three-dimensional features into the voxel position prediction network to obtain voxel position prediction information output by the voxel position prediction network.
其中，体素位置预测网络是根据实际需求搭建的网络，即障碍物特征预测网络中还包含体素位置预测网络，体素位置预测网络可以有多种结构，能够实现基于三维特征预测出体素的位置信息即可，本公开实施例对体素位置预测网络的具体结构不作限制。The voxel position prediction network is a network built according to actual needs; that is, the obstacle feature prediction network also includes a voxel position prediction network. The voxel position prediction network may have any of a variety of structures, as long as it can predict the position information of voxels from the three-dimensional features; the embodiments of the present disclosure do not limit the specific structure of the voxel position prediction network.
本公开实施例中,初始模型的特征转换网络提取出三维特征之后,还可以将三维特征输入至体素位置预测网络,由体素位置预测网络基于三维特征输出体素位置预测信息。In the disclosed embodiment, after the feature conversion network of the initial model extracts the three-dimensional features, the three-dimensional features can also be input into the voxel position prediction network, and the voxel position prediction network outputs voxel position prediction information based on the three-dimensional features.
其中,体素位置预测信息反映了体素内物体(即障碍物)所在的位置。Among them, the voxel position prediction information reflects the position of the object (ie, obstacle) within the voxel.
作为一种示例,可以预测体素内物体的质心所在的位置来作为体素位置预测信息,体素位置预测信息可以表示为(x,y,z),其中,x、y、z分别表示预测的物体的质心分别在x轴、y轴和z轴上的坐标值。As an example, the position of the center of mass of an object within a voxel can be predicted as voxel position prediction information, and the voxel position prediction information can be expressed as (x, y, z), where x, y, and z represent the coordinate values of the predicted center of mass of the object on the x-axis, y-axis, and z-axis, respectively.
作为一种示例，可以预测体素内物体的质心相对于体素的中心点的坐标偏移量来作为体素位置预测信息，体素位置预测信息可以表示为(dx,dy,dz)，其中，dx表示物体的质心相对于体素的中心点在x轴上的偏移量，dy表示物体的质心相对于体素的中心点在y轴上的偏移量，dz表示物体的质心相对于体素的中心点在z轴上的偏移量。能够理解的是，dx、dy和dz为预测值。As an example, the coordinate offset of the centroid of the object within the voxel relative to the center point of the voxel can be predicted as the voxel position prediction information, which can be expressed as (dx, dy, dz), where dx, dy and dz represent the offsets of the object's centroid relative to the voxel's center point on the x-axis, y-axis and z-axis, respectively. It can be understood that dx, dy and dz are predicted values.
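The centroid-offset representation above can be sketched as follows; `centroid_offset` is a hypothetical helper for illustration (in the method itself the offsets are predicted by a network, while ground-truth offsets come from the point cloud as in step 303):

```python
def centroid_offset(points, voxel_center):
    # points: (x, y, z) tuples falling inside one voxel.
    # Returns (dx, dy, dz): the offset of the points' centroid
    # relative to the voxel's center point.
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return tuple(centroid[i] - voxel_center[i] for i in range(3))
```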
步骤205,将所述三维特征输入所述体素尺寸预测网络,获取所述体素尺寸预测网络输出的体素尺寸预测信息。Step 205: input the three-dimensional features into the voxel size prediction network to obtain voxel size prediction information output by the voxel size prediction network.
其中，体素尺寸预测网络是根据实际需求搭建的网络，即障碍物特征预测网络中还包含体素尺寸预测网络，体素尺寸预测网络可以有多种结构，能够实现基于三维特征预测出体素的尺寸信息即可，本公开实施例对体素尺寸预测网络的具体结构不作限制。The voxel size prediction network is a network built according to actual needs; that is, the obstacle feature prediction network also includes a voxel size prediction network. The voxel size prediction network may have any of a variety of structures, as long as it can predict the size information of voxels from the three-dimensional features; the embodiments of the present disclosure do not limit the specific structure of the voxel size prediction network.
本公开实施例中,初始模型的特征转换网络提取出三维特征之后,还可以将三维特征输入至体素尺寸预测网络,由体素尺寸预测网络基于三维特征输出体素尺寸预测信息。In the disclosed embodiment, after the feature conversion network of the initial model extracts the three-dimensional features, the three-dimensional features can also be input into the voxel size prediction network, and the voxel size prediction network outputs voxel size prediction information based on the three-dimensional features.
其中,体素尺寸预测信息反映了体素内物体(即障碍物)的大小。Among them, the voxel size prediction information reflects the size of the object (ie, obstacle) within the voxel.
作为一种示例，可以预测体素内点云的最小外接长方体的长宽高来作为体素尺寸预测信息，体素尺寸预测信息可以表示为(dl,dw,dh)，其中，dl表示最小外接长方体的长，dw表示最小外接长方体的宽，dh表示最小外接长方体的高。As an example, the length, width and height of the minimum circumscribed cuboid of the point cloud within the voxel can be predicted as the voxel size prediction information, which can be expressed as (dl, dw, dh), where dl represents the length, dw the width, and dh the height of the minimum circumscribed cuboid.
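For an axis-aligned case, the minimum circumscribed cuboid of a voxel's points can be sketched as follows (an illustrative helper; the disclosure does not specify whether the cuboid is axis-aligned, so that is an assumption here):

```python
def bounding_cuboid_size(points):
    # Axis-aligned minimum bounding cuboid of the points in a voxel,
    # returned as (dl, dw, dh) = (length, width, height).
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
```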
本公开实施例中，初始模型包括体素占用预测网络、体素位置预测网络和体素尺寸预测网络，从而能够预测出体素占用信息、位置信息和尺寸信息，这些信息组成了障碍物预测特征。In the disclosed embodiment, the initial model includes a voxel occupancy prediction network, a voxel position prediction network and a voxel size prediction network, so that voxel occupancy information, position information and size information can be predicted; together, this information constitutes the obstacle prediction features.
在一些实施例中,障碍物预测特征可以由多个体素对应的预测特征进行表示,每个体素对应的预测特征可以表示为(1,x,y,z,dx,dy,dz,dl,dw,dh),其中,1为体素占用预测信息,表示体素被占用,(x,y,z)表示体素的中心点坐标,(dx,dy,dz)表示体素位置预测信息,(dl,dw,dh)表示体素尺寸预测信息。障碍物预测特征还可以表示为(0),0为体素占用预测信息,表示体素未被占用,未被占用体素无体素位置预测信息和体素尺寸预测信息;或者,障碍物预测特征还可以表示为(0,x,y,z,0,0,0,0,0,0),其中,第一个0为体素占用预测信息,表示体素未被占用,(x,y,z)表示该未被占用体素的中心点坐标,其余的0则表示未被占用体素无体素位置预测信息和体素尺寸预测信息。In some embodiments, the obstacle prediction feature can be represented by the prediction features corresponding to multiple voxels, and the prediction features corresponding to each voxel can be expressed as (1, x, y, z, dx, dy, dz, dl, dw, dh), where 1 is the voxel occupancy prediction information, indicating that the voxel is occupied, (x, y, z) represents the center point coordinates of the voxel, (dx, dy, dz) represents the voxel position prediction information, and (dl, dw, dh) represents the voxel size prediction information. The obstacle prediction feature can also be expressed as (0), where 0 is the voxel occupancy prediction information, indicating that the voxel is unoccupied, and the unoccupied voxel has no voxel position prediction information and voxel size prediction information; or, the obstacle prediction feature can also be expressed as (0, x, y, z, 0, 0, 0, 0, 0), where the first 0 is the voxel occupancy prediction information, indicating that the voxel is unoccupied, (x, y, z) represents the center point coordinates of the unoccupied voxel, and the remaining 0s indicate that the unoccupied voxels have no voxel position prediction information and voxel size prediction information.
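The per-voxel feature layouts above can be sketched as a small constructor; `voxel_feature` is a hypothetical name for illustration, and the second layout (unoccupied voxel with center coordinates) is the one shown:

```python
def voxel_feature(occupied, center, position_offset=None, size=None):
    # Compose the per-voxel feature vector described above:
    # (1, x, y, z, dx, dy, dz, dl, dw, dh) for an occupied voxel;
    # (0, x, y, z, 0, 0, 0, 0, 0, 0) for an unoccupied one, where the
    # trailing zeros stand for absent position and size information.
    if occupied:
        return (1, *center, *position_offset, *size)
    return (0, *center, 0, 0, 0, 0, 0, 0)
```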
需要说明的是,上述步骤203、204、205之间没有先后顺序,可以一起执行或分开执行。It should be noted that there is no order among the above steps 203, 204, and 205, and they can be executed together or separately.
本公开实施例的障碍物特征识别模型的训练方法,通过获取体素位置真实信息和体素尺寸真实信息作为学习目标,并在初始模型中设置体素位置预测网络和体素尺寸预测网络,使得初始模型能够对输入的环视图像的体素位置和体素尺寸进行学习,从而训练后的模型能够基于输入的环视图像预测出更精准的体素的位置和尺寸,获得准确的障碍物的3D特征。The training method of the obstacle feature recognition model of the embodiment of the present disclosure obtains the real information of the voxel position and the real information of the voxel size as the learning target, and sets the voxel position prediction network and the voxel size prediction network in the initial model, so that the initial model can learn the voxel position and voxel size of the input surround image, so that the trained model can predict more accurate voxel positions and sizes based on the input surround image, and obtain accurate 3D features of obstacles.
在本公开的一种可选实施方式中,在根据障碍物对应的障碍物真实特征和障碍物预测特征对初始模型进行训练时,可以分别根据体素占用预测信息和体素占用真实信息,确定第一损失值,根据体素位置预测信息和体素位置真实信息,确定第二损失值,以及根据体素尺寸预测信息和体素尺寸真实信息,确定第三损失值,进而基于第一损失值、第二损失值和第三损失值,调整初始模型的网络参数。In an optional embodiment of the present disclosure, when the initial model is trained according to the real obstacle features and the predicted obstacle features corresponding to the obstacles, a first loss value can be determined according to the voxel occupancy prediction information and the real voxel occupancy information, a second loss value can be determined according to the voxel position prediction information and the real voxel position information, and a third loss value can be determined according to the voxel size prediction information and the real voxel size information, and then the network parameters of the initial model can be adjusted based on the first loss value, the second loss value and the third loss value.
本公开实施例中，在计算第一损失值、第二损失值和第三损失值时，可以先采用相关技术从障碍物真实特征和障碍物预测特征中确定至少一个体素对，例如，对于障碍物预测特征中的任一体素，可以根据该体素的坐标（即中心点坐标），从障碍物真实特征中找到与其具有相同坐标的目标体素，则该体素和该目标体素即构成一个体素对。接着，利用体素对对应的体素占用预测信息和体素占用真实信息计算得到第一损失值，利用体素对对应的体素位置预测信息和体素位置真实信息计算得到第二损失值，以及利用体素对对应的体素尺寸预测信息和体素尺寸真实信息计算得到第三损失值。In the disclosed embodiment, when calculating the first, second and third loss values, relevant technology may first be used to determine at least one voxel pair from the real obstacle features and the predicted obstacle features. For example, for any voxel in the predicted obstacle features, the target voxel with the same coordinates can be found in the real obstacle features based on that voxel's coordinates (i.e., its center point coordinates); the voxel and the target voxel then constitute a voxel pair. Then, the first loss value is calculated from the voxel occupancy prediction information and the real voxel occupancy information of the voxel pairs, the second loss value from the voxel position prediction information and the real voxel position information, and the third loss value from the voxel size prediction information and the real voxel size information.
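The pairing step above can be sketched as follows. Both helper names are hypothetical, and since the disclosure leaves the loss functions open, L1 (mean absolute error) is used purely as an illustrative choice:

```python
def match_voxel_pairs(predicted, real):
    # Pair each predicted voxel with the ground-truth voxel that shares
    # the same center coordinates. Inputs map (x, y, z) -> feature tuple;
    # voxels without a same-coordinate counterpart are skipped here.
    return [(predicted[c], real[c]) for c in predicted if c in real]

def l1_loss(pred, true):
    # Mean absolute error over one paired feature vector; an illustrative
    # stand-in for whichever loss function is actually preset.
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)
```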
其中,计算各损失值采用的损失函数可以根据实际需要预先设定,本公开实施例对计算各损失值使用的损失函数不作限制。Among them, the loss function used to calculate each loss value can be pre-set according to actual needs, and the embodiment of the present disclosure does not limit the loss function used to calculate each loss value.
作为一种示例，在基于第一损失值、第二损失值和第三损失值调整初始模型的网络参数时，可以分别基于各损失值调整初始模型的网络参数。例如，在第一损失值不满足对应的损失阈值时，可以调整初始模型中体素占用预测网络的网络参数，还可以调整体素占用预测网络之前的其他网络的网络参数；在第二损失值不满足对应的损失阈值时，可以调整初始模型中体素位置预测网络的网络参数，还可以调整体素位置预测网络之前的其他网络的网络参数；在第三损失值不满足对应的损失阈值时，可以调整初始模型中体素尺寸预测网络的网络参数，还可以调整体素尺寸预测网络之前的其他网络的网络参数。As an example, when adjusting the network parameters of the initial model based on the first, second and third loss values, the network parameters can be adjusted based on each loss value separately. For example, when the first loss value does not meet its corresponding loss threshold, the network parameters of the voxel occupancy prediction network in the initial model can be adjusted, as can the network parameters of the networks preceding the voxel occupancy prediction network; when the second loss value does not meet its corresponding loss threshold, the network parameters of the voxel position prediction network, as well as those of the networks preceding it, can be adjusted; when the third loss value does not meet its corresponding loss threshold, the network parameters of the voxel size prediction network, as well as those of the networks preceding it, can be adjusted.
作为一种示例,在基于第一损失值、第二损失值和第三损失值调整初始模型的网络参数时,可以先基于第一损失值、第二损失值和第三损失值计算得到初始模型的损失值。例如,可以计算第一损失值、第二损失值和第三损失值的均值作为初始模型的损失值;又例如,可以计算第一损失值、第二损失值和第三损失值的和值作为初始模型的损失值,等等。接着,可以基于初始模型的损失值来调整初始模型的网络参数。在初始模型的损失值不满足预设的损失阈值时,则调整初始模型的至少一个网络参数。As an example, when adjusting the network parameters of the initial model based on the first loss value, the second loss value, and the third loss value, the loss value of the initial model can be first calculated based on the first loss value, the second loss value, and the third loss value. For example, the average of the first loss value, the second loss value, and the third loss value can be calculated as the loss value of the initial model; for another example, the sum of the first loss value, the second loss value, and the third loss value can be calculated as the loss value of the initial model, and so on. Then, the network parameters of the initial model can be adjusted based on the loss value of the initial model. When the loss value of the initial model does not meet the preset loss threshold, at least one network parameter of the initial model is adjusted.
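The two combinations mentioned above (sum or mean of the three loss terms) can be sketched as follows; the function name and the `mode` switch are illustrative assumptions:

```python
def total_loss(occupancy_loss, position_loss, size_loss, mode="sum"):
    # Combine the first, second and third loss values into the loss value
    # of the initial model, either as their sum or as their mean.
    combined = occupancy_loss + position_loss + size_loss
    return combined if mode == "sum" else combined / 3
```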
在本公开实施例中,通过根据体素占用预测信息和体素占用真实信息确定第一损失值,根据体素位置预测信息和体素位置真实信息确定第二损失值,以及根据体素尺寸预测信息和体素尺寸真实信息确定第三损失值,进而基于第一损失值、第二损失值和第三损失值,调整初始模型的网络参数,由此,实现了对初始模型的迭代训练,以获得能够精准预测环视图像中障碍物3D特征的障碍物特征识别模型。In the embodiment of the present disclosure, a first loss value is determined based on the voxel occupancy prediction information and the voxel occupancy real information, a second loss value is determined based on the voxel position prediction information and the voxel position real information, and a third loss value is determined based on the voxel size prediction information and the voxel size real information, and then based on the first loss value, the second loss value and the third loss value, the network parameters of the initial model are adjusted, thereby achieving iterative training of the initial model to obtain an obstacle feature recognition model that can accurately predict the 3D features of obstacles in the surround view image.
在本公开的一种可选实施方式中,如图3所示,在前述实施例的基础上,步骤102可以包括步骤301至步骤303。In an optional implementation of the present disclosure, as shown in FIG. 3 , based on the above-mentioned embodiment, step 102 may include steps 301 to 303 .
步骤301,将所述点云数据转换至三维坐标系中。Step 301: convert the point cloud data into a three-dimensional coordinate system.
In the embodiments of the present disclosure, point cloud data collected by a sensor (such as a LiDAR) can be obtained and converted into a three-dimensional coordinate system. Since each point in the point cloud data carries three-dimensional coordinate values, the conversion can be performed based on the coordinate values contained in each point.
Step 302: voxelize the three-dimensional space in the three-dimensional coordinate system according to a preset size to obtain a plurality of voxels in the three-dimensional space.
The preset size can be set in advance according to actual needs; for example, it can be set to 0.1m×0.1m×0.1m, 0.1m×0.2m×0.1m, 0.1m×0.1m×0.2m, 0.1m×0.2m×0.3m, and so on.
In the embodiments of the present disclosure, the three-dimensional space in the three-dimensional coordinate system can be divided into voxels according to the fixed preset size, yielding a plurality of voxels of identical size.
In some embodiments, assuming the spatial range of the three-dimensional space is 30m×30m×10m and the preset size is 0.1m×0.1m×0.1m, voxelizing the space according to the preset size divides it into (300, 300, 100) voxels of equal size.
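The voxelization in the example above can be sketched as follows; placing the grid origin at (0, 0, 0) and the helper names are assumptions for illustration only:

```python
# Sketch: voxelize a 30m x 30m x 10m space into 0.1m cubic voxels and map a
# point to the index of the voxel containing it.
SPACE = (30.0, 30.0, 10.0)   # spatial range along x, y, z (assumed origin at 0)
VOXEL = (0.1, 0.1, 0.1)      # preset voxel size

grid_shape = tuple(int(round(s / v)) for s, v in zip(SPACE, VOXEL))
# grid_shape is (300, 300, 100), matching the example above

def voxel_index(point):
    """Integer voxel coordinates of a point inside the space."""
    return tuple(min(int(p // v), n - 1)  # clamp points on the far boundary
                 for p, v, n in zip(point, VOXEL, grid_shape))
```

Each point of the point cloud would be routed through `voxel_index` so that the per-voxel point lists used in the following steps can be built.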
Step 303: based on the point cloud data contained in each of the plurality of voxels, generate the real obstacle features corresponding to the obstacle in each voxel using the key point of the point cloud within the voxel and the circumscribed figure of the point cloud, where the real obstacle features include the real voxel occupancy information corresponding to each voxel, and the real voxel position information and real voxel size information corresponding to the non-empty voxels.
The key point of the point cloud within a voxel can be the centroid of that point cloud, and the circumscribed figure of the point cloud within a voxel can be the minimum circumscribed cuboid of that point cloud.
In the embodiments of the present disclosure, for each of the divided voxels, the point cloud data contained in that voxel can be determined, and the real voxel information of the voxel is then determined from that point cloud data, the key point of the point cloud within the voxel, and the circumscribed figure of the point cloud. The real voxel information includes the real voxel occupancy information of the voxel and, for non-empty voxels (i.e., voxels whose real occupancy information is "occupied"), the corresponding real voxel position information and real voxel size information. The real voxel occupancy information of each voxel, together with the real voxel position information and real voxel size information of the occupied non-empty voxels, constitutes the real obstacle features, in each voxel, of the obstacle to which the point cloud data belongs.
The real voxel occupancy information can be determined according to whether the voxel contains any points, and can be represented by different labels. For example, a voxel containing points can be labeled "1" as its real occupancy information, indicating that an object exists within the voxel; a voxel containing no points can be labeled "0", indicating that no object exists within it.
In the embodiments of the present disclosure, the real voxel position information and real voxel size information of a voxel can be characterized by different parameters, as illustrated by the following examples.
In some embodiments, the real voxel position information of a voxel can be represented by the coordinates of the centroid (i.e., the key point) of the point cloud within the voxel, and the real voxel size information can be represented by the coordinates of the vertices of the minimum circumscribed cuboid of the point cloud within the voxel.
In some embodiments, the real voxel position information of a voxel can be represented by the offset of the voxel's center-point coordinates relative to the coordinates of the centroid (i.e., the key point) of the point cloud within the voxel, and the real voxel size information can be represented by the length, width, and height of the minimum circumscribed cuboid of the point cloud within the voxel.
In the obstacle feature recognition model training method of the embodiments of the present disclosure, the point cloud data is converted into a three-dimensional coordinate system, the three-dimensional space in that coordinate system is voxelized according to a preset size to obtain a plurality of voxels, and then, based on the point cloud data contained in each voxel, the key point of the point cloud within the voxel and the circumscribed figure of the point cloud are used to generate the real obstacle features corresponding to the obstacle in each voxel; these real obstacle features include the real voxel occupancy information of each voxel and the real voxel position information and real voxel size information of the non-empty voxels. With this technical solution, after the three-dimensional space is divided into a plurality of fixed-size voxels, the real voxel occupancy, position, and size information of each voxel is further determined from the point cloud data it contains and taken as the voxel's real voxel information. The real position and real size of the voxel to which the point cloud data belongs are thus computed from the point cloud data itself, so that different voxels may effectively have different sizes and accurate real voxel positions and sizes can be obtained. This helps improve the precision with which obstacle positions and sizes are represented, and provides data support for training a model that can accurately identify obstacle features.
In an optional implementation of the present disclosure, as shown in FIG. 4, on the basis of the embodiment shown in FIG. 3, step 303 may include steps 401 to 405.
Step 401: determine the real voxel occupancy information corresponding to each voxel according to the position of each point in the point cloud data, where the real voxel occupancy information includes occupied and unoccupied.
In the embodiments of the present disclosure, after the three-dimensional space is divided into a plurality of voxels, whether each voxel contains any point or point cloud can be judged from the position of each point in the point cloud data, i.e., the coordinate values of each point, and the real voxel occupancy information of each voxel is determined from the judgment result. Specifically, a voxel containing points is determined to have real occupancy information of occupied, and a voxel containing no points is determined to have real occupancy information of unoccupied.
In some embodiments, a non-empty voxel containing at least one point may be assigned the value 1, indicating that the voxel is occupied and contains an object; an empty voxel containing no points may be assigned the value 0, indicating that it is unoccupied and contains no object. The values "1" and "0" thus represent the real voxel occupancy information.
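Step 401's 1/0 labeling can be sketched as below; `points_per_voxel`, a hypothetical mapping from a voxel index to the list of points falling inside it, is an assumption for illustration:

```python
# Sketch: label every voxel in the grid as 1 (occupied, contains at least one
# point) or 0 (unoccupied, contains no points).
def occupancy_labels(points_per_voxel, grid_shape):
    """Return a dict mapping every voxel index to its real occupancy label."""
    nx, ny, nz = grid_shape
    return {
        (ix, iy, iz): 1 if points_per_voxel.get((ix, iy, iz)) else 0
        for ix in range(nx) for iy in range(ny) for iz in range(nz)
    }
```

Enumerating the whole grid mirrors the fact that every voxel, empty or not, carries occupancy information, while only the non-empty ones go on to steps 402 to 404.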
Step 402: for a non-empty voxel whose real voxel occupancy information is occupied, determine, according to at least one point in the non-empty voxel, the centroid of the point cloud within the non-empty voxel as the key point based on a centroid calculation rule.
The centroid calculation rule can be a preset rule for determining the centroid of the point cloud in a non-empty voxel. If the non-empty voxel contains only one point, that point can be determined as the centroid, and its coordinates are the centroid's coordinates. If the non-empty voxel contains multiple points, the centroid of the point cloud and its coordinates can be computed by a commonly used method for determining the centroid of a point cloud, from information such as the coordinate values and the mass of each point in the voxel. The coordinates of the centroid characterize its location, i.e., its position coordinates.
In the embodiments of the present disclosure, for any non-empty voxel whose real occupancy information is occupied, the centroid of the point cloud in that voxel can be determined, based on the centroid calculation rule, from all the points it contains (referred to as the point cloud), and that centroid is taken as the key point of the point cloud within the non-empty voxel.
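Under the simplifying assumption of equal point masses (the disclosure also allows per-point masses), the centroid of step 402 reduces to the coordinate-wise mean, sketched as:

```python
# Sketch: centroid of the points in a non-empty voxel, used as the key point.
# Equal point masses are assumed, so the centroid is the coordinate-wise mean;
# a single-point voxel yields that point itself.
def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

With per-point masses, each coordinate would instead be the mass-weighted mean.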
Step 403: determine the real voxel position information of the non-empty voxel according to the key point and based on a preset position determination rule.
In the embodiments of the present disclosure, for each non-empty voxel whose real occupancy information was found to be occupied, the real voxel position information and real voxel size information corresponding to that non-empty voxel can be determined from the key point of the point cloud within it, based on a preset position determination rule.
In an optional implementation of the present disclosure, the position determination rule includes: determining the position coordinates of the key point as the real voxel position information of the non-empty voxel. Thus, in the embodiments of the present disclosure, for any non-empty voxel, when determining its real voxel position information from the key point based on the preset position determination rule, the position coordinates of the determined key point can be directly taken as the real voxel position information of the non-empty voxel to which it belongs.
Since the key point of the point cloud within a non-empty voxel is the centroid of the point cloud, which represents the average position of the point cloud's mass distribution, and the point cloud describes an obstacle in geographic space, the centroid of the point cloud can be used in the embodiments of the present disclosure to characterize the true position of the obstacle. Moreover, since general obstacle detection algorithms in three-dimensional space identify obstacles by dividing the space into voxels, the coordinates of the centroid (i.e., the key point) of the point cloud within a voxel can be used as the voxel's real voxel position information in order to determine the true position of the voxel.
As a possible implementation, the position determination rule includes: determining the coordinate offset of the center point of the non-empty voxel relative to the key point as the position information of the non-empty voxel. Thus, in the embodiments of the present disclosure, for any non-empty voxel, when determining its real voxel position information from the key point based on the preset position determination rule, the first coordinates of the center point of the non-empty voxel can be obtained first; then, from the first coordinates and the second coordinates of the key point of the point cloud within the voxel, the coordinate offset of the voxel's center point relative to that key point is determined, where the coordinate offset includes a component along each coordinate axis of the three-dimensional coordinate system; finally, the first coordinates of the center point, together with the coordinate offset of the center point relative to the centroid of the point cloud within the voxel, are determined as the real voxel position information of the non-empty voxel.
It can be understood that after the spatial range of the three-dimensional space is divided into a plurality of voxels of identical size according to the fixed preset size, the coordinates of each voxel's center point are determined. For ease of description, and to distinguish them from the coordinates of the key point of the point cloud within a non-empty voxel, the coordinates of the center point of a non-empty voxel are referred to herein as the first coordinates, and the coordinates of the key point of the point cloud within the voxel as the second coordinates.
In the embodiments of the present disclosure, when determining the real voxel position information of each non-empty voxel, the coordinate offset of the voxel's center point relative to the key point of the point cloud within it can first be computed from the first coordinates of the center point and the second coordinates of the key point. The coordinate offset includes the offsets of the center point relative to the key point along the x, y, and z axes of the three-dimensional coordinate system, and each offset can be expressed as the coordinate difference between the center point and the key point along the same axis.
In some embodiments, assuming the first coordinates of the center point of a non-empty voxel are (x1, y1, z1) and the second coordinates of the key point of the point cloud within it are (x2, y2, z2), the coordinate offset of the voxel's center point relative to the key point can be recorded as (dx, dy, dz), where dx = x1 - x2, dy = y1 - y2, dz = z1 - z2.
It can be understood that when a non-empty voxel contains only one point, the coordinates of that point can be taken as the second coordinates of the key point, and the coordinate offset of the voxel's center point relative to that point can be taken as the coordinate offset of the center point relative to the key point.
In some embodiments, assuming the first coordinates of the center point of a non-empty voxel are (x1, y1, z1) and the voxel contains only one point with coordinates (x3, y3, z3), the coordinate offset of the voxel's center point relative to the key point of the point cloud (i.e., that point) can be recorded as (dx, dy, dz), where dx = x1 - x3, dy = y1 - y3, dz = z1 - z3.
After the coordinate offset of the non-empty voxel's center point relative to the key point of the point cloud has been determined, the first coordinates of the center point and the determined coordinate offset can be taken as the real voxel position information of the non-empty voxel.
In some embodiments, the real voxel position information of a non-empty voxel can be expressed as (x1, y1, z1, dx, dy, dz), where (x1, y1, z1) are the coordinates of the voxel's center point and (dx, dy, dz) is the coordinate offset of the center point relative to the key point of the point cloud within the voxel.
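The six-tuple above can be assembled as in the following sketch, where the function name is an assumption and the arithmetic follows dx = x1 - x2, dy = y1 - y2, dz = z1 - z2 as defined above:

```python
# Sketch: real voxel position information (x1, y1, z1, dx, dy, dz), combining
# the voxel's center point with its offset from the key point (centroid).
def voxel_position_info(center, key_point):
    dx, dy, dz = (c - k for c, k in zip(center, key_point))
    return (*center, dx, dy, dz)
```

For a single-point voxel the same function applies with the lone point passed as `key_point`.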
In the embodiments of the present disclosure, the centroid of the point cloud in a non-empty voxel is determined from at least one point in the voxel and taken as the key point of the point cloud, and the real voxel position information of the voxel is then determined based on the centroid. The true position of the voxel is thus determined from the position of the centroid of the point cloud within it, which improves the accuracy of the voxel position and makes the voxel's representation of the obstacle's position more precise.
Step 404: determine the real voxel size information of the non-empty voxel using the circumscribed figure corresponding to at least one point in the non-empty voxel.
The circumscribed figure refers to the circumscribed figure of all the points contained in the non-empty voxel; for example, it may be a minimum circumscribed cuboid.
In an optional implementation of the present disclosure, for any non-empty voxel, when determining its size information using the circumscribed figure corresponding to at least one point in the voxel, the minimum circumscribed figure of all the points in the voxel can be determined first, for example the minimum circumscribed cuboid of all those points; the coordinates of the vertices of that minimum circumscribed figure are then determined and used to characterize the real voxel size information of the non-empty voxel.
In an optional implementation of the present disclosure, when determining the real voxel size information of a non-empty voxel using the circumscribed figure corresponding to at least one point in it, the minimum circumscribed cuboid of the at least one point can first be determined from the at least one point in the voxel; the real voxel size information of the non-empty voxel is then determined from the length, width, and height of that minimum circumscribed cuboid.
When determining the minimum circumscribed cuboid of at least one point in a non-empty voxel from the at least one point, a commonly used method for determining a minimum circumscribed cuboid can be employed; the embodiments of the present disclosure place no restriction on the specific implementation. It can be understood that the minimum circumscribed cuboid may be a cuboid or a cube, and the determined minimum circumscribed cuboid can be represented by the coordinates of its vertices.
In the embodiments of the present disclosure, when determining the minimum circumscribed cuboid, different methods can be chosen depending on the number and positions of the points contained in the non-empty voxel, as illustrated by the following examples.
As an example, if a non-empty voxel contains at least two points and at least two of those points do not lie on the same plane, a commonly used method can be employed to determine the minimum circumscribed cuboid of those points.
As an example, if the at least two points contained in a non-empty voxel all lie on the same plane, a commonly used method can be employed to determine the minimum circumscribed rectangle of those points, and, for the remaining coordinate axis not spanned by that plane, the length along that axis can be determined from a preset edge length, yielding a cuboid that is taken as the minimum circumscribed cuboid of the points. For example, suppose the points in a non-empty voxel all lie in the yoz plane and the preset edge length is 0.2. The minimum circumscribed rectangle of the points, and the coordinates of its vertices, can be determined from the points, and the x-axis coordinates can be determined from the preset edge length 0.2 to obtain a cuboid. For instance, the determined minimum circumscribed rectangle can be taken as one face of the cuboid, and the vertex coordinates at a distance of 0.2 along the positive or negative x direction determined, yielding the other four vertices of the cuboid and thus the coordinates of all vertices of the minimum circumscribed cuboid. Alternatively, the determined minimum circumscribed rectangle can be taken as the mid-section of the cuboid: the x coordinates of its four vertices are respectively increased by 0.1 and decreased by 0.1 while the y and z coordinates remain unchanged, yielding eight new coordinate points, and the cuboid enclosed by these eight points is determined as the minimum circumscribed cuboid of the at least two points in the non-empty voxel.
As an example, if a non-empty voxel contains only one point, a cube containing that point can be determined according to a preset edge length and taken as the point's minimum circumscribed cuboid. Alternatively, a cuboid containing the point can be determined according to a preset length, a preset width, and a preset height; or the non-empty voxel itself can be taken as the minimum circumscribed cuboid.
Next, once the minimum circumscribed cuboid of the at least one point in the non-empty voxel has been determined, its length, width, and height are also determined, and the real voxel size information of the non-empty voxel can then be determined from the length, width, and height of the minimum circumscribed cuboid.
As a possible implementation, when determining the real voxel size information of a non-empty voxel from the length, width, and height of the minimum circumscribed cuboid, those dimensions can be taken directly as the real voxel size information.
As a possible implementation, when determining the real voxel size information of a non-empty voxel from the length, width, and height of the minimum circumscribed cuboid, each edge length of the cuboid (i.e., its length, width, and height) can first be compared with a preset length, which can be set according to actual needs, for example 0.1. If any edge length is smaller than the preset length, it is updated to the preset length, and the real voxel size information of the non-empty voxel is then determined from the updated edge lengths.
That is, in the embodiments of the present disclosure, the length, width, and height of the determined minimum circumscribed cuboid can each be compared with the preset length; if at least one of them is smaller than the preset length, that edge is replaced with the preset length, and the updated edge lengths are used as the real voxel size information of the non-empty voxel. Every edge in the real voxel size information is therefore no smaller than the preset length, which prevents the final voxel size from being so small that many holes arise and the voxels become discontinuous, degrading the obstacle detection result.
In some embodiments, the real voxel size information of a non-empty voxel can be expressed as (dl, dw, dh), where dl, dw, and dh denote the length, width, and height of the minimum circumscribed cuboid, respectively.
For example, if the length, width, and height of the minimum circumscribed cuboid determined from the points in a non-empty voxel are 0.2, 0.1, and 0.08, respectively, the real voxel size information of the voxel can be determined as (0.2, 0.1, 0.08).
As another example, assuming a preset length of 0.1 and a minimum circumscribed cuboid with length, width, and height of 0.2, 0.1, and 0.08, the height of the cuboid is smaller than the preset length, so it is updated to 0.1, and the final real voxel size information of the voxel is (0.2, 0.1, 0.1).
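The size computation with edge clamping can be sketched as follows, under the simplifying assumption that the minimum circumscribed cuboid is axis-aligned (the disclosure leaves the exact cuboid-fitting method open); the names and the default preset length are illustrative:

```python
# Sketch: axis-aligned minimum circumscribed cuboid of a voxel's points, with
# each edge clamped below by a preset minimum length, as in the examples above.
MIN_EDGE = 0.1  # assumed preset length

def voxel_size_info(points, min_edge=MIN_EDGE):
    dims = []
    for axis in range(3):
        vals = [p[axis] for p in points]
        extent = max(vals) - min(vals)      # 0 for a single point
        dims.append(max(extent, min_edge))  # clamp edges below the preset length
    return tuple(dims)
```

A single-point voxel thus yields (0.1, 0.1, 0.1), matching the single-point case discussed below; an oriented (non-axis-aligned) cuboid fit would give tighter extents but the same clamping rule.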
In the embodiments of the present disclosure, the minimum circumscribed cuboid of at least one point in a non-empty voxel is determined from the at least one point, and the real voxel size information of the voxel is then determined from the cuboid's length, width, and height. The true size of the voxel is thus determined from the positions of the points contained in it, which improves the accuracy of the voxel size and makes the voxel's representation of the obstacle's outline finer.
In an optional implementation of the present disclosure, when a non-empty voxel contains only one point, the length, width, and height of that point's minimum circumscribed cuboid can also be determined to be 0. In this case, the approach described above can be applied: the length, width, and height of the minimum circumscribed cuboid are each compared with the preset length, any edge smaller than the preset length is updated to the preset length, and the real voxel size information of the non-empty voxel containing only one point is determined from the updated edge lengths.
本公开实施例中,当非空体素中只包含一个点时,确定该点的最小外接长方体的长宽高均为0,而预设长度是一个大于0的数,该最小外接长方体的长宽高显然均小于预设长度,因此可以将该预设长度确定为包含一个点的非空体素的体素尺寸真实信息。假设预设
长度为0.1,则对于仅包含一个点的非空体素,可以利用(0.1,0.1,0.1)作为该非空体素真实的体素尺寸真实信息。In the embodiment of the present disclosure, when a non-empty voxel contains only one point, the length, width and height of the smallest circumscribed cuboid of the point are all 0, and the preset length is a number greater than 0. The length, width and height of the smallest circumscribed cuboid are obviously smaller than the preset length. Therefore, the preset length can be determined as the true voxel size information of the non-empty voxel containing one point. Assuming the preset The length is 0.1. For a non-empty voxel containing only one point, (0.1, 0.1, 0.1) can be used as the actual voxel size information of the non-empty voxel.
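The size rule above can be sketched as follows; the function name and the list-of-tuples point format are illustrative assumptions made here, not part of the disclosure:

```python
def voxel_size_real_info(points, preset_length=0.1):
    """Real voxel size info (dl, dw, dh) of a non-empty voxel.

    points: list of (x, y, z) tuples lying inside the voxel.
    Each side of the minimum axis-aligned circumscribed cuboid that is
    smaller than preset_length is updated to preset_length.
    """
    extents = []
    for axis in range(3):
        coords = [p[axis] for p in points]
        side = max(coords) - min(coords)          # cuboid side on this axis
        extents.append(max(side, preset_length))  # clamp short sides
    return tuple(extents)

# Example from the text: cuboid sides (0.2, 0.1, 0.08) become (0.2, 0.1, 0.1)
print(voxel_size_real_info([(0.0, 0.0, 0.0), (0.2, 0.1, 0.08)]))
# A voxel with a single point has sides (0, 0, 0), clamped to (0.1, 0.1, 0.1)
print(voxel_size_real_info([(0.5, 0.5, 0.5)]))
```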
It should be noted that, in this embodiment of the present disclosure, step 403 and step 404 may be executed in any order, either simultaneously or one after the other. The embodiment merely takes executing step 404 after step 403 as an example, which is not a limitation of the present disclosure.
Step 405: Generate the real obstacle feature corresponding to the obstacle in each voxel based on the real voxel occupancy information corresponding to each voxel and the real voxel position information and real voxel size information corresponding to the non-empty voxels.
In this embodiment of the present disclosure, after the real voxel position information and real voxel size information have been determined for each non-empty voxel, the real obstacle features of the obstacle to which the point cloud data belongs can be generated from the real voxel occupancy, position, and size information of each non-empty voxel, together with the real voxel occupancy information of each unoccupied empty voxel.
It should be noted that, for an empty voxel whose real voxel occupancy information is unoccupied, the real voxel position information and real voxel size information may be omitted, or both may be set to 0.
In some embodiments, assume the real information corresponding to each voxel in the real obstacle feature is represented in a unified format (p, x, y, z, dl, dw, dh), where p denotes the real voxel occupancy information, (x, y, z) denotes the coordinates of the centroid of the point cloud in the voxel, and dl, dw, and dh denote the length, width, and height of the minimum circumscribed cuboid, respectively. For an unoccupied empty voxel, the real information may be expressed as (p1, , , , , , ), where p1 indicates that the voxel is unoccupied and the remaining elements are empty, meaning there is no real voxel position or size information. For an occupied non-empty voxel, the real information may be expressed as (p2, x0, y0, z0, dl, dw, dh), where p2 indicates that the voxel is occupied.
In some embodiments, the real information corresponding to an unoccupied empty voxel in the real obstacle feature may be represented as (p1), where p1 indicates that the voxel is unoccupied. Assume the real voxel position information of an occupied non-empty voxel is represented by the coordinates of the voxel's center point together with the coordinate offset of that center point relative to the centroid of the point cloud in the voxel, and the real voxel size information is represented by the length, width, and height of the minimum circumscribed cuboid of all points in the voxel. The real information of the non-empty voxel may then be expressed as (p2, x1, y1, z1, dx, dy, dz, dl, dw, dh), where p2 indicates that the voxel is occupied, (x1, y1, z1) denotes the coordinates of the voxel's center point, (dx, dy, dz) denotes the coordinate offset of the center point relative to the centroid of the point cloud in the voxel, and dl, dw, and dh denote the length, width, and height of the minimum circumscribed cuboid, respectively.
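The two unified formats above can be sketched as follows; the helper name and the plain-tuple encoding are illustrative assumptions:

```python
def voxel_real_info(occupied, center=None, centroid=None, extents=None):
    """Ground-truth vector for one voxel in the
    (p, x1, y1, z1, dx, dy, dz, dl, dw, dh) format described above.

    Empty voxels carry only the occupancy flag; occupied voxels also
    carry the voxel center, the offset of the center relative to the
    point-cloud centroid (the key point), and the cuboid size.
    """
    if not occupied:
        return (0,)                                # p1: unoccupied
    dx, dy, dz = (c - k for c, k in zip(center, centroid))
    return (1, *center, dx, dy, dz, *extents)      # p2: occupied

empty = voxel_real_info(False)
full = voxel_real_info(True,
                       center=(0.15, 0.05, 0.05),
                       centroid=(0.12, 0.06, 0.04),
                       extents=(0.2, 0.1, 0.1))
```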
According to the training method for an obstacle feature recognition model of this embodiment of the present disclosure, the real voxel occupancy information (occupied or unoccupied) of each voxel is determined from the position of each point in the point cloud data. For each non-empty voxel whose real voxel occupancy information is occupied, the centroid of the point cloud in the voxel is determined as the key point from the at least one point it contains, based on a centroid calculation rule; the real voxel position information of the voxel is determined from the key point based on a preset position determination rule; and the real voxel size information of the voxel is determined using the circumscribed figure of the at least one point. The real obstacle feature corresponding to the obstacle in each voxel is then generated from the real voxel occupancy information of each voxel and the real voxel position and size information of the non-empty voxels. In this way, the occupancy state of each of the divided voxels can be marked, and the real position and real size of each non-empty voxel can be determined, so that the real voxel information is generated dynamically from the points within the voxel. This makes the representation of the obstacle's position and contour more precise, which helps improve the accuracy of obstacle feature recognition and, in turn, the accuracy of obstacle detection.
In an optional embodiment of the present disclosure, when converting the acquired point cloud data into the three-dimensional coordinate system, multiple frames of point cloud data collected by the lidar for the same obstacle may first be acquired and then converted into the same three-dimensional coordinate system. During the conversion, if the same three-dimensional coordinate position corresponds to multiple points across the frames, those points are deduplicated.
Typically, when a lidar collects point cloud data of an obstacle, it obtains one frame of point cloud at each moment, and a single frame can rarely describe all the information of the entire obstacle. Therefore, in this embodiment of the present disclosure, multiple frames of point cloud data collected by the lidar for the same obstacle at different moments may be acquired and then spliced together, where splicing means converting the frames into the same three-dimensional coordinate system. It can be understood that, for the same three-dimensional coordinate position, more than one frame may contain a corresponding point. In this case, during splicing, the multiple points at that position may be deduplicated so that only one point is kept in the three-dimensional coordinate system. Alternatively, the data of the multiple points corresponding to the same position may be averaged, and the average used as the information of the point at that position in the three-dimensional coordinate system.
In this embodiment of the present disclosure, by acquiring multiple frames of point cloud data and converting them into the same coordinate system, a denser point cloud can be obtained, so that the point cloud data displayed in the three-dimensional coordinate system better reflects all the information of the real obstacle.
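The splicing step can be sketched as follows, assuming the frames have already been transformed into the same three-dimensional coordinate system; the function name and the quantization step used to decide that two points occupy the same position are assumptions:

```python
from collections import defaultdict

def splice_frames(frames, mode="dedup", grid=0.01):
    """Merge multiple point cloud frames of the same obstacle.

    frames: iterable of frames, each a list of (x, y, z) points.
    Points whose coordinates quantize to the same grid cell are treated
    as the same position and are either deduplicated (one survivor) or
    averaged, as described above.
    """
    buckets = defaultdict(list)
    for frame in frames:
        for point in frame:
            key = tuple(round(c / grid) for c in point)
            buckets[key].append(point)
    merged = []
    for pts in buckets.values():
        if mode == "dedup":
            merged.append(pts[0])                    # keep one point
        else:                                        # mode == "average"
            n = len(pts)
            merged.append(tuple(sum(c) / n for c in zip(*pts)))
    return merged
```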
FIG. 5 shows a schematic diagram of the network structure of an obstacle feature recognition model according to a specific embodiment of the present disclosure. As shown in FIG. 5, the obstacle feature recognition model includes a feature extraction network, a feature conversion network, a voxel occupancy prediction network, a voxel position prediction network, and a voxel size prediction network. The feature extraction network extracts features from the input surround view images to obtain two-dimensional features; the feature conversion network converts the two-dimensional features extracted by the feature extraction network into three-dimensional features; and the voxel occupancy, voxel position, and voxel size prediction networks obtain voxel occupancy, voxel position, and voxel size prediction information, respectively, from the three-dimensional features. The obstacle feature recognition model shown in FIG. 5 can be trained using the real obstacle features obtained from point cloud data as the training target, where the real obstacle features include real voxel occupancy information, real voxel position information, and real voxel size information. As shown in FIG. 5, six surround view images are input into the model; two-dimensional features are first obtained through the feature extraction network and then restored to three-dimensional features by the feature conversion network. As shown in FIG. 5, the model may also include an upsampling network: after upsampling, the three-dimensional features are input into the voxel occupancy, voxel position, and voxel size prediction networks to obtain the corresponding prediction information, from which the obstacle prediction features are obtained. With the obstacle feature recognition model shown in FIG. 5, accurate 3D obstacle features can be obtained, providing the conditions for accurately recognizing obstacles based on surround view images.
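The data flow of FIG. 5 can be sketched at the shape level as follows; all dimensions, channel counts, and the placeholder (zero) computations are illustrative assumptions, since the disclosure does not specify concrete network sizes:

```python
import numpy as np

N_CAM, H, W = 6, 32, 56      # six surround view images (assumed resolution)
C2D = 16                     # assumed 2-D feature channels
X, Y, Z = 10, 10, 4          # assumed voxel grid of the 3-D space

def feature_extraction(images):
    # (6, H, W, 3) -> (6, H/4, W/4, C2D): per-image 2-D features
    return np.zeros((N_CAM, H // 4, W // 4, C2D))

def feature_conversion(feats_2d):
    # Lift and fuse the six 2-D feature maps into one 3-D feature volume
    return np.zeros((X, Y, Z, C2D))

def upsample(feats_3d):
    # Optional upsampling network; identity placeholder here
    return feats_3d

def prediction_heads(feats_3d):
    occ = np.zeros((X, Y, Z, 1))     # voxel occupancy prediction
    pos = np.zeros((X, Y, Z, 3))     # voxel position prediction
    size = np.zeros((X, Y, Z, 3))    # voxel size prediction
    return occ, pos, size

images = np.zeros((N_CAM, H, W, 3))
occ, pos, size = prediction_heads(upsample(feature_conversion(feature_extraction(images))))
```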
Corresponding to the above embodiments, an embodiment of the present disclosure further provides a method for acquiring obstacle features, which uses the obstacle feature recognition model trained in the above embodiments to obtain the obstacle prediction features of an obstacle.
FIG. 6 is a flow chart of a method for acquiring obstacle features according to an embodiment of the present disclosure. The method may be executed by the apparatus for acquiring obstacle features provided in an embodiment of the present disclosure; the apparatus may be implemented in software and/or hardware and may be integrated on a computer device, which may be an electronic device such as a computer or a server.
As shown in FIG. 6, the method for acquiring obstacle features may include steps 501 to 502.
Step 501: Acquire surround view images of an obstacle.
The obstacle may be, for example, a building, a tree, a human body, or a vehicle.
In some embodiments, cameras may be used to capture the surround view images of the obstacle. The surround view images include multiple images; for example, they may include images in six directions: front, rear, front left, front right, rear left, and rear right.
Step 502: Input the surround view images into a pre-trained obstacle feature recognition model to obtain the obstacle prediction features corresponding to the obstacle.
The obstacle feature recognition model is pre-trained using the training method for an obstacle feature recognition model provided in the above embodiments. The model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network: the feature extraction network extracts the two-dimensional features corresponding to the surround view images, the feature conversion network converts the two-dimensional features into three-dimensional features, and the obstacle feature prediction network predicts the obstacle prediction features corresponding to the obstacle from the three-dimensional features.
In this embodiment of the present disclosure, the acquired surround view images of the obstacle are input into the pre-trained obstacle feature recognition model. The feature extraction network in the model first performs feature extraction on the surround view images to obtain the corresponding two-dimensional features. The two-dimensional features output by the feature extraction network are then input into the feature conversion network, which converts them into the three-dimensional features corresponding to the obstacle. Finally, the three-dimensional features output by the feature conversion network are input into the obstacle feature prediction network, which performs obstacle feature prediction on them and outputs the obstacle prediction features corresponding to the obstacle.
The obstacle feature prediction network may be configured according to actual needs and may include, but is not limited to, at least one of a voxel occupancy prediction network, a voxel position prediction network, and a voxel size prediction network. Correspondingly, at least one of voxel occupancy prediction information, voxel position prediction information, and voxel size prediction information may be predicted as the obstacle prediction features of the obstacle.
According to the method for acquiring obstacle features of this embodiment of the present disclosure, surround view images of an obstacle are acquired and input into a pre-trained obstacle feature recognition model to obtain the obstacle prediction features corresponding to the obstacle. With the pre-trained model, high-precision features of the obstacle in three-dimensional space can thus be obtained from its surround view images, which helps improve the accuracy of obstacle recognition and thereby helps a vehicle effectively avoid obstacles during autonomous driving.
To implement the above embodiments, an embodiment of the present disclosure further provides a training apparatus for an obstacle feature recognition model.
FIG. 7 is a schematic diagram of the structure of a training apparatus for an obstacle feature recognition model according to an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware and may be integrated on a computer device, which may be an electronic device such as a computer or a server.
As shown in FIG. 7, the training apparatus 60 for an obstacle feature recognition model provided in this embodiment of the present disclosure may include: a first acquisition module 601, a determination module 602, a second acquisition module 603, and a training module 604.
The first acquisition module 601 is configured to acquire surround view images and point cloud data corresponding to the same obstacle.
The determination module 602 is configured to determine, based on the point cloud data and the preset voxels in the corresponding three-dimensional space, the real obstacle feature corresponding to the obstacle in each voxel.
The second acquisition module 603 is configured to input the surround view images into an initial model to be trained to obtain the obstacle prediction features corresponding to the obstacle, where the initial model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network.
The training module 604 is configured to train the initial model according to the real obstacle features and the obstacle prediction features corresponding to the obstacle, to obtain the obstacle feature recognition model.
In some embodiments, the real obstacle features include real voxel occupancy information, and the obstacle feature prediction network includes a voxel occupancy prediction network; the second acquisition module 603 is further configured to:
input the surround view images into the feature extraction network to obtain the two-dimensional features corresponding to the surround view images;
input the two-dimensional features into the feature conversion network to obtain the three-dimensional features corresponding to the obstacle;
input the three-dimensional features into the voxel occupancy prediction network to obtain the voxel occupancy prediction information output by the voxel occupancy prediction network.
In some embodiments, the real obstacle features further include real voxel position information and real voxel size information corresponding to the voxels, and the obstacle feature prediction network further includes a voxel position prediction network and a voxel size prediction network; the second acquisition module 603 is further configured to:
input the three-dimensional features into the voxel position prediction network to obtain the voxel position prediction information output by the voxel position prediction network;
input the three-dimensional features into the voxel size prediction network to obtain the voxel size prediction information output by the voxel size prediction network.
In some embodiments, the training module 604 is further configured to:
determine a first loss value according to the voxel occupancy prediction information and the real voxel occupancy information;
determine a second loss value according to the voxel position prediction information and the real voxel position information;
determine a third loss value according to the voxel size prediction information and the real voxel size information;
adjust the network parameters of the initial model based on the first loss value, the second loss value, and the third loss value.
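The three loss values and their combination can be sketched as follows; the disclosure does not fix the loss forms or weights, so binary cross-entropy for occupancy, L1 for position and size, and a plain weighted sum are assumptions:

```python
import math

def occupancy_loss(p_pred, p_true):
    """First loss value: binary cross-entropy for one voxel's occupancy."""
    eps = 1e-7
    p = min(max(p_pred, eps), 1 - eps)  # clamp for numerical stability
    return -(p_true * math.log(p) + (1 - p_true) * math.log(1 - p))

def l1_loss(pred, true):
    """Second/third loss value: mean absolute error over the components,
    typically evaluated only on occupied (non-empty) voxels."""
    return sum(abs(a - b) for a, b in zip(pred, true)) / len(pred)

def total_loss(first, second, third, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three losses used to adjust the network parameters."""
    return weights[0] * first + weights[1] * second + weights[2] * third
```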
In some embodiments, the determination module 602 includes:
a conversion submodule, configured to convert the point cloud data into a three-dimensional coordinate system;
a segmentation submodule, configured to voxelize the three-dimensional space in the three-dimensional coordinate system according to a preset size to obtain multiple voxels in the three-dimensional space;
a generation submodule, configured to generate, according to the point cloud data contained in each of the multiple voxels, the real obstacle feature corresponding to the obstacle in each voxel by using the key point of the point cloud in the voxel and the circumscribed figure of the point cloud, where the real obstacle features include the real voxel occupancy information corresponding to each voxel and the real voxel position information and real voxel size information corresponding to the non-empty voxels.
In some embodiments, the generation submodule includes:
a first determination unit, configured to determine, according to the position of each point in the point cloud data, the real voxel occupancy information corresponding to each voxel, where the real voxel occupancy information includes occupied and unoccupied;
a second determination unit, configured to determine, for a non-empty voxel whose real voxel occupancy information is occupied and according to at least one point in the non-empty voxel, the centroid of the point cloud in the non-empty voxel as the key point based on a centroid calculation rule;
a third determination unit, configured to determine the real voxel position information of the non-empty voxel according to the key point and based on a preset position determination rule;
a fourth determination unit, configured to determine the real voxel size information of the non-empty voxel by using the circumscribed figure corresponding to the at least one point in the non-empty voxel;
a generation unit, configured to generate the real obstacle feature corresponding to the obstacle in each voxel based on the real voxel occupancy information corresponding to each voxel and the real voxel position information and real voxel size information corresponding to the non-empty voxels.
In some embodiments, the position determination rule includes: determining the position coordinates corresponding to the key point as the real voxel position information of the non-empty voxel.
In some embodiments, the position determination rule includes: determining the coordinate offset of the center point of the non-empty voxel relative to the key point as the position information of the non-empty voxel; the third determination unit is further configured to:
acquire the first coordinate of the center point of the non-empty voxel;
determine, according to the first coordinate and the second coordinate corresponding to the key point, the coordinate offset of the center point relative to the key point, where the coordinate offset includes the offset on each coordinate axis of the three-dimensional coordinate system;
determine the first coordinate and the coordinate offset as the real voxel position information of the non-empty voxel.
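This variant of the position determination rule (voxel center plus center-to-key-point offset) can be sketched as follows; the grid-index convention used to locate the voxel center is an assumption:

```python
def voxel_position_real_info(voxel_index, voxel_size, key_point):
    """Real voxel position info as (first coordinate, coordinate offset).

    voxel_index: integer (i, j, k) index of the voxel in the grid.
    voxel_size: (sx, sy, sz) preset size used for voxelization.
    key_point: centroid of the point cloud in the voxel (second coordinate).
    Returns the voxel center (first coordinate) and the per-axis offset
    of the center relative to the key point.
    """
    center = tuple((i + 0.5) * s for i, s in zip(voxel_index, voxel_size))
    offset = tuple(c - k for c, k in zip(center, key_point))
    return center, offset

center, offset = voxel_position_real_info((0, 0, 0), (0.1, 0.1, 0.1),
                                          key_point=(0.04, 0.05, 0.06))
```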
In some embodiments, the fourth determination unit is further configured to:
determine, according to the at least one point in the non-empty voxel, the minimum circumscribed cuboid of the at least one point in the non-empty voxel;
determine the real voxel size information of the non-empty voxel according to the length, width, and height of the minimum circumscribed cuboid.
In some embodiments, the fourth determination unit is further configured to:
determine, in the case where the non-empty voxel contains one point, that the length, width, and height of the minimum circumscribed cuboid are all 0.
In some embodiments, the fourth determination unit is further configured to:
compare each side length of the minimum circumscribed cuboid with a preset length, where the side lengths include the lengths corresponding to the length, width, and height of the minimum circumscribed cuboid;
update, when any of the side lengths is smaller than the preset length, each side length smaller than the preset length to the preset length;
determine the real voxel size information of the non-empty voxel based on each updated side length.
The training apparatus for an obstacle feature recognition model provided in this embodiment of the present disclosure, which can be configured on a computer device, can execute any training method for an obstacle feature recognition model applied to a computer device provided in the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method. For content not described in detail in this apparatus embodiment, reference may be made to the description in any method embodiment of the present disclosure.
To implement the above embodiments, an embodiment of the present disclosure further provides an apparatus for acquiring obstacle features.
FIG. 8 is a schematic diagram of the structure of an apparatus for acquiring obstacle features according to an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware and may be integrated on a computer device, which may be an electronic device such as a computer or a server.
As shown in FIG. 8, the apparatus 70 for acquiring obstacle features provided in this embodiment of the present disclosure may include: an image acquisition module 701 and a feature prediction module 702.
The image acquisition module 701 is configured to acquire surround view images of an obstacle.
The feature prediction module 702 is configured to input the surround view images into a pre-trained obstacle feature recognition model to obtain the obstacle prediction features corresponding to the obstacle.
The obstacle feature recognition model includes a feature extraction network, a feature conversion network, and an obstacle feature prediction network. The feature extraction network is configured to extract the two-dimensional features corresponding to the surround view images, the feature conversion network is configured to convert the two-dimensional features into three-dimensional features, and the obstacle feature prediction network is configured to predict the obstacle prediction features corresponding to the obstacle from the three-dimensional features.
The apparatus for acquiring obstacle features provided in this embodiment of the present disclosure, which can be configured on a computer device, can execute any method for acquiring obstacle features applied to a computer device provided in the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method. For content not described in detail in this apparatus embodiment, reference may be made to the description in any method embodiment of the present disclosure.
The embodiments of the present disclosure further provide a computer device including a processor and a memory. By invoking a program or instructions stored in the memory, the processor is configured to execute the steps of any embodiment of the training method for an obstacle feature recognition model described above, or the steps of any embodiment of the method for acquiring obstacle features described above. To avoid repetition, these steps are not described again here.
The embodiments of the present disclosure further provide a non-transitory computer-readable storage medium storing a program or instructions. The program or instructions cause a computer to execute the steps of any embodiment of the training method for an obstacle feature recognition model described above, or the steps of any embodiment of the method for acquiring obstacle features described above. To avoid repetition, these steps are not described again here.
The embodiments of the present disclosure further provide a computer program product configured to execute the steps of any embodiment of the training method for an obstacle feature recognition model described above, or the steps of any embodiment of the method for acquiring obstacle features described above.
The embodiments of the present disclosure further provide a computer program. The computer program includes computer program code which, when run on a computer, causes the computer to execute the training method for an obstacle feature recognition model described in any of the foregoing embodiments, or the method for acquiring obstacle features described in any of the foregoing embodiments.
It should be noted that the foregoing explanations of the method and apparatus embodiments also apply to the electronic device, computer-readable storage medium, computer program product, and computer program of the above embodiments, and are not repeated here.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any variant thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes it.
The above descriptions are merely specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
All embodiments of the present disclosure may be implemented individually or in combination with other embodiments, and all are deemed to fall within the scope of protection claimed by the present disclosure.
Claims (18)
- A training method for an obstacle feature recognition model, characterized in that the method comprises:
acquiring surround-view images and point cloud data corresponding to a same obstacle;
determining, based on the point cloud data and preset voxels in a corresponding three-dimensional space, real obstacle features corresponding to the obstacle in each voxel;
inputting the surround-view images into an initial model to be trained to obtain obstacle prediction features corresponding to the obstacle, wherein the initial model comprises a feature extraction network, a feature conversion network, and an obstacle feature prediction network; and
training the initial model according to the real obstacle features and the obstacle prediction features corresponding to the obstacle to obtain the obstacle feature recognition model.
- The method according to claim 1, characterized in that the real obstacle features comprise real voxel occupancy information, and the obstacle feature prediction network comprises a voxel occupancy prediction network;
the inputting the surround-view images into the initial model to be trained to obtain the obstacle prediction features corresponding to the obstacle comprises:
inputting the surround-view images into the feature extraction network to obtain two-dimensional features corresponding to the surround-view images;
inputting the two-dimensional features into the feature conversion network to obtain three-dimensional features corresponding to the obstacle; and
inputting the three-dimensional features into the voxel occupancy prediction network to obtain voxel occupancy prediction information output by the voxel occupancy prediction network.
- The method according to claim 2, characterized in that the real obstacle features further comprise real voxel position information and real voxel size information corresponding to the voxels, and the obstacle feature prediction network further comprises a voxel position prediction network and a voxel size prediction network;
the inputting the surround-view images into the initial model to be trained to obtain the obstacle prediction features corresponding to the obstacle further comprises:
inputting the three-dimensional features into the voxel position prediction network to obtain voxel position prediction information output by the voxel position prediction network; and
inputting the three-dimensional features into the voxel size prediction network to obtain voxel size prediction information output by the voxel size prediction network.
- The method according to claim 3, characterized in that the training the initial model according to the real obstacle features and the obstacle prediction features corresponding to the obstacle comprises:
determining a first loss value according to the voxel occupancy prediction information and the real voxel occupancy information;
determining a second loss value according to the voxel position prediction information and the real voxel position information;
determining a third loss value according to the voxel size prediction information and the real voxel size information; and
adjusting network parameters of the initial model based on the first loss value, the second loss value, and the third loss value.
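The three-loss combination recited above can be sketched as follows. The claim only recites that three loss values are determined and combined; the concrete loss forms here (binary cross-entropy for occupancy, masked L1 for position and size) and the equal weighting are illustrative assumptions:

```python
import numpy as np

def training_losses(occ_pred, occ_true, pos_pred, pos_true, size_pred, size_true):
    """Combine the three per-voxel losses; loss forms are illustrative only."""
    eps = 1e-7
    occ_pred = np.clip(occ_pred, eps, 1.0 - eps)
    # First loss: predicted vs. real voxel occupancy (binary cross-entropy).
    l_occ = -np.mean(occ_true * np.log(occ_pred)
                     + (1 - occ_true) * np.log(1 - occ_pred))
    # Position/size targets exist only for non-empty (occupied) voxels.
    mask = occ_true.astype(bool)
    # Second loss: voxel position; third loss: voxel size (masked L1).
    l_pos = np.abs(pos_pred[mask] - pos_true[mask]).mean() if mask.any() else 0.0
    l_size = np.abs(size_pred[mask] - size_true[mask]).mean() if mask.any() else 0.0
    return l_occ + l_pos + l_size

occ_true = np.array([1.0, 0.0, 1.0])
loss = training_losses(
    occ_pred=np.array([0.9, 0.1, 0.8]), occ_true=occ_true,
    pos_pred=np.zeros((3, 3)), pos_true=np.zeros((3, 3)),
    size_pred=np.ones((3, 3)), size_true=np.ones((3, 3)))
print(round(float(loss), 4))   # → 0.1446
```

In a real training loop the scalar returned here would drive the parameter adjustment (e.g. by backpropagation), which the claim leaves unspecified.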
- The method according to any one of claims 1 to 4, characterized in that the determining, based on the point cloud data and the preset voxels in the corresponding three-dimensional space, the real obstacle features corresponding to the obstacle in each voxel comprises:
converting the point cloud data into a three-dimensional coordinate system;
voxelizing the three-dimensional space in the three-dimensional coordinate system according to a preset size to obtain a plurality of voxels in the three-dimensional space; and
generating, according to the point cloud data contained in each of the plurality of voxels, the real obstacle features corresponding to the obstacle in each voxel by using key points of the point cloud within the voxel and a circumscribed figure of the point cloud, wherein the real obstacle features comprise real voxel occupancy information corresponding to each voxel, and real voxel position information and real voxel size information corresponding to non-empty voxels.
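The voxelization step recited above can be sketched as grouping points by their grid cell; placing the grid origin at the coordinate origin and using a uniform cubic voxel size are assumptions for illustration:

```python
import numpy as np

def voxelize(points, voxel_size=1.0):
    """Group 3-D points into axis-aligned voxels of a preset size.
    Returns {voxel index (ix, iy, iz): list of points in that voxel}."""
    voxels = {}
    indices = np.floor(points / voxel_size).astype(int)
    for idx, p in zip(indices, points):
        key = tuple(int(i) for i in idx)
        voxels.setdefault(key, []).append(p)
    return voxels

pts = np.array([[0.2, 0.3, 0.1],   # falls in voxel (0, 0, 0)
                [0.7, 0.8, 0.4],   # falls in voxel (0, 0, 0)
                [1.5, 0.2, 0.9]])  # falls in voxel (1, 0, 0)
vox = voxelize(pts)
print(sorted(vox), [len(v) for _, v in sorted(vox.items())])
# → [(0, 0, 0), (1, 0, 0)] [2, 1]
```

Voxels absent from the dictionary correspond to unoccupied cells; the per-voxel ground-truth features of the later claims are then derived from each non-empty entry.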
- The method according to claim 5, characterized in that the generating, according to the point cloud data contained in each of the plurality of voxels, the real obstacle features corresponding to the obstacle in each voxel by using the key points of the point cloud within the voxel and the circumscribed figure of the point cloud comprises:
determining, according to a position of each point in the point cloud data, the real voxel occupancy information corresponding to each voxel, the real voxel occupancy information comprising occupied and unoccupied;
for a non-empty voxel whose real voxel occupancy information is occupied, determining, according to at least one point in the non-empty voxel and based on a centroid calculation rule, a centroid of the point cloud within the non-empty voxel as the key point;
determining real voxel position information of the non-empty voxel according to the key point and based on a preset position determination rule;
determining real voxel size information of the non-empty voxel by using a circumscribed figure corresponding to the at least one point in the non-empty voxel; and
generating the real obstacle features corresponding to the obstacle in each voxel based on the real voxel occupancy information corresponding to each voxel and the real voxel position information and the real voxel size information corresponding to the non-empty voxel.
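The occupancy flag and centroid key point for a single voxel can be sketched as below; taking the centroid calculation rule to be the unweighted mean of the voxel's points is an assumption, since the claim leaves the rule open:

```python
import numpy as np

def voxel_ground_truth(voxel_points):
    """For one voxel: its occupancy flag, and for a non-empty voxel
    the centroid of its points as the key point (centroid assumed to
    be the unweighted mean of the points)."""
    pts = np.asarray(voxel_points, dtype=float)
    if pts.size == 0:
        return {"occupied": False, "key_point": None}
    return {"occupied": True, "key_point": pts.mean(axis=0)}

gt = voxel_ground_truth([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
print(gt["occupied"], gt["key_point"])   # occupied, centroid (0.5, 1.0, 1.5)
```

Empty voxels carry only the "unoccupied" flag; position and size ground truth (claims 7 to 11) are derived from the key point of occupied voxels only.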
- The method according to claim 6, characterized in that the position determination rule comprises:
determining position coordinates corresponding to the key point as the real voxel position information of the non-empty voxel.
- The method according to claim 6 or 7, characterized in that the position determination rule comprises:
determining a coordinate offset of a center point of the non-empty voxel relative to the key point as position information of the non-empty voxel;
and the determining the real voxel position information of the non-empty voxel according to the key point and based on the preset position determination rule comprises:
acquiring a first coordinate of the center point of the non-empty voxel;
determining, according to the first coordinate and a second coordinate corresponding to the key point, the coordinate offset of the center point relative to the key point, the coordinate offset comprising an offset on each coordinate axis of the three-dimensional coordinate system; and
determining the first coordinate and the coordinate offset as the real voxel position information of the non-empty voxel.
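The center-plus-offset encoding recited above can be sketched as follows; the grid origin at 0 and a unit voxel size are illustrative assumptions:

```python
import numpy as np

def voxel_position_info(voxel_index, key_point, voxel_size=1.0):
    """Real voxel position information per the rule above: the voxel
    center's coordinates (the 'first coordinate') together with its
    per-axis offset from the key point (the 'second coordinate')."""
    center = (np.asarray(voxel_index) + 0.5) * voxel_size
    offset = center - np.asarray(key_point, dtype=float)
    return center, offset

center, offset = voxel_position_info((0, 0, 0), key_point=[0.2, 0.4, 0.5])
print(center, offset)   # center ~(0.5, 0.5, 0.5), offset ~(0.3, 0.1, 0.0)
```

Encoding the offset rather than the raw centroid keeps the regression target bounded by the voxel size, which is a common motivation for this kind of parameterization.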
- The method according to any one of claims 6 to 8, characterized in that the determining the real voxel size information of the non-empty voxel by using the circumscribed figure corresponding to the at least one point in the non-empty voxel comprises:
determining, according to the at least one point in the non-empty voxel, a minimum circumscribed cuboid of the at least one point in the non-empty voxel; and
determining the real voxel size information of the non-empty voxel according to a length, a width, and a height of the minimum circumscribed cuboid.
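For an axis-aligned reading of the minimum circumscribed cuboid (the claim does not state the orientation, so axis alignment is an assumption), its length, width, and height are just the per-axis extents of the voxel's points:

```python
import numpy as np

def min_circumscribed_cuboid_dims(points):
    """Length/width/height of the axis-aligned minimum cuboid enclosing
    the voxel's points; a single point yields all-zero dimensions,
    consistent with claim 10."""
    pts = np.asarray(points, dtype=float)
    return pts.max(axis=0) - pts.min(axis=0)

dims = min_circumscribed_cuboid_dims([[0.0, 0.0, 0.0],
                                      [0.4, 0.1, 0.0],
                                      [0.2, 0.3, 0.2]])
print(dims)                                           # extents (0.4, 0.3, 0.2)
single = min_circumscribed_cuboid_dims([[1.0, 1.0, 1.0]])
print(single)                                         # (0, 0, 0) for one point
```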
- The method according to claim 9, characterized in that the method further comprises:
in a case where the non-empty voxel contains one point, determining that the length, the width, and the height of the minimum circumscribed cuboid are all 0.
- The method according to claim 9 or 10, characterized in that the determining the real voxel size information of the non-empty voxel according to the length, the width, and the height of the minimum circumscribed cuboid comprises:
comparing each edge length of the minimum circumscribed cuboid with a preset length, wherein each edge length comprises the lengths respectively corresponding to the length, the width, and the height of the minimum circumscribed cuboid;
when an edge length smaller than the preset length exists among the edge lengths, updating the edge length smaller than the preset length to the preset length; and
determining the real voxel size information of the non-empty voxel based on each updated edge length.
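The per-edge update recited above is an element-wise lower clamp; the particular preset length used here is an arbitrary illustration, as the claim does not fix a value:

```python
import numpy as np

def real_voxel_size(cuboid_dims, preset_length=0.05):
    """Clamp each edge (length, width, height) of the minimum
    circumscribed cuboid to a preset minimum before using it as the
    real voxel size information."""
    return np.maximum(np.asarray(cuboid_dims, dtype=float), preset_length)

print(real_voxel_size([0.0, 0.2, 0.01]))   # degenerate edges raised to 0.05
```

The clamp guarantees a non-degenerate size target even for the single-point case of claim 10, where the raw cuboid dimensions are all zero.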
- A method for acquiring obstacle features, characterized in that the method comprises:
acquiring surround-view images of an obstacle; and
inputting the surround-view images into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle;
wherein the obstacle feature recognition model comprises a feature extraction network, a feature conversion network, and an obstacle feature prediction network; the feature extraction network is configured to extract two-dimensional features corresponding to the surround-view images, the feature conversion network is configured to convert the two-dimensional features into three-dimensional features, and the obstacle feature prediction network is configured to predict the obstacle prediction features corresponding to the obstacle from the three-dimensional features.
- A training apparatus for an obstacle feature recognition model, characterized in that the apparatus comprises:
a first acquisition module configured to acquire surround-view images and point cloud data corresponding to a same obstacle;
a determination module configured to determine, based on the point cloud data, real obstacle features corresponding to the obstacle;
a second acquisition module configured to input the surround-view images into an initial model to be trained to obtain obstacle prediction features corresponding to the obstacle; and
a training module configured to train the initial model according to the real obstacle features and the obstacle prediction features corresponding to the obstacle to obtain an obstacle feature recognition model.
- An apparatus for acquiring obstacle features, characterized in that the apparatus comprises:
an image acquisition module configured to acquire surround-view images of an obstacle; and
a feature prediction module configured to input the surround-view images into a pre-trained obstacle feature recognition model to obtain obstacle prediction features corresponding to the obstacle;
wherein the obstacle feature recognition model comprises a feature extraction network, a feature conversion network, and an obstacle feature prediction network; the feature extraction network is configured to extract two-dimensional features corresponding to the surround-view images, the feature conversion network is configured to convert the two-dimensional features into three-dimensional features, and the obstacle feature prediction network is configured to predict the obstacle prediction features corresponding to the obstacle from the three-dimensional features.
- A computer device, characterized by comprising a processor and a memory;
wherein the processor, by invoking a program or instructions stored in the memory, is configured to execute the training method for an obstacle feature recognition model according to any one of claims 1 to 11, or the method for acquiring obstacle features according to claim 12.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program or instructions that cause a computer to execute the training method for an obstacle feature recognition model according to any one of claims 1 to 11, or the method for acquiring obstacle features according to claim 12.
- A computer program product comprising a computer program that, when executed by a processor, implements the training method for an obstacle feature recognition model according to any one of claims 1 to 11, or the method for acquiring obstacle features according to claim 12.
- A computer program, characterized in that the computer program comprises computer program code which, when run on a computer, causes the computer to execute the training method for an obstacle feature recognition model according to any one of claims 1 to 11, or the method for acquiring obstacle features according to claim 12.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310333989.7 | 2023-03-30 | ||
CN202310333989.7A CN118736524A (en) | 2023-03-30 | 2023-03-30 | Training method, device, equipment and storage medium for obstacle characteristic recognition model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024199378A1 true WO2024199378A1 (en) | 2024-10-03 |
Family
ID=92848032
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/084501 WO2024199378A1 (en) | 2023-03-30 | 2024-03-28 | Obstacle feature recognition model training method and apparatus, device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118736524A (en) |
WO (1) | WO2024199378A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106707293A (en) * | 2016-12-01 | 2017-05-24 | 百度在线网络技术(北京)有限公司 | Obstacle recognition method and device for vehicles |
US10970518B1 (en) * | 2017-11-14 | 2021-04-06 | Apple Inc. | Voxel-based feature learning network |
CN113255444A (en) * | 2021-04-19 | 2021-08-13 | 杭州飞步科技有限公司 | Training method of image recognition model, image recognition method and device |
CN114638954A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud segmentation model training method, point cloud data segmentation method and related device |
- 2023-03-30: application CN202310333989.7A filed in China (published as CN118736524A, status pending)
- 2024-03-28: international application PCT/CN2024/084501 filed (published as WO2024199378A1)
Also Published As
Publication number | Publication date |
---|---|
CN118736524A (en) | 2024-10-01 |