
WO2022133986A1 - Accuracy estimation method and system - Google Patents

Accuracy estimation method and system

Info

Publication number
WO2022133986A1
Authority
WO
WIPO (PCT)
Prior art keywords
point set
point
point cloud
processor
coordinate system
Application number
PCT/CN2020/139323
Other languages
French (fr)
Inventor
You Zhou
Lu Sun
Cansen JIANG
Original Assignee
SZ DJI Technology Co., Ltd.
Application filed by SZ DJI Technology Co., Ltd. filed Critical SZ DJI Technology Co., Ltd.
Priority to PCT/CN2020/139323
Publication of WO2022133986A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30 Map- or contour-matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Definitions

  • the present disclosure relates to autonomous driving navigation and, more particularly, to an accuracy estimation method and system.
  • Reliable lane recognition and positioning is a fundamental necessity for autonomous driving navigation.
  • Some conventional technologies use data obtained by sensors (e.g., image data captured by a camera, radar data acquired by a Lidar, and/or the like) to detect lane lines in real time.
  • it is difficult to determine information such as whether a vehicle is on a highway or a city road, how fast the vehicle can drive on a road without a posted speed limit, a curvature of the road ahead, a strength of a Global Positioning System (GPS) signal of the road where the vehicle is located, and the like, in real time.
  • the high-precision electronic maps are also referred to as high resolution (HD) maps.
  • HD maps provide lane-level information and other road information (e.g., a curvature, heading direction, slope, cross-slope angle, and the like, of the road) , which are important for the safety and comfort of unmanned vehicles.
  • the lane lines need to be labeled in advance on the HD maps.
  • the conventional labeling methods are highly dependent on manual labor. Even though manual labeling is accurate for the conventional navigation maps formed by two-dimensional (2D) images, it is difficult to achieve high accuracy for the HD maps formed by three-dimensional (3D) point clouds. It is difficult for human eyes to distinguish the lane lines on the HD maps formed by colorless point clouds obtained by a Lidar. As such, a labeling accuracy may be poor and a labeling efficiency may be low. Therefore, a method for estimating an accuracy of the lane line is needed to ensure a reliable lane recognition and positioning.
  • an accuracy estimation method including obtaining a projection point set for an object in a two-dimensional (2D) image based on a three-dimensional (3D) point cloud corresponding to the 2D image, obtaining a position point set for the object in the 2D image by detecting the object in the 2D image, and estimating an accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  • an accuracy estimation system including a processor and a memory coupled to the processor.
  • the memory stores instructions that, when executed by the processor, cause the processor to obtain the projection point set for the object in the 2D image based on the 3D point cloud corresponding to the 2D image, obtain the position point set for the object in the 2D image by detecting the object in the 2D image, and estimate the accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  • a vehicle includes a camera configured to capture the 2D image, a Lidar configured to acquire the 3D point cloud, a processor, and a memory coupled to the camera, the Lidar, and the processor.
  • the memory can store instructions that, when executed by the processor, cause the processor to obtain the projection point set for the object in the 2D image based on the 3D point cloud corresponding to the 2D image, obtain the position point set for the object in the 2D image by detecting the object in the 2D image, and estimate the accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  • a non-transitory computer-readable medium storing instructions that, when being executed, cause a processor to obtain the projection point set for the object in the 2D image based on the 3D point cloud corresponding to the 2D image, obtain the position point set for the object in the 2D image by detecting the object in the 2D image, and estimate the accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  • FIG. 1 is a schematic block diagram of a vehicle consistent with embodiments of the disclosure.
  • FIG. 2 is a schematic block diagram of an accuracy estimation system consistent with embodiments of the disclosure.
  • FIG. 3 is a flow chart of an accuracy estimation method consistent with embodiments of the disclosure.
  • FIG. 4 is a flow chart of obtaining an original point set for an object from a three-dimensional (3D) point cloud consistent with embodiments of the disclosure.
  • FIG. 5 is a flow chart of obtaining a dense point set for an object from a 3D point cloud consistent with embodiments of the disclosure.
  • FIG. 6 schematically shows lane lines marked in a two-dimensional (2D) bird-view image consistent with the disclosure.
  • FIG. 7 schematically shows coordinate systems consistent with embodiments of the disclosure.
  • FIG. 8 is a flow chart of a coordinate system transformation method consistent with embodiments of the disclosure.
  • FIGs. 9A and 9B schematically show a translation transformation and a rotation transformation consistent with embodiments of the disclosure.
  • FIG. 10 is a flow chart of another accuracy estimation method consistent with embodiments of the disclosure.
  • FIG. 11 is a flow chart of another accuracy estimation method consistent with embodiments of the disclosure.
  • the present disclosure provides an accuracy estimation method and system that can quantitatively estimate an accuracy of a lane line that is generated from a three-dimensional (3D) point cloud captured by a Lidar.
  • the generated lane line can be projected onto a two-dimensional (2D) image captured by a camera, and the projected lane line can be compared with a 2D lane line detected from the 2D image to determine the accuracy of the lane line. If the accuracy satisfies a predetermined requirement, the generated lane line can be used in a high resolution (HD) map. If the accuracy does not satisfy the predetermined requirement, an adaptation strategy can be determined. As such, a reliable lane recognition and positioning can be ensured.
  • FIG. 1 is a schematic block diagram of an example vehicle 100 consistent with the disclosure.
  • the vehicle 100 can include a driverless car, a mobile robot, an intelligent electrical vehicle, an unmanned aerial vehicle (UAV) , or any mobile body using the autonomous driving navigation technology.
  • the vehicle 100 includes a camera 101 configured to capture the 2D images of the surrounding environment of the vehicle 100, a Lidar 102 configured to acquire the 3D point clouds through beaming laser points at the road, and an accuracy estimation system 103 configured to estimate the accuracy of the lane lines generated from the 3D point clouds.
  • the vehicle 100 can include a plurality of cameras 101 and a plurality of Lidars 102.
  • the camera 101 can include a video recorder, a digital camera, an infrared camera, and the like.
  • the camera 101 can be mounted on a front, rear, side, or any suitable position of the vehicle 100.
  • As such, the lane lines and obstacles (e.g., other vehicles, pedestrians, and the like) on the road can be detected from the 2D images captured by the camera 101.
  • the Lidar 102 can include a 32-channel Lidar, 64-channel Lidar, 128-channel Lidar, and the like. The number of channels of the Lidar 102 can be selected according to actual needs, which will not be limited herein.
  • the Lidar 102 can be mounted on a top, front, side, or any suitable position of the vehicle 100. As such, the 3D point clouds can be acquired by the Lidar 102.
  • the camera 101 and the Lidar 102 can be communicatively coupled to the accuracy estimation system 103. Therefore, the camera 101 can send image data to the accuracy estimation system 103 and the Lidar 102 can send the 3D point clouds to the accuracy estimation system 103. In some other embodiments, the camera 101 and the Lidar 102 can be communicatively coupled to the vehicle 100 and the vehicle 100 can be communicatively coupled to the accuracy estimation system 103. The camera 101 can send the image data to the vehicle 100 and the Lidar 102 can send the 3D point clouds to the vehicle 100. The vehicle 100 can further send the image data and the 3D point clouds to the accuracy estimation system 103.
  • the vehicle 100 can further include a global navigation satellite system (GNSS) , for example, a Global Positioning System (GPS) or the like, configured to use satellites to detect real-time positions of the vehicle 100 in a global coordinate system.
  • a real-time kinematic (RTK) positioning can be used to enhance a precision of position data derived from the GNSS.
  • the vehicle 100 can further include an inertial navigation system (INS) including motion sensors (e.g., accelerometers) and rotation sensors (e.g., gyroscopes) .
  • the INS can be configured to continuously calculate a pose, a velocity, and the like, of the vehicle 100.
  • the GNSS and the INS can be communicatively coupled to the accuracy estimation system 103. Therefore, the GNSS can send the position data of the vehicle 100 to the accuracy estimation system 103 and the INS can send pose data and velocity data of the vehicle 100 to the accuracy estimation system 103. In some other embodiments, the GNSS and the INS can be communicatively coupled to the vehicle 100 and the vehicle 100 can be communicatively coupled to the accuracy estimation system 103. The GNSS can send the position data to the vehicle 100 and the INS can send the pose data and the velocity data to the vehicle 100. The vehicle 100 can further send the position data, pose data, and velocity data of the vehicle 100 to the accuracy estimation system 103.
  • the accuracy estimation system 103 can be integrated into the vehicle 100. In some other embodiments, the accuracy estimation system 103 can be an independent system mounted at the vehicle 100.
  • FIG. 2 is a schematic block diagram of the example accuracy estimation system 103 consistent with the disclosure. The accuracy estimation system 103 can be used to estimate the accuracy of the lane line generated from the 3D point clouds.
  • the accuracy estimation system 103 includes a processor 1031 and a memory 1032 coupled to the processor 1031.
  • the processor 1031 may be any suitable hardware processor, such as a microprocessor, a micro-controller, a central processing unit (CPU) , a network processor (NP) , a digital signal processor (DSP) , an application specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , or another programmable logic device, discrete gate or transistor logic device, and discrete hardware component.
  • the memory 1032 may include a non-transitory computer-readable storage medium, such as a random access memory (RAM) , a read only memory, a flash memory, a volatile memory, a hard disk storage, or an optical media.
  • the memory 1032 may store computer program instructions and/or data, e.g., the 2D images, the 3D point clouds, the position data, the pose data, and the velocity data of the vehicle 100, and/or the like.
  • the processor 1031 can be configured to execute the computer program instructions that are stored in the memory 1032, to perform the accuracy estimation method consistent with the disclosure, such as one of the example methods described below.
  • the accuracy estimation system 103 can further include a communication device.
  • the communication device can be configured to communicate with the vehicle 100, the camera 101, and/or the Lidar 102.
  • the communication device can be further configured to communicate with the GNSS and the INS.
  • FIG. 3 is a flow chart of an example accuracy estimation method 300 consistent with the disclosure.
  • An execution entity of the method 300 can include an accuracy estimation system (e.g., the accuracy estimation system 103 described above) , or a processor of the accuracy estimation system (e.g., the processor 1031 of the accuracy estimation system 103 described above) .
  • a projection point set for an object in a 2D image is obtained based on a 3D point cloud corresponding to the 2D image.
  • the 2D image can be obtained by a camera, for example, the camera 101 described above.
  • an original point set for the object in a 3D global coordinate system of the 3D point cloud can be obtained, and the projection point set can be obtained according to the original point set for the object.
  • the 3D point cloud can be acquired by a Lidar, such as the Lidar 102 described above.
  • the 3D global coordinate system of the 3D point cloud can be simply referred to as the “global coordinate system. ”
  • the global coordinate system can include a world coordinate system having a direction of true north as an X-axis, a direction of true east as a Y-axis, and a direction toward a center of the earth as a Z-axis.
  • the object can include a lane line, and the original point set for the lane line can include a plurality of control points of the lane line.
  • the object can include a vehicle, a pedestrian, or any object that can be detected by the Lidar and the camera.
  • FIG. 4 is a flow chart of obtaining the original point set for the object from the 3D point cloud consistent with the disclosure.
  • the 3D point cloud is acquired by the Lidar.
  • a driver can drive a collection vehicle, e.g., the vehicle 100 described above, to a scene where a HD map needs to be constructed.
  • the 3D point cloud can be acquired by the Lidar (e.g., the Lidar 102 described above) mounted on the collection vehicle.
  • the 3D point cloud of a same scene may be acquired more than once.
  • a dense point cloud is obtained according to the 3D point cloud. Assume the object includes the lane line. If the number of channels of the Lidar is not high enough, the clean point cloud may be sparse. A sparse point cloud may not be sufficient for lane line positioning.
  • FIG. 5 is a flow chart of obtaining the dense point set for the object from the 3D point cloud consistent with the disclosure.
  • the 3D point cloud can be aligned with navigation data.
  • the navigation data can include data obtained by a combination of a GNSS (e.g., the GNSS in vehicle 100 described above) and a RTK (e.g., the RTK in vehicle 100 described above) . Since the 3D point cloud is obtained by the Lidar and the navigation data is obtained by the combination of the GNSS and the RTK, an alignment of the 3D point cloud and the navigation data on a physical time axis is needed to ensure a one-to-one correspondence of the 3D point cloud and the navigation data.
  • the one-to-one correspondence can be performed according to a global time stamp of each data point stored with the 3D point cloud and the navigation data.
  • the global time stamp can be obtained according to a high-precision clock provided by the GNSS (e.g., a GPS) .
  • a preprocessing can be performed on the 3D point cloud to obtain the clean point cloud.
  • the preprocessing can include deleting data points corresponding to obstacles from the 3D point cloud.
  • the 3D point cloud acquired by the Lidar may include the obstacles on the road, for example, vehicles, pedestrians, street lights, and the like, which can have an adverse effect on the lane line positioning from the 3D point cloud.
  • the obstacles can be detected from the 3D point cloud using any suitable machine learning method, such as a deep learning network (e.g., an unsupervised pre-trained network, a convolutional neural network, a recurrent neural network, a recursive neural network, or the like) , and then be deleted from the 3D point cloud to obtain the clean point cloud.
  • the pose of the clean point cloud can be calculated using graph-based optimization.
  • each frame obtained by the Lidar can include one clean point cloud.
  • Rough poses of all clean point clouds obtained by the combination of the GNSS and the RTK can be used as vertexes of a graph.
  • Relative pose relationships between frames can be used as edges of the graph.
  • the relative pose relationships between frames can include relative pose relationships between temporal neighbor frames. All clean point clouds can be traversed according to a time sequence, and an inter-frame registration in the time sequence can be performed to obtain the relative pose relationships between the temporal neighbor frames. In some embodiments, performing the inter-frame registration can include selecting points from each clean point cloud using a normal space sampling method (NSS) , and performing iterative closest point (ICP) on the selected points of the temporal neighbor frames to obtain the relative pose relationships between the temporal neighbor frames.
  • the relative pose relationships between frames can include relative pose relationships between spatial proximity point clouds.
  • the spatial proximity point clouds can refer to the clean point clouds from multiple frames that are close in space but not continuous in time sequence, for example, the clean point clouds of a same position acquired at different times. For example, according to global position information provided by the GNSS, the point cloud corresponding to each frame can be determined, and the iterative closest point (ICP) can be performed on the spatial proximity point clouds to obtain the relative pose relationships between the spatial proximity point clouds.
  • the pose of the clean point cloud can be obtained by solving an objective function of the graph.
  • the objective function can include minimizing a sum of residuals of the vertexes of the graph.
  • the pose of the clean point cloud can be also referred to as a pose of the 3D point cloud.
  • a point registration is performed on the clean point clouds within a distance range.
  • performing the point registration on the clean point clouds within the distance range can include performing the graph-based optimization on a graph formed by the poses of the clean point clouds, obtained at 501, within the distance range.
  • the distance range can be determined according to actual needs, for example, the distance range can be set as 100 meters, 200 meters, or the like.
  • for a road having few features on both sides of the road (e.g., a highway) , the pose obtained at 501 may not be accurate. Therefore, performing the graph-based optimization again on the graph formed by the poses of the clean point clouds can increase an accuracy of the poses of the clean point clouds.
  • the processes of performing the graph-based optimization are similar to the processes described at 501, and detailed description is omitted herein.
  • a point cloud stacking is performed on the clean point cloud to obtain the dense point cloud.
  • clean point clouds obtained at a previous moment and a next moment can be projected to a current moment.
  • the clean point clouds obtained at the previous moment and the next moment can be projected to the current moment according to a motion state estimation of the vehicle (e.g., a moving distance, a change of a pose of the vehicle, and the like) .
  • the clean point clouds obtained at the previous moment can include the clean point clouds obtained at one or more previous frames
  • the clean point clouds obtained at the next moment can include the clean point clouds obtained at one or more next frames.
  • the dense point cloud is projected from the global coordinate system to a bird-view coordinate system to obtain a 2D bird-view image.
  • a corresponding 2D bird-view point can be obtained by ignoring a coordinate value of the point in a height direction (i.e., Z axis) and retaining the coordinate values of the point in horizontal and vertical directions (i.e., X and Y axes) .
  • a coordinate of a point p in the global coordinate system is (x_p, y_p, z_p) .
  • a coordinate of a corresponding 2D bird-view point p_b can be (x_p, y_p) .
  • the processes at 503 can be omitted, and the clean point cloud can be projected from the global coordinate system to the bird-view coordinate system to obtain the 2D bird-view image at 403.
  • the object is marked in the 2D bird-view image. Assume the object includes the lane line.
  • FIG. 6 schematically shows example lane lines marked in the 2D bird-view image consistent with the disclosure.
  • the plurality of control points (denoted as yellow points in FIG. 6) can be marked on the 2D bird-view image by using a marking software, and the lane lines can be drawn on the 2D bird-view image.
  • the plurality of control points can be selected from the 2D bird-view points at a regular interval (e.g., 0.5m, 1m, or the like) in the 2D bird-view image.
  • the original point set for the lane line can be determined as the plurality of control points.
  • obtaining of the projection point set for the object in the 2D image can further include projecting the original point set for the object in the 3D point cloud from the global coordinate system to a 2D coordinate system of the 2D image.
  • the 2D coordinate system of the 2D image can be simply referred to as an “image coordinate system. ” Since the global coordinate system is different from the image coordinate system, projecting the point set for the lane in the 3D point cloud from the global coordinate system to the image coordinate system can be accomplished by performing coordinate system transformations from global coordinate system to the image coordinate system.
  • FIG. 7 schematically shows example coordinate systems consistent with the disclosure.
  • the coordinate systems include the global coordinate system denoted as (X, Y, Z) , a vehicle body coordinate system denoted as (X_v, Y_v, Z_v) , a camera coordinate system denoted as (X_c, Y_c, Z_c) , a Lidar coordinate system denoted as (X_L, Y_L, Z_L) , and the image coordinate system denoted as (x, y) .
  • FIG. 8 is a flow chart of an example coordinate system transformation method 800 consistent with the disclosure. As shown in FIG. 8, at 801, the original point set for the object in the 3D point cloud is converted from the global coordinate system to the vehicle body coordinate system to obtain a first conversion point set for the object in the vehicle body coordinate system.
  • converting the original point set for the object in the 3D point cloud from the global coordinate system to the vehicle body coordinate system can include multiplying the original point set for the object by an inverse of the pose of the clean point cloud to obtain the first conversion point set.
  • the pose can include a pose rotation matrix and a pose translation matrix.
  • the conversion equation can be as follows:
  • R_p represents the pose rotation matrix of size 3×3
  • T_p represents the pose translation matrix of size 3×1
  • [x_p, y_p, z_p]^T represents a 3D point from the original point set for the object in the 3D point cloud
  • [x_v, y_v, z_v]^T represents a point from the first conversion point set for the object in the vehicle body coordinate system.
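  • Note: the conversion equation itself appears only as an image in the original publication. Based on the surrounding description (multiplying by an inverse of the pose formed by R_p and T_p), a plausible reconstruction, not necessarily the patent's exact typesetting, is:

```latex
\begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix}
= R_p^{-1}\left( \begin{bmatrix} x_p \\ y_p \\ z_p \end{bmatrix} - T_p \right)
```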
  • the first conversion point set for the object is converted from the vehicle body coordinate system to the camera coordinate system to obtain a second conversion point set for the object in the camera coordinate system.
  • converting the first conversion point set for the object from the vehicle body coordinate system to the camera coordinate system can include multiplying the first conversion point set for the object by an inverse matrix of an extrinsic matrix for the camera capturing the 2D image to obtain the second conversion point set.
  • the extrinsic matrix can be a conversion matrix from the vehicle coordinate system to the camera coordinate system and can include a camera rotation matrix and a camera translation matrix.
  • the conversion equation can be as follows:
  • R_c represents the camera rotation matrix of size 3×3
  • T_c represents the camera translation matrix of size 3×1
  • [x_v, y_v, z_v]^T represents the point from the first conversion point set for the object in the vehicle body coordinate system
  • [x_c, y_c, z_c]^T represents a point from the second conversion point set for the object in the camera coordinate system.
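  • Note: as above, this equation is an image in the original. Applying the inverse of the extrinsic matrix formed by R_c and T_c suggests the following form (a reconstruction, not the patent's original rendering):

```latex
\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}
= R_c^{-1}\left( \begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix} - T_c \right)
```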
  • FIGs. 9A and 9B schematically show an example translation transformation and an example rotation transformation consistent with the disclosure.
  • FIGs. 9A and 9B show the translation transformation and the rotation transformation projected in a 2D coordinate plane.
  • a point P_v can linearly translate to a point P_c through adding a translation vector to the point P_v, which can be expressed as the following equation:
  • T_c represents the camera translation matrix
  • the point P_v can move to the point P_c through rotating by θ degrees in the clockwise direction, which can be expressed as the following equation:
  • R_x represents a rotation matrix around the X_c axis
  • R_y represents a rotation matrix around the Y_c axis
  • R_z represents a rotation matrix around the Z_c axis
  • R_c represents the camera rotation matrix
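  • Note: the translation and rotation equations referenced above are also images in the original. Plausible forms consistent with the text (adding the translation vector T_c, and composing per-axis rotations into R_c) are shown below; the composition order of R_x, R_y, and R_z is an assumption:

```latex
P_c = P_v + T_c, \qquad P_c = R_c\, P_v, \qquad R_c = R_z R_y R_x
```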
  • the extrinsic matrix for the camera can be obtained by pre-calibration.
  • the camera rotation matrix R c and the camera translation matrix T c can be calculated based on the Eqs. 2 to 5.
  • the second conversion point set for the object is projected to the 2D coordinate system to obtain the projection point set for the object according to a projection correlation.
  • [x, y]^T represents the point from the projection point set in the image coordinate system
  • K represents the intrinsic matrix
  • the intrinsic matrix can be expressed as follows:
  • K represents the intrinsic matrix
  • α_x = f·m_x and α_y = f·m_y
  • f represents a focal length of the camera
  • m_x and m_y represent scale factors in the x and y directions (i.e., a number of pixels per unit distance)
  • the off-diagonal entry of K represents a skew parameter between the x and y axes
  • (c_x, c_y) as shown in FIG. 7 represents a principal point of the 2D image, which is ideally in a center of the 2D image.
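  • Note: neither the projection correlation nor the intrinsic matrix is reproduced as an equation in the text above. A standard pinhole-camera form consistent with the entries defined here is given below; the skew symbol s and the scale factor z_c on the left-hand side are notational assumptions:

```latex
z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= K \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}, \qquad
K = \begin{bmatrix} \alpha_x & s & c_x \\ 0 & \alpha_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\alpha_x = f m_x, \quad \alpha_y = f m_y
```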
  • a position point set for the object in the 2D image is obtained by detecting the object in the 2D image.
  • a machine learning method such as a convolutional neural network (CNN) or any suitable deep learning method, can be used to detect the object in the 2D image.
  • the position point set for the object in the 2D image can be determined as pixel points of the 2D image falling in the object. Taking the lane line as an example of the object, a road surface can be first detected in the 2D image, and the lane line can be then detected in the road surface.
  • an accuracy associated with the projection point set is estimated by comparing the projection point set for the object and the position point set for the object.
  • a corresponding position point in the position point set that is closest to the projection point can be found, and a distance between the projection point and the corresponding position point can be calculated.
  • the distance can include a Euclidean distance between the projection point and the corresponding position point, which can be calculated using the following equation:
  • d represents the Euclidean distance
  • p represents a projection point
  • (p_x, p_y) represents coordinates of p in the image coordinate system
  • q represents a position point corresponding to p
  • (q_x, q_y) represents coordinates of q in the image coordinate system.
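  • Note: the distance equation is an image in the original, but from the definitions of p and q it is the standard 2D Euclidean distance:

```latex
d = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2}
```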
  • a weighted average of the distances between the projection points and the corresponding position points can be calculated as the average distance between the projection point set and the position point set.
  • a weight in calculating the weighted average can equal a quotient of an experience value divided by a depth value of the projection point, which can be expressed as:
  • W represents the weight
  • E represents the experience value
  • z_d represents the depth value of the projection point.
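  • Note: written out, "a quotient of an experience value divided by a depth value" corresponds to the weight below; treating the weighted average as the normalized weighted sum is an assumption:

```latex
W_i = \frac{E}{z_{d,i}}, \qquad \bar{d} = \frac{\sum_i W_i\, d_i}{\sum_i W_i}
```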
  • the depth value can equal a distance between a point in the 3D point cloud that corresponds to the projection point and the camera capturing the 2D image.
  • the depth value can be determined according to distance information of the point obtained by the Lidar and a relationship between mounting positions of the Lidar and the camera.
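  • Note: purely as an illustration, the comparison described above (nearest position point per projection point, Euclidean distance, weight E / z_d) could be sketched in Python as follows. The function name, the use of a k-d tree, and the normalization of the weighted average are assumptions, not part of the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def weighted_average_distance(proj_pts, proj_depths, pos_pts, experience=1.0):
    """Compare the projection point set with the detected position point set.

    proj_pts:    (N, 2) projected lane-line points in image coordinates.
    proj_depths: (N,) depth value z_d of each projected point from the camera.
    pos_pts:     (M, 2) pixel points of the lane line detected in the 2D image.
    experience:  the experience value E (an assumed tuning constant).
    """
    tree = cKDTree(pos_pts)
    d, _ = tree.query(proj_pts)                 # Euclidean distance to the closest position point
    w = experience / np.asarray(proj_depths)    # W = E / z_d: distant points weigh less
    return float(np.sum(w * d) / np.sum(w))     # weighted average distance
```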
  • FIG. 10 is a flow chart of another accuracy estimation method 1000 consistent with embodiments of the disclosure. As shown in FIG. 10, on the basis of the method 300, the method 1000 further includes the following processes.
  • an adaptation strategy is determined according to the average distance.
  • A small average distance between the projection point set for the lane and the position point set for the lane may indicate that the point set for the lane obtained from the 3D point cloud is accurate.
  • a large average distance may indicate that the point set for the lane obtained from the 3D point cloud may not be accurate.
  • the adaptation strategy can be determined according to a comparison result of the average distance and a threshold distance. The threshold distance can be determined according to actual needs.
  • the adaptation strategy can include calibrating a mounting position of the camera capturing the 2D image in response to the average distance being larger than the threshold distance. For example, whether the mounting position of the camera is accurate can be checked. As another example, whether an installation of the camera is loose can be checked.
  • the adaptation strategy can include verifying the intrinsic matrix for the camera capturing the 2D image in response to the average distance being larger than the threshold distance. For example, if the comparison results of an entire section of the road are very poor, the calibration of the intrinsic matrix may be abnormal and the intrinsic matrix may need to be recalibrated.
  • the adaptation strategy can include correcting marking of the control points for the object in the 3D point cloud in response to the average distance being larger than the threshold distance. For example, some roads may be manually marked incorrectly, the control points may be insufficient, or a curvature deviation may be too large. The control points for the lane line can be corrected by checking the marking results.
  • the adaptation strategy can include correcting the pose of the point cloud in response to the average distance being larger than the threshold distance. For example, the pose of the point cloud obtained at 501 and 502 may not be accurate, and the pose of the point cloud may need to be corrected.
  • FIG. 11 is a flow chart of another example accuracy estimation method 1100 consistent with the disclosure.
  • An execution entity of the method 1100 can include an accuracy estimation system (e.g., the accuracy estimation system 103 described above) , or a processor of the accuracy estimation system (e.g., the processor 1031 of the accuracy estimation system 103 described above) .
  • the method 1100 further includes the following processes.
  • the accuracy associated with the projection point set is determined according to the average distance.
  • the accuracy associated with the projection point set can be also referred to as “an accuracy of the lane line generated in the 2D bird-view image. ”
  • a large average distance can indicate a low accuracy of the lane generated in the 2D bird-view image, and a small average distance can indicate a high accuracy of the lane generated in the 2D bird-view image. That is, the average distance and the accuracy of the lane can have an inverse relationship.
  • the inverse relationship between the average distance and the accuracy of the lane can be expressed as follows:
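  • Note: the formula for this inverse relationship is an image in the original and cannot be recovered from the text. Purely as an illustrative monotone mapping, and not the patent's actual expression, one could use:

```latex
\text{accuracy} = \frac{1}{1 + \bar{d}}
```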
  • the accuracy associated with the projection point set is output.
  • the accuracy of the lane line can be output in real time.
  • a prompt associated with the accuracy of the lane line can be displayed on, for example, a screen, a steering wheel, an instrument panel, and/or the like, of the vehicle.
  • the prompt can be displayed in response to the accuracy of the lane being lower than a threshold value.
  • the threshold value can be predetermined according to actual needs.
  • the threshold value can be 90%, 95%, or the like.
  • the prompt can include a symbol, color, voice, or vibration for risk warning.
  • the prompt can be generated to notify the driver.
  • different accuracy may correspond to different risk levels, and the different risk levels can have different display manners. For example, there may be a first threshold value and a second threshold value smaller than the first threshold value. If the accuracy is higher than the first threshold value, an indicator light can emit green light indicating a safety level. If the accuracy is smaller than the first threshold value and higher than the second threshold value, the indicator light can emit yellow light indicating that the driver needs to pay attention. If the accuracy is smaller than the second threshold value, the indicator light can emit red light indicating a dangerous level.
  • the accuracy estimation method and system can estimate the accuracy of the lane line that is automatically generated from the 3D point clouds. If the accuracy does not satisfy the predetermined requirement, the lane line can be further manually adjusted. As such, the semi-automatic labeling of HD maps can be realized, thereby increasing the labeling efficiency, reducing the manual workload, and improving the labeling accuracy.
  • the present disclosure further provides a non-transitory computer-readable medium.
  • the non-transitory computer-readable medium can include a random access memory (RAM) , a read only memory, a flash memory, a volatile memory, a hard disk storage, an optical media, or the like.
  • the non-transitory computer-readable medium can store instructions that, when executed, cause a processor to execute an accuracy estimation method consistent with the disclosure, for example, the methods in FIGs. 3 to 5, 8, 10, and 11.
  • the components in the figures associated with the device embodiments can be coupled in a manner different from that shown in the figures as needed. Some components may be omitted and additional components may be added.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

An accuracy estimation method includes obtaining a projection point set for an object in a two-dimensional (2D) image based on a three-dimensional (3D) point cloud corresponding to the 2D image (301), obtaining a position point set for the object in the 2D image by detecting the object in the 2D image (302), and estimating an accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object (303).

Description

ACCURACY ESTIMATION METHOD AND SYSTEM
TECHNICAL FIELD
The present disclosure relates to autonomous driving navigation and, more particularly, to an accuracy estimation method and system.
BACKGROUND
Reliable lane recognition and positioning is a fundamental necessity for autonomous driving navigation. Some conventional technologies use data obtained by sensors (e.g., image data captured by a camera, radar data acquired by a Lidar, and/or the like) to detect lane lines in real time. However, based on the data obtained by sensors, it is difficult to determine information, such as whether a vehicle is on a highway or a city road, how fast the vehicle can drive on a road without a posted speed limit, a curvature of the road ahead, a strength of a Global Positioning System (GPS) signal of the road where the vehicle is located, and the like, in real time.
Other conventional technologies obtain the lane lines based on high-precision electronic maps. The high-precision electronic maps are also referred to as high resolution (HD) maps. Unlike conventional navigation maps providing only road level information, the HD maps provide lane-level information and other road information (e.g., a curvature, heading direction, slope, cross-slope angle, and the like, of the road) , which are important for the safety and comfort of unmanned vehicles.
The lane lines need to be labeled in advance on the HD maps. The conventional labeling methods are highly dependent on manual labor. Even though manual labeling is accurate for the conventional navigation maps formed by two-dimensional (2D) images, it is difficult to achieve high accuracy for the HD maps formed by three-dimensional (3D) point clouds. It is difficult for human eyes to distinguish the lane lines on the HD maps formed by colorless point clouds obtained by a Lidar. As such, a labeling accuracy may be poor and a labeling efficiency may be low. Therefore, a method for estimating an accuracy of the lane line is needed to ensure a reliable lane recognition and positioning.
SUMMARY
In accordance with the present disclosure, there is provided an accuracy estimation method including obtaining a projection point set for an object in a two-dimensional (2D) image based on a three-dimensional (3D) point cloud corresponding to the 2D image, obtaining a position point set for the object in the 2D image by detecting the object in the 2D image, and estimating an accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
Also in accordance with the present disclosure, there is provided an accuracy estimation system including a processor and a memory coupled to the processor. The memory stores instructions that, when executed by the processor, cause the processor to obtain the projection point set for the object in the 2D image based on the 3D point cloud corresponding to the 2D image, obtain the position point set for the object in the 2D image by detecting the object in the 2D image, and estimate the accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
Also in accordance with the present disclosure, there is provided a vehicle including a camera configured to capture the 2D image, a Lidar configured to acquire the 3D point cloud, a processor, and a memory coupled to the camera, the Lidar, and the processor. The memory can store instructions that, when executed by the processor, cause the processor to obtain the projection point set for the object in the 2D image based on the 3D point cloud corresponding to the 2D image, obtain the position point set for the object in the 2D image by detecting the object in the 2D image, and estimate the accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
Also in accordance with the present disclosure, there is provided a non-transitory computer-readable medium storing instructions that, when executed, cause a processor to obtain the projection point set for the object in the 2D image based on the 3D point cloud corresponding to the 2D image, obtain the position point set for the object in the 2D image by detecting the object in the 2D image, and estimate the accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram of a vehicle consistent with embodiments of the disclosure.
FIG. 2 is a schematic block diagram of an accuracy estimation system consistent with embodiments of the disclosure.
FIG. 3 is a flow chart of an accuracy estimation method consistent with embodiments of the disclosure.
FIG. 4 is a flow chart of obtaining an original point set for an object from a three-dimensional (3D) point cloud consistent with embodiments of the disclosure.
FIG. 5 is a flow chart of obtaining a dense point set for an object from a 3D point cloud consistent with embodiments of the disclosure.
FIG. 6 schematically shows lane lines marked in a two-dimensional (2D) bird-view image consistent with the disclosure.
FIG. 7 schematically shows coordinate systems consistent with embodiments of the disclosure.
FIG. 8 is a flow chart of a coordinate system transformation method consistent with embodiments of the disclosure.
FIGs. 9A and 9B schematically show a translation transformation and a rotation transformation consistent with embodiments of the disclosure.
FIG. 10 is a flow chart of another accuracy estimation method consistent with embodiments of the disclosure.
FIG. 11 is a flow chart of another accuracy estimation method consistent  with embodiments of the disclosure.
DESCRIPTION OF THE EMBODIMENTS
Hereinafter, embodiments consistent with the disclosure will be described with reference to the drawings, which are merely examples for illustrative purposes and are not intended to limit the scope of the disclosure. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The present disclosure provides an accuracy estimation method and system that can quantitatively estimate an accuracy of a lane line that is generated from a three-dimensional (3D) point cloud captured by a Lidar. The generated lane line can be projected onto a two-dimensional (2D) image captured by a camera, and the projected lane line can be compared with a 2D lane line detected from the 2D image to determine the accuracy of the lane line. If the accuracy satisfies a predetermined requirement, the generated lane line can be used in a high resolution (HD) map. If the accuracy does not satisfy the predetermined requirement, an adaptation strategy can be determined. As such, a reliable lane recognition and positioning can be ensured.
FIG. 1 is a schematic block diagram of an example vehicle 100 consistent with the disclosure. The vehicle 100 can include a driverless car, a mobile robot, an intelligent electrical vehicle, an unmanned aerial vehicle (UAV) , or any mobile body using the autonomous driving navigation technology. As shown in FIG. 1, the vehicle 100 includes a camera 101 configured to capture the 2D images of the surrounding environment of the vehicle 100, a Lidar 102 configured to acquire the 3D point clouds through beaming laser points at the road, and an accuracy estimation system 103 configured to estimate the accuracy of the lane lines generated from the 3D point clouds. In some embodiments, the vehicle 100 can include a plurality of cameras 101 and a plurality of Lidars 102.
The camera 101 can include a video recorder, a digital camera, an infrared camera, and the like. The camera 101 can be mounted on a front, rear, side,  or any suitable position of the vehicle 100. As such, the lane lines and obstacles (e.g., other vehicles, pedestrians, and the like) on the road can be detected from the 2D images captured by the camera 101. The Lidar 102 can include a 32-channel Lidar, 64-channel Lidar, 128-channel Lidar, and the like. The number of channels of the Lidar 102 can be selected according to actual needs, which will not be limited herein. The Lidar 102 can be mounted on a top, front, side, or any suitable position of the vehicle 100. As such, the 3D point clouds can be acquired by the Lidar 102.
In some embodiments, the camera 101 and the Lidar 102 can be communicatively coupled to the accuracy estimation system 103. Therefore, the camera 101 can send image data to the accuracy estimation system 103 and the Lidar 102 can send the 3D point clouds to the accuracy estimation system 103. In some other embodiments, the camera 101 and the Lidar 102 can be communicatively coupled to the vehicle 100 and the vehicle 100 can be communicatively coupled to the accuracy estimation system 103. The camera 101 can send the image data to the vehicle 100 and the Lidar 102 can send the 3D point clouds to the vehicle 100. The vehicle 100 can further send the image data and the 3D point clouds to the accuracy estimation system 103.
In some embodiments, the vehicle 100 can further include a global navigation satellite system (GNSS) , for example, a Global Positioning System (GPS) or the like, configured to use satellites to detect real-time positions of the vehicle 100 in a global coordinate system. In some embodiments, a real-time kinematic (RTK) positioning can be used to enhance a precision of position data derived from the GNSS. In some embodiments, the vehicle 100 can further include an inertial navigation system (INS) including motion sensors (e.g., accelerometers) and rotation sensors (e.g., gyroscopes) . The INS can be configured to continuously calculate a pose, a velocity, and the like, of the vehicle 100.
In some embodiments, the GNSS and the INS can be communicatively coupled to the accuracy estimation system 103. Therefore, the GNSS can  send the position data of the vehicle 100 to the accuracy estimation system 103 and the INS can send pose data and velocity data of the vehicle 100 to the accuracy estimation system 103. In some other embodiments, the GNSS and the INS can be communicatively coupled to the vehicle 100 and the vehicle 100 can be communicatively coupled to the accuracy estimation system 103. The GNSS can send the position data to the vehicle 100 and the INS can send the pose data and the velocity data to the vehicle 100. The vehicle 100 can further send the position data, pose data, and velocity data of the vehicle 100 to the accuracy estimation system 103.
In some embodiments, the accuracy estimation system 103 can be integrated into the vehicle 100. In some other embodiments, the accuracy estimation system 103 can be an independent system mounted at the vehicle 100. FIG. 2 is a schematic block diagram of the example accuracy estimation system 103 consistent with the disclosure. The accuracy estimation system 103 can be used to estimate the accuracy of the lane line generated from the 3D point clouds.
As shown in FIG. 2, the accuracy estimation system 103 includes a processor 1031 and a memory 1032 coupled to the processor 1031. The processor 1031 may be any suitable hardware processor, such as a microprocessor, a micro-controller, a central processing unit (CPU) , a network processor (NP) , a digital signal processor (DSP) , an application specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , or another programmable logic device, discrete gate or transistor logic device, and discrete hardware component. The memory 1032 may include a non-transitory computer-readable storage medium, such as a random access memory (RAM) , a read only memory, a flash memory, a volatile memory, a hard disk storage, or an optical media. The memory 1032 may store computer program instructions and/or data, e.g., the 2D images, the 3D point clouds, the position data, the pose data, and the velocity data of the vehicle 100, and/or the like. The processor 1031 can be configured to execute the  computer program instructions that are stored in the memory 1032, to perform the accuracy estimation method consistent with the disclosure, such as one of the example methods described below. In some embodiments, the accuracy estimation system 103 can further include a communication device. The communication device can be configured to communicate with the vehicle 100, the camera 101, and/or the Lidar 102. In some embodiments, the communication device can be further configured to communicate with the GNSS and the INS.
FIG. 3 is a flow chart of an example accuracy estimation method 300 consistent with the disclosure. An execution entity of the method 300 can include an accuracy estimation system (e.g., the accuracy estimation system 103 described above) , or a processor of the accuracy estimation system (e.g., the processor 1031 of the accuracy estimation system 103 described above) .
As shown in FIG. 3, at 301, a projection point set for an object in a 2D image is obtained based on a 3D point cloud corresponding to the 2D image. The 2D image can be obtained by a camera, for example, the camera 101 described above. In some embodiments, an original point set for the object in a 3D global coordinate system of the 3D point cloud can be obtained, and the projection point set can be obtained according to the original point set for the object. The 3D point cloud can be acquired by a Lidar, such as the Lidar 102 described above. The 3D global coordinate system of the 3D point cloud can be simply referred to as the “global coordinate system. ” The global coordinate system can include a world coordinate system having a direction of true north as an X-axis, a direction of true east as a Y-axis, and a direction toward a center of the earth as a Z-axis. For example, the object can include a lane line, and the original point set for the lane line can include a plurality of control points of the lane line. In some other embodiments, the object can include a vehicle, a pedestrian, or any object that can be detected by the Lidar and the camera.
FIG. 4 is a flow chart of obtaining the original point set for the object from the 3D point cloud consistent with the disclosure. As shown in FIG. 4, at 401, the 3D point cloud is acquired by the Lidar. A driver can drive a collection vehicle, e.g., the vehicle 100 described above, to a scene where an HD map needs to be constructed. The 3D point cloud can be acquired by the Lidar (e.g., the Lidar 102 described above) mounted on the collection vehicle. In some embodiments, the 3D point cloud of a same scene may be acquired more than once. There are many other vehicles and pedestrians on the road or on the side of the road, which may occlude some areas of the road. Therefore, multiple collections can be performed to ensure that most of the road can be covered. There is a minimum speed limit on highways, and hence, the driver cannot drive too slowly. Therefore, multiple collections may be needed to acquire enough points per unit area to ensure that details of the road can be obtained. For example, the driver can drive the collection vehicle on the road for 3 to 5 times at different times. As such, a same scene may correspond to multiple point clouds acquired at different times.
At 402, a dense point cloud is obtained according to the 3D point cloud. Assume the object includes the lane line. If the number of channels of the Lidar is not high enough, the clean point cloud may be sparse. A sparse point cloud may not be sufficient for lane line positioning. FIG. 5 is a flow chart of obtaining the dense point set for the object from the 3D point cloud consistent with the disclosure.
As shown in FIG. 5, at 501, a pose of a clean point cloud in the global coordinate system is calculated. In some embodiments, the 3D point cloud can be aligned with navigation data. For example, the navigation data can include data obtained by a combination of a GNSS (e.g., the GNSS in vehicle 100 described above) and a RTK (e.g., the RTK in vehicle 100 described above) . Since the 3D point cloud is obtained by the Lidar and the navigation data is obtained by the combination of the GNSS and the RTK, an alignment of the 3D point cloud and the navigation data on a physical time axis is needed to  ensure a one-to-one correspondence of the 3D point cloud and the navigation data. For example, the one-to-one correspondence can be performed according to a global time stamp of each data point stored with the 3D point cloud and the navigation data. The global time stamp can be obtained according to a high-precision clock provided by the GNSS (e.g., a GPS) .
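A minimal sketch of this time alignment, assuming each Lidar frame and each navigation record carries a global time stamp (the function and variable names here are illustrative, not from the patent), might look like the following:

```python
import numpy as np

def align_to_navigation(frame_stamps, nav_stamps):
    """Match each Lidar frame to the navigation record with the closest
    global time stamp, giving a one-to-one correspondence on the time axis."""
    frame_stamps = np.asarray(frame_stamps, dtype=float)
    nav_stamps = np.asarray(nav_stamps, dtype=float)
    order = np.argsort(nav_stamps)
    sorted_nav = nav_stamps[order]
    idx = np.clip(np.searchsorted(sorted_nav, frame_stamps), 1, len(sorted_nav) - 1)
    # Pick whichever neighbor (left or right) is closer in time.
    left, right = sorted_nav[idx - 1], sorted_nav[idx]
    pick_left = np.abs(frame_stamps - left) < np.abs(frame_stamps - right)
    return order[np.where(pick_left, idx - 1, idx)]   # indices into nav_stamps
```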
In some embodiments, a preprocessing can be performed on the 3D point cloud to obtain the clean point cloud. In some embodiments, the preprocessing can include deleting data points corresponding to obstacles from the 3D point cloud. The 3D point cloud acquired by the Lidar may include the obstacles on the road, for example, vehicles, pedestrians, street lights, and the like, which can have an adverse effect on the lane line positioning from the 3D point cloud. The obstacles can be detected from the 3D point cloud using any suitable machine learning method, such as a deep learning network (e.g., an unsupervised pre-trained network, a convolutional neural network, a recurrent neural network, a recursive neural network, or the like) , and then be deleted from the 3D point cloud to obtain the clean point cloud.
In some embodiments, the pose of the clean point cloud can be calculated using graph-based optimization. For example, each frame obtained by the Lidar can include one clean point cloud. Rough poses of all clean point clouds obtained by the combination of the GNSS and the RTK can be used as vertexes of a graph. Relative pose relationships between frames can be used as edges of the graph.
In some embodiments, the relative pose relationships between frames can include relative pose relationships between temporal neighbor frames. All clean point clouds can be traversed according to a time sequence, and an inter-frame registration in the time sequence can be performed to obtain the relative pose relationships between the temporal neighbor frames. In some embodiments, performing the inter-frame registration can include selecting points from each clean point cloud using a normal space sampling method (NSS) , and performing iterative closest point (ICP) on the selected points of  the temporal neighbor frames to obtain the relative pose relationships between the temporal neighbor frames.
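A compact sketch of this registration step is given below: points are bucketed by the octant of their normal vector as a simple stand-in for normal space sampling, and a basic point-to-point ICP with a closed-form (Kabsch) update estimates the relative pose. It assumes NumPy and SciPy are available and that per-point normals have already been estimated; it simplifies the NSS and ICP variants an actual implementation might use.

```python
import numpy as np
from scipy.spatial import cKDTree

def normal_space_sample(points, normals, n_samples, n_bins=8):
    """Bucket points by the octant of their normals and sample each bucket evenly."""
    octant = (normals[:, 0] > 0).astype(int) * 4 + (normals[:, 1] > 0) * 2 + (normals[:, 2] > 0)
    per_bin, chosen = max(1, n_samples // n_bins), []
    for b in range(n_bins):
        idx = np.flatnonzero(octant == b)
        if idx.size:
            chosen.append(np.random.choice(idx, min(per_bin, idx.size), replace=False))
    return points[np.concatenate(chosen)]

def best_fit_transform(A, B):
    """Closed-form rigid transform (R, t) minimizing ||R A + t - B||."""
    ca, cb = A.mean(0), B.mean(0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                                   # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=30):
    """Point-to-point ICP; returns the relative pose mapping src into dst."""
    R_total, t_total, cur = np.eye(3), np.zeros(3), src.copy()
    tree = cKDTree(dst)
    for _ in range(iters):
        _, nn = tree.query(cur)                                # nearest dst point per src point
        R, t = best_fit_transform(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```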
In some embodiments, the relative pose relationships between frames can include relative pose relationships between spatial proximity point clouds. The spatial proximity point clouds can refer to the clean point clouds from multiple frames that are close in space but not continuous in time sequence, for example, the clean point clouds of a same position acquired at different times. For example, according to global position information provided by the GNSS, the point cloud corresponding to each frame can be determined, and the iterative closest point (ICP) can be performed on the spatial proximity point clouds to obtain the relative pose relationships between the spatial proximity point clouds.
After obtaining the vertexes and edges of the graph, the pose of the clean point cloud can be obtained by solving an objective function of the graph. For example, the objective function can include minimizing a sum of residuals of the vertexes of the graph. Herein, the pose of the clean point cloud can be also referred to as a pose of the 3D point cloud.
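The sketch below illustrates the idea on a planar (x, y, yaw) simplification: the GNSS/RTK poses serve as the initial vertices and as weak priors, the registration results serve as edge constraints, and a least-squares solver refines the vertices. The parameterization, the prior weight, and the function names are assumptions for illustration; a production system would typically optimize full 6-DoF poses with a dedicated pose-graph library.

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose(pose_i, pose_j):
    """Pose of vertex j expressed in the frame of vertex i (x, y, yaw)."""
    xi, yi, ti = pose_i
    xj, yj, tj = pose_j
    c, s = np.cos(ti), np.sin(ti)
    dx, dy = xj - xi, yj - yi
    return np.array([c * dx + s * dy, -s * dx + c * dy, tj - ti])   # angle wrapping omitted

def residuals(flat_poses, edges, priors, prior_weight=0.1):
    poses = flat_poses.reshape(-1, 3)
    res = [relative_pose(poses[i], poses[j]) - meas for i, j, meas in edges]
    res.append(prior_weight * (poses - priors).ravel())             # stay near GNSS/RTK priors
    return np.concatenate([np.ravel(r) for r in res])

def optimize_pose_graph(priors, edges):
    """priors: (N, 3) rough poses; edges: list of (i, j, measured_relative_pose)."""
    sol = least_squares(residuals, priors.ravel(), args=(edges, priors))
    return sol.x.reshape(-1, 3)
```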
At 502, a point registration is performed on the clean point clouds within a distance range. In some embodiments, performing the point registration on the clean point clouds within the distance range can include performing the graph-based optimization on a graph formed by the poses of the clean point clouds, obtained at 501, within the distance range. The distance range can be determined according to actual needs, for example, the distance range can be set as 100 meters, 200 meters, or the like. For a road having few features on both sides (e.g., a highway), the pose obtained at 501 may not be accurate. Therefore, performing the graph-based optimization again on the graph formed by the poses of the clean point clouds can increase the accuracy of the poses of the clean point clouds. The processes of performing the graph-based optimization are similar to the processes described at 501, and detailed description is omitted herein.
At 503, a point cloud stacking is performed on the clean point cloud to obtain the dense point cloud. In some embodiments, clean point clouds obtained at a previous moment and a next moment can be projected to a current moment. The clean point clouds obtained at the previous moment and the next moment can be projected to the current moment according to a motion state estimation of the vehicle (e.g., a moving distance, a change of a pose of the vehicle, and the like) . The clean point clouds obtained at the previous moment can include the clean point clouds obtained at one or more previous frames, and the clean point clouds obtained at the next moment can include the clean point clouds obtained at one or more next frames.
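A minimal version of this stacking step, assuming each frame's global pose is available as a 4x4 matrix, is sketched below; the window size and data layout are illustrative only.

```python
import numpy as np

def stack_frames(frames, poses, center_idx, window=2):
    """Stack clouds from `window` frames before and after the current frame into
    the current frame. frames[i]: (N_i, 3) points; poses[i]: 4x4 global pose."""
    T_cur_inv = np.linalg.inv(poses[center_idx])
    lo, hi = max(0, center_idx - window), min(len(frames), center_idx + window + 1)
    stacked = []
    for i in range(lo, hi):
        T = T_cur_inv @ poses[i]                          # frame i -> current frame
        pts_h = np.hstack([frames[i], np.ones((len(frames[i]), 1))])
        stacked.append((pts_h @ T.T)[:, :3])
    return np.vstack(stacked)
```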
Referring again to FIG. 4, at 403, the dense point cloud is projected from the global coordinate system to a bird-view coordinate system to obtain a 2D bird-view image. For each point in the dense point cloud, a corresponding 2D bird-view point can be obtained by ignoring the coordinate value of the point in the height direction (i.e., the Z axis) and retaining the coordinate values of the point in the horizontal and vertical directions (i.e., the X and Y axes). For example, assume that the coordinate of a point p in the global coordinate system is (x_p, y_p, z_p). When projecting the point p to the bird-view coordinate system, the coordinate of the corresponding 2D bird-view point p_b can be (x_p, y_p).
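Dropping the height coordinate and rasterizing the remaining (x, y) positions into an image grid can be done as follows; the 0.1 m cell size is an assumed value, not one taken from the disclosure.

```python
import numpy as np

def to_bird_view(points, resolution=0.1):
    """Project a point cloud to a 2D bird-view occupancy image by ignoring Z."""
    xy = points[:, :2]                                    # keep x and y, drop the height
    origin = xy.min(axis=0)
    cells = np.floor((xy - origin) / resolution).astype(int)
    image = np.zeros(cells.max(axis=0) + 1, dtype=np.uint8)
    image[cells[:, 0], cells[:, 1]] = 255
    return image, origin
```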
In some embodiments, if the number of channels of the Lidar is high enough, the processes at 503 can be omitted, and the clean point cloud can be projected from the global coordinate system to the bird-view coordinate system to obtain the 2D bird-view image at 403.
At 404, the object is marked in the 2D bird-view image. Assume the object includes the lane line. FIG. 6 schematically shows example lane lines marked in the 2D bird-view image consistent with the disclosure. In some embodiments, the plurality of control points (denoted as yellow points in FIG. 6) can be marked on the 2D bird-view image by using marking software, and the lane lines can be drawn on the 2D bird-view image. The plurality of control points can be selected from the 2D bird-view points at a regular interval (e.g., 0.5 m, 1 m, or the like) in the 2D bird-view image. The original point set for the lane line can be determined as the plurality of control points.
In some embodiments, obtaining the projection point set for the object in the 2D image can further include projecting the original point set for the object in the 3D point cloud from the global coordinate system to a 2D coordinate system of the 2D image. The 2D coordinate system of the 2D image can be simply referred to as an "image coordinate system." Since the global coordinate system is different from the image coordinate system, projecting the original point set for the object in the 3D point cloud from the global coordinate system to the image coordinate system can be accomplished by performing coordinate system transformations from the global coordinate system to the image coordinate system.
FIG. 7 schematically shows example coordinate systems consistent with the disclosure. Taking the accuracy estimation system mounted on a vehicle (e.g., the vehicle 100 described above) as an example, as shown in FIG. 7, the coordinate systems include the global coordinate system denoted as (X, Y, Z), a vehicle body coordinate system denoted as (X_v, Y_v, Z_v), a camera coordinate system denoted as (X_c, Y_c, Z_c), a Lidar coordinate system denoted as (X_L, Y_L, Z_L), and the image coordinate system denoted as (x, y).
FIG. 8 is a flow chart of an example coordinate system transformation method 800 consistent with the disclosure. As shown in FIG. 8, at 801, the original point set for the object in the 3D point cloud is converted from the global coordinate system to the vehicle body coordinate system to obtain a first conversion point set for the object in the vehicle body coordinate system.
In some embodiments, converting the original point set for the object in the 3D point cloud from the global coordinate system to the vehicle body coordinate system can include multiplying the original point set for the object by an inverse of the pose of the clean point cloud to obtain the first conversion point set. The pose can include a pose rotation matrix and a pose translation matrix.
The conversion equation can be as follows:
$$\begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix} = R_p^{-1} \left( \begin{bmatrix} x_p \\ y_p \\ z_p \end{bmatrix} - T_p \right) \qquad \text{(Eq. 1)}$$
wherein R_p represents the pose rotation matrix of size 3×3, T_p represents the pose translation matrix of size 3×1, [x_p, y_p, z_p]^T represents a 3D point from the original point set for the object in the 3D point cloud, and [x_v, y_v, z_v]^T represents a point from the first conversion point set for the object in the vehicle body coordinate system.
At 802, the first conversion point set for the object is converted from the vehicle body coordinate system to the camera coordinate system to obtain a second conversion point set for the object in the camera coordinate system. In some embodiments, converting the first conversion point set for the object from the vehicle body coordinate system to the camera coordinate system can include multiplying the first conversion point set for the object by an inverse matrix of an extrinsic matrix for the camera capturing the 2D image to obtain the second conversion point set. The extrinsic matrix can be a conversion matrix from the vehicle body coordinate system to the camera coordinate system and can include a camera rotation matrix and a camera translation matrix. The conversion equation can be as follows:
$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R_c \begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix} + T_c \qquad \text{(Eq. 2)}$$
wherein R_c represents the camera rotation matrix of size 3×3, T_c represents the camera translation matrix of size 3×1, [x_v, y_v, z_v]^T represents the point from the first conversion point set for the object in the vehicle body coordinate system, and [x_c, y_c, z_c]^T represents a point from the second conversion point set for the object in the camera coordinate system.
FIGs. 9A and 9B schematically show an example translation transformation and an example rotation transformation consistent with the disclosure. To simplify the graphic display, FIGs. 9A and 9B show the translation transformation and the rotation transformation projected onto a 2D coordinate plane. As shown in FIG. 9A, a point P_v can be linearly translated to a point P_c by adding a translation vector T_c to the point P_v, which can be expressed as the following equation:
$$P_c = P_v + T_c \qquad \text{(Eq. 3)}$$
wherein T_c represents the camera translation matrix.
As shown in FIG. 9B, the point P_v can be moved to the point P_c by rotating θ degrees in the clockwise direction, which can be expressed as the following equation:
$$P_c = R_c P_v \qquad \text{(Eq. 4)}$$
wherein:
$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix}, \quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix}, \quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad R_c = R_z R_y R_x \qquad \text{(Eq. 5)}$$
wherein R_x represents a rotation matrix around the X_c axis, R_y represents a rotation matrix around the Y_c axis, R_z represents a rotation matrix around the Z_c axis, and R_c represents the camera rotation matrix.
In some embodiments, the extrinsic matrix for the camera can be obtained by pre-calibration. When a plurality of calibration points in the vehicle body coordinate system and a plurality of corresponding points in the camera coordinate system are obtained, the camera rotation matrix R_c and the camera translation matrix T_c can be calculated based on Eqs. 2 to 5.
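One common closed-form way to recover such an extrinsic from point correspondences is the Kabsch/Umeyama solution sketched below; it is offered only as an illustration of the calibration idea and is not necessarily the procedure used in the original filing.

```python
import numpy as np

def calibrate_extrinsics(pts_vehicle, pts_camera):
    """Estimate (R_c, T_c) with X_c = R_c @ X_v + T_c from N >= 3 non-collinear
    calibration point pairs measured in both coordinate systems."""
    cv, cc = pts_vehicle.mean(0), pts_camera.mean(0)
    U, _, Vt = np.linalg.svd((pts_vehicle - cv).T @ (pts_camera - cc))
    R_c = Vt.T @ U.T
    if np.linalg.det(R_c) < 0:                            # avoid a reflection solution
        Vt[-1] *= -1
        R_c = Vt.T @ U.T
    return R_c, cc - R_c @ cv
```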
Referring again to FIG. 8, at 803, the second conversion point set for the object is projected to the 2D coordinate system to obtain the projection point set for the object according to a projection correlation. In some embodiments, the projection correlation can include an intrinsic matrix for the camera capturing the 2D image. Projecting the second conversion point set for the object to the 2D coordinate system can include multiplying the second conversion point set for the object by the intrinsic matrix for the camera capturing the 2D image to obtain the projection point set for the object, which can be expressed as the following equation:
$$z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} \qquad \text{(Eq. 6)}$$
wherein [x, y]^T represents the point from the projection point set in the image coordinate system, and K represents the intrinsic matrix.
The intrinsic matrix can be expressed as follows:
$$K = \begin{bmatrix} \alpha_x & \gamma & c_x \\ 0 & \alpha_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{(Eq. 7)}$$
wherein K represents the intrinsic matrix, α_x = f·m_x, α_y = f·m_y, f represents a focal length of the camera, m_x and m_y represent scale factors in the x and y directions (i.e., a number of pixels per unit distance), γ represents a skew parameter between the x and y axes, and (c_x, c_y), as shown in FIG. 7, represents a principal point of the 2D image, which is ideally at the center of the 2D image.
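Putting Eqs. 1 to 7 together, the whole chain from a global 3D control point to an image pixel can be sketched as below. The row-vector conventions and argument names are assumptions; the exact matrix conventions in an actual implementation may differ.

```python
import numpy as np

def project_points(points_global, R_p, T_p, R_c, T_c, K):
    """Project (N, 3) global points into the image.
    R_p, T_p: pose of the point cloud (vehicle -> global);
    R_c, T_c: camera extrinsics (vehicle -> camera); K: 3x3 intrinsic matrix."""
    X_v = (points_global - T_p) @ R_p                     # global -> vehicle: R_p^T (X - T_p)
    X_c = X_v @ R_c.T + T_c                               # vehicle -> camera: R_c X_v + T_c
    uv = X_c @ K.T                                        # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]                         # divide by depth z_c to get pixels
```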
Referring again to FIG. 3, at 302, a position point set for the object in the 2D image is obtained by detecting the object in the 2D image. For example, a machine learning method, such as a convolutional neural network (CNN) or any suitable deep learning method, can be used to detect the object in the 2D image. The position point set for the object in the 2D image can be determined as pixel points of the 2D image falling in the object. Taking the lane line as an example of the object, a road surface can first be detected in the 2D image, and the lane line can then be detected in the road surface.
At 303, an accuracy associated with the projection point set is estimated by comparing the projection point set for the object and the position point set for the object. In some embodiments, for each projection point in the projection point set, a corresponding position point in the position point set that is closest to the projection point can be found, and a distance between the projection point and the corresponding position point can be calculated. For example, the distance can include a Euclidean distance between the projection point and the corresponding position point, which can be calculated using the following equation:
$$d = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \qquad \text{(Eq. 8)}$$
wherein d represents the Euclidean distance, p represents a projection point, (p_x, p_y) represents coordinates of p in the image coordinate system, q represents a position point corresponding to p, and (q_x, q_y) represents coordinates of q in the image coordinate system.
In some embodiments, a weighted average of the distances between the projection points and the corresponding position points can be calculated as the average distance between the projection point set and the position point set. A weight in calculating the weighted average can equal a quotient of an experience value divided by a depth value of the projection point, which can be expressed as:
$$W = E / z_d \qquad \text{(Eq. 9)}$$
wherein W represents the weight, E represents the experience value, and z_d represents the depth value of the projection point.
The depth value can equal a distance between a point in the 3D point cloud that corresponds to the projection point and the camera capturing the 2D image. In some embodiments, the depth value can be determined according  to distance information of the point obtained by the Lidar and a relationship between mounting positions of the Lidar and the camera.
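Under these definitions, the comparison at 303 reduces to a nearest-neighbour search followed by a weighted mean, for example as sketched below (NumPy/SciPy assumed; E defaults to 1.0 purely for illustration).

```python
import numpy as np
from scipy.spatial import cKDTree

def weighted_average_distance(proj_pts, pos_pts, depths, E=1.0):
    """For each projected point, find the closest detected position point (Eq. 8)
    and average the distances with weights W = E / z_d (Eq. 9)."""
    d, _ = cKDTree(pos_pts).query(proj_pts)               # nearest-neighbour pixel distances
    w = E / np.asarray(depths)                            # distant points contribute less
    return float(np.sum(w * d) / np.sum(w))
```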
FIG. 10 is a flow chart of another accuracy estimation method 1000 consistent with embodiments of the disclosure. As shown in FIG. 10, on the basis of the method 300, the method 1000 further includes the following processes.
At 304, an adaptation strategy is determined according to the average distance. A small average distance between the projection point set for the lane and the position point set for the lane may indicate that the point set for the lane obtained from the 3D point cloud is accurate. A large average distance may indicate that the point set for the lane obtained from the 3D point cloud may not be accurate. In some embodiments, the adaptation strategy can be determined according to a comparison result of the average distance and a threshold distance. The threshold distance can be determined according to actual needs.
In some embodiments, the adaptation strategy can include calibrating a mounting position of the camera capturing the 2D image in response to the average distance being larger than the threshold distance. For example, whether the mounting position of the camera is accurate can be checked. As another example, whether the installation of the camera is loose can be checked.
In some embodiments, the adaptation strategy can include verifying the intrinsic matrix for the camera capturing the 2D image in response to the average distance being larger than the threshold distance. For example, if the comparison results for an entire section of the road are very poor, the calibration of the intrinsic matrix may be abnormal and the intrinsic matrix may need to be recalibrated.
In some embodiments, the adaptation strategy can include correcting the marking of the control points for the object in the 3D point cloud in response to the average distance being larger than the threshold distance. For example, some roads may be manually marked incorrectly, the control points may be insufficient, or a curvature deviation may be too large. The control points for the lane line can be corrected by checking the marking results.
In some embodiments, the adaptation strategy can include correcting the pose of the point cloud in response to the average distance being larger than the threshold distance. For example, the pose of the point cloud obtained at 501 and 502 may not be accurate, and the pose of the point cloud may need to be corrected.
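A trivial way to wire these checks together is a threshold comparison that returns the list of follow-up actions, as in the sketch below; the action names are placeholders, not APIs from the disclosure.

```python
def choose_adaptation_strategy(avg_distance, threshold):
    """Return follow-up actions when the average distance exceeds the threshold."""
    if avg_distance <= threshold:
        return []
    return [
        "check camera mounting and recalibrate extrinsics",
        "verify camera intrinsic matrix",
        "review control-point marking",
        "re-estimate point cloud pose",
    ]
```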
FIG. 11 is a flow chart of another example accuracy estimation method 1100 consistent with the disclosure. An execution entity of the method 1100 can include an accuracy estimation system (e.g., the accuracy estimation system 103 described above) , or a processor of the accuracy estimation system (e.g., the processor 1031 of the accuracy estimation system 103 described above) . On the basis of the method 1000 in FIG. 10, the method 1100 further includes the following processes.
At 305, the accuracy associated with the projection point set is determined according to the average distance. The accuracy associated with the projection point set can also be referred to as "an accuracy of the lane line generated in the 2D bird-view image." A large average distance can indicate a low accuracy of the lane line generated in the 2D bird-view image, and a small average distance can indicate a high accuracy of the lane line generated in the 2D bird-view image. That is, the average distance and the accuracy of the lane line can have an inverse relationship. For example, the inverse relationship between the average distance and the accuracy of the lane line can be expressed as follows:
(Eq. 10 — the accuracy expressed as a decreasing function of the average distance.)
At 306, the accuracy associated with the projection point set is output. In some embodiments, the accuracy of the lane line can be output in real time. In some other embodiments, a prompt associated with the accuracy of the lane  line can be displayed on, for example, a screen, a steering wheel, an instrument panel, and/or the like, of the vehicle.
In some embodiments, the prompt can be displayed in response to the accuracy of the lane being lower than a threshold value. The threshold value can be predetermined according to actual needs. For example, the threshold value can be 90%, 95%, or the like. In some embodiments, the prompt can include a symbol, color, voice, or vibration for risk warning. For example, when the accuracy is lower than the threshold value, the prompt can be generated to notify the driver. In some embodiments, different accuracy values may correspond to different risk levels, and the different risk levels can have different display manners. For example, there may be a first threshold value and a second threshold value smaller than the first threshold value. If the accuracy is higher than the first threshold value, an indicator light can emit green light indicating a safety level. If the accuracy is smaller than the first threshold value and higher than the second threshold value, the indicator light can emit yellow light indicating that the driver needs to pay attention. If the accuracy is smaller than the second threshold value, the indicator light can emit red light indicating a dangerous level.
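The two-threshold scheme described above maps directly to a small helper such as the one below; the 0.95 and 0.90 thresholds are example values only.

```python
def risk_level(accuracy, high=0.95, low=0.90):
    """Green above `high`, yellow between `low` and `high`, red below `low`."""
    if accuracy >= high:
        return "green"    # safe
    if accuracy >= low:
        return "yellow"   # driver attention required
    return "red"          # dangerous
```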
Consistent with the disclosure, the accuracy estimation method and system can estimate the accuracy of the lane line that is automatically generated from the 3D point clouds. If the accuracy does not satisfy the predetermined requirement, the lane line can be further adjusted manually. As such, semi-automatic labeling of HD maps can be realized, thereby increasing the labeling efficiency, reducing the manual workload, and improving the labeling accuracy.
The present disclosure further provides a non-transitory computer-readable medium. The non-transitory computer-readable medium can include a random access memory (RAM), a read-only memory, a flash memory, a volatile memory, a hard disk storage, an optical medium, or the like. The non-transitory computer-readable medium can store instructions that, when executed, cause a processor to execute an accuracy estimation method consistent with the disclosure, for example, the methods in FIGs. 3 to 5, 8, 10, and 11.
The processes shown in the figures associated with the method embodiments can be executed or performed in any suitable order or sequence, which is not limited to the order and sequence shown in the figures and described above. For example, two consecutive processes may be executed substantially simultaneously where appropriate or in parallel to reduce latency and processing times, or be executed in an order reversed to that shown in the figures, depending on the functionality involved.
Further, the components in the figures associated with the device embodiments can be coupled in a manner different from that shown in the figures as needed. Some components may be omitted and additional components may be added.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. It is intended that the specification and examples be considered as exemplary only and not to limit the scope of the disclosure, with a true scope and spirit of the invention being indicated by the following claims.

Claims (104)

  1. An accuracy estimation method comprising:
    obtaining a projection point set for an object in a two-dimensional (2D) image based on a three-dimensional (3D) point cloud corresponding to the 2D image;
    obtaining a position point set for the object in the 2D image by detecting the object in the 2D image; and
    estimating an accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  2. The method of claim 1, wherein obtaining the projection point set for the object in the 2D image includes:
    projecting an original point set for the object in the 3D point cloud from a 3D global coordinate system of the 3D point cloud to a 2D coordinate system of the 2D image.
  3. The method of claim 2, wherein projecting the original point set for the object in the 3D point cloud from the 3D global coordinate system to the 2D coordinate system includes:
    converting the original point set for the object in the 3D point cloud from the global coordinate system to a vehicle body coordinate system to obtain a first conversion point set for the object in the vehicle body coordinate system;
    converting the first conversion point set for the object from the vehicle body coordinate system to a camera coordinate system to obtain a second conversion point set for the object in the camera coordinate system; and
    projecting the second conversion point set for the object to the 2D coordinate system to obtain the projection point set for the object  according to a projection correlation.
  4. The method of claim 3, wherein converting the original point set for the object in the 3D point cloud from the global coordinate system to the vehicle body coordinate system to obtain the first conversion point set for the object in the vehicle body coordinate system includes:
    obtaining a pose of the 3D point cloud, the pose including a pose rotation matrix and a pose translation matrix; and
    obtaining the first conversion point set according to the pose of the 3D point cloud.
  5. The method of claim 4, wherein obtaining the first conversion point set according to the pose of the 3D point cloud includes:
    multiplying the original point set for the object by an inverse of the pose.
  6. The method of claim 3, wherein converting the first conversion point set for the object from the vehicle body coordinate system to the camera coordinate system includes:
    multiplying the first conversion point set for the object by an inverse matrix of an extrinsic matrix for a camera capturing the 2D image, the extrinsic matrix including a camera rotation matrix and a camera translation matrix.
  7. The method of claim 3, wherein projecting the second conversion point set for the object to the 2D coordinate system includes:
    multiplying the second conversion point set for the object by an intrinsic matrix for a camera capturing the 2D image to obtain the projection point set for the object.
  8. The method of claim 1, wherein obtaining the projection point set for the object in the 2D image further includes:
    obtaining an original point set for the object in a 3D global coordinate system of the 3D point cloud; and
    obtaining the projection point set according to the original point set  for the object.
  9. The method of claim 8, wherein obtaining the original point set for the object includes:
    acquiring the 3D point cloud by a Lidar; and
    marking the object in a 2D bird-view image according to the 3D point cloud.
  10. The method of claim 9, wherein marking the object in the 2D bird-view image according to the 3D point cloud includes:
    obtaining a dense point cloud according to the 3D point cloud;
    projecting the dense point cloud from the global coordinate system to the bird-view coordinate system to obtain the 2D bird-view image; and
    marking the object in the 2D bird-view image.
  11. The method of claim 10, wherein obtaining the dense point cloud according to the 3D point cloud includes:
    calculating a pose of a clean point cloud in the global coordinate system;
    performing a point registration on the clean point cloud within a distance range; and
    performing a point cloud stacking on the clean point cloud to obtain the dense point cloud.
  12. The method of claim 11, wherein calculating the pose of the clean point cloud in the global coordinate system includes:
    aligning the 3D point cloud with navigation data; and
    performing a preprocessing on the 3D point cloud to obtain the clean point cloud.
  13. The method of claim 1, wherein the object includes a lane line.
  14. The method of claim 1, wherein obtaining the position point set for the object includes:
    detecting the object in the 2D image using a deep learning neural network or a clustering classification method.
  15. The method of claim 1, wherein estimating the accuracy associated with the projection point set includes:
    for each projection point in the projection point set:
    finding a corresponding position point in the position point set that is closest to the projection point; and
    calculating a distance between the projection point and the corresponding position point;
    calculating a weighted average of the distances between the projection points and the corresponding position points as an average distance between the projection point set and the position point set.
  16. The method of claim 15, wherein for the distance between the projection point and the corresponding position point, a weight in calculating the weighted average equals a quotient of an experience value divided by a depth value of the projection point.
  17. The method of claim 16, wherein the depth value equals a distance between a point in the 3D point cloud that corresponds to the projection point and a camera capturing the 2D image.
  18. The method of claim 15, further comprising:
    determining an adaptation strategy according to the average distance.
  19. The method of claim 18, wherein determining the adaption strategy according to the average distance includes:
    determining the adaptation strategy according to a comparison result of the average distance and a threshold distance.
  20. The method of claim 19, wherein determining the adaptation strategy includes:
    calibrating a mounting position of a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  21. The method of claim 19, wherein determining the adaptation strategy includes:
    verifying an intrinsic matrix for a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  22. The method of claim 19, wherein determining the adaptation strategy includes:
    correcting marking of control points for the object in the 3D point cloud in response to the average distance being larger than the threshold distance.
  23. The method of claim 19, wherein determining the adaptation strategy includes:
    correcting a pose of the 3D point cloud in response to the average distance being larger than the threshold distance.
  24. The method of claim 15, further comprising:
    determining the accuracy associated with the projection point set according to the average distance.
  25. The method of claim 24, further comprising:
    outputting the accuracy associated with the projection point set.
  26. The method of claim 25, wherein outputting the accuracy includes:
    displaying a prompt in response to the accuracy being lower than a threshold value.
  27. An accuracy estimation system comprising:
    a processor; and
    a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to:
    obtain a projection point set for an object in a two-dimensional (2D) image based on a three-dimensional (3D) point cloud corresponding to the 2D image;
    obtain a position point set for the object in the 2D image by detecting the object in the 2D image; and
    estimate an accuracy associated with the projection point set by  comparing the projection point set for the object and the position point set for the object.
  28. The system of claim 27, wherein the instructions further cause the processor to:
    project an original point set for the object in the 3D point cloud from a 3D global coordinate system of the 3D point cloud to a 2D coordinate system of the 2D image.
  29. The system of claim 28, wherein the instructions further cause the processor to:
    convert the original point set for the object in the 3D point cloud from the global coordinate system to a vehicle body coordinate system to obtain a first conversion point set for the object in the vehicle body coordinate system;
    convert the first conversion point set for the object from the vehicle body coordinate system to a camera coordinate system to obtain a second conversion point set for the object in the camera coordinate system; and
    project the second conversion point set for the object to the 2D coordinate system to obtain the projection point set for the object according to a projection correlation.
  30. The system of claim 29, wherein the instructions further cause the processor to:
    obtain a pose of the 3D point cloud, the pose including a pose rotation matrix and a pose translation matrix; and
    obtain the first conversion point set according to the pose of the 3D point cloud.
  31. The system of claim 30, wherein the instructions further cause the processor to:
    multiply the original point set for the object by an inverse of the pose.
  32. The system of claim 29, wherein the instructions further cause  the processor to:
    multiply the first conversion point set for the object by an inverse matrix of an extrinsic matrix for a camera capturing the 2D image, the extrinsic matrix including a camera rotation matrix and a camera translation matrix.
  33. The system of claim 29, wherein the instructions further cause the processor to:
    multiply the second conversion point set for the object by an intrinsic matrix for a camera capturing the 2D image to obtain the projection point set for the object.
  34. The system of claim 27, wherein the instructions further cause the processor to:
    obtain an original point set for the object in a 3D global coordinate system of the 3D point cloud; and
    obtain the projection point set according to the original point set for the object.
  35. The system of claim 34, wherein the instructions further cause the processor to:
    acquire the 3D point cloud by a Lidar; and
    mark the object in a 2D bird-view image according to the 3D point cloud.
  36. The system of claim 35, wherein the instructions further cause the processor to:
    obtain a dense point cloud according to the 3D point cloud;
    project the dense point cloud from the global coordinate system to a bird-view coordinate system to obtain the 2D bird-view image; and
    mark the object in the 2D bird-view image.
  37. The system of claim 36, wherein the instructions further cause the processor to:
    calculate a pose of a clean point cloud in the global coordinate  system;
    perform a point registration on the clean point cloud within a distance range; and
    perform a point cloud stacking on the clean point cloud to obtain the dense point cloud.
  38. The system of claim 37, wherein the instructions further cause the processor to:
    align the 3D point cloud with navigation data; and
    perform a preprocessing on the 3D point cloud to obtain the clean point cloud.
  39. The system of claim 27, wherein the object includes a lane line.
  40. The system of claim 27, wherein the instructions further cause the processor to:
    detect the object in the 2D image using a deep learning neural network or a clustering classification method.
  41. The system of claim 27, wherein the instructions further cause the processor to:
    for each projection point in the projection point set:
    find a corresponding position point in the position point set that is closest to the projection point; and
    calculate a distance between the projection point and the corresponding position point;
    calculate a weighted average of the distances between the projection points and the corresponding position points as an average distance between the projection point set and the position point set.
  42. The system of claim 41, wherein for the distance between a projection point and the corresponding position point, a weight in calculating the weighted average equals a quotient of an experience value divided by a depth value of the projection point.
  43. The system of claim 42, wherein the depth value equals a  distance between a point in the 3D point cloud that corresponds to the projection point and a camera capturing the 2D image.
  44. The system of claim 41, wherein the instructions further cause the processor to:
    determine an adaptation strategy according to the average distance.
  45. The system of claim 44, wherein the instructions further cause the processor to:
    determine the adaptation strategy according to a comparison result of the average distance and a threshold distance.
  46. The system of claim 45, wherein the instructions further cause the processor to:
    calibrate a mounting position of a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  47. The system of claim 45, wherein the instructions further cause the processor to:
    verify an intrinsic matrix for a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  48. The system of claim 45, wherein the instructions further cause the processor to:
    correct marking of control points for the object in the 3D point cloud in response to the average distance being larger than the threshold distance.
  49. The system of claim 45, wherein the instructions further cause the processor to:
    correct a pose of the 3D point cloud in response to the average distance being larger than the threshold distance.
  50. The system of claim 41, wherein the instructions further cause the processor to:
    determine the accuracy associated with the projection point set according to the average distance.
  51. The system of claim 50, wherein the instructions further cause the processor to:
    output the accuracy associated with the projection point set.
  52. The system of claim 51, wherein the instructions further cause the processor to:
    display a prompt in response to the accuracy being lower than a threshold value.
  53. A vehicle comprising:
    a camera configured to capture a two-dimensional (2D) image;
    a Lidar configured to acquire a three-dimensional (3D) point cloud;
    a processor; and
    a memory coupled to the camera, the Lidar, and the processor, and storing instructions that, when executed by the processor, cause the processor to:
    obtain a projection point set for an object in the 2D image based on a three-dimensional (3D) point cloud corresponding to the 2D image;
    obtain a position point set for the object in the 2D image by detecting the object in the 2D image; and
    estimate an accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  54. The vehicle of claim 53, wherein the instructions further cause the processor to:
    project an original point set for the object in the 3D point cloud from a 3D global coordinate system of the 3D point cloud to a 2D coordinate system of the 2D image.
  55. The vehicle of claim 54, wherein the instructions further cause the processor to:
    convert the original point set for the object in the 3D point cloud from the global coordinate system to a vehicle body coordinate system to  obtain a first conversion point set for the object in the vehicle body coordinate system;
    convert the first conversion point set for the object from the vehicle body coordinate system to a camera coordinate system to obtain a second conversion point set for the object in the camera coordinate system; and
    project the second conversion point set for the object to the 2D coordinate system to obtain the projection point set for the object according to a projection correlation.
  56. The vehicle of claim 55, wherein the instructions further cause the processor to:
    obtain a pose of the 3D point cloud, the pose including a pose rotation matrix and a pose translation matrix; and
    obtain the first conversion point set according to the pose of the 3D point cloud.
  57. The vehicle of claim 56, wherein the instructions further cause the processor to:
    multiply the original point set for the object by an inverse of the pose.
  58. The vehicle of claim 55, wherein the instructions further cause the processor to:
    multiply the first conversion point set for the object by an inverse matrix of an extrinsic matrix for a camera capturing the 2D image, the extrinsic matrix including a camera rotation matrix and a camera translation matrix.
  59. The vehicle of claim 55, wherein the instructions further cause the processor to:
    multiply the second conversion point set for the object by an intrinsic matrix for a camera capturing the 2D image to obtain the projection point set for the object.
  60. The vehicle of claim 53, wherein the instructions further cause  the processor to:
    obtain an original point set for the object in a 3D global coordinate system of the 3D point cloud; and
    obtain the projection point set according to the original point set for the object.
  61. The vehicle of claim 60, wherein the instructions further cause the processor to:
    acquire the 3D point cloud by a Lidar; and
    mark the object in a 2D bird-view image.
  62. The vehicle of claim 61, wherein the instructions further cause the processor to:
    obtain a dense point cloud according to the 3D point cloud;
    project the dense point cloud from the global coordinate system to a bird-view coordinate system to obtain the 2D bird-view image; and
    mark the object in the 2D bird-view image.
  63. The vehicle of claim 62, wherein the instructions further cause the processor to:
    calculate a pose of a clean point cloud in the global coordinate system;
    perform a point registration on the clean point cloud within a distance range; and
    perform a point cloud stacking on the clean point cloud to obtain the dense point cloud.
  64. The vehicle of claim 63, wherein the instructions further cause the processor to:
    align the 3D point cloud with navigation data; and
    perform a preprocessing on the 3D point cloud to obtain the clean point cloud.
  65. The vehicle of claim 53, wherein the object includes a lane line.
  66. The vehicle of claim 53, wherein the instructions further cause  the processor to:
    detect the object in the 2D image using a deep learning neural network or a clustering classification method.
  67. The vehicle of claim 53, wherein the instructions further cause the processor to:
    for each projection point in the projection point set:
    find a corresponding position point in the position point set that is closest to the projection point; and
    calculate a distance between the projection point and the corresponding position point;
    calculate a weighted average of the distances between the projection points and the corresponding position points as an average distance between the projection point set and the position point set.
  68. The vehicle of claim 67, wherein for the distance between a projection point and the corresponding position point, a weight in calculating the weighted average equals a quotient of an experience value divided by a depth value of the projection point.
  69. The vehicle of claim 68, wherein the depth value equals a distance between a point in the 3D point cloud that corresponds to the projection point and a camera capturing the 2D image.
  70. The vehicle of claim 67, wherein the instructions further cause the processor to:
    determine an adaptation strategy according to the average distance.
  71. The vehicle of claim 70, wherein the instructions further cause the processor to:
    determine the adaptation strategy according to a comparison result of the average distance and a threshold distance.
  72. The vehicle of claim 71, wherein the instructions further cause the processor to:
    calibrate a mounting position of a camera capturing the 2D image in  response to the average distance being larger than the threshold distance.
  73. The vehicle of claim 71, wherein the instructions further cause the processor to:
    verify an intrinsic matrix for a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  74. The vehicle of claim 71, wherein the instructions further cause the processor to:
    correct marking of control points for the object in the 3D point cloud in response to the average distance being larger than the threshold distance.
  75. The vehicle of claim 71, wherein the instructions further cause the processor to:
    correct a pose of the 3D point cloud in response to the average distance being larger than the threshold distance.
  76. The vehicle of claim 67, wherein the instructions further cause the processor to:
    determine the accuracy associated with the projection point set according to the average distance.
  77. The vehicle of claim 76, wherein the instructions further cause the processor to:
    output the accuracy associated with the projection point set.
  78. The vehicle of claim 77, wherein the instructions further cause the processor to:
    display a prompt in response to the accuracy being lower than a threshold value.
  79. A non-transitory computer-readable medium storing instructions that, when executed, cause a processor to:
    obtain a projection point set for an object in a two-dimensional (2D) image based on a three-dimensional (3D) point cloud corresponding to the 2D image;
    obtain a position point set for the object in the 2D image by detecting the object in the 2D image; and
    estimate an accuracy associated with the projection point set by comparing the projection point set for the object and the position point set for the object.
  80. The non-transitory computer-readable medium of claim 79, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    project an original point set for the object in the 3D point cloud from a 3D global coordinate system of the 3D point cloud to a 2D coordinate system of the 2D image.
  81. The non-transitory computer-readable medium of claim 80, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    convert the original point set for the object in the 3D point cloud from the global coordinate system to a vehicle body coordinate system to obtain a first conversion point set for the object in the vehicle body coordinate system;
    convert the first conversion point set for the object from the vehicle body coordinate system to a camera coordinate system to obtain a second conversion point set for the object in the camera coordinate system; and
    project the second conversion point set for the object to the 2D coordinate system to obtain the projection point set for the object according to a projection correlation.
  82. The non-transitory computer-readable medium of claim 81, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    obtain a pose of the 3D point cloud, the pose including a pose rotation matrix and a pose translation matrix; and
    obtain the first conversion point set according to the pose of the 3D  point cloud.
  83. The non-transitory computer-readable medium of claim 82, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    multiply the original point set for the object by an inverse of the pose.
  84. The non-transitory computer-readable medium of claim 81, wherein converting the first conversion point set for the object from the vehicle body coordinate system to the camera coordinate system includes:
    multiplying the first conversion point set for the object by an inverse matrix of an extrinsic matrix for a camera capturing the 2D image to obtain the second conversion point set, the extrinsic matrix including a camera rotation matrix and a camera translation matrix.
  85. The non-transitory computer-readable medium of claim 81, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    multiply the second conversion point set for the object by an intrinsic matrix for a camera capturing the 2D image to obtain the projection point set for the object.
  86. The non-transitory computer-readable medium of claim 79, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    obtain an original point set for the object in a 3D global coordinate system of the 3D point cloud; and
    obtain the projection point set according to the original point set for the object.
  87. The non-transitory computer-readable medium of claim 86, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    acquire the 3D point cloud by a Lidar; and
    mark the object in a 2D bird-view image according to the 3D point cloud.
  88. The non-transitory computer-readable medium of claim 87, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    obtain a dense point cloud according to the 3D point cloud;
    project the dense point cloud from the global coordinate system to a bird-view coordinate system to obtain the 2D bird-view image; and
    mark the object in the 2D bird-view image.
  89. The non-transitory computer-readable medium of claim 88, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    calculate a pose of a clean point cloud in the global coordinate system;
    perform a point registration on the clean point cloud within a distance range; and
    perform a point cloud stacking on the clean point cloud to obtain the dense point cloud.
  90. The non-transitory computer-readable medium of claim 89, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    align the 3D point cloud with navigation data; and
    perform a preprocessing on the 3D point cloud to obtain the clean point cloud.
  91. The non-transitory computer-readable medium of claim 79, wherein the object includes a lane line.
  92. The non-transitory computer-readable medium of claim 79, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    detect the object in the 2D image using a deep learning neural  network or a clustering classification method.
  93. The non-transitory computer-readable medium of claim 79, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    for each projection point in the projection point set:
    find a corresponding position point in the position point set that is closest to the projection point; and
    calculate a distance between the projection point and the corresponding position point;
    calculate a weighted average of the distances between the projection points and the corresponding position points as an average distance between the projection point set and the position point set.
  94. The non-transitory computer-readable medium of claim 93, wherein for the distance between a projection point and the corresponding position point, a weight in calculating the weighted average equals a quotient of an experience value divided by a depth value of the projection point.
  95. The non-transitory computer-readable medium of claim 94, wherein the depth value equals a distance between a point in the 3D point cloud that corresponds to the projection point and a camera capturing the 2D image.
  96. The non-transitory computer-readable medium of claim 93, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    determine an adaptation strategy according to the average distance.
  97. The non-transitory computer-readable medium of claim 96, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    determine the adaptation strategy according to a comparison result of the average distance and a threshold distance.
  98. The non-transitory computer-readable medium of claim 97, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    calibrate a mounting position of a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  99. The non-transitory computer-readable medium of claim 97, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    verify an intrinsic matrix for a camera capturing the 2D image in response to the average distance being larger than the threshold distance.
  100. The non-transitory computer-readable medium of claim 97, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    correct marking of control points for the object in the 3D point cloud in response to the average distance being larger than the threshold distance.
  101. The non-transitory computer-readable medium of claim 97, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    correct a pose of the 3D point cloud in response to the average distance being larger than the threshold distance.
  102. The non-transitory computer-readable medium of claim 93, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    determine the accuracy associated with the projection point set according to the average distance.
  103. The non-transitory computer-readable medium of claim 102, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    output the accuracy associated with the projection point set.
  104. The non-transitory computer-readable medium of claim 103, wherein the instructions stored in the computer-readable medium, when executed, cause the processor to:
    display a prompt in response to the accuracy being lower than a threshold value.
PCT/CN2020/139323 2020-12-25 2020-12-25 Accuracy estimation method and system WO2022133986A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/139323 WO2022133986A1 (en) 2020-12-25 2020-12-25 Accuracy estimation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/139323 WO2022133986A1 (en) 2020-12-25 2020-12-25 Accuracy estimation method and system

Publications (1)

Publication Number Publication Date
WO2022133986A1 true WO2022133986A1 (en) 2022-06-30

Family

ID=82158648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139323 WO2022133986A1 (en) 2020-12-25 2020-12-25 Accuracy estimation method and system

Country Status (1)

Country Link
WO (1) WO2022133986A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2458336A1 (en) * 2010-11-29 2012-05-30 Navteq North America, LLC Method and system for reporting errors in a geographic database
CN103901884A (en) * 2012-12-25 2014-07-02 联想(北京)有限公司 Information processing method and information processing device
CN109754426A (en) * 2017-11-01 2019-05-14 虹软科技股份有限公司 A kind of method and apparatus for verifying
CN110186467A (en) * 2018-02-23 2019-08-30 通用汽车环球科技运作有限责任公司 Group's sensing points cloud map
CN109141446A (en) * 2018-07-04 2019-01-04 百度在线网络技术(北京)有限公司 For obtaining the method, apparatus, equipment and computer readable storage medium of map
CN111983635A (en) * 2020-08-17 2020-11-24 浙江商汤科技开发有限公司 Pose determination method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797425A (en) * 2023-01-19 2023-03-14 中国科学技术大学 Laser global positioning method based on point cloud aerial view and rough-to-fine strategy
CN115797425B (en) * 2023-01-19 2023-06-16 中国科学技术大学 Laser global positioning method based on point cloud aerial view and coarse-to-fine strategy

Similar Documents

Publication Publication Date Title
JP7073315B2 (en) Vehicles, vehicle positioning systems, and vehicle positioning methods
CN107703528B (en) Visual positioning method and system combined with low-precision GPS in automatic driving
CN111436216B (en) Method and system for color point cloud generation
TWI722355B (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
CN110859044B (en) Integrated sensor calibration in natural scenes
CN109931939B (en) Vehicle positioning method, device, equipment and computer readable storage medium
CN108692719B (en) Object detection device
JP2020525809A (en) System and method for updating high resolution maps based on binocular images
CN113673282A (en) Target detection method and device
WO2019208101A1 (en) Position estimating device
CN112346463B (en) Unmanned vehicle path planning method based on speed sampling
JP2017181476A (en) Vehicle location detection device, vehicle location detection method and vehicle location detection-purpose computer program
KR102003387B1 (en) Method for detecting and locating traffic participants using bird's-eye view image, computer-readerble recording medium storing traffic participants detecting and locating program
CN111833443A (en) Landmark position reconstruction in autonomous machine applications
KR20200142315A (en) Method and apparatus of updating road network
WO2022133986A1 (en) Accuracy estimation method and system
CN117635683A (en) Trolley indoor positioning method based on multiple cameras
CN112712563A (en) Camera orientation estimation
KR102195040B1 (en) Method for collecting road signs information using MMS and mono camera
WO2021004813A1 (en) Method and mobile entity for detecting feature points in an image
US20240112363A1 (en) Position estimation system, position estimation method, and program
JP2020077297A (en) Position and posture estimation device
CN116168086A (en) Calibration precision evaluation method and device, electronic equipment and storage medium
CN117953046A (en) Data processing method, device, controller, vehicle and storage medium
CN114265081A (en) Data fusion method of 2D camera and laser radar

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20966551

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20966551

Country of ref document: EP

Kind code of ref document: A1