CN113255444A - Training method of image recognition model, image recognition method and device - Google Patents
Training method of image recognition model, image recognition method and device
- Publication number
- CN113255444A (application number CN202110421118.1A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- cloud data
- image recognition
- recognition model
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/23: Pattern recognition; Clustering techniques
- G06T7/11: Image analysis; Region-based segmentation
- G06V10/44: Image or video recognition; Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V20/64: Scene-specific elements; Three-dimensional objects
- G06T2207/10028: Image acquisition modality; Range image; Depth image; 3D point clouds
Abstract
The application provides a training method of an image recognition model, an image recognition method and a device, where the image recognition model is applied to a monocular detector. The method includes: acquiring first point cloud data collected by a laser radar device; determining, according to the first point cloud data, a three-dimensional object frame corresponding to a target object in the first point cloud data; and training an initial image recognition model according to the three-dimensional object frame to obtain the image recognition model, where the image recognition model is used for recognizing an object in an image to be recognized. The method and the device can improve the recognition accuracy of the image recognition model, and because the image recognition model can run in any monocular 3D detector, the cost of object detection can be reduced while the detection accuracy and the recognition accuracy for the target object are ensured.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a training method for an image recognition model, an image recognition method, and an image recognition device.
Background
In the field of automatic driving, three-dimensional object detection is generally required to improve the safety of vehicle driving and to avoid collision between the vehicle and other objects on the road.
Currently, three-dimensional object detection is generally realized by using a laser radar device, but laser radar devices are expensive and limited in working range. To address this, a monocular detector can be used for three-dimensional object detection instead of a laser radar device. However, it is difficult for a monocular method based on a monocular detector to capture accurate depth information in an image. To enable the monocular detector to detect depth information, the depth image predicted by a pre-trained depth estimator can be used as a network input to guide the monocular detector in depth learning, so as to capture the depth information in the image.
However, in the above manner, the depth image predicted by the depth estimator may lose part of the information, resulting in low detection accuracy for three-dimensional objects.
Disclosure of Invention
The embodiments of the application provide a training method for an image recognition model, an image recognition method and a device, which can improve the recognition accuracy of the image recognition model; because the image recognition model can run in any monocular 3D detector, the cost of object detection can be reduced while the detection accuracy and the recognition accuracy for a target object are ensured.
In a first aspect, an embodiment of the present application provides a training method for an image recognition model, where the image recognition model is applied to a monocular detector, and the method includes:
acquiring first point cloud data acquired by laser radar equipment;
determining a three-dimensional object frame corresponding to a target object in the first point cloud data according to the first point cloud data;
training an initial image recognition model according to the first point cloud data and the three-dimensional object frame to obtain the image recognition model, wherein the image recognition model is used for recognizing an object in an image to be recognized.
In a possible implementation manner, the determining, according to the first point cloud data, a three-dimensional object frame corresponding to a target object in the first point cloud data includes:
and inputting the first point cloud data into a pre-trained three-dimensional recognition model based on the laser radar to obtain a three-dimensional object frame corresponding to a target object in the first point cloud data, wherein the three-dimensional recognition model is obtained by training an initial recognition model by adopting a three-dimensional object marking frame corresponding to each object in the second point cloud data.
In a possible implementation manner, the determining, according to the first point cloud data, a three-dimensional object frame corresponding to a target object in the first point cloud data includes:
acquiring an RGB color mode image (RGB image) corresponding to the first point cloud data;
segmenting the RGB image to obtain a two-dimensional frame and a semantic mask;
and determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the two-dimensional frame and the semantic mask.
In a possible implementation manner, the determining, according to the two-dimensional frame and the semantic mask, a three-dimensional object frame corresponding to a target object in the first point cloud data includes:
determining third point cloud data corresponding to a target object in the first point cloud data according to the two-dimensional frame and the semantic mask;
and determining a minimum three-dimensional bounding box covering third point cloud data corresponding to the target object, and determining the minimum three-dimensional bounding box as a three-dimensional object frame corresponding to the target object.
In a possible implementation manner, the determining, according to the two-dimensional frame and the semantic mask, third point cloud data corresponding to a target object in the first point cloud data includes:
determining initial point cloud data corresponding to the target object according to the two-dimensional frame and the semantic mask;
clustering the initial point cloud data corresponding to the target object to obtain a plurality of clusters;
and determining the initial point cloud data in the cluster with the most initial point cloud data in the plurality of clusters as the third point cloud data.
In a second aspect, the present application provides an image recognition method, including:
acquiring an image to be identified;
inputting the image to be recognized into an image recognition model to obtain an object in the image to be recognized, wherein the image recognition model is obtained by training an initial image recognition model according to point cloud data and a three-dimensional object frame corresponding to a target object in the point cloud data, and the point cloud data is acquired through laser radar equipment.
In a third aspect, an embodiment of the present application provides a training apparatus for an image recognition model, including:
the acquisition unit is used for acquiring first point cloud data acquired through the laser radar equipment.
And the processing unit is used for determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data.
And the training unit is used for training an initial image recognition model according to the three-dimensional object frame to obtain the image recognition model, and the image recognition model is used for recognizing the object in the image to be recognized.
In a possible implementation manner, the processing unit is specifically configured to:
and inputting the first point cloud data into a pre-trained three-dimensional recognition model based on the laser radar to obtain a three-dimensional object frame corresponding to a target object in the first point cloud data, wherein the three-dimensional recognition model is obtained by training an initial recognition model by adopting a three-dimensional object marking frame corresponding to each object in the second point cloud data.
In a possible implementation manner, the processing unit is specifically configured to:
acquiring an RGB image corresponding to the first point cloud data; segmenting the RGB image to obtain a two-dimensional frame and a semantic mask; and determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the two-dimensional frame and the semantic mask.
In a possible implementation manner, the processing unit is specifically configured to:
determining third point cloud data corresponding to a target object in the first point cloud data according to the two-dimensional frame and the semantic mask; and determining a minimum three-dimensional bounding box covering third point cloud data corresponding to the target object, and determining the minimum three-dimensional bounding box as a three-dimensional object box corresponding to the target object.
In a possible implementation manner, the processing unit is specifically configured to:
determining initial point cloud data corresponding to the target object according to the two-dimensional frame and the semantic mask; clustering the initial point cloud data corresponding to the target object to obtain a plurality of clusters; and determining the initial point cloud data in the cluster with the most initial point cloud data in the plurality of clusters as the third point cloud data.
In a fourth aspect, the present application provides an image recognition apparatus comprising:
and the acquisition unit is used for acquiring the image to be identified.
The processing unit is used for inputting the image to be recognized into an image recognition model to obtain an object in the image to be recognized, the image recognition model is obtained by training an initial image recognition model according to point cloud data and a three-dimensional object frame corresponding to a target object in the point cloud data, and the point cloud data is acquired through laser radar equipment.
In a fifth aspect, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the training method for the image recognition model described in any one of the possible implementation manners of the first aspect or the image recognition method described in any one of the possible implementation manners of the second aspect.
In a sixth aspect, an embodiment of the present application further provides a server, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for training an image recognition model in any one of the possible implementations of the first aspect.
In a seventh aspect, an embodiment of the present application further provides a vehicle, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the image recognition method in any one of the possible implementation manners of the second aspect.
In an eighth aspect, an embodiment of the present application further provides a computer program product, where the computer program product includes: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the method for training an image recognition model described in any of the possible implementations of the first aspect above or the method for image recognition described in any of the possible implementations of the second aspect above.
Therefore, according to the training method of the image recognition model, the image recognition method and the device provided by the application, when the image recognition model is trained, the three-dimensional object frame corresponding to the target object is obtained directly from the first point cloud data collected by the laser radar device, and the training of the initial image recognition model is guided by this three-dimensional object frame, so that no information about the target object is lost in the training process, which improves the recognition accuracy of the image recognition model. In addition, the training method is intuitive, simple and effective; the image recognition model can run in any monocular 3D detector, so the cost of object detection can be reduced while the detection precision and recognition accuracy for the target object are ensured.
Drawings
Fig. 1 is a system architecture diagram of a training method of an image recognition model according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a training method for an image recognition model according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another training method for an image recognition model according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a further training method for an image recognition model according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a further training method for an image recognition model according to an embodiment of the present application;
Fig. 6 is a flowchart of an image recognition method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a training apparatus for an image recognition model according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a vehicle according to an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the embodiments of the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In the description of the text of the present application, the character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The training method and the image recognition method for the image recognition model provided by the embodiment of the application can be applied to scenes such as automatic driving or intelligent transportation and the like, and can also be applied to other scenes needing to detect three-dimensional objects. In the present application, an automatic driving scenario will be described as an example.
In the field of automatic driving, it is very important for a vehicle to perform three-dimensional object detection, which can avoid collision with other objects on the road. Therefore, in order to improve the safety of vehicle travel, a three-dimensional object detection device in an autonomous vehicle plays an important role.
Currently, due to the price of lidar devices and the limitations of their working range, monocular detectors are commonly used for the detection of three-dimensional objects. However, due to the ill-posed nature of the monocular image captured by a monocular detector, it is difficult for a monocular method to capture accurate depth information in the image. A lidar point cloud, by contrast, can provide accurate depth measurements of a scene and can therefore guide the monocular detector to learn depth information. To achieve this goal, multi-stage pipelines based on depth maps have been developed. Specifically, this type of method splits the training process into multiple stages: in the first stage, the lidar point cloud is projected onto the image plane to train a depth estimator; in the second stage, the depth map predicted by the pre-trained depth estimator is used as a network input for training a monocular depth-map-based detector. However, such complex pipelines use the lidar point cloud only implicitly, through the intermediate depth estimator, and thereby lose a portion of the valuable information, such as part of the depth information, resulting in poor accuracy of the three-dimensional objects detected by the monocular detector.
In view of the above problems, the embodiments of the application provide a method for training an image recognition model. Furthermore, when an object in an image to be recognized is detected by the trained image recognition model, the object is detected with high accuracy.
Fig. 1 is a system architecture diagram of a training method of an image recognition model according to an embodiment of the present application, as shown in fig. 1, the system includes a laser radar device 11, a server 12, and a vehicle 13, where the vehicle 13 is provided with a monocular detector. The networks used between them may include various types of wireless networks, such as, but not limited to: the internet, local area networks, WIFI, WLAN, cellular communication networks (GPRS, CDMA, 2G/3G/4G/5G cellular networks), satellite communication networks, and so forth.
As shown in fig. 1, for example, the laser radar device 11 may acquire a radar point cloud map in real time and send it to the server 12 through the wireless network. When the autonomous vehicle 13 runs on a road, the on-board monocular detector may acquire RGB images of the road in real time, and the hardware terminal of the vehicle 13 may transmit the position information of the vehicle 13 and the acquired RGB images to the server 12 in real time through the wireless network. On receiving the information sent by the hardware terminal of the vehicle 13, the server 12 trains the image recognition model according to the received information and sends the finally trained image recognition model to the vehicle 13, which uses it to recognize objects in images to be recognized.
Hereinafter, a method for training an image recognition model provided by the present application will be described in detail by using specific embodiments. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart illustrating a training method of an image recognition model according to an embodiment of the present disclosure, where the training method of the image recognition model may be executed by software and/or a hardware device, for example, the hardware device may be a terminal or a server. For example, referring to fig. 2, the training method of the image recognition model may include:
S201, obtaining first point cloud data collected through the laser radar device.
For example, the first point cloud data includes point cloud data of a scene where the detected target object is located, where the first point cloud data may be captured by the laser radar device, or may be acquired in an offline collection manner, so that the cost may be reduced.
The target object may be a person, a car, a sign, or the like. The first point cloud data comprises accurate depth information, and the depth information can accurately determine the position of the target object, so that the image recognition model trained by the first point cloud data with the depth information has higher detection precision.
In this step, when the laser radar device collects the initial point cloud data, the initial point cloud data can be sent to the server in real time, and the server determines the first point cloud data from the initial point cloud data according to the position of the target object.
S202, determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data.
In this step, the first point cloud data may include at least one target object, and when the first point cloud data includes a plurality of target objects, each target object corresponds to a respective three-dimensional object frame, where the three-dimensional object frame is used to identify the target object.
For example, the process of determining the three-dimensional object frame corresponding to the target object in the first point cloud data may be implemented in two ways, one is in a supervised mode, and the other is in an unsupervised mode. A specific process for determining the three-dimensional object frame in the above two ways is explained in the following embodiments.
The three-dimensional object frame obtained by the method is obtained by directly operating the first point cloud data, so that any relevant information about the target object is not lost, and the integrity of the information of the target object is ensured.
S203, training the initial image recognition model according to the three-dimensional object frame to obtain an image recognition model, wherein the image recognition model is used for recognizing an object in the image to be recognized.
The initial image recognition model is mainly applied to the monocular detector, so it can adopt an existing monocular 3D image recognition model, such as SMOKE or CenterNet.
For example, an initial recognition frame of the target object may be obtained by recognizing, with the initial image recognition model, an RGB image obtained by the monocular detector. The RGB image and the first point cloud data are acquired in the same scene at the same time, and the objects in the RGB image correspond one to one to the objects included in the first point cloud data.
When the initial image recognition model is trained, the initial image recognition model can be trained according to the three-dimensional object frame and the initial recognition frame, so that the image recognition model is obtained. Specifically, the training process of the initial image recognition model is to evaluate the consistency degree of an initial recognition frame of a target object and a three-dimensional object frame corresponding to the target object in the first point cloud data through a monocular loss function, if the consistency degree reaches a preset threshold value, the training of the initial image recognition model is completed, and the trained initial image recognition model is the final image recognition model; if the consistency degree does not reach the preset threshold value, the parameters in the initial image recognition model need to be adjusted, the initial image recognition model after the parameters are adjusted is determined as a new initial image recognition model, and the training process is repeatedly executed until the consistency degree reaches the preset threshold value.
Wherein, the monocular loss function is shown in formulas (1)-(4):

L = L_cls + L_2D + L_3D    (1)

L_cls = -Σ_{i=1}^{n_c} 1(i = c) · log(c_i)    (2)

L_2D = -log(IoU(b'_2D, b_2D))    (3)

L_3D = SmoothL1(b'_3D - b_3D)    (4)

wherein L_cls measures the accuracy of the object class prediction: the smaller the value of L_cls, the more accurate the predicted class. c represents the true class of the target object, c_i represents the probability, predicted by the initial image recognition model, that the identified object belongs to the i-th class, and n_c is the total number of object categories stored in the terminal or the server.

L_2D indicates the degree of matching between the initial recognition frame and the two-dimensional frame of the target object: the smaller the value of L_2D, the higher the matching degree. The two-dimensional frame of the target object is obtained by removing the height information of the target object from the three-dimensional object frame. b'_2D represents the initial recognition frame, b_2D represents the two-dimensional frame of the target object, and IoU is the intersection-over-union operator.

L_3D indicates the degree of matching between the initially recognized three-dimensional object frame and the three-dimensional object frame, where the initially recognized three-dimensional object frame is obtained by adding the height information of the target object to the initial recognition frame.

Wherein, SmoothL1 is defined piecewise:

SmoothL1(x) = 0.5 · x^2, if |x| < 1
SmoothL1(x) = |x| - 0.5, if |x| ≥ 1

wherein b'_3D represents the initially recognized three-dimensional object frame and b_3D represents the three-dimensional object frame. SmoothL1 compares the parameters of b'_3D and b_3D: the smaller the value of L_3D, the higher the accuracy of the initially recognized three-dimensional object frame.

L represents the degree of consistency between the initial recognition frame and the three-dimensional object frame corresponding to the target object in the first point cloud data; according to the above description, the smaller the value of L, the higher the accuracy of the trained initial image recognition model.
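As an illustration, a minimal PyTorch sketch of this monocular loss follows, assuming batched axis-aligned (x1, y1, x2, y2) two-dimensional frames and a flat parameter vector per three-dimensional frame; these tensor layouts are assumptions made for illustration, not the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def box_iou(a, b):
    # Axis-aligned IoU for row-paired (x1, y1, x2, y2) boxes.
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a + area_b - inter)

def monocular_loss(cls_logits, true_class, box2d_pred, box2d_gt, box3d_pred, box3d_gt):
    # (2) classification term: cross-entropy over the n_c object categories
    l_cls = F.cross_entropy(cls_logits, true_class)
    # (3) 2D term: -log IoU between the initial recognition frame and the
    #     two-dimensional frame derived from the point-cloud 3D object frame
    l_2d = -torch.log(box_iou(box2d_pred, box2d_gt).clamp(min=1e-6)).mean()
    # (4) 3D term: SmoothL1 over the 3D frame parameters
    l_3d = F.smooth_l1_loss(box3d_pred, box3d_gt)
    return l_cls + l_2d + l_3d  # formula (1)
```

Under one possible reading of the training procedure above, this scalar is minimized iteration by iteration until the degree of consistency (here, a sufficiently small loss) reaches the preset threshold.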
For example, after the image recognition model is obtained and the monocular detector captures an image to be recognized, the image to be recognized may be input into the image recognition model to recognize the object in the image to be recognized.
In the embodiment of the application, when the image recognition model is trained, the three-dimensional object frame corresponding to the target object is obtained directly from the first point cloud data collected by the laser radar device, and the training of the initial image recognition model is guided by this three-dimensional object frame, so that no information about the target object is lost in the training process, which improves the recognition accuracy of the image recognition model. In addition, the training method is intuitive, simple and effective; the image recognition model can run in any monocular 3D detector, so the cost of object detection can be reduced while the detection precision and recognition accuracy for the target object are ensured.
Based on the embodiment shown in fig. 2, and to facilitate understanding, the following embodiments describe how S202 (determining the three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data) can be implemented. The embodiment shown in fig. 3 first details the determination of the three-dimensional object frame in the supervised mode.
Fig. 3 is a schematic flowchart of another training method for an image recognition model provided in an embodiment of the present application. This embodiment describes in detail the process, in S202 of the embodiment shown in fig. 2, of determining the three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data; the embodiment shown in fig. 3 determines the three-dimensional object frame in the supervised mode. As shown in fig. 3, the method includes:
S301, obtaining first point cloud data collected through the laser radar device.
S301 is similar to S201, and is not described herein again.
S302, inputting the first point cloud data into a pre-trained three-dimensional recognition model based on the laser radar, and obtaining a three-dimensional object frame corresponding to the target object in the first point cloud data.
And the three-dimensional recognition model is obtained by training the initial recognition model by adopting a three-dimensional object marking frame corresponding to each object in the second point cloud data.
Specifically, the second point cloud data includes radar point cloud data collected for a scene required for training the three-dimensional identification model based on the laser radar. The second point cloud data can be collected through laser radar equipment and can also be acquired in an off-line mode, and the cost can be reduced through the off-line acquisition mode.
For example, the three-dimensional object labeling frame corresponding to each object in the second point cloud data is obtained by manually labeling key identification points on the second point cloud data, which is why this is called the supervised mode. The lidar-based three-dimensional recognition model can adopt SECOND or F-PointNet, and an initial recognition frame is obtained from the second point cloud data by the initial recognition model. When the initial recognition model is trained, it can be trained according to the initial recognition frame and the three-dimensional object labeling frame corresponding to each object in the second point cloud data, so as to obtain the lidar-based three-dimensional recognition model. Specifically, the training process evaluates, through a lidar loss function, the degree of consistency between the initial recognition frame of the target object and the three-dimensional object labeling frame corresponding to the target object in the second point cloud data; if the degree of consistency reaches a preset threshold, training of the initial recognition model is complete, and the trained initial recognition model is the final lidar-based three-dimensional recognition model; if the degree of consistency does not reach the preset threshold, the parameters in the initial recognition model are adjusted, the adjusted model is determined as a new initial recognition model, and the training process is repeated until the degree of consistency reaches the preset threshold.
The lidar loss function is similar to the monocular loss function in the embodiment shown in fig. 2, and is not described herein again.
When the three-dimensional object frame corresponding to the target object in the first point cloud data is determined, the first point cloud data can be input to a trained three-dimensional recognition model based on the laser radar, and the three-dimensional object frame is obtained.
S303, training an initial image recognition model according to the three-dimensional object frame to obtain the image recognition model, wherein the image recognition model is used for recognizing an object in the image to be recognized.
S303 is similar to S203, and is not described herein again.
The three-dimensional recognition model obtained in this way is implemented based on point cloud data acquired by the laser radar, so it has high detection precision, and no valuable information about the target object in the first point cloud data is lost in the process of training the three-dimensional recognition model with the lidar point cloud. Therefore, the three-dimensional object frame obtained by recognizing the target object in the first point cloud data with the three-dimensional recognition model does not lose any valuable information related to the target object. Moreover, because of the accurate depth measurement, the three-dimensional object frame corresponding to the target object in the first point cloud data predicted by the lidar-based three-dimensional detector has quite high precision and can be directly used for training the detection models of other, non-lidar detectors.
In this embodiment, in the supervised mode, in the process of determining the three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data, the obtained three-dimensional object frame does not lose any information related to the target object, and the object information contained in the three-dimensional object frame can ensure the recognition accuracy of the image recognition model. In addition, only key data points of the target objects need to be manually labeled on the second point cloud data, which can greatly reduce the workload of the workers and the cost of manual labeling.
Fig. 4 is a schematic flowchart of a further training method for an image recognition model according to an embodiment of the present application. This embodiment also describes in detail the process, in S202 of the embodiment shown in fig. 2, of determining the three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data; the embodiment shown in fig. 4 differs from the embodiment shown in fig. 3 in that the three-dimensional object frame is determined in the unsupervised mode. As shown in fig. 4, the method includes:
S401, obtaining first point cloud data collected through the laser radar device.
S401 is similar to S201, and is not described herein again.
S402, acquiring an RGB image corresponding to the first point cloud data.
The RGB image may be acquired by the monocular detector. The RGB image and the first point cloud data are acquired in the same scene at the same time, and the objects in the RGB image correspond one to one to the objects included in the first point cloud data.
S403, segmenting the RGB image to obtain a two-dimensional frame and a semantic mask.
In this step, an offline 2D instance segmentation model may be used to segment the RGB image to obtain a two-dimensional frame (2D box) and a semantic mask (mask).
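As one possible realization of this offline segmentation step (the text does not name a specific model), a pretrained Mask R-CNN from torchvision could supply the two-dimensional boxes and per-instance masks; the confidence threshold below is illustrative.

```python
import torch
import torchvision

# Off-the-shelf 2D instance segmentation; Mask R-CNN is one possible choice.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()

@torch.no_grad()
def segment(image):
    # image: float tensor, CxHxW, values in [0, 1]
    out = model([image])[0]
    keep = out["scores"] > 0.5            # illustrative confidence threshold
    boxes = out["boxes"][keep]            # 2D boxes as (x1, y1, x2, y2)
    masks = out["masks"][keep, 0] > 0.5   # per-instance binary masks
    return boxes, masks
```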
S404, determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the two-dimensional frame and the semantic mask.
In this step, a camera view frustum may be constructed from the two-dimensional frame (2D box) and the semantic mask (mask) to determine the three-dimensional object frame corresponding to the target object in the first point cloud data.
In a possible implementation manner, when the three-dimensional frame corresponding to the target object in the first point cloud data is determined according to the two-dimensional frame and the semantic mask, the third point cloud data corresponding to the target object in the first point cloud data may be determined according to the two-dimensional frame and the semantic mask, the minimum three-dimensional bounding box covering the third point cloud data corresponding to the target object is determined, and the minimum three-dimensional bounding box is determined as the three-dimensional object frame corresponding to the target object.
Specifically, a camera view frustum is constructed from the two-dimensional frame (2D box) and the semantic mask (mask) to select the relevant lidar points for the target object, thereby determining the third point cloud data corresponding to the target object in the first point cloud data. Illustratively, the initial point cloud data corresponding to the target object is determined based on the camera view frustum, and any 2D detection box without lidar points inside it is ignored. However, because the lidar points located in the same view frustum consist of the target object together with mixed background points or occluding points around it, the initial point cloud data is clustered with the DBSCAN clustering method to obtain a plurality of clusters, so that these mixed background points or occluding points can be deleted. The initial point cloud data in the cluster containing the most points among the plurality of clusters is then determined as the third point cloud data.
Most of the initial point cloud data is point cloud data of the target object, and these points are relatively concentrated. By clustering the initial point cloud data with DBSCAN and keeping the largest cluster as the third point cloud data, it is ensured that the third point cloud data contains only points of the target object, with the background points of the scene completely eliminated.
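A hedged sketch of the frustum selection and DBSCAN filtering described above follows; the 3x4 camera projection matrix, the boolean mask layout, and the eps/min_samples values are assumptions, since the text does not specify them.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def frustum_points(points, proj, box2d, mask):
    # Keep lidar points whose image projection lies inside the 2D box and
    # on the instance mask; `proj` is an assumed 3x4 projection matrix and
    # the box is assumed to lie inside the image bounds.
    uvw = np.hstack([points, np.ones((len(points), 1))]) @ proj.T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    x1, y1, x2, y2 = map(int, box2d)
    keep = (uvw[:, 2] > 0) & (u >= x1) & (u < x2) & (v >= y1) & (v < y2)
    on_mask = np.zeros(len(points), dtype=bool)
    on_mask[keep] = mask[v[keep], u[keep]]
    return points[on_mask]

def third_point_cloud(initial_points, eps=0.7, min_samples=5):
    # Cluster the initial point cloud and keep the cluster with the most
    # points, discarding mixed background/occluding points and noise (-1).
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(initial_points)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return initial_points[:0]
    return initial_points[labels == np.bincount(valid).argmax()]
```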
After the third point cloud data is obtained, it is projected horizontally to obtain a bird's eye view (BEV). The convex hull is obtained from the bird's eye view as follows: select the bottom-right-most point in the BEV (minimum y, with ties broken by maximum x) and mark it as P0; sort the remaining points of the third point cloud in ascending order of the counterclockwise angle between the ray from P0 and the x axis; if two points have the same angle, delete the one closer to P0. By traversing all the points in the BEV in this way, a closed convex hull is formed. Then enumerate the edges of the convex hull polygon, construct the circumscribed rectangle for each edge, compare the areas of these rectangles, and select the rectangle with the smallest area as the minimum three-dimensional bounding box, i.e., the three-dimensional object frame corresponding to the target object. The other parameters of the three-dimensional object frame can be calculated from statistics of the remaining points: for example, the height can be expressed as the maximum spatial offset of the points along the y axis, and the longitudinal center coordinate is calculated by averaging the longitudinal coordinates of the points. At the same time, objects whose minimum three-dimensional bounding box size marks them as likely outliers are eliminated, because the three-dimensional sizes of most valid objects are very close. Although some potential targets are thereby ignored and filtered out, the image recognition model applied in the monocular detection method can still obtain accurate detection results.
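The bird's-eye-view step above can be sketched with scipy's convex hull plus the per-edge circumscribed-rectangle search; treating y as the vertical axis follows the height description in the text, and the remaining conventions are assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull

def min_bev_box(points):
    # points: Nx3 third point cloud of the target object; y assumed vertical.
    bev = points[:, [0, 2]]                      # horizontal (bird's eye) projection
    hull = bev[ConvexHull(bev).vertices]         # closed convex hull vertices
    best_area, best = np.inf, None
    for i in range(len(hull)):                   # one circumscribed rectangle per edge
        e = hull[(i + 1) % len(hull)] - hull[i]
        c, s = e / np.linalg.norm(e)
        rot = np.array([[c, s], [-s, c]])        # rotate so the edge lies on the x axis
        p = hull @ rot.T
        lo, hi = p.min(axis=0), p.max(axis=0)
        area = np.prod(hi - lo)
        if area < best_area:                     # keep the smallest-area rectangle
            center = rot.T @ ((lo + hi) / 2)
            best_area, best = area, (center, hi - lo, np.arctan2(s, c))
    height = points[:, 1].max() - points[:, 1].min()  # max offset along the y axis
    y_center = points[:, 1].mean()                    # averaged longitudinal coordinate
    return best, height, y_center                     # (BEV center, size, yaw), height, y
```

Adding the height and the longitudinal center to the returned bird's-eye-view rectangle yields the three-dimensional object frame described above.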
S405, training an initial image recognition model according to the three-dimensional object frame to obtain the image recognition model, wherein the image recognition model is used for recognizing an object in an image to be recognized.
S405 is similar to S203, and is not described herein again.
In this embodiment, in the unsupervised mode, in the process of determining the three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data, the obtained three-dimensional object frame does not lose any information related to the target object, and the information in the three-dimensional object frame is only that of the target object, without the mixed background points or occluding points around it, so the object information contained in the three-dimensional object frame can ensure the recognition accuracy of the image recognition model.
Fig. 5 is a schematic flowchart of a training method for an image recognition model according to an embodiment of the present disclosure, and the embodiment takes a target recognition object, specifically a vehicle, as an example to describe in detail an operation manner of the training method for an image recognition model according to the present disclosure.
As shown in fig. 5, the first point cloud data is obtained in the first step; the specific manner is similar to S201 and is not described here again. After the first point cloud data is acquired, a first point cloud three-dimensional object frame is obtained through either the supervised mode or the unsupervised mode.
In an exemplary supervised mode, similarly to S302, the initial recognition model is trained with the three-dimensional object labeling frames corresponding to the objects in the second point cloud data, so as to obtain the lidar-based three-dimensional recognition model. As shown in fig. 5, specifically, the second point cloud data acquired in advance is recognized by the initial lidar-based three-dimensional recognition model, and an initial recognition frame for each target object in the second point cloud data is obtained. The degree of consistency between the initial recognition frame of the target object and the three-dimensional object labeling frame corresponding to the target object in the second point cloud data is evaluated through the LiDAR loss function; if the degree of consistency reaches a preset threshold, the training of the initial recognition model is complete, and the trained model is the final lidar-based three-dimensional recognition model; if the degree of consistency does not reach the preset threshold, the parameters of the initial recognition model are adjusted, the adjusted model is determined as a new initial recognition model, and the training process is repeated until the degree of consistency reaches the preset threshold. After the final lidar-based three-dimensional recognition model is obtained, it recognizes the first point cloud data to obtain the first point cloud three-dimensional object frame.
As an example, the unsupervised mode is similar to the embodiment shown in fig. 4. As shown in fig. 5, the first point cloud data and the RGB image corresponding to it are first obtained at the same time. Then, by segmenting the RGB image, a two-dimensional frame (2D box) and a semantic mask (mask) are obtained. A camera view frustum is constructed from the two-dimensional box and the semantic mask, and the initial point cloud data is determined from the first point cloud data based on the camera view frustum. As shown in fig. 5, a mixed background point cloud exists around the target object (here, a vehicle) in the initial point cloud data. After the initial point cloud data is processed by DBSCAN clustering, the cluster containing the most points is selected and its points are determined as the third point cloud data, so that the mixed background point cloud is eliminated. As shown in the clustered point cloud in fig. 5, the initial point cloud data is divided into 4 clusters by DBSCAN, and the cluster containing the most points is the point cloud data of the target object (the vehicle). The third point cloud data is then projected horizontally to obtain a bird's eye view, and the convex hull is obtained from the bird's eye view. As further shown in fig. 5, the convex hull is converted into a minimum bounding box in the bird's eye view, and height information is added to the minimum bounding box to obtain the three-dimensional object frame corresponding to the target object (the vehicle).
After the first point cloud three-dimensional object frame is obtained through the supervised mode or the unsupervised mode, the degree of consistency between the initial recognition frame of the target object and the three-dimensional object frame corresponding to the target object in the first point cloud data is evaluated through the monocular loss function. If the degree of consistency reaches a preset threshold, the training of the initial image recognition model is complete, and the trained initial image recognition model is the final image recognition model; if the degree of consistency does not reach the preset threshold, the parameters in the initial image recognition model are adjusted, the adjusted model is determined as a new initial image recognition model, and the training process is repeated until the degree of consistency reaches the preset threshold. The initial recognition frame is obtained by recognizing, with the initial image recognition model, an RGB image obtained by the monocular detector; the RGB image and the first point cloud data are obtained in the same scene at the same time, and the objects in the RGB image correspond to the objects included in the first point cloud data.
For example, after the image recognition model is obtained and the monocular detector captures an image to be recognized, the image to be recognized may be input into the image recognition model to recognize the object in the image to be recognized.
As described in this embodiment, in the unsupervised mode, in the process of determining the three-dimensional object frame corresponding to the target object in the first point cloud data according to the first point cloud data, the obtained three-dimensional object frame does not lose any information related to the target object; the information in the three-dimensional object frame is only that of the target object, with the mixed background points and occluding points around the target object removed by the technical means described above, so the object information contained in the three-dimensional object frame can ensure the recognition accuracy of the image recognition model.
Fig. 6 is a flowchart illustrating an image recognition method according to an embodiment of the present application, where the image recognition method may be executed by an in-vehicle hardware device, and referring to fig. 6, the image recognition method may include:
S601, acquiring an image to be identified.
For example, the image to be identified, containing the object to be identified, is acquired through the monocular detector.
S602, inputting the image to be recognized into an image recognition model to obtain the object in the image to be recognized.
The image recognition model is obtained by training an initial image recognition model according to point cloud data and a three-dimensional object frame corresponding to a target object in the point cloud data, wherein the point cloud data is acquired through laser radar equipment.
The image recognition model is similar to the image recognition model obtained by the training method shown in any one of the above embodiments, and details are not repeated here.
The image to be recognized is input into the image recognition model for image recognition, and monocular three-dimensional image recognition information is obtained.
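A minimal inference sketch follows; the TorchScript packaging and the model file name are assumptions made for illustration.

```python
import torch
from PIL import Image
from torchvision import transforms

model = torch.jit.load("image_recognition_model.pt").eval()  # assumed artifact name

def recognize(image_path):
    image = transforms.ToTensor()(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        return model(image.unsqueeze(0))  # monocular 3D recognition information
```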
Furthermore, the obtained monocular three-dimensional image identification information is sent to the vehicle-mounted automatic driving device by the image identification model, and the automatic driving device can sense the real three-dimensional world through the received monocular three-dimensional image identification information and avoid collision with other objects on the road.
Fig. 7 is a schematic structural diagram of an image recognition model training apparatus 700 according to an embodiment of the present application, for example, please refer to fig. 7, where the image recognition model training apparatus 700 may include:
an obtaining unit 701, configured to obtain first point cloud data collected by a laser radar device.
A processing unit 702, configured to determine, according to the first point cloud data, a three-dimensional object frame corresponding to a target object in the first point cloud data.
A training unit 703, configured to train an initial image recognition model according to the three-dimensional object frame, to obtain the image recognition model, where the image recognition model is used to recognize an object in an image to be recognized.
Optionally, the processing unit 702 is specifically configured to input the first point cloud data into a pre-trained lidar-based three-dimensional recognition model to obtain the three-dimensional object frame corresponding to the target object in the first point cloud data, where the three-dimensional recognition model is obtained by training an initial recognition model with the three-dimensional object labeling frame corresponding to each object in the second point cloud data.
Optionally, the processing unit 702 is specifically configured to acquire an RGB image corresponding to the first point cloud data; segment the RGB image to obtain a two-dimensional frame and a semantic mask; and determine the three-dimensional object frame corresponding to the target object in the first point cloud data according to the two-dimensional frame and the semantic mask.
Optionally, the processing unit 702 is specifically configured to determine, according to the two-dimensional frame and the semantic mask, third point cloud data corresponding to a target object in the first point cloud data; and determining a minimum three-dimensional bounding box covering third point cloud data corresponding to the target object, and determining the minimum three-dimensional bounding box as a three-dimensional object frame corresponding to the target object.
Optionally, the processing unit 702 is specifically configured to determine, according to the two-dimensional frame and the semantic mask, initial point cloud data corresponding to the target object; clustering the initial point cloud data corresponding to the target object to obtain a plurality of clusters; and determining the initial point cloud data in the cluster with the most initial point cloud data in the plurality of clusters as the third point cloud data.
The training apparatus 700 for the image recognition model according to the embodiment of the present application can execute the training method of the image recognition model shown in any one of the above embodiments; its implementation principle and beneficial effects are similar to those of the training method of the image recognition model and are not described herein again.
Fig. 8 is a schematic structural diagram of an apparatus 800 for an image recognition method according to an embodiment of the present application, and for example, referring to fig. 8, the apparatus 800 for the image recognition method may include:
an acquiring unit 801, configured to acquire an image to be recognized.
A processing unit 802, configured to input the image to be recognized into an image recognition model to obtain the object in the image to be recognized, where the image recognition model is obtained by training an initial image recognition model according to point cloud data and the three-dimensional object frame corresponding to a target object in the point cloud data, the point cloud data being collected by a laser radar device.
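At inference time the flow reduces to a single forward pass; a minimal sketch, assuming `model` is the trained monocular detector from the preceding embodiments:

```python
import torch

def recognize(model, image):
    """image: (3, H, W) RGB tensor of the image to be recognized."""
    model.eval()
    with torch.no_grad():
        # No lidar is needed at inference; the lidar only supervised training.
        objects = model(image.unsqueeze(0))   # e.g. 3D object frames
    return objects
```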
The image recognition apparatus 800 provided in this embodiment of the present application can execute the image recognition method shown in any of the above embodiments; its implementation principle and beneficial effects are similar to those of the image recognition method and are not repeated here.
Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application. Referring to Fig. 9, the server includes:
a memory 901, a processor 902, and a computer program stored in the memory 901 and executable on the processor 902. When executing the program, the processor 902 implements the training method of the image recognition model shown in any of the above embodiments; the implementation principle and beneficial effects are similar to those of the training method and are not repeated here.
Fig. 10 is a schematic structural diagram of a vehicle according to an embodiment of the present application. Referring to Fig. 10, the vehicle includes:
a memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002. When executing the program, the processor 1002 implements the image recognition method shown in any of the above embodiments; the implementation principle and beneficial effects are similar to those of the image recognition method and are not repeated here.
An embodiment of the present application further provides a readable storage medium storing a computer program. When the program is executed by a processor, the training method of the image recognition model shown in any of the above embodiments is implemented; the implementation principle and beneficial effects are similar to those of the training method and are not repeated here.
An embodiment of the present application further provides a computer program product, including a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium and execute it, so that the electronic device performs the training method of the image recognition model shown in any of the above embodiments; the implementation principle and beneficial effects are similar to those of the training method and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules within the processor.
The memory may include high-speed RAM, and may further include non-volatile memory (NVM), such as at least one magnetic disk; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The computer-readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (12)
1. A method for training an image recognition model, wherein the image recognition model is applied to a monocular detector, the method comprising:
acquiring first point cloud data acquired by laser radar equipment;
determining a three-dimensional object frame corresponding to a target object in the first point cloud data according to the first point cloud data;
and training an initial image recognition model according to the three-dimensional object frame to obtain the image recognition model, wherein the image recognition model is used for recognizing the object in the image to be recognized.
2. The method according to claim 1, wherein the determining, according to the first point cloud data, a three-dimensional object frame corresponding to a target object in the first point cloud data comprises:
inputting the first point cloud data into a pre-trained three-dimensional recognition model based on the laser radar to obtain the three-dimensional object frame corresponding to a target object in the first point cloud data, wherein the three-dimensional recognition model is obtained by training an initial recognition model with annotated three-dimensional object frames corresponding to each object in second point cloud data.
3. The method according to claim 1, wherein the determining, according to the first point cloud data, a three-dimensional object frame corresponding to a target object in the first point cloud data comprises:
acquiring an RGB image corresponding to the first point cloud data;
segmenting the RGB image to obtain a two-dimensional object frame and a semantic mask;
and determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the two-dimensional object frame and the semantic mask.
4. The method according to claim 3, wherein the determining a three-dimensional object frame corresponding to the target object in the first point cloud data according to the two-dimensional object frame and the semantic mask comprises:
determining third point cloud data corresponding to the target object in the first point cloud data according to the two-dimensional object frame and the semantic mask;
and determining a minimum three-dimensional bounding box covering the third point cloud data corresponding to the target object, and determining the minimum three-dimensional bounding box as the three-dimensional object frame corresponding to the target object.
5. The method according to claim 4, wherein the determining, according to the two-dimensional object frame and the semantic mask, third point cloud data corresponding to a target object in the first point cloud data comprises:
determining initial point cloud data corresponding to the target object according to the two-dimensional object frame and the semantic mask;
clustering the initial point cloud data corresponding to the target object to obtain a plurality of clusters;
and determining, as the third point cloud data, the initial point cloud data in the cluster that contains the most initial point cloud data among the plurality of clusters.
6. An image recognition method, comprising:
acquiring an image to be identified;
inputting the image to be recognized into an image recognition model to obtain an object in the image to be recognized, wherein the image recognition model is obtained by training an initial image recognition model according to point cloud data and a three-dimensional object frame corresponding to a target object in the point cloud data, and the point cloud data is acquired through laser radar equipment.
7. An apparatus for training an image recognition model, comprising:
the acquisition unit is used for acquiring first point cloud data acquired by laser radar equipment;
the processing unit is used for determining a three-dimensional object frame corresponding to a target object in the first point cloud data according to the first point cloud data;
and the training unit is used for training an initial image recognition model according to the three-dimensional object frame to obtain the image recognition model, and the image recognition model is used for recognizing the object in the image to be recognized.
8. An image recognition apparatus, comprising:
an acquisition unit, configured to acquire an image to be recognized;
a processing unit, configured to input the image to be recognized into an image recognition model to obtain an object in the image to be recognized, wherein the image recognition model is obtained by training an initial image recognition model according to point cloud data and a three-dimensional object frame corresponding to a target object in the point cloud data, and the point cloud data is acquired through laser radar equipment.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of training an image recognition model according to any one of claims 1 to 5 or a method of image recognition according to claim 6.
10. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of training an image recognition model according to any one of claims 1 to 5 when executing the program.
11. A vehicle comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image recognition method of claim 6 when executing the program.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method of training an image recognition model of any one of claims 1-5 or the method of image recognition of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110421118.1A (published as CN113255444A) | 2021-04-19 | 2021-04-19 | Training method of image recognition model, image recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113255444A (en) | 2021-08-13 |
Family
ID=77221099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110421118.1A (CN113255444A, Pending) | 2021-04-19 | 2021-04-19 | Training method of image recognition model, image recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255444A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190096086A1 (en) * | 2017-09-22 | 2019-03-28 | Zoox, Inc. | Three-Dimensional Bounding Box From Two-Dimensional Image and Point Cloud Data |
CN109344812A (en) * | 2018-11-27 | 2019-02-15 | 武汉大学 | A kind of improved single photon point cloud data denoising method based on cluster |
WO2021009258A1 (en) * | 2019-07-15 | 2021-01-21 | Promaton Holding B.V. | Object detection and instance segmentation of 3d point clouds based on deep learning |
CN111160214A (en) * | 2019-12-25 | 2020-05-15 | 电子科技大学 | 3D target detection method based on data fusion |
CN112200129A (en) * | 2020-10-28 | 2021-01-08 | 中国人民解放军陆军航空兵学院陆军航空兵研究所 | Three-dimensional target detection method and device based on deep learning and terminal equipment |
CN112462347A (en) * | 2020-12-28 | 2021-03-09 | 长沙理工大学 | Laser radar point cloud rapid classification filtering algorithm based on density clustering |
Non-Patent Citations (1)
Title |
---|
SONG Yifan, ZHANG Peng, ZONG Libo, et al.: "Improved 3D object detection method based on redundant point filtering", Journal of Computer Applications * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114898354A (en) * | 2022-03-24 | 2022-08-12 | 中德(珠海)人工智能研究院有限公司 | Measuring method and device based on three-dimensional model, server and readable storage medium |
CN115880536A (en) * | 2023-02-15 | 2023-03-31 | 北京百度网讯科技有限公司 | Data processing method, training method, target object detection method and device |
CN115880536B (en) * | 2023-02-15 | 2023-09-01 | 北京百度网讯科技有限公司 | Data processing method, training method, target object detection method and device |
WO2024199378A1 (en) * | 2023-03-30 | 2024-10-03 | 北京罗克维尔斯科技有限公司 | Obstacle feature recognition model training method and apparatus, device, and storage medium |
CN116912238A (en) * | 2023-09-11 | 2023-10-20 | 湖北工业大学 | Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion |
CN116912238B (en) * | 2023-09-11 | 2023-11-28 | 湖北工业大学 | Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110148196B (en) | Image processing method and device and related equipment | |
CN109087510B (en) | Traffic monitoring method and device | |
CN113255444A (en) | Training method of image recognition model, image recognition method and device | |
CN112528878A (en) | Method and device for detecting lane line, terminal device and readable storage medium | |
CN112949782A (en) | Target detection method, device, equipment and storage medium | |
CN109840463B (en) | Lane line identification method and device | |
CN110796104A (en) | Target detection method and device, storage medium and unmanned aerial vehicle | |
CN111742344A (en) | Image semantic segmentation method, movable platform and storage medium | |
CN111091023A (en) | Vehicle detection method and device and electronic equipment | |
CN112683228A (en) | Monocular camera ranging method and device | |
CN117037103A (en) | Road detection method and device | |
CN114972758A (en) | Instance segmentation method based on point cloud weak supervision | |
CN114913340A (en) | Parking space detection method, device, equipment and storage medium | |
CN114241448A (en) | Method and device for obtaining heading angle of obstacle, electronic equipment and vehicle | |
CN113435350A (en) | Traffic marking detection method, device, equipment and medium | |
CN112733678A (en) | Ranging method, ranging device, computer equipment and storage medium | |
CN116309943B (en) | Parking lot semantic map road network construction method and device and electronic equipment | |
CN112115737B (en) | Vehicle orientation determining method and device and vehicle-mounted terminal | |
CN112509321A (en) | Unmanned aerial vehicle-based driving control method and system for urban complex traffic situation and readable storage medium | |
CN116778262A (en) | Three-dimensional target detection method and system based on virtual point cloud | |
CN113902927B (en) | Comprehensive information processing method fusing image and point cloud information | |
CN115578703A (en) | Laser perception fusion optimization method, device and equipment and readable storage medium | |
CN115346081A (en) | Power transmission line point cloud data classification method based on multi-data fusion | |
CN114724119A (en) | Lane line extraction method, lane line detection apparatus, and storage medium | |
CN111611942B (en) | Method for extracting and building database by perspective self-adaptive lane skeleton |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210813 |