
CN114863071A - Target object labeling method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114863071A
Authority
CN
China
Prior art keywords
target
vertex
dimensional scene
target object
scene image
Prior art date
Legal status
Pending
Application number
CN202210503463.4A
Other languages
Chinese (zh)
Inventor
王帅
王剑
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202210503463.4A
Publication of CN114863071A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/004 Annotating, labelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure relates to a target object labeling method and device, a storage medium, and an electronic device. The method comprises the following steps: in response to receiving a labeling request for a target object, acquiring a target three-dimensional scene image, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object; acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs; determining second position coordinates of target vertices in a two-dimensional image corresponding to a rendering camera according to configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices; and determining position labeling information of the target three-dimensional scene image related to a target task according to the second position coordinates of the target vertices. In this way, more diverse and complex labeling information can be constructed by simulating a real environment, the labeling information of the target object can be effectively acquired, and a rich data set can be obtained.

Description

Target object labeling method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a target object labeling method and apparatus, a storage medium, and an electronic device.
Background
With the continuous development of artificial intelligence technology and the increasing demand for image processing, labeling technology is applied ever more widely in fields such as target tracking and target detection: a picture is input into a model, the model outputs a labeling result for an object in the picture, and target tracking or detection is thereby realized. For machine learning based on images, and for the training, testing, and verification of neural network models, rich image data sets and a large number of target object labels are the basis for obtaining models with high tracking or detection accuracy. Therefore, how to enrich the image data sets and the labeling information is a key research point in machine learning tasks and in the training, testing, and verification of neural network models.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a target object labeling method, including:
responding to a received labeling request aiming at a target object, and acquiring a target three-dimensional scene image, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object;
acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
determining second position coordinates of the target vertexes in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of the vertexes;
and determining position labeling information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
In a second aspect, the present disclosure provides a target object labeling apparatus, the apparatus comprising:
a first obtaining module, configured to obtain a target three-dimensional scene image in response to receiving a labeling request for a target object, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object;
the second acquisition module is used for acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
a first determining module, configured to determine, according to configuration information of a rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices, second position coordinates of the target vertices in a two-dimensional image corresponding to the rendering camera;
and the second determining module is used for determining the position marking information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having at least one computer program stored thereon;
at least one processing device for executing the at least one computer program in the storage device to implement the steps of the method in the first aspect.
According to the above technical solution, the first position coordinates of each vertex of the virtual target object model in the target three-dimensional scene image, expressed in the world coordinate system to which the virtual target object model belongs, are converted into the second position coordinates of the target vertices in the two-dimensional image corresponding to the rendering camera. That is, the position information of the target object in the three-dimensional scene within the two-dimensional image corresponding to the rendering camera is obtained through spatial position conversion, and the position labeling information of the target object is then determined according to this position information. In this way, more diverse and complex labeling information can be constructed by simulating a real environment, the labeling information of the target object can be effectively acquired, and a rich data set can be obtained.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram illustrating a method for target object annotation in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating a virtual target object model, a virtual environment object model, and a rendering camera in a three-dimensional scene image in accordance with an exemplary embodiment;
FIG. 3 is a two-dimensional image corresponding to the rendering camera shown in FIG. 2;
FIG. 4 is a block diagram illustrating a target object annotation device in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In the related art, enriching image data sets and annotation information faces the following challenges. On one hand, due to privacy protection issues or the restrictions of non-universal scenes, image acquisition is becoming increasingly difficult, which restricts the diversity of image detection environments and the quality of data sets. On the other hand, traditional image labeling work requires a large amount of manual intervention, or pre-labeling is performed by means of machine learning or deep learning; the pre-labeling approach depends heavily on the transfer performance of the model used, and the labeling of a brand-new data set still needs to be corrected by later manual intervention, so it suffers from low efficiency and large errors.
In addition, there is a method of creating a data set by generating virtual target images, in which a texture map is added to a 3D model. Each pose and each angle of an object is modeled with 3D software (e.g., 3DMAX software, Maya software, Poser software, etc.), and a series of pictures of the object is then generated by rendering to serve as a data set.
This method improves on purely relying on traditional real-image acquisition and can generate richer virtual image data sets. However, it currently has the following limitations. On one hand, only the target object itself is simply modeled and the rendered pictures are taken directly as the data set without labeling, so the purpose is single and the generality is weak. On the other hand, it cannot well simulate the comprehensive relationship between light and material in a real environment, such as the reflection and scattering of ambient light, the brightness and definition of the target object, and the perspective angle and degree of occlusion of the target object in space, so the generated images deviate considerably in their features from image data collected in a real environment. Therefore, this method cannot accurately acquire the labeling information of the target object and thus cannot produce a richer data set.
In view of this, the present disclosure provides a target object labeling method, device, storage medium and electronic device, so as to effectively acquire labeling information of a target object, and further acquire a rich data set.
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
All actions of acquiring signals, information, or data in the present disclosure are performed in compliance with the applicable data protection laws and policies of the country where they take place, and with the authorization given by the owner of the corresponding device.
Fig. 1 is a flowchart illustrating a target object labeling method according to an exemplary embodiment. As shown in fig. 1, the method may include the following steps.
In step S11, in response to receiving an annotation request for the target object, a target three-dimensional scene image is acquired. The target three-dimensional scene image comprises a virtual target object model created for a target object.
It should be understood that, with the update and iteration of hardware such as GPUs and CPUs and the upgrading of the algorithmic capabilities of software such as three-dimensional graphics renderers, the rendering of three-dimensional virtual scenes has achieved qualitative improvements in aspects such as light and material, so that a virtual scene can reach a photo-level display effect. It should also be understood that a variety of three-dimensional scene images can be obtained simply by setting different behavior or posture parameters of the target object, different configuration information of the rendering camera, and different environment information. In view of the above, the present disclosure labels the target object based on three-dimensional scene images.
In step S12, the first position coordinates of each vertex of the virtual target object model in the world coordinate system to which the virtual target object model belongs are acquired.
In the present disclosure, the first position coordinates of each vertex of the virtual target object model may be obtained through an API (Application Program Interface) embedded in the three-dimensional modeling software. Illustratively, a third script tool used to traverse each vertex of the virtual target object model is obtained, and, in the target three-dimensional scene image, each vertex of the virtual target object model is traversed according to the third script tool to obtain the first position coordinates of each vertex in the world coordinate system to which the virtual target object model belongs. For example, a GetObjectAllVectorsInWorld3DAxis() function is called to obtain the first position coordinates of each vertex in the world coordinate system. The world coordinate system is the global coordinate system of the three-dimensional scene in which the coordinate information of the target object is expressed, and the first position coordinates are three-dimensional coordinates.
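As a minimal sketch of such a traversal (the engine API behind GetObjectAllVectorsInWorld3DAxis() is not disclosed in the patent, so obj.mesh_vertices and obj.world_matrix below are assumed, hypothetical attributes):

import numpy as np

def get_object_all_vertices_in_world_3d_axis(obj):
    # obj.mesh_vertices: assumed (N, 3) object-local vertex array;
    # obj.world_matrix: assumed 4 x 4 local-to-world transform.
    local = np.asarray(obj.mesh_vertices, dtype=np.float64)
    homogeneous = np.hstack([local, np.ones((local.shape[0], 1))])  # (N, 4)
    world = homogeneous @ np.asarray(obj.world_matrix).T            # local -> world
    return world[:, :3]  # first position coordinates V(g)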
In step S13, the second position coordinates of the target vertices in the two-dimensional image corresponding to the rendering camera are determined based on the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices.
In an embodiment, the target vertices may be each of the vertices determined in step S12. Accordingly, step S13 may be implemented as follows: according to the configuration information of the rendering camera in the target three-dimensional scene image, a first script tool for converting three-dimensional coordinates into two-dimensional coordinates is obtained, and the first script tool is called to convert the first position coordinates of each vertex into the second position coordinates of each vertex in the two-dimensional image corresponding to the rendering camera.
In another embodiment, the target vertices are the vertices visible in the two-dimensional image corresponding to the rendering camera, that is, the vertices not occluded by environmental objects. Accordingly, step S13 may be implemented as follows: according to the configuration information of the rendering camera in the target three-dimensional scene image, a second script tool for determining, among the first position coordinates of the vertices, the first position coordinates of the visible vertices is obtained, and the second script tool is called to obtain the first position coordinates of the visible vertices; the first script tool for converting three-dimensional coordinates into two-dimensional coordinates is then called to convert the first position coordinates of the visible vertices into second position coordinates in the two-dimensional image corresponding to the rendering camera.
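The patent does not specify how visibility is decided; one common approach, sketched here purely as an assumption, is a depth-buffer comparison in which a vertex counts as visible when its depth does not exceed the rendered depth at its projected pixel:

import numpy as np

def get_visible_vertices(world_pts, project_fn, depth_map, eps=1e-3):
    # world_pts: (N, 3) first position coordinates V(g);
    # project_fn: assumed callable mapping a world point to (u, v, depth);
    # depth_map: (H, W) depth buffer rendered from the same camera.
    h, w = depth_map.shape
    visible = []
    for p in world_pts:
        u, v, depth = project_fn(p)
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < w and 0 <= vi < h and depth <= depth_map[vi, ui] + eps:
            visible.append(p)  # not occluded by any environmental object
    return np.asarray(visible)  # first position coordinates V(g') of the visible vertices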
In yet another embodiment, the target vertices are both the vertices determined in step S12 and the vertices visible in the two-dimensional image corresponding to the rendering camera. Exemplarily, the pseudo code for converting the first position coordinates V(g) of each vertex into the second position coordinates V(c) of each vertex in the two-dimensional image corresponding to the rendering camera, and into the second position coordinates V(c') of the vertices visible in that image, is as follows:
input:  given vertex set V(g) in world axis;
output: vertex set V(c) in camera local 2D axis;
V(c)  ← ConvertToCamera2DAxis(c, V(g))
V(g') ← GetVisibleVertices(c, V(g))
V(c') ← ConvertToCamera2DAxis(c, V(g'))
wherein c denotes the rendering camera to which the two-dimensional image corresponds, and V(g') denotes the first position coordinates of the visible vertices.
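The conversion formula itself is not given in the patent; under a standard pinhole-camera model, ConvertToCamera2DAxis could look like the following sketch, where the rotation R, translation t, and intrinsics (fx, fy, cx, cy) stand in for the rendering camera's configuration information:

import numpy as np

def convert_to_camera_2d_axis(world_pts, R, t, fx, fy, cx, cy):
    # world_pts: (N, 3) first position coordinates V(g) in the world coordinate system.
    cam = world_pts @ R.T + t        # world -> camera coordinates
    z = cam[:, 2:3]                  # depth along the optical axis
    uv = cam[:, :2] / z              # perspective division
    u = fx * uv[:, 0] + cx           # pixel column
    v = fy * uv[:, 1] + cy           # pixel row
    return np.stack([u, v], axis=1)  # (N, 2) second position coordinates V(c)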
In step S14, position labeling information of the target three-dimensional scene image related to the target task is determined based on the second position coordinates of the target vertex.
By adopting the above technical solution, the first position coordinates of each vertex of the virtual target object model in the target three-dimensional scene image, expressed in the world coordinate system to which the virtual target object model belongs, are converted into the second position coordinates of the target vertices in the two-dimensional image corresponding to the rendering camera. That is, the position information of the target object in the three-dimensional scene within the two-dimensional image corresponding to the rendering camera is obtained through spatial position conversion, and the position labeling information of the target object is then determined according to this position information. In this way, more diverse and complex labeling information can be constructed by simulating a real environment, the labeling information of the target object can be effectively acquired, and a rich data set can be obtained.
For example, before acquiring the target three-dimensional scene image, the target object labeling method may further include:
first, in response to receiving a request for constructing a three-dimensional scene image of a target object, a virtual target object model is created for the target object, and a virtual environment object model is created for a preset environment object. For example, if the target object is an object in an outdoor environment, the environmental objects may include, but are not limited to: buildings, vegetation, static obstacles, vehicles, and the like. Among them, a large number of built-in or usable models are provided in existing three-dimensional content authoring software and real-time rendering software, and therefore, in the present disclosure, a virtual target object model can be created for a target object and a virtual environment object model can be created for a preset environment object through the three-dimensional content authoring software and the real-time rendering software. Therefore, the workload of constructing the three-dimensional scene image can be reduced, and the efficiency of constructing the three-dimensional scene image is improved.
Then, animations of the virtual target object model and the virtual environment object model and configuration information of the rendering camera are acquired to obtain multiple frames of three-dimensional scene images.
For example, a user may set, according to actual needs, the animations of the virtual target object model and the virtual environment object model and the configuration information of the rendering camera on a human-computer interaction interface, and the device executing the target object labeling method may then acquire them. The animations realize position changes of the virtual target object model and the virtual environment object model, which yields rich target object position information and helps to diversify the data set and expand its range of application. The configuration information of the rendering camera may include information such as the position, angle, and focal length of the rendering camera; it determines the relative position of the target object and the environmental objects, the perspective, and the light and material calculation, occlusion condition, degree of blur, and the like finally presented in the camera's rendered picture in the subsequently generated three-dimensional scene image. Therefore, in the present disclosure, multiple frames of three-dimensional scene images may be obtained by setting the animations of the virtual target object model and the virtual environment object model and the configuration information of different rendering cameras.
It should be understood that one or more rendering cameras may be included in each frame of the three-dimensional scene image, as the present disclosure is not limited in this respect. For convenience of description, the following description will be given by taking an example in which each frame of three-dimensional scene image includes one rendering camera.
For example, with the positions of the virtual target object model and the virtual environment object model fixed, different three-dimensional scene images can be obtained by adjusting the configuration information of the rendering camera. For another example, with the configuration information of the rendering camera fixed, different three-dimensional scene images can be obtained by changing the position of the virtual target object model and/or the virtual environment object model. For yet another example, the configuration information of the rendering camera and the positions of the virtual target object model and the virtual environment object model may be adjusted simultaneously to obtain different three-dimensional scene images.
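A small sketch of enumerating rendering-camera configurations is given below; the parameter names and values are assumptions for illustration only, since the patent states only that position, angle, and focal length belong to the configuration information:

import itertools

camera_positions = [(0.0, 2.0, 5.0), (3.0, 2.0, 5.0)]  # example positions
camera_angles = [(0.0, 0.0, 0.0), (0.0, 15.0, 0.0)]    # example Euler angles (degrees)
focal_lengths = [35, 50]                               # example focal lengths (mm)

camera_configs = [
    {"position": p, "angle": a, "focal_length": f}
    for p, a, f in itertools.product(camera_positions, camera_angles, focal_lengths)
]  # each configuration yields one rendered three-dimensional scene image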
Therefore, in the above manner, multiple frames of three-dimensional scene images including the virtual target object model and the virtual environment object model can be obtained, and these three-dimensional scene images can ideally present the mutual influence between the various objects in the scene (target objects and environmental objects), such as light, material, occlusion, and definition, thereby better approximating the effect of the target object in a real scene.
Correspondingly, in response to receiving a labeling request for the target object, a specific way of acquiring the target three-dimensional scene image is: in response to receiving the labeling request for the target object, sequentially determining each frame of three-dimensional scene image as the target three-dimensional scene image. That is, for each frame of three-dimensional scene image, the position labeling information related to the target task can be determined according to the target object labeling method provided by the present disclosure. This enriches the data of target three-dimensional scene images, so that labeling information for a large number of target objects can be acquired.
Next, specific embodiments in which the position labeling information of the target three-dimensional scene image related to the target task is determined based on the second position coordinates of the target vertices in step S14 in fig. 1 will be described.
In one embodiment, the target task is a segmentation task. Step S14 may be implemented as follows: the second position coordinates of the target boundary vertices of the virtual target object model in the two-dimensional image corresponding to the rendering camera are determined according to the second position coordinates of the target vertices, and the second position coordinates of the target boundary vertices are determined as the position labeling information of the target three-dimensional scene image related to the segmentation task.
It should be noted that, in this embodiment, the target vertices may be each of the vertices in step S12 and/or the visible vertices, which is not specifically limited by the present disclosure. When the target vertices are the visible vertices, the target boundary vertices are correspondingly the visible boundary vertices. The following description takes the target vertices as both each vertex and the visible vertices, and the target boundary vertices as both the boundary vertices and the visible boundary vertices, as an example.
Illustratively, the second position coordinates V(s) of each boundary vertex of the virtual target object model in the two-dimensional image corresponding to the rendering camera are calculated based on the second position coordinates V(c) of each vertex, and the second position coordinates V(s') of the visible boundary vertices of the virtual target object model in that two-dimensional image are calculated based on the second position coordinates V(c') of the visible vertices. For example, the second position coordinates V(s) and V(s') of the target boundary vertices can be obtained by using a graph contour search algorithm, such as the contour-finding algorithm findContours() provided in the OpenCV library. The pseudo code for determining the second position coordinates of the target boundary vertices through a BasicFindContours() routine is as follows:
input:  given vertex sets V(c) and V(c') in camera local 2D axis;
output: vertex sets V(s) and V(s') for object contours;
V(s)  ← BasicFindContours(V(c))
V(s') ← BasicFindContours(V(c'))
In this embodiment, the determined second position coordinates V(s) and V(s') of the target boundary vertices are the position labeling information of the target three-dimensional scene image related to the segmentation task. That is, V(s) and V(s') may be used as position labeling information for training, testing, and verifying a segmentation model, and a sample data set for training, testing, and verifying the segmentation model may subsequently be constructed from them.
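For concreteness, a minimal sketch of such a contour step with OpenCV's findContours is given below; rasterizing the projected vertices into a binary mask via their convex hull is a simplification assumed here (real silhouettes may be non-convex), as the surrounding text only names a contour search algorithm:

import cv2
import numpy as np

def basic_find_contours(pixel_pts, image_size):
    # pixel_pts: (N, 2) second position coordinates V(c) or V(c');
    # image_size: (height, width) of the rendering camera's two-dimensional image.
    mask = np.zeros(image_size, dtype=np.uint8)
    pts = np.round(np.asarray(pixel_pts)).astype(np.int32)
    hull = cv2.convexHull(pts)                   # fill the projected silhouette
    cv2.fillPoly(mask, [hull], 255)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c.reshape(-1, 2) for c in contours]  # boundary vertex coordinates V(s)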
In another embodiment, the target task is a detection task, and step S14 may be implemented as follows: the minimum value and maximum value on the first coordinate axis and the minimum value and maximum value on the second coordinate axis are respectively determined according to the second position coordinates of the target vertices; the minimum value on the first coordinate axis together with the minimum value on the second coordinate axis is determined as the position coordinate of the minimum-value vertex, and the maximum value on the first coordinate axis together with the maximum value on the second coordinate axis is determined as the position coordinate of the maximum-value vertex; and the position coordinate of the minimum-value vertex and the position coordinate of the maximum-value vertex are determined as the position labeling information of the target three-dimensional scene image related to the detection task.
Similarly, in this embodiment, the target vertices may be each of the vertices in step S12 and/or the visible vertices, which is not specifically limited by the present disclosure. The following description takes the target vertices as both each vertex and the visible vertices as an example.
Illustratively, the second position coordinates V(b) of the extreme vertices are calculated based on the second position coordinates V(c) of each vertex, and the second position coordinates V(b') of the visible extreme vertices are calculated based on the second position coordinates V(c') of the visible vertices, where the extreme vertices comprise the position coordinate of the maximum-value vertex and the position coordinate of the minimum-value vertex. The pseudo code for determining the second position coordinates V(b) of the extreme vertices and the second position coordinates V(b') of the visible extreme vertices is as follows:
input:  given contour vertex sets V(c) and V(c');
output: bounding boxes V(b) and V(b');
V(b)  ← BasicGetBoundingBoxes(V(c))
V(b') ← BasicGetBoundingBoxes(V(c'))
It should be noted that the manner of obtaining the second position coordinates V(b) of the extreme vertices is similar to the manner of obtaining the second position coordinates V(b') of the visible extreme vertices, so only V(b) is taken as an example below.
[The pseudo code of BasicGetBoundingBoxes() appears only as figures in the original publication and is not reproduced here.]
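Because those figures are unavailable, the following sketch reconstructs BasicGetBoundingBoxes() from the description in this embodiment (the minima and maxima on the two coordinate axes form the minimum-value and maximum-value vertices); it is an inference, not the patent's verbatim algorithm:

import numpy as np

def basic_get_bounding_boxes(pixel_pts):
    # pixel_pts: (N, 2) second position coordinates V(c), or V(c') for visible vertices.
    pts = np.asarray(pixel_pts)
    x_min, y_min = pts.min(axis=0)  # minimum values on the first and second coordinate axes
    x_max, y_max = pts.max(axis=0)  # maximum values on the first and second coordinate axes
    return (x_min, y_min), (x_max, y_max)  # minimum-value vertex, maximum-value vertex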
Similarly, if the input vertex set V(c) is replaced by the second position coordinates V(c') of the visible vertices, the second position coordinates V(b') of the visible extreme vertices can be obtained in the same way.
In this embodiment, the determined second position coordinates V(b) of the extreme vertices and the second position coordinates V(b') of the visible extreme vertices are the position labeling information of the target three-dimensional scene image related to the detection task. That is, V(b) and V(b') may be used as position labeling information for training, testing, and verifying a detection model, and a sample data set for training, testing, and verifying the detection model may subsequently be constructed from V(b) and V(b').
It should be understood that, in the present disclosure, different position coordinates may be obtained according to actual requirements. For example, the first position coordinates V(g) of each vertex, the first position coordinates V(g') of the visible vertices, the second position coordinates V(c) of each vertex, the second position coordinates V(c') of the visible vertices, the second position coordinates V(s) and V(s') of the target boundary vertices, and the second position coordinates V(b) of the extreme vertices and V(b') of the visible extreme vertices may all be obtained simultaneously. Alternatively, only some of the position coordinates may be acquired, for example only V(g), V(c'), V(s'), and V(b'). The present disclosure does not specifically limit this.
According to the above manner, the position labeling information of each target three-dimensional scene image related to the target task can be obtained, and the position labeling information can then be stored.
Illustratively, first, for each target three-dimensional scene image, basic information of a virtual target object model, configuration information of a rendering camera and frame information of the target three-dimensional scene image are acquired and used as tag information, and an annotation set of the target three-dimensional scene image is generated according to the tag information and position annotation information of the target three-dimensional scene image, so as to obtain a plurality of annotation sets.
The basic information of the virtual target object model may include, but is not limited to, basic information such as the name, center position, and scaling size of the target object. The configuration information of the rendering camera may include, but is not limited to, basic information such as the rendering camera's name, position, focal length, and resolution. The frame information of the target three-dimensional scene image may include, but is not limited to, basic information such as the frame number, frame interval, and frame rate. In the present disclosure, one annotation set may be obtained for each target three-dimensional scene image; accordingly, when there are multiple target three-dimensional scene images, multiple annotation sets may be obtained.
Then, the three-dimensional scene identifier corresponding to the multiple frames of three-dimensional scene images is stored in association with the plurality of annotation sets as an annotation information file.
It should be understood that, in the present disclosure, the multiple frames of three-dimensional scene images belong to the same three-dimensional scene: they are obtained only by changing the positions of the target object and the environmental objects and the configuration information of the rendering camera, and therefore share the same three-dimensional scene identifier. Accordingly, in the present disclosure, the three-dimensional scene identifier corresponding to the multiple frames of three-dimensional scene images is stored in association with the plurality of annotation sets determined above.
Illustratively, the pseudo code for storing the three-dimensional scene identifiers corresponding to the multiple frames of three-dimensional scene images in association with the determined multiple annotation sets is as follows:
[This pseudo code appears only as a figure in the original publication and is not reproduced here.]
In the pseudo code, meta_info represents the basic information of the three-dimensional scene corresponding to the multiple frames of three-dimensional scene images, that is, the three-dimensional scene identifier; object_basic_info represents the basic information of the virtual target object model; camera_basic_info represents the configuration information of the rendering camera; and frame_basic_info represents the frame information of the target three-dimensional scene image.
In the present disclosure, in order to give the stored annotation information file sufficient flexibility and easy maintainability, it can be saved in a text-based format such as yaml, json, or xml. This facilitates subsequent secondary operations on the annotation information, such as loading, adding, deleting, and adjusting labels within a specified range.
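As an illustration of what such a text-based file might contain, the sketch below assembles the fields named in the preceding pseudo code (meta_info, object_basic_info, camera_basic_info, frame_basic_info) into a Python dictionary and saves it as yaml with PyYAML; the concrete layout and values are assumptions, since the patent's example file is shown only as figures:

import yaml  # PyYAML

annotation_info_file = {
    "meta_info": {"scene_id": "scene_001"},  # three-dimensional scene identifier
    "annotation_sets": [
        {
            "object_basic_info": {"name": "cube", "center": [0.0, 0.0, 0.0], "scale": 1.0},
            "camera_basic_info": {"name": "camera_0", "position": [0.0, 2.0, 5.0],
                                  "focal_length": 50, "resolution": [1920, 1080]},
            "frame_basic_info": {"frame_number": 0, "frame_interval": 1, "frame_rate": 30},
            "position_annotations": {
                "contour_vertices_Vs": [[10, 12], [10, 80], [90, 80], [90, 12]],
                "bounding_box_Vb": [[10, 12], [90, 80]],
            },
        }
    ],
}

with open("annotations.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(annotation_info_file, f, sort_keys=False)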
FIG. 2 is a schematic diagram illustrating a virtual target object model, a virtual environment object model, and a rendering camera in a three-dimensional scene image, according to an example embodiment. As shown in fig. 2, the target object is a cube and the environment object is a sphere. Fig. 3 is the two-dimensional image corresponding to the rendering camera shown in fig. 2. The pseudo code of the labeling process for the target object cube is as follows, with the resulting annotation information file stored as text in yaml format.
[The pseudo code of the labeling process and the resulting yaml annotation file appear only as figures in the original publication and are not reproduced here.]
With the labeling method provided by the present disclosure, more diverse and complex labeling information can be produced by simulating a real environment, which greatly improves the richness and quality of later data set production and brings great assistance and convenience to the training and test verification of model algorithms. In addition, the labeling method can be further extended to produce various labeling tools, model training and test verification tools, pipeline tools, and other applications.
Based on the same concept, the present disclosure provides a target object labeling apparatus. FIG. 4 is a block diagram illustrating a target object annotation device in accordance with an exemplary embodiment. As shown in fig. 4, the target object labeling apparatus 500 may include:
a first obtaining module 501, configured to, in response to receiving an annotation request for a target object, obtain a target three-dimensional scene image, where the target three-dimensional scene image includes a virtual target object model created for the target object;
a second obtaining module 502, configured to obtain a first position coordinate of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
a first determining module 503, configured to determine, according to configuration information of a rendering camera in the target three-dimensional scene image and the first position coordinates of each vertex, second position coordinates of a target vertex in a two-dimensional image corresponding to the rendering camera;
a second determining module 504, configured to determine, according to the second position coordinate of the target vertex, position annotation information of the target three-dimensional scene image, which is related to the target task.
Optionally, the apparatus 500 further comprises:
a creating module, configured to, in response to receiving a request for constructing a three-dimensional scene image of a target object, create a virtual target object model for the target object and create a virtual environment object model for a preset environment object;
a third obtaining module, configured to acquire animations of the virtual target object model and the virtual environment object model and configuration information of a rendering camera, so as to obtain multiple frames of three-dimensional scene images;
the first obtaining module comprises:
a first determining submodule, configured to, in response to receiving a labeling request for the target object, sequentially determine each frame of three-dimensional scene image as the target three-dimensional scene image.
Optionally, the apparatus 500 further comprises:
a fourth obtaining module, configured to obtain, for each target three-dimensional scene image, basic information of the virtual target object model, configuration information of the rendering camera, and frame information of the target three-dimensional scene image as tag information, and generate an annotation set of the target three-dimensional scene image according to the tag information and the position annotation information of the target three-dimensional scene image to obtain multiple annotation sets;
a storage module, configured to store the three-dimensional scene identifier corresponding to the multiple frames of three-dimensional scene images in association with the plurality of annotation sets as an annotation information file.
Optionally, the target vertices are each of the vertices; the first determining module 503 includes:
a first obtaining submodule, configured to obtain, according to the configuration information of the rendering camera in the target three-dimensional scene image, a first script tool for converting three-dimensional coordinates into two-dimensional coordinates;
a first calling submodule, configured to call the first script tool to convert the first position coordinates of each vertex into the second position coordinates of each vertex in the two-dimensional image corresponding to the rendering camera.
Optionally, the target vertex is a vertex visible in the two-dimensional image corresponding to the rendering camera;
the first determining module 503 includes:
a second obtaining submodule, configured to obtain, according to the configuration information of the rendering camera in the target three-dimensional scene image, a second script tool for determining the first position coordinates of the visible vertices among the first position coordinates of the vertices, and to call the second script tool to obtain the first position coordinates of the visible vertices;
a second calling submodule, configured to call the first script tool for converting three-dimensional coordinates into two-dimensional coordinates, so as to convert the first position coordinates of the visible vertices into second position coordinates in the two-dimensional image corresponding to the rendering camera.
Optionally, the target task is a segmentation task; the second determining module 504 includes:
a second determining submodule, configured to determine, according to the second position coordinates of the target vertices, the second position coordinates of the target boundary vertices of the virtual target object model in the two-dimensional image corresponding to the rendering camera;
a third determining submodule, configured to determine the second position coordinates of the target boundary vertices as the position labeling information of the target three-dimensional scene image related to the segmentation task.
Optionally, the target task is a detection task, and the second determining module 504 includes:
a fourth determining submodule, configured to respectively determine, according to the second position coordinates of the target vertices, the minimum value and maximum value on the first coordinate axis and the minimum value and maximum value on the second coordinate axis;
a fifth determining submodule, configured to determine the minimum value on the first coordinate axis and the minimum value on the second coordinate axis as the position coordinate of the minimum-value vertex, and determine the maximum value on the first coordinate axis and the maximum value on the second coordinate axis as the position coordinate of the maximum-value vertex;
a sixth determining submodule, configured to determine the position coordinate of the minimum-value vertex and the position coordinate of the maximum-value vertex as the position labeling information of the target three-dimensional scene image related to the detection task.
Optionally, the second obtaining module 502 includes:
a third obtaining submodule, configured to obtain a third script tool for traversing each vertex of the virtual target object model;
a traversing submodule, configured to traverse, according to the third script tool, each vertex of the virtual target object model in the target three-dimensional scene image, so as to obtain the first position coordinates of each vertex in the world coordinate system to which the virtual target object model belongs.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Based on the same concept, the embodiments of the present disclosure further provide a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processing apparatus, implements the steps of any one of the above-mentioned target object labeling methods.
Based on the same concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage device having at least one computer program stored thereon;
at least one processing device, configured to execute the at least one computer program in the storage device to implement the steps of any one of the above target object labeling methods.
Referring now to FIG. 5, shown is a block diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: responding to a received labeling request aiming at a target object, and acquiring a target three-dimensional scene image, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object; acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs; determining second position coordinates of the target vertexes in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of the vertexes; and determining position labeling information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a target object labeling method according to one or more embodiments of the present disclosure, including:
responding to a received labeling request aiming at a target object, and acquiring a target three-dimensional scene image, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object;
acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
determining second position coordinates of the target vertices in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices;
and determining position labeling information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
Example 2 provides the method of example 1, further including, in accordance with one or more embodiments of the present disclosure:
in response to receiving a request for constructing a three-dimensional scene image of a target object, creating a virtual target object model for the target object and creating a virtual environment object model for a preset environment object;
acquiring animations of the virtual target object model and the virtual environment object model and configuration information of a rendering camera to obtain a plurality of frames of three-dimensional scene images;
the acquiring the target three-dimensional scene image in response to receiving the annotation request aiming at the target object comprises the following steps:
and in response to receiving an annotation request aiming at the target object, sequentially determining each frame of three-dimensional scene image as a target three-dimensional scene image.
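Neither the rendering engine nor the scripting language is fixed by example 2. Purely as a hedged illustration of what capturing "a plurality of frames of three-dimensional scene images" together with the rendering camera's configuration could produce, the Python sketch below records one frame record per animation step; CameraConfig, Frame, and capture_animation are invented names and not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class CameraConfig:
    position: Tuple[float, float, float]  # world-space camera position
    rotation: Tuple[float, float, float]  # Euler angles in degrees
    fov_deg: float                        # vertical field of view

@dataclass
class Frame:
    index: int            # frame information for the three-dimensional scene image
    camera: CameraConfig  # rendering-camera configuration at this frame

def capture_animation(num_frames: int,
                      camera_for_frame: Callable[[int], CameraConfig]) -> List[Frame]:
    # One record per animation step: the frame index plus the rendering
    # camera configuration in force when the frame was captured.
    return [Frame(i, camera_for_frame(i)) for i in range(num_frames)]

# Toy usage: a camera dollying backwards over three frames.
frames = capture_animation(3, lambda i: CameraConfig((0.0, 1.5, -4.0 - i),
                                                     (0.0, 0.0, 0.0), 60.0))
```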
Example 3 provides the method of example 2, further comprising, in accordance with one or more embodiments of the present disclosure:
for each target three-dimensional scene image, acquiring basic information of the virtual target object model, configuration information of the rendering camera and frame information of the target three-dimensional scene image as label information, and generating an annotation set of the target three-dimensional scene image according to the label information and the position annotation information of the target three-dimensional scene image to obtain a plurality of annotation sets;
and associating and storing the three-dimensional scene identifications corresponding to the multi-frame three-dimensional scene images with the plurality of annotation sets as an annotation information file.
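To make the stored structure concrete, here is a purely hypothetical annotation information file serialized as JSON; every field name below is invented for illustration, since the disclosure only requires that the label information and the position labeling information be grouped per frame and keyed by a three-dimensional scene identification.

```python
import json

# One annotation set per target three-dimensional scene image:
# label information (model basics, camera configuration, frame
# information) plus the task-specific position labeling information.
annotation_set = {
    "model": {"name": "cube_01", "category": "box"},           # basic information
    "camera": {"position": [0.0, 1.5, -4.0], "fov_deg": 60},   # configuration information
    "frame": {"index": 12, "timestamp_s": 0.5},                # frame information
    "labels": {"bbox": [[120, 80], [260, 210]]},               # position labeling information
}

# The three-dimensional scene identification keys the annotation sets.
annotation_file = {"scene_0001": [annotation_set]}
print(json.dumps(annotation_file, indent=2))
```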
Example 4 provides the method of example 1, the target vertices being the vertices; determining second position coordinates of the target vertex in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of each vertex, including:
acquiring a first script tool for converting a three-dimensional coordinate into a two-dimensional coordinate according to configuration information of a rendering camera in the target three-dimensional scene image;
and calling the first script tool to convert the first position coordinates of the vertices into second position coordinates of the vertices in the two-dimensional image corresponding to the rendering camera.
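Example 4 does not spell out the conversion the first script tool performs. Under a standard pinhole-camera assumption (not stated in the disclosure), converting first position coordinates into second position coordinates reduces to the projection below, with the intrinsic matrix K and the extrinsics R, t standing in for the rendering camera's configuration information.

```python
import numpy as np

def world_to_image(points_w: np.ndarray, K: np.ndarray,
                   R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Convert Nx3 first position coordinates (world space) into Nx2
    second position coordinates (pixels), assuming a pinhole camera."""
    p_cam = points_w @ R.T + t            # world space -> camera space
    p_hom = p_cam @ K.T                   # apply intrinsics
    return p_hom[:, :2] / p_hom[:, 2:3]   # perspective divide

# Toy camera: 640x480 image, 500 px focal length, 5 units behind the model.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
cube = np.array([[x, y, z] for x in (-1.0, 1.0)
                           for y in (-1.0, 1.0)
                           for z in (-1.0, 1.0)])
print(world_to_image(cube, K, R, t))
```

In practice a rendering engine already exposes an equivalent world-to-screen utility, so a script tool of this kind would typically wrap that call rather than reimplement the matrix algebra.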
Example 5 provides the method of example 1, the target vertex being a vertex visible in a two-dimensional image corresponding to the rendering camera;
determining second position coordinates of the target vertex in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of each vertex, including:
according to configuration information of a rendering camera in the target three-dimensional scene image, acquiring a second script tool for determining the first position coordinates of the visible vertices among the first position coordinates of the vertices, and calling the second script tool to obtain the first position coordinates of the visible vertices;
and calling a first script tool for converting three-dimensional coordinates into two-dimensional coordinates, so as to convert the first position coordinates of the visible vertices into second position coordinates in the two-dimensional image corresponding to the rendering camera.
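How the second script tool decides visibility is engine-specific; a depth-buffer comparison or a raycast from the camera to each vertex are the usual options. The sketch below is a deliberately cheap approximation, an in-image test combined with a normal-facing (back-face) test, and should not be read as the disclosure's method.

```python
import numpy as np

def visible_vertex_mask(points_w: np.ndarray, normals_w: np.ndarray,
                        cam_pos: np.ndarray, pixels: np.ndarray,
                        width: int, height: int) -> np.ndarray:
    """Boolean mask over vertices: True where a vertex is (approximately)
    visible in the two-dimensional image corresponding to the camera."""
    in_image = ((pixels[:, 0] >= 0) & (pixels[:, 0] < width) &
                (pixels[:, 1] >= 0) & (pixels[:, 1] < height))
    # Back-face test: the vertex normal must point towards the camera.
    facing = np.einsum("ij,ij->i", normals_w, cam_pos - points_w) > 0.0
    return in_image & facing

# Toy usage: two vertices, one facing the camera and one facing away.
pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
nrm = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]])
pix = np.array([[320.0, 240.0], [320.0, 240.0]])
print(visible_vertex_mask(pts, nrm, np.array([0.0, 0.0, -5.0]), pix, 640, 480))
# -> [ True False]: the second vertex faces away from the camera.
```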
Example 6 provides the method of any one of examples 1-5, the target task being a segmentation task, according to one or more embodiments of the present disclosure; the determining, according to the second position coordinate of the target vertex, position labeling information of the target three-dimensional scene image, which is related to the target task, includes:
determining a second position coordinate of a target boundary vertex of the virtual target object model in the two-dimensional image corresponding to the rendering camera according to the second position coordinate of the target vertex;
and determining the second position coordinate of the target boundary vertex as the position labeling information of the target three-dimensional scene image related to the segmentation task.
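Example 6 leaves open how the target boundary vertices are selected from the projected vertices. One simple possibility, assumed here purely for illustration, is the 2D convex hull of the target vertices' second position coordinates, computed with the monotone-chain algorithm; note that a convex hull is only a coarse stand-in for the true silhouette polygon of a concave object.

```python
def convex_hull(points):
    """Monotone-chain convex hull of 2D points; returns the boundary
    vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# -> [(0, 0), (2, 0), (2, 2), (0, 2)]: the interior vertex is dropped.
```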
Example 7 provides the method of any one of examples 1-5, wherein the target task is a detection task, and determining, according to the second position coordinate of the target vertex, position labeling information of the target three-dimensional scene image related to the target task includes:
respectively determining the minimum value and the maximum value on the first coordinate axis and the minimum value and the maximum value on the second coordinate axis according to the second position coordinates of the target vertices;
determining the minimum value on the first coordinate axis and the minimum value on the second coordinate axis as the position coordinates of a minimum-value vertex, and determining the maximum value on the first coordinate axis and the maximum value on the second coordinate axis as the position coordinates of a maximum-value vertex;
and determining the position coordinates of the minimum-value vertex and the position coordinates of the maximum-value vertex as the position labeling information of the target three-dimensional scene image related to the detection task.
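Example 7 amounts to taking an axis-aligned bounding box over the target vertices' second position coordinates; a minimal sketch (the function name is invented):

```python
def bounding_box(pixels):
    """Axis-aligned box over the target vertices' second position
    coordinates: returns the minimum-value vertex and the maximum-value
    vertex described in example 7, as ((x_min, y_min), (x_max, y_max))."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return (min(xs), min(ys)), (max(xs), max(ys))

print(bounding_box([(120, 80), (260, 210), (180, 40)]))
# -> ((120, 40), (260, 210))
```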
Example 8 provides the method of any one of examples 1-5, wherein the obtaining first position coordinates of vertices of the virtual target object model in a world coordinate system to which the virtual target object model belongs includes:
acquiring a third script tool for traversing each vertex of the virtual target object model;
traversing each vertex of the virtual target object model in the target three-dimensional scene image according to the third script tool to obtain a first position coordinate of each vertex in a world coordinate system to which the virtual target object model belongs.
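A third script tool of this kind would typically walk the mesh's model-space vertices and apply the model's local-to-world transform. The NumPy sketch below assumes a 4x4 model_matrix of the sort a rendering engine maintains for each object; world_space_vertices is an invented name.

```python
import numpy as np

def world_space_vertices(local_vertices: np.ndarray,
                         model_matrix: np.ndarray) -> np.ndarray:
    """Traverse each model-space vertex and return its first position
    coordinates in the world coordinate system the model belongs to."""
    n = len(local_vertices)
    homo = np.hstack([local_vertices, np.ones((n, 1))])  # to homogeneous coords
    world = homo @ model_matrix.T                        # apply local-to-world
    return world[:, :3]

# Toy usage: a triangle translated by (1, 0, 0).
tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
M = np.eye(4)
M[0, 3] = 1.0
print(world_space_vertices(tri, M))
# -> [[1. 0. 0.] [2. 0. 0.] [1. 1. 0.]]
```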
Example 9 provides, in accordance with one or more embodiments of the present disclosure, a target object labeling apparatus, the apparatus comprising:
a first obtaining module, configured to obtain a target three-dimensional scene image in response to receiving a labeling request for a target object, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object;
the second acquisition module is used for acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
a first determining module, configured to determine, according to configuration information of a rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices, second position coordinates of the target vertices in a two-dimensional image corresponding to the rendering camera;
and the second determining module is used for determining the position marking information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
Example 10 provides the apparatus of example 9, the apparatus further comprising, in accordance with one or more embodiments of the present disclosure:
a creating module, configured to, in response to receiving a request for constructing a three-dimensional scene image of a target object, create a virtual target object model for the target object and create a virtual environment object model for a preset environment object;
the third acquisition module is used for acquiring animations of the virtual target object model and the virtual environment object model and configuration information of a rendering camera so as to obtain a plurality of frames of three-dimensional scene images;
the first obtaining module comprises:
and the first determining submodule is used for sequentially determining each frame of three-dimensional scene image as a target three-dimensional scene image in response to receiving an annotation request aiming at the target object.
Example 11 provides the apparatus of example 10, the apparatus further comprising, in accordance with one or more embodiments of the present disclosure:
a fourth obtaining module, configured to obtain, for each target three-dimensional scene image, basic information of the virtual target object model, configuration information of the rendering camera, and frame information of the target three-dimensional scene image as label information, and to generate an annotation set of the target three-dimensional scene image according to the label information and the position annotation information of the target three-dimensional scene image to obtain a plurality of annotation sets;
and a storage module, configured to store, in association, the three-dimensional scene identifications corresponding to the multi-frame three-dimensional scene images and the plurality of annotation sets as an annotation information file.
Example 12 provides the apparatus of example 9, the target vertices being the vertices, in accordance with one or more embodiments of the present disclosure; the first determining module includes:
the first obtaining submodule is used for obtaining a first script tool for converting three-dimensional coordinates into two-dimensional coordinates according to configuration information of a rendering camera in the target three-dimensional scene image;
and the first calling sub-module is used for calling the first script tool so as to convert the first position coordinates of the vertexes into second position coordinates of the vertexes in the two-dimensional image corresponding to the rendering camera.
Example 13 provides the apparatus of example 9, the target vertex being a vertex visible in a two-dimensional image corresponding to the rendering camera, in accordance with one or more embodiments of the present disclosure; the first determining module includes:
the second obtaining submodule is used for obtaining, according to configuration information of a rendering camera in the target three-dimensional scene image, a second script tool for determining the first position coordinates of the visible vertices among the first position coordinates of the vertices, and calling the second script tool to obtain the first position coordinates of the visible vertices;
and the second calling submodule is used for calling a first script tool for converting three-dimensional coordinates into two-dimensional coordinates, so as to convert the first position coordinates of the visible vertices into second position coordinates in the two-dimensional image corresponding to the rendering camera.
Example 14 provides the apparatus of any one of examples 9-13, the target task being a segmentation task, in accordance with one or more embodiments of the present disclosure; the second determining module includes:
the second determining submodule is used for determining a second position coordinate of a target boundary vertex of the virtual target object model in the two-dimensional image corresponding to the rendering camera according to the second position coordinate of the target vertex;
and the third determining submodule is used for determining the second position coordinate of the target boundary vertex as the position labeling information of the target three-dimensional scene image related to the segmentation task.
Example 15 provides the apparatus of any one of examples 9-13, the target task being a detection task, the second determining module including:
the fourth determining submodule is used for respectively determining the minimum value and the maximum value on the first coordinate axis and the minimum value and the maximum value on the second coordinate axis according to the second position coordinates of the target vertices;
the fifth determining submodule is used for determining the minimum value on the first coordinate axis and the minimum value on the second coordinate axis as the position coordinates of a minimum-value vertex, and determining the maximum value on the first coordinate axis and the maximum value on the second coordinate axis as the position coordinates of a maximum-value vertex;
and the sixth determining submodule is used for determining the position coordinates of the minimum-value vertex and the position coordinates of the maximum-value vertex as the position labeling information of the target three-dimensional scene image related to the detection task.
Example 16 provides the apparatus of any one of examples 9-13, the second obtaining module comprising:
a third obtaining submodule, configured to obtain a third script tool for traversing each vertex of the virtual target object model;
and the traversing submodule is used for traversing each vertex of the virtual target object model in the target three-dimensional scene image according to the third script tool so as to obtain a first position coordinate of each vertex in a world coordinate system to which the virtual target object model belongs.
Example 17 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-8, in accordance with one or more embodiments of the present disclosure.
Example 18 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising:
a storage device having at least one computer program stored thereon;
at least one processing device for executing the at least one computer program in the storage device to implement the steps of the method of any of examples 1-8.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by mutually replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.

Claims (11)

1. A target object labeling method is characterized by comprising the following steps:
responding to a received labeling request aiming at a target object, and acquiring a target three-dimensional scene image, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object;
acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
determining second position coordinates of the target vertices in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices;
and determining position labeling information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
2. The method of claim 1, further comprising:
in response to receiving a request for constructing a three-dimensional scene image of a target object, creating a virtual target object model for the target object and creating a virtual environment object model for a preset environment object;
acquiring animations of the virtual target object model and the virtual environment object model and configuration information of a rendering camera to obtain a plurality of frames of three-dimensional scene images;
the acquiring the target three-dimensional scene image in response to receiving the annotation request aiming at the target object comprises the following steps:
and in response to receiving an annotation request aiming at the target object, sequentially determining each frame of three-dimensional scene image as a target three-dimensional scene image.
3. The method of claim 2, further comprising:
for each target three-dimensional scene image, acquiring basic information of the virtual target object model, configuration information of the rendering camera and frame information of the target three-dimensional scene image as label information, and generating an annotation set of the target three-dimensional scene image according to the label information and the position annotation information of the target three-dimensional scene image to obtain a plurality of annotation sets;
and associating and storing the three-dimensional scene identifications corresponding to the multi-frame three-dimensional scene images with the plurality of annotation sets as an annotation information file.
4. The method of claim 1, wherein the target vertices are the vertices; determining second position coordinates of the target vertex in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of each vertex, including:
acquiring a first script tool for converting a three-dimensional coordinate into a two-dimensional coordinate according to configuration information of a rendering camera in the target three-dimensional scene image;
and calling the first script tool to convert the first position coordinates of the vertices into second position coordinates of the vertices in the two-dimensional image corresponding to the rendering camera.
5. The method of claim 1, wherein the target vertex is a vertex visible in a two-dimensional image corresponding to the rendering camera;
determining second position coordinates of the target vertex in the two-dimensional image corresponding to the rendering camera according to the configuration information of the rendering camera in the target three-dimensional scene image and the first position coordinates of each vertex, including:
according to configuration information of a rendering camera in the target three-dimensional scene image, acquiring a second script tool for determining the first position coordinates of the visible vertices among the first position coordinates of the vertices, and calling the second script tool to obtain the first position coordinates of the visible vertices;
and calling a first script tool for converting three-dimensional coordinates into two-dimensional coordinates, so as to convert the first position coordinates of the visible vertices into second position coordinates in the two-dimensional image corresponding to the rendering camera.
6. The method according to any one of claims 1-5, wherein the target task is a segmentation task; the determining, according to the second position coordinate of the target vertex, position labeling information of the target three-dimensional scene image, which is related to the target task, includes:
determining a second position coordinate of a target boundary vertex of the virtual target object model in the two-dimensional image corresponding to the rendering camera according to the second position coordinate of the target vertex;
and determining the second position coordinate of the target boundary vertex as the position labeling information of the target three-dimensional scene image related to the segmentation task.
7. The method according to any one of claims 1 to 5, wherein the target task is a detection task, and the determining, according to the second position coordinate of the target vertex, the position labeling information of the target three-dimensional scene image related to the target task comprises:
respectively determining the minimum value and the maximum value on the first coordinate axis and the minimum value and the maximum value on the second coordinate axis according to the second position coordinates of the target vertices;
determining the minimum value on the first coordinate axis and the minimum value on the second coordinate axis as the position coordinates of a minimum-value vertex, and determining the maximum value on the first coordinate axis and the maximum value on the second coordinate axis as the position coordinates of a maximum-value vertex;
and determining the position coordinates of the minimum-value vertex and the position coordinates of the maximum-value vertex as the position labeling information of the target three-dimensional scene image related to the detection task.
8. The method according to any one of claims 1-5, wherein said obtaining first position coordinates of vertices of the virtual target object model in a world coordinate system to which the virtual target object model belongs comprises:
acquiring a third script tool for traversing each vertex of the virtual target object model;
and traversing each vertex of the virtual target object model in the target three-dimensional scene image according to the third script tool to obtain a first position coordinate of each vertex in a world coordinate system to which the virtual target object model belongs.
9. A target object labeling apparatus, the apparatus comprising:
a first obtaining module, configured to obtain a target three-dimensional scene image in response to receiving a labeling request for a target object, wherein the target three-dimensional scene image comprises a virtual target object model established for the target object;
the second acquisition module is used for acquiring first position coordinates of each vertex of the virtual target object model in a world coordinate system to which the virtual target object model belongs;
a first determining module, configured to determine, according to configuration information of a rendering camera in the target three-dimensional scene image and the first position coordinates of the vertices, second position coordinates of the target vertices in a two-dimensional image corresponding to the rendering camera;
and the second determining module is used for determining the position marking information of the target three-dimensional scene image related to the target task according to the second position coordinate of the target vertex.
10. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 8.
11. An electronic device, comprising:
a storage device having at least one computer program stored thereon;
at least one processing device for executing the at least one computer program in the storage device to carry out the steps of the method according to any one of claims 1 to 8.
CN202210503463.4A 2022-05-09 2022-05-09 Target object labeling method and device, storage medium and electronic equipment Pending CN114863071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210503463.4A CN114863071A (en) 2022-05-09 2022-05-09 Target object labeling method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210503463.4A CN114863071A (en) 2022-05-09 2022-05-09 Target object labeling method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114863071A 2022-08-05

Family

ID=82636753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210503463.4A Pending CN114863071A (en) 2022-05-09 2022-05-09 Target object labeling method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114863071A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333618A (en) * 2023-10-20 2024-01-02 神力视界(深圳)文化科技有限公司 Three-dimensional scene generation method and device, electronic equipment and storage medium
CN118365807A (en) * 2024-06-20 2024-07-19 深圳天海宸光科技有限公司 Synthetic data generation method, device, medium and equipment

Similar Documents

Publication Publication Date Title
CN106846497B (en) Method and device for presenting three-dimensional map applied to terminal
CN109754464B (en) Method and apparatus for generating information
CN114863071A (en) Target object labeling method and device, storage medium and electronic equipment
CN115690382B (en) Training method of deep learning model, and method and device for generating panorama
CN111862352B (en) Positioning model optimization method, positioning method and positioning equipment
CN109801354B (en) Panorama processing method and device
CN115908679A (en) Texture mapping method, device, equipment and storage medium
CN112802206A (en) Roaming view generation method, device, equipment and storage medium
US20150325040A1 (en) Method, apparatus and computer program product for image rendering
CN112270242B (en) Track display method and device, readable medium and electronic equipment
CN111818265B (en) Interaction method and device based on augmented reality model, electronic equipment and medium
CN111833459B (en) Image processing method and device, electronic equipment and storage medium
CN111862342B (en) Augmented reality texture processing method and device, electronic equipment and storage medium
CN109816791B (en) Method and apparatus for generating information
CN113129360B (en) Method and device for positioning object in video, readable medium and electronic equipment
CN112492230B (en) Video processing method and device, readable medium and electronic equipment
CN114332224A (en) Method, device and equipment for generating 3D target detection sample and storage medium
KR20220080696A (en) Depth estimation method, device, electronic equipment and computer readable storage medium
CN114202617A (en) Video image processing method and device, electronic equipment and storage medium
CN112070903A (en) Virtual object display method and device, electronic equipment and computer storage medium
CN115731570A (en) Image recognition method and device and electronic equipment
CN115086538A (en) Shooting position determining method, device, equipment and medium
CN112822418B (en) Video processing method and device, storage medium and electronic equipment
CN111368015B (en) Method and device for compressing map
CN114494379B (en) FPFH-based convex hull-assisted three-dimensional point cloud registration method, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination