
CN110766152B - Method and apparatus for training deep neural networks - Google Patents

Method and apparatus for training deep neural networks Download PDF

Info

Publication number
CN110766152B
CN110766152B (grant) · CN201810844262.4A (application)
Authority
CN
China
Prior art keywords
depth map
training sample
planar region
region
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810844262.4A
Other languages
Chinese (zh)
Other versions
CN110766152A (en)
Inventor
李斐
田虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201810844262.4A priority Critical patent/CN110766152B/en
Publication of CN110766152A publication Critical patent/CN110766152A/en
Application granted granted Critical
Publication of CN110766152B publication Critical patent/CN110766152B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/231Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and apparatus for training a deep neural network. According to one embodiment of the present disclosure, the method comprises the steps of: for each training sample image in the training set, generating a corresponding estimated depth map from the training sample image using a deep neural network; calculating a loss for the training sample image based on the training sample depth map and the estimated depth map of the training sample image; and optimizing parameters of the deep neural network based on the calculated loss, wherein the loss includes a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map. The trained deep neural network obtained by the method and apparatus can improve the accuracy of depth map estimation when a single input image is used.

Description

Method and apparatus for training deep neural networks
Technical Field
The present disclosure relates generally to the field of three-dimensional image processing, and in particular, to a method and apparatus for training a deep neural network.
Background
In recent years, with the development of three-dimensional imaging technology, digitized three-dimensional objects have been widely used in many fields of daily life, such as augmented reality, digital museums, three-dimensional printing, and the like. An important aspect of three-dimensional imaging is three-dimensional reconstruction, for which depth information is critical. In general, depth may be estimated from a single image, from two images, or from more than two images. Estimating depth from a single image requires only one input image, and the estimated depth can conveniently be used in computer vision applications such as object recognition and pose estimation.
For depth map estimation based on a single image, the input is one image and the output is the corresponding depth map. In recent years, mainstream single-image depth map estimation methods have generally used a deep neural network to learn the relationship between visual information and depth information. To obtain more accurate depth estimation results, researchers have proposed many effective training methods, such as exploiting the gradient of the depth or introducing multi-scale image information. However, most existing methods focus only on the color and depth data themselves. Since the relationship between an image and its corresponding depth map is quite complex, it is difficult to learn a direct mapping model between the two. To further improve the performance of single-image depth estimation, additional information needs to be exploited. In other words, the accuracy of depth estimation from a single image still needs to be improved.
Disclosure of Invention
A brief summary of the disclosure will be presented below in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to one aspect of the present disclosure, there is provided a method for training a deep neural network, the method comprising the steps of: for each training sample image in the training set, generating a corresponding estimated depth map from the training sample image using the deep neural network; calculating a loss for the training sample image based on the training sample depth map and the estimated depth map of the training sample image; and optimizing parameters of the deep neural network based on the calculated loss, wherein the loss includes a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map.
According to another aspect of the present disclosure, there is provided a method for training a deep neural network, the method comprising the steps of: for each training sample image in the training set, generating a corresponding estimated depth map from the training sample image using the deep neural network; detecting at least one planar region from a training sample depth map of the training sample image; calculating a loss for the training sample image based on the training sample depth map and the estimated depth map of the training sample image; and optimizing parameters of the deep neural network based on the calculated loss; wherein the loss includes a loss term calculated based on a comparison of the at least one planar region in the training sample depth map and a corresponding region in the estimated depth map; wherein the at least one planar region is detected from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and the positions of the plurality of pixels to obtain at least one cluster, and extracting a connected component of the image region corresponding to each cluster as one planar region of the at least one planar region; and wherein each of the at least one planar region satisfies: within the planar region, the average of the absolute values of the second-order gradient values is below a predetermined second-order gradient threshold.
According to another aspect of the present invention, there is provided an apparatus for training a deep neural network, comprising: a depth map estimation unit configured to generate a corresponding estimated depth map from a training sample image using the deep neural network; a loss calculation unit configured to calculate a loss for the training sample image based on the training sample depth map of the training sample image and the estimated depth map; and a parameter optimization unit configured to optimize parameters of the deep neural network based on the calculated loss; wherein the loss includes a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map.
The trained deep neural network obtained by the method and apparatus of the present disclosure can improve the accuracy of depth map estimation when a single input image is used.
Drawings
The present disclosure may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, along with the following detailed description. In the drawings:
FIG. 1 is an exemplary flowchart of a method for training a deep neural network, according to one embodiment of the present disclosure;
FIG. 2 is an exemplary flowchart of a method for training a deep neural network, according to another embodiment of the present disclosure;
FIG. 3 is an exemplary flowchart of a method for determining a depth map according to one embodiment of the present disclosure;
FIG. 4 is an exemplary block diagram of an apparatus for training a deep neural network, according to one embodiment of the present disclosure; and
fig. 5 is an exemplary block diagram of an information processing apparatus according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, and that these decisions may vary from one implementation to another.
It should be noted here that, in order to avoid obscuring the present disclosure due to unnecessary details, only the device structures closely related to the scheme according to the present disclosure are shown in the drawings, and other details not greatly related to the present disclosure are omitted.
It is to be understood that the present disclosure is not limited to the embodiments described below with reference to the drawings. Where possible, embodiments may be combined with one another, features may be replaced or borrowed between different embodiments, and one or more features may be omitted from an embodiment.
A method of the present invention for training a deep neural network is described below with reference to fig. 1.
Fig. 1 is an exemplary flowchart of a method 100 for training a deep neural network, according to one embodiment of the present disclosure.
When a single image is input and an estimated depth map is obtained using a deep neural network, the accuracy of the estimated depth map is affected by the parameters of the network. Therefore, before the deep neural network can be used in practice to obtain accurate estimated depth maps, it needs to be trained on training sample images so as to optimize its parameters.
Before training the deep neural network, the deep neural network needs to be constructed, and initial parameters are set. Since the construction of the deep neural network is a conventional technique, it will not be described in detail herein.
The data input when training the deep neural network includes the training sample images (IM(i), i=1, 2, …) and the training sample depth map (DMt(i), i=1, 2, …) corresponding to each training sample image. The training sample depth maps may be acquired by a depth camera or computed by other processing means. The training sample images and training sample depth maps form the training set.
Steps 101, 103 and 105 are performed for each training sample image in the training set.
At step 101, a depth map is estimated. Specifically, a corresponding estimated depth map DMe(i) is generated from the training sample image IM(i) using the deep neural network.
At step 103, the loss is calculated based on the planar region comparison. Specifically, a loss Lt(i) for the training sample image IM(i) is calculated based on the training sample depth map DMt(i) and the estimated depth map DMe(i), wherein the loss includes a loss term L(i) calculated based on a comparison of at least one planar region (Rs(i, j), j=1, 2, …) in the training sample depth map DMt(i) and the corresponding regions (Re(i, j), j=1, 2, …) in the estimated depth map DMe(i).
When assembling the training set, it may be ensured that each training sample image contains at least one planar region. If planar regions are detected before step 103 and none is found, the training sample image may be skipped.
The planar area may be a floor, ceiling, road surface, facade of a building, etc.
At step 105, the parameters are optimized. Specifically, parameters of the deep neural network are optimized based on the calculated loss Lt(i). In general, the parameters of a deep neural network are optimized by minimizing the loss function, which is well known to those skilled in the art and therefore not described in detail.
Parameters of the deep neural network may be optimized once for each batch of training sample images. The number of training sample images in a batch of training sample images is at least 1, preferably a plurality, e.g. 10 or 50. That is, the training set may be divided into a plurality of batches.
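The batching described above can be sketched as follows; the function and batch size are illustrative assumptions, and the actual per-batch parameter update depends on the network framework used.

```python
import numpy as np

def iterate_minibatches(images, depth_maps, batch_size=10, seed=0):
    """Yield shuffled (images, depth maps) batches; in the scheme above,
    the network parameters would be updated once per yielded batch."""
    idx = np.random.default_rng(seed).permutation(len(images))
    for start in range(0, len(idx), batch_size):
        sel = idx[start:start + batch_size]
        yield [images[k] for k in sel], [depth_maps[k] for k in sel]
```

With 25 training samples and a batch size of 10, this produces batches of 10, 10, and 5 images, each triggering one parameter update.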
The detection of planar areas is described below.
If planar regions are not pre-labeled for each training sample image, the method 100 further includes detecting planar regions from each training sample depth map (DMt(i), i=1, 2, …). The number of planar regions in a training sample depth map DMt(i) may be 1, 2, 3, or more. A number threshold may be set; when the number of planar regions detected in DMt(i) reaches this threshold, detection of further planar regions is stopped.
In a depth map, the depth over a planar region varies linearly, so the gradient of the depth is constant and the second-order gradient of the depth is zero. This property can be used to detect planar regions in the training sample depth map.
Note that the training sample depth map may contain distortion and errors, so that the actual depth gradient fluctuates around the constant overall gradient of the planar region, and the actual second-order gradient fluctuates around zero.
Note that the gradient of the depth may include a gradient in the x direction and a gradient in the y direction; the second-order gradient includes the derivative of the x-direction gradient with respect to x, the derivative of the x-direction gradient with respect to y, the derivative of the y-direction gradient with respect to x, and the derivative of the y-direction gradient with respect to y.
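Using the convention that x runs along image columns and y along rows, the first-order gradients and the four second-order gradients just listed can be computed with finite differences. A minimal numpy sketch (note that `numpy.gradient` returns per-axis derivatives, rows first):

```python
import numpy as np

def depth_gradients(dm):
    """Return (gx, gy) and the four second-order maps
    (dgx/dx, dgx/dy, dgy/dx, dgy/dy) of a depth map."""
    gy, gx = np.gradient(dm)      # axis 0 = y (rows), axis 1 = x (columns)
    gxy, gxx = np.gradient(gx)    # derivatives of the x-direction gradient
    gyy, gyx = np.gradient(gy)    # derivatives of the y-direction gradient
    return (gx, gy), (gxx, gxy, gyx, gyy)
```

For a perfectly planar depth map the first-order gradients are constant and all four second-order maps vanish, which is exactly the property the detection exploits.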
In one embodiment, planar regions are detected from the training sample depth map DMt(i) by calculating gradient values for a plurality of pixels of DMt(i) and determining the planar regions based on these gradient values, where each planar region satisfies: within the region, the percentage of pixels whose gradient value differs in absolute value from a constant indicating the overall gradient of the region by less than a first threshold is higher than a first predetermined percentage. The gradient value may be calculated for every pixel in the training sample depth map, or only for a subset of pixels according to some rule. The overall gradient of a planar region may be determined in a number of ways, for example: perform a preliminary clustering based on gradient variation to obtain a plurality of candidate planar regions; if, on a line segment of predetermined length within a candidate region, the gradient varies by less than a predetermined degree, compute the mean gradient over the pixels of that segment and take it as the overall gradient of the candidate region. Alternatively, build a gradient histogram of the training sample depth map DMt(i) (the abscissa is the gradient value and the ordinate is the percentage of pixels whose gradient falls within each bin) and take the gradient value at its maximum as the overall gradient of the candidate region. The planar regions are then screened from the candidate planar regions.
For example, suppose a candidate planar region has 10000 pixels and the first predetermined percentage is 90%. If 9010 pixels have an x-direction gradient whose absolute difference from the constant indicating the overall gradient (x direction) is below the first threshold, and 9020 pixels similarly satisfy the condition in the y direction, then the percentage of qualifying gradient values is (9010 + 9020) / 20000 = 90.15%, which exceeds the first predetermined percentage of 90%, so the candidate region is accepted as a planar region. The percentage is computed with the number of pixels for which the gradient was calculated as the denominator; for example, if the candidate region has 10000 pixels but gradients were calculated for only 5000 of them, the denominator is 5000. Alternatively, each planar region may instead be required to satisfy: within the region, the average of the absolute differences between the gradient values and the constant indicating the overall gradient is below a predetermined gradient deviation threshold. For example, if gradient values were calculated for 8000 pixels of a candidate region, the arithmetic mean of the absolute differences may be computed, and the candidate region is accepted as a planar region if this mean is below the threshold.
When calculating the average value, the calculation may be performed for all pixels in the candidate plane area, or may be performed for some pixels in the candidate plane area. The average value may be: arithmetic mean, geometric mean, or root mean square mean, etc.
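The percentage criterion of the worked example above can be sketched as follows; the x and y directions are counted separately and pooled, matching the 9010/9020 example, and the threshold values here are illustrative assumptions.

```python
import numpy as np

def passes_gradient_test(gx, gy, mask, overall_gx, overall_gy,
                         first_threshold=0.05, first_pct=0.90):
    """Accept a candidate region if the pooled fraction of per-direction
    gradient values within first_threshold of the overall gradient
    exceeds the first predetermined percentage."""
    ok_x = np.abs(gx[mask] - overall_gx) < first_threshold
    ok_y = np.abs(gy[mask] - overall_gy) < first_threshold
    return (ok_x.sum() + ok_y.sum()) / (2.0 * mask.sum()) > first_pct
```

A perfectly planar region passes with a pooled fraction of 100%; shifting every x-gradient far from the overall value drops the pooled fraction to 50% and the region is rejected.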
In one embodiment, planar regions are detected from the training sample depth map by calculating second-order gradient values for a plurality of pixels and determining the planar regions based on them, where each planar region satisfies: within the region, the percentage of pixels whose second-order gradient value has an absolute value below a second threshold is higher than a second predetermined percentage. The second-order gradient may be calculated for every pixel, or only for a subset of pixels according to some rule; in this embodiment, the percentage is computed with the number of pixels for which the second-order gradient was calculated as the denominator. Alternatively, each planar region may instead be required to satisfy: within the region, the average of the absolute values of the second-order gradient values is below a predetermined second-order gradient threshold. For example, if second-order gradient values were calculated for 8000 pixels of a candidate region, the arithmetic mean of their absolute values may be computed, and the candidate region is accepted as a planar region if this mean is below the threshold. The average may be taken over all pixels of the candidate region or over only some of them.
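The alternative mean-absolute second-order criterion can be sketched as below; the threshold is an illustrative assumption and should be tuned to the depth units of the data, and the pixel subset is taken to be the whole candidate region.

```python
import numpy as np

def passes_second_order_test(dm, mask, second_order_threshold=0.01):
    """Accept a candidate region if the mean absolute second-order
    gradient of the depth inside the region is below the threshold."""
    gy, gx = np.gradient(dm)
    gxy, gxx = np.gradient(gx)
    gyy, gyx = np.gradient(gy)
    second = np.stack([gxx, gxy, gyx, gyy])   # the four second-order maps
    return float(np.mean(np.abs(second[:, mask]))) < second_order_threshold
```

A linear (planar) depth patch passes; a curved patch such as z = x² fails, since its second-order gradient is far from zero.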
In one embodiment, planar regions are detected from the training sample depth map by: calculating gradient values for a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of these pixels to obtain at least one cluster, and extracting a connected component of the image region corresponding to each cluster as one planar region. The gradient value may be calculated for every pixel, or only for a subset of pixels according to some rule. In addition to the depth gradient of each pixel, positional information is introduced into the clustering to ensure that pixels in the same cluster are not far from one another. That is, during clustering each pixel p is described by the four-dimensional feature vector [∇_x D(p), ∇_y D(p), p_x, p_y]^T, where ∇_x D(p) and ∇_y D(p) are the gradients of the depth at pixel p in the horizontal and vertical directions, respectively, and (p_x, p_y) is the position of pixel p in the image. The pixels of the training sample depth map may be clustered by, but not limited to, hierarchical clustering. In hierarchical clustering, the minimum distance between any pixel of cluster A and any pixel of cluster B may be taken as the distance between clusters A and B; if this distance is below a preset threshold, A and B are merged into one cluster.
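The four-dimensional clustering can be sketched with SciPy's single-linkage hierarchical clustering, which implements exactly the minimum-distance merge rule just described. The weight balancing the gradient features against the position features, and the merge threshold, are assumptions for illustration; this brute-force formulation is only practical for small maps.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_depth_pixels(dm, merge_threshold=1.5, grad_weight=10.0):
    """Cluster pixels on [grad_x, grad_y, p_x, p_y] features and return a
    per-pixel cluster label map."""
    gy, gx = np.gradient(dm)
    ys, xs = np.mgrid[0:dm.shape[0], 0:dm.shape[1]]
    feats = np.stack([grad_weight * gx.ravel(),
                      grad_weight * gy.ravel(),
                      xs.ravel().astype(float),
                      ys.ravel().astype(float)], axis=1)
    Z = linkage(feats, method='single')   # single linkage = min-distance merges
    return fcluster(Z, t=merge_threshold, criterion='distance').reshape(dm.shape)
```

On a depth map made of two flat plateaus at different depths, the high-gradient boundary pixels keep the two plateaus in separate clusters, while each plateau's pixels chain together through unit-distance neighbors.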
To facilitate the subsequent loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map, the detected planar regions may be post-processed (a region before processing may be referred to as a "candidate planar region"). For example: filling holes in the planar region; prohibiting the use of a candidate planar region whose number of pixels is below a third threshold; or applying an erosion operation to the candidate planar region (e.g., removing an annular band along its peripheral edge). One or more of these post-processing operations may be applied to the candidate planar regions.
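The three post-processing operations (hole filling, size screening, erosion of the edge ring) map directly onto standard binary-morphology routines; the minimum-size threshold and erosion depth below are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def postprocess_region(mask, min_pixels=50, erode_iters=1):
    """Fill holes in a candidate region mask, erode its peripheral edge,
    and reject it (return None) if too few pixels remain."""
    mask = ndimage.binary_fill_holes(mask)
    mask = ndimage.binary_erosion(mask, iterations=erode_iters)
    if int(mask.sum()) < min_pixels:
        return None   # third threshold: region prohibited from use
    return mask
```

A large square mask with an interior hole survives with the hole filled and its edge ring removed; a tiny 2x2 blob is eroded away and rejected.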
After determining at least one planar region Rs (i, j) of the training depth map DMt (i), a local loss can be calculated for that planar region. The calculation of the loss is described below.
In the prior art, the loss function is typically computed from the absolute difference between the estimated depth of each pixel of the estimated depth map and the depth of the corresponding pixel of the training depth map. The present disclosure additionally includes, weighted by a preset combination coefficient, a loss term calculated based on a comparison of the planar regions in the training sample depth map and the corresponding regions in the estimated depth map.
The calculated loss term L (i) includes the local loss L (i, j) for the planar region Rs (i, j). The loss term L (i) may be an accumulation of local loss L (i, j) with respect to j.
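This combination can be sketched as follows: the total loss for one training image adds the accumulated local plane losses, scaled by the preset combination coefficient, to the conventional per-pixel term. `local_loss` stands for any of the local-loss variants described in this section, and `lam` is an illustrative value.

```python
import numpy as np

def total_loss(dm_true, dm_est, plane_masks, local_loss, lam=0.1):
    """Lt(i) = per-pixel absolute difference + lam * sum_j L(i, j)."""
    pixel_term = float(np.mean(np.abs(dm_est - dm_true)))
    plane_term = sum(local_loss(dm_true, dm_est, m) for m in plane_masks)
    return pixel_term + lam * plane_term
```

Any callable with the signature `(dm_true, dm_est, mask) -> float` can be plugged in as `local_loss`, which keeps the choice of plane comparison independent of the overall loss structure.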
In one embodiment, the calculated loss term includes a local loss L (i, j) for the planar region Rs (i, j) determined by: determining a plane parameter of a corresponding plane corresponding to the plane region Rs (i, j) in the training sample depth map DMt (i), and calculating a sum or average value of distances between a first predetermined number of three-dimensional points and the corresponding plane in the corresponding region Re (i, j) corresponding to the plane region Rs (i, j) in the estimated depth map DMe (i) based on the plane parameter as a local loss L (i, j). If an average is used as the local loss, the first predetermined number may be different for different training sample images.
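A minimal sketch of this point-to-plane variant follows. Pixel coordinates stand in for back-projected camera-space points, so this is an assumption-laden simplification: a real implementation would apply the camera intrinsics before fitting.

```python
import numpy as np

def plane_distance_loss(dm_true, dm_est, mask):
    """Local loss L(i, j): mean distance from the estimated region's 3-D
    points to the plane z = a*x + b*y + c fitted on the ground-truth region."""
    ys, xs = np.nonzero(mask)
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)
    coef, *_ = np.linalg.lstsq(A, dm_true[mask], rcond=None)  # (a, b, c)
    a, b, _ = coef
    # point-to-plane distance |a*x + b*y + c - z| / sqrt(a^2 + b^2 + 1)
    dist = np.abs(A @ coef - dm_est[mask]) / np.sqrt(a * a + b * b + 1.0)
    return float(dist.mean())
```

When the estimated region coincides with the fitted plane the loss vanishes; a uniform depth offset of the estimate produces a proportional nonzero loss.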
In one embodiment, the calculated loss term includes a local loss L (i, j) for the planar region Rs (i, j) determined by: determining the normal of the corresponding plane corresponding to the planar region Rs (i, j) in the training sample depth map DMt (i), and calculating the sum or average of the absolute values of the inner products of the second predetermined number of vectors and the normal of the corresponding plane in the corresponding region Re (i, j) corresponding to the planar region in the estimated depth map DMe (i) as the local loss. The length of the vector is preferably uniform. If not uniform, the absolute values may be normalized by length and then summed or averaged. The length of the normal is preferably unit length and if not unit length, the absolute value needs to be normalized by the normal length before summing or averaging. The vector may be determined by: in the corresponding region Re (i, j), two three-dimensional points calculated based on the estimated depth are randomly selected, and one three-dimensional point is used as a start point of the vector and the other three-dimensional point is used as an end point of the vector.
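The normal-based variant can be sketched as below. Vectors are normalized to unit length, as the text suggests for non-uniform lengths; as before, pixel coordinates stand in for camera-space X and Y, which is an illustrative simplification.

```python
import numpy as np

def normal_alignment_loss(dm_true, dm_est, mask, n_vectors=200, seed=0):
    """Local loss L(i, j): mean |<v, n>| over vectors v drawn between random
    3-D points of the estimated region, with n the unit normal of the plane
    fitted on the ground-truth region."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)
    (a, b, _), *_ = np.linalg.lstsq(A, dm_true[mask], rcond=None)
    n = np.array([-a, -b, 1.0]) / np.sqrt(a * a + b * b + 1.0)  # unit normal
    pts = np.stack([xs.astype(float), ys.astype(float), dm_est[mask]], axis=1)
    i = rng.integers(0, len(pts), n_vectors)
    j = rng.integers(0, len(pts), n_vectors)
    v = pts[i] - pts[j]
    lens = np.linalg.norm(v, axis=1)
    keep = lens > 0              # drop degenerate zero-length vectors
    return float(np.mean(np.abs(v[keep] @ n) / lens[keep]))
```

If the estimated region lies exactly in the ground-truth plane, every sampled vector is perpendicular to the normal and the loss is (numerically) zero; a curved estimate yields a clearly positive loss.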
In one embodiment, the calculated loss term includes a local loss L (i, j) for the planar region Rs (i, j) determined by: the sum or average of the absolute values of the second order gradients of the third predetermined number of pixels in the corresponding region Re (i, j) corresponding to the planar region Rs (i, j) in the estimated depth map DMe (i) is calculated as the local loss L (i, j).
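The third variant penalizes curvature of the estimated depth directly. A minimal numpy sketch, taking the pixel subset to be the whole region:

```python
import numpy as np

def second_order_smoothness_loss(dm_est, mask):
    """Local loss L(i, j): mean absolute second-order gradient of the
    estimated depth inside the region. This variant needs only the
    region location from the training depth map, not a fitted plane."""
    gy, gx = np.gradient(dm_est)
    gxy, gxx = np.gradient(gx)
    gyy, gyx = np.gradient(gy)
    return float(np.mean(np.abs(np.stack([gxx, gxy, gyx, gyy])[:, mask])))
```

A design note: because no ground-truth plane parameters enter this term, it only pushes the estimate toward being *some* plane over the region, relying on the per-pixel term to anchor the absolute depths.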
Fig. 2 is an exemplary flowchart of a method 200 for training a deep neural network according to another embodiment of the present disclosure. The method 200 is described below with reference to fig. 2.
After the deep neural network is constructed and a training set is obtained that includes the training sample images IM(i) and the training sample depth maps DMt(i) (i=1, 2, …), execution of the method 200 may begin.
For each training sample image in the training set, steps 201, 203, 205, 207 are performed.
At step 201, a depth map is estimated. Specifically, a corresponding estimated depth map DMe(i) is generated from the training sample image IM(i) using the deep neural network.
At step 203, planar regions are detected. Specifically, planar regions Rs(i, j) are detected from the training sample depth map DMt(i) by: calculating gradient values for a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of these pixels to obtain at least one cluster, and extracting the connected components of the image region corresponding to each cluster as planar regions. The number of planar regions is at least 1. Each planar region satisfies: within the region, the average of the absolute values of the second-order gradient values is below a predetermined second-order gradient threshold. If a region obtained by clustering does not meet this average second-order gradient requirement, it is discarded, i.e., it is not used in the comparison-based loss calculation that follows. Hierarchical clustering may be employed for the clustering step. The method 200 may also include the post-processing described previously.
At step 205, the loss is calculated based on the planar region comparison. Specifically, a loss Lt(i) for the training sample image IM(i) is calculated based on the training sample depth map DMt(i) and the estimated depth map DMe(i), wherein the loss includes a loss term L(i) calculated based on a comparison of at least one planar region (Rs(i, j), j=1, 2, …) in DMt(i) and the corresponding regions (Re(i, j), j=1, 2, …) in DMe(i).
At step 207, the parameters are optimized. Specifically, parameters of the deep neural network are optimized based on the calculated loss Lt(i). Typically, the parameters are optimized by minimizing the loss function.
Parameters of the deep neural network may be optimized once for a batch of training sample images.
Steps 201 and 203 may be performed in parallel or sequentially. Step 201 may be advanced, or step 203 may be advanced.
A method for determining a depth map using the trained deep neural network of the present disclosure is described below.
Fig. 3 is an exemplary flowchart of a method 300 for determining a depth map according to one embodiment of the present disclosure. In step 301, a deep neural network is trained. Specifically, the training takes into account a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map; the training method may be, for example, the aforementioned method 100 or 200. In step 303, a depth map is determined from a single input image. Specifically, the trained deep neural network determines a corresponding depth map from the single image; the method 300 is thus able to determine a depth map from a single image, and the accuracy of the depth map is improved because the additional constraint that planar regions impose on the loss was taken into account during training.
The apparatus for training a deep neural network of the present invention is described below.
Fig. 4 is an exemplary block diagram of an apparatus 400 for training a deep neural network according to one embodiment of the present disclosure. The apparatus 400 comprises: a depth map estimation unit 401, a loss calculation unit 403, and a parameter optimization unit 405. The depth map estimation unit 401 is configured to generate a corresponding estimated depth map from a training sample image using the deep neural network. The loss calculation unit 403 is configured to calculate a loss of the training sample image based on the training sample depth map and the estimated depth map of the training sample image. The parameter optimization unit 405 is configured to optimize parameters of the deep neural network based on the calculated loss. The loss comprises a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map. Referring to the method 100, the apparatus 400 may further comprise a planar region detection unit configured to detect at least one planar region from the training sample depth map. The planar region detection unit may be configured to detect the at least one planar region from the training sample depth map in a number of ways, for example by calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of the plurality of pixels to obtain at least one cluster, and extracting a connected domain in the image area corresponding to each cluster as one planar region of the at least one planar region.
In one embodiment, the present disclosure also provides a storage medium. The storage medium has stored thereon program code readable by an information processing device, which when executed on the information processing device causes the information processing device to perform the above-described method according to the present invention (including a method for training a deep neural network and a method for determining a depth map). Storage media include, but are not limited to, floppy diskettes, compact discs, magneto-optical discs, memory cards, memory sticks, and the like.
Fig. 5 is an exemplary block diagram of an information processing apparatus 500 according to one embodiment of the present disclosure.
In fig. 5, a Central Processing Unit (CPU) 501 performs various processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 to a Random Access Memory (RAM) 503. The RAM 503 also stores data and the like necessary when the CPU 501 executes various processes, as necessary.
The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output interface 505 is also connected to the bus 504.
The following components are connected to the input/output interface 505: an input section 506 including a soft keyboard or the like; an output portion 507 including a display such as a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet or a local area network.
The drive 510 is also connected to the input/output interface 505 as needed. A removable medium 511 such as a semiconductor memory or the like is installed on the drive 510 as needed, so that a computer program read therefrom is installed to the storage section 508 as needed.
The CPU 501 may run the program code of the aforementioned method for determining a depth map or the program code of the method for training a deep neural network.
The method for training the deep neural network, the method for determining a depth map, and the corresponding apparatus of the present invention have at least the following beneficial effect: the trained deep neural network obtained by the method or apparatus improves the accuracy of the estimated depth map when only a single input image is used.
While the invention has been disclosed in the context of specific embodiments thereof, it will be appreciated that those skilled in the art may devise various modifications, including combinations and substitutions of features between embodiments, as appropriate, within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents are intended to be included within the scope of this invention.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
Furthermore, the methods of the embodiments of the present invention are not limited to being performed in the temporal order described in the specification or shown in the drawings, but may be performed in other temporal orders, in parallel, or independently. Therefore, the order of execution of the methods described in the present specification does not limit the technical scope of the present invention.
Supplementary Notes
1. A method for training a deep neural network, the method comprising the steps of:
for each training sample image in the training set,
generating a corresponding estimated depth map according to the training sample image by using the depth neural network;
calculating a loss of the training sample image based on a training sample depth map of the training sample image and the estimated depth map; and
optimizing parameters of the deep neural network based on the calculated loss;
wherein the loss comprises a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map.
2. The method of supplementary note 1, further comprising detecting the at least one planar region from the training sample depth map.
3. The method of supplementary note 2, wherein the at least one planar region is detected from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, and determining the at least one planar region based on the gradient values of the plurality of pixels;
wherein each of the at least one planar region satisfies: in the planar region, a percentage ratio of the number of pixels whose absolute value of the difference of the gradient value and the constant indicating the overall gradient of the planar region is lower than a first threshold value is higher than a first predetermined percentage, or an average value of the absolute value of the difference of the gradient value and the constant indicating the overall gradient of the planar region is lower than a predetermined gradient deviation threshold value.
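A hedged sketch of the criterion of supplementary note 3; the use of the median as the "constant indicating the overall gradient" and both threshold values are illustrative assumptions:

```python
import numpy as np

def is_planar_by_gradient(grad_region, first_thresh=0.02, min_fraction=0.9):
    """Check the supplementary-note-3 planarity condition on one region.

    grad_region: 1-D array of gradient values of the region's pixels.
    """
    c = np.median(grad_region)      # constant indicating the overall gradient
    dev = np.abs(grad_region - c)
    # Either enough pixels are close to the constant, or the average
    # deviation is below the gradient deviation threshold.
    return (dev < first_thresh).mean() > min_fraction or dev.mean() < first_thresh
```

A region with a uniform gradient passes the check, while a region whose gradient varies widely does not.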
4. The method of supplementary note 2, wherein the at least one planar region is detected from the training sample depth map by: calculating second order gradient values of a plurality of pixels of the training sample depth map, and determining the at least one planar region based on the second order gradient values of the plurality of pixels;
wherein each of the at least one planar region satisfies: in the planar region, the percentage of the number of pixels whose absolute value of the second order gradient value is smaller than the second threshold value is higher than a second predetermined percentage, or the average value of the absolute values of the second order gradient values is lower than a predetermined second order gradient threshold value.
5. The method of supplementary note 2, wherein the at least one planar region is detected from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of the plurality of pixels to obtain at least one cluster, and extracting a connected domain in the image area corresponding to each cluster as one planar region of the at least one planar region.
6. The method according to one of supplementary notes 3 to 5, wherein the calculated loss term comprises a local loss for one of the at least one planar region, determined by: determining plane parameters of the corresponding plane for the planar region in the training sample depth map, and, based on the plane parameters, calculating the sum or average of the distances from a first predetermined number of three-dimensional points in the corresponding region of the estimated depth map to the corresponding plane as the local loss.
7. The method according to one of supplementary notes 3 to 5, wherein the calculated loss term comprises a local loss for one of the at least one planar region, determined by: determining the normal of the corresponding plane for the planar region in the training sample depth map, and calculating the sum or average of the absolute values of the inner products between a second predetermined number of vectors in the corresponding region of the estimated depth map and the normal of the corresponding plane as the local loss.
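A sketch of the variant of supplementary note 7, assuming the plane normal has been determined from the training sample depth map and that the "second predetermined number of vectors" are taken between consecutive 3D points of the estimated region (an illustrative pairing, not mandated by the text):

```python
import numpy as np

def normal_inner_product_loss(normal, pts_est):
    """Mean |<v, n>| over vectors v between points of the estimated region."""
    normal = normal / np.linalg.norm(normal)
    vecs = np.diff(pts_est, axis=0)    # vectors between consecutive 3D points
    # Vectors lying in the plane are perpendicular to the normal, so each
    # inner product measures deviation from the plane.
    return np.abs(vecs @ normal).mean()
```

Points lying in the plane give a loss of zero, since every in-plane vector is perpendicular to the normal.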
8. The method according to one of supplementary notes 3 to 5, wherein the calculated loss term comprises a local loss for one of the at least one planar region, determined by: calculating the sum or average of the absolute values of the second-order gradients of a third predetermined number of pixels in the corresponding region of the estimated depth map as the local loss.
9. The method of one of supplementary notes 3 to 5, wherein determining at least one planar region in the estimated depth map further comprises: filling a void in one of the at least one planar region.
10. The method according to one of supplementary notes 3 to 5, wherein in determining at least one planar area in the estimated depth map, if the number of pixels in a candidate planar area of the at least one planar area is smaller than a third threshold value, the candidate planar area is prohibited from being used as one of the at least one planar area.
11. The method according to one of supplementary notes 3 to 5, wherein, when determining at least one planar region in the estimated depth map, a candidate planar region of the at least one planar region is eroded, and the eroded candidate planar region is taken as one of the at least one planar region.
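The three post-processing steps of supplementary notes 9 to 11 map naturally onto standard binary morphology; the ordering and threshold values below are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def post_process_region(mask, min_pixels=30, erosion_iters=1):
    """Post-process one candidate planar region (a boolean mask)."""
    mask = ndimage.binary_fill_holes(mask)     # note 9: fill voids
    if mask.sum() < min_pixels:                # note 10: too few pixels
        return None                            # prohibit use of the candidate
    # Note 11: erode to keep the region away from unreliable boundaries.
    return ndimage.binary_erosion(mask, iterations=erosion_iters)
```

For an 8x8 square candidate with a one-pixel hole, the hole is filled and one erosion pass shrinks the region to 6x6; a candidate below the pixel threshold is discarded.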
12. The method of supplementary note 3, wherein the gradient values include a horizontal gradient value and a vertical gradient value.
13. The method of supplementary note 5, wherein the at least one cluster is obtained by performing hierarchical clustering.
14. The method of supplementary note 1, wherein the loss further comprises a sum or average of the absolute values of the depth differences between pixels of the estimated depth map of the training sample image and pixels of the training sample depth map.
15. A method for training a deep neural network, the method comprising the steps of:
for each training sample image in the training set,
generating a corresponding estimated depth map according to the training sample image by using the depth neural network;
detecting at least one planar region from a training sample depth map of the training sample image;
calculating a loss of the training sample image based on a training sample depth map of the training sample image and the estimated depth map; and
optimizing parameters of the deep neural network based on the calculated loss;
wherein the loss comprises a loss term calculated based on a comparison of the at least one planar region in the training sample depth map and a corresponding region in the estimated depth map;
wherein the at least one planar region is detected from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of the plurality of pixels to obtain at least one cluster, and extracting a connected domain in the image area corresponding to each cluster as one planar region of the at least one planar region; and
wherein each of the at least one planar region satisfies: in the planar region, the average value of the absolute values of the second-order gradient values is below a predetermined second-order gradient threshold.
16. The method of supplementary note 15, further comprising at least one of the following post-processing operations:
filling a void in one of the at least one planar region;
in determining at least one planar region in the estimated depth map, if the number of pixels in a candidate planar region of the at least one planar region is less than a third threshold, prohibiting the use of the candidate planar region as one planar region of the at least one planar region; and
eroding a candidate planar region of the at least one planar region when determining the at least one planar region in the estimated depth map, and taking the eroded candidate planar region as one planar region of the at least one planar region.
17. The method of supplementary note 15, wherein the at least one cluster is obtained by hierarchical clustering.
18. An apparatus for training a deep neural network, comprising:
a depth map estimation unit configured to generate a corresponding estimated depth map from a training sample image using the deep neural network;
a loss calculation unit configured to calculate a loss of the training sample image based on a training sample depth map of the training sample image and the estimated depth map; and
a parameter optimization unit configured to optimize parameters of the deep neural network based on the calculated loss;
wherein the loss comprises a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map.
19. The apparatus of supplementary note 18, further comprising a planar region detection unit configured to detect the at least one planar region from the training sample depth map.
20. The apparatus of supplementary note 19, wherein the planar region detection unit is further configured to detect the at least one planar region from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of the plurality of pixels to obtain at least one cluster, and extracting a connected domain in the image area corresponding to each cluster as one planar region of the at least one planar region.

Claims (10)

1. A method for training a deep neural network for predicting depth maps, the method comprising the steps of:
for each training sample image in the training set,
generating a corresponding estimated depth map according to the training sample image by using the depth neural network;
calculating a loss of the training sample image based on a training sample depth map of the training sample image and the estimated depth map; and
optimizing parameters of the deep neural network based on the calculated loss;
wherein the loss comprises a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map; and
additional constraints related to the at least one planar region are considered in determining the loss term.
2. The method of claim 1, further comprising detecting the at least one planar region from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, and determining the at least one planar region based on the gradient values of the plurality of pixels;
wherein each of the at least one planar region satisfies: in the planar region, a percentage ratio of the number of pixels whose absolute value of the difference of the gradient value and the constant indicating the overall gradient of the planar region is lower than a first threshold value is higher than a first predetermined percentage, or an average value of the absolute value of the difference of the gradient value and the constant indicating the overall gradient of the planar region is lower than a predetermined gradient deviation threshold value.
3. The method of claim 1, further comprising detecting the at least one planar region from the training sample depth map by: calculating second order gradient values of a plurality of pixels of the training sample depth map, and determining the at least one planar region based on the second order gradient values of the plurality of pixels;
wherein each of the at least one planar region satisfies: in the planar region, the percentage of the number of pixels whose absolute value of the second order gradient value is smaller than the second threshold value is higher than a second predetermined percentage, or the average value of the absolute values of the second order gradient values is lower than a predetermined second order gradient threshold value.
4. The method of claim 1, further comprising detecting the at least one planar region from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of the plurality of pixels to obtain at least one cluster, and extracting a connected domain in the image area corresponding to each cluster as one planar region of the at least one planar region.
5. The method of one of claims 2 to 4, wherein the calculated loss term comprises a local loss for one of the at least one planar region, determined by: determining plane parameters of the corresponding plane for the planar region in the training sample depth map, and, based on the plane parameters, calculating the sum or average of the distances from a first predetermined number of three-dimensional points in the corresponding region of the estimated depth map to the corresponding plane as the local loss.
6. The method of one of claims 2 to 4, wherein the calculated loss term comprises a local loss for one of the at least one planar region, determined by: determining the normal of the corresponding plane for the planar region in the training sample depth map, and calculating the sum or average of the absolute values of the inner products between a second predetermined number of vectors in the corresponding region of the estimated depth map and the normal of the corresponding plane as the local loss.
7. The method of one of claims 2 to 4, wherein the calculated loss term comprises a local loss for one of the at least one planar region, determined by: calculating the sum or average of the absolute values of the second-order gradients of a third predetermined number of pixels in the corresponding region of the estimated depth map as the local loss.
8. A method for training a deep neural network for predicting depth maps, the method comprising the steps of:
for each training sample image in the training set,
generating a corresponding estimated depth map according to the training sample image by using the depth neural network;
detecting at least one planar region from a training sample depth map of the training sample image;
calculating a loss of the training sample image based on a training sample depth map of the training sample image and the estimated depth map; and
optimizing parameters of the deep neural network based on the calculated loss;
wherein the loss comprises a loss term calculated based on a comparison of the at least one planar region in the training sample depth map and a corresponding region in the estimated depth map;
wherein the at least one planar region is detected from the training sample depth map by: calculating gradient values of a plurality of pixels of the training sample depth map, clustering based on the gradient values and positions of the plurality of pixels to obtain at least one cluster, and extracting a connected domain in the image area corresponding to each cluster as one planar region of the at least one planar region;
wherein each of the at least one planar region satisfies: in the planar region, the average value of the absolute values of the second-order gradient values is below a predetermined second-order gradient threshold; and
additional constraints related to the at least one planar region are considered in determining the loss term.
9. The method of claim 8, wherein the at least one cluster is obtained by performing hierarchical clustering.
10. An apparatus for training a deep neural network for predicting depth maps, comprising:
a depth map estimation unit configured to generate a corresponding estimated depth map from a training sample image using the deep neural network;
a loss calculation unit configured to calculate a loss of the training sample image based on a training sample depth map of the training sample image and the estimated depth map; and
a parameter optimization unit configured to optimize parameters of the deep neural network based on the calculated loss;
wherein the loss comprises a loss term calculated based on a comparison of at least one planar region in the training sample depth map and a corresponding region in the estimated depth map; and
additional constraints related to the at least one planar region are considered in determining the loss term.
CN201810844262.4A 2018-07-27 2018-07-27 Method and apparatus for training deep neural networks Active CN110766152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810844262.4A CN110766152B (en) 2018-07-27 2018-07-27 Method and apparatus for training deep neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810844262.4A CN110766152B (en) 2018-07-27 2018-07-27 Method and apparatus for training deep neural networks

Publications (2)

Publication Number Publication Date
CN110766152A CN110766152A (en) 2020-02-07
CN110766152B true CN110766152B (en) 2023-08-04

Family

ID=69327293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810844262.4A Active CN110766152B (en) 2018-07-27 2018-07-27 Method and apparatus for training deep neural networks

Country Status (1)

Country Link
CN (1) CN110766152B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8781882B1 (en) * 2008-08-07 2014-07-15 Accenture Global Services Limited Automotive industry high performance capability assessment
CN105488515A (en) * 2014-09-17 2016-04-13 富士通株式会社 Method for training convolutional neural network classifier and image processing device
CN106096538A (en) * 2016-06-08 2016-11-09 中国科学院自动化研究所 Face identification method based on sequencing neural network model and device
CN106599797A (en) * 2016-11-24 2017-04-26 北京航空航天大学 Infrared face identification method based on local parallel nerve network
CN106897390A (en) * 2017-01-24 2017-06-27 北京大学 Target precise search method based on depth measure study
WO2017114810A1 (en) * 2015-12-31 2017-07-06 Vito Nv Methods, controllers and systems for the control of distribution systems using a neural network arhcitecture
CN107133616A (en) * 2017-04-02 2017-09-05 南京汇川图像视觉技术有限公司 A kind of non-division character locating and recognition methods based on deep learning
CN107208478A (en) * 2015-02-20 2017-09-26 哈里伯顿能源服务公司 The classification of grain graininess and distribution of shapes in drilling fluid
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107229918A (en) * 2017-05-26 2017-10-03 西安电子科技大学 A kind of SAR image object detection method based on full convolutional neural networks
CN107240119A (en) * 2017-04-19 2017-10-10 北京航空航天大学 Utilize the method for improving the fuzzy clustering algorithm extraction uneven infrared pedestrian of gray scale
CN107403415A (en) * 2017-07-21 2017-11-28 深圳大学 Compression depth plot quality Enhancement Method and device based on full convolutional neural networks
CN107480726A (en) * 2017-08-25 2017-12-15 电子科技大学 A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon
WO2018000752A1 (en) * 2016-06-27 2018-01-04 浙江工商大学 Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN107610137A (en) * 2017-09-27 2018-01-19 武汉大学 A kind of high-resolution remote sensing image optimal cut part method
CN107767413A (en) * 2017-09-20 2018-03-06 华南理工大学 A kind of image depth estimation method based on convolutional neural networks
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108304859A (en) * 2017-12-29 2018-07-20 达闼科技(北京)有限公司 Image-recognizing method and cloud system
US10032256B1 (en) * 2016-11-18 2018-07-24 The Florida State University Research Foundation, Inc. System and method for image processing using automatically estimated tuning parameters

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9363498B2 (en) * 2011-11-11 2016-06-07 Texas Instruments Incorporated Method, system and computer program product for adjusting a convergence plane of a stereoscopic image
US9483728B2 (en) * 2013-12-06 2016-11-01 International Business Machines Corporation Systems and methods for combining stochastic average gradient and hessian-free optimization for sequence training of deep neural networks
EP3234871B1 (en) * 2014-12-17 2020-11-25 Google LLC Generating numeric embeddings of images
US10410118B2 (en) * 2015-03-13 2019-09-10 Deep Genomics Incorporated System and method for training neural networks
US10037592B2 (en) * 2015-06-05 2018-07-31 Mindaptiv LLC Digital quaternion logarithm signal processing system and method for images and other data types
US10572800B2 (en) * 2016-02-05 2020-02-25 Nec Corporation Accelerating deep neural network training with inconsistent stochastic gradient descent
CN106295678B (en) * 2016-07-27 2020-03-06 北京旷视科技有限公司 Neural network training and constructing method and device and target detection method and device
US11068781B2 (en) * 2016-10-07 2021-07-20 Nvidia Corporation Temporal ensembling for semi-supervised learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Binocular stereo vision matching algorithm based on a deep convolutional neural network; Xiao Jinsheng; Tian Hong; Zou Wentao; Tong Le; Lei Junfeng; Acta Optica Sinica (Issue 08); full text *

Also Published As

Publication number Publication date
CN110766152A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
Hoang et al. Metaheuristic optimized edge detection for recognition of concrete wall cracks: a comparative study on the performances of roberts, prewitt, canny, and sobel algorithms
CN109447154B (en) Picture similarity detection method, device, medium and electronic equipment
CN114418957A (en) Global and local binary pattern image crack segmentation method based on robot vision
US20170270664A1 (en) Methods for characterizing features of interest in digital images and systems for practicing same
CN108230292B (en) Object detection method, neural network training method, device and electronic equipment
US20070173744A1 (en) System and method for detecting intervertebral disc alignment using vertebrae segmentation
CN103679743A (en) Target tracking device and method as well as camera
Yang et al. An accurate mura defect vision inspection method using outlier-prejudging-based image background construction and region-gradient-based level set
CN111444807B (en) Target detection method, device, electronic equipment and computer readable medium
US10332244B2 (en) Methods and apparatuses for estimating an ambiguity of an image
Liang et al. An algorithm for concrete crack extraction and identification based on machine vision
Chiverton et al. Automatic bootstrapping and tracking of object contours
CN116740072B (en) Road surface defect detection method and system based on machine vision
CN117392464B (en) Image anomaly detection method and system based on multi-scale denoising probability model
CN112102202A (en) Image segmentation method and image processing device
JP6260113B2 (en) Edge extraction method and equipment
CN107808165B (en) Infrared image matching method based on SUSAN corner detection
CN110245600A (en) Adaptively originate quick stroke width unmanned plane Approach for road detection
CN115457044A (en) Pavement crack segmentation method based on class activation mapping
KR102330263B1 (en) Method and apparatus for detecting nuclear region using artificial neural network
Kumar et al. Histogram thresholding in image segmentation: a joint level set method and lattice boltzmann method based approach
Adu-Gyamfi et al. Functional evaluation of pavement condition using a complete vision system
CN110766152B (en) Method and apparatus for training deep neural networks
CN103235950A (en) Target detection image processing method
Wang et al. Comparison and Analysis of Several Clustering Algorithms for Pavement Crack Segmentation Guided by Computational Intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant