
CN106803071A - Method and device for object detection in an image - Google Patents

Method and device for object detection in an image

Info

Publication number
CN106803071A
CN106803071A (application CN201611249792.1A)
Authority
CN
China
Prior art keywords
grid
image
central point
classification
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611249792.1A
Other languages
Chinese (zh)
Other versions
CN106803071B (en)
Inventor
Yang Songlin (杨松林)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201611249792.1A priority Critical patent/CN106803071B/en
Publication of CN106803071A publication Critical patent/CN106803071A/en
Priority to EP17886017.7A priority patent/EP3545466A4/en
Priority to PCT/CN2017/107043 priority patent/WO2018121013A1/en
Priority to US16/457,861 priority patent/US11113840B2/en
Application granted granted Critical
Publication of CN106803071B publication Critical patent/CN106803071B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention disclose a method and device for object detection in an image, used to improve the real-time performance of object detection. In the method, an image to be detected is divided into a plurality of grid cells according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network; the feature vector output by the convolutional neural network for each grid cell of the image is obtained; the maximum value among the class parameters in each feature vector is identified; and when that maximum exceeds a set threshold, the position information of an object of the class corresponding to that class parameter is determined from the centre-point position parameters and the outline-size parameters in the feature vector. Because the pre-trained convolutional neural network determines the class and the position of objects in the image in a single pass, position and class detection are performed simultaneously, without selecting multiple candidate regions. This saves detection time, improves the real-time performance and efficiency of detection, and facilitates global optimisation.

Description

Method and device for object detection in an image
Technical field
The present invention relates to the field of machine learning, and in particular to a method and device for object detection in an image.
Background technology
With the development of video surveillance technology, intelligent video monitoring is applied in more and more scenes, such as traffic, shopping malls, hospitals, residential communities and parks. These applications of intelligent video monitoring lay the foundation for object detection in images of various scenes.
When the prior art performs object detection in an image, it generally uses the region-based convolutional neural network (Region Convolutional Neural Network, R-CNN) and its extensions Fast R-CNN and Faster R-CNN. Fig. 1 is a schematic flow chart of object detection with R-CNN. The detection process includes: receiving an input image, extracting candidate regions (region proposals) from the image, computing the CNN features of each candidate region, and determining the type and position of the object by classification and regression. In this process, about 2000 candidate regions need to be extracted from the image, and the extraction alone takes 1-2 s. Then, for each candidate region, its CNN features must be computed; since many candidate regions overlap, much of this feature computation is repeated work. The detection process further includes subsequent steps: feature learning on the proposals, correcting the determined object positions, and eliminating false alarms. The whole detection process may take 2-40 s, which severely affects the real-time performance of object detection.
In addition, when object detection is performed with R-CNN, region extraction uses saliency detection (selective search), CNN features are then computed with a convolutional neural network, and finally a support vector machine (SVM) model performs classification to determine the position of the target. These three steps are mutually independent methods, so the whole detection process cannot be globally optimised.
Fig. 2 is a schematic diagram of object detection with Faster R-CNN. The process uses a convolutional neural network: each sliding window generates a 256-dimensional vector at the intermediate layer; the class of the object is detected at the classification layer (cls layer), and the position of the object at the regression layer (reg layer). The class detection and the position detection are two independent steps, each of which must process the 256-dimensional data separately, so this process also lengthens the detection time and affects the real-time performance of object detection.
Summary of the invention
Embodiments of the invention disclose a method and device for object detection in an image, used to improve the real-time performance of object detection and to facilitate global optimisation of the detection process.
To achieve the above purpose, an embodiment of the invention discloses a method for object detection in an image, applied to an electronic device. The method includes:
dividing an image to be detected into a plurality of grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size;
inputting the divided image into a pre-trained convolutional neural network, and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector;
for the feature vector corresponding to each grid cell, identifying the maximum value among the class parameters in the feature vector, and, when the maximum exceeds a set threshold, determining the position information of an object of the class corresponding to that class parameter from the centre-point position parameters and the outline-size parameters in the feature vector.
Further, before the image to be detected is divided into a plurality of grid cells according to the preset dividing mode, the method also includes:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
Further, the training process of the convolutional neural network includes:
for each sample image in a sample image set, annotating the target objects with rectangular boxes;
dividing each sample image into a plurality of grid cells according to the preset dividing mode and determining the feature vector corresponding to each cell, wherein the size of each sample image is the target size; when a cell contains the centre point of a target object, the class parameter corresponding to the class of that target object in the cell's feature vector is set to a preset maximum, the values of the centre-point position parameters in the feature vector are determined from the position of the centre point within the cell, and the values of the outline-size parameters in the feature vector are determined from the size of the annotated rectangular box of the target object; when a cell does not contain the centre point of any target object, every parameter in the cell's feature vector is zero;
training the convolutional neural network with the sample images for which the feature vector of each grid cell has been determined.
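As a rough illustration of the labelling scheme above — the function name, the vector layout (class scores first, then centre offset and box size) and the parameter values are assumptions for illustration, not taken from the patent — the feature vector of a single grid cell might be built as follows:

```python
def encode_cell_label(num_classes, class_id=None,
                      cx=0.0, cy=0.0, bw=0.0, bh=0.0, max_value=1.0):
    """Build the feature vector for one grid cell.

    A cell that contains no target-object centre gets an all-zero vector;
    otherwise the class entry is set to the preset maximum, and the
    centre-point parameters (cx, cy) and outline-size parameters (bw, bh)
    are filled in after the class scores.
    """
    vec = [0.0] * (num_classes + 4)
    if class_id is None:
        return vec  # no target-object centre point falls in this cell
    vec[class_id] = max_value                       # class parameter at its preset maximum
    vec[num_classes:num_classes + 4] = [cx, cy, bw, bh]
    return vec

# A cell containing the centre of a class-2 object, with 5 classes in total:
label = encode_cell_label(num_classes=5, class_id=2, cx=0.4, cy=0.7, bw=0.2, bh=0.3)
```

The choice of putting position parameters after the class scores is arbitrary; any fixed layout shared between training labels and network outputs would serve.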
Further, before each sample image is divided into a plurality of grid cells according to the preset dividing mode, the method also includes:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
Further, training the convolutional neural network with the sample images for which the feature vector of each grid cell has been determined includes:
choosing sub-sample images from the sample image set, wherein the number of chosen sub-sample images is smaller than the number of sample images in the sample image set;
training the convolutional neural network with each of the chosen sub-sample images.
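The sub-sampling step can be sketched as follows; this is a minimal illustration, and the function name and the use of uniform random sampling are assumptions (the patent only requires that the chosen subset be smaller than the full set):

```python
import random

def choose_subsample(sample_images, count):
    """Choose `count` sub-sample images from the sample image set for one
    round of training; the subset must be strictly smaller than the set."""
    if count >= len(sample_images):
        raise ValueError("sub-sample must be smaller than the sample set")
    return random.sample(sample_images, count)  # sampling without replacement

# E.g. pick 64 sub-sample images out of a set of 1000:
batch = choose_subsample(list(range(1000)), 64)
```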
Further, the preset dividing mode includes:
dividing the image and the sample images into a plurality of grid cells with identical numbers of rows and columns; or,
dividing the image and the sample images into a plurality of grid cells whose numbers of rows and columns differ.
Further, the method also includes:
determining the error of the convolutional neural network from its predictions of the positions and classes of the objects in the sub-sample images and from the information of the annotated target objects in the sub-sample images;
when the error converges, determining that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\mathrm{loss}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\Big]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big]+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\big(C_i-\hat{C}_i\big)^2+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\big(C_i-\hat{C}_i\big)^2+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\big(P_i(c)-\hat{P}_i(c)\big)^2$$
where $S$ is the number of rows of the divided grid (equal to its number of columns when they are identical); $B$ is the preset number of rectangular boxes predicted per grid cell, typically 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the centre point of the annotated target object in grid cell $i$, and $\hat{x}_i$, $\hat{y}_i$ those of the predicted object; $h_i$ and $w_i$ are the height and width of the rectangular box of the annotated target object, and $\hat{h}_i$, $\hat{w}_i$ those of the box of the predicted object; $C_i$ is the annotated probability that grid cell $i$ currently contains a target object, and $\hat{C}_i$ the predicted probability; $P_i(c)$ is the probability that the annotated target object in cell $i$ belongs to class $c$, and $\hat{P}_i(c)$ the predicted probability; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the centre point of the object predicted in the $j$-th box lies in cell $i$ and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the predicted cell $i$ contains an object centre point and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when the predicted cell $i$ contains no object centre point and 0 otherwise. Here $\hat{P}_i(c)$ is determined according to the following equation:
$$\hat{P}_i(c)=P_r(\mathrm{Class}\mid\mathrm{Object})\cdot P_r(\mathrm{Object})$$
where $P_r(\mathrm{Object})$ is the predicted probability that grid cell $i$ currently contains an object, and $P_r(\mathrm{Class}\mid\mathrm{Object})$ is the predicted conditional probability that the object in cell $i$ belongs to class $c$.
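A loss of this five-term form can be sketched per grid cell in plain Python. This is a minimal illustration under the assumption $B = 1$ (so the inner sum over boxes collapses); the dictionary field names and the default weight values are invented, not taken from the patent:

```python
from math import sqrt

def yolo_style_loss(cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum the coordinate, size, confidence and class-probability error
    terms over grid cells. A cell dict with 'obj': True carries annotated
    and predicted values; with 'obj': False only the confidences matter."""
    loss = 0.0
    for c in cells:
        if c['obj']:
            # centre-point error, weighted by lambda_coord
            loss += lambda_coord * ((c['x'] - c['x_hat'])**2 + (c['y'] - c['y_hat'])**2)
            # box-size error on square roots of width and height
            loss += lambda_coord * ((sqrt(c['w']) - sqrt(c['w_hat']))**2
                                    + (sqrt(c['h']) - sqrt(c['h_hat']))**2)
            # object-confidence error
            loss += (c['C'] - c['C_hat'])**2
            # class-probability error over all classes
            loss += sum((p - q)**2 for p, q in zip(c['P'], c['P_hat']))
        else:
            # cells without an object centre only contribute a down-weighted
            # confidence error
            loss += lambda_noobj * (c['C'] - c['C_hat'])**2
    return loss

# A perfectly predicted object cell contributes zero; an empty cell with a
# small spurious confidence contributes lambda_noobj * (0 - 0.2)**2 = 0.02.
perfect_cell = {'obj': True, 'x': 0.5, 'x_hat': 0.5, 'y': 0.5, 'y_hat': 0.5,
                'w': 0.2, 'w_hat': 0.2, 'h': 0.3, 'h_hat': 0.3,
                'C': 1.0, 'C_hat': 1.0, 'P': [0.0, 1.0], 'P_hat': [0.0, 1.0]}
empty_cell = {'obj': False, 'C': 0.0, 'C_hat': 0.2}
```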
Further, determining the position information of the object of the class corresponding to the class parameter from the centre-point position parameters and the outline-size parameters in the feature vector includes:
determining the position information of the centre point within the grid cell from the position parameters of the centre point;
determining the centre point from that position information, taking the centre point as the centre of a rectangular box, determining the position information of the rectangular box from the outline-size parameters, taking the position information of the rectangular box as the position information of the object, and taking the object class corresponding to the class parameter as the class of the object.
Further, determining the position information of the centre point within the grid cell from the position parameters of the centre point includes:
taking a set point of the grid cell as a reference point, and determining the position information of the centre point within the cell from the reference point and the position parameters of the centre point.
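The reference-point scheme can be sketched as follows, assuming the cell's top-left corner is chosen as the set reference point and the centre-point parameters are offsets in [0, 1] relative to the cell; the patent leaves both choices open, so these are illustrative assumptions:

```python
def decode_centre(row, col, cell_w, cell_h, dx, dy):
    """Map a cell-relative centre offset (dx, dy, each in [0, 1]) to image
    coordinates, taking the cell's top-left corner as the reference point."""
    ref_x, ref_y = col * cell_w, row * cell_h   # the set reference point
    return ref_x + dx * cell_w, ref_y + dy * cell_h

# Centre of the cell in row 1, column 2, with 100x100 cells:
centre = decode_centre(1, 2, 100.0, 100.0, 0.5, 0.5)   # image coordinates
```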
An embodiment of the invention discloses a device for object detection in an image. The device includes:
a division module, configured to divide an image to be detected into a plurality of grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size;
a detection module, configured to input the divided image into a pre-trained convolutional neural network and obtain a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector;
a determining module, configured to, for the feature vector corresponding to each grid cell, identify the maximum value among the class parameters in the feature vector, and, when the maximum exceeds a set threshold, determine the position information of an object of the class corresponding to that class parameter from the centre-point position parameters and the outline-size parameters in the feature vector.
Further, the device also includes:
a judging and adjusting module, configured to judge whether the size of the image is the target size and, if not, to adjust the size of the image to the target size.
Further, the device also includes:
a training module, configured to: for each sample image in a sample image set, annotate the target objects with rectangular boxes; divide each sample image into a plurality of grid cells according to the preset dividing mode and determine the feature vector corresponding to each cell, wherein the size of each sample image is the target size; when a cell contains the centre point of a target object, set the class parameter corresponding to the class of that target object in the cell's feature vector to a preset maximum, determine the values of the centre-point position parameters in the feature vector from the position of the centre point within the cell, and determine the values of the outline-size parameters in the feature vector from the size of the annotated rectangular box of the target object; when a cell does not contain the centre point of any target object, set every parameter in the cell's feature vector to zero; and train the convolutional neural network with the sample images for which the feature vector of each grid cell has been determined.
Further, the training module is also configured to judge, for each sample image, whether the size of the sample image is the target size and, if not, to adjust the size of the sample image to the target size.
Further, the training module is specifically configured to choose sub-sample images from the sample image set, wherein the number of chosen sub-sample images is smaller than the number of sample images in the sample image set, and to train the convolutional neural network with each of the chosen sub-sample images.
Further, the device also includes:
an error calculation module, configured to determine the error of the convolutional neural network from its predictions of the positions and classes of the objects in the sub-sample images and from the target objects annotated in the sub-sample images, and,
when the error converges, to determine that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\mathrm{loss}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\Big]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big]+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\big(C_i-\hat{C}_i\big)^2+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\big(C_i-\hat{C}_i\big)^2+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\big(P_i(c)-\hat{P}_i(c)\big)^2$$
where $S$ is the number of rows of the divided grid (equal to its number of columns when they are identical); $B$ is the preset number of rectangular boxes predicted per grid cell, typically 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the centre point of the annotated target object in grid cell $i$, and $\hat{x}_i$, $\hat{y}_i$ those of the predicted object; $h_i$ and $w_i$ are the height and width of the rectangular box of the annotated target object, and $\hat{h}_i$, $\hat{w}_i$ those of the box of the predicted object; $C_i$ is the annotated probability that grid cell $i$ currently contains a target object, and $\hat{C}_i$ the predicted probability; $P_i(c)$ is the probability that the annotated target object in cell $i$ belongs to class $c$, and $\hat{P}_i(c)$ the predicted probability; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the centre point of the object predicted in the $j$-th box lies in cell $i$ and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the predicted cell $i$ contains an object centre point and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when the predicted cell $i$ contains no object centre point and 0 otherwise. Here $\hat{P}_i(c)$ is determined according to the following equation:
$$\hat{P}_i(c)=P_r(\mathrm{Class}\mid\mathrm{Object})\cdot P_r(\mathrm{Object})$$
where $P_r(\mathrm{Object})$ is the predicted probability that grid cell $i$ currently contains an object, and $P_r(\mathrm{Class}\mid\mathrm{Object})$ is the predicted conditional probability that the object in cell $i$ belongs to class $c$.
Further, the determining module is specifically configured to determine the position information of the centre point within the grid cell from the position parameters of the centre point;
determine the centre point from that position information, take the centre point as the centre of a rectangular box, determine the position information of the rectangular box from the outline-size parameters, take the position information of the rectangular box as the position information of the object, and take the object class corresponding to the class parameter as the class of the object.
Further, the determining module is specifically configured to take a set point of the grid cell as a reference point and to determine the position information of the centre point within the cell from the reference point and the position parameters of the centre point.
Embodiments of the invention provide a method and device for object detection in an image. In the method, an image to be detected, whose size is a target size, is divided into a plurality of grid cells according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network; a plurality of feature vectors of the image output by the convolutional neural network are obtained, each grid cell corresponding to one feature vector; the maximum value among the class parameters in each feature vector is identified; and when that maximum exceeds a set threshold, the position information of an object of the class corresponding to that class parameter is determined from the centre-point position parameters and the outline-size parameters in the feature vector. Because the pre-trained convolutional neural network determines the feature vectors of the image, and the class and position of an object in the image are determined from the class parameters and position-related parameters in those vectors, position and class detection are performed simultaneously, which facilitates global optimisation. Moreover, because the position and class of an object are determined from the feature vector of each grid cell, there is no need to select multiple candidate regions, which saves detection time and improves the real-time performance and efficiency of detection.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a schematic flow chart of object detection with R-CNN;
Fig. 2 is a schematic diagram of object detection with Faster R-CNN;
Fig. 3 is a schematic diagram of the object detection process in an image provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the detailed implementation of object detection in an image provided in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the training process of the convolutional neural network provided in an embodiment of the present invention;
Fig. 6A-Fig. 6D are schematic diagrams of annotation results of target objects provided in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the construction of the cube structure in Fig. 6D;
Fig. 8 is a schematic structural diagram of a device for object detection in an image provided in an embodiment of the present invention.
Specific embodiment
To effectively improve the efficiency and real-time performance of object detection and to facilitate its global optimisation, embodiments of the present invention provide a method and device for object detection in an image.
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Fig. 3 is a schematic diagram of the object detection process in an image provided in an embodiment of the present invention. The process includes the following steps:
Step S301: dividing an image to be detected into a plurality of grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size.
The embodiments of the present invention are applied to an electronic device, which may specifically be a desktop computer, a notebook, or another smart device with processing capability.
After the image to be detected of the target size is obtained, it is divided into a plurality of grid cells according to the preset dividing mode, which is identical to the dividing mode applied to the images used when the convolutional neural network was trained. For convenience, the image may, for example, be divided into multiple rows and multiple columns, with equal spacing between the rows and between the columns. The image may of course also be divided into multiple irregular cells, as long as the image to be detected and the images used to train the convolutional neural network are divided in the same way.
When the image is divided into multiple rows and columns, it may be divided into cells whose numbers of rows and columns are equal, or into cells whose numbers of rows and columns differ; the aspect ratios of the resulting cells may be identical or different.
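As an illustration of such a row-column division — the cell representation as (x, y, width, height) tuples and the 448x448 example size are assumptions — an image of the target size can be split like this:

```python
def divide_into_grid(width, height, rows, cols):
    """Split an image of the target size into rows x cols grid cells; rows
    and cols need not be equal, and cell aspect ratios may differ.
    Each cell is returned as (x, y, cell_width, cell_height)."""
    cell_w, cell_h = width / cols, height / rows
    return [(c * cell_w, r * cell_h, cell_w, cell_h)
            for r in range(rows) for c in range(cols)]

# The 7x7 example from the text yields 49 cells:
cells = divide_into_grid(448, 448, 7, 7)
```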
Step S302: inputting the divided image into the pre-trained convolutional neural network and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector.
To detect the class and position of objects in an image, a convolutional neural network is trained in the embodiments of the present invention, and the feature vector corresponding to each grid cell is obtained from the trained network. For example, the image may be divided into the 49 cells of a 7x7 grid; after the divided image is input into the trained convolutional neural network, 49 feature vectors are output, each corresponding to one grid cell.
Step S303: for the feature vector corresponding to each grid cell, identifying the maximum value among the class parameters in the feature vector, and, when the maximum exceeds a set threshold, determining the position information of the object of the class corresponding to that class parameter from the centre-point position parameters and the outline-size parameters in the feature vector.
Specifically, the feature vector obtained in the embodiments of the present invention is a multi-dimensional vector that includes at least class parameters and position parameters, where there are multiple class parameters and the position parameters in turn include centre-point position parameters and outline-size parameters. After the feature vector corresponding to each grid cell is obtained, it is judged for each cell whether an object has been detected. If the maximum among the multiple class parameters in a cell's feature vector exceeds the set threshold, an object has been detected in that cell, its class is the class corresponding to that class parameter, and its position can be determined from the cell's feature vector.
Because the position parameters in the feature vectors used when training the convolutional neural network are determined according to a set method, the position of the object can be determined according to the same method.
Because in the embodiments of the present invention the pre-trained convolutional neural network determines the feature vectors of the image, and the class and position of an object are determined from the class parameters and position-related parameters in those vectors, position and class prediction are performed simultaneously, which facilitates global optimisation. Moreover, because the position and class of an object are determined from the feature vector of each grid cell, there is no need to select multiple candidate regions, which saves detection time and improves the real-time performance and efficiency of detection.
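Putting steps S302-S303 together, the thresholded decoding of the per-cell feature vectors might look like this; the vector layout (class scores first, then four position parameters) is an assumption for illustration:

```python
def detect(feature_vectors, num_classes, threshold):
    """For each cell's feature vector [class scores..., cx, cy, w, h],
    report (cell index, class id, position parameters) whenever the
    largest class score exceeds the set threshold."""
    detections = []
    for i, vec in enumerate(feature_vectors):
        scores = vec[:num_classes]
        best = max(range(num_classes), key=lambda k: scores[k])
        if scores[best] > threshold:
            detections.append((i, best, tuple(vec[num_classes:])))
    return detections

# Two cells, two classes: only the first cell clears the threshold.
vectors = [[0.1, 0.9, 0.5, 0.5, 0.2, 0.3],
           [0.2, 0.1, 0.0, 0.0, 0.0, 0.0]]
found = detect(vectors, num_classes=2, threshold=0.5)
```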
Object detection in the embodiment of the present invention is performed on images of a target size, where the target size is the uniform size of the images used when the convolutional neural network is trained. The size can be arbitrary, as long as the image size used during object detection is identical to the image size used during training. The target size may, for example, be 1024*1024, or 256*512, etc.
Therefore, in the embodiment of the present invention, in order to ensure that the image input into the convolutional neural network is of the target size, before the image to be detected is divided into multiple grids according to the preset dividing mode, the method further includes:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
When the image to be detected is of the target size, subsequent processing is performed on the image directly; when the image to be detected is not of the target size, the image to be detected is adjusted to the target size. The adjustment of image size belongs to the prior art and is not repeated in the embodiment of the present invention.
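As a minimal sketch of this size check and adjustment, the following uses the 1024*1024 target size and the 7*7 grid mentioned elsewhere in the text as example values; the nearest-neighbor resize stands in for whatever prior-art scaling an implementation would actually use, and all names are illustrative:

```python
import numpy as np

def prepare_image(image, target_hw=(1024, 1024), grid=(7, 7)):
    """Resize `image` to the target size if needed and return the grid-cell size.

    Nearest-neighbor index selection keeps the sketch dependency-free; a real
    implementation would use an interpolating resize from the prior art.
    """
    th, tw = target_hw
    h, w = image.shape[:2]
    if (h, w) != (th, tw):
        rows = np.arange(th) * h // th   # source row for each target row
        cols = np.arange(tw) * w // tw   # source column for each target column
        image = image[rows][:, cols]
    # Integer cell size is a simplification when the target size is not an
    # exact multiple of the grid dimensions.
    cell_h, cell_w = th // grid[0], tw // grid[1]
    return image, (cell_h, cell_w)
```

Once the image is at the target size, the grid division itself is just bookkeeping: each cell is addressed by its row and column index.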
Specifically, in the embodiment of the present invention, determining the position information of the object of the classification corresponding to the classification parameter according to the center-point location parameter and the shape-size parameter in the feature vector includes:
determining, according to the location parameter of the center point, the position information of the center point in the grid;
determining the center point according to the position information, taking the center point as the center of a rectangular frame, determining the position information of the rectangular frame according to the shape-size parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object classification corresponding to the classification parameter as the classification of the object.
Wherein, determining the position information of the center point in the grid according to the location parameter of the center point includes:
taking a set point of the grid as a reference point; and determining the position information of the center point in the grid according to the reference point and the location parameter of the center point.
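A short illustration of this reference-point decoding, under the assumed convention that (x, y) is the center-point offset from the grid's top-left set point in normalized grid units and (w, h) is the rectangle's width and height in the same units (the convention and names are illustrative, not fixed by the text):

```python
def decode_box(cell_row, cell_col, x, y, w, h):
    """Turn one grid cell's location parameters into a rectangle (left, top, w, h)."""
    # Set point of the grid = its top-left corner, used as the reference point.
    cx = cell_col + x          # absolute center abscissa in grid units
    cy = cell_row + y          # absolute center ordinate in grid units
    # The center point becomes the center of the rectangular frame,
    # whose extent is given by the shape-size parameters (w, h).
    return cx - w / 2.0, cy - h / 2.0, w, h
```

Because each normalized grid is 1*1, adding the cell indices converts the per-grid offsets into image-wide coordinates in a single step.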
Fig. 4 is a schematic diagram of a detailed implementation process of object detection in an image provided by an embodiment of the present invention. The process includes the following steps:
Step S401: receiving an image to be detected.
Step S402: judging whether the size of the image is the target size; if so, performing step S404; otherwise, performing step S403.
Step S403: adjusting the size of the image to the target size.
Step S404: dividing the image to be detected into multiple grids according to the preset dividing mode, wherein the size of the image to be detected is the target size.
Step S405: inputting the divided image into the convolutional neural network whose training has been completed in advance, and obtaining multiple feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector.
Step S406: for the feature vector corresponding to each grid, identifying the maximum of the classification parameters in the feature vector.
Step S407: when the maximum is greater than the set threshold, taking a set point of the grid as a reference point, and determining the position information of the center point in the grid according to the reference point and the location parameter of the center point.
Step S408: determining the center point according to the position information, taking the center point as the center of a rectangular frame, determining the position information of the rectangular frame according to the shape-size parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object classification corresponding to the classification parameter as the classification of the object.
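Steps S406-S408 can be sketched as a single pass over the network output, assuming the 25-channel layout (confidence, cls1...cls20, x, y, w, h) used later in the text, the example threshold value 0.4, and the top-left corner of each grid as its set point; everything here is an illustrative sketch, not the definitive implementation:

```python
import numpy as np

def detect(feature_map, threshold=0.4):
    """Scan an (S, S, 25) feature map and return (class_id, left, top, w, h) tuples."""
    detections = []
    S = feature_map.shape[0]
    for i in range(S):
        for j in range(S):
            vec = feature_map[i, j]
            cls = vec[1:21]                 # the 20 classification parameters
            c = int(np.argmax(cls))
            if cls[c] > threshold:          # S406/S407: max class parameter vs threshold
                x, y, w, h = vec[21:25]     # center-point and shape-size parameters
                cx, cy = j + x, i + y       # grid top-left as the reference point
                detections.append((c, cx - w / 2, cy - h / 2, w, h))
    return detections
```

Grids whose largest classification parameter stays below the threshold contribute nothing, which is how empty cells are skipped without any separate region-proposal stage.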
The above object detection is performed on the basis of a trained convolutional neural network; in order to realize detection of objects, the convolutional neural network needs to be trained. In the embodiment of the present invention, when the convolutional neural network is trained, a sample image of the target size is divided into multiple grids; if the center point of a certain target object is located in some grid, that grid is responsible for detecting the target object, including detecting the classification of the target object and the corresponding position (bounding box).
Fig. 5 is a schematic diagram of the training process of the convolutional neural network provided by an embodiment of the present invention. The process includes the following steps:
Step S501: for each sample image in a sample image set, marking target objects with rectangular frames.
In the embodiment of the present invention, the convolutional neural network is trained using a large number of sample images, and these sample images constitute the sample image set. Target objects are marked with rectangular frames in each sample image.
Specifically, as in the marking-result schematic diagrams shown in Fig. 6A-Fig. 6D, three target objects exist in the sample image in Fig. 6A, namely a dog, a bicycle and a car. When each target object is marked, the vertices of the target object in the up, down, left and right directions (relative to the up, down, left and right directions shown in Fig. 6A) are identified in the sample image. For the upper and lower vertices, the two lines passing through them parallel to the bottom edge of the sample image are taken as two sides of the rectangular frame; for the left and right vertices, the two lines passing through them parallel to the left and right edges of the sample image are taken as the other two sides of the rectangular frame, as with the rectangular frames of the dog, bicycle and car marked with dotted lines in Fig. 6A.
Step S502: dividing each sample image into multiple grids according to the preset dividing mode, and determining the feature vector corresponding to each grid, wherein the size of each sample image is the target size. When a grid contains the center point of a target object, according to the classification of the target object, the value of the classification parameter corresponding to that classification in the grid's feature vector is set to a preset maximum; according to the position of the center point within the grid, the value of the center-point location parameter in the feature vector is determined; and according to the size of the marked rectangular frame of the target object, the value of the shape-size parameter in the feature vector is determined. When a grid does not contain the center point of any target object, the value of every parameter in the grid's feature vector is zero.
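The per-grid labeling of step S502 can be sketched as follows, assuming the 25-dimensional layout (confidence, cls1...cls20, x, y, w, h), a 7*7 grid, and marked boxes given as (class_id, center-x, center-y, width, height) in grid units; the input convention and function name are illustrative assumptions:

```python
import numpy as np

def encode_targets(boxes, S=7, n_classes=20):
    """Build the (S, S, 25) training target described in step S502.

    Cells that contain no object center point stay all-zero.
    """
    target = np.zeros((S, S, 5 + n_classes))
    for cls_id, cx, cy, w, h in boxes:
        col, row = int(cx), int(cy)            # the grid containing the center point
        target[row, col, 0] = 1.0              # probability parameter `confidence`
        target[row, col, 1 + cls_id] = 1.0     # class parameter set to its maximum (1)
        target[row, col, 21] = cx - col        # center offset from the grid's top-left
        target[row, col, 22] = cy - row
        target[row, col, 23] = w               # shape-size parameters
        target[row, col, 24] = h
    return target
```

Only the handful of grids that hold a center point carry non-zero targets, which mirrors the text's observation that most of the per-image detection probabilities are 0.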
In the embodiment of the present invention, the sample image can be divided into multiple grids according to the preset dividing mode, wherein the dividing mode of the sample images is identical to the dividing mode of the image to be detected in the above detection process.
For example, for convenience, the image may be divided into multiple rows and multiple columns, where the intervals between rows and the intervals between columns may be equal or unequal. Of course, the image may also be divided into multiple irregular grids, as long as the image to be detected and the images used for training the convolutional neural network use the identical grid dividing mode.
When the image is divided into multiple rows and multiple columns, it may be divided into multiple grids in which the numbers of rows and columns are identical, or into multiple grids in which the numbers of rows and columns differ; the aspect ratios of the divided grids may likewise be identical or different. For example, the sample image may be divided into multiple grids of 12*10, 15*15, 6*6, etc. When the grids are of equal size, the grid size can be normalized. As shown in Fig. 6B, in the embodiment of the present invention the sample image is divided into multiple grids of 7 rows horizontally and 7 columns vertically; after normalization, the size of each grid can be regarded as 1*1.
Each grid in the sample image corresponds to one feature vector, which is a multi-dimensional vector and at least includes classification parameters and location parameters, wherein there are multiple classification parameters, and the location parameters in turn include a center-point location parameter and a shape-size parameter.
Step S503: training the convolutional neural network according to the sample images for which the feature vector of each grid has been determined.
Specifically, in the embodiment of the present invention, all the sample images in the sample image set can be used to train the convolutional neural network. However, since the sample image set contains a large number of sample images, in order to improve training efficiency, in the embodiment of the present invention training the convolutional neural network according to the sample images for which the feature vector of each grid has been determined includes:
selecting sub-sample images from the sample image set, wherein the quantity of the selected sub-sample images is smaller than the quantity of sample images in the sample image set; and
training the convolutional neural network using each of the selected sub-sample images.
By randomly selecting sub-sample images far fewer than the total quantity of sample images, the convolutional neural network is trained and its parameters are continuously updated until the error between the information of the object predicted for each grid and the information of the marked target object converges.
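The random sub-sampling and convergence check described above might look like the following sketch, where the `network` object and its `loss_and_update` method are hypothetical stand-ins for an actual convolutional-neural-network parameter update:

```python
import random

def train(network, samples, batch_size=64, tol=1e-3, max_iters=10000):
    """Repeatedly draw a small random batch and update until the error converges."""
    prev = float("inf")
    for _ in range(max_iters):
        # Sub-sample far fewer images than the whole sample set.
        batch = random.sample(samples, min(batch_size, len(samples)))
        err = network.loss_and_update(batch)   # hypothetical: one update, returns error
        if abs(prev - err) < tol:              # convergence criterion on the error
            break
        prev = err
    return network
```

The batch size, tolerance and iteration cap are tunable assumptions; the text only requires that the batch be much smaller than the sample set and that training stop when the error converges.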
Likewise, in the embodiment of the present invention, sample images of the target size are used when the convolutional neural network is trained. Therefore, in order to ensure that the sample images input into the convolutional neural network are all of the target size, before each sample image is divided into multiple grids according to the preset dividing mode, the method further includes:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
When a sample image is of the target size, subsequent processing is performed on it directly; when a sample image is not of the target size, it is adjusted to the target size. The adjustment of image size belongs to the prior art and is not repeated in the embodiment of the present invention.
In the above process, the adjustment to the target size may be performed on the sample image first, or the rectangular frames may be marked in the sample image first. Marking the rectangular frames first ensures that the target objects can be marked accurately when the sample image is relatively large, while adjusting to the target size first ensures that the target objects can be marked accurately when the sample image is relatively small.
In the above marking process, the feature vector corresponding to each grid in the sample image can be determined. In the embodiment of the present invention, the feature vector corresponding to each grid can be expressed as (confidence, cls1, cls2, cls3, ..., cls20, x, y, w, h), wherein confidence is a probability parameter; cls1, cls2, cls3, ..., cls20 are classification parameters; and x, y, w and h are location parameters, where x and y are the center-point location parameters and w and h are the shape-size parameters. When a grid contains the center point of a target object, the value of each parameter in the grid's feature vector is determined; when a grid does not contain the center point of any target object, the value of each parameter in the grid's feature vector is 0.
Specifically, since each target object in the sample image is marked with a rectangular frame, the center point of the rectangular frame can be regarded as the center point of the target object, as with the three rectangular-frame center points shown in Fig. 6C. When a grid contains the center point of a target object, the probability parameter in the grid's feature vector is regarded as 1 during marking, that is, the probability that a target object exists in the current grid is 1.
Since the target objects contained in the sample images have multiple classifications, in the embodiment of the present invention they are represented by classification parameters cls, where cls1, cls2, ..., clsn respectively represent target objects of different classifications. For example, n can be 20, that is, there are target objects of 20 classifications; the target-object classification represented by cls1 is car, that represented by cls2 is dog, and that represented by cls3 is bicycle. When a grid contains the center point of a target object, the classification parameter corresponding to that target object is set to the maximum, where the maximum is greater than the set threshold; for example, the maximum can be 1 and the threshold can be 0.4, etc.
For example, as shown in Fig. 6C, in the feature vectors of the grids where each center point is located, from bottom to top (relative to the up and down directions shown in Fig. 6C): in the feature vector of the first center point, cls2 among the classification parameters is 1 and the other classification parameters are 0; in the feature vector of the second center point, cls3 is 1 and the other classification parameters are 0; and in the feature vector of the third center point, cls1 is 1 and the other classification parameters are 0.
The feature vector also includes the location parameters x, y, w and h of the target object, where x and y are the center-point location parameters, whose values are the horizontal and vertical coordinates of the center point of the target object relative to a set point. The set point corresponding to each grid may be identical for all grids; for example, the upper-left corner of the sample image may be regarded as the set point, i.e. the coordinate origin, and since each grid is normalized, the coordinates of every position in every grid are uniquely determined. Of course, in order to simplify the process and reduce the amount of calculation, the set point corresponding to each grid may also differ: each grid may be regarded as an independent unit, with the upper-left corner of the grid as the set point, i.e. the coordinate origin. Therefore, when marking, the values of x and y in the feature vector of the grid where the center point is located can be determined according to the offset of the center point relative to the upper-left corner of that grid. Determining the values of x and y according to the relative offset belongs to the prior art, and the process is not repeated in the embodiment of the present invention. Among the location parameters, w and h are the shape-size parameters, whose values are the length and width of the rectangular frame where the target object is located.
Since the feature vector is a multi-dimensional vector, in order to represent the feature vector of each grid accurately, in the embodiment of the present invention the cube structure shown in Fig. 6D is built according to the building mode shown in Fig. 7: the grids are processed by convolutional layers, max-pooling layers, fully connected layers and an output layer to generate a lattice structure, where the depth of the lattice in the Z-axis direction is determined according to the dimension of the feature vector. In the embodiment of the present invention, the depth of the lattice in the Z-axis direction is 25. Performing the respective processing in each layer of the convolutional neural network to generate the lattice structure belongs to the prior art, and the process is not repeated in the embodiment of the present invention.
After a large number of sample images are marked in the above manner, the convolutional neural network is trained using the marked sample images. Specifically, in the embodiment of the present invention, the convolutional neural network is trained using multiple sub-sample images. During training, for each sub-sample image, a convolutional feature map of the sub-sample image is obtained through the convolutional neural network; the convolutional feature map contains the feature vector (confidence, cls1, cls2, cls3, ..., cls20, x, y, w, h) corresponding to each grid, which contains the location parameters and classification parameters of the object predicted in the grid, as well as the probability parameter confidence, which represents the degree of overlap between the rectangular frame where the object predicted by the grid is located and the marked rectangular frame of the target object.
During training, for each sub-sample image, the network parameters of the convolutional neural network are adjusted by calculating the error between the prediction information and the marking information. By randomly selecting, each time, a batch of sub-sample images far fewer than the total quantity of sample images, the convolutional neural network is trained and its network parameters are updated until the error between the prediction information and the marking information of each grid converges. Training the convolutional neural network according to the sub-sample images and adjusting its network parameters until training is completed belongs to the prior art, and the process is not repeated in the embodiment of the present invention.
In the training process of the above convolutional neural network, in order to predict the position and classification information of objects accurately, in the embodiment of the present invention the last fully connected layer of the convolutional neural network uses a logistic activation function, while the convolutional layers and the other fully connected layers use the Leaky ReLU function. The Leaky ReLU function is:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

where $\alpha$ is a small positive slope.
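For reference, the Leaky ReLU activation can be written in one line; the 0.1 slope is a common choice, assumed here rather than taken from the patent figure:

```python
def leaky_relu(x, alpha=0.1):
    # Negative inputs are scaled by a small slope instead of being zeroed,
    # which keeps a gradient flowing through inactive units during training.
    return x if x > 0 else alpha * x
```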
In the embodiment of the present invention, in order to complete the training of the convolutional neural network and make it converge, when the convolutional neural network is trained, the method further includes:
determining the error of the convolutional neural network according to the prediction of the convolutional neural network for the position and classification of the target objects in the sub-sample image and the information of the target objects marked in the sub-sample image; and
when the error converges, determining that the training of the convolutional neural network is completed, wherein the error is determined using the following loss function:

$$\begin{aligned} loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & + \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2 \end{aligned}$$

Wherein, S is the number of rows of the divided grids when the numbers of rows and columns are identical; B is the preset quantity of rectangular frames predicted for each grid, typically 1 or 2; $x_i$ is the abscissa of the center point of the marked target object in grid i, and $\hat{x}_i$ is the abscissa of the center point of the predicted object in grid i; $y_i$ is the ordinate of the center point of the marked target object in grid i, and $\hat{y}_i$ is the ordinate of the center point of the predicted object in grid i; $h_i$ is the height of the rectangular frame where the marked target object is located and $w_i$ is its width; $\hat{h}_i$ is the height of the rectangular frame where the predicted object is located and $\hat{w}_i$ is its width; $C_i$ is the marked probability that grid i currently contains a target object, and $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the marked object in grid i belongs to classification c, and $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to classification c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame is located in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when a predicted object center point exists in grid i, and 0 otherwise; $\mathbb{1}_{ij}^{noobj}$ takes 1 when no predicted object center point exists in grid i, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(Class \mid Object) \cdot P_r(Object)$$
$P_r(Object)$ is the predicted probability that grid i currently contains an object, and $P_r(Class \mid Object)$ is the conditional probability that the predicted object in grid i belongs to classification c.
In order to prevent grids whose prediction error relative to the marking result is large from contributing too little to position prediction, the above loss function is used in the embodiment of the present invention.
As shown in Fig. 6B, in a specific embodiment of the present invention each sample image is divided into a total of 49 grids of 7*7, and each grid can detect 20 classifications, so one sample image can produce 980 detection probabilities, of which the detection probabilities of most grids are 0. This would cause training to diverge, so a variable is introduced here to solve this problem: the probability of whether an object exists in a certain grid. Therefore, in addition to the 20 classification parameters, there is also a predicted probability $P_r(Object)$ of whether an object currently exists in the grid; the probability that the target object in a certain grid belongs to classification c is then the product of $P_r(Object)$ and the predicted conditional probability $P_r(Class \mid Object)$ that the object in the grid belongs to classification c. $P_r(Object)$ is updated in every grid, while $P_r(Class \mid Object)$ is updated only when an object exists in the grid.
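A simplified evaluation of this error for B = 1 can be sketched as follows, summing squared errors over the terms described above (coordinates, confidence split by object presence, and class probabilities); the weight values 5.0 and 0.5 for λ_coord and λ_noobj are assumed defaults, since the text only says they are set weights, and the square-root terms on width and height are omitted for brevity:

```python
import numpy as np

def detection_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared error over an (S, S, 25) prediction/target pair with B = 1.

    Channel layout assumed: [confidence, cls1..cls20, x, y, w, h].
    """
    obj = target[..., 0] == 1.0      # grids whose marked confidence is 1
    noobj = ~obj

    coord_err = np.sum((pred[..., 21:25][obj] - target[..., 21:25][obj]) ** 2)
    conf_obj_err = np.sum((pred[..., 0][obj] - target[..., 0][obj]) ** 2)
    conf_noobj_err = np.sum((pred[..., 0][noobj] - target[..., 0][noobj]) ** 2)
    class_err = np.sum((pred[..., 1:21][obj] - target[..., 1:21][obj]) ** 2)

    return (lambda_coord * coord_err
            + conf_obj_err
            + lambda_noobj * conf_noobj_err
            + class_err)
```

Down-weighting the no-object confidence term is what keeps the many empty grids from swamping the gradient of the few grids that actually contain a center point.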
Fig. 8 is a schematic structural diagram of an object detection device in an image provided by an embodiment of the present invention. The device is located in an electronic device and includes:
a division module 81, configured to divide an image to be detected into multiple grids according to a preset dividing mode, wherein the size of the image to be detected is a target size;
a detection module 82, configured to input the divided image into a convolutional neural network whose training has been completed in advance, and obtain multiple feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector; and
a determining module 83, configured to: for the feature vector corresponding to each grid, identify the maximum of the classification parameters in the feature vector, and when the maximum is greater than a set threshold, determine, according to the center-point location parameter and the shape-size parameter in the feature vector, the position information of the object of the classification corresponding to the classification parameter.
The device further includes:
a judging and adjusting module 84, configured to judge whether the size of the image is the target size, and if not, adjust the size of the image to the target size.
The device further includes:
a training module 85, configured to: for each sample image in a sample image set, mark target objects with rectangular frames; divide each sample image into multiple grids according to the preset dividing mode and determine the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, set, according to the classification of the target object, the value of the classification parameter corresponding to that classification in the grid's feature vector to a preset maximum, determine the value of the center-point location parameter in the feature vector according to the position of the center point within the grid, and determine the value of the shape-size parameter in the feature vector according to the size of the marked rectangular frame of the target object; when a grid does not contain the center point of any target object, set the value of every parameter in the grid's feature vector to zero; and train the convolutional neural network according to the sample images for which the feature vector of each grid has been determined.
The training module 85 is further configured to: for each sample image, judge whether the size of the sample image is the target size; and if not, adjust the size of the sample image to the target size.
The training module 85 is specifically configured to: select sub-sample images from the sample image set, wherein the quantity of the selected sub-sample images is smaller than the quantity of sample images in the sample image set; and train the convolutional neural network using each of the selected sub-sample images.
The device further includes:
an error calculation module 86, configured to determine the error of the convolutional neural network according to the prediction of the convolutional neural network for the position and classification of the objects in the sub-sample image and the information of the target objects marked in the sub-sample image; and
when the error converges, determine that the training of the convolutional neural network is completed, wherein the error is determined using the following loss function:

$$\begin{aligned} loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & + \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2 \end{aligned}$$

Wherein, S is the number of rows of the divided grids when the numbers of rows and columns are identical; B is the preset quantity of rectangular frames predicted for each grid, typically 1 or 2; $x_i$ is the abscissa of the center point of the marked target object in grid i, and $\hat{x}_i$ is the abscissa of the center point of the predicted object in grid i; $y_i$ is the ordinate of the center point of the marked target object in grid i, and $\hat{y}_i$ is the ordinate of the center point of the predicted object in grid i; $h_i$ is the height of the rectangular frame where the marked target object is located and $w_i$ is its width; $\hat{h}_i$ is the height of the rectangular frame where the predicted object is located and $\hat{w}_i$ is its width; $C_i$ is the marked probability that grid i currently contains a target object, and $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the marked target object in grid i belongs to classification c, and $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to classification c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame is located in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when a predicted object center point exists in grid i, and 0 otherwise; $\mathbb{1}_{ij}^{noobj}$ takes 1 when no predicted object center point exists in grid i, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(Class \mid Object) \cdot P_r(Object)$$

$P_r(Object)$ is the predicted probability that grid i currently contains an object, and $P_r(Class \mid Object)$ is the conditional probability that the predicted object in grid i belongs to classification c.
The determining module 83 is specifically configured to: determine, according to the location parameter of the center point, the position information of the center point in the grid; determine the center point according to the position information; take the center point as the center of a rectangular frame; determine the position information of the rectangular frame according to the shape-size parameter; take the position information of the rectangular frame as the position information of the object; and take the object classification corresponding to the classification parameter as the classification of the object.
The determining module 83 is specifically configured to: take a set point of the grid as a reference point, and determine the position information of the center point in the grid according to the reference point and the location parameter of the center point.
An embodiment of the present invention provides an object detection method and device in an image. In the method, an image to be detected is divided into multiple grids according to a preset dividing mode, wherein the size of the image is a target size; the divided image is input into a convolutional neural network whose training has been completed in advance, and multiple feature vectors of the image output by the convolutional neural network are obtained, wherein each grid corresponds to one feature vector; the maximum of the classification parameters in each feature vector is identified, and when the maximum is greater than a set threshold, the position information of the object of the classification corresponding to the classification parameter is determined according to the center-point location parameter and the shape-size parameter in the feature vector. Since in the embodiment of the present invention the feature vectors corresponding to the image are determined by a convolutional neural network whose training has been completed in advance, and the classification and position of objects in the image are determined according to the classification parameters and location parameters in the feature vectors, detection of object position and classification can be realized simultaneously, which facilitates global optimization. In addition, since the position and classification of an object are determined from the feature vector of each grid, there is no need to select multiple candidate feature regions, which saves detection time and improves the real-time performance and efficiency of detection.
As for the system/device embodiments, since they are substantially similar to the method embodiments, their description is relatively simple; for relevant parts, reference may be made to the corresponding explanation of the method embodiments.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that every flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these changes and modifications.

Claims (17)

1. An object detection method in an image, applied to an electronic device, the method comprising:
dividing an image to be detected into a plurality of grids according to a preset division manner, wherein the size of the image to be detected is a target size;
inputting the divided image into a pre-trained convolutional neural network, and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector;
for the feature vector corresponding to each grid, identifying the maximum value among the class parameters in the feature vector, and when the maximum value is greater than a set threshold, determining, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of an object of the class corresponding to that class parameter.
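Outside the claim language, the per-grid decision of claim 1 — take the largest class parameter, compare it against a set threshold, then turn the center-point position parameters and contour size parameters into a box — can be sketched as follows. This is a minimal illustration, not the claimed implementation: the S×S grid, the feature-vector layout (class scores followed by cx, cy, w, h), and the normalization are all assumptions.

```python
import numpy as np

def decode_detections(features, img_w, img_h, threshold=0.5):
    """features: (S, S, C + 4) array -- per-grid class parameters followed
    by (cx, cy, w, h); cx, cy are assumed offsets within the grid and
    w, h sizes relative to the whole image (hypothetical layout)."""
    S = features.shape[0]
    num_classes = features.shape[2] - 4
    cell_w, cell_h = img_w / S, img_h / S
    detections = []
    for row in range(S):
        for col in range(S):
            vec = features[row, col]
            cls = int(np.argmax(vec[:num_classes]))   # max class parameter
            score = float(vec[cls])
            if score <= threshold:                    # below the set threshold
                continue
            cx_off, cy_off, w, h = vec[num_classes:]
            cx = (col + cx_off) * cell_w              # center point from the
            cy = (row + cy_off) * cell_h              # grid position parameters
            bw, bh = w * img_w, h * img_h             # contour size parameters
            detections.append((cls, score,
                               cx - bw / 2, cy - bh / 2,   # top-left corner
                               cx + bw / 2, cy + bh / 2))  # bottom-right corner
    return detections
```

A grid whose largest class parameter stays at or below the threshold yields no detection, matching the conditional in the claim.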
2. The method according to claim 1, wherein before dividing the image to be detected into a plurality of grids according to the preset division manner, the method further comprises:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
3. The method according to claim 1, wherein the training process of the convolutional neural network comprises:
for each sample image in a sample image set, marking target objects with rectangular frames;
dividing each sample image into a plurality of grids according to the preset division manner, and determining the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, setting, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid to a preset maximum value, determining the values of the center-point position parameters in the feature vector according to the position of the center point within the grid, and determining the values of the contour size parameters in the feature vector according to the size of the rectangular frame marking the target object; when a grid does not contain the center point of any target object, the value of every parameter in the feature vector of the grid is zero;
training the convolutional neural network with the sample images for which the feature vector of each grid has been determined.
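The target-vector construction of the training process above can be sketched as follows, under the same hypothetical feature-vector layout (class scores, then cx, cy, w, h); here 1.0 stands in for the preset maximum value of the class parameter, and all names are illustrative rather than claimed.

```python
import numpy as np

def encode_target(box, cls, num_classes, S, img_w, img_h):
    """box = (x1, y1, x2, y2) of a marked rectangular frame; returns an
    (S, S, num_classes + 4) target in which only the grid containing the
    box center is non-zero, as in the training procedure of claim 3."""
    target = np.zeros((S, S, num_classes + 4))
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col = min(int(cx / img_w * S), S - 1)   # grid containing the center point
    row = min(int(cy / img_h * S), S - 1)
    target[row, col, cls] = 1.0             # class parameter -> preset maximum
    target[row, col, num_classes + 0] = cx / img_w * S - col  # offset in grid
    target[row, col, num_classes + 1] = cy / img_h * S - row
    target[row, col, num_classes + 2] = (x2 - x1) / img_w     # contour size
    target[row, col, num_classes + 3] = (y2 - y1) / img_h
    return target
```

Every other grid keeps an all-zero feature vector, matching the "no center point" branch of the claim.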
4. The method according to claim 3, wherein before dividing each sample image into a plurality of grids according to the preset division manner, the method further comprises:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
5. The method according to claim 3, wherein training the convolutional neural network with the sample images for which the feature vector of each grid has been determined comprises:
selecting sub-sample images from the sample image set, wherein the number of selected sub-sample images is smaller than the number of sample images in the sample image set;
training the convolutional neural network with each selected sub-sample image.
6. The method according to claim 1 or 3, wherein the preset division manner comprises:
dividing the image and the sample images into a plurality of grids whose number of rows equals the number of columns; or
dividing the image and the sample images into a plurality of grids whose number of rows differs from the number of columns.
7. The method according to claim 6, wherein the method further comprises:
determining the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of the objects in the sub-sample images, and the information of the target objects marked in the sub-sample images;
when the error converges, determining that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]+\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}(C_i-\hat{C}_i)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{noobj}(C_i-\hat{C}_i)^2+\sum_{i=0}^{S^2}I_{i}^{obj}\sum_{c\in classes}(P_i(c)-\hat{P}_i(c))^2$$
wherein $S$ is the number of rows or columns when the divided grids have equal numbers of rows and columns; $B$ is a preset number of rectangular frames predicted for each grid, typically taken as 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate, within grid $i$, of the center point of a marked target object; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate, within grid $i$, of the center point of a predicted object; $w_i$ and $h_i$ are the width and height of the rectangular frame in which the marked target object is located; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the rectangular frame in which the predicted object is located; $C_i$ is the marked probability that grid $i$ currently contains a target object; $\hat{C}_i$ is the predicted probability that grid $i$ currently contains an object; $P_i(c)$ is the probability that the marked target object in grid $i$ belongs to class $c$; $\hat{P}_i(c)$ is the probability that the predicted object in grid $i$ belongs to class $c$; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $I_{ij}^{obj}$ takes the value 1 when the center point of the object in the $j$-th predicted rectangular frame lies in grid $i$, and 0 otherwise; $I_{i}^{obj}$ takes the value 1 when predicted grid $i$ contains the center point of an object, and 0 otherwise; $I_{i}^{noobj}$ takes the value 1 when predicted grid $i$ does not contain the center point of an object, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:
$$\Pr(\mathrm{Class}_i\mid\mathrm{Object})\cdot\Pr(\mathrm{Object})=\hat{P}_i(c)$$
wherein $\Pr(\mathrm{Object})$ is the probability that predicted grid $i$ currently contains an object, and $\Pr(\mathrm{Class}_i\mid\mathrm{Object})$ is the conditional probability that the object in predicted grid $i$ belongs to class $c$.
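For illustration only, the loss function of claim 7 can be written out with NumPy for the simplified case $B = 1$ (one predicted rectangular frame per grid), so that $I_{ij}^{obj}$ collapses to $I_{i}^{obj}$; the dict layout and array names are assumptions, not part of the claims.

```python
import numpy as np

def yolo_style_loss(truth, pred, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """truth/pred hold (S*S,)-shaped arrays for x, y, w, h, C and an
    (S*S, n_cls) array for P; obj_mask[i] is 1 where grid i contains an
    object center (I_i^obj), 0 otherwise. Default weights are assumed."""
    noobj_mask = 1.0 - obj_mask
    coord = lambda_coord * np.sum(                 # center-point term
        obj_mask * ((truth["x"] - pred["x"]) ** 2 +
                    (truth["y"] - pred["y"]) ** 2))
    size = lambda_coord * np.sum(                  # contour-size term
        obj_mask * ((truth["w"] - pred["w"]) ** 2 +
                    (truth["h"] - pred["h"]) ** 2))
    conf_obj = np.sum(obj_mask * (truth["C"] - pred["C"]) ** 2)
    conf_noobj = lambda_noobj * np.sum(            # empty-grid confidence term
        noobj_mask * (truth["C"] - pred["C"]) ** 2)
    cls = np.sum(obj_mask[:, None] * (truth["P"] - pred["P"]) ** 2)
    return coord + size + conf_obj + conf_noobj + cls
```

A perfect prediction drives every term to zero, which is the convergence condition the claim tests for.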
8. The method according to claim 1, wherein determining, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of the object of the class corresponding to the class parameter comprises:
determining, according to the position parameters of the center point, the position information of the center point within the grid;
determining the center point according to the position information, taking the center point as the center of a rectangular frame, determining the position information of the rectangular frame according to the contour size parameters, taking the position information of the rectangular frame as the position information of the object, and taking the object class corresponding to the class parameter as the class of the object.
9. The method according to claim 8, wherein determining the position information of the center point within the grid according to the position parameters of the center point comprises:
taking a set point of the grid as a reference point, and determining the position information of the center point within the grid according to the reference point and the position parameters of the center point.
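A minimal sketch of claim 9, assuming (the claim does not fix this) that the set reference point is the grid's top-left corner and that the position parameters are offsets expressed as fractions of the grid size:

```python
def center_from_reference(row, col, cell_w, cell_h, dx, dy):
    """Return the absolute center point obtained by offsetting the grid's
    reference point by the center-point position parameters (dx, dy)."""
    ref_x, ref_y = col * cell_w, row * cell_h   # set point of the grid
    return ref_x + dx * cell_w, ref_y + dy * cell_h
```

Any other fixed point of the grid (e.g. its center) would serve equally as the reference point, with the offsets interpreted accordingly.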
10. An object detection apparatus in an image, the apparatus comprising:
a division module configured to divide an image to be detected into a plurality of grids according to a preset division manner, wherein the size of the image to be detected is a target size;
a detection module configured to input the divided image into a pre-trained convolutional neural network and obtain a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector;
a determining module configured to, for the feature vector corresponding to each grid, identify the maximum value among the class parameters in the feature vector and, when the maximum value is greater than a set threshold, determine, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of an object of the class corresponding to that class parameter.
11. The apparatus according to claim 10, wherein the apparatus further comprises:
a judging and adjusting module configured to judge whether the size of the image is the target size and, if not, adjust the size of the image to the target size.
12. The apparatus according to claim 10, wherein the apparatus further comprises:
a training module configured to: for each sample image in a sample image set, mark target objects with rectangular frames; divide each sample image into a plurality of grids according to the preset division manner and determine the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, set, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid to a preset maximum value, determine the values of the center-point position parameters in the feature vector according to the position of the center point within the grid, and determine the values of the contour size parameters in the feature vector according to the size of the rectangular frame marking the target object; when a grid does not contain the center point of any target object, set the value of every parameter in the feature vector of the grid to zero; and train the convolutional neural network with the sample images for which the feature vector of each grid has been determined.
13. The apparatus according to claim 12, wherein the training module is further configured to judge, for each sample image, whether the size of the sample image is the target size and, if not, adjust the size of the sample image to the target size.
14. The apparatus according to claim 13, wherein the training module is specifically configured to select sub-sample images from the sample image set, wherein the number of selected sub-sample images is smaller than the number of sample images in the sample image set, and to train the convolutional neural network with each selected sub-sample image.
15. The apparatus according to claim 12, wherein the apparatus further comprises:
an error calculation module configured to determine the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of the objects in the sub-sample images and the target objects marked in the sub-sample images;
and to determine, when the error converges, that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]+\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}(C_i-\hat{C}_i)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{noobj}(C_i-\hat{C}_i)^2+\sum_{i=0}^{S^2}I_{i}^{obj}\sum_{c\in classes}(P_i(c)-\hat{P}_i(c))^2$$
wherein $S$ is the number of rows or columns when the divided grids have equal numbers of rows and columns; $B$ is a preset number of rectangular frames predicted for each grid, typically taken as 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate, within grid $i$, of the center point of a marked target object; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate, within grid $i$, of the center point of a predicted object; $w_i$ and $h_i$ are the width and height of the rectangular frame in which the marked target object is located; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the rectangular frame in which the predicted object is located; $C_i$ is the marked probability that grid $i$ currently contains a target object; $\hat{C}_i$ is the predicted probability that grid $i$ currently contains an object; $P_i(c)$ is the probability that the marked target object in grid $i$ belongs to class $c$; $\hat{P}_i(c)$ is the probability that the predicted object in grid $i$ belongs to class $c$; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $I_{ij}^{obj}$ takes the value 1 when the center point of the object in the $j$-th predicted rectangular frame lies in grid $i$, and 0 otherwise; $I_{i}^{obj}$ takes the value 1 when predicted grid $i$ contains the center point of an object, and 0 otherwise; $I_{i}^{noobj}$ takes the value 1 when predicted grid $i$ does not contain the center point of an object, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:
$$\Pr(\mathrm{Class}_i\mid\mathrm{Object})\cdot\Pr(\mathrm{Object})=\hat{P}_i(c)$$
wherein $\Pr(\mathrm{Object})$ is the probability that predicted grid $i$ currently contains an object, and $\Pr(\mathrm{Class}_i\mid\mathrm{Object})$ is the conditional probability that the object in predicted grid $i$ belongs to class $c$.
16. The apparatus according to claim 10, wherein the determining module is specifically configured to determine the position information of the center point within the grid according to the position parameters of the center point;
and to determine the center point according to the position information, take the center point as the center of a rectangular frame, determine the position information of the rectangular frame according to the contour size parameters, take the position information of the rectangular frame as the position information of the object, and take the object class corresponding to the class parameter as the class of the object.
17. The apparatus according to claim 16, wherein the determining module is specifically configured to take a set point of the grid as a reference point and determine the position information of the center point within the grid according to the reference point and the position parameters of the center point.
CN201611249792.1A 2016-12-29 2016-12-29 Method and device for detecting object in image Active CN106803071B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201611249792.1A CN106803071B (en) 2016-12-29 2016-12-29 Method and device for detecting object in image
EP17886017.7A EP3545466A4 (en) 2016-12-29 2017-10-20 Systems and methods for detecting objects in images
PCT/CN2017/107043 WO2018121013A1 (en) 2016-12-29 2017-10-20 Systems and methods for detecting objects in images
US16/457,861 US11113840B2 (en) 2016-12-29 2019-06-28 Systems and methods for detecting objects in images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611249792.1A CN106803071B (en) 2016-12-29 2016-12-29 Method and device for detecting object in image

Publications (2)

Publication Number Publication Date
CN106803071A true CN106803071A (en) 2017-06-06
CN106803071B CN106803071B (en) 2020-02-14

Family

ID=58985345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611249792.1A Active CN106803071B (en) 2016-12-29 2016-12-29 Method and device for detecting object in image

Country Status (1)

Country Link
CN (1) CN106803071B (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392158A (en) * 2017-07-27 2017-11-24 济南浪潮高新科技投资发展有限公司 A kind of method and device of image recognition
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN108229307A (en) * 2017-11-22 2018-06-29 北京市商汤科技开发有限公司 For the method, apparatus and equipment of object detection
WO2018121013A1 (en) * 2016-12-29 2018-07-05 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
CN108460761A (en) * 2018-03-12 2018-08-28 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108921840A (en) * 2018-07-02 2018-11-30 北京百度网讯科技有限公司 Display screen peripheral circuit detection method, device, electronic equipment and storage medium
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN108968811A (en) * 2018-06-20 2018-12-11 四川斐讯信息技术有限公司 A kind of object identification method and system of sweeping robot
CN109272050A (en) * 2018-09-30 2019-01-25 北京字节跳动网络技术有限公司 Image processing method and device
CN109558791A (en) * 2018-10-11 2019-04-02 浙江大学宁波理工学院 It is a kind of that bamboo shoot device and method is sought based on image recognition
CN109685069A (en) * 2018-12-27 2019-04-26 乐山师范学院 Image detecting method, device and computer readable storage medium
CN109726741A (en) * 2018-12-06 2019-05-07 江苏科技大学 A kind of detection method and device of multiple target object
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method, device, electronic equipment and the storage medium of target detection model
CN110110189A (en) * 2018-02-01 2019-08-09 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN110338835A (en) * 2019-07-02 2019-10-18 深圳安科高技术股份有限公司 A kind of intelligent scanning stereoscopic monitoring method and system
CN110610184A (en) * 2018-06-15 2019-12-24 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient object of image
CN110930386A (en) * 2019-11-20 2020-03-27 重庆金山医疗技术研究院有限公司 Image processing method, device, equipment and storage medium
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN111461318A (en) * 2019-01-22 2020-07-28 斯特拉德视觉公司 Neural network operation method using grid generator and device using the same
CN111597845A (en) * 2019-02-20 2020-08-28 中科院微电子研究所昆山分所 Two-dimensional code detection method, device and equipment and readable storage medium
CN111639660A (en) * 2019-03-01 2020-09-08 中科院微电子研究所昆山分所 Image training method, device, equipment and medium based on convolutional network
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN112084874A (en) * 2020-08-11 2020-12-15 深圳市优必选科技股份有限公司 Object detection method and device and terminal equipment
CN112446867A (en) * 2020-11-25 2021-03-05 上海联影医疗科技股份有限公司 Method, device and equipment for determining blood flow parameters and storage medium
CN112785564A (en) * 2021-01-15 2021-05-11 武汉纺织大学 Pedestrian detection tracking system and method based on mechanical arm
CN113935425A (en) * 2021-10-21 2022-01-14 中国船舶重工集团公司第七一一研究所 Object identification method, device, terminal and storage medium
US11373411B1 (en) 2018-06-13 2022-06-28 Apple Inc. Three-dimensional object estimation using two-dimensional annotations
CN114739388A (en) * 2022-04-20 2022-07-12 中国移动通信集团广东有限公司 Indoor positioning navigation method and system based on UWB and laser radar

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104517113A (en) * 2013-09-29 2015-04-15 浙江大华技术股份有限公司 Image feature extraction method and device and image sorting method and device
CN105975931A (en) * 2016-05-04 2016-09-28 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUSSELL STEWART et al.: "End-to-end people detection in crowded scenes", published online at https://arxiv.org/abs/1506.04878 *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121013A1 (en) * 2016-12-29 2018-07-05 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
US11113840B2 (en) 2016-12-29 2021-09-07 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
CN107392158A (en) * 2017-07-27 2017-11-24 济南浪潮高新科技投资发展有限公司 A kind of method and device of image recognition
CN108229307A (en) * 2017-11-22 2018-06-29 北京市商汤科技开发有限公司 For the method, apparatus and equipment of object detection
CN108229307B (en) * 2017-11-22 2022-01-04 北京市商汤科技开发有限公司 Method, device and equipment for object detection
US11222441B2 (en) 2017-11-22 2022-01-11 Beijing Sensetime Technology Development Co., Ltd. Methods and apparatuses for object detection, and devices
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN110110189A (en) * 2018-02-01 2019-08-09 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN108460761A (en) * 2018-03-12 2018-08-28 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
US11748998B1 (en) 2018-06-13 2023-09-05 Apple Inc. Three-dimensional object estimation using two-dimensional annotations
US11373411B1 (en) 2018-06-13 2022-06-28 Apple Inc. Three-dimensional object estimation using two-dimensional annotations
CN110610184A (en) * 2018-06-15 2019-12-24 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient object of image
CN110610184B (en) * 2018-06-15 2023-05-12 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient targets of images
CN108968811A (en) * 2018-06-20 2018-12-11 四川斐讯信息技术有限公司 A kind of object identification method and system of sweeping robot
KR20200004823A (en) * 2018-07-02 2020-01-14 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Display screen peripheral circuit detection method, device, electronic device and storage medium
CN108921840A (en) * 2018-07-02 2018-11-30 北京百度网讯科技有限公司 Display screen peripheral circuit detection method, device, electronic equipment and storage medium
KR102321768B1 (en) 2018-07-02 2021-11-03 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Display screen peripheral circuit detection method, apparatus, electronic device and storage medium
JP2020530125A (en) * 2018-07-02 2020-10-15 北京百度網訊科技有限公司 Display screen peripheral circuit detection method, display screen peripheral circuit detection device, electronic devices and storage media
CN109272050B (en) * 2018-09-30 2019-11-22 北京字节跳动网络技术有限公司 Image processing method and device
CN109272050A (en) * 2018-09-30 2019-01-25 北京字节跳动网络技术有限公司 Image processing method and device
CN109558791A (en) * 2018-10-11 2019-04-02 浙江大学宁波理工学院 It is a kind of that bamboo shoot device and method is sought based on image recognition
CN109726741A (en) * 2018-12-06 2019-05-07 江苏科技大学 A kind of detection method and device of multiple target object
CN109726741B (en) * 2018-12-06 2023-05-30 江苏科技大学 Method and device for detecting multiple target objects
CN109685069A (en) * 2018-12-27 2019-04-26 乐山师范学院 Image detecting method, device and computer readable storage medium
CN111461318B (en) * 2019-01-22 2023-10-17 斯特拉德视觉公司 Neural network operation method using grid generator and device using the same
CN111461318A (en) * 2019-01-22 2020-07-28 斯特拉德视觉公司 Neural network operation method using grid generator and device using the same
CN111597845A (en) * 2019-02-20 2020-08-28 中科院微电子研究所昆山分所 Two-dimensional code detection method, device and equipment and readable storage medium
CN111639660B (en) * 2019-03-01 2024-01-12 中科微至科技股份有限公司 Image training method, device, equipment and medium based on convolution network
CN111639660A (en) * 2019-03-01 2020-09-08 中科院微电子研究所昆山分所 Image training method, device, equipment and medium based on convolutional network
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method, device, electronic equipment and the storage medium of target detection model
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN111914850B (en) * 2019-05-07 2023-09-19 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN110338835A (en) * 2019-07-02 2019-10-18 深圳安科高技术股份有限公司 A kind of intelligent scanning stereoscopic monitoring method and system
CN110930386A (en) * 2019-11-20 2020-03-27 重庆金山医疗技术研究院有限公司 Image processing method, device, equipment and storage medium
CN110930386B (en) * 2019-11-20 2024-02-20 重庆金山医疗技术研究院有限公司 Image processing method, device, equipment and storage medium
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN112084874A (en) * 2020-08-11 2020-12-15 深圳市优必选科技股份有限公司 Object detection method and device and terminal equipment
CN112084874B (en) * 2020-08-11 2023-12-29 深圳市优必选科技股份有限公司 Object detection method and device and terminal equipment
CN112446867A (en) * 2020-11-25 2021-03-05 上海联影医疗科技股份有限公司 Method, device and equipment for determining blood flow parameters and storage medium
CN112785564A (en) * 2021-01-15 2021-05-11 武汉纺织大学 Pedestrian detection tracking system and method based on mechanical arm
CN113935425A (en) * 2021-10-21 2022-01-14 中国船舶重工集团公司第七一一研究所 Object identification method, device, terminal and storage medium
CN114739388A (en) * 2022-04-20 2022-07-12 中国移动通信集团广东有限公司 Indoor positioning navigation method and system based on UWB and laser radar

Also Published As

Publication number Publication date
CN106803071B (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CN106803071A (en) Object detecting method and device in a kind of image
CN106780612B (en) Object detecting method and device in a kind of image
Sheng et al. Graph-based spatial-temporal convolutional network for vehicle trajectory prediction in autonomous driving
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
US11581130B2 (en) Internal thermal fault diagnosis method of oil-immersed transformer based on deep convolutional neural network and image segmentation
CN110264468B (en) Point cloud data mark, parted pattern determination, object detection method and relevant device
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN107229904A (en) A kind of object detection and recognition method based on deep learning
CN110991444B (en) License plate recognition method and device for complex scene
CN106874914A (en) A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks
CN110400332A (en) A kind of target detection tracking method, device and computer equipment
CN107330410A (en) Method for detecting abnormality based on deep learning under complex environment
CN106570453A (en) Pedestrian detection method, device and system
CN113362491B (en) Vehicle track prediction and driving behavior analysis method
CN104361351A (en) Synthetic aperture radar (SAR) image classification method on basis of range statistics similarity
CN111062383A (en) Image-based ship detection depth neural network algorithm
CN102750522B (en) A kind of method of target following
Joseph et al. Systematic advancement of YOLO object detector for real-time detection of objects
CN113487600A (en) Characteristic enhancement scale self-adaptive sensing ship detection method
Niu et al. A PCB Defect Detection Algorithm with Improved Faster R-CNN.
CN114882423A (en) Truck warehousing goods identification method based on improved Yolov5m model and Deepsort
CN106960434A (en) A kind of image significance detection method based on surroundedness and Bayesian model
Bai et al. Depth feature fusion based surface defect region identification method for steel plate manufacturing
CN109492697A (en) Picture detects network training method and picture detects network training device
CN104050665B (en) The method of estimation and device of prospect residence time in a kind of video image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant