
CN106803071A - Method and device for object detection in an image - Google Patents

Method and device for object detection in an image

Info

Publication number
CN106803071A
CN106803071A (application CN201611249792.1A)
Authority
CN
China
Prior art keywords
grid
image
central point
classification
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611249792.1A
Other languages
Chinese (zh)
Other versions
CN106803071B (en)
Inventor
Yang Songlin (杨松林)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201611249792.1A priority Critical patent/CN106803071B/en
Publication of CN106803071A publication Critical patent/CN106803071A/en
Priority to EP17886017.7A priority patent/EP3545466A4/en
Priority to PCT/CN2017/107043 priority patent/WO2018121013A1/en
Priority to US16/457,861 priority patent/US11113840B2/en
Application granted granted Critical
Publication of CN106803071B publication Critical patent/CN106803071B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention disclose a method and device for object detection in an image, used to improve the real-time performance of object detection. In the method, an image to be detected is divided into a plurality of grid cells according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network; the feature vector output by the convolutional neural network for each grid cell of the image is obtained; the maximum value among the class parameters in each feature vector is identified; and when that maximum exceeds a set threshold, the position information of an object of the class corresponding to that class parameter is determined from the centre-point position parameters and the outline-size parameters in the feature vector. Because the pre-trained convolutional neural network determines the class and the position of objects in the image in a single pass, position and class detection are performed simultaneously, without selecting multiple candidate regions. This saves detection time, improves the real-time performance and efficiency of detection, and facilitates global optimisation.

Description

Method and device for object detection in an image
Technical field
The present invention relates to the field of machine learning, and in particular to a method and device for object detection in an image.
Background technology
With the development of video surveillance technology, intelligent video monitoring is applied in more and more scenes, such as traffic, shopping malls, hospitals, residential communities and parks. These applications of intelligent video monitoring lay the foundation for object detection in images of various scenes.
When the prior art performs object detection in an image, it generally uses the region-based convolutional neural network (Region Convolutional Neural Network, R-CNN) and its extensions Fast R-CNN and Faster R-CNN. Fig. 1 is a schematic flow chart of object detection with R-CNN. The detection process includes: receiving an input image, extracting candidate regions (region proposals) from the image, computing the CNN features of each candidate region, and determining the type and position of the object by classification and regression. In this process, about 2000 candidate regions need to be extracted from the image, and the extraction alone takes 1-2 s. Then, for each candidate region, its CNN features must be computed; since many candidate regions overlap, much of this feature computation is repeated work. The detection process further includes subsequent steps: feature learning on the proposals, correcting the determined object positions, and eliminating false alarms. The whole detection process may take 2-40 s, which severely affects the real-time performance of object detection.
In addition, when object detection is performed with R-CNN, region extraction uses saliency detection (selective search), CNN features are then computed with a convolutional neural network, and finally a support vector machine (SVM) model performs classification to determine the position of the target. These three steps are mutually independent methods, so the whole detection process cannot be globally optimised.
Fig. 2 is a schematic diagram of object detection with Faster R-CNN. The process uses a convolutional neural network: each sliding window generates a 256-dimensional vector at the intermediate layer; the class of the object is detected at the classification layer (cls layer), and the position of the object at the regression layer (reg layer). The class detection and the position detection are two independent steps, each of which must process the 256-dimensional data separately, so this process also lengthens the detection time and affects the real-time performance of object detection.
Summary of the invention
Embodiments of the invention disclose a method and device for object detection in an image, used to improve the real-time performance of object detection and to facilitate global optimisation of the detection process.
To achieve the above purpose, an embodiment of the invention discloses a method for object detection in an image, applied to an electronic device. The method includes:
dividing an image to be detected into a plurality of grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size;
inputting the divided image into a pre-trained convolutional neural network, and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector;
for the feature vector corresponding to each grid cell, identifying the maximum value among the class parameters in the feature vector, and, when the maximum exceeds a set threshold, determining the position information of an object of the class corresponding to that class parameter from the centre-point position parameters and the outline-size parameters in the feature vector.
Further, before the image to be detected is divided into a plurality of grid cells according to the preset dividing mode, the method also includes:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
Further, the training process of the convolutional neural network includes:
for each sample image in a sample image set, annotating the target objects with rectangular boxes;
dividing each sample image into a plurality of grid cells according to the preset dividing mode and determining the feature vector corresponding to each cell, wherein the size of each sample image is the target size; when a cell contains the centre point of a target object, the class parameter corresponding to the class of that target object in the cell's feature vector is set to a preset maximum, the values of the centre-point position parameters in the feature vector are determined from the position of the centre point within the cell, and the values of the outline-size parameters in the feature vector are determined from the size of the annotated rectangular box of the target object; when a cell does not contain the centre point of any target object, every parameter in the cell's feature vector is zero;
training the convolutional neural network with the sample images for which the feature vector of each grid cell has been determined.
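As a rough illustration of the labelling scheme above — the function name, the vector layout (class scores first, then centre offset and box size) and the parameter values are assumptions for illustration, not taken from the patent — the feature vector of a single grid cell might be built as follows:

```python
def encode_cell_label(num_classes, class_id=None,
                      cx=0.0, cy=0.0, bw=0.0, bh=0.0, max_value=1.0):
    """Build the feature vector for one grid cell.

    A cell that contains no target-object centre gets an all-zero vector;
    otherwise the class entry is set to the preset maximum, and the
    centre-point parameters (cx, cy) and outline-size parameters (bw, bh)
    are filled in after the class scores.
    """
    vec = [0.0] * (num_classes + 4)
    if class_id is None:
        return vec  # no target-object centre point falls in this cell
    vec[class_id] = max_value                       # class parameter at its preset maximum
    vec[num_classes:num_classes + 4] = [cx, cy, bw, bh]
    return vec

# A cell containing the centre of a class-2 object, with 5 classes in total:
label = encode_cell_label(num_classes=5, class_id=2, cx=0.4, cy=0.7, bw=0.2, bh=0.3)
```

The choice of putting position parameters after the class scores is arbitrary; any fixed layout shared between training labels and network outputs would serve.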
Further, before each sample image is divided into a plurality of grid cells according to the preset dividing mode, the method also includes:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
Further, training the convolutional neural network with the sample images for which the feature vector of each grid cell has been determined includes:
choosing sub-sample images from the sample image set, wherein the number of chosen sub-sample images is smaller than the number of sample images in the sample image set;
training the convolutional neural network with each of the chosen sub-sample images.
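The sub-sampling step can be sketched as follows; this is a minimal illustration, and the function name and the use of uniform random sampling are assumptions (the patent only requires that the chosen subset be smaller than the full set):

```python
import random

def choose_subsample(sample_images, count):
    """Choose `count` sub-sample images from the sample image set for one
    round of training; the subset must be strictly smaller than the set."""
    if count >= len(sample_images):
        raise ValueError("sub-sample must be smaller than the sample set")
    return random.sample(sample_images, count)  # sampling without replacement

# E.g. pick 64 sub-sample images out of a set of 1000:
batch = choose_subsample(list(range(1000)), 64)
```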
Further, the preset dividing mode includes:
dividing the image and the sample images into a plurality of grid cells with identical numbers of rows and columns; or,
dividing the image and the sample images into a plurality of grid cells whose numbers of rows and columns differ.
Further, the method also includes:
determining the error of the convolutional neural network from its predictions of the positions and classes of the objects in the sub-sample images and from the information of the annotated target objects in the sub-sample images;
when the error converges, determining that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\mathrm{loss}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\Big]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big]+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\big(C_i-\hat{C}_i\big)^2+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\big(C_i-\hat{C}_i\big)^2+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\big(P_i(c)-\hat{P}_i(c)\big)^2$$
where $S$ is the number of rows of the divided grid (equal to its number of columns when they are identical); $B$ is the preset number of rectangular boxes predicted per grid cell, typically 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the centre point of the annotated target object in grid cell $i$, and $\hat{x}_i$, $\hat{y}_i$ those of the predicted object; $h_i$ and $w_i$ are the height and width of the rectangular box of the annotated target object, and $\hat{h}_i$, $\hat{w}_i$ those of the box of the predicted object; $C_i$ is the annotated probability that grid cell $i$ currently contains a target object, and $\hat{C}_i$ the predicted probability; $P_i(c)$ is the probability that the annotated target object in cell $i$ belongs to class $c$, and $\hat{P}_i(c)$ the predicted probability; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the centre point of the object predicted in the $j$-th box lies in cell $i$ and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the predicted cell $i$ contains an object centre point and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when the predicted cell $i$ contains no object centre point and 0 otherwise. Here $\hat{P}_i(c)$ is determined according to the following equation:
$$\hat{P}_i(c)=P_r(\mathrm{Class}\mid\mathrm{Object})\cdot P_r(\mathrm{Object})$$
where $P_r(\mathrm{Object})$ is the predicted probability that grid cell $i$ currently contains an object, and $P_r(\mathrm{Class}\mid\mathrm{Object})$ is the predicted conditional probability that the object in cell $i$ belongs to class $c$.
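A loss of this five-term form can be sketched per grid cell in plain Python. This is a minimal illustration under the assumption $B = 1$ (so the inner sum over boxes collapses); the dictionary field names and the default weight values are invented, not taken from the patent:

```python
from math import sqrt

def yolo_style_loss(cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum the coordinate, size, confidence and class-probability error
    terms over grid cells. A cell dict with 'obj': True carries annotated
    and predicted values; with 'obj': False only the confidences matter."""
    loss = 0.0
    for c in cells:
        if c['obj']:
            # centre-point error, weighted by lambda_coord
            loss += lambda_coord * ((c['x'] - c['x_hat'])**2 + (c['y'] - c['y_hat'])**2)
            # box-size error on square roots of width and height
            loss += lambda_coord * ((sqrt(c['w']) - sqrt(c['w_hat']))**2
                                    + (sqrt(c['h']) - sqrt(c['h_hat']))**2)
            # object-confidence error
            loss += (c['C'] - c['C_hat'])**2
            # class-probability error over all classes
            loss += sum((p - q)**2 for p, q in zip(c['P'], c['P_hat']))
        else:
            # cells without an object centre only contribute a down-weighted
            # confidence error
            loss += lambda_noobj * (c['C'] - c['C_hat'])**2
    return loss

# A perfectly predicted object cell contributes zero; an empty cell with a
# small spurious confidence contributes lambda_noobj * (0 - 0.2)**2 = 0.02.
perfect_cell = {'obj': True, 'x': 0.5, 'x_hat': 0.5, 'y': 0.5, 'y_hat': 0.5,
                'w': 0.2, 'w_hat': 0.2, 'h': 0.3, 'h_hat': 0.3,
                'C': 1.0, 'C_hat': 1.0, 'P': [0.0, 1.0], 'P_hat': [0.0, 1.0]}
empty_cell = {'obj': False, 'C': 0.0, 'C_hat': 0.2}
```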
Further, determining the position information of the object of the class corresponding to the class parameter from the centre-point position parameters and the outline-size parameters in the feature vector includes:
determining the position information of the centre point within the grid cell from the position parameters of the centre point;
determining the centre point from that position information, taking the centre point as the centre of a rectangular box, determining the position information of the rectangular box from the outline-size parameters, taking the position information of the rectangular box as the position information of the object, and taking the object class corresponding to the class parameter as the class of the object.
Further, determining the position information of the centre point within the grid cell from the position parameters of the centre point includes:
taking a set point of the grid cell as a reference point, and determining the position information of the centre point within the cell from the reference point and the position parameters of the centre point.
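The reference-point scheme can be sketched as follows, assuming the cell's top-left corner is chosen as the set reference point and the centre-point parameters are offsets in [0, 1] relative to the cell; the patent leaves both choices open, so these are illustrative assumptions:

```python
def decode_centre(row, col, cell_w, cell_h, dx, dy):
    """Map a cell-relative centre offset (dx, dy, each in [0, 1]) to image
    coordinates, taking the cell's top-left corner as the reference point."""
    ref_x, ref_y = col * cell_w, row * cell_h   # the set reference point
    return ref_x + dx * cell_w, ref_y + dy * cell_h

# Centre of the cell in row 1, column 2, with 100x100 cells:
centre = decode_centre(1, 2, 100.0, 100.0, 0.5, 0.5)   # image coordinates
```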
An embodiment of the invention discloses a device for object detection in an image. The device includes:
a division module, configured to divide an image to be detected into a plurality of grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size;
a detection module, configured to input the divided image into a pre-trained convolutional neural network and obtain a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector;
a determining module, configured to, for the feature vector corresponding to each grid cell, identify the maximum value among the class parameters in the feature vector, and, when the maximum exceeds a set threshold, determine the position information of an object of the class corresponding to that class parameter from the centre-point position parameters and the outline-size parameters in the feature vector.
Further, the device also includes:
a judging and adjusting module, configured to judge whether the size of the image is the target size and, if not, to adjust the size of the image to the target size.
Further, the device also includes:
a training module, configured to: for each sample image in a sample image set, annotate the target objects with rectangular boxes; divide each sample image into a plurality of grid cells according to the preset dividing mode and determine the feature vector corresponding to each cell, wherein the size of each sample image is the target size; when a cell contains the centre point of a target object, set the class parameter corresponding to the class of that target object in the cell's feature vector to a preset maximum, determine the values of the centre-point position parameters in the feature vector from the position of the centre point within the cell, and determine the values of the outline-size parameters in the feature vector from the size of the annotated rectangular box of the target object; when a cell does not contain the centre point of any target object, set every parameter in the cell's feature vector to zero; and train the convolutional neural network with the sample images for which the feature vector of each grid cell has been determined.
Further, the training module is also configured to judge, for each sample image, whether the size of the sample image is the target size and, if not, to adjust the size of the sample image to the target size.
Further, the training module is specifically configured to choose sub-sample images from the sample image set, wherein the number of chosen sub-sample images is smaller than the number of sample images in the sample image set, and to train the convolutional neural network with each of the chosen sub-sample images.
Further, the device also includes:
an error calculation module, configured to determine the error of the convolutional neural network from its predictions of the positions and classes of the objects in the sub-sample images and from the target objects annotated in the sub-sample images, and,
when the error converges, to determine that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\mathrm{loss}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\Big]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big]+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\big(C_i-\hat{C}_i\big)^2+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\big(C_i-\hat{C}_i\big)^2+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\big(P_i(c)-\hat{P}_i(c)\big)^2$$
where $S$ is the number of rows of the divided grid (equal to its number of columns when they are identical); $B$ is the preset number of rectangular boxes predicted per grid cell, typically 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the centre point of the annotated target object in grid cell $i$, and $\hat{x}_i$, $\hat{y}_i$ those of the predicted object; $h_i$ and $w_i$ are the height and width of the rectangular box of the annotated target object, and $\hat{h}_i$, $\hat{w}_i$ those of the box of the predicted object; $C_i$ is the annotated probability that grid cell $i$ currently contains a target object, and $\hat{C}_i$ the predicted probability; $P_i(c)$ is the probability that the annotated target object in cell $i$ belongs to class $c$, and $\hat{P}_i(c)$ the predicted probability; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the centre point of the object predicted in the $j$-th box lies in cell $i$ and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the predicted cell $i$ contains an object centre point and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when the predicted cell $i$ contains no object centre point and 0 otherwise. Here $\hat{P}_i(c)$ is determined according to the following equation:
$$\hat{P}_i(c)=P_r(\mathrm{Class}\mid\mathrm{Object})\cdot P_r(\mathrm{Object})$$
where $P_r(\mathrm{Object})$ is the predicted probability that grid cell $i$ currently contains an object, and $P_r(\mathrm{Class}\mid\mathrm{Object})$ is the predicted conditional probability that the object in cell $i$ belongs to class $c$.
Further, the determining module is specifically configured to determine the position information of the centre point within the grid cell from the position parameters of the centre point;
determine the centre point from that position information, take the centre point as the centre of a rectangular box, determine the position information of the rectangular box from the outline-size parameters, take the position information of the rectangular box as the position information of the object, and take the object class corresponding to the class parameter as the class of the object.
Further, the determining module is specifically configured to take a set point of the grid cell as a reference point and to determine the position information of the centre point within the cell from the reference point and the position parameters of the centre point.
Embodiments of the invention provide a method and device for object detection in an image. In the method, an image to be detected, whose size is a target size, is divided into a plurality of grid cells according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network; a plurality of feature vectors of the image output by the convolutional neural network are obtained, each grid cell corresponding to one feature vector; the maximum value among the class parameters in each feature vector is identified; and when that maximum exceeds a set threshold, the position information of an object of the class corresponding to that class parameter is determined from the centre-point position parameters and the outline-size parameters in the feature vector. Because the pre-trained convolutional neural network determines the feature vectors of the image, and the class and position of an object in the image are determined from the class parameters and position-related parameters in those vectors, position and class detection are performed simultaneously, which facilitates global optimisation. Moreover, because the position and class of an object are determined from the feature vector of each grid cell, there is no need to select multiple candidate regions, which saves detection time and improves the real-time performance and efficiency of detection.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a schematic flow chart of object detection with R-CNN;
Fig. 2 is a schematic diagram of object detection with Faster R-CNN;
Fig. 3 is a schematic diagram of the object detection process in an image provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the detailed implementation of object detection in an image provided in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the training process of the convolutional neural network provided in an embodiment of the present invention;
Fig. 6A-Fig. 6D are schematic diagrams of annotation results of target objects provided in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the construction of the cube structure in Fig. 6D;
Fig. 8 is a schematic structural diagram of a device for object detection in an image provided in an embodiment of the present invention.
Specific embodiment
To effectively improve the efficiency and real-time performance of object detection and to facilitate its global optimisation, embodiments of the present invention provide a method and device for object detection in an image.
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Fig. 3 is a schematic diagram of the object detection process in an image provided in an embodiment of the present invention. The process includes the following steps:
Step S301: dividing an image to be detected into a plurality of grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size.
The embodiments of the present invention are applied to an electronic device, which may specifically be a desktop computer, a notebook, or another smart device with processing capability.
After the image to be detected of the target size is obtained, it is divided into a plurality of grid cells according to the preset dividing mode, which is identical to the dividing mode applied to the images used when the convolutional neural network was trained. For convenience, the image may, for example, be divided into multiple rows and multiple columns, with equal spacing between the rows and between the columns. The image may of course also be divided into multiple irregular cells, as long as the image to be detected and the images used to train the convolutional neural network are divided in the same way.
When the image is divided into multiple rows and columns, it may be divided into cells whose numbers of rows and columns are equal, or into cells whose numbers of rows and columns differ; the aspect ratios of the resulting cells may be identical or different.
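As an illustration of such a row-column division — the cell representation as (x, y, width, height) tuples and the 448x448 example size are assumptions — an image of the target size can be split like this:

```python
def divide_into_grid(width, height, rows, cols):
    """Split an image of the target size into rows x cols grid cells; rows
    and cols need not be equal, and cell aspect ratios may differ.
    Each cell is returned as (x, y, cell_width, cell_height)."""
    cell_w, cell_h = width / cols, height / rows
    return [(c * cell_w, r * cell_h, cell_w, cell_h)
            for r in range(rows) for c in range(cols)]

# The 7x7 example from the text yields 49 cells:
cells = divide_into_grid(448, 448, 7, 7)
```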
Step S302: inputting the divided image into the pre-trained convolutional neural network and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector.
To detect the class and position of objects in an image, a convolutional neural network is trained in the embodiments of the present invention, and the feature vector corresponding to each grid cell is obtained from the trained network. For example, the image may be divided into the 49 cells of a 7x7 grid; after the divided image is input into the trained convolutional neural network, 49 feature vectors are output, each corresponding to one grid cell.
Step S303: for the feature vector corresponding to each grid cell, identifying the maximum value among the class parameters in the feature vector, and, when the maximum exceeds a set threshold, determining the position information of the object of the class corresponding to that class parameter from the centre-point position parameters and the outline-size parameters in the feature vector.
Specifically, the feature vector obtained in the embodiments of the present invention is a multi-dimensional vector that includes at least class parameters and position parameters, where there are multiple class parameters and the position parameters in turn include centre-point position parameters and outline-size parameters. After the feature vector corresponding to each grid cell is obtained, it is judged for each cell whether an object has been detected. If the maximum among the multiple class parameters in a cell's feature vector exceeds the set threshold, an object has been detected in that cell, its class is the class corresponding to that class parameter, and its position can be determined from the cell's feature vector.
Because the position parameters in the feature vectors used when training the convolutional neural network are determined according to a set method, the position of the object can be determined according to the same method.
Because in the embodiments of the present invention the pre-trained convolutional neural network determines the feature vectors of the image, and the class and position of an object are determined from the class parameters and position-related parameters in those vectors, position and class prediction are performed simultaneously, which facilitates global optimisation. Moreover, because the position and class of an object are determined from the feature vector of each grid cell, there is no need to select multiple candidate regions, which saves detection time and improves the real-time performance and efficiency of detection.
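Putting steps S302-S303 together, the thresholded decoding of the per-cell feature vectors might look like this; the vector layout (class scores first, then four position parameters) is an assumption for illustration:

```python
def detect(feature_vectors, num_classes, threshold):
    """For each cell's feature vector [class scores..., cx, cy, w, h],
    report (cell index, class id, position parameters) whenever the
    largest class score exceeds the set threshold."""
    detections = []
    for i, vec in enumerate(feature_vectors):
        scores = vec[:num_classes]
        best = max(range(num_classes), key=lambda k: scores[k])
        if scores[best] > threshold:
            detections.append((i, best, tuple(vec[num_classes:])))
    return detections

# Two cells, two classes: only the first cell clears the threshold.
vectors = [[0.1, 0.9, 0.5, 0.5, 0.2, 0.3],
           [0.2, 0.1, 0.0, 0.0, 0.0, 0.0]]
found = detect(vectors, num_classes=2, threshold=0.5)
```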
Object detection in the embodiment of the present invention is performed on images of a target size, where the target size is the uniform size of the images used when the convolutional neural network is trained. The size can be arbitrary, as long as the image size used during object detection is identical to the image size used during training. The target size may, for example, be 1024*1024, or 256*512, etc.
Therefore, in the embodiment of the present invention, in order to ensure that the image input into the convolutional neural network is of the target size, before the image to be detected is divided into multiple grids according to the preset dividing mode, the method further includes:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
When the image to be detected is of the target size, subsequent processing is performed on the image directly; when the image to be detected is not of the target size, the image to be detected is adjusted to the target size. The adjustment of image size belongs to the prior art and is not repeated in the embodiment of the present invention.
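As a minimal sketch of this size check and adjustment, the following uses the 1024*1024 target size and the 7*7 grid mentioned elsewhere in the text as example values; the nearest-neighbor resize stands in for whatever prior-art scaling an implementation would actually use, and all names are illustrative:

```python
import numpy as np

def prepare_image(image, target_hw=(1024, 1024), grid=(7, 7)):
    """Resize `image` to the target size if needed and return the grid-cell size.

    Nearest-neighbor index selection keeps the sketch dependency-free; a real
    implementation would use an interpolating resize from the prior art.
    """
    th, tw = target_hw
    h, w = image.shape[:2]
    if (h, w) != (th, tw):
        rows = np.arange(th) * h // th   # source row for each target row
        cols = np.arange(tw) * w // tw   # source column for each target column
        image = image[rows][:, cols]
    # Integer cell size is a simplification when the target size is not an
    # exact multiple of the grid dimensions.
    cell_h, cell_w = th // grid[0], tw // grid[1]
    return image, (cell_h, cell_w)
```

Once the image is at the target size, the grid division itself is just bookkeeping: each cell is addressed by its row and column index.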
Specifically, in the embodiment of the present invention, determining the position information of the object of the classification corresponding to the classification parameter according to the center-point location parameter and the shape-size parameter in the feature vector includes:
determining, according to the location parameter of the center point, the position information of the center point in the grid;
determining the center point according to the position information, taking the center point as the center of a rectangular frame, determining the position information of the rectangular frame according to the shape-size parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object classification corresponding to the classification parameter as the classification of the object.
Wherein, determining the position information of the center point in the grid according to the location parameter of the center point includes:
taking a set point of the grid as a reference point; and determining the position information of the center point in the grid according to the reference point and the location parameter of the center point.
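A short illustration of this reference-point decoding, under the assumed convention that (x, y) is the center-point offset from the grid's top-left set point in normalized grid units and (w, h) is the rectangle's width and height in the same units (the convention and names are illustrative, not fixed by the text):

```python
def decode_box(cell_row, cell_col, x, y, w, h):
    """Turn one grid cell's location parameters into a rectangle (left, top, w, h)."""
    # Set point of the grid = its top-left corner, used as the reference point.
    cx = cell_col + x          # absolute center abscissa in grid units
    cy = cell_row + y          # absolute center ordinate in grid units
    # The center point becomes the center of the rectangular frame,
    # whose extent is given by the shape-size parameters (w, h).
    return cx - w / 2.0, cy - h / 2.0, w, h
```

Because each normalized grid is 1*1, adding the cell indices converts the per-grid offsets into image-wide coordinates in a single step.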
Fig. 4 is a schematic diagram of a detailed implementation process of object detection in an image provided by an embodiment of the present invention. The process includes the following steps:
Step S401: receiving an image to be detected.
Step S402: judging whether the size of the image is the target size; if so, performing step S404; otherwise, performing step S403.
Step S403: adjusting the size of the image to the target size.
Step S404: dividing the image to be detected into multiple grids according to the preset dividing mode, wherein the size of the image to be detected is the target size.
Step S405: inputting the divided image into the convolutional neural network whose training has been completed in advance, and obtaining multiple feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector.
Step S406: for the feature vector corresponding to each grid, identifying the maximum of the classification parameters in the feature vector.
Step S407: when the maximum is greater than the set threshold, taking a set point of the grid as a reference point, and determining the position information of the center point in the grid according to the reference point and the location parameter of the center point.
Step S408: determining the center point according to the position information, taking the center point as the center of a rectangular frame, determining the position information of the rectangular frame according to the shape-size parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object classification corresponding to the classification parameter as the classification of the object.
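Steps S406-S408 can be sketched as a single pass over the network output, assuming the 25-channel layout (confidence, cls1...cls20, x, y, w, h) used later in the text, the example threshold value 0.4, and the top-left corner of each grid as its set point; everything here is an illustrative sketch, not the definitive implementation:

```python
import numpy as np

def detect(feature_map, threshold=0.4):
    """Scan an (S, S, 25) feature map and return (class_id, left, top, w, h) tuples."""
    detections = []
    S = feature_map.shape[0]
    for i in range(S):
        for j in range(S):
            vec = feature_map[i, j]
            cls = vec[1:21]                 # the 20 classification parameters
            c = int(np.argmax(cls))
            if cls[c] > threshold:          # S406/S407: max class parameter vs threshold
                x, y, w, h = vec[21:25]     # center-point and shape-size parameters
                cx, cy = j + x, i + y       # grid top-left as the reference point
                detections.append((c, cx - w / 2, cy - h / 2, w, h))
    return detections
```

Grids whose largest classification parameter stays below the threshold contribute nothing, which is how empty cells are skipped without any separate region-proposal stage.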
The above object detection is performed on the basis of a trained convolutional neural network; in order to realize detection of objects, the convolutional neural network needs to be trained. In the embodiment of the present invention, when the convolutional neural network is trained, a sample image of the target size is divided into multiple grids; if the center point of a certain target object is located in some grid, that grid is responsible for detecting the target object, including detecting the classification of the target object and the corresponding position (bounding box).
Fig. 5 is a schematic diagram of the training process of the convolutional neural network provided by an embodiment of the present invention. The process includes the following steps:
Step S501: for each sample image in a sample image set, marking target objects with rectangular frames.
In the embodiment of the present invention, the convolutional neural network is trained using a large number of sample images, and these sample images constitute the sample image set. Target objects are marked with rectangular frames in each sample image.
Specifically, as in the marking-result schematic diagrams shown in Fig. 6A-Fig. 6D, three target objects exist in the sample image in Fig. 6A, namely a dog, a bicycle and a car. When each target object is marked, the vertices of the target object in the up, down, left and right directions (relative to the up, down, left and right directions shown in Fig. 6A) are identified in the sample image. For the upper and lower vertices, the two lines passing through them parallel to the bottom edge of the sample image are taken as two sides of the rectangular frame; for the left and right vertices, the two lines passing through them parallel to the left and right edges of the sample image are taken as the other two sides of the rectangular frame, as with the rectangular frames of the dog, bicycle and car marked with dotted lines in Fig. 6A.
Step S502: dividing each sample image into multiple grids according to the preset dividing mode, and determining the feature vector corresponding to each grid, wherein the size of each sample image is the target size. When a grid contains the center point of a target object, according to the classification of the target object, the value of the classification parameter corresponding to that classification in the grid's feature vector is set to a preset maximum; according to the position of the center point within the grid, the value of the center-point location parameter in the feature vector is determined; and according to the size of the marked rectangular frame of the target object, the value of the shape-size parameter in the feature vector is determined. When a grid does not contain the center point of any target object, the value of every parameter in the grid's feature vector is zero.
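The per-grid labeling of step S502 can be sketched as follows, assuming the 25-dimensional layout (confidence, cls1...cls20, x, y, w, h), a 7*7 grid, and marked boxes given as (class_id, center-x, center-y, width, height) in grid units; the input convention and function name are illustrative assumptions:

```python
import numpy as np

def encode_targets(boxes, S=7, n_classes=20):
    """Build the (S, S, 25) training target described in step S502.

    Cells that contain no object center point stay all-zero.
    """
    target = np.zeros((S, S, 5 + n_classes))
    for cls_id, cx, cy, w, h in boxes:
        col, row = int(cx), int(cy)            # the grid containing the center point
        target[row, col, 0] = 1.0              # probability parameter `confidence`
        target[row, col, 1 + cls_id] = 1.0     # class parameter set to its maximum (1)
        target[row, col, 21] = cx - col        # center offset from the grid's top-left
        target[row, col, 22] = cy - row
        target[row, col, 23] = w               # shape-size parameters
        target[row, col, 24] = h
    return target
```

Only the handful of grids that hold a center point carry non-zero targets, which mirrors the text's observation that most of the per-image detection probabilities are 0.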
In the embodiment of the present invention, the sample image can be divided into multiple grids according to the preset dividing mode, wherein the dividing mode of the sample images is identical to the dividing mode of the image to be detected in the above detection process.
For example, for convenience, the image may be divided into multiple rows and multiple columns, where the intervals between rows and the intervals between columns may be equal or unequal. Of course, the image may also be divided into multiple irregular grids, as long as the image to be detected and the images used for training the convolutional neural network use the identical grid dividing mode.
When the image is divided into multiple rows and multiple columns, it may be divided into multiple grids in which the numbers of rows and columns are identical, or into multiple grids in which the numbers of rows and columns differ; the aspect ratios of the divided grids may likewise be identical or different. For example, the sample image may be divided into multiple grids of 12*10, 15*15, 6*6, etc. When the grids are of equal size, the grid size can be normalized. As shown in Fig. 6B, in the embodiment of the present invention the sample image is divided into multiple grids of 7 rows horizontally and 7 columns vertically; after normalization, the size of each grid can be regarded as 1*1.
Each grid in the sample image corresponds to one feature vector, which is a multi-dimensional vector and at least includes classification parameters and location parameters, wherein there are multiple classification parameters, and the location parameters in turn include a center-point location parameter and a shape-size parameter.
Step S503: training the convolutional neural network according to the sample images for which the feature vector of each grid has been determined.
Specifically, in the embodiment of the present invention, all the sample images in the sample image set can be used to train the convolutional neural network. However, since the sample image set contains a large number of sample images, in order to improve training efficiency, in the embodiment of the present invention training the convolutional neural network according to the sample images for which the feature vector of each grid has been determined includes:
selecting sub-sample images from the sample image set, wherein the quantity of the selected sub-sample images is smaller than the quantity of sample images in the sample image set; and
training the convolutional neural network using each of the selected sub-sample images.
By randomly selecting sub-sample images far fewer than the total quantity of sample images, the convolutional neural network is trained and its parameters are continuously updated until the error between the information of the object predicted for each grid and the information of the marked target object converges.
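The random sub-sampling and convergence check described above might look like the following sketch, where the `network` object and its `loss_and_update` method are hypothetical stand-ins for an actual convolutional-neural-network parameter update:

```python
import random

def train(network, samples, batch_size=64, tol=1e-3, max_iters=10000):
    """Repeatedly draw a small random batch and update until the error converges."""
    prev = float("inf")
    for _ in range(max_iters):
        # Sub-sample far fewer images than the whole sample set.
        batch = random.sample(samples, min(batch_size, len(samples)))
        err = network.loss_and_update(batch)   # hypothetical: one update, returns error
        if abs(prev - err) < tol:              # convergence criterion on the error
            break
        prev = err
    return network
```

The batch size, tolerance and iteration cap are tunable assumptions; the text only requires that the batch be much smaller than the sample set and that training stop when the error converges.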
Likewise, in the embodiment of the present invention, sample images of the target size are used when the convolutional neural network is trained. Therefore, in order to ensure that the sample images input into the convolutional neural network are all of the target size, before each sample image is divided into multiple grids according to the preset dividing mode, the method further includes:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
When a sample image is of the target size, subsequent processing is performed on it directly; when a sample image is not of the target size, it is adjusted to the target size. The adjustment of image size belongs to the prior art and is not repeated in the embodiment of the present invention.
In the above process, the adjustment to the target size may be performed on the sample image first, or the rectangular frames may be marked in the sample image first. Marking the rectangular frames first ensures that the target objects can be marked accurately when the sample image is relatively large, while adjusting to the target size first ensures that the target objects can be marked accurately when the sample image is relatively small.
In the above marking process, the feature vector corresponding to each grid in the sample image can be determined. In the embodiment of the present invention, the feature vector corresponding to each grid can be expressed as (confidence, cls1, cls2, cls3, ..., cls20, x, y, w, h), wherein confidence is a probability parameter; cls1, cls2, cls3, ..., cls20 are classification parameters; and x, y, w and h are location parameters, where x and y are the center-point location parameters and w and h are the shape-size parameters. When a grid contains the center point of a target object, the value of each parameter in the grid's feature vector is determined; when a grid does not contain the center point of any target object, the value of each parameter in the grid's feature vector is 0.
Specifically, since each target object in the sample image is marked with a rectangular frame, the center point of the rectangular frame can be regarded as the center point of the target object, as with the three rectangular-frame center points shown in Fig. 6C. When a grid contains the center point of a target object, the probability parameter in the grid's feature vector is regarded as 1 during marking, that is, the probability that a target object exists in the current grid is 1.
Since the target objects contained in the sample images have multiple classifications, in the embodiment of the present invention they are represented by classification parameters cls, where cls1, cls2, ..., clsn respectively represent target objects of different classifications. For example, n can be 20, that is, there are target objects of 20 classifications; the target-object classification represented by cls1 is car, that represented by cls2 is dog, and that represented by cls3 is bicycle. When a grid contains the center point of a target object, the classification parameter corresponding to that target object is set to the maximum, where the maximum is greater than the set threshold; for example, the maximum can be 1 and the threshold can be 0.4, etc.
For example, as shown in Fig. 6C, in the feature vectors of the grids where each center point is located, from bottom to top (relative to the up and down directions shown in Fig. 6C): in the feature vector of the first center point, cls2 among the classification parameters is 1 and the other classification parameters are 0; in the feature vector of the second center point, cls3 is 1 and the other classification parameters are 0; and in the feature vector of the third center point, cls1 is 1 and the other classification parameters are 0.
The feature vector also includes the location parameters x, y, w and h of the target object, where x and y are the center-point location parameters, whose values are the horizontal and vertical coordinates of the center point of the target object relative to a set point. The set point corresponding to each grid may be identical for all grids; for example, the upper-left corner of the sample image may be regarded as the set point, i.e. the coordinate origin, and since each grid is normalized, the coordinates of every position in every grid are uniquely determined. Of course, in order to simplify the process and reduce the amount of calculation, the set point corresponding to each grid may also differ: each grid may be regarded as an independent unit, with the upper-left corner of the grid as the set point, i.e. the coordinate origin. Therefore, when marking, the values of x and y in the feature vector of the grid where the center point is located can be determined according to the offset of the center point relative to the upper-left corner of that grid. Determining the values of x and y according to the relative offset belongs to the prior art, and the process is not repeated in the embodiment of the present invention. Among the location parameters, w and h are the shape-size parameters, whose values are the length and width of the rectangular frame where the target object is located.
Since the feature vector is a multi-dimensional vector, in order to represent the feature vector of each grid accurately, in the embodiment of the present invention the cube structure shown in Fig. 6D is built according to the building mode shown in Fig. 7: the grids are processed by convolutional layers, max-pooling layers, fully connected layers and an output layer to generate a lattice structure, where the depth of the lattice in the Z-axis direction is determined according to the dimension of the feature vector. In the embodiment of the present invention, the depth of the lattice in the Z-axis direction is 25. Performing the respective processing in each layer of the convolutional neural network to generate the lattice structure belongs to the prior art, and the process is not repeated in the embodiment of the present invention.
After a large number of sample images are marked in the above manner, the convolutional neural network is trained using the marked sample images. Specifically, in the embodiment of the present invention, the convolutional neural network is trained using multiple sub-sample images. During training, for each sub-sample image, a convolutional feature map of the sub-sample image is obtained through the convolutional neural network; the convolutional feature map contains the feature vector (confidence, cls1, cls2, cls3, ..., cls20, x, y, w, h) corresponding to each grid, which contains the location parameters and classification parameters of the object predicted in the grid, as well as the probability parameter confidence, which represents the degree of overlap between the rectangular frame where the object predicted by the grid is located and the marked rectangular frame of the target object.
During training, for each sub-sample image, the network parameters of the convolutional neural network are adjusted by calculating the error between the prediction information and the marking information. By randomly selecting, each time, a batch of sub-sample images far fewer than the total quantity of sample images, the convolutional neural network is trained and its network parameters are updated until the error between the prediction information and the marking information of each grid converges. Training the convolutional neural network according to the sub-sample images and adjusting its network parameters until training is completed belongs to the prior art, and the process is not repeated in the embodiment of the present invention.
In the training process of the above convolutional neural network, in order to predict the position and classification information of objects accurately, in the embodiment of the present invention the last fully connected layer of the convolutional neural network uses a logistic activation function, while the convolutional layers and the other fully connected layers use the Leaky ReLU function. The Leaky ReLU function is:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

where $\alpha$ is a small positive slope.
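For reference, the Leaky ReLU activation can be written in one line; the 0.1 slope is a common choice, assumed here rather than taken from the patent figure:

```python
def leaky_relu(x, alpha=0.1):
    # Negative inputs are scaled by a small slope instead of being zeroed,
    # which keeps a gradient flowing through inactive units during training.
    return x if x > 0 else alpha * x
```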
In the embodiment of the present invention, in order to complete the training of the convolutional neural network and make it converge, when the convolutional neural network is trained, the method further includes:
determining the error of the convolutional neural network according to the prediction of the convolutional neural network for the position and classification of the target objects in the sub-sample image and the information of the target objects marked in the sub-sample image; and
when the error converges, determining that the training of the convolutional neural network is completed, wherein the error is determined using the following loss function:

$$\begin{aligned} loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & + \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2 \end{aligned}$$

Wherein, S is the number of rows of the divided grids when the numbers of rows and columns are identical; B is the preset quantity of rectangular frames predicted for each grid, typically 1 or 2; $x_i$ is the abscissa of the center point of the marked target object in grid i, and $\hat{x}_i$ is the abscissa of the center point of the predicted object in grid i; $y_i$ is the ordinate of the center point of the marked target object in grid i, and $\hat{y}_i$ is the ordinate of the center point of the predicted object in grid i; $h_i$ is the height of the rectangular frame where the marked target object is located and $w_i$ is its width; $\hat{h}_i$ is the height of the rectangular frame where the predicted object is located and $\hat{w}_i$ is its width; $C_i$ is the marked probability that grid i currently contains a target object, and $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the marked object in grid i belongs to classification c, and $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to classification c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame is located in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when a predicted object center point exists in grid i, and 0 otherwise; $\mathbb{1}_{ij}^{noobj}$ takes 1 when no predicted object center point exists in grid i, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(Class \mid Object) \cdot P_r(Object)$$
$P_r(Object)$ is the predicted probability that grid i currently contains an object, and $P_r(Class \mid Object)$ is the conditional probability that the predicted object in grid i belongs to classification c.
In order to prevent grids whose prediction error relative to the marking result is large from contributing too little to position prediction, the above loss function is used in the embodiment of the present invention.
As shown in Fig. 6B, in a specific embodiment of the present invention each sample image is divided into a total of 49 grids of 7*7, and each grid can detect 20 classifications, so one sample image can produce 980 detection probabilities, of which the detection probabilities of most grids are 0. This would cause training to diverge, so a variable is introduced here to solve this problem: the probability of whether an object exists in a certain grid. Therefore, in addition to the 20 classification parameters, there is also a predicted probability $P_r(Object)$ of whether an object currently exists in the grid; the probability that the target object in a certain grid belongs to classification c is then the product of $P_r(Object)$ and the predicted conditional probability $P_r(Class \mid Object)$ that the object in the grid belongs to classification c. $P_r(Object)$ is updated in every grid, while $P_r(Class \mid Object)$ is updated only when an object exists in the grid.
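A simplified evaluation of this error for B = 1 can be sketched as follows, summing squared errors over the terms described above (coordinates, confidence split by object presence, and class probabilities); the weight values 5.0 and 0.5 for λ_coord and λ_noobj are assumed defaults, since the text only says they are set weights, and the square-root terms on width and height are omitted for brevity:

```python
import numpy as np

def detection_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared error over an (S, S, 25) prediction/target pair with B = 1.

    Channel layout assumed: [confidence, cls1..cls20, x, y, w, h].
    """
    obj = target[..., 0] == 1.0      # grids whose marked confidence is 1
    noobj = ~obj

    coord_err = np.sum((pred[..., 21:25][obj] - target[..., 21:25][obj]) ** 2)
    conf_obj_err = np.sum((pred[..., 0][obj] - target[..., 0][obj]) ** 2)
    conf_noobj_err = np.sum((pred[..., 0][noobj] - target[..., 0][noobj]) ** 2)
    class_err = np.sum((pred[..., 1:21][obj] - target[..., 1:21][obj]) ** 2)

    return (lambda_coord * coord_err
            + conf_obj_err
            + lambda_noobj * conf_noobj_err
            + class_err)
```

Down-weighting the no-object confidence term is what keeps the many empty grids from swamping the gradient of the few grids that actually contain a center point.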
Fig. 8 is a schematic structural diagram of an object detection device in an image provided by an embodiment of the present invention. The device is located in an electronic device and includes:
a division module 81, configured to divide an image to be detected into multiple grids according to a preset dividing mode, wherein the size of the image to be detected is a target size;
a detection module 82, configured to input the divided image into a convolutional neural network whose training has been completed in advance, and obtain multiple feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector; and
a determining module 83, configured to: for the feature vector corresponding to each grid, identify the maximum of the classification parameters in the feature vector, and when the maximum is greater than a set threshold, determine, according to the center-point location parameter and the shape-size parameter in the feature vector, the position information of the object of the classification corresponding to the classification parameter.
The device further includes:
a judging and adjusting module 84, configured to judge whether the size of the image is the target size, and if not, adjust the size of the image to the target size.
The device further includes:
a training module 85, configured to: for each sample image in a sample image set, mark target objects with rectangular frames; divide each sample image into multiple grids according to the preset dividing mode and determine the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, set, according to the classification of the target object, the value of the classification parameter corresponding to that classification in the grid's feature vector to a preset maximum, determine the value of the center-point location parameter in the feature vector according to the position of the center point within the grid, and determine the value of the shape-size parameter in the feature vector according to the size of the marked rectangular frame of the target object; when a grid does not contain the center point of any target object, set the value of every parameter in the grid's feature vector to zero; and train the convolutional neural network according to the sample images for which the feature vector of each grid has been determined.
The training module 85 is further configured to: for each sample image, judge whether the size of the sample image is the target size; and if not, adjust the size of the sample image to the target size.
The training module 85 is specifically configured to: select sub-sample images from the sample image set, wherein the quantity of the selected sub-sample images is smaller than the quantity of sample images in the sample image set; and train the convolutional neural network using each of the selected sub-sample images.
The device further includes:
an error calculation module 86, configured to determine the error of the convolutional neural network according to the prediction of the convolutional neural network for the position and classification of the objects in the sub-sample image and the information of the target objects marked in the sub-sample image; and
when the error converges, determine that the training of the convolutional neural network is completed, wherein the error is determined using the following loss function:

$$\begin{aligned} loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & + \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2 \end{aligned}$$

Wherein, S is the number of rows of the divided grids when the numbers of rows and columns are identical; B is the preset quantity of rectangular frames predicted for each grid, typically 1 or 2; $x_i$ is the abscissa of the center point of the marked target object in grid i, and $\hat{x}_i$ is the abscissa of the center point of the predicted object in grid i; $y_i$ is the ordinate of the center point of the marked target object in grid i, and $\hat{y}_i$ is the ordinate of the center point of the predicted object in grid i; $h_i$ is the height of the rectangular frame where the marked target object is located and $w_i$ is its width; $\hat{h}_i$ is the height of the rectangular frame where the predicted object is located and $\hat{w}_i$ is its width; $C_i$ is the marked probability that grid i currently contains a target object, and $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the marked target object in grid i belongs to classification c, and $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to classification c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame is located in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when a predicted object center point exists in grid i, and 0 otherwise; $\mathbb{1}_{ij}^{noobj}$ takes 1 when no predicted object center point exists in grid i, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(Class \mid Object) \cdot P_r(Object)$$

$P_r(Object)$ is the predicted probability that grid i currently contains an object, and $P_r(Class \mid Object)$ is the conditional probability that the predicted object in grid i belongs to classification c.
The determining module 83 is specifically configured to: determine, according to the location parameter of the center point, the position information of the center point in the grid; determine the center point according to the position information; take the center point as the center of a rectangular frame; determine the position information of the rectangular frame according to the shape-size parameter; take the position information of the rectangular frame as the position information of the object; and take the object classification corresponding to the classification parameter as the classification of the object.
The determining module 83 is specifically configured to: take a set point of the grid as a reference point, and determine the position information of the center point in the grid according to the reference point and the location parameter of the center point.
An embodiment of the present invention provides an object detection method and device in an image. In the method, an image to be detected is divided into multiple grids according to a preset dividing mode, wherein the size of the image is a target size; the divided image is input into a convolutional neural network whose training has been completed in advance, and multiple feature vectors of the image output by the convolutional neural network are obtained, wherein each grid corresponds to one feature vector; the maximum of the classification parameters in each feature vector is identified, and when the maximum is greater than a set threshold, the position information of the object of the classification corresponding to the classification parameter is determined according to the center-point location parameter and the shape-size parameter in the feature vector. Since in the embodiment of the present invention the feature vectors corresponding to the image are determined by a convolutional neural network whose training has been completed in advance, and the classification and position of objects in the image are determined according to the classification parameters and location parameters in the feature vectors, detection of object position and classification can be realized simultaneously, which facilitates global optimization. In addition, since the position and classification of an object are determined from the feature vector of each grid, there is no need to select multiple candidate feature regions, which saves detection time and improves the real-time performance and efficiency of detection.
As for the system/device embodiments, since they are substantially similar to the method embodiments, their description is relatively simple; for relevant parts, reference may be made to the corresponding explanation of the method embodiments.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that every flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these changes and modifications.

Claims (17)

1. An object detection method in an image, applied to an electronic device, the method comprising:
dividing an image to be detected into a plurality of grids according to a preset division manner, wherein the size of the image to be detected is a target size;
inputting the divided image into a pre-trained convolutional neural network, and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector;
for the feature vector corresponding to each grid, identifying the maximum value among the class parameters in the feature vector, and when the maximum value is greater than a set threshold, determining, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of an object of the class corresponding to that class parameter.
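Outside the claim language, the per-grid decision of claim 1 — take the largest class parameter, compare it against a set threshold, then turn the center-point position parameters and contour size parameters into a box — can be sketched as follows. This is a minimal illustration, not the claimed implementation: the S×S grid, the feature-vector layout (class scores followed by cx, cy, w, h), and the normalization are all assumptions.

```python
import numpy as np

def decode_detections(features, img_w, img_h, threshold=0.5):
    """features: (S, S, C + 4) array -- per-grid class parameters followed
    by (cx, cy, w, h); cx, cy are assumed offsets within the grid and
    w, h sizes relative to the whole image (hypothetical layout)."""
    S = features.shape[0]
    num_classes = features.shape[2] - 4
    cell_w, cell_h = img_w / S, img_h / S
    detections = []
    for row in range(S):
        for col in range(S):
            vec = features[row, col]
            cls = int(np.argmax(vec[:num_classes]))   # max class parameter
            score = float(vec[cls])
            if score <= threshold:                    # below the set threshold
                continue
            cx_off, cy_off, w, h = vec[num_classes:]
            cx = (col + cx_off) * cell_w              # center point from the
            cy = (row + cy_off) * cell_h              # grid position parameters
            bw, bh = w * img_w, h * img_h             # contour size parameters
            detections.append((cls, score,
                               cx - bw / 2, cy - bh / 2,   # top-left corner
                               cx + bw / 2, cy + bh / 2))  # bottom-right corner
    return detections
```

A grid whose largest class parameter stays at or below the threshold yields no detection, matching the conditional in the claim.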
2. The method according to claim 1, wherein before dividing the image to be detected into a plurality of grids according to the preset division manner, the method further comprises:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
3. The method according to claim 1, wherein the training process of the convolutional neural network comprises:
for each sample image in a sample image set, marking target objects with rectangular frames;
dividing each sample image into a plurality of grids according to the preset division manner, and determining the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, setting, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid to a preset maximum value, determining the values of the center-point position parameters in the feature vector according to the position of the center point within the grid, and determining the values of the contour size parameters in the feature vector according to the size of the rectangular frame marking the target object; when a grid does not contain the center point of any target object, the value of every parameter in the feature vector of the grid is zero;
training the convolutional neural network with the sample images for which the feature vector of each grid has been determined.
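The target-vector construction of the training process above can be sketched as follows, under the same hypothetical feature-vector layout (class scores, then cx, cy, w, h); here 1.0 stands in for the preset maximum value of the class parameter, and all names are illustrative rather than claimed.

```python
import numpy as np

def encode_target(box, cls, num_classes, S, img_w, img_h):
    """box = (x1, y1, x2, y2) of a marked rectangular frame; returns an
    (S, S, num_classes + 4) target in which only the grid containing the
    box center is non-zero, as in the training procedure of claim 3."""
    target = np.zeros((S, S, num_classes + 4))
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col = min(int(cx / img_w * S), S - 1)   # grid containing the center point
    row = min(int(cy / img_h * S), S - 1)
    target[row, col, cls] = 1.0             # class parameter -> preset maximum
    target[row, col, num_classes + 0] = cx / img_w * S - col  # offset in grid
    target[row, col, num_classes + 1] = cy / img_h * S - row
    target[row, col, num_classes + 2] = (x2 - x1) / img_w     # contour size
    target[row, col, num_classes + 3] = (y2 - y1) / img_h
    return target
```

Every other grid keeps an all-zero feature vector, matching the "no center point" branch of the claim.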
4. The method according to claim 3, wherein before dividing each sample image into a plurality of grids according to the preset division manner, the method further comprises:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
5. The method according to claim 3, wherein training the convolutional neural network with the sample images for which the feature vector of each grid has been determined comprises:
selecting sub-sample images from the sample image set, wherein the number of selected sub-sample images is smaller than the number of sample images in the sample image set;
training the convolutional neural network with each selected sub-sample image.
6. The method according to claim 1 or 3, wherein the preset division manner comprises:
dividing the image and the sample images into a plurality of grids whose number of rows equals the number of columns; or
dividing the image and the sample images into a plurality of grids whose number of rows differs from the number of columns.
7. The method according to claim 6, wherein the method further comprises:
determining the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of the objects in the sub-sample images, and the information of the target objects marked in the sub-sample images;
when the error converges, determining that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]+\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}(C_i-\hat{C}_i)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{noobj}(C_i-\hat{C}_i)^2+\sum_{i=0}^{S^2}I_{i}^{obj}\sum_{c\in classes}(P_i(c)-\hat{P}_i(c))^2$$
wherein $S$ is the number of rows or columns when the divided grids have equal numbers of rows and columns; $B$ is a preset number of rectangular frames predicted for each grid, typically taken as 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate, within grid $i$, of the center point of a marked target object; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate, within grid $i$, of the center point of a predicted object; $w_i$ and $h_i$ are the width and height of the rectangular frame in which the marked target object is located; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the rectangular frame in which the predicted object is located; $C_i$ is the marked probability that grid $i$ currently contains a target object; $\hat{C}_i$ is the predicted probability that grid $i$ currently contains an object; $P_i(c)$ is the probability that the marked target object in grid $i$ belongs to class $c$; $\hat{P}_i(c)$ is the probability that the predicted object in grid $i$ belongs to class $c$; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $I_{ij}^{obj}$ takes the value 1 when the center point of the object in the $j$-th predicted rectangular frame lies in grid $i$, and 0 otherwise; $I_{i}^{obj}$ takes the value 1 when predicted grid $i$ contains the center point of an object, and 0 otherwise; $I_{i}^{noobj}$ takes the value 1 when predicted grid $i$ does not contain the center point of an object, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:
$$\Pr(\mathrm{Class}_i\mid\mathrm{Object})\cdot\Pr(\mathrm{Object})=\hat{P}_i(c)$$
wherein $\Pr(\mathrm{Object})$ is the probability that predicted grid $i$ currently contains an object, and $\Pr(\mathrm{Class}_i\mid\mathrm{Object})$ is the conditional probability that the object in predicted grid $i$ belongs to class $c$.
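For illustration only, the loss function of claim 7 can be written out with NumPy for the simplified case $B = 1$ (one predicted rectangular frame per grid), so that $I_{ij}^{obj}$ collapses to $I_{i}^{obj}$; the dict layout and array names are assumptions, not part of the claims.

```python
import numpy as np

def yolo_style_loss(truth, pred, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """truth/pred hold (S*S,)-shaped arrays for x, y, w, h, C and an
    (S*S, n_cls) array for P; obj_mask[i] is 1 where grid i contains an
    object center (I_i^obj), 0 otherwise. Default weights are assumed."""
    noobj_mask = 1.0 - obj_mask
    coord = lambda_coord * np.sum(                 # center-point term
        obj_mask * ((truth["x"] - pred["x"]) ** 2 +
                    (truth["y"] - pred["y"]) ** 2))
    size = lambda_coord * np.sum(                  # contour-size term
        obj_mask * ((truth["w"] - pred["w"]) ** 2 +
                    (truth["h"] - pred["h"]) ** 2))
    conf_obj = np.sum(obj_mask * (truth["C"] - pred["C"]) ** 2)
    conf_noobj = lambda_noobj * np.sum(            # empty-grid confidence term
        noobj_mask * (truth["C"] - pred["C"]) ** 2)
    cls = np.sum(obj_mask[:, None] * (truth["P"] - pred["P"]) ** 2)
    return coord + size + conf_obj + conf_noobj + cls
```

A perfect prediction drives every term to zero, which is the convergence condition the claim tests for.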
8. The method according to claim 1, wherein determining, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of the object of the class corresponding to the class parameter comprises:
determining, according to the position parameters of the center point, the position information of the center point within the grid;
determining the center point according to the position information, taking the center point as the center of a rectangular frame, determining the position information of the rectangular frame according to the contour size parameters, taking the position information of the rectangular frame as the position information of the object, and taking the object class corresponding to the class parameter as the class of the object.
9. The method according to claim 8, wherein determining the position information of the center point within the grid according to the position parameters of the center point comprises:
taking a set point of the grid as a reference point, and determining the position information of the center point within the grid according to the reference point and the position parameters of the center point.
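A minimal sketch of claim 9, assuming (the claim does not fix this) that the set reference point is the grid's top-left corner and that the position parameters are offsets expressed as fractions of the grid size:

```python
def center_from_reference(row, col, cell_w, cell_h, dx, dy):
    """Return the absolute center point obtained by offsetting the grid's
    reference point by the center-point position parameters (dx, dy)."""
    ref_x, ref_y = col * cell_w, row * cell_h   # set point of the grid
    return ref_x + dx * cell_w, ref_y + dy * cell_h
```

Any other fixed point of the grid (e.g. its center) would serve equally as the reference point, with the offsets interpreted accordingly.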
10. An object detection apparatus in an image, the apparatus comprising:
a division module configured to divide an image to be detected into a plurality of grids according to a preset division manner, wherein the size of the image to be detected is a target size;
a detection module configured to input the divided image into a pre-trained convolutional neural network and obtain a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector;
a determining module configured to, for the feature vector corresponding to each grid, identify the maximum value among the class parameters in the feature vector and, when the maximum value is greater than a set threshold, determine, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of an object of the class corresponding to that class parameter.
11. The apparatus according to claim 10, wherein the apparatus further comprises:
a judging and adjusting module configured to judge whether the size of the image is the target size and, if not, adjust the size of the image to the target size.
12. The apparatus according to claim 10, wherein the apparatus further comprises:
a training module configured to: for each sample image in a sample image set, mark target objects with rectangular frames; divide each sample image into a plurality of grids according to the preset division manner and determine the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, set, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid to a preset maximum value, determine the values of the center-point position parameters in the feature vector according to the position of the center point within the grid, and determine the values of the contour size parameters in the feature vector according to the size of the rectangular frame marking the target object; when a grid does not contain the center point of any target object, set the value of every parameter in the feature vector of the grid to zero; and train the convolutional neural network with the sample images for which the feature vector of each grid has been determined.
13. The apparatus according to claim 12, wherein the training module is further configured to judge, for each sample image, whether the size of the sample image is the target size and, if not, adjust the size of the sample image to the target size.
14. The apparatus according to claim 13, wherein the training module is specifically configured to select sub-sample images from the sample image set, wherein the number of selected sub-sample images is smaller than the number of sample images in the sample image set, and to train the convolutional neural network with each selected sub-sample image.
15. The apparatus according to claim 12, wherein the apparatus further comprises:
an error calculation module configured to determine the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of the objects in the sub-sample images and the target objects marked in the sub-sample images;
and to determine, when the error converges, that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:
$$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\left[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]+\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}(C_i-\hat{C}_i)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{noobj}(C_i-\hat{C}_i)^2+\sum_{i=0}^{S^2}I_{i}^{obj}\sum_{c\in classes}(P_i(c)-\hat{P}_i(c))^2$$
wherein $S$ is the number of rows or columns when the divided grids have equal numbers of rows and columns; $B$ is a preset number of rectangular frames predicted for each grid, typically taken as 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate, within grid $i$, of the center point of a marked target object; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate, within grid $i$, of the center point of a predicted object; $w_i$ and $h_i$ are the width and height of the rectangular frame in which the marked target object is located; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the rectangular frame in which the predicted object is located; $C_i$ is the marked probability that grid $i$ currently contains a target object; $\hat{C}_i$ is the predicted probability that grid $i$ currently contains an object; $P_i(c)$ is the probability that the marked target object in grid $i$ belongs to class $c$; $\hat{P}_i(c)$ is the probability that the predicted object in grid $i$ belongs to class $c$; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $I_{ij}^{obj}$ takes the value 1 when the center point of the object in the $j$-th predicted rectangular frame lies in grid $i$, and 0 otherwise; $I_{i}^{obj}$ takes the value 1 when predicted grid $i$ contains the center point of an object, and 0 otherwise; $I_{i}^{noobj}$ takes the value 1 when predicted grid $i$ does not contain the center point of an object, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following formula:
$$\Pr(\mathrm{Class}_i\mid\mathrm{Object})\cdot\Pr(\mathrm{Object})=\hat{P}_i(c)$$
wherein $\Pr(\mathrm{Object})$ is the probability that predicted grid $i$ currently contains an object, and $\Pr(\mathrm{Class}_i\mid\mathrm{Object})$ is the conditional probability that the object in predicted grid $i$ belongs to class $c$.
16. The apparatus according to claim 10, wherein the determining module is specifically configured to determine the position information of the center point within the grid according to the position parameters of the center point;
and to determine the center point according to the position information, take the center point as the center of a rectangular frame, determine the position information of the rectangular frame according to the contour size parameters, take the position information of the rectangular frame as the position information of the object, and take the object class corresponding to the class parameter as the class of the object.
17. The apparatus according to claim 16, wherein the determining module is specifically configured to take a set point of the grid as a reference point and determine the position information of the center point within the grid according to the reference point and the position parameters of the center point.
CN201611249792.1A 2016-12-29 2016-12-29 Method and device for detecting object in image Active CN106803071B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201611249792.1A CN106803071B (en) 2016-12-29 2016-12-29 Method and device for detecting object in image
EP17886017.7A EP3545466A4 (en) 2016-12-29 2017-10-20 Systems and methods for detecting objects in images
PCT/CN2017/107043 WO2018121013A1 (en) 2016-12-29 2017-10-20 Systems and methods for detecting objects in images
US16/457,861 US11113840B2 (en) 2016-12-29 2019-06-28 Systems and methods for detecting objects in images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611249792.1A CN106803071B (en) 2016-12-29 2016-12-29 Method and device for detecting object in image

Publications (2)

Publication Number Publication Date
CN106803071A true CN106803071A (en) 2017-06-06
CN106803071B CN106803071B (en) 2020-02-14

Family

ID=58985345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611249792.1A Active CN106803071B (en) 2016-12-29 2016-12-29 Method and device for detecting object in image

Country Status (1)

Country Link
CN (1) CN106803071B (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392158A (en) * 2017-07-27 2017-11-24 济南浪潮高新科技投资发展有限公司 A kind of method and device of image recognition
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN108229307A (en) * 2017-11-22 2018-06-29 北京市商汤科技开发有限公司 For the method, apparatus and equipment of object detection
WO2018121013A1 (en) * 2016-12-29 2018-07-05 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
CN108460761A (en) * 2018-03-12 2018-08-28 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108921840A (en) * 2018-07-02 2018-11-30 北京百度网讯科技有限公司 Display screen peripheral circuit detection method, device, electronic equipment and storage medium
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN108968811A (en) * 2018-06-20 2018-12-11 四川斐讯信息技术有限公司 A kind of object identification method and system of sweeping robot
CN109272050A (en) * 2018-09-30 2019-01-25 北京字节跳动网络技术有限公司 Image processing method and device
CN109558791A (en) * 2018-10-11 2019-04-02 浙江大学宁波理工学院 It is a kind of that bamboo shoot device and method is sought based on image recognition
CN109685069A (en) * 2018-12-27 2019-04-26 乐山师范学院 Image detecting method, device and computer readable storage medium
CN109726741A (en) * 2018-12-06 2019-05-07 江苏科技大学 A kind of detection method and device of multiple target object
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method, device, electronic equipment and the storage medium of target detection model
CN110110189A (en) * 2018-02-01 2019-08-09 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN110338835A (en) * 2019-07-02 2019-10-18 深圳安科高技术股份有限公司 A kind of intelligent scanning stereoscopic monitoring method and system
CN110610184A (en) * 2018-06-15 2019-12-24 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient object of image
CN110930386A (en) * 2019-11-20 2020-03-27 重庆金山医疗技术研究院有限公司 Image processing method, device, equipment and storage medium
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN111461318A (en) * 2019-01-22 2020-07-28 斯特拉德视觉公司 Neural network operation method using grid generator and device using the same
CN111597845A (en) * 2019-02-20 2020-08-28 中科院微电子研究所昆山分所 Two-dimensional code detection method, device and equipment and readable storage medium
CN111639660A (en) * 2019-03-01 2020-09-08 中科院微电子研究所昆山分所 Image training method, device, equipment and medium based on convolutional network
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN112084874A (en) * 2020-08-11 2020-12-15 深圳市优必选科技股份有限公司 Object detection method and device and terminal equipment
CN112446867A (en) * 2020-11-25 2021-03-05 上海联影医疗科技股份有限公司 Method, device and equipment for determining blood flow parameters and storage medium
CN112785564A (en) * 2021-01-15 2021-05-11 武汉纺织大学 Pedestrian detection tracking system and method based on mechanical arm
CN113935425A (en) * 2021-10-21 2022-01-14 中国船舶重工集团公司第七一一研究所 Object identification method, device, terminal and storage medium
US11373411B1 (en) 2018-06-13 2022-06-28 Apple Inc. Three-dimensional object estimation using two-dimensional annotations
CN114739388A (en) * 2022-04-20 2022-07-12 中国移动通信集团广东有限公司 Indoor positioning navigation method and system based on UWB and laser radar

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104517113A (en) * 2013-09-29 2015-04-15 浙江大华技术股份有限公司 Image feature extraction method and device and image sorting method and device
CN105975931A (en) * 2016-05-04 2016-09-28 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUSSELL STEWART et al.: "End-to-end people detection in crowded scenes", published online at https://arxiv.org/abs/1506.04878 *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121013A1 (en) * 2016-12-29 2018-07-05 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
US11113840B2 (en) 2016-12-29 2021-09-07 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
CN107392158A (en) * 2017-07-27 2017-11-24 济南浪潮高新科技投资发展有限公司 A kind of method and device of image recognition
CN108229307A (en) * 2017-11-22 2018-06-29 北京市商汤科技开发有限公司 For the method, apparatus and equipment of object detection
CN108229307B (en) * 2017-11-22 2022-01-04 北京市商汤科技开发有限公司 Method, device and equipment for object detection
US11222441B2 (en) 2017-11-22 2022-01-11 Beijing Sensetime Technology Development Co., Ltd. Methods and apparatuses for object detection, and devices
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN110110189A (en) * 2018-02-01 2019-08-09 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN108460761A (en) * 2018-03-12 2018-08-28 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
US11748998B1 (en) 2018-06-13 2023-09-05 Apple Inc. Three-dimensional object estimation using two-dimensional annotations
US11373411B1 (en) 2018-06-13 2022-06-28 Apple Inc. Three-dimensional object estimation using two-dimensional annotations
CN110610184A (en) * 2018-06-15 2019-12-24 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient object of image
CN110610184B (en) * 2018-06-15 2023-05-12 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient targets of images
CN108968811A (en) * 2018-06-20 2018-12-11 四川斐讯信息技术有限公司 A kind of object identification method and system of sweeping robot
KR20200004823A (en) * 2018-07-02 2020-01-14 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Display screen peripheral circuit detection method, device, electronic device and storage medium
CN108921840A (en) * 2018-07-02 2018-11-30 北京百度网讯科技有限公司 Display screen peripheral circuit detection method, device, electronic equipment and storage medium
KR102321768B1 (en) 2018-07-02 2021-11-03 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Display screen peripheral circuit detection method, apparatus, electronic device and storage medium
JP2020530125A (en) * 2018-07-02 2020-10-15 北京百度網訊科技有限公司 Display screen peripheral circuit detection method, display screen peripheral circuit detection device, electronic devices and storage media
CN109272050B (en) * 2018-09-30 2019-11-22 北京字节跳动网络技术有限公司 Image processing method and device
CN109272050A (en) * 2018-09-30 2019-01-25 北京字节跳动网络技术有限公司 Image processing method and device
CN109558791A (en) * 2018-10-11 2019-04-02 浙江大学宁波理工学院 It is a kind of that bamboo shoot device and method is sought based on image recognition
CN109726741A (en) * 2018-12-06 2019-05-07 江苏科技大学 A kind of detection method and device of multiple target object
CN109726741B (en) * 2018-12-06 2023-05-30 江苏科技大学 Method and device for detecting multiple target objects
CN109685069A (en) * 2018-12-27 2019-04-26 乐山师范学院 Image detecting method, device and computer readable storage medium
CN111461318B (en) * 2019-01-22 2023-10-17 斯特拉德视觉公司 Neural network operation method using grid generator and device using the same
CN111461318A (en) * 2019-01-22 2020-07-28 斯特拉德视觉公司 Neural network operation method using grid generator and device using the same
CN111597845A (en) * 2019-02-20 2020-08-28 中科院微电子研究所昆山分所 Two-dimensional code detection method, device and equipment and readable storage medium
CN111639660B (en) * 2019-03-01 2024-01-12 中科微至科技股份有限公司 Image training method, device, equipment and medium based on convolution network
CN111639660A (en) * 2019-03-01 2020-09-08 中科院微电子研究所昆山分所 Image training method, device, equipment and medium based on convolutional network
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method, device, electronic equipment and the storage medium of target detection model
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN111914850B (en) * 2019-05-07 2023-09-19 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN110338835A (en) * 2019-07-02 2019-10-18 深圳安科高技术股份有限公司 A kind of intelligent scanning stereoscopic monitoring method and system
CN110930386A (en) * 2019-11-20 2020-03-27 重庆金山医疗技术研究院有限公司 Image processing method, device, equipment and storage medium
CN110930386B (en) * 2019-11-20 2024-02-20 重庆金山医疗技术研究院有限公司 Image processing method, device, equipment and storage medium
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN112084874A (en) * 2020-08-11 2020-12-15 深圳市优必选科技股份有限公司 Object detection method and device and terminal equipment
CN112084874B (en) * 2020-08-11 2023-12-29 深圳市优必选科技股份有限公司 Object detection method and device and terminal equipment
CN112446867A (en) * 2020-11-25 2021-03-05 上海联影医疗科技股份有限公司 Method, device and equipment for determining blood flow parameters and storage medium
CN112785564A (en) * 2021-01-15 2021-05-11 武汉纺织大学 Pedestrian detection tracking system and method based on mechanical arm
CN113935425A (en) * 2021-10-21 2022-01-14 中国船舶重工集团公司第七一一研究所 Object identification method, device, terminal and storage medium
CN114739388A (en) * 2022-04-20 2022-07-12 中国移动通信集团广东有限公司 Indoor positioning navigation method and system based on UWB and laser radar

Also Published As

Publication number Publication date
CN106803071B (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CN106803071A (en) Object detecting method and device in a kind of image
CN106780612B (en) Object detecting method and device in a kind of image
Sheng et al. Graph-based spatial-temporal convolutional network for vehicle trajectory prediction in autonomous driving
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
US11581130B2 (en) Internal thermal fault diagnosis method of oil-immersed transformer based on deep convolutional neural network and image segmentation
CN110264468B (en) Point cloud data mark, parted pattern determination, object detection method and relevant device
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN107229904A (en) A kind of object detection and recognition method based on deep learning
CN110991444B (en) License plate recognition method and device for complex scene
CN106874914A (en) A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks
CN110400332A (en) A kind of target detection tracking method, device and computer equipment
CN107330410A (en) Method for detecting abnormality based on deep learning under complex environment
CN106570453A (en) Pedestrian detection method, device and system
CN113362491B (en) Vehicle track prediction and driving behavior analysis method
CN104361351A (en) Synthetic aperture radar (SAR) image classification method on basis of range statistics similarity
CN111062383A (en) Image-based ship detection depth neural network algorithm
CN102750522B (en) A kind of method of target following
Joseph et al. Systematic advancement of YOLO object detector for real-time detection of objects
CN113487600A (en) Characteristic enhancement scale self-adaptive sensing ship detection method
Niu et al. A PCB Defect Detection Algorithm with Improved Faster R-CNN.
CN114882423A (en) Truck warehousing goods identification method based on improved Yolov5m model and Deepsort
CN106960434A (en) A kind of image significance detection method based on surroundedness and Bayesian model
Bai et al. Depth feature fusion based surface defect region identification method for steel plate manufacturing
CN109492697A (en) Picture detects network training method and picture detects network training device
CN104050665B (en) The method of estimation and device of prospect residence time in a kind of video image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant