CN106803071A - Method and device for object detection in an image - Google Patents
Method and device for object detection in an image
- Publication number
- CN106803071A CN106803071A CN201611249792.1A CN201611249792A CN106803071A CN 106803071 A CN106803071 A CN 106803071A CN 201611249792 A CN201611249792 A CN 201611249792A CN 106803071 A CN106803071 A CN 106803071A
- Authority
- CN
- China
- Prior art keywords
- grid
- image
- central point
- classification
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The embodiments of the invention disclose a method and device for object detection in an image, used to improve the real-time performance of object detection. In the method, an image to be detected is divided into multiple grid cells according to a preset dividing mode, and the divided image is input into a pre-trained convolutional neural network. The network outputs one feature vector per grid cell of the image. For each feature vector, the maximum among the class parameters is identified; when this maximum is greater than a set threshold, the position information of the object of the class corresponding to that class parameter is determined from the center-point position parameters and the size parameters in the feature vector. Because the pre-trained convolutional neural network determines the class and the position of an object in the image in a single pass, object localization and classification are realized simultaneously, without selecting multiple candidate regions. This saves detection time, improves the real-time performance and efficiency of detection, and facilitates global optimization.
Description
Technical field
The present invention relates to the field of machine learning, and in particular to a method and device for object detection in an image.
Background technology
With the development of Video Supervision Technique, intelligent video monitoring is applied in increasing scene, such as traffic, business
Field, hospital, cell, park etc., the application of intelligent video monitoring by image in various scenes, to carry out target detection and establish
Basis.
When the prior art performs object detection in an image, it generally uses the region-based convolutional neural network (Region Convolutional Neural Network, R-CNN) and its extensions Fast R-CNN and Faster R-CNN. Fig. 1 is a schematic flowchart of object detection using R-CNN. The detection process includes: receiving an input image, extracting candidate regions (region proposals) from the image, computing the CNN features of each candidate region, and determining the type and position of the object using classification and regression. In this process, about 2000 candidate regions need to be extracted from the image, which takes 1-2 s; then the CNN features of every candidate region must be computed, and because many candidate regions overlap, much of this feature computation is repeated work. The detection process also includes subsequent steps: feature learning on the proposals, correcting the determined object positions and eliminating false alarms. The whole detection process may therefore take 2-40 s, which severely affects the real-time performance of object detection.

In addition, when R-CNN is used for object detection, the candidate regions are extracted with selective search, the CNN features are then computed with a convolutional neural network, and finally a support vector machine (SVM) model performs classification to determine the position of the target. These three steps are mutually independent methods, so the whole detection process cannot be globally optimized.
Fig. 2 is a schematic diagram of the process of object detection using Faster R-CNN. The process uses a convolutional neural network: each sliding window generates a 256-dimensional vector at the intermediate layer; the class of the object is detected at the classification layer (cls layer), and the position of the object at the regression layer (reg layer). The classification and localization of an object are thus two independent steps, each of which must process the 256-dimensional vector separately, which also increases the detection time and affects the real-time performance of object detection.
Summary of the invention

The embodiments of the invention disclose a method and device for object detection in an image, used to improve the real-time performance of object detection and to facilitate global optimization of object detection.
To achieve the above purpose, an embodiment of the invention discloses an object detection method in an image, applied to an electronic device. The method includes:

dividing an image to be detected into multiple grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size;

inputting the divided image into a pre-trained convolutional neural network, and obtaining multiple feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector;

for the feature vector corresponding to each grid cell, identifying the maximum among the class parameters in the feature vector, and when the maximum is greater than a set threshold, determining the position information of the object of the class corresponding to that class parameter according to the center-point position parameters and the size parameters in the feature vector.
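The three method steps above can be sketched as a single detection pass. The network interface, the per-cell vector layout (class parameters first, then center offsets and box sizes) and the threshold value are illustrative assumptions for the sketch, not fixed by the claim.

```python
import numpy as np

def detect(image, network, s=7, num_classes=20, threshold=0.5):
    """Single-pass detection: one feature vector per grid cell.

    `network` is assumed to map an image to an (s, s, num_classes + 4)
    tensor laid out as [class params..., cx, cy, w, h]; this layout is
    an illustrative assumption.
    """
    features = network(image)                 # (s, s, num_classes + 4)
    detections = []
    for row in range(s):
        for col in range(s):
            vec = features[row, col]
            class_params = vec[:num_classes]
            cls = int(np.argmax(class_params))
            if class_params[cls] > threshold:  # maximum class parameter vs. set threshold
                cx_off, cy_off, w, h = vec[num_classes:]
                detections.append((cls, row, col, cx_off, cy_off, w, h))
    return detections
```

Because every grid cell is decoded from the same network output, localization and classification happen in one forward pass, with no candidate-region stage.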
Further, before dividing the image to be detected into multiple grid cells according to the preset dividing mode, the method also includes:

judging whether the size of the image is the target size;

if not, adjusting the size of the image to the target size.
Further, the training process of the convolutional neural network includes:

for each sample image in a sample image set, labeling the target objects with rectangular boxes;

dividing each sample image into multiple grid cells according to the preset dividing mode and determining the feature vector corresponding to each grid cell, wherein the size of each sample image is the target size. When the central point of a target object falls in a grid cell, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid cell is set to a preset maximum value; the values of the center-point position parameters in the feature vector are determined according to the position of the central point within the grid cell; and the values of the size parameters in the feature vector are determined according to the size of the rectangular box labeling the target object. When a grid cell does not contain the central point of any target object, the value of every parameter in the feature vector of that grid cell is zero;

training the convolutional neural network with the sample images for which the feature vectors of the grid cells have been determined.
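The label-construction step above can be sketched as follows. The vector layout, the pixel normalizations and the image/grid sizes are illustrative assumptions; the patent only requires that a cell containing a labeled center point carries the class maximum, center-point position parameters and size parameters, and that all other cells are zero.

```python
import numpy as np

def build_target(boxes, s=7, num_classes=20, img_size=448, max_value=1.0):
    """Build the per-grid-cell training target for one sample image.

    `boxes` is a list of (class_id, cx, cy, w, h) with the labeled
    rectangular box's center point and size in pixels.
    """
    target = np.zeros((s, s, num_classes + 4), dtype=np.float32)
    cell = img_size / s
    for class_id, cx, cy, w, h in boxes:
        col, row = int(cx // cell), int(cy // cell)  # cell containing the center point
        vec = target[row, col]
        vec[class_id] = max_value                    # preset maximum for the class parameter
        vec[num_classes + 0] = (cx % cell) / cell    # center-point offset within the cell
        vec[num_classes + 1] = (cy % cell) / cell
        vec[num_classes + 2] = w / img_size          # box size relative to the image
        vec[num_classes + 3] = h / img_size
    return target
```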
Further, before dividing each sample image into multiple grid cells according to the preset dividing mode, the method also includes:

for each sample image, judging whether the size of the sample image is the target size;

if not, adjusting the size of the sample image to the target size.
Further, training the convolutional neural network with the sample images for which the feature vectors of the grid cells have been determined includes:

selecting sub-sample images from the sample image set, wherein the number of sub-sample images selected is smaller than the number of sample images in the sample image set;

training the convolutional neural network with each of the selected sub-sample images.
Further, the preset dividing mode includes:

dividing the image and the sample images into multiple grid cells with equal numbers of rows and columns; or

dividing the image and the sample images into multiple grid cells with different numbers of rows and columns.
Further, the method also includes:

determining the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of the objects in the sub-sample images and the information of the target objects labeled in the sub-sample images;

when the error converges, determining that the training of the convolutional neural network is complete, wherein the error is determined using the following loss function:

$$\begin{aligned}
loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}^{obj}_{ij}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\
&+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}^{obj}_{ij}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]\\
&+\sum_{i=0}^{S^2}\mathbf{1}^{obj}_{i}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\mathbf{1}^{noobj}_{i}\left(C_i-\hat{C}_i\right)^2\\
&+\sum_{i=0}^{S^2}\mathbf{1}^{obj}_{i}\sum_{c\in classes}\left(P_i(c)-\hat{P}_i(c)\right)^2
\end{aligned}$$

wherein $S$ is the number of rows (equivalently, columns) of the divided grid when the numbers of rows and columns are equal; $B$ is the preset number of rectangular boxes predicted per grid cell, typically 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the central point of the labeled target object within grid cell $i$, and $\hat{x}_i$ and $\hat{y}_i$ are those of the predicted object; $h_i$ and $w_i$ are the height and width of the rectangular box labeling the target object, and $\hat{h}_i$ and $\hat{w}_i$ are those of the predicted box; $C_i$ is the labeled probability that grid cell $i$ currently contains a target object, and $\hat{C}_i$ is the predicted probability; $P_i(c)$ is the probability that the labeled target object in grid cell $i$ belongs to class $c$, and $\hat{P}_i(c)$ is the predicted probability; $\lambda_{coord}$ and $\lambda_{noobj}$ are preset weights; $\mathbf{1}^{obj}_{ij}$ is 1 when the central point of the object in the $j$-th predicted rectangular box lies in grid cell $i$ and 0 otherwise; $\mathbf{1}^{obj}_{i}$ is 1 when the predicted grid cell $i$ contains the central point of an object and 0 otherwise; and $\mathbf{1}^{noobj}_{i}$ is 1 when the predicted grid cell $i$ does not contain the central point of an object and 0 otherwise; wherein the predicted probability that grid cell $i$ contains an object of class $c$ is determined according to the following equation:

$$\hat{C}_i\,\hat{P}_i(c) = Pr(Object)\cdot Pr(Class = c \mid Object)$$

where $Pr(Object)$ is the predicted probability that grid cell $i$ currently contains an object, and $Pr(Class = c \mid Object)$ is the predicted conditional probability that the object in grid cell $i$ belongs to class $c$.
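As a rough illustration of how this error could be evaluated, the following NumPy sketch computes a loss of the form above for the simple case $B = 1$ (so $\mathbf{1}^{obj}_{ij}$ coincides with $\mathbf{1}^{obj}_{i}$). The tensor layout, the $B = 1$ simplification and the clamping of negative predicted sizes are assumptions for the sketch, not part of the patent.

```python
import numpy as np

def yolo_style_loss(target, pred, num_classes=20,
                    lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-of-squares detection loss for B = 1 predicted box per cell.

    `target` and `pred` are (S, S, num_classes + 5) arrays laid out as
    [class probs..., x, y, w, h, C]; this layout is an assumption.
    """
    obj = target[..., -1] > 0            # cells containing a labeled center point
    noobj = ~obj
    t_box = target[..., num_classes:num_classes + 4]
    p_box = pred[..., num_classes:num_classes + 4]
    coord = np.sum((t_box[..., :2] - p_box[..., :2]) ** 2, axis=-1)
    # sqrt of sizes, with abs() guarding against negative raw predictions
    size = np.sum((np.sqrt(t_box[..., 2:]) - np.sqrt(np.abs(p_box[..., 2:]))) ** 2,
                  axis=-1)
    conf = (target[..., -1] - pred[..., -1]) ** 2
    cls = np.sum((target[..., :num_classes] - pred[..., :num_classes]) ** 2, axis=-1)
    return (lambda_coord * np.sum((coord + size)[obj])
            + np.sum(conf[obj]) + lambda_noobj * np.sum(conf[noobj])
            + np.sum(cls[obj]))
```

Training stops when this error converges, i.e. when further iterations no longer reduce it appreciably.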
Further, determining the position information of the object of the class corresponding to the class parameter according to the center-point position parameters and the size parameters in the feature vector includes:

determining the position information of the central point within the grid cell according to the position parameters of the central point;

determining the central point according to the position information, taking the central point as the center of a rectangular box, determining the position information of the rectangular box according to the size parameters, taking the position information of the rectangular box as the position information of the object, and taking the object class corresponding to the class parameter as the class of the object.

Further, determining the position information of the central point within the grid cell according to the position parameters of the central point includes:

taking a set point of the grid cell as a reference point, and determining the position information of the central point within the grid cell according to the reference point and the position parameters of the central point.
An embodiment of the invention discloses an object detection device in an image. The device includes:

a dividing module, configured to divide an image to be detected into multiple grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size;

a detection module, configured to input the divided image into a pre-trained convolutional neural network and obtain multiple feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector;

a determining module, configured to, for the feature vector corresponding to each grid cell, identify the maximum among the class parameters in the feature vector, and when the maximum is greater than a set threshold, determine the position information of the object of the class corresponding to that class parameter according to the center-point position parameters and the size parameters in the feature vector.
Further, the device also includes:

a judging and adjusting module, configured to judge whether the size of the image is the target size, and if not, adjust the size of the image to the target size.
Further, the device also includes:

a training module, configured to: for each sample image in a sample image set, label the target objects with rectangular boxes; divide each sample image into multiple grid cells according to the preset dividing mode and determine the feature vector corresponding to each grid cell, wherein the size of each sample image is the target size; when the central point of a target object falls in a grid cell, set, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid cell to a preset maximum value, determine the values of the center-point position parameters in the feature vector according to the position of the central point within the grid cell, and determine the values of the size parameters in the feature vector according to the size of the rectangular box labeling the target object; when a grid cell does not contain the central point of any target object, set the value of every parameter in the feature vector of that grid cell to zero; and train the convolutional neural network with the sample images for which the feature vectors of the grid cells have been determined.
Further, the training module is also configured to: for each sample image, judge whether the size of the sample image is the target size; if not, adjust the size of the sample image to the target size.
Further, the training module is specifically configured to: select sub-sample images from the sample image set, wherein the number of sub-sample images selected is smaller than the number of sample images in the sample image set; and train the convolutional neural network with each of the selected sub-sample images.
Further, the device also includes:

an error calculating module, configured to determine the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of the objects in the sub-sample images and the information of the target objects labeled in the sub-sample images, and, when the error converges, determine that the training of the convolutional neural network is complete, wherein the error is determined using the following loss function:

$$\begin{aligned}
loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}^{obj}_{ij}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\
&+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}^{obj}_{ij}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]\\
&+\sum_{i=0}^{S^2}\mathbf{1}^{obj}_{i}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\mathbf{1}^{noobj}_{i}\left(C_i-\hat{C}_i\right)^2\\
&+\sum_{i=0}^{S^2}\mathbf{1}^{obj}_{i}\sum_{c\in classes}\left(P_i(c)-\hat{P}_i(c)\right)^2
\end{aligned}$$

wherein $S$ is the number of rows (equivalently, columns) of the divided grid when the numbers of rows and columns are equal; $B$ is the preset number of rectangular boxes predicted per grid cell, typically 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the central point of the labeled target object within grid cell $i$, and $\hat{x}_i$ and $\hat{y}_i$ are those of the predicted object; $h_i$ and $w_i$ are the height and width of the rectangular box labeling the target object, and $\hat{h}_i$ and $\hat{w}_i$ are those of the predicted box; $C_i$ is the labeled probability that grid cell $i$ currently contains a target object, and $\hat{C}_i$ is the predicted probability; $P_i(c)$ is the probability that the labeled target object in grid cell $i$ belongs to class $c$, and $\hat{P}_i(c)$ is the predicted probability; $\lambda_{coord}$ and $\lambda_{noobj}$ are preset weights; $\mathbf{1}^{obj}_{ij}$ is 1 when the central point of the object in the $j$-th predicted rectangular box lies in grid cell $i$ and 0 otherwise; $\mathbf{1}^{obj}_{i}$ is 1 when the predicted grid cell $i$ contains the central point of an object and 0 otherwise; and $\mathbf{1}^{noobj}_{i}$ is 1 when the predicted grid cell $i$ does not contain the central point of an object and 0 otherwise; wherein the predicted probability that grid cell $i$ contains an object of class $c$ is determined according to the following equation:

$$\hat{C}_i\,\hat{P}_i(c) = Pr(Object)\cdot Pr(Class = c \mid Object)$$

where $Pr(Object)$ is the predicted probability that grid cell $i$ currently contains an object, and $Pr(Class = c \mid Object)$ is the predicted conditional probability that the object in grid cell $i$ belongs to class $c$.
Further, the determining module is specifically configured to: determine the position information of the central point within the grid cell according to the position parameters of the central point; determine the central point according to the position information, take the central point as the center of a rectangular box, determine the position information of the rectangular box according to the size parameters, take the position information of the rectangular box as the position information of the object, and take the object class corresponding to the class parameter as the class of the object.

Further, the determining module is specifically configured to take a set point of the grid cell as a reference point, and determine the position information of the central point within the grid cell according to the reference point and the position parameters of the central point.
The embodiments of the invention provide a method and device for object detection in an image. In the method, an image to be detected, whose size is a target size, is divided into multiple grid cells according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network to obtain multiple feature vectors of the image output by the network, wherein each grid cell corresponds to one feature vector; the maximum among the class parameters in each feature vector is identified, and when the maximum is greater than a set threshold, the position information of the object of the class corresponding to that class parameter is determined according to the center-point position parameters and the size parameters in the feature vector. Because the pre-trained convolutional neural network determines the feature vectors corresponding to the image, and the class parameters and position-related parameters in each feature vector determine the class and position of the object in the image, localization and classification are realized simultaneously, which facilitates global optimization. In addition, because the position and class of an object are determined from the feature vector of each grid cell, there is no need to select multiple candidate regions, which saves detection time and improves the real-time performance and efficiency of detection.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of object detection using R-CNN;

Fig. 2 is a schematic diagram of the process of object detection using Faster R-CNN;

Fig. 3 is a schematic diagram of the object detection process in an image provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of the detailed implementation process of object detection in an image provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of the training process of the convolutional neural network provided by an embodiment of the present invention;

Fig. 6A-6D are schematic diagrams of the labeling results of target objects provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of the construction process of the cube structure in Fig. 6D;

Fig. 8 is a schematic structural diagram of the object detection device in an image provided by an embodiment of the present invention.
Detailed description of the embodiments

In order to effectively improve the efficiency and the real-time performance of object detection, and to facilitate its global optimization, the embodiments of the present invention provide a method and device for object detection in an image.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 3 is a schematic diagram of the object detection process in an image provided by an embodiment of the present invention. The process includes the following steps:

Step S301: dividing an image to be detected into multiple grid cells according to a preset dividing mode, wherein the size of the image to be detected is a target size.

The embodiment of the present invention is applied to an electronic device, which may specifically be a desktop computer, a notebook, or another smart device with processing capability.

After the image to be detected of the target size is obtained, it is divided into multiple grid cells according to the preset dividing mode, which is the same dividing mode as that applied to the images when the convolutional neural network was trained. For example, for convenience, the image may be divided into multiple rows and multiple columns, with equal spacing between rows and between columns. The image may also be divided into multiple irregular grid cells, as long as the image to be detected and the images used to train the convolutional neural network use the same mesh-generation mode.

When the image is divided into multiple rows and columns, it may be divided into grid cells with equal numbers of rows and columns, or with different numbers of rows and columns; the aspect ratios of the resulting grid cells may be the same or different.
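The regular row-and-column division described above can be sketched as follows; the equal spacing is only one of the allowed dividing modes (the text also permits irregular grids), so this is a minimal illustrative sketch.

```python
import numpy as np

def divide_into_cells(image, rows=7, cols=7):
    """Divide an image into a rows x cols grid, as in step S301.

    Returns the pixel boundaries (top, bottom, left, right) of each
    grid cell, with equal spacing between rows and between columns.
    """
    h, w = image.shape[:2]
    ys = np.linspace(0, h, rows + 1, dtype=int)   # row boundaries
    xs = np.linspace(0, w, cols + 1, dtype=int)   # column boundaries
    cells = [(ys[r], ys[r + 1], xs[c], xs[c + 1])
             for r in range(rows) for c in range(cols)]
    return cells
```

The same function must be applied to the training images and to the images to be detected, so that the grid cells line up with the network's per-cell feature vectors.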
Step S302: inputting the divided image into the pre-trained convolutional neural network, and obtaining multiple feature vectors of the image output by the convolutional neural network, wherein each grid cell corresponds to one feature vector.

To detect the class and position of an object in the image, a convolutional neural network is trained in the embodiments of the present invention, and the feature vector corresponding to each grid cell is obtained through the trained network. For example, if the image is divided into the 49 grid cells of a 7x7 grid, then after the divided image is input into the trained convolutional neural network, 49 feature vectors are output, each corresponding to one grid cell.
Step S303: for the feature vector corresponding to each grid cell, identifying the maximum among the class parameters in the feature vector, and when the maximum is greater than a set threshold, determining the position information of the object of the class corresponding to that class parameter according to the center-point position parameters and the size parameters in the feature vector.

Specifically, the feature vector obtained in the embodiment of the present invention is a multi-dimensional vector that includes at least class parameters and position parameters, wherein there are multiple class parameters and the position parameters include center-point position parameters and size parameters. After the feature vector corresponding to each grid cell is obtained, it is judged for each grid cell whether an object has been detected. If the maximum among the class parameters in the feature vector of a grid cell is greater than the set threshold, that grid cell has detected an object, the class corresponding to that class parameter is the class of the object, and the position of the object can be determined from the feature vector of the grid cell.

Because the position parameters in the feature vectors used when training the convolutional neural network were determined according to a set method, the position of the object can be determined according to that same method.

Because the pre-trained convolutional neural network determines the feature vectors corresponding to the image, and the class parameters and position-related parameters in each feature vector determine the class and position of the object in the image, localization and classification are predicted simultaneously, which facilitates global optimization. In addition, because the position and class of an object are determined from the feature vector of each grid cell, there is no need to select multiple candidate regions, which saves detection time and improves the real-time performance and efficiency of detection.
The object detection in the embodiment of the present invention is performed on images of a target size, which is the uniform size of the images used when the convolutional neural network was trained. The size may be arbitrary, as long as the size of the images used during training is the same as that used during detection; for example, it may be 1024x1024 or 256x512.

Therefore, in the embodiments of the present invention, to ensure that the images input into the convolutional neural network are all of the target size, before the image to be detected is divided into multiple grid cells according to the preset dividing mode, the method also includes:

judging whether the size of the image is the target size;

if not, adjusting the size of the image to the target size.

When the image to be detected is of the target size, subsequent processing is performed on it directly; when it is not, it is adjusted to the target size. The adjustment of image size belongs to the prior art and is not repeated in the embodiments of the present invention.
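The size adjustment can be sketched as below. Nearest-neighbor resampling in plain NumPy is shown only as a minimal example; the patent leaves the resizing method to the prior art, and in practice an image library's interpolation (e.g. bilinear) would typically be used.

```python
import numpy as np

def resize_to_target(image, target_h=448, target_w=448):
    """Adjust an image to the target size before grid division.

    Uses nearest-neighbor resampling; the target size 448x448 is an
    illustrative assumption (the patent allows any fixed size).
    """
    h, w = image.shape[:2]
    if (h, w) == (target_h, target_w):
        return image                       # already the target size
    rows = np.arange(target_h) * h // target_h   # source row for each output row
    cols = np.arange(target_w) * w // target_w   # source column for each output column
    return image[rows[:, None], cols]
```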
Specifically, being joined according to center position parameter in the characteristic vector and appearance and size in embodiments of the present invention
Number, determining the positional information of the object of category parameter correspondence classification includes:
According to the location parameter of the central point, positional information of the central point in the grid is determined;
The central point is determined according to the positional information, using the central point as the center of rectangle frame, according to described
Appearance and size parameter, determines the positional information of the rectangle frame, using the positional information of the rectangle frame as the position of the object
Confidence cease, and using the corresponding object classification of the classification parameter as the object classification.
Wherein, determining the position information of the central point in the grid according to the location parameter of the central point includes:
taking the set point of the grid as a reference point; and determining the position information of the central point in the grid according to the reference point and the location parameter of the central point.
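With normalized 1*1 grids and the upper-left corner of each grid taken as its set point (one of the conventions discussed later in this document), recovering the absolute central point from the reference point and the offsets can be sketched as follows; the names are illustrative, not from the patent:

```python
def center_from_grid(row, col, x_offset, y_offset):
    """Absolute central point from the grid's set point (its upper-left
    corner, at normalized coordinates (col, row)) plus the predicted
    offsets x, y in [0, 1) within the grid."""
    return col + x_offset, row + y_offset

# Central point predicted at offset (0.5, 0.25) inside grid row 2, column 3:
cx, cy = center_from_grid(2, 3, 0.5, 0.25)   # -> (3.5, 2.25)
```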
Fig. 4 is a schematic diagram of the detailed implementation process of object detection in an image provided in an embodiment of the present invention; the process comprises the following steps:
Step S401: receiving an image to be detected.
Step S402: judging whether the size of the image is the target size; if so, proceeding to step S404; otherwise, proceeding to step S403.
Step S403: adjusting the size of the image to the target size.
Step S404: dividing the image to be detected into a plurality of grids according to the preset dividing mode, wherein the size of the image to be detected is the target size.
Step S405: inputting the divided image into the convolutional neural network whose training has been completed in advance, and obtaining a plurality of characteristic vectors of the image output by the convolutional neural network, wherein each grid corresponds to one characteristic vector.
Step S406: for the characteristic vector corresponding to each grid, identifying the maximum value of the classification parameters in the characteristic vector.
Step S407: when the maximum value is greater than a set threshold, taking the set point of the grid as a reference point, and determining the position information of the central point in the grid according to the reference point and the location parameter of the central point.
Step S408: determining the central point according to the position information, taking the central point as the center of a rectangle frame, determining the position information of the rectangle frame according to the appearance size parameter, taking the position information of the rectangle frame as the position information of the object, and taking the object classification corresponding to the classification parameter as the classification of the object.
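Steps S406-S408 amount to, for one grid's vector: find the largest classification parameter, compare it with the threshold, recover the central point, and place the rectangle frame around it. A minimal sketch, using the (confidence, cls..., x, y, w, h) layout described later in this document and hypothetical names:

```python
THRESHOLD = 0.4  # the patent gives 0.4 only as an example threshold

def detect_in_grid(row, col, class_scores, x, y, w, h, class_names):
    """Steps S406-S408 for one grid: identify the maximum classification
    parameter; if it exceeds the threshold, build the rectangle frame
    centred on the point recovered from the grid's set point."""
    best = max(range(len(class_scores)), key=lambda c: class_scores[c])
    if class_scores[best] <= THRESHOLD:
        return None  # no object detected in this grid
    cx, cy = col + x, row + y            # central point in grid coordinates
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return class_names[best], box

names = ["car", "dog", "bicycle"]
result = detect_in_grid(2, 3, [0.1, 0.9, 0.2], 0.5, 0.5, 2.0, 1.0, names)
# -> ("dog", (2.5, 2.0, 4.5, 3.0))
```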
The above object detection is carried out on the basis of a convolutional neural network whose training has been completed; in order to realize the detection of objects, the convolutional neural network needs to be trained. In the embodiment of the present invention, when the convolutional neural network is trained, a sample image of the target size is divided into a plurality of grids; if the central point of a certain target object is located in some grid, that grid is responsible for detecting the target object, including detecting the classification of the target object and its corresponding position (bounding box).
Fig. 5 is a schematic diagram of the training process of the convolutional neural network provided in an embodiment of the present invention; the process comprises the following steps:
Step S501: for each sample image in a sample image set, marking each target object with a rectangle frame.
In the embodiment of the present invention, the convolutional neural network is trained using a large number of sample images, and the large number of sample images constitute the sample image set. In each sample image, each target object is marked with a rectangle frame.
Specifically, Fig. 6A-Fig. 6D show schematic diagrams of the annotation results of target objects. In the sample image of Fig. 6A there are 3 target objects, namely a dog, a bicycle and a car. When each target object is annotated, the extreme point of the target object in each of the four directions of up, down, left and right (relative to the up, down, left and right directions shown in Fig. 6A) is identified in the sample image. If the point is an upper or lower extreme point, the two lines passing through the upper and lower extreme points respectively and parallel to the bottom edge of the sample image are taken as two sides of the rectangle frame; if the point is a left or right extreme point, the two lines passing through the left and right extreme points respectively and parallel to the left and right edges of the sample image are taken as the other two sides of the rectangle frame. Examples are the rectangle frames of the dog, the bicycle and the car marked with dotted lines in Fig. 6A.
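The annotation rule just described — sides through the four extreme points, parallel to the image edges — is equivalent to taking the minimum and maximum coordinates of those points. A sketch, with an assumed (x, y) convention where y increases downward:

```python
def rect_from_extreme_points(top, bottom, left, right):
    """Rectangle frame (x_min, y_min, x_max, y_max) whose horizontal sides
    pass through the top/bottom extreme points and whose vertical sides
    pass through the left/right extreme points."""
    return (left[0], top[1], right[0], bottom[1])

# Extreme points of one object: top (40, 10), bottom (35, 90),
# left (20, 50), right (70, 45):
frame = rect_from_extreme_points((40, 10), (35, 90), (20, 50), (70, 45))
# -> (20, 10, 70, 90)
```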
Step S502: dividing each sample image into a plurality of grids according to the preset dividing mode, and determining the characteristic vector corresponding to each grid, wherein the size of each sample image is the target size. When a grid contains the central point of a target object, the value of the classification parameter corresponding to the classification of that target object in the characteristic vector corresponding to the grid is set to a preset maximum value according to the classification of the target object; the value of the central point location parameter in the characteristic vector is determined according to the position of the central point in the grid; and the value of the appearance size parameter in the characteristic vector is determined according to the size of the marked rectangle frame of the target object. When a grid does not contain the central point of any target object, the value of each parameter in the characteristic vector corresponding to the grid is zero.
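The labelling rule of step S502 can be sketched for the 25-dimensional (confidence, cls1...cls20, x, y, w, h) vector described later in this document; the helper itself and its names are illustrative:

```python
N_CLASSES = 20

def encode_cell(contains_center, class_index=None, x=0.0, y=0.0, w=0.0, h=0.0):
    """Build the characteristic vector of one grid: all zeros when the grid
    contains no central point; otherwise confidence 1, a preset maximum (1)
    for the object's classification parameter, and the location parameters."""
    vec = [0.0] * (1 + N_CLASSES + 4)
    if contains_center:
        vec[0] = 1.0                    # probability parameter (confidence)
        vec[1 + class_index] = 1.0      # preset maximum for this classification
        vec[1 + N_CLASSES:] = [x, y, w, h]
    return vec

empty = encode_cell(False)
dog = encode_cell(True, class_index=1, x=0.3, y=0.7, w=2.0, h=1.5)  # cls2 = dog
```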
In the embodiment of the present invention, the sample image can be divided into a plurality of grids according to the preset dividing mode, wherein the dividing mode of the sample images is identical to the dividing mode of the image to be detected in the above detection process.
For example, for convenience, the image may be divided into a plurality of rows and a plurality of columns with equal intervals between rows and equal intervals between columns, or not. The image may of course also be divided into a plurality of irregular grids, as long as it is ensured that the image to be detected and the images used for training the convolutional neural network use the same grid dividing mode.
When the image is divided into a plurality of rows and a plurality of columns, it may be divided into a plurality of grids whose numbers of rows and columns are identical, or into a plurality of grids whose numbers of rows and columns are different; the length-to-width ratio of each grid after division may of course be identical, or may be different. For example, the sample image is divided into a plurality of grids such as 12*10, 15*15 or 6*6. When the grids are equal in size, the grid size can be normalized. As shown in Fig. 6B, in the embodiment of the present invention the sample image is divided into a plurality of grids of 7 rows horizontally and 7 columns vertically; after normalization, the size of each grid can be considered to be 1*1.
Each grid in the sample image corresponds to one characteristic vector; the characteristic vector is a multi-dimensional vector and at least includes classification parameters and location parameters, wherein there are a plurality of classification parameters, and the location parameters in turn include: central point location parameters and appearance size parameters.
Step S503: training the convolutional neural network according to each sample image for which the characteristic vector of each grid has been determined.
Specifically, in the embodiment of the present invention, the convolutional neural network may be trained using all sample images in the sample image set. However, because the sample image set contains a large number of sample images, in order to improve the efficiency of training, in the embodiment of the present invention training the convolutional neural network according to each sample image for which the characteristic vector of each grid has been determined includes:
choosing subsample images in the sample image set, wherein the quantity of the chosen subsample images is less than the quantity of sample images in the sample image set;
training the convolutional neural network using each chosen subsample image.
By randomly choosing subsample images whose quantity is much smaller than the total quantity of sample images, the convolutional neural network is trained and its parameters are constantly updated, until the error between the information of the objects predicted for each grid and the information of the marked target objects converges.
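The subsample (mini-batch) selection just described can be sketched as follows; the network update itself is only stubbed, since the patent treats parameter updating as prior art, and the names are illustrative:

```python
import random

def iterate_batches(sample_set, batch_size, steps, seed=0):
    """Repeatedly choose a random subsample much smaller than the whole
    sample image set; each subsample would drive one update of the network
    parameters, repeated until the error converges."""
    rng = random.Random(seed)
    for _ in range(steps):
        yield rng.sample(sample_set, batch_size)

samples = list(range(1000))          # stand-ins for 1000 annotated sample images
batches = list(iterate_batches(samples, batch_size=64, steps=3))
```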
Likewise, in the embodiment of the present invention, sample images of the target size are used when the convolutional neural network is trained. Therefore, in the embodiment of the present invention, in order to ensure that the sample images input into the convolutional neural network are all of the target size, before dividing each sample image into a plurality of grids according to the preset dividing mode, the method further includes:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
When the sample image is of the target size, subsequent processing is carried out on the sample image directly; when the sample image is not of the target size, the sample image is adjusted to the target size. The adjustment of image size belongs to the prior art and is not repeated in the embodiment of the present invention.
In the above process, the sample image may first be adjusted to the target size, or the rectangle frames may first be marked in the sample image. Marking the rectangle frames first ensures that the target objects can be accurately marked when the size of the sample image is relatively large; adjusting to the target size first ensures that the target objects can be accurately marked when the size of the sample image is relatively small.
In the above annotation process, the characteristic vector corresponding to each grid in the sample image can be determined. In the embodiment of the present invention, the characteristic vector corresponding to each grid can be expressed as (confidence, cls1, cls2, cls3, ..., cls20, x, y, w, h), wherein confidence is a probability parameter; cls1, cls2, cls3, ..., cls20 are classification parameters; and x, y, w and h are location parameters, wherein x and y are central point location parameters and w and h are appearance size parameters. When a grid contains the central point of a target object, the value of each parameter in the characteristic vector corresponding to the grid is determined as described below; when a grid does not contain the central point of any target object, the value of each parameter in the characteristic vector corresponding to the grid is 0.
Specifically, because each target object in the sample image is marked with a rectangle frame, the central point of the rectangle frame can be regarded as the central point of the target object, as with the three central points of the rectangle frames shown in Fig. 6C. When a grid contains the central point of a target object, then during annotation the probability parameter in the characteristic vector corresponding to the grid is taken to be 1, that is, the probability that a target object exists in the current grid is 1.
Because the target objects contained in sample images belong to multiple classifications, in the embodiment of the present invention the classification parameters are denoted cls, where cls1, cls2, ..., clsn respectively represent target objects of different classifications. For example, n can be 20, that is, there are 20 classifications of target objects; the target object classification represented by cls1 is car, the target object classification represented by cls2 is dog, and the target object classification represented by cls3 is bicycle. When a grid contains the central point of a target object, the value of the classification parameter corresponding to that target object is set to a maximum value, wherein the maximum value is greater than a set threshold; for example, the maximum value can be 1 and the threshold can be 0.4, etc.
For example, as shown in Fig. 6C, considering the grids where the central points are located from bottom to top (relative to the up and down directions shown in Fig. 6C): in the characteristic vector corresponding to the first central point, the classification parameter cls2 is 1 and the other classification parameters are 0; in the characteristic vector corresponding to the second central point, the classification parameter cls3 is 1 and the other classification parameters are 0; and in the characteristic vector corresponding to the third central point, the classification parameter cls1 is 1 and the other classification parameters are 0.
The characteristic vector also contains the location parameters x, y, w and h of the target object, wherein x and y are central point location parameters whose values are the horizontal and vertical coordinates of the central point of the target object relative to a set point. The set point corresponding to each grid may be identical or may be different; for example, the upper left corner of the sample image may be regarded as the set point, i.e. the origin of coordinates, and because each grid is normalized, the coordinate of each position in each grid is uniquely determined. Of course, in order to simplify the process and reduce the amount of calculation, the set point corresponding to each grid may also be different: each grid can be taken as an independent unit, with the upper left corner of the grid as the set point, i.e. the origin of coordinates. Therefore, during annotation, the values of x and y in the characteristic vector corresponding to the grid where the central point is located can be determined according to the offset of the central point relative to the upper left corner of that grid. The process of determining the values of x and y according to the relative offset belongs to the prior art and is not repeated in the embodiment of the present invention. Among the location parameters, w and h are appearance size parameters whose values are the length and width of the rectangle frame where the target object is located.
Because the characteristic vector is a multi-dimensional vector, in order to accurately represent the characteristic vector corresponding to each grid, in the embodiment of the present invention the cube structure shown in Fig. 6D is built according to the building mode shown in Fig. 7: the grids are processed by the convolutional layers, the max pooling layers, the fully connected layers and the output layer to generate the lattice structure, wherein the depth of the lattice in the Z-axis direction is determined according to the dimension of the characteristic vector. In the embodiment of the present invention, the depth of the lattice in the Z-axis direction is 25. The above process of carrying out the corresponding processing in each layer of the convolutional neural network and generating the lattice structure belongs to the prior art and is not repeated in the embodiment of the present invention.
After a large number of sample images have been annotated in the above way, the convolutional neural network is trained using the annotated sample images. Specifically, in the embodiment of the present invention the convolutional neural network is trained using a plurality of subsample images. In the training process, for each subsample image, the convolution feature map of the subsample image is obtained through the convolutional neural network; the convolution feature map contains the characteristic vector (confidence, cls1, cls2, cls3, ..., cls20, x, y, w, h) corresponding to each grid, which contains the location parameters and classification parameters of the object predicted in the grid, as well as the probability parameter confidence, which represents the degree of overlap between the rectangle frame where the object predicted by the grid is located and the rectangle frame of the marked target object.
For each subsample image in the training process, the network parameters of the convolutional neural network are adjusted by calculating the error between the prediction information and the annotation information; by randomly choosing, each time, a batch of subsample images much smaller than the total quantity of sample images, the convolutional neural network is trained and its network parameters are updated, until the error between the prediction information and the annotation information of each grid converges. The process of training the convolutional neural network according to the subsample images and adjusting its network parameters until the training of the convolutional neural network is completed belongs to the prior art and is not repeated in the embodiment of the present invention.
In the training process of the above convolutional neural network, in order to accurately predict the position and classification information of objects, in the embodiment of the present invention the last fully connected layer of the convolutional neural network uses a logistic activation function, and the convolutional layers and the other fully connected layers use the leaky ReLU function. Wherein the leaky ReLU function is:

$$f(x)=\begin{cases}x, & x>0\\ 0.1x, & \text{otherwise.}\end{cases}$$
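A leaky ReLU of this kind can be sketched as follows; the 0.1 negative-side slope is the conventional choice for such detection networks and is an assumption here rather than a value stated in this text:

```python
def leaky_relu(x, slope=0.1):
    """Leaky ReLU: identity for positive inputs, a small linear slope for
    negative inputs (the 0.1 slope is an assumption, not from this text)."""
    return x if x > 0 else slope * x

values = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]   # -> [-0.2, 0.0, 3.0]
```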
In the embodiment of the present invention, in order to complete the training of the convolutional neural network and make it converge, during the training of the convolutional neural network the method further includes:
determining the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classifications of the target objects in the subsample images and the information of the target objects marked in the subsample images;
when the error converges, determining that the training of the convolutional neural network is completed, wherein the error is determined using the following loss function:

$$\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
{}+{}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
{}+{}& \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\mathbb{1}_{i}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
{}+{}& \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2
\end{aligned}$$

Wherein, S is the number of rows of the divided grids (or the number of columns, the number of rows and the number of columns being identical); B is the preset quantity of rectangle frames predicted by each grid, generally taken as 1 or 2; $x_i$ is the abscissa of the central point of the marked target object in grid i; $\hat{x}_i$ is the abscissa of the central point of the predicted object in grid i; $y_i$ is the ordinate of the central point of the marked target object in grid i; $\hat{y}_i$ is the ordinate of the central point of the predicted object in grid i; $h_i$ is the height of the rectangle frame where the marked target object is located; $w_i$ is the width of that rectangle frame; $\hat{h}_i$ is the height of the rectangle frame where the predicted object is located; $\hat{w}_i$ is the width of that rectangle frame; $C_i$ is the marked probability that grid i currently contains a target object; $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the marked target object in grid i belongs to classification c; $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to classification c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the central point of the object predicted in the j-th rectangle frame is located in grid i, and otherwise takes 0; $\mathbb{1}_{i}^{obj}$ takes 1 when grid i of the prediction contains the central point of an object, and otherwise takes 0; $\mathbb{1}_{i}^{noobj}$ takes 1 when grid i of the prediction does not contain the central point of an object, and otherwise takes 0; wherein $\hat{P}_i(c)$ is determined according to the following equation:

$$\hat{P}_i(c) = P_r(\mathrm{Object}) \cdot P_r(\mathrm{Class}=c \mid \mathrm{Object})$$
$P_r(\mathrm{Object})$ is the predicted probability that grid i currently contains an object, and $P_r(\mathrm{Class}=c \mid \mathrm{Object})$ is the predicted conditional probability that the object in grid i belongs to classification c.
In order that, when the error between the prediction result and the annotation result is large, its contribution to the position prediction during prediction remains small, the above loss function is used in the embodiment of the present invention.
As shown in Fig. 6B, in one particular embodiment of the present invention each sample image is divided into 49 grids of 7*7 in total, and each grid can detect 20 classifications; therefore one sample image can produce 980 detection probabilities, among which the detection probabilities of most grids are 0. This would cause the training to diverge, so a variable is introduced here to solve this problem: namely the probability of whether an object exists in a certain grid. Therefore, in addition to the 20 classification parameters, there is also one predicted probability $P_r(\mathrm{Object})$ of whether an object currently exists in the grid; then the probability $\hat{P}_i(c)$ that the target object in a certain grid belongs to classification c is the product of $P_r(\mathrm{Object})$ and the predicted conditional probability $P_r(\mathrm{Class}=c \mid \mathrm{Object})$ that the object in the grid belongs to classification c. $P_r(\mathrm{Object})$ is updated in every grid, and $P_r(\mathrm{Class} \mid \mathrm{Object})$ is updated only when an object exists in the grid.
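The decomposition just described — a per-grid objectness probability multiplied by per-class conditional probabilities — can be sketched as:

```python
def class_probabilities(p_object, p_class_given_object):
    """Grid-level class probabilities P(c) = Pr(Object) * Pr(Class=c | Object).
    When Pr(Object) is 0 (most grids), every class probability collapses to 0,
    which is what makes the extra objectness variable useful."""
    return [p_object * p for p in p_class_given_object]

# A grid that contains an object with 90% probability:
probs = class_probabilities(0.9, [0.1, 0.7, 0.2])   # approximately [0.09, 0.63, 0.18]
empty = class_probabilities(0.0, [0.1, 0.7, 0.2])   # all zeros
```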
Fig. 8 is a schematic structural diagram of the object detection device in an image provided in an embodiment of the present invention; the device is located in an electronic apparatus, and the device includes:
a division module 81, configured to divide an image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is the target size;
a detection module 82, configured to input the divided image into the convolutional neural network whose training has been completed in advance, and obtain a plurality of characteristic vectors of the image output by the convolutional neural network, wherein each grid corresponds to one characteristic vector;
a determining module 83, configured to, for the characteristic vector corresponding to each grid, identify the maximum value of the classification parameters in the characteristic vector, and, when the maximum value is greater than a set threshold, determine the position information of the object of the classification corresponding to the classification parameter according to the central point location parameter and the appearance size parameter in the characteristic vector.
The device also includes:
a judging and adjusting module 84, configured to judge whether the size of the image is the target size, and, if not, adjust the size of the image to the target size.
The device also includes:
a training module 85, configured to: for each sample image in a sample image set, mark each target object with a rectangle frame; divide each sample image into a plurality of grids according to the preset dividing mode and determine the characteristic vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the central point of a target object, set, according to the classification of the target object, the value of the classification parameter corresponding to that classification in the characteristic vector corresponding to the grid to a preset maximum value, determine the value of the central point location parameter in the characteristic vector according to the position of the central point in the grid, and determine the value of the appearance size parameter in the characteristic vector according to the size of the marked rectangle frame of the target object; when a grid does not contain the central point of any target object, set the value of each parameter in the characteristic vector corresponding to the grid to zero; and train the convolutional neural network according to each sample image for which the characteristic vector of each grid has been determined.
The training module 85 is further configured to: for each sample image, judge whether the size of the sample image is the target size, and, if not, adjust the size of the sample image to the target size.
The training module 85 is specifically configured to: choose subsample images in the sample image set, wherein the quantity of the chosen subsample images is less than the quantity of sample images in the sample image set; and train the convolutional neural network using each chosen subsample image.
The device also includes:
an error calculating module 86, configured to determine the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classifications of the objects in the subsample images and the information of the target objects marked in the subsample images; and, when the error converges, determine that the training of the convolutional neural network is completed, wherein the error is determined using the loss function given above, in which S is the number of rows of the divided grids (or the number of columns, the number of rows and the number of columns being identical); B is the preset quantity of rectangle frames predicted by each grid, generally taken as 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the central point of the marked target object in grid i; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate of the central point of the predicted object in grid i; $w_i$ and $h_i$ are the width and height of the rectangle frame where the marked target object is located; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the rectangle frame where the predicted object is located; $C_i$ is the marked probability that grid i currently contains a target object; $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the marked target object in grid i belongs to classification c; $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to classification c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the central point of the object predicted in the j-th rectangle frame is located in grid i, and otherwise takes 0; $\mathbb{1}_{i}^{obj}$ takes 1 when grid i of the prediction contains the central point of an object, and otherwise takes 0; and $\mathbb{1}_{i}^{noobj}$ takes 1 when grid i of the prediction does not contain the central point of an object, and otherwise takes 0; wherein $\hat{P}_i(c)$ is determined according to the equation $\hat{P}_i(c) = P_r(\mathrm{Object}) \cdot P_r(\mathrm{Class}=c \mid \mathrm{Object})$, in which $P_r(\mathrm{Object})$ is the predicted probability that grid i currently contains an object and $P_r(\mathrm{Class}=c \mid \mathrm{Object})$ is the predicted conditional probability that the object in grid i belongs to classification c.
The determining module 83 is specifically configured to: determine the position information of the central point in the grid according to the location parameter of the central point; determine the central point according to the position information, take the central point as the center of a rectangle frame, determine the position information of the rectangle frame according to the appearance size parameter, take the position information of the rectangle frame as the position information of the object, and take the object classification corresponding to the classification parameter as the classification of the object.
The determining module 83 is specifically configured to: take the set point of the grid as a reference point; and determine the position information of the central point in the grid according to the reference point and the location parameter of the central point.
The embodiment of the present invention provides an object detection method and device in an image. In the method, an image to be detected is divided into a plurality of grids according to a preset dividing mode, wherein the size of the image is the target size; the divided image is input into a convolutional neural network whose training has been completed in advance, and a plurality of characteristic vectors of the image output by the convolutional neural network are obtained, wherein each grid corresponds to one characteristic vector; the maximum value of the classification parameters in each characteristic vector is identified, and when the maximum value is greater than a set threshold, the position information of the object of the classification corresponding to the classification parameter is determined according to the central point location parameter and the appearance size parameter in the characteristic vector. Because in the embodiment of the present invention the characteristic vectors corresponding to the image are determined by the convolutional neural network whose training has been completed in advance, and the classification and position of the object in the image are determined depending on the classification parameters and location parameters in the characteristic vectors, the detection of the position and the classification of the object can be realized simultaneously, which facilitates global optimization. In addition, because the position and classification of the object are determined according to the characteristic vector corresponding to each grid, there is no need to select multiple characteristic regions, which saves detection time and improves the real-time performance and efficiency of detection.
For the system/device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple; for the relevant parts, refer to the partial explanations of the method embodiments.
It should be understood by those skilled in the art that the embodiments of the present application can be provided as a method, a system or a computer program product. Therefore, the present application can take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the methods, apparatuses (systems) and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can guide a computer or another programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce a manufacture including an instruction device, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; thus the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the present application have been described, those skilled in the art, once they know the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various changes and modifications to the present application without deviating from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these changes and modifications.
Claims (17)
1. A method for detecting an object in an image, characterized in that it is applied to an electronic device and comprises:
dividing an image to be detected into a plurality of grids according to a preset division manner, wherein the size of the image to be detected is a target size;
inputting the divided image into a convolutional neural network trained in advance, and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector;
for the feature vector corresponding to each grid, identifying the maximum value among the class parameters in the feature vector, and when the maximum value is greater than a set threshold, determining, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of the object of the class corresponding to the class parameter.
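As a concrete illustration of the decoding step in claim 1, the sketch below assumes a hypothetical per-grid feature layout of [cx, cy, w, h, class probabilities...], with the center offsets expressed relative to the grid cell and the contour size relative to the whole image; the layout, grid size, and image size are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def detect_objects(features, threshold=0.5, grid_size=7, img_size=448):
    """Decode per-grid feature vectors into detections (a sketch of claim 1).

    `features` is assumed to be shaped (S, S, 4 + num_classes):
    [cx, cy, w, h, p_class_0, ..., p_class_{C-1}] per grid cell.
    """
    detections = []
    cell = img_size / grid_size
    for row in range(grid_size):
        for col in range(grid_size):
            vec = features[row, col]
            cls_scores = vec[4:]
            cls = int(np.argmax(cls_scores))          # maximum class parameter
            if cls_scores[cls] > threshold:           # compare against set threshold
                cx = (col + vec[0]) * cell            # center from the grid's position
                cy = (row + vec[1]) * cell
                w, h = vec[2] * img_size, vec[3] * img_size
                detections.append((cls, float(cls_scores[cls]),
                                   (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)))
    return detections
```

Cells whose class scores all stay at or below the threshold produce no detection, matching the claim's thresholding of the maximum class parameter.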
2. The method according to claim 1, characterized in that before the image to be detected is divided into a plurality of grids according to the preset division manner, the method further comprises:
judging whether the size of the image is the target size;
if not, adjusting the size of the image to the target size.
3. The method according to claim 1, characterized in that the training process of the convolutional neural network comprises:
for each sample image in a sample image set, labeling target objects with rectangular boxes;
dividing each sample image into a plurality of grids according to the preset division manner and determining the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, setting, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid to a preset maximum value, determining the values of the center-point position parameters in the feature vector according to the position of the center point within the grid, and determining the values of the contour size parameters in the feature vector according to the size of the rectangular box labeling the target object; when a grid does not contain the center point of any target object, setting the value of every parameter in the feature vector of the grid to zero;
training the convolutional neural network with the sample images for which the feature vector of each grid has been determined.
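The target-encoding procedure of claim 3 can be sketched as follows; the tensor layout [cx, cy, w, h, classes...], the grid size, the image size, and the use of 1.0 as the "preset maximum value" are assumptions for illustration only.

```python
import numpy as np

def encode_targets(boxes, labels, num_classes, grid_size=7, img_size=448, max_value=1.0):
    """Build the per-grid target feature vectors described in claim 3 (a sketch).

    boxes: list of (x_min, y_min, x_max, y_max) in pixels; labels: class indices.
    Grids that contain no object center keep all-zero feature vectors.
    """
    target = np.zeros((grid_size, grid_size, 4 + num_classes), dtype=np.float32)
    cell = img_size / grid_size
    for (x0, y0, x1, y1), cls in zip(boxes, labels):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2        # center of the labeled box
        col, row = int(cx // cell), int(cy // cell)  # grid containing the center point
        target[row, col, 0] = cx / cell - col        # center position within the grid
        target[row, col, 1] = cy / cell - row
        target[row, col, 2] = (x1 - x0) / img_size   # contour size parameters
        target[row, col, 3] = (y1 - y0) / img_size
        target[row, col, 4 + cls] = max_value        # preset maximum for the class
    return target
```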
4. The method according to claim 3, characterized in that before each sample image is divided into a plurality of grids according to the preset division manner, the method further comprises:
for each sample image, judging whether the size of the sample image is the target size;
if not, adjusting the size of the sample image to the target size.
5. The method according to claim 3, characterized in that training the convolutional neural network with the sample images for which the feature vector of each grid has been determined comprises:
selecting sub-sample images from the sample image set, wherein the number of sub-sample images selected is smaller than the number of sample images in the sample image set;
training the convolutional neural network with each selected sub-sample image.
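The sub-sample selection of claim 5 amounts to drawing a proper subset of the sample set for a training pass; the sketch below shows only the selection step (the training call itself is elided, and random sampling is an assumption — the patent does not specify how the subset is chosen).

```python
import random

def select_subsamples(sample_set, k):
    """Claim 5 sketch: pick a subset strictly smaller than the full sample set."""
    if k >= len(sample_set):
        raise ValueError("sub-sample count must be smaller than the sample set")
    return random.sample(sample_set, k)  # k distinct samples, order randomized
```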
6. The method according to claim 1 or 3, characterized in that the preset division manner comprises:
dividing the image and the sample images into a plurality of grids whose number of rows equals the number of columns; or
dividing the image and the sample images into a plurality of grids whose number of rows differs from the number of columns.
7. The method according to claim 6, characterized in that the method further comprises:
determining the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of objects in the sub-sample images and the information of the target objects labeled in the sub-sample images;
when the error converges, determining that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:

$$\begin{aligned} loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\mathbb{1}_{i}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2 \end{aligned}$$

wherein S is the number of rows, equal to the number of columns, of the divided grids when the preset row and column counts are identical; B is the number of rectangular boxes predicted for each grid, typically 1 or 2; $x_i$ is the abscissa of the center point of the labeled target object in grid i and $\hat{x}_i$ is the abscissa of the center point of the predicted object in grid i; $y_i$ is the ordinate of the center point of the labeled target object in grid i and $\hat{y}_i$ is the ordinate of the center point of the predicted object in grid i; $h_i$ and $w_i$ are the height and width of the rectangular box of the labeled target object, and $\hat{h}_i$ and $\hat{w}_i$ are the height and width of the rectangular box of the predicted object; $C_i$ is the labeled probability that grid i currently contains a target object and $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the labeled target object in grid i belongs to class c and $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to class c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes the value 1 when the center point of the object in the j-th predicted rectangular box lies in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes the value 1 when the predicted grid i contains the center point of an object, and 0 otherwise; $\mathbb{1}_{i}^{noobj}$ takes the value 1 when the predicted grid i does not contain the center point of an object, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following equation:

$$\hat{P}_i(c) = P_r(Class \mid Object) \times P_r(Object)$$

wherein $P_r(Object)$ is the predicted probability that grid i currently contains an object and $P_r(Class \mid Object)$ is the conditional probability that the predicted object in grid i belongs to class c.
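A numerical sketch of a loss of this form is given below, under the simplifying assumption B = 1 (one box per grid, so the per-box indicator collapses to the per-grid one) and a hypothetical [x, y, w, h, confidence, class probabilities...] tensor layout that is not specified by the patent.

```python
import numpy as np

def yolo_style_loss(pred, truth, lambda_coord=5.0, lambda_noobj=0.5):
    """Sketch of the claim-7 loss with B = 1 box per grid (hypothetical layout).

    pred/truth: (S, S, 5 + C) arrays laid out as [x, y, w, h, conf, classes...].
    """
    obj = truth[..., 4] > 0          # grids whose labeled box center is present
    noobj = ~obj
    # coordinate terms, weighted by lambda_coord (only where an object center exists)
    xy = np.sum((truth[obj][:, 0:2] - pred[obj][:, 0:2]) ** 2)
    wh = np.sum((np.sqrt(truth[obj][:, 2:4]) - np.sqrt(np.abs(pred[obj][:, 2:4]))) ** 2)
    # confidence terms: object grids at full weight, empty grids down-weighted
    conf_obj = np.sum((truth[obj][:, 4] - pred[obj][:, 4]) ** 2)
    conf_noobj = np.sum((truth[noobj][:, 4] - pred[noobj][:, 4]) ** 2)
    # class probability terms (only where an object center exists)
    cls = np.sum((truth[obj][:, 5:] - pred[obj][:, 5:]) ** 2)
    return (lambda_coord * (xy + wh) + conf_obj
            + lambda_noobj * conf_noobj + cls)
```

With pred equal to truth every term vanishes, which is the convergence condition the claim tests for.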
8. The method according to claim 1, characterized in that determining, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of the object of the class corresponding to the class parameter comprises:
determining, according to the position parameters of the center point, the position information of the center point within the grid;
determining the center point according to the position information, taking the center point as the center of a rectangular box, determining the position information of the rectangular box according to the contour size parameters, taking the position information of the rectangular box as the position information of the object, and taking the object class corresponding to the class parameter as the class of the object.
9. The method according to claim 8, characterized in that determining, according to the position parameters of the center point, the position information of the center point within the grid comprises:
taking a set point of the grid as a reference point; and determining, according to the reference point and the position parameters of the center point, the position information of the center point within the grid.
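The reference-point computation of claim 9 can be sketched in a few lines; the choice of the cell's top-left corner as the "set point" and the encoding of the position parameters as fractions of a cell are assumptions for illustration.

```python
def center_from_reference(row, col, dx, dy, cell_size):
    """Claim 9 sketch: recover the absolute center point from the grid's
    reference point (assumed here to be the cell's top-left corner) and the
    center-point position parameters (dx, dy), given as fractions of a cell."""
    ref_x, ref_y = col * cell_size, row * cell_size  # reference point of grid (row, col)
    return ref_x + dx * cell_size, ref_y + dy * cell_size
```

For example, with 64-pixel cells, offsets of (0.5, 0.5) in the cell at row 1, column 2 place the center at the middle of that cell.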
10. An apparatus for detecting an object in an image, characterized in that the apparatus comprises:
a division module, configured to divide an image to be detected into a plurality of grids according to a preset division manner, wherein the size of the image to be detected is a target size;
a detection module, configured to input the divided image into a convolutional neural network trained in advance and obtain a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector;
a determining module, configured to, for the feature vector corresponding to each grid, identify the maximum value among the class parameters in the feature vector, and when the maximum value is greater than a set threshold, determine, according to the center-point position parameters and the contour size parameters in the feature vector, the position information of the object of the class corresponding to the class parameter.
11. The apparatus according to claim 10, characterized in that the apparatus further comprises:
a judging and adjusting module, configured to judge whether the size of the image is the target size and, if not, to adjust the size of the image to the target size.
12. The apparatus according to claim 10, characterized in that the apparatus further comprises:
a training module, configured to: for each sample image in a sample image set, label target objects with rectangular boxes; divide each sample image into a plurality of grids according to the preset division manner and determine the feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, set, according to the class of the target object, the value of the class parameter corresponding to that class in the feature vector of the grid to a preset maximum value, determine the values of the center-point position parameters in the feature vector according to the position of the center point within the grid, and determine the values of the contour size parameters in the feature vector according to the size of the rectangular box labeling the target object; when a grid does not contain the center point of any target object, set the value of every parameter in the feature vector of the grid to zero; and train the convolutional neural network with the sample images for which the feature vector of each grid has been determined.
13. The apparatus according to claim 12, characterized in that the training module is further configured to judge, for each sample image, whether the size of the sample image is the target size and, if not, to adjust the size of the sample image to the target size.
14. The apparatus according to claim 13, characterized in that the training module is specifically configured to select sub-sample images from the sample image set, wherein the number of sub-sample images selected is smaller than the number of sample images in the sample image set, and to train the convolutional neural network with each selected sub-sample image.
15. The apparatus according to claim 12, characterized in that the apparatus further comprises:
an error calculation module, configured to determine the error of the convolutional neural network according to the predictions of the convolutional neural network for the positions and classes of objects in the sub-sample images and the target objects labeled in the sub-sample images, and, when the error converges, to determine that training of the convolutional neural network is complete, wherein the error is determined with the following loss function:

$$\begin{aligned} loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\mathbb{1}_{i}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2 \end{aligned}$$

wherein S is the number of rows, equal to the number of columns, of the divided grids when the preset row and column counts are identical; B is the number of rectangular boxes predicted for each grid, typically 1 or 2; $x_i$ is the abscissa of the center point of the labeled target object in grid i and $\hat{x}_i$ is the abscissa of the center point of the predicted object in grid i; $y_i$ is the ordinate of the center point of the labeled target object in grid i and $\hat{y}_i$ is the ordinate of the center point of the predicted object in grid i; $h_i$ and $w_i$ are the height and width of the rectangular box of the labeled target object, and $\hat{h}_i$ and $\hat{w}_i$ are the height and width of the rectangular box of the predicted object; $C_i$ is the labeled probability that grid i currently contains a target object and $\hat{C}_i$ is the predicted probability that grid i currently contains an object; $P_i(c)$ is the probability that the labeled target object in grid i belongs to class c and $\hat{P}_i(c)$ is the probability that the predicted object in grid i belongs to class c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes the value 1 when the center point of the object in the j-th predicted rectangular box lies in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes the value 1 when the predicted grid i contains the center point of an object, and 0 otherwise; $\mathbb{1}_{i}^{noobj}$ takes the value 1 when the predicted grid i does not contain the center point of an object, and 0 otherwise; wherein $\hat{P}_i(c)$ is determined according to the following equation:

$$\hat{P}_i(c) = P_r(Class \mid Object) \times P_r(Object)$$

wherein $P_r(Object)$ is the predicted probability that grid i currently contains an object and $P_r(Class \mid Object)$ is the conditional probability that the predicted object in grid i belongs to class c.
16. The apparatus according to claim 10, characterized in that the determining module is specifically configured to: determine, according to the position parameters of the center point, the position information of the center point within the grid; determine the center point according to the position information; take the center point as the center of a rectangular box; determine the position information of the rectangular box according to the contour size parameters; take the position information of the rectangular box as the position information of the object; and take the object class corresponding to the class parameter as the class of the object.
17. The apparatus according to claim 16, characterized in that the determining module is specifically configured to take a set point of the grid as a reference point, and to determine, according to the reference point and the position parameters of the center point, the position information of the center point within the grid.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249792.1A CN106803071B (en) | 2016-12-29 | 2016-12-29 | Method and device for detecting object in image |
EP17886017.7A EP3545466A4 (en) | 2016-12-29 | 2017-10-20 | Systems and methods for detecting objects in images |
PCT/CN2017/107043 WO2018121013A1 (en) | 2016-12-29 | 2017-10-20 | Systems and methods for detecting objects in images |
US16/457,861 US11113840B2 (en) | 2016-12-29 | 2019-06-28 | Systems and methods for detecting objects in images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249792.1A CN106803071B (en) | 2016-12-29 | 2016-12-29 | Method and device for detecting object in image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106803071A true CN106803071A (en) | 2017-06-06 |
CN106803071B CN106803071B (en) | 2020-02-14 |
Family
ID=58985345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611249792.1A Active CN106803071B (en) | 2016-12-29 | 2016-12-29 | Method and device for detecting object in image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106803071B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104517113A (en) * | 2013-09-29 | 2015-04-15 | 浙江大华技术股份有限公司 | Image feature extraction method and device and image sorting method and device |
CN105975931A (en) * | 2016-05-04 | 2016-09-28 | 浙江大学 | Convolutional neural network face recognition method based on multi-scale pooling |
Non-Patent Citations (1)
Title |
---|
RUSSELL STEWART et al.: "End-to-end people detection in crowded scenes", published online: HTTPS://ARXIV.ORG/ABS/1506.04878 *
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018121013A1 (en) * | 2016-12-29 | 2018-07-05 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for detecting objects in images |
US11113840B2 (en) | 2016-12-29 | 2021-09-07 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for detecting objects in images |
CN107392158A (en) * | 2017-07-27 | 2017-11-24 | 济南浪潮高新科技投资发展有限公司 | A kind of method and device of image recognition |
CN108229307A (en) * | 2017-11-22 | 2018-06-29 | 北京市商汤科技开发有限公司 | For the method, apparatus and equipment of object detection |
CN108229307B (en) * | 2017-11-22 | 2022-01-04 | 北京市商汤科技开发有限公司 | Method, device and equipment for object detection |
US11222441B2 (en) | 2017-11-22 | 2022-01-11 | Beijing Sensetime Technology Development Co., Ltd. | Methods and apparatuses for object detection, and devices |
CN108062547A (en) * | 2017-12-13 | 2018-05-22 | 北京小米移动软件有限公司 | Character detecting method and device |
CN110110189A (en) * | 2018-02-01 | 2019-08-09 | 北京京东尚科信息技术有限公司 | Method and apparatus for generating information |
CN108460761A (en) * | 2018-03-12 | 2018-08-28 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN108960232A (en) * | 2018-06-08 | 2018-12-07 | Oppo广东移动通信有限公司 | Model training method, device, electronic equipment and computer readable storage medium |
US11748998B1 (en) | 2018-06-13 | 2023-09-05 | Apple Inc. | Three-dimensional object estimation using two-dimensional annotations |
US11373411B1 (en) | 2018-06-13 | 2022-06-28 | Apple Inc. | Three-dimensional object estimation using two-dimensional annotations |
CN110610184A (en) * | 2018-06-15 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Method, device and equipment for detecting salient object of image |
CN110610184B (en) * | 2018-06-15 | 2023-05-12 | 阿里巴巴集团控股有限公司 | Method, device and equipment for detecting salient targets of images |
CN108968811A (en) * | 2018-06-20 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of object identification method and system of sweeping robot |
KR20200004823A (en) * | 2018-07-02 | 2020-01-14 | 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 | Display screen peripheral circuit detection method, device, electronic device and storage medium |
CN108921840A (en) * | 2018-07-02 | 2018-11-30 | 北京百度网讯科技有限公司 | Display screen peripheral circuit detection method, device, electronic equipment and storage medium |
KR102321768B1 (en) | 2018-07-02 | 2021-11-03 | 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 | Display screen peripheral circuit detection method, apparatus, electronic device and storage medium |
JP2020530125A (en) * | 2018-07-02 | 2020-10-15 | 北京百度網訊科技有限公司 | Display screen peripheral circuit detection method, display screen peripheral circuit detection device, electronic devices and storage media |
CN109272050B (en) * | 2018-09-30 | 2019-11-22 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109272050A (en) * | 2018-09-30 | 2019-01-25 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109558791A (en) * | 2018-10-11 | 2019-04-02 | 浙江大学宁波理工学院 | It is a kind of that bamboo shoot device and method is sought based on image recognition |
CN109726741A (en) * | 2018-12-06 | 2019-05-07 | 江苏科技大学 | A kind of detection method and device of multiple target object |
CN109726741B (en) * | 2018-12-06 | 2023-05-30 | 江苏科技大学 | Method and device for detecting multiple target objects |
CN109685069A (en) * | 2018-12-27 | 2019-04-26 | 乐山师范学院 | Image detecting method, device and computer readable storage medium |
CN111461318B (en) * | 2019-01-22 | 2023-10-17 | 斯特拉德视觉公司 | Neural network operation method using grid generator and device using the same |
CN111461318A (en) * | 2019-01-22 | 2020-07-28 | 斯特拉德视觉公司 | Neural network operation method using grid generator and device using the same |
CN111597845A (en) * | 2019-02-20 | 2020-08-28 | 中科院微电子研究所昆山分所 | Two-dimensional code detection method, device and equipment and readable storage medium |
CN111639660B (en) * | 2019-03-01 | 2024-01-12 | 中科微至科技股份有限公司 | Image training method, device, equipment and medium based on convolution network |
CN111639660A (en) * | 2019-03-01 | 2020-09-08 | 中科院微电子研究所昆山分所 | Image training method, device, equipment and medium based on convolutional network |
CN109961107A (en) * | 2019-04-18 | 2019-07-02 | 北京迈格威科技有限公司 | Training method, device, electronic equipment and the storage medium of target detection model |
CN111914850A (en) * | 2019-05-07 | 2020-11-10 | 百度在线网络技术(北京)有限公司 | Picture feature extraction method, device, server and medium |
CN111914850B (en) * | 2019-05-07 | 2023-09-19 | 百度在线网络技术(北京)有限公司 | Picture feature extraction method, device, server and medium |
CN110338835A (en) * | 2019-07-02 | 2019-10-18 | 深圳安科高技术股份有限公司 | A kind of intelligent scanning stereoscopic monitoring method and system |
CN110930386A (en) * | 2019-11-20 | 2020-03-27 | 重庆金山医疗技术研究院有限公司 | Image processing method, device, equipment and storage medium |
CN110930386B (en) * | 2019-11-20 | 2024-02-20 | 重庆金山医疗技术研究院有限公司 | Image processing method, device, equipment and storage medium |
CN111353555A (en) * | 2020-05-25 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Label detection method and device and computer readable storage medium |
CN112084874A (en) * | 2020-08-11 | 2020-12-15 | 深圳市优必选科技股份有限公司 | Object detection method and device and terminal equipment |
CN112084874B (en) * | 2020-08-11 | 2023-12-29 | 深圳市优必选科技股份有限公司 | Object detection method and device and terminal equipment |
CN112446867A (en) * | 2020-11-25 | 2021-03-05 | 上海联影医疗科技股份有限公司 | Method, device and equipment for determining blood flow parameters and storage medium |
CN112785564A (en) * | 2021-01-15 | 2021-05-11 | 武汉纺织大学 | Pedestrian detection tracking system and method based on mechanical arm |
CN113935425A (en) * | 2021-10-21 | 2022-01-14 | 中国船舶重工集团公司第七一一研究所 | Object identification method, device, terminal and storage medium |
CN114739388A (en) * | 2022-04-20 | 2022-07-12 | 中国移动通信集团广东有限公司 | Indoor positioning navigation method and system based on UWB and laser radar |
Also Published As
Publication number | Publication date |
---|---|
CN106803071B (en) | 2020-02-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||