
WO2019148362A1 - Object detection method and apparatus - Google Patents

Object detection method and apparatus

Info

Publication number
WO2019148362A1
Authority
WO
WIPO (PCT)
Prior art keywords
partial image
image feature
candidate detection
region
convolution
Prior art date
Application number
PCT/CN2018/074706
Other languages
English (en)
French (fr)
Inventor
白向晖
谭志明
Original Assignee
富士通株式会社
白向晖
谭志明
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士通株式会社, 白向晖, 谭志明 filed Critical 富士通株式会社
Priority to CN201880055754.3A priority Critical patent/CN111095295B/zh
Priority to JP2020529127A priority patent/JP6984750B2/ja
Priority to PCT/CN2018/074706 priority patent/WO2019148362A1/zh
Publication of WO2019148362A1 publication Critical patent/WO2019148362A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules

Definitions

  • The present invention relates to the field of image processing technologies, and in particular to an object detection method and apparatus.
  • Object detection is an important research branch of computer vision. Its purpose is to locate every target in an image and determine the specific category of each target.
  • Traditional object detection methods generally comprise three steps: first, candidate regions are selected on the image using a sliding window; next, visual features of the candidate regions are extracted; finally, a trained classifier performs classification and recognition to obtain the object detection result.
  • In a convolutional neural network (CNN), the input is the original image.
  • The convolution kernel of a convolutional layer convolves the original image with a given size and stride to obtain a feature map, and a downsampling layer then samples the feature map, taking the maximum or the average value within a given area of the feature map. After several layers of convolution and downsampling, a classifier performs classification to obtain the object detection result.
  • Region-based CNNs (RCNN), including the faster variants Fast RCNN and Faster RCNN, still use a CNN as their basic structure. A region proposal network (RPN) is added after the last feature map of the convolutional neural network; the RPN is trained to produce candidate regions, image features are extracted from the candidate regions, and a classifier classifies those features to obtain the object detection result.
  • In these existing methods, the input image is convolved by multiple convolutional layers to obtain feature maps, the RPN determines the candidate regions, and the features within the candidate regions are then taken from the last convolutional layer, which carries more semantic information, and classified by the classifier to obtain the object detection result.
  • However, detection accuracy is low when detecting small target objects in the input image.
  • Embodiments of the invention provide an object detection method and apparatus that balance spatial resolution and semantic information when extracting partial (local) image features, and thereby improve object detection accuracy.
  • According to one aspect, an object detection method is provided, comprising:
  • performing object detection according to the second partial image feature of each candidate detection region, and outputting an object detection result.
  • According to another aspect, an object detection apparatus is provided, comprising:
  • a feature extraction unit configured to extract global image features from an input image using a plurality of convolutional layers;
  • a region recommendation unit configured to determine a plurality of candidate detection regions using the global image features and to feed information on the plurality of candidate detection regions back to the feature extraction unit, the feature extraction unit being further configured to extract, according to that information and using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers;
  • a processing unit configured to determine, from the first partial image features, a second partial image feature of each of the plurality of candidate detection regions, wherein the second partial image features of a part of the candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and
  • a detection unit configured to perform object detection according to the second partial image feature of each candidate detection region, and to output an object detection result.
  • An advantageous effect of the embodiments is that, when features are extracted, the partial image features of a part of the candidate detection regions are determined from partial image features extracted by at least two convolutional layers. This balances spatial resolution and semantic information when extracting partial image features and improves object detection accuracy.
  • FIG. 1 is a schematic diagram of the object detection apparatus in Embodiment 1;
  • FIG. 2 is a schematic diagram of the convolution operation of a convolutional layer in Embodiment 1;
  • FIG. 3 is a schematic diagram of determining candidate detection regions in Embodiment 1;
  • FIG. 5 is a schematic diagram of the object detection structure in Embodiment 2;
  • FIG. 6 is a schematic diagram of an object detection result in Embodiment 2;
  • FIG. 7 is a schematic diagram of the structure of the electronic device in Embodiment 3;
  • FIG. 8 is a schematic diagram of the hardware configuration of the electronic device in Embodiment 3.
  • Embodiment 1 provides an object detection apparatus.
  • FIG. 1 is a schematic diagram of the object detection apparatus of Embodiment 1. As shown in FIG. 1, the apparatus includes:
  • a feature extraction unit 101 configured to extract global image features from an input image using a plurality of convolutional layers;
  • a region recommendation unit 102 configured to determine a plurality of candidate detection regions using the global image features and to feed information on the plurality of candidate detection regions back to the feature extraction unit 101, the feature extraction unit 101 being further configured to extract, according to that information and using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers;
  • a processing unit 103 configured to determine, from the first partial image features, a second partial image feature of each of the plurality of candidate detection regions, wherein the second partial image features of a part of the candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and
  • a detection unit 104 configured to perform object detection according to the second partial image feature of each candidate detection region, and to output an object detection result.
  • In this embodiment, the partial image features of a part of the candidate detection regions are determined from partial image features extracted by at least two convolutional layers. This balances spatial resolution and semantic information when extracting partial image features and improves object detection accuracy.
  • The feature extraction unit 101 can be implemented with a convolutional neural network structure, using a plurality of (N) convolutional layers in that structure to extract global image features from the input image. Each convolutional layer can be regarded as a filter, and the filter parameters are called a convolution kernel. One kernel, or at least two kernels, may be set for a layer as needed, and the kernel parameters differ from layer to layer. Each layer performs a convolution operation to extract features from the image.
  • FIG. 2 is a schematic diagram of the convolution operation of one convolutional layer. As shown in FIG. 2, the image is a 5 × 5 image and the convolutional layer has a 3 × 3 convolution kernel. The kernel acts as a sliding window: it is slid across the image in sequence, and at each position its weights are multiplied by the corresponding image pixels and the products are summed, giving the extracted global image feature.
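  • The sliding-window operation described for FIG. 2 can be sketched as follows. This is a minimal illustration only, assuming a single-channel image, stride 1 and no padding; the function name `convolve2d_valid` and the pixel and kernel values are hypothetical and do not come from the patent.

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image`; at each position multiply element-wise and sum."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    feature_map = []
    for top in range(0, img_h - k_h + 1, stride):
        row = []
        for left in range(0, img_w - k_w + 1, stride):
            acc = 0
            for i in range(k_h):
                for j in range(k_w):
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 5 x 5 image whose pixel at (row, col) has the value row * 5 + col.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
# Hypothetical 3 x 3 kernel that keeps only the pixel at the window centre.
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
feature_map = convolve2d_valid(image, kernel)  # a 3 x 3 feature map
```

  • With a 5 × 5 image and a 3 × 3 kernel at stride 1, the window fits in three positions along each axis, so the resulting feature map is 3 × 3.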
  • For convenience of description, the convolutional layer that the input image passes through first is called the 1st layer, and so on, and the last convolutional layer is called the Nth layer; that is, the layers are numbered 1, 2, ..., N in sequence. A layer with a smaller number is positioned further forward, and a layer with a larger number is positioned further back. After the input image passes through the 1st convolutional layer, the extracted global image feature serves as the input of the 2nd convolutional layer, and so on through the Nth convolutional layer.
  • The convolutional layers differ in depth. A layer positioned further forward has higher spatial resolution but less semantic information than a layer positioned further back. Layers near the front are usually called shallow layers, and layers near the back are called deep layers. Kernels in shallow layers extract image features such as edges and colors, which carry little semantic information but have high spatial resolution. As the number of layers increases, the degree of nonlinearity increases, and the image features obtained by convolution may be specific shapes, for example a nose or eyes, which carry rich semantic information but have low spatial resolution.
  • The spatial resolution of the convolutional layers may decrease by a constant multiple. For example, the spatial resolution of the (W-1)th convolutional layer is 2 times that of the Wth convolutional layer (W is greater than or equal to 2 and less than or equal to N), but this embodiment is not limited thereto.
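  • The halving of spatial resolution from one layer to the next can be tabulated with a short sketch. The starting stride of 2 for the 1st layer is an assumption chosen for illustration, as are the function name `layer_resolutions` and the 256 × 256 input size; the text only states the factor of 2 between consecutive layers.

```python
def layer_resolutions(height, width, num_layers, first_stride=2, factor=2):
    """Return the (height, width) of the feature map of layers 1..num_layers."""
    resolutions = []
    stride = first_stride  # assumed total downsampling at the 1st layer
    for _ in range(num_layers):
        resolutions.append((height // stride, width // stride))
        stride *= factor  # each deeper layer has 1/factor of the resolution
    return resolutions


# Hypothetical 256 x 256 input and N = 5 layers.
resolutions = layer_resolutions(256, 256, 5)
for number, (h, w) in enumerate(resolutions, start=1):
    print(f"layer {number}: {h} x {w}")
```

  • Under these assumptions the 3rd, 4th and 5th layers have resolutions of 1/8, 1/16 and 1/32 of the input.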
  • The above is only an example of how to extract features with a convolutional neural network, and this embodiment is not limited thereto.
  • For the structure of the convolutional neural network, refer to the prior art, for example LeNet, AlexNet, ZF Net, GoogLeNet, VGGNet, ResNet and DenseNet, which are not described one by one here.
  • N can be chosen as needed, and this embodiment is not limited in this respect. For example, N may be 5.
  • The region recommendation unit 102 may be implemented with an existing RPN structure. It uses the global image features extracted by the feature extraction unit 101 to determine a plurality of candidate detection regions. The global image features extracted by any one, or by two or more, of the convolutional layers may be input to the RPN; this embodiment is not limited in this respect.
  • For example, the global image feature extracted by the Nth convolutional layer is input to the RPN to determine the candidate detection regions. Because the Nth convolutional layer carries more semantic information, the candidate detection regions can be determined more accurately.
  • For the specific implementation of the RPN, refer to the prior art; an example is given below.
  • FIG. 3 is a schematic diagram of determining a plurality of candidate detection regions from the global image feature extracted by the Nth convolutional layer. As shown in FIG. 3, sliding windows of different areas and aspect ratios, centered on each point of that global image feature, collect features within specific regions of the feature, and the features collected by the different windows are reduced to a fixed dimension.
  • From the reduced features, a classification layer gives each sliding window a score for containing a target. Windows with high scores are treated as positive samples, and windows with low scores are regarded as containing no object and are filtered out. The classification layer can determine the central anchor point of a candidate detection region together with its coordinates, width and height; another fully connected layer determines whether the candidate detection region is foreground or background.
  • The fully connected layers can also be implemented as convolutional layers.
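  • The window generation and score filtering described for FIG. 3 can be sketched as below. This is not an RPN implementation: the scores would come from a trained classification layer, which is omitted, and the names `make_anchors` and `keep_high_scores`, the areas, the aspect ratios and the 0.5 threshold are all hypothetical.

```python
import math


def make_anchors(center_x, center_y, areas, aspect_ratios):
    """Return (x_min, y_min, x_max, y_max) windows centred on one feature-map point."""
    anchors = []
    for area in areas:
        for ratio in aspect_ratios:  # ratio is width / height
            width = math.sqrt(area * ratio)
            height = width / ratio
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


def keep_high_scores(windows, scores, threshold=0.5):
    """Keep windows whose target score reaches the threshold; drop the rest."""
    return [w for w, s in zip(windows, scores) if s >= threshold]


# Two hypothetical areas and three aspect ratios give six windows per point.
anchors = make_anchors(10, 10, areas=[16, 64], aspect_ratios=[0.5, 1.0, 2.0])
# Hypothetical scores standing in for the output of the classification layer.
scores = [0.1, 0.9, 0.2, 0.7, 0.3, 0.4]
candidates = keep_high_scores(anchors, scores)
```

  • Each window keeps the requested area while its shape follows the aspect ratio, which is why one feature-map point can propose both small and large regions.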
  • The algorithm above yields a plurality of candidate detection regions from the region recommendation unit 102. They may be divided by size level into a first number (M) of region groups: the 1st region group, the 2nd region group, ..., the Mth region group. Candidate detection regions in different region groups differ in size. For example, the candidate detection regions in the 1st region group are smaller than those in the 2nd region group, those in the 2nd region group are smaller than those in the 3rd region group, and so on, with those in the (M-1)th region group smaller than those in the Mth region group.
  • Each region group includes at least one candidate detection region. M can be chosen as needed and is greater than or equal to 2. For example, the candidate detection regions may be divided by size level into three region groups: a large region group, a middle region group and a small region group. Finer divisions, such as larger, middle, smaller and ultra-small region groups, are also possible. These are only examples, and this embodiment is not limited thereto.
  • Within each region group, the second partial image features of the candidate detection regions are determined in the same way.
  • For example, with length thresholds L1 and L2 and width thresholds W1 and W2 set in advance, a candidate detection region determined by the RPN belongs to the small region group when its length and width are smaller than L1 and W1 respectively, to the large region group when its length and width are greater than L2 and W2 respectively, and to the middle region group in all other cases. This is only an example, and this embodiment is not limited thereto.
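  • The threshold rule for the three region groups can be written as a small function. The sketch below follows the rule as stated in the text; the function name `assign_region_group` and the numeric threshold values are hypothetical.

```python
def assign_region_group(length, width, l1, w1, l2, w2):
    """Classify one candidate detection region by its length and width."""
    if length < l1 and width < w1:
        return "small"
    if length > l2 and width > w2:
        return "large"
    return "middle"  # every other case


# Hypothetical thresholds (in pixels) and candidate region sizes.
L1, W1, L2, W2 = 32, 32, 96, 96
regions = [(20, 24), (120, 150), (20, 150), (64, 64)]
groups = [assign_region_group(l, w, L1, W1, L2, W2) for l, w in regions]
```

  • A region that is short but very wide, such as the third one above, meets neither the small nor the large condition and therefore falls into the middle region group.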
  • After determining the plurality of candidate detection regions, the region recommendation unit 102 may feed their information back to the feature extraction unit 101. Using that information and a predetermined number of the convolutional layers, the feature extraction unit 101 extracts first partial image features corresponding to the predetermined number of convolutional layers, where the predetermined number is greater than or equal to 2 and less than or equal to N.
  • The processing unit 103 determines, from the first partial image features, the second partial image feature of each of the plurality of candidate detection regions. The second partial image features of a part of the candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; the second partial image features of the other part of the candidate detection regions are determined using the extracted first partial image features corresponding to at least one of the predetermined number of convolutional layers.
  • The candidate detection regions in a region group with a smaller size level may be taken as the part of the candidate detection regions, and the candidate detection regions in a region group with a larger size level as the other part. The second partial image features of the candidate detection regions in the region group with the smaller size level are then determined from first partial image features corresponding to at least two convolutional layers, so that spatial resolution and semantic information are balanced when the second partial image features are obtained, and object detection accuracy, especially for small target objects, is improved.
  • In one implementation, the feature extraction unit 101 extracts, according to the information of all candidate detection regions, the first partial image features corresponding to each of the predetermined number of convolutional layers. That is, for every region group, the feature extraction unit 101 uses each of the predetermined number of convolutional layers to extract the first partial image features of the candidate detection regions in that region group. For example, if the predetermined number is Z, each of the Z convolutional layers extracts the first partial image features of the candidate detection regions in all M region groups.
  • In this implementation, the processing unit 103 selects, from the extracted first partial image features, those corresponding to at least two of the predetermined number of convolutional layers to determine the second partial image features of the part of the candidate detection regions, and those corresponding to at least one of the predetermined number of convolutional layers to determine the second partial image features of the other part. For a first region group and a second region group among the region groups, where the candidate detection regions in the first region group are smaller than those in the second region group, one of the at least two convolutional layers used for the first region group is positioned ahead of one of the at least two convolutional layers used for the second region group.
  • For example, the processing unit 103 determines the second partial image features of the candidate detection regions in the first region group from the first partial image features corresponding to the 3rd and 5th convolutional layers, and those of the candidate detection regions in the second region group from the first partial image features corresponding to the 4th and 5th convolutional layers; the 3rd convolutional layer is positioned ahead of the 4th convolutional layer.
  • The positional relationship of the other convolutional layers among the at least two convolutional layers is not limited; they may be the same or different.
  • In another implementation, the feature extraction unit 101 extracts, for each of the predetermined number of convolutional layers, first partial image features according to the information of only some of the candidate detection regions. That is, for a given region group, the feature extraction unit 101 uses only some of the predetermined number of convolutional layers to extract the first partial image features of the candidate detection regions in that region group.
  • For example, the feature extraction unit 101 extracts the first partial image features corresponding to a first predetermined convolutional layer according to the information of the candidate detection regions of the first region group, and the first partial image features corresponding to a second predetermined convolutional layer according to the information of the candidate detection regions of the second region group. The processing unit 103 determines the second partial image features of the candidate detection regions in the first region group from the first partial image features of the first predetermined convolutional layer, and those of the candidate detection regions in the second region group from the first partial image features of the second predetermined convolutional layer.
  • For example, the first predetermined convolutional layers may be the 3rd and 5th convolutional layers and the second predetermined convolutional layers may be the 4th and 5th convolutional layers: the first partial image features corresponding to the 3rd and 5th convolutional layers are extracted according to the information of the candidate detection regions of the first region group, and those corresponding to the 4th and 5th convolutional layers are extracted according to the information of the candidate detection regions of the second region group.
  • The positional relationship between the other convolutional layer of the first predetermined convolutional layers and the other convolutional layer of the second predetermined convolutional layers is not limited; they may be the same or different.
  • In the implementations above, one of the at least two convolutional layers used to determine the second partial image features of the candidate detection regions in the region group with the smaller size level is positioned ahead of one of the at least two convolutional layers used for the region group with the larger size level. Because the features extracted by a convolutional layer nearer the front have higher spatial resolution, the detection accuracy for small target objects can be further improved.
  • When the second partial image feature of a candidate detection region is determined using the extracted first partial image feature corresponding to one of the predetermined number of convolutional layers, the processing unit 103 uses that first partial image feature directly as the second partial image feature of the candidate detection region (the other part of the candidate detection regions).
  • When the second partial image feature of a candidate detection region (the part and/or the other part) is determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers, the processing unit 103 integrates the first partial image features of those convolutional layers to obtain the second partial image feature of the candidate detection region. The integration is described below.
  • In one implementation, the processing unit 103 can include:
  • a first processing module (not shown) configured to upsample the extracted first partial image features of the at least one convolutional layer positioned further back, so that their spatial resolution matches that of the extracted first partial image feature of the frontmost convolutional layer, and to add the upsampled first partial image features to the first partial image feature of the frontmost convolutional layer to obtain the second partial image feature of the candidate detection region.
  • For example, when first partial image features of Q convolutional layers are used, the first partial image features of the Q-1 convolutional layers positioned further back are each upsampled to the spatial resolution of the first partial image feature of the 1 frontmost convolutional layer, and the upsampled features of the Q-1 layers are then added to the first partial image feature of the frontmost layer.
  • For example, the spatial resolution of the first partial image feature of the frontmost convolutional layer is (H/8, W/8), and the spatial resolutions of the first partial image features of the two convolutional layers positioned further back are (H/16, W/16) and (H/32, W/32). The features at (H/16, W/16) and (H/32, W/32) are upsampled to (H/8, W/8), so that all 3 features have the same spatial resolution and can be added.
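  • The upsample-then-add integration can be sketched as follows. The text does not specify the interpolation method, so nearest-neighbour upsampling is an assumption here, as are the single-channel features, the integer scale factors, and the names `upsample_nearest` and `fuse_by_addition`.

```python
def upsample_nearest(feature, factor):
    """Repeat every value `factor` times along both axes (nearest neighbour)."""
    upsampled = []
    for row in feature:
        wide_row = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            upsampled.append(list(wide_row))
    return upsampled


def fuse_by_addition(front_feature, rear_features):
    """Upsample each rear feature to the front resolution and add element-wise."""
    fused = [list(row) for row in front_feature]
    for rear in rear_features:
        factor = len(front_feature) // len(rear)  # assumes an integer scale factor
        for i, row in enumerate(upsample_nearest(rear, factor)):
            for j, value in enumerate(row):
                fused[i][j] += value
    return fused


# Hypothetical features of one candidate region at three resolutions,
# standing in for (H/8, W/8), (H/16, W/16) and (H/32, W/32).
front = [[1] * 4 for _ in range(4)]
middle = [[1, 2], [3, 4]]
rear = [[10]]
second_feature = fuse_by_addition(front, [middle, rear])
```

  • The fused feature keeps the resolution of the frontmost layer while every position also receives the value of the deeper, more semantic features that cover it.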
  • In another implementation, the processing unit 103 can include:
  • a second processing module configured to apply expansion processing to the extracted first partial image features of the at least one convolutional layer positioned further back, so that their spatial resolution matches that of the extracted first partial image feature of the frontmost convolutional layer, and to apply superposition and convolution to the expanded first partial image features and the first partial image feature of the frontmost convolutional layer to obtain the second partial image feature of the candidate detection region.
  • For example, when first partial image features of Q convolutional layers are used, the first partial image features of the Q-1 convolutional layers positioned further back are each expanded to the spatial resolution of the first partial image feature of the 1 frontmost convolutional layer, and the expanded features of the Q-1 layers are then superimposed with the first partial image feature of the frontmost layer and convolved.
  • For example, the spatial resolution of the first partial image feature of the frontmost convolutional layer is (H/8, W/8), and the spatial resolutions of the first partial image features of the two convolutional layers positioned further back are (H/16, W/16) and (H/32, W/32). The first partial image features in the candidate detection regions at (H/16, W/16) and (H/32, W/32) are expanded so that their spatial resolution increases to (H/8, W/8); the features of the 3 convolutional layers then have the same spatial resolution and can be superimposed and convolved.
  • Expansion processing means enlarging the original candidate detection region about its center point so that more first partial image features are extracted. The convolution may be performed by a new convolutional layer, distinct from the plurality of convolutional layers, which reduces the dimension of the superimposed features.
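  • One possible reading of the expansion and superposition-convolution steps is sketched below. Two things are assumptions rather than statements of the patent: that superposition means stacking the equally sized features as channels, and that the new convolutional layer can be illustrated by a 1 × 1 kernel that mixes the channels at each position. The names `expand_box` and `stack_and_reduce` and all values are hypothetical.

```python
def expand_box(box, factor):
    """Scale an (x_min, y_min, x_max, y_max) region about its centre point."""
    x_min, y_min, x_max, y_max = box
    center_x, center_y = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * factor / 2
    half_h = (y_max - y_min) * factor / 2
    return (center_x - half_w, center_y - half_h,
            center_x + half_w, center_y + half_h)


def stack_and_reduce(feature_maps, weights):
    """Treat equally sized maps as channels and mix them per position (1 x 1 kernel)."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fm[i][j] for w, fm in zip(weights, feature_maps))
             for j in range(cols)]
            for i in range(rows)]


# A region covering 4 x 4 cells on a deeper map is doubled about its centre so
# that it covers 8 x 8 cells, matching the same region on the frontmost map.
expanded = expand_box((4, 4, 8, 8), 2)
# Hypothetical features of two layers at equal size, reduced to one channel.
reduced = stack_and_reduce([[[1, 2]], [[3, 4]]], weights=[0.5, 0.5])
```

  • Unlike upsampling, expansion gathers additional surrounding features from the deeper layer instead of interpolating the existing ones.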
  • The processing unit 103 may include the first processing module or the second processing module, or both. For example, candidate detection regions in a region group with a smaller size level are processed by the second processing module, and candidate detection regions in a region group with a larger size level are processed by the first processing module, but this embodiment is not limited thereto. Here, "smaller" and "larger" describe the relative comparison between the candidate detection regions of two region groups.
  • For example, with the candidate detection regions divided into large, middle and small region groups, the feature extraction unit 101 may extract the first partial image feature of the rearmost convolutional layer according to the information of the candidate detection regions in the large region group; the first partial image features of the rearmost and the second-to-last convolutional layers according to the information of the candidate detection regions in the middle region group; and the first partial image features of the rearmost and the third-to-last convolutional layers according to the information of the candidate detection regions in the small region group.
  • The processing unit 103 determines the second partial image features of the candidate detection regions in the large region group from the extracted first partial image feature of the rearmost convolutional layer. For the other region groups, the first partial image feature extracted by the rearmost convolutional layer is upsampled and added to the first partial image feature of the other extracted convolutional layer.
  • For example, the information of the candidate detection regions in the large region group is fed back to the 5th convolutional layer (one convolutional layer), and the first partial image feature corresponding to the 5th convolutional layer is extracted and used as the second partial image feature of the candidate detection regions in the large region group (the other part of the candidate detection regions). The information of the candidate detection regions in the middle region group is fed back to the 4th and 5th convolutional layers, and the first partial image features corresponding to the 4th and 5th convolutional layers are extracted. The first partial image feature corresponding to the 5th convolutional layer is upsampled to the same spatial resolution as the first partial image feature corresponding to the 4th convolutional layer, and the two are added (two convolutional layers are used in this example, but at least two may be used) to determine the second partial image feature of the candidate detection regions in the middle region group (a part of the candidate detection regions).
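  • The assignment of convolutional layers to region groups in this example can be summarised as a lookup table. The sketch assumes N = 5, so that the rearmost, second-to-last and third-to-last layers are the 5th, 4th and 3rd; the names `LAYERS_BY_GROUP` and `feature_plan` are hypothetical.

```python
# Layers whose first partial image features are used for each region group,
# assuming N = 5 convolutional layers numbered from front (1) to rear (5).
LAYERS_BY_GROUP = {
    "large": (5,),     # rearmost layer only
    "middle": (4, 5),  # second-to-last and rearmost layers
    "small": (3, 5),   # third-to-last and rearmost layers
}


def feature_plan(group):
    """Describe how the second partial image feature of a group is obtained."""
    layers = LAYERS_BY_GROUP[group]
    return {
        "layers": layers,
        # A single layer is used directly; two or more must be integrated.
        "needs_integration": len(layers) >= 2,
        # The frontmost selected layer sets the target spatial resolution.
        "target_layer": min(layers),
    }


plans = {group: feature_plan(group) for group in LAYERS_BY_GROUP}
```

  • Smaller regions are paired with a layer nearer the front, which is what gives them a higher-resolution target for the integrated feature.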
  • the detecting unit 104 may perform object detection based on the RCNN structure.
  • a first number of detection results may be obtained respectively from the second partial image features of the candidate detection regions corresponding to the first number of region groups, and the first number of detection results are added to output the object detection result.
  • the same number of RCNNs as the first number may be set; each RCNN separately performs object detection on the second partial image features of the candidate detection regions in one region group, and the recognition results of the RCNNs are added to output the object detection result.
  • the detection result is object 3; the detection results may also include the locations of objects 1, 2, and 3, and the final object detection result is that objects 1, 2, and 3 are present in the input image.
  • the specific implementation manner of the RCNN may refer to the prior art.
  • features are extracted from the second partial image feature by region-of-interest pooling (ROI pooling) and input to a classifier to obtain the object category within the candidate detection region, which achieves object detection and localization; the details are not repeated here.
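The ROI pooling step mentioned above can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as nested lists, integer end-exclusive ROI bounds, and max pooling into a fixed output grid. The names `roi_max_pool` and `ceil_div` and the example values are hypothetical and are not taken from the patent, which defers the RCNN details to the prior art.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool the region roi = (r0, c0, r1, c1) of a 2D map into out_h x out_w bins."""
    r0, c0, r1, c1 = roi
    height, width = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Bin boundaries use floor for the start and ceiling for the end,
        # so every bin covers at least one cell.
        row_start = r0 + (i * height) // out_h
        row_end = r0 + ceil_div((i + 1) * height, out_h)
        pooled_row = []
        for j in range(out_w):
            col_start = c0 + (j * width) // out_w
            col_end = c0 + ceil_div((j + 1) * width, out_w)
            pooled_row.append(max(
                feature[r][c]
                for r in range(row_start, row_end)
                for c in range(col_start, col_end)
            ))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map with values 0..15 in row-major order.
feature = [[4 * r + c for c in range(4)] for r in range(4)]
whole = roi_max_pool(feature, (0, 0, 4, 4), 2, 2)
crop = roi_max_pool(feature, (1, 1, 4, 4), 2, 2)
```

Whatever the size of the candidate detection region, the output grid has a fixed size, which is what allows regions of different sizes to feed the same classifier.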
  • as can be seen from the above embodiment, when features are extracted, the partial image features of a part of the plurality of candidate detection regions are determined using partial image features extracted by at least two convolutional layers, so that spatial resolution and semantic information are balanced when the partial image features are extracted, which improves object detection accuracy.
  • the second embodiment provides an object detection method.
  • the principle by which the method solves the problem is similar to that of the device in Embodiment 1. Its specific implementation may therefore refer to the implementation of the device in Embodiment 1, and identical content is not repeated.
  • FIG. 4 is a flowchart of an object detecting method according to Embodiment 2. As shown in FIG. 4, the method includes:
  • Step 401: extract a global image feature from the input image by using a plurality of convolutional layers;
  • Step 402: determine a plurality of candidate detection regions by using the global image feature;
  • Step 403: extract, according to information of the plurality of candidate detection regions and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers;
  • Step 404: determine, according to the first partial image features, a second partial image feature of each of the plurality of candidate detection regions, wherein the second partial image features of a part of the plurality of candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers;
  • Step 405: perform object detection according to the second partial image feature of each candidate detection region, and output an object detection result.
  • for the specific implementation of steps 401-405, refer to the object detecting apparatus 100 in Embodiment 1; identical content is not repeated.
  • among the plurality of convolutional layers, an earlier-positioned convolutional layer has a higher spatial resolution and less semantic information than a later-positioned convolutional layer.
  • the second partial image features of another part of the plurality of candidate detection regions are determined using the extracted first partial image feature corresponding to at least one of the predetermined number of convolutional layers.
  • each of the plurality of candidate detection regions belongs to one of a first number of region groups having different region size levels; for a first region group and a second region group among the first number of region groups:
  • in step 403, a first partial image feature corresponding to a first predetermined convolutional layer is extracted according to the information of the candidate detection regions of the first region group, and a first partial image feature corresponding to a second predetermined convolutional layer is extracted according to the information of the candidate detection regions of the second region group, wherein the position of one convolutional layer of the first predetermined convolutional layers is earlier than the position of one convolutional layer of the second predetermined convolutional layers,
  • and the candidate detection regions in the first region group are smaller than those in the second region group; in step 404, the second partial image feature of the candidate detection regions in the first region group is determined according to the first partial image feature of the first predetermined convolutional layer, and the second partial image feature of the candidate detection regions in the second region group is determined according to the first partial image feature of the second predetermined convolutional layer.
  • in step 404, the position of one of the at least two convolutional layers used to determine the second partial image feature of the candidate detection regions in the first region group is earlier than the position of one of the at least two convolutional layers used to determine the second partial image feature of the candidate detection regions in the second region group, wherein the candidate detection regions in the first region group are smaller than those in the second region group.
  • when the second partial image feature of a candidate detection region is determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers, determining the second partial image feature of each of the plurality of candidate detection regions includes: upsampling the extracted first partial image feature corresponding to at least one later-positioned convolutional layer so that its spatial resolution is the same as that of the extracted first partial image feature corresponding to the earliest-positioned convolutional layer, and adding the processed first partial image feature of the at least one later-positioned convolutional layer to the extracted first partial image feature of the earliest-positioned convolutional layer to obtain the second partial image feature corresponding to the candidate detection region.
  • when the second partial image feature of a candidate detection region is determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers, determining the second partial image feature of each of the plurality of candidate detection regions includes: expanding the extracted first partial image feature corresponding to at least one later-positioned convolutional layer so that its spatial resolution is the same as that of the extracted first partial image feature corresponding to the earliest-positioned convolutional layer, and performing superposition convolution on the processed first partial image feature of the later-positioned convolutional layer and the extracted first partial image feature of the earliest-positioned convolutional layer to obtain the second partial image feature corresponding to the candidate detection region.
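The superposition-and-convolution step can be sketched as channel stacking followed by a new convolution that reduces the channel dimension. The sketch assumes the later-layer feature has already been expanded to the same spatial size as the earlier-layer feature, and it uses a 1×1 kernel. The kernel size, the weights, and the helper names `stack_channels` and `conv1x1` are assumptions, because the source states only that a new convolutional layer reduces the dimension of the stacked features.

```python
def stack_channels(first, second):
    """Concatenate two features of shape [channels][rows][cols] along the channel axis."""
    assert len(first[0]) == len(second[0]), "row counts must match"
    assert len(first[0][0]) == len(second[0][0]), "column counts must match"
    return first + second


def conv1x1(feature, weights):
    """Apply a 1x1 convolution; weights has shape [out_channels][in_channels]."""
    rows, cols = len(feature[0]), len(feature[0][0])
    output = []
    for channel_weights in weights:
        output.append([
            [sum(w * feature[c][i][j] for c, w in enumerate(channel_weights))
             for j in range(cols)]
            for i in range(rows)
        ])
    return output


# Hypothetical single-channel 2x2 features from an earlier and a later layer.
early_feature = [[[1, 2], [3, 4]]]
late_feature = [[[10, 20], [30, 40]]]
stacked = stack_channels(early_feature, late_feature)   # 2 channels
reduced = conv1x1(stacked, weights=[[0.5, 0.5]])         # back to 1 channel
```

Unlike element-wise addition, stacking keeps the two sources separate until the new layer mixes them with its own weights.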
  • the first number of region groups includes a large region group, a middle region group, and a small region group, and determining the second partial image feature of each of the plurality of candidate detection regions according to the first partial image features includes: determining the second partial image feature of the candidate detection regions in the large region group using the extracted first partial image feature corresponding to the last-positioned convolutional layer of the plurality of convolutional layers; upsampling the extracted first partial image feature corresponding to the last-positioned convolutional layer and adding it to the extracted first partial image feature corresponding to the second-to-last convolutional layer to determine the second partial image feature of the candidate detection regions in the middle region group; and expanding the extracted first partial image feature corresponding to the last-positioned convolutional layer and performing superposition convolution on it with the extracted first partial image feature corresponding to the third-to-last convolutional layer to determine the second partial image feature of the candidate detection regions in the small region group.
  • in step 405, a first number of detection results are obtained respectively according to the second partial image features of the candidate detection regions corresponding to the first number of region groups, and the first number of detection results are added to output the object detection result.
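As a small illustration of "adding" the per-group detection results, the sketch below simply concatenates the detections produced for each region group. It assumes each detection is a (label, box) pair; the labels, box coordinates, and the name `merge_detections` are hypothetical, and the source does not describe any further post-processing such as duplicate suppression.

```python
def merge_detections(per_group_results):
    """Combine the detection lists of all region groups into one result list."""
    merged = []
    for group_results in per_group_results:
        merged.extend(group_results)
    return merged


# Hypothetical outputs of three per-group detectors: (label, (x1, y1, x2, y2)).
large_group = [("object 1", (40, 30, 200, 180))]
middle_group = [("object 2", (210, 60, 280, 120))]
small_group = [("object 3", (300, 20, 318, 41))]
final_result = merge_detections([large_group, middle_group, small_group])
```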
  • the global image feature can be extracted, and the RPN determines three region groups according to the global image feature, namely large boxes, medium boxes, and small boxes.
  • the information of the large boxes is fed back to conv5 to extract the first partial image feature of the large boxes, which is used as the second partial image feature of the candidate detection regions in the large boxes and is output directly to RCNN1.
  • the information of the medium boxes is fed back to conv4 and conv5, where the first partial image features are extracted respectively; the first partial image feature of conv5 is upsampled so that its spatial resolution is the same as that of the first partial image feature of conv4, and the two are added to obtain the second partial image feature of the candidate detection regions in the medium boxes, which is output to RCNN2.
  • the information of the small boxes is fed back to conv3 and conv5, where the first partial image features are extracted respectively; the first partial image feature of conv5 is expanded so that its spatial resolution is the same as that of the first partial image feature of conv3, and after the two are stacked they pass through a new convolutional layer convx, which reduces the dimension, to obtain the second partial image feature of the candidate detection regions in the small boxes, which is output to RCNN3.
  • RCNN1, RCNN2, and RCNN3 respectively classify and detect the second partial image features of the candidate detection regions in the large region group, the middle region group, and the small region group, and obtain their respective detection results.
  • the final object detection result is output, including objects 1 and 2 and their positions.
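The routing described for FIG. 5 can be summarized as a lookup table. The layer names conv3, conv4, conv5 and the heads RCNN1 to RCNN3 come from the text above; the dictionary layout, the fusion labels, and the helper `layers_needed` are illustrative assumptions only.

```python
# Which layers each region group is fed back to, how the features are fused,
# and which detector receives the result (per the FIG. 5 description).
ROUTING = {
    "large": {"layers": ("conv5",), "fusion": "use_directly", "head": "RCNN1"},
    "medium": {"layers": ("conv4", "conv5"), "fusion": "upsample_then_add", "head": "RCNN2"},
    "small": {"layers": ("conv3", "conv5"), "fusion": "expand_stack_then_convx", "head": "RCNN3"},
}


def layers_needed(groups):
    """Return the sorted set of layers that must extract features for the given groups."""
    needed = set()
    for group in groups:
        needed.update(ROUTING[group]["layers"])
    return sorted(needed)


small_plan = ROUTING["small"]
all_layers = layers_needed(["large", "medium", "small"])
```

The table makes the design choice visible: the smaller the boxes, the earlier the extra layer that is paired with conv5.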
  • FIG. 6 is a schematic diagram of the object detection result in the embodiment.
  • the second partial image features of the candidate detection regions having different size levels are input into different RCNNs, and the respective recognition results are all persons; for example, one RCNN identifies the persons near the elevator or far away in the square (whose candidate detection regions are small), and another RCNN identifies the persons nearby in the square (whose candidate detection regions are larger). The final object detection result is then output, including all persons in the input image and their locations.
  • as can be seen from the above embodiment, when features are extracted, the partial image features of a part of the plurality of candidate detection regions are determined using partial image features extracted by at least two convolutional layers, so that spatial resolution and semantic information are balanced when the partial image features are extracted, which improves object detection accuracy.
  • FIG. 7 is a schematic diagram of the electronic device of the third embodiment.
  • the electronic device 700 includes the object detecting device 100 described in Embodiment 1, and the structure of the object detecting device 100 will not be described again.
  • the third embodiment of the present invention provides an electronic device.
  • the principle by which the electronic device solves the problem is similar to that of the method in Embodiment 2. Its specific implementation may therefore refer to the implementation of the method in Embodiment 2, and identical content is not repeated.
  • Fig. 8 is a schematic block diagram showing the system configuration of an electronic apparatus according to a third embodiment of the present invention.
  • electronic device 800 can include a central processor 801 and a memory 802; the memory 802 is coupled to the central processor 801.
  • the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
  • the electronic device 800 may further include: an input unit 803, a display 804, and a power source 805.
  • the functions of the object detecting device described in Embodiment 1 may be integrated into the central processing unit 801.
  • the central processing unit 801 can be configured to: extract a global image feature from the input image by using a plurality of convolutional layers; determine a plurality of candidate detection regions by using the global image feature; extract, according to the information of the plurality of candidate detection regions and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers; determine a second partial image feature of each of the plurality of candidate detection regions according to the first partial image features, wherein the second partial image features of a part of the plurality of candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and perform object detection according to the second partial image feature of each candidate detection region and output the object detection result.
  • among the plurality of convolutional layers, an earlier-positioned convolutional layer has a higher spatial resolution and less semantic information than a later-positioned convolutional layer.
  • the second partial image features of another part of the plurality of candidate detection regions are determined using the extracted first partial image feature corresponding to at least one of the predetermined number of convolutional layers.
  • each of the plurality of candidate detection regions belongs to one of a first number of region groups having different region size levels; for a first region group and a second region group among the first number of region groups:
  • the central processing unit 801 may be configured to: extract, according to the information of the candidate detection regions of the first region group, a first partial image feature corresponding to a first predetermined convolutional layer, and extract, according to the information of the candidate detection regions of the second region group, a first partial image feature corresponding to a second predetermined convolutional layer, wherein the position of one convolutional layer of the first predetermined convolutional layers is earlier than the position of one convolutional layer of the second predetermined convolutional layers, and the candidate detection regions in the first region group are smaller than those in the second region group.
  • the central processing unit 801 is further configured to: determine, according to the first partial image feature of the first predetermined convolution layer, a second partial image feature of the candidate detection region in the first region group, according to the second predetermined convolution A first partial image feature of the layer determines a second partial image feature of the candidate detection region in the second set of regions.
  • the central processing unit 801 can be configured such that the position of one of the at least two convolutional layers used to determine the second partial image feature of the candidate detection regions in the first region group is earlier than the position of one of the at least two convolutional layers used to determine the second partial image feature of the candidate detection regions in the second region group, wherein the candidate detection regions in the first region group are smaller than those in the second region group.
  • the central processing unit 801 can be configured to: upsample the extracted first partial image feature corresponding to at least one later-positioned convolutional layer so that its spatial resolution is the same as that of the extracted first partial image feature corresponding to the earliest-positioned convolutional layer, and add the processed first partial image feature of the at least one later-positioned convolutional layer to the extracted first partial image feature of the earliest-positioned convolutional layer to obtain the second partial image feature corresponding to the candidate detection region.
  • the central processing unit 801 can be configured to: expand the extracted first partial image feature corresponding to at least one later-positioned convolutional layer so that its spatial resolution is the same as that of the extracted first partial image feature corresponding to the earliest-positioned convolutional layer, and perform superposition convolution on the processed first partial image feature of the later-positioned convolutional layer and the extracted first partial image feature of the earliest-positioned convolutional layer to obtain the second partial image feature corresponding to the candidate detection region.
  • the first number of region groups includes a large region group, a middle region group, and a small region group, and the central processing unit 801 can be configured to: determine the second partial image feature of the candidate detection regions in the large region group using the extracted first partial image feature corresponding to the last-positioned convolutional layer of the plurality of convolutional layers; upsample the extracted first partial image feature corresponding to the last-positioned convolutional layer and add it to the extracted first partial image feature corresponding to the second-to-last convolutional layer to determine the second partial image feature of the candidate detection regions in the middle region group; and expand the extracted first partial image feature corresponding to the last-positioned convolutional layer and perform superposition convolution on it with the extracted first partial image feature corresponding to the third-to-last convolutional layer to determine the second partial image feature of the candidate detection regions in the small region group.
  • the central processing unit 801 may be configured to: obtain a first number of detection results respectively according to the second partial image features of the candidate detection regions corresponding to the first number of region groups, and add the first number of detection results to output the object detection result.
  • the object detecting apparatus 100 described in Embodiment 1 may be configured separately from the central processing unit 801.
  • the object detecting apparatus 100 may be a chip connected to the central processing unit 801, and the object detecting apparatus 100 operates under the control of the central processing unit 801.
  • the electronic device 800 also does not have to include all of the components shown in FIG. 8 in this embodiment.
  • the central processing unit 801 can include a microprocessor or other processor device and/or logic device that receives input and controls the operation of the various components of the electronic device 800.
  • the memory 802 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device.
  • the central processing unit 801 can execute the program stored by the memory 802 to implement information storage or processing and the like.
  • the functions of other components are similar to those of the existing ones and will not be described here.
  • the various components of the electronic device 800 can be implemented by special purpose hardware, firmware, software, or a combination thereof without departing from the scope of the invention.
  • as can be seen from the above embodiment, when features are extracted, the partial image features of a part of the plurality of candidate detection regions are determined using partial image features extracted by at least two convolutional layers, so that spatial resolution and semantic information are balanced when the partial image features are extracted, which improves object detection accuracy.
  • the embodiment of the present invention also provides a computer readable program, wherein when the program is executed in the object detecting device, the program causes the computer to execute the object detecting method as in Embodiment 2 above in the object detecting device.
  • the embodiment of the present invention further provides a storage medium storing a computer readable program, wherein the computer readable program causes the computer to execute the object detecting method in Embodiment 2 above in the object detecting device.
  • the above apparatus and method of the present invention may be implemented by hardware or by hardware in combination with software.
  • the present invention relates to a computer readable program that, when executed by a logic component, enables the logic component to implement the apparatus or components described above, or to carry out the various methods or steps described above.
  • the present invention also relates to a storage medium for storing the above program, such as a hard disk, a magnetic disk, an optical disk, a DVD, a flash memory, or the like.
  • the object detecting method performed in the object detecting apparatus described in connection with the embodiment of the present invention may be directly embodied as hardware, a software module executed by the processor, or a combination of both.
  • one or more of the functional block diagrams shown in FIG. 1 and/or one or more combinations of functional block diagrams may correspond to various software modules of a computer program flow, or to individual hardware modules.
  • These software modules may correspond to the respective steps shown in FIG. 2, respectively.
  • These hardware modules can be implemented, for example, by hard-wiring these software modules into a field programmable gate array (FPGA).
  • the software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium can be coupled to the processor to enable the processor to read information from, and write information to, the storage medium; or the storage medium can be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • the software module can be stored in the memory of the object detecting device or in a memory card insertable into the object detecting device.
  • One or more of the functional blocks described with respect to FIG. 1 and/or one or more combinations of functional blocks may be implemented as a general purpose processor, digital signal processor (DSP), dedicated for performing the functions described herein.
  • One or more of the functional blocks described with respect to FIG. 1 and/or one or more combinations of functional blocks may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

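An object detection method and apparatus, wherein the method includes: extracting a global image feature from an input image by using a plurality of convolutional layers; determining a plurality of candidate detection regions by using the global image feature; extracting, according to the information of the candidate detection regions and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers; determining a second partial image feature of each of the plurality of candidate detection regions according to the first partial image features, wherein the second partial image features of a part of the plurality of candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and performing object detection according to the second partial image feature of each candidate detection region and outputting an object detection result. In this way, spatial resolution and semantic information can be balanced when partial image features are extracted, which improves object detection accuracy.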

Description

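Object Detection Method and Apparatus
Technical Field
The present invention relates to the field of image processing technologies, and in particular to an object detection method and apparatus.
Background
Target detection is an important research branch of computer vision. Its purpose is to find the locations of all targets in an image and to determine the specific category of each target. Traditional target detection methods generally consist of three steps: first, a sliding window is used to select candidate regions on the image; next, visual features of these candidate regions are extracted; finally, a trained classifier performs classification to obtain the target detection result.
In recent years, deep learning has been widely applied in computer vision. Compared with traditional machine learning algorithms, deep learning has unmatched advantages in feature extraction. The convolutional neural network (CNN) is an important deep learning algorithm. Its input is the original image; the convolution kernels of a convolutional layer convolve the original image with a certain size and stride to obtain a feature map; a downsampling layer samples the feature map, taking the maximum or average value within a certain region of the feature map; and after multiple layers of convolution and downsampling, the result is passed to a classifier for classification to obtain the target detection result.
It should be noted that the above introduction to the technical background is given only to facilitate a clear and complete description of the technical solutions of the present invention and to aid the understanding of those skilled in the art. These solutions should not be considered known to those skilled in the art merely because they are set forth in the background section of the present invention.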
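Summary of the Invention
Since deep learning was applied to target detection, its architectures have been continuously developed and improved, from CNN to region-based CNN (RCNN), and from RCNN further to faster variants such as Fast RCNN and Faster RCNN.
The basic structure of Faster RCNN is still a CNN. A region proposal network (RPN) is added after the feature map of the last layer of the convolutional neural network; candidate regions are obtained through training of the RPN, image features in the candidate regions are extracted, and a classifier performs classification to obtain the target detection result.
In existing target detection methods, a plurality of convolutional layers first convolve the input image to obtain feature maps, the RPN then determines candidate regions, and the features within the candidate regions are extracted by going back to the last convolutional layer, which has more semantic information; a classifier then performs classification to obtain the target detection result. However, because the spatial resolution of the last convolutional layer is low, the detection accuracy of this approach is low for small target objects in the input image.
Embodiments of the present invention provide an object detection method and apparatus that can balance spatial resolution and semantic information when extracting partial image features, which improves object detection accuracy.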
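The above object of the embodiments of the present invention is achieved by the following technical solutions:
According to a first aspect of the embodiments of the present invention, an object detection method is provided, the method including:
extracting a global image feature from an input image by using a plurality of convolutional layers;
determining a plurality of candidate detection regions by using the global image feature;
extracting, according to information of the plurality of candidate detection regions and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers;
determining a second partial image feature of each of the plurality of candidate detection regions according to the first partial image features, wherein the second partial image features of a part of the plurality of candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and
performing object detection according to the second partial image feature of each candidate detection region, and outputting an object detection result.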
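According to a second aspect of the embodiments of the present invention, an object detecting device is provided, the device including:
a feature extraction unit configured to extract a global image feature from an input image by using a plurality of convolutional layers;
a region proposal unit configured to determine a plurality of candidate detection regions by using the global image feature and to feed the information of the plurality of candidate detection regions back to the feature extraction unit, the feature extraction unit being further configured to extract, according to the information and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers;
a processing unit configured to determine a second partial image feature of each of the plurality of candidate detection regions according to the first partial image features, wherein the second partial image features of a part of the plurality of candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and
a detecting unit configured to perform object detection according to the second partial image feature of each candidate detection region and to output an object detection result.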
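An advantageous effect of the embodiments of the present invention is that, with the method and apparatus of the embodiments, when features are extracted, the partial image features of a part of the plurality of candidate detection regions are determined using partial image features extracted by at least two convolutional layers, so that spatial resolution and semantic information can be balanced when partial image features are extracted, which improves object detection accuracy.
With reference to the following description and drawings, specific embodiments of the present invention are disclosed in detail, indicating the ways in which the principles of the present invention may be employed. It should be understood that the embodiments of the present invention are not thereby limited in scope. Within the scope of the terms of the appended claims, the embodiments of the present invention include many changes, modifications, and equivalents.
Features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
It should be emphasized that the term "comprise/include", when used herein, refers to the presence of a feature, integer, step, or component, but does not exclude the presence or addition of one or more other features, integers, steps, or components.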
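Brief Description of the Drawings
Elements and features described in one drawing or one implementation of the embodiments of the present invention may be combined with elements and features shown in one or more other drawings or implementations. In addition, in the drawings, like reference numerals denote corresponding parts in several drawings and may be used to indicate corresponding parts used in more than one implementation.
The included drawings are provided for further understanding of the embodiments of the present invention; they form a part of the specification, illustrate implementations of the present invention, and together with the text explain the principles of the present invention. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of the object detecting device in Embodiment 1;
FIG. 2 is a schematic diagram of the convolution operation of one convolutional layer in Embodiment 1;
FIG. 3 is a schematic diagram of determining candidate detection regions in Embodiment 1;
FIG. 4 is a flowchart of the object detection method in Embodiment 2;
FIG. 5 is a schematic diagram of the object detection structure in Embodiment 2;
FIG. 6 is a schematic diagram of the object detection result in Embodiment 2;
FIG. 7 is a schematic diagram of the configuration of the electronic device in Embodiment 3;
FIG. 8 is a schematic diagram of the hardware configuration of the electronic device in Embodiment 3.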
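Detailed Description
The foregoing and other features of the present invention will become apparent from the following description with reference to the drawings. The description and drawings specifically disclose particular embodiments of the present invention, which show some of the embodiments in which the principles of the present invention may be employed. It should be understood that the present invention is not limited to the described embodiments; rather, it includes all modifications, variations, and equivalents falling within the scope of the appended claims. Various embodiments of the present invention are described below with reference to the drawings. These embodiments are merely illustrative and do not limit the present invention.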
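Embodiment 1
Embodiment 1 provides an object detecting device.
FIG. 1 is a schematic diagram of the object detecting device of Embodiment 1. As shown in FIG. 1, the device includes:
a feature extraction unit 101 configured to extract a global image feature from an input image by using a plurality of convolutional layers;
a region proposal unit 102 configured to determine a plurality of candidate detection regions by using the global image feature and to feed the information of the plurality of candidate detection regions back to the feature extraction unit 101, the feature extraction unit 101 being further configured to extract, according to the information and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers;
a processing unit 103 configured to determine a second partial image feature of each of the plurality of candidate detection regions according to the first partial image features, wherein the second partial image features of a part of the plurality of candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers; and
a detecting unit 104 configured to perform object detection according to the second partial image feature of each candidate detection region and to output an object detection result.
As can be seen from the above embodiment, when features are extracted, the partial image features of a part of the plurality of candidate detection regions are determined using partial image features extracted by at least two convolutional layers, so that spatial resolution and semantic information can be balanced when partial image features are extracted, which improves object detection accuracy.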
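In this embodiment, the feature extraction unit 101 may be implemented with a convolutional neural network structure, using a plurality of (N) convolutional layers of that structure to extract the global image feature from the input image. Each convolutional layer can be regarded as a filter whose parameters are called a convolution kernel; one kernel or at least two kernels may be set as needed, and the filter parameters of each of the plurality of convolutional layers are different. The input image is converted into two-dimensional image data and input to the convolutional layers (filters), and the features in the image are extracted through convolution operations.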
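FIG. 2 is a schematic diagram of the convolution operation of one convolutional layer. As shown in FIG. 2, the image is a 5×5 image, and the convolutional layer corresponds to a 3×3 convolution kernel (given in the original as the image PCTCN2018074706-appb-000001). The kernel is treated as a sliding window that slides over the image step by step; at each position it is multiplied element-wise with the corresponding image pixels and the products are summed, which yields the extracted global image feature.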
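The sliding-window operation described for FIG. 2 can be sketched in a few lines. The sketch assumes stride 1 and no padding, and the 5×5 image and 3×3 kernel values are hypothetical, since the kernel of FIG. 2 is only available as an image. The function name `conv2d_valid` is also an assumption.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 image and 3x3 kernel (not the values shown in FIG. 2).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = conv2d_valid(image, kernel)  # 3x3 feature map
```

With a 5×5 input and a 3×3 kernel, the window fits in 3×3 positions, so the output is a 3×3 feature map.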
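In this embodiment, for convenience of description, the convolutional layer that the input image passes through first is called the 1st layer, and so on, and the layer it passes through last is called the Nth layer; that is, the convolutional layers are numbered 1, 2, ..., N in order, where a layer with a smaller number is an earlier-positioned layer and a layer with a larger number is a later-positioned layer. After the input image passes through the 1st convolutional layer, the extracted global image feature is used as the input of the 2nd convolutional layer, and so on, until the Nth convolutional layer. The convolutional layers differ in depth: an earlier-positioned layer has a higher spatial resolution and less semantic information than a later-positioned layer. Earlier-positioned layers are usually called shallow layers and later-positioned layers are called deep layers. Convolution kernels in shallow layers can extract image features such as edges and colors, with little semantic information but high spatial resolution. As the layers deepen, the degree of non-linearity increases, and the image features obtained by convolution may be specific shapes, such as a nose or eyes, with more semantic information but lower spatial resolution. The spatial resolution of the convolutional layers may decrease by a fixed multiple from layer to layer; for example, the spatial resolution of the (W-1)th convolutional layer is twice that of the Wth convolutional layer (W is greater than or equal to 2 and less than or equal to N), but this embodiment is not limited thereto.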
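The above is only an exemplary description of how to extract features with a convolutional neural network, and this embodiment is not limited thereto. For the structure of the convolutional neural network, reference may be made to the prior art, for example LeNet, AlexNet, ZF Net, GoogLeNet, VGGNet, ResNet, or DenseNet; further examples are not listed here.
In this embodiment, the value of N may be determined as needed and is not limited by this embodiment; for example, N may be 5.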
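In this embodiment, the region proposal unit 102 may be implemented with an existing RPN structure, which uses the global image feature extracted by the feature extraction unit 101 to determine a plurality of candidate detection regions. The global image feature extracted by any one, or any two or more, of the plurality of convolutional layers may be input to the RPN, and this embodiment is not limited in this respect. For example, the global image feature extracted by the Nth convolutional layer is input to the RPN to determine the candidate detection regions; because the Nth convolutional layer has more semantic information, the candidate detection regions can be determined more accurately. For the specific implementation of the RPN, reference may be made to the prior art; an example is given below.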
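FIG. 3 is a schematic diagram of determining a plurality of candidate detection regions using the global image feature extracted by the Nth convolutional layer. As shown in FIG. 3, with each point on the global image feature extracted by the Nth convolutional layer as a center, sliding windows of different areas and aspect ratios are used to collect the features within specific regions of the global image feature, and the features collected by the different windows are reduced to a fixed dimension. Based on the reduced features, a classification layer gives, for each sliding window, a score for containing a target: windows with high scores are taken as positive samples, and windows with low scores are considered to contain no object and are filtered out. The classification layer can determine the center anchor point of a candidate detection region as well as its coordinates, width, and height. Another fully connected layer is used to determine whether the candidate detection region is foreground or background; this fully connected layer may also be implemented with a convolutional layer. For the specific implementation, reference may be made to the prior art, and details are not repeated here.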
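The idea of windows with different areas and aspect ratios centered on one point can be sketched as follows. The particular areas and ratios, the (x1, y1, x2, y2) box format, and the name `make_windows` are assumptions chosen for illustration; the source does not state which values are used.

```python
import math


def make_windows(center_x, center_y, areas, aspect_ratios):
    """Return (x1, y1, x2, y2) windows centred on one point, one per area/ratio pair."""
    windows = []
    for area in areas:
        for ratio in aspect_ratios:  # ratio = width / height
            height = math.sqrt(area / ratio)
            width = ratio * height
            windows.append((
                center_x - width / 2, center_y - height / 2,
                center_x + width / 2, center_y + height / 2,
            ))
    return windows


# Hypothetical settings: three areas and three aspect ratios give nine windows.
windows = make_windows(
    center_x=100.0, center_y=60.0,
    areas=(128 ** 2, 256 ** 2, 512 ** 2),
    aspect_ratios=(0.5, 1.0, 2.0),
)
```

Repeating this for every point of the feature map produces the full set of windows that the classification layer then scores.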
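In this embodiment, according to the above algorithm, the region proposal unit 102 determines a plurality of candidate detection regions. These regions may be divided by region size level into a first number (M) of region groups, namely a first region group, a second region group, ..., and an Mth region group, and the size levels of the candidate detection regions differ between groups. For example, the candidate detection regions in the first region group are all smaller than those in the second region group, those in the second region group are all smaller than those in the third region group, and so on, so that those in the (M-1)th region group are all smaller than those in the Mth region group. M is greater than or equal to 2, and each region group includes at least one candidate detection region. The value of M may be determined as needed. For example, when M=3, the candidate detection regions are divided by size level into three region groups: a large region group, a middle region group, and a small region group. When M=5, they are divided into five region groups: an extra-large region group, a larger region group, a middle region group, a smaller region group, and an extra-small region group. These are only examples and do not limit this embodiment. The second partial image features of the candidate detection regions within one region group are determined in the same way.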
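In this embodiment, the M region groups may be divided by setting length and width thresholds. For example, when M=3, a first length threshold L1 and a first width threshold W1, and a second length threshold L2 and a second width threshold W2, may be set. When the length and width of a candidate detection region determined by the RPN are less than L1 and W1 respectively, the region is determined to belong to the small region group; when its length and width are greater than L2 and W2 respectively, it is determined to belong to the large region group; in all other cases it belongs to the middle region group. This is only an example and does not limit this embodiment. The M region groups may also be divided by setting area thresholds. For example, when M=3, a first area threshold S1 and a second area threshold S2 may be set. When the area of a candidate detection region determined by the RPN is less than S1, the region is determined to belong to the small region group; when the area is greater than S2, it is determined to belong to the large region group; otherwise it belongs to the middle region group. The above uses M=3 only as an example of how to divide the region groups; the division for other values of M is similar and is not described case by case.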
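Both grouping rules for M=3 can be written directly from the description. The function names and the threshold values used in the example are hypothetical; only the comparison logic follows the text.

```python
def group_by_length_width(length, width, l1, w1, l2, w2):
    """Assign a region group using length/width thresholds (L1, W1) and (L2, W2)."""
    if length < l1 and width < w1:
        return "small"
    if length > l2 and width > w2:
        return "large"
    return "middle"


def group_by_area(length, width, s1, s2):
    """Assign a region group using area thresholds S1 and S2."""
    area = length * width
    if area < s1:
        return "small"
    if area > s2:
        return "large"
    return "middle"


# Hypothetical thresholds, in pixels and square pixels.
L1 = W1 = 32
L2 = W2 = 96
S1, S2 = 32 * 32, 96 * 96

by_size = [group_by_length_width(l, w, L1, W1, L2, W2)
           for l, w in [(20, 20), (120, 100), (20, 100)]]
by_area = [group_by_area(l, w, S1, S2)
           for l, w in [(20, 20), (120, 100), (50, 50)]]
```

Note that a long, thin box such as 20×100 falls into the middle group under the length/width rule, because neither the small nor the large condition holds for both dimensions.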
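In this embodiment, after the region proposal unit 102 determines the plurality of candidate detection regions, it may feed their information back to the feature extraction unit 101. The feature extraction unit 101 may then extract, according to the information and by using a predetermined number of the plurality of convolutional layers, first partial image features corresponding to the predetermined number of convolutional layers, where the predetermined number is greater than or equal to 2 and less than or equal to N. The processing unit 103 determines the second partial image feature of each of the plurality of candidate detection regions according to the first partial image features. The second partial image features of a part of the candidate detection regions are determined using the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers, and the second partial image features of another part of the candidate detection regions are determined using the extracted first partial image feature corresponding to at least one of the predetermined number of convolutional layers. Whether a candidate detection region belongs to the one part or to the other part may be determined from the region group to which it belongs. For example, the candidate detection regions in a region group with a smaller size level may be taken as the one part, and those in a region group with a larger size level may be taken as the other part. Because the second partial image features of the candidate detection regions in the smaller-level region group are determined using the extracted first partial image features corresponding to at least two convolutional layers, spatial resolution and semantic information can be balanced when the second partial image features are extracted, which improves object detection accuracy, especially for small target objects.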
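In one implementation, the feature extraction unit 101 may extract, according to the information of all candidate detection regions, the first partial image feature corresponding to each of the predetermined number of convolutional layers. For each region group, the feature extraction unit 101 may extract, according to the information of the candidate detection regions in that group, the first partial image feature corresponding to each of the predetermined number of convolutional layers; that is, each of the predetermined number of convolutional layers is used to extract the first partial image features, corresponding to that layer, of the candidate detection regions in all region groups. For example, if the predetermined number is Z, each of the Z convolutional layers is used to extract the first partial image features, corresponding to that layer, of the candidate detection regions in the M region groups.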
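In this implementation, the processing unit 103 determines the second partial image features of the one part of the candidate detection regions from the extracted first partial image features corresponding to at least two of the predetermined number of convolutional layers, and determines the second partial image features of the other part of the candidate detection regions from the extracted first partial image feature corresponding to at least one of the predetermined number of convolutional layers. For a first region group and a second region group among the plurality of region groups, the position of one of the at least two convolutional layers used by the processing unit 103 to determine the second partial image feature of the candidate detection regions in the first region group is earlier than the position of one of the at least two convolutional layers used to determine the second partial image feature of the candidate detection regions in the second region group, where the candidate detection regions in the first region group are smaller than those in the second region group. For example, the processing unit 103 determines the second partial image feature of the candidate detection regions in the first region group according to the first partial image features corresponding to the 3rd and 5th convolutional layers, and determines the second partial image feature of the candidate detection regions in the second region group according to the first partial image features corresponding to the 4th and 5th convolutional layers, where the 3rd convolutional layer is positioned earlier than the 4th. In this implementation, the positional relationship of the other convolutional layers among the at least two convolutional layers is not limited; they may be the same or different.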
在一个实施方式中,该特征提取单元101可以根据部分候选检测区域的信息,提取对应该预定数量的卷积层中每一个卷积层的第一局部图像特征,其中,针对一个区域组,特征提取单元101根据该区域组中的候选检测区域的信息提取对应该预定数量的卷积层中部分卷积层的第一局部图像特征,即利用该预定数量的卷积层中的部分卷积层,提取对应该部分卷积层的该区域组中的候选检测区域的第一局部图像特征。
在该实施方式中,针对该多个区域组中的第一区域组和第二区域组,该特征提取单元101根据该第一区域组的候选检测区域的信息提取对应第一预定卷积层的第一局部图像特征,根据该第二区域组的候选检测区域的信息提取对应第二预定卷积层的第一局部图像特征,其中,该第一预定卷积层中的一个卷积层的位置比该第二预定卷积层中的一个卷积层的位置靠前,其中,该第一区域组中的候选检测区域小于该第二区域组中的候选检测区域;处理单元103根据该第一预定卷积层的第一局部图像特征确定该第一区域组中的候选检测区域的第二局部图像特征,根据该第二预定卷积层的第一局部图像特征确定该第二区域组中的候选检测区域的第二局部图像特征。例如,该第一预定卷积层可以是第3个和第5个卷积层,该第二预定卷积层可以是第4个和第5个卷积层,根据该第一区域组的候选检测区域的信息提取对应第3个和第5个卷积层的第一局部图像特征,根据该第二区域组的候选检测区域的信息提取对应第4个和第5个卷积层的第一局部图像特征,其中,该第3个卷积层的位置比该第4个卷积层的位置靠前。在该实施方式中,并不限定该第一预定卷积层中的其他卷积层的位置与该第二预定卷积层中的其他卷积层的位置关系,其可以相同,也可以不同。
在本实施例中,根据上述实施方式,确定大小等级较小的区域组中的候选检测区 域的第二局部图像特征时所利用的该至少两个卷积层中的一个卷积层的位置比确定大小等级较大的区域组中的候选检测区域的第二局部图像特征时所利用的该至少两个卷积层中的一个卷积层的位置靠前,由于位置靠前的卷积层提取的特征的空间分辨率大,因此,能够进一步提高小目标物体的检测精度。
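In this embodiment, when the second local image feature of a candidate detection region is determined using the extracted first local image feature corresponding to a single one of the predetermined number of convolution layers, the processing unit 103 uses the extracted first local image feature corresponding to that convolution layer directly as the second local image feature of the other part of the candidate detection regions.
In this embodiment, when the second local image feature of a candidate detection region (of the one part and/or of the other part) is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, the processing unit 103 integrates the first local image features corresponding to each of the at least two convolution layers to obtain the second local image feature of the candidate detection region. The integration processing is described in detail below.
In one implementation, the processing unit 103 may include:
a first processing module (not shown), configured to upsample the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and to add the upsampled first local image features corresponding to the at least one later convolution layer to the extracted first local image feature corresponding to the earliest convolution layer, so as to obtain the second local image feature corresponding to the candidate detection region.
In this implementation, among Q convolution layers, the extracted first local image features corresponding to the Q-1 later convolution layers are each upsampled to the same spatial resolution as the extracted first local image feature corresponding to the single earliest convolution layer, and the upsampled first local image features corresponding to the Q-1 convolution layers are then added to the first local image feature corresponding to the earliest convolution layer. For example, when Q=3, the first local image feature corresponding to the earliest convolution layer has a spatial resolution of (H/8, W/8), and the first local image features corresponding to the two later convolution layers have spatial resolutions of (H/16, W/16) and (H/32, W/32) respectively. The first local image features with spatial resolutions of (H/16, W/16) and (H/32, W/32) are upsampled so that their spatial resolution increases to (H/8, W/8). The features from the three convolution layers then have the same spatial resolution and can be added.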
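The upsample-then-add integration can be illustrated with a minimal sketch on single-channel feature maps stored as nested lists. The embodiment does not name an upsampling method; nearest-neighbor repetition with an integer scale factor is an assumption made here, and both function names are hypothetical.

```python
def upsample_nearest(feature, factor):
    """Repeat every value `factor` times along both axes (nearest-neighbor)."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_by_addition(earliest, later_features):
    """Upsample each later map to the size of the earliest map, then add element-wise.

    Assumes each later map is smaller than the earliest map by an integer factor.
    """
    height, width = len(earliest), len(earliest[0])
    fused = [list(row) for row in earliest]
    for feature in later_features:
        up = upsample_nearest(feature, height // len(feature))
        if len(up) != height or len(up[0]) != width:
            raise ValueError("later map does not scale to the earliest map's size")
        for i in range(height):
            for j in range(width):
                fused[i][j] += up[i][j]
    return fused
```

With Q=3, `earliest` would play the role of the (H/8, W/8) feature and `later_features` would hold the (H/16, W/16) and (H/32, W/32) features, upsampled by factors of 2 and 4 respectively.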
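In one implementation, the processing unit 103 may include:
a second processing module (not shown), configured to apply expansion processing to the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and to stack the expanded first local image features corresponding to the later convolution layer with the extracted first local image feature corresponding to the earliest convolution layer and convolve the result, so as to obtain the second local image feature corresponding to the candidate detection region.
In this implementation, among Q convolution layers, the extracted first local image features corresponding to the Q-1 later convolution layers are each subjected to expansion (enlarge) processing so that they have the same spatial resolution as the extracted first local image feature corresponding to the single earliest convolution layer, and the expanded first local image features corresponding to the Q-1 convolution layers are then stacked with the first local image feature corresponding to the earliest convolution layer and convolved. For example, when Q=3, the first local image feature corresponding to the earliest convolution layer has a spatial resolution of (H/8, W/8), and the first local image features corresponding to the two later convolution layers have spatial resolutions of (H/16, W/16) and (H/32, W/32) respectively. The first local image features within the candidate detection region that have spatial resolutions of (H/16, W/16) and (H/32, W/32) are subjected to expansion processing so that their spatial resolution increases to (H/8, W/8). The features from the three convolution layers then have the same spatial resolution and can be stacked and convolved. Here, expansion processing means enlarging the original candidate detection region about its center point so that more first local image features are extracted. The convolution processing may be performed by a new convolution layer, distinct from the plurality of convolution layers, which reduces the dimensionality of the stacked features.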
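The expansion and stack-and-convolve steps can be sketched as below. This is one possible reading rather than the embodiment's definitive procedure: it assumes the region is enlarged about its center by the ratio of the two layers' strides (so that the crop on the later, coarser layer covers as many cells as the original region does on the earliest layer), and it models the new dimension-reducing convolution layer as a 1x1 convolution. The function names, the box format, and the weights are all hypothetical.

```python
def expand_box(x1, y1, x2, y2, scale):
    """Enlarge a box about its center point by `scale`, keeping the center fixed."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def stack_and_reduce(features, weights):
    """Stack same-sized maps as channels, then mix them with a 1x1 convolution.

    features: list of C maps, each H x W (nested lists)
    weights:  K rows of C coefficients, giving K output maps (K < C reduces dimension)
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [sum(w * f[i][j] for w, f in zip(row, features)) for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]
```

For instance, a 64x64-pixel region covers 8x8 cells at stride 8 but only 2x2 cells at stride 32; enlarging it by a factor of 4 about its center yields a 256x256-pixel region that covers 8x8 cells at stride 32, so the two crops can be stacked.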
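In this embodiment, the processing unit 103 may include the first processing module or the second processing module, or may include both. For example, the candidate detection regions in a region group of smaller size level may be processed by the second processing module, and the candidate detection regions in a region group of larger size level may be processed by the first processing module, but this embodiment is not limited to this arrangement. Note that "smaller size level" and "larger size level" refer to a relative comparison between the candidate detection regions of two region groups.
In this embodiment, when the first number of region groups includes a large region group, a medium region group, and a small region group, the feature extraction unit 101 may extract, according to the information on the candidate detection regions in the large region group, the first local image feature corresponding to the last convolution layer; extract, according to the information on the candidate detection regions in the medium region group, the first local image features corresponding to the last convolution layer and the second-to-last convolution layer; and extract, according to the information on the candidate detection regions in the small region group, the first local image features corresponding to the last convolution layer and the third-to-last convolution layer. The processing unit 103 uses the extracted first local image feature corresponding to the last of the plurality of convolution layers to determine the second local image features of the candidate detection regions in the large region group; it upsamples the extracted first local image feature corresponding to the last convolution layer and adds it to the extracted first local image feature corresponding to the second-to-last convolution layer to determine the second local image features of the candidate detection regions in the medium region group; and it applies expansion processing to the extracted first local image feature corresponding to the last convolution layer, stacks it with the extracted first local image feature corresponding to the third-to-last convolution layer, and convolves the result to determine the second local image features of the candidate detection regions in the small region group.
For example, when N=5, the information on the candidate detection regions in the large region group is fed back to the 5th convolution layer (one convolution layer), and the first local image feature corresponding to the 5th convolution layer is extracted to determine the second local image features of the candidate detection regions in the large region group (the other part of the candidate detection regions). The information on the candidate detection regions in the medium region group is fed back to the 4th and 5th convolution layers, and the first local image features corresponding to the 4th and 5th convolution layers are extracted; the first local image feature corresponding to the 5th convolution layer is upsampled to the same spatial resolution as the first local image feature corresponding to the 4th convolution layer, and the upsampled feature of the 5th convolution layer is added to the feature of the 4th convolution layer (two convolution layers in this example, although at least two may be used) to determine the second local image features of the candidate detection regions in the medium region group (one part of the candidate detection regions). The information on the candidate detection regions in the small region group is fed back to the 3rd and 5th convolution layers, and the first local image features corresponding to the 3rd and 5th convolution layers are extracted; the first local image feature corresponding to the 5th convolution layer is subjected to expansion processing so that it has the same spatial resolution as the first local image feature corresponding to the 3rd convolution layer, and the expanded feature of the 5th convolution layer is stacked with the feature of the 3rd convolution layer (two convolution layers in this example, although at least two may be used) and passed through a new convolution layer to determine the second local image features of the candidate detection regions in the small region group (one part of the candidate detection regions).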
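The N=5 example amounts to a small routing table from region group to convolution layers and integration mode. The sketch below records that table; the dictionary layout, the mode labels, and the helper name are assumptions made for illustration.

```python
# Region group -> (1-based convolution layer indices, integration mode), for N=5.
FUSION_PLAN = {
    "large": {"layers": (5,), "mode": "direct"},
    "medium": {"layers": (4, 5), "mode": "upsample_add"},
    "small": {"layers": (3, 5), "mode": "expand_stack_conv"},
}


def layers_to_query(groups_present):
    """Return the sorted convolution layers that must supply first local image
    features for the region groups actually present among the candidates."""
    needed = set()
    for group in groups_present:
        needed.update(FUSION_PLAN[group]["layers"])
    return sorted(needed)
```

The table makes the ordering constraint visible: the earliest layer used for the small group (3) comes before the earliest layer used for the medium group (4), which comes before the only layer used for the large group (5).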
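In this embodiment, the detection unit 104 may perform object detection based on an RCNN structure. When the plurality of candidate detection regions is divided by region size level into a first number of region groups, a first number of detection results may be obtained from the second local image features of the candidate detection regions of the respective region groups, and these detection results are added together to output the object detection result. For example, as many RCNNs as the first number may be provided, each performing object detection on the second local image features extracted for the candidate detection regions in one region group; the recognition results of the RCNNs are added together to output the object detection result, which includes the category and the specific position of each target object. For example, when M=3, three RCNNs (RCNN1, RCNN2, and RCNN3) are provided and perform object detection on the second local image features of the candidate detection regions in the large region group, the medium region group, and the small region group respectively. If RCNN1 detects object 1, RCNN2 detects object 2, and RCNN3 detects object 3 (the detection results may also include the locations of objects 1, 2, and 3), the final object detection result is that the input image contains objects 1, 2, and 3.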
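A minimal sketch of adding the per-group results together is given below. It assumes that "adding" means taking the union of the detections produced by the individual RCNNs, and it uses a hypothetical dictionary format for a detection; neither detail is fixed by this embodiment.

```python
def combine_detections(per_group_results):
    """Merge the detection lists of the per-group RCNN heads into one result.

    per_group_results: mapping from group name to a list of detections, each a
    dict with at least "label" and "box" keys (hypothetical format).
    """
    merged = []
    for group_name, detections in per_group_results.items():
        for detection in detections:
            # Record which head produced the detection; the input is not modified.
            merged.append({**detection, "group": group_name})
    return merged
```

A practical system might additionally suppress duplicate boxes reported by more than one head; that step is not described in this embodiment and is omitted here.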
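In this embodiment, the RCNN may be implemented as in the prior art. For example, region-of-interest pooling (ROI pooling) is applied to the extracted second local image feature to obtain a feature vector, which is input to a classifier to obtain the object category of the candidate detection region, completing object detection and localization. Details are not repeated here.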
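For readers unfamiliar with ROI pooling, the sketch below shows a common max-pooling variant on a single-channel map. It is not taken from this embodiment: the function name, the end-exclusive cell coordinates, and the bin-boundary rule (floor for the start, ceiling for the end) are assumptions.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool the cells of `feature` inside `roi` into an out_h x out_w grid.

    roi = (row0, col0, row1, col1) in feature-map cells, end-exclusive.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        rs = r0 + (i * h) // out_h            # floor of the bin start
        re = r0 + -((-(i + 1) * h) // out_h)  # ceiling of the bin end
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + -((-(j + 1) * w) // out_w)
            row.append(max(feature[r][c] for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled
```

Because the output grid has a fixed size regardless of the region's size, the pooled values can be flattened into a fixed-length feature vector for the classifier.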
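With the above apparatus of this embodiment, when features are extracted, the local image features of one part of the plurality of candidate detection regions are determined from local image features extracted by at least two convolution layers. Spatial resolution and semantic information can therefore be balanced when the local image features are extracted, which improves object detection accuracy.
Embodiment 2
Embodiment 2 provides an object detection method. Because the method solves the problem on a principle similar to that of the apparatus in Embodiment 1, its specific implementation may refer to the implementation of the apparatus in Embodiment 1, and identical content is not repeated.
FIG. 4 is a flowchart of the object detection method of Embodiment 2. As shown in FIG. 4, the method includes:
Step 401: extracting global image features from an input image using a plurality of convolution layers;
Step 402: determining a plurality of candidate detection regions using the global image features;
Step 403: extracting, according to information on the plurality of candidate detection regions and using a predetermined number of the plurality of convolution layers, first local image features corresponding to the predetermined number of convolution layers;
Step 404: determining, according to the first local image features, a second local image feature of each of the plurality of candidate detection regions, wherein the second local image features of one part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers; and
Step 405: performing object detection according to the second local image feature of each candidate detection region, and outputting an object detection result.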
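The order of steps 401 to 405 can be sketched as a skeleton in which every stage is passed in as a callable. All parameter names and the calling conventions are hypothetical; the sketch shows only how data flows between the steps and does not implement any of them.

```python
def detect_objects(image, extract_global, propose_regions, extract_local, fuse, detect):
    """Run steps 401-405 in order; each stage is supplied by the caller."""
    global_features = extract_global(image)                   # step 401
    regions = propose_regions(global_features)                # step 402
    results = []
    for region in regions:
        first_local = extract_local(global_features, region)  # step 403
        second_local = fuse(first_local)                      # step 404
        results.append(detect(region, second_local))          # step 405
    return results
```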
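In this embodiment, for the specific implementation of steps 401 to 405, reference may be made to the object detection apparatus 100 in Embodiment 1; repeated content is not described again.
In this embodiment, among the plurality of convolution layers, an earlier convolution layer has higher spatial resolution than a later convolution layer, and an earlier convolution layer carries less semantic information than a later convolution layer.
In this embodiment, the second local image features of another part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least one of the predetermined number of convolution layers.
In this embodiment, each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels; for a first region group and a second region group among the plurality of region groups:
In one implementation, in step 403, first local image features corresponding to first predetermined convolution layers are extracted according to the information on the candidate detection regions of the first region group, and first local image features corresponding to second predetermined convolution layers are extracted according to the information on the candidate detection regions of the second region group, where one of the first predetermined convolution layers is located earlier than one of the second predetermined convolution layers, and the candidate detection regions in the first region group are smaller than those in the second region group. In step 404, the second local image features of the candidate detection regions in the first region group are determined from the first local image features of the first predetermined convolution layers, and the second local image features of the candidate detection regions in the second region group are determined from the first local image features of the second predetermined convolution layers.
In one implementation, in step 404, one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the first region group is located earlier than one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the second region group, where the candidate detection regions in the first region group are smaller than those in the second region group.
In one implementation, when the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, determining the second local image feature of each of the plurality of candidate detection regions according to the first local image features includes: upsampling the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and adding the upsampled first local image features corresponding to the at least one later convolution layer to the extracted first local image feature corresponding to the earliest convolution layer, so as to obtain the second local image feature corresponding to the candidate detection region.
In one implementation, when the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, determining the second local image feature of each of the plurality of candidate detection regions according to the first local image features includes: applying expansion processing to the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and stacking the expanded first local image features corresponding to the later convolution layer with the extracted first local image feature corresponding to the earliest convolution layer and convolving the result, so as to obtain the second local image feature corresponding to the candidate detection region.
In this embodiment, the first number of region groups includes a large region group, a medium region group, and a small region group, and determining the second local image feature of each of the plurality of candidate detection regions according to the first local image features includes: determining the second local image features of the candidate detection regions in the large region group using the extracted first local image feature corresponding to the last of the plurality of convolution layers; upsampling the extracted first local image feature corresponding to the last of the plurality of convolution layers and adding it to the extracted first local image feature corresponding to the second-to-last convolution layer, so as to determine the second local image features of the candidate detection regions in the medium region group; and applying expansion processing to the extracted first local image feature corresponding to the last of the plurality of convolution layers, stacking it with the extracted first local image feature corresponding to the third-to-last convolution layer, and convolving the result, so as to determine the second local image features of the candidate detection regions in the small region group.
In this embodiment, in step 405, a first number of detection results are obtained from the second local image features of the candidate detection regions of the respective region groups, and the first number of detection results are added together to output the object detection result.
The object detection method of this embodiment is described below with reference to FIG. 5, taking M=3 and N=5 as an example.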
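In this embodiment, as shown in FIG. 5, global image features can be extracted after the input image passes through five convolution layers conv1 to conv5 (conv1 and conv2 are not shown). The RPN determines three region groups from the global image features: a large region group (large boxes), a small region group (small boxes), and a medium region group (medium boxes). The information on the large boxes is fed back to conv5 to extract the first local image features of the large boxes, which yield the second local image features of the candidate detection regions in the large boxes and are output directly to RCNN1. The information on the medium boxes is fed back to conv4 and conv5, which each extract first local image features; the first local image feature of conv5 is upsampled to the same spatial resolution as the first local image feature of conv4, and the two are added to obtain the second local image features of the candidate detection regions in the medium boxes, which are output to RCNN2. The information on the small boxes is fed back to conv3 and conv5, which each extract first local image features; the first local image feature of conv5 is subjected to expansion processing so that it has the same spatial resolution as the first local image feature of conv3, and the two are stacked and processed by a new convolution layer convx, so that the dimension-reduced second local image features of the candidate detection regions in the small boxes are output to RCNN3. RCNN1, RCNN2, and RCNN3 perform classification, recognition, and detection on the second local image features of the candidate detection regions in the large region group, the medium region group, and the small region group respectively to obtain their own detection results. These are added together and the final object detection result is output, including object 1 and object 2 and the positions of objects 1 and 2.
FIG. 6 is a schematic diagram of an object detection result in this embodiment. As shown in FIG. 6, the second local image features of candidate detection regions of different size levels are input to different RCNNs, and each recognition result is a person. For example, one RCNN recognizes people near the elevator or far away in the square (smaller candidate detection regions), and another RCNN recognizes people on the level ground in the near part of the square (larger candidate detection regions). After the results are added together, the final object detection result is output, including all the people in the input image and their positions.
With the above method of this embodiment, when features are extracted, the local image features of one part of the plurality of candidate detection regions are determined from local image features extracted by at least two convolution layers. Spatial resolution and semantic information can therefore be balanced when the local image features are extracted, which improves object detection accuracy.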
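Embodiment 3
Embodiment 3 provides an electronic device. FIG. 7 is a schematic diagram of the electronic device of Embodiment 3. As shown in FIG. 7, the electronic device 700 includes the object detection apparatus 100 described in Embodiment 1; the structure of the object detection apparatus 100 is not described again.
Embodiment 3 further provides an electronic device. Because the electronic device solves the problem on a principle similar to that of the method in Embodiment 2, its specific implementation may refer to the implementation of the method in Embodiment 2, and identical content is not repeated.
FIG. 8 is a schematic block diagram of the system configuration of the electronic device of Embodiment 3 of the present invention. As shown in FIG. 8, the electronic device 800 may include a central processing unit 801 and a memory 802, the memory 802 being coupled to the central processing unit 801. The figure is exemplary; other types of structures may also be used to supplement or replace this structure in order to implement telecommunication functions or other functions.
As shown in FIG. 8, the electronic device 800 may further include an input unit 803, a display 804, and a power supply 805.
In one implementation, the functions of the object detection apparatus described in Embodiment 1 may be integrated into the central processing unit 801. The central processing unit 801 may be configured to: extract global image features from an input image using a plurality of convolution layers; determine a plurality of candidate detection regions using the global image features; extract, according to information on the plurality of candidate detection regions and using a predetermined number of the plurality of convolution layers, first local image features corresponding to the predetermined number of convolution layers; determine, according to the first local image features, a second local image feature of each of the plurality of candidate detection regions, wherein the second local image features of one part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers; and perform object detection according to the second local image feature of each candidate detection region and output an object detection result.
In this embodiment, among the plurality of convolution layers, an earlier convolution layer has higher spatial resolution than a later convolution layer, and an earlier convolution layer carries less semantic information than a later convolution layer.
In this embodiment, the second local image features of another part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least one of the predetermined number of convolution layers.
In this embodiment, each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels; and for a first region group and a second region group among the first number of region groups:
In one implementation, the central processing unit 801 may be configured to: extract first local image features corresponding to first predetermined convolution layers according to the information on the candidate detection regions of the first region group, and extract first local image features corresponding to second predetermined convolution layers according to the information on the candidate detection regions of the second region group, where one of the first predetermined convolution layers is located earlier than one of the second predetermined convolution layers, and the candidate detection regions in the first region group are smaller than those in the second region group.
The central processing unit 801 may further be configured to: determine the second local image features of the candidate detection regions in the first region group from the first local image features of the first predetermined convolution layers, and determine the second local image features of the candidate detection regions in the second region group from the first local image features of the second predetermined convolution layers.
In one implementation, the central processing unit 801 may be configured such that one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the first region group is located earlier than one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the second region group, where the candidate detection regions in the first region group are smaller than those in the second region group.
When the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, in one implementation the central processing unit 801 may be configured to: upsample the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and add the upsampled first local image features corresponding to the at least one later convolution layer to the extracted first local image feature corresponding to the earliest convolution layer, so as to obtain the second local image feature corresponding to the candidate detection region.
When the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, in one implementation the central processing unit 801 may be configured to: apply expansion processing to the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and stack the expanded first local image features corresponding to the later convolution layer with the extracted first local image feature corresponding to the earliest convolution layer and convolve the result, so as to obtain the second local image feature corresponding to the candidate detection region.
When the first number of region groups includes a large region group, a medium region group, and a small region group, the central processing unit 801 may be configured to: determine the second local image features of the candidate detection regions in the large region group using the extracted first local image feature corresponding to the last of the plurality of convolution layers; upsample the extracted first local image feature corresponding to the last of the plurality of convolution layers and add it to the extracted first local image feature corresponding to the second-to-last convolution layer, so as to determine the second local image features of the candidate detection regions in the medium region group; and apply expansion processing to the extracted first local image feature corresponding to the last of the plurality of convolution layers, stack it with the extracted first local image feature corresponding to the third-to-last convolution layer, and convolve the result, so as to determine the second local image features of the candidate detection regions in the small region group.
In this embodiment, the central processing unit 801 may be configured to: obtain a first number of detection results from the second local image features of the candidate detection regions of the respective region groups, and add the first number of detection results together to output the object detection result.
In another implementation, the object detection apparatus 100 described in Embodiment 1 may be configured separately from the central processing unit 801. For example, the object detection apparatus 100 may be configured as a chip connected to the central processing unit 801, and the functions of the object detection apparatus 100 are implemented under the control of the central processing unit 801.
In this embodiment, the electronic device 800 does not necessarily include all of the components shown in FIG. 8.
As shown in FIG. 8, the central processing unit 801, sometimes also called a controller or operational control, may include a microprocessor or other processor device and/or logic device. The central processing unit 801 receives input and controls the operation of the components of the electronic device 800.
The memory 802 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. The central processing unit 801 may execute a program stored in the memory 802 to implement information storage, processing, and the like. The functions of the other components are similar to those in the prior art and are not described here. The components of the electronic device 800 may be implemented by dedicated hardware, firmware, software, or a combination thereof without departing from the scope of the present invention.
With the above apparatus of this embodiment, when features are extracted, the local image features of one part of the plurality of candidate detection regions are determined from local image features extracted by at least two convolution layers. Spatial resolution and semantic information can therefore be balanced when the local image features are extracted, which improves object detection accuracy.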
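An embodiment of the present invention further provides a computer-readable program which, when executed in an object detection apparatus, causes a computer to perform, in the object detection apparatus, the object detection method of Embodiment 2 above.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, where the computer-readable program causes a computer to perform, in an object detection apparatus, the object detection method of Embodiment 2 above.
The above apparatus and method of the present invention may be implemented by hardware, or by hardware in combination with software. The present invention relates to a computer-readable program which, when executed by a logic component, enables the logic component to implement the apparatus or constituent components described above, or to carry out the various methods or steps described above. The present invention also relates to a storage medium for storing the above program, such as a hard disk, a magnetic disk, an optical disc, a DVD, or a flash memory.
The object detection method performed in the object detection apparatus described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIG. 1, and/or one or more combinations of those functional blocks, may correspond either to software modules of a computer program flow or to hardware modules. The software modules may correspond respectively to the steps shown in FIG. 2. The hardware modules may be implemented, for example, by solidifying the software modules using a field-programmable gate array (FPGA).
A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to a processor so that the processor can read information from, and write information to, the storage medium; alternatively, the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the object detection apparatus, or in a memory card that can be inserted into the object detection apparatus.
One or more of the functional blocks described with respect to FIG. 1, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for performing the functions described in this application. One or more of the functional blocks described with respect to FIG. 1, and/or one or more combinations of those functional blocks, may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present invention has been described above with reference to specific implementations, but those skilled in the art should understand that these descriptions are exemplary and do not limit the scope of protection of the present invention. Those skilled in the art may make various variations and modifications to the present invention in accordance with its spirit and principles, and such variations and modifications also fall within the scope of the present invention.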

Claims (20)

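  1. An object detection apparatus, wherein the apparatus comprises:
    a feature extraction unit, configured to extract global image features from an input image using a plurality of convolution layers;
    a region proposal unit, configured to determine a plurality of candidate detection regions using the global image features and to feed information on the plurality of candidate detection regions back to the feature extraction unit; wherein the feature extraction unit is further configured to extract, according to the information and using a predetermined number of the plurality of convolution layers, first local image features corresponding to the predetermined number of convolution layers;
    a processing unit, configured to determine, according to the first local image features, a second local image feature of each of the plurality of candidate detection regions; wherein the second local image features of one part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers; and
    a detection unit, configured to perform object detection according to the second local image feature of each candidate detection region and to output an object detection result.
  2. The apparatus according to claim 1, wherein, among the plurality of convolution layers, an earlier convolution layer has higher spatial resolution than a later convolution layer, and an earlier convolution layer carries less semantic information than a later convolution layer.
  3. The apparatus according to claim 1, wherein the second local image features of another part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least one of the predetermined number of convolution layers.
  4. The apparatus according to claim 1, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels;
    and, for a first region group and a second region group among the plurality of region groups, the feature extraction unit extracts first local image features corresponding to first predetermined convolution layers according to the information on the candidate detection regions of the first region group, and extracts first local image features corresponding to second predetermined convolution layers according to the information on the candidate detection regions of the second region group, wherein one of the first predetermined convolution layers is located earlier than one of the second predetermined convolution layers, and wherein the candidate detection regions in the first region group are smaller than the candidate detection regions in the second region group.
  5. The apparatus according to claim 4, wherein the processing unit determines the second local image features of the candidate detection regions in the first region group according to the first local image features of the first predetermined convolution layers, and determines the second local image features of the candidate detection regions in the second region group according to the first local image features of the second predetermined convolution layers.
  6. The apparatus according to claim 1, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels, and, for a first region group and a second region group among the plurality of region groups, one of the at least two convolution layers used by the processing unit when determining the second local image features of the candidate detection regions in the first region group is located earlier than one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the second region group, wherein the candidate detection regions in the first region group are smaller than the candidate detection regions in the second region group.
  7. The apparatus according to claim 1, wherein, when the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, the processing unit comprises:
    a first processing module, configured to upsample the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and to add the upsampled first local image features corresponding to the at least one later convolution layer to the extracted first local image feature corresponding to the earliest convolution layer, so as to obtain the second local image feature corresponding to the candidate detection region.
  8. The apparatus according to claim 1, wherein, when the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, the processing unit comprises:
    a second processing module, configured to apply expansion processing to the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and to stack the expanded first local image features corresponding to the later convolution layer with the extracted first local image feature corresponding to the earliest convolution layer and convolve the result, so as to obtain the second local image feature corresponding to the candidate detection region.
  9. The apparatus according to claim 1, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels, wherein the first number of region groups comprises a large region group, a medium region group, and a small region group, and the processing unit determines the second local image features of the candidate detection regions in the large region group using the extracted first local image feature corresponding to the last of the plurality of convolution layers;
    upsamples the extracted first local image feature corresponding to the last of the plurality of convolution layers and adds it to the extracted first local image feature corresponding to the second-to-last convolution layer, so as to determine the second local image features of the candidate detection regions in the medium region group; and
    applies expansion processing to the extracted first local image feature corresponding to the last of the plurality of convolution layers, stacks it with the extracted first local image feature corresponding to the third-to-last convolution layer, and convolves the result, so as to determine the second local image features of the candidate detection regions in the small region group.
  10. The apparatus according to claim 1, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels, and the detection unit obtains a first number of detection results from the second local image features of the candidate detection regions of the respective region groups and adds the first number of detection results together to output the object detection result.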
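  11. An object detection method, wherein the method comprises:
    extracting global image features from an input image using a plurality of convolution layers;
    determining a plurality of candidate detection regions using the global image features;
    extracting, according to information on the plurality of candidate detection regions and using a predetermined number of the plurality of convolution layers, first local image features corresponding to the predetermined number of convolution layers;
    determining, according to the first local image features, a second local image feature of each of the plurality of candidate detection regions; wherein the second local image features of one part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers; and
    performing object detection according to the second local image feature of each candidate detection region, and outputting an object detection result.
  12. The method according to claim 11, wherein, among the plurality of convolution layers, an earlier convolution layer has higher spatial resolution than a later convolution layer, and an earlier convolution layer carries less semantic information than a later convolution layer.
  13. The method according to claim 11, wherein the second local image features of another part of the plurality of candidate detection regions are determined using the extracted first local image features corresponding to at least one of the predetermined number of convolution layers.
  14. The method according to claim 11, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels;
    and, for a first region group and a second region group among the plurality of region groups, extracting the first local image features corresponding to the predetermined number of convolution layers comprises:
    extracting first local image features corresponding to first predetermined convolution layers according to the information on the candidate detection regions of the first region group, and extracting first local image features corresponding to second predetermined convolution layers according to the information on the candidate detection regions of the second region group, wherein one of the first predetermined convolution layers is located earlier than one of the second predetermined convolution layers, and wherein the candidate detection regions in the first region group are smaller than the candidate detection regions in the second region group.
  15. The method according to claim 14, wherein determining, according to the first local image features, the second local image feature of each of the plurality of candidate detection regions comprises:
    determining the second local image features of the candidate detection regions in the first region group according to the first local image features of the first predetermined convolution layers, and determining the second local image features of the candidate detection regions in the second region group according to the first local image features of the second predetermined convolution layers.
  16. The method according to claim 11, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels, and, for a first region group and a second region group among the plurality of region groups, one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the first region group is located earlier than one of the at least two convolution layers used when determining the second local image features of the candidate detection regions in the second region group, wherein the candidate detection regions in the first region group are smaller than the candidate detection regions in the second region group.
  17. The method according to claim 11, wherein, when the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, determining, according to the first local image features, the second local image feature of each of the plurality of candidate detection regions comprises:
    upsampling the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and adding the upsampled first local image features corresponding to the at least one later convolution layer to the extracted first local image feature corresponding to the earliest convolution layer, so as to obtain the second local image feature corresponding to the candidate detection region.
  18. The method according to claim 11, wherein, when the second local image feature of a candidate detection region is determined using the extracted first local image features corresponding to at least two of the predetermined number of convolution layers, determining, according to the first local image features, the second local image feature of each of the plurality of candidate detection regions comprises:
    applying expansion processing to the extracted first local image features corresponding to at least one later convolution layer so that they have the same spatial resolution as the extracted first local image feature corresponding to the earliest convolution layer, and stacking the expanded first local image features corresponding to the later convolution layer with the extracted first local image feature corresponding to the earliest convolution layer and convolving the result, so as to obtain the second local image feature corresponding to the candidate detection region.
  19. The method according to claim 11, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels, wherein the first number of region groups comprises a large region group, a medium region group, and a small region group, and determining, according to the first local image features, the second local image feature of each of the plurality of candidate detection regions comprises: determining the second local image features of the candidate detection regions in the large region group using the extracted first local image feature corresponding to the last of the plurality of convolution layers;
    upsampling the extracted first local image feature corresponding to the last of the plurality of convolution layers and adding it to the extracted first local image feature corresponding to the second-to-last convolution layer, so as to determine the second local image features of the candidate detection regions in the medium region group; and
    applying expansion processing to the extracted first local image feature corresponding to the last of the plurality of convolution layers, stacking it with the extracted first local image feature corresponding to the third-to-last convolution layer, and convolving the result, so as to determine the second local image features of the candidate detection regions in the small region group.
  20. The method according to claim 11, wherein each of the plurality of candidate detection regions belongs to one of a first number of region groups of different region size levels, a first number of detection results are obtained from the second local image features of the candidate detection regions of the respective region groups, and the first number of detection results are added together to output the object detection result.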
PCT/CN2018/074706 2018-01-31 2018-01-31 物体检测方法和装置 WO2019148362A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880055754.3A CN111095295B (zh) 2018-01-31 2018-01-31 物体检测方法和装置
JP2020529127A JP6984750B2 (ja) 2018-01-31 2018-01-31 物体検出方法及び装置
PCT/CN2018/074706 WO2019148362A1 (zh) 2018-01-31 2018-01-31 物体检测方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/074706 WO2019148362A1 (zh) 2018-01-31 2018-01-31 物体检测方法和装置

Publications (1)

Publication Number Publication Date
WO2019148362A1 true WO2019148362A1 (zh) 2019-08-08

Family

ID=67477854

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/074706 WO2019148362A1 (zh) 2018-01-31 2018-01-31 物体检测方法和装置

Country Status (3)

Country Link
JP (1) JP6984750B2 (zh)
CN (1) CN111095295B (zh)
WO (1) WO2019148362A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705544A (zh) * 2019-09-05 2020-01-17 中国民航大学 基于Faster-RCNN的自适应快速目标检测方法
CN111553200A (zh) * 2020-04-07 2020-08-18 北京农业信息技术研究中心 一种图像检测识别方法及装置

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205131A (zh) * 2021-04-28 2021-08-03 阿波罗智联(北京)科技有限公司 图像数据的处理方法、装置、路侧设备和云控平台

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017173605A1 (en) * 2016-04-06 2017-10-12 Xiaogang Wang Method and system for person recognition
CN107341517A (zh) * 2017-07-07 2017-11-10 哈尔滨工业大学 一种基于深度学习层级间特征融合的多尺度小物体检测方法
CN107463892A (zh) * 2017-07-27 2017-12-12 北京大学深圳研究生院 一种结合上下文信息和多级特征的图像中行人检测方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018005520A (ja) * 2016-06-30 2018-01-11 クラリオン株式会社 物体検出装置及び物体検出方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705544A (zh) * 2019-09-05 2020-01-17 中国民航大学 基于Faster-RCNN的自适应快速目标检测方法
CN110705544B (zh) * 2019-09-05 2023-04-07 中国民航大学 基于Faster-RCNN的自适应快速目标检测方法
CN111553200A (zh) * 2020-04-07 2020-08-18 北京农业信息技术研究中心 一种图像检测识别方法及装置

Also Published As

Publication number Publication date
JP6984750B2 (ja) 2021-12-22
CN111095295A (zh) 2020-05-01
JP2021505992A (ja) 2021-02-18
CN111095295B (zh) 2021-09-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18903362

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020529127

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18903362

Country of ref document: EP

Kind code of ref document: A1