
WO2018121690A1 - Object attribute detection method and device, neural network training method and device, and regional detection method and device - Google Patents


Info

Publication number
WO2018121690A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
target area
training
sample
Prior art date
Application number
PCT/CN2017/119535
Other languages
French (fr)
Chinese (zh)
Inventor
邵婧
闫俊杰
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2018121690A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to artificial intelligence technology, and in particular, to an object attribute detection method and apparatus, a neural network training method and apparatus, and an area detection method and apparatus, and an electronic apparatus.
  • Convolutional neural networks are an important research area in computer vision and pattern recognition. Inspired by the way the biological brain processes information, they enable computers to process specific objects in a manner similar to humans, so that object detection and recognition can be performed efficiently. With the development of Internet technology and the rapid growth of information volume, convolutional neural networks are increasingly widely used in object detection and recognition to extract the information actually needed from a large amount of data.
  • the embodiment of the present application provides an object attribute detection scheme, a neural network training scheme, and an area detection scheme.
  • an object attribute detecting method includes: inputting an image to be detected into an attention neural network for area detection, and obtaining at least one target area associated with an object attribute of a target in the image to be detected; and inputting the image to be detected and the at least one target area into an attribute classification neural network for attribute detection, and obtaining object attribute information of the image to be detected.
  • a neural network training method includes: inputting a training sample image into an attention neural network for area training to obtain probability information of candidate target areas; sampling candidate target areas of the training sample image according to the probability information of the candidate target areas to obtain sampled image samples; inputting attribute information of the target area and the image samples into an auxiliary classification network for attribute training to obtain accuracy information of the candidate target areas in the image samples, where the attribute information of the target area is attribute information of the target area marked for the training sample image; and adjusting parameters of the attention neural network according to the accuracy information.
  • a region detecting method includes: acquiring a target image to be detected, wherein the target image includes a still image or a video image; and detecting the target image by using an attention neural network to obtain a target area of the target image; wherein the attention neural network is trained using a neural network training method as described in any of the embodiments of the present application.
  • an object attribute detecting apparatus includes: a first acquiring module, configured to input an image to be detected into an attention neural network for area detection and obtain at least one target area associated with an object attribute of a target in the image to be detected; and a second acquiring module, configured to input the image to be detected and the at least one target area into an attribute classification neural network for attribute detection and obtain object attribute information of the image to be detected.
  • a neural network training apparatus including: a sixth acquiring module, configured to input a training sample image into an attention neural network for area training, and obtain probability information of a candidate target area.
  • a seventh acquiring module configured to perform candidate target region sampling on the training sample image according to probability information of the candidate target region, to obtain a sampled image sample
  • an eighth obtaining module, configured to input the attribute information of the target area and the image samples into the auxiliary classification network for attribute training and obtain accuracy information of the candidate target areas in the image samples; the attribute information of the target area is attribute information of the target area marked for the training sample image
  • the second parameter adjustment module is configured to adjust parameters of the attention neural network according to the accuracy information.
  • an area detecting apparatus includes: a ninth obtaining module, configured to acquire a target image to be detected, where the target image includes a still image or a video image; and a module configured to detect the target image by using an attention neural network to obtain a target area of the target image; wherein the attention neural network is trained using a neural network training method or a neural network training apparatus according to any embodiment of the present application.
  • an electronic device including a processor and a memory, where the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object attribute detecting method according to any one of the embodiments of the present application; or the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the neural network training method according to any one of the embodiments of the present application; or the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the area detecting method according to any one of the embodiments of the present application.
  • another electronic device including:
  • the processor and the object attribute detecting apparatus according to any one of the embodiments of the present application; when the processor runs the object attribute detecting apparatus, the unit in the object attribute detecting apparatus according to any one of the embodiments of the present application is operated; or
  • the processor and the neural network training device according to any one of the embodiments of the present application; when the processor runs the neural network training device, the unit in the neural network training device according to any one of the embodiments of the present application is operated; or
  • a computer program comprising computer readable code, where when the computer readable code is run on a device, a processor in the device executes instructions for implementing the steps of the object attribute detecting method described in any of the embodiments of the present application; or the processor in the device executes instructions for implementing the steps of the neural network training method described in any of the embodiments of the present application; or the processor in the device executes instructions for implementing the steps of the region detecting method described in any of the embodiments of the present application.
  • a computer readable storage medium for storing computer readable instructions, and when the instructions are executed, implementing an object attribute detecting method according to any one of the embodiments of the present application.
  • an attention neural network is used to detect an area of a target in an image to be detected, and the image area detected by the attention neural network is then input into an attribute classification neural network to perform attribute detection of the target and obtain the corresponding object attribute information.
  • the trained attention neural network can accurately detect the target area in the image, and perform targeted attribute detection on the area to obtain more accurate object attribute information.
  • FIG. 1 is a flowchart of an object attribute detecting method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of an object attribute detecting method according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a neural network training method according to an embodiment of the present application.
  • FIG. 5 is a structural block diagram of an object attribute detecting apparatus according to an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a neural network training apparatus according to an embodiment of the present application.
  • FIG. 7 is a structural block diagram of an area detecting apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • the object attribute detecting method of this embodiment includes the following steps:
  • Step S102 input the image to be detected into the attention neural network for area detection, and obtain at least one local area associated with the object attribute of the target in the image to be detected as the target area.
  • the image to be examined in each embodiment of the present application may include a still image or a video image.
  • the object attribute of the target in the image to be detected is a preset attribute to be detected.
  • for example, the detection of face attributes in the image to be detected includes, but is not limited to, one or more of the following: whether glasses are worn, whether a hat is worn, whether a mask is worn; for another example, the detection of vehicle attributes in the image to be detected includes, but is not limited to: vehicle color, model, license plate number, and the like.
  • the attention neural network applies an attention mechanism to image recognition in deep learning, imitating the way a person's gaze moves across different objects when viewing an image.
  • when the neural network recognizes an image, focusing on one feature at a time yields more accurate results.
  • the attention neural network can calculate a weight for each feature and then compute a weighted sum of the features; the larger the weight, the greater the contribution of that feature to the current recognition.
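As an illustration of this weighting step (a minimal numpy sketch, not the network described in this application; `attention_pool`, its inputs, and the score values are hypothetical), the weights can be obtained by a softmax over per-feature scores and then used to form the weighted sum of the features:

```python
import numpy as np

def attention_pool(features, scores):
    """Weight each feature vector by a softmax of its score and sum them.

    features: (n, d) array of n feature vectors
    scores:   (n,) array of relevance scores (higher = more important)
    """
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    weights = exp / exp.sum()             # weights are positive and sum to 1
    return weights @ features             # (d,) weighted sum of features

# three toy feature vectors; the first has the largest score, so it
# contributes most to the pooled result
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = attention_pool(feats, np.array([2.0, 0.5, 0.1]))
```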
  • the target area is a partial area of the image to be detected.
  • the trained attention neural network has automatic target area detection capability: the image to be detected is input into the attention neural network and the corresponding target area is obtained. There may be one target area or multiple target areas, such as multiple face areas, so that attribute detection can be performed on multiple faces at the same time.
  • the attention neural network may be a trained neural network provided by a third party and used directly, or may be an attention neural network obtained through sample training, such as the attention neural network trained by the method described in the following embodiments of the present application.
  • the step S102 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first acquisition module 502 being executed by the processor.
  • Step S104 Input the image to be detected and the at least one target area into the attribute classification neural network for attribute detection, and obtain object attribute information of the image to be detected.
  • the attribute classification neural network may adopt any appropriate network form, such as the VGG-16 neural network or the GoogLeNet neural network, and may be trained using a conventional training method, so that the trained network has attribute classification and recognition functions; for example, it can identify the gender, age, clothing, etc. of a pedestrian.
  • the input of the attribute classification neural network is the entire image to be detected and the target area determined by the attention neural network, such as the head area of the human body, and the output is the value of the attribute of the target area, such as the value of the attribute of the head.
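For instance, the target area determined by the attention neural network can be cropped from the whole image before both are passed to the attribute classification network. A small sketch of the cropping step (the box coordinates and the 64×48 image are made-up values for illustration):

```python
import numpy as np

def crop_region(image, box):
    """Crop a target region, given as (x0, y0, x1, y1) pixel coordinates,
    from an H x W x C image array."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

# hypothetical example: the whole image and a detected head region would
# both be fed to the attribute classification neural network
image = np.zeros((64, 48, 3))              # H=64, W=48, 3 channels
head = crop_region(image, (10, 0, 38, 20))  # 28-wide, 20-tall head box
```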
  • the step S104 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second acquisition module 504 being executed by the processor.
  • the method further includes: displaying the object attribute information in the image to be detected.
  • the operation of displaying the object attribute information in the image to be examined may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a display module 506 executed by the processor.
  • the attention neural network is used to detect the target region in the image to be inspected, and then the image region detected by the attention neural network is input into the attribute classification neural network to perform target attribute detection, and corresponding object attribute information is obtained.
  • the trained attention neural network can accurately detect the region corresponding to the target (i.e., the target region) in the image, and perform targeted attribute detection on the target region to obtain more accurate object attribute information.
  • the attention neural network for detecting the region corresponding to the target may be trained, and then the object property detection is performed using the trained attention neural network.
  • the object attribute detecting method of this embodiment includes the following steps:
  • Step S202 Using the training sample image and the auxiliary classification network, the attention neural network is trained as a neural network for detecting a target area in the image.
  • the step S202 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first training module 508 being executed by the processor.
  • this step S202 may include:
  • Step S2022 Input the training sample image into the attention neural network for area training, and obtain probability information of the candidate target area.
  • the training sample image may be appropriately selected by a person skilled in the art according to actual needs, and may include, for example, but not limited to, a person sample image and a vehicle sample image.
  • the attention neural network in the embodiments of the present application can be considered a convolutional network that introduces an attention mechanism. After the attention mechanism is introduced, the convolutional network can determine, during training, the degree of influence of each candidate target region in the image on the final target region. This degree of influence is usually expressed in the form of a probability, that is, the probability information of the candidate target regions.
  • an image usually includes a plurality of candidate target regions, and for each candidate target region a probability value that it may be the final target region is obtained.
  • all images in the training sample set are processed by the attention neural network to obtain, for each image, the probability values that the candidate target regions in that image may be the final target region, for example, the probability values that a plurality of candidate regions are the head region.
  • in this embodiment, the person sample image is taken as an example to realize automatic recognition by the attention neural network of the target areas corresponding to a person, such as the head area, upper body area, lower body area, foot area, and hand area. Those skilled in the art can refer to the training on person sample images to realize training on other sample images, such as vehicle sample images, for automatic recognition by the attention neural network of the target areas corresponding to a vehicle, such as the vehicle brand area, vehicle logo area, and vehicle body area.
  • Step S2024 Sample candidate target regions of the training sample image according to the probability information of the candidate target regions, and obtain the sampled image samples.
  • the candidate target region with a larger probability value is more likely to be sampled.
  • during sampling, one or more candidate target regions of each sample image may be acquired.
  • the number of samples may be appropriately set by a person skilled in the art according to actual needs, and the embodiment of the present application does not limit this.
  • the multinomial distribution corresponding to the probability values of the candidate target regions may be determined first; then, according to the multinomial distribution, candidate target regions are sampled for each training sample image to obtain the sampled image samples.
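This sampling step can be sketched as follows (a numpy illustration under assumed probability values; `sample_regions` and the four-region example are not from this application):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_regions(probs, n_samples):
    """Sample candidate-region indices according to their probability values.

    Regions with larger probability values are more likely to be drawn,
    matching the multinomial sampling described above.
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()   # normalize to a valid distribution
    return rng.choice(len(probs), size=n_samples, p=probs)

# four candidate regions; region 2 dominates, so it is sampled most often
idx = sample_regions([0.05, 0.1, 0.8, 0.05], n_samples=1000)
```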
  • in this way, information on the target areas collected from each training sample image can be obtained, and the feature map of the corresponding target area can be obtained by using this information.
  • Step S2026 Input the attribute information of the target area and the image samples into the auxiliary classification network for attribute training, obtain accuracy information of the candidate target regions in the image samples, and adjust the network parameters of the attention neural network according to the accuracy information.
  • Network parameters may include, for example, but are not limited to, weights, biases, and the like.
  • the attribute information of the target area is attribute information of the target area marked for the training sample image.
  • the attribute information of the target area is used to represent the attribute of the object of the target area.
  • for example, the attribute information may include, but is not limited to, one or more of the following: gender, age, hairstyle, whether glasses are worn, whether a mask is worn, and the like.
  • the sampled image sample contains information of the sampled area, including which area is collected, and the corresponding feature map of the area.
  • the attribute information of the target area needs to be acquired first.
  • the attribute information of the target area may be input together with the training sample image at the start, with the training sample image input to the attention neural network and the attribute information of the target area input to the auxiliary classification network.
  • the information of the target area may also be input into the attention neural network together with the training sample image and then transmitted to the auxiliary classification network by the attention neural network; or it may be obtained in an appropriate manner when the sampled image samples are input.
  • the auxiliary classification network is used to implement the reinforcement learning of the attention neural network.
  • the auxiliary classification network may adopt any suitable network capable of implementing reinforcement learning.
  • reinforcement learning treats a task as a sequential decision making problem: behaviors are continuously selected so as to obtain the greatest return from these behaviors as the best result. There is no label telling the algorithm what to do; instead, some behavior is first tried, a result is obtained, and whether the result is right or wrong is judged to provide feedback on the previous behavior. This feedback is used to adjust the previous behavior, and through continuous adjustment the algorithm learns what kind of behavior to select in which circumstances to get the best result.
  • the auxiliary classification network determines whether the probability estimation of the corresponding candidate target region by the attention neural network is accurate by calculating the reward value of each candidate target region in each sampled image sample, and then determines How to adjust the network parameters of the attention neural network to make the prediction of the attention neural network more accurate.
  • the attribute information of the target area and the image samples are input into the auxiliary classification network for attribute training, and the loss value of the attribute information of each candidate target region in the image samples is obtained through the loss function of the auxiliary classification network, where the loss function is determined according to the attribute information of the target area; then, according to the obtained loss values, the reward value of each candidate target region in the image samples is determined, and the reward values are the accuracy information.
  • the loss values of the candidate target regions of the image samples may first be averaged to obtain an average value; the reward value of each candidate target region in the sampled image samples is then determined according to the relationship between the average value and the corresponding loss value.
  • if a loss value satisfies a set criterion, the reward value of the candidate target region corresponding to that loss value is set to a first reward value; otherwise, the reward value of the candidate target region corresponding to the loss value is set to a second reward value.
  • for example, the set criterion may be that the loss value is less than X times the average value (in practical applications, X may be 0.5, for example); the reward value of the candidate target region corresponding to such a loss value is set to 1, and otherwise the reward value of the candidate target region corresponding to the loss value is set to 0.
  • the set criterion may be appropriately chosen by a person skilled in the art according to the actual situation; for example, the loss value may be required to be less than 0.5 times the average value, or to be among the first N loss values ordered from large to small, where N is an integer greater than 0.
  • if the adjusted network parameters of the attention neural network make the reward value obtained through the auxiliary classification network 1 for target areas and 0 for non-target areas, the attention neural network training can be considered complete; otherwise, the parameters of the attention neural network are continuously adjusted according to the reward values until the target areas obtained through the auxiliary classification network have a reward value of 1 and the non-target areas have a reward value of 0.
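The reward assignment described above (loss below X times the average maps to the first reward value) can be sketched as follows, with assumed loss values and X = 0.5 as in the example:

```python
import numpy as np

def reward_values(losses, x=0.5, first=1.0, second=0.0):
    """Assign reward values to candidate regions from their attribute losses.

    A region whose loss is below x times the mean loss is judged a likely
    target region (reward = first); all other regions receive the second
    reward value. x = 0.5 follows the example value given above.
    """
    losses = np.asarray(losses, dtype=float)
    return np.where(losses < x * losses.mean(), first, second)

# mean loss is 0.6, so the threshold is 0.3: regions with loss 0.1 and 0.2
# get reward 1, the others get reward 0
r = reward_values([0.1, 0.9, 1.2, 0.2])   # → array([1., 0., 0., 1.])
```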
  • Step S204 Input the image to be detected into the attention neural network for area detection, and obtain at least one local area associated with the object attribute of the target in the image to be detected as the target area.
  • the trained attention neural network is capable of performing target region detection to detect at least one target region associated with the target object attribute.
  • the step S204 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first acquisition module 502 being executed by the processor.
  • Step S206 Input the image to be detected and the at least one target area into the attribute classification neural network for attribute detection, and obtain object attribute information of the image to be detected.
  • the step S206 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second acquisition module 504 being executed by the processor.
  • the attention neural network is used to detect the target region in the image to be inspected, and then the image region detected by the attention neural network is input into the attribute classification neural network to perform target attribute detection, and corresponding object attribute information is obtained.
  • the trained attention neural network can accurately detect the target area in the image, and perform targeted attribute detection on the area to obtain more accurate object attribute information.
  • FIG. 3 a flow chart of a neural network training method in accordance with an embodiment of the present application is shown.
  • the neural network training method of this embodiment includes the following steps:
  • Step S302 Input the training sample image into the attention neural network for area training, and obtain probability information of the candidate target area.
  • in this embodiment, the person sample image is still taken as an example to realize automatic recognition by the attention neural network of the target areas corresponding to a person.
  • the probability information of the candidate target area may include a probability value of the candidate target area.
  • the step S302 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a third acquisition module 5082 being executed by the processor.
  • Step S304 Sample candidate target regions of the training sample image according to the probability information of the candidate target regions, and obtain the sampled image samples.
  • the candidate target region with a larger probability value is more likely to be sampled.
  • during sampling, one or more candidate target regions of each sample image may be acquired.
  • the number of samples may be appropriately set by a person skilled in the art according to actual needs, and the embodiment of the present application does not limit this.
  • the multinomial distribution corresponding to the probability values of the candidate target regions may be determined first; then, according to the multinomial distribution, candidate target regions are sampled for each training sample image to obtain the sampled image samples.
  • the step S304 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a fourth acquisition module 5084 being executed by the processor.
  • Step S306 Input attribute information and image samples of the target area into the auxiliary classification network for attribute training, and obtain accuracy information of the candidate target area in the image sample.
  • the attribute information of the target area is attribute information of the target area marked for the training sample image.
  • the attribute information of the target area is used to represent the attributes of the object of the target area.
  • the attribute information may include, but is not limited to, gender, age, hairstyle, whether glasses are worn, whether a mask is worn, and the like.
  • the sampled image sample contains information of the sampled area, including which area is collected, and the corresponding feature map of the area.
  • the auxiliary classification network determines whether the attention neural network estimates the probability of each candidate target region accurately by calculating the reward value of each candidate target region in each sampled image sample, and then determines how to adjust the network parameters of the attention neural network so that the prediction of the attention neural network becomes more accurate.
  • the attribute information of the target area and the image samples may be input into the auxiliary classification network for attribute training, and the loss value of the attribute information of each candidate target region in the image samples is obtained through the loss function of the auxiliary classification network, where the loss function is determined according to the attribute information of the target area; then, according to the obtained loss values, the reward value of each candidate target region in the image samples is determined, and the reward values are the accuracy information.
  • the loss values of at least one candidate target region of the at least one image sample may first be averaged (e.g., the loss values of the respective candidate target regions of all image samples are averaged) to obtain an average value; then, the return value of the candidate target region in the sampled image sample is determined according to the relationship between the average value and the obtained loss value.
  • for example, if an obtained loss value is less than 0.5 times the average value, the loss value satisfies the set standard, and the return value of the candidate target area corresponding to that loss value is set to 1; otherwise, the return value of the candidate target area corresponding to the loss value is set to 0.
  • the set standard may be appropriately configured by a person skilled in the art according to actual conditions, such as taking the first N values among the sorted loss values, where N is an integer greater than 0.
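As an illustrative sketch of the reward rule above (the threshold factor of 0.5 follows the example in the text; the loss values themselves are made up for demonstration):

```python
import numpy as np

def return_values(losses, factor=0.5):
    """Set the return value of a candidate region to 1 when its loss is
    below `factor` times the average loss over all sampled regions,
    and to 0 otherwise (the '0.5 times the average' rule)."""
    losses = np.asarray(losses, dtype=float)
    threshold = factor * losses.mean()
    return (losses < threshold).astype(int)

# four sampled regions with made-up losses; average = 0.6, threshold = 0.3
rewards = return_values([0.1, 0.9, 1.2, 0.2])
```

Regions whose auxiliary-classification loss is well below average earn a return of 1 and so reinforce the attention network's choice.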
  • step S306 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a fifth acquisition module 5086 being executed by the processor.
  • Step S308 Adjust parameters of the attention neural network according to the accuracy information.
  • the parameters of the adjusted attention neural network may include, for example, but are not limited to, network parameters such as weight parameters and bias parameters.
  • if the adjusted network parameters of the attention neural network are such that the target areas obtained through the auxiliary classification network have a return value of 1 and the non-target areas have a return value of 0, the training of the attention neural network may be considered complete; otherwise, the parameters of the attention neural network continue to be adjusted according to the return values until the target areas obtained through the auxiliary classification network have a return value of 1 and the non-target areas have a return value of 0.
  • the training convergence condition of the above attention neural network is only one possible implementation. It can be understood that, in practical applications, the attention neural network of the embodiment of the present application may also be given other training convergence conditions; the above training convergence condition is only an example and should not be construed as the only implementation.
  • step S308 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first parameter adjustment module 5088 being executed by the processor.
  • an optional method is to separately train the attention neural network for different target areas; for example, in one training, the attention neural network is trained only to predict the head region of the person, and in another training, the attention neural network is trained only to predict the upper-body region of the person.
  • the following alternative can be performed: using the trained attention neural network to detect the training sample images to obtain the target region of each training sample image; and training the attribute classification neural network by using the training sample images, the target area of each training sample image, and the attribute information of each target area.
  • the attribute classification neural network may adopt any appropriate network form, such as a convolutional neural network, and the training may also adopt a conventional training method.
  • the recognition of the target area in the training sample image can be effectively learned and trained, and, by using the attribute information of each target area, the attributes of the object in the target area of the recognized person image can be effectively learned and trained.
  • the attention neural network in the embodiment of the present application may be a fully convolutional neural network; a fully convolutional neural network requires fewer convolutional-layer parameters, so training is faster.
  • the attributes of the subject object in the image are often only related to certain areas of the subject, and do not require the characteristics of the entire image.
  • the pedestrian attribute is often only related to certain body areas of the pedestrian, and does not require a whole pedestrian image.
  • attributes such as wearing glasses, wearing a hat, or wearing a mask require only the features of the pedestrian's head.
  • the attention mechanism based on the Reinforcement Learning method is adopted to enable the algorithm to automatically select the region associated with each attribute in the image; the features of the associated region may then be extracted and used, together with the global features of the image, to predict the corresponding attributes, which not only saves the cost of manual labeling but also finds regions that are better for training.
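The reinforcement-learning attention mechanism described above can be illustrated with a minimal REINFORCE-style update; the softmax policy over regions and the learning rate are illustrative assumptions, not the exact parameterisation used by the embodiments:

```python
import numpy as np

def reinforce_update(logits, sampled_idx, reward, lr=0.1):
    """One REINFORCE-style update for a categorical region-selection
    policy: ascend the gradient of reward * log pi(sampled_idx)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = -probs                 # d log softmax / d logits, off-sample part
    grad[sampled_idx] += 1.0      # on-sample part
    return logits + lr * reward * grad

logits = np.zeros(4)                          # uniform initial policy
new_logits = reinforce_update(logits, sampled_idx=2, reward=1.0)
```

A positive reward raises the logit of the sampled region and lowers the others, so regions that classify their attributes well are selected more often in later iterations.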
  • the neural network training method of the present embodiment will be exemplarily illustrated with an optional example.
  • training the attention neural network to identify the head region of a person is taken as an example.
  • the training process is as follows:
  • first, the pedestrian attributes to be identified are manually classified according to their associated body parts, and attributes with the same associated area are classified into one category; for example, wearing glasses, wearing a hat, and wearing a mask relate only to the head of the pedestrian, while attributes such as clothing type and backpack relate only to the upper body of the pedestrian.
  • then, the attention neural network selects a batch of images as input; that is, in each iteration, the attention neural network takes a part of the data of the entire data set (a batch of images) for training, and the next batch of images is input in the next iteration.
  • the attention neural network outputs a feature map for each image; the values at the positions of the feature map satisfy a multinomial distribution, and the value at each position corresponds to a probability. Then, for each image, M regions are randomly sampled from this multinomial distribution, and the probability of sampling each of the M regions is the probability value corresponding to that region in the feature map, where M is an integer greater than 0 and may be appropriately set by a person skilled in the art according to actual needs. Each sampled region is passed through the auxiliary classification network, and the classification loss of an attribute is obtained through the loss function for attribute classification in the auxiliary classification network; L denotes the average loss over the N×M regions, where N represents the number of image samples. The losses of the M regions selected for each image are sorted from small to large; if a region is located within the first top_k positions of the sorted queue and its loss is less than 0.5L (that is, half the average loss), the return value of the region is 1, and 0 otherwise.
  • the top_k may be appropriately set by a person skilled in the art according to actual needs, and the embodiment of the present application does not limit this.
  • if each attribute is a multi-valued attribute, each attribute can use a softmax loss function to calculate its loss, and the final loss is the sum of the losses of all attributes.
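A minimal sketch of the per-attribute softmax loss and its sum over attributes (the two head attributes and their value counts below are hypothetical examples, not fixed by the embodiments):

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Softmax (cross-entropy) loss for one attribute of one sample."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_attribute_loss(per_attribute_logits, labels):
    """Final loss: the sum of the losses of all attributes; each attribute
    may be multi-valued, so the logit vectors can have different lengths."""
    return sum(softmax_cross_entropy(l, y)
               for l, y in zip(per_attribute_logits, labels))

# two hypothetical head attributes: glasses (2 values) and hat (3 values)
loss = total_attribute_loss(
    [np.array([2.0, 0.1]), np.array([0.2, 1.5, 0.3])],
    labels=[0, 1])
```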
  • a loss function for attribute classification in an auxiliary classification network is as follows: L = -(1/N) Σ_{n=1}^{N} Σ_k log p_n^k, where y_n^k is the true label of the kth attribute of the nth image sample (determined according to the attribute value of the input head region), p_n^k is the probability that the network outputs for the label y_n^k of this attribute, N is the number of image samples, n indexes the image samples, and k indexes the attributes of an image sample.
  • after training, the input of the attention neural network is a whole pedestrian image, and the output is the probability that each possible region in the image is the head; the attention neural network is a fully convolutional neural network.
  • for example, it may consist of 2 convolutional layers plus one Softmax layer, with each convolutional layer followed by a ReLU layer. The output of the last convolutional layer before the Softmax layer is a single-channel feature map; after the Softmax layer, the value at each position of the feature map is the probability that the corresponding location in the original image is selected as the head, and the region with the highest probability can be selected as the head region.
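The Softmax layer over the single-channel feature map can be sketched as follows; the 2×2 map is a toy stand-in for the last convolutional layer's output:

```python
import numpy as np

def spatial_softmax(feature_map):
    """Turn a single-channel feature map into a per-position probability
    map, as the Softmax layer after the last convolution does."""
    z = feature_map - feature_map.max()
    e = np.exp(z)
    return e / e.sum()

def pick_head_region(feature_map):
    """Return the (row, col) position with the highest head probability."""
    probs = spatial_softmax(feature_map)
    return np.unravel_index(probs.argmax(), probs.shape)

fmap = np.array([[0.1, 0.3], [2.0, 0.2]])   # toy final-conv output
row, col = pick_head_region(fmap)
```

The probabilities sum to 1 over all positions, and the argmax position corresponds to the location most likely to be selected as the head region.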
  • the attention neural network in this embodiment is optimized by using reinforcement learning.
  • the attention neural network does not calculate the loss immediately, but evaluates the return value of each region. The ultimate goal is to maximize the return value.
  • each sampled region is input into an auxiliary classification network, whose loss function is the classification loss of the attributes related to the head region; the return value of each region that may be selected as the head region is determined by how well that region, after passing through the auxiliary classification network, classifies the attributes related to the head region.
  • the attention neural network is trained by using training sample images; each training sample image may include a plurality of different candidate target regions, and the probability value that each candidate target region is the target area may be obtained through the attention neural network. Further, after the probability values are obtained, the corresponding training sample image is sampled according to the probability values, where a region with a larger probability value is more likely to be sampled; after sampling, the attribute information of the target region is input into the auxiliary classification network together with the sampled image samples, the return value of each sampled region is calculated through the auxiliary classification network, and the network parameters of the attention neural network are then adjusted according to the return values until the attention neural network satisfies the convergence condition, completing the training of the attention neural network.
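The loop summarised above can be illustrated end to end in a toy form, where an oracle stands in for the auxiliary classification network (all sizes, the learning rate, and the oracle reward are assumptions for demonstration only):

```python
import numpy as np

def toy_attention_training(true_region, num_regions=4, m=3, iters=200, seed=0):
    """Toy end-to-end loop: predict region probabilities, sample M regions,
    reward the ones the (oracle) auxiliary classifier scores well, and
    nudge the attention logits with a REINFORCE-style gradient."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(num_regions)
    for _ in range(iters):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()               # probs held fixed per mini-batch
        sampled = rng.choice(num_regions, size=m, p=probs)
        for idx in sampled:
            # oracle stand-in for the auxiliary network: reward 1 only
            # when the sampled region is the true target region
            reward = 1.0 if idx == true_region else 0.0
            grad = -probs
            grad[idx] += 1.0
            logits = logits + 0.1 * reward * grad
    return logits

trained_logits = toy_attention_training(true_region=2)
```

After training, the logit of the true region dominates, i.e. the toy attention policy has learned to select the rewarded region.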
  • through this training, the attention neural network can automatically find the region of the image with the largest response to the attribute corresponding to the attribute information; it is not necessary to manually mark the training samples, which not only saves the cost of manual labeling but also finds the area that best corresponds to the attribute information, reducing the cost of the convolutional neural network training process and reducing the training time.
  • the neural network training method of this embodiment may be performed by any suitable device having data processing capabilities, including but not limited to: a PC or the like.
  • the area detecting method of this embodiment includes the following steps:
  • Step S402 Acquire a target image to be detected.
  • the target image may include a still image or a video image.
  • the video image may include a pedestrian image or a vehicle image in video surveillance.
  • for attribute identification, the corresponding target area, such as the head area of a certain human body or the area where a certain vehicle is located, may first be located, and corresponding attribute recognition is then performed for the target area.
  • the step S402 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a ninth acquisition module 702 being executed by the processor.
  • Step S404 detecting the target image by using an attention neural network to obtain a target area of the target image.
  • an attention neural network trained by the method shown in any of the above embodiments is employed.
  • the target area of the target image can be quickly and accurately located, and the target area can be processed according to actual needs, such as attribute recognition, image information acquisition, and area positioning.
  • when the target image is a person image, the target area may include, but is not limited to, any one or more of the following: a head, an upper body, a lower body, feet, and hands; when the target image is a vehicle image, the target area may include, for example but not limited to, any one or more of the following: a vehicle brand area, a vehicle sign area, and a body area.
  • the step S404 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a tenth acquisition module 704 that is executed by the processor.
  • the area detection method in this embodiment can accurately and effectively detect and locate the target area in the image, reduce the target area positioning cost, and improve the target area positioning efficiency.
  • the area detecting method of this embodiment may be performed by any suitable device having data processing capabilities, including but not limited to: a PC or the like.
  • any of the methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • any of the methods provided by the embodiments of the present application may be executed by a processor, such as the processor, by executing a corresponding instruction stored in the memory to perform any of the methods mentioned in the embodiments of the present application. This will not be repeated below.
  • the foregoing program may be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed; and the foregoing storage medium includes: a medium that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the object attribute detecting apparatus of the embodiment includes: a first acquiring module 502, configured to input the image to be detected into the attention neural network for area detection, and obtain at least one target area in the image to be detected that is associated with the object attribute to be detected.
  • the second obtaining module 504 is configured to input the image to be detected and the at least one target area into the attribute classification neural network for attribute detection, and obtain the object attribute information of the image to be detected.
  • the object attribute detecting apparatus of the embodiment further includes: a display module 506, configured to display the object attribute information in the image to be detected.
  • when the target image is a person image, the target area may include any one or more of the following: a head, an upper body, a lower body, feet, and hands; and/or, when the target image is a vehicle image, the target area may include, for example, any one or more of the following: a vehicle brand area, a vehicle sign area, and a body area.
  • the image to be examined may include a still image or a video image.
  • the video image may include a pedestrian image and/or a vehicle image in video surveillance.
  • the object attribute detecting apparatus of the embodiment further includes: a first training module 508, configured to use the training sample image and before the first acquiring module 502 inputs the image to be detected into the attention neural network for area detection.
  • the auxiliary classification network trains the attention neural network as a neural network for detecting a target area in the image.
  • the first training module 508 includes: a third obtaining module 5082, configured to input the training sample image into the attention neural network for area training to obtain the probability information of the candidate target regions; a fourth obtaining module 5084, configured to sample the candidate target regions of the training sample image according to the probability information of the candidate target regions to obtain the sampled image samples; a fifth obtaining module 5086, configured to input the attribute information of the target area and the image samples into the auxiliary classification network for attribute training, and obtain the accuracy information of the candidate target areas in the image samples, where the attribute information of the target area is the attribute information of the target area marked for the training sample image; and a first parameter adjustment module 5088, configured to adjust the network parameters of the attention neural network according to the accuracy information.
  • the fifth obtaining module 5086 includes: a first loss obtaining module 50862, configured to input attribute information and image samples of the target area into the auxiliary classification network for attribute training, and obtain image samples by using a loss function of the auxiliary classification network. And a loss value of the attribute information of the candidate target area, wherein the loss function is determined according to the attribute information of the target area; the first report obtaining module 50864 is configured to determine, according to the obtained loss value, a return value of the candidate target area in the image sample The return value is the accuracy information.
  • the first report obtaining module 50864 is configured to average the loss values of the at least one candidate target region of the at least one image sample to obtain an average value; and determine the candidate in the image sample according to the relationship between the average value and the obtained loss value. The return value of the target area.
  • the first report obtaining module 50864 is configured to average the loss values of the at least one candidate target region of the at least one image sample to obtain an average value; if an obtained loss value satisfies the set standard, the return value of the candidate target area corresponding to the loss value is set to the first return value; otherwise, the return value of the candidate target area corresponding to the loss value is set to the second return value.
  • the fourth obtaining module 5084 is configured to determine a polynomial distribution corresponding to the probability value of the candidate target region; according to the polynomial distribution, the candidate target region is sampled by the training sample image, and the sampled image sample is obtained.
  • the attention neural network comprises a full convolutional neural network.
  • the object attribute detecting apparatus of the embodiment further includes: a second training module 510, configured to detect the training sample images by using the trained attention neural network to obtain the target areas of the training sample images; and to train the attribute classification neural network by using the training sample images, the target area of at least one training sample image, and the attribute information of at least one target area.
  • the object attribute detecting apparatus of the present embodiment can be used to implement the corresponding object attribute detecting method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • the neural network training device of the present embodiment includes: a sixth obtaining module 602, configured to input a training sample image into the attention neural network for area training, and obtain probability information of the candidate target region; and a seventh obtaining module 604, configured to The probability information of the candidate target region is used to sample the candidate target region of the training sample image to obtain the sampled image sample.
  • the eighth obtaining module 606 is configured to input the attribute information and the image sample of the target region into the auxiliary classification network for attribute training. Obtaining accuracy information of the candidate target region in the image sample; the attribute information of the target region is attribute information of the target region marked for the training sample image; and the second parameter adjustment module 608 is configured to adjust the attention neural network according to the accuracy information parameter.
  • the eighth obtaining module 606 includes: a second loss obtaining module 6062, configured to input the attribute information of the target area and the image samples into the auxiliary classification network for attribute training, and obtain the loss value of the attribute information of the candidate target area in the image sample through the loss function of the auxiliary classification network; and a second report obtaining module 6064, configured to determine the return value of the candidate target area in the image sample according to the obtained loss value, where the return value is the accuracy information.
  • the second report obtaining module 6064 is configured to average the loss values of the at least one candidate target region of the at least one image sample to obtain an average value; and determine the candidate in the image sample according to the relationship between the average value and the obtained loss value. The return value of the target area.
  • the second report obtaining module 6064 is configured to average the loss values of the at least one candidate target region of the at least one image sample to obtain an average value; if an obtained loss value satisfies the set standard, the return value of the candidate target area corresponding to the loss value is set to the first return value; otherwise, the return value of the candidate target area corresponding to the loss value is set to the second return value.
  • the seventh obtaining module 604 is configured to determine a polynomial distribution corresponding to the probability value of the candidate target region; according to the polynomial distribution, the candidate target region is sampled by the training sample image, and the sampled image sample is obtained.
  • the attention neural network is a full convolutional neural network.
  • the neural network training device of the embodiment further includes: a third training module 610, configured to detect the training sample images by using the trained attention neural network to obtain the target regions of the training sample images; and to train the attribute classification neural network by using the training sample images, the target area of at least one training sample image, and the attribute information of at least one target area.
  • the neural network training device of the present embodiment is used to implement the corresponding neural network training method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • the area detecting device of this embodiment includes: a ninth obtaining module 702, configured to acquire a target image to be detected, wherein the target image includes a still image or a video image; and a tenth acquiring module 704 is configured to detect the target by using an attention neural network.
  • the image is obtained from the target area of the target image; wherein the attention neural network is trained by using the neural network training method or the neural network training device according to any of the above embodiments of the present application.
  • when the target image is a person image, the target area may include any one or more of the following: a head, an upper body, a lower body, feet, and hands; when the target image is a vehicle image, the target area may include any one or more of the following: a vehicle brand area, a vehicle sign area, and a body area.
  • the video image includes a pedestrian image or a vehicle image in video surveillance.
  • the area detecting device of the present embodiment can be used to implement the corresponding area detecting method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • an embodiment of the present application further provides an electronic device, including: a processor and a memory;
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the object attribute detecting method according to any one of the foregoing embodiments of the present application; or
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the neural network training method described in any one of the foregoing embodiments of the present application; or
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the area detecting method according to any one of the foregoing embodiments of the present application.
  • the embodiment of the present application further provides another electronic device, including:
  • the processor and the object attribute detecting apparatus according to any one of the above embodiments of the present application; when the processor runs the object attribute detecting apparatus, the unit in the object attribute detecting apparatus according to any one of the above embodiments of the present application is operated; or
  • the processor and the neural network training device according to any one of the above embodiments of the present application; when the processor runs the neural network training device, the unit in the neural network training device according to any of the above embodiments of the present application is operated; or
  • the processor and the area detecting device according to any of the above embodiments of the present application; when the processor runs the area detecting device, the unit in the area detecting device according to any of the above embodiments of the present application is operated.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • electronic device 800 includes one or more first processors, first communication elements, and the like, for example, one or more central processing units (CPUs) 801 and/or one or more graphics processing units (GPUs) 813; the first processor may execute various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 802 or executable instructions loaded from a storage portion 808 into a random access memory (RAM) 803. The read-only memory 802 and the random access memory 803 are collectively referred to as the first memory.
  • the first communication component includes a communication component 812 and/or a communication interface 809.
  • the communication component 812 can include, but is not limited to, a network card.
  • the network card can include, but is not limited to, an IB (Infiniband) network card.
  • the communication interface 809 includes a communication interface of a network interface card such as a LAN card or a modem, and the communication interface 809 performs communication processing via a network such as the Internet.
  • the first processor can communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, connect to the communication component 812 via the first communication bus 804, and communicate with other target devices via the communication component 812, so as to complete the operation corresponding to any object attribute detecting method provided by the embodiments of the present application, for example: inputting the image to be detected into the attention neural network for area detection to obtain at least one target area in the image to be detected that is associated with the object attribute to be detected; and inputting the image to be detected and the at least one target area into the attribute classification neural network for attribute detection to obtain the object attribute information of the image to be detected.
  • alternatively, the first processor can communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, connect to the communication component 812 via the first communication bus 804, and communicate with other target devices via the communication component 812, so as to complete the operation corresponding to any neural network training method provided by the embodiments of the present application, for example: inputting the training sample image into the attention neural network for area training to obtain the probability information of the candidate target regions; sampling the candidate target regions of the training sample image according to the probability information of the candidate target regions to obtain the sampled image samples; inputting the attribute information of the target region and the image samples into the auxiliary classification network for attribute training to obtain the accuracy information of the candidate target regions in the image samples, where the attribute information of the target area is the attribute information of the target area marked for the training sample image; and adjusting the parameters of the attention neural network according to the accuracy information.
  • alternatively, the first processor can communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, connect to the communication component 812 via the first communication bus 804, and communicate with other target devices via the communication component 812, so as to complete the operation corresponding to any area detection method provided by the embodiments of the present application, for example: acquiring a target image to be detected, where the target image includes a still image or a video image; and detecting the target image by using the attention neural network to obtain the target area of the target image.
  • in addition, various programs and data required for the operation of the device can be stored in the RAM 803.
  • the CPU 801 or GPU 813, the ROM 802, and the RAM 803 are connected to each other through the first communication bus 804.
  • the ROM 802 is an optional module; the RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and the executable instructions cause the first processor to perform the operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 805 is also coupled to the first communication bus 804.
  • the communication component 812 can be integrated, or can be configured with multiple sub-modules (e.g., multiple IB network cards) linked on the communication bus.
  • the following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 808 including a hard disk or the like; and a communication interface 809 including a network interface card such as a LAN card, a modem, or the like.
  • A drive 810 is also coupled to the I/O interface 805 as needed.
  • A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage portion 808 as needed.
  • It should be noted that the architecture shown in FIG. 8 is only an optional implementation, and the number and types of the components in FIG. 8 may be selected, deleted, added, or replaced according to actual needs.
  • Different functional components may also be deployed separately or in an integrated manner; for example, the GPU and the CPU may be deployed separately, or the GPU may be integrated on the CPU; the communication component may be deployed separately, or may be integrated on the CPU or the GPU; and so on.
  • These alternative embodiments all fall within the scope of protection of the present application.
  • In particular, embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by any embodiment of the present application, for example: inputting the image to be detected into the attention neural network for region detection, and obtaining at least one target region in the image to be detected that is associated with an object attribute of a target; and inputting the image to be detected and the at least one target region into the attribute classification neural network for attribute detection, to obtain object attribute information of the image to be detected.
  • Alternatively, the program code may include instructions corresponding to the following steps provided in the embodiments of the present application: inputting the training sample image into the attention neural network for region training, to obtain probability information of candidate target regions; sampling candidate target regions of the training sample image according to the probability information of the candidate target regions, to obtain a sampled image sample; and inputting attribute information of the target region and the image sample into the auxiliary classification network for attribute training, to obtain accuracy information of the candidate target regions in the image sample.
  • Alternatively, the program code may include instructions corresponding to the following steps provided in the embodiments of the present application: acquiring a target image to be detected, where the target image includes a still image or a video image; and detecting the target image using an attention neural network, to obtain a target region of the target image; where the attention neural network is trained by the neural network training method according to any embodiment of the present application.
  • In such an embodiment, the computer program can be downloaded and installed from a network via the communication component, and/or installed from the removable medium 811. When the computer program is executed by the first processor, the above-described functions defined in the method of any embodiment of the present application are executed.
  • The embodiments of the present application further provide a computer program, including computer-readable code; when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the steps in the object attribute detection method, the neural network training method, or the region detection method according to any embodiment of the present application.
  • The embodiments of the present application further provide a computer-readable storage medium configured to store computer-readable instructions; when the instructions are executed, the steps in the object attribute detection method, the neural network training method, or the region detection method according to any embodiment of the present application are implemented.
  • The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
  • For the apparatus embodiments, which substantially correspond to the method embodiments, the description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiments.
  • The methods, apparatuses, and devices of the present application may be implemented in many ways.
  • For example, the methods, apparatuses, and devices of the embodiments of the present application can be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • The above-described sequence of the steps of the method is for illustrative purposes only, and the steps of the methods of the embodiments of the present application are not limited to the order described above unless otherwise specified.
  • In some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the embodiments of the present application.
  • Thus, the present application also covers a recording medium storing a program for executing the methods according to the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An object attribute detection method and device, a neural network training method and device, and a region detection method and device. The object attribute detection method comprises: inputting an image to be detected to an attention neural network for region detection to obtain at least one target region, in the image to be detected, associated with an object attribute of a target (S102); and inputting the image to be detected and the at least one target region to an attribute classification neural network for attribute detection to obtain object attribute information of the image to be detected (S104).

Description

Object attribute detection, neural network training, and region detection methods and devices
This application claims priority to Chinese Patent Application No. CN201611246395.9, filed with the Chinese Patent Office on December 29, 2016 and entitled "Object attribute detection, neural network training, region detection methods and devices", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to artificial intelligence technology, and in particular, to an object attribute detection method and apparatus, a neural network training method and apparatus, a region detection method and apparatus, and an electronic device.
Background
Convolutional neural networks are an important research field in computer vision and pattern recognition; inspired by the way a biological brain thinks, a computer processes information about specific objects in a manner similar to how humans do. Object detection and recognition can be performed effectively through convolutional neural networks. With the development of Internet technology and the rapid increase in the amount of information, convolutional neural networks are increasingly widely applied in the field of object detection and recognition, so as to find the actually needed information from large amounts of information.
Summary
The embodiments of the present application provide an object attribute detection scheme, a neural network training scheme, and a region detection scheme.
According to a first aspect of the embodiments of the present application, an object attribute detection method is provided, including: inputting an image to be detected into an attention neural network for region detection, and obtaining at least one target region in the image to be detected that is associated with an object attribute of a target; and inputting the image to be detected and the at least one target region into an attribute classification neural network for attribute detection, to obtain object attribute information of the image to be detected.
According to a second aspect of the embodiments of the present application, a neural network training method is provided, including: inputting a training sample image into an attention neural network for region training, to obtain probability information of candidate target regions; sampling candidate target regions of the training sample image according to the probability information of the candidate target regions, to obtain a sampled image sample; inputting attribute information of the target region and the image sample into an auxiliary classification network for attribute training, to obtain accuracy information of the candidate target regions in the image sample, where the attribute information of the target region is attribute information of the target region annotated for the training sample image; and adjusting parameters of the attention neural network according to the accuracy information.
According to a third aspect of the embodiments of the present application, a region detection method is provided, including: acquiring a target image to be detected, where the target image includes a still image or a video image; and detecting the target image using an attention neural network, to obtain a target region of the target image; where the attention neural network is trained by the neural network training method according to any embodiment of the present application.
According to a fourth aspect of the embodiments of the present application, an object attribute detection apparatus is provided, including: a first acquisition module, configured to input an image to be detected into an attention neural network for region detection, to obtain at least one target region in the image to be detected that is associated with an object attribute of a target; and a second acquisition module, configured to input the image to be detected and the at least one target region into an attribute classification neural network for attribute detection, to obtain object attribute information of the image to be detected.
According to a fifth aspect of the embodiments of the present application, a neural network training apparatus is provided, including: a sixth acquisition module, configured to input a training sample image into an attention neural network for region training, to obtain probability information of candidate target regions; a seventh acquisition module, configured to sample candidate target regions of the training sample image according to the probability information of the candidate target regions, to obtain a sampled image sample; an eighth acquisition module, configured to input attribute information of the target region and the image sample into an auxiliary classification network for attribute training, to obtain accuracy information of the candidate target regions in the image sample, where the attribute information of the target region is attribute information of the target region annotated for the training sample image; and a second parameter adjustment module, configured to adjust parameters of the attention neural network according to the accuracy information.
According to a sixth aspect of the embodiments of the present application, a region detection apparatus is provided, including: a ninth acquisition module, configured to acquire a target image to be detected, where the target image includes a still image or a video image; and a tenth acquisition module, configured to detect the target image using an attention neural network, to obtain a target region of the target image; where the attention neural network is trained by the neural network training method or the neural network training apparatus according to any embodiment of the present application.
According to a seventh aspect of the embodiments of the present application, an electronic device is provided, including:
a processor and a memory;
where the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the object attribute detection method according to any embodiment of the present application; or the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the neural network training method according to any embodiment of the present application; or the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the region detection method according to any embodiment of the present application.
According to an eighth aspect of the embodiments of the present application, another electronic device is provided, including:
a processor and the object attribute detection apparatus according to any embodiment of the present application, where when the processor runs the object attribute detection apparatus, the units in the object attribute detection apparatus according to any embodiment of the present application are run; or
a processor and the neural network training apparatus according to any embodiment of the present application, where when the processor runs the neural network training apparatus, the units in the neural network training apparatus according to any embodiment of the present application are run; or
a processor and the region detection apparatus according to any embodiment of the present application, where when the processor runs the region detection apparatus, the units in the region detection apparatus according to any embodiment of the present application are run.
According to a ninth aspect of the embodiments of the present application, a computer program is provided, including computer-readable code; when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the steps in the object attribute detection method according to any embodiment of the present application; or
when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the steps in the neural network training method according to any embodiment of the present application; or
when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the steps in the region detection method according to any embodiment of the present application.
According to a tenth aspect of the embodiments of the present application, a computer-readable storage medium is provided, configured to store computer-readable instructions; when the instructions are executed, the operations of the steps in the object attribute detection method according to any embodiment of the present application, the operations of the steps in the neural network training method according to any embodiment of the present application, or the operations of the steps in the region detection method according to any embodiment of the present application are implemented.
According to the technical solutions provided by the embodiments of the present application, an attention neural network is used to detect the region of the target in the image to be detected, and the image region detected by the attention neural network is then input into an attribute classification neural network for target attribute detection, to obtain corresponding object attribute information. The trained attention neural network can accurately detect the region where the target is located in the image, and targeted attribute detection on that region can yield more accurate object attribute information of the target.
The technical solutions of the present application are further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of an object attribute detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of an object attribute detection method according to an embodiment of the present application;
FIG. 3 is a flowchart of a neural network training method according to an embodiment of the present application;
FIG. 4 is a flowchart of a region detection method according to an embodiment of the present application;
FIG. 5 is a structural block diagram of an object attribute detection apparatus according to an embodiment of the present application;
FIG. 6 is a structural block diagram of a neural network training apparatus according to an embodiment of the present application;
FIG. 7 is a structural block diagram of a region detection apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Optional implementations of the embodiments of the present application are described in further detail below with reference to the accompanying drawings (in which the same reference numerals denote the same elements) and the embodiments. The following embodiments are intended to illustrate the present application, but not to limit its scope.
Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present application are only used to distinguish different steps, devices, or modules, and represent neither any specific technical meaning nor a necessary logical order among them.
It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
The following description of at least one exemplary embodiment is merely illustrative, and in no way serves as any limitation on the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Referring to FIG. 1, a flowchart of an object attribute detection method according to an embodiment of the present application is shown. The object attribute detection method of this embodiment includes the following steps:
Step S102: input the image to be detected into the attention neural network for region detection, and obtain at least one local region in the image to be detected that is associated with the object attribute of the target as the target region.
The image to be detected in the embodiments of the present application may include a still image or a video image.
The object attribute of the target in the image to be detected is a preset attribute to be detected. For example, detection of face attributes in the image to be detected includes, but is not limited to, one or more of the following: whether glasses are worn, whether a hat is worn, and whether a mask is worn. For another example, detection of vehicle attributes in the image to be detected includes, but is not limited to: vehicle color, style, license plate number, and the like.
In practical applications, the attention neural network is applied to image recognition in deep learning, imitating the way the focus of a person's gaze moves across different objects when the person looks at an image. When the neural network recognizes an image, it concentrates on a subset of the features each time, so the recognition is more accurate. At each recognition, the attention neural network can compute a weight for each feature and then take a weighted sum of the features; the larger the weight, the greater the contribution of that feature to the current recognition.
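The weighting described above can be sketched as a softmax over per-feature relevance scores followed by a weighted sum. The sketch below is illustrative only; the function name, feature values, and scores are assumptions for the example and not part of this application:

```python
import numpy as np

def attention_pool(features, scores):
    """Softmax the relevance scores into weights, then take a weighted sum.

    features: (N, D) array, one feature vector per candidate region.
    scores:   (N,) array of unnormalized relevance scores (assumed given).
    Returns the (D,) attended feature vector and the (N,) weights.
    """
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    weights = exp / exp.sum()             # larger weight => larger contribution
    return weights @ features, weights    # weighted sum of the features

# toy example: three candidate-region features, the first scored highest
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
attended, w = attention_pool(feats, np.array([2.0, 0.5, 1.0]))
# w sums to 1 and the first region receives the largest weight
```

The weights form a probability distribution over the candidate regions, which is what makes the per-region probability information in the training procedure below possible.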
The target region is a local region of the image to be detected. The trained attention neural network has automatic target region detection capability: inputting the image to be detected into the attention neural network yields the corresponding target region(s), which may be one or more, such as multiple face regions, so that attribute detection can be performed on multiple faces at the same time. The attention neural network may be a third-party neural network that has already been trained and can be used directly, or an attention neural network obtained through sample training, such as one trained by the methods described in the embodiments below.
In an optional example, step S102 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a first acquisition module 502 run by the processor.
Step S104: input the image to be detected and the at least one target region into the attribute classification neural network for attribute detection, and obtain object attribute information of the image to be detected.
The attribute classification neural network may adopt any appropriate network form, such as a VGG-16 neural network or a GoogLeNet neural network, and its training may also adopt a conventional training method, as long as the trained network has attribute classification and recognition functions. For example, a pedestrian's gender, age, clothing, and the like can be recognized.
The input of the attribute classification neural network is the entire image to be detected and the target region determined by the attention neural network, such as the head region of a human body, and the output is the values of the attributes of the target region, such as the values of the attributes of the head.
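The two-stage flow of steps S102 and S104 can be sketched as plain glue code. The box format, the stand-in models, and the attribute labels below are assumptions made for the sketch; the application does not fix the networks' interfaces:

```python
import numpy as np

def detect_attributes(image, attention_net, attr_net):
    """Two-stage sketch: detect target regions, then classify each region.

    attention_net(image) -> list of (x, y, w, h) target-region boxes (assumed).
    attr_net(image, crop) -> attribute value for that region (assumed).
    """
    results = []
    for (x, y, w, h) in attention_net(image):
        crop = image[y:y + h, x:x + w]          # the local target region
        results.append(attr_net(image, crop))   # whole image + region go in
    return results

# toy stand-ins for the two trained networks
img = np.zeros((8, 8))
toy_boxes = lambda im: [(0, 0, 4, 4), (4, 4, 4, 4)]
toy_classify = lambda im, crop: "hat" if crop.mean() > 0 else "no_hat"
out = detect_attributes(img, toy_boxes, toy_classify)  # ["no_hat", "no_hat"]
```

The point of the design is that the classifier sees both the whole image (context) and the attention-selected region (focus), one attribute prediction per region.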
In an optional example, step S104 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a second acquisition module 504 run by the processor.
Optionally, the object attribute detection method of another embodiment may further include: displaying the above object attribute information in the image to be detected. In an optional example, the operation of displaying the object attribute information in the image to be detected may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a display module 506 run by the processor.
Through this embodiment, an attention neural network is used to detect the region of the target in the image to be detected, and the image region detected by the attention neural network is then input into an attribute classification neural network for target attribute detection, to obtain corresponding object attribute information. The trained attention neural network can accurately detect the region where the target is located in the image (i.e., the target region), and targeted attribute detection on that region can yield more accurate object attribute information of the target.
Referring to FIG. 2, a flowchart of an object attribute detection method according to another embodiment of the present application is shown. In this embodiment, an attention neural network for detecting the region corresponding to a target may first be trained, and object attribute detection is then performed using the trained attention neural network. The object attribute detection method of this embodiment includes the following steps:
Step S202: using training sample images and an auxiliary classification network, train the attention neural network into a neural network for detecting target regions in images.
In an optional example, step S202 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a first training module 508 run by the processor.
Optionally, step S202 may include:
步骤S2022:将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息。Step S2022: Input the training sample image into the attention neural network for area training, and obtain probability information of the candidate target area.
其中,训练样本图像可以由本领域技术人员根据实际需求适当选择,例如可以包括但不限于:人物样本图像和车辆样本图像。The training sample image may be appropriately selected by a person skilled in the art according to actual needs, and may include, for example, but not limited to, a person sample image and a vehicle sample image.
本申请各实施例中的注意力神经网络,可以认为是引入了注意力机制的卷积网络。引入了注意力机制后,卷积网络在图像训练过程中,可以确定图像中的每个候选目标区域对最终确定的目标区域的影响程度,这种影响程度通常以概率形式表示,也即,候选目标区域的概率信息。The attention neural network in the embodiments of the present application can be considered as a convolution network that introduces an attention mechanism. After the attention mechanism is introduced, the convolutional network can determine the degree of influence of each candidate target region in the image on the final target region during image training. This degree of influence is usually expressed in the form of probability, that is, candidates. Probability information for the target area.
以单张图像为例,其中通常包括多个候选目标区域,通过注意力神经网络的处理,可以初步获得该图像中各个候选目标区域可能为最终的目标区域的概率值。同样,训练样本集中的所有图像通过注意力神经网络的处理,可获得各自图像中各个候选目标区域可能为最终的目标区域的概率值。例如,在人物图像中,即为多个候选区域为头部区域的概率值。Taking a single image as an example, it usually contains multiple candidate target regions; through the processing of the attention neural network, the probability value that each candidate target region in the image is the final target region can be preliminarily obtained. Similarly, after all images in the training sample set are processed by the attention neural network, the probability value that each candidate target region in the respective image is the final target region can be obtained. For example, in a person image, these are the probability values that each of the multiple candidate regions is the head region.
本实施例中,以人物样本图像训练注意力神经网络为例,以实现注意力神经网络对人物的相应目标区域,如头部区域、上身区域、下身区域、足部区域、手部区域等的自动识别。本领域技术人员可以参照对人物样本图像的训练,实现对其它样本图像如车辆样本图像的训练,如注意力神经网络对车辆相应目标区域,如车辆牌号区域、车辆标志区域、车身区域等的自动识别。In this embodiment, training the attention neural network with person sample images is taken as an example, so that the attention neural network can automatically recognize the corresponding target areas of a person, such as the head area, the upper body area, the lower body area, the foot area, and the hand area. Those skilled in the art can, with reference to the training on person sample images, train on other sample images such as vehicle sample images, so that the attention neural network can automatically recognize the corresponding target areas of a vehicle, such as the license plate area, the vehicle logo area, and the vehicle body area.
步骤S2024:根据候选目标区域的概率信息对训练样本图像进行候选目标区域采样,获得采样后的图像样本。Step S2024: Perform sampling of the candidate target region on the training sample image according to the probability information of the candidate target region, and obtain the sampled sampled image.
在一张样本图像中,概率值较大的候选目标区域被采样的可能性也较大。一般情况下,对一张具有多个候选目标区域的样本图像来说,对其进行采样,可能采集到的是该样本图像的多个候选目标区域中的部分区域,也可能是全部区域。采样数量可以由本领域技术人员根据实际需要适当设置,本申请实施例对此不作限制。In a sample image, a candidate target region with a larger probability value is more likely to be sampled. In general, for a sample image with multiple candidate target regions, sampling it may capture only some of those candidate target regions, or all of them. The number of samples may be appropriately set by a person skilled in the art according to actual needs, and the embodiments of the present application do not limit this.
在一种可行方案中,可以先确定候选目标区域的概率值对应的多项式分布;然后,根据多项式分布,对每个训练样本图像进行候选目标区域采样,获取采样后的图像样本。In a feasible solution, the polynomial distribution corresponding to the probability value of the candidate target region may be determined first; then, according to the polynomial distribution, the candidate target region is sampled for each training sample image, and the sampled image sample is obtained.
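The sampling step described above can be sketched as follows. This is a minimal illustration using NumPy; the function name and the example probability values are illustrative and not taken from the application:

```python
import numpy as np

def sample_candidate_regions(probs, m, rng=None):
    """Sample m candidate-region indices from the multinomial
    distribution defined by the per-region probability values."""
    rng = rng or np.random.default_rng(0)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # normalize to a valid distribution
    return rng.choice(len(probs), size=m, p=probs)

# Regions with larger probability values are sampled more often.
probs = [0.05, 0.70, 0.15, 0.10]  # hypothetical candidate-region probabilities
samples = sample_candidate_regions(probs, m=1000)
counts = np.bincount(samples, minlength=len(probs))
```

Over many draws the sample counts track the probability values, so the high-probability region (index 1 here) dominates the sampled set.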
通过采样,可以获得采集到的训练样本图像中的目标区域的信息,通过该信息可以获得相对应的目标区域的特征图(feature map)。By sampling, information of the target area in the collected training sample image can be obtained, and the feature map of the corresponding target area can be obtained by using the information.
步骤S2026:将目标区域的属性信息和图像样本输入到辅助分类网络中进行属性训练,获得图像样本中的候选目标区域的准确度信息;并根据准确度信息调整注意力神经网络的网络参数,该网络参数例如可以包括但不限于权重参数(weight)、偏置参数(bias)等。Step S2026: input the attribute information of the target area and the image samples into the auxiliary classification network for attribute training, obtain the accuracy information of the candidate target areas in the image samples, and adjust the network parameters of the attention neural network according to the accuracy information; the network parameters may include, for example, but are not limited to, weight parameters and bias parameters.
其中,目标区域的属性信息为针对训练样本图像标注的目标区域的属性信息。The attribute information of the target area is attribute information of the target area marked for the training sample image.
其中,目标区域的属性信息用于表征目标区域的对象的属性,例如,对于人脸的头部区域,其属性信息例如可以包括但不限于以下一种或多种:性别、年龄、发型、是否佩戴眼镜、是否佩戴口罩等。采样后的图像样本中包含有采样到的区域的信息,包括采集到了哪个区域,以及该区域对应的特征图。The attribute information of the target area is used to characterize the attributes of the object in the target area. For example, for the head area, the attribute information may include, but is not limited to, one or more of the following: gender, age, hairstyle, whether glasses are worn, whether a mask is worn, and the like. The sampled image sample contains information of the sampled areas, including which areas were captured and the feature maps corresponding to those areas.
在使用辅助分类网络之前,需要先获取目标区域的属性信息。在一种可行方式中,该目标区域的属性信息可以在初始时与训练样本图像一起输入,其中,训练样本图像输入给注意力神经网络,而目标区域的属性信息输入给辅助分类网络。但不限于此,该目标区域的属性信息也可以与训练样本图像一起输入注意力神经网络,然后由注意力神经网络传输给辅助分类网络使用;还可以在输入采样后的图像样本时临时通过适当方式获取等。Before the auxiliary classification network is used, the attribute information of the target area needs to be acquired first. In a feasible manner, the attribute information of the target area may be input together with the training sample images at the beginning, where the training sample images are input to the attention neural network while the attribute information of the target area is input to the auxiliary classification network. This is not limiting, however: the attribute information of the target area may also be input into the attention neural network together with the training sample images and then passed by the attention neural network to the auxiliary classification network; or it may be temporarily obtained in an appropriate manner when the sampled image samples are input.
本申请各实施例中,辅助分类网络用于实现注意力神经网络的强化学习,在实际应用中,辅助分类网络可以采用任意适当的能够实现强化学习的网络。强化学习作为一个序列决策(Sequential Decision Making)问题,它连续选择一些行为,从这些行为完成后得到最大的回报作为最好的结果。它在没有标签(label)告诉算法应该怎么做的情况下,通过先尝试做出一些行为,然后得到一个结果,通过判断这个结果是对还是错来对之前的行为进行反馈。由这个反馈来调整之前的行为,通过不断的调整,算法能够学习到在什么样的情况下选择什么样的行为可以得到最好的结果。In the embodiments of the present application, the auxiliary classification network is used to implement reinforcement learning for the attention neural network; in practical applications, the auxiliary classification network may be any suitable network capable of implementing reinforcement learning. Reinforcement learning, as a sequential decision making problem, continuously selects behaviors and takes the greatest reward obtained after these behaviors are completed as the best result. With no label telling the algorithm what to do, it first tries some behaviors and obtains a result, then provides feedback on the previous behaviors by judging whether that result is right or wrong. This feedback adjusts the previous behaviors, and through continuous adjustment the algorithm learns which behavior to choose in which situation to obtain the best result.
本实施例中,辅助分类网络通过对各个采样后的图像样本中的各个候选目标区域的回报值(reward)的计算,确定注意力神经网络对相应的候选目标区域的概率估算是否准确,进而决定如何调整注意力神经网络的网络参数,以使注意力神经网络的预测更为准确。In this embodiment, by calculating the reward value of each candidate target region in each sampled image sample, the auxiliary classification network determines whether the attention neural network's probability estimate for the corresponding candidate target region is accurate, and then decides how to adjust the network parameters of the attention neural network so that its predictions become more accurate.
本实施例中,将目标区域的属性信息和图像样本输入到辅助分类网络中进行属性训练,通过辅助分类网络的损失函数,获得图像样本中,候选目标区域的属性信息的损失值。其中,损失函数根据目标区域的属性信息确定;然后,根据获得的损失值,确定图像样本中的候选目标区域的回报值,该回报值即为准确度信息。In this embodiment, the attribute information and the image sample of the target area are input into the auxiliary classification network for attribute training, and the loss value of the attribute information of the candidate target area in the image sample is obtained by the loss function of the auxiliary classification network. The loss function is determined according to the attribute information of the target area; then, according to the obtained loss value, the reward value of the candidate target area in the image sample is determined, and the reward value is the accuracy information.
例如,可以首先对至少一个图像样本的至少一个候选目标区域的损失值求平均,获得平均值;再根据所述平均值和获得的所述损失值的关系,确定采样后的图像样本中的候选目标区域的回报值。For example, the loss values of at least one candidate target region of at least one image sample may first be averaged to obtain an average value; then, according to the relationship between the average value and each obtained loss value, the reward value of each candidate target region in the sampled image samples is determined.
在一种可行方案中,若获得的所述损失值满足设定标准,则将所述损失值对应的候选目标区域的回报值设置为第一回报值;否则,将所述损失值对应的候选目标区域的回报值设置为第二回报值。可选的,所述设定标准可以为损失值小于平均值的X倍(例如在实际应用中,X值可以为0.5),则将损失值对应的候选目标区域的回报值设置为1;否则,将损失值对应的候选目标区域的回报值设置为0。其中,所述设定标准可以由本领域技术人员根据实际情况适当设置,如还可以设置为损失值小于平均值的0.5倍,且从大到小的损失值中的前N个等,N为大于0的整数。In a feasible solution, if the obtained loss value satisfies a set criterion, the reward value of the candidate target region corresponding to the loss value is set to a first reward value; otherwise, it is set to a second reward value. Optionally, the set criterion may be that the loss value is less than X times the average value (for example, in practical applications, X may be 0.5); the reward value of the candidate target region corresponding to such a loss value is then set to 1, and otherwise to 0. The set criterion may be appropriately configured by those skilled in the art according to the actual situation; for example, it may also require that the loss value be less than 0.5 times the average value and be among the first N loss values sorted from large to small, where N is an integer greater than 0.
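The threshold criterion just described (reward 1 when a region's loss is below X times the average loss, with X = 0.5) can be sketched in a few lines. The function name and the example loss values are illustrative:

```python
def assign_rewards(losses, x=0.5):
    """Reward 1 for regions whose loss is below x times the average
    loss over all regions, 0 otherwise."""
    avg = sum(losses) / len(losses)
    return [1 if loss < x * avg else 0 for loss in losses]

# Hypothetical per-region losses; average = 1.04, so the threshold is 0.52.
losses = [0.2, 1.0, 3.0, 0.1, 0.9]
rewards = assign_rewards(losses)  # -> [1, 0, 0, 1, 0]
```

Regions whose attribute-classification loss is well below average are the ones the attention network is encouraged to propose again.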
如果调整后的注意力神经网络的网络参数能够使通过辅助分类网络得到的目标区域的回报值为1,非目标区域的回报值为0,则可以认为注意力神经网络训练完成。否则,继续根据回报值调整注意力神经网络的参数,直至通过辅助分类网络最终得到的目标区域的回报值为1,非目标区域的回报值为0。If the adjusted network parameters of the attention neural network can make the target area obtained by the auxiliary classification network have a return value of 1 and the non-target area has a return value of 0, then the attention neural network training can be considered completed. Otherwise, the parameters of the attention neural network are continuously adjusted according to the reward value until the target area obtained by the auxiliary classification network has a return value of 1 and the non-target area has a return value of 0.
至此,实现了对注意力神经网络的训练,训练后的注意力神经网络可以准确预测出目标区域。So far, the training of the attention neural network has been realized, and the attention neural network after training can accurately predict the target area.
步骤S204:将待检图像输入到注意力神经网络中进行区域检测,获得待检图像中与目标的对象属性相关联的至少一个局部区域作为目标区域。Step S204: Input the image to be detected into the attention neural network for area detection, and obtain at least one local area associated with the object attribute of the target in the image to be detected as the target area.
如前所述,经过训练的注意力神经网络能够进行目标区域检测,从而检测出与目标的对象属性相关联的至少一个目标区域。As described above, the trained attention neural network is capable of performing target region detection, thereby detecting at least one target region associated with the object attributes of the target.
在一个可选示例中,该步骤S204可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一获取模块502执行。In an alternative example, the step S204 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first acquisition module 502 being executed by the processor.
步骤S206:将待检图像和至少一个目标区域输入到属性分类神经网络中进行属性检测,获得待检图像的对象属性信息。Step S206: Input the image to be detected and the at least one target area into the attribute classification neural network for attribute detection, and obtain object attribute information of the image to be inspected.
在一个可选示例中,该步骤S206可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二获取模块504执行。In an alternative example, the step S206 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second acquisition module 504 being executed by the processor.
通过本实施例,使用注意力神经网络进行待检图像中目标的区域检测,进而将注意力神经网络检测出的图像区域输入属性分类神经网络进行目标的属性检测,获得相应的对象属性信息。经过训练的注意力神经网络可以准确检测出图像中目标所在区域,针对该区域进行有针对性的属性检测,可以获得较为精确的目标的对象属性信息。In this embodiment, the attention neural network is used to detect the target region in the image to be inspected, and then the image region detected by the attention neural network is input into the attribute classification neural network to perform target attribute detection, and corresponding object attribute information is obtained. The trained attention neural network can accurately detect the target area in the image, and perform targeted attribute detection on the area to obtain more accurate object attribute information.
以下,通过图3所示实施例对本申请实施例中提供的注意力神经网络的训练进行说明。参照图3,示出了根据本申请一实施例的神经网络训练方法的流程图。本实施例的神经网络训练方法包括以下步骤:Hereinafter, the training of the attention neural network provided in the embodiment of the present application will be described by using the embodiment shown in FIG. 3. Referring to FIG. 3, a flow chart of a neural network training method in accordance with an embodiment of the present application is shown. The neural network training method of this embodiment includes the following steps:
步骤S302:将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息。Step S302: Input the training sample image into the attention neural network for area training, and obtain probability information of the candidate target area.
本实施例中,仍以人物样本图像训练注意力神经网络为例,以实现注意力神经网络对人物的相应目标区域的自动识别。In this embodiment, the attention sample neural network is still taken as an example to realize the automatic recognition of the corresponding target area of the character by the attention neural network.
本实施例中,候选目标区域的概率信息可以包括候选目标区域的概率值。In this embodiment, the probability information of the candidate target area may include a probability value of the candidate target area.
在一个可选示例中,该步骤S302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第三获取模块5082执行。In an optional example, the step S302 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a third acquisition module 5082 being executed by the processor.
步骤S304:根据候选目标区域的概率信息对训练样本图像进行候选目标区域采样,获得采样后的图像样本。Step S304: Perform sampling of the candidate target region on the training sample image according to the probability information of the candidate target region, and obtain the sampled sampled image.
在一张样本图像中,概率值较大的候选目标区域被采样的可能性也较大。一般情况下,对一张具有多个候选目标区域的样本图像来说,对其进行采样,可能采集到的是该样本图像的多个候选目标区域中的部分区域,也可能是全部区域。采样数量可以由本领域技术人员根据实际需要适当设置,本申请实施例对此不作限制。In a sample image, a candidate target region with a larger probability value is more likely to be sampled. In general, for a sample image with multiple candidate target regions, sampling it may capture only some of those candidate target regions, or all of them. The number of samples may be appropriately set by a person skilled in the art according to actual needs, and the embodiments of the present application do not limit this.
在一种可行方案中,可以先确定候选目标区域的概率值对应的多项式分布;然后,根据多项式分布,对训练样本图像进行候选目标区域采样,获取采样后的图像样本。In a feasible solution, the polynomial distribution corresponding to the probability value of the candidate target region may be determined first; then, according to the polynomial distribution, the candidate target region is sampled by the training sample image, and the sampled image sample is obtained.
通过采样,可以获得采集到的人物图像训练样本中的区域的信息,通过该信息可以获得相对应的区域的特征图。By sampling, information of the region in the collected person image training sample can be obtained, by which the feature map of the corresponding region can be obtained.
在一个可选示例中,该步骤S304可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第四获取模块5084执行。In an optional example, the step S304 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a fourth acquisition module 5084 run by the processor.
步骤S306:将目标区域的属性信息和图像样本输入到辅助分类网络中进行属性训练,获得图像样本中的候选目标区域的准确度信息。Step S306: Input attribute information and image samples of the target area into the auxiliary classification network for attribute training, and obtain accuracy information of the candidate target area in the image sample.
其中,目标区域的属性信息为针对训练样本图像标注的目标区域的属性信息。The attribute information of the target area is attribute information of the target area marked for the training sample image.
目标区域的属性信息用于表征目标区域的对象的属性,例如,对于头部区域,其属性信息可以包括但不限于:性别、年龄、发型、是否佩戴眼镜、是否佩戴口罩等。采样后的图像样本中包含有采样到的区域的信息,包括采集到了哪个区域,该区域对应的特征图。The attribute information of the target area is used to represent the attributes of the object of the target area. For example, for the head area, the attribute information may include, but is not limited to, gender, age, hairstyle, whether to wear glasses, whether to wear a mask, or the like. The sampled image sample contains information of the sampled area, including which area is collected, and the corresponding feature map of the area.
本实施例中,辅助分类网络通过对各个采样后的图像样本中的各个候选目标区域的回报值的计算,确定注意力神经网络对相应的候选目标区域的概率估算是否准确,进而决定如何调整注意力神经网络的网络参数,以使注意力神经网络的预测更为准确。In this embodiment, by calculating the reward value of each candidate target region in each sampled image sample, the auxiliary classification network determines whether the attention neural network's probability estimate for the corresponding candidate target region is accurate, and then decides how to adjust the network parameters of the attention neural network so that its predictions become more accurate.
在通过辅助分类网络和目标区域的属性信息,获得训练样本图像中的候选目标区域的准确度信息如本实施例中的回报值时,一种可行方案中,可以将目标区域的属性信息和图像样本输入到辅助分类网络中进行属性训练,通过辅助分类网络的损失函数,获得图像样本中候选目标区域的属性信息的损失值,其中,损失函数根据目标区域的属性信息确定;根据获得的所述损失值,确定图像样本中的候选目标区域的回报值,所述回报值为所述准确度信息。例如,可以首先对至少一个图像样本的至少一个候选目标区域的损失值求平均(例如对所有图像样本的各个候选目标区域的损失值求平均),获得平均值;再根据所述平均值和获得的所述损失值的关系,确定采样后的图像样本中的候选目标区域的回报值。在一种可行方案中,若获得的损失值小于平均值的0.5倍,且损失值满足设定标准,则将损失值对应的候选目标区域的回报值设置为1;否则,将损失值对应的候选目标区域的回报值设置为0。其中,所述设定标准可以由本领域技术人员根据实际情况适当设置,如设置为从大到小的损失值中的前N个等,N为大于0的整数。When the accuracy information of the candidate target regions in the training sample images (such as the reward value in this embodiment) is obtained through the auxiliary classification network and the attribute information of the target area, in a feasible solution, the attribute information of the target area and the image samples may be input into the auxiliary classification network for attribute training, and the loss value of the attribute information of each candidate target region in the image samples is obtained through the loss function of the auxiliary classification network, where the loss function is determined according to the attribute information of the target area; according to the obtained loss values, the reward values of the candidate target regions in the image samples are determined, and the reward values are the accuracy information. For example, the loss values of at least one candidate target region of at least one image sample may first be averaged (for example, the loss values of the candidate target regions of all image samples are averaged) to obtain an average value; then, according to the relationship between the average value and each obtained loss value, the reward value of each candidate target region in the sampled image samples is determined. In a feasible solution, if an obtained loss value is less than 0.5 times the average value and satisfies the set criterion, the reward value of the candidate target region corresponding to that loss value is set to 1; otherwise, the reward value of the candidate target region corresponding to that loss value is set to 0. The set criterion may be appropriately configured by those skilled in the art according to the actual situation, for example, as the first N among the loss values sorted from large to small, where N is an integer greater than 0.
可以理解的是,上述的可行方案仅是其中一种实现方式,在实际应用中,用户可以根据实际需求调整实现条件或可选参数,上述可行方案的举例不应理解为唯一的实现方式。It can be understood that the above-mentioned feasible solution is only one of the implementation manners. In practical applications, the user can adjust the implementation condition or the optional parameter according to actual needs, and the example of the above feasible solution should not be construed as the only implementation manner.
在一个可选示例中,该步骤S306可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第五获取模块5086执行。In an alternative example, the step S306 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a fifth acquisition module 5086 being executed by the processor.
步骤S308:根据准确度信息调整注意力神经网络的参数。Step S308: Adjust parameters of the attention neural network according to the accuracy information.
其中,调整的注意力神经网络的参数例如可以包括但不限于权重参数、偏置参数等网络参数。The parameters of the adjusted attention neural network may include, for example, but are not limited to, network parameters such as weight parameters and offset parameters.
如果调整后的注意力神经网络的网络参数能够使通过辅助分类网络得到的目标区域的回报值为1,非目标区域的回报值为0,则可以认为注意力神经网络训练完成。否则,继续根据回报值调整注意力神经网络的参数,直至通过辅助分类网络最终得到的目标区域的回报值为1,非目标区域的回报值为0。If the adjusted network parameters of the attention neural network can make the target area obtained by the auxiliary classification network have a return value of 1 and the non-target area has a return value of 0, then the attention neural network training can be considered completed. Otherwise, the parameters of the attention neural network are continuously adjusted according to the reward value until the target area obtained by the auxiliary classification network has a return value of 1 and the non-target area has a return value of 0.
上述注意力神经网络的训练收敛条件仅是其中一种实现方案,可以理解的是,在实际应用中,本申请实施例的注意力神经网络还可以设置其他的训练收敛条件,上述训练收敛条件的举例不应理解为唯一的实现方式。The training convergence condition of the above attention neural network is only one of the implementation solutions. It can be understood that, in practical applications, the attention neural network of the embodiment of the present application may also set other training convergence conditions, and the above training convergence conditions are An example should not be construed as the only implementation.
在一个可选示例中,该步骤S308可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一参数调整模块5088执行。In an alternative example, the step S308 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first parameter adjustment module 5088 being executed by the processor.
至此,实现了对注意力神经网络的训练,训练后的注意力神经网络可以准确预测出目标区域。需要说明的是,为了提高训练效果,一种可选方式为,针对不同的目标区域分别对注意力神经网络进行训练,例如,在一次训练中,仅训练注意力神经网络对人物的头部区域的预测;在另一次训练中,仅训练注意力神经网络对人物的上身区域的预测等。So far, the training of the attention neural network has been realized, and the attention neural network after training can accurately predict the target area. It should be noted that, in order to improve the training effect, an optional method is to separately train the attention neural network for different target areas, for example, in one training, only the attention neural network is trained on the head region of the character. In another training, only the attentional neural network is trained to predict the upper body area of the character.
此外,在已训练完成的注意力神经网络的基础上,还可以进行以下可选方案,即:采用训练完成的注意力神经网络检测训练样本图像,获得训练样本图像的目标区域;使用训练样本图像、每个训练样本图像的目标区域、和每个目标区域的属性信息训练属性分类神经网络。In addition, on the basis of the trained attention neural network, the following optional scheme may also be performed: using the trained attention neural network to detect the training sample images to obtain the target region of each training sample image; and training the attribute classification neural network using the training sample images, the target region of each training sample image, and the attribute information of each target region.
其中,属性分类神经网络可以采用任意适当的网络形式,例如卷积神经网络,其训练也可以采用常规的训练方法。通过每个训练样本图像的目标区域可以有效地对训练样本图像中的目标区域的识别进行学习和训练,通过每个目标区域的属性信息可以有效地对识别出的人物图像中的目标区域中的对象的属性进行学习和训练。The attribute classification neural network may adopt any appropriate network form, such as a convolutional neural network, and its training may also adopt conventional training methods. The target region of each training sample image enables effective learning and training of the recognition of target regions in the training sample images, and the attribute information of each target region enables effective learning and training of the attributes of the objects in the target regions of the recognized person images.
可选地,本申请实施例中的注意力神经网络可以是全卷积神经网络,与具有全连接层的卷积神经网络相比,采用全卷积神经网络所需的卷积层参数少,训练速度更快。Optionally, the attention neural network in the embodiment of the present application may be a full convolutional neural network. Compared with a convolutional neural network with a fully connected layer, the convolutional layer parameters required by the full convolutional neural network are less. Training is faster.
因图像中主体对象的属性往往只跟主体的某些区域有关,并不需要整张图像的特征,例如,行人属性往往只跟行人的某些身体区域有关,并不需要一整张行人图像的特征,例如有无戴眼镜、有无戴帽子、有无戴口罩这些属性只需要行人头部的特征即可。本实施例的方案中,采用基于增强学习(Reinforcement Learning)方法的注意力机制来让算法自动选择每个属性在图像中的关联区域,可以再着重提取相关联区域的特征,从而利用该特征和图像的全局特征来对相应的属性做预测,这样不仅可以节省人工标注的成本,而且可以找到对训练较优的区域。Since the attributes of the subject object in an image are often related only to certain regions of the subject, the features of the entire image are not needed. For example, pedestrian attributes are often related only to certain body regions of the pedestrian and do not require the features of a whole pedestrian image; attributes such as whether glasses, a hat, or a mask are worn only require features of the pedestrian's head. In the solution of this embodiment, an attention mechanism based on reinforcement learning is adopted to let the algorithm automatically select the region in the image associated with each attribute, and the features of the associated region can then be extracted with emphasis, so that these features together with the global features of the image are used to predict the corresponding attributes. This not only saves the cost of manual labeling, but also finds regions that are better for training.
以下,以一个可选实例对本实施例的神经网络训练方法进行示例性说明。Hereinafter, the neural network training method of the present embodiment will be exemplarily illustrated with an optional example.
本实例中,以训练注意力神经网络对人物的头部区域的识别为例,其训练过程如下:In this example, the training attention neural network is used to identify the head region of a person as an example. The training process is as follows:
(1)对将要识别的行人属性依据其关联的身体部位人工分类,所关联区域相同的属性分为一类,例如戴眼镜、戴帽子、戴口罩这些属性只涉及到行人的头部;上衣的类型、背包这些属性只涉及到行人的上半身。(1) The pedestrian attributes to be recognized are manually classified according to their associated body parts, and attributes with the same associated region are grouped into one category. For example, wearing glasses, wearing a hat, and wearing a mask involve only the pedestrian's head, while the type of top and carrying a backpack involve only the pedestrian's upper body.
(2)针对每一个身体部位训练一个全卷积的注意力神经网络。(2) Train a full convolutional attention neural network for each body part.
以下,以训练头部的注意力神经网络为例,其它部位的训练以及非行人情况下的训练可参照本实例实现。In the following, taking the attentional neural network of the training head as an example, training in other parts and training in non-pedestrian situations can be implemented with reference to this example.
注意力神经网络在每次迭代训练的时候,会选择一批图像作为输入,注意力神经网络每次输入整个数据集的一部分数据做训练即一批图像,下一次迭代时会输入下一批图像,以此类推,直至整个数据集中的数据全部迭代完成。注意力神经网络对每张图像会输出一张特征图,特征图中的至少一个位置满足多项式分布,该至少一个位置的值为对应的概率;然后,每张图像从这一多项式分布中随机取样M个区域,取样M个区域中每个区域的概率是特征图中该区域对应的概率值,其中,M为大于0的整数,由本领域技术人员根据实际需要适当设置;取样到的每个区域会经过辅助分类网络,通过辅助分类网络中的属性分类的损失函数得到一个属性的分类的损失;记L为N×M个区域的损失的平均值,N表示图像样本的个数,对每一个图像所选的M个区域的损失从小到大进行排序,如果一个区域位于排序后的队列的前top_k个,并且其损失小于0.5L(即:平均损失的一半),则该区域的回报值为1,否则为0。其中,top_k可以由本领域技术人员根据实际需要适当设置,本申请实施例对此不作限制。At each training iteration, the attention neural network takes a batch of images as input; each time it is trained on part of the whole dataset, i.e., one batch of images, with the next batch input at the next iteration, and so on until the entire dataset has been iterated through. The attention neural network outputs a feature map for each image; the positions in the feature map satisfy a multinomial distribution, and the value at each position is the corresponding probability. Then, M regions are randomly sampled from this multinomial distribution for each image, where the probability of sampling each of the M regions is the probability value corresponding to that region in the feature map; M is an integer greater than 0, set appropriately by those skilled in the art according to actual needs. Each sampled region passes through the auxiliary classification network, and the classification loss of an attribute is obtained through the attribute-classification loss function of the auxiliary classification network. Let L denote the average of the losses of the N×M regions, where N is the number of image samples; the losses of the M regions selected for each image are sorted from small to large, and if a region is among the first top_k entries of the sorted queue and its loss is less than 0.5L (i.e., half of the average loss), the reward value of that region is 1, otherwise 0. Here top_k may be appropriately set by those skilled in the art according to actual needs, and the embodiments of the present application do not limit this.
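The per-image reward rule above (a region earns reward 1 only if it is among the top_k smallest losses of its image and its loss is below half the global average L) can be sketched as follows. The loss values and parameters are illustrative:

```python
def region_rewards(image_losses, top_k, avg_loss):
    """For the M regions sampled from one image: reward 1 iff the region
    is among the top_k smallest losses AND its loss is below half the
    average loss over all N*M regions (avg_loss); reward 0 otherwise."""
    order = sorted(range(len(image_losses)), key=lambda i: image_losses[i])
    smallest = set(order[:top_k])
    return [1 if i in smallest and image_losses[i] < 0.5 * avg_loss else 0
            for i in range(len(image_losses))]

# One image with M = 4 sampled regions; the global average L is assumed to be 1.0.
rewards = region_rewards([0.8, 0.1, 0.5, 0.3], top_k=2, avg_loss=1.0)  # -> [0, 1, 0, 1]
```

Both conditions must hold: the region at loss 0.5 is below the threshold but not among the top_k smallest, so it still earns reward 0.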
其中,因为每个属性为一个多值属性,因此每个属性可以采用损失函数(softmax函数)来计算损失,最终的损失是所有属性的损失的和。一种辅助分类网络中的属性分类的损失函数如下:Among them, because each attribute is a multi-valued attribute, each attribute can use the loss function (softmax function) to calculate the loss, and the final loss is the sum of the losses of all attributes. A loss function for attribute classification in an auxiliary classification network is as follows:
Loss = -(1/N) ∑_{n=1}^{N} ∑_{k=1}^{K} log p(y_k^n)
其中,y_k^n为第n个图像样本的第k个属性的真实的标签(根据输入的头部区域的属性值确定),p(y_k^n)为网络输出的该属性的标签为y_k^n的概率,N为图像样本的数量,K为每个图像样本的属性的数量。where y_k^n is the true label of the k-th attribute of the n-th image sample (determined according to the attribute values of the input head region), p(y_k^n) is the probability output by the network that the label of this attribute is y_k^n, N is the number of image samples, and K is the number of attributes of each image sample.
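As a numerical illustration of this loss, the sketch below assumes the auxiliary classification network has already produced, for each sample, the probability it assigns to the true label of each attribute; the function name and probability values are illustrative:

```python
import math

def attribute_classification_loss(true_label_probs):
    """true_label_probs[n][k]: probability the auxiliary classification
    network assigns to the true label of attribute k for sample n.
    Returns the negative log-likelihood, summed over attributes and
    averaged over samples."""
    n = len(true_label_probs)
    return sum(-math.log(p) for sample in true_label_probs for p in sample) / n

# Two hypothetical samples, two attributes each.
loss = attribute_classification_loss([[0.9, 0.8], [0.6, 0.5]])
```

The loss is 0 only when every attribute's true label receives probability 1, and grows as the network assigns lower probability to the correct labels.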
训练后的注意力神经网络的输入为一整张行人图像,输出是图像中每个可能的区域是头部的概率,其中,注意力神经网络为全卷积神经网络。例如,可以是2个卷积层再加一个Softmax层,每个卷积层后加一个ReLU层。其中,Softmax层前的最后一个卷积层的输出是一个单个频道的特征图,然后经过Softmax层后,特征图的每个位置的值就是该位置在原图中所对应的区域可以选为头部的概率,概率最大的区域即可选为头部区域。The input of the trained attention neural network is a whole pedestrian image, and the output is the probability that each possible region in the image is the head; the attention neural network is a fully convolutional neural network. For example, it can be 2 convolutional layers plus a Softmax layer, with a ReLU layer after each convolutional layer. The output of the last convolutional layer before the Softmax layer is a single-channel feature map; after the Softmax layer, the value at each position of the feature map is the probability that the region corresponding to that position in the original image can be selected as the head, and the region with the highest probability can be selected as the head region.
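The final step, turning the single-channel feature map into per-position probabilities via a Softmax and picking the most probable position as the head region, can be sketched as follows. The feature values are illustrative, standing in for the output of the last convolutional layer:

```python
import numpy as np

def region_probability_map(feature_map):
    """Softmax over all positions of a single-channel feature map: each
    position's value becomes the probability that its corresponding
    region in the original image is the head region."""
    flat = feature_map.ravel()
    e = np.exp(flat - flat.max())  # numerically stable softmax
    return (e / e.sum()).reshape(feature_map.shape)

fmap = np.array([[0.1, 2.0],
                 [0.3, 0.5]])      # hypothetical last-conv-layer output
probs = region_probability_map(fmap)
head_pos = np.unravel_index(probs.argmax(), probs.shape)  # -> (0, 1)
```

At training time, these probabilities define the multinomial distribution the regions are sampled from; at inference time, the arg-max position directly selects the head region.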
本实施例中的注意力神经网络采用增强学习来进行优化,注意力神经网络不会立即计算损失,而是评估每个区域的回报值,最终的目标是让回报值最大化。评估每个可能选为头部的区域的回报值的时候,将该区域再输入到一个辅助分类网络里面,辅助分类网络的损失函数为涉及到头部区域的属性的分类损失。每个可能选为头部区域的回报值由该区域经过辅助分类网络后的对头部区域的属性的分类效果决定。The attention neural network in this embodiment is optimized by using reinforcement learning. The attention neural network does not calculate the loss immediately, but evaluates the return value of each region. The ultimate goal is to maximize the return value. When evaluating the return value of each area that may be selected as the head, the area is re-entered into an auxiliary classification network, and the loss function of the auxiliary classification network is the classification loss of the attributes related to the head area. The return value of each possible selection as the head region is determined by the classification effect of the region on the attributes of the header region after passing through the auxiliary classification network.
In the neural network training method of this embodiment, the attention neural network is trained on training sample images, each of which may contain a plurality of different candidate target regions; the attention neural network outputs, for each candidate target region, the probability that it is the target region. After these probability values are obtained, the corresponding training sample image is sampled according to them, so that regions with larger probability values are more likely to be sampled. The sampled images, together with the attribute information of the target region, are then input into the auxiliary classification network, which computes the reward values of the sampled regions. The network parameters of the attention neural network are adjusted according to the reward values until the convergence condition of the attention neural network is satisfied, completing its training.
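The sampling step above, in which regions with larger probability values are more likely to be drawn, amounts to sampling from the multinomial distribution defined by the attention network's outputs. A toy numpy illustration (the probability values are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Probabilities for four candidate target regions, as would be output by
# the attention neural network (assumed values for illustration).
region_probs = np.array([0.05, 0.15, 0.70, 0.10])

# Draw region indices according to the multinomial distribution; the
# high-probability region (index 2) is sampled most often.
samples = rng.choice(len(region_probs), size=1000, p=region_probs)
counts = np.bincount(samples, minlength=len(region_probs))
print(counts)
```

Each drawn index selects the corresponding candidate region crop as the image sample fed to the auxiliary classification network.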
It can be seen that, in the above training process, the attribute information of the target region is related to the target region; for example, information on whether glasses or a mask is worn relates only to the head of a human body. After the above training is completed, the attention neural network can automatically find the region of the image with the strongest response to the attribute corresponding to the attribute information, without manual annotation of the training samples. This not only saves the cost of manual annotation, but also finds the region that best corresponds to the attribute information, reducing the cost and shortening the duration of convolutional neural network training.
The neural network training method of this embodiment may be performed by any suitable device having data processing capabilities, including but not limited to a PC.
Referring to FIG. 4, a flowchart of a region detection method according to an embodiment of the present application is shown. In this embodiment, target region detection is performed on an image using the trained attention neural network of any of the above embodiments, and the required target region is determined from the image. The region detection method of this embodiment includes the following steps:
Step S402: acquire a target image to be detected.
In the embodiments of the present application, the target image may include a static image or a video image. In an optional scheme, the video image may include a pedestrian image or a vehicle image in video surveillance.
In video surveillance scenarios, there is often a need to recognize pedestrian attributes or vehicle attributes. When performing pedestrian attribute or vehicle attribute recognition, the corresponding target region, such as the head region of a human body or the region where a vehicle is located, may first be located, and the corresponding attribute recognition is then performed on that target region.
In an optional example, step S402 may be performed by a processor invoking corresponding instructions stored in a memory, or by a ninth acquisition module 702 run by the processor.
Step S404: detect the target image using the attention neural network to obtain a target region of the target image.
In this embodiment, an attention neural network trained by the method of any of the above embodiments is used. With this network, the target region of the target image can be located quickly and accurately, and the target region can then be processed according to actual needs, for example, for attribute recognition, image information acquisition, or region localization.
In the embodiments of the present application, when the target image is a person image, the target region may include, but is not limited to, any one or more of the following: the head, the upper body, the lower body, the feet, and the hands; when the target image is a vehicle image, the target region may include, but is not limited to, any one or more of the following: a license plate region, a vehicle logo region, and a vehicle body region.
In an optional example, step S404 may be performed by a processor invoking corresponding instructions stored in a memory, or by a tenth acquisition module 704 run by the processor.
With the region detection method of this embodiment, the target region in an image can be detected and located accurately and effectively, reducing the cost and improving the efficiency of target region localization.
The region detection method of this embodiment may be performed by any suitable device having data processing capabilities, including but not limited to a PC.
Any method provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to terminal devices and servers. Alternatively, any method provided by the embodiments of the present application may be executed by a processor; for example, the processor performs any method mentioned in the embodiments of the present application by invoking corresponding instructions stored in a memory. This will not be repeated below.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Referring to FIG. 5, a structural block diagram of an object attribute detection apparatus according to an embodiment of the present application is shown. The object attribute detection apparatus of this embodiment includes: a first acquisition module 502, configured to input an image to be detected into an attention neural network for region detection, to obtain, as a target region, at least one local region in the image to be detected that is associated with an object attribute of a target; and a second acquisition module 504, configured to input the image to be detected and the at least one target region into an attribute classification neural network for attribute detection, to obtain object attribute information of the image to be detected.
Optionally, the object attribute detection apparatus of this embodiment further includes: a display module 506, configured to display the object attribute information in the image to be detected.
Optionally, when the target image is a person image, the target region may include, for example, any one or more of the following: the head, the upper body, the lower body, the feet, and the hands; and/or, when the target image is a vehicle image, the target region may include, for example, any one or more of the following: a license plate region, a vehicle logo region, and a vehicle body region.
Optionally, the image to be detected may include a static image or a video image.
Optionally, the video image may include a pedestrian image and/or a vehicle image in video surveillance.
Optionally, the object attribute detection apparatus of this embodiment further includes: a first training module 508, configured to train, before the first acquisition module 502 inputs the image to be detected into the attention neural network for region detection, the attention neural network into a neural network for detecting a target region in an image, using training sample images and an auxiliary classification network.
Optionally, the first training module 508 includes: a third acquisition module 5082, configured to input a training sample image into the attention neural network for region training, to obtain probability information of candidate target regions; a fourth acquisition module 5084, configured to sample candidate target regions of the training sample image according to the probability information of the candidate target regions, to obtain sampled image samples; a fifth acquisition module 5086, configured to input attribute information of the target region and the image samples into the auxiliary classification network for attribute training, to obtain accuracy information of the candidate target regions in the image samples, the attribute information of the target region being attribute information of the target region annotated for the training sample image; and a first parameter adjustment module 5088, configured to adjust network parameters of the attention neural network according to the accuracy information.
Optionally, the fifth acquisition module 5086 includes: a first loss acquisition module 50862, configured to input the attribute information of the target region and the image samples into the auxiliary classification network for attribute training, and to obtain, through a loss function of the auxiliary classification network, loss values of the attribute information of the candidate target regions in the image samples, the loss function being determined according to the attribute information of the target region; and a first reward acquisition module 50864, configured to determine, according to the obtained loss values, reward values of the candidate target regions in the image samples, the reward values serving as the accuracy information.
Optionally, the first reward acquisition module 50864 is configured to average the loss values of at least one candidate target region of at least one image sample to obtain an average value, and to determine the reward values of the candidate target regions in the image samples according to the relationship between the average value and the obtained loss values.
Optionally, the first reward acquisition module 50864 is configured to average the loss values of at least one candidate target region of at least one image sample to obtain an average value; if an obtained loss value satisfies a set criterion, the reward value of the candidate target region corresponding to that loss value is set to a first reward value; otherwise, the reward value of the candidate target region corresponding to that loss value is set to a second reward value.
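The two-valued reward rule above can be sketched as follows. The patent leaves the set criterion and the two reward values open, so the sketch assumes the criterion "loss below the average loss" and reward values of 1 and 0; both choices are illustrative:

```python
import numpy as np

def region_rewards(losses, first_reward=1.0, second_reward=0.0):
    """Assign each sampled candidate region a reward based on its
    auxiliary-classification loss: regions whose loss is below the average
    loss receive the first (higher) reward value, the rest the second."""
    losses = np.asarray(losses, dtype=float)
    average = losses.mean()                      # average over the sampled regions
    return np.where(losses < average, first_reward, second_reward)

# Per-region classification losses from the auxiliary network (made up).
losses = [0.2, 0.9, 0.4, 1.1]
rewards = region_rewards(losses)
print(rewards)
```

Regions with losses 0.2 and 0.4 fall below the average of 0.65 and receive the first reward value, while the other two receive the second.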
Optionally, the fourth acquisition module 5084 is configured to determine a multinomial distribution corresponding to the probability values of the candidate target regions, and to sample candidate target regions of the training sample image according to the multinomial distribution, to obtain sampled image samples.
Optionally, the attention neural network includes a fully convolutional neural network.
Optionally, the object attribute detection apparatus of this embodiment further includes: a second training module 510, configured to detect the training sample images using the trained attention neural network to obtain target regions of the training sample images, and to train the attribute classification neural network using the training sample images, the target region of at least one training sample image, and the attribute information of at least one target region.
The object attribute detection apparatus of this embodiment may be used to implement the corresponding object attribute detection methods in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Referring to FIG. 6, a structural block diagram of a neural network training apparatus according to another embodiment of the present application is shown. The neural network training apparatus of this embodiment includes: a sixth acquisition module 602, configured to input a training sample image into the attention neural network for region training, to obtain probability information of candidate target regions; a seventh acquisition module 604, configured to sample candidate target regions of the training sample image according to the probability information of the candidate target regions, to obtain sampled image samples; an eighth acquisition module 606, configured to input attribute information of the target region and the image samples into the auxiliary classification network for attribute training, to obtain accuracy information of the candidate target regions in the image samples, the attribute information of the target region being attribute information of the target region annotated for the training sample image; and a second parameter adjustment module 608, configured to adjust the parameters of the attention neural network according to the accuracy information.
Optionally, the eighth acquisition module 606 includes: a second loss acquisition module 6062, configured to input the attribute information of the target region and the image samples into the auxiliary classification network for attribute training, and to obtain, through a loss function of the auxiliary classification network, loss values of the attribute information of the candidate target regions in the image samples, the loss function being determined according to the attribute information of the target region; and a second reward acquisition module 6064, configured to determine, according to the obtained loss values, reward values of the candidate target regions in the image samples, the reward values serving as the accuracy information.
Optionally, the second reward acquisition module 6064 is configured to average the loss values of at least one candidate target region of at least one image sample to obtain an average value, and to determine the reward values of the candidate target regions in the image samples according to the relationship between the average value and the obtained loss values.
Optionally, the second reward acquisition module 6064 is configured to average the loss values of at least one candidate target region of at least one image sample to obtain an average value; if an obtained loss value satisfies a set criterion, the reward value of the candidate target region corresponding to that loss value is set to a first reward value; otherwise, the reward value of the candidate target region corresponding to that loss value is set to a second reward value.
Optionally, the seventh acquisition module 604 is configured to determine a multinomial distribution corresponding to the probability values of the candidate target regions, and to sample candidate target regions of the training sample image according to the multinomial distribution, to obtain sampled image samples.
Optionally, the attention neural network is a fully convolutional neural network.
Optionally, the neural network training apparatus of this embodiment further includes: a third training module 610, configured to detect the training sample images using the trained attention neural network to obtain target regions of the training sample images, and to train the attribute classification neural network using the training sample images, the target region of at least one training sample image, and the attribute information of at least one target region.
The neural network training apparatus of this embodiment is used to implement the corresponding neural network training methods in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Referring to FIG. 7, a structural block diagram of a region detection apparatus according to an embodiment of the present application is shown. The region detection apparatus of this embodiment includes: a ninth acquisition module 702, configured to acquire a target image to be detected, the target image including a static image or a video image; and a tenth acquisition module 704, configured to detect the target image using an attention neural network, to obtain a target region of the target image; the attention neural network is trained using the neural network training method or the neural network training apparatus described in any of the above embodiments of the present application.
Optionally, when the target image is a person image, the target region may include any one or more of the following: the head, the upper body, the lower body, the feet, and the hands; when the target image is a vehicle image, the target region may include any one or more of the following: a license plate region, a vehicle logo region, and a vehicle body region.
Optionally, the video image includes a pedestrian image or a vehicle image in video surveillance.
The region detection apparatus of this embodiment may be used to implement the corresponding region detection methods in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
In addition, an embodiment of the present application further provides an electronic device, including a processor and a memory;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the object attribute detection method described in any of the above embodiments of the present application; or
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the neural network training method described in any of the above embodiments of the present application; or
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the region detection method described in any of the above embodiments of the present application.
In addition, an embodiment of the present application further provides another electronic device, including:
a processor and the object attribute detection apparatus described in any of the above embodiments of the present application; when the processor runs the object attribute detection apparatus, the units in the object attribute detection apparatus described in any of the above embodiments of the present application are run; or
a processor and the neural network training apparatus described in any of the above embodiments of the present application; when the processor runs the neural network training apparatus, the units in the neural network training apparatus described in any of the above embodiments of the present application are run; or
a processor and the region detection apparatus described in any of the above embodiments of the present application; when the processor runs the region detection apparatus, the units in the region detection apparatus described in any of the above embodiments of the present application are run.
An embodiment of the present application further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring now to FIG. 8, a schematic structural diagram of an electronic device 800 suitable for implementing a terminal device or a server of an embodiment of the present application is shown. As shown in FIG. 8, the electronic device 800 includes one or more first processors, a first communication element, and the like. The one or more first processors are, for example, one or more central processing units (CPUs) 801 and/or one or more graphics processing units (GPUs) 813; the first processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 802 or loaded from a storage section 808 into a random access memory (RAM) 803. In this embodiment, the first read-only memory 802 and the random access memory 803 are collectively referred to as a first memory. The first communication element includes a communication component 812 and/or a communication interface 809. The communication component 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 809 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
The first processor may communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, is connected to the communication component 812 through the first communication bus 804, and communicates with other target devices via the communication component 812, thereby completing the operations corresponding to any object attribute detection method provided by the embodiments of the present application, for example: inputting an image to be detected into an attention neural network for region detection, to obtain at least one target region in the image to be detected that is associated with an object attribute of a target; and inputting the image to be detected and the at least one target region into an attribute classification neural network for attribute detection, to obtain object attribute information of the image to be detected. Alternatively, the first processor may communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, be connected to the communication component 812 through the first communication bus 804, and communicate with other target devices via the communication component 812, thereby completing the operations corresponding to any neural network training method provided by the embodiments of the present application, for example: inputting a training sample image into the attention neural network for region training, to obtain probability information of candidate target regions; sampling candidate target regions of the training sample image according to the probability information of the candidate target regions, to obtain sampled image samples; inputting attribute information of the target region and the image samples into an auxiliary classification network for attribute training, to obtain accuracy information of the candidate target regions in the image samples, the attribute information of the target region being attribute information of the target region annotated for the training sample image; and adjusting the parameters of the attention neural network according to the accuracy information. Alternatively, the first processor may communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, be connected to the communication component 812 through the first communication bus 804, and communicate with other target devices via the communication component 812, thereby completing the operations corresponding to any region detection method provided by the embodiments of the present application, for example: acquiring a target image to be detected, the target image including a static image or a video image; and detecting the target image using an attention neural network, to obtain a target region of the target image, the attention neural network being trained using the neural network training method described in any embodiment of the present application.
In addition, the RAM 803 may also store various programs and data required for the operation of the apparatus. The CPU 801 or the GPU 813, the ROM 802, and the RAM 803 are connected to each other through the first communication bus 804. Where the RAM 803 is present, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or writes executable instructions into the ROM 802 at runtime, the executable instructions causing the first processor to perform the operations corresponding to the above communication methods. An input/output (I/O) interface 805 is also connected to the first communication bus 804. The communication component 812 may be integrated, or may be configured with a plurality of sub-modules (for example, a plurality of IB network cards) linked on the communication bus.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication interface 809 including a network interface card such as a LAN card or a modem. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
需要说明的,如图8所示的架构仅为一种可选实现方式,在可选实践过程中,可根据实际需要对上述图8的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信元件可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请的保护范围。It should be noted that the architecture shown in FIG. 8 is only an optional implementation manner. In an optional practice process, the number and type of components in FIG. 8 may be selected, deleted, added, or replaced according to actual needs; Different function components can also be implemented in separate settings or integrated settings, such as GPU and CPU detachable settings or GPU can be integrated on the CPU, communication components can be separated, or integrated on the CPU or GPU. ,and many more. These alternative embodiments are all within the scope of the present application.
特别地,根据本申请实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请任一实施例提供的方法步骤对应的指令。例如,程序代码可包括对应执行本申请实施例提供的如下步骤对应的指令:将待检图像输入到注意力神经网络中进行区域检测,获得待检图像中与目标的对象属性相关联的至少一个目标区域;将待检图像和至少一个目标区域输入到属性分类神经网络中进行属性检测,获得待检图像的对象属性信息。又例如,程序代码可包括对应执行本申请实施例提供的如下步骤对应的指令:将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息;根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本;将目标区域的属性信息和所述图像样本输入到辅助分类网络中进行属性训练,获得所述图像样本中的候选目标区域的准确度信息;所述目标区域的属性信息为针对所述训练样本图像标注的目标区域的属性信息;根据所述准确度信息调整所述注意力神经网络的参数。再例如,程序代码可包括对应执行本申请实施例提供的如下步骤对应的指令:获取待检测的目标图像,其中,所述目标图像包括静态图像或视频图像;采用注意力神经网络检测所述目标图像,获得所述目标图像的目标区域;其中,所述注意力神经网络采用如本申请任一实施例所述的神经网络训练方法训练而得。在这样的实施例中,该计算机程序可以通过通信元件从网络上被下载和安装,和/或从可拆卸介质811被安装。在该计算机程序被第一处理器执行时,执行本申请任一实施例的方法中限定的上述功能。In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, the program code comprising the corresponding execution The instructions corresponding to the method steps provided by any embodiment of the present application. For example, the program code may include an instruction corresponding to the following steps provided in the embodiment of the present application: inputting the image to be detected into the attention neural network for area detection, and obtaining at least one of the to-be-detected image associated with the object attribute of the target. Target area; inputting the image to be inspected and the at least one target area into the attribute classification neural network for attribute detection, and obtaining object attribute information of the image to be inspected. 
For another example, the program code may include instructions corresponding to the following steps provided by the embodiments of the present application: inputting a training sample image into the attention neural network for region training to obtain probability information of candidate target regions; sampling candidate target regions of the training sample image according to the probability information of the candidate target regions to obtain sampled image samples; inputting attribute information of the target regions and the image samples into an auxiliary classification network for attribute training to obtain accuracy information of the candidate target regions in the image samples, where the attribute information of the target regions is attribute information of the target regions annotated for the training sample image; and adjusting parameters of the attention neural network according to the accuracy information. For still another example, the program code may include instructions corresponding to the following steps provided by the embodiments of the present application: acquiring a target image to be detected, where the target image includes a still image or a video image; and detecting the target image by using an attention neural network to obtain a target region of the target image, where the attention neural network is trained by the neural network training method described in any embodiment of the present application. In such embodiments, the computer program may be downloaded and installed from a network via the communication element, and/or installed from the removable medium 811. When the computer program is executed by the first processor, the above-described functions defined in the method of any embodiment of the present application are executed.
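For illustration only (this sketch is not part of the disclosed embodiments or claims), the training steps described above — per-region probabilities from the attention neural network, multinomial sampling of candidate target regions, auxiliary-classifier losses, and average-based accuracy (reward) information — might look as follows in Python. The function names, the seeded random generator, and the reward values (1.0/0.0) are all assumptions chosen for the sketch:

```python
import random

def sample_regions(region_probs, num_samples, rng):
    """Draw candidate target regions from the multinomial distribution
    defined by the attention network's per-region probabilities."""
    indices = range(len(region_probs))
    # random.Random.choices samples with replacement, weighted by probs
    return rng.choices(indices, weights=region_probs, k=num_samples)

def region_rewards(losses, first_reward=1.0, second_reward=0.0):
    """Convert auxiliary-classifier losses into rewards: average the losses
    over the sampled candidate regions, then assign the first reward value
    to regions whose loss meets the criterion (here: not above the average)
    and the second reward value to the rest."""
    average = sum(losses) / len(losses)
    return [first_reward if loss <= average else second_reward
            for loss in losses]

rng = random.Random(0)
# Attention network proposed three candidate regions with these probabilities:
sampled = sample_regions([0.7, 0.2, 0.1], num_samples=4, rng=rng)
# Auxiliary classification network produced these per-region losses:
rewards = region_rewards([0.2, 0.8, 0.5])  # average loss = 0.5
```

The rewards would then drive the parameter update of the attention neural network (e.g. a policy-gradient-style step), which is outside the scope of this sketch.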
另外,本申请实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请任一实施例所述的对象属性检测方法中各步骤的指令;或者In addition, the embodiment of the present application further provides a computer program, including computer readable code, when the computer readable code is run on a device, the processor in the device executes to implement any of the embodiments of the present application. The instructions of each step in the object attribute detection method; or
当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请任一实施例所述的神经网络训练方法中各步骤的指令;或者When the computer readable code is run on a device, the processor in the device executes instructions for implementing the steps in the neural network training method described in any of the embodiments of the present application; or
当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现如本申请任一实施例所述的区域检测方法中各步骤的指令。When the computer readable code is run on a device, the processor in the device executes instructions for implementing the steps in the region detecting method as described in any of the embodiments of the present application.
另外,本申请实施例还提供了一种计算机可读存储介质,用于存储计算机可读取的指令,所述指令被执行时实现本申请任一实施例所述的对象属性检测方法中各步骤的操作、或者本申请任一实施例所述的神经网络训练方法中各步骤的操作、或者如本申请任一实施例所述的区域检测方法中各步骤的操作。In addition, the embodiment of the present application further provides a computer readable storage medium, which is configured to store computer readable instructions, and when the instructions are executed, implement the steps in the object attribute detecting method according to any embodiment of the present application. The operation of each step in the neural network training method described in any of the embodiments of the present application, or the operation of each step in the area detecting method according to any of the embodiments of the present application.
本说明书中至少一个实施例均采用递进的方式描述，至少一个实施例重点说明的都是与其它实施例的不同之处，至少一个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言，由于其与方法实施例基本对应，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts among the embodiments, reference may be made to one another. Since the system embodiments substantially correspond to the method embodiments, their description is relatively simple; for relevant parts, reference may be made to the description of the method embodiments.
可能以许多方式来实现本申请的方法和装置、设备。例如，可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请实施例的方法和装置、设备。用于方法的步骤的上述顺序仅是为了进行说明，本申请实施例的方法的步骤不限于以上可选描述的顺序，除非以其它方式特别说明。此外，在一些实施例中，还可将本申请实施为记录在记录介质中的程序，这些程序包括用于实现根据本申请实施例的方法的机器可读指令。因而，本申请还覆盖存储用于执行根据本申请实施例的方法的程序的记录介质。The methods, apparatuses, and devices of the present application may be implemented in many ways. For example, the methods, apparatuses, and devices of the embodiments of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is for illustration only, and the steps of the methods of the embodiments of the present application are not limited to the order described above unless otherwise specified. Moreover, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the embodiments of the present application. Thus, the present application also covers a recording medium storing programs for executing the methods according to the embodiments of the present application.
本申请实施例的描述是为了示例和描述起见而给出的，而并不是无遗漏的或者将本申请限于所公开的形式，很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用，并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。The description of the embodiments of the present application is given for the purposes of illustration and description; it is not exhaustive, nor does it limit the present application to the disclosed forms, and many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical applications of the present application, and to enable those of ordinary skill in the art to understand the present application so as to design various embodiments, with various modifications, suited to particular uses.
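For illustration only (this sketch is not part of the disclosed embodiments or claims), the two-stage detection method of the description — an attention neural network first proposes target regions associated with the object's attributes, then the image together with those regions is fed to an attribute classification neural network — can be sketched as follows. The toy stand-in networks and all names are hypothetical:

```python
def detect_object_attributes(image, attention_net, attribute_net):
    """Stage 1: the attention neural network proposes target regions
    associated with the object's attributes (e.g. head, upper body).
    Stage 2: the image plus those regions is fed to the attribute
    classification neural network to obtain object attribute information."""
    target_regions = attention_net(image)
    attributes = attribute_net(image, target_regions)
    return target_regions, attributes

# Hypothetical stand-ins for the two trained networks, for shape only:
def toy_attention(image):
    # Returns (region name, bounding box) pairs for a pedestrian image.
    return [("head", (0, 0, 4, 4)), ("upper_body", (0, 4, 4, 12))]

def toy_classifier(image, regions):
    # Returns one attribute label per proposed region.
    return {name: "detected" for name, _ in regions}

regions, attrs = detect_object_attributes(
    "pedestrian.jpg", toy_attention, toy_classifier)
```

In an actual system, `attention_net` and `attribute_net` would be the trained networks described in the embodiments; the stubs here only demonstrate the data flow between the two stages.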

Claims (50)

  1. 一种对象属性检测方法,包括:An object attribute detecting method includes:
    将待检图像输入到注意力神经网络中进行区域检测,获得所述待检图像中与目标的对象属性相关联的至少一个目标区域;Inputting the image to be detected into the attention neural network for area detection, and obtaining at least one target area in the image to be detected that is associated with the object attribute of the target;
    将所述待检图像和所述至少一个目标区域输入到属性分类神经网络中进行属性检测,获得所述待检图像的对象属性信息。Inputting the to-be-detected image and the at least one target area into an attribute classification neural network for attribute detection, and obtaining object attribute information of the to-be-detected image.
  2. 根据权利要求1所述的方法,其中,还包括:The method of claim 1 further comprising:
    在所述待检图像中显示所述对象属性信息。The object attribute information is displayed in the image to be inspected.
  3. 根据权利要求1或2所述的方法,其中,当所述待检图像为人物图像时,所述目标区域包括以下任意一项或多项:头部、上身、下身、足部、手部;和/或,The method according to claim 1 or 2, wherein, when the image to be inspected is a person image, the target area comprises any one or more of the following: a head, an upper body, a lower body, a foot, a hand; and / or,
    当所述待检图像为车辆图像时,所述目标区域包括以下任意一项或多项:车辆牌号区域、车辆标志区域、车身区域。When the image to be inspected is a vehicle image, the target area includes any one or more of the following: a vehicle brand area, a vehicle sign area, and a vehicle body area.
  4. 根据权利要求1-3任一项所述的方法,其中,所述待检图像包括静态图像或视频图像。The method according to any one of claims 1 to 3, wherein the image to be examined comprises a still image or a video image.
  5. 根据权利要求4所述的方法,其中,所述视频图像包括视频监控中的行人图像和/或车辆图像。The method of claim 4 wherein the video image comprises a pedestrian image and/or a vehicle image in video surveillance.
  6. 根据权利要求1-5任一项所述的方法,其中,在将待检图像输入到注意力神经网络中进行区域检测之前,还包括:The method according to any one of claims 1 to 5, further comprising: before inputting the image to be detected into the attention neural network for area detection, further comprising:
    使用训练样本图像和辅助分类网络,将所述注意力神经网络训练为用于检测图像中的目标区域的神经网络。The attention neural network is trained as a neural network for detecting a target area in the image using the training sample image and the auxiliary classification network.
  7. 根据权利要求6所述的方法,其中,所述使用训练样本图像和辅助分类网络,将所述注意力神经网络训练为用于检测图像中的目标区域的神经网络,包括:The method according to claim 6, wherein said training said attentional neural network as a neural network for detecting a target area in an image using a training sample image and an auxiliary classification network comprises:
    将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息;The training sample image is input into the attention neural network for regional training, and probability information of the candidate target region is obtained;
    根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本;And performing candidate target region sampling on the training sample image according to the probability information of the candidate target region, and obtaining the sampled image sample;
    将目标区域的属性信息和所述图像样本输入到所述辅助分类网络中进行属性训练，获得所述图像样本中的候选目标区域的准确度信息；其中，所述目标区域的属性信息为针对所述训练样本图像标注的目标区域的属性信息；inputting the attribute information of the target area and the image sample into the auxiliary classification network for attribute training to obtain accuracy information of the candidate target area in the image sample, wherein the attribute information of the target area is attribute information of a target area annotated for the training sample image;
    根据所述准确度信息调整所述注意力神经网络的网络参数。Adjusting network parameters of the attention neural network according to the accuracy information.
  8. 根据权利要求7所述的方法,其中,将目标区域的属性信息和所述图像样本输入到所述辅助分类网络中进行属性训练,获得所述图像样本中的候选目标区域的准确度信息,包括:The method according to claim 7, wherein the attribute information of the target area and the image sample are input into the auxiliary classification network for attribute training, and the accuracy information of the candidate target area in the image sample is obtained, including :
    将所述目标区域的属性信息和所述图像样本输入到所述辅助分类网络中进行属性训练，通过所述辅助分类网络的损失函数，获得所述图像样本中，所述候选目标区域的属性信息的损失值，其中，所述损失函数根据所述目标区域的属性信息确定；inputting the attribute information of the target area and the image sample into the auxiliary classification network for attribute training, and obtaining, by using a loss function of the auxiliary classification network, a loss value of the attribute information of the candidate target area in the image sample, wherein the loss function is determined according to the attribute information of the target area;
    根据获得的所述损失值,确定所述图像样本中的候选目标区域的回报值,所述回报值为所述准确度信息。And determining, according to the obtained loss value, a return value of the candidate target region in the image sample, the reward value being the accuracy information.
  9. 根据权利要求8所述的方法,其中,根据获得的所述损失值,确定所述图像样本中的候选目标区域的回报值,包括:The method of claim 8, wherein determining a reward value of the candidate target region in the image sample based on the obtained loss value comprises:
    对至少一个所述图像样本的至少一个候选目标区域的损失值求平均,获得平均值;And averaging loss values of at least one candidate target region of at least one of the image samples to obtain an average value;
    根据所述平均值和获得的所述损失值的关系,确定所述图像样本中的候选目标区域的回报值。A return value of the candidate target region in the image sample is determined based on the relationship between the average value and the obtained loss value.
  10. 根据权利要求9所述的方法，其中，根据所述平均值和获得的所述损失值的关系，确定所述图像样本中的候选目标区域的回报值，包括：The method of claim 9, wherein determining a reward value of the candidate target region in the image sample based on the relationship between the average value and the obtained loss value comprises:
    若获得的所述损失值满足设定标准,则将所述损失值对应的候选目标区域的回报值设置为第一回报值;And if the obtained loss value satisfies the setting criterion, setting a return value of the candidate target area corresponding to the loss value as the first return value;
    否则,将所述损失值对应的候选目标区域的回报值设置为第二回报值。Otherwise, the reward value of the candidate target area corresponding to the loss value is set as the second reward value.
  11. 根据权利要求7-10任一项所述的方法,其中,根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本,包括:The method according to any one of claims 7 to 10, wherein the candidate target region is sampled according to the probability information of the candidate target region, and the sampled image sample is obtained, including:
    确定所述候选目标区域的概率值对应的多项式分布；determining a multinomial distribution corresponding to the probability values of the candidate target regions;
    根据所述多项式分布，对所述训练样本图像进行候选目标区域采样，获取采样后的图像样本。sampling candidate target regions of the training sample image according to the multinomial distribution to obtain sampled image samples.
  12. 根据权利要求7-11任一项所述的方法，其中，所述注意力神经网络包括全卷积神经网络。The method according to any one of claims 7-11, wherein the attention neural network comprises a fully convolutional neural network.
  13. 根据权利要求7-12任一项所述的方法,其中,还包括:The method of any of claims 7-12, further comprising:
    采用训练完成的所述注意力神经网络检测所述训练样本图像，获得所述训练样本图像的目标区域；detecting the training sample image by using the trained attention neural network to obtain a target area of the training sample image;
    使用所述训练样本图像、至少一个所述训练样本图像的目标区域、和至少一个所述目标区域的属性信息训练属性分类神经网络。The attribute classification neural network is trained using the training sample image, the target area of at least one of the training sample images, and the attribute information of at least one of the target areas.
  14. 一种神经网络训练方法,包括:A neural network training method includes:
    将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息;The training sample image is input into the attention neural network for regional training, and probability information of the candidate target region is obtained;
    根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本;And performing candidate target region sampling on the training sample image according to the probability information of the candidate target region, and obtaining the sampled image sample;
    将目标区域的属性信息和所述图像样本输入到辅助分类网络中进行属性训练，获得所述图像样本中的候选目标区域的准确度信息；其中，所述目标区域的属性信息为针对所述训练样本图像标注的目标区域的属性信息；inputting the attribute information of the target area and the image sample into an auxiliary classification network for attribute training to obtain accuracy information of the candidate target area in the image sample, wherein the attribute information of the target area is attribute information of a target area annotated for the training sample image;
    根据所述准确度信息调整所述注意力神经网络的参数。Adjusting parameters of the attention neural network according to the accuracy information.
  15. 根据权利要求14所述的方法,其中,将目标区域的属性信息和所述图像样本输入到辅助分类网络中进行属性训练,获得所述图像样本中的候选目标区域的准确度信息,包括:The method according to claim 14, wherein the attribute information of the target area and the image sample are input into the auxiliary classification network for attribute training, and the accuracy information of the candidate target area in the image sample is obtained, including:
    将目标区域的属性信息和所述图像样本输入到辅助分类网络中进行属性训练，通过所述辅助分类网络的损失函数，获得所述图像样本中，所述候选目标区域的属性信息的损失值，其中，所述损失函数根据所述目标区域的属性信息确定；inputting the attribute information of the target area and the image sample into the auxiliary classification network for attribute training, and obtaining, by using a loss function of the auxiliary classification network, a loss value of the attribute information of the candidate target area in the image sample, wherein the loss function is determined according to the attribute information of the target area;
    根据获得的所述损失值,确定所述图像样本中的候选目标区域的回报值,所述回报值为所述准确度信息。And determining, according to the obtained loss value, a return value of the candidate target region in the image sample, the reward value being the accuracy information.
  16. 根据权利要求15所述的方法,其中,根据获得的所述损失值,确定所述图像样本中的候选目标区域的回报值,包括:The method of claim 15, wherein determining a reward value of the candidate target region in the image sample based on the obtained loss value comprises:
    对至少一个所述图像样本的至少一个候选目标区域的损失值求平均,获得平均值;And averaging loss values of at least one candidate target region of at least one of the image samples to obtain an average value;
    根据所述平均值和获得的所述损失值的关系,确定所述图像样本中的候选目标区域的回报值。A return value of the candidate target region in the image sample is determined based on the relationship between the average value and the obtained loss value.
  17. 根据权利要求16所述的方法,其中,根据所述平均值和获得的所述损失值的关系,确定所述图像样本中的候选目标区域的回报值,包括:The method of claim 16, wherein determining a return value of the candidate target region in the image sample based on the relationship between the average value and the obtained loss value comprises:
    若获得的所述损失值满足设定标准,则将所述损失值对应的候选目标区域的回报值设置为第一回报值;And if the obtained loss value satisfies the setting criterion, setting a return value of the candidate target area corresponding to the loss value as the first return value;
    否则,将所述损失值对应的候选目标区域的回报值设置为第二回报值。Otherwise, the reward value of the candidate target area corresponding to the loss value is set as the second reward value.
  18. 根据权利要求14-17任一项所述的方法,其中,根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本,包括:The method according to any one of claims 14-17, wherein the candidate target region is sampled according to the probability information of the candidate target region, and the sampled image sample is obtained, including:
    确定所述候选目标区域的概率值对应的多项式分布；determining a multinomial distribution corresponding to the probability values of the candidate target regions;
    根据所述多项式分布，对所述训练样本图像进行候选目标区域采样，获取采样后的图像样本。sampling candidate target regions of the training sample image according to the multinomial distribution to obtain sampled image samples.
  19. 根据权利要求14-18任一项所述的方法，其中，所述注意力神经网络包括全卷积神经网络。The method according to any one of claims 14-18, wherein the attention neural network comprises a fully convolutional neural network.
  20. 根据权利要求14-19任一项所述的方法,其中,还包括:The method of any of claims 14-19, further comprising:
    采用训练完成的所述注意力神经网络检测所述训练样本图像，获得所述训练样本图像的目标区域；detecting the training sample image by using the trained attention neural network to obtain a target area of the training sample image;
    使用所述训练样本图像、至少一个所述训练样本图像的目标区域、和至少一个所述目标区域的属性信息训练属性分类神经网络。The attribute classification neural network is trained using the training sample image, the target area of at least one of the training sample images, and the attribute information of at least one of the target areas.
  21. 一种区域检测方法,包括:A method of area detection, comprising:
    获取待检测的目标图像,其中,所述目标图像包括静态图像或视频图像;Obtaining a target image to be detected, wherein the target image comprises a still image or a video image;
    采用注意力神经网络检测所述目标图像,获得所述目标图像的目标区域;Detecting the target image by using an attention neural network to obtain a target area of the target image;
    其中,所述注意力神经网络采用如权利要求14-20任一项所述的方法训练而得。Wherein the attention neural network is trained by the method according to any one of claims 14-20.
  22. 根据权利要求21所述的方法,其中,当所述目标图像为人物图像时,所述目标区域包括以下任意一项或多项:头部、上身、下身、足部、手部;当所述目标图像为车辆图像时,所述目标区域包括以下任意一项或多项:车辆牌号区域、车辆标志区域、车身区域。The method according to claim 21, wherein when the target image is a person image, the target area includes any one or more of the following: a head, an upper body, a lower body, a foot, a hand; When the target image is a vehicle image, the target area includes any one or more of the following: a vehicle brand area, a vehicle logo area, and a vehicle body area.
  23. 根据权利要求21或22所述的方法,其中,所述视频图像包括视频监控中的行人图像或车辆图像。The method of claim 21 or 22, wherein the video image comprises a pedestrian image or a vehicle image in video surveillance.
  24. 一种对象属性检测装置,包括:An object attribute detecting device includes:
    第一获取模块,用于将待检图像输入到注意力神经网络中进行区域检测,获得所述待检图像中与目标的对象属性相关联的至少一个目标区域;a first acquiring module, configured to input an image to be detected into an attention neural network for area detection, and obtain at least one target area in the image to be detected that is associated with an object attribute of the target;
    第二获取模块,用于将所述待检图像和所述至少一个目标区域输入到属性分类神经网络中进行属性检测,获得所述待检图像的对象属性信息。And a second acquiring module, configured to input the to-be-detected image and the at least one target area into an attribute classification neural network for attribute detection, and obtain object attribute information of the to-be-detected image.
  25. 根据权利要求24所述的装置,其中,还包括:The apparatus of claim 24, further comprising:
    显示模块,用于在所述待检图像中显示所述对象属性信息。And a display module, configured to display the object attribute information in the image to be inspected.
  26. 根据权利要求24或25所述的装置,其中,当所述目标图像为人物图像时,所述目标区域包括以下任意一项或多项:头部、上身、下身、足部、手部;和/或,The apparatus according to claim 24 or 25, wherein, when the target image is a person image, the target area includes any one or more of the following: a head, an upper body, a lower body, a foot, a hand; /or,
    当所述目标图像为车辆图像时,所述目标区域包括以下任意一项或多项:车辆牌号区域、车辆标志区域、车身区域。When the target image is a vehicle image, the target area includes any one or more of the following: a vehicle brand area, a vehicle logo area, and a vehicle body area.
  27. 根据权利要求24-26任一项所述的装置,其中,所述待检图像包括静态图像或视频图像。A device according to any one of claims 24 to 26, wherein the image to be examined comprises a still image or a video image.
  28. 根据权利要求27所述的装置,其中,所述视频图像包括视频监控中的行人图像和/或车辆图像。The apparatus of claim 27, wherein the video image comprises a pedestrian image and/or a vehicle image in video surveillance.
  29. 根据权利要求24-28任一项所述的装置,其中,还包括:The apparatus according to any one of claims 24 to 28, further comprising:
    第一训练模块，用于在所述第一获取模块将待检图像输入到注意力神经网络中进行区域检测之前，使用训练样本图像和辅助分类网络，将所述注意力神经网络训练为用于检测图像中的目标区域的神经网络。a first training module, configured to, before the first acquisition module inputs the image to be detected into the attention neural network for region detection, train the attention neural network into a neural network for detecting a target area in an image by using training sample images and an auxiliary classification network.
  30. 根据权利要求29所述的装置,其中,所述第一训练模块包括:The apparatus of claim 29, wherein the first training module comprises:
    第三获取模块,用于将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息;a third acquiring module, configured to input the training sample image into the attention neural network for regional training, and obtain probability information of the candidate target region;
    第四获取模块,用于根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本;a fourth acquiring module, configured to perform candidate target area sampling on the training sample image according to probability information of the candidate target area, to obtain a sampled image sample;
    第五获取模块，用于将目标区域的属性信息和所述图像样本输入到所述辅助分类网络中进行属性训练，获得所述图像样本中的候选目标区域的准确度信息；所述目标区域的属性信息为针对所述训练样本图像标注的目标区域的属性信息；a fifth acquisition module, configured to input the attribute information of the target area and the image sample into the auxiliary classification network for attribute training to obtain accuracy information of the candidate target area in the image sample, wherein the attribute information of the target area is attribute information of a target area annotated for the training sample image;
    第一参数调整模块,用于根据所述准确度信息调整所述注意力神经网络的网络参数。The first parameter adjustment module is configured to adjust network parameters of the attention neural network according to the accuracy information.
  31. 根据权利要求30所述的装置,其中,所述第五获取模块包括:The apparatus of claim 30, wherein the fifth obtaining module comprises:
    第一损失获取模块，用于将目标区域的属性信息和所述图像样本输入到所述辅助分类网络中进行属性训练，通过所述辅助分类网络的损失函数，获得所述图像样本中，所述候选目标区域的属性信息的损失值，其中，所述损失函数根据所述目标区域的属性信息确定；a first loss acquisition module, configured to input the attribute information of the target area and the image sample into the auxiliary classification network for attribute training, and to obtain, by using a loss function of the auxiliary classification network, a loss value of the attribute information of the candidate target area in the image sample, wherein the loss function is determined according to the attribute information of the target area;
    第一回报获取模块,用于根据获得的所述损失值,确定所述图像样本中的候选目标区域的回报值,所述回报值为所述准确度信息。And a first reward obtaining module, configured to determine, according to the obtained loss value, a return value of the candidate target area in the image sample, where the reward value is the accuracy information.
  32. 根据权利要求31所述的装置，其中，所述第一回报获取模块，用于对至少一个图像样本的至少一个候选目标区域的损失值求平均，获得平均值；根据所述平均值和获得的所述损失值的关系，确定所述图像样本中的候选目标区域的回报值。The apparatus according to claim 31, wherein the first reward acquisition module is configured to average the loss values of at least one candidate target region of at least one image sample to obtain an average value, and to determine a reward value of the candidate target region in the image sample according to the relationship between the average value and the obtained loss value.
  33. 根据权利要求32所述的装置，其中，所述第一回报获取模块，用于对至少一个所述图像样本的至少一个所述候选目标区域的损失值求平均，获得平均值；若获得的所述损失值满足设定标准，则将所述损失值对应的候选目标区域的回报值设置为第一回报值；否则，将所述损失值对应的候选目标区域的回报值设置为第二回报值。The apparatus according to claim 32, wherein the first reward acquisition module is configured to average the loss values of at least one candidate target region of at least one of the image samples to obtain an average value; if the obtained loss value satisfies a setting criterion, to set the reward value of the candidate target region corresponding to the loss value as a first reward value; and otherwise, to set the reward value of the candidate target region corresponding to the loss value as a second reward value.
  34. 根据权利要求30-33任一项所述的装置，其中，所述第四获取模块，用于确定所述候选目标区域的概率值对应的多项式分布；以及根据所述多项式分布，对所述训练样本图像进行候选目标区域采样，获取采样后的图像样本。The apparatus according to any one of claims 30-33, wherein the fourth acquisition module is configured to determine a multinomial distribution corresponding to the probability values of the candidate target regions, and to sample candidate target regions of the training sample image according to the multinomial distribution to obtain sampled image samples.
  35. 根据权利要求30-34任一项所述的装置，其中，所述注意力神经网络包括全卷积神经网络。The apparatus according to any one of claims 30-34, wherein the attention neural network comprises a fully convolutional neural network.
  36. 根据权利要求30-35任一项所述的装置,其中,还包括:The apparatus according to any one of claims 30 to 35, further comprising:
    第二训练模块，用于采用训练完成的所述注意力神经网络检测所述训练样本图像，获得所述训练样本图像的目标区域；以及使用所述训练样本图像、至少一个所述训练样本图像的目标区域、和至少一个所述目标区域的属性信息训练属性分类神经网络。a second training module, configured to detect the training sample image by using the trained attention neural network to obtain a target area of the training sample image, and to train the attribute classification neural network by using the training sample image, the target area of at least one training sample image, and the attribute information of at least one target area.
  37. 一种神经网络训练装置,包括:A neural network training device includes:
    第六获取模块,用于将训练样本图像输入到注意力神经网络中进行区域训练,获得候选目标区域的概率信息;a sixth acquiring module, configured to input the training sample image into the attention neural network for regional training, and obtain probability information of the candidate target region;
    第七获取模块,用于根据所述候选目标区域的概率信息对所述训练样本图像进行候选目标区域采样,获得采样后的图像样本;a seventh acquiring module, configured to perform sampling of candidate target regions on the training sample image according to probability information of the candidate target region, to obtain sampled image samples;
    第八获取模块，用于将目标区域的属性信息和所述图像样本输入到辅助分类网络中进行属性训练，获得所述图像样本中的候选目标区域的准确度信息；其中，所述目标区域的属性信息为针对所述训练样本图像标注的目标区域的属性信息；an eighth acquisition module, configured to input the attribute information of the target area and the image sample into the auxiliary classification network for attribute training to obtain accuracy information of the candidate target area in the image sample, wherein the attribute information of the target area is attribute information of a target area annotated for the training sample image;
    第二参数调整模块,用于根据所述准确度信息调整所述注意力神经网络的参数。The second parameter adjustment module is configured to adjust parameters of the attention neural network according to the accuracy information.
  38. 根据权利要求37所述的装置,其中,所述第八获取模块,包括:The apparatus of claim 37, wherein the eighth obtaining module comprises:
    第二损失获取模块，用于将目标区域的属性信息和所述图像样本输入到辅助分类网络中进行属性训练，通过所述辅助分类网络的损失函数，获得所述图像样本中，所述候选目标区域的属性信息的损失值，其中，所述损失函数根据所述目标区域的属性信息确定；a second loss acquisition module, configured to input the attribute information of the target area and the image sample into the auxiliary classification network for attribute training, and to obtain, by using a loss function of the auxiliary classification network, a loss value of the attribute information of the candidate target area in the image sample, wherein the loss function is determined according to the attribute information of the target area;
    第二回报获取模块,用于根据获得的所述损失值,确定所述图像样本中的候选目标区域的回报值,所述回报值为所述准确度信息。And a second report obtaining module, configured to determine, according to the obtained loss value, a return value of the candidate target area in the image sample, where the reward value is the accuracy information.
  39. 根据权利要求38所述的装置，其中，所述第二回报获取模块，用于对所有图像样本的各个候选目标区域的损失值求平均，获得平均值；根据所述平均值和获得的所述损失值的关系，确定所述图像样本中的候选目标区域的回报值。The apparatus according to claim 38, wherein the second reward acquisition module is configured to average the loss values of the candidate target regions of all image samples to obtain an average value, and to determine a reward value of the candidate target region in the image sample according to the relationship between the average value and the obtained loss value.
  40. 根据权利要求39所述的装置，其中，所述第二回报获取模块，用于对所有图像样本的各个候选目标区域的损失值求平均，获得平均值；若获得的所述损失值满足设定标准，则将所述损失值对应的候选目标区域的回报值设置为第一回报值；否则，将所述损失值对应的候选目标区域的回报值设置为第二回报值。The apparatus according to claim 39, wherein the second reward acquisition module is configured to average the loss values of the candidate target regions of all image samples to obtain an average value; if the obtained loss value satisfies a setting criterion, to set the reward value of the candidate target region corresponding to the loss value as a first reward value; and otherwise, to set the reward value of the candidate target region corresponding to the loss value as a second reward value.
  41. 根据权利要求37-40任一项所述的装置，其中，所述第七获取模块，用于确定所述候选目标区域的概率值对应的多项式分布；根据所述多项式分布，对所述训练样本图像进行候选目标区域采样，获取采样后的图像样本。The apparatus according to any one of claims 37-40, wherein the seventh acquisition module is configured to determine a multinomial distribution corresponding to the probability values of the candidate target regions, and to sample candidate target regions of the training sample image according to the multinomial distribution to obtain sampled image samples.
  42. 根据权利要求37-41任一项所述的装置，其中，所述注意力神经网络包括全卷积神经网络。The apparatus according to any one of claims 37-41, wherein the attention neural network comprises a fully convolutional neural network.
  43. 根据权利要求37-42任一项所述的装置,其中,还包括:The apparatus of any of claims 37-42, further comprising:
    第三训练模块，用于采用训练完成的所述注意力神经网络检测所述训练样本图像，获得所述训练样本图像的目标区域；使用所述训练样本图像、至少一个所述训练样本图像的目标区域、和至少一个所述目标区域的属性信息训练属性分类神经网络。a third training module, configured to detect the training sample image by using the trained attention neural network to obtain a target area of the training sample image, and to train the attribute classification neural network by using the training sample image, the target area of at least one training sample image, and the attribute information of at least one target area.
  44. 一种区域检测装置,包括:An area detecting device comprising:
    第九获取模块,用于获取待检测的目标图像,其中,所述目标图像包括静态图像或视频图像;a ninth obtaining module, configured to acquire a target image to be detected, where the target image includes a still image or a video image;
    第十获取模块,用于采用注意力神经网络检测所述目标图像,获得所述目标图像的目标区域;a tenth acquiring module, configured to detect the target image by using an attention neural network, and obtain a target area of the target image;
    其中，所述注意力神经网络采用如权利要求14-20任一项所述的方法或者权利要求37-43任一项所述的装置训练而得。wherein the attention neural network is trained by the method according to any one of claims 14-20 or the apparatus according to any one of claims 37-43.
  45. 根据权利要求44所述的装置,其中,当所述目标图像为人物图像时,所述目标区域包括以下任意一项或多项:头部、上身、下身、足部、手部;当所述目标图像为车辆图像时,所述目标区域包括以下任意一项或多项:车辆牌号区域、车辆标志区域、车身区域。The apparatus according to claim 44, wherein when the target image is a person image, the target area includes any one or more of the following: a head, an upper body, a lower body, a foot, a hand; When the target image is a vehicle image, the target area includes any one or more of the following: a vehicle brand area, a vehicle logo area, and a vehicle body area.
  46. 根据权利要求44或45所述的装置,其中,所述视频图像包括视频监控中的行人图像或车辆图像。The apparatus according to claim 44 or 45, wherein said video image comprises a pedestrian image or a vehicle image in video surveillance.
  47. 一种电子设备,包括:处理器和存储器;An electronic device comprising: a processor and a memory;
    所述存储器用于存放至少一可执行指令，所述可执行指令使所述处理器执行如权利要求1-13任一项所述的对象属性检测方法对应的操作；或者，所述存储器用于存放至少一可执行指令，所述可执行指令使所述处理器执行如权利要求14-20任一项所述的神经网络训练方法对应的操作；或者，所述存储器用于存放至少一可执行指令，所述可执行指令使所述处理器执行如权利要求21-23任一项所述的区域检测方法对应的操作。the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object attribute detecting method according to any one of claims 1-13; or the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the neural network training method according to any one of claims 14-20; or the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the region detecting method according to any one of claims 21-23.
  48. An electronic device, comprising:
    a processor and the object attribute detection apparatus according to any one of claims 24-36, wherein the units of the object attribute detection apparatus according to any one of claims 24-36 are run when the processor runs the object attribute detection apparatus; or
    a processor and the neural network training apparatus according to any one of claims 37-43, wherein the units of the neural network training apparatus according to any one of claims 37-43 are run when the processor runs the neural network training apparatus; or
    a processor and the region detection apparatus according to any one of claims 44-46, wherein the units of the region detection apparatus according to any one of claims 44-46 are run when the processor runs the region detection apparatus.
  49. A computer program, comprising computer-readable code, wherein, when the computer-readable code is run on a device, a processor in the device executes instructions for implementing the steps of the object attribute detection method according to any one of claims 1-13; or
    when the computer-readable code is run on a device, a processor in the device executes instructions for implementing the steps of the neural network training method according to any one of claims 14-20; or
    when the computer-readable code is run on a device, a processor in the device executes instructions for implementing the steps of the region detection method according to any one of claims 21-23.
  50. A computer-readable storage medium for storing computer-readable instructions, wherein, when the instructions are executed, the operations of the steps of the object attribute detection method according to any one of claims 1-13, the operations of the steps of the neural network training method according to any one of claims 14-20, or the operations of the steps of the region detection method according to any one of claims 21-23 are implemented.
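Claims 44-46 above recite detecting a target area of an image with an attention neural network. As a minimal illustrative sketch only (not the claimed network or its training procedure), the core step of turning a spatial attention map into a target-area bounding box can be mocked up in NumPy; the function name `attention_target_area`, the toy feature map, and the attention-mass threshold are all hypothetical choices made for this example:

```python
import numpy as np

def attention_target_area(feature_map, threshold=0.9):
    """Compute a softmax spatial attention map over a 2-D feature map and
    return the bounding box (top, left, bottom, right) covering the smallest
    set of cells whose attention mass reaches `threshold`."""
    flat = feature_map.ravel()
    exp = np.exp(flat - flat.max())
    attn = exp / exp.sum()                      # spatial attention weights
    order = np.argsort(attn, kind="stable")[::-1]  # cells, strongest first
    cum = np.cumsum(attn[order])
    keep = order[: int(np.searchsorted(cum, threshold)) + 1]
    rows, cols = np.unravel_index(keep, feature_map.shape)
    return (int(rows.min()), int(cols.min()),
            int(rows.max()) + 1, int(cols.max()) + 1)

# Toy "head region": strong responses in the top-left of an 8x8 map.
fm = np.zeros((8, 8))
fm[0:2, 0:3] = 5.0
print(attention_target_area(fm))  # → (0, 0, 2, 3)
```

In a real detector the attention map would be produced by a trained network and the box refined per attribute region (head, license plate, etc.); here the threshold simply trades box tightness against coverage.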
PCT/CN2017/119535 2016-12-29 2017-12-28 Object attribute detection method and device, neural network training method and device, and regional detection method and device WO2018121690A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611246395.9A CN108229267B (en) 2016-12-29 2016-12-29 Object attribute detection, neural network training and region detection method and device
CN201611246395.9 2016-12-29

Publications (1)

Publication Number Publication Date
WO2018121690A1

Family

ID=62657290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119535 WO2018121690A1 (en) 2016-12-29 2017-12-28 Object attribute detection method and device, neural network training method and device, and regional detection method and device

Country Status (2)

Country Link
CN (1) CN108229267B (en)
WO (1) WO2018121690A1 (en)

Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046632A (en) * 2018-11-09 2019-07-23 阿里巴巴集团控股有限公司 Model training method and device
CN110059721A (en) * 2019-03-16 2019-07-26 平安城市建设科技(深圳)有限公司 Floor plan area recognizing method, device, equipment and computer readable storage medium
CN110443222A (en) * 2019-08-14 2019-11-12 北京百度网讯科技有限公司 Method and apparatus for training face's critical point detection model
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN110766152A (en) * 2018-07-27 2020-02-07 富士通株式会社 Method and apparatus for training deep neural networks
CN110766129A (en) * 2018-07-27 2020-02-07 杭州海康威视数字技术股份有限公司 Neural network training system and data display method
CN110969657A (en) * 2018-09-29 2020-04-07 杭州海康威视数字技术股份有限公司 Gun and ball coordinate association method and device, electronic equipment and storage medium
CN110969173A (en) * 2018-09-28 2020-04-07 杭州海康威视数字技术股份有限公司 Target classification method and device
CN111160429A (en) * 2019-12-17 2020-05-15 平安银行股份有限公司 Training method of image detection model, image detection method, device and equipment
CN111191526A (en) * 2019-12-16 2020-05-22 汇纳科技股份有限公司 Pedestrian attribute recognition network training method, system, medium and terminal
CN111242951A (en) * 2020-01-08 2020-06-05 上海眼控科技股份有限公司 Vehicle detection method, device, computer equipment and storage medium
CN111241869A (en) * 2018-11-28 2020-06-05 杭州海康威视数字技术股份有限公司 Method and device for checking materials and computer readable storage medium
CN111259701A (en) * 2018-12-03 2020-06-09 杭州海康威视数字技术股份有限公司 Pedestrian re-identification method and device and electronic equipment
CN111259763A (en) * 2020-01-13 2020-06-09 华雁智能科技(集团)股份有限公司 Target detection method and device, electronic equipment and readable storage medium
CN111291597A (en) * 2018-12-07 2020-06-16 杭州海康威视数字技术股份有限公司 Image-based crowd situation analysis method, device, equipment and system
CN111292331A (en) * 2020-02-23 2020-06-16 华为技术有限公司 Image processing method and device
CN111310775A (en) * 2018-12-11 2020-06-19 Tcl集团股份有限公司 Data training method and device, terminal equipment and computer readable storage medium
CN111340090A (en) * 2020-02-21 2020-06-26 浙江每日互动网络科技股份有限公司 Image feature comparison method and device, equipment and computer-readable storage medium
CN111357014A (en) * 2018-09-19 2020-06-30 华为技术有限公司 AI model development method and device
CN111368923A (en) * 2020-03-05 2020-07-03 上海商汤智能科技有限公司 Neural network training method and device, electronic equipment and storage medium
CN111428671A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information identification method, system, device and storage medium
CN111428536A (en) * 2019-01-09 2020-07-17 北京京东尚科信息技术有限公司 Training method and device for detection network for detecting article category and position
CN111435432A (en) * 2019-01-15 2020-07-21 北京市商汤科技开发有限公司 Network optimization method and device, image processing method and device, and storage medium
CN111435364A (en) * 2019-01-14 2020-07-21 阿里巴巴集团控股有限公司 Electronic medical record quality inspection method and device
CN111444749A (en) * 2019-01-17 2020-07-24 杭州海康威视数字技术股份有限公司 Method and device for identifying road surface guide mark and storage medium
CN111459675A (en) * 2020-03-31 2020-07-28 拉扎斯网络科技(上海)有限公司 Data processing method and device, readable storage medium and electronic equipment
CN111507958A (en) * 2020-04-15 2020-08-07 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN111523600A (en) * 2020-04-26 2020-08-11 上海商汤临港智能科技有限公司 Method and device for neural network training, target detection and intelligent equipment control
CN111539452A (en) * 2020-03-26 2020-08-14 深圳云天励飞技术有限公司 Image recognition method and device for multitask attributes, electronic equipment and storage medium
CN111539947A (en) * 2020-04-30 2020-08-14 上海商汤智能科技有限公司 Image detection method, training method of related model, related device and equipment
CN111539481A (en) * 2020-04-28 2020-08-14 北京市商汤科技开发有限公司 Image annotation method and device, electronic equipment and storage medium
CN111582107A (en) * 2020-04-28 2020-08-25 浙江大华技术股份有限公司 Training method and recognition method of target re-recognition model, electronic equipment and device
CN111598902A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN111612732A (en) * 2020-04-02 2020-09-01 深圳大学 Image quality evaluation method, image quality evaluation device, computer equipment and storage medium
CN111832368A (en) * 2019-04-23 2020-10-27 长沙智能驾驶研究院有限公司 Training method and device for travelable region detection model and application
CN111860573A (en) * 2020-06-04 2020-10-30 北京迈格威科技有限公司 Model training method, image class detection method and device and electronic equipment
CN111967597A (en) * 2020-08-18 2020-11-20 上海商汤临港智能科技有限公司 Neural network training and image classification method, device, storage medium and equipment
CN112016630A (en) * 2020-09-03 2020-12-01 平安科技(深圳)有限公司 Training method, device and equipment based on image classification model and storage medium
CN112101169A (en) * 2020-09-08 2020-12-18 平安科技(深圳)有限公司 Road image target detection method based on attention mechanism and related equipment
CN112101282A (en) * 2020-09-25 2020-12-18 北京瞰天科技有限公司 Aquatic target identification method and device, electronic equipment and storage medium
CN112163545A (en) * 2020-10-12 2021-01-01 北京易华录信息技术股份有限公司 Head feature extraction method and device, electronic equipment and storage medium
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112257604A (en) * 2020-10-23 2021-01-22 北京百度网讯科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN112307850A (en) * 2019-08-01 2021-02-02 浙江商汤科技开发有限公司 Neural network training method, lane line detection method, device and electronic equipment
CN112418261A (en) * 2020-09-17 2021-02-26 电子科技大学 Human body image multi-attribute classification method based on prior prototype attention mechanism
CN112464785A (en) * 2020-11-25 2021-03-09 浙江大华技术股份有限公司 Target detection method and device, computer equipment and storage medium
CN112529839A (en) * 2020-11-05 2021-03-19 西安交通大学 Method and system for extracting carotid artery blood vessel center line in nuclear magnetic resonance image
CN112528995A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112614117A (en) * 2020-12-28 2021-04-06 广州绿怡信息科技有限公司 Equipment region extraction model training method, equipment region extraction method and device
CN112712088A (en) * 2020-12-31 2021-04-27 洛阳语音云创新研究院 Animal fat condition detection method and device and computer readable storage medium
CN112733578A (en) * 2019-10-28 2021-04-30 普天信息技术有限公司 Vehicle weight identification method and system
CN112749609A (en) * 2020-07-23 2021-05-04 腾讯科技(深圳)有限公司 Human body image segmentation method and device, computer equipment and storage medium
CN112861858A (en) * 2021-02-19 2021-05-28 首都师范大学 Significance truth diagram generation method and significance detection model training method
CN112906651A (en) * 2021-03-25 2021-06-04 中国联合网络通信集团有限公司 Target detection method and device
CN112906685A (en) * 2021-03-04 2021-06-04 重庆赛迪奇智人工智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112925938A (en) * 2021-01-28 2021-06-08 上海商汤智能科技有限公司 Image annotation method and device, electronic equipment and storage medium
CN112949767A (en) * 2021-04-07 2021-06-11 北京百度网讯科技有限公司 Sample image increment, image detection model training and image detection method
CN113012176A (en) * 2021-03-17 2021-06-22 北京百度网讯科技有限公司 Sample image processing method and device, electronic equipment and storage medium
CN113052175A (en) * 2021-03-26 2021-06-29 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113454649A (en) * 2021-06-17 2021-09-28 商汤国际私人有限公司 Target detection method, target detection device, electronic equipment and computer-readable storage medium
CN113469121A (en) * 2021-07-21 2021-10-01 浙江大华技术股份有限公司 Vehicle state identification method and device
CN113516013A (en) * 2021-04-09 2021-10-19 阿波罗智联(北京)科技有限公司 Target detection method and device, electronic equipment, road side equipment and cloud control platform
CN113573044A (en) * 2021-01-19 2021-10-29 腾讯科技(深圳)有限公司 Video data processing method and device, computer equipment and readable storage medium
CN113642431A (en) * 2021-07-29 2021-11-12 北京百度网讯科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN113688933A (en) * 2019-01-18 2021-11-23 北京市商汤科技开发有限公司 Training method and classification method and device of classification network, and electronic equipment
CN113742562A (en) * 2020-05-27 2021-12-03 北京达佳互联信息技术有限公司 Video recommendation method and device, electronic equipment and storage medium
CN113743535A (en) * 2019-05-21 2021-12-03 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN113822111A (en) * 2021-01-19 2021-12-21 北京京东振世信息技术有限公司 Crowd detection model training method and device and crowd counting method and device
CN113963249A (en) * 2021-10-29 2022-01-21 山东大学 Detection method and system of galaxy images
CN114387649A (en) * 2022-01-11 2022-04-22 北京百度网讯科技有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN114571472A (en) * 2020-12-01 2022-06-03 北京小米移动软件有限公司 Ground attribute detection method and driving method for foot type robot and device thereof
CN109800654B (en) * 2018-12-24 2023-04-07 百度在线网络技术(北京)有限公司 Vehicle-mounted camera detection processing method and device and vehicle
CN111414930B (en) * 2019-01-07 2023-10-27 中国移动通信有限公司研究院 Deep learning model training method and device, electronic equipment and storage medium
CN117037218A (en) * 2023-10-08 2023-11-10 腾讯科技(深圳)有限公司 Object attribute identification method, related device, equipment and medium
CN118587746A (en) * 2024-08-06 2024-09-03 浙江大华技术股份有限公司 Human body target detection method, device, computer equipment and storage medium

Families Citing this family (25)

CN110689030A (en) * 2018-07-04 2020-01-14 佳能株式会社 Attribute recognition device and method, and storage medium
CN108875076B (en) * 2018-07-10 2021-07-20 重庆大学 Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN109376757B (en) * 2018-09-06 2020-09-08 苏州飞搜科技有限公司 Multi-label classification method and system
CN111103629A (en) * 2018-10-25 2020-05-05 杭州海康威视数字技术股份有限公司 Target detection method and device, NVR (network video recorder) equipment and security check system
CN111325052A (en) * 2018-12-13 2020-06-23 北京嘀嘀无限科技发展有限公司 Target detection method and device
CN109886072B (en) * 2018-12-25 2021-02-26 中国科学院自动化研究所 Face attribute classification system based on bidirectional Ladder structure
CN111382734B (en) * 2018-12-29 2022-08-23 阿里巴巴集团控股有限公司 Method and device for detecting and identifying telephone number and storage medium
CN111435452B (en) * 2019-01-11 2023-11-03 百度在线网络技术(北京)有限公司 Model training method, device, equipment and medium
CN110069997B (en) * 2019-03-22 2021-07-20 北京字节跳动网络技术有限公司 Scene classification method and device and electronic equipment
CN110059577B (en) * 2019-03-26 2022-02-18 北京迈格威科技有限公司 Pedestrian attribute information extraction method and device
CN111753857B (en) * 2019-03-26 2024-08-02 北京地平线机器人技术研发有限公司 Model training method and device applied to automatic classification of target object and electronic equipment
CN112001211B (en) * 2019-05-27 2024-04-19 商汤集团有限公司 Object detection method, device, equipment and computer readable storage medium
CN110210561B (en) * 2019-05-31 2022-04-01 北京市商汤科技开发有限公司 Neural network training method, target detection method and device, and storage medium
CN110338835B (en) * 2019-07-02 2023-04-18 深圳安科高技术股份有限公司 Intelligent scanning three-dimensional monitoring method and system
CN110378895A (en) * 2019-07-25 2019-10-25 山东浪潮人工智能研究院有限公司 A kind of breast cancer image-recognizing method based on the study of depth attention
CN110458077B (en) * 2019-08-05 2022-05-03 高新兴科技集团股份有限公司 Vehicle color identification method and system
CN110738211B (en) * 2019-10-17 2024-09-03 腾讯科技(深圳)有限公司 Object detection method, related device and equipment
CN112836549B (en) * 2019-11-22 2024-07-26 虹软科技股份有限公司 User information detection method and system and electronic equipment
CN111144313A (en) * 2019-12-27 2020-05-12 创新奇智(青岛)科技有限公司 Face detection method and system based on multi-receptive-field dynamic combination
CN111274945B (en) * 2020-01-19 2023-08-08 北京百度网讯科技有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN111401359A (en) * 2020-02-25 2020-07-10 北京三快在线科技有限公司 Target identification method and device, electronic equipment and storage medium
CN111510752B (en) * 2020-06-18 2021-04-23 平安国际智慧城市科技股份有限公司 Data transmission method, device, server and storage medium
CN111753702A (en) * 2020-06-18 2020-10-09 上海高德威智能交通系统有限公司 Target detection method, device and equipment
CN112152821B (en) * 2020-09-23 2023-03-28 青岛海尔科技有限公司 Directional communication method and device, storage medium and electronic equipment
CN113065592A (en) * 2021-03-31 2021-07-02 上海商汤智能科技有限公司 Image classification method and device, electronic equipment and storage medium

Citations (4)

CN101419671A (en) * 2008-11-10 2009-04-29 北方工业大学 Face gender identification method based on fuzzy support vector machine
CN102880859A (en) * 2012-08-30 2013-01-16 华南理工大学 Method for recognizing number plate
CN104134079A (en) * 2014-07-31 2014-11-05 中国科学院自动化研究所 Vehicle license plate recognition method based on extremal regions and extreme learning machine
CN105512676A (en) * 2015-11-30 2016-04-20 华南理工大学 Food recognition method at intelligent terminal

Family Cites Families (2)

US9892517B2 (en) * 2014-12-19 2018-02-13 Apical Ltd. Sensor noise profile
CN105447529B (en) * 2015-12-30 2020-11-03 商汤集团有限公司 Method and system for detecting clothes and identifying attribute value thereof

Cited By (130)

CN110766152A (en) * 2018-07-27 2020-02-07 富士通株式会社 Method and apparatus for training deep neural networks
CN110766129A (en) * 2018-07-27 2020-02-07 杭州海康威视数字技术股份有限公司 Neural network training system and data display method
CN110766152B (en) * 2018-07-27 2023-08-04 富士通株式会社 Method and apparatus for training deep neural networks
CN111357014A (en) * 2018-09-19 2020-06-30 华为技术有限公司 AI model development method and device
CN110969173B (en) * 2018-09-28 2023-10-24 杭州海康威视数字技术股份有限公司 Target classification method and device
CN110969173A (en) * 2018-09-28 2020-04-07 杭州海康威视数字技术股份有限公司 Target classification method and device
CN110969657B (en) * 2018-09-29 2023-11-03 杭州海康威视数字技术股份有限公司 Gun ball coordinate association method and device, electronic equipment and storage medium
CN110969657A (en) * 2018-09-29 2020-04-07 杭州海康威视数字技术股份有限公司 Gun and ball coordinate association method and device, electronic equipment and storage medium
CN110046632A (en) * 2018-11-09 2019-07-23 阿里巴巴集团控股有限公司 Model training method and device
CN110046632B (en) * 2018-11-09 2023-06-02 创新先进技术有限公司 Model training method and device
CN111241869B (en) * 2018-11-28 2024-04-02 杭州海康威视数字技术股份有限公司 Material checking method and device and computer readable storage medium
CN111241869A (en) * 2018-11-28 2020-06-05 杭州海康威视数字技术股份有限公司 Method and device for checking materials and computer readable storage medium
CN111259701A (en) * 2018-12-03 2020-06-09 杭州海康威视数字技术股份有限公司 Pedestrian re-identification method and device and electronic equipment
CN111259701B (en) * 2018-12-03 2023-04-25 杭州海康威视数字技术股份有限公司 Pedestrian re-identification method and device and electronic equipment
CN111291597A (en) * 2018-12-07 2020-06-16 杭州海康威视数字技术股份有限公司 Image-based crowd situation analysis method, device, equipment and system
CN111291597B (en) * 2018-12-07 2023-10-13 杭州海康威视数字技术股份有限公司 Crowd situation analysis method, device, equipment and system based on image
CN111310775A (en) * 2018-12-11 2020-06-19 Tcl集团股份有限公司 Data training method and device, terminal equipment and computer readable storage medium
CN111310775B (en) * 2018-12-11 2023-08-25 Tcl科技集团股份有限公司 Data training method, device, terminal equipment and computer readable storage medium
CN109800654B (en) * 2018-12-24 2023-04-07 百度在线网络技术(北京)有限公司 Vehicle-mounted camera detection processing method and device and vehicle
CN111414930B (en) * 2019-01-07 2023-10-27 中国移动通信有限公司研究院 Deep learning model training method and device, electronic equipment and storage medium
CN111428536A (en) * 2019-01-09 2020-07-17 北京京东尚科信息技术有限公司 Training method and device for detection network for detecting article category and position
CN111428536B (en) * 2019-01-09 2024-04-19 北京京东乾石科技有限公司 Training method and device for detecting network for detecting article category and position
CN111435364A (en) * 2019-01-14 2020-07-21 阿里巴巴集团控股有限公司 Electronic medical record quality inspection method and device
CN111435364B (en) * 2019-01-14 2023-04-18 阿里巴巴集团控股有限公司 Electronic medical record quality inspection method and device
CN111435432B (en) * 2019-01-15 2023-05-26 北京市商汤科技开发有限公司 Network optimization method and device, image processing method and device and storage medium
CN111435432A (en) * 2019-01-15 2020-07-21 北京市商汤科技开发有限公司 Network optimization method and device, image processing method and device, and storage medium
CN111444749A (en) * 2019-01-17 2020-07-24 杭州海康威视数字技术股份有限公司 Method and device for identifying road surface guide mark and storage medium
CN111444749B (en) * 2019-01-17 2023-09-01 杭州海康威视数字技术股份有限公司 Method and device for identifying road surface guide mark and storage medium
CN113688933A (en) * 2019-01-18 2021-11-23 北京市商汤科技开发有限公司 Training method and classification method and device of classification network, and electronic equipment
CN113688933B (en) * 2019-01-18 2024-05-24 北京市商汤科技开发有限公司 Classification network training method, classification method and device and electronic equipment
CN110059721A (en) * 2019-03-16 2019-07-26 平安城市建设科技(深圳)有限公司 Floor plan area recognizing method, device, equipment and computer readable storage medium
CN111832368A (en) * 2019-04-23 2020-10-27 长沙智能驾驶研究院有限公司 Training method and device for travelable region detection model and application
CN113743535A (en) * 2019-05-21 2021-12-03 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN113743535B (en) * 2019-05-21 2024-05-24 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN112307850A (en) * 2019-08-01 2021-02-02 浙江商汤科技开发有限公司 Neural network training method, lane line detection method, device and electronic equipment
CN110458829B (en) * 2019-08-13 2024-01-30 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN110443222A (en) * 2019-08-14 2019-11-12 北京百度网讯科技有限公司 Method and apparatus for training face's critical point detection model
CN110443222B (en) * 2019-08-14 2022-09-09 北京百度网讯科技有限公司 Method and device for training face key point detection model
CN112733578A (en) * 2019-10-28 2021-04-30 普天信息技术有限公司 Vehicle weight identification method and system
CN112733578B (en) * 2019-10-28 2024-05-24 普天信息技术有限公司 Vehicle re-identification method and system
CN111191526A (en) * 2019-12-16 2020-05-22 汇纳科技股份有限公司 Pedestrian attribute recognition network training method, system, medium and terminal
CN111191526B (en) * 2019-12-16 2023-10-10 汇纳科技股份有限公司 Pedestrian attribute recognition network training method, system, medium and terminal
CN111160429A (en) * 2019-12-17 2020-05-15 平安银行股份有限公司 Training method of image detection model, image detection method, device and equipment
CN111160429B (en) * 2019-12-17 2023-09-05 平安银行股份有限公司 Training method of image detection model, image detection method, device and equipment
CN111242951A (en) * 2020-01-08 2020-06-05 上海眼控科技股份有限公司 Vehicle detection method, device, computer equipment and storage medium
CN111259763B (en) * 2020-01-13 2024-02-02 华雁智能科技(集团)股份有限公司 Target detection method, target detection device, electronic equipment and readable storage medium
CN111259763A (en) * 2020-01-13 2020-06-09 华雁智能科技(集团)股份有限公司 Target detection method and device, electronic equipment and readable storage medium
CN111340090A (en) * 2020-02-21 2020-06-26 浙江每日互动网络科技股份有限公司 Image feature comparison method and device, equipment and computer-readable storage medium
CN111340090B (en) * 2020-02-21 2023-08-01 每日互动股份有限公司 Image feature comparison method and device, equipment and computer readable storage medium
CN111292331A (en) * 2020-02-23 2020-06-16 华为技术有限公司 Image processing method and device
CN111292331B (en) * 2020-02-23 2023-09-12 华为云计算技术有限公司 Image processing method and device
CN111368923A (en) * 2020-03-05 2020-07-03 上海商汤智能科技有限公司 Neural network training method and device, electronic equipment and storage medium
CN111368923B (en) * 2020-03-05 2023-12-19 上海商汤智能科技有限公司 Neural network training method and device, electronic equipment and storage medium
CN111539452A (en) * 2020-03-26 2020-08-14 深圳云天励飞技术有限公司 Image recognition method and device for multitask attributes, electronic equipment and storage medium
CN111539452B (en) * 2020-03-26 2024-03-26 深圳云天励飞技术有限公司 Image recognition method and device for multi-task attribute, electronic equipment and storage medium
CN111459675A (en) * 2020-03-31 2020-07-28 拉扎斯网络科技(上海)有限公司 Data processing method and device, readable storage medium and electronic equipment
CN111428671A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information identification method, system, device and storage medium
CN111459675B (en) * 2020-03-31 2023-09-15 拉扎斯网络科技(上海)有限公司 Data processing method and device, readable storage medium and electronic equipment
CN111612732A (en) * 2020-04-02 2020-09-01 深圳大学 Image quality evaluation method, image quality evaluation device, computer equipment and storage medium
CN111612732B (en) * 2020-04-02 2023-07-18 深圳大学 Image quality evaluation method, device, computer equipment and storage medium
CN111507958B (en) * 2020-04-15 2023-05-26 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN111507958A (en) * 2020-04-15 2020-08-07 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN111523600A (en) * 2020-04-26 2020-08-11 上海商汤临港智能科技有限公司 Method and device for neural network training, target detection and intelligent equipment control
CN111523600B (en) * 2020-04-26 2023-12-19 上海商汤临港智能科技有限公司 Neural network training, target detection and intelligent device control method and device
CN111582107A (en) * 2020-04-28 2020-08-25 浙江大华技术股份有限公司 Training method and recognition method of target re-recognition model, electronic equipment and device
CN111539481A (en) * 2020-04-28 2020-08-14 北京市商汤科技开发有限公司 Image annotation method and device, electronic equipment and storage medium
CN111582107B (en) * 2020-04-28 2023-09-29 浙江大华技术股份有限公司 Training method and recognition method of target re-recognition model, electronic equipment and device
CN111539481B (en) * 2020-04-28 2024-03-08 北京市商汤科技开发有限公司 Image labeling method, device, electronic equipment and storage medium
CN111539947A (en) * 2020-04-30 2020-08-14 上海商汤智能科技有限公司 Image detection method, training method of related model, related device and equipment
CN111539947B (en) * 2020-04-30 2024-03-29 上海商汤智能科技有限公司 Image detection method, related model training method, related device and equipment
CN111598902A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN111598902B (en) * 2020-05-20 2023-05-30 抖音视界有限公司 Image segmentation method, device, electronic equipment and computer readable medium
CN113742562A (en) * 2020-05-27 2021-12-03 北京达佳互联信息技术有限公司 Video recommendation method and device, electronic equipment and storage medium
CN113742562B (en) * 2020-05-27 2023-10-10 北京达佳互联信息技术有限公司 Video recommendation method and device, electronic equipment and storage medium
CN111860573B (en) * 2020-06-04 2024-05-10 北京迈格威科技有限公司 Model training method, image category detection method and device and electronic equipment
CN111860573A (en) * 2020-06-04 2020-10-30 北京迈格威科技有限公司 Model training method, image class detection method and device and electronic equipment
CN112749609B (en) * 2020-07-23 2024-03-19 腾讯科技(深圳)有限公司 Human body image segmentation method, device, computer equipment and storage medium
CN112749609A (en) * 2020-07-23 2021-05-04 腾讯科技(深圳)有限公司 Human body image segmentation method and device, computer equipment and storage medium
CN111967597A (en) * 2020-08-18 2020-11-20 上海商汤临港智能科技有限公司 Neural network training and image classification method, device, storage medium and equipment
CN112016630B (en) * 2020-09-03 2024-03-19 平安科技(深圳)有限公司 Training method, device, equipment and storage medium based on image classification model
CN112016630A (en) * 2020-09-03 2020-12-01 平安科技(深圳)有限公司 Training method, device and equipment based on image classification model and storage medium
CN112101169A (en) * 2020-09-08 2020-12-18 平安科技(深圳)有限公司 Road image target detection method based on attention mechanism and related equipment
CN112101169B (en) * 2020-09-08 2024-04-05 平安科技(深圳)有限公司 Attention mechanism-based road image target detection method and related equipment
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112418261A (en) * 2020-09-17 2021-02-26 电子科技大学 Human body image multi-attribute classification method based on prior prototype attention mechanism
CN112101282A (en) * 2020-09-25 2020-12-18 北京瞰天科技有限公司 Aquatic target identification method and device, electronic equipment and storage medium
CN112101282B (en) * 2020-09-25 2024-04-26 北京瞰天科技有限公司 Water target identification method and device, electronic equipment and storage medium
CN112163545A (en) * 2020-10-12 2021-01-01 北京易华录信息技术股份有限公司 Head feature extraction method and device, electronic equipment and storage medium
CN112257604A (en) * 2020-10-23 2021-01-22 北京百度网讯科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN112529839B (en) * 2020-11-05 2023-05-02 西安交通大学 Method and system for extracting carotid vessel centerline in nuclear magnetic resonance image
CN112529839A (en) * 2020-11-05 2021-03-19 西安交通大学 Method and system for extracting carotid artery blood vessel center line in nuclear magnetic resonance image
CN112464785A (en) * 2020-11-25 2021-03-09 浙江大华技术股份有限公司 Target detection method and device, computer equipment and storage medium
CN114571472A (en) * 2020-12-01 2022-06-03 北京小米移动软件有限公司 Ground attribute detection method and driving method for foot type robot and device thereof
CN114571472B (en) * 2020-12-01 2024-01-23 北京小米机器人技术有限公司 Ground attribute detection method and driving method for foot robot and device thereof
CN112528995A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112528995B (en) * 2020-12-22 2023-08-04 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112614117A (en) * 2020-12-28 2021-04-06 广州绿怡信息科技有限公司 Equipment region extraction model training method, equipment region extraction method and device
CN112712088A (en) * 2020-12-31 2021-04-27 洛阳语音云创新研究院 Animal fat condition detection method and device and computer readable storage medium
CN112712088B (en) * 2020-12-31 2023-02-14 洛阳语音云创新研究院 Animal fat condition detection method and device and computer readable storage medium
CN113822111A (en) * 2021-01-19 2021-12-21 北京京东振世信息技术有限公司 Crowd detection model training method and device and crowd counting method and device
CN113573044A (en) * 2021-01-19 2021-10-29 腾讯科技(深圳)有限公司 Video data processing method and device, computer equipment and readable storage medium
CN113573044B (en) * 2021-01-19 2022-12-09 腾讯科技(深圳)有限公司 Video data processing method and device, computer equipment and readable storage medium
CN113822111B (en) * 2021-01-19 2024-05-24 北京京东振世信息技术有限公司 Crowd detection model training method and device and crowd counting method and device
CN112925938A (en) * 2021-01-28 2021-06-08 上海商汤智能科技有限公司 Image annotation method and device, electronic equipment and storage medium
CN112861858B (en) * 2021-02-19 2024-06-07 北京龙翼风科技有限公司 Method for generating saliency truth value diagram and method for training saliency detection model
CN112861858A (en) * 2021-02-19 2021-05-28 首都师范大学 Significance truth diagram generation method and significance detection model training method
CN112906685B (en) * 2021-03-04 2024-03-26 重庆赛迪奇智人工智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112906685A (en) * 2021-03-04 2021-06-04 重庆赛迪奇智人工智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113012176A (en) * 2021-03-17 2021-06-22 北京百度网讯科技有限公司 Sample image processing method and device, electronic equipment and storage medium
CN113012176B (en) * 2021-03-17 2023-12-15 阿波罗智联(北京)科技有限公司 Sample image processing method and device, electronic equipment and storage medium
CN112906651B (en) * 2021-03-25 2023-07-11 中国联合网络通信集团有限公司 Target detection method and device
CN112906651A (en) * 2021-03-25 2021-06-04 中国联合网络通信集团有限公司 Target detection method and device
CN113052175A (en) * 2021-03-26 2021-06-29 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113052175B (en) * 2021-03-26 2024-03-29 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and readable storage medium
CN112949767A (en) * 2021-04-07 2021-06-11 北京百度网讯科技有限公司 Sample image increment, image detection model training and image detection method
CN112949767B (en) * 2021-04-07 2023-08-11 北京百度网讯科技有限公司 Sample image increment, image detection model training and image detection method
CN113516013B (en) * 2021-04-09 2024-05-14 阿波罗智联(北京)科技有限公司 Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform
CN113516013A (en) * 2021-04-09 2021-10-19 阿波罗智联(北京)科技有限公司 Target detection method and device, electronic equipment, road side equipment and cloud control platform
CN113454649A (en) * 2021-06-17 2021-09-28 商汤国际私人有限公司 Target detection method, target detection device, electronic equipment and computer-readable storage medium
CN113454649B (en) * 2021-06-17 2024-05-24 商汤国际私人有限公司 Target detection method, apparatus, electronic device, and computer-readable storage medium
CN113469121A (en) * 2021-07-21 2021-10-01 浙江大华技术股份有限公司 Vehicle state identification method and device
CN113642431A (en) * 2021-07-29 2021-11-12 北京百度网讯科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN113642431B (en) * 2021-07-29 2024-02-06 北京百度网讯科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN113963249B (en) * 2021-10-29 2024-04-09 山东大学 Detection method and system for star image
CN113963249A (en) * 2021-10-29 2022-01-21 山东大学 Detection method and system of galaxy images
CN114387649A (en) * 2022-01-11 2022-04-22 北京百度网讯科技有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN117037218B (en) * 2023-10-08 2024-03-15 腾讯科技(深圳)有限公司 Object attribute identification method, related device, equipment and medium
CN117037218A (en) * 2023-10-08 2023-11-10 腾讯科技(深圳)有限公司 Object attribute identification method, related device, equipment and medium
CN118587746A (en) * 2024-08-06 2024-09-03 浙江大华技术股份有限公司 Human body target detection method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108229267A (en) 2018-06-29
CN108229267B (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2018121690A1 (en) Object attribute detection method and device, neural network training method and device, and regional detection method and device
US11657602B2 (en) Font identification from imagery
CN112990432B (en) Target recognition model training method and device and electronic equipment
Xu et al. Investigating bias and fairness in facial expression recognition
US20210117760A1 (en) Methods and apparatus to obtain well-calibrated uncertainty in deep neural networks
Bendale et al. Towards open set deep networks
US11270124B1 (en) Temporal bottleneck attention architecture for video action recognition
CN110135231B (en) Animal face recognition method and device, computer equipment and storage medium
EP3767536A1 (en) Latent code for unsupervised domain adaptation
CN108304775A (en) Remote sensing images recognition methods, device, storage medium and electronic equipment
US11468266B2 (en) Target identification in large image data
CN114144770B (en) System and method for generating a data set for model retraining
CN108229673B (en) Convolutional neural network processing method and device and electronic equipment
CN113537630B (en) Training method and device of business prediction model
KR20190029083A (en) Apparatus and Method for learning a neural network
CN112215831B (en) Method and system for evaluating quality of face image
CN116872961B (en) Control system for intelligent driving vehicle
Singh et al. An enhanced YOLOv5 based on color harmony algorithm for object detection in unmanned aerial vehicle captured images
CN113570512A (en) Image data processing method, computer and readable storage medium
WO2022247448A1 (en) Data processing method and apparatus, computing device, and computer readable storage medium
CN116958615A (en) Picture identification method, device, equipment and medium
Jiang et al. Fair selection through kernel density estimation
Tsekhmystro et al. Study of methods for searching and localizing objects in images from aircraft using convolutional neural networks
Thevarasa et al. Weighted Ensemble Algorithm for Aerial Imaging Based Mosquito Breeding Sites Classification
Lamping et al. Uncertainty estimation for deep neural networks to improve the assessment of plumage conditions of chickens

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17888499

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17888499

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 17.12.2019)