
WO2018121737A1 - Key point prediction, network training and image processing method and apparatus, and electronic device - Google Patents

Key point prediction, network training and image processing method and apparatus, and electronic device

Info

Publication number
WO2018121737A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
prediction
information
neural network
convolutional neural
Prior art date
Application number
PCT/CN2017/119877
Other languages
English (en)
French (fr)
Inventor
刘宇
闫俊杰
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2018121737A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features

Definitions

  • Embodiments of the present invention relate to the field of artificial intelligence, and in particular to a key point prediction method, a network training method and an image processing method and apparatus, and an electronic device.
  • Key point prediction for a general object refers to predicting the key points of a general object in a natural scene (such as a human body, a vehicle, an animal, a plant, or a piece of furniture), for example the positions of a person's head, hands, and torso, or of a vehicle's front window, tires, chassis, and trunk.
  • Key points for general purpose objects can be used to enhance the effects of applications such as general object detection and scene segmentation.
  • the embodiment of the present application provides a key point prediction, network training, and image processing technical solution.
  • According to one aspect of the embodiments of the present application, a key point prediction method is provided, including: detecting an image using a first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained using sample images containing key point annotation information of general objects; and predicting, using the first convolutional neural network, the key points of the general object in the image according to the feature information, obtaining a key point prediction result for the general object of the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
  • Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are each connected to the feature extraction layer. The feature extraction layer is used to extract the feature information of the image; the first key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information.
  • Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
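As a purely illustrative sketch of the two prediction convolution layers described above: a 1*1 convolution is a per-pixel linear map over channels, so a 1*1*2N position head and a 1*1*N existence head can be emulated in NumPy as below. All shapes, names, and values here are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def conv1x1(features, weights, bias):
    """A 1x1 convolution: a per-pixel linear map over the channel axis.

    features: (C, H, W) feature map; weights: (K, C); bias: (K,).
    Returns a (K, H, W) output map.
    """
    out = np.tensordot(weights, features, axes=([1], [0]))  # (K, H, W)
    return out + bias[:, None, None]

N = 5                      # total number of key points to predict (assumed)
C, H, W = 16, 8, 8         # channels and spatial size of the feature map
rng = np.random.default_rng(0)
features = rng.standard_normal((C, H, W))

# First key point prediction convolution layer: kernel 1*1*2N
# -> 2N output channels, an (x, y) position prediction per key point.
w_pos = rng.standard_normal((2 * N, C))
position_pred = conv1x1(features, w_pos, np.zeros(2 * N))    # (2N, H, W)

# Second key point prediction convolution layer: kernel 1*1*N
# -> N output channels, one existence score per key point.
w_exist = rng.standard_normal((N, C))
existence_pred = conv1x1(features, w_exist, np.zeros(N))     # (N, H, W)
```

Because both heads are 1*1 convolutions over the same feature map, they share all feature-extraction computation and differ only in their channel counts.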
  • Optionally, the first convolutional neural network includes a fully convolutional neural network.
  • Optionally, the training of the first convolutional neural network includes: acquiring the sample image, where the key point annotation information includes key point position annotation information and key point existence annotation information; training the first convolutional neural network using the sample image to obtain the key point position prediction information and key point existence prediction information produced by the first convolutional neural network for the general object of the sample image; supervising the key point position prediction information and the key point existence prediction information according to an objective function; and determining whether the iterative loss rate of the first convolutional neural network satisfies a set condition; if it does, the training of the first convolutional neural network is complete.
  • Optionally, the training of the first convolutional neural network further includes: if the set condition is not satisfied, adjusting the parameters of the first convolutional neural network according to the obtained key point position prediction information and key point existence prediction information, until the iterative loss rate satisfies the set condition.
  • Optionally, supervising the key point position prediction information and the key point existence prediction information according to an objective function includes: supervising the key point position prediction information according to a regression objective function, and supervising the key point existence prediction information according to a classification objective function.
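A minimal sketch of this split supervision, with an L2 loss standing in for the regression objective and binary cross-entropy standing in for the classification objective (the patent does not specify the exact objective functions; all function names, shapes, and values here are assumed for illustration):

```python
import numpy as np

def position_regression_loss(pred_pos, true_pos, exists):
    """L2 regression objective on (x, y) positions, counted only for
    key points annotated as present."""
    # pred_pos, true_pos: (N, 2); exists: (N,) of 0/1 labels
    sq = np.sum((pred_pos - true_pos) ** 2, axis=1)
    return float(np.sum(sq * exists) / max(np.sum(exists), 1))

def existence_classification_loss(logits, exists):
    """Binary cross-entropy classification objective on existence scores."""
    p = 1.0 / (1.0 + np.exp(-logits))              # sigmoid
    eps = 1e-12
    return float(-np.mean(exists * np.log(p + eps)
                          + (1 - exists) * np.log(1 - p + eps)))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
truth = np.array([[1.0, 2.0], [0.0, 0.0]])
labels = np.array([1.0, 0.0])                      # second key point absent
loss_pos = position_regression_loss(pred, truth, labels)
loss_cls = existence_classification_loss(np.array([5.0, -5.0]), labels)
```

Masking the regression term by the existence labels reflects the idea that a position can only be supervised for key points that actually exist in the image.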
  • According to another aspect of the embodiments of the present application, a key point prediction network training method is provided, including: acquiring sample images containing key point annotation information of general objects, where the key point annotation information includes key point position annotation information and key point existence annotation information; training the first convolutional neural network using the sample images to obtain the key point position prediction information and key point existence prediction information produced by the first convolutional neural network for the general object of each sample image; supervising the key point position prediction information and the key point existence prediction information according to an objective function; and determining whether the iterative loss rate of the first convolutional neural network satisfies a set condition; if it does, the training of the first convolutional neural network is complete.
  • Optionally, the method further includes: if the set condition is not satisfied, adjusting the parameters of the first convolutional neural network according to the obtained key point position prediction information and key point existence prediction information until the iterative loss rate satisfies the set condition.
  • Optionally, supervising the key point position prediction information and the key point existence prediction information according to an objective function includes: supervising the key point position prediction information according to a regression objective function, and supervising the key point existence prediction information according to a classification objective function.
  • Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are each connected to the feature extraction layer. The feature extraction layer is used to extract the feature information of the sample image; the first key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information.
  • Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
  • Optionally, the first convolutional neural network includes a fully convolutional neural network.
  • According to another aspect of the embodiments of the present application, an image processing method is provided, including: detecting an image using the key point prediction method according to any of the above embodiments, or using the first convolutional neural network trained by the key point prediction network training method according to any of the above embodiments, to obtain a key point prediction result for the general object of the image, where the key point prediction result includes key point position prediction information and key point existence prediction information; and processing the image according to the key point prediction result of the general object.
  • Optionally, processing the image according to the key point prediction result of the general object includes: determining the position of the general object in the image according to the key point prediction result.
  • Optionally, processing the image according to the key point prediction result of the general object includes: extracting object features of the general object in the image according to the key point prediction result.
  • Optionally, processing the image according to the key point prediction result of the general object includes: estimating the posture of the general object in the image according to the key point prediction result.
  • Optionally, processing the image according to the key point prediction result of the general object includes: tracking the general object in the image according to the key point prediction result.
  • Optionally, processing the image according to the key point prediction result of the general object includes: identifying the general object in the image according to the key point prediction result.
  • Optionally, processing the image according to the key point prediction result of the general object includes: rendering the general object in the image according to the key point prediction result.
  • According to another aspect of the embodiments of the present application, a key point prediction apparatus is provided, including: a detection module configured to detect an image using a first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained using sample images containing key point annotation information of general objects; and a prediction module configured to predict, using the first convolutional neural network, the key points of the general object in the image according to the feature information, obtaining a key point prediction result for the general object of the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
  • Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are each connected to the feature extraction layer. The feature extraction layer is used to extract the feature information of the image; the first key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information.
  • Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
  • Optionally, the first convolutional neural network includes a fully convolutional neural network.
  • Optionally, the apparatus further includes a training module configured to train the first convolutional neural network.
  • Optionally, the training module includes: an acquisition submodule configured to acquire the sample image, where the key point annotation information includes key point position annotation information and key point existence annotation information; a training submodule configured to train the first convolutional neural network using the sample image, obtaining the key point position prediction information and key point existence prediction information produced by the first convolutional neural network for the general object of the sample image; a supervision submodule configured to supervise the key point position prediction information and the key point existence prediction information according to an objective function; a determining submodule configured to determine whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and an execution submodule configured to complete the training of the first convolutional neural network if the iterative loss rate of the first convolutional neural network satisfies the set condition.
  • Optionally, the execution submodule is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and key point existence prediction information obtained by the training submodule, until the iterative loss rate satisfies the set condition.
  • Optionally, the supervision submodule is configured to supervise the key point position prediction information according to a regression objective function, and to supervise the key point existence prediction information according to a classification objective function.
  • According to another aspect of the embodiments of the present application, a key point prediction network training apparatus is provided, including: an acquisition module configured to acquire sample images containing key point annotation information of general objects, where the key point annotation information includes key point position annotation information and key point existence annotation information; a training module configured to train the first convolutional neural network using the sample images, obtaining the key point position prediction information and key point existence prediction information produced by the first convolutional neural network for the general object of each sample image; a supervision module configured to supervise the key point position prediction information and the key point existence prediction information according to an objective function; a determining module configured to determine whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and an execution module configured to complete the training of the first convolutional neural network if the iterative loss rate of the first convolutional neural network satisfies the set condition.
  • Optionally, the execution module is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and key point existence prediction information obtained by the training module, until the iterative loss rate satisfies the set condition.
  • Optionally, the supervision module is configured to supervise the key point position prediction information according to a regression objective function, and to supervise the key point existence prediction information according to a classification objective function.
  • Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are each connected to the feature extraction layer. The feature extraction layer is used to extract the feature information of the sample image; the first key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information.
  • Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
  • Optionally, the first convolutional neural network includes a fully convolutional neural network.
  • According to another aspect of the embodiments of the present application, an image processing apparatus is provided, including: a detection module configured to detect an image using the key point prediction apparatus according to any of the above embodiments, or using the first convolutional neural network trained by the key point prediction network training apparatus according to any of the above embodiments, to obtain a key point prediction result for the general object of the image, where the key point prediction result includes key point position prediction information and key point existence prediction information; and a processing module configured to process the image according to the key point prediction result of the general object.
  • Optionally, the processing module includes: a position determining submodule configured to determine the position of the general object in the image according to the key point prediction result of the general object.
  • Optionally, the processing module includes: a feature extraction submodule configured to extract object features of the general object in the image according to the key point prediction result of the general object.
  • Optionally, the processing module includes: a posture estimation submodule configured to estimate the posture of the general object in the image according to the key point prediction result of the general object.
  • Optionally, the processing module includes: an object tracking submodule configured to track the general object in the image according to the key point prediction result of the general object.
  • Optionally, the processing module includes: an object recognition submodule configured to identify the general object in the image according to the key point prediction result of the general object.
  • Optionally, the processing module includes: an object rendering submodule configured to render the general object in the image according to the key point prediction result of the general object.
  • According to another aspect of the embodiments of the present application, an electronic device is provided, including a processor and a memory, where the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the object key point prediction method according to any of the above embodiments; or the operations corresponding to the object key point prediction network training method according to any of the above embodiments; or the operations corresponding to the image processing method according to any of the above embodiments.
  • According to another aspect of the embodiments of the present application, another electronic device is provided, including:
  • a processor and the key point prediction apparatus according to any of the above embodiments, where the units of the key point prediction apparatus are run when the processor runs the key point prediction apparatus; or
  • a processor and the key point prediction network training apparatus according to any of the above embodiments, where the units of the key point prediction network training apparatus are run when the processor runs the key point prediction network training apparatus; or
  • a processor and the image processing apparatus according to any of the above embodiments, where the units of the image processing apparatus are run when the processor runs the image processing apparatus.
  • According to another aspect of the embodiments of the present application, a computer program is provided, including computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the object key point prediction method according to any of the above embodiments; or
  • instructions for implementing the steps of the object key point prediction network training method according to any of the above embodiments; or
  • instructions for implementing the steps of the image processing method according to any of the above embodiments.
  • According to another aspect of the embodiments of the present application, a computer readable storage medium is provided for storing computer readable instructions that, when executed, implement the operations of the steps of the object key point prediction method according to any of the above embodiments, or of the object key point prediction network training method according to any of the above embodiments, or of the image processing method according to any of the above embodiments.
  • In the embodiments of the present application, the first convolutional neural network is trained using sample images containing key point annotation information of general objects, and the trained first convolutional neural network is used to predict the key points of general objects in images.
  • A general object can be understood as any common object in a natural scene, such as a human body, a vehicle, an animal, a plant, or a piece of furniture, whose key points are local positions such as a person's head, hands, and torso, or a vehicle's front window, tires, and chassis.
  • The first convolutional neural network thus expands the range of object categories for which key points can be predicted.
  • The key point position prediction information and key point existence prediction information of the general object can be obtained directly, where the key point position prediction information is the position in the image of each key point to be predicted, and the key point existence prediction information indicates whether each key point to be predicted exists in the image. When the position information of a key point to be predicted is obtained and the key point is determined to exist in the image, the key point can be predicted; combining the key point position prediction information and the key point existence prediction information of the general object yields the key points of the general object in the image.
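The combination of the two kinds of prediction information can be pictured as a simple decoding step: keep a predicted position only when the corresponding existence score says the key point is present. The threshold, data layout, and function name below are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def decode_keypoints(position_pred, existence_score, threshold=0.5):
    """Keep the predicted (x, y) of each key point only when its
    existence score indicates the key point is present in the image."""
    # position_pred: (N, 2) predicted positions; existence_score: (N,) in [0, 1]
    result = {}
    for i, score in enumerate(existence_score):
        if score >= threshold:
            result[i] = (float(position_pred[i, 0]), float(position_pred[i, 1]))
    return result

positions = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
scores = np.array([0.9, 0.2, 0.7])   # key point 1 predicted absent
kps = decode_keypoints(positions, scores)
```

This is what lets one network cover many object categories: key points that do not apply to the object in a given image simply come back with low existence scores and are dropped.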
  • FIG. 1 is a flowchart of a key point prediction method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a key point prediction method according to another embodiment of the present application.
  • FIG. 3 is a flowchart of training a first convolutional neural network in a key point prediction method according to another embodiment of the present application
  • FIG. 4 is a schematic diagram of a training principle of a first convolutional neural network according to an embodiment of the present application
  • FIG. 5 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a key point prediction apparatus according to an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a key point prediction apparatus according to another embodiment of the present application.
  • FIG. 8 is a structural block diagram of a key point prediction network training apparatus according to an embodiment of the present application.
  • FIG. 9 is a block diagram showing the structure of an image processing apparatus according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • This embodiment takes predicting the key points of a general object in an image as an example scenario, and takes a mobile terminal or a PC as the executor of the key point prediction method, to describe the key point prediction method of this embodiment. Other application scenarios, as well as other devices having data collection, processing, and transmission functions, can also implement the key point prediction solution provided by the embodiments of the present application; the application scenario is not limited.
  • In the embodiments of the present application, a general object refers to any object existing in a natural scene, such as a human body, a vehicle, an animal, a plant, or a piece of furniture.
  • A key point of a general object is a local position, shared by most objects in the category to which the object belongs, that can be used to distinguish the object type, for example the positions of a person's head, hands, and torso, or of a vehicle's front window, tires, chassis, and trunk.
  • the key point prediction method of this embodiment includes:
  • Step S100 detecting an image by using a first convolutional neural network to obtain feature information of the image.
  • the first convolutional neural network is a convolutional neural network trained using a sample image containing key point annotation information of a general object, and the first convolutional neural network is used to predict key point information of the general object in the image.
  • the convolutional neural network in various embodiments of the present application is a neural network including convolution processing capability, and may include a convolutional layer, or a convolutional layer and a non-convolutional layer.
  • Optionally, the image may come from an image capture device as one frame of a video sequence, or may be a single still image; it may also come from other devices. The image may be a static image or a video frame.
  • the image can be input to the first convolutional neural network to obtain feature information of the image.
  • the feature information includes feature information of the general object.
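The feature-extraction step in S100 can be pictured as ordinary convolution plus a non-linearity. Below is a deliberately naive single-layer sketch (a loop-based valid 3x3 convolution followed by ReLU); it is an illustrative assumption only, as real feature extraction layers stack many such layers and the patent does not specify the layer structure.

```python
import numpy as np

def conv3x3_relu(image, kernels):
    """One toy feature-extraction step: a 3x3 convolution (valid padding)
    followed by ReLU, producing one feature map per kernel."""
    # image: (H, W) grayscale input; kernels: (K, 3, 3)
    H, W = image.shape
    K = kernels.shape[0]
    out = np.zeros((K, H - 2, W - 2))
    for k in range(K):
        for i in range(H - 2):
            for j in range(W - 2):
                out[k, i, j] = np.sum(image[i:i + 3, j:j + 3] * kernels[k])
    return np.maximum(out, 0.0)      # ReLU non-linearity

rng = np.random.default_rng(0)
feature_maps = conv3x3_relu(rng.standard_normal((16, 16)),
                            rng.standard_normal((4, 3, 3)))
```

The resulting (K, H', W') feature map is the "feature information" that the two 1*1 prediction heads then operate on.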
  • the step S100 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the detection module 600 executed by the processor.
  • Step S102 Using a first convolutional neural network to predict a key point of the general object of the image according to the feature information, and obtaining a key point prediction result of the universal object of the image.
  • the first convolutional neural network may include, for example but not limited to: an input layer, a feature extraction layer, and a keypoint prediction convolution layer.
  • the input layer is used to input the image
  • the feature extraction layer is used to extract the feature information of the image
  • the key point prediction convolution layer is used to convolve the feature information to obtain the key point prediction result
  • The key point prediction result includes the key point position prediction information and the key point existence prediction information.
  • the step S102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a prediction module 602 executed by the processor.
  • With the key point prediction method provided by this embodiment, the key point prediction result of a general object can be predicted from an image using the trained first convolutional neural network, where the first convolutional neural network is trained using sample images containing key point annotation information of general objects. The first convolutional neural network can therefore predict the key points of objects of multiple categories, expanding the applicable range of convolutional neural networks for object key point prediction.
  • FIG. 2 shows a flow chart of the steps of a key point prediction method according to another embodiment of the present application.
  • the present embodiment again uses a mobile terminal or a PC as an example to describe the key point prediction method provided herein.
  • Other devices and scenarios may be implemented with reference to this embodiment.
  • This embodiment emphasizes its differences from the foregoing embodiments; for the common parts, reference may be made to the descriptions in the foregoing embodiments, and details are not repeated herein.
  • the key point prediction method of this embodiment includes:
  • Step S200 training the first convolutional neural network.
  • This step S200 may include the following sub-steps:
  • Sub-step S300 acquiring a sample image containing key point annotation information of the general object.
  • the sample image containing the key point labeling information of the general object may be a video image derived from an image capturing device, composed of images frame by frame, or may be a single image; it may also be derived from other devices, after which the key points are labeled in the sample image.
  • the key point labeling information includes the key point position labeling information and the key point existence labeling information.
  • the key point of the general object and the key point position of the general object may be marked in the sample image.
  • the present embodiment does not limit the source and the acquisition path of the sample image of the key point labeling information of the general object.
  • the sub-step S300 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 800 executed by the processor.
  • Sub-step S302 training the first convolutional neural network using the sample image, obtaining key point position prediction information and key point existence prediction information of the general object of the first convolutional neural network for the sample image.
  • the key point position prediction information can be understood as the position information of the key points of the general object in the sample image, for example coordinate information or pixel information.
  • the key point existence prediction information can be understood as information on whether a key point of the general object exists in the sample image; for example, a certain key point of a general object exists or does not exist in the sample image. This embodiment does not restrict the content of the key point position prediction information and the key point existence prediction information of the general object.
  • the first convolutional neural network may include: an input layer, a feature extraction layer, and a keypoint prediction convolution layer.
  • the key point prediction convolution layer may include a first key point prediction convolution layer and a second key point prediction convolution layer, each of which is connected to the feature extraction layer.
  • the input layer is used to input the sample image
  • the feature extraction layer is used to extract the feature information of the sample image.
  • the first key point prediction convolution layer is used to convolve the feature information to obtain the key point position prediction information
  • the second key point prediction convolution layer is used to convolve the feature information to obtain the key point existence prediction information
  • the convolution kernel of the first key point prediction convolution layer is 1*1*2N
  • the convolution kernel of the second key point prediction convolution layer is 1*1*N
  • N is the total number of key points to be predicted, and is an integer greater than or equal to 1.
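The head sizes above can be illustrated with a minimal NumPy sketch (the feature-map shape, variable names, and random weights are illustrative assumptions, not the patented implementation): a 1*1 convolution is a per-pixel matrix multiply over the channel dimension, so the position head needs 2N output channels (an x and a y per key point) while the existence head needs N.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                      # total number of key points to predict
H, W, C = 7, 7, 64         # assumed spatial size and channel count of the feature map

features = rng.standard_normal((H, W, C))

# A 1*1 convolution over a feature map is a per-pixel matrix multiply
# across the channel dimension.
def conv1x1(feat, kernel):
    return feat @ kernel   # (H, W, C) @ (C, out) -> (H, W, out)

w_position = rng.standard_normal((C, 2 * N))  # 1*1*2N head: (x, y) per key point
w_existence = rng.standard_normal((C, N))     # 1*1*N head: one score per key point

position_map = conv1x1(features, w_position)
existence_map = conv1x1(features, w_existence)

assert position_map.shape == (H, W, 2 * N)
assert existence_map.shape == (H, W, N)
```

The two heads share the same input features and differ only in their output channel count, which is why they can run in parallel after the feature extraction layer.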
  • Training the first convolutional neural network means training the parameters of its input layer, feature extraction layer, first key point prediction convolution layer and second key point prediction convolution layer, and then constructing the first convolutional neural network according to the trained parameters.
  • the first convolutional neural network can be trained using sample images containing key point annotation information of general objects, so that the trained network is more accurate. Sample images covering various cases can be selected: they may include sample images labeled with key point annotation information of general objects, and may also include sample images not labeled with such information.
  • the first convolutional neural network in this embodiment may include a full convolutional neural network, that is, a neural network composed entirely of convolutional layers.
  • the first convolutional neural network in each embodiment of the present application may be a convolutional neural network of any structure. This embodiment is only used as an example for description. In practical applications, the first convolutional neural network is not limited thereto. For example, it can also be other two-class or multi-class convolutional neural networks.
  • the sub-step S302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a training module 802 executed by the processor.
  • Sub-step S304 supervising the key point position prediction information and the key point existence prediction information according to the objective function.
  • the key point position prediction information and the key point existence prediction information are supervised simultaneously according to objective functions. For example, the key point position prediction information is supervised according to a regression objective function, such as a smooth L1 objective function or a Euclidean objective function, while the key point existence prediction information is supervised according to a classification objective function, such as a softmax objective function, a cross-entropy objective function, or a hinge objective function.
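The two kinds of supervision can be sketched as generic loss functions. The NumPy sketch below assumes standard textbook forms of a smooth L1 regression objective and a softmax classification objective; it is an illustration, not the application's exact formulation:

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 (Huber-style) regression objective for key point positions."""
    diff = np.abs(pred - target)
    # Quadratic near zero, linear for large errors.
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

def softmax_cross_entropy(logits, label):
    """Softmax classification objective for key point existence (exists / absent)."""
    shifted = logits - logits.max()          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# Supervise a 1-key-point position prediction against its annotation...
pos_loss = smooth_l1(np.array([0.2, 0.8]), np.array([0.0, 1.0]))
# ...and an existence prediction against its "exists" (label 0) annotation.
exist_loss = softmax_cross_entropy(np.array([2.0, -1.0]), label=0)
```

In training, the two losses would be computed over all N key points and combined so that both tasks are supervised at the same time.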
  • the sub-step S304 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a supervisory module 804 executed by the processor.
  • Sub-step S306 determining whether the iterative loss rate of the first convolutional neural network satisfies the set condition.
  • If so, sub-step S308 is performed; if not, sub-step S310 is performed.
  • the set condition may be that the iterative loss rate remains unchanged during a predetermined number of training iterations of the first convolutional neural network, or that the change of the iterative loss rate stays within a certain range; this embodiment does not limit the content of the set condition.
  • the sub-step S306 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a decision module 806 executed by the processor.
  • Sub-step S308 completing the training of the first convolutional neural network.
  • the sub-step S308 may be performed by a processor invoking a corresponding instruction stored in a memory or by an execution module 808 executed by the processor.
  • Sub-step S310 adjusting parameters of the first convolutional neural network according to the obtained key point position prediction information and key point presence prediction information until the iterative loss rate satisfies the set condition.
  • If the iterative loss rate of the first convolutional neural network does not satisfy the set condition, the obtained key point position prediction information and key point existence prediction information do not correspond to the key point position labeling information and key point existence labeling information in the sample image; that is, the parameters of the currently trained first convolutional neural network are not accurate enough and need to be adjusted accordingly. This embodiment does not restrict the adjustment process of the parameters. When the iterative loss rate of the parameter-adjusted first convolutional neural network satisfies the set condition, it is determined that the training of the first convolutional neural network is completed.
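The set condition on the iterative loss rate can be sketched as a simple convergence check on the recent loss history; the window size and tolerance below are illustrative assumptions, since the application does not fix them:

```python
def loss_rate_converged(loss_history, window=5, tolerance=1e-4):
    """Set-condition check: training stops once the iterative loss rate
    has stayed within `tolerance` over the last `window` iterations."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) <= tolerance

# Still falling: keep adjusting the network parameters (sub-step S310).
assert not loss_rate_converged([1.0, 0.5, 0.3])
# Flat over the window: training is complete (sub-step S308).
assert loss_rate_converged([0.10, 0.10, 0.10, 0.10, 0.10])
```

A training loop would call this check after each iteration, adjusting parameters until it returns true.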
  • the sub-step S310 may be performed by a processor invoking a corresponding instruction stored in a memory or by an execution module 808 executed by the processor.
  • the step S200 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a keypoint predictive network training device operated by the processor.
  • Step S202 Detecting an image by using a first convolutional neural network to obtain feature information of the image.
  • the step S202 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the detection module 700 being executed by the processor.
  • Step S204 Using a first convolutional neural network to predict a key point of the general object of the image according to the feature information, and obtaining a key point prediction result of the universal object of the image.
  • the step S204 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a prediction module 702 being executed by the processor.
  • FIG. 4 shows a schematic diagram of the training principle of a first convolutional neural network according to an embodiment of the present application. Since a full convolutional neural network operates faster than a non-full convolutional neural network, the first convolutional neural network in this embodiment uses a full convolutional neural network; however, the first convolutional neural network is not limited to being a full or a non-full convolutional neural network. Taking a full convolutional neural network as an example, the sample image is input to the network, and the feature information of the sample image is obtained from the feature extraction layer of the network.
  • the first key point prediction convolution layer is used to convolve the feature information to obtain the key point position prediction information. Even if a key point does not exist, the first key point prediction convolution layer will still predict a random position for the non-existent key point.
  • the convolution operation is performed on the feature information by the second key point prediction convolution layer to obtain the key point existence prediction information.
  • the smooth L1 objective function is used to supervise the regression task of training the key point position prediction information
  • the softmax objective function is used to supervise the classification task of training the key point existence prediction information
  • together, the two kinds of prediction information predict the key points of the general object in the sample image.
  • the key point prediction result of the general object can be predicted from the image by using the trained first convolutional neural network. Since the first convolutional neural network in this embodiment is trained using sample images containing key point annotation information of general objects, it can predict the key points of objects of multiple categories, which expands the applicable range of the convolutional neural network used for object key point prediction.
  • the key point prediction result in this embodiment may include key point position prediction information and key point existence prediction information, where the key point position prediction information is the position information of the key points to be predicted in the image, and the key point existence prediction information indicates whether the key points to be predicted exist in the image.
  • the key point position prediction information and the key point existence prediction information of the object together determine the key points of the general object in the image.
  • since the key point prediction result in this embodiment includes not only the key point position prediction information but also the key point existence prediction information, it adds a prediction of whether each key point exists, which improves the accuracy of key point prediction.
  • the first convolutional neural network in this embodiment may include a first key point prediction convolution layer and a second key point prediction convolution layer, each connected to the feature extraction layer. After the feature extraction layer extracts the feature information of the image, the first and second key point prediction convolution layers may perform convolution operations on the feature information in parallel; that is, the two layers are in a parallel relationship, so the key point position prediction information and the key point existence prediction information are predicted simultaneously.
  • the key point position prediction information includes [x1, y1, x2, y2, ..., xN, yN], where x1, x2, ..., xN represent the abscissa information of the key points in the sample image and y1, y2, ..., yN represent the ordinate information of the key points in the sample image.
  • the key point existence prediction information includes [s1, s2, ..., sN], where s1, s2, ..., sN represent the existence information of the key points in the sample image.
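A plausible way to combine the two output vectors into a final key point set is sketched below; the 0.5 threshold and the function name are assumptions for illustration. The existence scores gate which predicted positions are kept:

```python
import numpy as np

def parse_prediction(positions, existence_scores, threshold=0.5):
    """Combine [x1, y1, ..., xN, yN] position predictions with
    [s1, ..., sN] existence scores: keep only key points whose
    existence score passes the threshold."""
    xy = np.asarray(positions, dtype=float).reshape(-1, 2)   # N rows of (x, y)
    present = np.asarray(existence_scores, dtype=float) >= threshold
    return {i: tuple(xy[i]) for i in range(len(present)) if present[i]}

keypoints = parse_prediction(
    positions=[10.0, 20.0, 30.0, 40.0, 50.0, 60.0],  # N = 3 key points
    existence_scores=[0.9, 0.2, 0.7],
)
# Key points 0 and 2 are kept; key point 1 is treated as absent,
# so its (randomly regressed) position is discarded.
```

This mirrors why existence prediction improves accuracy: a position is still regressed for absent key points, and the existence score is what discards it.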
  • the first key point prediction convolution layer in this embodiment performs a convolution operation on the feature information to obtain the key point position prediction information; since the position prediction information includes both abscissa and ordinate information, its convolution kernel is 1*1*2N.
  • the second key point prediction convolution layer convolves the feature information to obtain the key point existence prediction information; since each key point either exists or does not exist, its convolution kernel is 1*1*N.
  • the first convolutional neural network in this embodiment may be a full convolutional neural network. Since a full convolutional neural network operates faster than a non-full convolutional neural network, predicting key points with the first convolutional neural network is faster than predicting key points with a non-full convolutional neural network.
  • the image processing method of this embodiment may be performed by any device having data acquisition, processing, and transmission functions, including but not limited to a mobile terminal, a PC, and the like.
  • the image processing method of this embodiment includes:
  • Step S500 Perform key point prediction on the image to obtain a key point prediction result of the general object in the image.
  • the key point prediction of the image may be performed by using the first convolutional neural network trained in the above embodiment, or by using the key point prediction method in the above embodiment.
  • the step S500 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the detection module 900 being executed by the processor.
  • Step S502 Processing the image according to the key point prediction result of the general object.
  • step S502 may be performed by a processor invoking a corresponding instruction stored in a memory or by a processing module 902 being executed by the processor.
  • the image may be subjected to various processing according to the key point prediction result of the general object, such as: determining the position of the general object in the image; extracting the object features of the general object in the image; estimating the posture of the general object in the image; tracking the general object in the image; identifying the general object in the image; rendering the general object in the image; and so on.
  • This embodiment is described only by taking determining the position of the general object in the image according to the key point prediction result as an example; other ways of processing the image according to the key point prediction result can be performed with reference to common processing methods. This embodiment does not limit the technical means for processing an image according to the key point prediction result of a general object.
  • for example, after the key point position prediction information and the key point existence prediction information of a general object such as a cat are predicted, the position, orientation, posture and the like of the cat can be determined based on the predicted cat key point information.
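As a hedged illustration of the position-determination example, the sketch below derives a centroid and an axis-aligned bounding box from whichever key points survived the existence check; the specific cat key points and their labels are hypothetical:

```python
import numpy as np

def object_location(keypoints):
    """Estimate an object's position from its predicted, existing key points:
    returns the centroid and an axis-aligned bounding box (xmin, ymin, xmax, ymax)."""
    pts = np.asarray(list(keypoints.values()), dtype=float)
    centroid = pts.mean(axis=0)
    bbox = (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())
    return tuple(centroid), bbox

# Hypothetical cat key points (e.g. head, tail, paw) that passed the existence check.
centroid, bbox = object_location({0: (10.0, 20.0), 1: (30.0, 60.0), 2: (20.0, 40.0)})
```

Orientation and posture estimates would use the same surviving key points, e.g. by comparing the relative positions of semantically distinct points.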
  • any of the above methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • any of the foregoing methods provided by the embodiments of the present application may be executed by a processor; for example, the processor executes any of the foregoing methods by invoking corresponding instructions stored in the memory. This will not be repeated below.
  • the foregoing program may be stored in a computer readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the key point prediction apparatus includes: a detection module 600, configured to detect an image by using a first convolutional neural network and obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained using sample images containing key point annotation information of general objects; and a prediction module 602, configured to use the first convolutional neural network to predict key points of the general object of the image according to the feature information and obtain a key point prediction result of the general object, the key point prediction result including key point position prediction information and key point existence prediction information.
  • the key point prediction result of the general object can be predicted from the image by using the trained first convolutional neural network. Since the first convolutional neural network in this embodiment is trained using sample images containing key point annotation information of general objects, it can predict the key points of objects of multiple categories, which expands the applicable range of the convolutional neural network used for object key point prediction.
  • the key point prediction apparatus includes: a detection module 700, configured to detect an image by using a first convolutional neural network and obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained using sample images containing key point annotation information of general objects; and a prediction module 702, configured to use the first convolutional neural network to predict key points of the general object of the image according to the feature information and obtain a key point prediction result of the general object, the key point prediction result including key point position prediction information and key point existence prediction information.
  • the first convolutional neural network may include a feature extraction layer, a first key point prediction convolution layer and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are respectively connected to the feature extraction layer. The feature extraction layer is used to extract the feature information of the image; the first key point prediction convolution layer is used to convolve the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to convolve the feature information to obtain the key point existence prediction information.
  • the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
  • the first convolutional neural network comprises a full convolutional neural network.
  • the key point prediction apparatus further includes a training module 704, configured to train the first convolutional neural network.
  • the training module 704 includes: an acquisition sub-module 7040, configured to acquire sample images containing key point annotation information of general objects, where the key point annotation information includes key point position annotation information and key point existence annotation information;
  • a training sub-module 7042, configured to train the first convolutional neural network using the sample images and obtain the key point position prediction information and key point existence prediction information of the general object of the first convolutional neural network for the sample images;
  • a supervision sub-module 7044, configured to supervise the key point position prediction information and the key point existence prediction information according to objective functions;
  • a determining sub-module 7046, configured to determine whether the iterative loss rate of the first convolutional neural network satisfies the set condition; and an execution sub-module 7048, configured to complete the training of the first convolutional neural network if the iterative loss rate satisfies the set condition.
  • the execution sub-module 7048 is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and the key point existence prediction information obtained by the training sub-module 7042, until the iterative loss rate satisfies the set condition.
  • the supervision sub-module 7044 is configured to supervise the key point position prediction information according to the regression objective function and, at the same time, supervise the key point existence prediction information according to the classification objective function.
  • the key point prediction result of the general object can be predicted from the image by using the trained first convolutional neural network. Since the first convolutional neural network in this embodiment is trained using sample images containing key point annotation information of general objects, it can predict the key points of objects of multiple categories, which expands the applicable range of the convolutional neural network used for object key point prediction.
  • the key point prediction result in this embodiment includes key point position prediction information and key point existence prediction information, where the key point position prediction information is the position information of the key points to be predicted in the image, and the key point existence prediction information indicates whether the key points to be predicted exist in the image.
  • the key point position prediction information and the key point existence prediction information together determine the key points of the general object in the image.
  • since the key point prediction result in this embodiment includes not only the key point position prediction information but also the key point existence prediction information, it adds a prediction of whether each key point exists, which improves the accuracy of key point prediction.
  • the first convolutional neural network in this embodiment includes a first key point prediction convolution layer and a second key point prediction convolution layer, each connected to the feature extraction layer. After the feature extraction layer extracts the feature information of the image, the first and second key point prediction convolution layers can perform convolution operations on the feature information in parallel; that is, the two layers are in a parallel relationship, so the key point position prediction information and the key point existence prediction information are predicted simultaneously.
  • the key point position prediction information includes [x1, y1, x2, y2, ..., xN, yN], where x1, x2, ..., xN represent the abscissa information of the key points in the sample image and y1, y2, ..., yN represent the ordinate information of the key points in the sample image.
  • the key point existence prediction information includes [s1, s2, ..., sN], where s1, s2, ..., sN represent the existence information of the key points in the sample image. This embodiment improves the efficiency with which the first convolutional neural network predicts key points.
  • the first key point prediction convolution layer in this embodiment performs a convolution operation on the feature information to obtain the key point position prediction information; since the position prediction information includes both abscissa and ordinate information, its convolution kernel is 1*1*2N.
  • the second key point prediction convolution layer convolves the feature information to obtain the key point existence prediction information; since each key point either exists or does not exist, its convolution kernel is 1*1*N.
  • the first convolutional neural network in this embodiment may be a full convolutional neural network. Since a full convolutional neural network operates faster than a non-full convolutional neural network, predicting key points with the first convolutional neural network is faster than predicting key points with a non-full convolutional neural network.
  • FIG. 8 shows a block diagram of a key point prediction network training apparatus according to an embodiment of the present application.
  • the key point prediction network training apparatus includes: an obtaining module 800, configured to acquire sample images containing key point labeling information of general objects, where the key point labeling information includes key point position labeling information and key point existence labeling information;
  • a training module 802, configured to train the first convolutional neural network using the sample images and obtain key point position prediction information and key point existence prediction information of the general object of the first convolutional neural network for the sample images;
  • a supervision module 804, configured to supervise the key point position prediction information and the key point existence prediction information according to objective functions;
  • a determining module 806, configured to determine whether the iterative loss rate of the first convolutional neural network satisfies the set condition; and an execution module 808, configured to complete the training of the first convolutional neural network if the iterative loss rate satisfies the set condition.
  • the execution module 808 is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and the key point existence prediction information obtained by the training module 802, until the iterative loss rate satisfies the set condition.
  • the supervision module 804 is configured to supervise the key point position prediction information according to the regression objective function and to supervise the key point existence prediction information according to the classification objective function.
  • the first convolutional neural network may include a feature extraction layer, a first key point prediction convolution layer and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are respectively connected to the feature extraction layer. The feature extraction layer is used to extract the feature information of the sample image; the first key point prediction convolution layer is used to convolve the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to convolve the feature information to obtain the key point existence prediction information.
  • the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
  • the first convolutional neural network comprises a full convolutional neural network.
  • the key point prediction network training apparatus of the present embodiment is used to implement the corresponding key point prediction network training method in the foregoing various embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • the image processing apparatus includes: a detection module 900, configured to detect an image by using the key point prediction apparatus according to any of the above embodiments of the present application, or by using the first convolutional neural network trained by the key point prediction network training apparatus according to any of the above embodiments, to obtain a key point prediction result of the general object of the image, where the key point prediction result includes key point position prediction information and key point existence prediction information; and a processing module 902, configured to process the image according to the key point prediction result of the general object.
  • the processing module 902 includes: a location determining sub-module 9020, configured to determine a location of the universal object in the image according to a keypoint prediction result of the universal object.
  • the processing module 902 includes: a feature extraction sub-module 9021, configured to extract an object feature of the universal object in the image according to the key point prediction result of the universal object.
  • the processing module 902 includes: a posture estimation sub-module 9022, configured to estimate a posture of the universal object in the image according to the key point prediction result of the universal object.
  • the processing module 902 includes: an object tracking sub-module 9023, configured to track a general object in the image according to a key point prediction result of the universal object.
  • the processing module 902 includes an object recognition sub-module 9024 for identifying a general object in the image according to a key point prediction result of the universal object.
  • the processing module 902 includes: an object rendering sub-module 9025, configured to render a general object in the image according to a key point prediction result of the universal object.
  • the image processing apparatus of the present embodiment is used to implement the corresponding image processing method in the foregoing various embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • an embodiment of the present application further provides an electronic device, including: a processor and a memory;
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the object key point prediction method according to any one of the foregoing embodiments of the present application; or
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the object keypoint prediction network training method described in any one of the foregoing embodiments of the present application; or
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the image processing method described in any of the above embodiments of the present application.
  • the embodiment of the present application further provides another electronic device, including:
  • the processor and the key point prediction network training apparatus according to any of the above embodiments of the present application; when the processor runs the key point prediction network training apparatus, the units in the key point prediction network training apparatus according to any of the above embodiments are executed; or
  • the processor and the image processing apparatus according to any of the above embodiments of the present application; when the processor runs the image processing apparatus, the units in the image processing apparatus according to any of the above embodiments are executed.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • a schematic structural diagram of an electronic device 1000 suitable for implementing a terminal device or a server of an embodiment of the present application is shown.
  • the electronic device 1000 includes one or more processors and communication components.
  • processors, such as one or more central processing units (CPUs) 1001 and/or one or more graphics processors (GPUs) 1013; the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1002 or executable instructions loaded from the storage portion 1008 into a random access memory (RAM) 1003.
  • the communication component includes a communication component 1012 and/or a communication interface 1009.
  • the communication component 1012 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 1009 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
  • the processor can communicate with the read-only memory 1002 and/or the random access memory 1003 to execute executable instructions, connect to the communication component 1012 via the communication bus 1004, and communicate with other target devices via the communication component 1012, thereby completing the operations corresponding to any of the key point prediction methods provided by the embodiments of the present application, for example: detecting the image by using the first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and predicting key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object, where the key point prediction result includes key point position prediction information and key point existence prediction information.
  • the sample image of the key point labeling information of the universal object is obtained, wherein the key point labeling information includes key point position labeling information and key point presence labeling information; and the first convolutional neural network is trained by using the sample image, Obtaining key point position prediction information and key point presence prediction information of the universal object of the first convolutional neural network for the sample image; performing the key point position prediction information and the key point presence prediction information according to an objective function Supervising, judging whether the iterative loss rate of the first convolutional neural network satisfies a set condition; if satisfied, completing training of the first convolutional neural network.
  • alternatively, the image is detected by using the key point prediction method according to any of the above embodiments, or by using the first convolutional neural network trained by the key point prediction network training method according to any of the above embodiments.
  • in the RAM 1003, various programs and data required for the operation of the device can be stored.
  • the CPU 1001 or the GPU 1013, the ROM 1002, and the RAM 1003 are connected to each other through the communication bus 1004.
  • ROM 1002 is an optional module.
  • the RAM 1003 stores executable instructions, or writes executable instructions to the ROM 1002 at runtime, the executable instructions causing the processor to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 1005 is also coupled to communication bus 1004.
  • the communication component 1012 may be integrated, or may be configured to have multiple sub-modules (e.g., multiple IB network cards) linked to the communication bus.
  • the following components are connected to the I/O interface 1005: an input portion 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a speaker; a storage portion 1008 including a hard disk and the like; and a communication interface 1009 including a network interface card such as a LAN card or a modem.
  • Driver 1010 is also coupled to I/O interface 1005 as needed.
  • a removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 1010 as needed so that a computer program read therefrom is installed into the storage portion 1008 as needed.
  • FIG. 10 is only an optional implementation manner; during practice, the number and types of the components in FIG. 10 may be selected, deleted, added, or replaced according to actual needs; different functional components may also be implemented separately or in an integrated manner, for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication component may be arranged separately, or may be integrated on the CPU or the GPU; and so on.
  • These alternative embodiments are all within the scope of the present application.
  • embodiments of the present application may be implemented as a computer software program.
  • embodiments of the present application include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application.
  • for example, the program code may include instructions corresponding to the following steps provided in the embodiments of the present application: detecting the image by using the first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and predicting key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
  • the program code may include instructions corresponding to the following steps provided in the embodiments of the present application: acquiring sample images containing key point annotation information of general objects, where the key point annotation information includes key point position annotation information and key point existence annotation information; training the first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images; supervising the key point position prediction information and the key point existence prediction information according to objective functions, and judging whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and if so, completing the training of the first convolutional neural network.
  • the program code may include instructions corresponding to the following steps provided by the embodiments of the present application: detecting an image by using the key point prediction method according to any of the above embodiments, or by using the first convolutional neural network trained by the key point prediction network training method according to any of the above embodiments, to obtain a key point prediction result of a general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
  • the computer program can be downloaded and installed from the network via a communication component, and/or installed from the removable medium 1011.
  • the embodiments of the present application further provide a computer program, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps in the object key point prediction method according to any of the embodiments of the present application; or
  • the processor in the device executes instructions for implementing the steps in the object key point prediction network training method according to any of the embodiments of the present application; or
  • the processor in the device executes instructions for implementing the steps in the image processing method according to any of the embodiments of the present application.
  • the embodiments of the present application further provide a computer-readable storage medium, configured to store computer-readable instructions, where the instructions, when executed, implement the operations of the steps in the object key point prediction method, the object key point prediction network training method, or the image processing method according to any of the embodiments of the present application.
  • the methods, apparatuses, and devices of the present application may be implemented in many ways.
  • the method, apparatus, and apparatus of the embodiments of the present application can be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps of the methods is for illustrative purposes only, and the steps of the methods of the embodiments of the present application are not limited to the order described above unless otherwise specified.
  • the present application may also be embodied as a program recorded in a recording medium, the programs including machine readable instructions for implementing a method in accordance with embodiments of the present application.
  • the present application also covers a recording medium storing a program for executing the method according to an embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Key point prediction, network training and image processing methods and apparatuses, and an electronic device, where the key point prediction method includes: detecting an image by using a first convolutional neural network to obtain feature information of the image (S100), where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and predicting key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image (S102), where the key point prediction result includes key point position prediction information and key point existence prediction information.

Description

Key point prediction, network training and image processing methods and apparatuses, and electronic device
This application claims priority to Chinese Patent Application No. CN201611261431.9, filed with the Chinese Patent Office on December 30, 2016 and entitled "Key point prediction, network training, image processing methods, apparatuses and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of artificial intelligence technologies, and in particular, to key point prediction, network training and image processing methods and apparatuses, and an electronic device.
Background
Key point prediction for general objects refers to predicting the key points of general objects in natural scenes (such as human bodies, vehicles, animals and plants, furniture, and other objects), for example, the positions of a person's head, hands, and torso, or the positions of a vehicle's front window, tires, chassis, and trunk. The key points of general objects can be used to enhance the effects of applications such as general object detection and scene segmentation.
Summary
Embodiments of the present application provide technical solutions for key point prediction, network training, and image processing.
According to an aspect of the embodiments of the present application, a key point prediction method is provided, including: detecting an image by using a first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and predicting key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first key point prediction convolution layer and the second key point prediction convolution layer are respectively connected to the feature extraction layer; the feature extraction layer is configured to extract the feature information of the image; the first key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point existence prediction information.
Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
Optionally, the first convolutional neural network includes a fully convolutional neural network.
Optionally, the training of the first convolutional neural network includes: acquiring the sample images, where the key point annotation information includes key point position annotation information and key point existence annotation information; training the first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images; supervising the key point position prediction information and the key point existence prediction information according to objective functions, and judging whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and if so, completing the training of the first convolutional neural network.
Optionally, the training of the first convolutional neural network further includes: if the set condition is not satisfied, adjusting the parameters of the first convolutional neural network according to the obtained key point position prediction information and key point existence prediction information, until the iterative loss rate satisfies the set condition.
Optionally, the supervising the key point position prediction information and the key point existence prediction information according to objective functions includes: supervising the key point position prediction information according to a regression objective function, and supervising the key point existence prediction information according to a classification objective function.
According to another aspect of the embodiments of the present application, a key point prediction network training method is provided, including: acquiring sample images containing key point annotation information of general objects, where the key point annotation information includes key point position annotation information and key point existence annotation information; training a first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images; supervising the key point position prediction information and the key point existence prediction information according to objective functions, and judging whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and if so, completing the training of the first convolutional neural network.
Optionally, the method further includes: if the set condition is not satisfied, adjusting the parameters of the first convolutional neural network according to the obtained key point position prediction information and key point existence prediction information, until the iterative loss rate satisfies the set condition.
Optionally, the supervising the key point position prediction information and the key point existence prediction information according to objective functions includes: supervising the key point position prediction information according to a regression objective function, and supervising the key point existence prediction information according to a classification objective function.
Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first key point prediction convolution layer and the second key point prediction convolution layer are respectively connected to the feature extraction layer; the feature extraction layer is configured to extract the feature information of the sample images; the first key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point existence prediction information.
Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
Optionally, the first convolutional neural network includes a fully convolutional neural network.
According to yet another aspect of the embodiments of the present application, an image processing method is provided, including: detecting an image by using the key point prediction method according to any of the above embodiments, or by using a first convolutional neural network trained by the key point prediction network training method according to any of the above embodiments, to obtain a key point prediction result of a general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information; and processing the image according to the key point prediction result of the general object.
Optionally, the processing the image according to the key point prediction result of the general object includes: determining the position of the general object in the image according to the key point prediction result of the general object.
Optionally, the processing the image according to the key point prediction result of the general object includes: extracting the object features of the general object in the image according to the key point prediction result of the general object.
Optionally, the processing the image according to the key point prediction result of the general object includes: estimating the posture of the general object in the image according to the key point prediction result of the general object.
Optionally, the processing the image according to the key point prediction result of the general object includes: tracking the general object in the image according to the key point prediction result of the general object.
Optionally, the processing the image according to the key point prediction result of the general object includes: identifying the general object in the image according to the key point prediction result of the general object.
Optionally, the processing the image according to the key point prediction result of the general object includes: rendering the general object in the image according to the key point prediction result of the general object.
According to still another aspect of the embodiments of the present application, a key point prediction apparatus is provided, including: a detection module, configured to detect an image by using a first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and a prediction module, configured to predict key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first key point prediction convolution layer and the second key point prediction convolution layer are respectively connected to the feature extraction layer; the feature extraction layer is configured to extract the feature information of the image; the first key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point existence prediction information.
Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
Optionally, the first convolutional neural network includes a fully convolutional neural network.
Optionally, the apparatus further includes a training module configured to train the first convolutional neural network, where the training module includes: an acquisition sub-module, configured to acquire the sample images, where the key point annotation information includes key point position annotation information and key point existence annotation information; a training sub-module, configured to train the first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images; a supervision sub-module, configured to supervise the key point position prediction information and the key point existence prediction information according to objective functions; a judgment sub-module, configured to judge whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and an execution sub-module, configured to complete the training of the first convolutional neural network if the iterative loss rate of the first convolutional neural network satisfies the set condition.
Optionally, the execution sub-module is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and key point existence prediction information obtained by the training sub-module, until the iterative loss rate satisfies the set condition.
Optionally, the supervision sub-module is configured to supervise the key point position prediction information according to a regression objective function, and supervise the key point existence prediction information according to a classification objective function.
According to still another aspect of the embodiments of the present application, a key point prediction network training apparatus is provided, including: an acquisition module, configured to acquire sample images containing key point annotation information of general objects, where the key point annotation information includes key point position annotation information and key point existence annotation information; a training module, configured to train a first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images; a supervision module, configured to supervise the key point position prediction information and the key point existence prediction information according to objective functions; a judgment module, configured to judge whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and an execution module, configured to complete the training of the first convolutional neural network if the iterative loss rate of the first convolutional neural network satisfies the set condition.
Optionally, the execution module is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and key point existence prediction information obtained by the training module, until the iterative loss rate satisfies the set condition.
Optionally, the supervision module is configured to supervise the key point position prediction information according to a regression objective function, and supervise the key point existence prediction information according to a classification objective function.
Optionally, the first convolutional neural network includes a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first key point prediction convolution layer and the second key point prediction convolution layer are respectively connected to the feature extraction layer; the feature extraction layer is configured to extract the feature information of the sample images; the first key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point existence prediction information.
Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
Optionally, the first convolutional neural network includes a fully convolutional neural network.
According to still another aspect of the embodiments of the present application, an image processing apparatus is provided, including: a detection module, configured to detect an image by using the key point prediction apparatus according to any of the above embodiments of the present application, or by using a first convolutional neural network trained by the key point prediction network training apparatus according to any of the above embodiments of the present application, to obtain a key point prediction result of a general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information; and a processing module, configured to process the image according to the key point prediction result of the general object.
Optionally, the processing module includes a position determination sub-module, configured to determine the position of the general object in the image according to the key point prediction result of the general object.
Optionally, the processing module includes a feature extraction sub-module, configured to extract the object features of the general object in the image according to the key point prediction result of the general object.
Optionally, the processing module includes a posture estimation sub-module, configured to estimate the posture of the general object in the image according to the key point prediction result of the general object.
Optionally, the processing module includes an object tracking sub-module, configured to track the general object in the image according to the key point prediction result of the general object.
Optionally, the processing module includes an object recognition sub-module, configured to identify the general object in the image according to the key point prediction result of the general object.
Optionally, the processing module includes an object rendering sub-module, configured to render the general object in the image according to the key point prediction result of the general object.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including a processor and a memory, where the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the object key point prediction method according to any of the above embodiments of the present application; or the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the object key point prediction network training method according to any of the above embodiments of the present application; or the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the image processing method according to any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, another electronic device is provided, including:
a processor and the key point prediction apparatus according to any of the above embodiments of the present application, where when the processor runs the key point prediction apparatus, the units in the key point prediction apparatus according to any of the above embodiments are executed; or
a processor and the key point prediction network training apparatus according to any of the above embodiments of the present application, where when the processor runs the key point prediction network training apparatus, the units in the key point prediction network training apparatus according to any of the above embodiments are executed; or
a processor and the image processing apparatus according to any of the above embodiments of the present application, where when the processor runs the image processing apparatus, the units in the image processing apparatus according to any of the above embodiments are executed.
According to still another aspect of the embodiments of the present application, a computer program is provided, including computer-readable code, where when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps in the object key point prediction method according to any of the above embodiments of the present application; or
when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the steps in the object key point prediction network training method according to any of the above embodiments of the present application; or
when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the steps in the image processing method according to any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer-readable storage medium is provided, configured to store computer-readable instructions, where the instructions, when executed, implement the operations of the steps in the object key point prediction method according to any of the above embodiments of the present application, the operations of the steps in the object key point prediction network training method according to any of the above embodiments, or the operations of the steps in the image processing method according to any of the above embodiments.
According to the technical solutions provided by the embodiments of the present application, a first convolutional neural network is trained by using sample images containing key point annotation information of general objects, and the trained first convolutional neural network is used to predict the key points of general objects in an image, where a general object may be understood as an object in a natural scene, such as a human body, a vehicle, an animal, a plant, or furniture, and key points are, for example, the positions of a person's head, hands, and torso, or the positions of a vehicle's front window, tires, chassis, and trunk; the first convolutional neural network thus expands the range of object categories for which key points can be predicted. In addition, by predicting the key points of a general object in an image with the first convolutional neural network, the key point position prediction information and key point existence prediction information of the general object can be obtained directly, where the key point position prediction information is the position information of a key point to be predicted in the image, and the key point existence prediction information is the information on whether the key point to be predicted exists in the image; when the position information of a key point to be predicted in the image is obtained and the key point is determined to exist in the image, the key point can be predicted, thereby comprehensively judging the key points of the general object in the image by combining the key point position prediction information and the key point existence prediction information of the general object.
The technical solutions of the present application are further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present application and, together with the description, serve to explain the principles of the present application.
With reference to the accompanying drawings, the present application can be understood more clearly from the following detailed description, in which:
FIG. 1 is a flowchart of a key point prediction method according to an embodiment of the present application;
FIG. 2 is a flowchart of a key point prediction method according to another embodiment of the present application;
FIG. 3 is a flowchart of training a first convolutional neural network in a key point prediction method according to another embodiment of the present application;
FIG. 4 is a schematic diagram of the training principle of a first convolutional neural network according to an embodiment of the present application;
FIG. 5 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 6 is a structural block diagram of a key point prediction apparatus according to an embodiment of the present application;
FIG. 7 is a structural block diagram of a key point prediction apparatus according to another embodiment of the present application;
FIG. 8 is a structural block diagram of a key point prediction network training apparatus according to an embodiment of the present application;
FIG. 9 is a structural block diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
Meanwhile, it should be understood that, for ease of description, the sizes of the parts shown in the accompanying drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present application or its application or use.
Technologies, methods, and devices known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be regarded as a part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
The embodiments of the present application may be applied to electronic devices such as terminal devices, computer systems, and servers, which may operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, large computer systems, and distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Optional implementations of the embodiments of the present application are further described in detail below with reference to the accompanying drawings (in which the same reference numerals denote the same elements) and the embodiments. The following embodiments are used to illustrate the present application, but are not used to limit the scope of the present application.
Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present application are only used to distinguish different steps, devices, modules, or the like, and represent neither any specific technical meaning nor a necessary logical order between them.
Referring to FIG. 1, a flowchart of a key point prediction method according to an embodiment of the present application is shown. To facilitate understanding of the key point prediction solution provided by the embodiments of the present application, this embodiment takes predicting the key points of a general object in an image as an example scenario, and takes a mobile terminal or a PC as the executor of the key point prediction method for description. However, it should be clear to those skilled in the art that other application scenarios, and other devices with data collection, processing, and transmission functions, can implement the key point prediction solution provided by the embodiments of the present application with reference to this embodiment; the embodiments of the present application do not limit the implementation scenarios. In the embodiments of the present application, a general object refers to any object existing in a natural scene, such as a human body, a vehicle, an animal, a plant, or furniture. A key point of a general object refers to a local position that most objects in the object category to which the general object belongs have and that can be used to distinguish object categories, for example, the positions of a person's head, hands, and torso, or the positions of a vehicle's front window, tires, chassis, and trunk. The key point prediction method of this embodiment includes:
Step S100: Detect an image by using a first convolutional neural network to obtain feature information of the image.
In this embodiment, the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects, and is used to predict the key point information of general objects in images. In the embodiments of the present application, a convolutional neural network is a neural network with convolution processing capability, and may include convolution layers, or convolution layers and non-convolution layers.
The image may be an image from an image acquisition device, composed of frame-by-frame images, or may be a single frame of image or a single picture, and may also come from other devices; the image includes a static image or an image in a video.
The image may be input into the first convolutional neural network to obtain the feature information of the image, where the feature information includes the feature information of the general object.
In an optional example, step S100 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a detection module 600 run by the processor.
Step S102: Predict key points of the general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image.
In the embodiments of the present application, the first convolutional neural network may include, but is not limited to, an input layer, a feature extraction layer, and key point prediction convolution layers, where the input layer is used to input the image, the feature extraction layer is used to extract the feature information of the image, and the key point prediction convolution layers are used to perform convolution operations on the feature information to obtain the key point prediction result, which includes key point position prediction information and key point existence prediction information.
In an optional example, step S102 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a prediction module 602 run by the processor.
According to the key point prediction method provided by this embodiment, the trained first convolutional neural network can predict the key point prediction result of a general object from an image. The first convolutional neural network in this embodiment is trained by using sample images containing key point annotation information of general objects, and can predict the key points of objects of multiple categories, expanding the application range of convolutional neural networks for object key point prediction.
Referring to FIG. 2, a flowchart of the steps of a key point prediction method according to another embodiment of the present application is shown. This embodiment still takes a mobile terminal or a PC as an example to describe the key point prediction method; other devices and scenarios may be implemented with reference to this embodiment. This embodiment emphasizes the differences from the above embodiment; for the similarities, reference may be made to the introduction and description in the above embodiment, and details are not repeated here. The key point prediction method of this embodiment includes:
Step S200: Train a first convolutional neural network.
Referring to FIG. 3, a flowchart of training the first convolutional neural network is shown; step S200 may include the following sub-steps:
Sub-step S300: Acquire sample images containing key point annotation information of general objects.
The sample images containing key point annotation information of general objects may be video images from an image acquisition device, composed of frame-by-frame images, or may be a single frame of image or a single picture, and may also come from other devices, with annotation operations then performed on the sample images. The key point annotation information includes key point position annotation information and key point existence annotation information. Whether a key point of a general object exists, as well as the key point position of the general object, may be annotated in the sample image; this embodiment does not limit the source of, or the way of obtaining, the sample images containing key point annotation information of general objects.
In an optional example, sub-step S300 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an acquisition module 800 run by the processor.
Sub-step S302: Train the first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images.
The key point position prediction information may be understood as the position information of the key points of the general object in the sample image, for example, coordinate point information or pixel point information. The key point existence prediction information may be understood as the existence information of the key points of the general object in the sample image, for example, whether a certain key point of a certain general object exists in the sample image; this embodiment does not limit the content of the key point position prediction information and key point existence prediction information of the general object.
In this embodiment, the first convolutional neural network may include an input layer, a feature extraction layer, and key point prediction convolution layers, where the key point prediction convolution layers may include a first key point prediction convolution layer and a second key point prediction convolution layer, which are respectively connected to the feature extraction layer. The input layer is used to input the sample image; the feature extraction layer is used to extract the feature information of the sample image; the first key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information.
In one optional example, the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted and is an integer greater than or equal to 1.
Training the first convolutional neural network means training the parameters of the input layer, the feature extraction layer, the first key point prediction convolution layer, and the second key point prediction convolution layer of the first convolutional neural network, and then constructing the first convolutional neural network according to the trained parameters.
The first convolutional neural network may be trained by using sample images containing key point annotation information of general objects. To make the trained first convolutional neural network more accurate, sample images of various situations may be selected, and the sample images may include sample images annotated with key point annotation information of general objects as well as sample images not so annotated.
The first convolutional neural network in this embodiment may include a fully convolutional neural network, that is, a neural network composed entirely of convolution layers. However, the first convolutional neural network in the embodiments of the present application may be a convolutional neural network of any structure; this embodiment is described only by way of example, and in practical applications the first convolutional neural network is not limited thereto, and may also be, for example, another two-class or multi-class convolutional neural network.
In an optional example, sub-step S302 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a training module 802 run by the processor.
Sub-step S304: Supervise the key point position prediction information and the key point existence prediction information according to objective functions.
In the training process of the first convolutional neural network, the key point position prediction information and the key point existence prediction information are supervised simultaneously according to objective functions; for example, the key point position prediction information is supervised according to a regression objective function, such as a smooth L1 objective function or a Euclidean objective function, and the key point existence prediction information is supervised according to a classification objective function, such as a softmax objective function, a cross entropy objective function, or a hinge objective function.
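As one illustration of the supervision described above, the following plain-Python sketch implements a smooth L1 regression loss and a softmax cross-entropy classification loss. The input values are toy numbers chosen for the example; the exact loss variants and scaling used in a real training pipeline may differ.

```python
import math

def smooth_l1(pred, target):
    """Smooth L1 (Huber-style) loss used to supervise coordinate regression."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy used to supervise the existence classification."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return -math.log(exps[label] / total)

# Toy supervision of one key point: a small and a large coordinate error,
# plus a two-way exists / not-exists classification where "exists" is correct.
reg_loss = smooth_l1(0.4, 0.9) + smooth_l1(2.0, 0.0)   # 0.125 + 1.5
cls_loss = softmax_cross_entropy([2.0, -1.0], label=0)  # small, near-correct
```

The quadratic region of smooth L1 keeps gradients gentle for nearly correct coordinates, while the linear region prevents large annotation errors from dominating, which is why it is a common choice for the regression branch.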
In an optional example, sub-step S304 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a supervision module 804 run by the processor.
Sub-step S306: Judge whether the iterative loss rate of the first convolutional neural network satisfies a set condition.
If so, perform sub-step S308; if not, perform sub-step S310.
Through iterative training of the first convolutional neural network, it is judged whether the iterative loss rate of the first convolutional neural network satisfies a set condition. The set condition may be that, during a predetermined number of training iterations of the first convolutional neural network, the iterative loss rate remains unchanged, or that the change in the iterative loss rate remains within a certain range; this embodiment does not limit the optional content of the set condition.
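One possible "set condition" of the kind just described is that the loss change stays within a tolerance over a window of recent iterations. The sketch below is a hypothetical convergence check along those lines; the window size and tolerance are illustrative values, not parameters from this application.

```python
def loss_rate_converged(recent_losses, window=5, tol=1e-3):
    """Return True when the iterative loss rate has stayed within `tol`
    over the last `window` iterations (one possible 'set condition')."""
    if len(recent_losses) < window:
        return False
    tail = recent_losses[-window:]
    return max(tail) - min(tail) <= tol

# Still decreasing: keep adjusting the network parameters (sub-step S310).
assert not loss_rate_converged([0.9, 0.5, 0.3])
# Flat for the last five iterations: training can stop (sub-step S308).
assert loss_rate_converged([0.9, 0.5, 0.3, 0.1001, 0.1, 0.1, 0.1, 0.1])
```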
In an optional example, sub-step S306 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a judgment module 806 run by the processor.
Sub-step S308: Complete the training of the first convolutional neural network.
In an optional example, sub-step S308 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an execution module 808 run by the processor.
Sub-step S310: Adjust the parameters of the first convolutional neural network according to the obtained key point position prediction information and key point existence prediction information, until the iterative loss rate satisfies the set condition.
If the iterative loss rate of the first convolutional neural network does not satisfy the set condition, it indicates that the obtained key point position prediction information and key point existence prediction information do not correspond to the key point position annotation information and key point existence annotation information marked in the sample image; that is, the parameters of the currently trained first convolutional neural network are not accurate enough and need to be adjusted accordingly. This embodiment does not limit the adjustment process of the parameters of the first convolutional neural network. When the iterative loss rate of the first convolutional neural network after parameter adjustment satisfies the set condition, it is determined that the training of the first convolutional neural network is completed.
In an optional example, sub-step S310 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by the execution module 808 run by the processor.
In an optional example, step S200 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a key point prediction network training apparatus run by the processor.
Step S202: Detect an image by using the first convolutional neural network to obtain feature information of the image.
In an optional example, step S202 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a detection module 700 run by the processor.
Step S204: Predict key points of the general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image.
In an optional example, step S204 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a prediction module 702 run by the processor.
Referring to FIG. 4, a schematic diagram of the training principle of the first convolutional neural network according to an embodiment of the present application is shown. Since a fully convolutional neural network operates faster than a non-fully convolutional neural network, the first convolutional neural network in this embodiment adopts a fully convolutional neural network; however, this embodiment does not limit whether the first convolutional neural network is a fully convolutional or non-fully convolutional neural network. In this embodiment, taking the first convolutional neural network being a fully convolutional neural network as an example, the sample image is input into the fully convolutional neural network, the feature information of the sample image is obtained from the feature extraction layer of the fully convolutional neural network, and the first key point prediction convolution layer then performs a convolution operation on the feature information to obtain the key point position prediction information; even if a key point does not exist, the first key point prediction convolution layer randomly predicts a piece of key point position information for the non-existent key point. In addition, the second key point prediction convolution layer performs a convolution operation on the feature information to obtain the key point existence prediction information. In the training process of the fully convolutional neural network, a smooth L1 objective function is used to supervise the regression task of training the key point position prediction information, and a softmax objective function is used to supervise the classification task of training the key point existence prediction information. Finally, the key points of the general object in the sample image are predicted according to the key point position prediction information and the key point existence prediction information.
According to the key point prediction method provided by this embodiment, the trained first convolutional neural network can predict the key point prediction result of a general object from an image. The first convolutional neural network in this embodiment is trained by using sample images containing key point annotation information of general objects, and can predict the key points of objects of multiple categories, expanding the application range of convolutional neural networks for object key point prediction.
The key point prediction result in this embodiment may include key point position prediction information and key point existence prediction information, where the key point position prediction information is the position information of a key point to be predicted in the image, and the key point existence prediction information is the information on whether the key point to be predicted exists in the image; when the position information of a key point to be predicted in the image is obtained and the key point is determined to exist in the image, the key point can be predicted, thereby comprehensively judging the key points of the general object in the image by combining the key point position prediction information and the key point existence prediction information of the general object.
The key point prediction result in this embodiment includes not only key point position prediction information but also key point existence prediction information; adding the prediction of whether a key point exists improves the accuracy of key point prediction.
The first convolutional neural network in this embodiment may include a first key point prediction convolution layer and a second key point prediction convolution layer, which are respectively connected to the feature extraction layer; after the feature extraction layer extracts the feature information of the image, the first and second key point prediction convolution layers can perform convolution operations on the feature information in parallel, that is, the key point position prediction information and the key point existence prediction information are predicted simultaneously. If the total number of key points to be predicted is N, the key point position prediction information includes [x1, y1, x2, y2, ..., xN, yN], where x1, x2, ..., xN denote the abscissa information of the key points in the sample image and y1, y2, ..., yN denote the ordinate information of the key points in the sample image; the key point existence prediction information includes [s1, s2, ..., sN], where s1, s2, ..., sN denote the existence information of the key points in the sample image. Compared with obtaining the key point position prediction information and the key point existence prediction information in a serial manner, this improves the efficiency of key point prediction by the first convolutional neural network.
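The two output vectors described above can be combined into final key points by keeping a predicted (x, y) only when its existence score passes a threshold. The following is a minimal sketch of that decoding step; the 0.5 threshold and the raw-score convention are assumptions made for illustration.

```python
def decode_keypoints(positions, existence, threshold=0.5):
    """Combine the two prediction heads into final key points.

    positions: [x1, y1, ..., xN, yN] from the first prediction head.
    existence: [s1, ..., sN] from the second prediction head.
    Returns {index: (x, y)} for the key points judged to exist.
    """
    assert len(positions) == 2 * len(existence)
    result = {}
    for i, score in enumerate(existence):
        if score >= threshold:  # discard randomly placed non-existent points
            result[i] = (positions[2 * i], positions[2 * i + 1])
    return result

kps = decode_keypoints([10, 20, 30, 40, 50, 60], [0.9, 0.2, 0.7])
# key points 0 and 2 are kept; key point 1 is judged absent and discarded
```

This is why the existence branch matters: the position branch always emits a coordinate, even for key points that are not in the image, and the existence scores filter those out.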
The first key point prediction convolution layer in this embodiment is used to perform a convolution operation on the feature information to obtain the key point position prediction information; since the key point position prediction information includes abscissa information and ordinate information, the convolution kernel of the first key point prediction convolution layer is 1*1*2N. The second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information; since the key point existence prediction information indicates that a key point either exists or does not exist, the convolution kernel of the second key point prediction convolution layer is 1*1*N.
The first convolutional neural network in this embodiment may be a fully convolutional neural network; since a fully convolutional neural network operates faster than a non-fully convolutional neural network, predicting key points with the first convolutional neural network is faster than predicting key points with a non-fully convolutional neural network.
Referring to FIG. 5, a flowchart of the steps of an image processing method according to an embodiment of the present application is shown. The image processing method of this embodiment may be performed by any device with data collection, processing, and transmission functions, including but not limited to mobile terminals and PCs. The image processing method of this embodiment includes:
Step S500: Perform key point prediction on an image to obtain a key point prediction result of a general object in the image.
In this embodiment, the key point prediction on the image may be performed by the first convolutional neural network trained in the above embodiments, or by the key point prediction method in the above embodiments; for the prediction process, reference may be made to the relevant introduction and description in the above embodiments, and details are not repeated here.
In an optional example, step S500 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a detection module 900 run by the processor.
Step S502: Process the image according to the key point prediction result of the general object.
In an optional example, step S502 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a processing module 902 run by the processor.
In this embodiment, the image may be processed in various ways according to the key point prediction result of the general object, for example: determining the position of the general object in the image according to the key point prediction result; extracting the object features of the general object in the image; estimating the posture of the general object in the image; tracking the general object in the image; identifying the general object in the image; rendering the general object in the image; and so on.
This embodiment only takes determining the position of the general object in the image according to the key point prediction result as an example; other ways of processing the image according to the key point prediction result of the general object may be performed with reference to commonly used processing methods, and this embodiment does not limit the technical means used for processing the image according to the key point prediction result of the general object.
For example, when the key point position prediction information and key point existence prediction information of a general object are predicted from an image, such as the position information and existence information of a cat's head, torso, limb joints, and tail, the position, orientation, posture, and the like of the cat can be determined according to the above key point information of the cat.
Any of the methods provided by the embodiments of the present application may be performed by any appropriate device with data processing capability, including but not limited to terminal devices and servers; alternatively, any of the methods provided by the embodiments of the present application may be performed by a processor, for example, the processor performs any of the methods provided by the embodiments of the present application by invoking corresponding instructions stored in a memory. Details are not repeated below.
Those of ordinary skill in the art can understand that all or some of the steps for implementing the above method embodiments may be completed by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium, and when the program is executed, the steps including the above method embodiments are performed; the aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Referring to FIG. 6, a structural block diagram of a key point prediction apparatus according to an embodiment of the present application is shown. The key point prediction apparatus provided by this embodiment includes: a detection module 600, configured to detect an image by using a first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and a prediction module 602, configured to predict key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
According to the key point prediction apparatus provided by this embodiment, the trained first convolutional neural network can predict the key point prediction result of a general object from an image; the first convolutional neural network in this embodiment is trained by using sample images containing key point annotation information of general objects, and can predict the key points of objects of multiple categories, expanding the application range of convolutional neural networks for object key point prediction.
Referring to FIG. 7, a structural block diagram of a key point prediction apparatus according to another embodiment of the present application is shown. The key point prediction apparatus provided by this embodiment includes: a detection module 700, configured to detect an image by using a first convolutional neural network to obtain feature information of the image, where the first convolutional neural network is a convolutional neural network trained by using sample images containing key point annotation information of general objects; and a prediction module 702, configured to predict key points of a general object in the image according to the feature information by using the first convolutional neural network, to obtain a key point prediction result of the general object in the image, where the key point prediction result includes key point position prediction information and key point existence prediction information.
Optionally, the first convolutional neural network may include a feature extraction layer, a first key point prediction convolution layer, and a second key point prediction convolution layer, where the first and second key point prediction convolution layers are respectively connected to the feature extraction layer; the feature extraction layer is configured to extract the feature information of the image; the first key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point position prediction information; and the second key point prediction convolution layer is configured to perform a convolution operation on the feature information to obtain the key point existence prediction information.
Optionally, the convolution kernel of the first key point prediction convolution layer is 1*1*2N, and the convolution kernel of the second key point prediction convolution layer is 1*1*N, where N is the total number of key points to be predicted.
Optionally, the first convolutional neural network includes a fully convolutional neural network.
Optionally, the key point prediction apparatus provided by this embodiment further includes a training module 704 configured to train the first convolutional neural network, where the training module 704 includes: an acquisition sub-module 7040, configured to acquire the sample images, where the key point annotation information includes key point position annotation information and key point existence annotation information; a training sub-module 7042, configured to train the first convolutional neural network by using the sample images, to obtain key point position prediction information and key point existence prediction information of the first convolutional neural network for a general object in the sample images; a supervision sub-module 7044, configured to supervise the key point position prediction information and the key point existence prediction information according to objective functions; a judgment sub-module 7046, configured to judge whether the iterative loss rate of the first convolutional neural network satisfies a set condition; and an execution sub-module 7048, configured to complete the training of the first convolutional neural network if the iterative loss rate satisfies the set condition.
Optionally, the execution sub-module 7048 is further configured to: if the iterative loss rate of the first convolutional neural network does not satisfy the set condition, adjust the parameters of the first convolutional neural network according to the key point position prediction information and key point existence prediction information obtained by the training sub-module 7042, until the iterative loss rate satisfies the set condition.
Optionally, the supervision sub-module 7044 is configured to supervise the key point position prediction information according to a regression objective function, and supervise the key point existence prediction information according to a classification objective function.
According to the key point prediction apparatus provided by this embodiment, the trained first convolutional neural network can predict the key point prediction result of a general object from an image; the first convolutional neural network in this embodiment is trained by using sample images containing key point annotation information of general objects, and can predict the key points of objects of multiple categories, expanding the application range of convolutional neural networks for object key point prediction.
The key point prediction result in this embodiment includes key point position prediction information and key point existence prediction information, where the key point position prediction information is the position information of a key point to be predicted in the image, and the key point existence prediction information is the information on whether the key point to be predicted exists in the image; when the position information of a key point to be predicted in the image is obtained and the key point is determined to exist in the image, the key point can be predicted, thereby comprehensively judging the key points of the general object in the image by combining the key point position prediction information and the key point existence prediction information of the general object.
The key point prediction result in this embodiment includes not only key point position prediction information but also key point existence prediction information; adding the prediction of whether a key point exists improves the accuracy of key point prediction.
The first convolutional neural network in this embodiment includes a first key point prediction convolution layer and a second key point prediction convolution layer, which are respectively connected to the feature extraction layer; after the feature extraction layer extracts the feature information of the image, the first and second key point prediction convolution layers can perform convolution operations on the feature information in parallel, that is, the key point position prediction information and the key point existence prediction information are predicted simultaneously. If the total number of key points to be predicted is N, the key point position prediction information includes [x1, y1, x2, y2, ..., xN, yN], where x1, x2, ..., xN denote the abscissa information of the key points in the sample image and y1, y2, ..., yN denote the ordinate information of the key points in the sample image; the key point existence prediction information includes [s1, s2, ..., sN], where s1, s2, ..., sN denote the existence information of the key points in the sample image. This embodiment improves the efficiency of key point prediction by the first convolutional neural network.
The first key point prediction convolution layer in this embodiment is used to perform a convolution operation on the feature information to obtain the key point position prediction information; since the key point position prediction information includes abscissa information and ordinate information, the convolution kernel of the first key point prediction convolution layer is 1*1*2N. The second key point prediction convolution layer is used to perform a convolution operation on the feature information to obtain the key point existence prediction information; since the key point existence prediction information indicates that a key point either exists or does not exist, the convolution kernel of the second key point prediction convolution layer is 1*1*N.
The first convolutional neural network in this embodiment may be a fully convolutional neural network; since a fully convolutional neural network operates faster than a non-fully convolutional neural network, predicting key points with the first convolutional neural network is faster than predicting key points with a non-fully convolutional neural network.
参照图8,示出了根据本申请一实施例的关键点预测网络训练装置的结构框图。
本实施例提供的关键点预测网络训练装置包括:获取模块800,用于获取含有通用物体的关键点标注信息的样本图像,其中,关键点标注信息包括关键点位置标注信息和关键点存在标注信息;训练模块802,用于使用样本图像训练第一卷积神经网络,获得第一卷积神经网络针对样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;监督模块804,用于根据目标函数对关键点位置预测信息和关键点存在预测信息进行监督;判断模块806,用于判断第一卷积神经网络的迭代损失率是否满足设定条件;执行模块808,用于若第一卷积神经网络的迭代损失率满足设定条件,则完成对第一卷积神经网络的训练。
可选地,执行模块808,还用于若第一卷积神经网络的迭代损失率不满足设定条件,则根据训练模块802获得的关键点位置预测信息和关键点存在预测信息调整第一卷积神经网络的参数,直至迭代损失率满足设定条件。
可选地,监督模块804,用于根据回归目标函数对关键点位置预测信息进行监督,并且根据分类目标函数对关键点存在预测信息进行监督。
可选地,第一卷积神经网络可以包括特征提取层、第一关键点预测卷积层和第二关键点预测卷积层,第一关键点预测卷积层和第二关键点预测卷积层分别与特征提取层连接;其中,特征提取层用于提取样本图像的特征信息;第一关键点预测卷积层用于对特征信息进行卷积操作,得到关键点位置预测信息;第二关键点预测卷积层用于对特征信息进行卷积操作,得到关键点存在预测信息。
可选地,第一关键点预测卷积层的卷积核为1*1*2N,第二关键点预测卷积层的卷积核为1*1*N,其中,N为待预测的关键点的总数量。
可选地,第一卷积神经网络包括全卷积神经网络。
本实施例的关键点预测网络训练装置用于实现前述多个实施例中相应的关键点预测网络训练方法,并具有相应的方法实施例的有益效果,在此不再赘述。
参照图9,示出了根据本申请一实施例的图像处理装置的结构框图。本实施例提供的图像处理装置包括:检测模块900,用于采用如本申请上述任一实施例的关键点预测装置检测图像,或者,采用如本申请上述任一实施例的关键点预测网络训练装置训练而得的第一卷积神经网络检测图像,得到图像的通用物体的关键点预测结果,该关键点预测结果包括关键点位置预测信息和关键点存在预测信息;处理模块902,用于根据通用物体的关键点预测结果对图像进行处理。
可选地,处理模块902包括:位置确定子模块9020,用于根据通用物体的关键点预测结果确定图像中的通用物体的位置。
可选地,处理模块902包括:特征提取子模块9021,用于根据通用物体的关键点预测结果提取图像中的通用物体的物体特征。
可选地,处理模块902包括:姿态估计子模块9022,用于根据通用物体的关键点预测结果估计图像中的通用物体的姿态。
可选地,处理模块902包括:物体跟踪子模块9023,用于根据通用物体的关键点预测结果跟踪图像中的通用物体。
可选地,处理模块902包括:物体识别子模块9024,用于根据通用物体的关键点预测结果识别图像中的通用物体。
可选地,处理模块902包括:物体渲染子模块9025,用于根据通用物体的关键点预测结果渲染图像中的通用物体。
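以根据关键点预测结果确定物体位置为例,一种可能的后处理草图如下(假设性实现,非本申请实施例的实际代码):用判定为存在的各关键点坐标的最小外接矩形,粗略给出图像中通用物体的位置。

```python
def object_bbox(keypoints):
    """keypoints: {关键点索引: (x, y)},仅包含判定为存在的关键点。
    返回 (xmin, ymin, xmax, ymax) 作为物体位置的粗略估计。"""
    xs = [p[0] for p in keypoints.values()]
    ys = [p[1] for p in keypoints.values()]
    return (min(xs), min(ys), max(xs), max(ys))

bbox = object_bbox({0: (10, 20), 2: (33, 44), 4: (15, 8)})
print(bbox)  # (10, 8, 33, 44)
```

特征提取、姿态估计、跟踪、识别、渲染等其他子模块同理,都以上述 {索引: 坐标} 形式的关键点预测结果作为输入做进一步处理。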
本实施例的图像处理装置用于实现前述多个实施例中相应的图像处理方法,并具有相应的方法实施例的有益效果,在此不再赘述。
另外,本申请实施例还提供了一种电子设备,包括:处理器和存储器;
所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请上述任一实施例所述的物体关键点预测方法对应的操作;或者,
所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请上述任一实施例所述的物体关键点预测网络训练方法对应的操作;或者,
所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请上述任一实施例所述的图像处理方法对应的操作。
另外,本申请实施例还提供了另一种电子设备,包括:
处理器和本申请上述任一实施例所述的关键点预测装置;在处理器运行所述关键点预测装置时,本申请上述任一实施例所述的关键点预测装置中的单元被运行;或者
处理器和本申请上述任一实施例所述的关键点预测网络训练装置;在处理器运行所述关键点预测网络训练装置时,本申请上述任一实施例所述的关键点预测网络训练装置中的单元被运行;或者
处理器和本申请上述任一实施例所述的图像处理装置;在处理器运行所述图像处理装置时,本申请上述任一实施例所述的图像处理装置中的单元被运行。

本申请实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图10,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备1000的结构示意图:如图10所示,电子设备1000包括一个或多个处理器、通信元件等,一个或多个处理器例如:一个或多个中央处理单元(CPU)1001,和/或一个或多个图像处理器(GPU)1013等,处理器可以根据存储在只读存储器(ROM)1002中的可执行指令或者从存储部分1008加载到随机访问存储器(RAM)1003中的可执行指令而执行各种适当的动作和处理。通信元件包括通信组件1012和/或通信接口1009。其中,通信组件1012可包括但不限于网卡,网卡可包括但不限于IB(Infiniband)网卡,通信接口1009包括诸如LAN卡、调制解调器等的网络接口卡的通信接口,通信接口1009经由诸如因特网的网络执行通信处理。
处理器可与只读存储器1002和/或随机访问存储器1003中通信以执行可执行指令,通过通信总线1004与通信组件1012相连、并经通信组件1012与其他目标设备通信,从而完成本申请实施例提供的任一项关键点预测方法对应的操作,例如,采用第一卷积神经网络检测图像,获得图像的特征信息;第一卷积神经网络为使用含有通用物体的关键点标注信息的样本图像训练得到的卷积神经网络;采用第一卷积神经网络根据特征信息预测图像的通用物体的关键点,获得图像的通用物体的关键点预测结果,关键点预测结果包括关键点位置预测信息和关键点存在预测信息。又例如,获取含有通用物体的关键点标注信息的样本图像,其中,所述关键点标注信息包括关键点位置标注信息和关键点存在标注信息;使用所述样本图像训练第一卷积神经网络,获得所述第一卷积神经网络针对所述样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督,判断所述第一卷积神经网络的迭代损失率是否满足设定条件;若满足,则完成对所述第一卷积神经网络的训练。再例如,采用如上述任一实施例所述的关键点预测方法检测图像,或者,采用如上述任一实施例所述的关键点预测网络训练方法训练而得的第一卷积神经网络检测图像,得到所述图像的通用物体的关键点预测结果,所述关键点预测结果包括关键点位置预测信息和关键点存在预测信息;根据所述通用物体的关键点预测结果对所述图像进行处理。
此外,在RAM 1003中,还可存储有装置操作所需的各种程序和数据。CPU1001或GPU1013、ROM1002以及RAM1003通过通信总线1004彼此相连。在有RAM1003的情况下,ROM1002为可选模块。RAM1003存储可执行指令,或在运行时向ROM1002中写入可执行指令,可执行指令使处理器执行上述通信方法对应的操作。输入/输出(I/O)接口1005也连接至通信总线1004。通信组件1012可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在通信总线链接上。
以下部件连接至I/O接口1005:包括键盘、鼠标等的输入部分1006;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分1007;包括硬盘等的存储部分1008;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信接口1009。驱动器1010也根据需要连接至I/O接口1005。可拆卸介质1011,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1010上,以便于从其上读出的计算机程序根据需要被安装入存储部分1008。
需要说明的,如图10所示的架构仅为一种可选实现方式,在可选实践过程中,可根据实际需要对上述图10的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信元件可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请的保护范围。
特别地,根据本申请实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令。例如,程序代码可包括对应执行本申请实施例提供的如下步骤对应的指令:采用第一卷积神经网络检测图像,获得图像的特征信息;第一卷积神经网络为使用含有通用物体的关键点标注信息的样本图像训练得到的卷积神经网络;采用第一卷积神经网络根据特征信息预测图像的通用物体的关键点,获得图像的通用物体的关键点预测结果,关键点预测结果包括关键点位置预测信息和关键点存在预测信息。又例如,程序代码可包括对应执行本申请实施例提供的如下步骤对应的指令:获取含有通用物体的关键点标注信息的样本图像,其中,所述关键点标注信息包括关键点位置标注信息和关键点存在标注信息;使用所述样本图像训练第一卷积神经网络,获得所述第一卷积神经网络针对所述样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督,判断所述第一卷积神经网络的迭代损失率是否满足设定条件;若满足,则完成对所述第一卷积神经网络的训练。再例如,程序代码可包括对应执行本申请实施例提供的如下步骤对应的指令:采用如上述任一实施例所述的关键点预测方法检测图像,或者,采用如上述任一实施例所述的关键点预测网络训练方法训练而得的第一卷积神经网络检测图像,得到所述图像的通用物体的关键点预测结果,所述关键点预测结果包括关键点位置预测信息和关键点存在预测信息;根据所述通用物体的关键点预测结果对所述图像进行处理。在这样的实施例中,该计算机程序可以通过通信元件从网络上被下载和安装,和/或从可拆卸介质1011被安装。在该计算机程序被处理器执行时,执行本申请实施例的方法中限定的上述功能。

另外,本申请实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请任一实施例所述的物体关键点预测方法中各步骤的指令;或者
当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请任一实施例所述的物体关键点预测网络训练方法中各步骤的指令;或者
当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现如本申请任一实施例所述的图像处理方法中各步骤的指令。
另外,本申请实施例还提供了一种计算机可读存储介质,用于存储计算机可读取的指令,所述指令被执行时实现本申请任一实施例所述的物体关键点预测方法中各步骤的操作、或者本申请任一实施例所述的物体关键点预测网络训练方法中各步骤的操作、或者如本申请任一实施例所述的图像处理方法中各步骤的操作。
可能以许多方式来实现本申请的方法和装置、设备。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请实施例的方法和装置、设备。用于方法的步骤的上述顺序仅是为了进行说明,本申请实施例的方法的步骤不限于以上可选描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请实施例的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请实施例的方法的程序的记录介质。
本申请实施例的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本申请限于所公开的形式,很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用,并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。

Claims (44)

  1. 一种关键点预测方法,其特征在于,包括:
    采用第一卷积神经网络检测图像,获得所述图像的特征信息;所述第一卷积神经网络为使用含有通用物体的关键点标注信息的样本图像训练得到的卷积神经网络;
    采用所述第一卷积神经网络根据所述特征信息预测所述图像的通用物体的关键点,获得所述图像的通用物体的关键点预测结果,所述关键点预测结果包括关键点位置预测信息和关键点存在预测信息。
  2. 根据权利要求1所述的方法,其特征在于,所述第一卷积神经网络包括:特征提取层、第一关键点预测卷积层和第二关键点预测卷积层,所述第一关键点预测卷积层和所述第二关键点预测卷积层分别与所述特征提取层连接;其中:
    所述特征提取层用于提取所述图像的特征信息;
    所述第一关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点位置预测信息;
    所述第二关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点存在预测信息。
  3. 根据权利要求2所述的方法,其特征在于,所述第一关键点预测卷积层的卷积核为1*1*2N,所述第二关键点预测卷积层的卷积核为1*1*N,其中,N为待预测的关键点的总数量。
  4. 根据权利要求1-3任一所述的方法,其特征在于,所述第一卷积神经网络包括:全卷积神经网络。
  5. 根据权利要求1-4任一所述的方法,其特征在于,所述第一卷积神经网络的训练,包括:
    获取所述样本图像,所述关键点标注信息包括关键点位置标注信息和关键点存在标注信息;
    使用所述样本图像训练第一卷积神经网络,获得所述第一卷积神经网络针对所述样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;
    根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督,判断所述第一卷积神经网络的迭代损失率是否满足设定条件;
    若满足,则完成对所述第一卷积神经网络的训练。
  6. 根据权利要求5所述的方法,其特征在于,所述第一卷积神经网络的训练,还包括:
    若不满足,则根据获得的所述关键点位置预测信息和所述关键点存在预测信息调整所述第一卷积神经网络的参数,直至所述迭代损失率满足所述设定条件。
  7. 根据权利要求5所述的方法,其特征在于,所述根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督,包括:
    根据回归目标函数对所述关键点位置预测信息进行监督,根据分类目标函数对所述关键点存在预测信息进行监督。
  8. 一种关键点预测网络训练方法,其特征在于,包括:
    获取含有通用物体的关键点标注信息的样本图像,其中,所述关键点标注信息包括关键点位置标注信息和关键点存在标注信息;
    使用所述样本图像训练第一卷积神经网络,获得所述第一卷积神经网络针对所述样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;
    根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督,判断所述第一卷积神经网络的迭代损失率是否满足设定条件;
    若满足,则完成对所述第一卷积神经网络的训练。
  9. 根据权利要求8所述的方法,其特征在于,还包括:
    若不满足,则根据获得的所述关键点位置预测信息和所述关键点存在预测信息调整所述第一卷积神经网络的参数,直至所述迭代损失率满足所述设定条件。
  10. 根据权利要求8所述的方法,其特征在于,所述根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督,包括:
    根据回归目标函数对所述关键点位置预测信息进行监督,根据分类目标函数对所述关键点存在预测信息进行监督。
  11. 根据权利要求8-10任一所述的方法,其特征在于,所述第一卷积神经网络包括:特征提取层、第一关键点预测卷积层和第二关键点预测卷积层,所述第一关键点预测卷积层和所述第二关键点预测卷积层分别与所述特征提取层连接;其中:
    所述特征提取层用于提取所述样本图像的特征信息;
    所述第一关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点位置预测信息;
    所述第二关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点存在预测信息。
  12. 根据权利要求11所述的方法,其特征在于,所述第一关键点预测卷积层的卷积核为1*1*2N,所述第二关键点预测卷积层的卷积核为1*1*N,其中,N为待预测的关键点的总数量。
  13. 根据权利要求8-12任一所述的方法,其特征在于,所述第一卷积神经网络包括:全卷积神经网络。
  14. 一种图像处理方法,其特征在于,包括:
    采用如权利要求1-7任一所述的方法检测图像,或者,采用如权利要求8-13任一所述的方法训练而得的第一卷积神经网络检测图像,得到所述图像的通用物体的关键点预测结果,所述关键点预测结果包括关键点位置预测信息和关键点存在预测信息;
    根据所述通用物体的关键点预测结果对所述图像进行处理。
  15. 根据权利要求14所述的方法,其特征在于,所述根据所述通用物体的关键点预测结果对所述图像进行处理,包括:
    根据所述通用物体的关键点预测结果确定所述图像中的通用物体的位置。
  16. 根据权利要求14所述的方法,其特征在于,所述根据所述通用物体的关键点预测结果对所述图像进行处理,包括:
    根据所述通用物体的关键点预测结果提取所述图像中的通用物体的物体特征。
  17. 根据权利要求14所述的方法,其特征在于,所述根据所述通用物体的关键点预测结果对所述图像进行处理,包括:
    根据所述通用物体的关键点预测结果估计所述图像中的通用物体的姿态。
  18. 根据权利要求14所述的方法,其特征在于,所述根据所述通用物体的关键点预测结果对所述图像进行处理,包括:
    根据所述通用物体的关键点预测结果跟踪所述图像中的通用物体。
  19. 根据权利要求14所述的方法,其特征在于,所述根据所述通用物体的关键点预测结果对所述图像进行处理,包括:
    根据所述通用物体的关键点预测结果识别所述图像中的通用物体。
  20. 根据权利要求14所述的方法,其特征在于,所述根据所述通用物体的关键点预测结果对所述图像进行处理,包括:
    根据所述通用物体的关键点预测结果渲染所述图像中的通用物体。
  21. 一种关键点预测装置,其特征在于,包括:
    检测模块,用于采用第一卷积神经网络检测图像,获得所述图像的特征信息;所述第一卷积神经网络为使用含有通用物体的关键点标注信息的样本图像训练得到的卷积神经网络;
    预测模块,用于采用所述第一卷积神经网络根据所述特征信息预测所述图像的通用物体的关键点,获得所述图像的通用物体的关键点预测结果,所述关键点预测结果包括关键点位置预测信息和关键点存在预测信息。
  22. 根据权利要求21所述的装置,其特征在于,所述第一卷积神经网络包括:特征提取层、第一关键点预测卷积层和第二关键点预测卷积层,所述第一关键点预测卷积层和所述第二关键点预测卷积层分别与所述特征提取层连接;其中:
    所述特征提取层用于提取所述图像的特征信息;
    所述第一关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点位置预测信息;
    所述第二关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点存在预测信息。
  23. 根据权利要求22所述的装置,其特征在于,所述第一关键点预测卷积层的卷积核为1*1*2N,所述第二关键点预测卷积层的卷积核为1*1*N,其中,N为待预测的关键点的总数量。
  24. 根据权利要求21-23任一所述的装置,其特征在于,所述第一卷积神经网络包括:全卷积神经网络。
  25. 根据权利要求21-24任一所述的装置,其特征在于,还包括:训练模块,用于训练所述第一卷积神经网络;所述训练模块包括:
    获取子模块,用于获取所述样本图像,所述关键点标注信息包括关键点位置标注信息和关键点存在标注信息;
    训练子模块,用于使用所述样本图像训练第一卷积神经网络,获得所述第一卷积神经网络针对所述样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;
    监督子模块,用于根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督;
    判断子模块,用于判断所述第一卷积神经网络的迭代损失率是否满足设定条件;
    执行子模块,用于若所述第一卷积神经网络的迭代损失率满足设定条件,则完成对所述第一卷积神经网络的训练。
  26. 根据权利要求25所述的装置,其特征在于,所述执行子模块,还用于若所述第一卷积神经网络的迭代损失率不满足设定条件,则根据所述训练子模块获得的所述关键点位置预测信息和所述关键点存在预测信息调整所述第一卷积神经网络的参数,直至所述迭代损失率满足所述设定条件。
  27. 根据权利要求25所述的装置,其特征在于,所述监督子模块,用于根据回归目标函数对所述关键点位置预测信息进行监督,以及根据分类目标函数对所述关键点存在预测信息进行监督。
  28. 一种关键点预测网络训练装置,其特征在于,包括:
    获取模块,用于获取含有通用物体的关键点标注信息的样本图像,其中,所述关键点标注信息包括关键点位置标注信息和关键点存在标注信息;
    训练模块,用于使用所述样本图像训练第一卷积神经网络,获得所述第一卷积神经网络针对所述样本图像的通用物体的关键点位置预测信息和关键点存在预测信息;
    监督模块,用于根据目标函数对所述关键点位置预测信息和所述关键点存在预测信息进行监督;
    判断模块,用于判断所述第一卷积神经网络的迭代损失率是否满足设定条件;
    执行模块,用于若所述第一卷积神经网络的迭代损失率满足设定条件,则完成对所述第一卷积神经网络的训练。
  29. 根据权利要求28所述的装置,其特征在于,所述执行模块,还用于若所述第一卷积神经网络的迭代损失率不满足设定条件,则根据所述训练模块获得的所述关键点位置预测信息和所述关键点存在预测信息调整所述第一卷积神经网络的参数,直至所述迭代损失率满足所述设定条件。
  30. 根据权利要求28所述的装置,其特征在于,所述监督模块,用于根据回归目标函数对所述关键点位置预测信息进行监督,以及根据分类目标函数对所述关键点存在预测信息进行监督。
  31. 根据权利要求28-30任一所述的装置,其特征在于,所述第一卷积神经网络包括:特征提取层、第一关键点预测卷积层和第二关键点预测卷积层,所述第一关键点预测卷积层和所述第二关键点预测卷积层分别与所述特征提取层连接;其中:
    所述特征提取层用于提取所述样本图像的特征信息;
    所述第一关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点位置预测信息;
    所述第二关键点预测卷积层用于对所述特征信息进行卷积操作,得到所述关键点存在预测信息。
  32. 根据权利要求31所述的装置,其特征在于,所述第一关键点预测卷积层的卷积核为1*1*2N,所述第二关键点预测卷积层的卷积核为1*1*N,其中,N为待预测的关键点的总数量。
  33. 根据权利要求28-32任一所述的装置,其特征在于,所述第一卷积神经网络包括:全卷积神经网络。
  34. 一种图像处理装置,其特征在于,包括:
    检测模块,用于采用如权利要求21-27任一所述的装置检测图像,或者,采用如权利要求28-33任一所述的装置训练而得的第一卷积神经网络检测图像,得到所述图像的通用物体的关键点预测结果,所述关键点预测结果包括关键点位置预测信息和关键点存在预测信息;
    处理模块,用于根据所述通用物体的关键点预测结果对所述图像进行处理。
  35. 根据权利要求34所述的装置,其特征在于,所述处理模块,包括:
    位置确定子模块,用于根据所述通用物体的关键点预测结果确定所述图像中的通用物体的位置。
  36. 根据权利要求34所述的装置,其特征在于,所述处理模块,包括:
    特征提取子模块,用于根据所述通用物体的关键点预测结果提取所述图像中的通用物体的物体特征。
  37. 根据权利要求34所述的装置,其特征在于,所述处理模块,包括:
    姿态估计子模块,用于根据所述通用物体的关键点预测结果估计所述图像中的通用物体的姿态。
  38. 根据权利要求34所述的装置,其特征在于,所述处理模块,包括:
    物体跟踪子模块,用于根据所述通用物体的关键点预测结果跟踪所述图像中的通用物体。
  39. 根据权利要求34所述的装置,其特征在于,所述处理模块,包括:
    物体识别子模块,用于根据所述通用物体的关键点预测结果识别所述图像中的通用物体。
  40. 根据权利要求34所述的装置,其特征在于,所述处理模块,包括:
    物体渲染子模块,用于根据所述通用物体的关键点预测结果渲染所述图像中的通用物体。
  41. 一种电子设备,其特征在于,包括:处理器和存储器;
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1-7任一所述的物体关键点预测方法对应的操作;或者
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求8-13任一所述的物体关键点预测网络训练方法对应的操作;或者
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求14-20任一所述的图像处理方法对应的操作。
  42. 一种电子设备,其特征在于,包括:
    处理器和权利要求21-27任一项所述的关键点预测装置;在处理器运行所述关键点预测装置时,权利要求21-27任一项所述的关键点预测装置中的单元被运行;或者
    处理器和权利要求28-33任一项所述的关键点预测网络训练装置;在处理器运行所述关键点预测网络训练装置时,权利要求28-33任一项所述的关键点预测网络训练装置中的单元被运行;或者
    处理器和权利要求34-40任一项所述的图像处理装置;在处理器运行所述的图像处理装置时,权利要求34-40任一项所述的图像处理装置中的单元被运行。
  43. 一种计算机程序,包括计算机可读代码,其特征在于,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现如权利要求1-7任一所述的物体关键点预测方法中各步骤的指令;或者
    当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现如权利要求8-13任一所述的物体关键点预测网络训练方法中各步骤的指令;或者
    当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现如权利要求14-20任一所述的图像处理方法中各步骤的指令。
  44. 一种计算机可读存储介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时实现如权利要求1-7任一所述的物体关键点预测方法中各步骤的操作、或者如权利要求8-13任一所述的物体关键点预测网络训练方法中各步骤的操作、或者如权利要求14-20任一所述的图像处理方法中各步骤的操作。
PCT/CN2017/119877 2016-12-30 2017-12-29 关键点预测、网络训练及图像处理方法和装置、电子设备 WO2018121737A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611261431.9A CN108229489B (zh) 2016-12-30 2016-12-30 关键点预测、网络训练、图像处理方法、装置及电子设备
CN201611261431.9 2016-12-30

Publications (1)

Publication Number Publication Date
WO2018121737A1 true WO2018121737A1 (zh) 2018-07-05

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119877 WO2018121737A1 (zh) 2016-12-30 2017-12-29 关键点预测、网络训练及图像处理方法和装置、电子设备

Country Status (2)

Country Link
CN (1) CN108229489B (zh)
WO (1) WO2018121737A1 (zh)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472289A (zh) * 2018-10-09 2019-03-15 北京陌上花科技有限公司 关键点检测方法和设备
CN109614867A (zh) * 2018-11-09 2019-04-12 北京市商汤科技开发有限公司 人体关键点检测方法和装置、电子设备、计算机存储介质
CN110309706A (zh) * 2019-05-06 2019-10-08 深圳市华付信息技术有限公司 人脸关键点检测方法、装置、计算机设备及存储介质
CN110910449A (zh) * 2019-12-03 2020-03-24 清华大学 识别物体三维位置的方法和系统
CN111191492A (zh) * 2018-11-15 2020-05-22 北京三星通信技术研究有限公司 信息估计、模型检索和模型对准方法和装置
CN111340043A (zh) * 2018-12-19 2020-06-26 北京京东尚科信息技术有限公司 关键点检测方法、系统、设备及存储介质
CN111353325A (zh) * 2018-12-20 2020-06-30 北京京东尚科信息技术有限公司 关键点检测模型训练方法及装置
CN111507334A (zh) * 2019-01-30 2020-08-07 中国科学院宁波材料技术与工程研究所 一种基于关键点的实例分割方法
CN111783986A (zh) * 2020-07-02 2020-10-16 清华大学 网络训练方法及装置、姿态预测方法及装置
CN111814588A (zh) * 2020-06-18 2020-10-23 浙江大华技术股份有限公司 行为检测方法以及相关设备、装置
CN111862189A (zh) * 2020-07-07 2020-10-30 北京海益同展信息科技有限公司 体尺信息确定方法、装置、电子设备和计算机可读介质
CN111950723A (zh) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 神经网络模型训练方法、图像处理方法、装置及终端设备
CN111967949A (zh) * 2020-09-22 2020-11-20 武汉博晟安全技术股份有限公司 基于Leaky-Conv & Cross安全课程推荐引擎排序算法
CN112115745A (zh) * 2019-06-21 2020-12-22 杭州海康威视数字技术股份有限公司 一种商品漏扫码行为识别方法、装置及系统
CN112150533A (zh) * 2019-06-28 2020-12-29 顺丰科技有限公司 物体体积计算方法、装置、设备、及存储介质
CN112287855A (zh) * 2020-11-02 2021-01-29 东软睿驰汽车技术(沈阳)有限公司 基于多任务神经网络的驾驶行为检测方法和装置
CN112348035A (zh) * 2020-11-11 2021-02-09 东软睿驰汽车技术(沈阳)有限公司 车辆关键点检测方法、装置及电子设备
CN113449718A (zh) * 2021-06-30 2021-09-28 平安科技(深圳)有限公司 关键点定位模型的训练方法、装置和计算机设备

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448007B (zh) * 2018-11-02 2020-10-09 北京迈格威科技有限公司 图像处理方法、图像处理装置及存储介质
CN109410253B (zh) * 2018-11-06 2019-11-26 北京字节跳动网络技术有限公司 用于生成信息的方法、装置、电子设备和计算机可读介质
CN109697446B (zh) * 2018-12-04 2021-12-07 北京字节跳动网络技术有限公司 图像关键点提取方法、装置、可读存储介质及电子设备
CN109670591B (zh) * 2018-12-14 2022-09-27 深圳市商汤科技有限公司 一种神经网络的训练方法及图像匹配方法、装置
CN109657615B (zh) * 2018-12-19 2021-11-02 腾讯科技(深圳)有限公司 一种目标检测的训练方法、装置及终端设备
CN109522910B (zh) * 2018-12-25 2020-12-11 浙江商汤科技开发有限公司 关键点检测方法及装置、电子设备和存储介质
CN110478911A (zh) * 2019-08-13 2019-11-22 苏州钛智智能科技有限公司 基于机器学习的智能游戏车无人驾驶方法及智能车、设备
CN110533006B (zh) * 2019-09-11 2022-03-25 北京小米智能科技有限公司 一种目标跟踪方法、装置及介质
CN110909655A (zh) * 2019-11-18 2020-03-24 上海眼控科技股份有限公司 一种识别视频事件的方法及设备
CN111179247A (zh) * 2019-12-27 2020-05-19 上海商汤智能科技有限公司 三维目标检测方法及其模型的训练方法及相关装置、设备
CN111481208B (zh) * 2020-04-01 2023-05-12 中南大学湘雅医院 一种应用于关节康复的辅助系统、方法及存储介质
CN111523422B (zh) * 2020-04-15 2023-10-10 北京华捷艾米科技有限公司 一种关键点检测模型训练方法、关键点检测方法和装置
CN116721412B (zh) * 2023-04-17 2024-05-03 之江实验室 一种自下而上的基于结构性先验的豆荚关键点检测方法和系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156715A (zh) * 2014-09-01 2014-11-19 杭州朗和科技有限公司 一种终端设备、信息采集方法及装置
US20150347822A1 (en) * 2014-05-29 2015-12-03 Beijing Kuangshi Technology Co., Ltd. Facial Landmark Localization Using Coarse-to-Fine Cascaded Neural Networks
CN105354565A (zh) * 2015-12-23 2016-02-24 北京市商汤科技开发有限公司 基于全卷积网络人脸五官定位与判别的方法及系统
CN105760836A (zh) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 基于深度学习的多角度人脸对齐方法、系统及拍摄终端

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718879A (zh) * 2016-01-19 2016-06-29 华南理工大学 基于深度卷积神经网络的自由场景第一视角手指关键点检测方法

Also Published As

Publication number Publication date
CN108229489B (zh) 2020-08-11
CN108229489A (zh) 2018-06-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17888172

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17888172

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 17.12.2019)
