
WO2020062493A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus

Info

Publication number
WO2020062493A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
pose
preset
candidate frame
target candidate
Application number
PCT/CN2018/115968
Other languages
French (fr)
Chinese (zh)
Inventor
胡耀全 (Hu Yaoquan)
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-29
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2020062493A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • Embodiments of the present application relate to the field of computer technology, specifically to the field of Internet technology, and in particular to an image processing method and apparatus.
  • The embodiments of the present application provide an image processing method and apparatus.
  • In a first aspect, an embodiment of the present application provides an image processing method, including: obtaining an image in which object poses have been labeled, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points; and training a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image.
  • In some embodiments, before the image is input into the convolutional neural network and candidate poses are determined from the network's previously set anchor poses, the method further includes: clustering multiple preset poses in a target image to obtain key point sets; and determining each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
  • In some embodiments, clustering the multiple preset poses in the target image to obtain key point sets includes: clustering the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the vector corresponding to a preset pose equals the number of key points of that pose; and composing the key points of the pose corresponding to each cluster-center vector into a key point set.
  • In some embodiments, for each key point of the target candidate frames corresponding to each labeled frame, taking the average position of the key point across the candidate poses in the target candidate frames includes: for each such key point, in response to determining that the position of the key point is outside the labeled frame, using a preset first weight as the weight of the key point in that target candidate frame; in response to determining that the position of the key point is within the labeled frame, using a preset second weight as the weight of the key point in that target candidate frame, the first preset weight being smaller than the second preset weight; and determining the average position of the key point across the target candidate frames based on these weights.
  • In some embodiments, for each key point of the target candidate frames corresponding to each labeled frame, taking the average position of the key point across the candidate poses in the target candidate frames includes: for each such key point, determining whether the distance between the key point and the corresponding key point of the labeled pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determining the average position of the key point across the target candidate frames based on the key point's weights in those frames.
  • In a second aspect, an embodiment of the present application provides an image processing apparatus, including: an obtaining unit configured to obtain an image in which object poses have been labeled, where the image contains at least two objects, the poses of different objects differ, and each pose is indicated by multiple key points; and a training unit configured to train a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image.
  • In some embodiments, the apparatus further includes: a clustering unit configured to cluster multiple preset poses in a target image to obtain key point sets; and a determining unit configured to determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
  • In some embodiments, the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the vector corresponding to a preset pose equals the number of key points of that pose; and compose the key points of the pose corresponding to each cluster-center vector into a key point set.
  • In some embodiments, the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, in response to determining that the position of the key point is outside the labeled frame, use a preset first weight as the weight of the key point in that target candidate frame; in response to determining that the position of the key point is within the labeled frame, use a preset second weight as the weight of the key point in that target candidate frame, the first preset weight being smaller than the second preset weight; and determine the average position of the key point across the target candidate frames based on these weights.
  • In some embodiments, the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, determine whether the distance between the key point and the corresponding key point of the labeled pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determine the average position of the key point across the target candidate frames based on the key point's weights in those frames.
  • In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the image processing method.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of any embodiment of the image processing method is implemented.
  • In the image processing solution provided by the embodiments of the present application, an image in which object poses have been labeled is first obtained, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • A convolutional neural network is then trained based on the image and the pose annotations to obtain a trained convolutional neural network.
  • The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object. The degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the already labeled poses is then determined, and candidate frames whose coincidence exceeds a preset coincidence threshold are used as target candidate frames.
  • For each key point in the target candidate frames corresponding to each labeled frame, the average position of that key point across the target candidate frames is taken. Finally, the set of average key point positions is used as one pose detected in the image.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of an image processing method according to the present application;
  • FIG. 3 is a schematic diagram of an application scenario of an image processing method according to the present application;
  • FIG. 4 is a flowchart of still another embodiment of an image processing method according to the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of an image processing apparatus according to the present application;
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which an embodiment of an image processing method or an image processing apparatus of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as image processing applications, video applications, live broadcast applications, instant communication tools, mailbox clients, social platform software, and so on.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • the terminal devices 101, 102, and 103 can be various electronic devices with a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop computers and desktop computers.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple software programs or software modules (for example, to provide distributed services) or as a single software program or module. No specific limitation is made here.
  • the server 105 may be a server that provides various services, such as a background server that supports the terminal devices 101, 102, and 103.
  • The background server may analyze and otherwise process acquired data, such as an image in which object poses have been labeled, and feed the processing result (for example, a pose detected in the image) back to the terminal device.
  • the image processing method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103. Accordingly, the image processing apparatus may be provided in the server 105 or the terminal devices 101, 102, and 103.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers as required by the implementation.
  • With continued reference to FIG. 2, a flow 200 of an embodiment of the image processing method according to the present application is shown. The image processing method includes the following steps:
  • Step 201: Obtain an image in which object poses have been labeled, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • In this embodiment, the execution subject of the image processing method (for example, the server or a terminal device shown in FIG. 1) may acquire an image in which the pose of each object has been labeled.
  • The objects here can be people, faces, cats, other objects, and so on.
  • The pose can be represented by the coordinates of its key points. For example, when a person is in a standing posture versus a squatting posture, the distance between the coordinates of the nose key point and the coordinates of the toe key point differs.
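  • As a concrete illustration, a pose can be stored as a mapping from key point names to coordinates. The following Python snippet is purely illustrative; the key point names and coordinate values are assumptions, not data from the method:

```python
# Illustrative only: a pose represented by named key point coordinates (x, y).
standing = {"nose": (50.0, 10.0), "toe": (50.0, 170.0)}
squatting = {"nose": (50.0, 90.0), "toe": (50.0, 170.0)}

def keypoint_distance(pose, a, b):
    """Euclidean distance between two named key points of a pose."""
    (xa, ya), (xb, yb) = pose[a], pose[b]
    return ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5

# The nose-to-toe distance differs between the two postures:
print(keypoint_distance(standing, "nose", "toe"))   # 160.0
print(keypoint_distance(squatting, "nose", "toe"))  # 80.0
```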
  • Step 202: Train the convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • the training process includes steps 2021, 2022, and 2023, as follows:
  • Step 2021: Input the image into the convolutional neural network, and determine candidate poses for each object based on the network's previously set anchor poses.
  • In this embodiment, the above execution body may input the acquired image into the convolutional neural network so that, based on the anchor poses previously set in the network, the convolutional neural network obtains candidate poses for each object.
  • In practice, the convolutional neural network includes a region proposal network (RPN).
  • The size and position of each anchor pose of the convolutional neural network in the image are fixed.
  • Specifically, the execution subject may input the image into the region proposal network, which may determine the difference in size and position between each candidate pose and an anchor pose, and use these differences to represent the size and position of the candidate pose.
  • The size here can be expressed by area, by width and height, or by length and width, and the position can be expressed by coordinates.
  • the execution subject described above may determine multiple candidate poses.
  • In the training process, the above execution body can obtain the pose output by the convolutional neural network as the pose detected in the image, determine a loss value between this pose and the labeled pose based on a preset loss function, and then use this loss value for training to obtain the trained convolutional neural network.
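  • One common way to realize this step in anchor-based detectors is to let the region proposal network predict a per-key-point offset that is added to a fixed anchor pose. The patent does not fix the exact parameterization, so the additive offsets and the squared-error loss below are assumptions made for illustration:

```python
# Assumed parameterization: the network outputs one (dx, dy) offset per anchor
# key point; adding the offsets to the fixed anchor pose yields a candidate pose.

def decode_candidate(anchor_pose, offsets):
    """Candidate pose = anchor key points shifted by the predicted offsets."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(anchor_pose, offsets)]

def pose_loss(predicted_pose, labeled_pose):
    """Squared-error loss between predicted and labeled key points
    (a stand-in for whatever preset loss function is actually used)."""
    return sum((px - lx) ** 2 + (py - ly) ** 2
               for (px, py), (lx, ly) in zip(predicted_pose, labeled_pose))

anchor = [(40.0, 20.0), (40.0, 160.0)]         # fixed size and position
offsets = [(5.0, -2.0), (4.0, 3.0)]            # assumed network output
candidate = decode_candidate(anchor, offsets)
print(candidate)                               # [(45.0, 18.0), (44.0, 163.0)]
print(pose_loss(candidate, [(46.0, 18.0), (44.0, 165.0)]))  # 1.0 + 4.0 = 5.0
```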
  • Step 2022: Determine the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the already labeled poses, and use candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames.
  • In this embodiment, the above execution body may determine the degree of coincidence (Intersection over Union, IoU) between the candidate frame where each candidate pose is located and the labeled frame of an already labeled pose. After that, the execution body may select the candidate frames whose coincidence exceeds the preset coincidence threshold and use them as target candidate frames.
  • In practice, the frame of a pose may take its width (or length) from the leftmost and rightmost key point coordinates of the pose, and its height (or width) from the topmost and bottommost key point coordinates.
  • Specifically, the coincidence degree may be the ratio of the intersection between the candidate frame and the labeled frame to the union of the two. A large overlap between the candidate frame and the labeled frame indicates that the candidate frame frames the object accurately, so such a candidate frame can more precisely separate the object from the non-object.
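  • The frame construction and coincidence computation can be sketched as follows. The (x0, y0, x1, y1) frame representation and the 0.5 threshold are assumptions chosen for illustration, not values fixed by the method:

```python
# Sketch of Step 2022: a pose is a list of (x, y) key points, a frame is an
# (x0, y0, x1, y1) tuple, and coincidence is intersection over union (IoU).

def frame_of(pose):
    """Frame spanned by the pose's leftmost/rightmost and top/bottom key points."""
    xs = [x for x, _ in pose]
    ys = [y for _, y in pose]
    return min(xs), min(ys), max(xs), max(ys)

def iou(a, b):
    """Degree of coincidence of two frames: intersection area over union area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def select_target_frames(candidate_poses, labeled_frame, threshold=0.5):
    """Keep candidate poses whose frame coincides enough with the labeled frame."""
    return [p for p in candidate_poses
            if iou(frame_of(p), labeled_frame) > threshold]

# The first candidate overlaps the labeled frame heavily, the second not at all:
candidates = [[(10.0, 20.0), (60.0, 120.0)], [(200.0, 20.0), (260.0, 120.0)]]
targets = select_target_frames(candidates, labeled_frame=(15.0, 25.0, 65.0, 125.0))
print(len(targets))  # 1
```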
  • Step 2023: For each key point in the target candidate frames corresponding to each labeled frame, take the average position of that key point across the target candidate frames, and use the set of average key point positions as one pose detected in the image.
  • In this embodiment, for each labeled frame, the execution body may take the average position of each key point across the candidate poses in the target candidate frames corresponding to that labeled frame. The execution body may then use the set of these average positions as one pose detected in the image.
  • A labeled frame and its corresponding target candidate frames indicate the same object.
  • When calculating the average position, the positions of a key point may all be given the same weight, or the weights set for the positions of the key points may differ.
  • In some optional implementations of this embodiment, Step 2023 of taking, for each key point in the target candidate frames corresponding to each labeled frame, the average position of that key point across at least two target candidate frames may include: for each such key point, in response to determining that the position of the key point is outside the labeled frame, using a preset first weight as the weight of the key point in that target candidate frame; in response to determining that the position of the key point is within the labeled frame, using a preset second weight, where the first preset weight is smaller than the second preset weight; and determining the average position of the key point across the target candidate frames based on these weights.
  • In practice, the above execution body may assign a smaller weight to coordinates of positions outside the labeled frame and a larger weight to coordinates of positions inside it. For example, suppose key point A and key point B lie inside the labeled frame while key point C lies outside it, with weights 1, 1, and 0.5 respectively. The resulting average position is (1 × position of A + 1 × position of B + 0.5 × position of C) / (1 + 1 + 0.5).
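  • A sketch of this weighted averaging follows, mirroring the numeric example above; the weights 1 and 0.5, the frame, and the coordinates are illustrative assumptions:

```python
# Weighted average position of one key point across target candidate frames:
# the (larger) second preset weight applies when the point lies inside the
# labeled frame, the (smaller) first preset weight when it lies outside.

def inside(point, frame):
    """Whether an (x, y) point lies within an (x0, y0, x1, y1) frame."""
    x, y = point
    x0, y0, x1, y1 = frame
    return x0 <= x <= x1 and y0 <= y <= y1

def weighted_average(points, labeled_frame, w_outside=0.5, w_inside=1.0):
    weights = [w_inside if inside(p, labeled_frame) else w_outside for p in points]
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    return x, y

# Key points A and B fall inside the labeled frame, key point C outside it:
a, b, c = (10.0, 10.0), (12.0, 14.0), (40.0, 40.0)
print(weighted_average([a, b, c], labeled_frame=(0.0, 0.0, 20.0, 20.0)))
# x = (1*10 + 1*12 + 0.5*40) / (1 + 1 + 0.5) = 16.8, and y = 17.6
```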
  • In some optional implementations of this embodiment, Step 2023 may instead include: for each key point in the target candidate frames corresponding to each labeled frame, determining whether the distance between that key point and the corresponding key point of the labeled pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determining the average position of the key point across the target candidate frames based on the key point's weights in those frames.
  • In these implementations, the execution body may determine, for each key point in the target candidate frames corresponding to a labeled frame, whether its distance to the corresponding key point of the labeled pose is less than or equal to the preset distance threshold, and select key points accordingly. That is, key points in some target candidate frames do not participate in computing the position average: a key point participates only if its distance to the labeled key point is sufficiently small.
  • For example, suppose the three target candidate frames a, b, and c corresponding to a labeled frame M each contain a nose-tip key point, and the distances between these key points and the nose-tip key point labeled in M are 1, 2, and 3, respectively. If the preset distance threshold is 2.5, the distances 1 and 2 for frames a and b are below the threshold, so the nose-tip key points in a and b participate in calculating the position average, while the one in c does not.
  • These implementations select, from the key points within the target candidate frames corresponding to a labeled frame, those closer to the labeled pose when determining the position average. This prevents key points with large deviations from participating in the calculation and thereby improves the accuracy of the determined pose.
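  • A minimal sketch of this filtering follows, reusing the nose-tip numbers from the example above (distances 1, 2, and 3 against a threshold of 2.5); the coordinates themselves are illustrative assumptions:

```python
# Drop key points whose distance to the labeled key point exceeds the preset
# threshold, then average only the survivors.
import math

def filtered_average(points, labeled_point, max_distance=2.5):
    """Average of the points lying within max_distance of the labeled point."""
    kept = [p for p in points if math.dist(p, labeled_point) <= max_distance]
    n = len(kept)
    return sum(x for x, _ in kept) / n, sum(y for _, y in kept) / n

labeled_nose = (100.0, 50.0)
# Nose-tip key points in target candidate frames a, b, c at distances 1, 2, 3:
noses = [(101.0, 50.0), (102.0, 50.0), (103.0, 50.0)]
print(filtered_average(noses, labeled_nose))  # (101.5, 50.0): frame c excluded
```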
  • FIG. 3 is a schematic diagram of an application scenario of the image processing method according to this embodiment.
  • In the application scenario of FIG. 3, the execution body 301 may first obtain an image 302 in which object poses have been labeled, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • Then, based on the image and the pose annotations, the convolutional neural network is trained to obtain the trained convolutional neural network.
  • The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses 303, determining candidate poses 304 for each object.
  • In this embodiment, the candidate poses in an image containing at least two objects can be filtered by coincidence degree so as to select target candidate frames that indicate the objects more accurately.
  • Moreover, averaging the key point positions makes it possible to accurately distinguish each pose in the image.
  • FIG. 4 illustrates a flowchart 400 of still another embodiment of an image processing method.
  • As shown in FIG. 4, the flow 400 of the image processing method includes the following steps:
  • Step 401: Cluster multiple preset poses in the target image to obtain key point sets.
  • In this embodiment, the execution subject on which the image processing method runs (for example, the server or a terminal device shown in FIG. 1) can obtain a target image and cluster multiple preset poses in the target image to obtain key point sets.
  • In practice, the foregoing execution subject may cluster the preset poses in multiple ways. For example, the position coordinates of each key point can be clustered separately to obtain a clustering result for that key point.
  • In some optional implementations of this embodiment, the foregoing Step 401 may include the following steps:
  • Each preset pose may be represented by a multi-dimensional vector, in which each dimension corresponds to the position coordinates of one key point of the preset pose.
  • One or more cluster centers can be obtained by clustering, and each cluster center here is itself a multi-dimensional vector.
  • The above execution subject may then compose the key points of the pose indicated by each cluster-center multi-dimensional vector into a key point set.
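  • As an illustration, the sketch below flattens each preset pose into a vector and runs a tiny k-means. The two-cluster setup and all coordinates are assumptions, the cluster center itself is taken as the anchor pose here, and a real system would use many more poses and a library implementation:

```python
# Minimal k-means over flattened pose vectors (x1, y1, x2, y2, ...); each
# cluster center, reshaped back into (x, y) pairs, yields one anchor pose.
import math

def kmeans(vectors, k, iters=20):
    centers = vectors[:k]                      # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda j: math.dist(v, centers[j]))
            groups[i].append(v)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers

# Four preset poses with two key points each, flattened to (x1, y1, x2, y2):
poses = [(0, 0, 10, 40), (1, 1, 11, 41), (30, 0, 40, 40), (31, 1, 41, 41)]
anchor_poses = [list(zip(c[0::2], c[1::2])) for c in kmeans(poses, k=2)]
print(anchor_poses)  # [[(0.5, 0.5), (10.5, 40.5)], [(30.5, 0.5), (40.5, 40.5)]]
```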
  • Step 402: Determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
  • In this embodiment, the above execution body may determine each obtained key point set as an anchor pose. In this way, the positions of the resulting anchor poses are better differentiated. At the same time, this embodiment clusters multiple preset poses to obtain accurate anchor poses, which reduces the deviation between detected candidate poses and the anchor poses during pose detection.
  • Step 403: Obtain an image in which object poses have been labeled, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • In this embodiment, the above execution subject may obtain an image in which the pose of each object has been labeled. The objects here can be people, faces, cats, other objects, and so on, and a pose can be represented by the coordinates of its key points.
  • Step 404: Train the convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • the training process includes steps 4041, 4042, and 4043, as follows:
  • Step 4041: Input the image into the convolutional neural network, and determine candidate poses for each object based on the network's previously set anchor poses.
  • In this embodiment, the above execution body may input the acquired image into the convolutional neural network so that, based on the previously set anchor poses, the candidate poses of each object are obtained by the network.
  • In practice, the convolutional neural network includes a region proposal network, and the size and position of each anchor pose in the image are fixed.
  • Step 4042: Determine the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the already labeled poses, and use candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames.
  • the execution body may determine the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frame of the already labeled pose. After that, the execution body may select a candidate frame whose coincidence degree is greater than a preset coincidence degree threshold, and use the selected candidate frame as a target candidate frame.
  • Step 4043: For each key point in the target candidate frames corresponding to each labeled frame, take the average position of that key point across the target candidate frames, and use the set of average key point positions as one pose detected in the image.
  • In this embodiment, for each labeled frame, the execution body may take the average position of each key point across the candidate poses in the corresponding target candidate frames, and may then use the set of these average positions as one pose detected in the image.
  • The anchor poses obtained in this embodiment are better differentiated, which helps control the number of anchor poses while still obtaining a rich variety of them. In this way, the computation speed of the region proposal network can be increased while the deviation between detected candidate poses and anchor poses is kept small.
  • In addition, this embodiment can also cluster multiple preset poses to obtain accurate anchor poses, thereby further reducing the deviation between the detected candidate poses and the anchor poses.
  • As an implementation of the methods shown in the above figures, this application provides an embodiment of an image processing apparatus.
  • This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
  • As shown in FIG. 5, the image processing apparatus 500 in this embodiment includes an obtaining unit 501 and a training unit 502.
  • The obtaining unit 501 is configured to obtain an image in which object poses have been labeled, where the image contains at least two objects, the poses of different objects differ, and each pose is indicated by multiple key points.
  • The training unit 502 is configured to train the convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network, where the training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the coincidence between the candidate frame where each candidate pose is located and the labeled frames of the labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image.
  • In this embodiment, the obtaining unit 501 of the image processing apparatus 500 may obtain an image in which the pose of each object has been labeled. The objects here can be people, faces, cats, other objects, and so on, and a pose can be represented by the coordinates of its key points; for example, the distance between the coordinates of the nose key point and the coordinates of the toe key point differs between a standing posture and a squatting posture.
  • In this embodiment, the training unit 502 may input the acquired image into the convolutional neural network so as to obtain candidate poses for each object based on the anchor poses previously set in the network. It may then determine the coincidence of each candidate frame with the labeled frames, select the candidate frames whose coincidence exceeds a preset coincidence threshold, and use them as target candidate frames.
  • The training unit 502 may also take, for each key point in the target candidate frames corresponding to each labeled frame, the average position of that key point across the candidate poses in those target candidate frames.
  • In some optional implementations of this embodiment, the apparatus further includes: a clustering unit configured to cluster multiple preset poses in a target image to obtain key point sets; and a determining unit configured to determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
  • In some optional implementations of this embodiment, the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the vector corresponding to a preset pose equals the number of key points of that pose; and compose the key points of the pose corresponding to each cluster-center vector into a key point set.
  • In some optional implementations of this embodiment, the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, in response to determining that the position of the key point is outside the labeled frame, use a preset first weight as the weight of the key point in that target candidate frame; in response to determining that the position of the key point is within the labeled frame, use a preset second weight as the weight of the key point in that target candidate frame, the first preset weight being smaller than the second preset weight; and determine the average position of the key point across the target candidate frames based on these weights.
  • In some optional implementations of this embodiment, the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, determine whether the distance between the key point and the corresponding key point of the labeled pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determine the average position of the key point across the target candidate frames based on the key point's weights in those frames.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU and/or GPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • The RAM 603 also stores various programs and data required for the operation of the system 600.
  • The central processing unit 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem.
  • The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as necessary.
  • A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
  • In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and / or installed from a removable medium 611.
  • the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, the data signal carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • In the flowcharts and block diagrams, each block may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function.
  • It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • The described units may also be provided in a processor, which may, for example, be described as: a processor including an obtaining unit and a training unit. The names of these units do not, in some cases, limit the units themselves.
  • For example, the obtaining unit may also be described as "a unit for obtaining an image in which an object's pose has been labeled".
  • As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the foregoing embodiments, or may exist alone without being assembled into the apparatus.
  • The computer-readable medium described above carries one or more programs. When the one or more programs are executed by the apparatus, the apparatus is caused to: obtain an image in which object poses have been labeled, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • The apparatus is further caused to train the convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the already labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the embodiments of the present application are an image processing method and apparatus. A specific embodiment of the method comprises: acquiring an image in which the poses of the subjects are already labelled; training a convolutional neural network on the basis of the image and the pose labels to obtain a trained convolutional neural network, the training process comprising: inputting the image into the convolutional neural network and, on the basis of preset anchor poses of the convolutional neural network, determining candidate poses of each subject; setting candidate frames having a degree of coincidence greater than a preset coincidence threshold as target candidate frames; for each key point in a target candidate frame corresponding to each labelled frame, taking the average position of the key point across the target candidate frames; and setting the set of average key point positions as a pose detected for the image. In the present embodiment, the candidate poses are filtered by means of the degree of coincidence and the key point positions are averaged in order to accurately distinguish the poses in an image.

Description

Image processing method and apparatus
This patent application claims priority to Chinese patent application No. 201811149818.4, filed on September 29, 2018 by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.) and entitled "Image Processing Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical field
Embodiments of the present application relate to the field of computer technology, specifically to the field of Internet technology, and in particular to an image processing method and apparatus.
Background
When confirming human-body key points, it is sometimes necessary to confirm the key points of a single person and sometimes the key points of each of multiple people. In related technologies, when detecting the key points of each of multiple people, it is often difficult to obtain accurate detection results.
Summary of the Invention
The embodiments of the present application provide an image processing method and apparatus.
In a first aspect, an embodiment of the present application provides an image processing method, including: obtaining an image in which object poses have been labeled, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points; and training a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network. The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image.
In some embodiments, before the image is input into the convolutional neural network and candidate poses are determined from the network's previously set anchor poses, the method further includes: clustering multiple preset poses in a target image to obtain key point sets; and determining each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
In some embodiments, clustering the multiple preset poses in the target image to obtain key point sets includes: clustering the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the vector corresponding to a preset pose equals the number of key points of that pose; and composing the key points of the pose corresponding to each cluster-center vector into a key point set.
In some embodiments, for each key point of the target candidate frames corresponding to each labeled frame, taking the average position of the key point across the candidate poses in the target candidate frames includes: for each such key point, in response to determining that the position of the key point is outside the labeled frame, using a preset first weight as the weight of the key point in that target candidate frame; in response to determining that the position of the key point is within the labeled frame, using a preset second weight as the weight of the key point in that target candidate frame, the first preset weight being smaller than the second preset weight; and determining the average position of the key point across the target candidate frames based on these weights.
In some embodiments, for each key point of the target candidate frames corresponding to each labeled frame, taking the average position of the key point across the candidate poses in the target candidate frames includes: for each such key point, determining whether the distance between the key point and the corresponding key point of the labeled pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determining the average position of the key point across the target candidate frames based on the key point's weights in those frames.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: an obtaining unit configured to obtain an image in which object poses have been labeled, where the image contains at least two objects, the poses of different objects differ, and each pose is indicated by multiple key points; and a training unit configured to train a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network. The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image.
In some embodiments, the apparatus further includes: a clustering unit configured to cluster multiple preset poses in a target image to obtain key point sets; and a determining unit configured to determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
In some embodiments, the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the vector corresponding to a preset pose equals the number of key points of that pose; and compose the key points of the pose corresponding to each cluster-center vector into a key point set.
In some embodiments, the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, in response to determining that the position of the key point is outside the labeled frame, use a preset first weight as the weight of the key point in that target candidate frame; in response to determining that the position of the key point is within the labeled frame, use a preset second weight as the weight of the key point in that target candidate frame, the first preset weight being smaller than the second preset weight; and determine the average position of the key point across the target candidate frames based on these weights.
In some embodiments, the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, determine whether the distance between the key point and the corresponding key point of the labeled pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determine the average position of the key point across the target candidate frames based on the key point's weights in those frames.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the image processing method.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of any embodiment of the image processing method is implemented.
In the image processing solution provided by the embodiments of the present application, an image in which object poses have been labeled is first obtained, where the image contains at least two objects, different objects have different poses, and each pose is indicated by multiple key points. A convolutional neural network is then trained based on the image and the pose annotations to obtain a trained convolutional neural network. The training process includes: inputting the image into the convolutional neural network and, based on the network's previously set anchor poses, determining candidate poses for each object; determining the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frames of the labeled poses, and using candidate frames whose coincidence exceeds a preset coincidence threshold as target candidate frames; for each key point within the target candidate frames corresponding to each labeled frame, taking the average position of that key point across the target candidate frames; and using the set of average key point positions as one pose detected in the image. This embodiment can filter the candidate poses in an image containing at least two objects by coincidence degree so as to select target candidate frames that indicate the objects more accurately, and averaging the key point positions makes it possible to accurately distinguish each pose in the image.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of an image processing method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of an image processing method according to the present application;
FIG. 4 is a flowchart of still another embodiment of an image processing method according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an image processing apparatus according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
具体实施方式detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in connection with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the image processing method or image processing apparatus of the present application may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as image processing applications, video applications, live streaming applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or a single software module. No specific limitation is imposed here.

The server 105 may be a server providing various services, for example a backend server that supports the terminal devices 101, 102, and 103. The backend server may analyze and otherwise process acquired data such as images with annotated object poses, and feed the processing result (for example, a pose detected in an image) back to the terminal devices.

It should be noted that the image processing method provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the image processing apparatus may be provided in the server 105 or in the terminal devices 101, 102, 103.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of the image processing method according to the present application is shown. The image processing method includes the following steps:

Step 201: acquire an image in which the poses of objects have been annotated, where the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points.

In this embodiment, the execution body of the image processing method (for example, the server or terminal device shown in FIG. 1) may acquire an image in which the poses of objects have been annotated. In the image, the poses of the objects are marked out. The objects here may be people, human faces, cats, articles, and so on. Specifically, a pose may be represented by the coordinates of its key points. For example, when a person stands versus squats, the distance between the coordinates of the nose-tip key point and the coordinates of the toe key point differs.
Step 202: based on the image and the pose annotations, train a convolutional neural network to obtain a trained convolutional neural network. The training process includes step 2021, step 2022, and step 2023, as follows:

Step 2021: input the image into the convolutional neural network, and determine candidate poses of each object based on anchor poses previously set for the convolutional neural network.

In this embodiment, the execution body may input the acquired image into the convolutional neural network, so that the convolutional neural network obtains candidate poses (proposals) of each object based on the anchor poses previously set in the network. Specifically, the convolutional neural network includes a Region Proposal Network (RPN). The size and position of an anchor pose in the image are fixed. The execution body may input the image into the region proposal network, which may determine the size difference and position difference between each candidate pose and an anchor pose and use these differences to express the size and position of the candidate pose. Here, size may be expressed as an area, or as a width and height (or length and width), and position may be expressed in coordinates. For each object in the image, the execution body may determine multiple candidate poses.
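As an illustration of how a candidate pose could be recovered from a fixed anchor pose and the offsets regressed by the region proposal network, consider the following minimal sketch. The per-key-point additive offset parameterization, the numpy implementation, and all names are assumptions for illustration, not part of the application:

    import numpy as np

    def decode_candidate_pose(anchor_keypoints, predicted_offsets):
        """Decode a candidate pose from a fixed anchor pose.

        anchor_keypoints: (K, 2) array of fixed anchor key point coordinates.
        predicted_offsets: (K, 2) array of per-key-point (dx, dy) offsets
        regressed by the region proposal network (assumed parameterization).
        Returns the (K, 2) key point coordinates of the candidate pose.
        """
        return anchor_keypoints + predicted_offsets

    # Example: a three-key-point anchor pose shifted by predicted offsets.
    anchor = np.array([[50.0, 40.0], [55.0, 80.0], [52.0, 120.0]])
    offsets = np.array([[2.0, -1.0], [1.5, 0.5], [-0.5, 2.0]])
    candidate = decode_candidate_pose(anchor, offsets)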
During training, the execution body may take the pose output by the convolutional neural network as the pose detected in the image and, based on a preset loss function, determine a loss value between that pose and the annotated pose. This loss value is then used for training to obtain the trained convolutional neural network.
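The application does not fix the form of the preset loss function. As one plausible instantiation, a mean squared error over corresponding key point coordinates could be used; the following sketch is an assumption for illustration only:

    import numpy as np

    def keypoint_loss(detected_pose, annotated_pose):
        """Mean squared error between detected and annotated key points.

        detected_pose / annotated_pose: (K, 2) arrays of key point
        coordinates. A stand-in for the application's unspecified
        'preset loss function'.
        """
        return float(np.mean((detected_pose - annotated_pose) ** 2))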
Step 2022: determine the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and take candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes.

In this embodiment, the execution body may determine the Intersection over Union (IoU) of the candidate box of each candidate pose with the annotation box of the annotated pose. The execution body may then select candidate boxes whose overlap is greater than a preset overlap threshold and take the selected candidate boxes as target candidate boxes. Specifically, the width and height of a pose's box may be the width (or length) spanned by the leftmost and rightmost coordinates of the key points included in the pose, and the height (or width) spanned by the topmost and bottommost coordinates. The degree of overlap may be the ratio of the intersection of the candidate box and the annotation box to their union. If the overlap between a candidate box and an annotation box is large, the candidate box frames the object with high accuracy, so the candidate box separates object from non-object more accurately.
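A minimal sketch of the pose box and the overlap computation described above, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the names and the box format are illustrative assumptions:

    def pose_box(keypoints):
        """Bounding box spanned by a pose's key point coordinates."""
        xs = [x for x, _ in keypoints]
        ys = [y for _, y in keypoints]
        return (min(xs), min(ys), max(xs), max(ys))

    def iou(box_a, box_b):
        """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

A candidate box would then be kept as a target candidate box when, for example, iou(candidate, annotation) exceeds the preset overlap threshold.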
Step 2023: for each key point within the target candidate boxes corresponding to each annotation box, take the average position of that key point across the target candidate boxes; take the set of average key point positions as one pose detected in the image.

In this embodiment, for each key point within the target candidate boxes corresponding to an annotation box, the execution body may take the average position of that key point across the candidate poses in the target candidate boxes corresponding to that annotation box. The execution body may thereby take the set of average key point positions of the target candidate boxes corresponding to the annotation box as one pose detected in the image. Corresponding annotation boxes and target candidate boxes indicate the same object.

Specifically, the same weight may be assigned to the position of the key point in each target candidate box when computing the position average. Alternatively, the weights assigned to the positions may differ.

It should be noted that although a position average is taken for every key point of the poses in the target candidate boxes, in this embodiment it is possible that the key point in some target candidate boxes does not participate in the position average.
In some optional implementations of this embodiment, taking, in step 2023, for each key point within the target candidate boxes corresponding to each annotation box, the average position of that key point across at least two target candidate boxes may include:

for each key point within each target candidate box corresponding to each annotation box, in response to determining that the position of the key point is outside the annotation box, taking a preset first weight as the weight of that key point in that target candidate box; for each key point within each target candidate box corresponding to each annotation box, in response to determining that the position of the key point is inside the annotation box, taking a preset second weight as the weight of that key point in that target candidate box, the first preset weight being smaller than the second preset weight; and determining, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.

In these optional implementations, when computing the position average, the execution body may apply a smaller weight to the coordinates of positions outside the annotation box and a larger weight to the coordinates of positions inside it. For example, if key point A, key point B, and key point C lie inside the annotation box, inside the annotation box, and outside the annotation box respectively, weights of 1, 1, and 0.5 may be applied to key point A, key point B, and key point C to compute the position average. The resulting position average is then (1 × position of A + 1 × position of B + 0.5 × position of C) / (1 + 1 + 0.5).
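The weighted average in the example above can be written compactly; the following sketch assumes numpy and the weights 0.5 and 1.0 from the worked example, with all names hypothetical:

    import numpy as np

    def weighted_keypoint_average(positions, inside_box,
                                  w_outside=0.5, w_inside=1.0):
        """Weighted average position of one key point across target boxes.

        positions: (N, 2) array, the key point's position in each of the N
        target candidate boxes corresponding to one annotation box.
        inside_box: (N,) boolean array, whether each position lies inside
        the annotation box.
        w_outside / w_inside: the first and second preset weights
        (w_outside < w_inside), values assumed from the worked example.
        """
        weights = np.where(inside_box, w_inside, w_outside)
        return (weights[:, None] * positions).sum(axis=0) / weights.sum()

    # The worked example: key points A and B inside the box, C outside.
    positions = np.array([[10.0, 10.0], [12.0, 10.0], [20.0, 14.0]])
    inside = np.array([True, True, False])
    average = weighted_keypoint_average(positions, inside)  # weights 1, 1, 0.5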
These implementations can weight the key point differently across target candidate boxes. Because key points outside the annotation box tend to be less accurate, this weighting scheme reduces their influence so as to obtain a more accurate average key point position and thereby determine the pose accurately.
In some optional implementations of this embodiment, taking, in step 2023, for each key point within the target candidate boxes corresponding to each annotation box, the average position of that key point across at least two target candidate boxes may include:

for each key point within each target candidate box corresponding to each annotation box, determining whether the distance between that key point and the corresponding key point of the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is less than or equal to the threshold, determining, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.

In these optional implementations, the execution body may determine whether the distance between each key point in each target candidate box corresponding to an annotation box and the corresponding key point of the pose annotated in that annotation box is less than or equal to a preset distance threshold, and on that basis keep or discard the key point in each target candidate box corresponding to the annotation box. That is, in these implementations the key point in some target candidate boxes does not participate in the position average. Specifically, if the distance between a key point in one of the target candidate boxes corresponding to the annotation box and the annotated key point is small, it is determined that the key point may participate in computing the position average. If the distance between a key point in one of the target candidate boxes and the annotated key point is large, the key point in the candidate pose produced by the convolutional neural network has poor accuracy, and it may be determined that the key point does not participate in computing the position average.

For example, suppose the three target candidate boxes a, b, and c corresponding to an annotation box M each contain a nose-tip key point, and the distances between those key points and the nose-tip key point annotated in M are 1, 2, and 3 respectively. If the preset distance threshold is 2.5, the distances 1 and 2 for target candidate boxes a and b are both below the threshold, so the nose-tip key points in target candidate boxes a and b may participate in computing the position average.
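The distance screening in this example can be sketched as follows, assuming Euclidean distance and numpy; the threshold of 2.5 is taken from the example and all names are hypothetical:

    import numpy as np

    def filter_keypoints_by_distance(positions, annotated_position,
                                     threshold=2.5):
        """Keep only positions close enough to the annotated key point.

        positions: (N, 2) array of one key point's position in each target
        candidate box; annotated_position: (2,) annotated coordinates.
        Returns the positions whose Euclidean distance to the annotation
        is less than or equal to the threshold.
        """
        distances = np.linalg.norm(positions - annotated_position, axis=1)
        return positions[distances <= threshold]

With the distances 1, 2, and 3 of the example and a threshold of 2.5, only the first two candidate positions would be kept for the position average.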
These implementations can select, from among the key points in the target candidate boxes corresponding to an annotation box, those close to the annotation box to determine the position average, so that key points with large deviations are kept out of the computation, which in turn improves the accuracy of the determined pose.
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the image processing method according to this embodiment. In the application scenario of FIG. 3, the execution body 301 may acquire an image 302 in which the poses of objects have been annotated, where the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points. A convolutional neural network is trained based on the image and the pose annotations to obtain a trained convolutional neural network. The training process includes: inputting the image into the convolutional neural network, and determining candidate poses 304 of each object based on previously set anchor poses 303 of the convolutional neural network; determining the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and taking candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes 305; for each key point within the target candidate boxes corresponding to each annotation box, taking the average position 306 of that key point across the target candidate boxes; and taking the set of average key point positions as one pose 307 detected in the image.

This embodiment can screen the candidate poses in an image containing at least two objects by their degree of overlap, so as to select target candidate boxes that indicate the objects more accurately. Moreover, averaging the key point positions makes it possible to distinguish the individual poses in the image accurately.
With further reference to FIG. 4, a flow 400 of another embodiment of the image processing method is shown. The flow 400 of the image processing method includes the following steps:

Step 401: cluster a plurality of preset poses in a target image to obtain key point sets.

In this embodiment, the execution body on which the image processing method runs (for example, the server or terminal device shown in FIG. 1) may acquire a target image and cluster a plurality of preset poses in the target image to obtain key point sets. Specifically, the execution body may cluster the preset poses in various ways. For example, the coordinates of the position of each key point may be clustered to obtain a clustering result for each key point.
In some optional implementations of this embodiment, step 401 may include the following steps:

clustering the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of key points of the preset pose; and forming a key point set from the key points of the pose corresponding to the multi-dimensional vector of each cluster center.

In these implementations, a preset pose may be represented by a multi-dimensional vector. Each dimension of the multi-dimensional vector corresponds to the coordinates of the position of one key point of the preset pose. One or more cluster centers can be obtained by clustering, and each cluster center is also a multi-dimensional vector. The execution body may form the key points of the pose indicated by this multi-dimensional vector into a key point set.
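The application does not name a particular clustering algorithm. One common choice would be k-means over the pose vectors, with each cluster center reshaped back into a key point set; the following sketch assumes k-means and numpy, and all names are hypothetical:

    import numpy as np

    def cluster_anchor_poses(preset_poses, num_anchors, iterations=20, seed=0):
        """Cluster preset poses into anchor poses (a minimal k-means sketch).

        preset_poses: (P, K, 2) array of P preset poses with K key points
        each. Each pose is flattened into one vector whose dimensions
        correspond to its key points; the resulting cluster centers are
        reshaped back into key point sets.
        """
        rng = np.random.default_rng(seed)
        vectors = preset_poses.reshape(len(preset_poses), -1).astype(float)
        centers = vectors[rng.choice(len(vectors), num_anchors, replace=False)]
        for _ in range(iterations):
            # Assign each pose vector to its nearest cluster center.
            labels = np.argmin(
                np.linalg.norm(vectors[:, None] - centers[None], axis=2), axis=1)
            # Move each center to the mean of the pose vectors assigned to it.
            for c in range(num_anchors):
                if np.any(labels == c):
                    centers[c] = vectors[labels == c].mean(axis=0)
        return centers.reshape(num_anchors, -1, 2)  # one key point set per anchor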
Step 402: determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.

In this embodiment, the execution body may determine each of the obtained key point sets as an anchor pose. In this way, the positions of the resulting anchor poses are more differentiated. At the same time, this embodiment can cluster the plurality of preset poses to obtain accurate anchor poses, so that during pose detection the deviation between the detected candidate poses and the anchor poses can be reduced.
Step 403: acquire an image in which the poses of objects have been annotated, where the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points.

In this embodiment, the execution body may acquire an image in which the poses of objects have been annotated. In the image, the poses of the objects are marked out. The objects here may be people, human faces, cats, articles, and so on. Specifically, a pose may be represented by the coordinates of its key points.

Step 404: based on the image and the pose annotations, train a convolutional neural network to obtain a trained convolutional neural network. The training process includes step 4041, step 4042, and step 4043, as follows:

Step 4041: input the image into the convolutional neural network, and determine candidate poses of each object based on the previously set anchor poses of the convolutional neural network.

In this embodiment, the execution body may input the acquired image into the convolutional neural network, so that the convolutional neural network obtains candidate poses of each object based on the previously set anchor poses. Specifically, the convolutional neural network includes a region proposal network. The size and position of an anchor pose in the image are fixed.

Step 4042: determine the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and take candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes.

In this embodiment, the execution body may determine the degree of overlap between the candidate box of each candidate pose and the annotation box of the annotated pose. The execution body may then select candidate boxes whose overlap is greater than a preset overlap threshold and take the selected candidate boxes as target candidate boxes.

Step 4043: for each key point within the target candidate boxes corresponding to each annotation box, take the average position of that key point across the target candidate boxes; take the set of average key point positions as one pose detected in the image.

In this embodiment, for each key point within the target candidate boxes corresponding to an annotation box, the execution body may take the average position of that key point across the candidate poses in the target candidate boxes corresponding to that annotation box. The execution body may thereby take the set of average key point positions of the target candidate boxes corresponding to the annotation box as one pose detected in the image.

The anchor poses obtained in this embodiment are more differentiated, which helps keep the number of anchor poses under control while still covering a rich variety of anchor poses. In this way, both the computation speed of the region proposal network can be increased and the deviation between the detected candidate poses and the anchor poses can be kept small. Moreover, this embodiment can cluster the plurality of preset poses to obtain accurate anchor poses, further reducing the deviation between the detected candidate poses and the anchor poses.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an image processing apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.

As shown in FIG. 5, the image processing apparatus 500 of this embodiment includes an acquisition unit 501 and a training unit 502. The acquisition unit 501 is configured to acquire an image in which the poses of objects have been annotated, where the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points. The training unit 502 is configured to train a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network, the training process including: inputting the image into the convolutional neural network, and determining candidate poses of each object based on previously set anchor poses of the convolutional neural network; determining the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and taking candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes; for each key point within the target candidate boxes corresponding to each annotation box, taking the average position of that key point across the target candidate boxes; and taking the set of average key point positions as one pose detected in the image.

In some embodiments, the acquisition unit 501 of the image processing apparatus 500 may acquire an image in which the poses of objects have been annotated. In the image, the poses of the objects are marked out. The objects here may be people, human faces, cats, articles, and so on. Specifically, a pose may be represented by the coordinates of its key points. For example, when a person stands versus squats, the distance between the coordinates of the nose-tip key point and the coordinates of the toe key point differs.

In some embodiments, the training unit 502 may input the acquired image into the convolutional neural network, so that the convolutional neural network obtains candidate poses of each object based on the anchor poses previously set in the network. It may then select candidate boxes whose overlap is greater than a preset overlap threshold and take the selected candidate boxes as target candidate boxes. For each key point within the target candidate boxes corresponding to an annotation box, it may also take the average position of that key point across the candidate poses in the target candidate boxes corresponding to that annotation box.

In some optional implementations of this embodiment, the apparatus further includes: a clustering unit configured to cluster a plurality of preset poses in a target image to obtain key point sets; and a determining unit configured to determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.

In some embodiments, the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of key points of the preset pose; and form a key point set from the key points of the preset pose corresponding to the multi-dimensional vector of each cluster center.

In some optional implementations of this embodiment, the training unit is further configured to: for each key point within each target candidate box corresponding to each annotation box, in response to determining that the position of the key point is outside the annotation box, take a preset first weight as the weight of that key point in that target candidate box; in response to determining that the position of the key point is inside the annotation box, take a preset second weight as the weight of that key point in that target candidate box, the first preset weight being smaller than the second preset weight; and determine, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.

In some optional implementations of this embodiment, the training unit is further configured to: for each key point within each target candidate box corresponding to each annotation box, determine whether the distance between that key point and the corresponding key point of the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is less than or equal to the threshold, determine, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.
Referring now to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing an electronic device of an embodiment of the present application is shown. The electronic device shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU and/or GPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The central processing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.

In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network through the communication portion 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit 601, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, the computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, capable of sending, propagating, or transmitting a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on the computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutively shown blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit and a training unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires an image in which the poses of objects have been annotated".

As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs that, when executed by the apparatus, cause the apparatus to: acquire an image in which the poses of objects have been annotated, where the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points; and train a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network, the training process including: inputting the image into the convolutional neural network, and determining candidate poses of each object based on previously set anchor poses of the convolutional neural network; determining the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and taking candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes; for each key point within the target candidate boxes corresponding to each annotation box, taking the average position of that key point across the target candidate boxes; and taking the set of average key point positions as one pose detected in the image.

The above description is merely a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (12)

  1. An image processing method, comprising:
    acquiring an image in which the poses of objects have been annotated, wherein the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points;
    training a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network, the training process comprising:
    inputting the image into the convolutional neural network, and determining candidate poses of each object based on anchor poses previously set for the convolutional neural network;
    determining the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and taking candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes; and
    for each key point within the target candidate boxes corresponding to each annotation box, taking the average position of that key point across the target candidate boxes, and taking the set of average key point positions as one pose detected in the image.
  2. The method according to claim 1, wherein before inputting the image into the convolutional neural network and determining the candidate poses of each object based on the anchor poses previously set for the convolutional neural network, the method further comprises:
    clustering a plurality of preset poses in a target image to obtain key point sets; and
    determining each key point set as an anchor pose, wherein the key points included in different key point sets have different positions in the target image.
  3. The method according to claim 2, wherein clustering the plurality of preset poses in the target image to obtain the key point sets comprises:
    clustering the multi-dimensional vectors corresponding to the preset poses, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of key points of the preset pose; and
    forming a key point set from the key points of the pose corresponding to the multi-dimensional vector of each cluster center.
  4. The method according to claim 1, wherein taking, for each key point of the target candidate boxes corresponding to each annotation box, the average position of that key point in the candidate poses in the target candidate boxes comprises:
    for each key point within each target candidate box corresponding to each annotation box, in response to determining that the position of the key point is outside the annotation box, taking a preset first weight as the weight of that key point in that target candidate box; in response to determining that the position of the key point is inside the annotation box, taking a preset second weight as the weight of that key point in that target candidate box, the first preset weight being smaller than the second preset weight; and determining, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.
  5. The method according to claim 1, wherein taking, for each key point of the target candidate boxes corresponding to each annotation box, the average position of that key point in the candidate poses in the target candidate boxes comprises:
    for each key point within each target candidate box corresponding to each annotation box, determining whether the distance between that key point and the corresponding key point of the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is less than or equal to the threshold, determining, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.
  6. An image processing apparatus, comprising:
    an acquisition unit configured to acquire an image in which the poses of objects have been annotated, wherein the image contains at least two objects, different objects have different poses, and each pose is indicated by a plurality of key points; and
    a training unit configured to train a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network, the training process comprising:
    inputting the image into the convolutional neural network, and determining candidate poses of each object based on anchor poses previously set for the convolutional neural network; determining the degree of overlap between the candidate box of each candidate pose and the annotation box of each annotated pose, and taking candidate boxes whose overlap is greater than a preset overlap threshold as target candidate boxes; for each key point within the target candidate boxes corresponding to each annotation box, taking the average position of that key point across the target candidate boxes; and taking the set of average key point positions as one pose detected in the image.
  7. The apparatus according to claim 6, wherein the apparatus further comprises:
    a clustering unit configured to cluster a plurality of preset poses in a target image to obtain key point sets; and
    a determining unit configured to determine each key point set as an anchor pose, wherein the key points included in different key point sets have different positions in the target image.
  8. The apparatus according to claim 7, wherein the clustering unit is further configured to:
    cluster the multi-dimensional vectors corresponding to the preset poses, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of key points of the preset pose; and
    form a key point set from the key points of the preset pose corresponding to the multi-dimensional vector of each cluster center.
  9. The apparatus according to claim 6, wherein the training unit is further configured to:
    for each key point within each target candidate box corresponding to each annotation box, in response to determining that the position of the key point is outside the annotation box, take a preset first weight as the weight of that key point in that target candidate box; in response to determining that the position of the key point is inside the annotation box, take a preset second weight as the weight of that key point in that target candidate box, the first preset weight being smaller than the second preset weight; and determine, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.
  10. The apparatus according to claim 6, wherein the training unit is further configured to:
    for each key point within each target candidate box corresponding to each annotation box, determine whether the distance between that key point and the corresponding key point of the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is less than or equal to the threshold, determine, based on the weights of the key point in the target candidate boxes corresponding to the annotation box, the average position of that key point across the target candidate boxes.
  11. An electronic device, comprising:
    one or more processors; and
    a storage device for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-5.
  12. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
PCT/CN2018/115968 2018-09-29 2018-11-16 Image processing method and apparatus WO2020062493A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811149818.4 2018-09-29
CN201811149818.4A CN109389640A (en) 2018-09-29 2018-09-29 Image processing method and device

Publications (1)

Publication Number Publication Date
WO2020062493A1 true WO2020062493A1 (en) 2020-04-02

Family

ID=65418681

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/115968 WO2020062493A1 (en) 2018-09-29 2018-11-16 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN109389640A (en)
WO (1) WO2020062493A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163841B (en) * 2019-04-12 2021-05-14 中科微至智能制造科技江苏股份有限公司 Method, device and equipment for detecting surface defects of object and storage medium
US10885625B2 (en) 2019-05-10 2021-01-05 Advanced New Technologies Co., Ltd. Recognizing damage through image analysis
CN110569703B (en) * 2019-05-10 2020-09-01 阿里巴巴集团控股有限公司 Computer-implemented method and device for identifying damage from picture
CN110378244B (en) * 2019-05-31 2021-12-31 曹凯 Abnormal posture detection method and device
CN112132913A (en) * 2019-06-25 2020-12-25 北京字节跳动网络技术有限公司 Image processing method, image processing apparatus, image processing medium, and electronic device
CN110738125B (en) * 2019-09-19 2023-08-01 平安科技(深圳)有限公司 Method, device and storage medium for selecting detection frame by Mask R-CNN
CN110765942A (en) * 2019-10-23 2020-02-07 睿魔智能科技(深圳)有限公司 Image data labeling method, device, equipment and storage medium
CN111695540B (en) * 2020-06-17 2023-05-30 北京字节跳动网络技术有限公司 Video frame identification method, video frame clipping method, video frame identification device, electronic equipment and medium
CN112907583B (en) * 2021-03-29 2023-04-07 苏州科达科技股份有限公司 Target object posture selection method, image scoring method and model training method
CN112819937B (en) * 2021-04-19 2021-07-06 清华大学 Self-adaptive multi-object light field three-dimensional reconstruction method, device and equipment
CN113326901A (en) * 2021-06-30 2021-08-31 北京百度网讯科技有限公司 Image annotation method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080187172A1 (en) * 2004-12-02 2008-08-07 Nobuyuki Otsu Tracking Apparatus And Tracking Method
CN106355188A (en) * 2015-07-13 2017-01-25 阿里巴巴集团控股有限公司 Image detection method and device
CN107358149A (en) * 2017-05-27 2017-11-17 深圳市深网视界科技有限公司 A kind of human body attitude detection method and device
CN107463903A (en) * 2017-08-08 2017-12-12 北京小米移动软件有限公司 Face key independent positioning method and device
CN107909005A (en) * 2017-10-26 2018-04-13 西安电子科技大学 Personage's gesture recognition method under monitoring scene based on deep learning
CN108229445A (en) * 2018-02-09 2018-06-29 深圳市唯特视科技有限公司 A kind of more people's Attitude estimation methods based on cascade pyramid network

Also Published As

Publication number Publication date
CN109389640A (en) 2019-02-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18935375

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.06.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18935375

Country of ref document: EP

Kind code of ref document: A1