WO2023284401A1 - Image beautification processing method and device, storage medium, and electronic equipment - Google Patents
- Publication number: WO2023284401A1 (application PCT/CN2022/093127)
- Authority: WIPO (PCT)
Classifications
- G06T5/77 — Retouching; Inpainting; Scratch removal (G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T5/00—Image enhancement or restoration)
- G06T2207/30201 — Face (G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/30—Subject of image; Context of image processing; G06T2207/30196—Human being; Person)
Definitions
- the present disclosure relates to the technical field of image and video processing, and in particular to an image beautification processing method, an image beautification processing device, a computer-readable storage medium, and electronic equipment.
- Beautification refers to the use of image processing technology to beautify the portraits in images or videos to better meet the aesthetic needs of users.
- the disclosure provides an image beautification processing method, an image beautification processing device, a computer-readable storage medium, and electronic equipment.
- an image beautification processing method, including: acquiring an original image to be beautified from continuous multi-frame images; matching the face in the original image to be beautified with the face in a reference frame image of the original image to be beautified, and determining a stable bounding box of the face in the original image to be beautified according to the matching result; extracting an original face sub-image from the original image to be beautified based on the stable bounding box; processing the original face sub-image with an image beautification network to obtain a corresponding beautified face sub-image; and generating a target beautified image corresponding to the original image to be beautified according to the beautified face sub-image.
- an image beautification processing device, including: an image acquisition module configured to acquire an original image to be beautified from a video; a face matching module configured to match the face in the original image to be beautified with the face in a reference frame image of the original image to be beautified, and to determine a stable bounding box of the face in the original image to be beautified according to the matching result; a sub-image extraction module configured to extract an original face sub-image from the original image to be beautified based on the stable bounding box; a beautification processing module configured to process the original face sub-image with an image beautification network to obtain a corresponding beautified face sub-image; and an image generation module configured to generate a target beautified image corresponding to the original image to be beautified according to the beautified face sub-image.
- a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the image beautification processing method of the first aspect and its possible implementations is carried out.
- an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the executable instructions to implement the image beautification processing method of the first aspect and its possible implementation manners.
- FIG. 1 shows a schematic diagram of a system architecture in this exemplary embodiment
- FIG. 2 shows a schematic structural diagram of an electronic device in this exemplary embodiment
- FIG. 3 shows a flow chart of an image beautification processing method in this exemplary embodiment
- Fig. 4 shows a flow chart of determining a stable bounding box in this exemplary embodiment
- Fig. 5 shows a flow chart of acquiring a beautifying human face sub-image in this exemplary embodiment
- Fig. 6 shows a flow chart of combining original human face sub-images in this exemplary embodiment
- Fig. 7 shows a schematic diagram of combining original human face sub-images in this exemplary embodiment
- Fig. 8 shows a schematic structural diagram of an image beautification network in this exemplary embodiment
- FIG. 9 shows a schematic structural diagram of another image beautification network in this exemplary embodiment.
- Fig. 10 shows a flow chart of using an image beautification network to process a face image to be beautified in this exemplary embodiment
- Fig. 11 shows a flow chart of training an image beautification network in this exemplary embodiment
- FIG. 12 shows a schematic diagram of training an image beautification network in this exemplary embodiment
- Fig. 13 shows a schematic diagram of a boundary region gradient processing in this exemplary embodiment
- Fig. 14 shows a schematic flowchart of an image beautification processing method in this exemplary embodiment
- Fig. 15 shows a schematic structural diagram of an image beautification processing device in this exemplary embodiment
- Fig. 16 shows a schematic structural diagram of another image beautification processing device in this exemplary embodiment.
- Example embodiments will now be described more fully with reference to the accompanying drawings.
- Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of example embodiments to those skilled in the art.
- the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
- numerous specific details are provided in order to give a thorough understanding of embodiments of the present disclosure.
- those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, devices, steps, etc. adopted.
- well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
- in related technologies, image beautification processing usually includes multiple fixed algorithm stages, such as image feature calculation based on hand-crafted design, spatial filtering, and layer fusion.
- in practice, lighting conditions are complex and diverse, and the skin conditions of subjects vary widely; the above methods cannot cope well with these different situations, resulting in unsatisfactory beautification effects.
- exemplary embodiments of the present disclosure provide an image beautification processing method.
- the system architecture and application scenarios of the operating environment in this exemplary embodiment will be exemplarily described below in conjunction with FIG. 1 .
- FIG. 1 shows a schematic diagram of a system architecture
- the system architecture 100 may include a terminal 110 and a server 120 .
- the terminal 110 may be a terminal device such as a smart phone, a tablet computer, a desktop computer, or a notebook computer
- the server 120 generally refers to a background system that provides services related to image beautification in this exemplary embodiment, and may be a single server or a cluster formed by multiple servers.
- the terminal 110 and the server 120 may be connected through a wired or wireless communication link for data exchange.
- the terminal 110 may take pictures or obtain images or videos to be beautified by other means, and upload them to the server 120 .
- for example, the user opens a beautification App (application program) on the terminal 110, selects an image or video to be beautified from the photo album, and uploads it to the server 120 for beautification; or the user opens a live-broadcast App on the terminal 110, enables its beautification function, and uploads the video collected in real time to the server 120 for beautification.
- the server 120 executes the above image beautification processing method to obtain a beautified image or video, and returns it to the terminal 110 .
- in one embodiment, the server 120 can train the image beautification network and send the trained network to the terminal 110 for deployment; for example, the relevant data of the image beautification network can be packaged in the update package of the above-mentioned beautification App or live-broadcast App, so that the terminal 110 can obtain the image beautification network by updating the App and deploy it locally. Furthermore, after the terminal 110 captures or otherwise obtains the image or video to be beautified, it can invoke the image beautification network to implement image or video beautification by executing the above image beautification processing method.
- in another embodiment, the training of the image beautification network can be performed by the terminal 110: for example, the basic structure of the network is obtained from the server 120 and trained with a local data set; or the data set is obtained from the server 120 to train a locally constructed network; or the network is trained without relying on the server 120 at all. Further, the terminal 110 may invoke the image beautification network to implement image or video beautification processing by executing the above image beautification processing method.
- the execution subject of the image beautification processing method in this exemplary embodiment may be the terminal 110 or the server 120, which is not limited in the present disclosure.
- Exemplary embodiments of the present disclosure also provide an electronic device for performing the above-mentioned image beautification network training method or image beautification processing method, and the electronic device may be the above-mentioned terminal 110 or server 120 .
- taking the mobile terminal 200 in FIG. 2 as an example, the structure of the above-mentioned electronic device is exemplarily described below.
- the configuration in Fig. 2 can also be applied to equipment of a stationary type.
- the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a USB (Universal Serial Bus) interface 230, a charging management module 240, a power management module 241, a battery 242, antenna 1, antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, buttons 294, and a SIM (Subscriber Identification Module) card interface 295, etc.
- Processor 210 can include one or more processing units, for example: processor 210 can include AP (Application Processor, application processor), modem processor, GPU (Graphics Processing Unit, graphics processing unit), ISP (Image Signal Processor, image signal processor), controller, encoder, decoder, DSP (Digital Signal Processor, digital signal processor), baseband processor and/or NPU (Neural-Network Processing Unit, neural network processor), etc.
- the image beautification network in this exemplary embodiment can run on the GPU, DSP, or NPU; the DSP and NPU usually run the network with int (integer) data, while the GPU usually uses float (floating-point) data.
- when running on the DSP or NPU, the power consumption is lower, the response speed is faster, and the precision is lower; when running on the GPU, the power consumption is higher, the response speed is slower, and the precision is higher.
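The int-versus-float tradeoff above can be made concrete with a small sketch. This is an illustration of integer quantization in general, not the patent's scheme; the scale of 127 levels is an assumption:

```python
# Illustrative sketch (not the patent's implementation): quantizing a
# normalized float pixel value to int8 for DSP/NPU inference, then
# measuring the precision lost relative to the float path on a GPU.

def quantize_to_int8(x: float, scale: float = 127.0) -> int:
    """Map a normalized pixel value in [0, 1] to an int8 level."""
    return max(-128, min(127, round(x * scale)))

def dequantize(q: int, scale: float = 127.0) -> float:
    """Recover an approximate float value from the integer level."""
    return q / scale

pixel = 0.7368                  # a normalized float pixel value
q = quantize_to_int8(pixel)     # integer representation for DSP/NPU
restored = dequantize(q)
error = abs(pixel - restored)   # precision traded for speed and power
assert error < 1.0 / 254        # error bounded by half a quantization level
```

The bounded but nonzero `error` is exactly the "lower precision" the passage refers to; integer arithmetic is cheaper on a DSP/NPU, which is why power consumption drops.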
- the encoder can encode (compress) image or video data, for example encoding the image or video obtained after beautification to form corresponding code-stream data, so as to reduce the bandwidth occupied by data transmission; the decoder can decode (decompress) image or video code-stream data to restore the image or video data, for example decoding the video to be beautified to obtain the image data of each frame, and extracting the original images to be beautified for beautification processing.
- the mobile terminal 200 can process images or videos in multiple encoding formats, for example image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG (Moving Picture Experts Group) 1, MPEG2, H.263, H.264, and HEVC (High Efficiency Video Coding).
- the processor 210 may include one or more interfaces, and form connections with other components of the mobile terminal 200 through different interfaces.
- the internal memory 221 may be used to store computer-executable program codes including instructions.
- the internal memory 221 may include volatile memory and non-volatile memory.
- the processor 210 executes various functional applications and data processing of the mobile terminal 200 by executing instructions stored in the internal memory 221 .
- the external memory interface 222 can be used to connect an external memory, such as a Micro SD card, to expand the storage capacity of the mobile terminal 200.
- the external memory communicates with the processor 210 through the external memory interface 222 to implement a data storage function, such as storing images, videos and other files.
- the USB interface 230 is an interface conforming to the USB standard specification, and can be used to connect a charger to charge the mobile terminal 200 , and can also be connected to earphones or other electronic devices.
- the charging management module 240 is configured to receive charging input from the charger. While the charging management module 240 is charging the battery 242, it can also supply power to the device through the power management module 241; the power management module 241 can also monitor the state of the battery.
- the wireless communication function of the mobile terminal 200 can be realized by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- the mobile communication module 250 can provide 2G, 3G, 4G, 5G and other mobile communication solutions applied on the mobile terminal 200 .
- the wireless communication module 260 can provide wireless communication solutions applied on the mobile terminal 200, such as WLAN (Wireless Local Area Network, e.g. a Wi-Fi (Wireless Fidelity) network), BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), and IR (Infrared).
- the mobile terminal 200 can realize a display function and display a user interface through the GPU, the display screen 290 and the AP. For example, when the user performs camera detection, the mobile terminal 200 may display an interface of a camera detection App (Application, application program) on the display screen 290 .
- the mobile terminal 200 can realize the shooting function through the ISP, camera module 291 , encoder, decoder, GPU, display screen 290 and AP.
- the user can enable the image or video capture function in the hidden camera detection App, and at this time, the image of the space to be detected can be collected through the camera module 291 .
- the mobile terminal 200 can implement audio functions through an audio module 270 , a speaker 271 , a receiver 272 , a microphone 273 , an earphone interface 274 , and an AP.
- the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, etc., so as to realize corresponding sensing and detection functions.
- the indicator 292 can be an indicator light, which can be used to indicate the charging status, the change of the battery capacity, and also can be used to indicate messages, missed calls, notifications and so on.
- the motor 293 can generate vibration prompts, and can also be used for touch vibration feedback and the like.
- the keys 294 include a power key, a volume key and the like.
- the mobile terminal 200 may support one or more SIM card interfaces 295 for connecting SIM cards to implement functions such as calling and mobile communication.
- FIG. 3 shows an exemplary flow of the image beautification processing method, which may include:
- Step S310: acquiring an original image to be beautified from continuous multi-frame images;
- Step S320: matching the face in the original image to be beautified with the face in the reference frame image of the original image to be beautified, and determining the stable bounding box of the face in the original image to be beautified according to the matching result;
- Step S330: extracting the original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified;
- Step S340: using the image beautification network to process the original face sub-image to obtain the corresponding beautified face sub-image;
- Step S350: generating a target beautified image corresponding to the original image to be beautified according to the beautified face sub-image.
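The five steps above can be sketched as a single pipeline skeleton. This is a minimal illustration, not the patent's implementation: `detect_face`, `match_and_stabilize`, `beautify_net`, and `paste_back` are hypothetical stand-in callables, and a frame is treated as a nested list of pixel rows:

```python
# Minimal sketch of steps S310-S350 as one function; all callables are
# hypothetical placeholders for the detector, matcher, network, and
# compositing step described in the text.

def beautify_frame(frame, reference_frame, detect_face, match_and_stabilize,
                   beautify_net, paste_back):
    # S310: `frame` is the original image taken from a continuous sequence.
    # S320: match the detected face against the reference frame and derive
    #       a stable bounding box (x, y, width, height) from the result.
    stable_box = match_and_stabilize(detect_face(frame), reference_frame)
    # S330: crop the original face sub-image using the stable bounding box.
    x, y, w, h = stable_box
    face_sub = [row[x:x + w] for row in frame[y:y + h]]
    # S340: run the image beautification network on the sub-image.
    beautified_sub = beautify_net(face_sub)
    # S350: write the beautified sub-image back into the full frame.
    return paste_back(frame, beautified_sub, stable_box)
```

The point of the structure is that the (expensive) network in S340 only ever sees the cropped sub-image, while S320 keeps that crop temporally stable across frames.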
- the image beautification network can be trained to realize any one or a combination of multiple beautification functions, including but not limited to blemish removal, deformation, skin tone adjustment, skin smoothing, light and shadow adjustment, etc. Therefore, the image beautification processing method in FIG. 3 can be regarded as one stage of beautification processing, and other stages of beautification processing can be added before or after it.
- for example, the image beautification network may be used to remove blemishes from images. After the original image to be beautified is acquired, it is processed through the image beautification processing method in FIG. 3, and the obtained target beautified image is a blemish-removed image. Subsequently, personalized beautification processing can be performed on this blemish-removed image to obtain the final beautified image.
- generally, blemish removal is a necessary part of image beautification, and users' demand for it is relatively fixed, so a generalized blemish removal process can be realized through the image beautification processing method in FIG. 3.
- in contrast, beautification functions such as skin smoothing, deformation, three-dimensional enhancement, skin tone adjustment, and light and shadow adjustment are not necessary, and users' specific needs for these functions are personalized.
- these beautification functions can be called personalized beautification processing, which usually requires the user to perform specific settings before processing; for example, the user selects one or more of the beautification functions and sets parameters such as the degree of skin smoothing or deformation, and then the terminal or server performs the processing according to the user's settings.
- the present disclosure does not limit the sequence of the image beautification process in FIG. 3 and other beautification processes.
- the stable bounding box of the face is determined, and then the original face sub-image is extracted for beautification processing, so that the stable bounding box of the face in the original image to be beautified inherits the relevant information in the reference frame image to a certain extent.
- the faces extracted from the original images to be beautified in different frames have a certain continuity and stability, and will not undergo drastic changes, thereby ensuring the consistency of the effect of beautifying the face.
- the image beautification network can be used for blemish removal or other beautification functions to replace the multiple fixed algorithm stages in related technologies, which increases the flexibility of image beautification processing, adapts to various lighting and skin conditions, improves the image beautification effect, and reduces time and memory consumption.
- in step S310, the original image to be beautified is acquired from continuous multi-frame images.
- the continuous multi-frame images may be videos, or continuously shot images and the like.
- the continuous multi-frame images are objects that need to be beautified. Taking a video as an example, it may be a video stream currently being shot or received in real time, or a complete video that has been shot or received, such as a piece of video stored locally.
- This disclosure does not limit the parameters such as video frame rate and image resolution.
- for example, the video frame rate can be 30 fps (frames per second), 60 fps, 120 fps, etc., and the image resolution can be 720P, 1080P, 4K, etc.
- beautification processing can be performed on each frame of the original image in the video, or a part of the original images can be selected from the video for beautification processing; the original image that needs beautification processing is called the original image to be beautified.
- At least two frames of original images to be beautified may be acquired from a video.
- for example, each original image containing the target face can be used as an original image to be beautified, or a frame-interval strategy can be adopted, obtaining one original image to be beautified every certain number of frames.
- each frame of the received original image may be used as the original image to be beautified.
- in step S320, the face in the original image to be beautified is matched with the face in the reference frame image of the original image to be beautified, and the stable bounding box of the face in the original image to be beautified is determined according to the matching result.
- a bounding box refers to an area in an image that surrounds a human face and has a certain geometric shape.
- the present disclosure does not limit the shape of the bounding box, such as a rectangle, a trapezoid, or any other shape.
- the initially detected bounding box of the face area is called a basic bounding box, which may be, for example, the minimum bounding box containing the face, or a face frame obtained by a correlation algorithm.
- the basic bounding box is then optimized, for example by expansion or position correction, and the optimized bounding box is called a stable bounding box.
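One possible "expansion" step can be sketched as enlarging the basic bounding box by a margin and clamping it to the image boundary. The 20% margin is an illustrative assumption, not a value taken from the patent:

```python
# Sketch of expanding a basic bounding box (x, y, w, h) into a larger
# candidate stable bounding box, clamped to stay inside the image.

def expand_box(box, img_w, img_h, margin=0.2):
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    nx, ny = max(0, x - dx), max(0, y - dy)
    # Clamp the enlarged box so it never extends past the image edges.
    nw = min(img_w - nx, w + 2 * dx)
    nh = min(img_h - ny, h + 2 * dy)
    return (nx, ny, nw, nh)

assert expand_box((10, 10, 50, 50), 100, 100) == (0, 0, 70, 70)
```

Expanding before cropping gives the beautification network some context around the face and tolerates small detection jitter.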
- face detection can be performed on the original image to be beautified to obtain relevant information about the face.
- This disclosure does not limit the face detection algorithm.
- for example, the key points of the face, including the key points of the face boundary, can be detected through a specific neural network; the basic bounding box of the face is generated according to the key points of the face boundary, and the stable bounding box is obtained through optimization.
- the reference frame image can be any frame, among the above-mentioned continuous multi-frame images, for which the stable bounding box of the face has been determined or the beautification process has been completed.
- for example, the previous frame image may be used as the reference frame image.
- the stable bounding box of the face in the original image to be beautified can be determined based on the stable bounding box of the face in the reference frame image.
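This passage does not give a formula for how the current box inherits from the reference frame. One common way to achieve that kind of inheritance is exponential smoothing toward the reference frame's stable box; the blend factor below is purely illustrative:

```python
# Sketch (assumption, not the patent's formula): blend the current
# frame's detected box toward the reference frame's stable box so small
# detection jitter does not move the crop.

def smooth_box(current, reference, alpha=0.7):
    """Blend a (x, y, w, h) detection toward the reference frame's box."""
    return tuple(round(alpha * r + (1 - alpha) * c)
                 for c, r in zip(current, reference))

# A small jitter in the detection barely moves the stabilized box.
assert smooth_box((102, 101, 60, 60), (100, 100, 60, 60)) == (101, 100, 60, 60)
```

Whatever the exact rule, the effect described in the text is the same: consecutive crops change gradually rather than jumping with every detection.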
- matching the face in the original image to be beautified with the face in the reference frame image, and determining the stable bounding box of the face in the original image to be beautified according to the matching result, may include the following steps S410 to S430:
- Step S410: detect the face in the original image to be beautified, record it as a face to be determined, and match the face to be determined with the determined faces in the reference frame image of the original image to be beautified.
- a face to be determined refers to a face that needs to be beautified but whose stable bounding box has not yet been determined, which can be regarded as a face with an unknown identity;
- a determined face refers to a face whose stable bounding box has been determined, which can be regarded as a face with a known identity; the faces whose stable bounding boxes have been determined in the reference frame image are all determined faces.
- the human face detected in the original image to be beautified is a human face whose stable bounding box has not been determined, that is, the human face to be determined.
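This excerpt does not specify the matching criterion used in step S410. One common choice is intersection-over-union (IoU) overlap between boxes, sketched here as an illustration; the 0.5 threshold is an assumption:

```python
# Sketch of matching a to-be-determined face box against the determined
# face boxes of the reference frame using IoU overlap (an illustrative
# criterion, not necessarily the patent's).

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def match_face(pending, determined_boxes, threshold=0.5):
    """Index of the best-overlapping determined face, or None if no match."""
    best = max(range(len(determined_boxes)),
               key=lambda i: iou(pending, determined_boxes[i]),
               default=None)
    if best is None or iou(pending, determined_boxes[best]) < threshold:
        return None
    return best
```

A successful match lets the pending face inherit the matched face's identity and stable bounding box; an unmatched face is treated as newly appeared.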
- all faces can be detected in the original image to be beautified by the face detection algorithm, which may include faces that do not need to be beautified (such as the faces of distant passers-by).
- the detected faces can be filtered by the face area threshold.
- the face area threshold can be set based on experience or the size of the original image to be beautified.
- the face area threshold can be the size of the original image to be beautified * 0.05. If the area of the basic bounding box of a face is greater than or equal to the face area threshold, it is a face that needs to be beautified: information such as its basic bounding box can be retained, and the face is recorded as a face to be determined. If the area of the basic bounding box of a face is smaller than the face area threshold, it is a face that does not need beautification: related information such as its basic bounding box can be deleted without subsequent processing.
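The area-threshold filtering described above can be sketched as follows (a minimal illustration; reading "the size of the original image to be beautified * 0.05" as 5% of the image's pixel area is an assumption, and the helper name is hypothetical):

```python
def filter_faces(bboxes, image_w, image_h, ratio=0.05):
    """Keep only faces whose basic-bounding-box area reaches the face area
    threshold. Boxes are [bb0, bb1, bb2, bb3] = [x_min, y_min, x_max, y_max]."""
    threshold = image_w * image_h * ratio
    return [bb for bb in bboxes
            if (bb[2] - bb[0]) * (bb[3] - bb[1]) >= threshold]
```

Faces filtered out here receive no subsequent processing, as stated in the text.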
- an upper limit on the number of original face sub-images can be set, that is, a maximum number of faces to be determined; for example, it can be set to 4. If the number of retained faces is still greater than 4 after filtering by the above-mentioned face area threshold, 4 faces to be determined can be further screened out, such as the 4 faces with the largest area, or the 4 faces closest to the center of the original image to be beautified, so that 4 original face sub-images are correspondingly cropped subsequently, and no subsequent processing is performed on the other faces.
- multiple beautification passes can also be performed: first, 4 faces are selected as faces to be determined, and their corresponding original face sub-images are cropped and beautified; then other faces are selected as faces to be determined, and their corresponding original face sub-images are cropped and beautified, so as to complete the beautification of all faces in the image to be processed whose basic bounding box area is larger than the face area threshold.
- an ID may be assigned to each face. For example, starting from the first frame, assign an ID to each face; after detecting faces in each subsequent frame, match each face with the faces in the previous frame; if the match is successful, inherit the face ID and other related information from the previous frame; if the match is unsuccessful, assign a new ID and treat it as a new face.
- This disclosure does not limit the method of matching the faces to be determined with the determined faces.
- a face recognition algorithm can be used to identify and compare each face to be determined with each determined face; if the degree of similarity is higher than the preset similarity threshold, it is determined that the face to be determined is successfully matched with the determined face.
- alternatively, the degree of overlap between the bounding box of the face to be determined and the bounding box of the determined face can be used to determine whether the match is successful.
- An exemplary way of calculating the degree of overlap is provided below:
- after the overlapping degree is determined, if the overlapping degree reaches a preset overlapping degree threshold, it is determined that the face to be determined is successfully matched with the determined face.
- the overlap threshold can be set according to experience and actual needs, for example, it can be set to 0.75.
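The overlap degree between two bounding boxes is commonly computed as intersection-over-union; a minimal sketch (the disclosure does not give its exact formula, so IoU is an assumption):

```python
def iou(box_a, box_b):
    """Overlap degree of two boxes [bb0, bb1, bb2, bb3] = [x_min, y_min, x_max, y_max]."""
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)  # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

With the 0.75 threshold from the text, a pair of faces would be matched when `iou(...) >= 0.75`.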
- any one of the basic bounding box of the face to be determined and the basic bounding box of the determined face can be iteratively transformed through the ICP (Iterative Closest Point) algorithm, and the degree of overlap between the two basic bounding boxes can be calculated according to the number of pixels with the same pixel value and the number of pixels with different pixel values between the finally transformed basic bounding box and the other basic bounding box, thereby judging whether the matching is successful.
- matching calculations can also be performed separately between each face to be determined and each determined face to obtain a similarity matrix or an overlap matrix; the Hungarian algorithm can then be used to achieve the global maximum match, after which whether each match is successful is judged based on the similarity or overlap between the face to be determined and the determined face.
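A sketch of this global maximum matching step (brute force is used here for clarity; in practice the Hungarian algorithm computes the same optimum efficiently, and the matrix values are illustrative):

```python
from itertools import combinations, permutations

def best_global_match(overlap, threshold=0.75):
    """Global maximum matching over an overlap (or similarity) matrix.
    overlap[i][j]: overlap between face-to-be-determined i and determined face j.
    Returns the (i, j) pairs of the optimal assignment whose overlap reaches
    the threshold."""
    n_rows, n_cols = len(overlap), len(overlap[0])
    k = min(n_rows, n_cols)
    best_score, best_pairs = -1.0, []
    # try every assignment of k rows to k distinct columns
    for rows in combinations(range(n_rows), k):
        for cols in permutations(range(n_cols), k):
            score = sum(overlap[r][c] for r, c in zip(rows, cols))
            if score > best_score:
                best_score, best_pairs = score, list(zip(rows, cols))
    # a pair is a successful match only if its own overlap reaches the threshold
    return [(r, c) for r, c in best_pairs if overlap[r][c] >= threshold]
```

Rows left unmatched become new faces and receive a new ID, as described above.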
- Step S420: if the match between the face to be determined and the determined faces is not successful, expand the basic bounding box of the face to be determined according to the first preset parameter to obtain the stable bounding box of the face to be determined;
- the first preset parameter is an expansion parameter for the basic bounding box of a new face, which can be determined according to experience or actual needs, for example, the width and height of the basic bounding box can be expanded by 1/4.
- the basic bounding box of the face to be determined is expressed as [bb0, bb1, bb2, bb3], where bb0 is the abscissa of the upper left point of the basic bounding box, bb1 is the ordinate of the upper left point, bb2 is the abscissa of the lower right point, and bb3 is the ordinate of the lower right point; the width of the basic bounding box is w, and the height is h.
- the first preset parameter is represented by E1.
- the center point coordinates of the stable bounding box are equal to the center point coordinates of the basic bounding box, namely center_x = (bb0 + bb2) / 2 and center_y = (bb1 + bb3) / 2, where:
- center_x represents the x-coordinate of the center point of the stable bounding box of the face to be determined
- center_y represents the y-coordinate of the center point of the stable bounding box of the face to be determined.
- expand_bb0 is the abscissa of the upper left point of the stable bounding box
- expand_bb1 is the ordinate of the upper left point of the stable bounding box
- expand_bb2 is the abscissa of the lower right point of the stable bounding box
- expand_bb3 is the ordinate of the lower right point of the stable bounding box.
- the above-mentioned coordinates are usually pixel coordinates in the image, which are integers; therefore, float type data can be used for calculation and for caching intermediate results, and rounding is performed only when calculating the final results (including the above-mentioned expand_w, expand_h, center_x, center_y, expand_bb0, expand_bb1, expand_bb2, expand_bb3), which are then saved as int type data.
- center_x_float and center_y_float represent the coordinates of the center point stored in float type data
- center_x and center_y represent the coordinates of the center point stored in int type data
- int() represents the rounding operation.
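Combining the expansion and the float/int handling above, a sketch (assuming "expanded by 1/4" means multiplying the width and height by (1 + E1) with E1 = 0.25):

```python
def expand_bbox(bb, e1=0.25):
    """Expand a basic bounding box [bb0, bb1, bb2, bb3] about its center to
    obtain the stable bounding box [expand_bb0, ..., expand_bb3]."""
    w = bb[2] - bb[0]
    h = bb[3] - bb[1]
    center_x = (bb[0] + bb[2]) / 2.0
    center_y = (bb[1] + bb[3]) / 2.0
    expand_w = w * (1.0 + e1)
    expand_h = h * (1.0 + e1)
    # calculate in float, round only the final coordinates, save as int
    return [int(round(center_x - expand_w / 2)),
            int(round(center_y - expand_h / 2)),
            int(round(center_x + expand_w / 2)),
            int(round(center_y + expand_h / 2))]
```

Per the text, coordinates falling outside the original image to be beautified would additionally be clamped to the image boundary.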
- Step S430: if the face to be determined is successfully matched with a determined face, determine the stable bounding box of the face to be determined according to the stable bounding box of the determined face.
- generally, the face to be determined in the original image to be beautified will not change too much compared to the matching determined face in the reference frame image, which is reflected in its position change and size change; so, on the basis of the stable bounding box of the determined face, appropriate position and size changes can be performed to obtain the stable bounding box of the face to be determined.
- the position and size of the stable bounding box of the determined face can be changed according to the position change parameters and size change parameters of the basic bounding box of the face to be determined relative to the basic bounding box of the determined face.
- the determination of the stable bounding box of the face to be determined according to the stable bounding box of the determined face may include the following steps:
- the coordinates of the center point of the stable bounding box of the determined face and the coordinates of the center point of the basic bounding box of the face to be determined are weighted to obtain the coordinates of the center point of the stable bounding box of the face to be determined.
- the center point coordinates of the two are weighted by a preset stability coefficient, which may be the weight of the stable bounding box of the determined face, and may be determined based on experience or the actual scene.
- generally, the faster the face moves, the smaller the preset stability coefficient should be.
- the preset stability coefficient can be set to 0.9; then the coordinates of the center point of the stable bounding box of the face to be determined are calculated as center_x = 0.9 * pre_center_x + 0.1 * (bb0 + bb2) / 2 and center_y = 0.9 * pre_center_y + 0.1 * (bb1 + bb3) / 2 (formula (7)), where:
- pre_center_x represents the x-coordinate of the center point of the stable bounding box of the determined face
- pre_center_y represents the y-coordinate of the center point of the stable bounding box of the determined face.
- formula (7) indicates that the weight of the center point coordinates of the stable bounding box of the determined face is 0.9, and the weight of the center point coordinates of the basic bounding box of the face to be determined is 0.1; the two center point coordinates are weighted to obtain the coordinates of the center point of the stable bounding box of the face to be determined.
- pre_center_x_float is the saved float data of pre_center_x
- pre_center_y_float is the saved float data of pre_center_y
- this essentially adopts a momentum-update mechanism for the coordinates of the center point, which can prevent the center point of the stable bounding box of the same face from moving excessively from the reference frame image to the original image to be beautified, which would cause the subsequently cropped original face sub-image to shake and affect the beautification effect.
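The momentum-style center update can be sketched as follows (reconstructed from the stated 0.9/0.1 weighting; the function name is illustrative, and float values are cached between frames to avoid accumulating rounding error, as described above):

```python
STABILITY = 0.9  # preset stability coefficient (weight of the determined face)

def update_center(pre_center_float, basic_bb):
    """Momentum-style update of the stable-bounding-box center.
    pre_center_float: (pre_center_x_float, pre_center_y_float) of the matched
    determined face; basic_bb: [bb0, bb1, bb2, bb3] of the face to be determined."""
    bx = (basic_bb[0] + basic_bb[2]) / 2.0  # center of the basic bounding box
    by = (basic_bb[1] + basic_bb[3]) / 2.0
    cx = STABILITY * pre_center_float[0] + (1 - STABILITY) * bx
    cy = STABILITY * pre_center_float[1] + (1 - STABILITY) * by
    # int coordinates are used for cropping; floats are cached for the next frame
    return (cx, cy), (int(cx), int(cy))
```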
- the determination of the stable bounding box of the face to be determined according to the stable bounding box of the determined face may include the following steps:
- if the size of the basic bounding box of the face to be determined is greater than or equal to the product of the size of the stable bounding box of the determined face and a first magnification, the size of the stable bounding box of the determined face is expanded according to a second preset parameter to obtain the size of the stable bounding box of the face to be determined;
- if the size of the basic bounding box of the face to be determined is smaller than or equal to the product of the size of the stable bounding box of the determined face and a second magnification, the size of the stable bounding box of the determined face is reduced according to a third preset parameter to obtain the size of the stable bounding box of the face to be determined; the first magnification is greater than the second magnification;
- if the size of the basic bounding box of the face to be determined is smaller than the product of the size of the stable bounding box of the determined face and the first magnification, and greater than the product of the size of the stable bounding box of the determined face and the second magnification, the size of the stable bounding box of the determined face is used as the size of the stable bounding box of the face to be determined.
- the above steps indicate that according to the comparison result of the size of the basic bounding box of the face to be determined and the size of the stable bounding box of the determined face, the calculation is divided into three cases respectively.
- the first magnification and the second magnification can be integer magnifications or non-integer magnifications. In one embodiment, the first magnification is greater than or equal to 1, and the second magnification is less than 1. Exemplarily, the first magnification may be 1, and the second magnification may be 0.64.
- the width and the height can be compared and calculated separately; for example, if the comparison result of the width belongs to the first case above and the comparison result of the height belongs to the second case, then the width and height of the stable bounding box of the face to be determined are calculated under the two cases respectively.
- if the third case above is satisfied, the size of the stable bounding box of the face to be determined is equal to the size of the stable bounding box of the determined face, that is, the size of the stable bounding box remains unchanged.
- the above-mentioned first case and the second case are both situations where the size of the face changes drastically.
- the first case is that the face becomes larger dramatically.
- in the first case, the size of the stable bounding box of the determined face is appropriately expanded according to the second preset parameter to obtain the size of the stable bounding box of the face to be determined; the second preset parameter can be determined based on experience and the actual scene. The second case is that the face shrinks drastically.
- the size of the stable bounding box of the determined face is reduced to obtain the size of the stable bounding box of the face to be determined.
- the third preset parameter can be determined according to experience and actual scene.
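The three cases above, applied per dimension (width and height separately), can be sketched as follows (the magnifications 1 and 0.64 come from the text; the expansion/reduction amounts e2 and e3 are illustrative assumptions for the second and third preset parameters):

```python
def update_size(basic_size, stable_size, mag1=1.0, mag2=0.64, e2=0.1, e3=0.1):
    """Per-dimension size update for the stable bounding box.
    basic_size: width or height of the basic bounding box of the face to be determined.
    stable_size: the same dimension of the stable bounding box of the determined face."""
    if basic_size >= stable_size * mag1:
        # case 1: the face has become dramatically larger -> expand
        return stable_size * (1.0 + e2)
    if basic_size <= stable_size * mag2:
        # case 2: the face has become dramatically smaller -> reduce
        return stable_size * (1.0 - e3)
    # case 3: keep the stable size unchanged
    return stable_size
```

Calling this once for the width and once for the height allows the two dimensions to fall into different cases, as noted above.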
- the coordinates of the upper left point and the lower right point of the stable bounding box can be calculated. If the calculated coordinates exceed the boundary of the original image to be beautified, replace the coordinates beyond the boundary with the boundary coordinates of the original image to be beautified.
- the stable bounding box can be expressed in the form of [expand_bb0, expand_bb1, expand_bb2, expand_bb3].
- the stable bounding box of the face to be determined is determined according to the stable bounding box of the determined face, so that the face to be determined inherits, to a certain extent, the information of the stable bounding box of the determined face. This ensures that the stable bounding boxes of the face across different frames have a certain continuity and stability, without drastic position or size changes, thereby ensuring the consistency of the face beautification effect in the subsequent beautification process and preventing the beautified face from flickering due to drastic changes.
- the relevant parameters of the stable bounding box can be saved, and the face to be determined can be marked as a determined face, for use in matching faces to be determined and determining stable bounding boxes in subsequent frames.
- Step S330: based on the stable bounding box of the face in the original image to be beautified, extract the original face sub-image from the original image to be beautified.
- the image within the stable bounding box is cropped from the original image to be beautified to obtain the original face sub-image.
- if the original image to be beautified includes stable bounding boxes of multiple faces, the original face sub-image corresponding to each face can be cropped.
- Step S340: use the image beautification network to process the original face sub-image to obtain the corresponding beautified face sub-image.
- each original face sub-image can be input into the image beautification network separately to obtain the beautified face sub-image corresponding to each original face sub-image; multiple original face sub-images can also be combined and input into the image beautification network for processing.
- the above-mentioned processing of the original face sub-image by using the image beautification network to obtain the corresponding beautification face sub-image may include the following steps S510 to S530:
- Step S510: based on the input image size of the image beautification network, combine the original face sub-images extracted from the original image to be beautified to generate a face image to be beautified.
- the input image size is the image size that matches the input layer of the image beautification network; this disclosure does not limit the input image size or the ratio of its long side to its short side.
- the image beautification network may be a fully convolutional network, and a fully convolutional network can process images of different sizes.
- that is, the image beautification network itself has no requirement on the input image size, but the size affects the amount of calculation, the memory usage, and the fineness of beautification.
- the size of the input image can be determined according to the fineness of beautification set by the user or the performance of the device; therefore, the image beautification network can be deployed on high-, medium-, and low-performance devices and has a wide range of applications, and there is no need to deploy different image beautification networks for different devices, which reduces the training cost of the network.
- the size of the input image may be determined to be a small value, such as width 640*height 448.
- the input image size of the image beautification network determines the clarity of the face image to be beautified. When the clarity is low, some beautification functions are unsuitable: tiny blemishes such as tiny moles and dry lip lines occupy only a small number of pixels in the face image to be beautified, so removing them may be inaccurate, affect the surrounding skin texture, or cause flickering. Therefore, the input image size of the image beautification network can be determined according to its actual deployment environment, the beautification functions of the network can then be determined accordingly, and a beautification image data set can be constructed for training. Exemplarily, when the input image size of the image beautification network is smaller than width 448 * height 320, the image beautification network can be set not to include the function of removing minor blemishes.
- the original face sub-images cropped from each frame of the original image to be beautified can be combined, so that one frame of the original image to be beautified corresponds to one frame of the face image to be beautified.
- a frame-by-frame beautification process can be performed on the video, and the original face sub-images cropped from each frame are combined in turn to obtain the face image to be beautified corresponding to each frame.
- the original human face sub-images intercepted in the multi-frame original image to be beautified can generate one or more frames of human face images to be beautified corresponding to the multi-frame original image to be beautified.
- that is, multiple frames in the video can be merged and beautified, and the original face sub-images cropped from consecutive frames can be combined arbitrarily to match the above input image size; for example, two original face sub-images are cropped from each of two consecutive frames, and these four original face sub-images are combined into one frame of the face image to be beautified.
- combining the above-mentioned one or more original face sub-images based on the input image size of the image beautification network to generate the face image to be beautified may include the following steps S610 to S630:
- Step S610: according to the number of original face sub-images, divide the input image size into sub-image sizes in one-to-one correspondence with the original face sub-images;
- Step S620: transform each original face sub-image based on its corresponding sub-image size;
- Step S630: combine the transformed original face sub-images to generate the face image to be beautified.
- in FIG. 7, Q represents the number of original face sub-images; FIG. 7 shows exemplary methods of input image size division and image combination when Q is 1 to 4.
- assuming the input image size is width 640 * height 448: when Q is 1, the sub-image size is also width 640 * height 448; when Q is 2, each sub-image size is half of the input image size, that is, width 320 * height 448; when Q is 3, the sub-image sizes are 0.5, 0.25, and 0.25 of the input image size, that is, width 320 * height 448, width 320 * height 224, and width 320 * height 224; when Q is 4, each sub-image size is 0.25 of the input image size, that is, width 320 * height 224.
- each original face sub-image is transformed to be consistent with its corresponding sub-image size. The original face sub-images can be put into one-to-one correspondence with the sub-image sizes in order of size, that is, the largest original face sub-image corresponds to the largest sub-image size, and the smallest original face sub-image corresponds to the smallest sub-image size.
- the transformed original face sub-images are combined in the manner shown in FIG. 10 to generate a face image to be beautified.
- the size of the input image can also be equally divided into Q parts to obtain Q identical sub-image sizes.
- if Q is an odd number, the input image size can be divided into Q+1 equal parts to obtain Q+1 identical sub-image sizes; two of these sub-image sizes are merged into one, and the remaining Q-1 are kept unchanged, thus obtaining Q sub-image sizes.
- alternatively, the size ratio (or area ratio) of the original face sub-images can be calculated first, such as S1 : S2 : S3 : ... : SQ, and the input image size is then divided into Q sub-image sizes according to this ratio.
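The division scheme for Q = 1 to 4 can be sketched as follows (hard-coding the FIG. 7 layout for a 640 * 448 input; an equal or ratio-based division as described above would generalize it):

```python
def divide_input_size(width, height, q):
    """Divide the input image size into q sub-image sizes (width, height)."""
    if q == 1:
        return [(width, height)]
    if q == 2:
        return [(width // 2, height)] * 2
    if q == 3:
        # 0.5, 0.25, and 0.25 of the input image size
        return [(width // 2, height),
                (width // 2, height // 2),
                (width // 2, height // 2)]
    if q == 4:
        return [(width // 2, height // 2)] * 4
    raise ValueError("this sketch only covers 1 to 4 sub-images")
```

Note that in every case the sub-image areas sum to the full input image area, so the combined face image exactly fills the network input.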
- the original face sub-image may be transformed based on the size of the sub-image.
- transforming the original face sub-image may include any one or more of the following: rotation, down-sampling, and filling. Depending on the orientation of the face, the original face sub-image may also not be rotated.
- if the size of the original face sub-image is larger than the sub-image size, the original face sub-image is down-sampled according to the sub-image size.
- the size of the original face sub-image is larger than the size of the sub-image means that the width of the original face sub-image is greater than the width of the sub-image size, or the height of the original face sub-image is greater than the height of the sub-image size.
- the image to be processed is generally a clear image taken by a terminal device, and its size is relatively large, so it is quite common for the size of the original face sub-image to be larger than the sub-image size, in which case the image is down-sampled.
- the downsampling can be implemented by bilinear interpolation, nearest neighbor interpolation and other methods, which is not limited in this disclosure.
- after down-sampling, at least one of the width and height of the original face sub-image is aligned with the sub-image size, specifically including the following situations:
- the width and height of the original human face sub-image are the same as the sub-image size
- the width of the original human face sub-image is the same as the width of the sub-image size, and the height is smaller than the height of the sub-image size;
- the height of the original face sub-image is the same as the height of the sub-image size, and the width is smaller than the width of the sub-image size.
- if the size of the original face sub-image is not larger than the sub-image size, the down-sampling step may be skipped.
- if the size of the original face sub-image is smaller than the sub-image size, the original face sub-image is filled according to the size difference between the original face sub-image and the sub-image size, so that the size of the filled original face sub-image is equal to the sub-image size.
- the size of the original face sub-image being smaller than the sub-image size means that at least one of the width and height of the original face sub-image is smaller than that of the sub-image size, and the other is not larger than that of the sub-image size, specifically including the following situations:
- the width of the original face sub-image is smaller than the width of the sub-image size, and the height is also smaller than the height of the sub-image size;
- the width of the original human face sub-image is less than the width of the sub-image size, and the height is equal to the height of the sub-image size;
- the height of the original face sub-image is smaller than the height of the sub-image size, and the width is equal to the width of the sub-image size.
- preset pixel values can be used for filling, usually pixel values that differ clearly from the face color, such as (R: 0, G: 0, B: 0), (R: 255, G: 255, B: 255), etc.
- the center of the original face sub-image can be made to coincide with the center of the sub-image size, and the difference region around the original face sub-image is filled, so that the size of the filled original face sub-image is consistent with the sub-image size.
- the original face sub-image can also be aligned with one edge of the sub-image size, and the other sides can be filled; the present disclosure does not limit this.
- if the original face sub-image has been processed by at least one of the above-mentioned rotation and down-sampling, then when the size of the processed original face sub-image is smaller than the sub-image size, it is filled according to the difference between its size and the sub-image size.
- the specific implementation method is the same as that for the unprocessed original face sub-image, so it will not be described again.
- rotation, down-sampling, and filling can be applied in sequence to each original face sub-image, and the processed original face sub-images are combined into the face image to be beautified.
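The down-sampling and filling transforms (without rotation) can be sketched as a letterbox-style fit that also records the transformation information needed for the later inverse transform (nearest-neighbour resampling and the black fill value are illustrative choices; the text allows bilinear interpolation and other fill values):

```python
import numpy as np

FILL = (0, 0, 0)  # preset pixel value clearly different from the face color

def fit_to_subimage(img, sub_w, sub_h):
    """Down-sample the face crop if it exceeds the sub-image size, then
    center-pad the remainder. Returns the padded image and the transform
    info needed for the inverse transform after beautification."""
    h, w = img.shape[:2]
    scale = min(sub_w / w, sub_h / h, 1.0)   # never upscale
    new_w, new_h = int(w * scale), int(h * scale)
    # nearest-neighbour down-sampling via index selection
    ys = (np.arange(new_h) / scale).astype(int)
    xs = (np.arange(new_w) / scale).astype(int)
    resized = img[ys][:, xs]
    # center the result on a canvas of the sub-image size and fill the rest
    canvas = np.full((sub_h, sub_w, img.shape[2]), FILL, dtype=img.dtype)
    top = (sub_h - new_h) // 2
    left = (sub_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    transform_info = {"scale": scale, "top": top, "left": left,
                      "w": new_w, "h": new_h}
    return canvas, transform_info
```

The saved `transform_info` corresponds to the transformation information mentioned below (down-sampling ratio and filled-pixel coordinates) and allows the beautified result to be cropped back out and restored to the original sub-image size.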
- in the above transformation process, the orientation, size, etc. of the original face sub-image are changed to facilitate unified processing by the image beautification network.
- after beautification, the image needs to be inversely transformed to restore it to the direction and size of the original face sub-image, so as to fit the original image to be beautified. Therefore, the corresponding transformation information can be saved, including but not limited to: the direction and angle of rotation of each original face sub-image, the down-sampling ratio, and the coordinates of the filled pixels, which facilitates the subsequent inverse transformation according to the transformation information.
- in addition, the combination information can be saved, including but not limited to the size of each original face sub-image (that is, the corresponding sub-image size), its position in the face image to be beautified, and the arrangement and order of the original face sub-images. Subsequently, the beautified face combined image can be split according to the combination information to obtain each individual beautified face sub-image.
- Step S520: use the image beautification network to process the face image to be beautified to obtain the corresponding beautified face image.
- an image beautification network of any structure may be set according to actual needs.
- the input and output of the image beautification network are both images, so an end-to-end structure can be adopted; for example, it can be a fully convolutional network.
- the image beautification network can use a deep neural network (Deep Neural Network, DNN), which reduces the number of parameters by increasing the number of network layers (i.e., the network depth), while learning the deep features of the image and realizing pixel-level processing.
- Fig. 8 shows a schematic structural diagram of an image beautification network, which can adopt a U-Net structure.
- as shown in FIG. 8, convolution layer 1 first performs convolution and pooling operations on the input image to obtain a feature image of reduced size; convolution layer 2 performs another round of convolution and pooling operations to obtain a feature image of further reduced size; convolution layer 3 performs yet another round of convolution and pooling operations to obtain a still smaller feature image; convolution layer 4 performs two convolution operations but no pooling operation. The feature image then enters transposed convolution layer 1, which first performs a transposed convolution operation, splices the result with the feature image of convolution layer 3, and then performs one or more convolution operations to obtain a feature image of increased size; transposed convolution layer 2 performs another round of transposed convolution, splicing with the feature image of convolution layer 2, and convolution operations to obtain a feature image of further increased size; transposed convolution layer 3 performs another round of the above operations and outputs the beautified face image.
- this disclosure does not limit the number of convolutional layers and transposed convolutional layers in the image beautification network. According to actual scene requirements, other types of intermediate layers can also be added to the image beautification network, such as a pixel rearrangement layer, a Dropout layer, a fully connected layer, etc.
- the image beautification network can be a fully convolutional network including a first pixel rearrangement layer, at least one convolutional layer, at least one transposed convolutional layer, and a second pixel rearrangement layer; its structure is shown in Figure 9. Compared with the network structure in Figure 8, two pixel rearrangement layers are added. Based on the image beautification network shown in Figure 9, processing the face image to be beautified with the image beautification network to obtain the corresponding beautified face image may include steps S1010 to S1040 in Figure 10:
- Step S1010: use the first pixel rearrangement layer to perform pixel rearrangement processing from single channel to multi-channel on the face image to be beautified to obtain the first feature image.
- the face image to be beautified can be a single-channel image (such as a grayscale image) or a multi-channel image (such as an RGB image).
- the first pixel rearrangement layer can rearrange each channel of the face image to be beautified into multiple channels.
- step S1010 includes: rearranging the face image to be beautified of a channels into a first feature image of a*n^2 channels, with the width and height each reduced to 1/n; where a represents the number of channels of the face image to be beautified, which is a positive integer, and n represents the parameter of pixel rearrangement, which is a positive integer not less than 2.
- for example, if the face image to be beautified is a single-channel image, a first feature image of four channels is obtained after pixel rearrangement (with n being 2); if the face image to be beautified is a three-channel image, a first feature image of twelve channels is obtained after pixel rearrangement.
- the first pixel rearrangement layer can be implemented using the space_to_depth function in TensorFlow (an implementation framework for machine learning), which converts the spatial features in the face image to be beautified into depth features; it can also be implemented by a convolution operation with a stride of n, in which case the first pixel rearrangement layer can be regarded as a special convolutional layer.
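The single-to-multi-channel rearrangement can be sketched without TensorFlow using array reshapes (block size n; the within-block channel ordering shown here is one consistent choice and matches the usual space_to_depth layout, though exact ordering may differ between frameworks):

```python
import numpy as np

def space_to_depth(x, n):
    """Rearrange an (H, W, C) image into (H/n, W/n, C*n*n), moving each
    n-by-n spatial block into the channel (depth) dimension."""
    h, w, c = x.shape
    x = x.reshape(h // n, n, w // n, n, c)      # split H and W into blocks
    x = x.transpose(0, 2, 1, 3, 4)              # group the block offsets together
    return x.reshape(h // n, w // n, n * n * c)
```

For a 448 * 640 three-channel face image with n = 2, this yields a 224 * 320 twelve-channel first feature image, matching the example above.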
- Step S1020: use a convolution layer to perform convolution processing on the first feature image to obtain a second feature image.
- the disclosure does not limit the number of convolutional layers, the size of the convolutional kernel, the specific structure of the convolutional layer, and the like.
- Convolutional layers are used to extract image features from different scales and learn depth information.
- the convolutional layer can include a supporting pooling layer for downsampling the convolved image to achieve information abstraction, increase the receptive field, and reduce parameter complexity at the same time.
- for example, a progressive down-sampling method can be adopted, reducing the image by a factor of 2 each time until the last convolutional layer outputs the second feature image; the second feature image can be the feature image with the minimum size during the processing of the image beautification network.
- Step S1030: use a transposed convolution layer to perform transposed convolution processing on the second feature image to obtain a third feature image.
- the disclosure does not limit the number of transposed convolutional layers, the size of the transposed convolutional kernel, the specific structure of the transposed convolutional layer, and the like.
- the transposed convolution layer is used to upsample the second feature image, which can be regarded as the opposite process of convolution, thereby restoring the size of the image.
- a progressive upsampling method can be adopted, for example, the image can be increased by a factor of 2 until the last transposed convolutional layer outputs the third feature image.
- the convolutional layer and the transposed convolutional layer have a completely symmetrical structure, and the size and number of channels of the third feature image are the same as those of the first feature image.
- a direct connection can be established between the convolutional layers and the transposed convolutional layers; as shown in Figure 11, a direct connection is established between each convolutional layer and the transposed convolutional layer corresponding to feature images of the same size, so that the feature image information of the convolution link is passed directly to the feature image in the transposed convolution link, which is conducive to obtaining a third feature image with more comprehensive information.
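The symmetric downsampling/upsampling with direct connections can be sketched as shape bookkeeping in NumPy; here 2x2 average pooling and nearest-neighbor upsampling stand in for the strided and transposed convolutions, and additive skips stand in for the direct connections (all of these choices are illustrative assumptions, not the patent's actual network):

```python
import numpy as np

def down2(x):
    """Stand-in for a stride-2 convolution: 2x2 average pooling halves H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up2(x):
    """Stand-in for a transposed convolution: nearest-neighbor doubling of H and W."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.random.rand(32, 32, 4).astype(np.float32)  # a toy first feature image

# Encoder: progressively halve the feature image, keeping each level for the skips.
skips = []
feat = x
for _ in range(3):
    skips.append(feat)
    feat = down2(feat)          # 32 -> 16 -> 8 -> 4 (the smallest "second feature image")

# Decoder: progressively double the size; at each level the encoder feature of the
# same size is added in through a direct (skip) connection.
for enc in reversed(skips):
    feat = up2(feat) + enc      # 4 -> 8 -> 16 -> 32
```

With a fully symmetric structure, the decoder output recovers exactly the size and channel count of the first feature image, as the text states.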
- Step S1040 using the second pixel rearrangement layer to perform multi-channel-to-single-channel pixel rearrangement processing on the third feature image to obtain a beautified face image.
- step S1240 includes:
- b is a positive integer.
- the second pixel rearrangement layer can be implemented by using the depth_to_space function in TensorFlow to convert the depth features in the third feature image into spatial features, or it can be implemented by using a transposed convolution operation with a step size of n.
- the second pixel rearrangement layer can be viewed as a special transposed convolutional layer.
- step S1240 can be the inverse operation of step S1210.
- the number of channels of the beautified face image is the same as that of the face image to be beautified; that is, the image size and the number of channels are not changed during the processing of the image beautification network.
- the processing of the image beauty network also does not change the number of faces.
- the face image to be beautified is composed of 4 original face sub-images.
- the output beautified face image also includes 4 faces, that is, a combination of the beautified face sub-images corresponding to the 4 original face sub-images.
- the blemish removal effect depends on the quality and training effect of the beautification image dataset, not on the artificially designed image feature calculation.
- the image beautification network can cope with almost all situations in practical applications, including different lighting conditions, different skin conditions, etc., to achieve accurate and sufficient detection and removal of portrait flaws.
- the image beauty treatment method may also include the following steps S1110 to S1140:
- Step S1110 input the first sample image to be beautified into the image beautification network to be trained, to output the first beautification sample image.
- the image beautification network can realize the combination of different beautification functions.
- the beautification image data sets corresponding to different beautification functions can be obtained to train the required image beautification network. For example, to train a blemish-removal image beautification network, obtain sample images to be beautified that have blemishes and manually remove the blemishes to obtain the corresponding labeled images (ground truth), thereby constructing a blemish-removal beautification image dataset; to train an image beautification network that removes blemishes and deforms, obtain sample images to be beautified that have blemishes, and obtain the corresponding labeled images through manual blemish removal and deformation processing, thereby constructing a blemish-removal + deformation beautification image dataset.
- an image beautification network with any combination of one or more beautification functions can be trained by constructing different beautification image data sets.
- multiple face images can be combined to obtain a sample image to be beautified, and the manually beautified images corresponding to those face images can be combined to obtain the labeled image corresponding to that sample image; the sample image to be beautified and the labeled image are then added to the beautification image dataset.
- the beautification image dataset can include images of single faces, images of multiple faces, and images of combined faces.
- the first sample image to be beautified is used for the beautification training of the image beautification network; beautification training refers to training the image beautification network to output high-quality, natural beautified images.
- the first sample image to be beautified may be any image in the beautified image dataset.
- the structure of the image beautification network can refer to the content of the above-mentioned parts in Figure 8 and Figure 9, so it will not be described again.
- Step S1120 input the second sample image to be beautified into the image beautification network, and transform the image output by the image beautification network through transformation parameters to obtain the second beautification sample image.
- the second sample image to be beautified is used to provide anti-flicker training for the image beautification network.
- the anti-flicker training means that the trained image beautification network can achieve stable and flicker-free beautification processing effects on continuous multi-frame images.
- the second sample image to be beautified can be any image in the beautified image dataset.
- the first sample image to be beautified and the second sample image to be beautified may be acquired from the same beautification image data set.
- the first sample image to be beautified and the second sample image to be beautified can be the same image, so that each image in the beautification image dataset can be used both as the first sample image to be beautified and as the second sample image to be beautified, increasing the utilization of the dataset.
- the above-mentioned beautification image dataset constructed by manual processing is a labeled dataset, in which all sample images to be beautified have corresponding labeled images.
- an unlabeled data set can also be constructed, for example, by only collecting sample images to be beautified without manually processing them to produce labeled images.
- the first sample image to be beautified can be obtained in the labeled data set
- the second sample image to be beautified can be obtained in the unlabeled data set.
- the labeled image corresponding to the second sample image to be beautified is not used in the training process.
- the difficulty of obtaining the unlabeled data set is much lower than that of the labeled data set, which is conducive to increasing the number of second beautification sample images and facilitating more sufficient anti-flicker training for the image beautification network.
- between different frame images, the main shooting object, that is, the human face, may undergo small changes such as translation, rotation, and scaling.
- the transformation parameters in step S1120 are used to simulate such changes between different frame images, and may include any one or more of translation parameters, rotation parameters, and scaling parameters.
- statistical analysis may be performed on the transformation parameters of the images in the video to obtain the transformation parameters in step S1120.
- the transformation parameters may be obtained through random generation.
- the image beauty treatment method may also include the following steps:
- the translation parameter is randomly generated in the first value interval
- the rotation parameter is randomly generated in the second value interval
- the scaling parameter is randomly generated in the third value interval.
- the first numerical interval is the numerical interval for the translation parameter
- the second numerical interval is the numerical interval for the rotation parameter
- the third numerical interval is the numerical interval for the scaling parameter; the three intervals respectively represent the possible numerical ranges of translation, rotation, and scaling.
- the three numerical intervals may be determined according to experience and actual scenarios.
- the first numerical interval may be [-3, 3]
- the unit is pixel, indicating the number of pixels to be translated
- the second numerical interval may be [-5, 5]
- the unit is degree, indicating the degree of rotation
- the third numerical interval may be [0.97, 1.03]
- the unit is times, indicating the scaling factor.
- random numbers are respectively generated in the three numerical intervals to obtain the translation parameter, rotation parameter, and scaling parameter, that is, the transformation parameters in step S1120 are obtained.
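A minimal sketch of generating one set of transformation parameters; the uniform sampling and the three interval values follow the examples in the text, while the function and dictionary keys are hypothetical names:

```python
import random

# Example value intervals from the text: translation in pixels,
# rotation in degrees, scaling as a factor.
TRANSLATION_RANGE = (-3.0, 3.0)
ROTATION_RANGE = (-5.0, 5.0)
SCALING_RANGE = (0.97, 1.03)

def random_transform_params(rng: random.Random) -> dict:
    """Draw one set of (translation, rotation, scaling) parameters uniformly
    from their respective numerical intervals."""
    return {
        "translation": rng.uniform(*TRANSLATION_RANGE),
        "rotation": rng.uniform(*ROTATION_RANGE),
        "scaling": rng.uniform(*SCALING_RANGE),
    }

params = random_transform_params(random.Random(0))
```

Each training iteration would draw a fresh set of parameters, so the anti-flicker training sees many different small inter-frame transformations.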
- Step S1130 transform the second sample image to be beautified using the transformation parameters, and input the transformed second sample image to be beautified into the image beautification network to output the third beautification sample image.
- Step S1130 is equivalent to exchanging the order of beautification and transformation in step S1120; that is, the second sample image to be beautified is first transformed and then beautified to obtain the third beautification sample image.
- Step S1140 update the parameters of the image beautification network based on the difference between the labeled image corresponding to the first sample image to be beautified and the first beautification sample image, and the difference between the second beautification sample image and the third beautification sample image.
- the difference between the labeled image and the first beautification sample image reflects the beautification effect of the image beautification network, that is, the closer the first beautification sample image is to the labeled image, the better the beautification effect of the image beautification network.
- the parameters of the image beautification network can be updated to implement beautification training.
- the difference between the second beautification sample image and the third beautification sample image reflects the anti-flicker effect of the image beautification network.
- if there is no difference (that is, the second beautification sample image is equal to the third beautification sample image), it indicates that only a transformation relationship exists between the beautified (k-1)-th frame image and the face in the k-th frame image, with no difference in the face itself; that is, two consecutive frames in the video have beautification consistency and there is no flickering. Therefore, based on the difference between the second beautification sample image and the third beautification sample image, the parameters of the image beautification network can be updated to implement anti-flicker training.
- the aforementioned parameter update of the image beauty network by the beauty training and the parameter update of the image beauty network by the anti-flicker training may be performed simultaneously or separately, which is not limited in this disclosure.
- step S1140 may include:
- the parameters of the image beautification network are updated according to the first loss function value and the second loss function value.
- the first loss function is used to reflect the beautification loss of the image beautification network
- the second loss function is used to reflect the anti-flicker loss of the image beautification network.
- the first loss function and the second loss function can be established in advance, for example, in forms such as MAE (Mean Absolute Error, i.e. L1 loss) or MSE (Mean Square Error, i.e. L2 loss).
- the parameters of the image beautification network can be updated by gradient descent according to the first loss function value and the second loss function value respectively; a global loss function value can also be further calculated from the first loss function value and the second loss function value. The global loss function can be, for example, the weighted result of the first loss function and the second loss function; the parameters of the image beautification network are then updated by gradient descent according to the global loss function value.
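A toy sketch of computing the two loss values and an illustrative weighted global loss, using MAE/L1 as the text suggests. The constant images and the weights 1.0/0.5 are assumptions for illustration only:

```python
import numpy as np

def mae(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error (L1 loss) between two images."""
    return float(np.abs(a - b).mean())

# Toy stand-ins for the four images involved in one training step (values in [0, 1]).
labeled = np.full((8, 8, 3), 0.5)             # annotated (ground-truth) image
first_beautified = np.full((8, 8, 3), 0.4)    # network output for the first sample
second_beautified = np.full((8, 8, 3), 0.30)  # beautify-then-transform result
third_beautified = np.full((8, 8, 3), 0.32)   # transform-then-beautify result

loss_beauty = mae(labeled, first_beautified)             # first loss: beautification
loss_flicker = mae(second_beautified, third_beautified)  # second loss: anti-flicker

# Global loss as a weighted sum; the weights are illustrative assumptions.
w_beauty, w_flicker = 1.0, 0.5
loss_global = w_beauty * loss_beauty + w_flicker * loss_flicker
```

Gradient descent on `loss_global` (or on each loss separately) then updates the network parameters, implementing beautification and anti-flicker training together.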
- Fig. 12 shows a schematic flow of training an image beautification network.
- the second sample image to be beautified is input into the image beautification network, and the output image is transformed using the pre-generated transformation parameters to obtain the second beautification sample image.
- using the transformation parameters, the second sample image to be beautified is transformed and then input into the image beautification network, which outputs the third beautification sample image.
- the first loss function value and the second loss function value are weighted to obtain a global loss function value, and each parameter in the image beauty network is updated according to the global loss function value.
- the image beautification network can not only realize the conventional beautification processing, but also show the invariance of the beautification effect to the translation, rotation, scaling and other transformations of the image.
- when the face in the video is translated or rotated, or scaled owing to a change in its distance from the lens, the image beautification network can maintain the beautification effect on the face.
- combined with the above-mentioned technical means of stabilizing the position and size of the face through the stable bounding box, this can further ensure the consistency of the face beautification effect and prevent flickering after continuous multi-frame images undergo beautification processing.
- Step S530 splitting the beautifying face sub-image corresponding to the original human face sub-image from the beautifying face image.
- the combination information saved above can be used to split sub-images at specific positions and of specific sizes from the beautified face image, that is, the beautified face sub-images, which correspond one-to-one with the original face sub-images.
- Step S350 a target beautification image corresponding to the original image to be beautified is generated according to the beautified face sub-image.
- the beautified face sub-image is the result of beautifying the face in the original image to be beautified; replacing the corresponding face in the original image to be beautified with it yields the beautification result of the original image to be beautified, namely the target beautification image.
- the original face sub-image in the original image to be beautified can be replaced with the corresponding beautified face sub-image to obtain the target beautification image.
- the split beautified face sub-image can be correspondingly inverse transformed, including removing filled pixels, upsampling, reverse rotation by 90 degrees, etc., so that the inverse-transformed face sub-image is consistent with the orientation and size of the original face sub-image, allowing a 1:1 replacement to obtain the target beautification image.
- the face beautification sub-image is a face sub-image processed by an image beautification network, usually a face sub-image with a high degree of beautification.
- the original face sub-image can be used to perform beautification weakening processing on the beautified face sub-image.
- the beautification weakening process refers to reducing the beautification degree of the beautified face sub-image to increase its realism. Two exemplary ways of beautification weakening are provided below:
- Method 1: the original face sub-image is fused into the beautified face sub-image.
- the beautifying degree parameter may be a beautifying intensity parameter under a specific beautifying function, such as a blemish removal degree.
- the beautification level parameter may be a parameter currently set, a system default parameter, or a parameter used for a previous beautification, and the like.
- the original face sub-image and the beautified face sub-image may be fused with the beautification degree parameter as the proportion. For example, assuming that the range of the blemish-removal degree is 0 to 100 and the currently set value is a, refer to the following formula:
- image_blend represents the fused image
- image_ori represents the original face sub-image
- image_deblemish represents the beautified face sub-image.
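The formula itself is not reproduced in the text above; a plausible reading, consistent with the three variable names, is a linear interpolation weighted by the blemish-removal degree a (this reconstruction is an assumption, not the patent's stated formula):

```python
import numpy as np

def blend_beautified(image_ori: np.ndarray, image_deblemish: np.ndarray,
                     a: float) -> np.ndarray:
    """Fuse original and beautified face sub-images with blemish-removal
    degree a in [0, 100] as the proportion:
        image_blend = (a/100) * image_deblemish + (1 - a/100) * image_ori
    """
    w = a / 100.0
    return w * image_deblemish + (1.0 - w) * image_ori

# Toy constant images: a=50 lands halfway between original and beautified.
image_ori = np.full((4, 4, 3), 100.0)
image_deblemish = np.full((4, 4, 3), 200.0)
image_blend = blend_beautified(image_ori, image_deblemish, a=50)
```

At a=0 the fused result equals the original sub-image (beautification fully weakened), and at a=100 it equals the beautified sub-image (no weakening).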
- inverse transformation can be performed on the split beautifying face sub-images.
- the relationship between the original face sub-image and the beautified face sub-image is as follows: the original face sub-image before transformation is consistent in orientation and size with the beautified face sub-image after inverse transformation; the original face sub-image after transformation is consistent in orientation and size with the beautified face sub-image before inverse transformation.
- the original face sub-image before the above transformation can be fused with the beautified face sub-image after inverse transformation, or the original face sub-image after the above transformation can be fused with the beautified face sub-image before inverse transformation.
- Method 2: fusing the high-frequency image of the original face sub-image into the beautified face sub-image.
- the high-frequency image refers to an image containing high-frequency information such as detail texture in the original face sub-image.
- high-frequency images can be acquired in the following manner:
- when the above one or more original face sub-images are combined based on the input image size of the image beautification network, if an original face sub-image is down-sampled, the down-sampled face sub-image obtained after down-sampling is up-sampled to obtain an up-sampled face sub-image;
- the resolution of the down-sampled face sub-image is lower than that of the original face sub-image, and the high-frequency information of the image will inevitably be lost during the down-sampling process.
- the down-sampled face sub-image is up-sampled so that the resulting up-sampled face sub-image has the same resolution as the original face sub-image. It should be noted that if the original face sub-image was rotated before down-sampling, then after up-sampling the down-sampled face sub-image, reverse rotation can be performed, so that the obtained up-sampled face sub-image also has the same orientation as the original face sub-image.
- upsampling can use methods such as bilinear interpolation and nearest neighbor interpolation. Although the resolution can be restored through upsampling, it is difficult to completely restore the lost high-frequency information; that is, the up-sampled face sub-image can be regarded as a low-frequency image of the original face sub-image. Thus, the difference between the original face sub-image and the up-sampled face sub-image can be determined; for example, the up-sampled face sub-image can be subtracted from the original face sub-image, and the result is the high-frequency information of the original face sub-image. The subtracted values form an image, which is the high-frequency image of the original face sub-image.
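The downsample-upsample-subtract construction of the high-frequency image can be sketched as follows; 2x2 average pooling and nearest-neighbor upsampling are illustrative stand-ins for whatever resampling the pipeline actually uses:

```python
import numpy as np

def down2(img: np.ndarray) -> np.ndarray:
    """2x2 average pooling: stand-in for the pipeline's downsampling."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up2_nearest(img: np.ndarray) -> np.ndarray:
    """Nearest-neighbor upsampling back to the original resolution."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

original = np.random.default_rng(0).uniform(0, 255, (8, 8, 3))
upsampled = up2_nearest(down2(original))  # low-frequency approximation
high_freq = original - upsampled          # high-frequency image (detail residue)
```

By construction, adding the high-frequency image back onto the low-frequency (up-sampled) image reproduces the original exactly, which is why superimposing it on the beautified sub-image restores detail texture.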
- the original face sub-image may also be filtered to extract high-frequency information to obtain a high-frequency image.
- the high-frequency image can be superimposed on the beautified face sub-image by direct addition, so that high-frequency information such as detail textures is added to the beautified face sub-image, making it more realistic.
- in the high-frequency image, the pixel values are generally small; for example, the value of each RGB channel does not exceed 4.
- at abrupt positions in the original face sub-image, such as a small mole on the face, there is strong high-frequency information, so the pixel value at the corresponding position in the high-frequency image may be relatively large.
- the pixel values at these positions may have adverse effects, such as sharp edges like "mole prints", resulting in an unnatural visual experience.
- the image beautification processing method may also include the following steps:
- the defect point is a pixel point with strong high-frequency information, and a point with a larger pixel value in the high-frequency image can be determined as a defect point.
- the defect point can be determined in the following manner:
- if the difference between the beautified face sub-image and the original face sub-image at a pixel satisfies the preset blemish condition, the pixel at the corresponding position in the high-frequency image is determined as a defect point.
- the preset blemish condition is used to measure the difference between the beautified face sub-image and the original face sub-image, so as to judge whether each pixel is a blemish to be removed.
- in the process of blemish removal, small moles, acne, etc. on the face are usually removed and filled with the skin color of the face, so at these positions the beautified face sub-image differs greatly from the original face sub-image; defect points can therefore be identified through the preset defect condition.
- the preset defect condition may include: the difference values of each color channel are greater than a first color difference threshold, and at least one of the difference values of each color channel is greater than a second color difference threshold.
- the first color difference threshold and the second color difference threshold may be empirical thresholds. For example, when the color channels include RGB, the first color difference threshold may be 20, and the second color difference threshold may be 40.
- the specific differences of the three RGB color channels are judged: whether the difference of each color channel is greater than 20, and whether the difference of at least one color channel is greater than 40; when both conditions are met, the preset defect condition is satisfied, and the pixel at the corresponding position in the high-frequency image is determined as a defect point.
- the preset area around the defect point can be further determined in the high-frequency image; for example, it can be a 5*5 pixel area centered on the defect point, and the specific size can be determined according to the size of the high-frequency image, which is not limited in this disclosure. The pixel values in the preset area are then adjusted to the preset value range.
- the preset value range is generally a small value range, which can be determined according to experience and actual needs. Usually, the pixel value needs to be reduced during adjustment.
- for example, the preset value range may be -2 to 2; a pixel value around the defect point may exceed -5 to 5, and it is adjusted to be within -2 to 2, which is in effect a clipping process. This can weaken sharp edges such as "mole prints" and increase the natural visual feeling.
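The defect-point test and the neighborhood clipping can be sketched together; the thresholds 20/40, the 5*5 area, and the [-2, 2] range follow the examples in the text, while the function names and toy images are illustrative:

```python
import numpy as np

FIRST_THRESHOLD, SECOND_THRESHOLD = 20, 40  # example color-difference thresholds

def find_defect_points(original: np.ndarray, beautified: np.ndarray) -> np.ndarray:
    """A pixel is a defect point when every RGB channel differs by more than 20
    AND at least one channel differs by more than 40."""
    diff = np.abs(original.astype(np.int32) - beautified.astype(np.int32))
    return (diff > FIRST_THRESHOLD).all(axis=-1) & (diff > SECOND_THRESHOLD).any(axis=-1)

def clip_around_defects(high_freq: np.ndarray, defect_mask: np.ndarray,
                        radius: int = 2, lo: float = -2, hi: float = 2) -> np.ndarray:
    """Clip high-frequency values in a (2*radius+1)^2 area around each defect point."""
    out = high_freq.copy()
    for y, x in zip(*np.nonzero(defect_mask)):
        y0, x0 = max(0, y - radius), max(0, x - radius)
        out[y0:y + radius + 1, x0:x + radius + 1] = np.clip(
            out[y0:y + radius + 1, x0:x + radius + 1], lo, hi)
    return out

original = np.zeros((6, 6, 3), dtype=np.uint8)
beautified = original.copy()
beautified[3, 3] = (50, 50, 30)   # a removed mole: large change in every channel
mask = find_defect_points(original, beautified)

high_freq = np.full((6, 6, 3), 5.0)
clipped = clip_around_defects(high_freq, mask)  # 5x5 area around (3, 3) clipped to 2
```

Only the neighborhood of the detected defect point is limited, so the rest of the high-frequency detail is preserved when it is superimposed onto the beautified sub-image.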
- this exemplary embodiment can use these two beautification weakening processing methods at the same time: first, the original face sub-image and the beautified face sub-image are fused through Method 1, and on this basis the high-frequency image is superimposed through Method 2, obtaining a beautified face sub-image that has undergone beautification weakening processing and has both a good beautification effect and a sense of reality.
- the following steps may also be performed:
- Gradient processing is performed on the boundary area between the non-replaced area in the original image to be beautified and the beautified face sub-image, so that the boundary area forms a smooth transition.
- the non-replaced area in the original image to be beautified is the area in the original image to be beautified except the original face sub-image.
- the boundary area between the above-mentioned non-replaced area and the beautified face sub-image actually includes two parts: the boundary area adjacent to the beautified face sub-image in the non-replaced area, and the boundary area adjacent to the non-replaced area in the beautified face sub-image.
- gradient processing may be performed on either part, or both parts may be processed simultaneously.
- a certain proportion (for example, 10%) of the boundary area may be determined in the face beautification sub-image, which extends inward from the edge of the face beautification sub-image.
- the boundary area usually needs to avoid the face part, so as to avoid changing the color of the face part during the gradient processing.
- the original face sub-image is intercepted by the above-mentioned stable bounding box, so that the face in the original face sub-image has a certain distance from the boundary, and the face in the beautifying face sub-image also has a certain distance from the boundary.
- the face part can be better avoided.
- the boundary between the non-replaced area and the beautified face sub-image is a gradient color area (the oblique line area in FIG. 13 ), which forms a smooth transition and prevents color mutations from causing visual disharmony.
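A minimal sketch of the boundary gradient: a linear alpha ramp from the sub-image edge inward blends the beautified sub-image into the non-replaced area, forming the smooth transition described above (the border width and single-channel toy images are illustrative assumptions):

```python
import numpy as np

def border_alpha(h: int, w: int, border: int) -> np.ndarray:
    """Alpha mask that ramps linearly from 0 at the sub-image edge to 1 at
    `border` pixels inward, so the replacement fades in smoothly."""
    ys = np.arange(h)[:, None]
    xs = np.arange(w)[None, :]
    dist = np.minimum(np.minimum(ys, h - 1 - ys),
                      np.minimum(xs, w - 1 - xs))  # distance to nearest edge
    return np.clip(dist / border, 0.0, 1.0)

h, w, border = 20, 20, 2  # border width ~10% of the sub-image, as in the example
alpha = border_alpha(h, w, border)

background = np.full((h, w), 10.0)  # stand-in for the non-replaced area
face_sub = np.full((h, w), 200.0)   # stand-in for the beautified face sub-image
blended = alpha * face_sub + (1 - alpha) * background
```

At the very edge the result equals the surrounding image, in the interior it equals the beautified sub-image, and the gradient band in between prevents a color mutation at the seam.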
- each beautified face sub-image can replace the corresponding original face sub-image in the image to be processed, and the gradient processing of the boundary area can be performed, to obtain a target beautification image with a natural and harmonious visual experience.
- Fig. 14 shows a schematic flow of the image beautification processing method, including:
- Step S1401 determine the original image to be beautified from the video, for example, the current frame can be used as the original image to be beautified.
- Step S1402 perform face detection on the original image to be beautified, obtain the basic bounding boxes of multiple faces, screen out the faces whose area is smaller than the face area threshold, and record the remaining faces as faces to be determined.
- Step S1403 track the above-mentioned faces to be determined according to the determined faces in the reference frame image, acquire the ID of each face to be determined, and determine its stable bounding box.
- Step S1404 intercepting the image inside the stable bounding box to obtain the original face sub-image.
- Step S1405 according to the number of original face sub-images, the input image size of the image beautification network is divided into multiple sub-image sizes; the original face sub-images are down-sampled according to the sub-image sizes, and processing such as rotation and filling can also be performed, to obtain the down-sampled face sub-image corresponding to each original face sub-image.
- Step S1406 up-sampling the down-sampled face sub-image; if the down-sampled face sub-image was also processed by rotation, filling, etc., reverse rotation, filling removal, and other processing can also be performed, to obtain the up-sampled face sub-image, which is consistent with the resolution of the corresponding original face sub-image.
- Step S1407 subtracting the original face sub-image from the corresponding up-sampled face sub-image to obtain a high-frequency image of the original face sub-image.
- Step S1408 combining the downsampled face sub-images into a face image to be beautified.
- Step S1409 input the face image to be beautified into the image beautification network, and output the beautified face image after processing.
- Step S1410 splitting the face beautification image into face beautification sub-images corresponding to the original face sub-images one by one.
- Step S1411 the beautified face sub-image is fused with the corresponding original face sub-image according to the beautification degree parameter, and then added to the high-frequency image of the original face sub-image to obtain the face sub-image to be replaced.
- Step S1412 merging the face sub-image to be replaced with the original image to be beautified; specifically, the part of the original face sub-image in the original image to be beautified can be replaced by the face sub-image to be replaced, and edge color gradient processing can be performed, so that the face in the original image to be beautified is replaced by the beautified face, finally obtaining the target beautification image.
- subsequently, personalized beautification processing can also be performed.
- Exemplary embodiments of the present disclosure also provide an image beautification processing device.
- the image beautification processing device 1500 may include:
- the image acquisition module 1510 is configured to acquire the original image to be beautified from continuous multi-frame images
- the human face matching module 1520 is configured to match the human face in the original image to be beautified with the human face in the reference frame image of the original image to be beautified, and determine the stable bounding box of the human face in the original image to be beautified according to the matching result;
- the sub-image extraction module 1530 is configured to extract the original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified;
- the beautification processing module 1540 is configured to use the image beautification network to process the original face sub-image to obtain a corresponding beautification face sub-image;
- the image generation module 1550 is configured to generate a target beautification image corresponding to the original image to be beautified according to the beautification face sub-image.
- the face matching module 1520 is configured to:
- Detect the face in the original image to be beautified and record it as the face to be determined, and match the face to be determined with the determined face in the reference frame image of the original image to be beautified;
- the basic bounding box of the face to be determined is expanded according to the first preset parameter to obtain a stable bounding box of the face to be determined;
- the stable bounding box of the face to be determined is determined according to the stable bounding box of the determined face.
- the matching of the human face to be determined with the determined human face in the reference frame image of the original image to be beautified includes:
- according to the overlapping degree of the basic bounding box of the face to be determined and the basic bounding box of the determined face, it is determined whether the face to be determined and the determined face match successfully.
- the determination of the stable bounding box of the face to be determined according to the stable bounding box of the determined human face includes:
- the coordinates of the center point of the stable bounding box of the determined face and the coordinates of the center point of the basic bounding box of the face to be determined are weighted to obtain the coordinates of the center point of the stable bounding box of the face to be determined.
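The weighted center-point update can be sketched as follows; the weight 0.8 on the previous stable center is an illustrative assumption, since the text does not fix the weights:

```python
def smoothed_center(stable_center_prev: tuple, basic_center_curr: tuple,
                    weight: float = 0.8) -> tuple:
    """Weight the determined face's stable-box center with the current face's
    basic-box center to get a jitter-resistant stable-box center."""
    px, py = stable_center_prev
    bx, by = basic_center_curr
    return (weight * px + (1 - weight) * bx,
            weight * py + (1 - weight) * by)

# The detected center jumps to (110, 90), but the stable center moves only slightly.
center = smoothed_center((100.0, 100.0), (110.0, 90.0))
```

A large weight on the previous stable center damps frame-to-frame detection jitter, which is exactly what keeps the cropped face sub-image steady across the video.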
- the determination of the stable bounding box of the face to be determined according to the stable bounding box of the determined human face includes:
- the size of the stable bounding box of the determined face is expanded according to a second preset parameter, Get the size of the stable bounding box of the face to be determined;
- the size of the stable bounding box of the determined face is reduced according to a third preset parameter, Obtain the size of the stable bounding box of the face to be determined; the first magnification is greater than the second magnification;
- the size of the basic bounding box of the face to be determined is smaller than the product of the size of the stable bounding box of the determined face and the first magnification, and greater than the product of the size of the stable bounding box of the determined face and the second magnification, then the The size of the stable bounding box of the determined face is used as the size of the stable bounding box of the face to be determined.
- the beautification processing module 1540 is configured to:
- the original face sub-images extracted from the original image to be beautified are combined, based on the input image size of the image beautification network, to generate a face image to be beautified;
- the face image to be beautified is processed by using the image beautification network to obtain a beautified face image;
- the beautified face sub-image corresponding to each original face sub-image is split out of the beautified face image.
- the above-mentioned combining, based on the input image size of the image beautification network, of the original face sub-images extracted from the original image to be beautified to generate the face image to be beautified includes:
- according to the number of original face sub-images, the input image size is divided into sub-image sizes in one-to-one correspondence with the original face sub-images;
- the corresponding original face sub-image is transformed based on each sub-image size;
- the transformed original face sub-images are combined to generate the face image to be beautified.
- the above-mentioned transforming of the corresponding original face sub-image based on each sub-image size includes any one or more of the following:
- the original face sub-image is rotated by 90 degrees;
- if the size of the original face sub-image or of the rotated original face sub-image is greater than the sub-image size, the original face sub-image or the rotated original face sub-image is down-sampled according to the sub-image size;
- the original face sub-image is padded according to the difference between the size of the original face sub-image and the sub-image size; or the original face sub-image processed by at least one of rotation and down-sampling is padded according to the difference between its size and the sub-image size.
- the image generating module 1550 is configured to:
- before the original face sub-image in the original image to be beautified is replaced with the corresponding beautified face sub-image, the beautified face sub-image is subjected to beautification weakening processing using the original face sub-image.
- the above-mentioned use of the original face sub-image to perform beautification weakening processing on the beautified face sub-image includes:
- the original face sub-image is fused into the beautified face sub-image.
- the above-mentioned use of the original face sub-image to perform beautification weakening processing on the beautified face sub-image includes:
- the high-frequency image of the original face sub-image is fused into the beautified face sub-image.
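The high-frequency fusion above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the low-pass filter used to separate high-frequency detail is a simple box blur and that a fixed fusion strength is applied; the function names and the `strength` parameter are hypothetical.

```python
import numpy as np

def box_blur(img, k=5):
    """Simple box blur (edge-padded), standing in for any low-pass
    filter used to separate the high-frequency detail of the face."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def weaken_beautification(original, beautified, strength=0.3):
    """Fuse the high-frequency image of the original face sub-image into
    the beautified face sub-image, restoring part of the skin texture
    that beautification smoothed away."""
    high_freq = original.astype(np.float64) - box_blur(original)
    fused = beautified.astype(np.float64) + strength * high_freq
    return np.clip(fused, 0, 255).astype(np.uint8)
```

The weakening keeps the blemish-removal result (the low-frequency content comes entirely from the beautified sub-image) while re-injecting a controllable fraction of the original texture.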
- the image acquisition module 1510 is configured to:
- when the original face sub-images extracted from the original image to be beautified are combined based on the input image size of the image beautification network, if an original face sub-image is down-sampled, the down-sampled face sub-image obtained after down-sampling is up-sampled to obtain an up-sampled face sub-image, the resolution of which is identical to that of the original face sub-image;
- the image generating module 1550 is configured to:
- gradient processing is performed on the boundary area between the non-replaced area in the original image to be beautified and the beautified face sub-image, so that the boundary area forms a smooth transition.
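The gradient processing at the boundary can be illustrated with a per-pixel linear alpha ramp. This is a sketch under the assumption that a linear blend near the patch border is an acceptable form of gradient processing; the function name and ramp width are hypothetical.

```python
import numpy as np

def blend_with_border_ramp(background, patch, top, left, ramp=4):
    """Paste `patch` into `background` at (top, left) with a linear alpha
    ramp of `ramp` pixels at the patch border, so the boundary between
    the non-replaced area and the pasted face sub-image forms a smooth
    transition instead of a hard seam."""
    h, w = patch.shape[:2]
    # Distance of each patch pixel to its nearest patch edge.
    ys = np.minimum(np.arange(h), np.arange(h)[::-1])
    xs = np.minimum(np.arange(w), np.arange(w)[::-1])
    dist = np.minimum(ys[:, None], xs[None, :]).astype(np.float64)
    # Alpha rises linearly from 1/ramp at the border to 1 in the interior.
    alpha = np.clip((dist + 1) / ramp, 0.0, 1.0)
    out = background.astype(np.float64).copy()
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * patch + (1 - alpha) * region
    return out.astype(np.uint8)
```

Interior pixels keep the beautified values unchanged; only a few border pixels are mixed with the surrounding non-replaced area.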
- the image beautification network is a fully convolutional network, including: a first pixel rearrangement layer, at least one convolutional layer, at least one transposed convolutional layer, and a second pixel rearrangement layer.
- the beautification processing module 1540 is configured to:
- use the first pixel rearrangement layer to perform pixel rearrangement processing from single-channel to multi-channel on the face image to be beautified, to obtain a first feature image;
- use the convolutional layer to perform convolution processing on the first feature image, to obtain a second feature image;
- use the transposed convolutional layer to perform transposed convolution processing on the second feature image, to obtain a third feature image;
- use the second pixel rearrangement layer to perform pixel rearrangement processing from multi-channel to single-channel on the third feature image, to obtain a beautified face image.
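The two pixel rearrangement layers correspond to the standard space-to-depth and depth-to-space operations. The sketch below shows, in NumPy, a single-channel-to-multi-channel rearrangement and its inverse; the convolution and transposed-convolution stages between them are omitted, and the function names are hypothetical.

```python
import numpy as np

def pixel_unshuffle(img, r):
    """Single-channel -> multi-channel pixel rearrangement: each r x r
    spatial block becomes r*r channels, so for r=2 the spatial size is
    halved in each dimension (as in a first pixel rearrangement layer)."""
    h, w = img.shape
    out = img.reshape(h // r, r, w // r, r).transpose(1, 3, 0, 2)
    return out.reshape(r * r, h // r, w // r)

def pixel_shuffle(feat, r):
    """Multi-channel -> single-channel rearrangement, the exact inverse
    (as in a second pixel rearrangement layer)."""
    c, h, w = feat.shape
    out = feat.reshape(r, r, h, w).transpose(2, 0, 3, 1)
    return out.reshape(h * r, w * r)
```

Because both operations are lossless permutations of pixels, the network can trade spatial resolution for channels at the input and restore full resolution at the output without discarding information.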
- the beautification processing module 1540 is configured to:
- a is a positive integer
- n is a positive integer not less than 2.
- the beautification processing module 1540 is configured to:
- b is a positive integer
- n is a positive integer not less than 2.
- the image beautification processing device 1500 may also include a network training module configured to:
- the second sample image to be beautified is input into the image beautification network, and the image output by the image beautification network is transformed by the transformation parameters to obtain the second beautification sample image;
- the beautified face image includes a blemish-free beautified image; the beautification processing module 1540 is configured to:
- the image beauty processing device 1600 may include a processor 1610 and a memory 1620, and the memory 1620 stores the following program modules:
- the image acquisition module 1621 is configured to acquire the original image to be beautified from continuous multi-frame images;
- the face matching module 1622 is configured to match the face in the original image to be beautified with the face in the reference frame image of the original image to be beautified, and determine the stable bounding box of the face in the original image to be beautified according to the matching result;
- the sub-image extraction module 1623 is configured to extract the original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified;
- the beautification processing module 1624 is configured to use the image beautification network to process the original face sub-image to obtain the corresponding beautification face sub-image;
- the image generating module 1625 is configured to generate a target beautifying image corresponding to the original image to be beautified according to the beautifying human face sub-image;
- the processor 1610 is configured to execute the above program modules.
- the face matching module 1622 is configured to:
- detecting the face in the original image to be beautified, recording it as the face to be determined, and matching the face to be determined with the determined face in the reference frame image of the original image to be beautified;
- if the face to be determined does not match the determined face successfully, the basic bounding box of the face to be determined is expanded according to the first preset parameter to obtain the stable bounding box of the face to be determined;
- if the face to be determined matches the determined face successfully, the stable bounding box of the face to be determined is determined according to the stable bounding box of the determined face.
- the matching of the face to be determined with the determined face in the reference frame image of the original image to be beautified includes:
- according to the overlapping degree of the basic bounding box of the face to be determined and the basic bounding box of the determined face, it is determined whether the face to be determined and the determined face match successfully.
- the determination of the stable bounding box of the face to be determined according to the stable bounding box of the determined face includes:
- based on a preset stability coefficient, the coordinates of the center point of the stable bounding box of the determined face and the coordinates of the center point of the basic bounding box of the face to be determined are weighted to obtain the coordinates of the center point of the stable bounding box of the face to be determined.
- the determination of the stable bounding box of the face to be determined according to the stable bounding box of the determined face includes:
- if the size of the basic bounding box of the face to be determined is greater than the product of the size of the stable bounding box of the determined face and a first magnification, the size of the stable bounding box of the determined face is expanded according to a second preset parameter to obtain the size of the stable bounding box of the face to be determined;
- if the size of the basic bounding box of the face to be determined is smaller than the product of the size of the stable bounding box of the determined face and a second magnification, the size of the stable bounding box of the determined face is reduced according to a third preset parameter to obtain the size of the stable bounding box of the face to be determined; the first magnification is greater than the second magnification;
- if the size of the basic bounding box of the face to be determined is smaller than the product of the size of the stable bounding box of the determined face and the first magnification and greater than the product of the size of the stable bounding box of the determined face and the second magnification, the size of the stable bounding box of the determined face is used as the size of the stable bounding box of the face to be determined.
- the beautification processing module 1624 is configured to:
- the original face sub-images extracted from the original image to be beautified are combined, based on the input image size of the image beautification network, to generate a face image to be beautified;
- the face image to be beautified is processed by using the image beautification network to obtain a beautified face image;
- the beautified face sub-image corresponding to each original face sub-image is split out of the beautified face image.
- the above-mentioned combining, based on the input image size of the image beautification network, of the original face sub-images extracted from the original image to be beautified to generate the face image to be beautified includes:
- according to the number of original face sub-images, the input image size is divided into sub-image sizes in one-to-one correspondence with the original face sub-images;
- the corresponding original face sub-image is transformed based on each sub-image size;
- the transformed original face sub-images are combined to generate the face image to be beautified.
- the above-mentioned transforming of the corresponding original face sub-image based on each sub-image size includes any one or more of the following:
- the original face sub-image is rotated by 90 degrees;
- if the size of the original face sub-image or of the rotated original face sub-image is greater than the sub-image size, the original face sub-image or the rotated original face sub-image is down-sampled according to the sub-image size;
- the original face sub-image is padded according to the difference between the size of the original face sub-image and the sub-image size; or the original face sub-image processed by at least one of rotation and down-sampling is padded according to the difference between its size and the sub-image size.
- the image generation module 1625 is configured to:
- before the original face sub-image in the original image to be beautified is replaced with the corresponding beautified face sub-image, the beautified face sub-image is subjected to beautification weakening processing using the original face sub-image.
- the above-mentioned use of the original face sub-image to perform beautification weakening processing on the beautified face sub-image includes:
- the original face sub-image is fused into the beautified face sub-image.
- the above-mentioned use of the original face sub-image to perform beautification weakening processing on the beautified face sub-image includes:
- the high-frequency image of the original face sub-image is fused into the beautified face sub-image.
- the image acquisition module 1621 is configured to:
- when the original face sub-images extracted from the original image to be beautified are combined based on the input image size of the image beautification network, if an original face sub-image is down-sampled, the down-sampled face sub-image obtained after down-sampling is up-sampled to obtain an up-sampled face sub-image, the resolution of which is identical to that of the original face sub-image;
- the image generation module 1625 is configured to:
- gradient processing is performed on the boundary area between the non-replaced area in the original image to be beautified and the beautified face sub-image, so that the boundary area forms a smooth transition.
- the image beautification network is a fully convolutional network, including: a first pixel rearrangement layer, at least one convolutional layer, at least one transposed convolutional layer, and a second pixel rearrangement layer.
- the beautification processing module 1624 is configured to:
- use the first pixel rearrangement layer to perform pixel rearrangement processing from single-channel to multi-channel on the face image to be beautified, to obtain a first feature image;
- use the convolutional layer to perform convolution processing on the first feature image, to obtain a second feature image;
- use the transposed convolutional layer to perform transposed convolution processing on the second feature image, to obtain a third feature image;
- use the second pixel rearrangement layer to perform pixel rearrangement processing from multi-channel to single-channel on the third feature image, to obtain a beautified face image.
- the beautification processing module 1624 is configured to:
- a is a positive integer
- n is a positive integer not less than 2.
- the beautification processing module 1624 is configured to:
- b is a positive integer
- n is a positive integer not less than 2.
- the image beautification processing device 1600 may also include a network training module configured to:
- the second sample image to be beautified is input into the image beautification network, and the image output by the image beautification network is transformed by the transformation parameters to obtain the second beautification sample image;
- the parameters of the image beautification network are updated through training.
- the beautified face image includes a blemish-free beautified image; the beautification processing module 1624 is configured to:
- Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which can be realized in the form of a program product, which includes program code.
- when the program product is run on the electronic device, the program code causes the electronic device to perform the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section above in this specification.
- the program product may be implemented as a portable compact disk read-only memory (CD-ROM) containing program code and run on an electronic device, such as a personal computer.
- the program product of the present disclosure is not limited thereto.
- a readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus or device.
- a program product may take the form of any combination of one or more readable media.
- the readable medium may be a readable signal medium or a readable storage medium.
- the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples (non-exhaustive list) of readable storage media include: electrical connection with one or more conductors, portable disk, hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- a computer readable signal medium may include a data signal carrying readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a readable signal medium may also be any readable medium other than a readable storage medium that can transmit, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
- the program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
- the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
Abstract
An image beautification processing method and apparatus, a storage medium, and an electronic device. The image beautification processing method comprises: acquiring an original image to be beautified from continuous multi-frame images (S310); matching a face in the original image to be beautified with a face in a reference frame image of the original image to be beautified, and determining a stable bounding box of the face in the original image to be beautified according to the matching result (S320); extracting an original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified (S330); processing the original face sub-image by using an image beautification network to obtain a corresponding beautified face sub-image (S340); and generating, according to the beautified face sub-image, a target beautified image corresponding to the original image to be beautified (S350). The problem of inconsistent face beautification effects across continuous multi-frame images is thereby mitigated.
Description
This application claims priority to Chinese patent application No. 202110793989.6, filed on July 14, 2021 and entitled "Image beautification processing method and apparatus, storage medium, and electronic device", the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of image and video processing, and in particular to an image beautification processing method, an image beautification processing apparatus, a computer-readable storage medium, and an electronic device.

Beautification refers to using image processing technology to beautify portraits in images or videos, so as to better satisfy users' aesthetic needs.

When beautifying continuous multi-frame images (for example, a video), the same face in different frame images generally needs to present a consistent beautification effect.
SUMMARY
The present disclosure provides an image beautification processing method, an image beautification processing apparatus, a computer-readable storage medium, and an electronic device.

According to a first aspect of the present disclosure, an image beautification processing method is provided, comprising: acquiring an original image to be beautified from continuous multi-frame images; matching a face in the original image to be beautified with a face in a reference frame image of the original image to be beautified, and determining a stable bounding box of the face in the original image to be beautified according to the matching result; extracting an original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified; processing the original face sub-image by using an image beautification network to obtain a corresponding beautified face sub-image; and generating, according to the beautified face sub-image, a target beautified image corresponding to the original image to be beautified.

According to a second aspect of the present disclosure, an image beautification processing apparatus is provided, comprising: an image acquisition module configured to acquire an original image to be beautified from a video; a face matching module configured to match a face in the original image to be beautified with a face in a reference frame image of the original image to be beautified, and determine a stable bounding box of the face in the original image to be beautified according to the matching result; a sub-image extraction module configured to extract an original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified; a beautification processing module configured to process the original face sub-image by using an image beautification network to obtain a corresponding beautified face sub-image; and an image generation module configured to generate, according to the beautified face sub-image, a target beautified image corresponding to the original image to be beautified.

According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the image beautification processing method of the first aspect and possible implementations thereof.

According to a fourth aspect of the present disclosure, an electronic device is provided, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, via execution of the executable instructions, the image beautification processing method of the first aspect and possible implementations thereof.
FIG. 1 shows a schematic diagram of a system architecture in this exemplary embodiment;

FIG. 2 shows a schematic structural diagram of an electronic device in this exemplary embodiment;

FIG. 3 shows a flowchart of an image beautification processing method in this exemplary embodiment;

FIG. 4 shows a flowchart of determining a stable bounding box in this exemplary embodiment;

FIG. 5 shows a flowchart of obtaining a beautified face sub-image in this exemplary embodiment;

FIG. 6 shows a flowchart of combining original face sub-images in this exemplary embodiment;

FIG. 7 shows a schematic diagram of combining original face sub-images in this exemplary embodiment;

FIG. 8 shows a schematic structural diagram of an image beautification network in this exemplary embodiment;

FIG. 9 shows a schematic structural diagram of another image beautification network in this exemplary embodiment;

FIG. 10 shows a flowchart of processing a face image to be beautified by using an image beautification network in this exemplary embodiment;

FIG. 11 shows a flowchart of training an image beautification network in this exemplary embodiment;

FIG. 12 shows a schematic diagram of training an image beautification network in this exemplary embodiment;

FIG. 13 shows a schematic diagram of gradient processing of a boundary area in this exemplary embodiment;

FIG. 14 shows a schematic flowchart of an image beautification processing method in this exemplary embodiment;

FIG. 15 shows a schematic structural diagram of an image beautification processing apparatus in this exemplary embodiment;

FIG. 16 shows a schematic structural diagram of another image beautification processing apparatus in this exemplary embodiment.
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be more thorough and complete, and the concepts of the example embodiments will be fully conveyed to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, numerous specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or that other methods, components, apparatuses, steps, and the like may be adopted. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.

In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
In the related art, on the one hand, when beautifying continuous multi-frame images (for example, a video), each frame image usually needs to be beautified separately. If a face moves noticeably between different frame images, especially between adjacent frame images, the beautification effects of different frame images may be inconsistent, including but not limited to inconsistency of blemishes such as eye bags, dark circles, spots, acne, and moles, inconsistency of face shape, inconsistency of skin texture, and inconsistency of noise, presenting a flickering phenomenon in the picture and degrading the visual experience after beautification. On the other hand, image beautification processing usually consists of multiple fixed algorithm flows, for example manually designed image feature calculation, spatial filtering, and layer fusion. However, actual shooting scenes may involve complex and diverse lighting conditions, and the skin conditions of subjects vary widely; the above methods cannot cope well with these different situations, resulting in unsatisfactory beautification effects.

In view of one or more of the above problems, exemplary embodiments of the present disclosure provide an image beautification processing method. The system architecture and application scenarios of the operating environment of this exemplary embodiment are exemplarily described below with reference to FIG. 1.
FIG. 1 shows a schematic diagram of a system architecture. The system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be a terminal device such as a smartphone, tablet computer, desktop computer, or laptop computer; the server 120 generally refers to a background system providing services related to image beautification in this exemplary embodiment, and may be one server or a cluster formed by multiple servers. A connection may be formed between the terminal 110 and the server 120 through a wired or wireless communication link for data interaction.

In one embodiment, the terminal 110 may capture, or obtain by other means, an image or video to be beautified and upload it to the server 120. For example, a user opens a beautification App (Application) on the terminal 110, selects an image or video to be beautified from an album, and uploads it to the server 120 for beautification; or the user opens the beautification function in a live-streaming App on the terminal 110 and uploads the video collected in real time to the server 120 for beautification. The server 120 executes the above image beautification processing method to obtain the beautified image or video, and returns it to the terminal 110.

In one embodiment, the server 120 may execute the training of the image beautification network and send the trained image beautification network to the terminal 110 for deployment, for example by packaging the relevant data of the image beautification network in an update package of the above beautification App or live-streaming App, so that the terminal 110 obtains the image beautification network by updating the App and deploys it locally. Then, after capturing, or obtaining by other means, an image or video to be beautified, the terminal 110 may execute the above image beautification processing method and invoke the image beautification network to realize beautification of the image or video.

In one embodiment, the training of the image beautification network may be executed by the terminal 110, for example by obtaining the basic architecture of the image beautification network from the server 120 and training it on a local data set, by obtaining a data set from the server 120 and training a locally built image beautification network, or by training the image beautification network entirely without relying on the server 120. The terminal 110 may then execute the above image beautification processing method and invoke the image beautification network to realize beautification of the image or video.

As can be seen from the above, the execution subject of the image beautification processing method in this exemplary embodiment may be the above terminal 110 or server 120, which is not limited in the present disclosure.
Exemplary embodiments of the present disclosure also provide an electronic device for executing the above image beautification network training method or image beautification processing method; the electronic device may be the above terminal 110 or server 120. The construction of the electronic device is exemplarily described below by taking the mobile terminal 200 in FIG. 2 as an example. Those skilled in the art should understand that, apart from components specifically used for mobile purposes, the construction in FIG. 2 can also be applied to fixed-type devices.

As shown in FIG. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a USB (Universal Serial Bus) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, keys 294, a SIM (Subscriber Identification Module) card interface 295, and the like.

The processor 210 may include one or more processing units; for example, the processor 210 may include an AP (Application Processor), a modem processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband processor, and/or an NPU (Neural-Network Processing Unit). The image beautification network in this exemplary embodiment may run on the GPU, DSP, or NPU. The DSP and NPU usually run the image beautification network with int-type (integer) data, while the GPU usually runs it with float-type (floating-point) data. Comparatively, running on the DSP or NPU has lower power consumption, faster response, and lower precision, while running on the GPU has higher power consumption, slower response, and higher precision. In practical applications, a suitable processing unit can be selected to run the image beautification network according to device performance and actual requirements; for example, when beautifying video in real time, where speed requirements are high, the DSP or NPU can be selected to run the image beautification network.

The encoder can encode (i.e. compress) image or video data, for example encoding the image or video obtained after beautification to form corresponding bitstream data, so as to reduce the bandwidth occupied by data transmission; the decoder can decode (i.e. decompress) the bitstream data of an image or video to restore the image or video data, for example decoding the video to be beautified to obtain the image data of each frame in the video and extracting the original images to be beautified for beautification processing. The mobile terminal 200 can process images or videos in various encoding formats, for example image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG (Moving Picture Experts Group) 1, MPEG2, H.263, H.264, and HEVC (High Efficiency Video Coding).
In one embodiment, the processor 210 may include one or more interfaces, forming connections with other components of the mobile terminal 200 through different interfaces.

The internal memory 221 may be used to store computer-executable program code, which includes instructions. The internal memory 221 may include volatile memory and non-volatile memory. The processor 210 executes various functional applications and data processing of the mobile terminal 200 by running the instructions stored in the internal memory 221.

The external memory interface 222 may be used to connect an external memory, such as a Micro SD card, to expand the storage capability of the mobile terminal 200. The external memory communicates with the processor 210 through the external memory interface 222 to realize data storage functions, such as storing images, videos, and other files.

The USB interface 230 is an interface conforming to the USB standard specification, which may be used to connect a charger to charge the mobile terminal 200, and may also be connected to earphones or other electronic devices.

The charging management module 240 is used to receive charging input from the charger. While charging the battery 242, the charging management module 240 may also supply power to the device through the power management module 241; the power management module 241 may also monitor the state of the battery.

The wireless communication function of the mobile terminal 200 may be realized through the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and the like. The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. The mobile communication module 250 may provide mobile communication solutions such as 2G, 3G, 4G, and 5G applied on the mobile terminal 200. The wireless communication module 260 may provide wireless communication solutions applied on the mobile terminal 200 such as WLAN (Wireless Local Area Networks) (e.g., Wi-Fi (Wireless Fidelity) networks), BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), and IR (Infrared).

The mobile terminal 200 may realize display functions through the GPU, the display screen 290, the AP, and the like, and display a user interface. For example, when the user performs camera detection, the mobile terminal 200 may display the interface of a camera detection App (Application) on the display screen 290.

The mobile terminal 200 may realize shooting functions through the ISP, the camera module 291, the encoder, the decoder, the GPU, the display screen 290, the AP, and the like. For example, the user may enable the image or video shooting function in the hidden-camera detection App, and images of the space to be detected may then be collected through the camera module 291.

The mobile terminal 200 may realize audio functions through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the earphone interface 274, the AP, and the like.

The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, and the like, to realize corresponding sensing and detection functions.

The indicator 292 may be an indicator light, which may be used to indicate charging state and power change, and may also be used to indicate messages, missed calls, notifications, and the like. The motor 293 may generate vibration prompts and may also be used for touch vibration feedback. The keys 294 include a power key, volume keys, and the like.

The mobile terminal 200 may support one or more SIM card interfaces 295 for connecting SIM cards, to realize functions such as calling and mobile communication.
The image beautification processing method in this exemplary embodiment is described below with reference to FIG. 3. FIG. 3 shows an exemplary flow of the image beautification processing method, which may include:

Step S310, acquiring an original image to be beautified from continuous multi-frame images;

Step S320, matching a face in the original image to be beautified with a face in a reference frame image of the original image to be beautified, and determining a stable bounding box of the face in the original image to be beautified according to the matching result;

Step S330, extracting an original face sub-image from the original image to be beautified based on the stable bounding box of the face in the original image to be beautified;

Step S340, processing the original face sub-image by using an image beautification network to obtain a corresponding beautified face sub-image;

Step S350, generating, according to the beautified face sub-image, a target beautified image corresponding to the original image to be beautified.
The image beautification network may be trained to realize any one or combination of beautification functions, including but not limited to blemish removal, deformation, skin color adjustment, skin smoothing, and light and shadow adjustment. Thus, the image beautification processing method of FIG. 3 can serve as one stage of beautification processing, with other stages of beautification processing added before or after it. For example, the image beautification network may be used to perform blemish removal. After the original image to be beautified is acquired, it is processed by the image beautification processing method of FIG. 3, and the obtained target beautified image is a blemish-free beautified image. Personalized beautification processing can subsequently be performed on the blemish-free beautified image to obtain the final beautified image.

Generally, blemish removal is necessary for image beautification, and users' demands for blemish removal are relatively fixed, so a generalized blemish-removal flow can be realized through the image beautification processing method of FIG. 3. In contrast, beautification functions such as skin smoothing, deformation, stereoscopic enhancement, skin color adjustment, and light and shadow adjustment are not necessary, and users' specific demands for these functions are personalized. These functions can be called personalized beautification processing, which usually requires the user to make specific settings before processing; for example, the user selects one or more of the beautification functions and sets parameters such as the degree of skin smoothing and the degree of deformation, and the terminal or server then performs processing according to the user's settings.

It should be noted that the present disclosure does not limit the order of the image beautification processing of FIG. 3 relative to other beautification processing. For example, personalized beautification processing may first be performed on the image to be processed to obtain an intermediate beautified image, and then the intermediate beautified image is used as the original image to be beautified for the image beautification processing of FIG. 3; the obtained target beautified image is the finally output beautified image.

Based on the above image beautification processing method, on the one hand, the face in the original image to be beautified is matched with the face in the reference frame image to determine the stable bounding box of the face, and the original face sub-image is then extracted for beautification processing, so that the stable bounding box of the face in the original image to be beautified inherits, to a certain extent, the relevant information in the reference frame image. Thus, when different frames of original images to be beautified in continuous multi-frame images are processed, the faces extracted from different frames have a certain continuity and stability and do not change drastically, which ensures the consistency of the beautification effect on the face; for example, after beautification, the faces in different frames have consistency of blemishes, face shape, skin texture, and skin color, improving the visual experience. On the other hand, blemish removal or other beautification functions can be realized through the image beautification network instead of the multiple fixed algorithm flows in the related art, which increases the flexibility of image beautification processing, adapts to diverse lighting and skin conditions, improves the beautification effect, and reduces time consumption and memory occupation.

Each step in FIG. 3 is described in detail below.
Referring to FIG. 3, in step S310, an original image to be beautified is acquired from continuous multi-frame images.

The continuous multi-frame images may be a video, continuously shot images, or the like, and are the object on which beautification processing is to be performed. Taking a video as an example, it may be a video stream currently being shot or received in real time, or a complete video that has already been shot or received, such as a locally stored video. The present disclosure does not limit parameters such as the frame rate and image resolution of the video; for example, the video frame rate may be 30 fps (frames per second), 60 fps, or 120 fps, and the image resolution may be 720P, 1080P, 4K, etc., with corresponding different aspect ratios.

In this exemplary embodiment, every frame of original image in the video may be beautified, or a part of the original images may be selected from the video for beautification; an original image that needs beautification is called an original image to be beautified.

In one embodiment, at least two frames of original images to be beautified may be acquired from the video. For example, original images containing a target face may be used as original images to be beautified, or a frame-skipping strategy may be adopted, acquiring one frame of original image to be beautified at a certain frame interval.

In one embodiment, when a video stream is received in real time, every received frame of original image may be used as an original image to be beautified.
Continuing to refer to FIG. 3, in step S320, the face in the original image to be beautified is matched with the face in the reference frame image of the original image to be beautified, and the stable bounding box of the face in the original image to be beautified is determined according to the matching result.

A bounding box refers to an area in the image that surrounds a face and has a certain geometric shape; the present disclosure does not limit the shape of the bounding box, which may be any shape such as a rectangle or trapezoid. The initially detected bounding box of the face area is called the basic bounding box, which may be, for example, the minimum bounding box containing the face, or a face frame obtained by a related algorithm. The basic bounding box is then optimized, for example by expansion or position correction, and the optimized bounding box is called the stable bounding box.

In this exemplary embodiment, face detection may be performed on the original image to be beautified to obtain relevant information of the face. The present disclosure does not limit the face detection algorithm; for example, face key points, including key points of the face boundary, may be detected through a specific neural network, the basic bounding box of the face is generated according to the key points of the face boundary, and the stable bounding box is obtained through optimization.

The reference frame image may be any frame in the above continuous multi-frame images for which the stable bounding box of the face has been determined or for which beautification processing has been completed. For example, when beautifying a video frame by frame, the previous frame of the original image to be beautified may be used as the reference frame image. By matching the faces in the original image to be beautified and in the reference frame image, the stable bounding box of the face in the original image to be beautified can be determined based on the stable bounding box of the face in the reference frame image.
In one embodiment, referring to FIG. 4, the above matching of the face in the original image to be beautified with the face in the reference frame image of the original image to be beautified, and determining the stable bounding box of the face in the original image to be beautified according to the matching result, may include the following steps S410 to S430:

Step S410, detecting the face in the original image to be beautified, recording it as the face to be determined, and matching the face to be determined with the determined face in the reference frame image of the original image to be beautified.

The face to be determined refers to a face that needs beautification but for which a stable bounding box has not yet been determined, which can be regarded as a face of unknown identity; the determined face refers to a face for which a stable bounding box has been determined, which can be regarded as a face of known identity. The faces in the reference frame image for which stable bounding boxes have been determined are all determined faces. Correspondingly, the faces detected in the original image to be beautified are faces for which stable bounding boxes have not been determined, namely faces to be determined. By matching the face to be determined in the original image to be beautified with the determined face in the reference frame image, it can be inferred that the stable bounding box of the face to be determined is correlated with the stable bounding box of the matching determined face, from which the stable bounding box of the face to be determined can be derived.

Generally, all faces in the original image to be beautified can be detected through a face detection algorithm, possibly including faces that do not need beautification (for example, faces of distant passers-by). Considering that in image beautification scenarios larger faces usually need beautification (the effect on smaller faces is not obvious), the detected faces can be filtered through a face area threshold. In one embodiment, the face area threshold may be set according to experience or the size of the original image to be beautified; exemplarily, the face area threshold may be the size of the original image to be beautified * 0.05. If the area of the basic bounding box of a face is greater than or equal to the face area threshold, it is a face that needs beautification; information such as its basic bounding box can be retained, and the face can be recorded as a face to be determined. If the area of the basic bounding box of a face is smaller than the face area threshold, it is a face that does not need beautification; relevant information such as its basic bounding box can be deleted, and no subsequent processing is performed on it.
In one embodiment, to facilitate subsequent processing of the original face sub-images, for example combining the original face sub-images, or considering the limitation of device performance, an upper limit on the number of original face sub-images, that is, an upper limit on the number of faces to be determined, may be set, for example 4. If, after filtering with the above face area threshold, the number of retained faces is greater than 4, 4 faces to be determined may be further selected from them, for example the 4 faces with the largest areas or the 4 faces closest to the center of the original image to be beautified, so that 4 corresponding original face sub-images are subsequently cropped and no subsequent processing is performed on the other faces. Alternatively, beautification processing may be performed in multiple passes: in the current pass, 4 faces are selected as faces to be determined and their corresponding original face sub-images are cropped and beautified; in the next pass, other faces are selected as faces to be determined and their corresponding original face sub-images are cropped and beautified, thereby completing beautification of the faces in all basic bounding boxes whose area is greater than the face area threshold in the image to be processed.

In one embodiment, to facilitate tracking and identifying faces across continuous multi-frame images, an ID (Identity Document) may be assigned to each face. For example, starting from the first frame, an ID is assigned to each face; after faces are detected in each subsequent frame, each face is matched with the faces in the previous frame; if the matching succeeds, the face ID and other relevant information from the previous frame are inherited; if the matching fails, the face is treated as a new face and assigned a new ID.

The present disclosure does not limit the manner of matching the face to be determined with the determined face. For example, a face recognition algorithm may be used to compare each face to be determined with each determined face; if the similarity is higher than a preset similarity threshold, it is determined that the face to be determined and the determined face match successfully.
In one embodiment, whether the face to be determined and the determined face match successfully may be determined according to the overlapping degree (Intersection Over Union, IOU) of the basic bounding box of the face to be determined and the basic bounding box of the determined face. An exemplary way of calculating the overlapping degree is provided below:

The position of the basic bounding box of the face to be determined in the original image to be beautified and the position of the basic bounding box of the determined face in the reference frame image are obtained. The number of pixels at coinciding positions in the two basic bounding boxes is counted and recorded as k1, and the numbers of pixels at non-coinciding positions are recorded as k2 (the number of pixels in the basic bounding box of the face to be determined that do not coincide with the basic bounding box of the determined face) and k3 (the number of pixels in the basic bounding box of the determined face that do not coincide with the basic bounding box of the face to be determined). The overlapping degree of the two basic bounding boxes is then:

IOU = k1 / (k1 + k2 + k3)    (1)

After the overlapping degree is determined, if it reaches a preset overlap threshold, it is determined that the face to be determined and the determined face match successfully. The overlap threshold may be set according to experience and actual requirements, for example 0.75.
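The overlap test of formula (1) can be sketched as follows. For axis-aligned rectangular bounding boxes, the pixel counts k1, k2, k3 reduce to box areas; the function names and the box layout [left, top, right, bottom] are assumptions for illustration.

```python
def bbox_iou(box_a, box_b):
    """Overlapping degree (IOU) of two basic bounding boxes given as
    [bb0, bb1, bb2, bb3] = [left, top, right, bottom] pixel coordinates.
    k1 is the coinciding area; k2 and k3 are the non-coinciding
    remainders of each box, matching formula (1)."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    k1 = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    k2, k3 = area_a - k1, area_b - k1
    return k1 / (k1 + k2 + k3) if k1 else 0.0

def is_match(box_a, box_b, threshold=0.75):
    """A face to be determined matches a determined face when the IOU of
    their basic bounding boxes reaches the preset overlap threshold."""
    return bbox_iou(box_a, box_b) >= threshold
```

Per-pixel counting as described in the text gives the same value for rectangular boxes, so the closed-form area computation is the usual shortcut.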
In addition, an ICP (Iterative Closest Point) algorithm or the like may be used to iteratively transform either the basic bounding box of the face to be determined or the basic bounding box of the determined face, and the overlapping degree of the two basic bounding boxes may be calculated according to the numbers of pixels with identical and with different pixel values in the finally transformed boxes, thereby judging whether the matching succeeds.

It should be noted that, since there may be multiple faces to be determined in the original image to be beautified and multiple determined faces in the reference frame image, matching calculation may be performed between each face to be determined and each determined face to obtain a similarity matrix or overlap matrix; a global maximum matching may then be achieved using the Hungarian algorithm or the like, and whether each pair of a face to be determined and a determined face matches successfully is decided according to their similarity or overlapping degree.
Step S420, if the face to be determined does not match the determined face successfully, the basic bounding box of the face to be determined is expanded according to the first preset parameter to obtain the stable bounding box of the face to be determined.

If the face to be determined does not match the determined face successfully, the face to be determined is a face newly appearing in the continuous multi-frame images, and no reference information can be obtained from the reference frame image. Therefore, the stable bounding box can be obtained by appropriately expanding the basic bounding box of the face to be determined. The first preset parameter is an expansion parameter for the basic bounding box of a newly appearing face, which may be determined according to experience or actual requirements; for example, the width and height of the basic bounding box may both be expanded by 1/4.

Suppose the basic bounding box of the face to be determined is expressed as [bb0, bb1, bb2, bb3], where bb0 is the abscissa of the upper-left point of the basic bounding box, bb1 is the ordinate of the upper-left point, bb2 is the abscissa of the lower-right point, and bb3 is the ordinate of the lower-right point; the width of the basic bounding box is w and its height is h. Note that pixel coordinates in an image usually take the upper-left point of the image as (0, 0) and the lower-right point as (W, H), where W and H denote the width and height of the image; therefore bb0 < bb2 and bb1 < bb3. Denoting the first preset parameter by E1, when the basic bounding box is center-expanded (that is, expanded uniformly up, down, left, and right) according to the first preset parameter, the size of the stable bounding box can be obtained as:

expand_w = w·(1 + E1)
expand_h = h·(1 + E1)    (2)

where expand_w and expand_h are the width and height of the stable bounding box of the face to be determined, respectively. It should be noted that if the expanded width expand_w exceeds the width W of the original image to be beautified, then expand_w = W; if the expanded height expand_h exceeds the height H of the original image to be beautified, then expand_h = H.

The center point coordinates of the stable bounding box equal the center point coordinates of the basic bounding box, that is:

center_x = (bb0 + bb2)/2
center_y = (bb1 + bb3)/2    (3)

where center_x denotes the x coordinate of the center point of the stable bounding box of the face to be determined, and center_y denotes its y coordinate.

The coordinates of the upper-left and lower-right points of the stable bounding box can then be calculated as follows:

expand_bb0 = center_x − expand_w/2
expand_bb1 = center_y − expand_h/2
expand_bb2 = center_x + expand_w/2
expand_bb3 = center_y + expand_h/2    (4)

where expand_bb0 is the abscissa of the upper-left point of the stable bounding box, expand_bb1 is the ordinate of the upper-left point, expand_bb2 is the abscissa of the lower-right point, and expand_bb3 is the ordinate of the lower-right point. The stable bounding box of the face to be determined is thereby obtained. If a calculated coordinate exceeds the boundary of the original image to be beautified, the boundary coordinate of the original image to be beautified is used instead of the out-of-boundary coordinate. Finally, the expanded bounding box can be expressed in the form [expand_bb0, expand_bb1, expand_bb2, expand_bb3].
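The center expansion and boundary clipping of formulas (2) to (4) can be sketched as follows. The expansion factor (1 + E1) is an assumption consistent with the example of expanding width and height by 1/4; the function name is hypothetical.

```python
def expand_to_stable(bb, E1, W, H):
    """Center-expand a basic bounding box [bb0, bb1, bb2, bb3] by the
    first preset parameter E1 (e.g. 0.25) to obtain the stable bounding
    box, clipped to the image boundary W x H and rounded to int pixels."""
    w, h = bb[2] - bb[0], bb[3] - bb[1]
    # Formula (2): expanded size, capped at the image size.
    expand_w = min(w * (1 + E1), W)
    expand_h = min(h * (1 + E1), H)
    # Formula (3): stable box keeps the basic box's center.
    cx, cy = (bb[0] + bb[2]) / 2, (bb[1] + bb[3]) / 2
    # Formula (4): corners, then clip out-of-boundary coordinates.
    box = [cx - expand_w / 2, cy - expand_h / 2,
           cx + expand_w / 2, cy + expand_h / 2]
    box = [min(max(v, 0), W if i % 2 == 0 else H) for i, v in enumerate(box)]
    return [int(v) for v in box]
```

As the text recommends, intermediate values are kept as floats and rounding happens only when producing the final int coordinates.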
It should be added that the above coordinates usually adopt pixel coordinates in the image and are integers. Therefore, float-type data may be used during calculation, the results then rounded and saved as int-type data. Exemplarily, when division is involved, float-type data are used for calculation and intermediate results are cached; rounding is performed when calculating the final results (including the above expand_w, expand_h, center_x, center_y, expand_bb0, expand_bb1, expand_bb2, and expand_bb3), which are saved as int-type data.

For the center point coordinates, since saving only int-type data would affect the accuracy of subsequent processing of other frames, both int-type and float-type data may be saved; for example, the results calculated in formula (3) may be saved as float-type data, as follows:

center_x_float = (bb0 + bb2)/2, center_x = int(center_x_float)
center_y_float = (bb1 + bb3)/2, center_y = int(center_y_float)    (5)

where center_x_float and center_y_float denote the center point coordinates saved as float-type data, center_x and center_y denote the center point coordinates saved as int-type data, and int() denotes the rounding operation.

Further, to ensure the accuracy of the results, formula (4) may be changed to the following calculation:

expand_bb0 = int(center_x_float − expand_w/2)
expand_bb1 = int(center_y_float − expand_h/2)
expand_bb2 = int(center_x_float + expand_w/2)
expand_bb3 = int(center_y_float + expand_h/2)    (6)
Step S430, if the face to be determined matches the determined face successfully, the stable bounding box of the face to be determined is determined according to the stable bounding box of the determined face.

Generally, the face to be determined in the original image to be beautified does not change much relative to the matching determined face in the reference frame image; neither its position change nor its size change is large. Therefore, the stable bounding box of the face to be determined can be obtained by making appropriate position and size changes on the basis of the stable bounding box of the determined face.

In one embodiment, the stable bounding box of the determined face may be subjected to position and size changes according to the position change parameters and size change parameters of the basic bounding box of the face to be determined relative to the basic bounding box of the determined face, to obtain the stable bounding box of the face to be determined.

In one embodiment, the above determining of the stable bounding box of the face to be determined according to the stable bounding box of the determined face may include the following step:

based on a preset stability coefficient, weighting the center point coordinates of the stable bounding box of the determined face and the center point coordinates of the basic bounding box of the face to be determined, to obtain the center point coordinates of the stable bounding box of the face to be determined.

The above step fuses the position of the stable bounding box of the determined face with the position of the basic bounding box of the face to be determined, as the position of the stable bounding box of the face to be determined. During fusion, the preset stability coefficient is used to weight the two center point coordinates; the preset stability coefficient may be the weight of the stable bounding box of the determined face, and may be determined according to experience or the actual scene. Generally, the faster the face moves in a scene, the smaller the preset stability coefficient. Exemplarily, in a live-streaming scene, the face usually moves only slightly within a certain range, and the preset stability coefficient may be set to 0.9; the center point coordinates of the stable bounding box of the face to be determined are then calculated as:

center_x = 0.9·pre_center_x + 0.1·center_x
center_y = 0.9·pre_center_y + 0.1·center_y    (7)

where pre_center_x denotes the x coordinate of the center point of the stable bounding box of the determined face, and pre_center_y denotes its y coordinate. Formula (7) thus weights the center point coordinates of the stable bounding box of the determined face by 0.9 and the center point coordinates of the basic bounding box of the face to be determined by 0.1, to obtain the center point coordinates of the stable bounding box of the face to be determined.

Similar to formula (5) above, both int-type and float-type center point coordinates may be saved, giving:

center_x_float = 0.9·pre_center_x_float + 0.1·center_x_float, center_x = int(center_x_float)
center_y_float = 0.9·pre_center_y_float + 0.1·center_y_float, center_y = int(center_y_float)    (8)

where pre_center_x_float is the saved float-type datum of pre_center_x, and pre_center_y_float is the saved float-type datum of pre_center_y.

Calculating the center point coordinates through the above weighting essentially adopts a momentum update mechanism for the center point coordinates, which can prevent the center point of the stable bounding box of the same face from moving excessively from the reference frame image to the original image to be beautified, which would cause the subsequently cropped original face sub-image to jitter and affect the beautification effect.
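The momentum update of formulas (7) and (8) can be sketched as follows, keeping a float-valued state across frames as the text recommends; the function name and return convention are hypothetical.

```python
def update_center(pre_center_float, basic_center, s=0.9):
    """Momentum update of the stable bounding box center: the determined
    face's stable center (kept as float across frames) is weighted by the
    preset stability coefficient s, the new frame's basic bounding box
    center by (1 - s), per formulas (7)-(8)."""
    cx = s * pre_center_float[0] + (1 - s) * basic_center[0]
    cy = s * pre_center_float[1] + (1 - s) * basic_center[1]
    # Return the float state for the next frame and int pixel coordinates.
    return (cx, cy), (int(cx), int(cy))
```

With s close to 1 the stable center lags the detection slightly, absorbing per-frame detection noise; smaller s tracks fast-moving faces more tightly.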
In one embodiment, the above determining of the stable bounding box of the face to be determined according to the stable bounding box of the determined face may include the following steps:

if the size of the basic bounding box of the face to be determined is greater than the product of the size of the stable bounding box of the determined face and a first magnification, the size of the stable bounding box of the determined face is expanded according to a second preset parameter to obtain the size of the stable bounding box of the face to be determined;

if the size of the basic bounding box of the face to be determined is smaller than the product of the size of the stable bounding box of the determined face and a second magnification, the size of the stable bounding box of the determined face is reduced according to a third preset parameter to obtain the size of the stable bounding box of the face to be determined; the first magnification is greater than the second magnification;

if the size of the basic bounding box of the face to be determined is smaller than the product of the size of the stable bounding box of the determined face and the first magnification and greater than the product of the size of the stable bounding box of the determined face and the second magnification, the size of the stable bounding box of the determined face is used as the size of the stable bounding box of the face to be determined.

The above steps distinguish three cases of calculation according to the comparison between the size of the basic bounding box of the face to be determined and the size of the stable bounding box of the determined face. The first magnification and the second magnification may be integer or non-integer magnifications. In one embodiment, the first magnification is greater than or equal to 1 and the second magnification is smaller than 1; exemplarily, the first magnification may be 1 and the second magnification may be 0.64.

During calculation, the width and height may be compared and calculated separately; for example, if the comparison result of the width falls into the first case and that of the height falls into the second case, the width and height of the stable bounding box of the face to be determined are calculated under the respective cases.

Denoting the first magnification by t1 and the second magnification by t2, the calculation of the width is explained as follows:

In the first case, if w > pre_expand_w·t1, denoting the second preset parameter by E2, then:

expand_w = pre_expand_w + pre_expand_w·E2    (9)

In the second case, if w < pre_expand_w·t2, denoting the third preset parameter by E3, then:

expand_w = pre_expand_w − pre_expand_w·E3    (10)

In the third case, if pre_expand_w·t2 < w < pre_expand_w·t1, then:

expand_w = pre_expand_w    (11)

For the height, expand_h can likewise be calculated under the above three cases.

Generally, in continuous multi-frame images of a video, as long as the face does not approach the camera rapidly, move away from it rapidly, or move out of the picture, the size of the face does not change drastically and the third case above holds; the size of the stable bounding box of the face to be determined then equals the size of the stable bounding box of the determined face, that is, the size of the stable bounding box remains unchanged. The first and second cases are both cases where the size of the face changes drastically. In the first case the face becomes drastically larger, and the size of the stable bounding box of the determined face is appropriately enlarged according to the second preset parameter to obtain the size of the stable bounding box of the face to be determined; the second preset parameter may be determined according to experience and the actual scene. In the second case the face becomes drastically smaller, and the size of the stable bounding box of the determined face is appropriately reduced according to the third preset parameter; the third preset parameter may likewise be determined according to experience and the actual scene.

If the expanded width expand_w exceeds the width W of the original image to be beautified, then expand_w = W; if the expanded height expand_h exceeds the height H of the original image to be beautified, then expand_h = H.

Through the calculation of the above three cases, the size of the stable bounding box of the same face can be prevented from changing excessively from the reference frame image to the original image to be beautified, which would cause the subsequently cropped original face sub-image to jitter and affect the beautification effect.

After the center point coordinates and the size of the stable bounding box of the face to be determined are obtained, the coordinates of the upper-left and lower-right points of the stable bounding box can be calculated. If a calculated coordinate exceeds the boundary of the original image to be beautified, the boundary coordinate of the original image to be beautified is used instead of the out-of-boundary coordinate. Finally, the stable bounding box can be expressed in the form [expand_bb0, expand_bb1, expand_bb2, expand_bb3].
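The three-case size update of formulas (9) to (11) can be sketched for the width as follows; the same logic applies to the height. The magnifications mirror the examples in the text (t1 = 1, t2 = 0.64), but the preset parameters E2 and E3 are illustrative values.

```python
def update_stable_width(w_basic, pre_expand_w, W,
                        t1=1.0, t2=0.64, E2=0.25, E3=0.25):
    """Width of the stable bounding box for a matched face.  w_basic is
    the basic bounding box width in the current frame; pre_expand_w is
    the determined face's stable bounding box width; W caps the result
    at the image width."""
    if w_basic > pre_expand_w * t1:
        # Formula (9): face grew sharply, enlarge by E2.
        expand_w = pre_expand_w + pre_expand_w * E2
    elif w_basic < pre_expand_w * t2:
        # Formula (10): face shrank sharply, reduce by E3.
        expand_w = pre_expand_w - pre_expand_w * E3
    else:
        # Formula (11): no drastic change, keep the stable size.
        expand_w = pre_expand_w
    return min(expand_w, W)
```

The dead zone between t2 and t1 is what keeps the crop size constant during ordinary motion, so the cropped face sub-image does not jitter frame to frame.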
由上可知,在待确定人脸与已确定人脸匹配成功的情况下,根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒,使得待确定人脸在一定程度上继承了已确定人脸的稳定包围盒的信息,从而保证了不同帧图像之间人脸的稳定包围盒具有一定的连续性与稳定性,不会发生剧烈的位置或尺寸变化,进而保证了后续进行美颜处理时人脸美颜效果的一致性,防止由于人脸的剧烈变化导致美颜后的人脸发生闪动现象。
在一种实施方式中,得到待确定人脸的稳定包围盒后,可以保存其稳定包围盒的相关参数,并将该待确定人脸标记为已确定人脸,以用于后续帧中待确定人脸的匹配与稳定包围盒的确定。
继续参考图3,在步骤S330中,基于待美颜原始图像中的人脸的稳定包围盒,从待美颜原始图像中提取原始人脸子图像。
从待美颜原始图像中截取稳定包围盒部分的图像,得到原始人脸子图像。当待美颜原始图像中包括多张人脸的稳定包围盒时,可以截取每一张人脸对应的原始人脸子图像。
继续参考图3,在步骤S340中,利用图像美颜网络对原始人脸子图像进行处理,得到对应的美颜人脸子图像。
当从待美颜原始图像中提取多张原始人脸子图像时,可以分别将每一张原始人脸子图像输入图像美颜网络,得到每一张原始人脸子图像对应的美颜人脸子图像,也可以将多张原始人脸子图像进行组合后输入图像美颜网络以进行处理。在一种实施方式中,参考图5所示,上述利用图像美颜网络对原始人脸子图像进行处理,得到对应的美颜人脸子图像,可以包括以下步骤S510至S530:
步骤S510,基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合,生成待美颜人脸图像。
在一种实施方式中,图像美颜网络可以是全卷积网络,全卷积网络可以处理不同尺寸的图像。在这种情况下,图像美颜网络对于输入的图像尺寸没有要求,尺寸的大小对于计算量、内存占用、美颜精细度有影响。可以根据用户设置的美颜精细度或者设备的性能,确定输入图像尺寸。由此,该图像美颜网络可以部署在高、中、低等不同性能的设备上,适用范围很广,无需针对不同的设备部署不同的图像美颜网络,降低了网络的训练成本。示例性的,考虑在移动终端上适合进行轻量化计算,可以将输入图像尺寸确定为较小的数值,例如为宽640*高448。
图像美颜网络的输入图像尺寸决定了待美颜人脸图像的清晰度,在清晰度较低时,不适合进行某些美颜功能,例如对于极小黑痣、干燥唇纹等微小瑕疵,其在待美颜人脸图像中占据的像素数较少,如果进行去除,则可能去除不准确,影响周围的皮肤纹理,或者产生闪动的现象。因此,根据图像美颜网络的实际部署环境,确定图像美颜网络的输入图像尺寸,进而确定图像美颜网络的美颜功能,并以此构建美颜图像数据集并进行训练。示例性的,当图像美颜网络的输入图像尺寸小于宽448*高320时,可以设置图像美颜网络不包括去除微小瑕疵的功能。
本示例性实施方式中,当获取多帧待美颜原始图像时,可以分别对每一帧待美颜原始图像中所截取的原始人脸子图像进行组合,使得一帧待美颜原始图像对应一帧待美颜人脸图像。例如,可以对视频进行逐帧美颜处理,依次对每一帧图像中所截取的原始人脸子图像进行组合,得到每一帧图像对应的每一帧待美颜人脸图像。也可以将多帧待美颜原始图像中所截取的原始人脸子图像进行组合,生成该多帧待美颜原始图像对应的一帧或多帧待美颜人脸图像。例如,可以对视频中的多帧进行合并美颜处理,将从连续多帧图像中所截取的原始人脸子图像进行任意组合,以匹配上述输入图像尺寸,如从连续两帧图像中均分别截取两张原始人脸子图像,将这四张原始人脸子图像组合为一帧待美颜人脸图像。
在获取输入图像尺寸后,需要将原始人脸子图像组合为该尺寸大小的待美颜人脸图像。具体组合的方式与原始人脸子图像的数量相关。在一种实施方式中,参考图6所示,上述基于图像美颜网络的输入图像尺寸将上述一张或多张原始人脸子图像进行组合,生成待美颜人脸图像,可以包括以下步骤S610至S630:
步骤S610,根据原始人脸子图像的数量,将输入图像尺寸分割为与原始人脸子图像一一对应的子图像尺寸;
步骤S620,分别基于每个子图像尺寸将对应的原始人脸子图像进行变换;
步骤S630,将变换后的原始人脸子图像进行组合,生成待美颜人脸图像。
下面结合图7举例说明。图7中Q表示原始人脸子图像的数量,图7分别示出了Q为1~4时的输入图像尺寸分割与图像组合的示例性方式。假设输入图像尺寸为宽640*高448,Q为1时,子图像尺寸也为宽640*高448;Q为2时,子图像尺寸为输入图像尺寸的一半,即宽320*高448;Q为3时,子图像尺寸分别为输入图像尺寸的0.5、0.25、0.25,即宽320*高448、宽320*高224、宽320*高224;Q为4时,子图像尺寸分别均为输入图像尺寸的0.25,即宽320*高224。将各个原始人脸子图像分别变换为与子图像尺寸一致。需要特别说明的是,当各个子图像尺寸不一致时,如Q为3的情况,可以按照原始人脸子图像的大小顺序与子图像尺寸的大小顺序,将原始人脸子图像与子图像尺寸进行一一对应,即最大的原始人脸子图像对应到最大的子图像尺寸,最小的原始人脸子图像对应到最小的子图像尺寸。在将原始人脸子图像进行变换后,再将变换后的原始人脸子图像按照图7所示的方式进行组合,生成一张待美颜人脸图像。
在一种实施方式中,当Q为偶数时,可以将输入图像尺寸进行Q等分,得到Q个相同的子图像尺寸。具体地,可以将Q分解为两个因数的乘积,即Q=q1*q2,使q1/q2的比例与输入图像尺寸的宽高比(如640/448)尽可能接近,将输入图像尺寸的宽度进行q1等分,高度进行q2等分。当Q为奇数时,将输入图像尺寸进行Q+1等分,得到Q+1个相同的子图像尺寸,将其中的两个子图像尺寸合并为一个子图像尺寸,其余Q-1个子图像尺寸不变,由此得到Q个子图像尺寸。
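偶数Q的因数分解与等分方式可以用如下示意性代码表达(仅为一种可能实现,函数名为本说明假设):

```python
def split_grid(q, width, height):
    """将q分解为q = q1 * q2,使q1/q2尽可能接近width/height,
    返回每个子图像尺寸(sub_w, sub_h)以及所选的(q1, q2)。"""
    best = None
    for q1 in range(1, q + 1):
        if q % q1:
            continue
        q2 = q // q1
        diff = abs(q1 / q2 - width / height)  # 与输入宽高比的偏差
        if best is None or diff < best[0]:
            best = (diff, q1, q2)
    _, q1, q2 = best
    return width // q1, height // q2, (q1, q2)
```

以输入图像尺寸宽640*高448为例,Q为2时分为左右两块(宽320*高448),Q为4时分为2*2的四块(宽320*高224),与文中示例一致。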
在另一种实施方式中,可以先计算原始人脸子图像的尺寸比例(或面积比例),如可以是S1:S2:S3:…:SQ,再按照该比例将输入图像尺寸分割为Q个子图像尺寸。
确定每个原始人脸子图像对应的子图像尺寸后,可以基于子图像尺寸对原始人脸子图像进行变换。在一种实施方式中,对原始人脸子图像进行变换,可以包括以下任意一条或多条:
①当原始人脸子图像的宽度与高度的大小关系与子图像尺寸的宽度与高度的大小关系不同时,将原始人脸子图像旋转90度。具体来说,在原始人脸子图像与子图像尺寸中,均为宽度大于高度或者均为宽度小于高度,则原始人脸子图像与子图像尺寸的宽度与高度的大小关系相同,无需旋转原始人脸子图像;否则,原始人脸子图像与子图像尺寸的宽度与高度的大小关系不同,需要将原始人脸子图像旋转90度(顺时针或逆时针旋转皆可)。例如,子图像尺寸为宽320*高448时,即宽度小于高度,如果原始人脸子图像为宽度大于高度的情况,则将原始人脸子图像旋转90度。
在一种实施方式中,为了保持原始人脸子图像中人脸的角度,可以不对原始人脸子图像进行旋转。
②当原始人脸子图像的尺寸大于子图像尺寸时,根据子图像尺寸将原始人脸子图像进行下采样。其中,原始人脸子图像的尺寸大于子图像尺寸,是指原始人脸子图像的宽度大于子图像尺寸的宽度,或者原始人脸子图像的高度大于子图像尺寸的高度。在图像美颜场景中,待处理图像一般是终端设备拍摄的清晰图像,其尺寸较大,因此原始人脸子图像的尺寸大于子图像尺寸是比较常见的情况,即通常情况下需要对原始人脸子图像进行下采样。
下采样可以采用双线性插值、最近邻插值等方法实现,本公开对此不做限定。
在进行下采样后,原始人脸子图像的宽度与高度中的至少一个与子图像尺寸对齐,具体包括以下几种情况:
原始人脸子图像的宽度、高度均与子图像尺寸相同;
原始人脸子图像的宽度与子图像尺寸的宽度相同,高度小于子图像尺寸的高度;
原始人脸子图像的高度与子图像尺寸的高度相同,宽度小于子图像尺寸的宽度。
需要说明的是,如果已经对原始人脸子图像进行了上述旋转,得到经过旋转的原始人脸子图像,则当该原始人脸子图像的尺寸大于子图像尺寸时,根据子图像尺寸对其进行下采样,具体的实现方式与上述原始人脸子图像的下采样方式相同,因而不再赘述。
反之,当原始人脸子图像(或经过旋转的原始人脸子图像)的尺寸小于或等于子图像尺寸时,可以不进行下采样的处理步骤。
③当原始人脸子图像的尺寸小于子图像尺寸时,根据原始人脸子图像与子图像尺寸的差值将原始人脸子图像进行填充,使填充后的原始人脸子图像的尺寸等于子图像尺寸。其中,原始人脸子图像的尺寸小于子图像尺寸,是指原始人脸子图像的宽度与高度中的至少一个小于子图像尺寸,另一个不大于子图像尺寸,具体包括以下几种情况:
原始人脸子图像的宽度小于子图像尺寸的宽度,高度也小于子图像尺寸的高度;
原始人脸子图像的宽度小于子图像尺寸的宽度,高度等于子图像尺寸的高度;
原始人脸子图像的高度小于子图像尺寸的高度,宽度等于子图像尺寸的宽度。
填充时可以采用预设像素值,通常是与人脸颜色差别较大的像素值,如(R0,G0,B0)、(R255,G255,B255)等。
一般可以填充在原始人脸子图像的四周,例如将原始人脸子图像的中心与子图像尺寸的中心重合,对原始人脸子图像四周的差值部分进行填充,使填充后原始人脸子图像的尺寸与子图像尺寸一致。当然也可以将原始人脸子图像与子图像尺寸的一侧边缘对齐,对另一侧进行填充。本公开对此不做限定。
需要说明的是,如果已经对原始人脸子图像进行了上述旋转与下采样中至少一种处理,得到经过旋转与下采样中至少一种处理的原始人脸子图像,则当该原始人脸子图像的尺寸小于子图像尺寸时,根据其与子图像尺寸的差值进行填充,具体的实现方式与上述原始人脸子图像的填充方式相同,因而不再赘述。
上述①~③为常用的三种变换方式,可以根据实际需求使用其中的任意一种或多种。例如,依次采用①、②、③对每一张原始人脸子图像进行处理,将处理后的原始人脸子图像组合为待美颜人脸图像。
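上述①~③的变换流程可以用如下示意性代码概括。该片段只计算"变换计划"(是否旋转、下采样比例与填充量),不涉及具体的插值实现,函数名与返回结构均为本说明假设:

```python
def plan_transform(h, w, slot_h, slot_w):
    """给定原始人脸子图像尺寸(h, w)与子图像尺寸(slot_h, slot_w),
    依次计算:①是否旋转90度;②下采样比例;③四周需要填充的像素量。"""
    rotated = (w > h) != (slot_w > slot_h)   # ① 宽高大小关系不一致时旋转
    if rotated:
        h, w = w, h
    scale = min(slot_h / h, slot_w / w, 1.0)  # ② 仅在超出子图像尺寸时缩小
    h2, w2 = int(h * scale), int(w * scale)
    return {"rotated": rotated, "scale": scale,
            "size": (h2, w2),
            "pad": (slot_h - h2, slot_w - w2)}  # ③ 差值部分用预设像素值填充
```

返回的变换信息(旋转、比例、填充量)即文中所述需要保存、以便后续逆变换的内容。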
在上述变换中,改变了原始人脸子图像的方向、尺寸等,这是为了便于图像美颜网络的统一处理。后续还需要对美颜后的图像进行逆变换,使其恢复为与原始人脸子图像的方向、尺寸等一致,以适应待美颜原始图像的尺寸。因此,可以保存相应的变换信息,包括但不限于:对每一张原始人脸子图像旋转的方向与角度,下采样的比例,填充的像素的坐标。这样便于后续根据该变换信息进行逆变换。
在将变换后的原始人脸子图像进行组合后,可以保存组合信息,包括但不限于每一张原始人脸子图像的尺寸(即对应的子图像尺寸)以及在待美颜人脸图像中的位置,各原始人脸子图像的排列方式与顺序。后续可以根据该组合信息对美颜人脸组合图像进行拆分,以得到每个单独的美颜人脸子图像。
步骤S520,利用图像美颜网络对待美颜人脸图像进行处理,得到对应的美颜人脸图像。
本示例性实施方式中,可以根据实际需求设置任意结构的图像美颜网络。一般的,图像美颜网络的输入与输出均为图像,因此可以采用端到端(end-to-end)结构,例如可以是全卷积网络。考虑到图像美颜处理需要对图像进行较为深入的特征挖掘与学习,因此图像美颜网络可以采用深度神经网络(Deep Neural Network,DNN),通过增加网络层数(即网络深度)以减少参数量,同时能够学习到图像的深层特征,实现像素级处理。
图8示出了图像美颜网络的示意性结构图,可以采用U-Net结构。示例性的,将待美颜人脸图像输入图像美颜网络后,由卷积层1进行一次或多次卷积操作(图8中示出卷积层1进行两次卷积操作,本公开对于每个卷积层中具体的卷积操作次数不做限定),然后经过池化操作,得到尺寸减小的特征图像;由卷积层2再进行一轮卷积与池化操作,得到尺寸进一步减小的特征图像;由卷积层3再进行一轮卷积与池化操作,得到尺寸更小的特征图像;在卷积层4中进行卷积操作,但不进行池化操作;之后进入转置卷积层1,先进行转置卷积操作,再与卷积层3中的特征图像进行拼接,然后进行一次或多次卷积操作,得到尺寸增大的特征图像;由转置卷积层2再进行一轮转置卷积操作、与卷积层2中的特征图像的拼接、以及卷积操作,得到尺寸进一步增大的特征图像;最后由转置卷积层3再进行一轮上述操作,输出美颜人脸图像。需要说明的是,本公开对于图像美颜网络中卷积层、转置卷积层的数量不做限定,根据实际场景需求,还可以在图像美颜网络中增加其他类型的中间层,如像素重排层、Dropout层(丢弃层)、全连接层等。
在一种实施方式中,图像美颜网络可以是全卷积网络,包括:第一像素重排层、至少一个卷积层、至少一个转置卷积层、第二像素重排层,其结构可以参考图9所示。与图8的网络结构相比,主要增加了两个像素重排层。基于图9所示的图像美颜网络,上述利用图像美颜网络对待美颜人脸图像进行处理,得到对应的美颜人脸图像,可以包括图10中的步骤S1010至S1040:
步骤S1010,利用第一像素重排层对待美颜人脸图像进行由单通道到多通道的像素重排处理,得到第一特征图像。
需要说明的是,待美颜人脸图像原本可以是单通道图像(如灰度图像),也可以是多通道图像(如RGB图像)。第一像素重排层可以将待美颜人脸图像的每个通道重排为多个通道。
在一种可选的实施方式中,步骤S1010包括:
将通道数为a的待美颜人脸图像输入第一像素重排层;
将待美颜人脸图像的每个通道中每n*n邻域的像素点分别重排至n*n个通道中的相同位置,输出通道数为a*n*n的第一特征图像。
其中,a表示待美颜人脸图像的通道数,为正整数,n表示像素重排的参数,为不小于2的正整数。以n=2为例,遍历待美颜人脸图像的第一通道,通常从左上角开始,将每2*2格子的像素点提取出来,分别重排到4个通道中的相同位置,由此将一个通道变为四个通道,同时图像的宽和高降低到一半,将重排后的图像记为第一特征图像;采用同样的方式处理其他通道。如果待美颜人脸图像为单通道图像,则像素重排后得到四通道的第一特征图像;如果待美颜人脸图像为三通道图像,则像素重排后得到十二通道的第一特征图像。
第一像素重排层可以采用TensorFlow(一种机器学习的实现框架)中的space_to_depth函数实现,将待美颜人脸图像中的空间特征转换为深度特征,也可以采用步长为n的卷积操作实现,此时第一像素重排层可视为特殊的卷积层。
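space_to_depth式的像素重排(以n=2为例)可以用如下纯Python片段示意,实际部署时通常直接调用TensorFlow的space_to_depth函数;此处以嵌套列表表示单个通道,仅作说明:

```python
def space_to_depth(channel, n=2):
    """将单通道图像(二维列表,尺寸H*W)中每n*n邻域的像素点
    分别重排至n*n个通道中的相同位置,输出n*n个尺寸(H//n)*(W//n)的通道。"""
    h, w = len(channel), len(channel[0])
    out = [[[0] * (w // n) for _ in range(h // n)] for _ in range(n * n)]
    for i in range(h):
        for j in range(w):
            c = (i % n) * n + (j % n)   # 像素在n*n邻域内的位置决定目标通道
            out[c][i // n][j // n] = channel[i][j]
    return out
```

可见,一个通道重排后变为n*n个通道,同时图像的宽和高降低到1/n,与文中描述一致。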
步骤S1020,利用卷积层对第一特征图像进行卷积处理,得到第二特征图像。
本公开对于卷积层的数量、卷积核尺寸、卷积层的具体结构等不做限定。卷积层用于从不同尺度上提取图像特征并学习深度信息。卷积层可以包括配套的池化层,用于对卷积后的图像进行下采样,以实现信息抽象,增大感受野,同时降低参数复杂度。
当设置多个卷积层时,可以采用逐步卷积与下采样的方式,例如可以使图像按照2倍率下降,直到最后一个卷积层输出第二特征图像,第二特征图像可以是图像美颜网络处理过程中尺寸最小的特征图像。
步骤S1030,利用转置卷积层对第二特征图像进行转置卷积处理,得到第三特征图像。
本公开对于转置卷积层的数量、转置卷积核尺寸、转置卷积层的具体结构等不做限定。转置卷积层用于对第二特征图像进行上采样,可视为卷积的相反过程,由此恢复图像的尺寸。
当设置多个转置卷积层时,可以采用逐步上采样的方式,例如可以使图像按照2倍率上升,直到最后一个转置卷积层输出第三特征图像。
在一种可选的实施方式中,卷积层与转置卷积层为完全对称的结构,则第三特征图像与第一特征图像的尺寸、通道数相同。
在一种可选的实施方式中,可以在卷积层与转置卷积层之间建立直连,如图9所示,在对应于相同尺寸的特征图像的卷积层与转置卷积层之间建立直连,由此实现卷积环节的特征图像信息直接连接到转置卷积环节中的特征图像,有利于得到信息更为全面的第三特征图像。
步骤S1040,利用第二像素重排层对第三特征图像进行由多通道到单通道的像素重排处理,得到美颜人脸组合图像。
第二像素重排层可以将第三特征图像的多个通道重排为单个通道。在一种可选的实施方式中,步骤S1040包括:
将通道数为b*n*n的第三特征图像输入第二像素重排层;
将第三特征图像的每n*n个通道中相同位置的像素点重排至单通道中的n*n邻域内,输出通道数为b的美颜人脸组合图像;
其中,b为正整数。
第二像素重排层可以采用TensorFlow中的depth_to_space函数实现,将第三特征图像中的深度特征转换为空间特征,也可以采用步长为n的转置卷积操作实现,此时第二像素重排层可视为特殊的转置卷积层。
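depth_to_space式的重排是space_to_depth式重排的逆过程,可用如下示意性片段说明(同样以嵌套列表表示通道,仅作说明,并非本申请限定的实现):

```python
def depth_to_space(channels, n=2):
    """将n*n个通道中相同位置的像素点重排至单通道中的n*n邻域内,
    输出尺寸为(H*n)*(W*n)的单通道图像。"""
    hh, ww = len(channels[0]), len(channels[0][0])
    out = [[0] * (ww * n) for _ in range(hh * n)]
    for c in range(n * n):
        di, dj = c // n, c % n          # 通道序号决定像素在n*n邻域内的偏移
        for i in range(hh):
            for j in range(ww):
                out[i * n + di][j * n + dj] = channels[c][i][j]
    return out
```
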
如果卷积层与转置卷积层为完全对称的结构,即第三特征图像与第一特征图像的尺寸、通道数相同,则有a=b,步骤S1040可以是步骤S1010的逆操作。进而,美颜人脸图像与待美颜人脸图像的通道数也相同,即图像美颜网络的处理过程不改变图像尺寸与通道数。
需要说明的是,图像美颜网络的处理过程同样不改变人脸的数量。例如待美颜人脸图像是由4张原始人脸子图像组合而成,在经过图像美颜网络的处理后,输出的美颜人脸图像中也包括4张人脸,是对4张原始人脸子图像中的人脸进行美颜后的人脸。
如果将图像美颜网络用于进行去瑕疵处理,其去瑕疵效果依赖于美颜图像数据集的质量与训练效果,而不依赖于人为设计的图像特征计算。当采用较为全面的美颜图像数据集进行充分训练后,图像美颜网络可以应对实际应用中的几乎所有情况,包括不同的光照条件、不同的皮肤状况等,实现准确、充分地检测与去除人像瑕疵。
在一种实施方式中,在图像美颜网络的训练过程,除了常规的美颜训练外,还可以增加抗闪动训练,使得经过训练的图像美颜网络可以进一步保证人脸美颜效果的一致性。参考图11所示,图像美颜处理方法还可以包括以下步骤S1110至S1140:
步骤S1110,将第一待美颜样本图像输入待训练的图像美颜网络,以输出第一美颜样本图像。
图像美颜网络可以实现不同美颜功能的组合,本示例性实施方式可以根据实际需求,获取对应于不同美颜功能的美颜图像数据集,以训练所需的图像美颜网络。例如,如果需要训练去瑕疵的图像美颜网络,则获取具有瑕疵的待美颜样本图像,通过人工去瑕疵处理,得到对应的标注图像(Ground truth),由此构建去瑕疵的美颜图像数据集;如果需要训练去瑕疵+形变的图像美颜网络,则获取具有瑕疵的待美颜样本图像,通过人工去瑕疵与形变处理,得到对应的标注图像,由此构建去瑕疵+形变的美颜图像数据集。当然,也可以先获取标注图像,经过反向处理,得到待美颜样本图像,例如获取无瑕疵的人脸图像,对其进行添加瑕疵、反向形变(是指与美颜中的形变相反的处理,例如美颜中常进行“瘦脸”,此处可以将脸部拉宽)等处理,得到待美颜样本图像,将无瑕疵的人脸图像作为其对应的标注图像,构建去瑕疵+形变的美颜图像数据集。可见,本示例性实施方式可以通过构建不同的美颜图像数据集,训练任意一种或多种美颜功能组合的图像美颜网络。
在一种实施方式中,可以将多张人脸图像进行组合,得到一张待美颜样本图像,并可以将该多张人脸图像对应的人工美颜后图像进行组合,得到该待美颜样本图像对应的一张标注图像,然后将该待美颜样本图像与标注图像添加至美颜图像数据集中。换句话说,美颜图像数据集可以包括单人脸的图像、多人脸的图像、组合人脸的图像等不同类型。
第一待美颜样本图像用于提供图像美颜网络的美颜训练,美颜训练是指训练图像美颜网络能够美颜出高质量、自然的图像。第一待美颜样本图像可以是美颜图像数据集中的任意图像。
图像美颜网络的结构可以参考上述图8与图9部分的内容,因而不再赘述。将第一待美颜样本图像输入图像美颜网络,输出对应的第一美颜样本图像,由于此时图像美颜网络未经训练或未经充分地训练,因此第一美颜样本图像应当与理想的美颜图像存在差别,例如与第一待美颜样本图像对应的标注图像存在差别。
步骤S1120,将第二待美颜样本图像输入图像美颜网络,并通过变换参数对图像美颜网络输出的图像进行变换,得到第二美颜样本图像。
第二待美颜样本图像用于提供图像美颜网络的抗闪动训练,抗闪动训练是指训练图像美颜网络能够对连续多帧图像实现稳定、无闪动的美颜处理效果。第二待美颜样本图像可以是美颜图像数据集中的任意图像。
在一种实施方式中,可以从同一个美颜图像数据集中获取第一待美颜样本图像与第二待美颜样本图像。例如第一待美颜样本图像与第二待美颜样本图像可以是同一张图像,这样美颜图像数据集中的每一张图像均可以同时作为第一待美颜样本图像与第二待美颜样本图像被使用,从而提高数据集的使用率。
上述通过人工处理的方式所构建的美颜图像数据集为有标注数据集,其中的待美颜样本图像均具有对应的标注图像。此外,还可以构建无标注数据集,例如仅收集待美颜样本图像,不需要进行人工处理,将这些待美颜样本图像形成无标注数据集,其中的待美颜样本图像均不具有对应的标注图像。在一种实施方式中,可以在有标注数据集中获取第一待美颜样本图像,在无标注数据集中获取第二待美颜样本图像。训练过程中并不使用第二待美颜样本图像对应的标注图像。无标注数据集的获取难度远低于有标注数据集,从而有利于增加第二待美颜样本图像的数量,便于对图像美颜网络进行更加充分的抗闪动训练。
在连续多帧图像之间,由于拍摄镜头的运动或者拍摄对象本身的运动,导致图像中的主要拍摄对象(即人脸)发生变换,包括平移、旋转、缩放中的一种或多种。步骤S1120中的变换参数用于模拟不同帧图像间的这种变化,可以包括平移参数、旋转参数、缩放参数中的任意一种或多种。
在一种实施方式中,可以对视频中图像的变换参数进行统计分析,得到步骤S1120中的变换参数。
在一种实施方式中,可以通过随机生成的方式得到变换参数。具体地,图像美颜处理方法还可以包括以下步骤:
获取预设的第一数值区间、第二数值区间、第三数值区间;
在第一数值区间内随机生成平移参数,在第二数值区间内随机生成旋转参数,在第三数值区间内随机生成缩放参数。
其中,第一数值区间是针对平移参数的数值区间,第二数值区间是针对旋转参数的数值区间,第三数值区间是针对缩放参数的数值区间,分别表示视频中图像可能发生的平移、旋转、缩放的数值范围。本示例性实施方式可以根据经验与实际场景确定三个数值区间。示例性的,第一数值区间可以是[-3,3],单位为像素,表示平移的像素数;第二数值区间可以是[-5,5],单位为度,表示旋转的度数;第三数值区间可以是[0.97,1.03],单位为倍,表示缩放的倍率。进而,分别在三个数值区间内生成随机数,得到平移参数、旋转参数、缩放参数,即得到步骤S1120中的变换参数。
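在上述数值区间内随机生成变换参数的过程可示意如下(区间默认值即文中示例,函数名为本说明假设):

```python
import random

def sample_transform_params(seed=None,
                            shift_range=(-3, 3),     # 第一数值区间:平移,像素
                            angle_range=(-5, 5),     # 第二数值区间:旋转,度
                            scale_range=(0.97, 1.03)):  # 第三数值区间:缩放,倍
    """分别在三个预设数值区间内随机生成平移、旋转、缩放参数。"""
    rng = random.Random(seed)
    return {
        "shift": (rng.uniform(*shift_range), rng.uniform(*shift_range)),
        "angle": rng.uniform(*angle_range),
        "scale": rng.uniform(*scale_range),
    }
```
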
将第二待美颜样本图像输入图像美颜网络,再利用上述得到的变换参数对图像美颜网络输出的图像进行变换,即第二待美颜样本图像先经过美颜、再经过变换,得到第二美颜样本图像。
步骤S1130,通过变换参数对第二待美颜样本图像进行变换,并将变换后的第二待美颜变换图像输入图像美颜网络,以输出第三美颜样本图像。
步骤S1130相当于将步骤S1120中的美颜与变换的顺序交换,即第二待美颜样本图像先经过变换、再经过美颜,得到第三美颜样本图像。
步骤S1140,基于第一待美颜样本图像对应的标注图像与第一美颜样本图像的差别,第二美颜样本图像与第三美颜样本图像的差别,更新图像美颜网络训练的参数。
标注图像与第一美颜样本图像的差别反映了图像美颜网络的美颜效果,即第一美颜样本图像越接近标注图像,表示图像美颜网络的美颜效果越好。由此,可以基于标注图像与第一美颜样本图像的差别,更新图像美颜网络的参数,以实现美颜训练。
第二美颜样本图像与第三美颜样本图像的差别反映了图像美颜网络的抗闪动效果。举例来说,视频第k帧图像(记为I_k,相当于上述变换后的第二待美颜样本图像)中的人脸相对第k-1帧图像(记为I_{k-1},相当于上述第二待美颜样本图像)中的人脸发生了变换,假设变换参数为P,即I_k=I_{k-1}·P。在实际美颜中,需要对第k-1帧图像与第k帧图像均进行美颜处理,将第k-1帧图像与第k帧图像输入图像美颜网络后输出的图像分别记为Y(I_{k-1})、Y(I_k)=Y(I_{k-1}·P),如果Y(I_{k-1})·P=Y(I_{k-1}·P)(相当于第二美颜样本图像等于第三美颜样本图像),说明经过美颜处理的第k-1帧图像与第k帧图像中的人脸存在变换关系,但是人脸本身没有差别,即视频中的连续两帧图像具有美颜一致性,不存在闪动的情况。由此,可以基于第二美颜样本图像与第三美颜样本图像的差别,更新图像美颜网络的参数,以实现抗闪动训练。
上述美颜训练对图像美颜网络的参数更新与抗闪动训练对图像美颜网络的参数更新可以同时执行,也可以分开执行,本公开对此不做限定。
在一种实施方式中,步骤S1140可以包括:
基于第一待美颜样本图像对应的标注图像与第一美颜样本图像的差别,确定第一损失函数值;
基于第二美颜样本图像与第三美颜样本图像的差别,确定第二损失函数值;
根据第一损失函数值与第二损失函数值更新图像美颜网络的参数。
其中,第一损失函数用于反映图像美颜网络的美颜损失,第二损失函数用于反映图像美颜网络的抗闪动损失。第一损失函数与第二损失函数可以预先建立,例如可以采用MAE(Mean Absolute Error,平均绝对误差,即L1损失)、MSE(Mean Square Error,均方误差,即L2损失)等形式。在训练中,将标注图像与第一美颜样本图像代入第一损失函数,计算出第一损失函数值,将第二美颜样本图像与第三美颜样本图像代入第二损失函数,计算出第二损失函数值。进而,可以分别根据第一损失函数值与第二损失函数值,对图像美颜网络的参数进行梯度下降更新,也可以由第一损失函数值与第二损失函数值进一步计算出全局损失函数值,全局损失函数例如可以是第一损失函数与第二损失函数进行加权的结果,再根据全局损失函数值对图像美颜网络的参数进行梯度下降更新。
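以L1损失(平均绝对误差)为例,将美颜损失与抗闪动损失加权为全局损失的计算可示意如下(权重w1、w2为本说明假设的超参数,图像以像素值序列简化表示):

```python
def mae(pred, target):
    """平均绝对误差(L1损失),pred与target为等长的像素值序列。"""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def global_loss(beauty_out, label, warped_out, out_of_warped, w1=1.0, w2=0.5):
    """第一损失函数值:标注图像label与第一美颜样本图像beauty_out的差别;
    第二损失函数值:第二美颜样本图像warped_out与第三美颜样本图像out_of_warped的差别;
    两者加权得到全局损失函数值。"""
    loss1 = mae(beauty_out, label)       # 美颜损失
    loss2 = mae(warped_out, out_of_warped)  # 抗闪动损失
    return w1 * loss1 + w2 * loss2
```
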
图12示出了训练图像美颜网络的示意性流程。将第一待美颜样本图像输入图像美颜网络,输出第一美颜样本图像,将第一待美颜样本图像对应的标注图像与第一美颜样本图像代入第一损失函数,得到第一损失函数值。将第二待美颜样本图像输入图像美颜网络,利用预先生成的变换参数对输出的图像进行变换处理,得到第二美颜样本图像。利用该变换参数对第二待美颜样本图像进行变换处理,再将变换处理后的图像输入图像美颜网络,输出第三美颜样本图像。将第二美颜样本图像与第三美颜样本图像代入第二损失函数,得到第二损失函数值。对第一损失函数值与第二损失函数值进行加权,得到全局损失函数值,根据全局损失函数值对图像美颜网络中的各参数进行更新。经过这样多轮的迭代更新,当图像美颜网络在上述美颜图像数据集中的验证子集上的准确率达到预设的准确率阈值或者损失值低于预设的损失阈值时,确定完成训练,得到可用于实际美颜处理的图像美颜网络。
由上可知,图像美颜网络在实现常规的美颜处理的同时,还可以对图像的平移、旋转、缩放等变换表现出美颜效果的不变性,例如视频中人脸发生平移、旋转、以及与镜头远近的变化所导致的缩放时,图像美颜网络可以保持对该人脸的美颜效果,这与上述通过稳定包围盒稳定人脸的位置与尺寸的技术手段相结合,可以进一步保证人脸美颜效果的一致性,防止连续多帧图像经过美颜后发生画面闪动的现象。
步骤S530,从美颜人脸图像中拆分出与原始人脸子图像对应的美颜人脸子图像。
其中,在对美颜人脸图像进行拆分时,可以采用上述保存的组合信息,从美颜人脸图像中拆分出特定位置、特定尺寸的子图像,即美颜人脸子图像,美颜人脸子图像与原始人脸子图像一一对应。
继续参考图3,在步骤S350中,根据美颜人脸子图像生成待美颜原始图像对应的目标美颜图像。
美颜人脸子图像为对待美颜原始图像中的人脸经过美颜处理后的结果,将其替换掉待美颜原始图像中的人脸,可以得到待美颜原始图像的美颜结果,即目标美颜图像。示例性的,可以将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像,得到目标美颜图像。
在一种实施方式中,如果在将原始人脸子图像组合为待美颜人脸图像时,对原始人脸子图像进行了变换,则可以相应地对拆分得到的美颜人脸子图像进行逆变换,包括去除填充的像素、上采样、反向旋转90度等,使逆变换后的美颜人脸子图像与原始人脸子图像的方向、尺寸等一致,这样在待美颜原始图像中可以进行1:1替换,得到目标美颜图像。
美颜人脸子图像是经过图像美颜网络进行美颜处理后的人脸子图像,通常是美颜程度较高的人脸子图像。在一种实施方式中,为了增加美颜人脸子图像的真实感,在上述将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像前,可以利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理。美颜弱化处理是指降低美颜人脸子图像的美颜程度,以增加真实感。下面提供美颜弱化处理的两种示例性方式:
方式一、根据设定的美颜程度参数,将原始人脸子图像融合至美颜人脸子图像。其中,美颜程度参数可以是特定美颜功能下的美颜力度参数,如去瑕疵程度。本示例性实施方式中,美颜程度参数可以是用户当前设定的参数,系统默认的参数,或者上一次美颜所使用的参数等。在确定美颜程度参数后,可以以美颜程度参数作为比重,将原始人脸子图像与美颜人脸子图像进行融合。举例来说,假设去瑕疵程度的范围为0~100,当前设定的值为a,参考如下公式:
image_blend=image_ori·(1-a/100)+image_deblemish·(a/100)  (12)
其中,image_blend表示融合后的图像,image_ori表示原始人脸子图像,image_deblemish表示美颜人脸子图像。当a为0时,表示不进行去瑕疵处理,则完全使用原始人脸子图像;当a为100时,表示完全去瑕疵处理,则完全使用美颜人脸子图像。因此,公式(12)表示通过融合,得到介于原始人脸子图像与美颜人脸子图像中间的图像,a越大,所得到的图像越接近于美颜人脸子图像,即美颜程度越高,美颜效果越明显。
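按美颜程度参数a进行逐像素加权融合的计算可示意如下(图像以像素值序列简化表示,仅作说明):

```python
def blend(image_ori, image_deblemish, a):
    """按去瑕疵程度a(0~100)融合原始人脸子图像与美颜人脸子图像:
    a为0时完全使用原始图像,a为100时完全使用美颜图像,
    a越大,融合结果越接近美颜人脸子图像。"""
    k = a / 100.0
    return [o * (1 - k) + d * k for o, d in zip(image_ori, image_deblemish)]
```
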
需要说明的是,如果在将原始人脸子图像组合为待美颜人脸图像时,对原始人脸子图像进行了变换,可以对拆分得到的美颜人脸子图像进行逆变换。原始人脸子图像与美颜人脸子图像具有如下关系:变换前的原始人脸子图像与逆变换后的美颜人脸子图像方向、尺寸等一致;变换后的原始人脸子图像与逆变换前的美颜人脸子图像方向、尺寸等一致。因此,在利用上述公式(12)将原始人脸子图像与美颜人脸子图像进行融合时,可以融合上述变换前的原始人脸子图像与逆变换后的美颜人脸子图像,也可以融合上述变换后的原始人脸子图像与逆变换前的美颜人脸子图像。
方式二、将原始人脸子图像的高频图像融合至美颜人脸子图像。其中,高频图像是指包含原始人脸子图像中细节纹理等高频信息的图像。
在一种实施方式中,可以通过以下方式获取高频图像:
在基于图像美颜网络的输入图像尺寸将上述一张或多张原始人脸子图像进行组合时,如果对原始人脸子图像进行下采样,则将下采样后得到的下采样人脸子图像进行上采样,得到上采样人脸子图像;
根据原始人脸子图像与上采样人脸子图像的差别,获取原始人脸子图像的高频图像。
其中,下采样人脸子图像的分辨率低于原始人脸子图像,一般在下采样的过程中,不可避免地会损失图像的高频信息。对下采样人脸子图像进行上采样,使得到的上采样人脸子图像与原始人脸子图像的分辨率相同。需要说明的是,如果对原始人脸子图像进行下采样前,还进行了旋转,则对下采样人脸子图像进行上采样后,还可以进行反向旋转,使得到的上采样人脸子图像与原始人脸子图像的方向也相同。
上采样可以采用双线性插值、最近邻插值等方法。通过上采样虽然能够恢复分辨率,但是难以完全恢复出所损失的高频信息,即上采样人脸子图像可视为原始人脸子图像的低频图像。由此,确定原始人脸子图像与上采样人脸子图像的差别,例如可以将原始人脸子图像与上采样人脸子图像相减,结果为原始人脸子图像的高频信息,将相减后的值形成图像,即原始人脸子图像的高频图像。
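"下采样—再上采样—相减"提取高频信息的过程,可以用一维信号示意如下(实际中对二维图像做同样处理;此处下采样用步长采样、上采样用最近邻插值简化,仅作说明):

```python
def high_freq(signal, n=2):
    """对一维信号先按步长n下采样,再用最近邻插值上采样回原长度;
    原信号与上采样结果之差即为损失的高频分量。"""
    down = signal[::n]                                   # 下采样
    up = [down[min(i // n, len(down) - 1)]               # 最近邻上采样
          for i in range(len(signal))]
    return [s - u for s, u in zip(signal, up)]           # 高频 = 原信号 - 低频
```

平缓信号的高频分量接近于0,而突变位置(如文中所述的小黑痣)会产生较大的高频值。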
在另一种实施方式中,还可以通过对原始人脸子图像进行滤波,以提取高频信息,得到高频图像。
在将上述高频图像融合至美颜人脸子图像时,可以采用直接相加的方式,将高频图像叠加到美颜人脸子图像中,使得美颜人脸子图像中增加细节纹理等高频信息,更具有真实感。
由于原始人脸子图像与上采样人脸子图像通常是非常相近的,基于其差值得到的高频图像中,像素值一般较小,如RGB各通道值不超过4。然而,对于原始人脸子图像中的突变位置,比如脸上的小黑痣等,其具有强烈的高频信息,因此在高频图像中对应位置的像素值可能比较大。在将高频图像融合至原始人脸子图像时,这些位置的像素值可能产生不良影响,例如产生“痣印”等锐利边缘,导致视觉感受不自然。
针对于上述问题,在一种实施方式中,图像美颜处理方法还可以包括以下步骤:
在高频图像中确定瑕疵点;
将高频图像中上述瑕疵点周围预设区域内的像素值调整到预设数值范围内。
其中,瑕疵点是具有强烈高频信息的像素点,可以将高频图像中像素值较大的点确定为瑕疵点。或者,在一种实施方式中,可以通过以下方式确定瑕疵点:
将美颜人脸子图像与对应的原始人脸子图像相减,得到每个像素点的差值;
当判断某个像素点的差值满足预设瑕疵条件时,将该像素点在高频图像中对应的像素点确定为瑕疵点。
其中,预设瑕疵条件用于衡量美颜人脸子图像与原始人脸子图像的差别,以判断每个像素点是否为被去除的瑕疵点。在去瑕疵处理中,通常会将人脸中的小黑痣、痘等去除,并填充人脸肤色,在该位置处,美颜人脸子图像与原始人脸子图像的差别很大,因此可以通过设定预设瑕疵条件来甄别瑕疵点。
示例性的,预设瑕疵条件可以包括:各个颜色通道的差值均大于第一颜色差阈值,且各个颜色通道的差值中的至少一个大于第二颜色差阈值。第一颜色差阈值与第二颜色差阈值可以是经验阈值。例如,当颜色通道包括RGB时,第一颜色差阈值可以是20,第二颜色差阈值可以是40。由此,得到每个像素点在美颜人脸子图像中与在原始人脸子图像中的差值后,对差值中RGB三个颜色通道的具体差值进行判断,判断每个颜色通道的差值是否均大于20,以及其中是否有至少一个颜色通道的差值大于40,当满足这两个条件时,表示满足预设瑕疵条件,则将高频图像中对应位置的像素点确定为瑕疵点。
确定瑕疵点后,可以在高频图像中进一步确定瑕疵点周围的预设区域,例如可以是以瑕疵点为中心的5*5像素区域,具体的尺寸可以根据高频图像的尺寸来确定,本公开不做限定。将预设区域内的像素值调整到预设数值范围内,预设数值范围一般是较小的数值范围,可以根据经验与实际需求确定,在调整时通常需要减小像素值。示例性的,预设数值范围可以是-2~2,而瑕疵点周围的像素值可能超出-5~5,将其调整到-2~2内,实际上进行了限值处理。由此能够弱化“痣印”等锐利边缘,增加视觉上的自然感受。
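上述瑕疵点判定与邻域限值处理可示意如下(阈值20、40与预设数值范围[-2,2]即文中示例,函数名为本说明假设):

```python
def is_blemish(diff_rgb, t1=20, t2=40):
    """预设瑕疵条件:各颜色通道差值均大于t1,且至少一个通道差值大于t2。"""
    return all(d > t1 for d in diff_rgb) and any(d > t2 for d in diff_rgb)

def clamp_region(values, low=-2, high=2):
    """将瑕疵点周围预设区域内的高频像素值限制到[low, high]内,
    以弱化"痣印"等锐利边缘。"""
    return [min(max(v, low), high) for v in values]
```
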
以上说明了两种美颜弱化处理方式。本示例性实施方式可以同时采用这两种美颜弱化处理方式,例如,先通过方式一进行原始人脸子图像与美颜人脸子图像的融合,在此基础上,再通过方式二将高频图像叠加到其中,得到经过美颜弱化处理的美颜人脸子图像,该美颜人脸子图像兼具有较好的美颜效果与真实感。
在一种实施方式中,在将待处理图像中的原始人脸子图像替换为对应的美颜人脸子图像时,还可以执行以下步骤:
对位于待美颜原始图像中的未替换区域与美颜人脸子图像之间的边界区域进行渐变处理,使边界区域形成平滑过渡。
其中,待美颜原始图像中的未替换区域即待美颜原始图像中除原始人脸子图像以外的区域。上述未替换区域与美颜人脸子图像之间的边界区域实际包括两部分:未替换区域中与美颜人脸子图像相邻的边界区域,以及美颜人脸子图像中与未替换区域相邻的边界区域。本示例性实施方式可以对其中任一部分进行渐变处理,也可以同时对两部分进行渐变处理。
参考图13所示,可以在美颜人脸子图像中确定一定比例(如10%)的边界区域,其从美颜人脸子图像的边缘向内延伸。需要注意的是,边界区域通常需要避开人脸部分,以避免渐变处理中改变人脸部分的颜色。例如,通过上述稳定包围盒截取原始人脸子图像,使得原始人脸子图像中的人脸与边界具有一定的距离,则美颜人脸子图像中的人脸与边界也具有一定的距离,这样在进行渐变处理时,可以较好地避开人脸部分。确定边界区域后,获取边界区域的内边缘颜色,记为第一颜色;获取未替换区域的内边缘颜色,记为第二颜色;再对边界区域进行第一颜色与第二颜色的渐变处理。由此,未替换区域与美颜人脸子图像的边界处为渐变色区域(图13中的斜线区域),这样形成平滑过渡,防止产生颜色突变,导致视觉感受不和谐。
需要说明的是,当有多张美颜人脸子图像时,可以分别将每一张美颜人脸子图像替换掉待处理图像中对应的原始人脸子图像,并进行边界区域的渐变处理,得到一张目标美颜图像,使其具有自然、和谐的视觉感受。
图14示出了图像美颜处理方法的示意性流程,包括:
步骤S1401,从视频中确定待美颜原始图像,例如可以将当前帧作为待美颜原始图像。
步骤S1402,对待美颜原始图像进行人脸检测,得到多张人脸的基础包围盒,筛除其中面积小于人脸面积阈值的人脸,余下的人脸记为待确定人脸。
步骤S1403,根据参考帧图像中的已确定人脸对上述待确定人脸进行跟踪,获取每张待确定人脸的ID,并确定其稳定包围盒。
步骤S1404,截取稳定包围盒以内的图像,得到原始人脸子图像。
步骤S1405,根据原始人脸子图像的数量将图像美颜网络的输入图像尺寸划分为多个子图像尺寸,根据子图像尺寸对原始人脸子图像进行下采样,还可以进行旋转、填充等处理,得到每张原始人脸子图像对应的下采样人脸子图像。
步骤S1406,将下采样人脸子图像进行上采样,如果在获取下采样人脸子图像时还进行了旋转、填充等处理,则还可以进行反向旋转、去除填充等处理,得到上采样人脸子图像,其与对应的原始人脸子图像的分辨率一致。
步骤S1407,将原始人脸子图像与对应的上采样人脸子图像相减,得到原始人脸子图像的高频图像。
步骤S1408,将下采样人脸子图像组合为一张待美颜人脸图像。
步骤S1409,将待美颜人脸图像输入图像美颜网络,处理后输出美颜人脸图像。
步骤S1410,将美颜人脸图像拆分为与原始人脸子图像一一对应的美颜人脸子图像。
步骤S1411,将美颜人脸子图像与对应的原始人脸子图像按照美颜程度参数进行融合,再与该原始人脸子图像的高频图像相加,得到待替换人脸子图像。
步骤S1412,将待替换人脸子图像融合至待美颜原始图像,具体地,可以由待替换人脸子图像替换掉待美颜原始图像中的原始人脸子图像的部分,并进行边缘的颜色渐变处理,使得待美颜原始图像中的人脸被替换为美颜后的人脸,最终得到目标美颜图像。后续还可以进行个性化美颜处理。
本公开的示例性实施方式还提供一种图像美颜处理装置。参考图15所示,该图像美颜处理装置1500可以包括:
图像获取模块1510,被配置为从连续多帧图像中获取待美颜原始图像;
人脸匹配模块1520,被配置为将待美颜原始图像中的人脸与待美颜原始图像的参考帧图像中的人脸进行匹配,根据匹配结果确定待美颜原始图像中的人脸的稳定包围盒;
子图像提取模块1530,被配置为基于待美颜原始图像中的人脸的稳定包围盒,从待美颜原始图像中提取原始人脸子图像;
美颜处理模块1540,被配置为利用图像美颜网络对原始人脸子图像进行处理,得到对应的美颜人脸子图像;
图像生成模块1550,被配置为根据美颜人脸子图像生成待美颜原始图像对应的目标美颜图像。
在一种实施方式中,人脸匹配模块1520,被配置为:
检测待美颜原始图像中的人脸,记为待确定人脸,将待确定人脸与待美颜原始图像的参考帧图像中的已确定人脸进行匹配;
如果待确定人脸与已确定人脸匹配不成功,则根据第一预设参数对待确定人脸的基础包围盒进行扩展,得到待确定人脸的稳定包围盒;
如果待确定人脸与已确定人脸匹配成功,则根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒。
在一种实施方式中,上述将待确定人脸与待美颜原始图像的参考帧图像中的已确定人脸进行匹配,包括:
根据待确定人脸的基础包围盒与已确定人脸的基础包围盒的重叠度,确定待确定人脸与已确定人脸是否匹配成功。
在一种实施方式中,上述根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒,包括:
基于预设稳定系数,对已确定人脸的稳定包围盒的中心点坐标与待确定人脸的基础包围盒的中心点坐标进行加权,得到待确定人脸的稳定包围盒的中心点坐标。
在一种实施方式中,上述根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒,包括:
如果待确定人脸的基础包围盒的尺寸大于已确定人脸的稳定包围盒的尺寸与第一倍率之积,则根据第二预设参数对已确定人脸的稳定包围盒的尺寸进行扩展,得到待确定人脸的稳定包围盒的尺寸;
如果待确定人脸的基础包围盒的尺寸小于已确定人脸的稳定包围盒的尺寸与第二倍率之积,则根据第三预设参数对已确定人脸的稳定包围盒的尺寸进行缩小,得到待确定人脸的稳定包围盒的尺寸;第一倍率大于第二倍率;
如果待确定人脸的基础包围盒的尺寸小于已确定人脸的稳定包围盒的尺寸与第一倍率之积、且大于已确定人脸的稳定包围盒的尺寸与第二倍率之积,则将已确定人脸的稳定包围盒的尺寸作为待确定人脸的稳定包围盒的尺寸。
在一种实施方式中,美颜处理模块1540,被配置为:
基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合,生成待美颜人脸图像;
利用图像美颜网络对待美颜人脸图像进行处理,得到对应的美颜人脸图像;
从美颜人脸图像中拆分出与原始人脸子图像对应的美颜人脸子图像。
在一种实施方式中,上述基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合,生成待美颜人脸图像,包括:
根据原始人脸子图像的数量,将输入图像尺寸分割为与原始人脸子图像一一对应的子图像尺寸;
分别基于每个子图像尺寸将对应的原始人脸子图像进行变换;
将变换后的原始人脸子图像进行组合,生成待美颜人脸图像。
在一种实施方式中,上述分别基于每个子图像尺寸将对应的原始人脸子图像进行变换,包括以下任意一条或多条:
当原始人脸子图像的宽度与高度的大小关系与子图像尺寸的宽度与高度的大小关系不同时,将原始人脸子图像旋转90度;
当原始人脸子图像或者经过旋转的原始人脸子图像的尺寸大于子图像尺寸时,根据子图像尺寸将原始人脸子图像或者经过旋转的原始人脸子图像进行下采样;
当原始人脸子图像或者经过旋转与下采样中至少一种处理的原始人脸子图像的尺寸小于子图像尺寸时,根据原始人脸子图像的尺寸与子图像尺寸的差值将原始人脸子图像进行填充,或者根据经过旋转与下采样中至少一种处理的原始人脸子图像的尺寸与子图像尺寸的差值将经过旋转与下采样中至少一种处理的原始人脸子图像进行填充。
在一种实施方式中,图像生成模块1550,被配置为:
将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像,得到目标美颜图像。
在一种实施方式中,图像生成模块1550,被配置为:
在将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像前,利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理。
在一种实施方式中,上述利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理,包括:
根据设定的美颜程度参数,将原始人脸子图像融合至美颜人脸子图像。
在一种实施方式中,上述利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理,包括:
将原始人脸子图像的高频图像融合至美颜人脸子图像。
在一种实施方式中,图像获取模块1510,被配置为:
在基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合时,如果对原始人脸子图像进行下采样,则将下采样后得到的下采样人脸子图像进行上采样,得到上采样人脸子图像,上采样人脸子图像与原始人脸子图像的分辨率相同;
根据原始人脸子图像与上采样人脸子图像的差别,获取原始人脸子图像的高频图像。
在一种实施方式中,图像生成模块1550,被配置为:
在将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像时,对待美颜原始图像中的未替换区域与美颜人脸子图像之间的边界区域进行渐变处理,使边界区域形成平滑过渡。
在一种实施方式中,图像美颜网络为全卷积网络,包括:第一像素重排层、至少一个卷积层、至少一个转置卷积层、第二像素重排层。
美颜处理模块1540,被配置为:
利用第一像素重排层对待美颜人脸图像进行由单通道到多通道的像素重排处理,得到第一特征图像;
利用卷积层对第一特征图像进行卷积处理,得到第二特征图像;
利用转置卷积层对第二特征图像进行转置卷积处理,得到第三特征图像;
利用第二像素重排层对第三特征图像进行由多通道到单通道的像素重排处理,得到美颜人脸图像。
在一种实施方式中,美颜处理模块1540,被配置为:
将通道数为a的待美颜人脸图像输入第一像素重排层;
将待美颜人脸图像的每个通道中每n*n邻域的像素点分别重排至n*n个通道中的相同位置,输出通道数为a*n*n的第一特征图像;
其中,a为正整数,n为不小于2的正整数。
在一种实施方式中,美颜处理模块1540,被配置为:
将通道数为b*n*n的第三特征图像输入第二像素重排层;
将第三特征图像的每n*n个通道中相同位置的像素点重排至单通道中的n*n邻域内,输出通道数为b的美颜人脸图像;
其中,b为正整数,n为不小于2的正整数。
在一种实施方式中,图像美颜处理装置1500还可以包括网络训练模块,被配置为:
将第一待美颜样本图像输入待训练的图像美颜网络,以输出第一美颜样本图像;
将第二待美颜样本图像输入图像美颜网络,并通过变换参数对图像美颜网络输出的图像进行变换,得到第二美颜样本图像;
通过变换参数对第二待美颜样本图像进行变换,并将变换后的第二待美颜变换图像输入图像美颜网络,以输出第三美颜样本图像;
基于第一待美颜样本图像对应的标注图像与第一美颜样本图像的差别,第二美颜样本图像与第三美颜样本图像的差别,更新图像美颜网络训练的参数。
在一种实施方式中,美颜人脸图像包括去瑕疵美颜图像;美颜处理模块1540,被配置为:
在得到去瑕疵美颜图像后,对去瑕疵美颜图像进行个性化美颜处理,得到最终的美颜图像。
本公开的示例性实施方式还提供另一种图像美颜处理装置。参考图16所示,该图像美颜处理装置1600可以包括处理器1610和存储器1620,存储器1620存储有以下程序模块:
图像获取模块1621,被配置为从连续多帧图像中获取待美颜原始图像;
人脸匹配模块1622,被配置为将待美颜原始图像中的人脸与待美颜原始图像的参考帧图像中的人脸进行匹配,根据匹配结果确定待美颜原始图像中的人脸的稳定包围盒;
子图像提取模块1623,被配置为基于待美颜原始图像中的人脸的稳定包围盒,从待美颜原始图像中提取原始人脸子图像;
美颜处理模块1624,被配置为利用图像美颜网络对原始人脸子图像进行处理,得到对应的美颜人脸子图像;
图像生成模块1625,被配置为根据美颜人脸子图像生成待美颜原始图像对应的目标美颜图像;
处理器1610被配置为执行以上程序模块。
在一种实施方式中,人脸匹配模块1622,被配置为:
检测待美颜原始图像中的人脸,记为待确定人脸,将待确定人脸与待美颜原始图像的参考帧图像中的已确定人脸进行匹配;
如果待确定人脸与已确定人脸匹配不成功,则根据第一预设参数对待确定人脸的基础包围盒进行扩展,得到待确定人脸的稳定包围盒;
如果待确定人脸与已确定人脸匹配成功,则根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒。
在一种实施方式中,上述将待确定人脸与待美颜原始图像的参考帧图像中的已确定人脸进行匹配,包括:
根据待确定人脸的基础包围盒与已确定人脸的基础包围盒的重叠度,确定待确定人脸与已确定人脸是否匹配成功。
在一种实施方式中,上述根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒,包括:
基于预设稳定系数,对已确定人脸的稳定包围盒的中心点坐标与待确定人脸的基础包围盒的中心点坐标进行加权,得到待确定人脸的稳定包围盒的中心点坐标。
在一种实施方式中,上述根据已确定人脸的稳定包围盒确定待确定人脸的稳定包围盒,包括:
如果待确定人脸的基础包围盒的尺寸大于已确定人脸的稳定包围盒的尺寸与第一倍率之积,则根据第二预设参数对已确定人脸的稳定包围盒的尺寸进行扩展,得到待确定人脸的稳定包围盒的尺寸;
如果待确定人脸的基础包围盒的尺寸小于已确定人脸的稳定包围盒的尺寸与第二倍率之积,则根据第三预设参数对已确定人脸的稳定包围盒的尺寸进行缩小,得到待确定人脸的稳定包围盒的尺寸;第一倍率大于第二倍率;
如果待确定人脸的基础包围盒的尺寸小于已确定人脸的稳定包围盒的尺寸与第一倍率之积、且大于已确定人脸的稳定包围盒的尺寸与第二倍率之积,则将已确定人脸的稳定包围盒的尺寸作为待确定人脸的稳定包围盒的尺寸。
在一种实施方式中,美颜处理模块1624,被配置为:
基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合,生成待美颜人脸图像;
利用图像美颜网络对待美颜人脸图像进行处理,得到对应的美颜人脸图像;
从美颜人脸图像中拆分出与原始人脸子图像对应的美颜人脸子图像。
在一种实施方式中,上述基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合,生成待美颜人脸图像,包括:
根据原始人脸子图像的数量,将输入图像尺寸分割为与原始人脸子图像一一对应的子图像尺寸;
分别基于每个子图像尺寸将对应的原始人脸子图像进行变换;
将变换后的原始人脸子图像进行组合,生成待美颜人脸图像。
在一种实施方式中,上述分别基于每个子图像尺寸将对应的原始人脸子图像进行变换,包括以下任意一条或多条:
当原始人脸子图像的宽度与高度的大小关系与子图像尺寸的宽度与高度的大小关系不同时,将原始人脸子图像旋转90度;
当原始人脸子图像或者经过旋转的原始人脸子图像的尺寸大于子图像尺寸时,根据子图像尺寸将原始人脸子图像或者经过旋转的原始人脸子图像进行下采样;
当原始人脸子图像或者经过旋转与下采样中至少一种处理的原始人脸子图像的尺寸小于子图像尺寸时,根据原始人脸子图像的尺寸与子图像尺寸的差值将原始人脸子图像进行填充,或者根据经过旋转与下采样中至少一种处理的原始人脸子图像的尺寸与子图像尺寸的差值将经过旋转与下采样中至少一种处理的原始人脸子图像进行填充。
在一种实施方式中,图像生成模块1625,被配置为:
将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像,得到目标美颜图像。
在一种实施方式中,图像生成模块1625,被配置为:
在将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像前,利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理。
在一种实施方式中,上述利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理,包括:
根据设定的美颜程度参数,将原始人脸子图像融合至美颜人脸子图像。
在一种实施方式中,上述利用原始人脸子图像对美颜人脸子图像进行美颜弱化处理,包括:
将原始人脸子图像的高频图像融合至美颜人脸子图像。
在一种实施方式中,图像获取模块1621,被配置为:
在基于图像美颜网络的输入图像尺寸将从待美颜原始图像中提取的原始人脸子图像进行组合时,如果对原始人脸子图像进行下采样,则将下采样后得到的下采样人脸子图像进行上采样,得到上采样人脸子图像,上采样人脸子图像与原始人脸子图像的分辨率相同;
根据原始人脸子图像与上采样人脸子图像的差别,获取原始人脸子图像的高频图像。
在一种实施方式中,图像生成模块1625,被配置为:
在将待美颜原始图像中的原始人脸子图像替换为对应的美颜人脸子图像时,对待美颜原始图像中的未替换区域与美颜人脸子图像之间的边界区域进行渐变处理,使边界区域形成平滑过渡。
在一种实施方式中,图像美颜网络为全卷积网络,包括:第一像素重排层、至少一个卷积层、至少一个转置卷积层、第二像素重排层。
美颜处理模块1624,被配置为:
利用第一像素重排层对待美颜人脸图像进行由单通道到多通道的像素重排处理,得到第一特征图像;
利用卷积层对第一特征图像进行卷积处理,得到第二特征图像;
利用转置卷积层对第二特征图像进行转置卷积处理,得到第三特征图像;
利用第二像素重排层对第三特征图像进行由多通道到单通道的像素重排处理,得到美颜人脸图像。
在一种实施方式中,美颜处理模块1624,被配置为:
将通道数为a的待美颜人脸图像输入第一像素重排层;
将待美颜人脸图像的每个通道中每n*n邻域的像素点分别重排至n*n个通道中的相同位置,输出通道数为a*n*n的第一特征图像;
其中,a为正整数,n为不小于2的正整数。
在一种实施方式中,美颜处理模块1624,被配置为:
将通道数为b*n*n的第三特征图像输入第二像素重排层;
将第三特征图像的每n*n个通道中相同位置的像素点重排至单通道中的n*n邻域内,输出通道数为b的美颜人脸图像;
其中,b为正整数,n为不小于2的正整数。
在一种实施方式中,图像美颜处理装置1600还可以包括网络训练模块,被配置为:
将第一待美颜样本图像输入待训练的图像美颜网络,以输出第一美颜样本图像;
将第二待美颜样本图像输入图像美颜网络,并通过变换参数对图像美颜网络输出的图像进行变换,得到第二美颜样本图像;
通过变换参数对第二待美颜样本图像进行变换,并将变换后的第二待美颜变换图像输入图像美颜网络,以输出第三美颜样本图像;
基于第一待美颜样本图像对应的标注图像与第一美颜样本图像的差别,第二美颜样本图像与第三美颜样本图像的差别,更新图像美颜网络训练的参数。
在一种实施方式中,美颜人脸图像包括去瑕疵美颜图像;美颜处理模块1624,被配置为:
在得到去瑕疵美颜图像后,对去瑕疵美颜图像进行个性化美颜处理,得到最终的美颜图像。
上述装置中各部分的具体细节在方法部分实施方式中已经详细说明,未披露的细节内容可以参见方法部分的实施方式内容,因而不再赘述。
本公开的示例性实施方式还提供了一种计算机可读存储介质,可以实现为一种程序产品的形式,其包括程序代码,当程序产品在电子设备上运行时,程序代码用于使电子设备执行本说明书上述"示例性方法"部分中描述的根据本公开各种示例性实施方式的步骤。在一种可选的实施方式中,该程序产品可以实现为便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在电子设备,例如个人电脑上运行。然而,本公开的程序产品不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以为但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。
计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、有线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言的任意组合来编写用于执行本公开操作的程序代码,程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本公开的示例性实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。
所属技术领域的技术人员能够理解,本公开的各个方面可以实现为系统、方法或程序产品。因此,本公开的各个方面可以具体实现为以下形式,即:完全的硬件实施方式、完全的软件实施方式(包括固件、微代码等),或硬件和软件方面结合的实施方式,这里可以统称为“电路”、“模块”或“系统”。本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其他实施方式。本申请旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施方式仅被视为示例性的,本公开的真正范围和精神由权利要求指出。
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限定。
Claims (22)
- 一种图像美颜处理方法,其特征在于,包括:从连续多帧图像中获取待美颜原始图像;将所述待美颜原始图像中的人脸与所述待美颜原始图像的参考帧图像中的人脸进行匹配,根据匹配结果确定所述待美颜原始图像中的人脸的稳定包围盒;基于所述待美颜原始图像中的人脸的稳定包围盒,从所述待美颜原始图像中提取原始人脸子图像;利用图像美颜网络对所述原始人脸子图像进行处理,得到对应的美颜人脸子图像;根据所述美颜人脸子图像生成所述待美颜原始图像对应的目标美颜图像。
- 根据权利要求1所述的方法,其特征在于,所述将所述待美颜原始图像中的人脸与所述待美颜原始图像的参考帧图像中的人脸进行匹配,根据匹配结果确定所述待美颜原始图像中的人脸的稳定包围盒,包括:检测所述待美颜原始图像中的人脸,记为待确定人脸,将所述待确定人脸与所述待美颜原始图像的参考帧图像中的已确定人脸进行匹配;如果所述待确定人脸与所述已确定人脸匹配不成功,则根据第一预设参数对所述待确定人脸的基础包围盒进行扩展,得到所述待确定人脸的稳定包围盒;如果所述待确定人脸与所述已确定人脸匹配成功,则根据所述已确定人脸的稳定包围盒确定所述待确定人脸的稳定包围盒。
- 根据权利要求2所述的方法,其特征在于,所述将所述待确定人脸与所述待美颜原始图像的参考帧图像中的已确定人脸进行匹配,包括:根据所述待确定人脸的基础包围盒与所述已确定人脸的基础包围盒的重叠度,确定所述待确定人脸与所述已确定人脸是否匹配成功。
- 根据权利要求2所述的方法,其特征在于,所述根据所述已确定人脸的稳定包围盒确定所述待确定人脸的稳定包围盒,包括:基于预设稳定系数,对所述已确定人脸的稳定包围盒的中心点坐标与所述待确定人脸的基础包围盒的中心点坐标进行加权,得到所述待确定人脸的稳定包围盒的中心点坐标。
- 根据权利要求2所述的方法,其特征在于,所述根据所述已确定人脸的稳定包围盒确定所述待确定人脸的稳定包围盒,包括:如果所述待确定人脸的基础包围盒的尺寸大于所述已确定人脸的稳定包围盒的尺寸与第一倍率之积,则根据第二预设参数对所述已确定人脸的稳定包围盒的尺寸进行扩展,得到所述待确定人脸的稳定包围盒的尺寸;如果所述待确定人脸的基础包围盒的尺寸小于所述已确定人脸的稳定包围盒的尺寸与第二倍率之积,则根据第三预设参数对所述已确定人脸的稳定包围盒的尺寸进行缩小,得到所述待确定人脸的稳定包围盒的尺寸;所述第一倍率大于所述第二倍率;如果所述待确定人脸的基础包围盒的尺寸小于所述已确定人脸的稳定包围盒的尺寸与第一倍率之积、且大于所述已确定人脸的稳定包围盒的尺寸与第二倍率之积,则将所述已确定人脸的稳定包围盒的尺寸作为所述待确定人脸的稳定包围盒的尺寸。
- 根据权利要求1所述的方法,其特征在于,所述利用图像美颜网络对所述原始人脸子图像进行处理,得到对应的美颜人脸子图像,包括:基于所述图像美颜网络的输入图像尺寸将从所述待美颜原始图像中提取的所述原始人脸子图像进行组合,生成待美颜人脸图像;利用所述图像美颜网络对所述待美颜人脸图像进行处理,得到对应的美颜人脸图像;从所述美颜人脸图像中拆分出与所述原始人脸子图像对应的美颜人脸子图像。
- 根据权利要求6所述的方法,其特征在于,所述基于所述图像美颜网络的输入图像尺寸将从所述待美颜原始图像中提取的所述原始人脸子图像进行组合,生成所述待美颜人脸图像,包括:根据所述原始人脸子图像的数量,将所述输入图像尺寸分割为与所述原始人脸子图像一一对应的子图像尺寸;分别基于每个子图像尺寸将对应的所述原始人脸子图像进行变换;将变换后的所述原始人脸子图像进行组合,生成所述待美颜人脸图像。
- 根据权利要求7所述的方法,其特征在于,所述分别基于每个子图像尺寸将对应的所述原始人脸子图像进行变换,包括以下任意一条或多条:当所述原始人脸子图像的宽度与高度的大小关系与所述子图像尺寸的宽度与高度的大小关系不同时,将所述原始人脸子图像旋转90度;当所述原始人脸子图像或者经过旋转的原始人脸子图像的尺寸大于所述子图像尺寸时,根据所述子图像尺寸将所述原始人脸子图像或者所述经过旋转的原始人脸子图像进行下采样;当所述原始人脸子图像或者经过旋转与下采样中至少一种处理的原始人脸子图像的尺寸小于所述子图像尺寸时,根据所述原始人脸子图像的尺寸与所述子图像尺寸的差值将所述原始人脸子图像进行填充,或者根据所述经过旋转与下采样中至少一种处理的原始人脸子图像的尺寸与所述子图像尺寸的差值将所述经过旋转与下采样中至少一种处理的原始人脸子图像进行填充。
- 根据权利要求6所述的方法,其特征在于,所述根据所述美颜人脸子图像生成所述待美颜原始图像对应的目标美颜图像,包括:将所述待美颜原始图像中的所述原始人脸子图像替换为对应的所述美颜人脸子图像,得到所述目标美颜图像。
- 根据权利要求9所述的方法,其特征在于,在将所述待美颜原始图像中的所述原始人脸子图像替换为对应的所述美颜人脸子图像前,所述方法还包括:利用所述原始人脸子图像对所述美颜人脸子图像进行美颜弱化处理。
- 根据权利要求10所述的方法,其特征在于,所述利用所述原始人脸子图像对所述美颜人脸子图像进行美颜弱化处理,包括:根据设定的美颜程度参数,将所述原始人脸子图像融合至所述美颜人脸子图像。
- 根据权利要求10所述的方法,其特征在于,所述利用所述原始人脸子图像对所述美颜人脸子图像进行美颜弱化处理,包括:将所述原始人脸子图像的高频图像融合至所述美颜人脸子图像。
- 根据权利要求12所述的方法,其特征在于,所述方法还包括:在基于所述图像美颜网络的输入图像尺寸将从所述待美颜原始图像中提取的所述原始人脸子图像进行组合时,如果对所述原始人脸子图像进行下采样,则将下采样后得到的下采样人脸子图像进行上采样,得到上采样人脸子图像,所述上采样人脸子图像与所述原始人脸子图像的分辨率相同;根据所述原始人脸子图像与所述上采样人脸子图像的差别,获取所述原始人脸子图像的高频图像。
- 根据权利要求9所述的方法,其特征在于,在将所述待美颜原始图像中的所述原始人脸子图像替换为对应的所述美颜人脸子图像时,所述方法还包括:对位于所述待美颜原始图像中的未替换区域与所述美颜人脸子图像之间的边界区域进行渐变处理,使所述边界区域形成平滑过渡。
- 根据权利要求6所述的方法,其特征在于,所述图像美颜网络为全卷积网络,包括:第一像素重排层、至少一个卷积层、至少一个转置卷积层、第二像素重排层;所述利用所述图像美颜网络对所述待美颜人脸图像进行处理,得到对应的美颜人脸图像,包括:利用所述第一像素重排层对所述待美颜人脸图像进行由单通道到多通道的像素重排处理,得到第一特征图像;利用所述卷积层对所述第一特征图像进行卷积处理,得到第二特征图像;利用所述转置卷积层对所述第二特征图像进行转置卷积处理,得到第三特征图像;利用所述第二像素重排层对所述第三特征图像进行由多通道到单通道的像素重排处理,得到所述美颜人脸图像。
- 根据权利要求15所述的方法,其特征在于,所述利用所述第一像素重排层对所述待美颜人脸图像进行由单通道到多通道的像素重排处理,得到第一特征图像,包括:将通道数为a的所述待美颜人脸图像输入所述第一像素重排层;将所述待美颜人脸图像的每个通道中每n*n邻域的像素点分别重排至n*n个通道中的相同位置,输出通道数为a*n*n的所述第一特征图像;其中,a为正整数,n为不小于2的正整数。
- 根据权利要求15所述的方法,其特征在于,所述利用所述第二像素重排层对所述第三特征图像进行由多通道到单通道的像素重排处理,得到所述美颜人脸图像,包括:将通道数为b*n*n的所述第三特征图像输入所述第二像素重排层;将所述第三特征图像的每n*n个通道中相同位置的像素点重排至单通道中的n*n邻域内,输出通道数为b的所述美颜人脸图像;其中,b为正整数,n为不小于2的正整数。
- 根据权利要求1所述的方法,其特征在于,所述方法还包括:将第一待美颜样本图像输入待训练的所述图像美颜网络,以输出第一美颜样本图像;将第二待美颜样本图像输入所述图像美颜网络,并通过变换参数对所述图像美颜网络输出的图像进行变换,得到第二美颜样本图像;通过所述变换参数对所述第二待美颜样本图像进行变换,并将变换后的第二待美颜变换图像输入所述图像美颜网络,以输出第三美颜样本图像;基于所述第一待美颜样本图像对应的标注图像与所述第一美颜样本图像的差别,所述第二美颜样本图像与所述第三美颜样本图像的差别,更新所述图像美颜网络训练的参数。
- 根据权利要求1所述的方法,其特征在于,所述目标美颜图像包括去瑕疵美颜图像,在得到所述去瑕疵美颜图像后,所述方法还包括:对所述去瑕疵美颜图像进行个性化美颜处理,得到最终的美颜图像。
- 一种图像美颜处理装置,其特征在于,包括处理器与存储器,所述处理器被配置为执行所述存储器中存储的以下程序模块:图像获取模块,被配置为从连续多帧图像中获取待美颜原始图像;人脸匹配模块,被配置为将所述待美颜原始图像中的人脸与所述待美颜原始图像的参考帧图像中的人脸进行匹配,根据匹配结果确定所述待美颜原始图像中的人脸的稳定包围盒;子图像提取模块,被配置为基于所述待美颜原始图像中的人脸的稳定包围盒,从所述待美颜原始图像中提取原始人脸子图像;美颜处理模块,被配置为利用图像美颜网络对所述原始人脸子图像进行处理,得到对应的美颜人脸子图像;图像生成模块,被配置为根据所述美颜人脸子图像生成所述待美颜原始图像对应的目标美颜图像。
- 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至19任一项所述的方法。
- 一种电子设备,其特征在于,包括:处理器;以及存储器,用于存储所述处理器的可执行指令;其中,所述处理器配置为经由执行所述可执行指令来执行权利要求1至19任一项所述的方法。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110793989.6A CN113538274A (zh) | 2021-07-14 | 2021-07-14 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN202110793989.6 | 2021-07-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023284401A1 true WO2023284401A1 (zh) | 2023-01-19 |
Family
ID=78099028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/093127 WO2023284401A1 (zh) | 2021-07-14 | 2022-05-16 | 图像美颜处理方法、装置、存储介质与电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113538274A (zh) |
WO (1) | WO2023284401A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115841432A (zh) * | 2023-02-09 | 2023-03-24 | 北京达佳互联信息技术有限公司 | 美颜特效数据确定和模型训练方法、装置、设备和介质 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113077397B (zh) * | 2021-03-29 | 2024-05-17 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN113538274A (zh) * | 2021-07-14 | 2021-10-22 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN114049278A (zh) * | 2021-11-17 | 2022-02-15 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN115908119B (zh) * | 2023-01-05 | 2023-06-06 | 广州佰锐网络科技有限公司 | 基于人工智能的人脸图像美颜处理方法及系统 |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040151371A1 (en) * | 2003-01-30 | 2004-08-05 | Eastman Kodak Company | Method for face orientation determination in digital color images |
CN104732210A (zh) * | 2015-03-17 | 2015-06-24 | 深圳超多维光电子有限公司 | 目标人脸跟踪方法及电子设备 |
CN106228112A (zh) * | 2016-07-08 | 2016-12-14 | 深圳市优必选科技有限公司 | 人脸检测跟踪方法及机器人头部转动控制方法和机器人 |
CN109308469A (zh) * | 2018-09-21 | 2019-02-05 | 北京字节跳动网络技术有限公司 | 用于生成信息的方法和装置 |
CN112233041A (zh) * | 2020-11-05 | 2021-01-15 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN113077397A (zh) * | 2021-03-29 | 2021-07-06 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN113538274A (zh) * | 2021-07-14 | 2021-10-22 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN113902611A (zh) * | 2021-10-09 | 2022-01-07 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
CN114049278A (zh) * | 2021-11-17 | 2022-02-15 | Oppo广东移动通信有限公司 | 图像美颜处理方法、装置、存储介质与电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN113538274A (zh) | 2021-10-22 |
Legal Events
- 121 | Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22841041; Country of ref document: EP; Kind code of ref document: A1)
- NENP | Non-entry into the national phase (Ref country code: DE)
- 122 | Ep: pct application non-entry in european phase (Ref document number: 22841041; Country of ref document: EP; Kind code of ref document: A1)