
WO2021179773A1 - Image processing method and device - Google Patents


Info

Publication number
WO2021179773A1
WO2021179773A1 (PCT/CN2020/142530)
Authority
WO
WIPO (PCT)
Prior art keywords
posture
preview image
target reference
scene
terminal
Prior art date
Application number
PCT/CN2020/142530
Other languages
French (fr)
Chinese (zh)
Inventor
黄秀杰
张迪
马飞龙
李宇
宋星光
王提政
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021179773A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/667 Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body

Definitions

  • This application relates to the field of image processing technology, and in particular to image processing methods and devices.
  • portrait photography accounts for a large proportion of mobile phone photography. To obtain a beautiful portrait photograph, one must first determine the shooting angle, then choose a composition suited to the current shooting scene, and finally have the subject pose to take the desired picture.
  • there are posture recommendation applications on the market.
  • their working principle is as follows: the user manually selects the posture to be photographed, and the mobile phone then displays the selected posture on the screen.
  • the person being photographed poses under the guidance of the displayed posture, and the photographer then actively decides whether to shoot.
  • such applications require the photographer's subjective judgment during the posture recommendation process; the interaction is not very friendly and lacks intelligence.
  • the embodiments of the present application provide an image processing method and device.
  • the interaction is better and more intelligent, so that the user experience can be improved.
  • an image processing method is provided, and the method is applied to a first terminal.
  • the method includes: first displaying a first preview image of the current shooting scene, the first preview image including a first portrait of the photographed person in a first posture.
  • the first preview image is recognized to determine the scene category of the current shooting scene.
  • the embodiment of the present application provides an intelligent posture guidance/recommendation method that integrates scene information. The entire process of recommending postures does not require user participation, so the interaction is better and more intelligent, which can improve the user experience.
  • the first posture is different from the second posture.
  • the target image may be an image obtained by shooting the current shooting scene by the first device.
  • the target image is the image that the first terminal needs to save.
  • the target reference posture and the first posture meet at least one of the following conditions: the target reference posture is different from the first posture; the relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image; or the size occupied by the target reference posture in the second preview image is different from the size occupied by the first posture in the first preview image.
  • the technical solution provided by this possible design can be understood as: displaying the target reference posture in the second preview image when at least one of the foregoing conditions is satisfied.
  • the embodiment of the present application provides a possible trigger condition for displaying the target reference posture in the second preview image.
  • the scene category of the current shooting scene includes at least one of the following categories: grass scene, step scene, seaside scene, sunset scene, road scene, or tower scene. The specific implementation is not limited to these categories.
  • the posture category of the target reference posture is obtained based on the posture category of the first posture; wherein the posture category includes a sitting posture, a standing posture, or a lying posture.
  • the posture category of the target reference posture is consistent with that of the first posture. In this way, the person being photographed does not need to adjust their posture greatly, which helps to improve the user experience.
  • the target reference posture is, among the multiple reference postures corresponding to the category of the current shooting scene, one whose similarity with the first posture is greater than or equal to a first threshold. Since the reference postures are predefined graceful and natural postures, this design helps to recommend a graceful and natural posture to the user while minimizing (or reducing as much as possible) the amplitude by which the photographed person must adjust their posture, thereby improving the user experience.
  • the target reference posture is the reference posture with the highest similarity to the first posture among the multiple reference postures corresponding to the category of the current shooting scene. Since the reference postures are predefined graceful and natural postures, this design likewise helps to recommend a graceful and natural posture to the user while minimizing the amplitude by which the photographed person must adjust their posture, thereby improving the user experience.
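The two selection rules above (threshold-based and highest-similarity) can be sketched as follows. This is a minimal illustration, not the patent's exact method; the similarity function and the pose representation are assumptions supplied by the caller.

```python
def select_target_reference_pose(first_pose, reference_poses, similarity, threshold=None):
    """Pick a target reference posture for the current scene category.

    reference_poses: postures predefined for the recognised scene category.
    If `threshold` is given, consider only reference postures whose similarity
    to the subject's current (first) posture meets it; in either case return
    the most similar candidate, so the subject adjusts as little as possible.
    """
    scored = [(similarity(first_pose, ref), ref) for ref in reference_poses]
    if threshold is not None:
        scored = [(s, ref) for s, ref in scored if s >= threshold]
    if not scored:
        return None  # no sufficiently similar reference posture to recommend
    return max(scored, key=lambda t: t[0])[1]
```

With a real pose representation, `similarity` would compare key-point angle vectors; here any callable returning a comparable score works.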
  • the position of the target reference posture in the second preview image is determined based on the position of a first preset object in the first preview image; wherein there is a first association relationship between a first local posture of the target reference posture and the position of the first preset object in the same image, and the first association relationship is predefined or determined in real time.
  • this possible design provides a specific implementation for determining the position of the target reference posture in the second preview image. In this way, it helps to improve the degree of combination (or coupling, or association) between the person's posture and the preset object in the preview image, so that the photographing effect is better.
  • the size of the target reference posture in the second preview image is determined based on the size of a second preset object in the first preview image; wherein there is a second association relationship between the target reference posture and the size of the second preset object in the same image, and the second association relationship is predefined or determined in real time.
  • this possible design provides a specific implementation for determining the size of the target reference posture in the second preview image. In this way, it helps to improve the overall composition effect, thereby making the picture better.
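One way to realise the position and size associations above is to anchor and scale the reference pose's key points relative to the preset object's bounding box. This sketch is illustrative only; the anchor and height ratios stand in for the predefined association relationships and are not taken from the patent.

```python
def place_reference_pose(pose_points, preset_box, offset_ratio=(0.5, 1.0), height_ratio=0.6):
    """Position and scale a reference pose relative to a preset object.

    pose_points:  list of (x, y) key points of the reference posture.
    preset_box:   (x, y, w, h) bounding box of the detected preset object.
    offset_ratio: anchor on the box; (0.5, 1.0) = horizontally centred, at its bottom.
    height_ratio: pose height as a fraction of the object's height (assumed association).
    """
    x, y, w, h = preset_box
    xs = [px for px, _ in pose_points]
    ys = [py for _, py in pose_points]
    x0, y0 = min(xs), min(ys)
    pw, ph = max(xs) - x0, max(ys) - y0
    scale = (h * height_ratio) / ph                              # size association
    ax, ay = x + w * offset_ratio[0], y + h * offset_ratio[1]    # position association
    # Scale the pose, centre it horizontally on the anchor, rest its lowest point on it.
    return [((px - x0) * scale + ax - pw * scale / 2,
             (py - y0) * scale + ay - ph * scale) for px, py in pose_points]
```

In practice the ratios would come from the predefined (or real-time) association between a local posture and the preset object, e.g. "lean against the tower's base at 60% of its height."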
  • displaying the target reference posture in the second preview image includes: displaying the target reference posture in the form of a human skeleton or a human contour in the second preview image.
  • the target reference posture information is determined by the first terminal itself, or acquired by the first terminal from a network device.
  • displaying the target reference posture in the second preview image includes: if the scene category of the current shooting scene includes multiple scene categories, displaying multiple target reference postures in the second preview image; wherein there is a one-to-one correspondence between the scene categories and the target reference postures.
  • the target image is generated according to the second preview image, including: if the second posture matches any one of the multiple target reference postures, generating the target image according to the second preview image.
  • the method further includes: sending information about the target reference posture and information about the second preview image to a second terminal, to instruct the second terminal to display the second preview image and to display the target reference posture in the second preview image.
  • the method further includes: displaying category information of the current shooting scene in the second preview image. In this way, the user can learn the category information of the current shooting scene, thereby improving the user experience.
  • different scene categories are characterized by different predefined object groups. If the first preview image contains one predefined object group, the scene category of the current shooting scene is the category represented by that group. If the first preview image contains multiple predefined object groups, the scene categories of the current shooting scene are some or all of the categories represented by those groups. In other words, the current shooting scene may have one or more scene categories.
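The object-group rule above can be sketched as a simple containment check. The group definitions and labels here are illustrative placeholders, not the patent's actual groups.

```python
# Each scene category is characterised by a predefined object group; the
# current scene takes every category whose group is fully present (hypothetical groups).
SCENE_OBJECT_GROUPS = {
    "grass":   {"grass"},
    "seaside": {"sea", "sky"},
    "sunset":  {"sun", "sky"},
    "tower":   {"tower"},
}

def classify_scene(detected_objects):
    """Return every scene category whose predefined object group is
    fully contained in the objects detected in the first preview image."""
    detected = set(detected_objects)
    return [scene for scene, group in SCENE_OBJECT_GROUPS.items()
            if group <= detected]
```

A frame containing sea, sky, and sun would thus carry two scene categories, matching the "one or more" case in the design above.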
  • the proportion of the first portrait in the first preview image is greater than or equal to the second threshold.
  • the number of pixels in the first portrait is greater than or equal to the third threshold.
  • the first portrait is relatively large. This technical solution is proposed in consideration of the facts that, if the portrait is small, it is difficult to judge the posture of the person being photographed, so recommending a reference posture would be of little significance, and that it avoids mistaking a person in the background for the person being photographed.
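The two size conditions above (area proportion and pixel count) can be combined into a single gate. The threshold values here are illustrative assumptions, not the patent's second and third thresholds.

```python
def portrait_large_enough(portrait_pixels, image_pixels,
                          min_ratio=0.05, min_pixels=10_000):
    """Gate posture recommendation on portrait size (thresholds illustrative).

    A small portrait makes the subject's posture hard to judge, and may
    belong to a background passer-by rather than the person being shot.
    """
    ratio_ok = portrait_pixels / image_pixels >= min_ratio   # second-threshold check
    count_ok = portrait_pixels >= min_pixels                 # third-threshold check
    return ratio_ok and count_ok
```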
  • the target image is generated according to the second preview image, including: if the second posture matches the target reference posture, outputting prompt information, where the prompt information is used to prompt that the second posture matches the target reference posture; receiving a first operation; and in response to the first operation, generating the target image according to the second preview image.
  • This method provides a specific implementation method for generating the target image under the instruction of the user.
  • the first terminal may automatically generate the target image according to the second preview image when determining that the second posture matches the target reference posture.
  • the method further includes: if the similarity between the second posture and the target reference posture is greater than or equal to a fourth threshold, determining that the second posture matches the target reference posture.
  • the method includes: calculating a first vector and a second vector, where the first vector is formed from the relative angle information of key points in the second portrait and is used to represent the second posture, and the second vector is formed from the relative angle information of key points in the portrait in the target reference posture and is used to represent the target reference posture; calculating the distance between the first vector and the second vector; and if the distance is less than or equal to a fifth threshold, determining that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
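The angle-vector comparison described above can be sketched as follows. The key-point triples and the distance threshold are assumptions for illustration; the patent does not fix a particular skeleton topology here.

```python
import math

def joint_angles(keypoints, triples):
    """Relative angle (radians) at joint b for each (a, b, c) key-point triple."""
    angles = []
    for a, b, c in triples:
        (ax, ay), (bx, by), (cx, cy) = keypoints[a], keypoints[b], keypoints[c]
        ang = math.atan2(cy - by, cx - bx) - math.atan2(ay - by, ax - bx)
        angles.append(math.atan2(math.sin(ang), math.cos(ang)))  # wrap to (-pi, pi]
    return angles

def poses_match(kp_subject, kp_reference, triples, max_distance=0.5):
    """The second posture matches the target reference posture when the
    Euclidean distance between the two angle vectors is within a threshold
    (the fifth threshold in the design above; 0.5 rad is illustrative)."""
    v1 = joint_angles(kp_subject, triples)
    v2 = joint_angles(kp_reference, triples)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))
    return dist <= max_distance
```

Because the vectors store joint angles rather than raw coordinates, the comparison is insensitive to where in the frame the subject stands and to how large the portrait is, which is presumably why relative angle information is used.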
  • the method further includes: inputting the second posture and the target reference posture into a neural network to obtain the similarity between the second posture and the target reference posture; wherein the neural network is used to characterize the similarity between multiple input postures.
  • an image processing device which may be a terminal, a chip, or a chip system.
  • the device can be used to execute any of the methods provided in the first aspect above.
  • the device may be divided into functional modules according to any of the methods provided in the above-mentioned first aspect and any of its possible design manners.
  • each function module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the present application may divide the device into a processing unit and a sending unit according to functions.
  • the device includes a memory and one or more processors, where the memory is used to store computer instructions, and the processor is used to call the computer instructions to execute any method provided in the first aspect and any of its possible designs.
  • the display step in any method provided in the above-mentioned first aspect and any of its possible design manners may be specifically replaced with a control display step in this possible design.
  • the output step in any method provided in the above-mentioned first aspect or any possible design manner can be specifically replaced with a control output step in this possible design.
  • a terminal which includes a processor, a memory, and a display screen.
  • the display screen is used to display images and other information
  • the memory is used to store computer programs and instructions
  • the processor is used to call the computer programs and instructions, and cooperate with the display screen to execute the technical solutions provided by the first aspect or its corresponding possible designs.
  • a computer-readable storage medium such as a non-transitory computer-readable storage medium.
  • a computer program (or instruction) is stored thereon, and when the computer program (or instruction) runs on a computer, the computer is caused to execute any method provided by any one of the possible implementations of the first aspect.
  • the display step in any method provided in the above-mentioned first aspect and any of its possible design manners can be specifically replaced with a control display step in the possible design.
  • the output step in any method provided in the above-mentioned first aspect or any possible design manner can be specifically replaced with a control output step in this possible design.
  • a computer program product which, when running on a computer, enables any method provided in any possible implementation manner of the first aspect or the second aspect to be executed.
  • the display step in any method provided in the above-mentioned first aspect and any of its possible design manners can be specifically replaced with a control display step in the possible design.
  • the output step in any method provided in the above-mentioned first aspect or any possible design manner can be specifically replaced with a control output step in this possible design.
  • the naming of each functional module does not constitute a limitation on the device or the functional module itself. In actual implementation, these devices or functional modules may appear under other names. As long as the function of each device or functional module is similar to that of this application, it falls within the scope of the claims of this application and their technical equivalents.
  • FIG. 1 is a schematic structural diagram of a terminal that can be adapted to an embodiment of the present application
  • FIG. 2 is a block diagram of the software structure of a terminal suitable for an embodiment of the present application
  • FIG. 3 is a flowchart of an image processing method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of a display mode of a target reference posture provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of an image displayed on a first terminal in a tower scenario according to an embodiment of the application
  • FIG. 6 is a schematic diagram of an image displayed on a first terminal in a sunset scene provided by an embodiment of the application
  • FIG. 7 is a flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a human body key point applicable to the embodiment of the present application.
  • FIG. 9 is a flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 10 is a schematic flowchart of a photographing method provided by an embodiment of this application.
  • FIG. 11 is a schematic diagram of a comparison of photographing effects provided by an embodiment of this application.
  • FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the application.
  • words such as "exemplary" or "for example" are used to mean an example, illustration, or explanation. Any embodiment or design solution described as "exemplary" or "for example" in the embodiments of the present application should not be construed as being more preferable or advantageous than other embodiments or design solutions. To be precise, words such as "exemplary" or "for example" are used to present related concepts in a specific manner.
  • first and second are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, the features defined with “first” and “second” may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.
  • a terminal, which can be a terminal with a camera, such as a smartphone, a tablet computer, a wearable device, an AR/VR device, a personal computer (PC), a personal digital assistant (PDA), or a netbook, and may also be any other terminal that can implement the embodiments of the present application.
  • This application does not limit the specific form of the terminal.
  • wearable devices can also be called wearable smart devices; the term covers the application of wearable technology to the intelligent design and development of everyday wear, such as glasses, gloves, watches, clothing, and shoes.
  • a wearable device is a portable device that is worn directly on the body.
  • wearable devices are not only a kind of hardware device, but also realize powerful functions through software support, data interaction, and cloud interaction.
  • wearable smart devices include full-featured, large-sized devices that can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to be used in cooperation with other devices such as smartphones.
  • the structure of the terminal may be as shown in Figure 1.
  • the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, and a battery 142, Antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone interface 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, A display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light Sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the terminal 100.
  • the terminal 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • for example, the processor 110 may control the display screen 194 to display a first preview image of the current shooting scene, the first preview image including a first portrait of the subject in a first posture. Second, the processor 110 recognizes the first preview image to determine the scene category of the current shooting scene.
  • third, the processor 110 controls the display screen 194 to display a second preview image of the current shooting scene and displays the target reference posture in the second preview image; the target reference posture is obtained at least based on the scene category of the current shooting scene, and the second preview image includes a second portrait of the subject in a second posture. Finally, if the second posture matches the target reference posture, the target image is generated according to the second preview image.
  • the controller may be the nerve center and command center of the terminal 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and so on.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the terminal 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the terminal 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transfer data between the terminal 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect to other terminals, such as AR devices.
  • the interface connection relationship between the modules illustrated in this embodiment is merely a schematic description, and does not constitute a structural limitation of the terminal 100.
  • the terminal 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the terminal 100 implements a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • the display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the terminal 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • a control is a GUI element. It is a software component contained in an application that controls all the data processed by the application and the interactive operations on that data. The user can interact with a control through direct manipulation, so as to read or edit the relevant information of the application.
  • controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and Widgets.
  • the terminal 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a picture, the shutter is opened, light is transmitted to the photosensitive element of the camera through the lens, the optical signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, where it is converted into an image visible to the naked eye.
  • the ISP can also optimize the noise, brightness, and skin color of the image, as well as the exposure, color temperature, and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the terminal 100 may include one or N cameras 193, and N is a positive integer greater than one.
  • the aforementioned camera 193 may include one or at least two cameras such as a main camera, a telephoto camera, a wide-angle camera, an infrared camera, a depth camera, or a black and white camera.
  • the first terminal may use one or at least two cameras to capture images, and process the captured images (such as fusion, etc.) to obtain preview images (such as the first preview image or the second preview image, etc.).
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • through the NPU, applications such as intelligent cognition of the terminal 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the terminal 100 by running instructions stored in the internal memory 121.
  • the processor 110 may acquire the posture of the terminal 100 by executing instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the terminal 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the terminal 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a "horn", is used to convert audio electrical signals into sound signals.
  • the terminal 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • when the terminal 100 answers a call or voice message, the receiver 170B can be brought close to the human ear to receive the voice.
  • the microphone 170C, also called a "mike" or "mic", is used to convert sound signals into electrical signals.
  • the user can make a sound with the mouth close to the microphone 170C, so as to input the sound signal into the microphone 170C.
  • the terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the terminal 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the terminal 100 determines the strength of the pressure according to the change in capacitance.
  • the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction to create a new short message is executed.
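The intensity-dependent dispatch described above can be sketched as follows. This is an illustrative example only: the threshold value and the instruction names are assumptions, not values taken from the embodiment.

```python
# Illustrative sketch: dispatching different operation instructions for
# touches at the same position, based on the touch intensity reported by
# a pressure sensor such as 180A. Threshold and instruction names are
# assumed for illustration.

FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized pressure value

def dispatch_touch_on_sms_icon(pressure: float) -> str:
    """Return the instruction for a touch on the short message icon."""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"        # light press: view messages
    return "create_new_short_message"      # firm press: compose a message
```

A light press (below the first pressure threshold) views messages, while a firm press (at or above it) creates a new one.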
  • the gyro sensor 180B may be used to determine the movement posture of the terminal 100.
  • in some embodiments, the angular velocity of the terminal 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyroscope sensor 180B detects the shake angle of the terminal 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal 100 through a reverse movement to achieve anti-shake.
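The compensation distance mentioned above can be approximated geometrically. The sketch below is an assumption for illustration (a thin-lens small-angle model, not the patent's implementation): the lens shift needed to cancel a shake angle is roughly the focal length times the tangent of that angle.

```python
import math

# Illustrative sketch (assumed model, not the patent's algorithm):
# estimate the lens-shift distance needed to counteract a detected
# shake angle, using shift ≈ focal_length * tan(theta).

def ois_compensation_mm(focal_length_mm: float, shake_angle_deg: float) -> float:
    """Distance (in mm) the lens module moves to cancel the shake."""
    return focal_length_mm * math.tan(math.radians(shake_angle_deg))
```

For a typical 4 mm phone lens, a 1-degree shake calls for a shift of roughly 0.07 mm in the opposite direction.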
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the terminal 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
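The pressure-to-altitude conversion can be sketched with the standard international barometric formula. The formula and the sea-level reference pressure below are a common approximation, not values taken from the patent.

```python
# Illustrative sketch: deriving altitude from the air pressure measured
# by a barometric sensor such as 180C, using the standard international
# barometric formula (an assumption; not specified by the patent).

def altitude_m(pressure_hpa: float, sea_level_hpa: float = 1013.25) -> float:
    """Approximate altitude in meters from the measured air pressure."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

At the sea-level reference pressure the formula yields 0 m; at 900 hPa it yields roughly 1 km, which the terminal can use to assist positioning and navigation.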
  • the magnetic sensor 180D includes a Hall sensor.
  • the terminal 100 may use the magnetic sensor 180D to detect the opening and closing of a flip holster.
  • in some embodiments, when the terminal 100 is a flip phone, the terminal 100 can detect the opening and closing of the flip according to the magnetic sensor 180D, and then set features such as automatic unlocking of the flip cover based on the detected opening or closing state of the holster or the flip.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally three axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. The sensor can also be used to recognize terminal gestures, and is applied to switching between horizontal and vertical screens, pedometers, and other applications.
  • the distance sensor 180F is used to measure distance. The terminal 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the terminal 100 emits infrared light to the outside through the light emitting diode.
  • the terminal 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 can determine that there is no object in the vicinity of the terminal 100.
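The reflected-light decision described above reduces to a threshold test. The sketch below is illustrative; the threshold value and the in-call screen-off helper are assumptions.

```python
# Illustrative sketch of the proximity decision above: enough infrared
# reflected light means an object is near the terminal. The threshold
# is an assumed raw photodiode reading.

REFLECTION_THRESHOLD = 100  # assumed photodiode reading

def object_nearby(reflected_light: int) -> bool:
    """True if sufficient infrared reflected light was detected."""
    return reflected_light >= REFLECTION_THRESHOLD

def should_turn_off_screen(in_call: bool, reflected_light: int) -> bool:
    """Turn the screen off when the user holds the phone to the ear."""
    return in_call and object_nearby(reflected_light)
```

During a call, a near object (the ear) turns the screen off to save power; outside a call, proximity alone does not.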
  • the terminal 100 can use the proximity light sensor 180G to detect that the user holds the terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in the leather case mode and the pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 180J is used to detect temperature.
  • the terminal 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the terminal 100 executes to reduce the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • the terminal 100 when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid abnormal shutdown of the terminal 100 due to low temperature.
  • in some other embodiments, when the temperature is lower than still another threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
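The temperature-processing strategy above (throttle when hot, heat the battery when cold, boost the battery voltage when very cold) can be sketched as a small policy function. All threshold values here are assumptions for illustration.

```python
# Illustrative sketch of the temperature-processing strategy described
# above. Thresholds are assumed values, not taken from the patent.

HIGH_TEMP_C = 45.0       # assumed throttling threshold
LOW_TEMP_C = 0.0         # assumed battery-heating threshold
VERY_LOW_TEMP_C = -10.0  # assumed voltage-boost threshold

def temperature_policy(temp_c):
    """Return the list of actions for the reported temperature."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        # reduce performance of the processor near the sensor
        actions.append("reduce_processor_performance")
    if temp_c < LOW_TEMP_C:
        actions.append("heat_battery")          # avoid cold shutdown
    if temp_c < VERY_LOW_TEMP_C:
        actions.append("boost_battery_voltage") # avoid cold shutdown
    return actions
```

Note the two cold-weather actions can both apply at very low temperatures, matching the separate embodiments above.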
  • the touch sensor 180K is also called a "touch device".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch screen is composed of the touch sensor 180K and the display screen 194, which is also called a “touch screen”.
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the terminal 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can obtain the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 180M, and realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the terminal 100 may receive key input, and generate key signal input related to user settings and function control of the terminal 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
  • different application scenarios (for example: time reminding, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • an operating system runs on the above-mentioned components, for example, the iOS operating system developed by Apple, the Android open source operating system developed by Google, or the Windows operating system developed by Microsoft. Applications can be installed and run on this operating system.
  • the operating system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the terminal 100 by way of example.
  • FIG. 2 is a block diagram of the software structure of the terminal 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the camera application can access the camera interface management service provided by the application framework layer.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
  • the application framework layer may provide APIs related to the photographing function for the application layer, and provide camera interface management services for the application layer to realize the photographing function.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include videos, images, audios, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the terminal 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, and so on.
  • the notification manager can also present notifications that appear in the status bar at the top of the system in the form of a chart or scroll bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt sound is emitted, the terminal vibrates, or the indicator light flashes.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the touch sensor 180K receives the touch operation and reports it to the processor 110, so that the processor 110 starts the camera application in response to the touch operation and displays the user interface of the camera application on the display screen 194. For example, after the touch sensor 180K receives a touch operation on the camera application icon, it reports the touch operation to the processor 110, so that the processor 110 starts the camera application and displays the user interface of the camera application on the display screen 194.
  • the terminal 100 may also start the camera application in other ways, and display the user interface of the camera application on the display screen 194.
  • the terminal 100 when the terminal 100 displays a black screen, displays a lock screen interface, or displays a certain user interface after unlocking, it can start the camera application in response to a user's voice instruction or shortcut operation, and display the user interface of the camera application on the display screen 194.
  • the basic principle of the solution adopted in the related technology is: predefine several photo postures in the terminal, and then the user manually selects the photo posture when actually taking a photo.
  • this solution requires the subjective judgment of the photographer during the posture recommendation process; the interaction is not very friendly and lacks intelligence.
  • an embodiment of the present application provides an image processing method, which is applied to a terminal. The method includes: displaying a first preview image of the current shooting scene, the first preview image including a first portrait of the subject in a first posture; recognizing the first preview image to determine the scene category of the current shooting scene; displaying a second preview image in the current shooting scene, and displaying the target reference posture in the second preview image, where the target reference posture is obtained at least based on the scene category of the current shooting scene, and the second preview image includes a second portrait of the subject in a second posture; and if the second posture matches the target reference posture, generating the target image according to the second preview image.
  • the terminal automatically determines the current shooting scene, and automatically recommends the target reference posture based on the current shooting scene, so as to instruct (or guide) the person to be photographed to adjust the posture.
  • the entire process of recommending gestures does not require user participation, so the interaction is better and more intelligent, which can improve the user experience.
  • the “pose” described in the embodiments of the present application may refer to the overall posture of the human body, or may refer to the partial posture of the human body (such as gestures, etc.).
  • FIG. 3 it is a flowchart of an image processing method provided by an embodiment of this application.
  • the method shown in Figure 3 includes the following steps:
  • the first terminal displays a first preview image of a current shooting scene, where the first preview image includes a first portrait of the photographed person in the first posture.
  • the first terminal is a terminal for taking pictures, such as a mobile phone held by the photographer.
  • the current shooting scene may be the shooting scene in the field of view shot by the camera of the first terminal when the first terminal executes S101.
  • the first posture is the current posture of the subject in the first preview image
  • the first portrait is the image of the subject in the current posture.
  • the preview image is the image displayed on the terminal's display screen during the photographing process.
  • the preview image may always be displayed on the display screen of the terminal, that is, the terminal displays the preview image in a preview image stream.
  • the first preview image is the preview image for the current shooting scene displayed on the display screen of the first terminal when S101 is executed.
  • in a specific implementation, the first terminal may collect an image of the current shooting scene through a camera, and either use the collected image as the first preview image, or process the collected image (such as cropping, and/or fusion with other images, etc.) and use the processed image as the first preview image.
  • the proportion of the first portrait in the first preview image is greater than or equal to the second threshold.
  • the number of pixels of the first portrait is greater than or equal to the third threshold.
  • these two optional implementations are intended to illustrate that the reference posture is recommended to the subject only when the portrait of the subject is sufficiently large. This takes into consideration that, if the portrait is small, it is difficult to judge the posture of the subject, which makes recommending a reference posture of little significance, and that a person in the background should not be mistaken for the subject.
  • the embodiment of the present application does not limit the values of the second threshold and the third threshold.
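The two optional checks above can be sketched as follows. The second and third threshold values are assumptions (the embodiment explicitly does not limit them), and either check may be used on its own.

```python
# Illustrative sketch of the two optional portrait-size checks above:
# a proportion-of-image check and a pixel-count check. Threshold values
# are assumed; the embodiment does not limit them.

SECOND_THRESHOLD = 0.1    # assumed minimum portrait-to-image area ratio
THIRD_THRESHOLD = 20000   # assumed minimum portrait pixel count

def portrait_large_enough(portrait_pixels, image_pixels):
    """True if the subject's portrait is large enough to recommend a pose."""
    proportion = portrait_pixels / image_pixels
    return proportion >= SECOND_THRESHOLD or portrait_pixels >= THIRD_THRESHOLD
```

Only when this check passes would the terminal proceed to recommend a reference posture to the subject.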
  • S102 The first terminal recognizes the first preview image to determine the scene category of the current shooting scene.
  • different scene categories are characterized by different predefined object groups.
  • different shooting scenes can be distinguished by the predefined object groups contained in them.
  • a predefined object group can include one or more predefined objects.
  • the embodiment of the present application does not limit the object category of the predefined object.
  • the object category of the predefined object may be grass, stairs, seaside, sunset, road or tower, etc.
  • the embodiment of the present application does not limit the scene category of the shooting scene.
  • a predefined object group includes one predefined object, that is, the category of the shooting scene is distinguished based on the category of a single object.
  • taking the case where the predefined objects in the multiple predefined object groups are grass, stairs, seaside, sunset, and road as an example, the scene categories of the shooting scenes can include: a grass scene, a step scene, a seaside scene, a sunset scene, a road scene, etc.
  • a predefined object group includes a plurality of predefined objects, that is, the shooting scene is distinguished based on the plurality of objects.
  • taking the case where the multiple predefined object groups are [seaside, sunset], [road, sunset], and [stairs, sunset] as an example, where the objects in a pair of brackets represent one predefined object group, the scene categories of the shooting scenes can include: a seaside sunset scene, a road sunset scene, and a stairs sunset scene.
  • This information may be pre-stored in the first terminal.
  • for example, when an application for implementing the technical solutions provided in the embodiments of the present application is installed in the first terminal, the information is pre-stored in the first terminal along with the installation package of the application.
  • the information can be updated with the update of the application (such as the update of the version of the application).
  • the information may be pre-stored in other devices (such as network devices), and obtained by the first terminal from the other devices.
  • the recognition result may include: which predefined object groups are included in the first preview image.
  • specifically, the first terminal first recognizes the categories of the objects included in the first preview image (e.g., people, grass, steps, etc.); the specific implementation of this step may refer to the prior art. Secondly, the first terminal determines whether the recognized objects are objects in the predefined object groups, so as to determine which predefined object groups are included in the first preview image.
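The group-matching step above (a predefined object group is "included" in the preview image when every one of its predefined objects was recognized) can be sketched as a subset test. The group contents below follow the examples in the text; the object-recognition step itself is out of scope here.

```python
# Illustrative sketch: determine which predefined object groups are
# contained in the first preview image. A group matches when all of its
# predefined objects appear among the recognized objects. Group/scene
# names follow the examples in the text.

PREDEFINED_GROUPS = {
    "seaside sunset scene": {"seaside", "sunset"},
    "road sunset scene": {"road", "sunset"},
    "stairs sunset scene": {"stairs", "sunset"},
    "step scene": {"steps"},
}

def matching_scene_categories(recognized_objects):
    """Scene categories whose predefined object group is fully present."""
    recognized = set(recognized_objects)
    return sorted(
        scene for scene, group in PREDEFINED_GROUPS.items()
        if group <= recognized  # every predefined object is present
    )
```

Objects outside any predefined group (such as a person) simply do not contribute to the match.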
  • the scene category of the current shooting scene may include one or more.
  • the scene category of the current shooting scene is the scene category of the shooting scene represented by the predefined object group.
  • This situation can be considered to be based on a single tag to determine the scene category of the current shooting scene.
  • taking the multiple predefined shooting scenes being step scenes, seaside scenes, and sunset scenes as an example, if the recognition result is that the first preview image includes steps but does not include the seaside or the sunset, the first terminal can determine the step scene as the current shooting scene.
  • the scene category of the current shooting scene is the scene category of some or all of the shooting scenes represented by the multiple predefined object groups.
  • This situation can be regarded as determining the scene category of the current shooting scene based on multiple tags.
  • if the scene category of the current shooting scene is the scene category of part of the shooting scenes represented by the multiple predefined object groups, the scene category of the shooting scene represented by the predefined object group whose priority meets a condition, among the multiple predefined object groups, may be used as the scene category of the current shooting scene.
  • the predefined object group whose priority satisfies the condition may include: the predefined object group with the highest priority, or the predefined object group with the priority higher than the preset level.
  • taking "the multiple shooting scenes stored in the first terminal are step scenes, seaside scenes, and sunset scenes, and the priority order of the predefined object groups from high to low is: steps, seaside, sunset" as an example, if the first preview image includes steps and a sunset, the first terminal may determine the step scene as the current shooting scene based on the priority order of steps and sunset.
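The priority-based resolution in the steps/seaside/sunset example can be sketched as follows. The priority table mirrors the example above; the default-scene fallback name is an assumption.

```python
# Illustrative sketch of priority-based scene selection: when several
# predefined object groups match, pick the scene of the highest-priority
# group. Priorities follow the example above (steps > seaside > sunset);
# the default scene name is an assumption.

PRIORITY = {"steps": 0, "seaside": 1, "sunset": 2}  # lower = higher priority
SCENE_OF = {"steps": "step scene", "seaside": "seaside scene", "sunset": "sunset scene"}

def current_scene(recognized_objects, default="default scene"):
    """Scene category of the current shooting scene."""
    matched = [o for o in recognized_objects if o in PRIORITY]
    if not matched:
        return default  # no predefined object group recognized
    best = min(matched, key=lambda o: PRIORITY[o])
    return SCENE_OF[best]
```

With steps and a sunset both recognized, the step scene wins; with nothing predefined recognized, the terminal falls back to the default scene.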
  • if the first preview image does not include any predefined object group, the first terminal may determine the current shooting scene as a default scene.
  • the default scene may also be a scene pre-stored in the first terminal.
  • the first terminal displays scene category information of the current shooting scene.
  • the scene category information may include: identification information of the scene category, such as text information, picture information, and so on.
  • the first terminal displays scene category information of the current shooting scene on the display screen.
  • the first terminal acquires the target reference posture, the position of the target reference posture in the second preview image, and the size of the target reference posture in the second preview image.
  • the target reference pose is obtained at least based on the scene category of the current shooting scene.
  • the number of target reference postures can be one or more.
  • the second preview image may be a preview image for the current photographing scene displayed on the first terminal when S105 is executed.
  • the second preview image may be an image collected by a camera installed on the first terminal, or may be an image obtained by processing an image collected by a camera installed on the first terminal, and the processing steps can be referred to above.
• between the first preview image and the second preview image, there may be one or more frames of preview images.
  • the current shooting scene may be different when the first terminal displays the first preview image and the second preview image due to the shake of the photographer.
• the embodiments of the present application take as an example the case where, between the first terminal displaying the first preview image and displaying the second preview image, the jitter is within the error range, that is, the change in the current shooting scene is small enough to be ignored. This is a unified description and will not be repeated below.
  • the target reference posture may be displayed in each frame of preview image after the first preview image in the preview image stream.
• the position of the target reference posture is the same (or approximately the same) in each frame of preview image in which it is displayed.
  • the target reference posture and the first posture satisfy at least one of the following conditions 1 to 3:
  • Condition 1 The target reference posture is different from the first posture.
  • Condition 2 The relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image.
  • the relative position of the target reference posture in the second preview image may be the position of the target reference posture relative to a reference object in the current shooting scene.
  • the relative position of the first posture in the first preview image may be the position of the first posture relative to the reference object in the current shooting scene.
  • the reference object may be a predefined object, or may be an object in the current shooting scene determined by the first terminal in real time.
  • the relative position of the target reference posture in the second preview image may be the position of the target reference posture in the coordinate system where the second preview image is located.
  • the relative position of the first posture in the first preview image may be the position of the first posture in the coordinate system where the first preview image is located.
• the two coordinate systems are the same or roughly the same. It is understandable that if issues such as the shaking of the first terminal during the photographing process are not considered, that is to say, if the current shooting scene when the first preview image is displayed is the same as the current shooting scene when the second preview image is displayed, then the two coordinate systems are usually the same.
  • Condition 3 The size occupied by the target reference posture in the second preview image is different from the size occupied by the first posture in the first preview image.
• the embodiment of the present application does not limit how the target reference posture is obtained. Possible implementations are provided below:
  • the target reference pose is obtained based on the scene category of the current shooting scene.
• the first terminal may determine the reference posture corresponding to the scene category of the current shooting scene based on the correspondence between the scene categories of multiple preset shooting scenes and multiple reference postures, and use the determined reference posture as the target reference posture.
  • the corresponding relationship is pre-stored in the first terminal or acquired by the first terminal from the network device.
  • the scene category of a shooting scene may correspond to one or more reference poses, and the reference poses corresponding to the scene categories of different shooting scenes may be the same or different.
  • Table 1 the corresponding relationship between the scene category of the shooting scene and the reference posture provided in this embodiment of the application.
  • the target reference posture may be any one or more of the multiple reference postures corresponding to the category of the current shooting scene. For example, referring to Table 1, if the scene category of the current shooting scene is a step scene, the target reference posture may be at least one of the reference posture 21 and the reference posture 22.
• the target reference posture may be a reference posture, among the multiple reference postures corresponding to the scene category of the current shooting scene, whose similarity with the first posture is greater than or equal to the first threshold. For example, referring to Table 1, if the scene category of the current shooting scene is a step scene, the target reference posture may be, of reference posture 21 and reference posture 22, the reference posture whose similarity with the first posture is greater than or equal to the first threshold.
  • the target reference posture may be the reference posture with the highest similarity to the first posture among the multiple reference postures corresponding to the category of the current shooting scene. Referring to Table 1, if the scene category of the current shooting scene is a step scene, the target reference posture may be the reference posture with the highest similarity between the reference posture 21 and the reference posture 22 and the first posture.
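The selection strategies above can be sketched as follows; the pose table and similarity scores are illustrative placeholders, not data from the embodiment:

```python
# Sketch of target-pose selection from a scene-category correspondence table
# (table entries and similarity scores are illustrative placeholders).
POSES_BY_SCENE = {"steps": ["pose21", "pose22"]}

def pick_by_threshold(scene, similarity, threshold):
    """Strategy: poses whose similarity to the first posture meets the first threshold."""
    return [p for p in POSES_BY_SCENE[scene] if similarity[p] >= threshold]

def pick_most_similar(scene, similarity):
    """Strategy: the single pose with the highest similarity to the first posture."""
    return max(POSES_BY_SCENE[scene], key=lambda p: similarity[p])
```

The first strategy may yield several target reference postures, while the second always yields exactly one.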
  • the target reference posture is determined based on the scene category of the current shooting scene and the posture category of the first posture.
  • the posture category of the first posture can be used to determine the posture category of the target reference posture.
  • the posture category of the target reference posture is consistent with the posture category of the first posture.
• the first terminal may, based on the correspondence between the scene categories of multiple preset shooting scenes, the preset posture categories, and multiple reference postures, determine the reference posture corresponding to both the scene category of the current shooting scene and the posture category of the first posture, and use the determined reference posture as the target reference posture.
  • the corresponding relationship is pre-stored in the first terminal or acquired by the first terminal from the network device.
  • the posture category may include one or more of standing posture, sitting posture, and prone posture.
• the posture categories may also include other postures at the same level as the standing posture, sitting posture, and prone posture.
  • the posture category may also be a more fine-grained classification of any one or more of the standing posture, sitting posture, and lying posture, so as to obtain a more fine-grained posture category.
  • the posture categories include standing posture, sitting posture, and lying posture as examples.
  • the scene category of a shooting scene can correspond to one or more pose categories.
  • a pose category can correspond to one or more reference poses.
  • the posture categories corresponding to the scene categories of different shooting scenes may be the same or different.
• the reference postures corresponding to the same posture category under different scene categories may be the same or different. Table 2 shows the correspondence between the scene category of the shooting scene, the posture category, and the reference posture provided in this embodiment of the application.
  • the target reference posture may be any one or more of the multiple reference postures corresponding to the category of the current shooting scene and the posture category of the first posture. For example, referring to Table 2, if the scene category of the current shooting scene is a grass scene and the posture category of the first posture is a standing posture, the target reference posture may be at least one of the reference posture 11A and the reference posture 11B.
• the target reference posture may be a reference posture, among the multiple reference postures corresponding to the scene category of the current shooting scene and the posture category of the first posture, whose similarity with the first posture is greater than or equal to the first threshold. For example, referring to Table 2, if the scene category of the current shooting scene is a grass scene and the posture category of the first posture is a standing posture, the target reference posture may be, of reference posture 11A and reference posture 11B, the reference posture whose similarity with the first posture is greater than or equal to the first threshold.
  • the target reference posture may be the reference posture with the highest similarity to the first posture among multiple reference postures corresponding to the category of the current shooting scene and the posture category of the first posture. Referring to Table 2, if the scene category of the current shooting scene is a grass scene, and the posture category of the first posture is a standing posture, the target reference posture can be the similarity between the reference posture 11A and the reference posture 11B and the first posture The highest reference posture.
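The two-key lookup described above can be sketched as follows; the table contents mirror the structure of Table 2, but the entries themselves are assumed:

```python
# Sketch of the two-key lookup: (scene category, posture category) -> reference poses.
# Table contents are illustrative, mirroring Table 2's structure.
POSES_BY_SCENE_AND_CATEGORY = {
    ("grass", "standing"): ["pose11A", "pose11B"],
    ("grass", "sitting"):  ["pose12A"],
}

def candidate_poses(scene, posture_category):
    """Return the reference poses matching both the scene category and the
    posture category of the first posture (empty list if none are defined)."""
    return POSES_BY_SCENE_AND_CATEGORY.get((scene, posture_category), [])
```

Constraining the lookup by posture category keeps the recommended posture consistent with the subject's current posture (e.g. a standing subject is only recommended standing poses).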
  • the reference posture corresponding to the scene category of the shooting scene is a posture that the first terminal can use to recommend to the user in the shooting scene.
  • the reference posture corresponding to both the scene category of the shooting scene and a certain posture category is a posture that the first terminal can use to recommend to the user in the shooting scene and the posture category of the current posture of the subject.
  • the reference posture is a graceful and natural posture determined by the first terminal/network device.
  • the embodiment of the present application does not limit the determination method of the reference posture corresponding to the scene category of the shooting scene. For example, it may be determined based on methods such as big data analysis and pre-stored in the first terminal or network device.
• the scene categories of the shooting scenes, the reference postures corresponding to the scene category of each shooting scene, the posture categories corresponding to the scene category of each shooting scene, and the reference postures of each posture category under the scene category of each shooting scene can all be updated.
  • the method provided in the embodiment of the present application is implemented by an application installed in the first terminal, and the above-mentioned information is updated by updating the version of the application and so on.
  • the above-mentioned information is all stored in the network device, and the first terminal obtains the above-mentioned information from the network device in real time.
  • the embodiment of the present application does not limit how to obtain the position of the target reference posture in the second preview image.
  • the position of the target reference posture in the second preview image is determined based on the position of the first preset object in the first preview image in the first preview image.
  • the first preset object may be one or more predefined objects.
  • the first preset object and the objects included in the category of the current shooting scene may be the same or different.
  • the first preview image contains a tower
  • the first preset object may be a tower.
  • the first preview image may include sunset and grass, etc.
  • the first preset object may be the sunset.
  • the first preset object may be the bottom or top of the tower, the center of the sunset, the edge of the sunset, and so on.
  • the first preset object may be the first portrait, or a part of the first portrait.
• the first local posture may be one or more predefined local postures, such as a posture of a human hand.
  • the first local posture and the first preset object having the first association relationship may include: the first local posture and the first preset object have an association relationship in orientation, and/or an association relationship in distance, and the like.
  • the first local posture has an association relationship with the first preset object in terms of orientation, which may include: the first local posture is above, below, diagonally above the first preset object, and so on.
  • the first local posture and the first preset object have an association relationship in distance, which may include: the distance between the first local posture and the first preset object is less than or equal to a threshold, and the like.
  • the first association relationship may be predefined.
  • the first association relationship is predefined in the first terminal or predefined in the network device.
  • the first association relationship may be obtained in real time.
  • the first association relationship is obtained in real time by the first terminal or network device based on some pre-stored images through certain analysis and calculation.
• for example, if the target reference posture is the "hand supporting the tower" posture, the preset object may be the tower (specifically, the tower bottom), and the local posture may be the hand posture used for "supporting the tower".
  • FIG. 5 a schematic diagram of an image displayed on a first terminal in a tower scenario provided by an embodiment of this application.
  • the diagram a in FIG. 5 illustrates a partial diagram of the second preview image, which includes a human hand 41 and a tower 42 (ie, a preset object).
  • the target reference posture is the posture of the "hand support tower”.
• the first terminal can determine the position of the target reference posture in the second preview image based on the association relationship (i.e., relative orientation information and relative distance information) between the "hand supporting the tower" and the "tower bottom", as shown in diagram b in FIG. 5.
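The placement logic — offsetting the target reference posture from the preset object by a predefined relative orientation and distance — can be sketched as follows; the coordinates and offsets are illustrative assumptions, not values from the embodiment:

```python
# Sketch of placing the target reference posture relative to a preset object
# (e.g. the hand posture relative to the tower bottom). Offsets are illustrative.
def place_pose(anchor_xy, offset_xy):
    """Position of the target reference posture = position of the preset object
    plus a predefined relative offset (encoding orientation and distance)."""
    ax, ay = anchor_xy
    dx, dy = offset_xy
    return (ax + dx, ay + dy)
```

With an anchor at the detected tower bottom and an offset encoding "to the left of and level with the tower bottom", the recommended hand posture lands where it visually "supports" the tower.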
  • the size occupied by the target reference posture in the second preview image is determined based on the size occupied by the second preset object in the first preview image in the first preview image.
• the size occupied by the target reference posture in the second preview image may be the pixels occupied by the target reference posture in the second preview image, or the pixels occupied in the second preview image by the smallest rectangular frame (or another shape) that contains the target reference posture, and so on.
  • the second preset object may be the same as or different from the above-mentioned first preset object.
  • FIG. 6 a schematic diagram of an image displayed on the first terminal in a sunset scene provided by this embodiment of the application.
  • the diagram a in FIG. 6 represents the second preview image, which includes the sunset 51 and the subject 52.
• diagram b in FIG. 6 shows the second preview image in which the target reference posture 53 is displayed.
  • the size of the target reference posture 53 is determined based on the size of the sunset in the second preview image (that is, the second preset object).
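The size determination can be sketched as follows; the scaling ratio is an assumed design parameter, not a value from the embodiment:

```python
# Sketch of sizing the target reference posture from a second preset object
# (e.g. the sunset). The ratio is an assumed design parameter.
def pose_size(preset_w, preset_h, ratio=1.5):
    """Scale the target pose's bounding box from the preset object's box."""
    return (round(preset_w * ratio), round(preset_h * ratio))
```

Tying the pose size to the preset object's size keeps the recommended portrait proportionate to the scene, e.g. a figure sized against the sunset in diagram b of FIG. 6.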
  • the position and size of the target reference posture in the second preview image are determined based on the composition of the first preview image.
  • the composition of the image obtained after replacing the portrait of the second pose with the portrait of the target reference pose in the second preview image is better than the composition of the first preview image.
  • the embodiment of the present application does not limit the specific judgment method of comparing who is superior or inferior between two compositions, and it can be determined based on some common judgment standards or judgment algorithms in the technical field, which will not be repeated here.
  • the information of the target reference posture may be determined by the first terminal itself, for example, determined by the first terminal based on information stored by itself; it may also be obtained by the first terminal from a network device.
  • the information of the target reference posture includes but is not limited to at least one of the following: the posture type of the target reference posture, the position of the target reference posture in the second preview image, or the size of the target reference posture in the second preview image.
  • the size of the target reference posture in the second preview image can be characterized by the number of pixels occupied by the target reference posture.
  • the embodiment of the present application does not limit the specific implementation manner in which the first terminal obtains the target reference posture information from the network device.
  • the first terminal sends a first preview image of the current shooting scene (or information obtained after processing the first preview image) to the network device.
• the network device performs the following steps: first, based on the received information, it determines the scene category of the current shooting scene; then, it selects in the database the reference postures corresponding to the scene category of the current shooting scene, and from these reference postures selects the reference posture whose posture type is the same as the posture type of the first posture, using the selected reference posture as the target reference posture; then, based on one or more of the above methods 1 to 3, it determines the position and size of the target reference posture in the second preview image, and sends information such as the determined target reference posture and its position and size in the second preview image to the first terminal.
  • the first terminal displays the target reference posture in the second preview image.
• the storage space of the network device is larger and its computing power is stronger, so the images stored in the database of the network device are richer. In this way, having the network device determine the target reference posture and its position and size in the second preview image can make the photographing effect better.
  • the first terminal displays the second preview image in the current shooting scene, and displays the target reference posture in the second preview image.
  • the position and size of the target reference posture in the second preview image may be the position and size determined in S104, respectively.
  • the second preview image includes a second portrait of the subject in the second posture.
  • the first terminal displays the second preview image in the current shooting scene on the display screen.
  • the second posture is the current posture of the subject in the second preview image
  • the second portrait is the image of the subject in the current posture.
  • the first posture and the second posture are the postures of the same subject in the same shooting scene at different moments.
  • the first posture is different from the second posture.
  • the first terminal may display the target reference posture in each frame of the second preview image displayed by the first terminal after performing S103 and before performing S106.
  • the target reference pose is not a part of the second preview image (or not a component of the second preview image), but an image displayed on the upper layer of the second preview image.
• the "generating the target image based on the second preview image" in S106 below may specifically include: generating the target image based on the second preview image that does not contain the target reference posture.
  • the target reference posture may be displayed in a manner such as a human skeleton or a human body contour.
  • Figure a in Figure 4 it is a schematic diagram of displaying the target reference posture in the form of a human skeleton, where the points in the human skeleton may be specific joints of the human body.
  • Figure b in Figure 4 is a schematic diagram of displaying the target reference posture in the form of a human body contour. Among them, the outline of the human body can be presented in the form of simple strokes.
  • the method may further include the following steps 1 to 2:
  • Step 1 The first terminal sends the target reference posture information and the second preview image information to the second terminal to instruct the second terminal to display the second preview image, and display the target reference posture in the second preview image.
  • Step 2 The second terminal displays a second preview image based on the received information, and displays the target reference posture in the second preview image.
• the second terminal may be a terminal used by the person being photographed, or in other words, a terminal whose displayed content can be seen by the person being photographed.
  • the embodiment of the present application does not limit the connection mode between the first terminal and the second terminal. For example, it may be a Bluetooth connection.
• the technical solution can be described as: synchronizing the information displayed on the terminal used by the photographer to the terminal used by the person being photographed. In this way, the person being photographed can see the second preview image and the target reference posture through the content displayed on the second terminal, so that posture adjustment is more convenient and the photographing effect is better, without the photographer having to guide the person being photographed to adjust the posture through verbal communication as in the prior art.
• if the second posture matches the target reference posture, the first terminal generates the target image based on the second preview image. Subsequently, the first terminal may save the target image.
• the target image may be an image obtained by the first terminal by shooting the current shooting scene.
  • the target image is the image that the first terminal needs to save.
  • the above-mentioned first preview image and second preview image are images that the first terminal does not need to save.
  • the specific implementation is not limited to this.
• the first terminal can obtain the second preview image in real time, recognize the posture of the subject in the second preview image (denoted as the second posture), and then determine whether the second posture matches the target reference posture. If the second posture matches the target reference posture, the target image is determined based on the second preview image. Optionally, if the second posture does not match the target reference posture, the subject can continue to adjust the posture, and the first terminal can continue to collect the second preview image until the second posture in the collected second preview image matches the target reference posture.
  • Determining the target image based on the second preview image may include: directly using the second preview image as the target image; or processing the second preview image (such as enhancement, noise reduction, etc.) to obtain the target image.
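The capture-match-generate flow described above can be sketched as follows; the pose recognition, similarity scoring, and enhancement functions are stand-ins for the real pipeline, and the threshold value is assumed:

```python
# Sketch of the preview loop: keep collecting frames until the subject's
# posture matches the target reference posture, then produce the target image.
# recognize_pose / similarity / enhance are stand-ins for the real pipeline.
def shoot(frames, target_pose, similarity, recognize_pose, enhance, threshold=0.8):
    for frame in frames:
        second_pose = recognize_pose(frame)
        if similarity(second_pose, target_pose) >= threshold:
            # The frame could also be used directly as the target image;
            # here it is optionally processed (enhancement, noise reduction, ...).
            return enhance(frame)
    return None  # the subject never matched within the captured frames
```

The return path mirrors the two options in the text: use the matching preview frame directly, or process it first to obtain the target image.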
• the scene category of the current shooting scene may include multiple categories. Based on this:
  • a target reference posture can be determined based on each scene category of the current shooting scene.
  • the first terminal may display each determined target reference posture in the second preview image.
  • different target reference postures can be displayed in the same or different manners, for example, human body contours of different colors are displayed to display different target reference postures, and so on.
  • the first terminal may generate a target image based on the second preview image when determining that the second posture matches any one of the multiple target reference postures.
  • the first terminal may receive an operation instructed by the user, and in response to the operation, display a target reference gesture in the second preview image. That is, the user selects one target reference posture from the multiple target reference postures displayed in S105 for display.
  • the first terminal uses the second posture to match the target reference posture selected by the user.
  • the "user" here can be the photographer or the person being photographed.
• if the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold, it is determined that the second posture matches the target reference posture.
• the embodiment of the present application does not limit how to determine the similarity between the second posture and the target reference posture. For example, it can be implemented in manner one or manner two below:
  • Step A Calculate the first vector and the second vector; where the first vector is a vector formed by the relative angle information of the key points in the second portrait, and is used to represent the second posture.
  • the second vector is a vector formed by the relative angle information of key points in the portrait in the target reference posture, and is used to characterize the target reference posture.
  • the key point is a point used to characterize the posture of the human body, for example, it may be a key point of a human bone, such as a joint.
  • FIG. 8 it is a schematic diagram of a human body key point applicable to the embodiment of the present application.
  • the key points shown in Figure 8 include: chin, clavicle center, shoulders, elbows, hands, hip bones, knee joints, ankles, etc.
  • the relative angle information of the key points is specifically: information about the relative angle between the key points that have a connection relationship on the human body.
• for example, the relative angle information of a key point can be the information of the angle between "the straight line between the left knee joint and the left ankle (that is, the left calf)" and "the straight line between the left knee joint and the left hip bone (that is, the left thigh)".
• for another example, the relative angle information of a key point can be the information of the angle between "the line between the left elbow and the left shoulder" and "the line between the left elbow and the left hand".
  • the embodiments of the present application do not limit the specific key points that characterize the human body posture, and the relative angle information of which key points are calculated.
  • the method for determining the key points that characterize the human body posture can refer to the prior art. It is understandable that the key points of the human body posture and the relative angle information of which key points need to be calculated can be predefined. After the information is determined, the relative angle information of these key points can be determined based on the angle calculation method in the prior art.
  • the number of elements of the first vector and the second vector are the same, and the elements at the same position in the two vectors respectively represent the relative angle information of the same key point in the human body.
  • the first vector is [A1, A2, A3, A4];
  • the second vector is [B1, B2, B3, B4].
  • A1 and B1 respectively represent the relative angle information of the human left shoulder in the second posture and the target reference posture
  • A2 and B2 represent the relative angle information of the human right shoulder in the second posture and the target reference posture, respectively.
  • the meanings of other elements are similar to this and will not be explained one by one.
• the relative angle information of the key points can measure the specific posture of the human body; for example, when the angle between the thigh and the calf is 90 degrees, the knee is in a bent state. Therefore, the overall posture of the human body can be measured based on the relative angle information between multiple connected key points of the human body. Based on this, the basic principle of method one is: measuring the similarity of the overall posture of the human body is decomposed into measuring the similarity of the specific postures at the key points of the human body.
  • Step B Calculate the distance between the first vector and the second vector. For example, calculating the Euclidean distance between the first vector and the second vector, etc.
  • Step C If the distance between the first vector and the second vector is less than or equal to the fifth threshold, it is determined that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
  • the fifth threshold is predefined, and is used to characterize the distance between the first vector and the second vector when the similarity between the reference posture and the second posture is the fourth threshold.
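Steps A to C of method one can be sketched as follows; the joint coordinates in the example and the threshold values are illustrative, not values from the embodiment:

```python
import math

# Sketch of method one: represent each posture as a vector of relative joint
# angles, then compare the two vectors by Euclidean distance (steps A-C).
def joint_angle(a, b, c):
    """Angle at joint b (radians) formed by segments b->a and b->c,
    e.g. the knee angle computed from hip, knee, and ankle coordinates."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))  # clamp for rounding

def poses_match(angles1, angles2, fifth_threshold):
    """Steps B/C: match if the Euclidean distance between the two angle
    vectors is at or below the (predefined) fifth threshold."""
    return math.dist(angles1, angles2) <= fifth_threshold
```

Elements at the same index in the two angle vectors must describe the same joint (e.g. index 0 is the left shoulder in both), so the distance compares like with like.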
  • the second posture and the target reference posture are input to the neural network to obtain the similarity between the second posture and the target reference posture; wherein the neural network is used to characterize the similarity between the input multiple postures.
  • the first method above is based on a conventional method to calculate the similarity between postures.
  • the second method is based on a neural network such as a convolutional neural network (convolutional neural network, CNN) to calculate the similarity between postures.
  • the neural network model can be pre-stored in the first terminal.
• the neural network model can be obtained by training based on multiple sets of training data, where one set of training data includes two images with different postures (the images may be captured by a camera, or obtained by processing images collected by a camera) and the degree of similarity between the human body postures in the two images.
• by training on these training data, a neural network model can be obtained.
• the training process can be regarded as the process of the neural network model learning the similarity measurement relationship of the key points (that is, learning the vectors used to characterize the postures).
  • the neural network model pre-stored in the first terminal can be updated.
  • the neural network model may be updated by an update of the application (such as a version update).
  • the embodiments of the present application are not limited to this.
  • the above-mentioned method 1 and method 2 are only examples, which do not constitute a limitation on the calculation method applicable to the embodiment of the present application for calculating the similarity between two human postures.
• in S106, if the second posture matches the target reference posture, the first terminal generates the target image based on the second preview image. Specifically:
  • the first terminal may automatically generate the target image based on the second preview image when it is determined that the second posture matches the target reference posture.
  • the first terminal takes pictures autonomously, or takes a snapshot. This process does not require user involvement, so the interaction with the user is better and smarter, which helps to improve the user experience.
  • the foregoing S106 may include:
• S106A: when it is determined that the second posture matches the target reference posture, the first terminal outputs prompt information, where the prompt information is used to prompt that the second posture matches the target reference posture.
  • S106B The first terminal receives the first operation.
  • the first operation can be a voice operation, or a touch screen operation. For example, a method of touching a virtual control on the display screen in a specific touch mode, a method of pressing a specific physical control on the first terminal, and so on.
• S106C: in response to the first operation, the first terminal generates the target image based on the second preview image.
  • the prompt information here can be any prompt information such as voice prompt information, text prompt information, pattern prompt information, a special mark of a control on the interface (such as flashing or brightening), or any combination of various prompt information.
  • the embodiments of the present application do not limit this.
  • the first terminal automatically determines the current shooting scene and automatically recommends the target reference posture based on it, so as to instruct (or guide) the photographed person to adjust the posture. That is to say, the embodiment of the present application provides an intelligent posture guidance/recommendation method that integrates scene information, and the entire process of recommending postures does not require user participation; therefore, the interaction is better and more intelligent, which can improve the user experience.
  • FIG. 10 is a schematic flowchart of a photographing method provided by an embodiment of this application.
  • the method shown in FIG. 10 may include the following steps:
  • S201: The user (who may be any user, such as the photographer or the photographed person) sends a second operation to the first terminal.
  • the second operation is used to instruct the first terminal to start the camera application.
  • the second operation may be a touch screen operation or a voice operation issued by the user.
  • S202: The first terminal receives the second operation. In response to the second operation, the first terminal launches the camera application.
  • S203: The first terminal displays the target user interface of the camera application on the display screen.
  • the target user interface contains a "posture recommendation mode" control.
  • triggering the posture recommendation mode enables the first terminal to execute the image processing method provided in the embodiments of the present application.
  • the target user interface may be the first user interface after the camera application is started, or it may be a non-first user interface after the camera application is started. For example, if, after the camera application is started and before this user interface is displayed, the user can choose whether to turn on the flash, then the target user interface is not the first user interface after startup.
  • S204: The user (who may be any user, such as the photographer or the photographed person) sends a third operation to the first terminal.
  • the third operation acts on the posture recommendation mode control.
  • the third operation may be a touch screen operation issued by the user.
  • S205: The first terminal receives the third operation. In response to the third operation, the first terminal enters the posture recommendation mode. Then, the following S206 is executed.
  • alternatively, the first terminal may not display the foregoing target user interface (that is, the target user interface containing the posture recommendation mode control); instead, after the first terminal starts the camera application, it automatically enters the posture recommendation mode, and then executes the following S206.
  • S206: The first terminal executes the foregoing steps S101 to S105.
  • a second preview image is displayed on the first terminal, and the target reference posture is displayed in the second preview image.
  • the posture of the subject in the second preview image is the second posture.
  • the first terminal collects the actual image of the current shooting scene in real time and, based on the actual image, generates and displays second preview images frame by frame, thereby presenting the effect of a preview image stream. The target reference posture is displayed in one or more frames (for example, every frame) of the second preview images.
  • S207: The photographed person adjusts the current posture based on the target reference posture displayed in the second preview image.
  • alternatively, the photographer instructs the photographed person to adjust the current posture based on the second preview image displayed on the first terminal and the target reference posture displayed in it.
  • the first terminal may display the second preview image and the target reference posture on the display screen of the second terminal based on the above steps 1 to 2.
  • the photographed person adjusts the current posture by viewing the second preview image displayed on the display screen of the second terminal and the target reference posture displayed in the second preview image.
  • S208: If the target reference posture matches the second posture, the first terminal generates the target image based on the second preview image. Subsequently, the first terminal may save the target image.
  • the second preview image in this step may be any frame of the second preview image in S207, and correspondingly, the second posture is the posture of the subject displayed in the second preview image.
  • FIG. 11 is a schematic diagram comparing photographing effects provided by an embodiment of this application.
  • diagram a in FIG. 11 represents the first preview image; the photographing effect is mediocre.
  • Diagram b in FIG. 11 represents a target image obtained based on "a second preview image that satisfies the second posture to match the target reference posture". Obviously, under normal circumstances, the user will think that the posture of the human body in the target image is more graceful and natural compared to the first preview image.
  • the terminal includes hardware structures and/or software modules corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by hardware driven by computer software depends on the specific application scenarios and design constraints of the technical solution.
  • FIG. 12 is a schematic structural diagram of a possible image processing apparatus provided by an embodiment of the present application.
  • These image processing apparatuses can be used to implement the functions of the terminal in the foregoing method embodiments, and therefore can also achieve the beneficial effects of the foregoing method embodiments.
  • the image processing apparatus may be the terminal 100 as shown in FIG. 1, or may be a module (such as a chip) applied to the terminal.
  • the following description takes the case where the image processing apparatus is the terminal 11 as an example.
  • the terminal 11 includes: a display unit 111, a determination unit 112, and a generation unit 113.
  • the display unit 111 is configured to display a first preview image of a current shooting scene, and the first preview image includes a first portrait of the photographed person in a first posture.
  • the determining unit 112 is configured to recognize the first preview image to determine the scene category of the current shooting scene.
  • the display unit 111 is further configured to display a second preview image in the current shooting scene, and display the target reference pose in the second preview image; the target reference pose is obtained at least based on the scene category of the current shooting scene; wherein, the second preview The image includes a second portrait of the subject in the second posture.
  • the generating unit 113 is configured to generate a target image according to the second preview image if the second posture matches the target reference posture.
  • the display unit 111 may be used to perform S101 and S105.
  • the determining unit 112 may be used to perform S102.
  • the generating unit 113 may be used to perform S106.
  • the target reference posture and the first posture meet at least one of the following conditions: the target reference posture is different from the first posture; the relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image; or, the size of the target reference posture in the second preview image is different from the size of the first posture in the first preview image.
  • the scene category of the current shooting scene includes at least one of the following categories: grass scene, step scene, seaside scene, sunset scene, road scene, or tower scene.
  • the posture category of the target reference posture is obtained based on the posture category of the first posture; wherein the posture category includes a sitting posture, a standing posture, or a lying posture.
  • the target reference posture is a reference posture whose similarity with the first posture is greater than or equal to a first threshold among multiple reference postures corresponding to the category of the current shooting scene.
  • the target reference pose is the reference pose with the highest similarity to the first pose among multiple reference poses corresponding to the category of the current shooting scene.
  • the position of the target reference posture in the second preview image is determined based on the position of the first preset object in the first preview image in the first preview image.
  • the first local posture in the target reference posture and the position of the first preset object in the same image have a first association relationship, and the first association relationship is predefined or determined in real time.
  • the size occupied by the target reference posture in the second preview image is determined based on the size occupied by the second preset object in the first preview image in the first preview image.
  • the display unit 111 is specifically configured to display the target reference posture in the second preview image with a human skeleton or a human contour.
  • the display unit 111 may display the target reference posture shown in FIG. 4.
  • the information about the target reference posture is determined by the terminal itself, or is obtained by the terminal from a network device.
  • the display unit 111 is specifically configured to: if the scene category of the current shooting scene includes multiple scene categories, display multiple target reference postures in the second preview image; wherein the scene categories correspond to the target reference postures one-to-one.
  • the generating unit 113 is specifically configured to generate a target image according to the second preview image if the second posture matches any one of the multiple target reference postures.
  • the terminal 11 further includes: a sending unit 114, configured to send information about the target reference posture and information about the second preview image to the second terminal, so as to instruct the second terminal to display the second preview image and to display the target reference posture in the second preview image.
  • the sending unit 114 may be used to perform step 1.
  • the second terminal can be used to perform step 2.
  • the display unit 111 is further configured to display category information of the current shooting scene in the second preview image.
  • different scene categories are characterized by different predefined object groups; if the first preview image contains one predefined object group, the scene category of the current shooting scene is the scene category represented by that predefined object group; if the first preview image contains multiple predefined object groups, the scene category of the current shooting scene is part or all of the scene categories represented by the multiple predefined object groups.
  • the proportion of the first portrait in the first preview image is greater than or equal to the second threshold; or, the number of pixels of the first portrait is greater than or equal to the third threshold.
  • the terminal 11 further includes: an output unit 115, configured to output prompt information if the second posture matches the target reference posture, and the prompt information is used to prompt that the second posture matches the target reference posture.
  • the receiving unit 116 is configured to receive the first operation.
  • the generating unit 113 is specifically configured to generate the target image according to the second preview image in response to the first operation.
  • the output unit 115 may be used to perform S106A, the receiving unit 116 may be used to perform S106B, and the generating unit 113 may be used to perform S106C.
  • the determining unit 112 is further configured to, if the similarity between the second posture and the target reference posture is greater than or equal to a fourth threshold, determine that the second posture matches the target reference posture.
  • the terminal 11 further includes: a calculation unit 117.
  • the calculation unit 117 is used to calculate the first vector and the second vector; where the first vector is a vector formed by the relative angle information of the key points in the second portrait and is used to represent the second posture, and the second vector is a vector formed by the relative angle information of the key points in the portrait in the target reference posture and is used to represent the target reference posture. And, calculate the distance between the first vector and the second vector.
  • the determining unit 112 is further configured to, if the distance is less than or equal to the fifth threshold, determine that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
  • the calculation unit 117 is configured to input the second posture and the target reference posture into the neural network to obtain the similarity between the second posture and the target reference posture; wherein the neural network is used to characterize the similarity between different input postures.
  • the functions of the above-mentioned display unit 111 may be implemented through the display screen 194.
  • the function of any one of the above-mentioned determining unit 112, generating unit 113, and calculation unit 117 can be implemented by the processor 110 calling the program code stored in the internal memory 121.
  • the above-mentioned sending unit 114 can be realized by the functions of the mobile communication module 150 or the wireless communication module 160 in combination with the antenna connected thereto.
  • the above-mentioned output unit 115 may be implemented by a device for outputting information, such as the display screen 194 or the speaker 170A.
  • the above-mentioned receiving unit 116 may be implemented by a device for inputting information, such as a display screen, a microphone 170C, and the like.
  • Another embodiment of the present application further provides a computer-readable storage medium that stores instructions.
  • when the instructions are executed on a terminal, each step performed by the terminal in the method flows shown in the foregoing method embodiments is executed.
  • the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by a software program, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • for example, computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
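As a hedged illustration of the keypoint-based matching described in the bullets above (each posture is represented by a vector of relative joint angles, and two postures match when the distance between their vectors is below a threshold), the sketch below uses hypothetical joint triples and an assumed threshold; the embodiment does not prescribe these specifics.

```python
import math

def relative_angle(a, b, c):
    """Angle at joint b (degrees) formed by keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

# Hypothetical joint triples whose relative angles form the posture vector;
# a posture is a dict mapping keypoint name -> (x, y) image coordinates.
ANGLE_TRIPLES = [
    ("shoulder", "elbow", "wrist"),
    ("hip", "knee", "ankle"),
    ("elbow", "shoulder", "hip"),
]

def posture_vector(keypoints):
    return [relative_angle(keypoints[a], keypoints[b], keypoints[c])
            for a, b, c in ANGLE_TRIPLES]

def postures_match(kp_current, kp_reference, threshold=20.0):
    """Match when the Euclidean distance between angle vectors is small;
    the threshold value here is an illustrative assumption."""
    return math.dist(posture_vector(kp_current),
                     posture_vector(kp_reference)) <= threshold
```

An identical pose yields distance 0 and therefore a match, while moving one keypoint far enough pushes the angle vector past the threshold.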

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application discloses an image processing method and device, relating to the field of image processing technology. The present solution is more interactive and smarter in the process of recommending a pose to a user, and thereby is able to improve the user experience. Said method comprises: displaying a first preview image of a current photographing scene, the first preview image comprising a first human image of a photographed person in a first pose; recognizing the first preview image, so as to determine the scene category of the current photographing scene; displaying a second preview image in the current photographing scene, and displaying in the second preview image a target reference pose, the target reference pose being obtained at least on the basis of the scene category of the current photographing scene, wherein the second preview image comprises a second human image of the photographed person in a second pose; and if the second pose matches the target reference pose, generating a target image according to the second preview image. Said method can be applied to a photographing scene.

Description

Image processing method and device
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on March 7, 2020, with application number 202010153760.1 and entitled "A method and apparatus for intelligent posture-guided composition fusing scene information", and to the Chinese patent application filed with the State Intellectual Property Office on May 30, 2020, with application number 202010480843.1 and entitled "Image processing method and apparatus", both of which are incorporated into this application by reference in their entirety.
Technical field
This application relates to the field of image processing technology, and in particular to image processing methods and apparatuses.
Background
With the development of smart phones, mobile photography has become an important part of people's lives. Portrait photography occupies a large proportion of mobile phone photography. To obtain a beautiful portrait photograph, one must first determine the shooting angle of the portrait, then determine a shooting composition suitable for the current shooting scene, and then pose to take the desired picture.
Regarding how to guide the photographed person to assume a natural and graceful posture, some posture recommendation applications have appeared on the market. Their working principle is as follows: the user manually selects the posture to be photographed, and the mobile phone then displays the selected posture on the display screen. The photographed person poses under the guidance of the displayed posture, and the photographer then actively decides whether to shoot. Such an application requires the photographer's subjective judgment during the posture recommendation process; its interactivity is not very friendly and it lacks intelligence.
Summary of the invention
The embodiments of the present application provide an image processing method and apparatus. In the process of recommending postures to the user, the interaction is better and more intelligent, so that the user experience can be improved.
To achieve the above objectives, this application adopts the following technical solutions:
In a first aspect, an image processing method is provided, and the method is applied to a first terminal. The method includes: first, displaying a first preview image of the current shooting scene, where the first preview image includes a first portrait of the photographed person in a first posture; second, recognizing the first preview image to determine the scene category of the current shooting scene; next, displaying a second preview image in the current shooting scene, and displaying a target reference posture in the second preview image, where the target reference posture is obtained at least based on the scene category of the current shooting scene, and the second preview image includes a second portrait of the photographed person in a second posture; and, if the second posture matches the target reference posture, generating a target image according to the second preview image. It can be seen that the embodiments of the present application provide an intelligent posture guidance/recommendation method that integrates scene information, and the entire process of recommending postures does not require user participation; therefore, the interaction is better and more intelligent, which can improve the user experience.
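The first-aspect flow above (display a first preview, classify the scene, show a target reference posture in the second preview, and generate the target image once the postures match) can be outlined as follows. This is a minimal sketch only: `process_capture` and its callback parameters are hypothetical names standing in for the terminal's camera pipeline, not part of the claimed method.

```python
def process_capture(frames, classify_scene, recommend_pose, pose_of, matches):
    """Sketch of the first-aspect method. `frames` is an iterator of preview
    images; returns a preview image to use as the target image once the
    subject's pose matches the recommended reference pose, else None."""
    first_preview = next(frames)
    scene = classify_scene(first_preview)    # determine the scene category
    target_pose = recommend_pose(scene)      # reference pose for that scene
    for second_preview in frames:            # subsequent second preview images
        if matches(pose_of(second_preview), target_pose):
            return second_preview            # generate the target image
    return None
```

With stub callbacks, the loop returns the first frame whose detected pose equals the recommended one.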
Optionally, the first posture is different from the second posture. Optionally, the target image may be an image obtained by the first device shooting the current shooting scene. In other words, the target image is the image that the first terminal needs to save.
In a possible design, the target reference posture and the first posture meet at least one of the following conditions: the target reference posture is different from the first posture; the relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image; or, the size occupied by the target reference posture in the second preview image is different from the size occupied by the first posture in the first preview image. The technical solution provided by this possible design can be understood as: displaying the target reference posture in the second preview image when at least one of the foregoing conditions is satisfied. In other words, the embodiments of the present application provide a possible trigger condition for displaying the target reference posture in the second preview image.
In a possible design, the scene category of the current shooting scene includes at least one of the following categories: grass scene, step scene, seaside scene, sunset scene, road scene, or tower scene. Of course, specific implementations are not limited to these.
In a possible design, the posture category of the target reference posture is obtained based on the posture category of the first posture, where the posture category includes a sitting posture, a standing posture, or a lying posture. For example, the posture category of the target reference posture is consistent with the posture category of the first posture. In this way, the photographed person does not need to adjust the posture greatly, which helps to improve the user experience.
In a possible design, the target reference posture is, among the multiple reference postures corresponding to the category of the current shooting scene, a reference posture whose similarity with the first posture is greater than or equal to a first threshold. Since the reference postures are predefined graceful and natural postures, the technical solution provided by this possible design helps to minimize the extent to which the photographed person adjusts the posture while ensuring (or trying to ensure) that a graceful and natural posture is recommended to the user, thereby improving the user experience.
In a possible design, the target reference posture is the reference posture with the highest similarity to the first posture among the multiple reference postures corresponding to the category of the current shooting scene. Since the reference postures are predefined graceful and natural postures, this likewise helps to minimize the extent to which the photographed person adjusts the posture while ensuring (or trying to ensure) that a graceful and natural posture is recommended to the user, thereby improving the user experience.
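The two selection rules above (a reference posture whose similarity to the first posture exceeds a threshold, or the one with the highest similarity) can be combined into one sketch; `select_target_pose`, the similarity callback, and the threshold value are illustrative assumptions, not details fixed by the application.

```python
def select_target_pose(reference_poses, first_pose, similarity, threshold=0.6):
    """From the scene category's reference poses, pick the one most similar
    to the subject's current (first) pose; optionally require the best
    candidate to reach a minimum similarity (the first threshold)."""
    best = max(reference_poses, key=lambda p: similarity(p, first_pose))
    if similarity(best, first_pose) >= threshold:
        return best
    return None  # no reference pose is similar enough to recommend
```

Here a pose can be any representation for which the `similarity` callback is defined (e.g. the angle vectors used elsewhere in the description).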
In a possible design, the position of the target reference posture in the second preview image is determined based on the position, in the first preview image, of a first preset object in the first preview image; there is a first association relationship between the first local posture in the target reference posture and the position of the first preset object in the same image, and the first association relationship is predefined or determined in real time. This possible design provides a specific implementation for determining the position of the target reference posture in the second preview image, which helps to improve the degree of combination (or coupling, or association) between the person's posture and the preset object in the preview image, so that the photographing effect is better.
In a possible design, the size occupied by the target reference posture in the second preview image is determined based on the size occupied, in the first preview image, by a second preset object in the first preview image; there is a second association relationship between the target reference posture and the size of the second preset object in the same image, and the second association relationship is predefined or determined in real time. This possible design provides a specific implementation for determining the size occupied by the target reference posture in the second preview image, which helps to improve the overall composition effect, so that the photographing effect is better.
In a possible design, displaying the target reference posture in the second preview image includes: displaying the target reference posture in the second preview image in the form of a human skeleton or a human contour.
In a possible design, the information about the target reference posture is determined by the first terminal itself, or is acquired by the first terminal from a network device.
In a possible design, displaying the target reference posture in the second preview image includes: if the scene category of the current shooting scene includes multiple scene categories, displaying multiple target reference postures in the second preview image, where the scene categories correspond to the target reference postures one-to-one. In this case, if the second posture matches the target reference posture, generating the target image according to the second preview image includes: if the second posture matches any one of the multiple target reference postures, generating the target image according to the second preview image.
In a possible design, the method further includes: sending information about the target reference posture and information about the second preview image to a second terminal, so as to instruct the second terminal to display the second preview image and to display the target reference posture in the second preview image. In this way, the photographed person can see the second preview image and the target reference posture through the content displayed on the second terminal, which makes posture adjustment more convenient and the photographing effect better.
In a possible design, the method further includes: displaying category information of the current shooting scene in the second preview image. In this way, the user can learn the category information of the current shooting scene, thereby improving the user experience.
In a possible design, different scene categories are characterized by different predefined object groups. If the first preview image contains one predefined object group, the scene category of the current shooting scene is the scene category represented by that predefined object group. If the first preview image contains multiple predefined object groups, the scene category of the current shooting scene is part or all of the scene categories represented by the multiple predefined object groups. In other words, there may be one or more scene categories for the current shooting scene.
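A minimal sketch of the predefined-object-group rule above, assuming a hypothetical mapping `SCENE_GROUPS` and an object detector that yields labels for the first preview image; the concrete groups and labels are not specified by the application:

```python
# Hypothetical mapping from predefined object groups to scene categories.
SCENE_GROUPS = {
    frozenset({"grass"}): "grass scene",
    frozenset({"sand", "sea"}): "seaside scene",
    frozenset({"step"}): "step scene",
}

def scene_categories(detected_objects):
    """Return every scene category whose predefined object group is fully
    contained in the set of objects detected in the first preview image;
    multiple matching groups yield multiple scene categories."""
    detected = set(detected_objects)
    return sorted(cat for objs, cat in SCENE_GROUPS.items()
                  if objs <= detected)
```

When several groups are present at once, the function returns all of their categories, matching the "part or all" wording above.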
在一种可能的设计中,第一人像占第一预览图像的比例大于等于第二阈值。或者,第一人像的像素点的个数大于等于第三阈值。也就是说,第一人像较大。这是在考虑到“如果被拍摄者的人像较小,则很难判断被拍摄者的姿势,这会导致推荐参考姿势的意义不大”,以及“为了避免将作背景中的人作为被拍摄者”而提出的技术方案。In a possible design, the proportion of the first portrait in the first preview image is greater than or equal to the second threshold. Or, the number of pixels in the first portrait is greater than or equal to the third threshold. In other words, the first portrait is larger. This is in consideration of “if the person’s portrait is small, it is difficult to judge the posture of the person who is being photographed, which will result in little significance in recommending the reference posture”, and “in order to avoid using the person in the background as the person being photographed.者" and proposed the technical proposal.
In a possible design, if the second posture matches the target reference posture, generating the target image based on the second preview image includes: if the second posture matches the target reference posture, outputting prompt information, where the prompt information is used to indicate that the second posture matches the target reference posture; receiving a first operation; and in response to the first operation, generating the target image based on the second preview image. This provides a specific implementation in which the target image is generated under the user's instruction. Of course, in a specific implementation, the first terminal may instead automatically generate the target image based on the second preview image when it determines that the second posture matches the target reference posture.
In a possible design, the method further includes: if the similarity between the second posture and the target reference posture is greater than or equal to a fourth threshold, determining that the second posture matches the target reference posture.
In a possible design, the method includes: computing a first vector and a second vector, where the first vector is formed by relative angle information of key points in the second portrait and is used to characterize the second posture, and the second vector is formed by relative angle information of key points in a portrait in the target reference posture and is used to characterize the target reference posture; computing the distance between the first vector and the second vector; and, if the distance is less than or equal to a fifth threshold, determining that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
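The vector comparison above can be sketched as follows. This is a simplified illustration under stated assumptions: key points are 2-D coordinates, "relative angle information" is taken to mean the joint angle formed at each key point by two adjacent limb segments, and the distance is Euclidean; the patent does not fix these choices, nor the value of the fifth threshold.

```python
import math

def relative_angle(p_from, p_joint, p_to):
    """Joint angle (radians) at p_joint between the segments toward p_from and p_to."""
    a1 = math.atan2(p_from[1] - p_joint[1], p_from[0] - p_joint[0])
    a2 = math.atan2(p_to[1] - p_joint[1], p_to[0] - p_joint[0])
    ang = abs(a1 - a2)
    return min(ang, 2 * math.pi - ang)

def pose_vector(keypoints, triples):
    """Build the relative-angle vector from (from, joint, to) key-point triples,
    e.g. (shoulder, elbow, wrist) for the elbow angle."""
    return [relative_angle(keypoints[a], keypoints[b], keypoints[c])
            for a, b, c in triples]

def poses_match(kp_second, kp_reference, triples, fifth_threshold):
    """True when the Euclidean distance between the two angle vectors is at most
    the fifth threshold, i.e. the similarity reaches the fourth threshold."""
    v1 = pose_vector(kp_second, triples)       # first vector: second posture
    v2 = pose_vector(kp_reference, triples)    # second vector: target reference posture
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))
    return dist <= fifth_threshold
```

Because the vectors encode angles rather than raw coordinates, the comparison is insensitive to where the person stands in the frame.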
In a possible design, the method further includes: inputting the second posture and the target reference posture into a neural network to obtain the similarity between the second posture and the target reference posture, where the neural network is used to characterize the similarity between input postures.
According to a second aspect, an image processing apparatus is provided. The apparatus may be a terminal, a chip, or a chip system.
In a possible design, the apparatus may be configured to perform any method provided in the first aspect. In this application, the apparatus may be divided into functional modules according to any method provided in the first aspect and any possible design thereof. For example, each functional module may be divided to correspond to one function, or two or more functions may be integrated into one processing module. For example, the apparatus may be divided, by function, into a processing unit, a sending unit, and the like. For descriptions of the possible technical solutions performed by the divided functional modules and their beneficial effects, refer to the technical solutions provided in the first aspect or its corresponding possible designs. Details are not repeated here.
In another possible design, the apparatus includes a memory and one or more processors, where the memory is configured to store computer instructions, and the processor is configured to invoke the computer instructions to perform any method provided in the first aspect and any possible design thereof. In this possible design, the display step in any method provided in the first aspect or any possible design thereof may be specifically replaced with a display-control step, and the output step may be specifically replaced with an output-control step.
According to a third aspect, a terminal is provided, including a processor, a memory, and a display screen. The display screen is configured to display information such as images, the memory is configured to store computer programs and instructions, and the processor is configured to invoke the computer programs and instructions and cooperate with the display screen to perform the technical solutions provided in the first aspect or its corresponding possible designs.
According to a fourth aspect, a computer-readable storage medium is provided, for example, a non-transitory computer-readable storage medium, storing a computer program (or instructions). When the computer program (or instructions) runs on a computer, the computer is enabled to perform any method provided in any possible implementation of the first aspect. In this possible design, the display step in any method provided in the first aspect or any possible design thereof may be specifically replaced with a display-control step, and the output step may be specifically replaced with an output-control step.
According to a fifth aspect, a computer program product is provided. When the computer program product runs on a computer, any method provided in any possible implementation of the first aspect or the second aspect is performed. In this possible design, the display step in any method provided in the first aspect or any possible design thereof may be specifically replaced with a display-control step, and the output step may be specifically replaced with an output-control step.
It can be understood that any of the image processing apparatuses, computer storage media, computer program products, or chip systems provided above may be applied to the corresponding methods provided above. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods. Details are not repeated here.
In this application, the names of the image processing apparatus and its functional modules do not limit the devices or functional modules themselves. In an actual implementation, these devices or functional modules may appear under other names. As long as the function of each device or functional module is similar to that in this application, it falls within the scope of the claims of this application and their equivalent technologies.
These and other aspects of this application will be more concise and comprehensible in the following description.
Description of the drawings
FIG. 1 is a schematic structural diagram of a terminal applicable to an embodiment of this application;
FIG. 2 is a block diagram of a software structure of a terminal applicable to an embodiment of this application;
FIG. 3 is a flowchart of an image processing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a display manner of a target reference posture according to an embodiment of this application;
FIG. 5 is a schematic diagram of an image displayed on a first terminal in a tower scene according to an embodiment of this application;
FIG. 6 is a schematic diagram of an image displayed on a first terminal in a sunset scene according to an embodiment of this application;
FIG. 7 is a flowchart of another image processing method according to an embodiment of this application;
FIG. 8 is a schematic diagram of human body key points applicable to an embodiment of this application;
FIG. 9 is a flowchart of another image processing method according to an embodiment of this application;
FIG. 10 is a schematic flowchart of a photographing method according to an embodiment of this application;
FIG. 11 is a schematic diagram comparing photographing effects according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a terminal according to an embodiment of this application.
Detailed description of embodiments
In the embodiments of this application, words such as "exemplary" or "for example" are used to represent giving an example, illustration, or description. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or design schemes. Rather, use of words such as "exemplary" or "for example" is intended to present a related concept in a specific manner.
In the embodiments of this application, the terms "first" and "second" are used only for descriptive purposes, and shall not be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the descriptions of the embodiments of this application, unless otherwise specified, "multiple" means two or more.
The image processing method provided in the embodiments of this application may be applied to a terminal. The terminal may be a terminal with a camera, such as a smartphone, a tablet computer, a wearable device, or an AR/VR device, or may be a personal computer (PC), a personal digital assistant (PDA), a netbook, or any other terminal capable of implementing the embodiments of this application. This application does not limit the specific form of the terminal. A wearable device, also called a wearable smart device, is a general term for wearable devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories. A wearable device is not merely a hardware device; it also implements powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include full-featured, large-sized devices that can implement all or some functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to work with other devices such as smartphones, for example, smart bands and smart jewelry for monitoring physical signs.
In this application, the structure of the terminal may be as shown in FIG. 1. As shown in FIG. 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the terminal 100. In other embodiments, the terminal 100 may include more or fewer components than shown, some components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated into one or more processors. For example, in this application, the processor 110 may control the display screen 194 to display a first preview image of the current shooting scene, where the first preview image includes a first portrait of the person being photographed in a first posture. Then, the processor 110 recognizes the first preview image to determine the scene category of the current shooting scene. Next, the processor 110 controls the display screen 194 to display a second preview image of the current shooting scene and displays a target reference posture in the second preview image, where the target reference posture is obtained at least based on the scene category of the current shooting scene, and the second preview image includes a second portrait of the person being photographed in a second posture. Finally, if the second posture matches the target reference posture, a target image is generated based on the second preview image. For related descriptions of this technical solution, refer to the following.
The controller may be the nerve center and command center of the terminal 100. The controller may generate an operation control signal according to an instruction operation code and a timing signal, to complete the control of fetching and executing instructions.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. The memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through a CSI interface to implement the shooting function of the terminal 100, and communicates with the display screen 194 through a DSI interface to implement the display function of the terminal 100.
The GPIO interface may be configured through software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 130 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal 100, or to transfer data between the terminal 100 and peripheral devices. It may also be used to connect a headset and play audio through the headset. The interface may further be used to connect other terminals, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment are merely illustrative and do not constitute a structural limitation on the terminal 100. In other embodiments of this application, the terminal 100 may alternatively use interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
The wireless communication function of the terminal 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The terminal 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 to the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the terminal 100 may include one or N display screens 194, where N is a positive integer greater than 1.
A series of graphical user interfaces (GUIs) may be displayed on the display screen 194 of the terminal 100, and these GUIs are home screens of the terminal 100. Generally, the size of the display screen 194 of the terminal 100 is fixed, and only a limited number of controls can be displayed on it. A control is a GUI element; it is a software component contained in an application that controls all the data processed by the application and the interactive operations on that data. A user can interact with a control through direct manipulation to read or edit related information of the application. Generally, controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
The terminal 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithmic optimization on the noise, brightness, and skin color of the image, and optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or videos. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and transfers it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal 100 may include one or N cameras 193, where N is a positive integer greater than 1. For example, the camera 193 may include one or at least two of a main camera, a telephoto camera, a wide-angle camera, an infrared camera, a depth camera, a black-and-white camera, and the like. In combination with the technical solutions provided in the embodiments of this application, the first terminal may use the foregoing one or at least two cameras to capture images, and process (for example, fuse) the captured images to obtain a preview image (such as the first preview image or the second preview image).
The digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the terminal 100 selects a frequency, the digital signal processor is used to perform a Fourier transform on the frequency energy.
The video codec is used to compress or decompress digital video. The terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between human brain neurons, it quickly processes input information and can also continuously self-learn. Through the NPU, applications such as intelligent cognition of the terminal 100 can be implemented, for example, image recognition, facial recognition, speech recognition, and text understanding.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the terminal 100 by running the instructions stored in the internal memory 121. For example, in this embodiment, the processor 110 may obtain the posture of the terminal 100 by executing the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, an application required by at least one function (for example, a sound playback function or an image playback function), and the like. The data storage area may store data (such as audio data and a phone book) created during use of the terminal 100. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes various functional applications and data processing of the terminal 100 by running the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor.
终端100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。终端100可以通过扬声器170A收听音乐,或收听免提通话。The speaker 170A, also called "speaker", is used to convert audio electrical signals into sound signals. The terminal 100 can listen to music through the speaker 170A, or listen to a hands-free call.
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当终端100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。The receiver 170B, also called "earpiece", is used to convert audio electrical signals into sound signals. When the terminal 100 answers a call or voice message, it can receive the voice by bringing the receiver 170B close to the human ear.
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。终端100可以设置至少一个麦克风170C。在另一些实施例中,终端100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,终端100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。The microphone 170C, also referred to as a "mike" or a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C. The terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal 100 may also be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, realize directional recording functions, and the like.
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be a USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。终端100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,终端100根据压力传感器180A检测所述触摸操作强度。终端100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。The pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors and so on. The capacitive pressure sensor may include at least two parallel plates with conductive materials. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The terminal 100 determines the strength of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A. The terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
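A pressure-threshold dispatch like the one described above (the same icon triggering different instructions depending on touch intensity) can be sketched as follows. This is a minimal illustration, not code from the embodiment; the function name and the threshold value are assumptions.

```python
# Hypothetical sketch: map touch pressure on the short-message icon to an
# instruction, as described for pressure sensor 180A. The threshold value is
# an assumed, device-specific calibration constant.
FIRST_PRESSURE_THRESHOLD = 0.5  # normalized pressure units (assumed)

def dispatch_sms_icon_touch(pressure):
    """Return the instruction executed for a touch of the given pressure."""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"      # light press: view messages
    return "create_new_short_message"    # firm press: compose a new message
```

On real hardware the threshold would come from calibration of the pressure sensor 180A rather than a hard-coded constant.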
陀螺仪传感器180B可以用于确定终端100的运动姿势。在一些实施例中,可以通过陀螺仪传感器180B确定终端100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测终端100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消终端100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。The gyro sensor 180B may be used to determine the movement posture of the terminal 100. In some embodiments, the angular velocity of the terminal 100 around three axes (ie, x, y, and z axes) can be determined by the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the terminal 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal 100 through a reverse movement to achieve anti-shake. The gyro sensor 180B can also be used for navigation and somatosensory game scenes.
气压传感器180C用于测量气压。在一些实施例中,终端100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
磁传感器180D包括霍尔传感器。终端100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当终端100是翻盖机时,终端100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。The magnetic sensor 180D includes a Hall sensor. The terminal 100 may use the magnetic sensor 180D to detect the opening and closing of the flip holster. In some embodiments, when the terminal 100 is a flip phone, the terminal 100 can detect the opening and closing of the flip according to the magnetic sensor 180D. Furthermore, according to the detected opening and closing state of the leather case or the opening and closing state of the flip cover, features such as automatic unlocking of the flip cover are set.
加速度传感器180E可检测终端100在各个方向上(一般为三轴)加速度的大小。当终端100静止时可检测出重力的大小及方向。还可以用于识别终端姿势,应用于横竖屏切换,计步器等应用。The acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally three axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize terminal gestures, switch between horizontal and vertical screens, pedometers and other applications.
距离传感器180F,用于测量距离。终端100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,终端100可以利用距离传感器180F测距以实现快速对焦。Distance sensor 180F, used to measure distance. The terminal 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。终端100通过发光二极管向外发射红外光。终端100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定终端100附近有物体。当检测到不充分的反射光时,终端100可以确定终端100附近没有物体。终端100可以利用接近光传感器180G检测用户手持终端100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal 100 emits infrared light outward through the light emitting diode. The terminal 100 uses the photodiode to detect infrared light reflected from a nearby object. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 can determine that there is no object near the terminal 100. The terminal 100 can use the proximity light sensor 180G to detect that the user is holding the terminal 100 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used for automatic screen unlocking and locking in holster mode and pocket mode.
环境光传感器180L用于感知环境光亮度。终端100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测终端100是否在口袋里,以防误触。The ambient light sensor 180L is used to sense the brightness of the ambient light. The terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touch.
指纹传感器180H用于采集指纹。终端100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。The fingerprint sensor 180H is used to collect fingerprints. The terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
温度传感器180J用于检测温度。在一些实施例中,终端100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,终端100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,终端100对电池142加热,以避免低温导致终端100异常关机。在其他一些实施例中,当温度低于又一阈值时,终端100对电池142的输出电压执行升压,以避免低温导致的异常关机。The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal 100 executes a temperature processing strategy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid an abnormal shutdown of the terminal 100 caused by low temperature. In some other embodiments, when the temperature is lower than still another threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
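The threshold ladder of the temperature processing strategy can be sketched as below; all threshold values and action names are illustrative assumptions, since the embodiment does not fix them.

```python
# Hypothetical sketch of the temperature processing strategy for sensor 180J.
# Threshold values are assumed for illustration only.
HIGH_TEMP_C = 45.0       # above this: throttle the nearby processor
LOW_TEMP_C = 0.0         # below this: heat battery 142
VERY_LOW_TEMP_C = -10.0  # below this: also boost the battery output voltage

def temperature_policy(temp_c):
    """Return the list of actions taken at the reported temperature."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        actions.append("throttle_processor")   # reduce power, thermal protection
    if temp_c < LOW_TEMP_C:
        actions.append("heat_battery")         # avoid low-temperature shutdown
    if temp_c < VERY_LOW_TEMP_C:
        actions.append("boost_battery_voltage")
    return actions
```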
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于终端100的表面,与显示屏194所处的位置不同。Touch sensor 180K, also called "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch screen is composed of the touch sensor 180K and the display screen 194, which is also called a “touch screen”. The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. The visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the terminal 100, which is different from the position of the display screen 194.
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can obtain the vibration signal of the vibrating bone mass of the human voice. The bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal. In some embodiments, the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone. The audio module 170 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 180M, and realize the voice function. The application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。终端100可以接收按键输入,产生与终端100的用户设置以及功能控制有关的键信号输入。The button 190 includes a power-on button, a volume button, and so on. The button 190 may be a mechanical button. It can also be a touch button. The terminal 100 may receive key input, and generate key signal input related to user settings and function control of the terminal 100.
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as photographing, audio playback, etc.) can correspond to different vibration feedback effects. Acting on touch operations in different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects. Different application scenarios (for example: time reminding, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also support customization.
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。The indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
另外,在上述部件之上,运行有操作系统。例如苹果公司所开发的iOS操作系统,谷歌公司所开发的Android开源操作系统,微软公司所开发的Windows操作系统等。 在该操作系统上可以安装运行应用程序。In addition, on top of the above components, an operating system runs. For example, the iOS operating system developed by Apple, the Android open source operating system developed by Google, and the Windows operating system developed by Microsoft. You can install and run applications on this operating system.
终端100的操作系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明终端100的软件结构。The operating system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the terminal 100 by way of example.
图2是本申请实施例的终端100的软件结构框图。FIG. 2 is a block diagram of the software structure of the terminal 100 according to an embodiment of the present application.
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Communication between layers through software interface. In some embodiments, the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
应用程序层可以包括一系列应用程序包。如图2所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。例如,在拍照时,相机应用可以访问应用程序框架层提供的相机接口管理服务。The application layer can include a series of application packages. As shown in Figure 2, the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc. For example, when taking a picture, the camera application can access the camera interface management service provided by the application framework layer.
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。例如,在本申请实施例中,在拍照时,应用程序框架层可以为应用程序层提供拍照功能相关的API,并为应用程序层提供相机接口管理服务,以实现拍照功能。The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions. As shown in Figure 2, the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager. For example, in the embodiment of the present application, when taking pictures, the application framework layer may provide APIs related to the photographing function for the application layer, and provide camera interface management services for the application layer to realize the photographing function.
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。The content provider is used to store and retrieve data and make these data accessible to applications. The data may include videos, images, audios, phone calls made and received, browsing history and bookmarks, phone book, etc.
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。The view system includes visual controls, such as controls that display text, controls that display pictures, and so on. The view system can be used to build applications. The display interface can be composed of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
电话管理器用于提供终端100的通信功能。例如通话状态的管理(包括接通,挂断等)。The phone manager is used to provide the communication function of the terminal 100. For example, the management of the call status (including connecting, hanging up, etc.).
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,终端振动,指示灯闪烁等。The notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, and so on. The notification manager can also be a notification that appears in the status bar at the top of the system in the form of a chart or a scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, text messages are prompted in the status bar, a prompt sound is emitted, the terminal vibrates, and the indicator light flashes.
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管 理,线程管理,安全和异常的管理,以及垃圾回收等功能。The application layer and application framework layer run in a virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。The system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。The surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
2D图形引擎是2D绘图的绘图引擎。The 2D graphics engine is a drawing engine for 2D drawing.
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。The kernel layer is the layer between hardware and software. The kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
需要说明的是,本申请实施例虽然以Android系统为例进行说明,但是其基本原理同样适用于基于iOS或Windows等操作系统的终端。It should be noted that although the embodiment of the present application takes the Android system as an example for description, its basic principles are also applicable to terminals based on operating systems such as iOS or Windows.
下面结合图1和拍摄场景,示例性说明终端100软件以及硬件的工作流程。In the following, with reference to FIG. 1 and the shooting scene, the working process of the software and hardware of the terminal 100 is exemplified.
触摸传感器180K接收到触摸操作,上报给处理器110,使得处理器110响应于上述触摸操作,启动相机应用,并在显示屏194上显示该相机应用的用户界面。例如,触摸传感器180K当接收到对相机应用图标的触摸操作后,向处理器110上报对相机应用的触摸操作,使得处理器110响应于上述触摸操作,启动相机应用,并在显示屏194上显示相机的用户界面。此外,本申请实施例中还可以通过其它方式使得终端100启动相机应用,并在显示屏194上显示相机应用的用户界面。例如,终端100在黑屏、显示锁屏界面或者解锁后显示某一用户界面时,可以响应于用户的语音指令或者快捷操作等,启动相机应用,并在显示屏194上显示相机应用的用户界面。The touch sensor 180K receives a touch operation and reports it to the processor 110, so that the processor 110 starts the camera application in response to the touch operation and displays the user interface of the camera application on the display screen 194. For example, after receiving a touch operation on the camera application icon, the touch sensor 180K reports the touch operation to the processor 110, so that the processor 110 starts the camera application in response to the touch operation and displays the user interface of the camera on the display screen 194. In addition, in the embodiments of the present application, the terminal 100 may also start the camera application in other ways and display the user interface of the camera application on the display screen 194. For example, when the terminal 100 has a black screen, displays a lock screen interface, or displays a certain user interface after unlocking, it can start the camera application in response to a user's voice instruction, a shortcut operation, or the like, and display the user interface of the camera application on the display screen 194.
关于如何引导被拍摄者摆出拍照姿势,相关技术中采用的方案的基本原理为:在终端中预定义若干种拍照姿势,然后在实际拍照时,由用户手动选择拍照姿势。该方案在姿势推荐过程中需要被拍摄者主观判断,交互性不是很友好,缺少智能性。Regarding how to guide the photographed person to pose for a photo, the basic principle of the solution adopted in the related art is as follows: several photographing postures are predefined in the terminal, and the user manually selects a photographing posture when actually taking a photo. In the posture recommendation process, this solution relies on the subjective judgment of the person being photographed; the interaction is not very friendly and lacks intelligence.
对此,本申请实施例提供了一种图像处理方法,该方法应用于终端,该方法包括:显示当前拍摄场景的第一预览图像,第一预览图像包括被拍摄者在第一姿势下的第一人像;对第一预览图像进行识别,以确定当前拍摄场景的场景类别;显示当前拍摄场景下的第二预览图像,并在第二预览图像中显示目标参考姿势;目标参考姿势至少是基于当前拍摄场景的场景类别得到的;其中,第二预览图像包括被拍摄者在第二姿势下的第二人像;如果第二姿势与目标参考姿势匹配,则根据第二预览图像生成目标图像。In this regard, an embodiment of the present application provides an image processing method, applied to a terminal, and the method includes: displaying a first preview image of a current shooting scene, where the first preview image includes a first portrait of a subject in a first posture; recognizing the first preview image to determine a scene category of the current shooting scene; displaying a second preview image of the current shooting scene and displaying a target reference posture in the second preview image, where the target reference posture is obtained at least based on the scene category of the current shooting scene, and the second preview image includes a second portrait of the subject in a second posture; and if the second posture matches the target reference posture, generating a target image according to the second preview image.
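Under the assumption that scene recognition and pose estimation are available as black boxes, the claimed flow (first preview, scene recognition, pose recommendation, match, capture) can be sketched as follows. The pose library, scene names, and all function names here are hypothetical stand-ins for components the embodiment describes but does not name.

```python
# Hypothetical, self-contained sketch of the claimed flow. Scene recognition
# and the pose library are stubbed with toy data.
POSE_LIBRARY = {"seaside": "arms_spread", "grass": "sitting"}  # assumed

def recognize_scene(detected_objects):
    # A predefined object group (here a single object) identifies the scene.
    for scene in POSE_LIBRARY:
        if scene in detected_objects:
            return scene
    return "generic"

def process(frames):
    """frames: list of (detected_objects, subject_pose) per preview image."""
    scene = recognize_scene(frames[0][0])         # recognize first preview
    target_pose = POSE_LIBRARY.get(scene)         # recommended target pose
    for objects, subject_pose in frames[1:]:      # later previews with overlay
        if subject_pose == target_pose:           # postures match
            return {"scene": scene, "captured_pose": subject_pose}
    return None                                   # subject never matched
```

A real implementation would run per camera frame and overlay the target reference posture on the displayed preview, but the control flow is the same.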
在本申请实施例中,终端自动确定当前拍摄场景,并自动基于当前拍摄场景推荐目标参考姿势,以指示(或引导)被拍摄者调整姿势。整个推荐姿势的过程不需要用户参与,因此交互性更好,且更智能化,从而能够提高用户的体验。In the embodiment of the present application, the terminal automatically determines the current shooting scene, and automatically recommends the target reference posture based on the current shooting scene, so as to instruct (or guide) the person to be photographed to adjust the posture. The entire process of recommending gestures does not require user participation, so the interaction is better and more intelligent, which can improve the user experience.
需要说明的是,本申请实施例中所描述的“姿势(pose)”可以是指人身体的整体姿势,也可以是指人身体的局部姿势(如手势等)。It should be noted that the “pose” described in the embodiments of the present application may refer to the overall posture of the human body, or may refer to the partial posture of the human body (such as gestures, etc.).
下面将结合附图对本申请实施例的实施方式进行详细描述。The implementation of the embodiments of the present application will be described in detail below in conjunction with the accompanying drawings.
如图3所示,为本申请实施例提供的图像处理方法的流程图。图3所示的方法包括以下步骤:As shown in FIG. 3, it is a flowchart of an image processing method provided by an embodiment of this application. The method shown in Figure 3 includes the following steps:
S101:第一终端显示当前拍摄场景的第一预览图像,第一预览图像包括被拍摄者在第一姿势下的第一人像。S101: The first terminal displays a first preview image of a current shooting scene, where the first preview image includes a first portrait of the photographed person in the first posture.
第一终端是用于拍照的终端,如拍摄者拿的手机等。当前拍摄场景可以是第一终端执行S101时,第一终端的摄像头拍摄视野内的拍摄场景。第一姿势是第一预览图像中被拍摄者的当前姿势,第一人像是当前姿势下被拍摄者的图像。The first terminal is a terminal for taking pictures, such as a mobile phone held by the photographer. The current shooting scene may be the shooting scene in the field of view shot by the camera of the first terminal when the first terminal executes S101. The first posture is the current posture of the subject in the first preview image, and the first portrait is the image of the subject in the current posture.
预览图像,是拍照过程中显示在终端显示屏上的图像。在一个示例中,在终端启动拍照功能的时刻开始至完成拍照的时刻的过程中,终端的显示屏上可以一直显示预览图像,也就是说,终端以预览图像流的方式显示预览图像。第一预览图像是执行S101时,显示在第一终端的显示屏上的针对当前拍摄场景的预览图像。The preview image is the image displayed on the terminal's display screen during the photographing process. In an example, from the moment when the terminal starts the photographing function to the moment when the photograph is finished, the preview image may always be displayed on the display screen of the terminal, that is, the terminal displays the preview image in a preview image stream. The first preview image is the preview image for the current shooting scene displayed on the display screen of the first terminal when S101 is executed.
本申请实施例对第一预览图像的获取方式不进行限定,例如,第一终端可以通过摄像头采集当前拍摄场景的图像;并将采集到的图像作为第一预览图像,或者对采集到的图像进行处理(如裁剪,和/或与其他图像进行融合等)后,并将处理后得到的图像作为第一预览图像。The embodiment of the present application does not limit the method of obtaining the first preview image. For example, the first terminal may collect an image of the current shooting scene through a camera; use the collected image as the first preview image, or perform processing on the collected image. After processing (such as cropping, and/or fusion with other images, etc.), the processed image is used as the first preview image.
可选的,第一人像占第一预览图像的比例大于等于第二阈值。可选的,第一人像的像素个数大于等于第三阈值。通俗地讲,这两种可选的实现方式旨在说明,在被拍摄者的人像较大的情况下,向被拍摄者推荐参考姿势。这是在考虑到“如果被拍摄者的人像较小,则很难判断被拍摄者的姿势,这会导致推荐参考姿势的意义不大”,以及“为了避免将背景中的人作为被拍摄者”而提出的技术方案。本申请实施例对第二阈值和第三阈值的取值不进行限定。Optionally, the proportion of the first portrait in the first preview image is greater than or equal to a second threshold. Optionally, the number of pixels of the first portrait is greater than or equal to a third threshold. In plain terms, these two optional implementations indicate that the reference posture is recommended when the subject's portrait is relatively large. This takes into account that if the subject's portrait is small, it is difficult to judge the subject's posture, which makes recommending a reference posture of little significance, and that a person in the background should not be mistaken for the subject. The embodiments of the present application do not limit the values of the second threshold and the third threshold.
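One possible combination of the two optional gating conditions above (portrait proportion and absolute pixel count, joined here with a logical AND) can be sketched as follows; the second and third threshold values are assumed for illustration, since the embodiment does not fix them.

```python
# Hypothetical sketch of the gating checks before recommending a pose.
# Threshold values are assumed; the embodiment leaves them open.
SECOND_THRESHOLD = 0.10   # portrait must cover >= 10% of the preview image
THIRD_THRESHOLD = 50_000  # portrait must contain >= 50,000 pixels

def should_recommend_pose(portrait_pixels, image_pixels):
    """True if the subject's portrait is large enough to judge the posture."""
    proportion_ok = portrait_pixels / image_pixels >= SECOND_THRESHOLD
    pixel_count_ok = portrait_pixels >= THIRD_THRESHOLD
    return proportion_ok and pixel_count_ok
```

Either condition could also be used on its own, matching the two separate "optionally" clauses in the text.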
S102:第一终端对第一预览图像进行识别,以确定当前拍摄场景的场景类别。S102: The first terminal recognizes the first preview image to determine the scene category of the current shooting scene.
可选的,不同的场景类别通过不同预定义对象组来表征。换句话说,不同拍摄场景可以通过其所包含的预定义对象组进行区分。Optionally, different scene categories are characterized by different predefined object groups. In other words, different shooting scenes can be distinguished by the predefined object groups contained in them.
一个预定义对象组可以包括一个或多个预定义对象。本申请实施例对预定义对象的对象类别不进行限定。例如,预定义对象的对象类别可以是草地、台阶、海边、夕阳、马路或塔等。相应的,本申请实施例对拍摄场景的场景类别不进行限定。A predefined object group can include one or more predefined objects. The embodiment of the present application does not limit the object category of the predefined object. For example, the object category of the predefined object may be grass, stairs, seaside, sunset, road or tower, etc. Correspondingly, the embodiment of the present application does not limit the scene category of the shooting scene.
在一个示例中,一个预定义对象组包括一个预定义对象,即拍摄场景的类别是基于单个对象的类别进行区分的。例如,以多个预定义对象组中的预定义对象分别是草地、台阶、海边、夕阳和马路为例,拍摄场景的类别可以包括:草地场景、台阶场景、海边场景、夕阳场景和马路场景等。In an example, a predefined object group includes one predefined object, that is, the category of the shooting scene is distinguished based on the category of a single object. For example, taking the predefined objects in multiple predefined object groups are grass, stairs, seaside, sunset, and road as an example, the types of shooting scenes can include: grass scene, step scene, seaside scene, sunset scene, road scene, etc. .
在另一个示例中,一个预定义对象组包括多个预定义对象,即拍摄场景是基于多个对象进行区分的。例如,以多个预定义对象组中的预定义对象分别是[海边、夕阳]、[马路、夕阳]和[台阶、夕阳]为例,其中,一个中括号中的对象表示一个预定义对象组,基于此,拍摄场景的类别可以包括:海边夕阳场景,马路夕阳场景和台阶夕阳场景等。In another example, a predefined object group includes multiple predefined objects, that is, the shooting scene is distinguished based on multiple objects. For example, suppose the predefined objects in multiple predefined object groups are [seaside, sunset], [road, sunset], and [stairs, sunset], where the objects in one pair of brackets represent one predefined object group. Based on this, the categories of shooting scenes may include: a seaside sunset scene, a road sunset scene, a stairs sunset scene, and the like.
当然,还可以存在“一些拍摄场景是基于单个对象进行区分的,另一些拍摄场景是基于多个对象进行区分的”的情况。在此不作具体说明。Of course, there may also be a situation where "some shooting scenes are distinguished based on a single object, and other shooting scenes are distinguished based on multiple objects". No specific explanation here.
预定义对象组包括哪个或哪些对象,预定义对象组的个数,以及哪个预定义对象组表征哪个拍摄场景的场景类别等可以是预定义的。具体的,这些信息可以预存在第一终端中,如在第一终端中安装用于实现本申请实施例提供的技术方案的应用时,随该应用的安装包等信息一起预存在第一终端中,这些信息可以随着该应用的更新(如该应用的版本的更新)而更新。或者,这些信息可以预存在其他设备(如网络设备)中,由第一终端向该其他设备获取。Which objects the predefined object group includes, the number of predefined object groups, which predefined object group represents the scene category of which shooting scene, etc. may be predefined. Specifically, this information may be pre-stored in the first terminal. For example, when an application for implementing the technical solutions provided in the embodiments of the present application is installed in the first terminal, it is pre-stored in the first terminal along with information such as the installation package of the application. , The information can be updated with the update of the application (such as the update of the version of the application). Or, the information may be pre-stored in other devices (such as network devices), and obtained by the first terminal from the other devices.
本申请实施例对第一终端对第一预览图像进行识别,以确定当前拍摄场景的具体实现方式不进行限定。可选的,识别结果可以包括:第一预览图像中包括哪些预定义对象组。例如,第一终端首先识别第一预览图像中包括的对象的类别(即人、草地、台阶等),该步骤的具体实现方式可以参考现有技术;其次,确定所识别出的这些对象是否是预定义对象组中的对象,以确定出第一预览图像中包括哪些预定义对象组。The embodiment of the present application does not limit the specific implementation manner of the first terminal to recognize the first preview image to determine the current shooting scene. Optionally, the recognition result may include: which predefined object groups are included in the first preview image. For example, the first terminal first recognizes the categories of objects included in the first preview image (ie, people, grass, steps, etc.). The specific implementation of this step can refer to the prior art; secondly, it determines whether the recognized objects are Objects in the predefined object group to determine which predefined object groups are included in the first preview image.
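This matching step can be sketched minimally as follows, assuming the object recognizer returns a set of category labels and each predefined object group is a set of labels. All group names and labels here are illustrative assumptions, not part of the embodiment.

```python
# Hypothetical sketch: match recognized object labels against predefined
# object groups. A group counts as "included in the image" only if every
# one of its objects was recognized in the first preview image.
PREDEFINED_GROUPS = {
    "seaside_sunset": {"seaside", "sunset"},
    "road_sunset": {"road", "sunset"},
    "step_scene": {"steps"},
}

def groups_in_image(detected_labels):
    """Return the names of the predefined object groups contained in the image."""
    detected = set(detected_labels)
    return {name for name, objs in PREDEFINED_GROUPS.items()
            if objs <= detected}  # subset test: all objects of the group present
```

Non-predefined labels such as "person" are simply ignored by the subset test, which matches the two-stage description above (recognize categories first, then check membership in the groups).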
The current shooting scene may belong to one or more scene categories.
Optionally, if the first preview image contains one predefined object group (that is, a single label), the scene category of the current shooting scene is the scene category of the shooting scene represented by that predefined object group.
This case can be regarded as determining the scene category of the current shooting scene based on a single label. Taking "the multiple predefined shooting scenes are a step scene, a seaside scene, and a sunset scene" as an example, if the recognition result is that the first preview image includes steps but includes neither seaside nor sunset, the first terminal may determine the step scene as the current shooting scene.
Optionally, if the first preview image contains multiple predefined object groups (that is, multiple labels), the scene category of the current shooting scene is the scene category of some or all of the shooting scenes represented by those predefined object groups.
This case can be regarded as determining the scene category of the current shooting scene based on multiple labels. As an example, if the scene category of the current shooting scene is the scene category of only some of the shooting scenes represented by the multiple predefined object groups, then specifically, the scene category of the shooting scene represented by the predefined object group whose priority satisfies a condition, among the multiple predefined object groups, may be taken as the scene category of the current shooting scene. The predefined object group whose priority satisfies the condition may include the predefined object group with the highest priority, or a predefined object group whose priority is higher than a preset level.
Take "the multiple shooting scenes stored in the first terminal are a step scene, a seaside scene, and a sunset scene, and the priority order of the predefined object groups from high to low is steps, seaside, sunset" as an example. If the first preview image includes both steps and sunset, the first terminal may determine the step scene as the current shooting scene based on the priority order of steps and sunset.
It should be noted that if the recognition result is that the first preview image does not contain any predefined object group, that is, the current shooting scene is not one of the shooting scenes distinguished by the predefined objects they contain, the first terminal may determine the current shooting scene to be a default scene. The default scene may likewise be a scene pre-stored in the first terminal.
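The single-label case, the priority-based multi-label case, and the default-scene fallback described above can be sketched together as one selection routine. The priority order and scene names are illustrative assumptions.

```python
# Hypothetical sketch: choose the current shooting scene from the
# predefined object groups found in the first preview image. Groups
# earlier in PRIORITY have higher priority; if no group matched, fall
# back to the default scene.
PRIORITY = ["steps", "seaside", "sunset"]  # high -> low priority

def determine_scene(matched_groups, default="default_scene"):
    """matched_groups: names of predefined object groups found in the image."""
    for group in PRIORITY:            # the single-label case is covered too:
        if group in matched_groups:   # the only match is trivially highest
            return group + "_scene"
    return default                    # no predefined group -> default scene
```

With both "steps" and "sunset" matched, the routine returns the step scene, mirroring the example in the text.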
(Optional) S103: The first terminal displays the scene category information of the current shooting scene. The scene category information may include identification information of the scene category, such as text information or picture information.
Specifically, the first terminal displays the scene category information of the current shooting scene on its display screen.
S104: The first terminal obtains the target reference posture, the position of the target reference posture in the second preview image, and the size that the target reference posture occupies in the second preview image. The target reference posture is obtained at least based on the scene category of the current shooting scene. There may be one or more target reference postures.
The second preview image may be the preview image of the current shooting scene displayed on the first terminal when S105 is executed. The second preview image may be an image captured by a camera installed on the first terminal, or an image obtained by processing an image captured by such a camera; for the processing steps, refer to the description above.
One or more frames of preview images may exist between the first preview image and the second preview image.
It is understandable that during actual shooting, the current shooting scene may differ between the moment the first terminal displays the first preview image and the moment it displays the second preview image, for example because of shake by the photographer. With this in mind, and for ease of description, the embodiments of this application assume that the shake during the period in which the first terminal displays the first preview image and the second preview image is within the error range, that is, the change in the current shooting scene is small enough to be ignored. This is stated here once and not repeated below.
It is understandable that the target reference posture may be displayed in every frame of the preview-image stream after the first preview image. Optionally, the position of the target reference posture is the same (or approximately the same) in every frame of preview image in which it is displayed.
Optionally, the target reference posture and the first posture satisfy at least one of the following Condition 1 to Condition 3:
Condition 1: the target reference posture is different from the first posture.
Condition 2: the relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image.
In one implementation, the relative position of the target reference posture in the second preview image may be the position of the target reference posture relative to a reference object in the current shooting scene, and the relative position of the first posture in the first preview image may be the position of the first posture relative to the same reference object in the current shooting scene. The reference object may be a predefined object, or an object in the current shooting scene determined by the first terminal in real time.
In another implementation, the relative position of the target reference posture in the second preview image may be the position of the target reference posture in the coordinate system of the second preview image, and the relative position of the first posture in the first preview image may be the position of the first posture in the coordinate system of the first preview image. These two coordinate systems are the same, or approximately the same. It is understandable that if issues such as shake of the first terminal during shooting are disregarded, that is, the current shooting scene when the first preview image is displayed is the same as the current shooting scene when the second preview image is displayed, the two coordinate systems are usually the same.
Condition 3: the size that the target reference posture occupies in the second preview image is different from the size that the first posture occupies in the first preview image.
The following describes specific implementations, provided by the embodiments of this application, of obtaining the target reference posture, obtaining the position of the target reference posture in the second preview image, and obtaining the size that the target reference posture occupies in the second preview image.
First, obtaining the target reference posture
The embodiments of this application do not limit how the target reference posture is obtained. Possible implementations are provided below.
Manner 1: the target reference posture is obtained based on the scene category of the current shooting scene. Specifically, the first terminal may determine, based on the correspondence between the scene categories of multiple preset shooting scenes and multiple reference postures, the reference posture corresponding to the scene category of the current shooting scene, and use the determined reference posture as the target reference posture. The correspondence is pre-stored in the first terminal, or obtained by the first terminal from a network device.
The scene category of one shooting scene may correspond to one or more reference postures, and the reference postures corresponding to the scene categories of different shooting scenes may be the same or different. Table 1 shows a correspondence between shooting-scene categories and reference postures provided in an embodiment of this application.
Table 1
[Table 1 appears as an image in the original publication (Figure PCTCN2020142530-appb-000001); it lists the correspondence between shooting-scene categories and reference postures.]
Optionally, if the scene category of the current shooting scene corresponds to multiple reference postures, then:
In one example, the target reference posture may be any one or more of the multiple reference postures corresponding to the category of the current shooting scene. For example, referring to Table 1, if the scene category of the current shooting scene is a step scene, the target reference posture may be at least one of reference posture 21 and reference posture 22.
In another example, the target reference posture may be, among the multiple reference postures corresponding to the category of the current shooting scene, a reference posture whose similarity to the first posture is greater than or equal to a first threshold. For example, referring to Table 1, if the scene category of the current shooting scene is a step scene, the target reference posture may be whichever of reference posture 21 and reference posture 22 has a similarity to the first posture greater than or equal to the first threshold.
In yet another example, the target reference posture may be, among the multiple reference postures corresponding to the category of the current shooting scene, the reference posture with the highest similarity to the first posture. Referring to Table 1, if the scene category of the current shooting scene is a step scene, the target reference posture may be whichever of reference posture 21 and reference posture 22 has the highest similarity to the first posture.
For the specific implementation of the similarity between postures, refer to the description below; details are not repeated here.
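As an illustration of Manner 1, the lookup plus similarity-based selection might look like the following sketch. Here a posture is represented as a flat keypoint vector and similarity is a simple negative Euclidean distance; both choices are assumptions made for illustration, since the embodiment does not prescribe a posture representation or a similarity measure.

```python
import math

# Hypothetical: each scene category maps to its candidate reference
# postures, each posture being a flat list of keypoint coordinates.
SCENE_TO_POSES = {
    "step_scene": {"pose21": [0.0, 0.0, 1.0, 1.0],
                   "pose22": [0.0, 0.0, 2.0, 2.0]},
}

def similarity(a, b):
    """Toy similarity score: higher is more similar."""
    return -math.dist(a, b)

def pick_target_pose(scene, first_pose, threshold=None):
    """Select a target reference posture for the scene.

    With threshold=None this implements the "highest similarity" example;
    with a threshold it first keeps only poses at least that similar."""
    candidates = SCENE_TO_POSES[scene]
    if threshold is not None:
        candidates = {k: v for k, v in candidates.items()
                      if similarity(v, first_pose) >= threshold}
    return max(candidates, key=lambda k: similarity(candidates[k], first_pose))
```

A first posture close to `pose21`'s keypoints selects `pose21`, and one close to `pose22` selects `pose22`, matching the step-scene example above.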
Manner 2: the target reference posture is determined based on the scene category of the current shooting scene and the posture category of the first posture. The posture category of the first posture may be used to determine the posture category of the target reference posture; for example, the posture category of the target reference posture is consistent with the posture category of the first posture.
Specifically, the first terminal may determine, based on the correspondence among the scene categories of multiple preset shooting scenes, preset posture categories, and multiple reference postures, a reference posture corresponding to both the scene category of the current shooting scene and the first posture category, and use the determined reference posture as the target reference posture. The correspondence is pre-stored in the first terminal, or obtained by the first terminal from a network device.
The posture categories may include one or more of a standing posture, a sitting posture, and a lying posture. Of course, in specific implementations, in one example the posture categories may also include postures parallel to the standing, sitting, and lying postures. In another example, any one or more of the standing, sitting, and lying postures may be divided at a finer granularity to obtain finer-grained posture categories. Other implementations are certainly possible, and the embodiments of this application do not limit this. The specific examples below all take posture categories that include the standing, sitting, and lying postures as an example.
The scene category of one shooting scene may correspond to one or more posture categories, and one posture category may correspond to one or more reference postures. The posture categories corresponding to the scene categories of different shooting scenes may be the same or different, and the reference postures corresponding to the same posture category under different scene categories may be the same or different. Table 2 shows a correspondence among shooting-scene categories, posture categories, and reference postures provided in an embodiment of this application.
Table 2
[Table 2 appears as an image in the original publication (Figure PCTCN2020142530-appb-000002); it lists the correspondence among shooting-scene categories, posture categories, and reference postures.]
Optionally, if there are multiple reference postures corresponding to both the scene category of the current shooting scene and the posture category of the first posture, then:
In one example, the target reference posture may be any one or more of the multiple reference postures corresponding to both the category of the current shooting scene and the posture category of the first posture. For example, referring to Table 2, if the scene category of the current shooting scene is a grass scene and the posture category of the first posture is a standing posture, the target reference posture may be at least one of reference posture 11A and reference posture 11B.
In another example, the target reference posture may be, among the multiple reference postures corresponding to both the category of the current shooting scene and the posture category of the first posture, a reference posture whose similarity to the first posture is greater than or equal to the first threshold. For example, referring to Table 2, if the scene category of the current shooting scene is a grass scene and the posture category of the first posture is a standing posture, the target reference posture may be whichever of reference posture 11A and reference posture 11B has a similarity to the first posture greater than or equal to the first threshold.
In yet another example, the target reference posture may be, among the multiple reference postures corresponding to both the category of the current shooting scene and the posture category of the first posture, the reference posture with the highest similarity to the first posture. Referring to Table 2, if the scene category of the current shooting scene is a grass scene and the posture category of the first posture is a standing posture, the target reference posture may be whichever of reference posture 11A and reference posture 11B has the highest similarity to the first posture.
It should be noted that the reference postures corresponding to the scene category of a shooting scene are the postures that the first terminal can recommend to the user in that shooting scene, and the reference postures corresponding to both the scene category of a shooting scene and a certain posture category are the postures that the first terminal can recommend to the user in that shooting scene when the subject's current posture belongs to that posture category. In one example, colloquially speaking, a reference posture is a graceful and natural posture determined by the first terminal or the network device. The embodiments of this application do not limit how the reference postures corresponding to the scene category of a shooting scene are determined; for example, they may be determined based on methods such as big-data analysis and pre-stored in the first terminal or the network device.
Optionally, the shooting-scene categories, the reference postures corresponding to each shooting-scene category, the posture categories corresponding to each shooting-scene category, and the reference postures of each posture category corresponding to each shooting-scene category can all be updated. For example, if the method provided in the embodiments of this application is implemented by an application installed on the first terminal, the above information is updated by updating the version of the application. As another example, the above information is all stored on the network device, and the first terminal obtains it from the network device in real time.
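Manner 2 is essentially a two-key lookup: first by scene category, then by the posture category of the first posture. A minimal sketch follows; the table contents are invented for illustration and, as the text notes, could equally be held on a network device and fetched at run time.

```python
# Hypothetical sketch of the (scene category, posture category) ->
# reference postures correspondence used in Manner 2.
SCENE_POSTURE_TABLE = {
    ("grass_scene", "standing"): ["pose11A", "pose11B"],
    ("grass_scene", "sitting"):  ["pose12A"],
    ("step_scene",  "standing"): ["pose21A"],
}

def candidate_poses(scene_category, posture_category):
    """Reference postures matching both the scene and the posture category.

    An empty list means the table has no entry for this combination."""
    return SCENE_POSTURE_TABLE.get((scene_category, posture_category), [])
```

The returned candidates would then be narrowed down to a single target reference posture by any of the three selection rules above (arbitrary choice, similarity threshold, or highest similarity).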
Second, obtaining the position of the target reference posture in the second preview image
The embodiments of this application do not limit how the position of the target reference posture in the second preview image is obtained.
Optionally, the position of the target reference posture in the second preview image is determined based on the position, in the first preview image, of a first preset object in the first preview image. A first local posture within the target reference posture has a first association relationship with the position of the first preset object in the same image, and the first association relationship is predefined or determined in real time.
The first preset object may be one or more predefined objects. The first preset object may be the same as or different from the objects contained in the category of the current shooting scene. For example, in a tower scene, the first preview image contains a tower, and the first preset object may be the tower; in a grass scene, the first preview image may include a sunset, grass, and so on, and the first preset object may be the sunset. More specifically, the first preset object may be the bottom or top of the tower, the center of the sunset, the edge of the sunset, and so on. In addition, the first preset object may be the first portrait, or part of the first portrait.
The first local posture may be one or more predefined postures, such as a person's hand.
The first local posture having the first association relationship with the first preset object may include: the first local posture and the first preset object have an association relationship in orientation, and/or an association relationship in distance, and so on.
For example, an orientation association between the first local posture and the first preset object may include the first local posture being above, below, or diagonally above the first preset object, and so on. A distance association between the first local posture and the first preset object may include the distance between the first local posture and the first preset object being less than or equal to a threshold, and so on.
The first association relationship may be predefined; for example, it is predefined in the first terminal or in the network device. Alternatively, the first association relationship may be obtained in real time; for example, it is obtained in real time by the first terminal or the network device through certain analysis and computation based on some pre-stored images.
For example, when the current shooting scene is a tower scene, if the target reference posture is a "hand holding the tower" posture, the preset object may be the tower (specifically, the bottom of the tower), and the local posture may be the hand posture used to "hold the tower". Figure 5 is a schematic diagram of images displayed on the first terminal in a tower scene provided by an embodiment of this application. Part (a) of Figure 5 shows a partial view of the second preview image, which includes a person's hand 41 and a tower 42 (that is, the preset object). The target reference posture is the "hand holding the tower" posture. On this basis, the first terminal may determine the position of the target reference posture in the second preview image based on the association relationship between the "hand holding the tower" and the "bottom of the tower" (that is, relative orientation information and relative distance information), as shown in part (b) of Figure 5.
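The orientation-plus-distance association can be sketched as placing the local posture at a fixed offset from the preset object's anchor point. The offset values and pixel coordinates below are illustrative assumptions, with the usual image convention of the origin at the top-left and y increasing downward.

```python
# Hypothetical sketch: position the "hand" local posture of the target
# reference pose relative to the anchor point of the first preset object
# (e.g. the tower bottom), using a predefined orientation/distance rule.
def place_local_pose(anchor_xy, dx=0, dy=30):
    """anchor_xy: (x, y) pixel position of the preset object's anchor.

    Returns the (x, y) at which the local posture should be drawn;
    with the defaults, directly below the anchor by 30 pixels, which
    places the open hand just under the tower bottom."""
    x, y = anchor_xy
    return (x + dx, y + dy)
```

The rest of the target reference posture would then be laid out rigidly relative to this local posture, so fixing the hand position fixes the whole pose.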
Third, obtaining the size of the target reference posture in the second preview image
Optionally, the size that the target reference posture occupies in the second preview image is determined based on the size that a second preset object in the first preview image occupies in the first preview image. The target reference posture has a second association relationship with the size of the second preset object in the same image, and the second association relationship is predefined or determined in real time.
The size that the target reference posture occupies in the second preview image may be the pixels occupied by the target reference posture in the second preview image, or the pixels occupied in the second preview image by the smallest rectangular box (or a box of another shape) containing the target reference posture, and so on.
The second preset object may be the same as or different from the above first preset object.
The second association relationship between the target reference posture and the size of the second preset object in the same image may be that the proportion between the target reference posture and the second preset object in the same image satisfies a preset relationship.
Figure 6 is a schematic diagram of images displayed on the first terminal in a sunset scene provided by an embodiment of this application. Part (a) of Figure 6 shows the second preview image, which includes a sunset 51 and a subject 52. Part (b) of Figure 6 shows the second preview image with a target reference posture 53 displayed. The size of the target reference posture 53 is determined based on the size of the sunset (that is, the second preset object) in the second preview image.
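One way to realize the size association is to scale the rendered reference posture so that its bounding-box area keeps a preset ratio to the area of the second preset object. This is a sketch under that assumption; the ratio value is illustrative, and the embodiment only requires that the proportion satisfy some preset relationship.

```python
# Hypothetical sketch: scale the target reference posture so that the
# ratio of its bounding-box area to the preset object's area (e.g. the
# sunset's) matches a predefined value.
def target_pose_size(object_w, object_h, ratio=0.5):
    """Return (w, h) of a pose bounding box whose area is
    ratio * (object area), keeping the object's aspect ratio."""
    scale = ratio ** 0.5  # area scales with the square of linear size
    return (object_w * scale, object_h * scale)
```

For instance, a ratio of 0.25 halves both dimensions relative to the preset object, so the pose occupies a quarter of its area.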
可选的,目标参考姿势在第二预览图像中的位置和大小,是基于第一预览图像的构图确定的。示例的,第二预览图像中使用目标参考姿势的人像替换第二姿势的人像之后得到的图像的构图,优于第一预览图像的构图。本申请实施例对比较两个构图之间谁优谁劣的具体判断方式不进行限定,具体可以基于本技术领域普遍的一些判断标准或判断算法来确定,此处不再赘述。Optionally, the position and size of the target reference posture in the second preview image are determined based on the composition of the first preview image. For example, the composition of the image obtained after replacing the portrait of the second pose with the portrait of the target reference pose in the second preview image is better than the composition of the first preview image. The embodiment of the present application does not limit the specific judgment method of comparing who is superior or inferior between two compositions, and it can be determined based on some common judgment standards or judgment algorithms in the technical field, which will not be repeated here.
需要说明的是,上述任意多个技术方案中的部分或全部技术特征,在不冲突的情况下,均可以进行结合使用,从而构成新的技术方案。It should be noted that some or all of the technical features in any of the above-mentioned technical solutions can be used in combination without conflict to form a new technical solution.
可选的,目标参考姿势的信息可以是第一终端自身确定的,如第一终端基于自身存储的信息确定的;也可以是第一终端从网络设备中获取的。其中,目标参考姿势的信息包括但不限于以下至少一项:目标参考姿势的姿势类型,目标参考姿势在第二预览图像中的位置,或者目标参考姿势在第二预览图像中的大小等。其中,目标参考姿势在第二预览图像中的大小,可以通过目标参考姿势所占的像素数量来表征。Optionally, the information of the target reference posture may be determined by the first terminal itself, for example, determined by the first terminal based on information stored by itself; it may also be obtained by the first terminal from a network device. The information of the target reference posture includes but is not limited to at least one of the following: the posture type of the target reference posture, the position of the target reference posture in the second preview image, or the size of the target reference posture in the second preview image. Wherein, the size of the target reference posture in the second preview image can be characterized by the number of pixels occupied by the target reference posture.
本申请实施例对第一终端从网络设备中获取目标参考姿势的信息的具体实现方式不进行限定。例如,第一终端向网络设备发送当前拍摄场景的第一预览图像(或者对第一预览图像进行处理后得到的信息)。网络设备执行以下步骤:首先,基于接收到的信息,确定当前拍摄场景的场景类别。然后,在数据库中选择与当前拍摄场景的场景类别相对应的参考姿势。从这些参考姿势中,选择姿势类型与“第一姿势的姿势类型”相同的参考姿势,并将所选择的参考姿势作为目标参考姿势;接着,基于上述方式一至三中的一种或多种的结合,确定目标参考姿势在第二预览图像中的位置和大小,并将所确定的目标参考姿势、目标参考姿势在第二预览图像中的位置和大小等信息发送给第一终端。第一终端基于所接收到的信息,在第二预览图像中显示目标参考姿势。The embodiment of the present application does not limit the specific implementation manner in which the first terminal obtains the target reference posture information from the network device. For example, the first terminal sends a first preview image of the current shooting scene (or information obtained after processing the first preview image) to the network device. The network device performs the following steps: First, based on the received information, determine the scene category of the current shooting scene. Then, the reference pose corresponding to the scene category of the current shooting scene is selected in the database. From these reference postures, select the reference posture with the same posture type as the "posture type of the first posture", and use the selected reference posture as the target reference posture; then, based on one or more of the above methods 1 to 3 In combination, the position and size of the target reference posture in the second preview image are determined, and information such as the determined target reference posture and the position and size of the target reference posture in the second preview image are sent to the first terminal. Based on the received information, the first terminal displays the target reference posture in the second preview image.
需要说明的是,相比终端来说,网络设备的存储空间较大,计算能力较强,因此,网络设备的数据库中所存储的图像会更丰富,这样,由网络设备确定目标参考姿势、目标参考姿势在第二预览图像中的位置和大小等,能够使得拍照效果更好。It should be noted that, compared with a terminal, a network device has larger storage space and stronger computing power; therefore, the images stored in the database of the network device are richer. In this way, having the network device determine the target reference posture and the position and size of the target reference posture in the second preview image can achieve a better photographing effect.
S105:第一终端显示当前拍摄场景下的第二预览图像,并在第二预览图像中显示目标参考姿势。其中,目标参考姿势在第二预览图像中的位置和大小可以分别是S104中所确定的位置和大小。第二预览图像包括被拍摄者在第二姿势下的第二人像。S105: The first terminal displays the second preview image in the current shooting scene, and displays the target reference posture in the second preview image. Wherein, the position and size of the target reference posture in the second preview image may be the position and size determined in S104, respectively. The second preview image includes a second portrait of the subject in the second posture.
第一终端在显示屏上显示当前拍摄场景下的第二预览图像。第二姿势是第二预览图像中该被拍摄者的当前姿势,第二人像是该当前姿势下该被拍摄者的图像。关于第二预览图像的其他解释可以参考上文,此处不再赘述。The first terminal displays the second preview image in the current shooting scene on the display screen. The second posture is the current posture of the subject in the second preview image, and the second portrait is the image of the subject in the current posture. For other explanations about the second preview image, please refer to the above, which will not be repeated here.
第一姿势和第二姿势是同一拍摄场景中的同一被拍摄者在不同时刻的姿势。可选的,第一姿势与第二姿势不同。The first posture and the second posture are the postures of the same subject in the same shooting scene at different moments. Optionally, the first posture is different from the second posture.
可选的,第一终端可以在执行S103之后,且执行S106之前,第一终端所显示的每帧第二预览图像中显示目标参考姿势。Optionally, the first terminal may display the target reference posture in each frame of the second preview image displayed by the first terminal after performing S103 and before performing S106.
在本申请一些实施例中,认为目标参考姿势不是第二预览图像中的一部分(或者说不是第二预览图像的组成部分),而是在第二预览图像的上层显示的图像。下文中的其他特征也是基于此进行描述的。需要说明的是,如果认为目标参考姿势是第二预览图像中的一部分,则下述S106中“基于第二预览图像生成目标图像”具体可以包括:基于不包含目标参考姿势的第二预览图像,生成目标图像。In some embodiments of the present application, the target reference posture is considered not to be a part (or a component) of the second preview image, but an image displayed on an upper layer of the second preview image. The other features below are also described on this basis. It should be noted that, if the target reference posture is considered to be a part of the second preview image, then "generating the target image based on the second preview image" in S106 below may specifically include: generating the target image based on the second preview image that does not contain the target reference posture.
本申请实施例对以何种方式在第二预览图像中显示目标参考姿势不进行限定,例如,可以以人体骨架或人体轮廓等方式显示目标参考姿势。如图4中的a图所示,为一种以人体骨架方式显示目标参考姿势的示意图,其中,人体骨架中的点,可以是人体的特定关节等。图4中的b图为一种以人体轮廓方式显示目标参考姿势的示意图。其中,人体轮廓可以以简笔画的方式呈现。The embodiment of the present application does not limit the manner in which the target reference posture is displayed in the second preview image. For example, the target reference posture may be displayed in the form of a human skeleton or a human body contour. Diagram a in FIG. 4 is a schematic diagram of displaying the target reference posture in the form of a human skeleton, where the points in the human skeleton may be specific joints of the human body. Diagram b in FIG. 4 is a schematic diagram of displaying the target reference posture in the form of a human body contour. The human body contour may be presented in the form of simple strokes.
可选的,如图7所示,在S105之前或之后或同时,该方法还可以包括以下步骤1~2:Optionally, as shown in FIG. 7, before or after or at the same time as S105, the method may further include the following steps 1 to 2:
步骤1:第一终端向第二终端发送目标参考姿势的信息和第二预览图像的信息,以指示第二终端显示第二预览图像,并在第二预览图像中显示目标参考姿势。Step 1: The first terminal sends the target reference posture information and the second preview image information to the second terminal to instruct the second terminal to display the second preview image, and display the target reference posture in the second preview image.
步骤2:第二终端基于接收到的信息,显示第二预览图像,并在第二预览图像中显示目标参考姿势。Step 2: The second terminal displays a second preview image based on the received information, and displays the target reference posture in the second preview image.
可以理解的是,第一终端的显示屏上显示的内容是拍摄者能够看到的,通常被拍摄者不能看到。此处,第二终端可以是被拍摄者所使用的终端,或者说,第二终端的显示屏上显示的内容是能够被拍摄者看到的终端。本申请实施例对第一终端与第二终端之间的连接方式不进行限定,例如,可以是蓝牙连接等。It is understandable that the content displayed on the display screen of the first terminal can be seen by the photographer, but usually cannot be seen by the person being photographed. Here, the second terminal may be a terminal used by the person being photographed; in other words, the second terminal is a terminal whose displayed content can be seen by the person being photographed. The embodiment of the present application does not limit the connection mode between the first terminal and the second terminal; for example, it may be a Bluetooth connection.
该技术方案可以描述为:将拍摄者所使用的终端上显示的信息,同步到被拍摄者所使用的终端上。这样,对于被拍摄者而言,可以通过第二终端上显示的内容,看到第二预览图像和目标参考姿势,从而更方便进行姿势调整,从而使得拍摄效果更佳。而不需要像现有技术一样,仅凭拍摄者与被拍摄者之间通过沟通来引导被拍摄者调整姿势。The technical solution can be described as: synchronizing the information displayed on the terminal used by the photographer to the terminal used by the person being photographed. In this way, the person being photographed can see the second preview image and the target reference posture through the content displayed on the second terminal, which makes posture adjustment more convenient and the photographing effect better. There is no need, as in the prior art, to rely solely on communication between the photographer and the person being photographed to guide the latter to adjust the posture.
S106:如果第二姿势与目标参考姿势匹配,则第一终端基于第二预览图像生成目标图像。后续,第一终端可以保存目标图像。S106: If the second posture matches the target reference posture, the first terminal generates the target image based on the second preview image. Subsequently, the first terminal may save the target image.
目标图像可以是第一设备对当前拍摄场景进行拍摄得到的图像。换句话说,目标图像是第一终端需要保存的图像。作为对比,上述第一预览图像和第二预览图像,是第一终端不需要保存的图像。当然具体实现时,不限于此。The target image may be an image obtained by the first device shooting the current shooting scene. In other words, the target image is the image that the first terminal needs to save. In comparison, the above-mentioned first preview image and second preview image are images that the first terminal does not need to save. Of course, the specific implementation is not limited to this.
在被拍摄者调整姿势的过程中,第一终端可以实时获取第二预览图像,并识别第二预览图像中的该被拍摄者的姿势(标记为第二姿势),然后判断第二姿势与目标参考姿势是否匹配。如果第二姿势与目标参考姿势匹配,则基于第二预览图像确定目标图像。可选的,如果第二姿势与目标参考姿势不匹配,被拍摄者可以继续调整姿势,第一终端可以继续采集第二预览图像,直至采集到的第二预览图像中的第二姿势与目标参考姿势匹配为止。While the person being photographed adjusts the posture, the first terminal can obtain the second preview image in real time, recognize the posture of the person being photographed in the second preview image (denoted as the second posture), and then determine whether the second posture matches the target reference posture. If the second posture matches the target reference posture, the target image is determined based on the second preview image. Optionally, if the second posture does not match the target reference posture, the person being photographed can continue to adjust the posture, and the first terminal can continue to collect second preview images until the second posture in a collected second preview image matches the target reference posture.
基于第二预览图像确定目标图像,可以包括:将第二预览图像直接作为目标图像;或者,对第二预览图像进行处理(如增强、降噪等)得到目标图像。Determining the target image based on the second preview image may include: directly using the second preview image as the target image; or processing the second preview image (such as enhancement, noise reduction, etc.) to obtain the target image.
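The match-then-capture loop described above can be sketched as follows, with `estimate_pose` and `matches_reference` as illustrative hooks standing in for the pose-recognition and similarity-judgment steps; neither name comes from the embodiment.

```python
def capture_when_matched(preview_stream, estimate_pose, matches_reference):
    """Keep sampling second preview images until the subject's current (second)
    pose matches the target reference pose, then return that frame so the
    target image can be generated from it (directly, or after processing
    such as enhancement or noise reduction)."""
    for frame in preview_stream:
        second_pose = estimate_pose(frame)   # recognize the second posture in this frame
        if matches_reference(second_pose):
            return frame                     # generate the target image from this frame
    return None                              # stream ended without a match

# Toy usage: frames are integers, the "pose" is the frame itself, match on 3.
print(capture_when_matched(iter([1, 2, 3, 4]), lambda f: f, lambda p: p == 3))  # → 3
```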
可选的,基于上述S103中的描述可知,当前拍摄场景的场景类别可能包括多种。基于此:Optionally, based on the description in S103, the scene category of the current shooting scene may include multiple categories. Based on this:
在S104中,基于当前拍摄场景的每种场景类别可以确定一个目标参考姿势。In S104, a target reference posture can be determined based on each scene category of the current shooting scene.
在S105中,第一终端可以在第二预览图像中显示所确定的每个目标参考姿势。其中,不同目标参考姿势可以以相同或不同的方式进行显示,例如,显示不同颜色的人体轮廓,以显示不同的目标参考姿势等。In S105, the first terminal may display each determined target reference posture in the second preview image. Among them, different target reference postures can be displayed in the same or different manners, for example, human body contours of different colors are displayed to display different target reference postures, and so on.
基于此:在一种实现方式中,在S106中,第一终端可以在确定第二姿势与多个目标参考姿势中的任意一种目标参考姿势匹配时,基于第二预览图像生成目标图像。在另一种实现方式中,第一终端可以在执行S105之后,接收用户指示的操作,响应于该操作,以在第二预览图像中显示一个目标参考姿势。也就是说,由用户从S105中所显示的多个目标参考姿势中选择一个目标参考姿势进行显示。该情况下,执行S106时,第一终端使用第二姿势与用户所选择出的目标参考姿势进行匹配即可。其中,这里的“用户”可以是拍摄者,或者被拍摄者。Based on this: In an implementation manner, in S106, the first terminal may generate the target image based on the second preview image when determining that the second posture matches any one of the multiple target reference postures. In another implementation manner, after performing S105, the first terminal may receive an operation instructed by the user and, in response to the operation, display one target reference posture in the second preview image. That is, the user selects one target reference posture for display from the multiple target reference postures displayed in S105. In this case, when S106 is executed, the first terminal only needs to match the second posture against the target reference posture selected by the user. The "user" here can be the photographer or the person being photographed.
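The first implementation above ("a match against any of the displayed target reference postures is enough") can be sketched as a one-liner; `is_match` is an illustrative hook for the similarity test of S106.

```python
def matches_any_reference(second_pose, target_reference_poses, is_match):
    """True when the second pose matches any one of the multiple target
    reference poses (one per scene category of the current shooting scene)."""
    return any(is_match(second_pose, ref) for ref in target_reference_poses)

# Toy usage: poses represented as strings, match by equality.
refs = ["sit_v1", "stand_v2"]
print(matches_any_reference("sit_v1", refs, lambda a, b: a == b))  # → True
```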
可选的,如果第二姿势与目标参考姿势之间的相似度大于等于第四阈值,则确定第二姿势与目标参考姿势匹配。本申请实施例对如何确定第二姿势与目标参考姿势之间的相似度不进行限定,例如,可以通过以下方式一或方式二实现:Optionally, if the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold, it is determined that the second posture matches the target reference posture. The embodiment of the present application does not limit how to determine the similarity between the second posture and the target reference posture. For example, it can be implemented in the following manner one or two:
方式一:method one:
步骤A:计算第一向量和第二向量;其中,第一向量是第二人像中的关键点相对角度信息构成的向量,用于表征第二姿势。第二向量是目标参考姿势下的人像中的关键点相对角度信息构成的向量,用于表征目标参考姿势。Step A: Calculate the first vector and the second vector; where the first vector is a vector formed by the relative angle information of the key points in the second portrait, and is used to represent the second posture. The second vector is a vector formed by the relative angle information of key points in the portrait in the target reference posture, and is used to characterize the target reference posture.
关键点是用于表征人体姿势的点,例如可以是人体骨骼关键点,如关节等。如图8所示,为可适用于本申请实施例的一种人体关键点的示意图。图8中所示的关键点包括:下巴、锁骨中心、肩部、肘部、手部、胯骨、膝关节、脚踝等。The key point is a point used to characterize the posture of the human body, for example, it may be a key point of a human bone, such as a joint. As shown in FIG. 8, it is a schematic diagram of a human body key point applicable to the embodiment of the present application. The key points shown in Figure 8 include: chin, clavicle center, shoulders, elbows, hands, hip bones, knee joints, ankles, etc.
关键点的相对角度信息,具体为:人体上具有连接关系的关键点之间的相对角度的信息。例如,以关键点是左腿膝关节为例,关键点的相对角度信息可以是“左腿膝关节与左脚脚踝所在直线(即左侧小腿)”与“左腿膝关节与左边胯骨所在直线(即左侧大腿)”之间的夹角的信息。又如,以关键点是左手臂的肘部为例,关键点的相对角度信息可以是“左手臂的肘部与左肩所在直线”与“左手臂的肘部与左手所在直线”之间的夹角的信息。The relative angle information of a key point is specifically: information about the relative angle between key points that have a connection relationship on the human body. For example, taking the left knee joint as the key point, the relative angle information of the key point may be information about the included angle between "the line on which the left knee joint and the left ankle lie (that is, the left calf)" and "the line on which the left knee joint and the left hip bone lie (that is, the left thigh)". For another example, taking the elbow of the left arm as the key point, the relative angle information of the key point may be information about the included angle between "the line on which the elbow of the left arm and the left shoulder lie" and "the line on which the elbow of the left arm and the left hand lie".
可以理解的是,对于某些关键点(具体是在某一方向上的最后一个关键点)来说,可能没有相对角度信息,例如,以图8为例,如果关键点是左手,则其没有相对角度信息。此仅为示例,其不对本申请实施例所适用的关键点的相对角度信息的确定构成限定。It is understandable that some key points (specifically, the last key point in a certain direction) may have no relative angle information. For example, taking FIG. 8 as an example, if the key point is the left hand, it has no relative angle information. This is only an example, and it does not limit the determination of the relative angle information of the key points applicable to the embodiments of the present application.
本申请实施例对表征人体姿势的关键点具体是哪些,以及计算哪些关键点的相对角度信息均不进行限定,例如,表征人体姿势的关键点的确定方法可以参考现有技术。可以理解的是,人体姿势的关键点以及需要计算哪些关键点的相对角度信息均可以是预定义的。这些信息确定之后,可以基于现有技术中的角度计算方式,确定出这些关键点的相对角度信息。The embodiments of the present application do not limit the specific key points that characterize the human body posture, and the relative angle information of which key points are calculated. For example, the method for determining the key points that characterize the human body posture can refer to the prior art. It is understandable that the key points of the human body posture and the relative angle information of which key points need to be calculated can be predefined. After the information is determined, the relative angle information of these key points can be determined based on the angle calculation method in the prior art.
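As a concrete sketch of one item of relative angle information, the included angle at a joint can be computed from three 2-D keypoint coordinates. The 2-D pixel-coordinate convention is an assumption made for this example; the embodiment does not specify the coordinate system.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint `b` between segments b→a and b→c,
    e.g. the knee angle between the thigh (knee→hip) and the calf
    (knee→ankle). Each point is an (x, y) pixel coordinate."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp against rounding error
    return math.degrees(math.acos(cos_t))

# A right-angle knee: hip directly above the knee, ankle to the knee's right.
print(joint_angle((0, -1), (0, 0), (1, 0)))  # → 90.0
```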
可选的,第一向量与第二向量的元素个数相同,且这两个向量中同一位置的元素分别表示人体中同一个关键点的相对角度信息。例如,第一向量为[A1,A2,A3,A4];第二向量为[B1,B2,B3,B4]。其中,A1和B1分别表示第二姿势和目标参考姿势下的人体左肩的相对角度信息,A2和B2分别表示第二姿势和目标参考姿势下的人体右肩的相对角度信息。其他元素的含义与此类似,不再一一说明。Optionally, the first vector and the second vector have the same number of elements, and the elements at the same position in the two vectors respectively represent the relative angle information of the same key point of the human body. For example, the first vector is [A1, A2, A3, A4] and the second vector is [B1, B2, B3, B4], where A1 and B1 respectively represent the relative angle information of the left shoulder in the second posture and in the target reference posture, and A2 and B2 respectively represent the relative angle information of the right shoulder in the second posture and in the target reference posture. The meanings of the other elements are similar and are not described one by one.
可以理解的是,由于关键点的相对角度信息可以度量人体的具体姿势,比如大腿和小腿角度成90度时,那么膝盖是一个弯曲的状态。因此,基于人体多个具有连接关系的关键点之间的相对角度信息可以度量人体的整体姿势。基于此,方式一的基本原理为:将对人体的整体姿势的相似性进行度量,分解为:对人体的关键点的具体姿势的相似性进行度量。It is understandable that the relative angle information of a key point can measure a specific posture of the human body; for example, when the angle between the thigh and the calf is 90 degrees, the knee is in a bent state. Therefore, the overall posture of the human body can be measured based on the relative angle information between multiple connected key points of the human body. Based on this, the basic principle of manner one is to decompose the measurement of the similarity of the overall human posture into the measurement of the similarity of the specific postures at the key points of the human body.
步骤B:计算第一向量与第二向量之间的距离。例如,计算第一向量和第二向量之间的欧式距离等。Step B: Calculate the distance between the first vector and the second vector. For example, calculating the Euclidean distance between the first vector and the second vector, etc.
步骤C:如果第一向量与第二向量之间的距离小于等于第五阈值,则确定第二姿势与目标参考姿势的相似度大于等于第四阈值。Step C: If the distance between the first vector and the second vector is less than or equal to the fifth threshold, it is determined that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
第一向量与第二向量之间的距离越小,第二姿势与目标参考姿势之间的相似度越大。The smaller the distance between the first vector and the second vector, the greater the similarity between the second posture and the target reference posture.
其中,第五阈值是预定义的,用于表征参考姿势与第二姿势的相似度是第四阈值时,第一向量与第二向量之间的距离。The fifth threshold is predefined, and is used to characterize the distance between the first vector and the second vector when the similarity between the reference posture and the second posture is the fourth threshold.
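Steps A to C of manner one can be sketched as follows. The angle values and the fifth-threshold default of 20.0 are arbitrary illustrations chosen for the example, not values from the embodiment.

```python
import math

def pose_distance(first_vector, second_vector):
    """Step B: Euclidean distance between two pose vectors. Each vector holds
    relative-angle values for the same keypoints in the same order,
    e.g. [left shoulder, right shoulder, left knee, ...]."""
    assert len(first_vector) == len(second_vector)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first_vector, second_vector)))

def poses_match(first_vector, second_vector, fifth_threshold=20.0):
    """Step C: the similarity is at least the fourth threshold exactly when the
    distance is at most the fifth threshold (smaller distance = higher similarity)."""
    return pose_distance(first_vector, second_vector) <= fifth_threshold

second = [92.0, 88.0, 170.0, 95.0]   # A1..A4: angles in the second pose
target = [90.0, 90.0, 175.0, 90.0]   # B1..B4: angles in the target reference pose
print(poses_match(second, target))   # → True  (distance ≈ 7.6 ≤ 20.0)
```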
方式二:Way two:
将第二姿势和目标参考姿势输入神经网络,得到第二姿势与目标参考姿势之间的相似度;其中,神经网络用于表征输入的多种姿势之间的相似度。The second posture and the target reference posture are input to the neural network to obtain the similarity between the second posture and the target reference posture; wherein the neural network is used to characterize the similarity between the input multiple postures.
上述方式一中是基于常规的方法计算姿势之间的相似程度。方式二是基于神经网络例如卷积神经网络(convolutional neural network,CNN)计算姿势之间的相似度的。具体实现时,第一终端中可以预存神经网络模型,该神经网络模型可以是基于多组训练数据训练得到的,其中,一组训练数据包括具有不同姿势的两个图像(该图像可以是摄像头采集的图像,也可以是对摄像头采集的图像进行处理后得到的图像),以及这两个图像中的人体姿势之间的相似程度。针对多组训练数据进行训练,可以获得神经网络模型。基于方式一中人体具体姿势与整体姿势的度量的关系的原理介绍,在一个示例中,对训练数据进行训练的过程,可以认为是神经网络模型学习关键点的相似性度量关系(即学习获得用于表征姿势的向量)的过程。The foregoing manner one calculates the similarity between postures based on a conventional method. Manner two calculates the similarity between postures based on a neural network, for example, a convolutional neural network (CNN). In specific implementation, a neural network model may be pre-stored in the first terminal. The neural network model may be obtained by training on multiple sets of training data, where one set of training data includes two images with different postures (an image may be an image collected by a camera, or an image obtained by processing an image collected by a camera) and the degree of similarity between the human postures in the two images. By training on multiple sets of training data, a neural network model can be obtained. Based on the principle, introduced in manner one, of the relationship between the specific postures at key points and the measurement of the overall posture, in an example, the process of training on the training data can be regarded as the neural network model learning the similarity metric over key points (that is, learning to obtain the vectors used to characterize postures).
在一个示例中,第一终端中预存的神经网络模型是可以更新的。例如,以本申请实施例提供的方法由安装在第一终端中的一个应用执行的为例,该神经网络模型可以由该应用的更新(如版本的更新)等进行更新。当然本申请实施例不限于此。In an example, the neural network model pre-stored in the first terminal can be updated. For example, taking the method provided in the embodiment of the present application executed by an application installed in the first terminal as an example, the neural network model may be updated by an update of the application (such as a version update). Of course, the embodiments of the present application are not limited to this.
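A minimal sketch of the manner-two interface: a stand-in linear "embedding" replaces the trained CNN, and similarity is taken between embeddings. The embedding function, the identity weights, and the cosine measure are assumptions for illustration only; the embodiment does not specify the network architecture or the similarity measure used internally.

```python
import math

def cosine_similarity(u, v):
    """Similarity in [-1, 1] between two pose embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def embed(pose_vector, weights):
    """Stand-in for the learned network: a single linear layer mapping a pose
    vector to an embedding. A trained CNN would take this function's place."""
    return [sum(w * x for w, x in zip(row, pose_vector)) for row in weights]

W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely illustrative
sim = cosine_similarity(embed([1.0, 0.0], W), embed([0.9, 0.1], W))
print(sim > 0.95)  # → True
```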
上述方式一和方式二仅为示例,其不对可适用于本申请实施例的计算两个人体姿势之间的相似度的计算方法构成限定。The above-mentioned method 1 and method 2 are only examples, which do not constitute a limitation on the calculation method applicable to the embodiment of the present application for calculating the similarity between two human postures.
在S106中,如果第二姿势与目标参考姿势匹配,则第一终端基于第二预览图像生成目标图像。具体的:In S106, if the second posture matches the target reference posture, the first terminal generates the target image based on the second preview image. Specifically:
在一种实现方式中,第一终端可以在确定第二姿势与目标参考姿势匹配的情况下,自动基于第二预览图像生成目标图像。也就是说,第一终端自主拍摄,或者是抓拍。该过程不需要用户参与,因此与用户的交互性更好,更智能,这有助于提高用户体验。In an implementation manner, the first terminal may automatically generate the target image based on the second preview image when it is determined that the second posture matches the target reference posture. In other words, the first terminal takes pictures autonomously, or takes a snapshot. This process does not require user involvement, so the interaction with the user is better and smarter, which helps to improve the user experience.
在另一种实现方式中,如图9所示,上述S106可以包括:In another implementation manner, as shown in FIG. 9, the foregoing S106 may include:
S106A:第一终端在确定第二姿势与目标参考姿势匹配的情况下,输出提示信息,该提示信息用于提示第二姿势与目标参考姿势匹配。S106A: When it is determined that the second posture matches the target reference posture, the first terminal outputs prompt information, where the prompt information is used to prompt that the second posture matches the target reference posture.
S106B:第一终端接收第一操作。S106B: The first terminal receives the first operation.
第一操作可以是语音操作,或者触屏操作等。例如,以特定触摸方式触摸显示屏上的虚拟控件的方式,按压第一终端上特定实体控件的方式等。The first operation can be a voice operation, or a touch screen operation. For example, a method of touching a virtual control on the display screen in a specific touch mode, a method of pressing a specific physical control on the first terminal, and so on.
S106C:第一终端响应于第一操作,基于第二预览图像生成目标图像。S106C: In response to the first operation, the first terminal generates a target image based on the second preview image.
也就是说,在用户的指示下拍照。需要说明的是,该实现方式中,虽然是在用户的指示下进行拍照,但是,向用户输出提示信息,是第一终端自主确定第二姿势与目标参考姿势匹配的情况下,输出的。该过程并不需要用户来判断,因此有助于提高用户体验。That is, the photo is taken under the user's instruction. It should be noted that, in this implementation manner, although the photo is taken under the user's instruction, the prompt information is output after the first terminal autonomously determines that the second posture matches the target reference posture. This process does not require the user to make the judgment, and therefore helps to improve the user experience.
这里的提示信息,可以是语音提示信息,文字提示信息,图案提示信息,界面上某个控件特殊标记(如闪烁或变亮)等任一种提示信息,或者任意多种提示信息的组合,本申请实施例对此不进行限定。The prompt information here can be any prompt information such as voice prompt information, text prompt information, pattern prompt information, a special mark of a control on the interface (such as flashing or brightening), or any combination of various prompt information. The application embodiment does not limit this.
本申请实施例提供的图像处理方法中,第一终端自动确定当前拍摄场景,并自动基于当前拍摄场景推荐目标参考姿势,以指示(或引导)被拍摄者调整姿势。也就是说,本申请实施例提供了一种融合场景信息的智能姿势引导/推荐方法,并且整个推荐姿势的过程不需要用户参与,因此交互性更好,且更智能化,从而能够提高用户的体验。In the image processing method provided by the embodiment of the present application, the first terminal automatically determines the current shooting scene and automatically recommends the target reference posture based on the current shooting scene, so as to instruct (or guide) the person being photographed to adjust the posture. That is to say, the embodiment of the present application provides an intelligent posture guidance/recommendation method that integrates scene information, and the entire posture recommendation process does not require user participation; therefore, the interaction is better and more intelligent, which can improve the user experience.
以下,结合上文中描述的方法,说明本申请实施例提供的技术方案的一个实际应用场景。Hereinafter, in combination with the method described above, an actual application scenario of the technical solution provided by the embodiment of the present application is explained.
如图10所示,为本申请实施例提供的一种拍照方法的流程示意图。图10所示的方法可以包括以下步骤:As shown in FIG. 10, it is a schematic flowchart of a photographing method provided by an embodiment of this application. The method shown in FIG. 10 may include the following steps:
S201:用户(可以是拍摄者或被拍摄者等任一用户)向第一终端发出第二操作。第二操作用于第一终端启动相机应用。第二操作可以是用户发出的触屏操作或语音操作等。S201: The user (which may be any user such as the photographer or the photographed person) sends a second operation to the first terminal. The second operation is for the first terminal to start the camera application. The second operation may be a touch screen operation or a voice operation issued by the user.
S202:第一终端接收第二操作。响应于第二操作,第一终端启动相机应用。S202: The first terminal receives the second operation. In response to the second operation, the first terminal launches the camera application.
S203:第一终端在显示屏上显示相机应用的目标用户界面。目标用户界面上包含“姿势推荐模式”控件。触发姿势推荐模式能够使第一终端执行本申请实施例提供的图像处理方法。S203: The first terminal displays the target user interface of the camera application on the display screen. The target user interface contains a "posture recommendation mode" control. The trigger gesture recommendation mode can enable the first terminal to execute the image processing method provided in the embodiment of the present application.
其中,目标用户界面,可以是相机应用启动后的首个用户界面,也可以是相机应用启动后的非首个用户界面。例如,在相机应用启动后,且显示该用户界面之前,用户可以选择是否打开闪光等,从而使得目标用户界面并非启动后的首个用户界面。The target user interface may be the first user interface after the camera application is started, or it may be the non-first user interface after the camera application is started. For example, after the camera application is started and before the user interface is displayed, the user can choose whether to turn on the flash, etc., so that the target user interface is not the first user interface after startup.
S204:用户(可以是拍摄者或被拍摄者等任一用户)向第一终端发出第三操作。第三操作是作用于姿势推荐模式控件。第三操作可以是用户发出的触屏操作等。S204: The user (which may be any user such as the photographer or the photographed person) sends a third operation to the first terminal. The third operation is to act on the gesture recommendation mode control. The third operation may be a touch screen operation issued by the user.
S205:第一终端接收第三操作。响应于第三操作,第一终端进入姿势推荐模式。接着执行以下S206。S205: The first terminal receives the third operation. In response to the third operation, the first terminal enters a gesture recommendation mode. Then, the following S206 is executed.
针对上述S203~S205,可替换地,第一终端可以不在第一终端上显示上述目标用户界面(即包含姿势推荐模式控件的目标用户界面),而是由第一终端在启动相机应用之后,自动进入姿势推荐模式,接着执行以下S206。Regarding the foregoing S203 to S205, alternatively, the first terminal may not display the foregoing target user interface (that is, the target user interface containing the posture recommendation mode control); instead, after starting the camera application, the first terminal automatically enters the posture recommendation mode, and then performs the following S206.
S206:第一终端执行上述步骤S101~S105。S206: The first terminal executes the foregoing steps S101 to S105.
此步骤结束之后,第一终端上显示有第二预览图像,且第二预览图像中显示有目标参考姿势。其中,第二预览图像中被拍摄者的姿势是第二姿势。可以理解的,第一终端上会实时地采集当前拍摄场景的实际图像,并基于实际图像生成一帧一帧的第二预览图像并显示,从而形成显示预览图像流的效果,且一帧或多帧(如每帧)第二预览图像中均显示有目标参考姿势。After this step is completed, the second preview image is displayed on the first terminal, and the target reference posture is displayed in the second preview image. The posture of the person being photographed in the second preview image is the second posture. It is understandable that the first terminal collects actual images of the current shooting scene in real time and, based on the actual images, generates and displays second preview images frame by frame, thereby presenting the effect of a preview image stream, and the target reference posture is displayed in one or more frames (for example, every frame) of the second preview image.
S207:被拍摄者基于第二预览图像中显示的目标参考姿势调整当前姿势。S207: The subject adjusts the current posture based on the target reference posture displayed in the second preview image.
在一种实现方式中,拍摄者基于第一终端上显示的第二预览图像,以及第二预览图像中显示的目标参考姿势,指导被拍摄者调整当前姿势。In an implementation manner, the photographer instructs the photographer to adjust the current posture based on the second preview image displayed on the first terminal and the target reference posture displayed in the second preview image.
在另一种实现方式中,第一终端可以基于上述步骤1~步骤2,将第二预览图像和目标参考姿势显示在第二终端的显示屏上。被拍摄者通过查看第二终端的显示屏上显示的第二预览图像,以及第二预览图像中显示的目标参考姿势,调整当前姿势。In another implementation manner, the first terminal may display the second preview image and the target reference posture on the display screen of the second terminal based on the above steps 1 to 2. The photographer adjusts the current posture by viewing the second preview image displayed on the display screen of the second terminal and the target reference posture displayed in the second preview image.
S208:如果目标参考姿势与第二姿势匹配,则第一终端基于第二预览图像生成目标图像。后续,第一终端可以保存目标图像。S208: If the target reference posture matches the second posture, the first terminal generates the target image based on the second preview image. Subsequently, the first terminal may save the target image.
该步骤中的第二预览图像可以是S207中的任意一帧第二预览图像,相应的,第二姿势是该第二预览图像中显示的被拍摄者的姿势。The second preview image in this step may be any frame of the second preview image in S207, and correspondingly, the second posture is the posture of the subject displayed in the second preview image.
关于S208的具体实现方式可以参考上文,此处不再赘述。For the specific implementation of S208, please refer to the above, which will not be repeated here.
如图11所示,为本申请实施例提供的一种拍照效果对比示意图。其中,图11中的a图表示第一预览图像,该照片效果一般。图11中的b图表示基于“满足第二姿势与目标参考姿势匹配的第二预览图像”得到的目标图像。显然,通常情况下,用户会认为相比第一预览图像,目标图像中的人体姿势更优美更自然。As shown in FIG. 11, it is a schematic diagram of a comparison of photographing effects provided by an embodiment of this application. Wherein, the diagram a in FIG. 11 represents the first preview image, and the effect of the photo is average. Diagram b in FIG. 11 represents a target image obtained based on "a second preview image that satisfies the second posture to match the target reference posture". Obviously, under normal circumstances, the user will think that the posture of the human body in the target image is more graceful and natural compared to the first preview image.
可以理解的是,为了实现上述实施例中功能,终端包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的单元及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。It can be understood that, in order to implement the functions in the foregoing embodiments, the terminal includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should easily realize that, in combination with the units and method steps of the examples described in the embodiments disclosed in the present application, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application scenarios and design constraints of the technical solution.
图12包含本申请的实施例提供的可能的图像处理装置的结构示意图。这些图像处理装置可以用于实现上述方法实施例中终端的功能,因此也能实现上述方法实施例所具备的有益效果。在本申请的实施例中,该图像处理装置可以是如图1所示的终端100,还可以是应用于终端的模块(如芯片)。下文中以该图像处理装置是终端11为例进行说明。FIG. 12 contains a schematic structural diagram of a possible image processing device provided by an embodiment of the present application. These image processing apparatuses can be used to implement the functions of the terminal in the foregoing method embodiments, and therefore can also achieve the beneficial effects of the foregoing method embodiments. In the embodiment of the present application, the image processing apparatus may be the terminal 100 as shown in FIG. 1, or may be a module (such as a chip) applied to the terminal. In the following, the image processing apparatus is the terminal 11 as an example for description.
终端11包括:显示单元111、确定单元112和生成单元113。显示单元111,用于显示当前拍摄场景的第一预览图像,第一预览图像包括被拍摄者在第一姿势下的第一人像。确定单元112,用于对第一预览图像进行识别,以确定当前拍摄场景的场景类别。显示单元111,还用于显示当前拍摄场景下的第二预览图像,并在第二预览图像中显示目标参考姿势;目标参考姿势至少是基于当前拍摄场景的场景类别得到的;其中,第二预览图像包括被拍摄者在第二姿势下的第二人像。生成单元113,用于如果第二姿势与目标参考姿势匹配,则根据第二预览图像生成目标图像。例如,结合图3,显示单元111可以用于执行S101和S105。确定单元112可以用于执行S102。生成单元113可以用于执行S106。The terminal 11 includes: a display unit 111, a determination unit 112, and a generation unit 113. The display unit 111 is configured to display a first preview image of a current shooting scene, and the first preview image includes a first portrait of the photographed person in a first posture. The determining unit 112 is configured to recognize the first preview image to determine the scene category of the current shooting scene. The display unit 111 is further configured to display a second preview image in the current shooting scene, and display the target reference pose in the second preview image; the target reference pose is obtained at least based on the scene category of the current shooting scene; wherein, the second preview The image includes a second portrait of the subject in the second posture. The generating unit 113 is configured to generate a target image according to the second preview image if the second posture matches the target reference posture. For example, in conjunction with FIG. 3, the display unit 111 may be used to perform S101 and S105. The determining unit 112 may be used to perform S102. The generating unit 113 may be used to perform S106.
可选的,目标参考姿势与第一姿势满足如下至少一种条件:目标参考姿势与第一姿势不同;目标参考姿势在第二预览图像中的相对位置,与第一姿势在第一预览图像中的相对位置不同;或者,目标参考姿势在第二预览图像中所占的大小,与第一姿势在第一预览图像中所占的大小不同。Optionally, the target reference posture and the first posture meet at least one of the following conditions: the target reference posture is different from the first posture; the relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image; or, the size of the target reference posture in the second preview image is different from the size of the first posture in the first preview image.
可选的,当前拍摄场景的场景类别包括以下类别中的至少一项:草地场景、台阶场景、海边场景、夕阳场景、马路场景、或塔场景。Optionally, the scene category of the current shooting scene includes at least one of the following categories: grass scene, step scene, seaside scene, sunset scene, road scene, or tower scene.
可选的,目标参考姿势的姿势类别是基于第一姿势的姿势类别得到的;其中,姿势类别包括坐姿、站姿或卧姿。Optionally, the posture category of the target reference posture is obtained based on the posture category of the first posture; wherein the posture category includes a sitting posture, a standing posture, or a lying posture.
可选的,目标参考姿势是与当前拍摄场景的类别对应的多个参考姿势中的,与第一姿势之间的相似度大于等于第一阈值的参考姿势。Optionally, the target reference posture is a reference posture whose similarity with the first posture is greater than or equal to a first threshold among multiple reference postures corresponding to the category of the current shooting scene.
可选的,目标参考姿势是与当前拍摄场景的类别对应的多个参考姿势中的,与第一姿势之间的相似度最高的参考姿势。Optionally, the target reference pose is the reference pose with the highest similarity to the first pose among multiple reference poses corresponding to the category of the current shooting scene.
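The selection step described above can be sketched as an argmax over the candidate reference postures of the current scene category. The pose representation (plain angle vectors) and the `toy_similarity` function below are illustrative assumptions, not part of this application; any similarity measure over poses could be plugged in:

```python
def choose_target_reference(first_pose, reference_poses, similarity):
    """Among the reference poses of the current scene category, return the
    one with the highest similarity to the photographed person's first pose."""
    return max(reference_poses, key=lambda ref: similarity(first_pose, ref))

# Toy demonstration: a pose is a vector of joint angles, and similarity is
# a negated Euclidean distance (higher means more alike).
def toy_similarity(a, b):
    return -sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

first = [0.5, 1.2, 0.9]
candidates = [[1.5, 0.2, 0.1], [0.6, 1.1, 1.0], [2.0, 2.0, 2.0]]
best = choose_target_reference(first, candidates, toy_similarity)  # → [0.6, 1.1, 1.0]
```

A threshold-based variant (the "first threshold" of the preceding paragraph) would filter `reference_poses` by `similarity(first_pose, ref) >= threshold` before or instead of taking the maximum.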
可选的,目标参考姿势在第二预览图像中的位置,是基于第一预览图像中的第一预设对象在第一预览图像中的位置确定的。其中,目标参考姿势中的第一局部姿势与第一预设对象在同一图像中的位置之间具有第一关联关系,第一关联关系是预定义或者实时确定的。Optionally, the position of the target reference posture in the second preview image is determined based on the position of the first preset object in the first preview image in the first preview image. Wherein, the first local posture in the target reference posture and the position of the first preset object in the same image have a first association relationship, and the first association relationship is predefined or determined in real time.
可选的,目标参考姿势在第二预览图像中所占的大小,是基于第一预览图像中的第二预设对象在第一预览图像中所占的大小确定的。其中,目标参考姿势与第二预设对象在同一图像中的大小之间具有第二关联关系,第二关联关系是预定义或者实时确定的。Optionally, the size occupied by the target reference posture in the second preview image is determined based on the size occupied by the second preset object in the first preview image in the first preview image. There is a second association relationship between the target reference posture and the size of the second preset object in the same image, and the second association relationship is predefined or determined in real time.
可选的,显示单元111具体用于,在第二预览图像中以人体骨架或人体轮廓显示目标参考姿势。例如,结合图4,显示单元111可以显示图4所示的目标参考姿势。Optionally, the display unit 111 is specifically configured to display the target reference posture in the second preview image with a human skeleton or a human contour. For example, in conjunction with FIG. 4, the display unit 111 may display the target reference posture shown in FIG. 4.
可选的,目标参考姿势的信息是终端自身确定的,或者是终端从网络设备中获取的。Optionally, the target reference posture information is determined by the terminal itself, or the terminal obtains it from a network device.
可选的,显示单元111具体用于:如果当前拍摄场景的场景类别包括多种场景类别,则在第二预览图像中显示多个目标参考姿势;其中,场景类别与目标参考姿势一一对应。生成单元113具体用于:如果第二姿势与多个目标参考姿势中的任意一个目标参考姿势匹配,则根据第二预览图像生成目标图像。Optionally, the display unit 111 is specifically configured to: if the scene category of the current shooting scene includes multiple scene categories, display multiple target reference poses in the second preview image; wherein the scene categories are in one-to-one correspondence with the target reference poses. The generating unit 113 is specifically configured to generate a target image according to the second preview image if the second posture matches any one of the multiple target reference postures.
可选的,终端11还包括:发送单元114,用于向第二终端发送目标参考姿势的信息和第二预览图像的信息,以指示第二终端显示第二预览图像,并在第二预览图像中显示目标参考姿势。例如,结合图7,发送单元114可以用于执行步骤1。第二终端可以用于执行步骤2。Optionally, the terminal 11 further includes: a sending unit 114, configured to send information about the target reference posture and information about the second preview image to a second terminal, to instruct the second terminal to display the second preview image and display the target reference posture in the second preview image. For example, with reference to FIG. 7, the sending unit 114 may be used to perform step 1. The second terminal may be used to perform step 2.
可选的,显示单元111还用于,在第二预览图像中显示当前拍摄场景的类别信息。Optionally, the display unit 111 is further configured to display category information of the current shooting scene in the second preview image.
可选的,不同的场景类别通过不同预定义对象组来表征;如果第一预览图像包含一个预定义对象组,则当前拍摄场景的场景类别是预定义对象组所表征的场景类别;如果第一预览图像包含多个预定义对象组,则当前拍摄场景的场景类别是多个预定义对象组所表征的部分或全部场景类别。Optionally, different scene categories are characterized by different predefined object groups. If the first preview image contains one predefined object group, the scene category of the current shooting scene is the scene category represented by that predefined object group; if the first preview image contains multiple predefined object groups, the scene category of the current shooting scene is part or all of the scene categories represented by the multiple predefined object groups.
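The object-group scheme above can be sketched as a containment check over detector output. The group contents and object names below are hypothetical placeholders (the application does not enumerate them); the point is only that one fully present group yields one category, and several present groups yield several:

```python
# Hypothetical predefined object groups; each group characterizes one
# scene category. The actual groups are not specified in this application.
SCENE_OBJECT_GROUPS = {
    "grass scene": {"grass"},
    "seaside scene": {"sea", "sand"},
    "sunset scene": {"sun", "horizon"},
}

def scene_categories(detected_objects):
    """Return the scene categories of the current shooting scene: those whose
    predefined object group is fully present among the objects detected in the
    first preview image. With several groups present, several categories apply."""
    found = set(detected_objects)
    return sorted(cat for cat, group in SCENE_OBJECT_GROUPS.items()
                  if group <= found)
```

For example, detecting both `grass` and `{sea, sand}` would report the grass scene and the seaside scene together, matching the "part or all of the scene categories" wording.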
可选的,第一人像占第一预览图像的比例大于等于第二阈值;或者,第一人像的像素点的个数大于等于第三阈值。Optionally, the proportion of the first portrait in the first preview image is greater than or equal to the second threshold; or, the number of pixels of the first portrait is greater than or equal to the third threshold.
可选的,终端11还包括:输出单元115,用于如果第二姿势与目标参考姿势匹配,则输出提示信息,提示信息用于提示第二姿势与目标参考姿势匹配。接收单元116,用于接收第一操作。生成单元113具体用于,响应于第一操作,根据第二预览图像生成目标图像。例如,结合图9,输出单元115可以用于执行S106A,接收单元116可以用于执行S106B,生成单元113可以用于执行S106C。Optionally, the terminal 11 further includes: an output unit 115, configured to output prompt information if the second posture matches the target reference posture, where the prompt information is used to prompt that the second posture matches the target reference posture. The receiving unit 116 is configured to receive a first operation. The generating unit 113 is specifically configured to generate the target image according to the second preview image in response to the first operation. For example, with reference to FIG. 9, the output unit 115 may be used to perform S106A, the receiving unit 116 may be used to perform S106B, and the generating unit 113 may be used to perform S106C.
可选的,确定单元112还用于,如果第二姿势与目标参考姿势的相似度大于等于第四阈值,则确定第二姿势与目标参考姿势匹配。Optionally, the determining unit 112 is further configured to, if the similarity between the second posture and the target reference posture is greater than or equal to a fourth threshold, determine that the second posture matches the target reference posture.
可选的,终端11还包括:计算单元117。Optionally, the terminal 11 further includes: a calculation unit 117.
在一种实现方式中,计算单元117用于计算第一向量和第二向量;其中,第一向量是第二人像中的关键点相对角度信息构成的向量,用于表征第二姿势;第二向量是目标参考姿势下的人像中的关键点相对角度信息构成的向量,用于表征目标参考姿势。以及,计算第一向量与第二向量之间的距离。确定单元112还用于,如果距离小于等于第五阈值,则确定第二姿势与目标参考姿势的相似度大于等于第四阈值。In one implementation, the calculation unit 117 is configured to calculate a first vector and a second vector, where the first vector is a vector formed by relative angle information of key points in the second portrait and is used to characterize the second posture, and the second vector is a vector formed by relative angle information of key points in a portrait in the target reference posture and is used to characterize the target reference posture; and calculate a distance between the first vector and the second vector. The determining unit 112 is further configured to: if the distance is less than or equal to a fifth threshold, determine that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
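This angle-vector comparison can be sketched as follows. The key-point layout, the angle triplets, and the threshold value are illustrative assumptions, not part of this application; the sketch only shows why relative joint angles make the comparison invariant to the translation and scale of the portrait:

```python
import math

# Hypothetical key-point layout: a pose is a list of (x, y) joint coordinates.
# Each triplet names (end joint, middle joint, end joint); the "relative angle"
# is measured at the middle joint and does not change if the whole portrait is
# shifted or uniformly scaled.
ANGLE_TRIPLETS = [
    (0, 1, 2),  # e.g. shoulder-elbow-wrist (illustrative)
    (1, 2, 3),  # e.g. elbow-wrist-hand (illustrative)
]

def joint_angle(a, b, c):
    """Angle at joint b, in radians, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))

def angle_vector(keypoints):
    """First/second vector: the relative angle at every configured triplet."""
    return [joint_angle(keypoints[i], keypoints[j], keypoints[k])
            for i, j, k in ANGLE_TRIPLETS]

def poses_match(pose_a, pose_b, dist_threshold):
    """Match when the Euclidean distance between the two angle vectors does
    not exceed the threshold (the 'fifth threshold' of the description)."""
    va, vb = angle_vector(pose_a), angle_vector(pose_b)
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
    return d <= dist_threshold
```

Because only angles enter the vectors, a portrait that is twice as large but holds the same posture produces the same angle vector and therefore a distance of zero.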
在另一种实现方式中,计算单元117,用于将第二姿势和目标参考姿势输入神经网络,得到第二姿势与目标参考姿势之间的相似度;其中,神经网络用于表征输入的多种姿势之间的相似度。In another implementation, the calculation unit 117 is configured to input the second posture and the target reference posture into a neural network to obtain the similarity between the second posture and the target reference posture, where the neural network is used to characterize the similarity between a plurality of input postures.
关于上述可选方式的具体描述可以参见前述的方法实施例,此处不再赘述。此外,上述提供的任一种图像处理装置11的解释以及有益效果的描述均可参考上述对应的方法实施例,不再赘述。For specific descriptions of the foregoing optional manners, reference may be made to the foregoing method embodiments, and details are not described herein again. In addition, the explanation and the description of the beneficial effects of any of the image processing apparatuses 11 provided above can refer to the corresponding method embodiments described above, and will not be repeated.
作为示例,结合图1,上述显示单元111的功能可以通过显示屏194实现。上述确定单元112、生成单元113和计算单元117中的任意一个单元的功能,均可以通过处理器110调用存储在内部存储器121中的程序代码实现。上述发送单元114可以通过移动通信模块150或无线通信模块160的功能,并结合其所连接的天线等实现。上述输出单元115可以通过显示屏194或扬声器170A等用于输出信息的器件实现。上述接收单元116可以通过显示屏、麦克风170C等用于输入信息的器件实现。As an example, with reference to FIG. 1, the functions of the display unit 111 may be implemented through the display screen 194. The function of any one of the determining unit 112, the generating unit 113, and the calculation unit 117 may be implemented by the processor 110 invoking program code stored in the internal memory 121. The sending unit 114 may be implemented by the functions of the mobile communication module 150 or the wireless communication module 160 in combination with the antenna connected thereto. The output unit 115 may be implemented by a device for outputting information, such as the display screen 194 or the speaker 170A. The receiving unit 116 may be implemented by a device for inputting information, such as a display screen or the microphone 170C.
本申请另一实施例还提供一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当该指令在终端上运行时,使得该终端执行上述方法实施例所示的方法流程中该终端执行的各个步骤。Another embodiment of the present application further provides a computer-readable storage medium storing instructions. When the instructions are run on a terminal, the terminal is enabled to perform the steps performed by the terminal in the method procedure shown in the foregoing method embodiments.
在一些实施例中,所公开的方法可以实施为以机器可读格式被编码在计算机可读存储介质上的或者被编码在其它非瞬时性介质或者制品上的计算机程序指令。In some embodiments, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles.
应该理解,这里描述的布置仅仅是用于示例的目的。因而,本领域技术人员将理解,其它布置和其它元素(例如,机器、接口、功能、顺序、和功能组等等)能够被取而代之地使用,并且一些元素可以根据所期望的结果而一并省略。另外,所描述的元素中的许多是可以被实现为离散的或者分布式的组件的、或者以任何适当的组合和位置来结合其它组件实施的功能实体。It should be understood that the arrangement described here is for illustrative purposes only. Thus, those skilled in the art will understand that other arrangements and other elements (for example, machines, interfaces, functions, sequences, and functional groups) can be used instead, and some elements can be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that can be implemented as discrete or distributed components, or combined with other components in any appropriate combination and position.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或者数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质,或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。In the foregoing embodiments, the implementation may be entirely or partially by software, hardware, firmware, or any combination thereof. When implemented using a software program, it may be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatuses. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed by the present invention. It should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (39)

  1. 一种图像处理方法,其特征在于,应用于第一终端,所述方法包括:An image processing method, characterized in that it is applied to a first terminal, and the method includes:
    显示当前拍摄场景的第一预览图像,所述第一预览图像包括被拍摄者在第一姿势下的第一人像;Displaying a first preview image of the current shooting scene, where the first preview image includes a first portrait of the subject in the first posture;
    对所述第一预览图像进行识别,以确定所述当前拍摄场景的场景类别;Recognizing the first preview image to determine the scene category of the current shooting scene;
    显示所述当前拍摄场景下的第二预览图像,并在所述第二预览图像中显示目标参考姿势;所述目标参考姿势至少是基于所述当前拍摄场景的场景类别得到的;其中,所述第二预览图像包括所述被拍摄者在第二姿势下的第二人像;Display the second preview image in the current shooting scene, and display the target reference pose in the second preview image; the target reference pose is obtained at least based on the scene category of the current shooting scene; wherein the second preview image includes a second portrait of the subject in the second posture;
    如果所述第二姿势与所述目标参考姿势匹配,则根据所述第二预览图像生成目标图像。If the second posture matches the target reference posture, a target image is generated according to the second preview image.
  2. 根据权利要求1所述的方法,其特征在于,所述目标参考姿势与所述第一姿势满足如下至少一种条件:The method according to claim 1, wherein the target reference posture and the first posture satisfy at least one of the following conditions:
    所述目标参考姿势与所述第一姿势不同;The target reference posture is different from the first posture;
    所述目标参考姿势在所述第二预览图像中的相对位置,与所述第一姿势在所述第一预览图像中的相对位置不同;The relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image;
    或者,所述目标参考姿势在所述第二预览图像中所占的大小,与所述第一姿势在所述第一预览图像中所占的大小不同。Alternatively, the size occupied by the target reference posture in the second preview image is different from the size occupied by the first posture in the first preview image.
  3. 根据权利要求1或2所述的方法,其特征在于,所述当前拍摄场景的场景类别包括以下类别中的至少一项:草地场景、台阶场景、海边场景、夕阳场景、马路场景、或塔场景。The method according to claim 1 or 2, wherein the scene category of the current shooting scene includes at least one of the following categories: grass scene, step scene, seaside scene, sunset scene, road scene, or tower scene.
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述目标参考姿势的姿势类别与所述第一姿势的姿势类别一致;其中,所述姿势类别包括坐姿、站姿或卧姿。The method according to any one of claims 1 to 3, wherein the posture category of the target reference posture is consistent with the posture category of the first posture; wherein the posture category includes a sitting posture, a standing posture, or a lying posture.
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述目标参考姿势是与所述当前拍摄场景的类别对应的多个参考姿势中的,与所述第一姿势之间的相似度大于等于第一阈值的参考姿势。The method according to any one of claims 1 to 4, wherein the target reference posture is a reference posture, among a plurality of reference postures corresponding to the category of the current shooting scene, whose similarity to the first posture is greater than or equal to a first threshold.
  6. 根据权利要求1至4任一项所述的方法,其特征在于,所述目标参考姿势是与所述当前拍摄场景的类别对应的多个参考姿势中的,与所述第一姿势之间的相似度最高的参考姿势。The method according to any one of claims 1 to 4, wherein the target reference posture is a reference posture, among a plurality of reference postures corresponding to the category of the current shooting scene, with the highest similarity to the first posture.
  7. 根据权利要求1至6任一项所述的方法,其特征在于,The method according to any one of claims 1 to 6, characterized in that,
    所述目标参考姿势在所述第二预览图像中的位置,是基于所述第一预览图像中的第一预设对象在所述第一预览图像中的位置确定的;The position of the target reference posture in the second preview image is determined based on the position of the first preset object in the first preview image in the first preview image;
    其中,所述目标参考姿势中的第一局部姿势与所述第一预设对象在同一图像中的位置之间具有第一关联关系,所述第一关联关系是预定义或者实时确定的。Wherein, the first local posture in the target reference posture and the position of the first preset object in the same image have a first association relationship, and the first association relationship is predefined or determined in real time.
  8. 根据权利要求1至7任一项所述的方法,其特征在于,The method according to any one of claims 1 to 7, characterized in that:
    所述目标参考姿势在所述第二预览图像中所占的大小,是基于所述第一预览图像中的第二预设对象在所述第一预览图像中所占的大小确定的;The size occupied by the target reference pose in the second preview image is determined based on the size occupied by the second preset object in the first preview image in the first preview image;
    其中,所述目标参考姿势与所述第二预设对象在同一图像中的大小之间具有第二关联关系,所述第二关联关系是预定义或者实时确定的。Wherein, there is a second association relationship between the target reference posture and the size of the second preset object in the same image, and the second association relationship is predefined or determined in real time.
  9. 根据权利要求1至8任一项所述的方法,其特征在于,在所述第二预览图像中 显示目标参考姿势,包括:The method according to any one of claims 1 to 8, wherein displaying the target reference pose in the second preview image comprises:
    在所述第二预览图像中以人体骨架或人体轮廓方式显示所述目标参考姿势。The target reference posture is displayed in the form of a human skeleton or a human contour in the second preview image.
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述目标参考姿势的信息是所述第一终端自身确定的,或者是所述第一终端从网络设备中获取的。The method according to any one of claims 1 to 9, wherein the information of the target reference posture is determined by the first terminal itself, or obtained by the first terminal from a network device.
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述在所述第二预览图像中显示目标参考姿势,包括:The method according to any one of claims 1 to 10, wherein the displaying the target reference posture in the second preview image comprises:
    如果所述当前拍摄场景的场景类别包括多种场景类别,则在所述第二预览图像中显示多个目标参考姿势;其中,场景类别与目标参考姿势一一对应;If the scene category of the current shooting scene includes multiple scene categories, display multiple target reference poses in the second preview image; wherein the scene categories are in one-to-one correspondence with the target reference poses;
    所述如果所述第二姿势与所述目标参考姿势匹配,则根据所述第二预览图像生成目标图像,包括:If the second posture matches the target reference posture, generating a target image according to the second preview image includes:
    如果所述第二姿势与所述多个目标参考姿势中的任意一个目标参考姿势匹配,则根据所述第二预览图像生成目标图像。If the second posture matches any one of the multiple target reference postures, a target image is generated according to the second preview image.
  12. 根据权利要求1至11任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 11, wherein the method further comprises:
    向第二终端发送所述目标参考姿势的信息和所述第二预览图像的信息,以指示所述第二终端显示所述第二预览图像,并在所述第二预览图像中显示所述目标参考姿势。Send the information of the target reference posture and the information of the second preview image to a second terminal, to instruct the second terminal to display the second preview image and display the target reference posture in the second preview image.
  13. 根据权利要求1至12任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 12, wherein the method further comprises:
    在所述第二预览图像中显示所述当前拍摄场景的类别信息。The category information of the current shooting scene is displayed in the second preview image.
  14. 根据权利要求1至13任一项所述的方法,其特征在于,不同的场景类别通过不同预定义对象组来表征;The method according to any one of claims 1 to 13, wherein different scene categories are characterized by different predefined object groups;
    如果所述第一预览图像包含一个预定义对象组,则所述当前拍摄场景的场景类别是所述预定义对象组所表征的场景类别;If the first preview image contains a predefined object group, the scene category of the current shooting scene is the scene category represented by the predefined object group;
    如果所述第一预览图像包含多个预定义对象组,则所述当前拍摄场景的场景类别是所述多个预定义对象组所表征的部分或全部场景类别。If the first preview image contains multiple predefined object groups, the scene category of the current shooting scene is part or all of the scene categories represented by the multiple predefined object groups.
  15. 根据权利要求1至14任一项所述的方法,其特征在于,The method according to any one of claims 1 to 14, characterized in that,
    所述第一人像占所述第一预览图像的比例大于等于第二阈值;The proportion of the first portrait in the first preview image is greater than or equal to a second threshold;
    或者,所述第一人像的像素点的个数大于等于第三阈值。Alternatively, the number of pixels of the first portrait is greater than or equal to the third threshold.
  16. 根据权利要求1至15任一项所述的方法,其特征在于,如果所述第二姿势与所述目标参考姿势匹配,则根据所述第二预览图像生成目标图像,包括:The method according to any one of claims 1 to 15, wherein if the second posture matches the target reference posture, generating a target image according to the second preview image comprises:
    如果所述第二姿势与所述目标参考姿势匹配,则输出提示信息,所述提示信息用于提示所述第二姿势与所述目标参考姿势匹配;If the second posture matches the target reference posture, output prompt information, where the prompt information is used to prompt that the second posture matches the target reference posture;
    接收第一操作;Receive the first operation;
    响应于所述第一操作,根据所述第二预览图像生成目标图像。In response to the first operation, a target image is generated based on the second preview image.
  17. 根据权利要求1至16任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 16, wherein the method further comprises:
    如果所述第二姿势与所述目标参考姿势的相似度大于等于第四阈值,则确定所述第二姿势与所述目标参考姿势匹配。If the similarity between the second posture and the target reference posture is greater than or equal to a fourth threshold, it is determined that the second posture matches the target reference posture.
  18. 根据权利要求17所述的方法,其特征在于,所述方法包括:The method according to claim 17, wherein the method comprises:
    计算第一向量和第二向量;其中,所述第一向量是所述第二人像中的关键点相对角度信息构成的向量,用于表征所述第二姿势;所述第二向量是所述目标参考姿势下的人像中的关键点相对角度信息构成的向量,用于表征所述目标参考姿势;Calculate a first vector and a second vector; wherein the first vector is a vector formed by relative angle information of key points in the second portrait and is used to characterize the second posture, and the second vector is a vector formed by relative angle information of key points in a portrait in the target reference posture and is used to characterize the target reference posture;
    计算所述第一向量与所述第二向量之间的距离;Calculating the distance between the first vector and the second vector;
    如果所述距离小于等于第五阈值,则确定所述第二姿势与所述目标参考姿势的相似度大于等于所述第四阈值。If the distance is less than or equal to the fifth threshold, it is determined that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold.
  19. 根据权利要求17所述的方法,其特征在于,所述方法还包括:The method according to claim 17, wherein the method further comprises:
    将所述第二姿势和所述目标参考姿势输入神经网络,得到所述第二姿势与所述目标参考姿势之间的相似度;其中,所述神经网络用于表征输入的多种姿势之间的相似度。Input the second posture and the target reference posture into a neural network to obtain the similarity between the second posture and the target reference posture; wherein the neural network is used to characterize the similarity between a plurality of input postures.
  20. 一种终端,其特征在于,所述终端包括:显示单元、确定单元和生成单元;A terminal, characterized in that the terminal includes: a display unit, a determining unit, and a generating unit;
    所述显示单元,用于显示当前拍摄场景的第一预览图像,所述第一预览图像包括被拍摄者在第一姿势下的第一人像;The display unit is configured to display a first preview image of a current shooting scene, the first preview image including a first portrait of the subject in a first posture;
    所述确定单元,用于对所述第一预览图像进行识别,以确定所述当前拍摄场景的场景类别;The determining unit is configured to recognize the first preview image to determine the scene category of the current shooting scene;
    所述显示单元,还用于显示所述当前拍摄场景下的第二预览图像,并在所述第二预览图像中显示目标参考姿势;所述目标参考姿势至少是基于所述当前拍摄场景的场景类别得到的;其中,所述第二预览图像包括所述被拍摄者在第二姿势下的第二人像;The display unit is further configured to display a second preview image in the current shooting scene, and display a target reference posture in the second preview image; the target reference posture is obtained at least based on the scene category of the current shooting scene; wherein the second preview image includes a second portrait of the subject in a second posture;
    所述生成单元,用于如果所述第二姿势与所述目标参考姿势匹配,则根据所述第二预览图像生成目标图像。The generating unit is configured to generate a target image according to the second preview image if the second posture matches the target reference posture.
  21. 根据权利要求20所述的终端,其特征在于,所述目标参考姿势与所述第一姿势满足如下至少一种条件:The terminal according to claim 20, wherein the target reference posture and the first posture satisfy at least one of the following conditions:
    所述目标参考姿势与所述第一姿势不同;The target reference posture is different from the first posture;
    所述目标参考姿势在所述第二预览图像中的相对位置,与所述第一姿势在所述第一预览图像中的相对位置不同;The relative position of the target reference posture in the second preview image is different from the relative position of the first posture in the first preview image;
    或者,所述目标参考姿势在所述第二预览图像中所占的大小,与所述第一姿势在所述第一预览图像中所占的大小不同。Alternatively, the size occupied by the target reference posture in the second preview image is different from the size occupied by the first posture in the first preview image.
  22. 根据权利要求20或21所述的终端,其特征在于,所述当前拍摄场景的场景类别包括以下类别中的至少一项:草地场景、台阶场景、海边场景、夕阳场景、马路场景、或塔场景。The terminal according to claim 20 or 21, wherein the scene category of the current shooting scene includes at least one of the following categories: grass scene, step scene, seaside scene, sunset scene, road scene, or tower scene.
  23. 根据权利要求20至22任一项所述的终端,其特征在于,所述目标参考姿势的姿势类别与所述第一姿势的姿势类别一致;其中,所述姿势类别包括坐姿、站姿或卧姿。The terminal according to any one of claims 20 to 22, wherein the posture category of the target reference posture is consistent with the posture category of the first posture; wherein the posture category includes a sitting posture, a standing posture, or a lying posture.
  24. 根据权利要求20至23任一项所述的终端,其特征在于,所述目标参考姿势是与所述当前拍摄场景的类别对应的多个参考姿势中的,与所述第一姿势之间的相似度大于等于第一阈值的参考姿势。The terminal according to any one of claims 20 to 23, wherein the target reference posture is a reference posture, among a plurality of reference postures corresponding to the category of the current shooting scene, whose similarity to the first posture is greater than or equal to a first threshold.
  25. 根据权利要求20至23任一项所述的终端,其特征在于,所述目标参考姿势是与所述当前拍摄场景的类别对应的多个参考姿势中的,与所述第一姿势之间的相似度最高的参考姿势。The terminal according to any one of claims 20 to 23, wherein the target reference posture is a reference posture, among a plurality of reference postures corresponding to the category of the current shooting scene, with the highest similarity to the first posture.
  26. 根据权利要求20至25任一项所述的终端,其特征在于,The terminal according to any one of claims 20 to 25, wherein:
    所述目标参考姿势在所述第二预览图像中的位置,是基于所述第一预览图像中的第一预设对象在所述第一预览图像中的位置确定的;The position of the target reference posture in the second preview image is determined based on the position of the first preset object in the first preview image in the first preview image;
    其中,所述目标参考姿势中的局部姿势与所述第一预设对象在同一图像中的位置之间具有第一关联关系,所述第一关联关系是预定义或者实时确定的。Wherein, the local posture in the target reference posture has a first association relationship with the position of the first preset object in the same image, and the first association relationship is predefined or determined in real time.
  27. 根据权利要求20至26任一项所述的终端,其特征在于,The terminal according to any one of claims 20 to 26, wherein:
    所述目标参考姿势在所述第二预览图像中所占的大小,是基于所述第一预览图像中的第二预设对象在所述第一预览图像中所占的大小确定的;The size occupied by the target reference pose in the second preview image is determined based on the size occupied by the second preset object in the first preview image in the first preview image;
    其中,所述目标参考姿势与所述第二预设对象在同一图像中的大小之间具有第二关联关系,所述第二关联关系是预定义或者实时确定的。Wherein, there is a second association relationship between the target reference posture and the size of the second preset object in the same image, and the second association relationship is predefined or determined in real time.
  28. 根据权利要求20至27任一项所述的终端,其特征在于,The terminal according to any one of claims 20 to 27, wherein:
    所述显示单元具体用于,在所述第二预览图像中以人体骨架或人体轮廓显示所述目标参考姿势。The display unit is specifically configured to display the target reference posture in the second preview image with a human skeleton or a human contour.
  29. 根据权利要求20至28任一项所述的终端,其特征在于,所述目标参考姿势的信息是所述终端自身确定的,或者是所述终端从网络设备中获取的。The terminal according to any one of claims 20 to 28, wherein the target reference posture information is determined by the terminal itself, or acquired by the terminal from a network device.
  30. 根据权利要求20至29任一项所述的终端,其特征在于,The terminal according to any one of claims 20 to 29, wherein:
    所述显示单元具体用于:如果所述当前拍摄场景的场景类别包括多种场景类别,则在所述第二预览图像中显示多个目标参考姿势;其中,场景类别与目标参考姿势一一对应;The display unit is specifically configured to: if the scene category of the current shooting scene includes multiple scene categories, display multiple target reference poses in the second preview image; wherein the scene categories are in one-to-one correspondence with the target reference poses;
    所述生成单元具体用于:如果所述第二姿势与所述多个目标参考姿势中的任意一个目标参考姿势匹配,则根据所述第二预览图像生成目标图像。The generating unit is specifically configured to generate a target image according to the second preview image if the second posture matches any one of the multiple target reference postures.
  31. 根据权利要求20至30任一项所述的终端,其特征在于,所述终端还包括:The terminal according to any one of claims 20 to 30, wherein the terminal further comprises:
    发送单元,用于向第二终端发送所述目标参考姿势的信息和所述第二预览图像的信息,以指示所述第二终端显示所述第二预览图像,并在所述第二预览图像中显示所述目标参考姿势。The sending unit is configured to send the information of the target reference posture and the information of the second preview image to a second terminal, to instruct the second terminal to display the second preview image and display the target reference posture in the second preview image.
  32. 根据权利要求20至31任一项所述的终端,其特征在于,The terminal according to any one of claims 20 to 31, wherein:
    所述显示单元还用于,在所述第二预览图像中显示所述当前拍摄场景的类别信息。The display unit is further configured to display category information of the current shooting scene in the second preview image.
  33. The terminal according to any one of claims 20 to 32, wherein different scene categories are characterized by different predefined object groups;
    if the first preview image contains one predefined object group, the scene category of the current shooting scene is the scene category characterized by that predefined object group; and
    if the first preview image contains multiple predefined object groups, the scene category of the current shooting scene comprises some or all of the scene categories characterized by the multiple predefined object groups.
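The object-group test in claim 33 can be illustrated with a minimal sketch; the object groups and category names below are illustrative assumptions, not taken from the patent:

```python
# Minimal sketch of claim 33: each scene category is characterized by a
# predefined object group, and the categories of the current shooting scene
# are those whose groups appear among the objects detected in the first
# preview image. Groups and category names are illustrative assumptions.

SCENE_CATEGORIES = {
    "beach": {"sea", "sand"},
    "street": {"road", "building"},
    "park": {"tree", "lawn"},
}

def classify_scene(detected_objects):
    """Return every scene category whose predefined object group is fully
    contained in the set of objects detected in the preview image."""
    detected = set(detected_objects)
    return sorted(cat for cat, group in SCENE_CATEGORIES.items()
                  if group <= detected)

# One object group present -> one category; several groups -> several.
print(classify_scene(["sea", "sand", "person"]))            # ['beach']
print(classify_scene(["sea", "sand", "road", "building"]))  # ['beach', 'street']
```

Claim 33 allows the scene category to be "some or all" of the characterized categories; for simplicity the sketch returns all of them.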
  34. The terminal according to any one of claims 20 to 33, wherein:
    the proportion of the first preview image occupied by the first portrait is greater than or equal to a second threshold;
    or, the number of pixels of the first portrait is greater than or equal to a third threshold.
  35. The terminal according to any one of claims 20 to 34, wherein the terminal further comprises:
    an output unit, configured to output prompt information if the second posture matches the target reference posture, where the prompt information is used to indicate that the second posture matches the target reference posture; and
    a receiving unit, configured to receive a first operation;
    wherein the generating unit is specifically configured to generate a target image according to the second preview image in response to the first operation.
  36. The terminal according to any one of claims 20 to 35, wherein:
    the determining unit is further configured to determine that the second posture matches the target reference posture if the similarity between the second posture and the target reference posture is greater than or equal to a fourth threshold.
  37. The terminal according to claim 36, wherein the terminal further comprises:
    a calculating unit, configured to calculate a first vector and a second vector, where the first vector is composed of relative-angle information of key points in the second portrait and is used to characterize the second posture, and the second vector is composed of relative-angle information of key points in a portrait in the target reference posture and is used to characterize the target reference posture; and to calculate the distance between the first vector and the second vector;
    wherein the determining unit is further configured to determine that the similarity between the second posture and the target reference posture is greater than or equal to the fourth threshold if the distance is less than or equal to a fifth threshold.
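Claim 37 characterizes each posture by a vector of relative angles at skeleton key points and declares a match when the distance between the two vectors is at most a threshold. A minimal sketch, assuming hypothetical key-point names, a hypothetical choice of joints, and an arbitrary numeric threshold:

```python
import math

# Minimal sketch of claim 37: a posture is represented by a vector of
# relative angles at skeleton key points, and two postures match when the
# distance between their vectors is at most the "fifth threshold".
# Key-point names, the joints used, and the threshold value are
# illustrative assumptions, not taken from the patent.

def angle_at(p, a, b):
    """Relative angle (radians) at key point p between segments p->a and p->b."""
    v1 = (a[0] - p[0], a[1] - p[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def pose_vector(kp):
    """First/second vector: relative-angle information at a few key points."""
    return [
        angle_at(kp["elbow"], kp["shoulder"], kp["wrist"]),
        angle_at(kp["knee"], kp["hip"], kp["ankle"]),
    ]

def poses_match(kp1, kp2, fifth_threshold=0.3):
    """Euclidean distance between the two angle vectors, thresholded."""
    v1, v2 = pose_vector(kp1), pose_vector(kp2)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))
    return dist <= fifth_threshold

pose = {"shoulder": (0, 0), "elbow": (0, 1), "wrist": (1, 1),
        "hip": (0, 2), "knee": (0, 3), "ankle": (1, 3)}
print(poses_match(pose, pose))  # True: identical postures are distance 0 apart
```

Because the vectors store relative angles rather than raw coordinates, the comparison is insensitive to where the person stands in the frame and to their apparent size.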
  38. The terminal according to claim 36, wherein the terminal further comprises:
    a calculating unit, configured to input the second posture and the target reference posture into a neural network to obtain the similarity between the second posture and the target reference posture, where the neural network is used to characterize the similarity between input postures.
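Claim 38 replaces the hand-crafted distance of claim 37 with a learned model: both postures are fed to a neural network that outputs their similarity. A minimal untrained sketch, assuming a hypothetical two-layer architecture (a real network would be trained on pose pairs labeled similar or dissimilar):

```python
import math
import random

# Minimal sketch of claim 38: the two posture vectors are fed to a neural
# network whose output is their similarity. The architecture, layer sizes,
# and (untrained) random weights are illustrative assumptions only.

random.seed(0)

class TinySimilarityNet:
    def __init__(self, pose_dim, hidden=8):
        n_in = 2 * pose_dim  # the two posture vectors are concatenated
        self.w1 = [[random.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(hidden)]
        self.w2 = [random.uniform(-1, 1) for _ in range(hidden)]

    def __call__(self, pose_a, pose_b):
        x = list(pose_a) + list(pose_b)
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU layer
             for row in self.w1]
        z = sum(w * hi for w, hi in zip(self.w2, h))
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid: similarity in (0, 1)

net = TinySimilarityNet(pose_dim=4)
score = net([0.1, 0.5, 0.9, 0.2], [0.1, 0.5, 0.8, 0.3])
print(0.0 < score < 1.0)  # True
```

The determining unit of claim 36 would then compare this score against the fourth threshold instead of computing a vector distance.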
  39. A terminal, comprising a processor, a memory, and a display screen, wherein the display screen is configured to display images, the memory is configured to store computer programs and instructions, and the processor is configured to invoke the computer programs and instructions and, in cooperation with the display screen, perform the method according to any one of claims 1 to 19.
PCT/CN2020/142530 2020-03-07 2020-12-31 Image processing method and device WO2021179773A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010153760 2020-03-07
CN202010153760.1 2020-03-07
CN202010480843.1 2020-05-30
CN202010480843.1A CN113364971B (en) 2020-03-07 2020-05-30 Image processing method and device

Publications (1)

Publication Number Publication Date
WO2021179773A1 true WO2021179773A1 (en) 2021-09-16

Family

ID=77524350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/142530 WO2021179773A1 (en) 2020-03-07 2020-12-31 Image processing method and device

Country Status (2)

Country Link
CN (1) CN113364971B (en)
WO (1) WO2021179773A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835354B (en) * 2021-10-14 2022-05-27 北京联盛德微电子有限责任公司 Internet of things household appliance control system
CN114020157B (en) * 2021-11-15 2024-07-19 广州小鹏汽车科技有限公司 Method and device for checking vehicles, vehicles and storage medium
CN116996761A (en) * 2022-04-14 2023-11-03 北京字跳网络技术有限公司 Photography methods, devices, equipment, storage media and program products

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110221921A1 (en) * 2010-03-12 2011-09-15 Sanyo Electric Co., Ltd. Electronic camera
US20130038759A1 (en) * 2011-08-10 2013-02-14 Yoonjung Jo Mobile terminal and control method of mobile terminal
CN107734251A (en) * 2017-09-29 2018-02-23 维沃移动通信有限公司 A kind of photographic method and mobile terminal
CN108156385A (en) * 2018-01-02 2018-06-12 联想(北京)有限公司 Image acquiring method and image acquiring device
CN110049180A (en) * 2018-11-27 2019-07-23 阿里巴巴集团控股有限公司 Shoot posture method for pushing and device, intelligent terminal
CN110868538A (en) * 2019-11-11 2020-03-06 三星电子(中国)研发中心 Method and electronic equipment for recommending shooting posture

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101762769B1 (en) * 2011-04-18 2017-08-07 삼성전자주식회사 Apparatus and method for capturing subject in photographing device
CN106331508B (en) * 2016-10-19 2020-04-03 深圳市道通智能航空技术有限公司 Method and device for shooting composition
CN106791364A (en) * 2016-11-22 2017-05-31 维沃移动通信有限公司 Method and mobile terminal that a kind of many people take pictures
EP3590095B1 (en) * 2017-05-16 2024-04-24 Apple Inc. Emoji recording and sending
CN108600632B (en) * 2018-05-17 2021-04-20 Oppo(重庆)智能科技有限公司 Photographing prompting method, intelligent glasses and computer readable storage medium
CN109194879B (en) * 2018-11-19 2021-09-07 Oppo广东移动通信有限公司 Photographing method, device, storage medium and mobile terminal


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113905174A (en) * 2021-09-18 2022-01-07 咪咕文化科技有限公司 Method, Apparatus, Device, and Computer-readable Storage Medium for Photographing Gesture Recommendation
CN113890994A (en) * 2021-09-30 2022-01-04 荣耀终端有限公司 Image capturing method, system, storage medium, and program product
CN113890994B (en) * 2021-09-30 2022-12-23 荣耀终端有限公司 Image photographing method, system and storage medium
CN114285988A (en) * 2021-12-03 2022-04-05 维沃移动通信有限公司 Display method, display device, electronic equipment and storage medium
CN114285988B (en) * 2021-12-03 2024-04-09 维沃移动通信有限公司 Display method, display device, electronic equipment and storage medium
US11871104B2 (en) 2022-03-29 2024-01-09 Qualcomm Incorporated Recommendations for image capture
WO2023192771A1 (en) * 2022-03-29 2023-10-05 Qualcomm Incorporated Recommendations for image capture
CN116074623B (en) * 2022-05-30 2023-11-28 荣耀终端有限公司 Resolution selecting method and device for camera
CN116074623A (en) * 2022-05-30 2023-05-05 荣耀终端有限公司 Resolution selecting method and device for camera
CN115278060A (en) * 2022-07-01 2022-11-01 北京五八信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN115278060B (en) * 2022-07-01 2024-04-09 北京五八信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN115423752B (en) * 2022-08-03 2023-07-07 荣耀终端有限公司 Image processing method, electronic equipment and readable storage medium
CN115423752A (en) * 2022-08-03 2022-12-02 荣耀终端有限公司 Image processing method, electronic device and readable storage medium
CN117011946A (en) * 2023-10-08 2023-11-07 武汉海昌信息技术有限公司 Unmanned rescue method based on human behavior recognition
CN117011946B (en) * 2023-10-08 2023-12-19 武汉海昌信息技术有限公司 Unmanned rescue method based on human behavior recognition

Also Published As

Publication number Publication date
CN113364971A (en) 2021-09-07
CN113364971B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2021179773A1 (en) Image processing method and device
CN109814766B (en) Application display method and electronic equipment
CN113645351B (en) Application interface interaction method, electronic device and computer-readable storage medium
WO2020259452A1 (en) Full-screen display method for mobile terminal, and apparatus
CN113254120B (en) Data processing method and related device
WO2022206589A1 (en) Image processing method and related device
WO2023029547A1 (en) Video processing method, and electronic device
CN111553846A (en) Super-resolution processing method and device
US20230351570A1 (en) Image processing method and apparatus
WO2022001258A1 (en) Multi-screen display method and apparatus, terminal device, and storage medium
WO2022143180A1 (en) Collaborative display method, terminal device, and computer readable storage medium
CN112150499B (en) Image processing method and related device
CN110955373A (en) A display method and electronic device for displaying elements
CN112449101A (en) Shooting method and electronic equipment
CN113452969B (en) Image processing method and device
WO2021204103A1 (en) Picture preview method, electronic device, and storage medium
CN114995715A (en) Control method of floating ball and related device
US20230224574A1 (en) Photographing method and apparatus
CN110968247A (en) Electronic equipment control method and electronic equipment
WO2021180095A1 (en) Method and apparatus for obtaining pose
CN114422686A (en) Parameter adjusting method and related device
WO2021190097A1 (en) Image processing method and device
CN118444832B (en) Touch operation method and electronic device
WO2022222702A1 (en) Screen unlocking method and electronic device
WO2024221929A1 (en) Display method and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924691

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924691

Country of ref document: EP

Kind code of ref document: A1