CN118450269A - Image processing method and electronic device
- Publication number
- CN118450269A (application number CN202311406139.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- main
- camera
- tele
- cut
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- All under H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof (H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION):
- H04N23/951—Computational photography systems, e.g. light-field imaging systems, by using two or more images to influence resolution, frame rate or aspect ratio
- H04N23/62—Control of parameters via user interfaces
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
- H04N23/67—Focus control based on electronic image sensor signals
- H04N23/695—Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Studio Devices (AREA)
Abstract
The application provides an image processing method and an electronic device, where the electronic device includes a main camera and a tele camera, and the main camera and/or the tele camera is a rotatable camera. When shooting is performed through the main camera and the tele camera, the definition of the fused image can be improved. The method may include: starting a camera application program; receiving a first operation on a shooting control in the camera application program; determining a main shot cut image and a tele cut image in response to the first operation; determining an enhanced feature image according to the main shot cut image, the tele cut image and an image processing model; and generating a fused image based on the enhanced feature image, and displaying the fused image. The main shot cut image is obtained by cropping a main shot image captured by the main camera, and the tele cut image is obtained by cropping a tele image captured by the tele camera.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and an electronic device.
Background
The camera module of the electronic device may include one or more cameras, such as one or more of a main camera, a wide-angle camera, a tele camera, and the like. The main camera is the most important camera in the camera module; it has the highest pixel count, the largest aperture and the best imaging quality, and can be used for most shooting scenes. The wide-angle camera has a wider shooting range or field of view than the main camera and can be used for shooting wide scenes such as scenery and buildings. The tele camera can shoot from a greater distance than the main camera and can be used for shooting distant scenery, people and the like. The wide-angle camera and the tele camera can make up for the shortcomings of the main camera, thereby improving the shooting effect. For example, fusion of the main camera and the tele camera: the main camera and the tele camera are used to shoot the same person in a distant view, so that a main shot portrait image and a tele portrait image can be obtained (although the pixel count of the main camera is high, its effect when shooting a portrait in a distant view is poor, while the tele camera shoots the distant view through lossless zooming, so the definition of the tele portrait image is better than that of the main shot portrait image); the tele portrait image is fused into the main shot portrait image to obtain a fused portrait image, and both the overall definition of the fused portrait image and the definition of the portrait in it are higher than those of the main shot portrait image. The fusion of the main camera and the tele camera is one multi-shot fusion mode.
Currently, each camera in the camera module is a fixed camera, that is, the physical position of each camera on the electronic device is fixed, and the difference of the position angles between the cameras is a fixed value. For fusion of the main camera and the tele camera, there may be a difference between the shooting angle of the main camera and the shooting angle of the tele camera, and this difference may affect the sharpness of the fused image.
Disclosure of Invention
The application provides an image processing method and electronic equipment, which can improve the definition of a fused image.
In a first aspect, the present application provides an image processing method applicable to an electronic device including a main camera and a tele camera, where the main camera and/or the tele camera is a rotatable camera. The method may include: starting a camera application program and displaying a user interface of the camera application program, where the user interface includes a shooting control; receiving a first operation on the shooting control of the camera application program, that is, receiving a first operation on the shooting control in the user interface; and in response to the first operation, performing the following operations: determining a main shot cut image and a tele cut image; determining an enhanced feature image based on the main shot cut image, the tele cut image, and an image processing model; and generating and displaying a fused image based on the enhanced feature image.
According to the image processing method provided by the first aspect, under the scene that the main camera and/or the long-focus camera are/is rotatable, the image processing model is adopted to obtain the enhanced characteristic image with higher definition, and the fusion image can be generated based on the enhanced characteristic image, so that the definition of the fusion image can be improved.
The rotatable camera is a camera that can rotate up, down, left and right by a certain angle, similar to a rotatable monitoring camera. Optionally, the rotatable camera can not only rotate up, down, left and right by a certain angle, but also move back and forth by a certain distance. Since the rotatable camera may be rotated and/or moved, its shooting angle differs from that of the other, fixed cameras on the electronic device. In the case where the main camera and/or the tele camera is rotatable, there may be a large difference between the shooting angle of the main camera and the shooting angle of the tele camera, and therefore a large difference between the main shot image captured by the main camera and the tele image captured by the tele camera, for example at least one of a position difference, an angle difference, an expression difference, a color difference and the like, which may affect the sharpness of the fused image. With the method provided in the first aspect, the influence of these differences on the sharpness of the fused image can be reduced, so that the sharpness of the fused image can be improved.
The image processing method provided in the first aspect can also be applied to a scene in which both the main camera and the tele camera are fixed cameras; by using the image processing model, the definition of the fused image can be improved and the efficiency of generating the fused image can be increased. When both the main camera and the tele camera are fixed, the fused image obtained by the method provided in the first aspect has higher definition than the fused image obtained by current multi-shot fusion.
The shooting range that a fixed camera can capture is limited; if the user wants to shoot a scene near the edge of that range through the camera module, the user needs to move the electronic device so that the camera module can capture the scene. However, if the camera is a rotatable camera, the user does not need to move the electronic device; the scene can be captured by adjusting the camera, that is, by rotating it by a certain angle and/or moving it back and forth by a certain distance.
In some embodiments, in response to the first operation, a main shot image is determined from images acquired by the main camera and a tele image is determined from images acquired by the tele camera. That is, in response to the first operation, the main camera and the tele camera are invoked to acquire images of the same shooting scene, a main shot image is determined from at least one image acquired by the main camera for the shooting scene, and a tele image is determined from at least one image acquired by the tele camera. After the main shot image and the tele image are determined, they are cropped to obtain a main shot cut image and a tele cut image, and the main shot cut image and the tele cut image are input into the image processing model for processing. Inputting the cut images rather than the full main shot image and tele image into the image processing model improves processing efficiency and focuses the processing on the cut regions so as to highlight their features.
In combination with the method provided in the first aspect, in some embodiments, when the main shot image and the tele image are cropped, a region identification model may be invoked to identify a preset region in the main shot image and the tele image, and a cropping function may be invoked to crop the main shot image and the tele image based on the preset region, so as to obtain the main shot cut image and the tele cut image. The preset region may be, for example, a face region of a portrait, a portrait region, or a designated shooting region. The region identification model may be, for example, a face recognition model for recognizing the face region of a person. Because the shooting parameters of the main shot image and the tele image are different, the size of the preset region in the main shot image differs from that in the tele image; the preset region indicates a certain region and is not limited to being the same size in both images. The cropping function is used for cropping the image, that is, cropping the preset region out of the image.
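Purely as an illustration (the patent does not specify a concrete detector or cropping function), the following Python sketch shows one way the "identify preset region, then crop" step could be realised, assuming OpenCV's Haar-cascade face detector stands in for the region identification model and a fixed margin ratio stands in for the cropping rule; the function name and margin value are hypothetical.

```python
# Minimal sketch of "identify preset region, then crop" under the assumptions above.
import cv2

_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_preset_region(image_bgr, margin=0.4):
    """Detect the largest face and return a crop around it (the 'preset region')."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no preset region found; caller may fall back to the full image
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # keep the largest face
    dx, dy = int(w * margin), int(h * margin)             # expand to cover hair/shoulders
    H, W = image_bgr.shape[:2]
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(W, x + w + dx), min(H, y + h + dy)
    return image_bgr[y0:y1, x0:x1]

# main_crop = crop_preset_region(main_image)   # main shot cut image
# tele_crop = crop_preset_region(tele_image)   # tele cut image (typically larger)
```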
In some embodiments, in combination with the method provided in the first aspect, since the enhanced feature image is a cropped image, it needs to be fused into the main shot image to generate the fused image. That is, the fused image is generated based on the enhanced feature image and the main shot image. It will be appreciated that the main shot cut region in the main shot image is replaced or overlaid with the enhanced feature image, and peripheral fusion processing is performed on the image surrounding the cut location during the replacement or overlay. In this way, compared with the main shot image, the fused image has higher definition and more detail in the preset region.
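As a hedged illustration of the replace/overlay and the "peripheral fusion processing" mentioned above (the patent does not disclose the actual blending method), the sketch below pastes the enhanced feature image back into the main shot image with a feathered border; the crop offset (x0, y0) is assumed to be recorded during cropping, and the feather width is an arbitrary assumption.

```python
# Illustrative paste-back with a feathered alpha mask as a stand-in for
# peripheral fusion processing.
import cv2
import numpy as np

def paste_back(main_image, enhanced_crop, x0, y0, feather=15):
    h, w = enhanced_crop.shape[:2]
    # Alpha mask: 1.0 in the centre, fading to 0.0 at the border of the crop.
    mask = np.ones((h, w), np.float32)
    mask[:feather, :] *= np.linspace(0, 1, feather)[:, None]
    mask[-feather:, :] *= np.linspace(1, 0, feather)[:, None]
    mask[:, :feather] *= np.linspace(0, 1, feather)[None, :]
    mask[:, -feather:] *= np.linspace(1, 0, feather)[None, :]
    mask = cv2.merge([mask, mask, mask])

    fused = main_image.astype(np.float32)
    roi = fused[y0:y0 + h, x0:x0 + w]
    fused[y0:y0 + h, x0:x0 + w] = mask * enhanced_crop + (1 - mask) * roi
    return fused.astype(np.uint8)
```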
In some embodiments, in the method provided in the first aspect, determining the enhanced feature image according to the main shot cut image, the tele cut image and the image processing model may mean that the main shot cut image and the tele cut image are input into the image processing model to obtain the enhanced feature image output by the image processing model.
In some embodiments, the image processing model is specifically configured to: perform alignment processing on the tele cut image based on the main shot cut image to obtain a first intermediate image, so that the first intermediate image is aligned (e.g., aligned in size and in pixels) with the main shot cut image for subsequent processing; perform color mapping processing on the first intermediate image based on the main shot cut image to obtain a second intermediate image, so as to reduce the color difference between the second intermediate image and the main shot cut image and thereby reduce the difference between the fused image and the main shot image; and perform attention learning on the main shot cut image based on the second intermediate image to obtain the enhanced feature image, that is, the enhanced feature image can be obtained by learning the main shot cut image with the second intermediate image, thereby improving the definition of the enhanced feature image.
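The patent does not disclose the internal structure of the image processing model; purely to illustrate the three-stage data flow described above, a PyTorch-style sketch could look as follows, where the three submodules passed to the constructor are hypothetical placeholders.

```python
# Conceptual sketch of the alignment -> color mapping -> attention learning flow.
import torch.nn as nn

class ImageProcessingModel(nn.Module):
    def __init__(self, align_net, color_mapper, attention_fusion):
        super().__init__()
        self.align_net = align_net                # stage 1: alignment
        self.color_mapper = color_mapper          # stage 2: color mapping (e.g. AdaIN-style)
        self.attention_fusion = attention_fusion  # stage 3: attention learning

    def forward(self, main_crop, tele_crop):
        # Stage 1: align the tele cut image to the main shot cut image.
        first_intermediate = self.align_net(tele_crop, reference=main_crop)
        # Stage 2: map the colors of the main shot cut image onto the aligned image.
        second_intermediate = self.color_mapper(first_intermediate, reference=main_crop)
        # Stage 3: attention learning on the main shot cut image guided by stage 2.
        enhanced_feature_image = self.attention_fusion(main_crop, second_intermediate)
        return enhanced_feature_image
```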
In some embodiments, the features of the first intermediate image are aligned with the features of the main cut image. That is, by the alignment process, the features of the first intermediate image and the features of the main cut image can be aligned in size and pixel for color mapping process and attention learning.
In some embodiments, the color features of the main cut image are fused in the second intermediate image in combination with the method provided in the first aspect. That is, the second intermediate image fuses the color features of the main cut image to reduce the color difference between the second intermediate image and the main cut image, thereby reducing the difference between the fused image and the main cut image.
In some embodiments, in combination with the method provided in the first aspect, before using the image processing model, the electronic device receives parameters of the image processing model and deploys the image processing model based on those parameters, so that the electronic device can invoke the image processing model to process the main shot cut image and the tele cut image. The parameters of the image processing model may come from a server, a personal computer (PC), or the like, which may be understood as a model training device. That is, the server, the PC, or the like performs the model training process, and when a training model satisfying the deployment condition is obtained, the parameters of the training model are transmitted to the electronic device so that the electronic device deploys the image processing model based on the parameters.
In a second aspect, the present application provides an electronic device comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method as described in the first aspect and any possible implementation of the first aspect.
In a third aspect, the application provides a computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform a method as described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, the application provides a computer program product comprising instructions which, when run on an electronic device, cause the electronic device to perform a method as described in the first aspect and any possible implementation of the first aspect.
It will be appreciated that the electronic device provided in the second aspect, the computer storage medium provided in the third aspect, and the computer program product provided in the fourth aspect described above are all configured to perform the method provided in the first aspect of the present application. Therefore, for the advantages they achieve, reference may be made to the advantages of the corresponding method, which are not described herein again.
Drawings
FIGS. 1A-1D are a set of user interface diagrams provided by an embodiment of the present application;
FIGS. 2A-2C are effect diagrams of image processing provided by embodiments of the present application;
FIG. 3 is a schematic diagram of a software architecture provided by an embodiment of the present application;
FIG. 4 is a flowchart of an image processing performed by an electronic device to obtain a fused image according to an embodiment of the present application;
FIG. 5 is an exemplary view of clipping and alignment provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of several implementations of step 404 and step 405 of FIG. 4;
FIG. 7 is a schematic diagram of a training process of an image processing model provided by an embodiment of the present application;
fig. 8 is a hardware configuration diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
1. Alignment
For a plurality of cameras of an electronic device capturing the same scene, because of differences in the spatial positions of the cameras, their sensors, and the like, the parameter information of the different cameras, such as focal length, pixel size and field of view (FOV), is different, so the images obtained by capturing the same scene with different cameras are different. For example, for a scene including a portrait, because the pixel count of the main camera is higher than that of the telephoto camera, the telephoto camera supports multiple-factor lossless zooming, the aperture of the telephoto camera is larger than that of the main camera, and so on, the image captured by the main camera contains more scenery and a smaller portrait than the telephoto image captured by the telephoto camera; the portrait in the telephoto image is therefore enlarged relative to the portrait in the main shot image, and the definition of the portrait in the telephoto image is higher than that of the portrait in the main shot image. Since the focal length of the main camera is shorter than that of the tele camera and the FOV of the main camera is greater than that of the tele camera, the tele image can be understood as a close-up image of the scene and the main shot image can be understood as a distant-view image or a wide-angle image of the scene. Therefore, the images are subjected to alignment processing before fusion processing is performed on them. Illustratively, taking the alignment of a tele image and a main shot image of the same scene as an example: for the same feature in the same scene, its size in the tele image is larger than in the main shot image, and its sharpness in the tele image is higher than in the main shot image; for example, for the facial features of the same person, the size of the facial features in the tele image is larger than in the main shot image, and their sharpness in the tele image is higher than in the main shot image. Alignment may include size alignment and pixel alignment. Size alignment can reduce the size difference between different images; pixel alignment can reduce the pixel difference between different images.
1) Size alignment
The size alignment may be understood as that other images are subjected to cropping and scaling processing based on the size of the reference image so that the processed other images are aligned in size with the reference image.
In one implementation, when the parameter information of the tele image and the main shot image is known, the electronic device may calculate the scale difference between the tele image and the main shot image based on the parameter information, and then crop and enlarge the main shot image based on the scale difference, so that the processed main shot image can be aligned in size with the tele image.
In another implementation, when the parameter information of the tele image and the main shot image is not known, the electronic device performs sparse feature point matching on the tele image and the main shot image to obtain a transformation matrix between them, and crops and enlarges the main shot image based on the transformation matrix, so that the processed main shot image can be aligned in size with the tele image. For convenience of description, the image obtained by performing size alignment processing on the main shot image is referred to as the size-aligned image. Although the size-aligned image is aligned in size with the tele image, for the same object there may still be rotation and/or translation of the tele image relative to the size-aligned image, so pixel alignment of the size-aligned image is also required.
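For illustration only, the second approach (parameters unknown) could be implemented with ORB sparse feature matching and a RANSAC-estimated homography in OpenCV, as sketched below; the specific feature type, match count and transform model are assumptions rather than details taken from the patent.

```python
# Sketch of size alignment via sparse feature point matching (parameters unknown).
import cv2
import numpy as np

def size_align_main_to_tele(main_image, tele_image):
    orb = cv2.ORB_create(2000)
    kp_m, des_m = orb.detectAndCompute(cv2.cvtColor(main_image, cv2.COLOR_BGR2GRAY), None)
    kp_t, des_t = orb.detectAndCompute(cv2.cvtColor(tele_image, cv2.COLOR_BGR2GRAY), None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_m, des_t), key=lambda m: m.distance)[:200]

    src = np.float32([kp_m[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_t[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Transformation matrix mapping main-image coordinates to tele-image coordinates.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = tele_image.shape[:2]
    # Warping with this matrix crops and enlarges the main image toward the tele image.
    return cv2.warpPerspective(main_image, H, (w, h))   # size-aligned image
```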
2) Pixel alignment
Pixel alignment may be understood as the rotation and/or translation of other images based on the position of each object in the reference image such that the other images after processing are aligned on the pixels with the reference image. For the same object, the object in the other images is moved or rotated relative to the object in the reference image.
Pixel alignment is typically implemented using an optical flow alignment algorithm, which can cover local motions and can better handle detail changes such as facial expression changes. To improve alignment accuracy and alignment efficiency, a deep-learning optical flow alignment algorithm may be employed, such as the DISFlow algorithm in the open-source computer vision library OpenCV (Open Source Computer Vision Library).
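A minimal sketch of this step with OpenCV's DIS optical flow is given below; it assumes both inputs are already size-aligned to the same resolution, and the chosen preset is an arbitrary assumption.

```python
# Pixel alignment sketch: DIS optical flow plus backward warping with cv2.remap.
import cv2
import numpy as np

def pixel_align(image_to_align, reference):
    g_ref = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    g_img = cv2.cvtColor(image_to_align, cv2.COLOR_BGR2GRAY)

    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_MEDIUM)
    flow = dis.calc(g_ref, g_img, None)   # displacement field: reference -> image_to_align

    h, w = g_ref.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Sampling image_to_align at the flow-displaced positions aligns it to the reference.
    return cv2.remap(image_to_align, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```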
The above takes aligning the main shot image to the tele image as an example, i.e. the tele image is taken as the reference image and the main shot image is adjusted to align with it. Alternatively, the tele image may be aligned to the main shot image by taking the main shot image as the reference image and adjusting the tele image; for example, the tele image may be cropped and scaled down so that the processed tele image is aligned in size with the main shot image.
2. Adaptive Instance Normalization (AdaIN)
The AdaIN algorithm is an image style migration algorithm that can migrate the style and texture of one image (the style image) to another image (the content image) while preserving the body structure of the content image. The AdaIN algorithm is based on a feed-forward neural network model, so generation is fast; it can support migration of any style rather than being limited to a specific style, and a style-migrated image can be obtained simply by inputting a content image and a style image into the AdaIN algorithm.
Illustratively, assuming that the features of the content image are denoted x and the features of the style image are denoted y, the AdaIN algorithm can migrate the style of y to x according to the following formula:

AdaIN(x, y) = σ(y) · ((x − μ(x)) / σ(x)) + μ(y)
Where μ (y) represents the mean of the features of the style image, σ (y) represents the variance of the features of the style image, μ (x) represents the mean of the features of the content image, and σ (x) represents the variance of the features of the content image.
In an embodiment of the present application, the AdaIN algorithm is used to migrate the color features of one image to another image so as to reduce color differences.
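The formula transcribes directly into code. The NumPy sketch below uses per-channel mean and standard deviation for μ and σ (the standard AdaIN formulation) and adds a small epsilon for numerical stability; both choices are implementation assumptions.

```python
# Direct transcription of the AdaIN formula: restyle content features x with
# the channel statistics of reference features y.
import numpy as np

def adain(x, y, eps=1e-5):
    """x, y: float arrays of shape (H, W, C); returns x re-styled by y."""
    mu_x, sigma_x = x.mean(axis=(0, 1)), x.std(axis=(0, 1)) + eps
    mu_y, sigma_y = y.mean(axis=(0, 1)), y.std(axis=(0, 1)) + eps
    return sigma_y * (x - mu_x) / sigma_x + mu_y
```

Applied to the color-mapping step of this embodiment, x would be the features of the first intermediate image and y the features of the main shot cut image.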
3. Attention mechanism (Attention Mechanism)
The attention mechanism stems from the study of human vision. In cognitive science, due to bottlenecks in information processing, humans selectively focus on a portion of all available information while ignoring other visible information. This is often referred to as the attention mechanism. Different parts of the human retina have different information processing capabilities, i.e. acuity, and only the fovea has the strongest acuity. In order to make reasonable use of limited visual information processing resources, a human needs to select a specific part of the visual area and then concentrate on it. For example, when reading, people typically pay attention to and process only a small number of the words being read. In summary, the attention mechanism has two main aspects: determining which part to attend to, and allocating the limited information processing resources to that part.
The attention mechanism may enable a neural network (model) to have the ability to concentrate on a subset of its inputs (or features): a particular input is selected. The attention mechanism may be applied to any type of input regardless of its shape. In situations where computing power is limited, the attention mechanism is a resource allocation scheme that is the primary means of solving the information overload problem, allocating computing resources to more important tasks.
In the embodiment of the application, the attention mechanism is used to perform focused learning on the difference features of a specific region (such as a region of interest) in the image so as to improve the detail learning capability for the region of interest. For example, emphasis learning is performed on the difference features of regions such as the facial features and hair in a portrait image.
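As an illustrative sketch only (not the patent's actual module), a simple spatial-attention block in PyTorch could predict a per-pixel attention map from the concatenated features and use it to weight the guide features against the main features, as follows.

```python
# Minimal spatial-attention sketch: the learned map decides where to emphasize
# the guide (e.g. tele-derived) features over the main features.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_map = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),   # attention map in [0, 1]
        )

    def forward(self, main_feat, guide_feat):
        attn = self.to_map(torch.cat([main_feat, guide_feat], dim=1))
        # High attention: take detail from the guide; low attention: keep the main features.
        return attn * guide_feat + (1 - attn) * main_feat
```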
4. Loss parameters (loss)
The loss parameter indicates that there is a difference between the predicted value (estimated value) and the actual value (expected value, reference value, or label (ground truth)); the "loss" means that the model is penalized for failing to produce the expected result. Its purpose is to determine the performance of the model by comparing the predicted output and the expected output, and then to find the direction of optimization. If the deviation between the two is very large, the loss value will be large; if the deviation is small or the values are nearly the same, the loss value will be very low. The loss parameter may be the loss value of a loss function (Loss Function) or the loss value of a cost function (Cost Function).
For a model, the model may be deployed and/or used if its loss parameters meet the corresponding conditions. In the embodiment of the application, in the training process of the image processing model, if the loss parameters meet the corresponding deployment conditions, the image processing model under the loss parameters can be deployed in the electronic equipment, and the electronic equipment is used for calling the image processing model to process the main shot image and the tele image.
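Schematically, gating deployment on the loss parameter might look like the sketch below; the L1 loss, the validation-pair format and the threshold value are purely illustrative assumptions, not values from the patent.

```python
# Hypothetical deployment gate: deploy only if the mean validation loss is low enough.
import torch
import torch.nn.functional as F

def loss_parameter(predicted, expected):
    """Penalty for deviating from the expected (ground-truth) output."""
    return F.l1_loss(predicted, expected)

def meets_deployment_condition(model, val_pairs, threshold=0.02):
    model.eval()
    with torch.no_grad():
        losses = [loss_parameter(model(main_crop, tele_crop), target)
                  for (main_crop, tele_crop), target in val_pairs]
    return torch.stack(losses).mean().item() < threshold
```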
First, an application scenario of the embodiment of the present application is described. The image processing method and the electronic device provided by the embodiment of the application can be applied to a scene in which the main camera is a fixed camera, the long-focus camera is a rotatable camera, a scene in which the main camera is a rotatable camera, the long-focus camera is a fixed camera, a scene in which the main camera and the long-focus camera are both rotatable cameras, and a scene in which the main camera and the long-focus camera are both fixed cameras. The fixed camera is fixed in physical position on the electronic equipment, the position angle difference between the fixed cameras is a fixed value, and the rotatable camera is capable of rotating up and down and left and right at a certain angle and/or moving back and forth at a certain distance, so that the position angle difference between the rotatable camera and the fixed camera is within a variation range.
For example, in the case of a rotatable main camera and/or a rotatable tele camera, there may be a large difference between the main shot image captured by the main camera and the tele image captured by the tele camera, so that if the current multi-shot fusion algorithm is adopted, the display effect of the fused image, for example its sharpness, may be affected. By adopting the embodiment of the application, the definition of the fused image can be improved. Even when both the main camera and the tele camera are fixed, the definition of the fused image can be further improved compared with the existing multi-shot fusion algorithm.
The image processing method provided by the embodiment of the application can be applied to an electronic device with image processing capability (namely the electronic device 100), such as a mobile phone or a tablet computer.
By implementing the image processing method provided by the embodiment of the application, the electronic device 100 can perform alignment processing, color mapping processing and attention learning to obtain a fused image. Performing the color mapping process helps to reduce the color difference between the fused image and the main image. Performing attention learning helps to improve the sharpness of the fused image. Thus, even if the tele camera is not fixed, the color difference can be reduced and the sharpness of the fused image can be improved. Taking a scene including a portrait as an example, the embodiment of the application can improve the definition of the portrait in the fused image.
For example, the electronic device 100 may include a tele camera and a main camera. In some embodiments, the electronic device 100 invokes the tele camera and the main camera to capture the same scene, obtaining a tele image and a main shot image, and processes the main shot image based on the tele image to improve the sharpness of the fused image. In other embodiments, the electronic device 100 invokes the tele camera and the main camera to record video of the same scene, obtaining original tele video frames and original main video frames, and processes each original main video frame based on the corresponding original tele video frame to improve the definition of the resulting fused video frame; this process is repeated, so that a fused video stream with higher definition can be obtained.
Not limited to a cell phone or tablet computer, the electronic device 100 may also be a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular telephone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device. The embodiment of the application does not limit the specific type of the electronic device.
Fig. 1A to 1D schematically illustrate a set of user interfaces displayed on an electronic device 100, and an application scenario for implementing the image processing method provided by the embodiment of the present application is specifically described below with reference to fig. 1A to 1D.
First, fig. 1A illustrates a user interface, i.e., home page, on an electronic device 100 that presents an installed application. As shown in FIG. 1A, one or more application icons are displayed in the main page, such as a "clock" application icon, a "calendar" application icon, a "weather" application icon, and so forth.
The one or more application icons include an icon for a "camera" application, i.e., icon 111. The electronic device 100 may detect a user operation, such as a click operation or the like, acting on the icon 111. In response to the user operation, the electronic device 100 may turn on the camera and invoke the camera to capture an image and display the user interface shown in fig. 1B.
In another implementation, the lock screen user interface of the electronic device 100 displays a "camera" icon. The electronic device 100 may detect a user operation on the "camera" icon, which may be, for example, an operation to click and pull up to the top of the lock screen user interface. In response to the user operation, the electronic device 100 may turn on the camera and invoke the camera to capture an image and display the user interface shown in fig. 1B.
When the electronic device 100 turns on the camera, one camera may be turned on by default, for example, a tele camera or a front camera may be turned on by default, or a plurality of cameras may be turned on by default, for example, a main camera and a tele camera may be turned on by default. Optionally, the electronic device 100 also turns on the tele camera when the focal length of the turned-on main camera is within a certain range.
Fig. 1B and 1C illustrate user interfaces when a "camera" application is running on electronic device 100. The user interface may include a window 121, a capture control 122, an image display control 123, and an inversion control 124.
The window 121 is used for displaying an image acquired by the camera, and may be understood as a display area of the acquired image, so that a user inputs a click operation to the photographing control 122 or the inversion control 124 according to a screen displayed by the window 121. In some embodiments, window 121 also includes controls, such as focus adjustment controls. Upon detecting a user operation acting on the focus adjustment control, the electronic device 100 may adjust the focus of the camera, for example, from 1x to 2x, in response to the user operation, without limiting the class of cameras herein.
In the case where the electronic device 100 defaults to turn on one camera, the image displayed by the window 121 is the image acquired by the camera. In the case where the electronic device 100 turns on two cameras, the image displayed by the window 121 may be an image acquired by the main camera, or may be an image obtained by fusing an image acquired by the main camera and an image acquired by the telephoto camera, depending on the processing capability of the electronic device 100, where the processing capability is required to be slightly lower.
Upon detecting a user operation on the photographing control 122, the electronic device 100 may, in response to the user operation, determine a photographed image from among the collected images and display a thumbnail of the photographed image in the image display control 123. For example, a main shot image is determined from the images acquired by the main camera, and a thumbnail of the main shot image is displayed in the image display control 123. For another example, a main shot image is determined from the images acquired by the main camera and a tele image is determined from the images acquired by the tele camera, where the frame time of the main shot image is the same as that of the tele image; the main shot image is processed based on the tele image to obtain a fused image, and a thumbnail of the fused image may be displayed in the image display control 123. When a user operation acting on the image display control 123 is detected, the fused image is displayed in the user interface shown in fig. 1D in response to that operation. The fused image may also be stored in the gallery; when a click operation on the fused image in the gallery is detected, the fused image may be displayed on the user interface shown in fig. 1D in response to the click operation.
When the electronic device 100 detects a user operation acting on the inversion control 124, the rear camera can be turned off, the front camera can be turned on, and the front camera can be called to collect images in response to the user operation; or the front camera can be closed, the rear camera can be opened, and the rear camera can be called to collect images.
The user interfaces shown in fig. 1B and 1C further include a "night view" mode, a "portrait" mode, a "photo" mode and a "video" mode; these modes are given as examples and do not limit the embodiments of the present application, and in practical applications a "multi-mirror video" mode may also be included, for example. The "video" mode is used for recording video files, and the "multi-mirror video" mode is used for recording video files when the front camera and the rear camera are started at the same time. The "night view" mode is used for shooting night scene images and improves the definition of the night scene images. The "portrait" mode is mainly used for shooting portrait images, and the image processing method provided by the embodiment of the application can be adopted in the "portrait" mode. The "photo" mode is used to photograph a single frame of image. In fig. 1B and 1C, the mode selected by the user is the "portrait" mode.
Upon detecting a user operation on the image display control 123, the electronic device 100 displays a user interface as shown in fig. 1D in response to the user operation. The user interface is a user interface for browsing pictures, and may include an image display window 131 for displaying images or videos. The image may be a fused image obtained by adopting the image processing method provided by the embodiment of the application, and the video may be a fused video obtained by adopting the image processing method provided by the embodiment of the application.
The user interface may also include a control 134, a share control 135, a favorites control 136, an edits control 133, a deletes control 137, and so forth.
The control 134 may be used to present detailed information of the image, such as time of capture, location of capture, color coding format, code rate, frame rate, pixel size, and so forth.
The sharing control 135 may be used to send the displayed image or video for use by other applications. For example, upon detecting a user operation on the sharing control, in response to the user operation, the electronic device 100 may display icons of one or more applications, including icons of social software 1. Upon detecting a user operation of an application icon acting on social software 1, in response to the user operation, electronic device 100 may send an image or video to social software 1, and further, the user may share the image or video with friends through the social software.
The collection control 136 may be used to mark images or videos. In the user interface shown in FIG. 1D, upon detecting a user operation on the favorites control, the electronic device 100 can mark the displayed image or video as a user-favorite image or video in response to the user operation. The electronic device 100 may generate an album for displaying images or videos that are marked as user likes.
The delete control 137 may be used to delete a displayed image or video.
By adopting the image processing method provided by the embodiment of the application, the image display control 123 can display the thumbnail of the fused image, or the fused image can be stored in the gallery. The image displayed by the image display window 131 in fig. 1D is a fused image. The fused image has higher definition than the main shot image, with the improvement particularly noticeable in the face region, and the color difference between the fused image and the main shot image is small.
For example, referring to the images shown in fig. 2A to fig. 2C, the image shown in fig. 2A is a main image captured by a main camera (i.e., a main image determined from an image captured by the main camera), the image shown in fig. 2B is a tele image captured by a tele camera (i.e., a tele image determined from an image captured by the tele camera), and the image shown in fig. 2C is a fused image obtained by using the image processing method provided by the embodiment of the present application. That is, the image display window 131 in fig. 1D displays the fused image shown in fig. 2C.
Because the pixel count of the main camera is higher than that of the tele camera, the tele camera has multiple-factor lossless zooming, the aperture of the tele camera is larger than that of the main camera, and so on, when the main camera and the tele camera shoot the same distant view, the main shot image contains more scenery and a smaller portrait than the tele image, and the definition of the portrait in the main shot image is lower than that of the portrait in the tele image; for example, the definition of the face in fig. 2B is far higher than that in fig. 2A. Since the focal length of the main camera is shorter than the focal length of the tele camera and the FOV of the main camera is greater than the FOV of the tele camera, fig. 2B can be understood as a close-up image of the scene and fig. 2A can be understood as a distant-view image of the scene. Comparing fig. 2A and 2C, the definition of fig. 2C is higher than that of fig. 2A. Although the definition of fig. 2C is not as good as that of fig. 2B, it is higher than that of fig. 2A.
In fig. 2A to 2C, the facial expression of the portrait is unchanged, but since the tele camera and/or the main camera are rotatable cameras, the shooting angles of the tele camera and the main camera may be different, for example, the shooting angles deviate by 10 °, so that the facial expression in fig. 2A may actually differ from the facial expression in fig. 2B. By adopting the embodiment of the application, even if the difference exists, the processing can be performed, so that the definition of the fusion image is improved.
The specific process by which the electronic device 100 implements the user interface shown in fig. 1D is described in detail below.
First, fig. 3 exemplarily shows a software architecture of the electronic device 100.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the invention, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
The layered architecture divides the software into several layers, each with distinct roles and divisions of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, from top to bottom: an application layer, an application framework layer, the Android runtime (ART) and system libraries, a hardware abstraction layer (Hardware Abstraction Layer, HAL), and a kernel layer.
The application layer may include a series of application packages. As shown in fig. 3, the application package may include camera, gallery, video, music, navigation, calendar, map, WLAN, etc. applications. In an embodiment of the present application, during operation of the camera, the user interface shown in fig. 1B to 1D may be provided, and the electronic device 100 may display a thumbnail of the fused image on the image display control 123, or may display the fused image on the image display window 131. After the camera is started, the electronic device 100 may call the camera to capture an image. The electronic device 100 comprises at least two cameras, for example one camera is a main camera and the other camera is a tele camera, which is rotatable.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions. In embodiments of the present application, the application framework layer includes a camera services framework, which may include an image processing model. Optionally, the image processing model processes the main shot clipping image and the tele clipping image, so that an enhanced feature image can be obtained. Optionally, the image processing model processes the main shot image and the tele image, and a fused image can be obtained. The image processing model may implement the functions of alignment processing, color mapping processing, and attention learning.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The Android runtime includes a core library and virtual machines. The Android runtime is responsible for scheduling and management of the Android system. The core library consists of two parts: one part is the functions that need to be called by the java language, and the other part is the core library of Android.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (e.g., openGL), etc. The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications. Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc. The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
A Hardware Abstraction Layer (HAL) is an interface layer located between the kernel layer and the hardware that can be used to control the actions of the hardware. The hardware abstraction layer may include a hardware module for controlling the actions of the hardware.
The kernel layer is the basis of the android system, for example, ART relies on the kernel layer to perform underlying functions, such as thread and low-level memory management, etc. The kernel layer is a layer between hardware and software. The kernel layer at least comprises display drive, camera drive, audio drive, sensor drive, GPU drive and the like. In the embodiment of the application, the kernel layer comprises a camera driver and a display driver, wherein the camera driver is used for driving the camera to acquire images, and the display driver is used for driving the display screen to display the fused images.
The image processing method provided by the embodiment of the application is described based on the software architecture shown in fig. 3. In some embodiments, the electronic device 100 detects a user operation acting on the camera application and, in response to the user operation, activates the camera to capture an image. Upon detecting the user operation, the camera application triggers an instruction to activate the camera. The camera application program calls an API interface of the application program framework layer to send the instruction to the camera service framework, the camera service framework calls the hardware abstraction layer to send the instruction to the camera driver, the camera driver can drive the starting of the camera and drive the camera to collect images, and the images collected by the camera can be cached in the image cache area.
When the electronic device 100 activates a camera, the camera or cameras may be activated. In the embodiment of the application, the main camera and the long-focus camera are started as examples, and the main camera and/or the long-focus camera can be rotatable cameras. Optionally, the electronic device 100 detects the user operation and immediately activates the main camera and the tele camera. Optionally, the electronic device 100 detects the user operation and the main camera is started, and may further start the tele camera. Optionally, the electronic device 100 detects the user operation and the focal length of the main camera is within a certain range, and may further activate the tele camera.
The primary camera may transmit the primary image to an image processing model in the camera service framework and the tele camera may transmit the tele image to an image processing model in the camera service framework. Optionally, the image processing model processes the main shot image and the tele image to obtain a fused image. Optionally, the camera service framework cuts the main shot image and the tele image respectively to obtain a main shot cut image and a tele cut image, the image processing model processes the main shot cut image and the tele cut image to obtain an enhanced feature image, and the image processing model or the camera service framework calls a post-processing algorithm to process the enhanced feature image to generate the fusion image. The camera service framework invokes the hardware abstraction layer to transmit the fused image to the display screen so that the display screen displays the fused image. The fusion image has higher definition than the main image.
Fig. 4 illustrates a flowchart of image processing by the electronic device 100 to obtain a fused image. In combination with the user interfaces shown in fig. 1A to 1D and the software architecture of the electronic device 100 shown in fig. 3, a flow of image processing performed by the electronic device 100 to obtain a fused image will be specifically described in the following embodiments of the present application.
The electronic device 100 determines 401 a main shot image and a tele image in response to a first operation of a shooting control.
The main shot image and the tele image are images of the same scene, and their shooting parameters differ, for example their focal lengths differ. In the embodiment of the present application, the main shot image and the tele image are taken as images of a target portrait, that is, any portrait; the main shot image may be, for example, the image shown in fig. 2A, and the tele image may be, for example, the image shown in fig. 2B. Because the pixel count of the main camera is higher than that of the tele camera, the tele camera has multiple-factor lossless zooming, the aperture of the tele camera is larger than that of the main camera, and so on, the main shot image contains more scenery and a smaller portrait than the tele image, and the definition of the portrait in the main shot image is lower than that of the portrait in the tele image; for example, the definition of the face in fig. 2B is far higher than that in fig. 2A. Since the focal length of the main camera is shorter than that of the tele camera and the FOV of the main camera is greater than that of the tele camera, fig. 2B can be understood as a close-up image and fig. 2A can be understood as a distant-view image. Fig. 2A and fig. 2B take a white background behind the target portrait as an example; in practical applications, a non-white background may be present in the image of the target portrait. The embodiment of the application mainly processes the target portrait region, and a similar method may also be used to process the background portion.
The photographing control is one of the controls in the camera application program, and upon receiving a click operation, the electronic device 100 may determine a preview image from at least one frame image acquired by the camera in response to the operation, and display a thumbnail of the preview image in an image display control of the camera application program. The electronic device 100 opens the camera application and detects whether a first operation is received on the capture control, such as a click operation on the capture control 122 in the user interface shown in fig. 1B. Upon receiving a click operation on the photographing control, the electronic apparatus 100 determines a main photographing image and a tele image.
Optionally, in response to the clicking operation, the electronic device 100 invokes the main camera and the tele camera to shoot the same shooting scene, so as to obtain a main shot image and a tele image.
Optionally, in response to the clicking operation, the electronic device 100 determines a main shot image from at least one frame image of a certain shooting scene acquired by the main camera, and determines a tele image from at least one frame image of the same shooting scene acquired by the tele camera. Upon receiving a user operation on the camera application, the electronic device 100 opens the camera application, which may trigger starting the cameras to capture images. Before the click operation on the shooting control is detected, the cameras keep acquiring images and buffering the acquired images in an image buffer. When the click operation on the shooting control is detected, in response to the click operation, the main shot image is determined from at least one image acquired by the main camera and cached in the image buffer, and the tele image is determined from at least one image acquired by the tele camera and cached in the image buffer. The opening of the camera application may be triggered by a user operation as shown in fig. 1A, or by a user operation on another application, such as a user operation on a shooting icon of an instant messaging application.
The electronic device 100 detects a user operation on the photographing control and the camera application may trigger a photographing instruction. The camera application program calls an API interface of the application program framework layer to send a shooting instruction to the camera service framework, and the camera service framework calls the camera interface to read the cached image from the image cache area and determine a main shooting image and a tele image from the cached image.
402, The electronic device 100 clips the main shot image and the tele image, resulting in a main shot clip image and a tele clip image.
Because the shooting effect of the shooting object in the long-focus image is better than that of the shooting object in the main shooting image, the shooting object in the main shooting image can be processed by referring to the shooting object in the long-focus image so as to improve the shooting effect. Further, the electronic device 100 cuts out the main shot image and the tele image, and cuts out the shot object to obtain a main shot cut image and a tele cut image. For example, the photographing effect of the person image in the tele image is better than that of the person image in the main photographing image, so that the person image can be cut out from the main photographing image and the person image can be cut out from the tele image. The shooting object is an example, and may be replaced by a partial shooting area, and the partial shooting area may be a designated shooting area, for example. The electronic apparatus 100 can recognize a photographing object or a part of a photographing region from the main photographing image and the tele image, and crop based on the recognition result to obtain a main photographing cropping image and a tele cropping image. The region in which the subject is located, or a partial photographing region, or the like can be regarded as a preset region.
Illustratively, the electronic device 100 clips the main shot image based on the face region of the target person image to obtain a main shot clip image, and clips the tele image based on the face region of the target person image to obtain a tele clip image. Thus, the main shot cut image and the tele cut image are images of the face area of the target portrait. The size of the main shot cut image and the tele cut image may be different, e.g., the size of the main shot cut image is smaller than the size of the tele cut image.
In the implementation process, the camera service framework can call the region recognition model to recognize preset regions in the main shot image and the tele image, for example, call the face recognition model to recognize human face regions in the main shot image and the tele image, and call the clipping function to clip the main shot image and the tele image based on the human face regions respectively, so as to obtain a main shot clipping image and a tele clipping image. The region identification model may be included in the image processing model or may be independent of the image processing model. The cropping function may be included in the image processing model or may be independent of the image processing model.
For example, referring to the exemplary diagram of cropping and alignment shown in fig. 5, the electronic device 100 recognizes the face areas of the person in the main-shot image and the tele image, and crops the main-shot image and the tele image based on the face areas of the person to obtain a main-shot cropping image and a tele cropping image, respectively.
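By way of illustration only, the following Python sketch shows how the cropping described above could be approximated with an off-the-shelf face detector. The OpenCV Haar cascade used here is an assumed stand-in for the region recognition model, which the embodiment does not specify, and the margin value is an illustrative assumption.

```python
import cv2

# Hypothetical stand-in for the region recognition model: an OpenCV Haar
# cascade face detector shipped with opencv-python.
_FACE_DETECTOR = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face_region(image, margin=0.3):
    """Detect the largest face and return the cropped face region (or None)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _FACE_DETECTOR.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detected face (assumed to be the target portrait).
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    # Expand the box by a margin so hair and chin are included in the crop.
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)
    x1 = min(x + w + dx, image.shape[1])
    y1 = min(y + h + dy, image.shape[0])
    return image[y0:y1, x0:x1]

# Usage: the two crops may have different sizes, as noted above.
# main_crop = crop_face_region(cv2.imread("main.jpg"))
# tele_crop = crop_face_region(cv2.imread("tele.jpg"))
```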
403, The electronic device 100 performs alignment processing on the tele clipping image based on the main shooting clipping image, to obtain a first intermediate image.
The electronic device 100 performs alignment processing on the tele cut image based on the main cut image to obtain a first intermediate image, which may be understood as transforming (warping) the tele cut image such that the first intermediate image obtained by the warp is aligned with the main cut image. By warping the tele cut image, features in the main cut image can be learned, which ensures that the facial area features of the first intermediate image do not change compared with those of the main shot image; for example, features such as the facial expression and the facial features in the first intermediate image are respectively aligned with those in the main shot image. Warp refers to transforming one image into the perspective of another image, or moving one image to a target position according to relative motion so that it is aligned with the other image. Warp may also be understood as alignment processing, which may include: 1) size alignment, that is, adjusting the size of the tele cut image so that the size of the first intermediate image is the same as that of the main cut image; and 2) pixel alignment, that is, rotating and/or translating each facial feature in the tele cut image based on the position of the corresponding facial feature in the main cut image, so that the first intermediate image is aligned with the main cut image at the pixel level.
Aligning the tele cut image based on the main cut image has the advantage that local motion can be covered, and expression differences between the main cut image and the tele cut image can be handled better.
For example, referring to the exemplary view of cropping and alignment shown in fig. 5, the electronic device 100 performs an alignment process on the tele cropping image based on the main cropping image, resulting in a first intermediate image. The first intermediate image is dimensionally and pixelwise aligned with the main crop image.
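The following is a minimal Python sketch of the size alignment and of a simplified, global pixel alignment, assuming an OpenCV ECC-based Euclidean warp as a stand-in for the per-facial-feature warp described above; the embodiment's actual alignment module is not specified and may differ.

```python
import cv2
import numpy as np

def align_tele_to_main(tele_crop, main_crop, iterations=200, eps=1e-6):
    """Warp the tele crop so it is size- and pixel-aligned with the main crop."""
    # 1) Size alignment: resize the tele crop to the main crop's resolution.
    h, w = main_crop.shape[:2]
    tele_resized = cv2.resize(tele_crop, (w, h), interpolation=cv2.INTER_AREA)

    # 2) Pixel alignment (global approximation): estimate a Euclidean
    #    (rotation + translation) warp with the ECC criterion.
    main_gray = cv2.cvtColor(main_crop, cv2.COLOR_BGR2GRAY)
    tele_gray = cv2.cvtColor(tele_resized, cv2.COLOR_BGR2GRAY)
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, iterations, eps)
    # Trailing inputMask/gaussFiltSize arguments follow the OpenCV 4.x binding.
    _, warp = cv2.findTransformECC(main_gray, tele_gray, warp,
                                   cv2.MOTION_EUCLIDEAN, criteria, None, 5)
    first_intermediate = cv2.warpAffine(
        tele_resized, warp, (w, h),
        flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
    return first_intermediate
```

A per-feature alignment as described in the text would instead warp each facial-feature region with its own estimated motion; the global warp above only illustrates the overall idea.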
404, The electronic device 100 performs color mapping processing on the first intermediate image based on the main cropping image, to obtain a second intermediate image.
The color mapping process may be implemented by the AdaIN algorithm. Performing color mapping processing on the first intermediate image based on the main cut image can be understood as migrating the color features of the main cut image onto the first intermediate image. For example, assuming that the color feature of the first intermediate image is i and the color feature of the main cut image is j, the AdaIN algorithm can migrate j to i, as in the following formula:
AdaIN(i, j) = σ(j) × ((i − μ(i)) / σ(i)) + μ(j)
where μ(j) represents the mean of the color features of the main cut image, σ(j) represents the standard deviation of the color features of the main cut image, μ(i) represents the mean of the color features of the first intermediate image, σ(i) represents the standard deviation of the features of the first intermediate image, and AdaIN(i, j) represents the color features of the second intermediate image.
Color differences exist between images shot by different cameras. To avoid changing the color of the fused image, color mapping is adopted, so that the color difference between the second intermediate image and the main cut image can be reduced, and the color difference between the fused image and the main shot image is reduced accordingly.
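A minimal PyTorch sketch of the AdaIN-style statistic transfer described by the formula above is shown below; the tensor shapes and the epsilon term are assumptions, not part of the embodiment.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Transfer channel-wise mean/std of style_feat onto content_feat.

    In the context above, content_feat would be the features of the first
    intermediate image (i) and style_feat the features of the main cut
    image (j); the result corresponds to the second intermediate image.
    """
    # content_feat, style_feat: tensors of shape (N, C, H, W)
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean
```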
405, The electronic device 100 performs attention learning on the main cut image based on the second intermediate image, resulting in an enhanced feature image.
Because the main camera and/or the tele camera are rotatable cameras, the shooting angle of the main camera may differ greatly from that of the tele camera; for example, the angles at which the face is shot may be different, and further, the expression, the positions of the facial features and the like of the face in the main shot image and in the tele image may be different. It is therefore necessary to find the corresponding facial regions so as to learn the detailed information of regions such as the facial features or the hair in a more targeted way. To avoid the situation where features of different areas of the main shot image and the tele image cannot be matched, attention learning is added. For example, suppose image A is a face image of the target portrait shot by the main camera head-on, and image B is a face image of the target portrait shot after the tele camera is rotated by 10 degrees, so that differences exist between the target portrait in image A and that in image B in terms of position, angle, expression and the like. Attention learning can learn the feature mapping of the regions with larger differences between image A and image B, so as to enhance the regions with larger differences in image A; this improves the definition of image A and embodies more details while keeping the features of image A unchanged.
The electronic device 100 performs attention learning on the main cut image based on the second intermediate image, that is, the electronic device 100 learns the features that differ greatly between the second intermediate image and the main cut image, and performs enhancement processing on these features in the main cut image, thereby obtaining the enhanced feature image. In other words, the main cut image learns from the second intermediate image to obtain the enhanced feature image. Because the second intermediate image is obtained by performing alignment processing and color mapping processing on the tele cut image, the definition of the second intermediate image is higher than that of the main cut image, the color difference between the second intermediate image and the main cut image is small, and the two images differ in angle, expression and the like; therefore, learning from the second intermediate image by the main cut image can improve the definition of the enhanced feature image and embody more details.
In one implementation, the electronic device 100 may input the second intermediate image and the main cut image into the attention learning module to obtain the enhanced feature image output by the attention learning module. The attention learning module may implement the functionality of the attention learning model, or be described as including the attention learning model. The attention learning model may perform focused learning on specific regions (for example, regions such as the facial features and the hair) where there is a difference between the second intermediate image and the main cut image, so as to increase the calculation weight of the features of the specific regions, which helps improve the definition of those regions and embody more of their details.
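As an illustrative sketch only, the attention learning module could be approximated with a cross-attention block in which the main-crop features query the second intermediate image features; the class name, head count and residual form below are assumptions, not the embodiment's actual network.

```python
import torch
import torch.nn as nn

class CrossAttentionEnhance(nn.Module):
    """Minimal cross-attention sketch of an attention learning module:
    main-crop features query the second intermediate image features, and
    the attended details are added back as an enhancement residual."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        # channels must be divisible by num_heads.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, main_feat, ref_feat):
        # main_feat, ref_feat: (N, C, H, W) feature maps from the encoder.
        n, c, h, w = main_feat.shape
        q = main_feat.flatten(2).transpose(1, 2)   # (N, H*W, C) queries
        kv = ref_feat.flatten(2).transpose(1, 2)   # (N, H*W, C) keys/values
        attended, _ = self.attn(self.norm(q), kv, kv)
        enhanced = q + attended                    # residual enhancement
        return enhanced.transpose(1, 2).reshape(n, c, h, w)
```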
Step 404 and step 405 may be implemented by any of the following means:
Mode 1: after the first intermediate image is input into an encoder, the encoder may output the features of the first intermediate image; after the main cut image is input into an encoder, the encoder may output the features of the main cut image. The features of the first intermediate image and the features of the main cut image are input into a color mapping module, which may output the features of the second intermediate image, where the features of the second intermediate image fuse the color features of the main cut image. The features of the second intermediate image and the features of the main cut image are input into an attention learning module, which may output the enhanced features of the main cut image. The enhanced features of the main cut image are input into a decoder, which may output the enhanced feature image.
Mode 2: after the first intermediate image is input into an encoder, the encoder may output the features of the first intermediate image; after the main cut image is input into an encoder, the encoder may output the features of the main cut image. The features of the first intermediate image and the features of the main cut image are input into a color mapping module, which may output the features of the second intermediate image, where the features of the second intermediate image fuse the color features of the main cut image. The features of the second intermediate image are input into a decoder, which may output the second intermediate image. The second intermediate image and the main cut image are input into an attention learning module, which may output the enhanced feature image.
Mode 3: after the first intermediate image is input into an encoder, the encoder may output the features of the first intermediate image; after the main cut image is input into an encoder, the encoder may output the features of the main cut image. The features of the first intermediate image and the features of the main cut image are input into a color mapping module, which may output the features of the second intermediate image, where the features of the second intermediate image fuse the color features of the main cut image. The features of the second intermediate image are input into a decoder, which may output the second intermediate image. The second intermediate image is input into an encoder, which may output the features of the second intermediate image. The features of the second intermediate image and the features of the main cut image are input into an attention learning module, which may output the enhanced features of the second intermediate image. The enhanced features of the second intermediate image are input into a decoder, which may output the enhanced feature image.
In modes 1 to 3, the encoder extracts the features of an image, and the decoder reconstructs an image based on features. The difference between mode 1 and mode 2 lies in the position of the decoder: in mode 1 the decoder is placed after the attention learning module, so the color mapping processing and the attention learning are performed on image features, which is easy to implement and has high processing efficiency; in mode 2 the decoder is placed before the attention learning module, so the attention learning is performed on images, which has higher processing complexity and lower processing efficiency. Mode 3 adds a pair of encoder and decoder on the basis of mode 1, and therefore has a more complex structure, higher cost and lower processing efficiency.
In modes 1 to 3, the first intermediate image and the main cut image are respectively input into an encoder; they may be input into the same encoder or into different encoders, and the encoder may use a trained visual geometry group (Visual Geometry Group, VGG) model. Correspondingly, the decoder may also employ a VGG model. The encoders in mode 3 may be the same encoder or different encoders.
For example, modes 1 to 3 can be seen from fig. 6, and fig. 6 exemplifies that the first intermediate image and the main cut image are input to different encoders, respectively.
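A hedged end-to-end sketch of mode 1 follows. The small convolutional encoder and decoder below stand in for the VGG-based ones mentioned above, and all layer sizes are illustrative assumptions rather than the embodiment's actual architecture.

```python
import torch
import torch.nn as nn

def adain(x, y, eps=1e-5):
    # Transfer channel-wise mean/std of y onto x (the color mapping module).
    xm, xs = x.mean((2, 3), keepdim=True), x.std((2, 3), keepdim=True) + eps
    ym, ys = y.mean((2, 3), keepdim=True), y.std((2, 3), keepdim=True) + eps
    return ys * (x - xm) / xs + ym

class Mode1Pipeline(nn.Module):
    """Sketch of mode 1: encode both crops, color-map the first intermediate
    features with the main-crop features, apply attention on features, and
    decode the enhanced features into the enhanced feature image."""

    def __init__(self, ch=64):
        super().__init__()
        # Stand-ins for the (e.g. VGG-based) encoder and decoder.
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
        self.decoder = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, main_crop, first_intermediate):
        f_main = self.encoder(main_crop)
        # Features of the second intermediate image (color-mapped features).
        f_ref = adain(self.encoder(first_intermediate), f_main)
        n, c, h, w = f_main.shape
        q = f_main.flatten(2).transpose(1, 2)
        kv = f_ref.flatten(2).transpose(1, 2)
        attended, _ = self.attn(q, kv, kv)
        enhanced = (q + attended).transpose(1, 2).reshape(n, c, h, w)
        return self.decoder(enhanced)  # enhanced feature image
```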
Steps 403 to 405 may be understood as the electronic device 100 determining the enhanced feature image based on the main cut image, the tele cut image, and the image processing model. That is, the electronic device 100 inputs the main cut image and the tele cut image into the image processing model and can obtain the enhanced feature image output by the image processing model. The image processing model is used for implementing the alignment processing, the color mapping processing and the attention learning. That is, the image processing model is used for performing alignment processing on the tele cut image based on the main cut image to obtain a first intermediate image; performing color mapping processing on the first intermediate image based on the main cut image to obtain a second intermediate image; and performing attention learning on the main cut image based on the second intermediate image to obtain the enhanced feature image. The image processing model can be understood as a network model that implements the functions of alignment processing, color mapping processing and attention learning. The image processing model takes the tele cut image as a reference and processes the main cut image to obtain an enhanced feature image corresponding to the main cut image.
In the implementation process, the image processing model is deployed in the camera service framework. The camera service framework inputs the main cut image and the tele cut image into the image processing model, and the image processing model processes the main cut image by taking the tele cut image as a reference, so that the image processing model can output the enhanced feature image.
406, The electronic device 100 generates a fused image based on the enhanced feature image.
Since the enhanced feature image is the enhanced image corresponding to the main cut image, the main cut image is a partial area image of the main shot image, and the other areas of the main shot image outside the main cut image are not processed by the electronic device 100, the enhanced feature image is fused into the main shot image to obtain the fused image. Fusing the enhanced feature image into the main shot image can be understood as pasting the enhanced feature image back into the main shot image and performing peripheral fusion processing during the paste-back, so that the enhanced feature image blends better into the main shot image. For example, the enhanced feature image is pasted back into the main shot image according to the cropping position of the main cut image in the main shot image, and peripheral fusion processing can be performed on the images surrounding the cropping position during the paste-back. Pasting the enhanced feature image back into the main shot image can be understood as replacing or overlaying the main cut image in the main shot image with the enhanced feature image. Peripheral fusion processing is a processing mode related to image stitching, which can reduce the differences at the stitched regions. Generating the fused image can also be understood as generating a preview image, the preview image being a fused image based on the main shot image and the tele image.
In the implementation process, the camera service framework may call a post-processing algorithm to fuse the enhanced feature image into the main shot image to obtain the fused image. The post-processing algorithms may include, but are not limited to, a paste-back algorithm and a peripheral fusion processing algorithm.
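As a rough illustration of the paste-back and peripheral fusion described above, the following sketch feathers the border of the enhanced crop before writing it back. The crop position arguments and feather width are assumptions, and the crop is assumed to lie fully inside the main image; a production implementation could instead use seamless cloning or the embodiment's own fusion algorithm.

```python
import cv2
import numpy as np

def paste_back(main_image, enhanced_crop, x, y, feather=15):
    """Paste the enhanced crop back into the main shot image at (x, y),
    feathering the border as a simple peripheral fusion."""
    h, w = enhanced_crop.shape[:2]
    fused = main_image.copy()
    roi = fused[y:y + h, x:x + w].astype(np.float32)

    # Blend mask: 1 in the crop interior, ramping down to 0 at the border.
    mask = np.ones((h, w), np.float32)
    mask[:feather, :] = 0.0
    mask[-feather:, :] = 0.0
    mask[:, :feather] = 0.0
    mask[:, -feather:] = 0.0
    mask = cv2.GaussianBlur(mask, (2 * feather + 1, 2 * feather + 1), 0)
    mask = mask[..., None]

    blended = mask * enhanced_crop.astype(np.float32) + (1.0 - mask) * roi
    fused[y:y + h, x:x + w] = np.clip(blended, 0, 255).astype(np.uint8)
    return fused
```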
407, The electronic device 100 displays the fused image.
Optionally, the electronic device 100 displays the fused image in the image display area. The image display area may be, for example, a display area shown by the image display control 123 in fig. 1B or 1C, or a display area shown by the image display window 131 in fig. 1D. Compared with the main shooting image, the fusion image has more obvious detail and higher definition of the face area.
In the implementation process, the camera service framework calls the hardware abstraction layer to transmit the fusion image to the display driver, and the display driver receives the fusion image and drives the display screen to display the fusion image.
In the embodiment shown in fig. 4, the color difference between the fused image and the main shot image can be reduced by the color mapping processing, and the definition and detail of the fused image can be improved through the attention learning processing.
In the embodiment shown in fig. 4, the electronic device 100 has the capabilities of alignment processing, color mapping processing and attention learning. An image processing model is deployed in the electronic device 100 to provide the alignment processing, color mapping processing and attention learning capabilities, that is, to perform steps 403 to 405. The camera service framework of the electronic device 100 may deploy the image processing model. The image processing model deployed by the electronic device 100 may come from a server. That is, in the process of training the model, the server may obtain an image processing model that satisfies the deployment condition and send the parameters of the model to the electronic device 100, so that the electronic device 100 deploys the image processing model based on the parameters. The electronic device 100 may then invoke the image processing model to process the main cut image based on the tele cut image to obtain the enhanced feature image.
The training process of the image processing model is described below, and please refer to the flowchart shown in fig. 7. The flow chart shown in fig. 7 may include:
1, the training image 1 is cut into a training cut image 1, and the training image 2 is cut into a training cut image 2.
Training image 1 and training image 2 are training images of the same shooting scene in the training image set, for example, training images of a target portrait. Training image 1 may serve as the training counterpart of the tele image, and training image 2 as the training counterpart of the main shot image. It will be appreciated that training image 1 and training image 2 have the same image identifier (ID), which may be used to identify images of the same shooting scene with different shooting parameters. Images having the same image ID are shot in the same scene and have the same shot subject, but their shooting parameters such as shooting angle, focal length and color are different.
The manner of acquiring the training image set may be any of the following:
Mode A, a public data set (for example, VGGFace2) is subjected to data screening to screen out a training image set having the same image ID, and the training image set is differentiated to obtain at least one image that differs in shooting angle, focal length, color and the like.
Mode B, a training image set is constructed based on the public data set. For example, if the public data set has only one image for a certain image ID, at least one image differing in shooting angle, focal length, color or the like may be constructed based on that image, thereby generating a training image set for the image. Alternatively, the training image set may be constructed using rendering (e.g., Rotate-and-Render).
Mode C, data degradation. For a given image, a network learning method (for example, DSN network learning, full-image or patch-split training, etc.) is used, and at least one of down/up-sampling, JPEG compression, Gaussian blur, a sinc filter, random small-range distortion and the like is superimposed to degrade the image, so as to obtain at least one degraded image. The original image may be taken as the tele image and the degraded image as the main shot image. The image may be an actually shot image; the data degradation simulates the degradation of an actually shot image (see the sketch following this list).
The modes a to C may be executed by a server or may be executed manually, that is, the developer selects a training image set and inputs it into the server.
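The following sketch illustrates one possible mode C degradation pipeline; the specific scale, blur sigma and JPEG quality values are illustrative assumptions.

```python
import cv2
import numpy as np

def degrade(image, scale=4, jpeg_quality=40, blur_sigma=1.5):
    """Simulate a lower-quality 'main shot' crop from a sharp 'tele' crop by
    stacking down/up-sampling, Gaussian blur and JPEG compression."""
    h, w = image.shape[:2]
    # Down/up-sampling removes high-frequency detail.
    small = cv2.resize(image, (w // scale, h // scale),
                       interpolation=cv2.INTER_AREA)
    degraded = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    # Gaussian blur approximates optical softness.
    degraded = cv2.GaussianBlur(degraded, (0, 0), blur_sigma)
    # JPEG round-trip introduces realistic coding artifacts.
    _, buf = cv2.imencode(".jpg", degraded,
                          [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    degraded = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return degraded  # use (image, degraded) as a (tele, main) training pair
```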
2, Inputting the training clipping image 1 and the training clipping image 2 into an alignment module, performing alignment processing on the training clipping image 1 based on the training clipping image 2, and outputting a first training image by the alignment module.
The alignment processing of the training cut image 1 based on the training cut image 2 may be that the training cut image 1 is transformed such that the transformed first training image is aligned with the training cut image 2. The alignment may include, among other things, size alignment and pixel alignment. The size alignment may refer to the specific description of "1) the size alignment" above, and the pixel alignment may refer to the specific description of "2) the pixel alignment" above, and thus, will not be described again.
3, Inputting the first training image into an encoder, wherein the encoder can output the characteristics (including color characteristics) of the first training image; the training crop image 2 is input to an encoder, which may output features (including color features) of the training crop image 2.
The encoder for inputting the first training image may be the same encoder as the encoder for inputting the training clip image 2, or may be a different encoder. The encoder may employ a VGG model.
And 4, inputting the features of the first training image and the features of the training clipping image 2 into a color mapping module for performing color mapping processing, wherein the color mapping module can output the features of the second training image. The features of the second training image blend the color features of the training crop image 2.
The color mapping module may perform color mapping processing by adopting AdaIN algorithm, and blend the color features of the training cut image 2 into the features of the second training image, so as to reduce the color difference between the second training image and the training cut image 2.
For example, assuming that the color feature of the first training image is m and the color feature of training cut image 2 is n, the AdaIN algorithm can migrate n to m, as in the following formula:
AdaIN(m, n) = σ(n) × ((m − μ(m)) / σ(m)) + μ(n)
where μ(n) represents the mean of the color features of training cut image 2, σ(n) represents the standard deviation of the color features of training cut image 2, μ(m) represents the mean of the color features of the first training image, σ(m) represents the standard deviation of the features of the first training image, and AdaIN(m, n) represents the color features of the second training image.
And 5, the features of training cut image 2 and the features of the second training image are input into the attention learning module for attention learning; the attention learning module may output the enhanced features of training cut image 2.
That is, the attention learning module learns the features that differ greatly between the second training image and training cut image 2, and performs enhancement processing on these features in training cut image 2, thereby obtaining the enhanced features.
And 6, the enhanced features of training cut image 2 are input into a decoder, which can output an enhanced training image.
The decoder may employ a VGG model. The decoder is used to decode the features to obtain an image, i.e. to decode the enhanced features of the training cut image 2 to obtain an enhanced training image.
7, Generating a fused training image based on the enhanced training image. That is, according to the cropping position of training cut image 2 in training image 2, the enhanced training image is pasted back into training image 2, and peripheral fusion processing can be performed on the images surrounding the cropping position during the paste-back.
Since the enhanced training image is the enhanced image corresponding to the training cut image 2, the training cut image 2 is a partial area image of the training image 2, and the other area images except the training cut image 2 in the training image 2 are not processed, the enhanced training image is fused into the training image 2 to obtain a fused training image.
Steps 2 to 7 may be performed by a server. Steps 2 to 7 can be the training process of the training model, that is, after training cut image 1 and training cut image 2 are input into the training model, the fused training image output by the training model can be obtained. Alternatively, steps 1 to 7 can be the training process of the training model, that is, after training image 1 and training image 2 are input into the training model, the fused training image output by the training model can be obtained. When outputting the fused training image, the training model also outputs at least one of the following loss functions: an L1 loss based on the full image; a loss containing prior keypoints of a specific region (e.g., the face), i.e., a landmark loss; a perceptual loss; a generative adversarial network (GAN) loss capable of generating detailed textures; a color loss; a random up/down-sampling loss (i.e., a random up loss and/or a random down loss); an artifacts map loss for avoiding excessive flaws generated by the GAN loss; a multi-scale loss of the feature output of each layer; and the like. The L1 loss based on the full image refers to a loss function between the fused training image and a target image, where the target image is the image corresponding to training image 2 and is used to measure whether the training model meets the requirements. The multi-scale loss of each layer's feature output refers to the loss between the features output by each convolution layer of the encoder after the fused training image and the target image are input into the same encoder. The multi-scale loss of each layer's feature output is used to supervise the training model in generating intermediate features (e.g., the features of the first training image, the features of the second training image, and the enhanced features of training cut image 2).
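As an illustrative sketch of how a few of the listed losses might be combined, the snippet below implements only the full-image L1 loss, a simple color loss on channel means, and a generator-side GAN loss. The weights, the color-loss definition and the discriminator interface are assumptions; the remaining losses (landmark, perceptual, artifacts map, multi-scale) would require their own networks and are omitted.

```python
import torch
import torch.nn.functional as F

def training_loss(fused, target, disc_fake_logits=None,
                  w_l1=1.0, w_color=0.5, w_gan=0.05):
    """Combine a full-image L1 loss, a color loss and an optional GAN loss."""
    # Full-image L1 loss between the fused training image and the target image.
    loss = w_l1 * F.l1_loss(fused, target)
    # Color loss: match per-channel means of the fused and target images.
    loss = loss + w_color * F.l1_loss(fused.mean(dim=(2, 3)),
                                      target.mean(dim=(2, 3)))
    if disc_fake_logits is not None:
        # Non-saturating generator loss against a discriminator's logits.
        loss = loss + w_gan * F.binary_cross_entropy_with_logits(
            disc_fake_logits, torch.ones_like(disc_fake_logits))
    return loss
```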
In the case where the above-described loss functions each satisfy the deployment condition, the server may transmit the parameters of the training model to the electronic device 100 so that the electronic device 100 deploys the image processing model based on the parameters. Alternatively, the electronic device 100, after deploying the image processing model, inputs the main cut image and the tele cut image into the image processing model, which may output the enhanced feature image. Alternatively, the electronic apparatus 100 inputs the main image and the tele image into the image processing model after deploying the image processing model, and the image processing model may output the fused image.
The above description takes as an example a server performing the model training and sending the parameters of a training model that meets the deployment conditions to the electronic device. Besides a server, a PC may also perform the model training and send the parameters of a training model that meets the deployment conditions to the electronic device. Other devices may likewise be used for model training, which is not limited in the embodiments of the present application.
Alternatively, the parameters of the image processing model may be preset on the electronic device. For example, the parameters of the image processing model may be preset when the electronic device leaves the factory, so that the image processing model may be deployed based on the parameters of the image processing model when the electronic device is turned on or when the camera application is first used; or the electronic equipment can deploy the image processing model based on the parameters of the image processing model when receiving the triggering operation of deploying the image processing model; etc.
After the electronic device deploys the image processing model, the electronic device may receive update parameters from a server, for updating the image processing model, or patches for repairing or upgrading the image processing model, or the like. Alternatively, the electronic device may self-repair the image processing model after deployment of the image processing model. For example, the electronic device may self-repair the image processing model upon a system update of the electronic device.
Fig. 8 exemplarily shows a hardware configuration diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio processing module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an AP, a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processor (NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors. In an embodiment of the present application, the processor 110 may deploy an image processing model and invoke the image processing model.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor.
The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including wireless local area network (WLAN) (e.g., wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 may communicate with networks and other devices through wireless communication techniques. The wireless communication techniques may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or the satellite based augmentation systems (SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information. In an embodiment of the present application, the display screen 194 is used to display the fused image.
The internal memory 121 may include one or more random access memories (random access memory, RAM) and one or more non-volatile memories (NVM).
The random access memory may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM; e.g., fifth-generation DDR SDRAM is commonly referred to as DDR5 SDRAM), etc. The nonvolatile memory may include a disk storage device and a flash memory.
The random access memory may be read directly from and written to by the processor 110, may be used to store executable programs (e.g., machine instructions) for an operating system or other on-the-fly programs, may also be used to store data for users and applications, and the like.
The nonvolatile memory may store executable programs, store data of users and applications, and the like, and may be loaded into the random access memory in advance for the processor 110 to directly read and write.
The external memory interface 120 may be used to connect external non-volatile memory to enable expansion of the memory capabilities of the electronic device 100. The external nonvolatile memory communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and video are stored in an external nonvolatile memory.
The electronic device 100 may implement audio functions through an audio processing module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio processing module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio processing module 170 may also be used to encode and decode audio signals. In some embodiments, the audio processing module 170 may be disposed in the processor 110, or some functional modules of the audio processing module 170 may be disposed in the processor 110.
The speaker 170A, also known as a "horn", allows the electronic device 100 to play music or conduct hands-free calls. The receiver 170B, also known as an "earpiece", can be placed close to the human ear to receive voice when the electronic device 100 answers a call or a voice message. The microphone 170C, also known as a "mic" or "mike", receives sound: when making a call or sending voice information, the user can speak close to the microphone 170C to input a sound signal into it.
The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The gyro sensor 180B may be used to determine a motion gesture of the electronic device 100. The air pressure sensor 180C is used to measure air pressure. The magnetic sensor 180D includes a hall sensor. The electronic device 100 may detect the opening and closing of the flip cover using the magnetic sensor 180D. The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). A distance sensor 180F for measuring a distance. The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The ambient light sensor 180L is used to sense ambient light level. The fingerprint sensor 180H is used to collect a fingerprint. The temperature sensor 180J is for detecting temperature. The touch sensor 180K, also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The bone conduction sensor 180M may acquire a vibration signal.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195 to enable contact and separation with the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, micro SIM cards, and the like.
The term "User Interface (UI)" in the description and claims of the present application and in the drawings is a media interface for interaction and information exchange between an application program or an operating system and a user, which enables conversion between an internal form of information and a form acceptable to the user. The user interface of the application program is a source code written in a specific computer language such as java, extensible markup language (extensible markup language, XML) and the like, the interface source code is analyzed and rendered on the terminal equipment, and finally the interface source code is presented as content which can be identified by a user, such as a control of pictures, words, buttons and the like. Controls (controls), also known as parts (widgets), are basic elements of a user interface, typical controls being a toolbar (toolbar), menu bar (menu bar), text box (text box), button (button), scroll bar (scrollbar), picture and text. The properties and content of the controls in the interface are defined by labels or nodes, such as XML specifying the controls contained in the interface by nodes < Textview >, < ImgView >, < VideoView >, etc. One node corresponds to a control or attribute in the interface, and the node is rendered into visual content for a user after being analyzed and rendered. In addition, many applications, such as the interface of a hybrid application (hybrid application), typically include web pages. A web page, also referred to as a page, is understood to be a special control embedded in an application program interface, and is source code written in a specific computer language, such as hypertext markup language (hyper text markup language, GTML), cascading style sheets (CASCADING STYLE SHEETS, CSS), java script (JavaScript, JS), etc., and the web page source code may be loaded and displayed as user-recognizable content by a browser or web page display component similar to the browser function. The specific content contained in a web page is also defined by tags or nodes in the web page source code, such as GTML defines elements and attributes of the web page by < p >, < img >, < video >, < canvas >.
A commonly used presentation form of a user interface is a graphical user interface (graphic user interface, GUI), which refers to a graphically displayed user interface that is related to computer operations. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
As used in the specification of the present application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this disclosure refers to and encompasses any and all possible combinations of one or more of the listed items. As used in the above embodiments, the term "when …" may be interpreted to mean "if …", "after …", "in response to determining …" or "in response to detecting …", depending on the context. Similarly, the phrase "when it is determined …" or "if (a stated condition or event) is detected" may be interpreted to mean "if it is determined …", "in response to determining …", "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
Those of ordinary skill in the art will appreciate that implementing all or part of the above-described method embodiments may be accomplished by a computer program to instruct related hardware, the program may be stored in a computer readable storage medium, and the program may include the above-described method embodiments when executed. And the aforementioned storage medium includes: ROM or random access memory RAM, magnetic or optical disk, etc.
Claims (12)
1. An image processing method, applied to an electronic device comprising a main camera and a tele camera, wherein the main camera and/or the tele camera is a rotatable camera, the method comprising:
starting a camera application;
receiving a first operation on a shooting control in the camera application;
determining a main cut image and a tele cut image in response to the first operation;
determining an enhanced feature image based on the main cut image, the tele cut image, and an image processing model;
generating a fused image based on the enhanced feature image; and
displaying the fused image.
2. The method of claim 1, wherein the determining a main cut image and a tele cut image comprises:
determining a main image from images acquired by the main camera, and determining a tele image from images acquired by the tele camera; and
cutting the main image and the tele image to obtain the main cut image and the tele cut image.
3. The method of claim 2, wherein the cutting the main image and the tele image to obtain the main cut image and the tele cut image comprises:
invoking a region identification model to identify preset regions in the main image and the tele image, and invoking a cutting function to cut the main image and the tele image based on the preset regions to obtain the main cut image and the tele cut image.
4. The method of claim 2, wherein the generating a fused image based on the enhanced feature image comprises:
generating the fused image based on the enhanced feature image and the main image.
5. The method of any one of claims 1-4, wherein the determining an enhanced feature image based on the main cut image, the tele cut image, and an image processing model comprises:
inputting the main cut image and the tele cut image into the image processing model to obtain the enhanced feature image output by the image processing model.
6. The method of claim 5, wherein the image processing model is specifically configured to: perform alignment processing on the tele cut image based on the main cut image to obtain a first intermediate image; perform color mapping processing on the first intermediate image based on the main cut image to obtain a second intermediate image; and perform attention learning on the main cut image based on the second intermediate image to obtain the enhanced feature image.
7. The method of claim 6, wherein features of the first intermediate image are aligned with features of the main cut image.
8. The method of claim 6, wherein color features of the main cut image are fused into the second intermediate image.
9. The method of claim 5, wherein the method further comprises:
receiving parameters of the image processing model, and deploying the image processing model based on the parameters of the image processing model.
10. An electronic device, comprising one or more processors and one or more memories, wherein the one or more memories are coupled to the one or more processors and are configured to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the method of any one of claims 1-9 to be performed.
11. A chip system, applied to an electronic device, the chip system comprising one or more processors configured to invoke computer instructions to cause the electronic device to perform the method of any one of claims 1-9.
12. A computer-readable storage medium comprising instructions which, when run on an electronic device, cause the method of any one of claims 1-9 to be performed.
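To illustrate the data flow recited in claims 1 to 8, the following is a minimal sketch in Python with NumPy. All function names (crop_to_region, align, color_map, attention_enhance, fuse), the preset region, and the simple pixel operations are hypothetical placeholders introduced here for readability only; in the claimed method, the alignment, color mapping, and attention learning are performed by a trained image processing model, not by the fixed operations shown below.

```python
import numpy as np


def crop_to_region(image, region):
    """Cut out a preset region given as (top, left, height, width)."""
    top, left, height, width = region
    return image[top:top + height, left:left + width]


def resize_nearest(image, height, width):
    """Nearest-neighbour resize, used only as a crude stand-in for warping."""
    rows = (np.arange(height) * image.shape[0] / height).astype(int)
    cols = (np.arange(width) * image.shape[1] / width).astype(int)
    return image[rows][:, cols]


def align(tele_cut, main_cut):
    """Alignment processing: first intermediate image whose features line up
    with the main cut image (claims 6 and 7). A real model would estimate and
    apply a dense flow or homography instead of a plain resize."""
    return resize_nearest(tele_cut, main_cut.shape[0], main_cut.shape[1])


def color_map(first_intermediate, main_cut):
    """Color mapping: second intermediate image into which color features of
    the main cut image are fused (claims 6 and 8), shown here as a simple
    per-channel mean/std transfer."""
    src = first_intermediate.astype(np.float64)
    ref = main_cut.astype(np.float64)
    out = (src - src.mean(axis=(0, 1))) / (src.std(axis=(0, 1)) + 1e-6)
    out = out * ref.std(axis=(0, 1)) + ref.mean(axis=(0, 1))
    return np.clip(out, 0, 255).astype(np.uint8)


def attention_enhance(main_cut, second_intermediate):
    """Attention learning on the main cut image based on the second
    intermediate image, yielding the enhanced feature image (claim 6). The
    'attention map' here is just a normalised detail-difference weight."""
    diff = np.abs(second_intermediate.astype(np.float64) - main_cut.astype(np.float64))
    weight = diff / (diff.max() + 1e-6)
    enhanced = (1 - weight) * main_cut + weight * second_intermediate
    return np.clip(enhanced, 0, 255).astype(np.uint8)


def fuse(enhanced, main_image, region):
    """Generate the fused image by pasting the enhanced region back into the
    main image (claims 1 and 4)."""
    top, left, height, width = region
    fused = main_image.copy()
    fused[top:top + height, left:left + width] = enhanced
    return fused


if __name__ == "__main__":
    # Dummy frames standing in for the main camera and tele camera outputs.
    main_image = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
    tele_image = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
    region = (300, 700, 480, 640)  # hypothetical preset region (top, left, h, w)

    main_cut = crop_to_region(main_image, region)
    tele_cut = crop_to_region(tele_image, region)
    first = align(tele_cut, main_cut)
    second = color_map(first, main_cut)
    enhanced = attention_enhance(main_cut, second)
    fused = fuse(enhanced, main_image, region)
    print(fused.shape)
```

The sketch only mirrors the order of operations: crop both captures to the same preset region, align the tele cut image to the main cut image, transfer the main cut image's color characteristics, weight the detail with a simple attention-like map, and paste the enhanced region back into the main image before display.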
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311406139.1A CN118450269A (en) | 2023-10-25 | 2023-10-25 | Image processing method and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311406139.1A CN118450269A (en) | 2023-10-25 | 2023-10-25 | Image processing method and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118450269A (en) | 2024-08-06
Family
ID=92332143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311406139.1A Pending CN118450269A (en) | 2023-10-25 | 2023-10-25 | Image processing method and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118450269A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113538310A (en) * | 2021-07-15 | 2021-10-22 | 深圳市慧鲤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113810598A (en) * | 2021-08-11 | 2021-12-17 | 荣耀终端有限公司 | Photographing method and device |
CN115379075A (en) * | 2021-05-18 | 2022-11-22 | 中兴通讯股份有限公司 | Moon shooting method and device, storage medium and shooting equipment |
WO2023015981A1 (en) * | 2021-08-12 | 2023-02-16 | 荣耀终端有限公司 | Image processing method and related device therefor |
CN116580085A (en) * | 2023-03-13 | 2023-08-11 | 联通(上海)产业互联网有限公司 | Deep learning algorithm for 6D pose estimation based on attention mechanism |
US20230325994A1 (en) * | 2020-07-08 | 2023-10-12 | Hangzhou Ezviz Software Co., Ltd. | Image fusion method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020073959A1 (en) | Image capturing method, and electronic device | |
US12096120B2 (en) | Photographing method in telephoto scenario and mobile terminal | |
WO2021093793A1 (en) | Capturing method and electronic device | |
CN112712470B (en) | Image enhancement method and device | |
WO2021052111A1 (en) | Image processing method and electronic device | |
CN113949803B (en) | Photographing method and electronic equipment | |
CN114866860B (en) | Video playing method and electronic equipment | |
CN114222187B (en) | Video editing method and electronic equipment | |
CN115291779A (en) | Window control method and device | |
US20230014272A1 (en) | Image processing method and apparatus | |
CN114690998B (en) | Picture processing method and electronic equipment | |
CN118450269A (en) | Image processing method and electronic device | |
CN117336597B (en) | Video shooting method and related equipment | |
CN115802144B (en) | Video shooting method and related equipment | |
CN116193275B (en) | Video processing method and related equipment | |
CN117692714B (en) | Video display method, electronic device, computer program product, and storage medium | |
CN116757963B (en) | Image processing method, electronic device, chip system and readable storage medium | |
CN117764853B (en) | Face image enhancement method and electronic equipment | |
CN117479008B (en) | Video processing method, electronic equipment and chip system | |
CN116522400B (en) | Image processing method and terminal equipment | |
CN117499797B (en) | Image processing method and related equipment | |
CN116055861B (en) | Video editing method and electronic equipment | |
CN118447071A (en) | Image processing method and electronic device | |
CN113452895A (en) | Shooting method and equipment | |
CN117692723A (en) | Video editing method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||