
CN116091572B - Method for acquiring image depth information, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116091572B
CN116091572B (application number CN202211290478.3A)
Authority
CN
China
Prior art keywords
point
coefficient
image
weight
coefficients
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211290478.3A
Other languages
Chinese (zh)
Other versions
CN116091572A (en)
Inventor
高旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202211290478.3A
Publication of CN116091572A
Application granted
Publication of CN116091572B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2210/00 Indexing scheme for image generation or computer graphics
    • G06T 2210/61 Scene description

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

A method for acquiring image depth information, an electronic device and a storage medium. The method comprises the following steps: acquiring a plurality of first weight coefficients from a plurality of feature points in an image to a target point; determining a bias coefficient corresponding to the target point, negatively correlated with the dispersion, according to the dispersion of the plurality of first weight coefficients; adjusting the plurality of first weight coefficients using the bias coefficient; and acquiring a transformation coefficient of the target point according to the adjusted first weight coefficients, and further acquiring the image depth information of the target point. When the target point is in a partial vacancy area, each of the plurality of feature points has little influence on the transformation coefficient of the target point, the acquired first weight coefficients are almost 0, their degree of dispersion is low, and the bias coefficient is large. Adjusting the first weight coefficients with a larger bias coefficient averages the influence of each feature point on the image on the target point, reduces the loss of accuracy of the transformation coefficient of the target point caused by the partial vacancy, and improves the accuracy of the acquired image depth information.

Description

Method for acquiring image depth information, electronic equipment and storage medium
Technical Field
The present application relates to the field of augmented reality technologies, and in particular, to a method for acquiring image depth information, an electronic device, and a storage medium.
Background
Augmented Reality (AR) is a technology in which the real world is supplemented with generated virtual information, so that the virtual information and the real world appear to coexist in the same space. Currently, most AR applications running on electronic devices simply superimpose a virtual scene in front of a real scene, but this simple superimposition fails to correctly handle the occlusion relationship between the virtual scene and the real scene and may give the user an erroneous spatial perception.
Therefore, in order to strengthen the sense of reality of virtual objects in the real scene and enable the user to form correct spatial perception in AR applications, correctly processing the virtual-real occlusion relationship between the virtual objects in AR and the real scene is of great significance. Virtual-real occlusion processing in AR generally adopts means such as depth extraction: the occlusion contour of a foreground object in the real image is acquired and the depth information of the foreground object is recovered to generate a virtual-real fused image with a correct occlusion relationship.
Restoring scene object depth information generally refers to acquiring an absolute depth map of a scene from a monocular image. That is, the monocular image is processed with a relative depth model, such as a convolutional neural network (convolutional neural network, CNN) model, to obtain a relative depth map, where the monocular image is an image of the same scene. Then, combined with feature points from a simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) system, an absolute depth map of the scene is obtained through translation, rotation and scaling transformations. However, because the SLAM feature points may be unevenly distributed, for example a partial vacancy may occur, the error between the obtained absolute depth map of the scene and the true values becomes large, forming an incorrect virtual-real occlusion relationship, which reduces the sense of reality of the virtual scene added to the real scene by the AR application and may even affect the user's correct spatial perception.
Disclosure of Invention
In order to solve the above problems, the present application provides a method, an electronic device, and a storage medium for obtaining image depth information, which are used for obtaining correct image depth information, and further obtaining correct virtual-real occlusion relationship, so as to improve the sense of reality of a virtual scene added in a real scene by an AR application, and enable a user to generate correct spatial perception.
In a first aspect, the present application provides a method of acquiring image depth information, the method comprising:
acquiring a plurality of first weight coefficients from a plurality of feature points to a target point in an image; a first weight coefficient represents the influence degree of one characteristic point in the plurality of characteristic points on the transformation coefficient of the target point, and the transformation coefficient represents the translation size and the scaling size;
determining bias coefficients corresponding to the target points according to the dispersion of the first weight coefficients; the dispersion represents the dispersion degree of a plurality of first weight coefficients, and the bias coefficient and the dispersion are in negative correlation; adjusting a plurality of first weight coefficients using the bias coefficients;
obtaining a transformation coefficient of the target point according to the adjusted first weight coefficients; and acquiring image depth information of the target point according to the transformation coefficient of the target point.
According to the scheme provided by the application, for a region without effective feature points, such as a partial vacancy region, each of the plurality of feature points has little influence on the transformation coefficient of the target point, and the acquired first weight coefficients are almost 0. In this case the degree of dispersion of the first weight coefficients is low and the acquired bias coefficient is large, so the first weight coefficients are adjusted with a large bias coefficient. For a target point in other regions, each of the plurality of feature points has a large influence on the transformation coefficient of the target point, and the acquired first weight coefficients have a high degree of dispersion, so the bias coefficient is small and the first weight coefficients are adjusted with a small bias coefficient. In this way, the influence of each feature point on the image on the target point is averaged, which alleviates the problem that a partial vacancy lowers the accuracy of the transformation coefficient acquired for the target point, and improves the accuracy of the acquired image depth information.
In one possible implementation, the dispersion is a standard deviation, and the bias coefficient corresponding to the target point is determined from the standard deviation of the plurality of first weight coefficients based on a preset mapping relationship between the bias coefficient and the standard deviation. The bias coefficient is a number in the range of 0 to 1.
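As an illustration of this implementation, the sketch below assumes an exponential-decay mapping from the standard deviation to a bias coefficient in (0, 1]; the text only requires a preset, negatively correlated mapping, so the exact function and the parameter alpha are assumptions.

```python
import numpy as np

def bias_coefficient(first_weights, alpha=1.0):
    # Standard deviation measures the dispersion of the first weight coefficients.
    std = float(np.std(first_weights))
    # Negatively correlated mapping into (0, 1]: std = 0 (e.g. a partial vacancy,
    # all weights near 0) gives a bias of 1; a large std gives a bias near 0.
    return float(np.exp(-alpha * std))
```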
In one possible implementation, a distance from each of a plurality of feature points to a target point is obtained; acquiring a first weight coefficient according to the distance from each characteristic point to the target point; the first weight coefficient and the distance are in negative correlation, and the first weight coefficient is a number in the range of 0 to 1.
In one possible implementation, a first weight coefficient is obtained according to the distance between each feature point and the target point and the confidence of each feature point. The first weight coefficient is positively correlated with the confidence, that is, the greater the confidence, the greater the first weight coefficient, where the confidence represents the probability that the true value of each feature point falls around its measurement. The greater the probability, the higher the confidence and the greater the first weight coefficient; the smaller the probability, the lower the confidence and the smaller the first weight coefficient.
In one possible implementation manner, according to the adjusted first weight coefficient, a diagonal matrix taking the first weight coefficient as a main diagonal element, namely a weight matrix, is obtained, and the transformation coefficient of the target point is calculated by utilizing the weight matrix and a preset curve fitting mode.
In one possible implementation, the preset curve fitting mode is a least squares method.
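A minimal sketch of this weighted fit, assuming the linear model d_abs = mu * d_rel + eps described later in the background section and that the adjusted first weight coefficients form the main diagonal of the weight matrix; the function and variable names are illustrative, not taken from the patent.

```python
import numpy as np

def fit_transform_coefficients(d_rel, d_abs, adjusted_weights):
    """Weighted least squares for one target point: solve
    min_beta || W^(1/2) (X beta - y) ||^2 with W = diag(adjusted_weights)."""
    X = np.column_stack([d_rel, np.ones_like(d_rel)])  # rows: [d_rel_i, 1], one per feature point
    W = np.diag(adjusted_weights)                      # diagonal weight matrix
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(d_abs))
    mu, eps = beta                                     # scaling and offset coefficients
    return mu, eps
```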
In one possible implementation, the method further includes:
obtaining the transformation coefficients of other target points by utilizing an interpolation algorithm according to the transformation coefficients of the target points; and determining image depth information of other target points according to the transformation coefficients of the other target points.
In one possible implementation, the interpolation algorithm is a bilinear interpolation algorithm.
Therefore, only the transformation coefficients of some of the target points need to be calculated, and the transformation coefficients of the other target points can be obtained with the interpolation algorithm. Compared with adjusting the weight coefficients with the bias coefficient and calculating the transformation coefficient from the weight coefficients for every target point, the interpolation algorithm involves less calculation and improves the efficiency of acquiring the image depth information.
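A sketch of the bilinear interpolation step, assuming the transformation coefficients have already been computed on a coarse grid of target points; the grid layout and names are assumptions made for illustration.

```python
import numpy as np

def interpolate_coefficients(coeff_grid, y, x):
    """coeff_grid: (H, W, 2) array holding (mu, eps) at the computed target points.
    Returns bilinearly interpolated coefficients at fractional position (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, coeff_grid.shape[0] - 1)
    x1 = min(x0 + 1, coeff_grid.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * coeff_grid[y0, x0] + dx * coeff_grid[y0, x1]
    bottom = (1 - dx) * coeff_grid[y1, x0] + dx * coeff_grid[y1, x1]
    return (1 - dy) * top + dy * bottom  # (mu, eps) at the other target point
```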
In one possible implementation, the target point is a pixel point.
In one possible implementation manner, the target point is a local center point, and the method for specifically acquiring the local center point is as follows:
A second weight coefficient from each of the plurality of feature points to each pixel point is acquired. A second weight coefficient represents the degree of influence of one feature point on the transformation coefficient of one pixel point. For example, second weight coefficients from N feature points to each of M pixel points;
A feature point distribution density map of the image is acquired according to the plurality of second weight coefficients; the feature point distribution density map is divided based on a preset rule to obtain a plurality of divided feature point distribution density maps, and a local center point corresponding to each divided feature point distribution density map is determined. Each local center point is a target point. When the image depth information is acquired, only the image depth information of each local center point needs to be calculated rather than that of every pixel point, which reduces the calculation workload and improves the efficiency of acquiring the image depth information.
In one possible implementation, a plurality of second weight coefficients corresponding to each pixel point are accumulated to obtain a weight accumulated value. And acquiring a characteristic point distribution density map according to the weight accumulated value.
In one possible implementation, the preset rule is that pixel points whose weight accumulated values differ within a preset difference range and whose pixel distances are within a preset distance range are assigned to the same block.
In one possible implementation, the feature point distribution density map is segmented using a super-pixel segmentation algorithm.
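The sketch below illustrates how a feature point distribution density map could be accumulated from second weight coefficients and then segmented to pick local center points. The use of scikit-image's SLIC superpixel routine is only one possible choice, and its parameters are assumptions.

```python
import numpy as np

def feature_density_map(shape, feature_pts, weight_fn):
    """Accumulate, at every pixel, the second weight coefficients of all
    feature points to obtain the feature point distribution density map."""
    h, w = shape
    density = np.zeros((h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for (u, v) in feature_pts:
        dist = np.sqrt((xs - u) ** 2 + (ys - v) ** 2)
        density += weight_fn(dist)  # second weight coefficient of this feature point at each pixel
    return density

# Dividing the density map and picking one local center per block could use a
# superpixel algorithm such as SLIC; the call below is illustrative only.
# from skimage.segmentation import slic
# labels = slic(density, n_segments=64, compactness=0.1, channel_axis=None)
# centers = [np.argwhere(labels == k).mean(axis=0) for k in np.unique(labels)]
```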
In a second aspect, the application also provides an electronic device comprising a memory and a processor, the memory being coupled to the processor. The memory stores program instructions that, when executed by the processor, cause the electronic device to perform the above first aspect or implementations corresponding to the first aspect.
In a third aspect, the application provides a computer readable medium comprising computer readable instructions which, when run on a computing device, cause the computing device to perform the above first aspect or implementations corresponding to the first aspect.
It should be appreciated that the description of technical features, aspects, benefits or similar language in the present application does not imply that all of the features and advantages may be realized with any single embodiment. Conversely, it should be understood that the description of features or advantages is intended to include, in at least one embodiment, the particular features, aspects, or advantages. Therefore, the description of technical features, technical solutions or advantageous effects in this specification does not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and advantageous effects described in the present embodiment may also be combined in any appropriate manner. Those of skill in the art will appreciate that an embodiment may be implemented without one or more particular features, aspects, or benefits of a particular embodiment. In other embodiments, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic diagram of an electronic device software structure according to an embodiment of the present application;
FIG. 3A is a schematic view of an RGB color image of a real scene according to an embodiment of the present application;
FIG. 3B is a schematic diagram of a relative depth information map according to an embodiment of the present application;
FIG. 3C is a schematic diagram of a feature point according to an embodiment of the present application;
FIG. 3D is a schematic diagram of an absolute depth information map according to an embodiment of the present application;
fig. 4 is a schematic diagram of a mobile phone interface according to an embodiment of the present application;
fig. 5A is a schematic diagram of a shooting interface according to an embodiment of the present application;
fig. 5B is a schematic view of another shooting interface according to an embodiment of the present application;
fig. 5C is a schematic diagram of an interface including an option menu in another shooting interface according to an embodiment of the present application;
fig. 5D is a schematic view of a photographing interface provided in the photographing function according to an embodiment of the present application;
fig. 5E is a schematic view of a shooting interface provided in a video recording function according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for turning on a first function to perform an "AR image" preview according to an embodiment of the present application;
FIG. 7A is a flowchart of a method for obtaining image depth information according to an embodiment of the present application;
FIG. 7B is a schematic view of SLAM feature points according to an embodiment of the present application;
fig. 7C is a schematic diagram of an image depth information map according to an embodiment of the present application;
FIG. 8A is a flowchart of another method for obtaining image depth information according to an embodiment of the present application;
FIG. 8B is a schematic view of SLAM feature points according to an embodiment of the present application;
FIG. 8C is a distribution density chart of feature points according to an embodiment of the present application;
FIG. 8D is a block diagram of a feature point distribution density map according to an embodiment of the present application;
FIG. 8E is a schematic diagram of another image depth information map obtained according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an apparatus for obtaining image depth information according to an embodiment of the present application.
Detailed Description
The terms first, second, third and the like in the description and in the claims and in the drawings are used for distinguishing between different objects and not for limiting the specified order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In order to make the technical scheme of the present application more clearly understood by those skilled in the art, the application scenario of the technical scheme of the present application is first described below.
The image depth information acquisition method provided by the application is applied to electronic equipment. The electronic device may be a mobile phone, a notebook computer, a wearable electronic device (such as a smart watch), a tablet computer, an AR device, a vehicle-mounted device, and the like. The above electronic device may include one or more cameras. The application of the above electronic device may provide an "AR special effect" function and/or a blurring function.
The AR special effect function can be used for fusing a photographed real scene and a virtual scene in the photographing, video recording or video call process so as to enhance the content of the real scene. For example, in the video call process, virtual scenes such as caps and the like are superimposed on the face of the caller in real time, so that the scene content of the caller is enhanced, and the interestingness of the video conversation is improved. The fusion of the real scene and the virtual scene involved in the "AR special effect" function may include: and calculating the depth information of the real scene graph, then acquiring the spatial position relation between the virtual scene and the real scene according to the viewpoint position of the user, the superposition position of the virtual scene and the depth information, namely acquiring the correct virtual-real occlusion relation, and finally fusing the real scene and the virtual scene by using the virtual-real occlusion relation so as to realize the correct spatial perception of the fused image to the user.
The blurring function can be used for highlighting an image theme in the process of photographing, video recording or video, blurring an image background and enabling the image to be more vivid. For example, in the process of shooting a landscape image, blurring is performed on other shot objects, and shooting of the landscape is highlighted, so that the image can be more vivid. The blurring image background to which the blurring function relates may include: calculating image depth information, filtering the depth, identifying an interested region according to the depth information, and blurring pixel points at different depth positions to different degrees so as to realize blurring image background and highlight shooting subjects.
In the embodiment of the application, the AR special effect function or the blurring function can be integrated in a camera application program of electronic equipment such as a mobile phone, and particularly integrated in a photographing function and a video recording function of the camera application program. When the electronic equipment performs a photographing function and a video recording function, the AR special effect or the blurring function can be selected through the configuration file to perform photographing and video recording. The "AR special effect" function or the blurring function may also be integrated into the "video communication" function in the communication application of the electronic device such as the mobile phone. The "AR special effects" function may also be implemented as a stand-alone application.
In order to make the technical scheme of the application clearer and easier to understand, the electronic equipment and the image processing system architecture thereof are described below.
An embodiment of the present application provides an electronic device 100, see fig. 1. The electronic device 100 may include: processor 110, external memory interface 120, internal memory 121, mobile communication module 150, sensor module 180, keys 190, camera 193, display 194, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, a fingerprint sensor 180H, a touch sensor 180K, and the like, among others.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering, such as image blurring and AR special effects functions. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an image signal processor (Image Signal Processor, ISP), a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. ISPs may also perform blurring or AR effect processing on the image. In some embodiments, the ISP may be provided in the camera 193.
In some embodiments, the camera 193 may be comprised of a color camera module and a 3D sensing module.
In some embodiments, the photosensitive element of the camera of the color camera module may be a charge coupled device (charge coupled device, CCD) or a complementary metal oxide semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format.
In some embodiments, the 3D sensing module may be a time-of-flight (TOF) 3D sensing module or a structured light 3D sensing module. Structured light 3D sensing is an active depth sensing technology, and the basic components of a structured light 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like. The working principle of the structured light 3D sensing module is to emit a light spot (pattern) with a specific pattern onto the photographed object, receive the light spot pattern code (light coding) on the surface of the object, compare it with the originally projected light spot, and calculate the three-dimensional coordinates of the object using the triangulation principle. The three-dimensional coordinates include the distance from the electronic device 100 to the subject. TOF 3D sensing is also an active depth sensing technology, and the basic components of a TOF 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like. The working principle of the TOF 3D sensing module is to calculate the distance (namely depth) between the TOF 3D sensing module and the photographed object from the round-trip time of the infrared light, so as to obtain a 3D depth map.
The structured light 3D sensing module can also be applied to the fields of face recognition, somatosensory game machines, industrial machine vision detection and the like. The TOF 3D sensing module can also be applied to the fields of game machines, augmented reality (augmented reality, AR)/Virtual Reality (VR), and the like.
In other embodiments, camera 193 may also be comprised of two or more cameras. The two or more cameras may include a color camera that may be used to capture color image data of the object being photographed. The two or more cameras may employ stereoscopic vision (stereo) technology to acquire depth data of the photographed object. The stereoscopic vision technology is based on the principle of parallax of human eyes, and obtains distance information, i.e., depth information, between the electronic device 100 and the object to be photographed by shooting images of the same object from different angles through two or more cameras under a natural light source and performing operations such as triangulation.
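For reference, the stereo triangulation mentioned above reduces to the classic relation depth = focal length * baseline / disparity; the small helper below is an illustrative sketch, not code from the patent.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    # Depth (meters) of a scene point from its pixel disparity in two views.
    return focal_px * baseline_m / disparity_px
```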
In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1. Specifically, the electronic device 100 may include 1 front camera 193 and 1 rear camera 193. The front camera 193 may be used to collect color image data and depth data of a photographer facing the display screen 194, and the rear 3D camera module 193 may be used to collect color image data and depth data of a photographed object (e.g., a person, a landscape, etc.) facing the photographer.
In some embodiments, a CPU or GPU or NPU in the processor 110 may process color image data and depth data acquired by the 3D camera module 193. In some embodiments, the NPU may identify color image data acquired by the 3D camera module 193 (specifically, the color camera module) through a neural network algorithm based on a feature point identification technique, such as a synchronous localization and mapping SLAM algorithm, to determine feature points of a captured image. The CPU or GPU may also run a neural network algorithm to implement determining feature points of the captured image from the color image data. In some embodiments, the CPU or GPU or NPU may also be configured to obtain absolute depth information of the image based on depth data collected by the 3D camera module 193 (specifically, the 3D sensing module) and the determined feature points, and to blur the image or implement "AR special effects" based on the depth information of the image. In the following embodiments, how to perform the "AR special effect" and blurring processing on the captured image based on the color image data and the depth data acquired by the 3D camera module 193 will be described in detail, and will not be described in detail here.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The hardware system of the electronic device 100 is described in detail above, and the software system of the electronic device 100 is described below. The software system may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the invention, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
As shown in fig. 2, the Android system with a layered architecture is divided into a plurality of layers, and each layer has clear roles and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android Runtime and system libraries, and a kernel layer.
The application layer may include a series of application packages. The application package may include camera, gallery, calendar, talk, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
The application framework layer may include a window manager, a content provider, a view system, a resource manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The Android Runtime includes a core library and virtual machines. The Android Runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules, for example: a surface manager (surface manager), media libraries (Media Libraries), three-dimensional graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The workflow of the electronic device 100 software and hardware is illustrated below in connection with capturing a photo scene.
When touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into the original input event (including information such as touch coordinates, time stamp of touch operation, etc.). The original input event is stored at the kernel layer. The application framework layer acquires an original input event from the kernel layer, and identifies a control corresponding to the input event. Taking the touch operation as a touch click operation, taking a control corresponding to the click operation as an example of a control of a camera application icon, the camera application calls an interface of an application framework layer, starts the camera application, further starts a camera driver by calling a kernel layer, and captures a still image or video by the camera 193.
The present application takes the example of the electronic device 100 being a mobile phone, and implementing the "AR special effect" function or the "image blurring" function as a mobile phone "camera" program. The technical scheme provided by the application is described in detail below with reference to the accompanying drawings.
The terms involved in the present application will be explained first.
Monocular image: refers to image information acquired from a single camera sensor.
SLAM: the synchronous positioning and mapping method is mainly used for solving the problems of positioning and mapping in an unknown environment.
SLAM feature points: discrete points in an image obtained by the SLAM algorithm that have distinct characteristics and can effectively reflect the essential features of the image.
AR special effects: a technology for skillfully fusing a virtual scene with a real scene. It uses technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction and sensing to simulate computer-generated virtual content such as text, images, three-dimensional models and music, and applies it to the real scene so as to enhance the content of the real scene.
Virtual-real occlusion relationship: the virtual scene and the real scene should have a correct shielding relationship, for example, the virtual scene can be shielded by the foreground object while the virtual scene shields the background. However, a wrong occlusion relationship may cause a wrong spatial perception to the user. Generally, a technical means based on depth calculation is adopted, and through calculating the depth information of a real scene graph, the spatial position relation between a virtual scene and the real scene is obtained according to the viewpoint position of a user, the superposition position of the virtual scene and the image depth information, so that a correct shielding relation is generated.
Relative depth information: representing the relative distance relationship between pixel points, not the true depth value. The depth information output by the existing depth generation model is relative depth information.
Absolute depth information: the value of a pixel represents its true distance from the camera, typically measured in meters. In the prior art, the relative depth information is often acquired first, and the absolute depth information is then acquired through conversion means such as offset and scaling. Exemplary description: the image shown in fig. 3A (typically a color image in practical applications) is input into a conventional relative depth generation model, which outputs relative depth information as shown in fig. 3B. The relative depth information is converted into the absolute depth information map shown in fig. 3D through conversion means such as offset and scaling, in combination with the feature points in fig. 3C.
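As a minimal illustration of this conversion (with mu as the scaling coefficient and eps as the offset coefficient; the names are chosen here for clarity):

```python
import numpy as np

def to_absolute_depth(d_rel, mu, eps):
    # d_abs = mu * d_rel + eps; mu and eps may be scalars or per-pixel arrays
    # that broadcast against the relative depth map.
    return mu * np.asarray(d_rel) + eps
```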
Because the relative depth information has estimation errors and the feature points also have errors, the precision of the obtained transformation coefficients directly influences the precision of the absolute depth information obtained after transformation. The transformation coefficients represent the transformation size applied to the relative depth information through means such as offset and scaling to obtain the absolute depth information, and include an offset coefficient ε and a scaling coefficient μ. In order to obtain transformation coefficients with higher precision, in one possible design, a plurality of feature points are used to calculate them by a curve fitting method such as a global least squares fit of the following formula:

d_abs = μ · d_rel + ε   (1)

where d_abs is the absolute depth information and d_rel is the relative depth information.

A specific way of obtaining the transformation coefficients β = (μ, ε) is the least squares method, namely:

β = argmin_β ||Xβ - y||²   (2)

Thus, it is possible to obtain:

β = (X^T X)^(-1) X^T y

where X is the matrix formed by the relative depth points, X^T is the transposed matrix of X, and y is the matrix formed by the absolute depth information corresponding to the pixel points.
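A sketch of this global least-squares fit over all feature points follows; np.linalg.lstsq is used here instead of the explicit matrix inverse for numerical stability, which is an implementation choice rather than part of the patent.

```python
import numpy as np

def global_least_squares(d_rel_pts, d_abs_pts):
    """beta = (X^T X)^(-1) X^T y with rows X = [d_rel_i, 1] and y = d_abs_i."""
    X = np.column_stack([d_rel_pts, np.ones_like(d_rel_pts)])
    beta, *_ = np.linalg.lstsq(X, np.asarray(d_abs_pts), rcond=None)
    return beta  # (mu, eps)
```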
However, this fitting method results in a larger error in the absolute depth information obtained due to the relative depth point error. Thus, in one possible implementation, a locally weighted least squares method is employed, i.e. the transform coefficients for each point are obtained by adding a weight coefficient to each feature point at the time of fitting.
However, such a locally weighted least squares method does not take into account the influence of local characteristics, such as a local vacancy of feature points, for example the lower-right sofa position in fig. 3C, so that the acquired transformation coefficients have low accuracy and the error between the recovered absolute depth information map and the true depth information map is large.
Based on the above, the application provides a way of acquiring image depth information in which, when the transformation coefficients are acquired by curve fitting with the feature points, the weight coefficients are adjusted with a bias coefficient. Specifically, for target points at partial vacancy positions, the weight coefficients have a low degree of dispersion and are almost 0, and are adjusted with a larger bias coefficient. For target points at other positions, the weight coefficients have a high degree of dispersion and are adjusted with a smaller bias coefficient. In this way, the influence of each feature point on each target point of the image is averaged, which alleviates the problem that a partial vacancy lowers the accuracy of the transformation coefficients obtained by curve fitting, and improves the accuracy of the obtained absolute depth information map.
In order to better explain the technical scheme of the application, a specific implementation manner of an application scene is described in detail below.
Illustratively, as in the handset shown in FIG. 4, the user clicks on the "camera" application. The camera application can be a native camera application of a mobile phone, a multifunctional camera application developed by a third party, and the like.
In response to receiving an operation of the user to open the camera application, the mobile phone may display a photographing interface 500 as shown in fig. 5A. The camera application may turn on the "take a picture" function by default. The photographing interface 500 includes a viewfinder 501, and functional controls such as "portrait", "photograph", "video", "professional", "more", and the like. In some examples, as shown in fig. 5A, the capture interface 500 further includes a "first function" control 502 that a user can turn on a first function of the cell phone by operating the "first function" control. Wherein the first function comprises an "AR special effect" function, and/or an image blurring function.
The "AR special effect" function is to fuse the real scene shot by the user with the virtual scene image based on the virtual scene selected by the user (for example, the virtual scene selected from the cloud server) so as to enhance the content of the real scene graph. The image blurring function adjusts the depth of field for the user, thereby controlling the sharpness of the picture subject and the background environment in the image.
In the embodiment of the application, the first function is integrated into the shooting functions, specifically into a photographing function with the first function and a video recording function with the first function. After the photographing or video recording function with the first function is started, the user can obtain a first function image. For example, the "AR special effect" function is integrated into the shooting functions as an "AR special effect" photographing function and an "AR special effect" video recording function. After the "AR special effect" photographing or video recording function is started, the mobile phone can fuse the photographed real scene with the selected virtual scene to obtain a fused image. Alternatively, the mobile phone guides the user to adjust the shooting position or the posture of the mobile phone based on the selected virtual scene, so that the images in the recorded video can be fused with the virtual scene to obtain fused images.
In other examples, as shown in fig. 5B, the "more" control 504 is operated in response to a user in the capture interface 507. The handset opens an options menu 508 of more functionality controls as shown in fig. 5C, with the "first functionality" control 502 provided in the options menu 508. Alternatively, the user may also open a setup options interface of the camera application by operating the "setup" control 505 in the capture interface 507, where the "first functionality" control 502 is set.
In still other examples, as shown in fig. 5D, the "first functionality" control 502 may also be provided in a capture interface 506 in a "capture" function. Alternatively, as shown in fig. 5E, a "first function" control 502 is provided in a shooting interface 509 in the "video recording" function. The user may quickly open the first function through the "first function" control 502 while using the "take a picture" function or the "record" function. For example, when the user uses the "photographing" function, after the "AR special effect" function is turned on through the "AR special effect" function control 502, the mobile phone defaults to turn on the "AR special effect" photographing function. For another example, when the user uses the "video" function, after the "AR special effect" function is turned on by the "AR special effect" function control 502, the mobile phone defaults to turn on the "AR special effect" video function.
It should be noted that, in the embodiment of the present application, the setting position of the "first function" control 502 and the manner in which the user operates the "AR special effect" function control 502 are not limited. Of course, the user may also turn on the first function in other ways, such as performing a predefined air gesture, entering a voice command, pressing a physical control, drawing a predefined pattern on the touch screen of the cell phone, etc.
For easy understanding, the method for starting the first function to preview the image by operating the first function control 502 on the option menu 508 shown in fig. 5C will be described with reference to the drawing in the following description of the photographing interface 507 shown in fig. 5A. Wherein the first function is an "AR special effect" function. Fig. 6 is a flowchart of a method for turning on a first function to preview an "AR image" according to an embodiment of the present application. The method comprises the following steps:
s601: the first function control receives virtual scene information transmitted by the cloud server.
In the embodiment of the application, the cloud server stores a virtual scene information base. In one possible implementation manner, the first functional control obtains a preset requirement, and obtains virtual scene information from a virtual scene information base of the cloud server according to the preset requirement. For example, an icon corresponding to the virtual scene in the virtual scene information base is built in the first functional control, and communication between the first functional control and the cloud server is achieved through the icon. And clicking a corresponding icon from the first functional control by the user according to a preset requirement, and acquiring virtual scene information corresponding to the icon from the cloud server.
S602: and starting the first function control and sending an opening instruction to the camera of the mobile phone equipment.
In the embodiment of the application, the user clicks a first function control in the mobile phone camera application program, namely, the user inputs a first function opening instruction. The specific location of the first functionality control is shown in fig. 5A-5E. And the first function control sends the generated opening instruction to the camera of the mobile phone equipment. The device camera can be a front camera or a rear camera.
In one possible implementation, the first functionality control is connected to the device camera through an MIPI interface. The MIPI interface includes camera serial interface (camera serial interface, CSI) and the like. The first functional control is communicated with the equipment camera through the CSI interface, and the transmission of the starting instruction and the image information is realized.
S603: the camera collects monocular images of the real scene based on the opening instruction and sends monocular image information to the first functional control.
The camera obtains a monocular image of the real scene based on the opening instruction. In the embodiment of the application, the monocular image is an RGB color image. Illustratively, a monocular image of a real scene captured by a camera based on an open instruction is shown in fig. 4. And the camera sends the acquired RGB color image to the first functional control to execute the first functional operation. For example, the camera sends the collected RGB color image to the first functional control through the CSI interface.
S604: the first functional control fuses the received virtual scene information and monocular image information transmitted by the cloud server, and a first functional image is obtained.
The first functional control receives the monocular image acquired by the camera and can receive the virtual scene information transmitted by the cloud server. A relative depth information map and a feature point map are acquired by processing the monocular image. Since the real depth of an outdoor real scene is often difficult to obtain, people can usually obtain only the relative depth, i.e. the front-back relationship of objects, according to experience, occlusion relationships, light, shadows, etc. Image depth information is then acquired from the relative depth information map by offsetting, scaling and similar transformations of the coordinate points, where the image depth information is the absolute depth information (true depth information) of the image. For the specific processing, see the method flowchart shown in fig. 7A and/or fig. 8A below. The monocular image is fused with the virtual scene image based on the image depth information to obtain the first function image.
The following description is directed to a specific manner of acquiring image depth information.
Referring to fig. 7A, a flowchart of a method for obtaining image depth information according to an embodiment of the present application is provided.
The method comprises the following steps:
s701: a plurality of feature points of the image are acquired.
The feature points are points which have depth information in a scene graph shot by a camera, have sharp characteristics and can effectively reflect the essence of an image. In one embodiment, the feature point may also include a confidence level for the point, indicating the probability that the true value of the point falls around the measurement. The greater the probability, the higher the confidence.
Exemplary description: the i-th feature point is expressed as (u_i, v_i, y_i, k_i), where (u_i, v_i) are the two-dimensional coordinates of the i-th feature point, y_i is the absolute depth information of the i-th feature point, and k_i is the confidence of the i-th feature point, representing that the probability that the feature point coordinates fall around the measured coordinate value is k_i.
The confidence coefficient of the feature points is obtained simultaneously when the feature point coordinates are obtained, feature points with different precision can be treated differently, and the influence of the feature points with low confidence coefficient is reduced, so that the fitting precision can be improved.
Further, a feature point extraction control is built into the photographing application program, and feature points of the image being photographed are acquired directly. In one embodiment, the feature point extraction control is built into the "AR special effect" function control of the "camera" program of the mobile phone, and for an image captured by the camera, the feature point extraction control uploads the extracted feature points to the "AR special effect" function control.
In one embodiment, the feature point extraction control may be a simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) algorithm, with the feature points obtained being SLAM feature points.
The method for extracting the characteristic points by the SLAM algorithm comprises the following steps: and reading an image shot by the camera, detecting the position of the corner, calculating descriptors according to the position of the corner, and matching the descriptors in the image to obtain SLAM feature points.
The corner points are isolated points whose attribute intensity is a local maximum or minimum in the image, such as pixel points corresponding to local maxima of the gray-scale gradient, or points where the gradient value and the rate of change of the gradient direction are highest. The descriptors are vectors describing the neighbourhood around the corner points. In one embodiment, 128 or 256 pairs of points p and q may be taken randomly around the corner point; if p is greater than q, the pair is noted as a first value, otherwise as a second value, and the descriptor is a 128-bit or 256-bit sequence consisting of the first and second values. In one embodiment, the first value is 0 and the second value is 1. In yet another embodiment, the first value is 1 and the second value is 0.
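As a rough illustration of the point-pair comparison described above, the following Python sketch builds a BRIEF-style binary descriptor around a detected corner. The patch size, the random sampling pattern and the "1 if p > q" convention are illustrative assumptions, not the exact scheme of the application.

```python
import numpy as np

def brief_like_descriptor(gray, corner, n_pairs=256, patch=15, rng=None):
    """Binary descriptor sketch: compare n_pairs random point pairs (p, q)
    around a corner; bit = 1 if intensity(p) > intensity(q), else 0.
    (The 0/1 assignment can be swapped, matching the two embodiments above.)"""
    rng = np.random.default_rng(0) if rng is None else rng
    cy, cx = corner                     # corner given as (row, column)
    h, w = gray.shape
    # Random point-pair offsets inside the patch around the corner (assumed sampling pattern).
    offsets = rng.integers(-patch, patch + 1, size=(n_pairs, 2, 2))
    bits = []
    for (py, px), (qy, qx) in offsets:
        p = gray[np.clip(cy + py, 0, h - 1), np.clip(cx + px, 0, w - 1)]
        q = gray[np.clip(cy + qy, 0, h - 1), np.clip(cx + qx, 0, w - 1)]
        bits.append(1 if p > q else 0)
    return np.array(bits, dtype=np.uint8)
```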
Exemplary, as shown in fig. 7B, a map of SLAM feature points is obtained by processing the image shown in fig. 4 based on a SLAM algorithm.
In fig. 7B, the SLAM feature points are marked with "+". The SLAM feature points are unevenly distributed: for example, feature points at the sofa on the right, the desk lamp in the middle, and the white cabinet on the left are densely distributed, while the area above the sofa has no effective SLAM feature points.
S702: and acquiring weight coefficients from each characteristic point to the target point on the image.
The weight coefficient is used for representing the influence degree of each characteristic point on the image on the target point. In the embodiment of the application, the target point is a pixel point of the image.
In one embodiment, the weight coefficient is a number in the range (0, 1) obtained based on a distance-weight mapping function, in which the independent variable is the distance and the dependent variable is the weight coefficient.
The larger the distance, the smaller the weight coefficient; the smaller the distance, the larger the weight coefficient. That is, the distance-weight mapping function has a value range of (0, 1) and is a monotonically decreasing function on the positive half of the x-axis (which represents distance). In one embodiment, the distance-weight mapping function may be a Gaussian function of the form:

w = a · exp(−(dist − b)² / (2c²))    (3)

In formula (3), a is the Gaussian coefficient, which can be adjusted as required; b is the average of the distances from each feature point to the target point; and c is the standard deviation of the distances from each feature point to the target point, representing the fluctuation of the distances.
In one embodiment, the distance may be the Euclidean distance. For example, with the i-th (i = 1, 2, …, n) feature point from step S701 expressed as (u_i, v_i, y_i, k_i), the distance from the i-th feature point to the target point (u_j, v_j) (j being a positive integer, j ≥ 1) is:

dist(i, u_j, v_j) = √((u_i − u_j)² + (v_i − v_j)²)

The weight coefficient of the i-th feature point at the target point (u_j, v_j) is then w(i, u_j, v_j), obtained by substituting this distance into the distance-weight mapping function, where n is the number of feature points.
In one embodiment, the distance-weight mapping function may also be related to the confidence. Illustratively, with the i-th feature point from step S701 expressed as (u_i, v_i, y_i, k_i), the weight coefficient of the i-th feature point at the target point (u_j, v_j) also takes the confidence k_i into account.
When the weight coefficient is calculated, the confidence of the feature points is taken into account, so that different feature points are treated differently and the influence of low-confidence feature points is reduced, thereby improving the fitting precision.
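A minimal Python sketch of this step follows, assuming the weight is the Gaussian of the Euclidean distance described above multiplied by the confidence; the exact combination used in the application's formulas is not reproduced here.

```python
import numpy as np

def weight_coefficients(feat_uv, feat_conf, target_uv, a=1.0):
    """Sketch of S702. feat_uv: (n, 2) feature point coordinates;
    feat_conf: (n,) confidences in (0, 1]; target_uv: (u_j, v_j).
    The Gaussian parameters follow the text: b, c are the mean and standard
    deviation of the feature-to-target distances."""
    d = np.linalg.norm(feat_uv - np.asarray(target_uv, dtype=float), axis=1)  # Euclidean distances
    b, c = d.mean(), d.std() + 1e-8                                           # mean / std of distances
    w = a * np.exp(-((d - b) ** 2) / (2.0 * c ** 2))                          # Gaussian mapping (assumed form)
    return w * feat_conf                                                      # confidence-scaled (assumed combination)
```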
S703: and calculating a bias coefficient at the target point, and adjusting the weight coefficient by using the bias coefficient.
For the target point (u_j, v_j), the weight coefficient obtained by the distance-weight calculation function is w(i, u_j, v_j).
In the embodiment of the application, the weight coefficients are adjusted by a bias coefficient, so as to avoid the problem that the weight coefficients are almost 0 when the feature points are unevenly distributed on the image, for example at partial vacancies. The bias coefficient takes a value in the range (0, 1) and is used to adjust the weight coefficients: when the weight coefficients are small, e.g. almost 0, they are adjusted with a larger bias coefficient; otherwise they are adjusted with a smaller bias coefficient.
In one possible implementation, it is considered that the feature points are unevenly distributed on the image: some regions of the image have relatively many feature points, while others are partially vacant. In regions with many feature points, the weight coefficients from the feature points to the target point are unevenly distributed; in a partial vacancy, because there is no effective feature point nearby, the weight coefficients from the feature points to the target point are almost 0.
In the embodiment of the application, the bias coefficient is negatively correlated with the discrete value of the weight coefficients from the feature points to the target point, where the discrete value represents the degree of dispersion of the weight coefficients. Therefore, for a partial vacancy, the weight coefficients at the target point are almost 0, i.e. the dispersion is small, and the bias coefficient is correspondingly large; adjusting the weight coefficients with this larger bias coefficient avoids the problem of the weight coefficients being almost 0.
In one possible implementation, the bias coefficient is negatively correlated with the standard deviation of the weight coefficients from the feature points to the target point, where the standard deviation represents the degree of dispersion of the weight coefficients. Exemplary description: the bias coefficient is a decreasing function of std(w(i, u_j, v_j)), where std() is the standard deviation function and a is a parameter (a > 0) that adjusts the influence of the standard deviation on the magnitude, for example a = 5.
Further, the bias coefficient obtained in the above manner is used to adjust the weight coefficients. In one possible implementation, the bias coefficient is added to each weight coefficient, and the results form a diagonal matrix whose diagonal elements are the sums of the bias coefficient and the weight coefficients; this diagonal matrix is the weight matrix. In the above exemplary description, the weight matrix W_j is expressed in the following specific form:

W_j = diag(w(1, u_j, v_j) + b_j, w(2, u_j, v_j) + b_j, …, w(n, u_j, v_j) + b_j)

where b_j is the bias coefficient at the target point (u_j, v_j).
The application adjusts the weight coefficients from the feature points to the target point through the bias coefficient: when the discrete value of the weight coefficients is large, the bias coefficient is small, and when the discrete value is small, such as at a partial vacancy, the bias coefficient is large. Adjusting the weight coefficients with the bias coefficient keeps them fluctuating within a preset range. Exemplary: the preset range is 0.95 to 1.05; a weight coefficient of 0.99 obtained after adjustment fluctuates within this range and meets the adjustment requirement. By adjusting the weight coefficients with the bias coefficient, the differences between the feature points can be averaged, and the result tends towards a global fit.
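The following sketch shows one way to realize this adjustment. Because the exact bias formula is not reproduced in the text, the mapping 1 / (1 + a · std(w)) is an assumed form that merely satisfies the stated properties: it stays in (0, 1) and decreases as the dispersion grows.

```python
import numpy as np

def adjusted_weight_matrix(w, a=5.0):
    """Sketch of S703: compute a bias coefficient negatively related to the
    dispersion (standard deviation) of the weight coefficients, add it to each
    weight, and build the diagonal weight matrix W_j."""
    bias = 1.0 / (1.0 + a * np.std(w))   # assumed negative-correlation mapping, value in (0, 1)
    w_adj = w + bias                     # adjusted first weight coefficients
    return np.diag(w_adj)                # weight matrix W_j (diagonal)
```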
S704: transform coefficients at the target point are calculated.
As described above, the transform coefficients, that is, the offset coefficient and the scaling coefficient, are usually obtained by performing curve fitting on a plurality of feature points. By way of example, the application takes least-squares curve fitting as an example and, in combination with the weight matrix obtained in step S703, describes in detail how the transform coefficients are obtained.
Suppose the transform coefficients of the target point (u_j, v_j) are β_j = (s_j, μ_j), where s_j is the scaling coefficient and μ_j is the offset coefficient. The least-squares formulation, formula (13), fits these coefficients over the feature points, where d_i is the relative depth of the i-th feature point, k_i is the confidence of the i-th feature point, and λ is a coefficient on the scaling coefficient s_j whose purpose is to ensure that s_j is significant; those skilled in the art can set λ as required.
Taking the derivative of formula (13) and solving gives the estimate:

β̂_j = (X^T · W_j · X)^(−1) · X^T · W_j · y    (15)

where β̂_j is the estimate of β_j, W_j is the weight matrix, X is the matrix formed by the relative-depth rows x_i = [d_i, 1], X^T is the transpose of X, and y is the absolute depth matrix.
In this way, the estimate of the transform coefficients is calculated from the weight matrix obtained in the above step, the relative depth matrix formed by the relative depth values at the feature points, and the absolute depth matrix formed by the absolute depth values at the feature points.
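A short sketch of this weighted least-squares fit, assuming the closed-form estimate given above; the λ term on the scaling coefficient is omitted for brevity.

```python
import numpy as np

def fit_transform_coeffs(rel_depth, abs_depth, W):
    """Sketch of S704: estimate (s_j, mu_j) so that abs ≈ s_j * rel + mu_j,
    using the adjusted weight matrix W from S703."""
    rel_depth = np.asarray(rel_depth, dtype=float)
    X = np.column_stack([rel_depth, np.ones_like(rel_depth)])   # rows x_i = [d_i, 1]
    y = np.asarray(abs_depth, dtype=float)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)            # (X^T W X)^(-1) X^T W y
    s_j, mu_j = beta
    return s_j, mu_j
```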
S705: transform coefficients of other target points on the image are acquired.
In order to better acquire depth information, i.e. absolute depth information, on an image, it is necessary to calculate transform coefficients of other target points on the image, so that the absolute depth information is acquired by scaling and shifting the relative depth information in combination with the transform coefficients.
In one possible implementation, the transform coefficients of the other target points on the image may be obtained by interpolation, for example linear interpolation or bilinear interpolation. The embodiment of the application does not limit the specific interpolation method.
Exemplary description: taking bilinear interpolation as an example, transform coefficients of other target points are obtained.
Through the above example, the transform coefficients of other target points can be calculated from the coordinates and transform coefficients of four adjacent target points. Suppose the coordinate values of the four target points are (u_1, v_1), (u_1, v_1+1), (u_1+1, v_1) and (u_1+1, v_1+1), and that each of these points has a corresponding set of transform coefficients (scaling coefficient and offset coefficient).
in order to enable those skilled in the art to better understand the two-line interpolation according to the present application, the scaling coefficient and the offset coefficient of each coordinate point are obtained, and are represented by specific data below.
Assuming that the acquired target point is (104.615, 202.941), the coordinate values of the four adjacent target points are (104, 202), (104, 203), (105, 202) and (105, 203), respectively.
Then:
substituting the value of the formula (16) to obtain the scaling coefficient and the offset coefficient corresponding to other target points.
In the embodiment of the application, interpolation is used instead of calculating a set of scaling parameters for every coordinate point, which reduces the calculation workload and improves the calculation efficiency.
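A small sketch of the bilinear interpolation of transform coefficients between four neighbouring target points; the corner ordering follows the example above and is otherwise an assumption.

```python
import numpy as np

def bilinear_coeffs(p, corners, coeffs):
    """Sketch of S705: interpolate (s, mu) at point p from the four neighbours.
    corners = [(u1,v1), (u1,v1+1), (u1+1,v1), (u1+1,v1+1)];
    coeffs  = [(s, mu), ...] in the same order."""
    u, v = p
    u1, v1 = corners[0]
    du, dv = u - u1, v - v1                    # fractional offsets in [0, 1)
    c = np.asarray(coeffs, dtype=float)        # shape (4, 2)
    top = (1 - dv) * c[0] + dv * c[1]          # interpolate along v at row u1
    bottom = (1 - dv) * c[2] + dv * c[3]       # interpolate along v at row u1+1
    return (1 - du) * top + du * bottom        # interpolate along u -> (s, mu)
```

For the example above, p = (104.615, 202.941) would be combined with the coefficient sets at (104, 202), (104, 203), (105, 202) and (105, 203).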
S706: image depth information of each target point is acquired.
The depth information of the image is obtained by scaling and offsetting the obtained relative depth values with the scaling coefficients and offset coefficients in the transform coefficients. The specific image depth information acquisition formula, formula (22), is:

dabs(u_j, v_j) = s_j · drel(u_j, v_j) + μ_j

where dabs(u_j, v_j) is the absolute depth value of the j-th target point, drel(u_j, v_j) is the relative depth value of the j-th target point, and s_j and μ_j are the scaling coefficient and offset coefficient in the transform coefficients of the j-th target point, respectively.
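A minimal sketch of this step, assuming the scale-and-shift form of formula (22) above; it works element-wise on a relative depth map or a single value.

```python
def absolute_depth(drel, s_j, mu_j):
    # dabs(u_j, v_j) = s_j * drel(u_j, v_j) + mu_j, applied element-wise
    return s_j * drel + mu_j
```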
Exemplary: for fig. 3A, the confidence of the feature point is considered in the weight coefficient by calculating the weight coefficient of each feature point to the target point. And then adjusting the weight coefficient by using the standard deviation of the weight coefficient to obtain a weight matrix. The transform coefficients at the target point are calculated using equation (15). And then, obtaining the transformation coefficients of other target points on the image by using a bilinear interpolation method, and obtaining a depth information map of the image based on a formula (22) by using the transformation coefficients of the target points of the obtained image. See fig. 7C for detailed view information.
In the embodiment of the application, when the transformation coefficient is obtained by utilizing the characteristic points to perform curve fitting, the influence of the weight coefficient based on the confidence coefficient is considered, and the weight coefficient is adjusted by utilizing the bias coefficient.
Specifically, for a target point at a partial vacancy, the degree of dispersion of the weight coefficients is small, and the weight coefficients are adjusted with a larger bias coefficient; for target points at other positions, the degree of dispersion of the weight coefficients is large, and the weight coefficients are adjusted with a smaller bias coefficient. In this way, the influence of each feature point on the target points of the image is averaged, the problem of partial blanks in the depth image information caused by partial vacancies is avoided, and the accuracy of acquiring the image depth information is improved.
In addition, to further improve the timeliness of obtaining image depth information, fig. 8A shows another flowchart of a method for obtaining image depth information according to an embodiment of the application. The method comprises the following steps:
S801: A plurality of feature points in the image are acquired.
The feature points are points in the scene image captured by the camera that have depth information, have distinctive characteristics, and can effectively reflect the essence of the image. In one embodiment, a feature point may also include a confidence level of the point, indicating the probability that the true value of the point falls around the measurement: the greater the probability, the higher the confidence.
Exemplary description: known as (i)iThe characteristic points are expressed as%u i v i ,y i ,k i ) Wherein, it is characterized byu i v i ) Denoted as the firstiThe two-dimensional coordinates of the individual feature points,y i denoted as the firstiAbsolute depth information of the individual feature points,k i is the firstiConfidence of each feature point, representing that the probability that the feature point coordinates fall around the coordinate value isk i
Because the confidence of each feature point is obtained at the same time as its coordinates, feature points of different precision can be treated differently and the influence of low-confidence feature points can be reduced, so that the fitting precision can be improved.
Further, a feature point extraction control is built into the photographing application program, and feature points of the image being photographed are acquired directly. In one embodiment, the feature point extraction control is built into the "AR special effect" function control of the "camera" program of the mobile phone, and the image shot by the camera is uploaded to the "AR special effect" function control.
In one embodiment, the feature point extraction control may be a simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) algorithm, with the feature points obtained being SLAM feature points.
The method for extracting the characteristic points by the SLAM algorithm comprises the following steps: and reading an image shot by the camera, detecting the position of the corner, calculating descriptors according to the position of the corner, and matching the descriptors in the image to obtain SLAM feature points.
The corner points are isolated points whose attribute intensity is a local maximum or minimum in the image, such as pixel points corresponding to local maxima of the gray-scale gradient, or points where the gradient value and the rate of change of the gradient direction are highest. The descriptors are vectors describing the neighbourhood around the corner points. In one embodiment, 128 or 256 pairs of points p and q may be taken randomly around the corner point; if p is greater than q, the pair is noted as a first value, otherwise as a second value, and the descriptor is a 128-bit or 256-bit sequence consisting of the first and second values. In one embodiment, the first value is 0 and the second value is 1. In yet another embodiment, the first value is 1 and the second value is 0.
Exemplary, as shown in fig. 8B, a map of SLAM feature points is obtained by processing the monocular image shown in fig. 4 based on the SLAM algorithm. In fig. 8B, the SLAM feature points are marked with "+". The SLAM feature points are unevenly distributed: for example, feature points at the sofa on the right, the desk lamp in the middle, and the white cabinet on the left are densely distributed, while the area above the sofa has no effective SLAM feature points.
S802: and acquiring a weight coefficient from each characteristic point on the image to each pixel point on the image.
The weight coefficient is used to represent the degree of influence of each feature point on the image on the target point. In one embodiment, the weight coefficient is a number in the range (0, 1) obtained based on a distance-weight mapping function.
The independent variable of the distance-weight mapping function is the distance and the dependent variable is the weight coefficient: the larger the distance, the smaller the weight coefficient, and correspondingly, the smaller the distance, the larger the weight coefficient. That is, the distance-weight mapping function has a value range of (0, 1) and is a monotonically decreasing function on the positive half of the x-axis (which represents distance). In one embodiment, the distance-weight mapping function may be a Gaussian function of the form given in formula (3), where a is the Gaussian coefficient, which can be adjusted as required, b is the average of the distances from the feature points to the pixel point, and c is the standard deviation of those distances, representing their fluctuation.
In one embodiment, the distance may be the Euclidean distance. For example, with the i-th (i = 1, 2, …, n) feature point from step S801 expressed as (u_i, v_i, y_i, k_i), the distance from the i-th feature point to the pixel point (u, v) is:

dist(i, u, v) = √((u_i − u)² + (v_i − v)²)

The weight coefficient of the i-th feature point at the pixel point (u, v) is then w(i, u, v), where n is the number of feature points.
In one embodiment, the distance-weight mapping function may also be related to the confidence. Illustratively, with the i-th feature point from step S801 expressed as (u_i, v_i, y_i, k_i), the weight coefficient of the i-th feature point at the pixel point (u, v) also takes the confidence k_i into account.
When the weight coefficient is calculated, the confidence of the feature points is taken into account, so that different feature points are treated differently and the influence of low-confidence feature points is reduced, thereby improving the fitting precision.
S803: and obtaining a characteristic point distribution density map.
The feature point distribution density map is obtained by accumulating, for each pixel point, its weight coefficients over the feature point dimension, and is used to represent how dense the feature points are on the image. Specifically, the weight coefficients are accumulated over the feature point dimension to obtain a weight accumulated value; the larger the weight accumulated value, the denser the feature points around that pixel position on the image, and conversely, the sparser the feature points at that position.
Illustratively, for the feature points shown in fig. 8B, the weight coefficients (taking the confidence into account) are accumulated over the feature point dimension, i.e. the accumulated value at pixel point (u, v) is the sum over i = 1, …, n of w(i, u, v). The obtained feature point distribution density map is shown in fig. 8C. Part A indicates regions where the feature points on the image are dense, i.e. the weight accumulated value of each pixel point is large; part B indicates regions where the feature points are sparse, i.e. the weight accumulated value of each pixel point is smaller; and part C indicates regions where feature points are absent.
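A brute-force sketch of this accumulation, reusing the weight_coefficients() helper assumed in the S702/S802 sketch; the O(h·w·n) loop is used purely for clarity.

```python
import numpy as np

def feature_density_map(h, w, feat_uv, feat_conf, a=1.0):
    """Sketch of S803: accumulate the confidence-aware weight coefficients of all
    feature points at every pixel to obtain the feature point distribution density map."""
    density = np.zeros((h, w), dtype=float)
    for v in range(h):
        for u in range(w):
            density[v, u] = weight_coefficients(feat_uv, feat_conf, (u, v), a).sum()
    return density
```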
S804: and performing image blocking based on the characteristic point distribution density map to obtain local center points of all the blocks.
Image blocking refers to dividing the image into a plurality of blocks according to a preset rule. The blocks can be labeled L(u, v) ∈ {1, …, K}, where K is the total number of blocks. That is, if L(u, v) = 1, the feature point (u, v) belongs to the first block; correspondingly, if L(u, v) = K, the feature point (u, v) belongs to the K-th block.
In one possible implementation, the preset rule may be to divide adjacent pixel points whose weight accumulated values differ by an amount within a preset range into the same block. In one embodiment, the preset rule can be implemented by a superpixel segmentation algorithm, which performs the image blocking on the feature point distribution density map.
A superpixel segmentation algorithm groups pixels by the similarity of features between them and uses a small number of superpixels instead of a large number of pixels to express the picture features, thereby reducing the complexity of image post-processing. Common superpixel segmentation algorithms include simple linear iterative clustering (SLIC), normalized cuts (NCut), and the like. Taking the SLIC superpixel segmentation algorithm as an example, the feature point distribution density map of fig. 8C is segmented.
S804.1: Initialize the cluster centers.
The cluster centers are uniformly distributed in the image of fig. 8B according to the set number of superpixels. Assuming the image has N feature points and is pre-segmented into K superpixels of the same size, the size of each superpixel is N/K, and the distance (also called the step length) between adjacent cluster centers is S = √(N/K).
S804.2: the cluster centers are reselected in the neighborhood of the cluster centers.
The neighborhood size of the cluster center selected in this implementation is 3×3. First, the gradient values of all feature points in the neighborhood are calculated, and the cluster center is moved to the position with the smallest gradient in the neighborhood.
S804.3: and assigning a label to each feature point in the neighborhood of the cluster center.
In this embodiment, the assigned label indicates which cluster center a feature point belongs to. To accelerate convergence, the search range is expanded to twice the neighborhood range, i.e. 6×6.
S804.4: similarity is measured.
For each searched feature point, its degree of similarity to the nearest cluster center is calculated, and the label of the most similar cluster center is assigned to the feature point. This process is iterated until convergence. The specific similarity measure is as follows:
where d_c is the color difference between feature points, d_s is the spatial distance between feature points, D' is the similarity measure, and m is a balance parameter that weighs the proportion of color values and spatial information in the similarity measure. The larger D' is, the more similar the two feature points are. The color space information of the j-th feature point is (l_j, a_j, b_j) and its two-dimensional coordinate information is (x_j, y_j); the color information of the i-th feature point is (l_i, a_i, b_i) and its two-dimensional coordinate information is (x_i, y_i).
Referring to fig. 8D, a schematic diagram of the image blocking result obtained from fig. 8C with K = 24×32, in combination with the above implementation, is shown.
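A sketch of the blocking step using scikit-image's SLIC as a stand-in for the S804.1–S804.4 procedure described above; this assumes scikit-image ≥ 0.19, and n_segments and compactness are illustrative choices rather than values from the application.

```python
import numpy as np
from skimage.segmentation import slic

def block_labels(density_map, n_blocks=24 * 32):
    """Sketch of S804: superpixel-style blocking of the feature point
    distribution density map. Returns labels[v, u] = L(u, v) in {1, ..., K}."""
    d = density_map.astype(float)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)   # normalize to [0, 1] for SLIC
    return slic(d, n_segments=n_blocks, compactness=0.1,
                channel_axis=None, start_label=1)    # channel_axis=None marks a gray image
```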
Further, for each block in the image, a local center point can be calculated. Taking fig. 8D as an example, assume that the local center point of the j-th block is (u_j, v_j); it can be obtained from the coordinates (u_j*, v_j*) of the feature points of the j-th block, j = 1, 2, …, K.
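A sketch of computing the local center points, assuming each block's center is the mean coordinate of the feature points falling inside it; the exact formula of the application is not reproduced here.

```python
import numpy as np

def local_center_points(labels, feat_uv):
    """Sketch following S804: local center point (u_j, v_j) of each block,
    taken here as the mean (u, v) of the feature points inside that block."""
    feat_uv = np.asarray(feat_uv, dtype=float)
    centers = {}
    for j in np.unique(labels):
        # feature points whose (u, v) falls in block j (labels is indexed [v, u])
        in_block = [uv for uv in feat_uv if labels[int(uv[1]), int(uv[0])] == j]
        if in_block:
            centers[j] = np.mean(in_block, axis=0)   # (u_j, v_j)
    return centers
```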
S805: Calculate the bias coefficient of the target point, and adjust the weight coefficients using the bias coefficient.
The local center point (u_j, v_j) obtained in step S804 is taken as the target point, and the weight coefficient w(i, u_j, v_j) is obtained by the distance-weight calculation formula described in step S802.
The bias coefficient takes a value in the range (0, 1) and is used to adjust the weight coefficients: when the weight coefficients are small, e.g. almost 0, they are adjusted with a larger bias coefficient; otherwise they are adjusted with a smaller bias coefficient. In one possible implementation, it is considered that the feature points are unevenly distributed on the image: some regions have relatively many feature points, while others are partially vacant. In a region with many feature points, the discrete value of the weight coefficients of the feature points at the local center point is large; in a partial vacancy, the discrete value of the weight coefficients at the local center point is small and the weight coefficients are almost 0. To avoid the problem of the weight coefficients being almost 0 caused by a partial vacancy of the image, the bias coefficient has a negative-correlation mapping relationship with the discrete value of the weight coefficients of the feature points at the local center point. In one possible implementation, the bias coefficient is negatively correlated with the standard deviation of the weight coefficients of the feature points at the local center point. Exemplary description: the bias coefficient is a decreasing function of std(w(i, u_j, v_j)), where std() is the standard deviation function and a is a parameter that adjusts the influence of the standard deviation on the magnitude, with a > 0, for example a = 5.
Further, the bias coefficient obtained in the above manner is used to adjust the weight coefficients. In one possible implementation, the bias coefficient is added to each weight coefficient, and the results form a diagonal matrix whose diagonal elements are the sums of the bias coefficient and the weight coefficients; this diagonal matrix is the weight matrix. In the above exemplary description, the weight matrix W_j is expressed in the following specific form:

W_j = diag(w(1, u_j, v_j) + b_j, w(2, u_j, v_j) + b_j, …, w(n, u_j, v_j) + b_j)

where b_j is the bias coefficient at the local center point (u_j, v_j).
The application adjusts the weight coefficients from the feature points to the local center point through the bias coefficient: when the discrete value of the weight coefficients is large, the bias coefficient is small, and when the discrete value is small, such as at a partial vacancy, the bias coefficient is large. Adjusting the weight coefficients with the bias coefficient keeps them fluctuating within a preset range. Exemplary: the preset range is 0.95 to 1.05; a weight coefficient of 0.99 obtained after adjustment fluctuates within this range and meets the adjustment requirement. By adjusting the weight coefficients with the bias coefficient, the differences between the feature points can be averaged, and the result tends towards a global fit.
S806: transform coefficients at the target point are calculated.
As described above, the transform coefficients, i.e. the offset coefficient and the scaling coefficient, are usually obtained by curve fitting over a plurality of feature points. By way of example, the application takes least-squares curve fitting as an example and, in combination with the local center points and the weight matrix obtained in steps S804 and S805 and the weight coefficients obtained in step S802, describes in detail how the transform coefficients are obtained.
Assume that the transform coefficients of the local center point (u_j, v_j) of the j-th block are β_j = (s_j, μ_j), where s_j is the scaling coefficient and μ_j is the offset coefficient. The least-squares formulation fits these coefficients over the feature points, where d_i is the relative depth of the i-th feature point, k_i is the confidence of the i-th feature point, and λ is a coefficient on the scaling coefficient s_j whose purpose is to ensure that s_j is significant; those skilled in the art can set λ as required.
Taking the derivative of the above formula and solving gives the estimate:

β̂_j = (X^T · W_j · X)^(−1) · X^T · W_j · y

where β̂_j is the estimate of β_j, W_j is the weight matrix, X is the matrix formed by the relative-depth rows x_i = [d_i, 1], X^T is the transpose of X, and y is the absolute depth matrix.
Thus, the estimate of the transform coefficients at each block's local center point in the feature point distribution density map can be calculated from the obtained weight matrix, the relative depth matrix formed by the relative depth values at the feature points, and the absolute depth matrix formed by the absolute depth values at the feature points.
S807: transform coefficients of other target points on the image are acquired.
In order to better acquire the depth information, i.e. the absolute depth information, of the image, the transform coefficients of each target point on the image need to be calculated, so that the absolute depth information can be acquired by scaling and offsetting the relative depth information with the scaling coefficients and offset coefficients.
In one possible implementation, the transform coefficients of each target point on the image may be obtained by interpolation, for example linear interpolation or bilinear interpolation. The embodiment of the application does not limit the specific interpolation method.
Exemplary description: taking bilinear interpolation as an example, a scaling coefficient and an offset coefficient of each coordinate point are obtained.
As in the above embodiment, the transform coefficients of the local center point (u_j, v_j) of the j-th block are obtained, where the scaling coefficient is s_j and the offset coefficient is μ_j. The transform coefficients corresponding to the four local center points j_1, j_2, j_3, j_4 adjacent to the local center point of the j-th block can also be obtained, where j_1 and j_2 are the two adjacent center points, from left to right, above the local center point of the j-th block, and j_3 and j_4 are the two adjacent center points, from left to right, below it. The transform coefficients of the local center point (u_j, v_j) of the j-th block are then obtained with a bilinear interpolation function, where (u_1, v_1) and (u_1+1, v_1+1) are the local center points corresponding to j_1 and j_4, respectively.
To better illustrate how bilinear interpolation obtains the transform coefficients of the local center point of the j-th block, specific values are used below. The local center points corresponding to j_1, j_2, j_3, j_4 are (104, 202), (104, 203), (105, 202) and (105, 203), and the local center point of the j-th block is (104.615, 202.941).
Then:
namely:
substituting the value of formula (40) to obtain the firstjTransform coefficients corresponding to local center points of the blocks.
Therefore, in the embodiment of the application, interpolation is used instead of calculating a set of scaling parameters for every coordinate point, which reduces the calculation workload and improves the operation efficiency.
S808: depth information of each target point on the image is acquired.
The depth information of the image is obtained by scaling and offsetting the acquired relative depth values with the scaling coefficient and the offset coefficient:

dabs(u_j, v_j) = s_j · drel(u_j, v_j) + μ_j

where dabs(u_j, v_j) is the absolute depth value of the j-th target point, drel(u_j, v_j) is the relative depth value of the j-th target point, and s_j and μ_j are the scaling coefficient and offset coefficient in the transform coefficients of the j-th target point, respectively.
Specifically, for the superpixel-segmented image acquired from fig. 8B, the local center point (u_j, v_j) of the j-th block is acquired; the bias coefficient is then used to adjust the weight coefficients to obtain the weight matrix. Based on the manner of acquiring the scaling coefficient and the offset coefficient described in step S806, the scaling coefficient and the offset coefficient of each local center point are calculated, and the scaling coefficient and the offset coefficient of each coordinate point on the image are obtained by bilinear interpolation. Finally, the absolute depth information of the image is obtained based on the image absolute depth formula, as shown in fig. 8E. As can be seen from fig. 8E, the image depth information obtained by the method avoids abnormal blank areas of the image caused by uneven distribution of the feature points, such as partial vacancies.
According to the image depth information acquisition method provided by the embodiment of the application, a feature point distribution density map is introduced, the image is blocked, the local center point of each block is acquired and taken as a target point, the bias coefficient is calculated according to the degree of dispersion of the weight coefficients from the feature points to the target point, and the weight coefficients from the feature points to the target point are adjusted using the bias coefficient. Curve fitting is then performed with the adjusted weight coefficients to obtain the transform coefficients. Therefore, on the premise of ensuring the accuracy of the acquired absolute depth information map, since only the transform coefficients of the local center point of each block, rather than of every pixel point, need to be calculated, the calculation workload is reduced and the working efficiency is improved.
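Putting the earlier sketches together, the following end-to-end sketch of the fig. 8A pipeline fits (s, μ) only at the block centers and applies them per block; per-block constant coefficients are used instead of the bilinear interpolation of S807 purely to keep the example short, and all helper names refer to the assumed sketches above.

```python
import numpy as np

def blockwise_depth(drel, feat_uv, feat_conf, feat_abs, labels):
    """End-to-end sketch: block-wise scale-and-shift of a relative depth map drel.
    feat_uv: (n, 2) feature coordinates; feat_abs: (n,) absolute depths at features;
    labels: block label map from block_labels()."""
    feat_uv = np.asarray(feat_uv, dtype=float)
    centers = local_center_points(labels, feat_uv)                    # S804 (assumed helper)
    rel_at_feats = drel[feat_uv[:, 1].astype(int), feat_uv[:, 0].astype(int)]
    dabs = drel.astype(float).copy()      # blocks without a fitted center keep drel (simplification)
    for j, (u_j, v_j) in centers.items():
        w = weight_coefficients(feat_uv, feat_conf, (u_j, v_j))       # S802 (assumed helper)
        W = adjusted_weight_matrix(w)                                 # S805 (assumed helper)
        s, mu = fit_transform_coeffs(rel_at_feats, feat_abs, W)       # S806 (assumed helper)
        mask = labels == j
        dabs[mask] = s * drel[mask] + mu                              # S808 scale-and-shift
    return dabs
```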
The method for acquiring image depth information according to the embodiment of the present application is described in detail above with reference to fig. 1 to 8E, and the apparatus for acquiring image depth information according to the embodiment of the present application is described below with reference to the accompanying drawings.
Referring to fig. 9, a schematic structural diagram of an apparatus for obtaining image depth information according to an embodiment of the present application, the apparatus 900 includes:
a first acquisition unit 901 for acquiring a plurality of feature points of an image.
A second acquiring unit 902, configured to acquire a weight coefficient from each feature point to the target point on the image.
A first calculating unit 903 for calculating a bias coefficient at the target point, and adjusting the weight coefficient using the bias coefficient.
A second calculation unit 904 for calculating a transform coefficient at the target point.
A third obtaining unit 905 is configured to obtain transform coefficients of other target points on the image.
A fourth acquiring unit 906 for acquiring image depth information of each target point.
Optionally, the apparatus 900 may further include:
and a fifth acquisition unit for acquiring the characteristic point distribution density map.
And the blocking unit is used for carrying out image blocking based on the characteristic point distribution density map to obtain the center point of each block.
According to the image depth information acquisition device, the weight coefficients can be adjusted through the bias coefficient, so that the problem of partial blanks in the depth image information caused by partial vacancies can be avoided, and the accuracy of acquiring the image depth information is improved.
The foregoing and other operations and/or functions of each module/unit of the apparatus 900 according to the embodiment of the present application are respectively for implementing the corresponding flow of each method in the embodiment shown in the image depth information obtaining method, and are not described herein for brevity.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The embodiment of the application also provides a computer readable storage medium. The computer readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center containing one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc. The computer-readable storage medium includes instructions that instruct the computing device to perform the method for acquiring image depth information described above.
The embodiment of the application also provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions according to the embodiments of the application are wholly or partially produced. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computing device, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer program product may be a software installation package that can be downloaded and executed on a computing device when any of the aforementioned methods for acquiring image depth information is required.
The descriptions of the processes or structures corresponding to the drawings have emphasis, and the descriptions of other processes or structures may be referred to for the parts of a certain process or structure that are not described in detail.

Claims (15)

1. A method of acquiring image depth information, the method comprising:
acquiring a plurality of first weight coefficients from a plurality of feature points to a target point in an image; one of the first weight coefficients is used for representing the influence degree of one of the feature points on a transformation coefficient of the target point, and the transformation coefficient is used for representing a translation size and a scaling transformation size;
determining bias coefficients corresponding to the target points according to the dispersion of the plurality of first weight coefficients; the dispersion represents a degree of dispersion of the plurality of first weight coefficients; the bias coefficient and the dispersion are in a negative correlation relationship; adjusting the plurality of first weight coefficients using the bias coefficients;
obtaining a transformation coefficient of the target point according to the adjusted first weight coefficients; and acquiring image depth information of the target point according to the transformation coefficient of the target point.
2. The method of claim 1, wherein the dispersion is a standard deviation;
the determining a bias coefficient corresponding to the target point according to the dispersion of the plurality of first weight coefficients comprises:
determining bias coefficients corresponding to the target points based on a mapping relation between preset bias coefficients and standard deviations according to the standard deviations of the plurality of first weight coefficients; wherein the bias factor is a number in the range of 0 to 1.
3. The method of claim 1, wherein the acquiring a plurality of first weight coefficients for a plurality of feature points in the image to the target point comprises:
obtaining the distance from each feature point to the target point;
acquiring the plurality of first weight coefficients according to the distance from each characteristic point to the target point; the first weight coefficient is a number in a range of 0 to 1 in negative correlation with the distance.
4. A method according to claim 3, wherein said obtaining the plurality of first weight coefficients according to the distance from each feature point to the target point comprises:
acquiring the plurality of first weight coefficients according to the distance from each feature point to the target point and the confidence coefficient of each feature point; the first weight coefficient and the confidence coefficient are in positive correlation; the confidence is used to represent the probability that the true value of each feature point falls around the measurement.
5. The method according to claim 1, wherein the obtaining the transform coefficients of the target point according to the adjusted plurality of first weight coefficients comprises:
acquiring a weight matrix according to the adjusted first weight coefficient; the weight matrix is a diagonal matrix taking the adjusted first weight coefficient as a main diagonal element;
And acquiring the transformation coefficient of the target point based on a preset curve fitting mode according to the weight matrix.
6. The method of claim 5, wherein the predetermined curve fitting method is least squares.
7. The method according to claim 1, wherein the method further comprises:
acquiring transformation coefficients of a plurality of other target points in the image based on an interpolation algorithm according to the transformation coefficients of the target points;
and acquiring image depth information of the plurality of other target points according to the transformation coefficients of the plurality of other target points.
8. The method of claim 7, wherein the interpolation algorithm is a bilinear interpolation algorithm.
9. The method according to any one of claims 1-8, wherein the target point is a pixel of an image.
10. The method according to any one of claims 1-8, wherein the target point is a local center point, the target point being obtained by:
acquiring a second weight coefficient from each feature point to each pixel point in the feature points; the second weight coefficient is used for representing the influence degree of a characteristic point on the transformation coefficient of a pixel point;
Acquiring a characteristic point distribution density map of the image according to the acquired second weight coefficients;
dividing the characteristic point distribution density map based on a preset rule to obtain a plurality of divided characteristic point distribution density maps;
determining local center points corresponding to each segmented feature point distribution density map; each local center point is taken as one target point.
11. The method according to claim 10, wherein the acquiring the feature point distribution density map of the image according to the acquired plurality of the second weight coefficients includes:
acquiring a weight accumulated value of each pixel point; the weight accumulated value is an accumulated value of a plurality of second weight coefficients corresponding to each pixel point;
and acquiring the characteristic point distribution density map according to the weight accumulated value.
12. The method of claim 11, wherein the predetermined rule is that a difference value of the weight accumulated value is within a predetermined difference range, and a plurality of pixels having a distance between pixels within a predetermined distance range are distributed in the same block.
13. The method of claim 12, wherein the predetermined rule is a super-pixel segmentation algorithm.
14. An electronic device, the electronic device comprising:
a memory and a processor, the memory coupled with the processor;
the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the method of any of claims 1-13.
15. A computer readable storage medium comprising computer readable instructions which, when run on a computing device, cause the computing device to perform the method of any of claims 1-13.


