WO2023051215A1

WO2023051215A1 - Gaze point acquisition method and apparatus, electronic device and readable storage medium

Info

Publication number: WO2023051215A1
Application number: PCT/CN2022/117847
Authority: WO
Inventors: 孙哲
Original assignee: Oppo广东移动通信有限公司
Priority date: 2021-09-30
Filing date: 2022-09-08
Publication date: 2023-04-06
Also published as: CN113900519A

Abstract

Embodiments of the present application disclose a gaze point acquisition method and apparatus, an electronic device and a readable storage medium. The method comprises: acquiring a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model; inputting the first gaze point and historical gaze point distribution information into a second network model, and acquiring a second gaze point outputted by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically outputted by the second network model; and acquiring a target gaze point according to the second gaze point. Therefore, by means of the described method, the first gaze point outputted by the first network model will be further inputted into the second network together with the historical gaze point distribution information representing the distribution of the second gaze points historically outputted by the second network model, and then the target gaze point is acquired according to the second gaze point outputted by the second network model, thereby improving the accuracy of the target gaze point.

Description

Method, device, electronic device and readable storage medium for obtaining gaze point

Cross References to Related Applications

This application claims priority to Chinese Application No. 202111161492.9 filed September 30, 2021, which is hereby incorporated by reference in its entirety for all purposes.

technical field

The present application relates to the technical field of artificial intelligence, and more specifically, to a gaze point acquisition method, device, electronic equipment, and readable storage medium.

Background technique

With the development of technology, the electronic device can detect the position of the user's gaze on the screen, so as to perform corresponding operations according to the detected gaze position of the user. However, there is still a problem that the detection accuracy needs to be improved in the related manner of detecting the gaze position of the user.

Contents of the invention

In view of the above problems, the present application proposes a gaze point acquisition method, device, electronic device, and readable storage medium, so as to improve the above problems.

In a first aspect, the present application provides a fixation point acquisition method, which is applied to an electronic device, and the method includes: acquiring a first fixation point, which is obtained by inputting a fixation state image into a first network model The point of fixation; The first point of fixation and historical point of fixation distribution information are input to the second network model to obtain the second point of fixation output by the second network model, wherein the historical point of fixation distribution information represents the The distribution of the second gaze point historically output by the second network model; and obtaining the target gaze point according to the second gaze point.

In a second aspect, the present application provides a fixation point acquisition device that runs on an electronic device, and the device includes: a first fixation point acquisition unit, configured to acquire a first fixation point, and the first fixation point is a gaze state Image input to the gaze point obtained by the first network model; the second gaze point acquisition unit is used to input the first gaze point and historical gaze point distribution information into the second network model, and obtain the output of the second network model The second point of fixation, wherein, the distribution information of the historical point of fixation characterizes the distribution of the second point of fixation historically output by the second network model; the point of fixation determination unit is used to obtain the target gaze according to the second point of fixation point.

In a third aspect, the present application provides an electronic device, including one or more processors and a memory; one or more programs are stored in the memory and configured to be executed by the one or more processors, The one or more programs are configured to perform the methods described above.

In a fourth aspect, the present application provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, wherein the above method is executed when the program code is running.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. For those skilled in the art, other drawings can also be obtained based on these drawings without any creative effort.

FIG. 1 shows a schematic diagram of an application scenario of a gaze point acquisition method proposed in an embodiment of the present application;

FIG. 2 shows a flow chart of a method for obtaining a gaze point proposed in an embodiment of the present application;

FIG. 3 shows a schematic diagram of obtaining a first gaze point in an embodiment of the present application;

FIG. 4 shows another schematic diagram of obtaining the first gaze point in the embodiment of the present application;

Fig. 5 shows another schematic diagram of obtaining the first gaze point in the embodiment of the present application;

FIG. 6 shows a schematic diagram of a gaze point in an embodiment of the present application;

Fig. 7 shows a schematic diagram of different distances between the user's face and the electronic device in the embodiment of the present application;

FIG. 8 shows a flow chart of a gaze point acquisition method proposed in another embodiment of the present application;

FIG. 9 shows a schematic diagram of the second gaze point in history in the embodiment of the present application;

FIG. 10 shows a schematic diagram of a gaze area in an embodiment of the present application;

Fig. 11 shows a schematic diagram of another gaze area in the embodiment of the present application;

FIG. 12 shows a flow chart of a gaze point acquisition method proposed by another embodiment of the present application;

FIG. 13 shows a flow chart of a gaze point acquisition method proposed in another embodiment of the present application;

FIG. 14 shows a schematic diagram of a model training method proposed by an embodiment of the present application;

FIG. 15 shows a structural block diagram of a gaze point acquisition device proposed in an embodiment of the present application;

FIG. 16 shows a structural block diagram of a gaze point acquisition device proposed in another embodiment of the present application;

Fig. 17 shows a structural block diagram of an electronic device proposed by the present application;

Fig. 18 is a storage unit for storing or carrying program codes for realizing the fixation point acquisition method according to the embodiment of the present application according to the embodiment of the present application.

Detailed ways

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of this application.

With the development of technology, the electronic device can detect the position of the user's gaze on the screen, so as to perform corresponding operations according to the detected gaze position of the user. For example, in an information browsing scenario, the electronic device may determine whether to update the browsing information according to the detected position of the gaze point of the user, and the updating includes turning pages and the like. Furthermore, in some scenarios, the control operation corresponding to the button that the user is gazing at can be triggered according to the button that the user is gazing at.

However, the inventor found in research on related technologies for detecting the user's gaze position that the relevant method for detecting the user's gaze position still has the problem of insufficient detection accuracy. Moreover, in the related technology, in order to improve the detection accuracy, the user needs to watch the designated position on the screen of the electronic device according to the prompt of the electronic device before using it, which causes inconvenience to the user. Moreover, when there are many designated positions that need to be watched, it will also consume too much time for the user.

Therefore, in order to improve the above problems, the embodiments of the present application propose a gaze point acquisition method, device, electronic equipment, and readable storage medium. The method obtains the fixation point obtained by inputting the fixation state image into the first network model as the first fixation point, and then inputs the first fixation point and historical fixation point distribution information into the second network model to obtain the described The second gaze point output by the second network model, and then obtain the target gaze point according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model will be further input into the second network together with the historical gaze point distribution information representing the distribution of the second gaze point historically output by the second network model. model, and then obtain the target gaze point according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.

The application scenarios involved in the embodiments of the present application are firstly introduced below.

In the embodiment of the present application, the provided gaze point acquisition method may be executed by an electronic device. In the manner of being executed by the electronic device, all the steps in the gaze point acquisition method provided in the embodiment of the present application may be executed by the electronic device. Furthermore, it may also be executed by a server. In this manner executed by the electronic device, all steps in the gaze point acquisition method provided in the embodiment of the present application may be executed by the server. In addition, it can also be executed cooperatively by the electronic device and the server. In this mode of cooperative execution by the electronic device and the server, some steps in the gaze point acquisition method provided by the embodiment of the present application are executed by the electronic device, while other parts of the steps are executed by the server.

Exemplarily, as shown in FIG. 1 , the electronic device 100 may perform the gaze point acquisition method including: acquiring a first gaze point, and the first gaze point is a gaze point obtained by inputting a gaze state image into a first network model . After the electronic device 100 obtains the first gaze point, it can send the first gaze point to the server 200, and then the server 200 executes the gaze point acquisition method including: inputting the first gaze point and historical gaze point distribution information into the first gaze point Two network models, obtaining the second gaze point output by the second network model, wherein the historical gaze point distribution information characterizes the distribution of the second gaze point historically output by the second network model; according to the second The gaze point acquires a target gaze point, and may also return the target gaze point to the electronic device 100 .

It should be noted that, in this method of cooperative execution by the electronic device and the server, the steps performed by the electronic device and the server respectively are not limited to the method described in the above examples. In practical applications, the electronic device can be dynamically adjusted according to the actual situation Steps performed by the device and the server respectively. Wherein, the electronic device may be a smart phone, a tablet computer, and the like.

The embodiments involved in this application will be introduced below in conjunction with the accompanying drawings.

Please refer to Fig. 2, a gaze point acquisition method provided by this application is applied to electronic devices, and the method includes:

S110: Obtain a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.

Among them, in the embodiment of the present application, when the user is using the electronic device, the electronic device can collect the image of the user's face through the image acquisition device provided by the electronic device, and then obtain the gaze state image, and then input the collected gaze state image into the first network model to get the first fixation point. That is to say, the first network model can directly output the first gaze point correspondingly according to the collected gaze state image.

It should be noted that, as a method, obtaining the first gaze point in the embodiment of the present application can be understood as that the electronic device is responsible for inputting the acquired gaze state image into the first network model, and obtaining the gaze point output by the first network model . In this way, the first network model can be directly deployed locally on the electronic device, and after the electronic device collects the gaze state image through its own image acquisition device, it can input the collected gaze state image to the local first network In the model, the first gaze point output by the first network model is further obtained. Exemplarily, as shown in FIG. 3 , the electronic device 100 collects a gaze state image 10 , and then inputs the gaze state image 10 into the first network model 20 , and then obtains a first gaze point output by the first network model 20 .

As another manner, obtaining the first gaze point in the embodiment of the present application may be understood as obtaining the first gaze point output by other devices. In this manner, the electronic device can be understood as a device for obtaining a final target gaze point according to the first gaze point, and can return the finally determined first gaze point to the device that sends the first gaze point. Exemplarily, as shown in FIG. 4 , the electronic device 200 collects the gaze state image 10, then inputs the gaze state image 10 into the first network model 20, and then obtains the first gaze point output by the first network model 20, And the first gaze point can be transmitted to the electronic device 100 again, and then the electronic device 100 executes the gaze point acquisition method provided by the embodiment of the present application, and after performing the gaze point acquisition method to obtain the target gaze point, then the target gaze point Return to electronic device 200 .

As another way, after the electronic device collects the gaze state image, it transmits the collected gaze state image to other electronic devices, and then other electronic devices input the gaze state image into the first network model, and then the The first network model in the other electronic device outputs the first gaze point, and returns the outputted first gaze point to the electronic device. Exemplarily, as shown in FIG. 5 , after the electronic device 100 acquires the gaze state image 10, it can transmit the gaze state image 10 to the electronic device 300, and then the electronic device 300 then inputs the gaze state image 10 acquired to the In the local first network model, after the first gaze point output by the local first network model is obtained, the first gaze point is returned to the electronic device 100, and then the electronic device 100 executes this function based on the acquired first gaze point. The gaze point acquisition method provided in the application embodiment.

Wherein, the gaze state image may include an eye feature image, a face feature image, and a face key point image, wherein the eye feature image represents the iris position and the eyeball position, and the face feature image represents the facial features of the face. Distribution situation, the key point image of the human face represents the position of the key point in the human face. Among them, the five key points in the human face may include the centers of two eyeballs, the nose and the two corners of the mouth.

S120: Input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the second The distribution of the second fixation point historically output by the network model.

Wherein, the electronic device may record the second gaze point output by the second network model after starting to run, and obtain historical gaze point distribution information according to the recorded second gaze point. Then, each time data is input to the second network model, besides the first gaze point acquired in S110, the current historical gaze point distribution information will also be included. Moreover, after the second gaze point output by the second network model is acquired, the second gaze point will be added to the historical gaze point distribution information. Exemplarily, in the case that the historical gaze point distribution information includes gaze point z1, gaze point z2, gaze point z3, gaze point z4, and gaze point z5, in the process of executing S120, input together with the first gaze point The historical gaze point distribution information to the second network model includes gaze point z1, gaze point z2, gaze point z3, gaze point z4, and gaze point z5. If the second gaze point output by the second network model is gaze point z6, after adding gaze point z6 to the historical gaze point distribution information, the latest historical gaze point distribution information includes gaze point z1, gaze point z2, gaze point point z3, gaze point z4, gaze point z5, and gaze point z6.

It should be noted that one function of the second network model is to correct the first gaze point, so that the output second gaze point can more accurately represent the gaze position actually corresponding to the gaze state image. Moreover, the input data of the second network model includes historical gaze point distribution information, so that the second network model can improve the problem of error when the user gazes at the same position on the screen at different positions. Exemplarily, as shown in FIG. 6 , there is a position 40 in the electronic device, and in the relevant gaze position detection method, there will be a situation where the electronic device pays attention to the position 40 in different directions or different relative distances. The determined gaze position is not a matter of position 40. For example, as shown in FIG. 7, the left image of FIG. 7 and the right image of FIG. 7 respectively show two postures of the user holding the mobile phone, wherein, in the left image of FIG. 7, the user's face and The distance between the held mobile phone will be smaller than the distance between the user's face and the held mobile phone in the image on the right side of Figure 7, so in related technologies, even in the image on the left side of Figure 7 and the right side of Figure 7 In the case that the users shown in the side image are all looking at the same position, the final gaze point of the electronic device may be different. In the embodiment of the present application, because the second gaze point output by the second network model is Record to form the historical gaze point distribution information, and the historical gaze point distribution information will be used as the input of the second network model, so that the second network model can be used to greatly improve the user's gaze at the same position in different positions. error problem.

Wherein, the second network model may be a neural network regression (Quantile Regression Neural Network, QRNN).

It should be noted that during the operation of the second network model, the historical gaze point distribution information input into the second network model may belong to the same user, that is, the historical gaze point distribution information may represent the same user. The distribution of the user's gaze point on the screen, so the second network model can better obtain the user's gaze habit according to the historical gaze position of the same user, and then can more accurately determine the characteristic of the user's current gaze position The second fixation point, which in turn improves the accuracy of the second fixation point.

S130: Obtain a target gaze point according to the second gaze point.

Wherein, in the embodiment of the present application, the target gazing point may be understood as the position where the user is actually gazing determined by the electronic device. In other words, the target gaze point can be understood as the gaze point corresponding to the gaze state image. Moreover, if the target gaze point is related to the second gaze point, the target gaze point will be obtained according to the second gaze point. As a manner, the electronic device may use the second gaze point as the target gaze point.

A fixation point acquisition method provided in this embodiment is to acquire a first fixation point, the first fixation point is the fixation point obtained by inputting the fixation state image into the first network model, and then the first fixation point And historical gaze point distribution information is input to the second network model, the second gaze point output by the second network model is obtained, and the target gaze point is obtained according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model will be further input into the second network together with the historical gaze point distribution information representing the distribution of the second gaze point historically output by the second network model. model, and then obtain the target gaze point according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.

Please refer to FIG. 8, a gaze point acquisition method provided by this application is applied to electronic devices, and the method includes:

S210: Acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.

S220: Input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the second The distribution of the second fixation point historically output by the network model.

S230: Obtain multiple historical second gaze points, wherein the historical second gaze points are output by the second network model according to the input historical first gaze points, and the historical first gaze points are the first network gaze points A model is output based on a gaze state image input prior to said gaze state image input.

Exemplarily, as shown in FIG. 9, with the operation of the second network model, the first gaze point input to the second network model may include the first gaze point z7, the first gaze point z9, the first gaze point z11 and The first gaze point z13. Wherein, in the case of inputting the first gaze point z7, the second gaze point correspondingly output by the second network model is the second gaze point z8. In the case of inputting the first gaze point z9, the second gaze point correspondingly output by the second network model is the second gaze point z10. In the case of inputting the first gaze point z11, the second gaze point correspondingly output by the second network model is the second gaze point z12. In the case of inputting the first gaze point z13, the second gaze point correspondingly output by the second network model is the second gaze point z14. Then in this case, if it is determined that the number of multiple historical second fixation points acquired is 2, and the input to the second network model in S220 is the first fixation point z11, the corresponding multiple historical first fixation points acquired The two gaze points include the second gaze point z8 and the second gaze point z10. Moreover, if the first gaze point z13 is input to the second network model in S220, the corresponding multiple historical second gaze points include The second gaze point z10 and the second gaze point z12.

S240: Input the second gaze point and the multiple historical second gaze points into a third network model, and acquire a third gaze point output by the third network model.

Wherein, the third network model may be a long short-term memory artificial neural network (Long Short-Term Memroy). The long-term short-term memory network is a kind of time recurrent neural network, which can be used to solve the long-term dependence problem of the general recurrent neural network (Recurrent Neural Network), which belongs to a kind of time recurrent neural network. In the embodiment of the present application, in the process of determining the gaze point to be output, the third network model will not only refer to the input second gaze point, but also combine the multiple historical second gaze points, so that more accurate Determine the fixation point to which the fixation state image actually corresponds. Specifically, in this embodiment, the obtained multiple historical second gaze points are continuous in time with the second gaze points obtained according to the first gaze point, which means that input to the third network The second fixation point and multiple historical second fixation points in the model represent the user's continuous gaze operation in the recent period, and the long-short memory artificial neural network can memorize and transfer the relevant information of the last output content to the In the next output determination process, so that the third network model is a long-short-term memory artificial neural network, the third network model can determine the current output third gaze point in conjunction with the user's continuous gaze operations in the most recent period of time, thereby This enables the output of the third gaze point to be more stable and accurate.

S250: Use the third gaze point as a target gaze point.

As a method, the data processing parameters of the electronic device are obtained, and the processing parameters characterize the data processing capability of the electronic device; according to the data processing parameters, the number of the multiple historical second gaze points obtained is determined .

It should be noted that, the more data is input into the third network model, the third network model can relatively output a more accurate third gaze point representing the user's actual gaze position. However, correspondingly, the more data input into the third network model, the more data the third network model needs to process, so in the same model running environment, the more data the third network model needs to process If there are too many, it will cause the time consumed for each data output to be longer. In order to make the output of the third gaze point by the third network model have better adaptability, the device running the third network model can determine the acquired multiple The number of historical second fixations. Optionally, if the data processing capability of the electronic device represented by the data processing parameter is stronger, the number of multiple historical second fixation points acquired is larger. Correspondingly, if the data of the electronic device represented by the data processing parameter The weaker the processing capability, the smaller the number of acquired multiple historical second gaze points.

Optionally, the data processing parameters may include multiple parameters, then determining the number of the multiple historical second gaze points obtained according to the data processing parameters may include: obtaining scores corresponding to multiple parameters; The scores corresponding to each parameter are used to obtain the total score; according to the total score, the number of acquired multiple historical second fixation points is determined. Wherein, the electronic device may obtain scoring rules corresponding to multiple parameters, and then obtain the scoring rules corresponding to each parameter based on the scoring rules corresponding to each parameter, and add the scores corresponding to multiple parameters to obtain the total score , and then determine the number of multiple historical second fixation points to be acquired according to the total score and the number of historical second fixation points. Exemplarily, the multiple parameters may include: the number of processor cores, the main frequency of the processor, and available memory, etc. During the scoring process, if the score corresponding to the number of processor cores is p1, the corresponding The score is p2, and the score corresponding to the available memory is p3, then the total score obtained is p1+p2+p3.

This embodiment provides a fixation point acquisition method, so that the first fixation point output through the first network model is further compared with the distribution of the second fixation point historically output by the second network model through the above method The distribution information of the historical gaze point is input to the second network model together, and then the target gaze point is obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.

Moreover, in this embodiment, after obtaining the second fixation point output by the second model, the second fixation point output this time and the multiple historical second fixation points output by the second network model can be recreated together. Input it into the third network model, and then obtain the third gaze point output by the third network model as the target gaze point. In this way, when different users gaze at the same position, the target gaze point of the electronic device is different. problem, thereby improving the accuracy of the final gaze point of the electronic device. And, in this embodiment, along with the increase of the quantity of the second fixation point output by the second network model, the quantity of the second fixation point included in the historical fixation point distribution information is also more, so the historical fixation point The distribution information can also more accurately record the user's habit of gazing at the screen, and then in the process of outputting the second gaze point through the second network model, as the number of times the second network model runs increases, the second network model can be more accurate. Accurate and more stable output of the second fixation point.

It should be noted that, as can be seen from the foregoing embodiments, in the embodiment of the present application, obtaining the target gaze point according to the second gaze point may include directly using the second gaze point as the target gaze point. Furthermore, obtaining the target gaze point according to the second gaze point may also include inputting the second gaze point and multiple historical second gaze points into the third network model, and obtaining the third gaze point output by the third network model , and take the third fixation point as the target fixation point. Then, in the case that there are multiple ways to obtain the target gaze point according to the second gaze point, the electronic device may determine which method to use to obtain the target gaze point according to current actual needs.

As a manner, the electronic device may determine according to the current application scenario which method is specifically adopted to determine the manner of acquiring the gaze point of the target. It should be noted that the user is usually using the electronic device in the process of using the electronic device to obtain his gaze point, and the user is usually using the application program in the electronic device in the process of using the electronic device. The inventor found in the research that different application programs have different requirements for the detection accuracy of the gaze point. Some application programs require more accurate gaze point detection, while some application programs have relatively low gaze point detection requirements. sketchy. For example, some applications will provide a gaze area. If it is detected that the user’s gaze on the gaze area meets the specified duration, the corresponding operation will be triggered. Usually, the area of the gaze area is relatively large, which makes the gaze The detection of the position can have a better error tolerance rate. For example, as shown in FIG. 10 , if it is detected that the user gazes at button 1 for a specified duration, the operation corresponding to button 1 will be triggered; if it is detected that the user gazes at button 2 for a specified duration, the operation corresponding to button 2 will be triggered. Moreover, as shown in FIG. 10 , the areas covered by button 1 and button 2 are relatively large, so that even if there is a certain error between the gaze point detected by the electronic device and the actual gaze point, it can still be judged more accurately Whether the user is looking at button 1 or button 2.

In other application scenarios, the corresponding gaze area is relatively small, and the electronic device may need to detect the actual gaze position more accurately to achieve more effective control. For example, as shown in FIG. 11 , the electronic device is in an information browsing scene (for example, web page browsing), and the interface corresponding to the information browsing scene includes a text area A, a text area B, a text area C, and a text area D. , text area E, text area F, and text area G. If the electronic device detects that the user is staring at the text area A for a long time, the page can be turned to the upper part shown in FIG. It is clear that each text area shown in FIG. 10 is small (smaller than the covered area of the button shown in FIG. 9 ), then a relatively precise gaze point is required to achieve an accurate page turning operation.

Among the two ways of obtaining the target gaze point provided in the embodiment of the present application, the obtained third gaze point has a higher probability than the second gaze point and can accurately represent the actual gaze point. Then based on the aforementioned method, obtaining the target gaze point according to the second gaze point includes: obtaining the current application scene, obtaining the way of determining the gaze point corresponding to the current application scene; and then determining the gaze point corresponding to the current application scene The fixation point is used to obtain the target fixation point. Moreover, the manner of determining the gaze point corresponding to the application scenario corresponds to the detection accuracy required by the application scenario. For example, if the way of determining the gaze point corresponding to the current application scenario is to use the second gaze point as the target gaze point, then after obtaining the second gaze point output by the second network model, the obtained second gaze point will be The fixation point serves as the target fixation point. If the method of determining the gaze point corresponding to the current application scenario is to use the third gaze point as the target gaze point, then after obtaining the second gaze point output by the second network model, multiple historical second gaze points will be obtained Point, input the second gaze point and multiple historical second gaze points into the third network model, and obtain the third gaze point output by the third network model as the target gaze point.

Optionally, the electronic device may determine the current application scene according to the application program running on the electronic device in the foreground during gaze point detection. For example, if the application currently running in the foreground is a text browsing program, then it can be determined that the current scene is an information browsing scene.

Please refer to FIG. 12 , a gaze point acquisition method provided by this application is applied to electronic devices, and the method includes:

S310: Obtain a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.

S320: Detect whether the first gaze point is valid.

As a manner, the detecting whether the first gaze point is valid includes: detecting whether the eyeball state represented by the first gaze point satisfies a target state; if the target state is satisfied, determining that the first gaze point is valid. Wherein, the target state includes eyes being in an open state. In some cases, even if the user's eyes are closed, the first network model can still output the first gaze point, but the outputted first gaze point is invalid. Then, by screening whether the first gaze point is valid, the gaze point corresponding to the image in which the user is actually in the closed-eye state can be screened out, so as to avoid inputting the invalid first gaze point into the subsequent model.

S330: If the first gaze point is valid, input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical The gaze point distribution information represents the distribution of the second gaze points historically output by the second network model.

S340: Obtain a target gaze point according to the second gaze point.

If the first gaze point is invalid, end the procedure.

This embodiment provides a fixation point acquisition method, so that the first fixation point output through the first network model is further compared with the distribution of the second fixation point historically output by the second network model through the above method The distribution information of the historical gaze point is input to the second network model together, and then the target gaze point is obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point. Moreover, in this embodiment, after obtaining the first fixation point, it is possible to first judge whether the first fixation point is valid, and then if the first fixation point itself is invalid, no subsequent follow-up processing, which in turn helps to improve the effectiveness of controlling the electronic device based on the point of gaze.

Please refer to FIG. 13 , a method for obtaining a gaze point provided by the present application, the method includes:

S410: Obtain the sample gaze state images and the labeled gaze points corresponding to each sample gaze state image.

S420: Train the first network model to be trained by using the sample gaze state images and the labeled gaze points corresponding to each sample gaze state image, to obtain the first network model.

S430: Obtain a fixation point output by the first network model to be trained during the training process as a first training fixation point.

S440: Train the second network model to be trained by using the first training gaze point, the distribution information of the historical second training gaze point distribution information, and the marked gaze point corresponding to each sample gaze state image, to obtain the first training gaze point. The second network model, wherein the historical second training fixation point distribution information includes the distribution of fixation points output by the second network model to be trained during the training process.

S450: If the second training fixation point output by the second network model to be trained is obtained, obtain a plurality of historical second training fixation points, wherein the historical second training fixation points are the second training fixation point of the second network to be trained The model is output according to the history of the first training fixation point of input, and the first training fixation point of the history is that the network model to be trained is input to the first network model to be trained according to the current sample gaze state image The sample gaze state image output of the current sample gaze state image is the sample gaze state image corresponding to the second training gaze point.

S460: Train a third network model to be trained by using the second training gaze point and the multiple historical second training gaze points, to obtain the third network model.

S470: Acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.

S480: Input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the second gaze point distribution information The distribution of the second fixation point historically output by the network model.

S490: Obtain a target gaze point according to the second gaze point.

Exemplarily, as shown in FIG. 14 , among the acquired multiple sample gaze state images, each sample gaze state image includes a left-eye image, a right-eye image, a face image, and images of five key points in the face. Wherein, the face image represents the relative distribution position of the facial features in the face. Among them, the five key points in the human face include the centers of two eyeballs, the nose and the two corners of the mouth.

After obtaining the sample gaze state image and the respective corresponding annotation gaze points (coordinate point true value in Fig. Part of the data is screened out from the gaze point to generate batch data (batch data), and then the batch data is input into the neural network model (the first network model to be trained) to make the neural network model perform inference, and output the predicted coordinate point ( The first training gaze point), and then calculate the loss combined with the true value of the coordinate point, and then train the neural network model according to the calculated loss to optimize the gradient of the neural network model, so that the loss obtained by the subsequent calculation is relatively reduced until the calculated loss is minimal.

It should be noted that, in this embodiment, S410 to S460 can be executed by the server. After the server executes the steps from S410 to S460, the complete first network model, second network model and third network model can be trained Deployed to the electronic device, so that the electronic device correspondingly executes the steps from S470 to S490 in the embodiment of the present application.

This embodiment provides a fixation point acquisition method, so that the first fixation point output through the first network model is further compared with the distribution of the second fixation point historically output by the second network model through the above method The distribution information of the historical gaze point is input to the second network model together, and then the target gaze point is obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point. Moreover, this embodiment provides a training method for the first network model, the second network model and the third network model.

Please refer to FIG. 15 , a gaze point acquisition device 500 provided by the present application runs on an electronic device, and the device 500 includes:

The first gaze point acquisition unit 510 is configured to acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.

As a method, the gaze state image includes an eye feature image, a face feature image, and a face key point image, wherein the eye feature image represents the iris position and the eyeball position, and the face feature image represents the face. The distribution of facial features in the human face, and the key point image of the human face represents the position of the key points in the human face.

The second gaze point acquisition unit 520 is configured to input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point The point distribution information represents the distribution of the second fixation points historically output by the second network model.

The gaze point determining unit 530 is configured to acquire a target gaze point according to the second gaze point.

As a manner, the gaze point determination unit 530 is specifically configured to obtain multiple historical second gaze points, wherein the historical second gaze points are output by the second network model according to the input historical first gaze points, so The first gaze point of the history is that the first network model is output according to the gaze state image input before the gaze state image input; the second gaze point and the multiple historical second gaze points are input to the second gaze point A three-network model, obtaining a third gaze point output by the third network model; using the third gaze point as a target gaze point. The gaze point determining unit 530 is also specifically configured to acquire data processing parameters of the electronic device, the processing parameters characterizing the data processing capability of the electronic device; Two the number of gaze points.

As a method, the second gaze point acquisition unit 520 is also used to detect Whether the first gaze point is valid; if the first gaze point is valid, input the first gaze point and historical gaze points to the second network model to obtain the second gaze point output by the second network model . Optionally, the second gaze point acquisition unit 520 is specifically configured to detect whether the eyeball state represented by the first gaze point satisfies a target state; if the target state is met, determine that the first gaze point is valid.

A fixation point acquisition device provided in this embodiment is used to acquire a first fixation point, the first fixation point is the fixation point obtained by inputting the fixation state image into the first network model, and then the first fixation point And historical gaze point distribution information is input to the second network model, the second gaze point output by the second network model is obtained, and the target gaze point is obtained according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model will be further input into the second network together with the historical gaze point distribution information representing the distribution of the second gaze point historically output by the second network model. model, and then obtain the target gaze point according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.

As shown in Figure 16, the device 500 also includes:

The model training unit 540 is used to obtain the sample gaze state image and each sample gaze state image corresponding to the marked gaze point; by the sample gaze state image and each sample gaze state image respectively corresponding to the mark gaze point, to the first The network model to be trained is trained to obtain the first network model.

The model training unit 540 is also used to obtain the gaze point of the output of the first network model to be trained in the training process as the first training gaze point; through the first training gaze point, the history of the second training gaze point distribution information and the corresponding marked gaze point of each sample gaze state image, the second network model to be trained is trained to obtain the second network model, wherein the historical second training gaze point distribution information includes the Describe the distribution of gaze points output by the second network model to be trained during the training process.

The model training unit 540 is further configured to obtain a plurality of historical second training gaze points if the second training gaze point output by the second network model to be trained is obtained, wherein the historical second training gaze point is the The second network model to be trained is output according to the historical first training fixation point of input, and the first training fixation point of the history is that the first network model to be trained is input to the first fixation point according to the current sample gaze state image. A sample gaze state image output of a network model to be trained, the current sample gaze state image is a sample gaze state image corresponding to the second training gaze point; through the second training gaze point and the plurality of historical first Two training gaze points are used to train the third network model to be trained to obtain the third network model.

It should be noted that those skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the above-described devices and units can refer to the corresponding process in the foregoing method embodiment, and will not be repeated here. . In several embodiments provided in the present application, the coupling between the modules may be electrical. In addition, each functional module in each embodiment of the present application may be integrated into one processing module, each module may exist separately physically, or two or more modules may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.

An electronic device provided by the present application will be described below with reference to FIG. 17 .

Please refer to FIG. 17 , based on the above-mentioned device control method and apparatus, the embodiment of the present application also provides an electronic device 1000 that can implement the aforementioned device control method. The electronic device 1000 includes one or more (only one is shown in the figure) processors 102 , a memory 104 , a camera 106 and an audio collection device 108 coupled to each other. Wherein, the memory 104 stores programs capable of executing the contents of the foregoing embodiments, and the processor 102 can execute the programs stored in the memory 104 .

Wherein, the processor 102 may include one or more processing cores. The processor 102 uses various interfaces and circuits to connect various parts of the entire electronic device 1000, and executes or executes instructions, programs, code sets, or instruction sets stored in the memory 104, and calls data stored in the memory 104 to execute Various functions of the electronic device 1000 and processing data. Optionally, the processor 102 may adopt at least one of Digital Signal Processing (Digital Signal Processing, DSP), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), and Programmable Logic Array (Programmable Logic Array, PLA). implemented in the form of hardware. The processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), a modem, and the like. Among them, the CPU mainly handles the operating system, user interface and application programs, etc.; the GPU is used to render and draw the displayed content; the modem is used to handle wireless communication. It can be understood that the above modem may also not be integrated into the processor 102, but implemented by a communication chip alone. As one manner, the processor 102 may be a neural network chip. For example, it may be an embedded neural network chip (NPU).

The memory 104 may include random access memory (Random Access Memory, RAM), and may also include read-only memory (Read-Only Memory). Memory 104 may be used to store instructions, programs, codes, sets of codes, or sets of instructions. For example, a data acquisition device may be stored in the memory 104 . The data acquisition device may be the aforementioned device 500 . The memory 104 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.

Furthermore, the electronic device 1000 may further include a network module 110 and a sensor module 112 in addition to the aforementioned devices.

The network module 110 is used to implement information interaction between the electronic device 1000 and other devices, for example, transmitting device control instructions, manipulation request instructions, and status information acquisition instructions. However, when the electronic device 200 is specifically a different device, its corresponding network module 110 may be different.

The sensor module 112 may include at least one sensor. Specifically, the sensor module 112 may include, but is not limited to: a level, a light sensor, a motion sensor, a pressure sensor, an infrared heat sensor, a distance sensor, an acceleration sensor, and other sensors.

Wherein, the pressure sensor may be a sensor for detecting pressure generated by pressing on the electronic device 1000 . That is, the pressure sensor detects pressure generated by contact or press between the user and the electronic device, eg, contact or press between the user's ear and the mobile terminal. Therefore, the pressure sensor can be used to determine whether contact or pressure occurs between the user and the electronic device 1000, and the magnitude of the pressure.

Among them, the acceleration sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when it is still, and can be used to identify the application of electronic equipment 1000 attitude (such as horizontal and vertical screen switching, related games, magnetometer, etc.) Attitude calibration), vibration recognition related functions (such as pedometer, tapping), etc. In addition, the electronic device 1000 may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, and a thermometer, which will not be repeated here.

The audio collection device 110 is configured to collect audio signals. Optionally, the audio collection device 110 includes multiple audio collection devices, and the audio collection devices may be microphones.

As one way, the network module of the electronic device 1000 is a radio frequency module, and the radio frequency module is used to receive and send electromagnetic waves, realize mutual conversion between electromagnetic waves and electrical signals, and communicate with a communication network or other devices. The radio frequency module may include various existing circuit elements for performing these functions, such as antenna, radio frequency transceiver, digital signal processor, encryption/decryption chip, Subscriber Identity Module (SIM) card, memory and so on. For example, the radio frequency module can interact with external devices by sending or receiving electromagnetic waves. For example, a radio frequency module can send instructions to a target device.

Please refer to FIG. 18 , which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. Program codes are stored in the computer-readable medium 800, and the program codes can be invoked by a processor to execute the methods described in the foregoing method embodiments.

The computer readable storage medium 800 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM. Optionally, the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium (non-transitory computer-readable storage medium). The computer-readable storage medium 800 has a storage space for program code 810 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products. Program code 810 may, for example, be compressed in a suitable form.

In summary, the present application provides a fixation point acquisition method, device, electronic equipment, and readable storage medium to acquire the first fixation point, which is obtained by inputting the fixation state image into the first network model. Obtained fixation point, then input the first fixation point and historical fixation point distribution information to the second network model, obtain the second fixation point output by the second network model, and then obtain according to the second fixation point target gaze point. Therefore, through the above method, the first gaze point output by the first network model will be further input into the second network together with the historical gaze point distribution information representing the distribution of the second gaze point historically output by the second network model. model, and then obtain the target gaze point according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point. Moreover, in the embodiment of the present application, because the second network model and the third network model can be used to make the final target gaze point more accurate and stable, therefore, it is not necessary for the user to follow the prompts of the electronic device when the user starts using it. Focus on the position for calibration operations, which saves the user's time and improves efficiency. Moreover, it also makes the solution provided by the embodiment of the present application better applicable to different users, and there will be no

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: it can still Modifications are made to the technical solutions described in the foregoing embodiments, or equivalent replacements are made to some of the technical features; and these modifications or replacements do not drive the essence of the corresponding technical solutions away from the spirit and scope of the technical solutions of the various embodiments of the present application.

Claims

A gaze point acquisition method, characterized in that it is applied to electronic equipment, the method comprising:

Acquiring a first gaze point, the first gaze point is the gaze point obtained by inputting the gaze state image into the first network model;

The first gaze point and historical gaze point distribution information are input to the second network model, and the second gaze point output by the second network model is obtained, wherein the historical gaze point distribution information characterizes the second network model The distribution of the second gaze point of the historical output;

Acquiring a target gaze point according to the second gaze point.
The method according to claim 1, wherein said acquiring a target gaze point according to said second gaze point comprises:

Obtaining a plurality of historical second gaze points, wherein the historical second gaze points are output by the second network model according to the input historical first gaze points, and the historical first gaze points are output by the first network model according to The gaze state image output that is input before the gaze state image input;

The second gaze point and the multiple historical second gaze points are input to a third network model, and the third gaze point output by the third network model is obtained;

The third gaze point is used as the target gaze point.
The method according to claim 2, further comprising:

Acquire data processing parameters of the electronic device, where the processing parameters represent the data processing capabilities of the electronic device;

The quantity of the acquired multiple historical second fixation points is determined according to the data processing parameter.
The method according to claim 3, characterized in that, if the data processing capability of the electronic device represented by the data processing parameter is stronger, then the number of multiple historical second fixation points acquired is larger, if the data processing The weaker the data processing capability of the electronic device represented by the parameter, the smaller the number of acquired multiple historical second gaze points.
The method according to claim 3, wherein the data processing parameters include a plurality of parameters, and the determination of the number of acquired second gaze points according to the data processing parameters includes:

Acquiring scores corresponding to each of the multiple parameters;

obtaining a total score based on scores corresponding to each of the plurality of parameters;

The number of acquired multiple historical second fixation points is determined according to the total score.
The method according to claim 5, wherein said obtaining a total score based on the respective scores corresponding to said plurality of parameters comprises:

Acquiring scoring rules corresponding to the multiple parameters;

A score corresponding to each parameter is obtained based on a scoring rule corresponding to each parameter.
The method according to claim 1, wherein said acquiring a target gaze point according to said second gaze point comprises:

Get the current application scenario;

Obtaining a way of determining a gaze point corresponding to the current application scene;

The target gaze point is acquired based on the way of determining the gaze point corresponding to the current application scene and the second gaze point.
The method according to claim 7, wherein the method of determining the gaze point corresponding to the current application scene and the second gaze point to obtain the target gaze point include:

If the way of determining the gaze point corresponding to the current application scenario is to use the second gaze point as the target gaze point, after obtaining the second gaze point output by the second network model, the acquired The second gaze point is used as the target gaze point.
The method according to claim 8, characterized in that the method further comprises:

If the way of determining the gaze point corresponding to the current application scenario is to use the third gaze point as the target gaze point, then after obtaining the second gaze point output by the second network model, obtain multiple historical second gaze points. fixation point, inputting the second fixation point and the plurality of historical second fixation points into a third network model;

Acquiring a third gaze point output by the third network model as a target gaze point.
The method according to claim 7, wherein acquiring the current application scenario comprises:

Determine the current application scenario based on the applications running in the foreground.
The method according to any one of claims 1-10, wherein the first gaze point and the historical gaze point are input to the second network model, and the second gaze point output by the second network model is obtained Previously also included:

Detecting whether the first gaze point is valid;

If the first gaze point is valid, perform the step of inputting the first gaze point and historical gaze points into the second network model, and obtain a second gaze point output by the second network model.
The method according to claim 11, wherein the detecting whether the first gaze point is valid comprises:

Detecting whether the eyeball state represented by the first fixation point satisfies the target state;

If the target state is met, it is determined that the first gaze point is valid.
The method according to any one of claims 1-12, wherein the gaze state image comprises an eye feature image, a face feature image and a face key point image, wherein the eye feature image represents the position of the iris As well as eyeball positions, the facial feature image represents the distribution of facial features, and the human face key point image represents the position of key points in the human face.
According to the method described in any one of claims 1-13, it is characterized in that said obtaining the first point of fixation, said first point of fixation is also included before the fixation point obtained by inputting the fixation state image into the first network model :

Obtaining the sample fixation state image and the marked fixation point corresponding to each sample fixation state image;

The first network model to be trained is trained by using the sample gaze state images and the marked gaze points corresponding to each sample gaze state image to obtain the first network model.
The method according to claim 14, characterized in that, after said obtaining the sample gaze state image and each corresponding mark gaze point of each sample gaze state image, it also includes:

Obtain the fixation point of the output of the first network model to be trained in the training process as the first training fixation point;

Through the first training fixation point, the historical second training fixation point distribution information and the corresponding labeling fixation point of each sample gaze state image, the second network model to be trained is trained to obtain the second network Model, wherein, the second training fixation point distribution information of the history includes

The distribution of gaze points output by the second network model to be trained during the training process.
The method according to claim 15, characterized in that, after obtaining the gaze point of the output of the first network model to be trained in the training process as the first training gaze point, it also includes:

If the second training fixation point output by the second network model to be trained is obtained, a plurality of historical second training fixation points are obtained, wherein the second training fixation point of the history is the second training fixation point of the second network model to be trained according to The first training fixation point of the input history is output, and the first training fixation point of the history is the first network model to be trained according to the sample input to the first network model to be trained before the current sample gaze state image The fixation state image output, the current sample fixation state image is the sample fixation state image corresponding to the second training fixation point;

The third network model to be trained is obtained by using the second training gaze point and the plurality of historical second training gaze points to train the third network model.
The method according to any one of claims 1-16, characterized in that before acquiring the first gaze point, it also includes:

The image of the user's face is collected by the image collection device of the electronic device to obtain a gaze state image.
A gaze point acquisition device is characterized in that it operates on electronic equipment, and the device includes:

The first fixation point acquisition unit is used to acquire the first fixation point, and the first fixation point is the fixation point obtained by inputting the fixation state image into the first network model;

The second gaze point acquisition unit is used to input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point The distribution information characterizes the distribution of the second fixation point historically output by the second network model;

A gaze point determining unit, configured to acquire a target gaze point according to the second gaze point.
An electronic device, characterized in that it includes one or more processors and memory;

One or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any one of claims 1-17.
A computer-readable storage medium, wherein a program code is stored in the computer-readable storage medium, wherein the method according to any one of claims 1-17 is executed when the program code is running.