
WO2023051215A1 - Gaze point acquisition method and apparatus, electronic device and readable storage medium - Google Patents


Info

Publication number
WO2023051215A1
Authority
WO
WIPO (PCT)
Prior art keywords
gaze
gaze point
point
network model
fixation
Prior art date
Application number
PCT/CN2022/117847
Other languages
French (fr)
Chinese (zh)
Inventor
孙哲
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2023051215A1 publication Critical patent/WO2023051215A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the technical field of artificial intelligence, and more specifically, to a gaze point acquisition method, device, electronic equipment, and readable storage medium.
  • the electronic device can detect the position of the user's gaze on the screen, so as to perform corresponding operations according to the detected gaze position of the user.
  • however, the detection accuracy of related methods for detecting the user's gaze position still needs to be improved.
  • the present application proposes a gaze point acquisition method, device, electronic device, and readable storage medium, so as to improve the above problems.
  • the present application provides a gaze point acquisition method applied to an electronic device, and the method includes: acquiring a first gaze point, where the first gaze point is a gaze point obtained by inputting a gaze state image into a first network model; inputting the first gaze point and historical gaze point distribution information into a second network model to obtain a second gaze point output by the second network model, where the historical gaze point distribution information characterizes the distribution of second gaze points historically output by the second network model; and obtaining a target gaze point according to the second gaze point.
  • the present application provides a gaze point acquisition device running on an electronic device, and the device includes: a first gaze point acquisition unit configured to acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting a gaze state image into a first network model; a second gaze point acquisition unit configured to input the first gaze point and historical gaze point distribution information into a second network model and obtain a second gaze point output by the second network model, where the historical gaze point distribution information characterizes the distribution of second gaze points historically output by the second network model; and a gaze point determination unit configured to obtain a target gaze point according to the second gaze point.
  • the present application provides an electronic device, including one or more processors and a memory; one or more programs are stored in the memory and configured to be executed by the one or more processors, The one or more programs are configured to perform the methods described above.
  • the present application provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, wherein the above method is executed when the program code is running.
  • FIG. 1 shows a schematic diagram of an application scenario of a gaze point acquisition method proposed in an embodiment of the present application
  • FIG. 2 shows a flow chart of a method for obtaining a gaze point proposed in an embodiment of the present application
  • FIG. 3 shows a schematic diagram of obtaining a first gaze point in an embodiment of the present application
  • FIG. 4 shows another schematic diagram of obtaining the first gaze point in the embodiment of the present application
  • Fig. 5 shows another schematic diagram of obtaining the first gaze point in the embodiment of the present application
  • FIG. 6 shows a schematic diagram of a gaze point in an embodiment of the present application
  • Fig. 7 shows a schematic diagram of different distances between the user's face and the electronic device in the embodiment of the present application
  • FIG. 8 shows a flow chart of a gaze point acquisition method proposed in another embodiment of the present application.
  • FIG. 9 shows a schematic diagram of the second gaze point in history in the embodiment of the present application.
  • FIG. 10 shows a schematic diagram of a gaze area in an embodiment of the present application.
  • Fig. 11 shows a schematic diagram of another gaze area in the embodiment of the present application.
  • FIG. 12 shows a flow chart of a gaze point acquisition method proposed by another embodiment of the present application.
  • FIG. 13 shows a flow chart of a gaze point acquisition method proposed in another embodiment of the present application.
  • FIG. 14 shows a schematic diagram of a model training method proposed by an embodiment of the present application.
  • FIG. 15 shows a structural block diagram of a gaze point acquisition device proposed in an embodiment of the present application.
  • FIG. 16 shows a structural block diagram of a gaze point acquisition device proposed in another embodiment of the present application.
  • Fig. 17 shows a structural block diagram of an electronic device proposed by the present application.
  • Fig. 18 shows a storage unit for storing or carrying program code for implementing the gaze point acquisition method according to the embodiments of the present application.
  • the electronic device can detect the position of the user's gaze on the screen, so as to perform corresponding operations according to the detected gaze position of the user. For example, in an information browsing scenario, the electronic device may determine whether to update the browsing information according to the detected position of the user's gaze point, where the updating includes turning pages and the like. Furthermore, in some scenarios, the control operation corresponding to the button that the user is gazing at can be triggered.
  • in related methods, in order to improve the detection accuracy, the user needs to gaze at designated positions on the screen of the electronic device according to prompts before using it, which causes inconvenience to the user. Moreover, when there are many designated positions that need to be gazed at, this also consumes too much of the user's time.
  • the embodiments of the present application propose a gaze point acquisition method, device, electronic equipment, and readable storage medium.
  • the method obtains, as the first gaze point, the gaze point obtained by inputting the gaze state image into the first network model, then inputs the first gaze point and historical gaze point distribution information into the second network model to obtain the second gaze point output by the second network model, and then obtains the target gaze point according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • the provided gaze point acquisition method may be executed by an electronic device.
  • all the steps in the gaze point acquisition method provided in the embodiment of the present application may be executed by the electronic device.
  • it may also be executed by a server.
  • all steps in the gaze point acquisition method provided in the embodiment of the present application may be executed by the server.
  • it can also be executed cooperatively by the electronic device and the server. In this mode of cooperative execution by the electronic device and the server, some steps in the gaze point acquisition method provided by the embodiment of the present application are executed by the electronic device, while other parts of the steps are executed by the server.
  • the electronic device 100 may perform the gaze point acquisition method including: acquiring a first gaze point, and the first gaze point is a gaze point obtained by inputting a gaze state image into a first network model .
  • the electronic device 100 can send the first gaze point to the server 200, and the server 200 then executes the part of the gaze point acquisition method including: inputting the first gaze point and historical gaze point distribution information into the second network model, and obtaining the second gaze point output by the second network model, where the historical gaze point distribution information characterizes the distribution of second gaze points historically output by the second network model; and acquiring a target gaze point according to the second gaze point. The server 200 may also return the target gaze point to the electronic device 100.
  • the steps performed by the electronic device and the server respectively are not limited to the method described in the above examples.
  • the steps performed by the electronic device and the server respectively can be dynamically adjusted according to the actual situation.
  • the electronic device may be a smart phone, a tablet computer, and the like.
  • a gaze point acquisition method provided by this application is applied to electronic devices, and the method includes:
  • S110 Obtain a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.
  • when the user is using the electronic device, the electronic device can collect an image of the user's face through its image acquisition device to obtain the gaze state image, and then input the collected gaze state image into the first network model to get the first gaze point. That is to say, the first network model can directly output the corresponding first gaze point according to the collected gaze state image.
  • obtaining the first gaze point in the embodiment of the present application can be understood as that the electronic device is responsible for inputting the acquired gaze state image into the first network model, and obtaining the gaze point output by the first network model .
  • the first network model can be directly deployed locally on the electronic device, and after the electronic device collects the gaze state image through its own image acquisition device, it can input the collected gaze state image to the local first network In the model, the first gaze point output by the first network model is further obtained.
  • the electronic device 100 collects a gaze state image 10 , and then inputs the gaze state image 10 into the first network model 20 , and then obtains a first gaze point output by the first network model 20 .
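  • as an illustrative, non-authoritative sketch of this step, the code below shows how an electronic device might feed a collected gaze state image into a locally deployed first network model and read back the first gaze point; the normalization, the batch dimension, and the `first_network_model` callable are assumptions for illustration and are not specified by this application.

```python
import numpy as np

def acquire_first_gaze_point(gaze_state_image: np.ndarray, first_network_model) -> tuple:
    """Run the first network model on a gaze state image and return an (x, y) screen position.

    `first_network_model` is assumed to be any callable (e.g. a loaded neural network)
    that maps an image tensor to a 2-element gaze point; the exact architecture is not
    specified here.
    """
    # Normalize the captured image to the value range the model is assumed to expect.
    model_input = gaze_state_image.astype(np.float32) / 255.0
    model_input = model_input[np.newaxis, ...]            # add a batch dimension
    first_gaze_point = first_network_model(model_input)   # assumed output shape: (1, 2)
    x, y = float(first_gaze_point[0][0]), float(first_gaze_point[0][1])
    return x, y
```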
  • obtaining the first gaze point in the embodiment of the present application may be understood as obtaining the first gaze point output by other devices.
  • the electronic device can be understood as a device for obtaining a final target gaze point according to the first gaze point, and it can return the finally determined target gaze point to the device that sent the first gaze point. Exemplarily, as shown in FIG. 4, the electronic device 200 collects the gaze state image 10, inputs the gaze state image 10 into the first network model 20, obtains the first gaze point output by the first network model 20, and transmits the first gaze point to the electronic device 100. The electronic device 100 then executes the gaze point acquisition method provided by the embodiment of the present application, and after obtaining the target gaze point, returns the target gaze point to the electronic device 200.
  • in another manner, after the electronic device collects the gaze state image, it transmits the collected gaze state image to another electronic device; the other electronic device inputs the gaze state image into its first network model, the first network model in the other electronic device outputs the first gaze point, and the output first gaze point is returned to the electronic device.
  • exemplarily, as shown in FIG. 5, after the electronic device 100 acquires the gaze state image 10, it can transmit the gaze state image 10 to the electronic device 300; the electronic device 300 then inputs the acquired gaze state image 10 into its local first network model, obtains the first gaze point output by the local first network model, and returns the first gaze point to the electronic device 100; the electronic device 100 then executes the gaze point acquisition method provided in the embodiment of the present application based on the acquired first gaze point.
  • the gaze state image may include an eye feature image, a face feature image, and a face key point image, where the eye feature image represents the iris position and the eyeball position, the face feature image represents the distribution of the facial features in the face, and the face key point image represents the positions of the key points in the face.
  • the five key points in the human face may include the centers of two eyeballs, the nose and the two corners of the mouth.
  • S120 Input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
  • the electronic device may record the second gaze point output by the second network model after starting to run, and obtain historical gaze point distribution information according to the recorded second gaze point. Then, each time data is input to the second network model, besides the first gaze point acquired in S110, the current historical gaze point distribution information will also be included. Moreover, after the second gaze point output by the second network model is acquired, the second gaze point will be added to the historical gaze point distribution information.
  • for example, if the historical gaze point distribution information includes gaze point z1, gaze point z2, gaze point z3, gaze point z4, and gaze point z5, then in the process of executing S120, the historical gaze point distribution information input into the second network model together with the first gaze point includes gaze point z1, gaze point z2, gaze point z3, gaze point z4, and gaze point z5. If the second gaze point output by the second network model is gaze point z6, then after adding gaze point z6 to the historical gaze point distribution information, the latest historical gaze point distribution information includes gaze point z1, gaze point z2, gaze point z3, gaze point z4, gaze point z5, and gaze point z6.
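  • one possible way to maintain the historical gaze point distribution information described above is a simple running record that is read before each call to the second network model and appended to after each call; the class below is a hedged sketch of that bookkeeping, not a prescribed data structure, and the size cap is an assumption.

```python
class GazePointHistory:
    """Records the second gaze points output by the second network model since start-up."""

    def __init__(self, max_points: int = 1000):
        self.max_points = max_points   # cap is an assumption; the application does not fix a size
        self.points = []               # list of (x, y) second gaze points, oldest first

    def snapshot(self):
        """Historical gaze point distribution information to feed into the second model."""
        return list(self.points)

    def add(self, second_gaze_point):
        """Append the newly output second gaze point, e.g. z6 after z1..z5."""
        self.points.append(second_gaze_point)
        if len(self.points) > self.max_points:
            self.points.pop(0)
```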
  • one function of the second network model is to correct the first gaze point, so that the output second gaze point can more accurately represent the gaze position actually corresponding to the gaze state image.
  • the input data of the second network model includes the historical gaze point distribution information, so that the second network model can reduce the error that occurs when the user gazes at the same position on the screen from different positions.
  • as shown in FIG. 6, there is a position 40 on the electronic device, and in related gaze position detection methods, when the user gazes at the position 40 from different directions or at different relative distances, the determined gaze position may not be the position 40. For example, as shown in FIG. 7, the left image and the right image of FIG. 7 show the user's face at different distances from the electronic device. In the embodiment of the present application, the second gaze point output by the second network model is recorded to form the historical gaze point distribution information, and the historical gaze point distribution information is in turn used as an input of the second network model, so that the second network model can greatly reduce the error that occurs when the user gazes at the same position from different positions.
  • the second network model may be a quantile regression neural network (Quantile Regression Neural Network, QRNN).
  • the historical gaze point distribution information input into the second network model may belong to the same user, that is, the historical gaze point distribution information may represent the distribution of that user's gaze points on the screen, so the second network model can better learn the user's gaze habits from the historical gaze positions of the same user, and can then more accurately determine the second gaze point characterizing the user's current gaze position, thereby improving the accuracy of the second gaze point.
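  • the application does not specify how the historical gaze point distribution information is encoded for the second network model; one plausible sketch, shown below, summarizes the recorded second gaze points as a coarse 2D histogram over the screen and concatenates it with the first gaze point before calling a QRNN-style correction model. The grid size, the screen resolution, and the `second_network_model` callable are illustrative assumptions.

```python
import numpy as np

def correct_gaze_point(first_gaze_point, history_points, second_network_model,
                       screen_w=1080, screen_h=2400, grid=8):
    """Build an input for the second network model and return the corrected second gaze point."""
    # Summarize the historical second gaze points as a grid x grid occupancy histogram
    # (an assumed encoding of the "distribution" of historical gaze points).
    hist = np.zeros((grid, grid), dtype=np.float32)
    for x, y in history_points:
        i = min(int(x / screen_w * grid), grid - 1)
        j = min(int(y / screen_h * grid), grid - 1)
        hist[j, i] += 1.0
    if hist.sum() > 0:
        hist /= hist.sum()

    features = np.concatenate([np.asarray(first_gaze_point, dtype=np.float32),
                               hist.flatten()])
    second_gaze_point = second_network_model(features[np.newaxis, :])  # assumed output: (1, 2)
    return float(second_gaze_point[0][0]), float(second_gaze_point[0][1])
```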
  • the target gazing point may be understood as the position where the user is actually gazing determined by the electronic device.
  • the target gaze point can be understood as the gaze point corresponding to the gaze state image.
  • the target gaze point will be obtained according to the second gaze point.
  • the electronic device may use the second gaze point as the target gaze point.
  • in the gaze point acquisition method provided in this embodiment, a first gaze point is acquired, where the first gaze point is the gaze point obtained by inputting the gaze state image into the first network model; the first gaze point and the historical gaze point distribution information are then input into the second network model to obtain the second gaze point output by the second network model, and the target gaze point is obtained according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • a gaze point acquisition method provided by this application is applied to electronic devices, and the method includes:
  • S210 Acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.
  • S220 Input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
  • S230 Obtain multiple historical second gaze points, where the historical second gaze points are output by the second network model according to input historical first gaze points, and the historical first gaze points are output by the first network model according to gaze state images input before the gaze state image.
  • for example, the first gaze points input to the second network model may include first gaze point z7, first gaze point z9, first gaze point z11, and first gaze point z13. When first gaze point z7 is input, the second gaze point correspondingly output by the second network model is second gaze point z8; when first gaze point z9 is input, the corresponding output is second gaze point z10; when first gaze point z11 is input, the corresponding output is second gaze point z12; and when first gaze point z13 is input, the corresponding output is second gaze point z14.
  • in this example, when the current second gaze point is second gaze point z12, the corresponding multiple historical second gaze points acquired include second gaze point z8 and second gaze point z10; when the current second gaze point is second gaze point z14, the corresponding multiple historical second gaze points include second gaze point z10 and second gaze point z12.
  • S240 Input the second gaze point and the multiple historical second gaze points into a third network model, and acquire a third gaze point output by the third network model.
  • the third network model may be a long short-term memory artificial neural network (Long Short-Term Memory, LSTM).
  • the long short-term memory network is a kind of time-recurrent neural network, which can be used to solve the long-term dependence problem of general recurrent neural networks (Recurrent Neural Network, RNN).
  • the third network model will not only refer to the input second gaze point but also combine it with the multiple historical second gaze points, so that the gaze point to which the gaze state image actually corresponds can be determined more accurately.
  • the obtained multiple historical second gaze points are continuous in time with the second gaze point obtained according to the first gaze point, which means that the second gaze point and the multiple historical second gaze points input into the third network model represent the user's continuous gaze operations in the recent period. Because a long short-term memory artificial neural network can memorize the relevant information of its previous outputs and carry it into the determination of the next output, using such a network as the third network model allows the currently output third gaze point to be determined in conjunction with the user's continuous gaze operations in the most recent period of time, thereby making the output third gaze point more stable and accurate.
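  • since the third network model is described above as a long short-term memory network, the following sketch shows, purely as an illustration, how such a model could consume the current second gaze point together with the multiple historical second gaze points as a short time sequence; the PyTorch framing, the hidden size, and the two-dimensional (x, y) input are assumptions rather than details specified by this application.

```python
import torch
import torch.nn as nn

class ThirdGazeModel(nn.Module):
    """Illustrative third network model: an LSTM over recent second gaze points."""

    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, gaze_sequence: torch.Tensor) -> torch.Tensor:
        # gaze_sequence: (batch, seq_len, 2), i.e. the historical second gaze points
        # in chronological order with the current second gaze point as the last element.
        out, _ = self.lstm(gaze_sequence)
        return self.head(out[:, -1])  # predicted (x, y) third gaze point
```

  At inference time the input sequence would simply be the acquired historical second gaze points followed by the current second gaze point.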
  • the data processing parameters of the electronic device are obtained, where the data processing parameters characterize the data processing capability of the electronic device, and the number of historical second gaze points to be obtained is determined according to the data processing parameters.
  • the third network model can then output a relatively more accurate third gaze point representing the user's actual gaze position.
  • however, the more data is input into the third network model, the more data the third network model needs to process, so in the same model running environment, inputting too much data will make each output take longer. Correspondingly, the device running the third network model can determine the number of historical second gaze points to acquire according to its data processing parameters.
  • the stronger the data processing capability of the electronic device represented by the data processing parameters, the larger the number of historical second gaze points acquired; the weaker the data processing capability represented by the data processing parameters, the smaller the number of historical second gaze points acquired.
  • the data processing parameters may include multiple parameters, and determining the number of historical second gaze points to obtain according to the data processing parameters may include: obtaining the score corresponding to each of the multiple parameters; summing the scores corresponding to the parameters to obtain a total score; and determining, according to the total score, the number of historical second gaze points to acquire.
  • the electronic device may obtain the scoring rules corresponding to the multiple parameters, obtain the score corresponding to each parameter based on its scoring rule, add the scores corresponding to the multiple parameters to obtain the total score, and then determine, according to the total score, the number of historical second gaze points to acquire.
  • the multiple parameters may include the number of processor cores, the main frequency of the processor, the available memory, and the like. During the scoring process, if the score corresponding to the number of processor cores is p1, the score corresponding to the main frequency of the processor is p2, and the score corresponding to the available memory is p3, then the total score obtained is p1 + p2 + p3.
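  • a hedged sketch of the scoring just described follows: each parameter is mapped to a score by some rule, the scores are summed (p1 + p2 + p3), and the total score is mapped to the number of historical second gaze points to acquire. The individual scoring rules and thresholds below are illustrative assumptions, not values given by this application.

```python
def history_length_from_capability(cpu_cores: int, cpu_ghz: float, free_mem_mb: int) -> int:
    """Decide how many historical second gaze points to feed into the third network model."""
    p1 = min(cpu_cores, 8)              # score for the number of processor cores (assumed rule)
    p2 = int(cpu_ghz * 2)               # score for the processor main frequency (assumed rule)
    p3 = min(free_mem_mb // 512, 8)     # score for the available memory (assumed rule)
    total = p1 + p2 + p3

    # Stronger devices (higher total score) use a longer history; weaker ones a shorter history.
    if total >= 18:
        return 20
    if total >= 10:
        return 10
    return 5
```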
  • This embodiment provides a gaze point acquisition method in which the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • moreover, the second gaze point output this time and the multiple historical second gaze points previously output by the second network model can be input together into the third network model, and the third gaze point output by the third network model is then obtained as the target gaze point. In this way, the problem that the electronic device determines different target gaze points when different users gaze at the same position can be improved, thereby improving the accuracy of the gaze point finally obtained by the electronic device.
  • furthermore, as the second network model runs more times, the number of second gaze points included in the historical gaze point distribution information also increases, so the historical gaze point distribution information can more accurately record the user's habit of gazing at the screen; then, in the process of outputting the second gaze point through the second network model, as the number of times the second network model runs increases, the second network model can output the second gaze point more accurately and more stably.
  • obtaining the target gaze point according to the second gaze point may include directly using the second gaze point as the target gaze point. Furthermore, obtaining the target gaze point according to the second gaze point may also include inputting the second gaze point and multiple historical second gaze points into the third network model, and obtaining the third gaze point output by the third network model , and take the third fixation point as the target fixation point. Then, in the case that there are multiple ways to obtain the target gaze point according to the second gaze point, the electronic device may determine which method to use to obtain the target gaze point according to current actual needs.
  • the electronic device may determine, according to the current application scenario, which manner is specifically adopted to acquire the target gaze point.
  • during the process in which the electronic device acquires the user's gaze point, the user is usually using an application program in the electronic device.
  • the inventor found in research that different application programs have different requirements for the detection accuracy of the gaze point: some application programs require more accurate gaze point detection, while the gaze point detection requirements of other application programs are relatively coarse. For example, some applications provide a gaze area, and if it is detected that the user's gaze on the gaze area lasts for a specified duration, the corresponding operation is triggered.
  • in such applications, the area of the gaze area is relatively large, so the detection of the gaze position can tolerate a larger error. For example, as shown in FIG. 10, if it is detected that the user gazes at button 1 for a specified duration, the operation corresponding to button 1 is triggered; if it is detected that the user gazes at button 2 for a specified duration, the operation corresponding to button 2 is triggered. The areas covered by button 1 and button 2 are relatively large, so that even if there is a certain error between the gaze point detected by the electronic device and the actual gaze point, it can still be determined relatively accurately whether the user is looking at button 1 or button 2.
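  • as an illustration of the gaze-area behaviour described above, the sketch below triggers a button's operation once the detected gaze point has stayed inside the button's area for a specified duration; the rectangle representation of the area and the 1.5-second threshold are assumptions made only for this example.

```python
import time

class DwellTrigger:
    """Triggers a callback when the gaze point dwells inside an area long enough."""

    def __init__(self, area, on_trigger, dwell_seconds: float = 1.5):
        self.area = area                  # (left, top, right, bottom) of e.g. button 1
        self.on_trigger = on_trigger      # operation corresponding to the button
        self.dwell_seconds = dwell_seconds
        self._enter_time = None

    def update(self, gaze_point):
        x, y = gaze_point
        left, top, right, bottom = self.area
        inside = left <= x <= right and top <= y <= bottom
        if not inside:
            self._enter_time = None       # gaze left the area: reset the dwell timer
            return
        if self._enter_time is None:
            self._enter_time = time.monotonic()
        elif time.monotonic() - self._enter_time >= self.dwell_seconds:
            self.on_trigger()             # gaze stayed long enough: trigger the operation
            self._enter_time = None
```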
  • the corresponding gaze area is relatively small, and the electronic device may need to detect the actual gaze position more accurately to achieve more effective control.
  • for example, the electronic device is in an information browsing scene (for example, web page browsing), and the interface corresponding to the information browsing scene includes a text area A, a text area B, a text area C, a text area D, a text area E, a text area F, and a text area G. If the electronic device detects that the user has been gazing at text area A for a long time, the page can be turned accordingly. It is clear that each text area shown in FIG. 11 is small (smaller than the area covered by the buttons shown in FIG. 10), so a relatively precise gaze point is required to achieve an accurate page turning operation.
  • the obtained third gaze point has a higher probability than the second gaze point of accurately representing the actual gaze point.
  • obtaining the target gaze point according to the second gaze point includes: obtaining the current application scenario, obtaining the gaze point determination manner corresponding to the current application scenario, and then obtaining the target gaze point in the gaze point determination manner corresponding to the current application scenario.
  • the manner of determining the gaze point corresponding to the application scenario corresponds to the detection accuracy required by the application scenario.
  • if the gaze point determination manner corresponding to the current application scenario is to use the second gaze point as the target gaze point, then after the second gaze point output by the second network model is obtained, the obtained second gaze point serves as the target gaze point.
  • if the gaze point determination manner corresponding to the current application scenario is to use the third gaze point as the target gaze point, then after the second gaze point output by the second network model is obtained, multiple historical second gaze points are obtained, the second gaze point and the multiple historical second gaze points are input into the third network model, and the third gaze point output by the third network model is obtained as the target gaze point.
  • the electronic device may determine the current application scene according to the application program running on the electronic device in the foreground during gaze point detection. For example, if the application currently running in the foreground is a text browsing program, then it can be determined that the current scene is an information browsing scene.
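  • the mapping from the foreground application to a gaze point determination manner could be as simple as the lookup sketched below; the scene names and the choice of which scenes demand the third network model are illustrative assumptions consistent with the text-browsing example above, not a mapping prescribed by this application.

```python
# Assumed mapping: scenes with small gaze areas use the more precise third network model,
# scenes with large gaze areas directly use the second gaze point as the target gaze point.
SCENE_TO_METHOD = {
    "information_browsing": "third_gaze_point",
    "button_panel": "second_gaze_point",
}

def target_gaze_point_for_scene(foreground_scene, second_gp, history_second_gps, third_model=None):
    """Pick the target gaze point according to the current application scenario."""
    method = SCENE_TO_METHOD.get(foreground_scene, "second_gaze_point")
    if method == "second_gaze_point" or third_model is None:
        return second_gp
    # Feed the current second gaze point and the historical second gaze points
    # into the third network model (see the LSTM sketch earlier).
    sequence = history_second_gps + [second_gp]
    return third_model(sequence)
```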
  • a gaze point acquisition method provided by this application is applied to electronic devices, and the method includes:
  • S310 Obtain a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.
  • detecting whether the first gaze point is valid includes: detecting whether the eyeball state represented by the first gaze point satisfies a target state, and if the target state is satisfied, determining that the first gaze point is valid.
  • the target state includes eyes being in an open state.
  • when the user's eyes are closed, the first network model can still output a first gaze point, but the output first gaze point is invalid. By screening whether the first gaze point is valid, the gaze points corresponding to images in which the user's eyes are actually closed can be screened out, so as to avoid inputting an invalid first gaze point into the subsequent model.
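  • a hedged sketch of this validity screening: before the first gaze point is passed on, the eye state associated with it is checked against the target state (eyes open); the `eyes_open` flag carried alongside the gaze point is an assumed representation of that state, not one defined by this application.

```python
def filter_valid_first_gaze_point(first_gaze_point, eyes_open: bool):
    """Return the first gaze point only when the eyeball state satisfies the target state.

    Here the target state is simply that the eyes are open; gaze points produced
    from closed-eye images are screened out and never reach the second network model.
    """
    if not eyes_open:
        return None          # invalid: skip all subsequent processing for this frame
    return first_gaze_point
```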
  • S330 If the first gaze point is valid, input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
  • This embodiment provides a gaze point acquisition method in which the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • moreover, after the first gaze point is obtained, it is possible to first judge whether the first gaze point is valid; if the first gaze point itself is invalid, no subsequent processing is performed, which in turn helps to improve the effectiveness of controlling the electronic device based on the gaze point.
  • a method for obtaining a gaze point provided by the present application, the method includes:
  • S410 Obtain the sample gaze state images and the labeled gaze points corresponding to each sample gaze state image.
  • S420 Train the first network model to be trained by using the sample gaze state images and the labeled gaze points corresponding to each sample gaze state image, to obtain the first network model.
  • S430 Obtain a gaze point output by the first network model to be trained during the training process as a first training gaze point.
  • S440 Train the second network model to be trained by using the first training gaze point, the historical second training gaze point distribution information, and the labeled gaze point corresponding to each sample gaze state image, to obtain the second network model, wherein the historical second training gaze point distribution information includes the distribution of gaze points output by the second network model to be trained during the training process.
  • S450 If the second training gaze point output by the second network model to be trained is obtained, obtain multiple historical second training gaze points, wherein the historical second training gaze points are output by the second network model to be trained according to input historical first training gaze points, and the historical first training gaze points are output by the first network model to be trained according to sample gaze state images input before the current sample gaze state image, the current sample gaze state image being the sample gaze state image corresponding to the second training gaze point.
  • S460 Train the third network model to be trained by using the second training gaze point and the multiple historical second training gaze points, to obtain the third network model.
  • S470 Acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.
  • S480 Input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
  • S490 Obtain a target gaze point according to the second gaze point.
  • each sample gaze state image includes a left-eye image, a right-eye image, a face image, and images of five key points in the face.
  • the face image represents the relative distribution position of the facial features in the face.
  • the five key points in the human face include the centers of two eyeballs, the nose and the two corners of the mouth.
  • S410 to S460 can be executed by the server. After the server executes the steps from S410 to S460, the trained first network model, second network model, and third network model can be deployed to the electronic device, so that the electronic device correspondingly executes steps S470 to S490 in the embodiment of the present application.
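  • the training steps S410 to S460 could be organised on the server roughly as in the sketch below; the `train_first`, `train_second`, and `train_third` routines, along with their losses, optimizers, and data loaders, are left abstract because they are not specified by this application, and the handling of the historical training gaze point distributions is omitted for brevity.

```python
def train_all_models(samples, labeled_gaze_points, train_first, train_second, train_third):
    """Train the three models in order, reusing each stage's outputs as the next stage's inputs."""
    # S410/S420: train the first network model on sample gaze state images and labeled gaze points.
    first_model = train_first(samples, labeled_gaze_points)

    # S430/S440: its outputs become the first training gaze points used, together with the
    # historical second-training-gaze-point distribution, to train the second network model.
    first_training_points = [first_model(img) for img in samples]
    second_model = train_second(first_training_points, labeled_gaze_points)

    # S450/S460: the second model's outputs, plus their history, train the third network model.
    second_training_points = [second_model(p) for p in first_training_points]
    third_model = train_third(second_training_points, labeled_gaze_points)

    return first_model, second_model, third_model
```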
  • This embodiment provides a gaze point acquisition method in which the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • this embodiment provides a training method for the first network model, the second network model and the third network model.
  • a gaze point acquisition device 500 provided by the present application runs on an electronic device, and the device 500 includes:
  • the first gaze point acquisition unit 510 is configured to acquire a first gaze point, where the first gaze point is a gaze point obtained by inputting the gaze state image into the first network model.
  • the gaze state image includes an eye feature image, a face feature image, and a face key point image, wherein the eye feature image represents the iris position and the eyeball position, the face feature image represents the distribution of the facial features in the face, and the face key point image represents the positions of the key points in the face.
  • the second gaze point acquisition unit 520 is configured to input the first gaze point and historical gaze point distribution information into the second network model, and obtain the second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
  • the gaze point determining unit 530 is configured to acquire a target gaze point according to the second gaze point.
  • the gaze point determination unit 530 is specifically configured to obtain multiple historical second gaze points, wherein the historical second gaze points are output by the second network model according to input historical first gaze points, and the historical first gaze points are output by the first network model according to gaze state images input before the gaze state image; to input the second gaze point and the multiple historical second gaze points into a third network model and obtain a third gaze point output by the third network model; and to use the third gaze point as the target gaze point.
  • the gaze point determination unit 530 is also specifically configured to acquire data processing parameters of the electronic device, the data processing parameters characterizing the data processing capability of the electronic device, and to determine, according to the data processing parameters, the number of historical second gaze points to be obtained.
  • the second gaze point acquisition unit 520 is also configured to detect whether the first gaze point is valid, and if the first gaze point is valid, to input the first gaze point and the historical gaze point distribution information into the second network model to obtain the second gaze point output by the second network model.
  • the second gaze point acquisition unit 520 is specifically configured to detect whether the eyeball state represented by the first gaze point satisfies a target state; if the target state is met, determine that the first gaze point is valid.
  • the gaze point acquisition device provided in this embodiment acquires a first gaze point, where the first gaze point is the gaze point obtained by inputting the gaze state image into the first network model; it then inputs the first gaze point and historical gaze point distribution information into the second network model, obtains the second gaze point output by the second network model, and obtains the target gaze point according to the second gaze point. Therefore, through the above device, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • the device 500 also includes:
  • the model training unit 540 is used to obtain sample gaze state images and the labeled gaze point corresponding to each sample gaze state image, and to train the first network model to be trained by using the sample gaze state images and the labeled gaze points respectively corresponding to the sample gaze state images, so as to obtain the first network model.
  • the model training unit 540 is also used to obtain the gaze point output by the first network model to be trained during the training process as the first training gaze point, and to train the second network model to be trained by using the first training gaze point, the historical second training gaze point distribution information, and the labeled gaze point corresponding to each sample gaze state image, so as to obtain the second network model, wherein the historical second training gaze point distribution information includes the distribution of gaze points output by the second network model to be trained during the training process.
  • the model training unit 540 is further configured to obtain multiple historical second training gaze points if the second training gaze point output by the second network model to be trained is obtained, wherein the historical second training gaze points are output by the second network model to be trained according to input historical first training gaze points, the historical first training gaze points are output by the first network model to be trained according to sample gaze state images input before the current sample gaze state image, and the current sample gaze state image is the sample gaze state image corresponding to the second training gaze point; and to train the third network model to be trained by using the second training gaze point and the multiple historical second training gaze points, so as to obtain the third network model.
  • each functional module in each embodiment of the present application may be integrated into one processing module, each module may exist separately physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the embodiment of the present application also provides an electronic device 1000 that can implement the aforementioned gaze point acquisition method.
  • the electronic device 1000 includes one or more (only one is shown in the figure) processors 102 , a memory 104 , a camera 106 and an audio collection device 108 coupled to each other.
  • the memory 104 stores programs capable of executing the contents of the foregoing embodiments, and the processor 102 can execute the programs stored in the memory 104 .
  • the processor 102 may include one or more processing cores.
  • the processor 102 uses various interfaces and circuits to connect various parts of the entire electronic device 1000, and runs or executes instructions, programs, code sets, or instruction sets stored in the memory 104 and calls data stored in the memory 104, so as to perform various functions of the electronic device 1000 and process data.
  • the processor 102 may be implemented in hardware form using at least one of Digital Signal Processing (Digital Signal Processing, DSP), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), and Programmable Logic Array (Programmable Logic Array, PLA).
  • the processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface and application programs, etc.
  • the GPU is used to render and draw the displayed content
  • the modem is used to handle wireless communication.
  • the processor 102 may be a neural network chip.
  • it may be an embedded neural network chip (NPU).
  • the memory 104 may include random access memory (Random Access Memory, RAM), and may also include read-only memory (Read-Only Memory). Memory 104 may be used to store instructions, programs, codes, sets of codes, or sets of instructions.
  • a gaze point acquisition device may be stored in the memory 104. The gaze point acquisition device may be the aforementioned device 500.
  • the memory 104 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
  • the electronic device 1000 may further include a network module 110 and a sensor module 112 in addition to the aforementioned devices.
  • the network module 110 is used to implement information interaction between the electronic device 1000 and other devices, for example, transmitting device control instructions, manipulation request instructions, and status information acquisition instructions. However, when the electronic device 200 is specifically a different device, its corresponding network module 110 may be different.
  • the sensor module 112 may include at least one sensor. Specifically, the sensor module 112 may include, but is not limited to: a level, a light sensor, a motion sensor, a pressure sensor, an infrared heat sensor, a distance sensor, an acceleration sensor, and other sensors.
  • the pressure sensor may be a sensor for detecting pressure generated by pressing on the electronic device 1000. That is, the pressure sensor detects pressure generated by contact or pressing between the user and the electronic device, e.g., contact or pressing between the user's ear and the mobile terminal. Therefore, the pressure sensor can be used to determine whether contact or pressing occurs between the user and the electronic device 1000, and the magnitude of the pressure.
  • the acceleration sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for applications that identify the attitude of the electronic device 1000 (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration), vibration recognition related functions (such as a pedometer and tapping), and the like.
  • the electronic device 1000 may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, and a thermometer, which will not be repeated here.
  • the audio collection device 108 is configured to collect audio signals. In some embodiments, the audio collection device 108 includes multiple audio collection devices, and the audio collection devices may be microphones.
  • the network module of the electronic device 1000 is a radio frequency module, and the radio frequency module is used to receive and send electromagnetic waves, realize mutual conversion between electromagnetic waves and electrical signals, and communicate with a communication network or other devices.
  • the radio frequency module may include various existing circuit elements for performing these functions, such as antenna, radio frequency transceiver, digital signal processor, encryption/decryption chip, Subscriber Identity Module (SIM) card, memory and so on.
  • the radio frequency module can interact with external devices by sending or receiving electromagnetic waves.
  • a radio frequency module can send instructions to a target device.
  • FIG. 18 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • Program codes are stored in the computer-readable medium 800, and the program codes can be invoked by a processor to execute the methods described in the foregoing method embodiments.
  • the computer readable storage medium 800 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium (non-transitory computer-readable storage medium).
  • the computer-readable storage medium 800 has a storage space for program code 810 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products.
  • Program code 810 may, for example, be compressed in a suitable form.
  • the present application provides a gaze point acquisition method, device, electronic equipment, and readable storage medium. A first gaze point is acquired, where the first gaze point is the gaze point obtained by inputting the gaze state image into the first network model; the first gaze point and historical gaze point distribution information are then input into the second network model to obtain the second gaze point output by the second network model, and the target gaze point is obtained according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
  • since the second network model and the third network model can be used to make the final target gaze point more accurate and stable, the user does not need to gaze at positions prompted by the electronic device for calibration operations when starting to use it, which saves the user's time and improves efficiency. Moreover, it also makes the solution provided by the embodiment of the present application better applicable to different users, and there will be no

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present application disclose a gaze point acquisition method and apparatus, an electronic device and a readable storage medium. The method comprises: acquiring a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model; inputting the first gaze point and historical gaze point distribution information into a second network model, and acquiring a second gaze point outputted by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically outputted by the second network model; and acquiring a target gaze point according to the second gaze point. Therefore, by means of the described method, the first gaze point outputted by the first network model will be further inputted into the second network together with the historical gaze point distribution information representing the distribution of the second gaze points historically outputted by the second network model, and then the target gaze point is acquired according to the second gaze point outputted by the second network model, thereby improving the accuracy of the target gaze point.

Description

注视点获取方法、装置、电子设备及可读存储介质Method, device, electronic device and readable storage medium for obtaining gaze point
相关申请的交叉引用Cross References to Related Applications
本申请要求于2021年9月30日提交的申请号为202111161492.9的中国申请的优先权,其在此出于所有目的通过引用将其全部内容并入本文。This application claims priority to Chinese Application No. 202111161492.9 filed September 30, 2021, which is hereby incorporated by reference in its entirety for all purposes.
技术领域technical field
本申请涉及人工智能技术领域,更具体地,涉及一种注视点获取方法、装置、电子设备及可读存储介质。The present application relates to the technical field of artificial intelligence, and more specifically, to a gaze point acquisition method, device, electronic equipment, and readable storage medium.
背景技术Background technique
随着技术的发展,电子设备可以对用户的注视屏幕的位置进行检测,从而根据所检测到的用户的注视位置来进行对应的操作。但是,相关的进行用户的注视位置检测的方式还存在检测精度有待提升的问题。With the development of technology, the electronic device can detect the position of the user's gaze on the screen, so as to perform corresponding operations according to the detected gaze position of the user. However, there is still a problem that the detection accuracy needs to be improved in the related manner of detecting the gaze position of the user.
Summary
In view of the above problems, the present application proposes a gaze point acquisition method and apparatus, an electronic device and a readable storage medium to improve the above problems.
In a first aspect, the present application provides a gaze point acquisition method applied to an electronic device. The method includes: acquiring a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model; inputting the first gaze point and historical gaze point distribution information into a second network model, and acquiring a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model; and acquiring a target gaze point according to the second gaze point.
In a second aspect, the present application provides a gaze point acquisition apparatus running on an electronic device. The apparatus includes: a first gaze point acquisition unit configured to acquire a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model; a second gaze point acquisition unit configured to input the first gaze point and historical gaze point distribution information into a second network model and acquire a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model; and a gaze point determination unit configured to acquire a target gaze point according to the second gaze point.
In a third aspect, the present application provides an electronic device including one or more processors and a memory. One or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above method.
In a fourth aspect, the present application provides a computer-readable storage medium storing program code, wherein the above method is executed when the program code runs.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 shows a schematic diagram of an application scenario of a gaze point acquisition method proposed in an embodiment of the present application;
FIG. 2 shows a flowchart of a gaze point acquisition method proposed in an embodiment of the present application;
FIG. 3 shows a schematic diagram of acquiring a first gaze point in an embodiment of the present application;
FIG. 4 shows another schematic diagram of acquiring a first gaze point in an embodiment of the present application;
FIG. 5 shows yet another schematic diagram of acquiring a first gaze point in an embodiment of the present application;
FIG. 6 shows a schematic diagram of a gaze point in an embodiment of the present application;
FIG. 7 shows a schematic diagram of different distances between a user's face and an electronic device in an embodiment of the present application;
FIG. 8 shows a flowchart of a gaze point acquisition method proposed in another embodiment of the present application;
FIG. 9 shows a schematic diagram of historical second gaze points in an embodiment of the present application;
FIG. 10 shows a schematic diagram of a gaze area in an embodiment of the present application;
FIG. 11 shows a schematic diagram of another gaze area in an embodiment of the present application;
FIG. 12 shows a flowchart of a gaze point acquisition method proposed in yet another embodiment of the present application;
FIG. 13 shows a flowchart of a gaze point acquisition method proposed in another embodiment of the present application;
FIG. 14 shows a schematic diagram of a model training method proposed in an embodiment of the present application;
FIG. 15 shows a structural block diagram of a gaze point acquisition apparatus proposed in an embodiment of the present application;
FIG. 16 shows a structural block diagram of a gaze point acquisition apparatus proposed in another embodiment of the present application;
FIG. 17 shows a structural block diagram of an electronic device proposed in the present application;
FIG. 18 shows a storage unit for storing or carrying program code for implementing the gaze point acquisition method according to the embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
With the development of technology, an electronic device can detect the position on the screen at which a user is gazing, and perform a corresponding operation according to the detected gaze position. For example, in an information browsing scenario, the electronic device may determine, according to the detected position of the user's gaze point, whether to update the browsed information, the update including turning pages and the like. Furthermore, in some scenarios, a control operation corresponding to a button that the user is gazing at may be triggered according to that button.
However, in studying related techniques for detecting a user's gaze position, the inventor found that the related methods for detecting a user's gaze position still suffer from insufficient detection accuracy. Moreover, in the related art, in order to improve detection accuracy, the user is required to gaze at designated positions on the screen of the electronic device according to prompts before use, which is inconvenient for the user. In addition, when there are many designated positions to be gazed at, this also consumes too much of the user's time.
Therefore, in order to improve the above problems, embodiments of the present application propose a gaze point acquisition method and apparatus, an electronic device and a readable storage medium. In the method, a gaze point obtained by inputting a gaze state image into a first network model is acquired as a first gaze point; the first gaze point and historical gaze point distribution information are then input into a second network model to acquire a second gaze point output by the second network model; and a target gaze point is acquired according to the second gaze point. In this way, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information that represents the distribution of second gaze points historically output by the second network model, and the target gaze point is then acquired according to the second gaze point output by the second network model, which improves the accuracy of the target gaze point.
The application scenarios involved in the embodiments of the present application are first introduced below.
In the embodiments of the present application, the provided gaze point acquisition method may be executed by an electronic device; in this case, all steps of the gaze point acquisition method provided in the embodiments of the present application may be executed by the electronic device. The method may also be executed by a server; in this case, all steps of the gaze point acquisition method provided in the embodiments of the present application may be executed by the server. In addition, the method may be executed cooperatively by the electronic device and the server; in this case, some steps of the gaze point acquisition method provided in the embodiments of the present application are executed by the electronic device, while the other steps are executed by the server.
Exemplarily, as shown in FIG. 1, the electronic device 100 may execute the part of the gaze point acquisition method including: acquiring a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model. After obtaining the first gaze point, the electronic device 100 may send the first gaze point to the server 200, and the server 200 then executes the part of the gaze point acquisition method including: inputting the first gaze point and historical gaze point distribution information into a second network model, and acquiring a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model; and acquiring a target gaze point according to the second gaze point. The server 200 may also return the target gaze point to the electronic device 100.
It should be noted that, in the manner of cooperative execution by the electronic device and the server, the steps respectively executed by the electronic device and the server are not limited to those described in the above example; in practical applications, the steps respectively executed by the electronic device and the server may be adjusted dynamically according to the actual situation. The electronic device may be a smart phone, a tablet computer, or the like.
The embodiments involved in the present application are described below with reference to the drawings.
Referring to FIG. 2, the present application provides a gaze point acquisition method applied to an electronic device. The method includes:
S110: Acquire a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model.
In the embodiments of the present application, while the user is using the electronic device, the electronic device may collect an image of the user's face through an image acquisition device provided on it, thereby obtaining a gaze state image, and may then input the collected gaze state image into the first network model to obtain the first gaze point. That is, the first network model can directly output the corresponding first gaze point according to the collected gaze state image.
It should be noted that, in one manner, acquiring the first gaze point in the embodiments of the present application may be understood as the electronic device being responsible for inputting the acquired gaze state image into the first network model and acquiring the gaze point output by the first network model. In this manner, the first network model may be deployed directly on the electronic device; after collecting the gaze state image through its own image acquisition device, the electronic device may input the collected gaze state image into the local first network model and acquire the first gaze point output by the first network model. Exemplarily, as shown in FIG. 3, the electronic device 100 collects a gaze state image 10, and then inputs the gaze state image 10 into the first network model 20 to obtain the first gaze point output by the first network model 20.
In another manner, acquiring the first gaze point in the embodiments of the present application may be understood as acquiring a first gaze point output by another device. In this manner, the electronic device may be understood as a device for acquiring the final target gaze point according to the first gaze point, and it may return the finally determined target gaze point to the device that sent the first gaze point. Exemplarily, as shown in FIG. 4, the electronic device 200 collects a gaze state image 10 and inputs the gaze state image 10 into the first network model 20 to obtain the first gaze point output by the first network model 20, and may then transmit the first gaze point to the electronic device 100. The electronic device 100 then executes the gaze point acquisition method provided in the embodiments of the present application, and after obtaining the target gaze point, may return the target gaze point to the electronic device 200.
In yet another manner, after collecting the gaze state image, the electronic device may transmit the collected gaze state image to another electronic device; the other electronic device then inputs the gaze state image into the first network model, the first network model on the other electronic device outputs the first gaze point, and the output first gaze point is returned to the electronic device. Exemplarily, as shown in FIG. 5, after collecting the gaze state image 10, the electronic device 100 may transmit the gaze state image 10 to the electronic device 300; the electronic device 300 then inputs the acquired gaze state image 10 into its local first network model, and after obtaining the first gaze point output by the local first network model, returns the first gaze point to the electronic device 100, and the electronic device 100 then executes the gaze point acquisition method provided in the embodiments of the present application based on the acquired first gaze point.
The gaze state image may include an eye feature image, a face feature image and a face key point image, wherein the eye feature image represents the iris position and the eyeball position, the face feature image represents the distribution of the facial features of the face, and the face key point image represents the positions of key points in the face. The five key points in the face may include the centers of the two eyeballs, the nose and the two corners of the mouth.
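As an illustrative sketch only, the gaze state image described above can be thought of as a bundle of arrays fed to the first network model. The container, field names and the model call signature below are assumptions made for illustration and are not part of the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GazeStateImage:
    """Hypothetical container for the inputs described above."""
    left_eye: np.ndarray   # eye feature crop, e.g. (H, W, 3), carrying iris/eyeball position
    right_eye: np.ndarray  # second eye feature crop
    face: np.ndarray       # face feature image, carrying the layout of the facial features
    landmarks: np.ndarray  # (5, 2) key points: two eyeball centers, nose, two mouth corners

def predict_first_gaze_point(first_model, sample: GazeStateImage) -> tuple[float, float]:
    """Run the first network model on one gaze state image and return an (x, y) screen point."""
    x, y = first_model(sample.left_eye, sample.right_eye, sample.face, sample.landmarks)
    return float(x), float(y)
```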
S120: Input the first gaze point and historical gaze point distribution information into a second network model, and acquire a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
The electronic device may record the second gaze points output by the second network model after it starts running, and obtain the historical gaze point distribution information according to the recorded second gaze points. Then, each time data is input to the second network model, the input includes not only the first gaze point acquired in S110 but also the current historical gaze point distribution information. Moreover, after the second gaze point output by the second network model is acquired, the second gaze point is added to the historical gaze point distribution information. Exemplarily, when the historical gaze point distribution information includes gaze point z1, gaze point z2, gaze point z3, gaze point z4 and gaze point z5, during the execution of S120, the historical gaze point distribution information input into the second network model together with the first gaze point includes gaze point z1, gaze point z2, gaze point z3, gaze point z4 and gaze point z5. If the second gaze point output by the second network model is gaze point z6, after gaze point z6 is added to the historical gaze point distribution information, the latest historical gaze point distribution information includes gaze point z1, gaze point z2, gaze point z3, gaze point z4, gaze point z5 and gaze point z6.
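The bookkeeping described above can be sketched as follows. This is only an illustrative reading of the text: the model interface, the bounded history length and all names are assumptions rather than part of the disclosure.

```python
class SecondStageEstimator:
    """Keeps the historical gaze point distribution information and refines first gaze points."""

    def __init__(self, second_model, max_history: int = 512):
        self.second_model = second_model
        self.history: list[tuple[float, float]] = []  # second gaze points output so far
        self.max_history = max_history                # assumed cap, not stated in the text

    def refine(self, first_gaze_point: tuple[float, float]) -> tuple[float, float]:
        # The current history (e.g. z1..z5) is fed in together with the first gaze point.
        second_gaze_point = self.second_model(first_gaze_point, self.history)
        # The new output (e.g. z6) is then appended, so the next call sees z1..z6.
        self.history.append(second_gaze_point)
        if len(self.history) > self.max_history:
            self.history.pop(0)
        return second_gaze_point
```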
It should be noted that one function of the second network model is to correct the first gaze point, so that the output second gaze point can more accurately represent the gaze position actually corresponding to the gaze state image. In addition, since the input data of the second network model includes the historical gaze point distribution information, the second network model can alleviate the problem that errors arise when the user gazes at the same position on the screen from different positions. Exemplarily, as shown in FIG. 6, there is a position 40 on the electronic device; in related gaze position detection methods, when the position 40 is gazed at from different directions or different relative distances to the electronic device, the gaze position finally determined by the electronic device may not be the position 40. For example, as shown in FIG. 7, the left image and the right image of FIG. 7 respectively show two postures in which a user holds a mobile phone. In the left image of FIG. 7, the distance between the user's face and the held mobile phone is smaller than that in the right image of FIG. 7. In the related art, even if the users shown in the left and right images of FIG. 7 are gazing at the same position, the target gaze points finally determined by the electronic device may be different. In the embodiments of the present application, because the second gaze points output by the second network model are recorded to form the historical gaze point distribution information, and the historical gaze point distribution information is used as an input of the second network model, the second network model can greatly alleviate the problem of errors arising when the user gazes at the same position on the screen from different positions.
The second network model may be a quantile regression neural network (QRNN).
It should be noted that, during the operation of the second network model, the historical gaze point distribution information input into the second network model may belong to the same user, that is, the historical gaze point distribution information may represent the distribution of the gaze points of the same user on the screen. The second network model can therefore better learn the user's gaze habits from that user's historical gaze positions, and can thus determine the second gaze point representing the user's current gaze position more precisely, which improves the accuracy of the second gaze point.
S130: Acquire a target gaze point according to the second gaze point.
In the embodiments of the present application, the target gaze point may be understood as the position at which, as determined by the electronic device, the user is actually gazing. In other words, the target gaze point may be understood as the gaze point corresponding to the gaze state image. Since the target gaze point is related to the second gaze point, it is acquired according to the second gaze point. In one manner, the electronic device may use the second gaze point as the target gaze point.
In the gaze point acquisition method provided in this embodiment, a first gaze point is acquired, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model; the first gaze point and historical gaze point distribution information are then input into a second network model to acquire a second gaze point output by the second network model; and a target gaze point is acquired according to the second gaze point. In this way, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information that represents the distribution of second gaze points historically output by the second network model, and the target gaze point is then acquired according to the second gaze point output by the second network model, which improves the accuracy of the target gaze point.
Referring to FIG. 8, the present application provides a gaze point acquisition method applied to an electronic device. The method includes:
S210: Acquire a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model.
S220: Input the first gaze point and historical gaze point distribution information into a second network model, and acquire a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
S230: Acquire multiple historical second gaze points, wherein a historical second gaze point is output by the second network model according to an input historical first gaze point, and the historical first gaze point is output by the first network model according to a gaze state image input before the gaze state image.
Exemplarily, as shown in FIG. 9, as the second network model runs, the first gaze points input into the second network model may include first gaze point z7, first gaze point z9, first gaze point z11 and first gaze point z13. When first gaze point z7 is input, the second gaze point correspondingly output by the second network model is second gaze point z8; when first gaze point z9 is input, the corresponding output is second gaze point z10; when first gaze point z11 is input, the corresponding output is second gaze point z12; and when first gaze point z13 is input, the corresponding output is second gaze point z14. In this case, if the number of historical second gaze points to be acquired is determined to be 2 and first gaze point z11 is input into the second network model in S220, the corresponding acquired historical second gaze points include second gaze point z8 and second gaze point z10; if first gaze point z13 is input into the second network model in S220, the corresponding acquired historical second gaze points include second gaze point z10 and second gaze point z12.
S240: Input the second gaze point and the multiple historical second gaze points into a third network model, and acquire a third gaze point output by the third network model.
The third network model may be a long short-term memory (LSTM) artificial neural network. A long short-term memory network is a recurrent neural network over time that can be used to solve the long-term dependency problem of general recurrent neural networks, and belongs to the class of temporal recursive neural networks. In the embodiments of the present application, when determining the gaze point to be output, the third network model refers not only to the input second gaze point but also to the multiple historical second gaze points, so that the gaze point actually corresponding to the gaze state image can be determined more accurately. Specifically, in this embodiment, the acquired historical second gaze points are temporally continuous with the second gaze point obtained from the first gaze point, which means that the second gaze point and the multiple historical second gaze points input into the third network model represent the user's continuous gaze operations over a recent period of time. A long short-term memory network can memorize information related to the previous output and pass it to the next output determination, so when the third network model is a long short-term memory network, it can determine the currently output third gaze point in combination with the user's continuous gaze operations over the recent period, making the output third gaze point more stable and accurate.
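A minimal PyTorch-style sketch of how the third stage described above might consume the current second gaze point together with several historical second gaze points is given below. The layer sizes, the use of torch.nn.LSTM and all names are assumptions made for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class ThirdStageLSTM(nn.Module):
    """Maps a short sequence of second gaze points to a single third gaze point."""

    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # regress an (x, y) screen coordinate

    def forward(self, gaze_sequence: torch.Tensor) -> torch.Tensor:
        # gaze_sequence: (batch, seq_len, 2), ordered oldest history point first,
        # with the current second gaze point as the last element of the sequence.
        _, (h_n, _) = self.lstm(gaze_sequence)
        return self.head(h_n[-1])              # third gaze point, shape (batch, 2)

# Usage sketch: two historical points (e.g. z8, z10) followed by the current point (z12).
seq = torch.tensor([[[0.30, 0.40], [0.32, 0.41], [0.33, 0.43]]])
third_gaze_point = ThirdStageLSTM()(seq)
```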
S250: Use the third gaze point as the target gaze point.
In one manner, data processing parameters of the electronic device are acquired, the processing parameters representing the data processing capability of the electronic device, and the number of historical second gaze points to be acquired is determined according to the data processing parameters.
It should be noted that the more data is input into the third network model, the more accurately the third network model can, relatively speaking, output a third gaze point representing the user's actual gaze position. Correspondingly, however, the more data is input into the third network model, the more data the third network model needs to process; in the same model running environment, the more data the third network model has to process, the longer each output takes. In order to make the output of the third gaze point by the third network model better adapted to the device, the device running the third network model may determine, according to its own data processing parameters, the number of historical second gaze points to be acquired. Optionally, the stronger the data processing capability of the electronic device represented by the data processing parameters, the larger the number of historical second gaze points acquired; correspondingly, the weaker the data processing capability represented by the data processing parameters, the smaller the number of historical second gaze points acquired.
Optionally, the data processing parameters may include multiple parameters, and determining the number of historical second gaze points to be acquired according to the data processing parameters may include: acquiring a score corresponding to each of the multiple parameters; obtaining a total score based on the scores corresponding to the multiple parameters; and determining the number of historical second gaze points to be acquired according to the total score. The electronic device may acquire a scoring rule corresponding to each of the multiple parameters, obtain the score corresponding to each parameter based on its scoring rule, add the scores corresponding to the multiple parameters to obtain the total score, and then determine the number of historical second gaze points to be acquired according to the total score. Exemplarily, the multiple parameters may include the number of processor cores, the processor clock frequency, the available memory, and the like. In the scoring process, if the score corresponding to the number of processor cores is p1, the score corresponding to the processor clock frequency is p2 and the score corresponding to the available memory is p3, then the total score obtained is p1 + p2 + p3.
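The scoring scheme described above (scores p1 + p2 + p3 mapped to a history length) might look like the following sketch. The concrete scoring rules and thresholds are invented for illustration; the text only specifies that the scores are summed and that stronger devices use more historical points.

```python
def score_device(cpu_cores: int, cpu_freq_ghz: float, free_mem_gb: float) -> float:
    """Score each data processing parameter and sum the scores (p1 + p2 + p3)."""
    p1 = min(cpu_cores, 8)             # hypothetical rule for the core count
    p2 = min(cpu_freq_ghz * 2.0, 8.0)  # hypothetical rule for the clock frequency
    p3 = min(free_mem_gb, 8.0)         # hypothetical rule for the available memory
    return p1 + p2 + p3

def history_length_for(total_score: float) -> int:
    """Stronger devices get more historical second gaze points, weaker devices fewer."""
    if total_score >= 20:
        return 16
    if total_score >= 12:
        return 8
    return 4
```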
This embodiment provides a gaze point acquisition method in which, as described above, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information that represents the distribution of second gaze points historically output by the second network model, and the target gaze point is then acquired according to the second gaze point output by the second network model, which improves the accuracy of the target gaze point.
Moreover, in this embodiment, after the second gaze point output by the second network model is obtained, the currently output second gaze point and multiple historical second gaze points previously output by the second network model may be input together into the third network model, and the third gaze point output by the third network model is acquired as the target gaze point. In this way, the problem that the electronic device determines different target gaze points when different users gaze at the same position can be further alleviated, which improves the accuracy of the target gaze point finally determined by the electronic device. Furthermore, in this embodiment, as the number of second gaze points output by the second network model increases, the number of second gaze points included in the historical gaze point distribution information also increases, so the historical gaze point distribution information can record the user's screen gazing habits more accurately; accordingly, in the process of outputting the second gaze point through the second network model, as the number of runs of the second network model increases, the second network model can output the second gaze point more precisely and more stably.
It should be noted that, as can be seen from the foregoing embodiments, in the embodiments of the present application, acquiring the target gaze point according to the second gaze point may include directly using the second gaze point as the target gaze point. Alternatively, acquiring the target gaze point according to the second gaze point may include inputting the second gaze point and multiple historical second gaze points into the third network model, acquiring the third gaze point output by the third network model, and using the third gaze point as the target gaze point. Since there can be multiple ways of acquiring the target gaze point according to the second gaze point, the electronic device may determine which way to use according to current actual requirements.
In one manner, the electronic device may determine which way to use according to the current application scenario. It should be noted that when a user uses the electronic device to acquire their own gaze point, they are usually in the process of using the electronic device, and usually using an application program on it. The inventor found in research that different application programs have different requirements for gaze point detection accuracy: some application programs require relatively precise gaze point detection, while others have relatively coarse requirements. For example, some application programs provide a gaze area, and if it is detected that the duration for which the user gazes at the gaze area reaches a specified duration, a corresponding operation is triggered; the area of such a gaze area is usually large, so the detection of the gaze position has a good tolerance for error. For example, as shown in FIG. 10, if it is detected that the duration for which the user gazes at button 1 reaches the specified duration, the operation corresponding to button 1 is triggered, and if it is detected that the duration for which the user gazes at button 2 reaches the specified duration, the operation corresponding to button 2 is triggered. As shown in FIG. 10, the areas covered by button 1 and button 2 are both large, so that even when there is some error between the gaze point detected by the electronic device and the actual gaze point, it can still be judged relatively accurately whether the user is gazing at button 1 or button 2.
In other application scenarios, the corresponding gaze areas are relatively small, and the electronic device may need to detect the actual gaze position more precisely to achieve effective control. For example, as shown in FIG. 11, the electronic device is in an information browsing scenario (for example, web page browsing), and the interface corresponding to the information browsing scenario includes text area A, text area B, text area C, text area D, text area E, text area F and text area G. If the electronic device detects that the user gazes at text area A for a long time, the page may be turned toward the upper part shown in FIG. 11; if the electronic device detects that the user gazes at text area G for a long time, the page may be turned toward the lower part shown in FIG. 11. Clearly, each text area shown in FIG. 11 is small (smaller than the coverage area of the buttons shown in FIG. 10), so a relatively precise gaze point is needed to achieve an accurate page turning operation.
Of the two ways of acquiring the target gaze point provided in the embodiments of the present application, the acquired third gaze point is more likely than the second gaze point to accurately represent the actual gaze point. Based on the foregoing, acquiring the target gaze point according to the second gaze point includes: acquiring the current application scenario, acquiring the way of determining the gaze point corresponding to the current application scenario, and then acquiring the target gaze point based on that way. The way of determining the gaze point corresponding to an application scenario corresponds to the detection accuracy required by that scenario. For example, if the way of determining the gaze point corresponding to the current application scenario is to use the second gaze point as the target gaze point, then after the second gaze point output by the second network model is acquired, the acquired second gaze point is used as the target gaze point. If the way of determining the gaze point corresponding to the current application scenario is to use the third gaze point as the target gaze point, then after the second gaze point output by the second network model is acquired, multiple historical second gaze points are also acquired, the second gaze point and the multiple historical second gaze points are input into the third network model, and the third gaze point output by the third network model is acquired as the target gaze point.
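One way to read the scenario-dependent selection above is a simple dispatch between the two strategies. The scenario names and the mapping below are assumptions for illustration only; the text does not fix how scenarios are enumerated.

```python
# Hypothetical mapping from application scenario to the way the target gaze point is determined.
SCENARIO_STRATEGY = {
    "large_button_ui": "second_point",  # coarse accuracy is enough (FIG. 10 style interface)
    "text_browsing": "third_point",     # fine accuracy is needed (FIG. 11 style interface)
}

def target_gaze_point(scenario, second_point, history, third_model):
    strategy = SCENARIO_STRATEGY.get(scenario, "third_point")
    if strategy == "second_point":
        return second_point                        # use the second gaze point directly
    return third_model(history + [second_point])   # refine with the third network model
```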
Optionally, the electronic device may determine the current application scenario according to the application program running in the foreground of the electronic device during gaze point detection. For example, if the application program currently running in the foreground is a text browsing program, it can be determined that the current scenario is an information browsing scenario.
Referring to FIG. 12, the present application provides a gaze point acquisition method applied to an electronic device. The method includes:
S310: Acquire a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model.
S320: Detect whether the first gaze point is valid.
In one manner, detecting whether the first gaze point is valid includes: detecting whether the eyeball state represented by the first gaze point satisfies a target state; and if the target state is satisfied, determining that the first gaze point is valid. The target state includes the eyes being open. In some cases, even when the user's eyes are closed, the first network model can still output a first gaze point, but the output first gaze point is invalid. By screening whether the first gaze point is valid, gaze points output for images in which the user's eyes are actually closed can be filtered out, so that invalid first gaze points are not input into subsequent models.
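The validity gate described in S320 could be sketched as below. The eye-openness signal and the threshold are assumptions, since the text only states that the target state includes the eyes being open.

```python
def is_first_gaze_point_valid(eye_openness: float, open_threshold: float = 0.5) -> bool:
    """Return True only if the eyeball state satisfies the target state (eyes open)."""
    return eye_openness >= open_threshold

def maybe_refine(first_gaze_point, eye_openness, second_model, history):
    if not is_first_gaze_point_valid(eye_openness):
        return None                                 # invalid point: end the process here
    return second_model(first_gaze_point, history)  # otherwise continue with S330
```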
S330: If the first gaze point is valid, input the first gaze point and historical gaze point distribution information into a second network model, and acquire a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
S340: Acquire a target gaze point according to the second gaze point.
If the first gaze point is invalid, the process ends.
In the gaze point acquisition method provided in this embodiment, as described above, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information that represents the distribution of second gaze points historically output by the second network model, and the target gaze point is then acquired according to the second gaze point output by the second network model, which improves the accuracy of the target gaze point. Moreover, in this embodiment, after the first gaze point is obtained, whether the first gaze point is valid may first be judged; when the first gaze point itself is invalid, no subsequent processing is performed, which helps to improve the effectiveness of controlling the electronic device based on the gaze point.
Referring to FIG. 13, the present application provides a gaze point acquisition method. The method includes:
S410: Acquire sample gaze state images and a labeled gaze point corresponding to each sample gaze state image.
S420: Train a first network model to be trained by using the sample gaze state images and the labeled gaze point corresponding to each sample gaze state image, to obtain the first network model.
S430: Acquire gaze points output by the first network model to be trained during the training process as first training gaze points.
S440: Train a second network model to be trained by using the first training gaze points, historical second training gaze point distribution information and the labeled gaze point corresponding to each sample gaze state image, to obtain the second network model, wherein the historical second training gaze point distribution information includes the distribution of gaze points output by the second network model to be trained during the training process.
S450: If a second training gaze point output by the second network model to be trained is acquired, acquire multiple historical second training gaze points, wherein a historical second training gaze point is output by the second network model to be trained according to an input historical first training gaze point, the historical first training gaze point is output by the first network model to be trained according to a sample gaze state image input into the first network model to be trained before the current sample gaze state image, and the current sample gaze state image is the sample gaze state image corresponding to the second training gaze point.
S460: Train a third network model to be trained by using the second training gaze point and the multiple historical second training gaze points, to obtain the third network model.
S470: Acquire a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into the first network model.
S480: Input the first gaze point and historical gaze point distribution information into the second network model, and acquire a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
S490: Acquire a target gaze point according to the second gaze point.
Exemplarily, as shown in FIG. 14, each of the acquired sample gaze state images includes a left-eye image, a right-eye image, a face image and an image of five key points in the face. The face image represents the relative distribution positions of the facial features in the face, and the five key points in the face include the centers of the two eyeballs, the nose and the two corners of the mouth.
After the sample gaze state images and the labeled gaze point corresponding to each sample gaze state image (the ground-truth coordinate points in FIG. 14) are acquired, part of the data is selected from the sample gaze state images and their corresponding labeled gaze points to generate batch data. The batch data is then input into the neural network model (the first network model to be trained) so that the neural network model performs inference and outputs predicted coordinate points (the first training gaze points). A loss is then computed against the ground-truth coordinate points, and the neural network model is trained according to the computed loss to optimize the gradients of the neural network model, so that subsequently computed losses decrease until the computed loss is minimized.
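A schematic training loop matching the description above (sample batches, predict coordinate points, compute a loss against the ground-truth coordinates, update the gradients) might look like the following sketch. The optimizer, loss function and batch size are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

def train_first_model(model, dataset, epochs: int = 10, batch_size: int = 32):
    """dataset yields (gaze_state_image_tensors, ground_truth_point) pairs."""
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()                    # assumed loss between predicted and true points
    for _ in range(epochs):
        for images, true_points in loader:      # batch data drawn from the labeled samples
            pred_points = model(images)         # predicted coordinate points
            loss = criterion(pred_points, true_points)
            optimizer.zero_grad()
            loss.backward()                     # optimize the gradients so the loss decreases
            optimizer.step()
    return model
```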
It should be noted that, in this embodiment, S410 to S460 may be executed by a server. After the server completes S410 to S460, the fully trained first network model, second network model and third network model may be deployed to the electronic device, and the electronic device then correspondingly executes S470 to S490 of the embodiments of the present application.
In the gaze point acquisition method provided in this embodiment, as described above, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information that represents the distribution of second gaze points historically output by the second network model, and the target gaze point is then acquired according to the second gaze point output by the second network model, which improves the accuracy of the target gaze point. In addition, this embodiment provides a training method for the first network model, the second network model and the third network model.
Referring to FIG. 15, the present application provides a gaze point acquisition apparatus 500 running on an electronic device. The apparatus 500 includes:
a first gaze point acquisition unit 510, configured to acquire a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model.
In one manner, the gaze state image includes an eye feature image, a face feature image and a face key point image, wherein the eye feature image represents the iris position and the eyeball position, the face feature image represents the distribution of the facial features of the face, and the face key point image represents the positions of key points in the face.
a second gaze point acquisition unit 520, configured to input the first gaze point and historical gaze point distribution information into a second network model, and acquire a second gaze point output by the second network model, wherein the historical gaze point distribution information represents the distribution of second gaze points historically output by the second network model.
a gaze point determination unit 530, configured to acquire a target gaze point according to the second gaze point.
In one manner, the gaze point determination unit 530 is specifically configured to: acquire multiple historical second gaze points, wherein a historical second gaze point is output by the second network model according to an input historical first gaze point, and the historical first gaze point is output by the first network model according to a gaze state image input before the gaze state image; input the second gaze point and the multiple historical second gaze points into a third network model and acquire a third gaze point output by the third network model; and use the third gaze point as the target gaze point. The gaze point determination unit 530 is further specifically configured to acquire data processing parameters of the electronic device, the processing parameters representing the data processing capability of the electronic device, and to determine the number of historical second gaze points to be acquired according to the data processing parameters.
In one manner, the second gaze point acquisition unit 520 is further configured to, before the first gaze point and the historical gaze points are input into the second network model and the second gaze point output by the second network model is acquired, detect whether the first gaze point is valid, and if the first gaze point is valid, perform the step of inputting the first gaze point and the historical gaze points into the second network model and acquiring the second gaze point output by the second network model. Optionally, the second gaze point acquisition unit 520 is specifically configured to detect whether the eyeball state represented by the first gaze point satisfies a target state, and if the target state is satisfied, determine that the first gaze point is valid.
本实施例提供的一种注视点获取装置,获取第一注视点,所述第一注视点为将注视状态图像输入到第一网络模型所得到的注视点,然后再将所述第一注视点以及历史注视点分布信息输入到第二网络模型,获取所述第二网络模型输出的第二注视点,进而根据所述第二注视点获取目标注视点。从而通过上述方式使得对于通过第一网络模型所输出的第一注视点,还会进一步与表征第二网络模型历史输出的第二注视点的分布情况的历史注视点分布信息一同输入到第二网络模型,进而根据第二网络模型输出的第二注视点获取目标注视点,从而提升了所目标注视点的精确程度。A fixation point acquisition device provided in this embodiment is used to acquire a first fixation point, the first fixation point is the fixation point obtained by inputting the fixation state image into the first network model, and then the first fixation point And historical gaze point distribution information is input to the second network model, the second gaze point output by the second network model is obtained, and the target gaze point is obtained according to the second gaze point. Therefore, through the above method, the first gaze point output by the first network model will be further input into the second network together with the historical gaze point distribution information representing the distribution of the second gaze point historically output by the second network model. model, and then obtain the target gaze point according to the second gaze point output by the second network model, thereby improving the accuracy of the target gaze point.
As shown in FIG. 16, the apparatus 500 further includes:
The model training unit 540 is configured to obtain sample gaze state images and the labelled gaze point corresponding to each sample gaze state image, and to train a first network model to be trained by using the sample gaze state images and the labelled gaze point corresponding to each sample gaze state image, so as to obtain the first network model.
The model training unit 540 is further configured to obtain the gaze points output by the first network model to be trained during training as first training gaze points, and to train a second network model to be trained by using the first training gaze points, historical second training gaze point distribution information and the labelled gaze point corresponding to each sample gaze state image, so as to obtain the second network model, wherein the historical second training gaze point distribution information includes the distribution of the gaze points output by the second network model to be trained during training.
The model training unit 540 is further configured to, when a second training gaze point output by the second network model to be trained is obtained, obtain a plurality of historical second training gaze points, wherein the historical second training gaze points are output by the second network model to be trained according to input historical first training gaze points, the historical first training gaze points are output by the first network model to be trained according to sample gaze state images input into the first network model to be trained before the current sample gaze state image, and the current sample gaze state image is the sample gaze state image corresponding to the second training gaze point; and to train a third network model to be trained by using the second training gaze point and the plurality of historical second training gaze points, so as to obtain the third network model.
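For illustration, the second stage of the staged training performed by the model training unit 540 could be sketched as follows (assuming PyTorch, an MSE loss and an Adam optimizer, none of which are mandated by the present application); the third network model to be trained would then be trained analogously on sequences formed from each second training gaze point and the plurality of historical second training gaze points:

```python
import torch
import torch.nn as nn

def train_second_model(model1, model2, loader, epochs=10, lr=1e-3):
    """Illustrative stage-two training: model1 (already trained on the labelled
    sample gaze state images) is frozen, and model2 learns to map model1's output
    plus the running distribution of model2's own past outputs to the labelled gaze point."""
    model1.eval()
    optimizer = torch.optim.Adam(model2.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []  # historical second training gaze points, as (x, y) tuples
    for _ in range(epochs):
        for image, labelled_point in loader:  # labelled_point assumed to be a [1, 2] tensor
            with torch.no_grad():
                first_point = model1(image)      # first training gaze point
            hist = encode_history(history)       # distribution of past outputs, as sketched earlier
            second_point = model2(first_point, hist)
            loss = loss_fn(second_point, labelled_point)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            history.append((second_point[0, 0].item(), second_point[0, 1].item()))
    return model2
```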
It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated here. In the several embodiments provided in the present application, the coupling between modules may be electrical. In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
An electronic device provided by the present application is described below with reference to FIG. 17.
Referring to FIG. 17, based on the foregoing gaze point acquisition method and apparatus, an embodiment of the present application further provides an electronic device 1000 capable of performing the foregoing gaze point acquisition method. The electronic device 1000 includes one or more processors 102 (only one is shown in the figure), a memory 104, a camera 106 and an audio acquisition device 108 that are coupled to one another. The memory 104 stores a program capable of executing the content of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
The processor 102 may include one or more processing cores. The processor 102 connects various parts of the entire electronic device 1000 through various interfaces and lines, and performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 104 and by invoking data stored in the memory 104. Optionally, the processor 102 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA) and programmable logic array (PLA). The processor 102 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 102 and may instead be implemented by a separate communication chip. In one implementation, the processor 102 may be a neural network chip, for example an embedded neural processing unit (NPU).
The memory 104 may include a random access memory (RAM) or a read-only memory (ROM). The memory 104 may be used to store instructions, programs, code, code sets or instruction sets. For example, the memory 104 may store a gaze point acquisition apparatus, which may be the aforementioned apparatus 500. The memory 104 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function and an image playback function), instructions for implementing the method embodiments described in the present application, and the like.
In addition to the foregoing components, the electronic device 1000 may further include a network module 110 and a sensor module 112.
The network module 110 is configured to implement information interaction between the electronic device 1000 and other devices, for example transmitting device control instructions, manipulation request instructions and status information acquisition instructions. When the electronic device 1000 is embodied as different devices, the corresponding network module 110 may differ.
The sensor module 112 may include at least one sensor. Specifically, the sensor module 112 may include, but is not limited to, a level, a light sensor, a motion sensor, a pressure sensor, an infrared heat sensor, a distance sensor, an acceleration sensor and other sensors.
The pressure sensor may detect pressure generated by pressing on the electronic device 1000. That is, the pressure sensor detects pressure generated by contact or pressing between the user and the electronic device, for example pressure generated by contact or pressing between the user's ear and the mobile terminal. Therefore, the pressure sensor may be used to determine whether contact or pressing has occurred between the user and the electronic device 1000, as well as the magnitude of the pressure.
The acceleration sensor may detect the magnitude of acceleration in various directions (generally along three axes), may detect the magnitude and direction of gravity when stationary, and may be used for applications that recognize the posture of the electronic device 1000 (such as switching between landscape and portrait modes, related games and magnetometer posture calibration) and for vibration-recognition related functions (such as a pedometer and tapping). In addition, the electronic device 1000 may also be provided with other sensors such as a gyroscope, a barometer, a hygrometer and a thermometer, which are not described in detail here.
The audio acquisition device 108 is configured to acquire audio signals. Optionally, the audio acquisition device 108 includes a plurality of audio acquisition elements, which may be microphones.
In one implementation, the network module of the electronic device 1000 is a radio frequency module. The radio frequency module is configured to receive and send electromagnetic waves and to implement mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices. The radio frequency module may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card and a memory. For example, the radio frequency module may interact with an external device through transmitted or received electromagnetic waves, for example by sending instructions to a target device.
Referring to FIG. 18, a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application is shown. The computer-readable storage medium 800 stores program code, and the program code can be invoked by a processor to perform the methods described in the foregoing method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk or a ROM. Optionally, the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 800 has storage space for program code 810 for performing any of the method steps in the foregoing methods. The program code can be read from or written into one or more computer program products. The program code 810 may, for example, be compressed in an appropriate form.
In summary, according to the gaze point acquisition method and apparatus, the electronic device and the readable storage medium provided by the present application, a first gaze point is obtained, the first gaze point being the gaze point obtained by inputting a gaze state image into a first network model; the first gaze point and historical gaze point distribution information are then input into a second network model to obtain a second gaze point output by the second network model; and a target gaze point is then obtained according to the second gaze point. In this way, the first gaze point output by the first network model is further input into the second network model together with the historical gaze point distribution information characterizing the distribution of the second gaze points historically output by the second network model, and the target gaze point is then obtained according to the second gaze point output by the second network model, which improves the accuracy of the obtained target gaze point. Moreover, in the embodiments of the present application, because the second network model and the third network model can make the finally obtained target gaze point more accurate and stable, the user does not need to perform a calibration operation on gaze positions prompted by the electronic device when starting to use it, which saves the user's time and improves efficiency. Furthermore, the solution provided by the embodiments of the present application can thereby be better adapted to different users.
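Tying the pieces together, a single inference pass of the kind summarized above might look as follows (illustrative only; extract_eye_state is a hypothetical helper, and first_gaze_point_is_valid, encode_history and the three models refer to the sketches given earlier in this description):

```python
import torch

def estimate_target_gaze_point(image, model1, model2, model3, history, max_history=16):
    """Illustrative end-to-end inference: gaze state image -> first gaze point ->
    second gaze point (using the historical distribution) -> third / target gaze point."""
    with torch.no_grad():
        first_point = model1(image)
        if not first_gaze_point_is_valid(extract_eye_state(image)):  # hypothetical helper
            return None
        hist = encode_history(history)
        second_point = model2(first_point, hist)
        history.append((second_point[0, 0].item(), second_point[0, 1].item()))
        sequence = torch.tensor([history[-max_history:]], dtype=torch.float32)  # [1, seq_len, 2]
        target_point = model3(sequence)
    return target_point
```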
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application rather than to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A gaze point acquisition method, applied to an electronic device, the method comprising:
    obtaining a first gaze point, wherein the first gaze point is a gaze point obtained by inputting a gaze state image into a first network model;
    inputting the first gaze point and historical gaze point distribution information into a second network model to obtain a second gaze point output by the second network model, wherein the historical gaze point distribution information characterizes a distribution of second gaze points historically output by the second network model; and
    obtaining a target gaze point according to the second gaze point.
  2. The method according to claim 1, wherein the obtaining a target gaze point according to the second gaze point comprises:
    obtaining a plurality of historical second gaze points, wherein the historical second gaze points are output by the second network model according to input historical first gaze points, and the historical first gaze points are output by the first network model according to gaze state images input before the gaze state image;
    inputting the second gaze point and the plurality of historical second gaze points into a third network model to obtain a third gaze point output by the third network model; and
    taking the third gaze point as the target gaze point.
  3. The method according to claim 2, further comprising:
    obtaining data processing parameters of the electronic device, wherein the data processing parameters characterize a data processing capability of the electronic device; and
    determining, according to the data processing parameters, the number of the plurality of historical second gaze points to be obtained.
  4. The method according to claim 3, wherein the stronger the data processing capability of the electronic device characterized by the data processing parameters, the larger the number of historical second gaze points obtained; and the weaker the data processing capability of the electronic device characterized by the data processing parameters, the smaller the number of historical second gaze points obtained.
  5. The method according to claim 3, wherein the data processing parameters comprise a plurality of parameters, and the determining, according to the data processing parameters, the number of the plurality of historical second gaze points to be obtained comprises:
    obtaining a score corresponding to each of the plurality of parameters;
    obtaining a total score based on the scores corresponding to the plurality of parameters; and
    determining, according to the total score, the number of historical second gaze points to be obtained.
  6. The method according to claim 5, wherein the obtaining a total score based on the scores corresponding to the plurality of parameters comprises:
    obtaining a scoring rule corresponding to each of the plurality of parameters; and
    obtaining the score corresponding to each parameter based on the scoring rule corresponding to that parameter.
  7. The method according to claim 1, wherein the obtaining a target gaze point according to the second gaze point comprises:
    obtaining a current application scene;
    obtaining a gaze point determination manner corresponding to the current application scene; and
    obtaining the target gaze point based on the gaze point determination manner corresponding to the current application scene and the second gaze point.
  8. The method according to claim 7, wherein the obtaining the target gaze point based on the gaze point determination manner corresponding to the current application scene and the second gaze point comprises:
    if the gaze point determination manner corresponding to the current application scene is to take the second gaze point as the target gaze point, taking the obtained second gaze point as the target gaze point after the second gaze point output by the second network model is obtained.
  9. The method according to claim 8, further comprising:
    if the gaze point determination manner corresponding to the current application scene is to take a third gaze point as the target gaze point, obtaining a plurality of historical second gaze points after the second gaze point output by the second network model is obtained, and inputting the second gaze point and the plurality of historical second gaze points into a third network model; and
    obtaining the third gaze point output by the third network model as the target gaze point.
  10. The method according to claim 7, wherein the obtaining a current application scene comprises:
    determining the current application scene according to an application program running in the foreground.
  11. The method according to any one of claims 1 to 10, wherein before the inputting the first gaze point and historical gaze points into the second network model to obtain the second gaze point output by the second network model, the method further comprises:
    detecting whether the first gaze point is valid; and
    if the first gaze point is valid, performing the inputting the first gaze point and historical gaze points into the second network model to obtain the second gaze point output by the second network model.
  12. The method according to claim 11, wherein the detecting whether the first gaze point is valid comprises:
    detecting whether an eyeball state represented by the first gaze point satisfies a target state; and
    if the target state is satisfied, determining that the first gaze point is valid.
  13. The method according to any one of claims 1 to 12, wherein the gaze state image comprises an eye feature image, a face feature image and a face key point image, wherein the eye feature image characterizes an iris position and an eyeball position, the face feature image characterizes a distribution of facial features of the face, and the face key point image characterizes positions of key points in the face.
  14. The method according to any one of claims 1 to 13, wherein before the obtaining a first gaze point, the first gaze point being a gaze point obtained by inputting a gaze state image into a first network model, the method further comprises:
    obtaining sample gaze state images and a labelled gaze point corresponding to each sample gaze state image; and
    training a first network model to be trained by using the sample gaze state images and the labelled gaze point corresponding to each sample gaze state image, to obtain the first network model.
  15. The method according to claim 14, wherein after the obtaining sample gaze state images and a labelled gaze point corresponding to each sample gaze state image, the method further comprises:
    obtaining gaze points output by the first network model to be trained during training as first training gaze points; and
    training a second network model to be trained by using the first training gaze points, historical second training gaze point distribution information and the labelled gaze point corresponding to each sample gaze state image, to obtain the second network model, wherein the historical second training gaze point distribution information comprises a distribution of the gaze points output by the second network model to be trained during training.
  16. The method according to claim 15, wherein after the obtaining gaze points output by the first network model to be trained during training as first training gaze points, the method further comprises:
    if a second training gaze point output by the second network model to be trained is obtained, obtaining a plurality of historical second training gaze points, wherein the historical second training gaze points are output by the second network model to be trained according to input historical first training gaze points, the historical first training gaze points are output by the first network model to be trained according to sample gaze state images input into the first network model to be trained before a current sample gaze state image, and the current sample gaze state image is the sample gaze state image corresponding to the second training gaze point; and
    training a third network model to be trained by using the second training gaze point and the plurality of historical second training gaze points, to obtain the third network model.
  17. The method according to any one of claims 1 to 16, wherein before the obtaining a first gaze point, the method further comprises:
    acquiring an image of a user's face by an image acquisition device of the electronic device to obtain the gaze state image.
  18. A gaze point acquisition apparatus, running on an electronic device, the apparatus comprising:
    a first gaze point acquisition unit, configured to obtain a first gaze point, wherein the first gaze point is a gaze point obtained by inputting a gaze state image into a first network model;
    a second gaze point acquisition unit, configured to input the first gaze point and historical gaze point distribution information into a second network model to obtain a second gaze point output by the second network model, wherein the historical gaze point distribution information characterizes a distribution of second gaze points historically output by the second network model; and
    a gaze point determination unit, configured to obtain a target gaze point according to the second gaze point.
  19. An electronic device, comprising one or more processors and a memory;
    wherein one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the method according to any one of claims 1 to 17.
  20. A computer-readable storage medium, wherein program code is stored in the computer-readable storage medium, and the method according to any one of claims 1 to 17 is performed when the program code runs.
PCT/CN2022/117847 2021-09-30 2022-09-08 Gaze point acquisition method and apparatus, electronic device and readable storage medium WO2023051215A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111161492.9 2021-09-30
CN202111161492.9A CN113900519A (en) 2021-09-30 2021-09-30 Method and device for acquiring fixation point and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023051215A1 true WO2023051215A1 (en) 2023-04-06

Family

ID=79189909

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117847 WO2023051215A1 (en) 2021-09-30 2022-09-08 Gaze point acquisition method and apparatus, electronic device and readable storage medium

Country Status (2)

Country Link
CN (1) CN113900519A (en)
WO (1) WO2023051215A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900519A (en) * 2021-09-30 2022-01-07 Oppo广东移动通信有限公司 Method and device for acquiring fixation point and electronic equipment


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058694B (en) * 2019-04-24 2022-03-25 腾讯科技(深圳)有限公司 Sight tracking model training method, sight tracking method and sight tracking device
US11347308B2 (en) * 2019-07-26 2022-05-31 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
CN110728333B (en) * 2019-12-19 2020-06-12 广东博智林机器人有限公司 Sunshine duration analysis method and device, electronic equipment and storage medium
CN111176447A (en) * 2019-12-25 2020-05-19 中国人民解放军军事科学院国防科技创新研究院 Augmented reality eye movement interaction method fusing depth network and geometric model
CN111382714B (en) * 2020-03-13 2023-02-17 Oppo广东移动通信有限公司 Image detection method, device, terminal and storage medium
CN111399658B (en) * 2020-04-24 2022-03-15 Oppo广东移动通信有限公司 Calibration method and device for eyeball fixation point, electronic equipment and storage medium
CN111598038B (en) * 2020-05-22 2023-06-20 深圳市瑞立视多媒体科技有限公司 Facial feature point detection method, device, equipment and storage medium
CN112905839A (en) * 2021-02-10 2021-06-04 北京有竹居网络技术有限公司 Model training method, model using device, storage medium and equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200202561A1 (en) * 2018-12-24 2020-06-25 Samsung Electronics Co., Ltd. Method and apparatus with gaze estimation
CN111723596A (en) * 2019-03-18 2020-09-29 北京市商汤科技开发有限公司 Method, device and equipment for detecting gazing area and training neural network
CN110647790A (en) * 2019-04-26 2020-01-03 北京七鑫易维信息技术有限公司 Method and device for determining gazing information
CN110147163A (en) * 2019-05-20 2019-08-20 浙江工业大学 The eye-tracking method and system of the multi-model fusion driving of facing mobile apparatus
CN111783948A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN113900519A (en) * 2021-09-30 2022-01-07 Oppo广东移动通信有限公司 Method and device for acquiring fixation point and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116688312A (en) * 2023-06-08 2023-09-05 深圳市心流科技有限公司 Multi-person concentration training method, device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN113900519A (en) 2022-01-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874602

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22874602

Country of ref document: EP

Kind code of ref document: A1