
WO2019018577A1 - Systems and methods for analyzing behavior of a human subject - Google Patents


Info

Publication number
WO2019018577A1
WO2019018577A1 (PCT/US2018/042770)
Authority
WO
WIPO (PCT)
Prior art keywords
video
gaze
real
stimulus
eye
Prior art date
Application number
PCT/US2018/042770
Other languages
French (fr)
Inventor
Leanne Chukoskie
Phyllis TOWNSEND
Pamela Cosman
Original Assignee
The Regents Of The University Of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Regents Of The University Of California filed Critical The Regents Of The University Of California
Publication of WO2019018577A1 publication Critical patent/WO2019018577A1/en


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04H - BROADCAST COMMUNICATION
    • H04H 60/00 - Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H 60/35 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H 60/45 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users, for identifying users
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00 - Electrically-operated educational appliances
    • G09B 5/06 - Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04H - BROADCAST COMMUNICATION
    • H04H 60/00 - Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H 60/56 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H 60/59 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of video

Definitions

  • the eye-tracking device may include eyewear (e.g., eyeglasses) with multiple video cameras to record and analyze an environment and the subject.
  • the video may be analyzed using a machine-learning-based software tool, herein referred to as an analytical tool or behavioral recognition model, for effective gaze analysis.
  • the eye-tracking device may accurately detect specific eye movements related to the eye and brain systems that control both overt and covert gaze shifts.
  • the eye-tracking device may include eyewear (e.g., eyeglasses), audio components (e.g., speakers), visual components (e.g., an array of LED lights), and a central control unit.
  • the operator of the eye-tracking system may present audible and visual cues to a test subject, where the test subject's eye responses may be recorded and analyzed.
  • Fast or accurate responses may indicate a state of health, while slow or inaccurate responses may indicate potential behavioral or neurological issues.
  • the analytical tool may permit the experimenter to make a mark in the data stream to indicate a particular behavior to assist in later analyses, as described herein.
  • FIG. 1 illustrates an eye-tracking device 100 for gaze analysis according to one embodiment of the present disclosure.
  • the eye-tracking device 100 may include one or more optical lenses surrounded by a frame 120. Eye-tracking device 100 may be calibrated when eye-tracking device 100 is worn in a given position on the subject, as described herein.
  • the eye-tracking device 100 may include one or more video cameras 110 mounted on frame 120 of eye-tracking device 100 to record a video feed of what the test subject is seeing (e.g., a real-world environment).
  • video camera 110 may record a video from the perspective of the subject at 30 to 60 frames per second, and may be a specialized camera or cameras that render the three-dimensional aspect of the scene.
  • one or more video cameras 110 may be used to capture and record the real-world environment, and the one or more real-world environment videos may be stitched together to be viewed.
  • the eye-tracking device 100 may also include one or more eye cameras 130, which may record and detect the pupil and corneal reflections of the test subject's eye or eyes.
  • eye-tracking device 100 includes a first camera for the right eye and a second camera for the left eye.
  • the eye camera 130 may also record the test subject's general eye area so that a doctor or medical personnel may observe the eye movement and behavior by reviewing the recorded eye camera feed.
  • Both the environment-facing video camera 110 and the eye camera 130 may be mounted on the frame 120 via a ball-and-socket, swivel, rail, or other mechanism to allow adjustment and re-positioning of the video camera 110 and the eye camera 130.
  • a subject may adjust the wearable eye-tracking device 100 when it is worn, which may require recalibration or adjustment of the eye camera 130.
  • Recordings of the eye or eyes may be at higher rates (e.g., 120 frames per second) and with a higher spatial resolution for resolving smaller changes in eye position and pupil size.
  • one or more eye cameras 130 may be used to capture and record a given eye of the test subject, and the one or more eye videos may be stitched together to be viewed.
  • the video feed from the video camera 110 and the eye camera 130 may be saved onto a storage device (not shown) of the eye-tracking device 100.
  • the video camera 110 and the eye camera 130 may be wirelessly synced to a storage device on a computing device, or to one or more other eye-tracking devices.
  • the video data, including metadata (e.g., time stamps, length of recording, etc.) from the video camera 110 and the eye camera 130 may be exported to the computing device.
  • the eye-tracking device 100 may include a USB cable, thus allowing the eye-tracking device 100 to be physically connected to a computing device.
  • a computing device may be programmed to include an analytical tool (e.g., a software application or instructions that may be executed by a processing device of the computing device) to automate the detection and analysis of gaze behaviors from the video feed provided by the video camera 110 and the eye camera 130, either offline after data collection from eye-tracking device 100, or online in near-real time.
  • the computing device may be integrated into eye-tracking device 100.
  • the analytical tool may be configured to detect faces from each frame extracted from the input video of the video camera 110.
  • a neural network such as, for example, Faster R-CNN, may be used to find and classify multiple objects in a single image.
  • the analytical tool may also include facial recognition software, otherwise herein referred to as a facial recognition tool.
  • the facial recognition tool may include a Viola-Jones object detection framework or a histogram of oriented gradients (HOG) descriptor, which counts the occurrences of gradient orientations in each image.
  • the Viola-Jones object detection framework may detect faces by recognizing patterns of light and dark regions that characterize a facial region. The detected faces may be compared to Haar features, which are a series of black and white patterns that resemble the light and dark areas of a face.
  • the face may be classified and labeled based on a classifier, such as, for example, an Error-Correcting Output Code (ECOC) Classifier or Eigenfaces.
  • the input for the classifier may be frames of the video feed at various intervals (e.g., 1 millisecond, 1 second, 10 seconds, etc.).
  • the faces may be extracted and used to generate a face database.
  • HOG may be applied to the face database to extract relevant information on the images. It should be appreciated that other face detection, object detection, and classifiers may be used.
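
The Viola-Jones-style detection step described above can be sketched with OpenCV's bundled Haar cascade. This is an illustrative stand-in for the patent's facial recognition tool, not its actual implementation; the cascade file, parameter values, and function names are assumptions.

```python
import cv2

# OpenCV ships a pretrained frontal-face Haar cascade (a Viola-Jones-style detector).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame, min_size=(40, 40)):
    """Return bounding boxes (x, y, w, h) of candidate faces in one BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                    minNeighbors=5, minSize=min_size)

def crop_faces(frame, boxes):
    """Extract the detected face patches, e.g. for the face database described above."""
    return [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```

HOG descriptors could then be computed on these crops (for example with skimage.feature.hog) and passed to a classifier, along the lines the passage describes.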
  • Object detection input may include multiple images of an object at various scales and perspectives.
  • a video may be taken of an object at varying distances and angles.
  • the individual frames of the video may be extracted for use as input in a neural network (e.g., Faster R-CNN).
  • the neural network may perform calculations on the GPU instead of the CPU. The facial and object detection and recognition may be more clearly described in FIG. 7.
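
Since the passage names Faster R-CNN and GPU execution, a hedged sketch using torchvision's pretrained detector is shown below; the pretrained weights, the score threshold, and the helper name are assumptions standing in for the patent's own trained model.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Pretrained Faster R-CNN as a stand-in for the patent's object detector
# (weights="DEFAULT" requires a reasonably recent torchvision release).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval().to(device)

def detect_objects(frame_rgb, score_threshold=0.8):
    """Run the detector on one RGB frame (H x W x 3 uint8 array)."""
    with torch.no_grad():
        tensor = to_tensor(frame_rgb).to(device)   # converts to CHW float in [0, 1]
        output = model([tensor])[0]                # dict with boxes, labels, scores
    keep = output["scores"] > score_threshold
    return output["boxes"][keep].cpu(), output["labels"][keep].cpu()
```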
  • Gaze data may include when a gaze fixates on a given stimulus (e.g., face, object, and other location in the real- world environment), how long the gaze is fixated on the given stimulus, the subject's response time to a trigger by a stimulus, where on the stimulus the gaze is fixated, and other information. Gaze data may be represented by a visualization, as described herein.
  • the analytical tool may even be used to detect specific facial sub-regions, such as the eyes, mouth, and nose. It should be appreciated that further granularity may be achieved using the analytical tool, such as what part of the eye, what part of the mouth, what part of the nose, etc., a subject's gaze is fixated on at a given moment in time. By doing so, the analytical tool may be able to take the video feed from the eye camera 130 and detect faces in the video feed to determine whether the test subject is focusing his or her gaze at particular regions of the face. Furthermore, in additional embodiments, the analytical tool with the facial recognition analysis may be able to determine the duration of the test subject's fixed gaze.
  • an objective social and communicative behavioral analysis may be performed. For example, in the instance that the test subject predominantly fixes his or her gaze at the corner of the speaker's mouth for a prolonged period of time without making eye contact, this may indicate that the test subject is showing symptoms of autism or other forms of behavioral disorder. Since gaze behavior is a common target of behavioral therapy for children with autism, the eye-tracking device and analytical tools can be used to evaluate treatment progress.
  • the analytical tool may also detect objects in the video feed provided by the video camera 110.
  • the analytical tool may be able to detect and distinguish among one or more faces and one or more non-human items.
  • the analytical tool may label the faces and objects.
  • the eye-tracking device 100 may also include a speaker.
  • the speaker may be used to create an audio trigger to assess the test subject's response.
  • the eye-tracking device 100 may be used to create audio and visual triggers to calculate latency, which can then be used to further assess social and communicative behavior of the test subject.
  • the eye-tracking device 100 may be part of a virtual reality headset.
  • the virtual reality headset may include software and hardware that provides the necessary audio and/or visual signals to be presented on a display (e.g., as part of a graphical user interface) and/or speakers.
  • aspects of the virtual reality headset can be made gaze-dependent, so that the images the subject sees on the headset respond to the subject's eye and gaze behavior.
  • the virtual reality headset may be configured to display real-world social interactive experiences, such as an audiovisual environment displaying a virtual room full of strangers interacting with each other. Because the virtual reality headset provides the test subject with a virtual interactive and immersive experience, the test subject may be presented with real world social and environmental experiences in a virtual environment. Thus, the analytical tool may be able to objectively analyze the test subject's behavior by analyzing the gaze behavior in response to the virtual social and environmental cues.
  • the eye-tracking device 100 may be part of an augmented reality headset.
  • the augmented reality headset allows the user to see people and objects that are actually present, but also may include software and hardware that allows for additional audio and visual signals to augment the user's experience of the real- world environment.
  • digital objects may be overlaid in the user's field of view of the real-world environment in real-time. Possible examples include having name labels appear for people or objects in the scene that are unknown, having virtual objects appear together with real objects or people, having visual cues replace audio cues for hearing-impaired subjects, and the like.
  • aspects of the augmented reality headset can be made gaze-dependent, so that the augmentations the subject sees depend on the subject's eye and gaze behavior. For example, the subject may see an animation as a reward.
  • the analytical tool may be able to objectively analyze the test subject's behavior by analyzing the gaze behavior in response to the composite cues made of real and augmented social and environmental cues.
  • the eye-tracking device 100 may be incorporated into a testing room, where the testing room is set up as a system for monitoring gazes and their latency time, as further illustrated in FIG. 2.
  • the system may include a control unit 205 and multiple stimuli, which are illustrated as gaze attractor units 210.
  • the gaze attractor units 210 may each individually be wireless, battery-operated, rechargeable devices that provide audiovisual cues (e.g., light up and make a sound).
  • the gaze attractor units 210 may be placed throughout a room.
  • a control unit 205 may be configured to remotely trigger the lights and sounds corresponding to each individual gaze attractor unit 210. The lights and sounds may be triggered to direct the test subject's gaze to specific parts of the room through the use of visual and auditory cues. Additionally, in some instances, the control unit 205 may send time stamps to the collected data stream each time a light or sound is triggered on a gaze attractor unit 210.
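
As an illustration of how those time stamps could be combined with the gaze data to measure response latency, the sketch below pairs each cue trigger with the first subsequent fixation on that attractor; the record format and the 5-second response window are assumptions, not values from the patent.

```python
def response_latencies(trigger_times, fixation_onsets, max_latency=5.0):
    """Pair each cue trigger with the first fixation onset that follows it.

    trigger_times   -- times (s) at which a gaze attractor unit was triggered
    fixation_onsets -- times (s) at which fixations on that attractor began
    Returns a list of (trigger_time, latency) pairs; latency is None when the
    subject never looked at the attractor within max_latency seconds.
    """
    results = []
    for t in trigger_times:
        candidates = [f - t for f in fixation_onsets if t <= f <= t + max_latency]
        results.append((t, min(candidates) if candidates else None))
    return results

# Cues at 2.0 s and 10.0 s; fixations at 2.4 s and 17.0 s:
# the first cue is answered after ~0.4 s, the second gets no response.
print(response_latencies([2.0, 10.0], [2.4, 17.0]))
```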
  • FIG. 3 illustrates an example world view as seen by a test subject wearing an eye-tracking device. As illustrated, the test subject is in a room sitting across from three other individuals. Additionally, the test room is set up with a control unit 305 and multiple gaze attractor units 310 placed throughout the room to monitor the range of eye movement and to assess the speed and accuracy of gaze shifts.
  • the analytical tool of the eye-tracking system may be able to take the data stream provided by the eye-tracking device and detect that the test subject currently has his or her gaze fixed on a particular individual's face, as indicated by the various visualizations described herein.
  • the analytical tool may further identify that there are two other people within the test subject's line of sight and also determine that the test subject's gaze is not fixed on them, as indicated by the red boxes.
  • FIG. 4 illustrates a calibration board 400 for calibrating the eye-tracking device according to one embodiment of the present disclosure.
  • the calibration board 400 may calibrate the eye-tracking device in order to ensure its accuracy.
  • the eye-tracking device may need to be initially calibrated as well as repeatedly re-calibrated during the test session in case the glasses or headset shifts position.
  • the calibration may include calibration data indicating a given position in the real-world environment based on a given movement of the eye captured by the eye camera when the eye-tracking device is worn in a given position on the subject.
  • the calibration board 400 may include a target grid, where each cell is equal in size and includes a target unit 402.
  • calibration board 400 may be placed 0.5 m, 1 m, 5 m, 10 m, 25 m, etc. away from the subject for calibration.
  • the user may wear the eye-tracking device and may be instructed to focus his or her gaze at a specified target 402. With the analytical tool, it may be quickly determined whether the eye-tracking device is accurately monitoring and recognizing the movement of the eyes and the gaze.
  • the control unit and the gaze attractor units as depicted and described in FIGs. 2 and 3 may also be used to calibrate the wearable eye-tracking device. For example, instructing the test subject to gaze at specific or individual gaze attractor units may help validate the current calibration setting or identify areas of potential offset that need recalibration. In embodiments, far left, far right, far up, and far down gaze positions have lower reliability and tend to be more difficult to calibrate.
  • the ability to assess calibration validity may also be used to determine whether the eye-tracking device needs to be re-shifted or re-positioned on the test subject's face.
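
One plausible way to turn fixations on known targets (calibration-board cells or gaze attractor units) into calibration data is to fit a mapping from eye-camera pupil coordinates to scene-camera coordinates. The homography fit below is an illustrative choice, not the patent's stated method; the function names are assumptions.

```python
import numpy as np
import cv2

def fit_calibration(pupil_points, scene_points):
    """Fit a homography from pupil (eye-camera) coords to scene-camera coords.

    pupil_points, scene_points -- corresponding N x 2 point sets collected while
    the subject fixates known targets; at least four targets are needed.
    """
    H, _ = cv2.findHomography(np.float32(pupil_points),
                              np.float32(scene_points), cv2.RANSAC)
    return H

def map_gaze(H, pupil_xy):
    """Project one pupil-camera point into scene-camera pixel coordinates."""
    p = np.array([pupil_xy[0], pupil_xy[1], 1.0])
    q = H @ p
    return q[:2] / q[2]
```

Re-running the fit on a handful of attractor fixations during a session, as described above, would reveal whether the residual error has grown enough to require re-positioning or recalibration.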
  • FIG. 5 illustrates collected data analysis based on the data from an eye-tracking device according to one embodiment of the present disclosure.
  • the analytical tool may be able to provide an organized data sheet providing the test subject's gaze and movement results.
  • the analysis may indicate and identify what the test subject has gazed upon, such as identifying the number of faces and types of objects. For example, the analysis here indicates that the test subject gazed upon two different faces and two different objects at various different times during the test session. Additionally, the analysis may also indicate at what specific times a stimulus or trigger was introduced to assess the test subject's response and latency time. This can be performed offline or in near-real time, the latter of which may be used for training or therapeutic purposes.
  • FIG. 6 illustrates a graphical user interface 600 to showcase the video feed according to one embodiment of the present disclosure.
  • the graphical user interface 600 may output the video feed collected from the video camera of the eye-tracking device. Additionally, the video feed may have been processed and analyzed with the analytical tool to generate a visualization, so that the video feed may be marked with the stimuli, such as objects, facial recognition boxes, and labels. It should be appreciated that other markings may be used to otherwise highlight a given stimulus.
  • the markings may include dynamic markings on the gaze at a given moment in time throughout the video feed.
  • the dynamic markings may include a dot, a reticle, and/or other markings.
  • the markings, or visualization, may be superimposed on the video feed.
  • the video feed may mark a stimulus with a box when the subject fixates his or her gaze on a given stimulus.
  • the visualization may turn the gaze marker and the box green to indicate gaze is fixated on the corresponding stimulus.
  • the video feed displayed on the graphical user interface 600 may allow a medical or clinical professional to review the analyzed video feed and manually assess the test subject's gaze behavior.
  • the graphical user interface 600 may include viewing options that allow a medical or clinical professional to review the video feed in select frames or in a video format. Using the video feed, a professional may be able to assess a remote test subject's behavior.
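
A minimal sketch of the superimposed visualization described above, using OpenCV drawing calls: a red box and red gaze dot by default, both turning green when the gaze point falls inside a stimulus box. The colors and the point-in-box fixation test are illustrative assumptions.

```python
import cv2

def draw_visualization(frame, gaze_xy, stimuli):
    """Overlay a gaze marker and stimulus boxes on one scene-camera frame (BGR).

    gaze_xy -- (x, y) gaze point in frame coordinates at this moment in time
    stimuli -- list of (label, (x, y, w, h)) boxes for detected faces or objects
    """
    gx, gy = int(gaze_xy[0]), int(gaze_xy[1])
    gaze_on_stimulus = False
    for label, (x, y, w, h) in stimuli:
        fixated = x <= gx <= x + w and y <= gy <= y + h
        gaze_on_stimulus = gaze_on_stimulus or fixated
        color = (0, 255, 0) if fixated else (0, 0, 255)   # green if fixated, else red
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, label, (x, max(y - 5, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    dot_color = (0, 255, 0) if gaze_on_stimulus else (0, 0, 255)
    cv2.circle(frame, (gx, gy), 6, dot_color, -1)          # filled dot as the gaze marker
    return frame
```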
  • FIG. 7 illustrates a flow diagram depicting various operations of method 700, in accordance with one embodiment of the present disclosure.
  • the operations of the various methods described herein are not necessarily limited to the order described or shown in the figures, and it will be appreciated, upon studying the present disclosure, that variations of the order of the operations described herein are within the spirit and scope of the disclosure.
  • methods 700 and 800 may be carried out, in some cases, by one or more of the components, elements, devices, and circuitry, described herein and referenced with respect to at least FIGs. 1 and 9, as well as sub-components, elements, devices, and circuitry depicted therein and/or described with respect thereto.
  • the description of methods 700 and 800 may refer to a corresponding component, element, etc., but regardless of whether an explicit reference is made, it will be appreciated, upon studying the present disclosure, when the corresponding component, element, etc. may be used. Further, it will be appreciated that such references do not necessarily limit the described methods to the particular component, element, etc. referred to.
  • aspects and features described above in connection with (sub-) components, elements, devices, circuitry, etc., including variations thereof, may be applied to the various operations described in connection with methods 700 and 800 without departing from the scope of the present disclosure.
  • Operations 702 through 708 may be used to train a database using the analytical tool to detect and recognize the one or more objects in a real-world environment.
  • method 700 may include receiving frames of a video of an object. The video may have been taken of the object at different distances from the object, at different angles, and, in some examples, where the object is partially obscured. In embodiments, one or more videos may be used.
  • method 700 may include defining one or more regions of interest. In some embodiments, the region of interest may be determined using machine learning tools to determine boundaries of object(s) in the video. In embodiments, the one or more regions of interest may be manually determined by examining the one or more frames to determine where the object is in the frame.
  • method 700 may include generating a database of training images for the machine-learning object recognition. Using the one or more regions of interest, thousands of training images of the object may be generated from the videos for the machine-learning object recognition.
  • method 700 may include classifying the objects. For example, a given object may be classified and given a code that is translated to a corresponding label. A stuffed animal shark toy may be labeled "shark." The label may appear in the visualization of the gaze data that is superimposed on the video feed.
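
A hedged sketch of the training-image harvesting step in operations 702 through 708 follows: cropping the labeled regions of interest out of the video frames and writing them into a per-label folder. The directory layout, crop size, and function names are illustrative assumptions.

```python
import os
import cv2

def build_training_set(frames_with_rois, label, out_dir="training_images"):
    """Crop labeled regions of interest from video frames into a training folder.

    frames_with_rois -- iterable of (frame, (x, y, w, h)) pairs produced by the
    manual or automatic region-of-interest step described above.
    """
    os.makedirs(os.path.join(out_dir, label), exist_ok=True)
    for i, (frame, (x, y, w, h)) in enumerate(frames_with_rois):
        crop = frame[y:y + h, x:x + w]
        crop = cv2.resize(crop, (224, 224))   # fixed size chosen for illustration
        cv2.imwrite(os.path.join(out_dir, label, f"{label}_{i:05d}.png"), crop)
```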
  • Operations 710 through 712 may incorporate video of the eye into generating gaze data.
  • method 700 may include receiving frames of videos of the real-world environment and of the eye(s).
  • the video cameras may be mounted to the eye-tracking device.
  • a first set of video cameras may capture the real- world environment.
  • the second set of video cameras may capture the movement of the eye.
  • a first camera may capture a first eye and a second camera may capture a second eye.
  • method 700 may include sending raw video of the eye to operation 726. Continuing the example, the video of the first camera and the second camera may be sent to operation 726.
  • Operations 708 and 720 through 726 may be used to classify faces in the video of the real-world environment to generate gaze data.
  • method 700 may include extracting the frames of the real-world environment to be analyzed by the analytical tool.
  • the videos from the first set of video cameras may be stitched, pieced, or cut together, etc.
  • the individual frames may be extracted from the videos of the real-world environment.
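
A minimal sketch of this frame-extraction step with OpenCV; the sampling stride is an assumption for when only every n-th frame needs to be analyzed.

```python
import cv2

def extract_frames(video_path, stride=1):
    """Yield (frame_index, frame) pairs from a recorded video, every `stride` frames."""
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:               # end of the video (or a read error)
            break
        if index % stride == 0:
            yield index, frame
        index += 1
    capture.release()
```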
  • method 700 may include applying the analytical tool to a frame or a set of consecutive frames to detect faces in a real-world environment.
  • a facial detection algorithm such as, for example, the Viola-Jones Object Detection algorithm with Haar features may be able to detect whether one or more faces are in a given frame of the captured video of the real-world environment.
  • the algorithm may pass an image (e.g., one frame of grayscale video) through a series of matches to the Haar features.
  • the Haar features may be a series of black and white patterns that resemble the light and dark areas of an expected face. These patterns are expected in human facial features due to lighting and the general contours of a human face. The algorithm may seek possible matches to these features in the current frame and locate all the subframes that resemble the pattern.
  • method 700 may include classifying the faces in a frame or a set of consecutive frames.
  • method 700 may include detecting the faces in a frame or a set of frames.
  • the training faces may be generated directly from the input video itself. For example, in the first fifty frames of the video, which is approximately 2 seconds, the relative horizontal positions of faces may remain unchanged.
  • Upon detection as described above, the faces may be labeled based on their horizontal coordinates.
  • the leftmost face in the frame may be assigned with index 1, and the subsequent face may be assigned with index 2.
  • the faces may be extracted with their corresponding index information and put into a customized face database.
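
The sketch below illustrates the indexing scheme described above, assuming the faces keep their left-to-right order during the first frames; the 50-frame window is taken from the passage, while the helper names and the dictionary format are assumptions (the detection helpers could be the Haar-cascade sketch earlier in this document).

```python
def build_face_database(frames, detect_faces, crop_faces, n_training_frames=50):
    """Label detected faces by horizontal order and collect crops per label.

    Returns {index: [face_crop, ...]} with index 1 assigned to the leftmost face.
    """
    database = {}
    for frame in frames[:n_training_frames]:
        boxes = sorted(detect_faces(frame), key=lambda b: b[0])   # sort by x coordinate
        for position, box in enumerate(boxes, start=1):
            database.setdefault(position, []).extend(crop_faces(frame, [box]))
    return database
```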
  • HOG features from the training images in the face database may be extracted for classification.
  • the HOG features count the occurrences of gradient orientation in each image.
  • An unsupervised binary classifier may be run on each HOG feature to generate a binary codeword of each face class for the ECOC model.
  • Each class may be assigned a unique binary string of length 15. The string may also be called a codeword.
  • class 2 has the codeword 100100011110101.
  • one binary classifier may be learned for each column. For instance, for the first column, a binary classifier may be built to separate {0, 2, 4, 6, 8} from {1, 3, 5, 7, 9}.
  • Each newly detected face may generate a codeword based on its HOG feature vectors.
  • the classifier may choose the class whose codeword is closest to the codeword of the input image as the predicted label for the newly detected face.
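
A minimal sketch of this codeword scheme, assuming HOG descriptors from scikit-image and one linear SVM per codeword bit (a common supervised realization of ECOC, standing in for the binary classifiers described above); the codeword table is supplied by the caller, as in the length-15 example given.

```python
import numpy as np
from sklearn.svm import LinearSVC
from skimage.feature import hog

def hog_features(face_gray):
    """HOG descriptor of a grayscale face crop (resize crops first so lengths match)."""
    return hog(face_gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_ecoc(features, labels, codewords):
    """Train one binary classifier per codeword column (bit).

    codewords -- {class_label: bit string}, e.g. {2: "100100011110101"}; each
    column must split the training classes into two non-empty groups.
    """
    n_bits = len(next(iter(codewords.values())))
    classifiers = []
    for bit in range(n_bits):
        targets = [int(codewords[y][bit]) for y in labels]
        classifiers.append(LinearSVC().fit(features, targets))
    return classifiers

def predict_ecoc(classifiers, codewords, feature):
    """Choose the class whose codeword is Hamming-closest to the predicted bits."""
    predicted = "".join(str(int(clf.predict([feature])[0])) for clf in classifiers)
    return min(codewords,
               key=lambda c: sum(a != b for a, b in zip(codewords[c], predicted)))
```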
  • method 700 may include generating gaze data, as described above.
  • the gaze data may already include or be derived from calibration data. As a result, the number of times a subject fixates on a stimulus, the duration of each fixation, where the fixation is located on the stimulus, and other information, as described above, are recorded.
  • the gaze data may be further analyzed and processed to generate visualizations, which may include dots, reticles, or other markers representing a subject's gaze at a current moment, as well as boxes, highlights, or other markers representing a stimulus in the real-world environment.
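
As an illustration of how per-stimulus fixation counts and dwell times might be tallied from the frame-by-frame gaze data, the sketch below assumes one stimulus label (or None) per scene frame; the sample format and the 30 fps frame period are assumptions.

```python
from collections import defaultdict

def fixation_stats(gaze_labels, frame_period=1 / 30):
    """Tally fixation count and total dwell time per stimulus.

    gaze_labels -- one label per scene frame naming the fixated stimulus, or None,
                   e.g. ["face_1", "face_1", None, "shark", "shark", "shark"]
    Returns {stimulus: {"fixations": n, "total_time_s": t}}.
    """
    stats = defaultdict(lambda: {"fixations": 0, "total_time_s": 0.0})
    previous = None
    for label in gaze_labels:
        if label is not None:
            if label != previous:             # a new fixation on this stimulus begins
                stats[label]["fixations"] += 1
            stats[label]["total_time_s"] += frame_period
        previous = label
    return dict(stats)

print(fixation_stats(["face_1", "face_1", None, "shark", "shark", "shark"]))
```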
  • FIG. 8 illustrates a flow diagram depicting various operations of method 800, and accompanying embodiments for analyzing behavior of a human subject, in accordance with aspects of the present disclosure.
  • the operations of the various methods described herein are not necessarily limited to the order described or shown in the figures, and it will be appreciated, upon studying the present disclosure, that variations of the order of the operations described herein are within the spirit and scope of the disclosure.
  • method 800 includes obtaining video of a real-world environment.
  • the video of the real-world environment may be captured by a first set of video cameras. It will be appreciated that the video may be captured at one or more resolutions.
  • the video of the real-world environment may include one or more stimuli, such as one or more persons, objects, and audio-visual cues.
  • method 800 includes obtaining video of an eye.
  • the video of the eye may be captured by a second set of video cameras, where a first camera may capture a first eye and a second camera may capture a second eye.
  • the second set of video cameras may capture movement of both eyes.
  • the resolution and framerate of the second set of video cameras may be higher to better capture subtle changes in the subject's gaze.
  • method 800 includes using at least the video of the real-world environment and the video of the eye to generate gaze data.
  • calibration data may also be included to generate gaze data.
  • the eye-tracking device may generate calibration data based on correlating eye movement with a location in the real-world environment. This is generated using the video of the real-world environment and the video of the eye to determine where the subject's gaze is fixated in the real-world environment.
  • Gaze data may include the number of times a subject fixates on a stimulus, the duration of each fixation, where the fixation is located on the stimulus, and other information, as described above.
  • method 800 includes generating a visualization.
  • the visualization, as described above, may be of the gaze data, and may be superimposed on the video feed.
  • the gaze may be marked with a given visualization and the one or more stimuli may be marked with another visualization.
  • the gaze and the corresponding stimuli visualizations may change. For example, when a subject's gaze fixates on a person's eyes, a red box surrounding the person's eyes may turn green and the red dot corresponding to the subject gaze may also turn green. It should be appreciated that other visualizations may be used.
  • method 800 includes displaying the visualization on a graphical user interface.
  • FIG. 9 illustrates an exemplary computing module that may be used to implement any of the embodiments disclosed herein.
  • the term "system” might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the present invention.
  • a system might be implemented utilizing any form of hardware, software, or a combination thereof.
  • processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a system.
  • the various systems described herein might be implemented as discrete systems or the functions and features described can be shared in part or in total among one or more systems.
  • computing system 900 may represent, for example, computing or processing capabilities found within desktop, laptop and notebook computers; hand-held computing devices (PDAs, tablets, smart phones, cell phones, palmtops, etc.); mainframes, supercomputers, workstations or servers; or any other type of special-purpose or general-purpose computing devices as may be desirable or appropriate for a given application or environment.
  • Computing system 900 might also represent computing capabilities embedded within or otherwise available to a given device.
  • a computing system might be found in other electronic devices such as, for example, digital cameras, navigation systems, cellular telephones, portable computing devices, modems, routers, WAPs, terminals and other electronic devices that might include some form of processing capability.
  • Computing system 900 might include, for example, one or more processors, controllers, control systems, or other processing devices, such as a processor 904.
  • Processor 904 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic.
  • processor 904 is connected to a bus 902, although any communication medium can be used to facilitate interaction with other components of computing system 900 or to communicate externally.
  • Computing system 900 might also include one or more memory systems, simply referred to herein as main memory 908. For example, preferably random access memory (RAM) or other dynamic memory might be used for storing information and instructions to be executed by processor 904. Main memory 908 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Computing system 900 might likewise include a read only memory (“ROM”) or other static storage device coupled to bus 902 for storing static information and instructions for processor 904.
  • the computing system 900 might also include one or more various forms of information storage mechanism 910, which might include, for example, a media drive 912 and a storage unit interface 920.
  • the media drive 912 might include a drive or other mechanism to support fixed or removable storage media 914.
  • a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive (R or RW), or other removable or fixed media drive might be provided.
  • storage media 914 might include, for example, a hard disk, a floppy disk, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to or accessed by media drive 912.
  • the storage media 914 can include a computer usable storage medium having stored therein computer software or data.
  • information storage mechanism 910 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing system 900.
  • Such instrumentalities might include, for example, a fixed or removable storage unit 922 and an interface 920.
  • Examples of such storage units 922 and interfaces 920 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory system) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 922 and interfaces 920 that allow software and data to be transferred from the storage unit 922 to computing system 900.
  • Computing system 900 might also include a communications interface 924.
  • Communications interface 924 might be used to allow software and data to be transferred between computing system 900 and external devices.
  • Examples of communications interface 924 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.xx or other interface), a communications port (such as, for example, a USB port, IR port, RS-232 port, Bluetooth® interface, or other port), or other communications interface.
  • Software and data transferred via communications interface 924 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical) or other signals capable of being exchanged by a given communications interface 924. These signals might be provided to communications interface 924 via a channel 928.
  • This channel 928 might carry signals and might be implemented using a wired or wireless communication medium.
  • Some examples of a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.
  • The terms "computer program medium" and "computer usable medium" are used to generally refer to media such as, for example, memory 908, storage unit 920, media 914, and signals on channel 928.
  • These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution.
  • Such instructions embodied on the medium are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing system 900 to perform features or functions of the present invention as discussed herein.
  • The term "module" does not imply that the components or functionality described or claimed as part of the module are all configured in a common package. Indeed, any or all of the various components of a module, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

Systems and methods for analyzing behavior of a human subject are disclosed. Example embodiments include obtaining video of a real-world environment. The real-world environment may include a stimulus. The embodiments may include obtaining video of at least one eye captured at a same time as the video of the real-world environment. The embodiments may include using at least the video of the real-world environment and the video of the eye to generate gaze data.

Description

SYSTEMS AND METHODS FOR ANALYZING BEHAVIOR OF A HUMAN
SUBJECT
Statement Regarding Federally Sponsored Research or Development
[0001] This invention was made with government support under R21MH096967 awarded by the National Institutes of Health. The government has certain rights in the invention.
Technical Field
[0002] The disclosed technology relates generally to analyzing behavior. More specifically, the present invention relates to systems and methods for objectively analyzing behavior of a human subject based upon eye movement and patterns.
Brief Summary of the Embodiments
[0003] Disclosed are a system, device, apparatus, and methods that objectively analyze behavior of a human subject, including in some examples, by capturing a subject's gaze and the duration and instances of reactions to cues. Video of a real-world environment and video of a subject's eye corresponding to a same time as the video of the real-world environment may be received by the system. The system may use the video to generate gaze data, including whether a gaze fixates on a given stimulus, the duration of the gaze, the subject's response time to a trigger by a stimulus, where on the stimulus the gaze is fixated, and other information. The gaze data may be used to generate a visualization that may be superimposed on the video of the real-world environment. Embodiments of the present disclosure include systems, methods, and devices capable of analyzing behavior of a human subject, as well as interconnected processors and/or circuitry, to generate gaze data using at least the video of the real-world environment and the video of the eye.
[0004] In accordance with aspects of the present disclosure, a computer-implemented method of analyzing behavior of a human subject includes a number of operations. The method includes obtaining video of a real-world environment. The real-world environment includes a stimulus. The method includes obtaining video of at least one eye captured at a same time as the video of the real-world environment. The method also includes using at least the video of the real-world environment and the video of the eye to generate gaze data.
[0005] In embodiments, the method may also include obtaining calibration data to track gaze of the real-world environment. The method may include using at least the calibration data to generate gaze data.
[0006] In embodiments, the method may include generating a visualization of the gaze data to be superimposed on the video of the real-world environment, such that when a gaze at a given moment in time is fixated on the stimulus, the stimulus is highlighted. The method may include displaying the visualization on a graphical user interface.
[0007] In embodiments, the stimulus may include one or more of a person, an object, or an audio-visual cue.
[0008] In embodiments, the gaze data may include an amount of time spent looking at a given location in the environment.
[0009] In embodiments, the gaze data may include an amount of time taken to respond to a stimulus.
[0010] In embodiments, generating a visualization of the gaze data includes recognizing the stimulus in the video. Generating a visualization of the gaze data may also include highlighting the stimulus when a gaze at a given moment in time is fixated on the stimulus.
[0011] In embodiments, a specialist uses the gaze data and the visualization to evaluate the subject.
[0012] In accordance with additional aspects of the present disclosure, a system analyzes behavior of a human subject. The system may include an eye-tracking device. The eye-tracking device may include a first camera to capture video of a real-world environment. The eye-tracking device may include a second camera to capture video of an eye. The system may include a plurality of units. Each of the plurality of units generates an auditory or visual cue. The system may also include a control unit communicatively coupled to each of the plurality of units. The control unit is to trigger each of the plurality of units to generate a respective auditory or visual cue while the first and second video cameras capture video.
[0013] In embodiments, the system may include a non-transitory computer-readable medium and a processor. The non-transitory computer-readable medium is operatively coupled to the processor. The non-transitory computer-readable medium stores instructions that, when executed by the processor, perform a number of operations.
[0014] One operation is receiving the video of the real-world environment. The real-world environment may include a stimulus. Another operation is receiving the video of the eye captured at a same time as the video of the real-world environment. Yet another operation is using at least the video of the real-world environment and the video of the eye to generate gaze data. Another operation is generating a visualization of the gaze data to be superimposed on the video of the real-world environment, such that when a gaze at a given moment in time is fixated on a stimulus, the stimulus is highlighted. Yet another operation is displaying the visualization on a graphical user interface.
[0015] In embodiments, another operation may be obtaining calibration data to track gaze of the real-world environment. Yet another operation may be using at least the calibration data to generate gaze data.
[0016] In embodiments, the stimulus includes one or more of a person, an object, or an auditory or visual cue generated by one of the plurality of units.
[0017] In embodiments, the gaze data may include an amount of time spent looking at a given location in the environment. The gaze data may also include an amount of time taken to respond to a stimulus.
[0018] In embodiments, the control unit is to send a time stamp each time it triggers one of the plurality of units to generate an auditory or visual cue.
[0019] In accordance with additional aspects of the present disclosure, a computer-implemented method of analyzing behavior of a human subject includes a number of operations. The method may include receiving video of a real-world environment. The real-world environment includes a stimulus. The method may also include receiving video of at least one eye captured at a same time as the video of the real-world environment. The method may include using at least the video of the real-world environment and the video of the eye to generate gaze data. The method may also include generating a visualization of the gaze data to be superimposed on the video of the real-world environment, such that when a gaze at a given moment in time is fixated on the stimulus, the stimulus is highlighted. The method may include displaying the visualization on a graphical user interface.
[0020] In embodiments, the method may include receiving calibration data to track gaze of the real-world environment. The method may further include using at least the calibration data to generate gaze data.
[0021] In embodiments, the stimulus includes one or more of a person, an object, or an audio-visual cue.
[0022] In embodiments, the gaze data includes an amount of time spent looking at a given location in the environment and an amount of time taken to respond to a stimulus.
[0023] In embodiments, generating a visualization of the gaze data includes recognizing the stimulus in the video. Generating a visualization of the gaze data may also include highlighting the stimulus when the gaze is fixated on the stimulus.
[0024] In embodiments, a specialist uses the gaze data and the visualization to evaluate the subject.
Brief Description of the Drawings
[0025] The technology disclosed herein, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict typical or example embodiments of the disclosed technology. These drawings are provided to facilitate the reader's understanding of the disclosed technology and shall not be considered limiting of the breadth, scope, or applicability thereof. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.
[0026] FIG. 1 illustrates an eye-tracking device for tracking gaze in accordance with one embodiment of the present disclosure.
[0027] FIG. 2 illustrates an eye-tracking system in accordance with one embodiment of the present disclosure.
[0028] FIG. 3 illustrates a test room with the eye-tracking system set-up in accordance with one embodiment of the present disclosure.
[0029] FIG. 4 illustrates a calibration board for calibrating an eye-tracking device in accordance with one embodiment of the present disclosure.
[0030] FIG. 5 illustrates collected data analysis based on the data stream from an eye-tracking device in accordance with one embodiment of the present disclosure.
[0031] FIG. 6 illustrates a graphical user interface to showcase a video feed in accordance with one embodiment of the present disclosure.
[0032] FIG. 7 illustrates a flow diagram depicting various operations, in accordance with one embodiment of the present disclosure.
[0033] FIG. 8 illustrates a flow diagram depicting various operations, in accordance with one embodiment of the present disclosure.
[0034] FIG. 9 illustrates an exemplary computing module that may be used to implement any of the embodiments disclosed herein.
[0035] The figures are not intended to be exhaustive or to limit the technology to the precise form disclosed. It should be understood that the technology can be practiced with modification and alteration, and that the disclosed technology be limited only by the claims and the equivalents thereof.
Detailed Description of the Embodiments
[0036] Many people with behavioral and neurological conditions characteristically display deficits in cognitive and social behaviors, such as, for example, communication skills. Indicators may include differences in orienting behavior, which may manifest as an inability or unwillingness to make appropriate eye contact with others or as ill-timed gaze shifts. Gaze may refer to the location an eye is focused on at a given moment. For example, one of the earliest and most reliable signs of autism spectrum disorder occurs when a child fails to look toward someone calling his or her name. Atypical gaze patterns can also be observed in other behavioral and neurological disorders, such as attention-deficit/hyperactivity disorder, Alzheimer's, schizophrenia, depression, and the like.
[0037] While there are eye-tracking technologies available to monitor visual attention, most such technologies and methods are limited to showing images or videos on a computer or television screen. Although these methods offer considerable control over the nature and timing of presentations, computer and television screens are often a poor proxy for assessing or diagnosing social or communicative behavior, as the human element is lacking in such setups. On the other hand, although one can measure gaze during encounters outside of a computer screen, the real world lacks the timing control, analytical options, and availability afforded by a computer. Manually noting the test subject's gaze response to each event is not only labor-intensive, but is also subjective and susceptible to human error.
[0038] Because certain behavioral and neurological disorders, such as autism spectrum disorder, are now better understood, a more accurate assessment can lead to better overall health and wellbeing outcomes. One method of detecting common behavioral and neurological symptoms is to assess eye movement patterns or gaze behavior in a social setting, because eye movement patterns and gaze behavior may provide an accurate indication of the subject's cognitive and social development.
[0039] As such, the disclosed embodiments include the use of an eye-tracking device and systems and methods to accurately measure the subject's eye and gaze behavior, which may result in determining whether the subject shows symptoms of any behavioral or neurological disorders and any change in those behaviors following behavioral or pharmacological treatment. The device, systems, and methods may provide diagnostic and assessment capabilities connected to an automated system that may objectively assess social, cognitive, and communicative behavior using gaze patterns, including eye contact between participants. By way of example, such behavioral or neurological disorders that may be detected by monitoring and analyzing a subject's eye and gaze behavior may include attention-deficit/hyperactivity disorder, Alzheimer's, depression, and the like.
[0040] However, it should be noted that embodiments of the eye-tracking device may also be implemented in a wide range of applications. For example, the disclosed eye-tracking device may also be used to help train and assess high-performance athletes. Because high-performance athletes often require quick eye response times, the disclosed eye-tracking device may be used to assess the athlete's speed, focus, gaze accuracy, and response time based on the monitored eye movement.
[0041] In further embodiments, the eye-tracking device may include eyewear (e.g., eyeglasses) with multiple video cameras to record and analyze an environment and the subject. The video may be analyzed using a machine-learning-based software tool, herein referred to as an analytical tool or behavioral recognition model, for effective gaze analysis. As such, the eye-tracking device may accurately detect specific eye movements related to the eye and brain systems that control overt as well as covert gaze shifts. In further embodiments, the eye-tracking device may include eyewear (e.g., eyeglasses), audio components (e.g., speakers), visual components (e.g., an array of LED lights), and a central control unit. By way of example, the operator of the eye-tracking system may present audible and visual cues to a test subject, where the test subject's eye responses may be recorded and analyzed. Fast, accurate responses may indicate a state of health, while slow or inaccurate responses may indicate potential behavioral or neurological issues. In addition, the analytical tool may permit the experimenter to make a mark in the data stream to indicate a particular behavior to assist in later analyses, as described herein.
[0042] FIG. 1 illustrates an eye-tracking device 100 for gaze analysis according to one embodiment of the present disclosure. By way of example, the eye-tracking device 100 may include one or more optical lenses surrounded by a frame 120. Eye-tracking device 100 may be calibrated when eye-tracking device 100 is worn in a given position on the subject, as described herein. The eye-tracking device 100 may include one or more video cameras 110 mounted on frame 120 of eye-tracking device 100 to record a video feed of what the test subject is seeing (e.g., a real-world environment). By way of example, video camera 110 may record a video from the perspective of the subject at 30 to 60 frames per second, and may be a specialized camera or cameras that render the three-dimensional aspect of the scene. As will be appreciated, one or more video cameras 110 may be used to capture and record the real-world environment, and the one or more real-world environment videos may be stitched together to be viewed.
[0043] Additionally, the eye-tracking device 100 may also include one or more eye cameras 130, which may record and detect the pupil and corneal reflections of the test subject's eye or eyes. As illustrated in the example of FIG. 1, eye-tracking device 100 includes a first camera for the right eye and a second camera for the left eye. The eye camera 130 may also record the test subject's general eye area so that a doctor or medical personnel may observe the eye movement and behavior by reviewing the recorded eye camera feed. Both the environment-facing video camera 110 and the eye camera 130 may be mounted on the frame 120 via a ball, socket, swivel, rail or other mechanism to allow the adjustment and re-positioning of the video camera 110 and the eye camera 130. For example, a subject may adjust the wearable eye-tracking device 100 when it is worn, which may require recalibration or adjustment of the eye camera 130. Recordings of the eye or eyes may be at higher rates (e.g., 120 frames per second) and with a higher spatial resolution for resolving smaller changes in eye position and pupil size. As will be appreciated, one or more eye cameras 130 may be used to capture and record a given eye of the test subject, and the one or more eye videos may be stitched together to be viewed.
[0044] In some embodiments, the video feed from the video camera 110 and the eye camera 130 may be saved onto a storage device (not shown) of the eye-tracking device 100. In further embodiments, the video camera 110 and the eye camera 130 may be wirelessly synced to a storage device on a computing device, or to one or more other eye-tracking devices. By way of example, the video data, including metadata (e.g., time stamps, length of recording, etc.) from the video camera 110 and the eye camera 130 may be exported to the computing device. In other embodiments, the eye-tracking device 100 may include a USB cable, thus allowing the eye-tracking device 100 to be physically connected to a computing device.
[0045] A computing device may be programmed to include an analytical tool (e.g., a software application or instructions that may be executed by a processing device of the computing device) to automate the detection and analysis of gaze behaviors from the video feed provided by the video camera 110 and the eye camera 130, either offline after data collection from eye-tracking device 100, or online in near-real time. In embodiments, the computing device may be integrated into eye-tracking device 100. To accurately detect gaze behaviors and eye movement patterns, the analytical tool may be configured to detect faces from each frame extracted from the input video of the video camera 110. In embodiments, a neural network, such as, for example, Faster R-CNN, may be used to find and classify multiple objects in a single image. In one embodiment, the analytical tool may also include facial recognition software, otherwise herein referred to as a facial recognition tool. By way of example, the facial recognition tool may include a Viola-Jones object detection framework or a histogram of oriented gradients (HOG), which counts the occurrences of gradient orientations in each image. The Viola-Jones object detection framework may detect faces by recognizing patterns of light and dark regions characteristic of a facial region. The detected faces may be compared to Haar features, which are a series of black and white patterns that resemble light and dark areas of a face.
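By way of illustration only, the following is a minimal sketch (not part of the original disclosure) of how such a Viola-Jones face detection stage might be wired up with OpenCV's bundled Haar cascade; the cascade file name and tuning parameters are illustrative assumptions rather than requirements of the disclosed system.

```python
# Illustrative sketch: Viola-Jones face detection using OpenCV's bundled Haar cascade.
import cv2

# The frontal-face cascade ships with the opencv-python package.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

def detect_faces(frame_bgr):
    """Return (x, y, w, h) boxes for faces found in one frame of the world-view video."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors control the multi-scale sliding-window search
    # over the Haar features; the values here are common illustrative defaults.
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```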
[0046] Once a face is detected, the face may be classified and labeled using a classifier, such as, for example, an Error-Correcting Output Code (ECOC) classifier or Eigenfaces. The input for the classifier may be frames of the video feed at various intervals (e.g., 1 millisecond, 1 second, 10 seconds, etc.). The faces may be extracted and used to generate a face database. HOG may be applied to the face database to extract relevant information from the images. It should be appreciated that other face detection, object detection, and classification methods may be used.
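As a further illustration only, the sketch below (an assumption of one possible implementation, not the disclosure itself) shows how HOG descriptors might be extracted from detected face crops to populate such a face database, using the scikit-image hog function; the crop size and HOG parameters are arbitrary examples.

```python
# Illustrative sketch: HOG descriptors from detected face crops for the face database.
import cv2
from skimage.feature import hog

def face_to_hog(frame_bgr, box, size=(64, 64)):
    """Crop one detected face, resize it, and return its HOG feature vector."""
    x, y, w, h = box
    face = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    face = cv2.resize(face, size)
    # 9 orientation bins over 8x8-pixel cells is a common HOG configuration.
    return hog(face, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# The face database maps a face index (1 = leftmost face, 2 = next, ...) to descriptors.
face_database = {1: [], 2: []}  # e.g., face_database[1].append(face_to_hog(frame, box))
```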
[0047] Object detection input may include multiple images of an object at various scales and perspectives. In some embodiments, a video may be taken of an object at varying distances and angles. The individual frames of the video may be extracted for use as input to a neural network (e.g., Faster R-CNN). In order to improve training runtimes, the neural network may perform calculations on the GPU instead of the CPU. The facial and object detection and recognition are described in more detail with respect to FIG. 7.
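For illustration only, the sketch below shows one way a pretrained Faster R-CNN detector could be run on the GPU over extracted frames; it assumes the torchvision library (version 0.13 or later) with COCO-pretrained weights, and is not the specific network or training procedure of the disclosure. Fine-tuning on the custom object database of FIG. 7 would replace the box predictor with one sized for the new classes.

```python
# Illustrative sketch: GPU inference with a pretrained Faster R-CNN from torchvision.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.to(device).eval()

def detect_objects(frame_rgb, score_threshold=0.8):
    """Return boxes, labels, and scores for objects detected in one RGB video frame."""
    with torch.no_grad():
        pred = model([to_tensor(frame_rgb).to(device)])[0]
    keep = pred["scores"] > score_threshold
    return pred["boxes"][keep], pred["labels"][keep], pred["scores"][keep]
```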
[0048] The analytical tool may generate gaze data. Gaze data may include when a gaze fixates on a given stimulus (e.g., face, object, and other location in the real-world environment), how long the gaze is fixated on the given stimulus, the subject's response time to a trigger by a stimulus, where on the stimulus the gaze is fixated, and other information. Gaze data may be represented by a visualization, as described herein.
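By way of example only, one common way to derive such gaze data from raw eye-tracking samples is a dispersion-threshold (I-DT) pass that groups consecutive samples into fixations. The sketch below is an assumption of one such implementation; the 30-pixel dispersion and 100-millisecond duration thresholds are illustrative values, not parameters taken from the disclosure.

```python
# Illustrative sketch: dispersion-threshold (I-DT) fixation detection over gaze samples.
def detect_fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """samples: list of (t_seconds, x_px, y_px); returns (start, end, cx, cy) fixations."""
    fixations, start = [], 0
    while start < len(samples):
        end, window = start, [samples[start]]
        while end + 1 < len(samples):
            candidate = window + [samples[end + 1]]
            xs, ys = [p[1] for p in candidate], [p[2] for p in candidate]
            # Dispersion = horizontal extent + vertical extent of the candidate window.
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            window, end = candidate, end + 1
        duration = window[-1][0] - window[0][0]
        if duration >= min_duration:
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            start = end + 1
        else:
            start += 1
    return fixations
```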
[0049] Additionally, the analytical tool may even be used to detect specific facial sub-regions, such as the eyes, mouth, and nose. It should be appreciated that further granularity may be achieved using the analytical tool, such as what part of the eye, what part of the mouth, what part of the nose, etc., a subject's gaze is fixated on at a given moment in time. By doing so, the analytical tool may be able to take the video feed from the eye camera 130 and detect faces in the video feed to determine whether the test subject is focusing his or her gaze at particular regions of the face.
[0050] Furthermore, in additional embodiments, the analytical tool with the facial recognition analysis may be able to determine the duration of the test subject's fixed gaze. By determining the gaze fixation and its duration, an objective social and communicative behavioral analysis may be determined. For example, in the instance that the test subject predominantly fixes his or her gaze at the corner of the speaker's mouth for a prolonged period of time without making eye contact, this may indicate that the test subject is showing symptoms of autism or other forms of behavioral disorder. Since gaze behavior is a common target of behavioral therapy for children with autism, the eye-tracking device and analytical tools can be used to evaluate treatment progress.
[0051] In additional embodiments, the analytical tool may also detect objects in the video feed provided by the video camera 110. As a result, the analytical tool may be able to detect and distinguish among one or more faces and one or more non-human items. Additionally, by way of example only, the analytical tool may label the faces and objects.
[0052] In further embodiments, the eye-tracking device 100 may also include a speaker. The speaker may be used to create an audio trigger to assess the test subject's response. As such, the eye-tracking device 100 may be used to create audio and visual triggers to calculate latency, which can then be used to further assess social and communicative behavior of the test subject.
[0053] In other embodiments, the eye-tracking device 100 may be part of a virtual reality headset. The virtual reality headset may include software and hardware that provides the necessary audio and/or visual signals to be presented on a display (e.g., as part of a graphical user interface) and/or speakers. In addition, aspects of the virtual reality headset can be made gaze-dependent, so that the images the subject sees on the headset respond to the subject's eye and gaze behavior.
[0054] In some embodiments, the virtual reality headset may be configured to display real-world social interactive experiences, such as an audiovisual environment displaying a virtual room full of strangers interacting with each other. Because the virtual reality headset provides the test subject with a virtual interactive and immersive experience, the test subject may be presented with real world social and environmental experiences in a virtual environment. Thus, the analytical tool may be able to objectively analyze the test subject's behavior by analyzing the gaze behavior in response to the virtual social and environmental cues.
[0055] In other embodiments, the eye-tracking device 100 may be part of an augmented reality headset. The augmented reality headset allows the user to see people and objects that are actually present, but also may include software and hardware that allows for additional audio and visual signals to augment the user's experience of the real-world environment. For example, digital objects may be overlaid in the user's field of view of the real-world environment in real-time. Possible examples include having name labels appear for people or objects in the scene that are unknown, having virtual objects appear together with real objects or people, having visual cues replace audio cues for hearing-impaired subjects, and the like. In addition, aspects of the augmented reality headset can be made gaze-dependent, so that the augmentations the subject sees depend on the subject's eye and gaze behavior. For example, the subject may see an animation as a reward. The analytical tool may be able to objectively analyze the test subject's behavior by analyzing the gaze behavior in response to the composite cues made of real and augmented social and environmental cues.
[0056] In further embodiments, the eye-tracking device 100 may be incorporated into a testing room, where the testing room is set up as a system for monitoring gaze and response latency, as further illustrated in FIG. 2. As depicted, the system may include a control unit 205 and multiple stimuli, which are illustrated as gaze attractor units 210. The gaze attractor units 210 may each be a wireless, battery-operated, rechargeable device that provides audiovisual cues (e.g., lights up and makes a sound). By way of example, the gaze attractor units 210 may be placed throughout a room.
[0057] In further embodiments, a control unit 205 may be configured to remotely trigger the lights and sounds corresponding to each individual gaze attractor unit 210. The lights and sounds may be triggered to direct the test subject's gaze to specific parts of the room through the use of visual and auditory cues. Additionally, in some instances, the control unit 205 may send time stamps to the collected data stream each time a light or sound is triggered on a gaze attractor unit 210.
[0058] FIG. 3 illustrates an example world view as seen by a test subject wearing an eye-tracking device. As illustrated, the test subject is in a room sitting across from three other individuals. Additionally, the test room is set up with a control unit 305 and multiple gaze attractor units 310 placed throughout the room to monitor the range of eye movement and to assess the speed and accuracy of gaze shifts.
[0059] Additionally, the analytical tool of the eye-tracking system may be able to take the data stream provided by the eye-tracking device and detect that the test subject currently has his or her gaze fixated on a particular face of an individual, by means of various visualizations as described herein. The analytical tool may further identify that there are two other people within the test subject's line of sight and also further determine that the test subject's gaze is not fixated on them, as indicated by the red boxes.
[0060] FIG. 4 illustrates a calibration board 400 for calibrating the eye-tracking device according to one embodiment of the present disclosure. The calibration board 400 may calibrate the eye-tracking device in order to ensure its accuracy. The eye-tracking device may need to be initially calibrated as well as repeatedly re-calibrated during the test session in case the glasses or headset shifts position. The calibration may include calibration data indicating a given position in the location of the real-world environment based on a given movement of the eye captured by the eye camera when the eye-tracking device is on a given position on the subject. In one embodiment, the calibration board 400 may include a target grid, where each frame is equal in size and includes a target unit 402 in each one. By way of example, the calibration board 400 in FIG. 4 has 24 different frames with 24 individual targets 402 in each frame, each of which can be illuminated remotely. By way of further example, there may be any number of frames with any number of individual targets. In embodiments, calibration board 400 may be placed 0.5 m, 1 m, 5 m, 10 m, 25 m, etc. away from the subject for calibration.
[0061] To calibrate the eye-tracking device, the user may wear the eye-tracking device and may be instructed to focus his or her gaze at a specified target 402. With the analytical tool, it may be quickly determined whether the eye-tracking device is accurately monitoring and recognizing the movement of the eyes and the gaze. It should further be noted that the control unit and the gaze attractor units as depicted and described in FIGs. 2 and 3 may also be used to calibrate the wearable eye-tracking device. For example, instructing the test subject to gaze at specific or individual gaze attractor units may help validate the current calibration setting or identify areas of potential offset that need recalibration. In embodiments, far left, far right, far up, and far down gaze positions have lower reliability and tend to be more difficult to calibrate. The ability to assess calibration validity may also be used to determine whether the eye-tracking device needs to be re-shifted or re-positioned on the test subject's face.
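Purely as an illustrative sketch (and not the calibration procedure of the disclosure), a simple way to turn matched calibration points into a gaze mapping is to fit an affine transform from pupil-center coordinates to scene-camera coordinates by least squares; practical systems often use higher-order polynomial fits, and the two-dimensional affine model below is an assumption.

```python
# Illustrative sketch: least-squares affine map from pupil coordinates to scene coordinates.
import numpy as np

def fit_calibration(pupil_xy, scene_xy):
    """pupil_xy, scene_xy: matched (N, 2) point arrays; returns a 3x2 affine map."""
    pupil = np.asarray(pupil_xy, dtype=float)
    scene = np.asarray(scene_xy, dtype=float)
    A = np.hstack([pupil, np.ones((len(pupil), 1))])        # rows are [px, py, 1]
    coeffs, *_ = np.linalg.lstsq(A, scene, rcond=None)      # shape (3, 2)
    return coeffs

def map_gaze(coeffs, pupil_point):
    """Map one pupil-center measurement into scene-camera pixel coordinates."""
    px, py = pupil_point
    return np.array([px, py, 1.0]) @ coeffs
```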
[0062] FIG. 5 illustrates collected data analysis based on the data from an eye-tracking device according to one embodiment of the present disclosure. As illustrated, the analytical tool may be able to provide an organized data sheet providing the test subject's gaze and movement results. The analysis may indicate and identify what the test subject has gazed upon, such as identifying the number of faces and types of objects. For example, the analysis here indicates that the test subject gazed upon two different faces and two different objects at various different times during the test session. Additionally, the analysis may also indicate at what specific times a stimulus or trigger was introduced to assess the test subject's response and latency time. This can be performed offline or in near-real time, the latter of which may be used for training or therapeutic purposes.
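As a further example only, response latency can be computed by pairing the control unit's trigger time stamps with the fixation list produced earlier. The helper below is a sketch under the assumption that fixations follow the (start, end, cx, cy) format used above and that a hypothetical three-second response window applies; it reports the delay to the first fixation landing on the triggered target.

```python
# Illustrative sketch: latency from a trigger time stamp to the first fixation on the target.
def response_latency(trigger_time, fixations, target_box, timeout=3.0):
    """Return seconds from trigger to the first fixation inside target_box = (x0, y0, x1, y1),
    or None if no qualifying fixation begins within the timeout window."""
    x0, y0, x1, y1 = target_box
    for start, end, cx, cy in fixations:   # fixations assumed sorted by start time
        if start < trigger_time:
            continue                       # fixation began before the cue
        if start > trigger_time + timeout:
            break                          # past the response window
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return start - trigger_time
    return None
```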
[0063] FIG. 6 illustrates a graphical user interface 600 to showcase the video feed according to one embodiment of the present disclosure. The graphical user interface 600 may output the video feed collected from the video camera of the eye-tracking device. Additionally, the video feed may have been processed and analyzed with the analytical tool to generate a visualization, so that the video feed may be marked with the stimuli, such as objects, facial recognition boxes, and labels. It should be appreciated that other markings may be used to otherwise highlight a given stimulus. In embodiments, the markings may include dynamic markings on the gaze at a given moment in time throughout the video feed. The dynamic markings may include a dot, a reticle, and/or other markings. The markings, or visualization, may be superimposed on the video feed. In embodiments, the video feed may mark a stimulus with a box when the subject fixates his or her gaze on a given stimulus. For example, the visualization may turn the gaze marker and the box green to indicate that the gaze is fixated on the corresponding stimulus.
[0064] Thus, the video feed displayed on the graphical user interface 600 may allow a medical or clinical professional to review the analyzed video feed and manually assess the test subject's gaze behavior. By way of example only, the graphical user interface 600 may include viewing options that allow a medical or clinical professional to review the video feed in select frames or in a video format. Using the video feed, a professional may be able to assess a remote test subject's behavior.
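By way of illustration only, the overlay described above might be drawn with OpenCV as in the sketch below; the color choices (red for unattended stimuli, green when the gaze point falls inside a stimulus box) follow the example in the preceding paragraph, while the function and argument names are assumptions.

```python
# Illustrative sketch: superimposing the gaze marker and labeled stimulus boxes on a frame.
import cv2

def draw_overlay(frame_bgr, gaze_xy, stimuli):
    """stimuli: list of (label, (x, y, w, h)); draws boxes, labels, and the gaze marker."""
    gx, gy = int(gaze_xy[0]), int(gaze_xy[1])
    gaze_color = (0, 0, 255)                         # red until a fixated stimulus is found
    for label, (x, y, w, h) in stimuli:
        hit = x <= gx <= x + w and y <= gy <= y + h
        color = (0, 255, 0) if hit else (0, 0, 255)  # green when fixated, otherwise red
        if hit:
            gaze_color = (0, 255, 0)
        cv2.rectangle(frame_bgr, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame_bgr, label, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    cv2.circle(frame_bgr, (gx, gy), 6, gaze_color, -1)
    return frame_bgr
```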
[0065] FIG. 7 illustrates a flow diagram depicting various operations of method 700, in accordance with one embodiment of the present disclosure. The operations of the various methods described herein are not necessarily limited to the order described or shown in the figures, and it will be appreciated, upon studying the present disclosure, that variations of the order of the operations described herein are within the spirit and scope of the disclosure.
[0066] The operations and sub-operations of methods 700 and 800 may be carried out, in some cases, by one or more of the components, elements, devices, and circuitry, described herein and referenced with respect to at least FIGs. 1 and 9, as well as sub-components, elements, devices, and circuitry depicted therein and/or described with respect thereto. In such instances, the description of methods 700 and 800 may refer to a corresponding component, element, etc., but regardless of whether an explicit reference is made, it will be appreciated, upon studying the present disclosure, when the corresponding component, element, etc. may be used. Further, it will be appreciated that such references do not necessarily limit the described methods to the particular component, element, etc. referred to. Thus, it will be appreciated that aspects and features described above in connection with (sub-) components, elements, devices, circuitry, etc., including variations thereof, may be applied to the various operations described in connection with methods 700 and 800 without departing from the scope of the present disclosure.
[0067] Operations 702 through 708 may be used to train a database using the analytical tool to detect and recognize the one or more objects in a real-world environment. At operation 702, method 700 may include receiving frames of a video of an object. The video may have been taken of the object at different distances from the object, at different angles, and, in some examples, where the object is partially obscured. In embodiments, one or more videos may be used. At operation 704, method 700 may include defining one or more regions of interest. In some embodiments, the region of interest may be determined using machine learning tools to determine boundaries of object(s) in the video. In embodiments, the one or more regions of interest may be manually determined by examining the one or more frames to determine where the object is in the frame. At operation 706, method 700 may include generating a database of training images for the machine-learning object recognition. Using the one or more regions of interest, thousands of training images of the object may be generated from the videos for the machine-learning object recognition model. At operation 708, method 700 may include classifying the objects. For example, a given object may be classified and given a code that is translated to a corresponding label. A stuffed animal shark toy may be labeled "shark." The label may appear in the visualization of the gaze data that is superimposed on the video feed.
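For illustration only, operations 702 through 706 might be approximated by the frame-extraction and cropping sketch below; treating the region of interest as fixed across frames is a simplifying assumption (in practice it would be re-drawn or tracked as the object moves), and the file-naming scheme is hypothetical.

```python
# Illustrative sketch: building a training-image database from an object video and one ROI.
import os
import cv2

def build_training_images(video_path, roi, out_dir, label, step=5):
    """roi: (x, y, w, h) locating the object; saves every `step`-th cropped frame to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    x, y, w, h = roi
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            crop = frame[y:y + h, x:x + w]
            cv2.imwrite(os.path.join(out_dir, f"{label}_{saved:05d}.png"), crop)
            saved += 1
        index += 1
    cap.release()
    return saved
```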
[0068] Operations 710 through 712 may incorporate video of the eye into generating gaze data. At operation 710, method 700 may include receiving frames of videos of the real-world environment and of the eye(s). The video cameras may be mounted to the eye-tracking device. A first set of video cameras may capture the real-world environment. The second set of video cameras may capture the movement of the eye. In an example, a first camera may capture a first eye and a second camera may capture a second eye. At operation 712, method 700 may include sending raw video of the eye to operation 726. Continuing the example, the video of the first camera and the second camera may be sent to operation 726.
[0069] Operations 708 and 720 through 726 may be used to classify faces in the video of the real-world environment to generate gaze data. At operation 720, method 700 may include extracting the frames of the real-world environment to be analyzed by the analytical tool. The videos from the first set of video cameras may be stitched, pieced, cut together, etc. The individual frames may be extracted from the videos of the real-world environment. At operation 708, method 700 may include applying the analytical tool to a frame or a set of consecutive frames to detect faces in a real-world environment. For example, a facial detection algorithm, such as, for example, the Viola-Jones object detection algorithm with Haar features, may be able to detect whether one or more faces are in a given frame of the captured video of the real-world environment. The algorithm may pass an image (e.g., one grayscale video frame) through a series of matches to the Haar features. The Haar features may be a series of black and white patterns that resemble light and dark areas of an expected face. These patterns are expected in human facial features due to lighting and the general contours of a human face. These features may seek possible matches in the current frame and locate all the subframes which resemble the pattern.
[0070] At operation 722, method 700 may include classifying the faces in a frame or a set of consecutive frames. At operation 724, method 700 may include detecting the faces in a frame or a set of frames. The training faces may be generated directly from the input video itself. For example, in the first fifty frames of the video, which is approximately 2 seconds, the relative horizontal positions of faces may remain unchanged. Upon detection, as described above, the faces may be labeled based on their horizontal coordinates. The leftmost face in the frame may be assigned with index 1, and the subsequent face may be assigned with index 2. The faces may be extracted with their corresponding index information and put into a customized face database.
[0071] HOG features from the training images in the face database may be extracted for classification. The HOG features count the occurrences of gradient orientation in each image. An unsupervised binary classifier may be run on each HOG feature to generate a binary codeword of each face class for the ECOC model. Each class may be assigned a unique binary string of length 15. The string may also be called a codeword. For example, class 2 has the codeword 100100011110101. During training, one binary classifier may be learned for each column. For instance, for the first column, a binary classifier may be built to separate {0, 2, 4, 6, 8} from {1, 3, 5, 7, 9}. Each newly detected face may generate a codeword based on its HOG feature vectors. The classifier may choose the class whose codeword is closest to the codeword of the input image as the predicted label for the newly detected face.
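The decoding step of such an ECOC scheme might look like the sketch below, offered as an assumption of one possible implementation: each of the 15 binary classifiers contributes one bit, and the predicted class is the one whose codeword is nearest in Hamming distance. Only the class-2 codeword is taken from the description above; the other entry and the classifier interface (a scikit-learn-style predict method) are hypothetical.

```python
# Illustrative sketch: ECOC decoding of a new face from 15 binary classifier outputs.
codebook = {                      # class index -> 15-bit codeword (class 1 is a placeholder)
    1: "010011100001010",
    2: "100100011110101",
}

def hamming(a, b):
    """Number of differing bits between two equal-length codeword strings."""
    return sum(ca != cb for ca, cb in zip(a, b))

def predict_class(hog_vector, binary_classifiers, book=codebook):
    """binary_classifiers: 15 models whose predict() returns 0 or 1 for one HOG vector."""
    bits = "".join(str(int(clf.predict([hog_vector])[0])) for clf in binary_classifiers)
    return min(book, key=lambda label: hamming(book[label], bits))
```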
[0072] At operation 726, method 700 may include generating gaze data, as described above. The gaze data may already include or be derived from calibration data. As a result, the number of times a subject fixates on a stimulus, the duration of each fixation, where the fixation is located on the stimulus, and other information, as described above, are recorded. The gaze data may be further analyzed and processed to generate visualizations, which may include dots, reticles, or other markers representing a subject's gaze at a current moment, as well as boxes, highlights, or other markers representing a stimulus in the real-world environment.
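As one more illustrative example (an assumed aggregation step, not a required one), the fixation list and the per-frame detections can be combined to tally, for each labeled stimulus, how many fixations landed on it and for how long; detections_at is a hypothetical lookup returning the labeled boxes valid at a given time.

```python
# Illustrative sketch: per-stimulus fixation counts and total dwell time from gaze data.
from collections import defaultdict

def summarize_gaze(fixations, detections_at):
    """fixations: (start, end, cx, cy) tuples; detections_at(t) -> [(label, (x, y, w, h)), ...]."""
    summary = defaultdict(lambda: {"fixation_count": 0, "total_seconds": 0.0})
    for start, end, cx, cy in fixations:
        for label, (x, y, w, h) in detections_at(start):
            if x <= cx <= x + w and y <= cy <= y + h:
                summary[label]["fixation_count"] += 1
                summary[label]["total_seconds"] += end - start
                break                      # attribute the fixation to one stimulus only
    return dict(summary)
```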
[0073] FIG. 8 illustrates a flow diagram depicting various operations of method 800, and accompanying embodiments for analyzing behavior of a human subject, in accordance with aspects of the present disclosure. The operations of the various methods described herein are not necessarily limited to the order described or shown in the figures, and it will be appreciated, upon studying the present disclosure, that variations of the order of the operations described herein are within the spirit and scope of the disclosure.
[0074] At operation 802, method 800 includes obtaining video of a real-world environment. As described above, the video of the real-world environment may be captured by a first set of video cameras. It will be appreciated that the video may be captured at one or more resolutions. The video of the real-world environment may include one or more stimuli, such as one or more persons, objects, and audio-visual cues.
[0075] At operation 804, method 800 includes obtaining video of an eye. As described above, the video of the eye may be captured by a second set of video cameras, where a first camera may capture a first eye and a second camera may capture a second eye. Moreover, the second set of video cameras may capture movement of both eyes. As described above, the resolution and framerate of the second set of video cameras may be higher to better capture subtle changes in the subject's gaze.
[0076] At operation 806, method 800 includes using at least the video of the real-world environment and the video of the eye to generate gaze data. In embodiments, calibration data may also be included to generate gaze data. Based on a given position of the eye-tracking device on a subject, the eye-tracking device may generate calibration data based on correlating eye movement with a location in the real-world environment. This is generated using the video of the real-world environment and the video of the eye to determine where the subject's gaze is fixated in the real-world environment. Gaze data may include the number of times a subject fixates on a stimulus, the duration of each fixation, where on the stimulus each fixation is located, and other information, as described above.
[0077] At operation 808, method 800 includes generating a visualization. The visualization, as described above, may be of the gaze data, and may be superimposed on the video feed. In embodiments, the gaze may be marked with a given visualization and the one or more stimuli may be marked with another visualization. When the gaze fixates on one of the one or more stimuli, the gaze and the corresponding stimulus visualizations may change. For example, when a subject's gaze fixates on a person's eyes, a red box surrounding the person's eyes may turn green and the red dot corresponding to the subject's gaze may also turn green. It should be appreciated that other visualizations may be used.
[0078] At operation 810, method 800 includes displaying the visualization on a graphical user interface.
[0079] FIG. 9 illustrates an exemplary computing module that may be used to implement any of the embodiments disclosed herein. As used herein, the term "system" might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the present invention. As used herein, a system might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a system. In implementation, the various systems described herein might be implemented as discrete systems or the functions and features described can be shared in part or in total among one or more systems. In other words, it should be appreciated that after reading this description, the various features and functionality described herein may be implemented in any given application and can be implemented in one or more separate or shared systems in various combinations and permutations. Even though various features or elements of functionality may be individually described or claimed as separate systems, it should be appreciated that these features and functionality can be shared among one or more common software and hardware elements, and such description shall not require or imply that separate hardware or software components are used to implement such features or functionality.
[0080] While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the invention, which is done to aid in understanding the features and functionality that can be included in the invention. The invention is not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations. Indeed, it will be appreciated how alternative functional, logical or physical partitioning and configurations can be implemented to implement the desired features of the present invention. Also, a multitude of different constituent system names other than those depicted herein can be applied to the various partitions. Additionally, with regard to flow diagrams, operational descriptions and method claims, the order in which the steps are presented herein shall not mandate that various embodiments be implemented to perform the recited functionality in the same order unless the context dictates otherwise.
[0081] Where components or systems of the invention are implemented in whole or in part using software, in one embodiment, these software elements can be implemented to operate with a computing or processing system capable of carrying out the functionality described with respect thereto. One such example computing system is shown in FIG. 9. Various embodiments are described in terms of this example computing system 900. After reading this description, it will be appreciated how to implement the invention using other computing systems or architectures.
[0082] Referring now to FIG. 9, computing system 900 may represent, for example, computing or processing capabilities found within desktop, laptop and notebook computers; hand-held computing devices (PDA's, tablets, smart phones, cell phones, palmtops, etc.); mainframes, supercomputers, workstations or servers; or any other type of special-purpose or general-purpose computing devices as may be desirable or appropriate for a given application or environment. Computing system 900 might also represent computing capabilities embedded within or otherwise available to a given device. For example, a computing system might be found in other electronic devices such as, for example, digital cameras, navigation systems, cellular telephones, portable computing devices, modems, routers, WAPs, terminals and other electronic devices that might include some form of processing capability.
[0083] Computing system 900 might include, for example, one or more processors, controllers, control systems, or other processing devices, such as a processor 904. Processor 904 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic. In the example illustrated in FIG. 9, processor 904 is connected to a bus 902, although any communication medium can be used to facilitate interaction with other components of computing system 900 or to communicate externally.
[0084] Computing system 900 might also include one or more memory systems, simply referred to herein as main memory 908. For example, random access memory (RAM) or other dynamic memory might preferably be used for storing information and instructions to be executed by processor 904. Main memory 908 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Computing system 900 might likewise include a read only memory ("ROM") or other static storage device coupled to bus 902 for storing static information and instructions for processor 904.
[0085] The computing system 900 might also include one or more various forms of information storage mechanism 910, which might include, for example, a media drive 912 and a storage unit interface 920. The media drive 912 might include a drive or other mechanism to support fixed or removable storage media 914. For example, a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive (R or RW), or other removable or fixed media drive might be provided. Accordingly, storage media 914 might include, for example, a hard disk, a floppy disk, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to or accessed by media drive 912. As these examples illustrate, the storage media 914 can include a computer usable storage medium having stored therein computer software or data.
[0086] In alternative embodiments, information storage mechanism 910 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing system 900. Such instrumentalities might include, for example, a fixed or removable storage unit 922 and an interface 920. Examples of such storage units 922 and interfaces 920 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory system) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 922 and interfaces 920 that allow software and data to be transferred from the storage unit 922 to computing system 900.
[0087] Computing system 900 might also include a communications interface 924. Communications interface 924 might be used to allow software and data to be transferred between computing system 900 and external devices. Examples of communications interface 924 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX or other interface), a communications port (such as for example, a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or other communications interface. Software and data transferred via communications interface 924 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical) or other signals capable of being exchanged by a given communications interface 924. These signals might be provided to communications interface 924 via a channel 928. This channel 928 might carry signals and might be implemented using a wired or wireless communication medium. Some examples of a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.
[0088] In this document, the terms "computer program medium" and "computer usable medium" are used to generally refer to media such as, for example, memory 908, storage unit 920, media 914, and signals on channel 928. These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution. Such instructions, embodied on the medium, are generally referred to as "computer program code" or a "computer program product" (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing system 900 to perform features or functions of the present invention as discussed herein.
[0089] While various embodiments of the disclosed technology have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosed technology, which is done to aid in understanding the features and functionality that can be included in the disclosed technology. The disclosed technology is not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations. Indeed, it will be appreciated how alternative functional, logical or physical partitioning and configurations can be implemented to implement the desired features of the technology disclosed herein. Also, a multitude of different constituent module names other than those depicted herein can be applied to the various partitions. Additionally, with regard to flow diagrams, operational descriptions and method claims, the order in which the steps are presented herein shall not mandate that various embodiments be implemented to perform the recited functionality in the same order unless the context dictates otherwise.
[0090] Although the disclosed technology is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead can be applied, alone or in various combinations, to one or more of the other embodiments of the disclosed technology, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the technology disclosed herein should not be limited by any of the above-described exemplary embodiments.
[0091] Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term "including" should be read as meaning "including, without limitation" or the like; the term "example" is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; the terms "a" or "an" should be read as meaning "at least one," "one or more" or the like; and adjectives such as "conventional," "traditional," "normal," "standard," "known" and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. Likewise, where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future.
[0092] The presence of broadening words and phrases such as "one or more," "at least," "but not limited to" or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent. The use of the term "module" does not imply that the components or functionality described or claimed as part of the module are all configured in a common package. Indeed, any or all of the various components of a module, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.
[0093] Additionally, the various embodiments set forth herein are described in terms of exemplary block diagrams, flow charts and other illustrations. As will be appreciated after reading this document, the illustrated embodiments and their various alternatives can be implemented without confinement to the illustrated examples. For example, block diagrams and their accompanying description should not be construed as mandating a particular architecture or configuration.

Claims

What is claimed is:
1. A computer-implemented method of analyzing behavior of a human subject, comprising:
obtaining video of a real-world environment, wherein the real-world
environment comprises a stimulus;
obtaining video of at least one eye captured at a same time as the video of the real-world environment;
using at least the video of the real-world environment and the video of the eye to generate gaze data.
2. The computer-implemented method of claim 1, further comprising:
obtaining calibration data to track gaze of the real-world environment; and using at least the calibration data to generate gaze data.
3. The computer-implemented method of claim 1, further comprising:
generating a visualization of the gaze data to be superimposed on the video of the real-world environment, such that when a gaze at a given moment in time is fixated on the stimulus, the stimulus is highlighted; and
displaying the visualization on a graphical user interface.
4. The computer-implemented method of claim 1, wherein the stimulus comprises one or more of a person, an object, or an audio-visual cue.
5. The computer-implemented method of claim 1, wherein the gaze data comprises an amount of time spent looking at a given location in the environment.
6. The computer-implemented method of claim 5, wherein the gaze data further comprises an amount of time taken to respond to a stimulus.
7. The computer-implemented method of claim 2, wherein generating a visualization of the gaze data comprises:
recognizing the stimulus in the video; and
highlighting the stimulus when a gaze at a given moment in time is fixated on the stimulus.
8. The computer-implemented method of claim 2, wherein a specialist uses the gaze data and the visualization to evaluate the subject.
9. A system for analyzing behavior of a human subject, the system comprising: an eye-tracking device comprising:
a first camera to capture video of a real-world environment; a second camera to capture video of an eye;
a plurality of units, each of the plurality of units to generate an auditory or visual cue; and
a control unit communicatively coupled to each of the plurality of units, wherein the control unit is to trigger each of the plurality of units to generate a respective auditory or visual cue while the first and second video cameras capture video.
10. The system of claim 9, further comprising: a non-transitory computer-readable medium and a processor, wherein the non-transitory computer-readable medium is operatively coupled to the processor, and wherein it stores instructions that, when executed by the processor, performs operations of:
receiving the video of the real-world environment, wherein the real- world environment comprises a stimulus;
receiving the video of the eye captured at a same time as the video of the real-world environment;
using at least the video of the real-world environment and the video of the eye to generate gaze data; generating a visualization of the gaze data to be superimposed on the video of the real-world environment, such that when a gaze at a given moment in time is fixated on a stimulus, the stimulus is highlighted; and
displaying the visualization on a graphical user interface.
11. The system of claim 10, wherein the non-transitory computer-readable medium further stores instructions that, when executed by the processor, further performs operations of:
obtaining calibration data to track gaze of the real-world environment; and using at least the calibration data to generate gaze data.
12. The system of claim 10, wherein the stimulus comprises one or more of a person, an object, or an auditory or visual cue generated by one of the plurality of units.
13. The system of claim 10, wherein the gaze data comprises:
an amount of time spent looking at a given location in the environment; and an amount of time taken to respond to a stimulus.
14. The system of claim 10, wherein the control unit is to send a time stamp each time it triggers one of the plurality of units to generate an auditory or visual cue.
15. A computer-implemented method of analyzing behavior of a human subject, comprising:
receiving video of a real-world environment, wherein the real-world
environment comprises a stimulus;
receiving video of at least one eye captured at a same time as the video of the real-world environment;
using at least the video of the real-world environment and the video of the eye to generate gaze data; generating a visualization of the gaze data to be superimposed on the video of the real-world environment, such that when a gaze at a given moment in time is fixated on the stimulus, the stimulus is highlighted; and
displaying the visualization on a graphical user interface.
16. The computer-implemented method of claim 15, further comprising:
receiving calibration data to track gaze of the real-world environment; and using at least the calibration data to generate gaze data.
17. The computer-implemented method of claim 15, wherein the stimulus comprises one or more of a person, an object, or an audio-visual cue.
18. The computer-implemented method of claim 15, wherein the gaze data comprises: an amount of time spent looking at a given location in the environment; and
an amount of time taken to respond to a stimulus.
19. The computer-implemented method of claim 15, wherein generating a visualization of the gaze data comprises:
recognizing the stimulus in the video; and
highlighting the stimulus when the gaze is fixated on the stimulus.
20. The computer-implemented method of claim 15, wherein a specialist uses the gaze data and the visualization to evaluate the subject.
PCT/US2018/042770 2017-07-18 2018-07-18 Systems and methods for analyzing behavior of a human subject WO2019018577A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762534163P 2017-07-18 2017-07-18
US62/534,163 2017-07-18

Publications (1)

Publication Number Publication Date
WO2019018577A1 true WO2019018577A1 (en) 2019-01-24

Family

ID=65016408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/042770 WO2019018577A1 (en) 2017-07-18 2018-07-18 Systems and methods for analyzing behavior of a human subject

Country Status (1)

Country Link
WO (1) WO2019018577A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120316793A1 (en) * 2007-12-13 2012-12-13 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Methods and systems for indicating behavior in a population cohort
US20170020388A1 (en) * 2013-03-15 2017-01-26 Neuro Kinetics, Inc. Head mounted compact goggle based video oculography system with integral stimulus screen
US20160117945A1 (en) * 2014-10-24 2016-04-28 Ti Training Corp. Use of force training system implementing eye movement tracking

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562653A (en) * 2023-06-28 2023-08-08 广东电网有限责任公司 Distributed energy station area line loss monitoring method and system
CN116562653B (en) * 2023-06-28 2023-11-28 广东电网有限责任公司 Distributed energy station area line loss monitoring method and system


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18835942

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18835942

Country of ref document: EP

Kind code of ref document: A1