FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality
Eye tracking is a key technology for gaze-based interactions in Extended Reality (XR), but traditional frame-based systems struggle to meet XR's demands for high accuracy, low latency, and power efficiency. Event cameras offer a promising alternative due to their high temporal resolution and low power consumption. In this paper, we present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data and is optimized for real-time XR applications. The ellipse output can be used directly in subsequent ellipse-based pupil trackers. To train the model, we enhance the EV-Eye dataset by expanding the annotated data and converting the original mask labels to ellipse-based annotations. In addition, we adopt a novel trigonometric loss to address angle discontinuities and propose a fast causal event volume representation. On the enhanced EV-Eye test set, FACET achieves an average pupil center error of 0.20 pixels and an inference time of 0.53 ms, reducing pixel error and inference time by 1.6× and 1.8× compared to the prior art, EV-Eye, with 4.4× and 11.7× fewer parameters and arithmetic operations. The code is available at https://github.com/DeanJY/FACET.
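The angle discontinuity mentioned above can be illustrated with a small sketch. An ellipse rotated by θ and by θ + π is the same ellipse, so comparing raw angles penalizes predictions that are geometrically almost correct. One common remedy, shown here as an assumption and not necessarily the exact loss used in FACET, is to compare the angles through their (sin 2θ, cos 2θ) encoding. The function names `trig_angle_loss` and `naive_angle_loss` are hypothetical.

```python
import math


def trig_angle_loss(theta_pred: float, theta_true: float) -> float:
    """Squared distance between the (sin 2θ, cos 2θ) encodings of two angles.

    Illustrative only: the doubled angle makes the loss periodic in π,
    which matches the rotational symmetry of an ellipse.
    """
    ds = math.sin(2 * theta_pred) - math.sin(2 * theta_true)
    dc = math.cos(2 * theta_pred) - math.cos(2 * theta_true)
    return ds * ds + dc * dc


def naive_angle_loss(theta_pred: float, theta_true: float) -> float:
    """Plain squared error on raw angles, shown for comparison."""
    return (theta_pred - theta_true) ** 2


if __name__ == "__main__":
    # Two orientations just on either side of the 0 / π wrap-around
    # describe nearly the same ellipse.
    a, b = 0.01, math.pi - 0.01
    print("naive:", naive_angle_loss(a, b))
    print("trig: ", trig_angle_loss(a, b))
```

In this sketch the naive loss is large for the two nearly identical orientations, while the trigonometric form stays close to zero and reaches its maximum only when the ellipses are perpendicular.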
arxiv.org