An eye-controlled mouse is a type of assistive technology that allows users to control a computer mouse using their eye movements. This technology is particularly useful for people with disabilities who may have difficulty using a traditional mouse, as well as for anyone looking for an alternative way to interact with their computer.
To use an eye-controlled mouse, a camera is typically mounted above the computer screen to track the user's eye movements. This information is then used to control the movement of the mouse pointer on the screen. Some eye-controlled mouse systems can also perform clicks and other actions by detecting changes in the user's eye movements, such as blinks or winks.
There are various commercially available eye-controlled mouse systems, as well as open-source software that can be used to build a custom solution. The accuracy and ease of use of these systems vary, but with advances in computer vision and machine learning, eye-controlled mouse technology has become increasingly accessible and reliable.
Source code
Refer to this GitHub repo for further information on my project.
Line-by-Line Code Explanation
Python modules used
"cv2" is a Python library used for computer vision and image processing. It is a wrapper around the popular open-source computer vision library, OpenCV. The cv2 library provides functions for reading, writing, and manipulating images, as well as tools for object detection, face recognition, and more. It is widely used in computer vision applications due to its comprehensive and performance-oriented features.
Mediapipe is a cross-platform, open-source framework for building multimedia processing pipelines. It provides a set of tools and libraries for developing computer vision and audio processing applications on various platforms, including mobile devices and desktop computers. Mediapipe supports the creation of custom pipelines that can be optimized for specific use cases, making it a popular choice for developers working in fields such as augmented reality, robotics, and video analysis. The mp used throughout this code is simply the alias created by the import statement, import mediapipe as mp.
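Here is a minimal sketch of how this project uses Mediapipe's FaceMesh solution: hand it an RGB frame and read back normalized landmark coordinates. It assumes a webcam is available at index 0 and that a face is in view of the camera.

import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

cam = cv2.VideoCapture(0)
ok, frame = cam.read()
if ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # FaceMesh expects RGB input
    result = face_mesh.process(rgb)
    if result.multi_face_landmarks:
        first = result.multi_face_landmarks[0].landmark[0]
        # Landmark coordinates are normalized to the [0, 1] range.
        print(f'first landmark: x={first.x:.3f}, y={first.y:.3f}')
cam.release()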
PyAutoGUI is a Python library for automating GUI interactions. It provides a simple and convenient way to interact with graphical user interfaces (GUIs) on your computer, allowing you to simulate mouse clicks, keyboard presses, and other forms of input. With PyAutoGUI, you can write scripts that can automate repetitive tasks, perform image recognition, and even control other software applications. Some common use cases for PyAutoGUI include automating tasks in productivity applications, performing image-based testing, and automating game playing. The library works with all major operating systems and provides a high-level API for controlling GUI interactions, making it a useful tool for both developers and testers.
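The three PyAutoGUI calls this project relies on are size(), moveTo(), and click(). Here is a minimal sketch of them in isolation; note that running it will actually move your cursor and click, and the centre-of-screen coordinates are just an example.

import pyautogui

screen_w, screen_h = pyautogui.size()            # screen resolution in pixels
print(screen_w, screen_h)

pyautogui.moveTo(screen_w // 2, screen_h // 2)   # move the cursor to the centre of the screen
pyautogui.click()                                # click at the current cursor position

With the three libraries introduced, the project starts by importing and initializing them: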
import cv2
import mediapipe as mp
import pyautogui
cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
screen_w, screen_h = pyautogui.size()
cam = cv2.VideoCapture(0): This line initializes a video capture object using OpenCV's cv2 library. The 0 argument passed to VideoCapture specifies that the default camera should be used as the source of the video feed. The video capture object is assigned to the variable cam.
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True): This line initializes a face mesh object from the Mediapipe (mp) library's face_mesh module. The refine_landmarks argument is set to True, which means the face mesh will produce refined facial landmarks, including landmarks around the irises, for more accurate eye tracking. The face mesh object is assigned to the variable face_mesh.
screen_w, screen_h = pyautogui.size(): This line uses PyAutoGUI to get the dimensions of the computer's screen and assigns them to the variables screen_w and screen_h. The size function returns the width and height of the screen as a tuple, which is unpacked into the two variables.
These three lines of code initialize objects that can be used to capture video from a camera, track facial landmarks in the video feed, and get the dimensions of the computer screen, respectively. The specifics of how these objects are used depend on the context in which they are implemented, but they are commonly used in computer vision and multimedia processing applications.
Now I'll explain the rest of the code in a slightly different way: the full loop first, then a step-by-step breakdown.
while True:
    _, frame = cam.read()
    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = face_mesh.process(rgb_frame)
    landmark_points = output.multi_face_landmarks
    frame_h, frame_w, _ = frame.shape
    if landmark_points:
        landmarks = landmark_points[0].landmark
        for id, landmark in enumerate(landmarks[474:478]):
            x = int(landmark.x * frame_w)
            y = int(landmark.y * frame_h)
            cv2.circle(frame, (x, y), 3, (0, 255, 0))
            if id == 1:
                screen_x = screen_w * landmark.x
                screen_y = screen_h * landmark.y
                pyautogui.moveTo(screen_x, screen_y)
        left = [landmarks[145], landmarks[159]]
        for landmark in left:
            x = int(landmark.x * frame_w)
            y = int(landmark.y * frame_h)
            cv2.circle(frame, (x, y), 3, (0, 255, 255))
        if (left[0].y - left[1].y) < 0.004:
            pyautogui.click()
            pyautogui.sleep(1)
    cv2.imshow('Eye Controlled Mouse', frame)
    cv2.waitKey(1)
The code uses a while loop to repeatedly process frames from the video feed captured by the camera. The video capture object was created and initialized earlier in the code with cam = cv2.VideoCapture(0).
Each iteration of the loop performs the following actions:
1. _, frame = cam.read(): The read method of the video capture object retrieves the next frame from the video feed. The _ is a placeholder variable used to ignore the boolean return value that indicates whether the frame was successfully retrieved.
2. frame = cv2.flip(frame, 1): The flip function from OpenCV's cv2 library flips the video frame horizontally, so the frame behaves like a mirror and the cursor moves in the same direction as the user's eyes.
3. rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB): The cvtColor function from OpenCV's cv2 library converts the video frame from the BGR color format used by OpenCV to the RGB color format used by Mediapipe.
4. output = face_mesh.process(rgb_frame): The process method of the face mesh object created earlier in the code processes the video frame and detects facial landmarks. The result of this processing is stored in the output variable.
5. landmark_points = output.multi_face_landmarks: This line extracts the landmarks of any faces detected in the video frame from the output variable and stores them in the landmark_points variable.
6. if landmark_points:: This line checks whether any facial landmarks were detected in the frame. If landmarks were detected, the code inside the if block is executed.
The code inside the if block performs the following actions:
a. Loops over four of the iris landmarks of one eye (indices 474 to 477, available because refine_landmarks=True) and draws circles around them in the video frame using the cv2.circle function from OpenCV's cv2 library.
b. Calculates the position of the mouse cursor from the normalized position of one of those iris landmarks and uses the pyautogui.moveTo method from the PyAutoGUI library to move the mouse cursor to the corresponding point on the screen (a small sketch of this mapping follows this breakdown).
c. Loops over the two landmarks on the upper and lower eyelid of the left eye (indices 145 and 159) and draws circles around them in the video frame using the cv2.circle function.
d. Checks the vertical distance between those two eyelid landmarks and, if it is below a small threshold (meaning the eye is closed, i.e. a blink), uses the pyautogui.click method from the PyAutoGUI library to simulate a mouse click. A pyautogui.sleep call then pauses the code for 1 second so a single blink does not trigger a burst of clicks.
7. cv2.imshow('Eye Controlled Mouse', frame): This line displays the processed video frame in a window named "Eye Controlled Mouse" using the cv2.imshow function from OpenCV's cv2 library.
8. cv2.waitKey(1): This line uses the cv2.waitKey function from OpenCV's cv2 library to wait one millisecond for a key event, which also gives the window a chance to refresh before the next frame is processed.
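To make the cursor mapping and the blink check easier to see on their own, here is a minimal, self-contained sketch of those two calculations pulled out of the loop. The helper names landmark_to_screen and is_blink and the Landmark stand-in are my own for illustration; the normalized coordinates and the 0.004 threshold come from the code above.

from collections import namedtuple
import pyautogui

# Stand-in for a MediaPipe landmark, which exposes normalized x/y fields.
Landmark = namedtuple('Landmark', ['x', 'y'])

def landmark_to_screen(landmark, screen_w, screen_h):
    # Landmarks are normalized to [0, 1] relative to the frame, so scaling
    # them by the screen size maps them onto screen pixels.
    return screen_w * landmark.x, screen_h * landmark.y

def is_blink(lower_lid, upper_lid, threshold=0.004):
    # When the eye closes, the two eyelid landmarks move together and
    # their vertical distance drops below the threshold.
    return (lower_lid.y - upper_lid.y) < threshold

screen_w, screen_h = pyautogui.size()
iris = Landmark(x=0.52, y=0.47)                              # example normalized position
print(landmark_to_screen(iris, screen_w, screen_h))
print(is_blink(Landmark(0.5, 0.402), Landmark(0.5, 0.400)))  # True: eye nearly closed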
Difficulties Faced
Once these snippets were put together into a program, I took it for testing. Everything worked fine, but I noticed a slight delay between my eye movements and the response on screen.
Improving the performance of the code would likely require a combination of several strategies, since the code can be slow for a variety of reasons. Here are a few ideas I brainstormed with a few friends from the tech Twitter community that might help speed things up:
Reduce the frame rate: You can reduce the frame rate that is captured from the webcam by increasing the wait time in the cv2.waitKey() call. For example, instead of cv2.waitKey(1), you could use cv2.waitKey(10) to slow down the frame rate.
Resize the frame: If the frame size is too large, the processing becomes slow. You can resize the frame with cv2.resize() to make it smaller, which should speed up the processing (see the sketch after this list).
Optimize the processing: You can reduce the number of operations performed on each frame. For example, you might be able to eliminate the cv2.flip() operation, or reduce the number of circles drawn on the frame.
Use multi-threading: You can use multi-threading to perform the processing in parallel, which should speed things up. However, this can be a complex task and might require a significant amount of programming effort.
Use hardware acceleration: If your computer has a GPU, you can try to use hardware acceleration. Some libraries, such as OpenCV, have GPU-accelerated functions that can perform certain operations much faster than the equivalent CPU-based functions.
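Not all of these ideas made it into my final version, but to make the first two concrete, here is a minimal sketch that shrinks each frame before processing and only runs the face mesh on every other frame (frame skipping, a variant of the lower-frame-rate idea). The SCALE and PROCESS_EVERY_N_FRAMES values are arbitrary choices for illustration, not something I tuned.

import cv2
import mediapipe as mp

cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

PROCESS_EVERY_N_FRAMES = 2   # skip every other frame (illustrative value)
SCALE = 0.5                  # shrink frames to half size (illustrative value)
frame_count = 0

while True:
    ret, frame = cam.read()
    if not ret:
        break
    frame_count += 1
    # Only run the (expensive) face-mesh step on every Nth frame.
    if frame_count % PROCESS_EVERY_N_FRAMES == 0:
        small = cv2.resize(frame, None, fx=SCALE, fy=SCALE)
        rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
        output = face_mesh.process(rgb)
        # ...landmark handling would go here, exactly as before...
    cv2.imshow('Eye Controlled Mouse', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()

The display still shows the full-resolution frame; only the copy fed to FaceMesh is downscaled, and since the landmark coordinates are normalized, the cursor mapping is unaffected.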
After spending a night on it and finishing about 6 cups of coffee, I came up with a modified version:
import cv2
import mediapipe as mp
import pyautogui

screen_w, screen_h = pyautogui.size()
cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

def find_landmarks_and_click(frame, landmarks, frame_w, frame_h):
    # Move the cursor based on the iris landmarks.
    for id, landmark in enumerate(landmarks[474:478]):
        x = int(landmark.x * frame_w)
        y = int(landmark.y * frame_h)
        cv2.circle(frame, (x, y), 3, (0, 255, 0))
        if id == 1:
            screen_x = screen_w * landmark.x
            screen_y = screen_h * landmark.y
            pyautogui.moveTo(screen_x, screen_y)
    # Click when the eyelid landmarks come close together (a blink).
    left = [landmarks[145], landmarks[159]]
    for landmark in left:
        x = int(landmark.x * frame_w)
        y = int(landmark.y * frame_h)
        cv2.circle(frame, (x, y), 3, (0, 255, 255))
    if (left[0].y - left[1].y) < 0.004:
        pyautogui.click()
        pyautogui.sleep(1)

while True:
    ret, frame = cam.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = face_mesh.process(rgb_frame)
    landmark_points = output.multi_face_landmarks
    if landmark_points:
        landmarks = landmark_points[0].landmark
        frame_h, frame_w, _ = frame.shape
        find_landmarks_and_click(frame, landmarks, frame_w, frame_h)
    cv2.imshow('Eye Controlled Mouse', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()
To stop the code, press the "q" key while the "Eye Controlled Mouse" window has focus. The loop calls cv2.waitKey(1), which waits briefly for a keyboard event; if the returned key code (masked with 0xFF) is equal to ord('q'), the code breaks out of the while loop, releases the camera, and closes the window.
Testing
For testing such a small project, I decided to use the unittest framework in Python.
Generating unit tests for this code would require a comprehensive testing setup that can simulate a camera feed, face detection and landmark detection, and mouse movements and clicks.
Here is an example of a basic unit test using the unittest framework in Python:
import unittest
import cv2
import mediapipe as mp
import pyautogui

class TestEyeControlledMouse(unittest.TestCase):
    def setUp(self):
        self.cam = cv2.VideoCapture(0)
        self.face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
        self.screen_w, self.screen_h = pyautogui.size()

    def test_mouse_movement(self):
        _, frame = self.cam.read()
        frame = cv2.flip(frame, 1)
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        output = self.face_mesh.process(rgb_frame)
        landmark_points = output.multi_face_landmarks
        frame_h, frame_w, _ = frame.shape
        if landmark_points:
            landmarks = landmark_points[0].landmark
            for id, landmark in enumerate(landmarks[474:478]):
                x = int(landmark.x * frame_w)
                y = int(landmark.y * frame_h)
                if id == 1:
                    screen_x = self.screen_w * landmark.x
                    screen_y = self.screen_h * landmark.y
                    # The cursor has not been moved yet, so it should not
                    # already sit exactly at the calculated target position.
                    current_mouse_position = pyautogui.position()
                    self.assertNotEqual(current_mouse_position, (screen_x, screen_y))
                    break

    def test_mouse_click(self):
        _, frame = self.cam.read()
        frame = cv2.flip(frame, 1)
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        output = self.face_mesh.process(rgb_frame)
        landmark_points = output.multi_face_landmarks
        frame_h, frame_w, _ = frame.shape
        if landmark_points:
            landmarks = landmark_points[0].landmark
            left = [landmarks[145], landmarks[159]]
            if (left[0].y - left[1].y) < 0.004:
                # pyautogui.click() returns None, so issue the click and
                # assert that the blink condition that triggered it holds.
                pyautogui.click()
                self.assertLess(left[0].y - left[1].y, 0.004)

    def tearDown(self):
        self.cam.release()
        cv2.destroyAllWindows()

if __name__ == '__main__':
    unittest.main()
This code creates two test cases, one to test mouse movement and another to test mouse clicks.
Note: this is just a basic example, and you may need to add more tests and assertions depending on your specific requirements, but for me this works just fine.
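One thing worth considering, since these tests run against a live webcam and the real mouse, is patching PyAutoGUI with unittest.mock so the test suite does not actually move the cursor or click. The sketch below is only an illustration of that idea and is not part of my project; the Landmark namedtuple is a stand-in for MediaPipe's landmark objects, and the threshold is the same 0.004 used in the main code.

import unittest
from collections import namedtuple
from unittest import mock

import pyautogui

# Stand-in for a MediaPipe landmark (illustrative, not the real class).
Landmark = namedtuple('Landmark', ['x', 'y'])

class TestBlinkClick(unittest.TestCase):
    def test_blink_triggers_click(self):
        # Two eyelid landmarks that are almost touching, i.e. a blink.
        lower, upper = Landmark(0.5, 0.402), Landmark(0.5, 0.400)
        with mock.patch('pyautogui.click') as fake_click:
            if (lower.y - upper.y) < 0.004:
                pyautogui.click()
            fake_click.assert_called_once()

    def test_open_eye_does_not_click(self):
        # Eyelid landmarks that are clearly apart, i.e. the eye is open.
        lower, upper = Landmark(0.5, 0.430), Landmark(0.5, 0.400)
        with mock.patch('pyautogui.click') as fake_click:
            if (lower.y - upper.y) < 0.004:
                pyautogui.click()
            fake_click.assert_not_called()

if __name__ == '__main__':
    unittest.main()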
To conclude
The Eye Controlled Mouse is a fascinating application of computer vision and machine learning that can have a significant impact on the lives of people with disabilities. This project demonstrated how to build a basic prototype of an eye-controlled mouse using OpenCV, MediaPipe, and PyAutoGUI. By tracking the eyes, the mouse cursor can be moved, and by detecting blinks, mouse clicks can be simulated. While the first implementation was slow and needed optimization, it opens the door to a world of possibilities where human-computer interaction can be redefined. I hope this project will inspire developers and researchers to explore this area further and find more innovative solutions that can improve the lives of people with disabilities.