An eye-controlled mouse is a type of assistive technology that allows users to control a computer mouse using their eye movements. This technology is particularly useful for people with disabilities who may have difficulty using a traditional mouse, as well as for anyone looking for an alternative way to interact with their computer.
To use an eye-controlled mouse, a camera is typically mounted above the computer screen to track the user's eye movements. This information is then used to control the movement of the mouse pointer on the screen. Some eye-controlled mouse systems can also perform clicks and other actions by detecting changes in the user's eye movements, such as blinks or winks.
There are various commercially available eye-controlled mouse systems, as well as open-source software that can be used to build a custom solution. The accuracy and ease of use of these systems vary, but with advances in computer vision and machine learning, eye-controlled mouse technology has become increasingly accessible and reliable.
Source code
Refer to this GitHub repo for further information on my project.
Line-by-Line Code Explanation
Python modules used
"cv2" is a Python library used for computer vision and image processing. It is a wrapper around the popular open-source computer vision library, OpenCV. The cv2 library provides functions for reading, writing, and manipulating images, as well as tools for object detection, face recognition, and more. It is widely used in computer vision applications due to its comprehensive and performance-oriented features.
Mediapipe is a cross-platform, open-source framework for building multimedia processing pipelines. It provides a set of tools and libraries for developing computer vision and audio processing applications on various platforms, including mobile devices and desktop computers. Mediapipe supports the creation of custom pipelines that can be optimized for specific use cases, making it a popular choice for developers working in fields such as augmented reality, robotics, and video analysis. The mp used throughout this code is simply the alias created by the import statement, import mediapipe as mp.
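Here is a minimal sketch of how this project uses Mediapipe's FaceMesh solution: hand it an RGB frame and read back normalized landmark coordinates. It assumes a webcam is available at index 0 and that a face is in view of the camera.

import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

cam = cv2.VideoCapture(0)
ok, frame = cam.read()
if ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # FaceMesh expects RGB input
    result = face_mesh.process(rgb)
    if result.multi_face_landmarks:
        first = result.multi_face_landmarks[0].landmark[0]
        # Landmark coordinates are normalized to the [0, 1] range.
        print(f'first landmark: x={first.x:.3f}, y={first.y:.3f}')
cam.release()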
PyAutoGUI is a Python library for automating GUI interactions. It provides a simple and convenient way to interact with graphical user interfaces (GUIs) on your computer, allowing you to simulate mouse clicks, keyboard presses, and other forms of input. With PyAutoGUI, you can write scripts that can automate repetitive tasks, perform image recognition, and even control other software applications. Some common use cases for PyAutoGUI include automating tasks in productivity applications, performing image-based testing, and automating game playing. The library works with all major operating systems and provides a high-level API for controlling GUI interactions, making it a useful tool for both developers and testers.
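The three PyAutoGUI calls this project relies on are size(), moveTo(), and click(). Here is a minimal sketch of them in isolation; note that running it will actually move your cursor and click, and the centre-of-screen coordinates are just an example.

import pyautogui

screen_w, screen_h = pyautogui.size()            # screen resolution in pixels
print(screen_w, screen_h)

pyautogui.moveTo(screen_w // 2, screen_h // 2)   # move the cursor to the centre of the screen
pyautogui.click()                                # click at the current cursor position

With the three libraries introduced, the project starts by importing and initializing them: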
import cv2
import mediapipe as mp
import pyautogui
cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
screen_w, screen_h = pyautogui.size()
cam = cv2.VideoCapture(0): This line initializes a video capture object using OpenCV's cv2 library. The 0 argument passed to VideoCapture specifies that the default camera should be used as the source of the video feed. The video capture object is assigned to the variable cam.
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True): This line initializes a face mesh object from the Mediapipe (mp) library's face_mesh module. The refine_landmarks argument is set to True, which means the face mesh will produce refined facial landmarks, including landmarks around the irises, for more accurate eye tracking. The face mesh object is assigned to the variable face_mesh.
screen_w, screen_h = pyautogui.size(): This line uses PyAutoGUI to get the dimensions of the computer's screen and assigns them to the variables screen_w and screen_h. The size function returns the width and height of the screen as a tuple, which is unpacked into the two variables.
These three lines of code initialize objects that can be used to capture video from a camera, track facial landmarks in the video feed, and get the dimensions of the computer screen, respectively. The specifics of how these objects are used depend on the context in which they are implemented, but they are commonly used in computer vision and multimedia processing applications.
Now I'll explain the rest of the code in a slightly different way: the full loop first, then a step-by-step breakdown.
while True:
    _, frame = cam.read()
    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = face_mesh.process(rgb_frame)
    landmark_points = output.multi_face_landmarks
    frame_h, frame_w, _ = frame.shape
    if landmark_points:
        landmarks = landmark_points[0].landmark
        for id, landmark in enumerate(landmarks[474:478]):
            x = int(landmark.x * frame_w)
            y = int(landmark.y * frame_h)
            cv2.circle(frame, (x, y), 3, (0, 255, 0))
            if id == 1:
                screen_x = screen_w * landmark.x
                screen_y = screen_h * landmark.y
                pyautogui.moveTo(screen_x, screen_y)
        left = [landmarks[145], landmarks[159]]
        for landmark in left:
            x = int(landmark.x * frame_w)
            y = int(landmark.y * frame_h)
            cv2.circle(frame, (x, y), 3, (0, 255, 255))
        if (left[0].y - left[1].y) < 0.004:
            pyautogui.click()
            pyautogui.sleep(1)
    cv2.imshow('Eye Controlled Mouse', frame)
    cv2.waitKey(1)
The code uses a while loop to repeatedly process frames from the video feed captured by the camera. The video capture object was created and initialized earlier in the code with cam = cv2.VideoCapture(0).
Each iteration of the loop performs the following actions:
1. _, frame = cam.read(): The read method of the video capture object retrieves the next frame from the video feed. The _ is a placeholder variable used to ignore the boolean return value that indicates whether the frame was successfully retrieved.
2. frame = cv2.flip(frame, 1): The flip function from OpenCV's cv2 library flips the video frame horizontally, so the frame behaves like a mirror and the cursor moves in the same direction as the user's eyes.
3. rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB): The cvtColor function from OpenCV's cv2 library converts the video frame from the BGR color format used by OpenCV to the RGB color format used by Mediapipe.
4. output = face_mesh.process(rgb_frame): The process method of the face mesh object created earlier in the code processes the video frame and detects facial landmarks. The result of this processing is stored in the output variable.
5. landmark_points = output.multi_face_landmarks: This line extracts the landmarks of any faces detected in the video frame from the output variable and stores them in the landmark_points variable.
6. if landmark_points:: This line checks whether any facial landmarks were detected in the frame. If landmarks were detected, the code inside the if block is executed.
The code inside the if block performs the following actions:
a. Loops over four of the iris landmarks of one eye (indices 474 to 477, available because refine_landmarks=True) and draws circles around them in the video frame using the cv2.circle function from OpenCV's cv2 library.
b. Calculates the position of the mouse cursor from the normalized position of one of those iris landmarks and uses the pyautogui.moveTo method from the PyAutoGUI library to move the mouse cursor to the corresponding point on the screen (a small sketch of this mapping follows this breakdown).
c. Loops over the two landmarks on the upper and lower eyelid of the left eye (indices 145 and 159) and draws circles around them in the video frame using the cv2.circle function.
d. Checks the vertical distance between those two eyelid landmarks and, if it is below a small threshold (meaning the eye is closed, i.e. a blink), uses the pyautogui.click method from the PyAutoGUI library to simulate a mouse click. A pyautogui.sleep call then pauses the code for 1 second so a single blink does not trigger a burst of clicks.
7. cv2.imshow('Eye Controlled Mouse', frame): This line displays the processed video frame in a window named "Eye Controlled Mouse" using the cv2.imshow function from OpenCV's cv2 library.
8. cv2.waitKey(1): This line uses the cv2.waitKey function from OpenCV's cv2 library to wait one millisecond for a key event, which also gives the window a chance to refresh before the next frame is processed.
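To make the cursor mapping and the blink check easier to see on their own, here is a minimal, self-contained sketch of those two calculations pulled out of the loop. The helper names landmark_to_screen and is_blink and the Landmark stand-in are my own for illustration; the normalized coordinates and the 0.004 threshold come from the code above.

from collections import namedtuple
import pyautogui

# Stand-in for a MediaPipe landmark, which exposes normalized x/y fields.
Landmark = namedtuple('Landmark', ['x', 'y'])

def landmark_to_screen(landmark, screen_w, screen_h):
    # Landmarks are normalized to [0, 1] relative to the frame, so scaling
    # them by the screen size maps them onto screen pixels.
    return screen_w * landmark.x, screen_h * landmark.y

def is_blink(lower_lid, upper_lid, threshold=0.004):
    # When the eye closes, the two eyelid landmarks move together and
    # their vertical distance drops below the threshold.
    return (lower_lid.y - upper_lid.y) < threshold

screen_w, screen_h = pyautogui.size()
iris = Landmark(x=0.52, y=0.47)                              # example normalized position
print(landmark_to_screen(iris, screen_w, screen_h))
print(is_blink(Landmark(0.5, 0.402), Landmark(0.5, 0.400)))  # True: eye nearly closed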
Difficulties Faced
Once these snippets were put together into a program, I took it for testing. Everything worked fine, but I noticed a slight delay between my eye movements and the response on screen.
Improving the performance of the code would likely require a combination of several strategies, since the code can be slow for a variety of reasons. Here are a few ideas I brainstormed with a few friends from the tech Twitter community that might help speed things up:
Reduce the frame rate: You can reduce the frame rate that is captured from the webcam by increasing the wait time in the cv2.waitKey() call. For example, instead of cv2.waitKey(1), you could use cv2.waitKey(10) to slow down the frame rate.
Resize the frame: If the frame size is too large, the processing becomes slow. You can resize the frame with cv2.resize() to make it smaller, which should speed up the processing (see the sketch after this list).
Optimize the processing: You can reduce the number of operations performed on each frame. For example, you might be able to eliminate the cv2.flip() operation, or reduce the number of circles drawn on the frame.
Use multi-threading: You can use multi-threading to perform the processing in parallel, which should speed things up. However, this can be a complex task and might require a significant amount of programming effort.
Use hardware acceleration: If your computer has a GPU, you can try to use hardware acceleration. Some libraries, such as OpenCV, have GPU-accelerated functions that can perform certain operations much faster than the equivalent CPU-based functions.
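Not all of these ideas made it into my final version, but to make the first two concrete, here is a minimal sketch that shrinks each frame before processing and only runs the face mesh on every other frame (frame skipping, a variant of the lower-frame-rate idea). The SCALE and PROCESS_EVERY_N_FRAMES values are arbitrary choices for illustration, not something I tuned.

import cv2
import mediapipe as mp

cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

PROCESS_EVERY_N_FRAMES = 2   # skip every other frame (illustrative value)
SCALE = 0.5                  # shrink frames to half size (illustrative value)
frame_count = 0

while True:
    ret, frame = cam.read()
    if not ret:
        break
    frame_count += 1
    # Only run the (expensive) face-mesh step on every Nth frame.
    if frame_count % PROCESS_EVERY_N_FRAMES == 0:
        small = cv2.resize(frame, None, fx=SCALE, fy=SCALE)
        rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
        output = face_mesh.process(rgb)
        # ...landmark handling would go here, exactly as before...
    cv2.imshow('Eye Controlled Mouse', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()

The display still shows the full-resolution frame; only the copy fed to FaceMesh is downscaled, and since the landmark coordinates are normalized, the cursor mapping is unaffected.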
After spending a night on it and finishing about 6 cups of coffee, I came up with a modified version:
import cv2
import mediapipe as mp
import pyautogui

screen_w, screen_h = pyautogui.size()
cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

def find_landmarks_and_click(frame, landmarks, frame_w, frame_h):
    # Move the cursor based on the iris landmarks.
    for id, landmark in enumerate(landmarks[474:478]):
        x = int(landmark.x * frame_w)
        y = int(landmark.y * frame_h)
        cv2.circle(frame, (x, y), 3, (0, 255, 0))
        if id == 1:
            screen_x = screen_w * landmark.x
            screen_y = screen_h * landmark.y
            pyautogui.moveTo(screen_x, screen_y)
    # Click when the eyelid landmarks come close together (a blink).
    left = [landmarks[145], landmarks[159]]
    for landmark in left:
        x = int(landmark.x * frame_w)
        y = int(landmark.y * frame_h)
        cv2.circle(frame, (x, y), 3, (0, 255, 255))
    if (left[0].y - left[1].y) < 0.004:
        pyautogui.click()
        pyautogui.sleep(1)

while True:
    ret, frame = cam.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = face_mesh.process(rgb_frame)
    landmark_points = output.multi_face_landmarks
    if landmark_points:
        landmarks = landmark_points[0].landmark
        frame_h, frame_w, _ = frame.shape
        find_landmarks_and_click(frame, landmarks, frame_w, frame_h)
    cv2.imshow('Eye Controlled Mouse', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()
To stop the code, press the "q" key while the "Eye Controlled Mouse" window has focus. The loop calls cv2.waitKey(1), which waits briefly for a keyboard event; if the returned key code (masked with 0xFF) is equal to ord('q'), the code breaks out of the while loop, releases the camera, and closes the window.
Testing
For testing such a small project, I decided to use the unittest framework in Python.
Generating unit tests for this code would require a comprehensive testing setup that can simulate a camera feed, face detection and landmark detection, and mouse movements and clicks.
Here is an example of a basic unit test using the unittest framework in Python:
import unittest
import cv2
import mediapipe as mp
import pyautogui

class TestEyeControlledMouse(unittest.TestCase):
    def setUp(self):
        self.cam = cv2.VideoCapture(0)
        self.face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
        self.screen_w, self.screen_h = pyautogui.size()

    def test_mouse_movement(self):
        _, frame = self.cam.read()
        frame = cv2.flip(frame, 1)
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        output = self.face_mesh.process(rgb_frame)
        landmark_points = output.multi_face_landmarks
        frame_h, frame_w, _ = frame.shape
        if landmark_points:
            landmarks = landmark_points[0].landmark
            for id, landmark in enumerate(landmarks[474:478]):
                x = int(landmark.x * frame_w)
                y = int(landmark.y * frame_h)
                if id == 1:
                    screen_x = self.screen_w * landmark.x
                    screen_y = self.screen_h * landmark.y
                    # The cursor has not been moved yet, so it should not
                    # already sit exactly at the calculated target position.
                    current_mouse_position = pyautogui.position()
                    self.assertNotEqual(current_mouse_position, (screen_x, screen_y))
                    break

    def test_mouse_click(self):
        _, frame = self.cam.read()
        frame = cv2.flip(frame, 1)
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        output = self.face_mesh.process(rgb_frame)
        landmark_points = output.multi_face_landmarks
        frame_h, frame_w, _ = frame.shape
        if landmark_points:
            landmarks = landmark_points[0].landmark
            left = [landmarks[145], landmarks[159]]
            if (left[0].y - left[1].y) < 0.004:
                # pyautogui.click() returns None, so issue the click and
                # assert that the blink condition that triggered it holds.
                pyautogui.click()
                self.assertLess(left[0].y - left[1].y, 0.004)

    def tearDown(self):
        self.cam.release()
        cv2.destroyAllWindows()

if __name__ == '__main__':
    unittest.main()
This code creates two test cases, one to test mouse movement and another to test mouse clicks.
Note: this is just a basic example, and you may need to add more tests and assertions depending on your specific requirements, but for me this works just fine.
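One thing worth considering, since these tests run against a live webcam and the real mouse, is patching PyAutoGUI with unittest.mock so the test suite does not actually move the cursor or click. The sketch below is only an illustration of that idea and is not part of my project; the Landmark namedtuple is a stand-in for MediaPipe's landmark objects, and the threshold is the same 0.004 used in the main code.

import unittest
from collections import namedtuple
from unittest import mock

import pyautogui

# Stand-in for a MediaPipe landmark (illustrative, not the real class).
Landmark = namedtuple('Landmark', ['x', 'y'])

class TestBlinkClick(unittest.TestCase):
    def test_blink_triggers_click(self):
        # Two eyelid landmarks that are almost touching, i.e. a blink.
        lower, upper = Landmark(0.5, 0.402), Landmark(0.5, 0.400)
        with mock.patch('pyautogui.click') as fake_click:
            if (lower.y - upper.y) < 0.004:
                pyautogui.click()
            fake_click.assert_called_once()

    def test_open_eye_does_not_click(self):
        # Eyelid landmarks that are clearly apart, i.e. the eye is open.
        lower, upper = Landmark(0.5, 0.430), Landmark(0.5, 0.400)
        with mock.patch('pyautogui.click') as fake_click:
            if (lower.y - upper.y) < 0.004:
                pyautogui.click()
            fake_click.assert_not_called()

if __name__ == '__main__':
    unittest.main()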
To conclude
The Eye Controlled Mouse is a fascinating application of computer vision and machine learning that can have a significant impact on the lives of people with disabilities. This project demonstrated how to build a basic prototype of an eye-controlled mouse using OpenCV, MediaPipe, and PyAutoGUI. By tracking the eyes, the mouse cursor can be moved, and by detecting blinks, mouse clicks can be simulated. While the first implementation was slow and needed optimization, it opens the door to a world of possibilities where human-computer interaction can be redefined. I hope this project will inspire developers and researchers to explore this area further and find more innovative solutions that can improve the lives of people with disabilities.