Eye-Tracking AAC System

Joel Brown | Aug 26, 2023

Augmentative and Alternative Communication (AAC) refers to the field of technologies and communication systems that aid or supplement speech and writing for people with impairments or difficulties communicating.

It allows people to regain or enhance communication skills impaired by congenital or acquired conditions such as autism or cerebral palsy.

Inspiration

I was captivated by a particular episode of Breaking Bad in which Hector Salamanca, a paralyzed and mute character, is being questioned by the DEA. To communicate, he uses a bell while a nurse points along an alphabet board. The nurse moves row by row, and Hector rings the bell when she reaches the row containing the letter he wants; she then moves across the columns for the final selection. The process is repeated letter by letter, word by word, to form sentences.

This was a wicked scene, as we were all waiting to see if Hector would sell out Gus, his long-time enemy. However, it was also painstakingly slow. That’s when the idea came: what if there were a more efficient way for people like Hector to communicate?

High-Level Circuit Block Diagram

High-level Circuit Block Diagram

High-Level Components

  • Button/Gyro Circuit: interfaced with the ATmega328P microcontroller
  • USB Webcam: streams video to the web page, which captures eye movements
  • Flask (Python): runs on the user’s PC to parse and relay data between components

How It Works

This project was broken into 4 main components:

  • Glasses Design
  • Web Page
  • Eye-Tracker
  • Button/Gyro

Glasses Design

Glasses on test user

The glasses frame was designed to house the gyroscope on its side to measure the user’s head movements. A separate mount was designed to fix onto the front of the frame and hold the webcam in front of the right eye.

Web Page

Original 6x5 Grid

Updated 3x2 Grid

The first design had a 6x5 grid, each letter in its own box. But it was hard to track eye movement accurately because the boxes were too close together. So, we switched to a simpler design: two 3x2 grids with two letters in each box. A button lets you switch between the two grids. This made tracking way more accurate.

To type, you look at a letter and press a button—left or right—to select it. When you’re done spelling out a word, the computer reads it out loud.

Button Functions

Action       | Left Button                                   | Right Button
------------ | --------------------------------------------- | -------------------------------------------------
Single Press | Selects left-sided letter in highlighted box  | Selects right-sided letter in highlighted box
Double Press | Scrolls between the 2 keyboards (A-L / M-Z)   | Speaks word in textbox aloud
Long Press   | Deletes a letter from textbox                 | Selects ‘Y’ or ‘Z’ if looking at that letter box

Button/Gyro

Circuit

  • 1 Atmega328P
  • 2 Push Buttons
  • 2 LEDs
  • 1 MPU6050 Gyroscope

The buttons and gyro were configured and controlled by the ATmega328P, whose job was to write the current angle of the user’s head (x, y, z) and the current status of the push buttons to the serial monitor over the Pololu programmer’s TTL-serial port.

void loop() {
  gyro(); //calculates pitch and roll, converts to degrees, then returns the (x,y,z) coords 
  button.tick(); //left button 
  button2.tick(); //right button 
  transmit(); //sends to serial monitor
  delay(50);
} 
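
On the PC side, the Flask script reads these serial lines before relaying them to the web page. The post doesn’t show the line format, so the sketch below assumes a simple comma-separated layout (x,y,z,btn1,btn2); the parse_serial_line helper, port path, and baud rate are illustrative assumptions only.

# Hypothetical sketch of the PC-side serial reader (not the project's exact code).
# Assumes the ATmega328P prints one comma-separated line per update: x,y,z,btn1,btn2
import serial  # pyserial

def parse_serial_line(line):
    # Split one serial line into gyro angles (degrees) and button state codes
    x, y, z, btn1, btn2 = line.strip().split(",")
    return float(x), float(y), float(z), int(btn1), int(btn2)

if __name__ == "__main__":
    # Port name and baud rate are assumptions; match them to the Pololu programmer's TTL port
    with serial.Serial("/dev/ttyACM0", 9600, timeout=1) as port:
        while True:
            raw = port.readline().decode("ascii", errors="ignore")
            if raw.strip():
                print(parse_serial_line(raw))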

Eye-Tracker

The system starts by prompting the user to focus on a series of letters displayed on the screen. It records the x and y coordinates of the pupil for each letter, effectively mapping the pupil’s position to individual letters. This is done by sending webcam frames over SocketIO to the Python script eyes.py, which runs under Flask.

function sendframe() {
    var cam = document.getElementById("camera");
    var canv = document.getElementById("frame");
    var context = canv.getContext("2d");
    context.drawImage(cam, 0, 0);
    var image = canv.toDataURL('image/jpeg', 1);
    console.log("Sending frame");
    socket.emit("frame", image);
}
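
On the receiving end, eyes.py listens for these frame events over SocketIO. The actual handler isn’t reproduced in the post; the minimal sketch below assumes the frame arrives as a base64 data URL and that getXY (shown further down) does the pupil detection, with placeholder values for the button/gyro fields.

# Minimal sketch of the Flask-SocketIO receiving end (assumed, not the project's exact handler).
import base64
import cv2
import numpy as np
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

@socketio.on("frame")
def handle_frame(data_url):
    # Strip the "data:image/jpeg;base64," prefix and decode the JPEG bytes into a frame
    jpeg_bytes = base64.b64decode(data_url.split(",", 1)[1])
    frame = cv2.imdecode(np.frombuffer(jpeg_bytes, dtype=np.uint8), cv2.IMREAD_COLOR)
    x, y = getXY(frame)  # pupil center, from the tracking code shown below
    # Button and gyro values from the serial reader would be merged in here; placeholders shown
    emit("response", {"x": x, "y": y, "data": [0, 0]})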

The Python script processes each frame and returns the pupil’s current center coordinates, along with the gyro coordinates and button statuses from the microcontroller. On the web page, these are converted to a letter position on the screen, which is then highlighted.

socket.on("response", function(xy) {
    let tex = ['a', 'c', 'e', 'g', 'i', 'k'];
    xEye = xy.x;
    yEye = xy.y;
    prebtn1 = btn1;
    prebtn2 = btn2;
    btn1 = xy.data[0];
    btn2 = xy.data[1];
    if (prebtn2 != btn2 && (btn2 == 3 || btn2 == 4)) { toScroll(); }
    if (prebtn1 != btn1 && (btn1 == 3 || btn1 == 4)) {
        speakAloud();
        count = 0;
    }
    ...
});
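
The pupil coordinates are turned into a highlighted box by comparing the current position against the positions recorded during calibration and picking the closest one. Here is a rough sketch of that idea, written in Python for readability only; in the project this mapping happens in the page’s JavaScript, and the box labels and coordinates below are made up.

# Illustrative nearest-calibration-point lookup (the project performs this in the page's JS).
import math

def nearest_box(pupil_xy, calibration):
    # calibration maps a box label (e.g. "ab") to the pupil (x, y) recorded for it
    px, py = pupil_xy
    return min(calibration,
               key=lambda box: math.hypot(px - calibration[box][0], py - calibration[box][1]))

# Example: the six boxes of one 3x2 grid, each calibrated to a pupil position
calib = {"ab": (120, 80), "cd": (200, 80), "ef": (280, 80),
         "gh": (120, 160), "ij": (200, 160), "kl": (280, 160)}
print(nearest_box((205, 85), calib))  # -> "cd"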

Additionally, the system captures the initial position of an integrated gyroscope to serve as a reference for the user’s head position. If the head position is significantly changed, the user will be prompted to recalibrate.
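
A minimal sketch of that head-movement check, assuming the head angles arrive as (x, y, z) in degrees; the 15-degree threshold is an assumption, not a value from the post.

# Hedged sketch of the recalibration check; the threshold value is an assumption.
def needs_recalibration(reference, current, threshold_deg=15.0):
    # Flag recalibration if any axis has drifted past the threshold
    return any(abs(c - r) > threshold_deg for r, c in zip(reference, current))

# Example usage
ref = (0.0, 2.0, -1.0)   # captured at calibration time
now = (3.0, 25.0, -2.0)  # the user has turned their head noticeably
if needs_recalibration(ref, now):
    print("Prompt the user to recalibrate")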

The main Python function runs continuously, tracking the pupil’s position. Its purpose is to return the pupil’s current center coordinates to the web page. Here is how it works:

  1. Initial Isolation: The eyes.py script starts by analyzing the video feed, frame by frame, to isolate the pupil from the rest of the eye and face. This is achieved using an eye classifier from the OpenCV Python module, trained on tens of thousands of eye images.

  2. Pixel Tracking: The pupil, being the darkest part of the eye, is the focus for tracking. The system converts the eye area into greyscale and sets a color threshold. Pixels below this threshold turn black, effectively leaving the pupil as the only black pixels on the screen.

  3. Blob Detection: To enhance the pupil’s visibility, a technique called ‘blob detection’ is used. This makes the pupil more pronounced and easier to track.

  4. Coordinate Positioning: The system calculates the midpoint of the pronounced pupil blob and uses it as the coordinate position for tracking.

import cv2  # eye_cascade, thresh, and detector are module-level globals defined elsewhere in eyes.py

def getXY(frame):

    if frame is None:
        print("No frames detected")
        return 0, 0  # keep a consistent (x, y) return type when no frame arrives

    eyes = eye_cascade.detectMultiScale(frame)  # Haar-cascade eye detections (array of arrays of eyes)

    # Fixed region of interest around the right eye, given the camera's mount position
    topx = 20
    topy = 150
    botx = 400
    boty = 350
    cv2.rectangle(frame, (topx, topy), (botx, boty), (255, 255, 100), 1)
    eye = frame[topy:boty, topx:botx]
    gray_eye = cv2.cvtColor(eye, cv2.COLOR_BGR2GRAY)  # grayscale copy of the eye region (unused below)

    # blob_process is defined elsewhere in eyes.py; a sketched reconstruction follows this list
    blur_eye = blob_process(eye, thresh, detector)
    # Three-value unpacking matches the OpenCV 3.x findContours API
    _, contours, _ = cv2.findContours(blur_eye, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=lambda c: cv2.contourArea(c), reverse=True)  # largest first

    xPupil = 0
    yPupil = 0
    for cnt in contours:
        # Only the largest contour (the pupil blob) is used; its bounding-box midpoint is the pupil center
        cv2.drawContours(eye, [cnt], -1, (0, 0, 255), 2)
        (x, y, w, h) = cv2.boundingRect(cnt)
        xPupil = int(x + w / 2)
        yPupil = int(y + h / 2)
        cv2.rectangle(eye, (x, y), (x + w, y + h), (255, 0, 0), 1)
        cv2.line(eye, (xPupil, 0), (xPupil, 300), (200, 0, 0), 1)
        cv2.line(eye, (0, yPupil), (400, yPupil), (200, 0, 0), 1)
        break

    return xPupil, yPupil

  5. Real-Time Updates: This process repeats about five times every second, providing near real-time tracking of the pupil’s position.
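
The getXY function above calls a blob_process helper that isn’t reproduced in the post. Based on steps 2 and 3 (grayscale conversion, thresholding, and blob cleanup), a plausible reconstruction might look like the following; the inverted threshold mode, the morphology steps, and the unused detector argument are all assumptions.

# Assumed reconstruction of blob_process (the original helper isn't shown in the post).
# The detector argument, presumably a cv2.SimpleBlobDetector, is accepted for signature
# compatibility but not used in this sketch.
import cv2

def blob_process(eye, thresh, detector):
    # Convert the eye region to grayscale and threshold it so the dark pupil
    # becomes the white foreground blob that findContours later picks up
    gray = cv2.cvtColor(eye, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)
    binary = cv2.erode(binary, None, iterations=2)    # remove small specks
    binary = cv2.dilate(binary, None, iterations=4)   # fill the pupil blob back in
    binary = cv2.medianBlur(binary, 5)                # smooth the blob's edges
    return binary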

Conclusion

The Eye-Tracking AAC System is designed to change how people with communication impairments interact with the world. Built with Python, C, and JavaScript, the system provides a seamless and intuitive way to type using eye movements.

From the initial inspiration drawn from a TV show to the intricate details of eye-tracking algorithms, this project is a testament to the power of technology to make lives better. It’s not just about tracking eye movements; it’s about giving people the freedom to communicate on their own terms.

Whether you’re a developer interested in the nitty-gritty details of the code or someone who sees the potential for real-world applications, this project offers something for everyone.

Feel free to check out the full documentation here for more details.

See the full code here.