Eye-Tracking AAC System
Augmentative and Alternative Communication (AAC) refers to the field of technologies and communication systems that aid or supplement speech or writing for people with impairments or difficulties communicating.
It allows people to regain or enhance communication skills impaired by congenital or acquired conditions such as autism or cerebral palsy.
Inspiration
I was captivated by a particular episode of Breaking Bad in which Hector Salamanca, a paralyzed, mute character, is being questioned by the DEA. To communicate, he uses a bell and a nurse with a printed table of the alphabet. The nurse moves row by row, and Hector rings the bell when the row she points to contains the letter he wants; she then moves across that row's columns for the final selection. The process is repeated letter by letter, word by word, to form sentences.
It was a wicked scene, as we were all waiting to see whether Hector would sell out Gus, his long-time enemy. It was also painstakingly slow. That's when the idea came: what if there were a more efficient way for people like Hector to communicate?
High-Level Circuit Block Diagram
High-Level Components
- Button/Gyro Circuit: interfaced with the ATmega328P microcontroller
- USB Webcam: streams video to the web page, capturing eye movements
- Flask (Python): runs on the user's PC to parse and transmit data between components
How It Works
This project was broken into 4 main components:
- Glasses Design
- Web Page
- Eye-Tracker
- Button/Gyro
Glasses Design
The glasses frame was designed to house the gyroscope on its side to measure the user's head movements. A separate mount fixes onto the front of the frame to house the webcam in front of the right eye.
Web Page
Original 6x5 Grid
Updated 3x2 Grid
The first design used a 6x5 grid with each letter in its own box, but the boxes were too close together to track eye movement accurately. So we switched to a simpler design: two 3x2 grids with two letters in each box, and a button press to switch between the two grids. This made tracking way more accurate.
To type, you look at a letter and press a button—left or right—to select it. When you’re done spelling out a word, the computer reads it out loud.
Button Functions
| Action | Left Button | Right Button |
| --- | --- | --- |
| Single Press | Selects the left-side letter in the highlighted box | Selects the right-side letter in the highlighted box |
| Double Press | Scrolls between the two keyboards (A-L / M-Z) | Speaks the word in the textbox aloud |
| Long Press | Deletes a letter from the textbox | Selects 'Y' or 'Z' if looking at that letter box |
Button/Gyro
- 1 ATmega328P microcontroller
- 2 push buttons
- 2 LEDs
- 1 MPU6050 gyroscope
The buttons and gyro were configured and controlled by the ATmega328P, whose job was to write the current angle of the user's head (x, y, z) and the current status of the push buttons to the serial monitor over the Pololu programmer's TTL-serial port.
void loop() {
    gyro();          //calculates pitch and roll, converts to degrees, then returns the (x,y,z) coords
    button.tick();   //left button
    button2.tick();  //right button
    transmit();      //sends to serial monitor
    delay(50);
}
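The exact line format written by transmit() isn't reproduced here, but assuming a simple comma-separated layout of the three head angles followed by the two button states, the Flask side could read it with pyserial along these lines (the port name and field order are assumptions, not the project's actual code):

import serial  # pyserial

# Hypothetical port name; the Pololu programmer's TTL-serial port appears
# to the PC as an ordinary serial device.
ser = serial.Serial("/dev/ttyACM0", 9600, timeout=1)

def read_gyro_and_buttons():
    """Parse one assumed 'x,y,z,btn1,btn2' line from the ATmega328P."""
    line = ser.readline().decode("ascii", errors="ignore").strip()
    parts = line.split(",")
    if len(parts) != 5:
        return None  # empty, incomplete, or garbled line
    x, y, z = (float(v) for v in parts[:3])
    btn1, btn2 = (int(v) for v in parts[3:])
    return (x, y, z), (btn1, btn2)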
Eye-Tracker
The system starts by prompting the user to focus on a series of letters displayed on the screen. It records the x and y coordinates of the pupil for each letter, effectively mapping the pupil's position to individual letters. This is done by sending the webcam frames to the Python script eyes.py, which runs under Flask and receives them over SocketIO.
function sendframe(){
    var cam = document.getElementById("camera")
    var canv = document.getElementById("frame")
    var context = canv.getContext("2d")
    context.drawImage(cam, 0, 0)                 // draw the current webcam frame onto the canvas
    var image = canv.toDataURL('image/jpeg', 1)  // encode it as a base64 JPEG data URL
    console.log("Sending frame")
    socket.emit("frame", image)                  // send it to eyes.py over SocketIO
}
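The matching handler in eyes.py isn't reproduced on this page; a minimal sketch of how such a Flask-SocketIO handler could decode the data-URL JPEG back into an OpenCV frame might look like the following (the handler wiring and the read_buttons() helper are assumptions, not the project's actual code):

import base64

import cv2
import numpy as np
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

@socketio.on("frame")
def handle_frame(image):
    # image arrives as "data:image/jpeg;base64,..." from canvas.toDataURL()
    jpeg_bytes = base64.b64decode(image.split(",", 1)[1])
    frame = cv2.imdecode(np.frombuffer(jpeg_bytes, np.uint8), cv2.IMREAD_COLOR)
    x, y = getXY(frame)  # pupil center, from the tracking function shown below
    # read_buttons() stands in for however the serial data above is polled
    socketio.emit("response", {"x": x, "y": y, "data": read_buttons()})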
The Python script processes each frame and returns the pupil's current center coordinates, along with the gyro readings and button statuses from the microcontroller. On the web page, these coordinates are converted to a letter's position on the screen and that letter's box is highlighted.
socket.on("response", function(xy){
    let tex = ['a', 'c', 'e', 'g', 'i', 'k']
    xEye = xy.x          // current pupil center from eyes.py
    yEye = xy.y
    prebtn1 = btn1       // remember the previous button statuses
    prebtn2 = btn2
    btn1 = xy.data[0]
    btn2 = xy.data[1]
    // on a status change, scroll between the two keyboards
    if(prebtn2 != btn2 && (btn2 == 3 || btn2 == 4)){ toScroll() }
    // on a status change, speak the word in the textbox aloud
    if(prebtn1 != btn1 && (btn1 == 3 || btn1 == 4)){
        speakAloud()
        count = 0
    }
    ...
})
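The conversion from pupil coordinates to an on-screen letter box happens in the page's JavaScript, but the underlying idea is just a nearest-neighbour lookup against the pupil positions recorded during calibration. Sketched in Python with made-up calibration values:

# Pupil (x, y) recorded while the user stared at each box (illustrative values only)
calibration = {
    "ab": (210, 248), "cd": (228, 247), "ef": (245, 249),
    "gh": (211, 262), "ij": (229, 261), "kl": (246, 263),
}

def nearest_box(x_eye, y_eye):
    """Return the calibrated box whose recorded pupil position is closest."""
    return min(
        calibration,
        key=lambda box: (calibration[box][0] - x_eye) ** 2
                      + (calibration[box][1] - y_eye) ** 2,
    )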
Additionally, the system captures the initial position of the integrated gyroscope to serve as a reference for the user's head position. If the head position changes significantly, the user is prompted to recalibrate.
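A minimal sketch of that head-movement check, assuming the reference angles are stored right after calibration and using a hypothetical tolerance of a few degrees:

REFERENCE_ANGLES = None   # (x, y, z) captured at calibration time
HEAD_TOLERANCE_DEG = 10   # assumed drift allowance before a recalibration prompt

def head_moved_too_much(current_angles):
    """True if any axis has drifted past the tolerance since calibration."""
    if REFERENCE_ANGLES is None:
        return False
    return any(
        abs(cur - ref) > HEAD_TOLERANCE_DEG
        for cur, ref in zip(current_angles, REFERENCE_ANGLES)
    )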
The main Python function is always active, constantly tracking the pupil's position. Its purpose is to return the pupil's current center coordinates to the web page. Here is how it works:
- Initial Isolation: eyes.py starts by analyzing the video feed, frame by frame, to isolate the pupil from the rest of the eye and face. This is achieved with an eye classifier from the OpenCV Python module, trained on tens of thousands of eye images.
- Pixel Tracking: The pupil, being the darkest part of the eye, is the focus for tracking. The system converts the eye area to greyscale and sets a colour threshold; pixels below this threshold turn black, effectively leaving the pupil as the only black pixels on the screen.
- Blob Detection: To make the pupil more pronounced and easier to track, a technique called 'blob detection' is used (a sketch of this step appears after the code below).
- Coordinate Positioning: The system calculates the midpoint of the pronounced pupil blob and uses it as the coordinate position for tracking.
def getXY(frame):
    """Return the (x, y) center of the pupil within the cropped eye region."""
    if frame is None:
        print("No frames detected")
        return 0, 0  # keep the return type consistent with the normal path
    eyes = eye_cascade.detectMultiScale(frame)  # Haar-cascade eye detections (array of arrays of eyes)
    # Fixed crop around the right eye, matching the webcam's position on the glasses mount
    topx, topy = 20, 150
    botx, boty = 400, 350
    cv2.rectangle(frame, (topx, topy), (botx, boty), (255, 255, 100), 1)
    eye = frame[topy:boty, topx:botx]
    # blob_process() thresholds the eye region so only the dark pupil remains
    blur_eye = blob_process(eye, thresh, detector)
    _, contours, _ = cv2.findContours(blur_eye, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 3.x signature
    contours = sorted(contours, key=lambda c: cv2.contourArea(c), reverse=True)  # biggest blob first
    xPupil = 0
    yPupil = 0
    for cnt in contours:
        cv2.drawContours(eye, [cnt], -1, (0, 0, 255), 2)
        (x, y, w, h) = cv2.boundingRect(cnt)
        xPupil = int(x + w/2)  # midpoint of the bounding box = pupil center
        yPupil = int(y + h/2)
        cv2.rectangle(eye, (x, y), (x+w, y+h), (255, 0, 0), 1)
        cv2.line(eye, (xPupil, 0), (xPupil, 300), (200, 0, 0), 1)
        cv2.line(eye, (0, yPupil), (400, yPupil), (200, 0, 0), 1)
        break  # only the largest contour (the pupil) is needed
    return xPupil, yPupil
- Real-Time Updates: This process is repeated five times every second, providing real-time tracking of the pupil’s position.
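getXY() relies on a blob_process() helper that isn't shown above. Assuming it follows the usual OpenCV pupil-tracking pattern, it converts the eye region to greyscale, thresholds it so that only the dark pupil pixels remain, and cleans the result up before the contour search; here is a sketch of that idea (not the project's actual implementation):

import cv2

def blob_process(eye, threshold, detector):
    """Isolate the dark pupil pixels in the eye region (sketch only)."""
    gray = cv2.cvtColor(eye, cv2.COLOR_BGR2GRAY)
    _, img = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    img = cv2.erode(img, None, iterations=2)   # strip stray dark specks (lashes, shadows)
    img = cv2.dilate(img, None, iterations=4)  # grow the pupil blob back
    img = cv2.medianBlur(img, 5)               # smooth the blob's edges
    # 'detector' (e.g. a cv2.SimpleBlobDetector) is accepted to match getXY()'s call;
    # the contour search in getXY() only needs the binary image returned here.
    return img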
Conclusion
The Eye-Tracking AAC System is a groundbreaking solution designed to revolutionize the way people with communication impairments interact with the world. By leveraging technologies like Python, C, and JavaScript, this system provides a seamless and intuitive method for typing using eye movements.
From the initial inspiration drawn from a TV show to the intricate details of eye-tracking algorithms, this project is a testament to the power of technology to make lives better. It’s not just about tracking eye movements; it’s about giving people the freedom to communicate on their own terms.
Whether you’re a developer interested in the nitty-gritty details of the code or someone who sees the potential for real-world applications, this project offers something for everyone.
Feel free to check out the full documentation here for more details.
See the full code here.