For the last four decades, eye-tracking systems have been used to measure the point of gaze and the motion of an eye relative to the head. Eye tracking is used in research on the visual system, in psychology, psycholinguistics, and marketing, as an input method for human-computer interaction, and in product design.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and the University of Georgia recently developed new software that can turn any smartphone into an eye-tracking device.
This system could enable new computer interfaces, help detect early signs of neurological disease or mental illness, and make existing eye-tracking applications more accessible.
According to Aditya Khosla, an MIT graduate student in electrical engineering and computer science, “The field is stuck in this chicken-and-egg loop.”
He said, “Since few people have the external devices, there’s not a big incentive to make applications for them. Since there are no applications, there’s no incentive for people to buy the devices. We thought we should break this circle and try to make an eye tracker that works on a single mobile device, using just the front-facing camera.”
The researchers built the new eye tracker using machine learning, a field of study that enables computers to learn from data without being explicitly programmed. The software was developed by Khosla and his colleagues: Kyle Krafka of the University of Georgia, Wojciech Matusik, an MIT professor of electrical engineering, Antonio Torralba, an MIT professor of computer science, and three others.
Strength in numbers
According to Khosla, the system’s strength lies in its training data, which includes examples of gaze patterns from 1,500 mobile-device users. Previously, the largest data sets used to train experimental eye-tracking systems had topped out at about 50 users.
“For assembling data sets, other groups tend to call people into the lab. It’s really hard to scale that up. Calling 50 people in itself is already a fairly tedious process. But we realized we could do this through crowdsourcing,” he continued.
When the project began, the researchers trained the system on data from 800 mobile-device users, which brought its margin of error down to 1.5 centimeters, roughly a twofold improvement over previous experimental systems. Since submitting the paper, they have collected data from another 700 users, and the additional training data has reduced the margin of error to about a centimeter.
To get a sense of how larger training sets might improve performance, the researchers trained and retrained their system using different-sized subsets of their data. Those experiments suggest that about 10,000 training examples should be enough to lower the margin of error to half a centimeter, which the researchers estimate would be good enough to make the system commercially viable.
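The subset experiments amount to tracing a learning curve: train on progressively larger slices of the data and watch the held-out error fall. A minimal sketch of that loop follows; `train_model` and `mean_error_cm` are hypothetical stand-ins for the team’s actual training pipeline and centimeter-scale error metric, which are not described here.

```python
def learning_curve(train_model, mean_error_cm, dataset, test_set,
                   sizes=(1_000, 2_500, 5_000, 10_000)):
    """Train on progressively larger subsets and record held-out error.

    `train_model` and `mean_error_cm` are hypothetical callbacks standing
    in for the real training pipeline and its centimeter-scale error
    metric; `dataset` is assumed to be an indexable sequence of examples.
    """
    results = {}
    for n in sizes:
        model = train_model(dataset[:n])   # train on the first n examples
        results[n] = mean_error_cm(model, test_set)
    return results
```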
To collect training examples, the researchers developed a simple application for devices running Apple’s iOS operating system. The application flashes a small dot somewhere on the device’s screen, attracting the user’s attention.
It then briefly replaces the dot with either an “R” or an “L,” instructing the user to tap either the right or the left side of the screen. Correctly executing the tap confirms that the user has actually shifted his or her gaze to the intended location. Throughout this process, the device’s camera continuously captures images of the user’s face. The data set contains, on average, 1,600 images for each user.
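The collection protocol is simple enough to sketch in a few lines. The Python below simulates one trial; `capture_frame` and `read_tap` are hypothetical callbacks standing in for the camera and touch APIs a real mobile app would use, and the screen dimensions are arbitrary.

```python
import random
from dataclasses import dataclass

# Hypothetical screen size in points; a real app would query the device.
SCREEN_W, SCREEN_H = 375, 667

@dataclass
class Sample:
    dot_x: float   # ground-truth gaze target: where the dot appeared
    dot_y: float
    frame: bytes   # camera frame captured while the user watched the dot

def run_trial(capture_frame, read_tap):
    """One data-collection trial: flash a dot, then an 'R'/'L' prompt."""
    # 1. Flash a dot at a random point to draw the user's gaze there.
    x, y = random.uniform(0, SCREEN_W), random.uniform(0, SCREEN_H)
    # 2. Briefly replace the dot with 'R' or 'L'; requiring the matching
    #    tap confirms the user was actually looking at that spot.
    prompt = random.choice("RL")
    frame = capture_frame()   # face image grabbed during the trial
    side = read_tap()         # 'R' or 'L', whichever side the user tapped
    if side != prompt:
        return None           # wrong tap: user wasn't attending, discard
    return Sample(x, y, frame)
```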
Tightening the net
The researchers’ machine-learning system is a neural network, which is a software abstraction but can be thought of as a huge network of very simple information processors arranged into discrete layers.
Training adjusts the settings of the individual processors so that a data item, in this case a still image of a mobile-device user, fed to the bottom layer is processed in turn by each of the subsequent layers. The output of the topmost layer is the solution to a computational problem, here an estimate of where on the screen the user is looking.
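To make that bottom-to-top flow concrete, here is a deliberately simplified sketch of such a layered network in PyTorch (an assumption; the framework the team used is not stated here). The published system combines several inputs, including separate eye and face crops; this toy version takes a single face crop and outputs a two-dimensional gaze estimate.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Toy layered gaze-estimation network (not the authors' architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(   # bottom layers: the image goes in here
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(       # top layers: the answer comes out here
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 2),           # (x, y) estimate of the gaze point
        )

    def forward(self, face):
        return self.head(self.features(face))

# A 64x64 face crop enters the bottom layer; a 2-D gaze estimate exits the top.
model = GazeNet()
print(model(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 2])
```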
Because the fully trained network was too large to run efficiently on a phone, the researchers shrank it using a technique called “dark knowledge.” Dark knowledge involves taking the outputs of a fully trained network, which are generally approximate solutions, and using them, rather than the ground-truth answers, as the targets for training a much smaller network. The technique reduced the size of the network by about 80 percent, allowing it to run efficiently on a smartphone. With the reduced network, the eye tracker runs at about 15 frames per second, fast enough to register even fleeting glances.
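The dark-knowledge technique was originally described for classification, where the small network learns the large one’s softened class probabilities; for a regression task like gaze estimation, the natural analogue is to train the student to reproduce the teacher’s predicted coordinates. A minimal sketch of one such training step, again assuming PyTorch and hypothetical `teacher` and `student` modules:

```python
import torch
import torch.nn as nn

def distill_step(teacher, student, faces, optimizer):
    """One 'dark knowledge' step: the small student network is trained to
    reproduce the big teacher network's outputs, not the true labels."""
    with torch.no_grad():
        soft_targets = teacher(faces)    # the teacher's approximate answers
    preds = student(faces)
    loss = nn.functional.mse_loss(preds, soft_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```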
Noah Snavely, an associate professor of computer science at Cornell University, says, “In lots of cases, if you want to do a user study, in computer vision, in marketing, in developing new user interfaces, eye tracking is something people have been very interested in, but it hasn’t really been accessible. You need expensive equipment, or it has to be calibrated very well in order to work. So something that will work on a device everyone has seems very compelling. And from what I’ve seen, the accuracy they get seems like it’s in the ballpark where you can do something interesting.”
“Part of the excitement is that they’ve also created this way of collecting data, and also the data set itself. They did all the legwork that will make other people interested in this problem. And the fact that the community will start working on this will lead to fast improvements,” he explained.