Enabling AI models to see the world more like humans do

Researchers enhance peripheral vision in AI models.

Peripheral vision is the human ability to see objects that are not directly in front of us. Although it is less precise than central vision, it helps us notice things outside our direct line of sight, such as an approaching car.

AI models, however, lack peripheral vision. To address this, researchers at MIT created a dataset for training artificial intelligence to simulate peripheral vision the way humans experience it. They found that while training on this dataset helped the models detect objects in the visual periphery, their performance still fell short of humans'.

Surprisingly, and unlike humans, the AI models performed equally well regardless of object size or how cluttered the scene was.

Vasha DuTell, a postdoc and co-author of a paper detailing this study, said, “There is something fundamental going on here. We tested so many different models, and even when we trained them, they got a little bit better, but they were not quite like humans. So, the question is: What is missing in these models?”

Many existing approaches model peripheral vision in AI by blurring the edges of images to represent the loss of detail, but the information loss that occurs in the optic nerve and visual cortex is far more complex.
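
To make the contrast concrete, here is a minimal sketch of that simple blur-based approach: Gaussian blur whose strength grows with distance from a fixation point. The function name and parameters are illustrative assumptions, and this is precisely the kind of crude approximation the researchers moved beyond.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_foveate(image, fixation, sigma_per_pixel=0.01):
    """Crude stand-in for peripheral vision: blur strength grows with
    distance from the fixation point. This is the simple blurring
    idea described above, NOT the texture tiling model."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity proxy: pixel distance from the fixation point.
    dist = np.hypot(ys - fixation[0], xs - fixation[1])
    out = np.empty_like(image, dtype=float)
    # Blend progressively blurred copies, band by band of eccentricity.
    edges = np.linspace(0.0, dist.max() + 1.0, 6)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (dist >= lo) & (dist < hi)
        sigma = sigma_per_pixel * (lo + hi) / 2.0
        channel_sigma = (sigma, sigma, 0) if image.ndim == 3 else sigma
        blurred = gaussian_filter(image.astype(float), sigma=channel_sigma)
        out[mask] = blurred[mask]
    return out
```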

To capture this more accurately, the MIT researchers started from a technique used to model human peripheral vision. This technique, called the texture tiling model, transforms images to simulate the loss of visual information a human experiences.

The researchers modified this model so it can transform images flexibly, without needing to know in advance where the AI or the person will direct their eyes. This let them model peripheral vision the same way it is studied in human vision research.

Using this modified technique, the scientists generated a massive collection of transformed images that appear more texture-like in certain regions, simulating the loss of detail that occurs as objects fall farther into a person's periphery.
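
A rough sketch of what generating such a dataset might look like is below. The `transform` argument is a hypothetical placeholder for the researchers' modified texture tiling model (which is far more sophisticated than anything shown here), and the eccentricity levels and file layout are assumptions for illustration.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Assumed eccentricity levels, in degrees of visual angle.
ECCENTRICITIES = [5, 10, 15, 20]

def build_dataset(src_dir, dst_dir, transform):
    """Apply a peripheral-vision transform to every source image at
    several simulated eccentricities. `transform` stands in for the
    researchers' modified texture tiling model."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        image = np.asarray(Image.open(img_path).convert("RGB"))
        for ecc in ECCENTRICITIES:
            # Larger eccentricity -> stronger texture-like distortion.
            out = transform(image, eccentricity=ecc)
            Image.fromarray(out.astype("uint8")).save(
                dst / f"{img_path.stem}_ecc{ecc}.png"
            )
```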

They then trained several computer vision models on this dataset and compared their performance with that of humans on an object-detection task.

DuTell said, “We had to be very clever in how we set up the experiment so we could also test it in the machine learning models. We didn’t want to have to retrain the models on a toy task that they weren’t meant to be doing.”

Humans and models were shown pairs of transformed images that were identical except that one contained a target object in the periphery. Each participant was then asked to pick the image containing the target.
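
The following sketch shows how such a two-alternative forced-choice test could be scored for a model, assuming a hypothetical `model_score` callable that returns the model's confidence that a target is present in an image.

```python
def two_afc_accuracy(model_score, image_pairs, target):
    """Score a model on a two-alternative forced-choice task. Each
    pair holds two images that are identical except that the first
    contains the target in the periphery. `model_score` is a
    hypothetical callable returning a detection confidence."""
    correct = 0
    for img_with_target, img_without in image_pairs:
        # The model's "choice" is whichever image it scores higher.
        if model_score(img_with_target, target) > model_score(img_without, target):
            correct += 1
    return correct / len(image_pairs)
```

In this setup, a model with no sensitivity to the peripheral target would hover around 50 percent, so only accuracy above that reflects genuine detection.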

Training models from scratch on the researchers' dataset produced the largest improvement in their ability to detect and recognize objects. Fine-tuning a pre-trained model on the dataset, which means adjusting an existing model so it can perform a new task, yielded smaller gains.
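
As a generic illustration of the difference between the two training regimes (not the paper's exact recipe), here is how training from scratch versus fine-tuning a pre-trained model is typically set up in PyTorch with torchvision; the number of classes is an assumption.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

NUM_CLASSES = 80  # assumed number of object categories

# From scratch: random weights, everything learned from the new data.
scratch_model = resnet50(weights=None, num_classes=NUM_CLASSES)

# Fine-tuning: start from ImageNet weights, then swap the
# classification head for the new task.
finetuned = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
finetuned.fc = torch.nn.Linear(finetuned.fc.in_features, NUM_CLASSES)

# A common fine-tuning choice: a much smaller learning rate for the
# pretrained backbone than for the freshly initialized head.
optimizer = torch.optim.AdamW([
    {"params": (p for n, p in finetuned.named_parameters()
                if not n.startswith("fc.")), "lr": 1e-5},
    {"params": finetuned.fc.parameters(), "lr": 1e-3},
])
```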

However, the machines never matched human accuracy, and they were especially poor at detecting objects in the far periphery. Their performance also did not follow the same patterns as humans'.

Anne Harrington, the paper's lead author, said, “That might suggest that the models aren’t using context in the same way as humans are to do these detection tasks. The strategy of the models might be different.”

The researchers plan to investigate these differences further, with the aim of building a model that can predict human performance in the visual periphery. Such a model could, for example, enable AI systems that warn drivers of hazards they might not see. They have also made their dataset freely available to encourage other researchers to pursue further computer vision work.

Justin Gardner, an associate professor in the Department of Psychology at Stanford University who was not involved with this work, said, “This work is important because it contributes to our understanding that human vision in the periphery should not be considered just impoverished vision due to limits in the number of photoreceptors we have, but rather, a representation that is optimized for us to perform tasks of real-world consequence.”

“Moreover, the work shows that neural network models, despite their advancement in recent years, are unable to match human performance in this regard, which should lead to more AI research to learn from the neuroscience of human vision. This future research will be aided significantly by the database of images provided by the authors to mimic peripheral human vision.”

Journal Reference:

  1. Anne Harrington, Vasha DuTell, et al. COCO-Periph: Bridging the Gap Between Human and Machine Perception in the Periphery.
