The human brain can recognize a particular sound and the direction it comes from. Because a sound reaches the two ears at slightly different times and intensities, parts of the midbrain are specialized to compare these slight differences and estimate the direction the sound came from.
Sound localization refers to our ability to identify the direction of a sound source. The task becomes more complex under real-world conditions, where the environment produces echoes and many overlapping sounds.
MIT scientists have developed a computer model of sound localization. The model uses convolutional neural networks to localize sounds as humans do, and it struggles in the same ways humans do.
Scientists have long wanted to develop computer models that perform the same kind of calculations the brain uses to localize sounds. Previous models, however, worked well in idealized settings without background noise but failed in real-world environments, with their noises and echoes.
This new computer model can localize sounds in the real world.
Josh McDermott, an associate professor of brain and cognitive sciences and a member of MIT’s McGovern Institute for Brain Research, said, “And when we treated the model as a human experimental participant and simulated this large set of experiments that people had tested humans on in the past, what we found over and over again is that the model recapitulates the results that you see in humans.”
Using a supercomputer, scientists trained and tested about 1,500 different models. That search identified ten that seemed best suited for localization. The scientists further trained these models and used them for subsequent studies.
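The article does not describe the searched architectures in detail, but the general shape of a binaural localization network can be sketched. Below is a toy PyTorch model, purely illustrative: it maps a two-channel (left/right ear) time-frequency input to logits over azimuth bins. The layer sizes and the 72-bin azimuth grid are assumptions, not the actual models from the study.

```python
import torch
import torch.nn as nn

class ToyLocalizer(nn.Module):
    """Toy stand-in for a binaural localization CNN: two input channels
    (one per ear), a couple of conv layers, and a classifier over
    azimuth bins. Illustrative only; the paper's models were far larger."""
    def __init__(self, n_azimuth_bins=72):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),  # 2 ear channels in
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # pool over freq & time
            nn.Flatten(),
            nn.Linear(32, n_azimuth_bins),               # azimuth logits
        )

    def forward(self, x):  # x: (batch, 2, freq, time)
        return self.net(x)

logits = ToyLocalizer()(torch.randn(1, 2, 64, 100))  # shape (1, 72)
```

Feeding the model a two-ear representation, rather than a single waveform, is what lets it learn the interaural cues discussed below.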
For training the models, they created a virtual world in which they could control the size of the room and the reflective properties of its walls.
All of the sounds fed to the models originated from somewhere in these virtual rooms. The set of more than 400 training sounds included:
- Human voices
- Animal sounds
- Machine sounds such as car engines
- Natural sounds such as thunder
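The article does not detail the simulator's internals; a common way to build such virtual rooms is the image-source method, in which each wall reflection behaves like a mirrored copy of the source. The NumPy sketch below is a minimal illustration (first-order reflections only; the room geometry, absorption value, and sample rate are assumptions), not the MIT pipeline:

```python
import numpy as np

C = 343.0    # speed of sound (m/s)
FS = 16000   # assumed sample rate (Hz)

def first_order_ir(room, src, mic, absorption=0.3, length=4000):
    """Room impulse response from the direct path plus first-order wall
    reflections: each reflection is modeled as a mirrored image of the
    source on the far side of the corresponding wall."""
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    images = [src]                          # direct path first
    for axis in range(3):                   # mirror across both walls per axis
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2.0 * wall - img[axis]
            images.append(img)
    ir = np.zeros(length)
    for i, img in enumerate(images):
        dist = np.linalg.norm(img - mic)
        delay = int(round(dist / C * FS))               # samples of travel time
        gain = (1.0 if i == 0 else 1.0 - absorption) / max(dist, 1e-6)
        if delay < length:
            ir[delay] += gain
    return ir

# A reverberant "wet" signal is the dry sound convolved with the room IR.
rng = np.random.default_rng(0)
dry = rng.standard_normal(FS // 2)
ir = first_order_ir(room=[6.0, 4.0, 3.0], src=[1.0, 2.0, 1.5], mic=[4.0, 2.5, 1.5])
wet = np.convolve(dry, ir)
```

Varying the room dimensions and the absorption term is what lets a simulator of this kind expose a model to many different echo patterns during training.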
Scientists ensured that the model started with the same information provided by human ears. Several folds in the outer ear reflect sound, altering the frequencies that enter the ear, and these reflections vary with the direction of the sound.
This effect was simulated by running each sound through a specialized mathematical function before feeding it to the computer model.
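In practice this kind of function is a head-related transfer function (HRTF): a direction-dependent filter per ear, applied by convolution. A minimal sketch, with made-up three-tap impulse responses standing in for measured HRIRs:

```python
import numpy as np

def apply_ear_filters(sound, hrir_left, hrir_right):
    """Filter a mono sound with one impulse response per ear. In a real
    pipeline the HRIRs are measured and change with source direction;
    convolving with them imprints the direction-dependent cues the
    outer ear would add."""
    left = np.convolve(sound, hrir_left)
    right = np.convolve(sound, hrir_right)
    return np.stack([left, right])   # shape: (2, len(sound) + taps - 1)

# Toy HRIRs (same length so the channels stack): the right ear hears the
# sound two samples later and at half the amplitude.
hrir_left = np.array([1.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.5])
binaural = apply_ear_filters(np.array([1.0, -0.5, 0.25]), hrir_left, hrir_right)
```

Swapping in a different HRIR pair per direction is what gives the model direction-specific input, mirroring what human ears provide.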
MIT graduate student Andrew Francl said, “This allows us to give the model the same kind of information that a person would have.”
“Although the model was trained in a virtual world, when we evaluated it, it could localize sounds in the real world.”
During tests, the model also showed the same pattern of sensitivity to frequency as ears.
McDermott said, “The model seems to use timing and level differences between the two ears in the same way that people do, in a way that’s frequency-dependent.”
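Those two binaural cues can be estimated directly from a pair of ear signals. The sketch below is a broadband illustration (the sample rate is an assumption, and the model itself learns these cues per frequency band rather than computing them explicitly): the interaural time difference (ITD) comes from the cross-correlation peak, and the interaural level difference (ILD) from the RMS ratio.

```python
import numpy as np

FS = 44100  # assumed sample rate (Hz)

def itd_ild(left, right):
    """ITD in seconds (positive when the right ear lags, i.e. the source
    is to the left) from the cross-correlation peak, and ILD in dB from
    the RMS ratio of the two ear signals."""
    corr = np.correlate(left, right, mode="full")
    lag = (len(right) - 1) - np.argmax(corr)   # samples by which right lags
    itd = lag / FS

    def rms(x):
        return np.sqrt(np.mean(np.square(x)))

    ild = 20.0 * np.log10(rms(left) / rms(right))
    return itd, ild

# Noise that reaches the right ear 10 samples later and ~6 dB quieter.
rng = np.random.default_rng(1)
noise = rng.standard_normal(2000)
left = noise
right = np.zeros_like(noise)
right[10:] = 0.5 * noise[:-10]
itd, ild = itd_ild(left, right)
```

The frequency dependence McDermott describes would correspond to running an estimate like this separately within each frequency band, since ITDs dominate at low frequencies and ILDs at high frequencies.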
Francl said, “As you add more and more sources, you get a specific pattern of decline in humans’ ability to accurately judge the number of sources present and their ability to localize those sources. Humans seem to be limited to localizing about three sources at once, and when we ran the same test on the model, we saw a similar pattern of behavior.”
Scientists noted, “When the models trained in these strange worlds were evaluated on the same battery of behavioral tests, the models deviated from human behavior, and how they failed varied depending on the type of environment they had been trained in. These results support the idea that the localization abilities of the human brain are adapted to the environments in which humans evolved.”
- Francl, A., McDermott, J.H. Deep neural network models of sound localization reveal how perception is adapted to real-world environments. Nat Hum Behav 6, 111–133 (2022). DOI: 10.1038/s41562-021-01244-z