Machine-learning system processes sounds like humans do

Neuroscientists train a deep neural network to analyze speech and music.

MIT neuroscientists have developed a machine-learning system that can process speech and music the same way that humans do.

MIT scientists have devised a new model that matches human performance on auditory tasks such as identifying a musical genre. The model, which consists of many layers of processing units that can be trained on large volumes of data to perform specific tasks, was used by the researchers to shed light on how the human brain might perform the same tasks.

The study suggests that the human auditory cortex is hierarchical, much like the visual cortex. In this kind of arrangement, sensory information passes through successive stages of processing, with basic information processed earlier and more advanced features, such as word meaning, extracted in later stages.

The scientists trained the neural network to perform two auditory tasks, one involving speech and the other involving music. For the speech task, the researchers gave the model thousands of two-second recordings of a person talking, and the task was to identify the word in the middle of the clip. For the music task, the model was asked to identify the genre of a two-second clip of music. Each clip also included background noise to make the task more realistic.
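The clip-preparation step described above amounts to mixing a clean two-second recording with background noise at a chosen signal-to-noise ratio. A minimal sketch of that idea follows; the function name, sampling rate, SNR value, and synthetic signals are illustrative assumptions, not details from the study:

```python
import numpy as np

def mix_with_noise(signal, noise, snr_db):
    """Mix a clean clip with background noise at a target SNR (in dB)."""
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that sig_power / scaled_noise_power == 10**(snr_db/10)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

# Two seconds of audio at an assumed 16 kHz sampling rate
rng = np.random.default_rng(0)
t = np.linspace(0, 2, 32000, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t)    # stand-in for a speech clip
noise = rng.standard_normal(32000)      # stand-in for background noise
noisy_clip = mix_with_noise(speech, noise, snr_db=6)
```

Clips prepared this way can then be fed to the network as training examples, with the noise level controlling how hard the task is.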

After training on many examples, the model learned to perform the tasks about as accurately as a human listener.

MIT graduate student Alexander Kell and Stanford University Assistant Professor Daniel Yamins are the paper’s lead authors. Other authors are former MIT visiting student Erica Shook and former MIT postdoc Sam Norman-Haignere.

Kell said, “That’s been an exciting opportunity for neuroscience, in that we can actually create systems that can do some of the things people can do, and we can then interrogate the models and compare them to the brain.”

“The idea is over time the model gets better and better at the task. The hope is that it’s learning something general, so if you present a new sound that the model has never heard before, it will do well, and in practice that is often the case.”

Like humans, the model also tended to make the most mistakes on the same clips that humans found hardest. Its processing units can be combined in various ways into different architectures that affect the model's performance.

The scientists then used their model to investigate whether the human auditory cortex has a hierarchical structure.

Josh McDermott, the Frederick A. and Carole J. Middleton Assistant Professor of Neuroscience in the Department of Brain and Cognitive Sciences at MIT, said, “In a hierarchical system, a series of brain regions perform different types of computation on sensory information as it flows through the system. It has been well documented that the visual cortex has this type of organization. Earlier regions, known as the primary visual cortex, respond to simple features such as color or orientation. Later stages enable more complex tasks such as object recognition.”

“We thought that if we could construct a model that could do some of the same things that people do, we might then be able to compare different stages of the model to different parts of the brain and get some evidence for whether those parts of the brain might be hierarchically organized.”

The researchers found that in their model, basic features of sound, such as frequency, are easier to extract in the early stages. As information moves farther along the network, it becomes harder to extract frequency but easier to extract higher-level information, such as words.

To check whether the model's stages might replicate how the human auditory cortex processes sound information, the researchers used functional magnetic resonance imaging (fMRI) to measure different regions of the auditory cortex as the brain processes real-world sounds. They then compared the brain responses to the responses in the model when it processed the same sounds.
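One simple way to make such a comparison is to correlate each model stage's responses with each brain region's responses across a shared set of sounds and see which stage matches best. The sketch below illustrates only that idea with synthetic data; the actual study used regression-based predictions of voxel responses, and all names and numbers here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sounds = 100

# Synthetic "responses" to the same 100 sounds: four model stages
# and two brain regions built to resemble particular stages.
stages = {f"stage_{i}": rng.standard_normal(n_sounds) for i in range(4)}
primary = stages["stage_1"] + 0.3 * rng.standard_normal(n_sounds)
non_primary = stages["stage_3"] + 0.3 * rng.standard_normal(n_sounds)

def best_matching_stage(region):
    """Return the model stage whose responses correlate best with the region."""
    corrs = {name: np.corrcoef(acts, region)[0, 1]
             for name, acts in stages.items()}
    return max(corrs, key=corrs.get)

print(best_matching_stage(primary))      # a middle stage, by construction
print(best_matching_stage(non_primary))  # a later stage, by construction
```

In the study, the analogous analysis is what linked middle model stages to primary auditory cortex and later stages to non-primary regions.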

They found that the middle stages of the model corresponded best to activity in the primary auditory cortex, and later stages corresponded best to activity outside the primary cortex. This provides evidence that the auditory cortex may be arranged hierarchically, like the visual cortex.

McDermott said, “We found a distinction between primary auditory cortex and everything else.”

Alex Huth, an assistant professor of neuroscience and computer science at the University of Texas at Austin, says the paper is exciting in part because it offers convincing evidence that the early part of the auditory cortex performs generic sound processing while the higher auditory cortex performs more specialized tasks.

“This is one of the ongoing mysteries in auditory neuroscience: What distinguishes the early auditory cortex from the higher auditory cortex? This is the first paper I’ve seen that has a computational hypothesis for that,” said Huth, who was not involved in the research.

The scientists now plan to develop models that can perform other types of auditory tasks, such as determining the location from which a particular sound came. They hope to learn whether these tasks can be done by the pathways identified in this model or whether they require separate pathways, which could then be investigated in the brain.

The study appears in the April 19 issue of Neuron.