New model can detect depression in natural conversation

Neural network learns speech patterns that predict depression in clinical interviews.

To diagnose depression, clinicians interview patients, asking specific questions about, say, past mental illnesses, lifestyle, and mood. Now, MIT scientists have come up with a method that may help clinicians detect depression more easily.

Scientists have developed a neural-network model that can be unleashed on raw text and audio data from interviews to discover speech patterns indicative of depression. The neural network can accurately predict whether an individual is depressed, without needing any other information about the questions and answers.

According to the scientists, this method may have applications in developing tools to detect signs of depression in natural conversation.

First author Tuka Alhanai, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL), said, “The first hints we have that a person is happy, excited … If you want to deploy [depression-detection] models in a scalable way … you want to minimize the amount of constraints you have on the data you’re using. You want to deploy it in any regular conversation and have the model pick up, from the natural interaction, the state of the individual.”

Co-author James Glass, a senior research scientist in CSAIL, said, “The technology could still, of course, be used for identifying mental distress in casual conversations in clinical offices. Every patient will talk differently, and if the model sees changes maybe it will be a flag to the doctors. This is a step forward in seeing if we can do something assistive to help clinicians.”

The model works by detecting patterns indicative of depression and then mapping those patterns to new individuals, with no additional information. Other models, by contrast, are provided with a specific set of questions and then given examples of how a person without depression responds and how a person with depression responds to, for example, a straightforward inquiry.

The researchers, on the other hand, used a technique called sequence modeling, often used for speech processing. With this technique, they fed the model sequences of text and audio data from questions and answers, from both depressed and non-depressed individuals, one by one. As the sequences accumulated, the model extracted speech patterns that emerged for people with or without depression.

Words such as “sad,” “low,” or “down” may be paired with audio signals that are flatter and more monotone. Individuals with depression may also speak more slowly and use longer pauses between words. These text and audio identifiers for mental distress have been explored in previous research. It was ultimately up to the model to determine whether any patterns were predictive of depression.

The model then checks sequences of words or speaking style and determines whether those patterns are more likely to be seen in people who are depressed or in people who are not.

The scientists tested their model on a dataset of 142 interactions from the Distress Analysis Interview Corpus, which contains audio, text, and video interviews of patients with mental-health issues and virtual agents controlled by humans.

Each subject is rated for depression on a scale from 0 to 27, using the Personal Health Questionnaire. Scores above a cutoff between moderate (10 to 14) and moderately severe (15 to 19) are considered depressed, while all others below that threshold are considered not depressed. Of all the subjects in the dataset, 28 (20 percent) are labeled as depressed.
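
The labeling rule above can be illustrated with a short Python sketch. It is not the researchers' code: the function names, the example scores, and the cutoff value of 15 are assumptions made for illustration, since the article only says the cutoff lies between the moderate and moderately severe bands. The last line checks the reported share, 28 of 142 subjects.

```python
def label_subjects(scores, cutoff):
    """Return True (depressed) for each questionnaire score at or above `cutoff`.

    `cutoff` is treated here as the lowest score counted as depressed; that
    convention and any specific value passed in are assumptions for illustration.
    """
    for score in scores:
        if not 0 <= score <= 27:
            raise ValueError(f"score {score} is outside the 0 to 27 scale")
    return [score >= cutoff for score in scores]


def depressed_share(labels):
    """Fraction of subjects labeled as depressed."""
    return sum(labels) / len(labels)


# Made-up scores, with a hypothetical cutoff of 15.
example_scores = [3, 12, 16, 21, 8]
print(label_subjects(example_scores, cutoff=15))  # [False, False, True, True, False]

# Share reported in the article: 28 of 142 subjects.
print(f"{28 / 142:.1%}")  # 19.7%, which rounds to the reported 20 percent
```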

In tests, the model was evaluated using the metrics of precision and recall. Precision measures how many of the subjects the model identified as depressed had actually been diagnosed as depressed. Recall measures how well the model detects all subjects who were diagnosed as depressed in the entire dataset. On precision, the model scored 71 percent and, on recall, 83 percent. The averaged combined score for those metrics, which accounts for any errors, was 77 percent. In the majority of tests, the researchers’ model outperformed nearly all other models.
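
A minimal Python sketch of these metrics follows. It assumes the combined score is the F1 score (the harmonic mean of precision and recall), which the article does not name; the function names and the example counts are hypothetical.

```python
def precision(true_pos, false_pos):
    """Share of subjects flagged by the model who were diagnosed as depressed."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of diagnosed-depressed subjects that the model flagged."""
    return true_pos / (true_pos + false_neg)


def f1_score(p, r):
    """Harmonic mean of precision and recall (assumed to be the combined score)."""
    return 2 * p * r / (p + r)


# Made-up counts, only to show how the two metrics differ.
print(round(precision(true_pos=5, false_pos=2), 3))  # 0.714
print(round(recall(true_pos=5, false_neg=1), 3))     # 0.833

# Figures reported in the article: 71 percent precision, 83 percent recall.
print(f"{f1_score(0.71, 0.83):.3f}")  # 0.765, which rounds to 77 percent
```

The plain arithmetic mean of 71 and 83 is also 77, so the reported figure is consistent with either reading.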

One key insight from the research, Alhanai said, is that, during experiments, the model needed much more data to predict depression from audio than from text. With text, the model can accurately detect depression using an average of seven question-answer sequences. With audio, the model needed around 30 sequences. That implies that the patterns in words people use that are predictive of depression happen in a shorter time span in text than in audio. Such insights could help the MIT researchers, and others, further refine their models.

This work represents a “very encouraging” pilot, Glass said. But now the researchers seek to discover what specific patterns the model identifies across scores of raw data. “Right now it’s a bit of a black box,” Glass said. “These systems, however, are more believable when you have an explanation of what they’re picking up. … The next challenge is finding out what data it’s seized upon.”

In the future, the model could, for instance, power mobile apps that monitor a user’s text and voice for mental distress and send alerts. This could be especially useful for those who can’t get to a clinician for an initial diagnosis, due to distance, cost, or a lack of awareness that something may be wrong.

The other co-author on the paper is Mohammad Ghassemi, a member of the Institute for Medical Engineering and Science (IMES).