Machine learning models rank predictive risks for Alzheimer's disease

Alzheimer’s disease (AD) is the most common late-onset neurodegenerative disorder. Identifying individuals at increased risk of developing AD is important for early intervention.

In a new study, scientists at The Ohio State University have developed machine learning models that rank risk factors by how strongly they are associated with the eventual development of Alzheimer’s disease. This is the first study to construct machine learning models with genetic risk scores, non-genetic information, and electronic health record data.

Using the models, scientists ranked predictive risk factors for two populations from the UK Biobank: White individuals aged 40 and older and a subset of those adults who were 65 or older.

The strongest risk factor for Alzheimer’s in the general population is age, which, according to the Alzheimer’s Association, accounts for one-third of total risk by the age of 85. Among older adults, however, genetic risk as measured by a polygenic risk score was more predictive than age.

Lead study author Xiaoyi Raymond Gao, associate professor of ophthalmology and visual sciences and biomedical informatics at The Ohio State University College of Medicine, said, “We all know Alzheimer’s disease is a later-onset disease, so we know age is an important risk factor. But when we consider risk only for people age 65 or older, then genetic information captured by a polygenic risk score ranks higher than age.”

“That means it’s really important to consider genetic information when we work on Alzheimer’s disease.”

“The finding related to income is very, very interesting. We all want to have a healthy life, and income can be such an important factor in deciding what you can afford to eat, where you can afford to live, your education level, and access to care – all of these possibly contribute to Alzheimer’s disease.”

Of the 457,936 people in the UK Biobank sample, 2,177 had Alzheimer’s disease and 455,759 did not; 88,309 participants were 65 or older.
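To put these counts in perspective, the short calculation below works out how rare Alzheimer’s cases are in the sample. It uses only the figures quoted above; the variable names are illustrative and do not come from the study.

```python
# Counts reported in the article for the UK Biobank sample.
cases = 2_177        # participants with Alzheimer's disease
controls = 455_759   # participants without Alzheimer's disease
total = cases + controls

prevalence = cases / total            # fraction of the sample with AD
controls_per_case = controls / cases  # rough measure of class imbalance

print(f"total participants: {total:,}")
print(f"AD prevalence in sample: {prevalence:.2%}")
print(f"controls per case: {controls_per_case:.0f}")
```

Fewer than one participant in 200 had Alzheimer’s disease, so any model trained on this sample has to cope with a strongly imbalanced outcome.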

Several non-genetic risk factors differed between those with and without Alzheimer’s disease (AD): higher systolic and lower diastolic blood pressure, diabetes, lower household income and education, recent falls, hearing impairment, and a maternal history of AD were all more prevalent in adults with AD.

Diagnoses of high blood pressure, urinary tract infection, depressive episodes, fainting, vague chest pain, disorientation, and abnormal weight loss were also among the top 20 risk factors for the entire sample of adults. High cholesterol and irregular gait were two additional risk factors in the top 20 for adults 65 and older. These results demonstrated the value of including condition codes from electronic medical records in the models.

Gao said, “Machine learning can explore relationships among all of those features, or variables, pick the important features and rank certain features at the top that contribute much more to Alzheimer’s disease risk than the rest. Typically, it’s not good to be highly obese, but we also see here that a lower body mass index is not good. High blood pressure is typically not good, but here we see lower diastolic blood pressure is not good. The models revealed some interesting patterns.”

The models were constructed in two steps. The researchers first performed genome-wide association studies, using data from the Alzheimer’s Disease Genetics Consortium, to find genetic variants connected to the overall risk of developing Alzheimer’s disease and to the onset of the condition after a certain age. The two resulting sets of variants were used to create two polygenic risk scores, which combine genetic influences across the genome into a single risk estimate for each person.
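A polygenic risk score is commonly computed as a weighted sum: each variant’s effect size from the association study is multiplied by the number of risk alleles a person carries (0, 1, or 2), and the products are added up. The sketch below illustrates that general idea only. The variant IDs, effect sizes, and function name are hypothetical, and the study’s actual scoring pipeline may differ.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele counts.

    dosages: variant ID -> number of risk alleles carried (0, 1, or 2)
    effect_sizes: variant ID -> per-allele weight from an association study
    Variants missing from `dosages` are treated as contributing nothing.
    """
    return sum(
        weight * dosages.get(variant, 0)
        for variant, weight in effect_sizes.items()
    )


# Hypothetical weights and genotypes, for illustration only.
effect_sizes = {"variant_a": 0.5, "variant_b": -0.2, "variant_c": 0.1}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(person, effect_sizes)
print(f"polygenic risk score: {score:.2f}")
```

Two scores built from different variant sets, as in the study, would simply call the same function with two different `effect_sizes` tables.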

These scores were then applied to DNA data from UK Biobank participants and combined with biobank data on traditional risk factors such as sex, education, body mass index, and blood pressure, as well as more than 11,000 condition codes recorded in individual participants’ electronic health records.

The team also used an algorithm to interpret the model’s output, so that risk factor variables were weighted objectively in the analysis.
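The article does not name the interpretation algorithm, so the following is only a sketch of one common way to turn per-person explanations into a ranking. It assumes an explanation method (for example, a SHAP-style approach) has already assigned each feature a signed contribution to each person’s predicted risk, and it ranks features by their mean absolute contribution. The feature names and numbers are toy values, not results from the study.

```python
def rank_features(attributions):
    """Rank features by mean absolute contribution across people.

    attributions: list of dicts, one per person, mapping
    feature name -> signed contribution to that person's predicted risk.
    Returns (feature, mean absolute contribution) pairs, largest first.
    """
    totals = {}
    for person in attributions:
        for feature, value in person.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n_people = len(attributions)
    means = {feature: total / n_people for feature, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy contributions for three hypothetical people (illustration only).
toy_attributions = [
    {"age": 0.30, "polygenic_risk_score": 0.10, "household_income": -0.05},
    {"age": -0.20, "polygenic_risk_score": 0.40, "household_income": 0.05},
    {"age": 0.10, "polygenic_risk_score": -0.30, "household_income": -0.02},
]

ranking = rank_features(toy_attributions)
for feature, importance in ranking:
    print(f"{feature}: {importance:.3f}")
```

Taking absolute values means a feature counts as important whether it raises or lowers predicted risk, which matters for factors such as blood pressure or body mass index where both directions can be informative.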

The scientists noted that although we are born with our genetic risk for disease already established, information about how other health and socioeconomic factors affect the risk for Alzheimer’s, as well as for glaucoma, which Gao also studies, gives people the power to take preventive measures.

Gao said, “If people know more about risk factors, they can adjust their lifestyle. For both Alzheimer’s and glaucoma, there is no cure, so prevention can help a lot. I also hope constructing models to make these predictions could help with drug development and effective and low-cost screening programs.”

Journal Reference:

Gao, X.R., Chiariglione, M., Qin, K. et al. Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer’s disease prediction. Sci Rep 13, 450 (2023). DOI: 10.1038/s41598-023-27551-1
