Human genetic research has advanced enormously over the past two decades, generating genomic data for hundreds of thousands of individuals, including thousands of prehistoric people. However, differences in methods and data quality make it difficult to compare these datasets with one another.
To map how individuals worldwide are related to one another, scientists from the University of Oxford’s Big Data Institute have created a new method that combines data from multiple sources and can scale to millions of genome sequences.
The scientists applied a tree-sequence method to ancient and modern human genomes to build a unified human genealogy. In other words, they built a huge family tree. It allows them to describe how every person’s genetic sequence relates to every other person’s, at every point along the genome.
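To make the idea of a tree sequence more concrete, here is a minimal Python sketch. It assumes a simplified encoding in which each edge records a parent-child link that holds only over one interval of the genome, so a single sample can have different ancestors at different positions. The toy data and the helper names `Edge`, `parent_at` and `lineage_at` are hypothetical illustrations, not the study's software.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    left: float   # start of the genomic interval (inclusive)
    right: float  # end of the genomic interval (exclusive)
    parent: int
    child: int


# Hypothetical toy genealogy for samples 0, 1 and 2 over a genome of length 100.
# Positions [0, 50) follow the tree ((0,1)3, 2)4; positions [50, 100) follow ((1,2)4, 0)5.
EDGES = [
    Edge(0, 50, 3, 0),
    Edge(0, 50, 3, 1),
    Edge(0, 50, 4, 3),
    Edge(0, 100, 4, 2),  # same link in both trees, so it is stored only once
    Edge(50, 100, 4, 1),
    Edge(50, 100, 5, 4),
    Edge(50, 100, 5, 0),
]


def parent_at(edges, child, position):
    """Return the parent of `child` at `position`, or None if it has no parent there."""
    for e in edges:
        if e.child == child and e.left <= position < e.right:
            return e.parent
    return None


def lineage_at(edges, node, position):
    """Return `node` followed by all of its ancestors at `position`, up to the root."""
    path = [node]
    parent = parent_at(edges, node, position)
    while parent is not None:
        path.append(parent)
        parent = parent_at(edges, parent, position)
    return path


if __name__ == "__main__":
    # The same sample traces back through different ancestors on either side of position 50.
    print(lineage_at(EDGES, 0, 25))  # [0, 3, 4]
    print(lineage_at(EDGES, 0, 75))  # [0, 5]
```

Storing intervals on edges, rather than one full tree per position, lets relationships shared by neighbouring trees be recorded once; libraries such as tskit provide optimized versions of this kind of structure.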
Dr. Yan Wong, an evolutionary geneticist at the Big Data Institute and one of the principal authors, explained: “It is a genealogy for all of humanity that models as exactly as we can the history that generated all the genetic variation we find in humans today.”
For this study, the scientists combined eight different datasets into a single tree sequence of 3,609 individual genome sequences from 215 populations. The ancient genomes included samples from around the world, ranging in age from 1,000 to more than 100,000 years.
Using simulations and empirical analyses, the scientists showed that the method can recover relationships between individuals and populations and identify descendants of ancient samples. They then calculated the distribution of times to the most recent common ancestor between the 215 populations in the constituent datasets.
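The time to the most recent common ancestor (TMRCA) of two genomes varies along the genome, because each region can follow a different local tree. The sketch below illustrates that idea on a hypothetical toy genealogy, assuming node times are given in generations before present: it finds the TMRCA of two samples in each genomic interval, then averages across the genome weighted by interval length. All names and numbers are invented for illustration; this is not the study's code or data.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    left: float   # start of the genomic interval (inclusive)
    right: float  # end of the genomic interval (exclusive)
    parent: int
    child: int


# Hypothetical toy genealogy: samples 0, 1, 2; node times in generations ago.
NODE_TIME = {0: 0, 1: 0, 2: 0, 3: 10, 4: 20, 5: 30}
EDGES = [
    Edge(0, 50, 3, 0), Edge(0, 50, 3, 1), Edge(0, 50, 4, 3),
    Edge(0, 100, 4, 2), Edge(50, 100, 4, 1),
    Edge(50, 100, 5, 4), Edge(50, 100, 5, 0),
]


def parent_at(edges, child, position):
    """Return the parent of `child` at `position`, or None if it has no parent there."""
    for e in edges:
        if e.child == child and e.left <= position < e.right:
            return e.parent
    return None


def tmrca_at(edges, times, a, b, position):
    """Time of the most recent common ancestor of a and b in the local tree at `position`."""
    ancestors_of_a = set()
    node = a
    while node is not None:
        ancestors_of_a.add(node)
        node = parent_at(edges, node, position)
    node = b
    while node is not None:
        # Walking up from b, the first node that is also an ancestor of a is their MRCA.
        if node in ancestors_of_a:
            return times[node]
        node = parent_at(edges, node, position)
    return None  # a and b are not connected at this position


def tmrca_distribution(edges, times, a, b):
    """List (left, right, tmrca) for each interval between consecutive edge breakpoints."""
    points = sorted({x for e in edges for x in (e.left, e.right)})
    return [(left, right, tmrca_at(edges, times, a, b, left))
            for left, right in zip(points, points[1:])]


def mean_tmrca(edges, times, a, b):
    """Genome-wide average TMRCA, weighting each interval by its length."""
    rows = [(r - l, t) for l, r, t in tmrca_distribution(edges, times, a, b) if t is not None]
    span = sum(s for s, _ in rows)
    return sum(s * t for s, t in rows) / span if span else None


if __name__ == "__main__":
    print(tmrca_distribution(EDGES, NODE_TIME, 0, 1))  # [(0, 50, 10), (50, 100, 30)]
    print(mean_tmrca(EDGES, NODE_TIME, 0, 1))          # 20.0
```

Summarizing such per-interval times across many pairs of samples, grouped by population, gives a distribution of coalescence times of the kind described above.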
The algorithms inferred where common ancestors must be present in the evolutionary trees to explain the observed patterns of genetic variation. The resulting network contained almost 27 million ancestors.
The scientists then added geographic location data for the sampled genomes. The results recaptured key events in human evolutionary history, including the migration out of Africa.
Lead author Dr. Anthony Wilder Wohns, who undertook the research as part of his Ph.D. at the Big Data Institute, said, “The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples.”
“While humans are the focus of this study, the method is valid for most living things, from orangutans to bacteria. It could be particularly beneficial in medical genetics, in separating true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history.”
Dr. Wong said, “This study is laying the groundwork for the next generation of DNA sequencing. As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become even more accurate, and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today.”
The scientists plan to make the map more comprehensive by incorporating new genetic data as it becomes available.