Scientists at deCODE genetics, in collaboration with scientists in Denmark, have released the first report from the largest whole genome sequencing effort to date. They report on the genome sequences of 150,119 participants in the UK Biobank.
This huge dataset allowed the scientists to distinguish between regions that can tolerate a high degree of sequence diversity and those that cannot. Regions that are intolerant of sequence variation are assumed to be crucial for human reproduction and survival. Coding exons are widely believed to be the regions most vital to human existence. However, only 13% of the sequence in the most conserved 1% of the genome turned out to be coding exons.
The scientists found 600 million SNPs and indels in these 150 thousand genomes, corresponding to 7% of the variants that could theoretically occur in the human genome. However, some of the theoretically possible variants are likely incompatible with life.
Kari Stefansson, the founder of deCODE and one of the authors of the paper, said, “Data of this type and quantity are going to revolutionize our ability to identify and characterize intergenic sequences of importance to human diversity, be it to the risk of disease and response to treatment or some other attributes.”
The scientists also found associations between diseases and other phenotypes and variants that would not have been identified through whole exome sequencing.
Of the participants in the study, 85% could trace most of their ancestry to the British Isles. Most of the remaining participants traced their ancestry mainly to Africa or South Asia.
Scientists noted, “This study is likely to represent the largest set of whole genome sequenced individuals of African and South-Asian origin. However, the imbalance in the ethnic mix of those contributing sequences to this study and other studies already published is unfortunate from both societal and scientific points of view. Scientists at deCODE genetics are determined to work towards more ethnically balanced sequencing cohorts in the future.”
- Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature (2022). DOI: 10.1038/s41586-022-04965-x.