Scientists discovered one million new components of the human genome

We’ve started to chip away at the dark genome by finding nearly one million previously unknown exons.


The human genome comprises approximately 20,000 protein-coding genes. These genes contain around 180,000 internal exons, the segments that encode proteins. Yet these protein-coding regions make up only about one percent of the entire human genome.

The rest, often dubbed the “dark genome,” is still largely unexplored.

Researchers at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research have found nearly one million new exons, fragments of DNA that are expressed in mature RNA, in the human genome.

Timothy Hughes, principal investigator on the study and professor and chair of the department of molecular genetics at U of T’s Temerty Faculty of Medicine, said, “We’ve started to chip away at the dark genome by finding nearly one million previously unknown exons through a method called exon trapping.”

Exon trapping uses plasmids to identify exons within DNA fragments of unknown composition. Although the technique is less commonly used today, it proves effective when combined with high-throughput sequencing, which lets researchers scan the entire human genome more rapidly and comprehensively.

Many exons encode proteins, which direct tissue development and various other biological processes in the body. Some exons are considered autonomous, meaning they can be spliced into mature RNA transcripts without external assistance; those transcripts may then be translated into proteins.

The research team aimed to challenge the exon definition model, a guiding principle in molecular genetics research. They questioned one of its assumptions: that clear and consistent indicators of exon boundaries facilitate the accurate removal of non-protein-coding intron regions. This assumption does not always hold, because splicing can be imperfect, producing mature RNA transcripts that contain non-functional components.

“Almost none of the newly discovered exons are found consistently across genomes of different species,” said Hughes. “They seem to appear in the human genome mainly due to random mutation and are unlikely to play a significant role in our biology. This is evidence that human evolution involves a lot of trial and error – most likely enabled by the vast size of our genome.”

Nearly four percent of the approximately 1.25 million exons detected through exon trapping, a total that includes both previously known and previously unknown exons, were identified as long non-coding RNA exons.

Furthermore, some exons lie within non-coding introns; these are termed pseudoexons. A mutation can strengthen a weak splice site in a pseudoexon, causing it to be included in a mature RNA transcript, which could contribute to the development of disease.

Benjamin Blencowe, professor of molecular genetics at U of T’s Temerty Faculty of Medicine, who was not involved in the study, said, “This is an interesting study that broadens our knowledge of sequences across the human genome that have the potential to be recognized as exons in transcribed RNA. While the significance of the majority of the newly detected exons is unclear, some of them may be activated in certain contexts – for example, by disease mutations – and therefore cataloging them is important. This study will further serve as a valuable resource facilitating ongoing efforts directed at deciphering the splicing code.”

Journal Reference:

  1. Nicholas Stepankiw, Ally W.H. Yang, and Timothy R. Hughes. The human genome contains over a million autonomous exons. Genome Res. DOI: 10.1101/gr.277792.123