The current human reference genome is the most precise and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps remain.
Now, for the first time, scientists have determined the complete sequence of a human chromosome from one end to the other (‘telomere to telomere’) with no gaps and an unprecedented level of accuracy. The achievement was made possible by sequencing technologies that produce “ultra-long reads,” such as the nanopore sequencing technology pioneered at UC Santa Cruz.
Many of the remaining gaps lie in long stretches of repetitive DNA. Lead author Karen Miga, a research scientist at the UC Santa Cruz Genomics Institute, said, “These repeat-rich sequences were once deemed intractable, but now we’ve made leaps and bounds in sequencing technology. With nanopore sequencing, we get ultra-long reads of hundreds of thousands of base pairs that can span an entire repeat region, so that bypasses some of the challenges.”
“We’re starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations, so we’ve been missing a lot of information that could be important to understanding human biology and disease.”
Miga and Adam Phillippy at the National Human Genome Research Institute (NHGRI), both corresponding authors of the new paper, co-founded the Telomere-to-Telomere (T2T) consortium to pursue a complete genome assembly after working together on a 2018 paper that demonstrated the potential of nanopore technology to produce an entire human genome sequence. That effort used the Oxford Nanopore Technologies MinION sequencer, which sequences DNA by detecting the change in current flow as single molecules of DNA pass through a tiny hole (a “nanopore”) in a membrane.
The new project built on that effort, combining nanopore sequencing with other sequencing technologies from PacBio and Illumina and optical maps from BioNano Genomics. Using these technologies, the team produced a whole-genome assembly that exceeds all prior human genome assemblies in terms of continuity, completeness, and accuracy, even surpassing the current human reference genome by some metrics.
To finish the X chromosome, the scientists manually resolved the few gaps remaining in the sequence. Two segmental duplications were resolved with ultra-long nanopore reads that completely spanned the repeats and were uniquely anchored on either side. The remaining gap was at the centromere, a notoriously difficult region of repetitive DNA found in every chromosome.
On the X chromosome, the centromere contains a region of highly repetitive DNA spanning 3.1 million base pairs. The team was able to identify variants within the repeat sequence to serve as markers, which they used to align the long reads and link them together to span the entire centromere.
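The idea of using rare variants as landmarks inside an otherwise uniform repeat can be illustrated with a toy model. The sketch below is a simplification, not the team’s actual pipeline: it assumes each repeat unit is encoded as one character (lowercase `a` for the canonical unit, uppercase letters for variant units), that every variant marker occurs only once in the array, and that reads are error-free. Reads that share a marker can be placed relative to one another, and the placed reads are then merged into one contiguous array. All function names are hypothetical.

```python
from collections import deque

CANONICAL = "a"  # hypothetical encoding: the canonical repeat unit


def marker_positions(read: str) -> dict[str, int]:
    """Map each marker (any non-canonical unit) to its index in the read.
    Assumes each marker occurs at most once in the whole repeat array."""
    return {unit: i for i, unit in enumerate(read) if unit != CANONICAL}


def layout_reads(reads: list[str]) -> dict[int, int]:
    """Give each read a start offset relative to read 0 via shared markers."""
    markers = [marker_positions(r) for r in reads]
    offsets = {0: 0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(len(reads)):
            if j in offsets:
                continue
            shared = markers[i].keys() & markers[j].keys()
            if not shared:
                continue
            # Every shared marker must imply the same relative shift.
            deltas = {markers[i][m] - markers[j][m] for m in shared}
            if len(deltas) != 1:
                raise ValueError("inconsistent marker positions")
            offsets[j] = offsets[i] + deltas.pop()
            queue.append(j)
    return offsets


def assemble(reads: list[str]) -> str:
    """Merge anchored reads into one contiguous repeat array."""
    offsets = layout_reads(reads)
    if len(offsets) != len(reads):
        raise ValueError("some reads share no marker with the rest")
    shift = -min(offsets.values())
    length = max(offsets[i] + len(reads[i]) for i in offsets) + shift
    array: list[str | None] = [None] * length
    for i, start in offsets.items():
        for k, unit in enumerate(reads[i]):
            idx = start + shift + k
            if array[idx] not in (None, unit):
                raise ValueError("overlapping reads disagree")
            array[idx] = unit
    if None in array:
        raise ValueError("reads leave a gap in the array")
    return "".join(array)


# Three overlapping "ultra-long" reads, given out of order.
print(assemble(["aBaaaCa", "aCaaaDa", "aAaaaBa"]))
```

The usage line reconstructs the array `aAaaaBaaaCaaaDa`. The toy only works because each read is long enough to contain at least one marker shared with a neighbouring read, which mirrors why ultra-long reads matter for spanning a multi-megabase tandem repeat.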
Miga said, “For me, the idea that we can put together a 3-megabase-size tandem repeat is just mind-blowing. We can now reach these repeat regions covering millions of bases that were previously thought intractable.”
Using an iterative polishing process that drew on data from three different sequencing platforms, the team refined the sequence to a high level of accuracy. The unique markers provide an anchoring system for the ultra-long reads; once the reads are anchored, multiple data sets can be used to call each base.
Nanopore sequencing also detects bases that have been modified by methylation, an “epigenetic” change that does not alter the sequence but has important effects on DNA structure and gene expression.
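How several anchored data sets can jointly decide each base can be sketched as a weighted vote at every position. This is a deliberately simple illustration, not the polishing method used in the study: it assumes reads have already been placed at known start coordinates, ignores insertions and deletions, and uses hypothetical per-platform weights. Positions with no coverage are reported as `N`.

```python
from collections import Counter


def call_bases(anchored: list[tuple[str, int, str]],
               weights: dict[str, float],
               length: int) -> str:
    """Weighted per-position vote over anchored reads.

    anchored: (platform, start, sequence) tuples; start is the read's
              position in the assembly (assumed already known).
    weights:  hypothetical confidence weight for each platform.
    """
    votes = [Counter() for _ in range(length)]
    for platform, start, seq in anchored:
        w = weights[platform]
        for k, base in enumerate(seq):
            pos = start + k
            if 0 <= pos < length:
                votes[pos][base] += w
    # Take the highest-weighted base at each position; "N" if uncovered.
    return "".join(c.most_common(1)[0][0] if c else "N" for c in votes)


reads = [
    ("nanopore", 0, "ACGAACGT"),  # one substitution error at position 3
    ("pacbio", 0, "ACGTACCT"),    # one substitution error at position 6
    ("illumina", 2, "GTAC"),      # short read covering positions 2-5
    ("illumina", 4, "ACGT"),      # short read covering positions 4-7
]
print(call_bases(reads, {"nanopore": 1.0, "pacbio": 1.0, "illumina": 1.0}, 8))
```

Here the example prints `ACGTACGT`: each platform’s isolated error is outvoted because the other anchored reads agree at that position. Real polishing also has to handle insertions, deletions, and platform-specific error profiles, which a per-position vote does not capture.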
By mapping patterns of methylation on the X chromosome, the team was able to confirm previous observations and reveal some interesting trends in methylation patterns within the centromere.
The new human genome sequence, derived from a human cell line called CHM13, closes many gaps in the current reference genome, known as Genome Reference Consortium build 38 (GRCh38).
- Karen H. Miga et al., Telomere-to-telomere assembly of a complete human X chromosome, Nature (2020). DOI: 10.1038/s41586-020-2547-7