Scientists discovered thousands of rare new CRISPR systems

These systems have a range of functions and could enable gene editing, diagnostics, and more.


Microbial biochemical systems are incredibly diverse, and computational tools to analyze sequence data are essential in identifying new and valuable components for biotechnology development.

Scientists from the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, the McGovern Institute for Brain Research at MIT, and the Broad Institute of MIT and Harvard have created a new search algorithm. Using it, they discovered 188 kinds of rare CRISPR systems in bacterial genomes, encompassing thousands of individual systems.

The algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust), uses big-data clustering techniques to search vast volumes of genetic data quickly. With it, the scientists mined three large public datasets containing data from a wide range of unusual bacteria, including ones found in dog saliva, coal mines, breweries, and Antarctic lakes.

The study highlights an unprecedented level of diversity and flexibility of CRISPR. There are likely many rare systems yet to be discovered as databases grow.

To scan databases of protein and nucleic acid sequences for new CRISPR systems, the scientists built their algorithm on a technique borrowed from the big-data community: locality-sensitive hashing, which clusters objects that are similar but not identical. Unlike earlier methods that search for exact matches, this approach let the scientists probe billions of protein and DNA sequences from the NCBI, its Whole Genome Shotgun database, and the Joint Genome Institute in weeks. They designed the algorithm to look for genes associated with CRISPR.
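The idea behind locality-sensitive hashing can be illustrated with a small sketch. This is not the authors' FLSHclust implementation; it is a minimal MinHash-with-banding example, and the sequence names, the toy sequences, the 3-letter k-mer size, the number of hash functions, and the 0.5 similarity cutoff are all assumptions chosen for illustration. Sequences are hashed so that similar ones tend to share a bucket, and only bucket mates are compared directly.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-letter substrings (assumes len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(text, seed):
    """Deterministic 64-bit hash of text under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{text}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(shingles, num_hashes):
    """MinHash: for each seeded hash function, keep the smallest hash value."""
    return [min(stable_hash(s, seed) for s in shingles) for seed in range(num_hashes)]


def jaccard(a, b):
    """Fraction of k-mers shared between two k-mer sets."""
    return len(a & b) / len(a | b)


def lsh_cluster(seqs, k=3, num_hashes=64, bands=32, min_jaccard=0.5):
    """Cluster sequences by k-mer similarity, comparing only LSH bucket mates."""
    rows = num_hashes // bands
    shingles = {name: kmers(seq, k) for name, seq in seqs.items()}

    # Split each signature into bands; sequences sharing any band share a bucket.
    buckets = defaultdict(list)
    for name, sh in shingles.items():
        sig = minhash_signature(sh, num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(name)

    # Union-find merges candidate pairs that pass an exact similarity check.
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in buckets.values():
        for a, b in combinations(members, 2):
            if jaccard(shingles[a], shingles[b]) >= min_jaccard:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values())


# Made-up toy sequences: two differ by one letter, the third shares no k-mers.
toy = {
    "cas_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "cas_a_variant": "MKTAYIAKQREISFVKSHFSRQ",
    "unrelated": "LLWDPNGCCDDWWPPGGNNL",
}
print(lsh_cluster(toy))
```

With these toy inputs, the two near-identical sequences should end up in one cluster and the unrelated sequence in its own. The saving comes from never comparing sequences that share no bucket, which is what makes this family of methods attractive for very large datasets.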

Soumya Kannan, a co-first author of the study, said, “This new algorithm allows us to parse through data in a time frame short enough that we can actually recover results and make biological hypotheses.”

Han Altae-Tran, a postdoctoral researcher at the University of Washington, said, “This is a testament to what you can do when you improve on the methods for exploration and use as much data as possible.”

The scientists' analysis showed that the thousands of CRISPR systems fall into several categories, some already known and some newly defined. They examined a few of the new systems in more detail in the lab.

They discovered several new variants of known Type I CRISPR systems, which use a 32-base-pair guide RNA instead of Cas9’s 20-nucleotide guide. Because of their longer guide RNAs, these Type I systems could be used to develop more precise gene-editing technology that is less prone to off-target editing. Feng Zhang’s group showed that two of these systems can make short edits in the DNA of human cells.
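As a rough, back-of-envelope illustration of why a longer guide can be more specific, the sketch below counts how many distinct sequences a guide of a given length can take and how often a random guide would match a random genome exactly by chance. It is a deliberately naive model, not an analysis from the study: it assumes uniformly random sequence, exact matching only, and an approximate human genome size of 3.1 billion base pairs. Real off-target editing also depends on how many mismatches a system tolerates.

```python
# Assumption: approximate haploid human genome size in base pairs.
GENOME_BP = 3.1e9


def expected_chance_matches(guide_len, genome_bp=GENOME_BP):
    """Expected exact matches to a random guide in a random genome (both strands)."""
    positions = 2 * genome_bp  # every start position on either strand
    return positions / 4 ** guide_len


for guide_len in (20, 32):
    print(
        f"{guide_len}-nt guide: {4 ** guide_len:.2e} possible sequences, "
        f"{expected_chance_matches(guide_len):.2e} expected chance matches"
    )
```

Under these assumptions, each extra nucleotide divides the chance-match rate by four, so going from 20 to 32 nucleotides lowers it by a factor of 4 to the 12th power, or roughly 17 million.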

These Type I systems could also be delivered to human or animal cells using the same gene-delivery technologies already used for CRISPR, because they are similar in size to CRISPR-Cas9.

One of the Type I systems also showed “collateral activity,” the broad degradation of nucleic acids after the CRISPR protein binds its target. Scientists have used similar systems to build infectious-disease diagnostics such as SHERLOCK, which can quickly detect a single DNA or RNA molecule. Zhang’s group believes the new systems could likewise be adapted for diagnostic technologies.

Additionally, the scientists uncovered new modes of action for some Type IV CRISPR systems, as well as a Type VII system that precisely targets RNA and could be used in RNA editing. Other systems could serve as sensors of a particular activity in a living cell or as recording tools that create a molecular record of when a gene was expressed.

This new algorithm could be helpful in the search for other biochemical systems. Anyone who wants to work with these large databases could also use it to study how proteins evolve or discover new genes.

Journal Reference:

  1. Altae-Tran H, Kannan S, et al. Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering. Science. DOI: 10.1126/science.adi1910.

