CellHint: New methodology unifies single-cell data

Creating harmonized, applicable datasets for the study of human health and disease.


In the last ten years, scientists have gathered a great deal of information about individual cells in our bodies, studying how they work and change as we grow. International research consortia, such as the Human Cell Atlas, are trying to create a standard map of all the different types of cells in the human body.

However, this effort is complicated by the fact that different labs use their own names for the same cell types, which makes datasets hard to compare. To address this, an ongoing effort called the Cell Ontology database is working to create a standard way of naming cell types that everyone can use.

Researchers at the Wellcome Sanger Institute, the University of Cambridge, EMBL’s European Bioinformatics Institute (EMBL-EBI), and collaborators have developed a tool called CellHint. CellHint uses machine learning to unify data produced by labs worldwide, making it accessible to the broader research community and potentially driving new discoveries.

In a new study, researchers used CellHint to reveal underexplored connections between healthy and diseased lung cell states. They studied eight diseases, including interstitial lung disease and chronic obstructive pulmonary disease, demonstrating the tool’s potential benefits.

CellHint is accessible for free worldwide and is a part of the Human Cell Atlas initiative, which seeks to map every cell type in the human body to enhance our understanding of health and disease.

The researchers used CellHint to unify cell types defined by independent laboratories. They then placed the data into a defined graph that shows the relationships between cell subtypes, giving a full picture of all the cells identified across different datasets.
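
To make the idea of unifying labels and arranging them in a graph more concrete, here is a minimal toy sketch in Python. It is not CellHint's method or API: CellHint uses machine learning, whereas this sketch uses a hand-written lookup table. The label names, the `SYNONYMS` and `PARENT` tables, and the functions `harmonize` and `build_graph` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical lab-specific labels mapped to a shared name (illustrative only).
SYNONYMS = {
    "AT2": "alveolar type 2 cell",
    "alveolar type II": "alveolar type 2 cell",
    "Mac": "macrophage",
}

# Hypothetical subtype -> parent relationships (illustrative only).
PARENT = {"alveolar macrophage": "macrophage"}


def harmonize(labels_by_dataset):
    """Group every (dataset, original label) pair under a shared cell-type name."""
    unified = defaultdict(set)
    for dataset, labels in labels_by_dataset.items():
        for label in labels:
            shared = SYNONYMS.get(label, label)  # unknown labels keep their own name
            unified[shared].add((dataset, label))
    return dict(unified)


def build_graph(unified):
    """Return parent -> sorted list of subtypes for the harmonized cell types."""
    edges = defaultdict(set)
    for cell_type in unified:
        parent = PARENT.get(cell_type)
        if parent is not None:
            edges[parent].add(cell_type)
    return {parent: sorted(children) for parent, children in edges.items()}


# Two made-up labs that name the same cells differently.
labels = {
    "lab_a": ["AT2", "Mac"],
    "lab_b": ["alveolar type II", "alveolar macrophage"],
}
unified = harmonize(labels)
graph = build_graph(unified)
```

In this toy version, both labs' alveolar type 2 labels end up under one shared name, and the graph records that alveolar macrophages are a subtype of macrophages. A real tool has to infer such matches from the expression data itself rather than from a fixed table.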

The scientists used CellHint to study the connections between healthy and diseased lung cell states in eight different diseases, uncovering new relationships. They also identified specific cell types in the adult human hippocampus that could be important for future research.

Moreover, they applied CellHint to 12 tissues from 38 datasets, creating a comprehensive cross-tissue database with approximately 3.7 million cells. Each cell was labeled with specific information through a process called annotation. The researchers demonstrated how CellHint can generate various models for automatically annotating cells across different human tissues.

Journal Reference:

  1. C. Xu, M. Prete, S. Webb, et al. (2023) Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell. DOI: 10.1016/j.cell.2023.11.026
