SuperCT: A supervised-learning-framework to enhance the characterization of single-cell transcriptomic profiles

Published on Sep 16, 2018 in bioRxiv
DOI: 10.1101/416719
Peng Xie (UTD: University of Texas at Dallas), Mingxuan Gao (Xiamen University), + 6 authors, Wei Lin (TGen: Translational Genomics Research Institute)
Characterization of individual cell types is fundamental to the study of multicellular samples such as tumor tissues. Single-cell RNA-seq (scRNA-seq) techniques, which allow high-throughput expression profiling of individual cells, have significantly advanced our ability to perform this task. Currently, most scRNA-seq data analyses begin with unsupervised clustering of cells, followed by visualization of the clusters in a low-dimensional space. Clusters are often assigned to cell types based on canonical markers. However, characterizing known cell types in this way is inefficient and limited by the investigator's knowledge. In this study, we present a technical framework for training an expandable supervised classifier that reveals single-cell identities based on their RNA expression profiles. Using multiple scRNA-seq datasets, we demonstrate the superior accuracy, robustness, compatibility, and expandability of this new solution compared to traditional methods. We use two examples of model upgrades to demonstrate how the projected evolution of the cell-type classifier is realized.
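The idea of a supervised, expandable cell-type classifier can be illustrated with a minimal sketch. The abstract does not specify the model, so the code below substitutes a simple nearest-centroid classifier on log-normalized counts. It is not the authors' method. The class name, the `simulate_cells` helper, the marker-gene layout, and the cell-type labels are all hypothetical and exist only for illustration.

```python
import math
import random


def log_normalize(counts):
    """Scale a cell's counts to 10,000 total, then apply log1p."""
    total = sum(counts) or 1
    return [math.log1p(c * 1e4 / total) for c in counts]


class CentroidCellTypeClassifier:
    """Toy supervised classifier: one mean expression profile per cell type."""

    def __init__(self):
        self.centroids = {}  # cell-type label -> mean normalized profile

    def add_cell_type(self, label, profiles):
        """Learn (or replace) the centroid for one label from labeled cells."""
        normed = [log_normalize(p) for p in profiles]
        n_genes = len(normed[0])
        self.centroids[label] = [
            sum(cell[g] for cell in normed) / len(normed) for g in range(n_genes)
        ]

    def predict(self, counts):
        """Return the label whose centroid is closest in Euclidean distance."""
        x = log_normalize(counts)
        return min(
            self.centroids,
            key=lambda label: math.dist(x, self.centroids[label]),
        )


def simulate_cells(marker_gene, n_cells, n_genes=6, seed=0):
    """Hypothetical data: low background counts plus one highly expressed marker."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        counts = [rng.randint(0, 3) for _ in range(n_genes)]
        counts[marker_gene] += rng.randint(20, 40)
        cells.append(counts)
    return cells


clf = CentroidCellTypeClassifier()
clf.add_cell_type("T cell", simulate_cells(marker_gene=0, n_cells=50, seed=1))
clf.add_cell_type("B cell", simulate_cells(marker_gene=1, n_cells=50, seed=2))

# "Expanding" the model: register a new type without touching the existing ones.
clf.add_cell_type("Monocyte", simulate_cells(marker_gene=2, n_cells=50, seed=3))

# Evaluate on held-out simulated cells of the newly added type.
held_out = simulate_cells(marker_gene=2, n_cells=20, seed=4)
predictions = [clf.predict(cell) for cell in held_out]
accuracy = sum(p == "Monocyte" for p in predictions) / len(predictions)
```

In this toy setup, a model upgrade amounts to one more `add_cell_type` call. A real classifier trained on thousands of genes would more likely need retraining or fine-tuning when labels are added, and the synthetic markers here are far cleaner than real scRNA-seq data.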