DEclust: A statistical approach for obtaining differential expression profiles of multiple conditions

Published on Nov 21, 2017 in PLOS ONE (2.776)
DOI: 10.1371/journal.pone.0188285
Yoshimasa Aoto (Keio University), estimated H-index: 2
Tsuyoshi Hachiya (Iwate Medical University), estimated H-index: 11
+ 4 authors
Yasubumi Sakakibara (Keio University), estimated H-index: 29
High-throughput RNA sequencing technology is widely used to comprehensively detect and quantify cellular gene expression. Thus, numerous analytical methods have been proposed for identifying differentially expressed genes (DEGs) between paired samples such as tumor and control specimens, but few studies have reported methods for analyzing differential expression under multiple conditions. We propose a novel method, DEclust, for differential expression analysis among more than two matched samples from distinct tissues or conditions. As compared to conventional clustering methods, DEclust more accurately extracts statistically significant gene clusters from multi-conditional transcriptome data, particularly when replicates of quantitative experiments are available. DEclust can be used for any multi-conditional transcriptome data, as well as for extending any DEG detection tool for paired samples to multiple samples. Accordingly, DEclust can be used for a wide range of applications for transcriptome data analysis. DEclust is freely available at
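The idea of extending a paired DEG detection tool to multiple conditions can be illustrated with a minimal sketch. This is not DEclust's actual algorithm, which the abstract does not describe; it only assumes that some paired tool has already produced, for each gene, the set of condition pairs called differentially expressed. The function names `expression_profile` and `cluster_genes` and the input layout are hypothetical. Conditions that are not separated by a DE call are merged (transitively, which is a simplification), and genes sharing the same resulting profile are placed in one cluster.

```python
from collections import defaultdict
from itertools import combinations


def expression_profile(conditions, de_pairs):
    """Partition conditions into groups not separated by a DE call.

    de_pairs is a set of frozenset({cond_a, cond_b}) pairs that a paired
    DEG tool (assumed, not specified here) called differentially expressed.
    """
    parent = {c: c for c in conditions}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(conditions, 2):
        if frozenset((a, b)) not in de_pairs:
            # No significant difference: treat the two conditions as one level.
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for c in conditions:
        groups[find(c)].append(c)
    return frozenset(frozenset(g) for g in groups.values())


def cluster_genes(conditions, de_calls):
    """Group genes that share the same partition of conditions."""
    clusters = defaultdict(list)
    for gene, de_pairs in de_calls.items():
        clusters[expression_profile(conditions, de_pairs)].append(gene)
    return dict(clusters)


if __name__ == "__main__":
    conditions = ["A", "B", "C"]
    de_calls = {
        "g1": {frozenset(("A", "B")), frozenset(("A", "C"))},
        "g2": {frozenset(("A", "B")), frozenset(("A", "C"))},
        "g3": set(),
    }
    for profile, genes in cluster_genes(conditions, de_calls).items():
        print(sorted(sorted(group) for group in profile), genes)
```

In this toy input, g1 and g2 differ in condition A versus B and C, so they share one profile, while g3 shows no differences and forms its own cluster. A real analysis would also need multiple-testing control and a statistical assessment of each cluster, which this sketch omits.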
References (21)
#1 Guangliang Kang (Fudan University), H-Index: 1
#2 Li Du (Fudan University), H-Index: 1
Last. Hong Zhang (Fudan University), H-Index: 1
Background The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions.
2 Citations
#1 Simon Anders, H-Index: 25
#2 Paul Theodor Pyl, H-Index: 3
Last. Wolfgang Huber, H-Index: 76
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model informa...
5,748 Citations
#1 Michael I. Love (MPG: Max Planck Society), H-Index: 21
#2 Wolfgang Huber, H-Index: 76
Last. Simon Anders, H-Index: 25
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpret...
10.6k Citations
#1 Hua Yu, H-Index: 44
#2 Heehyoung Lee, H-Index: 21
Last. Richard Jove, H-Index: 75
The Janus kinases (JAKs) are major activators of signal transducer and activator of transcription (STAT) proteins, and this signalling axis is crucial for cancer development in both tumour cells and the tumour microenvironment. This Review discusses the new roles of JAK–STAT signalling in promoting cancer through inflammation, obesity, stem cells and the pre-metastatic niche, and the potential therapeutic strategies that these roles can offer.
657 Citations
#1 Camillo Porta, H-Index: 19
#2 Chiara Paglino, H-Index: 19
Last. Alessandra Mosca (University of Eastern Piedmont), H-Index: 13
The phosphatidylinositol-3-kinase (PI3K)/Akt and the mammalian target of Rapamycin (mTOR) signaling pathways are two pathways crucial to many aspects of cell growth and survival, in physiological as well as in pathological conditions (e.g. cancer). Indeed, they are so interconnected that, in a certain sense, they could be regarded as a single, unique pathway. In this paper, after a general overview of the biological significance and the main components of these pathways, we address the present s...
362 Citations
#1 Michael I. Love, H-Index: 21
#2 Wolfgang Huber, H-Index: 3
Last. Simon Anders, H-Index: 25
171 Citations
#1 Daehwan Kim (UMD: University of Maryland, College Park), H-Index: 8
#2 Geo Pertea (Johns Hopkins University), H-Index: 24
Last. Steven L. Salzberg (Johns Hopkins University), H-Index: 128
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ab...
6,216 Citations
#1 Bert Vogelstein (HHMI: Howard Hughes Medical Institute), H-Index: 223
#2 Nickolas Papadopoulos (HHMI: Howard Hughes Medical Institute), H-Index: 75
Last. Kenneth W. Kinzler (HHMI: Howard Hughes Medical Institute), H-Index: 186
Over the past decade, comprehensive sequencing efforts have revealed the genomic landscapes of common forms of human cancer. For most cancer types, this landscape consists of a small number of “mountains” (genes altered in a high percentage of tumors) and a much larger number of “hills” (genes altered infrequently). To date, these studies have revealed ~140 genes that, when altered by intragenic mutations, can promote or “drive” tumorigenesis. A typical tumor contains two to eight of these “driv...
3,669 Citations
#1 Cole Trapnell (Broad Institute), H-Index: 42
#2 David G. Hendrickson (Broad Institute), H-Index: 11
Last. Lior Pachter (University of California, Berkeley), H-Index: 55
Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference...
1,897 Citations
#1 Ben Langmead (UMD: University of Maryland, College Park), H-Index: 20
#2 Steven L. Salzberg (UMD: University of Maryland, College Park), H-Index: 128
The Bowtie 2 software achieves fast, sensitive, accurate and memory-efficient gapped alignment of sequencing reads using the full-text minute index and hardware-accelerated dynamic programming algorithms.
14.1k Citations
Cited By (3)
#2 Aditya Bhaskara, H-Index: 13
Last. Kuberan Balagurunathan, H-Index: 1
The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior k...
Studies conducted in time series could be far more informative than those that only capture a specific moment in time. However, when it comes to transcriptomic data, time points are sparse creating the need for a constant search for methods capable of extracting information out of experiments of this kind. We propose a feature selection algorithm embedded in a hidden Markov model applied to gene expression time course data on either single or even multiple biological conditions. For the latter, ...
#1 Kira C. M. Neller (York University), H-Index: 2
#2 Camille A Diaz (York University)
Last. Katalin A. Hudak (York University), H-Index: 13
Ribosome-inactivating proteins are RNA glycosidases thought to function in defense against pathogens. These enzymes remove purine bases from RNAs, including rRNA; the latter activity decreases protein synthesis in vitro, which is hypothesized to limit pathogen proliferation by causing host cell death. Pokeweed antiviral protein (PAP) is a ribosome-inactivating protein synthesized by the American pokeweed plant (Phytolacca americana). PAP inhibits virus infection when expressed in crop plants, ye...
#1 Yoshimasa Aoto (Keio University), H-Index: 2
#2 Kazuhiro Okumura, H-Index: 5
Last. Yasubumi Sakakibara (Keio University), H-Index: 29
Recent years have witnessed substantial progress in understanding tumor heterogeneity and the process of tumor progression; however, the entire process of the transition of tumors from a benign to metastatic state remains poorly understood. In the present study, we performed a prospective cancer genome-sequencing analysis by employing an experimental carcinogenesis mouse model of squamous cell carcinoma to systematically understand the evolutionary process of tumors. We surgically collected a pa...