Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes

Published on Dec 1, 2007 in Genome Research 9.944 · DOI: 10.1101/gr.6679507
Michael F. Lin
Estimated H-index: 23
Joseph W. Carlson
Estimated H-index: 32
+ 16 Authors
Manolis Kamvysselis
Estimated H-index: 94
The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster.
  • References (53)
  • Citations (121)
#1 Alexander Stark, H-Index: 64
#2 Pouya Kheradpour, H-Index: 29
Last. Manolis Kamvysselis, H-Index: 94
view all 7 authors...
MicroRNAs (miRNAs) are short regulatory RNAs that inhibit target genes by complementary binding in 3′ untranslated regions (3′ UTRs). They are one of the most abundant classes of regulators, targeting a large fraction of all genes, making their comprehensive study a requirement for understanding regulation and development. Here we use 12 Drosophila genomes to define structural and evolutionary signatures of miRNA hairpins, which we use for their de novo discovery. We predict >41 novel miRNA genes,...
155 Citations
Jan 1, 2006 in NeurIPS (Neural Information Processing Systems)
#1 Samuel S. Gross (Stanford University), H-Index: 4
#2 Olga Russakovsky (Stanford University), H-Index: 17
Last. Serafim Batzoglou (Stanford University), H-Index: 53
view all 4 authors...
We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches fo...
25 Citations
#1 J. Robert Manak (Affymetrix), H-Index: 23
#2 Sujit Dike (Affymetrix), H-Index: 8
Last. Thomas R. Gingeras (Affymetrix), H-Index: 83
view all 11 authors...
Many animal and plant genomes are transcribed much more extensively than current annotations predict. However, the biological function of these unannotated transcribed regions is largely unknown. Approximately 7% and 23% of the detected transcribed nucleotides during D. melanogaster embryogenesis map to unannotated intergenic and intronic regions, respectively. Based on computational analysis of coordinated transcription, we conservatively estimate that 29% of all unannotated transcribed sequenc...
168 Citations
#1 Jennifer Harrow (Wellcome Trust Sanger Institute), H-Index: 43
#3 Adam Frankish (Wellcome Trust Sanger Institute), H-Index: 38
Last. Roderic Guigó, H-Index: 91
view all 14 authors...
Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.
444 Citations
#1 Kenneth H. Wan (LBNL: Lawrence Berkeley National Laboratory), H-Index: 18
#2 Charles Yu (LBNL: Lawrence Berkeley National Laboratory), H-Index: 11
Last. Susan E. Celniker (LBNL: Lawrence Berkeley National Laboratory), H-Index: 55
view all 8 authors...
Libraries of cDNA clones are valuable resources for analysing the expression, structure, and regulation of genes, as well as for studying protein functions and interactions. Full-length cDNA clones provide information about intron and exon structures, splice junctions and 5'- and 3'-untranslated regions (UTRs). Open reading frames (ORFs) derived from cDNA clones can be used to generate constructs allowing expression of native proteins and N- or C-terminally tagged proteins. Thus, obtaining full-length ...
13 Citations
Driven by competition, automation, and technology, the genomics community has far exceeded its ambition to sequence the human genome by 2005. By analyzing mammalian genomes, we have shed light on the history of our DNA sequence, determined that alternatively spliced RNAs and retroposed pseudogenes are incredibly abundant, and glimpsed the apparently huge number of non-coding RNAs that play significant roles in gene regulation. Ultimately, genome science is likely to provide comprehensive catalog...
99 Citations
#1 Roger A. Hoskins (LBNL: Lawrence Berkeley National Laboratory), H-Index: 34
#2 Mark Stapleton (LBNL: Lawrence Berkeley National Laboratory), H-Index: 16
Last. Susan E. Celniker (LBNL: Lawrence Berkeley National Laboratory), H-Index: 55
view all 7 authors...
The invention provides a method for screening, isolation and recovery of clones using self-ligation of inverse PCR products. The recovery of full-length, intact clones representing genes and alternatively spliced transcripts of interest is described. We demonstrate the utility of the method by recovering full-length cDNA clones for genes and alternatively spliced transcripts, including genes that are not represented in available EST collections. The method is applicable to any plasmid library, i...
22 Citations
#1 Adam Siepel, H-Index: 50
#2 Gill Bejerano, H-Index: 35
Last. David Haussler, H-Index: 145
view all 16 authors...
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based o...
2,450 Citations
#1 Mark Yandell (University of California, Berkeley), H-Index: 58
#2 Adina M. Bailey (University of California, Berkeley), H-Index: 7
Last. Gerald M. Rubin (LBNL: Lawrence Berkeley National Laboratory), H-Index: 137
view all 8 authors...
Five years after the completion of the sequence of the Drosophila melanogaster genome, the number of protein-coding genes it contains remains a matter of debate; the number of computational gene predictions greatly exceeds the number of validated gene annotations. We have assembled a collection of >10,000 gene predictions that do not overlap existing gene annotations and have developed a process for their validation that allows us to efficiently prioritize and experimentally validate predictions...
32 Citations
#1 Stephen Richards (BCM: Baylor College of Medicine), H-Index: 49
#2 Yunlong Liu (BCM: Baylor College of Medicine), H-Index: 47
Last. Richard A. Gibbs (BCM: Baylor College of Medicine), H-Index: 142
view all 52 authors...
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions be...
460 Citations
Cited By (121)
#1 Pan Wu, H-Index: 1
#2 Yongzhen Mo, H-Index: 3
Last. Yi Li, H-Index: 68
view all 14 authors...
Non-coding RNAs do not encode proteins and regulate various oncological processes. They are also important potential cancer diagnostic and prognostic biomarkers. Bioinformatics and translation omics have begun to elucidate the roles and modes of action of the functional peptides encoded by ncRNA. Here, recent advances in long non-coding RNA (lncRNA) and circular RNA (circRNA)-encoded small peptides are compiled and synthesized. We introduce both the computational and analytical methods used to f...
1 Citations
#1 Ping Wei, H-Index: 1
#2 Wen Xue (Soochow University (Suzhou))
Last. Jiwu Wang, H-Index: 1
view all 5 authors...
#1 Fabrice Darbellay (EPFL: École Polytechnique Fédérale de Lausanne), H-Index: 4
#2 Anamaria Necsulea (EPFL: École Polytechnique Fédérale de Lausanne), H-Index: 15
The functionality of long noncoding RNAs (lncRNAs) is disputed. In general, lncRNAs are under weak selective pressures, suggesting that the majority of lncRNAs may be nonfunctional. However, although some surveys showed negligible phenotypic effects upon lncRNA perturbation, key biological roles were demonstrated for individual lncRNAs. Most lncRNAs with proven functions were implicated in gene expression regulation, in pathways related to cellular pluripotency, differentiation, and organ morpho...
#1 Jonathan M. Mudge, H-Index: 17
#2 Irwin Jungreis, H-Index: 15
Last. Manolis Kamvysselis, H-Index: 94
view all 17 authors...
The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved pr...
#1 Cornelia Fritsch (University of Fribourg), H-Index: 5
#2 F. Javier Bernardo-Garcia (University of Fribourg), H-Index: 4
Last. Simon G. Sprecher (University of Fribourg), H-Index: 17
view all 10 authors...
Development of eye tissue is initiated by a conserved set of transcription factors termed retinal determination network (RDN). In the fruit fly Drosophila melanogaster, the zinc-finger transcription factor Glass acts directly downstream of the RDN to control identity of photoreceptor as well as non-photoreceptor cells. Tight control of spatial and temporal gene expression is a critical feature during development, cell-fate determination as well as maintenance of differentiated tissues. The molec...
#1 Bailing Zhou, H-Index: 1
#2 Yuedong Yang (Griffith University), H-Index: 26
Last. Yaoqi Zhou (Griffith University), H-Index: 52
view all 6 authors...
High-throughput techniques have uncovered hundreds and thousands of long non-coding RNAs (lncRNAs). Among them, only a small fraction has experimentally validated functions (EVlncRNAs) by low-throughput methods. What fraction of lncRNAs from high-throughput experiments (HTlncRNAs) is truly functional is an active subject of debate. Here, we developed the first method to distinguish EVlncRNAs from HTlncRNAs and mRNAs by using Support Vector Machines and found that EVlncRNAs can be well separated ...
#1 Zhen You (CAU: China Agricultural University), H-Index: 1
#2 Qinghe Zhang (CAU: China Agricultural University)
Last. Ling Lian (CAU: China Agricultural University), H-Index: 11
view all 6 authors...
Background: Marek's disease virus (MDV) is an oncogenic herpesvirus that can cause T-cell lymphomas in chickens. Long noncoding RNA (lncRNA) is strongly associated with various cancers and many other diseases. In chickens, lncRNAs have not been comprehensively identified. Here, we profiled mRNA and lncRNA repertoires in three groups of spleens from MDV-infected and non-infected chickens, including seven tumorous spleens (TS) from MDV-infected chickens, five spleens from the survivors (SS) without ...
#1 Xinqiang Yin (North Sichuan Medical College), H-Index: 1
#2 Yuanyuan Jing (North Sichuan Medical College), H-Index: 1
Last. Hanmei Xu (MOE: Chinese Ministry of Education), H-Index: 1
view all 3 authors...
Introduction: Small open reading frames (sORFs) with potential protein-coding capacity have been disclosed in various transcripts, including long noncoding RNAs (LncRNAs), mRNAs (5′-upstream, coding domain, and 3′-downstream), circular RNAs, pri-miRNAs, and ribosomal RNAs (rRNAs). Recent characterization of several sORF-encoded peptides (SEPs or micropeptides) revealed their important roles in many fundamental biological processes in a broad range of species from yeast to human. The succ...
1 Citations
#1 Cornelia Fritsch (University of Fribourg), H-Index: 5
#2 F. J. Bernardo-Garcia (University of Fribourg)
Last. Simon G. Sprecher (University of Fribourg), H-Index: 17
view all 7 authors...
Development of eye tissue is initiated by a conserved set of transcription factors termed retinal determination network (RDN). In the fruit fly Drosophila melanogaster, the zinc-finger transcription factor Glass acts directly downstream of the RDN to control identity of photoreceptor as well as non-photoreceptor cells. Tight control of spatial and temporal gene expression is a critical feature during development, cell-fate determination as well as maintenance of differentiated tissues. The mole...
#1 Rasha A. Al-Eisa (Taif University), H-Index: 3
#2 Fawziah A. Al-Salmi (Taif University), H-Index: 1
Last. Nahla S. El-Shenawy (Suez Canal University), H-Index: 10
view all 4 authors...
Aspartame (ASP) has been used as an alternative to sucrose for diabetics and obese people worldwide. Co-administration of L-carnitine (LC) with ASP has a protective effect against the liver and kidney toxicity induced by ASP. The goal of the investigation was to assess the enhancement of LC effect on the cardiac toxicity caused by ASP. The rats were divided into 6 groups: control with saline, LC (10 mg/kg), ASP (75 mg/kg), ASP (150 mg/kg), LC with 75 mg/kg of ASP, and LC with 150 mg/kg ASP. The ...
2 Citations