
PILER: identification and classification of genomic repeats

Published on Jan 1, 2005 in Intelligent Systems in Molecular Biology · DOI: 10.1093/bioinformatics/bti1003
Robert C. Edgar (est. H-index 22), Eugene W. Myers (est. H-index 55; University of California, Berkeley)
Abstract
Summary: Repeated elements such as satellites and transposons are ubiquitous in eukaryotic genomes. De novo computational identification and classification of such elements is a challenging problem. Therefore, repeat annotation of sequenced genomes has historically relied largely on sequence similarity to hand-curated libraries of known repeat families. We present a new approach to de novo repeat annotation that exploits characteristic patterns of local alignments induced by certain classes of repeats. We describe PILER, a package of efficient search algorithms for identifying such patterns. Novel repeats found using PILER are reported for Homo sapiens, Arabidopsis thaliana and Drosophila melanogaster.
Availability: The PILER software is freely available at http://www.drive5.com/piler
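The abstract says repeats are found through characteristic patterns of local alignments but does not spell out the mechanics. The sketch below illustrates one such pattern under simplifying assumptions of its own: every local alignment is given as a pair of non-empty, half-open intervals on a single concatenated sequence; overlapping alignment sides are merged into maximal "piles"; and piles linked by an alignment are grouped into a candidate family with union-find. The names `find_piles` and `group_families` are hypothetical. This is not PILER's code, and it omits any rules for telling different classes of repeats (such as satellites versus transposons) apart.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical representation: half-open [start, end) coordinates on one
# concatenated sequence; an alignment pairs the two aligned copies.
Interval = Tuple[int, int]
Alignment = Tuple[Interval, Interval]


def find_piles(hits: List[Interval]) -> List[Interval]:
    """Merge overlapping hit intervals into maximal piles, sorted by start."""
    piles: List[Interval] = []
    for start, end in sorted(hits):
        if piles and start < piles[-1][1]:
            # Overlaps the current pile: extend its right end if needed.
            piles[-1] = (piles[-1][0], max(piles[-1][1], end))
        else:
            piles.append((start, end))
    return piles


def group_families(alignments: List[Alignment]) -> List[List[Interval]]:
    """Link piles joined by an alignment; each connected component is a family."""
    hits = [side for aln in alignments for side in aln]
    piles = find_piles(hits)
    starts = [s for s, _ in piles]
    parent = list(range(len(piles)))  # union-find forest over pile indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def pile_of(iv: Interval) -> int:
        # Piles are disjoint and every hit lies inside one; locate it by start.
        return bisect.bisect_right(starts, iv[0]) - 1

    for a, b in alignments:
        ra, rb = find(pile_of(a)), find(pile_of(b))
        if ra != rb:
            parent[ra] = rb

    families: Dict[int, List[Interval]] = {}
    for i, pile in enumerate(piles):
        families.setdefault(find(i), []).append(pile)
    return sorted(families.values())


if __name__ == "__main__":
    alns = [((100, 200), (1000, 1100)),
            ((1050, 1150), (5000, 5100)),
            ((3000, 3050), (7000, 7050))]
    for family in group_families(alns):
        print(family)
```

Run as a script, this prints two families: `[(100, 200), (1000, 1150), (5000, 5100)]`, where the overlapping hits at 1000 and 1050 merge into one pile that chains the other two copies together, and `[(3000, 3050), (7000, 7050)]`.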
References (14)
Published on Sep 1, 2015 in Clinical Chemistry (IF 8.64)
J. Craig Venter (est. H-index 89; J. Craig Venter Institute), Mark D. Adams (est. H-index 39), + 270 authors, Robert A. Holt (est. H-index 73)
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Cel...
766 citations
Published on Sep 1, 2004 in Genome Research (IF 10.10)
Pavel A. Pevzner (est. H-index 80), Haixu Tang (est. H-index 38), Glenn Tesler (est. H-index 27)
Repetitive sequences make up a significant fraction of almost any genome, and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a new approach to repeat classification that represents all repeats in a genome as a mosaic of sub-repeats. Our key algorithmic idea also leads to new approaches to multiple alignment and fragment assembly. In particular, we show that our FragmentGluer assembler improves on Phrap and ARACHNE in assembly o...
159 citations
Published on Aug 19, 2004 in BMC Bioinformatics (IF 2.21)
Robert C. Edgar (est. H-index 22; University of California, Berkeley)
Background In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed f...
4,264 citations
Published on Mar 8, 2004 in Nucleic Acids Research (IF 11.56)
Robert C. Edgar (est. H-index 22)
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. ...
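The MUSCLE abstract mentions fast distance estimation using k-mer counting. A minimal sketch of that idea, assuming plain strings over a single alphabet and a fixed word length k: count the k-mers two sequences share (taking the smaller count for each word), divide by the number of k-mers in the shorter sequence, and report one minus that fraction as a distance. This follows a common formulation of k-mer similarity; the function names and the default `k=3` are illustrative assumptions, not MUSCLE's exact parameters or alphabet.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """1 - (shared k-mers / k-mers in the shorter sequence).

    0.0 means every k-mer of the shorter sequence is matched in the other;
    1.0 means the two sequences share no k-mer at all.
    """
    if min(len(x), len(y)) < k:
        raise ValueError("both sequences must be at least k characters long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A k-mer occurring n times in x and m times in y contributes min(n, m);
    # indexing a Counter with a missing key returns 0.
    shared = sum(min(n, cy[word]) for word, n in cx.items())
    return 1.0 - shared / (min(len(x), len(y)) - k + 1)


if __name__ == "__main__":
    print(kmer_distance("ACGT", "ACGA", k=2))  # 2 of 3 dimers shared
```

No alignment is computed, which is what makes this kind of estimate cheap enough to run on every pair of sequences before any alignment work begins.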
20.4k citations
Published on Dec 23, 2002 in Genome Biology (IF 13.21)
Susan E. Celniker (est. H-index 52; Lawrence Berkeley National Laboratory), D. Altshuler (est. H-index 177; Baylor College of Medicine), + 29 authors, Erwin Frise (est. H-index 12; Lawrence Berkeley National Laboratory)
Background The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.
340 citations
Published on Aug 1, 2002 in Genome Research (IF 10.10)
Zhirong Bao (est. H-index 23; Washington University in St. Louis), Sean R. Eddy (est. H-index 76; Washington University in St. Louis)
Repetitive sequences make up a major part of eukaryotic genomes. We have developed an approach for the de novo identification and classification of repeat sequence families that is based on extensions to the usual approach of single linkage clustering of local pairwise alignments between genomic sequences. Our extensions use multiple alignment information to define the boundaries of individual copies of the repeats and to distinguish homologous but distinct repeat element families. When tested o...
397 citations
Published on Jul 1, 2002 in Current Issues in Molecular Biology (IF 2.27)
Nathan J. Bowen (est. H-index 20; National Institutes of Health), I. King Jordan (est. H-index 34; National Institutes of Health)
Eukaryotic transposable elements are ubiquitous and widespread mobile genetic entities. These elements often make up a substantial fraction of the host genomes in which they reside. For example, approximately 1/2 of the human genome was recently shown to consist of transposable element sequences. There is a growing body of evidence that demonstrates that transposable elements have been major players in genome evolution. A sample of this evidence is reviewed here with an emphasis on the role that...
87 citations
Published on Aug 1, 2001 in Genome Biology (IF 13.21)
Natalia Volfovsky (est. H-index 16), Brian J. Haas (est. H-index 62), Steven L. Salzberg (est. H-index 120)
Background A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
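The abstract refers to suffix trees for finding exact repeats. As a compact stand-in, the sketch below swaps in a suffix array, a different but closely related index, built naively by sorting suffixes. Any substring occurring twice is a common prefix of two suffixes, and every suffix sorted between those two shares that prefix, so the longest exact repeat equals the longest common prefix of some pair of suffixes that are adjacent in sorted order. The function names are hypothetical, and the naive construction (roughly O(n² log n)) only suits short illustrative strings; this is not the method of the cited paper.

```python
from typing import List, Tuple


def suffix_array(s: str) -> List[int]:
    """Start positions of all suffixes of s, sorted lexicographically (naive)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_len(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of suffixes s[i:] and s[j:]."""
    n = 0
    while i + n < len(s) and j + n < len(s) and s[i + n] == s[j + n]:
        n += 1
    return n


def longest_exact_repeat(s: str) -> Tuple[str, List[int]]:
    """Longest substring occurring at least twice (overlaps allowed) and its start positions."""
    sa = suffix_array(s)
    best_len, best_start = 0, 0
    # Only neighbours in sorted order need to be compared.
    for a, b in zip(sa, sa[1:]):
        n = common_prefix_len(s, a, b)
        if n > best_len:
            best_len, best_start = n, a
    if best_len == 0:
        return "", []
    repeat = s[best_start:best_start + best_len]
    positions = [i for i in range(len(s) - best_len + 1) if s.startswith(repeat, i)]
    return repeat, positions


if __name__ == "__main__":
    print(longest_exact_repeat("ACGTTTACGTA"))  # ('ACGT', [0, 6])
```

Real tools build suffix trees or suffix arrays in linear or near-linear time and report all maximal repeats, not just the longest one; the adjacent-suffix idea shown here is the common core.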
120 citations
Published on Feb 16, 2001 in Science (IF 41.06)
J. Craig Venter (est. H-index 89; Celera Corporation), Mark D. Adams (est. H-index 39; Celera Corporation), + 270 authors, Robert A. Holt (est. H-index 73; Celera Corporation)
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Cel...
10.9k citations
Published on Feb 15, 2001 in Nature (IF 41.58)
Eric S. Lander (est. H-index 245; Massachusetts Institute of Technology), Lauren Linton (est. H-index 7; Massachusetts Institute of Technology), + 246 authors, William FitzHugh (est. H-index 9; Massachusetts Institute of Technology)
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
16.5k citations
Cited By (272)
Published on Apr 1, 2019 in BMC Plant Biology (IF 3.93)
Ming Li (est. H-index 2; Sichuan University), Songtao Yang (est. H-index 2), + 22 authors, Feng Lin
Background Sweetpotato (Ipomoea batatas (L.) Lam.) is the seventh most important crop in the world and is mainly cultivated for its underground storage root (SR). The genetic studies of this species have been hindered by a lack of high-quality reference sequence due to its complex genome structure. Diploid Ipomoea trifida is the closest relative and putative progenitor of sweetpotato, which is considered a model species for sweetpotato, including genetic, cytological, and physiological analyses.
Published on Apr 18, 2019 in Genome Biology (IF 13.21)
Lin Zeng (est. H-index 4; Chinese Academy of Sciences), Xiao-Long Tu (est. H-index 1), + 14 authors, Xiao-Long Li
Background Pistachio (Pistacia vera), one of the most important commercial nut crops worldwide, is highly adaptable to abiotic stresses and is tolerant to drought and salt stresses.
Published on Jan 1, 2019 in Nature Communications (IF 12.35)
Changsong Zou (est. H-index 1; Chinese Academy of Sciences), Leiting Li (est. H-index 14; Chinese Academy of Sciences), + 22 authors, Wei Jia (est. H-index 1; Chinese Academy of Sciences)
Broomcorn millet (Panicum miliaceum L.) is the most water-efficient cereal and one of the earliest domesticated plants. Here we report its high-quality, chromosome-scale genome assembly using a combination of short-read sequencing, single-molecule real-time sequencing, Hi-C, and a high-density genetic map. Phylogenetic analyses reveal two sets of homologous chromosomes that may have merged ~5.6 million years ago, both of which exhibit strong synteny with other grass species. Broomcorn millet con...
1 citation
Published on May 14, 2019 in Mobile DNA (IF 5.89)
Andrei S. Guliaev (est. H-index 1; Russian Academy of Sciences), S. K. Semyenova (est. H-index 10; Russian Academy of Sciences)
Background Genomes of eukaryotes are inhabited by myriads of mobile genetic elements (MGEs) – transposons and retrotransposons – which play a great role in genome plasticity and evolution. A lot of computational tools were developed to annotate them either in genomic assemblies or raw reads using de novo or homology-based approaches. But there has been no pipeline enabling users to get coding and flanking sequences of MGEs suitable for a downstream analysis from genome assemblies.
Published on Jan 16, 2019 in BMC Genomics (IF 3.73)
Rahul V. Rane (est. H-index 6; University of Melbourne), Stephen L. Pearce (est. H-index 6; Commonwealth Scientific and Industrial Research Organisation), + 9 authors, Siu F. Lee (est. H-index 20; University of Melbourne)
3 citations
Published on Jan 29, 2019 in Mobile DNA (IF 5.89)
Joelle Amselem (est. H-index 24; Université Paris-Saclay), Guillaume Cornut (est. H-index 1; Université Paris-Saclay), + 9 authors, Cyril Pommier (est. H-index 7; Université Paris-Saclay)
Background Thanks to their ability to move around and replicate within genomes, transposable elements (TEs) are perhaps the most important contributors to genome plasticity and evolution. Their detection and annotation are considered essential in any genome sequencing project. The number of fully sequenced genomes is rapidly increasing with improvements in high-throughput sequencing technologies. A fully automated de novo annotation process for TEs is therefore required to cope with the deluge o...
Published on May 1, 2019 in BMC Genomics (IF 3.73)
Viktor N. Shamanskiy (est. H-index 1), Valeria N. Timonina (est. H-index 1), + 1 author, Konstantin V. Gunbin (est. H-index 8)
1 citation
Published on May 1, 2019 in Molecular Plant (IF 9.33)
Jian Sun (est. H-index 4; Shenyang Agricultural University), Dianrong Ma (est. H-index 1; Shenyang Agricultural University), + 18 authors, Wenxing Zhang (Shenyang Agricultural University)
Crop weediness, especially that of weedy rice (Oryza sativa f. spontanea), remains mysterious. Weedy rice possesses robust ecological adaptability; however, how this strain originated and gradually formed proprietary genetic features remains unclear. Here, we demonstrate that weedy rice at Asian high latitudes (WRAH) is phylogenetically well defined and possesses unselected genomic characteristics in many divergence regions between weedy and cultivated rice. We also identified nove...
Published on May 1, 2019 in Molecular Plant (IF 9.33)
Xianjun Peng (est. H-index 9; Chinese Academy of Sciences), Hui Liu (est. H-index 18; Chinese Academy of Sciences), + 21 authors, Hui Chen (est. H-index 11; Chinese Academy of Sciences)
Paper mulberry (Broussonetia papyrifera) is a well-known woody tree historically used for Cai Lun papermaking, one of the four great inventions of ancient China. More recently, paper mulberry has also been used as forage to address the shortage of feedstuff because of its digestible crude fiber and high protein contents. In this study, we obtained a chromosome-scale genome assembly for paper mulberry using integrated approaches, including Illumina and PacBio sequencing platform as wel...
1 citation