PILER: identification and classification of genomic repeats
Published on Jan 1, 2005 in Intelligent Systems in Molecular Biology
DOI: 10.1093/bioinformatics/bti1003
Robert C. Edgar
Estimated H-index: 13
Eugene W. Myers
Estimated H-index: 53
(University of California, Berkeley)
Summary: Repeated elements such as satellites and transposons are ubiquitous in eukaryotic genomes. De novo computational identification and classification of such elements is a challenging problem. Therefore, repeat annotation of sequenced genomes has historically largely relied on sequence similarity to hand-curated libraries of known repeat families. We present a new approach to de novo repeat annotation that exploits characteristic patterns of local alignments induced by certain classes of repeats. We describe PILER, a package of efficient search algorithms for identifying such patterns. Novel repeats found using PILER are reported for Homo sapiens, Arabidopsis thaliana and Drosophila melanogaster. Availability: The PILER software is freely available at Contact: [email protected]
References (14)
Published on Aug 19, 2004 in BMC Bioinformatics 2.21
Robert C. Edgar
Estimated H-index: 13
(University of California, Berkeley)
4,154 Citations
Published on Aug 1, 2001 in Genome Biology 13.21
Natalia Volfovsky
Estimated H-index: 15
Brian J. Haas
Estimated H-index: 64
Steven L. Salzberg
Estimated H-index: 118
Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
118 Citations
Published on Sep 1, 2000 in Trends in Genetics 10.56
Jerzy Jurka
Estimated H-index: 57
(Genetic Information Research Institute)
This work was supported in part by the National Library of Medicine, grant 5 P41 LM06252-3.
738 Citations
Published on Sep 1, 1994 in Nature 41.58
Brian Charlesworth
Estimated H-index: 78
Paul D. Sniegowski
Estimated H-index: 23
Wolfgang Stephan
Estimated H-index: 46
1,194 Citations
Published on Oct 1, 1990 in Journal of Molecular Biology 4.89
Stephen F. Altschul
Estimated H-index: 46
(National Institutes of Health)
Warren Gish
Estimated H-index: 15
(National Institutes of Health)
+ 2 authors
David J. Lipman
Estimated H-index: 44
(National Institutes of Health)
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a vari...
58.1k Citations
Published on Sep 1, 2004 in Genome Research 10.10
Paul A. Pevzner
Estimated H-index: 1
Haixu Tang
Estimated H-index: 38
Glenn Tesler
Estimated H-index: 27
Repetitive sequences make up a significant fraction of almost any genome, and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a new approach to repeat classification that represents all repeats in a genome as a mosaic of sub-repeats. Our key algorithmic idea also leads to new approaches to multiple alignment and fragment assembly. In particular, we show that our FragmentGluer assembler improves on Phrap and ARACHNE in assembly o...
152 Citations
Published on Jan 1, 1999 in Nucleic Acids Research 11.56
Gary Benson
Estimated H-index: 26
(Icahn School of Medicine at Mount Sinai)
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem re...
3,404 Citations
Published on Dec 23, 2002 in Genome Biology 13.21
Susan E. Celniker
Estimated H-index: 51
(Lawrence Berkeley National Laboratory)
David A. Wheeler
Estimated H-index: 71
(Baylor College of Medicine)
+ 29 authors
Erwin Frise
Estimated H-index: 11
(Lawrence Berkeley National Laboratory)
Background: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.
325 Citations
Published on Feb 16, 2001 in Science 41.06
J. Craig Venter
Estimated H-index: 85
(Celera Corporation)
Mark D. Adams
Estimated H-index: 65
(Celera Corporation)
+ 270 authors
Robert A. Holt
Estimated H-index: 68
(Celera Corporation)
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Cel...
10.7k Citations
Published on Mar 8, 2004 in Nucleic Acids Research 11.56
Robert C. Edgar
Estimated H-index: 5
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. ...
19.3k Citations
Cited By (252)
Published on Nov 3, 2013
Nathaniel D. Figueroa
Estimated H-index: 1
(Miami University)
Xiaolin Liu
Estimated H-index: 1
(Miami University)
+ 1 author
John E. Karro
Estimated H-index: 10
(Miami University)
Here we present RAIDER, a tool for the de novo identification of elementary repeats. The problem of searching for genomic repeats without reference to a compiled profile library is important in the annotation of new genomes and the discovery of new repeat classes. Several tools have attempted to address the problem, but generally suffer either an inability to run at the whole-genome scale or loss of sensitivity due to sequence variation between repeat copies. To address this, Zheng and Lonardi d...
1 Citation
Published on Jan 1, 2013
Tiago Loureiro
Estimated H-index: 3
(University of Porto)
Rui Camacho
Estimated H-index: 14
(University of Porto)
+ 1 author
Nuno A. Fonseca
Estimated H-index: 18
(European Bioinformatics Institute)
Transposable Elements (TEs) are sequences of DNA that move and transpose within a genome. TEs, as mutation agents, are quite important for their role both in genome-alteration diseases and in species evolution. Several tools have been developed to discover and annotate TEs, but no single one achieves good results on all types of TEs. In this paper we evaluate the performance of several TE detection and annotation tools and investigate if Machine Learning techniques can be used to improv...
3 Citations
Published on Jan 1, 2014
Published on Jan 1, 2008
G Achaz
F Boyer
+ 1 author
E Coissac
The importance of genome redundancy has been strongly emphasized in the field of genome dynamics and evolution as well as in medical biology. A repeat is a sequence present twice or more with a high degree of similarity within a larger sequence (e.g. a chromosome) or set of sequences (e.g. a genome with several chromosomes). Each instance of the repeated sub-sequence is called a "copy" of the repeat. We use the term "duplication" to denote any active mechanistic event that creates a repeat. Even...
Published on Oct 28, 2010
Every living organism is the product of complex interactions between its genome and its environment, interactions characterized by exchanges of matter and energy that are essential to the organism's survival and to the transmission of its genome. Since the discovery in the 1910s that the chromosome is the carrier of genetic information, biologists have studied genomes in order to decipher the mechanisms and processes at work in the development of organisms and the evolution of po...
Published on Jan 1, 2010 in Methods of Molecular Biology
Richard Cordaux
Estimated H-index: 23
(University of Poitiers)
Shurjo K. Sen
Estimated H-index: 7
(Louisiana State University)
+ 1 author
Mark A. Batzer
Estimated H-index: 82
(Louisiana State University)
Transposable elements (TE), defined as discrete pieces of DNA that can move from one site to another in genomes, represent significant components of eukaryotic genomes, including primates. Comparative genome-wide analyses have revealed the considerable structural and functional impact of TE families on primate genomes. Insights into these questions have come in part from the development of computational methods that allow detailed and reliable identification, annotation and evolutionary analyse...
6 Citations
Published on Jan 1, 2008
Anna-Sophie Fiston-Lavier
Estimated H-index: 1
From bacteria to humans, repeats, whether dispersed or in tandem, can account for up to 90% of the genomic sequence. Despite their impact on the plasticity and evolution of eukaryotic genomes, the mechanisms by which they form remain highly speculative. The continual insertion of new repeats should lead to a steady increase in genome size. Yet this does not appear to be the case. Is genome size regulated? Is the regulatory process...
1 Citation
Published on Jan 1, 2014
Ha X. Dang
Estimated H-index: 10
(Virginia Bioinformatics Institute)
Christopher B. Lawrence
Estimated H-index: 28
(Virginia Bioinformatics Institute)
The "rots" are among the most destructive plant diseases caused by necrotrophic fungi. Toxins and hydrolytic enzymes secreted by these pathogens inflict tissue damage and/or death on their hosts in advance of and in concert with hyphal colonization. Although they represent a small percentage of fungal diversity, these fungi are tremendously important economically, causing severe losses in some parts of the world annually. Necrotrophs, including many species of Alternaria, represent the largest c...
2 Citations
Published on Jan 1, 2011 in Advances in Genetics 4.69
Dale J. Hedges
Estimated H-index: 31
(University of Miami)
Victoria P. Belancio
Estimated H-index: 17
(Tulane University)
Since their initial discovery in maize, there have been various attempts to categorize the relationship between transposable elements (TEs) and their host organisms. These have ranged from TEs being selfish parasites to their role as essential, functional components of organismal biology. Research over the past several decades has, in many respects, only served to complicate the issue even further. On the one hand, investigators have amassed substantial evidence concerning the negative effects t...
15 Citations
Published on Oct 1, 2006
Jing-Doo Wang
Estimated H-index: 6
(Asia University (Japan))
This work presents an external memory approach to extract the maximal repeats from whole genome sequences with the statistics of these repeats across classes, where the definition of a class is determined from the statistics to be computed. A heuristic method consisting of a bucket-sort-like approach and the Chinese term extraction approach is adopted. The bucket-sorting method is used to sort the suffixes of DNA sequences stored in files, and the term extraction is used to extract maximal repea...
3 Citations