The Repeat Pattern Toolkit (RPT): analyzing the structure and evolution of the C. elegans genome.
Published on Jan 1, 1994 in Intelligent Systems in Molecular Biology
Pankaj K. Agarwal (estimated H-index: 52), David J. States (estimated H-index: 32)
Abstract
Over 3.6 million bases of DNA sequence from chromosome III of C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-way alignments), dividing the set of alignments into connected components (signifying repeat families), computing evolutionary distance between repeat family members, constructing minimum spanning trees from the connected components, and visualizing the evolution of the repeat families. Over 7000 families of repetitive sequences were identified. The size of the families ranged from isolated pairs to over 1600 segments of similar sequence. Approximately 12.3% of the analyzed sequence participates in a repeat element.
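The grouping and tree-building steps named in the abstract can be illustrated with a minimal sketch. This is not the RPT implementation: the segment names, the distance values, and the function names `repeat_families` and `minimum_spanning_forest` are hypothetical, and union-find with Kruskal's algorithm is a standard technique chosen here for illustration. Each alignment is assumed to be a triple of two segment identifiers and an evolutionary distance.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's set, halving paths as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def repeat_families(alignments):
    """Group segments into connected components of the alignment graph.

    alignments: list of (segment_a, segment_b, distance) triples (assumed shape).
    Returns a list of sets, one set of segment ids per family.
    """
    parent = {}
    for a, b, _ in alignments:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two families
    families = defaultdict(set)
    for segment in parent:
        families[find(parent, segment)].add(segment)
    return list(families.values())


def minimum_spanning_forest(alignments):
    """Kruskal's algorithm: one minimum spanning tree per family."""
    parent = {}
    tree_edges = []
    for a, b, dist in sorted(alignments, key=lambda edge: edge[2]):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:  # edge joins two separate subtrees, so keep it
            parent[root_a] = root_b
            tree_edges.append((a, b, dist))
    return tree_edges


# Hypothetical toy data: five segments, four pairwise alignments.
example_alignments = [
    ("s1", "s2", 0.10),
    ("s2", "s3", 0.25),
    ("s1", "s3", 0.30),
    ("s4", "s5", 0.05),
]
example_families = repeat_families(example_alignments)
example_forest = minimum_spanning_forest(example_alignments)
```

On the toy data, s1, s2 and s3 fall into one family and s4 and s5 into another. The forest keeps the three lowest-distance edges and drops the s1 to s3 alignment, which would close a cycle.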
References (23)
Published on Jan 1, 1986 in Methods in Enzymology 1.98
Walter M. Fitch (estimated H-index: 51), Temple F. Smith (estimated H-index: 37), Jan L. Breslow (estimated H-index: 87)
Publisher Summary Sequence repeats are most easily generated through combinations of gene duplication and unequal crossings over. This chapter describes the detection of internally repeated sequences, and explores the ancestral history. The phylogenetic reconstruction method for detecting tandem repeats is quite powerful. The sensitivity of the method in any particular case is dependent upon the number of repeats, the fraction of the sequence that is composed of repeats, and the evolutionary con...
6 Citations
Published on Jan 1, 1988
William Barry Wood (estimated H-index: 1)
In 1965 Sydney Brenner chose the free-living nematode Caenorhabditis elegans as a promising model system for a concerted genetic, ultrastructural, and behavioral attack on the development and function of a simple nervous system. Since then, with the help of a growing number of investigators, knowledge about the biology of "the worm" has accumulated at a steadily accelerating pace to the extent that C. elegans is now probably the most completely understood metazoan in terms of anatomy, genetics, ...
1,717 Citations
Published on Aug 9, 1968 in Science 41.06
Roy J. Britten (estimated H-index: 64), D. E. Kohne (estimated H-index: 1)
Hundreds of thousands of copies of DNA sequences have been incorporated into the genomes of higher organisms.
2,242 Citations
Roy J. Britten (estimated H-index: 64), Will F. Baron (estimated H-index: 1), + 1 author, Eric H. Davidson (estimated H-index: 95; California Institute of Technology)
Alu repeated sequences arising in DNA of the human lineage during about the last 30 million years are closely similar to a modern consensus. Alu repeats arising at earlier times share correlated blocks of differences from the current consensus at diagnostic positions in the sequence. Using these 26 positions, we can recognize four subfamilies and the older ones are each successively closer to the 7SL sequence. It appears that there has existed a series of conserved genes that are the primary sou...
266 Citations
Published on Jun 1, 1991 in Journal of Molecular Biology 4.89
Stephen F. Altschul (estimated H-index: 46; National Institutes of Health)
Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a "substitution score matrix" that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a "log-odds" matrix, with a specific target distribution for align...
552 Citations
Published on Jan 1, 1989 in Biometrics 1.52
Michael S. Waterman (estimated H-index: 61)
238 Citations
Published on Feb 1, 1994 in Nature Genetics 27.13
Stephen F. Altschul (estimated H-index: 46), Mark S. Boguski (estimated H-index: 46), + 1 author, John C. Wootton (estimated H-index: 4)
677 Citations
Stephen F. Altschul (estimated H-index: 46; National Institutes of Health), David J. Lipman (estimated H-index: 44; National Institutes of Health)
Abstract Protein database searches frequently can reveal biologically significant sequence relationships useful in understanding structure and function. Weak but meaningful sequence patterns can be obscured, however, by other similarities due only to chance. By searching a database for multiple as opposed to pairwise alignments, distant relationships are much more easily distinguished from background noise. Recent statistical results permit the power of this approach to be analyzed. Given a typi...
357 Citations
William R. Pearson (estimated H-index: 45), David J. Lipman (estimated H-index: 44)
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an ...
9,718 Citations
Published on Aug 1, 1991 in Methods 4.00
David J. States (estimated H-index: 32; National Institutes of Health), Warren Gish (estimated H-index: 15; National Institutes of Health), Stephen F. Altschul (estimated H-index: 46; National Institutes of Health)
Scoring matrices for nucleic acid sequence comparison that are based on models appropriate to the analysis of molecular sequencing errors or biological mutation processes are presented. In mammalian genomes, transition mutations occur significantly more frequently than transversions, and the optimal scoring of sequence alignments based on this substitution model differs from that derived assuming a uniform mutation model. The information from sequence alignments potentially available using an op...
124 Citations
Cited By (28)
Published on Jan 1, 2004 in Research in Computational Molecular Biology
Pavel A. Pevzner (estimated H-index: 77), Haixu Tang (estimated H-index: 38), Glenn Tesler (estimated H-index: 27)
1 Citations
Published on Jan 1, 2010
Carlos Norberto Fischer (estimated H-index: 1), Adriane Beatriz de Souza Serapião (estimated H-index: 1)
With the advances in the genome area, new techniques and automation processes for DNA sequencing, the amount of data produced has increased exponentially. Analyzing this data, in order to identify interesting biological features, is an enormous challenge, especially if it would be done manually. Think about trying to find a specific word in a book, say Don Quixote, and we have to search word by word. How long it would take? Bioinformatics has played an important role trying to help specialists t...
1 Citations
Published on Aug 1, 2001 in Genome Biology 13.21
Natalia Volfovsky (estimated H-index: 15), Brian J. Haas (estimated H-index: 64), Steven L. Salzberg (estimated H-index: 118)
Background A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
118 Citations
Published on Dec 31, 1998 in New Comprehensive Biochemistry
David J. States (estimated H-index: 32; University of Washington), William C. Reisdorf (University of Washington)
Molecular sequence is an information-rich source of data that has been at the core of the revolution in molecular biology. The process of identifying similar sequences and grouping related sequences into classes is a complex process. This chapter discusses primary-sequence-based methods. The space of molecular sequences is very large. For a typical 300 amino acid protein, there are 20^300, or more than 10^390, possible sequences, much larger than the number of atoms in the universe. The genes found in ...
Published on Aug 19, 2000 in Intelligent Systems in Molecular Biology
Stefan Kurtz (estimated H-index: 30; Bielefeld University), Enno Ohlebusch (estimated H-index: 23), + 2 authors, Robert Giegerich (estimated H-index: 35)
The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic support. The REPuter family of programs described herein was designed to serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing to other analysis programs.
50 Citations
Published on Jul 1, 1998 in Intelligent Systems in Molecular Biology
Lloyd Allison (estimated H-index: 19; Monash University), Timothy Edgoose (estimated H-index: 5; Monash University), Trevor I. Dix (estimated H-index: 13; Monash University)
We describe a model for strings of characters that is loosely based on the Lempel Ziv model with the addition that a repeated substring can be an approximate match to the original substring; this is close to the situation of DNA, for example. Typically there are many explanations for a given string under the model, some optimal and many suboptimal. Rather than commit to one optimal explanation, we sum the probabilities over all explanations under the model because this gives the probability of t...
32 Citations
Published on Jan 1, 1997 in Genome Informatics
Eric Rivals (estimated H-index: 16), Jean-Paul Delahaye (estimated H-index: 16; Centre national de la recherche scientifique), + 1 author, Olivier Delgrange (estimated H-index: 5; University of Mons)
16 Citations
Published on Dec 19, 2005 in Genome Research 10.10
Anat Caspi (estimated H-index: 4), Lior Pachter (estimated H-index: 49)
Accurate genome-wide cataloging of transposable elements (TEs) will facilitate our understanding of mobile DNA evolution, expose the genomic effects of TEs on the host genome, and improve the quality of assembled genomes. Using the availability of several nearly complete Drosophila genomes and developments in whole genome alignment methods, we introduce a large-scale comparative method for identifying repetitive mobile DNA regions. These regions are highly enriched for transposable elements. Our...
42 Citations
Published on Jun 1, 2010 in Heredity 3.87
Emmanuelle Lerat (estimated H-index: 19)
Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs
114 Citations
Published on Mar 27, 2004 in Research in Computational Molecular Biology
Pavel A. Pevzner (estimated H-index: 77; University of California, San Diego), Haixu Tang (estimated H-index: 38; University of California, San Diego), Glenn Tesler (estimated H-index: 27; University of California, San Diego)
Repetitive sequences make up a significant fraction of almost any genome and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a radically new approach to repeat classification that is motivated by the fundamental topological notion of quotient spaces. A torus or Klein bottle are examples of quotient spaces that can be obtained from a square by gluing some points. Our new repeat classification algorithm is based on the observation...
13 Citations