
The Repeat Pattern Toolkit (RPT): analyzing the structure and evolution of the C. elegans genome.

Published on Jan 1, 1994 in Intelligent Systems in Molecular Biology
Pankaj K. Agarwal, estimated H-index 59
David J. States, estimated H-index 34
Abstract
Over 3.6 million bases of DNA sequence from chromosome III of C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-way alignments), dividing the set of alignments into connected components (signifying repeat families), computing evolutionary distance between repeat family members, constructing minimum spanning trees from the connected components, and visualizing the evolution of the repeat families. Over 7000 families of repetitive sequences were identified. The size of the families ranged from isolated pairs to over 1600 segments of similar sequence. Approximately 12.3% of the analyzed sequence participates in a repeat element.
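As a rough illustration of two of the steps named in the abstract (grouping alignments into connected components and building minimum spanning trees), here is a minimal Python sketch. It is not the RPT implementation: the input format (triples of two segment identifiers and a distance), the function names, and the use of union-find with Kruskal's algorithm are all assumptions made for this example.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over arbitrary hashable segment identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return True if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def repeat_families(alignments):
    """Connected components of the alignment graph (hypothetical input format).

    alignments: iterable of (segment_a, segment_b, distance) triples.
    Returns a sorted list of families, each a sorted list of segment ids.
    """
    uf = UnionFind()
    for a, b, _ in alignments:
        uf.union(a, b)
    families = defaultdict(list)
    for seg in list(uf.parent):
        families[uf.find(seg)].append(seg)
    return sorted(sorted(members) for members in families.values())


def minimum_spanning_forest(alignments):
    """Kruskal's algorithm: yields one minimum spanning tree per family."""
    uf = UnionFind()
    edges = []
    for a, b, d in sorted(alignments, key=lambda e: e[2]):
        if uf.union(a, b):  # keep the edge only if it joins two components
            edges.append((a, b, d))
    return edges


# Toy data: segment ids and distances are invented for illustration.
toy_alignments = [
    ("s1", "s2", 0.10),
    ("s2", "s3", 0.25),
    ("s1", "s3", 0.40),
    ("s4", "s5", 0.05),
]
families = repeat_families(toy_alignments)
forest = minimum_spanning_forest(toy_alignments)
```

On the toy data, the two families are {s1, s2, s3} and {s4, s5}, and the s1-s3 edge is dropped from the forest because the two shorter edges already connect that family.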
References (23)
Published on Feb 1, 1994 in Nature Genetics 27.13
Stephen F. Altschul, estimated H-index 46
Mark S. Boguski, estimated H-index 47
(+ 1 author)
John C. Wootton, estimated H-index 30
Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention had been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the ch...
688 Citations
Published on Jan 1, 1994
Bieganski (University of Minnesota), estimated H-index 1
Riedl (University of Minnesota), estimated H-index 1
(+ 1 author)
Retzel (University of Minnesota), estimated H-index 2
This paper addresses applications of suffix trees and generalized suffix trees (GSTs) to biological sequence data analysis. We define a basic set of suffix trees and GST operations needed to support sequence data analysis. While those definitions are straightforward, the construction and manipulation of disk-based GST structures for large volumes of sequence data requires intricate design. GST processing is fast because the structure is content addressable, supporting efficient searches for all ...
79 Citations
J P McMillan, estimated H-index 1
M F Singer, estimated H-index 1
Abstract Full-length RNA transcribed from the human LINE-1 (L1) element L1 Homo sapiens (L1Hs) has a 900-nt, G+C-rich, 5'-untranslated region (UTR). The 5' UTR is followed by two long open reading frames, ORF1 and ORF2, which are separated from each other by an inter-ORF region of 33 nt that includes two or three in-frame stop codons. We examine here the mechanism(s) by which the translation of L1Hs ORF1 and ORF2 is initiated. A stable hairpin structure (delta G = -74.8 kcal/mol), inserted at nt...
58 Citations
Published on Jul 1, 1993 in Intelligent Systems in Molecular Biology
Andrea Califano, estimated H-index 57
Isidore Rigoutsos, estimated H-index 48
139 Citations
Published on Mar 1, 1993 in Journal of Molecular Evolution 1.96
Stephen F. Altschul (National Institutes of Health), estimated H-index 46
Protein sequence alignments generally are constructed with the aid of a “substitution matrix” that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a “log-odds” matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be deri...
181 Citations
Published on Jan 1, 1993 in Bioinformatics 5.48
Aleksandar Milosavljević (Linus Pauling Institute), estimated H-index 4
Jerzy Jurka (Linus Pauling Institute), estimated H-index 58
A new method, «algorithmic significance», is proposed as a tool for discovery of patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of the Occam's Razor principle. In this paper the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain «words» and thus can be enc...
67 Citations
Published on Oct 1, 1992 in Journal of Molecular Evolution 1.96
Jerzy Jurka (Linus Pauling Institute), estimated H-index 58
Jolanta Walichiewicz (Linus Pauling Institute), estimated H-index 2
Aleksandar Milosavljević (Linus Pauling Institute), estimated H-index 4
We report a collection of 53 prototypic sequences representing known families of repetitive elements from the human genome. The prototypic sequences are either consensus sequences or selected examples of repetitive sequences. The collection includes: prototypes for high and medium reiteration frequency interspersed repeats, long terminal repeats of endogenous retroviruses, alphoid repeats, telomere-associated repeats, and some miscellaneous repeats. The collection is annotated and available elec...
125 Citations
Published on Aug 1, 1991 in Methods 4.00
David J. States (National Institutes of Health), estimated H-index 34
Warren Gish (National Institutes of Health), estimated H-index 15
Stephen F. Altschul (National Institutes of Health), estimated H-index 46
Scoring matrices for nucleic acid sequence comparison that are based on models appropriate to the analysis of molecular sequencing errors or biological mutation processes are presented. In mammalian genomes, transition mutations occur significantly more frequently than transversions, and the optimal scoring of sequence alignments based on this substitution model differs from that derived assuming a uniform mutation model. The information from sequence alignments potentially available using an op...
125 Citations
Published on Jun 1, 1991 in Journal of Molecular Biology 4.89
Stephen F. Altschul (National Institutes of Health), estimated H-index 46
Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a "substitution score matrix" that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a "log-odds" matrix, with a specific target distribution for align...
557 Citations
Published on Oct 1, 1990 in Journal of Molecular Biology 4.89
Stephen F. Altschul (National Institutes of Health), estimated H-index 46
Warren Gish (National Institutes of Health), estimated H-index 15
(+ 2 authors)
David J. Lipman (National Institutes of Health), estimated H-index 45
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a vari...
59.9k Citations
Cited By (29)
Published on May 30, 2015
Kristal Curtis, estimated H-index 4
Ameet Talwalkar, estimated H-index 25
(+ 2 authors)
David A. Patterson, estimated H-index 78
Abstract: Next-generation genomic sequencing costs are rapidly decreasing, having recently reached the $1000-per-genome barrier, a likely tipping point for widespread clinical use. However, genomic analysis techniques have failed to keep pace. In particular, the process of variant calling, or inferring a sample genome from the noisy sequencing data, introduces major computational and statistical challenges. In this work, we explore the feasibility of a hybrid approach that addresses these chall...
1 Citations
Published on Jan 1, 2015
Though DNA sequencing has improved dramatically over the past decade, variant calling, which is the process of reconstructing a patient’s genome from the reads that the sequencers produce, remains a difficult problem, largely due to the genome’s redundant structure. In this thesis, we describe SiRen, our algorithm for characterizing the genome’s structure in a way that makes sense from the perspective of the reads themselves. We use the term similar regions to refer to the areas of redundancy th...
Published on Jan 1, 2013 in Methods of Molecular Biology
Ning Jiang (Michigan State University), estimated H-index 33
6 Citations
Torabi Dashti Hesam (University of Tehran), estimated H-index 1
Masoudi Nejad Ali, estimated H-index 1
Zare Fatemeh
Finding repetitive subsequences in genome is a challengeable problem in bioinformatics research area. A lot of approaches have been proposed to solve the problem, which could be divided to library base and de novo methods. The library base methods use predetermined repetitive genome's subsequences, where library-less methods attempt to discover repetitive subsequences by analytical approaches. In this article we propose novel de novo methodology which stands on theory of pattern recognition's sc...
Published on Jan 1, 2012 in Methods of Molecular Biology
Wojciech Makal (University of Münster), estimated H-index 31
Amit Pande (University of Münster), estimated H-index 2
(+ 1 author)
Izabela Makalowska (Adam Mickiewicz University in Poznań), estimated H-index 27
20 Citations
Published on Aug 1, 2011 in Chromosome Research 2.91
Mateusz Janicki (University of Toronto), estimated H-index 1
Rebecca Rooke (University of Toronto), estimated H-index 3
Guojun Yang (University of Toronto), estimated H-index 23
A major portion of most eukaryotic genomes are transposable elements (TEs). During evolution, TEs have introduced profound changes to genome size, structure, and function. As integral parts of genomes, the dynamic presence of TEs will continue to be a major force in reshaping genomes. Early computational analyses of TEs in genome sequences focused on filtering out “junk” sequences to facilitate gene annotation. When the high abundance and diversity of TEs in eukaryotic genomes were recognized, t...
39 Citations
Published on Aug 2, 2010 in International Conference on Bioinformatics
Nirmalya Bandyopadhyay (University of Florida), estimated H-index 8
A. Mark Settles (University of Florida), estimated H-index 15
Tamer Kahveci (University of Florida), estimated H-index 22
Growing sequencing and assembly efforts have been met by the advances in high throughput machines. However, the presence of massive amounts of repeats and transposons complicates the assembly process. Given a library of possible repeats, this paper considers the problem of identifying repeats and transposons in the fragments (also called reads) generated from sequencing machines. This is a difficult problem as the locations of the fragments on the complete genome are not known. Furthermore, due ...
Published on Jun 1, 2010 in Heredity 3.87
Emmanuelle Lerat, estimated H-index 19
Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs
121 Citations
Published on Jan 1, 2010
Carlos Norberto Fischer, estimated H-index 1
Adriane Beatriz de Souza Serapião, estimated H-index 1
With the advances in the genome area, new techniques and automation processes for DNA sequencing, the amount of data produced has increased exponentially. Analyzing this data, in order to identify interesting biological features, is an enormous challenge, especially if it would be done manually. Think about trying to find a specific word in a book, say Don Quixote, and we have to search word by word. How long it would take? Bioinformatics has played an important role trying to help specialists t...
1 Citations