Computation and Visualization of Degenerate Repeats in Complete Genomes
Published on Aug 19, 2000 in Intelligent Systems in Molecular Biology
Stefan Kurtz
Estimated H-index: 30
(Bielefeld University),
Enno Ohlebusch
Estimated H-index: 23
+ 2 Authors
Robert Giegerich
Estimated H-index: 35
Abstract
The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic support. The REPuter family of programs described herein was designed to serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing to other analysis programs.
References (29)
Published on Jan 1, 1995
Marie-France Sagot
Estimated H-index: 33
,
Vincent Escalier
Estimated H-index: 2
+ 1 Authors
Henri Soldano
Estimated H-index: 5
30 Citations
Published on Jan 1, 1994 in Intelligent Systems in Molecular Biology
Pankaj K. Agarwal
Estimated H-index: 52
,
David J. States
Estimated H-index: 32
Over 3.6 million bases of DNA sequence from chromosome III of the C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-wa...
28 Citations
Published on Jan 1, 1993
Edwin H. McConkey
Estimated H-index: 1
Begins with molecular characterization of the human genome (rather than the conventional descriptions of Mendelian inheritance, pedigree analysis, and chromosome abnormalities), and maintains this emphasis on understanding human genetics in molecular terms throughout. Suitable as a text for biology
31 Citations
Published on Jan 1, 1986 in Methods in Enzymology 1.98
Walter M. Fitch
Estimated H-index: 51
,
Temple F. Smith
Estimated H-index: 37
,
Jan L. Breslow
Estimated H-index: 87
Publisher Summary Sequence repeats are most easily generated through combinations of gene duplication and unequal crossings over. This chapter describes the detection of internally repeated sequences, and explores the ancestral history. The phylogenetic reconstruction method for detecting tandem repeats is quite powerful. The sensitivity of the method in any particular case is dependent upon the number of repeats, the fraction of the sequence that is composed of repeats, and the evolutionary con...
6 Citations
Published on Apr 20, 1998
Marie-France Sagot
Estimated H-index: 33
(Pasteur Institute)
We present in this paper two algorithms. The first one extracts repeated motifs from a sequence defined over an alphabet Σ. For instance, Σ may be equal to {A, C, G, T} and the sequence represents an encoding of a DNA macromolecule. The motifs searched correspond to words over the same alphabet which occur a minimum number q of times in the sequence with at most e mismatches each time (q is called the quorum constraint). The second algorithm extracts common motifs from a set of N ≥ 2 sequences. ...
193 Citations
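To make the quorum constraint in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it simply enumerates every word of a fixed length m over the alphabet and counts the positions where the word occurs with at most e mismatches. The function name `motifs_with_quorum`, the fixed motif length `m`, and the default alphabet are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List


def motifs_with_quorum(seq: str, m: int, q: int, e: int,
                       alphabet: str = "ACGT") -> Dict[str, List[int]]:
    """Return every word of length m that occurs at least q times in seq
    with at most e mismatches, mapped to its occurrence positions.

    Brute force over all len(alphabet)**m candidate words, so it is only
    usable for small m (illustrative assumption, not the paper's method).
    """
    found: Dict[str, List[int]] = {}
    for letters in product(alphabet, repeat=m):
        word = "".join(letters)
        positions = []
        for i in range(len(seq) - m + 1):
            window = seq[i:i + m]
            # Count mismatching positions between the candidate and the window.
            mismatches = sum(a != b for a, b in zip(word, window))
            if mismatches <= e:
                positions.append(i)
        if len(positions) >= q:  # quorum constraint
            found[word] = positions
    return found
```

For example, `motifs_with_quorum("ACGTACGA", m=3, q=2, e=0)` keeps only the word `ACG`, which occurs exactly at positions 0 and 4.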
Published on Jun 30, 1997
Stefan Kurtz
Estimated H-index: 30
(Bielefeld University),
Gene Myers
Estimated H-index: 12
(University of Arizona)
16 Citations
Published on Jun 2, 1993
Gad M. Landau
Estimated H-index: 31
(New York University),
Jeanette P. Schmidt
Estimated H-index: 15
(New York University)
A perfect tandem repeat within a string S is a substring r = r1,... r2l of S, for which r1 ... rl = rl+1 ... r2l. An approximate tandem repeat is a substring r = r1,..., rl′,... rl, for which r1,..., rl′ and rl′+1, ... rl are similar. In this paper we consider two criterions of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k our algorithm reports all locally optimal approximate repeats, r = ūu, for which the Hammi...
78 Citations
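The definitions in the abstract above (a perfect tandem repeat has two identical halves; an approximate one under the Hamming criterion has halves that differ in at most k positions) can be illustrated with a naive scan. This sketch is not the algorithm of the paper and does not report locally optimal repeats; it only checks a fixed half-length at every start position. The names `hamming` and `tandem_repeats` are hypothetical.

```python
from typing import List


def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(u, v))


def tandem_repeats(s: str, half_len: int, k: int = 0) -> List[int]:
    """Start positions i such that s[i:i+half_len] and the half_len
    characters that follow it differ in at most k positions.

    With k = 0 this finds perfect tandem repeats of total length
    2 * half_len; with k > 0 it finds approximate ones (k mismatches).
    """
    if half_len <= 0:
        raise ValueError("half_len must be positive")
    hits = []
    for i in range(len(s) - 2 * half_len + 1):
        left = s[i:i + half_len]
        right = s[i + half_len:i + 2 * half_len]
        if hamming(left, right) <= k:
            hits.append(i)
    return hits
```

On the string `ACGACGTT` with `half_len=3`, only position 0 (`ACG` followed by `ACG`) qualifies when `k=0`, while allowing one mismatch also admits position 1 (`CGA` followed by `CGT`).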
Published on Jan 1, 1974 in Journal of the ACM 1.74
Robert A. Wagner
Estimated H-index: 5
(Vanderbilt University),
Michael J. Fischer
Estimated H-index: 45
(Massachusetts Institute of Technology)
The string-to-string correction problem is to determine the distance between two strings as measured by the minimum cost sequence of “edit operations” needed to change the one string into the other. The edit operations investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings. ...
2,319 Citations
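The dynamic program summarized in the abstract above can be sketched in a few lines of Python. This version assumes a unit cost for each of the three edit operations, which is a simplification chosen for illustration, and the function name `edit_distance` is ours. Its running time is proportional to the product of the two string lengths, as the abstract states; it keeps only two rows of the table to save memory.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-symbol changes, deletions, and insertions
    needed to turn string a into string b (unit costs assumed)."""
    # prev[j] is the distance between the current prefix of a and b[:j].
    # Initially the prefix of a is empty, so the distance is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # change ca into cb (free if equal)
            ))
        prev = curr
    return prev[-1]
```

For instance, `edit_distance("kitten", "sitting")` evaluates to 3 (two changes and one insertion).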
Published on Jul 1, 1984 in Bulletin of Mathematical Biology 1.48
Michael S. Waterman
Estimated H-index: 61
(University of Southern California),
Richard Arratia
Estimated H-index: 27
(University of Southern California),
D. J. Galas
Estimated H-index: 1
(University of Southern California)
The comparison of several sequences is central to many problems of molecular biology. Finding consensus patterns that define genetic control regions or that determine structural or functional themes are examples of these problems. Previously proposed methods, such as dynamic programming, are not adequate for solving problems of realistic size. This paper gives a new and practical solution for finding unknown patterns that occur imperfectly above a preset frequency. Algorithms for finding the pat...
130 Citations
Published on Jan 1, 1998 in Journal of Computational Biology 1.19
Marie-France Sagot
Estimated H-index: 33
,
Eugene W. Myers
Estimated H-index: 53
We present in this paper an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats in number between 30 and as many as 1,000,000 whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30–40 base pairs) approximate tandem repeats where copies may differ up to ϵ = 15–20% from a consensus model of the repeating unit (imp...
40 Citations
Cited By (50)
Published on Jan 1, 2004 in Research in Computational Molecular Biology
Pavel A. Pevzner
Estimated H-index: 77
,
Haixu Tang
Estimated H-index: 38
,
Glenn Tesler
Estimated H-index: 27
1 Citations
Published on Jan 1, 2008
G Achaz, F Boyer + 1 Authors, E Coissac
The importance of genome redundancy has been strongly emphasized in the field of genome dynamics and evolution as well as in medical biology. A repeat is a sequence present twice or more with a high degree of similarity within a larger sequence (e.g. a chromosome) or set of sequences (e.g. a genome with several chromosomes). Each instance of the repeated sub-sequence is called a "copy" of the repeat. We use the term "duplication" to denote any active mechanistic event that creates a repeat. Even...
Published on Jan 1, 2009
Searching for repetitive structures in DNA sequences is a major problem in bioinformatics research. We propose a novel index structure, called Parent-of-Leaves (POL) index and an algorithm for finding supermaximal repeats (SMR) which uses the index. The index is derived from and designed to replace the more versatile, but considerably larger suffix tree index STTD64. The results of our experiments using 24 Homo sapiens chromosomes indicate that SMR significantly outperforms the Vmatch tool, the ...
Published on Aug 31, 2011 in International Conference on Information Technology
Maria Federico
Estimated H-index: 4
(University of Pisa),
Nadia Pisanti
Estimated H-index: 13
(University of Pisa)
Frequent patterns (motifs) in biological sequences are good candidates to correspond to structural or functional important elements. The typical output of existing tools for the exhaustive detection of approximated motifs is a long list of motifs containing some real motifs (i.e., patterns representing functional elements) along with a large number of random variations of them, called artifacts. Artifacts increase the output size, often leading to redundant and poorly usable results for biologis...
Published on Jan 1, 2015
Cyanobacteria are globally widespread and ecologically highly significant photoautotrophic microorganisms, with diverse geno- and phenotypic characters unprecedented among prokaryotes. This phylum ...
Published on Nov 1, 2004
Teemu Kivioja
Estimated H-index: 1
In this thesis we provide computational tools for the planning of VTT-TRAC experiments. VTT-TRAC is a novel method for measuring expression levels of genes. Monitoring gene expression by measuring the amounts of transcribed mRNAs (transcriptional profiling) has become an important experimental method in molecular biology. This has been due to rapid advance in the high-throughput measurement technology. Methods like microarrays are capable of measuring thousands of expression levels in one experim...
6 Citations
Published on Aug 6, 2015 in Microbiology Spectrum
Jainy Thomas
Estimated H-index: 8
(University of Utah),
Ellen J. Pritham
Estimated H-index: 21
(University of Utah)
Helitrons, the eukaryotic rolling-circle transposable elements, are widespread but most prevalent among plant and animal genomes. Recent studies have identified three additional coding and structural variants of Helitrons called Helentrons, Proto-Helentron, and Helitron2. Helitrons and Helentrons make up a substantial fraction of many genomes where nonautonomous elements frequently outnumber the putative autonomous partner. This includes the previously ambiguously classified DINE-1-like repeats,...
20 Citations
Published on Dec 23, 2003 in BMC Bioinformatics 2.21
Michael Brudno
Estimated H-index: 44
(Stanford University),
Michael A. Chapman
Estimated H-index: 15
(University of Cambridge)
+ 2 Authors
Burkhard Morgenstern
Estimated H-index: 36
(Bielefeld University)
Background: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achi...
153 Citations
Published on Sep 11, 2006
Aaron E. Darling
Estimated H-index: 34
(University of Wisconsin-Madison),
Todd J. Treangen
Estimated H-index: 6
(Polytechnic University of Catalonia)
+ 3 Authors
Nicole T. Perna
Estimated H-index: 39
(University of Wisconsin-Madison)
We describe an efficient local multiple alignment filtration heuristic for identification of conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns to match both DNA strands simultaneously, (2) seed extension (chaining) in order of decreasing multiplicity, and (3) procrastination when low multiplicity matches are encountered. The resulting local multiple alignments may have nucleotide substitutions and internal gaps as la...
15 Citations