A clustering method for repeat analysis in DNA sequences.

Published on Aug 1, 2001 in Genome Biology 13.21
· DOI: 10.1186/gb-2001-2-8-research0027
Natalia Volfovsky
Estimated H-index: 15
,
Brian J. Haas
Estimated H-index: 45
,
Steven L. Salzberg
Estimated H-index: 118
Abstract
Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
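As a rough illustration of what exact repeat detection involves, the sketch below finds the longest substring that occurs at least twice in a sequence. It is not the authors' system: it swaps the suffix tree for a naively sorted list of suffix positions, which is simpler to write but far slower on genome-scale input. The function name `longest_repeat` and the example sequence are assumptions made for illustration only.

```python
def longest_repeat(seq: str) -> tuple[str, list[int]]:
    """Return the longest substring occurring at least twice, plus its start positions.

    Illustrative stand-in for a suffix-tree search: suffix positions are
    sorted lexicographically, so the longest repeat appears as the longest
    common prefix of two adjacent suffixes in that order.
    """
    # Naive suffix ordering; fine for short strings, too slow for genomes.
    suffixes = sorted(range(len(seq)), key=lambda i: seq[i:])
    best_len, best_pos = 0, -1
    for a, b in zip(suffixes, suffixes[1:]):
        # Length of the common prefix of two adjacent suffixes.
        n = 0
        while a + n < len(seq) and b + n < len(seq) and seq[a + n] == seq[b + n]:
            n += 1
        if n > best_len:
            best_len, best_pos = n, a
    if best_len == 0:
        return "", []
    repeat = seq[best_pos:best_pos + best_len]
    # Collect every occurrence (occurrences may overlap one another).
    positions = [i for i in range(len(seq) - best_len + 1) if seq.startswith(repeat, i)]
    return repeat, positions


if __name__ == "__main__":
    print(longest_repeat("ACGTACGTTT"))
```

A suffix tree would give the same answer in linear time and also expose all shorter repeats as internal nodes; degenerate (inexact) repeats need additional machinery that this sketch does not attempt.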
References (23)
Published on Jan 1, 1999 in Nucleic Acids Research 11.56
Al Delcher
Estimated H-index: 1
,
Simon Kasif
Estimated H-index: 44
+ 2 Authors
Owen White
Estimated H-index: 84
28 Citations
Published on Jan 1, 1994 in Intelligent Systems in Molecular Biology
Pankaj K. Agarwal
Estimated H-index: 58
,
David J. States
Estimated H-index: 33
Over 3.6 million bases of DNA sequence from chromosome III of the C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-wa...
28 Citations
Published on Jan 1, 2000 in Nature 41.58
Arabidopsis Genome Initiative
Estimated H-index: 1
The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene tran...
6,700 Citations
Published on Aug 19, 2000 in Intelligent Systems in Molecular Biology
Stefan Kurtz
Estimated H-index: 31
(Bielefeld University),
Enno Ohlebusch
Estimated H-index: 23
+ 2 Authors
Robert Giegerich
Estimated H-index: 36
The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic support. The REPuter family of programs described herein was designed to serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing to other analysis programs.
50 Citations
Published on May 1, 1999 in Nature 41.58
Karen E. Nelson
Estimated H-index: 70
(J. Craig Venter Institute),
Rebecca A. Clayton
Estimated H-index: 15
(J. Craig Venter Institute)
+ 26 Authors
Karen A. Ketchum
Estimated H-index: 21
(J. Craig Venter Institute)
Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima
1,255 Citations
Published on Sep 15, 2000 in Nucleic Acids Research 11.56
Qiaoping Yuan
Estimated H-index: 10
(J. Craig Venter Institute),
Feng Liang
Estimated H-index: 8
(J. Craig Venter Institute)
+ 5 Authors
Robin Buell
Estimated H-index: 1
(J. Craig Venter Institute)
A wealth of molecular resources have been developed for rice genomics, including dense genetic maps, expressed sequence tags (ESTs), yeast artificial chromosome maps, bacterial artificial chromosome (BAC) libraries and BAC end sequence databases. Integration of genetic and physical maps involves labor-intensive empirical experiments. To accelerate the integration of the bacterial clone resources with the genetic map for the International Rice Genome Sequencing Project, we cleaned and filtered th...
38 Citations
Published on May 28, 1997
Dan Gusfield
Estimated H-index: 37
(University of California, Davis)
Part I. Exact String Matching: The Fundamental String Problem: 1. Exact matching: fundamental preprocessing and first algorithms 2. Exact matching: classical comparison-based methods 3. Exact matching: a deeper look at classical methods 4. Semi-numerical string matching Part II. Suffix Trees and their Uses: 5. Introduction to suffix trees 6. Linear time construction of suffix trees 7. First applications of suffix trees 8. Constant time lowest common ancestor retrieval 9. More applications of suf...
3,164 Citations
Published on Mar 1, 1993 in Nature Genetics 27.13
Warren Gish
Estimated H-index: 12
(National Institutes of Health),
David J. States
Estimated H-index: 33
(Washington University in St. Louis)
Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequenc...
1,366 Citations
Published on Jun 1, 1996 in SIAM Journal on Computing 0.90
Sampath Kannan
Estimated H-index: 33
,
Eugene W. Myers
Estimated H-index: 52
In this paper, we present an $O(N^2 \log^2 N)$ algorithm for finding the two nonoverlapping substrings of a given string of length $N$ which have the highest-scoring alignment between them. This significantly improves the previously best-known bound of $O(N^3)$ for the worst-case complexity of this problem. One of the central ideas in the design of this algorithm is that of partitioning a matrix into pieces in such a way that all submatrices of interest for this problem can be put together as...
38 Citations
Published on Oct 20, 1991 in Journal of Molecular Biology 4.89
Ming Ying Leung
Estimated H-index: 11
(University of Texas at San Antonio),
B. Edwin Blaisdell
Estimated H-index: 2
(Stanford University)
+ 1 Author
Samuel Karlin
Estimated H-index: 76
(Stanford University)
An efficient algorithm is described for finding matches, repeats and other word relations, allowing for errors, in large data sets of long molecular sequences. The algorithm entails hashing on fixed-size words in conjunction with the use of a linked list connecting all occurrences of the same word. The average memory and run time requirement both increase almost linearly with the total sequence length. Some results of the program's performance on a database of Escherichia coli DNA seque...
53 Citations
Cited By (118)
Published on Jan 1, 2004 in Research in Computational Molecular Biology
Pavel A. Pevzner
Estimated H-index: 78
,
Haixu Tang
Estimated H-index: 38
,
Glenn Tesler
Estimated H-index: 27
1 Citations
Published on Jan 1, 2004
Jia-Han Chu, Wei-Yuan Chang + 3 Authors
Hao Teng Chang
Estimated H-index: 13
Published on Jan 1, 2011 in Advances in Genetics 4.69
Dale J. Hedges
Estimated H-index: 31
(University of Miami),
Victoria P. Belancio
Estimated H-index: 18
(Tulane University)
Since their initial discovery in maize, there have been various attempts to categorize the relationship between transposable elements (TEs) and their host organisms. These have ranged from TEs being selfish parasites to their role as essential, functional components of organismal biology. Research over the past several decades has, in many respects, only served to complicate the issue even further. On the one hand, investigators have amassed substantial evidence concerning the negative effects t...
15 Citations
Published on Aug 6, 2015 in Microbiology Spectrum
Jainy Thomas
Estimated H-index: 8
(University of Utah),
Ellen J. Pritham
Estimated H-index: 21
(University of Utah)
Helitrons, the eukaryotic rolling-circle transposable elements, are widespread but most prevalent among plant and animal genomes. Recent studies have identified three additional coding and structural variants of Helitrons called Helentrons, Proto-Helentron, and Helitron2. Helitrons and Helentrons make up a substantial fraction of many genomes where nonautonomous elements frequently outnumber the putative autonomous partner. This includes the previously ambiguously classified DINE-1-like repeats,...
22 Citations
Published on Feb 19, 2016 in Discrete Applied Mathematics 0.93
Atsuyoshi Nakamura
Estimated H-index: 8
(Hokkaido University),
Ichigaku Takigawa
Estimated H-index: 13
(Hokkaido University)
+ 2 Authors
Hiroshi Mamitsuka
Estimated H-index: 25
(Kyoto University)
We consider a frequent approximate pattern mining problem, in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of a given string locally and optimally. For this problem, we propose a new algorithm, in which candidate patterns are generated without duplication using the suffix tree of a given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a patter...
5 Citations
Shuaibin Lian
Estimated H-index: 1
(Xinyang Normal University),
Xinwu Chen
Estimated H-index: 1
(Xinyang Normal University)
+ 2 Authors
Xianhua Dai
Estimated H-index: 1
(Sun Yat-sen University)
It has become clear that repetitive sequences have played multiple roles in eukaryotic genome evolution, including increasing genetic diversity through mutation, changing gene expression, and facilitating the generation of novel genes. However, identifying repetitive elements ab initio can be difficult. Some classical ab initio repeat-finding tools have already been presented and compared, but their completeness and accuracy in detecting repeats are rather poor. To th...
4 Citations
Published on Jan 1, 2005 in Methods in Enzymology 1.98
Robert K. Jansen
Estimated H-index: 47
,
Linda A. Raubeson
Estimated H-index: 14
+ 12 Authors
Sallie J. Herman
Estimated H-index: 1
During the past decade, there has been a rapid increase in our understanding of plastid genome organization and evolution due to the availability of many new completely sequenced genomes. There are 45 complete genomes published and ongoing projects are likely to increase this sampling to nearly 200 genomes during the next 5 years. Several groups of researchers including ours have been developing new techniques for gathering and analyzing entire plastid genome sequences and details of th...
203 Citations
Published on Apr 25, 2006 in BMC Evolutionary Biology 3.03
Jean-Charles de Cambiaire
Estimated H-index: 5
(Laval University),
Christian Otis
Estimated H-index: 34
(Laval University)
+ 1 Author
Monique Turmel
Estimated H-index: 41
(Laval University)
Background: The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order....
59 Citations
Published on Sep 1, 2014 in Journal of Biomedical Research
Udayakumar Mani
Estimated H-index: 1
,
Vaidhyanathan Mahaganapathy + 1 Author
Sai Mukund Ramakrishnan
Estimated H-index: 1
The functionality of a gene or a protein depends on codon repeats occurring in it. As a consequence of their vitality in protein function and apparent involvement in causing diseases, an interest in these repeats has developed in recent years. The analysis of genomic and proteomic sequences to identify such repeats requires some algorithmic support from informatics level. Here, we proposed an offline stand-alone toolkit Repeat Searcher and Motif Detector (RSMD), which uncovers and employs few no...
Published on Jan 1, 2004
Tun-Wen Pai
Estimated H-index: 9
(National Taiwan Ocean University),
Margaret Dah-Tsyr Chang
Estimated H-index: 25
+ 2 Authors
Hsiu Ling Tai
Estimated H-index: 1
In this study we have designed a novel algorithm for searching common segments in multiple DNA sequences. To improve efficiency in pattern searching, a combination of hash encoding, quick sorting, and ladder-like stepping and/or interval jumping techniques is applied. Since multiple sequence alignment of DNA sequences from the giant genomic database is usually time consuming, we develop a three-phase methodology to search common sub-segments and reduce its time complexity for pattern matching. I...