Computation and Visualization of Degenerate Repeats in Complete Genomes

Published on Aug 19, 2000 in Intelligent Systems in Molecular Biology
Stefan Kurtz
Estimated H-index: 31
(Bielefeld University),
Enno Ohlebusch
Estimated H-index: 23
+ 2 Authors, Robert Giegerich
Estimated H-index: 37
Abstract
The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic support. The REPuter family of programs described herein was designed to serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing to other analysis programs.
References (29)
Published on Dec 1, 1999 in Nature 41.58
Ian Dunham
Estimated H-index: 47
(Wellcome Trust Sanger Institute),
Nobuyoshi Shimizu
Estimated H-index: 38
(Wellcome Trust Sanger Institute)
+ 1 Author, S. Chissoe
Estimated H-index: 1
(Wellcome Trust Sanger Institute)
Knowledge of the complete genomic DNA sequence of an organism allows a systematic approach to defining its genetic components. The genomic sequence provides access to the complete structures of all genes, including those without known function, their control elements, and, by inference, the proteins they encode, as well as all other biologically important sequences. Furthermore, the sequence is a rich and permanent source of information for the design of further biological studies of the organis...
1,302 Citations
Published on Nov 1, 1999 in Software - Practice and Experience 1.34
Stefan Kurtz
Estimated H-index: 31
(Bielefeld University)
SUMMARY We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different type. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, a...
287 Citations
Published on Jul 1, 1999
Vladimir N. Babenko
Estimated H-index: 12
P. S. Kosarev
Estimated H-index: 1
+ 3 Authors, Anatoly S. Frolov
Estimated H-index: 13
Motivation: Despite the growing volume of data on primary nucleotide sequences, the regulatory regions remain a major puzzle with regard to their function. Numerous recognising programs considering a diversity of properties of regulatory regions have been developed. The system proposed here allows the specific contextual, conformational and physico-chemical properties to be revealed based on analysis of extended DNA regions. Results: The Internet-accessible computer system RegScan, designed to a...
29 Citations
Published on May 1, 1999 in Bioinformatics 5.48
Stefan Kurtz
Estimated H-index: 31
Chris Schleiermacher
Estimated H-index: 5
A software tool was implemented that computes exact repeats and palindromes in entire genomes very efficiently.
312 Citations
Published on Jan 1, 1999 in Nucleic Acids Research 11.56
Arthur L. Delcher
Estimated H-index: 47
(Celera Corporation),
Simon Kasif
Estimated H-index: 45
(University of Illinois at Chicago)
+ 3 Authors, Steven L. Salzberg
Estimated H-index: 120
A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation...
693 Citations
Published on Jan 1, 1999 in Nucleic Acids Research 11.56
Gary Benson
Estimated H-index: 26
(Icahn School of Medicine at Mount Sinai)
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem re...
3,595 Citations
Published on Sep 1, 1998 in Bioinformatics 5.48
Pierre Vincens
Estimated H-index: 13
Laurent Buffat
Estimated H-index: 2
+ 3 Authors, Serge Hazout
Estimated H-index: 17
Motivation: Complete genomic sequences will become available in the future. New methods to deal with very large sequences (sizes beyond 100 kb) efficiently are required. One of the main aims of such work is to increase our understanding of genome organization and evolution. This requires studies of the locations of regions of similarity. Results: We present here a new tool, ASSIRC ('Accelerated Search for SImilarity Regions in Chromosomes'), for finding regions of similarity in genomic sequences. ...
35 Citations
Published on Aug 1, 1998 in SIAM Journal on Computing 0.90
Jeanette P. Schmidt
Estimated H-index: 15
Weighted paths in directed grid graphs of dimension $(m \times n)$ can be used to model the string edit problem, which consists of obtaining optimal (weighted) alignments between substrings of $A$, $|A|=m$, and substrings of $B$, $|B|=n$. We build a data structure (in $O(mn \log m)$ time) that supports $O(\log m)$ time queries about the weight of any of the $O(m^2 n)$ best paths from the vertices in column 0 of the graph to all other vertices. Using these techniques we present a simple $O(n^2 \log n)$ time and $\Theta(n^2)$...
115 Citations
Published on Jun 1, 1998 in Molecular Microbiology 3.82
Chih-Hung Huang
Estimated H-index: 13
(National Yang-Ming University),
Yi-Shing Lin
Estimated H-index: 6
(National Yang-Ming University)
+ 2 Authors, Carton W. Chen
Estimated H-index: 17
(National Yang-Ming University)
Summary The chromosomes of the Gram-positive soil bacteria Streptomyces are linear DNA molecules, usually of about 8 Mb, containing a centrally located origin of replication and covalently bound terminal proteins (which are presumably involved in the completion of replication of the telomeres). The ends of the chromosomes contain inverted repeats of variable lengths. The terminal segments of five Streptomyces chromosomes and plasmids were cloned and sequenced. The sequences showed a high degree ...
104 Citations
Published on Apr 20, 1998
Marie-France Sagot
Estimated H-index: 33
(Pasteur Institute)
We present in this paper two algorithms. The first one extracts repeated motifs from a sequence defined over an alphabet Σ. For instance, Σ may be equal to {A, C, G, T} and the sequence represents an encoding of a DNA macromolecule. The motifs searched correspond to words over the same alphabet which occur a minimum number q of times in the sequence with at most e mismatches each time (q is called the quorum constraint). The second algorithm extracts common motifs from a set of N ≥ 2 sequences. ...
195 Citations
Cited By (50)
Published on Aug 6, 2015 in Microbiology Spectrum
Jainy Thomas
Estimated H-index: 8
(University of Utah),
Ellen J. Pritham
Estimated H-index: 22
(University of Utah)
Helitrons, the eukaryotic rolling-circle transposable elements, are widespread but most prevalent among plant and animal genomes. Recent studies have identified three additional coding and structural variants of Helitrons called Helentrons, Proto-Helentron, and Helitron2. Helitrons and Helentrons make up a substantial fraction of many genomes where nonautonomous elements frequently outnumber the putative autonomous partner. This includes the previously ambiguously classified DINE-1-like repeats,...
25 Citations
Published on Jan 1, 2015
Cyanobacteria are globally widespread and ecologically highly significant photoautotrophic microorganisms, with diverse geno- and phenotypic characters unprecedented among prokaryotes. This phylum ...
Published on Jan 1, 2014
María Botón-Fernández
Estimated H-index: 3
Carlos Martín-Vide
Estimated H-index: 26
+ 1 Author, Miguel A. Vega-Rodríguez
Estimated H-index: 20
5 Citations
Torabi Dashti Hesam
Estimated H-index: 1
(University of Tehran),
Masoudi Nejad Ali
Estimated H-index: 1
Zare Fatemeh
Finding repetitive subsequences in genomes is a challenging problem in bioinformatics research. Many approaches have been proposed to solve the problem, which can be divided into library-based and de novo methods. Library-based methods use predetermined repetitive genome subsequences, whereas library-less methods attempt to discover repetitive subsequences by analytical approaches. In this article we propose a novel de novo methodology which stands on theory of pattern recognition's sc...
Published on Aug 31, 2011 in International Conference on Information Technology
Maria Federico
Estimated H-index: 4
(University of Pisa),
Nadia Pisanti
Estimated H-index: 13
(University of Pisa)
Frequent patterns (motifs) in biological sequences are good candidates to correspond to structural or functional important elements. The typical output of existing tools for the exhaustive detection of approximated motifs is a long list of motifs containing some real motifs (i.e., patterns representing functional elements) along with a large number of random variations of them, called artifacts. Artifacts increase the output size, often leading to redundant and poorly usable results for biologis...
Published on Aug 1, 2011
Dan He
Estimated H-index: 13
(University of California, Los Angeles),
Xingquan Zhu
Estimated H-index: 37
(Florida Atlantic University),
Xindong Wu
Estimated H-index: 46
(Hefei University of Technology)
The rapid increase of available DNA, protein, and other biological sequences has made the problem of discovering meaningful patterns from sequences an important task for Bioinformatics research. Among all types of patterns defined in the literature, the most challenging one is to find repeating patterns with gap constraints. In this article, we identify a new research problem for mining approximate repeating patterns (ARPs) with gap constraints, where the appearance of a pattern is subject to an...
7 Citations
Published on Apr 29, 2011 in PLOS ONE 2.77
Liliana Losada
Estimated H-index: 15
(J. Craig Venter Institute),
John J. Varga
Estimated H-index: 14
(J. Craig Venter Institute)
+ 5 Authors, William C. Nierman
Estimated H-index: 56
(George Washington University)
Yersinia pestis is the causative agent of the plague. Y. pestis KIM 10+ strain was passaged and selected for loss of the 102 kb pgm locus, resulting in an attenuated strain, KIM D27. In this study, whole genome sequencing was performed on KIM D27 in order to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing both strains, the vast majority associated with A or T hom...
18 Citations
Published on Jan 1, 2010
Carlos Norberto Fischer
Estimated H-index: 1
Adriane Beatriz de Souza Serapião
Estimated H-index: 1
With the advances in the genome area, new techniques and automation processes for DNA sequencing, the amount of data produced has increased exponentially. Analyzing this data, in order to identify interesting biological features, is an enormous challenge, especially if done manually. Think about trying to find a specific word in a book, say Don Quixote, by searching word by word. How long would it take? Bioinformatics has played an important role trying to help specialists t...
1 Citation
Published on Dec 1, 2009 in BMC Bioinformatics 2.21
Josiah D Seaman
Estimated H-index: 1
John C. Sanford
Estimated H-index: 26
(Cornell University)
Background It is increasingly evident that there are multiple and overlapping patterns within the genome, and that these patterns contain different types of information - regarding both genome function and genome history. In order to discover additional genomic patterns which may have biological significance, novel strategies are required. To partially address this need, we introduce a new data visualization tool entitled Skittle.
3 Citations
Published on Nov 1, 2009 in International Conference on Tools with Artificial Intelligence
Dan He
Estimated H-index: 13
(University of California, Los Angeles),
Xingquan Zhu
Estimated H-index: 37
(Chinese Academy of Sciences),
Xindong Wu
Estimated H-index: 46
(Hefei University of Technology)
In this paper, we define a new research problem for mining approximate repeating patterns (ARP) with gap constraints, where the appearance of a pattern is subject to an approximate matching, which is very common in biological sciences. To solve the problem, we propose an ArpGap (Approximate repeating pattern mining with Gap constraints) algorithm with three major components for approximate repeating pattern mining: (1) a data-driven pattern generation approach to avoid generating unnecessary pat...
7 Citations