A clustering method for repeat analysis in DNA sequences.
Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
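As a rough illustration of how suffix-based indexing exposes exact repeats, the sketch below sorts all suffixes of a sequence and compares neighbours. It is not the method described above: it swaps in a naively sorted suffix list for a suffix tree, and the function name `exact_repeats` and the parameter `min_len` are assumptions made for this example. It reports only the longest common prefix of each adjacent suffix pair, so it is not a complete enumeration of all repeats.

```python
def exact_repeats(seq, min_len=3):
    """Return {repeated substring: sorted start positions}.

    Hypothetical helper: uses a naively sorted suffix list as a simple
    stand-in for a suffix tree (quadratic or worse, fine for short inputs).
    """
    # Sort suffix start positions by the suffix they begin.
    order = sorted(range(len(seq)), key=lambda i: seq[i:])
    repeats = {}
    for a, b in zip(order, order[1:]):
        # Length of the longest common prefix of two adjacent suffixes.
        k = 0
        while a + k < len(seq) and b + k < len(seq) and seq[a + k] == seq[b + k]:
            k += 1
        if k >= min_len:
            repeats.setdefault(seq[a:a + k], set()).update((a, b))
    return {rep: sorted(pos) for rep, pos in repeats.items()}


print(exact_repeats("ACGTTACGTA", min_len=3))
```

Suffixes that share a long prefix end up next to each other after sorting, which is the same property a suffix tree encodes in its internal nodes; a real suffix tree would make the construction and the search linear in the sequence length.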
- References (23)
- Cited By (118)
Over 3.6 million bases of DNA sequence from chromosome III of C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-wa...
1994 in Intelligent Systems in Molecular Biology
The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene tran...
The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic support. The REPuter family of programs described herein was designed to serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing to other analysis programs.
2000 in Intelligent Systems in Molecular Biology
The present application describes the complete 1.66-megabase pair genome sequence of an autotrophic archaeon, Methanococcus jannaschii, and its 58- and 16-kilobase pair extrachromosomal elements. Also described are 1738 predicted protein-coding genes.
Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima
A wealth of molecular resources have been developed for rice genomics, including dense genetic maps, expressed sequence tags (ESTs), yeast artificial chromosome maps, bacterial artificial chromosome (BAC) libraries and BAC end sequence databases. Integration of genetic and physical maps involves labor-intensive empirical experiments. To accelerate the integration of the bacterial clone resources with the genetic map for the International Rice Genome Sequencing Project, we cleaned and filtered th...
Part I. Exact String Matching: The Fundamental String Problem: 1. Exact matching: fundamental preprocessing and first algorithms 2. Exact matching: classical comparison-based methods 3. Exact matching: a deeper look at classical methods 4. Semi-numerical string matching Part II. Suffix Trees and their Uses: 5. Introduction to suffix trees 6. Linear time construction of suffix trees 7. First applications of suffix trees 8. Constant time lowest common ancestor retrieval 9. More applications of suf...
The primary objective of clustering is to discover a structure in the data by forming some number of clusters or groups. In order to achieve optimal clustering results in current soft computing approaches, two fundamental questions should be considered: (i) how many clusters are actually present in the given data, and (ii) how real or good the clustering itself is. Based on these two fundamental questions, almost every clustering method needs to determine the number of clusters. Yet, it is di...
The bulk of variation at the nucleotide level is often not visible at the phenotypic level. However, this variation can be exploited using molecular genetic marker systems. Molecular genetic markers represent one of the most powerful tools for genome analysis and permit the association of heritable traits with underlying genomic variation. Molecular marker technology has developed rapidly over the last decade, with the development of high-throughput genotyping methods and the availability of lar...
DNA Sequence Databases.- Sequence Comparison Tools.- Genome Browsers.- Predicting Non-coding RNA Transcripts.- Gene Prediction Methods.- Gene Annotation Methods.- Regulatory Motif Analysis.- Molecular Marker Discovery and Genetic Map Visualisation.- Sequence Based Gene Expression Analysis.- Protein Sequence Databases.- Protein Structure Prediction.- Classification of Information About Proteins.- High-Throughput Plant Phenotyping - Data Acquisition, Transformation, and Analysis.- Phenome Analysis...
The importance of genome redundancy has been strongly emphasized in the field of genome dynamics and evolution as well as in medical biology. A repeat is a sequence present twice or more with a high degree of similarity within a larger sequence (e.g. a chromosome) or set of sequences (e.g. a genome with several chromosomes). Each instance of the repeated sub-sequence is called a "copy" of the repeat. We use the term "duplication" to denote any active mechanistic event that creates a repeat. Even...
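The definition of a repeat as copies with "a high degree of similarity" can be made concrete with a small sketch. The version below assumes similarity is measured as the number of mismatches (Hamming distance) against a reference unit of the same length, which is only one possible reading; the name `find_copies` and the parameter `max_mismatches` are hypothetical and do not come from the cited work.

```python
def find_copies(sequence, unit, max_mismatches=1):
    """Return start positions where `sequence` matches `unit`
    with at most `max_mismatches` substitutions (no gaps)."""
    m = len(unit)
    hits = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        # Count positions where the window and the reference unit differ.
        mismatches = sum(1 for x, y in zip(window, unit) if x != y)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


print(find_copies("ACGTACCTACGA", "ACGT", max_mismatches=1))
```

Allowing insertions and deletions would require an alignment-based comparison rather than this position-by-position count.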
Restless Genomes: Humans as a Model Organism for Understanding Host-Retrotransposable Element Dynamics
Since their initial discovery in maize, there have been various attempts to categorize the relationship between transposable elements (TEs) and their host organisms. These have ranged from TEs being selfish parasites to their role as essential, functional components of organismal biology. Research over the past several decades has, in many respects, only served to complicate the issue even further. On the one hand, investigators have amassed substantial evidence concerning the negative effects t...
For the last fifteen years, researchers have been using SINE (short interspersed elements; non-autonomous retroposons) insertion polymorphism as characters for phylogeny. Although the collection of these characters is much less straightforward and much more work intensive than for classical sequence data, they are subject to very little homoplasy, and therefore allow more reliable determination of the phylogeny of species. As reversions are very rare, and the ancestral state (absence of the inse...
Application of Machine Learning techniques on the Discovery and annotation of Transposons in genomes
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 2012