repeat classification and fragment assembly

Published on Jan 1, 2004 in Research in Computational Molecular Biology
Pavel A. Pevzner
Estimated H-index: 80
Haixu Tang
Estimated H-index: 38
Glenn Tesler
Estimated H-index: 27
  • References (28)
  • Citations (1)
Published on Dec 1, 2004 in Journal of Computational Biology 1.19
Sebastian Böcker
Estimated H-index: 30
32 Citations
Published on Apr 1, 2004 in Genome Research 10.10
Guillaume Bourque
Estimated H-index: 39
Pavel A. Pevzner
Estimated H-index: 80
Glenn Tesler
Estimated H-index: 27
Recent analysis of genome rearrangements in human and mouse genomes revealed evidence for more rearrangements than thought previously and shed light on previously unknown features of mammalian evolution, like breakpoint reuse and numerous microrearrangements. However, two-way analysis cannot reveal the genomic architecture of ancestral mammals or assign rearrangement events to different lineages. Thus, the “original synteny” problem introduced by Nadeau and Sankoff previously, remains unsolved, ...
207 Citations
Published on Dec 12, 2003 in Genome Research 10.10
Mihai Pop
Estimated H-index: 48
(J. Craig Venter Institute),
Daniel S. Kosack
Estimated H-index: 4
(J. Craig Venter Institute),
Steven L. Salzberg
Estimated H-index: 120
(Johns Hopkins University)
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffol...
176 Citations
Published on Jan 1, 2003 in Genome Research 10.10
James C. Mullikin
Estimated H-index: 72
Zemin Ning
Estimated H-index: 36
Whole-genome shotgun (WGS) sequencing is an approach used since the early 1980s (Sanger et al. 1982); what has changed since then is the size of genome one considers reasonable for the technology available at the time (Staden 1980, 1982). During the 1980s, the optimal size developed from the successful WGS of bacteriophage λ at 49 kb (Sanger et al. 1982) up to hundreds of kilobases for various viral genomes by the end of the decade. Workstation class computers during the 1980s grew from submegab...
189 Citations
Itsik Pe'er
Estimated H-index: 38
(Tel Aviv University),
Naama Arbili
Estimated H-index: 2
(Tel Aviv University),
Ron Shamir
Estimated H-index: 61
(Tel Aviv University)
Universal arrays contain all possible oligonucleotides of a certain length, typically 6–10 bases. They can determine in a single experiment all substrings of that length that occur along a target sequence. That information, also called the spectrum of the sequence, is not sufficient to uniquely reconstruct a sequence longer than a few hundred bases. We have devised a polynomial algorithm that reconstructs the sequence, given the spectrum and an additional reference sequence, homologous to the ta...
16 Citations
Published on Aug 23, 2002 in Science 41.06
Samuel Aparicio
Estimated H-index: 1
(Agency for Science, Technology and Research),
Jarrod Chapman
Estimated H-index: 32
(Agency for Science, Technology and Research)
+ 38 Authors
Arian F. A. Smit
Estimated H-index: 38
(Agency for Science, Technology and Research)
The compact genome of Fugu rubripes has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds. In this 365-megabase vertebrate genome, repetitive DNA accounts for less than one-sixth of the sequence, and gene loci occupy about one-third of the genome. As with the human genome, gene loci are not evenly distributed, but are clustered into sparse and dense regions. Some “giant” genes were observed that had average coding sequence sizes but were sprea...
1,289 Citations
Published on Aug 1, 2002 in Genome Research 10.10
Zhirong Bao
Estimated H-index: 23
(Washington University in St. Louis),
Sean R. Eddy
Estimated H-index: 76
(Washington University in St. Louis)
Repetitive sequences make up a major part of eukaryotic genomes. We have developed an approach for the de novo identification and classification of repeat sequence families that is based on extensions to the usual approach of single linkage clustering of local pairwise alignments between genomic sequences. Our extensions use multiple alignment information to define the boundaries of individual copies of the repeats and to distinguish homologous but distinct repeat element families. When tested o...
397 Citations
Published on Jul 1, 2002 in Intelligent Systems in Molecular Biology
Steffen Heber
Estimated H-index: 19
(University of California, San Diego),
Max A. Alekseyev
Estimated H-index: 17
(University of California, San Diego)
+ 2 Authors
Pavel A. Pevzner
Estimated H-index: 80
(University of California, San Diego)
Motivation: The traditional approach to annotate alternative splicing is to investigate every splicing variant of the gene in a case-by-case fashion. This approach, while useful, has some serious shortcomings. Recent studies indicate that alternative splicing is more frequent than previously thought and some genes may produce tens of thousands of different transcripts. A list of alternatively spliced variants for such genes would be difficult to build and hard to analyse. Moreover, such a list d...
161 Citations
Published on Mar 1, 2002 in Bioinformatics 5.48
Christopher Lee
Estimated H-index: 45
(University of California, Los Angeles),
Catherine S. Grasso
Estimated H-index: 23
Mark F. Sharlow
Estimated H-index: 1
Motivation: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. Results: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (Partial Order Alignment (POA)) to guarantee that the optima...
563 Citations
Published on Jan 1, 2002 in Genome Research 10.10
Serafim Batzoglou
Estimated H-index: 45
David B. Jaffe
Estimated H-index: 58
+ 6 Authors
Eric S. Lander
Estimated H-index: 245
(Massachusetts Institute of Technology)
We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providi...
567 Citations