Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
Published on May 28, 1997
Dan Gusfield
Estimated H-index: 34
(University of California, Davis)
Part I. Exact String Matching: The Fundamental String Problem
1. Exact matching: fundamental preprocessing and first algorithms
2. Exact matching: classical comparison-based methods
3. Exact matching: a deeper look at classical methods
4. Semi-numerical string matching
Part II. Suffix Trees and their Uses
5. Introduction to suffix trees
6. Linear time construction of suffix trees
7. First applications of suffix trees
8. Constant time lowest common ancestor retrieval
9. More applications of suffix trees
Part III. Inexact Matching, Sequence Alignment and Dynamic Programming
10. The importance of (sub)sequence comparison in molecular biology
11. Core string edits, alignments and dynamic programming
12. Refining core string edits and alignments
13. Extending the core problems
14. Multiple string comparison: the Holy Grail
15. Sequence databases and their uses: the motherlode
Part IV. Currents, Cousins and Cameos
16. Maps, mapping, sequencing and superstrings
17. Strings and evolutionary trees
18. Three short topics
19. Models of genome-level mutations
Cited By (3153)
Published on Jul 25, 2012 in International Conference on Intelligent Computing
Hui Zhang
Estimated H-index: 6
(Zhejiang University of Technology),
Qing Guo
Estimated H-index: 5
(Zhejiang University),
Costas S. Iliopoulos
Estimated H-index: 29
(King's College London)
A weighted biological sequence is a string in which a set of characters may appear at each position with respective probabilities of occurrence. We attempt to locate all the tandem repeats in a weighted sequence. By introducing the idea of equivalence classes in weighted sequences, we identify the tandem repeats of every possible length using an iterative partitioning technique, and present an O(n^2)-time algorithm.
Published on Jan 1, 2013
Masha Sosonkina
Estimated H-index: 13
(Iowa State University),
Zhao Zhang
Estimated H-index: 20
(Iowa State University),
Vaibhav Sundriyal
Estimated H-index: 7
(Iowa State University)
Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers significantly affect their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in sof...
Published on Jun 12, 2013 in International Conference on Artificial Neural Networks
Xibin Zhu
Estimated H-index: 7
(Bielefeld University),
Frank-Michael Schleif
Estimated H-index: 17
(Bielefeld University),
Barbara Hammer
Estimated H-index: 30
(Bielefeld University)
The amount and complexity of data increase rapidly; however, due to time and cost constraints, only a few of them are fully labeled. In this context, non-vectorial relational data given by pairwise (dis-)similarities without explicit vectorial representation, like score values in sequence alignments, are particularly challenging. Existing semi-supervised learning (SSL) algorithms focus on vectorial data given in Euclidean space. In this paper we extend a prototype-based classifier for dissimilarit...
2 Citations
Published on Jan 1, 2010 in IEEE International Conference on High Performance Computing, Data, and Analytics
Steven L. Salzberg
Estimated H-index: 118
(University of Maryland, College Park),
Michael C. Schatz
Estimated H-index: 50
(University of Maryland, College Park)
Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental comput...
3 Citations
Published on Aug 24, 2013 in Symposium on Search Based Software Engineering
Nesa Asoudeh
Estimated H-index: 2
(Carleton University),
Yvan Labiche
Estimated H-index: 36
(Carleton University)
We propose a test suite generation technique from extended finite state machines based on a genetic algorithm that fulfills multiple conflicting objectives. We aim at maximizing coverage and feasibility of a set of test cases while minimizing similarity between these cases and minimizing overall cost.
2 Citations
Published on Jan 1, 2013
Elisa Pappalardo
Estimated H-index: 4
(Johns Hopkins University),
Panos M. Pardalos
Estimated H-index: 73
(University of Florida),
Giovanni Stracquadanio
Estimated H-index: 13
(Johns Hopkins University)
The increasing amount of genomic data and the ability to synthesize artificial DNA constructs pose a series of challenging problems involving the identification and design of sequences with specific properties. We address the identification of such sequences; many of these problems present challenges at both the biological and computational levels. In this chapter, we introduce the main string selection problems and the theoretical and experimental results for the most important instances.
1 Citation
Published on Apr 21, 2014 in Database Systems for Advanced Applications
Xianming Wang
Estimated H-index: 2
(Sichuan University),
Lei Duan
Estimated H-index: 5
(Sichuan University)
+ 2 Authors, Changjie Tang
Estimated H-index: 12
(Sichuan University)
Distinguishing sequential patterns are useful in characterizing a given sequence class and contrasting that class against other sequence classes. This paper introduces the density concept into distinguishing sequential pattern mining, extending previous studies which considered gap and support constraints. Density is concerned with the number of times given patterns occur in individual sequences; it is an important factor in many applications including biology, healthcare and financial analys...
11 Citations
Published on Jan 1, 2014
Data integration is a broad area encompassing techniques to merge data between data sources. Although there are plenty of efficient and effective methods focusing on data integration over homogeneous data, where instances share the same schema and range of values, their applications over heterogeneous data are less clear. This thesis considers data integration within the environment of the Semantic Web. More particularly, we propose a novel architecture for instance matching that takes into acco...
Published on Sep 11, 2011 in Parallel Processing and Applied Mathematics
Tuan Tu Tran
Estimated H-index: 3
(University of Lille),
Mathieu Giraud
Estimated H-index: 10
(University of Lille),
Jean-Stéphane Varré
Estimated H-index: 5
(University of Lille)
Text matching with errors is a regular task in computational biology. We present an extension of the bit-parallel Wu-Manber algorithm [16] to combine several searches for a pattern into a collection of fixed-length words. We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities with seed-based heuristics. We successfully implemented and ran our algorithms on GPU and multicore CPU. Some speedups ...
10 Citations
Published on Jan 1, 2009
This paper surveys some research directions in bioinformatics. This research aims to propose a database architecture combining a general view of bioinformatics data as a graph of data objects and data relationships with the efficiency and robustness of data management and query provided by indexing and generic programming techniques. Here, the role of the index is inverted, making it a first-class citizen in the query language. It is possible to do this in a structured way, allowing ...