scinapse is loading now...

Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology

Published on May 28, 1997
Dan Gusfield39
Estimated H-index: 39
(University of California, Davis)
Part I. Exact String Matching: The Fundamental String Problem: 1. Exact matching: fundamental preprocessing and first algorithms 2. Exact matching: classical comparison-based methods 3. Exact matching: a deeper look at classical methods 4. Semi-numerical string matching Part II. Suffix Trees and their Uses: 5. Introduction to suffix trees 6. Linear time construction of suffix trees 7. First applications of suffix trees 8. Constant time lowest common ancestor retrieval 9. More applications of suffix trees Part III. Inexact Matching, Sequence Alignment and Dynamic Programming: 10. The importance of (sub)sequence comparison in molecular biology 11. Core string edits, alignments and dynamic programming 12. Refining core string edits and alignments 13. Extending the core problems 14. Multiple string comparison: the Holy Grail 15. Sequence database and their uses: the motherlode Part IV. Currents, Cousins and Cameos: 16. Maps, mapping, sequencing and superstrings 17. Strings and evolutionary trees 18. Three short topics 19. Models of genome-level mutations.
  • References (0)
  • Citations (3195)
Cited By3195
Published on Dec 1, 2019in BMC Bioinformatics 2.21
Source Cite
Published on Jan 1, 2018
Lavinia Egidi7
Estimated H-index: 7
(University of Eastern Piedmont),
Felipe Alves da Louza4
Estimated H-index: 4
(University of São Paulo)
+ 1 AuthorsGuilherme P. Telles10
Estimated H-index: 10
(State University of Campinas)
We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our al...
2 Citations Source Cite
Published on Jun 9, 2019 in International Conference on Machine Learning
Hongyuan Mei (Johns Hopkins University), Guanghui Qin (Peking University), Jason Eisner33
Estimated H-index: 33
(Johns Hopkins University)
Events in the world may be caused by other, unobserved events. We consider sequences of events in continuous time. Given a probability model of complete sequences, we propose particle smoothing---a form of sequential importance sampling---to impute the missing events in an incomplete sequence. We develop a trainable family of proposal distributions based on a type of bidirectional continuous-time LSTM: Bidirectionality lets the proposals condition on future observations, not just on the past as ...
Published on Jun 1, 2019in Algorithmica 0.67
Tomasz Kociumaka11
Estimated H-index: 11
(University of Warsaw),
Jakub Radoszewski12
Estimated H-index: 12
(University of Warsaw),
Tatiana A. Starikovskaya8
Estimated H-index: 8
In the longest common substring problem, we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one character. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention i...
Source Cite
Published on May 1, 2019in Journal of Computer and System Sciences 1.50
Paweł Gawrychowski14
Estimated H-index: 14
(University of Wrocław),
Florin Manea14
Estimated H-index: 14
(University of Kiel)
+ 1 AuthorsDirk Nowotka11
Estimated H-index: 11
(University of Kiel)
Abstract Pseudo-repetitions are a natural generalization of the classical notion of repetitions in sequences: they are the repeated concatenation of a word and its encoding under a certain morphism or antimorphism. Thus, such occurrences can be regarded as hidden repetitive structures of, or within, a word. We solve fundamental algorithmic questions on pseudo-repetitions by application of insightful combinatorial results on words. More precisely, we efficiently decide whether a word is a pseudo-...
Source Cite
Published on May 1, 2019in Theoretical Computer Science 0.77
George Manoussakis (Ben-Gurion University of the Negev)
Abstract Given a graph, we are interested in the question of finding all its maximal cliques. This question models the community detection problem and has been extensively studied. Here, we approach it under the light of an important graph parameter. The degeneracy of a graph G is the smallest integer k such that every subgraph of G contains a vertex of degree at most k . Using a new decomposition technique, we present two output sensitive algorithms for the maximal clique enumeration problem. T...
Source Cite
Published on Apr 22, 2019in bioRxiv
Nicolas C. Rochette1
Estimated H-index: 1
(University of Illinois at Urbana–Champaign),
Angel G Rivera-Colón (University of Illinois at Urbana–Champaign), Julian Michael Catchen26
Estimated H-index: 26
(University of Illinois at Urbana–Champaign)
For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads an...
Source Cite
Published on Apr 20, 2019in Bioinformatics 5.48
Fabio Cunial2
Estimated H-index: 2
(Max Planck Society),
Jarno Alanko1
Estimated H-index: 1
(University of Helsinki),
Djamal Belazzougui16
Estimated H-index: 16
Motivation: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible. Results: We provide practical, versatile representations of variable-order Markov models and of interpolat...
Source Cite
Published on Apr 8, 2019
Leo Mršić2
Estimated H-index: 2
Paper is based on research based on linguistic terms denoting an address used for delivery services in logistic. Distance measurement NLP methods are widely usable in text mining and can be used to find the similarity among sentence or document. As part of logistics process, being able to determine correct address using machine learning we need to tackle issue of two addresses comparison (street name, city name etc.) is crucial for efficient service. This paper explains comparison techniques bas...
Source Cite