Algorithms for Molecular Biology
#1 Sarah Christensen (UIUC: University of Illinois at Urbana–Champaign), H-Index: 1
#2 Erin K. Molloy (UIUC: University of Illinois at Urbana–Champaign), H-Index: 8
Last. Tandy Warnow (UIUC: University of Illinois at Urbana–Champaign), H-Index: 52
Motivation: Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological process...
#1 Nicola Prezza (UniPi: University of Pisa), H-Index: 9
#2 Nadia Pisanti, H-Index: 15
Last. Giovanna Rosone (UniPi: University of Pisa), H-Index: 13
Background: Sequencing technologies keep getting cheaper and faster, putting growing pressure on data structures designed to store raw data efficiently and, possibly, to support analysis of that data. Accordingly, there is growing interest in alignment-free and reference-free variant calling methods that use only (suitably indexed) raw read data.
4 Citations
#1 Yohei M. Rosen (NYU: New York University), H-Index: 1
#2 Benedict J. Paten (NYU: New York University)
Background: Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of indiv...
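To illustrate why the forward algorithm is linear in reference panel size, here is a minimal sketch of a textbook haploid Li and Stephens forward recursion. It is not the authors' method: the function name `li_stephens_forward`, the biallelic "0"/"1" encoding, and the constant recombination and mutation parameters `rho` and `mu` are assumptions made for illustration.

```python
def li_stephens_forward(panel, query, rho=0.01, mu=0.01):
    """Return P(query | panel) under a simple haploid Li and Stephens model.

    panel: list of equal-length strings over {"0", "1"} (reference haplotypes)
    query: string over {"0", "1"} of the same length
    rho:   assumed per-site probability of switching the copied haplotype
    mu:    assumed per-site probability of a copying error
    """
    k = len(panel)
    if k == 0 or any(len(h) != len(query) for h in panel):
        raise ValueError("panel must be non-empty and match the query length")

    def emit(h, i):
        # Probability of observing query[i] when copying from haplotype h.
        return 1.0 - mu if panel[h][i] == query[i] else mu

    # Uniform prior over which panel haplotype is copied at the first site.
    f = [emit(h, 0) / k for h in range(k)]
    for i in range(1, len(query)):
        total = sum(f)  # shared across all states, so each site costs O(k)
        f = [((1.0 - rho) * f[h] + rho * total / k) * emit(h, i)
             for h in range(k)]
    return sum(f)
```

Because the recombination term depends only on the sum of the previous column, each site costs O(k) work for a panel of k haplotypes, which is the linear dependence the abstract refers to.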
#1 Nikolai Karpov (IU: Indiana University Bloomington), H-Index: 2
#2 Salem Malikic (SFU: Simon Fraser University), H-Index: 7
Last. S. Cenk Sahinalp (IU: Indiana University Bloomington), H-Index: 7
We introduce a new dissimilarity measure between a pair of “clonal trees”, each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is d...
2 Citations
#1 Elizabeth S. Allman (UAF: University of Alaska Fairbanks), H-Index: 15
#2 Hector Baños (UAF: University of Alaska Fairbanks), H-Index: 2
Last. John A. Rhodes (UAF: University of Alaska Fairbanks), H-Index: 12
Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network under this model, from data consisting of gene tree topologies, and provide the theoretical justif...
1 Citation
#1 Christophe Ambroise (CNRS: Centre national de la recherche scientifique), H-Index: 19
#2 Alia Dehman, H-Index: 1
Last. Nathalie Vialaneix (University of Toulouse), H-Index: 1
Background: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. But a major practical drawb...
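The adjacency-constrained merging rule described in this abstract can be sketched as follows. This is a naive illustration only, not the authors' algorithm: it assumes average similarity as the linkage (the abstract does not name one) and recomputes every linkage at each step. The function name `adjacent_hac` is hypothetical.

```python
def adjacent_hac(sim, n_clusters):
    """Adjacency-constrained agglomerative clustering on a similarity matrix.

    sim:        square list-of-lists, sim[i][j] = similarity of loci i and j,
                with loci indexed in chromosome order
    n_clusters: number of contiguous regions to return
    Returns a list of half-open intervals (start, end) covering 0..n.
    """
    n = len(sim)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of loci")

    # Start with every locus in its own cluster.
    clusters = [(i, i + 1) for i in range(n)]

    def avg_sim(a, b):
        # Assumed linkage: mean pairwise similarity between two intervals.
        total = sum(sim[i][j] for i in range(*a) for j in range(*b))
        return total / ((a[1] - a[0]) * (b[1] - b[0]))

    while len(clusters) > n_clusters:
        # Only neighbouring clusters (k, k + 1) are candidates for merging.
        best = max(range(len(clusters) - 1),
                   key=lambda k: avg_sim(clusters[k], clusters[k + 1]))
        merged = (clusters[best][0], clusters[best + 1][1])
        clusters[best:best + 2] = [merged]
    return clusters
```

Because merges are restricted to neighbours, every cluster is a contiguous interval of positions, so the result is always a partition of the chromosome into successive regions.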
2 Citations
#1 Alexandre Lemos (IST: Instituto Superior Técnico)
#2 Inês Lynce (IST: Instituto Superior Técnico), H-Index: 25
Last. Pedro T. Monteiro (IST: Instituto Superior Técnico), H-Index: 17
Background: Boolean models of biological signalling-regulatory networks are increasingly used to formally describe and understand complex biological processes. These models may become inconsistent as new data become available and need to be repaired. In the past, the focus has been on inferring (classes of) models given an interaction network and time-series data sets. However, repair of existing models against new data is still in its infancy, where the process is still manually perf...
#1 Lavinia Egidi (University of Eastern Piedmont), H-Index: 8
#2 Felipe Alves Louza (USP: University of São Paulo), H-Index: 5
Last. Guilherme P. Telles (State University of Campinas), H-Index: 12
Background: Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows–Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input, it is important to build these data structures in external memory, making the best possible use of the available RAM to keep the running time down.
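For readers unfamiliar with the two structures named above, here is a minimal in-memory sketch that computes the BWT and the LCP array of a single string by sorting its suffixes. It is a naive illustration and not the external-memory construction the paper addresses. The function name `bwt_and_lcp` and the `$` sentinel convention are assumptions.

```python
def bwt_and_lcp(text, sentinel="$"):
    """Return (bwt, lcp) for text, via a naively built suffix array.

    The sentinel is appended to text and must sort before every other
    character (true for "$" versus ASCII letters and digits).
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel")
    s = text + sentinel

    # Suffix array: starting positions of the suffixes in sorted order.
    sa = sorted(range(len(s)), key=lambda i: s[i:])

    # BWT: the character preceding each sorted suffix (wrapping around).
    bwt = "".join(s[i - 1] for i in sa)

    # LCP: length of the common prefix of each suffix and its predecessor.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = s[sa[k - 1]:], s[sa[k]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return bwt, lcp


print(bwt_and_lcp("banana"))  # ('annb$aa', [0, 0, 1, 3, 0, 0, 2])
```

Sorting whole suffixes takes quadratic time and memory in the worst case, which is exactly why large collections call for the specialised constructions this abstract motivates.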
9 Citations
#1 Qiuyi (Richard) Zhang (University of California, Berkeley), H-Index: 1
#2 Satish Rao (University of California, Berkeley), H-Index: 50
Last. Tandy Warnow (UIUC: University of Illinois at Urbana–Champaign), H-Index: 52
Background: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch weights are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was \(DCM_{NJ}\), published in SODA 2001. The main empirical advantage of \(DCM_{NJ}\) over other AFC...
1 Citation
#1 Leonid Chindelevitch (SFU: Simon Fraser University), H-Index: 12
#2 Sean La (SFU: Simon Fraser University)
Last. João Meidanis (State University of Campinas), H-Index: 18
Background: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the ...