Six-state amino acid recoding is not an effective strategy to offset the effects of compositional heterogeneity and saturation in phylogenetic analyses

Published in 2019 in bioRxiv
DOI: 10.1101/729103
Alexandra M. Hernandez (Whitney Laboratory for Marine Bioscience), Joseph F. Ryan (Whitney Laboratory for Marine Bioscience)
Six-state amino acid recoding strategies are commonly applied to combat the effects of compositional heterogeneity and substitution saturation in phylogenetic analyses. While these methods have been endorsed from a theoretical perspective, their performance has never been extensively tested. Here, we test the effectiveness of 6-state recoding approaches by comparing the performance of analyses on recoded and non-recoded datasets that have been simulated under gradients of compositional heterogeneity or saturation. In all of our simulation analyses, non-recoding approaches greatly outperformed 6-state recoding approaches. Our results suggest that 6-state recoding strategies are not effective in the face of high saturation. Further, while recoding strategies do buffer the effects of compositional heterogeneity, the loss of information that accompanies 6-state recoding outweighs its benefits, even in the most compositionally heterogeneous datasets. In addition, we evaluate recoding schemes with 9, 12, 15, and 18 states and show that these all outperform 6-state recoding. Our results have important implications for the more than 70 published papers that have incorporated 6-state recoding, many of which have significant bearing on relationships across the tree of life.
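To make concrete what a 6-state recoding does to an alignment, here is a minimal sketch. It assumes the widely used Dayhoff grouping (AGPST, DENQ, HKR, ILMV, FWY, C); the abstract does not say which 6-state schemes were tested. The function name `recode_dayhoff6` and the choice to map gaps and unknown symbols to `-` are illustrative assumptions, not the authors' pipeline.

```python
# Dayhoff 6-state grouping of the 20 amino acids.
# Assumption: this particular scheme is used purely for illustration.
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]

# Map each amino acid to its group index, written as a character "0".."5".
RECODE = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def recode_dayhoff6(seq: str) -> str:
    """Recode a protein sequence into 6 states.

    Gaps and any symbol outside the 20 standard amino acids become '-'
    (a hypothetical convention chosen for this sketch).
    """
    return "".join(RECODE.get(aa, "-") for aa in seq.upper())


if __name__ == "__main__":
    original = "MKTAYIAKQR"
    recoded = recode_dayhoff6(original)
    print(original)
    print(recoded)
    # Distinct states before and after recoding: the drop illustrates
    # the loss of information that the abstract refers to.
    print(len(set(original)), "->", len(set(recoded)))
```

Substitutions within a group (for example, I to L) become invisible after recoding, which is the intended buffer against compositional bias and also the source of the information loss described above.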