Quantifying uncertainty of taxonomic placement in DNA barcoding and metabarcoding

Published on Apr 1, 2017 in Methods in Ecology and Evolution
DOI: 10.1111/2041-210X.12721
Panu Somervuo (University of Helsinki), Douglas W. Yu (University of East Anglia), + 4 authors, Otso Ovaskainen (University of Helsinki)
Summary

A crucial step in the use of DNA markers for biodiversity surveys is the assignment of Linnaean taxonomies (species, genus, etc.) to sequence reads. This assignment makes available all the information that is linked to the taxonomic names. Taxonomic placement of DNA barcoding sequences is inherently probabilistic because DNA sequences contain errors, because sequences vary naturally within a species, and because reference databases are incomplete and can contain false annotations. However, most existing bioinformatics methods for taxonomic placement either ignore uncertainty or quantify it using metrics other than probability.

In this paper, we evaluate the performance of the recently proposed probabilistic taxonomic placement method PROTAX by applying it both to annotated reference sequence data and to unknown environmental data. Our four case studies include contrasting taxonomic groups (fungi, bacteria, mammals and insects), variation in the length and quality of the barcoding sequences (from individually Sanger-sequenced sequences to short Illumina reads), variation in the structures and sizes of the taxonomies (800–130 000 species) and variation in the completeness of the reference databases (representing 15–100% of known species).

Our results demonstrate that PROTAX yields essentially unbiased probabilities of taxonomic placement, which means that its quantification of species identification uncertainty is reliable. As expected, the accuracy of taxonomic placement increases with increasing coverage of taxonomic and reference sequence databases, and with an increasing ratio of genetic variation among taxonomic levels to variation within taxonomic levels.

We conclude that reliable species-level identification from environmental samples is still challenging and that neglecting identification uncertainty can lead to spurious inference. A key aim for future research is the completion of taxonomic and reference sequence databases and making these two types of data compatible.
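
The claim that placement probabilities are "essentially unbiased" is a statement about calibration: among reads assigned to a taxon with probability near p, a fraction of about p should be correctly assigned. The sketch below illustrates that check on simulated data only. It is not PROTAX and does not reproduce the paper's evaluation; the function names (`reliability_table`, `simulate`, `max_gap`), the five-bin layout and the square-root "overconfident" classifier are all assumptions made for illustration.

```python
import random


def reliability_table(probs, correct, n_bins=5):
    """Bin predictions by reported probability and return, per non-empty bin,
    (count, mean reported probability, observed fraction correct)."""
    bins = [[] for _ in range(n_bins)]
    for p, c in zip(probs, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, c))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_ok = sum(c for _, c in members) / len(members)
        rows.append((len(members), mean_p, frac_ok))
    return rows


def simulate(n, overconfident, seed=0):
    """Hypothetical classifier: each read has a true chance of being placed
    correctly; the classifier reports either that chance (calibrated) or an
    inflated version of it (overconfident)."""
    rng = random.Random(seed)
    probs, correct = [], []
    for _ in range(n):
        true_p = rng.random()
        correct.append(1 if rng.random() < true_p else 0)
        # sqrt(x) >= x on [0, 1], so this inflates the reported probability
        probs.append(true_p ** 0.5 if overconfident else true_p)
    return probs, correct


def max_gap(rows):
    """Largest absolute difference between reported and observed accuracy."""
    return max(abs(mean_p - frac_ok) for _, mean_p, frac_ok in rows)


calibrated = reliability_table(*simulate(20000, overconfident=False))
inflated = reliability_table(*simulate(20000, overconfident=True))

for name, rows in (("calibrated", calibrated), ("overconfident", inflated)):
    print(name)
    for count, mean_p, frac_ok in rows:
        print(f"  n={count:5d}  reported={mean_p:.2f}  observed={frac_ok:.2f}")
```

For the calibrated classifier the reported and observed columns should agree closely in every bin, whereas the overconfident one reports probabilities well above the observed accuracy. With real data, `correct` would come from sequences whose true taxonomy is known, such as held-out annotated reference sequences.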