Quantifying uncertainty of taxonomic placement in DNA barcoding and metabarcoding

Published on Apr 1, 2017 in Methods in Ecology and Evolution · DOI: 10.1111/2041-210X.12721
Panu Somervuo (UH: University of Helsinki), Estimated H-index: 15
Douglas W. Yu (UEA: University of East Anglia), Estimated H-index: 45
+ 4 authors
Otso Ovaskainen (UH: University of Helsinki), Estimated H-index: 52
Summary: A crucial step in the use of DNA markers for biodiversity surveys is the assignment of Linnaean taxonomies (species, genus, etc.) to sequence reads, which gives access to all the information associated with the taxonomic names. Taxonomic placement of DNA barcoding sequences is inherently probabilistic because DNA sequences contain errors, because there is natural variation among sequences within a species, and because reference databases are incomplete and can contain false annotations. However, most existing bioinformatics methods for taxonomic placement either ignore uncertainty or quantify it using metrics other than probability. In this paper, we evaluate the performance of the recently proposed probabilistic taxonomic placement method PROTAX by applying it both to annotated reference sequence data and to unknown environmental data. Our four case studies include contrasting taxonomic groups (fungi, bacteria, mammals and insects), variation in the length and quality of the barcoding sequences (from individually Sanger-sequenced sequences to short Illumina reads), variation in the structures and sizes of the taxonomies (800–130 000 species) and variation in the completeness of the reference databases (representing 15–100% of known species). Our results demonstrate that PROTAX yields essentially unbiased probabilities of taxonomic placement, which means that its quantification of species identification uncertainty is reliable. As expected, the accuracy of taxonomic placement increases with increasing coverage of taxonomic and reference sequence databases, and with an increasing ratio of genetic variation among taxonomic levels to that within taxonomic levels. We conclude that reliable species-level identification from environmental samples is still challenging and that neglecting identification uncertainty can lead to spurious inference. A key aim for future research is to complete taxonomic and reference sequence databases and to make these two types of data compatible.
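One way to check a claim such as "essentially unbiased probabilities" is a calibration (reliability) table: group assignments by their predicted probability and compare the mean predicted probability in each group with the fraction of assignments that were actually correct. The Python sketch below is illustrative only. It is not the PROTAX implementation or the evaluation procedure used in the paper, and the function name `calibration_table`, the equal-width binning and the toy inputs are all assumptions made for this example.

```python
def calibration_table(probs, correct, n_bins=5):
    """Compare predicted probabilities with observed accuracy, bin by bin.

    probs   -- predicted probability of each taxonomic assignment (0..1)
    correct -- True/False flags saying whether each assignment was right
    n_bins  -- number of equal-width probability bins (an assumed choice)

    Returns one tuple per non-empty bin:
    (bin_lower, bin_upper, count, mean_predicted, fraction_correct)
    """
    bins = [[] for _ in range(n_bins)]
    for p, c in zip(probs, correct):
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, c))

    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue  # skip bins with no assignments
        mean_predicted = sum(p for p, _ in items) / len(items)
        fraction_correct = sum(1 for _, c in items if c) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, len(items),
                     mean_predicted, fraction_correct))
    return rows


if __name__ == "__main__":
    # Toy, made-up inputs purely to show the call; not data from the paper.
    toy_probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
    toy_correct = [False, False, True, True, True, False]
    for row in calibration_table(toy_probs, toy_correct, n_bins=2):
        print(row)
```

For well-calibrated output, the mean predicted probability and the observed fraction correct should be close in every bin; a large gap in a bin points to over- or under-confidence in that probability range.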
References (33)
#1 Panu Somervuo (UH: University of Helsinki), H-Index: 15
#2 Sonja Koskela (UH: University of Helsinki), H-Index: 2
Last: Otso Ovaskainen (UH: University of Helsinki), H-Index: 52
(5 authors in total)
Motivation: When targeted to a barcoding region, high-throughput sequencing can be used to identify species or operational taxonomical units from environmental samples, and thus to study the diversity and structure of species communities. Although there are many methods which provide confidence scores for assigning taxonomic affiliations, it is not straightforward to translate these values to unbiased probabilities. We present a probabilistic method for taxonomical classification (PROTAX) of DNA...
35 Citations
#1 Martijn Staats (WUR: Wageningen University and Research Centre), H-Index: 17
#2 Alfred J. Arulandhu (WUR: Wageningen University and Research Centre), H-Index: 6
Last: Esther J. Kok (WUR: Wageningen University and Research Centre), H-Index: 26
(8 authors in total)
Species identification using DNA barcodes has been widely adopted by forensic scientists as an effective molecular tool for tracking adulterations in food and for analysing samples from alleged wildlife crime incidents. DNA barcoding is an approach that involves sequencing of short DNA sequences from standardized regions and comparison to a reference database as a molecular diagnostic tool in species identification. In recent years, remarkable progress has been made towards developing DNA metaba...
64 Citations
#1 Helena Wirta (UH: University of Helsinki), H-Index: 14
#2 Gergely Várkonyi (SYKE: Finnish Environment Institute), H-Index: 13
Last: Tomas Roslin (UH: University of Helsinki), H-Index: 36
(32 authors in total)
DNA sequences offer powerful tools for describing the members and interactions of natural communities. In this study, we establish the to-date most comprehensive library of DNA barcodes for a terrestrial site, including all known macroscopic animals and vascular plants of an intensively studied area of the High Arctic, the Zackenberg Valley in Northeast Greenland. To demonstrate its utility, we apply the library to identify nearly 20 000 arthropod individuals from two Malaise traps, each operate...
46 Citations
#1 Jenni Hultman (UH: University of Helsinki), H-Index: 18
#2 Riitta Rahkila (UH: University of Helsinki), H-Index: 7
Last: K. Johanna Björkroth (UH: University of Helsinki), H-Index: 17
(5 authors in total)
Refrigerated food processing facilities are specific man-made niches likely to harbor cold-tolerant bacteria. To characterize this type of microbiota and study the link between processing plant and product microbiomes, we followed and compared microbiota associated with the raw materials and processing stages of a vacuum-packaged, cooked sausage product affected by a prolonged quality fluctuation with occasional spoilage manifestations during shelf life. A total of 195 samples were subj...
59 Citations
#1 Heng Li (Broad Institute), H-Index: 54
Summary: BFC is a free, fast and easy-to-use sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm but still maintains a speed comparable to implementations based on greedy methods. In evaluations on real data, BFC appears to correct more errors with fewer overcorrections in comparison to existing tools. It particularly does well in suppressing systematic sequencing errors, which helps to improve the base accuracy of de novo assemblies.
104 Citations
#1 Thomas J. Wood (University of Sussex), H-Index: 6
#2 John M. Holland (Game & Wildlife Conservation Trust), H-Index: 26
Last: Dave Goulson (University of Sussex), H-Index: 66
In order to reverse declines in pollinator populations, numerous agri-environment schemes have been implemented across Europe, predominantly focused on increasing the availability of floral resources. Whilst several studies have investigated how bees and wasps (aculeates) respond to management at the scale of the scheme (i.e. within the flower patch) there has been little assessment of how schemes affect diversity at the farm scale. In the current work we assessed whether farms implemen...
52 Citations
#1 Johan Pansu (CNRS: Centre national de la recherche scientifique), H-Index: 5
Last: Philippe Choler, H-Index: 34
(10 authors in total)
Paleoenvironmental studies are essential to understand biodiversity changes over long timescales and to assess the relative importance of anthropogenic and environmental factors. Sedimentary ancient DNA (sedaDNA) is an emerging tool in the field of paleoecology and has proven to be a complementary approach to the use of pollen and macroremains for investigating past community changes. SedaDNA-based reconstructions of ancient environments often rely on indicator taxa or expert knowledge, but quan...
55 Citations
#1 Rupert A. Collins (UFAM: Federal University of Amazonas), H-Index: 10
#2 Robert H. Cruickshank (UFAM: Federal University of Amazonas), H-Index: 20
In a recent commentary, Dowton et al. (2014) propose a framework for “next-generation” DNA barcoding, whereby multi-locus datasets are coupled with coalescent-based species delimitation methods to make specimen identifications. They claim single-locus DNA barcoding is outdated, and a multilocus approach superior, with their assertions supported by an analysis of 33 species of Sarcophaga flesh flies. Here, we reanalyse their data and show that a standard DNA barcode analysis is in fact capable of...
30 Citations
#1 Claus Rasmussen (AU: Aarhus University), H-Index: 20
#2 Yoko L. Dupont (AU: Aarhus University), H-Index: 21
Last: Jens M. Olesen (AU: Aarhus University), H-Index: 22
(5 authors in total)
Most ecological networks are analysed as static structures, where all observed species and links are present simultaneously. However, this is over-simplified, because networks are temporally dynamical. We resolved an arctic, entire-season plant-flower visitor network into a temporal series of 1-day networks and compared the properties with its static equivalent based on data pooled over the entire season. Several properties differed. The nested link pattern in the static network was blurred in t...
42 Citations
To determine microbial community structure, the UPARSE software extracts operational taxonomic unit (OTU) representative sequences with high accuracy on the basis of amplified marker-gene sequences.
4,967 Citations
Cited By (32)
#1 Otso Ovaskainen (UH: University of Helsinki), H-Index: 52
#2 Nerea Abrego (UH: University of Helsinki), H-Index: 14
2 Citations
#1 Nerea Abrego (UH: University of Helsinki), H-Index: 14
#2 Tomas Roslin (UH: University of Helsinki), H-Index: 36
Last: Otso Ovaskainen (UH: University of Helsinki), H-Index: 52
(9 authors in total)
Understanding the role of interspecific interactions in shaping ecological communities is one of the central goals in community ecology. In fungal communities, measuring interspecific interactions directly is challenging because these communities are composed of large numbers of species, many of which are unculturable. An indirect way of assessing the role of interspecific interactions in determining community structure is to identify the species co-occurrences that are not constrained by the en...
#1 Georgia M. Nester (Curtin University)
#2 Maarten De Brauwer (University of Leeds), H-Index: 5
Last: Michael Bunce (Environmental Protection Authority), H-Index: 61
(10 authors in total)
#1 Aitor Ibabe (University of Oviedo)
#2 Fernando Rayón (University of Oviedo)
Last: Eva Garcia-Vazquez (University of Oviedo), H-Index: 38
(4 authors in total)
Marine debris is currently a significant source of environmental and economic problems. Floating litter can be used by marine organisms as a surface to attach to and as a dispersal vector. Human activities are promoting the expansion of potentially harmful species into novel ecosystems, endangering autochthonous communities. In this project, more than 1,000 litter items were collected and classified from five beaches east of the port of Gijon, in Asturias, Spain. Next generation sequenci...
#1 Linett Rasmussen (UCPH: University of Copenhagen), H-Index: 3
#2 Christopher J. Barnes (UCPH: University of Copenhagen), H-Index: 7
Last: Anders J. Hansen (UCPH: University of Copenhagen), H-Index: 31
(10 authors in total)
Coral reefs worldwide are rapidly declining due to increasing anthropogenic stressors and environmental changes, with large-scale mortalities of coral reefs observed in many locations across the globe. It has become clear that the microbiome of corals is important in understanding the causes of coral infections, although its exact role is yet to be fully understood. Here, we characterize the bacteria and fungi associated with the non-lesional and lesional (identified by discoloration and tissue ...
#1 Chloé Mathieu (AUT: Auckland University of Technology), H-Index: 1
#2 Syrie M. Hermans (University of Auckland), H-Index: 4
Last: Hannah L. Buckley (AUT: Auckland University of Technology), H-Index: 23
(6 authors in total)
Environmental DNA (eDNA) is becoming a standard tool in environmental monitoring that aims to quantify spatiotemporal variation for the measurement and prediction of ecosystem change. eDNA surveys have complex workflows encompassing multiple decision-making steps in which uncertainties can accumulate due to field sampling design, molecular biology lab work and bioinformatics analyses. We conducted a quantitative review of studies published prior to December 2017 (n=431) that had sampled eDNA fro...
#1 Yinqiu Ji (KIZ: Kunming Institute of Zoology), H-Index: 10
Last: Zhongxing Yang, H-Index: 1
(15 authors in total)
Environmental DNA (eDNA) has great potential to complement visual surveys, camera trapping, and bioacoustics in measuring biodiversity. We report here a large-scale attempt to use DNA from leech-ingested bloodmeals to estimate vertebrate occupancy at the scale of an entire protected area: the 677 km² Ailaoshan national-level nature reserve in Yunnan province, southwest China. We contracted 163 park rangers to collect leeches in 172 patrol areas, resulting in 30,468 total leeches, divided over 89...
1 Citation
#1 Otso Ovaskainen (UH: University of Helsinki), H-Index: 52
#2 Nerea Abrego, H-Index: 14
Last: Natalia Ivanova (U of G: University of Guelph), H-Index: 69
(15 authors in total)
The kingdom Fungi is a megadiverse group represented in all ecosystem types. The global diversity and distribution of fungal taxa are poorly known, in part due to the limitations related to traditional fruit-body survey methods. These previous hurdles are now being overcome by rapidly developing DNA-based surveys. Past fungal DNA surveys have predominantly examined soil samples, which capture high species diversity but represent only the local soil community. Recent work has shown that DNA sampl...
1 Citation
#1 Ayaka Yamamoto (Tohoku University), H-Index: 1
#2 Wataru Makino (Tohoku University), H-Index: 17
Last: Jotaro Urabe (Tohoku University), H-Index: 34
The cladoceran Holopedium gibberum Zaddach, 1855 (Ctenopoda: Holopediidae) was once thought to occur broadly in the northern hemisphere, but its cryptic sister species was recently separated from H. gibberum sensu stricto (s.s.) as a new species, Holopedium glacialis. In East Asia, although “H. gibberum” occurrence has been recorded in many water bodies, the identity of the surveyed populations has rarely been confirmed via molecular analyses. Thus, it is unclear whether it is actually H. gibber...
1 Citation
#1 Yuran Dong (NU: Nanjing University), H-Index: 1
#2 Tan Li (NU: Nanjing University)
Last: Shucun Sun (NU: Nanjing University), H-Index: 21
(4 authors in total)
Identifying dead insect larvae by morphology is often challenging but frequently required in ecological studies, which calls for a molecular method. In this study, we developed a cytochrome c oxidase subunit I (COI) DNA barcoding protocol that can quickly identify larval tephritid flies in a Tibetan alpine meadow. The protocol includes two major operations. The first is to build up a comprehensive reference library after identifying adults of tephritid flies to species by morphology and by COI DN...