Jiajie Zhang
Heidelberg Institute for Theoretical Studies
Publications 10
#1 Paschalia Kapli (Heidelberg Institute for Theoretical Studies), H-Index: 12
#2 Sarah Lutteropp (Heidelberg Institute for Theoretical Studies), H-Index: 2
Last: Tomas Flouri (Heidelberg Institute for Theoretical Studies), H-Index: 8
7 authors in total
Motivation: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently intr...
119 Citations
#1 Tomáš Flouri (Heidelberg Institute for Theoretical Studies), H-Index: 10
#2 Jiajie Zhang (Heidelberg Institute for Theoretical Studies), H-Index: 8
Last: Alexandros Stamatakis (Heidelberg Institute for Theoretical Studies), H-Index: 50
5 authors in total
Next-Generation Sequencing (NGS) technologies have reshaped the landscape of life sciences. The massive amount of data generated by NGS is rapidly transforming biological research from traditional wet-lab work into a data-intensive analytical discipline (Koboldt et al., Cell 155(1):27–38, 2013). The Illumina “sequencing by synthesis” technique (Mardis, Annu Rev Genomics Hum Genet 9:387–402, 2008) is one of the most popular and widely used NGS technologies.
#1 Alexey M. Kozlov (Heidelberg Institute for Theoretical Studies), H-Index: 12
#2 Jiajie Zhang (Heidelberg Institute for Theoretical Studies), H-Index: 8
Last: Alexandros Stamatakis (Heidelberg Institute for Theoretical Studies), H-Index: 50
5 authors in total
Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the qual...
41 Citations
#1 Jiajie Zhang (Heidelberg Institute for Theoretical Studies), H-Index: 8
#2 Kassian Kobert (Heidelberg Institute for Theoretical Studies), H-Index: 7
Last: Alexandros Stamatakis (Heidelberg Institute for Theoretical Studies), H-Index: 50
4 authors in total
Motivation: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail ...
1,326 Citations
#1 Jiajie Zhang (FORTH: Foundation for Research & Technology – Hellas), H-Index: 8
#2 Paschalia Kapli (FORTH: Foundation for Research & Technology – Hellas), H-Index: 12
Last: Alexandros Stamatakis (FORTH: Foundation for Research & Technology – Hellas), H-Index: 50
4 authors in total
Motivation: Sequence-based methods to delimit species are central to DNA taxonomy, microbial community surveys and DNA metabarcoding studies. Current approaches either rely on simple sequence similarity thresholds (OTU-picking) or on complex and compute-intensive evolutionary models. The OTU-picking methods scale well on large datasets, but the results are highly sensitive to the similarity threshold. Coalescent-based species delimitation approaches often rely on Bayesian statistics and Markov C...
714 Citations
May 1, 2012 in IPDPS (International Parallel and Distributed Processing Symposium)
#1 Jiajie Zhang (University of Lübeck), H-Index: 8
#2 Alexandros Stamatakis (Exelixis), H-Index: 7
Advances in wet-lab sequencing techniques allow for sequencing between 100 genomes up to 1000 full transcriptomes of species whose evolutionary relationships shall be disentangled by means of phylogenetic analyses. Likelihood-based evolutionary models allow for partitioning such broad phylogenomic datasets, for instance into gene regions, for which likelihood model parameters (except for the tree itself) can be estimated independently. Present day phylogenomic datasets are typically split up int...
13 Citations
#1 Suhua Chang, H-Index: 14
#2 Jiajie Zhang, H-Index: 8
Last: Jing Wang, H-Index: 1
13 authors in total
Copyright information: Taken from "Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research", Nucleic Acids Research 2006;35(Database issue):D376-D380. Published online 25 Oct 2006. PMCID: PMC1781131. © 2006 The Author(s). Users access the data through Search Engine. Search results can be selectively saved in a personalized WorkSet and subjected to successive data analyses, such as plotting geographical distribution, aligning multiple sequenc...
#1 Jiajie Zhang (University of Lübeck), H-Index: 8
#2 Amir Madany Mamlouk (University of Lübeck), H-Index: 7
Last: Rolf Hilgenfeld, H-Index: 41
6 authors in total
Background: Results of phylogenetic analysis are often visualized as phylogenetic trees. Such a tree can typically only include up to a few hundred sequences. When more than a few thousand sequences are to be included, analyzing the phylogenetic relationships among them becomes a challenging task. The recent frequent outbreaks of influenza A viruses have resulted in the rapid accumulation of corresponding genome sequences. Currently, there are more than 7500 influenza A virus genomes in the datab...
16 Citations
#1 Ximiao He (King Mongkut's University of Technology Thonburi), H-Index: 11
#2 Suhua Chang (King Mongkut's University of Technology Thonburi), H-Index: 14
Last: Jing Wang (King Mongkut's University of Technology Thonburi), H-Index: 43
10 authors in total
Cancer is ranked as one of the top killers in all human diseases and continues to have a devastating effect on the population around the globe. Current research efforts are aiming to accelerate our understanding of the molecular basis of cancer and develop effective means for cancer diagnostics, treatment and prognosis. An altered pattern of epigenetic modifications, most importantly DNA methylation events, plays a critical role in tumorigenesis through regulating oncogene activation, tumor supp...
89 Citations
#1 Suhua Chang (Beijing Institute of Genomics), H-Index: 14
#2 Jiajie Zhang (Beijing Institute of Genomics), H-Index: 8
Last: Jing Wang (Beijing Institute of Genomics), H-Index: 43
13 authors in total
Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV se...
52 Citations