
A method and server for predicting damaging missense mutations

Published on Apr 1, 2010 in Nature Methods 28.47
· DOI: 10.1038/nmeth0410-248
Ivan Adzhubei
Estimated H-index: 15
(Harvard University),
Steffen Schmidt
Estimated H-index: 27
(MPG: Max Planck Society)
+ 5 Authors, Shamil R. Sunyaev
Estimated H-index: 64
(Harvard University)
To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen¹ in its set of predictive features, alignment pipeline, and method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve a comparison of a property of the wild-type (ancestral, normal) allele with the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site². The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naive Bayes classifier (Supplementary Methods).
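As a rough illustration of the classification step described above, the following is a minimal sketch of a Naive Bayes classifier over discrete features. It is not the PolyPhen-2 implementation: the function names, the Laplace smoothing constant `alpha`, and the assumption that every feature has already been discretized into a small set of values are all illustrative choices that do not come from the letter.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Count feature values per class for a discrete Naive Bayes model.

    samples: list of tuples of hashable (already discretized) feature values
    labels:  list of class labels, e.g. "damaging" / "benign"
    alpha:   Laplace smoothing constant (hypothetical choice)
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    class_counts = {c: 0 for c in classes}
    # value_counts[c][j][v] = number of class-c samples whose feature j equals v
    value_counts = {c: [defaultdict(int) for _ in range(n_features)] for c in classes}
    seen_values = [set() for _ in range(n_features)]
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            value_counts[y][j][v] += 1
            seen_values[j].add(v)
    return {
        "classes": classes,
        "class_counts": class_counts,
        "value_counts": value_counts,
        "n_values": [len(s) for s in seen_values],
        "alpha": alpha,
    }


def posterior(model, x, target):
    """Posterior probability of class `target` given feature vector x."""
    total = sum(model["class_counts"].values())
    log_joint = {}
    for c in model["classes"]:
        n_c = model["class_counts"][c]
        lp = math.log(n_c / total)  # log prior
        for j, v in enumerate(x):
            # Smoothed estimate of P(feature j = v | class c)
            num = model["value_counts"][c][j].get(v, 0) + model["alpha"]
            den = n_c + model["alpha"] * model["n_values"][j]
            lp += math.log(num / den)
        log_joint[c] = lp
    # Normalize in log space for numerical stability
    m = max(log_joint.values())
    norm = sum(math.exp(lp - m) for lp in log_joint.values())
    return math.exp(log_joint[target] - m) / norm
```

Each feature contributes an independent likelihood term, which is the defining assumption of Naive Bayes; the posterior for the "damaging" class is the quantity a tool of this kind would report as its score.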
Figure 1 PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar³ (light green). UniRef100 (solid ...
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on molecular function that cause human Mendelian diseases and are present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar³, consists of all 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that the performance of PolyPhen-2, as presented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b) and also compared favorably with three other popular prediction tools⁴⁻⁶ (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves true positive rates of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that the nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of the amino acid replacements assumed to be non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
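The accuracy figures above are ROC operating points: a true positive rate read off at a fixed false positive rate. A minimal sketch of that calculation follows, assuming a variant is called damaging when its score is at or above a threshold. The function name `tpr_at_fpr` and the toy scores in the usage note are hypothetical and are not data from the letter.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.20):
    """Find the most permissive score threshold whose FPR stays <= max_fpr.

    pos_scores: scores of variants known to be damaging
    neg_scores: scores of variants assumed to be non-damaging
    Returns (threshold, tpr, fpr). A variant is called damaging
    when its score is >= threshold.
    """
    # Candidate thresholds, from strictest (highest) to most permissive.
    candidates = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    best = (float("inf"), 0.0, 0.0)  # calling nothing damaging: TPR = FPR = 0
    for t in candidates:
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr > max_fpr:
            break  # FPR only grows as the threshold is lowered
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        best = (t, tpr, fpr)
    return best
```

For example, `tpr_at_fpr([0.95, 0.9, 0.8, 0.7, 0.4], [0.85, 0.5, 0.3, 0.2, 0.1])` scans thresholds downward and stops as soon as more than 20% of the assumed non-damaging variants would be called damaging.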
For each mutation, PolyPhen-2 calculates the Naive Bayes posterior probability that the mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, for dense mapping of regions identified by genome-wide association studies, and for analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
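The reporting step described above can be sketched as follows: given a score, estimate the false and true positive rates empirically from labeled reference scores, then map the false positive rate to one of the three qualitative labels. The cutoffs `probably_fpr` and `possibly_fpr` are hypothetical placeholders; the letter defers the actual criteria to its Supplementary Methods, and the function name `appraise` is likewise an assumption.

```python
def appraise(score, pos_scores, neg_scores, probably_fpr=0.05, possibly_fpr=0.10):
    """Report empirical FPR/TPR at `score` and a qualitative label.

    pos_scores: reference scores of known damaging variants
    neg_scores: reference scores of variants assumed non-damaging
    probably_fpr, possibly_fpr: hypothetical FPR cutoffs for the labels
    """
    # Fraction of non-damaging references that would be called damaging
    # if `score` were used as the decision threshold.
    fpr = sum(s >= score for s in neg_scores) / len(neg_scores)
    # Fraction of damaging references that would be called damaging.
    tpr = sum(s >= score for s in pos_scores) / len(pos_scores)
    if fpr <= probably_fpr:
        label = "probably damaging"
    elif fpr <= possibly_fpr:
        label = "possibly damaging"
    else:
        label = "benign"
    return {"score": score, "fpr": fpr, "tpr": tpr, "label": label}
```

Tying the labels to false positive rates rather than to raw scores means the same cutoffs can be applied to models trained on different datasets, such as the HumDiv- and HumVar-trained variants, although each would need its own reference scores.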
References (6)
Published on Nov 28, 2008 in PLOS Genetics 5.22
Steffen Schmidt
Estimated H-index: 27
(MPG: Max Planck Society),
Anna Gerasimova
Estimated H-index: 2
(UM: University of Michigan)
+ 3 Authors, Shamil R. Sunyaev
Estimated H-index: 64
(Brigham and Women's Hospital)
Mutation rate varies greatly between nucleotide sites of the human genome and depends both on the global genomic location and the local sequence context of a site. In particular, CpG context elevates the mutation rate by an order of magnitude. Mutations also vary widely in their effect on the molecular function, phenotype, and fitness. Independence of the probability of occurrence of a new mutation's effect has been a fundamental premise in genetics. However, highly mutable contexts may be prese...
Published on Oct 15, 2008 in Bioinformatics 4.53
Yana Bromberg
Estimated H-index: 20
(SGC: Structural Genomics Consortium),
Guy Yachdav
Estimated H-index: 1
(SGC: Structural Genomics Consortium),
Burkhard Rost
Estimated H-index: 76
(SGC: Structural Genomics Consortium)
Summary: Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated ...
Published on Oct 15, 2006 in Bioinformatics 4.53
Emidio Capriotti
Estimated H-index: 23
(UNIBO: University of Bologna),
Remo Calabrese
Estimated H-index: 5
(UNIBO: University of Bologna),
Rita Casadio
Estimated H-index: 51
(UNIBO: University of Bologna)
Motivation: Human single nucleotide polymorphisms (SNPs) are the most frequent type of genetic variation in human population. One of the most important goals of SNP projects is to understand which human genotype variations are related to Mendelian and complex diseases. Great interest is focused on non-synonymous coding SNPs (nsSNPs) that are responsible of protein single point mutation. nsSNPs can be neutral or disease associated. It is known that the mutation of only one residue in a protein se...
Published on Mar 22, 2006 in BMC Bioinformatics 2.51
Peng-Fei Yue
Estimated H-index: 1
(UMD: University of Maryland, College Park),
Eugene Melamud
Estimated H-index: 18
(UMD: University of Maryland, College Park),
John Moult
Estimated H-index: 52
(UMBI: University of Maryland Biotechnology Institute)
Background The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level.
Published on Jul 1, 2003 in Nucleic Acids Research 11.15
Pauline C. Ng
Estimated H-index: 8
Steven Henikoff
Estimated H-index: 110
Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mu...
Published on Sep 1, 2002 in Nucleic Acids Research 11.15
Vasily Ramensky
Estimated H-index: 19
Peer Bork
Estimated H-index: 170
Shamil R. Sunyaev
Estimated H-index: 64
Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect o...
Cited By (6578)
Published on Dec 1, 2019 in Scientific Reports 4.01
Takashi Higuchi
Estimated H-index: 9
(University of Tsukuba),
Shomi Oka
Estimated H-index: 9
(University of Tsukuba)
+ 16 Authors, Hiroshi Kouno
Estimated H-index: 5
Autoimmune hepatitis (AIH) is an autoimmune liver disease and cirrhosis is sometimes complicated with AIH at diagnosis, influencing its prognosis. TNFAIP3 gene encodes A20, an inhibitor of nuclear factor-κB pathway, and is a susceptibility gene for autoimmune diseases. We investigated deleterious variants in the coding regions of TNFAIP3 gene of Japanese AIH patients or those with cirrhosis. The deleterious variants in the coding regions of TNFAIP3 gene were analyzed by the cycle sequencing meth...
Published on Jul 26, 2019 in Scientific Reports 4.01
Carol L. Fischer
Estimated H-index: 10
Amber M. Bates (UW: University of Wisconsin-Madison) + 10 Authors, Shireen Vali
Estimated H-index: 17
Individual computational models of single myeloid, lymphoid, epithelial, and cancer cells were created and combined into multi-cell computational models and used to predict the collective chemokine, cytokine, and cellular biomarker profiles often seen in inflamed or cancerous tissues. Predicted chemokine and cytokine output profiles from multi-cell computational models of gingival epithelial keratinocytes (GE KER), dendritic cells (DC), and helper T lymphocytes (HTL) exposed to lipopolysaccharid...
Published on Jan 28, 2019 in BMC Medical Genomics 2.57
Pu Wang
Estimated H-index: 2
(Peking Union Medical College Hospital),
Yibei Wang
Estimated H-index: 2
(Peking Union Medical College Hospital)
+ 6 Authors, Xiaowei Chen
Estimated H-index: 3
(Peking Union Medical College Hospital)
Background Microtia-atresia is characterized by abnormalities of the auricle (microtia) and aplasia or hypoplasia of the external auditory canal, often associated with middle ear abnormalities. To date, no causal genetic mutations or genes have been identified in microtia-atresia patients.
Published on Dec 1, 2019 in BMC Neurology 2.23
Sushan Luo
Estimated H-index: 7
(Fudan University),
Minjie Xu
Estimated H-index: 1
+ 10 Authors, Jiahong Lu
Estimated H-index: 8
(Fudan University)
Background Primary periodic paralysis (PP) is characterized by recurrent quadriplegia typically associated with abnormal serum potassium levels. The molecular diagnosis of primary PP was previously based on Sanger sequencing of hot spots or exon-by-exon screening of the reported genes.
Published on Feb 7, 2019 in Scientific Reports 4.01
Hong Sun (SJTU: Shanghai Jiao Tong University), Guangjun Yu (SJTU: Shanghai Jiao Tong University)
Precise classification of non-synonymous single nucleotide variants (SNVs) is a fundamental goal of clinical genetics. Next-generation sequencing technology is effective for establishing the basis of genetic diseases. However, identification of variants that are causal for genetic diseases remains a challenge. We analyzed human non-synonymous SNVs from a multilevel perspective to characterize pathogenicity. We showed that computational tools, though each having its own strength and weakness, ten...
Published on Dec 1, 2019 in Nature Communications 11.88
Josepmaria Argemi
Estimated H-index: 1
(University of Navarra),
M.U. Latasa
Estimated H-index: 14
(University of Navarra)
+ 37 Authors, Aaron Bell
Estimated H-index: 21
(University of Pittsburgh)
Alcoholic hepatitis (AH) is a life-threatening condition characterized by profound hepatocellular dysfunction for which targeted treatments are urgently needed. Identification of molecular drivers is hampered by the lack of suitable animal models. By performing RNA sequencing in livers from patients with different phenotypes of alcohol-related liver disease (ALD), we show that development of AH is characterized by defective activity of liver-enriched transcription factors (LETFs). TGFβ1 is a key...
Published in Genome Biology 14.03
Adam Siepel
Estimated H-index: 49
(CSHL: Cold Spring Harbor Laboratory)
The computer software used for genomic analysis has become a crucial component of the infrastructure for life sciences. However, genomic software is still typically developed in an ad hoc manner, with inadequate funding, and by academic researchers not trained in software development, at substantial costs to the research community. I examine the roots of the incongruity between the importance of and the degree of investment in genomic software, and I suggest several potential remedies for curren...
Published on Dec 1, 2019 in Hereditary Cancer in Clinical Practice 1.78
Carolina Cortés
Estimated H-index: 1
(University of Valle),
Ana Lucía Rivera (University of Valle) + 4 Authors, Guillermo Barreto
Estimated H-index: 6
(University of Valle)
Purpose The main risk factor for familial breast cancer is the presence of mutations in BRCA1 and BRCA2 genes. The prevalence of mutations in these genes is heterogeneous and varies according to geographical origin of studied families. In Colombia mutations in these genes have been mainly studied on patients from Andean region. Bogota and Medellin presented its own battery of mutations. This study aims to identify mutations in BRCA1–2 genes in women with familial breast cancer from different reg...
Published in Nature Communications 11.88
Siming Zhao
Estimated H-index: 2
(U of C: University of Chicago),
Jun Liu (HHMI: Howard Hughes Medical Institute) + -3 Authors, Xin He
Estimated H-index: 19
(U of C: University of Chicago)
Identifying driver genes from somatic mutations is a central problem in cancer biology. Existing methods, however, either lack explicit statistical models, or use models based on simplistic assumptions. Here, we present driverMAPS (Model-based Analysis of Positive Selection), a model-based approach to driver gene identification. This method explicitly models positive selection at the single-base level, as well as highly heterogeneous background mutational processes. In particular, the selection ...
Published on Dec 1, 2019 in Gut Pathogens 3.17
J. Todd Kuenstner
Estimated H-index: 2
(TU: Temple University),
Maher Kali
Estimated H-index: 2
Christine Welch
Estimated H-index: 1
A whole exome sequencing study was performed on an extended family including a patient with Crohn’s disease (CD) and a patient with complex regional pain syndrome (CRPS). The patient with CD and the patient with CRPS have experienced resolution of their disease following treatment for paratuberculosis. The study was performed in order to determine if there is an unusual mutation in this extended family that would explain the susceptibility to mycobacterial infection among many of the members. We...