Hypermutable Non-Synonymous Sites Are under Stronger Negative Selection

Published on Nov 28, 2008 in PLOS Genetics · DOI: 10.1371/journal.pgen.1000281
Steffen Schmidt (MPG: Max Planck Society), Anna Gerasimova (UM: University of Michigan), + 3 authors, Shamil R. Sunyaev (Brigham and Women's Hospital)
Mutation rate varies greatly between nucleotide sites of the human genome and depends both on the global genomic location and on the local sequence context of a site. In particular, CpG context elevates the mutation rate by an order of magnitude. Mutations also vary widely in their effects on molecular function, phenotype, and fitness. The assumption that the probability of a new mutation occurring is independent of its effect has been a fundamental premise in genetics. However, highly mutable contexts may be preserved by negative selection at important sites but destroyed by mutation at sites under no selection. Thus, there may be a positive correlation between the rate of mutation at a nucleotide site and the magnitude of the fitness effect of those mutations. We studied the impact of CpG context on the rate of human–chimpanzee divergence and on intrahuman nucleotide diversity at non-synonymous coding sites. We compared nucleotides that occupy identical positions within codons of identical amino acids and differ only by being within versus outside CpG context. Nucleotides within CpG context are under stronger negative selection, as revealed by their lower rate of evolution and lower nucleotide diversity relative to their mutation rate. In particular, the probability of fixation of a non-synonymous transition at a CpG site is two times lower than at a non-CpG site. Thus, sites with different mutation rates are not necessarily selectively equivalent. This suggests that the mutation rate may complement sequence conservation as a predictor of the functional importance of nucleotide sites.
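The comparison in the abstract rests on two steps. The first is deciding whether a nucleotide sits in CpG context. The second is scaling divergence by mutation rate so that any remaining difference reflects selection. The Python sketch below illustrates both under stated assumptions and is not the authors' pipeline. It relies on the standard population-genetics relation K = 2Nμu: the substitution rate per site per generation in a diploid population of size N, with per-site mutation rate μ and fixation probability u. Under that relation, K/μ is proportional to u, and the shared factor 2N (together with the shared divergence time) cancels when two site classes from the same lineage are compared. The function names and toy counts are hypothetical. The sketch also assumes the site classes have already been matched by codon position, amino acid, and mutation type, as the abstract describes.

```python
"""Sketch: CpG-context classification and mutation-rate-normalized divergence.

Hypothetical helpers for illustration only; not the method used in the paper.
"""


def in_cpg_context(seq: str, i: int) -> bool:
    """True if the base at index i is part of a CpG dinucleotide in seq.

    A C counts when the next base is G; a G counts when the previous base is C.
    """
    seq = seq.upper()
    base = seq[i]
    if base == "C":
        return i + 1 < len(seq) and seq[i + 1] == "G"
    if base == "G":
        return i > 0 and seq[i - 1] == "C"
    return False


def relative_fixation_probability(
    subs_cpg: int, sites_cpg: int, mu_cpg: float,
    subs_non: int, sites_non: int, mu_non: float,
) -> float:
    """Estimate u_CpG / u_nonCpG from divergence counts and mutation rates.

    Assumes K = 2N * mu * u for each site class, with the same N and the same
    divergence time for both classes. Per-site divergence divided by mu is
    then proportional to u, and the shared factors cancel in the ratio.
    Site classes are assumed to be matched already (same codon position,
    same amino acid, same mutation type).
    """
    k_cpg = subs_cpg / sites_cpg  # observed substitutions per CpG site
    k_non = subs_non / sites_non  # observed substitutions per non-CpG site
    return (k_cpg / mu_cpg) / (k_non / mu_non)


if __name__ == "__main__":
    # Toy sequence: index 1 (C) and index 2 (G) form a CpG; index 4 (C) does not.
    seq = "ACGTCA"
    print([in_cpg_context(seq, i) for i in range(len(seq))])
    # Hypothetical counts: CpG mutation rate 10x higher, divergence only 5x higher.
    ratio = relative_fixation_probability(50, 1000, 10.0, 10, 1000, 1.0)
    print(f"u_CpG / u_nonCpG = {ratio:.2f}")
```

In the toy example, CpG sites are assumed to mutate ten times faster but show only five times more substitutions per site. The function therefore returns about 0.5, which corresponds to a fixation probability half that at non-CpG sites. This is the kind of twofold difference the abstract reports. The counts are made up purely to show the arithmetic.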