Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations

Published on Jan 15, 2020 in bioRxiv
· DOI: 10.1101/2020.01.14.905927
Ying Wang (UQ: University of Queensland), Estimated H-index: 1
Ying Wang (UQ: University of Queensland), Estimated H-index: 46
+ 3 Authors
Loic Yengo (UQ: University of Queensland), Estimated H-index: 43
Polygenic scores (PGS) have been widely used to predict complex traits and disease risk using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately by modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes, and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA when using European-based PGS in individuals of African ancestry for traits such as body mass index and height. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.
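The definition of relative accuracy (RA) given in the abstract can be illustrated with a minimal Python sketch. It assumes that accuracy is measured as the squared correlation between PGS and phenotype, which is a common convention but an assumption here. All function names and toy numbers are hypothetical, and the sketch does not implement the paper's theory for predicting RA from LD, MAF, effect-size correlations, and heritability.

```python
from statistics import mean


def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (0, 1 or 2) for one individual."""
    return sum(g * w for g, w in zip(genotypes, weights))


def squared_correlation(x, y):
    """Squared Pearson correlation, used here as the accuracy measure (assumption)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def relative_accuracy(r2_target, r2_reference):
    """Accuracy in the target ancestry divided by accuracy in the
    ancestry that matches the discovery population."""
    return r2_target / r2_reference


if __name__ == "__main__":
    # Made-up accuracies for illustration only; these are not results from the paper.
    r2_same_ancestry = 0.20
    r2_other_ancestry = 0.05
    print(relative_accuracy(r2_other_ancestry, r2_same_ancestry))
```

An RA below 1 means the score predicts less well in the target ancestry than in the ancestry of the discovery sample, which is the loss of accuracy the abstract refers to.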
  • References (51)
  • Citations (2)
Breast cancer is the most common cancer and also the leading cause of cancer death in women worldwide. In Colombia it is the leading cause of cancer death in women, hence the importance of identifying genes that may be involved in the development and progression of the disease. In this work, a candidate-gene association study was carried out on the DEAR1 gene. The DEAR1 gene encodes a member of the TRIM subfamily of RING finger proteins (TRIM ...
3,735 Citations
#1 Max Lam, H-Index: 10
#2 Chia-Yen Chen, H-Index: 15
Last. Hailiang Huang (Broad Institute), H-Index: 31
(56 authors in total)
Schizophrenia is a debilitating psychiatric disorder with approximately 1% lifetime risk globally. Large-scale schizophrenia genetic studies have reported primarily on European ancestry samples, potentially missing important biological insights. Here, we report the largest study to date of East Asian participants (22,778 schizophrenia cases and 35,362 controls), identifying 21 genome-wide-significant associations in 19 genetic loci. Common genetic variants that confer risk for schizophrenia have...
13 Citations
#1 Jian-ping Guo (UQ: University of Queensland), H-Index: 3
#2 Andrew Bakshi (UQ: University of Queensland), H-Index: 1
Last. Jian Yang (WMU: Wenzhou Medical College), H-Index: 113
(8 authors in total)
Genome-wide association studies (GWAS) in samples of European ancestry have identified thousands of genetic variants associated with complex traits in humans. However, it remains largely unclear whether these associations can be used in non-European populations. Here, we seek to quantify the proportion of genetic variation for a complex trait shared between continental populations. We estimated the between-population correlation of genetic effects at all SNPs (rg) or genome-wide significant SNPs...
2 Citations
#1 Jian Zeng (UQ: University of Queensland), H-Index: 11
#2 Angli Xue (UQ: University of Queensland), H-Index: 6
Last. Jian Yang (WMU: Wenzhou Medical College), H-Index: 97
(13 authors in total)
Understanding how natural selection has shaped the genetic architecture of complex traits and diseases is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level data to estimate multiple features of genetic architecture, including signatures of natural selection. Here, we present an enhanced method (SBayesS) that only requires GWAS summary statistics and incorporates functional genomic annotations. We analysed GWAS data with large sample s...
2 Citations
#1 Arun Durvasula (UCLA: University of California, Los Angeles), H-Index: 6
#2 Kirk E. Lohmueller (University of California, Berkeley), H-Index: 28
3 Citations
#1 Laramie Duncan (Stanford University), H-Index: 22
#2 Hanyang Shen (Stanford University), H-Index: 6
Last. Ben Domingue (Stanford University), H-Index: 19
(8 authors in total)
A historical tendency to use European ancestry samples hinders medical genetics research, including the use of polygenic scores, which are individual-level metrics of genetic risk. We analyze the first decade of polygenic scoring studies (2008–2017, inclusive), and find that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were among cohorts of African, Hispanic, or Indigenous peoples. We find ...
5 Citations
#1 Genevieve L. Wojcik (Stanford University), H-Index: 11
#2 Misa Graff (UNC: University of North Carolina at Chapel Hill), H-Index: 23
Last. Christopher Carlson (Fred Hutchinson Cancer Research Center), H-Index: 49
(86 authors in total)
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1–3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequenc...
28 Citations
#1 Amy R. Bentley (NIH: National Institutes of Health), H-Index: 16
#2 Yun J. Sung (WashU: Washington University in St. Louis), H-Index: 7
Last. L. Adrienne Cupples, H-Index: 137
(299 authors in total)
The concentrations of high- and low-density-lipoprotein cholesterol and triglycerides are influenced by smoking, but it is unknown whether genetic associations with lipids may be modified by smoking. We conducted a multi-ancestry genome-wide gene–smoking interaction study in 133,805 individuals with follow-up in an additional 253,467 individuals. Combined meta-analyses identified 13 new loci associated with lipids, some of which were detected only because association differed by smoking status. ...
7 Citations
#1 Yogasudha Veturi (UAB: University of Alabama at Birmingham), H-Index: 3
#2 Gustavo de los Campos (MSU: Michigan State University), H-Index: 29
Last. Brigitte Kuhnel, H-Index: 1
(6 authors in total)
In humans, most genome-wide association studies have been conducted using data from Caucasians and many of the reported findings have not replicated in other populations. This lack of replication may be due to statistical issues (small sample sizes or confounding) or perhaps more fundamentally to differences in the genetic architecture of traits between ethnically diverse subpopulations. What aspects of the genetic architecture of traits vary between subpopulations and how can this be quantified...
1 Citation
#1 Alicia R. Martin (Broad Institute), H-Index: 19
#2 Masahiro Kanai, H-Index: 11
Last. M. J. Daly, H-Index: 178
(6 authors in total)
Polygenic risk scores (PRS) are poised to improve biomedical outcomes via precision medicine. However, the major ethical and scientific challenge surrounding clinical implementation of PRS is that those available today are several times more accurate in individuals of European ancestry than other ancestries. This disparity is an inescapable consequence of Eurocentric biases in genome-wide association studies, thus highlighting that—unlike clinical biomarkers and prescription drugs, which may ind...
54 Citations
Cited By (2)
#1 Sheng Yang (UM: University of Michigan)
#2 Xiang Zhou (UM: University of Michigan), H-Index: 23
Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling a...
#1 Barbara Domingues Bitarello (UPenn: University of Pennsylvania)
#2 Iain Mathieson (UPenn: University of Pennsylvania), H-Index: 21