Due to diverging human migration histories and geodemographic events taking place over thousands of years, the frequencies of individual genetic markers (allele frequencies), as well as the degree to which pairs of markers tend to be inherited together (linkage disequilibrium, or LD), vary across global populations. These population-level genetic characteristics heavily influence genetic risk prediction models. Unfortunately, there is a massive Eurocentric bias in the field of quantitative human genetics: the vast majority of genetic studies have been performed in individuals of European ancestry, even though European-ancestry individuals constitute less than 20% of the global population. Since larger sample sizes directly translate to more accurate estimates of population-level genetic characteristics, current genetic risk prediction models are several times more accurate for European-ancestry individuals than for other ancestral groups. This is a critically important issue, especially given that genetic risk prediction is already being implemented in the clinic for some diseases.
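
To make the two population-level quantities above concrete, here is a minimal Python sketch that computes allele frequencies and a common LD measure (the squared correlation between two markers) from genotype data. The function names and the toy genotype matrices are illustrative assumptions, not part of PESCA or of any real dataset.

```python
import numpy as np

def allele_frequencies(genotypes):
    """Frequency of the counted allele at each marker.

    genotypes: (n_individuals, n_markers) array of 0/1/2 allele counts.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    # Each individual carries two copies, so divide the mean count by 2.
    return genotypes.mean(axis=0) / 2.0

def ld_r2(genotypes, i, j):
    """Squared Pearson correlation between markers i and j (a common LD measure)."""
    genotypes = np.asarray(genotypes, dtype=float)
    r = np.corrcoef(genotypes[:, i], genotypes[:, j])[0, 1]
    return r ** 2

# Hypothetical toy data: rows are individuals, columns are two markers.
pop_a = np.array([[0, 0], [1, 1], [2, 2], [1, 1], [0, 0], [2, 2]])
pop_b = np.array([[0, 2], [0, 0], [1, 1], [0, 2], [0, 0], [1, 1]])

for name, pop in (("population A", pop_a), ("population B", pop_b)):
    print(name, allele_frequencies(pop), ld_r2(pop, 0, 1))
```

In this toy data the two markers are perfectly correlated in the first population and uncorrelated in the second, and the first marker's frequency also differs between the two. A prediction model tuned to one population's frequencies and LD pattern would therefore be mismatched in the other.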

Statistical associations between genetic markers and complex diseases tend to replicate across populations—evidence of a shared genetic basis of disease risk. However, genetic risk prediction models that are trained on data from one ancestral group perform significantly worse if directly applied to predict disease risk for other ancestral groups. The population-specificity of these models suggests that a nonnegligible amount of disease risk may be driven by population-specific genetic or environmental factors. This presents several open questions. To what extent do the findings of genetic studies transfer across ancestral groups? Is disease risk modulated by the same genetic variants in different populations? If so, do those variants exert the same effects on disease risk across populations?

Motivated by these questions, we developed a statistical framework, PESCA, that estimates the numbers of causal genetic variants that are unique to versus shared between ancestral populations, both at the genome-wide level and within any genomic annotation of interest (e.g., a set of genes). We applied PESCA to 9 complex traits and diseases for which summary statistics of European- and East Asian-ancestry genetic studies were publicly available (body mass index, mean corpuscular hemoglobin, mean corpuscular volume, HDL and LDL cholesterol, total cholesterol, triglycerides, major depressive disorder, and rheumatoid arthritis). For these traits, we estimate that ~80% of common causal variants (minor allele frequency of at least 5% in each population separately) are shared between European- and East Asian-ancestry individuals. In regions with genotype-phenotype statistical associations that are significant in only one population, we observe a 2.8x enrichment of putative shared causal variants relative to the genome-wide background. This suggests that many population-specific genotype-phenotype statistical associations may, in fact, be driven by shared causal variants that, due to differential statistical power and/or heterogeneity in population-level genetic characteristics (allele frequencies and LD patterns), are undetected in the second population.
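
As a rough illustration of what an enrichment "relative to the genome-wide background" means, the Python sketch below compares the average per-variant probability of being a shared causal variant inside a set of regions with the genome-wide average. This is an assumed, simplified definition for illustration only. The text above does not describe how PESCA itself computes enrichment, and the function name, inputs, and numbers are all hypothetical.

```python
import numpy as np

def shared_causal_enrichment(posterior_shared, in_region):
    """Enrichment of shared causal variants in a region (illustrative definition).

    posterior_shared: per-variant probabilities of being causal in both
        populations (hypothetical inputs, not real PESCA output).
    in_region: boolean mask marking the variants inside the regions of interest.
    Returns the mean probability inside the region divided by the
    genome-wide mean probability.
    """
    posterior_shared = np.asarray(posterior_shared, dtype=float)
    in_region = np.asarray(in_region, dtype=bool)
    if not in_region.any():
        raise ValueError("region contains no variants")
    region_rate = posterior_shared[in_region].mean()
    genome_rate = posterior_shared.mean()
    return region_rate / genome_rate

# Hypothetical toy values: six variants, the first two lie inside the region.
posterior = [0.9, 0.8, 0.1, 0.1, 0.05, 0.05]
mask = [True, True, False, False, False, False]
print(shared_causal_enrichment(posterior, mask))
```

A value above 1 means the region carries more shared causal signal per variant than the genome as a whole, and a value below 1 means it carries less.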

We conclude that explicitly correcting for population-specific LD and allele frequencies may improve the transferability of findings from well-powered European-ancestry genetic studies to historically understudied and underrepresented populations. Since the cumulative number of study participants of European ancestry will, unfortunately, continue to be orders of magnitude larger than that of other ancestries for the foreseeable future, computational solutions are needed to help prevent the exacerbation of health disparities in medical genetics.

AJHG publication: Shi*, Burch*, et al., Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data, The American Journal of Human Genetics (2020)


PESCA software: https://github.com/huwenboshi/pesca
