Introduction

As the cost of sequencing decreases, the clinical utility of whole-genome sequencing (WGS) is currently undergoing intensive investigation as a tool for precise diagnosis, risk prediction, and therapeutic guidance1; WGS is also undergoing evaluation from ethical and legal perspectives.2,3 The MedSeq Project is a randomized clinical trial studying the integration of WGS into clinical care in two specific contexts4: patients from a specialty clinic with a focus on Mendelian forms of inherited cardiomyopathy and patients from a primary-care practice. In each of these clinical settings, pathogenic variants in known Mendelian disease genes, loss-of-function variants in disease-associated genes across the genome, and other actionable variations, including alleles of pharmacogenetic importance, are the major focus of the whole-genome report. However, one of the advantages of WGS over whole-exome sequencing is that the former provides genomic variants in intronic and other noncoding regions, where the majority of alleles associated with common diseases reside.5,6 As a result, WGS also has the potential, when interpreted in the context of rigorous population data, to enable the efficient estimation of genetic liability for common complex diseases as well as the discovery of possible modifier effects on rare alleles of larger effect size.

One of the most interesting and relevant questions for WGS reporting is in regard to how to define and present data on common alleles associated with increased or decreased risk for certain diseases,7 particularly those with potential therapeutic implications.8 Several approaches might be used to estimate the composite risk for a given trait and to allow its communication to general clinicians. Risk alleles are typically discovered using a case–control design in which the frequency of each allele in cases is compared with that in controls. An allele observed at a higher frequency in cases is considered to be a risk allele and represents a marker for all adjacent variants in linkage disequilibrium.9 Conversely, an allele with a lower frequency in cases is sometimes reported as a “protective” allele. However, the very same allele in different populations may represent distinct risk haplotypes, whereas case and control definitions are also necessarily imperfect. Thus, without longitudinal cohort studies, it may be difficult to establish the clinical validity and clinical utility of common alleles. In this context, in the current study, we focused on the single-nucleotide polymorphisms (SNPs) more frequently found in cases from genome-wide association studies (GWASs) as listed in the National Human Genome Research Institute GWAS catalog.10

An intuitive approach to combine information from several genetic tests is to multiply likelihood ratios with pretest odds of population-specific lifetime disease risk estimates.11,12 However, for the majority of risk alleles, objective likelihood ratios are not available. Polygenic risk scores (PRSs) have been proposed by several investigators7,13,14,15,16,17 to combine multiple risk alleles, including those that fail to attain genome-wide significance in association studies, on the basis that there may be genetic epistasis, interaction with environmental factors, or aggregate effects that can be captured.18 To this end, a multiplicative model including seven risk alleles for breast cancer was proposed for risk stratification.17 Aggregating the information from a larger number of subthreshold risk alleles has also been used, testing the classic models of polygenic inheritance.13,16 These studies highlighted the possibility of using polygenic scores in the context of conditioning nongenetic clinical information, although the performances of such PRSs were inconsistent across different diseases.19,20

Although the prediction of disease risk based solely on genotype is not currently standard of care in medical practice, it may soon be useful for patients and clinicians to know whether a patient presents a high-risk genomic profile for a specific trait or disease as compared with the population norm.21,22 This may be the case even when there are no robust independent data regarding the clinical utility of genetic predictors, given the known role of multiple subjective variables in situations of clinical equipoise. Here, we summarize multiple risk alleles by calculating a normalized PRS using a population-scale WGS data set from the 1000 Genomes Project (1KGP).23 Our approach demonstrates how complex trait risk variants from individual genomes can be efficiently summarized and reported in a clinical context, highlighting the uncertainties of interpretation while facilitating the use of the available information in clinical decision making.

Materials and Methods

Risk alleles

The National Human Genome Research Institute GWAS catalog (http://www.genome.gov/admin/gwascatalog.txt) was downloaded on 12 March 2013.10 The catalog contained a total of 9,785 records corresponding to 8,384 risk alleles. We used a series of filtering steps to retain only informative SNPs for the PRS estimates as detailed in Supplementary Figure S1 online. The excluded SNPs for each filtering step can be found at the second to the rightmost column—“Filtering Status”—of Supplementary Table S1 online. For the risk alleles with odds ratios (ORs) <1, we followed the GWAS catalog’s inversion of ORs using the alternative alleles as risk alleles. A total of 1,565 risk alleles for 182 traits met our filtering criteria (Supplementary Table S1 online).
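
The OR inversion step can be illustrated with a short sketch. This is not the catalog's or the study's actual pipeline: the `RiskAllele` record, its field names, and the assumption of a biallelic SNP (so that the alternative allele's frequency is one minus the reported frequency) are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskAllele:
    """Hypothetical record for one catalog entry (field names are illustrative)."""
    snp_id: str
    allele: str                          # allele the association was reported for
    other_allele: str                    # alternative allele (biallelic site assumed)
    odds_ratio: float
    allele_freq: Optional[float] = None  # frequency of `allele`, if reported


def as_risk_allele(entry: RiskAllele) -> RiskAllele:
    """Return an entry with OR >= 1, switching to the alternative allele if needed."""
    if entry.odds_ratio >= 1:
        return entry
    flipped_freq = None if entry.allele_freq is None else 1 - entry.allele_freq
    return RiskAllele(
        snp_id=entry.snp_id,
        allele=entry.other_allele,
        other_allele=entry.allele,
        odds_ratio=1 / entry.odds_ratio,  # invert the OR
        allele_freq=flipped_freq,         # valid only under the biallelic assumption
    )
```

Under these assumptions, an entry reported with OR 0.8 for allele A becomes an entry with OR 1.25 for the alternative allele.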

To test our approach to the reporting of common allele variations in the MedSeq Project, we selected eight binary phenotypes—abdominal aortic aneurysm, atrial fibrillation, coronary heart disease (CHD), type 2 diabetes (T2D), hypertension, obesity/metabolic syndrome, platelet aggregation, and QT prolongation—that are factors frequently weighed in decision making in both primary-care and cardiology subspecialty settings. Quantitative phenotypes were not included because of the inconsistency in phenotype measures and descriptions between studies. A total of 161 risk alleles were then incorporated into PRS estimates for the eight selected phenotypes.

Calculating polygenic risk scores

Several approaches to polygenic risk scoring exist, the majority summing all risk alleles present in an individual genome and assigning allele-specific weighting. The simplest method is to treat all risk alleles equally, that is, an allele counting method with unit weights.20 Alternatively, observed effect sizes can be used to weight each risk allele differently.13,16 We calculated a multiplicative PRS (MPRS) as detailed in the Supplementary Materials and Methods online. Briefly, the MPRS for each phenotype was calculated as the product of ORs. Thus, log(MPRS) is equivalent to the log(OR)-weighted sum of risk allele counts.20 The population attribution risk (PAR) method integrates population allele frequency (AF) and OR.15 A single SNP PAR was estimated as $\mathrm{PAR}_i = \mathrm{AF}_i(\mathrm{OR}_i - 1) / [\mathrm{AF}_i(\mathrm{OR}_i - 1) + 1]$, in which $\mathrm{AF}_i$ is the frequency of the risk allele at the $i$th locus in the control population, and $\mathrm{OR}_i$ is the OR of the risk allele at the $i$th locus. The multi-SNP PAR was calculated on the basis of the single SNP PAR for each associated SNP: $1 - \prod_i (1 - \mathrm{PAR}_i)$, in which $\mathrm{PAR}_i$ is the single SNP PAR for the $i$th locus. The raw scores from the counting, MPRS, and PAR methods were normalized using the median score of the European (EUR) genotypes (N = 379) in the 1KGP, and the ranks of the individual’s score are reported as deciles.
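
The three scoring formulas described above can be sketched as follows. This is a minimal illustration rather than the study's implementation (which is detailed in the Supplementary Materials): genotypes are assumed to be coded as risk-allele counts of 0, 1, or 2 per locus, all function names are hypothetical, and the decile rule (rank among reference scores strictly below the individual's score) is one of several reasonable conventions. The PAR functions implement only the population-level formulas quoted in the text; how PAR is applied to an individual genome is not specified here and is not attempted.

```python
import math
from statistics import median


def count_score(risk_counts):
    """Allele counting: every risk allele carries a weight of 1."""
    return sum(risk_counts)


def mprs(risk_counts, odds_ratios):
    """Multiplicative PRS: product over loci of OR_i ** (risk-allele count at locus i)."""
    return math.prod(o ** c for c, o in zip(risk_counts, odds_ratios))


def single_snp_par(af, odds_ratio):
    """Single-SNP PAR: AF(OR - 1) / (AF(OR - 1) + 1)."""
    excess = af * (odds_ratio - 1)
    return excess / (excess + 1)


def multi_snp_par(afs, odds_ratios):
    """Multi-SNP PAR: 1 - product over loci of (1 - PAR_i)."""
    return 1 - math.prod(1 - single_snp_par(a, o) for a, o in zip(afs, odds_ratios))


def normalize_by_median(score, reference_scores):
    """Express a raw score relative to the median of a reference population."""
    return score / median(reference_scores)


def decile_rank(score, reference_scores):
    """Decile (1-10) of `score` within the reference distribution (assumed convention)."""
    below = sum(s < score for s in reference_scores)
    return min(10, below * 10 // len(reference_scores) + 1)
```

For example, an individual carrying two, one, and zero risk alleles at three loci with ORs of 1.2, 1.5, and 2.0 has a counting score of 3 and an MPRS of 1.2² × 1.5 = 2.16; taking the logarithm of the MPRS gives the log(OR)-weighted sum of the same counts.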

Testing the performance of the MPRS with a GWAS data set

To compare the distribution of polygenic scores between cases and controls, we used the Wellcome Trust Case Control Consortium (WTCCC) phase I data set, which genotyped 16,179 individuals with the Affymetrix GeneChip Human Mapping 500K arrays.24 The details of the WTCCC data set are described in the Supplementary Materials and Methods online. We selected the subset of risk alleles represented on the Affymetrix 500K arrays to calculate the MPRS and performed the analysis after excluding those risk alleles that were originally reported with the WTCCC data set.24 Genotype imputation was not performed because the estimated 5–6% imputation error rate25 might result in significant changes in the MPRS decile (see Results). The MPRS percentile for each individual was calculated for each trait against 2,938 controls. As noted, for SNPs in linkage disequilibrium ($r^2 > 0.5$), we chose the allele with the largest effect size. The SNPs in the major histocompatibility complex region of chromosome 6 (rs6458307, rs9469220, rs615672, rs6457617, rs9272346, and rs9465871) were excluded when calculating the MPRS for Crohn disease (CD), type 1 diabetes (T1D), and rheumatoid arthritis.
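
One way to keep only the largest-effect allele among SNPs in linkage disequilibrium is a greedy pass over the SNPs in order of decreasing OR. The text states only that the allele with the largest effect size was chosen; the greedy procedure, the function name, and the data layout below (a dictionary of ORs and a dictionary of pairwise r² values keyed by unordered SNP pairs, with missing pairs treated as unlinked) are assumptions made for this sketch.

```python
def prune_by_ld(odds_ratios, r2, threshold=0.5):
    """Greedy LD pruning (illustrative, not the study's documented procedure).

    odds_ratios: dict mapping SNP id -> OR of its risk allele
    r2:          dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2;
                 pairs absent from the dict are treated as r^2 = 0
    Returns the retained SNP ids, ordered by decreasing OR.
    """
    kept = []
    for snp in sorted(odds_ratios, key=odds_ratios.get, reverse=True):
        # Keep the SNP only if it is not in LD with any already-kept (larger-OR) SNP.
        if all(r2.get(frozenset((snp, k)), 0.0) <= threshold for k in kept):
            kept.append(snp)
    return kept
```

With three hypothetical SNPs where `snpA` (OR 1.3) and `snpB` (OR 1.5) have r² = 0.8, the sketch retains `snpB` and drops `snpA`, while an unlinked `snpC` is kept.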

Results

Correlation between different polygenic scoring methods

The numbers of reported risk alleles per trait skewed to the right because a small number of traits were associated with a majority of risk alleles. Risk alleles for multiple sclerosis (n = 105), CD (n = 95), T2D (n = 77), ulcerative colitis (n = 64), and CHD (n = 62) constituted 25.7% of 1,565 alleles. Forty-three traits were associated with a single reported risk allele. The median OR was 1.25 (interquartile range: 1.15–1.45), and 461 risk alleles exhibited ORs of more than 1.45. The majority of risk alleles were found in non–protein coding regions (91.0% of 1,565): 55.7% (872/1,565) lie within intergenic regions whereas 553 (35.3%) are intronic. A total of 103 (6.6%) risk alleles were found in coding regions, and 14 and 23 were mapped to the 5′-UTR and 3′-UTR regions, respectively. The AFs ranged from 0.011 to 0.983, with an average of 0.422. Risk AFs were not listed for 265 loci in the original discovery studies.

We compared the three methods for combining risk alleles: counting, MPRS, and the multi-SNP PAR outlined in the Methods section. For each individual in the 1KGP EUR population (N = 379), we calculated polygenic scores for eight cardiac phenotypes: abdominal aortic aneurysm, atrial fibrillation, CHD, T2D, hypertension, obesity/metabolic syndrome, platelet aggregation, and QT prolongation. The scores from the three methods showed significant positive correlations for all eight traits (Kendall’s tau, $P < 2.2 \times 10^{-16}$; Supplementary Table S2 online); however, the counting method, when used with small numbers of risk alleles, yielded nonunique scores in the 379 EUR individuals (Supplementary Figure S2 online).
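
For readers who want to reproduce this kind of rank comparison, the following is a minimal pure-Python sketch of Kendall's tau. The text does not state which variant was used; the tau-b form is assumed here because it corrects for ties, which the counting method produces. The function name is hypothetical, no P value is computed, and the quadratic pairwise loop is adequate only for small cohorts such as the 379 individuals considered here.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long score lists (tie-corrected)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                  # tied on both variables: excluded from both terms
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom
```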

To check whether the subgroups at highest genetic risk (i.e., those within the 10th decile) could be consistently defined by different summary methods, we selected two common complex traits, CHD and T2D, which had 62 and 77 risk alleles, respectively, that met our filtering criteria. The percentile rank of each individual was calculated using all three methods, and decile ranks were compared between polygenic scoring approaches. The three methods showed significant positive correlations overall (Figure 1 and Supplementary Table S2 online), with the correlation between MPRS and PAR being the highest (Kendall’s tau = 0.7229 (Figure 1c) and 0.6928 (Figure 1g) for CHD and T2D, respectively). However, the individuals identified within the 10th decile varied considerably with the summary method used. The concordance rate for the 10th decile in CHD PRS was 49% between the MPRS and PAR methods (Figure 1d). Among 38 individuals in the 10th decile as ascertained by counting CHD risk alleles, 23 and 16 were in the 10th decile as ascertained by the MPRS and PAR methods, respectively (Figure 1d). Similarly, 25 individuals were in the 10th decile as ascertained by counting and PAR for T2D, and 22 were in the 10th decile as ascertained by MPRS and PAR (Figure 1h).
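
The top-decile comparison can be sketched as a set intersection. This is an illustrative sketch only: the function names are hypothetical, scores are assumed to be supplied as dictionaries keyed by individual, the top decile is taken as the highest-ranked tenth of individuals (38 of 379 after rounding), and ties at the cutoff are broken by input order, which the original analysis may have handled differently.

```python
def top_decile(scores):
    """Return the set of individuals whose scores rank in the top 10%.

    scores: dict mapping individual id -> polygenic score.
    Ties at the cutoff are broken by input order (an assumption of this sketch).
    """
    n_top = max(1, round(len(scores) / 10))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])


def top_decile_overlap(scores_a, scores_b):
    """Number of individuals placed in the top decile by both scoring methods."""
    return len(top_decile(scores_a) & top_decile(scores_b))
```

Dividing the overlap by the size of the top decile gives one possible concordance rate; the denominator used for the concordance rates reported above is not stated in this section.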

Figure 1

Comparison of polygenic score calculation methods. Using the risk alleles and allele frequencies reported in the GWAS catalog, we calculated polygenic scores for 379 individuals of the 1000 Genomes Project European cohort. We counted the number of risk alleles in an individual (the counting method) and compared it with the multiplicative polygenic risk score (MPRS) and multiple single-nucleotide polymorphism (SNP) population attribution risk (PAR) using odds ratios (ORs) and ORs with risk allele frequency, respectively. Red circles represent the individuals in the same decile according to MPRS and PAR. The resulting decile of the counting method was different from those from MPRS and PAR, although they were significantly correlated (c and g). The results for coronary heart disease (60 risk alleles, a–c) and type 2 diabetes (70 risk alleles, e–g) showed the same trend. Venn diagrams show the agreement between polygenic scoring methods for the individuals in the 10th deciles by three methods (d and h). GWAS, genome-wide association study.

PAR provides more intuitive interpretation of genetic risk by combining AF and effect size. However, the prevalence of some risk alleles varies widely across ethnic groups, as indeed may the risk associated with individual alleles. If the AF in the discovery population deviates from the population mean or if the data are from individuals of different ethnic background than those in the original study, then there may be large effects on the estimated PAR. Thus, at present, the validity of PAR is limited for many traits. The validity of a counting method is also limited due to nonunique scores for the traits with fewer risk alleles (Supplementary Figure S2a,c,e,f online). Therefore, we chose the normalized MPRS for further evaluation.

There were also significant differences in MPRS distributions among the four ethnic groups. We compared the distribution of the MPRS for each phenotype between ethnic groups using one-way analysis of variance followed by post hoc tests. With the reported risk alleles, 168 of 182 traits analyzed showed significant differences between ethnic groups (Bonferroni corrected analysis of variance P < 0.01; Supplementary Table S3 online), reinforcing the widely held notion that an individual’s polygenic scores can be rigorously interpreted only in the context of the matched ethnic background.

Performance of polygenic scores with a case–control data set

To check the distribution of the MPRS in cases as compared with that of controls, we used the WTCCC phase I data set.24 We calculated an MPRS for each individual for seven diseases and two control groups, excluding the risk alleles originally reported for the WTCCC data set (Table 1). The five hypertension risk alleles in the GWAS catalog were not sufficient to rank all cases and controls because of tied scores; otherwise, the distributions of the MPRS for six diseases showed significant differences between cases and controls (Tukey’s honestly significant difference (HSD), all P < 0.001 for cases as compared with controls). For all phenotypes, there was no significant difference of MPRS distributions between the 1958 British Birth Cohort and the UK Blood Services cohort (Figure 2). Validating a single risk allele with an independently collected data set often produces inconsistent results26; however, our polygenic score approach successfully identified the overrepresentation of independently discovered risk alleles in cases.

Table 1 Predictive value of high-risk group defined by the 10th decile of the polygenic score
Figure 2

Distribution of polygenic scores in a case–control data set. The Wellcome Trust Case Control Consortium (WTCCC) phase I data set (N = 16,179 individuals) consisted of two control groups—the 1958 British Birth Cohort (58BC) and common controls recruited from the UK Blood Services (NBS)—and six disease groups: Crohn disease (CD), bipolar disorder (BD), coronary heart disease (CHD), type 1 diabetes (T1D), type 2 diabetes (T2D), and rheumatoid arthritis (RA). We compared the multiplicative polygenic risk score (MPRS) distributions between cases and controls, except for the hypertension group because of the small number of risk alleles (see Table 1). For all phenotypes, no significant difference was found between 58BC and NBS, and the mean MPRS of case groups was significantly higher as compared with the two control groups (Tukey’s honestly significant difference P values < 0.001 for all case versus control groups).

Polygenic scores for each phenotype were sorted into 10 bins in the control group, and the score decile of each case was then determined using the score ranges of the 1st to 10th deciles in controls. Each bin had ~294 control individuals and different numbers of cases according to the MPRS. As expected, we observed a significant overrepresentation of cases as compared with controls in upper deciles (Supplementary Figure S3 online). For the patients with CD, 27.3% were in the 10th decile as compared with 2.35% in the 1st decile, which resulted in the relative risk of 1.91 in this data set. However, the positive predictive value for those individuals in the 10th decile was 0.044% using the upper-bound CD prevalence27 of 16/100,000. Positive predictive value increased with the prevalence of the trait, as summarized in Table 1, and was as high as 12.4% for T2D. Given the relatively low narrow-sense heritability of 0.05–0.10 for T2D,28 the clinical validity of analyzing common risk alleles for unsegmented common diseases is likely to be limited.29 We also measured the performance of polygenic scores using the area under the receiver operating characteristic curve (AUC). Except for CD (AUC 0.704), overall performance of polygenic scores for diseases was poor (AUCs 0.592 (bipolar disorder), 0.622 (coronary artery disease), 0.595 (T2D), 0.604 (T1D), and 0.614 (rheumatoid arthritis); Supplementary Figure S4 online).
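
The dependence of positive predictive value on prevalence follows from Bayes' rule, as the sketch below illustrates. The calculation used in the study is not spelled out in this section, so the following is an assumption: the 10th decile is treated as a test whose sensitivity is the fraction of cases in that decile (27.3% for CD) and whose false-positive rate is 10%, because deciles were defined in controls. The function name is hypothetical.

```python
def positive_predictive_value(sensitivity, false_positive_rate, prevalence):
    """PPV from Bayes' rule: P(disease | score in the high-risk group)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# CD: 27.3% of cases fall in the 10th decile; by construction 10% of controls do.
ppv_cd = positive_predictive_value(
    sensitivity=0.273,
    false_positive_rate=0.10,
    prevalence=16 / 100_000,
)
print(f"{ppv_cd:.3%}")  # prints 0.044%
```

Under these assumptions the result agrees with the 0.044% reported for CD, and raising the prevalence argument while holding the other inputs fixed shows why the predictive value is higher for more common traits.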

Stability of summary method with fewer risk alleles

In light of potential inaccuracy in genotyping, we checked the stability of the MPRS rank of an individual in a population by comparing the original decile using all reported risk alleles with the deciles recalculated using smaller numbers of randomly selected risk alleles. A total of 111 risk alleles were reported for T2D (Table 1), and we randomly selected n risk alleles to recalculate the MPRS and the relevant decile. For the individuals in the 10th decile with all 111 alleles, we traced the change of decile ranks with random exclusion of n risk alleles from 1 to 56 (Supplementary Figure S5 online). This procedure was repeated 100 times for each n, and the mean decile was plotted. Excluding 20% of risk alleles (blue dotted line in Supplementary Figure S5 online) did not result in a change of classification by more than two deciles on average; however, 25% of instances were equal to or less than the 9th decile (Table 2). With 50% of risk alleles, only 56.8% were in the 10th decile. For other phenotypes with small numbers of risk alleles, excluding a single risk allele could change scores from the highest decile to lower deciles or vice versa.
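
The resampling procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's code: genotypes are coded as risk-allele counts per locus, the decile is recomputed against the same population after each random exclusion, and the function names, the decile convention, and the fixed random seed are all hypothetical choices.

```python
import math
import random


def mprs(risk_counts, odds_ratios, loci):
    """Multiplicative PRS restricted to the loci whose indices are in `loci`."""
    return math.prod(odds_ratios[i] ** risk_counts[i] for i in loci)


def decile_rank(score, reference_scores):
    """Decile (1-10) of `score` within the reference distribution (assumed convention)."""
    below = sum(s < score for s in reference_scores)
    return min(10, below * 10 // len(reference_scores) + 1)


def mean_decile_after_exclusion(person, population, odds_ratios,
                                n_exclude, repeats=100, seed=0):
    """Mean decile of `person` after randomly dropping `n_exclude` risk alleles.

    person:      list of risk-allele counts (0, 1, or 2) for one individual
    population:  list of such lists, used as the reference distribution
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_loci = len(odds_ratios)
    total = 0
    for _ in range(repeats):
        kept = sorted(rng.sample(range(n_loci), n_loci - n_exclude))
        reference = [mprs(p, odds_ratios, kept) for p in population]
        total += decile_rank(mprs(person, odds_ratios, kept), reference)
    return total / repeats
```

Calling the function for increasing values of `n_exclude` and plotting the returned means would give a curve of the kind described above for a single individual.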

Table 2 Stability of polygenic risk scores with fewer risk alleles

Summarizing cardiac risk alleles in the clinical context

To summarize polygenic relative risks from known risk alleles for general clinicians and patients, we prepared a report on cardiovascular disease risk from common genetic variation as a part of a Cardiac Supplement to our Genome Report in the MedSeq Project.4 The reports include the disease prevalence and narrow-sense heritability in conjunction with an estimated MPRS for a limited number of common cardiac traits of relevance for decision support in primary prevention and in specialist care of inherited heart disease. For eight traits (abdominal aortic aneurysm, atrial fibrillation, CHD, T2D, hypertension, obesity/metabolic syndrome, platelet aggregation, and QT prolongation) implicated in cardiac diseases with qualitative outcome measures, the effect sizes of risk alleles selected for these cardiac phenotypes were small to moderate (average OR 1.23, range: 1.06–3.57) (Table 3). We normalized MPRS to the 1KGP data set, including four ethnic groups, to calculate relative risks as compared with estimated population norms. Across the four ethnic groups, the number of risk alleles per individual was significantly different (one-way analysis of variance P < 0.0001). The East Asian individuals had more risk alleles (mean ± SD 105.5 ± 4.82) as compared with the other ethnic groups (Tukey’s HSD P < 0.0001 for all three comparisons). The average number of risk alleles in Admixed American individuals (102.5 ± 5.05) was not significantly different from those of EUR (102.0 ± 4.79) and African (103.6 ± 4.37) (Tukey’s HSD P = 0.0853 and 0.745, respectively) individuals, but the difference between African and EUR individuals was significant (Tukey’s HSD P = 0.0005). The differences were partly attributable to biases in discovery cohorts (Supplementary Table S4 online). More than two-thirds of risk alleles (70.8%) were reported from studies with EUR populations. East Asian (20.5%) and African (6.8%) populations were underrepresented in previous studies. For instance, seven risk alleles associated with obesity were discovered from two independent studies of EUR populations. Of these, five risk alleles (rs10508503, rs2116830, rs988712, rs1805081, and rs1421085) are rare (AF ≤ 0.05) in the African group, and two risk alleles (rs10508503 and rs2116830) are not present in any East Asian individuals in the 1KGP. The average MPRS in EUR individuals was higher as compared with those of the other ethnic groups (one-way analysis of variance with Dunnett’s post hoc tests with EUR as control, P < 0.001). Thus, an individual in the interquartile range of MPRS in the EUR population might be placed in the 9th and 10th deciles in the other ethnic groups.

Table 3 A summary of risk alleles for the cardiac supplement in the MedSeq Project

Table 3 demonstrates our current format for reporting the MPRS and the other contextual information outlined above. Age-specific prevalence is also reported, with the proportion of variation in phenotype liability explained by common genetic variants based on the extant literature. The number of risk loci and total risk alleles identified, the normalized MPRS truncated at the 10th and 90th percentiles for the outlier values, and the percentile rank are reported. The clinical application of this result summary (albeit in the absence of objective clinical utility) will be investigated in the MedSeq Project and other longitudinal studies. As such, it will be important to emphasize the changing context and evolving limitations of genetic risk assessment attributable to common variants. For instance, the estimated heritability of T2D from family studies ranges from 0.3 to 0.6, as compared with the more modest proportion of variation in phenotype liability explained by common genetic variants (0.05–0.1). Although much more rigorous data will be required for the demonstration of formal clinical utility, the combination of a detailed family history with even current risk predictions for common diseases attributable to common genetic variants may be informative for clinicians and patients to promote specific health behaviors.

Discussion

Predicting the genetic liability for a particular disease based on the reported risk alleles is currently not useful in medical practice. Indeed, even alleles with large effect sizes are of little utility for predicting clinically meaningful outcomes. In most common disorders, the contribution of acquired or environmental risk factors is considered to be of much greater importance than the inherited contribution. These limitations of genetic prediction are also a function of the context in which the extant genetic data have been collected; for common phenotypes, the context is usually case–control studies that are not designed or powered to derive the trait’s genetic architecture. For most diseases, rigorous heritability estimates are scant, genetic studies have used low-resolution phenotypes, and outcomes data are incomplete. For all but a few genotypes there are no robust data regarding clinical utility. If genome sequencing and common genetic variation are to play a substantial role in precision medicine (it is expected that they will), then there will have to be considerable investment in rigorous large-scale studies in clinical cohorts for which validity, clinical utility, and cost-effectiveness can be demonstrated.12,30,31

One of the prerequisites for the studies that will be necessary to establish the role of WGS in the clinic is standardized reporting strategies for genome-scale data. These will be required not only to communicate the primary genetic results but also to inform the clinician of additional nongenomic data and to supply the nuanced context necessary for secondary interpretation. In the current study, we have proposed summarizing polygenic risks using the ranks in a population instead of providing absolute disease risk estimates attributable to known risk alleles.17 Clinicians and patients can review the genetic information in the context of the medical and family histories, lifestyle, and laboratory test results. These are all important elements that can condition interpretation of any genotype and frame the doctor–patient relationship for a range of health-promoting behaviors. Thus, an individual with the highest polygenic disease risk may have a modest overall risk once nongenetic factors are considered. Importantly, the reproducibility and stability of risk prediction in such a complex context are likely to limit the clinical utility of genetics.32 Kalf and colleagues compared the three polygenic relative risk prediction methods of current direct-to-consumer genotyping companies33 and found significant discordance. For six multifactorial diseases, the personal genome tests marketed by the three companies had limited predictive ability (atrial fibrillation, T2D, and prostate cancer), a considerable probability (20–27%) of predicting effects in the “opposite” direction (age-related macular degeneration and CD), or substantial differences in absolute risks at the individual level (celiac disease).

There are some significant limitations to our approach. First, we restricted our model to narrow-sense heritability, aggregating the additive contributions of each risk allele to the phenotype and ignoring potential dependencies between risk alleles for the same phenotype. As a consequence, estimating genetic risks from multiple risk alleles may overestimate the total heritability or genetic risk. Second, we chose the 1KGP cohort to calculate the background distribution of MPRS, but this cohort contains only a few hundred individuals of each major ethnic group, so the samples were not large enough to accurately match genetic background or to estimate population norms. Third, the original GWAS discovery and replication cohorts undoubtedly have biases in population structure and cryptic relatedness34 because the observed levels of MPRS in the 1KGP population were considerably smaller than the expected $3^n$ levels with $n$ risk alleles in our analysis. Indeed, even small numbers of genotyping errors result in significant changes in polygenic risk, as shown in our simulation analysis. Fourth, we also found significant errors throughout the current GWAS catalog. For instance, in some cases the risk AF was replaced by the OR, or minor alleles were reported as major, with downstream errors in the direction and magnitude of effect. Much more stringent data sets will be necessary for clinical interpretation and decision support. Finally, we did not undertake analysis for detection of copy number or other structural variations in the current study, given the limits of current analytic tools, and the phenotypic associations of such variants are not well established, except for specific oncogenic driver mutations.35 As analytic techniques improve and associations are defined, WGS data sets can be reanalyzed for such structural variants.

Family history remains the most commonly used genetic information in clinical practice. Because collecting family history is an important part of the standard medical assessment and can contribute independent genetic information beyond any measured risk alleles, future prospective studies should seek to combine family history and allelic risk predictions. Some such population-scale data sets have accumulated in direct-to-consumer companies over several years and would provide an invaluable resource to the biomedical research community if shared with appropriate privacy protection. The successful implementation of genomic medicine will require the systematic collection of phenotypic data and environmental risk factors, drug responses, and quantitative outcomes. The deconvolution even of the limited genotypic data interpretable at present will require vast data sets that can be mustered only by collaborative projects on a global scale. The unstated inference is that for genomic medicine to be rigorously evaluated, it must first be incorporated into general clinical practice, overturning the “evidence-first” strategy of modern medicine.

Disclosure

The authors declare no conflict of interest.