Abstract
Purpose:
Disease-causing mutations and pharmacogenomic variants are of primary interest for clinical whole-genome sequencing. However, estimating genetic liability for common complex diseases using established risk alleles might one day prove clinically useful.
Methods:
We compared polygenic scoring methods using a case–control data set with independently discovered risk alleles in the MedSeq Project. For eight traits of clinical relevance in both the primary-care and cardiomyopathy study cohorts, we estimated multiplicative polygenic risk scores using 161 published risk alleles and then normalized them using the population median estimated from the 1000 Genomes Project.
Results:
Our polygenic score approach identified the overrepresentation of independently discovered risk alleles in cases as compared with controls using a large-scale genome-wide association study data set. In addition to normalized multiplicative polygenic risk scores and rank in a population, the disease prevalence and proportion of heritability explained by known common risk variants provide important context in the interpretation of modern multilocus disease risk models.
Conclusion:
Our approach in the MedSeq Project demonstrates how complex trait risk variants from an individual genome can be summarized and reported for the general clinician and also highlights the need for definitive clinical studies to obtain reference data for such estimates and to establish clinical utility.
Genet Med 17 7, 536–544.
Similar content being viewed by others
Introduction
As the cost of sequencing decreases, the clinical utility of whole-genome sequencing (WGS) is currently undergoing intensive investigation as a tool for precise diagnosis, risk prediction, and therapeutic guidance1; WGS is also undergoing evaluation from ethical and legal perspectives.2,3 The MedSeq Project is a randomized clinical trial studying the integration of WGS into clinical care in two specific contexts4: patients from a specialty clinic with a focus on Mendelian forms of inherited cardiomyopathy and patients from a primary-care practice. In each of these clinical settings, pathogenic variants in known Mendelian disease genes, loss-of-function variants in disease-associated genes across the genome, and other actionable variations, including alleles of pharmacogenetic importance, are the major focus of the whole-genome report. However, one of the advantages of WGS over whole-exome sequencing is that the former provides genomic variants in intronic and other noncoding regions, where the majority of alleles associated with common diseases reside.5,6 As a result, WGS also has the potential, when interpreted in the context of rigorous population data, to enable the efficient estimation of genetic liability for common complex diseases as well as the discovery of possible modifier effects on rare alleles of larger effect size.
One of the most interesting and relevant questions for WGS reporting is in regard to how to define and present data on common alleles associated with increased or decreased risk for certain diseases,7 particularly those with potential therapeutic implications.8 Several approaches might be used to estimate the composite risk for a given trait and to allow its communication to general clinicians. Risk alleles are typically discovered using a case–control design in which the frequency of each allele in cases is compared with that in controls. An allele observed at a higher frequency in cases is considered to be a risk allele and represents a marker for all adjacent variants in linkage disequilibrium.9 Conversely, an allele with a lower frequency in cases is sometimes reported as a “protective” allele. However, the very same allele in different populations may represent distinct risk haplotypes, whereas case and control definitions are also necessarily imperfect. Thus, without longitudinal cohort studies, it may be difficult to establish the clinical validity and clinical utility of common alleles. In this context, in the current study, we focused on the single-nucleotide polymorphisms (SNPs) more frequently found in cases from genome-wide association studies (GWASs) as listed in the National Human Genome Research Institute GWAS catalog.10
An intuitive approach to combine information from several genetic tests is to multiply likelihood ratios with pretest odds of population-specific lifetime disease risk estimates.11,12 However, for the majority of risk alleles, objective likelihood ratios are not available. Polygenic risk scores (PRSs) have been proposed by several investigators7,13,14,15,16,17 to combine multiple risk alleles, including those that fail to attain genome-wide significance in association studies, on the basis that there may be genetic epistasis, interaction with environmental factors, or aggregate effects that can be captured.18 To this end, a multiplicative model including seven risk alleles for breast cancer was proposed for risk stratification.17 Aggregating the information from a larger number of subthreshold risk alleles has also been used, testing the classic models of polygenic inheritance.13,16 These studies highlighted the possibility of using polygenic scores in the context of conditioning nongenetic clinical information, although the performances of such PRSs were inconsistent across different diseases.19,20
Although the prediction of disease risk based solely on genotype is not currently standard of care in medical practice, it may soon be useful for patients and clinicians to know whether a patient presents a high-risk genomic profile for a specific trait or disease as compared with the population norm.21,22 This may be the case even when there are no robust independent data regarding the clinical utility of genetic predictors, given the known role of multiple subjective variables in situations of clinical equipoise. Here, we summarize multiple risk alleles by calculating a normalized PRS using a population-scale WGS data set from the 1000 Genomes Project (1KGP).23 Our approach demonstrates how complex trait risk variants from individual genomes can be efficiently summarized and reported in a clinical context, highlighting the uncertainties of interpretation while facilitating the use of the available information in clinical decision making.
Materials and Methods
Risk alleles
The National Human Genome Research Institute GWAS catalog (http://www.genome.gov/admin/gwascatalog.txt) was downloaded on 12 March 2013.10 The catalog contained a total of 9,785 records corresponding to 8,384 risk alleles. We used a series of filtering steps to retain only informative SNPs for the PRS estimates as detailed in Supplementary Figure S1 online. The excluded SNPs for each filtering step can be found at the second to the rightmost column—“Filtering Status”—of Supplementary Table S1 online. For the risk alleles with odds ratios (ORs) <1, we followed the GWAS catalog’s inversion of ORs using the alternative alleles as risk alleles. A total of 1,565 risk alleles for 182 traits met our filtering criteria (Supplementary Table S1 online).
To test our approach to the reporting of common allele variations in the MedSeq Project, we selected eight binary phenotypes—abdominal aortic aneurysm, atrial fibrillation, coronary heart disease (CHD), type 2 diabetes (T2D), hypertension, obesity/metabolic syndrome, platelet aggregation, and QT prolongation—that are factors frequently weighed in decision making in both primary-care and cardiology subspecialty settings. Quantitative phenotypes were not included because of the inconsistency in phenotype measures and descriptions between studies. A total of 161 risk alleles were then incorporated into PRS estimates for the eight selected phenotypes.
Calculating polygenic risk scores
Several approaches to polygenic risk scoring exist, the majority summing all risk alleles present in an individual genome and assigning allele-specific weighting. The simplest method is to treat all risk alleles equally, that is, an allele counting method in which the weight equals 1.20 Alternatively, observed effect sizes can be used to weight each risk allele differently.13,16 We calculated a multiplicative PRS (MPRS) as detailed in the Supplementary Materials and Methods online. Briefly, the MPRS for each phenotype was calculated as the product of ORs. Thus, log (MPRS) is equivalent to the OR-weighted sum of risk allele counts.20 The population attribution risk (PAR) method integrates population allele frequency (AF) and OR.15 A single SNP PAR was estimated as AFi (ORi − 1)/(AFi × (ORi − 1) + 1), in which AFi is the prevalence of the risk allele at the ith locus in the control population, and ORi is the OR of the risk allele at the ith locus. The multi-SNP PAR was calculated on the basis of the single SNP PAR for each associated SNP: 1 − Π(1 − PARi), in which PARi is the single SNP PAR for the ith locus. The raw scores from counting, and the MPRS and PAR methods were normalized using the median score of the European (EUR) genotypes (N = 392) in the 1KGP, and the ranks of the individual’s score are reported as deciles.
Testing the performance of the MPRS with a GWAS data set
To compare the distribution of polygenic scores between cases and controls, we used the Wellcome Trust Case Control Consortium (WTCCC) phase I data set, which genotyped 16,179 individuals with the Affymetrix GeneChip Human Mapping 500K arrays.24 The details of the WTCCC data set are described in the Supplementary Materials and Methods online. We selected the subset of risk alleles represented on the Affymetrix 500K arrays to calculate the MPRS and performed the analysis after excluding those risk alleles that were originally reported with the WTCCC data set.24 Genotype imputation was not performed because the estimated 5–6% imputation error rate25 might result in significant changes in the MPRS decile (see Results). The MPRS percentile for each individual was calculated for each trait against 2,938 controls. As noted, for SNPs in linkage disequilibrium (r2 > 0.5), we chose the allele with the largest effect size. The SNPs in the major histocompatibility complex region of chromosome 6—rs6458307, rs9469220, rs615672, rs6457617, rs9272346, and rs9465871—were excluded when calculating the MPRS for Crohn disease (CD), type 1 diabetes (T1D), and rheumatoid arthritis.
Results
Correlation between different polygenic scoring methods
The numbers of reported risk alleles per trait skewed to the right because a small number of traits were associated with a majority of risk alleles. Risk alleles for multiple sclerosis (n = 105), CD (n = 95), T2D (n = 77), ulcerative colitis (n = 64), and CHD (n = 62) constituted 25.7% of 1,565 alleles. Forty-three traits were associated with a single reported risk allele. The median OR was 1.25 (interquartile range: 1.15–1.45), and 461 risk alleles exhibited ORs of more than 1.45. The majority of risk alleles were found in non–protein coding regions (91.0% of 1,565): 55.7% (872/1,565) lie within intergenic regions whereas 553 (35.3%) are intronic. A total of 103 (6.6%) risk alleles were found in coding regions, and 14 and 23 were mapped to the 5′-UTR and 3′-UTR regions, respectively. The AFs ranged from 0.011 to 0.983, with an average of 0.422. Risk AFs were not listed for 265 loci in the original discovery studies.
We compared the three methods for combining risk alleles: counting, MPRS, and the multi-SNP PAR outlined in the Methods section. For each individual in the 1KGP EUR population (N = 379), we calculated polygenic scores for eight cardiac phenotypes: abdominal aortic aneurysm, atrial fibrillation, CHD, T2D, hypertension, obesity/metabolic syndrome, platelet aggregation, and QT prolongation. The scores from three methods showed significant positive correlations for all eight traits (Kendall’s tau, P < 2.2 × 10−16; Supplementary Table S2 online); however, the counting method when used with small numbers of risk alleles yielded nonunique scores in 379 EUR individuals (Supplementary Figure S2 online).
To check whether the subgroups at highest genetic risk—i.e., those within the 10th decile—could be consistently defined by different summary methods, we selected two common complex traits—CHD and T2D, which had 62 and 77 risk alleles, respectively, that met our filtering criteria. The percentile rank of each individual was calculated using all three methods, and decile ranks were compared between polygenic scoring approaches. The three methods showed significant positive correlations overall ( Figure 1 and Supplementary Table S2 online), with the correlation between MPRS and PAR being the highest (Kendall’s tau = 0.7229 ( Figure 1c ) and 0.6928 ( Figure 1g ) for CHD and T2D, respectively). However, identifying subgroups within the 10th decile varied significantly by the summary method used. The concordance rate for the 10th decile in CHD PRS was 49% between the MPRS and PAR methods ( Figure 1d ). Among 38 individuals in the 10th decile as ascertained by counting CHD risk alleles, 23 and 16 were in the 10th decile as ascertained by the MPRS and PAR methods, respectively ( Figure 1d ). Similarly, 25 individuals were in the 10th decile by as ascertained counting and PAR for T2D, and 22 were in the 10th decile as ascertained by MPRS and PAR ( Figure 1h ).
PAR provides more intuitive interpretation of genetic risk by combining AF and effect size. However, the prevalence of some risk alleles varies widely across ethnic groups, as indeed may the risk associated with individual alleles. If the AF in the discovery population deviates from the population mean or if the data are from individuals of different ethnic background than those in the original study, then there may be large effects on the estimated PAR. Thus, at present, the validity of PAR is limited for many traits. The validity of a counting method is also limited due to nonunique scores for the traits with fewer risk alleles (Supplementary Figure S2a,c,e,f online). Therefore, we chose the normalized MPRS for further evaluation.
There were also significant differences in MPRS distributions among the four ethnic groups. We compared the distribution of the MPRS for each phenotype between ethnic groups using one-way analysis of variance followed by post hoc tests. With the reported risk alleles, 168 of 182 traits analyzed showed significant differences between ethnic groups (Bonferroni corrected analysis of variance P < 0.01; Supplementary Table S3 online), reinforcing the widely held notion that an individual’s polygenic scores can be rigorously interpreted only in the context of the matched ethnic background.
Performance of polygenic scores with a case–control data set
To check the distribution of the MPRS in cases as compared with that of controls, we used the WTCCC phase I data set.24 We calculated an MPRS for each individual for seven diseases and two control groups, excluding the risk alleles originally reported for the WTCCC data set ( Table 1 ). The five hypertension risk alleles in the GWAS catalog were not sufficient to rank all cases and controls because of tied scores; otherwise, the distributions of the MPRS for six diseases showed significant differences between cases and controls (Tukey’s honestly significant difference (HSD), all P < 0.001 for cases as compared with controls). For all phenotypes, there was no significant difference of MPRS distributions between the 1958 British Birth Cohort and the UK Blood Services cohort ( Figure 2 ). Validating a single risk allele with an independently collected data set often produces inconsistent results26; however, our polygenic score approach successfully identified the overrepresentation of independently discovered risk alleles in cases.
Polygenic scores for each phenotype were sorted into 10 bins in the control group, and the score decile of each case was then determined using the score range of 1st to 10th decile in controls. Each bin had ~294 control individuals and different numbers of cases according to the MPRS. As expected, we observed a significant overrepresentation of cases as compared with controls in upper deciles (Supplementary Figure S3 online). For the patients with CD, 27.3% were in the 10th decile as compared with 2.35% in the 1st decile, which resulted in the relative risk of 1.91 in this data set. However, the positive predictive value for those individuals in the 10th decile was 0.044% using the upper-bound CD prevalence of 16/100,000.27 Positive predictive value increased with the prevalence of the trait, as summarized in Table 1 , and was as high as 12.4% for T2D. Given the relatively low narrow-sense heritability of 0.05–0.10 for T2D,28 the clinical validity of analyzing common risk alleles for unsegmented common diseases is likely to be limited.29 We also measured the performance of polygenic scores using the area under the receiver operating characteristics curve (AUC). Except for CD (AUC 0.704), overall performance of polygenic scores for diseases was poor (AUCs 0.592 (bipolar disorder), 0.622 (coronary artery disease), 0.595 (T2D), 0.604 (T1D), and 0.614 (rheumatoid arthritis); Supplementary Figure S4 online).
Stability of summary method with fewer risk alleles
In light of potential inaccuracy in genotyping, we checked the stability of the MPRS rank of an individual in a population by comparing the original decile using all reported risk alleles with the deciles recalculated using smaller numbers of randomly selected risk alleles. A total of 111 risk alleles were reported for T2D ( Table 1 ), and we randomly selected n risk alleles to recalculate the MPRS and the relevant decile. For the individuals in the 10th decile with all 111 alleles, we traced the change of decile ranks with random exclusion of n risk alleles from 1 to 56 (Supplementary Figure S5 online). This procedure was repeated 100 times for each n, and the mean decile was plotted. Excluding 20% of risk alleles (blue dotted line in Supplementary Figure S5 online) did not result in a change of classification by more than two deciles on average; however, 25% of instances were equal to or less than the 9th decile ( Table 2 ). With 50% of risk alleles, only 56.8% were in the 10th decile. For other phenotypes with small numbers of risk alleles, excluding a single risk allele could change scores from the highest decile to lower deciles or vice versa.
Summarizing cardiac risk alleles in the clinical context
To summarize polygenic relative risks from known risk alleles for general clinicians and patients, we prepared a report on cardiovascular disease risk from common genetic variation as a part of a Cardiac Supplement to our Genome Report in the MedSeq Project.4 The reports include the disease prevalence and narrow-sense heritability in conjunction with an estimated MPRS for a limited number of common cardiac traits of relevance for decision support in primary prevention and in specialist care of inherited heart disease. For eight traits (abdominal aortic aneurysm, atrial fibrillation, CHD, T2D, hypertension, obesity/metabolic syndrome, platelet aggregation, and QT prolongation) implicated in cardiac diseases with qualitative outcome measures, the effect sizes of risk alleles selected for these cardiac phenotypes were small to moderate (average OR 1.23, range: 1.06–3.57) ( Table 3 ). We normalized MPRS to the 1KGP data set, including four ethnic groups, to calculate relative risks as compared with estimated population norms. Across the four ethnic groups, the number of risk alleles per individual was significantly different (one-way analysis of variance P < 0.0001). The East Asian individuals had more risk alleles (mean ± SD 105.5 ± 4.82) as compared with the other ethnic groups (Tukey’s HSD P < 0.0001 for all three comparisons). The average number of risk alleles in Admixed American individuals (102.5 ± 5.05) was not significantly different from those of EUR (102.0 ± 4.79) and African (103.6 ± 4.37) (Tukey’s HSD P = 0.0853 and 0.745, respectively) individuals, but the difference between African and EUR individuals was significant (Tukey’s HSD P = 0.0005). The differences were partly attributable to biases in discovery cohorts (Supplementary Table S4 online). More than two-thirds of risk alleles (70.8%) were reported from studies with EUR populations. East Asian (20.5%) and African (6.8%) populations were underrepresented in previous studies. For instance, seven risk alleles associated with obesity were discovered from two independent studies of EUR populations. Of these, five risk alleles—rs10508503, rs2116830, rs988712, rs1805081, and rs1421085—are rare (AF ≤ 0.05) in the African group, and two risk alleles—rs10508503 and rs2116830—are not present in any East Asian individuals in the 1KGP. The average MPRS in EUR individuals was higher as compared with those of the other ethnic groups (one-way analysis of variance with Dunnett’s post hoc tests with EUR as control, P < 0.001). Thus, an individual in the interquartile range of MPRS in the EUR population might be placed in the 9th and 10th deciles in the other ethnic groups.
Table 3 demonstrates our current format for reporting the MPRS and the other contextual information outlined above. Age-specific prevalence is also reported, with the proportion of variation in phenotype liability explained by common genetic variants based on the extant literature. The number of risk loci and total risk alleles identified, normalized MPRS truncated at 10 and 90 percentiles for the outlier values, and percentile rank are reported. The clinical application of this result summary (albeit in the absence of objective clinical utility) will be investigated in the MedSeq Project and other longitudinal studies. As such, it will be important to emphasize the changing context and evolving limitations of genetic risk assessment attributable to common variants. For instance, the estimated heritability of T2D from family studies ranges from 0.3 to 0.6, as compared with the more modest proportion of variation in phenotype liability explained by common genetic variants (0.05–0.1). Although much more rigorous data will be required for the demonstration of formal clinical utility, the combination of a detailed family history, with even current risk predictions for common diseases attributable to common genetic variants, may be informative for clinicians and patients to promote specific health behaviors.
Discussion
Predicting the genetic liability for a particular disease based on the reported risk alleles is currently not useful in medical practice. Indeed, even alleles with large effect sizes are of little utility for predicting clinically meaningful outcomes. In most common disorders, the contribution of acquired or environmental risk factors is considered to be of much greater importance than the inherited contribution. These limitations of genetic prediction are also a function of the context in which the extant genetic data have been collected; for common phenotypes, the context is usually case–control studies that are not designed or powered to derive the trait’s genetic architecture. For most diseases, rigorous heritability estimates are scant, genetic studies have used low-resolution phenotypes, and outcomes data are incomplete. For all but a few genotypes there are no robust data regarding clinical utility. If genome sequencing and common genetic variation are to play a substantial role in precision medicine (it is expected that they will), then there will have to be considerable investment in rigorous large-scale studies in clinical cohorts for which validity, clinical utility, and cost-effectiveness can be demonstrated.12,30,31
One of the prerequisites for the studies that will be necessary to establish the role of WGS in the clinic is standardized reporting strategies for genome-scale data. These will be required not only to communicate the primary genetic results but also to inform the clinician of additional nongenomic data and to supply the nuanced context necessary for secondary interpretation. In the current study, we have proposed summarizing polygenic risks using the ranks in a population instead of providing absolute disease risk estimates attributable to known risk alleles.17 Clinicians and patients can review the genetic information in the context of the medical and family histories, lifestyle, and laboratory test results. These are all important elements that can condition interpretation of any genotype and frame the doctor–patient relationship for a range of health-promoting behaviors. Thus, an individual with the highest polygenic disease risk may have a modest overall risk once nongenetic factors are considered. Importantly, the reproducibility and stability of risk prediction in such a complex context are likely to limit the clinical utility of genetics.32 Kalf and colleagues compared the three polygenic relative risk prediction methods of current direct-to-consumer genotyping companies33 and found significant discordance. For six multifactorial diseases, the personal genome tests marketed by the three companies had limited predictive ability (atrial fibrillation, T2D, and prostate cancer), a considerable probability (20–27%) of predicting effects in the “opposite” direction (age-related macular degeneration and CD), or substantial differences in absolute risks at the individual level (celiac disease).
There are some significant limitations to our approach. First, we restricted our model to narrow-sense heritability, aggregating the additive contributions of each risk allele to the phenotype and ignoring potential dependencies between risk alleles for the same phenotype. As a consequence, estimating genetic risks from multiple risk alleles may overestimate the total heritability or genetic risk. Second, we chose the 1KGP cohort to calculate the background distribution of MPRS, but this cohort contains only a few hundred individuals of each major ethnic group, so the samples were not large enough to accurately match genetic background or to estimate population norms. Third, the original GWAS discovery and replication cohorts undoubtedly have biases in population structure and cryptic relatedness34 because the observed levels of MPRS in the 1KGP population were considerably smaller than the expected 3N levels with n risk alleles in our analysis. Indeed, even small numbers of genotyping errors result in significant changes in polygenic risk, as shown in our simulation analysis. Fourth, we also found significant errors throughout the current GWAS catalog. For instance, in some cases risk of AF was replaced by the OR, or minor alleles were reported as major, with downstream errors in the direction and magnitude of effect. Much more stringent data sets will be necessary for clinical interpretation and decision support. Finally, we did not undertake analysis for detection of copy number or other structural variations in the current study, given the limits of current analytic tools, and the phenotypic associations of such variants are not well established, except for specific oncogenic driver mutations.35 As analytic techniques improve and associations are defined, WGS data sets can be reanalyzed for such structural variants.
Family history remains the most commonly used genetic information in clinical practice. Because collecting family history is an important part of the standard medical assessment and can contribute independent genetic information beyond any measured risk alleles, future prospective studies should seek to combine family history and allelic risk predictions. Some such population-scale data sets have accumulated in direct-to-consumer companies over several years and would provide an invaluable resource to the biomedical research community if shared with appropriate privacy protection. The successful implementation of genomic medicine will require the systematic collection of phenotypic data and environmental risk factors, drug responses, and quantitative outcomes. The deconvolution even of the limited genotypic data interpretable at present will require vast data sets that can be mustered only by collaborative projects on a global scale. The unstated inference is that for genomic medicine to be rigorously evaluated, it must first be incorporated into general clinical practice, overturning the “evidence-first” strategy of modern medicine.
Disclosure
The authors declare no conflict of interest.
References
Biesecker LG, Green RC . Diagnostic clinical genome and exome sequencing. N Engl J Med 2014;370:2418–2425.
McGuire AL, McCullough LB, Evans JP . The indispensable role of professional judgment in genomic medicine. JAMA 2013;309:1465–1466.
Pyeritz RE . The coming explosion in genetic testing–is there a duty to recontact? N Engl J Med 2011;365:1367–1369.
Vassy JL, Lautenbach DM, McLaughlin HM, et al.; MedSeq Project. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine. Trials 2014;15:85.
Dewey FE, Grove ME, Pan C, et al. Clinical interpretation and implications of whole-genome sequencing. JAMA 2014;311:1035–1045.
Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 2009;106:9362–9367.
Lyssenko V, Jonsson A, Almgren P, et al. Clinical risk factors, DNA variants, and the development of type 2 diabetes. N Engl J Med 2008;359:2220–2232.
Manolio TA . Bringing genome-wide association findings into clinical use. Nat Rev Genet 2013;14:549–558.
Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K . A comprehensive review of genetic association studies. Genet Med 2002;4:45–61.
Welter D, MacArthur J, Morales J, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014;42(Database issue):D1001–D1006.
Yang Q, Khoury MJ, Botto L, Friedman JM, Flanders WD . Improving the prediction of complex diseases by testing for multiple disease-susceptibility genes. Am J Hum Genet 2003;72:636–649.
Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome. Lancet 2010;375:1525–1535.
Stahl EA, Wegmann D, Trynka G, et al.; Diabetes Genetics Replication and Meta-analysis Consortium; Myocardial Infarction Genetics Consortium. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet 2012;44:483–489.
Wray NR, Yang J, Goddard ME, Visscher PM . The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet 2010;6:e1000864.
Kraft P, Wacholder S, Cornelis MC, et al. Beyond odds ratios–communicating disease risk based on genetic profiles. Nat Rev Genet 2009;10:264–269.
Purcell SM, Wray NR, Stone JL, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 2009;460:748–752.
Pharoah PD, Antoniou AC, Easton DF, Ponder BA . Polygenes, risk prediction, and targeted prevention of breast cancer. N Engl J Med 2008;358:2796–2803.
Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747–753.
Machiela MJ, Chen CY, Chen C, Chanock SJ, Hunter DJ, Kraft P . Evaluation of polygenic risk scores for predicting breast and prostate cancer risk. Genet Epidemiol 2011;35:506–514.
Evans DM, Visscher PM, Wray NR . Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum Mol Genet 2009;18:3525–3531.
Pashayan N, Pharoah P . Translating genomics into improved population screening: hype or hope? Hum Genet 2011;130:19–21.
Kathiresan S, Melander O, Anevski D, et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N Engl J Med 2008;358:1240–1249.
The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56–65.
Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661–678.
Marchini J, Howie B . Genotype imputation for genome-wide association studies. Nat Rev Genet 2010;11:499–511.
Ioannidis JP, Thomas G, Daly MJ . Validating, augmenting and refining genome-wide association signals. Nat Rev Genet 2009;10:318–329.
Lakatos PL . Recent trends in the epidemiology of inflammatory bowel diseases: up or down? World J Gastroenterol 2006;12:6102–6108.
Weedon MN, Clark VJ, Qian Y, et al. A common haplotype of the glucokinase gene alters fasting glucose and birth weight: association in six studies and population-genetics analyses. Am J Hum Genet 2006;79:991–1001.
Hunter DJ, Khoury MJ, Drazen JM . Letting the genome out of the bottle–will we get our wish? N Engl J Med 2008;358:105–107.
Patel CJ, Sivadas A, Tabassum R, et al. Whole genome sequencing in support of wellness and health maintenance. Genome Med 2013;5:58.
Chen R, Mias GI, Li-Pook-Than J, et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 2012;148:1293–1307.
Ng PC, Murray SS, Levy S, Venter JC . An agenda for personalized medicine. Nature 2009;461:724–726.
Kalf RR, Mihaescu R, Kundu S, de Knijff P, Green RC, Janssens AC . Variations in predicted risks in personal genome testing for common complex diseases. Genet Med 2014;16:85–91.
McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008;9:356–369.
Forbes SA, Bindal N, Bamford S, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res 2011;39(Database issue):D945–D950.
Acknowledgements
The MedSeq Project is supported by the National Institutes of Health (NIH) National Human Genome Research Institute (U01-HG006500).
Members of the MedSeq Project are as follows: David W. Bates, MD, Alexis D. Carere, MA, MS, Allison Cirino, MS, Lauren Connor, Kurt D. Christensen, MPH, PhD, Jake Duggan, Robert C. Green, MD, MPH, Carolyn Y. Ho, MD, Joel B. Krier, MD, William J. Lane, MD, PhD, Denise M. Lautenbach, MS, Lisa Lehmann, MD, PhD, MSc, Christina Liu, Calum A. MacRae, MD, PhD, Rachel Miller, MA, Cynthia C. Morton, PhD, Christine E. Seidman, MD, Shamil Sunyaev, PhD, and Jason L. Vassy, MD, MPH, SM (Brigham and Women’s Hospital and Harvard Medical School); Sandy Aronson, ALM, MA, Ozge Ceyhan-Birsoy, PhD, Siva Gowrisankar, PhD, Matthew S. Lebo, PhD, Ignat Leschiner, PhD, Kalotina Machini, PhD, MS, Heather M. McLaughlin, PhD, Danielle R. Metterville, MS, and Heidi L. Rehm, PhD (Partners Personalized Medicine); Jennifer Blumenthal-Barby, PhD, Lindsay Zausmer Feuerman, MPH, Amy L. McGuire, JD, PhD, Sarita Panchang, Jill Oliver Robinson, MA, and Melody J. Slashinski, MPH, PhD (Baylor College of Medicine, Center for Medical Ethics and Health Policy); Stewart C. Alexander, PhD, Kelly Davis, and Peter A. Ubel, MD (Duke University); Peter Kraft, PhD (Harvard School of Public Health); J. Scott Roberts, PhD (University of Michigan); Judy E. Garber, MD, MPH (Dana-Farber Cancer Institute); Tina Hambuch, PhD (Illumina, Inc.); Michael F. Murray, MD (Geisinger Health System); and Isaac S. Kohane, MD, PhD, Sek Won Kong, MD, and In-Hee Lee, PhD (Boston Children’s Hospital).
Additional support for this project was from NIH grants HG005092, HD077671, HG006615, and HG006834.
Author information
Authors and Affiliations
Consortia
Corresponding author
Supplementary information
Supplementary Information
(ZIP 5880 kb)
Rights and permissions
About this article
Cite this article
Kong, S., Lee, IH., Leshchiner, I. et al. Summarizing polygenic risks for complex diseases in a clinical whole-genome report. Genet Med 17, 536–544 (2015). https://doi.org/10.1038/gim.2014.143
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/gim.2014.143
Keywords
This article is cited by
-
Behavioral and psychological impact of genome sequencing: a pilot randomized trial of primary care and cardiology patients
npj Genomic Medicine (2021)
-
Cancer genetics, precision prevention and a call to action
Nature Genetics (2018)
-
Genetic variants demonstrating flip-flop phenomenon and breast cancer risk prediction among women of African ancestry
Breast Cancer Research and Treatment (2018)
-
Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project
European Journal of Human Genetics (2017)
-
Polygenic Scores in Epidemiology: Risk Prediction, Etiology, and Clinical Utility
Current Epidemiology Reports (2015)