Pan-cancer analysis demonstrates that integrating polygenic risk scores with modifiable risk factors improves risk prediction

Cancer risk is determined by a complex interplay of environmental and heritable factors. Polygenic risk scores (PRS) provide a personalized genetic susceptibility profile that may be leveraged for disease prediction. Using data from the UK Biobank (413,753 individuals; 22,755 incident cancer cases), we quantify the added predictive value of integrating cancer-specific PRS with family history and modifiable risk factors for 16 cancers. We show that incorporating PRS measurably improves prediction accuracy for most cancers, but the magnitude of this improvement varies substantially. We also demonstrate that stratifying on levels of PRS identifies significantly divergent 5-year risk trajectories after accounting for family history and modifiable risk factors. At the population level, the top 20% of the PRS distribution accounts for 4.0% to 30.3% of incident cancer cases, exceeding the impact of many lifestyle-related factors. In summary, this study illustrates the potential for improving cancer risk assessment by integrating genetic risk scores.

Cancer susceptibility is inherently complex, but it is well accepted that heritable genetic factors and modifiable exposures contribute to cancer development. While our knowledge of causal modifiable risk factors has gradually evolved over the past decades, genome-wide association studies (GWAS) have rapidly produced a wealth of germline genetic risk variants for different cancers. These studies have shed light on genetic mechanisms of cancer susceptibility; however, the public health impact of GWAS findings has been modest. In response, GWAS results have been leveraged to create polygenic risk scores (PRS) by combining weighted genotypes for risk alleles into a single, integrated measure of an individual's genetic predisposition to a specific phenotypic profile. Such genetic risk scores are not designed to reflect the complexity of molecular susceptibility mechanisms, but they are highly amenable to phenotypic prediction.
Multiple studies have demonstrated that PRS can generate informative predictions for heritable traits 1,2 and diseases 3,4, prompting many to advocate for increased integration of genetic risk scores into clinical practice 5,6. An important step towards realizing the promise of PRS in precision medicine lies in systematically assessing the added value of genetic information in comparison to conventional risk factors and examining how it affects lifetime risk trajectories 6. The recent development of large, prospective cohorts with both genome-wide genotyping and deep phenotyping data, such as the UK Biobank 7, provides an opportunity for integrative analyses of genetic variation and modifiable risk factors. In addition to evaluating PRS predictive performance, these data also provide a unique opportunity to answer etiological questions about the relative contribution of genetic and modifiable risk factors to cancer susceptibility.
In this study we assemble PRS for 16 cancer types, based on previously published GWAS, and apply them to an external population of 413,870 UK Biobank (UKB) participants, with the aim of quantifying the potential for low-penetrance susceptibility variants to improve cancer risk assessment at the population level. First, we evaluate the degree to which PRS can improve risk prediction and stratification based on established cancer risk factors, such as family history and modifiable health-related characteristics. Next, we estimate the proportion of incident cancer cases that can be attributed to high genetic susceptibility, captured by the PRS, and compare this to modifiable determinants of cancer. Taken together, our results show that genetic risk factors represented by the PRS account for a substantial proportion of the overall cancer incidence and that for most cancers, incorporating this genetic information improves risk prediction based on conventional risk factors alone.
Improvement in risk prediction. The predictive performance of each risk model was evaluated based on its ability to accurately estimate risk (calibration) and distinguish cancer cases from cancer-free individuals (discrimination). All cancer-specific risk models were well-calibrated (goodness-of-fit P > 0.05; Supplementary Fig. 1). Model discrimination was assessed by Harrell's C-index, estimated as a weighted mean between 1 and 5 years of follow-up time. For completeness, we also report the area under the curve (AUC) at 5 years of follow-up time 9. Proportionality violations (P < 0.05) were detected for age in the breast cancer model and for the PRS_IV in the cervical cancer model. For breast cancer this was resolved by incorporating an interaction term with follow-up time. As a sensitivity analysis for cervical cancer we modeled a time-varying PRS effect (Supplementary Fig. 2).
The C-index reached 0.60 with age and/or sex, for all cancers except for breast and thyroid (Supplementary Table 5). For cancers with available information on a family history of cancer at the same site (prostate, breast, colon/rectum, and lung), incorporating this had a modest impact on the C-index (ΔC < 0.01). In fact, replacing family history with the PRS resulted in an improvement in discrimination for prostate (C = 0.763, ΔC = 0.047), breast (C = 0.620, ΔC = 0.061), and colorectal (C = 0.708, ΔC = 0.029), but not lung (C = 0.711, ΔC = −0.002) cancers.
Next, we assessed the change in the C-index (ΔC) after incorporating the PRS into prediction models with all available risk factors for each cancer (Fig. 2 and Supplementary Table 5). The resulting improvement in prediction performance was variable. The largest increases in the C-index were observed for cancer sites with few available predictors, such as testes (C_PRS = 0.766, ΔC = 0.138), thyroid (C_PRS = 0.692, ΔC = 0.099), prostate (C_PRS = 0.768, ΔC = 0.051), and lymphocytic leukemia (C_PRS = 0.756, ΔC = 0.061). Incorporating the PRS also improved prediction accuracy for melanoma (C_PRS = 0.664, ΔC = 0.042), breast (C_PRS = 0.635, ΔC = 0.063), and colorectal (C_PRS = 0.716, ΔC = 0.030) cancers, which have multiple environmental risk factors. The highest overall C-index was observed for lung (C_PRS = 0.849) and bladder (C_PRS = 0.814) cancers, which was primarily attributed to non-genetic predictors (C without PRS: lung = 0.846; bladder = 0.808). However, it is worth noting that despite having a large ΔC, the precision of the C-index estimates was low for some rarer cancers, such as testicular (n = 52) and thyroid (n = 191), as well as cancers with genetic risk scores based on relatively few variants. Changes in the AUC at 5 years of follow-up were of similar magnitude (Supplementary Table 5).
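To make the discrimination metric concrete, the following is a minimal sketch of Harrell's C-index and of ΔC between two models. It is an illustration only: the function name `harrell_c` and the toy data are hypothetical, tied event times are simply skipped, and it does not reproduce the weighted mean between 1 and 5 years of follow-up used in the analysis.

```python
from itertools import combinations


def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs in which the subject who
    fails earlier also has the higher predicted risk (ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # subject with the shorter time experienced the event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in years, event indicator,
# and predicted risks from a model without and with the PRS.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
risk_base = [0.3, 0.2, 0.3, 0.1, 0.1]
risk_with_prs = [0.9, 0.2, 0.6, 0.4, 0.1]

c_base = harrell_c(times, events, risk_base)
c_prs = harrell_c(times, events, risk_with_prs)
delta_c = c_prs - c_base
```

In this toy example the PRS breaks the ties in the baseline risks, so every comparable pair becomes concordant and ΔC is positive.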
For 15 out of 16 cancers, incorporating the PRS resulted in significant improvement in reclassification, as indicated by positive percentile-based net reclassification index (NRI) 11 values with 95% bootstrapped confidence intervals excluding 0 (Supplementary Table 6). The overall NRI was primarily driven by the event NRI (NRI_e), which is the increase in the proportion of cancer cases reclassified to a higher risk group. Positive NRI_e values >0.25 were observed for prostate, thyroid, breast, testicular, leukemia, melanoma, and colorectal cancers. The largest reclassification improvement in non-event NRI (NRI_ne) was observed for the lung PRS (NRI_ne = 0.015) and breast PRS (NRI_ne = 0.014). Four cancers (testes, leukemia, kidney, and oral cavity/pharynx) had significantly negative NRI_ne values, indicating that adding the PRS decreased classification accuracy in cancer-free individuals.
Refinement of risk stratification. The ability of the PRS to refine risk estimates was assessed by examining 5-year absolute risk trajectories as a function of age, across strata defined by percentiles of PRS (high risk: ≥80%, average: >20-<80%, low risk: ≤20%) and family history of cancer (Fig. 3; exact P values in Supplementary Table 7). Significantly diverging risk trajectories, overall and at age 60, were observed for prostate (P ≤ 4.5 × 10^−25), breast (P ≤ 4.6 × 10^−32), colorectal (P ≤ 2.0 × 10^−21), and lung cancers (P ≤ 0.031). For all cancers except lung, risk stratification was primarily driven by PRS. For instance, 60-year-old men with a high PRS but no family history of prostate cancer had a higher mean 5-year disease risk (4.74%) compared to men with a positive family history and an average PRS (3.66%). For lung cancer, on the other hand, participants with a positive family history had higher average 5-year risks, even with a low PRS (0.54%), compared to those without (high PRS: 0.46%; low PRS: 0.29%). There was evidence of interaction between the PRS and family history of cancer for prostate (P = 9.0 × 10^−128), breast (P = 1.2 × 10^−98), and colorectal (P = 8.7 × 10^−14) cancers (Supplementary Table 8). For lung cancer the interaction with family history was limited to the high PRS group (P = 5.9 × 10^−3).
We also compared 5-year risk projections across strata of PRS and modifiable risk factors. Effects of multiple risk factors were combined into a single score by generating summary linear predictors for each cancer (see "Methods" for details). For several common cancers, individuals with a high PRS were predicted to have an overall risk above the median, and this increased risk was observed even when high PRS individuals also had modifiable risk factor scores that were below the median modifiable risk factor score (Fig. 4 and Supplementary Fig. 3). PRS achieved significant risk stratification for breast cancer (pre-menopausal: P ≤ 7.9 × 10^−20; post-menopausal: P ≤ 1.7 × 10^−40), colorectal cancer (P ≤ 1.8 × 10^−42), and melanoma (P ≤ 3.5 × 10^−139) (Fig. 4; exact P values in Supplementary Table 7). The same pattern of stratification was observed for NHL, leukemia, pancreatic, thyroid, and testicular cancers (Supplementary Figs. 3,4). For other phenotypes, lifestyle-related risk factors had a stronger overall influence on risk trajectories than PRS (Fig. 5; exact P values in Supplementary Table 7). However, stratifying by levels of PRS still resulted in significantly diverging risk projections for several cancers (lung: P ≤ 1.1 × 10^−13; oral cavity/pharynx: P ≤ 1.2 × 10^−12; kidney: P ≤ 1.7 × 10^−52). For bladder cancer, the risk trajectories for high PRS/reduced modifiable risk and low PRS/high modifiable risk were overlapping (P = 0.99).
Quantifying population-level impact. Population attributable fractions (PAF) were used to summarize the relative contribution of genetic susceptibility and modifiable risk factors to cancer risk at the population level. In order to allow comparisons between PAF estimates, the PRS and modifiable risk score distributions were both dichotomized at ≥80th percentile. All risk factors nominally contributed (P < 0.05) to cancer incidence ( Fig. 6 and Supplementary Table 9

Discussion
Cancer is a multifactorial disease with a complex web of etiological factors, from macro-level determinants, such as health policy, to individual-level characteristics, such as health-related behaviors and heritable genetic profiles. Heritable and modifiable risk factors act in concert to influence cancer development, but their relative contributions to disease risk are rarely compared directly in the same population. In this study we provide insight into the potential utility of PRS for cancer risk prediction and the relative contribution of genetic and modifiable risk factors to cancer incidence at the population level.
Our first major finding is that cancer-specific PRS comprised of lead GWAS variants improve risk prediction for all 16 cancers examined. However, the magnitude of the resulting improvement in prediction varies substantially between sites. In evaluating the added predictive value of the PRS it is important to keep in mind that achieving the same incremental increase in the C-index/AUC is more difficult when the baseline model already performs well 12 . This was applicable to most cancers, where age and/or sex alone achieved non-trivial risk discrimination (C-index/AUC > 0.60). Expanding the set of predictors to include modifiable risk factors further improved discrimination, as previously shown 13 . By adding the PRS to the most comprehensive risk factor models facilitated by our data, we adopted a conservative approach for quantifying its added predictive value, which provides an informative benchmark for future efforts seeking to incorporate genetic predisposition in cancer risk assessment.
Cancer sites for which the PRS resulted in the largest gains in prediction performance included prostate, testicular, and thyroid cancers, as well as leukemia and melanoma. This is consistent with high heritability estimates reported for these cancers in twin studies 14 and our analyses in the UK Biobank 15. Modeling the PRS in addition to established risk factors yielded very modest improvements in risk discrimination for cancers of the lung, endometrium, bladder, oral cavity/pharynx, and kidney. These cancers have strong environmental risk factors, such as smoking, alcohol consumption, obesity, and HPV infection, some of which were captured in our analysis. Limited predictive ability for cervical and endometrial cancers may also be due to a low number of variants included in the PRS (9 and 10, respectively). The association of the lung cancer PRS with cigarettes per day 16 may have diminished its apparent predictive value when added to a model with smoking status and intensity, which already achieved an AUC > 0.80, making it difficult to elicit further improvement. Furthermore, PRS may be particularly relevant for assessing lung cancer risk in never smokers, since other risk factors have a limited impact in this population.
Few pan-cancer PRS studies have been conducted in prospective cohorts and none have considered the breadth of modifiable risk factors that we evaluated. Shi et al. 17 tested 11 cancer PRS in cases from The Cancer Genome Atlas and controls from the Electronic Medical Records and Genomics Network. This analysis was limited by fewer risk variants in each PRS, as well as potential for bias due to selection of cases and controls from different populations. A phenome-wide analysis in the Michigan Genomics Initiative cohort by Fritsche et al. 18
Our second major finding advances the idea of using germline genetic information to refine individual risk estimates. We show that incorporating PRS improves risk stratification provided by conventional risk factors alone, as illustrated by significantly diverging 5-year risk projections within strata based on family history or modifiable risk factors. For certain cancers, including some with strong environmental risk factors, such as melanoma, breast, colorectal, and pancreatic cancers, PRS was the primary determinant of risk stratification. For others, such as lung and bladder cancers, modifiable risk factors had a stronger impact on 5-year risk trajectories. A consistent finding for all cancers was that individuals in the top 20% of the PRS distribution with an unfavorable modifiable risk factor profile had the highest level of risk, with evidence that the effects of PRS and modifiable risk factors may be synergistic. Similar risk stratification results based on genetic and modifiable risk factors have also been reported for coronary disease 28
Fig. 4 Predicted 5-year absolute risk trajectories across strata defined by modifiable risk factors and percentiles of the polygenic risk score (PRS) distribution. Low PRS corresponds to ≤20th percentile, average PRS is defined as >20th to <80th percentile, and high PRS includes individuals in the ≥80th percentile of the normalized genetic risk score distribution. Individuals below the median of the modifiable risk factor distribution were classified as having reduced risk, whereas those above the median had elevated risk. P values for differences in mean absolute risk in each stratum at age 50 for (a) premenopausal breast cancer and at age 60 for (b) post-menopausal breast cancer, (c) colon/rectal cancer, and (d) melanoma are based on t-tests (two-sided).
individuals with wide variation in cancer predisposition based on lifestyle-related risk factors. Furthermore, as a risk factor that is present and stable throughout the life course, PRS may be useful for motivating targeted prevention efforts in high-risk individuals before they accumulate a high burden of modifiable risk factors. In addition to evaluating predictive performance and risk stratification, our work demonstrates the relevance of common genetic risk variants at the population level. High genetic risk (PRS ≥ 80th percentile) explained between 4.0 and 30.3% of incident cancer cases, and for many phenotypes this exceeded PAF estimates for modifiable risk factors or family history. The contribution of genetic variation to disease risk is typically conveyed by heritability, which is an informative metric, although not easily translated into a measure of disease burden useful in a public health context. Recent work on cancer PAF in the UK 30 and a series of publications from the ComPARe initiative in Canada 31,32 examined a wide range of modifiable risk factors. Despite providing useful data, these studies overlook the contribution of genetic susceptibility. Our work addresses these limitations by providing a more complete perspective on the determinants of cancer and potential impact of future prevention policies.
In evaluating the contributions of our study, several limitations should be acknowledged. First, we did not account for the impact of workplace exposures and socio-economic determinants of health, thereby underestimating the role of non-genetic risk factors. We also lacked data on several known carcinogens, such as ionizing radiation, and clinical biomarkers, such as prostate-specific antigen, thus limiting the extent to which our results inform risk classification for certain cancers. Information on family history was also not available for all cancer types. Second, since the UK Biobank cohort is unrepresentative of the general UK population due to low participation and resulting healthy volunteer bias 33, we may have underestimated PAFs for modifiable risk factors. Finally, the models presented here are calibrated to the UKB population and we urge caution in extrapolating prediction performance and absolute risk projections to other populations. Since our analytic sample is restricted to individuals of predominantly European ancestry, this limits the applicability of our findings to diverse populations.
This work has several important strengths. Our study provides a comprehensive description of the joint and relative influence of genetic and modifiable risk factors in a population-based cohort with uniform phenotyping and extensive data on a range of relevant cancer risk factors. We established risk models based on the current knowledge of genetic and modifiable risk factors and report a series of metrics that comprehensively characterize different dimensions of PRS predictive performance in an independent population. With the exception of limited overlap with one colorectal cancer GWAS (see "Methods"), all of our risk models were developed based on previously published associations from studies that did not include the UK Biobank.
While our results are promising, we anticipate that the PRS performance reported here may be enhanced by adopting less stringent P value thresholding, optimizing subtype-specific weights, and implementing more sophisticated PRS models that incorporate linkage disequilibrium structure, functional annotations, or single-nucleotide polymorphism (SNP) interactions. Some of these strategies are already being successfully implemented 4,23. We also provide insight into PRS modeling by showing that accounting for the variance in risk allele effect sizes improves PRS performance. This approach may be particularly advantageous for PRS derived from multiple sources rather than a single GWAS. Throughout this study we consider a relatively lenient definition of high genetic risk, corresponding to the top 20% of the PRS distribution. Exploring other cut-points will be informative; however, our results are valuable for demonstrating that the utility of PRS for stratification is not limited to the most extreme ends of the genetic susceptibility spectrum. This threshold is also compelling from a population-health perspective, as it allows us to quantify the proportion of cases attributed to a risk factor with a 20% prevalence.
Genetic risk scores have the potential to become a powerful tool for precision health, but only if the resulting information can be understood and acted on appropriately. One important consideration is the accuracy and stability of PRS-based risk classifications, especially at clinically actionable risk thresholds that exist for certain cancers. For instance, there are established screening programs for breast and colorectal cancers, and increasing evidence supporting the effectiveness of low-dose computed tomography for lung cancer screening 34,35 . For these cancers PRS could be used to adjust the optimal age for screening initiation and/or intensity. However, to justify this, studies are needed to demonstrate the benefit of using PRS to supplement conventional screening criteria. Such trials are already underway for breast cancer, where genetic risk scores are being incorporated to personalize risk-based screening 36 . For other cancers, such as prostate, screening remains controversial and PRS may prove useful in identifying a subset of high-risk individuals who may benefit the most from screening.
Another area where PRS may prove useful is for prioritizing individuals for targeted health and lifestyle-related interventions. In support of this, our study demonstrates that those with the highest levels of genetic risk, based on the PRS, may also experience larger decreases in risk from shifting to a healthier lifestyle. However, there is also accumulating evidence that simply reporting genetic risk information to individuals does not induce behavior change that could lead to meaningful reductions in risk 37 . Therefore, progress in our ability to construct and apply PRS to identify high-risk individuals must be also accompanied by the development of effective behavioral interventions that can be implemented in response to high disease risk, in addition to early detection and screening protocols.
Ultimately, the impact of PRS on clinical decision-making should be carefully evaluated in randomized trials prior to deployment in healthcare settings. By demonstrating cancer-specific improvements in risk prediction, as well as the substantial proportion of cancer incidence that is captured by known genetic susceptibility variants, we provide evidence that contextualizes the potential for using genetic information to improve cancer outcomes.

Methods
Study population. The UK Biobank (UKB) is a population-based prospective cohort of individuals aged 40-69 years, enrolled between 2006 and 2010. All participants completed extensive questionnaires, in-person physical assessments, and provided blood samples for DNA extraction and genotyping 7 . Health-related outcomes were ascertained via individual record linkage to national cancer and mortality registries and hospital in-patient encounters 7 . Individuals with at least one recorded incident diagnosis of a borderline, in situ, or malignant primary cancer were defined as cases. Cancer diagnoses coded by International Classification of Diseases (ICD)-9 or ICD-10 codes were converted into ICD-O-3 codes using the SEER site recode paradigm in order to classify cancers by organ site.
Participants were genotyped on the UKB Affymetrix Axiom array (89%) or the UK BiLEVE array (11%) 7. Genotype imputation was performed using the Haplotype Reference Consortium as the main reference panel, supplemented with the UK10K and 1000 Genomes phase 3 reference panels 7. Genetic ancestry principal components (PCs) were computed using fastPCA 38 based on a set of 407,219 unrelated samples and 147,604 genetic markers 7. All analyses were restricted to self-reported European ancestry individuals with concordant self-reported and genetically inferred sex. To further minimize potential for population stratification, we excluded individuals with values for either of the first two ancestry PCs outside of five standard deviations of the population mean. Based on a subset of genotyped autosomal variants with minor allele frequency (MAF) ≥ 0.01 and genotype call rate ≥97%, we excluded samples with call rates <97% and/or heterozygosity more than five standard deviations from the mean of the population. With the same subset of SNPs, we used KING 38 to estimate relatedness among the samples. We excluded one individual from each pair of first-degree relatives, preferentially retaining individuals to maximize the number of cancer cases remaining, resulting in a total of 413,870 UKB participants.
Polygenic risk scores. In order to derive PRS for each of the 16 cancers, we extracted previously associated variants by searching the National Human Genome Research Institute (NHGRI)-European Bioinformatics Institute (EBI) Catalog of published GWAS. For every eligible GWAS, both the original primary manuscript and supplemental materials were reviewed. Additional relevant studies were identified by examining the reference section of each article and via PubMed searches of other studies in which each article had been cited. We abstracted all autosomal variants with MAF ≥ 0.01 and P < 5 × 10^−8 identified in populations of at least 70% European ancestry and published by June 2018, with the exception of one colorectal cancer GWAS 39 (published in December 2018). Studies used to identify cancer risk variants and obtain corresponding effect sizes for the PRS were conducted in populations other than the UK Biobank. One exception is the colorectal cancer study by Huyghe et al. 39, which included 5356 cases and 21,407 controls from the UK Biobank in the GWAS meta-analysis, comprising 9% of cases and 21% of all participants.
Details of the PRS development approach, including a comprehensive list of source studies, are described by Graff et al. 16. For inclusion in the PRS we preferentially selected independent SNPs (LD r^2 < 0.3) with the highest imputation score and we excluded SNPs with allele mismatches or MAF differences >0.10 relative to the 1000 Genomes reference population, and palindromic SNPs with MAF ≥ 0.45. For associations reported in more than one study of the same ancestry and phenotype, we selected the one with the most information (i.e., which reported the risk allele and effect estimate) with the smallest P value. For breast cancer, the PRS used in this analysis differs slightly from Graff et al. 16. We looked up 187 candidate PRS variants in publicly available meta-analysis summary statistics from the Breast Cancer Association Consortium (BCAC) GWAS, as reported in Michailidou et al. 40. We retained SNPs with P < 5 × 10^−8 in the BCAC meta-analysis (n = 162) and constructed a standard PRS using risk allele weights from these summary statistics.
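The allele-frequency and strand-ambiguity exclusions described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the pipeline used in the study: the record fields (`a1`, `a2`, `maf`, `ref_maf`) and function names are hypothetical, and the allele-mismatch check and LD pruning are not modeled.

```python
# Complementary bases; A/T and C/G SNPs are palindromic (strand-ambiguous).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1, allele2):
    """True when the two alleles are each other's complement (A/T or C/G)."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def keep_variant(snp, max_maf_diff=0.10, palindromic_maf_cutoff=0.45):
    """Apply the two frequency-based exclusions to one variant record.

    `snp` is a hypothetical dict with the cohort MAF (`maf`), the
    reference-panel MAF (`ref_maf`), and the two alleles (`a1`, `a2`).
    """
    # Exclude variants whose MAF differs from the reference by more than 0.10.
    if abs(snp["maf"] - snp["ref_maf"]) > max_maf_diff:
        return False
    # Exclude palindromic variants with MAF >= 0.45, where strand
    # cannot be resolved from allele frequency.
    if is_palindromic(snp["a1"], snp["a2"]) and snp["maf"] >= palindromic_maf_cutoff:
        return False
    return True


# Hypothetical candidate variants.
candidates = [
    {"id": "snp1", "a1": "A", "a2": "G", "maf": 0.25, "ref_maf": 0.24},
    {"id": "snp2", "a1": "A", "a2": "T", "maf": 0.47, "ref_maf": 0.46},
    {"id": "snp3", "a1": "C", "a2": "G", "maf": 0.20, "ref_maf": 0.21},
    {"id": "snp4", "a1": "C", "a2": "T", "maf": 0.30, "ref_maf": 0.45},
]
retained = [s["id"] for s in candidates if keep_variant(s)]
```

Here `snp2` is dropped as a palindromic variant with MAF near 0.5 and `snp4` is dropped for its frequency discrepancy, while the low-frequency palindromic `snp3` is retained.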
Three approaches for combining risk variants in the PRS were considered. First, we used standard PRS weights, corresponding to the log odds ratio (β) for each risk allele: We compared this to an unweighted score corresponding to the sum of the risk alleles, which is equivalent to assigning all variants an equal weight of 1: Lastly, we applied inverse variance (IV) weights that incorporated the standard error (SE) of the SNP log(OR) to account for uncertainty in risk allele effect sizes and down-weight the contribution of variants with less precisely estimated associations (weights provided in Supplementary Data 1): Each PRS was standardized across the entire cohort to have a mean of 0 and standard deviation (SD) of 1.
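The three weighting schemes can be illustrated with a short sketch. The standard and unweighted scores follow directly from the description above. The exact form of the IV weight is not recoverable from the text, so the version below (β divided by SE squared) is an assumption made purely for illustration; the weights actually used are those in Supplementary Data 1. Function names and the toy dosages are hypothetical.

```python
from statistics import mean, pstdev


def prs(dosages, betas, ses=None, scheme="standard"):
    """Weighted sum of risk-allele dosages (0, 1, or 2) for one individual."""
    if scheme == "standard":
        weights = betas                      # log odds ratio per risk allele
    elif scheme == "unweighted":
        weights = [1.0] * len(betas)         # every variant gets weight 1
    elif scheme == "iv":
        # ASSUMPTION: beta / SE^2 is one plausible inverse-variance weight;
        # the study's actual IV weights are in its Supplementary Data 1.
        weights = [b / se ** 2 for b, se in zip(betas, ses)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sum(w * g for w, g in zip(weights, dosages))


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 across the cohort."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


# Toy cohort (hypothetical): three individuals, three variants.
betas = [0.1, 0.2, 0.3]
ses = [0.02, 0.05, 0.10]
cohort = [[2, 1, 0], [0, 1, 2], [1, 1, 1]]

standard = [prs(d, betas) for d in cohort]
unweighted = [prs(d, betas, scheme="unweighted") for d in cohort]
iv = [prs(d, betas, ses, scheme="iv") for d in cohort]
standard_z = standardize(standard)
```

Under this assumed IV form, the first variant (smallest SE) dominates the score, which is the intended effect of down-weighting imprecisely estimated associations.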
Risk model development and evaluation. Cancer-specific prediction models consisting of four classes of risk factors were developed: (i) demographic factors (age and sex); (ii) family history of cancer in first-degree relatives; (iii) modifiable risk factors; and (iv) genetic susceptibility, represented by the PRS. Family history of cancer was derived based on self-reported illnesses in non-adopted first-degree relatives, which only listed cancers of the prostate, breast, bowel, or lung. In addition to these four cancer sites, family history of breast cancer was included as a predictor for ovarian cancer 41,42 . Models for pancreatic cancer included a composite variable for family history of cancer at any of these four sites 43,44 Table 1).
Cause-specific Cox proportional hazards models were used to estimate the hazard ratios (HRs) and corresponding 95% confidence intervals (CIs) for genetic and lifestyle factors associated with each incident cancer. Death from any cause, other than cancer site-specific mortality, was treated as a competing event. Information on primary and contributing causes of death was used to identify cancer site-specific mortality. Follow-up time was calculated from the date of enrollment to the date of cancer diagnosis, date of death, or end of follow-up (1 January 2015). For each cancer, individuals with a past or prevalent cancer diagnosis at that same site were excluded from the analysis, while individuals diagnosed with cancers at other sites were retained in the population. All models including the PRS were also adjusted for genotyping array and the first 15 genetic ancestry PCs. For the PRS, HR estimates correspond to a 1 SD increase in the standardized genetic score.
The predictive performance of each risk model was evaluated based on its ability to accurately estimate risk (calibration) and distinguish cancer cases from cancer-free individuals (discrimination). Calibration was assessed with a Hosmer-Lemeshow goodness-of-fit statistic modified for time-to-event outcomes 46 , and by plotting the expected event status against the observed event probability 47 across risk deciles (or quantiles to ensure a minimum of five cases per group). Violation of the proportionality of hazards assumption was assessed by examining the association between standardized Schoenfeld residuals and time.
We evaluated nested models starting with the most minimal set of predictors, such as demographic factors, followed by models including family history of cancer and modifiable risk factors, and finally models incorporating the PRS. Risk discrimination was assessed based on Harrell's C-index, calculated as a weighted average between 1 and 5 years of follow-up time, and area under the curve (AUC) at 5 years. We also report pseudo-$R^2$ coefficients based on Royston's measure of explained variation for survival models 10. The percentile-based net reclassification index (NRI) 11 was used to quantify improvements in reclassification. NRI summarizes the proportion of appropriate directional changes in predicted risks. Any upward movement in risk categories for cases indicates improved classification, and any downward movement implies worse reclassification. The opposite is expected for non-cases:
$$\mathrm{NRI}_e = \frac{P(\mathrm{event} \mid \mathrm{up})\, n_U - P(\mathrm{event} \mid \mathrm{down})\, n_D}{n\, P(\mathrm{event})}, \qquad (4)$$
where $n_U$ is the number of individuals up-classified and $n_D$ is the number down-classified. Overall NRI is the sum of the NRI in cases and non-cases: $\mathrm{NRI} = \mathrm{NRI}_e + \mathrm{NRI}_{ne}$. Bootstrapped confidence intervals were obtained based on 1000 replicates.
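The event and non-event components of the NRI can be sketched for the simplified case of a fully observed binary outcome. In that setting the event NRI reduces to P(up | event) − P(down | event), which by Bayes' rule equals the event NRI expression in the preceding paragraph; the survival-based estimation of P(event | up) and the bootstrap are not reproduced here. The function name and toy data are hypothetical.

```python
def nri(events, old_cat, new_cat):
    """Event, non-event, and overall NRI from risk categories assigned by
    the model without (old_cat) and with (new_cat) the PRS.

    Simplification: outcomes are treated as fully observed (no censoring).
    """
    up = [new > old for old, new in zip(old_cat, new_cat)]
    down = [new < old for old, new in zip(old_cat, new_cat)]
    n_event = sum(1 for e in events if e)
    n_nonevent = len(events) - n_event

    # Cases: moving up is an improvement, moving down is a deterioration.
    up_e = sum(1 for u, e in zip(up, events) if u and e)
    down_e = sum(1 for d, e in zip(down, events) if d and e)
    nri_e = (up_e - down_e) / n_event

    # Non-cases: the opposite direction counts as an improvement.
    up_ne = sum(1 for u, e in zip(up, events) if u and not e)
    down_ne = sum(1 for d, e in zip(down, events) if d and not e)
    nri_ne = (down_ne - up_ne) / n_nonevent

    return nri_e, nri_ne, nri_e + nri_ne


# Toy data (hypothetical): 4 cases followed by 6 non-cases,
# with ordinal risk categories 0 < 1 < 2.
events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_cat = [1, 1, 2, 2, 1, 1, 2, 2, 1, 2]
new_cat = [2, 2, 2, 1, 1, 0, 1, 2, 2, 2]

nri_e, nri_ne, nri_total = nri(events, old_cat, new_cat)
```

In this example two cases move up and one moves down, giving an event NRI of 0.25, while two non-cases move down and one moves up, giving a non-event NRI of 1/6.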
NATURE COMMUNICATIONS | (2020) 11:6084 | https://doi.org/10.1038/s41467-020-19600-4 | www.nature.com/naturecommunications
Assessment of risk stratification. For each individual, we estimated the 5-year absolute risk of being diagnosed with a specific cancer using the formula of Benichou and Gail 48, as implemented by Ozenne et al. 49 in the riskRegression package. Briefly, for cause-specific Cox regression models the absolute risk accumulates over time as the product between the event-free survival and the hazard of experiencing the event of interest, both conditional on the baseline covariates. For models with one competing outcome, event-free survival is estimated from the cause-specific hazards using the product integral estimator, where Λ_{j,z}(t|x) denotes cause-specific hazard rates: This is asymptotically equivalent to the product-limit estimator if the distribution of the event times is continuous and the product integral estimator ensures that the sum of transition probabilities over all possible transitions should be one.
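The accumulation of absolute risk under one competing outcome can be sketched numerically. The example below assumes that the cause-specific hazard increments at successive event times are already available for a given covariate profile (in practice they would come from the fitted cause-specific Cox models); the function name and the increments are hypothetical, and this is not the riskRegression implementation.

```python
def absolute_risk(d_haz_event, d_haz_competing):
    """Accumulate absolute risk from cause-specific hazard increments.

    At each event time the risk of the event of interest grows by
    S(t-) * dLambda_1(t), and event-free survival is updated with the
    product-integral rule S(t) = S(t-) * (1 - dLambda_1(t) - dLambda_2(t)).
    """
    surv = 1.0            # event-free survival just before the current time
    risk_event = 0.0      # cumulative risk of the cancer of interest
    risk_competing = 0.0  # cumulative risk of the competing event
    for d1, d2 in zip(d_haz_event, d_haz_competing):
        risk_event += surv * d1
        risk_competing += surv * d2
        surv *= 1.0 - d1 - d2
    return risk_event, risk_competing, surv


# Hypothetical hazard increments at two successive event times.
risk_event, risk_competing, surv = absolute_risk([0.01, 0.02], [0.005, 0.005])
```

With the product-integral update, the two cumulative risks and the remaining event-free survival sum to one, which is the property noted in the paragraph above.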
Absolute risk trajectories were examined as a function of age across strata defined by genetic and modifiable risk profiles, as well as family history. Individuals in the top 20% of the PRS distribution (PRS ≥ 80th percentile) for a given cancer were classified as having high genetic risk, those in the bottom 20% (PRS ≤ 20th percentile) were classified as low risk, and the middle category (>20th to <80th percentile) was classified as average genetic risk.
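The three-level classification can be sketched as follows. The percentile cut-points are computed here with Python's `statistics.quantiles`; the interpolation rule used in the original analysis is not stated, so that choice, the function name, and the toy scores are assumptions for illustration.

```python
from statistics import quantiles


def prs_strata(scores):
    """Label each score as low (<=20th pct), high (>=80th pct), or average."""
    # Quintile cut-points: index 0 is the 20th and index 3 the 80th percentile.
    cuts = quantiles(scores, n=5, method="inclusive")
    low_cut, high_cut = cuts[0], cuts[3]
    labels = []
    for s in scores:
        if s >= high_cut:
            labels.append("high")
        elif s <= low_cut:
            labels.append("low")
        else:
            labels.append("average")
    return labels


# Toy standardized scores (hypothetical): ten individuals.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = prs_strata(scores)
```

For ten evenly spaced scores this assigns two individuals to each tail and six to the average group.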
Modifiable risk factors were summarized by generating summary linear predictors (predicted log-hazard ratios) based on risk factors in Supplementary Table 1, excluding age, sex, and family history. Individuals above the median of this risk score distribution were considered to have an unfavorable modifiable risk profile. Risk trajectories in each stratum were visualized by fitting linear models with smoothing splines to individual risk estimates as a function of age. Differences in mean risk at age 60 were tested using a two-sample t-test. We also tested for interaction between the three-level ordinal PRS variable and the modifiable risk score (dichotomized at the median) in a linear model with the predicted absolute risk as the outcome.
The relative contribution of genetic and modifiable cancer risk factors at the population level was quantified with PAF using the method of Sjölander and Vansteelandt 50,51 based on the counterfactual framework. To obtain comparable PAF estimates, thresholds for high genetic risk and high burden of modifiable risk factors corresponded to the top 20% (≥80th percentile) of each risk score distribution.
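For orientation, the counterfactual definition of the attributable fraction can be written out as below. This is a sketch of the general definition as we understand it from the counterfactual literature, not a reproduction of the exact estimator applied in this analysis; the symbols $Y_0$, $S_0(t)$, and $S(t)$ are notation introduced here for illustration.

```latex
% Y   : observed outcome; Y_0 : counterfactual outcome had the exposure
%       (e.g. PRS >= 80th percentile) been removed from the population.
\mathrm{AF} = 1 - \frac{\Pr(Y_0 = 1)}{\Pr(Y = 1)}

% Time-to-event analogue, with S(t) the factual survival function and
% S_0(t) the counterfactual survival function under no exposure:
\mathrm{AF}(t) = 1 - \frac{1 - S_0(t)}{1 - S(t)}
```

Because the exposure prevalence is fixed at 20% for both the PRS and the modifiable risk score, differences in the resulting estimates reflect differences in the strength of association rather than in how common the exposure is.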
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The UK Biobank is an open-access resource, available at https://www.ukbiobank.ac.uk/researchers/. This research was conducted with approved access to UK Biobank data under application number 14105. All the other data supporting the findings of this study are available within the article and its supplementary information files and from the corresponding author upon reasonable request. Input data for the construction of polygenic scores (PGS) are available from the PGS Catalog under accession: PGP000050. A reporting summary for this article is available as a Supplementary Information file.