Integration of polygenic risk scores with modifiable risk factors improves risk prediction: results from a pan-cancer analysis

1. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, USA 2. Genetic Epidemiology Group, Section of Genetics, International Agency for Research on Cancer, Lyon, France 3. Department of Medicine, University of California, San Francisco, San Francisco, USA 4. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, USA 5. Institute for Human Genetics, University of California, San Francisco, San Francisco, USA 6. Department of Urology, University of California, San Francisco, San Francisco, USA


INTRODUCTION
Cancer susceptibility is inherently complex, but it is well accepted that heritable genetic factors and modifiable exposures both contribute to cancer development. While our knowledge of causal modifiable risk factors has evolved gradually over the past decades, genome-wide association studies (GWAS) have rapidly produced a wealth of germline genetic risk variants for different cancers. These studies have shed light on genetic mechanisms of cancer susceptibility; however, the public health impact of GWAS findings has been modest. In response, GWAS results have been leveraged to create polygenic risk scores (PRS), which combine weighted genotypes at risk alleles into a single, integrated measure of an individual's genetic predisposition to a specific phenotype. Such genetic risk scores are not designed to reflect the complexity of molecular susceptibility mechanisms, but they are highly amenable to phenotypic prediction.
Multiple studies have demonstrated that PRS can generate informative predictions for heritable traits 1,2 and diseases 3,4, prompting many to advocate for increased integration of genetic risk scores into clinical practice 5,6. An important step towards realizing the promise of PRS in precision medicine lies in systematically assessing the added value of genetic information relative to conventional risk factors and examining how it affects lifetime risk trajectories 6. The recent development of large, prospective cohorts with both genome-wide genotyping and deep phenotyping data, such as the UK Biobank 7, provides an opportunity for integrative analyses of genetic variation and modifiable risk factors. Beyond evaluating the predictive performance of PRS, these data also make it possible to answer etiological questions about the relative contribution of genetic and modifiable risk factors to cancer susceptibility.
Our overarching aim was to quantify the relative contribution of common, low-penetrance risk variants to cancer risk prediction and overall disease susceptibility. To address this aim, we assembled PRS for 16 cancer types, based on results from previously published GWAS, and applied them to 413,870 individuals in the UK Biobank (UKB) cohort. First, we assessed the degree to which PRS can improve risk prediction and stratification based on established cancer risk factors, such as family history and modifiable health-related characteristics. Next, we estimated the proportion of cancer cases at the population level that can be attributed to high genetic susceptibility, as captured by the PRS, and compared this to modifiable determinants of cancer.
… Supplementary Table 1. Over the course of the follow-up period, a total of 22,755 incident cancers were diagnosed in 413,753 individuals, after excluding participants outside the enrollment age criteria and those who withdrew consent after enrollment. Established cancer risk factors (listed in Supplementary Table 2) showed associations of the expected magnitude and direction with each cancer (Supplementary Table 3). A family history of cancer at the corresponding site in first-degree relatives conferred a significantly higher risk of prostate (HR=1.84, 95% CI: 1.68-2.00, p=9.1×10^-46), breast (HR=1.56, 1.44-1.69, p=3.0×10^-29), lung (HR=1.61, 1.43-1.81, p=7.4×10^-15), and colorectal (HR=1.26, 1.14-1.40, p=1.2×10^-5) cancers. Metrics of tobacco use, such as smoking status, intensity, and duration, were positively associated with risks of lung, colorectal, bladder, kidney, pancreatic, and oral cavity/oropharyngeal cancers. Weekly alcohol intake was associated with higher risks of breast (HR per 70 grams = 1.04, p=2.3×10^-5), colorectal (HR=1.04, p=5.9×10^-9), and oral cavity/pharyngeal (HR=1.05, p=3.0×10^-10) cancers.
Adiposity was associated with cancer risk at multiple sites, including endometrium (BMI: HR per 1-unit = 1.09, 1.08-1.10, p=1.6×10^-49), colon/rectum (waist-to-hip ratio: HR per 10% increase = 1.17, 1.11-1.24, p=2.2×10^-8), and kidney (BMI: HR=1.04, 1.02-1.05, p=1.7×10^-6). Particulate matter (PM2.5) was associated with lung cancer risk 8 (PM2.5: HR per 1 µg/m³ = 1.10, 1.05-1.15, p=1.9×10^-5) in the model that included smoking status and intensity.

Improvement in risk prediction
The predictive performance of each risk model was evaluated based on its ability to accurately estimate risk (calibration) and distinguish cancer cases from cancer-free individuals (discrimination). All cancer-specific risk models were well calibrated (goodness-of-fit p>0.05; Supplementary Figure 1). Model discrimination was assessed by Harrell's C-index, estimated as a weighted mean over 1 to 5 years of follow-up. For completeness, we also report the AUC at 5 years of follow-up 9. Violations of the proportional hazards assumption (p<0.05) were detected for age in the breast cancer model and for the inverse-variance-weighted PRS (PRS_IV) in the cervical cancer model. For breast cancer, this was resolved by incorporating an interaction term with follow-up time. As a sensitivity analysis for cervical cancer, we modelled a time-varying PRS effect (Supplementary Figure 2).
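For reference, the plain form of Harrell's C-index can be sketched in Python as below. This is an illustration under simplifying assumptions: it is the unweighted, all-pairs version, not the time-weighted average over 1 to 5 years used in the analysis, and pairs with tied follow-up times are skipped. `delta_c` is a hypothetical helper for the ΔC comparison between models with and without the PRS.

```python
from itertools import combinations


def harrell_c(time, event, risk):
    """Plain Harrell's C for right-censored data (no time weighting).

    A pair is comparable when the subject with the shorter follow-up time
    had the event; it is concordant when that subject also has the higher
    predicted risk. Ties in predicted risk count as half-concordant.
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(time)), 2):
        a, b = (i, j) if time[i] < time[j] else (j, i)  # a has the shorter time
        if time[a] == time[b] or not event[a]:
            continue  # tied times, or earlier subject censored: not comparable
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable


def delta_c(time, event, risk_base, risk_with_prs):
    """Gain in discrimination from adding the PRS to a baseline model."""
    return harrell_c(time, event, risk_with_prs) - harrell_c(time, event, risk_base)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of event times by predicted risk.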
With age and/or sex alone, the C-index reached 0.60 or higher for all cancers except breast and thyroid (Supplementary Table 5). For cancers with available information on family history of cancer at the same site (prostate, breast, colon/rectum, and lung), adding family history had only a modest impact on the C-index (ΔC<0.01). By contrast, replacing family history with the PRS improved discrimination for prostate (C=0.763, ΔC=0.047), breast (C=0.618, ΔC=0.060), and colorectal (C=0.708, ΔC=0.029), but not lung (C=0.711, ΔC=-0.002), cancers.
Next, we assessed the change in the C-index (ΔC) after incorporating the PRS into prediction models with all available risk factors for each cancer (Figure 2; Supplementary Table 5). The resulting improvement in prediction performance varied. The largest increases in the C-index were observed for cancer sites with few available predictors, such as testes (C_PRS=0.766, ΔC=0.138), thyroid (C_PRS=0.692, ΔC=0.099), prostate (C_PRS=0.768, ΔC=0.051), and lymphocytic leukemia (C_PRS=0.756, ΔC=0.061). However, adding the PRS also improved prediction accuracy for melanoma (C_PRS=0.664, ΔC=0.042), breast (C_PRS=0.631, ΔC=0.060), and colorectal (C_PRS=0.716, ΔC=0.030) cancers, which have multiple environmental risk factors. The highest overall C-index was observed for lung (C_PRS=0.849) and bladder (C_PRS=0.814) cancers, driven primarily by non-genetic predictors (C without PRS: lung = 0.846; bladder = 0.808). Changes in the AUC at 5 years of follow-up were of similar magnitude (Supplementary Table 5).
For 15 of 16 cancers, incorporating the PRS resulted in a significant improvement in reclassification, as indicated by positive percentile-based net reclassification index (NRI) 11 values with 95% bootstrapped confidence intervals excluding 0 (Supplementary Table 6). The overall NRI was driven primarily by the event NRI (NRI_e), which is the net proportion of cancer cases reclassified to a higher risk group. NRI_e values >0.25 were observed for prostate, thyroid, breast, testicular, leukemia, melanoma, and colorectal cancers. The largest improvement in non-event reclassification (NRI_ne) was observed for the lung PRS (NRI_ne=0.015) and the breast PRS (NRI_ne=0.012). Four cancers (testes, leukemia, kidney, and oral cavity/pharynx) had significantly negative NRI_ne values, indicating that adding the PRS decreased classification accuracy in cancer-free individuals.

Refinement of risk stratification
The ability of the PRS to refine risk estimates was assessed by examining 5-year absolute risk trajectories as a function of age across strata defined by PRS percentile (high risk: ≥80%; average: >20% to <80%; low risk: ≤20%) and family history of cancer (Figure 3). Significantly diverging risk trajectories, overall and at age 60, were observed for prostate (P<4.5×10^-25), breast (P<9.3×10^-36), colorectal (P<2.0×10^-21), and lung (P<0.031) cancers. For all cancers except lung, risk stratification was driven primarily by the PRS. For instance, 60-year-old men with a high PRS but no family history of prostate cancer had a higher mean 5-year risk (4.74%) than men with a positive family history and an average PRS (3.66%). For lung cancer, by contrast, participants with a positive family history had higher average 5-year risks, even with a low PRS (0.54%), than those without (high PRS: 0.46%; low PRS: 0.29%). There was evidence of interaction between the PRS and family history of cancer for prostate (P=9.0×10^-128), breast (P=1.7×10^-104), and colorectal (P=8.7×10^-14) cancers (Supplementary Table 7). For lung cancer, the interaction with family history was limited to the high PRS group (P=5.9×10^-3).
We also compared 5-year risk projections across strata of PRS and modifiable risk factors. The effects of multiple risk factors were combined into a single score by generating a summary linear predictor for each cancer (see Methods for details). For several common cancers, individuals with a high PRS were predicted to have a higher cancer risk even when their modifiable risk factor scores were below the median (Figure 4). The PRS achieved significant risk stratification for breast cancer (pre-menopausal: P<5.9×10^-12; post-menopausal: P<4.3×10^-50), colorectal cancer (P<1.8×10^-42), and melanoma (P<4.6×10^-105) (Figure 4). The same pattern of stratification was observed for non-Hodgkin lymphoma (NHL), leukemia, pancreatic, thyroid, and testicular cancers (Supplementary Figure 3). For other phenotypes, lifestyle-related risk factors had a stronger overall influence on risk trajectories than the PRS (Figure 5). However, stratifying by PRS level still produced significantly diverging risk projections for several cancers (lung: P<1.1×10^-13; oral cavity/pharynx: P<1.2×10^-12; kidney: P<1.1×10^-13). For bladder cancer, the risk trajectories for the high PRS/reduced modifiable risk and low PRS/high modifiable risk groups overlapped (P=0.98).

Quantifying population-level impact
Population attributable fractions (PAF) were used to summarize the relative contribution of genetic susceptibility and modifiable risk factors to cancer risk at the population level. To allow comparisons between PAF estimates, both the PRS and the modifiable risk score distributions were dichotomized at the 80th percentile (≥80th percentile versus below). All risk factors nominally contributed (P<0.05) to cancer incidence (Figure 6; Supplementary

DISCUSSION
Cancer is a multifactorial disease with a complex web of etiological factors, ranging from macro-level determinants, such as health policy, to individual-level characteristics, such as health-related behaviors and heritable genetic profiles. Heritable and modifiable risk factors act in concert to influence cancer development, but their relative contributions to disease risk are rarely compared directly in the same population. In this study, we provide new insight into the potential utility of PRS for cancer risk prediction and into the relative contribution of genetic and modifiable risk factors to cancer incidence at the population level.
Our first major finding is that cancer-specific PRS composed of lead GWAS variants improve risk prediction for all 16 cancers examined. However, the magnitude of the improvement varies substantially between sites. In evaluating the added predictive value of the PRS, it is important to keep in mind that achieving the same incremental increase in the C-index/AUC is more difficult when the baseline model already performs well 12. This applied to most cancers, for which age and/or sex alone achieved non-trivial risk discrimination (C-index/AUC>0.60). Expanding the set of predictors to include modifiable risk factors further improved discrimination, as previously shown 13. By adding the PRS to the most comprehensive risk factor models our data allowed, we adopted a conservative approach for quantifying its added predictive value, which provides an informative benchmark for future efforts seeking to incorporate genetic predisposition into cancer risk assessment.
Cancer sites for which the PRS yielded the largest gains in prediction performance included prostate, testicular, and thyroid cancers, as well as leukemia and melanoma. This is consistent with the high heritability estimates reported for these cancers in twin studies 14 and in our analyses in the UK Biobank 15. Modelling the PRS in addition to established risk factors yielded very modest improvements in risk discrimination for cancers of the lung, endometrium, bladder, oral cavity/pharynx, and kidney. These cancers have strong environmental risk factors, such as smoking, alcohol consumption, obesity, and HPV infection, some of which were captured in our analysis. Limited predictive ability for cervical and endometrial cancers may also reflect the small number of variants included in the PRS (9 and 10, respectively). The association of the lung cancer PRS with cigarettes per day 16 may have diminished its apparent predictive value when added to a model with smoking status and intensity, which already achieved an AUC>0.80, making further improvement difficult to achieve. Furthermore, the PRS may be particularly relevant for assessing lung cancer risk in never smokers, since other risk factors have a limited impact in this population.
Few pan-cancer PRS studies have been conducted in prospective cohorts, and none have considered the breadth of modifiable risk factors that we evaluated. Shi et al. 17 tested 11 cancer PRS in cases from The Cancer Genome Atlas and controls from the Electronic Medical Records and Genomics Network. This analysis was limited by the smaller number of risk variants in each PRS, as well as the potential for bias due to selection of cases and controls from different populations. A phenome-wide analysis in the Michigan Genomics Initiative cohort by Fritsche et al. 18 examined PRS for 12 cancers and reported similar associations for the target phenotypes; however, risk stratification was not formally evaluated. Among cancer-specific studies, the PRS presented here achieved superior prediction performance for some cancers 19-22, but not others 23,24. For pancreatic cancer 25 and melanoma 26, our results are consistent with previous analyses using PRS of similar composition. In general, comparison of prediction performance is complicated by differences in PRS content, population characteristics, and the inclusion of different non-genetic predictors. Outside the cancer literature, our conclusions align with a recent study of ischemic stroke, which demonstrated that a PRS is similarly or more predictive than multiple established risk factors, including family history 27.
Our second major finding advances the idea of using germline genetic information to refine individual risk estimates. We show that incorporating PRS improves the risk stratification provided by conventional risk factors alone, as illustrated by significantly diverging 5-year risk projections within strata based on family history or modifiable risk factors. For certain cancers, including some with strong environmental risk factors, such as melanoma, breast, colorectal, and pancreatic cancers, the PRS was the primary determinant of risk stratification. For others, such as lung and bladder cancers, modifiable risk factors had a stronger impact on 5-year risk trajectories. A consistent finding for all cancers was that individuals in the top 20% of the PRS distribution with an unfavorable modifiable risk factor profile had the highest level of risk, with evidence that the effects of PRS and modifiable risk factors may be synergistic. Taken together, these findings highlight the potential for attenuating high genetic risk by adhering to a healthier lifestyle. Similar risk stratification results based on genetic and modifiable risk factors have also been reported for coronary disease 28 and Alzheimer's disease 29.
In addition to evaluating predictive performance and risk stratification, our work demonstrates the relevance of common genetic risk variants at the population level. High genetic risk (PRS ≥80th percentile) explained between 4.0% and 30.3% of new cancer cases, and for many phenotypes this exceeded PAF estimates for modifiable risk factors or family history. The contribution of genetic variation to disease risk is typically conveyed by heritability, which is an informative metric but is not easily translated into a measure of disease burden useful in a public health context. Recent work on cancer PAF in the UK 30 and a series of publications from the ComPARe initiative in Canada 31,32 examined a wide range of modifiable risk factors. Despite providing useful data, these studies overlook the contribution of genetic susceptibility. Our work addresses this limitation by providing a more complete perspective on the determinants of cancer and the potential impact of future prevention policies.
In evaluating the contributions of our study, several limitations should be acknowledged. First, we did not account for the impact of workplace exposures and socio-economic determinants of health, thereby underestimating the role of non-genetic risk factors. We also lacked data on several known carcinogens, such as ionizing radiation, and clinical biomarkers, such as prostate-specific antigen, thus limiting the extent to which our results inform risk discrimination for certain cancers. Information on family history was also not available for all cancer types. Second, since the UK Biobank cohort is unrepresentative of the general UK population due to low participation and the resulting healthy volunteer bias 33, we may have underestimated PAFs for modifiable risk factors. Finally, the models presented here are calibrated to the UKB population, and we urge caution in extrapolating prediction performance and absolute risk projections to other populations. Because our analytic sample is restricted to individuals of predominantly European ancestry, the applicability of our findings to diverse populations is limited.
This work also has several important strengths. The UK Biobank resource enabled us to simultaneously evaluate heritable and modifiable cancer risk factors in a population-based cohort with uniform deep phenotyping. We report a series of metrics that comprehensively characterize different dimensions of predictive performance that can be improved by incorporating genetic risk scores. While our results are promising, we anticipate that the performance of the PRS reported here may be enhanced by adopting less stringent p-value thresholds to include additional risk variants, optimizing subtype-specific weights, and implementing more sophisticated PRS models that incorporate linkage disequilibrium structure, functional annotations, or SNP interactions. Some of these strategies are already being successfully implemented 4,23.
We also provide insight into PRS modelling by showing that accounting for the variance in risk allele effect sizes improves PRS performance. This approach may be particularly advantageous for PRS derived from multiple sources rather than a single GWAS. Throughout this study, we used a relatively lenient definition of high genetic risk, corresponding to the top 20% of the PRS distribution. Exploring other cut-points will be informative; however, our results demonstrate that the utility of PRS for stratification is not limited to the most extreme ends of the genetic susceptibility spectrum. This threshold is also compelling from a population-health perspective, as it allows us to quantify the proportion of cases attributable to a risk factor with a 20% prevalence.
Genetic risk scores have the potential to become a powerful tool for precision health, but only if the resulting information can be understood and acted on appropriately. One important consideration is the accuracy and stability of PRS-based risk classifications, especially at clinically actionable risk thresholds that exist for certain cancers. For instance, there are established screening programs for breast and colorectal cancers, and increasing evidence supporting the effectiveness of low-dose computed tomography for lung cancer screening 34,35 . For these cancers PRS could be used to adjust the optimal age for screening initiation and/or intensity. However, to justify this, studies are needed to demonstrate the benefit of using PRS to supplement conventional screening criteria. Such trials are already underway for breast cancer, where genetic risk scores are being incorporated to personalize risk-based screening 36 . For other cancers, such as prostate, screening remains controversial and PRS may prove useful in identifying a subset of high-risk individuals who may benefit the most from screening.
Another area where PRS may prove useful is in prioritizing individuals for targeted health and lifestyle-related interventions. In support of this, our study demonstrates that those with the highest levels of genetic risk, based on the PRS, may also experience larger decreases in risk from shifting to a healthier lifestyle. However, there is also accumulating evidence that simply reporting genetic risk information to individuals does not induce behavior change that could lead to meaningful reductions in risk 37. Therefore, progress in our ability to construct and apply PRS to identify high-risk individuals must also be accompanied by the development of effective behavioral interventions that can be implemented in response to high disease risk, in addition to early detection and screening protocols.
Ultimately, the impact of PRS on clinical decision-making should be carefully evaluated in randomized trials prior to deployment in healthcare settings. By demonstrating cancer-specific improvements in risk prediction, as well as the substantial proportion of cancer incidence that is captured by known genetic susceptibility variants, we provide novel evidence that contextualizes the potential for using genetic information to improve cancer outcomes.

Study Population
The UK Biobank (UKB) is a population-based prospective cohort of individuals aged 40 to 69 years, enrolled between 2006 and 2010. All participants completed extensive questionnaires, in-person physical assessments, and provided blood samples for DNA extraction and genotyping 7 . Health-related outcomes were ascertained via individual record linkage to national cancer and mortality registries and hospital inpatient encounters 7 . Details of the quality control and phenotyping procedures for this dataset have been previously described 15,16 . Briefly, individuals with at least one recorded incident diagnosis of a borderline, in situ, or malignant primary cancer were defined as cases. Cancer diagnoses coded by International Classification of Diseases (ICD)-9 or ICD-10 codes were converted into ICD-O-3 codes using the SEER site recode paradigm in order to classify cancers by organ site.
Participants were genotyped on the UKB Affymetrix Axiom array (89%) or the UK BiLEVE array (11%) 7 . Genotype imputation was performed using the Haplotype Reference Consortium as the main reference panel, supplemented with the UK10K and 1000 Genomes phase 3 reference panels 7 . Genetic ancestry principal components (PCs) were computed using fastPCA 38 based on a set of 407,219 unrelated samples and 147,604 genetic markers 7 . All analyses were restricted to self-reported European ancestry individuals with concordant self-reported and genetically inferred sex. To further minimize potential for population stratification, we excluded individuals with values for either of the first two ancestry PCs outside of five standard deviations of the population mean. Based on a subset of genotyped autosomal variants with minor allele frequency (MAF)≥0.01 and genotype call rate ≥97%, we excluded samples with call rates <97% and/or heterozygosity more than five standard deviations from the mean of the population. With the same subset of SNPs, we used KING 38 to estimate relatedness among the samples. We excluded one individual from each pair of first-degree relatives, preferentially retaining individuals to maximize the number of cancer cases remaining, resulting in a total of 413,870 UKB participants.
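The sample-level exclusions described above can be sketched as follows. This is a minimal illustration, assuming per-sample PC scores, call rates, and heterozygosity values are already in memory and that first-degree pairs have already been estimated (for example with KING). `passes_sample_qc` and `prune_relatives` are hypothetical helpers, and the greedy pruning rule is only one possible way to "preferentially retain" cases, since the exact algorithm is not described here.

```python
from statistics import mean, pstdev


def outlier_flags(values, n_sd=5.0):
    """True where a value lies more than n_sd SDs from the sample mean."""
    mu, sd = mean(values), pstdev(values)
    return [abs(v - mu) > n_sd * sd for v in values]


def passes_sample_qc(pc1, pc2, call_rate, het, n_sd=5.0, min_call_rate=0.97):
    """Keep samples that are not PC1/PC2/heterozygosity outliers and that
    meet the call-rate threshold."""
    flags = zip(outlier_flags(pc1, n_sd), outlier_flags(pc2, n_sd),
                outlier_flags(het, n_sd), call_rate)
    return [not (o1 or o2 or oh) and cr >= min_call_rate
            for o1, o2, oh, cr in flags]


def prune_relatives(pairs, is_case):
    """Greedily drop one member of each first-degree pair, keeping cases.

    pairs: iterable of (id_a, id_b); is_case: dict mapping id -> bool.
    Returns the set of IDs to exclude.
    """
    removed = set()
    for a, b in pairs:
        if a in removed or b in removed:
            continue  # pair already broken up by an earlier exclusion
        # drop the non-case; if both or neither are cases, drop the second ID
        removed.add(b if is_case[a] >= is_case[b] else a)
    return removed
```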

Polygenic Risk Scores
To derive polygenic risk scores (PRS) for each of the 16 cancers, we extracted previously associated variants by searching the National Human Genome Research Institute (NHGRI)-European Bioinformatics Institute (EBI) Catalog of published GWAS. For every eligible GWAS, both the original primary manuscript and the supplemental materials were reviewed. Additional relevant studies were identified by examining the reference section of each article and via PubMed searches for other studies in which each article had been cited. We abstracted all autosomal variants with minor allele frequency (MAF) ≥0.01 and P<5×10^-8 identified in populations of at least 70% European ancestry and published by June 2018, with the exception of one colorectal cancer GWAS 39 (published in December 2018). For inclusion in the PRS, we preferentially selected independent SNPs (LD r² <0.3) with the highest imputation score, and we excluded SNPs with allele mismatches or MAF differences >0.10 relative to the 1000 Genomes reference population, as well as palindromic SNPs with MAF≥0.45. For associations reported in more than one study of the same ancestry and phenotype, we selected the one with the most information (i.e., which reported the risk allele and effect estimate) and the smallest p-value. Further details of the PRS development approach, including a list of source studies, are described by Graff et al 16.
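The variant-level inclusion rules can be expressed as a simple filter. The sketch below assumes a hypothetical `GwasHit` record; it covers the significance, MAF, allele-matching, MAF-discordance, and palindromic-SNP rules, but not LD pruning, imputation-score selection, or de-duplication across studies, and it compares alleles literally without resolving strand flips.

```python
from dataclasses import dataclass

PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class GwasHit:
    # Hypothetical record for one abstracted GWAS association
    rsid: str
    effect_allele: str
    other_allele: str
    p_value: float
    maf_gwas: float               # MAF reported by the source GWAS
    maf_reference: float          # MAF in the 1000 Genomes reference population
    reference_alleles: frozenset  # alleles observed in the reference panel


def keep_hit(hit, p_max=5e-8, maf_min=0.01, max_maf_diff=0.10,
             palindromic_maf_max=0.45):
    """Apply the variant-level inclusion rules described in the text."""
    alleles = frozenset({hit.effect_allele, hit.other_allele})
    if hit.p_value >= p_max or hit.maf_gwas < maf_min:
        return False  # not genome-wide significant, or too rare
    if alleles != hit.reference_alleles:
        return False  # allele mismatch with the reference panel (no strand flip)
    if abs(hit.maf_gwas - hit.maf_reference) > max_maf_diff:
        return False  # MAF discordant with the reference population
    if alleles in PALINDROMIC_PAIRS and hit.maf_gwas >= palindromic_maf_max:
        return False  # A/T or C/G SNP with MAF near 0.5: strand is ambiguous
    return True
```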
We considered three approaches for combining risk variants in the PRS. First, we used standard PRS weights, corresponding to the log odds ratio (β) for each risk allele, where $\mathrm{SNP}_i$ denotes the number of copies of the risk allele at variant $i$ of $k$:
$$\mathrm{PRS}_{\beta} = \beta_1\,\mathrm{SNP}_1 + \beta_2\,\mathrm{SNP}_2 + \cdots + \beta_k\,\mathrm{SNP}_k$$
We compared this to an unweighted score corresponding to the sum of the risk alleles, which is equivalent to assigning all variants an equal weight of 1:
$$\mathrm{PRS}_{\mathrm{unweighted}} = \mathrm{SNP}_1 + \mathrm{SNP}_2 + \cdots + \mathrm{SNP}_k$$
Lastly, we applied inverse variance (IV) weights that incorporated the standard error (SE) of each SNP log(OR) to account for uncertainty in risk allele effect sizes and down-weight the contribution of variants with less precisely estimated associations (weights are provided in Supplementary Data 1). Each PRS was standardized across the entire analytic cohort to have a mean of 0 and a standard deviation (SD) of 1.
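A minimal sketch of the three weighting schemes and the standardization step follows. The inverse-variance weight used here, β/SE², is an illustrative assumption rather than the published formula (the exact weights are given in Supplementary Data 1). Because each score is standardized, multiplying all weights by a positive constant leaves the final PRS unchanged.

```python
from statistics import mean, pstdev


def raw_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0-2) for one individual."""
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 across the analytic cohort."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def three_prs(dosage_matrix, betas, ses):
    """Standardized beta-weighted, unweighted and inverse-variance PRS.

    dosage_matrix: one list of risk-allele dosages per individual.
    The IV weight beta / SE**2 is a hypothetical choice for illustration.
    """
    weightings = {
        "beta": list(betas),
        "unweighted": [1.0] * len(betas),
        "iv": [b / se ** 2 for b, se in zip(betas, ses)],
    }
    return {
        name: standardize([raw_score(row, w) for row in dosage_matrix])
        for name, w in weightings.items()
    }
```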

Development of risk models for each cancer
Cancer-specific prediction models consisting of four classes of risk factors were developed: i) demographic factors (age and sex); ii) family history of cancer in first-degree relatives; iii) modifiable risk factors; and iv) genetic susceptibility, represented by the PRS. Family history of cancer was derived based on self-reported illnesses in non-adopted first-degree relatives, which only listed cancers of the prostate, breast, bowel, or lung. In addition to these four cancer sites, family history of breast cancer was included as a predictor for ovarian cancer 40,41 . Models for pancreatic cancer included a composite variable for family history of cancer at any of these four sites 42,43 . Selection of modifiable risk factors was informed by literature review and reports, such as the European Code Against Cancer 44 , with an emphasis on risk factors that are likely to have a causal role. Final models included established environmental and lifestyle-related characteristics that were collected for the entire UK Biobank cohort (Supplementary Table 1).
Cause-specific Cox proportional hazards models were used to estimate hazard ratios (HR) and corresponding 95% confidence intervals (CI) for genetic and lifestyle factors associated with each incident cancer. Death from any cause other than cancer site-specific mortality was treated as a competing event.
Information on primary and contributing causes of death was used to identify cancer site-specific mortality. Follow-up time was calculated from the date of enrollment to the date of cancer diagnosis, date of death, or end of follow-up (January 1, 2015). For each cancer, individuals with a past or prevalent cancer diagnosis at that same site were excluded from the analysis, while individuals diagnosed with cancers at other sites were retained in the population. All models including the PRS were also adjusted for genotyping array and the first 15 genetic ancestry PCs. For the PRS, HR estimates correspond to 1 SD increase in the standardized genetic score.
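The follow-up definition can be made concrete with a small helper that returns the time at risk and an event code for the cause-specific analysis. `follow_up` is a hypothetical name, and treating a site-specific cancer death without an earlier recorded diagnosis as an event (rather than a competing death) is an assumption of this sketch.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2015, 1, 1)


def follow_up(enrolled, diagnosed=None, died=None, died_of_site_cancer=False,
              end=END_OF_FOLLOW_UP):
    """Return (years at risk, status) for one participant and one cancer site.

    status: 1 = incident cancer at the site, 2 = competing death,
    0 = censored at the end of follow-up.
    """
    exit_date, status = end, 0
    if diagnosed is not None and diagnosed <= exit_date:
        exit_date, status = diagnosed, 1
    if died is not None and died < exit_date:
        # assumption: a site-specific cancer death without an earlier recorded
        # diagnosis is counted as an event rather than a competing death
        exit_date, status = died, (1 if died_of_site_cancer else 2)
    return (exit_date - enrolled).days / 365.25, status
```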

Risk model evaluation
The predictive performance of each risk model was evaluated based on its ability to accurately estimate risk (calibration) and distinguish cancer cases from cancer-free individuals (discrimination). Calibration was assessed with a Hosmer-Lemeshow goodness-of-fit statistic modified for time-to-event outcomes 45 , and by plotting the expected event status against the observed event probability 46 across risk deciles. For rarer cancers calibration was assessed across quantiles of risk to ensure a minimum of 5 cases per group.
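As a rough illustration of grouped calibration, the sketch below computes a classical Hosmer-Lemeshow statistic on binary outcomes, starting from risk deciles and merging into fewer groups until each contains at least 5 cases. This is a simplification: it ignores censoring, whereas the analysis used a version modified for time-to-event outcomes 45, and `hosmer_lemeshow` is a hypothetical helper that returns the statistic and a per-group (n, observed, expected) table.

```python
def risk_groups(pred, n_groups):
    """Split subject indices into n_groups of near-equal size by predicted risk."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    return [order[g * n // n_groups:(g + 1) * n // n_groups] for g in range(n_groups)]


def hosmer_lemeshow(pred, observed, n_groups=10, min_cases=5):
    """Simplified Hosmer-Lemeshow statistic on binary outcomes.

    Starts from risk deciles and coarsens the grouping until every group
    contains at least `min_cases` observed events. Censoring is ignored.
    """
    groups = risk_groups(pred, n_groups)
    while n_groups > 1 and any(sum(observed[i] for i in g) < min_cases for g in groups):
        n_groups -= 1
        groups = risk_groups(pred, n_groups)
    stat, table = 0.0, []
    for g in groups:
        n = len(g)
        o = sum(observed[i] for i in g)  # observed events in the group
        e = sum(pred[i] for i in g)      # expected events in the group
        stat += (o - e) ** 2 / (e * (1 - e / n))
        table.append((n, o, e))
    return stat, table
```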
Violation of the proportionality of hazards assumption was assessed by examining the association between standardized Schoenfeld residuals and time.
We evaluated nested models, starting with the most minimal set of predictors (demographic factors), followed by models including family history of cancer and modifiable risk factors, and finally models incorporating the PRS. Risk discrimination was assessed based on Harrell's C-index, calculated as a weighted average over 1 to 5 years of follow-up, and the area under the curve (AUC) at 5 years. We also report pseudo-R² coefficients based on Royston's measure of explained variation for survival models 10. The percentile-based net reclassification improvement (NRI) 11 was used to quantify improvements in reclassification. The NRI summarizes the proportion of appropriate directional changes in predicted risk categories. For cases, any upward movement in risk category indicates improved classification and any downward movement indicates worse classification; the opposite is expected for non-cases:
$$\mathrm{NRI}_{e} = \frac{n_{U,e} - n_{D,e}}{n_{e}}, \qquad \mathrm{NRI}_{ne} = \frac{n_{D,ne} - n_{U,ne}}{n_{ne}}$$
where $n_U$ is the number of individuals up-classified, $n_D$ is the number down-classified, and $n_e$ and $n_{ne}$ are the numbers of cases (events) and non-cases (non-events). The overall NRI is the sum of the NRI in cases and the NRI in non-cases:
$$\mathrm{NRI} = \mathrm{NRI}_{e} + \mathrm{NRI}_{ne}$$
Bootstrapped confidence intervals were obtained based on 1,000 replicates.
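The reclassification calculation can be sketched as follows. The category cut-points (20th and 80th percentiles of each model's own risk distribution) are a hypothetical choice, as is the stratified resampling of cases and non-cases in the bootstrap, which guarantees that every replicate contains both groups; all function names are illustrative.

```python
import random


def percentile_categories(risk, cut_percentiles=(20, 80)):
    """Assign categories 0, 1, 2, ... using percentiles of the model's own
    predicted-risk distribution (hypothetical cut-points)."""
    ranked = sorted(risk)
    n = len(ranked)
    cuts = [ranked[min(n - 1, int(p / 100 * n))] for p in cut_percentiles]
    return [sum(r >= c for c in cuts) for r in risk]


def nri(old_risk, new_risk, event, cut_percentiles=(20, 80)):
    """Return (NRI, NRI_e, NRI_ne) for moving from the old to the new model."""
    old_cat = percentile_categories(old_risk, cut_percentiles)
    new_cat = percentile_categories(new_risk, cut_percentiles)
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for before, after, is_case in zip(old_cat, new_cat, event):
        is_case = bool(is_case)
        total[is_case] += 1
        if after > before:
            up[is_case] += 1
        elif after < before:
            down[is_case] += 1
    nri_e = (up[True] - down[True]) / total[True]      # cases should move up
    nri_ne = (down[False] - up[False]) / total[False]  # non-cases should move down
    return nri_e + nri_ne, nri_e, nri_ne


def bootstrap_nri_ci(old_risk, new_risk, event, reps=1000, seed=1,
                     cut_percentiles=(20, 80)):
    """Approximate 95% percentile interval for the overall NRI, resampling
    cases and non-cases separately."""
    rng = random.Random(seed)
    cases = [i for i, e in enumerate(event) if e]
    controls = [i for i, e in enumerate(event) if not e]
    totals = []
    for _ in range(reps):
        idx = rng.choices(cases, k=len(cases)) + rng.choices(controls, k=len(controls))
        totals.append(nri([old_risk[i] for i in idx], [new_risk[i] for i in idx],
                          [event[i] for i in idx], cut_percentiles)[0])
    totals.sort()
    return totals[int(0.025 * reps)], totals[int(0.975 * reps) - 1]
```

In this toy setting, a new model that moves both cases into the top category while moving two non-cases down yields NRI_e = 1.0 and NRI_ne = 2/8.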

Risk stratification: genetic vs. modifiable factors
For each individual, we estimated the 5-year absolute risk of being diagnosed with a specific cancer using the formula of Benichou & Gail 47, as implemented by Ozenne et al 48. Absolute risk trajectories were examined as a function of age across strata defined by genetic and modifiable risk profiles, as well as family history. Individuals in the top 20% of the PRS distribution (PRS ≥80th percentile) for a given cancer were classified as having high genetic risk, those in the bottom 20% (PRS ≤20th percentile) as having low genetic risk, and those in the middle category (>20th to <80th percentile) as having average genetic risk.
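A discrete-time sketch of the competing-risk absolute risk calculation is shown below. It assumes two fitted cause-specific Cox models have already supplied baseline cumulative-hazard increments at each event time for the cancer and for competing death, together with each person's linear predictors; it accumulates the cancer hazard at each time, weighted by the probability of being free of both events just before that time. All names and inputs are hypothetical, and `prs_stratum` simply maps a PRS percentile to the three strata used here.

```python
import math


def absolute_risk(t0, horizon, times, dh_cancer, dh_death, lp_cancer, lp_death):
    """Probability of a cancer diagnosis in (t0, t0 + horizon], in the presence
    of competing death, for someone event-free at t0.

    times: sorted event times; dh_cancer / dh_death: baseline cumulative-hazard
    increments at those times from the two cause-specific Cox models;
    lp_cancer / lp_death: the individual's linear predictors.
    """
    hr_cancer, hr_death = math.exp(lp_cancer), math.exp(lp_death)
    risk, cum_hazard_all = 0.0, 0.0
    for t, h1, h2 in zip(times, dh_cancer, dh_death):
        if t <= t0:
            continue
        if t > t0 + horizon:
            break
        # event-free probability just before t, times the cancer hazard at t
        risk += math.exp(-cum_hazard_all) * h1 * hr_cancer
        cum_hazard_all += h1 * hr_cancer + h2 * hr_death
    return risk


def prs_stratum(percentile):
    """Map a PRS percentile (0-100) to the three strata used in the text."""
    if percentile <= 20:
        return "low"
    if percentile >= 80:
        return "high"
    return "average"
```

Raising the competing-death hazard lowers the cancer risk for the same cancer hazard, which is the reason for modelling death as a competing event rather than censoring it.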
Modifiable risk factors were summarized by generating summary linear predictors (predicted log-hazard ratios) based on risk factors in Supplementary Table 1, excluding age, sex, and family history. Individuals above the median of this risk score distribution were considered to have an unfavorable modifiable risk profile. Risk trajectories in each stratum were visualized by fitting linear models with smoothing splines across individual risk estimates as a function of age. Differences in mean absolute risk at age 60 were tested using a two-sample t-test. We also tested for interaction between the 3-level ordinal PRS variable and the modifiable risk score (dichotomized at the median) in a linear model with the predicted absolute risk as the outcome.
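The summary score, median split, and comparison of mean risks at age 60 can be sketched as below. `linear_predictor`, `unfavorable_profile`, and `two_sample_t` are hypothetical helpers. The p-value uses a normal approximation to the t distribution, which is reasonable only for large strata; an exact two-sample t-test would use the t distribution with nx + ny - 2 degrees of freedom.

```python
import math
from statistics import mean, median, variance


def linear_predictor(covariates, log_hrs):
    """Summary modifiable-risk score: sum of log-HR times covariate value."""
    return sum(log_hrs[name] * value for name, value in covariates.items())


def unfavorable_profile(scores):
    """True for individuals above the median of the summary score."""
    cut = median(scores)
    return [s > cut for s in scores]


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic with a normal-approximation
    two-sided p-value (adequate only for large samples)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p
```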

Etiology: contribution of genetic vs. modifiable risk factors
The relative contribution of genetic and modifiable cancer risk factors at the population level was quantified with population attributable fractions (PAF), using the method of Sjölander & Vansteelandt 49,50, which is based on the counterfactual framework. To obtain comparable PAF estimates, the thresholds for high genetic risk and for a high burden of modifiable risk factors both corresponded to the top 20% (≥80th percentile) of each risk score distribution.
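Under the counterfactual definition, the attributable fraction at time t is one minus the ratio of the risk that would be observed if nobody were in the high-risk group to the risk actually observed. The sketch below illustrates this with model-based standardization from a fitted Cox model. It assumes the baseline survival S0(t), each individual's linear predictor, a binary indicator for membership in the top 20% of a risk score, and that indicator's coefficient are already available. It is a simplified illustration without variance estimation and is not presented as the estimator used in the paper; the function names are hypothetical.

```python
import math


def cox_risk(s0_t, lp):
    """Risk of the event by time t under a Cox model: 1 - S0(t) ** exp(lp)."""
    return 1.0 - s0_t ** math.exp(lp)


def attributable_fraction(s0_t, linear_predictors, exposed, beta):
    """Standardized attributable fraction at time t.

    s0_t: baseline survival at t; linear_predictors: each person's fitted
    log-hazard ratio; exposed: 0/1 indicator (e.g. top 20% of a risk score);
    beta: log-hazard ratio for that indicator.
    """
    observed = sum(cox_risk(s0_t, lp) for lp in linear_predictors)
    # counterfactual scenario: everyone's exposure indicator set to 0
    unexposed = sum(
        cox_risk(s0_t, lp - beta * x) for lp, x in zip(linear_predictors, exposed)
    )
    return 1.0 - unexposed / observed
```

Because the same model and population are used for both the genetic and the modifiable indicator, defining both at the top 20% keeps the two PAFs on a comparable footing, as described above.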

DATA AVAILABILITY
Low PRS corresponds to ≤20th percentile, average PRS is defined as >20th to <80th percentile, and high PRS includes individuals in the ≥80th percentile of the genetic risk score distribution. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60.

Figure 4: Predicted 5-year absolute risk trajectories for cancers where risk stratification is driven by genetic factors. Genetic risk categories are based on percentiles of the standardized polygenic risk score (PRS): low PRS corresponds to ≤20th percentile, average PRS to >20th to <80th percentile, and high PRS to ≥80th percentile. Individuals below the median of the modifiable risk factor distribution were considered to have reduced risk, whereas those above the median had elevated risk. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60, except for pre-menopausal breast cancer, where differences at age 50 were tested.

Figure 5: Predicted 5-year absolute risk trajectories for cancers where risk stratification is driven by modifiable risk factors. Genetic risk categories are based on percentiles of the standardized polygenic risk score (PRS): low PRS corresponds to ≤20th percentile, average PRS to >20th to <80th percentile, and high PRS to ≥80th percentile. Individuals below the median of the modifiable risk factor distribution were considered to have reduced risk, whereas those above the median had elevated risk. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60.
Weekly alcohol intake was derived by summing the number of drinks per week across beverage types (beer, wine, spirits) and converting the total to units of alcohol using values from the UK Composition of Foods Integrated Dataset (CoFID): https://www.gov.uk/government/publications/composition-of-foods-integrated-dataset-cofid

Hazard ratio estimates are adjusted for age at assessment (years), sex (if applicable), family history of cancer (for sites with available self-reported information), genotyping array, the first 15 genetic ancestry principal components, and any additional risk factors applicable to each cancer listed in Supplementary Table 1.
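The conversion from drinks to alcohol units is simple arithmetic. Under the UK definition, one unit is 10 ml of pure ethanol, so units = volume (ml) × ABV (%) / 1000. The sketch below applies this with hypothetical serving sizes and strengths chosen only for illustration; the study took its values from CoFID, which are not reproduced here, and the beverage keys and function names are assumptions.

```python
def units_per_drink(volume_ml, abv_percent):
    """UK alcohol units in one drink: ml x ABV(%) / 1000 (1 unit = 10 ml ethanol)."""
    return volume_ml * abv_percent / 1000.0


# Hypothetical servings for illustration only (not the CoFID values used in the study)
UNITS_PER_SERVING = {
    "beer_pint": units_per_drink(568, 4.0),
    "wine_glass": units_per_drink(175, 12.0),
    "spirits_measure": units_per_drink(25, 40.0),
}


def weekly_units(drinks_per_week):
    """Sum drinks per week across beverage types, converted to units."""
    return sum(UNITS_PER_SERVING[kind] * n for kind, n in drinks_per_week.items())
```

For example, `weekly_units({"beer_pint": 2, "wine_glass": 3})` returns about 10.8 units under these assumed servings.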

Supplementary
Harrell's C-index was calculated as a weighted average between 1 and 5 years of follow-up.
AUC values were estimated at 5 years of follow-up.
The only modifiable risk factor is air pollution, modeled here as a categorical variable corresponding to PM2.5 levels above the median.
The family history variable refers to self-reported breast, prostate, lung, or bowel cancer in a first-degree relative.

Genetic risk categories are based on percentiles of the standardized polygenic risk score (PRS): low PRS corresponds to ≤20th percentile, average PRS to >20th to <80th percentile, and high PRS to ≥80th percentile. Individuals below the median of the modifiable risk factor distribution were considered to have reduced risk, whereas those above the median had elevated risk. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60, or at age 50 for cervical and testicular cancers.

Supplementary
Note: for cervical cancer, estimates of absolute risk were derived from the Cox proportional hazards model without time-varying PRS effects.