Heritability And Genetic Correlations For Hormone-Sensitive Cancers In The UK Biobank: A Molecular Evidence of Shared Aetiology

Hormone-related cancers, including cancers of the breast, prostate, ovary, uterus, and thyroid, together account for the majority of cancer incidence worldwide. We hypothesize that hormone-sensitive cancers share common genetic risk factors that have rarely been investigated by previous genomic studies of site-specific cancers. To test this hypothesis, we analysed five hormone-sensitive cancers in the UK Biobank as a single disease. We observed that a significant proportion of variance in disease liability was explained by genome-wide single nucleotide polymorphisms (SNPs): SNP-based heritability on the liability scale was estimated as 10.06% (SE 0.70%) for the disease. Moreover, we found 55 genome-wide significant SNPs for the disease in a genome-wide association study. Our findings suggest that heritable genetic factors may be a key driver in the mechanism of carcinogenesis shared by hormone-sensitive cancers.
editions). We mapped all cancer-related ICD codes into “phecodes”, which better reflect disease coding as relevant for clinical practice 52 . We excluded participants who had self-reported having had cancer but did not have a record in the cancer registry. For participants with multiple cancer diagnoses, we used the first diagnosed cancer based on the date of diagnosis. As controls, we used participants with no report of any type of cancer based on self-report, cancer registry, or hospital inpatient data, and no benign or in situ tumours from the cancer registry. Applying criteria previously used by others 9 , we grouped cancers according to whether they were sensitive to hormonal variation, classifying cancers of the breast, endometrium, ovary, prostate, and thyroid as “hormone-sensitive cancer”. This gave 235,512 controls and 15,197 hormone-sensitive cancer cases.
Incident cancer cases were defined as those diagnosed after the baseline assessment (before the end of October 2016), and prevalent cases were those diagnosed before the baseline assessment. In total, 250,709 white Europeans were analysed in this study, including 15,197 (6.06%) hormone-sensitive cancer cases. In summary, 53.8% (n=155,392) of the study sample were women, 93.48% (n=270,014) were current alcohol drinkers, 42.5% (n=122,628) were overweight, 54.59% (n=157,690) had never smoked and 35.2% (n=101,521) were previous smokers. There were 21,973 incident cancer cases (diagnosed after baseline, during follow-up), with a median follow-up of 7.7 years (interquartile range [IQR] = 7.08 - 8.4), and 24,438 prevalent cancer cases (diagnosed before the baseline assessment).
In the analysis restricted to incident cases, the heritability estimates for non-obesity-related cancers were not statistically significant with either method (h² = 0.43%, se=0.75%, P=5.67E-01 for GREML and h² = 0.97%, se=2.50%, P=6.98E-01 for LDSC). In contrast, we observed significant but lower heritability estimates for prospective overall cancer cases (h² = 3.1%, se=0.48%, P=9.29E-11 for GREML and h² = 1.84%, se=0.72%, P=1.06E-02 for LDSC).
The observed correlation between standing height and cancer development might be explained by greater height reflecting a larger number of stem cells at risk of acquiring mutations during cell division over time, and by the circulating level of IGF-1, a major determinant of height 42 . There may, however, be other explanations. In contrast, serum oestradiol level showed a negative genetic correlation with all cases of hormone-sensitive cancers, suggesting a lowered risk. However, in the analysis restricted to prospective hormone-sensitive cancer cases, serum oestradiol exhibited a positive (although not significant) genetic correlation, consistent with the suggestion that exposure to ovarian steroids increases the risk of developing hormone-sensitive cancers 43 . The current study further revealed that the positive genetic correlations of height and IGF-1 with cancer remained significant in prospective cases, suggesting that the correlation is related to commonality within the combined cancers in their gene alteration and gene expression patterns. between nature of current study to detect a phenotypic and genetic correlation between hormone-sensitive cancers and non-cancer traits. if not only one, significantly explaining the phenotype variance in cancer. Although the estimated genetic correlations are low, they can still be used as a training set in genomic risk prediction to improve accuracy.

Genotypic Data: To control for artefacts introduced during genotyping, standard quality control (QC) measures were applied to all data sets before analysis. The genotype data in the UKB include 92,693,895 SNPs genotyped from 488,377 study participants. The QC procedure for the genotypic data operated at two levels: the individual level and the SNP level. First, at the individual level, we excluded individuals with a call rate of less than 95%, individuals who did not self-identify as being of white British ancestry, and individuals who exhibited sex inconsistencies (a mismatch between self-reported sex and genotype-determined sex) or had a putative sex chromosome aneuploidy (chromosomal anomaly). To account for relatedness (alleles shared through common ancestors), we randomly selected one individual from each pair whose genomic relationship was larger than 0.05 and excluded that individual. Furthermore, to avoid bias due to population stratification and to ensure that participants were drawn from a relatively homogeneous population, we checked population substructure using principal component (PC) analysis and excluded as population outliers individuals whose first or second PC was outside ±6 SD of the population mean. For individuals included in both the first and second releases of the UKB genotype dataset, we calculated the discordance rate between the imputed genotypes of the two versions for each SNP and each individual, and excluded those with a discordance rate of more than 0.05. Second, at the SNP level, we excluded genetic markers with an INFO score <0.6, markers that deviated significantly from Hardy-Weinberg equilibrium (HWE) (P < 1.00E-07), markers with a call rate <0.95 or MAF <0.01, and ambiguous or duplicated SNPs. Additional cohort-specific quality control measures can be found in the reference cohort-specific publications 53 .
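The SNP-level thresholds described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `SnpStats` record and its field names are hypothetical, and the removal of ambiguous or duplicated SNPs is not shown.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical field names)."""
    snp_id: str
    info: float       # imputation INFO score
    hwe_p: float      # P-value of the Hardy-Weinberg equilibrium test
    call_rate: float  # proportion of non-missing genotype calls
    maf: float        # minor allele frequency


def passes_snp_qc(snp, min_info=0.6, min_hwe_p=1e-7,
                  min_call_rate=0.95, min_maf=0.01):
    """Return True if the SNP survives the four thresholds quoted in the text."""
    return (snp.info >= min_info
            and snp.hwe_p >= min_hwe_p
            and snp.call_rate >= min_call_rate
            and snp.maf >= min_maf)


# Illustrative values only; these are not UKB data.
snps = [
    SnpStats("snp_a", info=0.98, hwe_p=0.30, call_rate=0.99, maf=0.12),
    SnpStats("snp_b", info=0.55, hwe_p=0.30, call_rate=0.99, maf=0.12),    # low INFO
    SnpStats("snp_c", info=0.98, hwe_p=1e-9, call_rate=0.99, maf=0.12),    # fails HWE
    SnpStats("snp_d", info=0.98, hwe_p=0.30, call_rate=0.99, maf=0.004),   # rare
]
kept = [s.snp_id for s in snps if passes_snp_qc(s)]
```

In practice such filters are usually applied with dedicated genetics software rather than in plain Python; the sketch only makes the inclusion logic explicit.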
To avoid systematic differences between cases and controls being interpreted as genetic variance, a more stringent quality control process was then applied to the data. This included excluding individuals with incomplete phenotype data and removing markers with a minor allele frequency of less than 1%. In this study, we used high-quality SNPs from the International HapMap Project (HapMap3), which are reliable for estimating genetic variance and covariance at the genome-wide level and feasible for more complicated analyses; there is no substantial difference between genetic variance estimated from HapMap3 SNPs and from 1000 Genomes SNPs 54 . After QC, 1,217,312 HapMap3 SNPs and 288,837 study participants remained for the analyses.

Statistical Information
For the univariate heritability estimate, we assumed a linear mixed model as follows:
$y = Xb + Za + \varepsilon$
where $y$ is the vector of the response variable (cancer status); $b$ is the vector of regression coefficients for the fixed effects; $a$ is the vector of additive genetic effects with variance $\sigma_a^2$; $\varepsilon$ is the vector of residual (environmental) effects with variance $\sigma_\varepsilon^2$; and $X$ and $Z$ are the design matrices for the fixed effects and the additive genetic effects, respectively 14 .
For the heritability estimate, the genomic relationship matrix (GRM) was constructed using PLINK software 55,56 . To estimate the univariate heritability of the subgroups of cancers, two different methods were applied. First, we used the genomic relationship matrix-restricted maximum likelihood (GREML) method, which is based on individual-level genotype data. Second, because the linkage disequilibrium score (LDSC) regression method relies on summary-level data, we computed summary statistics from the UKB individual-level genotype data. We used the pre-computed LD scores for white Europeans 57 , which are considered suitable for standard LDSC analysis in European populations, with the LDSC command-line tool. For each method, we used both incident and prevalent cases together as cases. The analyses were then repeated using prospective (incident) cancer cases only. Using the prevalence of each cancer subgroup, the observed-scale estimates were transformed to the liability scale according to Lee et al. using MTG2 software. We used a χ² statistic, which follows a chi-square distribution with 2 degrees of freedom, and Wald tests.
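The transformation from the observed scale to the liability scale can be sketched as below. This is a minimal illustration of the commonly used formula attributed to Lee et al., not the MTG2 implementation; the function name and arguments are assumptions made for the example. Here `K` is the population prevalence and `P` is the proportion of cases in the sample.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, K, P):
    """Convert an observed-scale SNP heritability to the liability scale.

    h2_liability = h2_observed * K(1-K)/z^2 * K(1-K)/(P(1-P)),
    where z is the standard normal density at the liability threshold
    that truncates a proportion K of the population.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - K)  # liability threshold for prevalence K
    z = std_normal.pdf(threshold)            # normal density at the threshold
    return h2_observed * (K * (1.0 - K) / z ** 2) * (K * (1.0 - K) / (P * (1.0 - P)))


# Illustrative call: K and P both set to the sample case proportion (6.06%);
# the observed-scale value 0.03 is made up for the example.
example = liability_scale_h2(0.03, K=0.0606, P=0.0606)
```

Because the conversion is a constant multiplier for a given `K` and `P`, the same factor can be applied to the standard error of the observed-scale estimate.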
The GREML method requires individual-level genotype data and is computationally demanding 55 . Because the sample size of the UKB is large, we randomly subdivided the dataset to shorten computing time and applied a meta-analysis approach. We first divided the samples into two groups: UKBB1 (91,472 individuals from the first release) and all other samples, named UKBB2. In UKBB2, 197,365 individuals with genotype data passed QC. We further randomly divided UKBB2 into two groups of approximately equal size (denoted UKB2A [n=98,682] and UKB2B [n=98,683]) and fitted all the models mentioned above for each group. We then meta-analysed the heritability and other related estimates from UKB2A, UKB2B, and UKBB1 using Fisher's method 58 . For UKBB2, we used the same adjustment variables as for UKBB1.
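As a rough illustration of Fisher's method, the sketch below combines P-values from independent subsets into a single test. How exactly the estimates from the three subsets were combined in the study is not detailed here, so this should be read as the textbook form of the method under that assumption; the function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k is the number of
    P-values. For even degrees of freedom the upper tail has the closed
    form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one P-value is required")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Illustrative P-values for three subsets (made-up numbers).
stat, combined_p = fisher_combined_p([0.04, 0.01, 0.20])
```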

Genome-wide association (GWAS) analysis
Recent advances in computational methods have facilitated the investigation of genetic variants and their effects on multiple complex diseases, i.e., GWAS. After estimating heritability, we therefore extended the analysis to estimate the effects of genome-wide SNPs associated with causal genes on the group of hormone-sensitive cancers as a single-trait GWAS, using a logistic regression model. The phenotype used for the GWAS was similar to that used for the SNP-based heritability estimate. In total, 15,197 hormone-sensitive cancer cases (breast, prostate, endometrial, ovarian, and thyroid cancer) and 223,207 controls were included in the GWAS analysis. As in the heritability analysis, the phenotype was adjusted for multiple variables, and significant SNPs were identified using the list of common SNPs from HapMap3. We first computed the statistical power of the study for hormone-sensitive cancers using the online GAS Power Calculator for genomic studies 59 . The power calculation was conducted assuming an additive genetic model, a 5% minor allele frequency (MAF), pair-wise LD, a 6.34% disease prevalence, a 1:1 case-to-control ratio, and a 5% level of significance. We found that the sample size for hormone-sensitive cancers was sufficient to achieve 80% statistical power under the additive genetic model applied. The power curve is provided in the supplementary file (Supplementary Fig. 2).
We performed post-GWAS analyses that involved constructing a quantile-quantile (QQ) plot for hormone-sensitive cancers in each case (all hormone-sensitive cancers vs prospective hormone-sensitive cancer cases only). We further quantified the genomic inflation factor (lambda, λ), i.e., how well the observed data points fit the expected values. The QQ plots in each case showed that the bulk of the distribution is in the lower tail of the graph.
We identified genome-wide significant SNPs for hormone-sensitive cancers using PLINK software 56 to obtain the GWAS P-values, which were plotted as a Manhattan plot using the qqman package in R. For the post-GWAS analysis of the genomic inflation factor (λ), we drew the QQ plot using the QCEWAS package in R. λ is the median of the resulting chi-square test statistics divided by the expected median of the chi-square distribution. The median of a chi-square distribution with one degree of freedom is 0.4549364, i.e., qchisq(0.5, 1) = 0.4549364. The λ value is calculated from the P-values in the output of the genome-wide association analysis. Low significant results are removed (there are more significant results than expected) to increase the lambda value. To provide better information, we rescaled the calculated lambda using the following formula 60 :
$\lambda_{1000} = 1 + (\lambda - 1) \times \dfrac{1/n_{cases} + 1/n_{controls}}{1/n_{cases,1000} + 1/n_{controls,1000}}$
where $n_{cases}$ and $n_{controls}$ are the study sample sizes for cases and controls, respectively, and $n_{cases,1000}$ and $n_{controls,1000}$ are the target sample sizes of 1000.
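The two quantities described above, λ and its rescaled version λ1000, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the QCEWAS implementation; function names are hypothetical and the rescaling uses the standard form with a target of 1000 cases and 1000 controls.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, i.e. qchisq(0.5, 1) ~ 0.4549364
EXPECTED_MEDIAN_CHISQ_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic divided by its expected median.

    Each two-sided P-value (0 < p <= 1) is converted back to a chi-square
    statistic by squaring the corresponding standard normal quantile.
    """
    chisq = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chisq) / EXPECTED_MEDIAN_CHISQ_1DF


def lambda_1000(lam, n_cases, n_controls, target=1000):
    """Rescale lambda to a study of `target` cases and `target` controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / target + 1.0 / target)
    return 1.0 + (lam - 1.0) * scale


# Illustrative rescaling with the case and control counts quoted in the text;
# the raw lambda of 1.05 is a made-up value.
rescaled = lambda_1000(1.05, n_cases=15197, n_controls=235512)
```

Rescaling shrinks the excess over 1 in large studies, which is why a λ1000 close to 1 is reported for biobank-scale samples.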
Phenotypic correlation: Estimates of phenotypic and genetic correlation were computed separately between hormone-sensitive cancer and each non-cancer trait. The phenotypic correlation was estimated as the Pearson correlation between each pair of traits, using complete observations, in R. To examine the genetic architecture further, we estimated the phenotypic correlation for the components of hormone-sensitive cancers using a leave-one-out approach.
The results are summarized and presented in Table 2.

Genetic correlation analysis
As Bivariate LDSC estimates are not biased by sample overlap, in which controls are common to both traits, and are computationally very efficient 24 , we ran the genetic correlation analysis to generate an overview of the genetic relationship between hormone-sensitive cancers and the six non-cancer subgroup traits. We then used the Bivariate GREML approach to estimate the genetic correlation between hormone-sensitive cancers and seven non-cancer traits. Further, we examined the genetic correlation between each component of hormone-sensitive cancers using a pair-wise comparison approach. The genetic correlation (r_g ± SE) was calculated using the cross-trait LD score regression method.
As most oestradiol is bound to the serum proteins sex hormone-binding globulin (SHBG) and albumin, and is therefore biologically unavailable to exert its physiological effect, the free hormone level needs to be computed. We calculated the free concentration using serum oestradiol and the concentrations of SHBG and albumin with their respective association constants K 61 .
where cFo = calculated free oestradiol; E2 = serum oestradiol level; N_E2 = 0.64×10^9 × albumin level + 1; N_SHBG = 5.55×10^4 × SHBG level; and N TOTAL = N SHBG

Leave-one-out (LOO) approach to determine the genetic correlation of hormone-sensitive cancers
The iterative leave-one-out scheme was carried out using different possible combinations of hormone-sensitive cancers in cross-trait LDSC regression. The grouped hormone-sensitive cancer phenotype comprised five distinct, heterogeneous cancers, and we created a 5-fold leave-one-out analysis covering the different possible combinations of the hormone-sensitive cancers. At each iterative step, we excluded the data for one cancer and used the remaining cancer types as the grouped hormone-sensitive cancer phenotype to compute the genetic correlation in Bivariate LDSC. These steps were completed five times. The analysis sketch map showing all possible combinations is given in Supplementary Fig. 4.
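The leave-one-out grouping can be made concrete with a short sketch that only enumerates the five combinations; the LDSC regression itself is not shown, and the labels and function name are illustrative.

```python
# The five cancers grouped as hormone-sensitive cancer in the text.
HORMONE_SENSITIVE = ("breast", "prostate", "ovarian", "uterine", "thyroid")


def leave_one_out_groups(cancers=HORMONE_SENSITIVE):
    """Return (left_out, remaining_group) pairs, one per cancer type."""
    return [
        (left_out, tuple(c for c in cancers if c != left_out))
        for left_out in cancers
    ]


groups = leave_one_out_groups()
for left_out, remaining in groups:
    # In each fold, `left_out` would be compared against the grouped
    # phenotype built from `remaining`.
    print(f"{left_out} vs grouped({', '.join(remaining)})")
```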

Gene-environment interaction
Finally, we checked for gene-environment interaction in hormone-sensitive cancers with selected traits, using Bivariate GREML and, for traits measured on a continuous scale, the GxEsum technique. The Bivariate GREML approach allows for gene-environment interactions, in contrast to the Univariate GREML model, which assumes the absence of GxE interactions. In this method, we stratified the hormone-sensitive cancer phenotype by traits regarded as environments (i.e., BMI: normal vs high; metabolic environment: favourable vs unfavourable; and sex: male vs female) to look for interactions. This approach allows us to test whether genetic effects are heterogeneous between environments, and thereby to test for gene-environment interaction.
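The test for heterogeneity between environments amounts to asking whether an estimated genetic correlation differs from 1, and the test for a genetic correlation with a non-cancer trait asks whether it differs from 0. A minimal sketch of such a Wald test, assuming the estimate is normally distributed, is given below; the function name is hypothetical and this is not the software used in the study.

```python
import math


def wald_p_value(estimate, se, null_value=0.0):
    """Two-sided Wald test of H0: parameter == null_value.

    Returns the z statistic and the two-sided P-value, computed with the
    complementary error function so that very small P-values keep precision.
    """
    z = (estimate - null_value) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Test against 0, using the IGF-1 estimate quoted in the Results
# (r_g = 8.43%, se = 1.38%).
z_igf1, p_igf1 = wald_p_value(0.0843, 0.0138)

# Test against 1, as in the GxE setting; r_g = 0.60 and se = 0.10 are
# made-up values for illustration only.
z_gxe, p_gxe = wald_p_value(0.60, 0.10, null_value=1.0)
```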
A recently proposed alternative method for quantitative traits, called GxEsum, is also able to estimate gene-environment interaction. This method builds on the LDSC approach, uses GWAS summary statistics, and has been suggested as a computationally efficient method 26 . For SNP effects modulated by a quantitative environment, the expected chi-square statistic (χ²) at variant j is a function of N, the number of individuals; M, the number of SNPs; the variance due to GxE; the variance due to residual heterogeneity or scale effects caused by residual-environment interaction (RxE); and the LD score at variant j.
The P-values in this study were calculated using the Wald test, assuming that the estimated genetic correlation was normally distributed. The statistical significance level was set at p<0.05 (2-tailed).
Reporting summary: Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability: All data will be available to approved users of the UK Biobank upon application. The authors state that all data necessary for confirming the conclusions presented in the manuscript are represented fully within the manuscript.

Results
The characteristics of participants stratified by group of cancer diagnoses are shown in Table 1.

SNP-based Heritability (SNP-h2) for Groups of Cancers
All grouped cancer (prevalent and incident) cases were included for the estimation of SNP-based heritability using individual-level data 14,15 . We also used GWAS summary statistics to estimate SNP-based heritability from summary-level data 16 . In both approaches, the estimated heritability was transformed from the observed scale to the liability scale 15 , assuming that the population lifetime prevalence of the group of cancers was the same as the proportion of cases in the sample used in this study. From the estimates in Fig. 1, it is apparent that the SNP-based heritability estimated by GREML was highest for hormone-sensitive cancers (h² = 10.06% (se=0.70%), P=2.11E-46). In addition, the SNP-based heritability was examined for overall cancers and for cancers grouped by their relation to obesity. Significant heritability estimates were also observed for the other cancer subgroupings, e.g., obesity-related cancer (h² = 5.26% (se=0.47%), P=4.56E-28); overall cancer (h² = 4.38% (se=0.31%), P=3.27E-44); non-hormone-sensitive cancer (h² = 3.06% (se=0.72%), P=2.15E-05); and non-obesity-related cancers (h² = 1.69% (se=0.48%), P=4.66E-04). The SNP-based heritability estimates using summary-level data show a similar pattern for all the subgroupings of cancer (Supplementary Table 5).
We also restricted the analysis to incident cancer cases only in the UKB. Similarly, heritability estimates on the liability scale for hormone-sensitive cancers were consistently higher than for any other group of cancers when using incident cases only (h² = 5.92%, se=1.10%, P=7.84E-08 for GREML and h² = 5.60%, se=1.58%, P=3.94E-04 for LDSC) (Supplementary Table 5).

Genome-wide common SNPs association study (GWAS) for hormone-sensitive cancers
The heritability estimates for hormone-sensitive cancers were consistently significant and higher than those of the other cancer subgroups across all methods applied on the liability scale, in both scenarios (i.e., all cancer cases and incident cancer cases only). This suggests that a significant proportion of phenotypic variation in hormone-sensitive cancer is explained by the aggregated effects of inherited genetic factors. We further carried out a GWAS using genome-wide common SNPs to identify genetic variants associated with hormone-sensitive cancer risk (see Methods).
We combined heterogeneous cancers that share a characteristic mechanism of carcinogenesis involving hormones into a single phenotype of hormone-sensitive cancer, totalling 15,197 cases (combined prevalent and incident) and 235,512 controls. Interestingly, our primary GWAS of grouped hormone-sensitive cancer uncovered 55 variants associated with the risk of hormone-sensitive cancer at the genome-wide significance level of p < 5×10⁻⁸ (Fig. 2). This analysis demonstrated the existence of shared genetic variants across the different cancer types grouped as hormone-sensitive cancers. Among these genetic variants, we replicated 36 independent SNPs associated with the risk of a specific type of cancer, such as breast, prostate, endometrial or ovarian cancer, which were identified in previous GWAS 17 Table 6).
In a GWAS analysis restricted to the 7,038 incident hormone-sensitive cancer cases (i.e., excluding prevalent cases), the number of significant SNPs was reduced from 55 to 33. Of these significant loci, 16 SNPs were located in already known susceptibility regions for hormone-sensitive cancers in the white European population, but were independent of previously reported variants. The remaining 17 SNPs were in regions previously found to be associated with hormone-sensitive cancers among white Europeans. A list of SNPs identified from the GWAS of hormone-sensitive cancers can be found in Supplementary Table 7. The genomic inflation factors were close to 1 for both GWAS analyses, with all and with incident hormone-sensitive cancer cases (λ1000 (all cases) = 1.003 and λ1000 (prospective cases) = 1.003) (Supplementary Fig. 3).
The most striking result to emerge from the phenotypic correlation data is that, although there were similar patterns of significant correlations with most of the non-cancer traits in the analysis restricted to incident cases, some estimates changed substantially. For example, we observed a substantially reduced positive phenotypic correlation between incident hormone-sensitive cancers and oestradiol level (r_p = 0.0025, se=0.0022; P=2.68E-01). Interestingly, a significant negative phenotypic correlation was observed between incident hormone-sensitive cancers and APOA1 (r_p = -0.0065, se=0.0022; P=3.83E-03). The results of these analyses are summarized in Table 2.

Genetic correlation between grouped hormone-sensitive cancers and non-cancer traits
To further explain the shared genetic architecture of grouped hormone-sensitive cancers, we first estimated the genetic correlation with the six non-cancer subgroup traits using GWAS summary statistics in Bivariate LDSC, a fast and robust method 24 , as a quick scan of the dataset (Supplementary Table 8). For the nominally significant traits, we then estimated the genetic correlation with grouped hormone-sensitive cancers using individual-level genotype data in Bivariate GREML. Interestingly, significant positive genetic correlations were observed between hormone-sensitive cancer and IGF-1 (r_g = 8.43%, se=1.38%; P=1.10E-09) and standing height (r_g = 4.32%, se=1.31%; P=9.59E-04), providing a suggestive clue to cancer aetiology in which an increase in IGF-1 level and height confers a higher risk of hormone-sensitive cancer.
Moreover, marginally significant inverse genetic correlations were observed between hormone-sensitive cancers and other non-cancer traits, namely serum oestradiol (r_g = -40.86%, se=8.60%; P=2.02E-06); calculated free oestradiol (r_g = -6.68%, se=1.60%; P=3.15E-05); SHBG (r_g = -3.33%, se=1.92%; P=8.20E-02); and diastolic blood pressure (DBP) (r_g = -4.40%, se=2.116%; P=3.74E-02) (Fig. 3).
In an analysis restricted to incident cancer cases, we observed a non-significant but positive genetic correlation for serum oestradiol (r_g = 17.08%, se=14.56%; P=2.41E-01) (Fig. 4), contrary to the negative genetic correlation estimate obtained when all combined cases were analysed together (Fig. 3). This suggests that the genetic effects of oestradiol may be positively correlated with the genetic risk of incident hormone-sensitive cancer 25 ; after the onset of hormone-sensitive cancer, however, the genetic association may be driven by a totally different mechanism, resulting in a negative genetic correlation. For standing height (r_g = 9.01%, se=1.97%; P=4.93E-06) and IGF-1 (r_g = 12.13%, se=2.50%; P=1.31E-06), the direction of the estimated genetic correlation was consistently positive whether using all cases (Fig. 3) or incident cases only (Fig. 4). Apolipoprotein A (r_g = 11.16%, se=2.58%; P=1.55E-05) appeared to have a significant negative genetic correlation when using incident cases only, which differed from the result obtained with all cases, implying a tumour-suppressive role of Apolipoprotein A in the development of incident hormone-sensitive cancer. Compared to all cases, we further noted a slightly significant and higher estimate of negative genetic correlation for calculated free oestradiol (r_g = -8.86%, se=2.80%; P=1.57E-03); SHBG (r_g = -8.78%, se=2.73%; P=1.32E-03); and educational status (r_g = -11.95%, se=4.85%; P=1.39E-02) for prospective cases. For diastolic blood pressure (r_g = -2.06%, se=2.82%; P=4.62E-01), a similar non-significant negative genetic correlation was observed, even though the analyses for non-cancer traits were restricted to individuals who did not have cancer at baseline (Fig. 4).
In the analyses of genetic correlation using summary statistics in the UKB, the estimates, although not statistically significant, mostly agreed with the individual-level data estimates. The estimates of genetic correlation using summary statistics in Bivariate LDSC are presented in Supplementary Table 8.

Genetic Correlation between Cancers
We further quantified the genetic correlation among the specific types of cancers in the group of hormone-sensitive cancers to examine their shared genetic architecture. We used Bivariate LDSC, which is computationally efficient and not biased by sample overlap between two sets of case-control data that share controls 24 . In the pair-wise comparison, we observed positive genetic correlations between colorectal cancer and cancer of the kidney (r_g = 0.3712, se = 0.2965) and between female breast cancer and uterine cancer (r_g = 0.3211, se = 0.1990), although they were not significantly different from zero. We also found negative, but non-significant, genetic correlations between prostate cancer and colorectal cancer (r_g = -0.1073, se = 0.1314) and between uterine cancer and multiple myeloma (r_g = -0.1474, se = 0.5053) (Table 3). Although none of the estimated genetic correlations was significantly different from zero, i.e., there was no significant linear correlation between the cancers, most estimates were significantly different from 1 or -1, indicating that these types of cancers are genetically heterogeneous.
The positive genetic correlations are colorectal cancer with cancer of the kidney, and female breast cancer with uterine cancer. The negative genetic correlations are prostate cancer with colorectal cancer, and uterine cancer with multiple myeloma. The estimates with their standard errors (r_g ± se) were obtained using cross-trait LDSC regression.
Leave-one-out (LOO) analysis approach for hormone-sensitive cancers
We conducted an iterative leave-one-out (LOO) analysis involving different combinations of hormone-sensitive cancers (Supplementary Fig. 4).
There was a significant modest genetic correlation in the leave-one-out analysis between each component of the hormone-sensitive cancers. For example, we observed a modest positive genetic correlation between female breast cancer and grouped hormone-sensitive cancer without female breast cancer (r_g = 0.1662, se=0.0930); between prostate cancer and grouped hormone-sensitive cancer excluding prostate cancer (r_g = 0.2209, se=0.1101); and between uterine cancer and grouped hormone-sensitive cancers without uterine cancer (r_g = 0.3487, se=0.1889). Because the number of ovarian and thyroid cancer cases was not sufficient for Bivariate LDSC regression analysis, we excluded these two hormone-sensitive cancers from the leave-one-out analysis.
We further carried out the genetic correlation analysis between grouped hormone-sensitive cancers and other non-hormone-sensitive cancers in the UKB.
The genetic correlation between multiple myeloma and hormone-sensitive cancers excluding breast cancer was positive (r_g = 0.1926, se=0.2295). We observed a higher genetic correlation between hormone-sensitive cancer without prostate cancer and colorectal cancer (r_g = 0.3061, se=0.1597). Hormone-sensitive cancers without uterine cancer also showed a positive genetic correlation with colorectal cancer (r_g = 0.1666, se=0.1229). None of the genetic correlations estimated here was statistically significant, probably due to lack of power. Taken together, while these estimated genetic correlations suggest a common pathway in the aetiology of hormone-sensitive cancer, there is significant evidence of genetic heterogeneity among the cancer types (Table 4).
The hormone-sensitive cancer type includes five cancers, namely female breast, prostate, ovarian, uterine, and thyroid cancer. Hormone-sensitive-1 excludes female breast cancer; Hormone-sensitive-2 excludes prostate cancer; Hormone-sensitive-3 excludes ovarian cancer; Hormone-sensitive-4 excludes uterine cancer; Hormone-sensitive-5 excludes thyroid cancer. Estimates for ovarian and thyroid cancer are not given because the number of cases for these two cancers was not sufficient, suggesting that the data for these two cancers are not suitable for LDSC regression.

Gene-environment interaction (GxE) for selected environmental traits
Finally, we investigated gene-environment interaction, using hormone-sensitive cancer as the main phenotype and metabolic health-related traits as environmental variables. Note that we used prospective cases only for this gene-environment interaction analysis. The hormone-sensitive cancer phenotype was adjusted for multiple variables, including assessment centre, batch effect, birthplace, age, sex, educational status, the first 10 principal components, smoking status, alcohol consumption, and TDI. Depending on the characteristics of each environmental variable, we applied the Bivariate GREML or the GxEsum method 26 . Baseline BMI, categorized as normal or higher based on the World Health Organisation (WHO) BMI threshold recommendations 27 ; metabolic markers, classified as a favourable or unfavourable metabolic environment from the metabolic subgroup analysis in the UKB using data-driven machine-learning analysis 28 ; and sex, as a discrete variable, were analysed in Bivariate GREML. This Bivariate GREML analysis was applied to detect the interaction using individual-level measurements in the UKB.
In the Bivariate GREML analysis, which requires individual-level genotype data, sex, BMI, and metabolic environment were each included as an environment to detect their role in the aetiology of hormone-sensitive cancers. Using the GREML method for BMI classified as normal or higher, we found significant evidence of GxE interaction, as the genetic risk of hormone-sensitive cancer was heterogeneous between the two environments: the estimated genetic correlation was significantly different from 1 (P-value = 6.00E-05 in Table 5). Likewise, the estimated genetic correlation between favourable and unfavourable metabolic environments was also significantly different from 1 (P-value = 1.87E-03), indicating a significant GxE interaction. Although there was significant heterogeneity between males and females in the genetic risk of hormone-sensitive cancers when sex was included as an environment, the observed genetic heterogeneity may not be due to gene-by-sex (GxSex) interaction, given that the grouped phenotype is made up of distinct cancer types, each of which is predominantly or exclusively a female or male cancer. Therefore, the finding reflects the genetic heterogeneity between sex-specific cancers (as shown in Table 5) that results from combining diverse cancers, and it is not conclusive that the genetic risk of hormone-sensitive cancers is modulated by sex as an environment.  Table 9).

Discussion
A growing number of population-based genomic studies have emphasised the role of hormones and their metabolites in modifying gene-phenotype pathways of cancers 8. In the current study, we estimated SNP-based heritability, conducted a GWAS of grouped hormone-sensitive cancer, and estimated phenotypic and genetic correlations with non-cancer traits in a large contemporary cohort. This study confirms that genome-wide common SNPs contribute a substantial proportion of the phenotypic variance of hormone-sensitive cancers. In contrast, a relatively small proportion of phenotypic variance is captured by genome-wide common SNPs for non-hormonal cancers. Applying a cross-cancer GWAS approach to hormone-sensitive cancers, we identified multiple genome-wide significant SNPs with effects shared between hormone-sensitive cancers. Interestingly, there was also significant genetic heterogeneity among hormone-sensitive cancers, i.e., the estimated genetic correlation for a pair of hormone-sensitive cancers was significantly different from 1. We also found that hormone-sensitive cancer status was significantly associated with non-cancer traits, e.g., IGF-1 and height, suggesting a role for these non-cancer traits in the complex biology of cancer.
In the current study, we applied the GREML and LDSC methods of estimating heritability, and the GREML estimates were higher than the LDSC estimates. The GREML analysis on the liability scale showed that about 10% of the phenotypic variability in hormone-sensitive cancer is due to genetics, further suggesting the existence of shared underlying biology for the combined hormone-sensitive cancers. This further suggests that previous independent site-specific cancer heritability estimates explain a small fraction of the shared heritability, and that a fraction of this heritability can be explained by genome-wide common SNPs without the need for other variants, such as structural and rare variants from whole-exome and whole-genome sequencing. In contrast to earlier findings, however, our heritability estimate is substantially lower than summary statistics-based estimates for each site-specific hormone-sensitive cancer, which range from 7% (ovarian) to 27% (prostate) on the liability scale 29. There are two likely causes for this discrepancy. First, the difference could be attributed to the genetic heterogeneity of the combined cancers, as evidenced by our genetic correlation estimates between cancers; a reduced heritability is expected when genetically heterogeneous cancers are grouped as a single trait. Second, the discrepancy can be explained in part by the difference in the level of information used: our estimate is based on individual-level data from the UKB, whereas previous studies used GWAS summary statistics with a greater number of cases, which may have contributed to their higher heritability estimates. Although our estimates are low compared with those for the site-specific cancers, our findings provide a comprehensive analysis that suggests a thorough reconsideration of cancer classification in light of shared biological mechanisms of carcinogenesis.
The analytical performance of GWAS is highly dependent upon the size of the cohort and the degree of phenotypic similarity of the combined traits 30. Cross-trait GWAS has therefore recently been adopted to identify common factors of interest in precision medicine. Examples include the identification of genetic susceptibility loci for inflammatory bowel disease, mostly shared between Crohn's disease and ulcerative colitis 31, and loci shared among five major psychiatric disorders, generating quantified molecular evidence for the need to investigate common pathophysiology for related disorders 32,33. Despite overwhelming success in other medical fields, cross-trait analysis has not been widely applied in cancer genetics. Furthermore, GWAS of cancer to date have identified many independent cancer susceptibility variants. When these variants are combined into polygenic risk scores, they explain a small fraction of the heritability of cancer and show differential associations by tumour subtype. However, only a few studies have combined some site-specific hormone-sensitive cancers 10,12. Therefore, when cross-trait effects exist, the current study has important implications for systematically integrating the phenome-wide data available for genetic association analysis, with improved statistical power to detect significant genetic loci for meaningful biological interpretation.
Most cancer genomics research is focused on somatic events, such as acquired mutations, but increasing evidence suggests that germline variants play a significant role in cancer risk prediction 34 and may also inform decisions about cancer-directed therapy 35. Detecting common genetic variants across major cancers that share similar aetiological pathways, as in the current study, should therefore facilitate our understanding of the possible shared genetic basis of these cancers and help to develop more optimised diagnostic criteria. Our multi-trait GWAS analysis can be used to look for germline variants, to understand how specific genetic variants may contribute to a broad spectrum of illness, and to provide information about the degree to which these disorders may have shared genetic risk factors. To the extent that these genes may have broad effects, they could be potential targets for developing new treatments that might help treat multiple cancer conditions. In agreement with our findings, previous studies have implicated these genes in liability to each site-specific cancer in different populations 36,37,38,39. This supports the implementation of such combined analyses, which provide more insight into the complex pathways underlying hormone-sensitive cancer biology, with molecular evidence of shared genetic risk factors of the kind seen in previous studies of major psychiatric and inflammatory disorders. This molecular evidence of shared genetic influence in hormone-sensitive cancers can be extended to design public health interventions that address multiple cancers at an affordable cost in genetic screening.
Epidemiological studies have identified associations between height, IGF-1, oestradiol, and cancer incidence that provide clues to cancer aetiology. The role of IGF-1 in cancer risk has been further established in work deciphering the mechanism linking height to cancer 40,41. In the current study, we found a positive genetic correlation of IGF-1 and standing height with all cases of hormone-sensitive cancers, which suggests that greater height and higher IGF-1 concentration are genetically correlated with the development of hormone-sensitive cancer. The observed correlation between standing height and cancer development might be explained by greater height reflecting a larger number of stem cells at risk of acquiring mutations during cell division over time, and by the circulating level of IGF-1 being a major determinant of height 42. There may, however, be other possible explanations. In contrast, serum oestradiol level showed a negative genetic correlation with all cases of hormone-sensitive cancers, suggesting a lowered risk. However, in the analysis restricted to prospective hormone-sensitive cancer cases, serum oestradiol exhibited a positive (although not significant) genetic correlation, suggesting that exposure to ovarian steroids increases the risk of developing hormone-sensitive cancers 43. The current study further revealed that the genetic correlations of height and IGF-1 with cancer remained positive and significant in prospective cases, suggesting that the correlation is related to commonality within the combined cancers in their gene alteration and gene expression patterns.
For the remaining traits, contrary to expectations, this study did not find a statistically significant genetic correlation between non-cancer trait subgroups and hormone-sensitive cancers. Previous epidemiological studies have suggested a correlation between non-cancer traits and specific hormone-sensitive cancers. This does not appear to be the case in our analysis. The observed low correlation can be explained in part by the current study being underpowered to detect phenotypic and genetic correlations between hormone-sensitive cancers and non-cancer traits. Non-genetic factors could therefore be a major reason, if not the only one, for the phenotypic variance in cancer. Although the estimated genetic correlations are low, the correlated traits could still be used in training sets for genomic risk prediction to improve accuracy.
In genomic risk prediction, slightly increased prediction accuracy has been observed when traits were combined as a single trait 44,45. This suggests that substantial improvements in predictive power are attainable using training sets of combined cancers for which there is molecular evidence of a shared genetic contribution.
Apart from considering the correlation between variables, detecting interaction with the environment may have important implications for clinical care 46,47. Globally, the incidence of cancer has been steadily increasing over the past decades, mirroring an increase in the prevalence of obesity 1. The genetic effects on hormone-sensitive cancers can be modulated by obesity. We therefore sought to estimate gene-environment interaction to shed light on the relationship between modifiable environmental risk factors, such as BMI, and hormone-sensitive cancers. We found a significant interaction between genetic factors and adiposity-related environments, suggesting that these factors modulate the development of hormone-sensitive cancers. The nominally significant GxSex interaction cannot be fully attributed to an interaction between genes and sex, since it might have occurred as a result of the unequal distribution of hormone-sensitive cancer cases by sex, i.e., the majority of the grouped cancers are female-dominated cancer types. Although the combined cancers demonstrated a shared aetiology, the pairwise genetic correlations showed that they are heterogeneous, i.e., the five hormone-sensitive cancers have their own unique pathogenic variants besides the shared genes. There is also further heterogeneity within the site-specific cancers. Endometrial cancer, for example, is a heterogeneous cancer that is believed to have two biologically different subtypes with different mechanisms of tumorigenesis and disease progression 48.
A major strength of the present study is that it groups a larger number of hormone-sensitive cancers to better understand the complex underlying pathways of the disease biology, whereas previous studies focused on each site-specific hormone-sensitive cancer independently. Further, information on non-cancer traits was drawn from the large UKB dataset. This study offers significant insights into the heritability estimates of hormone-sensitive cancer. However, our findings should be interpreted in light of several limitations. First, participants in the UK Biobank are restricted to middle and old age and are not representative of the general population on a variety of sociodemographic, lifestyle, and health-related characteristics, with evidence of a "healthy volunteer" selection bias 49. Second, while the total sample size was large for the grouped cancer, the number of cases for some specific hormone-sensitive cancers (e.g., uterine and thyroid) could be limited, resulting in large standard errors in the genetic correlation analysis. Further studies with a larger sample size for each cancer are therefore warranted to validate our results. Third, in reporting heritability on the liability scale, we assumed that the population prevalence of the disease trait is identical to the observed sample prevalence, but the prevalence of diseases such as cancer in the UKB is often lower than the population prevalence because the dataset is not representative of the UK population 49. Finally, the present study was conducted in a population of European genetic ancestry, so the generalisability of our findings to other ancestry groups is limited.
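The third limitation can be made concrete with the widely used transformation of SNP-based heritability from the observed (0/1) scale to the liability scale, which depends on both the population prevalence K and the sample prevalence P. The sketch below is illustrative only: the function name is hypothetical, and it is an assumption that this exact form of the transformation was used for the estimates reported here.

```python
from statistics import NormalDist


def liability_scale_h2(h2_obs, pop_prev, sample_prev):
    """Convert observed-scale heritability to the liability scale.

    h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P)),
    where K is the population prevalence, P the sample prevalence, and
    z the standard normal density at the liability threshold for K.
    """
    for name, value in (("pop_prev", pop_prev), ("sample_prev", sample_prev)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)  # liability threshold
    z = std_normal.pdf(threshold)                   # density at the threshold
    k_term = (pop_prev * (1.0 - pop_prev)) ** 2
    p_term = sample_prev * (1.0 - sample_prev)
    return h2_obs * k_term / (z ** 2 * p_term)


# Sample prevalence from the study: 15,197 cases among 250,709 participants
sample_prev = 15197 / 250709
# h2_obs = 0.02 and pop_prev = 0.10 are made-up values for illustration only
same_prev = liability_scale_h2(0.02, pop_prev=sample_prev, sample_prev=sample_prev)
higher_pop_prev = liability_scale_h2(0.02, pop_prev=0.10, sample_prev=sample_prev)
```

Under this formula, setting K equal to P when the true population prevalence is higher than the sample prevalence yields a smaller liability-scale estimate than would be obtained with the true K, which is the direction of bias implied by the limitation above.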
In conclusion, we show that common genetic factors play a part in the mechanism of carcinogenesis shared by hormone-sensitive cancers, as evidenced by the substantial SNP-based heritability and the 55 genome-wide significant variants found when combining multiple hormone-sensitive cancers as a single disease. Despite these common genetic factors, we also observed significant genetic heterogeneity between hormone-sensitive cancers. These findings have implications for future research into the complex biological pathways of carcinogenesis, which may open new opportunities for early detection of hormone-sensitive cancers in precision health.

Figure 3
Genetic correlation between all hormone-sensitive cancers [Prospective and Incident] and non-cancer traits using Bivariate GREML in the UK Biobank.
Abbreviations: SHBG: sex hormone-binding globulin; IGF-1: insulin-like growth factor 1. Error bars indicate the 95% CI of the estimates.