Considering hormone-sensitive cancers as a single disease in the UK biobank reveals shared aetiology

Hormone-related cancers, including cancers of the breast, prostate, ovaries, uterus, and thyroid, globally contribute to the majority of cancer incidence. We hypothesize that hormone-sensitive cancers share common genetic risk factors that have rarely been investigated by previous genomic studies of site-specific cancers. Here, we show that considering hormone-sensitive cancers as a single disease in the UK Biobank reveals shared genetic aetiology. We observe that a significant proportion of variance in disease liability is explained by the genome-wide single nucleotide polymorphisms (SNPs), i.e., SNP-based heritability on the liability scale is estimated as 10.06% (SE 0.70%). Moreover, we find 55 genome-wide significant SNPs for the disease, using a genome-wide association study. Pair-wise analysis also estimates positive genetic correlations between some pairs of hormone-sensitive cancers although they are not statistically significant. Our finding suggests that heritable genetic factors may be a key driver in the mechanism of carcinogenesis shared by hormone-sensitive cancers.

A few other issues:
- It is not clear to this reviewer that all their "hormone-sensitive cancers" were actually hormone sensitive; for example, were triple-negative breast cancers separated out from ER/PR+ breast cancers?
- Although the methods and rationale of the heritability estimates, and GREML analysis, are very well described in the Supplementary Note, it would be helpful to incorporate some of this into the Results section, when introduced, to clarify why these are being done; for example, something like "GREML is a statistical method that estimates the amount of variance in one or more phenotypes attributable to a collection of genetic polymorphisms; therefore, we applied it to address ...", instead of just launching into its use in the Results section (this journal is not a statistical genetics journal, but rather has a broad, general audience, that will likely not have expertise in these methods).
In summary, this work is very interesting and promising, but requires further development before I could recommend it for publication in this journal, and should be submitted to a journal more focused on cancer or statistical genetics.

Reviewer #2 (Remarks to the Author):
This is an interesting manuscript that describes findings from evaluating 1) the genetic basis of hormone-sensitive cancers by treating five cancers that are sensitive to hormone levels (breast, uterine, ovarian, prostate, and thyroid cancer) as a single disease and 2) the genetic relationship of hormone-sensitive cancers with other cancer types and cancer-related factors. The authors evaluated the heritability of overall hormone-sensitive cancers and compared it to the heritability of overall cancer, obesity-related cancer, and finally the site-specific hormone-sensitive cancers from previous studies. A GWAS of this hormone-sensitive cancer phenotype was performed and 55 genome-wide significant SNPs were reported. Phenotypic and genetic correlations were calculated between hormone-sensitive cancers and cancer-related factors. Pairwise genetic correlations between breast, prostate, uterine, and three other cancer types were also reported. Genetic correlations calculated using a leave-one-out approach were reported to show the genetic relationship of each of the five hormone-sensitive cancers with the rest of the cancers and three other cancer types. Finally, gene-environment interaction analyses were performed by treating the hormone-sensitive cancers as the phenotype and sex, BMI, and metabolic environment as environmental variables. This is a comprehensive study of the genetics of hormone-sensitive cancers. The findings provide insights into the shared genetics of hormone-sensitive cancers at both the genome-wide and SNP levels. One strength is that the analyses were performed on all cases and incident cases separately; comparing the results from these two sets of analysis may tell us more about the mechanism underlying the observed genetic correlation. However, I suggest the authors further improve the methods as well as the overall writing and structure of the main text, tables, and figures to strengthen the validity of the study.

Some specific points:
1. Maybe I missed this, but the number of cases for each hormone-sensitive cancer was not provided in the main text or tables. If treating all hormone-sensitive cancers as a single disease (if I understand this correctly, subjects diagnosed with any hormone-sensitive cancer are cases, cancer-free subjects are controls, and GWAS was performed on this new phenotype), I would expect the relative proportion of cancer cases for each cancer type to have a substantial influence on the results. One extreme case is that if all cases are breast cancer cases, the GWAS results would not reflect any underlying genetic variations of other cancers (i.e., not reflecting common effects *shared* by all cancers). If the case numbers are small for some cancer types, e.g., ovarian and thyroid, then the genetics of those cancer types may not be reflected by studying this overall phenotype.
2. Instead of treating the five hormone-sensitive cancers as a single disease, it might be better to perform a meta-analysis of the single-trait GWAS (e.g., using MTAG) to identify the shared genetic variations across these hormone-sensitive cancers and reveal any shared mechanism. In addition to listing the identified SNPs in the supplementary table, it would be helpful to show the identified independent loci in a main table, along with their effects on each of the hormone-sensitive cancers.
3. Line 71-72: "When treating multiple hormone-sensitive cancers as a single disease, estimated SNP-based heritability can inform if the common germline variants contribute to the carcinogenic risk shared between hormone-sensitive cancers."
Again, I think the heritability of this single disease may not capture the shared genetics across these hormone-sensitive cancers very well. It reflects the genetic variation of being diagnosed with any of the hormone-sensitive cancers, but not necessarily shared by these cancers. Instead, the genetic correlations between these cancers may reflect the shared genetics at the genome-wide scale; shared susceptibility loci may reflect common mechanisms.
4. Table 3 doesn't include thyroid and ovarian cancer, which seems like a sample size issue as the authors described in the leave-one-out analysis. But it is not clear why the other three cancer types were selected and included (i.e., why select these cancer types but not others and what is the motivation of including other cancer types). Please clarify in the main text.

Some other comments:
1. The title of this manuscript sounds like it is describing the heritability of hormone-sensitive cancers and the genetic correlations between these cancers. However, it seems like the main focus of the main text is the genetic correlations of these hormone-sensitive cancers with other cancer types and cancer-related factors, but not between these cancers (at least it was not estimated for all hormone-sensitive cancers). Also, genetic correlation was not mentioned in the abstract.
2. The use of prospective vs. retrospective and incidence vs. prevalence cases is mixed in the text and tables. These terms should be defined clearly at the beginning of results and the usage should be consistent in text and tables.
3. "Female specific factors [age at menopause]" was mentioned in line 151, but Table 2 reports "Women Factors [Menopausal Status]". Please clarify. It's better to name this category as "Menstrual and reproductive factors" or just "Menstrual factors".
4. Table 4 is difficult to read: it might be better to replace 'Hormone-sensitive-X' with 'Excluding XXX' or something else that is easy to understand with only the information in the table.
5. In methods, the five selected hormone-sensitive cancers are cancers of the breast, endometrium, ovary, prostate, and thyroid, but in other parts of the manuscript, it seems that 'uterine cancer' was used to refer to 'endometrial cancer'; please be consistent.
6. Figures 3 and 4. It seems that the values are percentages but there is no annotation.

1.
Overall, an interesting analysis of hormone-sensitive cancers, to evaluate for common genetic factors in carcinogenesis of these tumors, compared with non-hormone sensitive cancers. Many cancers share some genetic mechanisms in common, but the authors are hypothesizing that hormone-sensitive carcinogenesis per se has a unique set of common factors. However, they don't provide a clear rationale for this. For example, why would an estrogen-sensitive cancer have factors in common with an androgen-sensitive cancer, or a TSH-sensitive cancer? Do they have common target genes? or common transcriptional co-factors? etc. (in other words, why should they be considered, in this analysis, as a single disease?). There are potentially good reasons for hypothesizing this, but the authors don't develop this idea, either in the introduction or discussion; and they don't relate their findings to that central question; for example, do the genes covered by the 55 SNPs have known interactions between different hormone-sensitive signaling pathways, but not (presumably) signaling pathways involved in non-hormone-sensitive cancers? Why would FGF2, or POU5F1B, etc., be links specifically between hormone-sensitive cancers?
Authors' response: We thank the reviewer for this excellent suggestion. As suggested, we have included the rationale for combining five types of hormone-sensitive cancers as a single disease, considering the common target genes and transcriptional cofactors, in the introduction section of the revised manuscript (lines 61-72).
The revised content now reads as:

2.
Overall, the statistical analyses are valid.
Authors' response: Thank you for your positive assessment of our statistical analyses.
A few other issues:
3. It is not clear to this reviewer that all their "hormone-sensitive cancers" were actually hormone sensitive; for example, were triple-negative breast cancers separated out from ER/PR+ breast cancers?
Authors' response: We thank the reviewer for this valid point. We could not further classify pathologic-based subtypes of breast cancer, including estrogen-receptor [ER] positive/negative, progesterone-receptor [PR] positive/negative, and triple-negative because the information on these subtypes of breast cancer was not available to us.
Triple-negative breast cancer (TNBC) is characterized by negative tests for estrogen receptors, progesterone receptors, and excess human epidermal growth factor receptor 2 (HER2) protein, and accounts for approximately 15% of breast cancers diagnosed worldwide 1, implying that TNBC is a relatively rare subtype. Furthermore, TNBC is more commonly diagnosed in women younger than 40 years 2, and the breast cancer cases included in our analyses are mainly cases of postmenopausal breast cancer, which is an obesity-related cancer [as mentioned in the Methods section, lines 464-468, page 19, hormone-sensitive cancer cases are a subset of obesity-related cancer cases, namely postmenopausal breast, uterine, ovarian, prostate, and thyroid]. Therefore, it is likely that the number of TNBC cases in our analyses is negligible 3 and has had little influence on our findings. Nonetheless, we agree that this is a limitation of our study and explicitly discuss it in the text (lines 435-437). The limitation is stated as: "Further, we did not classify subtypes of breast cancer, such as estrogen receptor [ER] positive/negative, progesterone receptor [PR] positive/negative, and triple-negative breast cancers, in our analyses because no such information was available to us."

4.
Although the methods and rationale of the heritability estimates, and GREML analysis, are very well described in the Supplementary Note, it would be helpful to incorporate some of this into the Results section, when introduced, to clarify why these are being done; for example, something like "GREML is a statistical method that estimates the amount of variance in one or more phenotypes attributable to a collection of genetic polymorphisms; therefore, we applied it to address ...", instead of just launching into its use in the Results section (this journal is not a statistical genetics journal, but rather has a broad, general audience, that will likely not have expertise in these methods).

Authors' response:
Thank you for this helpful suggestion. The details of the method were given in the Supplementary Notes to shorten the main manuscript; however, as per the reviewer's suggestion, we have now added further detail to the Results section of the revised manuscript [lines 101-104].
The revised section now reads as: "Here we used a Genomic Restricted Maximum Likelihood (GREML) analysis, which is a statistical method that estimates the proportion of variance on one or more phenotypes attributed by all genetic polymorphisms using individual-level data to estimate the variance explained by all genetic polymorphisms (SNP-based heritability)".
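For readers less familiar with how a GREML estimate for a binary trait is reported, the sketch below illustrates the commonly used conversion from the observed (0/1) scale to the liability scale, which is the scale on which the abstract quotes its SNP-based heritability. It is a minimal illustration only: the function name, the prevalence K, the sample case proportion P, and the input value are hypothetical assumptions, not values or code from the manuscript.

```python
from statistics import NormalDist

def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_obs        -- heritability estimated on the 0/1 (observed) scale
    prevalence    -- assumed population prevalence K of the disease
    case_fraction -- proportion P of cases in the analysed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold
    density = std_normal.pdf(threshold)               # normal density at the threshold
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1 - k)) ** 2 / (density ** 2 * p * (1 - p))

# Hypothetical inputs chosen only to illustrate the call
print(round(observed_to_liability(h2_obs=0.05, prevalence=0.10, case_fraction=0.10), 4))
```

When the sample case proportion equals the population prevalence, the second correction factor cancels and only the threshold-density term rescales the estimate.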

5.
In summary, this work is very interesting and promising, but requires further development before I could recommend it for publication in this journal, and should be submitted to a journal more focused on cancer or statistical genetics.
Authors' response: Thank you for your constructive review. As the reviewer commented that this work is promising, and we have explicitly addressed the reviewer's concerns on the previous version of the manuscript, we hope the manuscript is now acceptable.
1. This is an interesting manuscript that describes findings from evaluating 1) the genetic basis of hormone-sensitive cancers by treating five cancers that are sensitive to hormone levels (breast, uterine, ovarian, prostate, and thyroid cancer) as a single disease and 2) the genetic relationship of hormone-sensitive cancers with other cancer types and cancer-related factors. The authors evaluated the heritability of overall hormone-sensitive cancers and compared it to the heritability of overall cancer, obesity-related cancer, and finally the site-specific hormone-sensitive cancers from previous studies. A GWAS of this hormone-sensitive cancer phenotype was performed and 55 genome-wide significant SNPs were reported. Phenotypic and genetic correlations were calculated between hormone-sensitive cancers and cancer-related factors. Pairwise genetic correlations between breast, prostate, uterine, and three other cancer types were also reported. Genetic correlations calculated using a leave-one-out approach were reported to show the genetic relationship of each of the five hormone-sensitive cancers with the rest of the cancers and three other cancer types. Finally, gene-environment interaction analyses were performed by treating the hormone-sensitive cancers as the phenotype and sex, BMI, and metabolic environment as environmental variables. This is a comprehensive study of the genetics of hormone-sensitive cancers. The findings provide insights into the shared genetics of hormone-sensitive cancers at both the genome-wide and SNP levels. One strength is that the analyses were performed on all cases and incident cases separately; comparing the results from these two sets of analysis may tell us more about the mechanism underlying the observed genetic correlation. However, I suggest the authors further improve the methods as well as the overall writing and structure of the main text, tables, and figures to strengthen the validity of the study.

Authors' response:
We thank the reviewer for this summary and positive comments. The tabular presentation has been improved, and we have added annotations for tables and figures if needed (see point-by-point responses below).

Some specific points:
2. Maybe I missed this, but the number of cases for each hormone-sensitive cancer was not provided in the main text or tables. If treating all hormone-sensitive cancers as a single disease (if I understand this correctly, subjects diagnosed with any hormone-sensitive cancer are cases, cancer-free subjects are controls, and GWAS was performed on this new phenotype), I would expect the relative proportion of cancer cases for each cancer type to have a substantial influence on the results. One extreme case is that if all cases are breast cancer cases, the GWAS results would not reflect any underlying genetic variations of other cancers (i.e., not reflecting common effects *shared* by all cancers). If the case numbers are small for some cancer types, e.g., ovarian and thyroid, then the genetics of those cancer types may not be reflected by studying this overall phenotype.

Authors' response:
We thank the reviewer for this suggestion. For clarity, we have now added the number of cases for each hormone-sensitive cancer in tabulated form and revised the corresponding sentences in the main text. Please see the revised manuscript (lines 471-473) and Supplementary Table 6.
The revised sentence in the main text now reads as: "So, 7,038 incident cases of hormone-sensitive cancer were included. The detailed number of cases of hormone-sensitive cancer is included in Supplementary Table 6".

3.
Instead of treating the five hormone-sensitive cancers as a single disease, it might be better to perform a meta-analysis of the single-trait GWAS (e.g., using MTAG) to identify the shared genetic variations across these hormone-sensitive cancers and reveal any shared mechanism. In addition to listing the identified SNPs in the supplementary table, it would be helpful to show the identified independent loci in a main table, along with their effects on each of the hormone-sensitive cancers.

Authors' response:
We thank the reviewer for this suggestion. We have now added the results of the meta-analysis of the 5 single-trait GWASs, using the command --meta-analysis in PLINK 1.9 (a fixed-effect inverse-variance weighted method) (Supplementary Table 10 and Supplementary Fig. 4). In these results, 37 genome-wide significant SNPs were identified, all on chromosome 9, indicating that the meta-analysis is less well powered than the analysis of combined hormone-sensitive cancers as a single disease (which identified 55 genome-wide significant SNPs on chromosomes 2, 8, 10, 11, 16, 17, and 19). When the meta-analysis of the single-trait GWASs was restricted to incident cases, no genome-wide significant SNPs were found, whereas the analysis of combined hormone-sensitive cancers as a single disease for incident cases only identified 33 genome-wide significant SNPs on chromosomes 8, 10, 11, and 17.
Our analysis supports the conceptual premise that the combined analysis can capture common genetic risk factors shared between hormone-sensitive cancers, some of which may not be identified by the meta-analysis of single-trait GWASs. This is presented in the Results section (lines 136-138 and Fig. 2), in comparison with the meta-analysis findings presented in lines 161-169 and in Supplementary Fig. 4.
When considering LD, the 55 SNPs identified in the analysis of combined hormone-sensitive cancers as a single disease correspond to 12 independent loci (LD r² > 0.2) (Table 2; the LD heatmap is shown in Supplementary Fig. 2a). For the analysis restricted to incident hormone-sensitive cancer cases, the 33 identified SNPs correspond to 8 genome-wide significant independent loci (Supplementary Table 9; the LD heatmap is shown in Supplementary Fig. 2b).
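As an illustration of what a fixed-effect inverse-variance weighted meta-analysis computes for a single SNP, a minimal sketch follows. It is not the PLINK implementation; the function name and the per-cancer effect sizes and standard errors are hypothetical values chosen only for the example.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Combine per-study effect estimates using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value, normal approximation
    return beta, se, z, p

# Hypothetical log-odds ratios for one SNP in five single-cancer GWASs
betas = [0.08, 0.05, 0.10, 0.02, 0.07]
ses = [0.02, 0.03, 0.06, 0.08, 0.09]
beta, se, z, p = ivw_fixed_effect(betas, ses)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Because each study is weighted by the inverse of its variance, cancers with few cases (large standard errors) contribute little to the pooled estimate, which is one reason such a meta-analysis and a pooled single-disease analysis can differ in power.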

4.
Line 71-72: When treating multiple hormone-sensitive cancers as a single disease, estimated SNP-based heritability can inform if the common germline variants contribute to the carcinogenic risk shared between hormone-sensitive cancers. Again, I think the heritability of this single disease may not capture the shared genetics across these hormone-sensitive cancers very well. It reflects the genetic variation of being diagnosed with any of the hormone-sensitive cancers, but not necessarily shared by these cancers. Instead, the genetic correlations between these cancers may reflect the shared genetics at the genome-wide scale; shared susceptibility loci may reflect common mechanisms.

Authors' response:
We explicitly estimated genetic correlations between breast vs. prostate (rg = 0.10, SE = 0.09), breast vs. uterine (rg = 0.32, SE = 0.20), and prostate vs. uterine cancers (rg = 0.12, SE = 0.18) (Fig. 5). We also estimated genetic correlations between breast cancer vs. hormonal cancer excluding breast cancer (rg = 0.1662, SE = 0.0930) and between prostate cancer vs. hormonal cancer excluding prostate cancer (rg = 0.2209, SE = 0.1101) (Table 4). These estimates suggest that there is significant genetic heterogeneity among these cancers. However, we would also like to quantify how much phenotypic variance is explained by the common genetic factors shared among hormone-sensitive cancers, which can be assessed by the SNP-based heritability of the overall hormone-sensitive cancer coded as a single disease.
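To make the link between the reported estimates and their interpretation concrete, the sketch below applies a simple Wald test to the three pairwise genetic correlations quoted above, once against rg = 0 (no shared genetics) and once against rg = 1 (identical genetics). It assumes the estimates are approximately normally distributed and is an illustration only, not necessarily the test the authors used; the function name is hypothetical.

```python
import math

def wald_test(estimate, se, null=0.0):
    """Two-sided Wald test of an estimate against a null value."""
    z = (estimate - null) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Pairwise estimates (rg, SE) as quoted in the response above
pairs = {
    "breast vs. prostate": (0.10, 0.09),
    "breast vs. uterine": (0.32, 0.20),
    "prostate vs. uterine": (0.12, 0.18),
}
for name, (rg, se) in pairs.items():
    z0, p0 = wald_test(rg, se, null=0.0)
    z1, p1 = wald_test(rg, se, null=1.0)
    print(f"{name}: vs 0: z={z0:.2f}, p={p0:.3f}; vs 1: z={z1:.2f}, p={p1:.1e}")
```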
We have clarified this in the revised manuscript (lines 268-271).
"While these estimates suggest that there is significant genetic heterogeneity among these cancers, the estimate of SNP-based heritability of the overall hormone-sensitive cancer coded as a single disease shows that the phenotypic variance explained by the common genetic factors is significantly different from zero (Fig. 1)."

5.
Table 3 doesn't include thyroid and ovarian cancer, which seems like a sample size issue as the authors described in the leave-one-out analysis. But it is not clear why the other three cancer types were selected and included (i.e., why select these cancer types but not others and what is the motivation of including other cancer types). Please clarify in the main text.
Authors' response: Thank you for bringing attention to the clarity needed in the definition of hormone-sensitive cancers. We have now clarified that hormone-sensitive cancers are a subset of the obesity-related cancers identified by the World Health Organization (WHO) International Agency for Research on Cancer (IARC) 3. From the WHO/IARC list of obesity-related cancers, we grouped the five cancers that share a characteristic mechanism of carcinogenesis involving hormones (lines 464-468, page 19). The additional cancers included in the genetic correlation analysis, i.e., colorectal cancer, renal cancer, and multiple myeloma, are also taken from these obesity-related cancers. We included them to look for genetic correlation because the incidence of these cancers differs by sex, implying that the association between sex hormones and genetic variants in hormone metabolic pathways might have a role 4, 5, 6. We have now added a sentence to clarify the need for including other obesity-related cancers in the genetic correlation analysis of the leave-one-cancer-out analysis (lines 254-256).
"We further carried out genetic correlation analyses into grouped hormone-sensitive and other obesity-related non-hormone sensitive cancers in the UKB (namely colorectal, kidney, and multiple myeloma) to gain more detailed understanding of the complexities of hormone-cancer phenomena".
We also clarified why ovarian and thyroid cancers were not included in the footnotes of Fig. 5 and Table 4. For example, the footnote of Fig. 5 reads: "Ovarian and thyroid cancers were not estimable, which was probably due to the fact that the number of cases was not sufficient for LDSC in the analysis of these diseases."

Some other comments:
6. The title of this manuscript sounds like it is describing the heritability of hormone-sensitive cancers and the genetic correlations between these cancers. However, it seems like the main focus of the main text is the genetic correlations of these hormone-sensitive cancers with other cancer types and cancer-related factors, but not between these cancers (at least it was not estimated for all hormone-sensitive cancers). Also, genetic correlation was not mentioned in the abstract.
Authors' response: Thank you for this insightful suggestion. The title has been modified to reflect the main focus on the analysis of hormone-sensitive cancers as a single disease. The revised title now reads as: "Heritability of Hormone-Sensitive Cancers as a single disease in the UK Biobank: A molecular Evidence of Shared Aetiology". In addition, we made it clear that we estimated the genetic correlation between each pair of hormone-sensitive cancers except the pairs that were not estimable (lines 236-240, page 10, Figure 5, and Table 4). (Please also see the responses to Q4 and Q5 above.) We have now revised the abstract (lines 45-47) to read: "Pair-wise analysis also estimated positive genetic correlation between some pairs of hormone-sensitive cancers although they were not statistically significant".