Heritable DNA methylation marks associated with susceptibility to breast cancer

Mendelian-like inheritance of germline DNA methylation in cancer susceptibility genes has been previously reported. We aimed to scan the genome for heritable methylation marks associated with breast cancer susceptibility by studying 25 Australian multiple-case breast cancer families. Here we report genome-wide DNA methylation measured in 210 peripheral blood DNA samples provided by family members using the Infinium HumanMethylation450. We develop and apply a new statistical method to identify heritable methylation marks based on complex segregation analysis. We estimate carrier probabilities for the 1000 most heritable methylation marks based on family structure, and we use Cox proportional hazards survival analysis to identify 24 methylation marks with corresponding carrier probabilities significantly associated with breast cancer. We replicate an association with breast cancer risk for four of the 24 marks using an independent nested case–control study. Here, we report a novel approach for identifying heritable DNA methylation marks associated with breast cancer risk.

Joo et al. have studied Mendelian inheritance of methylation marks and association with breast cancer in 25 extended Australian breast cancer families, using genome-wide methylation analysis of 210 blood samples (87 breast cancer cases and 123 unaffected controls). Out of the 1000 most heritable marks, 11 methylation marks were found to associate significantly with breast cancer in the families. In addition, these 11 marks were studied for association with breast cancer in a case-control material of 435 invasive breast cancer cases and their matched controls. Three of these marks were found to associate nominally significantly with breast cancer also in the case-control material. While constitutional methylation has been hypothesized as a mechanism for many inherited diseases, it has been little studied in breast cancer so far. The authors have also developed a method for identifying heritable methylation sites that could be of interest in general also for other diseases. This is an interesting and extensive study that brings new information on epimutations as a possible mechanism also for breast cancer risk.
The methylation marks are indicated to have substantial differences between individuals and fall into hyper-, hypo- or hemimethylated groups. It is not very clear in the manuscript whether the 11 marks identified show consistent or different methylation status and association with breast cancer between the families and also compared to the case-control data set. The M-values would also be useful to show. A table compiling this information could be useful.
In the family-based analysis, only p-values are given and not odds ratios for association with breast cancer risk. The authors indicate the ORs in the family-based analyses would be biased which is undoubtedly the case with ascertainment of multiple case families and no adjustment for this ascertainment criterion. However, if the marks were first selected based on Mendelian inheritance in the families where breast cancer is segregating as well and cases have likely been oversampled, would that not lead to inflated test statistics for p-values as well? Would the associations survive adjustment for this? Unbiased odds ratios for the risk would also be more informative for the evaluation of the (biological) significance of the findings.
In Table 2, the ORs in the MCCS case-control analysis are similar to low penetrance risk variants in general (0.83-1.26 for the nominally significant marks), suggesting a risk modifying effect rather than a causative role for the disease as such. In the family-based analysis, the risk of breast cancer is indicated to increase with carrier probabilities for all 11 sites. How is this consistent with the risks in the case-control set (OR 0.83-1.26)? What could be the approximate effect sizes (OR) in the families (see comment above): do they fall into the low penetrance/modifier category or carry a more substantial disease risk? Please discuss the possible significance of the findings in the framework of other breast cancer risk factors or alleles. How would the risk effects detected here compare with those discussed in the introduction?
Altogether, the relationship between hyper/hypo/hemimethylation, direction of the risk effect (risk/protective) and further, effect on the putative target genes is unclear. In the discussion, please elaborate in more detail on this and the methylation status and effects on the expression of the respective genes, rather than dysregulation in general.
The breast cancer risk analyses were adjusted for several risk factors. Are the risk associations similar or different by estrogen receptor status, i.e. in ER positive or ER negative breast cancer for the 11 marks, and specifically, associating with the GREB1 gene "growth regulation by estrogen in breast cancer 1"?
Reviewer #3 (Remarks to the Author): Expert in breast cancer genetics

Reviewer: Paul Pharoah
Summary
This is a clear, well written manuscript reporting a study investigating the association between individual DNA methylation marks (epimutations) in lymphocyte DNA and breast cancer risk. A family based study design is used to identify 11 epimutations with strong statistical evidence of association with breast cancer risk. Overall the findings are novel and reasonably convincing that at least some heritable epimutations are associated with breast cancer risk.
Specific comments
1. As part of the rationale for this study the authors state that heritable epimutations might account for some of the excess familial risk not explained by the known germline genetic variation. This rationale is repeated in the first sentence of the discussion. However, this argument needs further explanation. If an epimutation is truly heritable (passes from one generation to the next), it will presumably be linked to/correlated with nearby germline DNA variation. This DNA variation would also be associated with disease risk and so would account for some of the excess familial risk of disease. The argument is circular. The authors then go on to state that the known examples of transgenerational epimutations in mismatch repair genes are in fact linked to nearby cis-acting variants. One would need functional genomic studies to establish whether or not an epimutation was caused by nearby cis-acting variants and whether or not the epimutation itself had any relevant direct functional consequence that resulted in disease risk.
2. The methodology and general approach will not be familiar to most non-specialist readers (such as myself). It would therefore be helpful to include a main figure summarising the data and analysis for one of the associated epimutations that was also "significant" in the validation nested case-control study in order to aid the reader in understanding the underlying methodology. For example, if I have understood correctly, Figure S4 shows a trimodal distribution of methylation values for cg18584561 with means at approx -4, 0 and 2. I would interpret this as corresponding to three genotypes of a common bi-allelic variant. It is then unclear how this relates to Figure S3. These data (distributions) could perhaps be shown separately for cases and controls from the familial samples (given the association I presume the distributions are different) together with the equivalent panel for Figure S3. Finally the M-value distribution in cases and controls for the validation nested case-control study could be shown.
3. It would be useful to provide the estimated carrier frequency for each of the associated epimutations in the familial samples (by case-control status).
4. If some, or all of these epimutations were confirmed to be associated with breast cancer risk, they would be, just like any germline variant, only a marker for risk and some causal mechanisms would need to be established to make any claims beyond association.
5. I do not agree that reducing the multiple testing burden increases statistical power in a useful way. The probability that an association that is declared significant at a predetermined threshold is a true positive depends on the statistical power and the prior probability of association. Reducing the number of tests does not alter the prior at all or the power. Whether or not a heritable epimutation is more likely to be associated with risk than a non-heritable epimutation is not known. Some evidence for this could be provided by investigating the association of the least heritable mutations with risk.
6. p10, l198 et seq. The authors state that it is remarkable that three of the eleven associated markers were associated with risk in an independent nested case control study (not cohort study as stated). It is not clear to me why this finding is unexpected. The replication suggests that these epimutations have a population frequency that is sufficient to be detectable in a modest sized case-control study. The authors also speculate that the carrier frequency for some of the epimutations that did not replicate may have been too low. The histograms of the M-values in cases and controls from the nested case control study ought to be provided in Fig S4 to demonstrate the difference in likely carrier frequencies.
7. Some estimation of the power of the replication study to detect the types of variant identified by the family study should be provided.
8. If the frequency is sufficient to be detectable in the replication study one would expect the epimutation to be correlated with a common DNA variant (SNP) and, as such, ought to have been detected in one of several large scale GWAS for breast cancer. Presumably germline genotyping data are available for some or all of these samples and so correlation between the epimutation and nearby SNPs should be evaluated and evidence that these SNPs are indeed associated with risk could be obtained (the authors have easy access to results from multiple BCAC studies).
Discussion
9. The final sentence of the discussion is simply not justified. If heritable epimutations are in cis with DNA variants how will epimutations provide new opportunities for increasing the precision of current risk models or how will they help in developing new strategies for cancer control? I accept that epimutations might offer therapeutic targets, but it would first need to be established that these epimutations are not simply markers of risk, i.e. that they have functional consequences that make them valid targets.
Methods
10. The timing of the collection of blood from the familial samples should be stated. In particular, were the case samples collected before or after diagnosis? If the latter is the case then the possibility of reverse causation should be discussed, as it would be a potential reason for non-replication in the nested case-control study.
11. A brief explanation of the meaning of the beta- and M-values should be provided in the methods. In addition, because much of the methodology described in this paper will be unfamiliar to the non-specialist reader it would be helpful if an explanation of some of the terms used were provided when first mentioned in the results to avoid the need for switching back and forth between results and methods. E.g. put a brief definition of beta-values, M-values and delta-l in parentheses when they are first mentioned.

Statistical analysis
12. I am not a statistical geneticist, but the statistical approach seems sound and has been explained and justified clearly. Potential biases have been acknowledged and the fact that these do not invalidate the final p-values as a test of association of the epimutation sites of interest noted.
13. Cox PH regression is used to test for association between estimated carrier probability and breast cancer. It would be helpful to show the estimated carrier frequencies in cases and controls.
14. It is stated in the methods (p16, l326) that a Bonferroni correction was used to adjust for multiple testing. However the p-values in Table 1 are unadjusted p-values. It would be more appropriate to state that a Bonferroni corrected threshold was used to determine statistical significance (and to state that threshold).
15. Conditional logistic regression with M-value as the independent variable is used to test for association in the nested case-control study. Given that the underlying model from the family study was Mendelian it would seem more appropriate to use most likely carrier status, at least for those epimutations with a multi-modal distribution with a clear separation between the methylation value peaks (carriers and non-carriers). Presenting the M-values for cases and controls (see comment # 7) would illustrate this.
Minor comments (very minor for some)
16. Figure S2 is of poor quality/low resolution.
17. Figure S3. The total number of samples from the y-axis of the histograms seems to be much less than the total sample size.
18. p6, l113 (typo) "… of a known SNPs……" (delete a).
19. p7, l127. The statement that HRs could not be calculated should more accurately state that unbiased HRs could not be calculated.
20. p7, l129. It is not very clear to me how Figure S3 shows the estimated effect of the hypothetical genetic variant on the M-values. It would be helpful to explain why the fitted distributions for some epimutations do not seem to fit well (e.g. cg18584561).
20. These authors should realise that "… a number of …", as used in supplementary statistical methods (p4), could include the number zero. As such it is an unhelpful phrase.
21. The authors "…..wish to thank…..". I wonder then why they do not do so.
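The dependence described in comment 5 above, of the probability that a "significant" association is a true positive on power and prior probability, can be sketched with Bayes' rule. This is only an illustration: the function name and all numerical inputs below are hypothetical and are not taken from the manuscript.

```python
def positive_predictive_value(power, prior, alpha):
    """Probability that an association declared significant is a true positive.

    power: probability of declaring significance when the association is real
    prior: prior probability that the association is real
    alpha: significance threshold (false positive rate under the null)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only to show how strongly the prior matters.
for prior in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(power=0.8, prior=prior, alpha=5e-5)
    print(f"prior={prior}: PPV={ppv:.3f}")
```

In this sketch the number of tests enters only through the threshold `alpha`, while `prior` is a separate input.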
Thank you for reviewing our manuscript and providing us with the opportunity to respond to the reviewers' comments which we address point by point below.

Reviewer comment: Title: "Heritable epimutations" is an oxymoron. The term epimutation refers to changes in DNA methylation (and/or chromatin) that occur independently of genetics, and cause a disease or a specific phenotype (well documented examples are bona fide epimutations that cause Beckwith-Wiedemann syndrome, Prader-Willi syndrome, Silver-Russell syndrome). So, this can be a very useful and informative term, when applied correctly. However, based on the data in this manuscript, the phenomenon that the authors are actually describing is NOT epimutations. Rather, it is methylation quantitative trait loci (which can be abbreviated as mQTL or meQTL), a well described and well-studied phenomenon that is pervasive in human genomes. It would be a bad mistake and disservice to the field to dilute the useful term epimutations by applying it to the situation that the authors describe in their manuscript. They should use the correct term mQTL.
Author response: It has been challenging to find the terminology appropriate for this work. Indeed, many of the methylation marks that we were calling "heritable epimutations" and that are associated with diseases are now linked to genetic variants, yet some others remain independent of known genetic variation. We agree that the term "heritable epimutations" is often used incorrectly and our use of this term in this manuscript did not only refer to the classical situation/definition described above by the reviewer. The mechanistic explanation for the heritable methylation marks that we describe in this manuscript remains speculative. Indeed, there may be several mechanisms that give rise to methylation marks that are heritable and associated with breast cancer risk. We present the hypothesis that genetic variation may underlie at least some of these heritable methylation marks in the manuscript but the work to explore this hypothesis lies outside the scope of this report (discussed further below). Thus, because we do not report genetic variants linked to our methylation marks, and because genetic variants are unlikely to explain all of the heritable methylation marks that we describe in the manuscript (see discussion related to VTRNA2-1/mir886), we think that "mQTL" is also unsuitable for describing our finding. We understand the reviewer's point and have replaced the term "heritable epimutation" with "heritable methylation mark", which more precisely describes what we have measured and what we are reporting. We have adjusted the manuscript throughout to address this important issue. Key changes in the text include a paragraph in the introduction that addresses terminology and how we are applying it, and further speculation in the discussion that at least some proportion of the reported heritable methylation marks are mQTLs.

Reviewer comment:
Abstract and main text: Similarly, in their Abstract, the authors state "Mendelian-like inheritance of germline DNA methylation in particular cancer susceptibility genes. We aimed to identify heritable methylation marks associated with breast cancer susceptibility." This sentence precisely describes mQTLs. They should use this well accepted and standard term throughout. In other words, where the phrase "heritable methylation marks" appears, it should first be defined as equivalent to methylation quantitative trait loci, and then be abbreviated as "mQTLs" or "meQTLs" throughout the remaining text.
Author response: As above, we have reconsidered the terminology used to describe our work and our findings. Throughout the revised manuscript, we use the term "heritable methylation mark" as this is what we sought to identify. We hypothesise and provide additional text for the reader to convey that a proportion of these heritable methylation marks are likely to be due to mQTLs but do not want to label any of the findings with this term until this has been demonstrated.

Reviewer comment:
Abstract/study design: the study design is commendable in that it included both a reasonably large "test set" of samples (PBL from breast cancer families), and a "replication set" of samples from a population study.
Author response: Thank you.

Reviewer comment:
Page 6: "Of the 1,000 most Mendelian methylation marks, 11 of them were associated with breast cancer at the Bonferroni-adjusted p-value threshold of 5 x 10^-5". This statement again is essentially the definition of mQTLs. That is, loci for which the levels of CpG methylation are genetically determined, by the haplotypes in which the CpGs are embedded. So, "most Mendelian methylation marks" should be stated as "most Mendelian methylation marks, i.e. mQTLs".
Author response: Please see the responses to 1 and 2 above.

Reviewer comment:
Page 7: The authors refer to a " Figure 2B", but this reviewer cannot find any figures in the main manuscript file (there are tables in it, but not figures). There are some Supplemental Figures, but it is impossible to know if any of these might correspond to " Figure 2B". This problem of potentially "missing figures" is obviously a major one.
Author response: All figures were submitted to the journal/editor. It should be possible to make these available to the reviewer.

Reviewer comment:
[...] Table 1) plotted separately for all individuals with AA, AB, and BB genotypes. To make these "gold standard" plots, the authors will need to determine SNP genotypes around each of their top-ranked loci, but that is easy and can be done using Illumina 2.5M or 5.0M SNP array data for the same samples, or more cheaply by simple Sanger sequencing of 1kb amplicons centered on each of their 11 top-ranked CpGs. If cost of even the Sanger sequencing is an insurmountable issue, I would be satisfied with seeing such plots from as few as 5 of their top-ranked loci. The data will be very informative, and the results may potentially change the authors' conclusions. It simply has to be done.
Author response: The reviewer identifies an extremely important line of investigation. However, the authors do not wish to conduct a quick and limited analysis just to provide some information for this report. This line of investigation requires a comprehensive analysis and it is unlikely to be as straightforward as the reviewer suggests, due to the differences in the frequency of the identified marks, the possible differences in the magnitude of the associated breast cancer risk, phenocopies and the likelihood that at least some of these marks are epimutations (in the strictly defined sense). We are planning a comprehensive analysis to address this question that will be part of a future report (as recognised by reviewer 3 below). We also hope that reporting the findings of our empirically identified heritable methylation marks associated with breast cancer risk may stimulate further investigation of this important aspect of the work in the broader research community. The chromosomal regions on which these marks have been identified have not been associated with breast cancer risk via genome-wide association studies (information now included in our manuscript), which also suggests that a comprehensive (rather than quick and limited) study of this question is required.

Reviewer comment: Joo et al have studied Mendelian inheritance of methylation marks and association with breast cancer in 25 extended Australian breast cancer families, using genome wide methylation analysis of 210 blood samples (87 breast cancer cases and 123 unaffected controls).
Out of the 1000 most heritable marks, 11 methylation marks were found to associate significantly with breast cancer in the families. In addition, these 11 marks were studied for association with breast cancer in a case-control material of 435 invasive breast cancer cases and their matched controls. Three of these marks were found to associate nominally significantly with breast cancer also in the case-control material. While constitutional methylation has been hypothesized as a mechanism for many inherited diseases it has been little studied in breast cancer so far. The authors have also developed a method for identifying heritable methylation sites that could be of interest in general also for other diseases. This is an interesting and extensive study that brings new information on epimutations as a possible mechanism also for breast cancer risk.

The methylation marks are indicated to have substantial differences between individuals and fall into hyper-, hypo- or hemimethylated groups. It is not very clear in the manuscript whether the 11 marks identified show consistent or different methylation status and association with breast cancer between the families and also compared to the case-control data set. The M-values would also be useful to show. A table compiling this information could be useful.
Author response: The logistic-transformed M-values, which should roughly indicate % methylation levels, for the 11 marks are shown in Supplementary Figure 3. We have now put the histograms of Supplementary Figure 4 on the same scale as Supplementary Figure 3, so that the distributions of methylation for the family analysis can be directly compared to that of the case-control analysis. We have also added a table (Supplementary Table 2) showing the number of hypo-, hemi- and hypermethylated cases and controls for the 11 marks.
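As a rough illustration of how M-values map onto approximate methylation proportions, the sketch below assumes the commonly used base-2 relationship M = log2(beta / (1 - beta)) and ignores any offset applied during array preprocessing. Whether the transform used for the supplementary figures takes exactly this form is not stated here, and the function names are hypothetical.

```python
import math


def m_to_beta(m_value):
    """Convert an M-value to an approximate beta-value (methylation proportion)."""
    # Assumes M = log2(beta / (1 - beta)), i.e. beta = 2^M / (1 + 2^M).
    ratio = 2.0 ** m_value
    return ratio / (1.0 + ratio)


def beta_to_m(beta_value):
    """Convert a beta-value in (0, 1) to an M-value under the same assumption."""
    return math.log2(beta_value / (1.0 - beta_value))


# The approximate modes that Reviewer #3 reads from Figure S4 for cg18584561.
for m in (-4.0, 0.0, 2.0):
    print(f"M={m:+.1f} -> beta={m_to_beta(m):.2f}")
```

Under this assumption, an M-value of 0 corresponds to 50% methylation, and large negative or positive M-values approach 0% and 100% respectively.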

Reviewer comment: In the family-based analysis, only p-values are given and not odds ratios for association with breast cancer risk. The authors indicate the ORs in the family-based analyses would be biased which is undoubtedly the case with ascertainment of multiple case families and no adjustment for this ascertainment criterion. However, if the marks were first selected based on Mendelian inheritance in the families where breast cancer is segregating as well and cases have likely been oversampled, would that not lead to inflated test statistics for p-values as well?
Author response: The p-value is the probability of observing data as extreme as, or more extreme than, the observed data, under the assumption that the null hypothesis is true. Under the null hypothesis, there is no association between breast cancer and the carrier probabilities for the probe, so oversampling for cases does not affect the distribution of the test statistic, hence the p-value is unbiased. This is analogous to the way that oversampling cases in a case-control study does not bias the p-value because even though cases are oversampled, there are no constraints on the exposure. Marks were first selected based on Mendelian inheritance, independently of case status.
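The case-control analogy in this response can be checked with a small simulation. The sketch below is not the family-based Cox analysis used in the manuscript: it assumes a binary exposure that is independent of case status, uses a simple pooled two-proportion z-test, and all names and parameter values are hypothetical.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation in exposure, so no evidence either way
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def null_rejection_rate(n_cases, n_controls, exposure_freq=0.3,
                        n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated null datasets with p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Exposure is drawn identically for cases and controls (the null).
        exposed_cases = sum(rng.random() < exposure_freq for _ in range(n_cases))
        exposed_controls = sum(rng.random() < exposure_freq for _ in range(n_controls))
        p = two_proportion_p_value(exposed_cases, n_cases,
                                   exposed_controls, n_controls)
        rejections += p < alpha
    return rejections / n_sims


print("balanced design:   ", null_rejection_rate(100, 100))
print("cases oversampled: ", null_rejection_rate(400, 100))
```

If the argument holds, both designs should reject at a rate close to the nominal `alpha`, since the sampling ratio does not constrain the exposure under the null.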

Reviewer comment: Would the associations survive adjustment for this? Unbiased odds ratios for the risk would also be more informative for the evaluation of the (biological) significance of the findings.
Author response: Unfortunately, there is no conventional way to adjust for the clinic-based ascertainment of the families. Even methods that might be applied for mQTLs would require the relevant genetic variant to be measured. We therefore used an independent and population-based dataset, from the MCCS, to estimate unbiased ORs. Note that we expect aberrant methylation at most of these probes to be rare (the aim of the study is to find heritable factors with high enough risks to explain multiple-case families, and such factors must be rare) so we expect our power to detect these marks to be low in a population-based study (just as power is low in a GWAS to detect a very rare yet "high-risk" BRCA1 mutation). This is likely to explain why only some of the heritable methylation marks were associated with breast cancer risk in the MCCS.

Reviewer comment:
In Table 2, the ORs in the MCCS case-control analysis are similar to low penetrance risk variants in general (0.83-1.26 for the nominally significant marks), suggesting a risk modifying effect rather than a causative role for the disease as such. In the family-based analysis, the risk of breast cancer is indicated to increase with carrier probabilities for all 11 sites. How is this consistent with the risks in the case-control set (OR 0.83-1.26)?
Author response: On the direction of the effects, a genetic variant can increase or decrease methylation at a site, and a change in the level of methylation at a site can cause breast cancer, regardless of the direction of this change. So an OR>1 for the M-values of a particular site (in the MCCS) and an HR>1 for the carrier probabilities (in the family-based analyses) could occur if a genetic variant increases M-values at the site and this causes breast cancer. Similarly, an OR<1 for M-values and an HR>1 for the carrier probabilities could occur if a genetic variant decreases M-values at the site and this causes breast cancer. For example, for cg03916490 (near C7orf50), Supp. Fig. 3 shows that carriers generally have lower M-values than non-carriers, so since carriers have higher risks than non-carriers, we would expect OR<1, as observed.
On the size of the effects, note that the reported ORs for each mark in the MCCS analysis are the estimated ORs per standard deviation (SD). In a population-based study like the MCCS, we would only expect a small amount of variation in the M-values of these probes. However, a rare genetic variant that causes aberrant methylation at the probe could have a large effect on its methylation levels, and this would correspond to a large relative risk for carriers. For example, cg01741999 (near PNKD) has an OR per SD of 1.26, so if a genetic variant changes methylation by 5 SDs then the variant would have an OR of 3.2 (=1.26^5). In other words, more substantial variation due to a rare, heritable shift in methylation values may be associated with much larger increases in risk in multiple-case families.
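The extrapolation in this paragraph can be written out as a short calculation. It assumes a log-linear model in which the OR multiplies by the same factor for each SD of change in M-value, which is what a per-SD OR from logistic regression implies. The function name is hypothetical, and the 5 SD shift is the illustrative value used in the response, not an estimate.

```python
def odds_ratio_for_shift(or_per_sd, shift_in_sd):
    """Odds ratio implied by a shift of `shift_in_sd` standard deviations.

    Assumes the log odds change linearly with the M-value, so per-SD
    odds ratios multiply.
    """
    return or_per_sd ** shift_in_sd


# cg01741999 (near PNKD): OR per SD of 1.26, hypothetical 5 SD increase.
print(round(odds_ratio_for_shift(1.26, 5), 1))  # 3.2

# For a mark with OR per SD below 1, a decrease in M-value raises risk.
print(round(odds_ratio_for_shift(0.83, -5), 2))
```

The second call illustrates the point about direction of effect: an OR per SD below 1 combined with a variant that lowers M-values still corresponds to an increased risk for carriers.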

Reviewer comment:
What could be the approximate effect sizes (OR) in the families (see comment above): do they fall into the low penetrance/modifier category or carry a more substantial disease risk? Please discuss the possible significance of the findings in the framework of other breast cancer risk factors or alleles. How would the risk effects detected here compare with those discussed in the introduction?
Author response: Further work is required to estimate the effect sizes. We can only hypothesise. See our response to Comment 4 above, where we discuss extrapolating effect sizes from the population-based analyses to the family-based analyses.

Reviewer comment:
Altogether, the relationship between hyper/hypo/hemimethylation, direction of the risk effect (risk/protective) and further, effect on the putative target genes is unclear. In the discussion, please elaborate in more detail on this and the methylation status and effects on the expression of the respective genes, rather than dysregulation in general.
Author response: With respect, we think the associations between the M-values and breast cancer are clearly laid out in Table 2 (as we mention when responding to comment 3 above, we can only estimate effect sizes using the population-based MCCS data). To clarify this, we have now also added Supplementary Table 2, which gives a contingency table and p-value for the association between breast cancer and methylation β-values categorised into three groups (hypo-, hemi- and hypermethylated) for each probe. We also think we have been clearer about the direction of effect of the hypothetical genetic variants on breast cancer risk (and as noted above, we can't estimate the size of the effect in the family-based analyses). For example, we stated in the manuscript that "the risk of breast cancer increased with carrier probabilities for all 11 sites" (line 144). In addition, the effect of the hypothetical genetic variant on the M-values of each site is given both graphically and in tabular form, as we say "the estimated effect of the hypothetical genetic variant on the M-values of each site can be seen from Supplementary Figure S3 or Supplementary Table 1" (line 146). We have attempted to make this clearer by explaining it in more detail (lines 148-152).

Reviewer comment:
The breast cancer risk analyses were adjusted for several risk factors. Are the risk associations similar or different by estrogen receptor status, i.e. in ER positive or ER negative breast cancer for the 11 marks, and specifically, associating with the GREB1 gene "growth regulation by estrogen in breast cancer 1"?
Author response: This is an interesting suggestion and we tested for an association with ER status. Only one methylation mark was associated with ER status (p<0.05), which may be due to chance (and was not cg18584561 at GREB1). We have included this in the manuscript (lines 192-195).

This is a clear, well-written manuscript reporting a study investigating the association between individual DNA methylation marks (epimutations) in lymphocyte DNA and breast cancer risk. A family based study design is used to identify 11 epimutations with strong
statistical evidence of association with breast cancer risk. Overall the findings are novel and reasonably convincing that at least some heritable epimutations are associated with breast cancer risk.
Specific comments 1. Reviewer comment: 1. As part of the rationale for this study the authors state that heritable epimutations might account for some of the excess familial risk not explained by the known germline genetic variation. This rationale is repeated in the first sentence of the discussion. However, this argument needs further explanation. If an epimutation is truly heritable -passes from one generation to the next -it will presumably be linked to/correlated with nearby germline DNA variation. This DNA variation would also be associated with disease risk and so would account for some of the excess familial risk of disease. The argument is circular. The authors then go on the state that the know example of transgeneration epimutations in mismatch repair genes are in fact linked to nearby cis-acting variants. One would need functional genomic studies to establish whether or not an epimutation was caused by nearby cis-acting variants and whether or not the epimutation itself had any relevant direct functional consequence that resulted in disease risk.
Author response: We are studying multiple-case families with no known cause of breast cancer, and the rationale for our study is simply that it would be beneficial to find a cause. Even if a heritable methylation mark is caused by a mutation in an unknown gene, or is just associated with an unknown gene in the way the reviewer suggested, we think identifying that mark is clearly worthwhile, because it will probably help to identify the unknown gene and it might help to identify a mechanism. Further, cis- or trans-acting genetic variants responsible for our methylation changes could be situated anywhere in the genome, and some current genomic techniques (e.g. exome-seq, SNP arrays) are likely to miss a large fraction of the genome, especially intergenic regions. Hence, it may be more effective to measure methylation levels. On the last point, we agree (our text is consistent with this) and, as discussed below, a comprehensive analysis of these marks, including functional genomic studies, is being planned and further work may be stimulated by the publication of this report.

Reviewer comment:
The methodology and general approach will not be familiar to most non-specialist readers (such as myself). It would therefore be helpful to include a main figure summarising the data and analysis for one of the associated epimutations that was also "significant" in the validation nested case-control study in order to aid the reader in understanding the underlying methodology. For example, if I have understood correctly, Figure S4 shows a trimodal distribution of methylation values for cg18584561 with means at approx -4, 0 and 2. I would interpret this as corresponding to three genotypes of a common biallelic variant. It is then unclear how this relates to Figure S3. These data (distributions) could perhaps be shown separately for cases and controls from the familial samples (given the association I presume the distributions are different) together with the equivalent panel for Figure S3. Finally, the M-value distribution in cases and controls for the validation nested case-control study could be shown.
Author response: Supplementary Figures 3 and 4 are now presented on the same scale for clearer comparison. In Supplementary Figure 4, we now present the distributions separately for cases and controls, as suggested. We have also added a new figure illustrating our analytical approach (Figure 3).
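
For readers less familiar with the M-value scale used in the figures discussed above, the following minimal sketch converts M-values to β-values. It assumes the standard logit relationship M = log2(β / (1 - β)) commonly used for Infinium arrays; the function name `m_to_beta` is illustrative and is not taken from the manuscript.

```python
def m_to_beta(m_value: float) -> float:
    """Convert an M-value to a beta-value, assuming M = log2(beta / (1 - beta))."""
    odds = 2.0 ** m_value          # methylated-to-unmethylated intensity ratio
    return odds / (1.0 + odds)     # proportion methylated


# Approximate modes that the reviewer reads off for cg18584561
for m in (-4, 0, 2):
    print(f"M = {m:>2}: beta = {m_to_beta(m):.3f}")
```

Under this assumption the three modes at roughly -4, 0 and 2 correspond to β-values of about 0.06, 0.50 and 0.80.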

Reviewer comment:
It would be useful to provide the estimated carrier frequency for each of the associated epimutations in the familial samples (by case-control status).
Author response: We now provide this data in Supplementary Table 3. The low carrier probabilities for some probes are presumably due to the very low prior (equal to 0.02, which is twice the population allele frequency) that was assumed for the probability for carrying the variant, and because at most one branch of the family can carry the variant (we assumed this in our analysis, but it also follows approximately from the rareness of the variant) so most people within each family will be "non-carriers".
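
The statement that the prior of 0.02 is twice the population allele frequency can be checked with a short sketch. It assumes Hardy-Weinberg proportions, which the response does not state explicitly, and an allele frequency of 0.01 implied by the quoted prior; the function name is illustrative.

```python
def carrier_probability(allele_freq: float) -> float:
    """Probability of carrying at least one copy of the variant.

    Assumes Hardy-Weinberg proportions (an assumption made for this sketch).
    """
    return 1.0 - (1.0 - allele_freq) ** 2


q = 0.01                          # allele frequency implied by the prior of 0.02
exact = carrier_probability(q)    # 2q - q^2
approx = 2 * q                    # the "twice the allele frequency" shortcut
print(f"exact = {exact:.4f}, approximation = {approx:.4f}")
```

For a rare variant the q² term is negligible, so the exact value (0.0199) and the approximation (0.02) are nearly identical.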

Reviewer comment:
If some, or all of these epimutations were confirmed to be associated with breast cancer risk, they would be just like any germline variant only a marker for risk and some causal mechanisms would need to be established to make any claims beyond association.
Author response: Yes, we agree. In this manuscript we are reporting heritable methylation marks associated with breast cancer risk. Our text is consistent with this intention and with our responses to reviewers 1 and 2.

Reviewer comment:
I do not agree that reducing the multiple testing burden increases statistical power in a useful way. The probability that an association that is declared significant at a predetermined threshold is a true positive depends on the statistical power and the prior probability of association. Reducing the number of tests does not alter the prior at all or the power. Whether or not a heritable epimutation is more likely to be associated with risk than a non-heritable epimutation is not known. Some evidence for this could be provided by investigating the association of the least heritable mutations with risk.
Author response: Thank you for the comment. We agree that screening out non-heritable probes will not necessarily increase our power to detect probes that are associated with the risk of breast cancer. However, (almost tautologically) it will increase the prior probability that a mark taken forward for association testing is associated with heritable breast cancer (i.e. breast cancer caused by a heritable factor). Therefore, the screening step will increase our power to detect probes that are associated with the risk of heritable breast cancer. This is the main aim of our study, so it is the context for our claim about improving power by selecting the most heritable methylation marks. However, we should have been more explicit about this, and we have now amended the manuscript to make it clear that our claims about power only apply to the identification of heritable methylation marks associated with breast cancer risk.
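
The relationship the reviewer describes between power, prior probability and the chance that a significant result is a true positive can be sketched as below. The formula is the usual Bayes calculation for a single test; all numerical values are hypothetical and chosen only for illustration, not estimated from the study.

```python
def true_positive_probability(power: float, alpha: float, prior: float) -> float:
    """Probability that a result declared significant is a true association.

    power: probability of detecting a true association at the chosen threshold
    alpha: significance threshold (false positive rate under the null)
    prior: prior probability that a tested mark is truly associated
    """
    true_hits = power * prior
    false_hits = alpha * (1.0 - prior)
    return true_hits / (true_hits + false_hits)


# Hypothetical values: same power and threshold, two different priors
low_prior = true_positive_probability(power=0.8, alpha=0.05, prior=0.01)
high_prior = true_positive_probability(power=0.8, alpha=0.05, prior=0.10)
print(f"prior 0.01: {low_prior:.2f}, prior 0.10: {high_prior:.2f}")
```

With power and threshold held fixed, raising the prior from 0.01 to 0.10 raises the true positive probability from about 0.14 to 0.64 in this hypothetical example, which is the sense in which a screening step that enriches for true associations could help.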
6. Reviewer comment: p10, l198 et seq. The authors state that it is remarkable that three of the eleven associated markers were associated with risk in an independent nested case-control study (not cohort study as stated). It is not clear to me why this finding is unexpected. The replication suggests that these epimutations have a population frequency that is sufficient to be detectable in a modest-sized case-control study. The authors also speculate that the carrier frequency for some of the epimutations that did not replicate may have been too low.
The histograms of the M-values in cases and controls from the nested case control study ought to be provided in Fig S4 to demonstrate the difference in likely carrier frequencies.
Author response: Thank you, we have edited the text to appropriately name the study design as a nested case-control study (with the MCCS), and the distributions for the MCCS cases and controls are presented separately in Supplementary Figure 4. We have also revised the discussion to address the other points above (lines 271-284).

Reviewer comment:
Some estimation of the power of the replication study to detect the types of variant identified by the family study should be provided.
Author response: Before doing the MCCS analysis, we did not have any estimates of the true standard deviations (SDs) of the β-values for each probe, or of the true differences between the β-values of cases and controls. Therefore, we can only provide post-hoc power calculations based on the observed SDs and β-value differences, which are likely to be biased. Most probes had observed β-value SDs of approximately 0.2 and observed β-value differences between cases and controls of less than 0.01, so post-hoc power calculations show that these probes had less than an 11% chance of replicating. The 3 probes that replicated each had roughly a 60% (post-hoc) chance of replicating, based on their observed SDs and β-value differences, though again, post-hoc power calculations are usually biased. For these reasons, we have decided not to include this in the manuscript.
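
A post-hoc power calculation of the kind described can be sketched as follows. This is not the authors' calculation: it assumes an unmatched two-sided comparison of mean β-values with a normal approximation, a 5% significance level, and 435 cases and 435 controls (the group size given in the study description). The matched design of the nested case-control study is ignored.

```python
from statistics import NormalDist


def two_sample_power(diff: float, sd: float, n_per_group: int,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test (illustrative only)."""
    std_normal = NormalDist()
    standard_error = sd * (2.0 / n_per_group) ** 0.5   # SE of the mean difference
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)     # two-sided critical value
    shift = abs(diff) / standard_error                 # standardized true difference
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Observed values quoted in the response; the group size is an assumption
power = two_sample_power(diff=0.01, sd=0.2, n_per_group=435)
print(f"approximate power: {power:.3f}")
```

Under these simplifying assumptions, a difference of 0.01 with an SD of 0.2 gives power of around 0.11, and smaller differences give less, which is broadly in line with the figure quoted in the response.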

Reviewer comment:
If the frequency is sufficient to be detectable in the replication study, one would expect the epimutation to be correlated with a common DNA variant (SNP) and, as such, it ought to have been detected in one of several large-scale GWAS for breast cancer. Presumably germline genotyping data are available for some or all of these samples, so the correlation between the epimutation and nearby SNPs should be evaluated and evidence that these SNPs are indeed associated with risk could be obtained (the authors have easy access to results from multiple BCAC studies).
Author response: As discussed in responses to Reviewer #1, we do not have any evidence that these heritable methylation marks are at or close to loci that have been identified to be associated with breast cancer risk via genome-wide association studies. There is some published information available for the three probes at VTRNA2-1 that is now included in our report. See line 222.

Reviewer comment: Discussion
The final sentence of the discussion is simply not justified. If heritable epimutations are in cis with DNA variants, how will epimutations provide new opportunities for increasing the precision of current risk models, or how will they help in developing new strategies for cancer control? I accept that epimutations might offer therapeutic targets, but it would first need to be established that these epimutations are not simply markers of risk, i.e. that they have functional consequences that make them valid targets.
Author response: As discussed above, we present the heritable methylation marks associated with breast cancer risk. This is the outcome of applying the method that we describe in the manuscript. While at least some of these methylation marks are likely due to underlying variation in genetic sequence, we do not have any data to demonstrate this. We have no evidence to suggest that we have re-identified a genetic risk factor (i.e. genetic variation that is already known to be associated with breast cancer risk) but even if we had then we think