Premenopausal endogenous oestrogen levels and breast cancer risk: a meta-analysis

Background: Many of the established risk factors for breast cancer implicate circulating hormone levels in the aetiology of the disease. Increased levels of postmenopausal endogenous oestradiol (E2) have been found to increase the risk of breast cancer, but no such association has been confirmed in premenopausal women. We carried out a meta-analysis to summarise the available evidence in women before the menopause. Methods: We identified seven prospective studies of premenopausal endogenous E2 and breast cancer risk, including 693 breast cancer cases. From each study we extracted odds ratios of breast cancer between quantiles of endogenous E2, or for unit or s.d. increases in (log transformed) E2, or (where odds ratios were unavailable) summary statistics for the distributions of E2 in breast cancer cases and unaffected controls. Estimates for a doubling of endogenous E2 were obtained from these extracted estimates, and random-effect meta-analysis was used to obtain a pooled estimate across the studies. Results: Overall, we found weak evidence of a positive association between circulating E2 levels and the risk of breast cancer, with a doubling of E2 associated with an odds ratio of 1.10 (95% CI: 0.96, 1.27). Conclusion: Our findings are consistent with the hypothesis of a positive association between premenopausal endogenous E2 and breast cancer risk.

Many breast cancer risk factors are believed to operate through circulating hormone levels; for example, early menarche, late menopause, fewer full-term pregnancies and delayed first full-term pregnancy. Specifically, early menarche and late menopause implicate oestrogens, progesterone, or both in the aetiology of breast cancer (Travis and Key, 2003). Incidence of breast cancer is seen to increase sharply with age in premenopausal women, but following the menopause, when oestrogen and progesterone levels both drop, the rate of increase in risk with age is dramatically reduced (Pike et al, 1993). The main mechanisms hypothesised to explain how circulating hormones could increase breast cancer risk are increased cell proliferation and mutagenesis (Pike et al, 1993;Travis and Key, 2003;Russo and Russo, 2004;Martin and Boyd, 2008). Oestradiol (E2) has been shown to increase breast cell mitosis in vitro (Key, 1999), and oestrogens and their metabolites have been demonstrated to induce DNA damage, genetic instability and cell mutations both in culture and in vivo (Travis and Key, 2003). Higher levels of endogenous oestrogens have been shown to be associated with an increased risk of breast cancer in postmenopausal women; a pooled analysis of nine prospective studies found that a doubling in the levels of E2 increased a woman's odds of developing breast cancer by 29% (95% CI: 15, 44%) (The Endogenous Hormones and Breast cancer Collaborative Group, 2002). However, no such association has been established in premenopausal women, possibly because of difficulties in measuring premenopausal E2, which has cyclic variation throughout the menstrual cycle (Key, 1999;Travis and Key, 2003).
The evidence on endogenous premenopausal oestrogens and breast cancer risk is mixed. Most studies have been fairly small, and the largest study to date, which included around 300 breast cancer cases, showed no evidence of an association. Breast cancer is rare in women at young ages, which makes it difficult to accumulate large numbers of cases prospectively. In addition, the within-subject variation in oestrogen levels is greater in premenopausal than postmenopausal women, giving rise to a greater degree of attenuation of effects through regression dilution bias, and this consequently limits the statistical power of individual small studies. This is a strong motivation for a meta-analysis: by pooling the estimates, the effective sample size is increased and effects can be estimated with greater statistical power.
However, combining results from epidemiological studies that differ to some degree in their designs and methods of statistical analysis is always somewhat controversial, and here there are several concerns. These probably contribute to the fact that no formal meta-analysis has been carried out to date. First, combining the results of the studies of endogenous oestrogen and breast cancer is not completely straightforward, because of the different ways in which studies have analysed and reported their results. Some report odds ratios between quantiles of oestrogen, some as an odds ratio per s.d. or unit increase in (log transformed) oestrogen, and some only as the mean difference in oestrogen between cases and controls. However, provided assumptions concerning the distribution of oestrogen are satisfied, it is possible (Greenland and Longnecker, 1992;Chene and Thompson, 1996) to estimate odds ratios for a unit increase in a risk factor even when results are presented in one of the other ways described above, and so this type of concern can be dealt with. We describe the methods used to do this in detail under statistical analyses.
Dealing with methodological differences in the designs of the individual studies is more problematic. There is substantial withinsubject variation in oestrogen levels across the menstrual cycle, and it is possible that the relationship between oestrogen levels and breast cancer risk differs according to the phase of the cycle. Pooling results from studies conducted using oestrogen levels measured in different phases may obscure this, as may merely adjusting for phase (without taking account of a possible interaction) within the individual studies. Further, the methods used to adjust for phase in the menstrual cycle vary between studies, from simple methods, such as matching on day of cycle or stratifying on phase, to the more sophisticated approach of obtaining residuals from spline regressions of the oestrogen curve throughout the menstrual cycle. In addition, most studies adjust or match for several other breast cancer risk factors, the choice of which varies from study to study. These methodological differences will inevitably introduce a degree of heterogeneity into the results of the studies and, given the available information from each study, this heterogeneity cannot be completely removed through careful statistical analysis. However, despite these differences, we believe that as the studies are all investigating the same basic question, a formal meta-analysis is justified. Such a meta-analysis needs to acknowledge that the differences in study design mean that it is virtually certain that there will be heterogeneity in the magnitude of the observed associations and to quantify this. If the extent of the heterogeneity was large, then this would cast doubt on the validity of a pooled estimate.
We therefore carried out a systematic review of the literature to identify all prospective studies of endogenous premenopausal oestrogens on breast cancer risk. To estimate the effect of a doubling of circulating premenopausal oestrogen levels on the risk of breast cancer, we combined the results from seven prospective studies, which include 693 breast cancer cases. We used a randomeffect meta-analysis to take account of the anticipated heterogeneity, resulting from the differences in methodology between studies described above.

Identification of studies
Studies were identified by performing a literature search on Medline and Embase from 1950 to February 2009, of articles containing the words 'breast' in combination with any of the terms 'cancer', 'carcinoma', 'neoplasia' or 'tumour' and the terms 'estrogen', 'estriol', 'estradiol', 'estrone', 'hormone' or 'steroid' in the title. Articles were restricted to prospective studies of circulating premenopausal oestrogens on breast cancer risk. The search (carried out by KW) produced 7895 articles, of which 148 titles were identified as potentially eligible. Overall, 13 studies were included based on their abstracts, but 6 were excluded after reading the full-text articles, 2 because the subjects were all postmenopausal, 2 because the cases were all a subset of one of the studies included in the meta-analysis, and 2 studies measured hormones other than oestrogens. The remaining seven studies were all nested matched case -control studies (Wysowski et al, 1987;Helzlsouer et al, 1994;Rosenberg et al, 1994;Thomas et al, 1997;Kabuto et al, 2000;Kaaks et al, 2005;Eliassen et al, 2006). In addition, the references of the selected articles and two review articles (Key, 1999;Travis and Key, 2003) were manually searched, but no additional articles were identified. No subjects were included in more than one study; two studies included women from the same cohort (Wysowski et al, 1987;Helzlsouer et al, 1994), but there was no overlap in participants selected for each study. Matched odds ratios for quantiles or linear increases in logtransformed E2, and all available summary statistics of (log transformed) E2 in cases and controls, were extracted from each study.

Statistical analyses
The distribution of E2 is positively skewed and a number of the individual studies considered the effect of E2 on breast cancer risk after logarithmic transformation of E2. We took the same approach, and assumed that a log-normal distribution for E2 is reasonable, so as to yield estimates of the odds ratio associated with any multiplicative increase of circulating E2. We present the odds ratio for a doubling of circulating E2. Strictly, odds ratios that are adjusted for (and/or matched on) different sets of covariates are not comparable (Steyerberg and Eijkemans, 2004). However, in most instances such 'heterogeneity bias' is small and we judged it preferable to use maximally adjusted estimates in the primary meta-analysis, so as to minimise bias arising through confounding. Where possible, as a check on the robustness of results, adjusted and unadjusted odds ratios were estimated or extracted from the individual studies.
The diversity in the ways in which results were reported meant that a range of techniques were needed in order to estimate the log odds ratio for a doubling in E2 and its standard error in the various studies. Maximally adjusted, matched odds ratios were calculated or extracted from all studies, except for Wysowski et al (1987), where such estimates were not reported. The studies by Rosenberg et al (1994) and Kaaks et al (2005) adjusted for phase in the cycle, using spline regression as well as matching on day of cycle, and we used the matched spline-adjusted estimates from both studies. Rosenberg et al (1994) also reported the results of an analysis using serial measurements of E2 on each woman, but we used the results from their analysis of each woman's first measurement of E2 to be consistent with the other studies.
Three studies (Rosenberg et al, 1994;Thomas et al, 1997;Kabuto et al, 2000) reported odds ratios for selected changes in logarithmically transformed E2. Specifically, Thomas et al (1997) reported the odds ratio for a unit increase in log e (E2), Kabuto et al (2000) for a unit increase in log 10 (E2) and Rosenberg et al (1994) for a 1 s.d. increase in spline-adjusted log e (E2). These estimates (and corresponding standard errors) were used to directly estimate the log odds ratio for a doubling of E2 and its standard error in these studies.
In three further studies (Helzlsouer et al, 1994;Kaaks et al, 2005;Eliassen et al, 2006), results concerning the log odds ratio in each quantile of E2 compared with baseline were used to estimate the log odds ratio for a doubling of E2 and its standard error, using the method described by Greenland and Longnecker (1992). The Greenland and Longnecker (1992) method requires estimates of the mean log e (E2) in controls and the overall s.d. of log e (E2) in order to estimate the mean of log e (E2) in each quantile, as described by Chene and Thompson (1996). The mean and s.d. of log e (E2) were estimated as the intercept and slope of normal quantile -quantile plots of the quantiles of log e (E2), respectively, separately for cases and controls. The pooled s.d. in cases and controls was used as the overall s.d., after testing for evidence of a difference in s.d. values in cases and controls using an F test. The odds ratios reported by Kaaks et al (2005) are for quartiles of residuals of E2 from a spline regression model used to model the cyclic change in E2. Quartiles of the residuals in controls added to the mean in controls were assumed to be the quartiles of E2 in controls, adjusted for phase in cycle.
In the study by Wysowski et al (1987), adjusted, matched estimates were not presented, and so linear discriminant function analysis (Hand et al, 1998) was used to obtain estimates of the log odds ratio, and its standard error (ignoring the matching) from the means of E2 in cases and controls. No measure of spread of E2 was given in this paper. Accordingly, a pooled s.d. from the other six studies was used. Methods used to estimate the s.d. of log e (E2) in cases and controls for each study are summarised in Table 2. We pooled the 10 estimates of s.d. of log e (E2) across cases and controls and across studies, first pooling the luteal and follicular estimates of s.d. from the study by Eliassen et al (2006) separately in cases and controls. The means of E2 in cases and controls, along with this pooled s.d., were used to estimate the mean of log e (E2) in cases and controls (Weisstein, 2010), and hence the log odds ratio for a doubling of E2 and its standard error.
The above sections describe the way in which our best estimates of the log odds ratio and its standard error were obtained in each study. In addition, in five of the studies (Helzlsouer et al, 1994;Rosenberg et al, 1994;Thomas et al, 1997;Kaaks et al, 2005;Eliassen et al, 2006) it was possible to obtain estimates using more than one approach, and we compared these estimates within studies. The sensitivity of the meta-analysis to the choice of estimation method was also examined. This allowed a comparison of matched estimates (using reported matched estimates or by implementing the Greenland and Longnecker (1992) method) and unadjusted estimates ignoring the matching (by implementing Discriminant Function Analysis; Supplementary Table 1).
The individual study estimates were combined using a randomeffect meta-analysis, as described by DerSimonian and Laird (1986). A random-effect approach was used, as we anticipated some heterogeneity in study-specific effects due to the methodological differences between studies. The random-effect metaanalysis uses both the variance of the study-specific results and the estimated between-study variance in results to assign weights to the individual studies. This results in more homogeneous weights than a fixed-effect inverse-variance weighted metaanalysis. Between-study heterogeneity was assessed using I 2 -statistics and Cochran's Q-statistic. Publication bias was explored using funnel plots and Galbraith plots (Galbraith, 1988). Eliassen et al (2006) collected two blood samples from each woman, one from each of the follicular and luteal phases of the menstrual cycle, and reported separate odds ratios for each phase. As oestrogen levels within women are correlated , the meta-analysis was carried out twice, once including the follicular estimate and once the luteal estimate from this study. Helzlsouer et al (1994) stratified their analysis by follicular and luteal phase, with measurements in each phase from different women, and we pooled the two estimates before including them in the meta-analysis.
Two of the studies also reported the association between breast cancer risk and circulating levels of free E2 (Kabuto et al, 2000;Eliassen et al, 2006), an estimate of the amount of circulating E2 that is unbound to sex hormone binding globulin. The metaanalysis was repeated using the same methods described above to obtain a combined estimate of the odds ratio for breast cancer with a doubling of free E2.

RESULTS
The seven studies, which provided 693 breast cancer cases and 1609 controls, are summarised in Table 1. Four studies were from US populations (Wysowski et al, 1987;Helzlsouer et al, 1994;Rosenberg et al, 1994;Eliassen et al, 2006), two from European populations (Thomas et al, 1997;Kaaks et al, 2005) and one from a Japanese population (Kabuto et al, 2000). The age ranges of the women in the studies were similar, from approximately 25 to 55. Six out of seven studies matched on time in menstrual cycle (Wysowski et al, 1987;Helzlsouer et al, 1994;Rosenberg et al, 1994;Thomas et al, 1997;Kaaks et al, 2005;Eliassen et al, 2006), and all studies matched on some other factors such as demographic or reproductive factors (Table 1), but estimates from the matched analyses were available for only six of the studies. In two of the studies, women were known to be premenopausal at the time of their breast cancer diagnosis (Wysowski et al, 1987;Helzlsouer et al, 1994). In all but one of the studies (Wysowski et al, 1987;Helzlsouer et al, 1994;Rosenberg et al, 1994;Thomas et al, 1997;Kaaks et al, 2005;Eliassen et al, 2006) it is stated that women using hormonal contraceptives at the time of the blood draw were excluded. Two of the seven studies were limited to invasive breast cancer cases (Rosenberg et al, 1994;Kaaks et al, 2005). Table 2 reports estimates of the geometric mean and geometric s.d. of E2 in cases and controls, within each study. The pooled geometric s.d. was close to 2 (1.91), indicating that levels 1 s.d. above the (geometric) mean were approximately double the geometric mean, while levels 1 s.d. below the geometric mean were approximately half of it. The estimated odds ratios for a doubling of E2 for each study, with a pooled odds ratio from the random-effect meta-analysis, (a) including the luteal estimate from the study by Eliassen et al (2006) and (b) using the follicular estimate from this study, are summarised in Figure 1. There was weak evidence of a positive association between circulating levels of E2 and breast cancer risk, with a doubling of E2 conferring a 10% (95% CI: À4, 27; P ¼ 0.08) to 14% (95% CI: À2, 32; P ¼ 0.17) increase in the odds of developing breast cancer, depending on which estimate is included from the Eliassen et al (2006) Table 3 reports estimated odds ratios for a selection of percentage increases in E2.
There was little evidence of between-study heterogeneity, with the proportion of variability between studies falling between 20% and 33%, depending on which estimate was included from the study by Eliassen et al (2006). The funnel plots and Galbraith plots showed no evidence of publication bias. The results of the metaanalysis were not substantially affected by the methods used to estimate the dose -response effect estimates: the pooled randomeffect estimate from unadjusted estimates ignoring the matching, for a doubling of E2, where available (using Discriminant Function Analysis in six of the studies and the reported unadjusted matched estimate from the study by Kabuto et al (2000)), was 1.08 (95% CI: 0.98, 1.20) and 1.12 (95% CI: 1.00, 1.26), when including the luteal and follicular measurements from Eliassen et al (2006) study, respectively. Within studies, the unadjusted odds ratio ignoring the matching was very close to the adjusted matched odds ratio for four out of five of the studies where it was possible to estimate odds ratios using two methods (Supplementary Table 1).

DISCUSSION
Our analysis provides some evidence of a positive association between circulating E2 levels and breast cancer risk in premenopausal women. We estimate that a doubling of circulating E2 increases a woman's risk of breast cancer by 10% (95% CI: À4, 27).
The breast cancer odds ratio for a doubling of postmenopausal E2 has been estimated to be 1.29 (95% CI: 1.15, 1.44; The Endogenous Hormones and Breast cancer Collaborative Group, 2002), which is larger than our estimate for premenopausal E2. The confidence intervals overlap, however, and it is therefore still unclear whether there is a different effect of circulating E2 in women before and after the menopause. Within-subject variability in E2, which is likely to be substantially larger in premenopausal than postmenopausal women, will dilute the effect estimate, and this could contribute to the difference in the estimates. The majority of breast cancer cases in the meta-analysis are likely to be premenopausal, and there may be differences in the aetiology  Premenopausal oestrogen and breast cancer risk K Walker et al of premenopausal and postmenopausal breast cancer, which result in oestrogen levels influencing breast cancer risk to a lesser extent before the menopause. For example, premenopausal breast cancers are more likely to be oestrogen receptor negative and therefore not sensitive to oestrogen (Vasiljevic et al, 1998). Perhaps circulating premenopausal oestrogen has a different effect on breast cancer risk than circulating postmenopausal oestrogen; premenopausal E2 is produced predominantly in the ovaries, while in postmenopausal women circulating levels are much lower, and production is predominantly by conversion of androgen precursors in adipose tissue into oestrone, which is then converted into E2 (Travis and Key, 2003). We estimated the geometric s.d. of E2 to be 1.91 in premenopausal women, while the geometric s.d. in postmenopausal women, estimated from the lower and upper quartiles of E2 in the published pooled analysis (The Endogenous Hormones and Breast cancer Collaborative Group, 2002) using a Q -Q plot, is 1.67. This is consistent with within-subject variability being somewhat greater in pre-than postmenopausal women, but nonetheless the total variability on the multiplicative scale appears to be sufficiently similar for informal comparisons of the effect of a doubling of levels to be made. The endogenous androgens testosterone, androstenedione, DHEA and DHEAS, which do not have the cyclical variation of oestrogen, have been found to have similar effects on breast cancer risk in premenopausal and postmenopausal women. Odds ratios for breast cancer for premenopausal women in the top vs bottom quartiles of several androgens in the study by Kaaks et al (2005) range from 1.5 to 1.7 , while for postmenopausal women in the top vs bottom quintiles of the same androgens, the estimates range from 1.7 to 2.2 (The Endogenous Hormones and Breast cancer Collaborative Group, 2002). This supports the argument that one contributory factor to the difference in estimates for premenopausal and postmenopausal E2 could be increased within-subject variability before the menopause. Rosenberg et al (1994), however, report an estimate using the mean of several E2 measurements from the same women, thus reducing measurement error in the exposure. This result, which was not included in our meta-analysis, shows a similar association between oestrogen and breast cancer risk to their estimate using a single E2 measurement, whereas a stronger association would be expected if measurement error really was biasing the result towards the null.
All studies used E2 as the exposure, and in addition oestriol was measured in one study (Wysowski et al, 1987) and four studies measured oestrone (Wysowski et al, 1987;Helzlsouer et al, 1994;Kaaks et al, 2005;Eliassen et al, 2006). E2 was chosen as the exposure for the meta-analysis because it is the predominant oestrogen in premenopausal women (Herjan, 2004), and was measured in all seven studies. The estimates of the effects of circulating oestrone on breast cancer risk were similar to those of circulating E2 in the four studies, in which both oestrogens were measured, albeit with somewhat larger effect estimates for luteal oestrone than luteal E2 (but similar estimates for follicular oestrone and E2) in the study by Eliassen et al (2006).
All studies included in the meta-analysis matched cases and controls. All estimates in the primary meta-analysis are from fully matched/adjusted analyses, apart from those from the study by Wysowski et al (1987). However, different studies match and adjust for different factors, including measures of adiposity, reproductive factors and family history of breast cancer. One study reported adjusted and unadjusted (matched) estimates (Eliassen et al, 2006) and found that estimates adjusting for BMI at the age of 18 years, age at menarche and first birth, parity, history of benign breast disease and family history of breast cancer, were only slightly larger than unadjusted (matched) estimates. We also found that within studies our estimate of the odds ratio ignoring the matching and adjustment was very close to the adjusted matched odds ratio for all studies where we were able to estimate both, except for the study by Rosenberg et al (1994), in which matching and adjustment changed the estimate from close to one to a modest, but nonsignificant association. Carrying out a metaanalysis where we used estimates ignoring matching and adjustment (where possible) gave very similar results to those where we did allow for matching and adjustment. These findings, together with the reasonably low between-study heterogeneity, imply that our combined estimates are reasonably robust to differences in matching and adjustment between studies.
We assumed in our analysis that a log-normal distribution for E2 is reasonable. In three of the studies (Rosenberg et al, 1994;Thomas et al, 1997;Kabuto et al, 2000) it is explicitly stated that E2 was log-normally distributed. In addition, the 12.5th and 87.5th percentiles of E2, reported by Eliassen et al (2006) study, are equidistant from their reported median, after log transformation, which means that log transformation removes the positive skew  (1987) Helzlsouer et al (1994) Rosenberg et al (1994) Thomas et al (1997) Kabuto et al (2000) Kaaks et al (2005) Eliassen et al (2006)  observed in their data. Similarly, the 25th and 75th percentiles of the residuals of E2 in controls from the spline regression carried out by Kaaks et al (2005) are equidistant from their median after log transformation. It seems therefore that a log-normal assumption is reasonable. It is possible that the effect of a doubling of E2 on breast cancer risk differs according to the phase of the menstrual cycle. This, coupled with the fact that different methods were used to match or adjust for time of blood sample collection in the menstrual cycle in the various studies, may limit the utility of a single pooled estimate. However, a number of factors suggest that our pooled estimate may well be generalisable to the different phases of the cycle. First, we did not find any statistically significant evidence of heterogeneity in the magnitude of associations across studies. Second, although one study (Eliassen et al, 2006) did report weak evidence of an association in the follicular phase, but not in the luteal phase, confidence intervals on the magnitude of the associations in the two phases were wide and they did not report the result of a statistical interaction test for a difference between these (Matthews and Altman, 1996). Third, in a previous study (Walker et al, 2009) we took repeated measurements of urinary oestrone glucuronide, a principal metabolite of serum oestrogens that is highly correlated with serum E2 (Tanabe et al, 2001), in premenopausal women, and estimated the association at different times in the menstrual cycle with mammographic density, a very strong marker of breast cancer risk. We found that the mean level of oestrone glucuronide throughout the cycle is the most biologically relevant measure associated with mammographic density, rather than that in any particular phase in the cycle. This provides us with some evidence that our pooled results are not materially affected by the different ways in which time in the menstrual cycle was accounted for.
In postmenopausal women, circulating levels of free E2 were found to have a larger effect on breast cancer risk than total E2 (The Endogenous Hormones and Breast cancer Collaborative Group, 2002). We estimated a similar increased risk in breast cancer for a doubling of free E2 and a doubling of total E2; however, there was a large amount of uncertainty in both estimates as only two studies provided estimates of the association with free E2. We have no evidence to suggest that circulating levels of free E2 are more strongly associated with breast cancer risk than total E2 in premenopausal women. Helzlsouer et al (1994) and Eliassen et al (2006) both report results stratifying by phase in the menstrual cycle, and both found a stronger association with breast cancer in the follicular than the luteal phase. The study by Thomas et al (1997) showed that the difference in E2 between cases and controls was greatest in the mid-cycle, at the transition between the follicular and luteal phases, but this was based on only seven cases in the mid-cycle. It remains to be seen whether E2 levels at a particular point in the cycle are more important in the aetiology of breast cancer than the average level over time.
It is worth considering whether a woman's circulating E2 level could be added to models for projecting the risk of breast cancer, such as the Gail model or the 'Gail model 2' (Costantino et al, 1999). Gail (2008) demonstrated that adding seven single-nucleotide polymorphisms to the 'Gail model 2', each of which confers an OR of between 1.07 and 1.26 and has a carrier frequency of between 0.13 and 0.50, added very little discriminatory accuracy to the prediction model. If premenopausal women in the top quartile of circulating E2 carry an increased risk somewhere in the region of 25% compared with those in the bottom quartile, adding E2 level to the model would do little to improve its prognostic accuracy.
This meta-analysis has demonstrated weak evidence of a positive association between premenopausal endogenous E2 levels and breast cancer risk. More studies are needed to accurately quantify this association, in order to provide meaningful estimates and to allow a comparison with the association in postmenopausal women. Repeated measurements in each woman may be helpful to reduce measurement error in E2. Further estimates in the follicular and luteal phases of the cycle separately would aid the discussion of whether E2 in a particular phase in the cycle is important, or whether mean levels across the cycle are implicated.
Supplementary Information accompanies the paper on British Journal of Cancer website (http://www.nature.com/bjc)