Several case reports in the 1980s, linking assisted conception and ovarian cancer, raised concerns about the long-term health effects of infertility treatment (Fishel and Jackson, 1989). These reports prompted many investigations into potential associations between exposure to fertility treatments used to stimulate ovulation and cancer risks. Some earlier studies (Whittemore et al, 1992; Rossing et al, 1994; Shushan et al, 1996) reported positive associations between ovarian stimulation and risk of ovarian cancer, but others did not confirm this. Raised risks have also been reported for cancers of the breast (Burkman et al, 2003; Orgéas et al, 2009), endometrium (Ron et al, 1987; Modan et al, 1998; Althuis et al, 2005; Calderon-Margalit et al, 2008), thyroid (La Vecchia et al, 1999) and malignant melanoma of the skin (Rossing et al, 1995; Calderon-Margalit et al, 2008), but again these findings were not replicated in other studies.

A cause–effect relationship between infertility treatment and cancer risks would have important implications for assisted conception programmes, which have been expanding considerably in recent years. A total of 34 855 women had in vitro fertilisation treatment in the United Kingdom in 2006, corresponding to 44 275 cycles of treatment and representing a 6.8% increase in the number of patients relative to the previous year (HFEA, 2008). We investigated the long-term health effects of the use of ovarian-stimulation treatments, with particular attention to potential cancer risks, in a large British cohort of women investigated for ovulatory disorders who have been followed up for over 20 years.

Materials and methods

Study subjects were identified through two case series of women who attended reproductive endocrinology practices in London. The first consisted of 7425 women who attended the Royal Free Hospital in 1963–1999 (Ginsburg and Hardiman, 1991). The second was assembled at University College Hospital (Reproductive Medicine Unit) and comprised 1727 women (including five also seen at the Royal Free Hospital), many of whom were treated with clomiphene citrate in the 1960s and 1970s as part of the initial evaluation of this drug, before it became available on the market. The present follow-up study was approved by all the relevant ethics committees.

From the meticulous clinical notes kept by the founders of these case series, a trained abstractor extracted and computerised relevant data, including information on signs and symptoms at presentation, final diagnosis, treatments prescribed (with number of cycles and dose) and their outcome. Hospital records (mainly on microfilms) and computer databases were also reviewed. Data on underlying diagnoses and treatments were reviewed by a panel of infertility clinicians and re-classified using more up-to-date and standardised criteria. Underlying ovulatory disorders were classified according to the WHO classification (WHO, 1973). For the specific purpose of this study, treatments were classified according to their physiological effects on the ovary and endometrium into five broader, and not mutually exclusive, categories: (i) ovarian stimulation; (ii) ovarian physiological stimulation; (iii) ovarian suppression; (iv) endometrium stimulation; and (v) endometrium suppression (as detailed in Table 1).

Table 1 Baseline and follow-up characteristics of the study population

Study subjects were followed through the National Health Service Central Register (NHSCR) in England and Wales to ascertain their vital status, and to obtain information on site-specific cancer incidence, cause-specific mortality and migrations. A total of 7444 women out of the initial 9152 (81.3%) were traced and flagged through this register. For 95% of the 1708 who could not be traced, there was insufficient detailed information (e.g., name too common with no information on exact date of birth) or the names were foreign, suggesting that they might have been non-UK residents. A further 89 subjects were excluded because flagging was considered to be unreliable (n=8), they were no longer NHS patients at the time they joined the cohort (n=79), or they had subsequently undergone a sex change operation (n=2). For a further 183 women, information on treatment was lacking, and these women were excluded from any treatment-related analyses. Women who were known to be still alive and resident in England and Wales, and whose current general practitioner (GP) could be traced through their Health Authorities, were sent (through their GPs) a postal questionnaire to obtain further details on their reproductive and lifestyle characteristics.
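The cohort accounting in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative and are not taken from the study's own analysis files.

```python
# Cohort flow, using only the counts stated in the text.
initial_cohort = 9152
traced = 7444

# Women who could not be traced through the NHSCR.
untraced = initial_cohort - traced

# Exclusions among traced women, with the reasons given in the text.
exclusions = {
    "flagging considered unreliable": 8,
    "no longer NHS patients at cohort entry": 79,
    "subsequent sex change operation": 2,
}
final_population = traced - sum(exclusions.values())

# Women without treatment information are dropped only from
# treatment-related analyses, not from the cohort as a whole.
missing_treatment_info = 183
treatment_analysis_population = final_population - missing_treatment_info

pct_traced = round(100 * traced / initial_cohort, 1)
```

Under these figures the final study population and the percentage traced agree with the values quoted in the text (n=7355 and 81.3%).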

To compare mortality and cancer incidence with that in the general population, expected numbers of deaths and cancers in the cohort were calculated by applying England and Wales female rates to the number of person-years at risk, stratifying by 5-year calendar period and 5-year age bands. Time at risk for mortality analyses was estimated from the date of first hospital visit (or the date of first treatment for analysis by type of treatment) to death, emigration, loss to follow-up, or 31 December 2005, whichever occurred first. For cancer incidence analyses, time at risk was from 1 January 1971, when cancer registration reached national coverage, or from date of hospital visit/treatment if these occurred after that date, to the earliest of date of diagnosis of the first primary malignancy, emigration, loss to follow-up, date of death, or 31 December 2005. Period-age-standardised mortality ratios (SMRs) and incidence ratios (SIRs) were then calculated as the ratio between the observed and expected number of events (deaths or cancers, respectively) in the cohort, with their 95% confidence intervals (CIs) estimated using an exact method (Sasieni, 1995).
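As an illustration of the calculation described above, the following minimal sketch computes an expected count from stratum-specific person-years and reference rates, and then a standardised ratio with exact Poisson confidence limits. It is a sketch under stated assumptions: the strata, person-years, rates and observed count are hypothetical, the function names are illustrative, and the tail-inversion method shown is one common exact approach that may differ in detail from the cited method.

```python
import math

def expected_events(person_years, reference_rates):
    """Sum person-years x reference rate over (age band, period) strata."""
    return sum(py * reference_rates[s] for s, py in person_years.items())

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); adequate for modest counts."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def bisect_root(f, lo, hi, iterations=200):
    """Root of a monotone function f on [lo, hi], given a sign change."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_poisson_limits(observed, alpha=0.05):
    """Exact limits for a Poisson mean, inverting each tail at alpha/2."""
    if observed == 0:
        lower = 0.0
    else:
        lower = bisect_root(
            lambda mu: 1.0 - poisson_cdf(observed - 1, mu) - alpha / 2,
            0.0, float(observed))
    upper_bracket = observed + 10.0 * math.sqrt(observed + 1.0) + 10.0
    upper = bisect_root(
        lambda mu: poisson_cdf(observed, mu) - alpha / 2,
        float(observed), upper_bracket)
    return lower, upper

def standardised_ratio(observed, expected, alpha=0.05):
    """Observed/expected ratio with exact confidence limits."""
    lower, upper = exact_poisson_limits(observed, alpha)
    return observed / expected, lower / expected, upper / expected

# Hypothetical strata: (5-year age band, 5-year calendar period).
person_years = {
    ("25-29", "1971-1975"): 12000.0,
    ("30-34", "1976-1980"): 15000.0,
    ("35-39", "1981-1985"): 8000.0,
}
# Hypothetical reference rates per person-year for the same strata.
reference_rates = {
    ("25-29", "1971-1975"): 0.0001,
    ("30-34", "1976-1980"): 0.0002,
    ("35-39", "1981-1985"): 0.0001,
}
expected = expected_events(person_years, reference_rates)
smr, smr_lower, smr_upper = standardised_ratio(10, expected)
```

The same function serves for SMRs and SIRs; only the event definition and the reference rates change.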

Data on mortality and cancer incidence for England and Wales were provided by the Office for National Statistics as tabulations of numbers of female cancers and deaths by single calendar year, single year of age, and four-digit codes of the International Classification of Diseases (ICD) (revision 7 (ICD-7) for 1961–1967; ICD-8 for 1968–1978; ICD-9 for 1979–2000; and ICD-10 for 2001–2005) (WHO, 1957, 1967, 1977, 1994). Appropriate bridging of ICD codes across the various revisions was performed to ensure comparability throughout the follow-up period.

To account for socioeconomic differences between the cohort and the general population, we also obtained rates for England and Wales by quintiles of socioeconomic deprivation as defined by the Carstairs Index (Carstairs and Morris, 1989) and, from 1995 onwards, the income domain of the Index of Multiple Deprivation (IMD) (DETR, 2000). Rates for the top two quintiles (the two most affluent) of the national distribution of deprivation scores were used to estimate expected numbers of events in the cohort because, at that time, infertility treatment was sought mainly by women of high socioeconomic status. Based on their own or their partner's occupation (OPCS, 1991), 68.6% of the women who completed the questionnaire were in social classes I and II (the two most affluent) and only 5.1% were in social classes IV and V (the two least affluent); the equivalent figures for England and Wales females were 27.8 and 23.1%, respectively (OPCS/GRO, 1993).

Two approaches were used to compare study groups within the cohort. In the first, the risk of dying from a particular cause, or of developing a certain site-specific cancer, among patients ‘exposed’ to a given characteristic (e.g., treatment type) relative to the risk among those ‘unexposed’ was estimated as the ratio between the two corresponding SMRs, or SIRs, to take into account calendar period and age effects. The 95% CIs for these relative risks (RRs) were calculated using an exact method (Breslow and Day, 1987). The second approach used Cox proportional hazards models (Clayton and Hills, 1993) to examine treatment effects in more detail while adjusting for potential confounders. RRs were estimated as hazard rate ratios in users relative to non-users of a given treatment while adjusting for current age (as the analysis time-scale); other variables were included in the regression models to evaluate their roles as potential confounders or effect modifiers. Because of the small number of cases, assessment of confounding was performed for each potential confounding variable one at a time, by comparing the age-adjusted and the age- and variable-adjusted estimates within the subset of women with data on that variable. The proportional hazards assumption, evaluated by visual examination and a formal test (Schoenfeld, 1982; Grambsch and Therneau, 1994), was met for all models shown. All statistical analyses were performed using Intercooled Stata 10.0 (Stata Corporation, College Station, TX, USA).
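The first approach, a relative risk formed as the ratio of two standardised ratios, can be sketched as follows. This is an illustration under assumptions rather than the study's actual procedure: it uses the standard conditional argument that, given the total number of events, the exposed count is binomial with odds equal to RR × (expected exposed / expected unexposed), and inverts exact binomial tail probabilities. The function names and the example counts are hypothetical, and the exact method cited in the text may differ in detail.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(k + 1))

def bisect_root(f, lo, hi, iterations=200):
    """Root of a monotone function f on [lo, hi], given a sign change."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_binomial_limits(k, n, alpha=0.05):
    """Exact limits for a binomial proportion, inverting each tail."""
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: 1.0 - binomial_cdf(k - 1, n, p) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if k == n else bisect_root(
        lambda p: binomial_cdf(k, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

def ratio_of_standardised_ratios(obs_exposed, exp_exposed,
                                 obs_unexposed, exp_unexposed, alpha=0.05):
    """RR = (O1/E1) / (O0/E0) with conditional exact limits.

    Assumes at least one event in the unexposed group.
    """
    rr = (obs_exposed / exp_exposed) / (obs_unexposed / exp_unexposed)
    total = obs_exposed + obs_unexposed
    p_low, p_high = exact_binomial_limits(obs_exposed, total, alpha)

    def to_rr(p):
        # Convert the binomial proportion back to the relative-risk scale.
        if p >= 1.0:
            return math.inf
        return (p / (1.0 - p)) * (exp_unexposed / exp_exposed)

    return rr, to_rr(p_low), to_rr(p_high)

# Hypothetical counts: 12 observed vs 6.0 expected among the exposed,
# 5 observed vs 5.0 expected among the unexposed.
rr, rr_lower, rr_upper = ratio_of_standardised_ratios(12, 6.0, 5, 5.0)
```

Because both standardised ratios are built from the same age- and period-specific reference rates, their ratio compares the two groups while allowing for age and calendar period, as the text describes.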


Results

The baseline characteristics of the women in the final study population (n=7355) were similar to those in the original cohort (n=9152) (Table 1). The mean age at presentation among the participants was 28.1 years, with a mean follow-up of 21.4 years; 89% of the participants were followed up for at least 10 years and 14% for at least 30 years. Half of the participants presented because of menstrual disturbances (Table 1). After investigation, 24% were diagnosed with a WHO type II ovulatory disorder, mainly polycystic ovarian syndrome. A total of 3196 (44.5%) patients received ovarian-stimulation treatments, 1976 receiving clomiphene only, with a median of two cycles, corresponding to a median dose of 1000 mg per woman; 18% were prescribed more than the maximum of six cycles currently recommended by the Committee on Safety of Medicines (1995). A total of 1198 women were prescribed gonadotrophins (alone or in combination with clomiphene), with a median of three cycles per woman (Table 1).

Questionnaire data were available for 2545 participants (out of the 4475 who were still alive and whose current GP could be traced and was willing to forward the questionnaire to them; a response rate of 56.9%). There was no evidence that respondents differed from the remaining study population in terms of clinic attended (79.8 vs 83.8%, respectively, attended the Royal Free Hospital), age at initial clinical evaluation (mean±s.d.: 28.0±7.1 vs 28.5±8.4 years, respectively), or type of treatment (e.g., 29.6 vs 26.5% were prescribed clomiphene only, respectively).

In all, 274 deaths occurred during follow-up to the end of 2005; 47% were from malignant neoplasms, including 39 from breast cancer, 10 from ovarian cancer and 7 from cancer of corpus uteri (hereafter referred to as cancer of the uterus) (Table 2). Relative to the general population, mortality in the cohort was lower for all causes (SMR=0.89), reflecting lower risks for most specific causes except, as expected, for endocrine and metabolic diseases (SMR=1.99) (Table 2). Analyses by cancer site showed increased mortality in the whole cohort for cancers of the liver and biliary tract (SMR=2.68) and of the uterus (SMR=3.02), with only the latter being statistically significant (Table 2); in contrast, mortality from cancers of the breast and ovary did not differ from that in the general population. Further analyses stratified by type of treatment revealed that the excess risk of dying from liver and biliary cancer was confined to women who were prescribed ovarian stimulation; these women had a four-fold increase (RR=4.19, based on small numbers; Table 2) in risk relative to those not given this treatment. Similarly, women who were given ovarian-stimulation drugs had more than twice (RR=2.37) the risk of dying from breast cancer of those who were not, with this difference being statistically significant. In contrast, those prescribed ovarian-stimulation drugs had only about half the risk of dying from cancers of the uterus (RR=0.53) and ovary (RR=0.57) of those not given such drugs (Table 2), but these estimates were rather imprecise.

Table 2 Mortality in the cohort for selected disease categories, by type of treatment

A total of 367 incident malignant neoplasms occurred from 1971 to 2005, comprising 177 breast, 31 uterine and 21 ovarian cancers. Relative to the general population, cohort members had increased risks of developing cancers of the breast (SIR=1.13), uterus (SIR=2.02) and nervous system (SIR=1.91), albeit only significant for the latter two, but a significantly lower risk of developing cervical cancer (SIR=0.21) (Table 3). Analyses stratified by type of treatment showed that, relative to the general population, women who were prescribed ovarian stimulation had a borderline raised risk of developing a malignant neoplasm (SIR=1.10), with significantly increased risks for cancers of the breast (SIR=1.26) and uterus (SIR=2.31). There were also non-significant increased risks for cancers of the liver and biliary tract (SIR=2.59) and of the nervous (SIR=1.96) and lymphatic and haematopoietic systems (SIR=1.66), as well as a significantly lower risk for cervical cancer (SIR=0.09). Women who were not prescribed ovarian stimulation had no increase in the risk of breast cancer (SIR=0.99), but a non-significant increased risk of cancer of the uterus (SIR=1.66); they also had a significantly lower risk of cervical cancer (SIR=0.36). Thus, risks among women who were prescribed ovarian stimulation relative to those not prescribed such treatments were not significantly raised for any cancer site, although there were borderline increases for all neoplasms (RR=1.17) and cancer of the breast (RR=1.27) (Table 3). There were no associations between any of the other treatment categories (Table 1) and risks of cancers of the breast, uterus or ovary (data not shown).

Table 3 Cancer incidence in the cohort for selected sites, by type of treatment

The lower mortality in the cohort relative to the general population was accounted for mainly by socioeconomic differences between the two populations, as most SMRs became closer to unity when England and Wales rates for the two most affluent quintiles of the national distribution of area-based deprivation scores were used to derive expected numbers (e.g., the SMRs for deaths from diseases of the circulatory, respiratory and digestive systems increased from 0.67, 0.77 and 0.87 (Table 2) to 0.96, 1.13 and 1.34, respectively); similarly, the magnitude of most SIRs increased slightly. However, the magnitudes of the RRs associated with ovarian stimulation were little affected, with those for cancers of the breast, uterus and ovary being 1.27 (95% CI 0.93, 1.74), 1.40 (0.64, 3.18) and 1.40 (0.53, 3.96), very similar to those presented in Table 3.

Within-cohort analyses did not suggest that associations between ovarian stimulation and risks for cancers of the breast and uterus were confounded by other risk factors for these cancers (small numbers precluded similar analyses for ovarian cancer). Having ever been pregnant, as recorded in the clinical notes, was not a confounder of the association of ovarian stimulation with cancers of the breast (age-adjusted RR=1.30 (95% CI 0.96, 1.77); age- and ever-pregnancy-adjusted RR=1.28 (0.94, 1.74)) or uterus (age-adjusted RR=1.53 (0.72, 3.23); age- and ever-pregnancy-adjusted RR=1.73 (0.81, 3.67)). Similarly, adjustment for having ever been pregnant before treatment, or for having become pregnant as a result of it, did not affect the magnitude of these cancer associations. In the subset of women who completed the questionnaire (comprising 41 treated and 28 untreated breast cancer cases and a total of 30 583 person-years of follow-up), the age-adjusted RR associated with ovarian stimulation was 1.02 (95% CI 0.63, 1.65), and its magnitude changed little with further adjustment for having ever been pregnant (1.02 (0.63, 1.66)), age at first pregnancy (1.03 (0.61, 1.75)), ever-use of oral contraceptives (1.01 (0.62, 1.64)) or hormone replacement therapy (1.05 (0.65, 1.71)), or a positive family history of breast cancer (1.05 (0.65, 1.72)). The small number of cases among respondents precluded similar analyses for cancer of the uterus.

Ovarian-stimulation treatment was associated with underlying diagnosis: for instance, the proportions of treated vs untreated women were higher among those with WHO type II ovulatory disorders (53 vs 47%), but lower among those with thyroid disorders (39 vs 61%) and weight-related problems (48 vs 52%). There were, however, no associations between underlying diagnosis and cancer risks, except that women with type II ovulatory disorders had an elevated risk of cancer of the uterus (2.82 (1.13, 6.70)) relative to women with other diagnoses. Further adjustment for underlying diagnosis did not affect the magnitude of the association between ovarian stimulation and cancer of the uterus (e.g., age-adjusted and age- and WHO type II-adjusted RR: 1.53 (0.72, 3.23) and 1.48 (0.60, 3.63), respectively). There was also no evidence that the ovarian-stimulation effects were modified by any of these risk factors, although the power of the study to detect interactions was limited. In particular, the magnitude of the associations with cancer risks was not modified by age when the treatment was first prescribed. Only 121 women developed ovarian hyperstimulation syndrome, and the small numbers of cases among them (only one cancer of the breast and one of the uterus) precluded examination of whether risks were particularly elevated in this subgroup.

More detailed analyses by type of ovarian-stimulation drug prescribed showed that the risk of developing breast cancer was significantly elevated among women who were prescribed clomiphene only (SIR=1.41), but this risk was not higher than that among women who were not treated (with ovarian stimulation or any other treatment) (Table 4). There were no clear dose–response trends in the risk of breast cancer with time since first treatment, total cumulative dose, or number of cycles of clomiphene, although women in the highest exposure categories had the highest risks (Table 4). The risks of cancer of the uterus were significantly raised among women who took ovarian-stimulation drugs, but again none of the risks were significantly higher than those observed among the women not given any type of treatment. There were no clear trends in the risk of this cancer with time since first treatment, but there was a positive trend with total cumulative dose and number of cycles of clomiphene, with the former being significant (P for linear trend=0.034). Thus, women who took 2250 mg were 2.62 (95% CI 0.94, 6.82) times more likely to develop cancer of the uterus than those who were not treated. The risk of cancer of the ovary was not associated with any of the ovarian-stimulation treatments, and there was no evidence of any trends in risk with time since first use, total cumulative dose or number of cycles of clomiphene, but estimates were based on small numbers. Risks for any of these three cancer sites were not associated with time since first treatment with gonadotrophins or number of cycles or ampoules, but again relevant numbers were small (not shown).

Table 4 Risks of cancers of the breast, corpus uteri, and ovary by type of ovarian-stimulation treatment, time since first treatment, and dose


Discussion

This cohort study has several strengths. First, in the absence of trials, cohort studies are the next best design because information on exposures is obtained prior to onset of disease. Linkage to the NHSCR ensured that information on cause-specific mortality and cancer incidence was also unbiased relative to the exposure status of the cohort members and minimised losses to follow-up. Second, the study had a long follow-up, allowing examination of long-term effects of ovarian stimulation. Third, it benefited from detailed information on causes of infertility and drug exposures from medical records, as well as on cancer risk predictors obtained through completed questionnaires (although the latter only for a subset). Fourth, about one fifth of women were exposed to high levels of ovarian-stimulation drugs, well above currently recommended maximum levels. Fifth, the availability of an internal comparison group may have minimised possible confounding by any factors correlated with treatment-seeking behaviour, because women who attended the two reproductive endocrinology practices are likely to differ from the general population in several respects. In particular, our study confirmed that women who seek infertility treatments tend to be healthier and of a higher socioeconomic background than those in the general population. Finally, the availability of information on underlying diagnosis and other risk factors also enabled adjustment for a variety of potential confounding factors.

Weaknesses of our study include the fact that follow-up was possible only for 80% of the original cohort; however, there was no evidence that those untraced through the NHSCR differed from those who were traced. The number of cancer cases accrued during the follow-up was larger than in most similar cohorts (Venn et al, 1995, 1999; Doyle et al, 2002; Klip et al, 2002), although not all (Brinton et al, 2004; Jensen et al, 2007; Calderon-Margalit et al, 2008; Jensen et al, 2009), but the numbers for certain sites, particularly uterus and ovary, were too small to provide reliable estimates. The numbers will increase with increasing follow-up as these women are now reaching the ages at which cancer risks are high. Follow-up for cancer incidence was possible only from 1971, but although we may have missed some earlier cases, most women were recruited later. Data on various potential confounding variables were available, but their quality and completeness were not always ideal, being limited to women who were still alive and traceable at the time of the re-contact; hence residual confounding by these and other correlates of infertility cannot be excluded.

Our study found increased incidence of, and mortality from, breast cancer among women who were prescribed ovarian-stimulation treatments relative to those who were not, although the relative risk was significant only for mortality data. Further analyses by cumulative dose and number of cycles did not reveal any clear dose–response gradients. The lack of such trends would argue against a true cause–effect relationship. This interpretation would be consistent with findings from most cohort (Modan et al, 1998; Venn et al, 1999; Klip et al, 2002; Doyle et al, 2002; Brinton et al, 2004) and case–control studies (Weiss et al, 1998; Ricci et al, 1999) that have assessed the relationship between fertility treatments and breast cancer risk, which have not found any overall associations, although some (Jensen et al, 2007; Orgéas et al, 2009) reported increases in specific subgroups.

This study provides no evidence that ovarian stimulation is associated with an elevated risk of ovarian cancer. In contrast, some earlier case series and epidemiological studies (Whittemore et al, 1992; Rossing et al, 1994; Shushan et al, 1996) reported positive associations, with risk being particularly high in the subgroup of nulligravid women, whereas among those who took drugs but did achieve a pregnancy the risk was not significantly different from that among gravid women without a history of infertility. More recent cohort (Modan et al, 1998; Venn et al, 1999; Klip et al, 2002; Doyle et al, 2002; Brinton et al, 2004; Calderon-Margalit et al, 2008; Jensen et al, 2009) and case–control (Ness et al, 2003) studies, however, have failed to find any such association. The lack of clear trends with time since start of treatment would also argue against the hypothesis that ovulation stimulation induces growth of pre-existing latent tumours.

There was some evidence that cancer of the uterus may be associated with ovarian stimulation, with risks increasing with increasing cumulative dose of clomiphene and, possibly, number of cycles. Few studies have examined this question but three large cohort studies reported increases among women exposed to high doses or with longer follow-up (Modan et al, 1998; Althuis et al, 2005; Calderon-Margalit et al, 2008). Such a link would be biologically credible as clomiphene and tamoxifen are both selective estrogen-receptor modulators and tamoxifen use has been shown to be associated with an increased risk of cancer of the uterus (Swerdlow et al, 2005). Although high doses of clomiphene may have been preferentially given to women with polycystic ovarian syndrome, a known risk factor for cancer of the uterus, adjustment for underlying diagnosis only slightly reduced the magnitude of the risk estimate.

Ovarian stimulation was not associated with colorectal cancer, malignant melanoma of the skin, or thyroid cancer. There was some evidence suggestive of a positive association with cancer of the liver and biliary tract, but this may be a chance finding (based on only three cases). Such a link has not been reported earlier but it has some biological plausibility as oral contraceptive use is known to be associated with increased risks of benign hepatic adenoma (Edmondson et al, 1976) and liver cancer (IARC, 1999).

The significantly lower risk of cervical cancer in this cohort relative to the general population is consistent with a possible surveillance bias and the fact that parity increases the risk of this cancer (ICESCC, 2006). There is some evidence in our study that risk may be somewhat lower among women who were exposed to ovarian stimulation than among those who were not (RR=0.24; Table 3), but the small number of cases precluded proper examination of dose–response effects.

Overall, the results of this study do not support strong associations between ovulation-stimulation treatments and cancer risks, with the exception of possible increases in the risk of cancers of the uterus and of the liver and biliary tract. These findings support the need for continued monitoring of the long-term effects of these treatments.