Risk prediction models are important tools for identifying individuals at differing risks of developing a disease, especially if they take into account disease in relatives, and can be used to offer individually tailored prevention and clinical management. Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) is a risk prediction algorithm for breast and ovarian cancers that takes into account individual-specific data from relatives and computes BRCA1 and BRCA2 mutation carrier probabilities as well as age-specific risks for breast and ovarian cancer given a person’s family history of cancer (Antoniou et al, 2004, 2008a). BOADICEA was developed using segregation analyses based on 2785 families ascertained through population-based studies of breast cancer and families with multiple affected individuals in which at least one family member had been screened for BRCA1 and BRCA2 mutations (Antoniou et al, 2008a). A web-interface allows users to easily compute carrier probabilities and future cancer risks (

Independent validation of a risk model is important to justify implementation in clinical management. Although BOADICEA has been shown to be well calibrated in terms of predicting BRCA1 and BRCA2 mutation carrier status (Antoniou et al, 2008b; Stahlbom et al, 2012), it is still important to prospectively evaluate its performance in external populations, and in particular, whether BOADICEA performs adequately for populations outside the United Kingdom. Differences in underlying cancer incidences, founder mutations and environmental exposures could potentially lead to substantial under- or over-prediction of breast cancer risk for women living in different countries.

The aim of this study was to evaluate the performance of the BOADICEA model, in terms of calibration, discrimination and accuracy, in predicting first invasive breast cancer risks for female relatives of Australian women of European ancestry with (and without) breast cancer. In doing so, we introduce a tool for web-based risk calculation, BOADICEACentre, which can be run in batch mode.

Participants and Methods


Cohort participants were women who had not been diagnosed with invasive breast cancer at the time of recruitment to the Australian Breast Cancer Family Registry (ABCFR), who had completed a baseline questionnaire, and for whom follow-up data were available. This is an example of a prospective family study cohort (Hopper, 2011). The ABCFR is a component of the Breast Cancer Family Registry (John et al, 2004), and includes a population-based case–control-family study of the genetic, environmental and lifestyle factors associated with breast cancer; details of the recruitment strategy and baseline data collection methods have been previously described (Hopper et al, 1994, 1999; McCredie et al, 1998; Dite et al, 2003).

Population-based index cases were women under the age of 60 years when diagnosed with incident primary invasive breast cancer and were ascertained between 1992 and 1999 through population-based state cancer registries and recruited from metropolitan areas of Melbourne and Sydney, Australia (reporting of cancer to these state registries is a legislative requirement) (McCredie et al, 1998; John et al, 2004). Population-based index controls were also aged <60 years when identified from electoral rolls between 1992 and 1999. Family members of the index cases and controls were also recruited.

The ABCFR also includes 132 Melbourne-based families of Ashkenazi Jewish descent with one or more women with a history of breast cancer (Apicella et al, 2003), and 61 families with twin pairs in which one or both had a diagnosis of breast cancer recruited through the Australian Twin Registry (Hopper et al, 2013).

All participants provided written informed consent before participation and all studies were approved by the relevant local ethics committees.

After excluding women who had either a mastectomy or oophorectomy at baseline, the cohort consisted of 4176 women who were unaffected at baseline; 2704 from 991 population-based case families, 1322 (including index controls) from 521 population-based control families, 91 from 73 Ashkenazi Jewish families and 59 from 16 twin families.

Baseline questionnaire

At recruitment, all participants completed an interviewer-administered questionnaire which included detailed information on demographics, lifestyle and environmental factors, past surgeries (such as mastectomy and oophorectomy) and family history of cancer (see McCredie et al, 1998; Dite et al, 2003; Milne et al, 2011).

Ascertainment of baseline family cancer history

At recruitment, all index cases completed a family history questionnaire (John et al, 2004) that sought cancer history for all their first-degree and second-degree relatives. Recruitment of parents, siblings and aunts of the index cases, and for some families other relatives depending on the cancer family history, was sought. Participating relatives then provided cancer history information on themselves and their relatives. A substantial proportion of the information collected for first-degree relatives was provided independently by the relatives themselves. Documented verification of reported cancers (through pathology reviews and reports, cancer registries and medical records) was sought wherever possible. Overall, 63% of breast cancers and 43% of non-breast cancers were verified.

Ascertainment of incident cancers

Incident breast cancer cases for participants were identified from notifications to the Victorian Cancer Registry and the NSW Central Cancer Registry of diagnoses of adenocarcinoma of the breast (International Classification of Diseases 9th revision rubric 174.0–174.9, or 10th revision rubric C50.0-C50.9) up to the end of 2010. Women with in situ breast cancers were not included as incident cases.

Imputation of missing family data

The data required from each of the participants and their relatives for these analyses were: relationship to the index case, date of birth, vital status, age at interview or death, and for those who had had cancer, the site and age at diagnosis. For some individuals (2% of the cohort), one or more of the above data items were missing and could not be calculated directly from known data. In these instances, data were imputed iteratively using a variation of a previously developed protocol (for more details, see Dite et al, 2003, 2010). Those with unknown age at breast cancer diagnosis (4% of breast cancers reported) were assumed to have developed the disease at age 70, at last age of follow-up or at age at death (if applicable), whichever was youngest.
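The rule for an unknown age at breast cancer diagnosis can be sketched in a few lines. This is a minimal illustration only, not the imputation protocol itself: the function name and arguments are hypothetical, and it assumes that ages which are not available are simply left out of the comparison.

```python
from typing import Optional


def impute_age_at_diagnosis(last_followup_age: Optional[float],
                            age_at_death: Optional[float] = None,
                            default_age: float = 70.0) -> float:
    """Assumed age at breast cancer diagnosis when the true age is unknown.

    Takes the youngest of the default age (70), the last age of follow-up
    and the age at death. Ages passed as None are ignored (an assumption
    made for this sketch).
    """
    candidates = [default_age]
    if last_followup_age is not None:
        candidates.append(last_followup_age)
    if age_at_death is not None:
        candidates.append(age_at_death)
    return min(candidates)
```

For example, a relative last followed up at age 64 and still alive would be assigned a diagnosis age of 64, whereas one followed up to age 82 would be assigned age 70.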

BRCA1 and BRCA2 mutation testing

Where available, information on BRCA1 and BRCA2 mutation testing was also taken into account. Mutations were protein-truncating or missense mutations classified as deleterious by the Breast Cancer Information Core (National Human Genome Research Institute, 2002). Details of testing are given elsewhere (Dite et al, 2010). BRCA1 and BRCA2 mutation testing of family members was conducted for 514 of 1857 prevalent cases (28%) identified at baseline, and for 79 of 4176 unaffected participants (2%). Sensitivity of the mutation detection technique was assumed to equal 70% and 80% for BRCA1 and BRCA2, respectively.


We have developed a set of new computer software programs written in Java, which we have named BOADICEACentre. These programs were used to format the pedigree files for BOADICEA, validate the data and to estimate missing values as required (see Appendix). BOADICEACentre was also used to automate the computing of risk estimates from BOADICEA (as the web version of BOADICEA only computes a risk estimate for one participant at a time) by automatically changing the participants, submitting pedigree files and collating the results.

Statistical analyses

For all participants, BOADICEA (University of Cambridge, Cambridge, UK) (Web program v3) was used to compute the risk of being diagnosed with invasive breast cancer during follow-up (Antoniou et al, 2004, 2008a). Follow-up began at baseline and ended at 10 years post baseline, age at death, age at first mastectomy (of any type), age at first oophorectomy (of any type) or at age 80 years, whichever came first. Australian and UK age-specific cancer incidences were used as the population reference incidences, based on the recent update of the BOADICEA model that incorporates calendar period cancer incidences up to 2010 and country-specific incidences. Risks were computed for women of Ashkenazi Jewish origin assuming BRCA1 and BRCA2 mutation prevalence for young controls (BRCA1: 1.6% and BRCA2: 1.2%; Satagopan et al, 2001; Antoniou et al, 2008a).
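The censoring rule for follow-up can be written as a single minimum over the candidate end points. The sketch below is illustrative: the function and argument names are hypothetical, all quantities are expressed as ages in years, and events that did not occur are passed as None.

```python
from typing import Optional


def followup_end_age(baseline_age: float,
                     age_at_death: Optional[float] = None,
                     age_at_mastectomy: Optional[float] = None,
                     age_at_oophorectomy: Optional[float] = None,
                     horizon_years: float = 10.0,
                     max_age: float = 80.0) -> float:
    """Age at which follow-up ends, whichever end point comes first.

    End points: baseline plus 10 years, death, first mastectomy,
    first oophorectomy, or age 80.
    """
    candidates = [baseline_age + horizon_years, max_age]
    for age in (age_at_death, age_at_mastectomy, age_at_oophorectomy):
        if age is not None:
            candidates.append(age)
    return min(candidates)
```

A woman recruited at age 45 with no censoring event is followed to age 55; a woman recruited at age 75 is followed only to age 80.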

We evaluated the performance of BOADICEA by investigating multiple model properties including calibration, discrimination and accuracy at the individual level (or dispersion). Model calibration (which indicates the overall fit of the model) was evaluated by comparing the expected (E) number of cases computed from BOADICEA with the observed (O) number of breast cancer cases. To account for having multiple participants from the same family, we used robust 95% confidence intervals (CIs) for the ratio of expected to observed cases, computed from family-level totals with the following notation:

O=total observed cases, E=total expected cases, Oi=total observed cases within family i, Ei=total expected cases within family i. Summations are over all Nfam families.
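As an illustration of how a family-clustered interval for E/O can be built from these quantities, the sketch below uses a standard Taylor-linearisation (ratio-estimator) variance for log(E/O), treating families as independent clusters. This is an assumed construction chosen for illustration and is not necessarily the estimator used in this study; the function name is hypothetical.

```python
import math


def robust_eo_ci(observed_by_family, expected_by_family, z=1.96):
    """E/O ratio with a cluster-robust confidence interval.

    Assumed variance (linearisation of the ratio estimator):
        Var[log(E/O)] ~ Nfam/(Nfam-1) * sum_i (Ei/E - Oi/O)^2
    Returns (ratio, lower, upper) on the ratio scale.
    """
    if len(observed_by_family) != len(expected_by_family):
        raise ValueError("need one observed and one expected total per family")
    n_fam = len(observed_by_family)
    total_o = sum(observed_by_family)
    total_e = sum(expected_by_family)
    if n_fam < 2 or total_o <= 0 or total_e <= 0:
        raise ValueError("need at least two families and positive totals")

    ratio = total_e / total_o
    sum_sq = sum((e_i / total_e - o_i / total_o) ** 2
                 for o_i, e_i in zip(observed_by_family, expected_by_family))
    se_log = math.sqrt(n_fam / (n_fam - 1) * sum_sq)

    # The interval is symmetric on the log scale, then back-transformed.
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)
```

Working on the log scale keeps the interval positive and makes it symmetric around the ratio in multiplicative terms, which is the usual convention for E/O ratios.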

Discrimination (which is the ability of the model to distinguish between breast cancer cases and non-cases at the individual level) was evaluated using the area under the receiver-operating characteristic (ROC) curve, ignoring participants who were censored. For this measure, 0.5 is no better than chance and 1.0 is perfect discrimination. Sensitivity and specificity were assessed using cut-points of 10-year projected risks of 2%, 3%, 4% and 5%.
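The two discrimination measures can be sketched as follows. The area under the ROC curve is computed here through its rank interpretation (the probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case, with ties counted as one half). The function names are hypothetical, and treating a risk at or above the cut-point as a positive prediction is an assumption of this sketch.

```python
def roc_auc(risks, outcomes):
    """Area under the ROC curve via pairwise case/non-case comparisons."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(cases) * len(non_cases))


def sensitivity_specificity(risks, outcomes, cut_point):
    """Sensitivity and specificity when risk >= cut_point predicts a case."""
    tp = sum(1 for r, y in zip(risks, outcomes) if y == 1 and r >= cut_point)
    fn = sum(1 for r, y in zip(risks, outcomes) if y == 1 and r < cut_point)
    tn = sum(1 for r, y in zip(risks, outcomes) if y == 0 and r < cut_point)
    fp = sum(1 for r, y in zip(risks, outcomes) if y == 0 and r >= cut_point)
    return tp / (tp + fn), tn / (tn + fp)
```

The pairwise loop is quadratic in sample size, which is acceptable for a cohort of a few thousand women; a rank-based implementation would scale better for larger data sets.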

The accuracy of BOADICEA (i.e., whether the predicted probabilities fit the observed breast cancer status at the individual level) was evaluated using a logistic regression analysis in which the observed breast cancer status was the dependent variable and the log-odds of BOADICEA’s predicted probability of developing breast cancer during the follow-up period was the independent variable. The null hypothesis, that the estimated regression coefficient was equal to 1 in the model without a constant term, was used to test for dispersion.
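A minimal sketch of this dispersion test is given below: a one-parameter logistic regression with no constant term is fitted by Newton-Raphson, and a Wald test compares the slope with 1. The function name is hypothetical, and the standard error is the ordinary model-based one; whether the published analysis additionally adjusted for family clustering is not stated here, so that is left out.

```python
import math


def dispersion_test(predicted, outcomes, tol=1e-10, max_iter=100):
    """Fit logit(P(y=1)) = beta * logit(predicted), with no constant term.

    Returns (beta, standard error, two-sided Wald P-value for beta = 1).
    """
    x = [math.log(p / (1.0 - p)) for p in predicted]  # log-odds of predictions
    beta = 1.0  # start at the null value
    for _ in range(max_iter):
        mu = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, outcomes, mu))
        info = sum(xi * xi * mi * (1.0 - mi) for xi, mi in zip(x, mu))
        step = score / info
        beta += step
        if abs(step) < tol:
            break

    # Recompute the information at the final estimate for the standard error.
    mu = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
    info = sum(xi * xi * mi * (1.0 - mi) for xi, mi in zip(x, mu))
    se = 1.0 / math.sqrt(info)
    z = (beta - 1.0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value
```

A slope below 1 indicates that the predicted risks are too spread out relative to the observed outcomes, and a slope above 1 indicates that they are not spread out enough.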

All other statistical analyses were performed, and graphs produced, using Stata 12.1 (Stata Corporation, College Station, TX, USA).


Cumulative 10-year invasive breast cancer risks were calculated for 4176 participants from 1601 families from the ABCFR, the characteristics of whom are shown in Table 1. Of these, 117 participants identified themselves as being of Ashkenazi descent, while 35 participants were known to be carriers of BRCA1 or BRCA2 mutations. Figure 1 shows the distribution of 10-year BOADICEA scores. During the 10 years of follow-up, a total of 115 incident invasive breast cancers were identified. Approximately 15% of the participants were censored before attaining 10 years of follow-up (average 5.7 years; 378 reached 80 years of age, 5 had a bilateral mastectomy and 232 died from causes other than breast cancer). Duration of follow-up of family members did not differ appreciably by status of the index case (average 9.2 years for case families and 9.6 years for control families).

Table 1 Descriptive characteristics of the ABCFR cohort of 4176 women unaffected with breast cancer at baseline
Figure 1. Distribution of BOADICEA scores. The white bars denote the distribution of 10-year BOADICEA risk scores for all participants; black bars denote the distribution for participants from index control families.

The ratio of expected to observed number of breast cancers was 0.92 (95% CI 0.76–1.10; Table 2). There was some evidence that breast cancers were under-predicted for the 60–69 year age group (E/O ratio=0.71, 95% CI 0.50–1.00). The E/O ratios by subgroups of the participant’s relation to the index case and by the number of affected relatives reported ranged between 0.83 and 0.98 and all CIs included 1.00.

Table 2 Comparison of the observed number of incident breast cancers with the expected number based on BOADICEA, overall and by subgroups

The E/O ratio point estimate was almost identical for 5-year follow-up (overall E/O=0.93, 95% CI 0.71–1.21). Censoring follow-up at the onset of any cancer (apart from non-melanoma skin cancer) did not materially alter the results (overall E/O=0.91, 95% CI 0.75–1.10). Using UK incidences instead of Australian incidences made little difference to the results (e.g., overall E/O ratio=0.93, 95% CI 0.78–1.12). Excluding families that had imputed data on one or more family members (161 families) also made virtually no difference to the results (data not shown).

The area under the ROC curve, the measure of discrimination, was 0.70 (95% CI 0.66–0.75; Figure 2). The sensitivity and specificity given a 10-year BOADICEA risk score of 2%, 3%, 4% and 5% are highlighted in the figure. For example, sensitivity was 70% while specificity was 62% when a 10-year risk of 3% was used as a cut-point.

Figure 2. The receiver operating characteristic (ROC) curve for BOADICEA as a predictor of 10-year cumulative risk. The square boxes on the ROC curve denote the cut-points. The area under the ROC curve was 0.70.

There was no evidence that the model was under- or over-dispersed (dispersion coefficient=0.97, 95% CI 0.91–1.02, P=0.2).


We have shown that BOADICEA is well calibrated for first invasive breast cancers overall and for most age and family history subgroups. In addition, the prediction model had good discriminatory accuracy, with no evidence of systematic under- or over-prediction at the individual level. We also provide details of a new BOADICEA utility, BOADICEACentre, which is easy to use and will help researchers to input data from an unlimited number of families in batch mode and to collate the results.

The under-prediction by BOADICEA in the 60–69 year age group could be due to the screening habits of women in the ABCFR. Widespread population screening in Australia has been reported for this age group (AIHW (Australian Institute of Health and Welfare), 2012), which may lead to over-diagnosis of breast cancer (Independent UK Panel on Breast Cancer Screening, 2012). Although the incidence rates that underlie BOADICEA should (at least partly) account for trends in population screening, the model assumes that the sample being tested has similar characteristics to the overall population. The women in the ABCFR are, by study design, more likely to have a family history of breast cancer than the general population, and Australian women with a family history of breast cancer are more likely to seek additional screening than women without a family history (Roder et al, 2008). This discrepancy is likely to be most apparent for 60–69 year olds, an age range in which incidence peaks (Ferlay et al, 2010).

Use of Australian or UK incidences made little difference to the predictions. This is consistent with the fact that, over the past 30 years, the incidences for both countries were similar (Ferlay et al, 2010). Caution should be exercised in using BOADICEA for other populations in which incidences can vary markedly between and within countries, and between ethnic groups. A model based on an underlying population incidence helps take into account changing screening habits over time and differing lifestyle choices such as use of exogenous hormones. The recent extension of BOADICEA (Web program v3) allows users to select population-specific incidences.

The discriminatory power was good for BOADICEA, and compares favourably with other validated cancer risk prediction models. The Breast Cancer Risk Assessment Tool (BCRAT, also known as the Gail model; Gail et al, 1989) and the International Breast Cancer Intervention Study model (IBIS, also known as the Tyrer–Cuzick model; Tyrer et al, 2004) have been validated in different populations, with areas under the ROC curves ranging from 0.53 to 0.66 for BCRAT, and 0.70 for IBIS (Anothaisintawee et al, 2012; Quante et al, 2012). Amir et al (2003), using a small dataset from the United Kingdom, reported slightly higher discriminatory power (area under the ROC curve ranging from 0.72 to 0.76) but poor calibration (E/O ranging from 0.48 to 0.81) for BCRAT, IBIS and BRCAPro. A Swedish study reported a slight underestimation (although not statistically significant) in the predicted number of invasive breast cancers (25 cases observed, E/O=0.71, 95% CI 0.48–1.05) for an earlier version of BOADICEA (web program v1; Stahlbom et al, 2012).

Work is ongoing to include additional risk factors in BOADICEA to improve discriminatory accuracy, which will ultimately improve targeting of clinical interventions. The web-based version v3 now allows users to incorporate oestrogen and progesterone receptor, HER2, CK5/6 and CK14 status of diagnosed family members (Mavaddat et al, 2010). Ongoing extensions to BOADICEA include the explicit incorporation of other known breast cancer susceptibility variants, namely the associations of common alleles identified through genome-wide association studies and the effects of rare variants conferring moderate risks, as well as lifestyle/hormonal risk factors for breast cancer.

In summary, we have used prospective data and found that BOADICEA was well calibrated for a cohort of Australian women over-sampled for family history. These are the sorts of women seeking advice from clinicians about their risks of breast cancer, and those most likely to be referred to cancer family genetics services for genetic counselling. We are conducting similar prospective validation studies in other populations, and it will be important to compare the performance of BOADICEA with other breast cancer risk prediction models. Large data sets (potentially by combining data from several centres) will be needed to adequately assess the performance of BOADICEA for predicting ovarian and contralateral breast cancers. BOADICEA is freely and widely accessible through the internet and this study provides further support for its use by clinicians and women themselves.