European polygenic risk score for prediction of breast cancer shows similar performance in Asian women

Polygenic risk scores (PRS) have been shown to predict breast cancer risk in European women, but their utility in Asian women is unclear. Here we evaluate the best performing PRSs for European-ancestry women using data from 17,262 breast cancer cases and 17,695 controls of Asian ancestry from 13 case-control studies, and 10,255 Chinese women from a prospective cohort (413 incident breast cancers). Compared to women in the middle quintile of the risk distribution, women in the highest 1% of the PRS distribution have a ~2.7-fold risk and women in the lowest 1% have a ~0.4-fold risk of developing breast cancer. There is no evidence of heterogeneity in PRS performance in Chinese, Malay and Indian women. A PRS developed for European-ancestry women is also predictive of breast cancer risk in Asian women and can help in developing risk-stratified screening programmes in Asia.

In the majority of high-income Western countries, breast cancer screening is systematic and population-based, and this has contributed to improved survival 1. By contrast, screening in most Asian countries is opportunistic and suffers from poor uptake, contributing to delayed detection and poor survival 2. In addition, there are concerns about the appropriate starting age for screening: women are recommended to start screening at age 50 in many Asian countries, even though peak breast cancer incidence in Asian populations occurs between 40 and 50 years of age 3. Taken together with the rapidly increasing incidence of breast cancer in Asia 4, there is an urgent need to develop an appropriate screening strategy for Asian women.
Provision of genetic counselling and genetic testing for rare variants in breast cancer predisposition genes such as BRCA1 and BRCA2 can lead to better management of risk, but these variants explain only a small fraction of breast cancer cases in the general population 5. Risk profiles based on a combination of common, low-penetrance breast cancer susceptibility single nucleotide polymorphisms (SNPs), summarised as polygenic risk scores (PRS), have been shown to be important predictors of disease risk 6-8. A 313-SNP PRS developed in European populations has improved predictive power compared with earlier PRSs based on fewer SNPs 6,7, and showed similar associations with disease risk in eleven independent prospective studies. Studies in European populations have demonstrated that PRSs substantially improve discrimination compared with risk prediction models based on classical risk factors alone 8,9. In particular, using the recent extension of the BOADICEA model, it has been shown that the 313-SNP PRS provides a greater level of risk stratification in the population than epidemiological risk factors alone, and that the greatest level of risk stratification is achieved when the PRS and epidemiological risk factors are considered jointly 10. Screening trials 11-13 in women of predominantly European descent are under way to evaluate personalised breast cancer screening programmes based on a woman's individual risk of disease, as a means of improving screening efficiency 14.
Although there have been several efforts to create an Asian-specific PRS, these have been limited by the smaller sample sizes of Asian genetic studies 15-19. Only ~20% of existing breast cancer genome-wide association study (GWAS) data are from women of Asian ancestry 20,21. This limits the precision of the relative risk estimates for individual variants, which is critical for developing a predictive PRS. Furthermore, Asian populations are ethnically and genetically diverse 22, and genetic associations with breast cancer risk may vary by ancestry. Here, we evaluate the ability of the 313-SNP PRS developed for European women to predict breast cancer risk in Asian women, using data from 17,262 cases and 17,695 controls of Asian ancestry from 10 studies based in Asian countries and three studies from North America participating in the Breast Cancer Association Consortium (BCAC), and from 10,255 Chinese women in a prospective cohort. We also evaluate heterogeneity in the associations with breast cancer risk by ethnicity. We show that the European ancestry-based PRS is predictive of breast cancer risk in Asian women.

Results
SNPs included in PRS analyses. To ensure accurate determination of the PRS in the ethnic-specific analyses, we excluded 26 of the 313 SNPs that had imputation accuracy scores <0.9 in the Malaysian Breast Cancer Genetic Study (MyBrCa) and the Singapore Breast Cancer Cohort (SGBCC) combined (6900 cases and 7606 controls). Hence, the PRS was constructed using 287 SNPs for all BCAC studies (Supplementary Table 1). For the Singapore Chinese Health Study (SCHS), the 229 of the 287 SNPs that were polymorphic and could be imputed in this dataset were used to derive the PRS. To compare PRS performance with that in women of European ancestry, we recalculated the PRS using these sets of SNPs in the validation and prospective cohorts of European women described in Mavaddat et al. 7.
For association analyses between the PRS and overall breast cancer, the PRS was calculated using overall breast cancer weights, while for association analyses between the PRS and subtype-specific breast cancer, subtype-specific PRSs were constructed using the same set of SNPs but weights from the hybrid method described by Mavaddat et al. 7 Fig. 1). For overall breast cancer and ER-positive disease, the ORs per SD of the PRS were slightly higher in OncoArray-genotyped studies than in studies genotyped with iCOGS; however, the confidence intervals for the array-specific estimates overlapped (Fig. 1). There was no evidence that the effect of the PRS was modified by age (p value for interaction > 0.05 [Student's t test]; Supplementary Table 2). When analyses were stratified by 10-year age groups, the ORs per SD of the PRS were similar across age groups (Supplementary Table 3).
The associations between the PRSs and breast cancer risk by PRS percentile are shown in Fig. 2 Table 1). Malaysia and Singapore are ethnically diverse, with the majority of individuals identifying as Chinese, Malay or Indian. Principal component analysis showed that these ethnic groups can be distinguished based on genetic data; however, the distribution of the first two principal components for each ethnic group was similar between the two countries (Supplementary Fig. 1). Hence, for the purposes of this analysis, women belonging to the same ethnic group from the two countries were analysed together. Table 4 summarises the characteristics of the study participants by self-reported ethnicity. The majority of the participants were Chinese (72%), while 17% and 11% were Malay and Indian, respectively. The mean PRS was markedly higher in Chinese and Malay women than in European women, with the mean being highest for Chinese women. The mean for Indian women was intermediate between those for Chinese and Malay women and those for European women (Tables 1 and 4). The PRS SDs of Malay and Indian controls were similar to that of women of European ancestry, while the SD for Chinese controls was slightly lower. The breast cancer OR per SD of the 287-SNP PRSs and the discriminatory accuracy, measured by the area under the receiver operating characteristic curve (AUC), were similar across the three ethnic groups (heterogeneity p values > 0.05 [chi-squared test]; AUCs for overall breast cancer were 0.60-0.62, for ER-positive disease 0.62-0.63 and for ER-negative disease 0.57-0.60; Fig. 3). OR estimates by percentile for overall breast cancer risk, compared to the middle quintile, are shown in Fig. 4.

Table 1 notes. The overall breast cancer (BC) PRS (PRS overall), oestrogen-receptor (ER)-positive PRS (PRS ER+) and ER-negative PRS (PRS ER−) were derived as described in the Methods section. ER subtype is not available for the prospective cohort. The mean and SD of the PRS in European studies were calculated using data from the validation set described in Mavaddat et al. 7, with samples missing age information removed. BCAC Breast Cancer Association Consortium, SD standard deviation, PRS polygenic risk score.

Fig. 1 Association between standardised 287-SNP polygenic risk scores and breast cancer risk. Panel a shows the results for the iCOGS array by study and panel b shows the results for the OncoArray. The squares represent the odds ratios (ORs) and the horizontal lines represent the corresponding 95% confidence intervals. Overall estimates within each genotyping array, represented by the diamonds, were obtained by combining the estimates across studies using fixed-effect meta-analysis. I-squared and p value (two-sided) for heterogeneity were obtained by fitting a random-effects model using the generalised Q-statistic estimator (the rma() function in the R package metafor). The sample sizes of individual studies are listed in Supplementary Table 1. The ORs and corresponding 95% confidence intervals are provided as a Source Data file.
PRS and breast cancer risk in Asian Americans. The 287-SNP PRS was also evaluated using data from 2719 women of Asian ancestry recruited into three studies from North America (Supplementary Table 1). The means of all PRSs were very similar to those in the Asian studies, and markedly higher than those in Europeans. The SDs in controls for all PRSs were similar to those in the Asian studies and somewhat lower than the SDs observed in European controls (Table 1). Compared with the breast cancer OR per SD in the studies based in Asia, the OR per SD of the 287-SNP PRS in the North American studies was smaller (p < 0.05) for overall breast cancer (1.36, 95% CI: 1.25-1.49) and ER-positive breast cancer (1.38, 95% CI: 1.25-1.53), but higher (p < 0.05) for ER-negative breast cancer (1.49, 95% CI: 1.26-1.76; Table 2). Of the three studies included in these analyses, only the Los Angeles County Asian-American Breast Cancer (LAABC) case-control study showed a significant association with breast cancer risk for all three PRSs, while the Canadian Breast Cancer (CBC) study showed non-significant associations for all PRSs (Fig. 1). However, the heterogeneity in the estimates among studies was not significant.
Prospective evaluation for PRS. We further evaluated the PRS in the prospective Singapore Chinese Health Study (SCHS), using data on 10,255 women, of whom 413 developed breast cancer (Supplementary Table 1). The mean and SD of the 229-SNP PRS in the prospective study were similar to those of the 229-SNP PRS in the BCAC Asian studies (Table 1). The estimated hazard ratio (HR) for overall breast cancer per European SD of the 229-SNP PRS was 1.49 (95% CI: 1.33-1.67), and the AUC was 0.610 (Table 2). These estimates were similar to those for the 229-SNP PRS in the BCAC Asian studies (studies from Asia: 1.49 (1.45-1.52); North American studies: 1.33 (1.22-1.45)) but slightly lower than that in the European studies (1.59 (1.55-1.64)).
Absolute risk of developing breast cancer by PRS percentiles. Absolute lifetime and 10-year breast cancer risks by 287-SNP PRS percentile were derived by combining the estimated overall breast cancer ORs from the BCAC Asian studies (Supplementary Table 4) with the breast cancer incidence and mortality rates for Chinese, Malay and Indian women in Singapore 23,24 (Table 5; Supplementary Fig. 2). The risks of developing breast cancer by age 80 for women in the lowest and highest 1% of the PRS distribution were ~2% and ~13-16%, respectively, depending on ethnicity. For women between the 90th and 99th percentiles of the risk distribution, the lifetime risks varied from 9 to 13%. Assuming that a 10-year absolute risk threshold of 2.3% (approximately the 10-year risk from age 50 in women of European descent 25) is used to define women at sufficient risk to justify screening, Chinese and Malay women in the highest 1% of the PRS distribution would reach this threshold by age 35, while Indian women in the highest 1% would reach it at age 39.
We also determined the proportion of women in the general population who would have a 10-year absolute risk above the risk threshold (2.3%) at some point in their lives. The maximum 10-year absolute risk exceeded 2.3% for Chinese women in the highest 25%, Malay women in the highest 16% and Indian women in the highest 17% of the PRS distribution. Offering screening to these women would capture ~40%, ~27% and ~28% of all breast cancer cases in the Chinese, Malay and Indian populations, respectively (Supplementary Fig. 3).
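A quick way to see where figures of this size come from is a standard approximation, offered here as an assumption rather than as the computation behind Supplementary Fig. 3: if the standardised PRS is normally distributed in the population and the log relative risk rises linearly with it, then for a rare disease the PRS distribution in cases is shifted by b = ln(OR per SD). The share of cases above the (1 − q) population quantile is then 1 − Φ(Φ⁻¹(1 − q) − b). The sketch below uses Python's standard-library `statistics.NormalDist` and the OR per Asian SD of 1.48 quoted in the Methods; the function name is hypothetical.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standardised PRS in the population, assumed N(0, 1)


def fraction_of_cases_captured(top_fraction: float, or_per_sd: float) -> float:
    """Share of all cases whose PRS lies in the top `top_fraction` of the
    population distribution (rare-disease, normally distributed PRS approximation)."""
    if not 0.0 < top_fraction < 1.0:
        raise ValueError("top_fraction must be in (0, 1)")
    b = log(or_per_sd)                                # log OR per SD of the PRS
    cutoff = STD_NORMAL.inv_cdf(1.0 - top_fraction)   # population PRS threshold
    # In cases the PRS is approximately N(b, 1), so shift the cut-off by b.
    return 1.0 - STD_NORMAL.cdf(cutoff - b)


# Top fractions taken from the text: 25% (Chinese), 16% (Malay), 17% (Indian).
for group, q in [("Chinese", 0.25), ("Malay", 0.16), ("Indian", 0.17)]:
    share = fraction_of_cases_captured(q, or_per_sd=1.48)
    print(f"{group}: top {q:.0%} of PRS -> {share:.0%} of cases")
```

With these inputs the approximation gives roughly 39%, 27% and 29%, close to the ~40%, ~27% and ~28% reported above; the small differences may reflect the centile-based, ethnicity-specific absolute-risk calculation described in the Methods.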
Comparison with other PRSs. We compared the predictive performance of the 287-SNP PRS for overall breast cancer with five PRSs 15,17,19,26,27 that were previously developed or evaluated using data from Asian populations. Of these five PRSs, one was developed using iCOGS-genotyped studies in BCAC and 744 samples from the MyBrCa study 15. To avoid potential overfitting and to enable direct comparison between PRSs, we limited the analyses to OncoArray-genotyped studies only (excluding the 744 samples from the MyBrCa study). We also recalculated the 287-SNP PRS using the same samples. The list of SNPs and corresponding weights as reported in the literature is given in Supplementary Table 6. The ORs per SD of the five Asian PRSs were between 1.10 and 1.41, and the corresponding AUCs were between 0.533 and 0.586, substantially lower than those for the European ancestry-based 287-SNP PRS (Table 6).

Discussion
To date, the utility of incorporating common genetic variants into breast cancer risk prediction models has predominantly been investigated in women of European descent. Previous efforts in Asian studies have focused on developing Asian-specific PRSs and have been limited by small sample sizes. Given the difficulties of defining population-specific PRSs, a more practical question is whether a PRS developed using data from women of European ancestry is predictive of risk in women of Asian ancestry. In this study, using the largest available dataset of Asian women, we independently evaluated the predictive performance of a PRS based on 287 variants. Our study showed that the European-ancestry PRS was predictive of overall breast cancer risk in Asian women. The magnitudes of association were generally consistent across the ten participating Asian case-control studies and the prospective Singaporean Chinese study. The association was also consistent across the three ethnic groups in Malaysia and Singapore, suggesting that the PRS is associated with similar relative risks in all three ethnicities, although the confidence intervals for Malays and Indians are wide.
The estimated effect sizes and AUCs of both the 287-SNP PRS and the 229-SNP PRS were slightly lower than those observed in women of European ancestry. We evaluated the individual associations of the 287 SNPs with overall breast cancer risk in Chinese, Malay and Indian women separately and compared them with the effect sizes in women of European ancestry (Supplementary Data 1). The intraclass correlation coefficients (ICC), taking into account the standard errors of the estimates, were >0.7 for all ethnicities. These results suggest that the susceptibility variants in both populations are largely similar and confer similar relative risks; the lower effect sizes and AUCs may arise from different patterns of linkage disequilibrium. Notably, our analyses showed that the Asian-specific PRS that included only five Asian-specific SNPs 27 achieved an AUC of 0.562 (Table 6), suggesting that more accurate PRSs for Asian populations could be developed once larger cohorts of Asian women become available to identify population-specific SNPs.
The mean of the 287-SNP PRS was markedly higher in Asian than in European populations, but the SD was slightly lower in Asians than in Europeans. The lower variation (SD) may reflect different allele frequency distributions: of the 287 SNPs that are common in women of European ancestry (minor allele frequency > 0.05), 43 are rare in Asian women and therefore contribute minimally to the PRS. In this paper, we standardised the PRS to the European SD to enable comparison of PRS performance in European and Asian populations. A more relevant approach for Asian populations is to standardise the PRS to the Asian SD, in which case the overall breast cancer OR per SD would decrease to 1.48 (95% CI: 1.44-1.52). Taken together, these results highlight the need to calibrate the PRS distribution so that risk models developed in one population (e.g. Europeans) can be used in another (e.g. Asians).
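The rescaling described above follows from the log-linear model: if log(OR) is linear in the raw PRS, the log OR per SD is proportional to the SD used for standardisation, so OR per Asian SD = (OR per European SD)^(SD_Asian / SD_European). The helper below is a hypothetical illustration of that relationship, and the SD values in its usage line are made up, since the Table 1 SDs are not reproduced in this section.

```python
from math import exp, log


def rescale_or_per_sd(or_per_sd: float, sd_from: float, sd_to: float) -> float:
    """Convert an OR per `sd_from` units of the raw PRS into an OR per `sd_to` units,
    assuming log(OR) is linear in the raw PRS."""
    if sd_from <= 0 or sd_to <= 0:
        raise ValueError("SDs must be positive")
    log_or_per_unit = log(or_per_sd) / sd_from  # slope on the raw PRS scale
    return exp(log_or_per_unit * sd_to)


# Hypothetical example: OR 1.60 per European SD, Asian control SD 90% of the European SD.
print(round(rescale_or_per_sd(1.60, sd_from=1.0, sd_to=0.9), 3))
```

Because only the ratio of the two SDs matters, a lower SD in Asian controls always pulls the OR per SD towards 1, which is the direction of the change reported above.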
Fig. 3 Association between standardised PRSs and breast cancer risk in Chinese, Malay and Indian women from Malaysia and Singapore. Odds ratios (ORs) and AUCs were generated using data from the Malaysia Breast Cancer Genetics (MyBrCa) and Singapore Breast Cancer Cohort (SGBCC) studies, stratified by ethnicity. The squares represent the ORs, the horizontal lines represent the corresponding 95% confidence intervals and the diamonds represent the overall estimates. I-squared and p value (two-sided) for heterogeneity were obtained by fitting a random-effects model using the generalised Q-statistic estimator (the rma() function in the R package metafor). The numbers of cases and controls for each ethnicity by breast cancer subtype are tabulated in Table 4. The sample sizes, ORs and corresponding 95% confidence intervals are also provided in the Source Data file.

The 287-SNP PRS had lower predictive performance for overall breast cancer among Asian women from the three North American studies than in the Asian or European studies (Table 2). This somewhat surprising observation might be due to chance, but might also reflect greater admixture with populations of non-Asian ancestry, or greater variation in the distribution of lifestyle factors 26 leading to greater variation in breast cancer risk. Larger studies of Asian women in non-Asian countries are needed to provide more reliable estimates.
For subtype analyses using ER-specific PRS, we observed greater discrimination for ER-positive than ER-negative disease. This difference was also seen in European studies, and reflects the fact that the majority of risk SNPs are more strongly associated with ER-positive than ER-negative disease.
The majority of breast cancer studies have been conducted in populations of European descent and, as a result, screening guidelines for Asian women are often based on those developed in Europe or North America 28,29. In high-income countries where most women are of European descent, a personalised screening strategy based on age and PRS rather than age alone could reduce the number of women eligible for screening 30, potentially reducing overdiagnosis, overtreatment and false-positive diagnoses, which can cause anxiety and stress in women who attend screening 14. In the Asian context, however, a more cogent argument for stratified screening is to target limited screening resources at the women most likely to benefit. Based on the ORs estimated in our analyses, and assuming that a 10-year absolute risk of 2.3% is an appropriate threshold for screening, the majority of Asian women living in the Asian country with the highest population risk of breast cancer (Singapore) would never reach this threshold (Table 5; Supplementary Fig. 2). Notably, only ~25% of Chinese women, ~16% of Malay women and ~17% of Indian women would reach this threshold at any point in their lives. It is important to note, however, that breast cancer incidence in Asia is expected to increase substantially over the next decade, and it will therefore be necessary to revisit the screening recommendations over time.
To explore this, we simulated the 10-year absolute breast cancer risk of Chinese women using Australian breast cancer incidence 31, which is about twice that in Singapore (Supplementary Fig. 4). Assuming that the breast cancer ORs associated with the PRS remain similar to those estimated here, women in the 60-80th percentiles of the risk distribution, who would be classified as low risk for screening based on current incidence, would reach the risk threshold for screening at age 45 under the increased incidence. If the incidence rate reaches that of Western European countries, a similar proportion of women (~20%) would not meet the screening threshold at any age 7.
Our study has some limitations worth noting. Although we used the largest dataset of Asian women available to date to evaluate the performance of the PRS, the sample size was still too limited to provide precise relative risk estimates for the extremes of the PRS distribution, particularly for ER-specific disease. The majority of the data in the BCAC dataset were generated with the OncoArray; however, ~27% of samples were genotyped using the iCOGS array, which has lower genome coverage. Of the 287 SNPs, 42 had imputation scores between 0.75 and 0.9, and 53 had imputation scores below 0.75 in the iCOGS dataset. This may partly explain the evidence for some heterogeneity in effect sizes between the iCOGS and OncoArray datasets. The attenuation (10%) in the effect size of family history of breast cancer on breast cancer risk after adjusting for the 287-SNP PRS is consistent with the predicted contribution of the 287 SNPs to the twofold familial risk of breast cancer (~11%, based on an overall OR per Asian SD of 1.48 8). It is important to note, however, that the estimated association of family history with breast cancer risk (OR = 1.35) is lower than in other studies (OR = 1.8-3.9 in European studies 32-34 and OR = 1.52-2.1 in Asian studies 16,26). This might be due to inaccuracies in the family history data. The control women in the largest study contributing to these analyses (MyBrCa), which accounted for ~30% of the total data, were recruited through opportunistic screening and may therefore be enriched for family history relative to the cases. In addition, there was evidence of heterogeneity (I² = 66.1%, p value < 0.0001 [chi-squared test]) in the effect sizes of the association between family history and breast cancer risk across Asian studies.

Table 6 notes. PRSs were constructed using per-allele log odds ratios as reported in the literature. As Asian case-control studies genotyped with the iCOGS array and 744 samples from MyBrCa batch 1 were used as part of the development studies in Wen et al. (2016), to avoid upward bias we restricted these evaluation analyses to Asian case-control studies genotyped using the OncoArray and removed the overlapping samples in MyBrCa batch 1. a One SNP, rs146699004, was not imputed and hence was not included in the analyses. b Analyses of the 287-SNP PRS were repeated using the same dataset as described.

Table 5 notes. Absolute risks were calculated based on self-declared ethnicity and ethnic-specific incidence and mortality data in Singapore, using the 287-SNP PRS relative risk for overall breast cancer. NR never reached, i.e., the 10-year absolute risk in this percentile never exceeds 2.3%. a Age at which the 10-year absolute risk exceeds 2.3%. The 2.3% threshold is the average 10-year absolute risk for a 50-year-old woman of European ancestry (50 years is the recommended age to begin regular mammographic screening in Singapore).

In summary, we have shown that a PRS based on common breast cancer susceptibility variants identified in women of European ancestry is a strong predictor of breast cancer risk in Asian women. Furthermore, even though Asians are genetically diverse, our study shows that the PRS derived from women of European ancestry works similarly well across the three major ethnic groups in Malaysia and Singapore. In the meantime, the PRS developed using data from large European-ancestry studies (provided it is recalibrated to the Asian population being tested) may be used as the basis for Asian-specific breast cancer risk prediction models that include the PRS as well as other predictors of breast cancer risk. Such models will allow higher levels of risk stratification to be achieved, as recently demonstrated in women of European ancestry 10.
Such risk assessment tools could help in resource planning, especially in low- and middle-income countries where resources are limited and population-based screening is unavailable, and could improve the efficiency of personalised screening.

Methods
(SCHS 32,33). SCHS is a population-based prospective cohort study. Of the 10,255 women aged 43-75 years who had no cancer diagnosis before recruitment, 413 developed registry-confirmed breast cancer during 195,317.2 person-years of prospective follow-up. Follow-up started 6 months after recruitment and was censored at the age at breast cancer diagnosis, the age at last known non-breast cancer status, or the age on 31 December 2015, whichever came first. Supplementary Table 1 shows the study design and number of breast cancer cases and controls for individual studies. Comparative results for European women were obtained from (a) 4926 cases and 4979 controls from 26 population-based case-control studies participating in BCAC and included in the validation analysis in Mavaddat et al. 7, and (b) ten nested case-control studies within prospective cohorts in BCAC, comprising 11,225 cases and 17,788 controls, included in the test dataset in Mavaddat et al. 7, excluding subjects >80 years old and those whose age was unknown. All studies were approved by the relevant institutional ethics committees and review boards, and all participants provided written informed consent.
Genotyping methods. All samples in BCAC studies were genotyped using one of two arrays: the ~211,155-SNP iCOGS array or the ~533,000-SNP OncoArray 34. Genotype calling, quality control procedures and imputation have been described previously 20,21. Briefly, we excluded samples found to be genotypically not female, discordant or cryptic duplicate pairs, samples with an assay call rate <95%, and samples with extreme heterozygosity (<5% or >40%, 4.89 SD from the mean for the ethnicity). For first-degree relative pairs, the control was removed from case-control pairs; otherwise, the sample with the lower call rate was excluded. SNPs with an assay call rate <95%, or with deviation from Hardy-Weinberg equilibrium at p < 10⁻⁷ in controls or p < 10⁻¹² in cases, were excluded. The iCOGS and OncoArray datasets were imputed separately using a two-stage imputation approach, with SHAPEIT2 35 for phasing, IMPUTE2 36 for imputation and 1000 Genomes Project (Phase 3) data as the reference panel 37.
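The SNP-level exclusion rules above can be sketched as follows. This is illustrative only: the text does not say which Hardy-Weinberg test was used, so a 1-degree-of-freedom chi-squared test is assumed here, the sample-level filters are omitted, and the function names and genotype-count inputs are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """P value of a 1-df chi-squared test for Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic SNP: the test is undefined
        return 1.0
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 df: P(|Z| > sqrt(chi2)).
    return erfc(sqrt(chi2 / 2.0))


def keep_snp(call_rate: float, control_counts, case_counts) -> bool:
    """Apply the SNP-level call-rate and HWE thresholds quoted in the text."""
    if call_rate < 0.95:
        return False
    if hwe_chi2_pvalue(*control_counts) < 1e-7:   # controls: exclude at p < 10^-7
        return False
    if hwe_chi2_pvalue(*case_counts) < 1e-12:     # cases: exclude at p < 10^-12
        return False
    return True
```

The much stricter threshold in cases reflects the fact that a true risk allele can itself distort genotype frequencies among cases, so a mild departure there is less likely to signal a genotyping error.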
Samples in the prospective study (SCHS) were genotyped using the Illumina Global Screening Array. Samples with a call rate <95% and samples with extreme heterozygosity were excluded. For first- and second-degree relative pairs, the sample with the lower call rate was excluded. Data were imputed using IMPUTE2 with 1000 Genomes Project (Phase 3) data as the reference panel. Only SNPs that are polymorphic in the East Asian population of the reference panel were imputed.
Post-imputation quality was assessed using the imputation accuracy score (INFO score) provided by IMPUTE2 36. This metric takes values between 0 and 1, with higher values indicating greater imputation certainty and 1 implying perfect imputation.
Principal components analyses were used to identify ethnic outliers and define ancestry informative covariates. For the BCAC data, continental ancestry was derived by combining the data with the 1000 Genomes Project reference data 34 . Individuals with >40% estimated East Asian ancestry were retained. In the second stage, principal components were generated on the Asian ancestry individuals using a subset of uncorrelated SNPs. Similar ancestry informative principal components were generated on the SCHS dataset.
Statistical methods. The analyses were based on the 313-SNP PRS developed in women of European ancestry 7. To ensure accurate determination of the PRSs in the ethnic-specific analyses, SNPs with an imputation accuracy score <0.9 in the MyBrCa and SGBCC studies combined were excluded.
We derived the PRS for overall breast cancer using Eq. (1):

$$\mathrm{PRS} = \sum_{k=1}^{n} \beta_k x_k, \qquad (1)$$

where $x_k$ is the dosage of the risk allele (0-2) for SNP $k$, $\beta_k$ is the corresponding weight and $n$ is the number of SNPs. To avoid bias due to overfitting, we used the weights previously derived for women of European ancestry 7. The ER-specific PRSs (denoted PRS ER+ for the ER-positive PRS and PRS ER− for the ER-negative PRS) used the same set of SNPs but weights from the hybrid method reported in Mavaddat et al. 7; the hybrid method assigns subtype-specific weights to a subset of SNPs for which the effect sizes differ significantly by subtype. The list of SNPs and the corresponding weights are provided in Supplementary Data 1. To enable direct comparison of the performance of each PRS with those reported in European women, we standardised the PRSs by dividing each individual's PRS by the standard deviation (SD) of the PRS in the control subjects from the population-based case-control series in European studies. Logistic regression models were used to estimate ORs for the association between the standardised PRSs and breast cancer risk. The overall breast cancer PRS was used as the predictor in association analyses of overall breast cancer, while the ER-specific PRSs were used as predictors in subtype-specific analyses. The PRS was treated as either a continuous or a categorical predictor in the model. When used as a categorical variable, the PRS was categorised into the following percentile ranges based on the PRS distribution in controls: <1%, 1-5%, 5-10%, 10-20%, 20-40%, 40-60%, 60-80%, 80-90%, 90-95%, 95-99% and 99-100%. The 40-60% category was used as the reference. For ethnic-specific analyses, analyses were stratified by ethnicity (Chinese, Malay and Indian) using only the MyBrCa and SGBCC datasets.
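A minimal Python sketch of the PRS construction, standardisation and percentile categorisation described above, assuming the dosages and weights are already aligned SNP by SNP. The helper names are hypothetical (the published analyses were run in R and Stata), and computing the control cut-points with `statistics.quantiles` using the "inclusive" method is our choice, not necessarily the authors'.

```python
from bisect import bisect_right
from statistics import quantiles

# Percentile boundaries (in %) of the control PRS distribution, as listed in the text.
BOUNDARIES_PCT = [1, 5, 10, 20, 40, 60, 80, 90, 95, 99]
LABELS = ["<1%", "1-5%", "5-10%", "10-20%", "20-40%", "40-60%",
          "60-80%", "80-90%", "90-95%", "95-99%", "99-100%"]
REFERENCE = "40-60%"  # reference category in the logistic models


def compute_prs(dosages, weights):
    """Weighted sum of risk-allele dosages (0-2) with per-allele log-OR weights."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(b * x for x, b in zip(dosages, weights))


def standardise(prs, european_control_sd):
    """Divide by the SD of the PRS in European population-based controls."""
    return prs / european_control_sd


def percentile_cut_points(control_prs):
    """PRS values at the 1st, 5th, ..., 99th percentiles of the control distribution."""
    centiles = quantiles(control_prs, n=100, method="inclusive")  # 99 cut points
    return [centiles[p - 1] for p in BOUNDARIES_PCT]


def prs_category(prs, cut_points):
    """Assign one of the eleven percentile categories."""
    return LABELS[bisect_right(cut_points, prs)]
```

Because the categories are defined on the control distribution, roughly 1% of controls fall in "99-100%" by construction, while the share of cases in that category reflects the PRS association.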
All models were adjusted for the first ten principal components and study/array/batch; samples from the same study that were genotyped in two batches (as was the case for MyBrCa and SGBCC) or on both arrays were treated as different strata for the purposes of adjustment. A Cox proportional hazards model was used to evaluate the association of the PRS with overall breast cancer risk in the prospective cohort, and HRs per SD of the PRS were estimated.
The discriminatory accuracy of models for predicting breast cancer risk was evaluated using the area under the receiver operating characteristic curve (AUC), adjusted by study. Estimated ORs by PRS percentile category were compared with the ORs predicted under the model in which the PRS is treated as a continuous covariate and log(OR) is linearly related to the PRS. To determine the proportion of the familial breast cancer risk that could be explained by the PRS, we estimated the OR for the association between first-degree family history and breast cancer risk, first adjusting for the first 10 principal components and study/array/batch, and then additionally adjusting for the PRS.
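Under an additional assumption (ours, not stated in the text) that the standardised PRS is normally distributed in controls with SD 1, both the predicted category ORs and the expected AUC have closed forms for a rare disease. The mean relative risk within a percentile band follows by completing the square, and with equal-variance normal PRS distributions in cases and controls the AUC is Φ(b/√2), where b is the log OR per SD. The function names below are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist()  # standardised PRS in controls, assumed N(0, 1)


def band_mean_rr(lo_pct, hi_pct, b):
    """Mean of exp(b*z) for z ~ N(0, 1) restricted to the [lo_pct, hi_pct] percentile band."""
    a = Z.inv_cdf(lo_pct / 100) if lo_pct > 0 else float("-inf")
    c = Z.inv_cdf(hi_pct / 100) if hi_pct < 100 else float("inf")
    mass = (hi_pct - lo_pct) / 100
    # Completing the square: E[exp(bZ); a < Z < c] = exp(b^2/2) * (Phi(c - b) - Phi(a - b)).
    return exp(b * b / 2) * (Z.cdf(c - b) - Z.cdf(a - b)) / mass


def predicted_or(lo_pct, hi_pct, or_per_sd, reference=(40, 60)):
    """OR for a percentile band versus the reference band under the log-linear model."""
    b = log(or_per_sd)
    return band_mean_rr(lo_pct, hi_pct, b) / band_mean_rr(*reference, b)


def predicted_auc(or_per_sd):
    """Binormal, equal-variance AUC implied by an OR per SD of the PRS."""
    return Z.cdf(log(or_per_sd) / sqrt(2))


print(round(predicted_or(99, 100, 1.48), 2), round(predicted_auc(1.48), 3))
```

With the OR per Asian SD of 1.48 quoted later in the Methods, `predicted_auc` gives about 0.61, in line with the AUCs of 0.60-0.62 reported for overall breast cancer; comparing `predicted_or` with the estimated category ORs is one way to check the log-linear assumption at the tails.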
To evaluate the effect modification of the PRS (as a continuous covariate) by age and family history of breast cancer in first-degree relatives, we included additional interaction terms in the logistic regression model.
The predicted proportion of the familial relative risk of breast cancer explained by the PRS was estimated by noting that the familial relative risk to first-degree relatives of affected individuals due to the PRS alone is estimated to be $\lambda_P = \exp(\gamma^2/2)$, where $\gamma$ is the log OR per one SD of the PRS (equivalent to the SD of the polygenic log-risk distribution) 38. The proportion of the familial relative risk (on a log scale) due to the PRS was therefore estimated using Eq. (2):

$$\frac{\log \lambda_P}{\log \lambda} = \frac{\gamma^2}{2 \log \lambda}, \qquad (2)$$

where $\lambda$ is the familial relative risk of breast cancer in first-degree relatives, assumed to be 2 for breast cancer.
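The proportion in Eq. (2) is short enough to check directly. The sketch below reads γ as the log OR per SD of the PRS, a reading that reproduces the ~11% figure quoted in the Discussion when the OR per Asian SD is 1.48 and λ = 2. Function names are hypothetical.

```python
from math import exp, log


def familial_rr_from_prs(or_per_sd: float) -> float:
    """lambda_P = exp(gamma^2 / 2), with gamma the log OR per SD of the PRS."""
    gamma = log(or_per_sd)
    return exp(gamma ** 2 / 2)


def proportion_familial_risk_explained(or_per_sd: float, familial_rr: float = 2.0) -> float:
    """Share of the familial relative risk (log scale) attributable to the PRS."""
    return log(familial_rr_from_prs(or_per_sd)) / log(familial_rr)


print(f"{proportion_familial_risk_explained(1.48):.1%}")
```

Because the proportion grows with the square of the log OR, modest gains in the OR per SD translate into noticeably larger shares of the familial risk.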
To compare the effect sizes for individual SNPs with those reported in women of European ancestry, we estimated the association between each SNP and breast cancer risk in Chinese, Malay and Indian women in the MyBrCa and SGBCC studies separately using logistic regression, adjusting for age, study and the first 10 principal components and assuming a log-additive genetic model. The intraclass correlation (ICC) was then used to compare the estimated effect sizes with those reported in Mavaddat et al. (2019) 7. To take into account the sampling error of the effect sizes in the ICC estimate, we fitted a hierarchical model of the form given by Eq. (3):

$$y_{ij} = \beta_{ij} + \delta_{ij}, \qquad (3)$$

where $y_{ij}$ denotes the parameter estimate of SNP $i$ in population $j$, $\beta_{ij}$ are the true parameter values and $\delta_{ij} \sim N(0, \sigma_{ij}^2)$ are the sampling errors, with known SDs $\sigma_{ij}$. The model was fitted using the expectation-maximisation (EM) algorithm 39, in which, in the E-step, $\beta_{ij}$ were estimated as a weighted mean of the observed estimates $y_{ij}$ and the group mean $\alpha_i^{(k)}$, as given in Eq. (4), and, in the M-step, the estimated $\beta_{ij}$ were treated as complete data to estimate $\alpha_i^{(k+1)}$ and $\sigma_R^2$, the within-group variance. This process was iterated until the estimated ICC converged.
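A sketch of this kind of EM fit follows, under assumptions that are ours rather than the paper's: β_ij ~ N(α_i, σ_R²), inverse-variance shrinkage weights in the E-step (Eq. (4) is not reproduced in this section, so the authors' exact weights may differ), a standard EM M-step that also adds the posterior variance of each β_ij, and ICC defined as τ²/(τ² + σ_R²), with τ² the variance of the α_i across SNPs. Names are hypothetical; `y` and `se` would hold per-SNP log ORs and their standard errors, one column per population.

```python
from statistics import fmean, pvariance


def icc_with_sampling_error(y, se, tol=1e-10, max_iter=10_000):
    """EM fit of y[i][j] = beta[i][j] + delta[i][j] for SNP i in population j.

    Assumed model: beta[i][j] ~ N(alpha[i], sigma2_r) and
    delta[i][j] ~ N(0, se[i][j] ** 2) with se known.
    Returns (icc, alpha, sigma2_r) with icc = var(alpha) / (var(alpha) + sigma2_r).
    """
    alpha = [fmean(row) for row in y]
    # Crude starting value for the within-SNP variance (guard against zero).
    sigma2_r = fmean(pvariance(row) for row in y) or 1e-6
    icc, icc_old = float("nan"), float("nan")
    for _ in range(max_iter):
        # E-step: shrink each observed estimate towards its SNP mean.
        post_mean, post_var = [], []
        for y_i, se_i, a_i in zip(y, se, alpha):
            w = [sigma2_r / (sigma2_r + s * s) for s in se_i]  # weight on the observation
            post_mean.append([wk * yk + (1 - wk) * a_i for wk, yk in zip(w, y_i)])
            post_var.append([wk * s * s for wk, s in zip(w, se_i)])
        # M-step: update the SNP means and the within-SNP variance.
        alpha = [fmean(row) for row in post_mean]
        sigma2_r = fmean(
            (b - a) ** 2 + v
            for row_b, row_v, a in zip(post_mean, post_var, alpha)
            for b, v in zip(row_b, row_v)
        )
        tau2 = pvariance(alpha)  # between-SNP variance of the true means
        icc = tau2 / (tau2 + sigma2_r)
        if abs(icc - icc_old) < tol:
            break
        icc_old = icc
    return icc, alpha, sigma2_r
```

An ICC near 1 means that most of the variation in effect sizes is between SNPs rather than between populations, which is the pattern the Discussion describes (ICC > 0.7 for all ethnicities).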
The age-specific absolute risk of developing breast cancer, adjusting for competing mortality, in each PRS percentile was calculated using Eq. (5):

$$R_g(t) = \int_{0}^{t} \lambda_g(u)\, S_g(u)\, S_m(u)\, du, \qquad (5)$$

where $\lambda_g(u)$ is the breast cancer incidence associated with the PRS at age $u$, $S_g(u)$ is the probability of being breast cancer free at age $u$, and $S_m(u)$ is the probability of not dying from a cause other than breast cancer by age $u$. The PRS-specific breast cancer incidences, $\lambda_g(u)$, were calculated iteratively by constraining the average age-specific breast cancer incidence over all PRS percentiles to agree with the population breast cancer incidence 6. We calculated lifetime and 10-year absolute risks using Singaporean mortality and breast cancer incidence for Chinese, Malay and Indian women 23,24. The recommended screening starting age of 50 years in many Asian countries is based on European or North American guidelines 29, and the average 10-year risk of breast cancer for a woman of European ancestry aged 50 years is 2.3% 25. Hence, we determined the proportion of women in the general population who would have a 10-year risk of breast cancer above this threshold, using the method described in Pharoah et al. 38. To do this, the maximum 10-year absolute risk, adjusting for competing mortality, for women aged 20-70 was calculated for each PRS centile category (0-0.1%, …, 99.9-100%), assuming an OR per 1 SD of the PRS of 1.48 (the estimated effect size in Asian studies).
We compared the predictive performance of the European ancestry-based PRS with PRSs that were previously developed or evaluated in Asian populations. The five Asian ancestry-derived PRSs included 5 SNPs 15, 51 SNPs 17, 44 SNPs 19, 6 SNPs 26 and 46 SNPs 27. The PRSs were derived using Eq. (1) and the corresponding weights reported in the literature. The list of SNPs and corresponding weights is tabulated in Supplementary Table 6.
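The absolute-risk calculation can be sketched in discrete time. The assumptions here are ours, not the authors': one-year age bands with hazards constant within each band, non-breast-cancer mortality that does not depend on PRS category, and the calibration step written as a closed-form rescaling of a baseline hazard so that the survivor-weighted average hazard matches the population incidence at each age. Function names and inputs are hypothetical; real use would take the Singapore incidence and mortality rates and the estimated category ORs.

```python
from math import exp


def calibrated_hazards(pop_incidence, prop, rr):
    """Category-specific annual hazards whose survivor-weighted average equals the
    population incidence at every age.

    pop_incidence: annual population incidence, one value per year of age.
    prop, rr: population proportion and relative risk of each PRS category.
    """
    surv = [1.0] * len(prop)  # S_g(u): probability of being breast cancer free
    hazards = [[] for _ in prop]
    for lam_pop in pop_incidence:
        at_risk = sum(p * s for p, s in zip(prop, surv))
        at_risk_rr = sum(p * s * r for p, s, r in zip(prop, surv, rr))
        baseline = lam_pop * at_risk / at_risk_rr  # baseline hazard at this age
        for c, r in enumerate(rr):
            h = r * baseline
            hazards[c].append(h)
            surv[c] *= exp(-h)
    return hazards


def absolute_risk(hazard, other_mortality, start, years):
    """Risk of breast cancer in [start, start + years), given alive and cancer free
    at `start` (indices are years since the first age in the inputs)."""
    risk, alive_free = 0.0, 1.0
    for u in range(start, start + years):
        h, m = hazard[u], other_mortality[u]
        total = h + m
        if total == 0:
            continue
        risk += alive_free * (1.0 - exp(-total)) * h / total
        alive_free *= exp(-total)
    return risk


def first_index_above(hazard, other_mortality, threshold, years=10):
    """Earliest start index whose `years`-year risk exceeds `threshold`, or None."""
    for start in range(len(hazard) - years + 1):
        if absolute_risk(hazard, other_mortality, start, years) > threshold:
            return start
    return None
```

For example, `first_index_above(hazards[c], mortality, 0.023)` returns the first start index (add the first age of the input tables to obtain an age) at which category c's 10-year risk exceeds the 2.3% threshold; a return value of None corresponds to "never reached".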
All statistical analyses were conducted using R v.3.0.3 or Stata v.14.2. Logistic regression and AUC analyses were performed using logistic() and comproc() in Stata, and the Cox proportional hazards model was fitted using coxph() in R.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Summary statistics (OR and confidence limits) for all SNPs used in the analysis are provided in Supplementary Data 1 of the paper. Requests for access to the individual-level data on which these analyses were based can be made via the Data Access Coordinating Committee of BCAC (BCAC Coordinator: BCAC@medschl.cam.ac.uk). The remaining data are available within the Article or Supplementary Information, or from the authors upon reasonable request. Source data are provided with this paper.