Clinical validation, implementation, and reporting of polygenic risk scores for common diseases

Implementation of polygenic risk scores (PRS) may improve disease prevention and management but requires the construction and validation of clinical assays, interpretation, and reporting pipelines. We developed a clinical genotype array-based assay for published PRS for 6 common diseases. First, we calculated PRS for 36,423 Mass General Brigham Biobank (MGBB) participants. Finding significant variation in the PRS distributions by race, we implemented adjustment for population structure with ancestry-informative principal components. We replicated published thresholds for odds ratio (OR)>2 in MGBB overall [ranging from 1.75 (1.57, 1.95) for type 2 diabetes to 2.38 (2.07, 2.73) for breast cancer]. After confirming the high performance and robustness of the pipeline for use as a clinical assay, we analyzed the first 141 prospective samples from the Genomic Medicine at VA Study; the frequency of PRS corresponding to published OR>2 ranged from 5/141 (3.6%) for colorectal cancer to 8/48 (16.7%) for breast cancer. Our development of a clinical PRS assay for multiple conditions illustrates the generalizability of this process and the technical and reporting decisions necessary for meaningful clinical PRS implementation.


Introduction
For more than a decade, genome-wide association studies (GWAS) have identified thousands of genomic variants significantly associated with a range of common complex human diseases, including cardiometabolic diseases and cancers. 1,2 Because the risk conferred by each common variant is usually very small, investigators have aggregated risk alleles across the genome into genetic risk scores, which provide a single measure of genetic association for a given trait based on known common variant effects. [3][4][5] While the earliest genetic scores consisted only of variants meeting genome-wide significance, recent computational and methodological advances have leveraged the summary statistics of all available variants from increasingly large GWAS to calculate polygenic risk scores (PRS). [6][7][8][9] For some diseases, a PRS value in the upper tail of the distribution may confer risk comparable to that of established clinical risk factors or of genetic variants associated with monogenic disease. 7,10 Although PRS weights are typically derived from cross-sectional GWAS of prevalent disease cases and controls, further work has demonstrated their potential to estimate risk of incident disease. [11][12][13][14] Appropriate clinical implementation of PRS is now an area of active research across many disease areas. [15][16][17][18] However, a key assumption underlying the clinical translation of PRS is that clinical laboratories can develop and implement valid clinical assays and interpretation pipelines that report PRS to clinicians and their patients in a meaningful way. Developing a clinical assay from a published PRS is not trivial, and several barriers to the process exist.
First, it is uncertain whether commonly used, cost-effective genotyping arrays and clinical imputation pipelines can calculate a PRS for an individual with the analytic and clinical validity expected of a clinical assay, as opposed to validity merely adequate for research. Second, laboratories must implement methods to account for the reduced validity of most PRS among patients of non-European and admixed ancestry. 19,20 This limitation applies both to the calculation of the PRS itself and to its clinical interpretation, as published effect sizes come from populations of primarily European ancestry. 19 Published methods can adjust for population structure, enabling all population groups to be treated similarly in downstream analyses. 21,22 However, these methods are not immediately applicable to correcting the PRS of a prospectively genotyped individual, whose sample is likely part of a small batch with insufficient data for robust adjustment for population structure.
Correction therefore requires additional decisions about how to adjust for population structure and which reference data to use. Third, questions remain about the content and format of a clinical PRS report, including how to balance clarity and actionability with full transparency about limitations.
Despite these challenges, PRS assays are under active development by both clinical and research laboratories. [23][24][25][26][27][28] Many have reported the aggregate performance of these PRS in a population, such as biobank participants or customers of direct-to-consumer companies. None, however, has described the development and validation of a clinical, population structure-adjusted assay for prospectively tested individuals. The Genomic Medicine at VA (GenoVA) Study (ClinicalTrials.gov Identifier: NCT04331535) is an ongoing randomized clinical trial examining the impact of PRS on disease diagnosis and prevention in a prospectively enrolled cohort of adult primary care patients. Here, we report the development and validation of a genotype array-based clinical assay and report for published PRS for six diseases and for actionable monogenic findings, for implementation among prospectively genotyped individuals.

Selection of diseases and PRS for implementation
Because the GenoVA Study is examining the impact of PRS on disease diagnosis and prevention in adult primary care, we selected PRS for common diseases for which the typical primary care physician likely already has established practice patterns for screening, diagnosis, and prevention: coronary artery disease (CAD), type 2 diabetes mellitus (T2D), atrial fibrillation (AFib), colorectal cancer (CRCa), prostate cancer (PrCa), and female breast cancer (BrCa). We identified large GWAS for the six target diseases whose summary statistics (base files with alleles and weights) were freely available from the Polygenic Score (PGS) Catalog 29 (AFib, CAD, T2D, BrCa) or the Cancer-PRSWeb […] 3.98107170553497e-07) 33 , PrCa (Schumacher 2018; PRSWEB_PHECODE185_Pca-PRACTICAL_LASSOSUM_MGI_20191112, PRS tuning parameter: s0.5_Lambda0.00695192796177561) 34 .

Population and sample
As the GenoVA Study is enrolling participants from eastern Massachusetts, USA, we used data from the Mass General Brigham (formerly Partners HealthCare) Biobank (MGBB), described previously 35,36 . This allowed us to evaluate the performance of the selected PRS in a similar population and with the workflow planned for our study and assay. MGBB participants were not included in the published derivation and validation studies for the PRS used. In brief, MGBB was launched in 2010 with the initial goal of collecting DNA, plasma, and serum samples from 75,000 patients from Brigham and Women's Hospital, Massachusetts General Hospital, and other MGB-affiliated healthcare facilities. A further goal was to obtain patient consent for linkage among biospecimen data, medical record data, and survey data. Race, ethnicity, and sex data derive from a combination of participant and healthcare provider report in the MGB electronic health record (EHR). For the present analysis, we collapsed reported race and ethnicity into 4 categories: Asian, Black, white, and other/unknown.

Genotyping and imputation
We used genotype data from the 36,423 MGBB participants whose biospecimens had been genotyped within the MGBB as of December 16, 2019. Genotyping was performed using standard processing described previously on one of three Illumina Infinium genotyping arrays:

1) a pre-release version developed by the Multi-Ethnic Genotyping Array Consortium (Multi-Ethnic Genotyping Array (MEGA), n=4924),
2) an expanded version of this pre-commercial array (Expanded Multi-Ethnic Genotyping Array (MEGAEX), n=5345), and
3) the final commercial version (Multi-Ethnic Global (MEG), n=26157).

The MEGA, MEGAEX, and MEG arrays consisted of 1.39, 1.74, and 1.78 million probes, respectively 41 . For MEGA and MEGAEX data, only probes found in the commercial version of the array (MEG) were used in the present analysis. Genotyping quality control required samples to have a call rate of at least 99%, with concordance between EHR-reported sex and sex inferred from the array data. We used existing MGBB imputed data, generated by batching sets of ~5000 participants and imputing against 1000 Genomes Project phase 3 data with the Michigan Imputation Server 42 (https://imputationserver.sph.umich.edu/index.html#!). SHAPEIT (v2.r790) 43 was used for phasing and Minimac3 for imputation, both with default settings. Sets of imputed variants were compared to the base files for each PRS to ensure sufficient representation of probes (Supplemental Table 2). 42

Calculation of PRS and adjustment for population structure
Unadjusted raw PRS (PRS raw ) for each disease were calculated using PLINK (version 2.0a). At each locus in the PRS, the count of risk alleles was multiplied by the corresponding risk allele weight, and these products were summed across all available risk loci. The loci included in each PRS, the risk alleles, and the corresponding weights were downloaded from the PGS Catalog or Cancer-PRSWeb.
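The scoring step is a dot product of risk-allele dosages with per-allele weights. The NumPy sketch below shows only that arithmetic and is not PLINK's implementation. It assumes two things: each dosage column has already been aligned so that it counts the PRS risk allele, and loci absent from the genotype data have been dropped from both inputs. The example values are hypothetical.

```python
import numpy as np


def raw_prs(dosages, weights) -> np.ndarray:
    """Unadjusted PRS: sum over loci of (risk-allele dosage x risk-allele weight).

    dosages: (n_individuals, n_loci) risk-allele counts in 0-2; imputed dosages
             may be fractional. Assumed already aligned to the PRS risk allele.
    weights: (n_loci,) per-allele weights taken from the PRS base file.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be 2-D with one column per weight")
    # Matrix-vector product = per-individual weighted sum across loci.
    return dosages @ weights


# Three hypothetical individuals scored on three hypothetical loci.
example = raw_prs([[0, 1, 2], [2, 2, 0], [1, 0, 1]], [0.1, -0.05, 0.2])
```

Weights may be negative when the listed effect allele is protective. The sum is unaffected as long as dosages count that same allele.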
A population structure-adjusted PRS was calculated for each disease using a previously described approach 38 , in which principal components analysis (PCA) is used to compute residualized, adjusted PRS. Principal components (PCs) were calculated using all genotyped MGBB participants and 16,385 of 16,443 previously reported ancestry-informative SNPs 44 . For each disease, we then fit a linear model in R (v4.0.3) for unadjusted PRS (PRS raw ) as a function of the first four PCs among controls for that disease (PRS raw ~ PC1 + PC2 + PC3 + PC4). We then applied this model to calculate a predicted PRS (PRS pred ) for each disease among all cases and controls. Residualized, population structure-adjusted PRS (PRS adj ) were then computed for each individual for each disease as the difference between raw and predicted PRS (PRS raw - PRS pred ). PRS raw values were standardized (PRS std-raw ) using the mean and standard deviation (SD) of PRS raw in MGBB (Supplemental Table 3). Similarly, PRS std-adj was computed using the mean and SD of PRS adj in MGBB (Supplemental Table 3). The distributions of PRS std-raw and PRS std-adj by genotype array, sex, age decile, and reported race were compared among all subjects using the density function in R (v4.0.3).
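The adjustment can be sketched with ordinary least squares in NumPy; the study itself used R. The sketch proceeds in four steps:

1. fit PRS_raw ~ PC1 + PC2 + PC3 + PC4 among controls only,
2. predict PRS_pred for everyone,
3. take the residuals as PRS_adj, and
4. z-score those residuals.

The returned mean and SD stand in for the reference values (Supplemental Table 3) that are later reused for prospective samples. Function names and the simulated data are illustrative assumptions.

```python
import numpy as np


def _design(pcs) -> np.ndarray:
    # Intercept column plus the first four principal components.
    pcs = np.asarray(pcs, dtype=float)
    return np.column_stack([np.ones(pcs.shape[0]), pcs[:, :4]])


def fit_pc_model(prs_raw, pcs, is_control) -> np.ndarray:
    """OLS coefficients for PRS_raw ~ PC1 + PC2 + PC3 + PC4, fit among controls only."""
    X = _design(pcs)
    mask = np.asarray(is_control, dtype=bool)
    y = np.asarray(prs_raw, dtype=float)
    coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return coef


def adjust_and_standardize(prs_raw, pcs, coef):
    """Residualize on the PCs (PRS_adj = PRS_raw - PRS_pred), then z-score."""
    prs_pred = _design(pcs) @ coef
    prs_adj = np.asarray(prs_raw, dtype=float) - prs_pred
    ref_mean, ref_sd = prs_adj.mean(), prs_adj.std(ddof=1)
    prs_std_adj = (prs_adj - ref_mean) / ref_sd
    return prs_adj, prs_std_adj, ref_mean, ref_sd


# Simulated cohort in which the raw score partly tracks ancestry (PC1, PC2).
rng = np.random.default_rng(0)
pcs = rng.normal(size=(500, 4))
prs_raw = 2.0 * pcs[:, 0] - 1.0 * pcs[:, 1] + rng.normal(scale=0.1, size=500)
is_control = rng.random(500) < 0.8
coef = fit_pc_model(prs_raw, pcs, is_control)
prs_adj, prs_std_adj, ref_mean, ref_sd = adjust_and_standardize(prs_raw, pcs, coef)
```

Fitting among controls keeps any true disease signal in cases from being absorbed into the ancestry model.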

PRS-disease association
The association of PRS std-adj with odds of disease was replicated among MGBB participants using the six disease phenotypes described above. For each PRS and disease, odds of disease (number of cases / number of controls) were calculated for each of 50 PRS quantiles. For race-stratified analyses, PRS deciles were used when too few cases were available for analysis across 50 quantiles. To visualize PRS-disease associations, we plotted the log(odds) of disease against the mean PRS std-adj in each quantile. Correlation was measured with correlation coefficients (R) using RStudio (v1.1.383) with R (v4.0.3).
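A minimal NumPy sketch of the quantile analysis follows, run on simulated data. Bins with no cases or no controls are skipped, because log(odds) is undefined there; this is the situation in which the text falls back to deciles. The section does not name the correlation type, so Pearson correlation via `np.corrcoef` is an assumption here.

```python
import numpy as np


def odds_by_quantile(prs_std_adj, is_case, n_quantiles=50):
    """Mean PRS and log(odds) of disease (n_cases / n_controls) per PRS quantile."""
    prs = np.asarray(prs_std_adj, dtype=float)
    case = np.asarray(is_case, dtype=bool)
    edges = np.quantile(prs, np.linspace(0.0, 1.0, n_quantiles + 1))
    # Bin index 0..n_quantiles-1; the maximum value is clipped into the top bin.
    bins = np.clip(np.searchsorted(edges, prs, side="right") - 1, 0, n_quantiles - 1)
    mean_prs, log_odds = [], []
    for q in range(n_quantiles):
        in_q = bins == q
        n_cases = int(case[in_q].sum())
        n_controls = int(in_q.sum()) - n_cases
        if n_cases == 0 or n_controls == 0:
            continue  # log(odds) undefined; coarser bins (e.g. deciles) would be needed
        mean_prs.append(prs[in_q].mean())
        log_odds.append(np.log(n_cases / n_controls))
    return np.array(mean_prs), np.array(log_odds)


# Simulated data: disease log-odds rise linearly with the standardized PRS.
rng = np.random.default_rng(1)
prs = rng.normal(size=20_000)
is_case = rng.random(20_000) < 1.0 / (1.0 + np.exp(-(-3.0 + 0.5 * prs)))
mean_prs, log_odds = odds_by_quantile(prs, is_case)
r = np.corrcoef(mean_prs, log_odds)[0, 1]
```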

PRS threshold for high risk
We set an odds ratio (OR) >2 to indicate high polygenic risk for each disease. This threshold mirrors both a common threshold from Mendelian genetics 45,46 and the effect sizes of disease risk factors already considered in current clinical care. [47][48][49][50][51] We used the OR per standard deviation (SD) reported in the original publication to determine the Z-score threshold corresponding to OR>2 for each disease:

Z(OR>2) = log(2) / log(OR per SD)

Here, 2 is the target OR and OR per SD is the coefficient reported in the literature for each disease (Supplemental Table 3).
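As a worked calculation, the threshold depends only on the published OR per SD, and any logarithm base gives the same ratio. The sketch below uses a hypothetical OR per SD of 1.5, not a value from Supplemental Table 3. Under the additional assumption that PRS std-adj is approximately standard normal, it also gives the expected fraction of a population above the threshold.

```python
import math
from statistics import NormalDist


def z_threshold(or_per_sd: float, target_or: float = 2.0) -> float:
    """Z-score at which the per-SD odds ratio compounds to target_or (OR_per_SD ** Z == target_or)."""
    if or_per_sd <= 1.0:
        raise ValueError("OR per SD must be greater than 1")
    return math.log(target_or) / math.log(or_per_sd)


def fraction_above(z: float) -> float:
    """Share of a standard normal distribution lying above z (normality assumed)."""
    return 1.0 - NormalDist().cdf(z)


z = z_threshold(1.5)        # hypothetical OR per SD of 1.5 -> Z of about 1.71
share = fraction_above(z)   # about 4.4% of a standard normal population
```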

Clinical PRS assay for individual samples
Based on the results of the above methods, we developed and validated a genotype array-based clinical assay for PRS. The assay also screens for secondary findings in genes on the American College of Medical Genetics and Genomics v2.0 list (ACMG SF v2.0, Figure 1). 52

Validation samples
Replicates of each of three Genome in a Bottle (GIAB) 53 reference samples maintained by the National Institute of Standards and Technology were included in the validation assay: NA12878 x 9, NA24631 x 6, and NA24385 x 6. Analytical performance (sensitivity and positive predictive value for presence/absence of variant sites) was determined within the GIAB high-confidence regions (v3.3.2). In addition, we included:

1) 22 samples with PCR-free genome data (described below), and
2) 9 samples with a high-risk PRS for at least one of the six diseases as determined by the MGBB data (one individual had high-risk PRS for two diseases).

To test the sensitivity of the secondary findings analysis, we genotyped 20 samples with previously identified pathogenic or likely pathogenic variants in ACMG SF v2.0 genes.

Genotyping and imputation
Validation samples were genotyped according to manufacturer-standard workflows on either a pre-commercial release of the Illumina Global Diversity Array (GDA-PC) or the final commercial release of the Global Diversity Array (GDA). The GTC files generated by the genotyping array were converted to VCF format using custom code and the gtc2vcf script from Illumina. All samples required an overall call rate >98.5%. Imputation was performed using updated software, with EAGLE v2.4.1 54 for phasing and Minimac4 42 for imputation against the 1000 Genomes Project phase 3 dataset. Importantly, monomorphic sites were not removed during the imputation process because of the small batch sizes used in the prospective assay.

PRS calculation
PRS raw was calculated for each sample as described above. To determine PRS adj , the results of the MGBB PC analysis were used to project each new individual sample onto the MGBB PCs (see Supplemental Methods). 55 The scaled PCs were entered into the linear model developed for each disease in the MGBB data to obtain PRS pred , PRS adj , and PRS std-adj as above. Standardization used the mean and standard deviation of PRS adj for each phenotype in the MGBB data (Supplemental Table 3).
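The Supplemental Methods describing the projection are not reproduced here. The following NumPy sketch therefore assumes a common approach, in three steps:

1. standardize a new sample's genotypes at the ancestry-informative SNPs using reference-cohort means and SDs,
2. multiply by the reference SNP loadings, and
3. apply the stored disease model and the reference mean and SD of PRS adj.

All names are hypothetical. The toy check confirms that projecting a reference sample recovers its own PC scores.

```python
import numpy as np


def project_onto_reference_pcs(genotypes, snp_means, snp_sds, loadings):
    """Project new samples onto reference PCs (assumed standard projection).

    genotypes: (n_new, n_snps) allele counts at the ancestry-informative SNPs.
    snp_means, snp_sds: (n_snps,) per-SNP mean and SD in the reference cohort.
    loadings: (n_snps, n_pcs) SNP loadings from the reference PCA.
    """
    z = (np.asarray(genotypes, dtype=float) - snp_means) / snp_sds
    return z @ loadings


def prospective_prs_std_adj(prs_raw, pcs, coef, ref_mean_adj, ref_sd_adj):
    """Apply the stored reference model and reference mean/SD to new samples."""
    pcs = np.asarray(pcs, dtype=float)
    X = np.column_stack([np.ones(pcs.shape[0]), pcs[:, :4]])
    prs_adj = np.asarray(prs_raw, dtype=float) - X @ coef
    return (prs_adj - ref_mean_adj) / ref_sd_adj


# Build a toy reference PCA, then project five of its own samples.
rng = np.random.default_rng(2)
ref_geno = rng.integers(0, 3, size=(200, 50)).astype(float)
snp_means, snp_sds = ref_geno.mean(axis=0), ref_geno.std(axis=0)
z_ref = (ref_geno - snp_means) / snp_sds
_, _, vt = np.linalg.svd(z_ref, full_matrices=False)
loadings = vt[:4].T                      # (n_snps, 4)
ref_scores = z_ref @ loadings
projected = project_onto_reference_pcs(ref_geno[:5], snp_means, snp_sds, loadings)
```

Because only stored reference quantities are used, a batch of one sample can be adjusted without recomputing PCs.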

Genome sequencing
We selected 22 diverse samples that had previously undergone clinical whole genome sequencing to determine the robustness of PRS across different platforms. Genome sequencing was performed at the Clinical Research Sequencing Platform of the Broad Institute using PCR-free library construction. Sequencing used an Illumina NovaSeq with 2 x 150 bp paired-end reads, with ≥95% of bases covered at ≥20x. Reads were aligned to GRCh37 using the Burrows-Wheeler Aligner (BWA version 0.7.15) 56 , and variants were called using HaplotypeCaller from the Genome Analysis Toolkit (GATK version 4.0.3.0). 57,58 PRS raw , PRS std-raw , PRS adj , and PRS std-adj were calculated as described above for the other prospective samples. As stated above, these 22 samples were also run on the GDA-PC array to compare PRS between genome sequencing and array. Differences between the sequence-based and array-based PRS were visualized, and dichotomous risk classifications were formally compared with the Matthews Correlation Coefficient (MCC). 59

Identification of actionable variants associated with monogenic disease
Variants from the original genotyping VCF were annotated and filtered to the 59 genes recommended for secondary findings screening by the American College of Medical Genetics and Genomics (ACMG SF v2.0) 52 to find:

1) variants previously identified as disease causing by the MGB Laboratory for Molecular Medicine (LMM),
2) variants classified as pathogenic/likely pathogenic (P/LP) in ClinVar with a minor allele frequency (MAF) <0.1%,
3) variants classified as a disease-causing mutation (DM) in HGMD with a MAF <0.03%, and
4) loss-of-function variants (nonsense, frameshift, canonical splice-site, and initiating methionine variants) with a MAF <0.1% in genes for which loss of function is a known disease mechanism.
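The four inclusion criteria can be expressed as a single predicate. In the Python sketch below, several details are assumptions for illustration: the `CandidateVariant` fields, the consequence labels, and the treatment of a missing MAF as rare. MAF thresholds of 0.1% and 0.03% are written as fractions. The predicate only flags variants for the manual classification step that follows; it does not classify them.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical consequence labels standing in for the loss-of-function classes in the text.
LOF_CONSEQUENCES = {"nonsense", "frameshift", "canonical_splice", "start_lost"}


@dataclass
class CandidateVariant:
    gene: str
    lab_disease_causing: bool   # previously reported as disease causing by the laboratory
    clinvar_plp: bool           # ClinVar pathogenic/likely pathogenic
    hgmd_dm: bool               # HGMD disease-causing mutation (DM)
    consequence: str
    maf: Optional[float]        # None if absent from population databases


def flag_for_review(v: CandidateVariant, acmg_genes: Set[str],
                    lof_mechanism_genes: Set[str]) -> bool:
    """True if the variant meets any of the four inclusion criteria."""
    if v.gene not in acmg_genes:
        return False
    maf = 0.0 if v.maf is None else v.maf   # assumption: missing MAF treated as rare
    if v.lab_disease_causing:
        return True
    if v.clinvar_plp and maf < 0.001:       # MAF < 0.1%
        return True
    if v.hgmd_dm and maf < 0.0003:          # MAF < 0.03%
        return True
    return (v.consequence in LOF_CONSEQUENCES and maf < 0.001
            and v.gene in lof_mechanism_genes)


genes = {"BRCA1", "BRCA2"}       # toy subset of the ACMG SF v2.0 genes
lof_genes = {"BRCA1", "BRCA2"}   # toy subset where loss of function causes disease
```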
Clinical variant classification was carried out in accordance with the guidelines of the ACMG and the Association for Molecular Pathology, 60 with disease-specific modifications as recommended by the Clinical Genome Resource (ClinGen) Expert Panels. 61

Prospectively enrolled trial participants
The assay described above is now in use in the ongoing GenoVA Study randomized trial of clinical PRS (ClinicalTrials.gov Identifier: NCT04331535). Eligible participants are patients of the VA Boston Healthcare System, aged 50-70, without known diagnoses of the 6 target diseases. Enrollees provide a clinical blood or saliva sample for analysis at the LMM and then receive a PRS report along with relevant disease-specific information.

Sample characteristics
Among the 36,423 MGBB participants whose genotype data were used to calculate the 6 PRS, mean (SD) age was 58.8 (17.1) years (range 9-106), and 19,719 (54.1%) were female. 5706 (15.7%) were of reported race other than white [30,716 (84.3%) white, 1,807 (5.0%) Black, 786 (2.2%) Asian, and 3,113 (8.5%) of other/unknown race]. Case counts ranged from 392 CRCa cases to 3,554 cases of CAD. Figure 2A shows the counts of participants with one or more of the target diseases as determined by the computed phenotypes. The most common disease co-occurrences were the combinations of CAD and T2D (n=641) and CAD and AFib (n=495).

PRS distributions before and after adjustment for population structure
Supplemental Table 2 shows the number of SNPs in the base file for each of the 6 published PRS, ranging from 81 SNPs in the Huyghe CRCa PRS 33 to 6,917,436 in the Khera T2D PRS 7 . It also shows the subsets of these available as directly genotyped or imputed data from each of the 3 arrays used among MGBB participants, demonstrating minimal loss of information compared to the original published PRS. As shown in Figure 3, when using the weights from the publications directly (PRS std-raw ), we observed marked variation in the distribution of each PRS by race in MGBB, most notably for AFib, CAD, and T2D. For example, only 516/30,716 (1.7%) of white MGBB participants had a PRS std-raw above the threshold associated with OR>2 for T2D in Khera 2018 7 , compared with almost all (1,606/1,807, 88.9%) Black MGBB participants (Supplemental Table 4). Adjustment of the raw PRS (PRS std-adj ) reduced this variation (Figure 3). For example, 2,651/30,716 (8.6%) of white MGBB participants and 75/1,807 (4.2%) of Black MGBB participants had a T2D PRS std-adj above the published OR>2 threshold. The distributions of PRS std-adj were well aligned when examined by genotyping batch, age decile, and sex (Supplemental Figures 1-3).

Replication of PRS-disease association
As shown in Figure 4, quantile of PRS std-adj was highly correlated with log(odds) of disease across the 6 phenotypes in MGBB, with correlation coefficients ranging from 0.68 for CRCa to 0.95 for T2D. Supplemental Figures 4-7 show the correlation of PRS std-adj quantile and log(odds) of disease separately for each reported racial group. Our analyses also replicated the published thresholds corresponding to OR>2. As shown in Table 1, at the published PRS std-adj thresholds, we observed OR ranging from 1.75 (95% CI 1.57, 1.95) for T2D to 2.38 (95% CI 2.07, 2.73) for BrCa among MGBB participants overall. For all diseases except T2D, the 95% CI of the replicated OR either included 2 or, for BrCa and AFib, lay entirely above 2. Results were consistent in analyses restricted to white MGBB participants but were variable in other groups, in large part because of small numbers of disease cases. In 22 of 24 analyses stratified by reported race, subjects with PRS std-adj above the published OR>2 thresholds had higher odds of disease than those below these thresholds. In the MGBB overall, the prevalence of a high-risk PRS std-adj ranged from 5.4% for CRCa to 13.2% for PrCa (among men). Figure 2B illustrates the number of participants with PRS std-adj above the published OR>2 threshold for one or more of the target diseases. Of note, the most common co-occurrences of high-risk PRS std-adj mirrored the disease co-occurrences observed among MGBB participants: CAD and T2D (n=333) and CAD and AFib (n=211).
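The CI method behind Table 1 is not stated in this section. As one common option, the sketch below computes the OR for being above versus below a PRS threshold from a 2x2 table, with a Woolf (log-scale) 95% CI. The counts in the example are invented to give an OR of exactly 2 and are not MGBB data.

```python
import math


def odds_ratio_ci(cases_above, controls_above, cases_below, controls_below, z=1.96):
    """OR (above vs below threshold) with a Woolf log-scale confidence interval."""
    a, b, c, d = cases_above, controls_above, cases_below, controls_below
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts: 100 cases/400 controls above, 200 cases/1600 controls below.
or_, lo, hi = odds_ratio_ci(100, 400, 200, 1600)   # OR = 2.0, CI about 1.54 to 2.60
```

The width of the interval is driven mainly by the smallest cell. This is why race-stratified estimates with few cases were variable.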

Prospective PRS assay
Sensitivity and specificity of GDA and imputation
To determine the performance of the GDA arrays used in the prospective assay and of the imputation pipeline, we used three GIAB reference samples (NA12878, NA24385, and NA24631) (Supplemental Table 5). Sensitivity and positive predictive value (PPV) for SNVs were >99.7% on average, with lower performance for indels (sensitivity = 96.3% and PPV = 97.8%). Of note, sensitivity within the ACMG SF v2.0 regions was high (96.2%), but PPV was low (63.6%), because these regions have an excess of poorly performing rare variants. 62,63 As expected, sensitivity and PPV decreased for imputed data, especially for indels (Supplemental Table 5):

- SNV sensitivity = 98.0%
- SNV PPV = 97.5%
- indel sensitivity = 92.8%
- indel PPV = 90.7%

NA12878 was not evaluated for imputation accuracy. It is present in the 1000 Genomes Project imputation reference dataset and therefore has artificially high imputation accuracy. To further evaluate imputation accuracy, we compared genome sequencing data to array data for 22 diverse samples. Analytical performance was lower in this dataset than in the GIAB high-confidence data (~3% reduction in sensitivity and PPV; Supplemental Table 6).
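GIAB benchmarking in practice is typically done with comparison tools such as hap.py or RTG vcfeval, which reconcile different representations of the same variant. The Python sketch below instead uses naive exact matching of (chrom, pos, ref, alt) tuples. It is meant only to make the definitions of sensitivity and PPV, and the restriction to high-confidence regions, concrete. The variant sets are toy data.

```python
from typing import Callable, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def benchmark(truth: Set[Site], calls: Set[Site],
              in_high_conf: Callable[[Site], bool]):
    """Sensitivity = TP/(TP+FN); PPV = TP/(TP+FP), within high-confidence regions."""
    truth = {v for v in truth if in_high_conf(v)}
    calls = {v for v in calls if in_high_conf(v)}
    tp = len(truth & calls)
    fn = len(truth - calls)   # true variants the assay missed
    fp = len(calls - truth)   # assay calls absent from the truth set
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return tp, fp, fn, sensitivity, ppv


truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
calls = {("1", 100, "A", "G"), ("2", 50, "G", "A"), ("3", 10, "T", "C")}
everywhere = benchmark(truth, calls, lambda v: True)
no_chr3 = benchmark(truth, calls, lambda v: v[0] != "3")  # toy "high-confidence" mask
```

Restricting to high-confidence regions removes calls that cannot be judged against the truth set. In the toy example this raises PPV from 2/3 to 1.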

Performance of prospective PRS assay
For the GIAB samples, PRS std-adj values were robust across different array versions and consistent with results from WGS data. All 3 GIAB samples were below the high-risk threshold for all diseases with all methods (Supplemental Table 7). Among the 22 samples with both WGS and prospective array data, PRS std-adj values were similarly concordant, particularly for AFib, CAD, and T2D (Supplemental Figure 8).
Additionally, 108/110 high-risk status classifications were concordant in this dataset (98.2% agreement; MCC 0.84, p<0.001). The two discordant values (1 for CAD and 1 for CRCa) were very close to the high-risk threshold (Supplemental Table 8). Finally, we examined 9 individuals with 10 high-risk PRS results identified in the MGBB genotyping data (high risk for AFib, n=1; BrCa, n=1; CAD, n=3; CRCa, n=3; PrCa, n=1; T2D, n=1). We compared these results with their PRS risk status from the prospective assay. All PRS risk categories were consistent across the two genotyping approaches (Supplemental Table 9).
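For reference, the MCC for two sets of paired high-risk/average-risk calls can be computed as below, treating the sequence-based call as the reference. The confusion-matrix counts behind the reported MCC of 0.84 are not given in this section, so the example uses a small hypothetical set of calls.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(reference: Sequence[bool], test: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for paired boolean high-risk calls."""
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r and t)
    tn = sum(1 for r, t in pairs if not r and not t)
    fp = sum(1 for r, t in pairs if not r and t)
    fn = sum(1 for r, t in pairs if r and not t)
    return tp, tn, fp, fn


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


genome_calls = [True, False, True, False]   # hypothetical sequence-based calls
array_calls = [True, False, False, False]   # hypothetical array-based calls
mcc = matthews_corrcoef(*confusion_counts(genome_calls, array_calls))
```

When high-risk calls are rare, raw percent agreement is dominated by concordant average-risk calls. MCC weighs both classes, which is why it is a stricter summary than agreement alone.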
Clinical PRS report
Based on the above validation, we produced a PRS report consistent in format and content with other clinical genetic test reports (Supplemental File 2). [64][65][66][67] It includes a description of the test performed and a prominently displayed summary of important findings and their interpretations. These include any monogenic disease variants identified and any PRS indicating increased polygenic disease risk. A graphic highlights in red the disease(s) for which the patient has increased polygenic disease risk. Subsequent sections give more detail about the results, including, for each disease, the general population prevalence and a brief summary of the GWAS from which the PRS was derived. Sections on methodology and literature references are at the end of the report. The report highlights the European bias of these GWAS and PRS in the initial summary, stating "Polygenic risk calculated using data from predominantly European ancestry individuals. Results are known to be less accurate for individuals of non-European ancestry." This information is reiterated in the detailed description of each disease and in a limitations section at the end of the report.

Results from the first 141 prospective samples
As of May 15, 2021, DNA specimens from 141 GenoVA trial participants (73 blood, 68 saliva) have been assayed with the prospective PRS and secondary findings pipelines. As shown in Table 2, 92 (65%) participants are of non-white reported race/ethnicity, and 49 (35%) currently identify as women. In this preliminary sample of trial enrollees, the proportions of participants with PRS above the study threshold for high risk are similar to those expected from the MGBB. They range from 3.6% for CRCa to 16.7% for BrCa (8 high-risk results among 48 participants of female sex at birth). Two actionable ACMG SF v2.0 variants have been identified and confirmed among the first 141 enrollees (BRCA1:NM_007294 c.2748delT (p.Asn916LysfsX84), likely pathogenic; BRCA2:NM_000059 c.3545_3546delTT (p.Phe1182X), pathogenic). These results are now being returned to the trial participants and their primary care providers.

Discussion
Bridging one significant gap between PRS development and clinical PRS implementation, we developed a clinical genotyping array-based assay to calculate published PRS for six common diseases. The scores were robust across multiple genotyping arrays and imputation pipelines. The distributions of these PRS varied by race in a large biobank, impeding clinical validation, but adjustment for population structure enabled the replication of published PRS-disease associations. These results informed the development of a population structure-adjusted pipeline for PRS calculation and reporting among prospectively genotyped patients, now in use in a clinical trial of PRS testing.
The development and implementation of our PRS assay and report illustrate key choices that laboratories must make in such a process.

First, for each of the six target diseases, we had to choose the specific PRS to implement among multiple publicly available options 29 . Considerations here include the performance of the PRS in the published discovery and replication cohorts as well as in the target population of interest. Guidelines are emerging on what defines high quality in publishing PRS 68 , and this improved transparency should help laboratories select appropriate PRS.

Second, we chose a genotype array-based approach instead of genome sequencing. Like genotyping, low-coverage genome sequencing is relatively low-cost 27 . We chose the Illumina GDA in part because of its widespread use in the All of Us Research Program 69 , the eMERGE Consortium 15 , and other projects, which enhances the generalizability of our methods to other institutions looking to implement clinical PRS testing.

Third, we chose to impute data from both our development cohort and prospectively enrolled subjects against 1000 Genomes Project phase 3 data. Other laboratories may choose to impute against the larger TOPMed reference panel, 70 although genome build discrepancies and regulatory prohibitions against sending patient data to external research servers are limitations. Further, the degree to which an imputation pipeline includes stochastic processes may influence robustness, which is one reason our prospective assay uses EAGLE for phasing. 26

Fourth, once a platform is selected, a clinical laboratory must determine the benchmarks that should define an analytically valid PRS assay.
We chose to 1) verify the PRS performance within our laboratory to determine the appropriate parameters for our assay; 2) calculate the analytical performance of the genotyping array and imputation pipeline using both well-characterized reference samples and individual level genome data; and 3) calculate the robustness and performance of the PRS using genome data and multiple array platforms from both reference and individual samples.
This multi-step approach helped ensure the accuracy of the data going into the PRS as well as the final performance of the PRS itself.
Finally, we made numerous choices in how to report PRS results and their interpretation to patients and providers, a full discussion of which is beyond the scope of this report. A key choice was to report the PRS interpretation as a dichotomous result (i.e., high risk vs. average risk). Alternatives were a continuous result (e.g., percentile rank, relative risk, or absolute risk) or a multi-level categorical result (e.g., low, intermediate, or high risk). We have previously described the trade-offs of these approaches. 71 For the GenoVA Study, we chose OR>2 to define high categorical polygenic risk, consistent with the effect sizes of traditional risk factors considered for our target diseases. [47][48][49][50][51] Another laboratory could use the methods we describe to produce measures of continuous risk, or of categorical risk at different thresholds thought to be clinically meaningful, which will likely vary between diseases. The report follows the prevailing model in clinical medicine, in which a treating clinician orders a laboratory test for a patient and then receives the results. Accordingly, its content and format mirror those of a traditional molecular diagnostic report, 64-67 written for clinicians rather than explicitly for patients.
Although other laboratories are developing PRS assays in both clinical and research settings, 23-28 few have described their clinical interpretation pipeline in general or their approach to ancestry in particular. Myriad Genetics uses a next-generation sequencing panel backbone to measure a PRS of >80 SNPs identified from European GWAS 72 and validated in women of European descent. 23 This test had been available only to patients of European ancestry, but a multi-ancestry version is in validation and will be commercially available in 2022. [73][74][75] Ambry Genetics developed similar clinical PRS assays for breast and prostate cancer but announced that it would cease offering these tests as of May 25, 2021. It cited lack of validation across ancestries and National Comprehensive Cancer Network advisement against the use of PRS in routine clinical care. 24, 76 The direct-to-consumer genetic testing company 23andMe offers customers a proprietary T2D PRS composed of >1000 loci, using data derived from its own customers. 25 23andMe customers can also share their genotype data with the MyGeneRank research study, which, as of December 2019, calculates a 163-SNP PRS for CAD and reports an ancestry-specific percentile rank to each participant. 26 Color Genomics has developed a low-coverage whole-genome sequencing assay for PRS for CAD, AFib, and BrCa. It showed that coverage depth as low as 0.5x achieved high correlation with genotype array-based PRS from UK Biobank and with PRS from sequence data from 120 ancestrally diverse samples from the 1000 Genomes Project. 27 For a period of time, the laboratory offered this test through a research protocol that has since closed. 17 The eMERGE IV consortium and other studies are actively developing trans-ancestry PRS for a number of common diseases. 28,77 Nevertheless, to our knowledge ours is the first report of a single clinical assay for population structure-adjusted PRS for multiple diseases.
Much has been written about the reduced validity of most PRS among populations of non-European ancestry, due to their use of non-causal loci and effect sizes from GWAS in predominantly European discovery cohorts 19,20,78,79 . Meanwhile, the genomics community awaits larger data sets from more diverse populations and is developing improved methods for deriving trans-ancestry PRS. 10,15,80 Until these mature, a laboratory aiming to develop a clinical PRS assay for a given disease has a few options:

1) postpone implementation;
2) implement separate ancestry-specific published PRS for those ancestries in which they are available; or
3) implement a single PRS and transparently report any applicable limitations in the underlying evidence and its interpretation for specific individuals.

We chose the last of these approaches and implemented a single method of adjustment for population structure. After doing so, we observed that the chosen OR>2 threshold generally identified subjects at higher risk of disease across race/ethnicity, although the magnitude of the increase varied across race/ethnicity. We contend that we have developed a clinically valid PRS assay for use in patients of diverse ancestry, including admixed individuals, whose results nonetheless have limitations that must be contextualized for each individual. Moving forward, our approach can be adapted to incorporate rapid developments in this area and to implement improved PRS with greater validity across populations. Similarly, our approach can include additional variants identified by the ACMG or other organizations as important secondary findings. 81 The strengths of our approach include the inclusion of validated PRS for multiple common diseases and the use of a cost-effective, widely used genotyping platform that will be well supported for future improvements.
We and others can use these methods to update the clinical interpretive pipeline as improved imputation methods and improved PRS for these six and additional diseases become available. Limitations include the predominance of reported white race in the MGBB population used to replicate the published PRS and the small number of cases of certain diseases among certain racial groups.
In conclusion, data from increasingly larger and more diverse populations, coupled with computational advances, are propelling PRS into consideration for clinical implementation. We have shown that implementing these advancements in a laboratory assay and clinical report is a feasible but non-trivial next step in realizing the potential of PRS for improved patient health.