INTRODUCTION

Over the past decade, direct-to-consumer (DTC) genetic testing has become broadly popular among people looking to better understand their ancestry or medical health risk. More than 26 million people have ordered a DTC genetic test.1,2 A recent survey of primary care and specialist physicians found that 35% of respondents report having received a DTC genetic result from a patient.3 It is also becoming increasingly common for consumers to submit their genotype data to third-party websites, such as Promethease, codegene.eu, or WeGene, for interpretation.4,5 Thus, DTC genetic testing results have the potential to impact clinical decision making by many patients and providers.

The majority of DTC testing is conducted using single-nucleotide polymorphism (SNP) arrays, and population-scale studies, such as All of Us,6 are also likely to assay many individuals by SNP arrays. SNP arrays are designed to assign genotypes for commonly observed, rather than rare, variants; they rely upon, for example, empirically observed patterns of genotype-dependent fluorescent intensity clustering rather than absolute assessments of allelic presence or absence.7 Several studies have suggested that DTC genetic testing leads to identification of variants that appear to be medically relevant but fail confirmatory testing and are false positives (FPs).8 This is, at least in part, a consequence of the fact that most clinically relevant, disease-associated alleles are rare9 and, by definition, have low prior probabilities of being present in any given person, particularly within asymptomatic individuals.10

Thus, as the number of tested individuals and interest in clinical genetics continue to grow, it is important to define the accuracy of SNP array–based detection of highly penetrant variation.8 FP detection of pathogenic or likely pathogenic (P/LP) alleles is a major concern given the potentially life-altering medical interventions that may follow. Further, false negatives may also be of concern, particularly to the extent that they lead to “false reassurance” in which an individual interprets a negative result to indicate a lack of genetic disease risk.11

The Alabama Genomic Health Initiative (AGHI) is a state-wide research program to conduct genetic testing for a broad population unselected for health status to determine utility in disease prevention, management, and treatment. As part of AGHI, we have tested 5369 Alabamians in an attempt to identify clinically actionable genetic variation that may be relevant to participant health. This population screen is being conducted using the Illumina Global Screening Array (GSA), which has probes designed to genotype ~160,000 rare variants. Many of these rare variants were targeted because they reside in clinical variant databases and are annotated as highly penetrant contributors to disease.12,13 The AGHI population screen differs from DTC genetic testing as it is free to participants and provides access to genetic counseling, including phone-based or face-to-face counseling for those who receive positive findings. AGHI also provides customized reports of noninformative results to individuals with a personal or family history of disease that is consistent with elevated genetic risk, along with recommendations and resources for follow-up testing. In addition, with participant consent, genetic data are linked to electronic medical records to facilitate future research about the consequences and utility of genetic disease risk assessments.

MATERIALS AND METHODS

Study participant population

Genotyping via AGHI is available to all adult Alabama residents. The research protocol is approved by the University of Alabama at Birmingham (UAB) institutional review board (IRB). Permanent and “pop-up” enrollment sites are located throughout the state to provide access to a broad, diverse population. Participants are recruited via media, social media, and word of mouth. During enrollment, participants meet with a team member to discuss benefits, risks, limitations, and logistics; provide demographic data and contact information; and give informed consent. Participants can elect to participate in a biobank, have results shared with a health-care provider, and/or allow recontact about future research (each item represents an independent selection). Two 4-mL blood draws into EDTA tubes are collected: one to isolate DNA and the second for biobanking, given consent. Additionally, a health history questionnaire identifies participants with personal or family history relevant to conditions associated with genes on the American College of Medical Genetics and Genomics (ACMG) SFv2.0 list.14 This information is used to determine who should receive a recommendation for follow-up testing even if genotyping results are negative.

Ethics statement

The University of Alabama at Birmingham (IRB 170303004) approved and monitored this study. All enrollees were required to give written consent for study participation.

Population screen gene list

Variants returned to participants reside within the 59 medically actionable genes included on the ACMG SFv2.0 list.14

Genotyping and quality control

DNA isolation was conducted at UAB using the Gentra Puregene blood kit (Qiagen), or at the HudsonAlpha Institute Clinical Services Laboratory (CSL) using the QIAsymphony DSP DNA kit, following standard CAP/CLIA-approved protocols. Genotyping was completed at HudsonAlpha using the Illumina Global Screening Array (GSA-24 v1.0, n = 1807 individuals; GSA-24 v2.0, n = 3562 individuals) per Illumina’s recommended protocol.

Signal intensities detected by the GSA were converted to genotypes using Illumina AutoCall software with a GenCall threshold of 0.15. Primary quality control requirements included per-sample log R standard deviation (SD) less than 0.25 and call rates greater than 98.5% across the array (GSA-24 v1.0), or greater than 99.0% across the autosomes and chromosome X (GSA-24 v2.0).
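The per-sample QC requirements above can be expressed as a simple predicate. The following is a minimal sketch in Python, not the study's actual pipeline; the record type and its field names (`SampleQC`, `log_r_sd`, `call_rate`, `array_version`) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Call-rate cutoffs stated in the text, keyed by array version.
# For v2.0 the rate is computed across autosomes and chromosome X.
MIN_CALL_RATE = {"GSA-24 v1.0": 0.985, "GSA-24 v2.0": 0.990}
MAX_LOG_R_SD = 0.25


@dataclass
class SampleQC:
    sample_id: str       # hypothetical identifier
    array_version: str   # "GSA-24 v1.0" or "GSA-24 v2.0"
    log_r_sd: float      # per-sample log R standard deviation
    call_rate: float     # fraction of probes that received a genotype call


def passes_primary_qc(sample: SampleQC) -> bool:
    """Return True if the sample meets both primary QC requirements."""
    min_rate = MIN_CALL_RATE[sample.array_version]
    return sample.log_r_sd < MAX_LOG_R_SD and sample.call_rate > min_rate
```

Under these thresholds, a v2.0 sample with a call rate of 98.8% would fail even though the same call rate would pass on v1.0.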

Filtration and manual curation

Technical data reports were generated by the CSL and genotype calls were converted to a VCF-like format. Samples were analyzed in batches of various sizes, depending on participant recruitment. Chromosome X heterozygous genotype counts were calculated per individual to predict sex, and pairwise kinship values15 were used to determine participant relatedness. These metrics were tracked to detect potential sample mix-ups, and relatedness was accounted for when defining batch- and study-wide allele frequencies.

GSA-identified variation was filtered to yield a set of variants for manual curation. For each batch, we curated variants that have a batch allele count ≤5 (counting alleles within related groups of participants only one time), are present in ExAC/gnomAD16 or 1000 Genomes17 at a frequency ≤0.1%, and are protein- or splice site–altering or have a CADD-scaled score ≥15.18 For MUTYH variation, population database frequency thresholds were increased to 2%. Additionally, we filtered variants using a predefined list of genes (the ACMG SFv2.0). Study-wide allele counts, which are accumulating over time, and no-call rates were used during manual curation. Additionally, batch-level plots of raw fluorescence intensity values were generated for all variants that passed initial filtration and were used during manual curation.
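The filtration criteria above can be sketched as a single function. This is an illustrative reading of the stated thresholds, not the authors' code. The `Variant` fields are hypothetical, the gene set shown is a small illustrative subset of the 59-gene ACMG SFv2.0 list, and treating a variant absent from population databases as having frequency 0.0 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset only; the study used the full ACMG SFv2.0 list.
ACMG_GENES = {"BRCA1", "BRCA2", "MYBPC3", "MUTYH"}


@dataclass
class Variant:
    gene: str
    batch_allele_count: int           # related participants counted once
    population_af: float              # ExAC/gnomAD or 1000 Genomes frequency;
                                      # 0.0 if unobserved (an assumption)
    protein_or_splice_altering: bool
    cadd_scaled: Optional[float]      # None if no CADD score is available


def needs_manual_curation(v: Variant) -> bool:
    """Apply the batch, population-frequency, effect, and gene-list filters."""
    # MUTYH uses a relaxed 2% population-frequency threshold; others use 0.1%.
    max_af = 0.02 if v.gene == "MUTYH" else 0.001
    damaging = v.protein_or_splice_altering or (
        v.cadd_scaled is not None and v.cadd_scaled >= 15
    )
    return (
        v.gene in ACMG_GENES
        and v.batch_allele_count <= 5
        and v.population_af <= max_af
        and damaging
    )
```

Variants passing this filter would then proceed to manual review, where study-wide allele counts, no-call rates, and raw intensity plots are considered.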

Variant classification

Variant classification was conducted using evidence codes and rules set forth by ACMG/Association for Molecular Pathology (AMP) guidelines.19

Variant validation

Although individual AGHI components are conducted within the HudsonAlpha CSL, a CAP/CLIA-accredited laboratory, the overall workflow was carried out as a research project under an IRB-approved protocol. Medically relevant and returnable variants were validated by Sanger sequencing and separately interpreted by an independent CAP/CLIA-certified laboratory.

Return of results

Participants with no P/LP findings receive a noninformative result report via mail. Those who were flagged as being at elevated risk of disease based on personal or family history are given customized reports highlighting their histories and the limitations of AGHI genotyping to detect clinically relevant variants and are encouraged to seek clinical follow-up with specialists as appropriate. Participants with P/LP results receive a phone call from a genetic counselor to describe results, implications, and next steps, and face-to-face counseling is available upon request. In addition to the CLIA-certified results report, an individualized research report based on standard templates is written by genetic counselors and sent to the participant. With consent, both reports are shared with the participant’s health-care provider.

Data sharing

GSA-identified and Sanger confirmed P/LP variants have been submitted to ClinVar (study ID: AGHI_GT).

Estimation of continental ancestry

To determine whether self-reported ancestries (of AGHI participants who received Sanger testing) correlated with genetically inferred ancestry, we created a continental reference data set based on 1000 Genomes to estimate ancestral descent.20 To ensure maximum overlap with the GSA, we filtered GSA v2.0 variants using vcftools v0.1.7 to include only autosomal, biallelic SNPs and, to reduce linkage disequilibrium effects, we thinned SNPs to retain one variant every 2000 bp. We used the isec command of bcftools v1.9 to select the intersecting variants from the 1000 Genomes data and vcftools to select variants at a global minor allele frequency (MAF) >5% to create an ancestry reference data set. For each batch of GSA samples, we used bcftools to select all variants with at least one heterozygous genotype and that intersect with the ancestry reference data set. After recoding each set into plink format using plink v1.9.0, we used Admixture v1.3.0 in supervised mode to train an ancestry model on the filtered, superpopulation-labeled 1000 Genomes variants (K = 5), and then used Admixture v1.3.0 in projection mode to predict the percentages of continental superpopulation ancestry in each of the participants.
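The thinning step described above, in which SNPs are spaced at least 2000 bp apart to reduce linkage disequilibrium effects, can be illustrated with a short greedy procedure. This Python sketch shows the general idea only; it is not the vcftools implementation, and the function name and input format are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def thin_sites(sites: Iterable[Site], min_gap: int = 2000) -> List[Site]:
    """Greedy thinning: keep a site only if it lies at least `min_gap` bp
    beyond the previously kept site on the same chromosome.

    Assumes `sites` is sorted by chromosome, then by position.
    """
    kept: List[Site] = []
    last_chrom: Optional[str] = None
    last_pos = 0
    for chrom, pos in sites:
        if chrom != last_chrom or pos - last_pos >= min_gap:
            kept.append((chrom, pos))
            last_chrom, last_pos = chrom, pos
    return kept
```

For example, given sites at positions 100, 1500, 2100, and 4200 on one chromosome, the site at 1500 is dropped because it lies within 2000 bp of the retained site at 100.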

RESULTS

Population screening in AGHI via GSA

To date, we have enrolled, genotyped, and analyzed 5369 individuals, 75% of whom are female. Seventy-three percent self-report as European American, 20% as African American, and 2% as Asian. The remaining 5% represent individuals of Native American/Alaskan Native or Native Hawaiian/Pacific Islander descent, or their race is unknown. In addition, 3.3% are Hispanic/Latino, and 45% live in medically underserved areas.21 Race/ethnicity demographics of AGHI participants roughly represent Alabama’s population based on census data, which reports that 69% of Alabamians are European American, 27% African American, 2% Asian, and 4.6% self-report as Hispanic/Latino.22 Females are overrepresented in AGHI, but this gender bias is similar to that seen in other studies with self-selecting participation.23,24

We have used the Illumina GSA to genotype AGHI participants. While the GSA mostly targets common variation, a subset of probes (~160,000 of 654,027) are designed to target rare, potentially clinically relevant variation. Many of these variants were targeted because they reside in databases like ClinVar,12,13 and thus may be highly penetrant contributors to disease. We return only P/LP variants in clinically actionable disease genes defined by the ACMG and predominantly associated with cancer and cardiac disease risk (see “Materials and Methods”).

Variants of interest are filtered and selected based on a number of attributes, manually curated, classified according to ACMG criteria,19 and Sanger validated prior to return to study participants (Fig. 1). Personal and family history of disease is asked of each individual at time of enrollment and is available to the study team, although variant classifications are defined independent of this information.19

Fig. 1: Workflow of population screening in the Alabama Genomic Health Initiative (AGHI).

Sensitivity of the GSA to medically relevant variants

To assess sensitivity of the GSA prior to conducting population screening in AGHI, we compared variation targeted by the GSA to secondary finding variants found via genome or exome sequencing of 7422 individuals reported in two previous studies.25,26 Among 126 unique P/LP secondary findings in these studies, 57% are targeted by v1.0 of the array and 71% by v2.0, providing an estimate of the upper limit on sensitivity (Table S1). Most (62%) P/LP secondary finding variants found by sequencing that are not targeted by the GSA (v2.0) are rare loss-of-function (LOF) alleles (nonsense, frameshift, or canonical splice site). This reflects the fact that LOF variants that are not in clinical genetic databases can often meet ACMG criteria for P/LP status.19 Missense alleles not previously seen in affected individuals typically do not meet P/LP criteria due to ambiguity associated with missense variation.

To experimentally assess sensitivity to targeted P/LP variants, we tested DNA samples from 20 previously sequenced individuals (85% European American) known to harbor at least one variant targeted by the GSA (23 unique variants, 15 genes; all heterozygous). All expected variants were identified correctly by the GSA, suggesting high sensitivity among array-targeted variants (Table S2). We also tested two previously sequenced individuals with heterozygous variation at a position targeted by the GSA but harboring a nontargeted alternate allele; these results are described in detail below in the context of FP findings.

True positive findings in AGHI

To date, 5369 individuals from the population cohort have undergone GSA testing and analysis. To confirm array-detected variation, we conducted Sanger testing for 130 unique variants found to be heterozygous, and one variant found to be homozygous, among 191 individuals, comprising 204 total person-by-variant Sanger tests (Table S3). Of the 131 variants tested, 99 were tested in only one individual, and among these 48 were confirmed while 51 were not. Among the 32 variants tested in multiple individuals, 14 confirmed in all tested individuals and 16 failed in all tested individuals, while one variant confirmed in one of two individuals and one variant confirmed in one of five individuals. Overall, 62 variants validated in all individuals tested (47%), 67 did not validate in any individual tested (51%), and two validated in only one of multiple tested samples.
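The validation tallies in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below uses only counts reported in the text; the variable names are illustrative.

```python
# Counts reported in the text for the 131 Sanger-tested variants.
single = {"confirmed": 48, "failed": 51}                      # tested in one individual
multi = {"confirmed_all": 14, "failed_all": 16, "mixed": 2}   # tested in several

validated_all = single["confirmed"] + multi["confirmed_all"]
validated_none = single["failed"] + multi["failed_all"]
total = validated_all + validated_none + multi["mixed"]


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


print(f"validated in all tested individuals: {validated_all} ({pct(validated_all, total)}%)")
print(f"validated in no tested individual:   {validated_none} ({pct(validated_none, total)}%)")
print(f"mixed results:                       {multi['mixed']}")
```

The subtotals reproduce the figures in the text: 99 variants tested in one individual, 32 tested in several, and 131 overall.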

Among the variants that validated in at least one individual, 64 lie within a medically actionable gene, were classified as P/LP, and were returned to study participants. Overall, 80 of 5369 total individuals were confirmed to harbor at least one clinically actionable P/LP variant, resulting in an overall yield of 1.5% (one individual harbored two P/LP variants). Returned variants reside in 19 different genes, including nine unique variants in BRCA2 (11 individuals), seven in BRCA1 (nine individuals), and five in MYBPC3 (nine individuals; Table 1). Of the 62 variants that validated in all tested individuals, 10 are predicted to result in frameshift (16%), 15 result in nonsense (24%), 29 result in missense substitution (47%), 7 are predicted to alter splicing (11%), and 1 leads to in-frame deletion (2%, Fig. 2).

Table 1 Returnable P/LP genetic variation was detected by GSA, and Sanger validated, in 80 AGHI participants across 19 different genes (one individual harbors two findings).
Fig. 2: Differences in genetic variation types between true and false positive variants.

Global Screening Array (GSA)–targeted variants determined to be false positive by Sanger testing are enriched for failed detection of targeted indels. 61% of false positive variants are predicted to result in frameshift, compared with 16% of true positive events.

False positive findings

Analysis of GSA data for the first batches of AGHI participants revealed an unreasonably high rate of variants that passed initial quality control (QC) and were suspected to be P/LP variants in medically actionable genes. For example, analysis of the first 55 individuals resulted in 13 unique variants across 15 participants that passed initial genotype quality and variant filters. Eight of these 13 array-detected variants were classified as P/LP. If all eight of these variants were accurately detected, that would suggest a medically relevant variant yield of 15%, in contrast to values of 1–3% seen among nonascertained populations using sequencing.25,26,27 Tellingly, all eight were Sanger tested but none were confirmed.

Here, we describe one variant in detail as an illustrative example of the types of errors that result from the GSA in this context. Specifically, NM_000059.3(BRCA2):c.4258del (p.Asp1420fs), a pathogenic frameshift, was detected in two first-degree relatives within the first 55 participants. Sanger testing of both individuals revealed that they actually harbor NM_000059.3(BRCA2):c.4258G>T (p.Asp1420Tyr), a benign missense variant; thus, the array detected heterozygosity for a nontargeted benign allele at the targeted genomic position and reported it as the targeted frameshift. Across all participants in AGHI, the frameshift variant NM_000059.3(BRCA2):c.4258del has been reported by the GSA as heterozygous at a frequency of 0.64% among called individuals, an observation that is implausibly high relative to its allele frequency of 0.0008% in gnomAD.16 In contrast, the benign missense variant (p.Asp1420Tyr) that was Sanger detected has a frequency of 0.66% in gnomAD. This nontargeted allele is thus likely leading to all of the FP frameshift alleles being flagged by the GSA.
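The frequency argument in this example can be made concrete with a small plausibility check. The Python sketch below compares an observed heterozygote-call frequency with the heterozygote frequency expected under Hardy-Weinberg proportions. It is an illustration rather than the study's method: the 10-fold cutoff (`max_fold`) is a hypothetical value, the gnomAD figures are assumed to be allele frequencies, and the comparison assumes the reference population is broadly comparable to the cohort.

```python
def expected_het_frequency(allele_freq: float) -> float:
    """Hardy-Weinberg heterozygote frequency 2p(1 - p) for allele frequency p."""
    return 2 * allele_freq * (1 - allele_freq)


def is_implausibly_common(
    observed_het_freq: float, reference_af: float, max_fold: float = 10.0
) -> bool:
    """Flag a variant whose array heterozygote-call frequency exceeds the
    expectation from a reference allele frequency by more than `max_fold`.
    The default cutoff is an arbitrary illustrative choice."""
    expected = expected_het_frequency(reference_af)
    if expected == 0:
        return observed_het_freq > 0
    return observed_het_freq / expected > max_fold


# BRCA2 c.4258del: 0.64% array heterozygotes vs. 0.0008% in gnomAD.
frameshift_flag = is_implausibly_common(0.0064, 0.000008)
# The same call frequency measured against the benign c.4258G>T allele (0.66%).
missense_flag = is_implausibly_common(0.0064, 0.0066)
```

With these inputs the targeted frameshift is flagged, while the observed call frequency is unremarkable when measured against the benign missense allele, consistent with the interpretation given in the text.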

More generally, among the 67 GSA-detected variants that did not validate in any individual, Sanger sequencing detected a benign/likely benign alternative allele at, or near, the tested position in 58% of cases (39/67; Table 2). Further, as part of our sensitivity testing, we also tested two previously sequenced individuals known to have heterozygous variation at a position targeted by the GSA but harboring a nontargeted alternate allele. For both of these variants, the GSA reported that the sample was heterozygous for the array-targeted, rather than the actually present, allele (Table S2), further confirming the effects of nontargeted alleles at targeted positions. These results demonstrate that GSA-detected rare heterozygotes often result from the existence of a nontargeted allele. As the nontargeted alleles are often more common than the targeted alleles and more likely to be benign, this substantially inflates FP detection of disease-associated variation.

Table 2 Clinically relevant genetic variation detected by GSA in AGHI participants that did not validate via Sanger testing, but for which Sanger did detect an alternative nontargeted variant.

Among the 67 array-targeted FPs, 41 are predicted to result in frameshift (61%), 17 result in nonsense (25%), 7 in missense substitution (11%), and 2 are predicted to alter splicing (3%, Fig. 2). In contrast to the predicted effects of the 62 true positives (TPs) that validated in all individuals tested, FPs are enriched for insertions/deletions (indels) that result in frameshift (61% in FPs vs. 16% in TPs). Moreover, array-targeted missense variations resulting from single-nucleotide variation are more likely to be TPs (11% in FPs vs. 47% in TPs).

We examined other attributes of array-targeted FPs to see whether automated predictors of errors could be determined. We found that variant-level no-call rates are higher (p = 0.003) in FPs (0.3%) than in TPs (0.076%), as are study-wide allele frequencies (0.1% in FPs vs. 0.02% in TPs; Table S3). Also, we hypothesized that probe uniqueness/mappability may play some part in FP detection. However, when we aligned probes on the GSA to a reference assembly (GRCh37), we found no difference in the number of matching sequences in the genome when comparing FPs and TPs (Table S3).

Thus, no-call rates, batch- and study-level allele frequencies, and the presence of known alternative nearby variation are the features that we have found to be most predictive of TP/FP status among medically relevant variants. Internal allele frequency and no-call rates have become more effective as the study progresses, and we increasingly add to the list of variants, such as the BRCA2 variant mentioned above, that are unreliable and do not warrant curation or Sanger confirmation. However, we continue to find many FPs, with 46% of tested, unique variants failing Sanger confirmation among the most recent 1000 participants.

False positive findings in the context of race/ethnicity

We have collected self-reported race/ethnicity for almost 98% of enrolled study participants, and have assessed error rates in relation to self-reported ancestry. We focus here on European American (EA) and African American (AA) individuals, as the total numbers of participants (361) and Sanger-tested individuals (10) who are neither EA nor AA are too small to facilitate robust conclusions. Of the 99 participants self-reported as either EA or AA and within whom the GSA produced an FP, 46 are AA (46%), in contrast to the 22% of all EA/AA participants being AA. The enrichment of FPs in AA GSA data is substantial (odds ratio [OR] = 3.2) and statistically significant (p = 5e-4, Fisher’s exact test; Fig. 3). Further supporting this ancestry-correlated FP rate, for the two variants that each validated in one individual but not in others (one failed individual for one variant, and four for the other), the individuals in whom the variants validated are EA, whereas those with GSA-detected FP variation are AA. Notably, self-reported ancestries correlated strongly with genetically inferred degrees of African continental ancestry, with Sanger-tested individuals who self-reported as AA having an estimated 40–90% African ancestry and the Sanger-tested self-reported EA individuals all being <11% African ancestry (median 1.9%, Table S4).
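An odds ratio of this kind is computed from a 2×2 table. The Python sketch below illustrates the calculation using counts reconstructed from the rounded percentages reported earlier (20% AA and 73% EA among 5369 participants). The exact table underlying the reported value is not given in the text, so these reconstructed counts are approximations and the result should be expected only to land near the reported OR.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]: (a/b) / (c/d)."""
    return (a * d) / (b * c)


# Approximate group sizes reconstructed from rounded percentages (assumption).
aa_total = round(0.20 * 5369)
ea_total = round(0.73 * 5369)

# Participants with a GSA false positive, as reported in the text.
aa_fp = 46
ea_fp = 99 - 46

or_estimate = odds_ratio(
    aa_fp, aa_total - aa_fp,   # AA: with FP, without FP
    ea_fp, ea_total - ea_fp,   # EA: with FP, without FP
)
print(f"approximate odds ratio: {or_estimate:.1f}")
```

A significance test such as Fisher's exact test would be run on the same table; it is omitted here to keep the sketch dependency-free.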

Fig. 3: False positive (FP) findings in the context of race/ethnicity.

False positive genetic variation is more often identified in individuals of African descent. 77% of Global Screening Array (GSA)-detected variants in African American individuals represent false positive calls, in contrast to 46% in individuals of European descent, and 56% across all individuals tested.

DISCUSSION

To date, 5369 participants have been genotyped and analyzed in the AGHI population screen. Eighty individuals (~1.5%) were found to harbor a P/LP variant in a subset of medically actionable disease genes associated with cancer or cardiac disease risk. All P/LP variants identified via the genotyping array and returned to participants were validated by Sanger sequencing in a CAP/CLIA-certified lab to ensure technical accuracy prior to return.

SNP arrays are primarily designed to assign genotypes at sites of common variation. They rely on empirical genotype-dependent clustering of fluorescence intensity values from many samples.7 Identification of rare variants by array is thus an exercise in outlier detection (e.g., an individual sample whose fluorescence intensity at a given variant deviates from all other samples in a given batch or, in many cases, from all other samples ever tested in a given study) and is prone to inferring P/LP variants that are false discoveries. Because all variants of interest are Sanger sequenced for confirmation, we can measure array specificity among the population of variants that are classified as P/LP and pass manual quality curation. We have found that many (51%) rare and potentially P/LP genetic variants identified by array do not validate by Sanger and represent FPs. While we demonstrate that some FPs can be systematically filtered as a result of no-call rates, elevated study- and batch-specific allele frequencies, and overlap with known alternative variants, a high rate of FP detection remains even after such filtration. For example, 46% of tests have failed validation within the most recent 1000 AGHI participants despite removal of variants found to be problematic among the first 4369 samples.

We have shown that FPs are enriched for targeted indels (61% of FP variants represent targeted indels), in contrast to an enrichment of missense variation in TPs (47% of TP variants result in missense; Fig. 2). Moreover, most FPs (58%) result from detection of an allele different from that which is targeted. Often, the array is targeting a rare indel that results in frameshift, while the actual detected alleles are more common in the population and lead to a benign missense or synonymous effect. Given that 61% of FPs resulted from targeted indels, this type of variation is especially problematic.

We have also provided data relevant to the sensitivity of the GSA to highly penetrant rare variation. We found that most (57% GSA v1.0, 71% GSA v2.0) P/LP secondary findings reported in two large sequencing-based studies are targeted by the GSA, and we showed that 23 known variants in 20 samples were all successfully flagged as heterozygotes by the GSA. While these estimates suggest high sensitivity and support the utility of array-based testing in this context, they overstate actual sensitivity. In particular, the presence of nontargeted, benign alleles near targeted, pathogenic alleles leads to FPs at a subset of targeted P/LP variants. While such P/LP variants are on the array and potentially could be accurately detected in individuals who harbor them, filtration of these calls is needed. We defined a list of 67 GSA-targeted P/LP rare variants that are unreliable (Table S3). While small relative to the scope of all rare GSA variants (~160,000), this is a large fraction among variants flagged for follow-up (e.g., 67 unreliable versus 64 total returned unique variants in AGHI). Testing all potential heterozygotes called across these 67 variants would have required 574 Sanger tests, compared with the 110 actually conducted. Nearly all of these tests would reveal FPs, making such testing highly inefficient and effectively unsustainable. However, pruning such variants reduces potential sensitivity, albeit by a small amount, and such limitations should be explicitly accounted for, particularly when describing results to participants.

We also found that ancestry associates with the likelihood that a rare variant is correctly genotyped, with GSA heterozygote calls in African Americans being enriched for FPs. Other studies have shown that there are ancestry-associated discrepancies in accuracy of clinical interpretation of variants. This result, at least in part, reflects reduced representation of non-European individuals in clinical and research genetic databases.28,29 Our results show that ancestry discrepancies also affect the technical quality of array-based detection of rare variants. This is likely also, at least in part, a result of reduced representation in variant databases. Indeed, to the extent that array probe design accounts for the existence of known alternative alleles,30 such designs will necessarily be less effective when there are fewer known alleles within a given population.

Overall, results from screening a large population of Alabamians indicate that FP detection rates among array-based rare variant genotypes are considerable, but manageable. This work supports the findings of others suggesting that array-detected rare genetic variants often represent FPs. Tandy-Connor et al.8 report that 40% of clinically relevant variants identified via DTC are FPs, while Weedon et al. found that only 16% of array-identified rare variants confirm in sequencing data. Though our results support the hypothesis that population-scale genotyping can detect many individuals at elevated risk for actionable diseases, they also provide clearer definitions of testing limitations, both in terms of specificity and sensitivity. Further, our observations strongly support the conclusion that all array-detected P/LP variants should be confirmed by an orthogonal method in a clinical genetics laboratory prior to return to patients and clinical providers, especially for individuals of underrepresented minority groups.