Purpose: Increases in throughput and affordability of genotyping products have led to large sample sizes in genetic studies, increasing the likelihood that incidental genetic findings may occur. We set out to survey potential notifiable variants on arrays used in genome-wide association studies and in direct-to-consumer genetic services.
Methods: We used multiple bioinformatics strategies to identify and map variants tested for genetic disorders in ≥2 CLIA-certified laboratories (based on the GeneTests database). We subsequently surveyed 18 commercial single nucleotide polymorphism arrays and HapMap for these variants.
Results: Of 1,362 genes tested according to GeneTests, we identified 298 specific targeted mutations measured in ≥2 laboratories, encompassing 56 disorders. Only 88 of 298 mutations could be identified as known single nucleotide polymorphisms in genomic databases. We found 18 of 88 single nucleotide polymorphisms present in HapMap or on commercial single nucleotide polymorphism arrays. Homozygotes for rare alleles of some variants were identified in the Framingham Heart Study, an active genome-wide association study cohort (n = 8,410).
Conclusions: Variants in genes including APOE, F5, HFE, CYP21A2, MEFV, SPINK1, BTD, GALT, and G6PD were found on single nucleotide polymorphism arrays or in the HapMap. Some of these variants may warrant further review to determine their likelihood to trigger incidental findings in the course of genome-wide association studies or direct-to-consumer testing.
Recently, there have been remarkable advances in our understanding of the genetic determinants of common, complex human diseases. The Human Genome Project provided a public reference human genome sequence,1 and the HapMap project2 created a genome-wide database of genetic variation of some four million single nucleotide polymorphisms (SNPs). With the identification of so many SNPs and information regarding correlations among these SNPs, genome-wide association studies (GWAS) using high-density arrays of 100K–1000K SNPs became feasible. GWAS have already led to the discovery of thousands of genetic variants contributing to variability in a range of common diseases including diabetes, cancer, and cardiovascular disease.3
Along with the rapid advance in GWAS and knowledge of genetic variation, an important logistical and ethical issue that has received increasing attention is the potential benefits and risks of identifying “incidental” genetic variants that may compel participant notification.4 The specific criteria to be used for participant notification of a finding in biomedical research is currently under debate by different interested parties (researchers, participants, practitioners, lawyers, health insurers, those with financial interest in genetic tests), involving issues relating to informed consent, ethics, law, and clinical practice.5 For the purposes of this study, we adopted one set of criteria recently published for defining a notifiable genetic variant in population cohort research studies.6 These criteria from Bookman et al.6 indicate a notifiable genetic variant is one with established analytic validity for which (a) genetic testing can strongly predict a deleterious clinical outcome with reasonable certainty, and (b) efficacious early medical intervention exists to reduce the risk of disease or its complications or which may impact reproductive decisions.6
Do participants want to receive genetic results? In the administration of informed consent for genetic research studies, some research participants are asked whether they wish to receive the results of their genetic tests. One survey study of potential biobanking participants who did not commit to participation indicated that a majority of people would elect to be notified of any significant discoveries regarding their allelic status if it would provide information on future health risk and if treatment is available.7 However, the issue has not been widely studied in consented populations, and the extent of information sought in the practice of informed consent is heterogeneous, as are the contexts in which informed consent is applied.8 With the passage in May 2008 of the Genetic Information Nondiscrimination Act (GINA), which limits health insurers' and employers' abilities to use genetic information in a discriminatory manner, some, but not all, potentially damaging uses of genetic tests reported back to research participants have been limited. GINA could therefore increase the incentive for individuals to participate in genetic research studies, although the impact of GINA on genetic research remains unknown.9 In addition, in the past 2 years, a number of companies have begun to offer direct-to-consumer (DTC) genetic testing for SNP variants identified by GWAS and/or genome-wide SNP scans (e.g., 23andMe, DeCode, Variagenics, Knome), suggesting to potential buyers that this information could be clinically useful to them. Accordingly, participants in genetic research may develop greater interest in personal genetic information as DTC information increasingly enters the public consciousness.10
Despite the growing discussion of many facets around implementing participant notification procedures for actionable genetic results, there is little information regarding the number and types of “notifiable” genetic variants on commercial SNP arrays currently used in GWAS or the potential implications of the use of SNP arrays harboring these variants. We used available genome resources to systematically identify the overlap of SNPs tested in GWAS with genetic tests conducted in Clinical Laboratory Improvement Amendments (CLIA)-certified laboratories. We further provide estimates of the prevalence of notifiable tests from one “real world” genome-wide association study, the Framingham Heart Study (FHS), with 9,274 genotyped and consented participants.
MATERIALS AND METHODS
Abstraction of disease variants tested in CLIA-certified laboratories
From a master list of 1,362 genes catalogued by GeneTests, we characterized the types of genetic tests reported as being tested in CLIA laboratories (Fig. 1). We focused on targeted mutation analyses reported in ≥2 CLIA-certified laboratories (n = 89 disorders, n = 88 genes). “Targeted mutation analysis” was defined as “testing for either (1) a nucleotide repeat expansion, or (2) ≥1 specific mutation, and excluding deletion/duplication analysis or family-specific mutation analysis.” We chose to focus on variants tested in ≥2 CLIA-certified laboratories because we felt this would narrow the range of variants to those with a higher level of community consensus on the utility of testing a specific variant. From a list of diseases and genes tested by targeted mutation analysis, we established, where possible, the specific genetic variants tested (Table, Supplemental Digital Content 1, http://links.lww.com/GIM/A109). Many of these CLIA-tested variants have been shown to be causally related to disease in prior genotype-phenotype studies. Although some DTC tests are conducted in CLIA-certified laboratories, they are not included in the GeneTests CLIA laboratories dataset because their primary purpose is not to detect highly penetrant clinical variants.
Bioinformatics search for unique identifiers for variants tested in CLIA-certified laboratories
Two reviewers (A.B., A.D.J.) conducted an exhaustive search to ascertain whether the identified targeted mutations possessed reference SNP identifiers (rsIDs). To find rsIDs, we used GeneTests, OMIM, HGMD, dbSNP (Build 129), ExPASy, primary literature (used in some cases to identify and map probe sequences for variants), and the BLAST-Like Alignment Tool, human genome reference sequence (Build 36) and SNP tracks all via the UCSC genome browser. For some variants, a detailed search was required to locate the appropriate position in the human genome. Some instances required accounting for varied positions in alternate protein isoforms, resolving multiple names for the same variant among the CLIA-certified laboratories, and confirming and aligning sequences from the variant region based on published variant and probe sequences.
Locating variants on commercial SNP arrays
Once a set of variants with verified SNP rsIDs was confirmed (listed in Table, Supplemental Digital Content 1, http://links.lww.com/GIM/A109), we determined whether the SNPs themselves or perfect proxy SNPs (r² = 1.0) were present on commercially available Affymetrix or Illumina SNP arrays using the SNP Annotation and Proxy search tool (SNAP, version 1.3).11 This tool provides rapid retrieval of HapMap linkage disequilibrium information, best linkage disequilibrium proxies, and genotyping array membership for user-defined query SNP lists, and handles a number of related informatics issues relating to SNP queries, which otherwise could result in false negative queries.12 Eighteen arrays (listed in Table 1) were scanned for the presence of CLIA-tested variants or proxies in the HapMap Utah residents with ancestry from northern and western Europe (CEU) population.
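To make the "perfect proxy" criterion concrete, the following is a minimal sketch of how r² between two biallelic SNPs can be computed from phased haplotype counts. It is not SNAP's implementation; the function names `r_squared` and `is_perfect_proxy`, the tolerance, and the example counts are all assumptions made for illustration.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic SNPs.

    Arguments are counts of the four phased haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second SNP.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = n_AB / n - p_A * p_B         # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def is_perfect_proxy(n_AB, n_Ab, n_aB, n_ab, tol=1e-9):
    """True when the two SNPs are in complete correlation (r^2 = 1.0)."""
    return abs(r_squared(n_AB, n_Ab, n_aB, n_ab) - 1.0) < tol


# Hypothetical counts: only two haplotypes observed, so each SNP
# perfectly predicts the other.
perfect = r_squared(30, 0, 0, 90)
# Hypothetical counts with some recombinant haplotypes: r^2 drops below 1.
imperfect = r_squared(25, 5, 5, 85)
```

With r² = 1.0, genotyping the proxy is equivalent to genotyping the query SNP in the reference panel, which is why only perfect proxies were counted alongside directly genotyped variants.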
Determination of SNP prevalence in five populations
We determined the prevalence of these SNPs and proxy SNPs in the Original cohort,13 Offspring cohort,14 and Third Generation cohort15 of the FHS, and in four HapMap populations2: Utah residents with ancestry from northern and western Europe (CEU); Han Chinese in Beijing, China (CHB); Japanese in Tokyo, Japan (JPT); and Yoruba in Ibadan, Nigeria (YRI). Genotyping was completed in 9,274 FHS participants as part of the SHARe (SNP Health Association Resource) project, using Affymetrix 500K mapping arrays (250K Nsp I and 250K Sty I arrays) and Affymetrix 50K supplemental Human Gene Focused arrays. Genotyping resulted in 503,551 SNPs with a call rate >95% and Hardy-Weinberg equilibrium P-value >10⁻⁶. Imputation of 2.5 million autosomal SNPs in HapMap was conducted using MACH.16 To examine allele frequencies, we restricted the FHS DNA samples to those with genome-wide call rates of ≥97%, genome-wide heterozygosity rates ≤5 SD from the norm, and no unresolved pedigree errors. The final population examined for genotype frequencies here included 8,410 individuals from the FHS (Original cohort, n = 962; Offspring cohort, n = 3,576; Third Generation cohort, n = 3,872). The FHS protocol is approved by the Institutional Review Board of the Boston University Medical Center, and all participants in the SHARe project provided written informed consent to participate in genetic research.
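The SNP-level quality filters described above (call rate >95%, Hardy-Weinberg P-value >10⁻⁶) can be sketched as follows. The text does not say which Hardy-Weinberg test was used; this sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test, and the function names and example genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_AA, n_Aa, n_aa):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square goodness-of-fit test; an exact test may be
    preferable for rare alleles.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                    # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_AA, n_Aa, n_aa, n_missing,
                  min_call_rate=0.95, min_hwe_p=1e-6):
    """Apply the two SNP-level thresholds quoted in the text."""
    n_called = n_AA + n_Aa + n_aa
    call_rate = n_called / (n_called + n_missing)
    return (call_rate > min_call_rate
            and hwe_chi2_pvalue(n_AA, n_Aa, n_aa) > min_hwe_p)


# Hypothetical SNPs: one well-behaved, one with poor call rate,
# one with no heterozygotes at all (gross departure from equilibrium).
good = passes_snp_qc(810, 180, 10, n_missing=20)
low_call_rate = passes_snp_qc(810, 180, 10, n_missing=100)
hwe_failure = passes_snp_qc(500, 0, 500, n_missing=0)
```

The sample-level filters (call rate ≥97%, heterozygosity, pedigree checks) would be applied separately, per individual rather than per SNP.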
Localization of SNPs from published GWAS on commercial SNP arrays
From the National Human Genome Research Institute catalog of published GWAS assaying at least 100,000 SNPs, we derived a table of top genetic associations from GWAS. The National Human Genome Research Institute catalog included all SNP-trait associations with P-values <9.5 × 10⁻⁶ in the original reports. Information regarding the risk allele frequency in controls from the original report for each SNP was taken from the published table online. We queried SNAP for the relative representation of these GWAS SNPs and their proxies on commercially available SNP arrays. Information regarding the allele frequencies of SNPs was also obtained from the FHS cohorts and HapMap populations.
A schematic overview of results derived from GeneTests is shown in Figure 2. There were 217 genetic diseases for which genetic tests are performed by ≥2 CLIA-certified laboratories; a test by targeted mutation analysis is available in ≥2 laboratories for 89 disorders. Of these, seven were mitochondrial diseases, which we did not consider further because SNP arrays generally do not include mitochondrial markers. Additionally, for 26 of the genetic diseases, the particular variants being tested were not specified on the GeneTests site or on the websites of the specific laboratories conducting testing.
For the remaining 56 disorders, 298 specific targeted mutations were reported as tested. A complete list and description of these variants and disorders is in Table, Supplemental Digital Content 1, http://links.lww.com/GIM/A109. Of 298 tested variants, we identified rsIDs using available genome databases for 88 SNPs (29.5%). Seventy of the 88 SNPs were not found in HapMap. Characteristics of the remaining 18 SNPs present in HapMap are given in Table 1, including gene, SNP description, genotype counts in FHS when available, and minor allele frequencies reported in HapMap populations and calculated, or estimated from imputed results, in FHS. Twelve of these SNPs, representing nine genes and nine corresponding diseases, were found to be located on commercial SNP arrays. When we sought evidence for proxy SNPs in HapMap for the 18 variants, we identified six “perfect proxy” SNPs (r² = 1.0); physical distances between the CLIA-tested SNPs and the HapMap proxies ranged from 4.1 to 51.4 kb. Four of these six proxies were themselves found on arrays; however, the inclusion of these proxies did not substantially increase the burden of potentially notifiable variants on commercial arrays because they tagged common disease variants for which evidence for clinical impact was modest (e.g., MTHFR E429A, HFE H63D, GALT L218L). Each commercial array contained on average 4.1 of the SNPs or their proxies (range, 0–11, of a maximum possible 24 SNPs). Of the 18 SNPs found in HapMap, six variants were not on any of the commercially available arrays. One of these SNPs, rs2070075, had a proxy SNP, rs12553321 (r² = 1.0), that was on two arrays (Illumina Hap1Million, single and dual formats) but did not have strong evidence for a substantial disease burden (GALT L218L).
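The counts in the two preceding paragraphs form a filtering funnel, and the derived figures can be rechecked with simple arithmetic. The sketch below only restates numbers quoted in the text; the function name `variant_funnel` and the dictionary keys are illustrative assumptions.

```python
def variant_funnel():
    """Recompute derived counts from the figures quoted in the text."""
    disorders_targeted = 89       # targeted mutation analysis in >=2 labs
    mitochondrial = 7             # excluded: not covered by SNP arrays
    variants_unspecified = 26     # excluded: tested variants not listed
    tested_variants = 298         # targeted mutations for remaining disorders
    with_rsid = 88                # variants mapped to an rsID
    not_in_hapmap = 70            # rsID variants absent from HapMap
    on_arrays = 12                # HapMap variants directly on >=1 array
    perfect_proxies = 6           # HapMap proxies with r^2 = 1.0

    in_hapmap = with_rsid - not_in_hapmap
    return {
        "disorders_with_variants":
            disorders_targeted - mitochondrial - variants_unspecified,
        "pct_with_rsid": round(100 * with_rsid / tested_variants, 1),
        "in_hapmap": in_hapmap,
        "in_hapmap_not_on_arrays": in_hapmap - on_arrays,
        "pct_on_arrays": round(100 * on_arrays / tested_variants, 1),
        "max_snps_or_proxies_per_array": in_hapmap + perfect_proxies,
    }


funnel = variant_funnel()
```

Each derived value matches the corresponding figure in the text: 56 disorders, 29.5% with rsIDs, 18 SNPs in HapMap, six of those absent from all arrays, and a maximum of 24 SNPs or proxies per array.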
We determined the occurrence and prevalence of potentially notifiable SNPs in a large genome-wide scan performed in the FHS. Of the 12 SNPs that were directly located on at least one array, eight SNPs or their proxies were genotyped on arrays used in FHS GWAS (the 100K study on the offspring cohort: Affymetrix 50K Xba I, 50K Hind III; the 550K SHARe study including all three FHS cohorts). One SNP, rs1801133 in MTHFR, was only present on a lower density array used in the FHS 100K GWAS studies, and thus, genotype data were only available in a subset of the offspring cohort (n = 1,325). These eight SNPs corresponded with tests performed in ≥2 US CLIA-certified laboratories for HFE-associated hereditary hemochromatosis, biotinidase deficiency, familial Mediterranean fever, hereditary pancreatitis, and MTHFR deficiency (Table 1). In the FHS, the frequency of the minor alleles ranged from 0.00018 to 0.36. These allele frequencies were generally similar to those reported in HapMap.
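For readers less familiar with how a minor allele frequency (MAF) follows from genotype counts, here is a minimal sketch. The function name `minor_allele_frequency` is an assumption, and the example counts are hypothetical: they suppose three heterozygotes and no minor-allele homozygotes among 8,410 fully genotyped individuals, which is not a genotype table taken from the study.

```python
def minor_allele_frequency(n_major_hom, n_het, n_minor_hom):
    """Minor allele frequency from diploid genotype counts.

    Each individual carries two alleles, so heterozygotes contribute one
    minor allele and minor-allele homozygotes contribute two.
    """
    n_individuals = n_major_hom + n_het + n_minor_hom
    minor_alleles = n_het + 2 * n_minor_hom
    return minor_alleles / (2 * n_individuals)


# Hypothetical rare variant: 3 heterozygotes among 8,410 individuals.
rare_maf = minor_allele_frequency(8407, 3, 0)

# Hypothetical common variant for comparison.
common_maf = minor_allele_frequency(400, 480, 120)
```

Under these assumed counts the rare variant has a MAF of about 0.00018, the same order of magnitude as the lower end of the range reported above, which illustrates how few carriers underlie the rarest variants even in a cohort of this size.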
Representation of GWAS SNPs across arrays
From the NHGRI catalog of GWAS findings, we derived a list of variants reported to be significantly associated with common diseases, disease traits, and other human measurements (n = 325). We focused on the occurrence and prevalence in FHS of SNPs reported as significantly associated only with common diseases or disease traits in at least one GWAS report (n = 164, Table, Supplemental Digital Content 2, http://links.lww.com/GIM/A110). For common, chronic disease conditions such as adult onset diabetes mellitus, Crohn's or other inflammatory bowel diseases, or breast or colorectal cancer, 50% of the SNPs reported in GWAS (n = 82) were found directly on arrays included in the FHS 550K study. The average allele frequency for all SNPs (n = 164) either genotyped or imputed in the FHS cohort was 27.9%, with only two SNPs, rs10498345 and rs16901979, having a minor allele frequency (MAF) ≤5%. The average MAF for the full set of GWAS SNPs (n = 164) in the HapMap CEU was 27.6%, similar to that in Framingham. Average MAFs for GWAS SNPs in other HapMap populations were CHB (28.5%), JPT (27.9%), and YRI (32.2%). Most of the SNPs associated with a common, chronic disease were either directly represented on FHS arrays (550K, 50%; 550K + 100K, 62%) or had a highly correlated proxy SNP (r² ≥ 0.8) on the FHS arrays (550K, 86%; 550K + 100K, 88%). A similar pattern was observed for other commercial arrays, highlighting the overlap among them and their focus on detecting common variants. By contrast, CLIA-tested variants were less uniformly represented on commercial arrays (12 of 298 variants [4.0%] found across all arrays) and were less common among participants (average genotyped or imputed MAF of the 12 variants in FHS was 9.1%).
GWAS have been conducted in hundreds of thousands of research participants and are currently underway in many other populations, with increasingly large sample sizes being used and combined to maximize statistical power. We sought to identify and characterize the potentially “notifiable” genetic variants residing on commercially available SNP arrays used for GWAS. Because the current guidelines for the use of genetic tests from research findings that we adopted indicate these tests should be conducted in CLIA-certified facilities, we considered known genes tested by these laboratories.6 From our review of all types of genetic tests currently measured for 1,362 genes, it appears that only 11.7% of the genes are tested for targeted mutation analysis in ≥2 CLIA-certified laboratories. By using a variety of bioinformatics methods, we confirmed only 12 SNPs tested in ≥2 CLIA-certified laboratories for nine diseases/genes are currently found on ≥1 SNP array used for GWAS. Because GWAS increasingly impute millions of SNPs from genotyped SNPs using HapMap information,11 it might be possible that many additional potentially notifiable SNPs could be found among the imputed SNPs. However, we confirmed only four additional SNPs found on ≥1 commercially available SNP array that are “perfect proxies” for the 12 potentially notifiable SNPs based on the HapMap. Further, we found six additional HapMap SNPs tested for six diseases/genes in ≥2 CLIA-certified laboratories that are not currently found on any SNP arrays. Thus, our findings suggest that SNP arrays used for GWAS, and the HapMap itself, generally harbor very few potentially notifiable genetic variants.
Potential notifiable SNPs
For a genetic result to be considered notifiable to research participants from community-based populations, a number of criteria have been suggested, including evidence that the notifiable disease has important health implications, that penetrance is relatively high though not necessarily complete (relative risk >2.0), the risk for disease is strong, the magnitude of risk conferred by the genetic variant is significant, and there are proven therapeutic or preventive interventions available, or there are significant reproductive implications.6 When we assessed the state of the literature for each of the genes (and the specific associated variants available on the commercially available SNP arrays) to determine whether they met the criteria and might qualify for notification, we found that few, if any, potentially notifiable variants that reside on arrays used in GWAS meet these criteria. Three disorders—congenital adrenal hyperplasia (CAH) as a result of 21-hydroxylase deficiency (CYP21A2), biotinidase insufficiency (BTD), and galactosemia (GALT)—primarily display onset in newborns or infants and are identified with high sensitivity by newborn screening (National Newborn Screening and Genetics Resource Center). A fourth condition, methylenetetrahydrofolate reductase (MTHFR) deficiency is associated with developmental delays in physical and cognitive functions, as well as mental retardation and various psychiatric disturbances.17 However, this condition is incompletely penetrant and routine newborn screening is not conducted.
Similarly, it is unclear whether most of these variants are strongly associated with adult diseases for which treatments are available and that might qualify them for genetic notification. For the two separate amino acid substitutions detectable for MTHFR, Ala222Val (677C>T) and Glu429Ala (1298A>C), the evidence for association with cardiovascular disease remains uncertain and the indication for notification is weak. Glucose-6-phosphate dehydrogenase deficiency (Val68Met in G6PD, not measured in FHS), Familial Mediterranean fever (FMF, Pro369Ser, Arg408Gln in MEFV, for both FHS MAF = 0.8%), the nonclassic form of CAH (Val237Glu, FHS MAF = 0.02%), and hereditary pancreatitis (Asn34Ser in SPINK1, FHS MAF = 1.0%) may manifest postnatally and are uncommon. In the FHS sample, we observed a total of six homozygotes for the minor alleles of variants in these genes, for which the disease-related gene variants are incompletely penetrant. Knowledge regarding genetic status relating to these conditions might be beneficial, specifically for avoiding triggers for the severe episodic illnesses in hereditary pancreatitis and G6PD deficiency, colchicine prophylaxis in FMF, and steroid replacement in CAH; however, there is insufficient evidence regarding the penetrance of these conditions in general community-based populations and whether there are significant benefits from genetic notification.
We examined evidence for return of results for each of these four conditions. For glucose-6-phosphate dehydrogenase deficiency, a G6PD mutation, Val68Met, is present on genotyping arrays. However, this mutation has only been found to cause a disease phenotype in combination with another mutation, Asn126Asp, which is not present on any of the arrays,18,19 making interpretation of the status for Val68Met alone unclear. The MEFV mutations present on arrays are related to the autosomal recessive FMF, for which three potential homozygotes have been identified in FHS. Because onset of FMF typically occurs in adulthood, often via a difficult diagnosis after multiple exploratory surgeries, there may be a case for notification. However, the mutations identified here (Pro369Ser, Arg408Gln) are not among the most commonly observed disease variants, their role in FMF has not been clearly elucidated, and some studies suggest incomplete penetrance of these and other MEFV alleles.20–23 Thus, a careful clinical review and consideration of further mutational screening would likely be necessary in cases where notification regarding MEFV mutation homozygosity was considered.
Individuals with the nonclassic form of CAH present postnatally and exhibit moderate enzyme deficiency and in some cases signs of hyperandrogenism. They may be heterozygous for one or more mutations or deletions (compound heterozygotes), including the Val237Glu mutation that is among a cluster of tested mutations in exon 6, and which is present on the array used in the FHS GWAS. In FHS, there were three potential heterozygotes for the Val237Glu mutation, which appears to be a functional null.24 This raises the question of whether such participants might benefit from notification: if a second, undetected mutation is also present, the combination might lead to undiagnosed nonclassic CAH, and notification could explain symptoms and possibly lead to their treatment. Finally, for hereditary pancreatitis, although both heterozygosity and, particularly, homozygosity for the SPINK1 mutation (Asn34Ser) are clearly and strongly associated with symptomatic disease,25 the pancreatitis shows highly incomplete penetrance, likely because of required environmental triggers such as infection, and there may be little impact on treatment,26 although avoidance of alcohol and some drugs may be recommended and earlier recognition of a first pancreatitis episode could be a benefit.
Genetic testing for two other common genetic mutations, F5 (Arg506Gln) and HFE (Cys282Tyr), is sometimes conducted in adults with clinical manifestations of venous thromboembolism or hemochromatosis, respectively. Carriers of the Factor V Leiden variant (Arg506Gln) with a history of venous thromboembolism are at increased risk for a second thromboembolic event.27 However, there is no evidence of a clear benefit from screening asymptomatic persons for variants in the F5 gene.28 Furthermore, from available clinical trial evidence, there is little evidence that genetic testing predicts responsiveness or aids in decisions regarding use of more intensive anticoagulation in persons with recurrent venous thromboembolism,29 or that long-term prophylactic anticoagulation is beneficial for asymptomatic, heterozygous individuals. However, knowledge of Factor V Leiden carrier status might lead individuals to reduce exposure to thromboembolic risk factors, such as smoking, or to use prophylactic aspirin during sedentary periods (e.g., airplane flights). Factor V Leiden may contribute to pregnancy loss; however, testing for this variant is a usual part of the evaluation for recurrent pregnancy loss.
HFE homozygotes have an increased prevalence of liver enzyme abnormalities with increased hepatic iron stores.30 There are no randomized trials assessing long-term outcomes of phlebotomy in HFE Cys282Tyr homozygotes. Because homozygous individuals may or may not have biochemical expression of iron overload, and many will not develop disease and end-organ damage, the use of phlebotomy is reserved for homozygotes with abnormal iron levels. In this study, we identified 31 potential FHS homozygotes for the rare allele of Cys282Tyr in HFE (FHS MAF = 5.8%). Although available clinical trial data are limited in asymptomatic homozygotes for HFE variants, it is possible that manifestations of iron overload can be delayed or averted by interventions to reduce iron intake or overload.31 Best practice guidelines exist which do recommend a predictive referral of Cys282Tyr homozygotes for hemochromatosis examination in a clinical setting, but they do not suggest general population screening because of issues of incomplete penetrance.32 Even though HFE is incompletely penetrant and a test has not been conducted in a CLIA laboratory, the measurement of iron levels is routine and safe, and if a clinical imbalance was detected, a safe and effective treatment exists with routine phlebotomy. Thus, further consideration may be warranted of the potential benefits versus risks for reporting variants in the HFE gene to research participants depending on the specific context of the research study in question. In some contexts, important clinical information may be available to supplement knowledge of the genotype in deciding whether to return results (e.g., for F5 or HFE, if a medical history of DVT, or high iron levels, is known, respectively), although such situations invoke the boundary between research and clinical practice, how they are defined and what is expected in each.
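As a rough plausibility check on the number of Cys282Tyr homozygotes reported above, the expected count under Hardy-Weinberg proportions is N·q², where q is the minor allele frequency. The sketch below makes two assumptions that the study does not state: that genotypes follow Hardy-Weinberg proportions, and that individuals can be treated as unrelated (the FHS cohorts include family members, so this is only approximate). The function name is hypothetical.

```python
def expected_minor_homozygotes(n_individuals, maf):
    """Expected number of minor-allele homozygotes under Hardy-Weinberg.

    Assumes random mating and unrelated individuals; both are
    simplifications for a family-based cohort.
    """
    return n_individuals * maf ** 2


# Values quoted in the text: 8,410 individuals, HFE Cys282Tyr MAF = 5.8%.
expected = expected_minor_homozygotes(8410, 0.058)
observed = 31  # potential homozygotes reported in the text
```

Under these assumptions the expected count is about 28.3, close to the 31 potential homozygotes observed, so the reported count is in line with what the allele frequency alone would predict.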
Although not an explicit goal of this study, we briefly consider the next steps, ethical obligations, and caveats in the potential incidental notification process, a subject that has been discussed extensively elsewhere.5,6,33,34 If there is a clear medical benefit that outweighs the risks of notifying participants who have consented to such notification, then there seems to be an ethical obligation to inform. We did not immediately deem any of the variants considered here to clearly meet notifiability criteria. However, we decided to bring these findings to our independent Ethics Advisory Board to solicit input and a recommendation was made for further consultation with outside experts. If variants are deemed notifiable, a number of additional practical steps might be considered, including (1) consulting the consent form for the study to assess whether individuals clearly indicated an opt-in or opt-out preference to be informed, in recognition of individuals who would not want this information, (2) reviewing available genotype quality assessments (e.g., cluster plots), if any, for the variants in question to determine potential false positives, and considering additional validation genotyping in a CLIA-certified laboratory, (3) if clinical information is available and consent provided, conducting a clinical review for evidence of related, possibly undiagnosed disorders if this is deemed appropriate to the situation at hand, and (4) verifying that the participants are alive and can be contacted. Identification of resources including a specialized care provider, such as a genetic counselor and/or medical geneticist, and development of educational materials may also be warranted.
Implication of SNP associations identified from GWAS
To our knowledge, this is the first study that seeks to quantify the overlap between SNPs tested in GWAS and genetic tests administered in CLIA-certified laboratories. Because many CLIA-laboratory tests target rare, Mendelian conditions caused by low frequency genetic variants, it is not surprising that the overlap is low between most CLIA-laboratory tests and GWAS SNPs. GWAS are providing strong evidence for associations between individual SNPs and chronic diseases and quantitative disease traits. Because there is substantial overlap (Table, Supplemental Digital Content 2, http://links.lww.com/GIM/A110) among commercial arrays for “significant” GWAS SNPs, most GWAS that use high-density SNP arrays are likely to identify or reliably impute genotypes for most GWAS-reported risk alleles. A number of DTC companies have formed to market tests based on GWAS findings.8,35 However, for nearly all GWAS associations to date on which DTC reporting is largely based, the magnitude of risk is modest (relative risk <1.5), with the exception of age-related macular degeneration and CFH polymorphism, and evidence from clinical trials for interventions that alter risk based on SNPs is lacking. Thus, at present, there appears to be insufficient evidence to classify SNPs from most GWAS as “notifiable” using evidence-based criteria for such reporting.6 Continual re-evaluation of the rapidly evolving evidence base is warranted regarding the threshold for reporting SNPs as further GWAS are conducted. As consumers purchase DTC tests and look for explanations of the results, participants may also increasingly query researchers about their own individual genetic results.10,35
Our survey and the resulting tables are only as comprehensive as the information provided by the laboratories contributing test information to GeneTests. Although many laboratories specified which genetic variants they were testing for a given disease, some laboratories did not do so; as a result, some variants could not be identified and checked against SNP arrays. We also focused on US CLIA-certified laboratories. Other databases exist (e.g., EuroGentest), but the participating laboratories generally conform to standards other than CLIA. We did not specifically consider combinations of signals at sex-linked markers to identify ploidy-related sex chromosome disorders like Turner or Klinefelter syndromes. Detecting such ploidy events requires an additional level of analysis on the part of researchers beyond simple SNP calling. Many ploidy disorders are diagnosed in childhood, so their incidental detection in GWAS is most relevant to young cohorts; however, some may persist undiagnosed into adulthood,36 and thus could trigger incidental findings in older populations.
Many of the disease-causing SNPs are not identified and tagged in human genetic databases (e.g., dbSNP, OMIM) because these databases were originally designed to generally capture variation with an MAF of ≥1% in general populations. Moreover, the SNP identifiers in the OMIM and UCSC databases did not always correlate with those in dbSNP, or with each other. This is largely due to redundant alias SNPids.12 We found only 29% of CLIA-tested SNPs have an identifier in genomics databases. The ways in which some CLIA-tested variants are identified, tested, and named can create confusion in identifying a precise genomic position for these variants. For example, a number of CLIA-tested variants are known by multiple names referring to restriction digests used, protein positions in precursor or mature proteins, or varying positions with respect to intron borders.12 With increasing sample numbers, denser arrays, and deeper genome sequencing, a more systematic informatics effort would be beneficial to ensure that both common and rare disease-associated variants are clearly identified in public databases. Without more systematic cataloging of rare disease variants, there are clear informatics barriers to the identification of notifiable genetic findings in genomics research. This study focuses on SNPs to the exclusion of other types of polymorphisms such as insertions, deletions, inversions and CNVs that GWAS arrays generally do not directly capture.37
Our results indicate a small fraction of disease-causing variants currently tested by CLIA-certified laboratories are in HapMap and/or commercially available SNP arrays. Of this limited number of potentially notifiable variants, few seem to fit criteria for notifiability. For several, newborn disease screening is widely practiced in most US states. For other variants, the penetrance with known adult diseases is low and/or there is insufficient evidence for therapeutic or preventive interventions that would avert disease risk in adults. Nevertheless, for some variants, such as the Cys282Tyr mutation in the HFE gene associated with hereditary hemochromatosis, further consideration may be warranted regarding the benefits versus risks of participant notification. Although DTC SNP tests are increasing, there is insufficient evidence regarding virtually all common SNPs implicated in GWAS to suggest that these variants should be considered notifiable at present. Given rapid advances in knowledge regarding SNPs on GWAS arrays and regarding the potential benefits and risks of reporting genetic variants, further research and continual updating of our knowledge regarding genetic return of results will be needed to allow informed and appropriate reporting of genetic variants that are detected in the course of research studies.
Supported by the National Heart, Lung and Blood Institute's Framingham Heart Study (Contract No. N01-HC-25195). ADJ was supported by an NHLBI IRTA fellowship award. AB was supported by the NIH Summer Internship Program in Biomedical Research. GPJ was supported by a State of Washington Life Sciences Discovery Fund.
This research was conducted in part using data and resources from the Framingham Heart Study of the National Heart Lung and Blood Institute of the National Institutes of Health and Boston University School of Medicine. The analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project.
We thank Shih-Jen Hwang for assistance with FHS genotyping results.
International Journal of Environmental Research and Public Health (2014)