We integrated whole-exome sequencing (WES) and chromosomal microarray analysis (CMA) into a clinical workflow to serve an endogamous, uninsured, agrarian community.
Seventy-nine probands (newborn to 49.8 years) who presented between 1998 and 2015 remained undiagnosed after biochemical and molecular investigations. We generated WES data for probands and family members and vetted variants through rephenotyping, segregation analyses, and population studies.
The most common presentation was neurological disease (64%). Seven (9%) probands were diagnosed by CMA. Family WES data were informative for 37 (51%) of the 72 remaining individuals, yielding a specific genetic diagnosis (n = 32) or revealing a novel molecular etiology (n = 5). For five (7%) additional subjects, negative WES decreased the likelihood of genetic disease. Compared to trio analysis, “family” WES (average seven exomes per proband) reduced filtered candidate variants from 22 ± 6 to 5 ± 3 per proband. Nineteen (51%) alleles were de novo and 17 (46%) inherited; the latter were added to a population-based diagnostic panel. We found actionable secondary variants in 21 (4.2%) of 502 subjects, all of whom opted to be informed.
CMA and family-based WES streamline and economize diagnosis of rare genetic disorders, accelerate novel gene discovery, and create new opportunities for community-based screening and prevention in underserved populations.
Whole-exome sequencing (WES) and chromosomal microarray analysis (CMA) have revolutionized investigation of rare genetic disorders and intellectual disability,1, 2, 3, 4, 5, 6 but important diagnostic and service gaps remain. The pretest probability of a genetic lesion is high for individuals who move through contemporary diagnostic algorithms to arrive at CMA or WES,7, 8, 9 yet many remain undiagnosed at the culmination of the process.2, 3, 9, 10 Moreover, the cost and complexity of these methods limit access for people who are poor, uninsured, or otherwise medically underserved.11
The Clinic for Special Children (CSC) is a medical home for children who derive from endogamous Old Order Amish and Mennonite (Plain) populations of Pennsylvania and surrounding states,12, 13 integrating clinical care with an in-house laboratory for Clinical Laboratory Improvement Amendments–certified targeted testing and research-based CMA analysis. Approximately 90% of CSC patients are medically underserved, as defined by their geographic, social, and economic circumstances (http://www.hrsa.gov/). Many live below the federal poverty threshold (http://www.census.gov/) in state-designated Health Professional Shortage Areas (http://www.health.pa.gov/) and the majority are uninsured, which is the strongest predictor of health disparity in the United States.14
Uninsured Americans are most commonly served by community health centers,15 only 12% of which offer the most basic forms of genetic testing.16 Such services remain particularly sparse in rural settings.16, 17 Urban areas have meanwhile witnessed the rise of ambitious genomic centers funded by academic18 and industry stakeholders19 enthused by the promise of precision medicine.20 However, these large-scale genomics initiatives are not necessarily intended to democratize genomic testing16 or confront barriers to its broader implementation.11, 21
To bridge the gap between technical resources and medical need, CSC and the Regeneron Genetics Center (RGC) forged a collaboration to make WES freely accessible to uninsured members of the Plain community (Supplementary Figure S1 online). The arrangement provided benefit to all major stakeholders: uninsured patients received high-quality genomic testing at no cost, CSC received genomic data and operating support, and RGC streamlined their investigation of clinically relevant disease genes and pathways. Through this partnership, we have been able to optimize the yield of genomic testing, explore its broader social and economic value in community practice, and advance precision medicine while simultaneously redressing health care disparities unique to the genomic era.
Materials and methods
We identified 79 probands (36 female, mean age 6.9 ± 9.4 years, range newborn to 49.8 years) who presented to CSC for evaluation between September 1998 and 2015, had clinical signs of an underlying genetic disorder, and remained without a diagnosis following focused biochemical and genetic investigations spanning an average of 3.3 ± 3.2 (range 0.1 to 16.9) years. All but three probands descended from Old Order Amish and Mennonite founder populations.12, 22
The Lancaster General Hospital Institutional Review Board approved the study. Subjects (or their parents) had pretest counseling to explain goals, process, timing, and limitations of CMA and WES before consenting in writing to participate. Subjects could choose whether or not to receive medically actionable secondary findings that fit American College of Medical Genetics and Genomics (ACMG) guidelines,23 including pathogenic variants known to segregate with high frequency in Plain populations (e.g., APOB c.10580 G > A; p.Arg3527Gln).24
Clinicians phenotyped each proband following a structured and standardized assessment guided by PhenoTips (https://phenotips.org)25 and using Human Phenotype Ontology (HPO) terms. The likelihood of a monogenic disorder was based primarily on conventional clinical indices such as abnormal brain size or morphology, developmental delay or regression, the presence of craniofacial/skeletal dysmorphisms, or characteristic end-organ pathology (e.g., hearing loss, vision impairment, or epilepsy) in the absence of environmental antecedents.7 Apparent inheritance of an autosomal recessive, dominant, or X-linked phenotype supported a genetic etiology in only 14 (17.7%) of 79 cases; the 65 remaining probands presented with a unique clinical phenotype in the context of an uninformative family history.
Prior to CMA and WES, most probands with developmental delay or neurological disease had additional analyte testing that could include, but was not limited to, plasma amino acids, acylcarnitines, lactate, ammonia, transferrin glycoforms, homocysteine, urine organic acids, purines and pyrimidines, creatine and guanidinoacetate, lysosomal storage markers, and cerebrospinal fluid glucose and neurotransmitters.26 The specific constellation of analyte tests for each proband was shaped by the clinical presentation, its attendant differential diagnosis, and the newborn screening history. In general, we took a parsimonious approach to metabolic analyte testing based upon its relatively low diagnostic yield (1–5%) in this clinical context.26, 27, 28, 29
Several subjects with Rett- or fragile X–like phenotypes had targeted MECP2 and FMR1 testing, respectively, prior to CMA and WES. Finally, the phenotype of each proband was cross-checked against an existing panel of more than 200 known population-specific alleles detected by the CSC laboratory using high-resolution melt analysis or Sanger sequencing.12 Vetting the process in this way (Figure 1a), we ensured a high pretest probability of genetic illness while limiting representation of known recessive “founder alleles” among probands who advanced to WES; institutional knowledge allowed us to enrich for phenotypes caused by de novo, X-linked, compound heterozygous, and copy-number variants (CNVs).2, 3, 7
Sixty-eight (86%) probands had a 2.6-million marker high-density CMA (CytoScan HD Array, Affymetrix) to detect pathogenic CNVs to a resolution of between 25 kb (losses) and 50 kb (gains) using results from Affymetrix Chromosome Analysis Suite software (ChAS 3.1) filtered against CNV data from more than 350 individuals of Amish and Mennonite descent. We investigated any deletion (regardless of size) that encompassed at least one exon of an OMIM gene and impacted at least three separate NspI fragments.
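The CNV triage rule described above can be sketched as a simple predicate. This is a minimal illustration only: the `Cnv` record, its field names, and the `warrants_review` function are hypothetical and do not mirror the ChAS 3.1 export format. Two further assumptions are made for the sake of the example: calls already present in the population control set are set aside, and calls that do not meet the deletion rule are reviewed only if they reach the stated array resolution.

```python
from dataclasses import dataclass

# Stated array resolution: 25 kb for losses, 50 kb for gains.
MIN_SIZE_BP = {"loss": 25_000, "gain": 50_000}


@dataclass
class Cnv:
    """Hypothetical record for one CNV call (illustrative field names)."""
    kind: str                      # "loss" or "gain"
    size_bp: int                   # length of the call in base pairs
    omim_exons_hit: int            # exons of OMIM genes overlapped
    nspi_fragments_hit: int        # separate NspI fragments affected
    seen_in_population_controls: bool


def warrants_review(cnv: Cnv) -> bool:
    """Return True if a CNV call should be investigated further."""
    # Assumption: calls observed in the Plain control set are filtered out.
    if cnv.seen_in_population_controls:
        return False
    # Stated rule: any deletion, regardless of size, that covers at least
    # one exon of an OMIM gene and at least three NspI fragments.
    if (cnv.kind == "loss"
            and cnv.omim_exons_hit >= 1
            and cnv.nspi_fragments_hit >= 3):
        return True
    # Assumption: all other calls must reach the array resolution.
    return cnv.size_bp >= MIN_SIZE_BP[cnv.kind]


small_exonic_deletion = Cnv("loss", 10_000, 1, 3, False)
small_gain = Cnv("gain", 30_000, 2, 5, False)
print(warrants_review(small_exonic_deletion))  # True
print(warrants_review(small_gain))             # False
```

The size-independent branch for deletions is the part taken directly from the text; everything else is a placeholder that a laboratory would replace with its own review criteria.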
For probands with an uninformative high-density CMA, we proceeded to WES in collaboration with RGC. Briefly, 1 μg of high-quality genomic DNA was exome-captured using the NimbleGen VCRome SeqCap 2.1 reagent. Libraries were sequenced on the Illumina HiSeq 2500 platform using v4 chemistry, achieving coverage of >85% of bases at 20x or greater. Raw sequence reads were mapped and aligned to the GRCh37/hg19 human genome reference assembly using BWA/GATK bioinformatics algorithms (https://software.broadinstitute.org/). Called variants were assessed by standard metrics (read depth ≥10, genotype quality ≥30, allelic balance ≥20%), annotated for potential functional effects (e.g., synonymous, missense, frameshift, nonsense), and subsequently filtered by observed minor allele frequency ≤1% within public (1000 Genomes, ExAC, and NHLBI ESP6500), RGC internal, and CSC population-specific allele frequency databases.
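The quality and frequency thresholds listed above can be expressed as a short filter. The sketch below is not the RGC pipeline; it only encodes the stated cutoffs (read depth ≥10, genotype quality ≥30, allelic balance ≥20%, minor allele frequency ≤1%). The `VariantCall` record and function names are hypothetical, and two assumptions are made: a variant absent from a database is treated as rare, and the frequency cutoff must hold in every database in which the variant appears.

```python
from dataclasses import dataclass, field
from typing import Dict

MIN_READ_DEPTH = 10
MIN_GENOTYPE_QUALITY = 30
MIN_ALLELIC_BALANCE = 0.20      # fraction of reads supporting the alt allele
MAX_MINOR_ALLELE_FREQ = 0.01    # 1%


@dataclass
class VariantCall:
    """Hypothetical record for one called variant in one sample."""
    read_depth: int
    genotype_quality: int
    allelic_balance: float
    # Database name -> observed minor allele frequency; a missing
    # database means the variant was not observed there (assumption).
    allele_freqs: Dict[str, float] = field(default_factory=dict)


def passes_quality(call: VariantCall) -> bool:
    """Apply the three call-quality thresholds."""
    return (call.read_depth >= MIN_READ_DEPTH
            and call.genotype_quality >= MIN_GENOTYPE_QUALITY
            and call.allelic_balance >= MIN_ALLELIC_BALANCE)


def is_rare(call: VariantCall) -> bool:
    """True if the frequency cutoff holds in every listed database."""
    return all(freq <= MAX_MINOR_ALLELE_FREQ
               for freq in call.allele_freqs.values())


def is_candidate(call: VariantCall) -> bool:
    """A call is kept only if it is both well supported and rare."""
    return passes_quality(call) and is_rare(call)


good = VariantCall(25, 60, 0.48, {"ExAC": 0.0002, "CSC": 0.004})
common_in_founders = VariantCall(40, 99, 0.51, {"ExAC": 0.0001, "CSC": 0.03})
print(is_candidate(good))                # True
print(is_candidate(common_in_founders))  # False
```

The second example illustrates why a population-specific database matters: an allele that is rare in public resources can still exceed the cutoff within a founder population.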
The annotation process incorporated in silico predictions of functional effect (e.g., LRT, Polyphen2, SIFT, CADD, MutationTaster) and conservation scores based on multispecies alignment (GERP, PhyloP, PhastCons). Primary analyses were performed using RGC’s trio-based pipeline and further vetted through segregation analyses among available affected and unaffected family members. In the large majority of cases, we succeeded in generating WES data for all members (affected and unaffected) of the proband’s nuclear family and, when indicated, more distantly related individuals germane to the analysis. Classification of pathogenicity for candidate exome variants was based upon ACMG guidelines.30 Informative case results were restricted to “pathogenic” and “likely pathogenic” variants as judged by these criteria, whereas variants of unknown significance were deemed “open” cases. Prior to reporting, all copy-number and allelic variants were validated in CSC’s Clinical Laboratory Improvement Amendments–certified molecular laboratory.12, 13
Study population and testing indications
The most common indications for genomic testing (Figure 1a) were central nervous system disease (64%), auditory or visual impairment (7%), neuromuscular weakness (6%), growth delay (5%), hepatopathy (4%), and skeletal dysplasia (4%). Among 52 probands with neurological disease, 85% had developmental delay characterized by diverse and overlapping phenotypes such as global developmental delay/intellectual disability (73%), motor disability with or without hypotonia (60%), executive dysfunction (44%), epilepsy (44%), autism (27%), extrapyramidal movement disorders (17%), and affective illness (15%). Nearly half of children who presented with developmental disability had abnormal brain size and/or morphology (microcephaly, 23%; macrocephaly, 12%; and/or cortical malformation, 13%). Prior to high-density CMA and WES, 61% of probands had between one and six (average two) uninformative targeted molecular tests and several were subjects of unsuccessful low-density autozygosity mapping.32
A pathogenic abnormality was identified by high-density CMA in 7 (9%) of 79 cases, including split-hand/split foot malformation with long bone deficiency-3 (MIM 246560), latent hereditary neuropathy with liability to pressure palsies (PMP22 deletion; MIM 162500), novel pathogenic CNVs in syndromic developmental delay accompanied by congenital heart disease, and atypical presentations of Angelman (MIM 105830) and Turner syndrome (Table 1, Figure 1b). One three-year-old boy (Proband 4) who presented with the classic cortical dysplasia-focal epilepsy syndrome (CDFES, MIM 610042) inherited one copy of the common Amish CNTNAP2 variant (c.3709delG) through the maternal line and a second pathogenic 37,556 bp deletion of CNTNAP2 (c.403_550del) from his Mennonite father.
Seventy-two probands advanced to “family” WES for phenotypes unique to the individual (n = 62), found among more than one sibling (n = 6), or segregating within a larger pedigree (n = 4) (Table 2). Family WES data were informative for 37 (51%) of 72 remaining individuals, yielding a definitive genetic diagnosis (n = 32, 44%) or suggesting a novel molecular etiology (n = 5, 7%) (Figure 2b). The diagnostic yield of WES was highest (71%) for the 14 probands who shared a phenotype with one or more related individuals in an apparently recessive, dominant, or X-linked segregation pattern. For 5 (7%) additional subjects with an ambiguous clinical phenotype (e.g., varicella encephalitis, transient hypercholanemia, transient glycogen hepatopathy, borderline QTc prolongation, extensive dental caries), negative WES results markedly reduced the likelihood of a genetic disease mechanism. We performed an average of 7 (range 3–17) exomes per proband (502 exomes for the cohort). When compared to trio analysis (proband and parents only), this inclusive strategy narrowed filtered candidate variants more than fourfold, from 22 ± 6 to 5 ± 3 alleles per proband (Figure 2a).
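The reason additional relatives narrow the candidate list can be illustrated with a toy segregation check. The sketch below assumes a fully penetrant autosomal recessive model and uses invented variant names and genotypes; it is not the analysis pipeline used in the study. Genotypes are coded as the number of alternate alleles (0, 1, or 2).

```python
def fits_recessive(genotypes, affected, parents):
    """Check one variant against a fully penetrant recessive model.

    genotypes: dict mapping family member -> alt-allele count (0, 1, 2).
    affected:  set of affected members.
    parents:   set of parents of the affected members (obligate carriers).
    """
    for member, alt_count in genotypes.items():
        if member in affected:
            if alt_count != 2:
                return False      # affected members must be homozygous
        elif alt_count == 2:
            return False          # an unaffected relative is homozygous
        elif member in parents and alt_count != 1:
            return False          # parents must be heterozygous carriers
    return True


def surviving_candidates(variants, members, affected, parents):
    """Return variants consistent with the model among sequenced members."""
    kept = []
    for name, genotypes in variants.items():
        sequenced = {m: genotypes[m] for m in members}
        if fits_recessive(sequenced, affected, parents):
            kept.append(name)
    return kept


# Invented example: three homozygous variants in the proband.
variants = {
    "var_A": {"proband": 2, "mother": 1, "father": 1, "sib1": 1, "sib2": 0},
    "var_B": {"proband": 2, "mother": 1, "father": 1, "sib1": 2, "sib2": 1},
    "var_C": {"proband": 2, "mother": 1, "father": 1, "sib1": 0, "sib2": 2},
}
affected = {"proband"}
parents = {"mother", "father"}
trio = ["proband", "mother", "father"]
family = trio + ["sib1", "sib2"]

print(surviving_candidates(variants, trio, affected, parents))
# ['var_A', 'var_B', 'var_C']
print(surviving_candidates(variants, family, affected, parents))
# ['var_A']
```

In this toy pedigree, all three variants pass the trio check, but two are excluded once unaffected siblings who are homozygous for them are included.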
We identified cases of two Mendelian syndromes segregating in the same proband to produce a complex phenotype, consistent with recent reports of multilocus genomic variation.33 Proband 20 had de novo pathogenic variants in two genes (SHANK3 and TCF20) underlying a presentation of autism spectrum disorder, intellectual disability, and bipolar illness. A sibling pair (Proband 24) with skeletal dysplasia, scoliosis, and clubfoot shared homozygous pathogenic variants in two genes—SLC26A2 (diastrophic dysplasia, MIM 222600) and SH3TC2 (Charcot-Marie-Tooth type 4 C, MIM 601596)—segregating on the same haplotype (Figure 2c). Although diastrophic dysplasia dominated the clinical presentation, nerve conduction velocities subsequently revealed a motor neuropathy characteristic of CMT4C.
Novel, as yet provisional, gene-disease associations listed in Table 2 (Probands 40–44) include four autosomal recessive (CHD1, JKAMP, NIN, NUP188) and one de novo dominant (BMP2) phenotypes. Each allele in Table 2 represents the only compelling variant(s) to pass all filtering criteria and segregate appropriately within the family. However, each is classified as “uncertain significance” according to ACMG criteria,30 largely because such criteria do not accommodate novel gene discoveries or phenotypes that diverge from published reports (Table 2, Figure 3). Pathogenicity of BMP2 was first suspected by matching the “Amish” phenotype to unrelated non-Plain probands (https://genematcher.org) and corroborated by rephenotyping of all affected subjects (Figure 1b).
Our analyses were nondiagnostic in 30 (38%) cases, 18 of which were characterized by a short list of candidate alleles (in mostly uncharacterized genes) that could not be narrowed to a specific variant. For a number of such cases, bioinformatic analyses in conjunction with expression and literature investigations implicated a single candidate allele as pathogenic, but an association could not be firmly established without further evidence, such as in vitro functional data, animal models, or additional patients (Figure 1b).
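As a consistency check, the outcome counts reported in this section can be tallied against the cohort size. The short sketch below uses only figures stated in the text; the variable names are illustrative.

```python
cohort = 79                      # all probands
cma_diagnosed = 7                # pathogenic CNV found by CMA
wes_probands = cohort - cma_diagnosed

wes_definitive = 32              # specific genetic diagnosis
wes_novel = 5                    # provisional novel gene-disease association
wes_informative = wes_definitive + wes_novel
wes_negative_reassuring = 5      # negative WES lowered genetic likelihood
nondiagnostic = 30               # no variant could be assigned


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# The three WES outcome groups account for every proband who had WES.
assert wes_informative + wes_negative_reassuring + nondiagnostic == wes_probands

print(wes_probands)                                # 72
print(pct(cma_diagnosed, cohort))                  # 9
print(pct(wes_informative, wes_probands))          # 51
print(pct(wes_definitive, wes_probands))           # 44
print(pct(wes_negative_reassuring, wes_probands))  # 7
print(pct(nondiagnostic, cohort))                  # 38
```

Note that the reported percentages use two denominators: the CMA and nondiagnostic rates are fractions of all 79 probands, whereas the WES rates are fractions of the 72 probands who advanced to WES.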
We identified 37 pathogenic or likely pathogenic exome variants among 32 probands represented in Table 2 (Supplementary Table S1). Twenty-seven (84%) of these individuals presented with primary neurological disease, most commonly symptomatic epilepsy (n = 7), intellectual disability (n = 7), or syndromic global developmental delay (n = 6). Half the variants were missense changes, 27% were insertions or deletions leading to frameshift variants, 13% were nonsense, and 13% affected canonical splice sites. Inheritance was de novo dominant in 16 (50%) cases, autosomal recessive in 12 (38%; 9 homozygous, 3 compound heterozygous) cases, and X-linked recessive in 1 case. Dominant inheritance was observed for two probands within large multigenerational families segregating nonlesional generalized epilepsy; in one such pedigree, seizures were attributable to three variants in two different genes: SCN1B (MIM 604233) and NPRL3 (MIM 617118) (Figure 2e). We identified one putative case of germ-line mosaicism in which two siblings with Rubinstein–Taybi syndrome (MIM 180849) carried a variant of CREBBP that was not present in peripheral blood DNA of either biological parent.
Among 502 subjects included for WES analysis, 490 (98%) elected to receive secondary ACMG findings. Twenty-one (4.2%) subjects harbored one of four known or likely pathogenic variants in three genes: BRCA2 (c.5073dupA and c.7378_7379delAA), APOB (c.10580 G > A), and DSC2 (c.1580_1583delTCAA); all opted to receive these results.
Those who stand to benefit most from genetic testing often have complex medical needs and experience their health care as expensive, fragmented, and confusing. As a corollary, referrals for WES are commonly rejected by insurance carriers2, 21 and authorized samples are sometimes linked to incomplete or unreliable clinical data.1, 3 Such prosaic problems reinforce healthcare disparities and also reduce diagnostic efficacy. In one study of 814 consecutive probands, WES had a diagnostic yield of 26%, but provided potential diagnoses for 228 (28%) additional probands. For the latter, promising variants were assigned “uncertain significance” pending further segregation (50%), phenotyping (25%), or CNV analyses (25%).
Integration of molecular methods into medical practice not only engenders better clinical outcomes but also improves laboratory performance.13 Nonprofit-industry collaboration allowed us to apply this principle to deep sequencing by incorporating a sophisticated genomic testing pipeline, optimized for performance, into the clinical workflow (Supplementary Figure S1). Our overall WES diagnostic yield across diverse phenotypes was 44–51%, approaching the theoretical yield (~50%) proposed for larger outbred cohorts2 and the observed (45–49%) yield among carefully selected children with neurological disease.34 Embedding this service in a community-based practice with clinical laboratory capability12 allowed us to fully interrogate the genomic data, rephenotype patients as needed, validate WES variants on-site, directly report clinically actionable results, and apply new molecular findings to population-based health initiatives.13
To test the broader applicability of this strategy, we took steps to attenuate inflated yield (i.e., >70%) when WES is used as a first-tier diagnostic test for multiplex, consanguineous families.35 Within our Plain patients, “founder” alleles enriched by genetic drift manifest as more than 150 autosomal recessive and 25 autosomal dominant disorders.12, 22 By recognizing and testing for these variants, we provide molecular diagnoses for more than 40% of probands after one office encounter, obviating their need for CMA or WES (Figure 1a).13 To further limit representation of homozygous recessive genotypes within the cohort, we selected study subjects who had unique clinical phenotypes, in many cases with uninformative homozygosity mapping results. These pre-WES procedures largely abrogated the impact of founder variants, as half the pathogenic alleles we discovered were de novo (Table 2), approximating what one expects to find in an outbred cohort.1, 2
In complex clinical contexts, WES data clarify the relative contribution of genetic versus environmental factors and can deliver unexpected results. In some cases, WES data reveal digenic or multigenic interactions (e.g., Table 2, Probands 20 and 24) and in others, are informative only when combined with CMA results (e.g., Table 1, Proband 4). Five probands underscore the phenotypic overlap that often exists between genetic (e.g., neonatal rigidity and multifocal seizure syndrome; MIM 614498)32 and nongenetic (e.g., congenital viral encephalitis) afflictions. In such cases, “negative” WES data reduce the likelihood of a genetic disease mechanism and can critically inform clinical management, whereas “positive” WES data can challenge tacit assumptions about environmental pathogenesis, as we found in one sibship (Proband 16, Figure 2f) affected by both maternal phenylketonuria and SYNGAP1 haploinsufficiency (MIM 612621).
Full genetic ascertainment can reveal surprising complexity at the root of seemingly simple diagnostic problems. Such was the case for a nonlesional epilepsy phenotype segregating through a 38-member Mennonite pedigree (Figure 2e), in which we expected to find a single dominant risk allele. Instead, we identified three different pathogenic epilepsy variants in two epilepsy-associated genes (SCN1B and NPRL3). The relatively low observed penetrance (40–45% as compared to an expected value of ~70%) is noteworthy, but true penetrance may prove higher if currently asymptomatic individuals develop new seizure onsets over time or systemic electroencephalography (not done) reveals epileptiform cortical signatures in otherwise asymptomatic individuals. Such cases provide a potentially informative platform for discovering loci that modify disease expression, providing a fruitful area for future study.
A financial calculation invariably weighs on the use of new technologies and, without better value accounting, constrains the use of WES in clinical practice. In a recent study of 2,000 probands,3 WES was performed at the discretion of the referring physician unless denied by an insurance carrier. Pre-authorization for WES is required by more than 80% of US insurance carriers, who may ultimately fail to reimburse as many as 50% of completed studies.21 As with other measures of health care, this “reimbursement wall” stands as a principal determinant of disparate access to genetic testing.
This study was enabled by a nonprofit–industry collaboration that posed opportunities as well as challenges (Supplementary Figure S1). The final decision to enter into partnership was reached after careful negotiations to ensure CSC’s clinical and operational autonomy, shared ownership of data, stringent protection of patient privacy, and unanimous acceptance by the CSC’s nonprofit Board of Directors, most of whom are leaders within Old Order communities. Adult members of the Plain community tend to be entrepreneurial and exceptionally pragmatic, and generally embrace creative forms of collaboration that allow their people to flourish.13 The overall success of the partnership has engendered strong ongoing community support for collaboration, which should enable us prospectively to perform WES on each proband for whom it is indicated.
Growing evidence supports the economy of this approach. Within the US healthcare system, standard evaluation of a child with neurodevelopmental disability costs an average of US$19,000 (range $9,000 to $35,000)9, 13, 36 for testing, not including professional fees or other indirect institutional expenses.9, 36 This approach, which does not encompass WES,7 provides a genetic diagnosis in about one third of cases. By comparison, first-tier WES for children with neurodevelopmental disorders yields a molecular diagnostic rate of 40–60%5, 9, 34 for an average $1,920 (range $1,170 to $3,150) per exome trio (based on 34 reporting labs at http://www.scienceexchange.com). Using this information to calculate a simple metric of value (i.e., favorable outcomes per dollar spent),37 we assign a theoretical genomic evaluation cost of $4,000 per study subject (to comprise costs of targeted allele detection, CMA, and 0–4 additional exomes per proband; Figure 1a) to return actionable information in at least 50% of cases. This strategy yields one molecular diagnosis per $8,000 spent, compared to approximately one diagnosis per $60,000 via the standard approach.
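The value comparison above reduces to dividing the cost per subject by the diagnostic yield. The following sketch reproduces that arithmetic from the figures in the text; the function name is illustrative, and "about one third" is taken as exactly 1/3 for the purpose of the calculation.

```python
def cost_per_diagnosis(cost_per_subject, diagnostic_yield):
    """Expected spending per diagnosis, given cost and yield per subject."""
    if not 0 < diagnostic_yield <= 1:
        raise ValueError("diagnostic_yield must be in the interval (0, 1]")
    return cost_per_subject / diagnostic_yield


# Standard evaluation: US$19,000 per child, diagnosis in about one third.
standard = cost_per_diagnosis(19_000, 1 / 3)

# Genomic strategy: theoretical $4,000 per subject, at least 50% yield.
genomic = cost_per_diagnosis(4_000, 0.50)

print(round(standard))            # 57000
print(round(genomic))             # 8000
print(round(standard / genomic))  # 7
```

Under these inputs, the standard approach costs about $57,000 per diagnosis, close to the $60,000 quoted, and roughly seven times the cost per diagnosis of the genomic strategy. Because the 50% yield is a lower bound, $8,000 per diagnosis is an upper bound for the genomic strategy under the stated cost assumption.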
The implication is clear: for select patients, a diagnostic method that prioritizes CMA and WES can be efficient and cost-effective in a variety of clinical contexts, provided cases are chosen carefully and executed systematically. Embedding this service within community-based practice further improves its value and aligns well with the World Health Organization’s call to implement genetics in underserved settings.17, 38 We returned actionable secondary results to 21 subjects and, by designing rapid molecular tests for 17 (46%) alleles discovered by WES,13 created new opportunities for screening and prevention (Figure 3). We conclude that emerging genomic technologies, judiciously applied, can empower communities to curtail wasteful medical spending and improve population health.
This work was supported in part by charitable contributions from Old Order Amish and Mennonite Communities of Pennsylvania and surrounding states. CMA analysis at CSC and functional studies performed by R.N.J. were supported in part by a grant to Franklin & Marshall College from the Howard Hughes Medical Institute through the Precollege and Undergraduate Science Education Program. The authors thank D. Holmes Morton, Zineb Ammous, Olivia Wenger, and James Deline for contributions to proband phenotyping and sample collection.
Supplementary material is linked to the online version of the paper at http://www.nature.com/gim