Public health newborn screening (NBS) programs provide population-scale ascertainment of rare, treatable conditions that require urgent intervention. Tandem mass spectrometry (MS/MS) is currently used to screen newborns for a panel of rare inborn errors of metabolism (IEMs)1,2,3,4. The NBSeq project evaluated whole-exome sequencing (WES) as an innovative methodology for NBS. We obtained archived residual dried blood spots and data for nearly all IEM cases from the 4.5 million infants born in California between mid-2005 and 2013 and from some infants who screened positive by MS/MS but were unaffected upon follow-up testing. WES had an overall sensitivity of 88% and specificity of 98.4%, compared to 99.0% and 99.8%, respectively, for MS/MS, although effectiveness varied among individual IEMs. Thus, WES alone was insufficiently sensitive or specific to be a primary screen for most NBS IEMs. However, as a secondary test for infants with abnormal MS/MS screens, WES could reduce false-positive results, facilitate timely case resolution and in some instances even suggest a more appropriate or specific diagnosis than that initially obtained. This study represents the largest sequencing effort to date of an entire population of IEM-affected cases, allowing unbiased assessment of current capabilities of WES as a tool for population screening.
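For readers less familiar with the screening metrics quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from case counts. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the study's case counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of affected individuals that the screen flags as positive."""
    total_affected = true_pos + false_neg
    if total_affected == 0:
        raise ValueError("no affected individuals to evaluate")
    return true_pos / total_affected


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected individuals that the screen reports as negative."""
    total_unaffected = true_neg + false_pos
    if total_unaffected == 0:
        raise ValueError("no unaffected individuals to evaluate")
    return true_neg / total_unaffected


# Hypothetical counts, for illustration only (not taken from the study).
example_sensitivity = sensitivity(true_pos=88, false_neg=12)
example_specificity = specificity(true_neg=984, false_pos=16)
```

With these made-up counts, a screen that detects 88 of 100 affected infants has a sensitivity of 0.88, and one that clears 984 of 1,000 unaffected infants has a specificity of 0.984.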
The de-identified residual DBS from the California Biobank for this project (SIS request number 496) were obtained with a waiver of consent from the Committee for the Protection of Human Subjects of the State of California, under project no. 14-07-1650 and in compliance with the CDPH Biospecimen/Data Use and Confidentiality Agreement. California blood specimens and any data derived from the newborn screening program are confidential and subject to strict administrative, physical and technical protections. California law precludes any researcher from sharing blood specimens or uploading individual data derived from these blood specimens into any genomic data repository. Researchers desiring access to these data would need to make a separate application to the CDPH. Data in Fig. 2b,c and Extended Data Figs. 6, 9 and 10 can be found in Supplementary Table 3.
Variant calling and annotation for the exome sequences were performed using previously published methods as described above. The code used for the screening analysis of exome data and subsequent assessments is deposited in GitHub (https://github.com/nbseq1200/NBSeq1200paper).
Hall, P. L. et al. Postanalytical tools improve performance of newborn screening by tandem mass spectrometry. Genet. Med. 16, 889–895 (2014).
Mak, C. M., Lee, H. C., Chan, A. Y. & Lam, C. W. Inborn errors of metabolism and expanded newborn screening: review and update. Crit. Rev. Clin. Lab. Sci. 50, 142–162 (2013).
McHugh, D. et al. Clinical validation of cutoff target ranges in newborn screening of metabolic disorders by tandem mass spectrometry: a worldwide collaborative project. Genet. Med. 13, 230–254 (2011).
Wilcken, B., Wiley, V., Hammond, J. & Carpenter, K. Screening newborns for inborn errors of metabolism by tandem mass spectrometry. N. Engl. J. Med. 348, 2304–2312 (2003).
Tang, H. et al. Damaged goods?: an empirical cohort study of blood specimens collected 12 to 23 hours after birth in newborn screening in California. Genet. Med. 18, 259–264 (2016).
Adams, D. R. & Eng, C. M. Next-generation sequencing to diagnose suspected genetic disorders. N. Engl. J. Med. 379, 1353–1362 (2018).
Biesecker, L. G. & Green, R. C. Diagnostic clinical genome and exome sequencing. N. Engl. J. Med. 371, 1170 (2014).
Farnaes, L. et al. Rapid whole-genome sequencing decreases infant morbidity and cost of hospitalization. NPJ Genom. Med. 3, 10 (2018).
French, C. E. et al. Whole genome sequencing reveals that genetic conditions are frequent in intensively ill children. Intensive Care Med. 45, 627–636 (2019).
Friedman, J. M. et al. Genome-wide sequencing in acutely ill infants: genomic medicine’s critical application? Genet. Med. 21, 498–504 (2018).
Berg, J. S. et al. Newborn sequencing in genomic medicine and public health. Pediatrics. 139, e20162252 (2017).
Regalado, A. in Technology Review (2017).
Hoffmann, G. F. in Inherited Metabolic Diseases: A Clinical Approach (eds Hoffmann, G. F., Zschocke, J. & Nyhan, W. L.) 31–32 (Springer Berlin Heidelberg, 2017).
Bassaganyas, L. et al. Whole exome and whole genome sequencing with dried blood spot DNA without whole genome amplification. Hum. Mutat. 39, 167–171 (2018).
Biesecker, L. G. Secondary findings in exome slices, virtual panels, and anticipatory sequencing. Genet. Med. 21, 41–43 (2019).
Stenson, P. D. et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Feuchtbaum, L., Yang, J. & Currier, R. Follow-up status during the first 5 years of life for metabolic disorders on the federal recommended uniform screening panel. Genet. Med. 20, 831–839 (2018).
Feuchtbaum, L., Carter, J., Dowray, S., Currier, R. J. & Lorey, F. Birth prevalence of disorders detectable through newborn screening by race/ethnicity. Genet. Med. 14, 937–945 (2012).
Consortium, G. P. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Matern, D. et al. Prospective diagnosis of 2-methylbutyryl-CoA dehydrogenase deficiency in the Hmong population by newborn screening using tandem mass spectrometry. Pediatrics. 112, 74–78 (2003).
Tiranti, V. et al. Ethylmalonic encephalopathy is caused by mutations in ETHE1, a gene encoding a mitochondrial matrix protein. Am. J. Hum. Genet. 74, 239–252 (2004).
Henriques, B. J. et al. Ethylmalonic encephalopathy ETHE1 R163W/R163Q mutations alter protein stability and redox properties of the iron centre. PLoS ONE 9, e107157 (2014).
Wang, Z. Q., Chen, X. J., Murong, S. X., Wang, N. & Wu, Z. Y. Molecular analysis of 51 unrelated pedigrees with late-onset multiple acyl-CoA dehydrogenation deficiency (MADD) in southern China confirmed the most common ETFDH mutation and high carrier frequency of c.250G>A. J. Mol. Med. (Berl.) 89, 569–576 (2011).
Goldfeder, R. L. et al. Medical implications of technical accuracy in genome sequencing. Genome Med. 8, 24 (2016).
Sulonen, A. M. et al. Comparison of solution-based exome capture methods for next generation sequencing. Genome Biol. 12, R94 (2011).
Peng, G. et al. Combining newborn metabolic and DNA analysis for second-tier testing of methylmalonic acidemia. Genet. Med. 21, 896–903 (2019).
Vockley, J., Rinaldo, P., Bennett, M. J., Matern, D. & Vladutiu, G. D. Synergistic heterozygosity: disease resulting from multiple partial defects in one or more metabolic pathways. Mol. Genet. Metab. 71, 10–18 (2000).
Batshaw, M. L., Msall, M., Beaudet, A. L. & Trojak, J. Risk of serious illness in heterozygotes for ornithine transcarbamylase deficiency. J. Pediatr. 108, 236–241 (1986).
Bodian, D. L. et al. Utility of whole-genome sequencing for detection of newborn screening disorders in a population cohort of 1,696 neonates. Genet. Med. 18, 221–230 (2016).
Clark, M. M. et al. Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Sci. Transl. Med. 11, eaat6177 (2019).
Kingsmore, S. F. et al. A randomized, controlled trial of the analytic and diagnostic performance of singleton and trio, rapid genome and exome sequencing in Ill infants. Am. J. Hum. Genet. 105, 719–733 (2019).
Calonge, N. et al. Committee report: method for evaluating conditions nominated for population-based screening of newborns and children. Genet. Med. 12, 153–159 (2010).
Rodriguez, J. M. et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 41, D110–D117 (2013).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Jian, X., Boerwinkle, E. & Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 42, 13534–13544 (2014).
Fromer, M. et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am. J. Hum. Genet. 91, 597–607 (2012).
Chamberlin, M. E., Ubagai, T., Mudd, S. H., Levy, H. L. & Chou, J. Y. Dominant inheritance of isolated hypermethioninemia is associated with a mutation in the human methionine adenosyltransferase 1A gene. Am. J. Hum. Genet. 60, 540–546 (1997).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Van der Auwera, G. A. et al. From FASTQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Punwani, D. et al. Multisystem anomalies in severe combined immunodeficiency with mutant BCL11B. N. Engl. J. Med. 375, 2165–2176 (2016).
Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22, 1760–1774 (2012).
Tabor, H. K. et al. Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results. Am. J. Hum. Genet. 95, 183–193 (2014).
Jian, X. & Liu, X. In silico prediction of deleteriousness for nonsynonymous and splice-altering single nucleotide variants in the human genome. Methods Mol. Biol. 1498, 191–197 (2017).
Sunderam, U. et al. DNA from dried blood spots yields high quality sequences for exome analysis. Preprint at bioRxiv https://doi.org/10.1101/2020.05.19.105304 (2020).
Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American college of medical genetics and genomics and the association for molecular pathology. Genet. Med. 17, 405–424 (2015).
Wang, Y. et al. Perturbation robustness analyses reveal important parameters in variant interpretation pipelines. Preprint at bioRxiv https://doi.org/10.1101/2020.06.29.173815 (2020).
Yorifuji, T. et al. X-inactivation pattern in the liver of a manifesting female with ornithine transcarbamylase (OTC) deficiency. Clin. Genet. 54, 349–353 (1998).
Hu, J. et al. Association of CPT II gene with risk of acute encephalitis in Chinese children. Pediatr. Infect. Dis. J. 33, 1077–1082 (2014).
Bell, C. J. et al. Carrier testing for severe childhood recessive diseases by next-generation sequencing. Sci. Transl. Med. 3, 65ra64 (2011).
Bergeron, A., D’Astous, M., Timm, D. E. & Tanguay, R. M. Structural and functional analysis of missense mutations in fumarylacetoacetate hydrolase, the gene deficient in hereditary tyrosinemia type 1. J. Biol. Chem. 276, 15225–15231 (2001).
Gallant, N. M. et al. Biochemical, molecular, and clinical characteristics of children with short chain acyl-CoA dehydrogenase deficiency detected by newborn screening in California. Mol. Genet. Metab. 106, 55–61 (2012).
Jethva, R., Bennett, M. J. & Vockley, J. Short-chain acyl-coenzyme A dehydrogenase deficiency. Mol. Genet. Metab. 95, 195–200 (2008).
Wolfe, L. et al. Short-chain acyl-CoA dehydrogenase deficiency. GeneReviews https://www.ncbi.nlm.nih.gov/books/NBK63582/ (2018).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Corvelo, A., Hallegger, M., Smith, C. W. & Eyras, E. Genome-wide association between branch point properties and alternative splicing. PLoS Comput. Biol. 6, e1001016 (2010).
Sterne-Weiler, T., Howard, J., Mort, M., Cooper, D. N. & Sanford, J. R. Loss of exon identity is a common mechanism of human inherited disease. Genome Res. 21, 1563–1571 (2011).
The authors are grateful for expert technical and computational assistance from many diligent contributors, including W. Chan, J.-M. Chandonia, A. Chellappan, N. Dabbiru, B. Dispensa, A. Neumann, A. Nguyen, A. Rao, S. Rana and Z.-Y. Wu. The work was funded by the National Institutes of Health grant U19HD077627 as part of the NSIGHT project, a joint program between the National Human Genome Research Institute and the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health. This work was also supported by a research agreement with Tata Consultancy Services. The biospecimens and/or data used in this study were obtained from the California Biobank Program (SIS request no. 496). The CDPH is not responsible for the results or conclusions drawn by the authors of this publication.
A.A. is currently an employee of Illumina, Inc. K.K. was an employee of Tata Consultancy Services (TCS); U.S. and R.S. are employees of TCS. Y.Z. is currently an employee of Yikon Genomics Co., Ltd. R.N. is an employee of Invitae. J.P. is the spouse of R. Nussbaum, an employee of Invitae. S.E.B. receives support at the University of California Berkeley through a research agreement from TCS.
Peer review information Kate Gao was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
a, Percentage of reads unmapped to the reference genome. b, Percentage of high quality read pairs (MQ > 20), without duplicates and properly paired. c, Percentage of duplicates in the reads across three sequencing batches. d-e, Number of reads and high quality reads plotted batchwise. f, Inferred insert sizes plotted batchwise. g, Median coverage across the Nimblegen capture region plotted batchwise. h, Median coverage across the 78 genes region plotted batchwise. i, Median fraction of capture covered at coverage depths of 1x to 30x plotted batchwise. j, Median fraction of the 78 genes region covered at coverage depths of 1x to 30x plotted batchwise. In panels a-f and i-j, individual sample values are plotted, and adjacent box plots display the median (red) and interquartile ranges for the dataset; whiskers extend to the last data point within 1.5 times the interquartile range. The sample sizes for the box plots in a-h were: batch1 (n = 180), batch2 (n = 292), batch3 (n = 744). Violin plots superimposed on the box plots show the data density and mean value (blue).
a, b, Fraction of reads with 0 (green), 1 (yellow), 2 (orange), and ≥3 (red) mismatches with the reference genome considering (a) all bases of the reads and (b) the first 100 bases of the reads. Batches 1 and 2 had read lengths of 101 bases and batch 3 had a read length of 151 bases. All three batches had similar mismatch rates when only the first 100 bases were considered. c, Nucleotide mismatches by base change (NMBC) in the 1,216 samples plotted batchwise. d, Frequencies of all single nucleotide changes by base type in high quality SNVs in the 1,216 samples plotted batchwise. High quality SNVs from the VCF calls are defined as those marked PASS by the GATK VQSR algorithm and with GQ ≥ 30. In both c and d, box plots display the median and interquartile ranges for the dataset; whiskers extend to the last data point within 1.5 times the interquartile range and outliers beyond this are marked with circles. The sample sizes for the box plots were batch1 (n = 180), batch2 (n = 292), batch3 (n = 744).
a, Confident sites across capture (from the GVCF file). b, Confident sites across 78 genes (from the GVCF file). c, Common high quality SNVs. d, Rare high quality SNVs. e, Common high quality indels. f, Rare high quality indels. g, Transition/Transversion ratios for high quality common SNVs. h, Transition/Transversion ratios for high quality rare SNVs. High quality variants are those marked as PASS by GATK VQSR and with GQ ≥ 30. Common variants have a frequency greater than 0.001 in the 1000 Genomes Project phase 3 database and rare variants have a frequency less than 0.001 in the database. Individual sample values are plotted and adjacent box plots display the median (red) and interquartile ranges for the dataset; whiskers extend to the last data point within 1.5 times the interquartile range. Violin plots superimposed on the box plots show the data density and mean value (blue). The sample sizes for the box plots were batch1 (n = 180), batch2 (n = 292), batch3 (n = 744).
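The quality and frequency definitions in this legend can be sketched as two simple predicates. This is an illustrative reading of the legend, not the authors' pipeline code: the `Variant` class and its field names are hypothetical, and because the legend does not say how a frequency of exactly 0.001 is handled, the sketch labels that case "unclassified" as an explicit assumption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal variant record; field names are illustrative."""
    filter_status: str      # e.g. "PASS" as assigned by GATK VQSR
    genotype_quality: int   # the GQ value of the call
    population_af: float    # allele frequency in 1000 Genomes phase 3


def is_high_quality(variant: Variant) -> bool:
    """High quality as stated in the legend: PASS and GQ >= 30."""
    return variant.filter_status == "PASS" and variant.genotype_quality >= 30


def frequency_class(variant: Variant) -> str:
    """Common if frequency > 0.001, rare if frequency < 0.001."""
    if variant.population_af > 0.001:
        return "common"
    if variant.population_af < 0.001:
        return "rare"
    # Assumption: the legend does not specify the boundary value itself.
    return "unclassified"


example = Variant(filter_status="PASS", genotype_quality=45, population_af=0.0002)
example_label = (is_high_quality(example), frequency_class(example))
```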
Extended Data Fig. 4 Example showing variability of gene coverage in two IEM genes in the study across 1,216 samples.
MCCC2, top, has poor coverage in the first exon across all samples. In contrast, ACADM, bottom, has good coverage across the gene. The blue vertical lines indicate positions with known pathogenic variants in HGMD and ClinVar. The plots show the log10 of the median, 20th percentile and minimum coverage for each coding exon across all samples for a given sample set. Dark grey: median coverage; medium grey: 20th percentile coverage; light grey: minimum coverage for each position. Coverage quality of each exon is indicated by colored blocks beneath the coverage plot. Red: greater than 15% of the exon has less than 10x median coverage; green: 95% of the exon has minimum 20x coverage. UTRs that are part of the coding exons have a smaller indicator thickness. Regions of the exon that overlap with the capture array are indicated in blue just below the coverage plot. Exon scale in bases is shown in each plot.
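One way to read the red and green coverage flags in this legend is sketched below. This is an interpretation rather than the authors' implementation: the function name, the per-position input lists, and the "intermediate" label for exons that meet neither criterion are assumptions, as is treating "95% of the exon" as "at least 95% of positions".

```python
from typing import Sequence


def classify_exon(median_cov: Sequence[float], min_cov: Sequence[float]) -> str:
    """Flag an exon from per-position coverage summarized across samples.

    median_cov[i] and min_cov[i] are the median and minimum coverage at
    position i of the exon (hypothetical inputs for illustration).
    """
    n = len(median_cov)
    if n == 0 or len(min_cov) != n:
        raise ValueError("coverage lists must be non-empty and of equal length")

    # Fraction of positions whose median coverage is below 10x.
    low_fraction = sum(1 for c in median_cov if c < 10) / n
    # Fraction of positions whose minimum coverage is at least 20x.
    well_covered_fraction = sum(1 for c in min_cov if c >= 20) / n

    if low_fraction > 0.15:
        return "red"
    if well_covered_fraction >= 0.95:  # assumption: "95%" means at least 95%
        return "green"
    return "intermediate"  # assumption: the legend names only red and green


poor_exon = classify_exon([5, 5, 30, 30, 30, 30, 30, 30, 30, 30],
                          [0, 0, 25, 25, 25, 25, 25, 25, 25, 25])
good_exon = classify_exon([40] * 20, [22] * 20)
```

In this sketch the two flags cannot both apply, because a position with median coverage below 10x also has minimum coverage below 20x.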
Extended Data Fig. 5 Alternative pipelines derived from the final exome analysis pipeline to explore sensitivity-specificity tradeoffs.
We created several alternate pipelines, altering or truncating different parts of the final exome analysis pipeline to probe contributions to overall sensitivity and specificity from various components of the pipeline. For each pipeline, the overall sensitivity and specificity on the NBSeq test set are shown. a, Final exome analysis pipeline. b-i, Alternatives. b, Altering the final pipeline by considering every CNV call homozygous. c-e, Truncating the CNV arm, curation arm and predicted impact arm, respectively. f-g, Retaining the predicted impact arm or curation arm only, respectively. h, Retaining only the rare pathogenic HGMD and ClinVar databases. i, Allowing multiple gene calls for each sample if more than one gene is predicted.
Extended Data Fig. 6 Distribution of variants reported by the exome analysis pipeline in the NBSeq test set.
a, Number of different variant types reported by the pipeline in IEM-affected individuals in genes associated with their IEMs in the NBSeq test set (n = 674 individuals). b, Distribution of the types of variants responsible for the predictions of disease status in the 571 affected individuals correctly identified by the exome analysis pipeline.
Extended Data Fig. 7 Whole genome sequencing confirms potential IVD deletions in two individuals diagnosed with isovaleric acidemia initially missed in exome.
In two cases where we performed WGS upon follow-up of an exome false negative, we identified large deletions in the associated IVD gene. The WGS read alignments in the genomic region spanning IVD are shown on the right for the two cases. The first case had almost no coverage in the region spanning the first three exons of IVD. The second case had almost no coverage of exon 12 of IVD along with low coverage across the whole gene. The first case had 11 split reads spanning the deleted region, confirming the deletion of the first three exons.
Extended Data Fig. 8 Experimental splicing assay of a potentially pathogenic intronic variant in an exome false negative case.
a, In an individual affected with MCADD, the exome analysis pipeline reported only a single rare nonsynonymous variant. A second rare intronic variant 14 bases from the splice site (NM_000016.4:c.388-14A>G) was a suspected pathogenic modification of the branchpoint A nucleotide. b, Diagram of the heterologous HBB splicing reporter construct containing the wild type ACADM sequence or the c.388-14A>G variant. c, RT-PCR analysis of reporter transcripts from wild type or mutant (lanes 1 and 2, respectively) reporter plasmids expressed in HEK293T cells (amplicons resolved by 12% PAGE and stained with SYBR Gold). The two spliced products are shown to the right of the gel image. The experiments were performed three times independently with similar results. d, Chromatograms corresponding to the sequenced splice junctions between HBB exon 1 and the wild type or mutant ACADM exon 6 constructs (left and right panel, respectively). e, Open reading frame of aberrant ACADM mRNA containing a 13 nt extension of exon 6 (red), resulting in a premature termination codon (PTC, *). Top, DNA sense strand; middle, predicted polypeptide; bottom, DNA reverse complement.
Extended Data Fig. 9 Stratification of IEM-affected and MS/MS false positives by alleles reported by the exome analysis pipeline for estimation of the NPV of exome as a follow-up test after a positive MS/MS screen.
For six MS/MS screens (VLCADD, PKU, LCHADD/TFP, IVA, MSUD, and GA-II), IEM-affected and MS/MS false positive cases in the NBSeq test set are stratified by the number of alleles reported by the exome analysis pipeline in the genes associated with those screens.
Extended Data Fig. 10 Zygosity distribution of variants reported by the pipeline in relevant gene(s).
For each IEM, bars show the zygosity distribution of the variants in relevant genes reported by the exome pipeline for the 674 IEM-affected cases from the test set. The numbers of cases correctly identified by the pipeline are broken down into those that had homozygous variants in relevant gene(s) (dark blue) and those that had two heterozygous variants in relevant gene(s) (orange). The numbers of cases that failed to be identified by the pipeline are broken down into those that had one heterozygous variant in relevant gene(s) (light blue) and those that had no reported variants in the relevant gene(s) (dark red). Left, core IEMs screened by California; right, secondary/add-on IEMs. IEMs sharing a common causative gene were not distinguished by the exome predictions alone. These included TFP and LCHADD (blue shading), PKU and hyperphenylalaninemia (pink shading), and the various MMA subtypes (yellow shading).
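The four categories in this legend can be illustrated with a small stratification function. This is a simplified sketch under stated assumptions, not the published pipeline: the function names and the "hom"/"het" genotype labels are hypothetical, cases with more than two heterozygous variants are grouped with the two-heterozygous category, and phasing, X-linked inheritance and CNV handling are ignored.

```python
from typing import Sequence


def stratify_case(genotypes: Sequence[str]) -> str:
    """Assign a case to one of the four legend categories.

    genotypes holds one label per variant reported in the relevant gene(s):
    "hom" for homozygous, "het" for heterozygous (hypothetical encoding).
    """
    if "hom" in genotypes:
        return "homozygous"
    n_het = sum(1 for g in genotypes if g == "het")
    if n_het >= 2:  # assumption: more than two hets treated like two
        return "two_heterozygous"
    if n_het == 1:
        return "one_heterozygous"
    return "no_variants"


def identified_by_pipeline(category: str) -> bool:
    """Per the legend, only the first two categories count as identified."""
    return category in {"homozygous", "two_heterozygous"}


example_categories = [stratify_case(g) for g in (["hom"], ["het", "het"], ["het"], [])]
```

Under this reading, a case with a single reported heterozygous variant is counted among those the pipeline failed to identify.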
About this article
Cite this article
Adhikari, A.N., Gallagher, R.C., Wang, Y. et al. The role of exome sequencing in newborn screening for inborn errors of metabolism. Nat Med 26, 1392–1397 (2020). https://doi.org/10.1038/s41591-020-0966-5