A next-generation newborn screening pilot study: NGS on dried blood spots detects causal mutations in patients with inherited metabolic diseases

The range of applications performed on dried blood spots (DBS) widely broadened during the past decades to now include next-generation sequencing (NGS). Previous publications provided a general overview of NGS capacities on DBS-extracted DNA but did not focus on the identification of specific disorders. We thus aimed to demonstrate that NGS was reliable for detecting pathogenic mutations on genomic material extracted from DBS. Assuming the future implementation of NGS technologies into newborn screening (NBS), we conducted a pilot study on fifteen patients with inherited metabolic disorders. Blood was collected from DBS. Whole-exome sequencing was performed, and sequences were analyzed with a specific focus on genes related to NBS. Results were compared to the known pathogenic mutations previously identified by Sanger sequencing. Causal mutations were readily characterized, and multiple polymorphisms have been identified. According to variant database prediction, an unexplained homozygote pathogenic mutation, unrelated to patient’s disorder, was also found in one sample. While amount and quality of DBS-extracted DNA are adequate to identify causal mutations by NGS, bioinformatics analysis revealed critical drawbacks: coverage fluctuations between regions, difficulties in identifying insertions/deletions, and inconsistent reliability of database-referenced variants. Nevertheless, results of this study lead us to consider future perspectives regarding “next-generation” NBS.

Considering the growing interest in DBS testing, it was worth evaluating whether whole-exome sequencing of such material could detect specific inborn errors of metabolism (IEM) identified by biochemical methods and/ or Sanger sequencing. Previous generic publications already reported that filter paper could be used for such a purpose [10][11][12] , but these papers provided a general overview of technological capacities (i.e., coverage, error rate, number of single nucleotide polymorphisms (SNPs)) and did not focus on the identification of specific disorders or mutations.
Assuming future implementation of NGS technologies into newborn screening (NBS), we conducted a preliminary study sequencing whole exomes on DBS specifically issued from patients with well-established IEM. We interpreted our data with a specific focus on genes related to NBS programs, thus aiming to demonstrate that DBS is an appropriate material for future NBS programs relying on high-throughput sequencing technologies.

Results
DNA Extraction. Genomic material was extracted from five blood spots (3.1 mm) simultaneously. DNA integrity was assessed using the KAPA hgDNA Quantification and QC ® kit. The amounts of isolated DNA fluctuated between 62 and 248 ng. Q-ratios were close to 1 for all samples, suggesting that the quality of the extracted DNA was reliable.

Sequencing.
A focus was initially set on identifying disorders included in the official newborn screening program of the French Community of Belgium (FWB). Accordingly, Table 1 synthesizes the different diseases and their corresponding mutations for the 15 tested patients.
The bioinformatics flowchart of whole-exome sequencing (WES) was designed to specifically target the 35 IEM genes involved in the NBS program of the FWB and 74 additional genes involved in disorders included or under discussion for inclusion in different official NBS programs [13][14][15][16] . Among these additional disorders, we also considered some specific treatable conditions that cannot be identified with reliable biomarkers but that could benefit from early intervention, such as pyridoxine-dependent epilepsy or serine biosynthesis defects ( Table 2).
Coverage of the different exons for each gene highly fluctuated; some regions were uncovered, while other regions had a read depth of up to 238-fold. The number of reads for the different detected mutations varied between 8 and 83x. This coverage heterogeneity among the different selected genes is depicted in Fig. 1.
Nevertheless, all covered pathogenic mutations, either homozygote or compound heterozygote, for each patient have been identified by WES on DNA extracted from DBS. For patient DBS-14, MSUD was suspected initially upon newborn screening based on leucine/isoleucine levels (1262 µmol/L). Subsequent amino acid analysis identified the pathognomonic presence of allo-isoleucine, thus confirming the disorder. As molecular testing had not yet been requested, mutations had not been previously characterized by Sanger sequencing. We intended then to identify the pathogenic defects in DBT, BCKDHA, or BCKDHB. Unfortunately, the diagnosis of MSUD could not be confirmed based on coding sequence analysis of the respective genes, although a new unreferenced   17 , we cannot exclude a large deletion in this gene, even though the gene coverage for this patient was not significantly different from the other 14 samples analyzed. Thus, any causal intronic mutation cannot be ruled out. For patient DBS-12, the 4 base-pair deletion located in the GALT promoter region and associated with the Duarte 2 (D2) allele was not covered by the exome sequencing probes and thus could not be identified. Nonetheless, the other four mutations associated with the D2 haplotype have been correctly characterized. Determination of the 15 base-pair duplication in MMAB (patient DBS-5) was also critical, as it was neither annotated by Annovar 18 nor automatically identified with IGV software. Only an explicit visualization of the region of interest in IGV allowed the insertion to be identified. We also studied the "presumed benign" polymorphisms using Cartagenia Bench Lab CNV software (Leuven, Belgium). The putative clinical impact of these variants, evaluated with two prediction databases 19,20 , revealed some unexpected information (   c.554-1G > T in FAH was easily confirmed, and the pathogenic nonsense homozygote c.2056C > T (p.Gln686Ter) mutation in DUOX2 (read depth of 6x), known to cause thyroid dyshormonogenesis type 6 and congenital hypothyroidism 21 , was also identified. However, this 24-year-old patient presents fully normal thyroid function, with repeated normal thyroxin and thyrotropin values measured over several years. The sequencing data were confirmed on a separate NGS experiment (from DNA extraction to sequence interpretation) with better coverage (read depth of 27x), as well as by Sanger sequencing. Such a genotype/phenotype discrepancy is quite surprising for a premature termination variant, but the mutation is located downstream of the thyroperoxidase active site of the protein 22 ; thus, we could not exclude a residual functional activity. Moreover, variant databases describe this mutation as pathogenic on the basis of a unique publication reporting a single patient who was heterozygous for the anomaly 21 . To our knowledge, no functional studies have ever been performed to determine the activity of the truncated protein. Therefore, our data indicate that this variant should be classified as variant of unknown significance.

Discussion
This pilot study demonstrates that the amount and the quality of DNA extracted from DBS are adequate to identify pathogenic mutations by high-throughput sequencing. Although samples and genes carrying mutations are in limited numbers and extrapolation of the results to larger cohorts should be done with some circumspection, our present report underlines some of the challenges that WES faces. Indeed, WES reveals the vast depth of fluctuations in coverage between regions, which could subsequently generate difficulties in interpreting variants. Copy number variations (CNVs) should also be detected with caution as the unambiguous identification of small or large allelic deletions by NGS can be challenging when coverage is poor. Moreover, as observed with the 15-base-pair duplication in MMAB, small CNVs are not easily identified by bioinformatics tools. Hopefully, with the next evolution towards whole-genome sequencing (WGS), several drawbacks of WES could be solved. Indeed, WGS offers better coverage uniformity and provides more reliable sequences. WGS also improves CNV identification without the need for target amplification and allows the identification of non-coding alterations 23 . Expecting drastic cost reductions and process automation in the near future, we could easily imagine our experiments contributing to paving the way for "next-generation" neonatal screening programs, provided that new paradigms (clinical, political, economic, societal and ethical) are defined. The first revolution already occurred in the world of newborn screening approximately fifteen years ago with the implementation of tandem mass spectrometry 24 . Currently, while this technological progress continues to challenge enacted codes (i.e., the Wilson and Jungner criteria) [25][26][27] , the second revolution is underway. NGS is now positioned as a universal approach allowing the identification of many disorders with one technology. Considering that and the results of our pilot study, we aim to further assess the utility of massive sequencing in a larger population. Several technical and clinical aspects of this ambitious pursuit are discussed here.
Presently, high-throughput sequencing is laborious and does not meet the requirements of NBS programs. Very large amounts of useless data are generated, and consequently, the treatment of bioinformatics data and review of variants generate unacceptable turnaround times compared to those of current biochemical assays. The interest in using WES (or WGS) to replace targeted approaches has already been discussed 28,29 , and based on actual available technologies and knowledge, the implementation of a selective approach appears to be the better choice. Such a panel analysis would be intended to improve coverage homogenization and to ensure a minimal read depth threshold between regions of interest. Bioinformatics analysis would thus be facilitated, and the costs  of analysis would be reduced. Additionally, with the expected development of automated bioinformatics pipelines, a significant reduction in NGS analysis time can be envisaged in the future. In such targeted approaches, the list of targeted genes should obviously not be restrictive, since newborn screening programs are constantly evolving as new therapies are developed.
To date, the costs of massive sequencing remain disproportionate compared to those of mass spectrometry-based approaches. Therefore, implementation of NGS technologies into NBS could probably be first considered as a combined metabolomics-genomics approach, with the sequencing focusing only on capturing conditions without reliable biomarkers. Indeed, our experiments allowed for accurate sequencing with acceptable coverage of the coding regions of some treatable disorders for which identification is not reliable using mass spectrometry techniques (e.g., pyridoxine-dependent epilepsy, cerebral creatine deficiency syndrome). Using sequencing only curable diseases that lack defined biomarkers would be intended to initially limit the costs of implementing NGS in NBS. Afterward, greatly increasing the number of samples tested using molecular techniques would help to reduce reagents and bioinformatics costs, subsequently supporting the sustainability of molecular NBS.
Applying WES (or WGS) to newborn screening may also present substantial benefits. Assuming that blood samples could be collected earlier (i.e., at the day of birth, eventually from cord blood), the medical care needs of affected neonates could be anticipated. Moreover, given the wide variability of screened disorders worldwide, harmonization of NBS programs could be facilitated with the implementation of such universal technologies. The acquisition of genomic sequences at birth may also be beneficial for individuals who become sick later in life. Indeed, presuming lifelong data storage on a secured and controlled server, retrospective consultations of patients data could be helpful to reduce delays in the diagnosis of rare diseases 30 . Access to patient's information in such instances should obviously be driven by strict clinical and ethical constraints.
Careful consideration will also need to be given to unexpected and medically irrelevant incidental findings. As reported for patient DBS-6, an unexpected homozygote variant that was previously considered a pathogenic has been characterized in a gene unrelated to the patient's disorder, questioning the reliability of some variants referenced in databases. Heterozygous carriers of recessive defects are characterized unequivocally, and polymorphisms and intermediate deficiencies requiring no intervention are also identified. These results might burden medical practices (increasing unnecessary documentation as well as anxiety in healthy carriers) and possibly cripple healthcare budgets. Substantial efforts will thus be needed to clarify genotype/phenotype correlations, and large studies are required to associate unequivocal biochemical defects with gene variants. Our knowledge of the genome will subsequently be improved and will progressively enhance the sensitivity and specificity of these assays.
With these new high-throughput technologies, the current restriction focusing the screening to diseases for which effective treatment is available could also be reconsidered. This limitation confines, among other things, the clinical trials to symptomatic patients and ignores the potential benefits of any preventive intervention. Early identification of patients for other conditions could probably allow pre-symptomatic therapies in randomized studies. Additionally, the feasibility of the voluntary expansion of screening, providing the choice to families who want to know about other conditions, is already under debate [31][32][33][34] . Educational challenges in the training of health professionals and in information provided to the public should also be considered. Parents should be informed of the screening perimeter, its implications and the follow-up required. Appropriate infrastructure should ensure care, education and follow-up. Specific registries should be set up to provide the opportunity for families to include children in clinical trials for new treatments.
Finally, the emergence of the NGS era will call into question the current neonatal screening dogma. Old doctrines should not be barriers to the emergence of new expectations: scientific and technological advances must obviously be encouraged, but they cannot be made without any clinical, political, economic, societal and ethical debates 35,36 . Accordingly, the National Human Genome Research Institute already promotes an Ethical, Legal, and Social Implications (ELSI) Program to anticipate and address these issues 37 .

Methods
Samples. Fifteen patients with confirmed IEM were considered in this study. Almost all patients were identified by newborn screening, and for all except one, mutations were initially characterized by Sanger sequencing during diagnostic work-up.
In the course of the patient's clinical follow-up, amino acid or acylcarnitine profiles are routinely analyzed, and for logistical considerations, whole blood is collected on filter paper. Ethical approval (reference B707201421546) was obtained from the Institutional Review Board (Ethical Committee of the Faculty of Medicine of the University of Liege), in compliance with the Declaration of Helsinki. All experiments were performed in accordance with relevant guidelines and regulations, and all patients or their legal representatives signed a written informed consent form. This work consisted of a prospective study and did not lead to any changes in the treatment of enrolled patients. Only residual DBS were used to perform exome sequencing. DNA Extraction. Experiments were performed using five blood spots (3.1 mm diameter) for each patient. DNA was extracted from DBS according to the protocol recently published by St Julien and collaborators 38 , with slight modifications. The amounts of DNA were estimated, and the quality of the retrieved material was assessed using the KAPA hgDNA Quantification and QC ® kit (Kapa Biosystems), which is designed to amplify targets of 41 base pairs (bp), 129 bp, and 305 bp within a conserved single-copy locus in the human genome. Absolute quantification is achieved using the 41 bp assay, while the longer amplicons are used to assess DNA quality. Since DNA damage has a greater impact on the amplification of longer targets, the relative quality of a DNA sample can be inferred by normalizing the concentration obtained using the 129 bp or 305 bp assay against the concentration obtained with the 41 bp assay. This normalization generates "Q-ratios" with values between 0 and 1, which can be used as a relative measure of DNA quality prior to NGS library construction.
These libraries were pooled equimolarly and incubated with probes to capture all coding exons (44.1 Mb target) (SeqCap EZ Human Exome library v.2.0; Roche). Sequencing was performed with 2*75 bp reads on a high mode NextSeq. 500 run. The entire analytical process is illustrated in Fig. 2.  Fig. 3 39 . Data analysis was performed using Galaxy tools on the usegalaxy.org server 40 . Raw reads were mapped against a reference genome (GRCh37/hg19) with BWA-MEM version 0.7.15.1. PCR duplicates were flagged with Picard version 2.7.1. Indel realignment, base quality recalibration and coverage depth calculations were optimized with GATK version 3.8. Sequences were visualized with IGV (Integrative Genomics Viewer) 41 . Anonymized data were stored under controlled access on a secured server.