Introduction

Next-generation sequencing (NGS) has revolutionized molecular diagnosis over the last decade. This technological evolution has allowed millions of genomes and exomes to be sequenced, and the exponential increase in related publications has paralleled the steady decline in cost1. To date, the methodology has mainly been applied in clinical settings to high-quality DNA samples (whole blood) or to DNA extracted from formalin-fixed, paraffin-embedded tissues2, but protocols have not yet been clinically validated for certain challenging materials such as degraded DNA from forensic samples3 or dried blood spots (DBS).

Blood collection on filter paper has become a reference procedure for the collection, transport, analysis and storage of biological fluids. For over 50 years, this sampling protocol has been the key to newborn screening programs worldwide. The Clinical and Laboratory Standards Institute (CLSI) periodically updates the corresponding guidelines4. The range of applications performed using filter paper has since broadened considerably and now includes, among others, diet follow-up in metabolic disorders (e.g., phenylketonuria)5, therapeutic drug monitoring6, doping control7, viral load measurements8 and targeted gene sequencing9. Accordingly, the number of PubMed (www.ncbi.nlm.nih.gov/pubmed)-referenced publications associated with the term “dried blood spots” is increasing rapidly.

Considering the growing interest in DBS testing, it was worth evaluating whether whole-exome sequencing of such material could detect specific inborn errors of metabolism (IEM) previously identified by biochemical methods and/or Sanger sequencing. Previous publications have reported that filter paper can be used for this purpose10,11,12, but these papers provided a general overview of technological capacities (i.e., coverage, error rate, number of single nucleotide polymorphisms (SNPs)) and did not focus on the identification of specific disorders or mutations.

Anticipating the future implementation of NGS technologies in newborn screening (NBS), we conducted a preliminary study in which whole exomes were sequenced from DBS obtained from patients with well-established IEM. We interpreted our data with a specific focus on genes related to NBS programs, aiming to demonstrate that DBS is an appropriate material for future NBS programs relying on high-throughput sequencing technologies.

Results

DNA Extraction

Genomic material was extracted simultaneously from five blood spots (3.1 mm diameter) per patient. DNA integrity was assessed using the KAPA hgDNA Quantification and QC® kit. The amounts of isolated DNA ranged from 62 to 248 ng. Q-ratios were close to 1 for all samples, suggesting that the quality of the extracted DNA was adequate.

Sequencing

We initially focused on identifying disorders included in the official newborn screening program of the French Community of Belgium (FWB). Accordingly, Table 1 summarizes the different diseases and their corresponding mutations for the 15 tested patients.

Table 1 Disorders analyzed by exome sequencing using DBS.

The bioinformatics flowchart of whole-exome sequencing (WES) was designed to specifically target the 35 IEM genes involved in the NBS program of the FWB and 74 additional genes involved in disorders included or under discussion for inclusion in different official NBS programs13,14,15,16. Among these additional disorders, we also considered some specific treatable conditions that cannot be identified with reliable biomarkers but that could benefit from early intervention, such as pyridoxine-dependent epilepsy or serine biosynthesis defects (Table 2).

Table 2 Disorders and corresponding genes generally considered by NBS programs.
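Restricting the interpretation of exome data to a gene panel, as described above, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function name `filter_to_panel`, the dictionary layout of the variant records and the placeholder `GENE_X` are assumptions, and the gene set is only a small subset of symbols mentioned in the text, not the full list of 109 genes.

```python
# Illustrative subset only: the panel described in the text has 109 genes (35 + 74).
PANEL_GENES = {"FAH", "GALT", "MMAB", "BCKDHA", "BCKDHB", "DBT"}


def filter_to_panel(variants, panel_genes):
    """Keep only annotated variants whose gene symbol belongs to the panel."""
    return [v for v in variants if v["gene"] in panel_genes]


# Hypothetical annotated calls (not one patient's real call set);
# "GENE_X" stands for any gene outside the panel.
variants = [
    {"gene": "FAH", "change": "c.554-1G>T"},
    {"gene": "GENE_X", "change": "hypothetical off-panel variant"},
    {"gene": "BCKDHB", "change": "c.G742T"},
]

panel_hits = filter_to_panel(variants, PANEL_GENES)
for v in panel_hits:
    print(v["gene"], v["change"])
```

Filtering after annotation, rather than before alignment, keeps the complete exome data available if the panel is later extended.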

Coverage of the different exons of each gene fluctuated widely; some regions were not covered at all, while others had a read depth of up to 238-fold. The read depth at the different detected mutations varied between 8x and 83x. This coverage heterogeneity among the selected genes is depicted in Fig. 1.

Figure 1

Mean depth of coverage for the different exons of selected genes. The blue shape represents the mean coverage of each exon. Red markers represent the mean coverage per gene; these markers are sorted in decreasing order. (A) The 35 IEM genes included in the NBS program of the FWB. (B) Additional disorders that either are considered by different NBS initiatives worldwide or could benefit from early preventive care.
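The per-exon and per-gene summaries plotted in Fig. 1 can be approximated with the following sketch. It is written under stated assumptions: per-base depths are taken as already extracted for each exon, the per-gene value is computed as the unweighted mean of the exon means (the article does not state how the gene-level mean was derived), and the function name and depth values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_coverage(exon_depths):
    """Summarize read depth per exon and per gene.

    exon_depths maps (gene, exon_number) to a list of per-base read depths.
    Returns (per_exon_mean, genes_sorted), where genes_sorted is a list of
    (gene, mean of exon means) ordered by decreasing coverage.
    """
    per_exon_mean = {key: mean(depths) for key, depths in exon_depths.items()}

    by_gene = defaultdict(list)
    for (gene, _exon), value in per_exon_mean.items():
        by_gene[gene].append(value)

    genes_sorted = sorted(
        ((gene, mean(values)) for gene, values in by_gene.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return per_exon_mean, genes_sorted


# Hypothetical depths, for illustration only.
exon_depths = {
    ("GALT", 1): [30, 34, 32],
    ("GALT", 2): [0, 0, 0],      # an uncovered exon
    ("FAH", 1): [80, 90, 100],
    ("FAH", 2): [50, 50, 50],
}

per_exon_mean, genes_sorted = summarize_coverage(exon_depths)
for gene, depth in genes_sorted:
    print(f"{gene}: mean exon coverage {depth:.1f}x")
```

A base-weighted gene mean would give more weight to long exons; the unweighted version is used here only because it is the simplest reading of the figure legend.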

Nevertheless, all covered pathogenic mutations, whether homozygous or compound heterozygous, were identified for each patient by WES on DNA extracted from DBS. For patient DBS-14, MSUD was initially suspected upon newborn screening based on leucine/isoleucine levels (1262 µmol/L). Subsequent amino acid analysis identified the pathognomonic presence of allo-isoleucine, thus confirming the disorder. As molecular testing had not yet been requested, the mutations had not previously been characterized by Sanger sequencing. We therefore sought to identify the pathogenic defects in DBT, BCKDHA, or BCKDHB. Unfortunately, the diagnosis of MSUD could not be confirmed by coding sequence analysis of these genes, although a new unreferenced heterozygous mutation, c.G742T (p.A248S), was identified in BCKDHB. Since a significant percentage of DBT pathogenic variants are deletions (both large and small)17, we cannot exclude a large deletion in this gene, even though the gene coverage for this patient was not significantly different from that of the other 14 samples analyzed. A causal intronic mutation cannot be ruled out either. For patient DBS-12, the 4 base-pair deletion located in the GALT promoter region and associated with the Duarte 2 (D2) allele was not covered by the exome sequencing probes and thus could not be identified. Nonetheless, the other four mutations associated with the D2 haplotype were correctly characterized. Detection of the 15 base-pair duplication in MMAB (patient DBS-5) was also challenging, as it was neither annotated by Annovar18 nor automatically displayed by the IGV software. Only an explicit visualization of the region of interest in IGV allowed the duplication to be identified.

We also studied the “presumed benign” polymorphisms using Cartagenia Bench Lab CNV software (Leuven, Belgium). The putative clinical impact of these variants, evaluated with two prediction databases19,20, revealed some unexpected information (Table 3). For patient DBS-6 with tyrosinemia type I, the homozygous mutation c.554-1G>T in FAH was easily confirmed, and the pathogenic homozygous nonsense mutation c.2056C>T (p.Gln686Ter) in DUOX2 (read depth of 6x), known to cause thyroid dyshormonogenesis type 6 and congenital hypothyroidism21, was also identified. However, this 24-year-old patient has fully normal thyroid function, with repeated normal thyroxine and thyrotropin values measured over several years. The sequencing data were confirmed in a separate NGS experiment (from DNA extraction to sequence interpretation) with better coverage (read depth of 27x), as well as by Sanger sequencing. Such a genotype/phenotype discrepancy is quite surprising for a premature termination variant, but the mutation is located downstream of the thyroperoxidase active site of the protein22; thus, residual functional activity cannot be excluded. Moreover, variant databases describe this mutation as pathogenic on the basis of a single publication reporting one patient who was heterozygous for the anomaly21. To our knowledge, no functional studies have ever been performed to determine the activity of the truncated protein. Our data therefore indicate that this variant should be classified as a variant of unknown significance.

Table 3 Number of variants annotated in the different samples (focused on the 109 genes considered), and the corresponding clinical relevance of filtered polymorphisms evaluated among different databases (MutationTaster and ClinVar).

Discussion

This pilot study demonstrates that the amount and the quality of DNA extracted from DBS are adequate to identify pathogenic mutations by high-throughput sequencing. Although the numbers of samples and of mutated genes are limited and the results should be extrapolated to larger cohorts with caution, our report underlines some of the challenges that WES faces. Indeed, WES shows wide fluctuations in coverage depth between regions, which can make variants difficult to interpret. Copy number variations (CNVs) should also be interpreted with caution, as the unambiguous identification of small or large allelic deletions by NGS can be challenging when coverage is poor. Moreover, as observed with the 15-base-pair duplication in MMAB, small CNVs are not easily identified by bioinformatics tools. The anticipated shift towards whole-genome sequencing (WGS) could resolve several of these drawbacks. Indeed, WGS offers better coverage uniformity and provides more reliable sequences. WGS also improves CNV identification without the need for target amplification and allows the identification of non-coding alterations23.

Expecting drastic cost reductions and process automation in the near future, we can readily imagine our experiments helping to pave the way for “next-generation” neonatal screening programs, provided that new paradigms (clinical, political, economic, societal and ethical) are defined. The first revolution in newborn screening occurred approximately fifteen years ago with the implementation of tandem mass spectrometry24. While that technological progress continues to challenge established principles (i.e., the Wilson and Jungner criteria)25,26,27, the second revolution is already underway. NGS is now positioned as a universal approach allowing the identification of many disorders with one technology. Given this and the results of our pilot study, we aim to further assess the utility of massive sequencing in a larger population. Several technical and clinical aspects of this ambitious pursuit are discussed here.

Presently, high-throughput sequencing is laborious and does not meet the requirements of NBS programs. Large amounts of data that are irrelevant to screening are generated, and consequently, the processing of bioinformatics data and the review of variants result in unacceptable turnaround times compared to those of current biochemical assays. The value of using WES (or WGS) in place of targeted approaches has already been discussed28,29, and based on currently available technologies and knowledge, a selective approach appears to be the better choice. Such a panel analysis would be intended to make coverage more homogeneous and to ensure a minimal read depth across all regions of interest. Bioinformatics analysis would thus be facilitated, and the costs of analysis would be reduced. Additionally, with the expected development of automated bioinformatics pipelines, a significant reduction in NGS analysis time can be envisaged in the future. In such targeted approaches, the list of targeted genes should obviously remain open to extension, since newborn screening programs are constantly evolving as new therapies are developed.
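The idea of enforcing a minimal read depth across the regions of interest can be illustrated with a short quality-control sketch. The threshold of 20x, the region names and the depth values below are hypothetical; the article does not specify a cut-off.

```python
def regions_below_threshold(mean_depth_by_region, min_depth):
    """Return the names of regions whose mean read depth is below min_depth,
    ordered from the lowest depth upward."""
    low = [(depth, name) for name, depth in mean_depth_by_region.items()
           if depth < min_depth]
    return [name for _depth, name in sorted(low)]


# Hypothetical mean depths per region of interest.
depths = {
    "GENE_A:exon1": 45.0,
    "GENE_A:exon2": 0.0,
    "GENE_B:exon1": 12.5,
    "GENE_B:exon2": 238.0,
}

MIN_DEPTH = 20  # hypothetical cut-off, not taken from the article

flagged = regions_below_threshold(depths, MIN_DEPTH)
print(flagged)
```

Regions flagged in this way would need to be re-sequenced or covered by another method before a negative result could be reported.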

To date, the costs of massive sequencing remain disproportionate compared to those of mass spectrometry-based approaches. Therefore, the implementation of NGS technologies in NBS could probably first be considered as a combined metabolomics-genomics approach, with sequencing focused only on conditions without reliable biomarkers. Indeed, our experiments allowed accurate sequencing, with acceptable coverage, of the coding regions of genes involved in some treatable disorders that cannot be reliably identified by mass spectrometry (e.g., pyridoxine-dependent epilepsy, cerebral creatine deficiency syndrome). Restricting sequencing to treatable diseases that lack defined biomarkers would initially limit the costs of implementing NGS in NBS. Subsequently, a large increase in the number of samples tested by molecular techniques would help to reduce reagent and bioinformatics costs, supporting the sustainability of molecular NBS.

Applying WES (or WGS) to newborn screening may also present substantial benefits. Assuming that blood samples could be collected earlier (i.e., on the day of birth, possibly from cord blood), the medical care needs of affected neonates could be anticipated. Moreover, given the wide variability of screened disorders worldwide, harmonization of NBS programs could be facilitated by the implementation of such universal technologies. The acquisition of genomic sequences at birth may also be beneficial for individuals who become sick later in life. Indeed, presuming lifelong data storage on a secured and controlled server, retrospective consultation of patients’ data could help to reduce delays in the diagnosis of rare diseases30. Access to patients’ information in such instances should obviously be governed by strict clinical and ethical constraints.

Careful consideration will also need to be given to unexpected and medically irrelevant incidental findings. As reported for patient DBS-6, an unexpected homozygous variant previously considered pathogenic was found in a gene unrelated to the patient’s disorder, calling into question the reliability of some variants referenced in databases. Heterozygous carriers of recessive defects are also unequivocally identified, as are polymorphisms and intermediate deficiencies requiring no intervention. These results might burden medical practice (increasing unnecessary documentation as well as anxiety in healthy carriers) and possibly strain healthcare budgets. Substantial efforts will thus be needed to clarify genotype/phenotype correlations, and large studies are required to associate unequivocal biochemical defects with gene variants. Our knowledge of the genome will thereby improve, progressively enhancing the sensitivity and specificity of these assays.

With these new high-throughput technologies, the current restriction of screening to diseases for which effective treatment is available could also be reconsidered. This limitation confines, among other things, clinical trials to symptomatic patients and ignores the potential benefits of any preventive intervention. Early identification of patients with other conditions could allow pre-symptomatic therapies to be evaluated in randomized studies. Additionally, the feasibility of a voluntary expansion of screening, giving the choice to families who want to know about other conditions, is already under debate31,32,33,34. Educational challenges in the training of health professionals and in the information provided to the public should also be considered. Parents should be informed of the scope of the screening, its implications and the follow-up required. Appropriate infrastructure should ensure care, education and follow-up. Specific registries should be set up to give families the opportunity to enroll children in clinical trials of new treatments.

Finally, the emergence of the NGS era will call into question the current neonatal screening dogma. Old doctrines should not be barriers to the emergence of new expectations: scientific and technological advances must obviously be encouraged, but they cannot proceed without clinical, political, economic, societal and ethical debate35,36. Accordingly, the National Human Genome Research Institute already promotes an Ethical, Legal, and Social Implications (ELSI) Program to anticipate and address these issues37.

Methods

Samples

Fifteen patients with confirmed IEM were considered in this study. Almost all patients were identified by newborn screening, and for all except one, mutations were initially characterized by Sanger sequencing during diagnostic work-up.

In the course of the patients’ clinical follow-up, amino acid or acylcarnitine profiles are routinely analyzed, and for logistical reasons, whole blood is collected on filter paper. Ethical approval (reference B707201421546) was obtained from the Institutional Review Board (Ethical Committee of the Faculty of Medicine of the University of Liege), in compliance with the Declaration of Helsinki. All experiments were performed in accordance with relevant guidelines and regulations, and all patients or their legal representatives signed a written informed consent form. This work consisted of a prospective study and did not lead to any changes in the treatment of enrolled patients. Only residual DBS were used to perform exome sequencing.

DNA Extraction

Experiments were performed using five blood spots (3.1 mm diameter) for each patient. DNA was extracted from DBS according to the protocol recently published by St Julien and collaborators38, with slight modifications. The amounts of DNA were estimated, and the quality of the retrieved material was assessed using the KAPA hgDNA Quantification and QC® kit (Kapa Biosystems), which is designed to amplify targets of 41 base pairs (bp), 129 bp, and 305 bp within a conserved single-copy locus in the human genome. Absolute quantification is achieved using the 41 bp assay, while the longer amplicons are used to assess DNA quality. Since DNA damage has a greater impact on the amplification of longer targets, the relative quality of a DNA sample can be inferred by normalizing the concentration obtained using the 129 bp or 305 bp assay against the concentration obtained with the 41 bp assay. This normalization generates “Q-ratios” with values between 0 and 1, which can be used as a relative measure of DNA quality prior to NGS library construction.
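The Q-ratio normalization described above amounts to two divisions, as in the following sketch. The function name and the concentrations are hypothetical and serve only to illustrate the arithmetic; the three inputs are assumed to be expressed in the same units.

```python
def q_ratios(conc_41bp, conc_129bp, conc_305bp):
    """Normalize the 129 bp and 305 bp assay concentrations against the
    41 bp assay. Returns (Q129/Q41, Q305/Q41)."""
    if conc_41bp <= 0:
        raise ValueError("the 41 bp concentration must be positive")
    return conc_129bp / conc_41bp, conc_305bp / conc_41bp


# Hypothetical concentrations (ng/µL), for illustration only.
q129, q305 = q_ratios(conc_41bp=2.0, conc_129bp=1.9, conc_305bp=1.8)
print(f"Q129/Q41 = {q129:.2f}, Q305/Q41 = {q305:.2f}")
```

Ratios close to 1 indicate that the longer targets amplify almost as well as the short one, that is, that the DNA is largely intact; lower ratios point to fragmentation or damage.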

Sequencing

Briefly, 100 ng of extracted DNA was fragmented (Bioruptor®, Diagenode) and used to prepare indexed libraries (SeqCap EZ Indexed Adapters; Roche) with the KAPA Hyper Prep® kit (Kapa Biosystems). These libraries were pooled in equimolar amounts and incubated with probes to capture all coding exons (44.1 Mb target) (SeqCap EZ Human Exome library v.2.0; Roche). Sequencing was performed with 2 × 75 bp reads on a NextSeq 500 run in high-output mode. The entire analytical process is illustrated in Fig. 2.

Figure 2

Overview of analytical workflow.
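The equimolar pooling step of this workflow relies on converting each library's mass concentration into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, library concentrations, fragment lengths and target amount are hypothetical and are not taken from the study.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration (ng/µL) to nM, assuming an
    average mass of 660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_per_library):
    """Return the volume (µL) of each library that contains fmol_per_library.

    libraries maps a library name to (concentration in ng/µL, mean fragment
    length in bp). Since 1 nM equals 1 fmol/µL, volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical libraries, for illustration only.
libraries = {
    "library_1": (6.6, 400),    # 25 nM
    "library_2": (13.2, 400),   # 50 nM
}

volumes = equimolar_volumes(libraries, fmol_per_library=100)
for name, volume in volumes.items():
    print(f"{name}: {volume:.1f} µL")
```

In this example the more concentrated library contributes half the volume of the other, so that both contribute the same number of molecules to the pool.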

Data Processing

A bioinformatics flowchart is presented in Fig. 3 (ref. 39). Data analysis was performed using Galaxy tools on the usegalaxy.org server40. Raw reads were mapped against the reference genome (GRCh37/hg19) with BWA-MEM version 0.7.15.1. PCR duplicates were flagged with Picard version 2.7.1. Indel realignment, base quality recalibration and coverage depth calculations were performed with GATK version 3.8. Sequences were visualized with IGV (Integrative Genomics Viewer)41. Anonymized data were stored under controlled access on a secured server.

Figure 3

Framework for variant discovery and genotyping from NGS data.