Adapting SureSelect enrichment protocol to the Ion Torrent S5 platform in molecular diagnostics of craniosynostosis

Obtaining reliable and high fidelity next-generation sequencing (NGS) data requires to choose a suitable sequencing platform and a library preparation approach, which both have their inherent assay-specific limitations. Here, we present the results of successful adaptation of SureSelect hybridisation-based target enrichment protocol for the sequencing on the Ion Torrent S5 platform, which is designed to work preferably with amplicon-based panels. In our study, we applied a custom NGS panel to screen a cohort of 16 unrelated patients affected by premature fusion of the cranial sutures, i.e. craniosynostosis (CS). CS occurs either as an isolated malformation or in a syndromic form, representing a genetically heterogeneous and clinically variable group of disorders. The approach presented here allowed us to achieve high quality NGS data and confirmed molecular diagnosis in 19% of cases, reaching the diagnostic yield similar to some of the published research reports. In conclusion, we demonstrated that an alternative enrichment strategy for library preparations can be successfully applied prior to sequencing on the Ion Torrent S5 platform. Also, we proved that the custom NGS panel designed by us represents a useful and effective tool in the molecular diagnostics of patients with CS.

sequencing complexity and coverage uniformity [15][16][17][18] . In general, the problem of non-specificity in PCR-based methods often cannot be circumvented by careful primer design, as the oligonucleotides have usually very short sequence 16,18,19 . On the other hand, an alternative approach, i.e. hybridisation-based enrichment protocols such as SureSelect (Agilent Technologies) is available and is commonly applied on the Illumina platforms. The SureSelect approach is based on biotinylated RNA oligomers of substantially greater length (120 bp), which can bind to DNA more specifically and consequently enrich the targeted regions of the genome, avoiding repetitive or non-specific amplification. Therefore, SureSelect enrichment strategy allows for obtaining better sequencing complexity and coverage uniformity 16 .
To our knowledge, SureSelect libraries have not been used so far to carry out the sequencing on the Ion Torrent S5 semiconductor-based platform. In this report, we present the first example of a successful adaptation of the hybridisation-based SureSelect enrichment protocol to the sequencing on the Ion Torrent S5 system. In addition, using a cohort of patients presenting with craniosynostosis, we emphasise the utility of targeted gene panel sequencing in the diagnostics of this aetiologically heterogeneous condition. craniosynostosis Craniosynostosis (CS), premature fusion of one or more cranial sutures, occurs either as an isolated malformation or in a syndromic form, representing a genetically heterogeneous and clinically variable group of disorders 20 . Routine diagnostic screening of common craniosynostosis-associated genes (usually FGFR1, FGFR2, FGFR3, TWIST1 and often EFNB1, TCF12) enables to establish genetic aetiology in 21% 21 to 62% 22 , depending on the size of the study, ethnicity of the population, and range of the molecular analysis (either hot-spot screening or the entire gene sequencing). Since targeted NGS is regarded as a useful diagnostic method in identification of causative variants, especially in genetically heterogeneous diseases 23,24 , we have designed and applied a custom hybridisation-based panel (Agilent Technologies) to screen CS patients with negative results of preliminary molecular screening (involving hot-spot mutations located in exon 7 of FGFR1, exons 7 and 8 of FGFR2, and exon 7 of FGFR3, as well as we analysed the entire coding sequence of TWIST1).

Results
Clinical description. We used an NGS targeted gene panel approach to screen 16 consecutive patients with CS in whom the result of conventional Sanger sequencing of preliminary molecular screening was negative. Distribution of prematurely fused sutures was as follows: coronal -6/16 (unilateral -4, bilateral -2), metopic -5/16, sagittal -3/16, multiple -2/16. 56.25% of patients from our cohort presented with the syndromic form of CS, whereas 43.75% had an isolated defect. All patients were subjected to a careful dysmorphological assessment upon which clinically recognisable craniofacial malformations and other defects were photographically documented. Additionally, diagnostic imaging, including X-rays, CT scans, or head MRI was performed. DNA was extracted from venous blood samples of index patients and their parents.  Table 1) and 11 SNVs (see Table 2) thought to be associated with craniosynostosis and abnormalities of craniofacial development. To create our gene panel we have used SureDesign software (Agilent Technologies, SantaClara, USA). The designed panel was further refined in collaboration with Perlan Biotechnologies. The panel summary is as follows: Agilent Design ID: 3056721, panel name: Cranio_V1, region size: 173.794 kb, 6033 probes (225 709 kb) with region extension: 25 bases from 3′ end and 25 bases from 5′ end. The panel was classified to price tier 1, in which target region size ranges from 1 to 499 kbs. Hence, the target sequence and also the gene content could be increased at least two times without additional cost.
Quality control and coverage estimation. In each sample the estimated coverage exceeded 50 reads for over 95% of the target gene sequence (see Fig. 1). Mean coverages for the analysed genes and single nucleotide variants (SNVs) are summarised in Supplementary Materials (see Supplementary 1). There were marked discrepancies among the mean coverages of different samples, ranging from 129 in sample 3 to 337 in sample 11, with an average coverage of 240 reads calculated per gene. Across individual genes, SMAD6 had the lowest average coverage of 133, while POLR1D was relatively best covered (321 on average).

Identification and evaluation of candidate variants.
After sequencing of all 16 DNA samples on the Ion Torrent S5 system and completing the alignments, we assessed variant quality using multiple criteria (see Methods) and predicted the significance of individual variants. During the quality control, out of 2565 called variants, 87 (3.4%) were dropped as artefacts. In three cases, we detected the variants definitely causative for the patients' phenotypes. Patient 1 was suspected of Pfeiffer syndrome, based on the clinical assessment. His phenotype involved sagittal CS, maxillary hypoplasia, high palate, proptosis, broad halluces, and skin syndactyly of 2 nd and 3 rd toes. X-ray examination of the feet showed hypoplastic middle phalanges of all toes and the relative widening of 1 st metatarsals as well as broadening of all bones forming the halluces (see Fig. 2a,b). Upon NGS analysis we found a pathogenic heterozygous variant in FGFR2 gene NM_000141.4:c.868T > G, NP_000132.3:p.Trp290Gly (HGMD: CM950464, ClinVar: 13284) (see Fig. 2c,d). Pathogenic variant was confirmed by means of Sanger sequencing in the index case and excluded in his unaffected parents, clearly indicating a de novo occurrence.
As female Patient 7 presented with complex CS involving sagittal and bilateral coronal synostosis, dolichocephaly, macrocephaly, prominent forehead, flat facial profile, proptosis, brachydactyly and broad halluces, clinical diagnosis also matched Pfeiffer syndrome (see Fig. 3a www.nature.com/scientificreports www.nature.com/scientificreports/ ClinVar: 374823) (see Fig. 3d,e). Pathogenic variant was confirmed by means of Sanger sequencing in the index case and excluded in his unaffected parents, clearly indicating a de novo occurrence.
Intellectual development was normal in all three presented patients.

Discussion
Craniosynostoses encompass a group of distinct, clinically variable phenotypes. Since the first dysmorphological description of the disease by Wheaton in late 19 th century, researchers have been extensively investigating the molecular background of CS 25 . The first causative gene for this condition was identified by Jabs et al. in 1993, who described a pathogenic variant within MSX2 in a family affected by autosomal dominant CS 26 . In the next four years, a few novel genes -FGFR1, FGFR2, FGFR3, TWIST1 -have been linked to premature fusion of the cranial sutures [27][28][29][30][31] . Recently, the development of high-throughput NGS-based strategies allowed for unravelling of the molecular basis of the condition at an unprecedented scale, as it happened for newly described variants, e.g. within ERF, SMAD6, and TCF12 [32][33][34] . However, in a significant percentage of cases, pathogenesis of the craniosynostosis is still unknown or only partially understood 21,35,36 . Thus, there is an unquestionable need of further research by means of WES or WGS to find novel genes or non-coding variants responsible for the development of CS in humans. In the diagnostic setting, however NGS-based panel approach appears to be a sufficient solution for mutational screening of all known causative genes or variants.
Here, we proved that a custom NGS panel designed by us represents a useful and effective tool in the molecular diagnostics of patients with CS. We investigated 16 unrelated patients and provided a diagnosis at a molecular level for 3 (19%) of them, demonstrating the high coverage and high quality of the sequencing data at the same time. In Patient 1 and 7, the pathogenic variants were previously described in individuals affected either by Crouzon or Pfeiffer syndromes. Clinical evaluation of our patients was consistent with the diagnosis of Pfeiffer syndrome [37][38][39] . Interestingly, the variant detected in Patient 7, who presented with complex CS, macrocephaly, prominent forehead, flat face, proptosis, and broad halluces may not only give rise to Pfeiffer or Crouzon syndromes with normal intellectual development, but also to a more severe cloverleaf skull phenotype with an early demise 40 . A broad phenotypic spectrum resulting from the same pathogenic variant suggests a possibility of other yet unidentified genetic or environmental modifiers, as indicated by Oldridge and colleagues 41 .
Patient 15 carried two pathogenic RECQL4 variants, both described as causative for Rothmund-Thomson syndrome (RTS) in osteosarcoma association study 42 . Both variants are very likely to occur in patient 15 in trans orientation, as they were identified in heterozygous state in two different probands 42 . Additionally, both p.Arg1021Pro and p.Pro103Leu RECQL4 alterations represent rare or extremely rare founder mutations already described in GnomAD/ExAC database. The variant p.Arg1021Pro was annotated in heterozygosity in 2 out of almost 280 thousand control alleles, while the variant p.Pro103Leu in 168 out of about 280 thousand alleles, clearly suggesting that those two SNVs do not represent a common haplotype. Importantly, none of the variants was found in homozygosity in a control healthy population, additionally indicating high likelihood of their pathogenicity. Unfortunately, we were unable to check the parental status for the RECQL4 mutations, as the parents disagree to undergo genetic testing. Interestingly, mutations within RECQL4 gene are linked to three distinct autosomal recessive conditions with overlapping phenotypes, i.e. Rothmund-Thomson, RAPADILINO, and Baller-Gerold syndrome (BGS), but only the last disorder comprises hearing loss in its clinical spectrum [43][44][45] . Considering lack of poikiloderma and the presence of hypoacusis in Patient 15, our final diagnosis was BGS. With clinical and molecular data presented here, we have broadened the phenotypic spectrum of previously reported RECQL4 alterations that may give rise to either RTS or BGS phenotype.
Although we confirmed the molecular diagnosis only in 3 out of 16 probands, the diagnostic yield of 19% is equal to some of the published research reports (21%) 21 . Importantly, we analysed only the patients with negative results of Sanger sequencing for TWIST1 alterations and the most common pathogenic hot-spot variants of FGFR1, FGFR2, FGFR3. Consequently, our diagnostic score was significantly lower than e.g. 62% reported by Paumard-Hernández et al., who did not perform any molecular prescreening in the analysed individuals 22 .
Our study, in which a cohort of CS patients was utilised as an example, demonstrated the usefulness of targeted gene panel sequencing in the diagnostics of complex, genetically heterogeneous conditions. To our knowledge, we were the first to adapt SureSelect hybridisation-based enrichment protocol for the sequencing on the Ion Torrent S5 platform, which is intended to work preferably with amplicon-based panels (f.e. AmpliSeq ® Thermo Fisher Scientific).
Agilent hybridisation technology has not been previously used on Ion Torrent S5 equipment, hence the total cost of the analysis is higher than the standard ThermoFisher Scientific procedure. This is due to the fact that our experiment was Table 2. Common SNVs associated with non-syndromic sagittal craniosynostosis included in craniosynostosis-associated genes panel (based on Justice et al. 46 ). www.nature.com/scientificreports www.nature.com/scientificreports/ focused on obtaining the most optimal quality of sequencing data and not on the cost reduction. The estimated price for the analysis was about 1.7 times higher per sample compared to the standard procedure, but optimisation of cost is certainly achievable. Although SureSelect hybridisation-based protocol is about 1.5 times more time consuming and represents a costlier alternative in comparison to amplicon-based approach, it provides several advantages, especially in the diagnostic setting, such as reduction of PCR-related edge artefacts, better and more exact matching of hybridisation probes. Consequently, it allows for obtaining higher specificity of the amplified region.
With the approach presented here, we achieved high molarity of both pooled libraries (756 and 525 pmol/l) and exceeded coverage of 50 reads in each sample for over 95% of the target gene sequences (173.794 kb), as well as an average coverage of 240 reads per gene across all samples. Importantly, we obtained full coverage for all of the exons within FGFR2, which was not possible in amplicon-based protocols (e.g. Ampliseq ® , Thermo Fisher Scientific). Moreover, we were able to include all of the listed genes, which was impossible using even an advanced made-to order option in Ion AmpliSeq Designer tool (Thermo Fisher Scientific).
In conclusion, we successfully adapted hybridisation-based SureSelect enrichment protocol for the Ion Torrent S5 platform, demonstrating that an alternative enrichment strategy for library preparations can be applied prior

Methods
All procedures involving human participants were performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Ethics approval was granted by the Institutional Review Board of Poznan University of Medical Sciences (no 742/17 obtained on 22 th June 2017). All patients and their parents agreed to participate in this study. This research involved human participants under the age of 18 years. Hence we obtained informed consents from parents and/or legal guardians. We present information or images that could lead to the identification of study participants. Accordingly, a specific consent has also been obtained from all parents and/or legal guardians for publication of identifying information/images in an online open-access publication.
Sample preparation. We extracted genomic DNA from the peripheral blood lymphocytes using the MagCore ® HF16 Automated Nucleic Acid Extractor and quantified each gDNA using the Agilent Technologies TapeStation 4200 and Genomic DNA ScreenTape systems. The custom panel designed by us comprised 61 genes and 11 SNPs (see Tables 1 and 2) known to be involved in the development of craniofacial malformations, including craniosynostosis, in human and mouse. Prior to NGS, we performed targeted molecular screening of all patients for the common hot-spot mutations located in exon 7 of FGFR1 (c.755C > G p.Pro252Arg), exons 7 and 8 of FGFR2 (c.755C < G p.Ser252Trp and c.758C > G p.Pro253Arg), and exon 7 of FGFR3 (c.749C > G p.Pro250Arg) as well as we analysed the entire coding sequence of TWIST1 by means of Sanger sequencing. For NGS, we used high-molecular DNA with a range of DNA Integrity Number (DIN) 6.8 to 9.6. In the next step, Ion Shear Plus reagent (Thermo Fisher Scientific) cut each genomic DNA sample (1 μg) into fragments of 50-250 bp. To obtain approximately 130 bp peaks, we adjusted the time of incubation at 37 °C to 50 minutes (step 1). Afterwards, we ligated each sample with Ion P1 Adapter and Ion Express barcode (Thermo Fisher Scientific). The ligation was as follows: 15 minutes at 25 °C, 5 minutes at 72 °C. We used a thermal cycler without a heated lid (40 °C) (step 2). Next, we proceeded to amplification of the adapter-ligated libraries through PCR reaction. To obtain an adequate yield for subsequent capture without introducing bias or non-specific products, we performed pre-capture PCR with Herculase II Fusion DNA Polymerase (Agilent Technologies) consisted of 8 cycles (step 3). After steps 1-3 we purified each sample with the use of Agencourt AMPure XP beads (Beckman Coulter Genomics), whereas after steps 1 and 3 we assessed the quality and quantity of samples on 4200 TapeStation using D1000 ScreenTape system (Agilent   , (a,c). Asymmetry of the skull and brain, including lateral ventricles, and enlargement of left subarachnoid space seen on horizontal section (d). Limb defect clinically recognized as bilateral split foot malformation with syndactyly of the remaining toes, extremely short and hypoplastic thumbs and 5 th fingers, short 5 th metacarpals and valgus deformity of the right 2 nd finger (e,f). Representation of the compound heterozygous RECQL4 deleterious variants c.308C > T p.Pro103Leu and 3062G > A p.Arg1021Gln detected in Patient 15 by means of targeted next-generation sequencing (g,i). Both pathogenic variants were confirmed with the use of Sanger sequencing (h,j). www.nature.com/scientificreports www.nature.com/scientificreports/ Technologies). In the first measurement of the samples (step 1) the obtained concentrations were between 0.229 and 6.39 ng/µl, while in the final phase (step 3) concentrations ranged from 32.2 to 69.4 ng/µl. Hybridisation and capture. We prepared the NGS libraries for Ion Torrent S5 platform using hybridised capture-based target enrichment approach (SureSelect) developed by Agilent Technologies. We performed hybridisation of 750 ng in 3.4 μl of each genomic DNA library using SureSelect Target Enrichment Reagent Kit according to manufacturer's protocol for <3 Mb capture libraries. After 17 hours of hybridisation at 65 °C, we captured the targeted molecules on streptavidin-coated magnetic beads (Dynabeads MyOne Streptavidin T1).
Post-hybridisation amplification and sample processing for multiplexed sequencing. We amplified purified SureSelect-enriched DNA libraries and non-template control through PCR (11 cycles) with the use of Herculase II Fusion Polymerase. Before assessing DNA quality and quantity with High Sensitivity DNA Assay on TapeStation System, we purified each sample using AMPure XP beads. Based on the evaluated concentration of SureSelect-enriched DNA libraries, we calculated the amount of each sample to be included in the pool using the following formula: volume of barcoded sample: V(f)xC(f)/nxC(i). V(f) is the final required/ needed volume of the pool (20 µl), C(f) is the initial concentration of all SureSelect-enriched DNA libraries in the pool, n is the number of samples to be combined, and C(i) is the initial concentration of each barcoded sample. To avoid the presence of additional fragments in each library, we size-selected our pools by agarose gel electrophoresis using the integrated E-Gel system (Thermo Fisher Scientific), purified them using Agencourt AMPure XP beads (Beckman Coulter Genomics) and finally validated using High sensitivity DNA assay on 4200 TapeStation. The molarity of pooled libraries was 756 and 525 pmol/l, respectively (see Supplementary Fig. 1). Since the Ion Chef requires concentration of a loaded pooled library to be 50 pM, we diluted our samples using low TE buffer.
Emulsion PCR and sequencing. We subjected 25 µl pooled libraries to emulsion PCR on the Ion Chef Instrument with the use of the Ion 520 ™ &530 ™ Kit, according to the manufacturer's protocol. Finally, we sequenced each loaded Ion 520 ™ chip on the Ion Torrent S5 System with the use of recommended reagents.
Sanger sequencing. We confirmed pathogenic variants using a conventional Sanger sequencing. We designed specific primers for the amplification using Primer3 tool (see Supplementary Table 1) and carried out the PCR reactions in a mixture containing the following substrates: DNA, 10 × PCR Premix J buffer, primers, H 2 O and DNA polymerase. The PCR products were purified with Exonuclease I and shrimp alkaline phosphatase. Sequencing of the PCR product was carried out using dye-terminator chemistry (kit v.3, ABI 3130XL) and run on automated sequencer Applied Biosystems Prism 3700 DNA Analyser.
Bioinformatic analysis. Reads were initially demultiplexed and aligned to GrCh37 human reference sequence using the TorrentBrowser 5.0.4 software (Thermo Fisher Scientific) running as embedded instance within Ion Torrent S5 sequencer. The resulting alignment BAMs were further processed using IonReporter 5.2 pipeline (Thermo Fisher Scientific), which incorporated variant calling. Estimation of coverage for individual genes/positions was conducted via bedtools 2.27.1 (coverage subcommand) against a BED file defining coding parts of canonical transcripts (RefSeq mapped on UCSC hg19 reference; 5 bp padding around each exon included; see Supplementary 2). Variant quality control was assessed based on a fourfold metric (read depth -greater than or equal to 20, strandedness -no more than 4:1 difference in reporting of the variant on opposite strands, PHRED quality of over 30, and variant proportion of not less than 15% of total reads). The existence of potentially significant variants was further reassessed through manual inspection of aligned reads in IGV 2.4 software.
Visualisation of variants within gene/protein sequence context was done using R/Bioconductor package trackViewer (1. 16.1, ran in R 3.4.1).