A complex intragenic rearrangement of ERCC8 in Chinese siblings with Cockayne syndrome

Cockayne syndrome is an autosomal recessive disorder principally characterized by postnatal growth failure and progressive neurological dysfunction, due primarily to mutations in ERCC6 and ERCC8. Here, we report our diagnostic experience for two patients in a Chinese family suspected on clinical grounds to have Cockayne syndrome. Using multiple molecular techniques, including whole exome sequencing, array comparative genomic hybridization and quantitative polymerase chain reaction, we identified compound heterozygosity for a maternal splicing variant (chr5:60195556, NM_000082:c.618-2A > G) and a paternal complex deletion/inversion/deletion rearrangement removing exon 4 of ERCC8, confirming the suspected pathogenesis in these two subjects. Microhomology (TAA and AGCT) at the breakpoints indicated that microhomology-mediated FoSTeS events were involved in this complex ERCC8 rearrangement. This diagnostic experience illustrates the value of high-throughput genomic technologies combined with detailed phenotypic assessment in clinical genetic diagnosis.

did not detect homozygous or compound heterozygous pathogenic variants, although it did reveal a variant presumed to affect splicing in ERCC8. Karyotyping and aCGH (Agilent SurePrint G3 4 × 180 K CGH+SNP array) were performed for both patients, but no pathogenic genomic imbalance or uniparental disomy (UPD) was detected, so no precise genetic diagnosis could be given.

Pathogenic variant and family co-segregation analysis.
To complete genetic diagnosis and counseling for this family, we pursued WES for the two affected siblings (II:6 and II:10) in an attempt to identify the genetic cause of this as yet undiagnosed disorder.
An average of 55.80 million sequence reads was generated at an average 49.37-fold coverage, and more than 97.76% of the reads were aligned to hg19, providing sufficient depth to call single nucleotide polymorphisms (SNPs) and deletions/insertions (del/ins). Common variants reported in dbSNP138 or in the 1000 Genomes Project with minor allele frequency (MAF) ≥ 0.005 were excluded. The Exome Aggregation Consortium (ExAC) database was used to confirm the novelty of single nucleotide variants (SNVs). Given the similar characteristics of the two siblings, shared rare variants were collected and uploaded to Ingenuity Pathways Analysis (IPA) software for genotype-phenotype analysis. A total of 826 rare variants were shared between the two patients. To narrow these to a manageable number, we set key phenotypes including "photosensitivity", "intellectual and developmental disabilities", "dwarfism", "hearing loss" and "retinitis pigmentosa" in the IPA system. According to the pedigree, autosomal recessive and X-linked recessive inheritance were both possible models for this family. Recently, the American College of Medical Genetics and Genomics (ACMG) developed standards and guidelines for the interpretation of sequence variants: the ACMG recommends five standard terms (pathogenic, likely pathogenic, uncertain significance, likely benign and benign) to describe any individual variant identified in genes known to cause Mendelian disorders 8. We strictly followed ACMG guidelines to classify pathogenic/likely pathogenic variants from WES, with null variants (nonsense, frameshift, canonical splice sites, initiation codon, single exon or multiexon deletion) and predicted deleterious missense variants receiving preferential consideration. With this strategy, 23 heterozygous variants on autosomes and one variant on the X chromosome were chosen for co-segregation analysis using all family members (Supplementary Table S1).
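The frequency filter and the "shared between both siblings" step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, the variant keys and the MAF values are all hypothetical, and only the 0.005 cutoff comes from the text.

```python
from dataclasses import dataclass

# Cutoff taken from the text: variants with MAF >= 0.005 are treated as common.
MAF_CUTOFF = 0.005


@dataclass(frozen=True)
class Variant:
    key: str        # hypothetical identifier such as "gene:change"
    max_maf: float  # highest MAF seen in the population databases consulted


def rare_keys(variants):
    """Return the keys of variants whose population frequency is below the cutoff."""
    return {v.key for v in variants if v.max_maf < MAF_CUTOFF}


def shared_rare(sibling_a, sibling_b):
    """Rare variants present in both affected siblings."""
    return rare_keys(sibling_a) & rare_keys(sibling_b)


# Toy call sets (invented values, for illustration only).
sib_1 = [Variant("geneA:v1", 0.0), Variant("geneB:v2", 0.2), Variant("geneC:v3", 0.001)]
sib_2 = [Variant("geneA:v1", 0.0), Variant("geneB:v2", 0.2), Variant("geneD:v4", 0.0)]

candidates = shared_rare(sib_1, sib_2)
print(sorted(candidates))
```

In this toy input, `geneB:v2` is dropped as common and `geneC:v3` and `geneD:v4` are dropped because only one sibling carries them.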
However, all but one of these candidate variants were excluded by co-segregation analysis: ERCC8 c.618-2A > G (NM_000082), a potential splicing variant, was present in both siblings (viewed by NextGENe software in Fig. 2a). Subsequent Sanger sequencing in the family members confirmed that these two siblings and one normal sister (II:4) inherited this heterozygous variant from their mother, while the father and two other unaffected sisters have the reference sequence at this site (Fig. 2b).
Splicing variant leads to deletion of three amino acid residues. To test whether the identified c.618-2A > G (NM_000082) variant affects pre-mRNA splicing, we generated cDNA from RNA isolated from peripheral blood of the two patients and their parents. ERCC8 reverse transcription polymerase chain reaction (RT-PCR) amplification products were cloned and sequenced. We found that this splicing variant resulted in the absence of the first 9 bp (TGCTGACAG) of exon 8 from the mRNA (Fig. 2c), corresponding to the loss of three amino acids (ADS, NP_000073) at positions 207 to 209 within the fourth WD motif of the ERCC8 (CSA) protein.
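The in-frame nature of this 9 bp loss can be checked with simple coordinate arithmetic. The sketch below assumes standard HGVS coding numbering (c.1 is the A of the start codon, so codon n covers c.3n-2 to c.3n) and takes c.618 as the first base of exon 8, as implied by the c.618-2 notation; the helper name `codon_of` is illustrative.

```python
def codon_of(cds_pos):
    """Codon number containing a 1-based coding-sequence position."""
    return (cds_pos + 2) // 3


deleted = "TGCTGACAG"                             # first 9 bp of exon 8, from the text
first_deleted = 618                               # c.618, first base of exon 8
last_deleted = first_deleted + len(deleted) - 1   # c.626

in_frame = len(deleted) % 3 == 0
residues_lost = len(deleted) // 3

# Complete codons that lie entirely inside the deleted stretch.
codon_207 = deleted[1:4]   # c.619-c.621
codon_208 = deleted[4:7]   # c.622-c.624

print(in_frame, residues_lost)
print(codon_of(first_deleted), codon_of(last_deleted))
print(codon_207, codon_208)
```

Under these assumptions the deleted bases run from the third position of codon 206 to the second position of codon 209. The two complete codons inside the stretch, GCT and GAC, encode alanine and aspartate in the standard genetic code, and the trailing AG matches the first two bases of the serine codons AGT/AGC, in line with the reported loss of ADS at positions 207 to 209.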
Complex ERCC8 deletion/inversion/deletion rearrangement detected by qPCR and 1 M aCGH. We hypothesized that an undetected intragenic deletion in ERCC8 combined with the splicing variant to generate the CS phenotype. Accordingly, both exon-targeted qPCR and high-density 1 M aCGH were performed to test this possibility. Using primers targeting each exon of ERCC8, we detected a possible deletion (Supplementary Fig. S1). Meanwhile, 1 M aCGH conducted for the proband detected an abnormal log ratio for two probes (chr5:60212688-60219701, 7 kb, green dots in Fig. 3a) in ERCC8, suggesting the possibility of a small intragenic deletion. Long-range PCR with a series of deletion-specific primers (red arrowheads in Fig. 3a, Fm and Rm primers in Supplementary Table S2) was then performed in all family members to confirm and map this intragenic deletion. Fragment size analysis indicated a deletion of 3.8 kb in the two affected patients (Fm5 and Rm4 primers in Supplementary Table S2), the father (I:3) and one sister (II:7), but not in the mother or the two other sisters (II:4 and II:8) (Fig. 3b).
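The co-segregation pattern reported for the two ERCC8 variants can be summarized as a small set computation. This is only a sketch of the reasoning: the carrier lists are read from the text, the mother is labeled "mother" because her pedigree identifier is not given in this section, and treating II:7 and II:8 as the two sisters with the reference sequence at the splice site is an inference from the individuals named here.

```python
# Carriers of the maternal splicing variant c.618-2A>G, as reported in the text.
splice_carriers = {"mother", "II:4", "II:6", "II:10"}

# Carriers of the paternal exon 4 rearrangement, as reported in the text.
deletion_carriers = {"I:3", "II:7", "II:6", "II:10"}

affected = {"II:6", "II:10"}
family = {"mother", "I:3", "II:4", "II:6", "II:7", "II:8", "II:10"}

# Individuals carrying both variants (compound heterozygotes).
compound_het = splice_carriers & deletion_carriers

# Individuals carrying exactly one variant (unaffected carriers in a recessive model).
single_carriers = splice_carriers ^ deletion_carriers

# Individuals carrying neither variant.
non_carriers = family - splice_carriers - deletion_carriers

print(sorted(compound_het))
print(sorted(single_carriers))
print(sorted(non_carriers))
```

Only the two affected siblings carry both variants, which is the pattern expected for compound heterozygosity under autosomal recessive inheritance.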
To confirm the breakpoints of this intragenic ERCC8 deletion, we purified the deletion-specific PCR product from an agarose gel and sequenced it (using the Fsq, Rsq, Fm5 and Rm4 primers in Supplementary Table S2). These sequencing data revealed a complex deletion/inversion/deletion rearrangement (chr5:60211534-60217114) in which two distinct deletions (chr5:60211534-60213756, which removes exon 4, and chr5:60212086-60217114) flank a 1670 bp inverted segment (chr5:60212086-60213756). The sequence characteristics surrounding the four breakpoints revealed microhomologies (TAA on chr5:60211534/60213756; AGCT on chr5:60212086/60217114; see Fig. 3c,d) as potential factors contributing to the ERCC8 rearrangement. We also performed cDNA sequencing for all available family members, and confirmed the skipping of exon 4 in the proband, sibling and father (Supplementary Fig. S2).
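The segment sizes implied by these coordinates can be worked out directly. The sketch below is one possible reading, not a statement of the authors' method: it measures lengths as end minus start, and it assumes that everything inside the outer span other than the inverted segment is lost from the rearranged allele.

```python
# Coordinates (hg19, chr5) copied from the text.
outer_start, outer_end = 60211534, 60217114          # whole rearranged region
inverted_start, inverted_end = 60212086, 60213756    # retained, inverted segment

outer_span = outer_end - outer_start                 # region affected overall
inverted_len = inverted_end - inverted_start         # segment kept in reverse orientation

# Assumption: sequence on either side of the inverted segment is deleted.
left_loss = inverted_start - outer_start
right_loss = outer_end - inverted_end
net_loss = left_loss + right_loss

print(outer_span, inverted_len)
print(left_loss, right_loss, net_loss)
```

The inverted length reproduces the 1670 bp stated in the text. Under the stated assumption, the net loss comes to 3910 bp, which is of the same order as the roughly 3.8 kb size difference seen in the long-range PCR fragment analysis.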

Discussion
ERCC8 (encoding CSA protein) is located on chromosome 5q12.1 and encodes a 44-kDa protein of 396 amino acids with seven predicted WD-40 repeat motifs 5,9. CSA interacts with CSB (the product of ERCC6) to recruit other repair factors (XAB2, HMGN1, TFIIS) to the repair site during repair of UV-generated DNA damage 10-13. Mutations in ERCC8 account for about 35% of CS cases and comprise nonsense, missense, frameshift and splicing mutations, along with dosage imbalance due to small insertions and deletions 7,14. In the current study, we investigated two boys with growth failure, neurological impairment, microcephaly, short stature, hearing and sight loss, and photosensitivity, the characteristics of classical CS (CS type I). However, a genetic diagnosis was not given to the family until the proband was 13 years old. We performed multiple genetic testing modalities for the affected siblings to finally achieve a precise molecular diagnosis for this family. The affected siblings carry a splicing variant and a complex exonic deletion of ERCC8, which were inherited from the mother and the father, respectively. Both variants were interpreted as "pathogenic" in a recessive mode, according to the guidelines of ACMG based upon the following criteria: absent in population databases, predicted as null/deleterious variants, co-segregating with disease in multiple affected family members, presence in trans 8   been reported repeatedly in Caucasian patients, and the transcript analysis has confirmed that it also produces mis-splicing that removes the same amino acids (ADS) within the same WD motif 7,15. We reviewed CS case reports and found that all causative missense mutations of ERCC8 identified in CS patients are located in the WD repeats 7,16, implying the importance of WD motifs for building beta propeller structures and protein-protein interactions.
In vitro functional analysis has also indicated that a missense mutation (A205P) in the fourth WD motif affects its interaction with other proteins (DDB1) 17. Besides SNVs, exonic deletions account for 16.7% (7/42) of ERCC8 mutations in CS 7. Examples include a deletion containing exon 1 and upstream regulatory sequences in a CS patient reported by Ren et al. 18 and double deletions in a CS patient composed of a paternally inherited deletion covering exon 4 and a maternal whole-gene deletion of ERCC8 14. In our patients, the deletion of exon 4 was so small that it was easily missed during routine aCGH testing, because few probes cover each exon of the ERCC8 region on the particular arrays used. This initiated a long diagnostic odyssey for this family.
With the dramatic improvements in genomic technologies and the significantly decreased price of sequencing, high-throughput whole genomic testing has been accepted as the most comprehensive and valuable testing approach for clinical diagnostic laboratories 19-21. In general, high-throughput whole genomic testing includes whole genomic aCGH and WES. The former can detect genomic imbalances more effectively than FISH or karyotyping 22,23, and the latter can detect SNVs and del/ins in coding sequences across the genome. Compared with candidate gene or gene panel sequencing, WES provides an unbiased approach to affordably screen a patient's entire exome to establish the genetic basis of disease. These high-throughput genomic technologies have increased the diagnostic yield among individuals with unexplained developmental disorders, and rapidly revolutionized molecular diagnostic strategy, enabling physicians and patients to move to more accurate diagnostics and appropriate treatment (precision medicine). For patients with unexplained autism spectrum disorder (ASD), the clinical use of chromosomal microarray analysis (CMA) has dramatically increased the rate of diagnosis from 2.23% to 7% 24. Recently, the Scherer group combined CMA and WES assays to study the genomic basis for a heterogeneous sample of ASD patients; they found that the combined molecular diagnostic yield was 15.8%, higher than the diagnostic yield of WES (8.4%) or CMA (9.3%) alone 25. However, these high-throughput genomic testing approaches cannot solve all diagnostic puzzles if the geneticist does not fully integrate the information from these different techniques with accurate and complete clinical data. The identification of precise/unique symptoms is a key step in phenotype-genotype interpretation during WES.
When we initially performed data analysis in the IPA system, we used the symptoms "intellectual and developmental disabilities", "dwarfism", and "hearing loss" but failed to identify the mode of pathogenesis because these symptoms are seen in many genetic/genomic disorders. In the second round of interpretation, we used two more specific symptoms (photosensitivity and retinitis pigmentosa) for data analysis, reducing the candidate variant pool sufficiently to permit testing of co-segregation in family members. Our results indicate the value of repeated and careful review of the patient's detailed phenotype to prioritize and ultimately identify causative pathogenic variants in clinical WES testing.
Notably, deletion of ERCC8 has been identified in the normal population, as evidenced by the Database of Genomic Variants track in the UCSC Genome Browser, indicating no evidence of a clinical phenotype caused by haploinsufficiency. We compared the frequency of intragenic ERCC8 deletions from one large control cohort (dgv9811n54, dgv9810n54) 26 and one CS patient cohort (Laugel's study) 7. ERCC8 deletion was significantly more frequent in CS patients than in controls (7/42 vs. 12/8329, p < 0.001, two-tailed Fisher's exact test). This emphasizes that, while ERCC8 deletion is not recognized as a pathogenic CNV in the normal population, it is involved in the pathogenesis of CS when it co-occurs in trans with another pathogenic ERCC8 variant. While in the process of submitting this paper, we identified another unrelated CS patient carrying the exact same exonic deletion of ERCC8 as the two patients in this study (data not shown), arguing that otherwise benign ERCC8 deletions in the general population represent a risk for causing CS in a recessive model of inheritance.
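The comparison of deletion frequencies can be reproduced with a two-tailed Fisher's exact test on the 2 × 2 table built from the counts quoted above (7 of 42 versus 12 of 8329). The following is a minimal pure-Python sketch using the hypergeometric distribution; the function name and the tolerance used when comparing probabilities are illustrative choices, and the software used in the study is not stated in the text.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(a)
    tol = 1 + 1e-7  # guard against floating-point ties
    return sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * tol)


# Counts from the text: deletions / non-deletions in CS cohort and controls.
cs_del, cs_total = 7, 42
ctrl_del, ctrl_total = 12, 8329

p_value = fisher_two_sided(cs_del, cs_total - cs_del, ctrl_del, ctrl_total - ctrl_del)
print(p_value < 0.001)
```

Because exact integer binomial coefficients are used, the calculation stays accurate even though the individual table probabilities are extremely small.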
Copy number variants (CNVs) are generated by a variety of genomic rearrangements 27-34. Non-allelic homologous recombination (NAHR) between low-copy repeats can cause recurrent rearrangements and lead to genomic disorders, such as DiGeorge syndrome and Williams-Beuren syndrome 35. Non-homologous end-joining (NHEJ) is another mechanism contributing to chromosomal abnormality via the formation and repair of DNA double-strand breaks (DSBs), and is one of the key mechanisms underlying non-recurrent rearrangements 36. DNA replication fork stalling and template switching (FoSTeS) is a recently identified mechanism for non-recurrent and especially complex rearrangements due to faulty DNA replication. A short stretch of microhomology is a key feature of the FoSTeS process. The molecular mechanistic details of FoSTeS have been provided in the microhomology-mediated break-induced replication (MMBIR) model. The DNA replication FoSTeS/MMBIR mechanism can generate complex genomic, genic and exonic rearrangements in humans 37. FoSTeS has been implicated in many pathogenic genomic rearrangements, such as non-recurrent intragenic NRXN1 deletion 38 and duplication/deletion in PLP1 associated with Pelizaeus-Merzbacher disease (PMD) 39. In the current study, we mapped this complex ERCC8 intragenic rearrangement (deletion/inversion/deletion) at the nucleotide level and found microhomologies (TAA and AGCT) around the breakpoint sites, supporting the view that two FoSTeS events were involved in generating the ERCC8 complex rearrangement.
In summary, we describe the clinical and genetic characterization of two affected boys with CS type I in a Chinese family. Following the guidelines for the interpretation of sequence variants of the ACMG, we identified compound heterozygosity, including a paternal intragenic rearrangement and a maternal splicing variant in ERCC8, as the cause of CS pathogenesis in these individuals. In the clinical diagnosis of many rare neurological diseases, combined utilization of careful clinical phenotyping with high-throughput genomic technologies, such as WES and aCGH, can increase the diagnostic yield and provide much-needed answers for families faced with these disorders.

Methods
Ethics statement. This study was performed in accordance with the Declaration of Helsinki and approved by the ethics committee of Capital Institute of Pediatrics (SHERLL 2015069). Written informed consent was obtained from the patients' guardian/parent/next of kin for the publication of this clinical information and any accompanying images.
aCGH. Genomic DNA of the proband and all available family members was extracted from peripheral blood using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Two aCGH arrays (4 × 180 K and 1 M) (Agilent Technologies, Palo Alto, CA, USA) were used to detect genomic imbalance according to previously published methods 40. aCGH data were analyzed using DNA CytoGenomics software (Agilent Technologies, Palo Alto, CA, USA).
Whole exome sequencing and variant analysis. Genomic DNA of the two patients was fragmented by Covaris S2 (Covaris, Massachusetts, USA) to 200-300 bp. The paired-end libraries were prepared following the Illumina protocol. Whole exome sequences were captured from the genomic DNA using the Agilent SureSelect V5 Enrichment kit (Agilent, Cedar Creek, TX). The exon-enriched DNA libraries were sequenced with 100 bp paired-end reads on a HiSeq 2000 sequencer (Illumina, San Diego, California). Raw image files were processed by the Illumina Pipeline for base calling using default parameters.
After image analysis and base calling were conducted using the Illumina Pipeline, raw data were transferred into FASTQ format and filtered to generate "clean reads" by removing adapters and low quality reads (Q20

qPCR. To test whether the affected patients (II:6, II:10) had the ERCC8 intragenic deletion, qPCR was performed using the 7500 Fast Real-Time PCR System with GAPDH as the internal reference gene for each sample. Two unrelated testing samples without ERCC8 deletion (validated by 244k aCGH beforehand) were used as controls.
The qPCR was carried out in the presence of SYBR Green, measuring the fluorescence signal produced by the binding of SYBR Green to the studied amplicons compared with the control. The qPCR primers for reference and target genes are in Supplementary Table S3.
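One common way to turn such SYBR Green measurements into a relative copy number is the comparative Ct (2^-ΔΔCt) calculation, sketched below. The text does not state which quantification model was used, so this is an assumption for illustration; it also assumes roughly 100% amplification efficiency for both amplicons, and the Ct values are invented.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control):
    """Relative dosage of a target amplicon by the comparative Ct method.

    Each target Ct is normalized to the reference gene (GAPDH in the text)
    and then compared with a control sample that has no deletion.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the exon 4 amplicon crosses threshold one cycle
# later in the patient than in the control after GAPDH normalization.
ratio = relative_copy_number(
    ct_target_sample=26.0, ct_ref_sample=20.0,
    ct_target_control=25.0, ct_ref_control=20.0,
)
print(ratio)
```

A ratio near 0.5 relative to a two-copy control would be the pattern expected for a heterozygous deletion of the targeted exon, while a ratio near 1.0 would indicate normal dosage.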
Scientific Reports | 7:44271 | DOI: 10.1038/srep44271

Long-range PCR and breakpoint mapping on the junction sequence. Long-range PCR was carried out to amplify the truncated fragment covering exon 4 of ERCC8 and map out the characteristics of breakpoints (Platinum PCR SuperMix High Fidelity kit, Invitrogen, Carlsbad, CA). Using a series of primers around the approximate breakpoints (Supplementary Table S2), the junction fragments were amplified successfully and were visualized on a 1% agarose gel. The truncated fragment was purified from the gel (QIAquick gel extraction kit, Qiagen, Valencia, CA) and sequenced (ABI 3700).

RT-PCR, Cloning and Sequencing.
Total RNA was isolated from fresh peripheral blood using the RNeasy mini kit (Qiagen, Hilden, Germany) and was reverse transcribed with the ProtoScript® First Strand cDNA Synthesis Kit (New England Biolabs Inc., MA, USA). PCR amplification of cDNA spanning exons 3-12 was performed using the primer pair 5′-CACATGTAAAGCAGTGTGTTCC-3′ and 5′-GCATTTCATGTTTAAGCCAGATT-3′. Products were visualized on a 1% agarose gel, purified using the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany), and then cloned into a pUC19 vector for Sanger sequencing following the manufacturer's protocol. A mixture of DNA from 10 normal children was used as the control.