De novo large rare copy-number variations contribute to conotruncal heart disease in Chinese patients

Conotruncal heart anomalies (CTDs) are particularly prevalent congenital heart diseases (CHD) in Hong Kong. We surveyed large (>500 kb), rare (<1% frequency in controls) copy-number variations (CNVs) in Chinese patients with CTDs to identify potentially disease-causing variations. Adults who tested negative for 22q11.2 deletions were recruited from the adult CHD clinic in Hong Kong. Using stringent calling criteria, high-confidence CNV calls were obtained, and a large control set comprising 3,957 Caucasian and 1,945 Singapore Chinese subjects was used to identify rare CNVs. Ten large rare CNVs were identified, and 3 in 108 individuals were confirmed to harbour de novo CNVs. All three patients were syndromic with a more complex phenotype, and each of these CNVs overlapped regions likely to be important in CHD. One was a 611 kb deletion at 17p13.3, telomeric to the Miller–Dieker syndrome (MDS) critical region, overlapping the NXN gene. Another was a 5 Mb deletion at 13q33.3, within a previously described critical region for CHD. A third CNV, previously unreported, was a large duplication at 2q22.3 overlapping the ZEB2 gene. The commonly reported 1q21.1 recurrent duplication was not observed in this Chinese cohort. We provide detailed phenotypic and genotypic descriptions of large rare genic CNVs that may represent CHD loci in the East Asian population. Larger samples of Chinese origin will be required to determine whether the genome-wide distribution differs from that found in predominantly European CHD cohorts.


INTRODUCTION
Congenital heart diseases (CHDs) are the most common birth defect and a major cause of morbidity and mortality. 1 Conotruncal heart defects (CTDs) are CHDs affecting the cardiac outflow tract. These include tetralogy of Fallot (ToF), pulmonary atresia with VSD (PAVSD), truncus arteriosus, interrupted aortic arch, transposition of the great arteries and double outlet right ventricle. Overall, CTDs have an estimated prevalence of 11.6 per 10,000 live births. 2 Figures available for ToF, the most common CTD, indicate a prevalence of 2.7 per 10,000 live births in Europe 2 and 4.7 per 10,000 live births in the United States. 3 In Taiwan, the prevalence has been reported to be 6.26 per 10,000 live births. 4 Although Hong Kong does not have the prevalence data for CTD, in terms of the proportion of all CHD, pulmonary outflow tract obstruction alone accounts for 31.1% of all symptomatic Chinese neonates with CHD in Hong Kong, almost double the proportion reported in western studies, 5 implying that this group of congenital abnormalities may have greater representation in Asian populations.
Although it is thought that genetic factors have an important role in CHD, only about 11% of patients receive a genetic diagnosis. 1 In addition to chromosomal abnormalities and rare single-nucleotide variants, a growing proportion of the molecular diagnosis is attributed to rare copy-number variations (CNVs). Multiple studies show an increased burden of rare CNV in patients affected with ToF compared with controls, and several recurrent loci have been reported to be pathogenic for ToF, the most notable being the deletion responsible for 22q11.2 deletion syndrome. [6][7][8][9][10][11][12] Nevertheless, Chinese studies are lacking.
Recognising the importance of CTDs in Chinese patients, and the wide spectrum of CNVs contributing to CTDs in other cohorts, our aim was to discover important CNVs in a Chinese cohort, focusing on the large (>500 kb) CNVs, readily detectable on clinical microarray platforms. We hypothesised that large rare structural variants that overlap the coding sequence of one or more genes would be associated with CTDs, over and above the recurrent 22q11.2 deletion we previously showed to be prevalent in this cohort. 8 We present case examples characterising the lifetime phenotype of patients harbouring CNVs that are likely to be the most clinically significant.
Large rare CNVs
Using stringent criteria and methods (ref. 10), rare CNV calls were obtained for 108 (93.2%) of the 116 subjects and therefore these subjects were used as the CNV discovery cohort. Filtering for only CNVs found in <1% of platform-matched controls, a total of 306 rare autosomal CNVs were identified in these 108 subjects. The subsequent analysis focused on CNVs that were large (defined as >500 kb), rare (<1% in all platform-matched and ethnicity-matched control groups, total = 5,902), and overlapped the exons of genes. Ten large CNVs from ten patients met these criteria (Table 1). Of note, the sex distribution of these 10 patients was skewed, with 7 females and 3 males, although the overall cohort had an excess of male patients. None of the large CNVs identified overlapped the smaller (<500 kb) rare CNVs found in the sample of 108 patients. The clinical features of these 10 patients are summarised in Table 2.
Clinical cases and candidate genes of significant large rare CNVs
Of these 10 CNVs, three were de novo large rare CNVs found in syndromic patients. The first is a large duplication at 2q22.3 (Patient 1), the second a 5 Mb deletion at 13q33.3 (Patient 9) and the third a 611 kb deletion at 17p13.3 (Patient 10). Below we discuss in detail the phenotypic and molecular features of these three patients.
2q22.3 duplication overlapping the ZEB2 gene
Patient 1 (female, age 32 years) was born with ToF and patent foramen ovale. She had mild intellectual disability and adjustment disorder. She also has a history of dysfunctional uterine bleeding. On examination, she was dysmorphic, but did not fit the facial features of Mowat-Wilson syndrome. The most remarkable observation was that she had a hypernasal voice (Figure 1a).
A de novo 2.1 Mb copy-number gain was found at 2q22.3, which overlaps the ZEB2 gene. Deletions of this gene are a known cause of Mowat-Wilson syndrome (OMIM#235730), in which conotruncal heart disease has been described. 13,14 However, there have been no known reports of a duplication leading to the Mowat-Wilson syndrome phenotype. Such a large gain in copy number may disrupt regulatory elements and thereby alter the expression level of ZEB2, which would subsequently affect transcription downstream.
The ZEB2 gene (previously known as ZFHX1B, or SIP1) is widely expressed in human embryological development, including in the heart. 15 The protein encoded by this gene is SMAD-interacting protein-1 (ref. 16), which acts to activate or repress transcription by binding to regulatory sequences of E-boxes. 17 SMAD proteins are present in the cytoplasm to mediate transforming growth factor β (TGF-β) signals from cell-surface receptors into the nucleus, 18 and hence the TGF-β group of cytokines are likely to have an important role in embryological development, 18 including cardiac outflow tract formation. Loss of ZEB2 in the mouse is associated with variable heart defects, 19 similar to the cardiac anomalies with deletions of this gene in human Mowat-Wilson syndrome.
Table footnotes:
For all available parental samples, inheritance of CNV was determined and shown.
Candidate genes with potential importance in cardiac development are highlighted in bold font.
All genomic coordinates are provided using genome build hg19.
a Rarity for Group 1 controls (all CNVs are found in <1% of Group 2 Chinese controls), indicating very rare CNVs.
Conotruncal heart anomalies CCY Mak et al
A 13q33.3 deletion in a critical region for CHD
Patient 9 (female, age 21 years) has a complex cardiac anatomy of dextrocardia, PAVSD, and patent ductus arteriosus. She was born cyanotic and small for dates, with a birth weight of 2.5 kg. Neurological assessment up to 8 months was apparently normal; however, at age 15 months, motor and speech delays were noted. She also had a brain abscess complicated by focal seizures at age 21 months. The head circumference was below the 3rd centile even before the occurrence of the brain abscess, however, and remained at the same percentile after the resolution of this infection. Subsequently, there was overall slow progress in intellectual function and the patient attended a special school. On examination, she had mild facial dysmorphic features with low set ears and hypernasal speech (no clinical photos).
In this patient, a large de novo 4.9 Mb deletion was found at 13q33.3, detectable by G-banding karyotype performed after the microarray. The CNV overlaps 41 genes, notable among which are TFDP1, GAS6, COL4A1, COL4A2 and SOX1 (Figure 2). Given the size, and the known critical region, this CNV was clinically classified as pathogenic.
The 13q33.2-33.3 region has previously been proposed to be a critical region for CHD 20,21 (Figure 2). Huang et al. summarised deletions with different CHD phenotypes such as ToF, ostium secundum, 22 coarctation of the aorta, 23 interauricular communication, 23 pulmonary valve stenosis, 24 patent ductus arteriosus, 24 ASD/VSD 20 and single atrium. 20 Looking at conotruncal defects only, three cases of ToF have so far been reported to have very large deletions that overlap this critical region. 22,23 McMahon et al. also reported a case of double outlet right ventricle within this proposed critical region with a 2.5 Mb deletion at 13q33.3-33.4 (Figure 3). 25 They speculated that the collagen genes COL4A1 and COL4A2 are candidate genes for CHD as they are likely to have a role in cardiac development. The CNV found in our patient falls within the proposed critical region, and the patient has a complex phenotype of PAVSD, dextrocardia, patent ductus arteriosus and situs solitus. It is interesting that the small region of overlap with the CNV reported by McMahon et al. contains the genes COL4A1 and COL4A2, which agrees with their hypothesis. However, the 13q33.3 deletion of Patient 9 also overlaps a smaller 1.1 Mb deletion with single atrium reported by Yang et al. that did not include either of these two collagen genes. 21 Of note, in our cohort, there was another large copy-number gain at 13q33.2-33.3 (Patient 8), overlapping five genes (EFNB2, ARGLU1, DAOA-AS1, DAOA and LINC00460), further suggesting the potential importance of this region and ephrin-B2 signalling in CHD. 26 Summarising all the evidence so far, we posit that deletions of 13q33.3 are strongly associated with CHD but that narrow critical regions, or candidate genes, remain to be confirmed with more cases.
17p13.3 deletion telomeric to the critical region of Miller-Dieker Syndrome (MDS)
Patient 10 (female, age 27 years) is known to have PAVSD. She required multiple surgeries, including a left modified Blalock-Taussig (LmBT) shunt for left pulmonary artery stenosis at age 7 months. Subsequently, closure of VSD and right ventricular outflow tract reconstruction were performed at the age of 7 years. She is currently on warfarin due to atrial fibrillation. She had a history of epilepsy since age 17 years, initially presenting with petit mal seizures with normal EEG, controlled with sodium valproate. She studied in a special school because of mild intellectual disability. Her other medical history includes scoliosis, left hip dysplasia requiring total hip replacement, severe myopia and amblyopia, as well as primary amenorrhoea (diagnosed at age 17 years). On examination, she had dysmorphic facial features (Figure 1b).
The CNV analysis revealed a de novo 611 kb loss at 17p13.3 overlapping 10 genes, including NXN (Figure 3). The NXN gene is expressed in murine heart and there is evidence that loss of the NXN gene is associated with CHD in mice. 27 In humans, the gene is expressed in the heart and has a haploinsufficiency prediction score of 37.4%, 28 as well as an RVIS (Residual Variation Intolerance Score) 29 percentile of 11.77% (i.e., it is among the 11.77% of human genes most intolerant to variation). The role of NXN in the canonical Wnt/β-catenin signalling pathway through regulation of the dishevelled protein and the gene's subsequent role in second heart field signalling further suggest this gene's potential involvement in cardiac development. Notably, a similarly sized (544 kb) rare copy-number gain overlapping the NXN gene was reported by another group to be associated with ToF. 10 As PAVSD may be considered to be at the more severe end of the spectrum of the cardiac phenotype in relation to ToF, the loss of one copy at this locus may have a more profound effect on cardiac development than would a gain of copy. Isolated case reports indicate a link between CHD and large deletions in this region at the resolution of the chromosome band, including two from Taiwan and Japan with ToF 30 and PAVSD, 31 respectively, that overlap NXN. This suggests that a critical region for CHD may be located in the region of the NXN gene, in addition to the more proximal MDS region (Figure 3).

DISCUSSION
Large rare CNVs in conotruncal heart disease
A number of studies have explored the contribution of CNVs to conotruncal heart defects, particularly ToF. Most of these studies involved samples from predominantly Caucasian populations. 6,7,9,10,11,32 Although there is evidence to suggest an increased burden of large rare CNVs in ToF patients compared with the normal population, 6,10,11 few loci have been found to be recurrently associated with the disease, implying substantial genetic heterogeneity. 33 The most notable recurrent CNVs responsible for ToF are 22q11.2 deletions, followed by 1q21.1 duplications. 7,10,11 In this study of adult Chinese patients with CTDs, after excluding those with recurrent 22q11.2 deletions (15.3%), we discovered large rare CNVs in 10 of 108 (9.3%) patients; the previously reported 1q21.1 duplication involving the GJA5 gene was not found. 7,10,11 We were able to characterise the clinical features of these 10 patients by clinical re-evaluation and determine that three were syndromic. All three of these patients carried a de novo CNV harbouring genes potentially linked to cardiac development that would be considered pathogenic. The other seven large genic CNVs, all very rare and in patients with few or no reported extracardiac features, would represent variants of unknown clinical significance (VUS). For the two of these seven with parental samples available, the CNVs were inherited, as is often found for complex developmental conditions. 34
Possible ethnic-specific CNV distribution in Chinese
Whilst there appear to be some similarities and overlap of CNVs in our cohort compared with the literature, such as the prevalence of recurrent 22q11.2 deletions and overlap of a deletion at 5q35.3 (Supplementary Information S1), the distribution of other large rare CNVs in this Chinese CTD cohort seemed to differ to some extent from the published literature on ToF in the predominantly Caucasian samples studied.
Several publications reported recurrent large (over 1 Mb) rare duplications involving the GJA5 gene. 7,10,11 Absence of this CNV in our cohort could be due to insufficient sample size given the ~1% prevalence in European ToF cohorts. 7 Alternatively, 1q21.1 duplications may be rarer in East Asian populations. It is known that there can be considerable stratification of recurrent rearrangements between Asian and European populations, 35 and this may be the case for 1q21.1 duplications. Despite our study being the largest Chinese cohort of CTD reported to date, larger sample sizes will be needed to determine whether 1q21.1 CNVs are truly rarer in individuals of Chinese origin than in those of European descent.
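The sample-size argument can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and is not part of the study's analysis: it assumes that patients are independent, and that the ~1% carrier frequency reported for European ToF cohorts would apply unchanged to the 108 patients of the discovery cohort (which also includes PAVSD). The function name prob_no_carriers is hypothetical.

```python
def prob_no_carriers(n_patients: int, carrier_freq: float) -> float:
    """Probability of observing zero carriers among n independent patients.

    Assumes each patient carries the CNV independently with probability
    carrier_freq (a simple binomial model, used here for illustration).
    """
    return (1.0 - carrier_freq) ** n_patients


# Discovery cohort size from the text; 1% is the approximate European ToF figure.
p_zero = prob_no_carriers(108, 0.01)
print(f"P(no 1q21.1 duplication carriers in 108 patients) = {p_zero:.2f}")
```

Under these assumptions, a cohort of this size would contain no carrier roughly one time in three, so the absence of the duplication is compatible with the European frequency and does not by itself indicate a population difference.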
With respect to the rare de novo CNVs found, we speculate that CNVs in the greater 17p13.3 region may be an important cause of developmental disorders in the Chinese population. In general, CNVs in this region are non-recurrent and may lead to various phenotypes other than that of typical MDS. Supporting this possibility, our group has reported cases of 17p13.3 duplication (with aortic stenosis, microcephaly and dysmorphism) 36 and triplication (with split-hand malformation) 37 in Hong Kong. Larger studies of the Chinese population will be needed to better classify pathogenic CNVs in the 17p13.3 region, and to determine whether there are any ethnic differences in prevalence.

Advantages and limitations
Our study was carried out using two large groups of European controls, 10 and one of Chinese controls, 38 to adjudicate rarity of CNVs. The array platform and stringent criteria used for CNV calling have also been validated before. 10 Another advantage of this study is the utilisation of an adult cohort with the availability of long-term medical information. Data regarding both development and later medical complications could be obtained as a result, giving us more information on the potential phenotypic effects of these large rare CNVs over a long period of time. Performing this study on Han Chinese exclusively also meant that we could better begin to characterize the distribution of CNVs related to CHD in this ethnic group, as well as provide documentation of the clinical presentation for reference. Nevertheless, despite being the largest Chinese study so far of CNVs in CHD, our sample size was not sufficient for CNV burden analysis. Furthermore, not all parental samples were available for determination of de novo status of these CNVs.

CONCLUSION
This study of 116 Chinese patients with conotruncal heart defects identified ten large (>500 kb) rare genic CNVs, three of which were determined to be de novo and pathogenic and the others of less certain clinical significance. Notably, we did not find the previously reported recurrent 1q21.1 duplication in our cohort. The small sample size precludes concluding whether or not the distribution of rare CNVs differs in the Chinese population from that observed in European populations. We also provide further evidence for the pathogenicity of the 13q33.3 and 17p13.3 loci, involving the candidate genes COL4A1 and COL4A2, and NXN, respectively.

Patients and samples
A summary of the study design is provided in Figure 4. Adult patients (>18 years of age) with CTDs, but no prior genetic diagnosis, were prospectively recruited between February 2012 and July 2013 from the adult congenital heart disease clinic at Queen Mary Hospital, the only tertiary referral centre for CHDs in Hong Kong. All patients were self-reported to be Han Chinese. On a research basis, patients were identified by the attending cardiologist and recruited if a diagnosis of congenital CTD had been recorded. The diagnoses were determined from previous cardiology clinic notes, echocardiograms and operation records based on the cardiologist's experience. The standard care of these patients with CHD in Hong Kong does not involve any genetic testing. Using protocols approved by the local institutional review board, written informed consent was obtained for quantitative fluorescence PCR (QF-PCR), chromosomal genome-wide microarray analysis and publication of photos (Patient 1 and Patient 10).
Genomic DNA was extracted from whole blood samples. Patients were first screened for 22q11.2 deletions by QF-PCR followed by confirmatory fluorescence in situ hybridisation (FISH) using a standard probe. 8 Patients who tested positive for 22q11.2 deletions 8 (n = 21 of 137, 15%) were excluded. We included only the patients with ToF or PAVSD and excluded a further two cases, with persistent truncus arteriosus (n = 1) and interrupted aortic arch (n = 1), respectively, for a more homogeneous discovery cohort. Samples were sent for CNV analysis at The Centre for Applied Genomics (The Hospital for Sick Children, Toronto, ON, Canada) using the Affymetrix 6.0 SNP array.

Stringent CNV calling criteria and control groups
Using multiple CNV calling algorithms (Birdsuite, 39 iPattern 40 and Affymetrix Genotyping Console), stringent criteria were used to call the CNVs, 10 i.e., at least 10 kb in length, spanning 5 or more consecutive array probes and called by more than one algorithm. We compared the CNVs identified with two different groups of population controls and defined rare CNVs as those with a population frequency of <1% using a 50% reciprocal overlap criterion. First, we compared with CNVs analysed in the same way from 3,957 platform-matched controls, 10 which included CNVs from the Ottawa Heart Institute Coronary Heart Disease Study, n = 1,234; German PopGen Project, n = 1,123; 41 the Ontario Population Genomics Platform, n = 416; 10 and HapMap3 Project, 42 n = 1,184. As individuals of Chinese ancestry only formed a small proportion (4.8%) of these controls, as a second stage, we compared CNVs with 1,945 Chinese subjects from the Singapore SgD-CNV database 38 in an attempt to eliminate variants with a frequency >1% that were specific to the Chinese population.
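The calling and rarity criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the CNV record, the function names and the data layout are hypothetical, and the requirement that a case CNV and a control CNV be of the same type (gain or loss) to count as a match is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int          # bp, inclusive
    end: int            # bp, exclusive
    kind: str           # "gain" or "loss"
    n_probes: int = 0   # consecutive array probes spanned
    n_callers: int = 0  # number of algorithms that made the call


def passes_calling_criteria(cnv, min_len=10_000, min_probes=5, min_callers=2):
    """At least 10 kb, 5 or more probes, called by more than one algorithm."""
    return (cnv.end - cnv.start >= min_len
            and cnv.n_probes >= min_probes
            and cnv.n_callers >= min_callers)


def reciprocal_overlap(a, b, threshold=0.5):
    """True if the shared span covers >= threshold of BOTH CNVs."""
    if a.chrom != b.chrom or a.kind != b.kind:  # same-type match is an assumption
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared / (a.end - a.start) >= threshold
            and shared / (b.end - b.start) >= threshold)


def control_frequency(cnv, controls):
    """Fraction of control subjects carrying a matching CNV.

    controls maps a subject ID to that subject's list of CNV calls.
    """
    if not controls:
        raise ValueError("control group is empty")
    carriers = sum(
        any(reciprocal_overlap(cnv, c) for c in calls)
        for calls in controls.values()
    )
    return carriers / len(controls)


def is_rare(cnv, control_groups, max_freq=0.01):
    """Rare = frequency below 1% in every control group considered."""
    return all(control_frequency(cnv, g) < max_freq for g in control_groups)
```

Evaluating each control group separately mirrors the two-stage comparison in the text, so a variant common only in the Chinese controls would still be excluded.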

Interpretation of CNVs
Our analysis prioritised large (>500 kb) CNVs that overlapped genes since these are more likely to be of clinical significance. CNVs not observed in any of the total 5,902 controls were classified as very rare. Only autosomal CNVs were examined.
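As a rough sketch, the prioritisation rules above amount to a simple filter. The code below is illustrative only: the Interval type, the function names and the "chr"-prefixed chromosome labels are assumptions, and gene overlap is tested against exon intervals, following the description of the large rare CNVs as overlapping the exons of genes.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, exclusive


AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least 1 bp."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_large_genic_autosomal(cnv, exons, min_size=500_000):
    """Autosomal, larger than 500 kb, and overlapping at least one exon."""
    return (cnv.chrom in AUTOSOMES
            and cnv.end - cnv.start > min_size
            and any(overlaps(cnv, exon) for exon in exons))


def rarity_label(group_counts, max_freq=0.01):
    """Classify rarity from (carriers, group size) pairs, one per control group.

    "very rare": not observed in any control;
    "rare": below 1% in every control group;
    otherwise "not rare".
    """
    if all(carriers == 0 for carriers, _ in group_counts):
        return "very rare"
    if all(carriers / size < max_freq for carriers, size in group_counts):
        return "rare"
    return "not rare"
```

For example, a CNV absent from both the 3,957 platform-matched controls and the 1,945 Singapore Chinese controls would be labelled "very rare" by rarity_label([(0, 3957), (0, 1945)]).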
The pathogenicity of the CNVs in relation to CTD was analysed systematically using online databases. Each CNV and gene was manually interrogated for the likelihood of causing a cardiac disease phenotype. The databases used for CNV analysis included Database of Genomic Variants (DGV), 43 Online Mendelian Inheritance in Man (OMIM), Decipher 44 and ISCA. 45 Each gene overlapped by the CNV was studied using evidence from PubMed, OMIM, Mouse Genome Informatics (MGI) 46 and ZFIN 47 to elucidate any potential relevance to congenital heart anomalies and identified as candidate genes. Genomic parameters used were from GRCh37/hg19.

Validation of clinically relevant CNVs
CNVs classified as putatively clinically relevant were validated using a NimbleGen CGX-135K oligonucleotide array (Signature Genomics, Roche NimbleGen, Madison, WI, USA) or a 60K CGX v.2 oligonucleotide array (manufactured by Agilent Technologies, Santa Clara, CA, USA; PerkinElmer, Turku, Finland), as described previously 48 and by the manufacturers. These microarray platforms were used for clinical diagnostic purposes in our institution. The NimbleGen array has an average resolution of 140 kb, with a higher resolution (40 kb or less) in regions thought to be of significance in human development, while the Agilent array has an average resolution of 190 kb, with a higher resolution of 28 kb in the respective regions of focus. The Genoglyphix software (Signature Genomics, Spokane, WA, USA) was used to analyse and annotate the results. Only a limited number of parental samples were obtained when calling back the patients, and trio aCGH testing was offered where possible. Several large rare CNVs detected on the Affymetrix 6.0 platform could not be validated by the NimbleGen CGX-135K platform or quantitative real-time PCR. These CNVs were either in regions with a high density of segmental duplications or in close proximity to the centromere. To eliminate these false positives, all large rare CNVs with >70% overlap with segmental duplications, or located near the centromere (e.g., 4p11), were also excluded from this study.
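The segmental duplication exclusion rule can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, all intervals are assumed to lie on the same chromosome as the CNV, and proximity to the centromere is not modelled because the text gives no distance threshold.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def segdup_fraction(cnv_start, cnv_end, segdups):
    """Fraction of the CNV covered by segmental duplications.

    segdups is a list of (start, end) pairs on the same chromosome;
    annotations are clipped to the CNV and merged so that overlapping
    duplications are not counted twice.
    """
    clipped = [
        (max(s, cnv_start), min(e, cnv_end))
        for s, e in segdups
        if s < cnv_end and e > cnv_start
    ]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / (cnv_end - cnv_start)


def exclude_as_likely_false_positive(cnv_start, cnv_end, segdups, max_fraction=0.70):
    """Apply the >70% segmental duplication overlap exclusion."""
    return segdup_fraction(cnv_start, cnv_end, segdups) > max_fraction
```

Merging before summing matters because segmental duplication annotations frequently overlap one another, and a naive sum of overlaps could exceed the length of the CNV.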
Detailed phenotyping of patients with large, rare CNVs
Patients with potentially clinically relevant CNVs were called back for reassessment by a geneticist and given genetic counselling by a genetic counsellor. Lifetime medical history was reviewed and examination was performed for these patients to systematically characterise any syndromic (two out of three features of learning difficulties, global dysmorphic facial features, hypernasal voice 49 ) or other extracardiac features. This encompassed birth history, medical history including thyroid function, hearing and immunodeficiency, as well as speech, learning and behavioural difficulties. A family history of CHD, intellectual disability and congenital anomalies was also obtained.

Ethics statement
Ethical approval was obtained from the Institutional Review Board of the University of Hong Kong and Hospital Authority of Hong Kong West Cluster.