Copy number variants implicate cardiac function and development pathways in earthquake-induced stress cardiomyopathy

The pathophysiology of stress cardiomyopathy (SCM), also known as takotsubo syndrome, is poorly understood. SCM usually occurs sporadically, often in association with a stressful event, but clusters of cases are reported after major natural disasters. There is some evidence that this is a familial condition. We have examined three possible models for an underlying genetic predisposition to SCM. Our primary study cohort consists of 28 women who suffered SCM as a result of two devastating earthquakes that struck the city of Christchurch, New Zealand, in 2010 and 2011. To seek possible underlying genetic factors we carried out exome analysis, genotyping array analysis, and array comparative genomic hybridization on these subjects. The most striking finding was the observation of a markedly elevated rate of rare, heterogeneous copy number variants (CNV) of uncertain clinical significance (in 12/28 subjects). Several of these CNVs impacted on genes of cardiac relevance including RBFOX1, GPC5, KCNRG, CHODL, and GPBP1L1. There is no physical overlap between the CNVs, and the genes they impact do not appear to be functionally related. The recognition that SCM predisposition may be associated with a high rate of rare CNVs offers a novel perspective on this enigmatic condition.

two main events precipitated large case clusters of SCM [17][18][19][20][21][22] . Unusually for a major natural disaster, the tertiary hospital in Christchurch continued to function, allowing the collection of a relatively large homogenous cohort of cases which have been followed over several years [18][19][20][21][22][23] . Most research around this disorder has focused on sporadic SCM associated with heterogenous triggers 4,24,25 . Although the presentation of earthquake-associated SCM (EqSCM) appears to be similar to that of sporadic cases, a key difference is the homogenous nature of the trigger.
Various mechanisms have been postulated for takotsubo cardiomyopathy, including that the syndrome arises from stunning of the heart muscle (myocardium) as a result of either ischemia from spasm of the coronary arteries, or from the direct effect of catecholamines (dopamine, adrenaline or noradrenaline) on cardiac myocytes 4,24,26,27 . Despite suggestive pathophysiological observations and theories, most authors conclude that the aetiology of SCM is poorly understood, and we do not yet have satisfactory explanations for the origins of this condition 12,24,26,[28][29][30] . Some retrospective case series have suggested that the incidence of SCM is increased in patients with anxiety conditions, but in our studies we did not find any correlation with psychiatric or anxiety disorders 19,23 .
Amongst the models that may be proposed for SCM aetiology, it is worth considering the possible contribution of genetic factors. Many forms of cardiomyopathy have genetic origins 31,32 . Hypertrophic cardiomyopathy is the most common form of familial heart disease and a leading cause of sudden cardiac death. It is inherited in an autosomal dominant Mendelian manner with variable expressivity and age-related penetrance 31 . These cardiomyopathies show considerable genetic heterogeneity, with cases now attributed to some 1400 mutations in 11 genes, all of which contribute to cardiac sarcomere function. Familial dilated cardiomyopathy is also frequently attributable to an underlying genetic predisposition and at least 50 genes have now been implicated, with most eliciting disease as dominant mutations 32 .
Evidence for genetic contributions to SCM are not as strong as for other cardiomyopathies. However, there are several examples of familial occurrence of SCM involving siblings [33][34][35] or mother-daughter pairs [36][37][38][39][40] , and a large Swedish study of SCM identified three families in which several close relatives developed the condition 41 . The overall rarity of SCM would suggest that these familial clusters are significant, and it is quite possible that more overt familial relationships in this disorder are obscured by the simultaneous requirement for two key circumstances (in most cases): post-menopausal status and environmental exposure to a sudden major stressful event. Occasional cases of SCM occur in younger women or males, and a proportion of patients report no preceding stressor 42 , suggesting that an intrinsic pathogenic mechanism is involved. The recurrence of SCM in some patients, including one Christchurch EqSCM case 22 , also implies a biological vulnerability.
These observations have prompted consideration of genetic susceptibility to this condition 37,40,43,44 . Until recently, genetic studies were restricted to candidate gene analysis in case series of sporadic SCM patients [45][46][47][48][49] , but these have yielded mainly negative findings. One candidate gene study reported a significant difference in the frequency of a GRK5 polymorphism in cases 50 , but this has not been replicated and past history of single gene association studies suggests it is unlikely to be meaningful 51 . More recently, another candidate gene study has implicated estrogen receptor genes as potential risk factors for SCM 52 . In an effort to capture genome-wide data, exome sequencing [53][54][55][56] was recently applied to a sample of sporadic SCM cases 40 . Although this analysis did not reveal any difference in allele frequency or burden between SCM cases and population controls (28 adults with normal echocardiograms), it was noted that two thirds of the cases carried a rare deleterious variant within at least one gene of a large set of adrenergic pathway genes, and 11 genes harboured a variant in two or more cases. However, the significance of these rare variants remains unclear.
In this study, we set out to explore the role of genetic factors in predisposition to EqSCM. We specifically tested three discrete hypotheses for potential genetic contributions to risk of SCM: (i) an essentially Mendelian hypothesis that rare genetic variants in one or a few key genes cause predisposition, which was tested by whole exome sequencing (WES); (ii) that SCM was a complex disorder with genetic contributions from multiple common variants, which was tested using the Cardio-MetaboChip genotyping array; and (iii) that rare copy number variants (CNV) impacting on relevant genes contribute to risk, which was tested by array comparative genomic hybridization (aCGH).

Results
Exome Analysis. To test the potential for an essentially Mendelian predisposition to EqSCM, WES was carried out on 24 of the 28 Christchurch EqSCM cases. Several approaches to analysis of the identified variants were used, all of them hypothesising the involvement of gene variants with a low population minor allele frequency (MAF), that were over-represented in the EqSCM cohort. We carried out various iterations of filtering using variant allele frequency data derived from large population databases (1000 Genomes; NHBLI Exome Sequencing Project), followed by careful manual inspection of remaining variants. For example, excluding all variants present in these databases with an allele frequency >3%, and selecting for any present in at least 4/24 EqSCM exomes, identified variants in 131 genes, none of which proved to be convincing on closer analysis. We also carried out ranking of gene variants by predicted functional impact using various approaches [57][58][59] . Once again, none of the variants identified in these analyses proved to be significantly enriched amongst our EqSCM exomes (see supplementary information for more details).
Finally, the 11 genes listed by Goodloe et al. 40 , as well as a gene recently proposed to play a role in SCM, BAG3 44 , were carefully examined for presence of any rare variants in the EqSCM dataset. None of the previously identified variants 40,44 , and no other convincing rare variants in these genes, were detected.

Cardio-MetaboChip Analysis.
To test the possibility of a more complex, polygenetic basis to SCM risk, involving multiple variants of small effect size, Cardio-MetaboChip analysis was carried out. The custom Illumina iSelect Cardio-MetaboChip genotyping array (see supplementary information for more details) was designed to test ~200,000 single gene polymorphisms (SNPs), identified through genome-wide meta-analyses for metabolic, SCIeNtIfIC REPORTS | (2018) 8:7548 | DOI:10.1038/s41598-018-25827-5 atherosclerotic and cardiovascular diseases and traits 60 . The Cardio-MetaboChip data for 27 EqSCM cases and 133 heart-healthy controls (see supplementary information for more details) were compared by logistic regression (adjusted for ethnicity and gender, additive genetic model), first performed for pairwise comparisons across groups. No SNPs reached statistical significance of <0.05 after adjusting for false discovery rate (FDR) when comparing either the EqSCM and HVOLs, or the EqSCM and CHD samples. To investigate whether the top 100 of these SNPs mapped to gene pathways that might assist in understanding potential disease mechanisms underlying SCM, pathway analysis of the leading 100 SNPs in the EqSCM versus HVOLs pairwise comparison was performed in MetaCore. Disease Biomarker Pathway analysis identified Myocardial Ischemia as the third most enriched pathway (FDR-adjusted p = 1.3e −2 ), featuring 11 SNP loci on our list out of 886 pathway objects, including annexin V, ANRIL, COL4A1, dynein, HXK4, nectin-2, PPAR-gamma, prolidase, Tcf(Lef), UGT, and VEGFR-2.
aCGH Analysis. To test for potential involvement of CNVs in SCM, we applied aCGH to all cases. Of the 28 EqSCM cases examined by aCGH, twelve (42%) showed evidence of large, rare heterozygous CNVs classified as being of unclear clinical significance (Table 1), meaning that insufficient evidence is available for unequivocal determination of clinical significance 61 . Of all 17 CNVs detected in the 12 patients, six were deletions and 11 were duplications. All of the CNVs were different, and there was no physical overlap between the various CNVs. Each of these rare CNVs encompasses one or more genes, or their immediate upstream regulatory regions, and many of the genes included within the CNVs have functions of cardiac relevance. A full list of all CNVs detected in the cohort is presented in Supplementary Table S1.
Three cases (EqSCM 01, 06 and 19) harboured deletions very likely to impact genes of high relevance to cardiomyopathy or cardiac function. In EqSCM 01, intragenic deletion of RBFOX1 results in a single copy loss of one exon used by the majority of transcripts predicted for the gene (Fig. 1). This exon contains the start methionine for the RBFOX1 protein, meaning the gene is most likely rendered non-functional. RBFOX1 is an important RNA-binding protein mediating the incorporation of microexons into many transcripts associated with neurological patterning and tissue development 62,63 , particularly in the brain, heart and muscles. Intragenic deletions in RBFOX1 have been observed in a range of conditions, including occasional cases with cardiac defects [64][65][66] . Furthermore, RBFOX1-mediated RNA splicing was also recently shown to be an important regulator of cardiac hypertrophy and heart failure 67 .
In the second case (EqSCM 06, Fig. 2), a heterozygous deletion encompassed exon 2 and the majority of intron 2 of the Glypican 5 (GPC5) locus. GPC5 encodes a cell surface proteoglycan, which binds to the outer surface of the plasma membrane in the cardiovascular system and displays diverse functions including blood vessel formation after ischemic injury and proliferation of smooth muscle cells during atherogenesis 68 . GPC5 was also implicated by GWAS as a protective locus for sudden cardiac arrest 69 , and other glypicans (GPC3, 4 and 6) have been associated with cardiac dysfunction 70 . This case (EqSCM 06) also harbours a duplication on 4q21.21 involving the ANXA3 gene, which encodes a member of the annexin family, annexin A3. Members of this calcium-dependent phospholipid-binding protein family have a range of functions in the regulation of cellular growth and signal transduction pathways. Annexin A6 for example is the most abundant annexin expressed in the heart and its overexpression in mice has been shown to cause physiological alterations in contractility leading to dilated cardiomyopathy, while Annexin A6 knockout has been found to induce faster changes in Ca2 + transience and   increased contractility 71,72 . Alterations in expression and activity of annexins A5 and A7 have also been found to be associated with regulation of Ca2+ handling in the heart 73 . The function of annexin A3 is not fully understood, however it has been shown to play a role in endothelial migration and vascular development 74 . The third case (EqSCM 19), contained a deletion at chr13q14.3. This region harbours at least 10 genes (DLEU2, TRIM13, KCNRG, MIR16-1, MIR15A, DLEU1, DLEU1AS-1, ST13P4, DLEU7AS-1, DLEU7, and RNASEH2B-AS1), including several non-coding RNAs (DLEU genes and micro-RNA genes) and a gene (KCNRG) encoding a protein involved in the regulation of voltage-gated potassium channel activity. The micro-RNA genes mir-16-1 and mir-15a in this interval have been implicated in a range of cardiovascular phenotypes, including a role for mir-15a in postnatal mitotic arrest of cardiomyocytes [75][76][77] . Two further chromosome 13 duplicated CNVs of approximately 150 kb were classified as uncertain significance -one involving LINC00346 and ANKRD10 and the other containing ENOX1, postulated to affect vascular development based on zebrafish expression patterns 78 (Table 1).
Beyond these three cases, cardiac or relevant neurological impacts appeared likely for many of the other rare CNVs identified in EqSCM cases (Table 1, Fig. 2, Supplementary Table S1), several of which are discussed below.
A 96 kb heterozygous deletion in EqSCM 03 disrupts all predicted transcripts of the chondrolectin gene (CHODL), a membrane bound C-type lectin involved in muscle organ development, whose protein product is detected in heart and skeletal muscle by immunohistochemistry 79 . In addition to this CNV, this patient carries a 1.94 Mb duplication at 22q11.25, a locus containing 45 genes or miRNAs, associated with learning difficulties 80,81 . This individual, who exhibited a degree of cognitive impairment, had consented for clinically relevant findings to be forwarded to their General Practice clinician, who subsequently recommended genetic counselling for this individual.
A 455 kb duplication within EqSCM 04 at 1p34.1 affects the genes GPBP1L1, TMEM69, IPP, MAST2, and PIK3R3. Smaller rare duplications in this region have been reported by the DGV database 82 but none span the genes within this CNV. GPBP1L1 is widely expressed in many tissues, including heart muscle 83 and predicted to be involved in transcriptional regulation. TMEM69, a gene of unknown function, is most strongly expressed in heart tissue 83 . IPP is a transcription factor with a 50 amino acid Kelch repeat known to interact with actin, while MAST2 contains a PDZ domain and is another gene highly expressed in heart and skeletal muscle 83 . The protein product of PIK3R3, phosphoinositide-3-kinase regulatory subunit 3, acts downstream of G-protein-coupled receptors in cardiac function 84 , and is also a target for isoproterenol, which can trigger SCM-like conditions in humans and rodents 12,[85][86][87] .
Duplication of a long non-coding RNA (LOC101928358), the 3' segment of COL4A5 and the entire IRS4 gene at Xq22.3 (9 probes, 113 kb) was identified within case EqSCM 05. IRS4 is an insulin receptor molecule expressed in heart and skeletal muscle cells 88 and other tissues such as brain, kidney and liver 89 . Duplication of IRS4 may be of functional significance, although copy number increases on the X chromosome of females may be counteracted to a degree by random X inactivation. Of note is a Genoglyphix Chromosome Aberration Database (GCAD 90 ) case 52414 with a phenotype of low muscle tone, which has an identical duplication at this locus (as well as a 1p33 deletion). With regard to IRS4, Schreyer et al. 88 found a more restricted tissue distribution than IRS1 and IRS2, in primary human skeletal muscle cells and rat cardiac muscle and isolated cardiomyocytes. Although IRS4 protein function is still relatively unknown, the role of IRS proteins in general, acting as mediators of intracellular signalling from insulin and insulin-like growth factor 1 receptors, implicates IRS4 in cell growth and survival 91 . It is interesting to note that PI 3-kinase (PI3K) signalling in HEK293T cells depends on IRS4, and that the IRS proteins relay signals from receptor tyrosine kinases to downstream components of signalling pathways 88 , which is a connection with the PIK3R3 gene duplicated in one of our other cases, EqSCM 04.
EqSCM 10 harboured a 130 kb duplication of NLRP7, NLRP2, GP6 and RDH13 at 19q13.42. One similar DGV duplication has been seen in this area (nsv1062047 82 ), but otherwise duplicated CNVs are generally much smaller and rare. NLRP2 and NLRP7 are genes that encode members of the NACHT, leucine rich repeat, and PYD containing (NLRP) protein family. These proteins are implicated in the activation of pro-inflammatory caspases. Recessive mutations in NLRP2/7 in humans are associated with reproductive disorders 92 . Another gene in this duplicated cluster associated with disease is GP6, a platelet membrane glycoprotein, involved in collagen-induced platelet aggregation and thrombus formation, which is expressed at high levels in heart, kidney and whole blood 83 . A GP6 SNP (c.13254TC) has been implicated in recurrent cardiovascular events and mortality 93 . Another study involving this SNP 94 , found that hormone replacement therapy (HT) reduced the hazard ratio (HR) of CHD events in patients with the GP6 13254TT genotype by 17% but increased the HR in patients with the TC + CC genotypes by 35% (adjusted interaction P < 0.001). The authors found that in postmenopausal women with established CHD, the GP6 polymorphism, and another in GP1B, were predictors of CHD events and significantly modified the effects of HT on CHD risk 94 .
Duplication of PPL2, YPEL1 and MAPK1 on chromosome 22q11.21 (180 kb) observed in EqSCM 11, does not appear in the DGV catalogue of CNVs in healthy individuals, and the consequences of overexpression of these genes, miRNAs or regulatory sequences are unknown. One of the affected genes, MAPK1 (previously named ERK or ERK2), may constitute a link to another kinase intracellular signalling pathway -the RAF-MEK-ERK kinase cascade, which in mice and human has an established role in the induction of cardiac tissue hypertrophy 95 .
Although not the kind of left ventricular enlargement seen in SCM, subtle copy number variation at MAPK1 may influence signalling through this pathway. Another duplication in EqSCM 11 involving the 3' half of TSPAN7, a member of the tetraspanin protein superfamily (Xp11.4) was noted as rare, and a similar (though larger) duplication was recently observed in a patient with Rolandic epilepsy 96 .
An agenic duplication 50 kb upstream from NRG3 (10q23.1) was seen in case EqSCM 15. This CNV could conceivably disrupt upstream regulatory regions of NRG3, which encodes an important ligand for the transmembrane tyrosine kinase receptor ERBB4. NRG3 has been shown to activate tyrosine phosphorylation of its cognate receptor, ERBB4, and is thought to influence neuroblast proliferation, migration and differentiation by signalling through ERBB4. NRG3 is a strong candidate gene for schizophrenia, and neuregulin molecules and their receptors are involved in rat cardiac development and maintenance 97 .
Finally, the two rare deletions observed on chromosome 13 (13q21.33 and 13q33.1) in case EqSCM 17, fall into largely uncharacterised areas of the genome. The first is agenic, although there is a prediction of a spliced EST in the NCBI database, and the second occurs as two 50 kb blocks within the ITGBL1 gene. ITGBL1 is most strongly expressed in aorta 98,99 .

Discussion
Monogenic and polygenic models of risk. The Christchurch earthquakes repeatedly exposed the entire population of the city, approximately 350,000 people, to major stress and life disruption. Almost all patients presenting with EqSCM were post-menopausal females, consistent with other reports 26 . We set out to explore three categories of genetic contributions to SCM predisposition, using WES to explore Mendelian models of risk, Cardio-MetaboChip analysis to test for polygenic risk factors, and aCGH analysis to evaluate the role of genomic structural variants.
Extensive analysis of the WES data did not yield any apparent enrichment of rare, damaging variants within exome regions amongst the EqSCM cases. Therefore, it seems unlikely that point mutations or small insertion-deletion (indels) in a single gene underlie predisposition to earthquake SCM. A limitation of this analysis is that it would have been unable to detect regulatory mutations, or other important variants, not included or well represented within the captured exome regions. Whole genome sequencing may therefore be warranted to further test the hypothesis of Mendelian underpinnings of SCM, as this approach could identify any regulatory variants not obtained with WES, and due to the absence of a DNA capture step, would also provide more uniform coverage of exons. Of course, it is quite possible that SCM has variable or complex penetrance, and heterogeneous origins involving rare mutations, as do many other cardiomyopathies 31,32 , and it may prove difficult to identify underlying genetic contributors in a cohort of this small size. In addition, there may be environmental factors, in addition to earthquake-induced anxiety, that contribute to onset of the condition.
In a second approach, we explored the alternative hypothesis of polygenic risk alleles of small effect size using a case-control association study, with genotypes generated by the Cardio-MetaboChip. This chip allowed genotyping of ~200,000 SNPs previously identified through genome-wide association studies (GWAS) for risk of metabolic, atherosclerotic and cardiovascular diseases and traits 60 . The traits covered by the panel of genetic variants on the chip include myocardial infarction (MI) and coronary heart disease (CHD), type 2 diabetes (T2D), T2D age diagnosed, T2D early onset, mean platelet volume, platelet count, white blood cell, HDL cholesterol, LDL cholesterol, triglycerides, total cholesterol, body mass index, waist hip ratio (BMI adjusted), waist circumference (BMI adjusted), height, percent fat mass, fasting glucose, fasting insulin, 2-hour glucose, HbA1c, systolic blood pressure, diastolic blood pressure and QT interval. This analysis did not yield variants of genome-wide significance in the SCM cases compared to either healthy controls or patients with coronary disease. Exploratory pathway analysis suggested that the EqSCM cases carried a greater burden of SNPs that mapped to a myocardial ischemia pathway compared to the healthy controls, although this must be interpreted with caution as our small sample set meant very limited statistical power. Two limitations of this analysis were the relatively constrained content of the Cardio-MetaboChip, which is less able to provide a rich dataset of genome-wide SNP genotypes than the chips commonly used for GWAS, and the relatively small cohort of cases available for study. Recruitment of a much larger SCM cohort with a view to a well-powered GWAS with a more extensive genotyping chip would therefore be a worthwhile future goal to more fully explore possible polygenic underpinnings of this disorder. We note the recent publication of a preliminary GWAS on 96 SCM cases and 475 healthy controls 100 , and believe extension of this approach to larger cohorts is an important goal.

Involvement of copy number variants.
CNVs have been implicated in many diseases since the recognition a decade ago of their widespread distribution through the genome [101][102][103] . Of note, rare CNVs are implicated in autism, epilepsy, schizophrenia, developmental delay and intellectual disability 82,104-108 . Cardiac conditions which involve CNVs include congenital left-sided heart disease 109,110 , congenital heart disease 111,112 , some cases of long QT syndrome 113 , and Tetralogy of Fallot 114 . Our final analysis, therefore, was to explore the potential involvement of CNV in risk of EqSCM, using aCGH analysis of all cases. Results from this analysis were striking, with 42% of EqSCM cases having a rare CNV of unclear clinical significance. The CNV detection rate for diagnostic aCGH in childhood developmental disorders such as autism, developmental delay and intellectual disabilities, is approximately 20-30% [115][116][117] . A recent report of a large New Zealand aCGH case series (5,300 pre-and post-natal tests) reported CNVs in 28.3% of these clinically-selected cases 118 . Our observation of a rate of 42% for the EqSCM case series is significantly greater (P < 0.02) than rates for the enriched case cohorts normally referred for clinical aCGH testing [115][116][117][118] . The CNVs detected in EqSCM cases were all different, and there were no physical overlaps between them. This situation is similar to the pattern of CNVs seen in other conditions, including Rolandic epilepsies 96 and congenital heart disease 109,111,112 . Many of the CNVs we observed are likely to impact genes of potential relevance to physiological processes implicated in SCM.
We have taken a relatively conservative approach to categorising CNVs, in terms of rarity and predicted functional significance. For example, we did not include two CNVs located at 9p24.3, involving individuals EqSCM 14 and 28 -a deletion and duplication, respectively. These CNVs encompassed a region including the large isoform of DOCK8 gene. DOCK8 encodes a protein implicated in the regulation of the actin cytoskeleton 119 , and DOCK8 mutations cause autosomal recessive hyper-IgE syndrome 120 . One reported case of a homozygous 129 kb deletion in this region was associated with Graves's disease and aortic aneurysm 121  recorded in the DGV, and therefore we did not consider this to be pathogenic. However, it is of interest that we see two relatively rare CNVs at this locus in our small EqSCM cohort.
Of the twelve EqSCM cases with rare CNVs, we consider that three (EqSCM 01, 06 and 19) contain CNVs that affect genes of high relevance to cardiomyopathy or cardiac function. The remaining CNVs are also strong candidates with potential functional relevance. In one of our most highly-ranked CNV containing cases (EqSCM 01) a large genomic deletion removes an exon of RBFOX1 which contains the start codon used by the majority of transcripts predicted for the gene. A recent report by Gao et al. (2016) provided strong functional data that would support our hypothesis of this gene's involvement as a susceptibility locus for SCM 67 . Their work with mouse models has shown RBFox1 deficiency in the heart promoted pressure overload-induced heart failure, and induction of RBFox1 over-expression in these murine pressure-overload models, substantially attenuated cardiac hypertrophy and pathological manifestations 67 . The haploinsufficiency seen in EqSCM 01 at the RBFOX1 locus may, in concert with other environmental or genetic factors, contribute to SCM through reduced global RNA splicing changes in the heart.

Conclusion.
Beginning with a cohort of 28 SCM cases triggered by two major earthquakes that caused extensive death and damage in Christchurch (New Zealand), we carried out exploratory analyses of three models for genetic predisposition to this disorder. Using WES and Cardio-MetaboChip genotyping analyses we did not detect an obvious role for exonic mutations in a monogenic model, or SNPs in a polygenic model, for SCM risk. However, our analysis of copy number variation in SCM cases revealed a high rate of occurrence of CNV categorised as of uncertain clinical significance. Most of the CNV we detected in SCM cases were rare, or not previously seen (Table 1).
These observations lead us to propose that SCM is a copy number variant disorder, whereby haploinsufficiency of genes overlapping deletions or over-expression of duplicated genes leads to relatively subtle modification of cardiac or adrenergic physiology, such that these individuals are at increased risk of suffering SCM when exposed to specific environmental triggers. Although no obvious single pathway relationships between the genes affected by these CNVs is apparent, most of the CNVs encompass loci relevant to cardiac function or cardioneuronal development.
In order to confirm whether SCM predisposition does indeed arise from CNVs, four key areas for future work need to be pursued. First, more widespread analysis is required of CNVs in many SCM cases. This would confirm whether our observation of a high rate of CNV in EqSCM also prevails in sporadic cases, and it will broaden the catalog of affected genes, helping to discern underlying signalling networks and physiological processes. In addition, with increasing numbers of cases, physical overlaps between CNVs in different individuals should become apparent, pinpointing key genomic regions for more intensive analysis. Second, the inheritance patterns of these CNVs must be established. It is unclear what proportion are de novo versus inherited from either parent. Third, there is a clear need for detailed physiological and gene expression analyses on appropriate cells, including cardiomyocytes, derived from SCM cases. Given the diversity of CNV seen in our SCM cases, this goal would most effectively be achieved by generation of induced-pluripotent stem cell (iPSC) lines from many patients and appropriate controls 122,123 . Finally, although our data implicate CNV as a significant genetic factor underlying SCM risk, it would seem wise to pursue an effective GWAS strategy to identify other genetic contributors to SCM and build on the initial study in this area 100 . International initiatives to collate SCM cases 13 should therefore ensure that consented DNA is available to provide appropriately large numbers of well phenotyped cases and controls to facilitate this goal.
Finally, we hope our observations implicating CNV in this unique case series of EqSCM will stimulate further studies of copy number variation in other SCM cohorts, and lead to an improved understanding of this perplexing and intriguing condition.

Material and Methods
Cases. The September 2010 earthquake of magnitude 7.1 on the Richter scale (Mw 7.1) in Christchurch (New Zealand) triggered eight cases of EqSCM, and the shallow highly destructive quake (Mw 6.3) that followed in February 2011 triggered 21 cases over four days. One woman presented after both quakes 22 , and one was in hospital during the initial quake. Enrolment of this latter participant was delayed, and her sample was available for aCGH analysis (n = 28) but not for Cardio-Metabolome analysis (n = 27). The steps leading to recruitment of our EqSCM cohort are detailed elsewhere 20,21 , but briefly, our study commenced the day of the first earthquake with the creation of a register of prospectively identified earthquake stress cardiomyopathy cases. As our hospital was still functioning we could build a cohort with first-world data from complete single centre capture. After the second earthquake the study was extended [20][21][22] .
Inclusion criteria: i) Meeting modified Mayo criteria for stress cardiomyopathy and admitted to Christchurch Hospital within one week of either the September 2010 or February 2011 earthquake; ii) age over 18; iii) informed consent given. Exclusion criteria: i) unable to understand English sufficiently to be able to complete questionnaires.
All participants were recruited with informed consent, including discussion of the possibility of incidental findings from genetic analyses, and return of such findings after consultation with a medical geneticist. The Southern Health and Disability Ethics Committee (New Zealand) approved this study (reference URA/11/07/033). All methods used in this research were performed in accordance with the relevant guidelines and regulations. Exome analysis. We applied WES to a subset (24 of 28) EqSCM cases. The exome capture and sequencing was carried out in two batches of 12, during 2012-13 (New Zealand Genomics Limited, Dunedin, New Zealand). DNA was processed with Illumina TruSeq sample preparation and exome enrichment kits (which capture ~62 Mb of genomic DNA), and sequencing (100 bp paired-end reads) was carried out on an Illumina HiSeq. 2000 system. Good quality sequence was obtained across all exomes, with very few unassigned reads. Raw read data were aligned to the GRCh37 human reference genome using the Burrows-Wheeler Aligner (BWA) 124 , and processed through the Broad GATK pipeline 125 . The alignment process included removal of reads from duplicate fragments, realignment around known indels, and recalibration of all base quality scores. Read depth and coverage for all exome datasets were derived using Picard CollectHsMetrics function. On average, 22.7 million reads (range 16.9-30.7 M) at mean quality scores (Phred) of Q37 were mapped to the exome target intervals; 70% (range 61-77%) of targeted bases were covered at least 20-fold. Full metrics for each sample are provided in Supplementary Table S2.
Joint variant calling was performed with GATK's HaplotypeCaller. This included de novo assembly at each potential variant locus. Variants were annotated and analysed using Ingenuity Variant Analysis (IVA) software (QIAGEN, Redwood City, CA, USA), MutationTaster2 59 , SnpEff 58 , SeattleSeq annotation server 53 , and Galaxy (via usegalaxy.org) 126 . Allele frequencies and additional annotations were drawn from 1000 Genomes project 127  Quality control with summary analysis of allele and genotype frequencies, Hardy-Weinberg equilibrium tests, and missing genotype rates were performed with PLINK version 1.07 software 130 . SNPs with a minor allele frequency of <0.05 and those that failed the Hardy-Weinberg equilibrium test (p < 0.001) were excluded from the analysis, leaving 141,095 SNPs in the analysis (Table 2). Three samples from the CHD Study were also removed after analysis of relatedness. Principal Component Analysis (PCA, Eigenstrat 4.2) was performed on an independent subset of almost 50,000 SNPs; the first principal component explained 6% of the variation, subsequent components all less than 0.5%, and matched self-reported ethnicity (visual inspection). Hence the first principal component was subsequently included as a factor in the logistic regression. Logistic regression was performed to evaluate differences in SNP minor allele frequencies between groups, adjusted for ethnicity and gender, using an additive genetic model (R 3.01 software 131 ). P values were adjusted for false discovery rate (FDR) using the Benjamini Yekuteli method 132 . Pathway analysis was performed for the leading 100 SNPs in each pairwise group comparison, using MetaCore from GeneGo (Thomson Reuters).
CNV detection and analysis. Array comparative genomic hybridisation (aCGH) was undertaken on 28 EqSCM cases to examine structural variants in the cohort. For this analysis, we used either the Nimblegen 135k oligo array (CGX12) (Roche NimbleGen Inc, Madison, WI, USA), capable of genome-wide screening for CNV to a resolution of 10 kb in well-categorised pathogenic genomic regions, and 50 kb elsewhere, or the Agilent 180k HD oligo array (Sureprint G3 Human 4 × 180k) (Agilent Technologies, Santa Clara, CA, USA), which has a similar resolution (see supplementary information for more details).
Pooled reference DNA samples (catalogue numbers G147A and G152A) were purchased from Promega (Madison, WI, USA). EqSCM case and reference DNA samples (0.5-1 μg each) were labelled with Cy3 and Cy5 dyes respectively, purified, hybridized, and washed according to Nimblegen and Agilent protocols. Microarrays were scanned on a GenePix 4000B laser scanner (Axon Instruments, CA, USA) or a G2600D Agilent SureScan microarray scanner (Agilent Technologies). Data was processed using Nimblescan (Roche Nimblegen Inc) or Cytogenomics software (Agilent Technologies) with the default algorithms and analysis settings, but with a 5 probe minimum calling threshold. All arrays passed QC metrics for derivative log ratio spread (DLRS) values of <0.2. CNV data was visualised and interpreted using Genoglyphix software (Perkin Elmer) and NCBI genome browser software (genome build hg19 (GRCh37)). The EqSCM aCGH data were assessed against many CNV databases including Genoglyphix Chromosome Aberration Database (containing over 14,000 validated variants from 50,000 samples) (Perkin Elmer, Waltham, MA, USA) 90 , DECIPHER 133 , and the Database of Genomic Variation (DGV, containing CNV data from over 35,000 unaffected individuals) 134 . CNVs were classified as thought to be benign (TBB), uncertain clinical significance (UCS), or clinically significant (CS) using an evidence-based approach 61,135-137 which included database comparisons (frequency in cases/controls and relation to phenotype), gene content, gene function and dosage sensitivity. A broad summary of our CNV interpretation algorithm is depicted in Supplementary Figure S1. Rare CNVs are defined as those that occur at a frequency of ≤1%. We classified our rare CNV frequency using larger DGV studies containing >1000 individuals 82,138-141 .

Stage of analysis Number of SNPs Comments
Data availability. The datasets generated during the current study are not publicly available for ethical reasons but can be made available by the corresponding author (M.A.K) on reasonable request.