Introduction

Cardiomyopathy is a diverse group of cardiac disorders characterized by mechanical and/or electrical dysfunction of the cardiac muscle. The diseases are associated with significant morbidity and mortality and are a known risk factor for sudden cardiac death.1, 2, 3 Over time, several classification systems have evolved based on etiology, anatomy, physiology, or histopathological expression.4 In newer classification systems, major types include hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), and arrhythmogenic right ventricular cardiomyopathy (ARVC).

Hypertrophic cardiomyopathy is characterized by a non-dilated, hypertrophic left ventricle with variable degrees of diastolic dysfunction, whereas DCM is characterized by dilated ventricular cavities and systolic dysfunction.4, 5, 6, 7 In ARVC, progressive fibrofatty replacement of the normal cardiac tissue predisposes to ventricular tachycardia and sudden death.8, 9 The prevalence of these three cardiomyopathies in the general population has been estimated to be 1:500, 1:2500, and 1:5000, respectively.3

Inherited cardiomyopathy has traditionally been considered a monogenic disorder and to date hundreds of variants in 84 genes have been associated with these syndromes. However, some associations are based on weak family phenotype–genotype co-segregation and/or the absence of the variant in a limited number of controls.

Until recently, knowledge of genetic variation in the general population was limited, especially with regard to low-frequency variants. This changed in June 2011, when whole-exome data from the NHLBI GO Exome Sequencing Project (ESP) were published (latest update June 2012).10 To identify possible false-positive cardiomyopathy variants reported in the literature, we investigated the prevalence of previously cardiomyopathy-associated variants in the new ESP exome data and compared it with the expected prevalences of monogenic cardiomyopathies in the same population.

Methods

In ESP, next-generation sequencing of all protein-coding regions was carried out in approximately 6500 individuals from different population studies, including both European Americans (4300 individuals) and African Americans (2203 individuals).10 No clinical data on the ESP population were available, either publicly or on request. By literature search, we found inclusion and exclusion criteria for 9 of the 12 cohorts used in ESP. None of these specifically included persons with cardiomyopathies or other heart diseases, and at least two cohorts excluded such patients.

The ARVD/C Genetic Variants database (last update April 2012)11 and the Human Gene Mutation Database (HGMD; updated June 2012)12 were searched for missense and nonsense cardiomyopathy-associated variants involving the three major types: HCM, DCM, and ARVC. All genes in the ARVD/C Genetic Variants database were evaluated, and in HGMD the search term ‘Cardiomyopathy’ was used. In total, 84 genes associated with cardiomyopathy were identified. Genes were then evaluated one by one, and those associated with any of the above-mentioned cardiomyopathies were selected. Additionally, we included the recently reported DCM-associated TTN nonsense variants published by Herman et al13 in order to include all genes so far associated with DCM. All identified variants were then systematically searched for in ESP. Only variants classified by one of the databases as pathogenic/disease causing were included in the analyses. Variants of unknown pathogenicity or variants classified as ‘disease-causing mutation?’ are marked with ‘b’ in Tables 1, 2, 3, but to keep the approach conservative, these variants were excluded from our calculations. Because ESP lacks data on variants positioned in promoters, introns, and UTR regions, these could not be included.

Table 1 Variants associated with hypertrophic cardiomyopathy present in the ESP population
Table 2 Variants associated with dilated cardiomyopathy present in the ESP population
Table 3 Variants associated with arrhythmogenic right ventricular cardiomyopathy present in the ESP population

In addition to taking all identified variants associated with HCM, DCM, and ARVC into account for the calculation of genotype prevalences, we also applied a more conservative approach. Based on the frequencies of HCM, DCM, and ARVC in the general population (1:500, 1:2500, and 1:5000, respectively), the estimated numbers of individuals in the ESP data that can be expected to be affected by HCM, DCM, and ARVC are ∼13, 3, and 2, respectively. These values roughly represent the number of times a given variant with complete penetrance can be present in the exome database and still theoretically be the cause of monogenic forms of the respective cardiomyopathies.
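The cutoff arithmetic can be reproduced with a short sketch. The cohort size of 6500 and the three population prevalences are taken from the text; rounding up to the next whole individual is an assumption, chosen because it reproduces the quoted values of 13, 3, and 2, and the function and variable names are illustrative only.

```python
import math

ESP_INDIVIDUALS = 6500  # approximate ESP cohort size stated in the text

# Population prevalences quoted in the text, written as "1 in N".
ONE_IN_N = {"HCM": 500, "DCM": 2500, "ARVC": 5000}


def expected_affected(cohort_size, one_in_n):
    """Expected number of affected individuals in the cohort.

    Rounding up is an assumption made here so that fractional
    expectations (2.6 and 1.3) map to the values quoted in the text.
    """
    return math.ceil(cohort_size / one_in_n)


cutoffs = {disease: expected_affected(ESP_INDIVIDUALS, n)
           for disease, n in ONE_IN_N.items()}
print(cutoffs)
```

A variant observed in more individuals than the corresponding cutoff is, under the complete-penetrance assumption, too common to be a monogenic cause of that cardiomyopathy.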

The literature was searched for functional data and family co-segregation of all the cardiomyopathy-associated variants identified in the ESP population. Positive functional data were defined as any in vivo or in vitro model, demonstrating results differing from the wild-type model. Co-segregation was defined as at least two family members in two generations both having the phenotype and the genotype.

Additionally, we conducted a PolyPhen-2 prediction14 on all previously reported missense variants. PolyPhen-2 predicted variants to be ‘benign’, ‘possibly damaging’, or ‘probably damaging’. As nonsense variants cannot be evaluated by PolyPhen-2, we classified these as of ‘unknown pathogenicity’. Using Fisher’s exact test, we evaluated differences in the distribution of the four categories of pathogenicity between the variants identified in ESP and the variants not identified in ESP. A P-value <0.05 was considered statistically significant. In case of a statistically significant difference, we also compared the proportion of variants predicted to be benign between variants identified in ESP and variants not identified in ESP, again using Fisher’s exact test.

Using a TaqMan assay as previously described,15 we genotyped seven variants with a pathogenic association and a prevalence in the European American portion of ESP high enough (10:6500) to have a modest chance of being detected in our own control population (N=534). The control population of Northern European ancestry consisted of men and women aged 55–75 years with no history of arrhythmias or other cardiac diseases and with available ECGs, as previously described.16 The ECGs from genotype-positive controls were evaluated by two independent experienced ECG readers with regard to the 2010 task force ECG criteria for ARVC17 and with regard to the Cornell18 and the Sokolow–Lyon criteria for ventricular hypertrophy.19

Results

Hypertrophic cardiomyopathy

In the ESP population, we identified 94 out of 687 variants previously associated with HCM (14%). Ninety-three missense variants and one nonsense variant were identified, affecting 1672 individuals in total (homozygote=76, heterozygote=1596). Eighteen variants with family co-segregation analyses and 16 variants with functional characterization different from wild-type were identified in ESP. On average, the genes investigated were sequenced in 6286 individuals, corresponding to a genotype prevalence of 1:4 (1672:6286). PolyPhen-2 analysis of the 94 HCM-associated variants present in ESP predicted 39 (41%) to be probably damaging, 14 (15%) to be possibly damaging, and 40 (43%) to be benign. Only one nonsense variant was found in ESP and classified as being of unknown pathogenicity (Table 1). Of the remaining 593 HCM-associated variants not present in ESP, 324 (55%) were predicted to be probably damaging, 108 (18%) possibly damaging, and 107 (18%) benign. Fifty-four nonsense variants were classified as being of unknown pathogenicity. This difference in the distribution of the four categories of pathogenicity was statistically significant both for the overall comparison (P<0.0001) and when comparing the proportion of variants predicted to be benign for variants identified in ESP vs variants not identified in ESP (43% vs 18%, respectively, P<0.0001).
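The benign vs non-benign comparison can be illustrated with a minimal, dependency-free sketch of a two-sided Fisher's exact test on a 2×2 table. The counts are the HCM figures reported above (40 of 94 variants in ESP and 107 of 593 variants not in ESP predicted benign). The function name and the two-sided rule used here (summing all tables no more probable than the observed one) are assumptions for illustration; the software and exact procedure used in the study are not stated, so the resulting P-value is not guaranteed to match to the last digit.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def pmf(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = pmf(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(pmf, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Rows: variants in ESP, variants not in ESP.
# Columns: predicted benign, not predicted benign (HCM counts from the text).
p_hcm = fisher_exact_2x2(40, 94 - 40, 107, 593 - 107)
print(f"HCM benign vs non-benign: P = {p_hcm:.2e}")
```

The same function can be applied to the DCM and ARVC counts; the four-category comparison needs an r×c generalization that is not shown here.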

Fourteen of the 94 variants were identified in ≥13 individuals and were thus above our conservative cutoff value. These variants affected a total of 1474 individuals, which is equivalent to an HCM genotype prevalence of 1:4 (1474:5810). If variants predicted to be benign by PolyPhen-2 (43%) were additionally excluded, the genotype prevalence was 1:7. The HCM-associated variants identified in the ESP population are listed in Table 1.
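As a small worked example, the "1:N" genotype prevalences for HCM follow directly from the carrier counts and the average number of individuals sequenced. The helper name is hypothetical, and rounding to the nearest integer is an assumption that happens to reproduce the quoted ratios.

```python
def one_in_n(carriers, sequenced):
    """Return N for a '1:N' genotype prevalence of carriers among sequenced."""
    if carriers <= 0:
        raise ValueError("carriers must be positive")
    return round(sequenced / carriers)


# HCM figures from the text: all ESP variants, then only those above the cutoff.
all_variants = one_in_n(1672, 6286)
above_cutoff = one_in_n(1474, 5810)
print(f"all variants 1:{all_variants}, above cutoff 1:{above_cutoff}")
```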

Two variants (MYBPC3 p.V896M and MYH7 p.M982T) were, based on our criteria, selected for genotyping in our control population. Five individuals were heterozygous carriers of the MYBPC3 p.V896M variant and three carried the MYH7 p.M982T variant. This corresponds to genotype prevalences of 0.94 and 0.56%, respectively, which are comparable to those found in ESP (0.96 and 0.44%, respectively; Table 4).

Table 4 Cardiomyopathy-associated variants in ESP and in control population

Dilated cardiomyopathy

In ESP, we found 58 out of 337 variants previously associated with DCM (17%). Two of the 58 variants were nonsense variants. Both nonsense variants (LAMA4 p.R1073X and VPS13A p.R3135X) were each found in a single individual, and both individuals were heterozygous for the variant. A total of 1043 individuals were affected. On average, the genes investigated were screened in 6314 individuals, which results in a DCM genotype prevalence of 1:6 (1043:6314). Four variants with convincing segregation analyses were identified and 26 variants were found to have functional effects. PolyPhen-2 analysis of the 58 DCM-associated variants predicted 26 (45%) to be probably damaging, 11 (19%) possibly damaging, and 19 (33%) benign, whereas two nonsense variants were classified as being of unknown pathogenicity (Table 2). Of the remaining 279 DCM-associated variants not present in ESP, 134 (48%) were predicted to be probably damaging, 43 (15%) possibly damaging, and 56 (20%) benign. Forty-six nonsense variants were classified as being of unknown pathogenicity. This difference in the distribution of the four categories of pathogenicity was statistically significant both for the overall comparison (P=0.013) and when comparing the proportion of variants predicted to be benign for variants identified in ESP vs variants not identified in ESP (33 vs 20%, respectively, P=0.039).

Thirty-five of the 58 variants were identified in three or more individuals and were thus above our conservative cutoff value. These variants affected a total of 963 individuals, giving a genotype prevalence of 1:7 (963:6334). If variants predicted to be benign by PolyPhen-2 (33%) were additionally excluded, the genotype prevalence was 1:10. The DCM-associated variants identified in the ESP population are listed in Table 2.

Two variants (CSRP3 p.W4R and MYH6 p.A1004S) were selected for genotyping in our control population. Six individuals were heterozygote carriers of the CSRP3 p.W4R variant and two carried the MYH6 p.A1004S variant. The prevalences were thus comparable to the ones in ESP (1.12 vs 1.07% and 0.37 vs 0.26%, respectively; Table 4). One individual carrying the CSRP3 p.W4R variant fulfilled the Cornell ECG criteria for ventricular hypertrophy; however, this individual died at the age of 73 and was never diagnosed with cardiomyopathy. The ECGs from the rest of the genotype-positive individuals were normal and without signs of ventricular hypertrophy.

Arrhythmogenic right ventricular cardiomyopathy

Thirty-eight out of 209 variants associated with ARVC (18%) were found in the ESP population. One nonsense and 37 missense variants were identified, affecting a total of 1404 individuals. Only one variant with convincing family co-segregation and three variants with functional characterization different from wild-type were identified in ESP. Twenty-eight of the 38 variants were identified in two or more individuals. On average, the genes investigated in ARVC were sequenced in 6354 individuals, corresponding to an ARVC genotype prevalence of 1:5 (1407:6354). PolyPhen-2 analysis of the 38 ARVC-associated variants predicted 14 (37%) to be probably damaging, 3 (8%) to be possibly damaging, and 20 (53%) to be benign, whereas one nonsense variant was classified as being of unknown pathogenicity (Table 3). Of the remaining 171 ARVC-associated variants not present in ESP, 77 (45%) were predicted to be probably damaging, 14 (8%) possibly damaging, and 21 (12%) benign. Fifty-nine nonsense variants were classified as being of unknown pathogenicity. This difference in the distribution of the four categories of pathogenicity was statistically significant both for the overall comparison (P<0.0001) and when comparing the proportion of variants predicted to be benign for variants identified in ESP vs variants not identified in ESP (53 vs 12%, respectively, P<0.001).

Twenty-eight variants were present in two or more individuals and were thus above our conservative cutoff value; this still corresponded to an ARVC genotype prevalence of 1:5 (1393:6359). If variants predicted to be benign by PolyPhen-2 (53%) were additionally excluded, the genotype prevalence was 1:11. The ARVC-associated variants identified in the ESP population are listed in Table 3.

Three variants (PKP2 p.D26N; DSG2 p.V158G; and DSP p.V30M) were genotyped in the control population. Five individuals were heterozygous carriers of the PKP2 p.D26N variant, nine of DSG2 p.V158G, and five of DSP p.V30M. One individual was a carrier of both the DSG2 p.V158G and the DSP p.V30M variant. The variant frequencies were comparable to those found in ESP (0.94 vs 1.37%; 1.69 vs 1.58%; and 0.94 vs 0.37%, respectively; Table 4). ECGs from genotype-positive individuals were normal and without signs of ARVC or ventricular hypertrophy.
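The control-population frequencies reported in the three Results subsections can be recomputed from the carrier counts and the control cohort size (N=534). The sketch below is illustrative only; the dictionary layout and variable names are assumptions, while the counts are those given in the text.

```python
CONTROLS = 534  # size of the control population stated in the Methods

# Heterozygous carriers per genotyped variant, as reported in the Results.
carriers = {
    "MYBPC3 p.V896M": 5,
    "MYH7 p.M982T": 3,
    "CSRP3 p.W4R": 6,
    "MYH6 p.A1004S": 2,
    "PKP2 p.D26N": 5,
    "DSG2 p.V158G": 9,
    "DSP p.V30M": 5,
}

# One individual carried both DSG2 p.V158G and DSP p.V30M.
DOUBLE_CARRIERS = 1

# Genotype prevalence of each variant in the controls, in percent.
percent = {variant: round(100 * n / CONTROLS, 2)
           for variant, n in carriers.items()}

# Number of distinct individuals carrying at least one variant.
distinct_carriers = sum(carriers.values()) - DOUBLE_CARRIERS

for variant, value in percent.items():
    print(f"{variant}: {value:.2f}%")
print("distinct carriers:", distinct_carriers)
```

The percentages agree with those quoted for the control population, and the number of distinct carriers matches the 34 of 534 control subjects mentioned in the Discussion.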

Discussion

The present study identified a high prevalence of cardiomyopathy-associated genetic variants in recently published population-based exome data. Fourteen percent of all previously HCM-associated variants and 17–18% of all DCM- and ARVC-associated variants were identified in ESP. Thus, a much higher prevalence of cardiomyopathy-associated genetic variants was identified in ESP than expected from the phenotype prevalences in the general population.

To validate the marked overrepresentation of variants associated with HCM, DCM, and ARVC in ESP, we genotyped seven variants in seven different genes associated with cardiomyopathy in a second population with clinical data available and no history of arrhythmias or other cardiac diseases. Thirty-four out of the 534 control subjects carried at least one of the variants. The seven genotyped variants were present with frequencies comparable with those found in ESP (Table 4), and all genotype-positive controls, except for one individual, had ECGs without any signs of cardiomyopathy (eg, no hypertrophy or signs of ARVC). Thus, overrepresentation of cardiomyopathy-associated variants in ESP does not seem to be a major problem. In a recent paper,20 we also established that prevalences of four other variants genotyped in a control population were comparable to those of ESP. These results thus indicate that ESP consists of individuals representative of the general population.

A control population with available echocardiograms would have been preferable, but such a control population was not available. However, symptoms and signs of cardiomyopathy do not usually appear beyond the age of 50–60 years in these diseases,21, 22, 23 and 75–95% of ARVC and HCM patients display ECG abnormalities.24, 25 This indicates that our control population is well suited, since it consists of 534 people all above the age of 55 with no reported signs of cardiovascular diseases. It is of course possible that a small fraction of the control population might develop cardiomyopathy at a very late age and that variant carriers display reduced penetrance. However, this is not very likely, given the high number of carriers of the seven genotyped variants and the fact that all genotype-positive individuals except one had ECGs without any signs of cardiomyopathy and no history of cardiac diseases.

A genotype prevalence of 1:4 for HCM, 1:6 for DCM, and 1:5 for ARVC is unlikely to be caused by reduced or age-related penetrance. Even when taking into consideration a penetrance as low as 20% (reported for some genotypes), it would still result in genotype prevalences being massively overrepresented.
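The penetrance argument can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that every carrier has the same 20% penetrance and that each variant would act as a monogenic cause; the genotype and phenotype prevalences are those quoted in the text, and the function name is hypothetical.

```python
# Genotype prevalences found in ESP and phenotype prevalences in the
# general population, both written as "1 in N" (values from the text).
GENOTYPE_ONE_IN = {"HCM": 4, "DCM": 6, "ARVC": 5}
PHENOTYPE_ONE_IN = {"HCM": 500, "DCM": 2500, "ARVC": 5000}

PENETRANCE = 0.20  # lowest penetrance considered in the text


def fold_excess(genotype_one_in, phenotype_one_in, penetrance):
    """Ratio of the disease prevalence implied by the genotype prevalence
    (scaled by a uniform penetrance, an illustrative assumption) to the
    phenotype prevalence reported for the general population."""
    return penetrance * phenotype_one_in / genotype_one_in


fold = {disease: fold_excess(GENOTYPE_ONE_IN[disease],
                             PHENOTYPE_ONE_IN[disease],
                             PENETRANCE)
        for disease in GENOTYPE_ONE_IN}

for disease, value in fold.items():
    print(f"{disease}: implied prevalence is about {value:.0f}x the reported one")
```

Even under this low penetrance, the implied disease prevalences exceed the reported ones by roughly 25-fold for HCM, 83-fold for DCM, and 200-fold for ARVC.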

PolyPhen-2 predicted a significantly higher proportion of the variants present in ESP to be benign compared with the variants not present in ESP (43 vs 18% for HCM, 33 vs 20% for DCM, and 53 vs 12% for ARVC). This analysis further questions the pathogenic role of at least some of the variants present in ESP.

In the absence of phenotypic data on the ESP population, we defined a cutoff value based on the expected prevalences of the respective cardiomyopathies in the same population. Under this definition, variants with a prevalence above the cutoff were assumed not to be monogenic causes of cardiomyopathy. However, taking this conservative cutoff into account revealed genotype prevalences similar to the ones obtained when including all cardiomyopathy-associated variants. Such a cutoff is of course somewhat arbitrary because of uncertainty regarding the true prevalences of the cardiomyopathies in the general population (ESP) and because variants with reduced penetrance or recessive inheritance are not taken into account. However, most variants listed in the ARVC database and in HGMD are reported as monogenic, autosomal dominant causes of the cardiomyopathies.

Interpretation of the significance of the cardiomyopathy-associated variants with prevalences below our cutoff, and thus present at a very low frequency in the ESP data, is much less straightforward. These rare variants may be monogenic causes of cardiomyopathy, disease-modifiers, or benign. A small number of studies have associated genetic variation with increased susceptibility for cardiomyopathy in a non-monogenic manner.26, 27, 28 For this reason, we can only exclude highly prevalent variants as monogenic causes of cardiomyopathy, but we cannot draw a conclusion about possible disease-modifying effects.

It is noteworthy that four genes associated with HCM in the HGMD database (COX15, OBSCN, SRI, and VCL) each had only one variant that has been associated with cardiomyopathy (COX15 p.R217W, OBSCN p.4344Q, SRI p.F112L, and VCL p.L277M). These four variants were also present in ESP, and both OBSCN p.4344Q and SRI p.F112L had prevalences above our defined cutoff values. Similarly, in five genes associated with DCM only one variant was identified in each gene (DSG2 p.T335A, FLT1 p.R54S, POLG p.N736S, TMPO p.R690C, and VPS13A p.R3135X), and all of these were also present in ESP. Only DSG2 p.T335A and VPS13A p.R3135X were below our cutoff value. Our data suggest that the genes OBSCN, SRI, FLT1, POLG, and TMPO require a re-evaluation of their causal role in HCM and DCM.

A number of variants with functional effects or family co-segregation were identified in ESP. Functional characterization and co-segregation analyses within families are valuable tools in determining the pathogenicity of identified sequence variants. However, small family sizes and reduced penetrance often hamper segregation analyses. In addition, functional characterization in model systems may not be representative of in vivo human physiology, and an observed difference in a model system may not be of clinical importance. As an example, the CSRP3 (alternative symbol MLP) p.W4R variant has been associated with cardiomyopathy in functional systems,29, 30 but lack of family co-segregation has also been reported.31

Genetic screening is gaining ground in the identification of patients and family members at an increased risk of cardiomyopathies. Identification of a misclassified genetic variant in cardiomyopathy patients might lead to erroneous risk stratification and misdiagnosis of family members, with potentially devastating clinical consequences. It is therefore important that variants reported as causative of cardiomyopathies are truly disease causing.

In conclusion, we identified a massive overrepresentation of previously cardiomyopathy-associated genetic variants in new population-based exome data. With genotype prevalences up to one thousand times higher than expected from the phenotype prevalence in the general population, we suspect a high number of these genetic variants to be only modest disease-modifiers or even non-pathogenic.