Introduction

Determining variant pathogenicity is the major challenge in cardiomyopathy genetic testing, spawning debate in clinics worldwide. Sequence data interpretation has been biased historically by a focus on affected patients and a full appreciation of the normal spectrum of variation in cardiomyopathy “disease genes” has been lacking. In recent years, it has become apparent that many rare protein-altering variants reported to be pathogenic are also seen in the general population.1,2,3,4 These disturbing findings raise doubts about the discriminative efficacy of variant annotation pipelines used in literature reports and the reliability of genetics results provided to patients. Given the increasing role of personal genome sequencing in clinical practice, there is a critical need for an improved strategy for identifying the subset of rare variants that are truly disease-causing.

Dilated cardiomyopathy (DCM) is the most common cardiomyopathy and frequently has a genetic etiology. More than 100 genes have been implicated to date.5,6,7,8 Genetic testing panels generally contain sets of putative DCM-associated genes and have expanded over time to include genes with direct and indirect links to other cardiac and skeletal myopathies. This inclusive approach potentially reduces negative screening results but magnifies the problem of rare variant interpretation.

The aim of this study was to investigate parameters that might improve discrimination between rare cardiomyopathy gene variants in patients with DCM and healthy control subjects. Our analysis included evaluation of variant-level and gene-level characteristics, as well as assessment of the total numbers of rare variants in each individual. This type of information about personal burden of rare variants is unable to be determined from studies of single genes or small gene panels in DCM patients, and is generally not available in population databases where single variants in de-identified subjects are listed separately. Our findings provide a new framework for variant prioritization and highlight the key role of a subset of genes in DCM pathogenesis.

Materials and methods

Study subjects

Participants provided informed written consent, and protocols were approved by the institutional human ethics committees. Patients with familial or sporadic idiopathic DCM (n = 532, aged 47 ± 19 years, 69% males) and healthy control subjects (n = 527, aged 49 ± 15 years, 38% males) were recruited from Brigham and Women’s Hospital, Boston Children’s Hospital, Victor Chang Cardiac Research Institute, Royal Brompton & Harefield NHS Foundation Trust, and the London Institute of Medical Sciences, Imperial College, London (Supplementary Methods, Supplementary Fig. S1). The absence of cardiovascular disease in control subjects was ascertained by medical history, and was confirmed by cardiac magnetic resonance imaging in 319 individuals (61%). All subjects had self-reported European ancestry and population stratification was evaluated using principal component analysis (Supplementary Methods, Supplementary Table S1).

A DCM replication cohort was comprised of Australian familial DCM probands (n = 101, aged 45 ± 16 years, 67% males). Replication control data were obtained for subjects aged >70 years drawn from the Alzheimer’s Disease Sequencing Project (n = 2971) and the Medical Genome Reference Bank (n = 1144). These replication control cohorts were selected specifically to account for the age-related penetrance of DCM and because per-person data were available.

Gene sequencing

Genomic DNA libraries were constructed using standard library preparation protocols. Fragments were ligated to adaptors, amplified, purified, then hybridized to custom arrays enriched for coding regions of genes associated with DCM and other inherited cardiac disorders. A 69-gene panel was used by the Boston laboratory (n = 203 DCM cases, 208 controls), and a 64-gene panel was used by the UK laboratory (n = 320 DCM cases, 319 controls) (Supplementary Fig. S1, Supplementary Table S2). Sequencing was performed using the Illumina HiSeq 2000 or SOLiD 5500xl platforms. Selected variants were evaluated in probands and family members using Sanger sequencing.

Sequence data analysis

Sequence data were processed and aligned to the hg19 (GRCh37) human genome reference using Novoalign and the Genome Analysis Toolkit, or Lifescope v2.5.1. Data for the 41 genes that were represented on both the 69-gene and 64-gene panels, and for which variants were detected in cases, were included in this study (Supplementary Methods, Supplementary Table S2). Coding-sequence variants that were truncating (stop gain, splice donor or acceptor site gain or loss, frameshift indels) or missense, and that met quality metrics for mapping, read depth, and allelic balance were evaluated. Population data for variant minor allele frequency (MAF) were obtained from the Exome Sequencing Project (ESP) and Exome Aggregation Consortium (ExAC) v1 databases. In silico pathogenicity predictions for missense variants were made using PolyPhen2, SIFT, PROVEAN, and MetaSVM.9,10,11

Gene group allocation

A PubMed literature search was undertaken to assess available evidence supporting roles in DCM pathogenesis for each of the 41 genes evaluated. A semiquantitative scoring system was devised to rank genes using grades of A (strong) to D (weak) based on genetic, in vitro, and animal model data (Supplementary Methods).

Statistical analysis

Statistical comparisons between groups were made using Fisher’s exact or chi-squared (2 × 2, 2 × 4) tests. To identify the cells that contributed most to 2 × 4 chi-squared tests, standardized residuals were calculated and deviations of absolute value >2 were considered significant. To evaluate variant distribution in annotated domains of group A genes, missense variants identified in DCM cases were pooled with those of Walsh et al.12 and compared with ExAC missense variants (MAF <0.1%) within the same protein domains using Fisher’s exact test. P < 0.05 was used to denote statistical significance.

Results

Rare variants in cardiomyopathy gene are common in cases and controls

Genetic screening was performed in 532 DCM patients and 527 healthy control subjects. Study participants were of European ancestry, and no significant population stratification effects were detected on a principal component analysis (Supplementary Fig. S2). Variants that were protein-altering (truncating, missense) and rare (defined here by MAF <0.1% in the European [EA] subgroup in the ESP database) in a set of 41 cardiomyopathy-associated genes were evaluated. Variants that met these criteria were found in 407 (77%) DCM cases and in 348 (66%) control subjects (P = 0.0002), with the number of rare variants per person ranging from 0 to 13 (mean 1.63) in DCM cases and from 0 to 8 (mean 1.24) in controls (P < 0.0001). To explore strategies for identifying disease-associated variants, we compared 770 different variants found in DCM cases and 589 variants in control subjects (Supplementary Table S3).

TTN variants

There was an excess of truncating TTN variants (TTNtv) in DCM cases (83 [11%] variants versus controls, 6 [1%] variants; P < 0.0001; Supplementary Table S4). TTN missense variants comprised 40% of all variants found in cases and controls. Unlike missense variants in other genes (see below), TTN missense variants were not differentiated between groups by assessment of novelty or in silico functional predictions (Supplementary Table S5), and these were excluded from subsequent variant-level, gene-level, and per-person analyses.

Limited discriminative value of variant characteristics

There were relatively few truncating variants in cardiomyopathy genes other than TTN, and these were equally seen in DCM cases (25 [3%] variants) and controls (17 [3%] variants; P = 0.75). To start to stratify the larger numbers of missense variants found in both groups, we looked at novelty and in silico functional predictions.

Variant novelty

The absence of a variant in healthy subjects has been used as a criterion of pathogenicity in many DCM pathogenic variant reports.13,14,15 However, assessment of novelty varies with the size of the reference cohort, and differences between groups were only found by using the ExAC database (>60,000 subjects), with DCM cases having more novel variants than controls (P < 0.0001; Supplementary Table S5). Surprisingly, more than half of the missense variants in our data set have been identified previously in genetic testing studies and are listed in ClinVar (Supplementary Table S5).

Variant functional predictions

To assess variant function, we evaluated subsets of variants that were (1) uniformly predicted to be deleterious in PolyPhen2, SIFT, and PROVEAN,9,10 or (2) predicted deleterious by MetaSVM, which provides an ensemble score derived from 10 separate algorithms.11 Both of these approaches yielded significant differences between DCM cases and controls, with modestly better results for MetaSVM’s ensemble score than for the three-program consensus score (Supplementary Table S5). When novelty and functional predictions were combined, discrimination between groups was increased: DCM cases were twice as likely as controls to have variants that were absent from ExAC or deleterious (MetaSVM), and nearly five-fold more likely to have variants that were absent from ExAC and deleterious (MetaSVM) (Fig. 1a).

Fig. 1: Criteria for variant prioritization.
figure 1

(a) Missense rare variants in cardiomyopathy genes in DCM cases and controls were compared using a number of criteria including novelty with respect to the European subgroup of the Exome Sequencing Project (EA-ESP) and non-Finnish European (NFE) or all subjects in the Exome Aggregation Consortium (ExAC), in silico functional predictions (PolyPhen2, SIFT, PROVEAN, MetaSVM), Residual Variation Intolerance Score (RVIS), and gene group. The odds ratios (OR) for DCM are displayed in a forest plot. (b) Distribution of truncating variants across gene groups in controls (white bars) and DCM cases (blue bars). (c) Distribution of missense variants across gene groups in controls (white bars) and DCM cases (blue bars). The subset of “novel (ExAC) + deleterious (MetaSVM)” missense variants are denoted in the solid sections of bars. There were significant effects of clinical status and gene group for truncating variants (b, P = 0.01; 2 × 4 chi-squared test) and missense variants (c; P = 0.018); see Supplementary Table S8 for statistical analysis. For all panels, TTN truncating and missense variants were excluded

Importance of gene-based parameters

Because variant-level parameters incompletely differentiated DCM cases from controls, we next sought to determine whether there might be a hierarchy for pathogenicity between genes.

ExAC metrics of expected genetic variation

ExAC provides two gene constraint metrics for expected variation in the general population.16 The “probability of being loss-of-function intolerant” (pLI) scores for our 41 genes ranged from 0 to 1,16 with only 10 genes assessed as extremely intolerant (pLI > 0.9; Table 1). DCM-associated truncating variants were enriched in genes with high pLI scores (11 of 23 [48%] variants versus controls, 1 of 17 [6%] variants, P = 0.005), but >50% variants were in genes with intermediate or low pLI scores. Twenty-six genes had expected intolerance to missense variants, indicated by positive Z scores (Table 1).16 There was a statistically significant but relatively modest excess of missense variants in these genes in DCM cases (182 [52%] variants versus controls, 129 [43%] variants; P = 0.018).

Table 1 Gene characteristics and evidence for role in DCM pathogenesis

Residual Variation Intolerance Score (RVIS)

RVIS is another metric of expected genetic variation.17 In this scoring system, genes that cause Mendelian disorders should have little population variability (25th percentile RVIS), while genes associated with complex disorders are likely to be highly variable (100th percentile RVIS). In keeping with the Mendelian model, we found that 18 of our genes were in the 25th percentile bin (Table 1). However, while DCM-associated truncating variants were preferentially distributed in 25th percentile genes (P = 0.01, 2 × 4 chi-squared test; Supplementary Table S6), this was not the case for missense variants (P = 0.67).

Variant burden per gene

There were only three genes, RBM20, MYH7, and LMNA, in which there were significantly more rare variants in DCM cases than in controls (Supplementary Table S3). These findings strongly suggest that these genes harbor pathogenic rare variants, but do not directly inform the interpretation of any single variant. For most genes, the relatively small numbers of variants limited statistical comparisons.

DCM gene ranking

We used a semiquantitative method to rank genes according to literature evidence for roles in DCM pathogenesis (Table 1, Supplementary Table S7). Only 14 of 41 genes had robust evidence for disease association and were classified as group A. There was a prominent peak of DCM-associated truncating variants in these genes, with control-associated truncating variants mostly found in gene groups C and D (P = 0.01, 2 × 4 chi-squared test; Fig. 1b, Supplementary Table S8). DCM-associated missense variants also showed a peak in group A genes (P = 0.018), with marked enrichment of variants that were novel (ExAC) + deleterious (MetaSVM) (P = 0.001, Fig. 1c). Adding the “group A” gene parameter to these variant parameters increased the odds ratio for DCM from approximately 5-fold to nearly 9-fold (Fig. 1a).

A similar pattern was seen for the subset of ClinVar-listed truncating and missense variants, with a majority of pathogenic/likely pathogenic variants occurring in DCM cases and in group A genes (P = 0.0007, Supplementary Fig. S3A). Variants annotated specifically as pathogenic/likely pathogenic for DCM were exclusive to group A genes, comprising 17 of the 94 (18%) group A variants in DCM cases and 1 of 65 (1.5%) group A variants in controls. Of the 30 total pathogenic/likely pathogenic group A variants in DCM cases, 24 (80%) were also identified as “damaging” by our criteria (Supplementary Figure S3B). The 6 ClinVar variants that were not captured by our method included 5 very rare variants (1 to 5 alleles in ExAC), 3 of which were annotated pathogenic/likely pathogenic for DCM, and one variant (40 alleles in ExAC) that was likely pathogenic for HCM.

Yield of variants per person

We next looked at the yield per person of prioritized variants (Table 2). For this analysis, truncating variants and “novel (ExAC) + deleterious (MetaSVM)” missense variants are referred to as “damaging” group A gene variants (Supplementary Table S9). Damaging variants were present in 65 (12.2%) DCM cases and in 8 (1.5%) control subjects (P < 0.0001). Similar patterns for the prevalence of damaging group A variants were seen in an independent cohort of familial DCM cases (17 probands [16.8%]), and in two replication control cohorts from the Alzheimer’s Disease Sequencing Project (54 subjects [1.8%]) and the Medical Genome Reference Bank (33 subjects [2.9%]).

Table 2 Yield per person of TTNtv and damaginga group A gene variants in the primary cohort and in replication cohorts

When TTNtv were also considered, 151 (28.4%) of our DCM cases were positive for damaging variants and/or TTNtv compared with 14 (2.7%) controls (P < 0.0001). Damaging variants and/or TTNtv were present in 32 (31.7%) probands in the familial DCM replication cohort, and in 61 (2.1%) and 46 (4.0%) subjects, respectively, in the two control replication cohorts. It has been questioned whether TTNtv are sufficient alone to cause DCM or require “second hit” genetic and/or acquired factors for DCM manifestation. We found no differences in the background burden of variants in TTNtv + DCM cases (1.58 ± 1.74 variants, range 0–9) when compared with TTNtv-/damaging + DCM cases (1.17 ± 1.22 variants, range 0–5), TTNtv-/damaging–DCM cases (1.30 ± 1.55 variants, range 0–13), or TTNtv-/damaging–control subjects (1.20 ± 1.29, range 0–8; P = 0.3819, Kruskal–Wallis test). These data suggest that TTNtv carriers generally do not show an excess of second damaging variants.

Family segregation

To further test our criteria for variant prioritization, we performed cosegregation analysis in 28 DCM families in which DNA samples from ≥3 informative individuals were available. All rare variants that were identified in each family proband were evaluated in the respective family members.

Truncating group A gene variants

Group A gene truncating variants were evaluated in three families (Supplementary Figure S4). In family BY, the proband carried a BAG3 stop codon with segregation analysis consistent with linkage (odds ratio 1/18, logarithm of the odds (LOD) score 1.3). The truncated protein would lack the signature BAG domain that binds to HSP70 and is required for chaperone activity.18 Numerous BAG3 truncating pathogenic variants have been reported in DCM patients, and reduced levels of BAG3 protein have been seen in ventricular tissues from patients with heart failure.19,20,21

The family DF proband and her affected brother carried a MYH7 splice acceptor site variant. This variant was absent from three unaffected individuals aged >40 years, but the family size was insufficient to show statistically significant linkage (odds ratio 1/6, LOD score 0.8). MYH7 truncating variants were found in two additional sporadic DCM cases in our cohort (Supplementary Table S9) but have rarely been reported in DCM cases,12,22 and there is uncertainty about their clinical significance.

The family C proband carried a SCN5A stop codon that would truncate the C-terminus with loss of a PY-motif (binding site for Nedd4 ubiquitin ligase) and a PDZ domain binding motif (interacts with the cytoskeletal adapter protein, syntrophin), potentially giving rise to protein degradation or trafficking defects.23,24 The cardiac phenotype associated with SCN5A pathogenic variants often includes arrhythmias and conduction-system abnormalities as well as DCM,25,26 and atrial fibrillation was present in all affected individuals in this kindred. The apparent nonsegregation of this variant with DCM (LOD score –2.5) may be confounded by two possible phenocopies who had other plausible acquired causes of disease.

Missense variants

Segregation of 30 missense variants was evaluated in 14 kindreds (Supplementary Table S10). Only 4 of the 30 variants had LOD scores >1 (suggestive of linkage relative to family size) and all of these were damaging group A gene variants (Supplementary Fig. S4). Affected individuals in family CZ carried a p.Arg369Gln MYH7 variant and had both DCM and left ventricular noncompaction. This variant, in the myosin motor domain, has been associated with both phenotypes.6,27 The p.Arg634Trp RBM20 variant, present in all affected individuals in family AB, lies within the arginine–serine-rich (RS) domain, which is a putative DCM pathogenic variant hotspot.28,29 There were also two cosegregating DES variants, p.Leu398Pro (family FK) and p.Lys449Thr (family FG), the latter associated with myofibrillar myopathy.30 Four additional kindreds (families BG, BK, FR, KS) had damaging group A gene variants with LOD scores <1 (Supplementary Fig. S5). The damaging variants were present in 15 of 16 affected individuals in these families, with genotype–phenotype discordance driven mainly by nonpenetrant variant carriers. In all families tested, there was no evidence for cosegregation of any variants that did not meet the criteria for being “damaging” or that were in genes other than group A.

Multiple variants

With expanded genetic testing panels, it is not uncommon to find several rare variants in DCM cases, prompting hypotheses of multiple pathogenic variants.7,31 In an extension of the current analysis, we reviewed data for individuals who had undergone testing with the 69-gene testing panel and looked at all 69 genes as well as increasing the threshold level of MAF to <1%. Family cosegregation analysis was then performed in five kindreds in which the proband was found to carry ≥5 variants. In four of these families (families BY, FK, AB, BA), only one of the proband’s multiple variants segregated with DCM (Supplementary Figure S6). In the remaining family (family GX), there were no cosegregating variants. Four of the proband’s five rare variants were inherited from her unaffected father, and none were damaging group A variants. These studies demonstrate the value of family analysis and support the expectation that a single “driver” variant will be present when DCM appears as a Mendelian trait.

Location of variants in group A genes

To explore potential effects of variant location within group A genes, we compared the distribution of missense variants identified in DCM cases (derived from our own cohorts and two clinical laboratories12) with rare (MAF < 0.1%) missense variants in ExAC (Fig. 2, Supplementary Table S11). In MYH7, DCM variants occurred in all protein domains but were preferentially located in the myosin motor (Fig. 2a). DCM-associated variants in RBM20 were significantly enriched in the RS domain that is associated with altered titin splicing and myocardial compliance (Fig. 2b).28,29 More than half of the damaging RBM20 variants resided in the RS domain or in a glutamate-rich region that is also associated with titin splicing defects.32 DCM-associated LMNA variants were mostly located in the coiled-coil rod domain, particularly in coil 2, which is critical for dimer formation (Fig. 2c),33 and there was a cluster of DCM-associated variants in the S4 voltage sensor, repeat I, of SCN5A (Fig. 2e). Variants in the S4 transmembrane segments cause arrhythmic forms of DCM and have been associated with gain-of-function effects and gating pore currents25,26,34,35,36. Clustering of DCM variants was also apparent in the α-tropomyosin binding domain of TNNT2, and several damaging variants in TPMI resided in the cardiac troponin T binding domain (Fig. 2g,h).

Fig. 2: Rare variant distribution in DCM cases and control subjects.
figure 2

The location of variants in different protein domains of group A genes is shown in panels (a–l). In each panel, variants identified in DCM patients in the primary cohort (top row) and in the familial dilated cardiomyopathy (FDCM) replication cohort (second row) are shown: all variants (pink, “DCM all”), damaging variants (red, “DCM HP”). Variant types are indicated as follows: circle = missense, square = splice site change, diamond = frameshift, star = stop codon. The third row shows DCM variants identified by two clinical diagnostic laboratories (“DCM Walsh”).12 Missense variants identified in control subjects are shown below the protein schematic: control subjects in this study (Controls), Exome Aggregation Consortium (ExAC) database. See Supplementary Table S11 for protein domain coordinates and statistical analysis

Discussion

Our data show that rare protein-altering cardiomyopathy gene variants are not unique to DCM patients but also occur commonly in apparently healthy individuals. Because of this, effective discriminative strategies are clearly needed to derive meaningful results from genetic testing. Here we provide a new approach for assessment of rare variants that combines both variant-level and gene-level information. We propose that ranking genes based on their a priori likelihood of disease causation is a key factor in identifying clinically actionable variants.

Only 14 of 41 genes achieved “group A” status. Our literature-based gene grading method was subsequently validated by the predilection for damaging variants in DCM cases to occur in group A genes and by the similar clustering of pathogenic/likely pathogenic ClinVar variants. Gene group is a dynamic parameter, and is highly dependent on the contemporary knowledge base. Assessment of genetic evidence is intrinsically biased by the size of the kindreds evaluated, the frequency in which genes have been screened, and published reports, and definitive conclusions cannot be drawn if there is insufficient information. Genes in which variants cosegregated with DCM in multiple large kindreds (≥5 affected) scored highly but for many genes, only small families or single cases have been studied, raising the possibility that variants, even if function-altering, might cosegregate by chance or be incidental findings. Similarly, in vitro functional data are helpful if these are available but are often lacking from the literature. Group A genes typically had animal data showing spontaneous development of DCM in models expressing heterozygous loss-of-function alleles or human missense pathogenic variants. This level of evidence was missing from many studies in which only homozygous loss-of-function animals have been studied. While such models can implicate genes in normal cardiac function, it cannot be assumed that there is a direct correlation with gene “dose” and that heterozygous counterparts will have a similar, albeit less severe, phenotype. In homozygous loss-of-function animals, there may also be profound effects on cardiac development that independently predispose to contractile impairment.

The group A genes, MYH7, RBM20, and LMNA, had significantly more rare variants in DCM cases than controls, and frequently harbored damaging variants. MYH7 and LMNA have consistently appeared on lists of the “most frequently mutated” genes in DCM genetics studies, and were the top two genes (after TTN) showing an excess of rare variants (MAF <0.0001) in a recent analysis of 1315 DCM patients referred to diagnostic genetic testing laboratories.6,7,12,22 RBM20 is a more recent addition to the disease gene list and has been less frequently screened. Despite their overall enrichment in DCM cases, not all rare variants in MYH7, RBM20, and LMNA are necessarily deleterious and additional information is required for clinical interpretation of any specific variant. Interestingly, variants in these genes had a nonrandom distribution in DCM cases: MYH7 variants clustered in the myosin motor, RBM20 variants in the RS domain, and LMNA variants in the coiled-coil rod. As the numbers of reported DCM-associated variants expands, additional patterns of variant distribution patterns may become apparent.

Our method for stratifying rare variants provides a framework for future studies and further refinements are anticipated. We expect that over time, more genes will achieve group A status, and ongoing rigorous gene ranking is warranted. For missense variants, the criterion of absence from ExAC may underestimate the yield of deleterious variants and varying MAF threshold levels need to be evaluated. It is notable, for example, that the few ClinVar-annotated pathogenic/likely pathogenic group A gene variants that were not captured by our method were mostly extremely rare in ExAC. Many of the ClinVar-listed variants are annotated as pathogenic/likely pathogenic for disorders other than DCM. This phenotypic heterogeneity suggests that at least some of these variants might be modifiers rather than directly causative of disease. Another explanation is that the ultimate clinical manifestations of any given variant may be flavored by the context of each person’s genetic and lifestyle factors. The detection of pathogenic/likely pathogenic variants for diverse disorders suggests that our prioritization method could be applied to a range of genetic testing indications, with the caveat that disease-specific gene lists would be needed.

The lack of cardiac investigations in 40% of control subjects is a limitation of this study. We focused on subjects of European ancestry to avoid confounding effects of ethnicity-related background genetic variation, and the applicability of our findings to other populations has yet to be determined. In assessing individual burden of genetic variation, only rare cardiomyopathy gene variants were considered. The extent to which rare, low-frequency, and common variants in a broad range of cardiac genes might cumulatively contribute to a myopathic substrate is unknown.

Ongoing efforts by the American College of Medical Genetics to standardize variant annotation methods are an important step toward establishing international guidelines for genetic testing in DCM and other inherited human disorders.37 The cost-efficacy of first-line screening of a core set of key DCM disease genes (e.g., group A genes) versus extended panels or genome-wide testing warrants further analysis. As genome sequencing is poised to become part of mainstream health care, there is a pressing need for curation of clinical and genetic information in databases such as ClinGen,38 functional evaluation of variants, and clinical trials in genotyped patients. Advancements in these areas should expedite implementation of personalized medicine.

In summary, our data suggest that interpretation of genetic testing results cannot be made on the basis of variant characteristics alone, and that the effect size of protein-altering variants is primarily determined by the biological importance of the gene involved and relevance to DCM pathogenesis.