Introduction

Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited heart disease characterized by ventricular dysfunction and arrhythmias.1 ARVC has an estimated phenotypic prevalence of 1:1000 to 1:5000,2 and is among the leading causes of sudden cardiac death in people under 35 years of age—especially young athletes.3, 4 Clinical symptoms are frequently absent before a sudden death event,5 so improved methods of screening and early diagnosis are needed to provide an opportunity for life-saving prophylaxis. ARVC is primarily attributed to genetic variants in cardiac desmosome genes,6, 7 so a “genome-first” approach to patient identification is a promising option for earlier diagnosis. This paradigm is a natural extension of recommendations from the American College of Medical Genetics and Genomics that specific incidental findings from clinical genetic sequencing should be reported to patients because of the potential for medical benefit.8 Indeed, ARVC was one of the few selected conditions included with these guidelines, representing five of the 56 genes recommended for screening.

Recent studies have called into question the efficacy of incidental genetic findings for ARVC and other inherited cardiac diseases.9, 10, 11, 12, 13 Most notably, a study by Van Driest et al.11 concluded that there was no abnormal phenotype in individuals carrying potentially pathogenic variants for select genes associated with either long QT syndrome or Brugada syndrome, suggesting that notifying patients of incidental findings was unwarranted. While striking, this study was limited by the facts that only 2,022 individuals were studied, and only two genes—accounting for approximately 38% of long QT syndrome cases14 and 16% of Brugada syndrome cases15—were screened. Moreover, the observed variants were almost exclusively missense or variants of uncertain significance (VUS). Missense variants have previously been shown to be both common10, 16 and difficult to evaluate for pathogenicity,17 as is further evidenced by the lack of consensus among clinical laboratory reviews in the aforementioned study by Van Driest et al. Based on disease prevalence,11 the likelihood of identifying even one patient with a causal mutation linked to either syndrome in that cohort was only 34%. Larger numbers are clearly needed to sufficiently power such analyses for rare diseases.

In this study, we used a 15-fold larger cohort (30,716 subjects with exome sequencing), reviewed more genes (seven), and focused on “radical” putative loss-of-function (pLOF) variants, which have a higher probability of disease association,9 to provide a more comprehensive analysis of the genotypic prevalence of ARVC. Furthermore, since no other study has evaluated the phenotypes of individuals with pLOF ARVC variants ascertained through population sequencing, we reviewed electronic health records (EHR) for identified subjects to establish a genotype–phenotype association. While full disease penetrance was not expected, we hypothesized that a discernible EHR phenotype (such as the presence of ARVC diagnostic criteria, arrhythmias, or other primary cardiomyopathies) would be present in individuals with pLOF variants.

Materials and methods

Study information

The MyCode Community Health Initiative of Geisinger Health System (GHS) is an institutional review board–approved research biorepository and precision medicine project. MyCode participants provide samples for research, including permission to link samples and associated data with information in their EHR.18 Through the “DiscovEHR” collaboration between GHS and the Regeneron Genetics Center, DNA samples from MyCode participants are used to generate exome sequence data to populate this database.18, 19 Details of the exome sequencing and postprocessing have been described elsewhere19 (see the Supplementary Information online for additional detail). Briefly, sequencing was performed on an Illumina v4 HiSeq 2500 to a coverage depth such that over 95% of samples had greater than 85% of the target bases covered with a read depth greater than 20X. At the time of this study, the MyCode repository comprised 30,716 subjects of predominantly European ancestry. No exclusions were made for relatedness or ancestry.

Variant evaluation

We reviewed the ClinVar (http://ncbi.nlm.nih.gov/clinvar) and ARVC (http://arvcdatabase.info)20 databases for all ARVC-associated variants in PKP2, DSP, DSC2, DSG2, JUP, TMEM43, and TGFβ3 that were classified as pathogenic (P) or likely pathogenic (LP) as of 2 June 2015. The ClinVar search criteria included both the gene name and the term “arrhythmogenic right ventricular cardiomyopathy.” For conflicting classifications, ClinVar superseded the ARVC database, and conflicting interpretations in ClinVar were resolved based on the most recent submission. The variants observed in the MyCode cohort were then further classified by expert review according to published methods21 consistent with the 2015 American College of Medical Genetics and Genomics–Association for Molecular Pathology guideline for sequence variant interpretation.22 Of note, loss-of-function is the predominant mechanism of pathogenicity for the desmosome genes in ARVC,9 so the evidence assertion for radical pLOF variants was generally very strong. This expert review was conducted by staff at the Laboratory for Molecular Medicine at Partners Personalized Medicine (Christina Austin-Tse, Heather Mason-Suares, Matthew Lebo). To account for ongoing improvements in the bioinformatics pipeline, variant calls were rechecked against the most recent quality control–filtered data at the time of manuscript preparation. This pipeline included GATK best practices for variant calling,23 followed by GATK filtering at a genotype quality threshold of 20.
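The precedence rules described above (ClinVar over the ARVC database, most recent submission within a source, then a genotype quality filter) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Assertion` record, the source labels, and the function names are hypothetical, the tie-breaking rule for multiple ARVC database entries is assumed, and treating the threshold as "greater than or equal to 20" is a guess, since the text does not say whether the bound is inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    """One database assertion about a variant (hypothetical record layout)."""
    source: str          # "ClinVar" or "ARVC" (labels assumed for illustration)
    classification: str  # e.g. "P", "LP", "VUS", "LB"
    submitted: date


def resolve_classification(assertions: List[Assertion]) -> Optional[str]:
    """Apply the stated precedence: ClinVar supersedes the ARVC database,
    and the most recent submission wins within the chosen source."""
    clinvar = [a for a in assertions if a.source == "ClinVar"]
    pool = clinvar if clinvar else assertions
    if not pool:
        return None
    return max(pool, key=lambda a: a.submitted).classification


def is_candidate(assertions: List[Assertion], genotype_quality: float,
                 gq_threshold: float = 20) -> bool:
    """Keep a call only if it resolves to P/LP and passes the GQ filter.
    Inclusive comparison is an assumption, not stated in the text."""
    resolved = resolve_classification(assertions)
    return resolved in {"P", "LP"} and genotype_quality >= gq_threshold
```

A variant listed as P in the ARVC database but most recently submitted to ClinVar as LP would resolve to LP under this sketch, and would be dropped only if its genotype quality fell below the threshold.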

Subject identification and phenotype analysis

All subjects carrying at least one potential ARVC variant (database reported “P/LP”) were identified. Age (±5 years) and sex-matched subjects (5:1 match) were randomly ascertained from MyCode subjects lacking ARVC-associated variants.
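A minimal sketch of this kind of matching is shown below. It is not the procedure actually used in the study: the dictionary layout, the function name `match_controls`, the fixed random seed, and sampling without replacement are all assumptions made for illustration.

```python
import random


def match_controls(cases, pool, ratio=5, age_window=5, seed=0):
    """Randomly pick up to `ratio` controls per case, matched on sex and
    on age within +/- `age_window` years.

    `cases` and `pool` are lists of dicts with keys 'id', 'age', 'sex'
    (hypothetical layout). Controls are drawn without replacement,
    which is an assumption.
    """
    rng = random.Random(seed)
    available = list(pool)
    rng.shuffle(available)  # random order stands in for random ascertainment
    matches = {}
    for case in cases:
        chosen, remaining = [], []
        for ctrl in available:
            eligible = (ctrl["sex"] == case["sex"]
                        and abs(ctrl["age"] - case["age"]) <= age_window)
            if eligible and len(chosen) < ratio:
                chosen.append(ctrl)
            else:
                remaining.append(ctrl)
        available = remaining  # matched controls are not reused
        matches[case["id"]] = [c["id"] for c in chosen]
    return matches
```

With a small pool, a case may receive fewer than five controls; a production implementation would need to report or handle such shortfalls.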

For all subjects with potential variants and a random subset (10%) of controls, we completed a blinded expert review of the primary data, including:

  1. Most recent nonpaced electrocardiogram (ECG) (initial review by C.A.J. and C.T.; any abnormal finding read by H.C. for classification).

  2. Most recent echocardiogram for which the right ventricular function and size were not explicitly reported as normal (V.C.M. and D.J.M.). A random 20% of the “normal” cases were also reviewed to ensure accuracy.

Data from the most recent Holter monitoring were also reviewed. Findings were compared against the diagnostic Task Force criteria for ARVC, which include right ventricular dysfunction and depolarization/repolarization abnormalities on ECG.24 International Classification of Diseases, Ninth Revision (ICD-9) codes in patient records as of 21 July 2015 were reviewed for specific (see Supplementary Table S1) and nonspecific (Supplementary Table S2) codes. As appropriate, the cause of death (i.e., cardiac, noncardiac, or unknown) was assessed from a blinded physician chart review (B.K.F.) of death certificates or contemporaneous physician notes.

Separately, the GHS EHR was reviewed for individuals who had experienced any medical encounter with an associated diagnosis of arrhythmogenic right ventricular cardiomyopathy or dysplasia (Supplementary Table S1). A physician chart review (B.K.F.) of the two most recent cardiology, internal/family medicine, and discharge notes (or any two subspecialty notes when other notes were not available) was performed for affirmative documentation of disease. This blinded procedure was also used to confirm the diagnostic status for a random subset (33%) of subjects with potential variants, plus any subject with an external cardiology referral.

Statistical analysis

Statistical testing was performed using R (version 3.3.2). Descriptive statistics are reported as mean±standard deviation. Group differences in phenotypes were compared using Fisher’s exact tests; the log-rank test was used to compare survival, using the OIsurv package in R.25 Reported P-values were not adjusted for multiple comparisons.
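For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table can be written with the standard library alone. The sketch below is illustrative and is not the R code used for the analysis; the function name is hypothetical, and the two-sided rule (summing the probabilities of all tables no more likely than the observed one, with a small tolerance for floating-point ties) is the common convention assumed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be study groups and columns the presence/absence of a
    phenotype. Returns the p-value.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates a small hypothetical table; the counts are not taken from the study.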

Results

Missense variants are common; previously reported pLOF variants are rare

A total of 323 rare variants were database-listed as “P/LP” in ClinVar and/or the ARVC database (see Supplementary Table S3). Of these, 45 potentially pathogenic variants were identified in 298 MyCode participants. One variant (TMEM43 c.705+7G>A) was observed in 76 subjects and immediately reclassified as likely benign due to the high allele frequency (>0.075%).21 Furthermore, the variant calls for seven subjects did not pass genotype quality filtering, and were excluded. The remaining 215 patients (0.7% of the population) comprised the potentially pathogenic variant group (Figure 1; Table 1). The mean age was 62±17 years. The distribution with respect to affected gene is shown in Supplementary Table S4. Missense variants were the most prevalent class, accounting for 30 of the 44 observed variants (68%; excluding TMEM43 c.705+7G>A) and 191 of the 215 subjects (89%).

Figure 1

Overview of the study design and patient group identities. ARVC, arrhythmogenic right ventricular cardiomyopathy; EHR, electronic health records; pLOF, putative loss of function; VUS, variant of uncertain significance.

Table 1 Details of variants database-listed as “P/LP” and identified in 30,716 patients

After evidence review, nine radical pLOF variants (nonsense and splice site) met the criteria to be classified as P/LP. Supporting evidence for other variants, including all missense variants, was insufficient and they were scored as VUS or likely benign (Table 1). Of note, eight of the nine pLOF variants were documented in the ClinVar database; only one of the 32 observed variants unique to the ARVC database was confirmed.

The nine pLOF variants were identified in 18 subjects (1:1706). The mean age of this group was 59±18 years (Table 2), and the mean body mass index (BMI) was 32±8.
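The "1:N" prevalence figures follow directly from the carrier counts and the cohort size of 30,716. A short sketch of the arithmetic (the helper names are hypothetical):

```python
COHORT_SIZE = 30716  # sequenced MyCode subjects


def one_in_n(carriers, cohort_size=COHORT_SIZE):
    """Express a carrier count as the N in '1:N', rounded to an integer."""
    return round(cohort_size / carriers)


def percent(carriers, cohort_size=COHORT_SIZE):
    """Carrier frequency as a percentage of the cohort."""
    return 100 * carriers / cohort_size


# 215 subjects with a database-listed "P/LP" variant; 18 with a pLOF variant.
all_listed = one_in_n(215)
plof_only = one_in_n(18)
```

Here `all_listed` and `plof_only` reproduce the ratios quoted in the text for the two groups.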

Table 2 Patient demographic details by group

No increase in the prevalence of ARVC diagnostic criteria in patients with pLOF variants

The 18 subjects with pLOF variants had a median of 9.5 years of EHR data (range: 0–16 years). Based on ICD-9 coding and a randomized chart review, none had a documented diagnosis of ARVC. Fourteen subjects (78%) had a previous ECG. These were manually reviewed for diagnostic depolarization/repolarization criteria (Table 3). One of the 14 subjects satisfied a minor criterion (inverted T-waves in V5 and V6), which qualifies for a “borderline” diagnosis with the addition of the LP variant. Separately, echocardiograms were reviewed for the eight subjects (44%) with previous studies, and right ventricular function was reported as “normal” for all. Four of the 18 subjects with pLOF variants (22%) had neither an ECG nor echocardiogram—including three individuals younger than 37 years of age.

Table 3 Summary findings of 2010 Task Force criteria by group

Given the uncertain importance of a rare VUS in ARVC-associated genes, primary EHR data for the 184 individuals with such variants were also reviewed (Table 2). The average BMI for these subjects was 31±7. Based on ICD-9 coding and a randomized chart review, none of these individuals had a documented diagnosis of ARVC. ECGs were available for 160 individuals (87%); 83% satisfied no diagnostic criteria, 4% satisfied a major criterion, and 13% satisfied a minor criterion (Table 3). These frequencies were comparable to those of the variant-negative control group, in which 5% and 17% were found to have major and minor ECG criteria, respectively. The VUS and variant-negative control groups did not differ significantly (P=0.54 and P=0.35, respectively). The same trends were observed with respect to echocardiographic and Holter monitoring criteria (Table 3).

Nonspecific phenotypes distinguish known ARVC cases but not potential variant groups

Thirty patients with at least one instance of an ARVC diagnostic code in their EHR were identified, out of 1.35 million subjects. A physician chart review confirmed an ARVC diagnosis in only 8 subjects (1:168,750; Table 2); diagnostic codes were determined to be inappropriately applied in the remaining 22 subjects. No exome sequence data were available for these individuals.

Nonspecific ICD-9 composite phenotypes—nonischemic cardiomyopathies, cardiac electrophysiologic abnormalities, ARVC “characteristic symptoms,” and automatic implantable cardioverter defibrillator use—for the pLOF, VUS, variant-negative controls, and definite ARVC study groups are compared in Figure 2a. The prevalence of nonischemic cardiomyopathies, electrophysiologic abnormalities, and automatic implantable cardioverter defibrillator use was significantly higher in the definite ARVC group, whereas the patients with a pLOF variant or VUS showed prevalences similar to those in the negative control group.

Figure 2

Condition prevalence and survival analysis. (a) Prevalence of the composite ICD-9 categories within each study group. P-values denote Fisher’s exact test for a given category. (b) Kaplan–Meier survival estimates for each study group, showing that the pLOF group had a significantly reduced survival by log-rank test. AICD, automatic implantable cardioverter defibrillator; ARVC, arrhythmogenic right ventricular cardiomyopathy; pLOF, putative loss of function; VUS, variant of uncertain significance.

Increased all-cause mortality in the pLOF group

We compared all-cause mortality (age at death) between groups (pLOF, VUS, and variant-negative controls). For the control and VUS groups, 89% and 92% of subjects were alive at the time of analysis, respectively (P=0.19; Figure 2b). By comparison, only 72% of subjects (13 of 18) in the pLOF group were alive, representing significantly increased mortality compared with the VUS group (P=0.003). However, from the chart review, the causes of death for the five pLOF subjects were noncardiac in four cases and unknown in one.
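The survival comparison rests on the two-group log-rank test. The sketch below shows the standard calculation using only the standard library; it is not the OIsurv-based analysis from the study, the function name and input layout are hypothetical, and it ignores refinements (such as left truncation when age is the time scale) that a full analysis might need.

```python
from math import erfc, sqrt


def logrank_test(group1, group2):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is True for
    a death and False for a censored observation (for example, alive at
    the time of analysis). Returns (chi_square, p_value) with 1 degree
    of freedom.
    """
    event_times = sorted({t for t, e in group1 + group2 if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for ti, _ in group1 if ti >= t)  # at risk in group 1
        n2 = sum(1 for ti, _ in group2 if ti >= t)  # at risk in group 2
        d1 = sum(1 for ti, e in group1 if e and ti == t)
        d2 = sum(1 for ti, e in group2 if e and ti == t)
        n, d = n1 + n2, d1 + d2
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value
```

In the setting described here, `time` would be age at death or age at last follow-up, and each group would hold one pair per subject.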

Discussion

Our major findings appear to parallel recent studies in cardiac genetics with respect to the high prevalence of rare variants once thought to be disease-causing, but weak association with classic disease symptoms.10, 11, 12, 13, 16 However, the present work specific to ARVC represents a significant advance for several reasons. This study represents the largest analysis to date of next-generation sequencing data to evaluate the presence of putative genetic variants related to ARVC. The size and scope of this study (inclusive of all primary ARVC genes) provided a 98 to 99% likelihood of finding at least one subject with ARVC and a causal variant. This high probability contrasts with previous studies, such as the work by Van Driest et al., which have been drastically underpowered to study rare diseases and their associated radical genetic variants.11 Indeed, we observed previously reported pLOF variants in 18 (1:1706) individuals, which were further adjudicated as P/LP following expert evidence review. This study also represents the first population-based analysis of ARVC genetics with linked EHR data to evaluate genotype–phenotype associations. With these linked data, we reviewed diagnostic codes for all subjects, ECGs for 86% of the pLOF and rare VUS groups, and right ventricular function via echocardiogram in 58%. Our findings suggest that, in unselected individuals with incidentally detected pLOF variants, genetic penetrance may be much lower than estimates (40–60%) from familial studies5 suggest, as no individual had a documented diagnosis of disease, and only one subject satisfied any additional minor diagnostic criteria. Moreover, our findings demonstrate that individuals with a rare VUS also have an unremarkable phenotype, with no existing diagnoses and diagnostic criteria frequencies in line with the observed false-positive rate in variant-negative controls. Several studies have documented the considerable background noise in our understanding of putative pathogenic ARVC variants,9, 10 but none has previously demonstrated that this noise has no phenotypic consequence.
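The likelihood of finding at least one affected carrier in a cohort follows from a simple complement rule: if each subject independently has probability p of being an affected carrier, the chance of at least one among N subjects is 1 − (1 − p)^N. The sketch below illustrates the calculation; the function names are hypothetical, and the per-subject probability in the example is an illustrative assumption, not the value used to obtain the percentages quoted in the text.

```python
from math import ceil, log


def prob_at_least_one(n_subjects, per_subject_probability):
    """Probability that a cohort contains at least one affected carrier,
    assuming independent subjects with a common probability."""
    return 1.0 - (1.0 - per_subject_probability) ** n_subjects


def required_cohort(per_subject_probability, target):
    """Smallest cohort size giving at least `target` probability of
    observing one or more affected carriers."""
    return ceil(log(1.0 - target) / log(1.0 - per_subject_probability))


# Illustrative only: 1 in 7,000 is an assumed per-subject probability.
assumed_p = 1 / 7000
large_cohort = prob_at_least_one(30716, assumed_p)
small_cohort = prob_at_least_one(2022, assumed_p)
```

Under any fixed per-subject probability, the 15-fold difference in cohort size translates into a large difference in detection probability, which is the point made above about statistical power.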

Vigorous exercise is the most significant known modulator of genetic penetrance for ARVC.26, 27 In fact, current guidelines restrict athletic participation following even a possible diagnosis.28 Considering the obese average BMI of our cohort (a mean of 32 in the pLOF group and 31 in the VUS group), it is likely that few subjects, if any, regularly participate in the level of vigorous physical activity typically associated with ARVC. The absence of vigorous exercise likely confers a protective effect in the setting of genetic predisposition to ARVC and may have played a role in the low penetrance we observed in our study.

While our phenotype determination was not made from prospective patient evaluations, 86% of the individuals studied had a nonpaced ECG available for review. ECG abnormalities comprise two of the six elements of ARVC diagnosis, and the negative predictive value of a normal ECG is very high, as past studies have shown that over 80% of diagnosed ARVC patients have ECG abnormalities.29, 30 Hence, while the available data are insufficient to definitively determine the status for the 13% of subjects with observed abnormalities or the 14% without ECGs, there is a high degree of certainty that the 73% of subjects whose ECG did not satisfy Task Force criteria do not have ARVC.
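The negative predictive value argument can be made explicit with Bayes' rule. In the sketch below, the 80% sensitivity is taken from the text; the specificity and the prior probability of disease are illustrative assumptions (the function name is also hypothetical), so the output should be read as a sensitivity analysis rather than as a result of this study.

```python
def negative_predictive_value(sensitivity, specificity, prior):
    """Probability of being unaffected given a normal ECG.

    sensitivity: fraction of affected individuals with an abnormal ECG
    specificity: fraction of unaffected individuals with a normal ECG
    prior:       assumed probability of disease before the ECG is read
    """
    true_negatives = specificity * (1.0 - prior)
    false_negatives = (1.0 - sensitivity) * prior
    return true_negatives / (true_negatives + false_negatives)


# Assumed inputs: sensitivity from the text, the rest for illustration.
npv_by_prior = {
    prior: negative_predictive_value(0.80, 0.78, prior)
    for prior in (0.01, 0.05, 0.20, 0.50)
}
```

The dictionary shows how the negative predictive value falls as the assumed prior probability of disease rises, so the strength of the conclusion drawn from a normal ECG depends on the penetrance one assumes for incidentally identified carriers.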

These results are not an inherent indictment of clinical follow-up for incidental genetic findings of ARVC, as there was at least one case of suggestive findings discovered through a genome-first ascertainment. The increased all-cause mortality in the small group of subjects with pLOF variants was also suggestive of a meaningful finding, although confirmation with a larger sample size is necessary. Instead, our findings strongly indicate that such efforts should be managed conservatively and with stringent evaluation of observed genetic variants based on current standards. Even for individuals identified with radical pLOF variants through genome-first ascertainment, low penetrance should be assumed because such screening is likely to maximize the probability of false-positive findings. Furthermore, this low penetrance may be exacerbated in the absence of environmental or lifestyle factors, such as a vigorously active lifestyle. Thus, the incidental identification of a pathogenic variant does not appear to warrant automatic acceptance as a major criterion for ARVC diagnosis, as it is for patients ascertained through clinical presentation or familial testing.24 Similar modification of family history criteria for arrhythmic risk stratification has recently been proposed.30 Ultimately, more research is needed to develop evidence-based guidelines for the clinical management of patients with incidentally identified pLOF variants in ARVC-associated genes.

Other potential mechanisms to explain the results

Genetic modulation of penetrance

Additional unknown genetic factors may also explain the low penetrance we observed. For example, there has been some debate about the importance of compound or digenic heterozygosity to ARVC,9, 31, 32 which may predispose patients to a more severe phenotype.5, 32 Furthermore, since only 50 to 70% of diagnosed cases have been attributed to a genetic cause, unrecognized genes may play critical roles. More extensive use of next-generation sequencing may help to address such questions.33 None of the subjects in our cohort had more than one known variant, although potential novel variants were not assessed.

Phenotype misclassification

ARVC is a relatively new and evolving clinical entity that may be underrecognized in clinical populations. The observed prevalence of diagnosed ARVC in our large community health system was 1:168,750, which is two orders of magnitude below reported prevalence and supports this assertion. Since our evaluation was based on EHR data, we cannot rule out clinical phenotype misclassification in the 27% of our cohort who did not have a documented normal ECG. Moreover, without a formalized set of evidence-based diagnostic criteria for the expanding phenotypic spectrum of ARVC (e.g., bi-ventricular and left-dominant forms),34 our ability to rule out these less common manifestations of the disease from this cohort is limited.

Sampling bias

Given the advanced mean age of the sequenced cohort, it is possible that sampling was biased for survivors. Since ARVC often presents with sudden death in early adulthood, truly affected individuals may have died at a younger age and are therefore not represented in the data. Furthermore, the presence of other age-related comorbidities, such as coronary artery disease, in this older cohort may also contribute to an underdiagnosis of ARVC. Future studies should seek to incorporate younger individuals into penetrance estimates.

Limitations

This study was retrospective and used data derived from our institutional EHR, which had a median of 12 years of longitudinal data per patient for the entire MyCode cohort.18 Prospective targeted phenotyping of pLOF subjects is forthcoming through our institution’s GenomeFIRST return of results initiative, which will provide additional information on genotype–phenotype relationships and better clarify the predictive value of EHR phenotyping.35 However, the only other study using next-generation sequencing to study ARVC variants in 6,354 unselected individuals did not have linked EHR data.10 With a fivefold larger sequenced population and linkage to EHR, our study offers substantially more comprehensive results.

ARVC exhibits age-dependent penetrance, with symptoms and diagnostic criteria developing with time.30 Therefore, without completed lifelong follow-up, we cannot definitively say that none of the subjects in our cohort will develop new symptoms. However, previous studies have reported that the cumulative prevalence of ARVC is essentially flat after 60 years of age,36 suggesting that few of our subjects could be expected to develop new symptoms, given the mean age of the cohort.

We did not evaluate novel variants, instead focusing on previously observed variants. This was a similar approach to the recent work by Ghouse et al.12, 13 The addition of novel pLOF variants will increase the likelihood of identifying affected individuals, but will also increase the genotype prevalence estimates. Future studies will include potentially novel variants.

Conclusion

In a large, unselected cohort with exome sequencing data linked to EHR, rare variants listed as P/LP in database resources are common (1:143 genotypic prevalence); however, only 20% of the observed variants (9 of 44) were pLOF. Of the 18 individuals (1:1706) identified with a pLOF variant, the majority (72%) had a normal ECG and were likely unaffected, while only 6% (one subject) satisfied a minor ARVC diagnostic criterion. Because of the importance of environmental factors on ARVC pathogenesis, identification of radical variants alone does not provide a genome-first solution to the identification of ARVC. A conservative approach to clinical return of incidental genetic findings associated with ARVC with further in-person condition-specific phenotyping is warranted.