INTRODUCTION

The study of the genetic basis of human disease has traditionally utilized a “phenotype-first” approach in which persons with phenotypic disease traits are genotyped or sequenced to identify gene variants that may be associated with or causal for disease.1,2 A “genome-first” approach in which sequencing is applied to large heterogeneous populations with subsequent determination of the associated phenotypes is of interest.3,4 This approach can be applied to health-care populations with extensive electronic health record (EHR) phenotype data, thus permitting an unbiased approach to phenome-wide association studies (PheWAS) to determine the clinical impact of specific genetic variants.5,6 In addition to identifying previously unsuspected gene ontologies, this approach may also reveal that many patients with single-gene Mendelian disorders are not clinically diagnosed.7

Large-scale exome sequencing allows for the identification of rare exonic variants. Statistical aggregation tests that interrogate the cumulative effects of multiple rare variants in a gene (i.e., “gene burden”) increase the statistical power of regression analyses and enable gene-based association studies to describe the implications of mutated genes in human disease. Gene burden PheWAS in large health-care populations could increase the potential to uncover novel consequences of gene variants in the human disease phenome. One approach to gene burden PheWAS is to focus only on predicted loss-of-function (pLOF) variants,6 but this could lead to a lack of power because such variants are infrequent. To address this issue, private and very rare missense variants could be added to substantially increase the number of genotypic cases. However, a major challenge is deciding which missense variants to include in gene burden tests of association.

The unbiased genome-first approach is an ideal system for studying the effects of rare variants in genes with known pleiotropy. Pathogenic variants in LMNA are highly pleiotropic and cause several rare diseases including dilated cardiomyopathy, familial partial lipodystrophy type 2, and Emery–Dreifuss muscular dystrophy, among others.8,9,10,11 We leveraged the Penn Medicine Biobank (PMBB, University of Pennsylvania), a large academic biobank with exome sequencing linked to EHR data, to evaluate in detail the phenotypes associated with rare pLOF and annotated deleterious missense variants in LMNA. In addition to mining qualitative ICD-based diagnosis codes, we interrogated EHR data for quantitative phenotypic traits via analyses of clinical imaging and laboratory measurements. Our findings represent the first report of a genome-first approach to examining the clinical effects of pLOF and predicted deleterious missense variants in LMNA.

MATERIALS AND METHODS

Setting and study participants

All individuals recruited for the Penn Medicine Biobank (PMBB) are patients of clinical practice sites of the University of Pennsylvania Health System. Appropriate consent was obtained from each participant regarding storage of biological specimens, genetic sequencing, and access to all available EHR data. The study was approved by the Institutional Review Board of the University of Pennsylvania and complied with the principles set out in the Declaration of Helsinki.

The DiscovEHR cohort was used to replicate major findings. DiscovEHR is a collaboration between the Geisinger Health System and Regeneron Genetics Center in which exome sequencing was performed on biospecimens collected and linked to EHR data through Geisinger’s MyCode Community Health Initiative.12

Exome sequencing

This study included a subset of 11,451 individuals in the PMBB who had exome sequencing. We extracted DNA from stored buffy coats and then obtained exome sequences as generated by the Regeneron Genetics Center (Tarrytown, NY). These sequences were mapped to GRCh37 as previously described.13 For subsequent phenotypic analyses, we removed samples with low exome sequencing coverage (i.e., less than 75% of targeted bases achieving 20× coverage; N = 46), high missingness (i.e., greater than 5% of targeted bases; N = 14), high heterozygosity (N = 97), dissimilar reported and genetically determined sex (N = 104), genetic evidence of sample duplication (N = 89), and cryptic relatedness (i.e., closer than third-degree relatives; N = 145). Because these categories overlap, a total of 455 samples were removed from our database. Of note, among the 72 individuals identified as carrying a pLOF variant or a missense variant with a Rare Exome Variant Ensemble Learner (REVEL)14 score of at least 0.65 who were used for the primary analyses of this work, 4 individuals were removed from subsequent analyses due to low coverage (N = 2), sex discordance (N = 1), and being part of a parent–child pair (N = 1).
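The per-criterion counts above sum to 495, whereas 455 samples were removed, because a sample flagged by several criteria is removed only once. A minimal sketch of that bookkeeping follows; the sample IDs, criterion names, and the function `union_of_exclusions` are hypothetical and only illustrate the set union.

```python
def union_of_exclusions(flags_by_criterion):
    """Return the set of sample IDs flagged by at least one QC criterion."""
    removed = set()
    for sample_ids in flags_by_criterion.values():
        removed |= set(sample_ids)
    return removed


# Hypothetical toy flags; these IDs are illustrative, not study data.
toy_flags = {
    "low_coverage": {"S1", "S2"},
    "sex_discordance": {"S2", "S3"},
    "cryptic_relatedness": {"S3", "S4"},
}

removed = union_of_exclusions(toy_flags)
per_criterion_total = sum(len(ids) for ids in toy_flags.values())
print(len(removed), per_criterion_total)  # 4 6
```

In the toy data, six flags collapse to four removed samples, the same kind of gap as between 495 and 455.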

Exome sequencing in the DiscovEHR cohort was also performed by the Regeneron Genetics Center, as previously described.6,15 In addition to exclusions for sequence quality, sample duplicates, and sex discordance, we excluded 31,399 individuals with closer than third-degree relatedness, yielding a study set of 61,056 individuals.

Variant annotation and selection for gene burden association testing

For both PMBB and DiscovEHR, variants were annotated using ANNOVAR16 as pLOF or missense variants. pLOFs were defined as frameshift insertions or deletions, gain or loss of stop codon, and disruption of canonical splice site dinucleotides. Only variants with minor allele frequencies (MAF) ≤0.1% per the Genome Aggregation Database (gnomAD) were considered for inclusion in the gene burden association testing. Several approaches to inclusion of rare variants in the gene burden were applied, including pLOFs only, additional ClinVar pathogenic variants, and inclusion of missense variants that were scored deleterious by 5/5 algorithms (SIFT17, PolyPhen2 HumDiv, PolyPhen2 HumVar18, LRT19, MutationTaster20). To capture additional individuals with potentially pathogenic missense variants, we utilized REVEL, an ensemble method for predicting the pathogenicity of missense variants,14 to score rare missense variants in LMNA.
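The selection rules above can be sketched as a simple filter. This is an illustration under stated assumptions rather than the pipeline used in the study: the `Variant` record, its field names, and the consequence labels are hypothetical, and the REVEL cutoff is left as a parameter (0.65 is the value chosen in the Results).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels standing in for the pLOF classes in the text.
PLOF_CLASSES = {"frameshift", "stopgain", "stoploss", "splice_site"}
MAF_MAX = 0.001  # MAF <= 0.1% per gnomAD


@dataclass
class Variant:
    variant_id: str
    consequence: str
    gnomad_maf: float
    revel: Optional[float] = None   # REVEL score, missense variants only
    n_deleterious_calls: int = 0    # how many of the 5 algorithms call it deleterious


def is_rare(v: Variant) -> bool:
    return v.gnomad_maf <= MAF_MAX


def qualifies_five_of_five(v: Variant) -> bool:
    """Rare pLOF, or rare missense called deleterious by all 5 algorithms."""
    if not is_rare(v):
        return False
    if v.consequence in PLOF_CLASSES:
        return True
    return v.consequence == "missense" and v.n_deleterious_calls == 5


def qualifies_revel(v: Variant, revel_min: float = 0.65) -> bool:
    """Rare pLOF, or rare missense with a REVEL score at or above the cutoff."""
    if not is_rare(v):
        return False
    if v.consequence in PLOF_CLASSES:
        return True
    return v.consequence == "missense" and v.revel is not None and v.revel >= revel_min
```

Keeping the two rules as separate functions makes it easy to compare which carriers each rule captures.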

Clinical data collection

International Classification of Diseases Ninth Revision (ICD-9) and Tenth Revision (ICD-10) diagnosis codes and procedural billing codes, medications, and clinical imaging and laboratory measurements were extracted from the patients’ EHR. All laboratory values measured in the outpatient setting were extracted for participants from the time of enrollment in the Biobank until 3 March 2018; all units were converted to their respective traditional clinical units. Minimum, median, and maximum values of each laboratory measurement were recorded per individual. Glomerular filtration rate (GFR) estimates were calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) creatinine equation, given its superiority to the Modification of Diet in Renal Disease Study (MDRD) equation in patient populations with normal or mildly reduced eGFR. Inpatient and outpatient echocardiography measurements were extracted if available for participants from 1 January 2010 until 9 September 2016; outliers for each echocardiographic parameter (less than Q1 – 1.5*IQR or greater than Q3 + 1.5*IQR) were removed. As with the laboratory values, minimum, median, and maximum values for each echocardiographic parameter were recorded per patient.
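The outlier rule and the per-individual summaries can be illustrated as follows. This is a sketch, not the study code: the quartile convention (Python's "inclusive" method) is an assumption, since the text does not say how Q1 and Q3 were computed, and the example values are made up.

```python
import statistics


def remove_iqr_outliers(values):
    """Drop values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    # Assumption: "inclusive" quartiles; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def summarize(values):
    """Minimum, median, and maximum of one individual's measurements."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# Made-up ejection fraction readings (%); the value 5 is an obvious outlier.
readings = [55, 58, 60, 61, 62, 63, 65, 5]
kept = remove_iqr_outliers(readings)
print(summarize(kept))  # {'min': 55, 'median': 61, 'max': 65}
```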

For DiscovEHR, phenotypes were retrieved from Geisinger’s Phenomic Initiative database, which incorporates numerous sources (including the EHR) into a common data model. Patient demographics and ICD-10 codes from inpatient and outpatient encounters were retrieved as of 28 November 2018. ICD-9 codes were mapped to equivalent ICD-10 codes using underlying diagnosis codes.

Phenome-wide association studies

A PheWAS approach was used to determine the phenotypes associated with predicted deleterious variants in LMNA carried by individuals in PMBB.21 ICD-10 encounter diagnoses were mapped to ICD-9 via the Centers for Medicare & Medicaid Services 2017 General Equivalency Mappings (https://www.cms.gov/Medicare/Coding/ICD10/2017-ICD-10-CM-and-GEMs.html) and manual curation. Phenotypes for each individual were then determined by mapping ICD-9 codes to distinct disease entities (i.e., PheCodes) using the R package “PheWAS.”22 Patients were determined to have a certain disease phenotype if they had the corresponding ICD diagnosis on two or more dates, while phenotypic controls consisted of individuals who never had the ICD code. Individuals with an ICD diagnosis on only one date as well as individuals under control exclusion criteria based on PheWAS phenotype mapping protocols were not considered in statistical analyses.
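The case, control, and exclusion rule described above can be written compactly. The function name `classify_phenotype` and the `control_excluded` flag are hypothetical; the flag stands in for the control exclusion criteria of the PheWAS phenotype mapping, which are not reproduced here.

```python
def classify_phenotype(diagnosis_dates, control_excluded=False):
    """Assign one individual's status for one PheCode.

    diagnosis_dates: dates on which a mapped ICD code was recorded.
    control_excluded: True if the individual meets the control exclusion
        criteria for this PheCode (hypothetical flag, supplied by the caller).
    """
    n_dates = len(set(diagnosis_dates))  # repeated codes on one date count once
    if n_dates >= 2:
        return "case"
    if n_dates == 1:
        return "excluded"
    return "excluded" if control_excluded else "control"


print(classify_phenotype(["2015-03-01", "2016-07-12"]))  # case
print(classify_phenotype(["2015-03-01", "2015-03-01"]))  # excluded
print(classify_phenotype([]))                            # control
```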

Each disease phenotype was tested for association with the LMNA gene burden using a logistic regression model adjusted for age, age², gender, and the first ten principal components of genetic ancestry. We used an additive genetic model to collapse predicted deleterious LMNA variants via an extension of the fixed threshold approach.23 Given the relatively high percentage of individuals of African ancestry present in PMBB, PheWAS analyses were performed separately by European and African genetic ancestry and combined with inverse variance weighted meta-analysis. Our association analyses considered only disease phenotypes with at least 200 cases (≥~1.75% prevalence in the cohort), based on a prior simulation study for power analysis of PheWAS.24 This led to the interrogation of 333 total phenotypes, and we used a Bonferroni correction to adjust for multiple testing (p = 0.05/333 ≈ 1.5E-04).
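The Bonferroni threshold and the combination of ancestry-specific estimates can be sketched as below. The meta-analysis shown is the standard fixed-effect inverse variance weighted estimator with a two-sided normal p value; whether the study used exactly this form is an assumption, and the example betas and standard errors are made up.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def ivw_meta(estimates):
    """Fixed-effect inverse variance weighted meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per stratum.
    Returns the pooled beta, its standard error, and a two-sided p value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


print(f"{bonferroni_threshold(0.05, 333):.2e}")  # 1.50e-04

# Made-up log odds ratios for two ancestry strata.
beta, se, p = ivw_meta([(1.2, 0.40), (0.9, 0.60)])
```

The stratum with the smaller standard error receives the larger weight, so the pooled estimate lies closer to it.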

Replication of major PheWAS findings in DiscovEHR was performed using a logistic regression model adjusted for age, age², sex, and the first four principal components of ancestry. Dilated cardiomyopathy was defined as two or more encounter diagnoses of I42.0 (“Dilated cardiomyopathy”), or two or more instances of I42.8 (“Other cardiomyopathies”)/I42.9 (“Cardiomyopathy, unspecified”) diagnoses and mention of “dilated” in the underlying diagnosis code. Chronic kidney disease was defined as two or more encounter diagnoses of N18.3 (“Chronic kidney disease, stage 3 [moderate]”). For both phenotypes, patients with only one encounter diagnosis were excluded from analysis.

Statistical analyses

To compare available echocardiographic and serum laboratory measurements between carriers of predicted deleterious LMNA variants and genotypic controls, we used the nonparametric Wilcoxon rank-sum test (i.e., Mann–Whitney U test) for each clinical measurement. Additionally, comparisons were made using robust linear regression, adjusted for age, age², gender, and the first ten principal components of genetic ancestry, in both the overall population and individuals of European ancestry alone. Furthermore, 95% confidence intervals (CIs) and p values were corrected by bootstrapping with 1000 replicates via the adjusted percentile method. All statistical analyses, including PheWAS, were completed using R version 3.3.1 or version 3.5 (Vienna, Austria).
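For readers unfamiliar with the rank-sum comparison, the following sketch computes the Mann–Whitney U statistic for the first group using midranks for ties. It illustrates the statistic only; it does not compute a p value and is not the R code used in the study. The input values are made up.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a, using midranks for tied values."""
    pooled = sorted(
        [(v, "a") for v in group_a] + [(v, "b") for v in group_b],
        key=lambda pair: pair[0],
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == "a")
        i = j
    n_a = len(group_a)
    return rank_sum_a - n_a * (n_a + 1) / 2


# Made-up values: every carrier value is below every control value.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0
print(mann_whitney_u([1, 2], [2, 3]))        # 0.5
```

A U of 0 (or of n_a × n_b) indicates complete separation of the two groups; values near n_a × n_b / 2 indicate no shift.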

RESULTS

Phenome-wide association studies for gene burden of deleterious variants in LMNA

Among the 11,451 individuals in PMBB with exome sequencing, we identified a total of 11 individuals carrying one of nine different pLOF variants (including five frameshift insertions/deletions, one gain of stop codon, and three variants disrupting canonical splice site dinucleotides) in LMNA (Table S1). All 11 individuals carrying pLOF variants had a diagnosis of either “primary/intrinsic cardiomyopathy,” “cardiac conduction disorders,” or both, confirming that heterozygous pLOF variants in LMNA have a high penetrance for cardiomyopathy. Interestingly, only 4 of these 11 individuals had received clinical genetic testing to confirm their laminopathies.

A PheWAS on the 11 carriers with pLOFs alone showed a signal for cardiomyopathy (Fig. S1) but had insufficient power; furthermore, most known pathogenic LMNA variants are missense variants. Therefore, we identified 167 individuals with one of 88 rare (MAF ≤ 0.1% in gnomAD) missense variants in LMNA (Table S1). We aggregated pLOF variants and missense variants annotated as pathogenic in ClinVar (N = 9 different variants, 20 carriers) and performed PheWAS (N = 33 carriers), resulting in a stronger signal for cardiomyopathy that was significant (Fig. S2). Given that many of the rare LMNA variants were of unknown pathogenicity, we combined missense variants predicted to be deleterious by a consensus of 5/5 algorithms (SIFT17, PolyPhen2 HumDiv, PolyPhen2 HumVar18, LRT19, MutationTaster20), one of the standard approaches for combining pLOF variants with computationally predicted pathogenic missense variants6 (N = 14 different variants, 24 carriers; Table S1), in a gene burden PheWAS (N = 35 carriers; Fig. 1a). The signal for cardiomyopathy diagnoses was even stronger, additionally identifying related diagnoses such as “first-degree atrioventricular (AV) block,” “sinoatrial node dysfunction,” and “congestive heart failure.”

Fig. 1

Phenome-wide association studies (PheWAS) of predicted deleterious LMNA variants. Gene burden tests of association for predicted loss-of-function (pLOF) variants and predicted deleterious missense variants in LMNA. (a) Gene burden PheWAS of pLOF variants (N = 11 carriers) and missense variants predicted to be deleterious by 5/5 algorithms (SIFT, PolyPhen2 HumDiv, PolyPhen2 HumVar, MutationTaster, and LRT; N = 24). The blue line represents a p value of 0.05, and the red line represents the Bonferroni corrected significance threshold to adjust for multiple testing (p = 0.05/333). (b) Plot of p value for gene burden association with “primary/intrinsic cardiomyopathy” using pLOF variants and missense variants predicted to be deleterious per various REVEL cutoff scores as well as 5/5 algorithms. Each point is labeled with the number of exome-sequenced individuals who are carriers for missense variants in each threshold category without using a minor allele frequency threshold. (c) Venn diagram of number of exome-sequenced carriers for missense variants predicted to be deleterious by 5/5 algorithms and/or with a REVEL score ≥0.65. (d) Gene burden PheWAS of pLOF variants (N = 11) and missense variants with REVEL scores of at least 0.65 (N = 61). The blue line represents a p value of 0.05, and the red line represents the Bonferroni corrected significance threshold to adjust for multiple testing (p = 0.05/333).

However, we noted that there were a substantial number of carriers for rare missense variants in LMNA that did not meet the 5/5 criteria who were diagnosed with “primary/intrinsic cardiomyopathy” (Table S1), suggesting that this algorithmic filter was too stringent. To capture more individuals with pathogenic missense variants, we utilized REVEL, which has been reported to more accurately distinguish pathogenic from neutral missense variants, particularly those with MAFs less than 0.5%, compared with other predictive methods.14 Analysis of variance on ClinVar-annotated variants showed that REVEL scores correlate with clinical pathogenicity (Table S2). While a threshold of 0.5 has been suggested,14 we experimented with REVEL score thresholds in bins of 0.05 to evaluate the optimal score cutoff for capturing the most robust association with cardiomyopathy as a positive control (Fig. 1b). Of note, all REVEL cutoff scores of at least 0.5 performed better in identifying association with “primary/intrinsic cardiomyopathy” compared with the usage of 5/5 algorithms.
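The threshold scan in bins of 0.05 can be illustrated by counting how many carriers remain at each cutoff. In the study each cutoff was judged by the gene burden association p value for cardiomyopathy; this sketch covers only the carrier-counting step, and the carrier IDs and scores are hypothetical.

```python
def scan_revel_thresholds(carrier_scores, start=50, stop=100, step=5):
    """Count carriers whose missense REVEL score meets each cutoff.

    carrier_scores: maps a (hypothetical) carrier ID to a REVEL score.
    Cutoffs are built from integer percentages to avoid floating-point drift.
    """
    counts = {}
    for pct in range(start, stop, step):
        cutoff = pct / 100
        counts[cutoff] = sum(1 for s in carrier_scores.values() if s >= cutoff)
    return counts


# Hypothetical carriers and scores, for illustration only.
toy_scores = {"P1": 0.52, "P2": 0.66, "P3": 0.71, "P4": 0.90}
counts = scan_revel_thresholds(toy_scores)
print(counts[0.5], counts[0.65], counts[0.9])  # 4 3 1
```

Raising the cutoff trades carrier count for specificity, which is the tension the scan is meant to resolve.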

We chose a REVEL cutoff score of 0.65 given its optimal p value for association with “primary/intrinsic cardiomyopathy” (Fig. 1b) while maintaining relatively high numbers of carriers for predicted deleterious LMNA variants. This cutpoint included 19 of the 24 carriers (11 of the 14 variants) that met the 5/5 criteria, but also included 42 additional carriers (21 variants) that did not meet the 5/5 criteria (Fig. 1c). PheWAS of the LMNA gene burden of pLOF variants plus missense variants with REVEL scores of at least 0.65 (N = 72 carriers) revealed a much more robust signal for cardiomyopathy and related phenotypes (Fig. 1d, Table 1). Of note, the signal was more statistically robust compared with other recently developed ensemble methods for predicting pathogenicity such as VEST325,26 (Fig. S3), M-CAP27 (Fig. S4), and CADD28 (Fig. S5). Furthermore, we addressed potential issues of small sample sizes by using Firth’s penalized likelihood approach, and found that beta and p value estimates were consistent with exact logistic regression (Table S3). Importantly, only 6 of the 35 individuals with a rare deleterious variant in LMNA and a diagnosis of “primary/intrinsic cardiomyopathy” had been molecularly diagnosed with an LMNA variant (Table 1), indicating that LMNA cardiomyopathy is substantially underdiagnosed. Furthermore, 15 missense variants with REVEL scores >0.5 that are annotated as variants of uncertain significance or having conflicting interpretations of pathogenicity had at least one carrier with a diagnosis of “primary/intrinsic cardiomyopathy” and/or “cardiac conduction disorder” (Table S1).

Table 1 Demographics, clinical characteristics, and significant cardiovascular PheWAS associations for individuals in Penn Medicine Biobank (PMBB) carrying a predicted deleterious LMNA variant

Given the variety of cardiovascular traits that were highly significant in the REVEL-informed gene burden PheWAS for LMNA, we addressed whether these are independent signals. After running association analyses among all individuals with a phenotype of “primary/intrinsic cardiomyopathy,” we found that the entire spectrum of cardiovascular PheWAS signals disappeared, suggesting that the other cardiac phenotypes were secondary to primary cardiomyopathy in carriers of the deleterious LMNA variants (Fig. S6).

In addition to cardiac disease phenotypes, our REVEL-informed LMNA gene burden PheWAS also identified phenome-wide significant disease phenotypes that are not typically defined as laminopathies, including “chronic kidney disease, stage III” (p = 1.13E-06; Fig. 1d, Table 1). The relative persistence of the association signal for “chronic kidney disease, stage III” (p = 1.33E-03) when controlling for primary cardiomyopathy suggests an independent pathophysiological mechanism for renal failure in the context of loss of function in LMNA (Fig. S6).

We replicated these observations in the DiscovEHR cohort using the same approach (pLOFs plus REVEL score ≥0.65; Table S4a). There was a significant association between LMNA gene burden and dilated cardiomyopathy (odds ratio [OR]: 4.2 [95% CI: 1.3–10.0], p = 0.005; Table S4b). Furthermore, the association of LMNA gene burden with chronic kidney disease was also replicated (OR: 1.6 [95% CI: 1.1–2.5], p = 0.02; Table S4b).

Association of LMNA gene burden with cardiovascular imaging and clinical laboratory data

To build upon the PheWAS findings, we examined the cardiovascular imaging and laboratory EHR data in greater depth (Table 1). First, we analyzed the cardiac structures of these individuals by interrogating available echocardiography data. By doing so, we also aimed to better define the PheCode “primary/intrinsic cardiomyopathy,” which does not differentiate between the different types of primary cardiomyopathy. Carriers of rare deleterious LMNA variants had heart morphology consistent with dilated cardiomyopathy when compared with the rest of the PMBB population with echo data available (Table 2, Table S5a, b). More specifically, carriers had significantly increased left atrial volume indices, decreased left ventricular ejection fractions, decreased left ventricular outflow tract velocity time integrals, and increased mitral E/A ratios as an indication of weak atrial contraction.

Table 2 Cardiac architecture for carriers of presumed deleterious variants in LMNA is consistent with dilated cardiomyopathy

We also conducted similar quantitative analyses for select clinical laboratory measurements. Carriers of predicted deleterious LMNA variants had significantly elevated alanine transaminase (ALT) and aspartate transaminase (AST) levels when compared with individuals not carrying a predicted deleterious LMNA variant (Table 3, Table S6a). In the overall population, carrier status was significantly associated with increased total cholesterol levels (Table 3, Table S6a,b). Furthermore, maximum blood triglyceride levels trended toward elevation among carriers (p = 0.0559; Table 3). These laboratory features are consistent with subclinical features of partial lipodystrophy, such as fatty liver and dyslipidemia. While only 2 of the 72 carriers of predicted deleterious variants had an ICD diagnosis of “lipodystrophy,” there were 44 carriers with a phenotype of “hyperlipidemia,” 20 carriers with a diagnosis of “type 2 diabetes,” and 8 carriers with “secondary diabetes mellitus.” Comprehensive investigation of physical exam notes written by health-care providers for individuals with these related metabolic phenotypes showed no mention of loss of subcutaneous fat from the extremities, trunk, or gluteal region, which is the classic presentation specific to partial lipodystrophy type 2.

Table 3 Clinical laboratory measurements for carriers of presumed deleterious variants in LMNA are consistent with subclinical features of partial lipodystrophy and renal disease

Finally, regarding the identification of “chronic kidney disease, stage III” from our REVEL-informed gene burden PheWAS, we compared quantitative markers of renal disease between carriers of predicted deleterious LMNA variants and noncarriers in PMBB. We found that carrier status was associated with significantly decreased eGFR and serum albumin levels (Table 3, Table S6a, b). Furthermore, eGFR was still significantly decreased among carriers of predicted deleterious LMNA variants after adjusting for lifetime diagnosis of both congestive heart failure and diabetes mellitus, as well as adjusting for each diagnosis separately (Table 4). Additionally, serum albumin was also significantly decreased for carriers of predicted deleterious LMNA variants after adjusting for both heart failure and diabetes mellitus lifetime diagnoses (Table 4).

Table 4 Renal clinical laboratory measurements for carriers of presumed deleterious variants in LMNA are consistent with primary renal disease

DISCUSSION

While exome-wide interrogation of patients with shared phenotypic traits has been successful in identifying many new genetic variants associated with rare human disease, proving causality of disease due to pathogenic genetic variants in humans in vivo remains enigmatic.29,30 We attempt to address the limitations of traditional phenotype-first approaches through this study, which represents a genome-first approach to analyzing the clinical manifestations of predicted deleterious variants in LMNA by fully utilizing available EHR data. Our study serves as an example of a genome-first approach for studying the medical consequences of rare pLOF and deleterious missense genetic variants in specific genes within the context of large health-care biobanks linked to extensive EHR phenotypic data.

An important area of research in precision medicine initiatives is to create a platform by which health-care providers can make accurate diagnoses based on a wide variety of personalized health data, including individuals’ genetic information. However, current genetic panels offered at most health-care institutions cover only a small portion of genetic variants implicated in rare human diseases.31 We suggest that the pipeline for interpretation of variants in LMNA identified via clinical genetic testing should be updated, as indicated by the number of variants of uncertain significance (VUS) identified in PMBB that we suggest may be pathogenic given the combination of their association with cardiomyopathy and/or arrhythmia and their predicted deleteriousness. Additionally, we found that important molecular diagnoses were missed, as many carriers for predicted deleterious variants in LMNA with dilated cardiomyopathy had not been sequenced for LMNA. In our analysis of PMBB, 35 individuals with a diagnosis of “primary/intrinsic cardiomyopathy” had a rare deleterious variant in LMNA and only six had been previously tested and molecularly diagnosed with an LMNA variant, suggesting that there is a lack of genetic testing for laminopathies in patients with cardiomyopathy of unknown etiology. Currently, LMNA genetic testing is not routinely offered to all patients with dilated cardiomyopathy unless a genetic cause is suspected to underlie dilated cardiomyopathy as a primary condition.32,33,34 Furthermore, all six individuals who received testing were identified as carriers for known pathogenic variants, suggesting that some carriers of potentially pathogenic variants annotated as VUS as well as novel variants would not have been identified even if offered genetic testing in the clinic. Similarly, familial partial lipodystrophy due to a pathogenic LMNA variant is also likely underdiagnosed.

Although there are no current therapies specific to LMNA cardiomyopathy, there is benefit to making the molecular diagnosis with regard to providing an etiology for the cardiomyopathy, predicting clinical course and complications, and testing other family members at risk. More effective molecular diagnoses can lead to change in medical management for these individuals who are at high risk for arrhythmic sudden cardiac death.35,36 In the clinical setting, dilated cardiomyopathy patients with confirmed pathogenic LMNA variants are often referred for electrophysiologic risk stratification earlier than other patients with nongenetic dilated cardiomyopathy. Thus, while evaluation of the contribution of individual variants remains clinically challenging and a definitive classification of pathogenicity for each presumed deleterious variant is hard to predict, our analyses suggest that earlier identification of laminopathies through an improved framework promoting genetic testing in the clinical setting using a comprehensive and updated variant panel is warranted to provide earlier, preventive treatments.

Additionally, the increased number of specific pathogenic variants in LMNA identified through this genome-first approach will provide greater insight into LMNA structure–function. Interestingly, 19 of 29 known ClinVar-annotated pathogenic missense variants cause a deviation from arginine in various locations of the LMNA protein product, highlighting a potential importance of the positively charged arginine in the LMNA protein structure, consistent with previous studies identifying arginine in many splicing binding sites for generating prelamin A and lamin C.37 Notably, among novel missense variants discovered in this study, 8 of 18 variants with REVEL scores of at least 0.65 cause deviations from arginine, consistent with the prevalence of these changes in known clinically pathogenic missense variants.

This approach to inclusion of REVEL-annotated likely deleterious missense variants in a gene burden has the advantage of increasing the power for gene burden PheWAS analyses that can identify novel gene ontologies, as seen by the identification of advanced renal disease in the context of loss of function in LMNA. While renal abnormalities are possible direct clinical sequelae related to heart failure and diabetes mellitus, pathophysiological mechanisms for renal failure due to pathogenic LMNA variants through primary, noncardiorenal processes have recently been suggested.38,39 We report impaired renal function and hypoalbuminemia in the context of loss of function in LMNA, even after adjusting for both a lifetime diagnosis of congestive heart failure and diabetes mellitus, suggesting a pathophysiology for renal failure due to a proteinuric, primary nephrotic clinical picture that may be confounded by, yet independent of, the pathophysiology of heart failure in dilated cardiomyopathy and the overlap with diabetes in partial lipodystrophy. Our results suggest a clinical or subclinical nephrotic phenotype due to loss-of-function variants in LMNA that may have been further masked by comorbid cardiac and metabolic disease traits, calling for follow-up studies interrogating primary renal disease as a potential novel laminopathy.

In conclusion, we used an approach to include pLOFs and REVEL-annotated deleterious missense variants in LMNA in a gene burden to show by PheWAS, using a relatively small number of carriers, significant associations with primary dilated cardiomyopathy, laboratory values consistent with partial lipodystrophy, and a novel finding of chronic kidney disease. We demonstrate the importance of deeply interrogating quantitative data in the EHR to uncover important clinical and subclinical information relevant to other rare laminopathies implicated by deleterious LMNA variants. Our approach suggests an expanded role for clinical genetic testing for patients who present with primary dilated cardiomyopathy or early pathophysiologic signs like conduction defects. Importantly, our study also lays a methodological framework by which future studies can uncover novel gene–disease relationships and identify novel pathogenic loss-of-function variants across the human genome through genome-first analyses of large, heterogeneous health care–based populations.