Relevance of Titin Missense and Non-Frameshifting Insertions/Deletions Variants in Dilated Cardiomyopathy

Recent advancements in next generation sequencing (NGS) technology have led to the identification of the giant sarcomere gene, titin (TTN), as a major human disease gene. Truncating variants of TTN (TTNtv), especially in the A-band region, account for 20% of dilated cardiomyopathy (DCM) cases. Much attention has been focused on the assessment and interpretation of TTNtv in human disease; however, missense variants and non-frameshifting insertions/deletions (NFS-INDELs) are difficult to assess and interpret in the clinical diagnostic workflow. Targeted sequencing covering all exons of TTN was performed on a cohort of 530 primary DCM patients from three cardiogenetic centres across Europe. Using stringent bioinformatic filtering, twenty-nine rare TTN missense variants and two rare NFS-INDEL variants predicted deleterious were identified in 6.98% and 0.38% of DCM patients, respectively. However, when compared with those identified in the largest available reference population database, no significant enrichment of such variants was identified in DCM patients. Moreover, DCM patients and reference individuals had comparable frequencies of splice-region missense variants with predicted splicing alteration. DCM patients and reference populations thus had comparable frequencies of rare predicted deleterious TTN missense variants, including splice-region missense variants, suggesting that these variants are not independently causative for DCM. Hence, these variants should be classified as likely benign in the clinical diagnostic workflow, although a modifier effect cannot be excluded at this stage.

www.nature.com/scientificreports

Variants leading to truncations of the giant sarcomeric protein titin (TTNtv) are the most frequent cause of DCM. Recently, we and others have identified clinically relevant TTNtv in about 20% of unselected and familial DCM cases [7][8][9][10]. Interestingly, DCM-causing TTNtv are clustered in the clinically relevant A-band region of the gene, whereas TTNtv in reference and healthy individuals are clustered more in the I-band region, where they may be spliced out, due to the usage of an alternative promoter, from mature long TTN isoforms, resulting in minimal or no functional consequences 11. Interpretation of TTNtv in the clinical setting has been challenged by several factors, including lack of knowledge of the true frequency of TTNtv in the general population, lack of understanding of different protein domains and alternatively spliced isoforms, and variable disease presentation hampering assessment of co-segregation. We and others have previously addressed these issues 7,9,[12][13][14]. However, assessing the relevance of TTN missense variants in the pathogenesis of DCM has been even more difficult due to the high frequency of these variants in both reference individuals and DCM patients and the absence of large pedigrees in which segregation can be studied. While a few studies have reported modifier roles for TTN missense variants 15,16, others have considered TTN missense variants as disease-causing variants in DCM, arrhythmogenic right ventricular cardiomyopathy (ARVC), and left ventricular non-compaction cardiomyopathy (LVNC), but these observations suggest a recessive inheritance pattern and no convincing segregation has been demonstrated so far [17][18][19].
Furthermore, TTNtv are also occasionally found in patients with sudden cardiac death (SCD) or aborted SCD who do not have a diagnosis of DCM or DCM with LVNC 20,21.
In the current study, we focused on TTN missense and non-frameshifting insertion-deletion (NFS-INDEL) variants identified in DCM patients and the reference population. By analyzing TTN missense and NFS-INDEL variants identified in 530 primary DCM patients, over 60,000 ExAC individuals and, for the first time, the 123,136 reference individuals of gnomAD, the largest available database, we report no significant enrichment of TTN missense variants in DCM patients compared with reference individuals.

Results
Patient Cohort. DCM patients in this multi-centre study were recruited from three major cardiogenetic centres in Finland and the Netherlands. All patients fulfilled the diagnostic criteria for DCM. Table 1 summarizes the spectrum of pathogenic and likely pathogenic variants identified in the patients.

Deriving Allele Frequency for Potentially Deleterious DCM Variant.
To set a threshold for filtering TTN missense variants identified in DCM cohorts, we obtained the gnomAD allele frequency of the most common DCM-associated pathogenic variant for which there is wide consensus on pathogenicity in the field. Covering 100% of the total gnomAD alleles, the PLN c.40_42delAGA, p.Arg14del (NM_002667.4) variant is the most common "certainly pathogenic" DCM variant [22][23][24][25][26]. This variant is absent in the ExAC individuals (allele count [AC] = 0), while it is present in two individuals (AC = 2) in the gnomAD database. Based on these observations, the minor allele frequency (MAF) cut-offs were set to 0.0016% (AC = 2) and 0.0016% (AC = 4) for the ExAC and gnomAD datasets, respectively; variants with a higher frequency in the reference populations would not be expected to be pathogenic.
[...] NFS-INDELs (Fig. 1), 39 frameshift variants, 27 nonsense variants, 14 splice-site variants, 205 silent variants, 4 untranslated region (UTR) variants, and the remaining 607 variants being intronic. Of the total variants, 52.7% were private (Fig. 2). Applying the ExAC and gnomAD MAF cut-offs yielded a total of 87 missense and 5 NFS-INDEL variants. Taken together, 18.9% and 1.9% of DCM patients carried rare heterozygous missense variants and NFS-INDELs, respectively. Given that many rare and even private variants are believed to be innocent bystanders, the pathogenicity potential of each variant was further assessed using three in-silico prediction tools. With this approach, 33.3% (29/87) of the rare missense variants were predicted deleterious. Interestingly, 69.0% (20/29) of the predicted deleterious missense variants were in the A-band region of TTN. Of note, 55.2% (16/29) of the rare missense variants were absent in both ExAC and gnomAD reference individuals.
Bioinformatic filtering strategy to identify "deleterious" variants. Total TTN variants were filtered using an allele frequency threshold (0.0016%) derived from the frequency of known pathogenic DCM variants identified in gnomAD reference individuals. Variants more frequent than the derived thresholds in the respective sources were not expected to be deleterious.
Twenty-nine rare heterozygous TTN missense variants predicted deleterious were identified in 6.8% (36/530) of DCM patients, with only one patient harbouring two rare heterozygous TTN missense variants. Heterozygous NFS-INDELs affecting at least five transcripts of TTN (Transcript Count Index, TCI ≥ 5) were identified in 0.38% (2/530) of DCM patients. Of note, all the NFS-INDELs were in the I-band region of TTN (Tables 2 and 3). When considering mutation status, gene-elusive DCM patients and mutation-positive DCM patients had comparable frequencies of rare TTN missense variants, further suggesting that these variants may not be a monogenic cause of DCM (Supplementary Table S1).
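The relation between the allele count cut-offs and the 0.0016% MAF threshold can be checked with a short calculation. The sketch below is illustrative only: it assumes two alleles per individual at every site (real databases report a site-specific allele number that depends on coverage), it takes the cohort sizes from the denominators reported in the text, and the function names are hypothetical.

```python
def maf_percent(allele_count: int, n_individuals: int) -> float:
    """Allele frequency in percent, assuming two alleles per individual."""
    total_alleles = 2 * n_individuals
    return 100.0 * allele_count / total_alleles


def is_rare(allele_count: int, n_individuals: int, cutoff_percent: float) -> bool:
    """A variant is kept as 'rare' if its frequency does not exceed the cut-off."""
    return maf_percent(allele_count, n_individuals) <= cutoff_percent


# Allele count cut-offs and cohort sizes as stated in the text.
exac_cutoff = maf_percent(allele_count=2, n_individuals=60706)
gnomad_cutoff = maf_percent(allele_count=4, n_individuals=123136)

print(f"ExAC cut-off:   {exac_cutoff:.4f}%")    # 0.0016%
print(f"gnomAD cut-off: {gnomad_cutoff:.4f}%")  # 0.0016%
```

Under these assumptions both cut-offs round to the same 0.0016%, which is why the allele count is doubled for the roughly twice as large gnomAD cohort.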

TTN Missense and NFS-INDELs Variants in Reference Population.
Applying the derived allele frequency thresholds to TTN variants reported in ExAC, we identified 8265 rare missense and 82 rare NFS-INDEL variants in 16.5% (10035/60706) and 0.15% (93/60706) of the ExAC individuals, respectively. Of the ExAC rare missense variants, 24% (1990/8265) were predicted deleterious by the three prediction tools used in this study. In total, 3.98% of ExAC individuals carried a rare TTN missense variant predicted deleterious. In addition, 0.11% (65/60706) of the ExAC cohort harboured rare heterozygous NFS-INDELs affecting at least five transcripts of TTN.
Given that the gnomAD cohort is almost twice the size of the ExAC cohort and that ExAC individuals are included in the gnomAD cohort, we set the allele count threshold to four and considered variants more frequent than this in gnomAD as likely benign. With this threshold, we identified 13808 rare missense variants present in 17.1% (21102/123136) of the gnomAD cohort. However, rare heterozygous TTN missense variants predicted deleterious were observed in 5.7% (7071/123136) of the gnomAD cohort (Table 2). Notably, 54.6% and 52.2% of TTN missense variants and NFS-INDELs identified in the ExAC and gnomAD cohorts, respectively, were private (Fig. 2).

Multiple Heterozygous TTN Variants.
To estimate the frequency of reference individuals with multiple heterozygous TTN variants, the phase 3 call set of the 1000 Genomes Project was utilized, since it provides sample-level genotype data. Only TTN missense variants with an allele frequency less than or equal to the derived thresholds were used for this analysis. We identified a rare heterozygous TTN missense variant in 8.2% (205/2504) of the 1000 Genomes Project phase 3 cohort. Of these individuals, 5.4% (11/205) harboured two heterozygous rare TTN missense variants.
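The counting step that sample-level genotypes make possible can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input is assumed to be a list of (sample, variant) pairs already restricted to rare heterozygous TTN missense calls, the function name is hypothetical, and the toy data are invented.

```python
from collections import Counter


def carrier_summary(rare_het_calls, n_samples):
    """Summarise carriers of rare heterozygous variants.

    rare_het_calls: iterable of (sample_id, variant_id) pairs, one per
    heterozygous genotype call that passed the rarity filter.
    Returns (number of carriers, number of carriers with two or more
    distinct variants, carrier fraction of the whole cohort).
    """
    per_sample = Counter()
    for sample_id, _variant_id in set(rare_het_calls):  # drop duplicate calls
        per_sample[sample_id] += 1
    carriers = len(per_sample)
    multi = sum(1 for n in per_sample.values() if n >= 2)
    return carriers, multi, carriers / n_samples


# Invented toy data: S1 carries two distinct variants, S3 has a duplicated call.
calls = [("S1", "v1"), ("S1", "v2"), ("S2", "v1"), ("S3", "v3"), ("S3", "v3")]
carriers, multi, fraction = carrier_summary(calls, n_samples=10)
print(carriers, multi, fraction)  # 3 1 0.3
```

Such counts say nothing about phase; whether two variants in one individual lie in cis or in trans cannot be decided from unphased calls.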
When considering the eleven individuals in the 1000 Genomes Project phase 3 cohort with TTN truncations (TTNtv), as published previously 12, 18.2% (2/11) of these individuals, both from the South Asian population, also harboured a rare heterozygous TTN missense variant, thus being heterozygous for both a rare TTNtv and a TTN missense variant. However, it is unclear whether they are in cis (on the same allele) or in trans (on different alleles). Interestingly, only one DCM patient in this study harboured two rare heterozygous TTN missense variants, one in the A-band and the other in the distal I-band region of TTN. Five out of thirty-six (13.9%) DCM patients with rare TTN missense variants predicted deleterious harboured a TTNtv (Table 4).
[...] (Table 2 and Fig. 3). Moreover, the distribution of TTN missense variants was similar in patients with a known pathogenic or likely pathogenic DCM mutation (mutation positive) compared with those without a mutation (mutation negative) when variant classification followed ACMG criteria (Supplementary Table S1).

Splice-Region Missense Variants with Predicted Splicing Effect are not enriched in DCM.
Given that variants that affect splicing can directly cause disease or contribute to disease severity, we analysed missense variants located near exon-intron boundaries (splice regions). A total of five rare splice-region missense variants, affecting 3.6% (19/530) of DCM patients, were identified. Of note, only two of the five were predicted to potentially alter the natural splice site or activate exonic cryptic splice sites. When compared with ExAC reference individuals (DCM: 3.6% vs. ExAC: 2.9%; P = 0.4624) and gnomAD reference individuals (DCM: 3.6% vs. gnomAD: 3.0%; P = 0.4808), there was no significant enrichment of variants with high potential for defective splicing in DCM patients.

Discussion
Recent advancement in high-throughput sequencing technology has revealed TTN as a major human disease gene, mutated in both skeletal and cardiac muscle diseases. Inclusion of the TTN gene in genetic screening for DCM has led to an increase in the diagnostic rate for adult DCM of at least 18% and 25% in sporadic and familial DCM cases, respectively [8][9][10]27,28. Much attention has been focused on the assessment and clinical interpretation of frameshift, nonsense and splice-site variants, which lead to a truncation of TTN, leaving missense and NFS-INDEL variants in this major human disease gene unassessed and unexplored. Consequently, these variants are difficult to assess and interpret in the clinical diagnostic workflow.
DCM is a genetically heterogeneous disorder and multiple genes have been implicated in its pathogenesis. Truncations of the giant sarcomeric protein, TTN, are to date the most common genetic cause of DCM, accounting for about 20% of cases, and are overrepresented in the A-band region of the protein. Such A-band and/or distal I-band TTNtv have been estimated to have a nearly 98% chance of being disease causing when identified in unselected DCM patients 7. Interestingly, a recent study showed a significant association with DCM of TTNtv in constitutive exons throughout the TTN protein 29. Furthermore, it has been shown that TTNtv rarely cause pediatric DCM and that the disease typically manifests in middle age 30. In addition, we and others have addressed several issues and challenges associated with the clinical assessment and interpretation of TTNtv in both research and clinical diagnostic settings 7,9,[12][13][14]. In contrast, the role and contribution of missense and NFS-INDEL variants in this gene have been poorly addressed so far.
In this European multi-centre DCM study, we assessed the possible role of TTN missense and NFS-INDEL variants in DCM. By analyzing TTN variants in 1) a large primary DCM cohort (n = 530), 2) the widely used ExAC reference individuals, and 3) the largest available gnomAD database reference individuals, we showed that there is no statistically significant enrichment of rare TTN missense and NFS-INDEL variants in DCM patients compared with reference individuals. This observation holds even when considering different TTN regions, suggesting that TTN missense and NFS-INDEL variants do not cause DCM and should therefore be classified as "likely benign" in clinical diagnostic settings. Using stringent bioinformatic filtering criteria, we identified rare TTN missense variants predicted deleterious in 7.0% of primary DCM cases as against 5.7% of gnomAD reference individuals (OR = 1.2; P = 0.22). When considering different TTN domains (Z-disk, I-band, A-band and M-band), we identified no enrichment of these variants in DCM patients compared with reference individuals.
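The case-versus-reference comparison can be illustrated with a 2×2 table. The sketch below uses the carrier counts given in the Results (36/530 DCM patients and 7071/123136 gnomAD individuals) and a Pearson χ² test without continuity correction; both the choice of test and the function names are assumptions made for illustration, so the P value it prints need not match the reported one.

```python
import math


def odds_ratio(case_carriers, case_total, ref_carriers, ref_total):
    """Odds ratio for carrying a variant in cases versus reference individuals."""
    a, b = case_carriers, case_total - case_carriers
    c, d = ref_carriers, ref_total - ref_carriers
    return (a * d) / (b * c)


def chi2_p_value(case_carriers, case_total, ref_carriers, ref_total):
    """Pearson chi-square test for a 2x2 table (1 d.f., no continuity correction)."""
    a, b = case_carriers, case_total - case_carriers
    c, d = ref_carriers, ref_total - ref_carriers
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


or_estimate = odds_ratio(36, 530, 7071, 123136)
p_value = chi2_p_value(36, 530, 7071, 123136)
print(f"OR = {or_estimate:.1f}")  # OR = 1.2
print(f"P  = {p_value:.2f}")
```

With these counts the odds ratio rounds to 1.2 and the P value lies well above 0.05, in line with the absence of significant enrichment.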
A recent study assessing the role of TTN missense variants identified "severe" missense variants in 25.2% (37/147) of DCM probands 15. We randomly selected five "severe" missense variants as published by the authors and applied our bioinformatic filtering strategy. Of note, 40% (2/5) of these "severe" variants were found in at least seven gnomAD database reference individuals, and as such would not be considered "deleterious" using our approach. Interestingly, five probands in the study were double heterozygous for a "severe" TTN missense variant and a pathogenic (P)/likely pathogenic (LP) variant in LMNA, MYH7 and SCN5A. Three (3/5) of these patients harboured a P/LP LMNA variant in addition to a "severe" TTN variant. Two of the "severe" TTN missense variants were later reclassified by the authors as "unlikely" disease associated, possibly due to lack of segregation, and the third patient, harbouring a P/LP LMNA variant co-occurring with a "severe" TTN missense variant, had undergone heart transplantation 15. In another study of an extended DCM family with 14 affected subjects, four had severe DCM requiring heart transplantation in early adulthood 31. In all affected patients, the P/LP LMNA p.(Lys219Thr) variant was identified. In addition to this variant, a TTN missense variant, p.(Leu4855Phe), was identified in all four patients with severe phenotypes, all of whom had transplantation at a younger age than those with only the LMNA p.(Lys219Thr) variant. Although this may suggest that TTN missense variants can modify disease severity, TTN p.(Leu4855Phe) is found heterozygous in 15 individuals in the gnomAD reference population and the position is multi-allelic; thus, stronger evidence is needed before a potential modifier role can be inferred.
Disruption of natural splice sites by point mutations can result in exon skipping, activation of a cryptic splice site, and, rarely, intron retention 32, with activation of a cryptic splice site being the second most frequent consequence of such mutations. Such point mutations, taken together, account for 10-15% of all mutations causing human inherited disease 33,34. By analyzing missense variants located near exon-intron boundaries (splice regions) of TTN, we identified five rare splice-region missense variants in DCM patients. Two (40%) of these five splice-region missense variants were predicted to disrupt the natural splice site. To test for possible enrichment of this event in DCM patients, we randomly and iteratively selected five rare splice-region variants from gnomAD and tested their potential to alter splicing. Surprisingly, two (40%) of the five randomly chosen splice-region missense variants were predicted to alter splicing, suggesting that splice-region missense variants identified in our DCM cohort do not carry a higher risk of a splicing defect than any other splice-region missense variants in the population.
DCM cohorts in this study were sequenced using high-coverage sequencing assays yielding more uniform coverage even across difficult-to-sequence regions, which alone may increase the likelihood of finding slightly higher variant counts in DCM cohorts than in the population databases. If we generously assume that DCM has a prevalence of 1 per 500 and further speculate that TTN missense variants account for 10% of all DCM cases, TTN missense-positive DCM would have a population frequency of 1 per 5000 (0.02%). However, 5.7% of the gnomAD reference population carry a rare missense variant predicted deleterious by in-silico tools; thus, 99.7% of these missense variants are not expected to cause DCM in a monogenic manner. These variants should therefore be classified as likely benign in the clinical diagnostic workflow until more is understood about them individually and as a group.
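The arithmetic behind the 99.7% figure can be written out explicitly. The sketch below simply restates the assumptions made in the paragraph (a prevalence of 1 per 500, a speculative 10% share for TTN missense variants) and uses the gnomAD carrier count reported in the Results; the variable names are illustrative.

```python
# Assumptions stated in the text.
dcm_prevalence = 1 / 500       # generous population prevalence of DCM
missense_share = 0.10          # speculative share of DCM due to TTN missense variants

# Carrier frequency of rare predicted-deleterious TTN missense variants in gnomAD.
carrier_freq = 7071 / 123136   # about 5.7%

# Population frequency of hypothetical TTN missense-positive DCM: 1 per 5000.
causal_freq = dcm_prevalence * missense_share

# Upper bound on the share of carriers whose variant could be causal.
fraction_causal = causal_freq / carrier_freq
fraction_not_causal = 1 - fraction_causal

print(f"TTN missense-positive DCM: {100 * causal_freq:.2f}%")         # 0.02%
print(f"Carriers not explained:    {100 * fraction_not_causal:.1f}%")  # 99.7%
```

Even under these generous assumptions, at most about 0.3% of carriers in the reference population could have DCM attributable to such a variant.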
Limitations. We acknowledge the following limitations. The health status of the reference individuals used is not known, even though curators of reference population databases have made every effort to exclude individuals with severe pediatric diseases from these cohorts. Moreover, we had limited evidence of the effect of predicted deleterious TTN missense variants in patients with other P/LP variants, including TTNtv. Functional analysis of predicted deleterious TTN missense variants and NFS-INDELs was beyond the scope of this study; we recommend that such analysis be combined with segregation information and de novo status to reclassify the variants.

Conclusion
The current data reveal that predicted deleterious TTN missense and NFS-INDEL variants are not significantly enriched in a large group of DCM patients compared with control populations, and should therefore be classified as likely benign in a clinical diagnostic setting. Nevertheless, it cannot be excluded that some rare TTN missense variants might have a modifier effect on the phenotype.

Methods
Patient Cohorts and Clinical Evaluation.
This European multi-centre study was approved by the institutional ethical committees of the University of Helsinki, Academic Medical Centre (Amsterdam, the Netherlands), and University Medical Centre (Groningen, the Netherlands), and informed consent was obtained from all subjects. The DCM cohort in this study comprised 530 unrelated primary DCM patients of European origin. The patients were recruited at three major cardiogenetic centres in Finland (Helsinki; n = 145) and the Netherlands (Amsterdam; n = 216, and Groningen; n = 169). All patients fulfilled the diagnostic criteria for DCM (LV size/volume >117% or >2 SD of the reference value and LV-EF <45%) 35.
Genetic Studies of DCM. Genomic DNA was extracted from patients' blood samples following standard procedures and guidelines. TTN was sequenced in a total of 530 primary DCM patients using targeted panels of increasing sizes as previously described 8,27. Irrespective of the sequencing panel used, sequence data from each patient were processed uniformly following the GATK best practices recommendations to identify genetic variants 36. Identified variants were annotated using Ensembl's Variant Effect Predictor (VEP v87) tool 37, and SIFT 38, PolyPhen 39, and MutationTaster 40 pathogenicity scores and predictions were obtained for each missense variant via the database of nonsynonymous SNP functional predictions (dbNSFP v2.9.2) 41. Rare TTN missense variants predicted deleterious by all three prediction tools were considered candidate risk variants. NFS-INDELs were assessed using the Transcript Count Index (TCI) as previously published 7.
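The candidate-variant rule described above (rare in the reference population and predicted deleterious by all three tools) can be sketched as a simple filter. This is an illustration under stated assumptions: the record layout, the normalised label "deleterious" and the function name are hypothetical and do not reproduce the dbNSFP field encodings or the authors' scripts, and the example variants are invented.

```python
from dataclasses import dataclass


@dataclass
class MissenseVariant:
    variant_id: str
    ref_allele_count: int   # allele count in the reference database
    sift: str               # hypothetical normalised labels:
    polyphen: str           # "deleterious" or "tolerated"
    mutation_taster: str


DELETERIOUS = "deleterious"


def is_candidate(variant: MissenseVariant, max_allele_count: int) -> bool:
    """Keep variants that are rare AND called deleterious by all three tools."""
    rare = variant.ref_allele_count <= max_allele_count
    predictions = (variant.sift, variant.polyphen, variant.mutation_taster)
    return rare and all(call == DELETERIOUS for call in predictions)


# Invented example records.
variants = [
    MissenseVariant("var_A", 0, "deleterious", "deleterious", "deleterious"),
    MissenseVariant("var_B", 2, "deleterious", "tolerated", "deleterious"),
    MissenseVariant("var_C", 9, "deleterious", "deleterious", "deleterious"),
]
candidates = [v.variant_id for v in variants if is_candidate(v, max_allele_count=4)]
print(candidates)  # ['var_A']
```

Requiring agreement of all three predictors is the stricter of the possible consensus rules: var_B fails on one discordant prediction and var_C on frequency alone.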

Analysis of TTN Missense and NFS-INDELs Variants in the Population. Exonic boundaries of TTN in build 37 of the human genome (hg19) were downloaded from the Ensembl Genome Browser (www.ensembl.org), and were used to query the ExAC and gnomAD databases (accessed 23 July 2017) for TTN exonic variants. Extracted variants were filtered for quality (Phred Quality Score 29) and coverage (15.0X), and calls that passed all quality filters were used for downstream analysis. The two datasets were re-annotated using VEP v87 and processed uniformly to identify the spectrum and distribution of missense and NFS-INDEL variants in the reference population.
Variant Mapping to Uniprot Protein Domain. To allow inclusion of variants located in exons not present in the principal cardiac long isoform (N2BA), otherwise known as the canonical transcript with the Uniprot ID Q8WZ42-1, the location of variants identified in DCM patients as well as in the reference population was reported with respect to the Meta transcript annotation (ENST00000589042/NM_001267550). Protein domain annotation for the Uniprot consensus sequence Q8WZ42-1 was transferred to the inferred Meta transcript as described earlier 7,12.
In-silico Splicing Defect Prediction. The effect of splice-region missense variants on splicing efficiency was predicted with five different tools: Human Splicing Finder 42,43, Splice Site Prediction by Neural Network (NNSPLICE) 44, MaxEntScan 45, GeneSplicer 46, and Splice Site Finder, via Alamut splicing software v2.0 (Interactive Biosoftware, France), using default settings in all predictions. Controls for each rare splice-region missense variant were selected from gnomAD from a nearby genomic location and with a similar allele frequency.
Statistical Analysis. Between-group comparisons for categorical variables were performed using the χ² test where appropriate and Fisher's exact test otherwise. For non-parametric and continuous variables, between-group comparisons were done using Mann-Whitney U tests and independent samples t-tests combined with Levene's tests, respectively. Bonferroni correction was performed on all analyses to adjust for multiple testing. All statistical analyses were done using R statistical software (version 3.3.3).
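As a reminder of what the Bonferroni adjustment does, the sketch below multiplies each P value by the number of tests and caps the result at 1. It is written in Python rather than R purely for illustration, and grouping the three P values quoted in this article into one family of tests is an assumption; the text does not state which tests were adjusted together.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: p * m, capped at 1, for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# P values quoted in the text; treating them as one family is an assumption.
reported = [0.4624, 0.4808, 0.22]
adjusted = bonferroni(reported)
print([round(p, 2) for p in adjusted])  # [1.0, 1.0, 0.66]
```

Because the unadjusted values are already non-significant, the adjustment does not change any conclusion here; it matters only when some raw P values fall below the significance level.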