Introduction

Systemic amyloidoses are a group of rare diseases characterised by the extracellular deposition of amyloid fibrils. The most common forms are primary (light-chain) systemic amyloidosis (AL), reactive systemic amyloidosis (AA) and transthyretin (TTR) amyloidosis (ATTR amyloidosis). However, more than 30 different proteins have been described as causing these diseases [1, 2].

In ATTR amyloidosis, the amyloid material results from destabilisation of the transthyretin protein (TTR). TTR is produced predominantly in the liver (>95%), as well as in the choroid plexus and the retinal pigment epithelium, and it is involved in the transport of thyroxine and retinol [3].

ATTR amyloidosis can be divided into two types. One is associated with ageing and is known as senile systemic amyloidosis or “wild-type ATTR” (ATTR-wt). The other is hereditary amyloidosis (ATTR-m), caused by point mutations in the TTR gene that lead to the deposition of mutant TTR amyloid [4].

ATTR-m is inherited in an autosomal dominant manner. It is caused by variants that produce single amino-acid substitutions in transthyretin, whose gene is located on chromosome 18 (18q12.1). Although the gene is small, with only four exons, more than 130 allelic variants have been described, most of which are amyloidogenic [5, 6].

Hereditary ATTR amyloidosis was first described in 1952 by Andrade in Portugal. In the following decades, endemic areas were also described in Japan and Sweden [7]. Although the disease initially appeared to be confined to endemic areas, advances in diagnostic techniques have led to diagnoses worldwide [5].

Clinical manifestations depend mainly on the variant involved. With c.148G>A (p.(Val50Met)), located in exon 2 and the most common variant in Europe, neurological manifestations predominate (familial amyloid polyneuropathy [FAP]); with other variants, such as c.424G>A (p.(Val142Ile)), located in exon 4 and the most common in the United States, a cardiac phenotype predominates. Mixed phenotypes and involvement of other organs have also been described [4, 8].

FAP has an estimated prevalence of 1/100,000 in Europe and the United States. In endemic regions of Portugal, Japan, Brazil or Sweden, the prevalence ranges from 1/1000 to 1/10,000 people. In Majorca, the prevalence of FAP is 5/100,000 people, and in Cyprus 3.72/100,000 people [9,10,11]. ATTR-m with a cardiac phenotype is more frequent than FAP: the frequency of c.424G>A (p.(Val142Ile)) carriers is 3–4% in the African population, compared with <0.5% in Caucasian and Latino populations [5, 12].

The age of onset ranges from the second to the ninth decade of life [13]. Penetrance varies widely depending on phenotype, genotype and environmental factors. Few studies have addressed penetrance, and the available data focus on the c.148G>A (p.(Val50Met)) variant. In carriers of this variant, penetrance differs between geographical areas and increases with age: at 50 years it is 80% in Portugal compared with 18% in France, and by 70 years it rises to 91% in Portugal and 50% in France. Swedish carriers have an even lower risk, 11% by age 50 and 36% by age 70 [10, 14]. Anticipation has been observed, and onset tends to be earlier when the variant is inherited from the mother [12, 15].

Beyond the difficulty of establishing the true prevalence and penetrance of the disease, several studies suggest that it is underdiagnosed because of its clinical variability and the lack of specific symptoms or biomarkers [16,17,18]. For these reasons, we consider population data relevant for analysing both the variants of the TTR gene and the frequency of each of them.

Several tools can support the study of rare diseases such as ATTR amyloidosis, including large genomic databases such as gnomAD, which can be used to study the frequencies of genetic variants [19]. Our primary aim was to estimate the allele frequencies of variants in the TTR gene using the gnomAD database, with particular attention to variants known to be involved in pathological processes.

Materials and methods

gnomAD is the successor to the ExAC project. The gnomAD database currently contains 123,136 exomes and 15,496 genomes from unrelated individuals of different populations, sequenced as part of various disease-specific and population genetic studies. Individuals known to be affected by severe paediatric disease, as well as their first-degree relatives, have been removed, so the data set should serve as a useful reference of allele frequencies for studies of severe disease. However, individuals with severe disease may still be included, albeit likely at a frequency equivalent to or lower than that seen in the general population. No statistical methods were used to predetermine sample size. The experiments were not randomised [19].

The gnomAD data were analysed between November 2017 and January 2018. All variants in TTR (canonical transcript ENST00000237014) were retrieved from the gnomAD database (http://gnomad.broadinstitute.org/). Only missense variants, in which one amino acid is replaced by another, were studied, because disease-causing TTR variants act by affecting protein folding and inducing the accumulation of misfolded protein, not by producing a non-functional truncated protein. Splice-site, stop-gain and frameshift variants were therefore excluded, and intronic, non-coding and synonymous variants were not considered.
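The selection rule above amounts to a filter on the annotated consequence of each variant. A minimal sketch, assuming each variant has been exported with its transcript ID and a Sequence Ontology consequence term such as those assigned by VEP; the `Variant` record and `select_missense` function are hypothetical and not part of any gnomAD tool:

```python
from dataclasses import dataclass
from typing import List

CANONICAL_TTR = "ENST00000237014"  # canonical TTR transcript named in the Methods


@dataclass
class Variant:
    hgvsc: str        # coding DNA change, e.g. "c.148G>A"
    transcript: str   # transcript the annotation refers to
    consequence: str  # Sequence Ontology term (assumed field, e.g. from VEP)


def select_missense(variants: List[Variant]) -> List[Variant]:
    """Keep missense variants on the canonical transcript.

    Everything else (stop-gain, frameshift, splice, synonymous,
    intronic and other non-coding variants) is dropped.
    """
    return [
        v for v in variants
        if v.transcript == CANONICAL_TTR and v.consequence == "missense_variant"
    ]


# Illustrative records; the non-missense entries are hypothetical placeholders.
records = [
    Variant("c.148G>A", CANONICAL_TTR, "missense_variant"),
    Variant("c.424G>A", CANONICAL_TTR, "missense_variant"),
    Variant("hypothetical_nonsense", CANONICAL_TTR, "stop_gained"),
    Variant("hypothetical_intronic", CANONICAL_TTR, "intron_variant"),
    Variant("hypothetical_synonymous", CANONICAL_TTR, "synonymous_variant"),
]
print([v.hgvsc for v in select_missense(records)])
```

Filtering on both the transcript and the consequence mirrors the two restrictions stated in the text: one transcript, and only single amino-acid substitutions.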

The variants studied were classified according to their clinical significance and according to the gnomAD populations in which they were found.

  • In terms of clinical significance, variants of the TTR gene were identified from descriptions in other databases and in the literature. Variants previously described in the literature were classified mainly according to their clinical significance, with particular attention to those reported as disease-causing.

The following sources were used: ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/), the Mutations in Hereditary Amyloidosis database (http://www.amyloidosismutations.com/), the Human Gene Mutation Database (www.hgmd.cf.ac.uk/) and available literature from PubMed Central (www.PubMed.com).

Following the recommendations of the Human Genome Variation Society, the variants were classified as affects function, probably affects function, unknown, probably does not affect function and does not affect function [20].

We also recorded whether each variant had been described before and, if not, whether other changes at the same protein position had been reported. This study focused mainly on disease-causing variants, which are associated with the formation of misfolded proteins. For variants that are probably pathogenic or of uncertain significance, most of which have not yet been described in the literature, such studies of misfolding are not available.

The pathogenic potential of each unreported variant was classified as recommended by the American College of Medical Genetics and Genomics [21].

  • In terms of populations, all variants analysed were classified according to the ancestry groups defined in the gnomAD database. The populations, with their share of the data set, were as follows: non-Finnish European (45.71%), African/African American (8.67%), Latino (12.41%), Finnish (9.30%), Ashkenazi Jewish (3.66%), East Asian (6.81%), South Asian (11.10%) and other (2.33%).

Results

We found 71 missense variants in the TTR gene, with an average of 115,780 exomes studied. The variants were divided into five categories: affects function, probably affects function, unknown, probably does not affect function and does not affect function.

Affects function variants

In the database search, 11 variants were assigned to this group (Table 1). The combined frequency of carriers of variants that affect function was 1:230 people. The most frequently detected were c.424G>A (p.(Val142Ile)) and c.148G>A (p.(Val50Met)), which represented 88% and 5%, respectively, of all affects function variants detected.

Table 1 Affects function variants of the TTR gene from the gnomAD database

The c.424G>A (p.(Val142Ile)) variant had a minor allele frequency (MAF) of 0.00151 and a prevalence of 1:332. It was most frequent in the African population, with a prevalence of 1:31 people (MAF 0.01602). The c.148G>A (p.(Val50Met)) variant represented 5% of this group, with an estimated prevalence of 1:4924 (MAF 0.000102). The distribution of these variants by subpopulation is presented in Table 2. In the European population, the combined prevalence of affects function variants was 1:1269 people, and the most frequent variant was c.148G>A (p.(Val50Met)), with one carrier per 2792 individuals (MAF 0.000179) (Table 3).
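The MAF and "1 in N" prevalence pairs above are consistent with a carrier frequency of roughly twice the MAF. A minimal sketch of that conversion, assuming every carrier is heterozygous (so the allele count equals the number of carriers and the number of individuals is half the allele number); `one_in_n` is a hypothetical helper, and small differences from the reported figures may reflect rounding of the published MAFs:

```python
def one_in_n(maf: float) -> float:
    """Return N for a '1 in N' carrier prevalence derived from a minor allele frequency.

    Assumes all carriers are heterozygous: carriers = allele count and
    individuals = allele number / 2, so carrier frequency = 2 * MAF.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("MAF must lie in (0, 0.5]")
    return 1.0 / (2.0 * maf)


# MAFs quoted in the Results
examples = {
    "c.424G>A, all populations": 0.00151,
    "c.424G>A, African": 0.01602,
    "c.148G>A, all populations": 0.000102,
    "c.148G>A, European": 0.000179,
}
for label, maf in examples.items():
    print(f"{label}: about 1:{one_in_n(maf):,.0f}")
```

The heterozygous-only assumption slightly overstates carrier frequency for common alleles, because homozygotes contribute two alleles but count as one person.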

Table 2 Population descriptions of c.424G>A (p.(Val142Ile)) and c.148G>A (p.(Val50Met)) of the TTR gene
Table 3 Variants of the TTR gene classified as affects function variants from the gnomAD database in the European population

Probably affects function variants

Seventeen variants were classified as probably affects function, with allele frequencies ranging from 0.0000041 to 0.0000646. The most frequently identified variants were c.235A>G (p.(Thr79Ala)), with an estimated prevalence of 1:11,191 people (MAF 0.0000447), and c.406T>C (p.(Tyr136His)), with an estimated prevalence of 1:17,588 people (MAF 0.0000284) (Table 4).

Table 4 Variants of the TTR gene from the gnomAD database. Classified as probably affects function, unknown, probably does not affect function and does not affect function variants

Unknown variants

Twenty-nine variants were classified as having an unknown effect on function. The most frequent were c.14G>A (p.(Arg5His)) and c.368G>A (p.(Arg123His)), with prevalences of 1:3959 people (MAF 0.000126) and 1:6927 people (MAF 0.0000722), respectively (Table 4).

Probably does not affect function variants

Four variants were classified as probably does not affect function variants, with a prevalence of 1:15,371 (MAF 0.0000325). The c.355G>A (p.(Asp119Asn)) variant was the most common in this group. Other variants included c.98T>C (p.(Met33Thr)), c.362G>A (p.(Gly121Asp)) and c.383C>T (p.(Ala128Val)) (Table 4).

Does not affect function variants

The most frequent variant in the does not affect function group was c.76G>A (p.(Gly26Ser)), which represented 94.5% of this group (MAF 0.0515) (Table 4).

Discussion

We used the gnomAD database to estimate the population prevalence of ATTR-m. Most published studies of this disease have estimated prevalence from the records of patients already diagnosed [9, 22]; in our work, we started from a general population sample and analysed allele frequencies. Although ATTR-m is considered a rare disease, there is sufficient evidence to conclude that it is underdiagnosed [17, 18], and data on its true prevalence in the general population remain inconclusive.

Analysis of all exonic variants in the gnomAD database identified 71 missense variants. Only 28 had been reported previously, 11 of which were considered amyloidogenic [6]. We classified 17 variants as probably affects function; these had not been described before, but pathogenic changes at the same position had already been reported. Only 6 of the 29 variants that we classified as of unknown significance had been described as such in the literature (c.14G>A (p.(Arg5His)), c.25C>T (p.(Leu9Phe)), c.52T>G (p.(Ser18Ala)), c.68C>T (p.(Thr23Met)), c.140A>G (p.(Asn47Ser)) and c.368G>A (p.(Arg123His))) [6]; the other 23 had not been described, and no changes had been reported at their positions. The probably does not affect function group comprised c.355G>A (p.(Asp119Asn)), c.98T>C (p.(Met33Thr)), c.362G>A (p.(Gly121Asp)) and c.383C>T (p.(Ala128Val)); for all of them, similar amino-acid changes had previously been described as not affecting function [6]. Finally, all variants in the does not affect function group had been described previously; the most common was c.76G>A (p.(Gly26Ser)), which represented 94.5% of the group (MAF 0.0515).
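The grouping criteria described above can be read as a small decision rule. The sketch below is one possible reading, not the authors' procedure: it assumes a previously described variant keeps its published class, and that an undescribed variant is judged by the classes of other substitutions reported at the same residue, with a pathogenic change taking precedence (an assumption). It does not implement the ACMG assessment cited in the Methods; `Effect` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Optional, Set


class Effect(Enum):
    AFFECTS = "affects function"
    PROBABLY_AFFECTS = "probably affects function"
    UNKNOWN = "unknown"
    PROBABLY_NOT = "probably does not affect function"
    DOES_NOT = "does not affect function"


def classify(reported_effect: Optional[Effect],
             same_position_effects: Set[Effect]) -> Effect:
    """Assign a functional class to a TTR missense variant.

    reported_effect: class given in databases/literature, or None if undescribed.
    same_position_effects: classes reported for other substitutions at the same residue.
    """
    if reported_effect is not None:
        return reported_effect            # described variants keep their published class
    if Effect.AFFECTS in same_position_effects:
        return Effect.PROBABLY_AFFECTS    # a pathogenic change exists at this residue
    if Effect.DOES_NOT in same_position_effects:
        return Effect.PROBABLY_NOT        # only neutral changes reported at this residue
    return Effect.UNKNOWN                 # undescribed, no changes reported at this residue


print(classify(None, {Effect.AFFECTS}).value)
```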

If only the 11 recognised affects function variants are considered, the prevalence of amyloidogenic variants in the general population is 1:230. Restricting the analysis to the group best represented in the gnomAD database, participants of European ancestry, the prevalence falls to 1:1269, probably because of the contribution of the African population to the overall figure. If variants classified as probably affects function are included, the overall prevalence rises to 1:207.
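Group prevalences such as 1:230 and 1:207 combine the frequencies of individual variants. A minimal sketch, assuming no individual carries two different variants, so that carrier frequencies (1/N) simply add; `combined_denominator` is a hypothetical helper, and the example uses the two most frequent probably affects function variants from the Results:

```python
from typing import Iterable


def combined_denominator(denominators: Iterable[float]) -> float:
    """Combine '1 in N' prevalences of mutually exclusive carrier groups.

    Each 1:N is converted to a frequency 1/N, the frequencies are summed
    (no person is assumed to carry two variants), and the sum is
    converted back to a single 1:N figure.
    """
    total = sum(1.0 / n for n in denominators)
    if total <= 0:
        raise ValueError("need at least one positive prevalence")
    return 1.0 / total


# c.235A>G (1:11,191) and c.406T>C (1:17,588)
print(f"about 1:{combined_denominator([11_191, 17_588]):,.0f}")
```

If carriers of two variants were not negligible, summing would double-count them; with frequencies this low the error is tiny.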

These results show a much higher frequency of amyloidogenic variants than would be expected from the disease prevalence reported so far, 1:100,000 inhabitants or less [9, 22]. Part of this difference could be explained by the incomplete penetrance of the disease, but we believe that other factors are likely to contribute, fundamentally the underdiagnosis of the disease discussed above [17].

Among the amyloidogenic variants described, the most common in the gnomAD database was c.424G>A (p.(Val142Ile)), with a carrier frequency of 1:332 (MAF 0.00151), followed by c.148G>A (p.(Val50Met)), with a frequency of 1:4924 people (MAF 0.000102). Both variants showed ethnic differences: c.424G>A (p.(Val142Ile)) was especially prevalent in the African population (MAF 0.01602; prevalence of 1:31), and c.148G>A (p.(Val50Met)) in the European population (MAF 0.000179; prevalence of 1:2792).

The c.424G>A (p.(Val142Ile)) variant is currently recognised as the most common cause of hereditary amyloid heart disease worldwide, which is consistent with our finding that it represents 88% of all affects function variants. In our data, 92.33% of the carriers of this variant in the gnomAD database were of African descent (385 of 417 registered c.424G>A (p.(Val142Ile)) carriers), 5.52% were of Latino descent and 1.2% were of European descent. Data from the United States in the THAOS registry show similar proportions: of the 91 American patients registered as c.424G>A (p.(Val142Ile)) carriers, 79 (86.8%) were of African descent and 6 (6.6%) were of Caucasian descent [23].
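The ancestry shares quoted above are simple proportions of carrier counts. A short sketch that reproduces them from the counts stated in the text; `percent` is a hypothetical helper:

```python
def percent(part: int, whole: int, decimals: int = 2) -> float:
    """Share of `part` in `whole`, as a percentage rounded to `decimals` places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, decimals)


# gnomAD: carriers of c.424G>A (p.(Val142Ile)) of African descent
print(percent(385, 417))
# THAOS, United States: African and Caucasian descent among 91 carriers
print(percent(79, 91, 1), percent(6, 91, 1))
```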

Data on the overall prevalence of the c.424G>A (p.(Val142Ile)) variant are limited, but several studies have focused on the African-American population. In 1996, Jacobson et al. reported an allele frequency of 0.02 for this variant in people of African descent, equivalent to a carrier frequency of about 4% [24]. In a later study of 1688 African Americans, Jacobson et al. (2015) found an allele frequency of 0.0193 [25]. In data published by Buxbaum et al. in 2017, the allele frequency in an African-American population was estimated at 0.017 [26]. Our analysis of gnomAD data showed a similar pattern, with a prevalence of 1:31, indicating that 3.2% of the African population carries the c.424G>A (p.(Val142Ile)) variant (MAF 0.01602).
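The step from an allele frequency of 0.02 to "4% of the population" follows from Hardy-Weinberg proportions: non-carriers occur at frequency (1 − q)², so carriers make up 1 − (1 − q)² = 2q − q². A minimal sketch, assuming random mating (Hardy-Weinberg equilibrium); `carrier_frequency` is a hypothetical helper and the allele frequencies are those quoted in the paragraph above:

```python
def carrier_frequency(q: float) -> float:
    """Expected proportion of carriers (heterozygous or homozygous) under
    Hardy-Weinberg equilibrium, for an allele with frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - q) ** 2


for label, q in [("Jacobson 1996", 0.02), ("Jacobson 2015", 0.0193),
                 ("Buxbaum 2017", 0.017), ("gnomAD, African", 0.01602)]:
    print(f"{label}: q = {q} -> {100 * carrier_frequency(q):.1f}% carriers")
```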

A recent study by Jacobson et al. in 2016 estimated the prevalence of c.424G>A (p.(Val142Ile)) in African populations [27]. Prevalence was highest in West African countries (allele frequency 0.0253), with a mean allele frequency of 0.011 in the remaining countries studied. These data support the idea that the amyloidogenic allele derives from a small number of founder carriers in southern West Africa.

The data reviewed showed that the only amyloidogenic variant found in homozygosity was c.424G>A (p.(Val142Ile)), probably because of the low total number of carriers of other variants. In the present study, 1.56% of c.424G>A (p.(Val142Ile)) carriers were homozygous, a proportion slightly below those described so far (3.0–9.2%) [28,29,30,31].
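As a point of reference, and not an analysis performed in this study, the fraction of carriers expected to be homozygous under Hardy-Weinberg equilibrium follows from the allele frequency alone: homozygotes occur at q² and all carriers at 2q − q², so the ratio simplifies to q / (2 − q). A minimal sketch, assuming random mating; `homozygous_share_of_carriers` is a hypothetical helper, and the African MAF from the Results is used only as an illustrative input:

```python
def homozygous_share_of_carriers(q: float) -> float:
    """Expected fraction of carriers who are homozygous under Hardy-Weinberg equilibrium.

    Homozygotes: q**2; all carriers: 2q - q**2; the ratio simplifies to q / (2 - q).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("allele frequency must lie in (0, 1]")
    return q / (2.0 - q)


q_african = 0.01602  # gnomAD MAF of c.424G>A in the African population
print(f"{100 * homozygous_share_of_carriers(q_african):.2f}% of carriers expected homozygous")
```

The expectation depends on the allele frequency of the population considered, so a figure pooled across populations, or one drawn from clinical series, is not directly comparable with it.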

Although c.424G>A (p.(Val142Ile)) is much more prevalent in African populations, patients carrying it can be found worldwide. According to gnomAD data, 1:12,666 Europeans (MAF 0.00003948), 1:748 people in the Latino population (MAF 0.0006683) and 1:7695 people in the South Asian population (MAF 0.00006497) carry this amyloidogenic variant. Few cases in patients without African ancestry have been reported in the literature, mostly as sporadic cases, although Capelli et al. described a small series in Italy in which 5 of the 33 Caucasian patients diagnosed at their centre carried the c.424G>A (p.(Val142Ile)) variant [32,33,34].

The second most common amyloidogenic variant identified in the gnomAD database was c.148G>A (p.(Val50Met)) (MAF 0.000102; prevalence of 1:4924), which represented 5% of all affects function variants. To date, c.148G>A (p.(Val50Met)) has been described in many patients worldwide, and it accounts for more than 50% of the cases of familial amyloid polyneuropathy reported in the literature [9]. This discrepancy is probably due to the greater aggressiveness of the neurological manifestations; greater awareness, as it was the first variant described; and, above all, its presence in areas considered endemic, such as Portugal, Sweden or Japan, where this variant is responsible for almost all cases [7, 9, 12].

Regarding the association between c.148G>A (p.(Val50Met)) and ethnic group, our data are consistent with previous reports of an association with European and Latin-American populations [35]. Of the 25 carriers identified in gnomAD, 20 belonged to European populations, with a prevalence of 1:2792 (MAF 0.000179), and 2 were of Latino descent, with a prevalence of 1:8395 (MAF 0.00005956).

The remaining affects function variants were less frequent. They included c.262A>T (p.(Ile88Leu)), with a prevalence of 1:23,098 (MAF 0.00002165), which is found mainly in people of European ancestry, among whom the prevalence is 1:12,668 (MAF 0.00003947). Its clinical manifestations are almost exclusively cardiac; in fact, in some case series of Caucasian patients, c.262A>T (p.(Ile88Leu)) was the most frequent variant [36]. This contrasts with African populations, in which the most frequent cardiac variant is c.424G>A (p.(Val142Ile)).

The c.190T>C (p.(Phe64Leu)) variant, described in the literature as producing a mixed phenotype [37, 38], was identified in 1:9900 individuals in the gnomAD database (MAF 0.00005051). It was found mainly in people of African descent, with a prevalence of 1:924 (MAF 0.0005411), and only one carrier was identified in the European population (MAF 0.000007892).

The c.238A>G (p.(Thr80Ala)) variant, which appeared only in the European population in our study, with a prevalence of 1:7507 (MAF 0.00006660), has previously been described in a cluster in Northern Ireland, where an estimated 1.1% of the population carries this variant, and in isolated cases in England and Scotland [22, 39].

All carriers of c.349G>T (p.(Ala117Ser)) in the gnomAD database belonged to the East Asian population, in which the prevalence was estimated at 1:9,431 (MAF 0.000106). This amyloidogenic variant has been described in the literature as one of the most common in Chinese populations [40], which is consistent with our findings.

Some variants in the probably affects function group also show a predominance by ethnicity. The c.367C>T (p.(Arg123Cys)) and c.367C>G (p.(Arg123Gly)) variants appear in gnomAD only in the European population, whereas other frequent variants, such as c.235A>G (p.(Thr79Ala)) and c.406T>C (p.(Tyr136His)), occur almost exclusively in the Latino population. Therefore, this group also shows an association between variants and ethnic groups.

The data from the present study support the idea that ATTR-m is underdiagnosed. Several recent studies point in the same direction, concluding that the overall prevalence is higher than reported to date [7, 17]. The prevalence of ATTR-m polyneuropathy has been estimated at 5000 to 10,000 cases worldwide, and some authors have concluded that it could be as high as 40,000 cases [17]. Other studies indicate that 32% of patients remain undiagnosed [35]. For ATTR-m with cardiomyopathy, there are an estimated 40,000 cases [17]. This is thought to be an underestimate because, for example, 5% of patients with hypertrophic cardiomyopathy have ATTR-m [18].

Based on our data and the current literature, the prevalence of the c.424G>A (p.(Val142Ile)) variant is well established in the African population, but not in populations in which it is less frequent, such as the European population. Underdiagnosis of this variant is a common problem in all populations. For other variants, including c.148G>A (p.(Val50Met)), underestimation of prevalence and underdiagnosis are more marked in non-endemic areas than in regions recognised as endemic. This may be due in part to low clinical suspicion and the lack of a family history in patients from non-endemic areas.

Large population studies such as gnomAD make it possible to study the prevalence of variants involved in various diseases [41,42,43]. However, an apparent limitation of using these databases in ATTR amyloidosis is that the prevalence of amyloidogenic variants in the population is not the same as the prevalence of the disease, so the data should be interpreted with caution. In addition, phenotype data are not available for gnomAD samples, so this study describes the frequency of carriers. Presymptomatic carriers should be identified and followed up appropriately so that, if symptoms appear, pharmacological treatment can be started without delay [12].

Another point to consider is that affects function variants are associated with the formation of misfolded proteins, whereas such studies are not available for the new variants assigned to the other groups; studies of this kind would help to characterise these new variants.

In addition, not all populations are equally represented, with the European population forming the majority. Populations such as Asian or Latino groups would need greater representation to draw more robust conclusions about them.

In conclusion, although recent data suggest that the actual prevalence of ATTR-m amyloidosis could be much higher than reported to date, the data from our analysis of the gnomAD database are even more compelling, raising the possibility that hundreds of thousands of people carry potentially disease-causing variants.