Introduction

In recent years, transthyretin-related amyloidosis (ATTR) has emerged as the most common cause of amyloid cardiomyopathy in Italy [1]. This condition can present as an isolated, non-familial disease defined as wild-type ATTR, or as a monogenic dominant disease caused by protein-altering variants in the TTR gene, encoding transthyretin on chromosome 18 (ATTRv). In both cases, the disease is caused by the structural dissociation of the stable tetrameric protein into unstable monomers subject to misfolding and self-assembly with abnormal fibrillar conformations. Such aberrant fibrils form an insoluble amyloid matrix that accumulates in the extracellular space of several tissues, including the myocardium, with the position of the variant on the protein determining whether amyloid accumulates preferentially in the peripheral nervous system or in the heart muscle [2]. Among the > 120 known ATTRv-causing variants, the substitution of the valine residue with an isoleucine residue at position 142 (p.Val142Ile, previously described as p.Val122Ile referring to the altered residue number in the mature protein) is the most common worldwide. The variant has a particularly high prevalence (up to 4%) in West African countries and in the community of African descent living in the USA [3, 4], composed mainly of descendants of enslaved people deported to America in past centuries from the westernmost African countries. In the Genome Aggregation Database (gnomAD) the overall variant prevalence in the broader African population is 1.6%, > 25-fold higher than that observed in any other gnomAD super-population. The higher minor allele frequency (MAF) of the p.Val142Ile variant in African-ancestry populations is the reason why this variant has been primarily studied in Africans and African-Americans, and is regarded as a prevalent cause of ATTRv almost uniquely in patients from these groups [3,4,5].
However, during the last decade it became increasingly evident that the prevalence of p.Val142Ile is markedly higher in Tuscany (a region of central Italy) than in other regions populated mainly by individuals of European descent [6,7,8,9,10]. That Tuscans are an exception not only among Europeans in general, but also within the more specific context of Southern Europe, is reflected by the low MAF of this variant in the Southern European subgroup of the gnomAD database (0.026%).

The frequency of adult and newborn carriers of p.Val142Ile is comparable, suggesting that reproductive fitness of heterozygotes is not diminished. This hypothesis is corroborated by the fact that ATTRv onset generally occurs between the seventh and the eighth decade of life, well beyond the reproductive age [11, 12]. This, in addition to the wild-type nucleotide being a guanine residue within a CpG dinucleotide (long suggested to be mutational hotspots) [13], increases the likelihood that the variant has arisen de novo in the population, with subsequent local spread. If it arises in small communities subject to reduced gene flow, for example owing to geographical boundaries, significant oscillations of allelic frequencies may occur. Therefore, a variant possessing these characteristics may become significantly more prevalent in the local population than elsewhere. Such a phenomenon, i.e. a founder effect, has been shown to account for the exceptional prevalence of specific cardiovascular disease-causing variants in local communities [14,15,16,17,18]. However, owing to its position in the Mediterranean Sea and to its broad latitude span, with its southernmost islands being closer to the North African coast than to the rest of the country, Italy’s history has been shaped by migration and colonization. As a result, the genome of modern Italians displays remarkable genetic variability and carries Middle Eastern and North African signatures, especially in southern Italian populations [19, 20]. In addition, owing to its geographical conformation, Italy is rife with genetic isolates that are liable to genetic drift. For these reasons, we investigated the high prevalence of p.Val142Ile in Tuscany, to determine whether the high occurrence observed among unrelated patients followed at our center may be due to recent admixture events or whether the variant arose spontaneously in the local population with a prominent founder effect.
The latter scenario would constitute a proof of principle that the pathogenic variant is also likely to have arisen spontaneously in other communities around the world, and that it should be deemed a potentially common cause of ATTRv also in non-African communities.

Materials and methods

Genotyping and sequencing

Samples (N = 47, comprising 16 probands carrying the p.Val142Ile variant, 30 family members and the NA10851 control sample from the Coriell Institute) were genotyped for 128 SNPs selected for being highly informative for ancestry (EUROFORGEN Global AIM-SNP set [21]). In addition, the full TTR gene was sequenced (including 20 kb flanking regions) with a HaloPlex-based custom panel (Agilent Technologies, CA, USA). Paired-end sequencing was performed on the Illumina MiSeq platform.

Clinical evaluation and recruitment

All the analyzed probands and relatives were clinically evaluated by means of ECG and echocardiography and/or cardiac magnetic resonance imaging by accredited cardiologists at the Tuscan Regional Amyloidosis Center in Florence, Italy. ATTR diagnosis was established, following published guidelines, either by immunohistochemistry/electron microscopy-based analysis of a biopsy stained with anti-TTR antibodies (irrespective of cardiac uptake at bone-tracer scintigraphy), or by a non-invasive workup (i.e. grade 2 or 3 cardiac uptake at bone scintigraphy after exclusion of a monoclonal gammopathy) [22, 23]. Prior to blood sample collection, study subjects read and signed a document regarding the anonymized research use of the collected sample.

Data processing and bioinformatics analysis

Sequencing reads were aligned to the hg19 reference genome and variant calling was performed with the Genome Analysis Toolkit (v.3.3.0). Variants were annotated with Ensembl VEP (v104). Principal component analysis (PCA) was performed with PLINK (v1.9), using the 2504 individuals from the 1000 Genomes Project (Phase 3) as the reference population.

The linkage disequilibrium (LD) structure analysis on the analyzed samples and on the Tuscan individuals included in the 1000 Genomes Project (TSI, N = 107) was performed using Haploview [24]. Prior to the Haploview analysis, phasing was performed using Eagle (v2.4.1) [25] with TSI as a reference set. Haploview automatically selected the “maximum unrelated subset” (N = 29) from the 45 probands and family members of ascertained European descent. LD was measured by means of the D’ metric, and LD blocks were defined using the method proposed by Gabriel et al. [26].
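The D’ metric mentioned above can be illustrated with a minimal sketch, assuming phased haplotypes at two biallelic sites. The function name `d_prime` and its parameters are hypothetical, the numbers are toy values rather than study data, and this is not Haploview's own implementation.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute value of Lewontin's D' for two biallelic sites.

    p_ab: frequency of haplotypes carrying allele A at site 1 and allele B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    # Raw disequilibrium coefficient: observed minus expected under independence.
    d = p_ab - p_a * p_b
    if d == 0:
        return 0.0
    # D is normalized by the largest magnitude it could take given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max <= 0:
        raise ValueError("haplotype and allele frequencies are inconsistent")
    return abs(d) / d_max


# Toy example (not study data): allele A is only ever observed together with allele B,
# which corresponds to complete LD.
print(d_prime(p_ab=0.3, p_a=0.3, p_b=0.5))
```

A value of 1 means that at least one of the four possible two-site haplotypes is absent, which is the situation described as complete LD in a block.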

Haplotypes were identified by means of Haploview [24]. The haplotype median-joining network was built using Network (v10.2.0.0) [27], including data from our 46 samples as well as from the TSI subset of the 1000 Genomes Project, Phase 3 (N = 107). The median-joining network was built on raw nucleotide data of genotypes observed at the 20 haplotype-defining variant sites. VariantValidator was used to validate variant notation and description [28].

Results

A total of 46 individuals from 16 families in which the p.Val142Ile variant segregates were analyzed (Fig. 1), and sample NA10851 (Coriell Institute) was used as a reference control individual of European descent. All the analyzed families resided in Tuscany (a region of central Italy roughly the size of New Jersey) at the time of the study, were unrelated in recent generations, were not reported to be consanguineous, and were followed at the Tuscan Regional Amyloidosis Center in Florence. Morpho-functional cardiac parameters and general clinical characteristics of study subjects are reported in Supplementary Table 1. A principal component analysis (PCA) was conducted on the 16 probands (alongside the NA10851 reference sample) together with the 1000 Genomes Project (Phase 3) data (N = 2504) to exclude batch effects and dissect the study subjects’ ancestry. The first two principal components explained 70.5% of the total variance. The positioning of the internal NA10851 sample demonstrated the absence of specific batch effects, and the analysis confirmed the European ancestry of 15 of the 16 probands (Fig. 2). The remaining proband (sample A228), excluded from further analyses, was positioned by the PCA within the smear of native Americans, and his clinical records confirmed his Argentinian origin. Of note, the family of A228 was the only one included in the analysis that was not originally from Tuscany, according to the interviewed probands and family members.

Fig. 1: Pedigrees of the analyzed families where p.Val142Ile-related ATTRv segregates.

The pedigree structure of families from which multiple individuals have been included in the study is displayed in panels (A–J).

Fig. 2: Principal component analysis (PCA) of the 16 probands together with the 2504 individuals of the 1000 Genomes Project (Phase 3).

Individual NA10851 is plotted twice for quality control purposes, having been genotyped both within the 1000 Genomes Project (NA10851_1KG) and in-house (NA10851_internal). The proband clustering within the smear of native Americans is A228.

Once the African origin of the p.Val142Ile variant in the study subjects was excluded, we performed linkage disequilibrium (LD) analysis to examine the haplotype structure of the region comprising TTR on chromosome 18. This revealed that the entire TTR gene (approximately 7.3 kb long) lies within a linkage block that is subject to complete LD in the analyzed families as well as in Tuscans from the 1000 Genomes Project (TSI, N = 107; Fig. 3), excluding the occurrence of major recombination events across this locus in the study populations and confirming the genetic similarity between the analyzed families and TSI. Four common SNPs (MAF > 5% in Europeans in the 1000 Genomes Project) were identified as haplotype-defining tag SNPs (rs3764478, rs72922940, rs1791228 and rs1791229), and subsequent analysis showed that all p.Val142Ile carriers share the T-A-T-G haplotype (found on 12.6% of TSI chromosomes; Fig. 3). A sub-haplotype phylogenetic network analysis, which included rare variants to obtain finer resolution, showed that the p.Val142Ile variant appeared on a specific ancestral Tuscan haplotype found in <3% of TSI (Fig. 4).

Fig. 3: Linkage disequilibrium (LD) structure.

LD structure of a 17 kb region comprising the TTR gene in 107 Tuscans (TSI) included in the 1000 Genomes Project (a, on the left) and in the maximum unrelated subset selected by Haploview from the study subjects (N = 29; b, on the right). In both cases, a unique LD block is detected across this region (chr18:29165918-29183812). Numbers in the LD squares represent D’ values, and the red color represents full LD (D′ = 100). The light blue bars above the LD plots represent real relative distances between the displayed SNPs (selected for having a frequency >5% in Europeans in the 1000 Genomes Project) within the 17 kb region. The four SNPs in black rectangular frames are the haplotype-defining (tag) SNPs, and the resulting haplotype frequencies are reported in the table below, with the 4 letters representing the alleles of the tag SNPs in the same order as they are represented in the LD plots. Of note, the T-A-T-G haplotype is found on 39.7% of chromosomes in the study subjects (all 15 probands of European ancestry are T-A-T-G carriers, and proband A204 is homozygous) and on only 12.6% of TSI chromosomes. Details of all the variants represented in the figure are reported in Supplementary Table 3.

Fig. 4: Median-joining haplotype network constructed on a total of 136 individuals (N = 107 TSI and N = 29 comprising the maximum unrelated subset extracted from the study subjects), obtained including variants at all frequencies for finer dissection (including p.Val142Ile).

Ancestral haplotypes (forming the network torso) are represented in light blue (or purple). Ancestral haplotypes not detected in the analyzed sample set, but whose existence is hypothesized, are represented as “median vectors” (mv), in red. The two haplotypes with the p.Val142Ile variant are represented in yellow (VAR1 and VAR2) and the unique ancestral haplotype on which p.Val142Ile appeared in Tuscany is represented in purple (HAP16). Circle areas are proportional to the number of chromosomes carrying the haplotype, with HAP16 detected on 3 chromosomes in the TSI set and HAP1 (the most prevalent haplotype in TSI) found on 93. The full haplotype sequence is displayed for the most prevalent ancestral haplotype (HAP1) and for the two haplotypes with the p.Val142Ile variant. In the latter, alleles that are different from those defining HAP1 are highlighted in red. The p.Val142Ile allele is displayed with a light blue background. Ticks along the network edges represent the number of differing alleles between the connected haplotypes. The full sequence of all haplotypes is provided in Supplementary Table 2 and details of all the haplotype-defining variants are reported in Supplementary Table 3.
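The tick counts along the network edges correspond to the number of sites at which two haplotypes differ. The following minimal sketch shows that count; the function name `allele_differences` is hypothetical, and for brevity it uses the four-SNP tag haplotypes quoted in the text rather than the 20-site haplotypes on which the network was actually built.

```python
def allele_differences(hap_a: str, hap_b: str) -> int:
    """Count the sites at which two equal-length haplotypes carry different alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same set of sites")
    # Compare the haplotypes site by site and count the mismatches.
    return sum(a != b for a, b in zip(hap_a, hap_b))


# Tag-SNP haplotypes quoted in the text, written without the hyphens.
shared = "TATG"      # haplotype shared by the probands of European ancestry
a228_first = "GACT"  # one of the two haplotypes reported for proband A228
a228_second = "GATG" # the other haplotype reported for proband A228

print(allele_differences(shared, a228_first))   # 3
print(allele_differences(shared, a228_second))  # 1
```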

Discussion

The TTR:p.Val142Ile variant has historically, and rightfully, been considered a prevalent genetic cause of ATTRv in individuals of African descent, with a limited number of cohort- or population-based analyses of its prevalence in individuals of different ancestries consistently reporting a low prevalence in Europeans [29, 30]. However, more recently there have been repeated reports of regional recurrence or higher-than-expected prevalence among individuals of European ancestry in studies conducted both in Europe and in North America, questioning the African-specificity of the p.Val142Ile allele [8,9,10]. We originally reported an unexpectedly high occurrence of the p.Val142Ile variant among unrelated individuals referred to our center for suspected ATTRv, and hypothesized a founder effect [6, 7]. Alternative hypotheses were those of independent mutational events or recent migrations from Africa, though the former would be particularly unlikely, and the latter was not easily attributable to known historic events, given that the Italian population is geographically closer to North African than to sub-Saharan populations. In the present work, we collected definitive evidence demonstrating the presence of a founder effect in central Italy (more specifically in Tuscany) leading to a regionally high prevalence of the otherwise rare, ATTRv-causing p.Val142Ile variant in TTR. Our conclusion is based on several observations. First, Tuscan carriers do not possess any genetic signature of ancestral African descent (Fig. 2), ruling out the possibility that they descend from hypothetical sub-Saharan African individuals who crossed the Mediterranean Sea in past centuries. Second, the unrelated Italian probands analyzed share the T-A-T-G haplotype in the region comprising TTR on chromosome 18. It is extremely unlikely that the 15 probands of European descent (i.e. 30 copies of chromosome 18) included in this study carry this haplotype by chance, considering that this haplotype is found on only 12.6% of chromosomes 18 in the Tuscan population, as shown in Fig. 3 (binomial test for 16 of 30 chromosomes with the T-A-T-G haplotype, from 14 heterozygous individuals and 1 homozygous individual, p = 1.009 × 10⁻⁷). This, and the fact that the only proband not carrying the T-A-T-G haplotype is the Argentinian individual A228 (carrying G-A-C-T and G-A-T-G), suggest that the hypothesis of separate and independent mutational events underlying the p.Val142Ile recurrence in Tuscany can be rejected. Third, finer-grained haplotype network analysis including rare variants, which identified a total of 33 haplotypes in 136 Tuscan individuals (107 from the 1000 Genomes Project and 29 forming the “maximum unrelated subset” from the analyzed families), showed that p.Val142Ile arose on a specific ancestral haplotype carried by only 3 of the 107 Tuscans analyzed in the 1000 Genomes Project (Fig. 4).
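The binomial test quoted above can be reproduced with a minimal sketch, assuming a one-sided (upper-tail) test in which each of the 30 chromosomes independently carries T-A-T-G with the TSI frequency of 12.6%. The function and variable names are hypothetical, and the text does not state which software or sidedness was used for the reported value.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Probability of observing at least k successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_chromosomes = 30     # 15 probands of European descent, two chromosomes 18 each
n_tatg = 16            # 14 heterozygotes plus 1 homozygote
tsi_frequency = 0.126  # T-A-T-G frequency on TSI chromosomes (Fig. 3)

p_value = binomial_upper_tail(n_tatg, n_chromosomes, tsi_frequency)
print(f"p = {p_value:.3e}")
```

Under these assumptions the result is on the order of 1 × 10⁻⁷, in line with the value reported in the text.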

The ascertained presence of a large founder effect outside Africa in subjects of European descent comes with the clinically relevant implication that the overall prevalence of the p.Val142Ile variant worldwide, and especially in European populations, may be significantly higher than previously believed. Of note, white carriers of p.Val142Ile are characterized by a similar, yet often distinguishable, disease phenotype compared with those of African ancestry [31]. This is in contrast with the second most common ATTRv-causing TTR variant (p.Val30Met), associated with two major founder effects in Portugal and in Sweden that are characterized by radical clinical differences, with Portuguese carriers presenting with an early-onset and mainly neurological form of ATTRv and Swedish ones with later-onset ATTRv and a mixed phenotype with both neurological and myocardial involvement [32].

In conclusion, we believe these results suggest that the actual prevalence of the p.Val142Ile variant in populations of non-African descent is higher than previously believed, warranting routine TTR screening in European-descent patients even if they are initially referred for hypertrophic cardiomyopathy (HCM). Although careful clinical evaluation can reveal differences between HCM and ATTR, the distinction is not trivial and a genotype-first approach would streamline differential diagnosis [33]. Whenever TTR is tested, the identification of p.Val142Ile or of another pathogenic variant in the gene would translate into prompt and unequivocal ATTRv diagnosis per se, enabling referral to a specialized ATTR center and providing the chance of an early treatment start for carriers [34]. This scenario would be beneficial to these patients and their relatives not only in terms of quality of life, but also in terms of improved survival [35]. More broadly speaking, this study also serves as an example of how founder populations can be helpful in reconstructing the (migratory, in principle) history of regional communities.